<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>CEUR Workshop Proceedings</journal-title>
      </journal-title-group>
      <issn pub-type="ppub">1613-0073</issn>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Need-Oriented Environmental Knowledge Base and Large Language Models</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Hiroaki Shimoma</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Sudesna Chakraborty</string-name>
          <email>sudesna@it.aoyama.ac.jp</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Takeshi Morita</string-name>
          <email>morita@it.aoyama.ac.jp</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Aoyama Gakuin University</institution>
          ,
          <addr-line>Kanagawa</addr-line>
          ,
          <country country="JP">Japan</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>National Institute of Advanced Industrial Science and Technology</institution>
          ,
          <addr-line>Koto-ku</addr-line>
          ,
          <country country="JP">Japan</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2025</year>
      </pub-date>
      <volume>000</volume>
      <fpage>0</fpage>
      <lpage>0001</lpage>
      <abstract>
        <p>In Embodied AI, navigation agents using Large Language Models (LLMs) often rely on lengthy prompts that include extensive environmental information. This becomes increasingly problematic in complex environments such as the VirtualHome simulator, where incorporating all object data can reduce accuracy and increase computational costs. To address this, we propose a prompt compression technique based on a “need-oriented” environmental knowledge base. Our system first infers a user's underlying need from their natural language request using Murray's theory of human needs. It then retrieves only the objects relevant to that need from our knowledge base. A compressed prompt, containing only the user's request and the specific objects, is then sent to the LLM. The results showed this method significantly improves navigation accuracy while reducing prompt length compared to approaches that use all environmental data.</p>
      </abstract>
      <kwd-group>
        <kwd>need-oriented environmental knowledge base</kwd>
        <kwd>prompt compression</kwd>
        <kwd>navigation</kwd>
        <kwd>large language models</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        In Embodied AI [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], using Large Language Models (LLMs) for dialog-based navigation [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] is a promising
approach. However, a key challenge is that embedding all environmental knowledge into prompts
makes them lengthy, increasing computational costs and reducing accuracy.
      </p>
      <p>
        To address this challenge, our study proposes a prompt compression technique for a navigation system
in the VirtualHome (VH) simulator [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. The method uses a novel “need-oriented” environmental
knowledge base, inspired by Murray’s theory of human needs, to select only the essential information
required for the LLM’s decision-making. This approach aims to improve inference performance and
reduce operational costs, creating more efficient and scalable LLM-based household agents.
      </p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
      <p>
        While earlier dialog-based navigation systems for the VH simulator [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], such as the one by Schalkwijk et al. [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], relied on semi-automatically constructed knowledge graphs and manual dialog rules, recent studies
have shifted towards using LLMs for environmental perception and planning [
        <xref ref-type="bibr" rid="ref4 ref5">4, 5</xref>
        ].
      </p>
      <p>A common practice in these recent systems is to embed all available environmental knowledge into
the LLM’s prompt. However, this approach has two major flaws: it can introduce irrelevant information
that harms the accuracy of the LLM’s output, and it becomes impractical and expensive with commercial
LLMs that have token-based pricing models.</p>
      <p>
        To address these issues, our study proposes an LLM-driven navigation system incorporating a
novel prompt compression technique. In contrast to previous navigation approaches that embed all
available environmental knowledge into prompts, our method leverages a need-oriented environmental
knowledge base inspired by Murray’s theory of human needs to selectively filter information. This
ensures that only task-relevant context is preserved while reducing token usage and computational
costs. For evaluation, our work adapts question data from the functional reasoning category of the
OpenEQA benchmark [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], which involves commonsense reasoning highly relevant to navigation and
decision-making tasks in household environments.
      </p>
    </sec>
    <sec id="sec-4">
      <title>3. Method</title>
      <p>The architecture of the proposed system is illustrated in Figure 1. It features a text-based dialog interface
for user interaction. The system initiates the process by prompting an LLM to assume the role of a
guide that infers user intent and facilitates a dialog. When a user issues a request involving guidance
to a specific room or object, the system employs the proposed method to present a set of candidate
destinations. An action script is then generated based on the selected destination, which is subsequently
executed to complete the navigation.</p>
      <p>
        The core of the proposed method lies within the “response generation” module of Figure 1. The
internal structure is outlined in Figure 2 and consists of the following steps:
1. Constructing a need-oriented environmental knowledge base for the VH environment, based
on Murray’s theory of human needs [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], associating each need with relevant objects and
explanatory descriptions.
2. Capturing user requests through the dialog interface.
3. Inferring the user’s underlying need using an LLM.
4. Retrieving environmental objects from the knowledge base based on the inferred need.
5. Narrowing down to specific objects that satisfy the user’s request through further LLM inference.
6. Identifying rooms containing the selected object(s) and presenting room-object pairs as navigation
options.
      </p>
      <p>This multi-stage method centers on the creation and utilization of a “Need-Oriented Environmental
Knowledge Base.” Inspired by psychologist Henry Murray’s theory of human needs, the system selects
23 needs from the original 40, prioritizing those most relevant to household settings, such as “Thirst,”
“Sleep,” and “Order.” For each of these 23 needs, a mapping was created that associates the need with
corresponding actions, a detailed textual explanation, and a curated list of VH objects that can satisfy
that need. For instance, the “Thirst” need is linked to objects like “cup” and “faucet.” A separate
knowledge base records the object composition of each room, defining room–object relationships. To
ensure computational efficiency, both knowledge bases are encoded in a custom lightweight format,
rather than a standard knowledge graph representation like JSON-LD. This method minimizes the
number of tokens required when incorporating this information into LLM prompts.</p>
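      <p>As a rough illustration, the two knowledge bases and the compact encoding described above could be sketched as follows. The field names, sample entries, and delimiter-based encoding are assumptions for illustration only, since the paper does not specify its custom format.</p>

```python
# Hypothetical sketch of the two knowledge bases described in the text.
# All entries and the "|"-delimited compact encoding are assumptions.

# Need-oriented KB: need -> actions, explanation, and satisfying VH objects
NEED_KB = {
    "Thirst": {
        "actions": ["drink", "pour"],
        "explanation": "The need to consume liquids to relieve thirst.",
        "objects": ["cup", "faucet", "waterglass", "fridge"],
    },
    "Sleep": {
        "actions": ["lie", "rest"],
        "explanation": "The need to rest and recover through sleep.",
        "objects": ["bed", "pillow", "sofa"],
    },
}

# Room-object KB: room -> objects it contains
ROOM_KB = {
    "kitchen": ["cup", "faucet", "fridge", "book"],
    "bedroom": ["bed", "pillow", "book"],
}

def encode_need(need: str) -> str:
    """Encode one need entry as a single compact line to save prompt tokens,
    instead of a verbose graph serialization such as JSON-LD."""
    entry = NEED_KB[need]
    return f"{need}|{entry['explanation']}|{','.join(entry['objects'])}"

print(encode_need("Thirst"))
# Thirst|The need to consume liquids to relieve thirst.|cup,faucet,waterglass,fridge
```

      <p>A flat line-based encoding like this avoids the bracket, key, and URI overhead of a graph serialization, which is the token-saving rationale stated above.</p>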
      <p>The full system is implemented with a Telegram bot interface. When a user sends a request, it initiates
a sophisticated, multi-step inference process managed by the LLM. The first step is to infer the user’s
underlying need from their textual request. To accomplish this, the system was tested with two distinct
prompt engineering strategies: one that prompts the LLM to use its general knowledge of Murray’s
theory, and a second, more verbose method that gives the LLM full textual descriptions of each need
for richer semantic context. Once the need is identified, the system proceeds to the crucial prompt
compression step: it retrieves only the objects associated with that specific need from the knowledge
base. For example, if a user states they are hot and thirsty, only relevant objects such as “air conditioner”
and “water glass” are selected, while filtering out hundreds of irrelevant items like “book” or “pillow.”
In the next step, a new prompt is generated, combining the original user request with the filtered object
list. The LLM is tasked with identifying the single most appropriate object to fulfill the user’s intent. In
the final step, the system uses its room-object knowledge base to identify the location of this target
object. The process concludes by presenting these identified room-object pairs to the user as candidate
destinations for navigation, effectively guiding them to their goal.</p>
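      <p>The multi-step inference described above can be sketched as follows, with the LLM call stubbed out. All function names, prompts, and knowledge-base entries are illustrative assumptions, not the authors’ implementation.</p>

```python
# Minimal sketch of the need-based prompt compression pipeline:
# infer need -> filter objects by need -> pick object -> locate rooms.

NEED_KB = {
    "Thirst": ["cup", "faucet", "waterglass"],
    "Sleep": ["bed", "pillow"],
}
ROOM_KB = {
    "kitchen": ["cup", "faucet"],
    "bedroom": ["bed", "pillow", "waterglass"],
}

def call_llm(prompt: str) -> str:
    # Placeholder for a real chat-completion call (e.g. to GPT-4o).
    # Returns canned answers so the running example is deterministic.
    if "Which need" in prompt:
        return "Thirst"
    return "waterglass"

def navigate(request: str) -> list[tuple[str, str]]:
    # Step 1: infer the underlying need from the user request.
    need = call_llm(f"Which need from Murray's theory does this express? {request}")
    # Step 2: prompt compression - only this need's objects are sent to the
    # LLM, instead of the full environmental knowledge.
    candidates = NEED_KB.get(need, [])
    obj = call_llm(f"Request: {request}\nObjects: {', '.join(candidates)}\nPick one.")
    # Step 3: locate the chosen object via the room-object knowledge base.
    return [(room, obj) for room, objs in ROOM_KB.items() if obj in objs]

print(navigate("I'm really thirsty"))  # [('bedroom', 'waterglass')]
```

      <p>The compression happens between steps 1 and 2: the second prompt contains only the handful of need-relevant objects rather than every object in the environment.</p>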
    </sec>
    <sec id="sec-5">
      <title>4. Evaluation</title>
      <sec id="sec-5-1">
        <title>4.1. Evaluation Overview</title>
        <p>In this experiment, we evaluated the accuracy of two key processes in our system: (1) inference of
needs corresponding to user requests and (2) inference of objects that satisfy user requests.</p>
        <p>We used GPT-4o (gpt-4o-2025-04-17), provided by OpenAI, as the LLM. Accuracy was assessed using
standard metrics: precision, recall, and F1-score. The evaluation was conducted using a custom dataset
derived from the OpenEQA. From the original 1,636 question-answer pairs in OpenEQA, we selected
93 navigation-related questions that require commonsense reasoning. Since the original OpenEQA
answers are not directly compatible with the VH environment, we conducted a questionnaire-based
survey to collect VH-compatible ground-truth annotations.</p>
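        <p>Assuming each question is scored against a set of acceptable ground-truth objects (a detail of the evaluation protocol the paper does not spell out), the per-question metrics could be computed as:</p>

```python
# Set-based precision/recall/F1 for one question: predicted objects vs.
# the set of acceptable ground-truth objects (an assumed protocol).

def prf(predicted: set[str], gold: set[str]) -> tuple[float, float, float]:
    tp = len(predicted & gold)  # objects both predicted and acceptable
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

p, r, f = prf({"cup", "faucet"}, {"cup", "waterglass"})
print(p, r, f)  # 0.5 0.5 0.5
```

        <p>Averaging these per-question scores over the 93 evaluation instances would yield dataset-level precision, recall, and F1 figures such as those reported in Tables 1 and 2.</p>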
      </sec>
      <sec id="sec-5-2">
        <title>4.2. Results and Discussion</title>
        <p>The evaluation results are summarized in Table 1 and Table 2. Table 1 presents the experimental results
for inferring needs corresponding to user requests. Table 2 compares the three prompting strategies in
terms of navigation accuracy, average number of prompt tokens, and token count variability (standard
deviation). The three strategies are defined as: (1) Murray’s Theory Description prompts the LLM to
infer user needs based solely on its internal understanding of Murray’s psychological theory. (2) Needs
and Explanations provides the LLM with explicit textual descriptions of each need. (3) All Environment
Knowledge includes the complete set of environmental knowledge in the prompt without compression.</p>
        <p>Across all settings, the best performance was achieved using the “Needs and Explanations” prompt.
This approach outperformed the others in precision, recall, and F1 score, particularly when using
GPT-4o, achieving a precision of 0.693, a recall of 0.615, and an F1 score of 0.610. These results indicate
that explicit descriptions of needs help the model better infer intent than relying on Murray’s theory
implicitly embedded in the model.</p>
        <p>In terms of prompt compression, the proposed methods significantly reduced the number of input
tokens compared to the baseline (approximately 1,100 tokens). The “Murray’s Theory Description”
prompt required about 450 tokens and the “Needs and Explanations” prompt required about 750,
demonstrating a substantial reduction in input size while maintaining or improving inference accuracy.
These results indicate that all proposed methods successfully reduced the average number of tokens
per navigation session compared to the baseline. Notably, the “Murray’s Theory Description” method
yielded the greatest compression, reducing the prompt size by approximately 600 tokens
relative to the full-knowledge approach. Given that the evaluation dataset contains 93 instances, this equates
to a total reduction of roughly 56,000 tokens across the dataset. These findings suggest that while the
“Needs and Explanations” method yields the highest accuracy, the “Murray’s Theory Description”
method offers the most efficient token usage.</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>5. Conclusion</title>
      <p>Our proposed prompt compression technique, using a need-oriented knowledge base for an LLM-based
navigation system, successfully improved both inference accuracy and computational efficiency. The
“Needs and Explanations” method was the most accurate, achieving the highest F1 score of 0.548, while
the “Murray’s Theory Description” approach was the most token-efficient. However, significant
challenges remain. The underlying psychological theory is not perfectly suited for navigation, and the
knowledge base requires manual construction, limiting scalability. Furthermore, the system struggles
with ambiguous user requests. Future work will focus on creating a more navigation-specific need
taxonomy, automating knowledge base construction, and developing robust fallback strategies.</p>
    </sec>
    <sec id="sec-7">
      <title>Acknowledgments</title>
      <p>This work was supported by JSPS KAKENHI Grant Numbers 23K11221 and 25K03232, and was partially
supported by NEDO under Grant Numbers JPNP20006 and JPNP25006.</p>
    </sec>
    <sec id="sec-8">
      <title>Declaration on Generative AI</title>
      <p>During the preparation of this work, the authors used GPT-4o to check grammar and improve readability.
After using this tool, the authors carefully reviewed and edited the content as needed and take full
responsibility for the publication’s content.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>J.</given-names>
            <surname>Duan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Yu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H. L.</given-names>
            <surname>Tan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Zhu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Tan</surname>
          </string-name>
          ,
          <article-title>A Survey of Embodied AI: From Simulators to Research Tasks</article-title>
          ,
          <source>IEEE Transactions on Emerging Topics in Computational Intelligence</source>
          <volume>6</volume>
          (
          <year>2022</year>
          )
          <fpage>230</fpage>
          -
          <lpage>244</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>A.</given-names>
            <surname>Schalkwijk</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Yatsu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Morita</surname>
          </string-name>
          ,
          <article-title>An Interactive Virtual Home Navigation System Based on Home Ontology and Commonsense Reasoning</article-title>
          ,
          <source>Information</source>
          <volume>13</volume>
          (
          <year>2022</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>X.</given-names>
            <surname>Puig</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Ra</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Boben</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Fidler</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Torralba</surname>
          </string-name>
          ,
          <article-title>VirtualHome: Simulating Household Activities Via Programs</article-title>
          , in: 2018
          <source>IEEE/CVF Conference on Computer Vision and Pattern Recognition</source>
          ,
          <year>2018</year>
          , pp.
          <fpage>8494</fpage>
          -
          <lpage>8502</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>B. Y.</given-names>
            <surname>Lin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Huang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Gu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Sommerer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Ren</surname>
          </string-name>
          ,
          <article-title>On Grounded Planning for Embodied Tasks with Language Models</article-title>
          ,
          <source>Proceedings of the AAAI Conference on Artificial Intelligence</source>
          <volume>37</volume>
          (
          <year>2023</year>
          )
          <fpage>13192</fpage>
          -
          <lpage>13200</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>T.</given-names>
            <surname>Yoneda</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Fang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Jiang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Lin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Picker</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Yunis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Mei</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. R.</given-names>
            <surname>Walter</surname>
          </string-name>
          ,
          <article-title>Statler: State-maintaining language models for embodied reasoning</article-title>
          , in:
          <source>2024 IEEE International Conference on Robotics and Automation (ICRA)</source>
          ,
          <year>2024</year>
          , pp.
          <fpage>15083</fpage>
          -
          <lpage>15091</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>A.</given-names>
            <surname>Majumdar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Ajay</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Putta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Yenamandra</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Henaff</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Silwal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>McVay</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Maksymets</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Arnaud</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Yadav</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Newman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Sharma</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Berges</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Agrawal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Bisk</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Batra</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Kalakrishnan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Meier</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Paxton</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Sax</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Rajeswaran</surname>
          </string-name>
          ,
          <article-title>OpenEQA: Embodied Question Answering in the Era of Foundation Models</article-title>
          ,
          <source>in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)</source>
          ,
          <year>2024</year>
          , pp.
          <fpage>16488</fpage>
          -
          <lpage>16498</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>H. A.</given-names>
            <surname>Murray</surname>
          </string-name>
          ,
          <source>Explorations in Personality</source>
          , Oxford University Press, New York
          (
          <year>1938</year>
          ).
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>