<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>AgentTravel: Knowledge-Augmented LLM Agent Framework for Urban Travel Planning</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Jie Zhao</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jie Feng</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Yong Li</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Electronic Engineering, Tsinghua University</institution>
          ,
          <institution>Beijing National Research Center for Information Science and Technology (BNRist)</institution>
          ,
          <addr-line>Beijing</addr-line>
          ,
          <country country="CN">China</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2026</year>
      </pub-date>
      <abstract>
        <p>Large language models are opening new opportunities for intelligent decision support, with urban travel planning as a challenging and high-impact use case. Effective planning requires integrating real-time, multi-source data (e.g., points of interest, transportation, and user preferences) while reasoning spatially to generate feasible itineraries. This paper proposes AgentTravel, a unified framework that combines knowledge-grounded modeling, agentic reasoning, and multi-perspective evaluation. It includes: (1) TravelLLM, a domain-adapted model enriched with urban and spatial knowledge, (2) TravelAgent, an agentic planner with structured itinerary memory and real-time data retrieval, and (3) TravelBench, a benchmark assessing both knowledge grounding and plan quality. Experiments on five Chinese cities show that AgentTravel outperforms strong baselines in factual reasoning and itinerary feasibility in the majority of cases, offering a promising step toward grounded and adaptive LLMs for urban intelligence. Source code and datasets are available at https://github.com/csjiezhao/AgentTravel.</p>
      </abstract>
      <kwd-group>
        <kwd>Urban Travel Planning</kwd>
        <kwd>Knowledge-Grounded Agents</kwd>
        <kwd>Benchmarking and Evaluation</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        The rapid advancement of large language models (LLMs) has opened new opportunities for building
agentic intelligent systems in real-world decision-making tasks. Among these, urban travel planning
has emerged as a particularly promising and impactful application domain [
        <xref ref-type="bibr" rid="ref1 ref2">1, 2</xref>
        ]. As a representative
case of urban intelligence, travel planning inherently integrates multiple subtasks: retrieving up-to-date
information about points of interest (POIs), reasoning over spatial relationships, selecting transportation
options, and organizing itineraries that satisfy diverse user preferences and constraints. Such complexity
requires that LLM-driven systems not only access and integrate heterogeneous knowledge sources, but also
demonstrate spatial reasoning and multi-step decision-making capabilities to operate effectively in
dynamic urban environments.
      </p>
      <p>
        Despite recent advances in benchmarking [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], agent architectures [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], and iterative plan
refinement [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], several fundamental challenges remain unresolved. First, current LLMs exhibit limited spatial
reasoning capabilities: they often fail to accurately account for geographic distances, travel times,
or accessibility constraints when generating feasible itineraries [
        <xref ref-type="bibr" rid="ref6 ref7">6, 7</xref>
        ]. Second, integrating
heterogeneous and real-time information from open APIs, transportation platforms, and local knowledge bases
remains non-trivial: most existing systems either ignore dynamic contextual factors or depend on
narrow, domain-specific data sources. Third, while prior work such as TravelPlanner [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] has proposed
evaluation frameworks based on commonsense and hard constraints, there is still a lack of scalable,
multi-perspective benchmarks that jointly assess knowledge grounding, contextual reasoning, and the
practical quality of generated travel plans.
      </p>
      <p>To address these challenges, we propose AgentTravel, a unified framework designed to advance
urban travel planning through knowledge-augmented LLM agents. The framework integrates three
complementary components for reasoning, planning, and evaluation: (1) TravelLLM, a domain-adapted
base model fine-tuned with curated knowledge about cities and POIs, which enhances the model’s spatial
reasoning and domain adaptability for diverse urban contexts; (2) TravelAgent, an online agentic planner
built upon TravelLLM that leverages open Web APIs for real-time information retrieval, maintains
structured itinerary memory, and employs adaptive planning strategies to meet user preferences and
contextual constraints; and (3) TravelBench, a scalable benchmark suite with two complementary
modules: KnowEval, which evaluates factual and spatial knowledge integration using curated urban
datasets, and TripEval, which measures plan feasibility, personalization, and constraint satisfaction
across realistic travel scenarios.</p>
      <p>The contributions of this paper are threefold: (1) We release a multi-source urban knowledge dataset
covering five representative Chinese cities, encompassing road networks, POIs, attractions,
accommodations, and restaurants. The dataset supports both LLM fine-tuning and knowledge-grounded
evaluation for urban planning tasks. (2) We develop an online agentic framework that integrates
real-time information retrieval, spatially aware planning strategies, and persistent itinerary
memory to generate user-centered travel plans. (3) We introduce a comprehensive evaluation suite that
jointly assesses knowledge grounding and multi-criteria plan quality, enabling a holistic assessment of
knowledge-augmented LLM agents for urban travel planning.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
      <p>Recent research on LLM-based travel planning [<xref ref-type="bibr" rid="ref5 ref8">5, 8</xref>] can be broadly categorized into two paradigms:
LLM as Planner and LLM as Translator. The former treats the LLM as the central reasoning and
generation engine that directly produces travel itineraries, often enhanced with tool use, agent-based
strategies, or prompt optimization. The latter leverages the LLM primarily as a natural language
interface, translating user requirements into formal or symbolic representations that external solvers
can optimize.</p>
      <p>LLM as Planner. Planner-based approaches focus on empowering LLMs to handle the end-to-end
travel planning pipeline, from understanding user constraints to generating detailed itineraries.
Early efforts such as TravelPlanner [<xref ref-type="bibr" rid="ref1">1</xref>] established a benchmark for evaluating an LLM agent’s ability
to use tools and satisfy commonsense and hard constraints. TravelPlanner+ [<xref ref-type="bibr" rid="ref9">9</xref>] extended this with
personalized user models, highlighting the impact of tailoring itineraries to user preferences.
FlexTravelPlanner [<xref ref-type="bibr" rid="ref10">10</xref>] examined the robustness of planning under dynamic and uncertain conditions,
while NATURAL PLAN [<xref ref-type="bibr" rid="ref3">3</xref>] revealed persistent challenges in multi-city, long-duration scenarios despite
providing full task information. Beyond benchmarking, multi-phase planning frameworks [<xref ref-type="bibr" rid="ref11">11</xref>] such
as TDAG [<xref ref-type="bibr" rid="ref12">12</xref>] and HyperTree Planning [<xref ref-type="bibr" rid="ref13">13</xref>] decomposed complex trips into manageable sub-tasks,
improving scalability. Additional work has targeted prompt optimization [<xref ref-type="bibr" rid="ref14 ref15">14, 15</xref>], multi-module agent
designs such as TravelAgent [<xref ref-type="bibr" rid="ref4">4</xref>], and dialogue-driven multi-agent planning [<xref ref-type="bibr" rid="ref16">16</xref>]. Collectively, these
studies advance the ability of LLMs to operate as autonomous planners, but most still face limitations
in robust spatial reasoning and in integrating diverse real-time data streams into the planning loop.</p>
      <p>LLM as Translator. Translator-based approaches shift the focus from direct itinerary generation
to bridging natural language and structured reasoning systems. In these methods, LLMs convert user
queries into machine-interpretable formats, such as symbolic constraint sets, semantic graphs, or
formal planning languages, that are then processed by external solvers. For instance, Hao et al. [<xref ref-type="bibr" rid="ref17">17</xref>]
formulated travel planning as a satisfiability modulo theories (SMT) problem, enabling precise constraint
handling. ItiNera [<xref ref-type="bibr" rid="ref18">18</xref>], TRIP-PAL [<xref ref-type="bibr" rid="ref19">19</xref>], and TTG [<xref ref-type="bibr" rid="ref20">20</xref>] followed similar pipelines, combining LLM-based
parsing with solver-based optimization. ChinaTravel [<xref ref-type="bibr" rid="ref21">21</xref>] contributed an open benchmark for scalable
evaluation of travel planning, focusing on aligning generated plans with real-world travel demands.
This paradigm offers strong guarantees on constraint satisfaction and optimality, but often relies on
static or incomplete knowledge bases, making it less adaptive to dynamic, multi-source inputs and less
capable of leveraging LLMs’ generative flexibility for nuanced user preferences.</p>
    </sec>
    <sec id="sec-3">
      <title>3. Preliminaries</title>
      <p>Definition 1 (Urban Travel Plan). An urban travel plan <italic>P</italic> is a structured itinerary spanning <italic>D</italic>
consecutive days for <italic>N</italic> travelers within an urban environment. It can be represented in a JSON-like format
containing fields such as date, attractions, restaurants, accommodations, and transportation, along with
optional metadata.</p>
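      <p>As an illustration of Definition 1, one day-level entry of such a JSON-like plan might look as follows. This is a hypothetical sketch: all names and prices are illustrative, and the field names mirror the day schema used by TravelAgent.

```python
# Hypothetical one-day entry of an urban travel plan in the JSON-like
# format of Definition 1. All names and prices are illustrative.
day_plan = {
    "date": "2026-05-01",
    "num_people": 2,
    "visit_attractions": ["Attraction A", "Attraction B"],
    "lunch": {"name": "Restaurant X", "cuisines": "Sichuan"},
    "accommodation": {"name": "Hotel Y", "type": "economy"},
    "transportation": {"org-dst": "Attraction A -> Hotel Y"},
    "cost_per_capita": {"tickets": 80, "meals": 60, "hotel": 150},
}

# The per-capita daily cost aggregates the itemized entries.
total = sum(day_plan["cost_per_capita"].values())
```

A multi-day plan is then simply a list of such records, one per consecutive day.</p>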
      <p>Definition 2 (Online Trip Data). Online trip data <italic>D</italic><sub>on</sub> denotes real-time travel information retrieved
from external APIs during planning. It includes attributes of attractions (name, price), restaurants (name,
price, cuisine), and accommodations (name, price, hotel type), providing up-to-date references for generating
feasible and cost-aware itineraries.</p>
      <p>Definition 3 (Offline City Data). Offline city data <italic>D</italic><sub>off</sub> refers to static, city-specific information
collected before planning. It comprises road networks, POI datasets, and tourism-related data (e.g., attractions,
restaurants, hotels) obtained from public sources. This data serves as a persistent knowledge base that
enhances the spatial reasoning and domain knowledge of the underlying LLM.</p>
      <sec id="sec-3-1">
        <title>Problem Statement</title>
        <p>Given a user query <italic>q</italic> in natural language, the goal of urban travel planning is to
generate an itinerary <italic>P</italic> under accessible online data <italic>D</italic><sub>on</sub>:</p>
        <p><italic>P</italic> = ℱ(<italic>q</italic>, <italic>D</italic><sub>on</sub>)</p>
        <p>where ℱ denotes an agentic planner built upon LLMs and augmented with offline city data <italic>D</italic><sub>off</sub>.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. AgentTravel</title>
      <sec id="sec-4-1">
        <title>4.1. TravelLLM</title>
        <p>TravelLLM is a knowledge-augmented large language model tailored for urban travel planning. It equips
the base LLM with two complementary capabilities often missing in general-purpose models: (1) spatial
reasoning over urban environments, including road networks, POI relations, and travel distances; and (2)
domain-specific travel knowledge, such as details about attractions, accommodations, and restaurants.</p>
        <p>We use Qwen 2.5-7B as the backbone model and apply Low-Rank Adaptation (LoRA) for efficient
domain and spatial knowledge injection. The model is fine-tuned on a hybrid corpus that combines two
domain-specific instruction sets: CityInstruction and TripInstruction.</p>
        <sec id="sec-4-1-1">
          <title>4.1.1. CityInstruction: Urban Spatial Knowledge</title>
          <p>CityInstruction focuses on enhancing an LLM’s spatial understanding and reasoning capabilities in
urban contexts. It is built from instruction–response pairs derived from our curated offline city data
<italic>D</italic><sub>off</sub>, covering two primary categories:
• Intersections: mapping intersection names to geographic coordinates (name2coords),
performing reverse lookups from coordinates to names (coords2name), and computing distances between
two intersections (between_distance).
• Points of Interest: linking POI names to their corresponding addresses (name2address) and
categories, enabling the model to recognize and reason about relevant locations.</p>
          <p>These instructions equip the model with fine-grained spatial grounding, facilitating more accurate
reasoning over locations, navigation, and proximity when generating travel itineraries.</p>
        </sec>
        <sec id="sec-4-1-2">
          <title>4.1.2. TripInstruction: Travel-Specific Knowledge</title>
          <p>TripInstruction focuses on travel-specific entities, enriching the model’s understanding of attractions,
accommodations, and restaurants to produce realistic and personalized itineraries. It is also derived
from <italic>D</italic><sub>off</sub> and includes three main categories:
• Attractions: mapping attraction names to their addresses (name2address), ticket information
(name2ticket), and operating hours (name2opentime), allowing the model to recommend
feasible and timely visits.
• Hotels: providing hotel addresses (name2address) and average prices (name2price), enabling
accommodation suggestions that fit budget and location constraints.
• Restaurants: associating restaurant names with their addresses (name2address), prices
(name2price), and cuisine types (name2cuisine), supporting meal planning for users.</p>
          <p>By incorporating these fine-grained attributes, the model gains domain-specific grounding to generate
itineraries that are both factually accurate and preference-aware. To retain broad conversational and
task-following abilities while injecting urban knowledge, we augment the domain-specific instructions
with three open instruction datasets: ShareGPT (https://huggingface.co/datasets/shareAI/ShareGPT-Chinese-English-90k), UltraChat [<xref ref-type="bibr" rid="ref22">22</xref>], and Open-Platypus [<xref ref-type="bibr" rid="ref23">23</xref>]. This hybrid
mix stabilizes the model’s general reasoning and dialogue quality during LoRA fine-tuning, mitigating
over-specialization to the travel domain.</p>
        </sec>
      </sec>
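      <p>The instruction-pair construction described above can be sketched as follows. This is an illustrative reconstruction, not the released pipeline: the helper names (make_name2coords, make_between_distance) and the use of the haversine distance are our assumptions about how such pairs could be generated from coordinate records.

```python
import math

# Illustrative reconstruction of CityInstruction pair generation from
# intersection records. Helper names and the haversine distance are
# assumptions, not the released implementation.

def haversine_km(lon1, lat1, lon2, lat2):
    """Great-circle distance in kilometres between two (lon, lat) points."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dlon = math.radians(lon2 - lon1)
    a = math.sin((p2 - p1) / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlon / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def make_name2coords(name, lon, lat):
    """name2coords pair, in the same style as the appendix examples."""
    return {
        "instruction": "Please provide the geographical coordinates of a given intersection",
        "input": name,
        "output": f"{lon}, {lat}",
    }

def make_between_distance(name_a, coords_a, name_b, coords_b):
    """between_distance pair computed from two (lon, lat) tuples."""
    d = haversine_km(coords_a[0], coords_a[1], coords_b[0], coords_b[1])
    return {
        "instruction": "Please compute the distance between two intersections.",
        "input": f"{name_a}; {name_b}",
        "output": f"about {d:.2f} km",
    }
```

Each generated pair is then serialized into the instruction/input/output format shown in Appendix A.</p>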
      <sec id="sec-4-2">
        <title>4.2. TravelAgent</title>
        <p>TravelAgent is the agentic controller in the framework, responsible for translating user requirements
into concrete, constraint-aware itineraries through real-time interaction with online trip data <italic>D</italic><sub>on</sub>
and the knowledge-enhanced model TravelLLM. It operates through three tightly coupled modules:
a structured memory for state tracking, a domain-specific toolbox for real-time data retrieval, and a
ReAct-style planning loop for interleaved reasoning and action.</p>
        <sec id="sec-4-2-1">
          <title>4.2.1. Structured Memory for State Tracking</title>
          <p>Urban travel planning involves numerous interdependent elements and evolving contextual factors.
TravelAgent maintains a day-by-day structured memory that records itinerary details (e.g., attractions,
meals, accommodations, transportation, and estimated per-capita costs), thus providing a persistent
state for iterative updates as planning progresses. The schema for each day is defined as:
{
  "date": str,
  "num_people": int,
  "visit_attractions": list,
  "breakfast": {"name": str, "cuisines": str},
  "lunch": {"name": str, "cuisines": str},
  "dinner": {"name": str, "cuisines": str},
  "accommodation": {"name": str, "type": str},
  "transportation": {"org-dst": str},
  "cost_per_capita": dict
}</p>
        </sec>
        <sec id="sec-4-2-2">
          <title>4.2.2. Domain-Specific Toolbox</title>
          <p>The domain-specific toolbox is a suite of parameterized functions implemented via JSON-schema-based
calls, enabling TravelAgent to retrieve, filter, and integrate external travel information during itinerary
construction. Each tool serves a specific role in the planning workflow:
• MemoryInit – initializes global trip parameters, such as travel dates and number of travelers,
providing a consistent context for subsequent planning steps.
• AttractionSearch – queries online trip data sources to obtain detailed information about
candidate attractions, including names, locations, and basic attributes.
• NearbyRestaurantSearch – identifies restaurants within a specified radius of a given point
of interest, allowing the integration of geographically coherent dining options.
• NearbyHotelSearch – retrieves available accommodations in the vicinity of a target location,
facilitating proximity-based lodging selection.
• TransportationSearch – returns feasible transportation routes between two locations,
supporting realistic scheduling and connectivity.
• MemoryWrite – updates the structured memory with newly retrieved or revised itinerary
elements, ensuring that intermediate planning states remain accessible for reasoning.
• PlanOutput – compiles the current itinerary state into a coherent, user-facing travel plan
representation.</p>
          <p>By encapsulating external interactions in modular, parameterized tools, the framework can adapt to
diverse data providers, geographic contexts, and planning requirements without altering its core
reasoning and control logic.</p>
        </sec>
      </sec>
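      <p>A minimal sketch of how the structured memory and the MemoryInit / MemoryWrite tools could behave (our assumption, not the released implementation; the function names mirror the tool names above):

```python
# Minimal sketch of the day-by-day structured memory and the
# MemoryInit / MemoryWrite tools. Assumed behavior, not the released code.

def empty_day(date, num_people):
    """One day-level record following the day schema; fields start unset."""
    return {
        "date": date,
        "num_people": num_people,
        "visit_attractions": [],
        "breakfast": None,
        "lunch": None,
        "dinner": None,
        "accommodation": None,
        "transportation": None,
        "cost_per_capita": {},
    }

def memory_init(dates, num_people):
    """MemoryInit: set global trip parameters, one record per travel day."""
    return {d: empty_day(d, num_people) for d in dates}

def memory_write(memory, date, field, value):
    """MemoryWrite: update one field of one day, keeping prior state intact."""
    if field == "visit_attractions":
        memory[date]["visit_attractions"].extend(value)
    else:
        memory[date][field] = value
    return memory
```

Because each write touches only one field of one day, intermediate planning states stay available for later reasoning steps.</p>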
    <sec id="sec-4-3">
      <title>4.2.3. ReAct-Style Planning Loop</title>
      <p>TravelAgent follows a ReAct-style planning paradigm [<xref ref-type="bibr" rid="ref24">24</xref>], interleaving reasoning and tool invocation
in an iterative feedback loop. At each iteration, the agent performs three coordinated steps: (1)
State Interpretation: analyzes the structured memory to evaluate progress and identify missing or
inconsistent elements. (2) Action Selection: decides between internal reasoning (e.g., sequencing
attractions, allocating time slots) and external tool invocation (e.g., querying restaurants, retrieving
routes). (3) State Update: integrates the results of reasoning or retrieved data into the structured
memory, incrementally refining the itinerary state.</p>
    </sec>
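    <p>The three-step loop can be sketched as a generic controller. This is a skeleton under our assumptions: the real agent selects actions with TravelLLM and enforces a richer prompt and stopping rule, and llm_step and the tool registry are illustrative names.

```python
# Schematic ReAct-style control loop: a sketch, not the released agent.
# llm_step and the tool registry are illustrative names.

def react_loop(llm_step, tools, memory, max_steps=20):
    """Interleave reasoning and tool calls until PlanOutput is invoked.

    llm_step(memory) returns a (tool_name, args) action chosen after
    interpreting the current state; tools maps tool names to callables.
    """
    for _ in range(max_steps):
        tool_name, args = llm_step(memory)         # steps (1)+(2): interpret state, select action
        result = tools[tool_name](memory, **args)  # invoke the chosen tool
        if tool_name == "PlanOutput":
            return result                          # compile the final user-facing plan
        memory = result                            # step (3): state update
    return memory
```

Bounding the loop with max_steps also yields the delivery-rate notion used later in evaluation: an itinerary counts as delivered only if PlanOutput is reached within the step budget.</p>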
    <sec id="sec-4-5">
      <title>4.3. TravelBench</title>
      <p>TravelBench is a two-part benchmark designed to evaluate both knowledge grounding and itinerary
quality for LLM-based urban travel planning. Unlike prior evaluations such as TravelPlanner [<xref ref-type="bibr" rid="ref1">1</xref>], it
is built on (1) curated real-world POI and route datasets from major tourist cities, and (2) a unified
framework that jointly assesses factual knowledge of urban entities and the feasibility of multi-day
itineraries under commonsense and user-preference constraints.</p>
    </sec>
    <sec id="sec-4-6">
      <title>4.3.1. KnowEval</title>
      <p>KnowEval assesses an LLM’s capability to retrieve and reason over factual urban knowledge before the
planning stage. It consists of two complementary subsets: CityQA, which focuses on spatial knowledge
such as road networks and general POIs, and TripQA, which targets domain-specific travel entities
including attractions, hotels, and restaurants.</p>
      <p>Each subset is further structured around fine-grained attribute categories derived from the curated
offline dataset <italic>D</italic><sub>off</sub>. Specifically, CityQA covers: (1) Road attributes – OD pairs, connectivity, and
distances; (2) POI attributes – name-to-address mappings. TripQA includes: (1) Attractions – address,
ticket price, and opening hours; (2) Hotels – address and average price; (3) Restaurants – address, average
price, and cuisine tags.</p>
      <p>We convert each knowledge item into a multiple-choice question (MCQ), automatically generated
by GPT-4o-mini from <italic>D</italic><sub>off</sub> and validated by human annotators for factual accuracy and clarity. All
question text is presented in Chinese to maintain fidelity with real-world POI names and descriptions,
but the underlying methodology is language-agnostic and can be readily applied to other languages
or regions by replacing the source datasets. This ensures that the evaluation is grounded in authentic
curated resources while remaining broadly extensible.</p>
    </sec>
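    <p>The MCQ construction step can be sketched as follows. In TravelBench the question text is drafted by GPT-4o-mini and human-validated, so the distractor sampling below (build_mcq, distractor_pool) is only a hypothetical stand-in for that step.

```python
import random

# Sketch of converting one curated knowledge item into a four-option MCQ.
# Distractors are sampled from sibling records of the same attribute type;
# this sampling is an assumption, not the released generation pipeline.

def build_mcq(question, answer, distractor_pool, rng):
    """Return an MCQ dict with shuffled options and the correct letter."""
    candidates = [d for d in distractor_pool if d != answer]
    distractors = rng.sample(candidates, 3)
    options = distractors + [answer]
    rng.shuffle(options)
    letters = "ABCD"
    return {
        "question": question,
        "options": options,
        "answer": letters[options.index(answer)],
    }
```

Sampling distractors from records of the same attribute type (e.g., other hotel prices) keeps the wrong options plausible, as in the Appendix B examples.</p>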
    <sec id="sec-4-7">
      <title>4.3.2. TripEval</title>
      <p>TripEval evaluates the feasibility and personalization quality of travel plans generated by LLM-based
agents. It operates on the structured memory produced by the agent and applies a suite of rule-based
validators that cross-reference curated POI databases and real-time transportation APIs. The evaluation
metrics are grouped into two major categories, as summarized in Table 1.</p>
      <p>Table 1: TripEval evaluation constraints.
Commonsense Constraints:
• Valid Fields – All required fields in the travel plan are populated.
• Valid Days – The number of planned days matches the requested trip length.
• Valid Attractions – Every listed attraction is real and publicly accessible.
• Valid Restaurants – Every listed restaurant is real and currently operating.
• Valid Accommodations – All accommodations are valid and bookable.
• Available Transportation – Transportation between locations is feasible.
• No Repeated Attractions – No attraction is visited more than once.
• No Repeated Restaurants – No restaurant is visited more than once.
Preference Constraints:
• Reasonable Budget – The total cost remains within the user-specified budget.
• Favorite Cuisine – The itinerary includes the user’s preferred cuisines.
• Preferred Hotel Type – Accommodation matches the specified hotel category.</p>
    </sec>
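    <p>A few of the Table 1 checks can be sketched as pure rule validators (assumed logic; the released TripEval additionally cross-references curated POI databases and transportation APIs):

```python
# Rule-check sketches for three Table 1 validators. The plan layout
# ("days" holding day-level records) follows the structured memory schema;
# the exact validator logic is our assumption.

def valid_days(plan, requested_days):
    """Valid Days: the planned length matches the requested trip length."""
    return len(plan["days"]) == requested_days

def no_repeated_attractions(plan):
    """No Repeated Attractions: each attraction appears at most once."""
    seen = []
    for day in plan["days"]:
        for name in day["visit_attractions"]:
            if name in seen:
                return False
            seen.append(name)
    return True

def reasonable_budget(plan, budget_per_capita):
    """Reasonable Budget: total per-capita cost stays within the budget."""
    total = sum(sum(day["cost_per_capita"].values()) for day in plan["days"])
    return budget_per_capita >= total
```

A plan passes the commonsense category only if every such validator returns true; the preference category is checked the same way against the user query.</p>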
    </sec>
    <sec id="sec-5">
      <title>5. Experiments</title>
      <sec id="sec-5-1">
        <title>5.1. Settings</title>
        <sec id="sec-5-2">
          <title>5.1.1. City &amp; Trip Datasets</title>
          <p>We construct the datasets from five representative tourist cities in China: Beijing, Shanghai, Guangzhou,
Chengdu, and Xi’an. These cities were selected for their rich cultural heritage, diverse urban layouts, and
high tourist activity, making them ideal testbeds for evaluating urban travel planning systems.</p>
          <p>The city-level data is sourced from OpenStreetMap (https://www.openstreetmap.org/) and Amap
(https://lbs.amap.com/), covering road networks and POIs. The trip-level data comes from Ctrip
(https://ctrip.com/), including attractions, accommodations, and restaurants with rich attributes such
as prices, operating hours, and category labels. Table 2 summarizes the dataset statistics per city: the
number of roads, intersections, and POIs (city data) and the number of attractions, hotels, and
restaurants (trip data). All data is in Chinese to match real-world place names and descriptions, but this
does not impact the generality of our approach. The framework and evaluation pipeline are
language-agnostic and can be applied to other languages or cities.</p>
        </sec>
      </sec>
      <sec id="sec-5-3">
        <title>5.1.2. Query Generation</title>
        <p>To simulate realistic and diverse user requests for itinerary planning, we develop an automated pipeline
that generates natural-language queries paired with structured JSON representations. Given a target
city and difficulty level, the generator samples key trip parameters (duration, number of travelers, start
date, and budget) through controlled randomization. Budgets are derived from a per-capita-per-day
baseline cost and adjusted by multiplicative factors for different hotel categories, ensuring internal
consistency across trip attributes.</p>
        <p>Preference constraints are injected in three tiers: (1) No preference – budget constraint only; (2)
Single preference – one hotel category or one to three preferred cuisines; (3) Combined preferences –
both hotel category and multiple cuisines. We generate 100 queries per city across difficulty levels and
prompt GPT-4o-mini to produce a fluent, user-like query for each.</p>
        <sec id="sec-5-3-3">
          <title>5.1.3. Metrics</title>
          <p>We evaluate model performance using five complementary metrics: Delivery Rate (DR) – the percentage
of itineraries successfully completed within the allowed number of reasoning and tool-invocation
steps; Commonsense Pass Rate (CPR) – the proportion of itineraries satisfying all commonsense
constraints defined in TripEval (e.g., valid POIs, non-repetition, feasible transportation); Preference
Pass Rate (PPR) – the proportion satisfying all user-specified preference constraints (e.g., budget,
cuisine, accommodation type); Final Pass Rate (FPR) – the percentage of itineraries simultaneously
meeting both commonsense and preference constraints; Accuracy (ACC) – the fraction of correctly
answered multiple-choice questions in KnowEval, reflecting factual and spatial knowledge grounding.</p>
        </sec>
      </sec>
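      <p>Given per-itinerary boolean flags produced by the TripEval validators, the four pass-rate metrics reduce to simple averages (a sketch; pass_rates and the flag field names are our own naming):

```python
# Aggregating TripEval pass rates from per-itinerary boolean flags.
# pass_rates and the flag field names are illustrative naming.

def pass_rates(results):
    """results: one dict per generated itinerary, with boolean
    'delivered', 'commonsense', and 'preference' flags."""
    n = len(results)
    dr = sum(r["delivered"] for r in results) / n
    cpr = sum(r["commonsense"] for r in results) / n
    ppr = sum(r["preference"] for r in results) / n
    # FPR counts itineraries meeting commonsense AND preference constraints,
    # so it can never exceed min(CPR, PPR).
    fpr = sum(r["commonsense"] and r["preference"] for r in results) / n
    return {"DR": dr, "CPR": cpr, "PPR": ppr, "FPR": fpr}
```

ACC is computed separately on KnowEval as the fraction of correctly answered MCQs.</p>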
      <sec id="sec-5-6">
        <title>5.2. Results</title>
        <p>We evaluate AgentTravel against several competitive LLM baselines on both KnowEval and TripEval.
To ensure a fair and controlled comparison, all models operate within the same TravelAgent planning
framework, sharing an identical prompting template, structured memory schema, ReAct-style reasoning
loop, and domain-specific toolbox.</p>
        <p>Table 3 reports results on CityQA and TripQA across five cities. TravelLLM ranks first or second
in nearly all cases, showing the best overall balance. On TripQA, TravelLLM achieves the highest
scores in Beijing, Chengdu, and Xi’an, and competitive results in Shanghai and Guangzhou. These
gains confirm that domain-specific fine-tuning improves factual recall and reasoning on travel entities.
On CityQA, GPT-4o-mini leads in Beijing, Shanghai, and Chengdu, while TravelLLM performs better
in Guangzhou and Xi’an. This shows that city-level adaptation can match or surpass larger models in
localized spatial reasoning.</p>
        <p>Table 4 reports delivery rate (DR), commonsense pass rate (CPR), preference pass rate (PPR), and
final pass rate (FPR) across five cities. AgentTravel achieves near-perfect delivery (DR ≥ 0.98) across all
settings, indicating strong execution stability. GPT-4o-mini performs best on commonsense reasoning,
while AgentTravel remains competitive in Beijing and Xi’an, outperforming other open models. On
personalization, performance is moderate but consistent, slightly below Qwen and GLM in some cities.
Notably, AgentTravel attains the highest FPR in four cities, reflecting improved overall feasibility.</p>
      </sec>
      <p>Despite these advances, LLM-based travel planning remains challenging. Our results suggest that
integrating knowledge-grounded reasoning with structured memory offers a promising path toward
more reliable and adaptive LLM planners.</p>
    </sec>
    <sec id="sec-6">
      <title>6. Conclusion</title>
      <p>This paper introduced AgentTravel, a unified framework for LLM-based urban travel planning, combining
knowledge-grounded modeling, agentic reasoning, and multi-perspective evaluation. Experiments
across five Chinese cities show that domain- and city-specific fine-tuning strengthens factual reasoning,
while structured agentic planning improves itinerary feasibility. Despite these gains, LLM-based travel
planning remains a challenging task, requiring better commonsense reasoning, preference alignment,
and adaptability to real-world data.</p>
    </sec>
    <sec id="sec-7">
      <title>Declaration on Generative AI</title>
      <p>During the preparation of this work, the authors used ChatGPT-4 for grammar and spelling checking
and for paraphrasing and rewording. After using this tool, the authors reviewed and edited the content
as needed and take full responsibility for the publication’s content.</p>
    </sec>
    <sec id="sec-8">
      <title>A. CityInstruction &amp; TripInstruction Examples</title>
      <p>Example (Intersection-name2coords):
{
  "instruction": "Please provide the geographical coordinates of a given intersection",
  "input": "Zhouzhang Road and Fangyi Road Intersection",
  "output": "115.6906259, 39.5750395"
}
Example (POI-name2address):
{
  "instruction": "Please provide the address of a given Point of Interest.",
  "input": "Sanyuan Ecological Park in Beijing",
  "output": "No. 8, Xiaoyunli, Sanyuan Park, Taiyanggong Township, Chaoyang District"
}
Example (Attractions-name2ticket):
{
  "instruction": "Please tell me the ticket price of a given attraction.",
  "input": "Old Summer Palace in Beijing",
  "output": "The ticket price for the Old Summer Palace in Beijing is 10 CNY."
}
Example (Restaurants-name2cuisine):</p>
    </sec>
    <sec id="sec-9">
      <title>B. CityQA &amp; TripQA Examples</title>
      <p>CityQA Example (Road Connectivity):
Q: Which road is directly connected to Yunquan Road?
A. Haidian South Road B. Zhiquan Road C. Yuanboyuan South Road D. Zhaitang Street
Answer: B</p>
      <p>TripQA Example (Hotel Price):
Q: What is the average per-capita price of Lavande Hotel (Beijing Headquarters Base)?
A. 285 CNY B. 699 CNY C. 535 CNY D. 313 CNY
Answer: D</p>
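<p>Items of this shape can be assembled mechanically from a correct answer plus distractor candidates. A minimal sketch (the shuffling and letter assignment are our illustration of the format, not the paper's generation code):</p>

```python
import random

def make_mc_item(question, correct, distractors, seed=0):
    """Assemble a four-option multiple-choice item and record the answer letter."""
    options = [correct] + list(distractors)[:3]
    random.Random(seed).shuffle(options)  # deterministic option order for a fixed seed
    letters = "ABCD"
    return {
        "question": question,
        "options": dict(zip(letters, options)),
        "answer": letters[options.index(correct)],
    }

item = make_mc_item(
    "Which road is directly connected to Yunquan Road?",
    "Zhiquan Road",
    ["Haidian South Road", "Yuanboyuan South Road", "Zhaitang Street"],
)
```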
    </sec>
    <sec id="sec-10">
      <title>C. Prompts</title>
      <p>C.1. ReAct Planning Prompt
You are a travel planning assistant. Your task is to help users create detailed daily travel itineraries (in Chinese) by strictly following the instructions below.
### Responsibilities
1. Understand user requirements: Accurately extract travel start/end dates, number of people, budget, preferences, etc.
2. Retrieve information using tools: Use designated tools to gather data on attractions, restaurants, accommodations, and transportation.
3. Preliminary setup: Before starting the planning task, use `MemoryInit` to initialize the memory and set up essential information such as travel dates and group size.
4. Timely record-keeping: Each time a restaurant, accommodation, or transportation item is obtained, immediately write it to the memory using `MemoryWrite`.
5. Step-by-step itinerary construction: First determine the full list of attractions to be visited across the trip. Then, collect and record restaurants, accommodations, and transportation information on a day-by-day basis.
### Task Execution Flow
#### Phase 1: Plan Attractions Across the Entire Trip
1. Use `MemoryInit` to initialize the memory with travel dates and number of people.
2. Call `AttractionSearch` to retrieve information about attractions in the target city.
3. Select appropriate attractions and assign them to each day in a balanced manner (avoid overcrowded schedules).
4. Use `MemoryWrite` to record the attractions for Day 1. Repeat this for each day until all attractions have been assigned and recorded.
#### Phase 2: Daily Information Collection and Logging
For each day, perform the following steps in sequence:
1. Call `NearbyRestaurantSearch` to obtain breakfast options.
2. Write the breakfast information to the memory using `MemoryWrite`.
3. Repeat the above two steps for lunch.
4. Repeat the above two steps for dinner.
5. Call `NearbyHotelSearch` to find accommodation near the day's attractions.
6. Record accommodation details with `MemoryWrite`.
7. Call `TransportationSearch` to get transportation plans between all visited attractions for the day.
8. Log the transportation details using `MemoryWrite`.
### Using the Thought-Action-Observation Loop
- Thought: Express your current reasoning in natural language. Do not include any tool calls in this phase.
- Action: Based on your thought, invoke the appropriate tool using valid parameters. Use the system’s function-calling mechanism where possible.
- Observation: Examine the tool's output and use it to guide the next thought.
### Important Guidelines
- Do not use attraction/restaurant/hotel names unless they come from the results returned by `AttractionSearch` or `NearbySearch`.
- Each piece of information must be collected and recorded **independently**; merging multiple tasks is not allowed.
- To avoid forgetting data, each collected item must be immediately written to the memory using `MemoryWrite`.
- For days with multiple attractions, transportation between each pair must be queried and written separately.
- Each Action phase should involve only one tool invocation for a single specific task. Multiple tool uses in one action are not allowed.
- After all daily information has been recorded in the memory, call `PlanOutput` to generate the final complete travel plan.</p>
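<p>The memory tools named in the prompt can be backed by a simple day-indexed store. A minimal sketch of what `MemoryInit`, `MemoryWrite`, and `PlanOutput` might look like (the internal structure and slot names are our illustration, not the paper's implementation):</p>

```python
import json
from collections import defaultdict

class ItineraryMemory:
    """Day-indexed structured memory behind the MemoryInit/MemoryWrite/PlanOutput tools."""

    def memory_init(self, start_date, num_days, num_people):
        # MemoryInit: store trip-level metadata and an empty per-day store.
        self.meta = {"start_date": start_date, "num_days": num_days,
                     "num_people": num_people}
        self.days = defaultdict(dict)  # day number -> slot -> recorded value

    def memory_write(self, day, slot, value):
        # MemoryWrite: slots follow the prompt's categories, e.g. "attractions",
        # "breakfast", "lunch", "dinner", "accommodation", "transportation".
        self.days[day][slot] = value

    def plan_output(self):
        # PlanOutput: serialize the accumulated itinerary as JSON.
        plan = {"meta": self.meta,
                "days": {d: self.days[d] for d in sorted(self.days)}}
        return json.dumps(plan, ensure_ascii=False, indent=2)

mem = ItineraryMemory()
mem.memory_init("2025-07-20", 1, 2)
mem.memory_write(1, "attractions", ["Summer Palace", "Palace Museum"])
mem.memory_write(1, "breakfast", {"name": "Palace Museum Restaurant"})
```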
      <p>C.2. Knowledge Evaluation Prompt
Here is a multiple-choice question related to urban travel knowledge. You need to choose the most appropriate answer from A, B, C, and D. Please output only the letter corresponding to the correct answer, with no additional content.</p>
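<p>Under this prompt, scoring reduces to extracting the answer letter from the model's reply and comparing it to the gold label. A minimal sketch (the lenient regex fallback for non-compliant replies is our assumption):</p>

```python
import re

def extract_choice(reply):
    """Return the first standalone A-D letter in a model reply, or None."""
    m = re.search(r"\b([ABCD])\b", reply)
    return m.group(1) if m else None

def accuracy(replies, golds):
    """Fraction of replies whose extracted letter matches the gold label."""
    hits = sum(extract_choice(r) == g for r, g in zip(replies, golds))
    return hits / len(golds)

# Replies that ignore the "letter only" instruction are still handled.
acc = accuracy(["B", "The answer is D.", "unsure"], ["B", "D", "A"])
```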
    </sec>
    <sec id="sec-11">
      <title>D. Example Query and Generated Plan</title>
      <sec id="sec-11-1">
        <p>Below is an example of a user query in English and the corresponding structured travel plan produced by our system.</p>
        <p>User Query
I would like a 1-day travel plan in Beijing for 2 people, starting on July 20, 2025, with a budget of around 2,200 CNY.</p>
        <p>Generated Plan
{
  "date": "2025-07-20",
  "num_people": 2,
  "visit_attractions": [
    "Summer Palace",
    "Palace Museum",
    "Temple of Heaven"
  ],
  "breakfast": {
    "name": "Palace Museum Restaurant",
    "cuisines": "Chinese"
  },
  "lunch": {
    "name": "Tingliguan Restaurant (Summer Palace Branch)",
    "cuisines": "Chinese"
  },
  "dinner": {
    "name": "Donglaishun Restaurant (Temple of Heaven Branch)",
    "cuisines": "Beijing Cuisine"
  },
  "accommodation": {
    "name": "Atour Light Hotel Beijing Qianmen Temple of Heaven",
    "type": "Comfort"
  },
  "transportation": {
    "Summer Palace → Palace Museum": "From the Summer Palace, walk 791 meters to ...",
    "Palace Museum → Temple of Heaven": "From the Palace Museum, walk 870 meters to ..."
  },
  "cost_per_capita": {
    "Palace Museum": 60,
    "Summer Palace": 30,
    "Temple of Heaven": 10,
    "breakfast": 86,
    "lunch": 153,
    "dinner": 147,
    "accommodation": 300,
    "transit": 8.0
  }
}</p>
      </sec>
    </sec>
  </body>
  <back>
  </back>
</article>