<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Extended Travel Itinerary Datasets Towards Reproducibility</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Keisuke Otaki</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Yukino Baba</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>The University of Tokyo</institution>
          ,
          <addr-line>3-8-1 Komaba, Meguro-ku, Tokyo, 1538902</addr-line>
          ,
          <country country="JP">Japan</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Toyota Central R&amp;D Labs., Inc.</institution>
          ,
          <addr-line>Koraku Mori Building 10F, 1-4-14 Koraku, Bunkyo-ku, Tokyo, 1120004</addr-line>
          ,
          <country country="JP">Japan</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Recommending attractive travel itineraries to tourists is a promising application. A well-known drawback of existing research is the lack of public data for evaluating new recommendation methods and discussing the applicability of recommender systems in real-world applications. We aim to create reproducible environments for travel itinerary recommendation tasks. This paper demonstrates our re-implemented method for constructing travel log data based on Flickr metadata and predefined POI information. We also test our re-implemented baseline algorithms previously proposed and compare the results with existing work. Our results could contribute to the reproducibility of travel recommendation tasks.</p>
      </abstract>
      <kwd-group>
        <kwd>travel itinerary recommendation</kwd>
        <kwd>reproducibility</kwd>
        <kwd>data augmentation</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>cities. When they need to customize existing algorithms for region-specific reasons, evaluating
them using benchmark datasets is a common approach for validation.</p>
      <p>Rich travel log data are currently available for only a few cities, and these are the data adopted
in previous work. We therefore try to augment rich travel log data for any city that researchers want
to study and discuss. As a data source, we revisit geo-tagged photos, such as the Flickr photos collected
in the public dataset YFCC100m [14]. Many researchers have adopted this type of data source because of
data availability. For example, Lim et al. [15] followed the protocol of Choudhury et
al. [16], using geo-tagged photos to generate travel log data. They released their datasets, which
consist of POI information, trajectories, and (travel) cost information for selected cities<sup>1</sup>. In
contrast to such open datasets, few datasets constructed from other data sources are accessible.
For example, LBSN-based data collection used to be common (examples are in [17, 18, 19]);
however, recent API updates have made it difficult for researchers to collect check-in data from LBSN
services, and researchers have not adopted this data source recently. Similarly, publicly available
GPS trajectories are generally limited because of privacy issues. This background leads us to focus
again on geo-tagged media to extend travel log data, as mentioned in [8].</p>
      <p>Our contributions are summarized below.</p>
      <p>• We re-implement and publicly release the procedure for mining travel logs from geo-tagged photos,
following existing studies [16, 15], to process the public Flickr dataset YFCC100m [14].</p>
      <p>• We reproduce two existing baseline solvers (named Popularity and MarkovPath) from
Chen et al. [7], together with utility tools for extended travel log data.</p>
      <p>• We provide computational experiments that (1) reproduce existing results to validate our
approach, and (2) extend numerical evaluations to different cities for demonstration.</p>
      <p>The contributions and materials are available in the public repository
https://zenodo.org/record/8314376 (DOI: 10.5281/zenodo.8314376), which we will maintain and update
incrementally for research purposes.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Travel Log Data: Review, Re-produce, and Augmentation</title>
      <p>
        This section explains our re-implemented procedure to extend travel log datasets. Figure 1
illustrates our data flow. To prepare travel log data, we require the following components:
(1) YFCC100m metadata ([14], top left in Fig. 1), (2) a YFCC100m metadata downloader
(bottom left in Fig. 1) and a procedure to preprocess the downloaded metadata, (3) external POI
information to define the target area (top right in Fig. 1), and (4) a travel log data generator based on
(1)–(3) (right side in Fig. 1). Because various geo-tagged data sources exist (e.g., photos and tweets),
many researchers could start their studies by following this methodology.
      </p>
      <p>Here, we review public datasets in Sec. 2.1. We explain how to re-produce them in Sec. 2.2.
We then try to generate existing data using our re-implemented procedure and extend travel
log data for new cities in Sec. 2.3.</p>
      <p>[Figure 1: data flow for generating travel log data; recoverable label: Flickr metadata (YFCC100m metadata).]</p>
      <sec id="sec-2-1">
        <title>2.1. Review on Existing Datasets</title>
        <p><sup>1</sup> https://sites.google.com/site/limkwanhui/datacode#ijcai15 (access confirmed on Feb 16, 2023).</p>
        <p>process to generate the data seem to be similar to those in C16 and I15. All datasets contain
POI information and travel logs (see also Sec. 2.2). The POI information is mandatory in the
above formats, and trajectory information is stored in different formats.</p>
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Reproduce Existing Datasets for Validation</title>
        <p>We review how to reproduce travel log data by following the procedure used in Choudhury et
al. [16] and Lim et al. [15].</p>
        <sec id="sec-2-2-1">
          <title>2.2.1. How existing data are produced</title>
          <p>The public itinerary data are generated from YFCC100m [14], as shown in Fig. 1. The existing procedure
converts photo streams in YFCC100m [14], together with the given POI information, into travel log
data. Reproducing this process is therefore valuable for augmenting travel log
data for recommendation.</p>
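          <p>Each POI <inline-formula><tex-math>p</tex-math></inline-formula> in a given set <inline-formula><tex-math>\mathcal{P}</tex-math></inline-formula> of POIs has attributes representing its information (e.g., latitude,
longitude, name, category, and opening time). We assume that the set <inline-formula><tex-math>\mathcal{P}</tex-math></inline-formula> is prepared by
researchers. In Lim et al. [15], the authors defined the sets of POIs for each city themselves after
traversing Wikipedia links.</p>
          <p>For a user <inline-formula><tex-math>u</tex-math></inline-formula>, let the photo stream recorded on the Flickr website (and collected in the YFCC100m
dataset) be <inline-formula><tex-math>S_u := \langle (p_1, t^a_1, t^d_1), \ldots, (p_n, t^a_n, t^d_n) \rangle</tex-math></inline-formula>, where <inline-formula><tex-math>(p_i, t^a_i, t^d_i)</tex-math></inline-formula> for <inline-formula><tex-math>1 \leq i \leq n</tex-math></inline-formula> means
that the user visited POI <inline-formula><tex-math>p_i</tex-math></inline-formula> with arrival time <inline-formula><tex-math>t^a_i</tex-math></inline-formula> and departure time <inline-formula><tex-math>t^d_i</tex-math></inline-formula>. We first generate travel
sequences from <inline-formula><tex-math>S_u</tex-math></inline-formula> for each user <inline-formula><tex-math>u</tex-math></inline-formula> by splitting <inline-formula><tex-math>S_u</tex-math></inline-formula> wherever the gap between consecutive visits exceeds a threshold <inline-formula><tex-math>\tau</tex-math></inline-formula>, that is, <inline-formula><tex-math>t^a_{i+1} - t^d_i &gt; \tau</tex-math></inline-formula>. In the existing
work of [16] and [15], <inline-formula><tex-math>\tau = 8</tex-math></inline-formula> hours. After obtaining travel sequences from all users in the target
subset of YFCC100m, we can build the historical data as in I15.</p>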
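          <p>The splitting rule above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the released implementation: the record layout (POI identifier, arrival time, departure time), the function name split_into_sequences, and the toy visits are all hypothetical. Only the 8-hour threshold comes from [16] and [15].</p>
          <preformat>
```python
from datetime import datetime, timedelta

# Threshold reported for [16] and [15]; everything else here is illustrative.
TAU = timedelta(hours=8)


def split_into_sequences(visits, tau=TAU):
    """Split one user's time-ordered POI visits into travel sequences.

    Each visit is a hypothetical tuple (poi_id, arrival, departure).
    A new sequence starts whenever the gap between the previous departure
    and the next arrival exceeds tau.
    """
    sequences = []
    current = []
    for visit in visits:
        if current:
            previous_departure = current[-1][2]
            arrival = visit[1]
            if arrival - previous_departure > tau:
                sequences.append(current)
                current = []
        current.append(visit)
    if current:
        sequences.append(current)
    return sequences


# Toy photo stream of a single user (made-up values).
visits = [
    ("poi_a", datetime(2015, 5, 1, 9, 0), datetime(2015, 5, 1, 9, 30)),
    ("poi_b", datetime(2015, 5, 1, 11, 0), datetime(2015, 5, 1, 11, 20)),
    ("poi_c", datetime(2015, 5, 2, 10, 0), datetime(2015, 5, 2, 10, 45)),
]
sequences = split_into_sequences(visits)
print([[poi for poi, _, _ in seq] for seq in sequences])
```
          </preformat>
          <p>In this toy stream, the first two visits are 1.5 hours apart and stay in one sequence, whereas the third visit starts more than 8 hours after the second one ends and therefore opens a new sequence.</p>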
        </sec>
        <sec id="sec-2-2-2">
          <title>2.2.2. Results of our re-implementation for public datasets</title>
          <p>In this paper, we carefully re-implemented the above procedure so that it can replicate the
generation of the public travel log data. To assess our re-implementation, we apply our
procedure to YFCC100m using the POIs pre-defined by Lim et al. [15] for Toronto, Osaka,
Glasgow, and Edinburgh. In our evaluation, we compare the datasets re-produced
for these four cities with the corresponding public datasets.</p>
          <p>Table 2 shows the validation results. In each segment, the upper row corresponds to the original
data (public data by [15] and [7]) from Table 1, and the lower row shows our re-produced data.
Note that the public and re-produced results resemble each other. The results in Table 2 confirm the validity of our
re-implemented procedure for trajectory mining using YFCC100m data.</p>
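          <p>The kind of comparison summarized in Table 2 can be sketched as follows. This is only an illustration: the layout of a travel sequence as a (user, list of POIs) pair, the helper names summarize and relative_gap, and the definition of a POI visit as one POI entry in a sequence are assumptions made for this example, not a description of the released tools.</p>
          <preformat>
```python
def summarize(sequences):
    """Count users, POI visits, and travel sequences for one city.

    Each travel sequence is a hypothetical pair (user_id, [poi_id, ...]).
    """
    users = {user_id for user_id, _ in sequences}
    poi_visits = sum(len(pois) for _, pois in sequences)
    return {
        "users": len(users),
        "poi_visits": poi_visits,
        "travel_sequences": len(sequences),
    }


def relative_gap(public_value, reproduced_value):
    """Relative difference of a re-produced count from the public count."""
    return abs(reproduced_value - public_value) / public_value


# Toy data (made-up values) to show the counting.
toy = [("u1", ["a", "b"]), ("u1", ["c"]), ("u2", ["a", "c", "d"])]
stats = summarize(toy)
print(stats)

# Relative gap for the numbers of users in Toronto reported in Table 2.
print(relative_gap(1395, 1397))
```
          </preformat>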
        </sec>
      </sec>
      <sec id="sec-2-3">
        <title>2.3. Generate Extended Datasets</title>
        <p>We can now apply our procedure to generate extended travel log datasets. To demonstrate
how researchers can generate their own travel datasets for research purposes, we show how to
prepare POI information in three ways.</p>
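        <table-wrap>
          <label>Table 2</label>
          <table>
            <thead>
              <tr><th>City</th><th># Photos</th><th># Users</th><th># POI Visits</th><th># Travel Sequences</th></tr>
            </thead>
            <tbody>
              <tr><td>Toronto (public)</td><td>157,505</td><td>1,395</td><td>39,419</td><td>6,057</td></tr>
              <tr><td>Toronto (ours)</td><td /><td>1,397</td><td>39,479</td><td>6,066</td></tr>
              <tr><td>Osaka (public)</td><td>392,420</td><td>450</td><td>7,747</td><td>1,115</td></tr>
              <tr><td>Osaka (ours, bug fix)</td><td /><td>394</td><td>7,400</td><td>981</td></tr>
              <tr><td>Glasgow (public)</td><td>29,019</td><td>601</td><td>11,434</td><td>2,227</td></tr>
              <tr><td>Glasgow (ours)</td><td /><td>601</td><td>11,440</td><td>2,230</td></tr>
              <tr><td>Edinburgh (public)</td><td>82,060</td><td>1,454</td><td>33,944</td><td>5,028</td></tr>
              <tr><td>Edinburgh (ours)</td><td /><td>1,454</td><td>33,952</td><td>5,029</td></tr>
            </tbody>
          </table>
        </table-wrap>
        <p>[Figure 2 panels: (a) Handcrafted POIs; (b) Re-produced POI visits; (c) Re-produced timestamps.]</p>
        <p>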
(POI Ext 1) We handcraft POI information, as in the public datasets, for Kyoto, Japan, which is next to
Osaka. By comparing the datasets produced for the two cities, we can evaluate how our
implementation works and whether the method can be applied to different cities. We
explain this approach in Sec. 2.3.1.</p>
        <p>(POI Ext 2) We also demonstrate how to build POI information from websites. Following
existing studies, we use LonelyPlanet and the Google Maps API to collect POIs and their
latitudes and longitudes, and then use our re-implemented procedure to produce extended datasets
for selected cities, as explained in Sec. 2.3.2.</p>
        <sec id="sec-2-3-1">
          <title>2.3.1. Extended Datasets Using Handcrafted POIs (POI Ext1)</title>
          <p>We handcraft POI information for Kyoto, Japan, where unique identifiers and locations (i.e.,
latitudes and longitudes) are mandatory. Figure 2 illustrates the results re-produced with our POIs (shown in
Fig. 2a). Counting the metrics corresponding to Table 2, the re-produced Kyoto dataset contains
631 users, 17,072 POI visits, and 1,670 travel sequences. These numbers are comparable in scale to those of
Osaka and Glasgow, so we can confirm that the re-implemented procedure can be applied to
other cities with handcrafted POI information.</p>
        </sec>
        <sec id="sec-2-3-2">
          <title>2.3.2. Extended Datasets Using Collected POIs (POI Ext 2)</title>
          <p>For our reproduced experiments in Sec. 3, we generate extended
datasets for 11 selected cities in Japan to show that travel itinerary
recommendation tasks can be reproduced in any city that researchers want to study.</p>
          <p>We prepare POI information as follows. First, we manually extracted POI names from the pages on
cities in Japan on the LonelyPlanet website<sup>2</sup>. Second, we manually obtained the categories
of the POIs and attached them to the POIs. Third, we obtained the location (i.e., latitude
and longitude) of each POI via the Google Maps API<sup>3</sup>. After filtering out POIs whose
locations were outliers, we completed the POI information for the 11 selected cities, which can be used for
generating extended travel log data. Table 3 summarizes the re-produced data. These
results confirm that our procedure can extend travel log datasets to target areas.</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Experiments and Evaluation</title>
      <p>This section tries to reproduce existing experimental results using our re-implementations
and to explore several extended datasets by providing additional experimental results on the
performance of previously developed methods (Popularity and MarkovPath).</p>
      <sec id="sec-3-1">
        <title>3.1. Re-implemented baselines</title>
        <p>In this paper, we adopted the following baseline methods.</p>
        <p>• Popularity: A solver recommending an itinerary with the largest popularity score.</p>
        <p>• MarkovPath: A solver that uses a factorized transition matrix between POIs to construct
an itinerary, following Chen et al. [7] and based on their public repository.</p>
        <p>Note that we do not include deep learning-based methods (e.g., [10, 11]) because (1) the details of
their architectures or learning parameters are not clearly explained, and (2) ensuring the
reproducibility of deep learning-based methods in recommender systems is known to be challenging [22].
We want to resolve this issue in our future work.</p>
        <p><sup>2</sup> https://www.lonelyplanet.com/</p>
        <p><sup>3</sup> https://developers.google.com/maps?hl=en</p>
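        <p>To make the idea behind the Popularity baseline concrete, the following is a minimal sketch and not the re-implemented solver itself. It assumes that popularity is the number of distinct users who visited a POI and that a query gives a start POI, an end POI, and a desired itinerary length. These assumptions, the function names, and the toy data are hypothetical and introduced only for illustration.</p>
        <preformat>
```python
def poi_popularity(sequences):
    """Count, for each POI, the number of distinct users who visited it.

    Each travel sequence is a hypothetical pair (user_id, [poi_id, ...]).
    """
    visitors = {}
    for user_id, pois in sequences:
        for poi in pois:
            visitors.setdefault(poi, set()).add(user_id)
    return {poi: len(users) for poi, users in visitors.items()}


def recommend_by_popularity(popularity, start, end, length):
    """Return start, then the most popular other POIs, then end."""
    # Sort by decreasing popularity; break ties by POI identifier.
    ranked = sorted(popularity.items(), key=lambda item: (-item[1], item[0]))
    candidates = [poi for poi, _ in ranked if poi not in (start, end)]
    middle = candidates[: max(length - 2, 0)]
    return [start] + middle + [end]


# Toy training data (made-up values).
train = [
    ("u1", ["a", "b", "c"]),
    ("u2", ["b", "c"]),
    ("u3", ["c", "d"]),
    ("u4", ["b", "e"]),
]
popularity = poi_popularity(train)
itinerary = recommend_by_popularity(popularity, start="a", end="d", length=4)
print(itinerary)
```
        </preformat>
        <p>The sketch ignores travel costs and visiting order among the selected POIs, which a full solver would need to handle.</p>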
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Experimental Setups</title>
        <p>In our re-produced experiments to assess reproducibility, we evaluate the following aspects
using (A) re-produced public datasets and (B) extended datasets.</p>
        <p>(Q1) Whether our baseline procedures can generate itineraries whose results are similar
to those previously reported, using public and replicated data (A).</p>
        <p>(Q2) Whether our replicated data (A) in Table 2 for the existing four cities are similar
enough to the public data with respect to the itineraries generated by the baselines.</p>
        <p>(Q3) Whether the previously reported results can be expected to generalize to
other areas (i.e., the 11 city datasets (B) in Japan from Sec. 2.3.2).</p>
        <p>We herein adopt F1 and pairs F1 scores, used in Chen et al. [7] to evaluate itineraries.</p>
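        <p>As a reminder of how these two scores are commonly computed, the sketch below follows our reading of the definitions in Chen et al. [7]: F1 compares the sets of POIs in the true and recommended itineraries, and pairs F1 compares their sets of ordered POI pairs. The function names are hypothetical, and the sketch assumes that no POI is repeated within an itinerary.</p>
        <preformat>
```python
def f1_score(truth, recommended):
    """F1 on the sets of POIs in the true and recommended itineraries."""
    overlap = len(set(truth).intersection(recommended))
    if overlap == 0:
        return 0.0
    precision = overlap / len(set(recommended))
    recall = overlap / len(set(truth))
    return 2 * precision * recall / (precision + recall)


def ordered_pairs(itinerary):
    """All ordered POI pairs (earlier, later) in an itinerary."""
    return {
        (itinerary[i], itinerary[j])
        for i in range(len(itinerary))
        for j in range(i + 1, len(itinerary))
    }


def pairs_f1_score(truth, recommended):
    """F1 on ordered POI pairs, so that visiting order is also rewarded."""
    true_pairs = ordered_pairs(truth)
    recommended_pairs = ordered_pairs(recommended)
    common = len(true_pairs.intersection(recommended_pairs))
    if common == 0:
        return 0.0
    precision = common / len(recommended_pairs)
    recall = common / len(true_pairs)
    return 2 * precision * recall / (precision + recall)


# Toy example: same POIs, but two of them are visited in swapped order.
truth = ["a", "b", "c", "d"]
recommended = ["a", "c", "b", "d"]
print(f1_score(truth, recommended))
print(pairs_f1_score(truth, recommended))
```
        </preformat>
        <p>In the toy example, all four POIs match, so F1 is 1, while only five of the six ordered pairs agree, so pairs F1 is 5/6.</p>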
      </sec>
      <sec id="sec-3-3">
        <title>3.3. Results and Discussions</title>
        <p>Reproducing experimental results. Table 4 and Table 5 answer (Q1) and (Q2), respectively, using F1
and pairs F1 scores. In Table 4, comparing the first column block (Columns
3-4, from [7]) with the second column block (Q1) confirms that our baseline solvers are
implemented properly and that we can reproduce nearly the same F1 and pairs F1
scores. In Table 5, using the re-implemented baselines, we can confirm that our procedure for
mining public data works reasonably well for generating the travel itinerary data used in recommendation
studies such as [7]. Some MarkovPath results differ, and we conjecture that
these differences stem from the combinatorial solver used in the experiments: Chen et al. [7]
adopted the well-known commercial state-of-the-art solver Gurobi [23], whereas we adopt only
the non-commercial solver Cbc [24].</p>
        <p>Extended experimental results. Next, we apply our baseline methods to the extended
datasets generated for 11 cities to address (Q3). To evaluate the generated itineraries, we again adopted
the point-set-wise evaluation, which enables us to compare the results in the 11 cities with those in
the four public cities. Table 6 shows F1 and pairs F1 scores, and these results seem reasonable
when compared with the previous results summarized in Table 4.</p>
        <p>Discussions. The two sets of results above, for (Q1) and (Q2) and for (Q3), support the validity of both the re-implemented
procedure to generate extended data and the re-implemented baselines. Our baselines are simple
but effective. In our future work, as noted, we would like to prepare more baseline solvers (e.g.,
deep learning-based methods and next POI prediction-based methods) for future discussions. In
conclusion, our re-implemented baseline solvers and our procedure to generate extended
datasets can be used to start travel itinerary recommendation studies in any city of interest.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Related Work</title>
      <p>Several researchers have studied itinerary recommendation tasks from different perspectives.
For example, optimization-based [7, 15, 20, 8, 25] and learning-based [10, 11] methods have been
proposed. In this paper, we neither dive into the details of existing methods nor try to pursue
state-of-the-art scores. Building our environment on a common framework such as the Recbole
library [26] for travel itinerary recommendation tasks is one of our future works to accelerate
research on this task. One possible issue in this direction is that, to arrange such a public
environment based on Recbole, we need to carefully review the
licenses of travel log data, as they may contain private information.</p>
    </sec>
    <sec id="sec-5">
      <title>5. Conclusion</title>
      <p>This paper focuses on the travel recommendation task and presents a method to generate
extended travel log data for future research. We also re-implement baseline methods to test our
extended travel log data. Our validation, based on re-produced and extended experimental results,
confirms that our re-implemented procedure works as expected. We expect that our re-implemented
procedure can accelerate travel recommendation studies with extended data in various cities
in future work.</p>
      <p>
[3] D. Herzog, C. Laß, W. Wörndl, Tourrec: A tourist trip recommender system for individuals and groups, in: Proceedings of the RecSys2018, 2018, pp. 496–497. doi:10.1145/3240323.3241612.</p>
      <p>[4] D. Chen, D. Kim, L. Xie, M. Shin, A. K. Menon, C. S. Ong, I. Avazpour, J. Grundy, Pathrec: Visual analysis of travel route recommendations, in: Proceedings of the RecSys2017, 2017, pp. 364–365. doi:10.1145/3109859.3109983.</p>
      <p>[5] I. Kangas, M. Schwoerer, L. J. Bernardi, Recommender systems for personalized user experience: Lessons learned at booking.com, in: Proceedings of the RecSys2021, 2021, pp. 583–586. doi:10.1145/3460231.3474611.</p>
      <p>[6] J. Borràs, A. Moreno, A. Valls, Intelligent tourism recommender systems: A survey, Expert Systems with Applications 41 (2014) 7370–7389. doi:10.1016/j.eswa.2014.06.007.</p>
      <p>[7] D. Chen, C. S. Ong, L. Xie, Learning points and routes to recommend trajectories, in: Proceedings of the CIKM2016, 2016, pp. 2227–2232. doi:10.1145/2983323.2983672.</p>
      <p>[8] K. H. Lim, J. Chan, S. Karunasekera, C. Leckie, Tour recommendation and trip planning using location-based social media: A survey, Knowledge and Information Systems 60 (2019) 1247–1275. doi:10.1007/s10115-018-1297-4.</p>
      <p>[9] Q. Gao, G. Trajcevski, F. Zhou, K. Zhang, T. Zhong, F. Zhang, Deeptrip: Adversarially understanding human mobility for trip recommendation, in: Proceedings of the ACM SIGSPATIAL2019, 2019, pp. 444–447. doi:10.1145/3347146.3359088.</p>
      <p>[10] J. He, J. Qi, K. Ramamohanarao, A joint context-aware embedding for trip recommendations, in: Proceedings of the ICDE2019, IEEE, 2019, pp. 292–303. doi:10.1109/ICDE.2019.00034.</p>
      <p>[11] F. Zhou, P. Wang, X. Xu, W. Tai, G. Trajcevski, Contrastive trajectory learning for tour recommendation, ACM Transactions on Intelligent Systems and Technology 13 (2021) 1–25. doi:10.1145/3462331.</p>
      <p>[12] L. Jiang, J. Zhou, T. Xu, Y. Li, H. Chen, D. Dou, Time-aware neural trip planning reinforced by human mobility, in: Proceedings of the IJCNN2022, 2022, pp. 1–8. doi:10.1109/IJCNN55064.2022.9892652.</p>
      <p>[13] M. Tenemaza, S. Lujan-Mora, A. De Antonio, J. Ramirez, Improving itinerary recommendations for tourists through metaheuristic algorithms: an optimization proposal, IEEE Access 8 (2020) 79003–79023. doi:10.1109/ACCESS.2020.2990348.</p>
      <p>[14] B. Thomee, D. A. Shamma, G. Friedland, B. Elizalde, K. Ni, D. Poland, D. Borth, L.-J. Li, Yfcc100m: The new data in multimedia research, Communications of the ACM 59 (2016) 64–73. doi:10.1145/2812802.</p>
      <p>[15] K. H. Lim, J. Chan, C. Leckie, S. Karunasekera, Personalized tour recommendation based on user interests and points of interest visit durations, in: Proceedings of the IJCAI2015, 2015, pp. 1778–1784.</p>
      <p>[16] M. De Choudhury, M. Feldman, S. Amer-Yahia, N. Golbandi, R. Lempel, C. Yu, Automatic construction of travel itineraries using social breadcrumbs, in: Proceedings of the HT2010, 2010, pp. 35–44. doi:10.1145/1810617.1810626.</p>
      <p>[17] D. Yang, D. Zhang, V. W. Zheng, Z. Yu, Modeling user activity preference by leveraging user spatial temporal characteristics in lbsns, IEEE Transactions on Systems, Man, and Cybernetics: Systems 45 (2014) 129–142. doi:10.1109/TSMC.2014.2327053.</p>
      <p>[18] D. Yang, D. Zhang, L. Chen, B. Qu, Nationtelescope: Monitoring and visualizing large-scale collective behavior in lbsns, Journal of Network and Computer Applications 55 (2015) 170–180. doi:10.1016/j.jnca.2015.05.010.</p>
      <p>[19] D. Yang, D. Zhang, B. Qu, Participatory cultural mapping based on collective behavior data in location-based social networks, ACM Transactions on Intelligent Systems and Technology (TIST) 7 (2016) 1–23.</p>
      <p>[20] K. H. Lim, J. Chan, C. Leckie, S. Karunasekera, Personalized trip recommendation for tourists based on user interests, points of interest visit durations and visit recency, Knowledge and Information Systems 54 (2018) 375–406.</p>
      <p>[21] C. I. Muntean, F. M. Nardini, F. Silvestri, R. Baraglia, On learning prediction models for tourists paths, ACM Transactions on Intelligent Systems and Technology 7 (2015) 1–34. doi:10.1145/2766459.</p>
      <p>[22] M. Ferrari Dacrema, P. Cremonesi, D. Jannach, Are we really making much progress? a worrying analysis of recent neural recommendation approaches, in: Proceedings of the RecSys2019, 2019, pp. 101–109. doi:10.1145/3298689.3347058.</p>
      <p>[23] Gurobi Optimization, LLC, Gurobi Optimizer Reference Manual, 2023. URL: https://www.gurobi.com.</p>
      <p>[24] J. Forrest, T. Ralphs, H. G. Santos, S. Vigerske, J. Forrest, L. Hafer, B. Kristjansson, jpfasano, EdwinStraver, M. Lubin, rlougee, jpgoncal1, Jan-Willem, h-i gassmann, S. Brito, Cristina, M. Saltzman, tosttost, B. Pitrus, F. MATSUSHIMA, to st, coin-or/cbc: Release releases/2.10.8, 2022. doi:10.5281/zenodo.6522795.</p>
      <p>[25] S. M. M. Rashid, M. E. Ali, M. A. Cheema, Deepalttrip: Top-k alternative itineraries for trip recommendation, IEEE Transactions on Knowledge and Data Engineering (2023) 1–14. doi:10.1109/TKDE.2023.3239595.</p>
      <p>[26] W. X. Zhao, S. Mu, Y. Hou, Z. Lin, Y. Chen, X. Pan, K. Li, Y. Lu, H. Wang, C. Tian, Y. Min, Z. Feng, X. Fan, X. Chen, P. Wang, W. Ji, Y. Li, X. Wang, J. Wen, Recbole: Towards a unified, comprehensive and efficient framework for recommendation algorithms, in: Proceedings of the CIKM2021, 2021, pp. 4653–4664. doi:10.1145/3459637.3482016.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>P.</given-names>
            <surname>Resnick</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H. R.</given-names>
            <surname>Varian</surname>
          </string-name>
          ,
          <article-title>Recommender systems</article-title>
          ,
          <source>Communications of the ACM</source>
          <volume>40</volume>
          (
          <year>1997</year>
          )
          <fpage>56</fpage>
          -
          <lpage>58</lpage>
          . doi:10.1145/245108.245121.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>J.</given-names>
            <surname>Neidhardt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Wörndl</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Kuflik</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Goldenberg</surname>
          </string-name>
          , M. Zanker (Eds.),
          <source>Proceedings of the Workshop on Recommenders in Tourism (RecTour 2022) co-located with the 16th ACM Conference on Recommender Systems (RecSys 2022)</source>
          , Seattle, WA, USA and Online, September 22, 2022, volume
          <volume>3219</volume>
          of CEUR Workshop Proceedings,
          <year>2022</year>
          . URL: http://ceur-ws.org/Vol-3219.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>