<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <issn pub-type="ppub">1613-0073</issn>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Ranking Task via Tabular Data Approach</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Yu Tokutake</string-name>
          <email>tokutakeyuu@uec.ac.jp</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>The University of Electro-Communications</institution>
          ,
          <addr-line>1-5-1 Chofugaoka, Chofu, Tokyo 182-8585</addr-line>
          ,
          <country country="JP">Japan</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2024</year>
      </pub-date>
      <abstract>
        <p>This paper describes the solution that won first place in the RecTour 2024 Challenge. The proposed solution employs a tabular data approach to address the review ranking task, comprising two stages: candidate generation and ranking with a LightGBM model. The experimental results confirm that adjusting the number of negative samples in the training set, rather than including all candidates in training, improves performance. In addition, incorporating text embeddings generated from user–accommodation and review information as features further enhances accuracy. The code is available at https://github.com/ty1260/rectour2024_challenge.</p>
      </abstract>
      <kwd-group>
        <kwd>Review ranking</kwd>
        <kwd>user review</kwd>
        <kwd>personalization</kwd>
        <kwd>tabular data</kwd>
        <kwd>accommodation</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>[Table 1: Number of users (#users) in the training, validation, and test sets; table values not recoverable.]</p>
      <sec id="sec-1-1">
        <title>2.1. Dataset</title>
        <p>The competition dataset comprises three classes of data:
• Users: Information about users and the accommodations they stayed in.
• Reviews: Review texts for accommodations.
• Matches: Combinations of identifiers (user_id, accommodation_id, and review_id) indicating which
review each user generated for an accommodation.</p>
        <p>User features include the type of stay (guest_type), guest’s country of origin (guest_country), number
of nights (room_nights), and month of stay (month) . Accommodation features include the type of
accommodation (accommodation_type), country in which the accommodation is located
(accommodation_country), average review rating of the accommodation (accommodation_score), rating by an
external agency (accommodation_star_rating), and whether the accommodation is located on a beach
(location_is_beach), on a ski resort (location_is_ski), or in a city (location_is_city_center). Review
features include the title (review_title), positive section content (review_positive), and negative section
content (review_negative) as textual data; the overall rating of the accommodation (review_score); and
the number of users who referred to the review (review_helpful_votes).</p>
        <p>The dataset was divided into training, validation, and test subsets. The statistics for each set are
presented in Table 1. All users and reviews were distinct across the entire dataset, and there was always a
one-to-one correspondence between users and reviews. Each accommodation had at least 10 reviews,
and no accommodation appeared in more than one subset. The test set did not include match data, as
the competition objective was to predict the matching between users, accommodations, and reviews in the
test set. Participants were required to rank the reviews for each user–accommodation pair using the
prediction results and submit the top 10 reviews.</p>
      </sec>
      <sec id="sec-1-2">
        <title>2.2. Evaluation Metrics</title>
        <p>The top 10 predicted reviews were evaluated to determine whether they matched the reviews actually
generated by users. To this end, the mean reciprocal rank (MRR) and precision were used as evaluation
metrics. MRR@k is calculated as follows:</p>
        <p>MRR@k = (1/|U|) ∑_{u∈U} 1/rank_u, (1)</p>
        <p>where U is the set of users and rank_u is the rank of the review generated by user u (if the review is not
in the top k, 1/rank_u is taken to be 0). Precision@k is calculated as follows:</p>
        <p>Precision@k = (1/|U|) ∑_{u∈U} I[rank_u ≤ k], (2)</p>
        <p>where I[⋅] is the indicator function. In the fields of information retrieval and recommendation,
precision@k in Eq. (2) is also used as a hit rate. For the competition task, k was set to 10.</p>
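        <p>As an illustrative sketch (not code from the paper), both metrics can be computed directly from each user's true-review rank, using None when the review is absent from the candidate ranking:</p>

```python
def mrr_at_k(ranks, k=10):
    # ranks: 1-based rank of each user's true review, or None if unranked;
    # ranks beyond k contribute 0, matching the definition of MRR@k
    return sum(1.0 / r for r in ranks if r is not None and r <= k) / len(ranks)

def precision_at_k(ranks, k=10):
    # fraction of users whose true review appears in the top k (hit rate)
    return sum(1 for r in ranks if r is not None and r <= k) / len(ranks)

ranks = [1, 3, None, 11]      # four users; one review unranked, one ranked 11th
print(mrr_at_k(ranks))        # (1 + 1/3 + 0 + 0) / 4 ≈ 0.3333
print(precision_at_k(ranks))  # 2 / 4 = 0.5
```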
        <p>[Figure 1 (pipeline overview): user, accommodation, and review features (optionally with TF-IDF embeddings) are joined to generate candidates, which LightGBM ranks to produce the top-10 reviews. Diagram content not otherwise recoverable.]</p>
      </sec>
    </sec>
    <sec id="sec-2">
      <title>3. Method</title>
      <sec id="sec-2-1">
        <title>3.1. Basic Strategy</title>
        <p>An overview of the proposed approach is presented in Figure 1. The strategy employed for the
competition task is a tabular data approach, wherein features are extracted from users, accommodations,
and reviews to build a supervised model that predicts whether a review was generated by a user for a
particular accommodation. First, candidate combinations of users, accommodations, and reviews are
generated. Then, a binary classification model based on LightGBM is used to rank the reviews, with the
output probabilities serving as ranking scores. This approach is inspired by the two-stage recommendation
approach used in recent recommendation competitions, such as the Amazon KDD Cup '23 Multilingual
Recommendation Challenge (https://www.aicrowd.com/challenges/amazon-kdd-cup-23-multilingual-recommendation-challenge),
the OTTO Recommender System competition (https://www.kaggle.com/competitions/otto-recommender-system), and the
H&amp;M Personalized Fashion Recommendations competition (https://www.kaggle.com/competitions/h-and-m-personalized-fashion-recommendations),
which involves generating candidates and then re-ranking them.</p>
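        <p>A minimal sketch of the second stage on synthetic data. Scikit-learn's GradientBoostingClassifier stands in for LightGBM here, and all data, shapes, and hyperparameters are hypothetical:</p>

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))                                      # toy candidate features
y = (X[:, 0] + rng.normal(scale=0.5, size=200) > 0).astype(int)    # toy match labels

# Binary classifier; its predicted probabilities serve as ranking scores
model = GradientBoostingClassifier(random_state=0).fit(X, y)
scores = model.predict_proba(X)[:, 1]   # higher score = more likely a true match
order = np.argsort(-scores)             # candidate indices ranked best-first
```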
      </sec>
      <sec id="sec-2-2">
        <title>3.2. Candidate Generation</title>
        <p>In a typical recommendation task, the number of users and items is so large that it is impractical to
generate all user-item combinations. However, in this task, given information regarding
accommodations stayed in by users and the reviews generated for each accommodation, it is feasible to generate all
possible combinations of users, accommodations, and reviews. Specifically, candidates were generated
by combining user and review data using accommodation_id as a key. Furthermore, the matched data
were merged with these candidates for both the training and validation sets, and the existence of a
user-generated review for a given accommodation was assigned as the ground truth. This information
constitutes the fundamental input data for the binary classification model. The statistics of the generated
candidates are listed in Table 2. As seen in the table, the candidates exhibited a pronounced imbalance,
with a markedly greater number of negative samples than positive samples. The appropriate handling
of the negative samples is therefore essential during model training. To address this issue, the negative
samples in the training set were randomly downsampled. In the experiment described in Section 4,
changes in performance were observed with respect to the ratio of negative samples.</p>
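        <p>The candidate-generation step can be sketched with pandas on toy data (column names follow the dataset; the values and the ratio n = 1 are illustrative, and n is the quantity tuned in Section 4):</p>

```python
import pandas as pd

# Toy data using the dataset's key columns (values are hypothetical)
users = pd.DataFrame({"user_id": [1, 2], "accommodation_id": [10, 10]})
reviews = pd.DataFrame({"review_id": ["a", "b", "c"], "accommodation_id": [10, 10, 10]})
matches = pd.DataFrame({"user_id": [1, 2], "review_id": ["a", "c"]})

# Generate all user-review combinations within each accommodation
cand = users.merge(reviews, on="accommodation_id")

# Ground truth: 1 if the (user, review) pair appears in the match data
merged = cand.merge(matches, on=["user_id", "review_id"], how="left", indicator=True)
cand["label"] = merged["_merge"].eq("both").astype(int)

# Randomly downsample negatives to a 1:n positive-to-negative ratio
n = 1
pos = cand[cand["label"] == 1]
neg = cand[cand["label"] == 0].sample(
    n=min(n * len(pos), int((cand["label"] == 0).sum())), random_state=0)
train = pd.concat([pos, neg])
```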
      </sec>
      <sec id="sec-2-3">
        <title>3.3. Features for Review Ranking</title>
        <p>
          The features used as inputs to the classification model were derived from information contained in the
original data, as presented in Section 2.1. In addition, features based on accommodations and reviews
were added as inputs. For accommodation-based features, aggregate features, such as the frequency of
accommodation occurrences and the average accommodation score, were added. The following
review-based features were added:
• Review length: Number of words in the review texts (review_title, review_positive, review_negative).
• Sentiment analysis score for the review title: A RoBERTa-based sentiment analysis model [
          <xref ref-type="bibr" rid="ref5 ref6">5, 6</xref>
          ] was
employed to calculate sentiment scores (positive, negative, neutral) for the review titles. Each
score ranges between 0 and 1.
        </p>
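        <p>The review-length feature above amounts to a word count per text field; a minimal sketch (field names follow the dataset, the helper is illustrative):</p>

```python
# Word counts for the three review text fields (a simple length feature);
# empty or missing text yields a count of 0
def word_count(text):
    return len(text.split()) if isinstance(text, str) else 0

row = {"review_title": "Great stay",
       "review_positive": "Clean and quiet room",
       "review_negative": ""}
lengths = {f"{k}_len": word_count(v) for k, v in row.items()}
print(lengths)  # {'review_title_len': 2, 'review_positive_len': 4, 'review_negative_len': 0}
```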
        <p>
          Furthermore, as part of the NLP approach, term frequency-inverse document frequency (TF-IDF) was
employed to generate text embeddings for the user–accommodation and review data. The texts to be
embedded were concatenated in the format “&lt;field_name&gt;:&lt;field_value&gt;\n,” following the method of
Igebaria et al. [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ]. The user and accommodation fields were arranged in the following order: guest_type,
guest_country, room_nights, month, accommodation_type, accommodation_country,
accommodation_star_rating, accommodation_score, location_is_ski, location_is_beach, location_is_city_center.
The review fields were ordered as follows: review_title, review_positive, review_negative, review_score.
The TF-IDF model was trained on the combined training, validation, and test sets. Because the
resulting embeddings were large, with 2,031,914 and 238,788 dimensions, respectively, independent
component analysis (ICA) [7] was applied to reduce the dimensionality to 100. The resulting TF-IDF
embeddings were used as features in one of the model variations.
        </p>
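        <p>The embedding step can be sketched as follows, with toy records and a 2-dimensional reduction in place of the paper's 100; scikit-learn's TfidfVectorizer and FastICA are assumed as implementations:</p>

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import FastICA

# Toy user-accommodation records (field values are hypothetical)
rows = [
    {"guest_type": "family", "guest_country": "Japan", "room_nights": 3, "month": "May"},
    {"guest_type": "solo", "guest_country": "France", "room_nights": 1, "month": "July"},
    {"guest_type": "couple", "guest_country": "Brazil", "room_nights": 2, "month": "May"},
    {"guest_type": "family", "guest_country": "France", "room_nights": 4, "month": "June"},
]

# Concatenate fields in the "<field_name>:<field_value>\n" format
texts = ["".join(f"{k}:{v}\n" for k, v in row.items()) for row in rows]

tfidf = TfidfVectorizer().fit_transform(texts)   # sparse TF-IDF matrix
ica = FastICA(n_components=2, random_state=0)    # the paper reduces to 100 dims
embeddings = ica.fit_transform(tfidf.toarray())  # dense low-dimensional features
```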
      </sec>
    </sec>
    <sec id="sec-3">
      <title>4. Experiment</title>
      <p>The effectiveness of the proposed method was evaluated using the test set. For comparison, RAND,
which randomly selects 10 reviews for each user from all possible candidates (see Table 2), and Helpful
Votes, which selects the top 10 reviews from the candidates based on review_helpful_votes, were
employed as baselines. First, the proposed method’s performance was evaluated without TF-IDF
embeddings by varying the number of negative candidates in the training set. Specifically, performance
was evaluated using a positive-to-negative ratio of 1:n, where n was set to {1, 2, 10, 15, 20, 25, 30, 131}.
Here, n = 131 represents the scenario in which all negative candidates were used without
downsampling. Subsequently, the proposed method with n = 20, which showed the best performance,
was evaluated with the TF-IDF embeddings as features. Table 3 presents the MRR@10 and Precision@10
leaderboard results for all methods, where the proposed method is denoted as LGBM. As the table shows,
the proposed method outperformed both RAND and Helpful Votes in all variations. Additionally, adjusting
n improved MRR@10 by up to 0.0078 points and Precision@10 by up to 0.0188 points compared with
n = 131. Furthermore, the use of TF-IDF embeddings yielded further improvements of 0.0665 points in
MRR@10 and 0.1171 points in Precision@10.</p>
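      <p>Turning model scores into a top-10 submission per user can be sketched with pandas (all identifiers and scores below are hypothetical):</p>

```python
import pandas as pd

# Hypothetical model scores for five (user, review) candidates
preds = pd.DataFrame({
    "user_id":   [1, 1, 1, 2, 2],
    "review_id": ["a", "b", "c", "d", "e"],
    "score":     [0.9, 0.1, 0.5, 0.3, 0.7],
})

# Sort by score descending, then keep each user's 10 best-scored reviews
top10 = (preds.sort_values("score", ascending=False)
              .groupby("user_id")["review_id"]
              .apply(lambda s: list(s)[:10]))
print(top10.loc[1])  # ['a', 'c', 'b']
```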
    </sec>
    <sec id="sec-4">
      <title>5. Conclusion</title>
      <p>This study proposes a solution for the RecTour 2024 Challenge that employs a tabular data approach,
comprising a two-stage process: candidate generation and ranking. By appropriately adjusting the
number of negative samples within the training set and incorporating text-embedding features, the
proposed method improves prediction accuracy.</p>
      <p>In future studies, two directions can be considered. The first is to improve the downsampling process
in candidate generation. The current approach applies random downsampling, which may bias the
training data for certain users and accommodations; more principled downsampling techniques should
therefore be explored. The second is comparison and integration with other NLP approaches, such as
feature generation based on fine-tuned language models and ensembling of the prediction results.</p>
    </sec>
    <sec id="sec-5">
      <title>Acknowledgments</title>
      <p>I would like to thank Dr. Kazushi Okamoto for his help in providing computational resources and
writing this paper.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Hou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>He</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Yan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>McAuley</surname>
          </string-name>
          ,
          <source>Bridging Language and Items for Retrieval and Recommendation</source>
          ,
          <year>2024</year>
          . doi:10.48550/arXiv.2403.03952.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>A.</given-names>
            <surname>Livne</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Fainman</surname>
          </string-name>
          , Booking.com RecTour 2024 Challenge, https://workshops.ds-ifs.tuwien.ac.at/rectour24/rectour-2024-challenge/,
          <year>2024</year>
          . In ACM RecSys RecTour '24, October 18th, 2024, Bari, Italy.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>R.</given-names>
            <surname>Igebaria</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Fainman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Mizrachi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Beladev</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <article-title>Enhancing Travel Decision-Making: A Contrastive Learning Approach for Personalized Review Rankings in Accommodations</article-title>
          ,
          <year>2024</year>
          . doi:10.48550/arXiv.2407.00787.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>G.</given-names>
            <surname>Ke</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Meng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Finley</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Ma</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Ye</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.-Y.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <article-title>LightGBM: A Highly Efficient Gradient Boosting Decision Tree</article-title>
          ,
          <source>in: Advances in Neural Information Processing Systems</source>
          , volume
          <volume>30</volume>
          , Curran Associates, Inc.,
          <year>2017</year>
          . URL: https://proceedings.neurips.cc/paper_files/paper/2017/file/6449f44a102fde848669bdd9eb6b76fa-Paper.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>J.</given-names>
            <surname>Camacho-collados</surname>
          </string-name>
          , K. Rezaee,
          <string-name>
            <given-names>T.</given-names>
            <surname>Riahi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Ushio</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Loureiro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Antypas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Boisson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. Espinosa</given-names>
            <surname>Anke</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Liu</surname>
          </string-name>
          , E. Martínez Cámara,
          <article-title>TweetNLP: Cutting-Edge Natural Language Processing for Social Media</article-title>
          ,
          <source>in: Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, Association for Computational Linguistics</source>
          ,
          <year>2022</year>
          , pp.
          <fpage>38</fpage>
          -
          <lpage>49</lpage>
          . doi:10.18653/v1/2022.emnlp-demos.5.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>D.</given-names>
            <surname>Loureiro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Barbieri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Neves</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. Espinosa</given-names>
            <surname>Anke</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Camacho-collados</surname>
          </string-name>
          ,
          <article-title>TimeLMs: Diachronic Language Models from Twitter</article-title>
          ,
          <source>in: Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics: System Demonstrations, Association for Computational Linguistics</source>
          ,
          <year>2022</year>
          , pp.
          <fpage>251</fpage>
          -
          <lpage>260</lpage>
          . doi:10.18653/v1/2022.acl-demo.25.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>A.</given-names>
            <surname>Hyvärinen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Oja</surname>
          </string-name>
          ,
          <article-title>Independent Component Analysis: Algorithms and Applications</article-title>
          ,
          <source>Neural Networks</source>
          <volume>13</volume>
          (
          <year>2000</year>
          ) 411-430. doi:10.1016/S0893-6080(00)00026-5.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>