<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>Information Resources Management Journal</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <article-id pub-id-type="doi">10.4018/IRMJ</article-id>
      <title-group>
        <article-title>Personality from Text for Restaurant Recommendation</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Evripides Christodoulou</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Andreas Gregoriades</string-name>
          <email>andreas.gregoriades@cut.ac.cy</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Herodotos Herodotou</string-name>
          <email>herodotos.herodotou@cut.ac.cy</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Maria Pampaka</string-name>
          <email>maria.pampaka@manchester.ac.uk</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Cyprus University of Technology</institution>
          ,
          <addr-line>Limassol</addr-line>
          ,
          <country country="CY">Cyprus</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>The University of Manchester</institution>
          ,
          <addr-line>Manchester</addr-line>
          ,
          <country country="UK">UK</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2018</year>
      </pub-date>
      <volume>9741</volume>
      <fpage>13</fpage>
      <lpage>21</lpage>
      <abstract>
        <p>Restaurant recommender systems are designed to support restaurant selection by assisting consumers with the information overload problem. However, despite their promise, they have been criticized for insufficient performance. Recent research in recommender systems has acknowledged the importance of personality in improving recommendation; however, limited work has exploited this aspect in the restaurant domain. Similarly, user food preferences are known to improve recommendation, but most systems explicitly ask users for this information. In this paper, we explore the influence of personality and user preference by utilizing text in consumers' electronic word of mouth (eWOM) to predict the probability of a user enjoying a restaurant he/she has not visited before. Food preferences are extracted through a trained named-entity recognizer learned from a labelled dataset of foods, generated using a rule-based approach. The prediction of user personality is achieved through a bi-directional transformer approach with a feed-forward classification layer, due to its improved performance in similar problems over other machine learning models. The personality classification model utilizes the textual information of reviews and predicts the personality of the author. Topic modelling is used to identify additional features that characterize users' preferences and restaurant properties. All aforementioned features are used collectively to train an extreme gradient boosting tree model, which outputs the predicted user rating of restaurants. The trained model is compared against popular recommendation techniques such as nonnegative matrix factorization and singular value decomposition.</p>
      </abstract>
      <kwd-group>
        <kwd>Consumer Personality</kwd>
        <kwd>Food preference extraction</kwd>
        <kwd>Recommender System</kwd>
        <kwd>Topic Modelling</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>Dining is one of the top five tourist activities during a leisure trip and plays a central role in the travel experience. Recently, interest in food experience has been growing [1], with businesses in the hospitality sector seeking insights into the dining behaviors and preferences of customers to improve decision making in areas such as marketing [2] and recommendation [3]. Past research that utilizes food in recommender systems, such as [4], employs simple techniques such as frequencies of food vocabularies in a Bag of Words. However, such techniques require a lexicon with a complete list of foods, which is usually not available for different cuisines and countries. In this paper, we utilize implicit and explicit information in consumers’ eWOM to improve restaurant recommendation. Implicit information refers to textual comments in reviews that can be used to estimate consumers’ preferences and needs [6]. The application of users’ personality has enhanced the performance of recommender systems in the tourism domain, with results improving point-of-interest and destination recommendations, utilizing either questionnaires or automated personality recognition.</p>
      <p>This paper illustrates the utilization of consumers’ personality and user preferences, extracted from the textual part of electronic word of mouth (eWOM), to improve recommendation. eWOM represents consumer opinions about products and services and has been used extensively in identifying consumers’ preferences. Recommendations are made by training an Extreme Gradient Boosting (XGBoost) prediction model using as features the users’ personality and the users’ preferences (e.g., food). XGBoost is used due to its good performance in similar recommendation problems [7]. The research question addressed in this paper focuses on whether the integration of personality with other features, inferred from structured and unstructured parts of online reviews, improves restaurant recommendation in contrast to popular model-based collaborative filtering (CF) techniques such as nonnegative matrix factorization (NMF) and singular value decomposition (SVD).</p>
      <p>The proposed approach utilizes consumers’ food preferences and personalities, along with perceptions about venues from eWOM, to recommend the most suitable restaurants to tourists. Labelled personality data is utilized to train a BERT (Bidirectional Encoder Representations from Transformers) classifier using the Myers-Briggs Type Indicator (MBTI) personality model, due to its good results in previous studies [8]. User preferences are extracted through topic modelling and a trained food named-entity recognizer. An XGBoost model is generated to predict the probability of a user liking an unvisited restaurant based on the user’s personality, preferences, and themes that characterize the venue.</p>
      <p>The research question addressed in this work is how to best combine user preference and personality models with topic features inferred from eWOM to produce the best recommendation, in contrast to popular model-based collaborative filtering (CF) techniques. This is a continuation of our previous work in [9, 10] that examined the use of personality and emotion in recommender systems. The contribution of this work lies in the automated detection of food preferences from eWOM and its combination with user personality and topic modelling for restaurant recommendation.</p>
      <p>The paper is organized as follows. The next section introduces background knowledge on restaurant recommender systems and techniques for extracting food preferences and personality from text. The section that follows describes techniques for identifying topics discussed in consumers’ eWOM and personality prediction using deep neural networks. Subsequent sections elaborate on the methodology followed and the results obtained. The paper concludes with the discussion and future directions.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Existing Knowledge</title>
      <p>This section provides a review of recommendation techniques and the concept of personality, and elaborates on how personality has been used in recommender systems so far.</p>
      <sec id="sec-2a-1">
        <title>2.1. Restaurant Recommender Systems</title>
        <p>Recommender systems aim to predict the satisfaction of a consumer with an item (product/service) he/she has not bought yet [11]. This is part of one-to-one marketing, which seeks to match items to consumers’ preferences, in contrast to mass marketing, which aims to satisfy a target market segment [12]. Popular approaches focus on consumers’ past experiences (ratings) for the creation of a user-item matrix and, based on that, predict what is most appropriate for a user depending on the similarity between either users or items (products, services) [11]. The relationship between consumers or between products can be found using similarity metrics, and this method is known as Collaborative Filtering (CF) [13]. CF has been successfully applied in tourism recommendation problems such as hotels or points of interest, and is considered one of the most popular techniques [14]. Another popular technique is content-based filtering, which attempts to guess what a user may like based on items’ features rather than their ratings [13]. A hybrid approach takes advantage of both content-based filtering and collaborative filtering [11].</p>
        <p>CF techniques, however, suffer from the cold-start problem that occurs when very little or no data is available about a user, making it impossible to identify similar consumers [15]. In addition, data sparsity exacerbates the problem when there are many unrated items in the user-item matrix. This occurs when there is not enough data to populate the user-item matrix based on which to make reliable inferences [16]. In tourism, the collection of data is difficult and time-consuming due to the limited time that tourists spend at a destination. The cold-start problem appears with first-time users (tourists), since there are no records of their purchasing activity at a specific destination. To address these CF problems, recent methods utilise machine learning techniques such as matrix factorisation to approximate the user-item matrix content using latent variables that emerge from the initial data. The singular value decomposition (SVD), optimized SVD (SVD++), and non-negative matrix factorization (NMF) models factorize the user-item matrix and predict the satisfaction of users for products that are unknown [17]. Alternatively, content-based approaches utilize metadata about new products to address the cold-start problem. A useful source for obtaining these metadata is textual information from eWOM and its analysis using text analytics [18]. An example application includes work by Sun et al. [19] that improved CF performance by analysing restaurants’ eWOM to define numerical features corresponding to consumer satisfaction through sentiment analysis. In the same vein, topic modelling techniques have been used with CF to assist in estimating the similarity between consumers or items [20]. Finally, work by Zhang et al. [21] used consumer or item characteristics to cluster them into groups, and then found correlations between clusters to address the data sparsity problem.</p>
        <p>Recently, a strong interest emerged in using the personalities of consumers in an effort to better understand and match their needs, as “personality” relates to the perceptions, feelings, motivations, and preferences of individuals [22]. The application of user personality has improved the performance of recommendations in the tourism domain for points of interest compared to traditional methods [23]. Personality-based recommendations have also been shown to greatly reduce the cold-start and data sparsity problems, and have improved the performance of recommendations in areas such as online advertising, social media, books, and music [24]. However, these approaches do not take advantage of eWOM data from users on the web to extract their preferences and personalities. They focus mainly on the extraction of user data from specialized questionnaires to collect consumers’ behaviours and personalities. Such approaches fail to continuously update the system, because the time-consuming use of questionnaires leads to the loss of automation and update limitations.</p>
      </sec>
      <sec id="sec-2a-2">
        <title>2.2. Personality Extraction from Text</title>
        <p>Personality is a set of characteristics and behaviours of an individual that influence many areas of his/her life, such as motivations and preferences, as well as consumer preferences and behaviour [23]. Automated personality prediction has been applied by researchers to data from various social networks, such as Facebook and Twitter, to explore correlations between personalities and different user activities, purchasing behaviors, and the liking of foods from specific cuisines [25].</p>
        <p>The two most popular text-based personality classification methods are based on the Myers-Briggs Type Indicator (MBTI) [26] and the Big Five [27] personality traits, due to the availability of labelled data on these models. The classifiers with the best performance usually employ the MBTI personality model, which focuses on 8 key characteristics: Extraversion or Introversion, Sensing or Intuition, Thinking or Feeling, and Judging or Perceiving. The combination of characteristics can shape 16 different personality types and classify people into the proper personality cluster [13]. The Big Five personality model expresses personality in the following 5 dimensions: Agreeableness, Extraversion, Openness to Experience, Conscientiousness, and Neuroticism. Such taxonomies are recognized as a valid mechanism for defining the most essential aspects of personality that describe people’s characteristics and create and reflect their behaviour [28].</p>
        <p>Personality prediction is an important phase of personality-aware recommender systems, and the two main methods for doing so are questionnaires and automated means. Generally, questionnaires are more accurate in assessing personality; however, the process is tedious, while the automated approach is easier to conduct by utilising users’ existing data, which can be text, images, videos, likes (behavioural data), etc. [18] Predicting personality from text is a popular automated approach based on personality theory, which claims that words can reveal psychological states and the personality of the author of the text. There are two main categories of techniques, the feature-based and the deep learning: the former uses unigrams/n-grams (open vocabulary approach) or lexicons (closed vocabulary) of features relevant to personality, and the latter uses text embeddings learned from large corpora of text in an unsupervised manner (language models). Popular feature-based methods utilize the Mairesse [29] and Linguistic Inquiry and Word Count techniques [30]. Features from these are fed into different machine learning classifiers (e.g., Naïve Bayes, support vector machines) to make predictions. Obtaining such features, however, is a costly process and cannot effectively represent the original text semantics. To avoid feature engineering, deep neural models and language models are employed to learn text representations that currently result in improved accuracy. Deep models focus on the context of the text and not just a static representation of a word or a sentence. These deep learning techniques use an attention mechanism that assigns weights to words based on how they are used in a text, giving the ability to capture the semantic content [31]. A popular architecture is BERT (Bidirectional Encoder Representations from Transformers), which utilizes the transformer neural network architecture. Attention-based transformers have shown that capturing the semantics of a text improves the performance and prediction accuracy of ML personality models [32]. Given this, the method proposed in this paper utilizes attention-based personality prediction.</p>
        <p>Most approaches use a binary classifier for each of the personality traits (MBTI), such as a classifier for extraversion-introversion, etc. Such methods require pre-labeled data with the personality class. The first step in the process is the vectorization of the text into a form that can be processed by ML algorithms [33]. This can be done using open/closed lexicons or sentence embeddings in the case of deep learning methods (BERT). The vectorized data is used to train a classifier using the data labels, or to fine-tune a pretrained model in the case of BERT. The trained and validated model can then be used to predict unseen data. Recent personality classification techniques that utilize deep learning for Big Five personality prediction, such as DeepPerson [34], demonstrate classification performance (AUC score) of around 70% per personality dimension using different training datasets, which is much lower compared to classifiers that use MBTI data.</p>
      </sec>
    </sec>
    <sec id="sec-2b">
      <title>3. Technical Background</title>
      <p>The proposed method utilises named entity recognition to extract food preferences, automated eWOM topic modelling for the identification of themes discussed in reviews’ text, BERT-based personality classification, and an ensemble tree-based regression for the prediction of consumer restaurant ratings.</p>
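      <p>As background to the model-based CF baselines (SVD, SVD++, NMF) discussed in Section 2, the following is a minimal illustrative sketch, not the implementation used in this paper, of how a latent-factor model approximates the observed entries of a user-item rating matrix via stochastic gradient descent; the toy data and parameter values are hypothetical.</p>

```python
import random

def factorize(ratings, n_users, n_items, k=2, lr=0.01, reg=0.02, epochs=200, seed=0):
    """Fit latent user factors P and item factors Q to observed
    (user, item, rating) triples so that a rating is approximated by
    the global mean plus the dot product of P[u] and Q[i]."""
    rng = random.Random(seed)
    P = [[rng.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(n_users)]
    Q = [[rng.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(n_items)]
    mu = sum(r for _, _, r in ratings) / len(ratings)  # global mean rating
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - (mu + sum(P[u][f] * Q[i][f] for f in range(k)))
            for f in range(k):
                pu, qi = P[u][f], Q[i][f]
                P[u][f] += lr * (err * qi - reg * pu)  # gradient step on P
                Q[i][f] += lr * (err * pu - reg * qi)  # gradient step on Q
    return mu, P, Q

def predict(mu, P, Q, u, i):
    """Predicted rating of item i by user u, usable for unvisited items."""
    return mu + sum(pf * qf for pf, qf in zip(P[u], Q[i]))
```

      <p>Production systems typically rely on optimized libraries for SVD/NMF rather than hand-rolled SGD; the sketch only illustrates the latent-variable idea behind those baselines.</p>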
      <sec id="sec-2-1">
        <title>3.1. User’s Food Preference Extraction</title>
        <p>A named entity recognition (NER) is utilized to extract
the food preferences of customers. NER is a major
component in NLP systems to extract information from
unstructured text. An entity can be any word or phrase that refers to the concept in question. There are two main approaches for the creation of a NER: the model-based and the rule-based approach. The latter focuses on grammatical rules and linguistic terms to extract entities.</p>
        <p>The model-based approach generates machine learning
models using a text with pre-labelled entities. Most food
NER models, as reported in [35], are trained on data that did not include Cypriot dishes; thus, their predictions were insufficient for our case. There is a wide range of generic
libraries suitable for NER, such as NLTK, spaCy,
Stanford NER, Stanza, and Flair, but none was able to provide
appropriate food recognition in text.</p>
        <p>To extract food preferences, the spaCy library was utilized, and several rules were specified that enabled the extraction of sentences that mention food consumption, such as “I ate ”, “I had for dinner”, etc. To generate a sufficient training set, a local and international food recipe dataset was used. The returned sentences were annotated automatically based on the position in the sentence where the food entity occurred. This was necessary in order to create a training dataset labelled with food names and their start/end positions in the sentence, based on which the food NER training was performed.</p>
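        <p>The rule-based annotation step described above can be sketched as follows. This is a simplified plain-regex illustration (the paper uses spaCy); the lexicon entries and rule patterns below are hypothetical examples, and the output uses the spaCy-style (start, end, label) offset format expected for NER training.</p>

```python
import re

# Hypothetical mini food lexicon, standing in for names drawn from the
# local and international recipe dataset mentioned in the text.
FOOD_LEXICON = ["halloumi", "souvlaki", "moussaka", "pad thai"]

# Example rule patterns signalling food consumption, in the spirit of
# "I ate ..." and "I had ... for dinner".
CONSUMPTION_RULES = [
    re.compile(p, re.IGNORECASE)
    for p in (r"\bI ate\b", r"\bI had\b.*\bfor (?:lunch|dinner)\b", r"\bwe ordered\b")
]

def annotate(sentence):
    """Return (sentence, {"entities": [(start, end, "FOOD"), ...]}) for
    sentences matching a consumption rule, or None otherwise."""
    if not any(rule.search(sentence) for rule in CONSUMPTION_RULES):
        return None
    spans = []
    for food in FOOD_LEXICON:
        for m in re.finditer(re.escape(food), sentence, re.IGNORECASE):
            spans.append((m.start(), m.end(), "FOOD"))  # char offsets of the entity
    return (sentence, {"entities": spans}) if spans else None
```

      <p>Records produced this way can then be fed to a spaCy NER training loop; the exact training call depends on the spaCy version used.</p>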
        <p>The trained NER achieved an overall accuracy of 81%
(70/30 train-test split) and was applied on the restaurant
reviews to extract the foods associated with each review. Due to the large number of food entities that were generated, there were many repetitions caused by different spellings.</p>
        <p>Thus, to reduce the dimensionality of the dataset, a feature selection process was performed using a random forest machine learning model to identify the most important food names, using the review ratings as the target variable. The process yielded the optimum number of features (220) that resulted in the best model performance. The selected food features were then one-hot encoded for each consumer review. To identify the food preferences of each user, reviews were grouped by user, and the most frequent food entities in each user’s reviews were considered as food preferences. This process assumes that, when customers visit different restaurants and write comments about the food they ordered, this constitutes a food preference of the user, irrespective of the food’s quality and the review rating.</p>
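        <p>The grouping of NER-extracted foods into per-user preferences can be sketched as below; the user names and foods are made-up examples, and the cut-off of the top three foods per user is an illustrative choice.</p>

```python
from collections import Counter

def food_preferences(reviews, top_n=3):
    """Aggregate food entities per user; the most frequent foods across a
    user's reviews are taken as that user's preferences.

    `reviews` is an iterable of (user, foods_in_one_review) pairs."""
    by_user = {}
    for user, foods in reviews:
        by_user.setdefault(user, Counter()).update(foods)
    return {user: [food for food, _ in counts.most_common(top_n)]
            for user, counts in by_user.items()}
```

      <p>The resulting per-user food lists can then be one-hot encoded alongside the selected food features when building the enhanced user-item matrix.</p>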
      </sec>
      <sec id="sec-2-2">
        <title>3.2. EWOM Topic Modelling</title>
        <p>Topic modelling is a popular tool for extracting
information from unstructured data and is used in this work to
identify themes discussed by consumers in eWOM. Topic
models generally involve a statistical model aiming at
finding topics that occur in a collection of documents
[36]. Two of the most popular techniques for topic
analysis are the Latent Dirichlet Allocation and the Structural
Topic Model (STM). In this study, the STM approach [37]
is used to develop a topic model due to its ability to
incorporate reviews’ metadata such as sentiment (rating&gt;3)
that help with interpreting and naming the identified
topics. Each topic in STM represents a set of words that
occur frequently together in a corpus and each document
is associated with a probability distribution of topics per
document. The process for learning the topic model
initiates with data preprocessing that includes removal of
common and custom stop-words and irrelevant
information (punctuation), followed by tokenization (breaking
sentences into word tokens), and stemming (converting
words to their root form). Initially, common stop-words
were considered and gradually with the refinement of the
model, additional stop-words that were irrelevant to our
goal were added to the list of custom stop-words such as
names of people, restaurants, cities, etc. The optimum
number of topics that best fits the dataset is identified
through an iterative process examining different values
for the number of topics (K) and inspecting the
semantic coherence, held out likelihood, and exclusivity of the
model at each iteration until a satisfactory model is
produced [37]. Coherence measures the degree of semantic
similarity between high scoring words in the topic. Held
out likelihood tests a trained topic model using a test set
that contains previously unseen documents. Exclusivity
measures the extent to which top words in one topic
are not top words in other topics. The naming of the
topics was performed manually based on domain
knowledge and the most prevalent words that characterize each
topic.</p>
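        <p>As an illustration of one of the selection criteria above, the following is a simplified stand-in for an exclusivity score (STM's actual exclusivity/FREX calculation differs): it measures what fraction of a topic's top words appear in no other topic's top-word list. The toy topic-word distributions are hypothetical.</p>

```python
def top_words(topic_word_probs, n=3):
    """The n highest-probability words of one topic."""
    return [w for w, _ in sorted(topic_word_probs.items(), key=lambda kv: -kv[1])[:n]]

def exclusivity(topics, n=3):
    """Average, over topics, of the fraction of each topic's top-n words
    that are unique to that topic (a simplified exclusivity measure)."""
    tops = [set(top_words(t, n)) for t in topics]
    scores = []
    for i, words in enumerate(tops):
        others = set().union(*(t for j, t in enumerate(tops) if j != i))
        scores.append(len(words - others) / len(words))
    return sum(scores) / len(scores)
```

      <p>In model selection, such a score would be tracked together with semantic coherence and held-out likelihood while varying K, as described above.</p>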
      </sec>
      <sec id="sec-2-3">
        <title>3.3. BERT Personality Classification</title>
        <sec id="sec-2-3-1">
          <p>Recent applications of the “attention” mechanism in deep learning models have demonstrated state-of-the-art performance in numerous text analysis tasks such as classification.</p>
          <p>BERT uses a multi-layer bidirectional transformer encoder and is inspired by the concept of knowledge transfer, since in many problems it is difficult to access a sufficiently large volume of labelled data to train deep models. In transfer learning, a pre-trained model is learned from massive unlabeled datasets that do not represent the target problem but allow the learning of general knowledge.
BERT-like approaches provide pretrained models and
their embedded knowledge can be transferred to a
target domain where labelled data is limited. Fine-tuning
such models is performed using a labelled dataset
representing the actual problem; these tune the model to the
task at hand. Fine-tuning adds a feedforward layer on
top of the pre-trained BERT. Previous work has
demonstrated that this pre-training and fine-tuning approach
outperforms existing text classification approaches. In
our case, fine-tuning the BERT model was performed
using publicly available personality labelled data. BERT
has been used for personality prediction using the
Personality Cafe MBTI dataset in [38] achieving an accuracy
of around 0.75. In contrast, other deep learning methods
that use the Big 5 model as well as the popular
streamof-consciousness essay dataset such as the one reported
in [39] using CNN, achieve inferior classification
performance.</p>
          <p>Despite their good results, BERT-based approaches have been criticized because their best performance is reported on short texts. Long texts refer to texts with more than 512 tokens. Such texts, however, are computationally expensive to process; thus, most transformer models limit the number of tokens they can process simultaneously. In our case, most reviews produced by consumers exceeded the 512-token limit, and thus the prediction of personality was treated as a long-text classification problem. Different methods exist for dealing with this issue, including the naïve head-only, tail-only, or semi-naïve approaches, which use the top number of words, the bottom number of words, or a combination of top/bottom/important words in the text, respectively. Such approaches lose information but have a minimal computational cost. Recent works have sought to alleviate the
computational cost constraint by applying more
sophisticated models to longer text instances such as dividing
the long text into chunks and combining the embeddings
of the chunks. However, work by Sun et al. [40] that
investigated different long-text treatment methods for
consumer reviews, showed that the best classification
performance is achieved using naïve methods such as
using only the head or tail tokens of the text while
dropping all other content. In this work, we explore the naïve
and semi-naïve methods to find the one with the best
personality classification performance prior to labelling
users with their personality. The results, described in a
subsequent section, show that the naïve approach yielded
the best performance, which is in line with [40].</p>
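          <p>The naïve and semi-naïve long-text treatments can be sketched as simple operations on a token list. This is an illustrative sketch only: actual tokenization is subword-based, and the exact head/tail split sizes are implementation choices.</p>

```python
def truncate_tokens(tokens, limit=512, mode="head"):
    """Naïve long-text treatments for transformer inputs with a token limit:
    keep the first `limit` tokens ("head"), the last `limit` ("tail"),
    or a head+tail combination ("head_tail")."""
    if len(tokens) > limit:
        if mode == "head":
            return tokens[:limit]
        if mode == "tail":
            return tokens[-limit:]
        if mode == "head_tail":
            head = limit // 4                      # e.g., first 128 of 512
            return tokens[:head] + tokens[-(limit - head):]
        raise ValueError("unknown mode: " + mode)
    return list(tokens)
```

      <p>In this work the head-only variant with a 512-token budget was the one ultimately selected, as reported in the results section.</p>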
        </sec>
      </sec>
      <sec id="sec-2-4">
        <title>3.4. XGBoost Regression</title>
        <p>XGBoost regression is used in this study due to its ability to produce good results in similar problems. It is an ensemble method; hence, multiple trees are constructed, with the training of each tree depending on the errors of previous trees’ predictions. Gradient descent is used to generate new trees based on all previous trees while optimizing for loss and regularization. The XGBoost regularization component balances the complexity of the learned model against predictability. XGBoost optimization is required to minimize model overfitting and treat data imbalance by tuning multiple hyperparameters. The optimal values of the hyperparameters can be determined with different techniques, such as exhaustive (grid) search, Bayesian optimization, or random search. The grid search method combines all possible values of each parameter to obtain the model with the best performance, while the Bayesian method utilizes results from previous optimization cycles to identify hyperparameter values with a higher probability of improving the classifier’s performance. Grid search is more thorough but slower, while Bayesian optimization is faster but not as exhaustive. In this work, the grid search approach is adopted.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Methodology</title>
      <p>The methodology employed to address our research question is presented in Figure 1 and is implemented via the following steps:</p>
      <list list-type="order">
        <list-item><p>Collection of restaurant reviews from TripAdvisor and extraction of consumers’ eWOM and additional explicit information about restaurants, such as cuisine type, price range, and value for money;</p></list-item>
        <list-item><p>Preprocessing of the data and preparation for subsequent analyses (topic modelling, personality classification). Preprocessing includes punctuation and URL elimination, lowercasing of text, stop-word removal, tokenization, stemming, and lemmatization. During this step, the user-item matrix is generated, with rows corresponding to consumers and columns to restaurants. The cells of the matrix contain ratings when these are available, since customers did not visit all restaurants;</p></list-item>
        <list-item><p>Development of a topic model using the eWOM (reviews) as corpus to identify consumers’ opinions and how these are associated with each review. Restaurants’ topics are generated by averaging the topic theta values associated with each restaurant. This represents common consumer opinions per restaurant;</p></list-item>
        <list-item><p>Assessment of customers’ personality from eWOM using the MBTI BERT personality classification model;</p></list-item>
        <list-item><p>Extraction of users’ food preferences from eWOM text using a custom NER model;</p></list-item>
        <list-item><p>Combination of the explicit information about each restaurant with the implicit information that emerges from personality analysis, food preferences, and topic modelling. These features are used collectively to enhance the user-item matrix and to train an Extreme Gradient Boosting (XGBoost) regressor model, using as output variable the user rating of restaurants, taking values in the range [1-5]. The XGBoost model is optimized using hyperparameter tuning and validated using a train/test data split (70/30). The trained model is used to predict user ratings for restaurants users have never visited;</p></list-item>
        <list-item><p>Comparison of the performance of the XGBoost model against that of three popular model-based CF techniques, namely SVD, SVD++, and NMF. The comparison models are trained using the initial user-item matrix, while XGBoost uses the enhanced user-item matrix that includes explicit and implicit information. The performance of the models is assessed using popular evaluation metrics such as mean absolute error (MAE), mean squared error (MSE), and root mean squared error (RMSE).</p></list-item>
      </list>
    </sec>
    <sec id="sec-3">
      <title>5. Results</title>
      <p>The data collected includes 105k reviews written in English from tourists who visited Cyprus between 2010 and 2020 and posted reviews about their experience with restaurants in Cyprus (publicly available). The total number of unique users was 56,800 and the number of restaurants was 650. Figure 2 depicts descriptive statistics of review ratings per year. For this study, only users with at least 20 reviews and only restaurants with at least 50 reviews are considered, yielding 93 unique users and 410 venues.</p>
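      <p>The evaluation metrics used to compare the recommendation models (MAE, MSE, RMSE) can be computed as in the following minimal sketch:</p>

```python
import math

def mae(y_true, y_pred):
    """Mean absolute error between actual and predicted ratings."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def mse(y_true, y_pred):
    """Mean squared error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root mean squared error; penalizes large rating errors more than MAE."""
    return math.sqrt(mse(y_true, y_pred))
```

      <p>All three are computed on the held-out test ratings; lower values indicate better agreement between predicted and actual ratings.</p>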
      <sec id="sec-3-1">
        <title>5.1. Learned Topic Model</title>
          <p>To extract consumers’ discussed themes from eWOM, an STM topic model was developed using the estimated optimum number of topics K (30), based on the model’s performance metrics in Figure 3, with focus on high coherence, high held-out likelihood, low residuals, and high lower bound scores.</p>
      </sec>
      <sec id="sec-3-2">
        <title>5.2. Personality Labelling</title>
        <p>The training of the binary classifiers was performed
using the Personality Cafe MBTI dataset consisting of joint
user posts on a social network labeled by personality
type defined using MBTI questionnaire. The dataset is
publicly available on “Kaggle” [41]. To identify the BERT
long text approach with the best classification
performance, two techniques were examined, namely the naïve
and semi naïve approach and the one with the best
perFigure 3: Topic performance measures for identifying the formance was used in the workflow. For the naïve
apoptimum number of topics. The red circle indicated the K proach, we used the head-only using as sentence length
number of topics selected the 256 and 512 words and for the semi-naïve we used
chunking of text into 128 words and combining their
embedding. The results from this process, presented in</p>
        <p>The results from this process, presented in Table 2, showed that the 512-naïve-head approach outperformed the other approaches and was thus employed in users’ personality classification. The BERT model also outperformed personality models trained on the same dataset, which improved our confidence in the personality prediction of each user.</p>
        <p>The naming of the topics in Table 1 was based on domain knowledge, the words with the highest probability in each topic, and words with a high lift score. Lift gives higher weight to words that appear less frequently in other topics.</p>
        <p>The probability distribution of topics per review denotes the probability of each topic being discussed in a review, and the probabilities of all topics in a review sum to 1. Reviews are thus associated with a distribution of topic prevalence. The trained STM model’s theta values per review refer to the probability that a topic is associated with each review; these theta values, shown in Figure 4, were used as features during the training of the XGBoost model, along with other features.</p>
        <p>The personality distributions in Figure 5 show descriptive statistics of users’ personalities as detected by the MBTI BERT classifier, fine-tuned on a labelled dataset and treating long text with the naïve-head approach using 512 tokens. The acronyms refer to combinations of the dimensions of the MBTI model. For each dimension of the personality model, the trained BERT model predicts the probability that a user belongs to either of the two traits (i.e., Extraversion – Introversion (E/I), Sensing –</p>
        <table-wrap id="tab-1">
          <label>Table 1</label>
          <caption>
            <p>Learned topics: topic names with their high-probability and high-lift words.</p>
          </caption>
          <table>
            <thead>
              <tr>
                <th>Topic Name</th>
                <th>Words with high probability and lift scores</th>
              </tr>
            </thead>
            <tbody>
              <tr><td>Entertainment Atmosphere</td><td>great, really, music, live, day, atmosphere, enchiladas, music, really, live, pub</td></tr>
              <tr><td>Family Restaurant</td><td>nice, prices, atmosphere, reasonable, big, family, polite, quick, nice, relaxing, families, cafe</td></tr>
              <tr><td>Special Occasion</td><td>time, excellent, went, night, amazing, first, every, occasions, stay, went, amaze, week</td></tr>
              <tr><td>New Place</td><td>eat, new, end, found, places, second, thai, always, second, none</td></tr>
              <tr><td>Party Place</td><td>lovely, recommend, highly, enjoyed, beautiful, setting, party, setting, party, hosts, fabulous, thoroughly, absolutely</td></tr>
              <tr><td>Lunch</td><td>well, lunch, local, attentive, wonderful, presented, chose, breaks, attentive, presented, chose</td></tr>
              <tr><td>Evening/Bar</td><td>evening, bar, friends, though, group, customers, quiet, whiskies, though</td></tr>
              <tr><td>Location</td><td>restaurant, location, must, beach, view, right, perfect, definitely, must, far</td></tr>
              <tr><td>Worth Visiting</td><td>visit, will, back, really, worth, definitely, going, visit, called</td></tr>
              <tr><td>Wedding Place</td><td>wedding, amazing, even, similar, impression, organize, guest, events, beyond, pleasure</td></tr>
              <tr><td>Celebration parties</td><td>many, birthday, soon, booked, kitchen, also, october, good, love see, year this, flight, travel, celebration</td></tr>
              <tr><td>Not Worth</td><td>experience, nothing, special, whole, maybe, dining, perfection, fiancée, maybe</td></tr>
              <tr><td>Out of town</td><td>restaurant, probably, also, mountains, open, available, well, best, more, owner, managers, troodos</td></tr>
              <tr><td>Summer location</td><td>summer, use, even, range, late, cool, evenings, use, dine, cozy</td></tr>
              <tr><td>Fabulous Place</td><td>always, can, class, owners, restaurant, number, first, classy, number, varied, feeling, interesting, hidden</td></tr>
              <tr><td>Outside eating</td><td>two, outside, can, inside, sit, world, get, disappointed, aircon, magic, noise, traffic, heat</td></tr>
              <tr><td>Asian Cuisine</td><td>thai, tourist, across, gem, trying, partner, avoid, duck, again, overall, always, bespoke, gimmicks, hardcore</td></tr>
              <tr><td>Average Place</td><td>bit, little, better, average, like, quite, expensive, much, however, criticisms, average</td></tr>
              <tr><td>Breakfast</td><td>different, small, cheese, also, breakfast, euros, greek, platter, platter, options, vegetarian, bacon, eggs</td></tr>
              <tr><td>Disappointment</td><td>wife, return, disappointed, restaurant, reviews, favourite, holiday, isn’t, trip, done</td></tr>
              <tr><td>Stop during trips</td><td>old, cypriot, road, stop, village, along, street, waitresses, road, walk</td></tr>
              <tr><td>Busy Place</td><td>busy, get, people, table, lot, need, without, joyful, early, book</td></tr>
              <tr><td>Visit over years</td><td>years, restaurant, made, visiting, coming, several, since, ago, forward, visits</td></tr>
              <tr><td>Welcoming Staff</td><td>staff, friendly, always, see, come, welcoming, feel, chat, truly, smile, come</td></tr>
              <tr><td>Good portions</td><td>chips, served, priced, set, large, course, portion, adults, chips, portion, reasonably</td></tr>
              <tr><td>Traditional foods</td><td>best, cyprus, don’t, ever, never, restaurants, know, traditional, meze</td></tr>
              <tr><td>Value for money</td><td>value, money, recommended, excellent, variety, high, meals, bringing, best, cyprus, eaten</td></tr>
              <tr><td>Fresh ingredients</td><td>couldn’t, enough, friend, eat, fresh, away, wow, take, out, basilica, excellent, lovely, rice, more, time, highly</td></tr>
              <tr><td>Bad service</td><td>ordered, came, table, order, waiter, asked, arrived, minutes, waited, waitress, left, seated, orders, bill</td></tr>
              <tr><td>Nothing Special</td><td>just, restaurant, basic, way, like, standard, much, full, unacceptable, much, restaurant</td></tr>
            </tbody>
          </table>
        </table-wrap>
        <p>Intuition (S/N), Thinking – Feeling (T/F), and Judging – Perceiving (J/P)). Combinations of letters from each category generate the 16 four-letter personality types: ISFJ, INFP, INFJ, ISTP, ISTJ, ISFP, INTP, INTJ, ENTP, ESFP, ENFP, ESFJ, ESTP, ESTJ, ENFJ, and ENTJ, the distribution of which is depicted in Figure 5. The BERT model comprises four classifiers, one for each of the dimensions above, whose average area under the curve (AUC) is 87%, an improved performance compared to alternative personality classification techniques.</p>
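<p>Because the model outputs one probability per MBTI dimension, a distribution over the 16 four-letter types follows by multiplying the per-dimension probabilities, assuming the four dimensions are independent (an assumption of this sketch, not stated in the text):</p>

```python
from itertools import product

DIMS = [("E", "I"), ("S", "N"), ("T", "F"), ("J", "P")]

def type_distribution(p_first):
    """p_first[i] is the classifier's probability of the FIRST letter of
    dimension i, e.g. P(E), P(S), P(T), P(J). Returns all 16 types."""
    dist = {}
    for choice in product(*[(0, 1)] * 4):
        name = "".join(DIMS[i][c] for i, c in enumerate(choice))
        p = 1.0
        for i, c in enumerate(choice):
            p *= p_first[i] if c == 0 else 1.0 - p_first[i]
        dist[name] = p
    return dist

dist = type_distribution([0.2, 0.3, 0.6, 0.5])  # illustrative probabilities
top = max(dist, key=dist.get)                   # most likely 4-letter type
```
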
        <p>In the experiments conducted using the aforementioned restaurant reviews, the data was initially split into training and test sets (70/30) using stratified sampling to guarantee that all user ratings are sufficiently represented in both samples. The models were hyperparameter-tuned, trained, and tested using the same samples. The aforementioned metrics were computed, and the results (see Table 3) show that the MBTI XGBoost model produced the best performance among all models. Both personality-based models outperformed the traditional approaches, which indicates that the use of personality and eWOM-extracted topics improved the recommendations.</p>
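<p>The evaluation protocol (a 70/30 stratified split, then MAE, MSE, and RMSE) can be reproduced with scikit-learn as follows; the features and ratings are synthetic placeholders, and RandomForestRegressor merely stands in for the tuned XGBoost regressor:</p>

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error, mean_squared_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 5))        # stand-in user/venue/topic features
y = rng.integers(1, 6, size=300)     # synthetic 1-5 star ratings

# 70/30 split, stratified on the rating so every star level is represented
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

model = RandomForestRegressor(random_state=0).fit(X_tr, y_tr)
pred = model.predict(X_te)

mae = mean_absolute_error(y_te, pred)   # mean absolute error
mse = mean_squared_error(y_te, pred)    # mean squared error
rmse = np.sqrt(mse)                     # RMSE is the square root of MSE
```
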
      </sec>
    </sec>
    <sec id="sec-3-3">
      <title>5.3. Training and Evaluating the XGBoost Models</title>
      <p>The enhanced user-item matrix that emerged from the personality model, food preferences, and the topic associations per user and venue was used to train an XGBoost regressor model.</p>
      <p>The XGBoost model underwent hyperparameter tuning prior to training; the model’s learning rate, gamma, subsample, and regularization options were tuned using grid search. Traditional recommendation models, namely SVD, SVD++, and NMF, were generated using the surprise Python library. The models were compared based on the following performance metrics: the mean absolute error (MAE), which represents the average of the absolute differences between the real and predicted values; the mean squared error (MSE); and the root mean squared error (RMSE), which is the square root of the MSE. Comparison of the two personality-based models against the traditional recommendation techniques revealed an improved performance of the personality-based approaches over these baseline models. The traditional techniques were also optimized by tuning two hyperparameters, the number of factors and the regularization value.</p>
    </sec>
    <sec id="sec-4">
      <title>6. Conclusions</title>
      <p>This study proposes a restaurant recommendation approach that combines user preferences with user personality, and constitutes one of the first studies to use customer preference along with personality in the restaurant recommendation problem. It utilizes a popular personality model (MBTI) to enhance the restaurant recommendation process by fine-tuning a BERT classification model on a personality-labelled dataset. Due to the length of the training data, the best long-text handling approach (naïve-head 512) was employed during BERT model tuning. eWOM themes are extracted through topic modelling from the eWOM text and are also used as additional features of restaurants and users, referring to implicit preferences of users and properties of restaurants. All the aforementioned features are used collectively to train an XGBoost regressor to predict consumers’ satisfaction (i.e., rating) for unvisited restaurants. The results show that the MBTI model in combination with topics from eWOM outperforms the model-based collaborative filtering techniques, offering a first indication that the application of personality and food preferences in restaurant recommendation can yield valuable results. Future work will focus on evaluating additional long-text handling techniques.</p>
    </sec>
  </body>
  <back>
    <ref-list />
  </back>
</article>