Extracting User Preferences and Personality from Text for
Restaurant Recommendation
Evripides Christodoulou¹, Andreas Gregoriades¹, Herodotos Herodotou¹ and Maria Pampaka²
¹ Cyprus University of Technology, Limassol, Cyprus
² The University of Manchester, Manchester, UK

Restaurant recommender systems are designed to support restaurant selection by assisting consumers with the information overload problem. However, despite their promise, they have been criticized for insufficient performance. Recent research in recommender systems has acknowledged the importance of personality in improving recommendation; however, limited work has exploited this aspect in the restaurant domain. Similarly, users’ food preferences are known to improve recommendation, but most systems explicitly ask users for this information. In this paper, we explore the influence of personality and user preference by utilizing text in consumers’ electronic word of mouth (eWOM) to predict the probability of a user enjoying a restaurant he/she has not visited before. Food preferences are extracted through a named-entity recognizer trained on a labelled dataset of foods, generated using a rule-based approach. The prediction of user personality is achieved through a bi-directional transformer approach with a feed-forward classification layer, due to its improved performance in similar problems over other machine learning models. The personality classification model utilizes the textual information of reviews and predicts the personality of the author. Topic modelling is used to identify additional features that characterize users’ preferences and restaurants’ properties. All aforementioned features are used collectively to train an extreme gradient boosting tree model, which outputs the predicted user rating of restaurants. The trained model is compared against popular recommendation techniques such as non-negative matrix factorization and singular value decomposition.

Consumer Personality, Food Preference Extraction, Recommender System, Topic Modelling

1. Introduction

Dining is one of the top five tourist activities during a leisure trip and plays a central role in the travel experience. Recently, interest in food experience has been growing [1], with businesses in the hospitality sector seeking insights regarding the dining behaviors and preferences of customers to improve decision making in areas such as marketing [2] and recommendation [3]. Past research that utilizes food in recommender systems, such as [4], employs simple techniques such as frequencies of food vocabularies in a Bag of Words. However, such techniques require a lexicon with a complete list of foods, which is usually not available for different cuisines and countries. In this paper, we utilize implicit and explicit information of consumers’ eWOM to improve restaurant recommendation. Implicit information refers to textual comments in reviews that can be used to estimate consumers’ personality and preferences, while explicit features refer to ratings of restaurants, their estimated value, price, and cuisine offered.

There are several ways to extract user preferences for restaurant recommendation [5]. The simplest is through explicit queries that ask users to define their preferences. This, however, has some disadvantages, as food preferences might not be covered by the questions asked. Alternative methods utilize user ratings to find similarities between users and restaurants (e.g., collaborative filtering). Another method for preference extraction is user opinion analysis that utilizes natural language processing.

Traditional recommendation approaches base their recommendations on user preferences extracted from users’ historical records, such as ratings, reviews, or purchases. Popular techniques include the collaborative and content-based filtering approaches. Recently, there has been strong interest in the utilization of users’ personality, since it is [...] interest, destination recommendations, utilizing either
questionnaires or automated personality recognition.

This paper illustrates the utilization of consumers’ personality and user preferences extracted from the textual part of electronic word of mouth (eWOM) to improve recommendation. EWOM represents consumer opinions about products and services and has been used extensively in identifying consumers’ preferences. Recommendations are made by training an Extreme Gradient Boosting (XGBoost) prediction model using as features the users’ personality and the users’ preferences (e.g., food). XGBoost is used due to its good performance in similar recommendation problems [7]. The research question addressed in this paper focuses on whether the integration of personality with other features inferred from structured and unstructured parts of online reviews improves restaurant recommendation, in contrast to popular model-based collaborative filtering (CF) techniques such as non-negative matrix factorization (NMF) and singular value decomposition (SVD).

The proposed approach utilizes consumers’ food preferences and personalities along with perceptions about venues from eWOM to recommend the most suitable restaurants to tourists. Labelled personality data is utilized to train a BERT (Bidirectional Encoder Representations from Transformers) classifier using the personality model of the Myers-Briggs Type Indicator (MBTI) due to its good results in previous studies [8]. User preferences are extracted through topic modelling and a trained food named entity recognizer. An XGBoost model is generated to predict the probability of a user liking an unvisited restaurant based on his/her personality and preferences, and on the themes that characterize the venue.

The research question addressed in this work is how to best combine user preference and personality models with topic features inferred from eWOM to produce the best recommendation, in contrast to popular model-based collaborative filtering (CF) techniques. This is a continuation of our previous work in [9, 10] that examined the use of personality and emotion in recommender systems. The contribution of this work lies in the automated detection of food preferences from eWOM and its combination with user personality and topic modelling for restaurant recommendation.

The paper is organized as follows. The next section introduces background knowledge on restaurant recommender systems and techniques for extracting food preferences and personality from text. The following section describes techniques for identifying topics discussed in consumers’ eWOM and personality prediction using deep neural networks. Subsequent sections elaborate on the methodology followed and the results obtained. The paper concludes with the discussion and future directions.

2. Existing Knowledge

This section provides a review of recommendation techniques and the concept of personality, and elaborates on how personality has been used in recommender systems so far.

2.1. Restaurant Recommender Systems

Recommender systems aim to predict the satisfaction of a consumer with an item (product/service) he/she has not bought yet [11]. This is part of one-to-one marketing that seeks to match items to consumers’ preferences, in contrast to mass marketing that aims to satisfy a target market segment [12]. Popular approaches use consumers’ past experiences (ratings) to create a user-item matrix and, based on that, predict what is most appropriate for a user depending on the similarity between either users or items (products, services) [11]. The relationship between consumers or between products can be found using similarity metrics, and this method is known as Collaborative Filtering (CF) [13]. This has been successfully applied in tourism recommendation problems such as hotels or points of interest, and is considered one of the most popular techniques [14]. Another popular technique is content-based filtering, which attempts to guess what a user may like based on items’ features rather than their ratings [13]. A hybrid approach takes advantage of both content-based filtering and collaborative filtering [11].

CF techniques, however, suffer from the cold start problem, which occurs when very little or no data is available about a user, making it impossible to identify similar consumers [15]. In addition, data sparsity exacerbates the problem when there are a lot of unrated items in the user-item matrix. This occurs when there is not enough data to populate the user-item matrix on which to base reliable inferences [16]. In tourism, the collection of data is difficult and time-consuming due to the limited time that tourists spend at a destination. The cold start problem appears with first-time users (tourists), since there are no records of their purchasing activity at a specific destination. To address these CF problems, recent methods utilise machine learning techniques such as matrix factorisation to approximate the user-item matrix content using latent variables that emerge from the initial data. The singular value decomposition (SVD), optimized SVD (SVD++), and non-negative matrix factorization (NMF) models factorize the user-item matrix and predict the satisfaction of users for products that are unknown to them [17]. Alternatively, content-based approaches utilize metadata about new products to address the cold start problem. A useful source for obtaining these metadata is textual information from eWOM and its analysis using text analytics [18]. An example application includes work by Sun et al. [19] that improved CF performance by analysing
restaurants’ eWOM to define numerical features corresponding to consumers’ satisfaction through sentiment analysis. In the same vein, topic modelling techniques have been used with CF to assist in estimating the similarity between consumers or items [20]. Finally, work by Zhang et al. [21] used consumers’ or items’ characteristics to cluster them into groups, and then found correlations between clusters to address the data sparsity problem.

Recently, a strong interest has emerged in using the personalities of consumers in an effort to better understand and match their needs, as “personality” relates to the perceptions, feelings, motivations, and preferences of individuals [22]. The application of user personality has improved the performance of recommendations in the tourism domain for points of interest compared to traditional methods [23]. Personality-based recommendations have also been shown to greatly reduce the cold start and data sparsity problems, and to improve the performance of recommendations in areas such as online advertising, social media, books, and music [24]. However, these approaches do not take advantage of eWOM data from users on the web to extract their preferences and their personalities. They focus mainly on specialized questionnaires to collect consumers’ behaviours and personalities. Such approaches fail to continuously update the system because the time-consuming use of questionnaires leads to loss of automation and limits updates.

2.2. Personality Extraction from Text

Personality is a set of characteristics and behaviours of an individual that influence many areas of his/her life, such as motivations and preferences, as well as consumer preferences and behaviour [23]. Automated personality prediction has been applied by researchers to data from various social networks, such as Facebook and Twitter, to explore correlations between personalities and different user activities, purchasing behaviors, and liking of foods from specific cuisines [25].

The two most popular text-based personality classification methods are based on the Myers-Briggs Type Indicator (MBTI) [26] and the Big Five [27] personality traits, due to the availability of labelled data for these models. The classifiers with the best performance usually employ the MBTI personality model, which focuses on eight key characteristics grouped into four pairs: Extraversion or Introversion, Sensing or Intuition, Thinking or Feeling, and Judging or Perceiving. The combination of characteristics yields 16 different personality types, and people are classified into the corresponding personality cluster [13]. The Big Five personality model expresses personality in the following five dimensions: Agreeableness, Extraversion, Openness to Experience, Conscientiousness, and Neuroticism. Such taxonomies are recognized as a valid mechanism for defining the most essential aspects of personality that describe people’s characteristics, which create and reflect their behaviour [28].

Personality prediction is an important phase of personality-aware recommender systems, and the two main methods for doing so are questionnaires and automated means. Generally, questionnaires are more accurate in assessing personality; however, the process is tedious, while the automated approach is easier to conduct, utilising a user’s existing data, which can be text, images, videos, likes (behavioural data), etc. [18]. Predicting personality from text is a popular automated approach based on personality theory, which claims that words can reveal some psychological states and the personality of the author of the text. There are two main categories of techniques, feature-based and deep learning: the former uses unigrams/n-grams (open vocabulary approach) or lexicons (closed vocabulary) of features relevant to personality, and the latter uses text embeddings learned from a large corpus of text in an unsupervised manner (language models). Popular feature-based methods utilize the Mairesse [29] and Linguistic Inquiry and Word Count [30] techniques. Features from these are fed into different machine learning classifiers (e.g., Naïve Bayes, support vector machines) to make predictions. Obtaining such features, however, is a costly process, and the features cannot effectively represent the original text semantics. To avoid feature engineering, deep neural models and language models are employed to learn text representations, which currently result in improved accuracy. Deep models focus on the context of the text and not just on a static representation of a word or a sentence. These deep learning techniques use an attention mechanism that assigns weights to words based on how they are used in a text, making it possible to capture the semantic content [31]. A popular architecture is BERT (Bidirectional Encoder Representations from Transformers), which utilizes the transformer neural network architecture. Attention-based transformers have shown that capturing the semantics of a text improves the performance level and the prediction accuracy of ML personality models [32]. Given this, the method proposed in this paper utilizes attention-based personality prediction.

Most approaches use a binary classifier for each of the personality traits (MBTI), such as a classifier for extraversion-introversion. Such methods require data pre-labelled with the personality class. The first step in the process is the vectorization of the text into a form that can be processed by ML algorithms [33]. This can be done using open/closed lexicons or, in the case of deep learning methods (BERT), sentence embeddings. The vectorized data is used to train a classifier using the data labels, or to fine-tune a pretrained model in the case of BERT. The trained and validated model can then be used to make predictions on unseen data. Recent personality classification techniques that
utilize deep learning for Big Five personality prediction, such as DeepPerson [34], demonstrate classification performance (AUC score) of around 70% per personality dimension, using different training datasets, which is much lower compared to classifiers that use MBTI data.

3. Technical Background

The proposed method utilises named entity recognition to extract food preferences, automated eWOM topic modelling for the identification of themes discussed in the text of reviews, BERT-based personality classification, and ensemble tree-based regression for the prediction of consumer restaurant ratings.

3.1. User’s Food Preference Extraction

Named entity recognition (NER) is utilized to extract the food preferences of customers. NER is a major component of NLP systems for extracting information from unstructured text. An entity can be any word or sentence that refers to the concept in question. There are two main approaches for the creation of a NER model, the model-based and the rule-based approach. The latter focuses on grammatical rules and linguistic terms to extract entities. The model-based approach generates machine learning models using text with pre-labelled entities. Most food NER models, as reported in [35], are trained on data that did not include Cyprus dishes, thus their predictions were insufficient for our case. There is a wide range of generic libraries suitable for NER, such as NLTK, spaCy, Stanford NER, Stanza, and Flair, but none was able to provide appropriate food recognition in text.

To extract food preferences, the spaCy library was utilized, and several rules were specified that enabled the extraction of sentences that mention food consumption, such as “I ate ”, “I had for dinner”, etc. To generate a sufficient training set, a local and international food recipe dataset was used. The returned sentences were annotated automatically based on the position in the sentence where the food entity occurred. This was necessary in order to create a training dataset labelled with food names and their start/end positions in the sentence, on which the food NER training was performed. The trained NER achieved an overall accuracy of 81% (70/30 train-test split) and was applied to the restaurant reviews to extract the foods associated with each review. A large number of food entities was generated, with a lot of repetitions due to different spellings. Thus, to reduce the dimensionality of the dataset, a feature selection process was performed using a random forest machine learning model to identify the most important food names, using the review ratings as the target variable. The process yielded the optimum number of features (220) that resulted in the best model performance. The selected food features were then one-hot encoded for each consumer review. To identify the food preferences of each user, reviews were grouped by user and the most frequent food entities in each user’s reviews were considered as food preferences. This process assumes that, when customers visit different restaurants and write comments about the food they ordered, this constitutes a food preference of the user, irrespective of the food’s quality and the review rating.

3.2. EWOM Topic Modelling

Topic modelling is a popular tool for extracting information from unstructured data and is used in this work to identify themes discussed by consumers in eWOM. Topic models generally involve a statistical model aiming at finding topics that occur in a collection of documents [36]. Two of the most popular techniques for topic analysis are Latent Dirichlet Allocation and the Structural Topic Model (STM). In this study, the STM approach [37] is used to develop a topic model due to its ability to incorporate reviews’ metadata, such as sentiment (rating>3), that help with interpreting and naming the identified topics. Each topic in STM represents a set of words that occur frequently together in a corpus, and each document is associated with a probability distribution over topics. The process for learning the topic model starts with data preprocessing that includes removal of common and custom stop-words and irrelevant information (punctuation), followed by tokenization (breaking sentences into word tokens) and stemming (converting words to their root form). Initially, common stop-words were considered; gradually, with the refinement of the model, additional stop-words that were irrelevant to our goal, such as names of people, restaurants, cities, etc., were added to the list of custom stop-words. The optimum number of topics that best fits the dataset is identified through an iterative process examining different values for the number of topics (K) and inspecting the semantic coherence, held-out likelihood, and exclusivity of the model at each iteration until a satisfactory model is produced [37]. Coherence measures the degree of semantic similarity between high-scoring words in the topic. Held-out likelihood tests a trained topic model using a test set that contains previously unseen documents. Exclusivity measures the extent to which top words in one topic are not top words in other topics. The naming of the topics was performed manually based on domain knowledge and the most prevalent words that characterize each topic.
Figure 1: Overview of the approach and its evaluation
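
The exclusivity criterion described in Section 3.2 can be illustrated with a small sketch. This is a simplified proxy written for illustration only: it is not the exclusivity statistic computed by the STM software, and the function name `top_word_exclusivity` and the toy topics are assumptions, not part of the study. For each topic, it reports the share of top words that do not appear among the top words of any other topic.

```python
def top_word_exclusivity(topic_top_words):
    """Share of each topic's top words that no other topic lists as a top word.

    `topic_top_words` maps a topic name to its list of top words.
    A score of 1.0 means the topic's top words are fully exclusive.
    Simplified proxy; STM's own exclusivity statistic is computed differently.
    """
    scores = {}
    for topic, words in topic_top_words.items():
        # Collect the top words of every other topic.
        others = set()
        for other_topic, other_words in topic_top_words.items():
            if other_topic != topic:
                others.update(other_words)
        unique = [w for w in words if w not in others]
        scores[topic] = len(unique) / len(words) if words else 0.0
    return scores


# Hypothetical top words for two topics (illustrative values only).
topics = {
    "service": ["staff", "friendly", "waiter", "food"],
    "food": ["food", "fresh", "tasty", "portion"],
}
scores = top_word_exclusivity(topics)
print(scores)
```

When candidate models with different numbers of topics (K) are compared, a score of this kind would be inspected alongside coherence and held-out likelihood rather than on its own.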

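The per-user aggregation step of Section 3.1 (grouping reviews by user and keeping the most frequent food entities) can be sketched as follows. This is a minimal illustration under stated assumptions: the function name `user_food_preferences`, the `top_n` parameter, the lower-casing of entity names, and the sample reviews are all hypothetical, and the food entities are assumed to have already been produced by a NER model.

```python
from collections import Counter, defaultdict


def user_food_preferences(reviews, top_n=3):
    """Return the top_n most frequently mentioned food entities per user.

    `reviews` is an iterable of (user_id, food_entities) pairs, where
    food_entities is the list of foods recognized in one review.
    """
    counts = defaultdict(Counter)
    for user_id, foods in reviews:
        # Lower-casing is an assumed normalization to merge spelling variants.
        counts[user_id].update(food.lower() for food in foods)
    return {
        user: [food for food, _ in counter.most_common(top_n)]
        for user, counter in counts.items()
    }


# Hypothetical NER output for four reviews written by two users.
reviews = [
    ("u1", ["halloumi", "souvlaki"]),
    ("u1", ["Halloumi", "moussaka"]),
    ("u1", ["halloumi"]),
    ("u2", ["sushi"]),
]
prefs = user_food_preferences(reviews, top_n=1)
print(prefs)
```

Note that the counts ignore the review rating, which mirrors the assumption stated in Section 3.1 that mentioning a food constitutes a preference irrespective of the rating given.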
3.3. BERT Personality Classification

The “attention” mechanism in deep learning models has recently demonstrated state-of-the-art performance in numerous text analysis tasks such as classification.

BERT uses a multi-layer bidirectional transformer encoder and is inspired by the concept of knowledge transfer, since in many problems it is difficult to access a sufficiently large volume of labelled data to train deep models. In transfer learning, a pre-trained model is learned from massive unlabeled datasets that do not represent the target problem but allow the learning of general knowledge. BERT-like approaches provide pretrained models whose embedded knowledge can be transferred to a target domain where labelled data is limited. Fine-tuning such models is performed using a labelled dataset representing the actual problem; this tunes the model to the task at hand. Fine-tuning adds a feedforward layer on top of the pre-trained BERT. Previous work has demonstrated that this pre-training and fine-tuning approach outperforms existing text classification approaches. In our case, fine-tuning the BERT model was performed using publicly available personality-labelled data. BERT has been used for personality prediction using the Personality Cafe MBTI dataset in [38], achieving an accuracy of around 0.75. In contrast, other deep learning methods that use the Big Five model and the popular stream-of-consciousness essay dataset, such as the one reported in [39] using a CNN, achieve inferior classification performance.

Despite their good results, BERT-based approaches have been criticized because their best performance is reported with short texts. Long texts are those with more than 512 tokens. Such texts, however, are computationally expensive to process, thus most transformer models limit the number of tokens they can process simultaneously. In our case, most reviews produced by consumers exceeded the 512-token limit, and thus the prediction of personality was treated as a long-text classification problem. Different methods exist for dealing with this issue, including the naïve head-only and tail-only approaches and semi-naïve approaches, which use the first tokens of the text, the last tokens, or a combination of the first, last, and most important words in the text. Such approaches lose information but have a minimal computational cost. Recent works have sought to alleviate the
computational cost constraint by applying more sophisticated models to longer text instances, such as dividing the long text into chunks and combining the embeddings of the chunks. However, work by Sun et al. [40], which investigated different long-text treatment methods for consumer reviews, showed that the best classification performance is achieved using naïve methods, such as using only the head or tail tokens of the text while dropping all other content. In this work, we explore the naïve and semi-naïve methods to find the one with the best personality classification performance prior to labelling users with their personality. The results, described in a subsequent section, show that the naïve approach yielded the best performance, which is in line with [40].

3.4. XGBoost Regression

XGBoost regression is used in this study due to its ability to produce good results in similar problems. It is an ensemble method; hence multiple trees are constructed, with the training of each tree depending on the errors of the previous trees’ predictions. Gradient descent is used to generate new trees based on all previous trees while optimizing for loss and regularization. The XGBoost regularization component balances the complexity of the learned model against predictability. XGBoost optimization is required to minimize model overfitting and to treat data imbalance, by tuning multiple hyperparameters. The optimal values of the hyperparameters can be determined with different techniques, such as exhaustive (grid) search, Bayesian optimization, or random search. The grid search method evaluates all combinations of the candidate values of each parameter to obtain the model with the best performance, while the Bayesian method utilizes results from previous optimization cycles to identify hyperpa-

matrix is generated with rows corresponding to consumers and columns to restaurants. The cells of the matrix contain ratings when these are available, since customers did not visit all restaurants;
3. Development of a topic model using as corpus the eWOM (reviews) to identify consumers’ opinions and how these are associated with each review. Restaurants’ topics are generated by averaging the topic theta values associated with each restaurant. This represents common consumer opinions per restaurant;
4. Assessment of customers’ personality from eWOM is achieved using the MBTI BERT personality classification model;
5. Food preferences of users are extracted from the eWOM text using a custom NER model;
6. The explicit information from each restaurant is combined with implicit information that emerges from personality analysis, food preference, and topic modelling. These features are used collectively to enhance the user-item matrix and to train an Extreme Gradient Boosting (XGBoost) regressor model, using as output variable the user rating of restaurants, which takes values in the range [1-5]. The XGBoost model is optimized using hyperparameter tuning and validated using a train/test data split (70/30). The trained model is used to predict user ratings for restaurants that users have never visited;
7. The performance of the XGBoost model is compared against that of three popular model-based CF techniques, namely SVD, SVD++, and NMF. The comparison models are trained using the ini-
rameters values with higher probability in improving the             tial user-item matrix while the XGBoost using the
classifiers performance. Grid search is better but slower            enhanced user-item matrix that includes explicit
while Bayesian is faster but not as accurate. In this work,          and implicit information. The performance of
the grid search approach is adopted.                                 the models is assessed using popular evaluation
                                                                     metrics such as mean absolute error (MAE), mean
                                                                     squared error (MSE), and root mean squared error
4. Methodology                                                       (RMSE).

The methodology employed to address our research ques-
tion is presented in Figure 1 and is implemented via the      5. Results
following steps.
                                                              The data collected includes 105k reviews written in En-
    1. Collection of restaurant reviews from TripAdvi-        glish from tourists who visited Cyprus between 2010
       sor and extraction of consumers’ eWOM and ad-          to 2020 and posted reviews about their experience with
       ditional explicit information of restaurants such      restaurants in Cyprus (publicly available). The total num-
       as cuisine type, price range, and value for money;     ber of unique users were 56800 and the number of restau-
    2. Preprocessing of the data and preparation for sub-     rants were 650. Figure 2 depicts descriptive statistics of
       sequent analyses (topic modelling, personality         reviews ratings per year. For this study only users with
       classification). Preprocessing includes punctu-        at least 20 reviews are considered and only restaurants
       ations and URLs elimination, lowering of text,         with at least 50 reviews yielding 93 unique users and 410
       stop words removal, tokenization, stemming, and        venues.
       lemmatization. During this step, the user-item
Figure 2: Percentage of restaurant review ratings [1-5] per
year from 2010 to 2020
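
The filtering of users and venues by review count, and the user-item matrix built from the remaining reviews, can be illustrated with a minimal sketch. The column names (`user_id`, `restaurant_id`, `rating`) and both function names are hypothetical, and the sketch assumes that the two counts are taken once on the unfiltered data; the paper does not state whether the thresholds were applied in a single pass or iteratively.

```python
import pandas as pd


def filter_by_activity(reviews, min_user_reviews=20, min_restaurant_reviews=50):
    """Keep reviews whose author and venue both meet the review-count thresholds.

    Assumption: both counts are computed once on the unfiltered data
    (no iterative re-filtering); the paper does not specify this detail.
    """
    user_counts = reviews["user_id"].map(reviews["user_id"].value_counts())
    venue_counts = reviews["restaurant_id"].map(
        reviews["restaurant_id"].value_counts()
    )
    keep = (user_counts >= min_user_reviews) & (
        venue_counts >= min_restaurant_reviews
    )
    return reviews[keep]


def build_user_item_matrix(reviews):
    """Pivot reviews into a user-item matrix; NaN marks an unvisited restaurant."""
    return reviews.pivot_table(
        index="user_id", columns="restaurant_id", values="rating", aggfunc="mean"
    )
```

Repeated reviews of the same venue by the same user are averaged here, which is one possible choice among several.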

5.1. Learned Topic Model
To extract the themes that consumers discuss in eWOM, an STM topic model was developed using the estimated optimum number of topics (K = 30). This value was selected based on the model’s performance metrics in Figure 3, with a focus on high coherence, high held-out likelihood, low residuals, and high lower bound scores.
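
The methodology derives restaurant-level topics by averaging the per-review topic proportions (the theta values of the topic model) over each restaurant's reviews. A minimal sketch of that aggregation follows, assuming the theta matrix has been exported to a table with one row per review and one column per topic; the function name and the column and identifier names are hypothetical.

```python
import pandas as pd


def restaurant_topic_profiles(theta, restaurant_ids):
    """Average per-review topic proportions into one topic vector per restaurant.

    theta: DataFrame with one row per review and one column per topic.
    restaurant_ids: sequence giving the restaurant of each review, in row order.
    """
    keys = pd.Series(list(restaurant_ids), index=theta.index, name="restaurant_id")
    return theta.groupby(keys).mean()
```

Because each review's proportions sum to 1, each averaged restaurant profile also sums to 1, so it can be read as a topic distribution for the venue.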

Figure 3: Topic performance measures for identifying the optimum number of topics. The red circle indicates the K number of topics selected

   The naming of the topics in Table 1 was based on domain knowledge, the words with the highest probability in each topic, and the words with a high Lift score. Lift gives higher weight to words that appear less frequently in other topics.
   The probability distribution of topics per review denotes the probability of each topic being discussed in a review, and the topic probabilities in each review sum to 1. Reviews are associated with the distribution of topic prevalence per review. The trained STM model’s theta values per review refer to the probability that a topic is associated with each review. These theta values, shown in Figure 4, were used as features during the training of the XGBoost model along with other features.

Figure 4: Average theta values per topic

5.2. Personality Labelling

The training of the binary classifiers was performed using the Personality Cafe MBTI dataset, consisting of joint user posts on a social network labelled by personality type defined using the MBTI questionnaire. The dataset is publicly available on “Kaggle” [41]. To identify the BERT long-text approach with the best classification performance, two techniques were examined, namely the naïve and semi-naïve approaches, and the one with the best performance was used in the workflow. For the naïve approach, we used the head-only method with sequence lengths of 256 and 512 tokens, and for the semi-naïve approach we used chunking of the text into 128 tokens and combining the chunk embeddings. The results from this process, presented in Table 2, showed that the 512-naïve-head approach outperformed the other approaches and thus it was employed in users’ personality classification. Results from the BERT model outperformed personality models trained using the same dataset and thus improved our confidence in the personality prediction of each user.
   The personality distributions in Figure 5 show descriptive statistics regarding the personalities of users according to the detected personality from the MBTI BERT classifier fine-tuned using a labelled dataset and treating long text using the naïve-head approach with 512 tokens. The acronyms refer to combinations of dimensions of the MBTI model. The trained BERT model predicts for each dimension of the personality model the probability that a user belongs to any of the personality traits (i.e., probability for Extraversion – Introversion (E/I), Sensing –
Table 1
Specified names for the topics that emerged from STM analysis

  Topics     Words with high probability and lift scores                                           Topic Name
  Topic1:    great, really, music, live, day, atmosphere, enchiladas, music, really, live, pub     Entertainment Atmosphere
  Topic2:    nice, prices, atmosphere, reasonable, big, family, polite, quick, nice, relaxing,     Family Restaurant
             families, cafe
  Topic3:    time, excellent, went, night, amazing, first, every, occasions, stay, went, amaze,    Special Occasion
  Topic4:    eat, new, end, found, places, second, thai, always, second, none                      New Place
  Topic5:    lovely, recommend, highly, enjoyed, beautiful, setting, party, setting, party,        Party Place
             hosts, fabulous, thoroughly, absolutely
  Topic6:    well, lunch, local, attentive, wonderful, presented, chose, breaks, attentive,        Lunch
             presented, chose
  Topic7:    evening, bar, friends, though, group, customers, quiet, whiskies, though              Evening/Bar
  Topic8:    restaurant, location, must, beach, view, right, perfect, definitely, must, far        Location
  Topic9:    visit, will, back, really, worth, definitely, going, visit, called                    Worth Visiting
  Topic10:   wedding, amazing, even, similar, impression, organize, guest, events, beyond,         Wedding Place
  Topic11:   many, birthday, soon, booked, kitchen, also, october, good, love see, year this,      Celebration parties
             flight, travel, celebration
  Topic12:   experience, nothing, special, whole, maybe, dining, perfection, fiancée, maybe        Not Worth
  Topic13:   restaurant, probably, also, mountains, open, available, well, best, more, owner,      Out of town
             managers, troodos
  Topic14:   summer, use, even, range, late, cool, evenings, use, dine, cozy                       Summer location
  Topic15:   always, can, class, owners, restaurant, number, first, classy, number, varied,        Fabulous Place
             feeling, interesting, hidden
  Topic16:   two, outside, can, inside, sit, world, get, disappointed, aircon, magic, noise,       Outside eating
             traffic, heat
  Topic17:   thai, tourist, across, gem, trying, partner, avoid, duck, again, overall, always,     Asian Cuisine
             bespoke, gimmicks, hardcore
  Topic18:   bit, little, better, average, like, quite, expensive, much, however, criticisms,      Average Place
  Topic19:   different, small, cheese, also, breakfast, euros, greek, platter, platter, options,   Breakfast
             vegetarian, bacon, eggs
  Topic20:   wife, return, disappointed, restaurant, reviews, favourite, holiday, isn’t, trip,     Disappointment
  Topic21:   old, cypriot, road, stop, village, along, street, waitresses, road, walk              Stop during trips
  Topic22:   busy, get, people, table, lot, need, without, joyful, early, book                     Busy Place
  Topic23:   years, restaurant, made, visiting, coming, several, since, ago, forward, visits       Visit over years
  Topic24:   staff, friendly, always, see, come, welcoming, feel, chat, truly, smile, come         Welcoming Staff
  Topic25:   chips, served, priced, set, large, course, portion, adults, chips, portion, reason-   Good portions
  Topic26:   best, cyprus, don’t, ever, never, restaurants, know, traditional, meze                Traditional foods
  Topic27:   value, money, recommended, excellent, variety, high, meals , bringing, best,          Value for money
             cyprus, eaten
  Topic28:   couldn’t, enough, friend, eat, fresh, away, wow, take, out, basilica, excellent,      Fresh ingredients
             lovely, rice, more, time ,highly
  Topic29:   ordered, came, table, order, waiter, asked, arrived, minutes, waited, waitress,       Bad service
             left, seated, orders, bill
  Topic30:   just, restaurant, basic, way, like, standard, much, full, unacceptable, much,         Nothing Special

Intuition (S/N), Thinking – Feeling (T/F), and Judging – Perceiving (J/P)). Combinations of letters from each category generate 16 four-letter personality types: ISFJ, INFP, INFJ, ISTP, ISTJ, ISFP, INTP, INTJ, ENTP, ESFP, ENFP, ESFJ, ESTP, ESTJ, ENFJ and ENTJ, the distribution of which is depicted in Figure 5. The BERT model comprises 4 classifiers, one for each of the dimensions above. The classifiers’ average area under the curve (AUC) performance is 87%. This is an improved performance compared to alternative personality classification techniques that utilize deep learning and the Big Five personality model, such as DeepPerson [34], which achieved an AUC of around 70%.

Table 2
Performance results per long text treatment

  Long text treatment for MBTI BERT        AUC     ACC
  Naïve - head 512 tokens                  0.878   0.839
  Naïve - head 256 tokens                  0.784   0.759
  Semi naïve - Sliced text 128 tokens      0.653   0.662

Figure 5: Distribution of MBTI personality traits using each dimension’s acronyms

5.3. Training and Evaluating the XGBoost Models

The enhanced user-item matrix that emerged from the personality model, food preferences, and the topic associations per user and venue was used to train an XGB regressor model.
   The XGBoost model underwent hyperparameter tuning prior to training by tuning the model’s learning rate, gamma, subsample, and regularization options using grid search. Traditional recommendation models, namely SVD, SVD++, and NMF, were generated using the Surprise Python library. The models were compared based on the following performance metrics: the mean absolute error (MAE), which represents the average of the absolute differences between the real and predicted values; the mean squared error (MSE), which is the average of the squared differences; and the root mean squared error (RMSE), which is the square root of the MSE. Comparison of the two models against traditional recommendation techniques revealed an improved performance of the personality-based approaches over these baseline models. The traditional techniques were also optimized by tuning two hyperparameters, the number of factors and the regularization value.
   In the experiments conducted using the aforementioned restaurant reviews, the data was initially split into training and test sets (70/30) using stratified sampling to ensure that all user ratings are sufficiently represented in both samples. The models were hyperparameter-tuned, trained, and tested using the same samples. The aforementioned metrics were computed, and the results (see Table 3) show that the MBTI XGBoost model produced the best performance among all models. Both personality-based models outperformed the traditional approaches, which indicates that the use of personality and eWOM-extracted topics improved the recommendations.

Table 3
Performance results per model incorporating all features

  Performance metric (lower is better)    SVD     SVD++   NMF     XGB
  Mean Absolute Error (MAE)               0.65    0.68    0.82    0.40
  Mean Squared Error (MSE)                0.87    0.89    1.22    0.24
  Root Mean Squared Error (RMSE)          0.93    0.94    1.10    0.49

6. Conclusions

This study proposes a restaurant recommendation approach that combines user preferences with user personality, and constitutes one of the first studies that use customer preference along with personality in the restaurant recommendation problem. It utilizes a popular personality model (MBTI) to enhance the restaurant recommendation process by fine-tuning a BERT classification model on a personality-labelled dataset. Due to the length of the texts in the training data, the best long-text handling approach (naïve-head 512) was employed during BERT model tuning. eWOM themes are extracted through topic modelling from the eWOM text and are also used as additional features of restaurants and users that refer to implicit preferences of users and properties of restaurants. All aforementioned features are used collectively to train an XGBoost regressor to predict consumers’ satisfaction (i.e., rating) for unvisited restaurants. The results show that the MBTI model in combination with topics from eWOM outperforms the model-based collaborative filtering techniques, offering a first indication that the application of personality and food preferences in restaurant recommendation can have valuable results. Future work will focus on evaluating additional long-text handling techniques and combining the results of the learned classifiers with other traditional machine learning models in an ensemble manner to further improve the performance of personality classification, given that personality is a valuable feature that enhances restaurant recommendation.

formation Systems and Technologies, Springer International Publishing, Cham, 2022, pp. 13–21.
[10] A. Gregoriades, M. Pampaka, M. Georgiades, A Holistic Approach to Requirements Elicitation for Mobile Tourist Recommendation Systems, in: K. Arai, R. Bhatia (Eds.), Future of Information and Communication Conference, Springer International
