<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Empathic inclination from digital footprints</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Marco Polignano</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Pierpaolo Basile</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Gaetano Rossiello</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Marco de Gemmis</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Giovanni Semeraro</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>University of Bari “Aldo Moro”, Dept. of Computer Science</institution>
        </aff>
      </contrib-group>
      <abstract>
        <p>The large amount of personal data that users leave on the Internet is a valuable source of information for improving the efficacy of profiling tasks. In particular, data collected from social media can disclose personal habits, preferences and affective traits. This study focuses on the empathic inclination of a subject, i.e. the ability to feel and share another person's emotions, which can be a relevant aspect to consider in retrieval or recommendation processes. To support this idea, we propose a model that predicts the level of empathy and highlights its correlations with explicit features that characterize the user.</p>
      </abstract>
      <kwd-group>
        <kwd>Social media footprint</kwd>
        <kwd>Empathy</kwd>
        <kwd>Machine Learning</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Empathy Inclination Prediction Model</title>
      <p>of features, as sketched in Fig. , to predict an empathy score by different linear
regression models.</p>
      <p>Each user   is represented as the concatenation of five feature vectors,
each capturing a particular aspect of the user profile. The user’s preferences
are obtained by analyzing her likes on social media, grouped by topic; they
include likes of pages, artists, movies and many other topics of interest.
Likes are represented with SVD [ ], which captures relevant combinations of
concepts, and with LDA [ ], which captures combinations of descriptive topics.
The posts are analyzed by a pipeline that performs basic NLP operations (we
adopted TweetNLP as tokenizer: http://www.cs.cmu.edu/~ark/TweetNLP/), annotates
emoticons and removes character repetitions longer than two inside words. To
capture the semantics behind the words, we run the word2vec algorithm [ ] over
all the textual posts in the collection to learn  -dimension vectors,
considering only words that occur at least 10 times and using 10 epochs of
learning. Moreover, we divide the whole vocabulary of word2vec vectors into
clusters, which should represent topics of discussion.</p>
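      <p>As an illustration of the normalization step described above, the following
minimal Python sketch annotates emoticons and shortens character repetitions
longer than two. It is not the pipeline used in the study: whitespace splitting
stands in for the TweetNLP tokenizer, and the emoticon table, the tag names and
the function names are assumptions introduced only for this example.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import re

# Hypothetical, deliberately small emoticon inventory (not taken from the study).
EMOTICON_TAGS = {
    ":)": "EMO_POS",
    ":-)": "EMO_POS",
    ":D": "EMO_POS",
    ":(": "EMO_NEG",
    ":-(": "EMO_NEG",
}

# Any character that occurs three or more times in a row.
_REPEATS = re.compile(r"(.)\1{2,}")


def squeeze_repetitions(token):
    """Shorten runs of the same character to exactly two occurrences."""
    return _REPEATS.sub(r"\1\1", token)


def preprocess(post):
    """Split a post on whitespace, tag emoticons, and squeeze repetitions.

    Whitespace splitting is an assumption standing in for a real tokenizer.
    """
    tokens = []
    for raw in post.split():
        if raw in EMOTICON_TAGS:
            tokens.append(EMOTICON_TAGS[raw])
        else:
            tokens.append(squeeze_repetitions(raw.lower()))
    return tokens
```
]]></preformat>
      <p>Under these assumptions, preprocess("Sooooo happyyy today :)") yields the
tokens soo, happyy, today and EMO_POS.</p>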
    </sec>
    <sec id="sec-2">
      <title>Experimental Session</title>
      <p>The aim of the experiment is to predict the user’s empathy by exploiting
the information explicitly available on her Facebook profile, as well as
implicit information that can be inferred from it, as explained in Sec. .
Moreover, we want to identify which groups of features matter most for an
accurate prediction, by discovering relevant correlations between empathy and
the user’s features.</p>
      <p>More precisely, we formulated the following research questions:
– RQ1. Is it possible to predict empathy from social media footprints?
– RQ2. What are the most important features to consider for improving the
prediction accuracy?</p>
      <p>The dataset used in the experiment, proposed by Kosinski [ ], contains
information about 4 million Facebook users. The data were collected through the
“myPersonality” Facebook application. We removed the users who had not
completed the questionnaire or who could not be linked to the other data
(Demographic, Personality Traits, Activity, Status). After this step, the
dataset consists of 903 users and 178,766 status updates. The range of the
empathy value is  - . We exploit three different regression algorithms:
1) Linear Regression ( ), 2) Simple Regression ( ), 3) different kernel
configurations of SVM Regression with the SMO algorithm ( ). For the SVM
regression we used the polynomial kernel ( ) […] while the latter baseline
computes the empathy score as the simple average of the EQS observed in the
dataset (Avg EQS, Value Predicted= , MAE= , RMSE= ).</p>
      <p>As for the evaluation metrics, we adopted the Root Mean Square Error
(RMSE) and the Mean Absolute Error (MAE). The evaluation protocol was  -fold
cross validation.</p>
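      <p>The average-based baseline and the two error measures can be sketched as
follows. This is a minimal Python illustration rather than the code used in the
experiments: the function names are hypothetical, the folds are contiguous and
unshuffled, and the average is computed on the training folds only, which is an
assumption, since the text does not say how the baseline average was obtained.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import math


def mae(y_true, y_pred):
    """Mean Absolute Error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root Mean Square Error."""
    squared = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(squared / len(y_true))


def k_fold_indices(n, k):
    """Split range(n) into k contiguous folds whose sizes differ by at most one."""
    base, extra = divmod(n, k)
    folds, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def evaluate_mean_baseline(scores, k):
    """Cross-validate a predictor that always outputs the training-fold mean."""
    y_true, y_pred = [], []
    for test_idx in k_fold_indices(len(scores), k):
        held_out = set(test_idx)
        train = [s for i, s in enumerate(scores) if i not in held_out]
        mean_score = sum(train) / len(train)  # assumption: mean of training folds
        for i in test_idx:
            y_true.append(scores[i])
            y_pred.append(mean_score)
    return mae(y_true, y_pred), rmse(y_true, y_pred)
```
]]></preformat>
      <p>Calling evaluate_mean_baseline(scores, k) on a list of empathy scores
returns the pair (MAE, RMSE) accumulated over all held-out folds, which is the
kind of reference point a learned regressor has to beat.</p>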
    </sec>
    <sec id="sec-3">
      <title>Discussion of Results</title>
      <p>We executed a first experiment by running Linear Regression, Simple
Regression and SVM regression on all the features of the dataset (  features in
total). We compared the results in   with our baselines, observing that SVM
regression with a polynomial kernel is not a good choice when the number of
features is large. On the contrary, SVM regression with an RBF kernel
outperforms both baselines when setting  = 1 (MAE = 5.9101, RMSE = 8.2341).
These results allow us to answer RQ1 positively. Simple Regression also gives
interesting results: its MAE and RMSE are better than the baselines, even
though this algorithm builds a regression function from only the feature with
the highest variance in the dataset.</p>
      <p>Given these findings, we decided to perform feature selection. We used
correlation-based feature subset selection to find the set of “most
informative” features for the prediction task. The selected features are those
that correlate highly with the prediction class and weakly with each other. We
obtained a set of 37 features. The best result in terms of MAE (5.6673) is
obtained by the   , with  = 2. This configuration does not provide the best
RMSE (7.8236), which is achieved by    with  = 8. For the   configuration, the
best result for both MAE and RMSE is obtained with  = 1 (5.714, 7.8407). It is
interesting to note that the results obtained with only the selected features
are better than both the baselines and the runs over the whole set of
features.</p>
      <p>Analyzing the features that emerged from the selection process, we note
some interesting correlations between their semantics and the empathy
inclination of the user. In particular, we observed that an accurate prediction
has to consider the user’s religion (Nonreligious/Atheist), country (AG, EG,
KW, HN, AR, SR), relationship_status (Separated), personality (extroversion,
agreeableness) and some relevant word2vec clusters: cluster_ : game, team,
soccer, battle, race, fans, bowling; cluster_ : dear, cheers, goody,
extraordinaire, excitedly; cluster_ : personality, motivation, destiny,
ability, vision; cluster_ : facebook, phone, message, internet, video. These
correlations can be used as hints for user profiling and partially answer
RQ2.</p>
      <p>We therefore performed an ablation analysis for further investigation. We
selected the best configuration,    with  = 1, and removed one set of features
at a time. Removing groups of features such as demographic, activity or LDA
caused only a slight change in MAE and RMSE. On the contrary, removing the set
of personality features caused a significant increase in both MAE (9.6308) and
RMSE (9.0815). This provides a more specific answer to RQ2: personality traits
are the key to effective empathy prediction.</p>
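      <p>Correlation-based feature subset selection scores a subset of k features
with the merit k * r_cf / sqrt(k + k * (k - 1) * r_ff), where r_cf is the mean
feature-class correlation and r_ff the mean feature-feature correlation. The
Python sketch below pairs this merit with a greedy forward search. It is only
an illustration under stated assumptions: the absolute Pearson correlation is
used as correlation measure, the search strategy and all function names are
chosen for the example, and none of this is claimed to match the configuration
used in the experiments.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import math


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either input is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def cfs_merit(subset, columns, target):
    """Merit of a non-empty feature subset: k*r_cf / sqrt(k + k*(k-1)*r_ff)."""
    k = len(subset)
    r_cf = sum(abs(pearson(columns[f], target)) for f in subset) / k
    pairs = [(a, b) for i, a in enumerate(subset) for b in subset[i + 1:]]
    if pairs:
        r_ff = sum(abs(pearson(columns[a], columns[b])) for a, b in pairs) / len(pairs)
    else:
        r_ff = 0.0
    return k * r_cf / math.sqrt(k + k * (k - 1) * r_ff)


def greedy_cfs(columns, target):
    """Forward search: add the feature that raises the merit most, else stop."""
    selected, best = [], 0.0
    remaining = sorted(columns)
    while remaining:
        merit, feature = max(
            (cfs_merit(selected + [f], columns, target), f) for f in remaining
        )
        if merit <= best:
            break  # no candidate improves the current subset
        selected.append(feature)
        remaining.remove(feature)
        best = merit
    return selected, best
```
]]></preformat>
      <p>With a toy target and two perfectly redundant features, the search keeps
only one of them, because adding the duplicate does not raise the merit; this
mirrors the preference for features that correlate with the class but not with
each other.</p>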
    </sec>
    <sec id="sec-4">
      <title>Conclusion</title>
      <p>In this paper, we investigated the problem of mining social media
footprints to infer the user’s inclination toward empathy. The main outcome of
the experiments is that a strong correlation is observed between empathy and
personality traits. As future work, we plan to include the findings of this
preliminary study in the user profile and to exploit them in a recommendation
strategy.</p>
    </sec>
  </body>
  <back>
    <ref-list />
  </back>
</article>