<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Team ITST at FinancES 2023: A Psycholinguistic-based Sentiment Analysis Approach</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>María del Pilar Salas-Zárate</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Mario Andrés Paredes-Valverde</string-name>
          <email>mario.pv@teziutlan.tecnm.mx</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Tecnológico Nacional de México/I.T.S. Teziutlán</institution>
          ,
          <addr-line>Fracción l y ll SN, 73960 Teziutlán, Puebla</addr-line>
          ,
          <country country="MX">Mexico</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2023</year>
      </pub-date>
      <volume>000</volume>
      <fpage>0</fpage>
      <lpage>0001</lpage>
      <abstract>
        <p>This paper describes the participation of the ITST team in the FinancES 2023 shared task on financial targeted sentiment analysis in Spanish. We propose a sentiment analysis approach based on psycholinguistic features obtained through the LIWC tool. Because LIWC provides a large number of features, we apply a feature selection technique based on Rough Set Theory and Information Gain to eliminate irrelevant features and thus improve the performance of the generated model. The feature selection results show a marked difference between the LIWC categories selected for determining the sentiment polarity of each news headline towards consumers and those selected for companies.</p>
      </abstract>
      <kwd-group>
        <kwd>Natural Language Processing</kwd>
        <kwd>Psycholinguistic features</kwd>
        <kwd>LIWC</kwd>
        <kwd>Feature selection</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Sentiment analysis has become a relevant technology because it allows us to understand people's
opinions on topics such as product preferences, marketing campaigns, and politics, among
others. Today, a wealth of information related to the financial domain is published on the web.
This represents a great opportunity to understand public opinion and to build solutions based on
sentiment analysis that support decision making. However, sentiment analysis in the financial context
remains a major challenge: the language used in this context is inherently complex, since
financial terms refer to an underlying social, economic, and legal context
        <xref ref-type="bibr" rid="ref5">(Milne &amp; Chisholm, 2013)</xref>
        .
      </p>
      <p>
        This paper concerns our participation in the FinancES 2023 task
        <xref ref-type="bibr" rid="ref2">(García-Díaz et al., 2023)</xref>
        , which
is part of IberLEF 2023
        <xref ref-type="bibr" rid="ref4">(Jiménez-Zafra et al., 2023)</xref>
        . This work seeks to determine whether
psycholinguistic features can improve the performance of sentiment analysis in the financial
domain. For this purpose, the financial dataset was analyzed using the Spanish version of the Linguistic
Inquiry and Word Count (LIWC) program
        <xref ref-type="bibr" rid="ref10">(Tausczik &amp; Pennebaker, 2009)</xref>
        . LIWC counts words in
psychologically meaningful categories such as cognitive processes, positive emotions, negative emotions,
discrepancy, negation, and certainty, among others. We also implemented a feature selection process based
on Rough Set Theory and Information Gain, which aims to improve model performance by
eliminating irrelevant categories so that the model focuses on the most important information;
this can result in better generalization.
      </p>
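      <p>As a rough illustration of the kind of dictionary-based counting that LIWC performs, the following Python sketch computes, for a single headline, the percentage of tokens that fall into each category. The lexicon <monospace>TOY_LEXICON</monospace>, its category labels and word lists, the function name <monospace>category_percentages</monospace>, and the sample headline are all assumptions made for this example; the actual Spanish LIWC dictionary is far larger, and the tool's own tokenization and matching rules are not reproduced here.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import re
from collections import Counter

# Hypothetical toy lexicon (assumption): category labels and word lists are
# illustrative only and are NOT the licensed Spanish LIWC dictionary.
TOY_LEXICON = {
    "EmoNeg": {"crisis", "deuda", "quiebra"},
    "EmoPos": {"beneficio", "mejora"},
    "Dinero": {"euros", "precio", "deuda"},
}


def category_percentages(text, lexicon=TOY_LEXICON):
    """Return, per category, the percentage of tokens found in its word set."""
    tokens = re.findall(r"\w+", text.lower())
    counts = Counter()
    for token in tokens:
        for category, words in lexicon.items():
            if token in words:
                counts[category] += 1
    total = len(tokens) or 1  # avoid division by zero on empty input
    return {category: 100.0 * counts[category] / total for category in lexicon}


# Made-up headline used only to exercise the function.
features = category_percentages("La crisis aumenta la deuda en 20 millones de euros")
```
]]></preformat>
      <p>Under these assumptions, each headline becomes a vector of category percentages, which is the kind of representation that a feature selection step can then prune.</p>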
      <p>The next section describes the strategies developed for identifying the main economic target of news
headlines and for determining the sentiment polarity of each news headline towards both
companies and consumers. Finally, in the final remarks, we discuss our results and outline future
work.</p>
      <p>2023 Copyright for this paper by its authors.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Developed strategies</title>
    </sec>
    <sec id="sec-3">
      <title>2.1. Task 1: Financial targeted sentiment analysis</title>
    </sec>
    <sec id="sec-4">
      <title>2.1.1. Determining the sentiment polarity</title>
      <p>The process described above was implemented for both this task (determining the sentiment
polarity) and Task 2 (financial sentiment analysis at document level for companies and consumers).
However, different categories were selected for each task. For Task 1, the feature selection
process selected 45 categories, which are shown in Table 1. As can be seen,
the linguistic process and psychological process dimensions contain the largest number of selected
categories. This may be because such categories are more closely related to the emotions present in the
expression of opinions, such as anxiety, anger, sadness, and positive and negative emotion, among
others.</p>
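      <p>To make the Information Gain part of a feature selection step concrete, the following Python sketch ranks discretized features by IG(Y; X) = H(Y) - H(Y | X), where Y is the polarity label and X is a feature. It is only an illustration under stated assumptions: the toy table, the high/low discretization, and the function names are hypothetical, and the way Information Gain is combined with Rough Set Theory in this work is not reproduced.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import math
from collections import Counter, defaultdict


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(feature_values, labels):
    """IG(Y; X) = H(Y) - H(Y | X) for a discrete feature X and class labels Y."""
    groups = defaultdict(list)
    for value, label in zip(feature_values, labels):
        groups[value].append(label)
    total = len(labels)
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def rank_features(table, labels):
    """Return (feature name, gain) pairs sorted by decreasing information gain."""
    gains = {name: information_gain(values, labels) for name, values in table.items()}
    return sorted(gains.items(), key=lambda item: item[1], reverse=True)


# Made-up, already discretized toy data (assumption; not the FinancES dataset).
labels = ["negative", "negative", "positive", "positive"]
table = {
    "EmoNeg": ["high", "high", "low", "low"],  # separates the two classes
    "Comma": ["low", "high", "low", "high"],   # independent of the class
}
ranking = rank_features(table, labels)
```
]]></preformat>
      <p>In this toy example the first feature receives the maximum gain of one bit and the second receives zero, so a threshold on the gain would keep only the first.</p>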
    </sec>
    <sec id="sec-5">
      <title>2.2. Task 2: Financial Sentiment Analysis at document level for companies and consumers</title>
    </sec>
    <sec id="sec-6">
      <title>2.2.1. Determining the sentiment polarity of news headlines towards companies</title>
      <p>As mentioned above, Task 2 employed the process shown in Figure 1. Specifically, for
sentiment analysis towards companies, the feature selection process resulted in a total of 23 selected LIWC
categories (see Table 2). Again, the linguistic process (10) and psychological process (10) dimensions
contain the largest number of categories. Within the psychological process dimension, the categories of
negative emotions (EmoNeg), anger (Enfado), and anxiety (Ansiedad) stand out as the most
discriminating features in news headlines towards companies.</p>
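      <p>One common way to use Rough Set Theory for feature selection is through the dependency degree, the fraction of objects whose equivalence class under a set of condition attributes maps to a single decision value, combined with a greedy forward search in the style of QuickReduct. The Python sketch below illustrates that generic idea only; it is not claimed to be the exact procedure used to obtain the categories reported here, and the toy rows, attribute values, and function names are assumptions.</p>
      <preformat preformat-type="code"><![CDATA[
```python
from collections import defaultdict


def dependency_degree(rows, condition_attrs, decision_attr):
    """Fraction of rows whose equivalence class (same values on condition_attrs)
    contains a single decision value, i.e. the size of the positive region
    divided by the number of rows."""
    classes = defaultdict(list)
    for row in rows:
        key = tuple(row[a] for a in condition_attrs)
        classes[key].append(row[decision_attr])
    positive = sum(len(d) for d in classes.values() if len(set(d)) == 1)
    return positive / len(rows)


def greedy_reduct(rows, attrs, decision_attr):
    """QuickReduct-style forward search: repeatedly add the attribute that most
    raises the dependency degree until it matches that of the full attribute set."""
    target = dependency_degree(rows, attrs, decision_attr)
    selected = []
    while dependency_degree(rows, selected, decision_attr) < target:
        best = max(
            (a for a in attrs if a not in selected),
            key=lambda a: dependency_degree(rows, selected + [a], decision_attr),
        )
        selected.append(best)
    return selected


# Made-up, discretized toy rows (assumption; not the FinancES dataset).
rows = [
    {"EmoNeg": "high", "Enfado": "high", "Comma": "low", "polarity": "negative"},
    {"EmoNeg": "high", "Enfado": "low", "Comma": "high", "polarity": "negative"},
    {"EmoNeg": "low", "Enfado": "low", "Comma": "low", "polarity": "positive"},
    {"EmoNeg": "low", "Enfado": "high", "Comma": "high", "polarity": "positive"},
]
reduct = greedy_reduct(rows, ["EmoNeg", "Enfado", "Comma"], "polarity")
```
]]></preformat>
      <p>A greedy search of this kind does not guarantee a minimal reduct, which is one reason it is often paired with a ranking criterion such as Information Gain.</p>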
    </sec>
    <sec id="sec-7">
      <title>2.2.2. Determining the sentiment polarity of news headlines towards consumers</title>
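      <table-wrap>
        <table>
          <thead>
            <tr><th>Categories</th></tr>
          </thead>
          <tbody>
            <tr><td>Articulo, WPS, WC, Dic, Funct, Numeros, informal, Futuro, Conjunc, BigWords, PronImp, ElElla, PronPer, TotPron, Cuantif</td></tr>
            <tr><td>Triste, Tiempo, Ansiedad, Inhib, Relativ, EmoNeg, MecCog</td></tr>
            <tr><td>Hogar, Dinero, Trabajo</td></tr>
            <tr><td>Comma, AllPunc</td></tr>
          </tbody>
        </table>
      </table-wrap>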
      <p>Finally, Figure 4 shows the top 10 features for the Task 2 sentiment analysis of news headlines towards
consumers. As can be seen, the features selected for the
analysis of news headline sentiment towards companies and towards consumers differ considerably. One of the main differences
is that psycholinguistic features related to negative emotions, such as negative emotion
(EmoNeg) and anger (Enfado), which were selected for companies, do not appear for
consumers. In the latter case, a greater number of features (six) belong to the
linguistic process dimension.</p>
    </sec>
    <sec id="sec-8">
      <title>Results</title>
    </sec>
    <sec id="sec-9">
      <title>3. Final remarks</title>
      <p>This paper presented a sentiment polarity detection approach for the IberLEF 2023 FinancES
shared task on financial targeted sentiment analysis in Spanish. For polarity detection, we
proposed an approach that extracts psycholinguistic features from the opinions through the LIWC tool. In addition,
a feature selection process was performed to identify and select the most relevant variables for
building the classifier model. The feature selection process produced different
results for each of the targets (companies and consumers), which gives some clues about how
language use differs between the two cases. For future work, we intend to investigate transformer-based
methods for sentiment polarity detection.</p>
    </sec>
    <sec id="sec-10">
      <title>4. Acknowledgements</title>
      <p>We are grateful to the Tecnológico Nacional de México (TecNM, by its Spanish acronym) for
supporting this work. This research was also sponsored by Mexico’s National Council of Humanities,
Sciences and Technologies (CONAHCYT).</p>
    </sec>
    <sec id="sec-11">
      <title>5. References</title>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          <string-name>
            <surname>Arafat</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Elawady</surname>
            ,
            <given-names>R. M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Barakat</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Elrashidy</surname>
            ,
            <given-names>N. M.</given-names>
          </string-name>
          (
          <year>2014</year>
          ).
          <article-title>Different feature selection for sentiment classification</article-title>
          .
          <source>International Journal of Information Science and Intelligent System</source>
          ,
          <volume>1</volume>
          (
          <issue>3</issue>
          ),
          <fpage>137</fpage>
          -
          <lpage>150</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          <string-name>
            <surname>García-Díaz</surname>
            ,
            <given-names>J. A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Almela</surname>
            ,
            <given-names>Á.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>García-Sánchez</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Alcaráz Mármol</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Marín-Pérez</surname>
            ,
            <given-names>M. J.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Valencia-García</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          (
          <year>2023</year>
          ).
          <article-title>Overview of FinancES 2023: Financial Targeted Sentiment Analysis in Spanish</article-title>
          .
          <source>Procesamiento Del Lenguaje Natural</source>
          ,
          <volume>71</volume>
          (
          <issue>0</issue>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          <string-name>
            <surname>Geng</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Niu</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Feng</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Huang</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          (
          <year>2020</year>
          ).
          <article-title>Understanding the focal points and sentiment of learners in MOOC reviews: A machine learning and SC-LIWC-based approach</article-title>
          .
          <source>British Journal of Educational Technology</source>
          ,
          <volume>51</volume>
          (
          <issue>5</issue>
          ),
          <fpage>1785</fpage>
          -
          <lpage>1803</lpage>
          . https://doi.org/10.1111/BJET.12999
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          <string-name>
            <surname>Jiménez-Zafra</surname>
            ,
            <given-names>S. M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rangel</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Montes-y-Gómez</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          (
          <year>2023</year>
          ).
          <article-title>Overview of IberLEF 2023: Natural Language Processing Challenges for Spanish and other Iberian Languages</article-title>
          .
          <source>Proceedings of the Iberian Languages Evaluation Forum (IberLEF 2023), co-located with the 39th Conference of the Spanish Society for Natural Language Processing (SEPLN 2023)</source>
          . CEUR-WS.org.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          <string-name>
            <surname>Milne</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Chisholm</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          (
          <year>2013</year>
          ).
          <article-title>The Prospects for Common Financial Language in Wholesale Financial Services</article-title>
          .
          <source>SSRN Electronic Journal</source>
          . https://doi.org/10.2139/SSRN.2325362
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          <string-name>
            <surname>Olagunju</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Oyebode</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Orji</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          (
          <year>2020</year>
          ).
          <article-title>Exploring Key Issues Affecting African Mobile eCommerce Applications Using Sentiment and Thematic Analysis</article-title>
          .
          <source>IEEE Access</source>
          ,
          <volume>8</volume>
          ,
          <fpage>114475</fpage>
          -
          <lpage>114486</lpage>
          . https://doi.org/10.1109/ACCESS.2020.3000093
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          <string-name>
            <surname>Pan</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>García-Díaz</surname>
            ,
            <given-names>J. A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Garcia-Sanchez</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Valencia-García</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          (
          <year>2023</year>
          ).
          <article-title>Evaluation of transformer models for financial targeted sentiment analysis in Spanish</article-title>
          .
          <source>PeerJ Computer Science</source>
          ,
          <volume>9</volume>
          , e1377. https://doi.org/10.7717/PEERJ-CS.1377
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          <string-name>
            <surname>Pawlak</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          (
          <year>1982</year>
          ).
          <article-title>Rough sets</article-title>
          .
          <source>International Journal of Computer &amp; Information Sciences</source>
          ,
          <volume>11</volume>
          (
          <issue>5</issue>
          ),
          <fpage>341</fpage>
          -
          <lpage>356</lpage>
          . https://doi.org/10.1007/BF01001956
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          <string-name>
            <surname>Salas-Zárate</surname>
            ,
            <given-names>M. del P.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Paredes-Valverde</surname>
            ,
            <given-names>M. A.</given-names>
          </string-name>
          (
          <year>2016</year>
          ).
          <article-title>Sentiment Classification of Spanish Reviews: An Approach based on Feature Selection and Machine Learning Methods</article-title>
          .
          <source>J. Univers. Comput. Sci.</source>
          ,
          <volume>22</volume>
          (
          <issue>5</issue>
          ),
          <fpage>691</fpage>
          -
          <lpage>708</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          <string-name>
            <surname>Tausczik</surname>
            ,
            <given-names>Y. R.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Pennebaker</surname>
            ,
            <given-names>J. W.</given-names>
          </string-name>
          (
          <year>2009</year>
          ).
          <article-title>The Psychological Meaning of Words: LIWC and Computerized Text Analysis Methods</article-title>
          .
          <source>Journal of Language and Social Psychology</source>
          ,
          <volume>29</volume>
          (
          <issue>1</issue>
          ),
          <fpage>24</fpage>
          -
          <lpage>54</lpage>
          . https://doi.org/10.1177/0261927X09351676
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>