<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>Barcelona, Catalunya, Spain, April</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>TransFeatEx: an NLP pipeline for feature extraction</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Agustí Gállego</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Quim Motger</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Xavier Franch</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jordi Marco</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Computer Science, Universitat Politècnica de Catalunya</institution>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Department of Service and Information System Engineering, Universitat Politècnica de Catalunya</institution>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2023</year>
      </pub-date>
      <volume>17</volume>
      <issue>2023</issue>
      <abstract>
        <p>Mobile app stores provide centralized access to a large data set of natural language textual data related to mobile apps, including developers' documentation (e.g., descriptions, changelogs) and user-generated data (e.g., user reviews). Motivated by this context, multiple studies have focused on data-driven elicitation processes for the automatic extraction of the set of features exposed by a catalogue of applications and the inferred, extended knowledge that can be derived from this information. Moreover, with the emergence and generalization of large language models, traditional linguistic-based approaches can be significantly improved by the potential of the knowledge embedded in such models. In this paper, we present TransFeatEx, an NLP-based feature extraction pipeline that combines the use of a RoBERTa-based model with the application of consolidated syntactic and semantic techniques. The pipeline is designed as a customizable, standalone service to be used either as a playground and experimentation tool or as a software component to be embedded into a third-party software system for batch-processing large document corpora. An example of a demo plan is showcased here: https://youtu.be/gfFyi_i_uTw.</p>
      </abstract>
      <kwd-group>
        <kwd>feature extraction</kwd>
        <kwd>natural language processing</kwd>
        <kwd>large language models</kwd>
        <kwd>transformer models</kwd>
        <kwd>mobile apps</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        In natural language processing (NLP) and information retrieval, keyword extraction is defined
as a text mining process which automatically identifies relevant terms to build condensed
representations of text documents [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. The definition of relevance is tailored to the purpose or
application of the keyword extraction process, such as text classification [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], sentiment analysis
[
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] or document clustering [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. Whether using statistical or machine/deep learning strategies
to support these tasks, the availability of a large, representative document corpus is essential to
support their design and implementation, as well as to assess their validity and overall quality.
      </p>
      <p>
        In this sense, app stores have become a popular source of app metadata and app related natural
language documents, which can be used as potential sources of descriptors for mobile software
applications [
        <xref ref-type="bibr" rid="ref5 ref6">5, 6</xref>
        ]. As they have become a basic commodity for mobile users world-wide
[
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], they provide public access to a wide data set of documents, including official developers’
documentation and user reviews. Consequently, they have been used as data sources and
development context for multiple software components and tools based on keyword extraction
tasks, including app classification [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] and app feature extraction [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. Concerning the latter,
a comprehensive, accurate representation of the set of features exposed by a set of apps is
essential for multiple tasks, e.g. software evolution [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], or app recommender systems [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ].
      </p>
      <p>
        Feature extraction encompasses NLP techniques which include syntactic and/or semantic
strategies [
        <xref ref-type="bibr" rid="ref6 ref8 ref9">6, 8, 9</xref>
        ]. However, the emergence of large, pre-trained language models, and more
specifically, transformer models (e.g., GPT [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ], BERT [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ], has surpassed the previous
state-of-the-art performance on multiple NLP tasks, including keyword extraction. Moreover, these models
offer great potential in terms of the linguistic (i.e., syntactic and semantic) knowledge they
embed, which can be tailored and integrated into domain-specific feature extraction processes.
      </p>
      <p>
        In this paper, we present TransFeatEx, a feature extraction tool which combines consolidated
syntactic and semantic feature extraction techniques with a RoBERTa-based
pre-trained model [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] to automatically extract app features from app-related textual documents.
The tool is designed as a customizable pipeline to provide researchers and developers with
domain-specific tuning capabilities, and it is distributed as a decoupled, standalone web service.
      </p>
    </sec>
    <sec id="sec-2">
      <title>2. Background</title>
      <p>
        In this context, feature extraction is defined as a data-driven elicitation of functional
requirements from the user perspective for a given software component [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ]; in this work, we focus on
mobile applications. The vast amount of data items (i.e., natural language documents) generated
by widely used mobile app repositories is a key motivational factor for mobile app oriented
feature extraction research. While this category mainly includes app stores as the most popular
source (e.g., Google Play), alternative platforms include external, non-proprietary repositories
and search engines (e.g., AlternativeTo1) indexing multi-source data.
      </p>
      <p>
        These data sources give access to numerous natural language documents of different
types. Traditionally, feature extraction has focused on descriptions and user reviews
[
        <xref ref-type="bibr" rid="ref5 ref6 ref8">5, 6, 8</xref>
        ]. Concerning the latter, user-generated data introduces the challenge of processing
polarized, biased knowledge, for which the addition of a sentiment analysis filtering component
is suggested by multiple state-of-the-art proposals [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ]. Beyond these, in the surveyed literature,
additional document types like changelogs or summary/short descriptions are generally ignored.
      </p>
      <p>
        Traditional syntactic and semantic strategies supporting app feature extraction are built upon
deterministic (rule-based) or probabilistic/statistical (ML-based) techniques [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ]. Syntactic
strategies focus on Part-of-Speech (POS) and dependency tree pattern recognition
[
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], while semantic strategies focus on lexical dictionaries and pre-trained models [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. With the
advent of transformer models, proposals like BERT rapidly surpassed the performance of
previous approaches on traditional NLP linguistic tasks (e.g., POS tagging, dependency parsing,
Named Entity Recognition) [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]. Among the variants that followed BERT, one of the most
popular is RoBERTa, which outperformed BERT by limiting the scope of pre-training tasks and
using a larger data set [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ], among other specifications.
      </p>
    </sec>
    <sec id="sec-3">
      <title>3. Tool description</title>
      <p>The main goal of the TransFeatEx tool is to support researchers and developers in the
development of projects, tools and processes which require structured data concerning the set of
features exposed by a catalogue of mobile applications. Without the need for annotated data,
TransFeatEx leverages the accuracy of pre-trained transformer models to enrich the natural
language texts received as input (such as descriptions or user-generated reviews) with linguistic
annotations, and then uses such annotations to apply consolidated syntactic and semantic
techniques in order to extract features from said texts. TransFeatEx includes customization
capabilities to fine-tune feature detection tasks (e.g., POS patterns, syntactic dependency patterns)
to allow for domain and context adaptation of the feature extraction process.</p>
      <sec id="sec-3-1">
        <title>3.1. Tool architecture</title>
        <p>
          The TransFeatEx tool has been designed as a standalone, decoupled pipeline deployed in the
form of a web service accessible through an API with two basic user requests. The goal of this
design is to allow easy integration of the pipeline with third-party software systems, as well as
to facilitate playground and experimentation. It consists of a REST controller responsible for
receiving requests, and a core NLP layer concerned with the actual text processing and feature
extraction. The NLP layer is composed of two sub-pipelines:
• Transformer-based pipeline: the tool uses a RoBERTa-based pre-trained instance [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ]
to process the text and annotate its different elements with syntactic features.
• Semantic and syntactic analyser: the enriched textual data resulting from the previous
step is analysed to identify potential features based on syntactic and semantic cues.
        </p>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Inner workings</title>
        <p>When the tool receives a request for processing, the textual data goes through the two
sub-pipelines in the NLP layer (Figure 1). The transformer-based pipeline consists of several
pipes that annotate, process and enrich the textual data with additional syntactic and semantic
information. The relevant components are the following:
1. RoBERTa-based model: the language model’s core component powering subsequent
ML-based pipes. We use this model as a pre-trained checkpoint with state-of-the-art
accuracy in syntactic tasks (e.g., POS tagging, dependency parsing; accuracy scores are reported at https://spacy.io/models/en#en_core_web_trf).</p>
        <sec id="sec-3-2-1">
          <p>2. PoS tagger: it labels each word with its corresponding grammatical category.
3. Sentence boundary disambiguator: a custom pipe that splits the text into sentences.
4. Dependency parser: the model infers the syntactic dependencies between the different
words and includes this information in the data.
5. Lemmatizer: each word is annotated with its lemma (that is, the base form of the word).
6. Sentiment analyser: a sentiment analysis component that uses a simple algorithm
which assigns polarity and subjectivity weights based on predefined word scores.</p>
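          <p>For illustration, the enriched output of these pipes can be pictured as token-level records. The following sketch is not the tool's actual data model (all field names are assumptions); it only shows the kind of information each pipe contributes:</p>

```python
from dataclasses import dataclass

# Hypothetical record collecting the annotations the transformer-based
# pipeline attaches to each token (PoS tag, lemma, dependency label, head).
@dataclass
class AnnotatedToken:
    text: str   # surface form
    pos: str    # grammatical category, from the PoS tagger
    lemma: str  # base form, from the lemmatizer
    dep: str    # syntactic dependency label, from the dependency parser
    head: int   # index of the governing token in the sentence

# Illustrative annotation of the sentence "Track your daily runs."
sentence = [
    AnnotatedToken("Track", "VERB", "track", "ROOT", 0),
    AnnotatedToken("your", "PRON", "your", "poss", 3),
    AnnotatedToken("daily", "ADJ", "daily", "amod", 3),
    AnnotatedToken("runs", "NOUN", "run", "dobj", 0),
]

# The sentiment analyser adds document-level polarity/subjectivity scores.
document = {"tokens": sentence, "polarity": 0.0, "subjectivity": 0.0}
```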
          <p>
            Results serve as input for the semantic and syntactic analyser, where the tool uses the
annotated syntactic features of the input text to identify, extract and clean the noun phrases
that could potentially be mobile app features. This sub-pipeline is composed of the following:
1. Sentiment analysis filter: retrieve the subjectivity and polarity scores of the current
textual data and exclude it from processing if those scores do not fall within certain
customizable threshold values (default values do not exclude any document).
2. Noun phrase selection: select all noun phrases. The tool focuses on noun phrases
as a general pattern matching the most frequent feature patterns identified in the literature [
            <xref ref-type="bibr" rid="ref8">8</xref>
            ].
In this step, the algorithm currently omits those chunks whose main word is a pronoun
to avoid ambiguous interpretations (see Section 7 for future work suggestions).
3. Noun phrase filtering: exclude all feature candidates not matching a list of relevant
syntactic dependencies regarding the root element of the phrase. This step is customizable,
so requests may include a list of dependencies for the tool to consider. If none is provided,
based on preliminary results, the tool uses a default set, namely direct objects (dobj in
spaCy), adverbial clauses (advcl), noun phrases in apposition (appos), and subjects (ROOT).
4. Stopword filtering: exclude the data items that depend on one of a specific set of
irrelevant verbs. This set of verbs can be passed along with the request for processing. By
default, no data item is excluded in this phase.
5. Cleaning: clean all the resulting noun phrases. In this process, the tool removes any
stopwords present in the potential feature (i.e., any tokens tagged as possessive
pronouns, determiners or punctuation).
6. Normalizing: normalize the data. The remaining words in the cleaned noun phrases are
reverted to their basic, lemmatized forms. This is done to avoid duplicate extracted features.
7. Feature building: append the lemmatized root (if any) to the cleaned phrases.
          </p>
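          <p>The filtering, cleaning, normalizing and feature-building steps above can be sketched as follows. This is a minimal illustration, not the tool's code: the chunk representation, the verb-first ordering in the last step and all names are assumptions, while the default dependency set follows the description above:</p>

```python
# Minimal sketch of analyser steps 3-7 described above, operating on a
# pre-annotated noun chunk. The chunk layout and all names are assumptions.

RELEVANT_DEPS = {"dobj", "advcl", "appos", "ROOT"}  # default dependency set
STOPWORD_POS = {"PRON", "DET", "PUNCT"}             # possessives, determiners, punctuation
IRRELEVANT_VERBS = set()                            # empty by default: nothing excluded

def extract_feature(chunk, relevant_deps=RELEVANT_DEPS, stop_verbs=IRRELEVANT_VERBS):
    # Step 3: noun phrase filtering by the dependency label of the chunk's root.
    if chunk["dep"] not in relevant_deps:
        return None
    # Step 4: stopword filtering by the governing verb, if one is given.
    if chunk.get("head_verb") in stop_verbs:
        return None
    # Steps 5-6: cleaning and normalizing -- drop stopword tokens, keep lemmas.
    lemmas = [lemma for lemma, pos in chunk["tokens"] if pos not in STOPWORD_POS]
    # Step 7: combine the lemmatized root with the cleaned phrase
    # (verb-first ordering is an assumption, echoing Verb+Noun patterns).
    if chunk.get("head_verb"):
        lemmas = [chunk["head_verb"]] + lemmas
    return " ".join(lemmas)

chunk = {
    "dep": "dobj",
    "head_verb": "track",
    "tokens": [("your", "PRON"), ("daily", "ADJ"), ("run", "NOUN")],
}
print(extract_feature(chunk))  # -> track daily run
```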
        </sec>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Demo</title>
      <p>TransFeatEx exposes two main use cases (examples are available at https://github.com/gessi-chatbots/NLP_pipeline; check the README for demo details):
• Batch-processing: it can be used to process a data set of NL documents in bulk. It
supports text identifiers to easily match a source text to the extracted features and,
subsequently, to a mobile app.
1. Deploy TransFeatEx as a standalone web service (see “How to install”).
2. Send a request to /extract-features with the required payload format (see “How to
use”), which includes the set of text documents and customization options.
3. If sentiment analysis filtering is required, send a POST request to /review-extraction
with the relevant payload and the subjectivity/polarity thresholds.
4. The textual data goes through the processes explained in Section 3. When processing
reviews, the data is filtered out based on the subjectivity threshold specified.
5. Receive the response with the text-linked extracted features (see “How to use”).
• Playground: the tool also offers a playground feature to visually explore the results for
a given text document with different configurations.
Note: the transformer-based pipeline identifies the noun phrase construct and offers it through the “noun chunk”
structure, which includes additional information, such as the head of the phrase.</p>
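      <p>A third-party client could build a batch-processing request along these lines. This is a sketch only: the payload schema, field names and port are assumptions; the repository README documents the actual format:</p>

```python
import json
from urllib import request

# Hypothetical payload: a batch of identified documents plus the
# customization options (dependency patterns and verb stopwords) described above.
payload = {
    "documents": [
        {"id": "app-42-description", "text": "Track your daily runs and share routes."},
    ],
    "dependencies": ["dobj", "advcl", "appos", "ROOT"],
    "stopword_verbs": [],
}

def build_request(base_url="http://localhost:5000"):
    # POST the JSON payload to the /extract-features endpoint.
    body = json.dumps(payload).encode("utf-8")
    return request.Request(
        f"{base_url}/extract-features",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_request()
```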
      <sec id="sec-4-1">
        <title>Playground usage</title>
        <p>1. Run the script feature-visualization.py with the two required parameters:
a) -f: the text file to be processed (i.e., summary, description, changelog or user review).
b) -c: the custom configuration file, including: (1) the list of relevant syntactic patterns
for dependency parsing; and (2) the custom stopword list for stopword filtering.
2. After processing is done, the visualization will be available at http://localhost:5000.</p>
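        <p>An illustrative custom configuration file for the -c parameter might look as follows (the keys shown are assumptions based on the two customization points listed above; the repository README documents the actual schema):</p>

```json
{
  "dependencies": ["dobj", "advcl", "appos", "ROOT"],
  "stopword_verbs": ["be", "have", "make"]
}
```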
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Planned evaluation</title>
      <p>Data set preparation. We collected a data set of Android mobile apps in the field of trail
tracking, sports activities and other related support apps from Google Play and AlternativeTo,
using a set of domain-related keywords in their app search engines. For each
app, we collected (1) metadata fields (e.g., app name), (2) proprietary documents (i.e., summary,
description, changelog) and (3) user-generated documents (i.e., user reviews).</p>
      <p>Evaluation metrics. We expect to use a combination of annotated data items (i.e., feature
annotations from real users available in AlternativeTo) with a manual annotation process of the
extracted features to compute evaluation metrics. While we plan to measure accuracy, precision,
recall and F-measure, we also plan to differentiate between two evaluation scenarios. The first scenario
aims at evaluating feature extraction of multiple documents for a single app, for which we plan
to focus on recall (a more permissive approach can be used to only report matching features
among different documents). The second scenario aims at evaluating feature extraction of a
single document, for which we plan to focus on precision (a more restrictive approach will
reduce the amount of noisy results, i.e., false positives).</p>
      <p>Experiment set up. We plan to conduct multiple experiments using the customizable
layers of the tool pipeline as experimentation variables: sentiment analysis filtering, noun
phrase selection and noun phrase filtering. For each of these filters, we will define multiple
configuration settings, ranging from more permissive to more restrictive sets of criteria (as depicted
above). We will use a cross-validation strategy to determine the optimal tool configuration.</p>
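      <p>For reference, the planned metrics follow their standard definitions over the sets of annotated and automatically extracted features; a minimal sketch (illustrative only, assuming exact string matching between features):</p>

```python
def evaluation_metrics(extracted, annotated):
    """Precision, recall and F-measure over feature sets (exact string match)."""
    extracted, annotated = set(extracted), set(annotated)
    tp = len(extracted & annotated)  # correctly extracted features
    precision = tp / len(extracted) if extracted else 0.0
    recall = tp / len(annotated) if annotated else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    return precision, recall, f1

p, r, f = evaluation_metrics(
    extracted=["track run", "share route", "edit profile"],
    annotated=["track run", "share route", "record workout", "export data"],
)
print(round(p, 2), round(r, 2), round(f, 2))  # -> 0.67 0.5 0.57
```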
    </sec>
    <sec id="sec-6">
      <title>6. Related work</title>
      <p>
        The SAFE approach is one of the most popular contributions to the RE community in the field of
mobile app feature extraction [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. Its authors mainly focus on a POS-based pattern analysis inferred
from a data set of mobile app descriptions and reviews. They highlighted as the most common
patterns (1) those composed of a noun chunk (e.g., Noun+Noun, Verb+Noun), and (2) those
composed of two words. However, they did not consider syntactic dependency parsing, nor
did they apply any kind of sentiment analysis layer to user-generated data. Concerning syntactic
analysis, related literature mainly focuses on POS and syntactic-based patterns [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], TF-IDF-based
keyword extraction [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ] or topic modelling approaches [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ]. Concerning sentiment analysis,
most approaches either use syntactic, rule-based approaches like VADER or pattern analysers
[
        <xref ref-type="bibr" rid="ref14">14</xref>
        ], supervised machine learning approaches [
        <xref ref-type="bibr" rid="ref14 ref15">14, 15</xref>
        ] and, more recently, deep learning
approaches [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ]. Finally, there are few solutions based on large language models. KEFE is a
BERT-based approach using a deep learning classifier to filter out non-relevant syntactically
extracted features [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ]. RE-BERT, similarly to KEFE, focuses on supervised classification with
contextual word embeddings using BERT as a pre-trained language model [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ].
      </p>
    </sec>
    <sec id="sec-7">
      <title>7. Conclusions</title>
      <p>TransFeatEx provides a software-based technical solution to test and evaluate a customizable
feature extraction process based on the combination of state-of-the-art large language models
and consolidated syntactic and semantic linguistic analysis. The tool is expected to lay the
groundwork for future research and application on domain-specific scenarios, while facilitating
the process of its use both for analytical and playground purposes. As future work, we envisage
four main immediate research action points: (1) to conduct the planned evaluation depicted
in Section 5; (2) to extend the scope of syntactic structures covered in the syntactic pipeline;
(3) to explore the fine-tuning process of a large language model to specifically fit the feature
extraction task by exploring the linguistic knowledge embedded into the model; and (4) to
enhance the sentiment analysis filter using more advanced techniques.</p>
    </sec>
    <sec id="sec-8">
      <title>Acknowledgments</title>
      <p>With the support from the Secretariat for Universities and Research of the Ministry of Business
and Knowledge of the Government of Catalonia and the European Social Fund. This paper has
been funded by the Spanish Ministerio de Ciencia e Innovación under project / funding scheme
PID2020-117191RB-I00 / AEI/10.13039/501100011033.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>A.</given-names>
            <surname>Onan</surname>
          </string-name>
          , et al.,
          <article-title>Ensemble of keyword extraction methods and classifiers in text classification</article-title>
          ,
          <source>Expert Systems with Applications</source>
          (
          <year>2016</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>K.</given-names>
            <surname>Kowsari</surname>
          </string-name>
          , et al.,
          <article-title>Text classification algorithms: A survey</article-title>
          ,
          <source>Information (Switzerland)</source>
          (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>O.</given-names>
            <surname>Araque</surname>
          </string-name>
          , et al.,
          <article-title>Enhancing deep learning sentiment analysis with ensemble techniques in social applications</article-title>
          ,
          <source>Expert Systems with Applications</source>
          (
          <year>2017</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>J.</given-names>
            <surname>Park</surname>
          </string-name>
          , et al.,
          <article-title>ADC: Advanced document clustering using contextualized representations</article-title>
          ,
          <source>Expert Systems with Applications</source>
          (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>S.</given-names>
            <surname>Panichella</surname>
          </string-name>
          , et al.,
          <article-title>How can i improve my app? Classifying user reviews for software maintenance and evolution</article-title>
          ,
          <source>in: 31st International Conference on Software Maintenance and Evolution</source>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>W.</given-names>
            <surname>Maalej</surname>
          </string-name>
          , et al.,
          <article-title>On the automatic classification of app reviews</article-title>
          ,
          <source>Requirements Engineering</source>
          (
          <year>2016</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <surname>ACMNL</surname>
          </string-name>
          ,
          <article-title>Market study into mobile app stores</article-title>
          ,
          <year>2022</year>
          . URL: https://www.acm.nl/en/publications/acm-launches-market-study-mobile-app-stores. Accessed 22 November 2022.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>T.</given-names>
            <surname>Johann</surname>
          </string-name>
          , et al.,
          <article-title>SAFE: A Simple Approach for Feature Extraction from App Descriptions and App Reviews</article-title>
          , in: 25th International Requirements Engineering Conference (RE),
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>M. K.</given-names>
            <surname>Uddin</surname>
          </string-name>
          , et al.,
          <source>Comparison of Text-Based and Feature-Based Semantic Similarity Between Android Apps</source>
          ,
          <year>2020</year>
          , p.
          <fpage>530</fpage>
          -
          <lpage>545</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>A.</given-names>
            <surname>Radford</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Narasimhan</surname>
          </string-name>
          , et al.,
          <article-title>Improving Language Understanding by Generative Pre-Training</article-title>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>J.</given-names>
            <surname>Devlin</surname>
          </string-name>
          , et al.,
          <article-title>BERT: Pre-training of deep bidirectional transformers for language understanding</article-title>
          ,
          <source>in: NAACL HLT</source>
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Liu</surname>
          </string-name>
          , et al.,
          <source>RoBERTa: A Robustly Optimized BERT Pretraining Approach</source>
          ,
          <year>2019</year>
          . URL: https://arxiv.org/abs/1907.11692.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>S.</given-names>
            <surname>Lim</surname>
          </string-name>
          , et al.,
          <article-title>Data-Driven Requirements Elicitation: A Systematic Literature Review</article-title>
          ,
          <source>SN Computer Science</source>
          (
          <year>2021</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>S.</given-names>
            <surname>Kumari</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z. A.</given-names>
            <surname>Memon</surname>
          </string-name>
          ,
          <article-title>Extracting feature requests from online reviews of travel industry</article-title>
          ,
          <source>Acta Scientiarum - Technology</source>
          <volume>44</volume>
          (
          <year>2022</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>M.</given-names>
            <surname>Kasri</surname>
          </string-name>
          , et al.,
          <article-title>A Comparison of Features Extraction Methods for Arabic Sentiment Analysis</article-title>
          ,
          <source>in: 4th International Conference on Big Data and Internet of Things</source>
          ,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>M. S.</given-names>
            <surname>Akhtar</surname>
          </string-name>
          , et al.,
          <article-title>How Intense Are You? Predicting Intensities of Emotions and Sentiments using Stacked Ensemble</article-title>
          ,
          <source>IEEE Computational Intelligence Magazine</source>
          (
          <year>2020</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>H.</given-names>
            <surname>Wu</surname>
          </string-name>
          , et al.,
          <article-title>Identifying key features from app user reviews</article-title>
          ,
          <source>in: International Conference on Software Engineering</source>
          ,
          <year>2021</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>A. F.</given-names>
            <surname>de Araújo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. M.</given-names>
            <surname>Marcacini</surname>
          </string-name>
          , RE-BERT:
          <article-title>Automatic Extraction of Software Requirements from App Reviews Using BERT Language Model</article-title>
          ,
          <source>in: 36th Annual ACM Symposium on Applied Computing</source>
          ,
          <year>2021</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>