<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <issn pub-type="ppub">1613-0073</issn>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Enhancing Trustworthiness in NLP Systems Through Explainability</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Santiago González-Silot</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Centro de Estudios Avanzados en TIC, Universidad de Jaén</institution>
          ,
          <addr-line>Campus Las Lagunillas s/n, 23007, Jaén</addr-line>
          ,
          <country country="ES">Spain</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Doctoral Symposium on Natural Language Processing</institution>
          ,
          <addr-line>26</addr-line>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>LLM</institution>
          ,
          <addr-line>Language Models, XAI, Trustworthy AI, Explainable AI, Interpretability</addr-line>
        </aff>
      </contrib-group>
      <abstract>
        <p>Natural Language Processing (NLP) systems have made significant strides in recent years, achieving remarkable success in various applications such as machine translation, sentiment analysis, and question answering. However, the black-box nature of many advanced NLP models raises concerns about their trustworthiness and reliability, especially in critical domains like healthcare, legal, and disinformation. This doctoral thesis addresses the imperative need for enhancing trustworthiness in NLP systems by integrating explainability mechanisms. The research presented here aims to bridge the gap between complex NLP models and their end-users by developing and evaluating methods that provide transparent and interpretable insights throughout the Machine Learning production cycle: data acquisition, preprocessing, training and inference. This doctoral thesis hypothesizes that achieving reliable, explainable, and unbiased language models will lead to more human-friendly and usable Artificial Intelligence.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Trustworthiness in Artificial Intelligence is key to its having a good impact on society and to
users accepting it and using it correctly, without fears or prejudices. For example, people are more
open to using AI if they know how it works and why it makes certain decisions [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ].
      </p>
      <p>
        If we do not know why an AI system makes a decision, produces a response, or acts in a certain way,
we cannot know whether that decision is really correct, since in many cases the response is highly
subjective, variable, and multifactorial. Many papers [
        <xref ref-type="bibr" rid="ref4 ref5 ref6">4, 5, 6</xref>
        ] have shown that AI is plagued by
biases of all kinds, e.g., gender, ethnicity, and religion, which are inherent in the data used for
training and can condition models to make decisions that are dangerous to humans.
      </p>
      <p>
        In addition, explainability is not only a means to see why a model makes a decision and to
inspect the model's behavior; it also serves to justify that decision and to help users investigate
uncertain or inconsistent predictions. For example, in my previous work [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], I applied SHAP and
observed that state-of-the-art fake news detection models took spurious
features and named entities into consideration, which violates impartiality. Thanks to this application
of explainability, I was able to develop a working methodology that reduces biases in this task,
making the model less biased, more robust to adversarial attacks, more generalizable, and
generally more trustworthy. It is worth mentioning that a paper on the application of this
methodology has been written and will be submitted in July.
      </p>
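      <p>As an illustration of this kind of analysis, the following is a minimal sketch of how SHAP
can be applied to a text classifier to surface the tokens the model relies on. The checkpoint and
example sentence are placeholders for illustration, not the fake news models studied in [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ].</p>
      <preformat>
# Minimal sketch: token-level attributions for a text classifier with SHAP.
# The checkpoint below is illustrative, not the thesis model.
import shap
from transformers import pipeline

clf = pipeline("text-classification",
               model="distilbert-base-uncased-finetuned-sst-2-english",
               top_k=None)  # return scores for every label

explainer = shap.Explainer(clf)  # SHAP selects a text masker for the pipeline
shap_values = explainer(["Breaking: senator caught in scandal, sources say."])

# High attributions on named entities or boilerplate phrases, rather than on
# content-bearing words, hint at the spurious features discussed above.
shap.plots.text(shap_values)  # renders an interactive view in a notebook
      </preformat>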
      <p>
        Trustworthy AI has become increasingly crucial due to the growing landscape of regulations
designed to ensure the ethical, transparent, and accountable use of Artificial Intelligence, as can be
seen in Figure 2 of the ethics guidelines for trustworthy AI of the European
Commission [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. As governments and international bodies establish guidelines to protect individual
rights and societal interests, AI researchers and organizations must prioritize trustworthiness
to comply with these standards. Trustworthy AI not only helps avoid legal repercussions
and financial penalties but also fosters public confidence in and adoption of AI technologies. It
encompasses principles such as fairness, privacy, security, robustness, and explainability, which
are essential to mitigate biases, prevent misuse, and promote transparency. Adhering to these
regulations ensures that AI systems operate responsibly and equitably, reinforcing their positive
impact on society while maintaining public trust and safeguarding against potential harm.
      </p>
      <p>For these reasons, the objective of this doctoral thesis is to move from black-box, biased,
and opaque models towards a more secure, transparent, unbiased, and generally more
trustworthy Artificial Intelligence in the Natural Language Processing domain.</p>
      <p>The remaining sections of this paper are organized as follows: Section 2 covers the background
and related work on Trustworthiness and Explainability in NLP; Section 3 presents the main hypothesis
and objectives of the doctoral thesis; Section 4 describes the research methodology and experiments for
this thesis; Section 5 presents the specific research elements proposed for discussion; finally, Section 6
depicts the conclusions.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Background and related work</title>
      <p>
        Trustworthy and explainable natural language processing (NLP) has become a critical area of
research in recent years. With the increasing focus on ethical challenges within NLP, such
as bias mitigation, identifying objectionable content, and enhancing system design and data
handling practices [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], researchers have explored various aspects of ensuring trustworthy
NLP models. For example, recent efforts have been made to enhance the trustworthiness of
Graph Neural Networks (GNNs) through aspects like robustness, explainability, privacy, fairness,
accountability, and environmental well-being [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. Explainability in NLP has been a key focus,
with research highlighting the importance of interpretability and the application of explainable
AI (XAI) techniques to enhance understanding and trust in NLP models [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ].
      </p>
      <p>Model explainability in the Large Language Model era of NLP has been a subject of interest,
with discussions on how explainability analysis can help detect issues unique to NLP models
post-training [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]. Additionally, rationalization for explainable NLP has been emphasized,
recognizing the challenges faced in achieving explainability despite the practical applications of
NLP [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ]. Techniques such as feature attribution methods have been employed to visualize and
understand the reasoning capabilities of NLP models, aiding experts in comprehending model
outputs [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ].</p>
      <p>Moreover, the application of NLP in various domains, including social media data mining
and knowledge discovery, has highlighted its significance in enabling machines to
understand and generate meaningful content from diverse sources [
        <xref ref-type="bibr" rid="ref15 ref16">15, 16</xref>
        ]. The development of
ethical NLP models and the machine learning of ethical judgments from natural language have
been explored, shedding light on the assumptions and methodologies involved in generating
moral judgments through NLP [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ]. Furthermore, in the healthcare domain, the use of NLP on
structured real-world data for pharmacovigilance purposes has been investigated, emphasizing
the importance of trustworthy AI criteria in AI model implementation due to the sensitivity of
the task domain [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ].</p>
      <p>In conclusion, the synthesis of research in trustworthy and explainable NLP underscores the
multidimensional efforts to enhance the reliability, interpretability, and ethical considerations
within NLP models. By addressing issues such as bias, model explainability, and ethical
judgments, researchers aim to advance the field of NLP towards more transparent and trustworthy
applications, putting the focus on a key aspect of applying AI in society that is
sometimes not given enough importance.</p>
    </sec>
    <sec id="sec-3">
      <title>3. Main Hypothesis and Objectives</title>
      <sec id="sec-3-1">
        <title>3.1. Main Hypothesis</title>
        <p>The hypothesis behind this line of research is that if we develop explainable, interpretable, and
less-biased models, we can create a more Trustworthy AI that is more usable, human-friendly,
and responsible.</p>
        <p>This doctoral thesis aims to move from black-box, biased, and opaque models to a
more secure, transparent, unbiased, robust, and generally more trustworthy Artificial Intelligence
in the Natural Language Processing domain.</p>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Objectives</title>
        <p>1. Analyze the state of the art of Explainability and Trustworthiness in AI and specifically
in NLP.
2. Analyze the regulations that exist and will exist in AI, in order to adapt the line of research
and its application to them.
3. Analysis and development of datasets and pre-training methodologies that avoid the
potential problems of inherent human language biases.
4. Creation of unbiased, more transparent, and explainable language models for a variety of
particularly sensitive tasks such as fake news detection or sentiment analysis.
5. Design of frameworks to adapt current state-of-the-art models to the reliability needs
required by their specific application domains.
6. Design of an evaluation framework that takes into account the different perspectives of
trustworthiness, in particular the level of explainability, unbiasedness, and robustness.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Research Methodology and Proposed Experiments</title>
      <p>To achieve the objectives and validate the hypothesis, the research will proceed in four stages:
1. Analysis of relevant literature sources: To achieve the objective of the thesis, an
exhaustive analysis of relevant sources has to be performed. This includes the review of
scientific literature related to language models, explainability, trustworthiness, and the
methodologies related to these concepts that may lead to a more Trustworthy AI.
2. Experimental design: Development of techniques and methodologies to bring language
models that act as black boxes closer to a more reliable AI. For this purpose, experiments
will be carried out applying different NLP and Explainability techniques to obtain key
insights that can guide the development of a Trustworthy AI.
3. Trustworthy Data Creation and Curation: Trustworthy data creation and curation
in Natural Language Processing (NLP) is critical to ensuring the accuracy and reliability
of information extracted from textual data, and it plays a key role in extracting valuable
insights from unstructured text. For this reason, datasets will be built to drive the
explainable behavior of the language models. Additionally, data preprocessing techniques
will also be developed to ensure privacy and unbiasedness throughout the data lifecycle.
4. Evaluation of results: Application and development of different evaluation metrics that
measure how reliable an AI model is in the different aspects involved (absence of biases,
robustness, interpretability, etc.). For this purpose, the key aspects and weaknesses of the
current metrics will be analyzed to achieve a correct evaluation; a toy robustness check is
sketched after this list.</p>
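      <p>As a toy illustration of the kind of metric considered in stage 4, the following sketch
checks whether a classifier's prediction survives small input perturbations; the checkpoint and
sentences are assumptions for illustration only, not part of the thesis framework.</p>
      <preformat>
# Minimal robustness check: does the predicted label survive trivial typos?
from transformers import pipeline

clf = pipeline("text-classification",
               model="distilbert-base-uncased-finetuned-sst-2-english")

original = "The government confirmed the new vaccination schedule."
perturbed = "The governmet confirmed the new vacination schedule."  # typos

pred_original, pred_perturbed = clf(original)[0], clf(perturbed)[0]

# A trustworthy model should keep the same label with a similar score;
# large flips under trivial noise indicate brittleness.
print(pred_original, pred_perturbed)
      </preformat>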
    </sec>
    <sec id="sec-5">
      <title>5. Research Elements for Discussion</title>
      <p>In a field as broad and incipient as trustworthy AI, there is discussion on a wide range of
issues; in particular, I present below the three elements of discussion that I am debating at the
current stage of the doctoral thesis.</p>
      <p>1. Data Collection for Trustworthy AI: The basis of Machine Learning is inductive
learning, i.e., models learn from data. That is why one of the key points in making AI
more trustworthy is to make both the datasets used and the collection process meet the
necessary requirements. How can we collect data without biases of any kind? How
do we evaluate whether a dataset is biased?
2. Evaluation Techniques for Measuring the Quality of an Explanation: A model's
quality should be evaluated not only by its accuracy and performance but also by how well
it provides explanations for its predictions [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ]. Should we use informal examination,
comparison to ground truth, or human evaluation? What are the advantages and
disadvantages of using metrics such as BLEU [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ], ROUGE [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ], or perplexity? Can we
rely on what is relevant to attention mechanisms [
        <xref ref-type="bibr" rid="ref22">22</xref>
        ]?
3. Effective evaluation of the degree of bias of a language model: The degree of
trustworthiness of a language model depends on several factors, such as its robustness,
interpretability, or absence of bias, among others. How can we effectively measure the
degree of bias of a language model? How can we know if there is a real bias in the model
output? How can we identify from which part of the model development cycle the bias
comes? A minimal probing sketch is shown after this list.</p>
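      <p>For the third element, a minimal template-probing sketch (one common approach, not
necessarily the one this thesis will adopt) contrasts the probabilities a masked language model
assigns to gendered pronouns in role templates; the model and templates are illustrative assumptions.</p>
      <preformat>
# Minimal sketch: probing a masked LM for gendered role associations.
from transformers import pipeline

fill = pipeline("fill-mask", model="bert-base-uncased")

templates = ["[MASK] is a nurse.", "[MASK] is a programmer."]
for t in templates:
    scores = {r["token_str"]: r["score"] for r in fill(t, targets=["he", "she"])}
    # A large gap between P(he) and P(she) for a role word signals a
    # stereotypical association inherited from the training data.
    print(t, scores)
      </preformat>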
    </sec>
    <sec id="sec-6">
      <title>6. Conclusions</title>
      <p>This paper has shown the first steps of my doctoral thesis, aimed at developing more explainable,
interpretable, and unbiased models to bridge the gap between black-box models and Trustworthy
Artificial Intelligence in the domain of Natural Language Processing.</p>
      <p>For this purpose, the state of the art has been analyzed, the objectives to be achieved have
been presented, the methodology to achieve them has been described and, finally, different
elements for discussion have been put on the table.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>R.</given-names>
            <surname>Bommasani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Klyman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Longpre</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Kapoor</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Maslej</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Xiong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Zhang</surname>
          </string-name>
          , P. Liang,
          <source>The foundation model transparency index</source>
          ,
          <year>2023</year>
          . arXiv:
          <volume>2310</volume>
          .
          <fpage>12941</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>K. W.</given-names>
            <surname>Church</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Kordoni</surname>
          </string-name>
          , Emerging trends: Sota-chasing,
          <source>Natural Language Engineering</source>
          <volume>28</volume>
          (
          <year>2022</year>
          )
          <fpage>249</fpage>
          -
          <lpage>269</lpage>
          . doi:
          <volume>10</volume>
          .1017/S1351324922000043.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>A.</given-names>
            <surname>Barredo Arrieta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Díaz-Rodríguez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. Del</given-names>
            <surname>Ser</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Bennetot</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Tabik</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Barbado</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Garcia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Gil-Lopez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Molina</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Benjamins</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Chatila</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Herrera</surname>
          </string-name>
          ,
          <article-title>Explainable artificial intelligence (xai): Concepts, taxonomies, opportunities and challenges toward responsible ai</article-title>
          ,
          <source>Information Fusion</source>
          <volume>58</volume>
          (
          <year>2020</year>
          )
          <fpage>82</fpage>
          -
          <lpage>115</lpage>
          . URL: https://www.sciencedirect.com/science/ article/pii/S1566253519308103. doi:https://doi.org/10.1016/j.inffus.
          <year>2019</year>
          .
          <volume>12</volume>
          .012.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>T.</given-names>
            <surname>Bolukbasi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Chang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. Y.</given-names>
            <surname>Zou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Saligrama</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Kalai</surname>
          </string-name>
          ,
          <article-title>Man is to computer programmer as woman is to homemaker? debiasing word embeddings</article-title>
          ,
          <source>CoRR abs/1607</source>
          .06520 (
          <year>2016</year>
          ). URL: http://arxiv.org/abs/1607.06520. arXiv:
          <volume>1607</volume>
          .
          <fpage>06520</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>N.</given-names>
            <surname>Garg</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Schiebinger</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Jurafsky</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Zou</surname>
          </string-name>
          ,
          <article-title>Word embeddings quantify 100 years of gender and ethnic stereotypes</article-title>
          ,
          <source>Proceedings of the National Academy of Sciences</source>
          <volume>115</volume>
          (
          <year>2018</year>
          )
          <fpage>E3635</fpage>
          -
          <lpage>E3644</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>I.</given-names>
            <surname>Garrido-Muñoz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Martínez-Santiago</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Montejo-Ráez</surname>
          </string-name>
          ,
          <article-title>Maria and beto are sexist: evaluating gender bias in large language models for spanish, Language Resources and Evaluation (</article-title>
          <year>2023</year>
          )
          <fpage>1</fpage>
          -
          <lpage>31</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>S.</given-names>
            <surname>González-Silot</surname>
          </string-name>
          , Procesamiento de Lenguaje Natural Explicable para Análisis de Desinformación,
          <source>Master's thesis</source>
          , Universidad de Granada,
          <year>2023</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <surname>European-Commision</surname>
          </string-name>
          ,
          <article-title>Ethics guidelines for trustworthy AI</article-title>
          .,
          <source>Technical Report</source>
          ,
          <year>2019</year>
          . URL: https://digital-strategy.ec.europa.eu/en/library/ethics
          <article-title>-guidelines-trustworthy-ai.</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>S.</given-names>
            <surname>Prabhumoye</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Boldt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Salakhutdinov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. W.</given-names>
            <surname>Black</surname>
          </string-name>
          ,
          <article-title>Case study: Deontological ethics in nlp (</article-title>
          <year>2021</year>
          ). doi:
          <volume>10</volume>
          .18653/v1/
          <year>2021</year>
          .naacl-main.
          <volume>297</volume>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>H.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B. Y.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Yuan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Pan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Tong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Pei</surname>
          </string-name>
          ,
          <article-title>Trustworthy graph neural networks: Aspects, methods and trends (</article-title>
          <year>2022</year>
          ). doi:
          <volume>10</volume>
          .48550/arxiv.2205.07424.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>M.</given-names>
            <surname>Danilevsky</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Dhanorkar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Popa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Qian</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <article-title>Explainability for natural language processing (</article-title>
          <year>2021</year>
          ). doi:
          <volume>10</volume>
          .1145/3447548.3470808.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>