<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Text-To-Picto Using Lexical Simplification</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Abbhinav Elliah</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ananth Narayanan P</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Bhuvan S</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>P Mirunalini</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Computer Science and Engineering, Sri Sivasubramaniya Nadar College of Engineering</institution>
          ,
          <addr-line>Tamil Nadu</addr-line>
          ,
          <country country="IN">India</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Augmentative and Alternative Communication (AAC) provides a lifeline for those with language problems by employing pictograms to ensure precise message conveyance. This study fine-tunes a pre-trained translation model for text-to-picto conversion, utilizing tokenization and lexical simplification. The model aids individuals with language impairments due to genetic diseases or aphasia, showcasing its potential in simplifying complex text for effective communication. The study involves two models, GPT-2 and Helsinki-BERT, which are fine-tuned using the given dataset. The Helsinki-NLP model demonstrated superior performance with a Picto-term Error Rate (PictoER) of 18.51. In contrast, the GPT-2 model had a higher PictoER of 170.81, making it prone to producing extraneous terms. These results indicate that the Helsinki-NLP model is more effective in producing accurate and contextually relevant text aligned with pictogram keywords.</p>
      </abstract>
      <kwd-group>
        <kwd>Lexical simplification</kwd>
        <kwd>Language-specific fine-tuning</kwd>
        <kwd>GPT-2 model</kwd>
        <kwd>Helsinki BERT</kwd>
        <kwd>NLP tokenizer</kwd>
        <kwd>Keyword mapping</kwd>
        <kwd>Picto-term Error Rate</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        AAC provides a solution for those with language problems brought on by illnesses such as aphasia.
These systems use pictograms to communicate; however, a barrier remains in translating text or
spoken language into comprehensible pictogram sequences. In the ToPicto subtask of ImageCLEF 2024
[
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], the proposed system fine-tunes an existing model using the given dataset for pictogram generation
via lexical simplification. This task [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] introduces two new challenges whose objective is to provide a
translation into pictograms from natural language, either from (i) text or (ii) speech, understandable by
the users, in this case people with language impairments.
      </p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Works</title>
      <p>
        Radford et al. in [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] offer a thorough analysis of the capabilities and performance of the GPT-2 language
model across a range of natural language processing tasks. In-depth assessments of GPT-2’s performance
on datasets such as CoQA and the CNN/Daily Mail dataset, as well as on summarization and translation tasks, are also
included in this work. Pre-trained models for natural language processing (NLP) are based on large
general-purpose datasets, and are thus ineffective for classification and prediction tasks on custom
datasets. Houlsby et al. in [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] experimented with Parameter-Efficient Transfer Learning for NLP in order to
improve accuracy on such tasks. This idea was further extended to various pre-trained models,
such as RoBERTa by Liu et al. in [5], where a larger dataset was used along with calibration of various
hyperparameters, and LexFit by Vulić et al. in [6], where lexical fine-tuning is implemented.
      </p>
      <p>Recent progress in natural language processing has been driven by advances in both model
architecture and model pre-training. Wolf et al. in [7] introduced the Transformers library for this purpose,
supporting higher-capacity models and providing a highly optimized tokenization library built in
Rust, released as the open-source Transformers library in Python. Qiang et al. in
[8] proposed LSBert, based on the pre-trained representation model BERT, which can use a dataset
for fine-tuning during simplification and substitution of candidates via complex word identification,
substitute generation, filtering, and substitute ranking.</p>
    </sec>
    <sec id="sec-3">
      <title>3. Approach</title>
      <p>In this work, two distinct methods were explored for lexical simplification using pre-trained models.
The first method uses the GPT-2 architecture, while the second leverages the
Helsinki-NLP/opus-mt-ROMANCE-en model for lexical simplification.</p>
      <p>In the initial approach, a sentence compression model is developed using a fine-tuned GPT-2
architecture with an additional linear layer for output generation. The process begins by reading source
and target sentences, creating a dataset where each entry pairs a source sentence with its compressed
counterpart.</p>
      <p>A class is used to manage the data, while another class extends the pre-trained GPT2LMHeadModel
by adding a linear layer that maps GPT-2’s hidden states to the vocabulary size. The model is trained
for 10 epochs with a batch size of 16 using the Adam optimizer. During training, sentences are tokenized
with padding, and the CrossEntropyLoss function is used to compute the loss between the predicted and
target sequences. Padding ensures uniform sequence lengths, and the model and optimizer
states are saved and reloaded for further training.</p>
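      <p>The added linear head and the loss computation can be sketched as follows; this is a minimal NumPy illustration in which random vectors stand in for GPT-2’s hidden states, and all dimensions are illustrative, not the real model’s.</p>

```python
import numpy as np

rng = np.random.default_rng(42)
seq_len, hidden_dim, vocab_size = 6, 16, 50

# Stand-ins for GPT-2's final hidden states and the added linear layer.
hidden = rng.normal(size=(seq_len, hidden_dim))
W = rng.normal(size=(hidden_dim, vocab_size)) * 0.1
b = np.zeros(vocab_size)

logits = hidden @ W + b  # map each hidden state to a score per vocabulary entry

# Softmax over the vocabulary dimension (stabilized by subtracting the row max).
probs = np.exp(logits - logits.max(axis=1, keepdims=True))
probs /= probs.sum(axis=1, keepdims=True)

targets = rng.integers(0, vocab_size, size=seq_len)  # placeholder target token ids

# Cross-entropy: mean negative log-probability assigned to each target token.
# (In the actual training loop, positions holding padding tokens are excluded.)
loss = -np.log(probs[np.arange(seq_len), targets]).mean()
print(loss > 0)  # True
```

      <p>In the actual system this mapping and loss are computed by PyTorch’s nn.Linear and CrossEntropyLoss on top of GPT2LMHeadModel; the sketch only shows the arithmetic they perform.</p>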
      <p>The second approach utilizes the Helsinki-NLP/opus-mt-ROMANCE-en model, a pre-trained
translation model originally designed for the Romance languages of Europe, such as French, Spanish, and Italian.
Despite its primary focus on translation tasks, the model is adapted for lexical simplification within
the context of the ToPicto project. The goal is to convert complex French utterances into simplified
sequences of terms linked to pictograms, thereby enhancing communication for individuals with
language impairments. By fine-tuning the model on a specialized dataset, the proposed system explores its
efficacy in simplifying text while preserving semantic integrity.</p>
      <sec id="sec-3-1">
        <title>3.1. Data Preprocessing</title>
        <p>The dataset for this task is provided by the ImageCLEF 2024 organizers and is structured in JSON
format, comprising training, validation, and test sets. Each entry in the dataset includes:
id: a unique identifier for each utterance;
src: the source text, an oral transcription in French;
tgt: the target sequence of simplified pictogram terms;
pictos: a list of pictogram identifiers corresponding to each term in the target sequence.</p>
        <p>GPT-2 pre-trained model documentation: https://huggingface.co/docs/transformers/en/model_doc/gpt2
Helsinki-NLP repository: https://clarifai.com/helsinkinlp/translation/models/text-translation-romance-lang-english</p>
        <p>The preprocessing involves loading the dataset, extracting the relevant fields (src and tgt), and
tokenizing the text data using the Helsinki-NLP tokenizer. Tokenization converts the text into a format
suitable for model processing, preserving linguistic nuances and syntax.</p>
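        <p>The field extraction described above can be sketched with the standard json module. The entry below is a hypothetical example in the described shape: only the src line is the utterance quoted in Section 4, while the tgt and pictos values are invented placeholders.</p>

```python
import json

# Hypothetical entry in the documented shape; only "src" comes from the
# paper's example utterance, "tgt" and "pictos" are invented placeholders.
raw = """
[
  {"id": "utt_001",
   "src": "ils ont un accent eux aussi euh",
   "tgt": "avoir accent aussi",
   "pictos": [101, 102, 103]}
]
"""

def load_pairs(json_text):
    """Extract (src, tgt) pairs for tokenization and fine-tuning."""
    return [(e["src"], e["tgt"]) for e in json.loads(json_text)]

pairs = load_pairs(raw)
print(pairs)
# In the actual pipeline these pairs are then tokenized, e.g. with
# MarianTokenizer.from_pretrained("Helsinki-NLP/opus-mt-ROMANCE-en").
```
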
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Proposed Model</title>
        <p>The Helsinki-NLP/opus-mt-ROMANCE-en model is part of the OPUS-MT project by
the University of Helsinki. It is built on the MarianMT framework, a highly optimized neural
machine translation (NMT) system developed by the Marian NMT group. The model is pre-trained
on a vast multilingual corpus focused specifically on the Romance languages (such as French, Spanish,
Italian, Portuguese, and Romanian). It leverages a transformer architecture, renowned for its
effectiveness in handling sequence-to-sequence tasks thanks to its self-attention mechanisms and parallel
processing capabilities.</p>
        <sec id="sec-3-2-1">
          <title>3.2.1. Encoder-Decoder Framework</title>
          <p>The model employs a standard transformer architecture with an encoder-decoder structure. The encoder
processes the input sequence and generates contextual embeddings, which the decoder then uses to
produce the output sequence.</p>
        </sec>
        <sec id="sec-3-2-2">
          <title>3.2.2. Self-Attention Mechanism</title>
          <p>Both the encoder and decoder utilize self-attention layers, allowing the model to weigh the importance
of different tokens in the sequence dynamically. This mechanism helps capture long-range dependencies
and contextual information. The model is fine-tuned on the ToPicto dataset, which involves adjusting its
parameters to learn the mapping from complex source texts to simplified target sequences. Fine-tuning
leverages the model’s pre-existing linguistic knowledge, adapting it to the specific requirements of the
lexical simplification task.</p>
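          <p>The self-attention computation described above can be sketched in NumPy; this is a minimal single-head illustration with toy dimensions, not the model’s actual configuration.</p>

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention over token embeddings X."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    d_k = Q.shape[-1]
    # Each row of `weights` is a distribution over all positions, so every
    # token can dynamically weigh every other token in the sequence.
    weights = softmax(Q @ K.T / np.sqrt(d_k))
    return weights @ V, weights

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 8))                       # 5 tokens, embedding dim 8
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
out, weights = self_attention(X, Wq, Wk, Wv)
print(out.shape)            # (5, 8)
print(weights.sum(axis=1))  # each row sums to 1
```

          <p>This ability of each position to attend to all others is what lets the fine-tuned model align simplified target terms with arbitrarily distant words in the source utterance.</p>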
        </sec>
      </sec>
      <sec id="sec-3-3">
        <title>3.3. Methodology</title>
        <p>The hardware specifications of the system used for model training are as follows:</p>
        <p>CPU: 12th Gen Intel(R) Core(TM) i7-12700H
GPU: NVIDIA GeForce RTX 3060</p>
        <p>The training setup is facilitated by the Hugging Face Transformers library, which provides specialized
tools and classes for sequence-to-sequence tasks. The fine-tuning process begins with loading the
training datasets from JSON files, focusing on extracting the source (“src”) and target (“tgt”) fields.
Subsequently, the source and target texts undergo tokenization via the Helsinki-NLP tokenizer to ensure
they are formatted correctly for fine-tuning.</p>
        <p>Training arguments are defined, specifying parameters such as the output directory, batch size (here,
4), number of epochs (here, 3), and logging frequency (here, every 100 steps) to ensure comprehensive monitoring
and control of the training process, after consideration of data validity. The fine-tuning process involves
optimizing the model’s parameters on the training dataset, leveraging backpropagation and gradient
descent to minimize the loss function and improve the model’s accuracy in generating the desired
sequences. During this phase, the model is iteratively trained over multiple epochs, with periodic
evaluations on the validation dataset to prevent overfitting and ensure generalizability. The fine-tuned
model is subsequently saved for deployment, ensuring that the trained parameters are preserved for
future use. During inference, the model generates hypotheses (hyp) for the test set, which are then
post-processed to ensure conformity with the expected output format and semantic coherence.</p>
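        <p>Assuming the standard Hugging Face API, the stated configuration can be expressed as follows; the output directory name is a placeholder, and the evaluation/save strategies are one plausible way to realize the periodic validation described above.</p>

```python
from transformers import Seq2SeqTrainingArguments

# Values as stated in the text; "topicto-helsinki" is a placeholder directory.
training_args = Seq2SeqTrainingArguments(
    output_dir="topicto-helsinki",
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    num_train_epochs=3,
    logging_steps=100,
    evaluation_strategy="epoch",  # periodic validation to guard against overfitting
    save_strategy="epoch",
)
```

        <p>These arguments would then be passed to a Seq2SeqTrainer together with the tokenized datasets and the model.</p>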
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Results and Discussion</title>
      <p>We have experimented with two different models as discussed above: the GPT-2 and
Helsinki-NLP/opus-mt-ROMANCE-en models. The picto images are provided by the ImageCLEF 2024 organizers, and
the image sequence is generated using the script file provided for the ToPicto task.</p>
      <p>Consider, for example, the source text "ils ont un accent eux aussi euh".</p>
      <p>The Helsinki-NLP model proved better at producing an efficient and accurate
output for the given source text, aligned with the meaning of the text, whereas the
GPT-2 model gave a comparatively less accurate result, owing to the fact that GPT-2
is predominantly trained on large English datasets.</p>
      <p>The performance of the proposed architecture was evaluated using the metrics Picto-term
Error Rate (PictoER) [9], BLEU score [10], and METEOR [11]. Based on these metrics, the
performance of the two distinct runs reveals noteworthy insights.</p>
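      <p>PictoER is reported here as a term-level error rate over pictogram keyword sequences. Below is a minimal sketch, assuming the usual WER-style edit-distance definition; the example terms are invented placeholders. It also shows how rates above 100 can arise when many extraneous terms are inserted, as observed with GPT-2.</p>

```python
def picto_error_rate(reference, hypothesis):
    """Edit-distance-based error rate over pictogram terms, in percent."""
    ref, hyp = reference.split(), hypothesis.split()
    # Standard Levenshtein dynamic programme over terms.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return 100.0 * d[len(ref)][len(hyp)] / len(ref)

print(picto_error_rate("avoir accent aussi", "avoir accent aussi"))  # 0.0
print(picto_error_rate("avoir accent", "avoir accent a b c d"))      # 200.0
```

      <p>Because insertions count against the reference length, a hypothesis with many extraneous terms can exceed 100, which matches the GPT-2 score of 170.81 reported below.</p>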
      <p>The Helsinki BERT model demonstrates superior performance in generating French text that aligns
closely with picto keywords, evidenced by its high BLEU score of 68.96, METEOR score of 83.55, and low
PictoER score of 18.51. These results indicate the model’s effectiveness in producing fluent, contextually
accurate text with minimal error in keyword mapping. The strong BLEU and METEOR scores highlight
the model’s capability to preserve n-gram overlaps and account for synonymy, stemming, and paraphrase
matching, making it highly suitable for tasks requiring precise linguistic and semantic accuracy. It is
worth noting that, while the Helsinki BERT model is trained on a diverse set of languages, including
French, this multilingual training could contribute to a slight reduction in its BLEU score due to the
broad scope of its training data.</p>
      <p>Conversely, the GPT-2 model is significantly weaker at producing cohesive and contextually
appropriate French, as evidenced by its BLEU score of 3.93, METEOR score of 25.57, and high PictoER
score of 170.81. The high PictoER score indicates considerable keyword alignment problems, while
the poor BLEU and METEOR scores show the model’s inability to maintain contextual relevance and
fluency. The significant difference between the two shows that models must be tailored specifically to
the target language and application area; in this case, the Helsinki BERT model’s customized approach
outperforms GPT-2’s generalist capabilities. Notably, the predominance of English datasets
used to train GPT-2 limited its ability to perform well on French language tasks. However, the
GPT-2 model did show some ability to correctly predict numbers and nouns.</p>
    </sec>
    <sec id="sec-5">
      <title>5. Conclusion</title>
      <p>In conclusion, the Helsinki-NLP model exhibits better performance than the GPT-2 model in producing appropriate pictos for
a given text. Both models are able to predict the keywords of a given phrase with
high accuracy; however, the former is able to predict the pronouns and phrase a meaningful
picto combination with a higher degree of confidence, since it is pre-trained on French data rather
than English, as GPT-2 is.</p>
    </sec>
    <sec id="sec-6">
      <title>6. Future Work</title>
      <p>Subsequent research may involve enhancing the existing models, such as by further
fine-tuning GPT-2, or creating novel models to improve their efficacy on analogous
language-specific tasks. The error rates can be improved through hyperparameter tuning or by removing
erroneous words, which can be achieved with more data for training the model and by applying various
other pre-trained models.</p>
      <p>[5] Y. Liu, M. Ott, N. Goyal, J. Du, M. Joshi, D. Chen, O. Levy, M. Lewis, L. Zettlemoyer, V. Stoyanov,
Roberta: A robustly optimized BERT pretraining approach, arXiv preprint arXiv:1907.11692 (2019).</p>
      <p>[6] I. Vulić, E. M. Ponti, A. Korhonen, G. Glavaš, LexFit: Lexical fine-tuning of pretrained language
models, in: Proceedings of the 59th Annual Meeting of the Association for Computational
Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume
1: Long Papers), 2021, pp. 5269–5283.</p>
      <p>[7] T. Wolf, L. Debut, V. Sanh, J. Chaumond, C. Delangue, A. Moi, P. Cistac, T. Rault, R. Louf,
M. Funtowicz, et al., Transformers: State-of-the-art natural language processing, in: Proceedings of the
2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations,
2020, pp. 38–45.</p>
      <p>[8] J. Qiang, Y. Li, Y. Zhu, Y. Yuan, X. Wu, LSBert: A simple framework for lexical simplification, arXiv
preprint arXiv:2006.14939 (2020).</p>
      <p>[9] J. Woodard, J. Nelson, An information theoretic measure of speech recognition
performance, in: Workshop on Standardisation for Speech I/O Technology, Naval Air Development
Center, Warminster, PA, 1982.</p>
      <p>[10] K. Papineni, Bleu: A method for automatic evaluation of machine translation, in: Proceedings of the 40th Annual
Meeting of the Association for Computational Linguistics (ACL), 2002, pp. 311–318.</p>
      <p>[11] S. Banerjee, A. Lavie, METEOR: An automatic metric for MT evaluation with improved correlation
with human judgments, in: Proceedings of the ACL Workshop on Intrinsic and Extrinsic Evaluation
Measures for Machine Translation and/or Summarization, 2005, pp. 65–72.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>B.</given-names>
            <surname>Ionescu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Müller</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Drăgulinescu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Rückert</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. Ben</given-names>
            <surname>Abacha</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Garcıa Seco de Herrera</surname>
          </string-name>
          , L. Bloch,
          <string-name>
            <given-names>R.</given-names>
            <surname>Brüngel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Idrissi-Yaghir</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Schäfer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. S.</given-names>
            <surname>Schmidt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T. M.</given-names>
            <surname>Pakull</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Damm</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Bracke</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. M.</given-names>
            <surname>Friedrich</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Andrei</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Prokopchuk</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Karpenka</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Radzhabov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Kovalev</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Macaire</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Schwab</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Lecouteux</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Esperança-Rodier</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Yim</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Fu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Sun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Yetisgen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Xia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. A.</given-names>
            <surname>Hicks</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. A.</given-names>
            <surname>Riegler</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Thambawita</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Storås</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Halvorsen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Heinrich</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Kiesel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Potthast</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Stein</surname>
          </string-name>
          , Overview of ImageCLEF 2024:
          <article-title>Multimedia retrieval in medical applications</article-title>
          , in: Experimental IR Meets Multilinguality, Multimodality, and Interaction,
          <source>Proceedings of the 15th International Conference of the CLEF Association (CLEF 2024)</source>
          , Springer Lecture Notes in Computer Science LNCS, Grenoble, France,
          <year>2024</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>C.</given-names>
            <surname>Macaire</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Esperança-Rodier</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Lecouteux</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Schwab</surname>
          </string-name>
          ,
          <article-title>Overview of ImageCLEF 2024 - investigating the translation of natural language into pictograms, in experimental IR meets multilinguality, multimodality, and interaction</article-title>
          ,
          <source>https://ceur-ws.org/</source>
          ,
          <year>2024</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>A.</given-names>
            <surname>Radford</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Child</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Luan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Amodei</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Sutskever</surname>
          </string-name>
          , et al.,
          <article-title>Language models are unsupervised multitask learners</article-title>
          ,
          <source>OpenAI blog 1</source>
          (
          <year>2019</year>
          )
          9.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>N.</given-names>
            <surname>Houlsby</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Giurgiu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Jastrzebski</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Morrone</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>De Laroussilhe</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Gesmundo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Attariyan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Gelly</surname>
          </string-name>
          ,
          <article-title>Parameter-efficient transfer learning for NLP</article-title>
          ,
          <source>in: Proceedings of the 36th International Conference on Machine Learning, volume 97 of Proceedings of Machine Learning Research, 2019, pp. 2790–2799</source>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>