Benchmarking the Semantics of Taste: Towards the Automatic Extraction of Gustatory Language

Teresa Paccosi¹,²,³, Sara Tonelli¹

¹ Fondazione Bruno Kessler, Via Sommarive, 18, Trento
² Università degli studi di Trento, Via Calepina, 14, Rovereto
³ DHLab / KNAW Humanities Cluster, Oudezijds Achterburgwal 185, 1012 DK Amsterdam, The Netherlands


Abstract
In this paper, we present a benchmark containing texts manually annotated with gustatory semantic information. We employ a FrameNet-like approach previously tested on olfactory language, which we adapt to capture gustatory events. We then explore the benchmark data to show the insights this type of approach can bring, focusing on emotional valence across text genres. Finally, we present a supervised system trained on the taste benchmark for the extraction of gustatory information from historical and contemporary texts.

                                                Keywords
                                                Sensory semantics, gustatory language, information extraction, digital humanities


1. Introduction

Despite the central role of nutrition in our lives, taste has often been classified as an inferior sense in the Western philosophical tradition. This downplayed role is reflected in the vocabulary used to describe the gustatory experience, which, together with smell, is characterized by a scarcity of domain-specific terms [1]. The difficulty in capturing the semantics of taste could help explain why there are few works in the fields of Natural Language Processing (NLP) and Digital Humanities (DH) that deal with this sense and, in particular, the language used to describe its experience. While there has been renewed interest in the automatic extraction of nutrients and ingredients from texts for health and medicinal purposes [2], less attention has been devoted to the development of tools and models focused on capturing the semantics of sensory experiences, especially in a diachronic fashion.

In this paper, we present an English benchmark for the study of gustatory language and a supervised system for the automatic extraction of taste-related events in English, which we trained using this benchmark. The benchmark was built to be a counterpart to the olfactory one presented in [3], with the idea of making the study of the language of these two senses comparable. The system is designed as a means to study the language used to describe the experience of tasting from both synchronic and diachronic perspectives. The selected formal representation for the semantics of taste is based on Frame Semantics [4], and the system is trained to identify the lexical units and the possible semantic roles contributing to the construction of a gustatory event. We present the results of the experiments and an exploration of the benchmark data, aiming to demonstrate the potential of frame-based analysis for sensory studies.

2. Related Work

In recent years, there has been a growing interest within the NLP community in developing resources designed to capture the sensory content of language [5]. In particular, in the framework of the three-year European Project “Odeuropa”¹, aimed at preserving intangible cultural heritage, several works have focused on analyzing smell descriptions [6] and extracting olfactory information from texts. For instance, [3] created a manually annotated benchmark with smell events, which has subsequently been used to train a system for olfactory information extraction [7, 8]. The benchmark focuses on the language used to describe olfactory experiences and covers a period of four centuries (1600-1900), making it useful for historical research. An extension in this direction is SENSE-LM, a system for extracting sensory information from texts, which shows that combining language models with lexical resource-based approaches yields better results in extracting sensory references from texts compared to systems that do not integrate these two components [9]. The authors were the first to combine sensorimotor representations with the textual features of language models for the task of sensory information extraction in text documents. Even if they propose the system for all the 5 senses, they only tested it on olfactory

CLiC-it 2024: Tenth Italian Conference on Computational Linguistics, Dec 04-06, 2024, Pisa, Italy
tpaccosi@fbk.eu; teresa.paccosi@unitn.it (T. Paccosi); satonelli@fbk.eu (S. Tonelli)
ORCID: 0009-0009-2348-7556 (T. Paccosi); 0000-0001-8010-6689 (S. Tonelli)
© 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
¹ https://odeuropa.eu/


CEUR Workshop Proceedings, ceur-ws.org, ISSN 1613-0073
Frame Element     Definition
Taste_Source      The food items that are ingested
Quality           Any property used to describe the taste (usually adjectives)
Taste_Carrier     Anything that can contain the taste source
Taster            The person/animal who ingests the food
Evoked_Taste      The taste that is evoked but it is not present (e.g., it tastes like onions)
Location          The place in which the food is tasted
Taste_Modifier    An ingredient that can modify the perception of the taste of a taste source
Circumstances     The condition or circumstance in which the taste event occurs
Effect            Any effect provoked by the tasting experience

Table 1
List of Gustatory Frame Elements
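
As an illustration only, the frame elements of Table 1 can be pictured as a small data structure attached to a triggering Taste_Word. The sketch below is not the annotation format of the benchmark: the class names `Span` and `GustatoryFrame`, the token-offset convention, and the storage layout are assumptions made for this example. It encodes the sentence used as an example in Section 3.

```python
from dataclasses import dataclass, field
from enum import Enum


class FrameElement(Enum):
    """The nine frame elements listed in Table 1."""
    TASTE_SOURCE = "Taste_Source"
    QUALITY = "Quality"
    TASTE_CARRIER = "Taste_Carrier"
    TASTER = "Taster"
    EVOKED_TASTE = "Evoked_Taste"
    LOCATION = "Location"
    TASTE_MODIFIER = "Taste_Modifier"
    CIRCUMSTANCES = "Circumstances"
    EFFECT = "Effect"


@dataclass
class Span:
    """A token span [start, end) with its surface text (hypothetical layout)."""
    start: int
    end: int
    text: str


@dataclass
class GustatoryFrame:
    """One taste event: a triggering Taste_Word plus its frame elements."""
    taste_word: Span
    elements: dict[FrameElement, list[Span]] = field(default_factory=dict)

    def add(self, role: FrameElement, span: Span) -> None:
        # A role may be filled by more than one span in the same frame.
        self.elements.setdefault(role, []).append(span)


tokens = "Slimy milk has an unpleasant taste".split()
frame = GustatoryFrame(taste_word=Span(5, 6, "taste"))
frame.add(FrameElement.TASTE_SOURCE, Span(0, 2, "Slimy milk"))
frame.add(FrameElement.QUALITY, Span(4, 5, "unpleasant"))
```

Keeping the lexical unit separate from the role fillers mirrors the description of the frame, where the Taste_Word triggers the frame and the elements depend on it.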


and auditory language, using respectively the benchmark of [3] and an artificial dataset they generated with GPT-4 [10]. Most existing work on food representation in the field of NLP focuses on health-related applications. A notable work with a linguistic focus is [2], where the authors concentrate on identifying noun-compound headnouns for developing conversational agents in the e-commerce domain. They propose a supervised approach based on a neural sequence-to-sequence model to identify the most informative token in Italian food compound nouns, obtaining promising results despite the complexity of the task. Taste has also been addressed from a diachronic point of view in [11], in which the author reconstructs the evolution of food language, focusing on the history of some dishes and ingredients across continents using computational linguistic tools. Several studies have developed named-entity recognition (NER) models to automatically extract food entities for medicinal purposes and food science applications [12, 13], creating domain-specific corpora by sourcing data from culinary websites and online recipe books [14, 15].

3. Benchmark for Taste

The training data we use for the models in this paper is a benchmark created according to the annotation guidelines presented in [16]. The formalization adopted to annotate the benchmark is inspired by Frame Semantics [4] and its implementation in the FrameNet annotation project [17]. In FrameNet, events and situations are constructed as frames, structures that represent the knowledge necessary to understand the meaning of words. Frames include two main components: lexical units, domain-specific words or expressions that trigger the frame, and frame elements, domain-specific semantic roles usually attached as dependents to the lexical unit. In our case, taste events are captured through a so-called Gustatory frame, which is triggered in a document by Taste_Words (i.e., domain-specific lexical units). Each lexical unit is annotated in the benchmark together with the frame elements associated with it, which the taste extraction system should then identify automatically. For instance, in the sentence “[Slimy milk]Taste_Source has an [unpleasant]Quality taste”, the system has to identify the Taste_Word (‘taste’), and then the possible frame elements (in this case, Taste_Source and Quality). A list of the possible frame elements and their definitions is provided in Table 1.

The documents annotated in the benchmark cover 5 different domains or genres, almost evenly distributed, with 3/4 documents per century in every domain, for a total of 72 documents. The genres are: Literature, Science & Philosophy, Household & Recipes, Travel & Ethnography, and Medicine & Botany. To select the documents, we automatically searched for texts with a high density of lexical units (taste words)² across several English corpora and taste-related websites. The corpora from which we extracted the documents we annotated are: (1) Early English Books Online (EEBO)³, a collection of documents published between 1475 and 1700 covering different domains such as literature, philosophy, politics, religion, geography, history, and mathematics; (2) Project Gutenberg⁴, a digitized archive of cultural works, containing different repositories, mainly in the literary domain; (3) medievalcookery.com⁵, a list of texts freely available online relating to medieval food and ancient cooking recipes; (4) foodsofengland.co.uk⁶, an online library which holds the complete texts of several cook books from 1390 to 1974; (5) Wikisource⁷, an online digital library of free-content textual sources managed by the Wikimedia Foundation; (6) British Library⁸, a collection of 65,227 digitised volumes from the 16th to the 19th century; (7) London Pulse

² The list of lexical units is provided in Appendix A
³ https://textcreationpartnership.org/tcp-texts/eebo-tcp-early-english-books-online/
⁴ https://www.gutenberg.org/
⁵ https://www.medievalcookery.com/etexts.html?England
⁶ http://www.foodsofengland.co.uk/references.htm
⁷ https://en.wikisource.org/wiki/Main_Page
⁸ https://data.bl.uk/digbks/
                          Frame Elements (FEs)           1500   1600      1700      1800     1900      Overall
                                Taste_Words              440    2417       500      1498      803       5,648
                                Taste_Source             372    1627       375      1081      599       4,393
                                   Quality               197    1495       255      881       489       1,732
                               Taste_Modifier            135     142        66       154       78       1,357
                                   Taster                 65    173         85       185      100        638
                                Evoked_Taste             20      127        31       53        16        247
                                  Location               11      44         12       24        16        116
                                Taste_Carrier              9     38          9        26       12         98
                               Circumstances             19      206        38      228        82        656
                                    Effect               24      56         32       34        31        174

Table 2
Statistics of the Taste Benchmark


Medical Reports⁹, a collection of 5800 Medical Officer of Health reports from the Greater London area from 1848 to 1972.

In Table 2 we report the statistics of the annotated benchmark (note that in [16] we presented only a preliminary version of the benchmark containing around 1,400 Taste_Words). The most frequent frame element is the Taste_Source, followed by Quality and Taste_Modifier, which represent the core frame elements, while the rest of the frame elements are much sparser. Even if the distribution of the frame elements is not balanced, the system is trained to extract the taste words and all the 9 frame elements. Two expert linguists, trained on the guidelines of [16], annotated three documents from 1670, 1720, and 1920 to assess Inter-Annotator Agreement (IAA). Krippendorff’s alpha [18] at span level was 0.70, indicating moderate agreement.

4. Exploration of olfactory and gustatory benchmarks

It has been observed that words used to describe olfactory and gustatory experiences tend to appear more frequently in emotionally charged contexts and carry a stronger evaluative content compared to words related to other senses [19]. By ‘evaluative content’, we refer in this paper to the concept of ‘emotional valence’, which is defined as “the pleasantness of a word in terms of positive and negative meaning” ([1], p. 201). We therefore conducted an exploration of the gustatory benchmark to investigate the positive and negative connotations of gustatory events across different text genres. We perform the same analysis for olfactory events, using the olfactory benchmark of [3] in order to compare the outcome for the two senses. To perform this analysis, we first divide Taste_Words and Smell_Words into positive and negative. To this purpose, we use the categories proposed in the Historical Thesaurus of English: Savouriness and Unsavouriness for Taste, and Fragrant/Fragrance and Stench for Smell¹⁰. This thesaurus contains almost every recorded word in English from medieval times to the present day, ordered into detailed hierarchies of meaning. In the Thesaurus, every category of the hierarchy is divided per part of speech (PoS). For our analysis, we manually selected all the nouns, adjectives and adverbs used in the period we cover with our documents, namely from the 16th to the 20th century. We then assigned the words labeled as Taste_Words and Smell_Words in the documents to one of the two categories (positive or negative) and calculated the normalized frequency of each category across different text genres. As reported in Section 3, the genres represented in the gustatory benchmark are: Literature, Science & Philosophy, Household & Recipes, Travel & Ethnography, Medicine & Botany. In the olfactory benchmark presented in [3], there are instead 10 different genres: Household & Recipes, Law & Regulations, Literature, Medicine & Botany, Perfumes & Fashion, Public health, Religion, Science & Philosophy, Theatre, Travel & Ethnography.

We display the output of this analysis in Fig. 1 (for taste words) and Fig. 2 (for smell words), aimed at showing which emotional valence prevails in each genre for the two senses. We observe that two genres exhibit opposite tendencies: medicine/botany shows a more negative orientation in the smell benchmark and a more positive one in the taste benchmark, whereas travel/ethnography is more positive concerning smell and more negative for taste (see Fig. 1 and Fig. 2, where light blue refers to negative valence and dark blue to positive valence). We then analyzed the most frequent smell / taste sources in the two selected genres to motivate why they exhibit

⁹ https://wellcomelibrary.org/moh/about-the-reports/about-the-medical-officer-of-health-reports/
¹⁰ In the categories at https://ht.ac.uk/category/: The world>physical sensation>Taste/Flavour>Savouriness&Unsavouriness; The world>physical sensation>Smell/Odour>Fragrant/Fragrance&Stench
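
The valence comparison described in Section 4 amounts to mapping each annotated sensory word to a positive or negative category and normalizing the counts within each genre. The following is a minimal sketch of that computation under stated assumptions: the tiny lexicon and the annotated pairs are invented for illustration (they are not taken from the Historical Thesaurus or from the benchmark), and the normalization used here, dividing by the number of valence-bearing words in the genre, is one plausible reading of "normalized frequency", not necessarily the one used for Fig. 1 and Fig. 2.

```python
from collections import Counter

# Toy valence lexicon (assumed entries, invented for this example).
VALENCE = {
    "savoury": "positive",
    "delicious": "positive",
    "nauseous": "negative",
    "insipid": "negative",
}

# Toy annotations: (genre, annotated Taste_Word) pairs, invented for illustration.
annotations = [
    ("Household & Recipes", "savoury"),
    ("Household & Recipes", "delicious"),
    ("Household & Recipes", "insipid"),
    ("Travel & Ethnography", "nauseous"),
    ("Travel & Ethnography", "insipid"),
    ("Travel & Ethnography", "taste"),  # not in the lexicon, so it is skipped
]


def valence_by_genre(pairs, lexicon):
    """Return {genre: {valence: share}}, with shares summing to 1 per genre."""
    counts = {}
    for genre, word in pairs:
        label = lexicon.get(word.lower())
        if label is None:
            continue  # words without a valence category are ignored
        counts.setdefault(genre, Counter())[label] += 1
    return {
        genre: {label: n / sum(c.values()) for label, n in c.items()}
        for genre, c in counts.items()
    }


shares = valence_by_genre(annotations, VALENCE)
for genre, dist in shares.items():
    print(genre, dist)
```

Normalizing within each genre makes genres of different sizes comparable, which matters here because the two benchmarks do not contain the same number of documents per genre.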
Figure 1: Savoury (dark blue) and Unsavoury (light blue) frequencies of taste words in genres

Figure 2: Fragrant/Fragrance (dark blue) and Stench (light blue) frequencies of smell words in genres

such a difference in emotional valence. We notice that smell sources in medicine/botany tend to belong to hospital and disease-related domains, with words such as ‘urine’ and ‘fetid bronchitis’, while taste sources more often belong to the realm of common food, with words such as ‘almonds’ and ‘apples’. As for travel/ethnography, the most frequently described taste sources include exotic and rare foods such as ‘coconut’ and ‘plantain’, which were likely unpleasant to the palates of foreign travelers. Smell sources tend to refer instead to plants, like ‘flowers’ or ‘roots’, hence usually pleasant or neutral to the noses of the writers. This analysis of the distribution of categories and sources across genres underlines the importance of a frame-based analysis for understanding and comparing sensory descriptions, in particular their emotional valence.

5. System for Gustatory Information Extraction

The benchmark introduced in the previous sections is used to train a classifier whose goal is to detect gustatory information in English texts. The system is based on multi-task learning (Section 5.1), and is then compared with a “single task” classifier, which we consider our baseline (Section 5.2).

5.1. Multitask configuration

To build our system for gustatory information extraction, we adopted a multitask learning approach [20, 21], a configuration successfully tested for olfactory information extraction in [7, 8]. This approach treats the classification of lexical units and each frame element as different tasks. Additionally, we explored a “single task” classification approach, where both lexical units and frame elements are classified within a multiclass token classification task. The results of these experiments served as a baseline for evaluating the effectiveness of the multitask approach. In both configurations, we employed a transformer-based model fine-tuned for a token classification task [22]. This methodology has proved effective across various NLP tasks, including olfactory information extraction [8] and the extraction of food-related ingredients [13]. We experiment with the two configurations using monolingual (English) and multilingual versions of BERT and RoBERTa, and an English historical model, MacBERTh. The models we use are listed below:
- English BERT: bert-base-cased¹¹ [23]
- Multilingual BERT (mBERT): bert-base-multilingual-cased¹² [23]
- English historical model: MacBERTh¹³ [24]
- English RoBERTa: roberta-base¹⁴ [25]
- Multilingual RoBERTa (RoBERTa xlm): xlm-roberta-large¹⁵ [26]

We fine-tuned each model using the same data, maintaining identical training, validation, and test splits, and evaluated them using 5-fold cross-validation. Each fold contained 80% of the lexical units and their related frame elements for training, 10% for validation (dev), and 10% for testing. These splits were consistent across all configurations and not entirely random. This configuration ensured a balanced distribution of frame elements and comparability in every run. For labeling the data, we adopted the IOB (Inside-Outside-Beginning) labeling format, as used in [7, 8]. This method facilitates a comprehensive analysis of sentences and lexical expressions by

¹¹ https://huggingface.co/google-bert/bert-base-cased
¹² https://huggingface.co/google-bert/bert-base-multilingual-cased
¹³ https://huggingface.co/emanjavacas/MacBERTh
¹⁴ https://huggingface.co/FacebookAI/roberta-base
¹⁵ https://huggingface.co/FacebookAI/xlm-roberta-base
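
To make the IOB labeling format adopted in Section 5.1 concrete, the sketch below converts labelled token spans into one tag per token, using the example sentence from Section 3. It is an illustration rather than the preprocessing code of the system: the function name `spans_to_iob` and the `(start, end, label)` span layout are assumptions, and the sketch assumes spans do not overlap, as in a single tag sequence. In the multitask setup, where each label is a separate task, one such sequence per label would be produced instead.

```python
def spans_to_iob(n_tokens, spans):
    """Convert labelled token spans [start, end) into one IOB tag per token.

    Assumes spans do not overlap (single tag sequence).
    """
    tags = ["O"] * n_tokens
    for start, end, label in spans:
        tags[start] = f"B-{label}"          # first token of the span
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"          # remaining tokens of the span
    return tags


tokens = "Slimy milk has an unpleasant taste".split()
spans = [(0, 2, "Taste_Source"), (4, 5, "Quality"), (5, 6, "Taste_Word")]
tags = spans_to_iob(len(tokens), spans)
for token, tag in zip(tokens, tags):
    print(f"{token}\t{tag}")
```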
    Model         T_Word      T_Source      Quality     Circum.     Effect    Evoked_T       Loc.     T_Carr.     T_Modif.      Taster
     BERT          0.917        0.537        0.780       0.413       0.196       0.457       0.379      0.111       0.781       0.518
     BERT          0.903        0.530        0.712       0.308       0.019       0.254       0.206       0.0        0.681       0.434
    mBERT          0.919        0.554        0.784       0.402       0.180       0.466       0.357      0.087       0.763       0.511
    mBERT          0.910        0.557        0.740       0.284        0.0        0.304       0.162       0.0        0.694       0.434
  MacBERTh         0.943        0.580        0.799       0.444      0.285       0.501        0.338      0.093       0.783       0.512
  MacBERTh         0.909        0.548        0.720       0.366      0.021       0.226        0.242       0.0        0.688       0.455
   RoBERTa         0.913        0.558        0.786       0.414       0.219       0.473       0.406      0.094       0.772       0.508
   RoBERTa         0.891        0.553        0.726       0.343        0.0         0.33       0.228       0.0        0.726        0.5
   RoB.-xlm        0.932        0.587        0.817       0.452       0.279       0.497      0.416      0.105        0.784       0.563
   RoB.- xlm       0.903        0.601        0.777        0.4        0.021       0.409       0.25       0.0         0.743       0.539
Table 3
Results (F1) of the classifiers on the lexical unit (T_Word) and 9 frame elements with single-task (italics) and multitask configurations. The results are the average of the F1 scores of each label across the 5 folds.
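
The averaging described in the caption of Table 3 can be sketched as follows. The per-fold scores below are invented placeholders, not results from the paper, and the function name `average_per_label` is hypothetical; the sketch only shows the arithmetic of averaging each label's F1 over the five folds.

```python
from statistics import mean

# Hypothetical per-fold F1 scores for two labels (invented numbers).
fold_scores = [
    {"T_Word": 0.90, "Quality": 0.80},
    {"T_Word": 0.92, "Quality": 0.78},
    {"T_Word": 0.94, "Quality": 0.79},
    {"T_Word": 0.91, "Quality": 0.81},
    {"T_Word": 0.93, "Quality": 0.77},
]


def average_per_label(folds):
    """Average each label's F1 over all folds."""
    labels = folds[0].keys()
    return {label: mean(fold[label] for fold in folds) for label in labels}


averages = average_per_label(fold_scores)
for label, score in averages.items():
    print(f"{label}: {score:.3f}")
```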


labeling each token with either Inside, Outside, or Beginning labels as appropriate. To fine-tune the models, we used MaChAmp [27], a specialized toolkit designed for multi-task fine-tuning scenarios. In this approach, each label classification is treated as a distinct task. This setup ensures that simpler tasks, such as recognizing lexical units, contribute as auxiliary tasks to more complex label classifications like “Circumstances” or “Effect”, which can span entire sentences rather than individual words. MaChAmp enables the choice of different parameters, such as loss weight, epochs and batch size, and we tested different configurations¹⁶. The results in Table 3 for the multitask approach all use the configuration which yielded the best results. The configuration is the same for all the models and is reported in Appendix A.

5.2. “Single Task” configuration as Baseline

Similar to the system for smell information extraction presented in [8], we designed our baseline approach as a single-task multiclass classification, where the model assigns one of 21 possible labels to each token. These labels include 20 representing either “begin” or “inside” of each lexical unit and frame element, and 1 label representing “outside”. As we did for the multitask approach, each model is fine-tuned with a token classification head on top¹⁷. During the training of each model, a hyperparameter search was conducted on the first fold of our data. The search space included learning rates [1e-5, 2e-5, 3e-5, 4e-5, 5e-5], batch sizes [8, 16, 32], and training epochs up to 20, with warmup applied for 10% of the training steps. After determining the optimal hyperparameters for each model, it is fine-tuned five times, each time with a different data fold, and the average scores were computed. We present the results for the single task approach of each model in italics in Table 3. We observe high performance variations across different frame elements, with the best results obtained for “Quality” and “Taste_Modifier”. This is probably due to the fact that their syntactic realization tends to be consistent in the different documents, with “Quality” mainly expressed by adjectives and “Taste_Modifier” by prepositional phrases introduced by ‘with’. On the contrary, classification results for “Taste_Source” are quite low despite it being the most frequent FE in the training set, probably because it can be expressed by many different role fillers and syntactic constructions. Upon reviewing the test and prediction results, we find that most mistakes concerning Taste_Source are due to a wrong span extent: for instance, the system predicts “the taste of [lollipop]” while the gold standard is “the taste [of lollipop]”. This issue is also likely reflected in the inter-annotator agreement (IAA) of the benchmark. In the future, we will consider alternative ways to evaluate text spans besides exact match, for instance by computing the cosine similarity between gold instances and system predictions.

Overall, MacBERTh is the best model for Taste_Word detection, but the different FEs are mostly detected with higher F1 scores using RoBERTa xlm. For this reason, we plan to adopt this model for our future research on gustatory language.

6. Conclusions and Future Direction

In this paper, we presented a benchmark for gustatory events containing manually annotated taste-related information, built as a counterpart to the one proposed in [3]. The benchmark is constructed with the same approach adopting a frame-based methodological framework to

¹⁶ Loss weight with different combinations over the labels [1, 0.75], epochs [10, 20, 30], and batch size [16, 32]
¹⁷ https://huggingface.co/docs/transformers/tasks/token_classification
analyze sensory language. We emphasized the importance of frame-based
analysis to capture sensory events by exploring the characterization of
positive and negative valence in the benchmarks through the analysis of
taste and smell words and sources. The analysis based on frames seems
to bring relevant insights into capturing sensory valence from
different perspectives, likely supporting the suitability of this
approach to deal with humanistic inquiries. We then presented a
supervised system to automatically extract taste-related frames,
trained on this benchmark. This preliminary exploration and the results
obtained with our experiments seem promising for future exploration
with automatically extracted data. Indeed, the limited data of the
benchmark are not enough to draw relevant conclusions, and for this
reason we plan to use our system to extract more data and conduct
large-scale analyses of the evolution of sensory information over time.
The limited number of documents is likely a contributing factor to the
significant discrepancies in accuracy among the different frame
elements: more instances are needed to enable good generalization.
Future steps should involve increasing the number of documents and
providing less sparse annotations, aiming for better temporal balance.
The focus should be on annotating frame elements with lower scores and
fewer instances in the benchmark, such as Taste_Carrier and Location.
Additionally, alternative metrics and techniques should be employed to
capture and explain performance variations across different models. As
a further comparison, we also plan to assess the performance of
general-purpose frame semantic parsers like LOME [28] on our benchmark.

7. Acknowledgments

Funded by the European Union under grant agreement 101088548 -
TRIFECTA. Views and opinions expressed are however those of the author
only and do not necessarily reflect those of the European Union or the
European Research Council. Neither the European Union nor the granting
authority can be held responsible for them. The authors would also like
to thank Marieke Van Erp, the head of the project, for her support.

References

 [1] B. Winter, Sensory linguistics: Language, perception and metaphor,
     volume 20, John Benjamins Publishing Company, 2019.
 [2] B. Magnini, V. Balaraman, S. Magnolini, M. Guerini, F. B. Kessler,
     T. Povo, What’s in a food name: Knowledge induction from
     gazetteers of food main ingredient, in: Proceedings of CLiC-it
     2018, 2018, p. 241.
 [3] S. Menini, T. Paccosi, S. Tonelli, M. Van Erp, I. Leemans,
     P. Lisena, R. Troncy, W. Tullett, A. Hürriyetoğlu, G. Dijkstra,
     et al., A multilingual benchmark to capture olfactory situations
     over time, in: Proceedings of the 3rd Workshop on Computational
     Approaches to Historical Language Change, 2022, pp. 1–10.
 [4] C. J. Fillmore, Frame semantics and the nature of language, Annals
     of the New York Academy of Sciences 280 (1976) 20–32.
 [5] S. S. Tekiroğlu, G. Özbal, C. Strapparava, A computational
     approach to generate a sensorial lexicon, in: Proceedings of the
     4th Workshop on Cognitive Aspects of the Lexicon (CogALex),
     Association for Computational Linguistics and Dublin City
     University, Dublin, Ireland, 2014, pp. 114–125. URL:
     https://aclanthology.org/W14-4716. doi:10.3115/v1/W14-4716.
 [6] R. Brate, P. Groth, M. van Erp, Towards olfactory information
     extraction from text: A case study on detecting smell experiences
     in novels, in: Proceedings of the 4th Joint SIGHUM Workshop on
     Computational Linguistics for Cultural Heritage, Social Sciences,
     Humanities and Literature, International Committee on
     Computational Linguistics, Online, 2020, pp. 147–155. URL:
     https://aclanthology.org/2020.latechclfl-1.18.
 [7] S. Menini, T. Paccosi, S. S. Tekiroğlu, S. Tonelli, Scent mining:
     Extracting olfactory events, smell sources and qualities, in:
     Proceedings of the 7th Joint SIGHUM Workshop on Computational
     Linguistics for Cultural Heritage, Social Sciences, Humanities and
     Literature, 2023, pp. 135–140.
 [8] S. Menini, Semantic frame extraction in multilingual olfactory
     events, in: Proceedings of the 2024 Joint International Conference
     on Computational Linguistics, Language Resources and Evaluation
     (LREC-COLING 2024), 2024, pp. 14622–14627.
 [9] C. Boscher, C. Largeron, V. Eglin, E. Egyed-Zsigmond, Sense-lm: A
     synergy between a language model and sensorimotor representations
     for auditory and olfactory information extraction, in: Findings of
     the Association for Computational Linguistics: EACL 2024, 2024,
     pp. 1695–1711.
[10] OpenAI, GPT-4 technical report, arXiv preprint arXiv:2303.08774
     (2023).
[11] D. Jurafsky, The language of food: A linguist reads the menu,
     first ed., W. W. Norton & Company, New York, 2014.
[12] G. Cenikj, G. Popovski, R. Stojanov, B. K. Seljak, T. Eftimov,
     Butter: Bidirectional LSTM for food named-entity recognition,
     2020.
[13] R. Stojanov, G. Popovski, G. Cenikj, B. Koroušić Seljak,
     T. Eftimov, A fine-tuned bidirectional encoder representations
     from transformers model for food named-entity recognition:
     Algorithm development and validation, Journal of Medical Internet
     Research 23 (2021) e28229.
[14] G. Popovski, B. K. Seljak, T. Eftimov, Foodbase corpus: a new
     resource of annotated food entities, Database 2019 (2019) baz121.
[15] A. Wróblewska, A. Kaliska, M. Pawłowski, D. Wiśniewski,
     W. Sosnowski, A. Ławrynowicz, Tasteset–recipe dataset and food
     entities recognition benchmark, arXiv preprint arXiv:2204.07775
     (2022).
[16] T. Paccosi, S. Tonelli, A new annotation scheme for the semantics
     of taste, in: Proceedings of the 20th Joint ACL-ISO Workshop on
     Interoperable Semantic Annotation @ LREC-COLING 2024, 2024,
     pp. 39–46.
[17] J. Ruppenhofer, M. Ellsworth, M. Schwarzer-Petruck, C. R. Johnson,
     J. Scheffczyk, FrameNet II: Extended theory and practice,
     Technical Report, International Computer Science Institute, 2016.
[18] K. Krippendorff, Computing Krippendorff’s alpha-reliability, 2011.
[19] B. Winter, Taste and smell words form an affectively loaded and
     emotionally flexible part of the English lexicon, Language,
     Cognition and Neuroscience 31 (2016) 975–988.
[20] R. Caruana, Multitask learning: A knowledge-based source of
     inductive bias, in: Proceedings of the Tenth International
     Conference on Machine Learning, Citeseer, 1993, pp. 41–48.
[21] R. Caruana, Multitask learning, Machine Learning 28 (1997) 41–75.
[22] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones,
     A. N. Gomez, Ł. Kaiser, I. Polosukhin, Attention is all you need,
     Advances in Neural Information Processing Systems 30 (2017).
[23] J. Devlin, M.-W. Chang, K. Lee, K. Toutanova, BERT: Pre-training
     of deep bidirectional transformers for language understanding, in:
     Proceedings of the 2019 Conference of the North American Chapter
     of the Association for Computational Linguistics: Human Language
     Technologies, Volume 1 (Long and Short Papers), 2019,
     pp. 4171–4186.
[24] E. Manjavacas Arévalo, L. Fonteyn, MacBERTh: Development and
     evaluation of a historically pre-trained language model for
     English (1450-1950), in: Proceedings of the Workshop on Natural
     Language Processing for Digital Humanities (NLP4DH), Association
     for Computational Linguistics, 2021, pp. 23–36. URL:
     https://aclanthology.org/2021.nlp4dh-1.4.pdf.
[25] Y. Liu, M. Ott, N. Goyal, J. Du, M. Joshi, D. Chen, O. Levy,
     M. Lewis, L. Zettlemoyer, V. Stoyanov, RoBERTa: A robustly
     optimized BERT pretraining approach, CoRR abs/1907.11692 (2019).
     URL: http://arxiv.org/abs/1907.11692. arXiv:1907.11692.
[26] A. Conneau, K. Khandelwal, N. Goyal, V. Chaudhary, G. Wenzek,
     F. Guzmán, E. Grave, M. Ott, L. Zettlemoyer, V. Stoyanov,
     Unsupervised cross-lingual representation learning at scale, CoRR
     abs/1911.02116 (2019). URL: http://arxiv.org/abs/1911.02116.
     arXiv:1911.02116.
[27] R. Van Der Goot, A. Üstün, A. Ramponi, I. Sharaf, B. Plank,
     Massive choice, ample tasks (MaChAmp): A toolkit for multi-task
     learning in NLP, arXiv preprint arXiv:2005.14672 (2020).
[28] P. Xia, G. Qin, S. Vashishtha, Y. Chen, T. Chen, C. May,
     C. Harman, K. Rawlins, A. S. White, B. Van Durme, LOME: Large
     ontology multilingual extraction, in: D. Gkatzia, D. Seddah
     (Eds.), Proceedings of the 16th Conference of the European Chapter
     of the Association for Computational Linguistics: System
     Demonstrations, Association for Computational Linguistics, Online,
     2021, pp. 149–159. URL:
     https://aclanthology.org/2021.eacl-demos.19.
     doi:10.18653/v1/2021.eacl-demos.19.

 Part of Speech       Lexical Units
 Nouns                Acidity, aftertaste, aroma, bitterness, dainty, delicacy, disgust, distaste, flavor, flavour, flavorful,
                      flavourful, flavoring, flavouring, flavorsome, flavoursome, flavorous, flavourous, gustation, insipidity,
                      mistaste, over-eating, palatableness, piquancy, pungency, rancidity, relish, rellish (obsolete), saltness,
                      sapidity, sapor, savor, savoriness, savour, sharpness, smack, smatch, sourness, sowreness (archaic form of
                      sourness), sweetness, tang, tarage, tartness, tast (obsolete), taste, tastelessness, tasting, unsavoriness,
                      unsavouriness
 Adjectives           Acid, acidic, appetizing, appetizing, bitter, bitter-sweet, bland, dainty, delectable, delicious,
                      delightsom(e), disgusting, flavorless, flavorful, flavourful, flavourless, flavoursome, gamy, indigestible,
                      insipid, juicy, mellow, palatable, piquant, pungent, racy, rancid, rank, salt/salty, sapid, savory, savoury,
                      savourly, seasoned, sharp, sour, soured, sower (archaic form of sour), spicy, stale, sweet, tangy, tart,
                      tasteless, tasty, toothsome, unpalatable, unsavor, unsavour, unsavoury, unsavory, unseasoned, unsweet,
                      unsweetened, wearish, wersh, yummy
 Verbs                Drink (up), drinking (up), drank (up), drunk (up), eat (up), ate (up), eateth (archaic), eaten (up),
                      eating (up), distaste, distasting, distasted, mistaste, mistasted, mistasting, partake, partaking, partook,
                      partaken, relish, relisheth (archaic), relishing, relished, season, seasoning, seasoned, smack, smacking,
                      smacked, smatch (obsolete), sweeten, sweetening, sweetened, taste, tasting, tasted
 Adverbs              Sweetly, sourly, tastefully, bitterly, tastingly, unsavourily, unsavourly, insipidly, savourously, savourily,
                      flavourfully

Table 4
Lexical units for Taste

               Hyperparameter             Value
               β1, β2                   0.9, 0.99
               Dropout                        0.2
               Epochs                          20
               Batch Size                      32
               Learning Rate (LR)         0.0001
               Decay Factor                  0.38
               Cut Fraction                   0.3
               All tasks loss weight            1
Table 5
Hyperparameter values used for the experiments that yielded
the best results


Appendices
A. Lexical Units and Frame
   Elements
In Table 4, we display the list of lexical units or taste
words presented in [16].
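
To connect these lexical units and frame elements to the token-level labels described in Section 5.2, the following is a minimal sketch of a begin/inside/outside label scheme. The function names and the span format are illustrative assumptions, not the authors' code; only four category names mentioned in this paper are used, whereas the 21 labels of the baseline imply ten categories in total.

```python
from typing import List, Tuple

def build_label_set(categories: List[str]) -> List[str]:
    """One 'outside' label plus a 'begin' and an 'inside' label per category."""
    labels = ["O"]
    for category in categories:
        labels += [f"B-{category}", f"I-{category}"]
    return labels

def spans_to_bio(tokens: List[str], spans: List[Tuple[int, int, str]]) -> List[str]:
    """Convert (start, end_exclusive, category) token spans into per-token tags."""
    tags = ["O"] * len(tokens)
    for start, end, category in spans:
        tags[start] = f"B-{category}"
        for i in range(start + 1, end):
            tags[i] = f"I-{category}"
    return tags

# Hypothetical subset: only category names that appear in this paper.
CATEGORIES = ["Taste_Word", "Taste_Source", "Taste_Carrier", "Location"]
labels = build_label_set(CATEGORIES)

# Gold-standard reading of the example in the error analysis:
# "the taste [of lollipop]", with "taste" as the lexical unit.
tokens = ["the", "taste", "of", "lollipop"]
tags = spans_to_bio(tokens, [(1, 2, "Taste_Word"), (2, 4, "Taste_Source")])

for token, tag in zip(tokens, tags):
    print(f"{token}\t{tag}")
```

With ten categories the same construction gives 2 x 10 + 1 = 21 labels, which matches the size of the label set reported for the single-task baseline.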


B. Hyperparameter Values
The hyperparameter settings for all our models are presented in
Table 5. They are MaChAmp's default hyperparameter values, with the
addition of loss weights set to 1 and 20 epochs of training.
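
As an aid to reading the "Cut Fraction" and "Learning Rate" rows of Table 5, the sketch below assumes that they parameterise a slanted triangular learning rate schedule in the style of ULMFiT: a short linear increase followed by a long linear decay. This interpretation, the function name, and the `ratio` value of 32 are assumptions that do not come from this paper. The "Decay Factor" row is not modelled here; in that family of schedules it usually controls layer-wise learning rate decay.

```python
import math

def slanted_triangular_lr(step: int, total_steps: int, lr_max: float = 1e-4,
                          cut_frac: float = 0.3, ratio: float = 32.0) -> float:
    """Learning rate at `step` under a slanted triangular schedule.

    The rate rises linearly from lr_max / ratio to lr_max during the first
    cut_frac of training, then decays linearly back to lr_max / ratio.
    Assumes total_steps * cut_frac >= 1 so that the warm-up phase is non-empty.
    """
    cut = math.floor(total_steps * cut_frac)
    if step < cut:
        p = step / cut  # increasing phase
    else:
        p = 1.0 - (step - cut) / (cut * (1.0 / cut_frac - 1.0))  # decreasing phase
    return lr_max * (1.0 + p * (ratio - 1.0)) / ratio

# Illustrative run: 1000 is a hypothetical number of training steps.
total = 1000
for step in (0, 150, 300, 650, 1000):
    print(step, slanted_triangular_lr(step, total))
```

Under these assumptions, the peak learning rate of 0.0001 is reached after 30% of the training steps, and the rate at the start and end of training is lower by the factor `ratio`.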