Benchmarking the Semantics of Taste: Towards the Automatic Extraction of Gustatory Language

Teresa Paccosi1,2,3, Sara Tonelli1
1 Fondazione Bruno Kessler, Via Sommarive, 18, Trento
2 Università degli studi di Trento, Via Calepina, 14, Rovereto
3 DHLab / KNAW Humanities Cluster, Oudezijds Achterburgwal 185 1012 DK Amsterdam, The Netherlands

CLiC-it 2024: Tenth Italian Conference on Computational Linguistics, Dec 04-06, 2024, Pisa, Italy
Email: tpaccosi@fbk.eu; teresa.paccosi@unitn.it (T. Paccosi); satonelli@fbk.eu (S. Tonelli)
ORCID: 0009-0009-2348-7556 (T. Paccosi); 0000-0001-8010-6689 (S. Tonelli)
© 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (ceur-ws.org), ISSN 1613-0073

Abstract
In this paper, we present a benchmark containing texts manually annotated with gustatory semantic information. We employ a FrameNet-like approach previously tested to address olfactory language, which we adapt to capture gustatory events. We then explore the data in the benchmark to show the insights that this type of approach can bring, investigating emotional valence across text genres. Finally, we present a supervised system trained on the taste benchmark for the extraction of gustatory information from historical and contemporary texts.

Keywords
Sensory semantics, gustatory language, information extraction, digital humanities

1. Introduction

Despite the central role of nutrition in our lives, taste has often been classified as an inferior sense in the Western philosophical tradition. This downplayed role is reflected in the vocabulary used to describe the gustatory experience, which, like that of smell, is characterized by a scarcity of domain-specific terms [1]. The difficulty in capturing the semantics of taste could help explain why there are few works in the fields of Natural Language Processing (NLP) and Digital Humanities (DH) that deal with this sense and, in particular, with the language used to describe its experience. While there has been renewed interest in the automatic extraction of nutrients and ingredients from texts for health and medicinal purposes [2], less attention has been devoted to the development of tools and models focused on capturing the semantics of sensory experiences, especially in a diachronic fashion.

In this paper, we present an English benchmark for the study of gustatory language and a supervised system for the automatic extraction of taste-related events in English, which we trained using this benchmark. The benchmark was built to be a counterpart to the olfactory one presented in [3], with the idea of making the study of the language of these two senses comparable. The system is designed as a means to study the language used to describe the experience of tasting from both synchronic and diachronic perspectives. The selected formal representation for the semantics of taste is based on Frame Semantics [4], and the system is trained to identify the lexical units and the possible semantic roles contributing to the construction of a gustatory event. We present the results of the experiments and an exploration of the benchmark data, aiming to demonstrate the potential of frame-based analysis for sensory studies.

2. Related Work

In recent years, there has been a growing interest within the NLP community in developing resources designed to capture the sensory content of language [5]. In particular, in the framework of the three-year European Project “Odeuropa” (https://odeuropa.eu/), aimed at preserving intangible cultural heritage, several works have focused on analyzing smell descriptions [6] and extracting olfactory information from texts. For instance, [3] created a manually annotated benchmark with smell events, which has subsequently been used to train a system for olfactory information extraction [7, 8]. The benchmark focuses on the language used to describe olfactory experiences and covers a period of four centuries (1600-1900), making it useful for historical research. An extension in this direction is SENSE-LM, a system for extracting sensory information from texts, which shows that combining language models with lexical resource-based approaches yields better results in extracting sensory references from texts compared to systems that do not integrate these two components [9]. The authors were the first to combine sensorimotor representations with the textual features of language models for the task of sensory information extraction in text documents. Although they propose the system for all five senses, they only tested it on olfactory and auditory language, using respectively the benchmark of [3] and an artificial dataset they generated with GPT-4 [10].

Most existing work on food representation in the field of NLP focuses on health-related applications. A notable work with a linguistic focus is [2], where the authors concentrate on identifying noun-compound head nouns for developing conversational agents in the e-commerce domain. They propose a supervised approach based on a neural sequence-to-sequence model to identify the most informative token in Italian food compound nouns, obtaining promising results despite the complexity of the task. Taste has also been addressed from a diachronic point of view in [11], in which the author reconstructs the evolution of food language, focusing on the history of some dishes and ingredients across continents using computational linguistic tools. Several studies have developed named-entity recognition (NER) models to automatically extract food entities for medicinal purposes and food science applications [12, 13], creating domain-specific corpora by sourcing data from culinary websites and online recipe books [14, 15].

3. Benchmark for Taste

The training data we use for the models in this paper is a benchmark created according to the annotation guidelines presented in [16]. The formalization adopted to annotate the benchmark is inspired by Frame Semantics [4] and its implementation through the FrameNet annotation project [17]. In FrameNet, events and situations are constructed as frames, structures that represent the knowledge necessary to understand the meaning of words. Frames include two main components, namely lexical units, domain-specific words or expressions that trigger the frame, and frame elements, domain-specific semantic roles usually attached as dependents to the lexical unit. In our case, taste events are captured through a so-called Gustatory frame, which is triggered in a document by Taste_Words (i.e., domain-specific lexical units). Each lexical unit is annotated in the benchmark together with the frame elements associated with it, which the taste extraction system should then identify automatically. For instance, in the sentence “[Slimy milk]Taste_Source has an [unpleasant]Quality taste”, the system has to identify the Taste_Word (‘taste’), and then the possible frame elements (in this case, Taste_Source and Quality). A list of the possible frame elements and their definitions is provided in Table 1.

| Frame Element | Definition |
|---|---|
| Taste_Source | The food items that are ingested |
| Quality | Any property used to describe the taste (usually adjectives) |
| Taste_Carrier | Anything that can contain the taste source |
| Taster | The person/animal who ingests the food |
| Evoked_Taste | The taste that is evoked but is not present (e.g., it tastes like onions) |
| Location | The place in which the food is tasted |
| Taste_Modifier | An ingredient that can modify the perception of the taste of a taste source |
| Circumstances | The condition or circumstance in which the taste event occurs |
| Effect | Any effect provoked by the tasting experience |

Table 1: List of Gustatory Frame Elements

The documents annotated in the benchmark cover 5 different domains or genres, almost evenly distributed, with three or four documents per century in every domain, for a total of 72 documents. The genres are: Literature, Science & Philosophy, Household & Recipes, Travel & Ethnography, and Medicine & Botany. To select the documents, we automatically searched several English corpora and taste-related websites for texts presenting a greater density of lexical units (taste words; the list is provided in Appendix A). The corpora from which we extracted the documents we annotated are:

(1) Early English Books Online (EEBO) (https://textcreationpartnership.org/tcp-texts/eebo-tcp-early-english-books-online/), a collection of documents published between 1475 and 1700 covering different domains such as literature, philosophy, politics, religion, geography, history, and mathematics;
(2) Project Gutenberg (https://www.gutenberg.org/), a digitized archive of cultural works, containing different repositories, mainly in the literary domain;
(3) medievalcookery.com (https://www.medievalcookery.com/etexts.html?England), a list of texts freely available online relating to medieval food and ancient cooking recipes;
(4) foodsofengland.co.uk (http://www.foodsofengland.co.uk/references.htm), an online library which holds the complete texts of several cook books from 1390 to 1974;
(5) Wikisource (https://en.wikisource.org/wiki/Main_Page), an online digital library of free-content textual sources managed by the Wikimedia Foundation;
(6) British Library (https://data.bl.uk/digbks/), a collection of 65,227 digitised volumes from the 16th to the 19th Century;
(7) London Pulse Medical Reports (https://wellcomelibrary.org/moh/about-the-reports/about-the-medical-officer-of-health-reports/), a collection of 5800 Medical Officer of Health reports from the Greater London area from 1848 to 1972.

In Table 2 we report the statistics of the annotated benchmark (note that in [16] we presented only a preliminary version of the benchmark containing around 1,400 Taste_Words). The most frequent frame element is the Taste_Source, followed by Quality and Taste_Modifier, which represent the core frame elements, while the rest of the frame elements are much sparser. Even though the distribution of the frame elements is not balanced, the system is trained to extract the taste words and all 9 frame elements.

| Frame Elements (FEs) | 1500 | 1600 | 1700 | 1800 | 1900 | Overall |
|---|---|---|---|---|---|---|
| Taste_Words | 440 | 2417 | 500 | 1498 | 803 | 5,648 |
| Taste_Source | 372 | 1627 | 375 | 1081 | 599 | 4,393 |
| Quality | 197 | 1495 | 255 | 881 | 489 | 1,732 |
| Taste_Modifier | 135 | 142 | 66 | 154 | 78 | 1,357 |
| Taster | 65 | 173 | 85 | 185 | 100 | 638 |
| Evoked_Taste | 20 | 127 | 31 | 53 | 16 | 247 |
| Location | 11 | 44 | 12 | 24 | 16 | 116 |
| Taste_Carrier | 9 | 38 | 9 | 26 | 12 | 98 |
| Circumstances | 19 | 206 | 38 | 228 | 82 | 656 |
| Effect | 24 | 56 | 32 | 34 | 31 | 174 |

Table 2: Statistics of the Taste Benchmark

Two expert linguists, trained on the guidelines of [16], annotated three documents from 1670, 1720, and 1920 to assess Inter Annotator Agreement (IAA). Krippendorff’s alpha [18] at span level was 0.70, indicating moderate agreement.

4. Exploration of olfactory and gustatory benchmarks

It has been observed that words used to describe olfactory and gustatory experiences tend to appear more frequently in emotionally charged contexts and carry a stronger evaluative content compared to words related to other senses [19]. By ‘evaluative content’, we refer in this paper to the concept of ‘emotional valence’, which is defined as “the pleasantness of a word in terms of positive and negative meaning” ([1], p. 201). We therefore conducted an exploration of the gustatory benchmark to investigate the positive and negative connotations of gustatory events across different text genres. We performed the same analysis for olfactory events, using the olfactory benchmark of [3], in order to compare the outcome for the two senses.

To perform this analysis, we first divided the Taste_Words and Smell_Words into positive and negative. To this purpose, we used the categories Savouriness and Unsavouriness for Taste and Fragrant/Fragrance and Stench for Smell proposed in the Historical Thesaurus of English (in the categories at https://ht.ac.uk/category/: The world>physical sensation>Taste/Flavour>Savouriness&Unsavouriness; The world>physical sensation>Smell/Odour>Fragrant/Fragrance&Stench). This thesaurus contains almost every recorded word in English from medieval times to the present day, ordered into detailed hierarchies of meaning. In the Thesaurus, every category of the hierarchy is divided per part of speech (PoS). For our analysis, we manually selected all the nouns, adjectives and adverbs used in the period we cover with our documents, namely from the 16th century to the 20th century. We then assigned the words labeled as Taste_Words and Smell_Words in the documents to one of the two categories (positive or negative) and calculated the normalized frequency of each category across different text genres. As reported in Section 3, the genres represented in the gustatory benchmark are: Literature, Science & Philosophy, Household & Recipes, Travel & Ethnography, Medicine & Botany. In the olfactory benchmark presented in [3], there are instead 10 different genres: Household & Recipes, Law & Regulations, Literature, Medicine & Botany, Perfumes & Fashion, Public health, Religion, Science & Philosophy, Theatre, Travel & Ethnography.

We display the output of these analyses in Fig. 1 (for taste words) and Fig. 2 (for smell words), aimed at showing which emotional valence prevails in each genre for the two senses. We observe that two genres exhibit opposite tendencies: medicine/botany shows a more negative orientation in the smell benchmark and a more positive one in the taste benchmark, whereas travel/ethnography is more positive concerning smell and more negative for taste (see Fig. 1 and Fig. 2, where light blue refers to negative valence and dark blue to positive valence).

Figure 1: Savoury (dark blue) and Unsavoury (light blue) frequencies of taste words in genres

Figure 2: Fragrant/Fragrance (dark blue) and Stench (light blue) frequencies of smell words in genres

We then analyzed the most frequent smell and taste sources in the two selected genres to understand why they exhibit such a difference in emotional valence. We notice that smell sources in medicine/botany tend to belong to hospital and disease-related domains, with words such as ‘urine’ and ‘fetid bronchitis’, while taste sources more easily belong to the realm of common food, with words such as ‘almonds’ and ‘apples’. As for travel/ethnography, the most frequently described taste sources include exotic and rare foods such as ‘coconut’ and ‘plantain’, likely unpleasant to the palates of foreign travelers. Smell sources tend to refer instead to plants, like ‘flowers’ or ‘roots’, hence usually pleasant or neutral to the noses of the writers. This analysis of the distribution of categories and sources in the genres underlines the importance of a frame-based analysis for understanding and comparing sensory descriptions, in particular their emotional valence.

5. System for Gustatory Information Extraction

The benchmark introduced in the previous sections is used to train a classifier whose goal is to detect gustatory information in English texts. The system is based on multi-task learning (Section 5.1), and is then compared with a “single task” classifier, which we consider our baseline (Section 5.2).

5.1. Multitask configuration

To build our system for gustatory information extraction, we adopted a multitask learning approach [20, 21], a configuration successfully tested for olfactory information extraction in [7, 8]. This approach treats the classification of lexical units and of each frame element as different tasks. Additionally, we explored a “single task” classification approach, where both lexical units and frame elements are classified within a multiclass token classification task. The results of these experiments served as a baseline for evaluating the effectiveness of the multitask approach. In both configurations, we employed a transformer-based model fine-tuned for a token classification task [22]. This methodology has proved effective across various NLP tasks, including olfactory information extraction [8] and the extraction of food-related ingredients [13]. We experimented with the two configurations using monolingual (English) and multilingual versions of BERT and RoBERTa and an English historical model, MacBERTh. The models we use are listed below:

- English BERT: bert-base-cased (https://huggingface.co/google-bert/bert-base-cased) [23]
- Multilingual BERT (mBERT): bert-base-multilingual-cased (https://huggingface.co/google-bert/bert-base-multilingual-cased) [23]
- English historical model: MacBERTh (https://huggingface.co/emanjavacas/MacBERTh) [24]
- English RoBERTa: roberta-base (https://huggingface.co/FacebookAI/roberta-base) [25]
- Multilingual RoBERTa (RoBERTa xlm): xlm-roberta-large (https://huggingface.co/FacebookAI/xlm-roberta-base) [26]

We fine-tuned each model using the same data, maintaining identical training, validation, and test splits, and evaluated them using 5-fold cross-validation. Each fold contained 80% of the lexical units and their related frame elements for training, 10% for validation (dev), and 10% for testing. These splits were consistent across all configurations and not entirely random. This configuration ensured a balanced distribution of frame elements and comparability in every run. For labeling the data, we adopted the IOB (Inside-Outside-Beginning) labeling format, as used in [7, 8]. This method facilitates a comprehensive analysis of sentences and lexical expressions by labeling each token with either Inside, Outside, or Beginning labels as appropriate.

To fine-tune the models, we used MaChAmp [27], a specialized toolkit designed for multi-task fine-tuning scenarios. In this approach, each label classification is treated as a distinct task. This setup ensures that simpler tasks, such as recognizing lexical units, contribute as auxiliary tasks to more complex label classifications like “Circumstances” or “Effect”, which include entire sentences rather than individual words. MaChAmp enables the choice of different parameters, such as loss weight, epochs and batch size, and we tested different configurations (loss weight with different combinations over the labels [1, 0.75], epochs [10, 20, 30], and batch size [16, 32]). The results in Table 3 for the multitask approach share the configuration which yielded the best results. The configuration is the same for all the models and is reported in Appendix B.

5.2. “Single Task” configuration Baseline

Similar to the system for smell information extraction presented in [8], we designed our baseline approach as a single-task multiclass classification, where the model assigns one of 21 possible labels to each token. These labels include 20 representing either “begin” or “inside” of each lexical unit and frame element, and 1 label representing “outside”. As we did for the multitask approach, each model is fine-tuned with a token classification head on top (https://huggingface.co/docs/transformers/tasks/token_classification). During the training of each model, a hyperparameter search was conducted on the first fold of our data. The search space included learning rates [1e-5, 2e-5, 3e-5, 4e-5, 5e-5], batch sizes [8, 16, 32], and training epochs up to 20, with warmup applied for 10% of the training steps. After determining the optimal hyperparameters for each model, it was fine-tuned five times, each time with a different data fold, and the average scores were computed. We present the results of the single-task approach for each model in the rows marked “single” in Table 3.

| Model | Config. | T_Word | T_Source | Quality | Circum. | Effect | Evoked_T | Loc. | T_Carr. | T_Modif. | Taster |
|---|---|---|---|---|---|---|---|---|---|---|---|
| BERT | multitask | 0.917 | 0.537 | 0.780 | 0.413 | 0.196 | 0.457 | 0.379 | 0.111 | 0.781 | 0.518 |
| BERT | single | 0.903 | 0.530 | 0.712 | 0.308 | 0.019 | 0.254 | 0.206 | 0.0 | 0.681 | 0.434 |
| mBERT | multitask | 0.919 | 0.554 | 0.784 | 0.402 | 0.180 | 0.466 | 0.357 | 0.087 | 0.763 | 0.511 |
| mBERT | single | 0.910 | 0.557 | 0.740 | 0.284 | 0.0 | 0.304 | 0.162 | 0.0 | 0.694 | 0.434 |
| MacBERTh | multitask | 0.943 | 0.580 | 0.799 | 0.444 | 0.285 | 0.501 | 0.338 | 0.093 | 0.783 | 0.512 |
| MacBERTh | single | 0.909 | 0.548 | 0.720 | 0.366 | 0.021 | 0.226 | 0.242 | 0.0 | 0.688 | 0.455 |
| RoBERTa | multitask | 0.913 | 0.558 | 0.786 | 0.414 | 0.219 | 0.473 | 0.406 | 0.094 | 0.772 | 0.508 |
| RoBERTa | single | 0.891 | 0.553 | 0.726 | 0.343 | 0.0 | 0.33 | 0.228 | 0.0 | 0.726 | 0.5 |
| RoB.-xlm | multitask | 0.932 | 0.587 | 0.817 | 0.452 | 0.279 | 0.497 | 0.416 | 0.105 | 0.784 | 0.563 |
| RoB.-xlm | single | 0.903 | 0.601 | 0.777 | 0.4 | 0.021 | 0.409 | 0.25 | 0.0 | 0.743 | 0.539 |

Table 3: Results (F1) of the classifiers on the lexical unit (T_Word) and the 9 frame elements with single-task and multitask configurations. The results are the average of the F1 results of each label across the 5 folds.

We observe high performance variations across different frame elements, with the best results obtained for “Quality” and “Taste_Modifier”. This is probably due to the fact that their syntactic realization tends to be consistent in the different documents, with “Quality” mainly expressed by adjectives and “Taste_Modifier” by prepositional phrases introduced by with. On the contrary, classification results for “Taste_Source” are quite low despite it being the most frequent FE in the training set, probably because it can be expressed by many different role fillers and syntactic constructions. Upon reviewing the test and prediction results, we find that most mistakes concerning Taste_Source are due to a wrong span extent: for instance, the system predicts “the taste of [lollipop]” while the gold standard is “the taste [of lollipop]”. This issue is also likely reflected in the inter-annotator agreement (IAA) of the benchmark. In the future, we will consider alternative ways to evaluate text spans besides exact match, for instance by computing the cosine similarity between gold instances and system predictions.

Overall, MacBERTh is the best model for Taste_Word detection, but the different FEs are mostly detected with higher accuracy using RoBERTa xlm. For this reason, we plan to adopt this model for our future research on gustatory language.

6. Conclusions and Future Direction

In this paper, we presented a benchmark for gustatory events containing manually annotated taste-related information, built as a counterpart to the one proposed in [3]. The benchmark is constructed with the same approach, adopting a frame-based methodological framework to analyze sensory language. We emphasized the importance of frame-based analysis to capture sensory events by exploring the characterization of positive and negative valence in the benchmarks through the analysis of taste and smell words and sources. The analysis based on frames seems to bring relevant insights into capturing sensory valence from different perspectives, likely supporting the suitability of this approach to deal with humanistic inquiries. We then presented a supervised system to automatically extract taste-related frames, trained on this benchmark. This preliminary exploration and the results obtained with our experiments seem promising for future exploration with automatically extracted data. Indeed, the limited data of the benchmark are not enough to draw relevant conclusions, and for this reason we plan to use our system to extract more data and conduct large-scale analyses of the evolution of sensory information over time. The limited number of documents is likely a contributing factor to the significant discrepancies in accuracy among the different frame elements, which need more instances to enable good generalization. Future steps should involve increasing the number of documents and providing less sparse annotations, aiming for better temporal balance. The focus should be on annotating frame elements with lower scores and fewer instances in the benchmark, such as Taste_Carrier and Location. Additionally, alternative metrics and techniques should be employed to capture and explain performance variations across different models. As a further comparison, we also plan to assess the performance of general-purpose frame semantic parsers like LOME [28] on our benchmark.

7. Acknowledgments

Funded by the European Union under grant agreement 101088548-TRIFECTA. Views and opinions expressed are however those of the author only and do not necessarily reflect those of the European Union or the European Research Council. Neither the European Union nor the granting authority can be held responsible for them. The authors would also like to thank Marieke Van Erp, the head of the project, for her support.

References

[1] B. Winter, Sensory linguistics: Language, perception and metaphor, volume 20, John Benjamins Publishing Company, 2019.
[2] B. Magnini, V. Balaraman, S. Magnolini, M. Guerini, F. B. Kessler, T. Povo, What’s in a food name: Knowledge induction from gazetteers of food main ingredient, in: Proceedings of CLiC-it 2018, 2018, p. 241.
[3] S. Menini, T. Paccosi, S. Tonelli, M. Van Erp, I. Leemans, P. Lisena, R. Troncy, W. Tullett, A. Hürriyetoğlu, G. Dijkstra, et al., A multilingual benchmark to capture olfactory situations over time, in: Proceedings of the 3rd Workshop on Computational Approaches to Historical Language Change, 2022, pp. 1–10.
[4] C. J. Fillmore, Frame semantics and the nature of language, Annals of the New York Academy of Sciences 280 (1976) 20–32.
[5] S. S. Tekiroğlu, G. Özbal, C. Strapparava, A computational approach to generate a sensorial lexicon, in: Proceedings of the 4th Workshop on Cognitive Aspects of the Lexicon (CogALex), Association for Computational Linguistics and Dublin City University, Dublin, Ireland, 2014, pp. 114–125. URL: https://aclanthology.org/W14-4716. doi:10.3115/v1/W14-4716.
[6] R. Brate, P. Groth, M. van Erp, Towards olfactory information extraction from text: A case study on detecting smell experiences in novels, in: Proceedings of the 4th Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature, International Committee on Computational Linguistics, Online, 2020, pp. 147–155. URL: https://aclanthology.org/2020.latechclfl-1.18.
[7] S. Menini, T. Paccosi, S. S. Tekiroğlu, S. Tonelli, Scent mining: Extracting olfactory events, smell sources and qualities, in: Proceedings of the 7th Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature, 2023, pp. 135–140.
[8] S. Menini, Semantic frame extraction in multilingual olfactory events, in: Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), 2024, pp. 14622–14627.
[9] C. Boscher, C. Largeron, V. Eglin, E. Egyed-Zsigmond, Sense-lm: A synergy between a language model and sensorimotor representations for auditory and olfactory information extraction, in: Findings of the Association for Computational Linguistics: EACL 2024, 2024, pp. 1695–1711.
[10] OpenAI, GPT-4 technical report, arXiv preprint arXiv:2303.08774 (2023).
[11] D. Jurafsky, The language of food: a linguist reads the menu, first ed., W.W. Norton & Company, New York, 2014.
[12] G. Cenikj, G. Popovski, R. Stojanov, B. K. Seljak, T. Eftimov, Butter: Bidirectional lstm for food named-entity recognition, 2020.
[13] R. Stojanov, G. Popovski, G. Cenikj, B. Koroušić Seljak, T. Eftimov, A fine-tuned bidirectional encoder representations from transformers model for food named-entity recognition: Algorithm development and validation, Journal of Medical Internet Research 23 (2021) e28229.
[14] G. Popovski, B. K. Seljak, T. Eftimov, Foodbase corpus: a new resource of annotated food entities, Database 2019 (2019) baz121.
[15] A. Wróblewska, A. Kaliska, M. Pawłowski, D. Wiśniewski, W. Sosnowski, A. Ławrynowicz, Tasteset–recipe dataset and food entities recognition benchmark, arXiv preprint arXiv:2204.07775 (2022).
[16] T. Paccosi, S. Tonelli, A new annotation scheme for the semantics of taste, in: Proceedings of the 20th Joint ACL-ISO Workshop on Interoperable Semantic Annotation @ LREC-COLING 2024, 2024, pp. 39–46.
[17] J. Ruppenhofer, M. Ellsworth, M. Schwarzer-Petruck, C. R. Johnson, J. Scheffczyk, FrameNet II: Extended theory and practice, Technical Report, International Computer Science Institute, 2016.
[18] K. Krippendorff, Computing Krippendorff’s alpha-reliability, 2011.
[19] B. Winter, Taste and smell words form an affectively loaded and emotionally flexible part of the English lexicon, Language, Cognition and Neuroscience 31 (2016) 975–988.
[20] R. Caruana, Multitask learning: A knowledge-based source of inductive bias, in: Proceedings of the Tenth International Conference on Machine Learning, Citeseer, 1993, pp. 41–48.
[21] R. Caruana, Multitask learning, Machine learning 28 (1997) 41–75.
[22] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, I. Polosukhin, Attention is all you need, Advances in neural information processing systems 30 (2017).
[23] J. Devlin, M.-W. Chang, K. Lee, K. Toutanova, Bert: Pre-training of deep bidirectional transformers for language understanding, in: Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), 2019, pp. 4171–4186.
[24] E. Manjavacas Arévalo, L. Fonteyn, MacBERTh: Development and evaluation of a historically pre-trained language model for English (1450-1950), in: Proceedings of the Workshop on Natural Language Processing for Digital Humanities (NLP4DH), Association for Computational Linguistics, 2021, pp. 23–36. URL: https://aclanthology.org/2021.nlp4dh-1.4.pdf.
[25] Y. Liu, M. Ott, N. Goyal, J. Du, M. Joshi, D. Chen, O. Levy, M. Lewis, L. Zettlemoyer, V. Stoyanov, Roberta: A robustly optimized BERT pretraining approach, CoRR abs/1907.11692 (2019). URL: http://arxiv.org/abs/1907.11692. arXiv:1907.11692.
[26] A. Conneau, K. Khandelwal, N. Goyal, V. Chaudhary, G. Wenzek, F. Guzmán, E. Grave, M. Ott, L. Zettlemoyer, V. Stoyanov, Unsupervised cross-lingual representation learning at scale, CoRR abs/1911.02116 (2019). URL: http://arxiv.org/abs/1911.02116. arXiv:1911.02116.
[27] R. Van Der Goot, A. Üstün, A. Ramponi, I. Sharaf, B. Plank, Massive choice, ample tasks (MaChAmp): A toolkit for multi-task learning in NLP, arXiv preprint arXiv:2005.14672 (2020).
[28] P. Xia, G. Qin, S. Vashishtha, Y. Chen, T. Chen, C. May, C. Harman, K. Rawlins, A. S. White, B. Van Durme, LOME: Large ontology multilingual extraction, in: D. Gkatzia, D. Seddah (Eds.), Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: System Demonstrations, Association for Computational Linguistics, Online, 2021, pp. 149–159. URL: https://aclanthology.org/2021.eacl-demos.19. doi:10.18653/v1/2021.eacl-demos.19.

Appendices

A. Lexical Units and Frame Elements

In Table 4, we display the list of lexical units or taste words presented in [16].

| Part of Speech | Lexical Units |
|---|---|
| Nouns | Acidity, aftertaste, aroma, bitterness, dainty, delicacy, disgust, distaste, flavor, flavour, flavorful, flavourful, flavoring, flavouring, flavorsome, flavoursome, flavorous, flavourous, gustation, insipidity, mistaste, over-eating, palatableness, piquancy, pungency, rancidity, relish, rellish (obsolete), saltness, sapidity, sapor, savor, savoriness, savour, sharpness, smack, smatch, sourness, sowreness (archaic form of sourness), sweetness, tang, tarage, tartness, tast (obsolete), taste, tastelessness, tasting, unsavoriness, unsavouriness |
| Adjectives | Acid, acidic, appetizing, appetizing, bitter, bitter-sweet, bland, dainty, delectable, delicious, delightsom(e), disgusting, flavorless, flavorful, flavourful, flavourless, flavoursome, gamy, indigestible, insipid, juicy, mellow, palatable, piquant, pungent, racy, rancid, rank, salt/salty, sapid, savory, savoury, savourly, seasoned, sharp, sour, soured, sower (archaic form of sour), spicy, stale, sweet, tangy, tart, tasteless, tasty, toothsome, unpalatable, unsavor, unsavour, unsavoury, unsavory, unseasoned, unsweet, unsweetened, wearish, wersh, yummy |
| Verbs | Drink (up), drinking (up), drank (up), drunk (up), eat (up), ate (up), eateth (archaic), eaten (up), eating (up), distaste, distasting, distasted, mistaste, mistasted, mistasting, partake, partaking, partook, partaken, relish, relisheth (archaic), relishing, relished, season, seasoning, seasoned, smack, smacking, smacked, smatch (obsolete), sweeten, sweetening, sweetened, taste, tasting, tasted |
| Adverbs | Sweetly, sourly, tastefully, bitterly, tastingly, unsavourily, unsavourly, insipidly, savourously, savourily, flavourfully |

Table 4: Lexical units for Taste

B. Hyperparameter Values

The hyperparameter setting for all our models is presented in Table 5. The setting corresponds to MaChAmp’s default hyperparameter values, with the addition of loss weights at 1 and 20 epochs of training.

| Hyperparameter | Value |
|---|---|
| β1, β2 | 0.9, 0.99 |
| Dropout | 0.2 |
| Epochs | 20 |
| Batch Size | 32 |
| Learning Rate (LR) | 0.0001 |
| Decay Factor | 0.38 |
| Cut Fraction | 0.3 |
| All tasks loss weight | 1 |

Table 5: Hyperparameter values used for the experiments which yield the best results
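To make the label inventories of Section 5 concrete, the following is a minimal sketch of how annotated spans could be turned into IOB tags for the two configurations. It is an illustration under stated assumptions and not the code used in the experiments: the function names, the `(start, end, category)` span representation, and the choice of one tag sequence per category for the multitask case are all hypothetical. Only the category names, the IOB scheme, and the count of 21 single-task labels come from the paper.

```python
from typing import Dict, List, Tuple

# Category names as given in the paper: the lexical unit plus the 9 frame elements.
CATEGORIES = [
    "Taste_Word", "Taste_Source", "Quality", "Taste_Carrier", "Taster",
    "Evoked_Taste", "Location", "Taste_Modifier", "Circumstances", "Effect",
]

# Hypothetical span format: (first token index, end index exclusive, category).
Span = Tuple[int, int, str]


def build_label_set(categories: List[str]) -> List[str]:
    """Single-task inventory: one B- and one I- label per category, plus O."""
    labels = ["O"]
    for category in categories:
        labels.extend([f"B-{category}", f"I-{category}"])
    return labels


def spans_to_iob(n_tokens: int, spans: List[Span]) -> List[str]:
    """Flat (single-task) encoding: each token gets exactly one label.

    Assumption of this sketch: overlapping spans are rejected, because a
    flat sequence cannot represent them.
    """
    tags = ["O"] * n_tokens
    for start, end, category in spans:
        if not 0 <= start < end <= n_tokens:
            raise ValueError(f"span out of range: {(start, end, category)}")
        if any(tag != "O" for tag in tags[start:end]):
            raise ValueError("overlapping spans cannot be encoded in one sequence")
        tags[start] = f"B-{category}"
        for i in range(start + 1, end):
            tags[i] = f"I-{category}"
    return tags


def spans_to_multitask(n_tokens: int, spans: List[Span],
                       categories: List[str]) -> Dict[str, List[str]]:
    """Multitask-style encoding: one B/I/O sequence per category (one per task)."""
    per_task = {category: ["O"] * n_tokens for category in categories}
    for start, end, category in spans:
        if not 0 <= start < end <= n_tokens:
            raise ValueError(f"span out of range: {(start, end, category)}")
        per_task[category][start] = "B"
        for i in range(start + 1, end):
            per_task[category][i] = "I"
    return per_task


# Example sentence from Section 3, tokenized on whitespace for illustration.
tokens = ["Slimy", "milk", "has", "an", "unpleasant", "taste"]
spans = [(0, 2, "Taste_Source"), (4, 5, "Quality"), (5, 6, "Taste_Word")]

labels = build_label_set(CATEGORIES)
flat_tags = spans_to_iob(len(tokens), spans)
task_tags = spans_to_multitask(len(tokens), spans, CATEGORIES)

for token, tag in zip(tokens, flat_tags):
    print(f"{token}\t{tag}")
```

In the flat encoding, a token can carry only one label, whereas keeping one sequence per category lets each task be learned and weighted separately, which matches the description of each label classification being treated as a distinct task.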