<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Benchmarking the Semantics of Taste: Towards the Automatic Extraction of Gustatory Language</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Teresa</forename><surname>Paccosi</surname></persName>
							<email>tpaccosi@fbk.eu</email>
							<affiliation key="aff0">
								<orgName type="institution">Fondazione Bruno Kessler</orgName>
								<address>
									<addrLine>Via Sommarive, 18</addrLine>
									<settlement>Trento</settlement>
								</address>
							</affiliation>
							<affiliation key="aff1">
								<orgName type="institution">Università degli studi di Trento</orgName>
								<address>
									<addrLine>Via Calepina</addrLine>
									<postCode>14</postCode>
									<settlement>Rovereto</settlement>
								</address>
							</affiliation>
							<affiliation key="aff2">
								<orgName type="department">DHLab / KNAW Humanities Cluster</orgName>
								<address>
									<addrLine>Oudezijds Achterburgwal</addrLine>
									<postCode>185, 1012 DK</postCode>
									<settlement>Amsterdam</settlement>
									<country key="NL">The Netherlands</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Sara</forename><surname>Tonelli</surname></persName>
							<email>satonelli@fbk.eu</email>
							<affiliation key="aff0">
								<orgName type="institution">Fondazione Bruno Kessler</orgName>
								<address>
									<addrLine>Via Sommarive, 18</addrLine>
									<settlement>Trento</settlement>
								</address>
							</affiliation>
						</author>
						<author>
							<affiliation key="aff3">
								<address>
									<addrLine>Dec 04 -06</addrLine>
									<postCode>2024</postCode>
									<settlement>Pisa</settlement>
									<country key="IT">Italy</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Benchmarking the Semantics of Taste: Towards the Automatic Extraction of Gustatory Language</title>
					</analytic>
					<monogr>
						<idno type="ISSN">1613-0073</idno>
					</monogr>
					<idno type="MD5">BED1AE1D38FE01D4EE107BAAD2111230</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2025-04-23T17:35+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>Sensory semantics</term>
					<term>gustatory language</term>
					<term>information extraction</term>
					<term>digital humanities</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>In this paper, we present a benchmark containing texts manually annotated with gustatory semantic information. We employ a FrameNet-like approach previously tested to address olfactory language, which we adapt to capture gustatory events. We then propose an exploration of the data in the benchmark to show the possible insights brought by this type of approach, addressing the investigation of emotional valence in text genres. Finally, we present a supervised system trained with the taste benchmark for the extraction of gustatory information from historical and contemporary texts.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">Introduction</head><p>Despite the central role of nutrition in our lives, taste has often been classified as an inferior sense in the Western philosophical tradition. This downplayed role is reflected in the vocabulary used to describe the gustatory experience, which, like that of smell, is characterized by a scarcity of domain-specific terms <ref type="bibr" target="#b0">[1]</ref>. The difficulty in capturing the semantics of taste could help explain why there are few works in the fields of Natural Language Processing (NLP) and Digital Humanities (DH) that deal with this sense and, in particular, with the language used to describe its experience. While there has been renewed interest in the automatic extraction of nutrients and ingredients from texts for health and medicinal purposes <ref type="bibr" target="#b1">[2]</ref>, less attention has been devoted to the development of tools and models focused on capturing the semantics of sensory experiences, especially in a diachronic fashion.</p><p>In this paper, we present an English benchmark for the study of gustatory language and a supervised system for the automatic extraction of taste-related events in English, which we trained using this benchmark. The benchmark was built to be a counterpart to the olfactory one presented in <ref type="bibr" target="#b2">[3]</ref>, with the idea of making the study of the language of these two senses comparable. The system is designed as a means to study the language used to describe the experience of tasting from both synchronic and diachronic perspectives. The selected formal representation for the semantics of taste is based on Frame Semantics <ref type="bibr" target="#b3">[4]</ref>, and the system is trained to identify the lexical units and the possible semantic roles contributing to the construction of a gustatory event.
We present the results of the experiments and an exploration of the benchmark data, aiming to demonstrate the potential of frame-based analysis for sensory studies.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">Related Work</head><p>In recent years, there has been a growing interest within the NLP community in developing resources designed to capture the sensory content of language <ref type="bibr" target="#b4">[5]</ref>. In particular, in the framework of the three-year European Project "Odeuropa"<ref type="foot">1</ref>, aimed at preserving intangible cultural heritage, several works have focused on analyzing smell descriptions <ref type="bibr" target="#b5">[6]</ref> and extracting olfactory information from texts. For instance, <ref type="bibr" target="#b2">[3]</ref> created a manually annotated benchmark with smell events, which has subsequently been used to train a system for olfactory information extraction <ref type="bibr" target="#b6">[7,</ref><ref type="bibr" target="#b7">8]</ref>. The benchmark focuses on the language used to describe olfactory experiences and covers a period of four centuries (1600-1900), making it useful for historical research. An extension in this direction is SENSE-LM, a system for extracting sensory information from texts, which shows that combining language models with lexical resource-based approaches yields better results in extracting sensory references from texts than systems that do not integrate these two components <ref type="bibr" target="#b8">[9]</ref>. The authors were the first to combine sensorimotor representations with the textual features of language models for the task of sensory information extraction in text documents. Although they propose the system for all five senses, they tested it only on olfactory and auditory language, using respectively the benchmark of <ref type="bibr" target="#b2">[3]</ref> and an artificial dataset they generated with GPT-4 <ref type="bibr" target="#b9">[10]</ref>.</p><p>Most existing work on food representation in the field of NLP focuses on health-related applications. A notable work with a linguistic focus is <ref type="bibr" target="#b1">[2]</ref>, where the authors concentrate on identifying the head nouns of noun compounds for developing conversational agents in the e-commerce domain. They propose a supervised approach based on a neural sequence-to-sequence model to identify the most informative token in Italian food compound nouns, obtaining promising results despite the complexity of the task. Taste has also been addressed from a diachronic point of view in <ref type="bibr" target="#b10">[11]</ref>, in which the author reconstructs the evolution of food language, focusing on the history of some dishes and ingredients across continents using computational linguistic tools. Several studies have developed named-entity recognition (NER) models to automatically extract food entities for medicinal purposes and food science applications <ref type="bibr" target="#b11">[12,</ref><ref type="bibr" target="#b12">13]</ref>, creating domain-specific corpora by sourcing data from culinary websites and online recipe books <ref type="bibr" target="#b13">[14,</ref><ref type="bibr" target="#b14">15]</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Table 1</head><p>List of Gustatory Frame Elements</p><table><row><cell>Frame Element</cell><cell>Definition</cell></row><row><cell>Taste_Source</cell><cell>The food items that are ingested</cell></row><row><cell>Quality</cell><cell>Any property used to describe the taste (usually adjectives)</cell></row><row><cell>Taste_Carrier</cell><cell>Anything that can contain the taste source</cell></row><row><cell>Taster</cell><cell>The person/animal who ingests the food</cell></row><row><cell>Evoked_Taste</cell><cell>The taste that is evoked but is not present (e.g., it tastes like onions)</cell></row><row><cell>Location</cell><cell>The place in which the food is tasted</cell></row><row><cell>Taste_Modifier</cell><cell>An ingredient that can modify the perception of the taste of a taste source</cell></row><row><cell>Circumstances</cell><cell>The condition or circumstance in which the taste event occurs</cell></row><row><cell>Effect</cell><cell>Any effect provoked by the tasting experience</cell></row></table></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">Benchmark for Taste</head><p>The training data we use for the models in this paper is a benchmark created according to the annotation guidelines presented in <ref type="bibr" target="#b15">[16]</ref>. The formalization adopted to annotate the benchmark is inspired by Frame Semantics <ref type="bibr" target="#b3">[4]</ref> and its implementation through the FrameNet annotation project <ref type="bibr" target="#b16">[17]</ref>. In FrameNet, events and situations are represented as frames, structures that capture the knowledge necessary to understand the meaning of words. Frames include two main components: lexical units, domain-specific words or expressions that trigger the frame, and frame elements, domain-specific semantic roles usually attached as dependents to the lexical unit. In our case, taste events are captured through a so-called Gustatory frame, which is triggered in a document by Taste_Words (i.e., domain-specific lexical units). Each lexical unit is annotated in the benchmark together with the frame elements associated with it, which the taste extraction system should then identify automatically. For instance, in the sentence "[Slimy milk]<hi rend="subscript">Taste_Source</hi> has an [unpleasant]<hi rend="subscript">Quality</hi> taste", the system has to identify the Taste_Word ('taste'), and then the possible frame elements (in this case, Taste_Source and Quality). A list of the possible frame elements and their definitions is provided in Table <ref type="table" target="#tab_0">1</ref>. In Table <ref type="table" target="#tab_2">2</ref> we report the statistics of the annotated benchmark (note that in <ref type="bibr" target="#b15">[16]</ref> we presented only a preliminary version of the benchmark containing around 1,400 Taste_Words). The most frequent frame element is Taste_Source, followed by Quality and Taste_Modifier, which represent the core frame elements, while the rest of the frame elements are much sparser.
Even though the distribution of the frame elements is not balanced, the system is trained to extract the taste words and all 9 frame elements. Two expert linguists, trained on the guidelines of <ref type="bibr" target="#b15">[16]</ref>, annotated three documents from 1670, 1720, and 1920 to assess Inter-Annotator Agreement (IAA). Krippendorff's alpha <ref type="bibr" target="#b17">[18]</ref> at span level was 0.70, indicating moderate agreement.</p></div>
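<div xmlns="http://www.tei-c.org/ns/1.0"><p>As an illustration of the annotation scheme described in this section, the following minimal Python sketch represents the example sentence as a Gustatory frame. The class names (Span, GustatoryFrame), the helper build_example, and the character-offset encoding are assumptions made for this illustration; they are not the storage format of the benchmark.</p><p><code lang="python"><![CDATA[
```python
from dataclasses import dataclass, field

# The nine frame elements of the Gustatory frame, as listed in Table 1.
FRAME_ELEMENTS = (
    "Taste_Source", "Quality", "Taste_Carrier", "Taster", "Evoked_Taste",
    "Location", "Taste_Modifier", "Circumstances", "Effect",
)


@dataclass(frozen=True)
class Span:
    """Hypothetical labeled character span (end offset is exclusive)."""
    label: str
    start: int
    end: int


@dataclass
class GustatoryFrame:
    """Hypothetical container: one Taste_Word plus its frame elements."""
    sentence: str
    taste_word: Span
    elements: list = field(default_factory=list)

    def add(self, label, text):
        # Reject labels that are not part of the frame element inventory.
        if label not in FRAME_ELEMENTS:
            raise ValueError(f"unknown frame element: {label}")
        start = self.sentence.index(text)  # first occurrence only
        self.elements.append(Span(label, start, start + len(text)))

    def text_of(self, span):
        return self.sentence[span.start:span.end]


def build_example():
    sentence = "Slimy milk has an unpleasant taste"
    start = sentence.index("taste")
    frame = GustatoryFrame(sentence, Span("Taste_Word", start, start + 5))
    frame.add("Taste_Source", "Slimy milk")
    frame.add("Quality", "unpleasant")
    return frame


frame = build_example()
for span in [frame.taste_word] + frame.elements:
    print(span.label, "->", frame.text_of(span))
```
]]></code></p><p>Keeping the lexical unit separate from its frame elements mirrors the description above, in which the Taste_Word triggers the frame and the frame elements attach to it.</p></div>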
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">Exploration of olfactory and gustatory benchmarks</head><p>It has been observed that words used to describe olfactory and gustatory experiences tend to appear more frequently in emotionally charged contexts and carry a stronger evaluative content compared to words related to other senses <ref type="bibr" target="#b18">[19]</ref>. By 'evaluative content', we refer in this paper to the concept of 'emotional valence', which is defined as "the pleasantness of a word in terms of positive and negative meaning" ([1], p. 201). We therefore conducted an exploration of the gustatory benchmark to investigate the positive and negative connotations of gustatory events across different text genres. We perform the same analysis for olfactory events, using the olfactory benchmark of <ref type="bibr" target="#b2">[3]</ref>, in order to compare the outcome for the two senses. To perform this analysis, we first divide Taste_Words and Smell_Words into positive and negative.</p><p>For this purpose, we use the categories proposed in the Historical Thesaurus of English: Savouriness and Unsavouriness for Taste, and Fragrant/Fragrance and Stench for Smell <ref type="foot" target="#foot_8">10</ref>. This thesaurus contains almost every recorded word in English from medieval times to the present day, ordered into detailed hierarchies of meaning. In the Thesaurus, every category of the hierarchy is divided by part of speech (PoS). For our analysis, we manually selected all the nouns, adjectives and adverbs used in the period covered by our documents, namely from the 16th to the 20th century. We then assigned the words labeled as Taste_Words and Smell_Words in the documents to one of the two categories (positive or negative) and calculated the normalized frequency of each category across different text genres. As reported in Section 3, the gustatory benchmark covers five genres. We display the output of these analyses in Fig. 
<ref type="figure" target="#fig_1">1</ref> (for taste words) and Fig. <ref type="figure" target="#fig_2">2</ref> (for smell words), which show the emotional valence that prevails in each genre for the two senses. We observe that two genres exhibit opposite tendencies: medicine/botany shows a more negative orientation in the smell benchmark and a more positive one in the taste benchmark, whereas travel/ethnography is more positive concerning smell and more negative for taste (see Fig. <ref type="figure" target="#fig_1">1</ref> and Fig. <ref type="figure" target="#fig_2">2</ref>, where light blue refers to negative valence and dark blue to positive valence). We then analyzed the most frequent smell and taste sources in the two selected genres to explain why they exhibit such a difference in emotional valence. We notice that smell sources in medicine/botany tend to belong to hospital and disease-related domains, with terms such as 'urine' and 'fetid bronchitis', while taste sources more often belong to the realm of common food, with words such as 'almonds' and 'apples'. In travel/ethnography, instead, the most frequently described taste sources include exotic and rare foods such as 'coconut' and 'plantain', which were likely unpleasant to the palates of foreign travelers. Smell sources tend to refer instead to plants, like 'flowers' or 'roots', hence usually pleasant or neutral to the noses of the writers. This analysis of the distribution of categories and sources across genres underlines the importance of a frame-based analysis for understanding and comparing sensory descriptions, in particular their emotional valence.</p></div>
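<div xmlns="http://www.tei-c.org/ns/1.0"><p>The valence analysis described in this section can be sketched as follows. The word lists and the annotated tokens below are toy data invented for illustration; they are neither the Historical Thesaurus categories nor counts from the benchmarks. Normalizing by the number of categorized words in each genre is one plausible reading of "normalized frequency" and is assumed here; the function name is hypothetical.</p><p><code lang="python"><![CDATA[
```python
from collections import Counter, defaultdict

# Toy valence lexicons (assumption: stand-ins for the thesaurus categories).
POSITIVE = {"sweet", "delicious", "savoury"}
NEGATIVE = {"bitter", "rancid", "unsavoury"}


def valence_by_genre(annotations):
    """annotations: iterable of (genre, taste_word) pairs.

    Returns {genre: {"positive": share, "negative": share}}, where each
    share is relative to the categorized words of that genre.
    """
    counts = defaultdict(Counter)
    for genre, word in annotations:
        w = word.lower()
        if w in POSITIVE:
            counts[genre]["positive"] += 1
        elif w in NEGATIVE:
            counts[genre]["negative"] += 1
        # Words found in neither list stay uncategorized and are skipped.
    result = {}
    for genre, c in counts.items():
        total = c["positive"] + c["negative"]
        result[genre] = {k: c[k] / total for k in ("positive", "negative")}
    return result


toy = [
    ("Medicine & Botany", "sweet"), ("Medicine & Botany", "bitter"),
    ("Medicine & Botany", "delicious"), ("Medicine & Botany", "taste"),
    ("Travel & Ethnography", "rancid"), ("Travel & Ethnography", "bitter"),
    ("Travel & Ethnography", "sweet"), ("Travel & Ethnography", "unsavoury"),
]
shares = valence_by_genre(toy)
for genre, s in shares.items():
    print(genre, s)
```
]]></code></p><p>Because shares are computed per genre, genres with very different numbers of annotated words remain comparable, which is the point of normalizing before plotting.</p></div>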
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.">System for Gustatory Information Extraction</head><p>The benchmark introduced in the previous sections is used to train a classifier whose goal is to detect gustatory information in English texts. The system is based on multi-task learning (Section 5.1), and is then compared with a "single task" classifier, which we consider our baseline (Section 5.2).</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.1.">Multitask configuration</head><p>To build our system for gustatory information extraction, we adopted a multitask learning approach <ref type="bibr" target="#b19">[20,</ref><ref type="bibr" target="#b20">21]</ref>, a configuration successfully tested for olfactory information extraction in <ref type="bibr" target="#b6">[7,</ref><ref type="bibr" target="#b7">8]</ref>. This approach treats the classification of lexical units and of each frame element as different tasks. Additionally, we explored a "single task" classification approach, where both lexical units and frame elements are classified within a multiclass token classification task. The results of these experiments served as a baseline for evaluating the effectiveness of the multitask approach. In both configurations, we employed a transformer-based model fine-tuned for a token classification task <ref type="bibr" target="#b21">[22]</ref>. This methodology has proved effective across various NLP tasks, including olfactory information extraction <ref type="bibr" target="#b7">[8]</ref> and the extraction of food-related ingredients <ref type="bibr" target="#b12">[13]</ref>. We experiment with the two configurations using monolingual (English) and multilingual versions of BERT and RoBERTa and an English historical model, MacBERTh.
The models we use are listed below:</p><list><item>English BERT: bert-base-cased<ref type="foot" target="#foot_9">11</ref> <ref type="bibr" target="#b22">[23]</ref></item><item>Multilingual BERT (mBERT): bert-base-multilingual-cased<ref type="foot" target="#foot_10">12</ref> <ref type="bibr" target="#b22">[23]</ref></item><item>English historical model: MacBERTh<ref type="foot" target="#foot_11">13</ref> <ref type="bibr" target="#b23">[24]</ref></item><item>English RoBERTa: roberta-base<ref type="foot" target="#foot_12">14</ref> <ref type="bibr" target="#b24">[25]</ref></item><item>Multilingual RoBERTa (RoBERTa xlm): xlm-roberta-large<ref type="foot" target="#foot_13">15</ref> <ref type="bibr" target="#b25">[26]</ref></item></list><p>We fine-tuned each model using the same data, maintaining identical training, validation, and test splits, and evaluated them using 5-fold cross-validation. Each fold contained 80% of the lexical units and their related frame elements for training, 10% for validation (dev), and 10% for testing. The splits were identical across all configurations and were not entirely random, which ensured a balanced distribution of frame elements and comparability across runs. For labeling the data, we adopted the IOB (Inside-Outside-Beginning) labeling format, as used in <ref type="bibr" target="#b6">[7,</ref><ref type="bibr" target="#b7">8]</ref>. This method facilitates a comprehensive analysis of sentences and lexical expressions by labeling each token with either Inside, Outside, or Beginning labels as appropriate. To fine-tune the models, we used MaChAmp <ref type="bibr" target="#b26">[27]</ref>, a specialized toolkit designed for multi-task fine-tuning scenarios. In this approach, each label classification is treated as a distinct task. This setup ensures that simpler tasks, such as recognizing lexical units, contribute as auxiliary tasks to more complex label classifications like "Circumstances" or "Effect", which can cover entire sentences rather than individual words. MaChAmp enables the choice of different parameters, such as loss weight, epochs and batch size, and we tested different configurations<ref type="foot">16</ref>. The results in Table <ref type="table">3</ref> for the multitask approach share the configuration which yielded the best results. The configuration is the same for all the models and is reported in Appendix B.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Table 3</head><p>Results (F1) of the classifiers on the lexical unit (T_Word) and 9 frame elements with single (italics) and multitask configurations.</p><p>The results are the average of the F1 results of each label across the 5 folds.</p></div>
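<div xmlns="http://www.tei-c.org/ns/1.0"><p>To make the labeling scheme concrete, the sketch below converts span annotations for the example sentence from Section 3 into IOB tags, producing one tag sequence per label as in the multitask setup. The function name, the whitespace tokenization, and the token-index span format are assumptions for illustration; the input files actually passed to MaChAmp are not reproduced here.</p><p><code lang="python"><![CDATA[
```python
def iob_per_task(tokens, spans):
    """Build one IOB tag sequence per label (one task per label).

    tokens: list of str.
    spans: list of (label, start, end) token indices, end exclusive.
    Returns {label: [tag for each token]}.
    """
    tags = {}
    for label, start, end in spans:
        # Each label gets its own sequence, initialized to all "O".
        seq = tags.setdefault(label, ["O"] * len(tokens))
        seq[start] = "B-" + label
        for i in range(start + 1, end):
            seq[i] = "I-" + label
    return tags


tokens = "Slimy milk has an unpleasant taste".split()
spans = [
    ("Taste_Source", 0, 2),  # "Slimy milk"
    ("Quality", 4, 5),       # "unpleasant"
    ("Taste_Word", 5, 6),    # "taste"
]
tagged = iob_per_task(tokens, spans)
for label, seq in tagged.items():
    print(label, list(zip(tokens, seq)))
```
]]></code></p><p>Keeping a separate sequence per label means that a token can in principle carry tags for several labels at once, which a single flat tag sequence cannot express.</p></div>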
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.2.">"Single Task" configuration as Baseline</head><p>Similar to the system for smell information extraction presented in <ref type="bibr" target="#b7">[8]</ref>, we designed our baseline approach as a single-task multiclass classification, where the model assigns one of 21 possible labels to each token. These labels include 20 representing either "begin" or "inside" of each lexical unit and frame element, and 1 label representing "outside". As in the multitask approach, each model is fine-tuned with a token classification head on top<note place="foot" n="17">https://huggingface.co/docs/transformers/tasks/token_classification</note>. For each model, a hyperparameter search was conducted on the first fold of our data. The search space included learning rates [1e-5, 2e-5, 3e-5, 4e-5, 5e-5], batch sizes [8, 16, 32], and up to 20 training epochs, with warmup applied for 10% of the training steps. After determining the optimal hyperparameters, each model was fine-tuned five times, each time with a different data fold, and the average scores were computed. We present the results of the single-task approach for each model in italics in Table <ref type="table">3</ref>. We observe high performance variations across different frame elements, with the best results obtained for "Quality" and "Taste_Modifier". This is probably due to the fact that their syntactic realization tends to be consistent across documents, with "Quality" mainly expressed by adjectives and "Taste_Modifier" by prepositional phrases introduced by "with".
On the contrary, classification results for "Taste_Source" are quite low despite it being the most frequent FE in the training set, probably because it can be expressed by many different role fillers and syntactic constructions. Upon reviewing the test and prediction results, we find that most mistakes concerning Taste_Source are due to a wrong span extent: for instance, the system predicts "the taste of [lollipop]" while the gold standard is "the taste [of lollipop]". This issue is also likely reflected in the inter-annotator agreement (IAA) of the benchmark. In the future, we will consider alternative ways to evaluate text spans besides exact match, for instance by computing the cosine similarity between gold instances and system predictions. Overall, MacBERTh is the best model for Taste_Word detection, but the different FEs are mostly detected with higher accuracy using RoBERTa xlm. For this reason, we plan to adopt this model for our future research on gustatory language.</p><note place="foot" n="16">Loss weight with different combinations over the labels [1, 0.75], epochs [10, 20, 30], and batch size [16, 32].</note></div>
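<div xmlns="http://www.tei-c.org/ns/1.0"><p>The span-boundary issue described above can be illustrated with a small sketch that compares exact-match scoring with a token-overlap (Jaccard) score on the lollipop example. Token overlap is a substitute chosen here for simplicity; it is not the cosine-similarity measure mentioned as future work, and the function names and token-index span format are hypothetical.</p><p><code lang="python"><![CDATA[
```python
def exact_match(gold, pred):
    """gold, pred: (start, end) token spans with end exclusive."""
    return 1.0 if gold == pred else 0.0


def token_jaccard(gold, pred):
    """Tokens shared by both spans divided by tokens in either span."""
    g = set(range(*gold))
    p = set(range(*pred))
    union = g | p
    return len(g & p) / len(union) if union else 0.0


tokens = ["the", "taste", "of", "lollipop"]
gold = (2, 4)  # "of lollipop"
pred = (3, 4)  # "lollipop"
print(exact_match(gold, pred), token_jaccard(gold, pred))
```
]]></code></p><p>Under exact match the prediction counts as a full error, whereas an overlap-based score gives partial credit for recovering the head of the span.</p></div>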
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6.">Conclusions and Future Direction</head><p>In this paper, we presented a benchmark for gustatory events containing manually annotated taste-related information, built as a counterpart to the one proposed in <ref type="bibr" target="#b2">[3]</ref>.</p><p>The benchmark is constructed with the same approach, adopting a frame-based methodological framework to analyze sensory language. We emphasized the importance of frame-based analysis to capture sensory events by exploring the characterization of positive and negative valence in the benchmarks through the analysis of taste and smell words and sources. The analysis based on frames seems to bring relevant insights into capturing sensory valence from different perspectives, likely supporting the suitability of this approach for humanistic inquiries. We then presented a supervised system to automatically extract taste-related frames, trained on this benchmark. This preliminary exploration and the results obtained with our experiments seem promising for future exploration with automatically extracted data. Indeed, the limited data of the benchmark are not enough to draw relevant conclusions, and for this reason we plan to use our system to extract more data and conduct large-scale analyses of the evolution of sensory information over time.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Appendices</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>A. Lexical Units and Frame Elements</head><p>In Table <ref type="table" target="#tab_4">4</ref>, we display the list of lexical units or taste words presented in <ref type="bibr" target="#b15">[16]</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>B. Hyperparameter Values</head><p>The hyperparameter setting used for all our models is presented in Table <ref type="table" target="#tab_5">5</ref>. It corresponds to MaChAmp's default hyperparameter values, with loss weights set to 1 and 20 training epochs.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head></head><label></label><figDesc>The genres represented in the gustatory benchmark are: Literature, Science &amp; Philosophy, Household &amp; Recipes, Travel &amp; Ethnography, Medicine &amp; Botany. In the olfactory benchmark presented in [3], there are instead 10 different genres: Household &amp; Recipes, Law &amp; Regulations, Literature, Medicine &amp; Botany, Perfumes &amp; Fashion, Public health, Religion, Science &amp; Philosophy, Theatre, Travel &amp; Ethnography.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_1"><head>Figure 1 :</head><label>1</label><figDesc>Figure 1: Savoury (dark blue) and Unsavoury (light blue) frequencies of taste words in genres</figDesc><graphic coords="4,89.29,84.19,203.37,104.80" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_2"><head>Figure 2 :</head><label>2</label><figDesc>Figure 2: Fragrant/Fragrance (dark blue) and Stench (light blue) frequencies of smell words in genres</figDesc><graphic coords="4,89.29,230.21,203.37,167.48" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head>Table 1</head><label>1</label><figDesc></figDesc><table /><note>The documents annotated in the benchmark cover 5 different domains or genres, almost evenly distributed, with 3 to 4 documents per century in every domain, for a total of 72 documents. The genres are: Literature, Science &amp; Philosophy, Household &amp; Recipes, Travel &amp; Ethnography, and Medicine &amp; Botany. To select the documents, we automatically searched several English corpora and taste-related websites for texts with a high density of lexical units (taste words)<ref type="foot" target="#foot_0">2</ref>. The corpora from which we extracted the annotated documents are: (1) Early English Books Online (EEBO)<ref type="foot" target="#foot_1">3</ref>, a collection of documents published between 1475 and 1700 covering different domains such as literature, philosophy, politics, religion, geography, history, and mathematics; (2) Project Gutenberg<ref type="foot" target="#foot_2">4</ref>, a digitized archive of cultural works, containing different repositories, mainly in the literary domain; (3) medievalcookery.com<ref type="foot" target="#foot_3">5</ref>, a list of texts freely available online relating to medieval food and ancient cooking recipes; (4) foodsofengland.co.uk<ref type="foot" target="#foot_4">6</ref>, an online library which holds the complete texts of several cook books from 1390 to 1974; (5) Wikisource<ref type="foot" target="#foot_5">7</ref>, an online digital library of free-content textual sources managed by the Wikimedia Foundation; (6) British Library<ref type="foot" target="#foot_6">8</ref>, a collection of 65,227 digitised volumes from the 16th to the 19th century; (7) London Pulse Medical Reports<ref type="foot" target="#foot_7">9</ref>, a collection of 5800 Medical Officer of Health reports from the Greater London area from 1848 to 1972.</note></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_1"><head></head><label></label><figDesc></figDesc><table><row><cell>Frame Elements (FEs)</cell><cell>1500</cell><cell>1600</cell><cell>1700</cell><cell>1800</cell><cell>1900</cell><cell>Overall</cell></row><row><cell>Taste_Words</cell><cell>440</cell><cell>2417</cell><cell>500</cell><cell>1498</cell><cell>803</cell><cell>5,648</cell></row><row><cell>Taste_Source</cell><cell>372</cell><cell>1627</cell><cell>375</cell><cell>1081</cell><cell>599</cell><cell>4,393</cell></row><row><cell>Quality</cell><cell>197</cell><cell>1495</cell><cell>255</cell><cell>881</cell><cell>489</cell><cell>1,732</cell></row><row><cell>Taste_Modifier</cell><cell>135</cell><cell>142</cell><cell>66</cell><cell>154</cell><cell>78</cell><cell>1,357</cell></row><row><cell>Taster</cell><cell>65</cell><cell>173</cell><cell>85</cell><cell>185</cell><cell>100</cell><cell>638</cell></row><row><cell>Evoked_Taste</cell><cell>20</cell><cell>127</cell><cell>31</cell><cell>53</cell><cell>16</cell><cell>247</cell></row><row><cell>Location</cell><cell>11</cell><cell>44</cell><cell>12</cell><cell>24</cell><cell>16</cell><cell>116</cell></row><row><cell>Taste_Carrier</cell><cell>9</cell><cell>38</cell><cell>9</cell><cell>26</cell><cell>12</cell><cell>98</cell></row><row><cell>Circumstances</cell><cell>19</cell><cell>206</cell><cell>38</cell><cell>228</cell><cell>82</cell><cell>656</cell></row><row><cell>Effect</cell><cell>24</cell><cell>56</cell><cell>32</cell><cell>34</cell><cell>31</cell><cell>174</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_2"><head>Table 2</head><label>2</label><figDesc>Statistics of the Taste Benchmark</figDesc><table /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_4"><head>Table 4</head><label>4</label><figDesc>Lexical units for Taste</figDesc><table><row><cell>Part of Speech</cell><cell>Lexical Units</cell></row><row><cell>Nouns</cell><cell>Acidity, aftertaste, aroma, bitterness, dainty, delicacy, disgust, distaste, flavor, flavour, flavorful, flavourful, flavoring, flavouring, flavorsome, flavoursome, flavorous, flavourous, gustation, insipidity, mistaste, over-eating, palatableness, piquancy, pungency, rancidity, relish, rellish (obsolete), saltness, sapidity, sapor, savor, savoriness, savour, sharpness, smack, smatch, sourness, sowreness (archaic form of sourness), sweetness, tang, tarage, tartness, tast (obsolete), taste, tastelessness, tasting, unsavoriness, unsavouriness</cell></row><row><cell>Adjectives</cell><cell>Acid, acidic, appetizing, appetizing, bitter, bitter-sweet, bland, dainty, delectable, delicious, delightsom(e), disgusting, flavorless, flavorful, flavourful, flavourless, flavoursome, gamy, indigestible, insipid, juicy, mellow, palatable, piquant, pungent, racy, rancid, rank, salt/salty, sapid, savory, savoury, savourly, seasoned, sharp, sour, soured, sower (archaic form of sour), spicy, stale, sweet, tangy, tart, tasteless, tasty, toothsome, unpalatable, unsavor, unsavour, unsavoury, unsavory, unseasoned, unsweet, unsweetened, wearish, wersh, yummy</cell></row><row><cell>Verbs</cell><cell>Drink (up), drinking (up), drank (up), drunk (up), eat (up), ate (up), eateth (archaic), eaten (up), eating (up), distaste, distasting, distasted, mistaste, mistasted, mistasting, partake, partaking, partook, partaken, relish, relisheth (archaic), relishing, relished, season, seasoning, seasoned, smack, smacking, smacked, smatch (obsolete), sweeten, sweetening, sweetened, taste, tasting, tasted</cell></row><row><cell>Adverbs</cell><cell>Sweetly, sourly, tastefully, bitterly, tastingly, unsavourily, unsavourly, insipidly, savourously, savourily, flavourfully</cell></row></table><note>The limited number of documents is likely a contributing factor to the significant discrepancies in accuracy among the different frame elements, necessitating more instances to enable a good generalization. Future steps should involve increasing the number of documents and providing less sparse annotations, aiming for better temporal balance. The focus should be on annotating frame elements with lower scores and fewer instances in the benchmark, such as Taste_Carrier and Location. Additionally, alternative metrics and techniques should be employed to capture and explain performance variations across different models. As a further comparison, we also plan to assess the performance of general-purpose frame semantic parsers like LOME <ref type="bibr" target="#b27">[28]</ref> on our benchmark.</note></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_5"><head>Table 5</head><label>5</label><figDesc>Hyperparameter values used in the experiments that yielded the best results</figDesc><table /></figure>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="2" xml:id="foot_0">The list of lexical units is provided in Appendix A</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="3" xml:id="foot_1">https://textcreationpartnership.org/tcp-texts/eebo-tcp-early-english-books-online/</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="4" xml:id="foot_2">https://www.gutenberg.org/</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="5" xml:id="foot_3">https://www.medievalcookery.com/etexts.html?England</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="6" xml:id="foot_4">http://www.foodsofengland.co.uk/references.htm</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="7" xml:id="foot_5">https://en.wikisource.org/wiki/Main_Page</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="8" xml:id="foot_6">https://data.bl.uk/digbks/</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="9" xml:id="foot_7">https://wellcomelibrary.org/moh/about-the-reports/about-the-medical-officer-of-health-reports/</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="10" xml:id="foot_8">In the categories at https://ht.ac.uk/category/: The world&gt;physical sensation&gt;Taste/Flavour&gt;Savouriness&amp;Unsavouriness; The world&gt;physical sensation&gt;Smell/Odour&gt;Fragrant/Fragrance&amp;Stench</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="11" xml:id="foot_9">https://huggingface.co/google-bert/bert-base-cased</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="12" xml:id="foot_10">https://huggingface.co/google-bert/bert-base-multilingual-cased</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="13" xml:id="foot_11">https://huggingface.co/emanjavacas/MacBERTh</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="14" xml:id="foot_12">https://huggingface.co/FacebookAI/roberta-base</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="15" xml:id="foot_13">https://huggingface.co/FacebookAI/xlm-roberta-base</note>
		</body>
		<back>

			<div type="acknowledgement">
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="7.">Acknowledgments</head><p>Funded by the European Union under grant agreement 101088548 - TRIFECTA. Views and opinions expressed are, however, those of the author only and do not necessarily reflect those of the European Union or the European Research Council. Neither the European Union nor the granting authority can be held responsible for them. The authors would also like to thank Marieke Van Erp, the head of the project, for her support.</p></div>
			</div>

			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<monogr>
		<title level="m" type="main">Sensory linguistics: Language, perception and metaphor</title>
		<author>
			<persName><forename type="first">B</forename><surname>Winter</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2019">2019</date>
			<publisher>John Benjamins Publishing Company</publisher>
			<biblScope unit="volume">20</biblScope>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">What&apos;s in a food name: Knowledge induction from gazetteers of food main ingredient</title>
		<author>
			<persName><forename type="first">B</forename><surname>Magnini</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Balaraman</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Magnolini</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Guerini</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of CLiC-it 2018</title>
				<meeting>CLiC-it 2018</meeting>
		<imprint>
			<date type="published" when="2018">2018</date>
			<biblScope unit="page">241</biblScope>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">A multilingual benchmark to capture olfactory situations over time</title>
		<author>
			<persName><forename type="first">S</forename><surname>Menini</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Paccosi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Tonelli</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Van Erp</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Leemans</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Lisena</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Troncy</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Tullett</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Hürriyetoğlu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Dijkstra</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 3rd Workshop on Computational Approaches to Historical Language Change</title>
				<meeting>the 3rd Workshop on Computational Approaches to Historical Language Change</meeting>
		<imprint>
			<date type="published" when="2022">2022</date>
			<biblScope unit="page" from="1" to="10" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">Frame semantics and the nature of language</title>
		<author>
			<persName><forename type="first">C</forename><forename type="middle">J</forename><surname>Fillmore</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Annals of the New York Academy of Sciences</title>
		<imprint>
			<biblScope unit="volume">280</biblScope>
			<biblScope unit="page" from="20" to="32" />
			<date type="published" when="1976">1976</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">A computational approach to generate a sensorial lexicon</title>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">S</forename><surname>Tekiroğlu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Özbal</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Strapparava</surname></persName>
		</author>
		<idno type="DOI">10.3115/v1/W14-4716</idno>
		<ptr target="https://aclanthology.org/W14-4716" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 4th Workshop on Cognitive Aspects of the Lexicon (CogALex)</title>
				<meeting>the 4th Workshop on Cognitive Aspects of the Lexicon (CogALex)<address><addrLine>Dublin, Ireland</addrLine></address></meeting>
		<imprint>
			<publisher>Association for Computational Linguistics and Dublin City University</publisher>
			<date type="published" when="2014">2014</date>
			<biblScope unit="page" from="114" to="125" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">Towards olfactory information extraction from text: A case study on detecting smell experiences in novels</title>
		<author>
			<persName><forename type="first">R</forename><surname>Brate</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Groth</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Van Erp</surname></persName>
		</author>
		<ptr target="https://aclanthology.org/2020.latechclfl-1.18" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 4th Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature</title>
				<meeting>the 4th Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature</meeting>
		<imprint>
			<publisher>International Committee on Computational Linguistics</publisher>
			<date type="published" when="2020">2020</date>
			<biblScope unit="page" from="147" to="155" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">Scent mining: Extracting olfactory events, smell sources and qualities</title>
		<author>
			<persName><forename type="first">S</forename><surname>Menini</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Paccosi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">S</forename><surname>Tekiroğlu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Tonelli</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 7th Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature</title>
				<meeting>the 7th Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature</meeting>
		<imprint>
			<date type="published" when="2023">2023</date>
			<biblScope unit="page" from="135" to="140" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">Semantic frame extraction in multilingual olfactory events</title>
		<author>
			<persName><forename type="first">S</forename><surname>Menini</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)</title>
				<meeting>the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)</meeting>
		<imprint>
			<date type="published" when="2024">2024</date>
			<biblScope unit="page" from="14622" to="14627" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">SENSE-LM: A synergy between a language model and sensorimotor representations for auditory and olfactory information extraction</title>
		<author>
			<persName><forename type="first">C</forename><surname>Boscher</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Largeron</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Eglin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Egyed-Zsigmond</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Findings of the Association for Computational Linguistics: EACL 2024</title>
				<imprint>
			<date type="published" when="2024">2024</date>
			<biblScope unit="page" from="1695" to="1711" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<monogr>
		<author>
			<persName><surname>OpenAI</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2303.08774</idno>
		<title level="m">GPT-4 technical report</title>
				<imprint>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b10">
	<monogr>
		<title level="m" type="main">The language of food: A linguist reads the menu</title>
		<author>
			<persName><forename type="first">D</forename><surname>Jurafsky</surname></persName>
		</author>
		<imprint>
			<publisher>W.W. Norton &amp; Company</publisher>
			<date type="published" when="2014">2014</date>
			<pubPlace>New York</pubPlace>
		</imprint>
	</monogr>
	<note>First edition</note>
</biblStruct>

<biblStruct xml:id="b11">
	<monogr>
		<title level="m" type="main">Butter: Bidirectional lstm for food named-entity recognition</title>
		<author>
			<persName><forename type="first">G</forename><surname>Cenikj</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Popovski</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Stojanov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><forename type="middle">K</forename><surname>Seljak</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Eftimov</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<analytic>
		<title level="a" type="main">A fine-tuned bidirectional encoder representations from transformers model for food named-entity recognition: Algorithm development and validation</title>
		<author>
			<persName><forename type="first">R</forename><surname>Stojanov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Popovski</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Cenikj</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Koroušić Seljak</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Eftimov</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Journal of Medical Internet Research</title>
		<imprint>
			<biblScope unit="volume">23</biblScope>
			<biblScope unit="page">e28229</biblScope>
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b13">
	<analytic>
		<title level="a" type="main">Foodbase corpus: a new resource of annotated food entities</title>
		<author>
			<persName><forename type="first">G</forename><surname>Popovski</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><forename type="middle">K</forename><surname>Seljak</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Eftimov</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Database</title>
		<imprint>
			<biblScope unit="volume">2019</biblScope>
			<biblScope unit="page">baz121</biblScope>
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b14">
	<monogr>
		<author>
			<persName><forename type="first">A</forename><surname>Wróblewska</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Kaliska</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Pawłowski</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Wiśniewski</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Sosnowski</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Ławrynowicz</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2204.07775</idno>
		<title level="m">TASTEset - recipe dataset and food entities recognition benchmark</title>
				<imprint>
			<date type="published" when="2022">2022</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b15">
	<analytic>
		<title level="a" type="main">A new annotation scheme for the semantics of taste</title>
		<author>
			<persName><forename type="first">T</forename><surname>Paccosi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Tonelli</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 20th Joint ACL-ISO Workshop on Interoperable Semantic Annotation@ LREC-COLING 2024</title>
				<meeting>the 20th Joint ACL-ISO Workshop on Interoperable Semantic Annotation@ LREC-COLING 2024</meeting>
		<imprint>
			<date type="published" when="2024">2024</date>
			<biblScope unit="page" from="39" to="46" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b16">
	<monogr>
		<author>
			<persName><forename type="first">J</forename><surname>Ruppenhofer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Ellsworth</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Schwarzer-Petruck</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><forename type="middle">R</forename><surname>Johnson</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Scheffczyk</surname></persName>
		</author>
		<title level="m">FrameNet II: Extended theory and practice</title>
				<imprint>
			<date type="published" when="2016">2016</date>
		</imprint>
		<respStmt>
			<orgName>International Computer Science Institute</orgName>
		</respStmt>
	</monogr>
	<note type="report_type">Technical Report</note>
</biblStruct>

<biblStruct xml:id="b17">
	<monogr>
		<author>
			<persName><forename type="first">K</forename><surname>Krippendorff</surname></persName>
		</author>
		<title level="m">Computing Krippendorff&apos;s alpha-reliability</title>
				<imprint>
			<date type="published" when="2011">2011</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b18">
	<analytic>
		<title level="a" type="main">Taste and smell words form an affectively loaded and emotionally flexible part of the English lexicon</title>
		<author>
			<persName><forename type="first">B</forename><surname>Winter</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Language, Cognition and Neuroscience</title>
		<imprint>
			<biblScope unit="volume">31</biblScope>
			<biblScope unit="page" from="975" to="988" />
			<date type="published" when="2016">2016</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b19">
	<analytic>
		<title level="a" type="main">Multitask learning: A knowledge-based source of inductive bias</title>
		<author>
			<persName><forename type="first">R</forename><surname>Caruana</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the Tenth International Conference on Machine Learning</title>
				<meeting>the Tenth International Conference on Machine Learning</meeting>
		<imprint>
			<publisher>Citeseer</publisher>
			<date type="published" when="1993">1993</date>
			<biblScope unit="page" from="41" to="48" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b20">
	<analytic>
		<title level="a" type="main">Multitask learning</title>
		<author>
			<persName><forename type="first">R</forename><surname>Caruana</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Machine learning</title>
		<imprint>
			<biblScope unit="volume">28</biblScope>
			<biblScope unit="page" from="41" to="75" />
			<date type="published" when="1997">1997</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b21">
	<analytic>
		<title level="a" type="main">Attention is all you need</title>
		<author>
			<persName><forename type="first">A</forename><surname>Vaswani</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Shazeer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Parmar</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Uszkoreit</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Jones</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">N</forename><surname>Gomez</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Ł</forename><surname>Kaiser</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Polosukhin</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Advances in neural information processing systems</title>
		<imprint>
			<biblScope unit="volume">30</biblScope>
			<date type="published" when="2017">2017</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b22">
	<analytic>
		<title level="a" type="main">BERT: Pre-training of deep bidirectional transformers for language understanding</title>
		<author>
			<persName><forename type="first">J</forename><surname>Devlin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M.-W</forename><surname>Chang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Lee</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Toutanova</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies</title>
		<title level="s">Long and Short Papers</title>
		<meeting>the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies</meeting>
		<imprint>
			<date type="published" when="2019">2019</date>
			<biblScope unit="volume">1</biblScope>
			<biblScope unit="page" from="4171" to="4186" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b23">
	<analytic>
		<title level="a" type="main">MacBERTh: Development and evaluation of a historically pretrained language model for English (1450-1950)</title>
		<author>
			<persName><forename type="first">E</forename><surname>Manjavacas Arévalo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Fonteyn</surname></persName>
		</author>
		<ptr target="https://aclanthology.org/2021.nlp4dh-1.4.pdf" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the Workshop on Natural Language Processing for Digital Humanities (NLP4DH)</title>
				<meeting>the Workshop on Natural Language Processing for Digital Humanities (NLP4DH)</meeting>
		<imprint>
			<publisher>Association for Computational Linguistics</publisher>
			<date type="published" when="2021">2021</date>
			<biblScope unit="page" from="23" to="36" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b24">
	<monogr>
		<title level="m" type="main">RoBERTa: A robustly optimized BERT pretraining approach</title>
		<author>
			<persName><forename type="first">Y</forename><surname>Liu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Ott</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Goyal</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Du</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Joshi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">O</forename><surname>Levy</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Lewis</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Zettlemoyer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Stoyanov</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1907.11692</idno>
		<ptr target="http://arxiv.org/abs/1907.11692" />
		<imprint>
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b25">
	<monogr>
		<title level="m" type="main">Unsupervised cross-lingual representation learning at scale</title>
		<author>
			<persName><forename type="first">A</forename><surname>Conneau</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Khandelwal</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Goyal</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Chaudhary</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Wenzek</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Guzmán</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Grave</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Ott</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Zettlemoyer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Stoyanov</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1911.02116</idno>
		<ptr target="http://arxiv.org/abs/1911.02116" />
		<imprint>
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b26">
	<monogr>
		<author>
			<persName><forename type="first">R</forename><surname>Van Der Goot</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Üstün</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Ramponi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Sharaf</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Plank</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2005.14672</idno>
		<title level="m">Massive choice, ample tasks (MaChAmp): A toolkit for multi-task learning in NLP</title>
				<imprint>
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b27">
	<analytic>
		<title level="a" type="main">LOME: Large ontology multilingual extraction</title>
		<author>
			<persName><forename type="first">P</forename><surname>Xia</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Qin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Vashishtha</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>May</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Harman</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Rawlins</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">S</forename><surname>White</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Van Durme</surname></persName>
		</author>
		<idno type="DOI">10.18653/v1/2021.eacl-demos.19</idno>
		<ptr target="https://aclanthology.org/2021.eacl-demos.19" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: System Demonstrations</title>
				<editor>
			<persName><forename type="first">D</forename><surname>Gkatzia</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">D</forename><surname>Seddah</surname></persName>
		</editor>
		<meeting>the 16th Conference of the European Chapter of the Association for Computational Linguistics: System Demonstrations</meeting>
		<imprint>
			<publisher>Association for Computational Linguistics</publisher>
			<date type="published" when="2021">2021</date>
			<biblScope unit="page" from="149" to="159" />
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
