<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>MUSTI - Multimodal Understanding of Smells in Texts and Images at MediaEval 2022</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Ali Hürriyetoğlu</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Teresa Paccosi</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Stefano Menini</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Mathias Zinnen</string-name>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Pasquale Lisena</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Kiymet Akdemir</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Raphaël Troncy</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Marieke van Erp</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>EURECOM</institution>
          ,
          <country country="FR">France</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Fondazione Bruno Kessler</institution>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>KNAW Humanities Cluster DHLab</institution>
          ,
          <country country="NL">the Netherlands</country>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>Pattern Recognition Lab, Friedrich-Alexander-Universität</institution>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>MUSTI aims to collect information about smell from digital text and image collections from the 17th to the 20th century in a multilingual setting. More precisely, MUSTI studies the relatedness of the evocation of smells (smell sources being identified, objects being detected, gestures being mentioned or recognized) between texts and images. The main task is a binary classification task: identifying whether a pair consisting of an image and a text snippet refers to the same smell source, independent of what that smell source is. An optional sub-task is the determination of the smell sources that make the respective pair related.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>contexts, language and computer vision technologies can aid in finding olfactory-relevant
examples in the collections and related sources.</p>
      <p>
        While the sense of smell is of vital importance in our day-to-day lives, it has received little
attention within the natural language processing and computer vision communities. While there
are some olfactory lexicons [
        <xref ref-type="bibr" rid="ref2 ref3">2, 3</xref>
        ], the Odeuropa text benchmark dataset is the first multilingual,
cross-domain text dataset focused on smell references [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. Similarly, for computer vision, no
prior datasets existed until the ODOR challenge dataset [
        <xref ref-type="bibr" rid="ref5 ref6">5, 6</xref>
        ]. The MUSTI task brings these
modalities together, inviting the research community to explore parallels and complementarities
in the way smells are described and depicted in different modalities. MUSTI offers texts in
English (EN), German (DE), French (FR), and Italian (IT).
      </p>
      <p>The remainder of this paper is structured as follows: we provide details on the task in Section 2.
The steps followed to prepare data for training and evaluation are provided in Section 3. The
evaluation methodology is described in Section 4 before concluding in Section 5.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Task description</title>
      <p>The manner in which humans engage with smell is a prime example of intangible cultural
heritage: the way smells are created, in what situations they are used, but also how they are
appreciated are highly culturally-dependent. By engaging with expressions of smells in texts
and images across multiple genres and multiple languages over a long period of time, we expect
to gain insights into how smells have affected human interactions through time.</p>
      <p>Smell is an underrepresented dimension of many multimedia analysis and representation
tasks. The goal of MUSTI is to advance the understanding of how smells are described and
depicted by recognizing and connecting references to smells in texts and images. In this shared
task, participants are provided with multilingual texts and images, from the 17th to the 20th
century, that pertain to smell (i.e. selected because they evoke smells).</p>
      <p>Task participants should develop language and image recognition technologies to predict
whether a text passage and an image evoke the same smell source or not. In a subsequent
optional sub-task, the participants are also asked to identify common smell source(s) such as
the person, object or place that has a specific smell, or that produces odours (e.g. plant, animal,
perfume, human), between the text passages and the images.</p>
      <p>The “Quest for insight” part of MUSTI goes beyond quantitative evaluation by posing questions
that promote a deeper understanding of the challenge, including the data and the strengths and
weaknesses of particular types of approaches. It consists of the following questions:
1. What does it mean for a text passage and an image to be related in terms of smell?
2. Do different text and image genres reference smell differently?
3. Do different languages reference smell differently?
4. How do references to smell in texts and images change over time?
5. How do relationships between smell references in texts and images change over time?</p>
    </sec>
    <sec id="sec-3">
      <title>3. Data preparation and Release</title>
      <p>The data consists of copyright-free texts (historical books and documents) and partly copyrighted
images from open repositories; for copyrighted materials, the source URL is shared with participants.
We offer sentences that should be matched with images, which are selected from RKD, Bildindex der
Kunst und Architektur, Museum Boijmans, the Ashmolean Museum Oxford, and the Plateforme ouverte du
patrimoine, and annotated with 80+ categories of smell objects and gestures such as flowers, food,
animals, sniffing, and holding the nose. Task-related materials can be found at
https://github.com/Odeuropa/musti_mediaeval2022.</p>
      <sec id="sec-3-1">
        <sec id="sec-3-1-1">
          <title>3.1. Candidate creation</title>
          <p>In a first step, we search for texts potentially related to the images used for the task. For English,
the texts are extracted from the British Library corpus, Early English Books Online (EEBO),
and Project Gutenberg. For Italian, the sources are Project Gutenberg, Liber Liber, and
Wikisource. For French, the sources include Gutenberg, Gallica, and the ARTFL Project. Finally,
Berlin State Library OCRs and the Deutsches Textarchiv are used for the German text data.</p>
          <p>Each sentence in the corpora is lemmatized. Next, candidate extraction is based on the
presence in the text of words from three different image metadata fields (when available):
• Title: the (lemmatized) nouns in the title, representing the subjects/objects in the painting.
• Categories: labels of visible smell sources, identified by the annotators within the paintings.
• Keywords: the list of keywords associated with images in their respective collections.</p>
          <p>We only keep sentences sharing content with at least two of the three fields. Additionally,
we identify whether each sentence contains a Smell Word, e.g. stink, smell, odor, sniff,
fetid, smelly. Sentences containing a Smell Word are more likely to be instances where images
and texts represent the same smell, while sentences without a Smell Word are usually only about
the same subject, without evoking any smell related to it.</p>
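          <p>The filtering step described above can be sketched as follows. This is a minimal illustration, not the actual extraction pipeline: the helper names and the toy lemmatizer are assumptions.</p>
          <preformat>
```python
# Illustrative sketch of the candidate-extraction filter described above.
# Names and the toy lemmatizer are hypothetical; the organizers' actual
# pipeline and lexicons are not specified here.
SMELL_WORDS = {"stink", "smell", "odor", "sniff", "fetid", "smelly"}

def lemmas(text):
    # Stand-in for a real lemmatizer: lowercase and strip punctuation.
    return {w.strip(".,;:!?\u201c\u201d\"'()") for w in text.lower().split()}

def keep_candidate(sentence, title_nouns, categories, keywords):
    """Keep a sentence if it shares content with at least two of the three
    image metadata fields; also flag whether it contains a Smell Word."""
    sent = lemmas(sentence)
    fields = [set(title_nouns), set(categories), set(keywords)]
    shared = sum(1 for f in fields if sent.intersection(f))
    has_smell_word = bool(sent.intersection(SMELL_WORDS))
    return shared >= 2, has_smell_word

keep, flagged = keep_candidate(
    "A strong, all-pervading smell of pig.",
    title_nouns=["pig"], categories=["animal", "pig"], keywords=["farm"])
# keep is True (matches title and categories); flagged is True ("smell")
```
          </preformat>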
        </sec>
        <sec id="sec-3-1-2">
          <title>3.2. Annotation methodology</title>
          <p>We run our sentence extractor, based on categories, keywords, and titles, to create text-image
pairs. Using the sentence extractor, we favour sentences that share more categories, keywords,
or title nouns with a picture, so as to find texts as relevant as possible to the image.</p>
          <p>For Subtask 1, we annotate matches (pairs of texts and images) as evoking the same smell
(YES) or not (NO). The images have already been selected as either explicitly or implicitly evoking
a smell. An example of a YES instance is the text “Having secured rooms in this establishment,
we started for a walk through the town. The first thing that strikes a stranger upon his arrival at
Accra is a strong, all-pervading smell of pig” matching with a picture representing pigs.</p>
          <p>The annotation NO is used when there is no match in terms of the smell represented in the
text and the image. An example is the text “How I do hate tobacco and that disgusting habit of
smoking, cried Geraldine, who was a very fastidious little lady, and could not endure the smell of
a pipe or even a cigar” with a picture in which there are no pipes, smoke, or cigars.</p>
          <p>The NO examples have two characteristics that make the matching more difficult: i)
a negation in the text, and ii) the case in which the text and the image contain the same object(s)
but not the same smell. We consider an image-text pair a negation if the object which evokes
the smell represented in the picture is negated in the text. An example of negation is the text
“They had no censers to perfume the air extinguishing the morning fragrance, nor bore they their
diamond crosiers through the streets” with a picture in which there are one or more censers. The
other complex cases are those in which the smell-evoking object is mentioned in the text but
does not evoke the same smell represented in the picture. These include metaphorical or simile
sentences such as “Will you not believe these miracles? Which however of themselves they may shine
like a candle lighted up” with a picture representing candles; sentences mentioning the same
smell source represented in the picture but in a different sense, e.g. “The barracks and
all buildings were heaps of ruins, the fires still burning, the smoke and stench from which
were offensive and suffocating” with a picture representing a different kind of smoke, such as the
smoke of a pipe; and, finally, sentences mentioning the same object as the picture but in a different
state, e.g. “The Sunday or visiting dress of Mrs. Yates consisted of a thick sort of silk [...] printed
in a large chintz pattern on a white ground, in which butterflies and flowers, of unknown
and fantastic varieties, predominated.” All of these cases are considered tricky.</p>
          <p>The taxonomy of smell objects and gestures is described in TaxonomyOfOlfactoryPhenomena.pdf;
the translations of the English taxonomy into the respective languages are used for the
non-English data.</p>
        </sec>
      </sec>
      <sec id="sec-3-2">
        <p>In Subtask 2, we add a second layer of annotations to the image-text pairs annotated with
YES in Subtask 1. In this case, the annotation indicates the person, object or place (multiple
annotations are possible) related to the evoked smell appearing both in the text and in the image.
For instance, in “Having secured rooms in this establishment, we started for a walk through the
town. The first thing that strikes a stranger upon his arrival at Accra is a strong, all-pervading
smell of pig.”, the smell-evoking element appearing both in the text and in an image retrieved
from RKD (https://rkd.nl/nl/explore/images/193733) is “pig”.</p>
        <p>We split the data into a training set and a test set with a similar proportion of YES and NO labels.
The YES instances are also labeled for Subtask 2. The numbers of instances in the training
and test sets are, respectively, 795 and 200 for EN, 480 and 213 for DE, 300 and 200 for FR, and
799 and 201 for IT.</p>
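        <p>A label-stratified split of the kind described above can be sketched as follows. This is an illustrative sketch under assumed data structures; the function and field names are not the organizers' code.</p>
        <preformat>
```python
import random

def stratified_split(pairs, test_size, seed=0):
    """Split labeled pairs into train/test sets, keeping a similar label
    ratio in both (a sketch of the strategy described above)."""
    rng = random.Random(seed)
    by_label = {}
    for item in pairs:
        by_label.setdefault(item["label"], []).append(item)
    train, test = [], []
    total = len(pairs)
    for label, items in by_label.items():
        rng.shuffle(items)
        # Take each label's proportional share of the test set.
        n_test = round(test_size * len(items) / total)
        test.extend(items[:n_test])
        train.extend(items[n_test:])
    return train, test
```
        </preformat>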
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Evaluation</title>
      <p>For Subtask 1, we use the F1-macro score over the binary labels, as provided by the scikit-learn
library (https://scikit-learn.org/stable/modules/generated/sklearn.metrics.f1_score.html). We
propose a naive baseline that always predicts the majority label; its F1-macro score reaches 27.14% on
the whole test set (42.85% for EN, 42.89% for DE, 22.07% for FR, and 42.73% for IT). Subtask
2 is evaluated by averaging the F1-macro scores calculated for the predictions of each instance. The
F1-macro calculation is performed on the overlap between the gold and predicted labels; label
mismatches are resolved by building, for each prediction, a label set that unifies the gold and
predicted labels and sorts them alphabetically.</p>
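      <p>The scoring can be sketched in pure Python as follows. This is a minimal illustration: f1_macro mirrors the behaviour of scikit-learn's f1_score with average="macro", and subtask2_score is our reading of the per-instance label-set unification; the function names are assumptions.</p>
      <preformat>
```python
# Sketch of the evaluation described above (illustrative, not official code).
def f1_macro(gold, pred):
    labels = sorted(set(gold) | set(pred))
    f1s = []
    for lab in labels:
        tp = sum(1 for g, p in zip(gold, pred) if g == lab and p == lab)
        fp = sum(1 for g, p in zip(gold, pred) if g != lab and p == lab)
        fn = sum(1 for g, p in zip(gold, pred) if g == lab and p != lab)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return sum(f1s) / len(f1s)

def subtask2_score(gold_sets, pred_sets):
    # For each instance, unify and sort the gold and predicted smell-source
    # labels, score the binary membership vectors, then average over instances.
    scores = []
    for g, p in zip(gold_sets, pred_sets):
        labels = sorted(set(g) | set(p))
        y_true = ["yes" if lab in g else "no" for lab in labels]
        y_pred = ["yes" if lab in p else "no" for lab in labels]
        scores.append(f1_macro(y_true, y_pred))
    return sum(scores) / len(scores)

# A majority-label baseline on a 3:1 NO/YES split scores 3/7 (about 42.9%),
# the same order of magnitude as the per-language baselines reported above.
baseline = f1_macro(["NO", "NO", "NO", "YES"], ["NO", "NO", "NO", "NO"])
```
      </preformat>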
    </sec>
    <sec id="sec-5">
      <title>5. Conclusion</title>
      <p>
        We propose the MUSTI challenge for detecting text-image pairs that refer to the
same smell source(s), organized as two subtasks: predicting the existence of the relationship as a
binary classification task, and detecting the smell-related tokens that are connected through
this relationship. The texts and images are extracted from historical archives. Moreover, we
identified difficult examples of text during the annotation phase. Finally, a majority-label
baseline was calculated for the released test data. The system performances
on the MUSTI dataset are described in [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] and [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ].
      </p>
    </sec>
    <sec id="sec-6">
      <title>Acknowledgements</title>
      <p>This work has been partially supported by the European Union’s Horizon 2020 research and
innovation programme within the Odeuropa project (grant agreement No. 101004469). We
would like to thank Hang Tran (Friedrich-Alexander-Universität Erlangen-Nürnberg) and Marta
Sandri (University of Pavia), who significantly contributed to the annotation effort.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>B.</given-names>
            <surname>Huber</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Larsen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. N.</given-names>
            <surname>Spengler</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Boivin</surname>
          </string-name>
          ,
          <article-title>How to use modern science to reconstruct ancient scents</article-title>
          ,
          <source>Nature Human Behaviour</source>
          <volume>6</volume>
          (
          <year>2022</year>
          )
          <fpage>611</fpage>
          -
          <lpage>614</lpage>
          . URL: https://doi.org/10.1038/s41562-022-01325-7.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>S. S.</given-names>
            <surname>Tekiroğlu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Özbal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Strapparava</surname>
          </string-name>
          ,
          <article-title>Sensicon: An automatically constructed sensorial lexicon</article-title>
          ,
          <source>in: Conference on Empirical Methods in Natural Language Processing (EMNLP)</source>
          , Association for Computational Linguistics, Doha, Qatar,
          <year>2014</year>
          , pp.
          <fpage>1511</fpage>
          -
          <lpage>1521</lpage>
          . URL: https://aclanthology.org/D14-1160.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>S.</given-names>
            <surname>Menini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Paccosi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. S.</given-names>
            <surname>Tekiroğlu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Tonelli</surname>
          </string-name>
          ,
          <article-title>Building a multilingual taxonomy of olfactory terms with timestamps</article-title>
          ,
          <source>in: 13th Language Resources and Evaluation Conference (LREC)</source>
          ,
          <source>European Language Resources Association</source>
          , Marseille, France,
          <year>2022</year>
          , pp.
          <fpage>4030</fpage>
          -
          <lpage>4039</lpage>
          . URL: https://aclanthology.org/2022.lrec-1.429.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>S.</given-names>
            <surname>Menini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Paccosi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Tonelli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Van Erp</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Leemans</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Lisena</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Troncy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Tullett</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Hürriyetoğlu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Dijkstra</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Gordijn</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Jürgens</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Koopman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Ouwerkerk</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Steen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Novalija</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Brank</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Mladenic</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Zidar</surname>
          </string-name>
          ,
          <article-title>A multilingual benchmark to capture olfactory situations over time</article-title>
          ,
          <source>in: 3rd Workshop on Computational Approaches to Historical Language Change</source>
          , Association for Computational Linguistics, Dublin, Ireland,
          <year>2022</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>10</lpage>
          . URL: https://aclanthology.org/2022.lchange-1.1.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>M.</given-names>
            <surname>Zinnen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Madhu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Kosti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Bell</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Maier</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Christlein</surname>
          </string-name>
          ,
          <article-title>ODOR: The ICPR2022 ODeuropa Challenge on Olfactory Object Recognition</article-title>
          ,
          <source>in: 26th International Conference on Pattern Recognition (ICPR)</source>
          ,
          <year>2022</year>
          , pp.
          <fpage>4989</fpage>
          -
          <lpage>4994</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>M.</given-names>
            <surname>Zinnen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Madhu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Kosti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Bell</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Maier</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Christlein</surname>
          </string-name>
          ,
          <article-title>Odeuropa dataset of smell-related objects</article-title>
          ,
          <year>2022</year>
          . URL: https://doi.org/10.5281/zenodo.6367776.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>K.</given-names>
            <surname>Akdemir</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Hürriyetoğlu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Troncy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Paccosi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Menini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Zinnen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Christlein</surname>
          </string-name>
          ,
          <article-title>Multimodal and Multilingual Understanding of Smells using VilBERT and mUNITER</article-title>
          , in: MediaEval Benchmarking Initiative for Multimedia Evaluation,
          <year>2022</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Shao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Wan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Sun</surname>
          </string-name>
          ,
          <article-title>Multilingual Text-Image Olfactory Object Matching Based on Object Detection</article-title>
          , in: MediaEval Benchmarking Initiative for Multimedia Evaluation,
          ,
          <year>2022</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>