<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>User Needs Determine Termbase Design</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Michal Měchura</string-name>
          <email>michal.boleslav.mechura@dcu.ie</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Brian Ó Raghallaigh</string-name>
          <email>brian.oraghallaigh@dcu.ie</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Úna Bhreathnach</string-name>
          <email>una.bhreathnach@dcu.ie</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Gearóid Ó Cleircín</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Fiontar &amp; Scoil na Gaeilge, Dublin City University</institution>
          ,
          <addr-line>Dublin</addr-line>
          ,
          <country country="IE">Ireland</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Natural Language Processing Centre, Faculty of Informatics, Masaryk University</institution>
          ,
          <addr-line>Brno</addr-line>
          ,
          <country country="CZ">Czech Republic</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2022</year>
      </pub-date>
      <fpage>16</fpage>
      <lpage>17</lpage>
      <abstract>
        <p>This paper describes and discusses how the design of the National Terminology Database for Irish (téarma.ie) has been influenced by two factors: the assumed information needs of the intended users, and the data governance needs of the publisher. In particular, we will highlight how these factors have sometimes caused our termbase design to diverge from established practices in the terminology industry and from standards such as TBX.</p>
      </abstract>
      <kwd-group>
        <kwd>online termbases</kwd>
        <kwd>terminology in minority languages</kwd>
        <kwd>terminology in bilingual countries</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>-</title>
      <p>vocabulary and looking for the kind of information one would normally expect to find in a
general-purpose dictionary. NTDI has evolved to satisfy this unique mixture of the users’ information
needs (a concept originally defined by [5]), both in its content (it contains some general-language
vocabulary) and in its structure.</p>
    </sec>
    <sec id="sec-2">
      <title>1.2. The data-governance needs of the terminologist</title>
      <p>While the users’ information needs are the main driver of a termbase’s design, the needs of
the terminologists – the editors and maintainers behind the scenes – must be taken into
account as well. These needs are mainly about data governance: quality control, keeping the
termbase well organised and well maintained in the long run, avoiding duplicates and so on.
The design of NTDI reflects some of these needs, as we will show in the rest of this paper.</p>
      <sec id="sec-2-1">
        <title>2. Some features of NTDI</title>
        <p>We will now review some of NTDI’s structural features that have been influenced by the
requirements introduced above, covering both the users’ information needs and the terminologist’s
data-governance needs.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>2.1. Grammatical annotations</title>
      <p>Most termbases in the translation industry or in knowledge engineering contain only sparse
grammatical information: it is expected that the user will be a (near-)native speaker and will
need no help in determining the gender of nouns or the plural of noun phrases. In NTDI this
assumption does not apply: NTDI is a public-service termbase, targeting the general public
and serving a user community with a high percentage of learners and non-native speakers.
The consequence is that terms in NTDI come with relatively rich grammatical annotations,
both as labels attached to terms (part of speech, gender, inflection paradigm) and as inflected
forms added to terms (plurals, genitive case). A distinctive feature is that the termbase allows inline
grammatical annotations: it is possible to attach labels not just to the entire term but also to a
single word inside it, for example to the head noun of a noun phrase.</p>
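      <p>The idea of inline annotation can be illustrated with a small data-structure sketch. This is a minimal illustration only, not a description of how NTDI or Terminologue stores its data: the class names, the character-span representation and the label values are all assumptions made for the example.</p>
      <preformat preformat-type="python">
```python
from dataclasses import dataclass, field


@dataclass
class Annotation:
    """A grammatical label attached to a character span of the term wording."""
    start: int   # index of the first annotated character
    stop: int    # index one past the last annotated character
    label: str   # hypothetical label value, e.g. a part of speech or a gender


@dataclass
class AnnotatedTerm:
    wording: str
    lang: str
    annotations: list[Annotation] = field(default_factory=list)
    inflected_forms: dict[str, str] = field(default_factory=dict)

    def annotate_whole(self, label: str) -> None:
        """Attach a label to the entire term."""
        self.annotations.append(Annotation(0, len(self.wording), label))

    def annotate_word(self, word: str, label: str) -> None:
        """Attach a label to the first occurrence of one word inside the term."""
        start = self.wording.index(word)  # raises ValueError if the word is absent
        self.annotations.append(Annotation(start, start + len(word), label))

    def labels_for(self, span_text: str) -> list[str]:
        """Return the labels whose annotated span is exactly the given text."""
        return [a.label for a in self.annotations
                if self.wording[a.start:a.stop] == span_text]


term = AnnotatedTerm("operating system", "en")
term.annotate_whole("noun phrase")        # label on the whole term
term.annotate_word("system", "noun")      # inline label on the head noun only
term.inflected_forms["plural"] = "operating systems"
print(term.labels_for("system"))
```
      </preformat>
      <p>Storing spans rather than splitting the term into words keeps the wording intact as a single string, while still allowing a label to point at the head noun alone.</p>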
    </sec>
    <sec id="sec-4">
      <title>2.2. Term sharing</title>
      <p>Because terms in NTDI contain a lot of grammatical annotation and because many terms are
polysemous (= one term designates multiple concepts), issues of duplication and consistency
have arisen: entering the same term into several entries requires duplicate effort and can result
in inconsistencies (for example, when a mistaken grammatical label is corrected in one entry but
not in another). To prevent duplication and to enforce consistency, NTDI (and Terminologue)
has a feature which allows a term to be shared among several entries. Any changes made
to the term in one entry (including changes to its grammatical annotation or to its inflected
forms) automatically become visible in the other entries too. Approximately 15% of terms in
the termbase are shared like this. This is an example of a database design feature which is
motivated not by the user’s information needs (the end-users are probably not even aware of it)
but by the data-governance needs of the terminologists: the need to eliminate duplicate labour
and inconsistency.</p>
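      <p>Term sharing amounts to storing each term once and letting entries refer to it rather than copy it. The following is a minimal sketch under that assumption; the class and method names are hypothetical and do not reflect Terminologue's actual schema, and the sample terms are invented for the illustration.</p>
      <preformat preformat-type="python">
```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class SharedTerm:
    """One term record, stored once and referenced from any number of entries."""
    wording: str
    labels: list[str] = field(default_factory=list)


class Termbase:
    def __init__(self) -> None:
        self.terms: dict[int, SharedTerm] = {}    # term id -> term record
        self.entries: dict[int, list[int]] = {}   # entry id -> ids of its terms

    def add_term(self, term_id: int, term: SharedTerm) -> None:
        self.terms[term_id] = term

    def add_entry(self, entry_id: int, term_ids: list[int]) -> None:
        # Entries hold references (ids), never copies of the term record.
        self.entries[entry_id] = list(term_ids)

    def terms_of(self, entry_id: int) -> list[SharedTerm]:
        return [self.terms[t] for t in self.entries[entry_id]]

    def shared_fraction(self) -> float:
        """Fraction of terms that are referenced by more than one entry."""
        if not self.terms:
            return 0.0
        uses = Counter(t for ids in self.entries.values() for t in set(ids))
        shared = sum(1 for t in self.terms if uses[t] > 1)
        return shared / len(self.terms)


tb = Termbase()
tb.add_term(1, SharedTerm("bank", labels=["verb"]))      # deliberately mistaken label
tb.add_term(2, SharedTerm("riverbank", labels=["noun"]))
tb.add_entry(10, [1])       # concept: financial institution
tb.add_entry(20, [1, 2])    # concept: land beside a river
tb.terms_of(10)[0].labels = ["noun"]   # correct the label once, via entry 10
print(tb.terms_of(20)[0].labels)       # the correction is visible via entry 20 too
```
      </preformat>
      <p>Because both entries point at the same record, a correction cannot be applied in one entry and forgotten in another, which is the inconsistency described above.</p>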
    </sec>
    <sec id="sec-5">
      <title>2.3. No ontologies</title>
      <p>It is popular in terminology to organise entries into networks of is-a, has-a and other relations,
thus building entire ontologies [6]. Ontologies are useful in a knowledge-engineering context
where the goal is to enable the user to explore and understand an entire domain. In NTDI,
however, this goal is almost absent. Our website traffic statistics show that most users consult
NTDI not to explore an entire domain but simply to obtain translations of individual terms.
NTDI users typically consult the termbase while they are doing something else: translating or
writing. Because of this, the software behind NTDI (Terminologue) has no ontology-building
features. The only explicit entry-to-entry relation available is a simple “see also” relation;
beyond that, entries are related only implicitly, through our relatively rich scheme of hierarchical
domain labels. We find that this is sufficient for the information needs of NTDI’s users.</p>
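      <p>To make the notion of relations implicit in hierarchical domain labels concrete, here is a minimal sketch. It assumes that a domain label is represented as a path from a top-level domain down to a subdomain; the function names, the domain names and the rule that two entries are implicitly related when they share an exact label are all assumptions made for the illustration.</p>
      <preformat preformat-type="python">
```python
Domain = tuple[str, ...]   # a path of labels from the top-level domain downwards


def is_within(label: Domain, domain: Domain) -> bool:
    """True if `label` is `domain` itself or one of its subdomains."""
    return label[:len(domain)] == domain


def entries_in_domain(entry_domains: dict[int, list[Domain]],
                      domain: Domain) -> list[int]:
    """Ids of entries carrying at least one label inside the given domain."""
    return sorted(
        entry_id
        for entry_id, labels in entry_domains.items()
        if any(is_within(label, domain) for label in labels)
    )


def related_entries(entry_id: int,
                    entry_domains: dict[int, list[Domain]],
                    see_also: dict[int, set[int]]) -> list[int]:
    """Explicit 'see also' targets plus entries sharing an exact domain label."""
    own = set(entry_domains.get(entry_id, []))
    implicit = {
        other for other, labels in entry_domains.items()
        if other != entry_id and own.intersection(labels)
    }
    return sorted(implicit | see_also.get(entry_id, set()))


# Hypothetical sample data: entry id -> domain labels, and explicit links.
entry_domains = {
    1: [("law", "contract law")],
    2: [("law",)],
    3: [("sport",)],
    4: [("law", "contract law")],
}
see_also = {1: {3}}
print(entries_in_domain(entry_domains, ("law",)))
print(related_entries(1, entry_domains, see_also))
```
      </preformat>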
    </sec>
    <sec id="sec-6">
      <title>2.4. Optional hiding of information</title>
      <p>It is a truism that when publishing lexical resources online as opposed to on paper, one does
not need to worry about space constraints, as computer memory is practically unlimited. But
this does not mean that terminological entries can be arbitrarily long: we still need to take the
user’s cognitive capacity into account and avoid creating a situation of information overload
[7]. For this reason NTDI (and Terminologue) has a feature which allows the terminologists
to label certain parts of an entry as non-essential, such as protracted citations from sources,
deprecated terms or certain usage examples. Such parts are hidden by default in the public user
interface, while users who want to view them can reveal them by clicking a ‘plus’ icon.</p>
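      <p>The hiding behaviour described above can be sketched as a flag on each part of an entry plus a filter applied at display time. This is an illustration under stated assumptions, not the actual Terminologue implementation: the names EntryPart, visible_parts and hidden_count, and the sample parts, are hypothetical.</p>
      <preformat preformat-type="python">
```python
from dataclasses import dataclass


@dataclass
class EntryPart:
    kind: str                    # e.g. "term", "example", "citation"
    text: str
    non_essential: bool = False  # set by the terminologist, not by the user


def visible_parts(parts: list[EntryPart], expanded: bool = False) -> list[EntryPart]:
    """Parts shown to the user; non-essential ones appear only when expanded."""
    return [p for p in parts if expanded or not p.non_essential]


def hidden_count(parts: list[EntryPart]) -> int:
    """How many parts sit behind the 'plus' icon in the collapsed view."""
    return sum(1 for p in parts if p.non_essential)


entry = [
    EntryPart("term", "operating system"),
    EntryPart("example", "a usage example", non_essential=True),
    EntryPart("citation", "a long citation from a source", non_essential=True),
]
print([p.kind for p in visible_parts(entry)])                  # collapsed view
print([p.kind for p in visible_parts(entry, expanded=True)])   # after expanding
```
      </preformat>
      <p>Keeping the flag in the data rather than in the interface means the decision about what counts as non-essential stays with the terminologists.</p>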
      <sec id="sec-6-1">
        <title>3. Conclusion</title>
        <p>The termbase described in this paper departs from established practice in terminology. Many
of NTDI’s structural features are difficult to map onto structural categories common in other
terminological software and in interchange standards such as TBX (for example, TBX has no
notion of term sharing). We have attempted to explain in this paper that this divergence is not
arbitrary but motivated: motivated by the genre of the termbase (it is a public-service termbase),
motivated by the information needs of the end-users, and last but not least, motivated by the
data-governance needs of the terminologists.</p>
      </sec>
      <sec id="sec-6-2">
        <title>Acknowledgments</title>
        <p>The NTDI is managed by the Gaois research group in Fiontar &amp; Scoil na Gaeilge, Dublin City
University in partnership with the Irish Terminology Committee, Foras na Gaeilge.</p>
        <p>[4] H. Bergenholtz, S. Tarp (Eds.), Manual of Specialised Lexicography: The preparation of
specialised dictionaries, volume 12 of Benjamins Translation Library, John Benjamins Publishing
Company, Amsterdam, 1995. doi:10.1075/btl.12.</p>
        <p>[5] R. S. Taylor, The process of asking questions, American Documentation 13 (1962) 391–396.
doi:10.1002/asi.5090130405.</p>
        <p>[6] I. Muñoz, M. R. Zambrana, Applying ontologies to terminology: Advantages and
disadvantages, Hermes: Journal of Language and Communication in Business 51 (2013) 65–77.
doi:10.7146/hjlcb.v26i51.97438.</p>
        <p>[7] R. Lew, G.-M. de Schryver, Dictionary users in the digital revolution, International Journal
of Lexicography 27 (2014). doi:10.1093/ijl/ecu011.</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>M.</given-names>
            <surname>Měchura</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Ó Raghallaigh</surname>
          </string-name>
          ,
          <article-title>The Focal.ie National Terminology Database for Irish: software demonstration</article-title>
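          , in:
          <string-name>
            <given-names>A.</given-names>
            <surname>Dykstra</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Schoonheim</surname>
          </string-name>
          (Eds.),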
          <source>Proceedings of the 14th EURALEX International Congress</source>
          , Fryske Akademy, Leeuwarden/Ljouwert, The Netherlands,
          <year>2010</year>
          , pp.
          <fpage>937</fpage>
          -
          <lpage>948</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>C.</given-names>
            <surname>Nic Pháidín</surname>
          </string-name>
          ,
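          <string-name>
            <given-names>G.</given-names>
            <surname>Ó Cleircín</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Ú.</given-names>
            <surname>Bhreathnach</surname>
          </string-name>
          ,
          <article-title>Building on a terminology resource - the Irish experience</article-title>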
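          , in:
          <string-name>
            <given-names>A.</given-names>
            <surname>Dykstra</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Schoonheim</surname>
          </string-name>
          (Eds.),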
          <source>Proceedings of the 14th EURALEX International Congress</source>
          , Fryske Akademy, Leeuwarden/Ljouwert, The Netherlands,
          <year>2010</year>
          , pp.
          <fpage>954</fpage>
          -
          <lpage>965</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>M.</given-names>
            <surname>Měchura</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Ó Raghallaigh</surname>
          </string-name>
          ,
          <article-title>Introducing Terminologue: a cloud-based, open-source terminology management tool</article-title>
          , Presented at XIX EURALEX International Congress,
          <year>2021</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>