<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Managing Language Varieties: Examples From Legal Terminology Work</article-title>
      </title-group>
      <contrib-group>
        <aff id="aff0">
          <label>0</label>
          <institution>Eurac Research, Institute for Applied Linguistics</institution>
          ,
          <addr-line>Bolzano/Bozen</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Handling pluricentric languages requires addressing their language varieties. This paper explores strategies to represent these varieties in terminology databases, considering factors such as the quantity of terminological data and the availability or absence of language identifiers. Using the legal domain as a reference point, the analysis examines the associated challenges, as well as the advantages and disadvantages of various approaches to representation.</p>
      </abstract>
      <kwd-group>
        <kwd>language variety</kwd>
        <kwd>terminology database</kwd>
        <kwd>legal terminology</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Befreiungsschein denotes an exemption from medical fees in Germany but a work permit for foreigners
in Austria [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. Conversely, the same designation may have a similar meaning in different legal
systems [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. One example is lockdown, a COVID-19 containment measure whose regulatory implementation
nevertheless varied, even across legal systems that share the same language [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ].
      </p>
      <p>For this reason, the daily terminology work of the Institute for Applied Linguistics (IAL) involves
legal comparisons between the Italian legal system and the German-speaking legal systems (Austria,
Germany and Switzerland), as well as EU and international law. For every Italian term, we provide
equivalents for each German-speaking legal system. For Ladin, we provide terms in the language
varieties spoken in the South Tyrolean valleys of Val Gardena and Val Badia.</p>
      <p>This paper proposes ways of representing language varieties in terminology databases, with a
particular focus on Trados MultiTerm<sup>2</sup>, a commercial terminology management system (TMS) that
the IAL has used since the mid-1990s. Section 2 describes the terminological metamodel for structuring
terminology databases. Section 3 presents possible ways of representation, taking into account the
presence or absence of terminological data and of language identifiers, and discusses the related
challenges as well as the advantages and disadvantages of each representation. Section 4
concludes the discussion.</p>
    </sec>
    <sec id="sec-2">
      <title>2. The terminological metamodel</title>
      <p>
        Terminological data should be organized and managed according to the following terminological principles
[
        <xref ref-type="bibr" rid="ref11 ref12 ref13">11, 12, 13</xref>
        ]:
each concept entry should contain information about a single concept (concept orientation);
all terms (e.g., synonyms) in a concept entry are treated as independent sub-units and are therefore
described using the same set of data categories (term autonomy);
data categories should be finely defined (data granularity);
each data category should contain only one data element (data elementarity).
      </p>
      <p>
        The representation of terminological data is defined in ISO 16642:2017 [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]. This standard provides a terminological
metamodel with two levels of abstraction. The first is the metamodel level, which supports
analysis, design and exchange at a general level [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]. The second is the data model level, which adds
the data categories needed to represent a specific terminological data collection.
      </p>
      <p>
        This paper focuses on the structure of a concept entry (Figure 1), which is organized into three
levels: the concept level (concept entry), the language level (language section) and the term level
(term section) [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ].
      </p>
      <p>2 https://www.trados.com/product/multiterm</p>
      <p>
        The first level contains administrative data and language-independent terminological
information relevant to the entire concept entry (e.g., /creation date/, /domain/) [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ]; the second
level holds information about the concept that must be available in the
respective language (e.g., terms, /definition/) [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]; the third level contains all term-related
information (e.g., /context/, /usage note/) [
        <xref ref-type="bibr" rid="ref11 ref13">11, 13</xref>
        ]. According to the cardinalities defined in ISO 16642 [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ], a single concept
can be expressed in n languages, and a language section can contain one or more term sections.
      </p>
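      <p>The three levels and the cardinality described above can be pictured as nested records. The following Python sketch is only an illustration under stated assumptions: the class names, the selection of data categories and the sample terms are hypothetical and do not reproduce ISO 16642 or the MultiTerm data model.</p>
      <preformat>
```python
from dataclasses import dataclass, field


@dataclass
class TermSection:
    # Term level: the term and term-related data categories.
    term: str
    context: str = ""
    usage_note: str = ""


@dataclass
class LanguageSection:
    # Language level: data needed in this language (e.g. /definition/)
    # plus the term sections.
    lang: str
    definition: str = ""
    terms: list[TermSection] = field(default_factory=list)


@dataclass
class ConceptEntry:
    # Concept level: administrative and language-independent data.
    entry_id: str
    domain: str
    languages: list[LanguageSection] = field(default_factory=list)

    def validate(self) -> list[str]:
        """Report language sections that break the 'one or more term sections' rule."""
        return [
            f"{section.lang}: language section without term sections"
            for section in self.languages
            if not section.terms
        ]


# Hypothetical sample entry: one concept expressed in two languages.
entry = ConceptEntry(
    entry_id="c1",
    domain="contract law",
    languages=[
        LanguageSection("it", terms=[TermSection("contratto")]),
        LanguageSection("de", terms=[TermSection("Vertrag")]),
    ],
)
print(entry.validate())  # []
```
      </preformat>
      <p>The only rule enforced here is the one stated in the text, namely that a language section holds one or more term sections. A real TMS checks many more constraints.</p>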
      <p>
        In compliance with terminological principles, a language variety should be treated as a language
and thus stored at the language level. Most current TMS support language varieties, which are usually
assigned a language identifier based on ISO 639 [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ] (Table 1).
      </p>
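      <p>Identifiers for language varieties, such as DE-AT, DE-DE and DE-CH in Section 3.2, pair a language code with a country code. The function below is a minimal, hypothetical sketch that splits such identifiers and groups varieties by language. It assumes the simple LANGUAGE-COUNTRY pattern and deliberately rejects longer identifiers instead of guessing their structure.</p>
      <preformat>
```python
from typing import Optional


def split_language_identifier(identifier: str) -> tuple[str, Optional[str]]:
    """Split 'DE-AT' into ('de', 'AT'); a bare 'IT' becomes ('it', None).

    Only the LANGUAGE or LANGUAGE-COUNTRY patterns are handled; anything
    longer (scripts, variants, private-use parts) raises ValueError.
    """
    parts = identifier.replace("_", "-").split("-")
    if len(parts) == 1:
        return parts[0].lower(), None
    if len(parts) == 2:
        return parts[0].lower(), parts[1].upper()
    raise ValueError(f"unsupported identifier: {identifier!r}")


# Group the variety identifiers of a database by their language part.
identifiers = ["DE-AT", "DE-DE", "DE-CH", "IT"]
by_language: dict[str, list[str]] = {}
for ident in identifiers:
    language, country = split_language_identifier(ident)
    by_language.setdefault(language, []).append(country or "(no country)")
print(by_language)  # {'de': ['AT', 'DE', 'CH'], 'it': ['(no country)']}
```
      </preformat>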
      <p>Depending on the state of the terminology database, three scenarios can be distinguished.</p>
      <p>The terminology database is empty and contains no terminological data. In this case, it
must be structured from scratch, taking language varieties into account.</p>
      <p>The terminology database already contains terminological data and is organized by languages.
Including language varieties now requires an ex-post intervention, and the question is to what
extent the existing database structure can be modified.</p>
      <p>The terminology database includes language varieties that have not yet been codified.</p>
      <p>These scenarios are not mutually exclusive; instead, they are interrelated and may co-exist. In
the section that follows, we describe the available options to represent them. To ensure clarity and
consistency throughout the paper, we will use the term ‘uncodified language variety’ to refer to a
language variety without an ISO language identifier, as opposed to ‘codified language variety’.</p>
    </sec>
    <sec id="sec-3">
      <title>3. Three ways of representation</title>
      <sec id="sec-3-1">
        <title>3.1. Language varieties as an attributive data category</title>
        <p>Before the early 2000s, TMS providers did not support language varieties. As a result, terminology
databases created in those years were typically organized by languages rather than language
varieties. Suppose that
1) it is not possible to create a new terminology database, for example because of a large
volume of terminological data or limited human and financial resources, and/or
2) the language varieties to be added have no language identifier.</p>
        <p>3 https://www.andiamo.co.uk/resources/iso-language-codes</p>
        <p>
          In both cases, we can treat the language variety as an attribute of a term. To this end, we can add a
specific data category (e.g., /geographical usage/, /legal system/) at the term level and define it as a
picklist. The values of the picklist can be language or country codes based on ISO 639 [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ] or ISO 3166<sup>4</sup> [
          <xref ref-type="bibr" rid="ref16">16</xref>
          ], respectively [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ]. Furthermore, variety-specific fields such as /definition/, /context/ or /usage note/
should be distinguished, for example by inserting a language or country code in the data
category. This approach facilitates filtering and exporting data while clearly indicating which fields
belong to a specific language variety (Figure 2).
        </p>
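        <p>The attributive approach can be sketched as follows. This is an illustration, not the MultiTerm configuration: the /legal system/ picklist, the convention of appending a country code to field names such as "definition AT", and the placeholder field contents are all assumptions. The point is that filtering by variety reduces to matching picklist values and field suffixes.</p>
        <preformat>
```python
from dataclasses import dataclass, field

# Hypothetical picklist for a term-level /legal system/ data category,
# with ISO 3166 country codes as values.
LEGAL_SYSTEM_PICKLIST = ("AT", "CH", "DE", "IT")


@dataclass
class Term:
    term: str
    legal_systems: list[str]
    # Variety-specific fields carry the country code in their name,
    # e.g. "definition AT" (an assumed naming convention).
    variety_fields: dict[str, str] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # A closed data category accepts only predefined picklist values.
        for code in self.legal_systems:
            if code not in LEGAL_SYSTEM_PICKLIST:
                raise ValueError(f"{code!r} is not a picklist value")


def filter_by_variety(terms: list[Term], code: str) -> list[dict[str, str]]:
    """Export the terms used in one legal system, keeping only that system's fields."""
    exported = []
    for t in terms:
        if code in t.legal_systems:
            own = {k: v for k, v in t.variety_fields.items() if k.endswith(" " + code)}
            exported.append({"term": t.term, **own})
    return exported


# One German language section; the varieties are only attribute values.
german = [
    Term("Vertrag", ["AT", "DE", "CH"],
         {"definition AT": "def. (AT)", "definition DE": "def. (DE)"}),
    Term("eheliches Kind", ["IT"], {"usage note IT": "note (IT)"}),
]
print(filter_by_variety(german, "DE"))
# [{'term': 'Vertrag', 'definition DE': 'def. (DE)'}]
```
        </preformat>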
        <p>
          This solution can serve as a viable compromise when significant modifications to the database
structure are not feasible or when working with uncodified language varieties. However, it violates
the terminological principle of term autonomy (see Section 2), because a single term section describes
the use of the term in several language varieties at once. Consequently, labelling a preferred
term or indicating its status for each language variety becomes difficult, and other strategies are
needed to convey this information. For example, we can explicitly indicate the preference or the status
in a dedicated open or closed data category<sup>5</sup> at the term level. Figure 3 shows an example of how to
convey such information. Preference is expressed by the closed data category /Termstatus/ (term
status) and its picklist value Südtirol genormt, indicating that the term has been standardized for
use in South Tyrol by a Terminology Commission. Obsolescence is conveyed through the open
data category /Kurzerläuterung/ (short note), which specifies a terminological change. For instance,
this is how the use of eheliches Kind (legitimate child) in the Italian legal system, expressed in German for
South Tyrol, is documented.
        </p>
        <p>4 See http://www.lingoes.net/en/translator/langcode.htm and https://www.iban.com/country-codes.</p>
        <p>5 According to [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ], open data categories are free-text categories such as /definition/, /context/ or /note/, while closed data
categories contain a finite set of predefined values. Examples of closed data categories are /domain/, /status/ or
/geographical usage/ (see also [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ]).
        </p>
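        <p>Because the term section itself has no slot for a language variety, any variety-specific status has to be encoded in data category values. The sketch below illustrates this with the two data categories from Figure 3. The mapping of the picklist value Südtirol genormt to the legal system code IT, the function names and the placeholder entry "term A" are all hypothetical.</p>
        <preformat>
```python
from typing import Optional

# Hypothetical reading of the /Termstatus/ picklist: the value
# "Südtirol genormt" marks a term as standardized for the German used
# under the Italian legal system (code IT), i.e. in South Tyrol.
TERMSTATUS_SCOPE = {"Südtirol genormt": "IT"}


def standardized_term(terms: list[dict], legal_system: str) -> Optional[str]:
    """Return the first term whose closed /Termstatus/ value applies to legal_system."""
    for t in terms:
        if TERMSTATUS_SCOPE.get(t.get("Termstatus", "")) == legal_system:
            return t["term"]
    return None


def change_notes(terms: list[dict]) -> dict[str, str]:
    """Collect the open /Kurzerläuterung/ notes that record terminological changes."""
    return {t["term"]: t["Kurzerläuterung"] for t in terms if "Kurzerläuterung" in t}


german = [
    # "term A" and the note text are placeholders, not data from the database.
    {"term": "eheliches Kind", "Kurzerläuterung": "note on the terminological change"},
    {"term": "term A", "Termstatus": "Südtirol genormt"},
]
print(standardized_term(german, "IT"))  # term A
print(standardized_term(german, "AT"))  # None
```
        </preformat>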
        <p>This type of representation and the creation of ad hoc data categories complicate data exchange
and interoperability because they do not conform to the terminological metamodel. Because the language variety
is only an attributive data category, language varieties are not organized into dedicated language sections.
Instead, they fall under the language ‘German’, expressed by the lang attribute in &lt;language lang="DE"
type="Deutsch"&gt;&lt;/language&gt;. The XML excerpt from the terminological entry eheliches Kind
(Figure 4) illustrates this approach.</p>
        <p>As the XML shows, this language section includes a term section containing
term-related information for different legal systems and, hence, for distinct German language
varieties. This can also make interaction with machine translation (MT) tools more challenging.</p>
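        <p>The structural consequence can be reproduced with Python's standard xml.etree.ElementTree module. Apart from the language element with lang="DE" and type="Deutsch", the element names (conceptEntry, termSection, legalSystem, definition) and the sample term are hypothetical and do not reproduce the MultiTerm XML shown in Figure 4.</p>
        <preformat>
```python
import xml.etree.ElementTree as ET

# Hypothetical element names; only the language element with
# lang="DE" and type="Deutsch" mirrors the excerpt in the text.
entry = ET.Element("conceptEntry")
german = ET.SubElement(entry, "language", lang="DE", type="Deutsch")
term_section = ET.SubElement(german, "termSection")
ET.SubElement(term_section, "term").text = "Vertrag"

for code in ("AT", "DE", "CH"):
    # Each variety is a data category value inside the single term section.
    ET.SubElement(term_section, "legalSystem").text = code
    ET.SubElement(term_section, "definition", variety=code).text = f"definition for {code}"

print(ET.tostring(entry, encoding="unicode"))
# All three varieties share one language section.
print(len(entry.findall("language")))  # 1
```
        </preformat>
        <p>However many legal systems are attached, the entry still contains a single German language section, so a tool that selects data by language identifier alone cannot tell the varieties apart.</p>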
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Language varieties at the language level</title>
        <p>Language varieties can be stored at the language level when either condition a) or condition b)
applies and condition c) is also satisfied:
a) The terminology database contains terminological data organized by languages, but
it can be reorganized by adding language varieties.
b) The terminology database is empty and can therefore be structured from scratch with
language varieties in mind.
c) All language varieties in the database have language identifiers.</p>
        <p>These ideal scenarios enable the representation of language varieties in a methodologically and
technically accurate manner. In full compliance with terminological principles and the
terminological metamodel, we can structure terminological data into the concept, language, and
term levels, whereby the language level separates terms in one language variety from terms in
other language varieties (Figure 5).</p>
        <p>With this representation, each language variety has its own language section containing one or
more term sections. Every language section is identified by its own lang attribute. In the case
of Figure 5, the language sections are:
&lt;language type="German (Austria)" lang="DE-AT" /&gt;
&lt;language type="German (Germany)" lang="DE-DE" /&gt;
&lt;language type="German (Switzerland)" lang="DE-CH" /&gt;</p>
        <p>This representation avoids creating ad hoc data categories and allows the use of harmonized
data categories in compliance with ISO 12620-1:2022 [17] and ISO 12620-2:2022 [18]<sup>6</sup>. Furthermore,
it allows the definition to be anchored at the language level, which is essential for the legal domain:
given the system-bound nature of legal terminology (see Section 1), each legal system requires its own
definition. This approach is equally relevant for other domains, such as religion, which likewise lack
a shared cognitive background or internationalization [19]. Additionally, this type of representation
simplifies labelling a preferred term or indicating its status, compared with the approach discussed
in Section 3.1. It also makes exporting and filtering terminological data, interaction with MT tools,
data exchange and interoperability smoother.</p>
        <p>6 See the data category repository DatCatInfo (www.datcatinfo.net).</p>
        <p>However, this representation can generate data redundancy when several language varieties of
the same language are involved, because the same designation may occur with a very similar
meaning in several legal systems and must then be recorded in each language section. This is the
case with lockdown (see Section 1) and Vertrag. The latter is commonly used in German-speaking legal
systems to designate a ‘contract’ or ‘agreement’ (Figure 6).</p>
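        <p>This redundancy can be made measurable with a small check that reports designations repeated across language sections of the same language. The dictionary layout and the function name are assumptions for illustration. In practice, the check would have to be adapted to the export format of the TMS in use.</p>
        <preformat>
```python
from collections import defaultdict

# Hypothetical in-memory view of one concept entry:
# language identifier -> designations stored in that language section.
sections = {
    "DE-AT": ["Vertrag"],
    "DE-DE": ["Vertrag"],
    "DE-CH": ["Vertrag"],
    "IT": ["contratto"],
}


def repeated_designations(sections: dict[str, list[str]]) -> dict[tuple[str, str], list[str]]:
    """Map (language, designation) to the variety sections that repeat it."""
    seen: dict[tuple[str, str], list[str]] = defaultdict(list)
    for lang_id, terms in sections.items():
        language = lang_id.split("-")[0]
        for term in terms:
            seen[(language, term)].append(lang_id)
    # Keep only designations stored in more than one section of the same language.
    return {key: ids for key, ids in seen.items() if len(ids) > 1}


print(repeated_designations(sections))
# {('DE', 'Vertrag'): ['DE-AT', 'DE-DE', 'DE-CH']}
```
        </preformat>
        <p>Each hit marks a term, together with its term-level data, that has to be entered and maintained once per variety.</p>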
      </sec>
      <sec id="sec-3-3">
        <title>3.3. Language varieties at the language level without a language identifier</title>
        <p>The third way of representation is unconventional: an uncodified language variety is artificially
assigned the language identifier of a codified variety that is otherwise unused in the terminology
database (see [20]). This extreme solution enables the storage of uncodified language varieties at the
language level. It can be used for language varieties that the TMS does not support and/or for which
no language identifiers exist yet.</p>
        <p>Figure 7 illustrates this method with two language varieties of Ladin. The concept entry’s front
end displays the language Ladin alongside its language varieties. In the back end, however,
“TA-IN” (Tamil, India) and “TA-MY” (Tamil, Malaysia) are used as language identifiers. Naturally, this
solution precludes data interoperability unless adaptation work follows.</p>
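        <p>The split between the front end and the back end, together with the adaptation work required before data exchange, can be sketched as a lookup table plus an export step. The text does not state which Tamil identifier stands for which Ladin valley, so that assignment is an assumption here. The export identifiers ladin-gardena and ladin-badia are also hypothetical and stand in for whatever identifiers exchange partners agree on.</p>
        <preformat>
```python
# Hypothetical back-end configuration. Which Tamil identifier stands for which
# Ladin valley is an assumption made for this sketch.
BACKEND_TO_VARIETY = {
    "TA-IN": "Ladin (Val Gardena)",
    "TA-MY": "Ladin (Val Badia)",
}


def display_name(backend_id: str) -> str:
    """Return the label the front end shows for a language section."""
    return BACKEND_TO_VARIETY.get(backend_id, backend_id)


def adapt_for_export(sections: list[dict], export_ids: dict[str, str]) -> list[dict]:
    """Replace borrowed identifiers before data exchange.

    export_ids holds whatever identifiers the exchange partners agree on;
    a borrowed identifier without an agreed replacement raises an error
    instead of leaking a Tamil code into the exported data.
    """
    adapted = []
    for section in sections:
        lang = section["lang"]
        if lang in BACKEND_TO_VARIETY:
            if lang not in export_ids:
                raise KeyError(f"no export identifier agreed for {lang}")
            lang = export_ids[lang]
        # Copy the section so the stored data keeps its back-end identifier.
        adapted.append({**section, "lang": lang})
    return adapted


sections = [{"lang": "TA-IN", "terms": []}, {"lang": "IT", "terms": []}]
# Hypothetical agreed identifiers, for illustration only.
agreed = {"TA-IN": "ladin-gardena", "TA-MY": "ladin-badia"}
print(display_name("TA-IN"))  # Ladin (Val Gardena)
print(adapt_for_export(sections, agreed))
```
        </preformat>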
        <p>The third representation makes the complexity of accommodating language varieties in
terminology databases even more evident.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Conclusions</title>
      <p>This paper has presented three ways of representing language varieties: the first (Section 3.1) offers a
viable compromise, the second (Section 3.2) represents the ideal solution and the third (Section 3.3)
relies on a workaround. Many factors influence the choice of approach, such as the presence or
absence of language identifiers, the amount of terminological data and the availability of human
and financial resources to modify a database, particularly for retroactive adjustments. In this
regard, it would be desirable to establish guidelines, potentially at the ISO level, for handling
attributive data categories and related fields (see Section 3.1). Such guidelines could ensure smooth data
exchange and interoperability, especially for terminology databases that cannot alter their structure
because of the large volume of data and the number of working languages involved.</p>
      <p>In the case of uncodified language varieties (e.g., South Tyrolean German), one solution might
be the development of a generic language identifier to serve as a wildcard. This step would also
benefit minority language varieties that currently lack a language identifier.</p>
      <p>The discussion on this topic is far from complete. A future comparative analysis of existing
tools and widely used TMS could assess their effectiveness in accommodating language varieties.
Such an analysis would reveal whether and how the described approaches are implemented in
practice, or if there are other methods of representation.</p>
    </sec>
    <sec id="sec-5">
      <title>Declaration on Generative AI</title>
      <p>In preparing this work, the author used Grammarly for grammar and spelling checks. The content
was then reviewed and edited with assistance from a native English speaker. The author takes full
responsibility for the content of this publication.</p>
      <p>[17] ISO 12620-1, Management of terminology resources - Data categories - Part 1: Specifications, ISO, Genève, 2022.</p>
      <p>[18] ISO 12620-2, Management of terminology resources - Part 2: Repositories, ISO, Genève, 2022.</p>
      <p>[19] P. Sandrini, Terminologiearbeit im Recht. Deskriptiver begriffsorientierter Ansatz vom Standpunkt des Übersetzers, Braumüller, Wien, 1996.</p>
      <p>[20] N. Ralli, N. Andreatta, bistro – ein Tool für mehrsprachige Rechtsterminologie, trans-kom 11 (2018) 7-44. URL: http://www.trans-kom.eu/bd11nr01/transkom_11_01_02_Ralli_Andreatta_bistro.20180712.pdf.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>M.</given-names>
            <surname>Clyne</surname>
          </string-name>
          , Pluricentric Languages - Introduction, in: M.
          <string-name>
            <surname>Clyne</surname>
          </string-name>
          (Ed.),
          <source>Pluricentric Languages: Differing Norms in Different Nations</source>
          , De Gruyter Mouton, Berlin, Boston,
          <year>1991</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>10</lpage>
          . URL: https://doi.org/10.1515/9783110888140.1.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>R.</given-names>
            <surname>Muhr</surname>
          </string-name>
          ,
          <article-title>The state of the art of research on pluricentric languages: Where we were and where we are now</article-title>
          , in: R. Muhr, K. E. Fonyuy, Z. Ibrahim, C. Miller (Eds.),
          <source>Pluricentric Languages and non-dominant Varieties worldwide</source>
          , volume
          <volume>1</volume>
          , Peter Lang Verlag, Wien et al.,
          <year>2016</year>
          , pp.
          <fpage>9</fpage>
          -
          <lpage>32</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>U.</given-names>
            <surname>Ammon</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Bickel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. N.</given-names>
            <surname>Lenz</surname>
          </string-name>
          (Eds.),
          <source>Variantenwörterbuch des Deutschen. Die Standardsprache in Österreich, der Schweiz, Deutschland, Liechtenstein, Luxemburg, Ostbelgien und Südtirol sowie Rumänien, Namibia und Mennonitensiedlungen</source>
          , 2nd ed., De Gruyter, Berlin,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <source>bistro: Information System for Legal Terminology</source>
          , URL: https://bistro.eurac.edu.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>G. R.</given-names>
            <surname>de Groot</surname>
          </string-name>
          ,
          <article-title>Das Übersetzen juristischer Texte</article-title>
          , in: G. R. de Groot, R. Schulze (Eds.),
          <source>Recht und Übersetzen</source>
          , Nomos, Baden-Baden,
          <year>1999</year>
          , pp.
          <fpage>11</fpage>
          -
          <lpage>46</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>D.</given-names>
            <surname>Cao</surname>
          </string-name>
          ,
          <source>Translating Law</source>
          , Multilingual Matters, Clevedon,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>S.</given-names>
            <surname>Šarčević</surname>
          </string-name>
          , New approach to legal translation, Kluwer Law International, The Hague,
          <year>1997</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>R.</given-names>
            <surname>Muhr</surname>
          </string-name>
          ,
          <article-title>Österreichische und deutsche Rechtsterminologie - Typische Unterschiede und Probleme der Beschreibung plurizentrischer Rechtstermini</article-title>
          ,
          <source>Schriftenreihe der Deutschsprachigen Gemeinschaft</source>
          ,
          <volume>13</volume>
          ,
          <year>2019</year>
          , pp.
          <fpage>109</fpage>
          -
          <lpage>133</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>N.</given-names>
            <surname>Ralli</surname>
          </string-name>
          ,
          <article-title>Habe ich nun Vorfahrt, Vorrang, Vortritt oder soll ich doch lieber warten?</article-title>
          , Ask a Linguist, Eurac Research Blog. URL: https://www.eurac.edu/en/blogs/connecting-the-dots/habe-ich-nun-vorfahrt-vorrang-vortritt-oder-soll-ich-doch-lieber-warten.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>N.</given-names>
            <surname>Ralli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Stanizzi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Alber</surname>
          </string-name>
          ,
          <article-title>COVID-19 e lavoro terminologico: riflessioni a posteriori</article-title>
          ,
          <source>AIDAinformazioni: Rivista di Scienze dell'Informazione</source>
          , vol.
          <volume>1-2</volume>
          (
          <year>2023</year>
          ), pp.
          <fpage>91</fpage>
          -
          <lpage>114</lpage>
          . URL: https://doi.org/10.57574/596529285.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11] ISO 26162-1,
          <article-title>Management of terminology resources - Terminology databases - Part 1: Design</article-title>
          , ISO, Genève,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12] ISO 16642,
          <article-title>Computer applications in terminology - Terminological markup framework</article-title>
          , ISO, Genève,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>P.</given-names>
            <surname>Drewer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.-D.</given-names>
            <surname>Schmitz</surname>
          </string-name>
          , Terminologiemanagement. Grundlagen - Methoden - Werkzeuge, Springer, Berlin,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14] ISO 30042,
          <article-title>Management of terminology resources - TermBase eXchange (TBX)</article-title>
          , ISO, Genève,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15] ISO 639
          ,
          <article-title>Code for individual languages and language groups</article-title>
          , ISO, Genève,
          <year>2023</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16] ISO 3166,
          <article-title>Country codes</article-title>
          , URL: https://www.iso.org/iso-3166-country-codes.html.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>