<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>German Conference on Artificial Intelligence</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Introduction to the Second Workshop on Humanities-Centred Artificial Intelligence</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Sylvia Melzer</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Hagen Peukert</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Stefan Thiemann</string-name>
        </contrib>
        <aff>Universität Hamburg</aff>
        <aff>Universität zu Lübeck</aff>
      </contrib-group>
      <pub-date>
        <year>2022</year>
      </pub-date>
      <volume>19</volume>
      <issue>2022</issue>
      <fpage>0000</fpage>
      <lpage>0002</lpage>
      <abstract>
        <p>In 2022, this year's workshop on Humanities-Centred Artificial Intelligence (CHAI) presents a selection of five papers that showcase a variety of projects in the Humanities in which artificial intelligence (AI) methods are employed to generate outcomes more efficiently than with traditional methods. The focus on efficiency is the next logical step in this series of workshops, which aims to provide a well-rounded view of all aspects of a commitment to artificial intelligence in the Humanities. While the first workshop in 2021 [1] prioritized projects with a deep impact on finding phenomena that the human mind is unable to conceive of in the first place, it is plausible to continue the workshop series with topics on how best to process, prepare, and extract the needed information. In addition, we would like to maintain the idea of presenting a very diverse array of projects and applications, reflecting the essence of the Humanities as a most diverse field of academic disciplines. Admittedly, the focus on texts is prevalent throughout, even in disciplines like art history, musicology, or archaeology. Yet the shift towards new technologies in all fields is also undeniable. As an illustration, historians nowadays increasingly use technologies to evaluate texts that are stored in a structured, machine-readable format such as the Text Encoding Initiative (TEI) [2] or EpiDoc [3]. And if data are not available in appropriate formats, they use optical character recognition, possibly together with databasing on demand, to automatically transform all data of interest into, e.g., text-encoded material and further into structured, machine-readable code that can finally be saved to a database [4, 5]. 
Moreover, Humanities scholars employ computational pattern analysis (see paper 2), social network analysis (see paper 3), or Natural Language Processing (NLP) (see paper 5) to analyze the content or context of written artefacts such as manuscripts or, more specifically, inscriptions on bronze statues (see paper 5). Thus, networks of scribes are identified, and the artefact itself is correctly dated and assigned to a place of manufacture.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <p>
        NLP and other AI methods are used to
detect patterns. However, these methods are often trained on contemporary
rather than historical data. This is problematic because the method may introduce a bias
into the historical record, risking incorrect conclusions about historical events, dates, or places.
For example, contribution [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] shows that if a poem written in Tamil between the 1st century BCE
and the 2nd century CE is translated into English using, e.g., Google Translate, a correct
translation is not guaranteed. One reason for incorrect translations is that the structure
of a language from the past differs from that of today.
      </p>
      <p>The very same phenomenon is addressed in the contribution on affix identification in Middle
English (see paper 1), in which the semantic function of a bound affix may change over time. Yet
this is only one side of the coin; the other side is the form of the affix, which usually changes more
drastically and leads to high degrees of variability that can hardly be recognized by either humans or
machines. Collecting representative quantitative data on the frequency of lexical affixes
throughout 700 years of English language use has therefore proven challenging. While type
frequencies of all suffixes and prefixes were determined with relative ease, the identification of
token frequencies in larger text corpora turned out to call for AI approaches. Extracting
all representations of one affix type and its exact quantities required taking into account all kinds
of variability in form and usage. Exact quantities are required to make the more interesting
statements on affix productivity and identification, as well as on interrelations with other factors
of influence in the language system, e.g. a correlation with word order or predictions of likely
future changes. Again, because of the small quantities of available training text,
automated AI approaches were long dismissed as candidates for a viable solution.
This is indeed comprehensible for neural network approaches, but as the contribution reveals
in describing different stages of adjusting and exchanging methodological setups, the right
combination of methods to solve the problem satisfactorily was finally found; that is, a given (and
long-standing) problem in Diachronic Linguistics exemplifies how the existing inventory of
AI methods is typically applied. There are hardly any straightforward procedural rules
that could be followed here. In fact, which AI method fits better than another cannot be
plausibly predicted in advance. Of course, it is possible
to make a reasonable selection from the method inventory (that is, to exclude neural networks
because the data do not fulfill their very basic requirements), but this still leaves the researcher
with too many alternatives whose success rates are impossible to estimate. What seems
to be a trial-and-error approach from the outside is a kind of systematic polling from the
inside perspective. In the concrete case described in the contribution, the history of
implemented tools showed, on the one hand, that the right combination of a
semiautomatic method (1st generation) enriched with a smart algorithm (2nd generation) would
only be efficient if extended with a quality resource (4th generation). On the other hand, none
of the components can be omitted, although, as the third generation showed, not all methods
are equally optimal.</p>
      <p>As further explicated in paper 4, the algorithms of an information retrieval process often
produce results that cannot be understood by the end users. Therefore, an
information retrieval approach is presented that explains its results in a
comprehensible way.</p>
      <p>To conclude, AI methods used in the Humanities should be further investigated with regard to
the many influential variables that matter in any other subject, such as bias, objectivity,
representativeness, validity, and the like. The CHAI 2022 workshop highlights the challenges of applying AI
methods in the field of the Humanities, together with first solutions. In the
contributions at hand, new algorithms and requirements are presented, as well as one approach to meeting
user needs during an information retrieval process through the supporting use of a Pepper
robot.</p>
      <p>
        Existing algorithms were developed to solve a single problem, not all problems. Solving
domain-specific problems requires a knowledge base that can be drawn upon when applying
algorithms. But there is no algorithm that works for all domains; there are only small
parts that have to be combined effectively, so that only the relevant knowledge has to be
considered when selecting algorithms [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. Gaining knowledge from a variety of humanities
projects, and being able to take it into account during implementation, can be achieved
through interaction between the humanities and computer science. This interaction space is
created by the workshop Humanities-Centred Artificial Intelligence (CHAI).
      </p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name><given-names>S.</given-names> <surname>Melzer</surname></string-name>,
          <string-name><given-names>J.</given-names> <surname>Gippert</surname></string-name>,
          <string-name><given-names>S.</given-names> <surname>Thiemann</surname></string-name>,
          <string-name><given-names>H.</given-names> <surname>Peukert</surname></string-name>,
          <source>Proceedings of the Workshop on Humanities-Centred Artificial Intelligence (CHAI 2021), CEUR Workshop Proceedings</source>
          <volume>3093</volume>
          (<year>2022</year>)
          <fpage>1</fpage>-<lpage>44</lpage>. https://ceur-ws.org/Vol-3093/.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <collab>Text Encoding Initiative</collab>,
          <article-title>P5: Guidelines for Electronic Text Encoding and Interchange</article-title>,
          <source>Version 4.0.0</source>, last updated on 13th February
          <year>2020</year>, revision ccd19b0ba, https://tei-c.org/Vault/P5/4.0.0/doc/tei-p5-doc/en/html/. Accessed 27 November
          <year>2022</year>.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name><given-names>T.</given-names> <surname>Elliott</surname></string-name>,
          <string-name><given-names>G.</given-names> <surname>Bodard</surname></string-name>,
          <string-name><given-names>E.</given-names> <surname>Mylonas</surname></string-name>,
          <string-name><given-names>S.</given-names> <surname>Stoyanova</surname></string-name>,
          <string-name><given-names>C.</given-names> <surname>Tupman</surname></string-name>,
          <string-name><given-names>S.</given-names> <surname>Vanderbilt</surname></string-name>, et al.,
          <article-title>EpiDoc Guidelines: Ancient documents in TEI XML (Version 9)</article-title>,
          available: https://epidoc.stoa.org/gl/latest/
          (<year>2007</year>-<year>2022</year>). Accessed January 22,
          <year>2022</year>.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name><given-names>S.</given-names> <surname>Schiff</surname></string-name>,
          <string-name><given-names>S.</given-names> <surname>Melzer</surname></string-name>,
          <string-name><given-names>E.</given-names> <surname>Wilden</surname></string-name>,
          <string-name><given-names>R.</given-names> <surname>Möller</surname></string-name>,
          <article-title>TEI-based Interactive Critical Editions</article-title>,
          <source>in: 15th IAPR International Workshop on Document Analysis Systems, Lecture Notes in Computer Science (LNCS)</source>, Springer,
          <year>2022</year>, pp.
          <fpage>230</fpage>-<lpage>244</lpage>.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name><given-names>S.</given-names> <surname>Melzer</surname></string-name>,
          <string-name><given-names>S.</given-names> <surname>Schiff</surname></string-name>,
          <string-name><given-names>F.</given-names> <surname>Weise</surname></string-name>,
          <string-name><given-names>K.</given-names> <surname>Harter</surname></string-name>,
          <string-name><given-names>R.</given-names> <surname>Möller</surname></string-name>,
          <article-title>Databasing on demand for research data repositories explained with a large EpiDoc dataset</article-title>,
          <source>CENTERIS</source>
          (<year>2022</year>).
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name><given-names>S.</given-names> <surname>Schiff</surname></string-name>,
          <string-name><given-names>F.</given-names> <surname>Kuhr</surname></string-name>,
          <string-name><given-names>S.</given-names> <surname>Melzer</surname></string-name>,
          <string-name><given-names>R.</given-names> <surname>Möller</surname></string-name>,
          <article-title>AI-based Companion Services for Humanities</article-title>, in: AI Methods for Digital Heritage,
          <source>Workshop at 43rd German Conference on Artificial Intelligence</source>,
          <year>2020</year>, pp.
          <fpage>1</fpage>-<lpage>3</lpage>.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>E.</given-names>
            <surname>Rich</surname>
          </string-name>
          ,
          <article-title>Artificial intelligence and the humanities</article-title>
          ,
          <source>Computers and the Humanities</source>
          <volume>19</volume>
          (
          <year>1985</year>
          )
          <fpage>117</fpage>
          -
          <lpage>122</lpage>
          . URL: http://www.jstor.org/stable/30204398.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>