<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>Annual Conference of the German Informatics Society), September</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Challenges in Automatic Speech Recognition in the Research on Multilingualism</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Edyta Jurkiewicz-Rohrbacher</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Thomas Asselborn</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Universität Regensburg, Institut für Slavistik</institution>
          ,
          <addr-line>Universitätsstraße 31, 93053 Regensburg</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>University of Hamburg, Institut für Slavistik</institution>
          ,
          <addr-line>Von-Melle-Park 6, 20146 Hamburg</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>University of Hamburg, Institute of Humanities-Centered Artificial Intelligence (CHAI)</institution>
          ,
          <addr-line>Warburgstraße 28, 20354 Hamburg</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2025</year>
      </pub-date>
      <volume>18</volume>
      <issue>2025</issue>
      <fpage>0000</fpage>
      <lpage>0001</lpage>
      <abstract>
        <p>This paper explores the potential of using Large Language Models in multilingualism research to accelerate the management and processing of spoken data. The focus is on the speech-to-text processing of utterances by multilingual speakers. A qualitative discussion of the main issues relating to the non-standard language use of bilingual individuals is provided, using Polish-German recordings from the LangGener corpus as an example.</p>
      </abstract>
      <kwd-group>
        <kwd>Multilingualism</kwd>
        <kwd>ASR</kwd>
        <kwd>Transcription</kwd>
        <kwd>Polish</kwd>
        <kwd>German</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>The main challenge in multilingualism research is collecting sufficient data from a homogeneous sample
of multilingual speakers to achieve robust results. Perfectly balanced bilingualism in speech and writing
is extremely rare. Most speakers have significant discrepancies in their writing skills across different
languages, often because they were not educated in one of them. Consequently, multilingualism research,
which is necessarily usage-based, relies mostly on spoken data. Managing spoken data is considerably
more challenging than managing written data, which is more consistent, even in less formal varieties,
and therefore easier to parse and annotate. To perform any kind of quantitative analysis, spoken data
must first be transcribed into digital text for further processing and annotation.</p>
      <p>Recent developments in large language models for Automatic Speech Recognition (ASR), in particular
the emergence of Whisper [1], have clearly accelerated the transcription process. The business sector is
the biggest beneficiary, as any kind of recording can be used for quick documentation of a meeting,
compilation of notes, protocols or presentations.</p>
      <p>From the perspective of multilingualism research, however, the situation looks different, because
business and research expect different things from an automatic transcript. This paper
provides an overview of the issues that need to be addressed to enable linguists to use LLM-enhanced
ASR efficiently. We describe the targeted transcript standard in Section 2. We use Polish-German
bilingualism as an example and data from the LangGener project [2] (see Section 3.2). The preliminary
research presented here is based on three versions of Whisper [1] (base, medium and large), because Whisper
can be run locally (see Section 6 for the ethical reasons) and has been evaluated as the best-performing of the
available transcription tools (see [3] for a comparison). Section 4 describes the generally known problems
related to the quality of transcripts, while Section 5 presents the issues specific to multilingualism.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Targeted Features of Transcript</title>
      <p>In terms of transcripts, the main difference between business and multilingualism research is that
business-oriented transcripts should ideally be clean and monolingual to ensure fluent reading: the
focus is on the informativeness of the content. Therefore, an ASR system should automatically translate foreign
elements into the company’s language (English in many cases), even if the meeting was multilingual.
Elements of speech that do not carry essential information, such as repetitions, disfluencies and
hesitation markers, should be removed. As a result, a business transcript will resemble standard
written language more closely than the actual oral communication.</p>
      <p>Transcripts used for research purposes are different in nature. In linguistic investigations, the
recording is the primary object of research, and transcripts approximate all the sounds that speakers
make, including hesitation markers and self-corrections, to show how and when oral communication
flows.</p>
      <p>The standards for transcripts vary in academic research, depending on the language being documented
and the purpose of the research. Under-resourced languages without a written tradition or standard
orthographic rules are usually transcribed using phonetic transcription systems (for an overview, see
[4]). Most European languages can be transcribed using simpler systems that have been developed for
analysing spoken discourse (for an overview, see [5]). Such systems are mostly based on the orthography
of the standard language. They vary in terms of the notation used for spoken language phenomena
and issues related to transcript quality, as well as the number of annotation layers. The orthography-based approach
is preferable for further automatic language processing, such as lemmatisation and morphosyntactic
tagging, as well as for automatic queries, which can be conducted according to standard forms. Further
annotation layers, such as phonetic transcriptions, can be added afterwards to differentiate between
different non-standard pronunciations, which are typical of dialects, heritage, and non-native varieties.
In summary, a transcript for research purposes should be a precise representation of speech that is
simple enough to be easily searched and processed further.</p>
    </sec>
    <sec id="sec-3">
      <title>3. Experiment Setup</title>
      <p>The following section covers the technical side of this article. First, we briefly introduce the
Whisper models used, together with their configuration parameters. Afterwards, we briefly
describe the dataset.</p>
      <sec id="sec-3-1">
        <title>3.1. Whisper Models</title>
        <p>For this case study, we decided to use the standard Whisper models [1]. We employed the latest versions
of both the library and the models, as described on the Whisper GitHub page.<sup>1</sup>
Our goal for this first case study was to obtain a baseline estimate of how the basic Whisper
models perform. Thus, we used the automatic language detection mode for all
experiments. We used three different model sizes to explore the trade-offs between model size and
the qualitative results of the transcription. The three models used were:</p>
        <list list-type="bullet">
          <list-item><p>Whisper base with 74 million parameters,</p></list-item>
          <list-item><p>Whisper medium with 769 million parameters, and</p></list-item>
          <list-item><p>Whisper large with 1550 million parameters.</p></list-item>
        </list>
        <p>More information can be found on the model card on GitHub.<sup>2</sup> All Whisper models were used in their
multilingual versions.</p>
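        <p>The setup described in this section can be sketched as follows. This is a minimal illustration under assumptions, not the script used in this study: it assumes the openai-whisper Python package is installed, and the names MODEL_SIZES and transcribe_with_all_models are hypothetical. Passing language=None leaves language detection to Whisper.</p>
        <preformat>
```python
# Parameter counts (in millions) for the three model sizes named in Section 3.1.
MODEL_SIZES = {"base": 74, "medium": 769, "large": 1550}


def transcribe_with_all_models(audio_path, sizes=tuple(MODEL_SIZES)):
    """Transcribe one file with each model size, leaving language detection on auto.

    Hypothetical helper for illustration; it is not the authors' pipeline.
    """
    # Imported lazily so that the size table above is usable without the package.
    import whisper  # assumption: the openai-whisper package

    results = {}
    for size in sizes:
        model = whisper.load_model(size)
        # language=None asks Whisper to detect the language itself.
        result = model.transcribe(audio_path, language=None)
        results[size] = {"language": result["language"], "text": result["text"]}
    return results
```
        </preformat>
        <p>Keeping the detected language next to each transcript makes it easy to see later which model size assigned which language to a recording.</p>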
        <sec id="sec-3-1-1">
          <p><sup>1</sup> https://github.com/openai/whisper, accessed September 17, 2025.</p>
          <p><sup>2</sup> https://github.com/openai/whisper/blob/main/model-card.md, accessed September 17, 2025.</p>
        </sec>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Corpus LangGener as Source of Data</title>
        <p>The LangGener Corpus [6] contains recordings of language biography interviews with Polish–German
bilinguals. The sample is stratified across two generations: an older generation who lived in Poland in
areas that were part of the German Reich before 1945 (called Generation Poland), and late bilinguals
who were born in these areas and immigrated to Germany (nowadays known as ’Aussiedler’ or in the
project called Generation Germany).</p>
        <p>This stratification principle makes the sample interesting for ASR, since the speakers’ speech contains various
features of non-standard language, including dialectal features and non-native pronunciation.</p>
        <p>Structurally, the corpus contains many phenomena related to multilingualism [7], such as
code-switching, lexical matter and pattern replication. We describe them briefly in Section 5. For a full
overview of these features in LangGener, see [8].</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Previously Identified Problems with LLM-enhanced Transcription</title>
      <p>One serious flaw that applies to many LLM-enhanced tools is hallucination. In the case of ASR, this
means transcribed text that cannot be aligned with the audio file, as observed for Whisper by [9].
Although OpenAI claims to have improved on this issue,<sup>3</sup> we still find hallucinations in the studied data, as
shown in (1):</p>
      <p>(1) To jest pożyteczna rozmowa. Dowiemy się trochę więcej o sobie.<sup>4</sup> (Added in transcription with
Whisper Large:) Tak. Tak. To było bardzo miłe, ale bardzo miłe. Tak. Tak. Tak. Tak. Tak.
Tak. Tak. Tak. Tak. Tak. Tak. Tak. Tak. Tak. Tak. Tak. Tak. Tak. Tak. Tak. Tak.</p>
      <p>‘That’s a useful chat. We are learning more about ourselves. (Added in transcription with Whisper
Large:) Yes. Yes. This is very nice, but really nice. Yes. Yes. Yes. Yes. Yes. Yes. Yes. Yes. Yes. Yes.’</p>
      <p>In their review of ASR for English and German, [3] identify several types of mistake that are probably
generally valid. These are:</p>
      <list list-type="bullet">
        <list-item><p>wrong transcription due to similar-sounding words,</p></list-item>
        <list-item><p>misunderstood proper nouns,</p></list-item>
        <list-item><p>omitting single words and sentences,</p></list-item>
        <list-item><p>missing sub-sentences,</p></list-item>
        <list-item><p>assigning wrong endings,</p></list-item>
        <list-item><p>spelling mistakes.</p></list-item>
      </list>
      <p>The Whisper transcription of the LangGener corpus contains such mistakes too. In particular, it
omits disfluencies, broken words, repetitions and other features of conversational data. We find issues
relating to incorrect endings, spelling mistakes and homophone structures that are particularly relevant
for multilingual data. We therefore return to them in the next section.</p>
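      <p>Repetition loops such as the one in example (1) can be flagged with a simple heuristic before manual post-editing. The sketch below is an illustration under assumptions rather than part of our workflow: the function name find_repetition_loops and the threshold min_repeats are hypothetical, and the heuristic only catches loops of a single repeated token, not hallucinated fluent text.</p>
      <preformat>
```python
import re


def find_repetition_loops(text, min_repeats=5):
    """Return (token, count) for every run of one token repeated at least min_repeats times."""
    # Split on whitespace and strip punctuation so that "Tak." and "Tak" compare equal.
    tokens = [re.sub(r"[^\w]", "", t).lower() for t in text.split()]
    tokens = [t for t in tokens if t]
    loops = []
    run_start = 0
    for i in range(1, len(tokens) + 1):
        # A run ends at the end of the token list or when the token changes.
        if i == len(tokens) or tokens[i] != tokens[run_start]:
            run_length = i - run_start
            if run_length >= min_repeats:
                loops.append((tokens[run_start], run_length))
            run_start = i
    return loops
```
      </preformat>
      <p>Segments flagged in this way still need to be checked against the audio, since genuine conversational speech can also contain short repetitions.</p>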
    </sec>
    <sec id="sec-5">
      <title>5. Transcription Issues for Research on Multilingualism</title>
      <sec id="sec-5-1">
        <title>5.1. Tendency towards code unification</title>
        <p>Problems strictly related to multilingual transcriptions have received little investigation to date. We are
aware of [10], who fine-tuned Whisper for English-Chinese (i.e. biscriptual) transcription. However,
this work does not provide an overview of the problems associated with such transcriptions; rather, it
contains a technical description of the fine-tuned model.</p>
        <p>When studying multilingualism, it is important to avoid translations in transcripts known from
business solutions, since the focus of research is precisely on phenomena related to the mixing of
two language codes. Typically, one code dominates (the matrix code), while the other is used only
occasionally and is embedded within the matrix code.</p>
        <p>Whisper adjusts the text to a single language relatively rarely, but we find some problematic passages,
for example (2):</p>
        <p>(2) a. diese Rechtsanspruch to # jest każde dziecko ma Recht da drauf<sup>5</sup></p>
        <p>b. Tak, tak. ...na szpruch. To jest... każde dziecko ma prawo na szpruch [Whisper Large]</p>
        <p>The word Rechtsanspruch ‘legal right’ is not recognised as embedded German code, and its second
part is transcribed with Polish orthography as a non-existent Polish word: /ʃ/ is represented as &lt;sz&gt;,
whereas German spelling requires &lt;s&gt; at the beginning of a word before &lt;k&gt;, &lt;p&gt; or &lt;t&gt;. Further, the
phrase Recht da drauf ‘right to it’ is not transcribed but interpreted, and rendered in Polish consistently
with the previous mistake as prawo na szpruch ‘right for szpruch’.</p>
        <p>A preference for unifying the output with the matrix code can also lead to words being replaced with
similar-sounding words from the matrix code, or to entirely new linguistic units being formed according
to the word-formation rules of the matrix code. Thus, in example (3), the German word schicken ‘here: they send’
is represented as Polish szczypią ‘they pinch’, while pressen ‘they squeeze’ is represented as the made-up
verb presują. In eine Gruppe receives a direct translation. Note that the German forms are ambiguous in
terms of person and number, which may partially explain why switching to German in the transcription
is avoided.</p>
        <p>(3) a. i oni coraz więcej tych dzieci # schicken in eine Gruppe pressen in eine Gruppe _ żeby
się wszyscy pomieścili<sup>6</sup></p>
        <p>b. i oni coraz więcej tych dzieci... ...szczypią w jednej grupie, presują w jednej grupie.
[Whisper Large]</p>
      </sec>
      <sec id="sec-5-2">
        <title>5.2. Code detection</title>
        <p>In all three versions of Whisper, the matrix code was usually recognised in our data, and the embedded
code was a considerable source of transcription errors. However, we obtained transcripts where the
output was recognised as an entirely different language, such as Yiddish, as shown in Figure 1 from the
transcript of speaker BQ RAC with Whisper Large.</p>
        <p>We assume that this may be due to the frequent code switching and unclear pronunciation in the first
few seconds of the audio file. This shows that the way in which the recordings are cut before processing
affects the accuracy of the automatic transcription. Whisper Base exhibits a similar issue in the same
section, but incorporates an even greater variety of languages, ultimately producing a predominantly
Germanic transcription, as shown in Figure 2.</p>
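        <p>If the detected language depends on how a recording is cut, one way to spot affected files is to compare language detections across chunks of the same recording. The following sketch assumes that a list of per-chunk language codes has already been obtained upstream (for example, by running language detection on each chunk separately); the function name flag_unexpected_languages and the default expected codes are hypothetical.</p>
        <preformat>
```python
from collections import Counter


def flag_unexpected_languages(chunk_languages, expected=("pl", "de")):
    """Return the majority language and the indices of chunks detected outside `expected`."""
    counts = Counter(chunk_languages)
    # most_common(1) yields the single most frequent code; None for an empty input.
    majority = counts.most_common(1)[0][0] if counts else None
    outliers = [i for i, lang in enumerate(chunk_languages) if lang not in expected]
    return majority, outliers
```
        </preformat>
        <p>Chunks reported as outliers can then be re-transcribed with the language set explicitly, or re-cut so that they do not begin in the middle of a code switch.</p>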
        <sec id="sec-5-2-1">
          <p><sup>5</sup> https://langgener.ijppan.pl/OUT/NF_PAD_II_GD_PL01-146207-151067.wav</p>
          <p><sup>6</sup> https://langgener.ijppan.pl/OUT/NF_PAD_II_GD_PL01-151067-158777.wav</p>
        </sec>
      </sec>
      <sec id="sec-5-3">
        <title>5.3. Influence of language contact related phenomena</title>
        <p>In Section 3.2 we mentioned three phenomena that occur particularly frequently in situations
of language contact: Code Switching (CS), Matter Replication (MAT) and Pattern Replication (PAT).
After briefly explaining them, we point out the problems they cause for ASR.</p>
        <p>Following [7], we classify phenomena related to the direct transfer of embedded-language material as CS
and MAT. When a single word or phrase takes over the inflection of the matrix language and becomes
fully integrated, we call it MAT. CS is similar to MAT, but it shows a lower degree of grammatical
integration, as shown in (4).</p>
        <p>(4) bo normalerweise zawsze miałem pół # Sprachkursu<sup>7</sup></p>
        <p>‘because normally, I had only half of the language course.’</p>
        <p>CS and MAT are particularly interesting from the perspective of ASR, as they imply the mixing of two
language codes with distinct phonological systems and orthographies. This leads to issues regarding
orthographic choice and the identification of words as belonging to the lexicon of the matrix or embedded
language. Consequently, we observe the frequent mistakes mentioned in the previous section, such as
incorrect spellings, interlingual homophones and problems with ending assignment.</p>
        <p>The MAT in (4) receives the Polish inflection -u of a masculine noun in the accusative singular. However, the embedded
language orthography in the stem of the word is required by the HIAT-based transcription [11], the
approach taken by the authors of the LangGener corpus. This is problematic for Whisper, as shown in (5).
The word Sprache marks the beginning of a CS. Whisper interprets it as MAT and transcribes it following
Polish orthography with &lt;sz&gt; and not with capital &lt;S&gt; according to the German orthographic norm. Note
that speakers of Polish usually denasalise /ɛ̃/, rendered in Polish orthography as &lt;ę&gt;, to /ɛ/ in the coda.
Therefore, it is unlikely that a nasal vowel could be registered in the audio file.</p>
        <p>(5) a. to mi było wichtig żeby nasze dzieci jedną Sprache gut beherrscht he * # haben<sup>8</sup></p>
        <p>b. to mi było wichtig, żeby nasze dzieci jedną szprachę gut beherste haben. [Whisper Large]</p>
        <p>Another frequently occurring mistake is writing words together, for example an adverb and an
adjective: the phrase ganz kleine Kinder ‘very small children’ in example (6) is transcribed as
ganzkleinen Kindern. Note that, additionally, the inflectional ending -n is added, although it does not occur
in the audio file.</p>
        <p>(6) a. ta grupa jest z tymi ganz kleine Kinder jest dobrze # belegt<sup>9</sup></p>
        <p>b. ta grupa z tymi ganzkleinen Kindern ist dobrze belegt. [Whisper Large]</p>
        <p>Frequent CS seems to increase the number of errors in the transcription, as shown in (7):</p>
        <p>(7) a. my to są trzy # Personalschlüssel ist # ich glaube drei # zwei Vollzeitkräfte i chyba
jeszcze trzydzieści # godzin jedna Kraft<sup>10</sup></p>
        <p>b. To są trzy personalsche SEL, jakieś global drive, zwei Vollzeitkräfte i chyba jeszcze 30 o
godzin, bo jednak kraft. [Whisper Large]</p>
        <p><sup>7</sup> https://langgener.ijppan.pl/OUT/BN_WUP_I_GD_PL07-477730-480050.wav</p>
        <p><sup>8</sup> https://langgener.ijppan.pl/OUT/NF_PAD_II_GD_PL01-237220-247670.wav</p>
        <p><sup>9</sup> https://langgener.ijppan.pl/OUT/NF_PAD_II_GD_PL01-58850-64500.wav</p>
        <p>The long German compound noun Personalschlüssel ‘staffing ratio, personnel counting’ is most likely
recognised as German, since the first part of the word is transcribed following German spelling. The
German clause ich glaube drei ‘I guess three’ is transcribed as a mix of Polish and English, jakieś global
drive. Although the phrase zwei Vollzeitkräfte ‘two full positions’ is transcribed entirely correctly,
the final part of the utterance, trzydzieści godzin jedna Kraft, which is a mix of Polish and German, is
transcribed (partly incorrectly) with Polish spelling, since kraft is not capitalised.</p>
        <p>We also observe here that if a word is spelled incorrectly, this error is consistent throughout the
transcript. The word Personalschlüssel, which appears four times in one recording, is consistently
misspelled with a made-up expression personalsche SEL each time.</p>
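        <p>Because such errors are consistent within a transcript, they lend themselves to table-driven post-editing: once a mis-transcription has been identified, all its occurrences can be replaced in one pass. The sketch below illustrates this idea under assumptions; the function name apply_corrections and the structure of the correction table are hypothetical and not part of the LangGener workflow.</p>
        <preformat>
```python
import re


def apply_corrections(transcript, corrections):
    """Replace every occurrence of each known mis-transcription with its correction.

    `corrections` maps the wrong string to the intended one. Returns the corrected
    transcript and the number of replacements made per entry.
    """
    counts = {}
    for wrong, right in corrections.items():
        # re.escape treats the wrong form literally; matching ignores case.
        pattern = re.compile(re.escape(wrong), flags=re.IGNORECASE)
        # A function replacement avoids interpreting backslashes in `right`.
        transcript, n = pattern.subn(lambda match, r=right: r, transcript)
        counts[wrong] = n
    return transcript, counts
```
        </preformat>
        <p>For the example above, the table would contain a single entry mapping personalsche SEL to Personalschlüssel. The replacement counts show how often each correction applied, which helps to verify the table against the recording.</p>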
        <p>The third category mentioned, PAT, refers to the situation where only structures are borrowed into the
matrix language. In example (8), the Polish age construction with a habere verb (mieć) is replicated in
German, instead of the copula construction sein + Numeral + Jahre alt.</p>
        <p>(8) meine Mutter die hat dreizehn Jahre gehabt<sup>11</sup></p>
        <p>‘My mother, she was thirteen years old.’</p>
        <p>The influence of pattern replication is difficult to assess using purely qualitative analysis. Often, PAT
violates the idiom of the matrix language, or its syntax, as demonstrated, e.g., by the famous sentence
uttered by the footballer Lothar Matthäus, again what learned, which copies the German syntax of
‘wieder was gelernt’. Therefore, we leave the analysis of PAT for future research.</p>
        <p>Nevertheless, syntax-based prediction may play a role in ASR. For example, the Polish modal verb
może ‘can’ is often followed by an infinitive complement. In (9), however, the subject and predicate
are inverted. Therefore, the subject rodzic ‘parent’, which linearly follows the modal predicate, seems to
be interpreted as the similarly pronounced infinitive rodzić ‘give birth’, while the actual infinitive is
uttered by the speaker in German and is therefore not recognised.</p>
        <p>(9) a. bo takie dziec * jak dziecko nie dostanie może rodzic verklagen die Stadt<sup>12</sup></p>
        <p>b. jak dziecko nie dostanie, może rodzić swe klage w tej szczypie [Whisper Large]</p>
        <p>Therefore, we believe that word-order-related PAT is worth inspecting in the future. For
example, we formulate the working hypothesis that the strict German word-order rules regarding the
position of the verbal predicate (in the second position in the main clause or in the last position in
the subordinate clause), frequently copied by bilingual speakers from Polish to German, could be
a potential source of error for ASR too.</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>6. Ethical Issues</title>
      <p>Due to protection policies and laws, such as the General Data Protection Regulation in the European
Union, personal and sensitive data must be protected and inaccessible to external entities during and
after the preparation and publication of data. Nonetheless, the data obtained are expected to be publicly
available to other researchers in accordance with current research standards. Typically, the personal
and sensitive data are accessible only to authorised project members. To enable LLM-based processing of
data that could contain sensitive information, such information would need to be deleted or hidden before
processing on external servers. Alternatively, processing can be conducted exclusively locally while maintaining
the data protection standards. Of these two options, only local processing is realistic. First, manually
processing transcripts to pseudonymise or erase information would be economically unreasonable.
Second, an ASR system with integrated LLMs benefits from the syntactic and semantic information encoded by
named entities, which are usually subject to pseudonymisation.</p>
      <p><sup>10</sup> https://langgener.ijppan.pl/OUT/NF_PAD_II_GD_PL01-49260-57407.wav</p>
      <p><sup>11</sup> https://langgener.ijppan.pl/OUT/XL_PIL_GP_DE27-77300-82110.wav</p>
      <p><sup>12</sup> https://langgener.ijppan.pl/OUT/NF_PAD_II_GD_PL01-158897-162417.wav</p>
      <p>Access to data sets appropriate for fine-tuning ASR is also more problematic than for other
LLM-enhanced applications. Such data are most frequently subject to data protection regulations, meaning that the
data sets cannot be shared without major changes being undertaken.</p>
    </sec>
    <sec id="sec-7">
      <title>7. Discussion and Future Work</title>
      <p>It appears that the quality of state-of-the-art ASR systems is still far from enabling instant multilingual
transcriptions without considerable expenditure on post-editing. It is important to point out that
grammatical and orthographic errors are easily detectable. Problems with translation and the removal
of embedded language from transcriptions are more serious, as they are harder to identify. Therefore,
these two issues are given higher priority in additional training. Based on the results of the survey, our
future work will focus on fine-tuning Whisper using the bilingual LangGener corpus. Additionally, we
will explore ways to incorporate ASR into the Research Data Repository of the University of Hamburg,
using the Polish-German example as the prototype. This would enable users to search for recordings
based on the words used in them without requiring the uploader to provide a transcript upfront.</p>
    </sec>
    <sec id="sec-8">
      <title>Limitations</title>
      <p>This paper provides a very preliminary overview of problems related to ASR in multilingualism research.
It is evident that trials on fine-tuning could offer solutions to these issues. Although multilingual
transcripts are the focus of this paper, some of the problems (and therefore potential solutions) may
also be relevant to monolingual transcriptions of speech with dialectal features or of speech by other vulnerable groups.</p>
    </sec>
    <sec id="sec-9">
      <title>Ethics Statement</title>
      <p>This work complies with the ACL Ethics Policy. Prior to the current study, we had not taken any actions
to pre-train the systems for the needs of the current task.</p>
    </sec>
    <sec id="sec-10">
      <title>Declaration on Generative AI</title>
      <p>During the preparation of this work, the authors used DeepL for grammar and spelling checks.
After using this tool, the authors reviewed and edited the content as needed and take
full responsibility for the publication’s content.</p>
    </sec>
    <sec id="sec-11">
      <title>Acknowledgements</title>
      <p>We are thankful to the principal investigators of the project Language across Generations: Contact
Induced Change in Morphosyntax in German-Polish Bilingual Speech, Anna Zielińska and Björn Hansen,
for providing us with the material for the survey.</p>
      <p>This contribution was partially funded by the Deutsche Forschungsgemeinschaft (DFG, German
Research Foundation) under Germany’s Excellence Strategy – EXC 2176 ‘Understanding Written
Artefacts: Material, Interaction and Transmission in Manuscript Cultures’, project no. 390893796. The
research was mainly conducted within the scope of the Centre for the Study of Manuscript Cultures
(CSMC) at the University of Hamburg.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name><given-names>A.</given-names> <surname>Radford</surname></string-name>,
          <string-name><given-names>J. W.</given-names> <surname>Kim</surname></string-name>,
          <string-name><given-names>T.</given-names> <surname>Xu</surname></string-name>,
          <string-name><given-names>G.</given-names> <surname>Brockman</surname></string-name>,
          <string-name><given-names>C.</given-names> <surname>McLeavey</surname></string-name>,
          <string-name><given-names>I.</given-names> <surname>Sutskever</surname></string-name>,
          <article-title>Robust speech recognition via large-scale weak supervision</article-title>,
          <year>2022</year>. arXiv:2212.04356.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name><given-names>B.</given-names> <surname>Hansen</surname></string-name>,
          <string-name><given-names>A.</given-names> <surname>Zielińska</surname></string-name> (Eds.),
          <source>Soziolinguistik trifft Korpuslinguistik: Deutsch-polnische und deutsch-tschechische Zweisprachigkeit</source>,
          Universitätsverlag Winter,
          <year>2022</year>. doi:10.33675/2022-82538591.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name><given-names>S.</given-names> <surname>Wollin-Giering</surname></string-name>,
          <string-name><given-names>M.</given-names> <surname>Hoffmann</surname></string-name>,
          <string-name><given-names>J.</given-names> <surname>Höfting</surname></string-name>,
          <string-name><given-names>C.</given-names> <surname>Ventzke</surname></string-name>,
          <article-title>Automatic transcription of English and German qualitative interviews</article-title>,
          <source>Forum Qualitative Sozialforschung / Forum: Qualitative Social Research</source>
          <volume>25</volume>
          (<year>2024</year>). doi:10.17169/fqs-25.1.4129.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name><given-names>C.</given-names> <surname>Anderson</surname></string-name>,
          <string-name><given-names>T.</given-names> <surname>Tresoldi</surname></string-name>,
          <string-name><given-names>T.</given-names> <surname>Chacon</surname></string-name>,
          <string-name><given-names>A.-M.</given-names> <surname>Fehn</surname></string-name>,
          <string-name><given-names>M.</given-names> <surname>Walworth</surname></string-name>,
          <string-name><given-names>R.</given-names> <surname>Forkel</surname></string-name>,
          <string-name><given-names>J.-M.</given-names> <surname>List</surname></string-name>,
          <article-title>A crosslinguistic database of phonetic transcription systems</article-title>,
          <source>Yearbook of the Poznan Linguistic Meeting</source>
          <volume>4</volume>
          (<year>2018</year>)
          <fpage>21</fpage>-<lpage>53</lpage>. doi:10.2478/yplm-2018-0002.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>R. J.</given-names>
            <surname>Kreuz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. A.</given-names>
            <surname>Riordan</surname>
          </string-name>
          ,
          <article-title>The art of transcription: Systems and methodological issues</article-title>
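          , in:
          <string-name>
            <given-names>A. H.</given-names>
            <surname>Jucker</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K. P.</given-names>
            <surname>Schneider</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Bublitz</surname>
          </string-name>
          (Eds.),
          <source>Methods in Pragmatics</source>
          , De Gruyter Mouton, Berlin,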
          <year>2018</year>
          , pp.
          <fpage>95</fpage>
          -
          <lpage>120</lpage>
          . doi:10.1515/9783110424928-003.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>B.</given-names>
            <surname>Hansen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Nekula</surname>
          </string-name>
          ,
          <article-title>Die LangGener-Korpora als multifunktionale Ressourcen der Mehrsprachigkeitsforschung zwischen Sozio- und Korpuslinguistik</article-title>
          , in:
          <string-name>
            <given-names>B.</given-names>
            <surname>Hansen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Zielińska</surname>
          </string-name>
          (Eds.),
          <source>Soziolinguistik trifft Korpuslinguistik: Deutsch-polnische und deutsch-tschechische Zweisprachigkeit</source>
          , Winter Universitätsverlag, Heidelberg,
          <year>2021</year>
          , pp.
          <fpage>175</fpage>
          -
          <lpage>191</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Matras</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Sakel</surname>
          </string-name>
          ,
          <article-title>Investigating the mechanisms of pattern replication in language convergence</article-title>
          ,
          <source>Studies in Language</source>
          <volume>31</volume>
          (<issue>4</issue>)
          (
          <year>2007</year>
          )
          <fpage>829</fpage>
          -
          <lpage>865</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>C.</given-names>
            <surname>Centner</surname>
          </string-name>
          ,
          <article-title>Lexikalische Replikation bei deutsch-polnisch Bilingualen in zwei Generationen</article-title>
          ,
          <source>Ph.D. thesis</source>
          , Universität Regensburg,
          <year>2024</year>
          . doi:10.5283/epub.58164.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>A.</given-names>
            <surname>Koenecke</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. S. G.</given-names>
            <surname>Choi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K. X.</given-names>
            <surname>Mei</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Schellmann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Sloane</surname>
          </string-name>
          ,
          <article-title>Careless whisper: Speech-to-text hallucination harms</article-title>
          , in:
          <source>Proceedings of the 2024 ACM Conference on Fairness, Accountability, and Transparency</source>
          , FAccT '24, Association for Computing Machinery, New York, NY, USA,
          <year>2024</year>
          , pp.
          <fpage>1672</fpage>
          -
          <lpage>1681</lpage>
          . doi:10.1145/3630106.3658996.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>J.</given-names>
            <surname>Zhao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Shi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Cui</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Ni</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Ye</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <article-title>Adapting whisper for code-switching through encoding refining and language-aware decoding</article-title>
          , in:
          <source>ICASSP 2025 - 2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)</source>
          ,
          <year>2025</year>
          . doi:10.1109/ICASSP49660.2025.10889634.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>K.</given-names>
            <surname>Ehlich</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Rehbein</surname>
          </string-name>
          ,
          <article-title>Halbinterpretative Arbeitstranskriptionen (HIAT)</article-title>
          ,
          <source>Linguistische Berichte</source>
          (
          <year>1976</year>
          )
          <fpage>21</fpage>
          -
          <lpage>41</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>