<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>June</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Better Transcription of UK Supreme Court Hearings</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Hadeel Saadany</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Catherine Breslin</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Constantin Orăsan</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Sophie Walker</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Centre for Translation Studies, University of Surrey</institution>
          ,
          <country country="UK">United Kingdom</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Just Access</institution>
          ,
          <country country="UK">United Kingdom</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Kingfisher Labs Ltd</institution>
          ,
          <country country="UK">United Kingdom</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2022</year>
      </pub-date>
      <volume>19</volume>
      <issue>2023</issue>
      <fpage>0000</fpage>
      <lpage>0002</lpage>
      <abstract>
        <p>Transcription of legal proceedings is very important for enabling access to justice. However, manual speech transcription is an expensive and slow process. In this paper we describe part of a combined research and industrial project for building an automated transcription tool designed specifically for the justice sector in the UK. We explain the challenges involved in transcribing court room hearings and the Natural Language Processing (NLP) techniques we employ to tackle these challenges. We show that fine-tuning a generic off-the-shelf pre-trained Automatic Speech Recognition (ASR) system with an in-domain language model, as well as infusing common phrases extracted with a collocation detection model, can not only improve the Word Error Rate (WER) of the transcribed hearings but also avoid critical errors specific to the legal jargon and terminology commonly used in British courts.</p>
      </abstract>
      <kwd-group>
        <kwd>Legal Transcription</kwd>
        <kwd>UK Supreme Court</kwd>
        <kwd>Automatic Speech Recognition</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        There has been a recent interest in employing NLP
techniques to aid the textual processing of the legal domain
[
        <xref ref-type="bibr" rid="ref1 ref2 ref3 ref4">1, 2, 3, 4</xref>
        ]. In contrast, processing spoken court hearings
has not received the same attention as understanding
legal text documents. In the UK legal system, court
hearing sessions have a unique tradition of verbal argument.
Moreover, these hearings crucially aid in new case
preparation, provide guidance for court appeals, help in
legal training and even guide future policy. However,
the audio material for a case typically spans several
hours, which makes it both time- and effort-consuming
for legal professionals to extract important information
relevant to their needs. Currently, the existing need for
legal transcriptions (covering 449K cases p.a. in the UK
across all court tribunals [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]) is largely met by human transcribers.
Table 1. Examples of generic ASR errors on UK court hearings (Reference vs. AWS ASR output):
Reference: “So my lady um it is difficult to...” | AWS ASR: “So melody um it is difficult to...”
Reference: “All rise ...” | AWS ASR: “All right ...”
Reference: “it makes further financial order” | AWS ASR: “it makes further five natural”
      </p>
      <p>Although there are several current speech-to-text
(STT) technology providers which could be used to transcribe
this data automatically, most of these systems
are trained on general domain data, which may result
in domain-specific transcription errors if applied to a specialised
domain. One way to address this problem is for
end-users to train their own ASR engines using their
in-domain data. However, in most of the cases the amount
of data available is too low to enable them to train a system
which can compete with well-known cloud-based
ASR systems which are trained on much larger datasets.
At the same time, in commercial scenarios, using generic
cloud-based ASR systems to transcribe a specialised
domain may result in sub-optimal quality transcriptions
for clients who require this service.</p>
      <p>This holds particularly true for British court room
audio procedures. When applying a generic cloud-based
ASR system (in our case Amazon Transcribe) on British
court rooms, the Word Error Rate (WER) remains
relatively high due to the hearings’ length, multiplicity of
speakers, complex speech patterns and, more crucially,
unique pronunciations and domain-specific vocabulary.
Examples in Table 1 show some common
problems we faced when transcribing UK court hearings
with off-the-shelf ASR systems such as Amazon Web
Services (AWS) Transcribe1. The references are taken from
human-generated ground-truth transcripts of real UK
Supreme Court Hearings2 created by the legal editors
in our project’s team.</p>
      <p>The first error is due to a special
pronunciation of the phrase ‘my lady’ in British court
rooms, as it is pronounced like ‘mee-lady’ when barristers
address a female judge. Similarly, in the second
example, the error relates to the linguistic etiquette of
UK court hearings which the ASR system consistently
fails to recognise. The error in the third example, on the
other hand, is related to legal terminology critical to the
specific transcribed case. Errors similar to the third example
are numerous in our dataset and also affect named
entities such as numbers and names that are vital in
understanding the legal argument in the transcribed cases.
These errors can lead to serious information loss and
cause confusion.</p>
      <p>
        In this paper, we describe a joint research and commercial
effort to perform domain adaptation of a generic
ASR system to mitigate the errors in automated UK
court transcription services. We propose to minimise
legal-specific errors by fine-tuning off-the-shelf ASR systems
with a custom language model (CLM) trained on
legal documents as well as 139 hours of human-edited
transcriptions of UK Supreme Court hearings. We also
employ NLP techniques to automatically build a custom
vocabulary of common multi-word expressions and word
n-gram collocations that are critical in court hearings.
We infuse our custom vocabulary into the CLM at transcription
time. In this research, we evaluate the benefits
of our proposed domain adaptation methods by comparing
the WER of the CLM output with two off-the-shelf
ASR systems: AWS Transcribe (commercial) and the OpenAI
Whisper model (open-source) [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. We also compare
the general improvement in the ASR system’s ability
to correctly transcribe legal entities with and without
adopting our proposed methods. In addition, we discuss
the transcription time with different ASR settings, since
transcription time is critical for the commercial pipeline
implemented by the industrial partner of the project.
      </p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
      <p>
        Automatic speech recognition (ASR) models convert
audio input to text, and they have optimal performance
when used to transcribe data which is similar to the one
they were trained on. However, performance degrades
when there is a mismatch between the data used for
training and the one that is being transcribed. Additionally,
some types of audio material are intrinsically harder for
speech recognition systems to transcribe. In practice,
this means that speech recognition system performance
degrades when, for example, there is background noise
[
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], non-native accents [
        <xref ref-type="bibr" rid="ref8 ref9">8, 9</xref>
        ], young or elderly speakers
[
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], or a shift in domain [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ].
      </p>
      <p>
        Performance degradation is typically mitigated by
adapting or fine-tuning ASR models towards the domain
of the targeted data by using a domain-specific dataset
[
        <xref ref-type="bibr" rid="ref11 ref12 ref13">11, 12, 13</xref>
        ]. Some methods for domain adaptation adopt
NLP techniques such as using machine translation models
to learn a mapping from out-of-domain ASR errors to
in-domain terms [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ]. An alternative approach is to build
a large ASR model with a substantially varied training
set, so that the model is more robust to data shifts. An
example of this latter approach is the recently released
OpenAI Whisper model, which is trained on 680k hours
of diverse domain data to generalise well on a range of
unseen datasets without the need for explicit adaptation
[
        <xref ref-type="bibr" rid="ref6">6</xref>
        ].
      </p>
      <p>
        Moreover, ASR models are evaluated using Word Error
Rate (WER), which treats each incorrect word equally.
However, ASR models do not perform equally on different
categories of words. Performance is worse for categories
like names of people and organisations as compared to
categories like numbers or dates [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ]. ASR research has targeted
improving specific errors such as different named
entities using NLP techniques [16, 17].
      </p>
      <p>In this paper, we propose simple techniques to mitigate
the effect of the domain mismatch between a generic
ASR model and the specialised domain of British court
room hearings. Our proposed method improves both
the system’s WER and its ability to capture
case-specific terms and entities. In the next section, we
present the setup of our experiments and the evaluation
results.
3 https://www.supremecourt.uk/decided-cases/
4 https://research.iclr.co.uk/blackstone</p>
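      <p>The WER metric discussed above is the word-level edit distance (substitutions, insertions and deletions) between a reference transcript and an ASR hypothesis, normalised by the reference length. A minimal from-scratch sketch in Python (illustrative only, not the evaluation code used in the project; the example strings echo Table 1):</p>

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: word-level Levenshtein distance / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edits needed to turn the first i reference words
    # into the first j hypothesis words
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i  # deletions
    for j in range(len(hyp) + 1):
        dp[0][j] = j  # insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = dp[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            dp[i][j] = min(sub, dp[i - 1][j] + 1, dp[i][j - 1] + 1)
    return dp[len(ref)][len(hyp)] / max(len(ref), 1)

# Table 1 example: "financial order" misheard as "five natural"
print(wer("it makes further financial order",
          "it makes further five natural"))  # 2 substitutions / 5 words = 0.4
```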
      <p>The second method we employ to create a list of
custom vocabulary is to identify named entities in our
dataset. For this purpose, we use Blackstone4, an NLP
library for processing long-form and unstructured legal
text capable of identifying legal entities. The list of legal
entities includes: Case Name, Court Name, Provision (i.e.
a clause in a legal instrument), Instrument (i.e. a legal
term of art) and Judge. We concatenated this Blackstone
entity list with the spaCy v3.4 library list of non-legal
entities such as: Cardinals, Persons and Dates. The results
of applying our domain-adaptation methods to the
transcription of 2 Supreme Court case hearings consisting of
12 hours are explained in the next section.</p>
    </sec>
    <sec id="sec-results">
      <title>4. Results</title>
      <p>Table 2 shows the WER scores and the average WER score
for the 2 transcribed cases with different CLM system
settings, as well as for the two baseline systems:
AWS Transcribe (AWS base) and Whisper. The different
CLM settings are as follows:
1. CLM1 is trained on only the texts of the Supreme Court judgements.
2. CLM2 is trained on both the judgements and the gold-standard transcripts.
3. CLM2+Vocab uses CLM2 for transcription plus the global vocabulary list extracted by our phrase detection model.
4. CLM2+Vocab2 uses CLM2 for transcription plus the legal entities vocabulary list extracted by the Blackstone and spaCy v3.4 libraries.</p>
      <p>As can be seen in Table 2, the ASR performance is
consistently better with the CLM models than with the
generic ASR systems for the two transcribed cases. The CLM2
model, trained on textual data (i.e. the written judgements)
and gold-standard court hearing transcriptions,
outperforms AWS base and Whisper with a 9% and 8%
WER improvement, respectively. Moreover, we observe
around 9% improvement in average WER score over the
two generic models when concatenating the list of legal
phrases extracted by our phrase detection model
with the CLM2 system. While ASR error correction
indicates an improved transcription quality with our
proposed domain adaptation methods, we also evaluated the
ASR systems’ performance on specific errors such as
legal entities and terms.</p>
      <p>Table 3 shows the average ratio of correctly transcribed
legal entities in the two studied court room hearings.
We compare the performance of CLM2 infused with the
legal terms list (CLM2+Vocab) to the two generic ASR
systems. The ratios in Table 3 indicate that CLM2+Vocab
is generally more capable of transcribing legal-specific
terms than the other two models. It is also better at
transcribing critical legal entities such as Provisions.5
Such legal terminology needs to be accurately transcribed,
and our CLM2 model with legal vocabulary demonstrates
better reliability in transcribing these terms.
A similar trend is evident with the legal entity Judge,
which refers to the forms of address used in British court
rooms (e.g. ‘Lord Phillips’, ‘Lady Hale’). This entity is
typically repeated in court hearings whenever a
barrister or solicitor addresses the court. We see that both
generic ASR systems perform badly on this category, with
ratios of 0.66 and 0.69, respectively. On the other hand,
we observe a significant improvement in correctly
transcribing this type of entities by CLM2+Vocab, with a
ratio of 0.84 correct transcriptions. Appendix A shows
an example of the output of the AWS base ASR model
without our domain-adaptation methods compared to
the output of the CLM correcting the mistakes. The
transcription errors (highlighted yellow) in the base output
include legal jargon, legal terms and named entities. The
errors are corrected by our CLM model (corrections are
highlighted in blue).
5 A Provision, a statement within an agreement or a law, typically
consists of alphanumeric utterances in British court hearings (e.g.
‘section 25(2)(a)-(h)’ or ‘rule 3.17’).</p>
      <p>In addition to evaluating the output of the ASR
engines, we also recorded the time required to produce the
transcription. The models based on AWS were run in the
cloud using the Amazon infrastructure. Whisper was run
on a Linux desktop with an NVIDIA GeForce RTX 2070
GPU with 8GB VRAM. For all the experiments, the medium
English-only model was used. As expected, the fastest
running time is obtained using the AWS base model. Running
the best performing model increases the time by 155%,
whilst Whisper more than doubles it. The trade-off between
running time and the level of domain-specific accuracy
is a variable parameter that can be determined based on
the transcription purpose and the end-user needs defined
by our project’s commercial partner.</p>
    </sec>
    <sec id="sec-3">
      <title>5. Conclusion</title>
      <p>In this paper, we present a study which shows the effect of
domain adaptation methods on improving off-the-shelf
ASR system performance in transcribing a specialised
domain such as British court hearings. We optimised the
performance of the ASR system by training an ASR
custom language model on gold-standard legal transcripts
and textual data from the legal domain. We also trained
a phrase detection model to incorporate an extracted list of
data-specific bigram collocations at transcription time.
We evaluated the ASR quality improvements both in
terms of average WER and the ratio of correctly transcribed
legal-specific terms. We observe significant gains in
ASR transcription quality from our domain adaptation
techniques. For commercial use of ASR technologies,
improving the error rate in general, and the transcription quality of
critical legal terms in particular, would minimise manual
post-editing effort and hence save both time and money.
We plan to evaluate the impact of the different configurations
proposed in this paper on the editors’ post-editing effort.</p>
      <p>In the future, we will expand to record data from a
variety of accents to address another axis of degradation
in British audio procedures different from the Supreme
Court hearings, which feature a mostly homogeneous group
of speakers. We will also explore the ability to use NLP
topic modelling techniques to connect legal entities that
were crucial in a court’s case decision.</p>
    </sec>
    <sec id="sec-4">
      <title>A. Appendix: Examples of ASR output with and without domain-adaptation</title>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>E.</given-names>
            <surname>Elwany</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Moore</surname>
          </string-name>
          , G. Oberoi,
          <article-title>BERT goes to law school: Quantifying the competitive advantage of access to large legal corpora in contract understanding</article-title>
          , arXiv preprint arXiv:
          <year>1911</year>
          .
          <volume>00473</volume>
          (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>J. J.</given-names>
            <surname>Nay</surname>
          </string-name>
          ,
          <source>Natural Language Processing for Legal Texts, DOI=10.1017/9781316529683</source>
          .011, Cambridge University Press,
          <year>2021</year>
          , p.
          <fpage>99</fpage>
          -
          <lpage>113</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>E.</given-names>
            <surname>Mumcuoğlu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. E.</given-names>
            <surname>Öztürk</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H. M.</given-names>
            <surname>Ozaktas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Koç</surname>
          </string-name>
          ,
          <article-title>Natural language processing in law: Prediction of outcomes in the higher courts of turkey</article-title>
          ,
          <source>Information Processing &amp; Management</source>
          <volume>58</volume>
          (
          <year>2021</year>
          )
          <fpage>102684</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>J.</given-names>
            <surname>Frankenreiter</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Nyarko</surname>
          </string-name>
          ,
          <article-title>Natural language processing in legal tech, Legal Tech and the Future of Civil Justice (David Engstrom ed</article-title>
          .) (
          <year>2022</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>G.</given-names>
            <surname>Sturge</surname>
          </string-name>
          , Court statistics for England and Wales,
          <source>Technical Report, House of Commons Library</source>
          ,
          <year>2021</year>
          . URL: https://commonslibrary.parliament.uk/research-briefings/cbp-8372/.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>A.</given-names>
            <surname>Radford</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. W.</given-names>
            <surname>Kim</surname>
          </string-name>
          , T. Xu,
          <string-name>
            <given-names>G.</given-names>
            <surname>Brockman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>McLeavey</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Sutskever</surname>
          </string-name>
          ,
          <article-title>Robust Speech Recognition via Large-Scale Weak Supervision</article-title>
          ,
          <source>OpenAI</source>
          (
          <year>2022</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>S.</given-names>
            <surname>Watanabe</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Mandel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Barker</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Vincent</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Arora</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Chang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Khudanpur</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Manohar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Povey</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Raj</surname>
          </string-name>
          , et al.,
          <article-title>CHiME-6 Challenge: Tackling multispeaker speech recognition for unsegmented recordings</article-title>
          ,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>S.</given-names>
            <surname>Feng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Kudina</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B. M.</given-names>
            <surname>Halpern</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Scharenborg</surname>
          </string-name>
          ,
          <article-title>Quantifying bias in automatic speech recognition</article-title>
          ,
          <source>arXiv preprint arXiv:2103.15122</source>
          (
          <year>2021</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <article-title>Mitigating bias against non-native accents</article-title>
          , Delft University of Technology (
          <year>2022</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>L.</given-names>
            <surname>Mai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Carson-Berndsen</surname>
          </string-name>
          ,
          <article-title>Unsupervised domain adaptation for speech recognition with unsupervised error correction</article-title>
          ,
          <source>Proc. Interspeech</source>
          <year>2022</year>
          (
          <year>2022</year>
          )
          <fpage>5120</fpage>
          -
          <lpage>5124</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Huo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Hwang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K. C.</given-names>
            <surname>Sim</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Garg</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Misra</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Siddhartha</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Strohman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Beaufays</surname>
          </string-name>
          ,
          <article-title>Incremental layer-wise self-supervised learning for efficient speech domain adaptation on device</article-title>
          ,
          <source>arXiv preprint arXiv:2110.00155</source>
          (
          <year>2021</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>H.</given-names>
            <surname>Sato</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Komori</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Mishima</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Kawai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Mochizuki</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Sato</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Ogawa</surname>
          </string-name>
          ,
          <article-title>Text-Only Domain Adaptation Based on Intermediate CTC</article-title>
          ,
          <source>Proc. Interspeech</source>
          <year>2022</year>
          (
          <year>2022</year>
          )
          <fpage>2208</fpage>
          -
          <lpage>2212</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>S.</given-names>
            <surname>Dingliwa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Shenoy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Bodapati</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Gandhe</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. T.</given-names>
            <surname>Gadde</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Kirchhoff</surname>
          </string-name>
          , Domain prompts:
          <article-title>Towards memory and compute efficient domain adaptation of ASR systems</article-title>
          , https://tinyurl.com/2a9jp88t,
          <year>2022</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>A.</given-names>
            <surname>Mani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Palaskar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N. V.</given-names>
            <surname>Meripo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Konam</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Metze</surname>
          </string-name>
          ,
          <article-title>Asr error correction and domain adaptation using machine translation</article-title>
          ,
          <source>in: ICASSP 2020- 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)</source>
          , IEEE,
          <year>2020</year>
          , pp.
          <fpage>6344</fpage>
          -
          <lpage>6348</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>M.</given-names>
            <surname>Del Rio</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Delworth</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Westerman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Huang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Bhandari</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Palakapilly</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>McNamara</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Dong</surname>
          </string-name>
          , P. Zelasko, M. Jetté,
          <article-title>Earnings-21: a practical benchmark for ASR in the wild</article-title>
          ,
          <source>arXiv preprint arXiv:2104.11348</source>
          (
          <year>2021</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>[16] H. Wang, S. Dong, Y. Liu, J. Logan, A. K. Agrawal, Y. Liu, ASR Error Correction with Augmented Transformer for Entity Retrieval, in: Interspeech, 2020, pp. 1550-1554.</mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>[17] N. Das, D. H. Chau, M. Sunkara, S. Bodapati, D. Bekal, K. Kirchhoff, Listen, Know and Spell: Knowledge-Infused Subword Modeling for Improving ASR Performance of OOV Named Entities, in: ICASSP 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), IEEE, 2022, pp. 7887-7891.</mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>[18] A. Graves, Sequence transduction with recurrent neural networks, arXiv preprint arXiv:1211.3711 (2012).</mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>[19] J. Guo, G. Tiwari, J. Droppo, M. Van Segbroeck, C.-W. Huang, A. Stolcke, R. Maas, Efficient minimum word error rate training of RNN-transducer for end-to-end speech recognition, arXiv preprint arXiv:2007.13802 (2020).</mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>[20] G. Bouma, Normalized (pointwise) mutual information in collocation extraction, Proceedings of GSCL 30 (2009) 31-40.</mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>[21] R. Řehůřek, P. Sojka, Software Framework for Topic Modelling with Large Corpora, in: Proceedings of the LREC 2010 Workshop on New Challenges for NLP Frameworks, ELRA, Valletta, Malta, 2010, pp. 45-50. http://is.muni.cz/publication/884893/en.</mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>