<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Compiling a Large Swiss German Dialect Corpus</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Manuela Weibel</string-name>
          <email>manuela.weibel@idiotikon.ch</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Muriel Peter</string-name>
          <email>muriel.peter@idiotikon.ch</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Schweizerisches Idiotikon</institution>
          ,
          <addr-line>Auf der Mauer 5, 8001 Zürich</addr-line>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Schweizerisches Idiotikon</institution>
          ,
          <addr-line>Auf der Mauer 5, 8001 Zürich</addr-line>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2020</year>
      </pub-date>
      <volume>3</volume>
      <abstract>
        <p>The Swiss German Dialect Corpus (Schweizer Mundartkorpus CHMK) is an initiative launched by the Swiss German dictionary Schweizerisches Idiotikon. It is an unbalanced, opportunistic corpus and the largest dialect corpus for Swiss German to date. The corpus will be accessible through a query engine and, in part, as an open-source XML corpus. In this paper we provide an overview of the concept, workflow, and challenges of compiling a corpus for a non-standard linguistic variety.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1 Introduction</title>
      <sec id="sec-1-2">
        <title>1.1 The language situation in German-speaking Switzerland</title>
        <p>
          Switzerland has four official languages (German,
French, Italian, and Romansh) with German
being the most widely spoken (more than 60
percent<sup>1</sup>). Within German-speaking Switzerland, two
varieties of the same language co-exist and are
used in separate and distinct situations: Swiss
German and standard German. Such a situation is
widely referred to as diglossia
          <xref ref-type="bibr" rid="ref7 ref2 ref3">(cf. Ferguson,
1959; Rash, 1998; Christen, 2019; Christen and
Schmidlin, 2019)</xref>
          . One variety, in this case Swiss
German, is commonly used for everyday, mostly
spoken communication. It is usually not codified
and is associated with informal situations. The
second variety, standard German, is highly
codified and used in formal settings such as school,
political debates, information programmes on
national radio or television as well as for written texts
          <xref ref-type="bibr" rid="ref3">(Christen and Schmidlin, 2019, p. 208)</xref>
          .
        </p>
        <p>
          The Swiss German dialects are part of the
Alemannic dialect group and form a continuum: there
are no clear-cut boundaries between the
different dialects; some phenomena occur in more than
one dialect, while others are unique. Moreover,
due to the small-scale nature of the linguistic
areas, a lot of variation occurs within Swiss
German dialects. And even though Swiss German
is increasingly being used for written
communication, there is no standardised orthography.
Attempts to introduce a standard for writing Swiss
German dialects have had varying degrees of
success
          <xref ref-type="bibr" rid="ref13 ref3">(cf. Rash, 1998; Siebenhaar, 2013;
Christen and Schmidlin, 2019)</xref>
          . One such attempt is
the “Schwyzertütschi Dialäktschrift”, introduced
by
          <xref ref-type="bibr" rid="ref4">Eugen Dieth in 1938</xref>
          and updated by
Christian Schmid-Cadalbert in 1986. It applies to all
Swiss German dialects and is widely used by
linguists and dialectologists today
          <xref ref-type="bibr" rid="ref12">(cf. Scherrer et al.,
2019)</xref>
          . However, as none of the standards have
ever been taught at school, they have not been
implemented by a significant number of people.
There is no political intent for establishing a
standard Swiss German, nor is there a need for it, as
Swiss German speakers may make use of
standard German in order to be understood in other
German-speaking countries. These three factors
– the dialect continuum, the small-scale linguistic
landscape, and the lack of a standardised
orthography – result in a large degree of variation in written
Swiss German
          <xref ref-type="bibr" rid="ref2">(cf. Christen, 2019)</xref>
          .
        </p>
        <p>
          German-speaking Switzerland has a rich
tradition of dialect literature. Beyond that, the use
of Swiss German in written communication was limited
in the past. However, with the development of text message
services and social media, the number of texts
written in Swiss German has increased
significantly over the last 20 years
          <xref ref-type="bibr" rid="ref11 ref3">(Samardžić et al.,
2015; Christen and Schmidlin, 2019)</xref>
          .
        </p>
      </sec>
      <sec id="sec-1-3">
        <title>1.2 Intentions of the project</title>
        <p>Possibilities concerning natural language
processing (NLP) for Swiss German have long been
restricted due to an insufficient number of
available texts. With the rise of Swiss German text
resources, Switzerland’s language technology
research has recently shifted its focus: first efforts
towards building dialect corpora and developing
NLP tools for Swiss German have been made
since 2009 (see subsection 2.1 for a more detailed
view on related research). With the Swiss
German Dialect Corpus (Schweizer Mundartkorpus
CHMK), we intend to further facilitate this
development by building the largest integral corpus
of Swiss German dialects so far. The
Schweizerisches Idiotikon is able to provide the required
human and material resources for the compilation
of such a corpus – most importantly, an extensive
library of dialect literature.</p>
        <p>As a research institution that is dedicated to
the documentation of Swiss German dialects, the
Schweizerisches Idiotikon intends to provide a
platform for existing and future Swiss German
dialect corpora.</p>
      </sec>
    </sec>
    <sec id="sec-2">
      <title>2 Related Work</title>
      <sec id="sec-2-1">
        <title>2.1 Language technology for Swiss German</title>
        <p>
          The Swiss SMS corpus compiled by Dürscheid and
Stark (2011) features nearly 26,000 short
messages in different languages and dialects with
more than 40 percent of the messages written
in Swiss German. The corpus is part-of-speech
tagged. Normalisation<sup>2</sup> was conducted by means
of interlinear glossing with a specifically
developed annotation tool
          <xref ref-type="bibr" rid="ref9">(Ruef and Ueberwasser,
2013)</xref>
          and following a continuously updated set of
annotation guidelines
          <xref ref-type="bibr" rid="ref14">(cf. Ueberwasser, 2013)</xref>
          .
        </p>
        <p>NOAH’s corpus by Hollenstein and Aepli
(2014) is a comparatively small collection of
manually part-of-speech tagged Swiss German texts.
Hollenstein and Aepli adapted the
Stuttgart-Tübingen-Tagset (STTS), a part-of-speech tagset
widely applied in NLP for standard German,
accounting for the morphosyntactic particularities of
Swiss German dialects.</p>
        <p><sup>2</sup>Word normalisation describes the process of identifying
multiple forms of a word and assigning a single normal form.</p>
        <p>The ArchiMob corpus by Samardžić et al.
(2016) provides transcriptions of spoken Swiss
German. Samardžić et al. (2015) conducted a
range of experiments in order to automate
annotation steps: they examined different methods based
on machine translation in order to normalise the
corpus. Furthermore, part-of-speech tagging was
applied semi-automatically, based on the tagset
established by Hollenstein and Aepli (2014).</p>
        <p>For the project What’s up, Switzerland?,
Ueberwasser and Stark (2017) collected more than one
million WhatsApp messages. Nearly half of the
messages are written in Swiss German. The
corpus is due for publication in spring 2020.</p>
        <p>It is our goal to integrate existing corpora into
the CHMK wherever licences are compatible, and
we plan to use the part-of-speech tagset
provided by Hollenstein and Aepli (2014).</p>
      </sec>
      <sec id="sec-2-2">
        <title>2.2 Schweizer Textkorpus CHTK</title>
        <p>The Swiss Text Corpus CHTK is a balanced
reference corpus of the written standard language in
German-speaking Switzerland in the 20th and 21st
centuries. It was established in 2000 at the
University of Basel, with the aim of assembling a
corpus of standard German texts from Switzerland
and ensuring its accessibility and continuation. In
2014, the corpus was transferred to the
Schweizerisches Idiotikon, where it has been maintained
ever since. Various insights gained from this
corpus can be applied to the CHMK. A short overview
of the CHTK will contrast its selection criteria
with those we have defined for the new dialect
corpus (see subsection 3.2).</p>
        <p>The texts in the CHTK were chosen based on
different criteria regarding form, content, and time
of publication. Hard criteria, which had to be
met, and soft criteria were established. Apart from
the language in which they were written
(standard German) and the time of publication,
further hard criteria included form and content of the
texts. Four categories of work were defined, which
had to be represented evenly: fiction, non-fiction,
functional texts, and journalistic texts.</p>
        <p>Due to the limited number of texts available for
the 20th century, soft criteria such as the author’s
regional origin within Switzerland and their
gender could not always be taken into consideration.
For further information on the CHTK see Bickel
et al. (2009).</p>
        <p>[Table 1. Rows: genre (prose, poetry, drama, mixed, total); surviving column header: books.]</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3 Swiss German Dialect Corpus CHMK</title>
      <sec id="sec-3-1">
        <title>3.1 Application</title>
        <p>The Swiss German Dialect Corpus will enable a range
of practical applications. A corpus query engine
will provide an interface for linguistic research.
The engine will build upon the knowledge and
expertise gained when working on the CHTK<sup>3</sup>. In
addition, the corpus will serve as a base for the
lexicographic work on the Swiss German dictionary
Schweizerisches Idiotikon. Finally, all
copyright-free texts of the corpus will be made fully
accessible in XML format and under an open-source
licence. This way, we intend to provide a tool for
the enhancement of language technology research
for Swiss German dialects.
        </p>
      </sec>
      <sec id="sec-3-2">
        <title>3.2 Selection criteria</title>
        <p>In contrast to the previously compiled reference
corpus CHTK, we have decided that the new
dialect corpus will be an unbalanced corpus of Swiss
German texts. This decision is mainly based on
the fact that an equal distribution of text genres and
dialects is difficult to ensure for a low-resource
language such as Swiss German, and would pose
an unnecessary constraint on the number of
eligible texts per criterion.</p>
        <p>There has been a recent increase in the number of available
texts written in Swiss German, one important
source for non-fictional Swiss German texts
being the Alemannic Wikipedia<sup>4</sup>. Nevertheless, the
list of represented text genres shows a
considerable bias towards fiction and poetry: technical
texts such as medical essays or instruction
manuals written in Swiss German are difficult to obtain.</p>
        <p><sup>3</sup>The CHTK corpus query tool is based on the open
source search engine ddc-concordance
(http://www.ddcconcordance.org/).</p>
        <p><sup>4</sup>https://als.wikipedia.org.</p>
        <p>Due to a lack of availability, we also have to
abstain from weighting criteria concerning authors’
gender and dialect distribution. As a result, the
CHMK selection criteria forgo any weighting and
are reduced to the following:</p>
        <list list-type="order">
          <list-item><p>The texts need to be written in Swiss German.</p></list-item>
          <list-item><p>The texts need to be from 1800 AD or later.</p></list-item>
          <list-item><p>The dialect must be clearly identifiable, i.e. the canton or region must be known.</p></list-item>
        </list>
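        <p>The three criteria can be read as a plain filter over text metadata. The following Python sketch is only an illustration under stated assumptions: the record type TextRecord, its field names and the sample records are hypothetical and do not reflect the actual CHMK metadata schema. The language codes "gsw" and "deu" are the ISO 639-3 codes for Swiss German and German.</p>
        <preformat>
```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TextRecord:
    """Hypothetical metadata record; all field names are assumptions."""
    title: str
    language: str            # ISO 639-3 code, e.g. "gsw" for Swiss German
    year: int                # year of publication
    region: Optional[str]    # canton or region of the dialect, None if unknown


def is_eligible(record: TextRecord) -> bool:
    """Apply the three selection criteria; no weighting is involved."""
    written_in_swiss_german = record.language == "gsw"
    from_1800_or_later = record.year >= 1800
    dialect_identifiable = bool(record.region)
    return (written_in_swiss_german
            and from_1800_or_later
            and dialect_identifiable)


# Made-up sample records, one per failure mode plus one eligible text.
records = [
    TextRecord("Example A", "gsw", 1912, "Bern"),
    TextRecord("Example B", "gsw", 1790, "Zürich"),   # published too early
    TextRecord("Example C", "deu", 1950, "Basel"),    # standard German
    TextRecord("Example D", "gsw", 2005, None),       # dialect unknown
]
eligible = [r.title for r in records if is_eligible(r)]
print(eligible)
```
        </preformat>
        <p>Because every criterion is a yes/no check, a text is either eligible or not; nothing in this filter balances genres, dialects or authors' gender.</p>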
        <p>While the limited number of criteria ensures a
larger number of eligible texts, it may
simultaneously lead to an overrepresentation of certain
dialects and an unbalanced gender distribution. It is
therefore important that users of our corpus take
into account that we do not intend to represent the
linguistic reality of German-speaking Switzerland
but to gather and provide as much data as possible.
        </p>
      </sec>
      <sec id="sec-3-3">
        <title>3.3 Processing</title>
        <p>The Swiss German Dialect Corpus will potentially
serve many different purposes. Thus, it is crucial
that we meticulously document the metadata of
the gathered sources; author biographies, text
categorisation and dialect information will later
enhance a wide range of linguistic analyses.
Detailed metadata also allow for more specific
machine learning training, e.g. when it comes to
training a language model for a certain dialect. Last but
not least, well-documented metadata will simplify
the creation of copyright-dependent and other
subcorpora.</p>
        <p>So far, we have collected and scanned over 600
books, adding up to over 90,000 pages. Since
optical character recognition (OCR) has not yet been
performed, we can only estimate the actual
number of tokens. Taking into consideration different
average word counts depending on the text genre,
we expect our scanned texts to contain a total of
over 27 million tokens (see Table 1).</p>
        <p>When estimating the potential number of Swiss
German tokens, we deduct 10 pages per book,
accounting for illustrations, titles and index pages, as
well as other sections generally written in standard
German (introduction, epilogue, author biography,
bibliography). This results in a total of
approximately 25.3 million Swiss German tokens.</p>
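        <p>The arithmetic behind this deduction can be sketched as follows. The sketch uses the rounded figures from the text (600 books, 90,000 pages, 27 million tokens) and, as a simplifying assumption, one flat average of tokens per page instead of the genre-specific averages used for the actual estimate. The function and constant names are hypothetical.</p>
        <preformat>
```python
# Rounded figures taken from the text; the exact counts are larger ("over ...").
BOOKS = 600
PAGES = 90_000
ESTIMATED_TOKENS = 27_000_000
DEDUCTED_PAGES_PER_BOOK = 10

# Assumption: one flat average instead of genre-specific word counts.
tokens_per_page = ESTIMATED_TOKENS / PAGES


def swiss_german_tokens(books: int, pages: int, per_page: float,
                        deducted_per_book: int) -> float:
    """Estimate dialect tokens after removing non-dialect pages per book."""
    dialect_pages = pages - books * deducted_per_book
    return dialect_pages * per_page


estimate = swiss_german_tokens(BOOKS, PAGES, tokens_per_page,
                               DEDUCTED_PAGES_PER_BOOK)
print(f"{tokens_per_page:.0f} tokens per page")
print(f"{estimate / 1e6:.1f} million Swiss German tokens")
```
        </preformat>
        <p>With these rounded inputs the flat average is 300 tokens per page and the result is about 25.2 million tokens, close to the figure of approximately 25.3 million given above. A small difference is to be expected, since the estimate in the text relies on genre-specific averages and on the exact book and page counts.</p>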
        <p>In order to acquire more recent dialect
literature, we have downloaded 900 individual
webpages, counting approximately 500,000 tokens in
total.</p>
        <p>Before any further texts are gathered, the data
obtained so far will be processed and linguistically
annotated. This will allow for an earlier
publication of a first release. Moreover, repeating the
workflow for further releases will lead to an
additional refinement of the process.
        </p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4 Prospect</title>
      <p>
        In a next step, the scanned texts will be
converted into digital text using OCR. For this
purpose, we have teamed up with a group of
developers in Würzburg (Germany), using their
system OCR4all<sup>5</sup>. OCR4all is open-source
software that combines state-of-the-art OCR
components and continuous model training into a full
workflow. Primarily developed to analyse
historical printings, the system not only performs well
on modern fonts, it also outperforms commercial
state-of-the-art tools when applied to 19th-century
Fraktur scripts
        <xref ref-type="bibr" rid="ref8">(cf. Reul et al., 2019)</xref>
        . Given that
almost a sixth of our scanned books are, in fact,
printed in Fraktur typeface, this is an important
asset for our project. An included module allows for
the correction of errors and training of new
recognition models. Tests will be carried out to evaluate
the cross-dialectal range of a trained model and
assess the need for a dialect classifier.
      </p>
      <p>Given the relatively large size of our corpus,
it is important that we automate as many of the
processing steps as possible. For this reason, we
intend to apply automated methods for
part-of-speech tagging and normalisation as assessed by
Samardžić et al. (2015).</p>
      <p><sup>5</sup>https://www.uni-wuerzburg.de/zpd/ocr4all.</p>
      <p>The absence of a writing standard for Swiss
German, paired with large lexical and
phonological differences across German-speaking
Switzerland, results in substantial orthographic
inconsistencies. These can be observed not only on an
inter-dialectal level or between different writers,
but even on an intra-writer level. When providing
a corpus query engine for non-standard linguistic
varieties such as Swiss German, it is therefore
crucial that the data is normalised. Both the ArchiMob
corpus and the Swiss SMS corpus have developed
their own normalisation guidelines. In an effort to
harmonise existing and future corpora, we plan to
establish a normalisation standard for Swiss
German and apply it to our corpus. Where licences are
compatible, we will also apply it to existing
corpora and integrate these into our first release, thus
turning the Swiss German Dialect Corpus into a
central platform for all dialect corpora for Swiss
German.</p>
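      <p>Why normalisation matters for a query engine can be shown with a small Python sketch. It swaps in a plain dictionary lookup for simplicity and is not the machine-translation-based approach referred to above. The lookup table, the choice of standard German lemmas as normal forms, the function names and the sample sentence are all assumptions made for illustration only.</p>
      <preformat>
```python
from collections import defaultdict

# Hypothetical, hand-made lookup table: spelling variant -> normal form.
# Using standard German lemmas as normal forms is an assumption of this sketch.
NORMAL_FORMS = {
    "nöd": "nicht", "nid": "nicht", "ned": "nicht", "nit": "nicht",
    "gsi": "gewesen", "gsy": "gewesen", "gsii": "gewesen",
}


def normalise(token: str) -> str:
    """Return the normal form of a token, or the lower-cased token itself
    when no mapping is known."""
    key = token.lower()
    return NORMAL_FORMS.get(key, key)


def build_index(tokens):
    """Map each normal form to the positions where any of its variants occurs."""
    index = defaultdict(list)
    for position, token in enumerate(tokens):
        index[normalise(token)].append(position)
    return index


# Made-up token sequence mixing two spellings of the same word.
tokens = ["Ich", "bi", "nöd", "dete", "gsi", "und", "er", "isch", "nid", "cho"]
index = build_index(tokens)
print(index["nicht"])    # positions of all variants sharing one normal form
```
      </preformat>
      <p>A query for one normal form then retrieves every spelling variant mapped to it, which a search over the raw, unnormalised tokens would miss.</p>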
      <p>Nora Hollenstein and Noëmi Aepli. 2014.
Compilation of a Swiss German Dialect Corpus and its
Application to PoS Tagging. In Proceedings of the
First Workshop on Applying NLP Tools to Similar
Languages, Varieties and Dialects, pages 85–94.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          <string-name>
            <given-names>Hans</given-names>
            <surname>Bickel</surname>
          </string-name>
          , Markus Gasser, Annelies Häcki Buhofer, Lorenz Hofer, and Christoph Schön.
          <year>2009</year>
          .
          <article-title>Schweizer Text Korpus - Theoretische Grundlagen, Korpusdesign und Abfragemöglichkeiten</article-title>
          .
          <source>Linguistik Online</source>
          ,
          <volume>39</volume>
          (
          <issue>3</issue>
          ):
          <fpage>5</fpage>
          -
          <lpage>31</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          <string-name>
            <given-names>Helen</given-names>
            <surname>Christen</surname>
          </string-name>
          .
          <year>2019</year>
          .
          <article-title>Alemannisch in der Schweiz</article-title>
          .
          In Hanna Fischer and Brigitte Ganswindt, editors,
          <source>Deutsch: Sprache und Raum - Ein internationales Handbuch der Sprachvariation</source>
          , volume
          <volume>30</volume>
          of Handbücher zur Sprach- und Kommunikationswissenschaft
          , pages
          <fpage>246</fpage>
          -
          <lpage>279</lpage>
          . Walter de Gruyter GmbH &amp; Co KG.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          <string-name>
            <given-names>Helen</given-names>
            <surname>Christen</surname>
          </string-name>
          and
          <string-name>
            <given-names>Regula</given-names>
            <surname>Schmidlin</surname>
          </string-name>
          .
          <year>2019</year>
          .
          <article-title>Die Schweiz. Dialektvielfalt in mehrsprachigem Umfeld</article-title>
          . In Rahel Beyer and Albrecht Plewnia, editors,
          <source>Handbuch des Deutschen in West- und Mitteleuropa: Sprachminderheiten und Mehrsprachigkeitskonstellationen</source>
          , pages
          <fpage>193</fpage>
          -
          <lpage>244</lpage>
          . Narr Francke Attempto Verlag.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          <string-name>
            <given-names>Eugen</given-names>
            <surname>Dieth</surname>
          </string-name>
          .
          <year>1938</year>
          .
          <article-title>Schwyzertütschi Dialäktschrift: Leitfaden nach den Beschlüssen der Schriftkommission der Neuen helvetischen Gesellschaft, Gruppe Zürich</article-title>
          . O. Füssli.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          <string-name>
            <given-names>Eugen</given-names>
            <surname>Dieth</surname>
          </string-name>
          and
          <string-name>
            <given-names>Christian</given-names>
            <surname>Schmid-Cadalbert</surname>
          </string-name>
          .
          <year>1986</year>
          .
          <article-title>Schwyzertütschi Dialäktschrift</article-title>
          . Sauerländer, Aarau,
          <volume>2</volume>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
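          <string-name>
            <given-names>Christa</given-names>
            <surname>Dürscheid</surname>
          </string-name>
          and
          <string-name>
            <given-names>Elisabeth</given-names>
            <surname>Stark</surname>
          </string-name>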
          .
          <year>2011</year>
          .
          <article-title>sms4science: An International Corpus-Based Texting Project and the Specific Challenges for Multilingual Switzerland</article-title>
          , pages
          <fpage>299</fpage>
          -
          <lpage>320</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          <string-name>
            <given-names>Charles A.</given-names>
            <surname>Ferguson</surname>
          </string-name>
          .
          <year>1959</year>
          .
          <article-title>Diglossia</article-title>
          .
          <source>WORD</source>
          ,
          <volume>15</volume>
          (
          <issue>2</issue>
          ):
          <fpage>325</fpage>
          -
          <lpage>340</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          <string-name>
            <given-names>Christian</given-names>
            <surname>Reul</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Dennis</given-names>
            <surname>Christ</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Alexander</given-names>
            <surname>Hartelt</surname>
          </string-name>
          , Nico Balbach, Maximilian Wehner, Uwe Springmann, Christoph Wick, Christine Grundig, Andreas Büttner, and Frank Puppe.
          <year>2019</year>
          .
          <article-title>OCR4all-An Open-Source Tool Providing a (Semi-)Automatic OCR Workflow for Historical Printings</article-title>
          .
          <source>Applied Sciences</source>
          ,
          <volume>9</volume>
          :
          <fpage>4853</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          <string-name>
            <given-names>Beni</given-names>
            <surname>Ruef</surname>
          </string-name>
          and
          <string-name>
            <given-names>Simone</given-names>
            <surname>Ueberwasser</surname>
          </string-name>
          .
          <year>2013</year>
          .
          <article-title>The Taming of a Dialect: Interlinear Glossing of Swiss German Text Messages</article-title>
          . In Marcos Zampieri and Sascha Diwersy, editors,
          <source>Non-standard Data Sources in Corpus-based Research</source>
          , ZSM-Studien,
          <article-title>Schriften des Zentrums Sprachenvielfalt und Mehrsprachigkeit der Universität zu Köln</article-title>
          , pages
          <fpage>61</fpage>
          -
          <lpage>68</lpage>
          . Shaker Verlag, Aachen.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          <string-name>
            <given-names>Tanja</given-names>
            <surname>Samardžić</surname>
          </string-name>
          , Yves Scherrer, and
          <string-name>
            <given-names>Elvira</given-names>
            <surname>Glaser</surname>
          </string-name>
          .
          <year>2016</year>
          .
          <article-title>ArchiMob - A Corpus of Spoken Swiss German</article-title>
          . In
          <source>Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016)</source>
          , pages
          <fpage>4061</fpage>
          -
          <lpage>4066</lpage>
          . s.n.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          <string-name>
            <given-names>Tanja</given-names>
            <surname>Samardžić</surname>
          </string-name>
          , Yves Scherrer, and
          <string-name>
            <given-names>Elvira</given-names>
            <surname>Glaser</surname>
          </string-name>
          .
          <year>2015</year>
          .
          <article-title>Normalising orthographic and dialectal variants for the automatic processing of Swiss German</article-title>
          .
          <source>Proceedings of the 7th Language and Technology Conference.</source>
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          <string-name>
            <given-names>Yves</given-names>
            <surname>Scherrer</surname>
          </string-name>
          , Tanja Samardžić, and
          <string-name>
            <given-names>Elvira</given-names>
            <surname>Glaser</surname>
          </string-name>
          .
          <year>2019</year>
          .
          <article-title>Digitising Swiss German: how to process and study a polycentric spoken language</article-title>
          .
          <source>Language Resources and Evaluation</source>
          ,
          <volume>53</volume>
          (
          <issue>4</issue>
          ):
          <fpage>735</fpage>
          -
          <lpage>769</lpage>
          . Publisher: Springer.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          <string-name>
            <given-names>Beat</given-names>
            <surname>Siebenhaar</surname>
          </string-name>
          .
          <year>2013</year>
          .
          <article-title>Sprachgeographische Aspekte der Morphologie und Verschriftung in schweizerdeutschen Chats</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          <string-name>
            <given-names>Simone</given-names>
            <surname>Ueberwasser</surname>
          </string-name>
          .
          <year>2013</year>
          .
          <article-title>Non-standard data in Swiss text messages with a special focus on dialectal forms</article-title>
          .
          In Marcos Zampieri and Sascha Diwersy, editors,
          <source>Non-standard Data Sources in Corpus-based Research</source>
          , ZSM-Studien,
          <article-title>Schriften des Zentrums Sprachenvielfalt und Mehrsprachigkeit der Universität zu Köln</article-title>
          , pages
          <fpage>7</fpage>
          -
          <lpage>24</lpage>
          . Shaker Verlag, Aachen.
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          <string-name>
            <given-names>Simone</given-names>
            <surname>Ueberwasser</surname>
          </string-name>
          and
          <string-name>
            <given-names>Elisabeth</given-names>
            <surname>Stark</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>What's up, Switzerland? A corpus-based research project in a multilingual country</article-title>
          .
          <source>Linguistik Online</source>
          ,
          <volume>84</volume>
          (
          <issue>5</issue>
          ):
          <fpage>105</fpage>
          -
          <lpage>126</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>