<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Lessons from a Massive Open Online Course (MOOC) on Natural Language Processing for Digital Humanities</article-title>
      </title-group>
      <contrib-group>
        <aff id="aff0">
          <label>0</label>
          <institution>Institute of Computational Linguistics, University of Zurich, Switzerland</institution>
        </aff>
      </contrib-group>
      <fpage>17</fpage>
      <lpage>22</lpage>
      <abstract>
        <p>In this paper, we present the concept and content of, and our experience with, an actively running Massive Open Online Course (MOOC) on Natural Language Processing for Digital Humanities. This video-based course is held in German, does not require any programming skills, and serves as an introduction to automatic text analysis. The target audience is anyone who is interested in applying basic language technology to text corpora. The course has a strong empirical focus on digital representations, tools and corpus linguistics. Its main goal is for learners to grasp the fundamental terminology and concepts of computational linguistics, to understand the main problems and solutions, and to know about the performance and limitations of current methods. The course also introduces manual annotation and data visualization.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        More and more scientific disciplines use automatic
text analysis in their digital scholarship. In the
humanities, we have literary and cultural studies (e.g.
popularized as “distant reading”
        <xref ref-type="bibr" rid="ref8">(Moretti, 2013)</xref>
        ,
“corpus based discourse analysis”
        <xref ref-type="bibr" rid="ref1 ref14">(Sinclair, 2004;
Bubenhofer, 2009)</xref>
        etc.), empirical corpus
linguistics and computational social sciences (including
automatic media monitoring
        <xref ref-type="bibr" rid="ref11">(Reamy, 2016)</xref>
        ), but
text mining is also popular in the natural sciences,
for instance in the bio-medical domain
        <xref ref-type="bibr" rid="ref3">(Cohen and
Hunter, 2008)</xref>
        .
      </p>
      <p>Being able to apply Natural Language
Processing (NLP) methods to texts requires special
knowledge and skills. The goal of this course is not to
teach these skills, but to didactically introduce
important concepts and techniques related to digital
text representation and analysis. Therefore,
programming experience is neither required for this
introductory course nor provided in it.</p>
      <p>According to Ubell (2017), more than 58
million people worldwide have signed up for Massive
Open Online Courses (MOOCs). This form
of distance learning in higher education has grown
popular over the last six years, and several
commercial and non-commercial platforms compete for
participants.</p>
      <p>Our free course is held on Coursera<sup>1</sup>, one of
the largest commercial platforms, which distributes
classes mostly held in English and created by
lecturers of top universities around the world. Our course
language is German<sup>2</sup>. On the one hand, this
excludes participants who do
not speak German; on the other hand, it allows
us to occupy a niche in language technology
focusing on German texts. A first session of the course
was run in summer 2015, and about 900 learners
visited the course at least once. Due to legal issues
between our university and Coursera,
the introduction of Coursera’s new platform<sup>3</sup>, and
the resulting course migration effort, it took two
years to start the next session of our course.</p>
      <p>The rest of this paper is organized as follows:
Section 2 introduces and motivates the syllabus of
our course, and Section 3 discusses our experience
from running the course twice so far.</p>
      <p><sup>1</sup>The MOOC can be accessed via this link:
https://www.coursera.org/learn/digital-humanities.</p>
      <p><sup>2</sup>All videos have German subtitles, which is especially
useful for users with a limited understanding of German. We
explicitly allow English contributions in the discussion forum
and peer assignments.</p>
      <p><sup>3</sup>Coursera now offers all courses more flexibly on demand
by restarting each course regularly at intervals of several
weeks. Learners can now easily switch from one instance
of a course to the next if they cannot complete within their
initial learner cohort. According to Saraf (2017), these cohorts
improve the completion rate compared to purely self-paced
learning and still offer more flexibility.</p>
    </sec>
    <sec id="sec-2">
      <title>Course Structure</title>
      <p>The course is designed to run over a period of 6
weeks, each of which has its own thematic focus.
Each thematic module consists of 30 to 90 minutes
of videos, which mostly use a fairly traditional
format in which a lecturer presents slides and explains
NLP methods in an accessible and illustrative way.
In addition, each module provides learner-oriented learning
objectives, more detailed readings on the
presented topics and further course material.
To test individual learning
progress, we integrated either a brief final quiz or
a peer assessment at the end of each module. The
course syllabus is structured as follows:</p>
      <sec id="sec-2-1">
        <title>Module 1 “Paths into the Digital World”</title>
        <p>This introductory module presents the fundamental
concepts and terminology regarding the digitization
of texts. We present techniques such as scanning
and OCR (Optical Character Recognition) as well
as other approaches for the acquisition of text
corpus material (including born-digital documents),
and we discuss potential problems related to
digitization and corpus design. Short interviews
with two experts from the Zurich central library
about digitization techniques and the relevance of
digitization complete the first module.</p>
        <p><bold>Module 2 “Structured and Effective
Representation of Corpus Data”</bold></p>
        <p>The second module provides an overview of
different encodings, the markup language XML and the
TEI P5 standard for text representation. The second
half of the module focuses on automatic
tokenization and sentence segmentation. Finally, in a
non-graded, hands-on discussion prompt, the learner
applies the acquired knowledge of XML
well-formedness to identify syntax errors
in an XML document.</p>
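        <p>The well-formedness exercise of Module 2 can also be reproduced programmatically. The following is a minimal sketch, not part of the course material: it assumes Python's standard-library XML parser, and the helper name check_well_formed and both sample strings are made up for illustration.</p>
        <preformat preformat-type="code"><![CDATA[
```python
import xml.etree.ElementTree as ET


def check_well_formed(xml_text):
    """Return (True, None) if xml_text parses, else (False, error message)."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        return False, str(err)
    return True, None


# Two invented sample documents: the second one never closes its <s> element.
good = "<text><s>Der Berg ruft.</s></text>"
bad = "<text><s>Der Berg ruft.</text>"

ok_good, _ = check_well_formed(good)
ok_bad, message = check_well_formed(bad)
print(ok_good, ok_bad, message)
```
]]></preformat>
        <p>The parser only checks well-formedness here; validating a document against a schema such as TEI P5 would require additional tooling.</p>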
        <p><bold>Module 3 “Properties of Corpora and Basic
Methods for Analysis”</bold></p>
        <p>
          In this module, we present the basic concepts of
corpus linguistics such as term frequencies, n-grams,
collocations and methods for analyzing texts
according to Lemnitzer and Zinsmeister (2006). In
addition, we demonstrate the functionality of
various platforms and interfaces for corpus analysis and
show some hands-on corpus query examples. In the
last part of Module 3, we introduce the topic
“visual linguistics”
          <xref ref-type="bibr" rid="ref2">(Bubenhofer, 2016)</xref>
          together with
a variety of tools for displaying the properties of
texts in a creative, interactive and illustrative way.
        </p>
        <p><bold>Module 4 “Automatic Corpus Annotation Using
NLP Tools”</bold></p>
        <p>
          In this module, we introduce different automatic
corpus annotation methods, such as part-of-speech
tagging, lemmatization, stemming, parsing, Named
Entity Recognition, and Entity Linking
          <xref ref-type="bibr" rid="ref10">(Ratinov
and Roth, 2009)</xref>
          for automatic disambiguation.
Furthermore, we investigate potential problems and
sources of errors that can emerge while using such
automatic annotation tools and we offer approaches
to solving these issues.
        </p>
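        <p>The term frequencies and n-grams introduced in Module 3 can be illustrated with a few lines of code. This is a minimal sketch under stated assumptions: the tokenizer is deliberately naive (whitespace splitting plus punctuation stripping), the sample sentence is invented, and the function names are hypothetical rather than taken from any tool shown in the course.</p>
        <preformat preformat-type="code"><![CDATA[
```python
from collections import Counter


def tokenize(text):
    """Very naive tokenizer: lowercase, split on whitespace, strip punctuation."""
    tokens = (tok.strip(".,;:!?\"'()") for tok in text.lower().split())
    return [tok for tok in tokens if tok]


def ngrams(tokens, n):
    """Return the list of n-grams (as tuples) over a token sequence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


# Invented sample text, not taken from any corpus used in the course.
sample = "Der Berg ruft. Der Berg ist hoch, und der Weg ist weit."
tokens = tokenize(sample)
term_freq = Counter(tokens)
bigram_freq = Counter(ngrams(tokens, 2))

print(term_freq.most_common(3))
print(bigram_freq.most_common(1))
```
]]></preformat>
        <p>Real corpus platforms rely on proper tokenization and usually on lemmatization, so counts obtained this way are only a rough approximation.</p>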
        <p><bold>Module 5 “Manual Annotation and Evaluation
of Corpus Data”</bold></p>
        <p>The main topic of Module 5 is the efficient
combination of manual and automatic annotation
and the integration of machine learning methods
in the vein of Pustejovsky and Stubbs (2013).
We then present the most common metrics
for measuring the quality of NLP systems and
introduce the concept of inter-rater reliability. In the
second part of Module 5, we focus on the
possibilities and limitations of crowd-sourcing methods in
the digital humanities.</p>
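        <p>Inter-rater reliability, as introduced in Module 5, can be quantified in several ways. The sketch below uses Cohen's kappa for two annotators, which is one common choice; whether the course uses this particular coefficient is an assumption, and the label sequences are invented for illustration.</p>
        <preformat preformat-type="code"><![CDATA[
```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Observed agreement: share of items with identical labels.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected chance agreement from each annotator's label distribution.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[lab] * freq_b[lab] for lab in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (observed - expected) / (1 - expected)


# Invented annotations of eight tokens by two annotators.
annotator_1 = ["LOC", "LOC", "PER", "O", "O", "O", "PER", "LOC"]
annotator_2 = ["LOC", "O", "PER", "O", "O", "O", "PER", "O"]
kappa = cohen_kappa(annotator_1, annotator_2)
print(round(kappa, 3))
```
]]></preformat>
        <p>Kappa corrects the raw agreement (here 6 of 8 items) for the agreement expected by chance, which is why it is lower than the plain percentage.</p>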
        <p><bold>Module 6 “Challenges in Multilingual Text
Analysis”</bold></p>
        <p>The last module concentrates on multilingual and
parallel corpora as well as on automatic language
identification in large-scale text collections. Finally,
we introduce several up-to-date tools for automatic
alignment of parallel corpora on the level of
documents, sentences and words.</p>
        <sec id="sec-2-1-1">
          <title>Assessments</title>
          <p>In terms of graded assignments, we integrate short
single- and multiple-choice quizzes ranging from 5
to 12 questions at the end of each module<sup>4</sup>.
Module 3 and Module 5 additionally include a graded
peer assignment in which each learner is supposed to
assess at least two other submissions according to
detailed grading instructions. The peer assessment
in Module 3 encourages the learner to apply the
acquired knowledge to complex corpus queries:
each learner performs individual
queries on the IMS Open Corpus Workbench or the
COSMAS II interface regarding diachronic
language change. In addition, the learner is
supposed to generate frequency charts or collocation
profiles and to interpret the findings and insights
gained from this task.</p>
          <p><sup>4</sup>One module currently does not have a quiz.</p>
        </sec>
      </sec>
      <sec id="sec-2-2">
        <p>The peer assignment in Module 5 asks the
learner to run the online demo version of the
Stanford Named Entity Tagger (Finkel et al., 2005) or
the Thomson Reuters Open Calais (Reuters, 2008)
on a small sample text of their own choice and to
evaluate the NER tagger’s output according to the
evaluation metrics precision and recall that we
explained in this module. In this manner, peer
assessments motivate the learners to try out
different NLP tools and corpus query platforms individually,
and to question and critically analyze their output.</p>
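        <p>The evaluation step of this peer assignment can be sketched in code. The example below is illustrative only: it assumes that the tagger output and the manually created gold standard are both available as sets of (entity text, entity type) pairs, and all entities listed are invented rather than produced by the Stanford tagger or Open Calais.</p>
        <preformat preformat-type="code"><![CDATA[
```python
def precision_recall(predicted, gold):
    """Precision and recall of predicted entities against a gold standard.

    Both arguments are sets of (entity text, entity type) pairs.
    """
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Invented gold annotations and invented tagger output for a short text.
gold = {("Zürich", "LOC"), ("Matterhorn", "LOC"), ("Edward Whymper", "PER")}
predicted = {("Zürich", "LOC"), ("Matterhorn", "PER"),
             ("Edward Whymper", "PER"), ("Alpen", "LOC")}

precision, recall = precision_recall(predicted, gold)
print(f"precision={precision:.2f} recall={recall:.2f}")
```
]]></preformat>
        <p>In this sketch an entity counts as correct only if both its text and its type match exactly, so “Matterhorn” tagged as a person is both a false positive and a missed location. Partial-match schemes would give different numbers.</p>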
        <sec id="sec-2-2-1">
          <title>Community building and feedback</title>
          <p>To foster community building and
thematic exchange between enrolled learners, we
included a “Meet and Greet” discussion prompt
in the first module as well as a “Feedback and
Thank You” discussion field at the very end of the
last module. For each module, a weekly discussion
forum is automatically generated on the platform,
where participants can ask or answer questions
regarding the content of the module. Additionally, for
each discussion prompt, an individual thread is
automatically included in the weekly forum to
allow topic-related discussion and exchange. To
ensure a friendly discussion atmosphere and make
the learners feel well looked after, the course tutor
is actively present in the forums and tries to answer
or comment on every contribution.</p>
        </sec>
        <sec id="sec-2-2-2">
          <title>Lecturers and tutors</title>
          <p>Three different lecturers teach in this course and
they agreed beforehand on the overall content,
syllabus and presentation style. After that, each
lecturer was responsible for developing his own
module content, preparing the slides and
organizing additional material. A student assistant
supported this process, cut the video recordings, added
some video effects (zooming, highlighting,
textual annotations, in-video quizzes in order to avoid
monotony) to the slide recordings and published
everything on Coursera’s electronic learning
management platform. All lecturers already had a lot
of teaching experience in the subjects of their
modules, yet everyone had to invest a large amount
of time to fit the existing teaching material from
regular university classes into video sequences of
an appropriate length for online courses. In fact,
some of our videos are still too long by current
standards (5-7 minutes).</p>
          <p>Having three different lecturers makes the course more
varied and offers the learner slightly different
perspectives on the matter. In addition, every lecturer
was able to teach the topics in which he is most specialized
and experienced.</p>
        </sec>
        <sec id="sec-2-2-3">
          <title>Building a studio and gaining recording experience</title>
          <p>For the duration of the video recordings, we turned an
office into a makeshift studio. We decided to record
the videos ourselves rather than have them produced by the multimedia
production team of our university, who nevertheless
kindly instructed us at the beginning. Although
a professionally produced result would have looked more polished,
recording on our own gave us the urgently needed flexibility in
production, as none of the lecturers had prior experience in
teaching in front of a camera.</p>
          <p>The scene background was white with some
books and logos for ease of recognition (see Fig. 1).
Lighting was installed to keep the scene evenly
illuminated without making the lecturers look pale.
The lecturers were filmed from the side while
sitting in order to offer a relaxed learning atmosphere.
Lecturers needed a while to learn to keep eye
contact with the camera rather than looking at their
slides. Small slips of the tongue were accepted as
a natural part of spoken delivery. Larger wording errors
were cut out and required repeated recordings.</p>
        </sec>
        <sec id="sec-2-2-4">
          <title>Resources and NLP Tools</title>
          <p>
            As a running example, we use the diachronic and
multilingual corpus Text+Berg
            <xref ref-type="bibr" rid="ref16">(Volk et al., 2010)</xref>
            which allows us to illustrate many different NLP
tasks and exploitation techniques on a coherent
and academically freely available resource. This
corpus has texts mostly in French, German, and
Italian, some of them translated, and spans a
period of 150 years.
          </p>
          <p>In our videos, we also mention, demonstrate and
reference a lot of other initiatives, resources,
frameworks, and open-source tools: (a) digitization
initiatives (Projekt Gutenberg, Europeana, TextGrid);
(b) OCR crowd-correction and crowd-sourcing in
general (TypeWright, Crowdflower, Artigo); (c)
online corpora and corpus query tools (COSMAS
II/DeReKo, DWDS, CQPweb); (d) parallel corpora
(EuroParl, Canadian Hansard); (e) sentence and
word alignment tools for parallel corpora
(InterText, HunAlign, GIZA++); (f) language
identification (lingua-ident, LangId); (g) text representation
standards (Unicode, UTF-8, XML, TEI-P5); (h)
annotation standards (STTS, Universal tags and
dependencies); (i) standard lexical and syntactic NLP
tools (Porter Stemmer, Durm Lemmatizer,
TreeTagger, Connexor-Tagger; chunkers and parsers);
(j) named entity recognition (Open Calais,
Stanford NER); (k) tools for manual annotation of
linguistic structures (and/or querying the annotations)
(WebAnno, ANNIS, EXMARaLDA, RSTTool); (l)
visualization (Graphviz, Leaflet, Gephi).
</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Discussion</title>
      <p>The field of language technology and NLP is a
rapidly evolving discipline. In the last 25 years,
systems based on hand-written rules and
application-specific algorithms have been largely superseded
by statistical systems that are typically built by
supervised or semi-supervised machine learning
techniques.</p>
      <p>
        In our course we reflect this paradigm change,
e.g. by contrasting the output of a rule-based
part-of-speech tagger with a statistical one, and make
our participants aware of the different requirements
for these approaches (e.g. manually built training
material needed for supervised machine learning).
However, we do not introduce “Neural Deep
Learning” methods
        <xref ref-type="bibr" rid="ref7">(Manning, 2015)</xref>
        , which currently
dominate NLP research and already have an
impact on practical NLP systems. Our course design,
which roughly follows the traditional NLP pipeline
steps with language identification, tokenization,
part-of-speech tagging, syntactic analysis and
semantic analysis, does not particularly fit the recent
trend for neural end-to-end systems
        <xref ref-type="bibr" rid="ref17">(Zhang et al.,
2015)</xref>
        , which – in the extreme – try to avoid these
steps altogether and favor purely character-based
approaches.
      </p>
      <p>For an introductory course targeting the basics of
text analysis for digital humanities and addressing
learners with a mostly arts and humanities
background, we strongly believe that our course
structure results in a better understanding of the
problems that one needs to tackle when processing
natural language.</p>
      <p>In addition, white-box rather than black-box
systems using valid features<sup>5</sup> are most important for
digital humanities and linguistics: it is often crucial
to design linguistically meaningful features properly in order to
obtain valid categories for understanding the
specificity of a text corpus or a linguistic phenomenon.
To give a simple example: even if a statistical
model based on character n-grams turns out to
perform best for authorship attribution, this model is
of little interest for a linguistic research question on
writing styles, because character n-grams
do not represent a linguistically meaningful category
and it is unclear what a character n-gram measures.</p>
      <p>Even so, a follow-up intermediate course
would clearly need to focus more on distributional
approaches (word embeddings and topic modeling) and neural
approaches, which, however, require more
knowledge of mathematics and more programming skills.</p>
      <sec id="sec-3-1">
        <title>Active Learning</title>
        <p>Successful MOOCs have to offer more than just
recorded video streams of lectures. Freeman et
al. (2014) show that active learning settings
generally improve the learning outcome of participants.
Platforms such as Coursera offer several technical
solutions for making distance learning more than
passive consumption of videos. Several course items
encourage individual user activity for an active and
lasting learning experience.
In-video questions recapture the learner’s
attention and require brief reflection on recently learned
course content. Peer assignments encourage
learners to apply knowledge from the current module,
to try out NLP tools individually and to critically
evaluate their actual performance. Assessing
their peers demands further reflection and critical
feedback from the learners. In a hands-on video
in Module 3, we provide step-by-step instructions
for individual corpus analysis with the IMS Open
Corpus Workbench. A brief introduction of the
CQP query language allows the learner to issue
more complex queries. Furthermore, we constantly
invite the learner to apply his or her newly acquired
skills and do further experiments on his or her own
at the end of each module.</p>
        <sec id="sec-3-1-1">
          <p><sup>5</sup>Valid for categories in the respective discipline.</p>
        </sec>
      </sec>
      <sec id="sec-3-2">
        <title>Community building and forum activity</title>
        <p>Our experience with the first session of the
course, held in summer 2015, suggests that users have only a limited
need for exchange in the forums. There
was some discussion on more advanced topics such
as dependency parsing, which was mentioned in
Module 3 but formally introduced only
later, in Module 4. In the past, the peer
assessments on the evaluation of named entity taggers
triggered some discussions, for instance on the
question whether the German word
“Mittelmeerraum” (Mediterranean) should be recognized as a
toponym or not.</p>
        <p>Our course participants on Coursera come from
all over the world<sup>6</sup>, although naturally participants
from the German-speaking countries dominate. The
participants have different backgrounds and
interests. In our current course, 37% declare themselves
as higher education students. Others are either
looking for a job after graduation or are already employed
and willing to expand their knowledge regarding
NLP for Digital Humanities.</p>
      </sec>
      <sec id="sec-3-3">
        <title>Course development</title>
        <p>After successfully running our MOOC on the new
Coursera learning management system in summer
2017, we fine-tuned our course for its future
iterations. We tried to respond to previous learners’
feedback and to include a variety of small
adjustments, such as smaller quizzes after each video
instead of longer quizzes at the end of each module.
We now provide guidelines at the beginning of the
MOOC that explain how the course can be used to
satisfy the wide-ranging needs of learners with
different backgrounds. This eases
“cherry-picking” of certain course modules and avoids forcing
everybody to follow the one-module-per-week
order. Additionally, we integrated more discussion
and reading prompts related to the course content
to maintain the learner’s active attention. A new
outlook section in the last module provides further
links and information on machine translation and
recent trends in applying neural network
methods in NLP. In October 2017, a new version of
the course goes live in which learners will be able
to purchase a certificate provided by the platform
Coursera that can be helpful when seeking a job in
the field of Digital Humanities.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Conclusion</title>
      <p>This paper presents the content of an ongoing
introductory MOOC on Natural Language Processing
for Digital Humanities. Any participant who
successfully completes this course will have a broad
overview of the problems and solutions for
automatically enriching and exploiting text corpora
(via visual exploration or more sophisticated
corpus queries). The course introduces the process
of digitization, corpus creation, text representation,
statistical analysis, visualization, automatic and
manual annotation on different linguistic levels, as
well as the challenges and benefits of multilingual
resources.</p>
      <p>
        As with any MOOC, the number of participants
that actually complete the course is only a small
fraction (5 to 12%) of all registered users
        <xref ref-type="bibr" rid="ref15">(Ubell,
2017)</xref>
        . When our course was run for the first time
in 2015, 46 participants achieved a certificate of
accomplishment out of 883 learners who actually
visited the course at least once. In the current
on-demand setup of the course that started in July
2017, we have a lower number of registered learners;
however, the majority of them seem to be actively
following the course.<sup>7</sup>
      </p>
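      <p>For orientation, the figures reported in this paper translate into the following percentages. This is a small worked calculation using only numbers stated in the text and in footnote 7; the helper name rate is made up.</p>
      <preformat preformat-type="code"><![CDATA[
```python
def rate(part, whole):
    """Share of `part` in `whole`, as a percentage."""
    return 100.0 * part / whole


# 2015 session: certificates earned / learners who visited at least once.
completion_2015 = rate(46, 883)
# 2017 on-demand session: active learners / enrolled learners (footnote 7).
active_2017 = rate(211, 293)

print(f"{completion_2015:.1f}% {active_2017:.1f}%")
```
]]></preformat>
      <p>The 2015 completion rate of roughly 5.2% lies at the lower end of the 5 to 12% range cited above. The 2017 figure measures activity rather than completion, so the two values are not directly comparable.</p>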
      <p>The number of participants cannot be considered
“massive” in the literal sense of “Massive Open
Online Course”; however, MOOCs do not actually
need to have thousands of students. The strength
of courses like ours lies in their openness, in the
way they present and offer specialist knowledge to
interested people all over the world, and, last but not
least, in how they structure the learning process and
the topics in an accessible way and in easily digestible
portions.</p>
      <sec id="sec-4-1">
        <title>Acknowledgment</title>
        <p>The production of our MOOC was financed by
the division “Digitale Lehre und Forschung (DLF)”
of the Faculty of Arts of the University of Zurich
(UZH). We would like to thank Anita Holdener
(DLF) for her constant technical support, Lukas
Meyer from “Multimedia &amp; E-Learning-Services
(MELS)” of the UZH for producing our promotion
video and for the introduction to video recording
he gave us, and last but not least, Sara Wick, our
initiative student tutor and production assistant.</p>
        <p><sup>6</sup>Some of them are also motivated by the fact that the
course is given in German.</p>
        <p><sup>7</sup>Regarding participation in the course, we currently
have 211 active learners out of 293 enrolled learners.</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          <string-name>
            <given-names>Noah</given-names>
            <surname>Bubenhofer</surname>
          </string-name>
          .
          <year>2009</year>
          . Sprachgebrauchsmuster. Korpuslinguistik als Methode der Diskurs- und Kulturanalyse.
          <source>Sprache und Wissen</source>
          , 4. De Gruyter, Berlin, New York.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          <string-name>
            <given-names>Noah</given-names>
            <surname>Bubenhofer</surname>
          </string-name>
          .
          <year>2016</year>
          .
          <article-title>Drei Thesen zu Visualisierungspraktiken in den Digital Humanities</article-title>
          .
          <source>Rechtsgeschichte Legal History - Journal of the Max Planck Institute for European Legal History</source>
          , (
          <volume>24</volume>
          ):
          <fpage>351</fpage>
          -
          <lpage>355</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          <string-name>
            <given-names>K. Bretonnel</given-names>
            <surname>Cohen</surname>
          </string-name>
          and
          <string-name>
            <given-names>Lawrence</given-names>
            <surname>Hunter</surname>
          </string-name>
          .
          <year>2008</year>
          .
          <article-title>Getting started in text mining</article-title>
          .
          <source>PLOS Computational Biology</source>
          ,
          <volume>4</volume>
          (
          <issue>1</issue>
          ):
          <fpage>1</fpage>
          -
          <lpage>3</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          <string-name>
            <given-names>Jenny Rose</given-names>
            <surname>Finkel</surname>
          </string-name>
          , Trond Grenager, and
          <string-name>
            <given-names>Christopher</given-names>
            <surname>Manning</surname>
          </string-name>
          .
          <year>2005</year>
          .
          <article-title>Incorporating non-local information into information extraction systems by Gibbs sampling</article-title>
          .
          <source>Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics (ACL 2005)</source>
          ,
          <volume>6</volume>
          :
          <fpage>363</fpage>
          -
          <lpage>370</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          <string-name>
            <given-names>Scott</given-names>
            <surname>Freeman</surname>
          </string-name>
          , Sarah L. Eddy,
          <string-name>
            <given-names>Miles</given-names>
            <surname>McDonough</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Michelle K.</given-names>
            <surname>Smith</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Nnadozie</given-names>
            <surname>Okoroafor</surname>
          </string-name>
          , Hannah Jordt, and Mary Pat Wenderoth.
          <year>2014</year>
          .
          <article-title>Active learning increases student performance in science, engineering, and mathematics</article-title>
          .
          <source>Proceedings of the National Academy of Sciences</source>
          ,
          <volume>111</volume>
          (
          <issue>23</issue>
          ):
          <fpage>8410</fpage>
          -
          <lpage>8415</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          <string-name>
            <given-names>Lothar</given-names>
            <surname>Lemnitzer</surname>
          </string-name>
          and
          <string-name>
            <given-names>Heike</given-names>
            <surname>Zinsmeister</surname>
          </string-name>
          .
          <year>2006</year>
          . Korpuslinguistik. Eine Einführung. Narr, Tübingen.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          <string-name>
            <given-names>Christopher D.</given-names>
            <surname>Manning</surname>
          </string-name>
          .
          <year>2015</year>
          .
          <article-title>Last words: Computational linguistics and deep learning</article-title>
          .
          <source>Computational Linguistics</source>
          ,
          <volume>41</volume>
          :
          <fpage>701</fpage>
          -
          <lpage>707</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          <string-name>
            <given-names>Franco</given-names>
            <surname>Moretti</surname>
          </string-name>
          .
          <year>2013</year>
          . Distant Reading. Verso Books, London.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          <string-name>
            <given-names>James</given-names>
            <surname>Pustejovsky</surname>
          </string-name>
          and
          <string-name>
            <given-names>Amber</given-names>
            <surname>Stubbs</surname>
          </string-name>
          .
          <year>2013</year>
          .
          <article-title>Natural language annotation for machine learning</article-title>
          .
          <source>O'Reilly Media</source>
          , Sebastopol, CA.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          <string-name>
            <given-names>Lev</given-names>
            <surname>Ratinov</surname>
          </string-name>
          and
          <string-name>
            <given-names>Dan</given-names>
            <surname>Roth</surname>
          </string-name>
          .
          <year>2009</year>
          .
          <article-title>Design challenges and misconceptions in named entity recognition</article-title>
          .
          <source>CoNLL</source>
          ,
          <volume>6</volume>
          :
          <fpage>147</fpage>
          -
          <lpage>155</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          <string-name>
            <given-names>Tom</given-names>
            <surname>Reamy</surname>
          </string-name>
          .
          <year>2016</year>
          .
          <article-title>Deep text: using text analytics to conquer information overload, get real value from social media, and add big(ger) text to big data</article-title>
          .
          <source>Information Today.</source>
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          <string-name>
            <given-names>Thomson</given-names>
            <surname>Reuters</surname>
          </string-name>
          .
          <year>2008</year>
          .
          <article-title>Open Calais demo</article-title>
          . http://www.opencalais.com/opencalaisdemo/. Date accessed: 20/07/2017.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          <string-name>
            <given-names>Kapeesh</given-names>
            <surname>Saraf</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>Life gets in the way: How Coursera is solving for the biggest challenge in online learning</article-title>
          . https://blog.coursera.org/life-getsway-coursera-solving-biggestchallenge-online-learning/. Date accessed: 20/07/2017.
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          <string-name>
            <given-names>John</given-names>
            <surname>Sinclair</surname>
          </string-name>
          .
          <year>2004</year>
          .
          <source>Trust the Text: Language, Corpus and Discourse</source>
          . Routledge, London.
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          <string-name>
            <given-names>Robert</given-names>
            <surname>Ubell</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>MOOCs come back to earth</article-title>
          .
          <source>IEEE Spectrum</source>
          ,
          <volume>54</volume>
          (
          <issue>3</issue>
          ):
          <fpage>22</fpage>
          -
          <lpage>22</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          <string-name>
            <given-names>Martin</given-names>
            <surname>Volk</surname>
          </string-name>
          , Noah Bubenhofer, Adrian Althaus, Maya Bangerter, Lenz Furrer, and
          <string-name>
            <given-names>Beni</given-names>
            <surname>Ruef</surname>
          </string-name>
          .
          <year>2010</year>
          .
          <article-title>Challenges in building a multilingual alpine heritage corpus</article-title>
          .
          <source>Seventh International Conference on Language Resources and Evaluation (LREC).</source>
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          <string-name>
            <given-names>Xiang</given-names>
            <surname>Zhang</surname>
          </string-name>
          , Junbo Zhao, and Yann LeCun
          .
          <year>2015</year>
          .
          <article-title>Character-level convolutional networks for text classification</article-title>
          . In
          <source>Advances in Neural Information Processing Systems</source>
          <volume>28</volume>
          , pages
          <fpage>649</fpage>
          -
          <lpage>657</lpage>
          . Curran Associates, Inc.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>