<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Enhancing Access to Media Collections and Archives Using Computational Linguistic Tools</article-title>
      </title-group>
      <contrib-group>
        <aff id="aff0">
          <label>0</label>
          <institution>James Pustejovsky, Marc Verhagen Department of Computer Science Brandeis University</institution>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Nancy Ide, Keith Suderman Department of Computer Science Vassar College</institution>
        </aff>
      </contrib-group>
      <fpage>19</fpage>
      <lpage>28</lpage>
      <abstract>
        <p>In this paper, we outline the strategies, methodology, and infrastructure needed to bring advanced computational linguistic tools to researchers and archivists in the humanities. We discuss three use cases involving the application of the Language Application Grid (LAPPS), an open, web-based infrastructure providing interoperable access to hundreds of computational linguistic (CL) component web services, together with facilities for multistep analyses via tools pipelining, performance evaluation, and resource delivery. These include: CL analysis of corpora restricted under copyright; the challenge posed by radio and television media collections; and the use of LAPPS for assisting archivists in their collection and cataloguing efforts. We believe that the adoption and use of CL platforms such as LAPPS by the digital humanities (DH) will help foster better communication, sharing, and research between the two communities.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        In the 1960s, the fields that are now called “computational linguistics” and “digital
humanities” were not recognized as distinct [
        <xref ref-type="bibr" rid="ref19 ref20 ref9">9, 19, 20</xref>
        ]. In the 1970s, when
computational linguistics (CL) began to be heavily influenced by advances in the field of
Artificial Intelligence and adopted logic- and rule-based, symbolic methods,
“Humanities Computing” retained the fundamentally statistical approach prevalent in
the previous decade. Over the ensuing 40 years, the two fields have evolved in
relative isolation. Some efforts were made in the early 1990s to reunite the two when
CL once again took up statistical methods, using the argument that statistical
methods adopted and adapted in CL had much to offer the field of digital humanities,
and also that digital humanities, with its vast store of creative language data,
provided a challenge to current methods that could yield fresh insights into the ways
language conveys meaning. These efforts failed, and as a result, the two fields
continued to evolve along their own paths. CL pushed empirical methods forward into
machine learning and, most recently, “deep learning” involving neural networks
and similar structures, while humanities computing evolved along a very different
path, encompassing the creation, maintenance, and use of massive libraries of
digitized data, including not only literary, historical, and similar texts but also images,
audio, and video, representing artifacts relevant to the arts and humanities. Thus
the term “digital humanities” was coined.
      </p>
      <p>
        Within the past few years, Digital Humanities (DH) has looked to CL for
methods to enable richer analysis of literary, historical, and other kinds of documents,
recognizing that CL methods and procedures can in fact enhance the kinds and
amount of information that can be automatically extracted from language data [
        <xref ref-type="bibr" rid="ref22">22</xref>
        ].
However, re-marrying the two disciplines has proven non-trivial [
        <xref ref-type="bibr" rid="ref18 ref20">18, 20</xref>
        ]. The
difficulty is invariably attributed to a lack of accommodation in CL tools for users who
are not technically inclined, and indeed, this is largely true. However, the roots of
the problem go far deeper, stemming from two complementary factors: differences
in the goals for which the same analyses are applied in each area, and differences
in the methodological norms and perspective of the researcher.
      </p>
      <p>In this paper, we outline a general methodology towards accomplishing the
goal of re-integrating the two fields, and list the requirements on what tools are
needed by humanities researchers and archivists. We first review the platform of
the LAPPS Grid, and then examine three case studies of how the platform can help
the humanities scholar and archivist in their research.
</p>
    </sec>
    <sec id="sec-2">
      <title>The Language Applications (LAPPS) Grid</title>
      <p>Over the past ten years, there has been increased activity in efforts to integrate
Human Language Technologies (HLT) applications, corpora, as well as development
platforms. This stems from an obvious and growing demand for robust language
processing capabilities across academic disciplines, education, and industry.
However, one of the major problems in this area has been and remains component
interoperability, reusability, and integration. This has resulted in much of the field of
HLT being fragmented, characterized by a lack of standard practices, few widely
usable and reusable tools and resources, and much redundancy of effort. Rapid
development and deployment of HLT applications has also been hindered by the lack
of ready-made, standardized evaluation mechanisms, especially those which
enable evaluation of component performance in applications consisting of a pipeline
of processing tools.</p>
      <p>
        To address these problems, we have developed an open, web-based
infrastructure, called the LAPPS Grid (funded by the NSF-SI2 initiative) [
        <xref ref-type="bibr" rid="ref10 ref21">10, 21</xref>
        ], that provides interoperable access to
hundreds of HLT component web services, together with facilities for multi-step
analyses via tools pipelining, performance evaluation, and resource delivery for a wide
range of language resources [
        <xref ref-type="bibr" rid="ref11 ref13">11, 13</xref>
        ]. As an easy-to-use interface and management
system, the LAPPS Grid has adopted the Galaxy framework (funded by NSF awards
0543285 and 0850103) [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], a robust,
well-developed platform for workflow configuration and management, and persistence
of results. The LAPPS Grid affords the possibility of creating ready-made
workflows to perform specific analytic tasks that can be used off-the-shelf or customized
to accommodate specific projects, as well as means to compose and evaluate
workflows from atomic NLP components [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]. The LAPPS/Galaxy platform can be
accessed through a web interface (http://www.lappsgrid.org), deployed locally on
any Unix system, or run from the cloud. Figure 1 provides an overview of the
LAPPS Grid architecture.
      </p>
    </sec>
    <sec id="sec-3">
      <title>Language Analysis within the HathiTrust Data Capsule</title>
      <p>
        In our first case study, we examine the role that the LAPPS Grid can play in
ensuring access for CL tools over copyright-restricted content, specifically, over the
HathiTrust Library [
        <xref ref-type="bibr" rid="ref17 ref5">5, 17</xref>
        ]. The HathiTrust Digital Library comprises the digitized
representations of 13.68 million volumes, 6.84 million book titles, 359,528 serial
titles, and 4.79 billion pages. Approximately 39% of the items in the HathiTrust
corpus are digital representations of print volumes in the public domain. The
remaining 61% are works under copyright. Because of copyright restrictions,
scholars have come to see this 61% of the HathiTrust collection of volumes as sitting
behind a ‘copyright wall’ that makes it next to impossible for them to have
meaningful access to their content.
      </p>
      <p>
        The HathiTrust Research Center (HTRC) develops software infrastructure,
models, and tools to help digital humanities (DH) scholars conduct new
computational analyses of works of the HathiTrust corpus, with a focus on analysis of
larger datasets than can be done today (what they call “analysis at scale”). One of
the key infrastructure components of HTRC is the Data Capsule (DC). Recently,
LAPPS/Galaxy has been adopted by a Mellon-funded project at the University of
Illinois, which is utilizing the platform to apply sophisticated HLT text mining
methods to HTRC’s massive digital library. The HTRC’s DC Project involves a
collaboration between Illinois, Indiana, Brandeis, Oxford, and Waikato
Universities [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. Working with our Illinois and Indiana collaborators, the project is focused
on implementing specific LAPPS tools that are most needed by the digital
humanities scholar within the HTRC user community.
      </p>
      <p>
        The HTRC Data Capsule [
        <xref ref-type="bibr" rid="ref23">23</xref>
        ], shown below in Figure 2, is a solution for provisioning secure
researcher access directly to the raw data objects of the HathiTrust.
      </p>
      <p>The goals of the present work include: the deployment of tools that enhance
search and discovery across the library by complementing traditional volume-level
bibliographic metadata with new metadata, using specially-developed
LAPPS/Galaxy-based CL applications; the creation of Linked Open Data resources to help scholars
find, select, integrate and disseminate a wider range of data as part of their
scholarly analysis life-cycle; a set of exemplar pre-built Data Capsules that incorporate
tools commonly used by both the DH and CL communities that scholars can then
customize to address their specific needs.</p>
      <p>The initial work carried out within the WCSA+DC research involved an
integrated effort of studying the needs and requirements of the DC users: that is,
identifying those NLP web services that have already been wrapped and integrated
into the LAPPS Grid, as well as modules that are not yet available. This is being
followed by the integration of document-level and document collection processing
(genre and topic identification) modules into the DC, as well as the most basic
low-level processing (sentencization, tokenization, and POS tagging). The next level
of processing planned includes the more computationally intensive NLP modules,
such as finding named entities (cities, countries, people, etc.), as well
as performing various levels of constituent- and dependency-based parsing at the
sentence level. This will be followed by a detailed evaluation of the NLP services.
This involves: (a) assessing the overall performance of each component service
within the Data Capsule; and (b) examining the possible workflow configurations
of the different services as configured in distinct pipelines to determine the optimal
configuration in terms of performance. The ability to apply a cyclic process of
iterative testing, evaluation, and re-configuration is particularly important for rapid
development of workflows to suit specific user needs, and is one of the benefits
offered by adopting the LAPPS Grid framework.
</p>
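      <p>The workflow-composition pattern described above can be sketched in a few lines. The functions below are toy stand-ins for LAPPS Grid services, and the pipeline helper is a hypothetical illustration: real LAPPS services are remote web services exchanging richly annotated documents, not plain Python dictionaries.</p>

```python
import re

# Toy stand-ins for LAPPS Grid services (illustrative only; the real
# services are remote components with standardized input/output formats).
def sentencize(doc):
    doc["sentences"] = [s.strip() for s in re.split(r"(?<=[.!?])\s+", doc["text"])
                        if s.strip()]
    return doc

def tokenize(doc):
    doc["tokens"] = [re.findall(r"\w+|[^\w\s]", s) for s in doc["sentences"]]
    return doc

def pos_tag(doc):
    # Deliberately naive tagger: capitalized tokens tagged NNP, all else NN.
    doc["pos"] = [[(t, "NNP" if t[:1].isupper() else "NN") for t in sent]
                  for sent in doc["tokens"]]
    return doc

def pipeline(*tools):
    """Compose atomic tools into one workflow, the way a Galaxy workflow
    chains LAPPS services; reordering or swapping tools yields the
    alternative configurations to be evaluated against one another."""
    def run(text):
        doc = {"text": text}
        for tool in tools:
            doc = tool(doc)
        return doc
    return run

workflow = pipeline(sentencize, tokenize, pos_tag)
doc = workflow("Smith visited Boston. She gave a talk.")
```

      <p>Evaluating a configuration then amounts to running such a composed workflow over a test set and scoring each component's output, which is the cyclic test-evaluate-reconfigure loop described above.</p>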
    </sec>
    <sec id="sec-4">
      <title>CL-Facilitated Access to Media Collections</title>
      <p>While the previous use case looked at how HLT tools can assist in the discovery
of content in text-based corpora that are subject to limited access due to copyright
restrictions, the second use case we consider examines the role that CL tools can
play in broadcast media collections. We are currently examining the content of the
American Archive of Public Broadcasting, a collaboration between WGBH and the
Library of Congress.</p>
      <p>
        The last sixty years of our shared history and culture has been well documented
on broadcast media. Many important events, persons, issues, and conflicts have
been recorded and discussed in programs at the national and local levels on public
television and radio, but much of this material is currently inaccessible and has
yet to become part of the historical record. Scholars have long recognized the
value of media for the evidence such material can provide about the past as well
as the manner in which the public has experienced the news. Likewise, educators
have appreciated the ability of audiovisual materials to make history come alive
in the classroom setting. Both scholars and educators have been frustrated by the
difficulties associated with accessing these materials [
        <xref ref-type="bibr" rid="ref14 ref8">8, 14</xref>
        ]. From our discussions
with archivists and historians, it seems that making historical public broadcasting
programs accessible and searchable would be a great enabler for scholarship.
      </p>
      <p>
        Broadcast media are especially important because of the eras they reflect, such as
the period covered by the American Archive. Once made accessible, broadcast media
will add rich archival material to enhance the historical record. Not only is
much of our broadcast media history inaccessible, but it is also in danger of
degrading and becoming lost to posterity if it remains much longer on station and archival
shelves. Reformatting to a digital format is necessary for long-term preservation,
but digitizing is only the first step towards improved access of this material for use
by scholars and educators. Most materials held in storage by stations contain
minimal descriptive information, in many cases only a program title [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. Once this
material is digitized, cataloging becomes an extraordinarily labor-intensive endeavor.
Using CL applications for language-based analysis can help extract significantly
more descriptive information; however, optimizing these tools for humanities
research requires a digital history team working closely with the CL team to map the
output to historical events, places, people, and themes; to iteratively improve the
computation by revising the assumptions the tool makes; and to provide historical
interpretations of the new data. These are some of the tasks we are currently carrying
out with WGBH and their affiliates.
      </p>
    </sec>
    <sec id="sec-5">
      <title>A Language Application Toolkit for Archivists</title>
      <p>
        As our final case study, we examine the role that an HLT platform such as LAPPS
Grid can play in helping digital archivists manage their collections [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. We have
recently begun collaborating with several media and archival organizations to explore
the applicability of the LAPPS Grid platform to the specific needs of cataloguing,
indexing, and retrieving data from media collections. In particular, we have
partnered with WGBH of Boston to determine how the configuration and combination
of existing computational linguistic (CL) tools can significantly transform the way
archivists and librarians describe their media collections. Using WGBH’s corpus
of archival video and audio transcripts and metadata as a research data set, we
have started to develop a toolkit that will be evaluated on its ability to create and
enhance metadata and improve discoverability of large and diverse media
collections, substantially reducing the effort and time spent creating and
improving records for their collections. This toolkit will be built on top of the
Language Application (LAPPS) Grid and will leverage both the tools and workflow
composer environment already present in the LAPPS Grid framework.
      </p>
      <p>
        Audio and video media are, by definition, not text, and therefore opaque to
text search engine capabilities. Finding content relevant to one’s research question
among thousands of hours of programming is time-consuming, involving
watching and/or listening to potentially hours of content in order to zero in on relevant
content [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ]. The availability of descriptive, structured, textual metadata about the
content of these collections and about the items they include radically improves
search and browse capabilities for researchers; however, the effort to fully describe
and catalog these materials is highly labor-intensive and therefore costly.
      </p>
      <p>
        Such a toolkit is an excellent example of how current CL tools can be
configured and combined into a drag-and-drop environment and incorporated into archival
accessioning and cataloging workflows to significantly ease the work involved in
creating rich, descriptive metadata records for each item. The toolkit extends the
current capabilities of LAPPS to include tools (already available in projects such
as Alveo [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]) for accessing text content in audio and video, as well as access to
the publicly available audio and video materials in the WGBH archives. By
coupling these tools with sophisticated CL modules for information retrieval, question
answering, and text mining, the goal is to be able to create composite workflows
to extract and analyze information gleaned from these resources. Through a
process of iterative refinement involving both testing of tools and augmentation of
supporting resources, we will develop a set of optimal workflows for information
extraction from audio and video and evaluate the results on both the collection and
individual item levels, to determine the degree to which the annotation process is
facilitated. These ready-made workflows will be made available to archivists who
can customize them for particular domains and applications and augment supporting
resources with additional data, either extracted in earlier steps or derived from
other sources. It is our expectation that the project will ultimately produce a set
of ready-made (but still customizable) workflow ‘packages’ that will dramatically
reduce the time and cost of metadata production for digital archival materials.
      </p>
      <p>Current archival practice involves the need to dedicate many human hours to
create, normalize, and catalog collections; however, cataloging is so time-consuming
that collections are often placed in a queue, creating a huge backlog of
unprocessed collections. By using such a toolkit, cataloging
will be incorporated into the acquisitions workflow and will become a duty of the
computer, allowing humans to reallocate their time to work on tasks that still
require a human to perform. Instead of catalogers watching hour-long programs and
recording descriptive information about the material, the LAPPS toolkit would
automate the creation of metadata: generating speech-to-text transcripts; identifying
proper names, locations, organizations, and even dates; and performing metadata
clean-up and normalization. The output of the system could then be ingested
into the archives’ metadata repository automatically. This new workflow could
save months, even years, of an archivist’s time.
</p>
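      <p>The cataloging step just described can be sketched as follows. The record fields and entity labels below are invented for illustration; a production system would map pipeline output onto the archive's own metadata schema.</p>

```python
# Illustrative sketch: collect typed entities from a (hypothetical) NER
# pass into a catalog record, deduplicating as a minimal normalization step.
def build_record(title, entities):
    record = {"title": title.strip(), "people": [], "places": [],
              "organizations": [], "dates": []}
    field_for = {"PER": "people", "LOC": "places",
                 "ORG": "organizations", "DATE": "dates"}
    for text, label in entities:
        key = field_for.get(label)
        if key and text not in record[key]:  # drop exact duplicates
            record[key].append(text)
    return record

# Entities as (text, label) pairs, e.g. from a speech-to-text + NER workflow.
rec = build_record("  Evening Report  ",
                   [("Boston", "LOC"), ("WGBH", "ORG"),
                    ("Boston", "LOC"), ("1974", "DATE")])
```

      <p>A record built this way is structured text, so the resulting fields immediately become searchable in the archive's existing discovery tools.</p>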
    </sec>
    <sec id="sec-6">
      <title>What the Digital Humanities Need from CL</title>
      <p>
        The example uses of CL technologies for DH research outlined above demonstrate
some of the ways in which the use of CL tools differs between the two fields.
In broad terms, the goal of CL is to achieve some level of automatic
understanding or interpretation of human language data for sophisticated applications such as
question answering, machine translation, information retrieval, or summarization
[
        <xref ref-type="bibr" rid="ref16 ref2">2, 16</xref>
        ]. Tool chains for end-to-end performance of this kind of task are developed
and tested for their efficacy; the focus is on the final result of applying the tool chain
comprising the application, with a relatively high tolerance for error or “noise”. In
contrast, for DH the focus is more often on finding information that may then be
subjected to further human analysis, and may require what in CL are considered
to be relatively low-level, enabling tasks, such as tokenization, sentence boundary
detection, part-of-speech tagging, named entity recognition, or gross-level
dependency analysis. Furthermore, DH deals with data from vastly varying domains and
genres, while CL at present tends to focus on a somewhat more limited range of
data. Thus for DH the availability of robust, highly accurate tools for low-level
tasks that are applicable or configurable to handle vastly different domains is
perhaps the highest priority, rather than the overall performance of high-level
sophisticated NLP applications. The LAPPS Grid, which provides easy-to-use access to
a wide variety of customizable low-level CL tools together with means to evaluate
performance on datasets from different domains, already addresses these needs to
a large extent. Its adaptation to accommodate the projects described in the
previous sections should continue to augment its capabilities to serve the needs of DH
research.
      </p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>Howard</given-names>
            <surname>Besser</surname>
          </string-name>
          .
          <article-title>The next stage: Moving from isolated digital collections to interoperable digital libraries</article-title>
          .
          <source>First Monday</source>
          ,
          <volume>7</volume>
          (
          <issue>6</issue>
          ),
          <year>2002</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>Julian</given-names>
            <surname>Brooke</surname>
          </string-name>
          , Adam Hammond, and
          <string-name>
            <given-names>Graeme</given-names>
            <surname>Hirst</surname>
          </string-name>
          .
          <source>Proceedings of the Fourth Workshop on Computational Linguistics for Literature</source>
          ,
          <article-title>chapter GutenTag: an NLP-driven Tool for Digital Humanities Research in the Project Gutenberg Corpus</article-title>
          , pages
          <fpage>42</fpage>
          -
          <lpage>47</lpage>
          . Association for Computational Linguistics,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>Karen</given-names>
            <surname>Cariani</surname>
          </string-name>
          and
          <string-name>
            <given-names>Casey</given-names>
            <surname>Davis</surname>
          </string-name>
          .
          <article-title>Let the computer do the work</article-title>
          .
          <source>In Presentations from FIAT/IFTA 2016 World Conference</source>
          , Warsaw, Poland, http://fiatifta.org/,
          <year>October 2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>Steve</given-names>
            <surname>Cassidy</surname>
          </string-name>
          , Dominique Estival, Timothy Jones, Denis Burnham, and
          <string-name>
            <given-names>Jared</given-names>
            <surname>Burghold</surname>
          </string-name>
          .
          <article-title>The alveo virtual laboratory: A web based repository api</article-title>
          .
          <source>In Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)</source>
          , Reykjavik, Iceland, May
          <year>2014</year>
          .
          <article-title>European Language Resources Association (ELRA).</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>J. Stephen</given-names>
            <surname>Downie</surname>
          </string-name>
          , Kirstin Dougan, Sayan Bhattacharyya, and
          <string-name>
            <given-names>Colleen</given-names>
            <surname>Fallaw</surname>
          </string-name>
          .
          <article-title>The HathiTrust corpus: A digital library for musicology research?</article-title>
          .
          <source>In Proceedings of the 1st International Workshop on Digital Libraries for Musicology</source>
          , pages
          <fpage>1</fpage>
          -
          <lpage>8</lpage>
          . ACM,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>B.</given-names>
            <surname>Giardine</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Riemer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. C.</given-names>
            <surname>Hardison</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Burhans</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Elnitski</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Shah</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Blankenberg</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Albert</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Taylor</surname>
          </string-name>
          , W. Miller,
          <string-name>
            <given-names>W. J.</given-names>
            <surname>Kent</surname>
          </string-name>
          ,
          and
          <string-name>
            <given-names>A.</given-names>
            <surname>Nekrutenko</surname>
          </string-name>
          .
          <article-title>Galaxy: a platform for interactive large-scale genome analysis</article-title>
          .
          <source>Genome Research</source>
          ,
          <volume>15</volume>
          (
          <issue>10</issue>
          ):
          <fpage>1451</fpage>
          -
          <lpage>55</lpage>
          ,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <surname>HathiTrust.</surname>
          </string-name>
          <article-title>The workset creation for scholarly analysis + data capsules</article-title>
          . https://www.hathitrust.org/2016-spring-update.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>Annika</given-names>
            <surname>Hinze</surname>
          </string-name>
          , Craig Taube-Schock, David Bainbridge, Rangi Matamua, and
          <string-name>
            <given-names>J Stephen</given-names>
            <surname>Downie</surname>
          </string-name>
          .
          <article-title>Improving access to large-scale digital libraries through semantic-enhanced search and disambiguation</article-title>
          .
          <source>In Proceedings of the 15th ACM/IEEE-CS Joint Conference on Digital Libraries</source>
          , pages
          <fpage>147</fpage>
          -
          <lpage>156</lpage>
          . ACM,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>Susan</given-names>
            <surname>Hockey</surname>
          </string-name>
          .
          <article-title>The history of humanities computing</article-title>
          .
          <source>A Companion to Digital Humanities</source>
          , pages
          <fpage>3</fpage>
          -
          <lpage>19</lpage>
          ,
          <year>2004</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>Nancy</given-names>
            <surname>Ide</surname>
          </string-name>
          , James Pustejovsky, Christopher Cieri, Eric Nyberg, Di Wang, Keith Suderman,
          <string-name>
            <given-names>Marc</given-names>
            <surname>Verhagen</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Jonathan</given-names>
            <surname>Wright</surname>
          </string-name>
          .
          <article-title>The language application grid</article-title>
          .
          <source>In Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)</source>
          , Reykjavik, Iceland, May
          <year>2014</year>
          .
          <article-title>European Language Resources Association (ELRA).</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>Nancy</given-names>
            <surname>Ide</surname>
          </string-name>
          , James Pustejovsky, Keith Suderman, and
          <string-name>
            <given-names>Marc</given-names>
            <surname>Verhagen</surname>
          </string-name>
          .
          <article-title>The Language Application Grid Web Service Exchange Vocabulary</article-title>
          .
          <source>In Proceedings of the Workshop on Open Infrastructures and Analysis Frameworks for HLT (OIAF4HLT)</source>
          , Dublin, Ireland,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>Nancy</given-names>
            <surname>Ide</surname>
          </string-name>
          , Keith Suderman, James Pustejovsky, Eric Nyberg, Christopher Cieri, and
          <string-name>
            <given-names>Marc</given-names>
            <surname>Verhagen</surname>
          </string-name>
          .
          <article-title>The Language Application Grid and Galaxy</article-title>
          .
          <source>In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC</source>
          <year>2016</year>
          ), Portorož, Slovenia,
          <year>2016</year>
          .
          European Language Resources Association (ELRA).
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>Nancy</given-names>
            <surname>Ide</surname>
          </string-name>
          , Keith Suderman,
          <string-name>
            <given-names>Marc</given-names>
            <surname>Verhagen</surname>
          </string-name>
          , and
          <string-name>
            <given-names>James</given-names>
            <surname>Pustejovsky. The Language Application Grid Web Service Exchange</surname>
          </string-name>
          <article-title>Vocabulary</article-title>
          .
          <source>In Revised Selected Papers of the Second International Workshop on Worldwide Language Service Infrastructure</source>
          - Volume
          <volume>9442</volume>
          , pages
          <fpage>18</fpage>
          -
          <lpage>32</lpage>
          . Springer-Verlag New York, Inc.,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>Peter</given-names>
            <surname>Leonard</surname>
          </string-name>
          .
          <article-title>Mining large datasets for the humanities</article-title>
          .
          <source>IFLA WLIC</source>
          , pages
          <fpage>16</fpage>
          -
          <lpage>22</lpage>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>Johan</given-names>
            <surname>Oomen</surname>
          </string-name>
          , Riste Gligorov, and
          <string-name>
            <given-names>Michiel</given-names>
            <surname>Hildebrand</surname>
          </string-name>
          .
          <article-title>Waisda?: making videos findable through crowdsourced annotations</article-title>
          .
          <source>Crowdsourcing our Cultural Heritage</source>
          , pages
          <fpage>161</fpage>
          -
          <lpage>184</lpage>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>Bolette Sandford</given-names>
            <surname>Pedersen</surname>
          </string-name>
          , Sussi Olsen, and
          <string-name>
            <given-names>Lars</given-names>
            <surname>Borin</surname>
          </string-name>
          .
          <source>Proceedings of the workshop on Semantic resources and semantic annotation for Natural Language Processing and the Digital Humanities at NODALIDA 2015</source>
          . Northern European Association for Language Technology,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          beyond.
          <source>In Proceedings of the 13th ACM/IEEE-CS joint conference on Digital libraries</source>
          , pages
          <fpage>395</fpage>
          -
          <lpage>396</lpage>
          . ACM,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>Susan</given-names>
            <surname>Schreibman</surname>
          </string-name>
          , Ray Siemens, and
          <string-name>
            <given-names>John</given-names>
            <surname>Unsworth</surname>
          </string-name>
          .
          <source>A companion to digital humanities</source>
          . John Wiley &amp; Sons,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>Patrik</given-names>
            <surname>Svensson</surname>
          </string-name>
          .
          <article-title>The landscape of digital humanities</article-title>
          .
          <source>Digital Humanities</source>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>Edward</given-names>
            <surname>Vanhoutte</surname>
          </string-name>
          .
          <article-title>The gates of hell: History and definition of digital | humanities | computing</article-title>
          .
          <source>Defining Digital Humanities: A Reader</source>
          , pages
          <fpage>119</fpage>
          -
          <lpage>56</lpage>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>Marc</given-names>
            <surname>Verhagen</surname>
          </string-name>
          , Keith Suderman, Di Wang, Nancy Ide, Chunqi Shi,
          <string-name>
            <given-names>Jonathan</given-names>
            <surname>Wright</surname>
          </string-name>
          , and
          <string-name>
            <given-names>James</given-names>
            <surname>Pustejovsky</surname>
          </string-name>
          .
          <article-title>The LAPPS Interchange Format</article-title>
          .
          <source>In Revised Selected Papers of the Second International Workshop on Worldwide Language Service Infrastructure</source>
          - Volume
          <volume>9442</volume>
          , pages
          <fpage>33</fpage>
          -
          <lpage>47</lpage>
          . Springer-Verlag New York, Inc.,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>Christopher</given-names>
            <surname>Welty</surname>
          </string-name>
          and
          <string-name>
            <given-names>Nancy</given-names>
            <surname>Ide</surname>
          </string-name>
          .
          <article-title>Using the right tools: enhancing retrieval from marked-up documents</article-title>
          .
          <source>Computers and the Humanities</source>
          ,
          <volume>33</volume>
          (
          <issue>1-2</issue>
          ):
          <fpage>59</fpage>
          -
          <lpage>84</lpage>
          ,
          <year>1999</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name>
            <given-names>Jiaan</given-names>
            <surname>Zeng</surname>
          </string-name>
          , Guangchen Ruan, Alexander Crowell, Atul Prakash, and
          <string-name>
            <given-names>Beth</given-names>
            <surname>Plale</surname>
          </string-name>
          .
          <article-title>Cloud computing data capsules for non-consumptive use of texts</article-title>
          .
          <source>In Proceedings of the 5th ACM workshop on Scientific cloud computing</source>
          , pages
          <fpage>9</fpage>
          -
          <lpage>16</lpage>
          . ACM,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>