<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Invited Talk: Domain-adaptation of Natural Language Processing Tools for RE</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <name>
            <surname>Deoskar</surname>
            <given-names>Tejaswini</given-names>
          </name>
          <xref ref-type="aff" rid="aff0" />
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Institute for Logic, Language and Computation (ILLC)</institution>
          ,
          <institution>University of Amsterdam</institution>
        </aff>
      </contrib-group>
      <abstract>
        <p>Natural language processing tools like part-of-speech taggers and parsers are being used in a variety of applications involving natural language, including RE. Such tools, based on statistical models of language, are learnt via supervised machine learning algorithms from human-annotated data. Because of their dependence on annotated data, which is limited in size and genre, these models suffer a drop in performance for words or constructions not encountered in the annotated data, as well as for genres or domains of language different from the supervised training data. This talk will present Tejaswini Deoskar's work on semi-supervised learning, where a model initially trained on supervised data is further improved using unannotated data, available in much larger quantities. Such semi-supervised training improves performance on low-frequency words and constructions, i.e. those in the long tail of language use, and may also be used to adapt supervised NLP models to perform better on new domains of text such as those found in RE documents.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Biography of Tejaswini Deoskar</title>
      <p>Tejaswini Deoskar is an assistant professor at the Institute for Logic, Language and Computation (ILLC) at the University of Amsterdam. Her research focuses on probabilistic learning techniques and probabilistic models for natural language. She is interested in the general problem of learning the syntax and semantics of natural language using "data-driven" methods; this "data" usually consists of large collections of language usage (most commonly, text). In particular, she has worked on "semi-supervised" learning techniques for language, where the data is a combination of bare text and text annotated with extra syntactic or semantic information. She is also especially interested in the class of grammars called "strongly-lexicalised" grammars, and has worked on semi-supervised learning for such grammars, in particular for the grammar formalism Combinatory Categorial Grammar (CCG).</p>
      <p>Copyright © 2018 by the paper's authors. Copying permitted for private and academic purposes.</p>
    </sec>
  </body>
  <back>
    <ref-list />
  </back>
</article>