<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Named Entity Recognition using FOX</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>René Speck</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Axel-Cyrille Ngonga Ngomo</string-name>
          <email>ngongag@informatik.uni-leipzig.de</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>AKSW, Department of Computer Science, University of Leipzig</institution>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Unstructured data still makes up an important portion of the Web. One key task towards transforming this unstructured data into structured data is named entity recognition. We demo FOX, the Federated knOwledge eXtraction framework, a highly accurate open-source framework that implements RESTful web services for named entity recognition. Our framework achieves a higher F-measure than state-of-the-art named entity recognition frameworks by combining the results of several approaches through ensemble learning. Moreover, it disambiguates and links named entities against DBpedia by relying on the AGDISTIS framework. As a result, FOX provides users with accurately disambiguated and linked named entities in several RDF serialization formats. We demonstrate the different interfaces implemented by FOX within use cases pertaining to extracting entities from news texts.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>The goal of the demonstration is to show the whole FOX workflow, from the gathering and preprocessing of input data to the generation of RDF data. In addition, we will show how to configure and train FOX after it has been enhanced with a novel NER tool or ensemble learning (EL) algorithm. Further, we will present FOX’s feedback RESTful service, which serves to improve the training and test datasets. In the demonstration, we also go over the Python<sup>2</sup> and Java<sup>3</sup> bindings for easy use of FOX’s RESTful service within an application. At the end, we will explain how to use the FOX Java interfaces to integrate future algorithms.</p>
      <p><sup>1</sup> FOX online demo: http://fox-demo.aksw.org. FOX project page: http://fox.aksw.org. Source code, evaluation data and evaluation results: http://github.com/AKSW/FOX.</p>
      <sec id="sec-1-1">
        <title>Workflow</title>
        <p>
          The workflow underlying FOX consists of four main steps: (1) preprocessing of the unstructured input data, (2) recognizing the named entities (NEs), (3) linking the NEs to resources using AGDISTIS [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ] and (4) converting the results to an RDF serialization format.
        </p>
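        <p>To make the data flow between the four steps concrete, the following is a minimal sketch in Python. It is not FOX’s implementation: the gazetteer-based recognizer, the dictionary standing in for a knowledge base, the example.org URIs and all function names are assumptions made purely for illustration.</p>
        <preformat><![CDATA[
```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Entity:
    text: str
    entity_type: str
    uri: Optional[str] = None


def preprocess(raw):
    """Step 1: crudely drop tags and normalise whitespace."""
    return re.sub(r"\s+", " ", re.sub(r"<[^>]+>", " ", raw)).strip()


def recognize(text, gazetteer):
    """Step 2: toy recognizer that looks up known surface forms."""
    return [Entity(s, t) for s, t in gazetteer.items() if s in text]


def link(entities, kb, base="http://example.org/resource/"):
    """Step 3: reuse the knowledge-base URI if known, else mint a new one.

    The base URI is hypothetical, not the one FOX uses.
    """
    for e in entities:
        e.uri = kb.get(e.text, base + e.text.replace(" ", "_"))
    return entities


def serialize(entities):
    """Step 4: emit one N-Triples line per entity (hypothetical classes)."""
    rdf_type = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
    return "\n".join(
        f"<{e.uri}> <{rdf_type}> <http://example.org/{e.entity_type}> ."
        for e in entities
    )


def run_pipeline(raw, gazetteer, kb):
    return serialize(link(recognize(preprocess(raw), gazetteer), kb))
```
]]></preformat>
        <p>Each step only consumes the output of the previous one, which is what allows a single step (for instance the recognizer or the linker) to be exchanged without touching the others.</p>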
        <p><bold>Preprocessing.</bold> FOX allows users to provide a URL, text with HTML tags or plain text as input data (see the top left part of Figure 1). The input can be entered in a form (see the center of Figure 1) or sent via FOX’s web service. In the case of a URL, FOX sends a request to the given URL to retrieve the input data. Then, for all input formats, FOX removes HTML tags and detects sentences and tokens.</p>
        <p>We will use text examples, URLs and text with HTML tags to show how FOX
gathers or cleans them for the sake of entity recognition.</p>
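        <p>As an illustration of this cleaning step, the sketch below strips HTML tags with Python’s standard html.parser module and then applies naive, regex-based sentence and token detection. The splitting rules are our own simplification; which sentence and token detectors FOX uses internally is not specified here, and all function names are hypothetical.</p>
        <preformat><![CDATA[
```python
import re
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects text nodes and ignores all tags."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def strip_html(markup):
    """Return the text content of markup with whitespace collapsed."""
    parser = _TextExtractor()
    parser.feed(markup)
    parser.close()
    # Joining with a space keeps block elements apart; it may also
    # split a word that is interrupted by an inline tag.
    return re.sub(r"\s+", " ", " ".join(parser.parts)).strip()


def split_sentences(text):
    """Naive rule: a sentence ends at '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text) if s]


def tokenize(sentence):
    """Words (with inner hyphens or apostrophes) or single punctuation marks."""
    return re.findall(r"\w+(?:[-']\w+)*|[^\w\s]", sentence)
```
]]></preformat>
        <p>Plain text passes through strip_html unchanged apart from whitespace normalisation, so the same code path can serve all three input kinds once the content behind a URL has been fetched.</p>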
      </sec>
      <sec id="sec-1-2">
        <title>Entity Recognition, Linking and Serialization</title>
        <p><sup>2</sup> https://pypi.python.org/pypi/foxpy <sup>3</sup> https://github.com/renespeck/fox-java</p>
        <p>
          <bold>Entity Recognition.</bold> Our approach relies on four state-of-the-art NER tools so far: (1) the Stanford Named Entity Recognizer (Stanford) [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ], (2) the Illinois Named Entity Tagger (Illinois) [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ], (3) the Ottawa Baseline Information Extraction (Balie) [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ] and (4) the Apache OpenNLP Name Finder (OpenNLP) [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ]. FOX also allows using a single NER approach that is integrated in it (see bottom right of Figure 1). To this end, FOX light has to be set to the absolute path to the class of the tool to use. If FOX light is off, FOX runs these four NER tools in parallel and stores the received NEs for further processing. It maps the entity types of each of the NER tools to the classes Location, Organization and Person. Finally, the results of all tools are merged by using FOX’s ensemble learning layer as discussed in [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ]. We will show the named entities recognized by FOX and contrast these with those recognized by the other tools. Moreover, we will show the runtime log that FOX generates to point to FOX’s scalability.
        </p>
        <p>
          <bold>Entity Linking.</bold> FOX makes use of AGDISTIS [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ], an open-source named entity disambiguation framework able to link entities against every linked data knowledge base, to disambiguate entities and to link them against DBpedia. In contrast to lookup-based approaches, our framework can also detect resources that are not in DBpedia. In this case, these are assigned their own URIs. Moreover, FOX provides a Java interface and a configuration file for easy integration of other entity linking tools. We will show the messages that FOX generates and sends to AGDISTIS as well as the answers it receives and serializes.
        </p>
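        <p>The type mapping and the merging of tool outputs can be pictured with the following sketch. Two caveats apply. First, the label table is hypothetical, since the labels each tool emits depend on its model. Second, the merge shown here is a plain strict-majority vote, which we use only as a simple stand-in; FOX itself combines the tools with a trained ensemble learner as described in [7].</p>
        <preformat><![CDATA[
```python
from collections import Counter

# Hypothetical label table: maps tool-specific labels to the three
# classes named in the text. Unknown labels (e.g. "O") map to None.
TYPE_MAP = {
    "LOCATION": "Location", "LOC": "Location", "GPE": "Location",
    "ORGANIZATION": "Organization", "ORG": "Organization",
    "PERSON": "Person", "PER": "Person",
}


def normalise(label):
    """Map a tool-specific label to Location, Organization, Person or None."""
    return TYPE_MAP.get(label.upper())


def merge_by_majority(tokens, tool_outputs):
    """Merge per-token labels from several tools.

    tool_outputs maps a tool name to a list with one label per token.
    A class is kept only if a strict majority of tools agree on it;
    otherwise the token is treated as not being part of an entity.
    """
    merged = []
    for i in range(len(tokens)):
        votes = Counter(normalise(labels[i]) for labels in tool_outputs.values())
        winner, count = votes.most_common(1)[0]
        merged.append(winner if count * 2 > len(tool_outputs) else None)
    return merged
```
]]></preformat>
        <p>A learned combiner can outperform such a vote because it can weight tools differently per class, which a fixed majority rule cannot.</p>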
        <p><bold>Serialization Formats.</bold> FOX is designed to support a large number of use cases. To this end, our framework can serialize its results into the following formats: JSON-LD<sup>4</sup>, N-Triples<sup>5</sup>, RDF/JSON<sup>6</sup>, RDF/XML<sup>7</sup>, Turtle<sup>8</sup>, TriG<sup>9</sup> and N-Quads<sup>10</sup>. FOX allows the user to choose between these formats (see bottom left part of Figure 1). We will show what the output of FOX looks like in the different formats and point to how they can be parsed.</p>
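        <p>To give an impression of two of these formats and of how such output can be read back, the sketch below writes a list of typed entities as N-Triples and as expanded JSON-LD, and parses the N-Triples form again. It uses only the Python standard library and handles IRI-only triples; the class URIs are hypothetical, as the vocabulary used in FOX’s actual output is not described here. For real data, a full RDF library would be the safer choice.</p>
        <preformat><![CDATA[
```python
import json
import re

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
# Matches a triple whose subject, predicate and object are all IRIs.
TRIPLE = re.compile(r"^<([^>]+)>\s+<([^>]+)>\s+<([^>]+)>\s*\.$")


def to_ntriples(entities):
    """entities: iterable of (entity_uri, class_uri) pairs."""
    return "\n".join(f"<{s}> <{RDF_TYPE}> <{o}> ." for s, o in entities)


def to_jsonld(entities):
    """Same statements as a JSON-LD array of node objects."""
    return json.dumps([{"@id": s, "@type": [o]} for s, o in entities], indent=2)


def parse_ntriples(data):
    """Parse IRI-only N-Triples, skipping blank lines and comments."""
    triples = []
    for line in data.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        match = TRIPLE.match(line)
        if match is None:
            # Literals and blank nodes are outside this sketch.
            raise ValueError(f"unsupported N-Triples line: {line!r}")
        triples.append(match.groups())
    return triples
```
]]></preformat>
        <p>The line-based formats (N-Triples, N-Quads) suit streaming consumers, whereas JSON-LD is convenient for web applications that already process JSON.</p>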
      </sec>
    </sec>
    <sec id="sec-2">
      <title>Evaluation and Results</title>
      <p>
        We performed a thorough evaluation of FOX by using five different datasets and comparing it with state-of-the-art NER frameworks (see Table 1). Our evaluation shows that FOX clearly outperforms the state of the art. The details of the complete evaluation are presented in [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. The evaluation code and datasets are also available at FOX’s GitHub page, i.e., http://github.com/AKSW/FOX.
      </p>
      <p>
        <sup>4</sup> http://www.w3.org/TR/json-ld
        <sup>5</sup> http://www.w3.org/TR/n-triples/
        <sup>6</sup> http://www.w3.org/TR/rdf-json
        <sup>7</sup> http://www.w3.org/TR/REC-rdf-syntax
        <sup>8</sup> http://www.w3.org/TR/turtle
        <sup>9</sup> http://www.w3.org/TR/trig
        <sup>10</sup> http://www.w3.org/TR/n-quads
      </p>
    </sec>
    <sec id="sec-3">
      <title>Conclusion</title>
      <p>We will present FOX, an NER framework that relies on ensemble learning, and demonstrate how it can be used. In future work, we will extend the number of tools integrated in FOX. Moreover, we will extend the tasks supported by the framework. In particular, we aim to integrate tagging, keyword extraction and relation extraction in the near future.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <given-names>J</given-names>
            <surname>Baldridge</surname>
          </string-name>
          .
          <source>The OpenNLP project</source>
          ,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <given-names>Jenny Rose</given-names>
            <surname>Finkel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Trond</given-names>
            <surname>Grenager</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Christopher</given-names>
            <surname>Manning</surname>
          </string-name>
          .
          <article-title>Incorporating non-local information into information extraction systems by Gibbs sampling</article-title>
          .
          In
          <source>ACL</source>
          , pages
          <fpage>363</fpage>
          -
          <lpage>370</lpage>
          ,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <given-names>Ali</given-names>
            <surname>Khalili</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Sören</given-names>
            <surname>Auer</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Axel-Cyrille</given-names>
            <surname>Ngonga Ngomo</surname>
          </string-name>
          .
          <article-title>context - lightweight text analytics using linked data</article-title>
          .
          In
          <source>11th Extended Semantic Web Conference (ESWC2014)</source>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <given-names>David</given-names>
            <surname>Nadeau</surname>
          </string-name>
          .
          <article-title>Balie-baseline information extraction: Multilingual information extraction from text with machine learning and natural language techniques</article-title>
          .
          <source>Technical report</source>
          , University of Ottawa,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
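          5.
          <string-name>
            <given-names>Axel-Cyrille</given-names>
            <surname>Ngonga Ngomo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Norman</given-names>
            <surname>Heino</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Klaus</given-names>
            <surname>Lyko</surname>
          </string-name>
          ,
          <string-name>
            <given-names>René</given-names>
            <surname>Speck</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Martin</given-names>
            <surname>Kaltenböck</surname>
          </string-name>
          .
          <article-title>SCMS - Semantifying Content Management Systems</article-title>
          .
          In
          <source>Proceedings of the International Semantic Web Conference</source>
          ,
          <year>2011</year>
          .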
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <given-names>Lev</given-names>
            <surname>Ratinov</surname>
          </string-name>
          and
          <string-name>
            <given-names>Dan</given-names>
            <surname>Roth</surname>
          </string-name>
          .
          <article-title>Design challenges and misconceptions in named entity recognition</article-title>
          .
          In
          <source>Proceedings of the Thirteenth Conference on Computational Natural Language Learning, CoNLL '09</source>
          , pages
          <fpage>147</fpage>
          -
          <lpage>155</lpage>
          ,
          <publisher-loc>Stroudsburg, PA, USA</publisher-loc>
          ,
          <year>2009</year>
          .
          <publisher-name>Association for Computational Linguistics</publisher-name>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <given-names>René</given-names>
            <surname>Speck</surname>
          </string-name>
          and
          <string-name>
            <given-names>Axel-Cyrille</given-names>
            <surname>Ngonga Ngomo</surname>
          </string-name>
          .
          <article-title>Ensemble learning for named entity recognition</article-title>
          .
          In
          <source>Proceedings of the International Semantic Web Conference, Lecture Notes in Computer Science</source>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <given-names>Ricardo</given-names>
            <surname>Usbeck</surname>
          </string-name>
          .
          <article-title>Combining linked data and statistical information retrieval</article-title>
          .
          In
          <source>11th Extended Semantic Web Conference, PhD Symposium</source>
          . Springer,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <given-names>Ricardo</given-names>
            <surname>Usbeck</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Axel-Cyrille</given-names>
            <surname>Ngonga Ngomo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Sören</given-names>
            <surname>Auer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Daniel</given-names>
            <surname>Gerber</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Andreas</given-names>
            <surname>Both</surname>
          </string-name>
          .
          <article-title>Agdistis - agnostic disambiguation of named entities using linked open data</article-title>
          . In Submitted to
          <source>12th International Semantic Web Conference</source>
          , 21-25 October 2013, Sydney, Australia,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>