<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Matching Natural Language Data with Ontologies</article-title>
      </title-group>
      <contrib-group>
        <aff id="aff0">
          <label>0</label>
          <institution>Orange Labs</institution>
          ,
          <addr-line>F-22307 Lannion cedex</addr-line>
          <country country="FR">France</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Ontologies and natural languages are complementary. Whereas ontologies are used to model knowledge formally, natural language is what users primarily use to communicate with ontology-based systems. In order to transform information or queries in natural language into valid ontological expressions, the meaning of natural language entities has to be matched with the given ontologies. In contrast to pure ontology matching, matching with natural language data poses problems linked to its ambiguities (synonymy, homonymy/polysemy, redundancy, to name but a few).</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>-</title>
      <p>Apart from the ontologies, the matching requires a complete lexicon of the language used to
label or describe the ontological classes and properties (= entities). Our lexicon is also linked
to a semantic thesaurus. The ontologies, on the other hand, usually have non-ambiguous
entity labels (like http://www.acemedia.org/ontos/tennis#Player<sup>1</sup>) or a comment
explaining the entity. A comment is especially necessary if the entity labels are not self-explanatory, like
tennis:C12 (fortunately a rare case). Further, the semantic thesaurus contains a thematic
hierarchy of all semantic concepts to help disambiguation. The concepts are grouped into 880 themes,
which in turn are organized in 80 domains. The domains are divided into about 10
macro-domains.</p>
      <p>1 We shorten name spaces like http://www.acemedia.org/ontos/tennis# to “tennis:” etc.</p>
      <p>The matching itself comprises several steps (cf. fig. 1). Apart from a (more or less
manual) preparation to correct possible labeling errors in the ontologies, the
steps do not need any intervention: (a) extracting the “ontological context” of entities and
assigning possible reformulations of entity labels; (b) natural language processing passes:
detecting meanings for classes using their ontological
context (direct sub-classes); (c) and for properties using their ontological context (domain
and range classes); (d) determining the application-dependent synonyms and co-hyponyms;
(e) adding the ontological hierarchy to the semantic taxonomy; (f) creating semantic
transformation rules for “complex class labels”<sup>2</sup>; (g) creating transformation rules that produce
ontological representations from semantic graphs.</p>
      <p>Fig. 1. Linguistic-ontological matching. Components: lexicon, semantic thesaurus, domain ontologies, NLP.
Steps: (a) extract and correct class and property labels; (b) semantic analysis of classes (taking into
account their taxonomic context); (c) semantic analysis of properties (taking into account their range and
domain); (d) detecting synonyms and hyponyms; (e) adding ontologies to semantic thesaurus; (f) creation of
semantic clustering rules for complex classes; (g) creation of RDFS rules (predicate-class transformation).</p>
      <p>Synonyms (defined in our multilingual
thesaurus, [<xref ref-type="bibr" rid="ref9">9</xref>]) are all matched onto the same ontological class (e.g. “river”, “stream”, “creek”
etc. → holidays:River). If a class has no sub-classes, we also match the co-hyponyms of the
label to the class (e.g. in our case “car”, “bus”, “truck”, “motorbike” . . . → general:Vehicle).
The resulting linguistic data is successfully used in the aceMedia prototype; similarly produced
data is used in an industrial application to create and access ontology-based information
from and via natural language.
      </p>
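      <p>As an illustration only, the matching of synonyms and co-hyponyms onto ontological classes can be
sketched in Python as follows. This is a minimal sketch under stated assumptions, not the implementation
used in our system: the names OntClass, SYNONYMS, CO_HYPONYMS, NAMESPACES, shorten, words_for_class and
build_word_index are hypothetical, and the two small tables stand in for the lexicon and the semantic
thesaurus, containing only the example words mentioned in the text.</p>
      <preformat>
```python
from dataclasses import dataclass, field

# Hypothetical toy data: a stand-in for the name space table and the thesaurus.
NAMESPACES = {"http://www.acemedia.org/ontos/tennis#": "tennis:"}
SYNONYMS = {"river": ["stream", "creek"]}
# Assumption: per class label, the related words the text's example maps to the class.
CO_HYPONYMS = {"vehicle": ["car", "bus", "truck", "motorbike"]}


@dataclass
class OntClass:
    name: str                                        # shortened name, e.g. "holidays:River"
    label: str                                       # natural language label, e.g. "river"
    subclasses: list = field(default_factory=list)   # names of direct sub-classes


def shorten(uri: str) -> str:
    """Replace a known name space by its short prefix; leave other URIs unchanged."""
    for full, short in NAMESPACES.items():
        if uri.startswith(full):
            return short + uri[len(full):]
    return uri


def words_for_class(cls: OntClass) -> set:
    """Collect the words matched onto a class: its label and all synonyms,
    plus the co-hyponym words only if the class has no sub-classes."""
    words = {cls.label}
    words.update(SYNONYMS.get(cls.label, []))
    if not cls.subclasses:
        words.update(CO_HYPONYMS.get(cls.label, []))
    return words


def build_word_index(classes) -> dict:
    """Map every word to the set of class names it is matched onto."""
    index = {}
    for cls in classes:
        for word in words_for_class(cls):
            index.setdefault(word, set()).add(cls.name)
    return index


if __name__ == "__main__":
    classes = [
        OntClass("holidays:River", "river"),
        OntClass("general:Vehicle", "vehicle"),
    ]
    for word, names in sorted(build_word_index(classes).items()):
        print(word, "->", sorted(names))
    print(shorten("http://www.acemedia.org/ontos/tennis#Player"))
```
      </preformat>
      <p>In this sketch a class that has direct sub-classes keeps only its label and synonyms, on the
assumption that more specific words are better matched onto the sub-classes themselves.</p>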
      <p>New perspectives are offered by structured semantic data, which is becoming more and more
available. Databases like Wikipedia (especially the categorization schema used within it) or
RDF- or ontology-based information systems like DBpedia or freebase<sup>3</sup> (both initialized from
Wikipedia contents) will help to improve the linking of natural languages and formally
modeled ontologies.</p>
      <p>2 Labels which use multi-word expressions like tennis:ExhibitionMatch instead of simple words.</p>
      <p>3 http://dbpedia.org/, http://freebase.com/</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Heinecke</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Génération automatique des représentations ontologiques</article-title>
          .
          <source>In: TALN</source>
          . (
          <year>2006</year>
          )
          <fpage>502</fpage>
          -
          <lpage>511</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Dasiopoulou</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Heinecke</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Saathoff</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Strintzis</surname>
            ,
            <given-names>M.G.</given-names>
          </string-name>
          :
          <article-title>Multimedia reasoning with natural language support</article-title>
          .
          <source>In: IEEE-ICSC</source>
          . (
          <year>2007</year>
          )
          <fpage>413</fpage>
          -
          <lpage>420</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Heinecke</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Toumani</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          :
          <article-title>A Natural Language Mediation System for E-Commerce applications</article-title>
          .
          <source>In: Workshop HLT for the Semantic Web and Web Services. ISWC</source>
          . (
          <year>2003</year>
          )
          <fpage>39</fpage>
          -
          <lpage>50</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Ehrig</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Staab</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>QOM - quick ontology mapping</article-title>
          .
          <source>In: ISWC</source>
          . (
          <year>2004</year>
          )
          <fpage>683</fpage>
          -
          <lpage>697</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Euzenat</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shvaiko</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <source>Ontology Matching</source>
          . Springer, Heidelberg (
          <year>2007</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Heinecke</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Smits</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chardenon</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Guimier De Neef</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Maillebuau</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Boualem</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>TiLT : plate-forme pour le traitement automatique des langues naturelles</article-title>
          .
          <source>TAL 49:2</source>
          (
          <year>2008</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Giunchiglia</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Marchese</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zaihrayeu</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          :
          <article-title>Encoding classifications into lightweight ontologies</article-title>
          .
          <source>Journal of Data Semantics VIII</source>
          (
          <year>2007</year>
          )
          <fpage>57</fpage>
          -
          <lpage>81</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Reiter</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hartung</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Frank</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>A resource-poor approach for linking ontology classes to Wikipedia articles</article-title>
          .
          <source>In: STEP</source>
          . (
          <year>2008</year>
          )
          <fpage>381</fpage>
          -
          <lpage>387</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Chagnoux</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Heinecke</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Aligner ontologies et langues naturelles. gérer la synonymie</article-title>
          .
          <source>In: Plateforme AFIA</source>
          , Grenoble (
          <year>2007</year>
          )
          <fpage>87</fpage>
          -
          <lpage>94</lpage>
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>