<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Ontologies for Multilingual Extraction</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Deryle W. Lonsdale</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>David W. Embley</string-name>
          <email>embley@cs.byu.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Stephen W. Liddle</string-name>
          <email>liddle@byu.edu</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Computer Science, Brigham Young University</institution>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Information Systems, Brigham Young University</institution>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Linguistics &amp; English Lang., Brigham Young University</institution>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2010</year>
      </pub-date>
      <fpage>26</fpage>
      <lpage>30</lpage>
      <abstract>
        <p>In our global society, multilingual barriers sometimes prohibit and often discourage people from accessing a wider variety of goods and services. We propose multilingual extraction ontologies as an approach to resolving these issues. Our ontologies provide a conceptual framework for a narrow domain of interest. Grounding narrow-domain ontologies linguistically enables them to map relevant utterances and text to meaningful concepts in the ontology. Our prior work includes leveraging large-scale lexicons and terminology resources for grounding and augmenting ontological content [14]. Linguistically grounding ontologies in multiple languages enables cross-language communication within the scope of the various ontologies' domains. We quantify the success of linguistically grounded ontologies by measuring precision and recall of extracted concepts, and we can gauge the success of automated cross-linguistic-mapping construction by measuring the speed of creation and the accuracy of generated lexical resources.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>INTRODUCTION</title>
      <p>Though English has so far served as the principal
language for Internet use (with currently 28.7% of all
users), its relative importance is rapidly diminishing.
Chinese users, for example, comprise 21.7% of Internet
users and their growth in numbers between 2000 and
2009 has been 1,018.7%; the growth in Spanish users
has been 631.3% over the last decade. Since more
people want to access web information in more languages,
this poses a substantial challenge and opportunity for
research and business organizations whose interest is in
providing multilingual access to web content.</p>
      <p>The BYU Data Extraction research Group (DEG)1
has worked for years on tools—such as its Ontology
Extraction System (OntoES)—to enable access to web
content of various types: car advertisements,
obituaries, clinical trial data, and biomedical information. The
group to date has focused on English web data, while
1This work was funded in part by U.S. National
Science Foundation grants for the TIDIE (IIS-0083127)
and TANGO (IIS-0414644) projects.
understanding the eventual need to extend OntoES to
other languages. This appears to be an opportune time
for our group to enter the area of multilingual
information extraction and show how the DEG infrastructure
is poised to make significant contributions in this area
as it has already has in extracting English information.</p>
      <p>There are currently a few efforts in the area of
multilingual information extraction. Some focus on very
narrow domains, such as technical information for oil
drilling and exploration in Norwegian and English.
Others are more general but involve more than two
languages, such as accessing European train system
schedules. The U.S. government (NIST TREC), the
European Union (7th Framework CLEF), and Japan
(NTCIR) all have initiatives to help further the development
and evaluation of multilingual information retrieval and
data extraction systems. Of course, Google and other
companies interested in web content and market share
are enabling multilingual access to the Internet.</p>
      <p>Almost all of the existing efforts involve a typical
scenario that might include: collecting a query in the user’s
language, translating that query into the language of
the web pages to be searched, locating the answers, and
then returning the relevant results to the user or to
someone who can help the user understand their
content. This approach is fraught with problems since
machine translation (MT), a core component in the
process, is still a developing technology.</p>
      <p>For reasons discussed below, we believe that our
approach has technical and linguistic merit, and can
introduce a fresh perspective on multilingual information
extraction. Our ontology-based techniques are ideal for
extracting content in various languages without
having to rely directly on MT. By carefully developing the
knowledge resources necessary, we can extend
DEG-type processing to other languages in a modular fashion.</p>
    </sec>
    <sec id="sec-2">
      <title>THE ONTOLOGY-BASED APPROACH</title>
    </sec>
    <sec id="sec-3">
      <title>Extraction Ontologies</title>
      <p>
        Just over a decade ago, the BYU Data-Extraction
research Group (DEG) began its work on information
extraction. In a 1999 paper, DEG researchers described
an efficacious way to combine ontologies with simple
natural language processing [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ].<fn id="fn2"><label>2</label><p>Recently, others have begun to combine ontologies with
natural language processing [<xref ref-type="bibr" rid="ref11 ref2">11, 2</xref>]. The combination
has been called “linguistically grounding ontologies.”</p></fn>
The idea is to declare a narrow-domain ontology for an application of
interest and augment its concepts with linguistic
recognizers. Coupling recognizers with conceptual modeling
turns a conceptual ontology into an extraction
ontology. When applied to data-rich semi-structured text, an
extraction ontology recognizes linguistic elements that
identify concept instances for the object and
relationship sets in the ontology’s conceptual model. We call
our system OntoES, the Ontology-based Extraction System.
      </p>
      <p>Consider, for example, a typical car ad. Its content
can be modeled with a conceptual ontology such as that
shown in Figure 1. With linguistic recognizers added for
concepts such as Make, Model, Year, Price, and Mileage,
the domain ontology becomes an extraction ontology.</p>
      <p>
        We have developed a form-based tool [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ] that helps
users develop ontologies, including declaring
recognizers and associating them with ontological concepts.
It also permits users to specify regular expressions that
recognize traditional value phrases for car prices such as
“$15,900”, “7,595”, and “$9500”, with optional dollar
signs and commas. Users can also declare additional
recognizers for other expected price expressions such as “15
grand”. To help make recognizers more precise, users
can declare exception expressions, left and right
context expressions, units expressions, and even keyword
phrases such as “MSRP” and “our price” to help sort
out various prices that might appear. Figure 2 shows
snippets from recognizer declarations for car ads data.
      </p>
      <preformat>Price
  internal representation: Integer
  external representation: \$[1-9]\d{0,2},?\d{3}
    | \d?\d [Gg]rand | ...
  context keywords: price|asking|obo|neg(\.|otiable)|...
  ...

LessThan(p1: Price, p2: Price) returns (Boolean)
  context keywords: (less than|&lt;|under|...)\s*{p2} |...
  ...

Make
  ...
  external representation: CarMake.lexicon
  ...</preformat>
      <p>Applying the recognizers of all the concepts in a
car-ads extraction ontology to a car ad annotates, extracts,
and organizes the facts from that ad. The result is a
machine-readable cache of facts that users can query or
use to perform data analysis or other automated tasks.<fn id="fn3"><label>3</label><p>See http://deg.byu.edu for a working online
demonstration of the system.</p></fn></p>
      <p>
        To verify that a carefully designed extraction
ontology for car ads can indeed annotate, extract, and
organize facts for query and analysis, DEG researchers have
conducted experiments with hundreds of car ads from
various on-line sources containing thousands of fact
instances. In one experiment, when an existing OntoES
car ads ontology was hand-tuned on a corpus of 100
development documents and then tested on an unseen
corpus of about 110 car ads, the system extracted 1003
attributes with recall measures of 94% and
precision measures nearing 100% [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ].
      </p>
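      <p>The following Python sketch gives a rough idea of how a Price declaration like the one in Figure 2 could be applied to text. It is not the OntoES implementation: the function names, the 12-character context window, the rule that a bare number counts as a price only when a context keyword is nearby, and the optional dollar sign (added to match the “7,595” example in the prose) are all assumptions made for illustration.</p>
      <preformat><![CDATA[
```python
import re

# External representation, adapted from the Price declaration in Figure 2.
# Assumption: the dollar sign is optional, as the prose example "7,595" suggests.
PRICE_VALUE = re.compile(r"\$?\b[1-9]\d{0,2},?\d{3}\b|\b\d?\d [Gg]rand\b")

# Context keywords copied from Figure 2 (the elided "..." alternatives are omitted).
CONTEXT = re.compile(r"price|asking|obo|neg(\.|otiable)", re.IGNORECASE)

WINDOW = 12  # hypothetical width, in characters, of the left/right context


def to_integer(phrase):
    """Convert a matched phrase to the internal Integer representation."""
    if phrase.lower().endswith("grand"):
        return int(phrase.split()[0]) * 1000
    return int(phrase.replace("$", "").replace(",", ""))


def recognize_prices(text):
    """Return the prices found in text, as integers, in order of appearance."""
    prices = []
    for match in PRICE_VALUE.finditer(text):
        phrase = match.group()
        left = text[max(0, match.start() - WINDOW):match.start()]
        right = text[match.end():match.end() + WINDOW]
        # A dollar sign or "grand" is taken as sufficient evidence on its own;
        # a bare number needs a context keyword close by.
        explicit = phrase.startswith("$") or phrase.lower().endswith("grand")
        if explicit or CONTEXT.search(left) or CONTEXT.search(right):
            prices.append(to_integer(phrase))
    return prices
```
]]></preformat>
      <p>Under these assumptions, a bare number such as a year or a mileage figure is ignored unless a keyword such as “asking” appears within the context window, which is the disambiguating role the text assigns to context keywords.</p>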
      <p>Recently, DEG researchers have experimented with
information extraction in Japanese. Figure 3 shows an
OntoES extraction ontology that can extract
information from Japanese car ads analogous to the English
one shown earlier. The concept names are in Japanese
as are the regular-expression recognizers. Yen amounts
range from 10,000 yen to 9,999,999 yen whereas dollar
amounts range from $100 to $99,999. The critical
observation is that the structure of the Japanese ontology
is identical to the structure of the English ontology.</p>
      <p>
        This type of ontology-based matching across languages
at the lexical level indicates a possible strategy for
providing a cross-linguistic bridge through concepts rather
than relying on traditional means of translation.
Similar approaches have been tried in such areas as machine
translation (e.g. [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]) and cross-linguistic information
retrieval [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ].
      </p>
      <p>As currently implemented, OntoES extraction
ontologies can “read” and “write” in any single language. The
car-ad examples here are in English and Japanese, but
extraction ontologies work the same for all languages.
To “read” means to recognize instance values for
ontological concepts, to extract them, and to appropriately
link related values together based on the associated
conceptual relationships and constraints. To “write” means
to list the facts recorded in the ontological structure.
Having “read” a typical car ad, OntoES might write:</p>
      <preformat>Year: 1984
Make: Dodge
Model: W100
Price: $2,000
Feature: 4x4
Feature: Pickup
Accessory: 12.5x35” mud tires</preformat>
      <p>In addition, based on the constraints, OntoES knows
and can write several meta statements about an
ontology. Examples: “an Accessory is a Feature” (white
triangles denote hyponym/hypernym is-a constraints);
“Trim is part of ModelTrim” (black triangles denote
meronym/holonym is-part-of constraints); “Car has at
most one Make” (the participation constraint 0:1 on
Car for Make denotes that Car objects in car ads
associate with Make names between 0 and 1 times, or “at
most once”).</p>
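      <p>One way to picture how such meta statements could be generated from constraints is the small Python sketch below. The class names, field names, and the describe function are hypothetical and chosen only for this illustration; they do not reflect how OntoES stores its conceptual model.</p>
      <preformat><![CDATA[
```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IsA:
    """Hyponym/hypernym constraint (white triangle in the ontology diagram)."""
    child: str
    parent: str


@dataclass(frozen=True)
class PartOf:
    """Meronym/holonym constraint (black triangle in the ontology diagram)."""
    part: str
    whole: str


@dataclass(frozen=True)
class Participation:
    """Participation constraint minimum:maximum of subject in a relationship."""
    subject: str
    related: str
    minimum: int
    maximum: int


def article(word):
    """Pick 'a' or 'an' from the first letter (a simplification)."""
    return "an" if word[0].upper() in "AEIOU" else "a"


def describe(constraint):
    """Render one constraint as an English meta statement."""
    if isinstance(constraint, IsA):
        return (f"{article(constraint.child)} {constraint.child} "
                f"is {article(constraint.parent)} {constraint.parent}")
    if isinstance(constraint, PartOf):
        return f"{constraint.part} is part of {constraint.whole}"
    if isinstance(constraint, Participation):
        bounds = (constraint.minimum, constraint.maximum)
        if bounds == (0, 1):
            quantity = "at most one"
        elif bounds == (1, 1):
            quantity = "exactly one"
        else:
            quantity = f"between {bounds[0]} and {bounds[1]}"
        return f"{constraint.subject} has {quantity} {constraint.related}"
    raise TypeError(f"unsupported constraint: {constraint!r}")
```
]]></preformat>
      <p>With these definitions, describe(Participation("Car", "Make", 0, 1)) yields the statement “Car has at most one Make” quoted above.</p>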
      <p>As currently implemented, however, OntoES cannot
read in one language and write in another. This
cross-linguistic ability to read in one language and then
translate to and write in another language is the essence of
our multilingual-oriented development. For example,
we expect to be able to read the price in yen from a
Japanese car ad and write “Price: $24,124” and to read
the Kanji symbols for the make and write “Make:
Mitsubishi”. To assure this level of functionality, we need
to encode unit or currency conversion routines for
values like Price and to encode cross-linguistic lexicons for
named entities such as Make. In principle, encoding
this cross-linguistic mapping is currently possible, but it
represents a fair amount of manual effort. We are
currently finding ways to largely automate this mapping.
In addition, we are adding two other capabilities to the
system that will similarly enhance extraction and query
processing: compound recognizers and patterns.</p>
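      <p>The read-in-Japanese, write-in-English step described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the lexicon entries, the function names, and in particular the exchange rate of 100 yen per dollar are placeholders, not values or routines taken from OntoES, and the facts are assumed to be already keyed by language-neutral concept names.</p>
      <preformat><![CDATA[
```python
# Hypothetical cross-linguistic lexicon for the Make concept
# (Japanese surface form -> English surface form).
MAKE_LEXICON = {"三菱": "Mitsubishi", "日産": "Nissan", "トヨタ": "Toyota"}

# Placeholder exchange rate (yen per US dollar); an assumption for the
# example, not a figure from the paper.
YEN_PER_DOLLAR = 100.0


def yen_to_dollars(yen, yen_per_dollar=YEN_PER_DOLLAR):
    """Convert a yen amount to whole dollars."""
    return round(yen / yen_per_dollar)


def write_in_english(japanese_facts):
    """Render (concept, value) facts read from a Japanese ad as English lines."""
    lines = []
    for concept, value in japanese_facts:
        if concept == "Price":
            # Currency conversion routine for Price values.
            lines.append(f"Price: ${yen_to_dollars(value):,}")
        elif concept == "Make":
            # Cross-linguistic lexicon lookup; unknown makes pass through as-is.
            lines.append(f"Make: {MAKE_LEXICON.get(value, value)}")
        else:
            lines.append(f"{concept}: {value}")
    return lines
```
]]></preformat>
      <p>With the placeholder rate, write_in_english([("Make", "三菱"), ("Price", 2412400)]) produces the lines “Make: Mitsubishi” and “Price: $24,124”. The manual effort mentioned in the text corresponds to filling in tables such as MAKE_LEXICON for every named-entity concept and language pair.</p>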
      <p>
        Compound recognizers allow OntoES to directly
recognize ontological relationships beyond simple concepts.
For a query like “Find Nissans for sale with years
between 1995 and 2005,” we need to recognize each of
the years as well as the between constraint that relates
them. Our previous work has implemented compound
recognizers for operators in free-form queries [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], but we
now seek to linguistically ground these types of
ontological relationships.
      </p>
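      <p>A compound recognizer for the “between” constraint in the query above might look like the following Python sketch. It follows the style of the LessThan declaration in Figure 2, where an operator pattern embeds a placeholder for an operand recognizer. The Year pattern, the function name, and the shape of the returned dictionary are assumptions for illustration only.</p>
      <preformat><![CDATA[
```python
import re

# Hypothetical recognizer for the Year concept (four-digit years 1900-2099).
YEAR = r"(?:19|20)\d{2}"

# Compound recognizer: an operator phrase with two embedded Year operands.
BETWEEN = re.compile(rf"between\s+({YEAR})\s+and\s+({YEAR})", re.IGNORECASE)


def recognize_year_between(query):
    """Return the Between constraint on Year found in query, or None."""
    match = BETWEEN.search(query)
    if match is None:
        return None
    # Sort so that the bounds are in order even if the user reversed them.
    low, high = sorted(int(year) for year in match.groups())
    return {"operator": "Between", "concept": "Year", "low": low, "high": high}
```
]]></preformat>
      <p>The point of the sketch is that the recognizer returns the relationship among the two years, not just two isolated Year instances.</p>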
      <p>
        Patterns will allow OntoES to identify and extract
from structured text. For example, car ads often
appear as a table with Price in one column, Year in
another column, and Make and Model in a third column.
Detecting patterns in documents will allow OntoES to
apply specialized extraction rules and likely improve
extraction accuracy. By extending our work with table
patterns [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], we expect to fully exploit patterns in text.
      </p>
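      <p>As a simple illustration of pattern detection in a table of car ads, the Python sketch below labels each column with the concept whose recognizer matches most of its cells. This majority-vote heuristic, the three recognizers, and the tiny make list are assumptions made for the example; they are not the table-interpretation algorithm of the cited work.</p>
      <preformat><![CDATA[
```python
import re
from collections import Counter

# Hypothetical cell-level recognizers; the make list is a stand-in for a lexicon.
RECOGNIZERS = {
    "Price": re.compile(r"\$[1-9]\d{0,2},?\d{3}"),
    "Year": re.compile(r"(?:19|20)\d{2}"),
    "MakeModel": re.compile(r"(?:Dodge|Nissan|Toyota)\s+\S+"),
}


def label_columns(rows):
    """Label each column with the concept matching a majority of its cells."""
    labels = []
    for column in zip(*rows):  # transpose rows into columns
        votes = Counter()
        for cell in column:
            for concept, pattern in RECOGNIZERS.items():
                if pattern.fullmatch(cell.strip()):
                    votes[concept] += 1
        if votes:
            concept, count = votes.most_common(1)[0]
            # Require a strict majority before trusting the label.
            labels.append(concept if count * 2 > len(column) else None)
        else:
            labels.append(None)
    return labels
```
]]></preformat>
      <p>Once a column is labeled, a cell that no recognizer matches, such as a price listed as “call”, can still be associated with the right concept through its position in the table.</p>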
    </sec>
    <sec id="sec-4">
      <title>Multilingual Mappings</title>
      <p>We are extending, in a principled way, the
cross-linguistic effectiveness of our OntoES system by
adapting it for use in processing data-rich documents in
languages other than English. Though the OntoES system
was originally designed to handle English-language
documents, it was implemented according to standard
web-related software engineering principles and best
practices: version control, integrated development
environments, standardized data markup and encoding (XML,
RDF, and OWL), Unicode character representation, and
tractability (SWRL rules and Pellet-based reasoning).
Consequently, we anticipate that internationalization of
the system should be relatively straightforward, not
requiring wholesale rewrites of crucial components. This
should allow us to handle web pages in any language,
given appropriate linguistic knowledge sources. Since
OntoES does not need to parse out the grammatical
structure of webpage text, only lower-level lexical
(word-based) information is necessary for linguistic processing.</p>
      <p>The system’s lexical knowledge is highly modular,
with specific resources encoded as user-selectable
lexicons. The information used to build up existing
content for the English lexicons includes a mix of implicit
knowledge and existing resources. Some lexicon entries
were created by students during class and project work;
other entries were developed from existing lexical
resources (e.g. the US Census Bureau for personal names,
the World Factbook for country names, Ethnologue for
language names, etc.). We are developing analogous
lexicons for other languages, and adapting OntoES as
necessary to accommodate them in its processing. As was
the case for English, this involves some hand-crafting of
relevant material, as well as finding and converting
existing data sources in other languages for targeted types
of lexical information. Often this is relatively
straightforward: for example, WordNet is a sizable and
important component for English OntoES, and similar and
compatible resources exist for other languages.
However, we also need to rely on linguistic knowledge and
experience to find, convert, and implement appropriate
cross-linguistic lexical resources.</p>
      <p>
        In the realm of cross-linguistic extraction systems,
OntoES has a clear advantage. We claim that
ontologies, which lie at the crux of our extraction approach,
can serve as viable interlinguas. We are currently
substantiating this claim. Since an ontology represents a
conceptualization of items and relationships of interest
(e.g. interesting properties of a car, information needed
to set up a doctor’s appointment, etc.), a given ontology
should be appropriate cross-linguistically with perhaps
occasionally some slight cultural adaptation. For
example, in our prior work on extraction from obituaries [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]
we found that worldwide cultural and dialect differences
were readily apparent even in English material. Certain
terms for events like “tenth day kriya”, “obsequies”,
and “cortege” were found only in English obituaries
announcing events outside of America. Since our lexical
resources serve as a “grounding” of the lowest-level
concepts from ontologies with the lexical content of the web
pages, substituting one language’s lexicon for another’s
provides OntoES with a true cross-linguistic capability.
      </p>
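      <p>The idea of keeping the ontology fixed while substituting one language’s lexicon for another’s can be sketched in Python as follows. The two-concept ontology, the lexicon entries, and especially the Japanese yen pattern are hypothetical and serve only to show the structure; real lexicons would be far larger and loaded from external resources.</p>
      <preformat><![CDATA[
```python
import re

# Shared conceptual structure: the same concepts in every language.
ONTOLOGY = ("Make", "Price")

# Language-specific grounding (all entries are illustrative assumptions).
LEXICONS = {
    "en": {
        "Make": ["Mitsubishi", "Nissan"],
        "Price": r"\$[1-9]\d{0,2},?\d{3}",
    },
    "ja": {
        "Make": ["三菱", "日産"],
        "Price": r"[1-9]\d{0,2}(?:,\d{3}){1,2}円",
    },
}


def build_recognizers(language):
    """Compile the recognizers for one language from its lexicon."""
    lexicon = LEXICONS[language]
    makes = "|".join(re.escape(name) for name in lexicon["Make"])
    return {"Make": re.compile(makes), "Price": re.compile(lexicon["Price"])}


def read(text, language):
    """Return the surface strings recognized for each ontology concept."""
    recognizers = build_recognizers(language)
    return {concept: recognizers[concept].findall(text) for concept in ONTOLOGY}
```
]]></preformat>
      <p>Only the argument naming the language changes between read(text, "en") and read(text, "ja"); the set of concepts and the shape of the result stay the same.</p>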
    </sec>
    <sec id="sec-5">
      <title>Ongoing Work</title>
      <p>Our current work involves several separate but related
tasks. We are locating annotated corpora in other
languages amenable for evaluation purposes, and collecting
and annotating interesting multilingual web material of
our own. We are also developing prototype lexicons
and recognizers for these target languages. Of course,
our work requires us to develop and adapt prototype
ontologies for target languages for sample concepts in
data-rich domains.</p>
      <p>In addition, we are enhancing extraction ontologies
by enabling them to (1) explicitly discover and extract
relationships among object instances of interest, and (2)
discover patterns of interest from which they can more
certainly identify and extract both object instances and
relationship instances of interest. This involves
devising, investigating, designing, coding, and evaluating
algorithms for compound recognizers and for pattern
discovery and patterned information extraction.</p>
      <p>Finally, we are evaluating system performance using
standard metrics and gold-standard annotated data.</p>
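      <p>For reference, precision and recall over extracted facts can be computed as in the Python sketch below. Representing each fact as a (document, concept, value) tuple and requiring exact matches against the gold standard are assumptions made for this example.</p>
      <preformat><![CDATA[
```python
def precision_recall(extracted, gold):
    """Return (precision, recall) of extracted facts against gold facts.

    Both arguments are iterables of hashable facts, for example
    (document_id, concept, value) tuples; matching is exact.
    """
    extracted, gold = set(extracted), set(gold)
    true_positives = len(extracted & gold)
    precision = true_positives / len(extracted) if extracted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall
```
]]></preformat>
      <p>For instance, if the gold standard for one ad holds four facts and the system extracts three of them and nothing else, precision is 1.0 and recall is 0.75.</p>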
    </sec>
    <sec id="sec-6">
      <title>CONCLUSION</title>
      <p>
        Though our multilingual extraction work is an interesting
effort in its own right, we expect it to also contribute
to our larger effort to create a Web of Knowledge [
        <xref ref-type="bibr" rid="ref7 ref9">7, 9</xref>
        ].
Our research centers around resolving some of the tough
technical issues involved in a community-wide effort to
deploy the semantic web [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ] and proceeds in concert with efforts
at Yahoo!, Google, and elsewhere to extract information
from the web and integrate it into community portals to
enable community members to better discover, search,
query, and track interesting community information [
        <xref ref-type="bibr" rid="ref10 ref13 ref3">3,
10, 13</xref>
        ]. Multilingual extraction ontologies have the
far-reaching potential to play a significant role as
semantic-web work finds its way into mainstream use in global
communities.
      </p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>M.</given-names>
            <surname>Al-Muhammed</surname>
          </string-name>
          and
          <string-name>
            <given-names>D.</given-names>
            <surname>Embley</surname>
          </string-name>
          .
          <article-title>Ontology-based constraint recognition for free-form service requests</article-title>
          .
          <source>In Proceedings of the 23rd International Conference on Data Engineering (ICDE'07)</source>
          , pages
          <fpage>366</fpage>
          -
          <lpage>375</lpage>
          , Istanbul, Turkey,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>P.</given-names>
            <surname>Buitelaar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Cimiano</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Haase</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Sintek</surname>
          </string-name>
          .
          <article-title>Towards linguistically grounded ontologies</article-title>
          .
          <source>In Proceedings of the 6th European Semantic Web Conference (ESWC'09)</source>
          , pages
          <fpage>111</fpage>
          -
          <lpage>125</lpage>
          , Heraklion, Greece,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>P.</given-names>
            <surname>DeRose</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Shen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Doan</surname>
          </string-name>
          , and
          <string-name>
            <given-names>R.</given-names>
            <surname>Ramakrishnan</surname>
          </string-name>
          .
          <article-title>Building structured web community portals: A top-down, compositional, and incremental approach</article-title>
          .
          <source>In Proceedings of the 33rd Very Large Database Conference (VLDB'07)</source>
          , pages
          <fpage>23</fpage>
          -
          <lpage>28</lpage>
          , Vienna, Austria,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>B. J.</given-names>
            <surname>Dorr</surname>
          </string-name>
          .
          <source>Machine Translation: A View from the Lexicon</source>
          . MIT Press, Cambridge, MA,
          <year>1993</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>D.</given-names>
            <surname>Embley</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Campbell</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Jiang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Liddle</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Lonsdale</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.-K.</given-names>
            <surname>Ng</surname>
          </string-name>
          , and
          <string-name>
            <given-names>R.</given-names>
            <surname>Smith</surname>
          </string-name>
          .
          <article-title>Conceptual-model-based data extraction from multiple-record web pages</article-title>
          .
          <source>Data &amp; Knowledge Engineering</source>
          ,
          <volume>31</volume>
          (
          <issue>3</issue>
          ):
          <fpage>227</fpage>
          -
          <lpage>251</lpage>
          ,
          <year>1999</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>D.</given-names>
            <surname>Embley</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Campbell</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Liddle</surname>
          </string-name>
          , and
          <string-name>
            <given-names>R.</given-names>
            <surname>Smith</surname>
          </string-name>
          .
          <article-title>Ontology-based extraction and structuring of information from data-rich unstructured documents</article-title>
          .
          <source>In Proceedings of the 7th International Conference on Information and Knowledge Management (CIKM'98)</source>
          , pages
          <fpage>52</fpage>
          -
          <lpage>59</lpage>
          , Washington D.C.,
          <year>1998</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>D.</given-names>
            <surname>Embley</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Liddle</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Lonsdale</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Nagy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Tijerino</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Clawson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Crabtree</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Ding</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Jha</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Lian</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Lynn</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Padmanabhan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Peters</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Tao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Watts</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Woodbury</surname>
          </string-name>
          ,
          and
          <string-name>
            <given-names>A.</given-names>
            <surname>Zitzelberger</surname>
          </string-name>
          .
          <article-title>A conceptual-model-based computational alembic for a web of knowledge</article-title>
          .
          <source>In Proceedings of the 27th International Conference on Conceptual Modeling (ER08)</source>
          , pages
          <fpage>532</fpage>
          -
          <lpage>533</lpage>
          ,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>D.</given-names>
            <surname>Embley</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Tao</surname>
          </string-name>
          , and
          <string-name>
            <given-names>S.</given-names>
            <surname>Liddle</surname>
          </string-name>
          .
          <article-title>Automating the extraction of data from HTML tables with unknown structure</article-title>
          .
          <source>Data &amp; Knowledge Engineering</source>
          ,
          <volume>54</volume>
          (
          <issue>1</issue>
          ):
          <fpage>3</fpage>
          -
          <lpage>28</lpage>
          ,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>D.</given-names>
            <surname>Embley</surname>
          </string-name>
          and
          <string-name>
            <given-names>A.</given-names>
            <surname>Zitzelberger</surname>
          </string-name>
          .
          <article-title>Theoretical foundations for enabling a web of knowledge</article-title>
          .
          <source>In Proceedings of the 6th International Symposium on Foundations of Information and Knowledge Systems (FoIKS10)</source>
          , Sofia, Bulgaria,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>A.</given-names>
            <surname>Halevy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Norvig</surname>
          </string-name>
          , and
          <string-name>
            <given-names>F.</given-names>
            <surname>Pereira</surname>
          </string-name>
          .
          <article-title>The unreasonable effectiveness of data</article-title>
          .
          <source>IEEE Intelligent Systems, March/April</source>
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>L.</given-names>
            <surname>Hunter</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Lu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Firby</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Baumgartner</surname>
            <suffix>Jr.</suffix>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Johnson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Ogren</surname>
          </string-name>
          , and
          <string-name>
            <given-names>K.</given-names>
            <surname>Cohen</surname>
          </string-name>
          .
          <article-title>OpenDMAP: An open source, ontology-driven, concept analysis engine, with applications to capturing knowledge regarding protein transport, protein interactions and cell-type-specific gene expression</article-title>
          .
          <source>BMC Bioinformatics</source>
          ,
          <volume>9</volume>
          (
          <issue>8</issue>
          ),
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>K.</given-names>
            <surname>Kishida</surname>
          </string-name>
          .
          <article-title>Technical issues of cross-language information retrieval: A review</article-title>
          .
          <source>Information Processing and Management: an International Journal</source>
          ,
          <volume>41</volume>
          :
          <fpage>433</fpage>
          -
          <lpage>455</lpage>
          ,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>R.</given-names>
            <surname>Kumar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Pang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Ramakrishnan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Tomkins</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Bohannon</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Keerthi</surname>
          </string-name>
          , and
          <string-name>
            <given-names>S.</given-names>
            <surname>Merugu</surname>
          </string-name>
          .
          <article-title>A web of concepts</article-title>
          .
          In
          <source>Proceedings of the 2009 Symposium on Principles of Database Systems</source>
          , pages
          <fpage>1</fpage>
          -
          <lpage>12</lpage>
          , Providence, RI,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>D.</given-names>
            <surname>Lonsdale</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. W.</given-names>
            <surname>Embley</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Ding</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Xu</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Hepp</surname>
          </string-name>
          .
          <article-title>Reusing ontologies and language components for ontology generation</article-title>
          .
          <source>Data &amp; Knowledge Engineering</source>
          ,
          <volume>69</volume>
          :
          <fpage>318</fpage>
          -
          <lpage>330</lpage>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>C.</given-names>
            <surname>Tao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Embley</surname>
          </string-name>
          , and
          <string-name>
            <given-names>S.</given-names>
            <surname>Liddle</surname>
          </string-name>
          .
          <article-title>FOCIH: Form-based ontology creation and information harvesting</article-title>
          .
          In
          <source>Proceedings of the 28th International Conference on Conceptual Modeling (ER 2009)</source>
          , pages
          <fpage>346</fpage>
          -
          <lpage>359</lpage>
          , Gramado, Brazil,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <collab>W3C (World Wide Web Consortium)</collab>
          .
          <source>Semantic Web Activity Page</source>
          . http://www.w3.org/2001/sw/.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>