<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Linguistic Patterns for Information Extraction in OntoCmaps</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Amal Zouaq</string-name>
          <email>Amal.zouaq@rmc.ca</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Dragan Gasevic</string-name>
          <email>dgasevic@acm.org</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Marek Hatala</string-name>
          <email>mhatala@sfu.ca</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Athabasca University</institution>
          ,
          <addr-line>1 University Drive, Athabasca, AB T9S 3A3</addr-line>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Royal Military College of Canada</institution>
          ,
          <addr-line>CP 17000, Succ. Forces, Kingston</addr-line>
          ,
          <institution>ON Canada K7K 7B4</institution>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Simon Fraser University</institution>
          ,
          <addr-line>250-102nd Avenue, Surrey, BC</addr-line>
          <country country="CA">Canada</country>
          <addr-line>V3T 0A3</addr-line>
        </aff>
      </contrib-group>
      <abstract>
        <p>Linguistic patterns have proven their importance for the knowledge engineering field especially with the ever-increasing amount of available data. This is especially true for the Semantic Web, which relies on a formalization of knowledge into triples and linked data. This paper presents a number of syntactic patterns, based on dependency grammars, which output triples useful for the ontology learning task. Our experimental results show that these patterns are a good starting base for text mining initiatives in general and ontology learning in particular.</p>
      </abstract>
      <kwd-group>
        <kwd>Linguistic patterns</kwd>
        <kwd>dependency grammars</kwd>
        <kwd>triples</kwd>
        <kwd>knowledge extraction</kwd>
        <kwd>ontology learning</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        With the development of Semantic Web technologies and the increased number of
initiatives relative to the Web of data, there is a need to create reusable and high
quality ontologies. For this purpose, ontology design patterns (ODP) have been
modeled and span various aspects of ontological design such as architectural ODPs,
Content ODPs and Reengineering ODPs [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. These patterns enable the definition of
formal methodologies for ontology creation and maintenance. Of particular interest to
the knowledge engineering community are Lexico-Syntactic Patterns. In fact, with the
availability of large unstructured knowledge resources such as Wikipedia, it becomes
crucial to be able to extract ontological content using scalable (semi)automatic
knowledge extraction techniques. In this work, we consider that lexico-syntactic
patterns are syntactic structures that trigger the extraction of chunks of information.
Obviously, these patterns cannot be used in isolation and they necessitate various
filtering mechanisms [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] to identify ontological knowledge from this extracted textual
information. However, by constituting Lexico-syntactic ODP catalogs, it is likely that
these ODPs will be used and reused as the first building block of ontology learning
initiatives. So far, there are some ODPs repositories [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] but their number remains
modest.
      </p>
      <p>
        Trying to address this issue from a knowledge extraction perspective, this paper
introduces the main patterns that are used by OntoCmaps, an ontology learning tool
[
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. OntoCmaps exploits a pattern knowledge base that is domain independent.
OntoCmaps lexico-syntactic ODPs are based on a dependency grammar formalism [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]
that is well-suited for knowledge extraction. This paper describes the various ODPs
that are used to extract multi-word expressions, hierarchical relationships and
conceptual relationships that can later be promoted as domain knowledge and converted into
OWL1 format. The paper details each class of patterns and presents the accuracy of
each pattern prior to any filtering. Our results show that filtering techniques should be
used as a separate sieve on top of the ODPs to improve the accuracy of the extraction.
      </p>
    </sec>
    <sec id="sec-2">
      <sec id="sec-2-1">
        <title>Related Work</title>
        <p>ODPs are quite recent design patterns whose objective is to create modeling solutions
to well-identified problems in ontology engineering and thus promote good design. In
this paper, we are mostly interested in one subclass of ODPs: lexico-syntactic
patterns (LSPs). In particular, LSPs are meant to identify which patterns in texts
correspond to logical constructs of the OWL language. This type of pattern is essential
from a semi-automatic ontology engineering point of view. In fact, as soon as the
domain reaches real-world scale, it becomes difficult and very expensive to
manually design an ontology.</p>
        <p>
          In general LSPs are a widely used method in text mining and can be traced back to
the work of [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ] for hyponymy extraction. There have been several attempts to use
LSPs for ontology learning [
          <xref ref-type="bibr" rid="ref2 ref3 ref6 ref7">2, 3, 6, 7</xref>
          ], relation extraction [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ] or for axiom extraction
[
          <xref ref-type="bibr" rid="ref3">3</xref>
          ]. Among the most similar works to ours are SPRAT [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ] and [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ], which propose
various patterns for ontology learning and population. One of the peculiarities of our
approach is that we designed purely syntactic patterns that examine a larger number
of linguistic constructs (e.g. relative clause modifiers, adjectival complements,
copula, etc.) than what is available in the state of the art to extract information from text.
Moreover, our patterns are based solely on a dependency grammar combined with
part-of-speech tagging. We used a similar approach in a previous work [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ] on
semantic analysis, but the aim was not the extraction of triples for ontology learning, and the
patterns themselves were not structured and conceived in the same way. Finally, there
are also various LSPs identified on the ODP portal2 but, to the best of our knowledge,
the majority of the patterns in this paper are new with respect to the listed LSPs.
Overall, 29 of the 31 patterns presented in this paper are not listed on the ODP
portal. There is one common pattern for object property extraction (nsubj-dobj), which is
already widely used in the information extraction field, and one common pattern for
hierarchical relation extraction (nsubj-cop). Finally, in one case, there is a similarity
between one LSP used to extract sub-class/super-class relationships and one of
our patterns for hierarchical relationship extraction (with the use of the expression
"including" instead of "include"). In any case, one major difference is the use of
dependency relations in OntoCmaps.
1 http://www.w3.org/TR/owl-features/
2 http://ontologydesignpatterns.org/wiki/Category:ProposedLexicoSyntacticOP
        </p>
      </sec>
      <sec id="sec-2-2">
        <title>Lexico-Syntactic Patterns in OntoCmaps</title>
        <sec id="sec-2-2-1">
          <title>OntoCmaps</title>
          <p>
            OntoCmaps is an ontology learning tool that takes unstructured texts about a domain
of interest as input [
            <xref ref-type="bibr" rid="ref6">6</xref>
            ]. OntoCmaps is essentially based on two main stages: a
knowledge extraction step which relies on syntactic patterns to extract candidate
triples from texts, and a knowledge filtering step which acts as a sieve to identify
relevant triples among these candidates. Since this paper focuses on the knowledge
extraction part, this section presents the formalisms and tools used during the extraction
stage.
          </p>
          <p>
            As aforementioned, OntoCmaps patterns are mainly syntactic patterns which use a
dependency grammar formalism and part-of-speech tagging [
            <xref ref-type="bibr" rid="ref9">9</xref>
            ]. The dependency
analysis is obtained through the Stanford Parser [
            <xref ref-type="bibr" rid="ref5">5</xref>
            ] which defines a grammatical
relation hierarchy and outputs dependencies (we use the collapsed dependencies). A
dependency parse represents a set of grammatical relations (from this hierarchy) that
link each pair of related words in a sentence. Several examples of dependency parses
are provided in the following sections.
          </p>
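<p>As an illustration (not code from the paper), a collapsed dependency parse can be represented as a set of (relation, head, dependent) tuples, with each token encoded as (word, index, POS) following the paper's word-index/POS notation; the sentence and helper below are hypothetical:</p>

```python
# Illustrative sketch (not OntoCmaps code): a collapsed dependency parse
# represented as (relation, head, dependent) tuples, where each token is
# (word, index, pos), as in the paper's notation word-index/POS.
def parse_of(sentence_deps):
    """Index dependencies by relation name for easy pattern lookup."""
    by_rel = {}
    for rel, head, dep in sentence_deps:
        by_rel.setdefault(rel, []).append((head, dep))
    return by_rel

# "Artificial intelligence is a branch of computer science." (simplified)
deps = [
    ("amod", ("intelligence", 2, "NN"), ("Artificial", 1, "JJ")),
    ("nsubj", ("branch", 5, "NN"), ("intelligence", 2, "NN")),
    ("cop", ("branch", 5, "NN"), ("is", 3, "VB")),
    ("prep_of", ("branch", 5, "NN"), ("science", 8, "NN")),
]
parse = parse_of(deps)
print(sorted(parse))           # ['amod', 'cop', 'nsubj', 'prep_of']
print(parse["amod"][0][1][0])  # Artificial
```

A pattern matcher can then look up relations such as amod or prep_of directly in this index instead of rescanning the whole parse.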
        </sec>
        <sec id="sec-2-2-2">
          <title>Syntactic Patterns</title>
          <p>
            Patterns define specific syntactic configurations that link variables, constrained by
given parts-of-speech, using grammatical dependencies. Parts of speech constraints
allow filling in the variables with the right types of grammatical categories and
therefore are essential for the accuracy of the extraction [
            <xref ref-type="bibr" rid="ref8">8</xref>
            ]. Parts of speech are defined in
the Penn Treebank II3.
          </p>
          <p>Some patterns might overlap and are organized into a pattern hierarchy to trigger
the more detailed patterns first. When a parent pattern is instantiated, all its children
are disqualified for the current sentence, to avoid the extraction of meaningless
fragments. Patterns are interpreted (using Java methods) to extract triples that can
represent candidate domain ontological relationships. Triples also let OntoCmaps identify
potentially relevant domain terms (i.e. content words).</p>
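<p>The parent-first triggering described above can be sketched as follows; this is a hypothetical simplification (the actual implementation is in Java), with toy matchers standing in for real pattern logic:</p>

```python
# Hedged sketch of parent-first pattern triggering: when a parent pattern
# matches a sentence, its child patterns are skipped for that sentence.
class Pattern:
    def __init__(self, name, matcher, children=()):
        self.name, self.matcher, self.children = name, matcher, list(children)

def apply_patterns(roots, sentence):
    """Try each root pattern; descend to children only if the parent fails."""
    triples = []
    for p in roots:
        result = p.matcher(sentence)
        if result:
            triples.extend(result)  # parent fired: children disqualified
        else:
            triples.extend(apply_patterns(p.children, sentence))
    return triples

# Toy matchers over a set of relation names: the "parent" pattern requires
# more relations than its child, mirroring patterns (13) vs (12).
child = Pattern("nsubj-dobj",
                lambda s: [("X", "Y", "Z")] if {"nsubj", "dobj"} <= s else [])
parent = Pattern("nsubj-dobj-prep",
                 lambda s: [("X_Z_K", "Y", "A")] if {"nsubj", "dobj", "prep"} <= s else [],
                 [child])

print(apply_patterns([parent], {"nsubj", "dobj", "prep"}))  # parent only
print(apply_patterns([parent], {"nsubj", "dobj"}))          # falls back to child
```

This ordering is what prevents a general child pattern from extracting a meaningless fragment of a configuration already consumed by a more specific parent.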
          <p>The transformation rules (which can be considered as Transformation ODPs) focus
on triples to enable mappings with the OWL-DL language. OWL-DL is much more
limited than natural language; consequently, various syntactic configurations do not
have any equivalent in OWL-DL. For example, verb tenses cannot be represented.
The following general conventions are followed in OntoCmaps for generating possible
mappings: 1) Nouns and combinations of nouns, adjectives and adverbial modifiers
are converted into potential candidate classes; 2) Proper nouns are converted into
named entities (potential instances); 3) Comparative adjectives potentially map to an
OWL object property when domain and range are classes; 4) Transitive verbs map to
potential OWL object properties when domain and range are classes; they can also
map to datatype properties if the range is not considered a class; 5) Negation on a
verb between two identified classes maps to the OWL complement construct; 6) Verb
tenses, modals and particles are all aggregated in the label of a potential OWL object
property (verb); 7) The noun following a possessive pronoun is translated into an
OWL object property or a datatype property; 8) Determiners, quantifiers, and
comparative and superlative adverbs are ignored in OntoCmaps at this point.
3 http://www.cis.upenn.edu/~treebank/</p>
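<p>A minimal sketch of how a candidate triple might be mapped under these conventions; the function name and the pseudo-OWL output format are hypothetical, not the OntoCmaps API:</p>

```python
# Hypothetical sketch: mapping one candidate triple to OWL-DL-style
# constructs following the conventions above (nouns become candidate
# classes; a verb becomes an object property when domain and range are
# classes, and a datatype property otherwise).
def map_triple(pred, domain, rng, classes):
    """Return pseudo-OWL axioms for one candidate triple."""
    axioms = [f"Class({domain})"] if domain in classes else [f"Individual({domain})"]
    axioms += [f"Class({rng})"] if rng in classes else [f"Individual({rng})"]
    kind = "ObjectProperty" if domain in classes and rng in classes else "DatatypeProperty"
    axioms.append(f"{kind}({pred} domain={domain} range={rng})")
    return axioms

classes = {"content packaging", "content organization"}
print(map_triple("can define", "content packaging", "content organization", classes))
```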
          <p>There is no predefined meaning assigned to any of the extracted terms and
relationships. Term and relationship labels might have various morphological forms, but they are
all related to their root lemma. Therefore, various morphological structures with the
same root all relate to the same relation or term. Semantics is left underspecified or,
more precisely, specified by the domain context, since OntoCmaps takes a domain
corpus as input. If there are triples related to the term “bank”, then whether it is the
financial institution or the side of a river will be determined by the input corpus and
by the other extracted relationships. The following sections detail the patterns used
by OntoCmaps. A pattern is represented using the following convention:
Grammatical relation (Head-Index/POS, Dependent-Index/POS) → Transformation
─ Grammatical relation represents a dependency relation;
─ Head and Dependent are variable names;
─ POS represents a part-of-speech. Note that we use the generic part-of-speech NN
for all the noun parts-of-speech (NN, NNS) and the generic part-of-speech VB for
all the verbal parts-of-speech (VB, VBD, VBG, VBN, VBP and VBZ);
─ Index represents the position of the word in the sentence;
─ Transformation describes the resulting expression when this pattern is instantiated.</p>
        </sec>
        <sec id="sec-2-2-3">
          <title>Expression Extraction</title>
          <p>Simple and multi-word expressions (MWE) are considered candidate domain terms if
they occur in lexico-syntactic ODPs for hierarchical and conceptual relationship
extraction. The first step in OntoCmaps consists of aggregating MWE to generate a new
dependency graph composed of MWE linked by grammatical relations (Table 1). The
most common MWE are obtained through the patterns (1), (2) and (3).</p>
          <p>Examples (Table 1):
(1) nn(Systems-3/NN, Computer-1/NN), nn(Systems-3/NN, Operating-2/NN) → Computer operating systems
(2) amod(intelligence-2/NN, Artificial-1/JJ) → Artificial intelligence
(3) amod(systems-3/NN, Intelligent-1/JJ), nn(systems-3/NN, computing-2/NN) → Intelligent computing systems</p>
          <p>Note that Pattern (3) is a combination of (1) and (2).
(4) advmod(X/VBN, Y/RB), dobj(X/VBN, Z/NN) → Y_X_Z
Example: advmod(modified-2/VB, Experimentally-1/RB), dobj(modified-2/VB, cell-3/NN) → Experimentally modified cell
(5) prep_of/IN(X/NNP, Y/NNP) → X_of_Y
Example: prep_of/IN(University-2/NNP, Toronto-4/NNP) → University of Toronto
(6) prep_of/IN(X/NN, Y/NN) → X_of_Y
Example: prep_of/IN(page-2/NN, book-5/NN) → Page of book
Note that another possible transformation could be Y X (e.g. book page).</p>
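<p>The MWE aggregation behind patterns (1)-(3) can be sketched as an ordered merge of nn and amod dependents into one label; this is an illustrative reimplementation under assumed input shapes, not the original code:</p>

```python
# Sketch of MWE aggregation for patterns (1)-(3): nn and amod dependents
# of the same head are ordered by sentence position and merged with the
# head into a single multi-word expression label.
def aggregate_mwe(deps):
    """deps: list of (relation, (head, head_idx), (dep, dep_idx))."""
    parts = {}  # (head, head_idx) -> list of (dep_idx, dep_word)
    for rel, (head, h_i), (dep, d_i) in deps:
        if rel in ("nn", "amod"):
            parts.setdefault((head, h_i), []).append((d_i, dep))
    mwes = {}
    for (head, h_i), mods in parts.items():
        words = [w for _, w in sorted(mods)] + [head]  # sentence order
        mwes[head] = " ".join(words).lower().capitalize()
    return mwes

deps = [("amod", ("systems", 3), ("Intelligent", 1)),
        ("nn", ("systems", 3), ("computing", 2))]
print(aggregate_mwe(deps))  # {'systems': 'Intelligent computing systems'}
```

The resulting MWE labels replace their head nodes in the dependency graph before relationship extraction runs.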
          <p>We also designed two patterns for multi-word expressions containing the preposition
"of" ((5) and (6)). Pattern (5) extracts a named entity and is useful for ontology population
(rather than learning). The type of multi-word expression in (6) can be tricky, as the
Y part can represent the domain concept while the X part might be only an attribute
(e.g. the color of the car) or a part (the wheel of the car). However, if this MWE is not
important for the domain, it is likely that it will be sieved out during the filtering stage.</p>
        </sec>
        <sec id="sec-2-2-4">
          <title>Relationship Extraction</title>
          <p>Relationship extraction in OntoCmaps refers to hierarchical relationship
extraction (aka taxonomy or hyponymy extraction) and conceptual relationship extraction (OWL
object properties). Relationship extraction is run after a few other operations, mainly
the aggregation of multi-word expressions, the distributive interpretation of
conjunctions and the removal of function words such as determiners and quantifiers. These
prior operations produce a modified dependency graph used as input for relationship
extraction.</p>
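<p>For instance, the distributive interpretation of conjunctions can be sketched as duplicating each relation over conjoined terms while dropping function-word relations; this is assumed behavior for illustration (the paper does not give the exact algorithm):</p>

```python
# Sketch (assumed behavior, not the OntoCmaps algorithm): distribute
# relations over conj_and and drop det (determiner) relations, yielding
# the modified dependency graph used for relationship extraction.
def preprocess(deps):
    conj = {}  # word -> list of conjoined words
    for rel, head, dep in deps:
        if rel == "conj_and":
            conj.setdefault(head, []).append(dep)
    out = []
    for rel, head, dep in deps:
        if rel in ("det", "conj_and"):
            continue  # function words and consumed conjunctions are removed
        out.append((rel, head, dep))
        for alt in conj.get(dep, []):   # distribute over conjoined dependents
            out.append((rel, head, alt))
        for alt in conj.get(head, []):  # ...and over conjoined heads
            out.append((rel, alt, dep))
    return out

deps = [("det", "cat", "the"),
        ("nsubj", "run", "cat"),
        ("conj_and", "cat", "dog")]
print(preprocess(deps))  # [('nsubj', 'run', 'cat'), ('nsubj', 'run', 'dog')]
```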
        </sec>
        <sec id="sec-2-2-5">
          <title>Hierarchical Relationship Extraction.</title>
          <p>
            There have been many works [
            <xref ref-type="bibr" rid="ref10 ref2 ref4">2, 4, 10</xref>
            ] on hierarchical relationship extraction
using patterns. In OntoCmaps, hierarchical relationships are mapped to subclass
relations in OWL-DL.
          </p>
          <p>OntoCmaps reuses some of Hearst’s patterns (patterns (7) and (8) in Table 2) using the
dependency grammar formalism, parts-of-speech, and transformation rules. We also
create a hierarchical relationship from the multi-word expression pattern (3) (Table
1), which involves a noun compound modifier and an adjectival modifier (pattern (9),
Table 2), and from the multi-word expression pattern (4) (Table 1), thus obtaining
pattern (10) (Table 2). Finally, we designed one pattern based on the copula (pattern
(11), Table 2).</p>
        </sec>
        <sec id="sec-2-2-6">
          <title>Conceptual Relationships Extraction.</title>
          <p>Conceptual relationships refer to OWL object properties with a domain and range.
These relationships are among the most difficult to extract. We propose
dependency-based patterns that are divided into four main categories: main clauses (containing a
nominal subject nsubj), passive clauses (containing a passive nominal subject),
relative clauses (containing a relative clause modifier) and finally other clauses, which
group certain constructs not belonging to the other categories.</p>
          <p>Main clauses.</p>
          <p>Main clauses are organized around the main verb of the sentence. Pattern (12) has
already been referenced in the ODP portal. Pattern (13) enriches Pattern (12) with a
preposition attached to the main verb and allows the creation of two triples. Patterns
(16-18) use the xcomp relation (which indicates a clausal complement with an
external subject) to create a relationship between the main subject of the sentence
(nsubj) and its direct object (Patterns (16) and (18)) or its related agent (Pattern (17)).
Finally, Patterns (14) and (15) create a relationship between the nominal subject and
the object of the preposition.
à can define( content packaging, content
organizations)
nsubj(X/VB, Y/NN),
dobj(X/VB, Z/NN),
prep_K(X/VB, A/NN)à
X_Z_K(Y, A), X(Y,Z)(13)
nsubj(X/JJ, Y/NN),
prep_K(X/JJ, A/NN)à
X_K(Y,A) (14)
Nsubj(X/JJ, Y/NN),
cop(X/JJ, V/VB)
prep_P(X/JJ,
Z/NN)àX_P(Y,Z)(15)
nsubj(X/JJ, Y/NN),
cop(X/JJ, C/VB),
xcomp(X/JJ, V/VB),
dobj(V/VB, Z/NN),
à X_V(Y, Z) (16)
nsubj(X/JJ, Y/NN)
xcomp(X/JJ, V/VB)
agent(V/VB, Z/NN)à
X_V(Y,Z) (17)
Nsubj(X/VB, Y/NN),
dobj(X/VB, V/NN)
xcomp(X/VB, Z/VB),
dobj(Z/VB, N/NN)à</p>
          <p>X_V_Z(Y, N)(18)</p>
          <p>AICC has submitted CMI001 to the IEEE.
à has submitted(aicc , cmi001)
has submitted cmi001 to (aicc ,ieee)
The RTE describes the LMS requirements for
managing the runtime environment such as
standardized data model elements used for
passing information relevant to the learner's
experience with the content).
à relevant_to(information, experience)
These branches are visible to the LMS.
à visible to( branch, lms)
The Sequencing Control Choice element
indicates that the learner is free to choose any
activity in a cluster in any order without
restriction.
à free to choose (learner, activity)
The difficulty lies in the fact that the set of all
possible behaviors given all possible inputs is
too large to be covered by the set of observed
examples.
à too large to be covered by (set of possible
behavior, set of observed examples)
SCORM recognizes that some training
resources may contain internal logic to
accomplish a particular learning task à may contain
internal logic to accomplish (training resource,
learning task)
Passive clauses.</p>
          <p>Passive clauses allow the extraction of conceptual relationships (Table 4) and
sometimes of their inverse property. For instance, Pattern (19) (Table 4) can be used to
define such an inverse property for the relation "defined set of information - can be
tracked by - lms environment" by creating an OWL inverseOf relation: "lms
environment - can track - defined set of information".
Example: The data model element names shall be considered reserved tokens.
→ shall be considered(data model element names, reserved tokens)</p>
          <p>Relative clauses (see Table 5) are generally neglected by similar pattern-based
approaches, as they are often distant from their main subject. The relationships created
by our patterns in this category generally have lengthy labels, but they allow us to
find links between two candidate concepts that might otherwise be neglected.
(23) rcmod(X/NN, Y/VB), dobj(Y/VB, Z/NN), prep_P(Y/VB, Q/NN) → Y_Z_P(X, Q), Y(X, Z)
(24) nsubj(X/NN, Y/NN), rcmod(X/NN, V/VB), dobj(V/VB, Z/NN) → V(Y, Z)
(25) nsubj(X/NN, Y/NN), rcmod(X/NN, V/VB), dobj(V/VB, Z/NN), prep_P(V/VB, Q/NN) → V_Z_P(Y, Q)
Example: Learning Management System is a software that automates event administration through a set of services.
→ automates_event_administration_through(software, set_of_services)
→ automates(learning management system, event administration)
→ automates event administration through(learning management system, set of services)
(26) rcmod(X/NN, V1/VB), xcomp(V1/VB, V2/VB), dobj(V2/VB, Y/NN), prep_P(V2/VB, Z/NN) → V1_V2_Y_P(X, Z)
Example: "1484.11.1" is a standard that defines a set of data model elements that can be used to communicate information from a content object to an LMS.
→ can be used to communicate information from(set of data model element, content object)
(27) rcmod(X/NN, V/VB), dobj(V/VB, Z/NN) → V(X, Z)
Example: This keyword data model element can only be applied to a data model element that has children.
→ has(data model element, children)</p>
          <p>Relative clause patterns are organized around the dependency relation rcmod
and enable making links between the main subject and the rcmod direct object (Pattern
(24)) or a prepositional object (Patterns (23) and (25)) in the relative clause. Pattern (26)
links the noun of a relative clause modifier to the clausal complement
xcomp, and finally Pattern (27) links the noun of the relative clause modifier to its
direct object.</p>
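<p>Pattern (27), for example, can be matched with a straightforward scan over the dependency tuples; this is an illustrative sketch (variable names follow the paper's notation, the input encoding is assumed):</p>

```python
# Sketch of Pattern (27): rcmod(X/NN, V/VB), dobj(V/VB, Z/NN) -> V(X, Z).
def pattern_27(deps):
    """deps: list of (relation, (word, pos), (word, pos)) tuples."""
    triples = []
    for rel, (x, x_pos), (v, v_pos) in deps:
        # Generic NN/VB cover all noun/verb tags, as stated in Section 3.2.
        if rel == "rcmod" and x_pos.startswith("NN") and v_pos.startswith("VB"):
            for rel2, (h, _), (z, z_pos) in deps:
                if rel2 == "dobj" and h == v and z_pos.startswith("NN"):
                    triples.append((v, x, z))  # V(X, Z)
    return triples

# "...a data model element that has children."
deps = [("rcmod", ("data model element", "NN"), ("has", "VBZ")),
        ("dobj", ("has", "VBZ"), ("children", "NNS"))]
print(pattern_27(deps))  # [('has', 'data model element', 'children')]
```

The other patterns differ only in which relations are joined and in how the transformation assembles the predicate label.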
          <p>Other clauses.</p>
          <p>Finally, there are some clauses around infinitival modifiers (infmod) and participial
modifiers (partmod) that modify their noun phrase and allow us to create
conceptual relationships (Table 6) when they have a direct object or a preposition. Note
again that Pattern (31) (Table 6), which has an agent dependency, can lead to an OWL
inverse property similar to the one explained for passive clauses.
(28) infmod(X/NN, Y/VB), dobj(Y/VB, Z/NN) → Y(X, Z)
Example: This value can be requested by the SCO to determine the next index position.
→ determine(sco, next index position)
(29) partmod(X/NN, V/VB), dobj(V/VB, Y/NN) → V(X, Y)
Example: A SCO can communicate with an LMS using the SCORM RTE.
→ using(lms, scorm rte)
(30) partmod(X/NN, V/VB), prep_P(V/VB, Y/NN) → V_P(X, Y)
Example: ...to describe the components used in a learning experience.
→ used in(components, learning experience)
(31) partmod(X/NN, V/VB), agent(V/VB, Y/NN) → V(X, Y)
Example: All data model elements described by SCORM are ...
→ described by(data model elements, scorm)</p>
        </sec>
      </sec>
      <sec id="sec-2-3">
        <title>Evaluation</title>
        <p>
          In order to evaluate our patterns, we essentially relied on two corpora used in previous
experiments [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ]: the SCORM corpus, which is a set of manuals on the SCORM
eLearning standard, and the AI corpus, which is a set of Wikipedia pages about
artificial intelligence. We previously generated and validated two OWL ontologies from
these corpora and we consider them as our gold standards (GSs). Details about these
GSs can be found in [6]4. We then calculated the precision of the various patterns
based on these GSs.
        </p>
      </sec>
    </sec>
    <sec id="sec-3">
      <p>Precision = number of relationships or concepts generated by a pattern (A) that
exist in the GS / total number of relationships or concepts generated by (A)</p>
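<p>This per-pattern precision can be sketched as follows, assuming set-valued extractions and a gold-standard set (names hypothetical); it uses the standard definition, correct extractions over all extractions, which matches the reported percentages:</p>

```python
# Sketch of per-pattern precision: the fraction (in %) of a pattern's
# extractions that also appear in the gold standard.
def precision(extracted, gold):
    if not extracted:
        return 0.0
    unique = set(extracted)
    return 100.0 * len(unique & set(gold)) / len(unique)

# Hypothetical extractions for one pattern, as (predicate, domain, range).
extracted = [("is_a", "ai", "field"), ("defines", "scorm", "rte")]
gold = {("is_a", "ai", "field")}
print(round(precision(extracted, gold), 2))  # 50.0
```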
    </sec>
    <sec id="sec-4">
      <p>Tables 7-11 report the various results of each pattern category5. Each table presents,
for each pattern in the category, the precision of the extracted relationships and
concepts in both corpora. One must note that some of the relationships extracted by
the patterns were perfectly valid (from a lexical point of view) but were not found in the
GS, thus reducing the reported precision. Another point is that concepts are simple or
multi-word expressions that occur in a relationship. Therefore, we were able to
calculate concept precision as well by identifying how many concepts involved in the
extracted relationships were in the GSs.
We can notice that the precision of hierarchical relationships and their corresponding
concepts is quite high (Table 7).
4 Available at http://azouaq.athabascau.ca/goldstandards.htm
5 Extraction examples for each pattern can be found at
http://azouaq.athabascau.ca/experiments/wop2012/SCORMPatterns_WOP2012.xls and
AIPatterns_WOP2012.xls</p>
      <p>Table 7. Precision (%) of the hierarchical patterns:
Pattern    Relations SCORM   Relations AI   Concepts SCORM   Concepts AI
           43.18             25.00          80.58            70.00
           60.58             54.54          86.04            73.58
           38.77             37.50          82.10            80.77
Average    47.51             26.85          82.91            74.78</p>
      <p>
        These results give a general idea of the precision of the lexico-syntactic patterns.
As we have previously mentioned, and as the results confirm, there is a need to
filter the various extractions using statistical and/or graph-based metrics [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. The
most frequent patterns were nsubj-dobj, nsubjpass-prep, nsubj-dobj-prep, nsubj-prep
and partmod-prep. The most precise (but scarcer) patterns, without any filtering, were
the hierarchical patterns. One important observation is the quite high precision of
concepts, even without any filtering. Regarding relationships, if the
concepts of interest are known upfront, then these patterns should be very useful for
discovering relationships between these predefined concepts. This will be tackled in
future work. We also created a few patterns for attribute extraction involving
possessives, or a nominal subject and copula with adjectives. However, the way to translate
these attribute relationships into OWL-DL was not straightforward.
      </p>
      <sec id="sec-4-1">
        <title>Conclusion</title>
        <p>This paper presented a list of the main patterns used in OntoCmaps, our ontology
learning tool. These patterns target specific syntactic structures in a dependency
representation and are useful for the extraction of multi-word expressions and triples that
can later be translated into OWL classes and properties. Some simplifying
assumptions were made in OntoCmaps, mainly the removal of determiners and the lack of
co-reference resolution, which should be addressed in future work. In their current state,
our patterns represent a good starting base that any researcher in text mining might
use, especially the ontology learning community, which lacks clear and reusable
design patterns. Overall, future efforts will tackle whether and how a more fine-grained
semantic analysis would be beneficial to the ontology learning task. Another future
task will be to extend the coverage of our patterns by extracting frequently occurring
syntactic structures using machine learning methods. Finally, one of the lessons
learned in this paper is that such pattern-based extraction should necessarily be
coupled with a filtering mechanism to increase the precision of the extractions.
Acknowledgments. This research was funded by the NSERC Discovery Grant
Program.</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Blomqvist</surname>
          </string-name>
          , E.:
          <article-title>Semi-automatic Ontology Construction based on Patterns</article-title>
          .
          <source>PhD Thesis</source>
          . Linköping University, Department of Computer and Information Science (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Presutti</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          et al.:
          <article-title>A Library of Ontology Design Patterns: Reusable Solutions for Collaborative Design of Networked Ontologies</article-title>
          .
          <source>NeOn D2.5</source>
          .
          <issue>1</issue>
          (
          <year>2008</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Völker</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hitzler</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          &amp;
          <string-name>
            <surname>Cimiano</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>Acquisition of OWL DL axioms from lexical resources</article-title>
          .
          <source>In: Proc. of the 4th European Semantic Web Conf.</source>
          , pp.
          <fpage>670</fpage>
          -
          <lpage>685</lpage>
          , Springer (
          <year>2007</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Hearst</surname>
          </string-name>
          , M.:
          <article-title>Automatic Acquisition of Hyponyms from Large Text Corpora</article-title>
          .
          <source>In Proc. of the 14th International Conf. on Computational Linguistics</source>
          , pp.
          <fpage>539</fpage>
          -
          <lpage>545</lpage>
          (
          <year>1992</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>De Marneffe</surname>
          </string-name>
          , M.-C., MacCartney, B. &amp;
          <string-name>
            <surname>Manning</surname>
            ,
            <given-names>C. D.</given-names>
          </string-name>
          :
          <article-title>Generating Typed Dependency Parses from Phrase Structure Parses</article-title>
          .
          <source>In Proc. of LREC</source>
          , pp.
          <fpage>449</fpage>
          -
          <lpage>454</lpage>
          (
          <year>2006</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Zouaq</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gasevic</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          &amp;
          <string-name>
            <surname>Hatala</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Towards open ontology learning and filtering</article-title>
          .
          <source>Inf. Syst</source>
          .
          <volume>36</volume>
          (
          <issue>7</issue>
          ):
          <fpage>1064</fpage>
          -
          <lpage>1081</lpage>
          (
          <year>2011</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Maynard</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Funk</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          &amp;
          <string-name>
            <surname>Peters</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          :
          <article-title>Using Lexico-Syntactic Ontology Design Patterns for ontology creation and population</article-title>
          .
          <source>CEUR-WS.org</source>
          , (
          <year>2009</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Zouaq</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gagnon</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          &amp;
          <string-name>
            <surname>Ozell</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          :
          <article-title>Semantic Analysis using Dependency-based Grammars and Upper-Level Ontologies</article-title>
          ,
          <source>International Journal of Computational Linguistics and Applications</source>
          ,
          <volume>1</volume>
          (
          <issue>1</issue>
          -2):
          <fpage>85</fpage>
          -
          <lpage>101</lpage>
          ,
          <publisher-name>Bahri Publications</publisher-name>
          (
          <year>2010</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Toutanova</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Klein</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Manning</surname>
            ,
            <given-names>C. D.</given-names>
          </string-name>
          &amp;
          <string-name>
            <surname>Singer</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          :
          <article-title>Feature-Rich Part-of-Speech Tagging with a Cyclic Dependency Network</article-title>
          .
          <source>In Proc. of HLT-NAACL</source>
          , pp.
          <fpage>252</fpage>
          -
          <lpage>259</lpage>
          (
          <year>2003</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Snow</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jurafsky</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          &amp;
          <string-name>
            <surname>Ng</surname>
            ,
            <given-names>A. Y.</given-names>
          </string-name>
          :
          <article-title>Learning syntactic patterns for automatic hypernym discovery</article-title>
          .
          <source>In: Advances in Neural Information Processing Systems</source>
          , pp.
          <fpage>1297</fpage>
          -
          <lpage>1304</lpage>
          (
          <year>2004</year>
          ).
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>