=Paper=
{{Paper
|id=None
|storemode=property
|title=Linguistic Patterns for Information Extraction in OntoCmaps
|pdfUrl=https://ceur-ws.org/Vol-929/paper6.pdf
|volume=Vol-929
|dblpUrl=https://dblp.org/rec/conf/semweb/ZouaqGH12
}}
==Linguistic Patterns for Information Extraction in OntoCmaps==
Amal Zouaq<sup>1,2</sup>, Dragan Gasevic<sup>2,3</sup>, Marek Hatala<sup>3</sup>

<sup>1</sup> Royal Military College of Canada, CP 17000, Succ. Forces, Kingston, ON Canada K7K 7B4

<sup>2</sup> Athabasca University, 1 University Drive, Athabasca, AB T9S 3A3

<sup>3</sup> Simon Fraser University, 250-102nd Avenue, Surrey, BC Canada V3T 0A3

Amal.zouaq@rmc.ca, dgasevic@acm.org, mhatala@sfu.ca

===Abstract===
Linguistic patterns have proven their importance for the knowledge engineering field, especially with the ever-increasing amount of available data. This is especially true for the Semantic Web, which relies on a formalization of knowledge into triples and linked data. This paper presents a number of syntactic patterns, based on dependency grammars, which output triples useful for the ontology learning task. Our experimental results show that these patterns are a good starting base for text mining initiatives in general and ontology learning in particular.

'''Keywords:''' Linguistic patterns, dependency grammars, triples, knowledge extraction, ontology learning.

===1 Introduction===
With the development of Semantic Web technologies and the increased number of initiatives related to the Web of data, there is a need to create reusable and high-quality ontologies. For this purpose, ontology design patterns (ODPs) have been modeled and span various aspects of ontological design, such as architectural ODPs, Content ODPs and Reengineering ODPs [1]. These patterns enable the definition of formal methodologies for ontology creation and maintenance. Of particular interest to the knowledge engineering community are Lexico-Syntactic Patterns. In fact, with the availability of large unstructured knowledge resources such as Wikipedia, it becomes crucial to be able to extract ontological content using scalable (semi)automatic knowledge extraction techniques. In this work, we consider that lexico-syntactic patterns are syntactic structures that trigger the extraction of chunks of information. Obviously, these patterns cannot be used in isolation, and they require various filtering mechanisms [6] to identify ontological knowledge in this extracted textual information. However, by constituting lexico-syntactic ODP catalogs, it is likely that these ODPs will be used and reused as the first building block of ontology learning initiatives. So far, there are some ODP repositories [2] but their number remains modest.

Trying to address this issue from a knowledge extraction perspective, this paper introduces the main patterns that are used by OntoCmaps, an ontology learning tool [6]. OntoCmaps exploits a pattern knowledge base that is domain independent. OntoCmaps lexico-syntactic ODPs are based on a dependency grammar formalism [5] that is well suited for knowledge extraction. This paper describes the various ODPs that are used to extract multi-word expressions, hierarchical relationships and conceptual relationships that can later be promoted as domain knowledge and converted into OWL<sup>1</sup> format. The paper details each class of patterns and presents the accuracy of each pattern prior to any filtering. Our results show that filtering techniques should be used in a separate sieve on top of the ODPs to improve the accuracy of the extraction.

===2 Related Work===
ODPs are fairly recent design patterns whose objective is to create modeling solutions to well-identified problems in ontology engineering and thus promote good design. In this paper, we are mostly interested in one subclass of ODPs: lexico-syntactic patterns (LSPs). In particular, LSPs are meant to identify which patterns in texts correspond to logical constructs of the OWL language. This type of pattern is essential from a semi-automatic ontology engineering point of view. In fact, as soon as the domain becomes a real-world problem, it becomes difficult and very expensive to design an ontology manually.
In general, LSPs are a widely used method in text mining and can be traced back to the work of [4] on hyponymy extraction. There have been several attempts to use LSPs for ontology learning [2, 3, 6, 7], relation extraction [7] or axiom extraction [3]. Among the works most similar to ours are SPRAT [7] and [2], which propose various patterns for ontology learning and population. One of the peculiarities of our approach is that we designed purely syntactic patterns that examine a larger number of linguistic constructs (e.g. relative clause modifiers, adjectival complements, copula, etc.) than what is available in the state of the art to extract information from text. Moreover, our patterns are based solely on a dependency grammar combined with part-of-speech tagging. We used a similar approach in a previous work [8] on semantic analysis, but the aim was not the extraction of triples for ontology learning, and the patterns themselves were not structured and conceived in the same way.

Finally, there are also various LSPs identified on the ODP portal<sup>2</sup>, but to the best of our knowledge, the majority of the patterns in this paper are new with respect to the listed LSPs. Overall, 29 of the 31 patterns presented in this paper are not listed on the ODP portal. There is one common pattern for object property extraction (nsubj-dobj), which is already widely used in the information extraction field, and one common pattern for hierarchical relation extraction (nsubj-cop). Finally, in one case, there is a similarity between one LSP used to extract sub-class/super-class relationships and one of our patterns for hierarchical relationship extraction (with the use of the expression "including" instead of "include"). In any case, one major difference is the use of dependency relations in OntoCmaps.

<sup>1</sup> http://www.w3.org/TR/owl-features/

<sup>2</sup> http://ontologydesignpatterns.org/wiki/Category:ProposedLexicoSyntacticOP

===3 Lexico-Syntactic Patterns in OntoCmaps===

====3.1 OntoCmaps====
OntoCmaps is an ontology learning tool that takes unstructured texts about a domain of interest as input [6]. OntoCmaps is essentially based on two main stages: a knowledge extraction step which relies on syntactic patterns to extract candidate triples from texts, and a knowledge filtering step which acts as a sieve to identify relevant triples among these candidates. Since this paper focuses on the knowledge extraction part, this section presents the formalisms and tools used during the extraction stage.

As aforementioned, OntoCmaps patterns are mainly syntactic patterns which use a dependency grammar formalism and part-of-speech tagging [9]. The dependency analysis is obtained through the Stanford Parser [5] which defines a grammatical relations hierarchy and outputs dependencies (we use the collapsed dependencies). A dependency parse represents a set of grammatical relations (from this hierarchy) that link each pair of related words in a sentence. Several examples of dependency parses are provided in the following sections.

====3.2 Syntactic Patterns====
Patterns define specific syntactic configurations that link variables, constrained by given parts-of-speech, using grammatical dependencies. Parts of speech constraints allow filling in the variables with the right types of grammatical categories and therefore are essential for the accuracy of the extraction [8]. Parts of speech are defined in the Penn Treebank II<sup>3</sup>.

Some patterns might overlap and are organized into a pattern hierarchy to trigger the more detailed patterns first. When a parent pattern is instantiated, all its children are disqualified for the current sentence, to avoid the extraction of meaningless fragments.
Patterns are interpreted (using Java methods) to extract triples that can represent candidate domain ontological relationships. Triples also let OntoCmaps identify potentially relevant domain terms (i.e. content words). The transformation rules (which can be considered as Transformation ODPs) focus on triples to enable mappings with the OWL-DL language. OWL-DL is much more limited than natural language. Consequently, various syntactic configurations do not have any equivalent in OWL-DL. For example, verb tense cannot be represented. There are general conventions that are followed in OntoCmaps for generating possible mappings:
# Nouns and combinations of nouns, adjectives and adverbial modifiers are converted into potential candidate classes;
# Proper nouns are converted into named entities (potential instances);
# Comparative adjectives potentially map to an OWL Object Property when domain and range are classes;
# Transitive verbs map to potential OWL Object Properties when domain and range are classes. They can also map to data type properties if the range is not considered as a class;
# Negation on a verb between two identified classes maps to the OWL complement construct;
# Verb tenses, modals and particles are all aggregated in the label of a potential OWL object property (verb);
# The noun following a possessive pronoun is translated into an OWL Object Property or a data type property;
# Determiners, quantifiers, comparative and superlative adverbs are ignored in OntoCmaps at this point.

<sup>3</sup> http://www.cis.upenn.edu/~treebank/

There is no predefined meaning assigned to any of the extracted terms and relationships. Term and relationship labels might have various morphological forms but are all related to their root lemma. Therefore, various morphological structures with the same root all relate to the same relation or term.
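
The mapping conventions listed above can be read as a small lookup from Penn Treebank tags to candidate OWL constructs. The following Python sketch is only an illustration under stated assumptions: the function name `candidate_owl_construct`, the returned labels and the exact set of tags treated as ignored are hypothetical and are not taken from OntoCmaps, which interprets its patterns with Java methods.

```python
from typing import Optional

# Assumption: determiners and quantifiers are tagged DT or PDT, and
# comparative/superlative adverbs are tagged RBR or RBS (Penn Treebank).
IGNORED_TAGS = {"DT", "PDT", "RBR", "RBS"}


def candidate_owl_construct(pos: str, range_is_class: bool = True) -> Optional[str]:
    """Return an illustrative OWL construct label for a Penn Treebank tag.

    `range_is_class` states whether the range of the relation was
    identified as a class; it only matters for verbs and comparatives.
    """
    if pos in IGNORED_TAGS:
        return None  # ignored at this point
    if pos in ("NNP", "NNPS"):
        return "NamedIndividual"  # proper nouns: potential instances
    if pos in ("NN", "NNS"):
        return "Class"  # nouns: potential candidate classes
    if pos == "JJR":
        # comparative adjectives: object property between two classes
        return "ObjectProperty" if range_is_class else None
    if pos.startswith("VB"):
        # verbs: object property, or data type property if the range is not a class
        return "ObjectProperty" if range_is_class else "DatatypeProperty"
    return None  # no convention stated for other tags


print(candidate_owl_construct("NNS"))
print(candidate_owl_construct("VBZ", range_is_class=False))
```

The result is only a candidate: as stated above, no meaning is assigned at this stage, and the later filtering step decides which candidates are kept.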
Semantics is left underspecified, or more specifically is specified by the domain context, since OntoCmaps takes a domain corpus as input. If there are triples related to the term "bank", then whether it is the financial institution or the side of a river will be determined by the input corpus and by the other extracted relationships.

The following sections detail the patterns used by OntoCmaps. A pattern is represented using the following convention:

Grammatical relation (Head-Index/POS, Dependent-Index/POS) → Transformation

* Grammatical relation represents a dependency relation;
* Head and Dependent are variable names;
* POS represents a part-of-speech. Note that we use the generic part-of-speech NN for all the noun parts-of-speech (NN, NNS) and the generic part-of-speech VB for all the verbal parts-of-speech (VB, VBD, VBG, VBN, VBP and VBZ);
* Index represents the position of the word in the sentence;
* Transformation describes the resulting expression when this pattern is instantiated.

====3.3 Expression Extraction====
Simple and multi-word expressions (MWE) are considered candidate domain terms if they occur in lexico-syntactic ODPs for hierarchical and conceptual relationship extraction. The first step in OntoCmaps consists of aggregating MWE to generate a new dependency graph composed of MWE linked by grammatical relations (Table 1). The most common MWE are obtained through the patterns (1), (2) and (3).

{| class="wikitable"
|+ Table 1. Multi-word expression patterns
! Pattern !! Example
|-
| (1) nn(X/NN, Y/NN) → Y_X
| nn(Systems-3/NN, Computer-1/NN), nn(Systems-3/NN, Operating-2/NN) → Computer operating systems
|-
| (2) amod(X/NN, Y/JJ) → Y_X
| amod(intelligence-2/NN, Artificial-1/JJ) → Artificial Intelligence
|-
| (3) nn(X,Y), amod(X,Z) → Z_Y_X
| amod(systems-3/NN, Intelligent-1/JJ), nn(systems-3/NN, computing-2/NN) → Intelligent computing systems. Note that Pattern (3) is a combination of (1) and (2).
|-
| (4) advmod(X/VBN, Y/RB), dobj(X/VBN, Z/NN) → Y_X_Z
| advmod(modified-2/VB, Experimentally-1/RB), dobj(modified-2/VB, cell-3/NN) → Experimentally modified cell
|-
| (5) prep_of/IN(X/NNP, Y/NNP) → X_of_Y
| prep_of/IN(University-2/NNP, Toronto-4/NNP) → University of Toronto
|-
| (6) prep_of/IN(X/NN, Y/NN) → X_of_Y
| prep_of/IN(page-2/NN, book-5/NN) → Page of book. Note that another possible transformation could be Y X (e.g. book page).
|}

We also designed two patterns for multi-word expressions containing the preposition "of", (5) and (6). Pattern (5) extracts a named entity and is useful for ontology population (rather than learning). The type of multi-word expression in (6) can be tricky, as the Y part can represent the domain concept and the X part might be only an attribute (e.g. the color of the car) or a part (the wheel of the car). However, if this MWE is not important for the domain, it is likely that it will be sieved out during the filtering stage.

====3.4 Relationship Extraction====
Relationship extraction in OntoCmaps refers to hierarchical relationship extraction (aka taxonomy or hyponymy) and conceptual relationship extraction (OWL Object Properties). Relationship extraction is run after a few other operations, mainly the aggregation of multi-word expressions, the distributive interpretation of conjunctions and the removal of function words such as determiners and quantifiers. These prior operations produce a modified dependency graph used as input for relationship extraction.

=====Hierarchical Relationship Extraction=====
There have been many works [2, 4, 10] on hierarchical relationship extraction using patterns. In OntoCmaps, hierarchical relationships are mapped to subclasses in OWL-DL. OntoCmaps reuses some of Hearst's patterns (patterns (7) and (8) in Table 2) using the dependency grammar formalism, parts of speech, and transformation rules.
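
As an illustration of how a Hearst-style pattern can be expressed over typed dependencies, the following Python sketch applies patterns (7) and (8) of Table 2, i.e. rel(X/NNS, Y/NNS) → is-a(Y, X) for the collapsed relations prep_such_as and prep_including. It is a minimal sketch, not the OntoCmaps implementation: the `Dep` record and the function `extract_is_a` are hypothetical names, and the input is assumed to be a dependency graph in which conjunctions have already been distributed.

```python
from typing import List, NamedTuple, Tuple


class Dep(NamedTuple):
    """One typed dependency, e.g. prep_such_as(animals-1/NNS, monkeys-4/NNS)."""
    rel: str
    head: str
    head_pos: str
    dependent: str
    dep_pos: str


# Collapsed prepositional relations used by patterns (7) and (8).
HIERARCHICAL_RELATIONS = {"prep_such_as", "prep_including"}


def extract_is_a(deps: List[Dep]) -> List[Tuple[str, str, str]]:
    """Apply rel(X/NNS, Y/NNS) -> is-a(Y, X) to every matching dependency."""
    triples = []
    for d in deps:
        # The part-of-speech constraint follows Table 2 literally (plural nouns).
        if d.rel in HIERARCHICAL_RELATIONS and d.head_pos == "NNS" and d.dep_pos == "NNS":
            triples.append((d.dependent, "is-a", d.head))
    return triples


# "Animals such as monkeys and apes", with the conjunction already distributed.
parse = [
    Dep("prep_such_as", "animals", "NNS", "monkeys", "NNS"),
    Dep("prep_such_as", "animals", "NNS", "apes", "NNS"),
]
print(extract_is_a(parse))
# [('monkeys', 'is-a', 'animals'), ('apes', 'is-a', 'animals')]
```

Keeping the relation names in a set makes it easy to see that the two patterns differ only in the preposition that the parser collapsed into the relation label.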
We also create a hierarchical relationship from the multi-word expression pattern (3) (Table 1), which involves a noun compound modifier and an adjectival modifier (pattern (9), Table 2), and from the multi-word expression pattern (4) (Table 1), thus obtaining pattern (10) (Table 2). Finally, we designed one pattern based on the copula (pattern (11), Table 2).

{| class="wikitable"
|+ Table 2. Hierarchical relation patterns
! Pattern !! Example
|-
| (7) prep_such_as/IN(X/NNS, Y/NNS) → is-a(Y, X)
| Animals such as monkeys and apes: prep_such_as/IN(animals-1/NNS, monkeys-4/NNS), prep_such_as/IN(animals-1/NNS, apes-6/NNS) → is-a(monkeys, animals); is-a(apes, animals)
|-
| (8) prep_including/IN(X/NNS, Y/NNS) → is-a(Y, X)
| All buildings, including houses and castles…: prep_including/IN(buildings-2/NNS, houses-5/NNS) → is-a(houses, buildings); is-a(castles, buildings)
|-
| (9) nn(X,Y), amod(Y,Z) → is-a(Y, X)
| Intelligent computing systems…: amod(systems-3, Intelligent-1), nn(systems-3, computing-2) → is-a(Intelligent computing systems, computing systems) → is-a(Computing systems, systems)
|-
| (10) advmod(X/VBN, Y/RB), dobj(X/VBN, Z/NN) → is-a(Y_X_Z, Z)
| Genetically modified food: advmod(modified-2, Genetically-1), dobj(modified-2, food-3) → is-a(Genetically modified food, food)
|-
| (11) nsubj(X/NNS, Y/NNS), cop(X/NNS, Z/VBP) → is-a(Y, X)
| Penguins are birds: nsubj(birds-3/NNS, Penguins-1/NNS), cop(birds-3/NNS, are-2/VBP) → is-a(Penguins, Birds)
|}

=====Conceptual Relationships Extraction=====
Conceptual relationships refer to OWL Object Properties with a domain and range. These relationships are among the most difficult to extract. We propose dependency-based patterns that are divided into four main categories: main clauses (containing a nominal subject nsubj), passive clauses (containing a passive nominal subject), relative clauses (containing a relative clause modifier) and finally other clauses, which group certain constructs not belonging to the other categories.

'''Main clauses.''' Main clauses are organized around the main verb of the sentence.
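
To make the main-clause patterns concrete, the following Python sketch applies patterns (12) and (13) of Table 3 and tries the more detailed pattern (13) first, so that the more general pattern (12) is skipped for a sentence once (13) has fired. This is a simplified illustration under several assumptions: dependencies are plain (relation, head, dependent) tuples, part-of-speech constraints are assumed to have been checked already, the verb label is assumed to be aggregated with its auxiliaries beforehand, and all function names are hypothetical.

```python
from typing import Dict, List, Tuple

Dep = Tuple[str, str, str]     # (relation, head, dependent); POS checks omitted
Triple = Tuple[str, str, str]  # (label, subject, object)


def args_by_head(deps: List[Dep]) -> Dict[str, Dict[str, str]]:
    """Index dependents by head word and relation name."""
    index: Dict[str, Dict[str, str]] = {}
    for rel, head, dependent in deps:
        index.setdefault(head, {})[rel] = dependent
    return index


def pattern_13(deps: List[Dep]) -> List[Triple]:
    """nsubj(X,Y), dobj(X,Z), prep_K(X,A) -> X(Y,Z), X_Z_K(Y,A)."""
    triples: List[Triple] = []
    for verb, args in args_by_head(deps).items():
        preps = [(rel, dep) for rel, dep in args.items() if rel.startswith("prep_")]
        if "nsubj" in args and "dobj" in args and preps:
            subj, obj = args["nsubj"], args["dobj"]
            triples.append((verb, subj, obj))
            for rel, target in preps:
                prep = rel[len("prep_"):]
                triples.append((f"{verb} {obj} {prep}", subj, target))
    return triples


def pattern_12(deps: List[Dep]) -> List[Triple]:
    """nsubj(X,Y), dobj(X,Z) -> X(Y,Z)."""
    triples: List[Triple] = []
    for verb, args in args_by_head(deps).items():
        if "nsubj" in args and "dobj" in args:
            triples.append((verb, args["nsubj"], args["dobj"]))
    return triples


def extract_main_clause(deps: List[Dep]) -> List[Triple]:
    """Return the triples of the first (most detailed) pattern that matches."""
    for pattern in (pattern_13, pattern_12):
        triples = pattern(deps)
        if triples:
            return triples
    return []


# "AICC has submitted CMI001 to the IEEE."
sentence = [
    ("nsubj", "has submitted", "aicc"),
    ("dobj", "has submitted", "cmi001"),
    ("prep_to", "has submitted", "ieee"),
]
print(extract_main_clause(sentence))
# [('has submitted', 'aicc', 'cmi001'), ('has submitted cmi001 to', 'aicc', 'ieee')]
```

Ordering the patterns in a tuple is one simple way to encode a pattern hierarchy; a real implementation would need a richer structure once patterns overlap in more than one direction.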
Pattern (12) has already been referenced in the ODP portal. Pattern (13) enriches Pattern (12) with a preposition attached to the main verb and allows the creation of two triples. Patterns (16) to (18) use the xcomp relationship (which indicates a clausal complement with an external subject) to create a relationship between the main subject of the sentence (nsubj) and its direct object (Patterns (16) and (18)) or its related agent (Pattern (17)). Finally, Patterns (14) and (15) create a relationship between the nominal subject and the object of the preposition.

{| class="wikitable"
|+ Table 3. Main clause patterns for conceptual relationship extraction
! Pattern !! Example
|-
| (12) nsubj(X/VB, Y/NN), dobj(X/VB, Z/NN) → X(Y,Z)
| Content packaging can define content organizations. → can define(content packaging, content organizations)
|-
| (13) nsubj(X/VB, Y/NN), dobj(X/VB, Z/NN), prep_K(X/VB, A/NN) → X_Z_K(Y, A), X(Y,Z)
| AICC has submitted CMI001 to the IEEE. → has submitted(aicc, cmi001); has submitted cmi001 to(aicc, ieee)
|-
| (14) nsubj(X/JJ, Y/NN), prep_K(X/JJ, A/NN) → X_K(Y,A)
| The RTE describes the LMS requirements for managing the runtime environment such as standardized data model elements used for passing information relevant to the learner's experience with the content. → relevant_to(information, experience)
|-
| (15) nsubj(X/JJ, Y/NN), cop(X/JJ, V/VB), prep_P(X/JJ, Z/NN) → X_P(Y,Z)
| These branches are visible to the LMS. → visible to(branch, lms)
|-
| (16) nsubj(X/JJ, Y/NN), cop(X/JJ, C/VB), xcomp(X/JJ, V/VB), dobj(V/VB, Z/NN) → X_V(Y, Z)
| The Sequencing Control Choice element indicates that the learner is free to choose any activity in a cluster in any order without restriction. → free to choose(learner, activity)
|-
| (17) nsubj(X/JJ, Y/NN), xcomp(X/JJ, V/VB), agent(V/VB, Z/NN) → X_V(Y,Z)
| The difficulty lies in the fact that the set of all possible behaviors given all possible inputs is too large to be covered by the set of observed examples. → too large to be covered by(set of possible behavior, set of observed examples)
|-
| (18) nsubj(X/VB, Y/NN), dobj(X/VB, V/NN), xcomp(X/VB, Z/VB), dobj(Z/VB, N/NN) → X_V_Z(Y, N)
| SCORM recognizes that some training resources may contain internal logic to accomplish a particular learning task. → may contain internal logic to accomplish(training resource, learning task)
|}

'''Passive clauses.''' Passive clauses allow the extraction of conceptual relationships (Table 4) and sometimes of their inverse property. For instance, pattern (19) (Table 4) can be used to define such an inverse property for the relation "defined set of information - can be tracked by - lms environment" by creating an OWL inverseOf relation: "lms environment - can track - defined set of information".

{| class="wikitable"
|+ Table 4. Passive clause patterns for conceptual relationship extraction
! Pattern !! Example
|-
| (19) nsubjpass(V/VB, Y/N), agent(V/VB, Z/N) → V(Y,Z)
| The purpose of establishing a common data model is to ensure that a set of information can be tracked by different LMS environments. → can be tracked by(defined set of information, lms environment)
|-
| (20) nsubjpass(V/JJ, X/NN), cop(V/JJ, V/VB), prep_P(V/JJ, Y/NN) → V_P(X,Y)
| Sequencing information is defined on the Activities and is external to the training resources associated with those Activities. → external to(sequencing information, training resource)
|-
| (21) nsubjpass(V/VB, X/NN), prep_P(V/VB, Y/NN) → V_P(X,Y)
| Metadata can be applied to Assets. → can be applied to(metadata, assets)
|-
| (22) nsubjpass(V/VB, X/NN), dobj(V/VB, Y/NN) → V(X,Y)
| The data model element names shall be considered reserved tokens. → shall be considered(data model element names, reserved tokens)
|}

'''Relative clauses.''' Relative clauses (see Table 5) are generally neglected by similar pattern-based approaches, as they are often distant from their main subject.
The relationships created by our patterns in this category have generally lengthy labels, but they allow us to find links between two candidate concepts that might be otherwise neglected.

{| class="wikitable"
|+ Table 5. Relative clause patterns for conceptual relationship extraction
! Pattern !! Example
|-
| (23) rcmod(X/NN, Y/VB), dobj(Y/VB, Z/NN), prep_P(Y/VB, Q/NN) → Y_Z_P(X, Q), Y(X, Z)
| Learning Management System is a software that automates event administration through a set of services. → automates_event_administration_through(Software, set_of_services)
|-
| (24) nsubj(X/NN, Y/NN), rcmod(X/NN, V/VB), dobj(V/VB, Z/NN) → V(Z,Y)
| (same sentence) → automates(learning management system, event administration)
|-
| (25) nsubj(X/NN, Y/NN), rcmod(X/NN, V/VB), dobj(V/VB, Z/NN), prep_P(V/VB, Q/NN) → V_Z_P(Y, Q)
| (same sentence) → automates event administration through(Learning management system, set of services)
|-
| (26) rcmod(X/NN, V/VB), xcomp(V1/VB, V2/VB), dobj(V2/VB, Y/NN), prep_P(V2/VB, Z/NN) → V1_V2_Y_P(X, Z)
| "1484.11.1" is a standard that defines a set of data model elements that can be used to communicate information from a content object to an LMS. → can be used to communicate information from(set of data model element, content object)
|-
| (27) rcmod(X/NN, V/VB), dobj(V/VB, Z/NN) → V(X,Z)
| This keyword data model element can only be applied to a data model element that has children. → has(data model element, children)
|}

Relative clauses patterns are focused around the dependency relationship rcmod and enable making links between a main subject and the rcmod direct object (Pattern (24)) or a preposition (Patterns (23) and (25)) in the relative clause. Pattern (26) makes a link between the noun in a relative clause modifier with the clausal complement xcomp, and finally Pattern (27) links the noun of the relative clause modifier to its direct object.

'''Other clauses.'''
Finally, there are some clauses around infinitival modifiers (infmod) and participial modifiers (partmod) that modify their noun phrase and that allow us to create conceptual relationships (Table 6) when they have a direct object or a preposition. Note again that pattern (31) (Table 6), which has an agent dependency, can lead to an OWL inverse property similar to the one explained for passive clauses.

{| class="wikitable"
|+ Table 6. Other clause patterns for conceptual relationship extraction
! Pattern !! Example
|-
| (28) infmod(X/NN, Y/VB), dobj(Y/VB, Z/NN) → Y(X,Z)
| This value can be requested by the SCO to determine the next index position. → determine(sco, next index position)
|-
| (29) partmod(X/NN, V/VB), dobj(V/VB, Y/NN) → V(X,Y)
| A SCO can communicate with an LMS using the SCORM RTE. → using(lms, scorm rte)
|-
| (30) partmod(X/NN, V/VB), prep_P(V/VB, Y/NN) → V_P(X,Y)
| ...to describe the components used in a learning experience. → used in(components, learning experience)
|-
| (31) partmod(X/NN, V/VB), agent(V/VB, Y/NN) → V(X,Y)
| All data model elements described by SCORM are …. → described by(data model elements, scorm)
|}

===4 Evaluation===
In order to evaluate our patterns, we essentially relied on two corpora used in previous experiments [6]: the SCORM corpus, which is a set of manuals on the SCORM eLearning standard, and the AI corpus, which is a set of Wikipedia pages about artificial intelligence. We previously generated and validated two OWL ontologies from these corpora and we consider them as our gold standards (GSs). Details about these GSs can be found in [6]<sup>4</sup>. We then calculated the precision of the various patterns based on these GSs, where A denotes the set of relationships or concepts generated by a pattern:

Precision = (number of relationships or concepts in A that exist in the GS) / (number of relationships or concepts generated per pattern (A))

Tables 7-11 report the results for each pattern category<sup>5</sup>. Each table presents, for each pattern in the category, the precision of the extracted relationships and concepts in both corpora.
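
The precision reported per pattern is the share of a pattern's extractions that also occur in the gold standard, expressed as a percentage. The following Python sketch shows this computation on toy data; the function name `precision`, the triple representation and the example values are hypothetical, and exact matching of triples is an assumption about how extractions are compared with the gold standard.

```python
from typing import Iterable, Optional, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, label, object)


def precision(extracted: Iterable[Triple], gold: Set[Triple]) -> Optional[float]:
    """Percentage of extracted items that also occur in the gold standard.

    Returns None when the pattern extracted nothing, since precision is
    then undefined.
    """
    items = set(extracted)
    if not items:
        return None
    return 100.0 * len(items & gold) / len(items)


# Toy data for illustration only, not taken from the SCORM or AI corpora.
gold_standard = {
    ("metadata", "can be applied to", "assets"),
    ("branch", "visible to", "lms"),
    ("monkeys", "is-a", "animals"),
}
pattern_output = [
    ("metadata", "can be applied to", "assets"),
    ("branch", "visible to", "lms"),
    ("monkeys", "is-a", "animals"),
    ("value", "determine", "position"),
]
print(precision(pattern_output, gold_standard))  # 75.0
```

The same function can be applied to concepts by passing sets of concept labels instead of triples, provided the type alias is relaxed accordingly.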
One must note that some of the relationships extracted by patterns were perfectly valid (from a lexical point of view) but were not found in the GS, thus reducing the reported precision. Another point is that concepts are simple or multi-word expressions that occur in a relationship. Therefore, we were able to calculate concept precision as well by identifying how many concepts involved in the extracted relationships were in the GSs.

{| class="wikitable"
|+ Table 7. Precision (%) of hierarchical relationship patterns
! Pattern !! Relations SCORM !! Relations AI !! Concepts SCORM !! Concepts AI
|-
| Pattern (7) || 47.46 || 93.75 || 79.45 || 84.37
|-
| Pattern (11) || 74.63 || 56.76 || 91.29 || 72.79
|-
| Pattern (8) || 100.00 || 76.92 || 100.00 || 80.77
|-
| Average || 74.03 || 75.81 || 90.25 || 79.31
|}

We can notice that the precision of hierarchical relationships and their corresponding concepts is quite high (Table 7).

{| class="wikitable"
|+ Table 8. Precision (%) of conceptual relationship patterns (main clauses)
! Pattern !! Relations SCORM !! Relations AI !! Concepts SCORM !! Concepts AI
|-
| Pattern (12) || 52.16 || 24.10 || 81.75 || 57.45
|-
| Pattern (16) || 75.00 || 0 || 100.00 || 33.33
|-
| Pattern (15) || 50.75 || 35.29 || 79.58 || 61.67
|-
| Pattern (13) || 50.00 || 26.25 || 78.24 || 49.51
|-
| Pattern (14) || 36.31 || 28.33 || 79.49 || 62.62
|-
| Pattern (17) || 0 || 0 || 0 || 0
|-
| Pattern (18) || 0 || 0 || 80.43 || 90
|-
| Average || 37.74 || 16.28 || 71.35 || 50.65
|}

Interestingly, passive clauses (Table 9) and other clauses (Table 11) achieve a better performance for relationship precision than main clauses (Table 8), which get approximately the same results as relative clauses (Table 10).

{| class="wikitable"
|+ Table 9. Precision (%) of conceptual relationship patterns (passive clauses)
! Pattern !! Relations SCORM !! Relations AI !! Concepts SCORM !! Concepts AI
|-
| Pattern (19) || 69.56 || 25.00 || 74.00 || 56.25
|-
| Pattern (22) || 60.00 || 66.67 || 88.00 || 100.00
|-
| Pattern (21) || 58.45 || 44.78 || 84.16 || 66.67
|-
| Pattern (20) || 75.00 || n/a || 95.00 || n/a
|-
| Average || 65.75 || 45.48 || 85.29 || 74.31
|}

{| class="wikitable"
|+ Table 10. Precision (%) of conceptual relationship patterns (relative clauses)
! Pattern !! Relations SCORM !! Relations AI !! Concepts SCORM !! Concepts AI
|-
| Pattern (23) || 41.18 || 56.25 || 62.07 || 66.67
|-
| Pattern (25) || 0 || 0 || 90.48 || 60.00
|-
| Pattern (24) || 73.33 || 50.00 || 93.90 || 76.47
|-
| Pattern (26) || 0 || 0 || 85.71 || 0
|-
| Pattern (27) || 50.62 || 21.62 || 86.39 || 56.52
|-
| Average || 33.02 || 25.57 || 83.71 || 51.93
|}

{| class="wikitable"
|+ Table 11. Precision (%) of conceptual relationship patterns (other clauses)
! Pattern !! Relations SCORM !! Relations AI !! Concepts SCORM !! Concepts AI
|-
| Pattern (29) || 43.18 || 25.00 || 80.58 || 70.00
|-
| Pattern (30) || 60.58 || 54.54 || 86.04 || 73.58
|-
| Pattern (28) || 38.77 || 37.50 || 82.10 || 80.77
|-
| Average || 47.51 || 26.85 || 82.91 || 74.78
|}

These results give a general idea of the precision of the lexico-syntactic patterns. As we have previously mentioned, and as the results confirm, there is a need to filter the various extractions using statistical and/or graph-based metrics [6]. The most frequent patterns were nsubj-dobj, nsubjpass-prep, nsubj-dobj-prep, nsubj-prep and partmod-prep. The most precise (but scarcer) patterns, without any filtering, were hierarchical patterns. One important observation is the quite high precision of concepts even without filtering. Regarding relationships, it is possible to imagine that if concepts of interest are known upfront, then these patterns will be very useful for discovering relationships between these predefined concepts. This will be tackled in future work. We also created a few patterns for attribute extraction involving possessives, or a nominal subject and copula with adjectives. However, the way to translate these attribute relationships into OWL-DL was not straightforward.

<sup>4</sup> Available at http://azouaq.athabascau.ca/goldstandards.htm

<sup>5</sup> Extraction examples for each pattern can be found at http://azouaq.athabascau.ca/experiments/wop2012/SCORMPatterns_WOP2012.xls and AIPatterns_WOP2012.xls

===5 Conclusion===
This paper presented a list of the main patterns used in OntoCmaps, our ontology learning tool. These patterns target specific syntactic structures in a dependency representation and are useful for the extraction of multi-word expressions and triples that can later be translated into OWL classes and properties.
Some simplifying assumptions were made in OntoCmaps, mainly the removal of determiners and the lack of co-reference resolution, which should be addressed in future work. In their current state, our patterns represent a good starting base that any researcher in text mining might use, especially in the ontology learning community, which lacks clear and reusable design patterns. Overall, future efforts will tackle whether and how a more fine-grained semantic analysis would be beneficial to the ontology learning task. Another future task will be to extend the coverage of our patterns by extracting frequently occurring syntactic structures using machine learning methods. Finally, one of the lessons learned in this paper is that such pattern-based extraction should necessarily be coupled with a filtering mechanism to increase the precision of the extractions.

'''Acknowledgments.''' This research was funded by the NSERC Discovery Grant Program.

===References===
# Blomqvist, E.: Semi-automatic Ontology Construction based on Patterns. PhD Thesis. Linköping University, Department of Computer and Information Science (2009)
# Presutti, V. et al.: A Library of Ontology Design Patterns: Reusable Solutions for Collaborative Design of Networked Ontologies. NeOn D2.5.1 (2008)
# Völker, J., Hitzler, P. & Cimiano, P.: Acquisition of OWL DL axioms from lexical resources. In: Proc. of the 4th European Semantic Web Conf., pp. 670-685, Springer (2007)
# Hearst, M.: Automatic Acquisition of Hyponyms from Large Text Corpora. In: Proc. of the 14th International Conf. on Computational Linguistics, pp. 539-545 (1992)
# De Marneffe, M.-C., MacCartney, B. & Manning, C. D.: Generating Typed Dependency Parses from Phrase Structure Parses. In: Proc. of LREC, pp. 449-454 (2006)
# Zouaq, A., Gasevic, D. & Hatala, M.: Towards open ontology learning and filtering. Inf. Syst. 36(7): 1064-1081 (2011)
# Maynard, D., Funk, A. & Peters, W.: Using Lexico-Syntactic Ontology Design Patterns for ontology creation and population. CEUR-WS.org (2009)
# Zouaq, A., Gagnon, M. & Ozell, B.: Semantic Analysis using Dependency-based Grammars and Upper-Level Ontologies. International Journal of Computational Linguistics and Applications, 1(1-2): 85-101, Bahri Publications (2010)
# Toutanova, K., Klein, D., Manning, C. D. & Singer, Y.: Feature-Rich Part-of-Speech Tagging with a Cyclic Dependency Network. In: Proc. of HLT-NAACL, pp. 252-259 (2003)
# Snow, R., Jurafsky, D. & Ng, A. Y.: Learning syntactic patterns for automatic hypernym discovery. In: Advances in Neural Information Processing Systems, pp. 1297-1304 (2004)