Pattern for Re-engineering a Term-based Thesaurus, Which Follows the Record-based Model, to a Lightweight Ontology

http://ontologydesignpatterns.org/wiki/Submissions:Term-based - record- based model - thesaurus to lightweight ontology

Boris Villazón-Terrazas, Mari Carmen Suárez-Figueroa, and Asunción Gómez-Pérez

Ontology Engineering Group, Departamento de Inteligencia Artificial, Facultad de Informática, Universidad Politécnica de Madrid, Spain
{bvillazon,mcsuarez,asun}@fi.upm.es
WWW home page: http://www.oeg-upm.net/

1 Introduction

This pattern for re-engineering non-ontological resources (PR-NOR) fits in the Schema Re-engineering Category proposed by [3]. The pattern defines a procedure that transforms the components of a term-based thesaurus into ontology representational primitives. It comes from the experience of ontology engineers in developing ontologies from thesauri in several projects (SEEMP¹, NeOn², and Knowledge Web³). The pattern is included in a pool of patterns, which is a key element of our method for re-engineering non-ontological resources into ontologies [2]. The patterns generate the ontologies at a conceptualization level, independent of the ontology implementation language.

2 Pattern

Problem
Re-engineering a term-based thesaurus, which follows the record-based model, to design a lightweight ontology.

Non-Ontological Resource
A non-ontological resource holds a term-based thesaurus which follows the record-based model. A thesaurus represents the knowledge of a domain with a collection of terms and a limited set of relations between them. The record-based data model [4] is a denormalized structure that uses one record per term. Each record holds the information about the term, such as its synonyms and its broader, narrower, and related terms.

Applicability
The pattern applies when the relation between narrower and broader terms has subClassOf semantics.
¹ http://www.seemp.org
² http://www.neon-project.org
³ http://knowledgeweb.semanticweb.org

Ontology Generated
The ontology generated will be based on the lightweight ontology architectural pattern (AP-LW-01) [5]. Each thesaurus term is mapped to a class. A subClassOf relation is defined between the new classes for the BT/NT relation. A relatedClass relation is defined between the new classes for the RT relation. For the UF/USE relations the SynonymOrEquivalence (SOE) pattern [1] is applied.

Process - Solution
1. Identify the records that contain thesaurus terms without a broader term.
2. For each thesaurus term ti identified above:
   2.1. Create the corresponding ontology class, Ci, if it has not been created yet.
   2.2. Identify the thesaurus terms tj that are narrower terms of ti. They are referenced in the same record that contains ti.
   2.3. For each thesaurus term tj identified above:
        2.3.1. Create the corresponding ontology class, Cj, if it has not been created yet.
        2.3.2. Set up the subClassOf relation between Cj and Ci.
        2.3.3. Repeat from step 2.2 for tj as a new ti.
   2.4. Identify the thesaurus terms tr that are related terms of ti. They are referenced in the same record that contains ti.
   2.5. For each thesaurus term tr identified above:
        2.5.1. Create the corresponding ontology class, Cr, if it has not been created yet.
        2.5.2. Set up the relatedClass relation between Cr and Ci.
        2.5.3. Repeat from step 2.4 for tr as a new ti.
   2.6. Identify the thesaurus terms tq that are equivalent terms of ti. They are referenced in the same record that contains ti.
   2.7. For each thesaurus term tq identified above:
        2.7.1. Apply the SynonymOrEquivalence (SOE) pattern.

Example
Suppose that someone wants to build a lightweight ontology based on the European Training Thesaurus (ETT), which is a term-based thesaurus that follows the record-based model.
Non-Ontological Resource
The European Training Thesaurus (ETT) constitutes the controlled vocabulary of reference in the field of vocational education and training (VET) in Europe. The relation between the subordinate and the superordinate concepts has subClassOf semantics. The thesaurus is available at http://libserver.cedefop.europa.eu/ett/en/

Ontology Generated
The ontology generated will be based on the lightweight ontology architectural pattern (AP-LW-01) [5]. Each thesaurus term is mapped to a class. A subClassOf relation is defined between the new classes for the BT/NT relation. A relatedClass relation is defined between the new classes for the RT relation. For the UF/USE relations the SynonymOrEquivalence (SOE) pattern [1] is applied.

Process - Solution
1. Create the learning class and the personal development class.
2. Create the competence class and assert that competence is subClassOf learning.
3. Create the performance class and assert that performance is subClassOf personal development.
4. Assert that achievement is a label of the performance class.
5. Assert that competence is relatedClass of performance.
6. Create the skill class and assert that skill is subClassOf competence.
   6.1. Create the efficiency class and assert that efficiency is subClassOf performance.
   6.2. Create the failure class and assert that failure is subClassOf performance.
   6.3. Create the success class and assert that success is subClassOf performance.

Related Resources
This pattern is related to the architectural pattern AP-LW-01 [5] for modelling a lightweight ontology.

3 Pattern Usage

This pattern is being applied to re-engineer the European Training Thesaurus (ETT)⁴ into an Education Ontology⁵ within the context of the SEEMP project. The thesaurus contains over 2500 terms (1550 descriptors and 950 non-descriptors). It is a term-based thesaurus modelled following the record-based data model.
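The Process - Solution steps of the pattern can be sketched in code. The following Python fragment is only an illustrative sketch, not the implementation used in SEEMP: the Record layout, the reengineer function, and the triple representation are hypothetical names introduced here. It also makes three assumptions that the pattern does not state: only descriptors have records, a guard against revisiting terms is added because RT links are usually symmetric, and the SOE pattern is reduced to attaching the equivalent term as a label, as in step 4 of the example. The records contain only the ETT terms mentioned in the example.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    """One thesaurus record (hypothetical layout): a term and the terms it references."""
    term: str
    broader: list = field(default_factory=list)   # BT
    narrower: list = field(default_factory=list)  # NT
    related: list = field(default_factory=list)   # RT
    used_for: list = field(default_factory=list)  # UF (non-descriptors)


def reengineer(records):
    """Return (classes, triples) built from a list of descriptor records."""
    by_term = {r.term: r for r in records}
    classes, triples, expanded = set(), set(), set()

    def visit(term):
        classes.add(term)              # steps 2.1, 2.3.1, 2.5.1
        if term in expanded:           # assumption: guard against RT cycles
            return
        expanded.add(term)
        record = by_term.get(term)
        if record is None:             # referenced term without its own record
            return
        for nt in record.narrower:     # steps 2.2 and 2.3
            triples.add((nt, "subClassOf", term))
            visit(nt)
        for rt in record.related:      # steps 2.4 and 2.5
            triples.add((rt, "relatedClass", term))
            visit(rt)
        for uf in record.used_for:     # steps 2.6 and 2.7, SOE simplified to a label
            triples.add((term, "label", uf))

    for record in records:             # step 1: terms without a broader term
        if not record.broader:
            visit(record.term)
    return classes, triples


# Records restricted to the ETT terms used in the example above.
ETT_RECORDS = [
    Record("learning", narrower=["competence"]),
    Record("personal development", narrower=["performance"]),
    Record("competence", broader=["learning"], narrower=["skill"],
           related=["performance"]),
    Record("performance", broader=["personal development"],
           narrower=["efficiency", "failure", "success"],
           related=["competence"], used_for=["achievement"]),
    Record("skill", broader=["competence"]),
    Record("efficiency", broader=["performance"]),
    Record("failure", broader=["performance"]),
    Record("success", broader=["performance"]),
]

classes, triples = reengineer(ETT_RECORDS)
for subject, relation, obj in sorted(triples):
    print(subject, relation, obj)
```

The result is kept as plain (subject, relation, object) tuples so that the sketch stays at the conceptualization level, independent of any ontology implementation language. Serializing the tuples to a concrete language would be a separate step.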
4 Summary and Future Work

We have presented a pattern for transforming a term-based thesaurus, which is modelled following a record-based data model, into a lightweight ontology. The pattern is included in a pool of patterns, which is a key element of our method for re-engineering non-ontological resources into ontologies [2].

We plan to develop software libraries within a framework that implement the transformation process suggested by the pattern. Moreover, we will include external resources to improve the quality of the resultant ontologies. Finally, we need to calculate how much effort we save by re-engineering classification schemes using patterns compared with re-engineering them without patterns.

Acknowledgments. This work has been partially supported by the European Commission projects NeOn (FP6-027595) and SEEMP (FP6-027347), as well as by an R+D grant from the UPM.

References

1. C. Roussey and O. Corcho. SynonymOrEquivalence (SOE) Pattern. http://ontologydesignpatterns.org, 2009.
2. A. García, A. Gómez-Pérez, M. C. Suárez-Figueroa, and B. Villazón-Terrazas. A Pattern Based Approach for Re-engineering Non-Ontological Resources into Ontologies. In Proceedings of the 3rd Asian Semantic Web Conference (ASWC2008). Springer-Verlag, 2008.
3. V. Presutti, A. Gangemi, S. David, G. Aguado de Cea, M. C. Suárez-Figueroa, E. Montiel-Ponsoda, and M. Poveda. NeOn Deliverable D2.5.1. A Library of Ontology Design Patterns: reusable solutions for collaborative design of networked ontologies. In NeOn Project. http://www.neon-project.org, 2008.
4. D. Soergel. Data models for an integrated thesaurus database. Compatibility and Integration of Order Systems, 24(3):47–57, 1995.
5. M. C. Suárez-Figueroa, S. Brockmans, A. Gangemi, A. Gómez-Pérez, J. Lehmann, H. Lewen, V. Presutti, and M. Sabou. Neon modelling components. Technical report, NeOn project deliverable D5.1.1, 2007.
⁴ http://libserver.cedefop.europa.eu/ett/en/
⁵ The ontology will be available at http://droz.dia.fi.upm.es/hrmontology/