ICAASE'2014
International Conference on Advanced Aspects of Software Engineering ICAASE, November 2-4, 2014, Constantine, Algeria.

OntoWM: an Ontology for Unification and Description of Web Mining

Khaled Benali
Lab systems, Networks, Databases, SNDB.
University of Science and Technology of Oran USTO
2 C 18 Debdaba Bechar Algeria 08000
Benalikhaled2013@yahoo.fr

Sidi Ahmed Rahal
Lab systems, Networks, Databases, SNDB.
University of Science and Technology of Oran USTO
ORAN, BP 1505 El Mnaouer
Rahalsa2001@yahoo.fr

Abstract – This article is concerned with the merging of two active research domains: Knowledge Discovery in Databases (KDD) and Knowledge Engineering (KE), with a main interest in ontologies. KDD lacks a unified description of the web mining domain, and several methods have been proposed in the literature to overcome this drawback. We propose an ontology, named OntoWM, which includes definitions of basic Web Mining entities, such as tasks and algorithms. Its purpose is to describe the tasks and the basic entities of web mining in order to share a common understanding of this method and to make explicit what is usually left implicit.

Keywords – Web Mining, Ontology, KDD, KE

1. INTRODUCTION

In recent years the use of the term ontology has become prominent in computer science research and in the application of computer science methods to the management of scientific and other kinds of information. In this sense the term ontology means a standardized terminological framework in terms of which the information is organized [1].

“Data Mining (DM) is an emerging field that covers a wide range of application domains, such as marketing, finance, e-commerce, biology and privacy among the others” [2]. Among the knowledge models used in DM, Web mining (WM) consists of a set of operations defined on data residing on WWW data servers. Reference [3] defines web mining as “…the discovery and analysis of useful information from the World Wide Web”. Such data can be the content presented to users of the web sites, such as hypertext markup language (HTML) files, images, text, audio or video. The physical structure of the web sites and the server logs that keep track of user accesses to the resources mentioned above can also be targets of web mining techniques. Web mining is mainly categorized into two subsets, namely web content mining and web usage mining [3]. While content mining approaches focus on the content of single web pages, web usage mining uses server logs that detail the past accesses to the web site data made available to the public.

While KDD and data mining have enjoyed great popularity and success in recent years, there is a distinct lack of a generally accepted framework that would cover and unify the data mining domain. The present lack of such a framework is perceived as an obstacle to the further development of the field. In [4], Yang and Wu collected the opinions of a number of outstanding data mining researchers about the most challenging problems in data mining research. Among the ten topics considered most important and worthy of further research, the development of a unifying framework for data mining is listed first. One step towards developing a general framework for data mining is constructing an ontology of data mining.

In this article we create a new ontology model (OntoWM) to unify Web mining, to represent web mining tasks, and to define the semantics of the relationships between entities of web mining.

2. OntoWM

For the development of our ontology, we tried to follow the steps proposed in [5].

Step 1. Determination of the domain and scope of the ontology

Our application requires an Ontology of Web Mining (OntoWM), which should allow us to describe the tasks and the basic entities of web mining in order to share a common understanding of this method and to make explicit what is usually left implicit. We can therefore limit our study to the description of the tasks and basic entities of WM. OntoWM will be used to help users and researchers of web mining to understand the elements and steps of this method.

Step 2. Enumerate important terms in the ontology

In this step we write down a list of all terms (Figure 1). Working with an expert in Web Mining, we extracted more than 250 relevant terms. For example, important Web-Mining-related terms include: Web Usage Mining, Web Content Mining, Web Log Mining, Semantics Web, Web Robots, Software Agent, Spiders, Log, History User, Softbots, Data Cleaning…

Fig. 1. Part of the extracted terms

Step 3. Define the classes and the class hierarchy (Conceptualization and Ontologization)

We usually start by defining classes. From the list created in Step 2, we select the terms that describe objects having independent existence rather than terms that describe these objects. These terms will be classes in the ontology and will become anchors in the class hierarchy. We organize the classes into a hierarchical taxonomy. We identified the types of concepts in the field of Web Mining by drawing on the sources [6], [7]…, which form our text corpus. The study of the domain revealed more than 200 relevant concepts.

For example (Figure 2), from the terms (History-User, Log-User) the concept candidate “Log-User” was selected with the help of a domain expert. From the terms (Web-Usage-Mining, Web-Log-Mining) the concept candidate “Web-Usage-Mining” was selected in the same way.

Fig. 2. Example of conceptualization (concept chosen: Web Usage Mining)

Classification: we then classified the types of concepts into classes and subclasses, thus forming a class hierarchy with the root class Web-Mining. These classes are the concepts of our ontology. We selected several kinds of concepts: Algorithms, Application-Domains, Tasks, Web-Usage-Mining, Web-Structure-Mining, HITS, Page-Rank... To establish the hierarchy of classes, we proceed from top to bottom, starting with the most general concepts and ending with the most specialized ones. We therefore start with the most general classes, namely: Web-Mining, Algorithms, Application-Domains, Tasks, Basic-Concepts, … We then refined each class. For example, the class Fichie-Log-Format was refined into the concepts Common-Log-Format and Extended-Log-Format, and the class Text-Classification was specialized into the sub-concepts Automated-Filtering and Text-Categorization.

Operationalization (use of OWL)

To build our ontology “OntoWM.owl” we used the representation language OWL¹. OWL is one of the languages most used in the construction of ontologies. The ontology was built with the open source editor "Protégé"², distributed by Stanford Medical Informatics at Stanford University. Protégé allows, through its GUI, automatic generation of the code corresponding to the OWL ontology. "owl:Thing" is a predefined class. Every OWL class is a subclass of owl:Thing. Figure 3 shows graphical representations (screenshots) of the class hierarchy of our ontology, produced using the tools OWLViz³ and OntoGraf⁴.

¹ W3C: http://www.w3.org
² Available at: http://protege.stanford.edu/
³ OWLViz is designed for use with the Protégé OWL editor plugin. This tool allows you to view the hierarchy of classes in an OWL ontology.
⁴ OntoGraf gives support for interactively navigating the relationships of your OWL ontologies.

Fig. 3. General structure of our ontology OntoWM.owl

Step 4. Define the properties of classes

The classes alone will not provide enough information to answer the competency questions from Step 1. Once we have defined some of the classes, we must describe the internal structure of concepts. Most of the remaining terms are likely to be properties of these classes. For each property in the list, we must determine which class it describes. These properties become slots attached to classes. Thus, the Doc-Node class will have the following slot: NBR-Node (Figure 4).

Fig. 4. The data properties

Step 5. Create instances

The last step is creating individual instances of classes in the hierarchy. The "Individuals" tab is used to create instances and assign properties. For example, Weka1.0 is an instance of the class Weka. On the screen presented (Figure 5), it is possible to edit the information about the individual “Weka1.0”.

Fig. 5. Individual “Weka1.0”

3. Experimental Results

In this section, we discuss the experimental evaluations and tests applied to our ontology. Assessment tests are performed on a machine with an Intel i5 at 2.53 GHz and 4 GB of memory under Windows 8. The platform used is Protégé 4.3.0 (with OWL 3.4.2, the reasoner FaCT++ 1.6.2⁵, the reasoner HermiT 1.3.7, OWLViz 4.1.2 and OntoGraf 1.0.1).

⁵ Fast Classification of Terminologies

3.1. Richer Knowledge representation

In this paper, and with this approach:

1. We have contributed to the advancement of research in the field of Web Mining using an ontology which is characterized by:
- 7 main parts: Application-Domains, Tasks, Caracteristics, Basic-Concepts, Process, Algorithms, Soft
- 7 levels deep
- More than 200 concepts
- …..

2. Our ontology provides a richer representation of the knowledge generally accepted in this field. The classification of the basic elements of web mining by axes (Tasks, Algorithms …) will make it easier for a user or researcher to understand this method.

3.2. Coherence of the ontology model

We tested our ontology according to the criteria of Furst [8], using the reasoner FaCT++ (installed with Protégé 4, Figure 6) and the W3C RDF validator. The validator confirms that our OWL document follows the RDF syntax (Figure 7), which gives a first indication of the validity of our ontology, while the reasoner allows us to validate the consistency of the model associated with the ontology. Our ontology is thus characterized by the following (criteria of Gruber [9]):

• clarity and objectivity of the definitions, which must be independent of any implementation choices;
• extensibility;
• no cycles (that is to say, no looping definitions);
• no redundancy among concepts and relationships.

Fig. 6. Formal validation of our ontology "OntoWM.owl" by the reasoner FaCT++

Fig. 7. Syntax validation of our ontology "OntoWM.owl" by the W3C RDF validator

4. Conclusion and Perspectives

In this paper, we have outlined ontology and web mining issues and requirements. A state-of-the-art review of the main “Web Mining-Ontology” approaches and tools of the last 7 years was presented.

In this work, we present a proposal for a new ontology of Web Mining, "OntoWM". OntoWM is the first ontology that describes the field of web mining in detail (different types of tasks and basic entities of the method of Web Mining). OntoWM will be used to help users or researchers of web mining to understand the elements and steps of this method. The ontology can be used as such, or hosted on a site and called through the corresponding URI. With OntoWM, we proposed a new unified and standard architecture based on ontologies for the terms most used in Web Mining.

The ontology developed (OntoWM.owl) is considered incomplete (we need to populate the proposed classes of Web Mining entities) and still needs to be improved throughout its life cycle. Our perspective is to continue this work, to enrich the ontology with new concepts, and to support inference by refining properties and restrictions.

5. REFERENCES

[1] B. Smith, “Ontology,” in Blackwell Guide to the Philosophy of Computing and Information, Oxford: Blackwell (Malden, 2003), pp. 155–166.
[2] A. Bellandi, B. Furletti, V. Grossi, and A. Romei, “Pushing constraints in association rule mining: an ontology-based approach,” in Proceedings of the IADIS International Conference WWW/Internet, Vila Real, Portugal, 2007.
[3] R. Cooley, B. Mobasher, and J. Srivastava, “Web mining: Information and pattern discovery on the world wide web,” in Proceedings of the 9th IEEE International Conference on Tools with Artificial Intelligence (ICTAI'97), Los Alamitos.
[4] Q. Yang and X. Wu, “10 challenging problems in data mining research,” International Journal of Information Technology & Decision Making, Vol. 5, No. 4, pp. 597–604, 2006.
[5] N. F. Noy and D. L. McGuinness, “Ontology Development 101: A Guide to Creating Your First Ontology,” Stanford University, Stanford, 2004.
[6] B. Liu, “Web Data Mining: Exploring Hyperlinks, Contents, and Usage Data,” Springer, Berlin Heidelberg New York, 2007, ISBN-10 3-540-37881-2.
[7] H. Yilmaz, “Using ontology based web usage mining and object clustering for recommendation,” Master thesis, Graduate School of Natural and Applied Sciences, Middle East Technical University, 2010.
[8] F. R. Furst, “Contribution à l'ingénierie des ontologies: une méthode et un outil d'opérationnalisation,” PhD thesis, University of Nantes, France, 2004.
[9] T. R. Gruber, “Towards principles for the design of ontologies used for knowledge sharing,” International Journal of Human-Computer Studies, Vol. 43, No. 5-6, pp. 907–928, 1995.
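
As an illustration of Steps 3 to 5 and of the "no cycle" criterion of Section 3.2, the following minimal Python sketch encodes a small fragment of a class hierarchy, checks it for cycles, and serializes it as OWL in RDF/XML using only the standard library. It is not the procedure followed in this work, where the OWL code is generated by Protégé. The class, property and individual names are those quoted in the text, but the namespace URI, the parent assigned to each class (for example Fichie-Log-Format under Basic-Concepts, Weka under Soft) and the helper names depth and to_rdfxml are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace: the ontology's real URI is not given in the paper.
BASE = "http://example.org/OntoWM.owl#"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
OWL = "http://www.w3.org/2002/07/owl#"

# child -> parent. Names come from the paper; the placement of each class
# under a given parent is an assumption made for this sketch.
SUBCLASS_OF = {
    "Algorithms": "Web-Mining",
    "Tasks": "Web-Mining",
    "Basic-Concepts": "Web-Mining",
    "Soft": "Web-Mining",
    "Fichie-Log-Format": "Basic-Concepts",
    "Common-Log-Format": "Fichie-Log-Format",
    "Extended-Log-Format": "Fichie-Log-Format",
    "Text-Classification": "Tasks",
    "Automated-Filtering": "Text-Classification",
    "Text-Categorization": "Text-Classification",
    "Doc-Node": "Basic-Concepts",
    "Weka": "Soft",
}
DATA_PROPERTIES = {"NBR-Node": "Doc-Node"}  # property -> domain class (Step 4)
INDIVIDUALS = {"Weka1.0": "Weka"}           # individual -> class (Step 5)


def depth(cls, parents):
    """Count subclass links from cls up to the root; raise on a cycle."""
    seen, d = set(), 0
    while cls in parents:
        if cls in seen:
            raise ValueError(f"cycle in class hierarchy at {cls!r}")
        seen.add(cls)
        cls = parents[cls]
        d += 1
    return d


def to_rdfxml(parents, properties, individuals):
    """Serialize classes, data properties and individuals as RDF/XML."""
    for prefix, uri in (("rdf", RDF), ("rdfs", RDFS), ("owl", OWL)):
        ET.register_namespace(prefix, uri)
    root = ET.Element(f"{{{RDF}}}RDF")
    for name in sorted(set(parents) | set(parents.values())):
        node = ET.SubElement(root, f"{{{OWL}}}Class",
                             {f"{{{RDF}}}about": BASE + name})
        if name in parents:
            ET.SubElement(node, f"{{{RDFS}}}subClassOf",
                          {f"{{{RDF}}}resource": BASE + parents[name]})
    for prop, domain in sorted(properties.items()):
        node = ET.SubElement(root, f"{{{OWL}}}DatatypeProperty",
                             {f"{{{RDF}}}about": BASE + prop})
        ET.SubElement(node, f"{{{RDFS}}}domain",
                      {f"{{{RDF}}}resource": BASE + domain})
    for ind, cls in sorted(individuals.items()):
        node = ET.SubElement(root, f"{{{OWL}}}NamedIndividual",
                             {f"{{{RDF}}}about": BASE + ind})
        ET.SubElement(node, f"{{{RDF}}}type",
                      {f"{{{RDF}}}resource": BASE + cls})
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Every class must reach the root without looping (no-cycle criterion).
    for cls in SUBCLASS_OF:
        depth(cls, SUBCLASS_OF)
    print(to_rdfxml(SUBCLASS_OF, DATA_PROPERTIES, INDIVIDUALS))
```

The child-to-parent dictionary keeps the sketch close to the top-down refinement of Step 3: each refinement adds one entry, and depth gives the level of a class below the root Web-Mining. The check is purely structural and does not replace the reasoner or validator runs reported in Section 3.2.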