On the Issue of Property Transitivity in RDF Datasets

Tatiana Shulga [0000-0002-5521-5960], Alexander Sytnik [0000-0002-1256-7253], Ekaterina Panteleeva [0000-0002-2693-937X]

Yuri Gagarin State Technical University of Saratov, 77 Politechnicheskaya street, Saratov, Russia, 410054
taiss@yandex.ru

Abstract. As the concept of the Semantic Web has developed in recent years, a large number of OWL ontologies and RDF datasets based on them have been created. However, when designing and using the properties of OWL ontologies, it is extremely important to reflect real-world information correctly, because something that is entirely logical in the world of abstract data may correlate poorly with the behavior a user expects from a web application. For example, SPARQL queries that use transitive properties of the OWL language can create loops and return incorrect information. In this article we show that in some cases it is preferable to abandon the transitivity of properties in ontologies, and we describe an algorithm for traversing related entities that solves the problem of loops. The problem is illustrated by the web application "Linked Open Specialties" (LOS), which sends SPARQL queries to the ontology "Specialties". The ontology “Specialties” represents the structure of the official lists of specialties, bachelor's and master's graduate programs, and research specialties that have been valid in recent years in the Russian Federation, and allows their correspondence to be established using the transitive property “equalsTo”. Notably, although the developed algorithm is formulated and used in terms of a specific subject area to solve the problem of a particular application, it is quite universal and can be used to solve the transitivity problem in RDF datasets of other subject areas.

Keywords: Transitivity of OWL properties, ontology, RDF, Semantic Web, recursive algorithm, linked specialties

Copyright © 2021 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). Proceedings of the XXIII International Conference "Enterprise Engineering and Knowledge Management" (EEKM 2020), Moscow, Russia, December 8-9, 2020.

1. Introduction

The past two decades have been characterized by the rapid development of Semantic Web technologies [1]. Conceptually, the Semantic Web is a stack (set) of web technologies that make it possible to store and link data from various sources (systems and documents) in such a way that the data can be processed by machines. One of the major technologies of this stack is the RDF language, which is used to record statements about any resources as triples. It is a flexible data model that is independent of the subject area.

However, the capabilities of the RDF language are limited to binary predicates, and efficient computer processing of data is possible only when semantics are added to RDF datasets. In practice, RDF datasets based on OWL ontologies are developed for this purpose. According to the W3C definition, an ontology is a formal model of knowledge representation in a certain subject area that describes types of objects (classes), the relations between them (properties), and the ways in which classes and properties can be used together (axioms) [2]. In recent years, a huge number of RDF datasets based on OWL ontologies have been developed; some are provided in the Linked Open Data (LOD) Cloud, and some are described in scientific publications (for example, [3,4,5,6]).

The OWL language implements the basic requirements for ontology languages: a clearly defined syntax, formal semantics, sufficient expressive power, ease of expressing knowledge, and effective support for logical inference.
Support for logical inference makes it possible to derive new knowledge from existing knowledge and to detect various kinds of contradictions in ontologies, for example, unforeseen relations between classes or individuals. This is made possible by the mechanism of axioms (restrictions), a kind of rules that operate in a given ontology. Axioms of the OWL language make it possible to represent “complex” knowledge, for example, to set cardinality restrictions (there cannot be more than 20 students in a group) or characteristics of properties such as transitivity (if lecture1 is included in topic1, and topic1 is included in section1, then lecture1 is included in section1).

In this paper, we consider in detail one such axiom of the OWL language, the transitivity of properties, and the problem of using transitive properties in RDF datasets. Queries that use transitive properties can create a kind of loop and return incorrect information. We describe and solve this problem using a specific RDF dataset as an example. The dataset is based on the OWL ontology “Specialties”, which is intended to represent the lists of specialties, bachelor's and master's graduate programs, and research specialties that have ever been in force in the Russian Federation. We propose a method for solving this problem by developing and implementing an algorithm for traversing related entities.

2. Transitivity of properties in OWL

In OWL, a property P is called transitive if, whenever individuals A and B are connected by property P and individuals B and C are connected by property P, it follows that individuals A and C are also connected by property P. Transitive properties are composite properties because they are derived in several steps. For example, from the statements “Mari isDescendantOf Tom” and “Tom isDescendantOf Mike”, it may be concluded that “Mari isDescendantOf Mike”.
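The inference described above can be illustrated with a minimal sketch in Python. It is not how an OWL reasoner works internally; it only applies the transitivity rule repeatedly to a set of (subject, object) pairs until nothing new can be derived. The function name transitive_closure and the pair representation are assumptions made for this illustration.

```python
def transitive_closure(pairs):
    """Repeatedly apply the rule: (a, b) and (b, c) imply (a, c)."""
    closure = set(pairs)
    while True:
        derived = {
            (a, d)
            for (a, b) in closure
            for (c, d) in closure
            if b == c
        }
        if derived <= closure:      # nothing new: fixed point reached
            return closure
        closure |= derived


# Asserted facts for a transitive property isDescendantOf.
facts = {("Mari", "Tom"), ("Tom", "Mike")}

# Statements that were not asserted but follow from transitivity.
inferred = transitive_closure(facts) - facts
print(inferred)  # {('Mari', 'Mike')}
```

Each inferred pair results from chaining several asserted statements, which is the sense in which transitive properties are called composite.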
The fact that transitive properties are composite can be a source of problems associated with so-called “loops”, which can occur unpredictably in sufficiently long transitive chains. If the chain of connections for such a property is nonlinear and forms a graph, there is currently no mechanism for specifying the order of its traversal or placing any restrictions on it; therefore, SPARQL queries to resources associated with this property can return results that do not meet expectations [7]. The situation is complicated by the fact that, although the SPARQL query language has a filtering mechanism, filters are applied directly to the query results and cannot affect the chain of logical inference during its execution. Since the results returned by the query are plain resources, it is impossible to obtain any information about the order in which the transitive relation was traversed, which means that filtering from the outside is not possible either. This problem can be illustrated more specifically by the example of the ontology “Specialties”.

3. Ontology “Specialties”

The ontology “Specialties” [8] represents the structure of the official lists of specialties and areas of training for bachelors and masters that have been in force in Russia in recent years, as well as a list of scientific specialties. In the Russian Federation, such lists are approved by the Ministry of Education and Science. In recent decades, the higher education system of the Russian Federation has changed significantly. In particular, since 2004 alone, three official lists of specialties and areas of training for bachelors and masters have been changed, graduate school has become an educational program, and a correspondence between graduate school areas and scientific specialties has been established.
The Ministry of Education and Science publishes data about these changes on the Internet in the form of orders and instructions, but in practice their use by educational organizations and citizens causes difficulties. This is because the data are published as PDF documents, for which effective machine analysis is impossible. Therefore, there is a need to develop special web applications that would allow governing bodies, educational organizations and private citizens not only to access the information but also to analyze it effectively. If such applications are developed using the traditional approach, that is, with relational databases, many questions arise concerning the openness of the data, its maintenance, changes to the database structure (after the structure of the lists changes), and relationships with other educational resources. The solution to this problem was the development of the OWL ontology "Specialties" and the corresponding RDF dataset. The dataset is based on official documents of the Ministry of Education and Science of the Russian Federation that are available in open access (for example, [8]). It contains the data of the lists, such as the names and codes of specialties and educational programs from the various lists, grouped into UGSN, together with their correspondences.

The ontology “Specialties” and the RDF dataset were developed by a team of teachers and students of Yuri Gagarin State Technical University of Saratov (SSTU) [9]. They can be accessed in several ways: through the web application Linked Open Specialties (LOS) [11] or through the SPARQL endpoint [10]. The ontology is also published in the Linked Open Vocabularies (LOV) catalogue [12] and can be used by any developer to create web applications in the area of higher education of the Russian Federation that require data on former or current lists of specialties and educational programs.
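Access through a SPARQL endpoint normally follows the standard SPARQL 1.1 Protocol, in which a query can be sent as the "query" form parameter of an HTTP POST request. The following Python sketch only builds such a request and does not send it. The namespace, the resource name and the query itself are hypothetical placeholders, since the article does not list the identifiers used in the dataset; the endpoint address is the one given in the reference list, and the dataset-specific path that a real server would require is not stated there.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical namespace and resource name, for illustration only.
NAMESPACE = "http://example.org/specialties#"
QUERY = f"""
PREFIX sp: <{NAMESPACE}>
SELECT ?related
WHERE {{ sp:specialty_080100 sp:equalsTo ?related }}
"""

# Address from the reference list; the service path is not given there.
ENDPOINT = "http://sparql.sstu.ru:3030"


def build_sparql_request(endpoint_url, query):
    """Build (but do not send) a SPARQL 1.1 Protocol POST request."""
    body = urlencode({"query": query}).encode("utf-8")
    return Request(
        endpoint_url,
        data=body,
        headers={
            "Content-Type": "application/x-www-form-urlencoded",
            "Accept": "application/sparql-results+json",
        },
        method="POST",
    )


request = build_sparql_request(ENDPOINT, QUERY)
print(request.get_method(), request.full_url)
```

Passing the request object to urllib.request.urlopen would execute the query; whether JSON results are returned depends on what the server supports.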
Using the data of this ontology, it is possible to determine, from the specialty stated in a diploma and the year the diploma was received, the corresponding specialty or educational program currently in force. In particular, the ontology associates the name and code of a specialty or educational program valid at one time with the name and code of the specialty or educational program valid at another time.

The class hierarchy of this ontology is presented in Figure 1.

Fig. 1. The structure of the classes of the ontology “Specialties”

Object properties of the ontology (properties that connect instances of two classes) are presented in Table 1.

Table 1. Object properties of the ontology “Specialties”

| Object property | Designation of property |
|---|---|
| dcterms:hasPart | Property from the ontology "Dublin Core". Represents a resource that is physically or logically included in the described resource. |
| dcterms:isPartOf | Property from the ontology "Dublin Core". Represents a resource in which the described resource is physically or logically included. |
| partOf (included in) | Indicates that a particular object is part of another object. It is transitive, inverse to the property “consistsOf”, and a subproperty of dcterms:isPartOf. |
| isPartOfList (included in the list) | Subproperty of “partOf”; shows that a certain UGSN is included in a certain list. The domain is a UGSN, the range is a list, and it is the inverse of the property “listConsistsOf”. |
| isPartOfUGSN (included in the UGSN) | Indicates that a certain specialty is included in a certain UGSN. The domain is a specialty, the range is a UGSN, and it is the inverse of the property “UGSNConsistsOf”. |
| consistsOf (consists of) | Indicates that a particular object has specific components. It is transitive, inverse to the property “partOf”, and a subproperty of dcterms:hasPart. |
| listConsistsOf (list consists of) | Subproperty of “consistsOf”; shows that a particular list has components (UGSN). The domain is a list, the range is a UGSN, and it is the inverse of the property “isPartOfList”. |
| ugsnConsistsOf (UGSN consists of) | Subproperty of “consistsOf”; shows that a certain UGSN has components (specialties). The domain is a UGSN, the range is a specialty, and it is the inverse of the property “isPartOfUGSN”. |
| hasLevelEducation (has level education) | Shows the level of training for a particular specialty. The domain is a specialty and the range is a level of education. It is also a functional property. |
| owl:sameAs | Property from the "Web Ontology Language" showing that two links actually refer to the same object, that is, the objects are identical. |
| equalsTo (equals to) | Shows that one specialty from one list corresponds to another specialty from another list. Domain and range: "Specialty". It is a transitive and symmetric property and a subproperty of owl:sameAs. |
| UGSNHasUDC | Indicates that a specific UGSN corresponds to a specific UDC from the ontology "UDC-Scheme". |

4. The problem of transitivity of the property “equalsTo”

Of the object properties of the ontology “Specialties”, the most important in the context of this work is equalsTo. It shows that one specialty from one list corresponds to another specialty from another list. The domain and range of this property are both a specialty. It is a transitive and symmetric property and a subproperty of owl:sameAs.

The transitivity of this property is needed to establish correspondence through intermediate lists, which is logical when establishing consistency between specialties. However, the results of SPARQL queries intended to obtain a list of equivalent specialties did not meet the needs of the task, since they contained the full path of property transitivity.
This may be correct from the point of view of the abstract application of this property characteristic, but it does not correspond to the general logic of the LOS application, since the result contains a kind of “loop”. Here, “loop” means a nonlinear course of transitivity in which the inference chain returns to a specialty from the same list as the initial specialty for which correspondences are being established. As a result, the user of the application often gets a result that looks incorrect.

An example of such a result can be seen in Table 2. When correspondences are requested for the specialty "Economics" from the list “OKSO” with the code "080100", the output contains such specialties as "Foreign Regional Studies", "Regional Studies of Russia" and "Applied Mathematics and Computer Science", which is not the result the user expects.

Table 2. Fragment of the correspondences list for the specialty "Economics"

| Educational program | Code | Level of education | UGSN | List | Period of validity |
|---|---|---|---|---|---|
| Foreign Regional Studies | 032000 | Bachelor's degree | Liberal arts | 337 | 2011-2012 |
| Regional Studies of Russia | 032200 | Bachelor's degree | Liberal arts | 337 | 2011-2012 |
| Regional Studies of Russia | 41.03.02 | Bachelor's degree | Political sciences and regional studies | 1061 | 2012-present |
| Foreign Regional Studies | 41.03.01 | Bachelor's degree | Political sciences and regional studies | 1061 | 2012-present |
| Applied Mathematics and Computer Science | 010400 | Bachelor's degree | Physics and Mathematics sciences | 337 | 2011-2012 |
| Applied Mathematics and Computer Science | 01.03.02 | Bachelor's degree | Mathematics and Mechanics | 1061 | 2012-present |
| Business Informatics | 080500 | Bachelor's degree | Economics and Management | 337 | 2011-2012 |
| Business Informatics | 38.03.05 | Bachelor's degree | Economics and Management | 1061 | 2012-present |
| Applied Computer Science | 230700 | Bachelor's degree | Computer Science and Computer Engineering | 337 | 2011-2012 |
| Applied Computer Science | 09.03.03 | Bachelor's degree | Computer Science and Computer Engineering | 1061 | 2012-present |

It should be noted that the above example already includes filtering by specialties that belong to the same list as the initial specialty. However, as mentioned above, filters in a SPARQL query work directly on its results and have no influence on the inference process or on the construction of the transitivity chain; therefore, they do not accomplish the required task.

5. A recursive traversal algorithm for related specialties

To solve this problem, we suggest removing the transitivity characteristic from the property equalsTo and using the following algorithm, which obtains the list of related specialties for a given specialty and returns values that are correct from the user's point of view.

Algorithm to obtain the list of related specialties

This algorithm obtains the set of all equivalent specialties from other lists by the code of a given specialty.

Input data: the code of the specialty s.
Output data: the set of specialties A corresponding to the input specialty s.

The steps of the algorithm:
1. Get the code p of the list to which specialty s belongs.
2. Get the list B of specialties associated with s by the property equalsTo.
3. Remove specialty s from list B if B contains s.
4. For each specialty si from list B, do the following:
   4.1. Get the code pi of the list that contains specialty si.
   4.2. If pi coincides with p, skip the current iteration and go to specialty si+1.
   4.3. If the set of specialties A already contains si, skip the current iteration and go to specialty si+1.
   4.4. Add specialty si to the set of specialties A.
   4.5. Repeat steps 2-4 for specialty si.

For ease of developing and testing this algorithm, a small ontology “Test” was developed on the basis of the ontology “Specialties” and filled with data.
It is an abstraction of the part of the original ontology that interacts directly with the mechanism for establishing correspondences. The class hierarchy of the developed ontology is presented in Figure 2 and includes only two classes: EducationalProgramm, a specialty of education, and List, a list that contains specialties.

Fig. 2. The structure of the classes of the ontology “Test”

The object property hierarchy of the ontology “Test” is presented in Figure 3. Its main properties are the following: equalsTo establishes the correspondence between specialties from various lists, isPartOfList shows that a specialty is part of a list, and listConsistOf shows that a list consists of components (specialties).

Fig. 3. The structure of the object properties of the ontology “Test”

The resulting ontology was filled with data about individuals (specialties). For convenience of testing, the name of each individual contains the names of the specialties to which it is directly related by the property equalsTo. An example of a description of such a specialty is presented in Figure 4.

Fig. 4. Description of the specialty a11a01a21a22

The next step was the development of a console application that implements this algorithm in Java. The main reason for choosing this programming language was that the LOS application is written in it, so further integration of the developed mechanism would be simplest.

The following classes were created during application development:
• App, directly responsible for the logic of establishing correspondence between specialties and forming the required result;
• SparqlQuery, containing the texts of all SPARQL queries used;
• QueryService, needed to substitute variable values and correctly form SPARQL queries before they are sent to the server;
• Util, responsible for connecting to a SPARQL endpoint, sending SPARQL queries and receiving their results.
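The steps of the algorithm in Section 5 can be sketched in a few lines. The sketch below is written in Python rather than Java and is not the authors' implementation; it replaces the SPARQL queries of steps 1, 2 and 4.1 with lookups in two in-memory dictionaries, and all names and data in it (related_specialties, closure_then_filter, econ_A and so on) are hypothetical. For contrast, it also computes the full transitive closure followed by a same-list filter, which mimics the behaviour criticised in Section 4.

```python
def related_specialties(code, equals_to, list_of):
    """Collect specialties from other lists that correspond to `code`.

    equals_to: dict mapping a specialty to the specialties directly
               linked to it by equalsTo (stored symmetrically).
    list_of:   dict mapping a specialty to the code of its list.
    """
    start_list = list_of[code]                      # step 1
    found = set()                                   # the set A

    def visit(current):
        linked = set(equals_to.get(current, ()))    # step 2
        linked.discard(current)                     # step 3
        for candidate in sorted(linked):            # step 4
            if list_of[candidate] == start_list:    # steps 4.1-4.2
                continue
            if candidate in found:                  # step 4.3
                continue
            found.add(candidate)                    # step 4.4
            visit(candidate)                        # step 4.5

    visit(code)
    return found


def closure_then_filter(code, equals_to, list_of):
    """Full transitive closure first, same-list filter afterwards."""
    seen, stack = {code}, [code]
    while stack:
        for nxt in equals_to.get(stack.pop(), ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return {s for s in seen if list_of[s] != list_of[code]}


# Hypothetical toy data: two specialties tracked across lists A, B, C.
LIST_OF = {"econ_A": "A", "econ_B": "B", "econ_C": "C",
           "other_A": "A", "other_B": "B", "other_C": "C"}
DIRECT_LINKS = [("econ_A", "econ_B"), ("econ_B", "econ_C"),
                ("other_A", "econ_B"),      # link that causes the "loop"
                ("other_A", "other_B"), ("other_B", "other_C")]

EQUALS_TO = {}
for left, right in DIRECT_LINKS:            # equalsTo is symmetric
    EQUALS_TO.setdefault(left, set()).add(right)
    EQUALS_TO.setdefault(right, set()).add(left)

print(sorted(related_specialties("econ_A", EQUALS_TO, LIST_OF)))
# ['econ_B', 'econ_C']
print(sorted(closure_then_filter("econ_A", EQUALS_TO, LIST_OF)))
# ['econ_B', 'econ_C', 'other_B', 'other_C']
```

In this toy graph the traversal never steps through other_A, because it belongs to the starting list, so the unrelated specialties behind it are not reached. Filtering after the closure removes other_A itself but keeps everything that was inferred through it.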
The App class requires special attention, since it is where the algorithm for obtaining the list of correspondences for a specialty is implemented. Its main function is recursiveTraversal, a recursive function that traverses the list of specialties related by the property equalsTo; its code is shown in Figure 5.

Fig. 5. Function recursiveTraversal

The result of running the application for the specialty "a11a01a21a22" is presented in Figure 6. All necessary correspondences were obtained, including those whose connection with this individual by the property equalsTo was not set up directly but was obtained as a result of a recursive traversal of related specialties.

Fig. 6. The result of running the application

The integration of this algorithm into the LOS application will provide correct results for a request to search for specialties that correspond to the specialty "Economics" from the list “OKSO” with the code "080100" (Table 3).

Table 3. Correspondences list for the specialty "Economics"

| Educational program | Code | Level of education | UGSN | List | Period of validity |
|---|---|---|---|---|---|
| Economics | 521600 | Bachelor's degree | Liberal arts and socio-economic science | The List | 2001-2004 |
| Economics | 080100 | Bachelor's degree | Economy and Management | 337 | 2011-2012 |
| Economics | 032200 | Bachelor's degree | Economy and Management | 1061 | 2012-present |

6. Conclusion

Thus, we investigated the problem of property transitivity in RDF datasets using a specific example. To do this, we analyzed the work of the LOS application, which performs SPARQL queries to the RDF dataset based on the ontology "Specialties". As a result, a problem was revealed in the construction of the inference chain of the transitive property equalsTo: queries using this property returned an incorrect result containing "loops".
The recursive traversal algorithm for related specialties described in this article produced all the required correspondences on the test ontology and can be implemented in the application “Specialties of higher education of the Russian Federation” to solve the problem of transitivity of the property equalsTo.

In the future, the application “Specialties of higher education of the Russian Federation” will be re-engineered using this algorithm, for which the developed Java application that implements it will be useful.

Notably, although the developed algorithm is formulated and used in terms of a specific subject area to solve the problem of a particular application, it is quite universal and can be used to solve the transitivity problem in RDF datasets of other subject areas.

References

1. Antoniou, G., Groth, P., van Harmelen, F.: A Semantic Web Primer, 3rd edn. Cooperative Information Systems series. The MIT Press (2012).
2. Linked Data Glossary. W3C Working Group Note, 27 June 2013. [Electronic resource]. URL: http://www.w3.org/TR/2013/NOTE-ld-glossary-20130627/#ontology (accessed on 30.09.2020).
3. Guarino, N., Musen, M.: Applied ontology: The next decade begins. Applied Ontology 10(1), pp. 1-4 (2015).
4. Schulz, S.: The Role of Foundational Ontologies for Preventing Bad Ontology Design. CEUR Workshop Proceedings 2205 (2018).
5. Shulga, T., Sytnik, A., Kumova, S., Isaev, D.: Web service for the dissertation opponents selection based on ontological approach. CEUR Workshop Proceedings 2413, pp. 145-151 (2019).
6. Kelle Pereira, C., Siqueira, S., Pereira Nunes, B., Dietze, S.: Linked data in Education: a survey and a synthesis of actual research and future challenges. IEEE Transactions on Learning Technologies, pp. 1-1 (2017). DOI: 10.1109/TLT.2017.2787659.
7. Fionda, V., Pirrò, G., Consens, M.: Querying knowledge graphs with extended property paths. Semantic Web 10, pp. 1-42 (2019). DOI: 10.3233/SW-190365.
8. Order of the Ministry of Education of the Russian Federation of December 4, 2003 N 4482 “On the Application of the all-Russian Classifier of Specialties in Education”. Available online: https://www.vyatsu.ru/uploads/file/1403/prikaz_minobrazovaniya_rossii_perehodnik_okso.pdf (accessed on 30.09.2020).
9. Sytnik, A.A., Shulga, T.E.: Ontological engineering knowledge in the field of higher education of the Russian Federation. In: Engineering enterprises and knowledge management (IP & UZ-2018): collection of scientific papers of the XXI Russian scientific conference, April 26-28, 2018, ed. by Yu. F. Telnov, in 2 vols. Moscow: FGBOU VO "REU named after G. V. Plekhanov", 2018, vol. 1, pp. 234-239. ISBN 978-5-7307-1359-8 (vol. 1).
10. SPARQL endpoint to the ontology "Specialties". Available online: http://sparql.sstu.ru:3030 (accessed on 30.09.2020).
11. Web application "Specialties of higher education of the Russian Federation". Available online: http://los.sstu.ru (accessed on 30.09.2020).
12. Ontology "Specialties" in the open dictionary of related data LOV. Available online: http://lov.okfn.org/dataset/lov/vocabs/losp (accessed on 30.09.2020).