=Paper= {{Paper |id=Vol-2585/paper12 |storemode=property |title=An ontology-based approach to describe collaborative work by reusing and enriching data from an institutional repository |pdfUrl=https://ceur-ws.org/Vol-2585/paper12.pdf |volume=Vol-2585 |authors=María-Auxilio Medina N.,Delia Arrieta D.,Jorge de la Calleja M.,Laura Zacatzontetl H.,Marilú Zacatelco P. |dblpUrl=https://dblp.org/rec/conf/lanmr/NDMP19 }} ==An ontology-based approach to describe collaborative work by reusing and enriching data from an institutional repository== https://ceur-ws.org/Vol-2585/paper12.pdf
     An ontology-based approach to describe collaborative
         work by reusing and enriching data from an
                   institutional repository

             María-Auxilio Medina N.1, Delia Arrieta D.2, Jorge de la Calleja M.1,
                        Laura Zacatzontetl H.1, Marilú Zacatelco P.1
      1
          Departamento de Posgrado. Universidad Politécnica de Puebla. Tercer Carril del Ejido
           Serrano S/N. San Mateo Cuanalá. Juan C. Bonilla, Puebla, México. C. P. 72640
                        2
                          Facultad de Economía, Contaduría y Administración.
               Universidad Juárez del Estado de Durango. Fanny Anitúa y Priv. Loza S/N
                         Col. Los Ángeles Durango, Dgo. México. C.P. 34000
                {maria.medina, jorge.delacalleja, marilu.zacatelco}@uppuebla.edu.mx,
                                {darrietad, laurita_z_h}@hotmail.com




          Abstract. Besides tutoring and consultancies, the development of academic and
          scientific documents in universities is evidence of collaborative work. This paper
          presents an ontology-based approach to describe different modes of
          collaboration by reusing and enriching data from an institutional repository,
          from a collection of posters. The approach uses an application ontology that
          makes explicit the relationships among authors and posters. The paper presents
          a list of competency questions that are answered in natural language and by the
          ontology terminology. The proposed approach is of value because it offers
          machine-readable data to support further analysis and inference mechanisms.
          Keywords: Ontologies, semantic web, institutional repositories, document
          management.



1     Introduction

Besides tutoring and consultancies, the development of academic and scientific
documents in universities is evidence of collaborative work that can be used to
support management decisions. At present, the Universidad Politécnica de Puebla
(UPPue) distributes open-access documents such as articles, master’s theses and
posters through the infrastructure of its institutional repository (IR), from now on
UPPue-IR.

    Copyright © 2019 for this paper by its authors. Use permitted under Creative Commons License
    Attribution 4.0 International (CC BY 4.0)

Posters are documents written by graduate students of different academic programs
in which they report partial results of research activities; posters are often presented at
symposiums or congresses. UPPue-IR is a document database that allows users to
retrieve validated documents, frequently produced by teachers, students or both.
From a technical point of view, this repository implements the Open Archives
Initiative Protocol for Metadata Harvesting (OAI-PMH) (Lagoze and Van de Sompel, 2001)
to interoperate with the National Repository (RN, 2019); this protocol is also used to
export descriptive data of documents, commonly referred to as metadata. The
implementation of this protocol implies that documents are described by using the
Dublin Core Metadata Element Set as the default metadata standard (DCMI, 2014).
The elements of this standard related to collaborative work among authors of
posters are creator and contributor: the first one stores the name of the student,
while the second one refers to his/her advisor; if there is a third or fourth author, their
names are also stored in multiple instances of the contributor element. Although posters
can be retrieved by search engines, neither the order of the list of authors of posters or
other types of academic documents nor the authors' contributions are taken into account.
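The mapping between Dublin Core elements and author positions described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the record below is hypothetical (title and names are invented), and the function name ordered_authors is not part of the UPPue-IR software. The sketch relies on the creator element coming first and the contributor elements keeping their document order.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core Metadata Element Set used by oai_dc records.
DC = "{http://purl.org/dc/elements/1.1/}"

# Hypothetical oai_dc record: the title and the names are invented.
RECORD = """\
<oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
           xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>Sample poster</dc:title>
  <dc:creator>Student A</dc:creator>
  <dc:contributor>Advisor B</dc:contributor>
  <dc:contributor>Teacher C</dc:contributor>
  <dc:date>2018</dc:date>
</oai_dc:dc>
"""

# Author positions in the order described in the text.
POSITIONS = ["first author", "second author", "third author", "fourth author"]


def ordered_authors(record_xml):
    """Return (position, name) pairs: creator first, then contributors."""
    root = ET.fromstring(record_xml)
    names = [e.text.strip() for e in root.findall(DC + "creator")]
    names += [e.text.strip() for e in root.findall(DC + "contributor")]
    return list(zip(POSITIONS, names))


for position, name in ordered_authors(RECORD):
    print(position, "->", name)
```

The position of each name is the only information about collaboration that the plain metadata carries, which is why the ontology makes these roles explicit.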
   This paper presents an ontology-based approach to describe collaboration among
authors of posters by reusing and enriching data from the UPPue-IR. The approach
uses an application ontology that makes explicit the relationships among students,
teachers and posters.
  The paper is organized as follows. Section 2 presents user types and their
competency questions (CQs). Section 3 describes the main ontology components.
Section 4 contains the answers to the CQs. Section 5 enumerates implicit information that
is derived from the ontology. Finally, we conclude in Section 6 with a summary of the
present work along with further research perspectives.


2    User types and their competency questions

According to (Gruber, 1995), an ontology is an “explicit specification of a
conceptualization”; in computer and information sciences, ontologies are formal
definitions of types, properties and relationships between entities that exist in a
particular domain of interest (Ecured, 2019). Ontologies are knowledge models
composed of instances, concepts, rules and relationships that have a unique
representation for a group of people or computers.

Table 1 shows the main user types of the posters’ collection; these users are highly
likely to be found in other IRs.

                      Table 1. User types of the posters’ collection.

         User types            Description
         Advisor               A person who directs the research work of a
                               graduate student that is reported on a poster. The
                               advisor is the second author in the authors’ list.
         Manager               The manager of an IR, in charge of exporting
                               metadata.
         Student               The main author of a poster, the first in the authors’
                               list.
         Teacher               The third or fourth author of a poster, a person from
                               the academic staff who reviews the content and
                               structure of a poster.




The scope of the proposed ontology is determined by the Competency Questions
(CQs) of Table 2; more information about CQs can be found in (Noy and Hafner,
1997) and (Bezerra et al., 2013). CQ1 to CQ4 support knowledge acquisition for IRs,
CQ5 and CQ6 are related to the IR context, while CQ7 and CQ8 have specific
information about collaborative work between authors.


                    Table 2. Competency questions by user type.

        Number of CQ     Description of CQs in natural language
          CQ1            What is a poster?
          CQ2            What is a poster for?
          CQ3            What kind of DC elements are used to describe a poster?
          CQ4            Who uses a poster?
          CQ5            Which metadata elements are mandatory to deposit a
                         poster into the UPPue-IR?
          CQ6            How are posters introduced into the ontology?
          CQ7            Who forms the list of authors of a poster?




3    Main ontology components

The paper proposes an ontology to describe different modes of collaboration among
authors of a posters’ collection. The metadata for this collection are exported from the
UPPue-IR and transformed into ontology instances. Note that any other IR that
implements the OAI-PMH protocol also has its own mechanisms to export
metadata. The ontology is composed of a hierarchy of classes, a set of data properties
(data property axioms), object properties (object property axioms) and instances (also
known as individuals); it was edited with the Protégé software tool, version 5.2
(Musen, 2015). The following sections describe these components.


3.1 Main classes

The main class of the proposed ontology is called University; its purpose is to provide a
general concept that refers to the context of use of the proposed ontology. Table 3
shows the names and descriptions of the three classes at the second level of the ontology;
the remaining concepts are obtained by generalization and specialization and are
distributed across the third and fourth levels of the class hierarchy. By convention, class
names start with a capital letter.




                Table 3. Classes at the second level of the proposed ontology.

     Class          Description

     Department     This class refers to the affiliation of a student or teacher.

     Poster         A document written by a student where he/she reports partial results of
                    his/her research activities.

     User           The User class integrates the user types (advisor, manager, student and
                    teacher). An advisor is a type of teacher.



3.2 Data properties

The classes at the second level of the hierarchy are described by using data properties.
For example, the name, last name and gender of a User and the title and date of a Poster
are modeled as data properties. All interoperability aspects that correspond to the
implementation of the OAI-PMH protocol and the DC elements can be represented as
data properties that link posters and users with data values from an XML Schema
Datatype or an RDF literal (RDF, 2001).


3.3 Object properties

Collaborative work between authors to produce posters is modeled in the ontology
as object properties, which are associated with domain and range restrictions as
illustrated in Table 4.

      Table 4. Object properties for modeling collaborative work to produce posters.

                Object property                Domain                     Range
                assignedTo                     Teacher                    Department
                hasTeacher                     Department                 Teacher
                wasProducedIn                  Poster                     Department
                hasPoster                      Department                 Poster
                hasStudent                     Department                 Student
                studies                        Student                    Department
                isAdvisorOf                    Advisor                    Student
                isFirstAuthorOf                Student                    Poster
                isSecondAuthorOf               Advisor                    Poster
                isThirdAuthorOf                Teacher                    Poster
                isFourthAuthorOf               Teacher                    Poster
                isManagedBy                    Poster                     Manager
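
The domain and range restrictions of Table 4 can be illustrated with a small Python sketch. It is not the authors' implementation: the individuals (ana, luis, poster1, dept1) and the function names are hypothetical. Note also that an OWL reasoner treats domain and range as axioms from which types are inferred, whereas this sketch reads them as integrity checks over individuals whose types are already known.

```python
# Domain and range of each object property, following Table 4
# (isFirstAuthorOf is spelled as in the answers to CQ1 and CQ7).
OBJECT_PROPERTIES = {
    "assignedTo": ("Teacher", "Department"),
    "hasTeacher": ("Department", "Teacher"),
    "wasProducedIn": ("Poster", "Department"),
    "hasPoster": ("Department", "Poster"),
    "hasStudent": ("Department", "Student"),
    "studies": ("Student", "Department"),
    "isAdvisorOf": ("Advisor", "Student"),
    "isFirstAuthorOf": ("Student", "Poster"),
    "isSecondAuthorOf": ("Advisor", "Poster"),
    "isThirdAuthorOf": ("Teacher", "Poster"),
    "isFourthAuthorOf": ("Teacher", "Poster"),
    "isManagedBy": ("Poster", "Manager"),
}

# Table 3 states that an advisor is a type of teacher.
SUPERCLASS = {"Advisor": "Teacher"}


def is_a(cls, expected):
    """True if cls equals expected or is one of its subclasses."""
    while cls is not None:
        if cls == expected:
            return True
        cls = SUPERCLASS.get(cls)
    return False


def domain_range_violations(assertions, types):
    """List assertions whose subject or object has an unexpected type."""
    bad = []
    for subject, prop, obj in assertions:
        domain, range_ = OBJECT_PROPERTIES[prop]
        if not (is_a(types.get(subject), domain) and is_a(types.get(obj), range_)):
            bad.append((subject, prop, obj))
    return bad


# Hypothetical individuals and assertions.
types = {"ana": "Student", "luis": "Advisor",
         "poster1": "Poster", "dept1": "Department"}
assertions = [
    ("ana", "isFirstAuthorOf", "poster1"),
    ("luis", "isSecondAuthorOf", "poster1"),
    ("luis", "assignedTo", "dept1"),   # accepted: an Advisor is a Teacher
    ("poster1", "studies", "dept1"),   # flagged: a Poster is not a Student
]
print(domain_range_violations(assertions, types))
```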




Table 5 shows the facets for the object properties; the notation is as follows:
functional (F), inverse functional (IF), asymmetric (AS) and irreflexive (IR). The object
properties of Table 4 and the facets in Table 5 are axioms of the ontology schema, while
the ontology instances and their property assertions form the ABox; reasoners use both
to maintain logical consistency and to infer new knowledge. It is worth mentioning that
none of the object properties is considered symmetric, transitive or reflexive.

                       Table 5. Facets for object properties.

                   Object properties                        Facets
                   assignedTo                               F, AS, IR
                   hasTeacher                               AS, IR
                   wasProducedIn                            F, AS, IR
                   hasPoster                                AS, IR
                   hasStudent                               AS, IR
                   studies                                  F, AS, IR
                   isAdvisorOf                              AS, IR
                   isFirstAuthorOf                          F, AS, IR
                   isSecondAuthorOf                         F, AS, IR
                   isThirdAuthorOf                          F, AS, IR
                   isFourthAuthorOf                         F, AS, IR
                   isManagedBy                              AS, IR
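
What the facets of Table 5 mean for concrete assertions can be sketched as follows. The sketch assumes that different names denote different individuals; an OWL reasoner, which works under the open-world assumption, would instead conclude that two values of a functional property are the same individual unless they are declared different. The assertions and the function name facet_violations are hypothetical.

```python
# Facets per object property, following Table 5:
# F = functional, AS = asymmetric, IR = irreflexive.
FACETS = {
    "assignedTo": {"F", "AS", "IR"},
    "hasTeacher": {"AS", "IR"},
    "wasProducedIn": {"F", "AS", "IR"},
    "hasPoster": {"AS", "IR"},
    "hasStudent": {"AS", "IR"},
    "studies": {"F", "AS", "IR"},
    "isAdvisorOf": {"AS", "IR"},
    "isFirstAuthorOf": {"F", "AS", "IR"},
    "isSecondAuthorOf": {"F", "AS", "IR"},
    "isThirdAuthorOf": {"F", "AS", "IR"},
    "isFourthAuthorOf": {"F", "AS", "IR"},
    "isManagedBy": {"AS", "IR"},
}


def facet_violations(assertions):
    """Return (facet, assertion) pairs that contradict a declared facet."""
    asserted = set(assertions)
    first_value = {}
    problems = []
    for subject, prop, obj in assertions:
        facets = FACETS.get(prop, set())
        # Irreflexive: no individual is related to itself.
        if "IR" in facets and subject == obj:
            problems.append(("IR", (subject, prop, obj)))
        # Asymmetric: the reversed assertion must not be present.
        if "AS" in facets and subject != obj and (obj, prop, subject) in asserted:
            problems.append(("AS", (subject, prop, obj)))
        # Functional: at most one value per subject.
        if "F" in facets and first_value.setdefault((subject, prop), obj) != obj:
            problems.append(("F", (subject, prop, obj)))
    return problems


# Hypothetical assertions: a teacher assigned to two departments.
assertions = [
    ("luis", "assignedTo", "dept1"),
    ("luis", "assignedTo", "dept2"),
    ("luis", "isAdvisorOf", "ana"),
]
print(facet_violations(assertions))
```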


3.4 Ontology instances

Posters and user types are modeled as ontology instances. A semi-automatic process
has been designed in order to transform metadata from UPPue-IR into ontology
instances. By way of illustration, Figure 1 shows how the 133 posters that form the
posters’ collection are distributed by year.




                        Fig. 1. Distribution of posters by year
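
The paper does not detail the semi-automatic transformation of metadata into ontology instances. The following Python sketch only shows one plausible shape of such a step, under the assumption that each exported record is available as a dictionary with a creator, a list of contributors, a title and a date; the record, the identifier poster1 and the function name are hypothetical.

```python
# Author properties in list order (names as used in Table 4 and Section 4).
AUTHOR_PROPERTIES = ["isFirstAuthorOf", "isSecondAuthorOf",
                     "isThirdAuthorOf", "isFourthAuthorOf"]


def record_to_assertions(poster_id, record):
    """Turn one metadata record into (subject, property, value) assertions."""
    assertions = [(poster_id, "rdf:type", "Poster")]
    # Descriptive fields become data property assertions on the poster.
    for field in ("title", "date"):
        if field in record:
            assertions.append((poster_id, field, record[field]))
    # The creator is the first author; contributors follow in order.
    authors = [record["creator"]] + list(record.get("contributor", []))
    for prop, author in zip(AUTHOR_PROPERTIES, authors):
        assertions.append((author, prop, poster_id))
    return assertions


# Hypothetical record; in practice it would come from the OAI-PMH export.
record = {
    "title": "Sample poster",
    "date": "2018",
    "creator": "Student A",
    "contributor": ["Advisor B", "Teacher C"],
}
for assertion in record_to_assertions("poster1", record):
    print(assertion)
```

A manual review of the generated assertions would still be needed, for example to merge different spellings of the same author name, which is one reason such a process is usually semi-automatic.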




Figure 2 shows information for a user in the Spanish version of the ontology. The
translation of the Spanish terms is as follows:

    •    apellidoMaterno, second last name
    •    nombreDePila, name
    •    Autor, Author, a subclass of the User class
    •    esAutorDe, isAuthorOf
    •    cartel1, poster1
    •    genero, gender
    •    apellidoPaterno, first last name




           Fig. 2. Information about an ontology instance of the User class.


Figure 3 shows the usage information of two different users. It is worth noting
that the role of these users is included in the ontology (“tieneSinodal” is equivalent to
“isThirdAuthorOf” and “ProfesorDeTiempoCompleto” is the Spanish term used for
the “FullTimeTeacher” class).




              Fig. 3. Information about collaborative work of two users.

Figure 4 shows the ontology metrics for the posters’ collection. Note that the number
of axioms is 2985 and that there are 396 ontology instances (individual count).




                     Fig. 4. Metrics of the ontology for posters.




4    Formal answers to competency questions

   CQs are used as guidelines for ontology evaluation. This section presents the
answers to the CQs in natural language and using formal concepts. An excerpt of the
usage information of the ontology elements is given in this section as the formal
answer.

CQ1: What is a poster? A poster is a document written by a graduate student where
he/she reports partial results of his/her research activities.
Formal answer:
    1. Annotation property: rdfs:isDefinedBy for Poster
    2. Data type property: posterData for Poster
    3. Object property: wasProducedIn, isManagedBy, isFirstAuthorOf,
         isSecondAuthorOf, isThirdAuthorOf, isFourthAuthorOf

CQ2: What is a poster for? A poster is a document to report advances or partial
results of research activities.
Formal answer:
     1. Class: Poster
     2. Poster SubClassOf University
     3. Object properties: wasProducedIn, hasPoster

CQ3: What kind of DC elements are used to describe a poster? Title, date, year,
subject (for the department) and a list of authors (creator and contributor elements).

Formal answer:
   1. Class: Poster
   2. Poster SubClassOf University
   3. Data property: title, (functional)
   4. Data property: year, (functional)
   5. Data property: subject, (functional)
   6. Data property: date, (functional)

CQ4: Who uses a poster? UPPue-IR user types are advisor, manager, student and
teacher.
Formal answer:
    1. Class: User
    2. (Advisor, Manager, Student, Teacher) SubClassOf User
    3. Annotation property: rdfs:isDefinedBy for Advisor, Manager, Student,
         Teacher

CQ5: Which metadata elements are mandatory to deposit a poster into the UPPue-IR?
Formal answer:
   1. Data property: title, string or RDF literal
   2. Data property: year, integer
   3. Data property: subject, string or RDF literal
   4. Data property: date, date

CQ6: How are posters introduced into the ontology? Posters are introduced into the
ontology as instances; the information about collaborative work of authors is
represented in object properties.
Formal answer:
    1. Class: Poster
    2. Poster SubClassOf University
    3. Object properties: see Table 4


CQ7: Who forms the list of authors of a poster? A graduate student (the first author),
an advisor (the second author) and two teachers (the third and fourth authors).
Formal answer:
    1. Object properties isFirstAuthorOf, isSecondAuthorOf, isThirdAuthorOf,
         isFourthAuthorOf.
    2. isFirstAuthorOf, domain (Student)
    3. isSecondAuthorOf, domain (Advisor)
    4. isThirdAuthorOf, domain (Teacher)
    5. isFourthAuthorOf, domain (Teacher)
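
As an informal illustration of how the formal answer to CQ7 can be evaluated over instance data, the following Python sketch retrieves the ordered list of authors of a poster from a set of property assertions. The assertions and the function name author_list are hypothetical; in practice the same question would be posed to the ontology through a reasoner or a query language.

```python
# Author properties in list order, as named in the formal answer to CQ7.
AUTHOR_PROPERTIES = ["isFirstAuthorOf", "isSecondAuthorOf",
                     "isThirdAuthorOf", "isFourthAuthorOf"]


def author_list(poster, assertions):
    """Return the authors of a poster in first-to-fourth order."""
    by_property = {prop: subject for subject, prop, obj in assertions
                   if obj == poster and prop in AUTHOR_PROPERTIES}
    return [by_property[p] for p in AUTHOR_PROPERTIES if p in by_property]


# Hypothetical assertions about two posters, given in arbitrary order.
assertions = [
    ("luis", "isSecondAuthorOf", "poster1"),
    ("ana", "isFirstAuthorOf", "poster1"),
    ("eva", "isThirdAuthorOf", "poster1"),
    ("raul", "isFirstAuthorOf", "poster2"),
]
print(author_list("poster1", assertions))
```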


In summary, although the ontology is simple, in the sense that it only adds
semantic information to a particular collection of data from an IR, it is able to
represent CQs and their answers using its own terminology. The HermiT and Pellet
reasoners were used to validate logical consistency, and all the inconsistencies
were corrected before release. The ontology can be exported to different semantic web
languages such as RDF (RDF, 2001) or the Web Ontology Language (OWL, 2004).


5    Implicit knowledge derived from the ontology

The formal features of the ontology make it possible to extract implicit knowledge such
as the following:

•   If the second author of a poster is a teacher, then he/she is considered an advisor
•   If a student is the first author of a poster, then he/she is a graduate student
•   If a poster has only two authors, the first one is a graduate student and the second
    one is his/her advisor
•   A department has many teachers, but a teacher is assigned to only one department
•   Poster and User are disjoint classes
•   A user cannot be a Student and a Teacher at the same time
•   If a teacher is an advisor, his/her name appears in the second place of at least
    one authors’ list
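
Some of the items above can be mimicked with a small forward-chaining sketch in Python. It is not a substitute for the HermiT or Pellet reasoners: it only derives class membership from the domains of the author properties and then checks the disjointness of Student and Teacher. The assertions and function names are hypothetical, and the Advisor subclass relation is taken from the description of the User class in Table 3.

```python
# Domains of the author properties, as given in the formal answer to CQ7.
DOMAIN = {
    "isFirstAuthorOf": "Student",
    "isSecondAuthorOf": "Advisor",
    "isThirdAuthorOf": "Teacher",
    "isFourthAuthorOf": "Teacher",
}
# An advisor is a type of teacher.
SUPERCLASS = {"Advisor": "Teacher"}
# A user cannot be a Student and a Teacher at the same time.
DISJOINT = [("Student", "Teacher")]


def infer_types(assertions):
    """Infer the classes of each subject from the property domains."""
    types = {}
    for subject, prop, _ in assertions:
        cls = DOMAIN.get(prop)
        while cls is not None:
            types.setdefault(subject, set()).add(cls)
            cls = SUPERCLASS.get(cls)
    return types


def disjointness_clashes(types):
    """Return the individuals that belong to two disjoint classes."""
    return sorted(name for name, classes in types.items()
                  if any(a in classes and b in classes for a, b in DISJOINT))


# Hypothetical assertions; "ana" appears both as student and as teacher.
assertions = [
    ("ana", "isFirstAuthorOf", "poster1"),
    ("luis", "isSecondAuthorOf", "poster1"),
    ("ana", "isThirdAuthorOf", "poster2"),
]
types = infer_types(assertions)
print(types["luis"])
print(disjointness_clashes(types))
```

In this example luis is inferred to be an advisor, and therefore a teacher, because he is the second author of a poster, while ana is reported as a possible data inconsistency.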




The establishment of axioms, cardinality, domain and range restrictions, as well as the
definition of object properties, enables a formal representation of knowledge that is useful
to discover possible data inconsistencies. For example, cardinality restrictions can be
inserted into the ontology in order to establish a minimum, exact or maximum
number of authors for each poster. Ontologies such as the one described in this paper can be
used to represent collaborative work on other types of documents according to the
interests of potential users.
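
The effect of a cardinality restriction on the number of authors can be sketched in Python as a closed-world check. The bounds used here (at least one and at most four authors) are assumptions made for the example, not values stated for the ontology, and the posters and assertions are hypothetical. An OWL reasoner would not report a missing author as an inconsistency, since under the open-world assumption the author may simply be unknown.

```python
from collections import Counter

# Properties that link an author to a poster.
AUTHOR_PROPERTIES = {"isFirstAuthorOf", "isSecondAuthorOf",
                     "isThirdAuthorOf", "isFourthAuthorOf"}


def author_count_violations(posters, assertions, minimum=1, maximum=4):
    """Map each poster whose author count is out of bounds to that count."""
    counts = Counter(obj for _, prop, obj in assertions
                     if prop in AUTHOR_PROPERTIES)
    return {p: counts[p] for p in posters
            if not minimum <= counts[p] <= maximum}


# Hypothetical data: poster2 has no author assertions at all.
posters = ["poster1", "poster2"]
assertions = [
    ("ana", "isFirstAuthorOf", "poster1"),
    ("luis", "isSecondAuthorOf", "poster1"),
]
print(author_count_violations(posters, assertions, minimum=2))
```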




6    Conclusions

   This paper presented an ontology-based approach to describe collaborative work
by reusing and enriching data from an institutional repository. Ontology instances
were obtained by exporting metadata of a posters’ collection. The approach uses the
ontology to formally represent relationships among users and posters.
   The paper used a list of CQs that are answered with the ontology terminology, both in
natural language and as formal answers. The natural language answers are stored as
definitions in the RDF language, while the formal answers are extracted from the
usage dialogs of the Protégé ontology editor. Ontology information is used by
reasoners to infer new knowledge as well as to discover possible data
inconsistencies; the latter feature adds value to data from IRs.
   The ontology itself and its instances form a machine-readable dataset that can be
exploited by semantic technologies. As future work, we plan to design an
ontology assessment process to get feedback on the constructed ontologies.


References

1.   Lagoze C., Van de Sompel H. The open archives initiative: building a low-barrier
     interoperability framework. In Proceedings of the first ACM/IEEE-CS Joint Conference
     on Digital Libraries JCDL’01. pp. 54-62. ISBN 1-58113-345-6. DOI:10.1145/379437.
     (2001).
2.   National repository. Repositorio Nacional. Gobierno de México. Consejo Nacional de
     Ciencia y Tecnología (CONACYT). Retrieved from:
     https://www.repositorionacionalcti.mx. (2019).
3.   DCMI Metadata Terms. Dublin Core Metadata Initiative. Retrieved from:
     http://www.dublincore.org/specifications/dublin-core/dcmi-terms/. (2014).
4.   Gruber, T. R. Toward principles for the design of ontologies used for knowledge sharing.
     International Journal of Human-Computer Studies, Vol. 43 No. 4-5. pp. 907-928. (1995)
5.   Ecured. Ontología. Retrieved from: https://www.ecured.cu/Ontología. (2019).
6.   Noy, N. F., Hafner, C. D. The state of the art in ontology design: a survey and
     comparative review. In AI Magazine. Vol. 18. No. 1, pp. 53-74. (1997).
7.   Bezerra, C., Freitas, F., Santana, F. Evaluating ontologies with competency questions. In
     Proceedings of the 2013 IEEE/WIC/ACM International Joint Conferences on Web
     Intelligence (WI) and Intelligent Agent Technologies (IAT), IEEE Computer Society
     Washington, DC, USA. Vol. 03. No. 1. pp. 284-285. ISBN: 978-0-7695-5145-6. DOI:
     10.1109/WI-IAT.2013.199. (2013).




8.   RDF 1.1 XML Syntax. Retrieved from: http://www.w3.org/TR/rdf-syntax-grammar/.
     (2001).
9.   Musen, M. A. The Protégé project: a look back and a look forward. AI Matters.
     Association for Computing Machinery Special Interest Group in Artificial Intelligence,
     Vol. 1 No. 4, pp. 4-12. DOI: 10.1145/2557001.25757003. (2015).
10.  OWL Web Ontology Language Overview. Retrieved from:
     http://www.w3.org/TR/owl-semantics/. (2004).



