=Paper=
{{Paper
|id=None
|storemode=property
|title=Building CORTUPP: a digital collection of technical reports with semantic features
|pdfUrl=https://ceur-ws.org/Vol-677/12_LANMR10_poster.pdf
|volume=Vol-677
}}
==Building CORTUPP: a digital collection of technical reports with semantic features==
Ma. Auxilio Medina, Argelia B. Urbina Nájera, Antonio Benitez R., J. de la
Calleja, E. López D., Rebeca Rodríguez H.
Universidad Politécnica de Puebla
Tercer Carril del Ejido Serrano S/N
Juan C. Bonilla, Puebla, México
{mmedina, aurbina, abenitez, jdelacalleja, elopez, rrodriguez}@uppuebla.edu.mx
WWW home page: http://informatica.uppuebla.edu.mx/
~mmedina, ~aurbina, ~abenitez, ~jdelacalleja, ~elopezd, ~rrodriguez
Abstract. Building a digital collection from scratch involves technical
decisions such as choosing the format and design of documents, selecting
search and browsing mechanisms to access data and metadata, and adopting
an architecture that supports collaborative work among authors. This paper
describes CORTUPP, a digital collection of technical reports. CORTUPP uses
REC, an external service that supports collaborative labeling and ranking
of documents.
1 Introduction
Learning is a continuous process supported by daily activities; peer and
student-teacher interactions enrich and accelerate this process [1]. Since
2004, the Universidad Politécnica de Puebla (UPPuebla) has adopted the
competency-based education model [2]. This paper describes our work on
integrating educational resources into a digital collection of technical
reports called CORTUPP. A technical report is a document that describes a
research project or a technological solution; students write it as their
final work.
The paper is organized as follows. Section 2 briefly describes a semantic
digital library. Section 3 explains the architecture of the collection. Section 4
describes the semantic features of CORTUPP. Section 5 explains how REC, an
external service that supports collaborative labeling and ranking of documents,
is integrated into CORTUPP. Finally, Section 6 presents conclusions and
suggests future directions for our work.
2 Semantic digital libraries
Semantic digital libraries are systems built upon research on digital libraries,
the semantic web, social networking and human-computer interaction: they
integrate knowledge organization systems, delivered by classic digital
libraries, with semantic web and social networking (Web 2.0) technologies [3].
We believe that semantic web technologies can support the development
of valuable collections and services required in educational institutions. Our
particular interest is the semantic digital library which, according to [3], is
formed by materials, tools and meanings. Some of the goals of a semantic
digital library are the following:
– Anyone can use it
– Knowledge is accessible from the semantic digital library
– Resources are available anytime, anywhere
– Friendly and multimodal interfaces
– Multiple connected devices
Although freely distributed software for constructing semantic digital
libraries exists, such as Greenstone 1 or JeromeDL 2 , we decided to
implement an independent component in order to take into account the
workflows in place at the UPPuebla. We believe that CORTUPP can serve
as a basis for constructing a semantic digital library for the UPPuebla.
3 Architecture of CORTUPP
The CORTUPP collection consists of a database, a web interface, assessment
instruments, a common document structure and search mechanisms. It is
available at http://server3.uppuebla.edu.mx/cortupp/. Figure 1 shows the
architecture of the collection, which is an adaptation of an architecture
proposed in [3].
The content of CORTUPP is formed by technical reports, records of assessment
committees, assessment instruments and a calendar of activities. User data
and user accounts are also part of the content. The main users are teachers
and students at the UPPuebla, who make use of access, storage and search
services. The following sections describe the components of the architecture.
3.1 Structure of technical reports
We propose a common document structure that carries shared semantics by
itself. This structure is formed by the following mandatory chapters:
1) research proposal, 2) theoretical framework, 3) research design,
4) implementation, 5) results and 6) conclusions. Supporting material for the
research project, such as interviews, questionnaires or large tables, can be
added in appendices.
1
http://www.greenstone.org/
2
http://www.jeromedl.org/
Fig. 1. Architecture of CORTUPP
Authors are free to propose the structure of each chapter, except the first
one, the research proposal, which is formed by the following mandatory
sections: introduction, general objective, specific objectives, justification,
schedule of activities, hardware and software requirements, and scope and
limits of the research project. The document structure has been defined as a
LaTeX template. The BibTeX file format is used to create the bibliography 3 . A
technical report is itself described as a techreport entry.
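To make the idea of describing a report as a techreport entry concrete, the following is a minimal sketch that formats a BibTeX @techreport entry from plain strings. The function name, the entry key and the sample values are assumptions made for illustration; they are not taken from the CORTUPP template. The fields author, title, institution and year are the ones BibTeX requires for this entry type.

```python
def techreport_entry(key, author, title, institution, year, number=None):
    """Format a BibTeX @techreport entry (hypothetical helper, not part of CORTUPP)."""
    fields = [
        ("author", author),
        ("title", title),
        ("institution", institution),
        ("year", str(year)),
    ]
    # "number" is an optional field of @techreport.
    if number is not None:
        fields.append(("number", str(number)))
    body = ",\n".join(f"  {name} = {{{value}}}" for name, value in fields)
    return f"@techreport{{{key},\n{body}\n}}"


# Made-up sample values, for illustration only.
entry = techreport_entry(
    key="example2010",
    author="A. Student",
    title="An example technical report",
    institution="Universidad Politécnica de Puebla",
    year=2010,
    number=1,
)
print(entry)
```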
We distinguish two kinds of users: internal users, who belong to the UPPuebla
community (students, teachers, managers and staff of the diffusion
department), and external users, who are members of other academic
communities or visitors to the collection.
3.2 Keyword-based services
CORTUPP has a web-based interface to access the documents stored in the
database. It uses hyperlinks to explore documents, to download the
assessment instruments or to reach relevant web pages. Technical reports are
stored as PDF files on the web server.
In CORTUPP, users can carry out two types of searches: 1) keyword-based
searches and 2) authority searches (search by author or by a member of the
assessment committee). An assessment committee is formed by three teachers
who play the roles of advisor, secretary and vocal. This committee validates
the content of the document. Figure 2 shows the interface of CORTUPP.
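The two search types can be sketched over an in-memory list of records. This is only an illustration under stated assumptions: the Report class, its fields and the sample data are hypothetical, and the actual CORTUPP implementation queries a database whose schema is not described here.

```python
from dataclasses import dataclass, field


@dataclass
class Report:
    """Hypothetical record for one technical report."""
    title: str
    authors: list
    committee: list  # advisor, secretary and vocal
    keywords: list = field(default_factory=list)


def keyword_search(reports, query):
    """Return reports whose title or keywords contain every query term."""
    terms = query.lower().split()
    hits = []
    for report in reports:
        text = " ".join([report.title] + report.keywords).lower()
        if all(term in text for term in terms):
            hits.append(report)
    return hits


def authority_search(reports, name):
    """Return reports where the name matches an author or a committee member."""
    needle = name.lower()
    return [
        report for report in reports
        if any(needle in person.lower()
               for person in report.authors + report.committee)
    ]


# Made-up sample data, for illustration only.
reports = [
    Report("Semantic search prototype", ["A. Student"],
           ["T. One", "T. Two", "T. Three"], ["ontology", "search"]),
    Report("Inventory control system", ["B. Student"],
           ["T. Two", "T. Four", "T. Five"], ["database"]),
]

print([r.title for r in keyword_search(reports, "ontology search")])
print([r.title for r in authority_search(reports, "T. Two")])
```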
3
http://www.kfunigraz.ac.at/ binder/texhelp/bibtx-7.html
Fig. 2. Interface of CORTUPP
4 Semantic features
CORTUPP uses existing legal metadata in semantically enabled libraries.
Technical reports are described with the Dublin Core (DC) elements of Table 1.
These elements are associated with the elements of the LaTeX template.
Table 1. DC elements used to describe a technical report
DC element    Description
Creator       Indicates the name of the first author
Date          Indicates the delivery date
Description   Contains the abstract of the technical report
Identifier    A number used to identify the technical report in the collection
Language      Language of the content (Spanish)
Publisher     Contains the name of the university as the entity responsible for publishing the technical report
Subject       Keywords of the technical report according to a research area
Title         The title given to the technical report
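As an illustration of how the elements of Table 1 could be serialized, the sketch below builds an XML record with the standard Dublin Core element namespace. The wrapper element name "record", the function name and the sample values are assumptions; the paper does not state how CORTUPP stores or exports its metadata.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core Metadata Element Set, version 1.1.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)

# The eight elements listed in Table 1.
DC_ELEMENTS = ["creator", "date", "description", "identifier",
               "language", "publisher", "subject", "title"]


def to_dublin_core(report):
    """Build an XML record from a dict of DC element names to values."""
    root = ET.Element("record")  # hypothetical wrapper element
    for name in DC_ELEMENTS:
        values = report.get(name, [])
        if isinstance(values, str):
            values = [values]
        # Repeatable elements such as subject produce one child per value.
        for value in values:
            child = ET.SubElement(root, f"{{{DC_NS}}}{name}")
            child.text = value
    return root


# Made-up sample values, for illustration only.
sample = {
    "creator": "A. Student",
    "date": "2010-06-30",
    "description": "Abstract of the report.",
    "identifier": "42",
    "language": "es",
    "publisher": "Universidad Politécnica de Puebla",
    "subject": ["ontology", "search"],
    "title": "An example technical report",
}
record = to_dublin_core(sample)
xml_text = ET.tostring(record, encoding="unicode")
print(xml_text)
```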
CORTUPP is represented in a structure called an ontology of records, which
maintains an organization by content. This is a hierarchical structure that
provides a unique and unambiguous interpretation of the document elements.
It has concept-term relationships that are useful for free-text search. The
main characteristics of an ontology of records are the following:
1. Technical reports are clustered by similarity
2. Clusters at level k have labels of k terms
3. All documents of a cluster share the terms of its label
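The three characteristics above can be illustrated with a small sketch. It is not the construction method of [4]; it assumes each report is reduced to a set of terms and approximates similarity by shared terms, so that a cluster at level k is simply the set of reports containing all k terms of its label. The function name, parameters and sample data are hypothetical.

```python
from itertools import combinations


def build_levels(docs, max_level=2, min_size=2):
    """Return {k: {label: doc ids}}, where each label is a tuple of k terms.

    docs maps a document id to its set of terms. A cluster is kept only
    if at least min_size documents share every term of its label.
    """
    vocabulary = sorted(set().union(*docs.values()))
    levels = {}
    for k in range(1, max_level + 1):
        clusters = {}
        for label in combinations(vocabulary, k):
            members = sorted(
                doc_id for doc_id, terms in docs.items()
                if set(label) <= terms
            )
            if len(members) >= min_size:
                clusters[label] = members
        levels[k] = clusters
    return levels


# Made-up sample data, for illustration only.
docs = {
    "r1": {"ontology", "search", "metadata"},
    "r2": {"ontology", "search"},
    "r3": {"metadata", "search"},
}
levels = build_levels(docs)
for k, clusters in levels.items():
    for label, members in clusters.items():
        print(k, label, members)
```

In this sketch a level-2 cluster is always contained in the level-1 clusters of each of its terms, which gives the hierarchy its parent-child links.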
The features of the ontology of records are described in [4]. Semantic
information is thus represented by the metadata attached to each document and
by the ontology of records. The design of CORTUPP corresponds to the levels of
knowledge proposed in [3]:
1. Organization of the information in databases
2. Organization of the information in the documents
3. Organization of the metadata
4. Organization of the topics treated in the documents
5. Organization of the concepts, terms and relations
5 REC: an external service with semantic features
Adding semantic features to digital collections is a topic of interest in
research areas such as collaborative labeling, Web 2.0 and semantic digital
libraries. For example, [5] describes the potential of tagging systems to
support knowledge organization, and [6] investigates social bookmarking in
digital libraries and derives the design requirements for incorporating it.
A tag is a keyword that acts like a subject or category for the associated
content [3]. Tags are user-added metadata; tagging establishes a
relationship between an online information resource and a user.
In social contexts such as Flickr 4 , Facebook 5 , del.icio.us 6 and Soboleo 7 ,
traditional information retrieval measures matter less than the opinions
and experience of previous users. For this reason, we decided to integrate
REC, an open software tool that allows users of CORTUPP to add tags, so that
subjective user information is taken into account.
REC [7] makes use of the “induced tagging” technique, designed to improve
the quality of automatic markers. It offers collaborative labeling, in which
the resulting tags produce recommendations, and a document ranking service,
in which labels support the recommendation of helpful content.
REC allows domain experts and members of the community to assign meaningful
labels to the technical reports of CORTUPP. Using REC, users construct a
different organization of the documents. The integration of REC into CORTUPP
turns it into a community information space through functionality for
selection, annotation, authoring/contribution and collaboration.
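The general idea of collaborative labeling with tag-based ranking can be sketched as follows. This is a generic illustration and not the induced tagging technique of REC [7], whose interface is not described here; the class, method names and sample data are hypothetical. Each user may attach a given tag to a document once, and documents are ranked for a tag by the number of users who assigned it.

```python
from collections import Counter, defaultdict


class TagStore:
    """Hypothetical in-memory store of user-assigned tags."""

    def __init__(self):
        self._tags = defaultdict(Counter)  # doc id -> Counter of tag -> users
        self._seen = set()                 # (user, doc id, tag) already counted

    def add_tag(self, user, doc_id, tag):
        """Record that a user labeled a document; repeats are ignored."""
        tag = tag.strip().lower()
        key = (user, doc_id, tag)
        if tag and key not in self._seen:
            self._seen.add(key)
            self._tags[doc_id][tag] += 1

    def rank(self, tag):
        """Return (doc id, count) pairs for a tag, most tagged first."""
        tag = tag.strip().lower()
        scored = [
            (doc_id, counts[tag])
            for doc_id, counts in self._tags.items()
            if counts[tag] > 0
        ]
        return sorted(scored, key=lambda pair: (-pair[1], pair[0]))


# Made-up sample data, for illustration only.
store = TagStore()
store.add_tag("ana", "r1", "Ontology")
store.add_tag("luis", "r1", "ontology")
store.add_tag("ana", "r2", "ontology")
store.add_tag("ana", "r1", "ontology")  # same user, same tag: not counted again
print(store.rank("ontology"))
```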
4
http://www.flickr.com/
5
http://www.facebook.com/
6
http://delicious.com/
7
http://www.soboleo.com/
6 Conclusions
CORTUPP allows users to reuse the content of documents; the collection
integrates research activities carried out by students under the supervision
of a team of teachers. Its advantages include the distribution of assessment
instruments and the extension of document descriptions with labels. However,
there are several challenges in the construction of a semantic digital
library, such as improving usability and providing inference mechanisms. To
date, CORTUPP can be seen as a result of collaborative content production at
the UPPuebla.
Currently, search services in CORTUPP are keyword-based. As future work,
we plan to extend those services with semantic search engines, which can
improve the quality of keyword-based search by taking into account the
meaning of words. We also see further possibilities for organization and
recommendation arising from the use of REC.
References
1. I., H.: Role of information technologies in teaching learning process: perception of
the faculty. Turkish Online Journal of Distance Education - TOJDE 9(2) (2008)
2. Lindley, W.I.: Constraints and potentials of training mid-career extension
professionals in Africa, part 2 (1999)
3. Kruk, S.R., McDaniel, B.: Semantic Digital Libraries. Springer-Verlag, Berlin,
Heidelberg (2009)
4. Medina, M.A., Sánchez, J.A.: Ontoair: A method to construct lightweight ontologies
from document collections. In: ENC ’08: Proceedings of the 2008 Mexican
International Conference on Computer Science, Washington, DC, USA, IEEE Computer
Society (2008) 115–125
5. Li, Q., Lu, S.C.Y.: Collaborative tagging applications and approaches. IEEE
Multimedia 15 (2008) 14–21
6. Puspitasari, F., Lim, E.P., Goh, D.H.L., Chang, C.H., Zhang, J., Sun, A., Theng,
Y.L., Chatterjea, K., Li, Y.: Social navigation in digital libraries by bookmarking.
In: ICADL’07: Proceedings of the 10th international conference on Asian digital
libraries, Berlin, Heidelberg, Springer-Verlag (2007) 297–306
7. Sánchez, J.A., Arzamendi-Pétriz, A., Valdiviezo, O.: Induced tagging: promoting
resource discovery and recommendation in digital libraries. In: JCDL ’07:
Proceedings of the 7th ACM/IEEE-CS joint conference on Digital libraries, New York,
NY, USA, ACM (2007) 396–397
Acknowledgments
We thank the staff of the Programa Académico de Ingeniería en Informática at
the UPPuebla for their help and cooperation in the construction process. This
work is partially supported by PROMEP grant Biblioteca Digital Semántica de
Recursos Educativos /103-5/09/4023.