A Framework for Ontological Description of Archaeological Scientific Publications

Andrea Bonomi, Glauco Mantegari and Giuseppe Vizzari
Department of Informatics, Systems and Communication, University of Milan-Bicocca
Via Bicocca degli Arcimboldi 8, 20126 Milano, Italy
{andrea.bonomi, glauco.mantegari, giuseppe.vizzari}@disco.unimib.it

Abstract: This paper describes the results of the first step in the development of a comprehensive project aimed at realizing a portal and a set of advanced services supporting the sharing of knowledge about Prehistory and Protohistory in the Italian context. In particular, one of these services is a digital library whose entries (i.e. bibliographic descriptions of publications) will be ontologically described. The paper introduces the approach that was adopted to support the acquisition and representation of ontological knowledge. The software modules that were developed to support these phases allow, on the one hand, the management of the assertional component of the ontology and, on the other, the association of the related entities to digital library contents. These descriptions will be exploited to support effective strategies for bibliographic information retrieval, as well as semantic navigation schemes based on the recommendation of contents related to the currently viewed one.

I. INTRODUCTION

The term e-Science is used to describe science performed through global collaborations between scientists, enabled by Internet technologies, in order to solve scientific problems [1]. Today, no researcher can work in isolation: his or her work depends on the resources available in the scientific community. Publications provide one of the main channels for the dissemination of scientific results, and it is very important to have access to the right publications when they are needed. Moreover, in most scientific fields the number of publications is growing exponentially [2], and finding the right information is getting correspondingly harder. The growth of the number of existing scientific publications is not a new phenomenon: in 1960, Maron and Kuhns reported that documentary data were being generated at an alarming rate, doubling every 12 years [3].

Today there are many on-line digital libraries that help users find information about scientific publications. There are mainly two approaches to populating these libraries: editing their contents manually and populating them automatically. The first approach is generally used by libraries, publishers, editors, laboratories, researchers, and so on, whereas the second is often adopted by Internet portals and general-purpose search engines, like Google Scholar¹ or CiteSeer². These search engines actively retrieve new documents and automatically tag and link the metadata inherent in the structure of a scientific publication, without any human intervention. On the other hand, the libraries that use the first approach are maintained with massive human effort, which is justified only if the offered quality is much higher. For example, in DBLP³ (Digital Bibliography & Library Project) the entries and the related information (authors, conferences, journals, etc.) are manually standardized to guarantee that every entity is always represented by the same string (e.g. every author name is always spelt the same way). This process is necessary because different bibliographic information sources provide information in different formats (e.g. some sources give full author names, while in others names are abbreviated).

Another difference between the two approaches regards the description of the contents of the publications. The manual approach consists in (manually) associating publications with keywords from a dictionary or a classification system. General classification systems are available (e.g. the ACM Computing Classification System⁴ or the Dewey Decimal Classification System⁵); however, they are extremely generic and they do not support the description of relations between different publications (e.g. to state that a publication is part of a collection, or to define links between an article and a technical report). The automatic approach consists in extracting keywords from the full-text documents and associating them with the related publication description. This approach requires access to the full-text documents in processable form (i.e. if the documents are digitized from hard copies, they must be processed by means of an OCR tool).

This paper describes instead a manual description approach adopted in the specific domain of Archaeology. This work is set in the wider context of a project aimed at realizing a portal and a set of advanced services supporting the sharing of knowledge about Prehistory and Protohistory in the Italian area. For these specific activities, partners in Archaeology departments participate in the project by providing their domain knowledge, and also by involving (thesis, master, PhD) students who can carry out document description activities. However, in order to describe contents effectively beyond a keyword-based approach, and thus to support effective forms of information retrieval and semantic navigation, human annotators must have available a domain ontology whose elements can be selected as relevant indicators of the topics treated in the described publication. The paper introduces the ontological description approach, as well as the software modules developed to support the definition of the archaeological domain ontology and the annotation of the e-Library documents.

The remainder of this paper is organized as follows. Section II provides a description of the application scenario, which is followed by a discussion of related works. In Section IV we discuss the chosen content description approach and in Section V we describe the overall system architecture. We end with an outlook on possible future extensions.

¹ http://scholar.google.com/
² http://citeseer.ist.psu.edu/
³ http://dblp.uni-trier.de/
⁴ http://www.acm.org/class/
⁵ http://en.wikipedia.org/wiki/Dewey_Decimal_Classification

II. APPLICATION SCENARIO

In the course of 2005, the chair of Prehistory and Protohistory of the University of Milan, in collaboration with the Department of Informatics, Systems and Communication of the University of Milan-Bicocca and the Department of Archaeology of the University of Bologna, started a long-term project for the creation of a set of Web-oriented services aimed at supporting the sharing of knowledge on prehistory and protohistory in Italy.

The main objective of the project has been the creation of a Web portal, named ArcheoServer⁶, which will provide a collaborative platform for the exchange of scientific information among the communities of Italian archaeology researchers. A first type of information regards the preliminary results of research in progress (e.g. results relating to the archaeological excavations conducted during the year), which are rarely communicated to the scientific community before being revised and included in larger studies. Moreover, the portal will provide easy access to more articulated and analytical contributions on specific topics (e.g. those discussed in a PhD thesis or in an article in a scientific journal), by means of the electronic publishing of traditional papers.

Fig. 1. A screenshot of the ArcheoServer Home Page

A particularly relevant section of the portal is devoted to an e-Library, which was devised to supply an effective mechanism for the retrieval of digital resources related to prehistory and protohistory, and in general to archaeological research methodologies. In fact, even though a growing number of initiatives provide for the electronic publishing of scientific papers (for example, the digital archives of the Italian Institute for Prehistory and Protohistory⁷ or the BibAr⁸ project hosted at the University of Siena), their indexing by traditional search engines is often unsatisfactory.

The main requirement of the portal is to give the community itself the possibility of autonomously managing the contents by means of simple editing tools. In this regard, we must keep in mind that, in most cases, archaeologists have only basic technical skills, and the development of a complex editing system may result in the failure of the project. In our scenario there are two principal classes of editors. The first one is represented by the Archaeology students of the universities involved in the project, who are responsible for content creation; the second one is represented by Archaeology professors or researchers who, beyond creating contents, supervise the work of the students.

However, our intent is also to create a platform for experimenting with computer science research, focusing on those aspects that can lead to real innovation, such as the semantic description and retrieval of the contents. In fact, scientific publications in archaeology reflect its strongly interdisciplinary nature in terms of richness and articulation of contents. For this reason, it is interesting to describe, by means of a specific ontology, all the publications that will be archived in the e-Library section, in order to provide advanced instruments for a more effective retrieval of the specific information a user is interested in. Moreover, the system may even suggest relevant contents which are semantically related to the ones the user is currently viewing on the screen.

Therefore, the e-Library must allow content editors to describe all the publications semantically, in a collaborative and simple way, through a simple Web-based user interface. In particular, this description will be performed manually by the students, while archaeology professors and researchers will supervise the work and progressively refine and maintain the domain ontology.

Since ontologies are complex to build and understand, the terminological component of the ontology (roughly speaking, the structure of the ontology) has to be designed by archaeology professors and researchers with the aid of knowledge engineers. In our scenario, since structural modifications rarely occur after the initial design of the ontology, an ontology editing tool external to the Web-based e-Library system can be used for this activity. The e-Library only has to support the maintenance of the assertional component of the domain ontology (the instances of the concepts defined in the terminological component). Figure 2 shows a scenario that conveys how the different user groups should interact with the main components of the application.

Fig. 2. Use case displaying users groups and their actions over the system (actors: Knowledge Engineer, Ontology maintainer, Bibliography editor and End-User; components: T-Box, A-Box, Bibliography and Publications descriptions)

A non-functional requirement for the e-Library is the adoption of OWL⁹ (Web Ontology Language) as the ontology language used to describe the contents of the publications. OWL was adopted because it allows representing and exporting ontological knowledge in an interoperable way.

It must be noted that this paper reports the first results of the project, but we also aim at applying this approach to the ontological description of other contents of the portal, from images depicting findings and sites to specific elements of interest in the webGIS (e.g. sites, settlements). The description of these aspects of the project, however, is outside the scope of this paper.

⁶ http://www.archeoserver.it/
⁷ http://www.iipp.it/
⁸ http://www.bibar.unisi.it/
⁹ http://www.w3.org/TR/owl-features/

III. ANALYSIS OF THE RELATED SYSTEMS

Before choosing how to develop the e-Library, we analyzed several available bibliographic information systems, semantic annotation frameworks, and OWL editors and viewers. A summary of this analysis is given in this section.

The e-Library is mainly inspired by DBLP¹⁰ (Digital Bibliography & Library Project). DBLP is a Computer Science bibliography developed by the University of Trier that allows searching a huge collection of bibliographic information (in October 2006, more than 800,000 publications) with an easy-to-use Web interface. The Web interface also allows browsing the bibliography by following links to authors, citations, journals and conferences. DBLP collects bibliographic information provided by publishers, editors and so on. A detailed description of DBLP, its architecture, evolution and perspectives can be found in [4]. Since DBLP was started as a prototype Web application in 1993, several years before the birth of the Semantic Web initiative, it does not provide any form of semantic description of the publications.

CiteSeer¹¹ is another example of a Web-based digital library of scientific literature, developed by the NEC Research Institute. The aim of the project is not only to create a digital library but also to provide algorithms, metadata, services, techniques, and software that can be used in other libraries. CiteSeer offers users features similar to those of DBLP but uses a different approach to populate the library: it actively retrieves new documents and automatically tags and links the metadata inherent in the syntactic structure of an academic document [5]. In our opinion, CiteSeer offers many interesting features, but since it is not an open source product we cannot use it for our e-Library framework.

Another application for assisting users in managing, searching, and sharing bibliographic information is Bibster¹² [6]. It allows searching bibliographic information on a distributed peer-to-peer network using Semantic Web technologies and provides an easy way to share data with other users. Bibliographic data are represented following the SWRC (Semantic Web Research Community) Ontology [7]. This ontology defines a shared and common domain theory that represents a research community, its researchers, topics, publications, tools, and the relations between them. However, Bibster does not match our requirements, since the project requires a Web-based e-Library application.

Outside the bibliographic domain, we have analyzed several ontology-based Web search applications. OntoWeb [8] is a semantic portal through which knowledge can be gathered, stored and accessed by the members of a certain community. Knowledge retrieval and extraction is based on the ontological annotation of the documents. In the portal, the hierarchical organization of the different concepts of the ontology is graphically represented as a dynamic tree, in which users can view the instances of a class by expanding the tree nodes and selecting the element of interest. OntoWeb graphically displays only the relations between classes, not the relations between individuals. In our opinion, this kind of visualization is not suitable for our requirements, because the relations between specific e-Library contents (i.e. individuals) are extremely relevant.

Sesame¹³ [9] is an open source framework with support for inferencing and querying on RDF and RDF Schema. Although it is mainly a library for building applications that need to work with RDF, Sesame comes with an interface that allows access to semantic repositories through a Web browser. The interface supports both semantic queries and navigation of the ontology via hyperlinks. However, this interface is not intended to support end-users with little or no knowledge of ontology languages, and thus it offers only basic support for our requirements. From the developer's point of view, the API provided by Sesame is comparable to the Jena API. In an evaluation of different knowledge bases presented in [10], Sesame seems to be faster than Jena. However, we chose Jena for developing our framework because Sesame lacks complete support for OWL.

In order to develop the e-Library user interface, many ontology editors and visualization tools have been investigated. In our opinion, these applications are critical, because the diffusion of Semantic Web technology depends on the availability of convenient and flexible tools for editing and browsing ontologies.

The most popular ontology editor is Protégé¹⁴. It is a free, open source ontology editor and knowledge-base framework. A detailed description of Protégé is beyond the scope of this paper and can be found in [11]. In our opinion, Protégé is one of the best OWL editors, but its user interface is too complex for a user with no experience of OWL, and it lacks some useful functions, such as the inspection of elements via hyperlinks and comfortable editing and visualization facilities for the A-Box [12].

An interesting Web-based OWL ontology exploration tool is OntoXpl, which is described in [12]. In particular, an interesting feature of OntoXpl is its visualization facility for the A-Box, which can be displayed as a tree whose nodes are individuals and whose arcs are properties. This kind of visualization is suitable for A-Boxes with many individuals. OntoXpl also supports the inspection of the ontology elements via hyperlinks. OntoXpl has inspired the design of the framework user interface, particularly the navigation tree and the A-Box Editor.

¹⁰ http://dblp.uni-trier.de/
¹¹ http://citeseer.ist.psu.edu/
¹² http://bibster.semanticweb.org/
¹³ http://www.openrdf.org/
¹⁴ http://protege.stanford.edu/

IV. CONTENT DESCRIPTION APPROACH

Following the previously introduced requirements, three e-Library user groups have been identified: ontology maintainers, content editors and end-users. End-users may have no knowledge of ontologies and related editors, and ontology maintainers are assumed to have a limited background in ontologies. Thus, one of the most important decisions in the design of the e-Library is how to display and edit, in a user-friendly way, the terminological component of the ontology (a set of classes and properties, in the following called T-Box) and its assertional component (a set of T-Box-compliant individuals, in the following called A-Box).

As mentioned in Section II, this framework includes no specific tool to edit the T-Box. We adopted Protégé to edit the terminological component of the ontology and save it as an OWL file, which we then import into the framework.

From the user's point of view, the developed framework is composed of different modules: the A-Box Editor, the Publication Description Interface and the End-User Interface. This last module is divided into three submodules: the Semantic Query Interface, the Semantic Navigation Interface and the A-Box Viewer. Not all the modules are fully implemented yet; in particular, the Semantic Query and Semantic Navigation Interfaces are still under development. More details about each module are given in the following paragraphs.

A. A-Box Editor

The A-Box Editor is available only to ontology maintainers and enables them to edit the ontology A-Box.

Fig. 3. A screenshot of the A-Box Editor

As shown in Figure 3, the ontology navigation tree is placed on the left part of the interface, and the editor of individuals and properties on the right. The aim of the navigation tree is to explore the A-Box and select the individual to edit. The navigation tree is not a hierarchy of classes, but rather of individuals connected by partOf or superType¹⁵ properties.

OWL does not contain specific primitives for partOf or superType properties, but it supports suitable mechanisms to express the features we wanted to specify for them. We defined both these properties as transitive (e.g. if Varese Province is part of Lombardy, and Lombardy is part of North Italy, then Varese Province is part of North Italy). For each property, we also defined a sub-property which is direct and non-transitive (e.g. we defined the property partOf_directly as a sub-property of partOf). These sub-properties directly link an individual with its "father" and are used to build the navigation tree. For example, if we assert that Varese Province is directly part of Lombardy, a reasoner infers that Varese Province is part of Lombardy and, since Lombardy is in turn part of Italy, that Varese Province is part of Italy. This approach to the representation of the Part-Whole relation is described in [13].

We decided that the displayed navigation tree should not exactly reflect the structure of the ontology A-Box, but should rather attempt to provide users with a clear and usable presentation of the ontology. An example of a navigation tree is shown in Figure 4. The root of the navigation tree is the "fake" element Thing; it is not actually part of the ontology and is only a placeholder. Under the root node there are the top-level individuals (e.g. Human activity or Italy). These individuals are connected to the underlying individuals by partOf or superType properties (e.g. North Italy is part of Italy, or Farming has super type Human activity).

Fig. 4. Navigation tree and graphical representation of the correspondent ontology graph

The editor of individuals and properties is placed on the right part of the interface. Using this editor, an ontology maintainer can create new individuals related to an existing one by means of a partOf or superType property (as shown in Figure 5), remove individuals, edit the label of an individual (the displayed name) and edit the related properties.

Fig. 5. Dialog box to create a new individual under North Italy and graphical representation of the graph after the creation of the new individual

The properties of each class are defined in the T-Box. Two types of properties are distinguished: an object property is a binary relation between two individuals, and a datatype property is a binary relation between an individual and a literal (a primitive type, like a string or a number). Properties can also have cardinality and range restrictions. For example, the class TypologyOfArchaeologicalObject has the property buildOf. This property has no cardinality restriction (so it can have zero, one or more values), but Material is specified as its range (co-domain). For instance, Sword is an instance of TypologyOfArchaeologicalObject and has the property buildOf Metal, where Metal is an instance of Material.

There are four property editors defined in the framework:
• single datatype allows editing a single literal value, displayed as a single-line input box;
• multiple datatype allows editing multiple literal values, adding and removing values;
• single object allows defining a relation with a single individual, presenting the user with a tree for selecting the value; the individuals displayed in the tree are only those that are valid for the property range;
• multiple object allows defining relations with multiple individuals; it is similar to the single object editor but allows adding and removing individuals, rather than selecting only one.

B. Publication Description Interface

The Publication Description Interface allows content editors to associate an ontology-based description with the publications.

The descriptions of the publications are statements (i.e. subject-predicate-object triples) that associate a topic defined in the ontology with a publication. The statement's predicate (also called property) defines the relation between the publication (subject) and the topic (object). Examples of properties are hasTopicCulture and hasTopicHistoricalPeriod. These properties are defined in the ontology T-Box, and every property is a sub-property of the generic topic property hasTopic (e.g. hasTopicCulture is a sub-property of hasTopic). A range restriction is used to specify the valid values for each property (e.g. hasTopicCulture has Culture as range). The Publication Description Interface takes the range restriction into account by allowing only valid individuals to be selected as values of each property. For example, the property hasTopicGeographics accepts only instances of GeographicsPlace as object, so, as shown in Figure 6, the interface only allows selecting instances of this class.

Fig. 6. A screenshot of the Publication Description Interface

C. End-User Interface

The End-User Interface is composed of three submodules: the A-Box Viewer, the Semantic Query Interface and the Semantic Navigation Interface. Not all the submodules are fully implemented yet.

The A-Box Viewer is directly derived from the A-Box Editor. Through this module users can view ontology individuals and their properties, browse properties via hyperlinks and access related publications thanks to their descriptions. Browsing the ontology is essential for the user to explore the available information, and it also helps non-expert users to refine their search requirements, should they start with no specific requirement in mind [14].

The Semantic Query Interface is at an early stage of development. Currently it only allows searching for papers characterized by a specific topic. The interface, as shown in Figure 7, allows selecting the requested topic from the tree of ontology individuals. The current implementation retrieves only the publications that satisfy all the specified criteria. A future extension may relax this constraint, especially with reference to the number of retrieved publications (adapting the query to the results).

Fig. 7. A screenshot of the Semantic Query Interface

The Semantic Navigation Interface will support users in navigating the e-Library. This system will suggest publications to the users by considering multiple recommendation strategies (e.g. similar treated topics, recently visited documents, user interests, access frequency). This module is not currently implemented in the prototype system and will be the object of future work.

¹⁵ Our superType property is different from the OWL subClassOf: subClassOf is a relation between classes, whereas superType is a relation between individuals.

V. FRAMEWORK ARCHITECTURE

In this section, we give a high-level overview of the implemented prototype system. The framework was developed according to a three-tier architectural approach, as shown in Figure 8.

Fig. 8. Overview of the framework three-tier architecture (presentation layer: A-Box Editor, Publication Description Interface and End-User Interface; business logic layer: Ontology API with Jena Adapter and DB Adapter; persistence layer: Jena and MySQL)

The presentation layer is a Web-based user interface. The business logic layer consists of a platform that implements the main e-Library services, which are accessible through a set of APIs independent of the underlying storage systems. The aim of the persistence layer is to store the topics ontology, the descriptions of the publications and the bibliographic data.

A. Business Logic Layer

The business logic layer exposes the main e-Library services through a set of APIs. The main purpose of these APIs is to support the manipulation and querying of the ontology and of the descriptions of the publications without requiring a detailed understanding of the specific internal storage facility.

The business logic layer interacts with the underlying layer through a set of adapters: this plug-in interface makes the application independent of the specific implementations. We defined a common API for the adapters; the current implementations of this API are the Jena Adapter and the DB Adapter. The first one is a wrapper for the semantic framework Jena, the other one for the relational database MySQL.

We use Hivemind¹⁶ to develop an open framework that can be easily integrated with new adapters. Hivemind is a framework that supports the configuration of different services, their lifecycle, and their combination. It is inspired by the Service-Oriented Architecture, an approach to the design of software architectures based on loosely coupled services.

In the framework, the ontology language OWL is used to define a set of concepts and the relations between them, and to use these definitions to describe the contents of the e-Library publications. OWL defines an information model that can be represented as a directed graph, in which the nodes represent resources and the arcs represent properties. The implemented API supports the manipulation and querying of this graph in two different ways: frame-centric and statement-centric.

The frame-centric view is similar to the object-oriented paradigm: every resource is viewed as an object and its properties as attributes. This view is used for ontology navigation and resource manipulation. The statement-centric view is a lower-level view in which the graph is represented as a set of triples. Each triple contains three components: subject, predicate and object. This kind of representation is used to obtain query results.

Figure 9 shows a UML diagram of the framework API. All the information provided to the upper level is modeled using these interfaces. The interfaces LiClass and LiProperty correspond to the OWL Class and Property; LiResource represents a generic RDF¹⁷ (Resource Description Framework) Resource, LiInstance a class instance (an individual), LiLiteral a literal, and LiTriple a statement (an assertion) consisting of a subject, a predicate and an object. LiAdapter is the interface implemented by each adapter (i.e. the Jena Adapter and the DB Adapter).

Fig. 9. UML diagram of the Framework API (interfaces LiResource, LiClass, LiProperty, LiInstance, LiLiteral, LiTriple and LiAdapter)

B. Jena

Our choice for OWL ontology storage, manipulation and querying is Jena¹⁸, an open-source Semantic Web toolkit developed by HP Labs¹⁹. Its aim is to support the development of applications that use the Semantic Web information models and languages [15]. We adopted this framework because it matches our requirements, is widely used within the Semantic Web research community, and is well documented. The core of the toolkit is the RDF API, which supports the manipulation and querying of RDF graphs (an OWL graph can be viewed as a specialization of an RDF graph, so the Jena API also supports OWL graphs). Jena supports several different storage technologies for ontology persistence. The simplest is to load axioms and individuals directly from an OWL file, but this approach requires the document to be parsed each time the framework starts up and to be stored again after every modification, which can be a source of significant overhead. To avoid this problem, we have used the relational database persistent storage strategy. This approach also enables faster retrieval and insertion of the ontology elements²⁰. To import the OWL ontology created with Protégé into the database, we have used the Jena OWL readers (Jena has readers and writers for the different languages that can be used to represent RDF graphs and OWL).

C. Persistence Layer

The topics ontology, the descriptions of the publications and the bibliographic data are stored in the relational DBMS MySQL²¹. Jena stores the ontology in a statements table and in other additional tables (e.g. for reification statements); these tables are not intended for direct access by other applications. The descriptions of the publications and the bibliographic data are described in the ontology, but they are stored separately for performance reasons. Generally speaking, bibliographic data and descriptions can grow very quickly²² and can have a much larger memory footprint than the ontology. If publications were annotated, the number of descriptions could be very high. A semantic framework like Jena, which uses a memory-based reasoner, is not suitable for managing this amount of data (a performance evaluation of several frameworks suitable for large OWL ontologies is presented in [10]).

The bibliographic data and the descriptions are stored in the database whose schema is shown in Figure 10. The most notable element is the publication description table. This table holds the descriptions of the publications as subject-predicate-object triples: the subject is a publication identifier, the predicate defines the type of the topic (e.g. hasHistoricalPeriodTopic, hasCultureTopic) and the object is the topic of the document. Examples of such triples are: publication001 hasHistoricalPeriodTopic bronzeAge, publication001 hasCultureTopic Etruschi. According to the defined domain ontology, every publication can have zero, one or more topics, also of the same topic type (e.g. a publication can be related to both the historical periods Middle Bronze Age and Late Bronze Age).

Fig. 10. ER diagram for bibliographics data and publications descriptions (entities: Publication, Author, Journal, Publisher, Class and Description)

D. Presentation layer

The main technology used to develop the user interface described in Section IV is JSF²³ (JavaServer Faces). We chose this technology mainly because JavaServer Faces defines a clear separation between application and presentation logic and supports the connection of the presentation layer to the application code. JSF defines a set of APIs for representing user interface components, managing their state, handling events, validating input, and defining page navigation.

Fig. 11. Interaction among the navigation tree and the server components (steps: 1) expand, 2) request, 3) invoke method, 4) response with the sub-nodes list, 5) display the sub-nodes in the tree)

Another adopted technology is AJAX; it is not a technology in itself, but a term that refers to the combined use of a group of technologies (JavaScript, DHTML (Dynamic HTML)²⁴, XML and Remote Scripting) [16]. In particular, we use AJAX for the dynamic tree component, that is used as navigation tree, properties editor tree and semantic query topics

¹⁶ http://jakarta.apache.org/hivemind/
¹⁷ http://www.w3.org/RDF/
¹⁸ http://jena.sourceforge.net/
¹⁹ http://www.hpl.hp.com/
²² For example, DBLP (Digital Bibliography & Library Project), the Computer Science Bibliography of the University of Trier, indexes more than 800,000 publications.
20 For more information, see: Jena Fastpath Query Processing - 23 http://java.sun.com/javaee/javaserverfaces/ http://jena.sourceforge.net/DB/fastpath.html 24 http://www.w3.org/DOM/faq.html#DHTML–DOM, 21 http://www.mysql.org/ http://www.w3schools.com/dhtml/ tree. We use AJAX because this technology enables to display of results, the query could be extended to select publication new contents in a Web page without completely reloading it. treating also topics related to those explicitly required. As shown in Figure 11, it is possible to dynamically load the Finally, future works will be focused on the development tree elements when required. Having such feature allows it to and test of the Semantic Navigation Interface, which will handle large amounts of data: this is a very important aspect support users in the e-Library navigation. This system will because the tree could be very large and is unnecessary to load make recommendations considering multiple strategies: e.g. all the elements every time. correlation, recently visited documents, user interests, access The AJAX tree is integrated with the rest of the framework frequency. This interface will also capture the cumulative by DWR25 (Direct Web Remoting). This technology allows effect of an entire user navigation session in order to generate JavaScript code in client Web browser to communicate with semantic queries. An description of a work based on this the framework running on the server. approach can be found in [17]. R EFERENCES VI. C ONCLUSIONS AND F UTURE D EVELOPMENTS [1] C. A. Goble, “Using the Semantic Web for e-Science: Inspiration, In this paper, we presented a prototype of a semantic- Incubation, Irritation.” in International Semantic Web Conference, 2005, pp. 1–3. based e-Library. This applications allows users searching a [2] M. Ley and P. Reuther, “Maintaining an Online Bibliographical collection of publications semantically described. Moreover it Database: the Problem of Data Quality.” in EGC, ser. 
Revue des gives to the content editors the possibility of autonomously Nouvelles Technologies de l’Information, vol. RNTI-E-6. Cépaduès- Éditions, 2006, pp. 5–10. managing the assertional component of the domain ontology, [3] M. E. Maron and J. L. Kuhns, “On Relevance, Probabilistic Indexing the publications description and the bibliographic data. To and Information Retrieval.” J. ACM, vol. 7, no. 3, pp. 216–244, 1960. describe the publications topic, the e-Library exploits ontology [4] M. Ley, “The DBLP Computer Science Bibliography: Evolution, Re- search Issues, Perspectives.” in SPIRE, Lecture Notes in Computer expressed in OWL. A campaign of tests with the students Science, vol. 2476. Springer, 2002, pp. 1–10. of Archaeology aimed at evaluating the effectiveness of the [5] C. L. Giles, “Citeseer: Past, Present, and Future.” in AWIC, Lecture publications description approach and the usability of the user Notes in Computer Science, vol. 3034. Springer, 2004, p. 2. [6] P. Haase, J. Broekstra, M. Ehrig, M. Menken, P. Mika, M. Olko, interface is under way. The tests were focused on the A-Box M. Plechawski, P. Pyszlak, B. Schnizler, R. Siebes, S. Staab, and Editor and the Publication Description Interfaces because these C. Tempich, “Bibster - a Semantics-based Bibliographic Peer-to-Peer modules are in a more advanced stage of development. System.” in International Semantic Web Conference, Lecture Notes in Computer Science, vol. 3298. Springer, 2004, pp. 122–136. Preliminary results of this tests showed that the proposed [7] Y. Sure, S. Bloehdorn, P. Haase, J. Hartmann, and D. Oberle, “The ontology visualization is useful for the users as a guide to SWRC Ontology - Semantic Web for Research Communities.” in EPIA, Lecture Notes in Computer Science, vol. 3808. Springer, 2005, pp. describe the contents of publications. It helps users with no 218–231. knowledge about ontologies to understand the relationship [8] P. Spyns, D. Oberle, R. Volz, J. Zheng, M. Jarrar, Y. 
Sure, R. Studer, between the different topics and between the topics and the and R. Meersman, “Ontoweb - a Semantic Web Community Portal.” in PAKM, Lecture Notes in Computer Science, vol. 2569. Springer, 2002, publications. Moreover new required features were expressed pp. 189–200. after the tests. In particular, the users required the possibility [9] J. Broekstra, A. Kampman, and F. van Harmelen, “Sesame: An Archi- to choose the property on which each tree is built on. For tecture for Storing and Querying RDF Data and Schema Information.” in Spinning the Semantic Web, MIT Press, 2003, pp. 197–222. example, the users found useful the findings tree build on the [10] Y. Guo, Z. Pan, and J. Heflin, “An Evaluation of Knowledge Base “superType” property (e.g. “Sword has super type weapon”, Systems for Large OWL Datasets.” in International Semantic Web “weapon as super type handwork”), but they can also make Conference, Lecture Notes in Computer Science, vol. 3298. Springer, 2004, pp. 274–288. use on a tree build on the “hasMaterial” property. Another [11] H. Knublauch, M. A. Musen, and A. L. Rector, “Editing Description required feature is the ability to sort the tree items according to Logic Ontologies with the Protégé OWL Plugin.” in Description Logics, a given property. Currently, the items are sorted alphabetically, CEUR Workshop, vol. 104. 2004. [12] V. Haarslev, Y. Lu, and N. Shiri, “Ontoxpl - Intelligent Exploration of whereas for some concepts, like the historical periods, this OWL Ontologies.” in Web Intelligence. IEEE CS, 2004, pp. 624–627. choice is not sensible. For example, the historical periods [13] R. Alan, W. Chris, N. Natasha, and W. Evan, “Simple Part-Whole are better ordered by an explicit “isPrecedent/isSuccessive” Relations in OWL Ontologies,” Aug 2005. [Online]. Available: http://www.w3.org/2001/sw/BestPractices/OEP/SimplePartWhole/ property. [14] S. Ram and G. 
Shankaranarayanan, “Modeling and Navigation of Large The tests also considered the Semantic Query Interface, Information Spaces: a Semantics Based Approach.” in HICSS, 1999. which is at an early stage of development. Currently it only [15] B. McBride, “Jena: A Semantic Web Toolkit.” IEEE Internet Computing, vol. 6, no. 6, pp. 55–59, 2002. allows searching for papers characterized by specific topics. [16] G. Jesse James, “Ajax: a New Approach to Web Applications.” The interface allows selecting the topics from the ontology [Online]. Available: http://www.adaptivepath.com/publications/essays/ individuals tree and retrieves the publications related with archives/000385.php [17] N. Athanasis, V. Christophides, and D. Kotzinos, “Generating on the fly all the selected topics. From the test experience, it might be Queries for the Semantic Web: The ICS-Forth Graphical RQL Interface useful to relax these constraints especially with reference to (GRQL).” in International Semantic Web Conference, Lecture Notes in the number of retrieved publications, adapting the query to the Computer Science, vol. 3298. Springer, 2004, pp. 486–501. [18] S. A. McIlraith, D. Plexousakis, and F. van Harmelen, Eds., The Seman- results. For example, if a query selects only a small number tic Web - ISWC 2004: Third International Semantic Web Conference. Proceedings, Lecture Notes in Computer Science, vol. 3298. Springer, 25 http://getahead.ltd.uk/dwr/ 2004.
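As a closing illustration of two ideas discussed in this paper, the storage of publication descriptions as subject-predicate-object triples and the retrieval of publications related to all the selected topics (with a possible relaxation when few results are found), the following minimal Python sketch may be helpful. It is not the implementation of the prototype, which is built on Java, Jena and MySQL: an in-memory SQLite database stands in for MySQL, and the table name `publication_description`, its column names, the functions `build_store`, `publications_with_all_topics` and `relaxed_search`, the `min_results` threshold, the `RELATED` map and the `publication002` rows are all hypothetical names and data introduced only for this example.

```python
import sqlite3

# Example triples. The publication001 rows come from the text; the
# publication002 rows are hypothetical data added for the illustration.
TRIPLES = [
    ("publication001", "hasHistoricalPeriodTopic", "bronzeAge"),
    ("publication001", "hasCultureTopic", "Etruschi"),
    ("publication002", "hasHistoricalPeriodTopic", "middleBronzeAge"),
    ("publication002", "hasCultureTopic", "Etruschi"),
]

# Hypothetical map from a topic to the topics considered "related" to it.
RELATED = {"bronzeAge": ["middleBronzeAge", "lateBronzeAge"]}


def build_store(triples):
    """Create an in-memory triple table (SQLite stands in for MySQL)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE publication_description ("
        "subject TEXT NOT NULL, predicate TEXT NOT NULL, object TEXT NOT NULL, "
        "PRIMARY KEY (subject, predicate, object))"
    )
    conn.executemany(
        "INSERT INTO publication_description VALUES (?, ?, ?)", triples
    )
    return conn


def publications_with_all_topics(conn, topics):
    """Return the publications described by every topic in `topics`."""
    topics = sorted(set(topics))
    if not topics:
        return []
    placeholders = ", ".join("?" for _ in topics)
    rows = conn.execute(
        "SELECT subject FROM publication_description "
        f"WHERE object IN ({placeholders}) "
        "GROUP BY subject HAVING COUNT(DISTINCT object) = ? "
        "ORDER BY subject",
        (*topics, len(topics)),
    )
    return [row[0] for row in rows]


def relaxed_search(conn, topics, related, min_results=2):
    """Strict search first; if it finds fewer than `min_results`
    publications, accept related topics in place of each requested one."""
    strict = publications_with_all_topics(conn, topics)
    if len(strict) >= min_results:
        return strict
    candidate_sets = []
    for topic in set(topics):
        accepted = [topic, *related.get(topic, [])]
        placeholders = ", ".join("?" for _ in accepted)
        rows = conn.execute(
            "SELECT DISTINCT subject FROM publication_description "
            f"WHERE object IN ({placeholders})",
            accepted,
        )
        candidate_sets.append({row[0] for row in rows})
    if not candidate_sets:
        return []
    return sorted(set.intersection(*candidate_sets))


if __name__ == "__main__":
    store = build_store(TRIPLES)
    print(publications_with_all_topics(store, ["bronzeAge", "Etruschi"]))
    print(relaxed_search(store, ["bronzeAge", "Etruschi"], RELATED))
```

In this sketch the strict search groups the triples by publication and keeps only the publications matching every requested topic, while the relaxed search intersects, topic by topic, the publications matching either the topic or one of its related topics. How related topics would actually be derived from the ontology is left open here, as it is in the text.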