Information Retrieval in Current Research Information Systems

              Andrei S. Lopatenko
         Vienna University of Technology
            Gusshausstrasse 28 / E015
             A-1040 Vienna, Austria
                   

             andrei@derpi.tuwien.ac.at
ABSTRACT
In this paper we describe the functional requirements for research
information systems and the problems that arise in developing such a
system. We show which problems can be solved by using knowledge markup
technologies. We propose a DAML + OIL ontology for a Research Information
System. Existing ontologies for research are analyzed and compared. An
architecture based on knowledge markup for collecting research data and
providing access to it is described. We show how RDF query facilities can
be used for information retrieval over research data.

Keywords
Current Research Information System, Ontology, Information Retrieval,
DAML, RDF, Knowledge Markup, scientific publishing

INTRODUCTION
Information about research results, projects, publications,
organizations, researchers and so on published on the web plays an
increasingly pervasive role in modern research. The increasing dependence
of modern research on previously achieved results requires the ability to
retrieve research information more efficiently. Information overload
caused by the exponential growth in the amount of information makes it
difficult for researchers to find relevant information. To solve these
problems a number of Current Research Information Systems (CRIS) are
being developed.

But in most cases such systems do not fulfil their task of providing
complete and current information with a minimum of information noise.
This is one reason why researchers are not inclined to publish their
research results via information systems. Publishing is usually limited
to the researcher's or project's web pages.

To provide current and complete information to interested persons,
information from research web pages should also be included in
information retrieval operations.

Researchers' and policy-makers' demands for research information are
usually not limited to information from a single system. Research
information in any science or technology area is scattered among a number
of heterogeneous information systems. There is a strong need to gather
information or to point researchers to systems where information can be
found. It is very important to know whether the gathered research
information is current and complete.

We are developing the AURIS-MM information system (Austrian Research
Information System - MultiMedia enhanced) to provide research information
to interested consumers in a more attractive way. The system is being
developed from the existing AURIS (Austrian Research Information System)
and FoDok-Online (Research Documentation of Vienna University of
Technology).

Our experience and the newest web technologies showed us that centralized
database systems are very efficient but are not the best solution for
providing access to research data, because the data are widely
distributed over the web. The new version of AURIS-MM is based on
Semantic Web technologies:

    RDF – Resource Description Framework, www.w3.org/rdf
    RDFS – Resource Description Framework Schema, www.w3.org/rdf
    DAML + OIL (DARPA Agent Markup Language + Ontology Inference Layer),
    www.daml.org

ONTOLOGY DEVELOPMENT FOR SCIENCE
Some efforts have already been made to give researchers, industry and
policy-makers efficient access to research data from some sectors of
science, and access to research limited to an organization (university
research information systems) or limited by geographical boundaries
(national networks, ERGO [ERGO] – European Research Gateways Online).

The development and use of such systems has shown that it
is very hard to collect complete and up-to-date data about
research in a sector or in an organization such as a university in a
central system, because of the huge effort required from providers to
periodically copy or key in the data.

Because a huge amount of data is already provided on the web pages of
projects, researchers and universities, it is hard to get researchers to
provide their data once more to a centralized system.

Full-text search engines like Google (http://www.google.com) index, among
others, pages with research information. But they cannot limit search to
trusted data, understand the context of a page, or provide search based
on the meaning of the data.

One possible way to collect data about research is page annotation.
Knowledge can be annotated on the page in such a way that automatic tools
can collect and understand it [BL-2001, Hend-2001, Erd-2001].

Ontologies make it possible for software agents to understand knowledge
that is marked up [Staab-2001, SWA]. The benefits of using ontologies and
the Semantic Web for scientific publishing were described in [Lee-2001].
Some effort has already been made to develop markups for scientific data.

SHOE [Hefl-99, SHOE] is a small extension to HTML which allows knowledge
about web page content to be annotated. SHOE is a very simple language
for declaring ontologies and defining classifications, relationships,
inference rules, categories, etc. SHOE was developed in the Department of
Computer Science, University of Maryland. The SHOE specification, tools,
the SHOE ontology in plain text and DAML, and examples are accessible at
the SHOE home page. Several ontologies for university and research data
were developed for SHOE. These are the University ontology and the
Computer Science Department ontology
(http://www.cs.umd.edu/projects/plus/SHOE/onts/index.html).

OIL (Ontology Inference Layer) [OIL, Fens-2000] "is a proposal for a
web-based representation and inference layer for ontologies, which
combines the widely used modeling primitives from frame-based languages
with the formal semantics and reasoning services provided by description
logics. It is compatible with RDF Schema (RDFS), and includes a precise
semantics for describing term meanings (and thus also for describing
implied information)." OIL was sponsored by the European Community via
the IST projects Ibrow and On-To-Knowledge.

Two ontologies for research data were developed in OIL: SWRC (Semantic
Web Research Community Ontology)
(http://ontobroker.semanticweb.org/ontologies/swrc-onto-2000-09-10.oil)
and KA2 (Ontology of Knowledge Acquisition community).

DAML (DARPA Agent Markup Language) [DAML], an ontology markup language,
was developed as an extension to RDF and RDFS. DAML allows ontologies to
be specified and pages to be marked up for automatic knowledge
extraction. The latest version of DAML is named DAML + OIL. DAML
specifications, examples, tools and ontologies are published at the DAML
home page.

Several ontologies for research information have been developed in DAML.
Among them are the DAML version of the SHOE University ontology
(http://www.cs.umd.edu/projects/plus/DAML/onts/univ1.0.daml), the SWRC
(Semantic Web Research Community) ontology
(http://www.semanticweb.org/ontologies/swrc-onto-2000-09-10.daml), and
the homework assignment ontology
(http://www.ksl.stanford.edu/projects/DAML/ksl-daml-desc.daml).

A more complete list of ontologies for research data, as well as metadata
standards, thesauri and system architectures, can be found at the
European Current Research Information Systems Platform home page
(http://www.eurocris.org) and in Andrei Lopatenko's Resource Guide to
Metadata for Science, Research and Technology
(http://derpi.tuwien.ac.at/~andrei/Metadata_Science.htm).

ONTOLOGY
The main goal of our ontology development was to develop an ontology
which will help users of research information to retrieve relevant
information.

The primary use cases of information retrieval for CRIS are
[Jeff-98, CERIF-2000, Lind-2000, Aks-2000]:

    •   Retrieving information about research results by researchers or
        students for reuse of results. Estimating research results.
    •   Seeking collaborators who can take part in research projects as
        partners, or sell their expertise, results and intellectual rights
    •   Finding facilities and equipment which can be used for research
    •   Assessment of and access to Research and Development capabilities
        by policymakers
    •   Finding ongoing research and technology activities and results of
        projects by users in commerce and industry
    •   Finding sponsors for a new research project

The ontology should contain terms already known to developers of Current
Research Information Systems, to make it easier to integrate the new
infrastructure with the old ones.

There are not many metadata standards for science. They have been
reviewed in [Grot-98, Lop-01].

Math-Net developed a metadata format based on Dublin Core and RDF Schema
for marking up knowledge about the content of researchers' and
institutes' pages [MathNet]. The Math-Net metadata set allows the
description of researchers, research groups and organizations, projects,
results, events and publications.

In our ontology development we decided to use the CERIF-2000 metadata
standard (Common European Research Information Format) [CERIF-2000].

According to the CERIF documents [CERIF], “CERIF 2000 is a set of
guidelines meant for everyone dealing with research information systems.
The CERIF 2000 guidelines are developed by a group of experts from the EU
Member States and Associated Member states, under the co-ordination of
the European Commission.”

CERIF 2000 is now used by several groups of developers and researchers in
different EU member states; it is proven and stable. Different groups of
developers are also well acquainted with CERIF-2000, which will make the
process of ontology development easier.

Despite the excellence of CERIF as a metadata format for research, it has
certain gaps in the description of some types of research information
resources. In developing our ontology we decided to enrich it with terms
and slots from some other ontologies, to make it more suitable for
research information retrieval.

The following table compares the enriched CERIF ontology with a few
already developed ontologies (described earlier).

Table 1. Comparison of selected ontologies for science
(compared: CERIF 2000; Math-Net ontology; SWRC, Semantic Web Research
Community; University Ontology)

Person
    CERIF 2000:           Yes. Not classified in CERIF
    Math-Net ontology:    Yes.
    SWRC:                 Advanced hierarchy suitable for research and
                          education
    University Ontology:  Advanced hierarchy suitable for research and
                          education

Project
    CERIF 2000:           Not classified in CERIF
    Math-Net ontology:    Yes
    SWRC:                 Yes. Classified.
    University Ontology:  No

Organization
    CERIF 2000:           Yes. Classified
    Math-Net ontology:    Yes
    SWRC:                 Close to CERIF classification
    University Ontology:  Only educational

Publication
    CERIF 2000:           Advanced classification which can serve research
                          and educational IS
    Math-Net ontology:    Yes
    SWRC:                 Close to CERIF classification of publications.
                          Grey literature is not included
    University Ontology:  Close to CERIF classification of publications.
                          Grey literature is not included

Event
    CERIF 2000:           Yes. Very basic classification
    Math-Net ontology:    Conferences
    SWRC:                 Yes. Very close to CERIF
    University Ontology:  Conference

Equipment
    CERIF 2000:           Yes.
    Math-Net ontology:    No
    SWRC:                 No
    University Ontology:  No

Patent
    CERIF 2000:           Patent
    Math-Net ontology:    No
    SWRC:                 No
    University Ontology:  No

Product/Research result
    CERIF 2000:           Product
    Math-Net ontology:    Only software and software libraries
    SWRC:                 Yes
    University Ontology:  Only software product

Expertise skill/Research topic
    CERIF 2000:           Expertise skill
    Math-Net ontology:    Yes Subject Value
    SWRC:                 Research Topic
    University Ontology:  No

Multimedia elements
    CERIF 2000:           Multimedia elements. No
    Math-Net ontology:    No
    SWRC:                 No
    University Ontology:  No

Sites/pages
    CERIF 2000:           No
    Math-Net ontology:    Yes
    SWRC:                 No
    University Ontology:  No

After the comparative analysis of the CERIF ontology, selected ontologies
and some research information systems, it was recognized that the CERIF
ontology could be a base technology due to its richness of base terms and
its relevance to RIS. But in some areas CERIF has certain gaps. Enriching
the CERIF ontology with terms from other ontologies can be useful for
research information systems.

The primitive units of the CERIF ontology are Person, Project,
Organization Unit, Publication, Event, Site (Internet service/page),
Equipment, Result, Multimedia element, Research topic (Expertise skill).
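
As a rough illustration, the primitive units listed above could be declared as RDFS classes. The sketch below emits such declarations as N-Triples. It assumes the schema namespace http://derpi.tuwien.ac.at/~andrei/cerif.rdfs# that appears in the query examples of this paper; only the local names Person and Project occur there, so every other local name in the sketch is a hypothetical spelling, not the published schema.

```python
# Standard RDF and RDFS vocabulary IRIs.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_CLASS = "http://www.w3.org/2000/01/rdf-schema#Class"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"

# Schema namespace as it appears in the paper's query examples.
SCHEMA_NS = "http://derpi.tuwien.ac.at/~andrei/cerif.rdfs#"

# Local name -> human-readable label. Apart from Person and Project,
# the local names are assumed spellings chosen for illustration only.
PRIMITIVE_UNITS = {
    "Person": "Person",
    "Project": "Project",
    "OrganizationUnit": "Organization Unit",
    "Publication": "Publication",
    "Event": "Event",
    "Site": "Site",
    "Equipment": "Equipment",
    "Result": "Result",
    "MultimediaElement": "Multimedia element",
    "ResearchTopic": "Research topic",
}


def class_triples(units):
    """Return N-Triples lines declaring each unit as an rdfs:Class with a label."""
    lines = []
    for local_name, label in units.items():
        subject = f"<{SCHEMA_NS}{local_name}>"
        lines.append(f"{subject} <{RDF_TYPE}> <{RDFS_CLASS}> .")
        lines.append(f'{subject} <{RDFS_LABEL}> "{label}" .')
    return lines


if __name__ == "__main__":
    print("\n".join(class_triples(PRIMITIVE_UNITS)))
```

N-Triples is used here only because it can be produced with plain string formatting; the same declarations could equally be written in RDF/XML.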
Research results which can be reused might be described in publications
(articles, theses, technical reports, etc.). Research results might be
described precisely (Research result or Product). They can be presented
by advanced presentation techniques: a Multimedia element, which may be
video, images, drawings, diagrams or MS PowerPoint presentations.

Research results are results of research projects, invented by persons
(researchers, students) in organization units (universities, labs,
institutes, departments). Information about the expertise skills of
persons and organizations can also be significant for estimating research
results.

Some research results are patented, and valuable information about them
can be stored in patents.

To make searching for research results easier, information about any
entity can be classified by research topic.

Finding a partner. A partner might be an organization unit or a person
that has research results and experience relevant to the partner seeker.
Information about the results and experience of a partner can be
extracted from its publications and project descriptions.

Information about organization units, publications, results, projects and
persons can be stored on sites. No research information system stores all
relevant information, so users need to know about other information
systems which can help in searching for research results and partners.

To help the user find information, data about other sites and internet
services relevant to research data should be provided to the user.

Research may need equipment or facilities. Information about those
entities should also be retrievable and searchable.

Table 2. Research Information Ontology terms
Organization unit
        Enterprise
        Higher Education Establishment
                    University
                    Faculty
                    Institute
        International organization
        Joint Research Center
        Non-research private non-profit
        Non-research public sector
        Private research center
        Private non-profit research center
        Public research center
        Laboratory
        Research Group
Project
        European project
        Fundamental research project
        Applied research project
        Financed by official bodies project
Person
        Researcher
        Student
Product/Research result
        Fundamental
        Applied
        Software
                    Software library
                    Information system
        Compound
        Process
        Technology
        Algorithm
        Documentation
                    Proposal
Event
        Conference
        Cultural event
        Exhibition
        Political event
        Sport event
        Trade fair
        Workshop
Publication
        Abstract
        Book
        Conference paper
        Conference proceedings
        Dissertation
        Guideline
        Index
        Journal article
        Lecture
        Multimedia
        Patent
        Report
        Review
Equipment
Multimedia element
        Audio
        AudioVisual
        DataForMultimedia (data for scientific software modules, such as
        GIS)
        ExecutableFile (which visualizes information, processes, etc.)
        Flash
        Image
        RealMedia
        ShockWave
        Slide presentation
        Video
Site
        Organization’s site
        Project’s site
        Personal home page
        Publication on the web
        List of the publications
        Reference page
        Information system
                    Library (access to articles)
                    Research Information System (access to research data:
                    projects, persons, organizations)

The complete ontology and set of terms are presented at
http://derpi.tuwien.ac.at/~andrei/Metadata_Science.htm.

For ontology development the CERIF-2000 Guidelines and Subject Index
recommendations were used, as well as the Multimedia Ontology [Hunt-2001]
and the science and university ontologies mentioned earlier.

As guidelines for ontology development we used [Noy-2001, Noy-G].

INFORMATION RETRIEVAL ARCHITECTURE
The research data for retrieval should be collected and analyzed. To make
analysis and understanding of the meaning of the data by software
possible, the data should be published in a format understandable by
software agents, or annotated. Annotations should then be collected and
analyzed and, if considered necessary, transformed into one model/format.
During a search operation, queries and data should be processed by search
engines and the response should be sent to information consumers.

So the process of information retrieval consists of:
    1.   knowledge markup (by the researcher)
    2.   harvesting marked-up knowledge by crawlers or software agents
    3.   transforming harvested data into formats appropriate for metadata
         repositories/search engines
    4.   loading into the repository
    5.   retrieval by search engines according to users' requests

WEB PAGE ANNOTATION
So the ontology can serve for understanding the meaning of data. But to
make data understandable by software agents, they should be provided in a
format which an agent can parse. A number of annotation tools are
described in [Staab-2001].

For page annotation we use two tools: OntoMat and the AURIS-MM metadata
generating facilities.

OntoMat [OntoMat] is a user-friendly interactive web page annotation
tool. It includes a web browser and an ontology browser. The ontology
browser supports DAML + OIL ontology exploration. The web browser
supports web browsing, highlighting parts of web pages and creating
annotations based on the highlighted parts. To annotate a web page, the
researcher needs to open the web page in the browser and then open the
ontology from the URL provided by the project. The researcher can then
create annotations by highlighting regions of the page and describing
them in the ontology browser according to the ontology terms, relations
and attributes. OntoMat automatically creates the RDF annotation and a
new web page with the RDF annotation included. The annotated web pages
can be published on the web instead of the unannotated ones.

The AURIS-MM metadata generating facilities generate an RDF description
of the data from the AURIS-MM relational database.

To create an annotated web page, the researcher needs to input data about
his research (projects, publications, etc.) into AURIS-MM and then use
the metadata generating facility just by pressing buttons. The generated
RDF file can then be published on the web directly or embedded into the
web page. The generated RDF file for an object has a persistent location
in AURIS-MM, which can be used as an identifier for that object. This is
very important because information about one object can be asserted on
different pages. OntoMat supports only annotation and does not generate
persistent URLs, because it is an annotation tool.

Currently AURIS-MM does not support any ontology for semantic annotation
as OntoMat does. But it supports vocabularies and thesauri for advanced
annotations; it also supports workflows and allows already entered data
to be re-used.
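
To make the idea of generating RDF from database records more concrete, here is a minimal sketch, not the actual AURIS-MM facility. It serializes one project row as RDF/XML and uses a persistent record location as the object's identifier. The schema namespace and the property name project_persons are taken from the query examples in this paper; the base URL auris.example.org, the title property and the row layout are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
# Schema namespace as it appears in the paper's query examples.
CERIF_NS = "http://derpi.tuwien.ac.at/~andrei/cerif.rdfs#"
# Hypothetical base for persistent record locations (not the real layout).
RECORD_BASE = "http://auris.example.org/rdf/"

ET.register_namespace("rdf", RDF_NS)
ET.register_namespace("cerif", CERIF_NS)


def record_uri(kind, record_id):
    """Build the persistent location that doubles as the object's identifier."""
    return f"{RECORD_BASE}{kind}/{record_id}"


def project_to_rdf(row, person_ids):
    """Serialize one project row and its participants as an RDF/XML string."""
    root = ET.Element(f"{{{RDF_NS}}}RDF")
    project = ET.SubElement(
        root,
        f"{{{CERIF_NS}}}Project",
        {f"{{{RDF_NS}}}about": record_uri("project", row["id"])},
    )
    # "title" is an illustrative property name, not confirmed by the schema.
    title = ET.SubElement(project, f"{{{CERIF_NS}}}title")
    title.text = row["title"]
    for pid in person_ids:
        # Link each participant by the persistent location of its own record.
        ET.SubElement(
            project,
            f"{{{CERIF_NS}}}project_persons",
            {f"{{{RDF_NS}}}resource": record_uri("person", pid)},
        )
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(project_to_rdf({"id": 42, "title": "Semantic Web for CRIS"}, [7, 9]))
```

Because every page that mentions the project can point to the same rdf:about location, statements asserted on different pages can later be merged as descriptions of one object.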
Fig. Annotation of the page

Fig. The registration of multimedia element.

COLLECTING METADATA
To make knowledge annotated on web pages accessible for retrieval, it
should be collected, analyzed, stored and made accessible to the query
engine.

Harvesting (collecting) RDF metadata is possible using RDF Crawler
(http://ontobroker.semanticweb.org/rdfcrawl/index.html), a Java
application which can crawl web pages and collect RDF data. After
crawling, RDF Crawler produces one file which stores all RDF data and
declarations of all RDF Schemas used.

Data about research are currently provided in different markup formats.
The Austrian research information system, Math-Net
(http://www.math-net.org) and other societies use different markups to
annotate data.

In our approach all data should be converted to RDF to be accessible for
search and analysis through one search engine.

Fig. Metadata collecting into RDF database

QUERYING COLLECTED METADATA, GETTING KNOWLEDGE FROM ANNOTATIONS
Once the annotated metadata have been collected, how can they be used?

There are several tools which can be used to search annotated pages.

SHOE Search Engine – Semantic Search
(http://www.cs.umd.edu/projects/plus/SHOE/search/) searches registered
annotated pages. The user of the search engine can choose an ontology,
then choose the type of resource to search for, create very simple filter
conditions and search the SHOE metadata database.

Our approach assumes that data are described in RDF or can be translated
into RDF by a transformation procedure. Also, to provide search services
for researchers, query facilities should be able to search data by
meaning (type of resource or property), values of attributes (properties)
and relations between resources.

There are several query engines for RDF [Karv-2000]: Squish, Ontobroker,
Redland RDF Application Framework, MetaLog, RDF Data Query Language.

In our project the Sesame RDF Query Repository and Querying Facility is
used to query the RDF database.

Sesame supports RQL (RDF Query Language) [Vass], which is being developed
by the ICS-FORTH Institute. Sesame supports storing both RDF and RDF
Schema information. The querying facilities of Sesame support schema
information about subclasses and subproperties, and searching by
attribute values and resource relations.

Table. Examples of SESAME queries to retrieve research information
Query:
    http://derpi.tuwien.ac.at/~andrei/cerif.rdfs#Person
Result:
    All persons in the database (including any subtype of person:
    researchers and students)

Query:
    http://derpi.tuwien.ac.at/~andrei/cerif.rdfs#Researcher
Result:
    All persons who are researchers (or any subtype of researcher)

Query:
    ^http://derpi.tuwien.ac.at/~andrei/cerif.rdfs#Researcher
Result:
    All persons who are researchers and not any subtype of researcher

Query:
    select X,Y
    from #Project {X}. #project_persons{Y}, {Z} #expertise_skill {E}
    where X = Z and E = "Semantic Web"
Result:
    All projects in Semantic Web, with a description of the persons
    participating in them. If an organization, a person or a Research
    Information System asserts a new type of project (for example, a
    software project) and declares it in RDF Schema as a subtype of the
    AURIS-MM project type, it will also be searched.

Query:
    select X,Y
    from ^#Project {X}. #project_persons{Y}, {Z} #expertise_skill {E}
    where X = Z and E = "Semantic Web"
Result:
    Only projects in Semantic Web asserted as exactly CERIF projects, and
    the participants of those projects

Sesame provides an application interface over the HTTP protocol, so
applications can query and update networked RDF databases.

CONCLUSIONS
The use of Semantic Web technologies might be very fruitful for the
development of Research Information Systems.

The annotation of knowledge makes it easier for researchers and research
organizations to assert information about their research for
dissemination. There is no need to register it in a number of information
systems. Software agents can collect the information and understand its
meaning.

Not only research data but also new domain knowledge can be asserted and
shared for use.

Query engines for the Semantic Web, due to their inference abilities and
schema exploration, can make the development of Research Information
Systems easier than conventional technologies such as relational database
management systems, because exploration of domain knowledge is crucial
for CRIS.

ACKNOWLEDGMENTS
I thank Walter Niedermayer and all AURIS-MM project staff, Vienna
University of Technology, for support and helpful comments on previous
versions of this article.

REFERENCES


ERGO European Research Gateways Online http://www.cordis.lu/ergo
BL-2001 Berners-Lee T., Hendler J., Lassila O., The Semantic Web, Scientific American, May 2001
Hend-2001 Hendler J., Agents and the Semantic Web, IEEE Intelligent Systems Journal, March/April 2001
Erd-2001 M. Erdmann, A. Maedche, H.-P. Schnurr, and S. Staab, From Manual to Semi-automatic Semantic Annotation,
Linköping Electronic Articles in Computer and Information Science, Vol.
SWA Semantic Web Activity http://www.w3.org/2001/sw
Lee-2001 Berners-Lee T., Hendler J., Scientific publishing on the ‘semantic web’, Nature, 12 April 2001,
http://www.nature.com/nature/debates/e-access/Articles/bernerslee.htm
Hefl-99 Jeff Heflin, James Hendler, and Sean Luke, SHOE: A Knowledge Representation Language for Internet Applications,
Technical Report CS-TR-4078 (UMIACS TR-99-71). 1999. http://www.cs.umd.edu/projects/plus/SHOE/pubs/#tr99
SHOE SHOE home page. http://www.cs.umd.edu/projects/plus/SHOE/
OIL Ontology Inference Layer web site http://www.ontoknowledge.org/oil
Fens-2000 D. Fensel et al.: OIL in a nutshell In: Knowledge Acquisition, Modeling, and Management, Proceedings of the
European Knowledge Acquisition Conference (EKAW-2000), R. Dieng et al. (eds.), Lecture Notes in Artificial Intelligence,
LNAI, Springer-Verlag, October 2000.
DAML DARPA Agent Markup Language. http://www.daml.org
Jeff-98 K. G. Jeffery, “ERGO: European Research Gateways Online and CERIF: Computerized Exchange of Research
Information Format”, ERCIM News N. 35, 1998, http://www.ercim.org/publication/Ercim_News/enw35/jeffery.html
CERIF-2000 Common European Research Information Format 2000 Guidelines.
ftp://ftp.cordis.lu/pub/cerif/docs/cerif2000.htm
Lind-2000 Niclas Lindgren, Anita Rautamäki, Managing Strategic Aspects of Research, CRIS-2000,
(ftp://ftp.cordis.lu/pub/cris2000/docs/rautamdki_fulltext.pdf)
Aks-2000 Dag W Aksnes, Johanne-Berit Revheim, The Application of CRIS for Analyzing Research Output - Problems and
Prospects, CRIS-2000 (ftp://ftp.cordis.lu/pub/cris2000/docs/aksnes_fulltext.pdf)
Grot-98 M. Grotschel, L. Lugger, "Scientific Information systems and Metadata", Classification in the Information Age.
Proc. of the 22nd Annual GfKl Conference, Dresden, March 4-6, 1998.
Lop-01 Lopatenko A. S., Kulagin M. V. "Current Research Information Systems and Digital Libraries. Needs for
integration", to appear in the proceedings of "Digital Libraries: Advanced Methods and Technologies, Digital Collections",
Sep. 2001
MathNet Math-Net Application Profile http://www.iwi-iuk.org/material/RDF/1.1/profile/MNPage/
CERIF CERIF Homepage. http://www.cordis.lu/cerif
Hunt-2001 J. Hunter, "Adding Multimedia to the Semantic Web - Building an MPEG-7 Ontology", SWWS, Stanford, July
2001
Noy-2001 Noy N. F., Ontology Engineering, Semantic Web Working Symposium, 2001, Stanford
Noy-G Noy N. F., McGuinness D. L., Ontology Development 101: A Guide to Creating Your First Ontology,
http://protege.stanford.edu/publications/ontology_development/ontology101-noy-mcguinness.html
Staab-2001 Staab, S., Maedche, A., and Handschuh, S.: Creating Metadata for the Semantic Web: An Annotation Framework
and the Human Factor. Technical Report, 2001 http://www.aifb.uni-karlsruhe.de/WBS/sha/papers/semantic-annotation.pdf
OntoMat Webpage annotation tool. http://ontobroker.semanticweb.org/annotation/ontomat/index.html
Karv-2000 Karvounarakis G., Querying RDF Metadata and Schemas Technical Report, Institute of Computer Science,
Foundation for Research and Technology-Hellas (FORTH), Crete, Greece,
http://www.ics.forth.gr/proj/isst/RDF/rdfquerying.pdf
Vass G. K. Vassilis, C. D. Plexousakis, S. Alexaki, “Querying Community Web Portals”,
http://139.91.183.30:9090/RDF/publications/sigmod2000.html