=Paper=
{{Paper
|id=Vol-175/paper-28
|storemode=property
|title=Semantics-based Publication Management using RSS and FOAF
|pdfUrl=https://ceur-ws.org/Vol-175/6_mika_swvu_final.pdf
|volume=Vol-175
|dblpUrl=https://dblp.org/rec/conf/semweb/MikaKS05
}}
==Semantics-based Publication Management using RSS and FOAF==
Peter Mika and Michel Klein and Radu Serban
Department of Computer Science, Vrije Universiteit Amsterdam
[pmika|mcaklein|serbanr]@cs.vu.nl
Abstract

Listing references to scientific publications on personal or group homepages is a common practice. Doing this in a consistent and structured manner requires either a lot of discipline or a centralized database. Scientific publication, however, is a distributed activity by nature. We present a completely distributed and RDF-based implementation for disseminating references to scientific publications. Our application only uses existing information sources and allows for different output formats, e.g. HTML, RSS and RDF.

1 Collecting and Publishing References

Information about scientific publications is often maintained by individual people. Presenting this distributed information in different selections, in different formats and at different locations usually requires a lot of manual work. We demonstrate an application that performs this task using Semantic Web based techniques.

Our application collects several sources of information from several locations, in particular information about the publications of authors from their homepages, information about group membership from the department website and information about people by crawling FOAF profiles. All sources that are not yet in RDF are translated to RDF and uploaded to an RDF store, in our case Sesame [Broekstra et al., 2002].

In the repository we apply several unification and reasoning steps to link the different data sources, to derive additional facts and to remove redundant information. In addition, a separate web service can be used to query for publications based on specific criteria and to produce a variety of output formats, including BuRST (a compatible extension of RSS 1.0) and HTML.

Figure 1 presents a schematic representation of the approach, which is introduced in detail in the following sections.

Figure 1: A schematic representation of the approach.
[Diagram: publications, group data and personal information are uploaded to an RDF store, where inference and unification are applied; a web service queries the store and returns RSS data to clients and HTML data to browsers.]

2 Sources of Information

For information about publications, we rely on the common BibTeX format. We ask authors to include a BibTeX file with their own publications on a publicly accessible part of their website. For many authors this does not require additional work, as they already maintain such a file themselves. A simple crawler collects all files from the www.few.vu.nl domain. The BibTeX files are translated to RDF using the BibTeX-2-RDF service (see http://www.cs.vu.nl/~mcaklein/bib2rdf/), which creates instance data for the “Semantic Web Research Community” (SWRC) ontology (see http://ontoware.org/projects/swrc/).

Personal information is collected via the web as well, using the FOAF profiles [Brickley and Miller, 2005] that people link from their homepages. The FOAF files contain RDF statements describing personal information such as the individual's homepage, workplace, image and relationships to other people.

To know which researchers are members of which department, we have implemented a web service that translates the content of the department mailing lists to a FOAF format with statements about group membership. We do not reveal the email addresses of people, but use a hash of the email address as identifier. By using the mailing lists as a source for the group membership information, we do not have to maintain this information ourselves, but rely on the existing infrastructure in the department (i.e. the computer system administration).

3 Aggregation

Mapping Schemas

Using distributed, web-based knowledge technologies, we have to deal with the semantic heterogeneity of our information sources. Heterogeneity affects both the schema and the instance level.

As the schemas used are stable, lightweight web ontologies, mappings on the class level cause little problem: such mappings are static and can be manually inserted into the knowledge base. Examples of such mappings are the subclass relationship between the swrc:Person and foaf:Person classes and the subproperty relationship between swrc:name and foaf:name.

Although we used existing RDF schemas for describing the instance data, a simple extension of the SWRC ontology was necessary to preserve the sequence of authors of publications. To this end we defined the authorList and editorList properties, which have rdf:Seq as range and thus point to an ordered list of authors or editors.
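
The schema mappings and the ordered author list described under Mapping Schemas can be illustrated with a minimal Python sketch. It is not the authors' implementation (the system itself uses Sesame): triples are modelled as plain tuples, the `ex:` resource names are made up for illustration, and the `ext:` prefix on `authorList` is an assumption, since the text does not state the namespace of the extension.

```python
RDF_TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"
SUBPROP = "rdfs:subPropertyOf"

# The two static schema mappings named in the text.
MAPPINGS = {
    ("swrc:Person", SUBCLASS, "foaf:Person"),
    ("swrc:name", SUBPROP, "foaf:name"),
}


def apply_mappings(triples):
    """Return the input triples plus those entailed by the mappings.

    A single pass suffices here because the two mappings do not chain.
    """
    subclass = {s: o for s, p, o in MAPPINGS if p == SUBCLASS}
    subprop = {s: o for s, p, o in MAPPINGS if p == SUBPROP}
    derived = set(triples)
    for s, p, o in triples:
        if p == RDF_TYPE and o in subclass:
            derived.add((s, RDF_TYPE, subclass[o]))
        if p in subprop:
            derived.add((s, subprop[p], o))
    return derived


def author_seq(seq_node, authors):
    """Encode an ordered author list as rdf:Seq membership triples."""
    triples = [(seq_node, RDF_TYPE, "rdf:Seq")]
    for i, author in enumerate(authors, start=1):
        triples.append((seq_node, f"rdf:_{i}", author))
    return triples


# Hypothetical instance data.
data = {
    ("ex:pmika", RDF_TYPE, "swrc:Person"),
    ("ex:pmika", "swrc:name", "Peter Mika"),
}
merged = apply_mappings(data)

seq = author_seq("ex:authors1", ["ex:pmika", "ex:mklein", "ex:rserban"])
# Link a publication to its ordered list ("ext:" is an assumed prefix).
link = ("ex:paper", "ext:authorList", "ex:authors1")
```

After `apply_mappings`, the person is also typed as `foaf:Person` and carries a `foaf:name`, so FOAF-aware tools can read SWRC data without knowing the SWRC schema. The numbered `rdf:_1`, `rdf:_2`, ... membership properties are what preserve author order.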
Unifying Instances

Heterogeneity on the instance level arises from using different identifiers in the sources for denoting the same real-world objects. This affects FOAF data (where typically each personal profile also contains partial descriptions of the person's friends), but also publication information, as the same author may be referenced in a number of BibTeX sources.

The solution is provided by instance reasoning (smushing) using ontological features. The FOAF ontology defines a number of inverse-functional properties of the Person class which can be used to determine whether two instances of Person are the same. (Functional properties, on the other hand, can be used to prove that two instances are not the same.) For example, if two Persons have the same value for the mbox-sha1sum (hash of the email address), we can conclude that both instances are the same. In this way, we can relate the statements from the FOAF files to the statements about mailing list membership. Besides the inverse-functional properties, we also apply fuzzy string matching to compare person names, following a normalization step (e.g. to be able to compare 'Harmelen, F.' and 'Frank van Harmelen'). Similarly, publications are matched based on an exact match of the date of the publication and a tight fuzzy match of the title. Matching publications based on author similarity is left for future work.

The matches that we find are recorded in the RDF store using the owl:sameAs property. Since Sesame does not natively support OWL semantics at the moment, we expanded the semantics of this single property using Sesame's custom rule language. These rules express the reflexive, symmetric and transitive nature of the property as well as its intended meaning, namely the equality of property values. The rules add several statements to give all the equivalent resources the same set of properties. These rules are executed by the custom inferencer during uploads, which means that queries are fast to execute. (On the downside, the size of the repository greatly increases.)

4 Presentation

After the information has been merged, the triple store can be queried to produce publication lists according to a variety of criteria, including persons, groups and publication facets. An online form helps users to build such queries against the departmental publication repository. The queries are processed by another web-based component, the Publication webservice.

This tool takes the location of the repository, the query, the properties of the resulting RSS channel and optional style instructions as parameters. In a single step, it queries the repository and generates an RSS channel with the publications matching the query. This RSS channel follows the BuRST specification (http://www.cs.vu.nl/~pmika/research/burst/BuRST.html) for mixing publication metadata into the RSS channel. The resulting channel appears as an RSS 1.0 channel to compatible tools while preserving the RDF metadata.

The presentation service can also add XSL stylesheet information to the RSS feed, which makes it possible to generate different HTML layouts (tables, short citation lists or longer descriptions with metadata). The HTML output can be viewed with any XSLT-capable browser and can be tailored even further by adding a custom CSS stylesheet.

5 Use Cases

Our system for semantics-based bibliography management can be used by individuals and groups alike in a variety of modes. It can be used to provide a search interface to publication collections on personal homepages or departmental websites such as the homepages of the AI and BI groups of the VUA (information pull).

More interestingly, the use of RSS technology allows others to be notified of changes to these collections (information push) by subscribing to publication feeds. A number of generic tools are available for reading and aggregating RSS information, including browser extensions, online aggregators, news clients and desktop readers for a variety of platforms. While these tools are not aware of the SWRC and FOAF schemas, they are still able to process BuRST feeds by ignoring the information they do not understand. (This behaviour is mandated by the RSS specification and is the basis of modularization in RSS.) Mozilla Firefox also natively supports RSS feeds as the basis for creating dynamic bookmark folders. These folders refresh their contents from an RSS feed whenever the user opens them.

The reliance on RDF and lightweight, widely used web ontologies also makes it possible to access personal profiles and publication information with generic RDF tools such as the Piggy Bank browser extension. Piggy Bank allows users to collect RDF statements linked to Web pages while browsing the Web and to save them for later use. FOAF information can be processed by a growing number of tools, while the SWRC data can easily be converted back to BibTeX to complete the knowledge cycle.

6 Discussion

In summary, we presented a semantics-based system for publication management that builds on web technology and well-known ontologies and, by reusing existing information, requires no additional effort from the individual. In comparison to centralized approaches, our system leaves the control over publication management and presentation in the hands of the individual researcher, while still allowing for information push. On the other hand, our system is more lightweight than P2P networks that require users to install and run specific software on their computers. The Java object models for the FOAF, RSS and BuRST formats as well as the tools for crawling and smushing FOAF data have been made available as part of the open source Elmo API for Sesame. Elmo can be downloaded from www.openrdf.org. The interface to the tools themselves and some examples can be found at http://prauw.cs.vu.nl:8080/burst/.

References

[Brickley and Miller, 2005] Dan Brickley and Libby Miller. FOAF vocabulary specification. Namespace document, June 3, 2005.

[Broekstra et al., 2002] Jeen Broekstra, Arjohn Kampman, and Frank van Harmelen. Sesame: An architecture for storing and querying RDF and RDF Schema. In Ian Horrocks and James A. Hendler, editors, Proceedings of the First International Semantic Web Conference (ISWC 2002), volume 2342 of Lecture Notes in Computer Science, pages 54–68, Sardinia, Italy, June 9–12, 2002. Springer-Verlag.
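
As an illustration of the instance unification described under Unifying Instances in Section 3, the following Python sketch groups resources that share a value for an inverse-functional property and then gives each group one shared set of property values. It is a sketch under stated assumptions, not the Sesame rule set used in the system: the union-find structure stands in for the reflexive, symmetric and transitive closure of owl:sameAs, the FOAF property is spelled `foaf:mbox_sha1sum` as in the FOAF vocabulary, and all resource names and hash values are made up.

```python
from collections import defaultdict


class UnionFind:
    """Disjoint sets; each set plays the role of one owl:sameAs class."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra


def smush(triples, inverse_functional=("foaf:mbox_sha1sum",)):
    """Group subjects sharing a value for an inverse-functional property."""
    uf = UnionFind()
    first_seen = {}  # (property, value) -> first subject seen with it
    for s, p, o in triples:
        uf.find(s)  # register every subject, even unmatched ones
        if p in inverse_functional:
            key = (p, o)
            if key in first_seen:
                uf.union(first_seen[key], s)
            else:
                first_seen[key] = s
    groups = defaultdict(set)
    for s in list(uf.parent):
        groups[uf.find(s)].add(s)
    return list(groups.values())


def merged_properties(triples, group):
    """Collect the property values shared by all members of a group."""
    return {(p, o) for s, p, o in triples if s in group}


# Hypothetical data: one FOAF profile and one mailing-list record that
# carry the same email hash, plus an unrelated person.
triples = [
    ("foaf:a", "foaf:mbox_sha1sum", "abc123"),
    ("foaf:a", "foaf:homepage", "http://example.org/~a"),
    ("list:b", "foaf:mbox_sha1sum", "abc123"),
    ("list:b", "ex:memberOf", "ex:AIGroup"),
    ("foaf:c", "foaf:mbox_sha1sum", "def456"),
]
groups = smush(triples)
same_person = next(g for g in groups if "foaf:a" in g)
props = merged_properties(triples, same_person)
```

Here `foaf:a` and `list:b` end up in one group, so the homepage from the FOAF profile and the group membership from the mailing list are attached to the same person. Materializing the merged property set for every member, as the text describes, trades repository size for query speed.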