=Paper=
{{Paper
|id=Vol-175/paper-28
|storemode=property
|title=Semantics-based Publication Management using RSS and FOAF
|pdfUrl=https://ceur-ws.org/Vol-175/6_mika_swvu_final.pdf
|volume=Vol-175
|dblpUrl=https://dblp.org/rec/conf/semweb/MikaKS05
}}
==Semantics-based Publication Management using RSS and FOAF==
<pdf width="1500px">https://ceur-ws.org/Vol-175/6_mika_swvu_final.pdf</pdf>
<pre>
                   Semantics-based Publication Management using RSS and FOAF

                                         Peter Mika and Michel Klein and Radu Serban
                                    Department of Computer Science, Vrije Universiteit Amsterdam
                                                 [pmika|mcaklein|serbanr]@cs.vu.nl


Abstract

Listing references to scientific publications on personal or group homepages is a common
practice. Doing this in a consistent and structured manner either requires a lot of
discipline or a centralized database. Scientific publication, however, is a distributed
activity by nature. We present a completely distributed and RDF-based implementation for
disseminating references to scientific publications. Our application only uses existing
information sources and allows for different output formats, e.g. HTML, RSS and RDF.

1 Collecting and Publishing References

Information about scientific publications is often maintained by individual people.
Presenting this distributed information in different selections and in different formats at
different locations usually requires a lot of manual work. We demonstrate an application
that performs this task using Semantic Web based techniques.

Our application collects several sources of information from several locations, in
particular information about publications of authors from their homepage, information about
group membership from the department website and information about people by crawling FOAF
profiles. All sources are translated to RDF, if they are not yet in this format, and
uploaded to an RDF store, in our case Sesame [Broekstra et al., 2002].

In the repository we apply several unification and reasoning steps to link the different
data sources, to derive additional facts and to remove redundant information. In addition, a
separate web service can be used to query for publications based on specific criteria and to
produce a variety of output formats, including BuRST (a compatible extension of RSS 1.0) and
HTML.

Figure 1 presents a schematic representation of the approach, which is introduced in detail
in the following sections.

[Figure 1: A schematic representation of the approach. Publications, groups/departments and
personal info are uploaded to an RDF store, where inference and unification are applied. A
webservice queries the store and returns data in response to RSS requests from RSS clients
and HTML requests from browsers.]

2 Sources of Information

For information about publications, we rely on the common BibTeX format. We ask authors to
include a BibTeX file with their own publications on a publicly accessible part of their
website. For many authors this does not require additional work, as they already maintain
such a file themselves. A simple crawler collects all files from the www.few.vu.nl domain.
The BibTeX files are translated to RDF using the BibTeX-2-RDF service [1], which creates
instance data for the “Semantic Web Research Community” (SWRC) ontology [2].

Personal information is collected via the web as well, using the FOAF profiles [Brickley and
Miller, 2005] that people link from their homepage. The FOAF files contain RDF statements
describing personal information such as the individual’s homepage, workplace, image and
relationships to other people.

To know which researchers are members of which department, we have implemented a web service
that translates the content of the department mailing lists to a FOAF format with statements
about group membership. We do not reveal the email addresses of people, but use a hash of
the email address as identifier. By using the mailing lists as a source for the group
membership information, we do not have to maintain this information ourselves, but rely on
the existing infrastructure in the department (i.e. the computer system administration).

3 Aggregation

Mapping Schemas

Using distributed, web-based knowledge technologies, we have to deal with the resulting
semantic heterogeneity of our information sources. Heterogeneity affects both the schema and
instance levels.

As the schemas used are stable, lightweight web ontologies, mappings on the class level
cause few problems: such mappings are static and can be manually inserted into the knowledge
base. Examples of such mappings are the subclass relationship between the swrc:Person and
foaf:Person classes and the subproperty relationship between swrc:name and foaf:name.

Although we used existing RDF schemas for describing the instance data, a simple extension
of the SWRC ontology was necessary to preserve the sequence of authors of publications. To
this end we defined the authorList and editorList properties, which have rdf:Seq as range,
comprising an ordered list of authors.

[1] See http://www.cs.vu.nl/~mcaklein/bib2rdf/.
[2] See http://ontoware.org/projects/swrc/.
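
The hashed identifier used for mailing-list members in Section 2 can be sketched as
follows. The paper does not say how the hash is computed; this sketch assumes the
convention of the FOAF vocabulary, where mbox_sha1sum is the SHA-1 digest of the full
"mailto:" URI. The function names, the group URI and the addresses are hypothetical and
serve only as illustration.

```python
import hashlib

FOAF = "http://xmlns.com/foaf/0.1/"


def mbox_sha1sum(email: str) -> str:
    """Hash a mailbox the FOAF way: SHA-1 over the full 'mailto:' URI, as hex."""
    mailbox_uri = "mailto:" + email.strip()
    return hashlib.sha1(mailbox_uri.encode("utf-8")).hexdigest()


def membership_ntriples(group_uri: str, emails: list[str]) -> list[str]:
    """Emit N-Triples stating group membership without exposing any address."""
    lines = []
    for i, email in enumerate(emails):
        member = f"_:member{i}"  # blank node: the person has no public URI here
        lines.append(f'{member} <{FOAF}mbox_sha1sum> "{mbox_sha1sum(email)}" .')
        lines.append(f"<{group_uri}> <{FOAF}member> {member} .")
    return lines


if __name__ == "__main__":
    # Hypothetical group URI and addresses, for illustration only.
    for line in membership_ntriples(
        "http://example.org/groups/ai", ["alice@example.org", "bob@example.org"]
    ):
        print(line)
```

Because a FOAF profile that publishes the same hash yields the same string, the two
sources can later be joined on this value without the address itself ever being stored.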
Unifying Instances

Heterogeneity on the instance level arises from using different identifiers in the sources
for denoting the same real-world objects. This affects FOAF data (where typically each
personal profile also contains partial descriptions of the friends), but also publication
information, as the same author may be referenced in a number of BibTeX sources.

The solution is provided by instance reasoning (smushing) using ontological features. The
FOAF ontology defines a number of inverse-functional properties of the Person class which
can be used to determine whether two instances of Person are the same. (Functional
properties, on the other hand, can be used to prove that two instances are not the same.)
For example, if two Persons have the same value for mbox_sha1sum (the hash of the email
address), we can conclude that both instances are the same. In this way, we can relate the
statements from the FOAF files to the statements about mailing-list membership. Besides the
inverse-functional properties, we also apply fuzzy string matching to compare person names,
following a normalization step (e.g. to be able to compare ‘Harmelen, F.’ and ‘Frank van
Harmelen’). Similarly, publications are matched based on an exact match of the date of the
publication and a tight fuzzy match of the title. Matching publications based on author
similarity is left for future work.

The matches that we find are recorded in the RDF store using the owl:sameAs property. Since
Sesame does not natively support OWL semantics at the moment, we expanded the semantics of
this single property using Sesame’s custom rule language. These rules express the reflexive,
symmetric and transitive nature of the property as well as the intended meaning, namely the
equality of property values. The rules add several statements to give all the equivalent
resources the same set of properties. These rules are executed by the custom inferencer
during uploads, which means that queries are fast to execute. (On the downside, the size of
the repository greatly increases.)

4 Presentation

After the information has been merged, the triple store can be queried to produce
publication lists according to a variety of criteria, including persons, groups and
publication facets. An online form helps users to build such queries against the
departmental publication repository. The queries are processed by another web-based
component, the Publication webservice.

This tool takes the location of the repository, the query, the properties of the resulting
RSS channel and optional style instructions as parameters. In a single step, it queries the
repository and generates an RSS channel with the publications matching the query. This RSS
channel follows the BuRST specification [3] for mixing publication metadata into the RSS
channel. The resulting channel appears as an RSS 1.0 channel to compatible tools while
preserving the RDF metadata.

The presentation service can also add XSL stylesheet information to the RSS feed, which
makes it possible to generate different HTML layouts (tables, short citation lists or longer
descriptions with metadata). The HTML output can be viewed with any XSLT-capable browser and
it can be tailored even further by adding a custom CSS stylesheet.

[3] http://www.cs.vu.nl/~pmika/research/burst/BuRST.html

5 Use Cases

Our system for semantics-based bibliography management can be used by individuals and groups
alike in a variety of modes. It can be used to provide a search interface to publication
collections on personal homepages or departmental websites such as the homepages of the AI
and BI groups of the VUA (information pull).

More interestingly, the use of RSS technology allows others to be notified of changes to
these collections (information push) by subscribing to publication feeds. A number of
generic tools are available for reading and aggregating RSS information, including browser
extensions, online aggregators, news clients and desktop readers for a variety of platforms.
While these tools are not aware of the SWRC and FOAF schemas, they are still able to process
BuRST feeds by ignoring the information they do not understand. (This behaviour is mandated
by the RSS specification and is the basis of modularization in RSS.) Mozilla Firefox also
natively supports RSS feeds as the basis for creating dynamic bookmark folders. These
folders refresh their contents from an RSS feed whenever the user opens them.

The reliance on RDF and lightweight, widely used web ontologies also makes it possible to
access personal profiles and publication information with generic RDF tools such as the
Piggy Bank browser extension. Piggy Bank allows users to collect RDF statements linked to
Web pages while browsing through the Web and to save them for later use. FOAF information
can be processed by a growing number of tools, while the SWRC data can be easily converted
back to BibTeX to complete the knowledge cycle.

6 Discussion

In summary, we presented a semantics-based system for publication management that builds on
web technology and well-known ontologies and that, by reusing existing information, requires
no additional effort from the individual. In comparison to centralized approaches, our
system leaves the control over publication management and presentation in the hands of the
individual researcher, while still allowing for information push. On the other hand, our
system is more lightweight than P2P networks that require users to install and run specific
software on their computers. The Java object models for the FOAF, RSS and BuRST formats as
well as the tools for crawling and smushing FOAF data have been made available as part of
the open source Elmo API for Sesame. Elmo can be downloaded from www.openrdf.org. The
interface to the tools themselves and some examples can be found at
http://prauw.cs.vu.nl:8080/burst/.

References

[Brickley and Miller, 2005] Dan Brickley and Libby Miller. FOAF vocabulary specification.
  Namespace document, June 3, 2005.

[Broekstra et al., 2002] Jeen Broekstra, Arjohn Kampman, and Frank van Harmelen. Sesame: An
  architecture for storing and querying RDF and RDF Schema. In Ian Horrocks and James A.
  Hendler, editors, Proceedings of the First International Semantic Web Conference (ISWC
  2002), volume 2342 of Lecture Notes in Computer Science, pages 54–68, Sardinia, Italy,
  June 9–12, 2002. Springer-Verlag.
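
The smushing step of Section 3 can be illustrated with a small sketch. It is not the
Sesame rule set or the Elmo code of the paper: it assumes triples held as plain Python
tuples and uses a union-find structure as a stand-in for the reflexive, symmetric and
transitive closure of owl:sameAs. The resource names and property values below are
hypothetical.

```python
def smush(statements, inverse_functional):
    """Link resources that share a value for an inverse-functional property.

    statements: iterable of (subject, property, value) triples.
    Returns sorted (resource, 'owl:sameAs', representative) triples, where the
    representative is the lexicographically smallest member of each class.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps lookups short
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[max(ra, rb)] = min(ra, rb)  # smaller name becomes the root

    first_seen = {}  # (property, value) -> first subject seen with that value
    for subject, prop, value in statements:
        if prop not in inverse_functional:
            continue
        key = (prop, value)
        if key in first_seen:
            union(first_seen[key], subject)
        else:
            first_seen[key] = subject
            find(subject)  # register the subject as its own class

    return sorted((s, "owl:sameAs", find(s)) for s in parent if find(s) != s)


if __name__ == "__main__":
    # Hypothetical data: p1/p2 share a mailbox hash, p2/p3 share a homepage.
    data = [
        ("ex:p1", "foaf:mbox_sha1sum", "abc"),
        ("ex:p2", "foaf:mbox_sha1sum", "abc"),
        ("ex:p2", "foaf:homepage", "http://example.org/~x"),
        ("ex:p3", "foaf:homepage", "http://example.org/~x"),
        ("ex:p4", "foaf:mbox_sha1sum", "def"),
        ("ex:p1", "foaf:name", "A"),  # not inverse-functional, so ignored
        ("ex:p4", "foaf:name", "A"),
    ]
    for triple in smush(data, {"foaf:mbox_sha1sum", "foaf:homepage"}):
        print(triple)
```

In this sketch ex:p3 ends up linked to ex:p1 although the two share no value directly,
which is the transitive behaviour the custom rules are described as providing. Copying
all properties onto every equivalent resource, as the paper's rules do at upload time,
would be a further step on top of these links.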

</pre>