=Paper=
{{Paper
|id=Vol-538/paper-4
|storemode=property
|title=Linked Data Authoring for Non-Experts
|pdfUrl=https://ceur-ws.org/Vol-538/ldow2009_paper4.pdf
|volume=Vol-538
|dblpUrl=https://dblp.org/rec/conf/www/Luczak-RoschH09
}}
==Linked Data Authoring for Non-Experts==
Markus Luczak-Rösch, Ralf Heese
Freie Universität Berlin, AG Corporate Semantic Web
Takustr. 9, D-14195 Berlin, Germany
luczak@inf.fu-berlin.de, heese@inf.fu-berlin.de
ABSTRACT
The vision of the Semantic Web community is to create a linked "Web of data" that provides ubiquitous data access via machine-understandable links between data resources. Producing and consuming linked data requires considerable expertise. As a result, the idea of linked data is spreading only slowly, and only a few data sources are currently available as linked data. In this paper we present Loomp, a system intended to foster wider use of the Web of data. Loomp enables non-experts to produce and publish semantically annotated content as easily as formatting text in a word processor. Furthermore, Loomp simplifies the reuse of content in different Web applications.

Categories and Subject Descriptors
I.7.4 [Electronic Publishing]; H.4.1 [Information Systems Applications]: Office Automation—Desktop publishing, Word processing; H.3.2 [Information Storage and Retrieval]: Information Storage

General Terms
Semantic Web

Keywords
linked data, content authoring and publishing, Web of data

Copyright is held by the author/owner(s).
WWW2009, April 20-24, 2009, Madrid, Spain.

1. INTRODUCTION
Over the past years, authoring content for the Web has undergone an interesting evolution. This change is well known as the move from the Web 1.0 of the past towards the so-called Web 2.0 of the present. We argue that the decisive catalyst for this step was the broad success of blogging. Easy-to-use tools such as WordPress enabled a critical mass of people to change their behavior on the Web. Instead of only consuming, they now both consume and publish. In addition, people use wiki systems as an appropriate means of managing large knowledge bases collaboratively. Both developments show that ordinary Web users today have a greater technical affinity than in the times of Web 1.0. They also have a better understanding of the network effects on the Web. Authoring content online has become a familiar task.

Likewise, we believe that the principles of linked data will once more change the perception of the Web, and that publishing content as linked data will become commonplace. At the moment, most linked data sources offer data that originally resides in relational databases, and wrappers convert this data into RDF automatically. Database wrappers are already available. Beyond them, we think that the broad success of linked data in everyday use also depends on authoring tools that let ordinary Web users publish linked data. By ordinary users we mean people who are not experts in Semantic Web technologies. Tools currently exist for editing metadata or adding semantics to wikis. To the best of our knowledge, however, no tool allows non-expert users to enrich content such as text and multimedia objects with detailed semantics.

In this paper we introduce Loomp, a system that allows every Web user to create semantically enriched content as easily as formatting text in a word processor. A user can easily publish the same content to various applications such as Web browsers, blogs, and wikis. Additionally, Loomp features an integrated linked data server for publishing the content as linked data (Figure 1).

Figure 1: Providing manually created linked data to the linked data cloud and Web applications

The paper is structured as follows. Section 2 illustrates how Loomp can be used in a real-world scenario and reports the results of our interviews with journalists and editors. In Section 3 we give an overview of the Loomp architecture and describe key points of Loomp in detail. In Section 4 we present related work, and in Section 5 we conclude and outline our future work.
2. THE JOURNALISTS USE CASE
The literature describes interesting use cases whose underlying business models are based on linked data contained in webpages. The BBC, for example, developed a system that automatically enriches content in order to increase the time visitors spend on its website (http://www.bbc.co.uk/blogs/radiolabs/). The BBC links each page to related information sources on its own website, so a user does not need to search external sources for further information. In general, however, the business value of publishing linked data on the Web is harder to identify. In particular, an advantage over search engines does not hold as an argument. Here we have to distinguish between two cases: the automatic generation of linked data from conventional data sources, and the manual enrichment of Web content. In the first case, creating and publishing linked data causes no additional effort for the author. In the second case, the author has to put more effort into creating Web content, because he faces the additional task of annotation.

In this section we describe a use case from the domain of journalism that illustrates the added value of manually created linked data. This journalists use case is representative of content- and knowledge-intensive work in a heterogeneous environment. We personally interviewed journalists and editors of publishing houses. They typically work in print publishing, online publishing, and cross-media publishing. The two groups partly overlap in how they see themselves, because freelancers who write articles sometimes call themselves editors. In this paper, however, we distinguish the two groups by their tasks. Journalists research and write articles, and editors revise and publish the journalists' work. Journalists may be employed or work as freelancers. In practice, however, many freelance journalists have a contract with only a single publishing house.

Figure 2: Representative setting of today's publishing houses

Figure 2 depicts a representative setting of journalists and editors in today's publishing houses. Journalists research specific topics on demand and access various information sources for this purpose, e.g., websites, books, related articles, and human informants. Our interviews showed that journalists record the results of this research with paper and pencil. Only very few journalists use digital devices for this task, and even fewer use information management systems. To transfer the finished article to the responsible editor at the publishing house, they use free text documents and email. In some cases, especially in the online publishing sector, journalists enter their articles directly into editorial management systems or content management systems. They also add appropriate categories and tags. Finally, an editor revises and releases the articles for his department.

In recent years, several studies such as [9] have recognized the growing importance of online publishing and have called for more professional journalists in online media. Classical publishing houses and media companies are the most important information providers on the Web. However, even these providers need to meet consumer demand for cross-media publishing of content.

In Figure 3 we give an overview of Loomp in the journalists use case setting. Loomp can be applied in two ways. First, it can serve as a personal information management system that makes it easier for journalists to search and reuse their earlier research results and articles. Second, it can serve as an editorial management system for publishing houses, providing a flexible means for cross-media publishing and personalized content aggregation.

Figure 3: Setting of Loomp in the context of the journalists use case

Loomp helps journalists to manage their notes, interview logs, references, addresses, etc. The system supports users in enriching these semantically, e.g., by an automatic annotation assistant and an easy-to-use editor for manual annotations. Furthermore, authors can write and manage their articles with Loomp, including the creation of semantic annotations. In particular, Loomp helps to link an article to its information sources. Loomp provides human- and machine-readable representations of the content, so that it can easily be searched, reused, and published. In this role, Loomp serves as a Web authoring tool.

At the publishing houses, editors use Loomp to revise and edit articles written by journalists. They can also add further annotations to articles and possibly interlink them. Finally, they choose a publishing channel for the work (e.g., a blog, an RSS feed, a wiki, or print) and release it. In this role we regard Loomp as a distributed and collaborative Web content management system that facilitates cross-media publishing of semantically enriched information.

Loomp offers many benefits in this context. Most importantly, Loomp features a semantic search engine and thus reduces the effort of finding information that a user has created before. Because the system also keeps track of provenance information, authors can retrace the sources of an article. Since Loomp serves all content in semantically enriched form, consumers can modify its presentation according to
their current information needs. For example, a reader may decide to highlight all names of persons in a text. As a consequence, authors and editors are largely freed from the effort of formatting text in bold, italics, or underline. Based on the semantic annotations, the content provider can offer services tailored to specific content and target groups. Examples are providing related information, as in the BBC example, or accurately targeted advertisements.

3. DESIGNING A LINKED DATA EDITOR FOR THE MASSES
Our use case makes clear that building a linked data authoring tool for a broad range of Web users is a complex task. The design criteria have to take into account that the target group has no theoretical understanding of what RDF and linked data are. Moreover, we expect that people will continue to see the Web as a network of human-readable pages in the near future. Ordinary users are therefore not very familiar with the value of a Web of data. To lower the barriers, the compelling simplicity of Web 2.0 applications should be transferred to the task of creating linked data: lightweight, easy to use, and easy to understand. The following list describes design requirements for an authoring system that, in our opinion, are necessary to enable non-expert users to participate in the Web of data.

Intuitive user interface: The system hides the complexity of creating linked data by providing an intuitive user interface. It follows common mindsets and uses well-known procedures of system interaction to produce semantic annotations. For example, every computer user today can select a piece of text and click a button to put it in italics.

Simple vocabularies: Although Web users know the term URL or Internet address, they are rarely aware of namespaces. Thus, the system provides access to vocabularies without going into technical details. Each concept of a vocabulary has a meaningful label and is explained in simple terms and with examples of usage. The system supports widely accepted vocabularies and is able to map concepts of equal or similar meaning.

Reuse of content: The same content is often published in different formats. The system therefore has to convert the content to common formats such as PDF and interact with other (Web) applications such as blogs and wikis.

Support for linked data: The system offers its content as linked data. In order to create linked data, the system has to provide support for searching resources and linking to them.

Data authority: A user decides which data is publicly available.

Easy to install: The requirements for installing and running the system are low, so that it can be installed on most webspaces. The need for configuration is reduced to a minimum.

Some of these requirements may seem rather visionary, but we are building Loomp with them in mind. We focus strongly on both technical and socio-economic requirements. Both kinds, but mostly the socio-economic ones, underline our goal: an authoring tool for the Web of data that does not contradict human mindsets.

3.1 The Loomp System Architecture
The two basic types of resources managed by Loomp are fragments and mash-ups. A fragment is the smallest piece of information in Loomp. It describes a self-contained conceptual unit containing annotated text, multimedia content, or a SPARQL query. Mash-ups are composed of an arbitrary number of fragments. Both fragments and mash-ups are assigned a unique identifier (URI) and can be retrieved by dereferencing this URI. For SPARQL queries, we assign two identifiers: one for the query itself and one for its result. As known from the RDF specification, these identifiers can be used to make statements about the resources, e.g., to add metadata such as the author and the creation date.

We designed Loomp as a typical LAMP (Linux, Apache, MySQL, PHP) compatible Web application (see Figure 4). Loomp serves content either in RDF (e.g., for linked data clients) or in XHTML/RDFa [1] (e.g., for Web browsers).

Figure 4: Overview of the Loomp system architecture

On the server side, the main components are:
- a database for storing the data,
- a linked data server for providing access to the data as linked data,
- an RDF API through which the Loomp application accesses the data, and
- a security/authorization component for granting access to the data.
The linked data server and RDF API components are realized with the RAP Pubby library [14].

On the client side we distinguish between a frontend and a backend. The frontend comprises all clients that retrieve data from the Loomp application without authorization, e.g., linked data clients and Web browsers accessing publicly available content. The boxes Faceted Browsing and Faceted Viewing represent websites that exploit the semantic annotations of the content. These websites use the annotations to navigate to related content and to change the appearance of the content (see Section 3.3). Loomp also features a plug-in mechanism to allow (read and write) access to its content from existing
Web applications. Thus, for example, it is possible to view the annotated content as simple HTML pages, blog entries, or wiki pages. The backend comprises clients that have to be authorized before they can access data. Typically, these clients are allowed to modify content. Examples are the One Click Annotator and the content management component (see Section 3.2). Using the Vocabulary Mgmt. component, an experienced user may add and modify vocabularies.

3.2 One Click Annotation and Content Management
To enable users to annotate fragments as easily as formatting text in a word processor, we developed the One Click Annotator (see Figure 5). The One Click Annotator extends the TinyMCE online HTML editor (http://tinymce.moxiecode.com/) to support RDFa annotations in a WYSIWYG manner. It adopts the look and feel that word processors use for applying style sheets to text. On the left side of the annotation toolbar a user selects the concept for annotating a piece of text. On the right side she chooses a vocabulary from a drop-down menu. Annotating text semantically thus takes the place of formatting text in bold or italics. For example, the user selects an email address in the text and clicks the button Email. As a next step, we plan to integrate an automatic annotation recommender. The user will nevertheless keep the authority to reject suggested annotations. In the background, the One Click Annotator inserts RDFa annotations into the XHTML. When a user saves her changes, the XHTML/RDFa content is sent to the server. The server extracts RDF statements from it and stores them in a triple store. More precisely, a fragment is stored both as an XHTML/RDFa representation and as a batch of extracted and assigned RDF metadata.

Figure 5: Using the One Click Annotator for annotating text semantically (screenshot: annotation toolbar with buttons such as Email, Name, Street, Zipcode, and City, applied to a sample postal address)

A mash-up consists of a sequence of fragments. The user interface for modifying mash-ups uses modern Web technologies to allow drag-and-drop and in-place editing of its fragments. A user can extend a mash-up in two ways: by creating a new fragment at the desired place, or by searching for an existing fragment and dragging it there.

Loomp uses the semantic annotations when searching fragments and mash-ups. For example, if a search term has been annotated with different concepts, the result items are grouped and displayed by concept. In a second step, a user can refine the search by retrieving the remaining fragments of a group. A fragment can be contained in many different mash-ups, and a mash-up may even contain the same fragment more than once. To comply with the linked data principles, a user can link fragments and mash-ups to other resources on the Web. For example, Loomp references resources in DBpedia to support unique identifiers.

3.3 Consumer-oriented Presentation
Today, an author who wants to emphasize a phrase that seems important to her may format it in italics or select a different font color. A consumer of the content, however, cannot change the appearance to make a specific task easier. For example, a consumer searching on a topic may want to highlight the phrases that help her judge whether a webpage is relevant. Such phrases could be all names of persons belonging to a working group.

Loomp aims at supporting consumer-oriented presentation of content. Typically, the content managed by Loomp is delivered in XHTML/RDFa format and thus contains semantic annotations. In Loomp, the appearance of the content is defined by cascading style sheets (CSS). Since content is separated from appearance, an author using Loomp can still influence the appearance of the content. She can do so by changing an existing style sheet or by providing a user-specific one.

In contrast to current Web pages, Loomp also allows consumers to change the appearance of the content according to their current needs. Using a toolbar, a consumer can format semantically annotated phrases. For example, she can highlight the names of members of a specific working group with a yellow background. We call this feature faceted viewing.

4. RELATED WORK
In [11], Heath et al. divide the creation of linked data into the following steps: i) select vocabularies, ii) partition the RDF graph into "data pages", iii) assign a URI to each data page, iv) create HTML variants of each data page, v) assign a URI to each entity, vi) add page metadata and more links, and vii) add a semantic sitemap. Loomp follows these steps for creating linked data. Using the One Click Annotator, a user selects from a set of vocabularies that reuse existing ontologies (i). Loomp distinguishes between fragments and mash-ups, which are automatically assigned URIs in the background. The content is published in HTML, among other formats (ii–v). The user may also add metadata to fragments and mash-ups (vi). Finally, the Loomp server generates a sitemap of all publicly available content (vii).

Tools for creating semantically enriched content include semantic wikis and semantic tagging engines. Examples of semantic wikis are OntoWiki [4], IkeWiki [15], and Semantic MediaWiki [12]. These wikis extend traditional wikis with functions for adding annotations to a wiki page. Users can also specify relationships between pages based on ontologies. In our opinion, semantic wikis are far from being usable by non-experts. Besides the effort of learning a special syntax to write and annotate content, a user has to cope with technical terms such as resource, different kinds of relationships, and namespaces. In contrast, semantic tagging engines such as faviki (http://www.faviki.com) use the well-known interaction of tagging to annotate content. In the background, faviki calls functions of the Zemanta Semantic API (http://www.zemanta.com/) to retrieve suggestions for tags, e.g., Wikipedia terms.
Zemanta and OpenCalais (http://www.opencalais.com/) are examples of services that automatically annotate content. In Loomp we use these services to suggest annotations to users.

In [8], the authors present a JavaScript API that modifies RDFa directly on the client side and synchronizes the changes with the server. Our One Click Annotator is suited to extensive changes to the annotations of a text. This JavaScript library is a useful supplement for smaller changes to annotated texts.

The Tabulator linked data browser [5] allows users to edit data directly on the Web of data. However, at its current stage of development it requires a Firefox plug-in, so we regard it as a proprietary tool. OpenLink Data Spaces (http://virtuoso.openlinksw.com/wiki/main/Main/Ods) provides a complete platform for creating a presence on the Web of data, e.g., a calendar, weblog, or bookmark manager. However, OpenLink Data Spaces focuses on describing data entities semantically, whereas we enrich the content itself.

In addition, many wrappers have been developed for websites and relational databases to generate linked data. For example, in [6] Bizer et al. describe one of many approaches that write API wrappers to mash up and interlink data as RDF. In [3], Auer et al. present an approach that creates linked data by crawling and processing data from Web pages. A more direct way to publish linked data is to map relational databases to RDF. Several tools support this mapping, e.g., D2RQ [7], RDB2RDF [13], and Triplify [2]. All these approaches support only an indirect way of creating linked data: an author cannot directly annotate the content.

In [10], Hausenblas presents the idea of writing RDF data directly back to non-RDF data sources. With Loomp we pursue a similar goal. From our viewpoint, however, the data should reside on the user's server rather than on the application server if the user so wishes. Using the Loomp plug-in, an external application can retrieve the user's data directly from his server.

5. CONCLUSION AND OUTLOOK
In this paper we presented Loomp, a Web application for creating, managing, and publishing semantic data. With Loomp we make an important contribution to the success of linked data. Unlike existing editors, Loomp focuses mainly on an intuitive user interface. This interface enables every Web user to produce semantically enriched content and to distribute it easily across various media. Furthermore, we reduced the system requirements for operating a Loomp server to a minimum (a LAMP system). We also integrated a linked data server that provides all public content as linked data, to increase awareness and usage of linked data.

An initial version of Loomp has recently been released that illustrates the basic functionality, e.g., content management and publishing. As a major feature, we will use existing Web services to propose annotations automatically, which a user can accept or reject. In our future work, we will also address the integration and support of third-party applications such as blogs, wikis, and word processors.

Acknowledgments
This work has been partially supported by the "InnoProfile-Corporate Semantic Web" project funded by the German Federal Ministry of Education and Research (BMBF).

6. REFERENCES
[1] B. Adida and M. Birbeck. RDFa primer – bridging the human and data webs. W3C Working Group Note, Oct. 2008.
[2] S. Auer. Triplify. Project Website, Retrieved January 9, 2009, from http://triplify.org.
[3] S. Auer, C. Bizer, J. Lehmann, G. Kobilarov, R. Cyganiak, and Z. Ives. DBpedia: A nucleus for a web of open data. In Proceedings of ISWC/ASWC 2007, volume 4825 of LNCS, pages 715–728. Springer Verlag, Nov. 2007.
[4] S. Auer, S. Dietzold, and T. Riechert. OntoWiki – A Tool for Social, Semantic Collaboration. In I. F. Cruz et al., editors, The Semantic Web – ISWC 2006, volume 4273 of LNCS, pages 736–749. Springer, 2006.
[5] T. Berners-Lee et al. Tabulator redux: Writing into the semantic web. Technical report, ECS, University of Southampton, 2007.
[6] C. Bizer, R. Cyganiak, and T. Gauss. The RDF Book Mashup: From Web APIs to a Web of Data. In S. Auer, C. Bizer, T. Heath, and G. A. Grimnes, editors, SFSW, volume 248 of CEUR Workshop Proceedings. CEUR-WS.org, 2007.
[7] C. Bizer and A. Seaborne. D2RQ – treating non-RDF databases as virtual RDF graphs. In ISWC 2004 (posters), November 2004.
[8] S. Dietzold, S. Hellmann, and M. Peklo. Using JavaScript RDFa widgets for model/view separation inside read/write websites. In Proceedings of the 4th Workshop on Scripting for the Semantic Web, 2008.
[9] P. Glotz and R. Meyer-Lucht. Zeitung und Zeitschrift in der digitalen Ökonomie – Delphi-Studie [Newspapers and magazines in the digital economy – a Delphi study]. Project Website, Retrieved January 10, 2009, from http://www.unisg.ch/org/mcm/web.nsf/wwwPubInhalteGer/Online Publishing Delphi-Studie.
[10] M. Hausenblas. pushback – Write Data Back From RDF to Non-RDF Sources. http://esw.w3.org/topic/PushBackDataToLegacySources, March 2009. Retrieved March 3, 2009.
[11] T. Heath, M. Hausenblas, C. Bizer, R. Cyganiak, and O. Hartig. How to publish linked data on the web. Tutorial at ISWC 2008, October 2008. Retrieved February 10, 2009.
[12] M. Krötzsch, D. Vrandečić, and M. Völkel. Semantic MediaWiki. In The Semantic Web – ISWC 2006, Lecture Notes in Computer Science, pages 935–942. Springer Verlag, 2006.
[13] A. Malhotra. Progress report from the RDB2RDF XG. In C. Bizer and A. Joshi, editors, International Semantic Web Conference (Posters & Demos), volume 401 of CEUR Workshop Proceedings. CEUR-WS.org, 2008.
[14] R. Oldakowski et al. RAP: RDF API for PHP. In Proceedings of SFSW 2005, May 2005.
[15] S. Schaffert. IkeWiki: A semantic wiki for collaborative knowledge management. In Proceedings of STICA'06, Manchester, UK, June 2006.