=Paper=
{{Paper
|id=Vol-538/paper-4
|storemode=property
|title=Linked Data Authoring for Non-Experts
|pdfUrl=https://ceur-ws.org/Vol-538/ldow2009_paper4.pdf
|volume=Vol-538
|dblpUrl=https://dblp.org/rec/conf/www/Luczak-RoschH09
}}
==Linked Data Authoring for Non-Experts==
<pdf width="1500px">https://ceur-ws.org/Vol-538/ldow2009_paper4.pdf</pdf>
<pre>
                      Linked Data Authoring for Non-Experts

                    Markus Luczak-Rösch                                          Ralf Heese
                     Freie Universität Berlin                             Freie Universität Berlin
                   AG Corporate Semantic Web                             AG Corporate Semantic Web
                           Takustr. 9                                            Takustr. 9
                    D-14195 Berlin, Germany                               D-14195 Berlin, Germany
                    luczak@inf.fu-berlin.de                               heese@inf.fu-berlin.de

ABSTRACT                                                           Likewise, we believe that the principles of linked data will
The vision of the Semantic Web community is to create a         change once more the perception of the Web and publish-
linked “Web of data” providing ubiquitous data access via       ing content as linked data will become commonplace. At
machine understandable links between data resources. Due        the moment, most linked data sources offer data that origi-
to the need of enormous expertise for producing and con-        nally resides in relational databases. Wrappers convert this
suming linked data, the idea of linked data emerges only        data into RDF format automatically. Besides the availabil-
slowly and only a few data sources are currently available as   ity of database wrappers, we think that the broad success
linked data. In this paper we present Loomp to facilitate an    of linked data in everyday usage also depends on the avail-
increasing use of the Web data. Loomp enables non-experts       ability of authoring tools that enable ordinary Web users –
to produce and publish semantically annotated content as        i.e. non-experts with regard to semantic web technologies –
easily as formatting text in word processors. Furthermore,      to publish linked data. Currently, there exists tool support
Loomp simplifies the reuse of content with different Web        for editing metadata or adding semantics to wikis, but to the
applications.                                                   best of our knowledge tools are missing that allow non-expert
                                                                users to enrich content such as text and multimedia objects
                                                                with detailed semantics.
Categories and Subject Descriptors                                 In this paper we introduce Loomp as a system that allows
I.7.4 [Electronic Publishing]; H.4.1 [Information Sys-          every Web user to create semantically enriched content as
tems Applications]: Office Automation—Desktop publish-          easily as formatting text in a word processor. A user can easily
ing,Word processing; H.3.2 [Information Storage and Re-         publish the same content to various applications such as Web
trieval]: Information Storage                                   browsers, blogs, and wikis. Additionally, Loomp features an
                                                                integrated linked data server for publishing the content as
                                                                linked data (Figure 1).
General Terms
Semantic Web


Keywords
linked data, content authoring and publishing, Web of data


1.   INTRODUCTION
   Over the past years authoring content for the Web un-
derwent an interesting evolution which is well-known as the
evolution from a Web 1.0 in the past towards the so-called
Web 2.0 in the present. We state that the decisive cata-
lyst for the evolution step from Web 1.0 to Web 2.0 was
the broad success of blogging. Easy-to-use tools such as
WordPress enabled a critical mass of people to change their     Figure 1: Providing manually created linked data to
behavior on the Web from consume-oriented to an interplay       the linked data cloud and Web applications
of consuming and publishing. In addition, people apply wiki
systems as appropriate means for managing large knowledge
bases collaboratively. Both developments show that ordi-
nary Web users nowadays have a higher technical affinity          The paper is structured as follows: As an illustrating ex-
and understanding of the network effects on the Web than        ample of using Loomp in a real world scenario, we report our
in times of Web 1.0. Authoring content online has become        results of interviewing journalists and editors in Section 2.
a familiar task.                                                In Section 3 we give an overview of the Loomp architecture
                                                                and describe key points of Loomp in detail. In Section 4
Copyright is held by the author/owner(s).
WWW2009, April 20-24, 2009, Madrid, Spain.                      we present related work and in Section 5 we conclude and
.                                                               outline our future work intentions.
2.        THE JOURNALISTS USE CASE                                                   text documents and email communication. In some cases,
   Nowadays we can find interesting use cases with under-                            especially in the online publishing sector, journalists enter
lying business models in literature that are based on linked                         their articles directly into editorial management systems or
data contained in webpages. The BBC for example devel-                               content management systems. Furthermore, they add ap-
oped a system that utilizes automatic enrichment of content                          propriate categories and tags. Finally, an editor revises and
to increase the visiting time on their website1 . By providing                       releases the articles for his department.
links to information sources of their own website that are re-                         During the last years several studies such as [9] recognized
lated to the currently shown website, a user does not need to                        an increasing importance of online publishing and called
search external sources for further information. In general,                         for more professional journalists in online media. Classical
it is more complex to identify the business value of publish-                        publishing houses and media companies are the most impor-
ing linked data on the Web – especially the advantage over                           tant information providers on the Web. However, even these
search engines does not hold as an argument. For example,                            providers need to satisfy the demands of the consumer for
we have to distinguish between the automatic generation                              cross-media publishing of contents.
of linked data based on conventional data sources and the                              In Figure 3 we give an overview of Loomp in the journalist
manual enrichment of Web contents. In the first case, the                            use case setting. The application of Loomp is possible in two
creation and publishing of linked data does not cause any                            ways. First, as a personal information management system
additional effort for the author. In the latter case, an au-                         for the journalists to facilitate an easier search and reuse
thor has to put more effort into the creation of Web content,                        of former research results and former articles. Second, as
because he has to face the additional task of annotation.                            an editorial management system for the publishing houses
   In this section we describe a use case in the domain of                           which acts as flexible means for cross-media publishing and
journalism which illustrates the added value of manual cre-                          personalized content aggregation.
ation of linked data. This journalists use case is represen-
tative for the domain of content and knowledge intensive
work in a heterogeneous environment. We personally inter-
viewed journalists and editors of publishing houses which
are typically working in the areas of print publishing, online
publishing, and cross-media publishing. The self-conception
of both groups is a bit overlapping, because sometimes free-
lancers writing articles call themselves editors. However, in
this paper we distinguish these two groups on the basis of                           [Figure 3 diagram labels: Released Works, Web Data, Freelance
their tasks: journalists research and write articles and edi-                        Journalist, Employed Journalist, Research Results, Article
tors revise and publish the work of journalists. Journalists                         Archive, Publishing House, Editor]
may be employed or work as freelancers. As a matter of
fact, however, many freelance journalists have a contract                            Figure 3: Setting of Loomp in the context of the
with only a single publishing house.                                                 journalists use case

                                                                                        Loomp helps journalists to manage their notes, interview
                                                                                     logs, references, addresses, etc. The system supports users
                                                                                     to enrich them semantically, e.g., by an automatic annota-
                                                                                     tion assistant and an easy-to-use editor for manual anno-
                                                                                     tations. Furthermore, authors can write and manage their
                                                                                     articles with Loomp – the creation of semantic annotations
                                                                                     is possible. Especially, Loomp helps to link an article to its
                                                                                     information sources. Loomp provides human and machine
[Figure 2 diagram labels: Released Works, Freelance Journalist,                      readable representations of the content, so that it can eas-
Employed Journalist, Research Results, Article Archive,                              ily be searched, reused, and published. In this case Loomp
Publishing House, Editor]                                                            serves as a Web authoring tool.
                                                                                        On the side of the publishing houses editors use Loomp to
Figure 2: Representative setting of today’s publish-                                 revise and edit articles written by journalists. Furthermore,
ing houses                                                                           they can add more annotations to articles and possibly in-
                                                                                     terlink them. Finally, they choose a publishing channel for
   Figure 2 depicts a representative setting of journalists and                      the work (e.g. a blog, an RSS feed, a wiki, or print)
editors in the context of today’s publishing houses. Journal-                        and release it. At this point we regard Loomp as a dis-
ists research specific topics on demand and access various                           tributed and collaborative Web content management system
information sources for this purpose, e.g. websites, books,                          which facilitates cross-media publishing of semantically en-
related articles, and human informants. Our personal inter-                          riched information.
views yielded that journalists note the results of this research                        The benefits of Loomp in this context are manifold. Most
using paper and pencil. Only very few journalists use dig-                           important, Loomp features a semantic search engine and
ital devices for this task and even fewer apply information                          thus decreases effort of finding information that a user has
management systems. To transfer the finished article to the                          created before. Because the system also keeps track of prove-
responsible editor at the publishing house the people use free                       nance information, authors can retrace the sources of an ar-
                                                                                     ticle. Since Loomp serves all content semantically enriched,
1
    http://www.bbc.co.uk/blogs/radiolabs/                                            consumers can modify the presentation of it according to
their current information needs. For example, the reader            Some of these requirements seem to be rather visionary,
may decide to highlight all names of persons in a text. As        but we realize Loomp with these requirements in our mind.
a consequence authors and editors are more or less freed          We strongly focus on technical requirements as well as socio-
from the effort of formatting texts in bold, italic, or un-       economic requirements. Both, but mostly the latter, stress
derlined. Based on the semantic annotations the content           the goal to design an authoring tool for the Web of data
provider can offer content and target group specific services,    which does not contradict human mindsets.
e.g., the BBC example of providing related information or
accurately fitting commercials.                                   3.1     The Loomp System Architecture
                                                                     The two basic types of resources that are managed by
                                                                  Loomp are fragments and mash-ups. A fragment is the
3.   DESIGNING A LINKED DATA EDITOR                               smallest piece of information in Loomp and describes a closed
     FOR THE MASSES                                               notional entity containing annotated text, multimedia con-
   Considering our use case example it becomes clear that the     tent, or a SPARQL query. Mash-ups are composed of an
task of building a linked data authoring tool for a broad         arbitrary number of fragments. Both, fragments and mash-
range of Web users is a complex task. The design criteria         ups, are assigned a unique identifier (URI) and can be re-
have to respect that the target group has no theoretical un-      trieved by dereferencing this URI. In the case of SPARQL
derstanding of what RDF and linked data is. Moreover, we          queries we assign two identifiers, one for the query itself and
expect that the common understanding of the Web as a net-         one for the result of the query. As known from the RDF
work of human readable pages will not change in the near          specification, the identifier can be used to make statements
future, so that the value of a Web of data is not very familiar   about them, e.g., add metadata such as the author and the
for ordinary users. To lower the barriers the compelling sim-     creation date.
plicity of Web 2.0 applications should be transferred to the         We designed Loomp as a typical LAMP2 compatible Web
task of creating linked data: light-weight, easy-to-use, and      application (see Figure 4). Loomp serves contents either in
easy-to-understand. In the following list we describe design      RDF (e.g. for linked data clients) or in XHTML/RDFa [1]
requirements on an authoring system which in our opinion          (e.g. for Web browsers).
are necessary to enable non-expert users to participate in
the Web of data.

Intuitive user interface The system hides the complex-
     ity of creating linked data by providing an intuitive
     user interface. It follows common mindset and uses
     well-known procedures of system interaction to pro-
     duce semantic annotations. For example, every com-
     puter user is nowadays able to select a text and to click
     on a button to format it italic.

Simple vocabularies Although Web users know the term
    URL or Internet address, they are currently rarely
    aware of namespaces. Thus, the system provides ac-
    cess to vocabularies without going into technical de-
    tails. Each concept of a vocabulary has a meaningful
    label and is explained in simple terms and with exam-
    ples of usage. The system supports widely accepted            Figure 4: Overview of the Loomp system architec-
    vocabularies and is able to map concepts of equal or          ture
    similar meaning.
                                                                     On the server side the main components are a database for
Reuse of content Often the same content is published in           storing the data, a linked data server for providing access to
    different formats, so the system has to be able to con-       the data as linked data, an RDF API for accessing the data
    vert the content to common formats such as PDF and            by the Loomp application, and a security/authorization com-
    to interact with other (Web) applications such as blogs       ponent for granting access to the data. The linked data
    and wikis.                                                    server and RDF API components are realized with the RAP
                                                                  Pubby library [14].
Support for linked data The system offers its content as             On the client side we distinguish between a frontend and
    linked data. In order to create linked data, the sys-         a backend. The term frontend comprises all clients that re-
    tem has to provide support for searching resources and        trieve data from the Loomp application without authoriza-
    linking to them.                                              tion, e.g., access to all publicly available content by linked
                                                                  data clients and by Web browsers. The boxes Faceted Brows-
Data authority A user decides which data is publicly avail-       ing and Faceted Viewing represent websites that exploit the
    able.                                                         semantic annotations of the content for navigating to related
                                                                  content and for changing the appearance of the content (see
Easy to install The requirements for installation and run-
                                                                  Section 3.3). Loomp also features a plug-in mechanism to
    ning the system are low, so that it can be installed in
                                                                  allow (read and write) access to its content from Existing
    most webspaces. The need for configuration is reduced
                                                                  2
    to a minimum.                                                     LAMP: Linux, Apache, MySQL, PHP
Web Applications. Thus, for example, it is possible to view                 Loomp references resources in DBPedia to support unique
the annotated content as simple HTML pages, blog entries,                   identifiers.
or wiki pages. The term backend comprises clients that
have to authorize before they can access data – typically,                  3.3     Consumer-oriented Presentation
these clients are allowed to modify content, e.g., by the One                  Nowadays, if an author decides to emphasize a text phrase
Click Annotator and the content management component                        that seems to be important to her, she may format it using
(see Section 3.2). Using the Vocabulary Mgmt. component                     an italic font or select a different font color. A consumer
an experienced user may add vocabularies and modify them.                   of the content is unable to change the appearance in order
                                                                            to facilitate the accomplishment of a specific task, e.g., a
3.2           One Click Annotation and Content Man-                         consumer would like to highlight phrases (i.e. all names of
              agement                                                       persons belonging to a working group) that are important
   To enable users to annotate fragments as easily as format-               persons belonging to a working group) that are important
ting text in word processors, we develop the One Click An-                  search on a topic.
notator (see Figure 5). The One Click Annotator extends                        Loomp aims at supporting consumer-oriented presenta-
the TinyMCE3 online HTML editor to support RDFa anno-                       tion of content. Typically, the content managed by Loomp
tations in a WYSIWYG way. It adopts the look & feel style                   is delivered in XHTML/RDFa format and, thus, it contains
which is well-known from word processors for applying style                 semantic annotations. Using Loomp the appearance of the
sheets to text. On the left side of the annotation toolbar a                content is defined by cascading style sheets (CSS). By sepa-
user selects the concept for annotating a piece of text, on the             rating content from appearance an author who uses Loomp
right side she chooses a vocabulary from a drop-down menu.                  has still the possibility to exert influence on the appearance
The effort for annotating text semantically substitutes the                 of the content by changing an existing style sheet or provid-
effort of formatting text bold or italic. For example, the                  ing a user-specific one.
user selects an email address in the text and clicks the but-                  In contrast to current Web pages Loomp also allows con-
ton Email. In a next step we plan to integrate an automatic                 sumers to change the appearance of the content according
annotation recommender. Nevertheless the user has the au-                   to her current needs. By means of a toolbar a consumer can
thority to reject suggested annotations. In the background                  format semantically annotated phrases, e.g., she can high-
the One Click Annotator inserts RDFa annotations into the                   light the names of members of a specific working group with
XHTML. If a user saves her changes the XHTML/RDFa                           a yellow background. We call this feature faceted viewing.
content is sent to the server which in turn extracts RDF
statements and stores them into a triple store. In detail, a
fragment is stored as an XHTML/RDFa representation as
                                                                            4.     RELATED WORK
well as a batch of extracted and assigned RDF metadata.                        In [11] Heath et al. divided the creation of linked data
                                                                            into the following steps: i) select vocabularies, ii) partition
                                                                            the RDF graph into “data pages”, iii) assign a URI to each
      Email    Name      Street   Zipcode   City   Email   Name   Address
                                                                            data page, iv) create HTML variants of each data page, v)
                                                                            assign a URI to each entity, vi) add page metadata and
     Institut für Informatik
     Humboldt Universität zu Berlin
     Humboldt‐Universität
                                                                            more links, and vii) add a semantic sitemap. With Loomp
     Unter den Linden 6
     10099 Berlin
                                                                            we follow these steps for creating linked data. Using the
     freytag@dbis.informatik.hu‐berlin.de
                                                                            One Click Annotator a user selects from a set of vocabular-
                                                                            ies that reuses existing ontologies (i). Considering Loomp
                                                                            we distinguish between fragments and mash-ups which are
                                                                            automatically assigned URIs in the background. The content
                                                                            is published in HTML format among other formats (ii–v). The user
Figure 5: Using the One Click Annotator for anno-                           may also add meta data to fragments and mash-ups (vi).
tating text semantically                                                    Last but not least the Loomp server generates a sitemap of
                                                                            all publicly available content.
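
The server-side step described in Section 3.2 (extracting RDF statements
from saved XHTML/RDFa content) can be illustrated with a minimal sketch.
It is not the Loomp implementation: it handles only a small RDFa subset
(the `about` and `property` attributes with plain text content), and the
`ex:` prefix, the fragment URI, and all function and class names are
assumptions made for this illustration.

```python
from html.parser import HTMLParser

# Hypothetical fragment URI and sample markup, invented for illustration only.
FRAGMENT_URI = "http://example.org/loomp/fragment/1"
SAMPLE = """
<div about="http://example.org/loomp/fragment/1">
  <span property="ex:street">Unter den Linden 6</span><br/>
  <span property="ex:city">Berlin</span><br/>
  <span property="ex:email">someone@example.org</span>
</div>
"""


class SimpleRDFaExtractor(HTMLParser):
    """Collect (subject, predicate, literal) triples from a small RDFa subset.

    Only `about` and `property` are interpreted; prefixes, `rel`/`rev`,
    datatypes and the other RDFa features are deliberately ignored.
    """

    def __init__(self, base_subject):
        super().__init__()
        self.base_subject = base_subject
        self.triples = []
        self._stack = []  # open elements: [tag, about, property, text parts]

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        self._stack.append(
            [tag, attributes.get("about"), attributes.get("property"), []]
        )

    def handle_data(self, data):
        # Text belongs to every open element that carries a property.
        for entry in self._stack:
            if entry[2] is not None:
                entry[3].append(data)

    def handle_endtag(self, tag):
        # Find the nearest open element with this tag name.
        for i in range(len(self._stack) - 1, -1, -1):
            if self._stack[i][0] == tag:
                break
        else:
            return  # stray closing tag, nothing to do
        entry = self._stack[i]
        if entry[2] is not None:
            # The subject is the innermost enclosing `about`, else the default.
            subject = self.base_subject
            for open_entry in self._stack[: i + 1]:
                if open_entry[1] is not None:
                    subject = open_entry[1]
            literal = " ".join("".join(entry[3]).split())
            self.triples.append((subject, entry[2], literal))
        del self._stack[i:]  # also drops unclosed children such as <br>


def extract_triples(xhtml, base_subject):
    """Return the triples found in an XHTML/RDFa snippet, in document order."""
    parser = SimpleRDFaExtractor(base_subject)
    parser.feed(xhtml)
    parser.close()
    return parser.triples


if __name__ == "__main__":
    for triple in extract_triples(SAMPLE, FRAGMENT_URI):
        print(triple)
```

In such a sketch the resulting triples would be handed to the triple store,
while the XHTML/RDFa text itself is kept as the fragment's representation.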
   A mash-up consists of a sequence of fragments. The user                     Tools for creating semantically enriched content include
interface for modifying mash-ups exploits modern web tech-                  semantic wikis and semantic tagging engines. Examples for
nologies to allow drag-and-drop and in-place editing of its                 semantic wikis are OntoWiki [4], Ikewiki [15], or Semantic
fragments. A user can extend a mash-up by creating a new                    MediaWiki [12]. These wikis extend traditional wikis by
fragment at the desired place of the mash-up or searching                   functionalities that enable users to add annotations to a
and dragging an existing fragment to it.                                    wiki page and to specify relationships between pages based
   Loomp makes use of the semantic annotations for search-                  on ontologies. In our opinion semantic wikis are far from
ing fragments and mash-ups. For example, if a search term                   being usable by non-experts. Besides the effort to learn
has been annotated with different concepts, then the result                 a special syntax to write and to annotate content, a user
items are grouped and displayed according to these concepts.                has to cope with technical terms such as resource, different
In a second step, a user can refine the search by retrieving                kinds of relationships, and namespaces. In contrast semantic
the remaining fragments of a group. A fragment can be con-                  tagging engines such as faviki4 exploit the well-known user
tained in many different mash-ups, it is even possible that                 interaction procedure of tagging to annotate content. In the
a mash-up contains the same fragment more than once. To                     background faviki calls functions of the Zemanta Semantic
comply with the linked data principles a user can link frag-                API5 to retrieve suggestions for tags, e.g., Wikipedia terms.
ments and mash-ups to other resources on the Web, e.g.,                     4
                                                                                http://www.faviki.com
3                                                                           5
    http://tinymce.moxiecode.com/                                               http://www.zemanta.com/
Zemanta as well as OpenCalais6 are examples for services          Acknowledgments
that automatically annotate content. In Loomp we use these        This work has been partially supported by the “InnoProfile-
services to suggest annotations to users.                         Corporate Semantic Web” project funded by the German
   In [8], the authors present a JavaScript API for modify-       Federal Ministry of Education and Research (BMBF).
ing RDFa directly on the client side and synchronizing the
changes with the server. While our One Click Annotator is
suitable for extensive changes of annotations of a text, this     6.   REFERENCES
JavaScript library is a useful supplement for smaller changes      [1] B. Adida and M. Birbeck. RDFa primer – bridging the
of annotated texts.                                                    human and data webs. W3C Working Group Note,
   The Tabulator linked data browser [5] allows users to edit          Oct. 2008.
data directly on the Web of data. However, since it requires       [2] S. Auer. Triplify. Project Website, Retrieved January
a Firefox plug-in in its current stage of development we see           9, 2009, from http://triplify.org.
it as a proprietary tool. OpenLink Data                            [3] S. Auer, C. Bizer, J. Lehmann, G. Kobilarov,
Spaces7 provides a complete platform for creating a pres-              R. Cyganiak, and Z. Ives. DBpedia: A nucleus for a
ence on the Web of data, e.g., calendar, weblog, or book-              web of open data. In Proceedings of ISWC/ASWC
mark manager. However, they focus on describing the data               2007, volume 4825 of LNCS, pages 715–728. Springer
entities semantically while we enrich the content itself.              Verlag, Nov. 2007.
   In addition, many wrappers have been developed for web-         [4] S. Auer, S. Dietzold, and T. Riechert. OntoWiki - A
sites and relational databases to generate linked data. For            Tool for Social, Semantic Collaboration. In I. F. Cruz
example, in [6] Bizer et al. describe one of many examples             et al., editors, The Semantic Web - ISWC 2006,
for writing an API wrapper to mash-up and interlink data               volume 4273 of LNCS, pages 736–749. Springer, 2006.
as RDF. Auer et al. present an approach to create linked           [5] T. Berners-Lee et al. Tabulator redux: Writing into
data based on crawling and processing data from Web pages              the semantic web. Technical report, ECS, University
in [3]. As a more direct possibility to publish linked data            of Southampton, 2007.
several tools support the mapping of relational databases to       [6] C. Bizer, R. Cyganiak, and T. Gauss. The RDF Book
RDF, e.g., D2RQ [7], RDB2RDF [13], and Triplify [2]. All               Mashup: From Web APIs to a Web of Data. In
these approaches have in common that they only support an              S. Auer, C. Bizer, T. Heath, and G. A. Grimnes,
indirect way of creating linked data, e.g., an author cannot           editors, SFSW, volume 248 of CEUR Workshop
directly annotate the content.                                         Proceedings. CEUR-WS.org, 2007.
   On his website, Hausenblas [10] presents the idea of            [7] C. Bizer and A. Seaborne. D2RQ - Treating non-RDF
writing RDF data directly to non-RDF data sources. With                databases as virtual RDF graphs. In ISWC2004
Loomp we pursue a similar goal, but from our viewpoint the             (posters), November 2004.
data should reside on the user’s server and not on the             [8] S. Dietzold, S. Hellmann, and M. Peklo. Using
application server, if the user wishes so. Using the Loomp             JavaScript RDFa widgets for model/view separation
plug-in, an external application can directly retrieve the             inside read/write websites. In Proceedings of the 4th
user data from the user’s server.                                      Workshop on Scripting for the Semantic Web, 2008.
                                                                   [9] P. Glotz and R. Meyer-Lucht. Zeitung und Zeitschrift
                                                                       in der digitalen Ökonomie – Delphi-Studie. Project
5.     CONCLUSION AND OUTLOOK                                          Website, Retrieved January 10, 2009, from
                                                                       http://www.unisg.ch/org/mcm/web.nsf/
  In this paper we presented a Web application for creating,           wwwPubInhalteGer/Online Publishing Delphi-Studie.
managing, and publishing semantic data, namely Loomp.
With Loomp we make an important contribution to the success       [10] M. Hausenblas. pushback - Write Data Back From
of linked data. In contrast to existing editors, our main              RDF to Non-RDF Sources.
focus lies on an intuitive user interface that enables every           http://esw.w3.org/topic/PushBackDataToLegacySources,
Web user to produce semantically enriched content and to               March 2009. Retrieved on 3rd March, 2009.
distribute it easily across various media. Furthermore, we        [11] T. Heath, M. Hausenblas, C. Bizer, R. Cyganiak, and
reduced the system requirements for operating a Loomp                  O. Hartig. How to publish linked data on the web,
server to a minimum (a LAMP system) and integrated a linked            October 2008. Tutorial at ISWC 2008, retrieved on
data server that provides all public content as linked data            10th February, 2009.
to increase the awareness and usage of linked data.               [12] M. Krötzsch, D. Vrandečić, and M. Völkel. The
  An initial version of Loomp, which illustrates the basic             Semantic Web - ISWC 2006, chapter Semantic
functionality, e.g., content management and publishing, has            MediaWiki, pages 935–942. Lecture Notes in
recently been released. As a major feature, we will exploit            Computer Science. Springer Verlag, 2006.
existing web services to propose annotations automatically,       [13] A. Malhotra. Progress report from the RDB2RDF XG. In
which the user can then accept or reject. In our future                C. Bizer and A. Joshi, editors, International Semantic
work, we will also address the integration and support of              Web Conference (Posters & Demos), volume 401 of
third-party applications such as blogs, wikis, and word                CEUR Workshop Proceedings. CEUR-WS.org, 2008.
processors.                                                       [14] R. Oldakowski et al. RAP: RDF API for PHP. In
                                                                       Proceedings of SFSW 2005, May 2005.
                                                                  [15] S. Schaffert. IkeWiki: A semantic wiki for collaborative
6 http://www.opencalais.com/                                           knowledge management. In Proceedings of STICA’06,
7 http://virtuoso.openlinksw.com/wiki/main/Main/Ods                    Manchester, UK, June 2006.

</pre>