Linked Data Utilization along the Content Value Chain – Observations and Implications

Georg Neubauer
University of Applied Sciences St. Poelten
Matthias Corvinus Str. 15, 3100 St. Poelten, Austria
dm131520@fhstp.ac.at

Tassilo Pellegrini
University of Applied Sciences St. Poelten
Matthias Corvinus Str. 15, 3100 St. Poelten, Austria
tassilo.pellegrini@fhstp.ac.at
ABSTRACT
The authors present the results of a longitudinal investigation into the utilization of Linked Data technologies along the content value chain. They analyzed 71 papers published from 2006 to 2014 that used Linked Data technologies in editorial workflows. By coding the primary and secondary research topics addressed in each paper, the authors draw conclusions about the maturity of Linked Data technologies as support systems along the content value chain. The survey indicates that Linked Data technologies are steadily maturing as a support infrastructure for editorial processes. Whether the survey results are valid for application domains not related to editorial tasks is open to discussion.

Categories and Subject Descriptors
E.0 [General]; K.4.3 [Organizational Impacts]

General Terms
Management, Economics, Human Factors, Standardization

Keywords
Linked Data, Content Value Chain, Semantic Metadata, Semantic Web, Data Journalism, News Production, Editorial Workflows, Media Economics, IPR, Data Licensing

1. INTRODUCTION
The growing recognition of Linked Data in the research community as “Semantic Web done right” [14] motivates a closer look at whether and how Linked Data research has evolved in recent years. Such an investigation yields insights into research trends and their interdependencies, and it allows conclusions on whether the research field has reached a significant degree of maturity in terms of technology diffusion and application areas.
As illustrated in Figure 1, a survey of the occurrence of the phrase “Linked Data” in research publications of the ACM Digital Library from 2006 to 2014 reveals the growing popularity of this technological concept in computer science until 2013, followed by a decline in 2014. Because Linked Data is a generic data management technology applied across various application areas and industries, it is very hard to make a general statement about its level of maturity and industry adoption. Is the distribution in Figure 1 therefore an indicator of the growing maturity of a research field? And if so, how can this maturity be operationalized empirically?
To tackle these questions, the authors analyzed a subset of research papers from the ACM Digital Library that address the application of Linked Data within editorial workflows. This subset allowed us to apply a unified classification scheme, known as the content value chain [1], to the various application areas of Linked Data. The content value chain can be described as a process model comprising several sequential steps that contribute to the content production process. Looking at the application of Linked Data in editorial workflows made it possible to identify primary and secondary areas of utilization, and thus to draw conclusions about the diffusion and appropriability of Linked Data for the production of media content.

[Figure 1: bar chart; x-axis: year, y-axis: number of publications. 2006: 33, 2007: 47, 2008: 47, 2009: 88, 2010: 208, 2011: 317, 2012: 368, 2013: 447, 2014: 366]
Figure 1. ACM publications containing the term “Linked Data” from 2006 to 2014 (N = 1921)

2. CLASSIFICATION SCHEME & RELATED WORK
The original concept of the value chain, introduced by Michael Porter in 1985 [15], is used as an analytical framework for analyzing value creation processes at the firm or industry level. In recent years, the concept of the value chain has also gained popularity in the context of open data in general [4; 6; 16] and Linked Data in particular [3; 5]. Research investigating the organizational and economic impact of Linked Data, especially, refers to the concept of the value chain [13].
In this paper we refer to a generic abstraction of the content value chain consisting of five steps: 1) content acquisition, 2) content editing, 3) content bundling, 4) content distribution and 5) content consumption. As illustrated by [1], Linked Data can contribute to each step by supporting the production function intrinsic to it.
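The five steps can be encoded as an ordered scheme against which each surveyed paper is coded. The Python sketch below is one possible encoding, not the authors' tooling. It assumes that every paper has exactly one primary step and any number of secondary steps, tallies the primary classes (black boxes) and, per primary class, the related secondary classes (grey boxes), and shades the latter with the greyscale rule described in Section 3, read here as 50% divided by the number of papers with the main classification, multiplied by the number of those papers that relate to the secondary class. ValueChainStep, tally, grey_value and the sample records are hypothetical.

```python
from collections import Counter
from enum import IntEnum


class ValueChainStep(IntEnum):
    """The five sequential steps of the content value chain (hypothetical encoding)."""
    ACQUISITION = 1
    EDITING = 2
    BUNDLING = 3
    DISTRIBUTION = 4
    CONSUMPTION = 5


def tally(papers):
    """Tally coded papers against the scheme.

    `papers` is an iterable of (primary, secondaries) pairs: one primary
    ValueChainStep and a collection of secondary ValueChainStep values.
    Returns (primary_counts, related_counts): primary_counts[step] is the
    number of papers whose main classification is `step` (a black box);
    related_counts[primary][secondary] counts how many of those papers also
    touch `secondary` (a grey box in the row of `primary`).
    """
    primary_counts = Counter()
    related_counts = {step: Counter() for step in ValueChainStep}
    for primary, secondaries in papers:
        primary_counts[primary] += 1
        for secondary in set(secondaries):
            if secondary != primary:  # ignore a secondary entry that repeats the primary
                related_counts[primary][secondary] += 1
    return primary_counts, related_counts


def grey_value(n_related, n_primary):
    """Greyscale shade in percent (black = 100%): 50% divided by the number of
    papers with the main classification, multiplied by the number of those
    papers that relate to the secondary class."""
    if n_primary == 0:
        return 0.0
    return 50.0 / n_primary * n_related


if __name__ == "__main__":
    S = ValueChainStep
    # Hypothetical coded papers for illustration; NOT the survey's data.
    sample = [
        (S.EDITING, {S.ACQUISITION}),
        (S.EDITING, {S.ACQUISITION, S.BUNDLING}),
        (S.CONSUMPTION, {S.ACQUISITION}),
    ]
    primary, related = tally(sample)
    for step in ValueChainStep:  # one grid row per step, in value-chain order
        n = primary[step]
        shades = {s.name: grey_value(c, n) for s, c in related[step].items()}
        print(f"{step.name:<12} black={n} grey={shades}")
```

Using an IntEnum keeps the steps in their sequential order, so iterating over ValueChainStep yields the grid rows in value-chain order. Under this reading of the rule, a secondary class related to every paper in its row is shaded at 50%, half the intensity of a black box.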
Content acquisition is mainly concerned with the collection, storage and integration of the relevant information necessary to produce a news item. In the course of this process, information and facts are pooled from internal or external sources for further processing.
Content editing entails all steps that deal with the semantic adaptation, interlinking and enrichment of data. Adaptation can be understood as a process in which acquired data is provided in a form that can be used in the editorial process. Interlinking and enrichment are often performed via processes like tagging and/or referencing, which enrich media documents either by disambiguating existing concepts or by providing background knowledge for deeper insights.
Content bundling is mainly concerned with the contextualization and personalization of information products. It can be used to provide customized access to media files, e.g. by using metadata for the device-sensitive delivery of content, or to compile thematically relevant material into landing pages or dossiers, thus improving the navigability, findability and reuse of information.
In a Linked Data environment, content distribution mainly deals with the provision of machine-readable and semantically interoperable (meta)data via Application Programming Interfaces (APIs) or SPARQL endpoints. These can be designed either for internal purposes, so that data can be reused within controlled environments (e.g. within or between units), or for external purposes, so that data can be shared with unknown users (e.g. as open SPARQL endpoints on the Web).
Content consumption entails any means that enable a human user to search for and interact with content items in a pleasant and purposeful way. In this view, this step mainly deals with end-user applications that make use of Linked Data to provide access to content, e.g. through suitable retrieval tools and/or visualizations.
The five steps of the content value chain constitute the classification scheme.

3. METHODOLOGY
We selected a sample of 71 papers (out of 1921) from the ACM Digital Library (DL) that deal with the utilization of Linked Data in editorial workflows in the period from 2006 to 2014. The selected papers had to meet the following criteria: 1) the work must analyse the utilization of Linked Data with reference to some sort of editorial workflow; and 2) the work must not be purely theoretical but provide at least a proof of concept. The relevant papers were then analysed and clustered according to the five classes acquisition, editing, bundling, distribution and consumption. As most papers treated more than one of these topics, we weighted each paper according to the primary and secondary topics discussed, which also gave us a better understanding of how the research topics relate to each other.
Figure 2 illustrates the classification scheme. The black boxes indicate the primary classification of a paper and the number of papers falling into this category. The secondary classifications are assigned a weighted greyscale value. The number in the grey and black boxes indicates how many papers referred to these classes. Hence, reading the rows horizontally gives an overview of how the primary classification of a paper relates to its secondary classifications. Reading the columns vertically and summing up the values of the black boxes gives the number of papers falling into a specific class.
The weighted greyscale values were calculated as follows. Black corresponds to 100%. A secondary classification is shaded with 50% divided by the number of papers with the main classification (black), multiplied by the number of related classifications for that secondary classification, i.e. grey value = 50% × (number of related classifications) / (number of papers with the main classification). For instance, if 10 of 20 papers with a given main classification relate to a secondary class, that class is shaded at 25%. Figure 4 illustrates the results of our survey.

Figure 2. Legend: time-based categorization into the content value chain

4. RESULTS
4.1 General Findings
Figure 3 illustrates the general findings of our investigation, aggregating all years discussed in Section 4.2 as influence circles on a grid. The black circles on the diagonal represent the number of papers within each main classification, while the other circles, read horizontally, show the related classifications. As mentioned, related classifications result from additional secondary topics that match the content value chain, since each paper already has one main topic classification.

Figure 3. Influence circles (result) – time-based categorization into the content value chain

The main application areas of Linked Data in editorial workflows are editing (23 papers), bundling (18 papers) and consumption (21 papers).
Crawling and leveraging processes can be subsumed under the acquisition process [1], which uses special indexing methods for the entities found and aggregated through queries. These indexing methods lay the foundation for further scientific processing, called content editing.
Scientific editing, which uses algorithmic methods to classify data into separate, semantically enriched lists or ontologies, was treated in
23 papers as main topic. All of these editing methods were part of a recognition process used for video, text or graphic analysis, i.e. media analysis and metadata enrichment.
18 papers had content bundling as main topic. Bundling can be defined as fine-grained representations of resource parts used for the personalization and contextualization of content.
Only 4 papers described distribution, for example to improve the accessibility of information. The main difference from the content bundling and content consumption processes (discussed below) was that the data could be accessed only via APIs and, as in content bundling, was not rendered as visualized graphs of the content. This low number of distribution papers does not permit further conclusions.
21 papers applied Linked Data through frameworks that visualize graph-based relations between links. A common approach among framework developers was to visualize the links of Linked Data for purposes such as content recommendation.

4.2 Longitudinal Perspective
Figure 4 illustrates the results of our analysis from a longitudinal perspective. The visualization scheme corresponds to Figure 3 but additionally lists the number of papers (black boxes) and their related topics (grey boxes) for each year from 2006 to 2014. For example, two content acquisition papers in 2014 would have their main classification in content acquisition and their related topics in the other areas of the value chain.
2006: We found only one paper in 2006 relevant to our research focus. It addressed content acquisition as main topic and editing as secondary topic.
2007: In 2007, one paper treated content bundling as main topic and content acquisition as secondary topic. Two papers addressed content consumption as primary topic, with acquisition, editing and bundling as related topics.
2008: In 2008, we identified one paper addressing content distribution and one addressing content consumption, both referring to content editing.
2009: Three papers were classified, one each as content editing, content bundling and content consumption. The subrelation of the content bundling paper is editing; the subrelations of the content consumption paper refer equally to content bundling and content distribution.
2010: In 2010, the authors identified one paper treating content acquisition, one treating content distribution and one treating content consumption. Two papers treated content editing frameworks. All five papers treated content acquisition as their secondary topic.
2011: In 2011, one paper addressed content acquisition, editing, distribution and content consumption. The relations begin in the content editing class, with a single subrelation each to content acquisition and content consumption. Four papers all have an equal number of subrelations to content acquisition and editing. Additionally, one paper described a framework for content consumption.
2012: In 2012, the authors found one paper addressing content acquisition as main topic and content editing as secondary topic. Two papers demonstrated the opposite pattern, discussing editing as main topic and acquisition as secondary topic. Four papers address content bundling, with subrelations to content acquisition and content editing; one of them also mentions content distribution or content consumption as a tertiary topic. Four papers address content consumption as main topic, all of them with subrelations to content acquisition and one additionally treating editing.

Figure 4. Primary and secondary topics in Linked Data utilization

2013: All papers describing content editing frameworks in 2013 also have acquisition processes as a topic. Of the three papers addressing content editing, one has a subrelation to content bundling, two to content distribution and one to content consumption. Only one paper addresses content bundling, with subrelations to content acquisition and content editing. Four papers address content consumption frameworks as main topic; three of them have a subrelation to content editing and two to content bundling.
2014: In 2014, the classification scheme of the content value chain appears applicable to a large number of papers. We analysed 25 papers and concluded that scientific content editing, which combines vocabularies to prepare Linked Data, is particularly prominent, e.g. the automatic extraction of RDF triples from web sources for content enrichment. Accordingly, 11 papers are classified as content editing, in nearly all cases within
acquisition preprocessing. With 5 and 6 papers as main classification, respectively, content bundling and content consumption show a distribution similar to that of previous years.

5. DISCUSSION, LIMITATIONS & FUTURE WORK
The results show a trend in the utilization of Linked Data technologies towards content editing, content bundling and content consumption. In particular, the increasing number of papers addressing consumption purposes after 2009 is taken as an indicator of the increasing maturity of Linked Data technologies in editorial workflows. We also identified a possible reason for the increasing use of content acquisition processes beginning in 2008: presumably, the data infrastructure had by then reached a reliable degree of integrity. Concerning the main result, the intertwining of research topics points to a seamless integration of the distinct steps of the content value chain. Metadata acquisition systems can minimize the human burden of recording data [12]. Normally, content acquisition is the first step in processing data. We also claim that there is a structural relation between content distribution and acquisition, given that these two processes are technologically intertwined in interlinked data ecosystems. Content distribution could be treated as a main goal of data storage and supply [13]. The authors assume that well-established Linked Data stores are a precondition for content acquisition, enabling further processing such as content bundling, content distribution and content consumption. Given the considerable number of papers in 2014, we conclude that content editing is taking root; however, the consistency of this result should also be assessed against previous years after normalization.
To gain further insights, the authors plan to extend the sample size of the survey in future work. The current sample of 71 papers is simply too small to draw precise conclusions on the state of the art and future direction of Linked Data utilization in editorial workflows. Apart from these limitations, however, the insights generated by the survey indicate that Linked Data technologies are steadily maturing as a support infrastructure for editorial processes. Whether the survey results are valid for application domains not related to editorial tasks is open to discussion.

6. REFERENCES
[1] Pellegrini, Tassilo. “Integrating Linked Data into the Content Value Chain: A Review of News-Related Standards, Methodologies and Licensing Requirements.” In Proceedings of the 8th International Conference on Semantic Systems, 94–102. ACM, 2012. http://dl.acm.org/citation.cfm?id=2362513.
[2] Auer, Sören, Theodore Dalamagas, Helen Parkinson, François Bancilhon, Giorgos Flouris, Dimitris Sacharidis, Peter Buneman, et al. “Diachronic Linked Data: Towards Long-Term Preservation of Structured Interrelated Information.” In Proceedings of the First International Workshop on Open Data, 31–39. WOD ’12. New York, NY, USA: ACM, 2012. http://doi.acm.org/10.1145/2422604.2422610.
[3] Auer, Sören, Jens Lehmann, Axel-Cyrille Ngonga Ngomo, and Amrapali Zaveri. “Introduction to Linked Data and Its Lifecycle on the Web.” In Reasoning Web. Semantic Technologies for Intelligent Data Access, 1–90. Springer, 2013. http://link.springer.com/chapter/10.1007/978-3-642-39784-4_1.
[4] Davis, Mills. “The Business Value of Semantic Technologies.” Presentation and Report, Semantic Technologies for E-Government, 2004. http://project10x.com/bio_downloads/business_value_of_semantic_technologies_2005.pdf, accessed May 9, 2015.
[5] Latif, Atif, Anwar Us Saeed, Patrick Hoefler, Alexander Stocker, and Claudia Wagner. “The Linked Data Value Chain: A Lightweight Model for Business Engineers.” In I-SEMANTICS, 568–75. Citeseer, 2009. http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.181.950&rep=rep1&type=pdf.
[6] Pepe, Alberto, Matthew Mayernik, Christine L. Borgman, and Herbert Van de Sompel. “Technology to Represent Scientific Practice: Data, Life Cycles, and Value Chains.” World Wide Web Internet And Web Information Systems, 2009, 1–22.
[7] Robak, Silva, Bogdan Franczyk, and Marcin Robak. “Research Problems Associated with Big Data Utilization in Logistics and Supply Chains Design and Management,” n.d. https://fedcsis.org/proceedings/2014/pliks/472.pdf.
[8] Solanki, Monika, and Christopher Brewster. “Consuming Linked Data in Supply Chains: Enabling Data Visibility via Linked Pedigrees.” In COLD, 2013. http://windermere.aston.ac.uk/~monika/papers/SolankiCOLD2013.pdf.
[9] Taskar, Benjamin, Eran Segal, and Daphne Koller. “Probabilistic Classification and Clustering in Relational Data.” In International Joint Conference on Artificial Intelligence, 17:870–78. Lawrence Erlbaum Associates Ltd, 2001. http://ai.stanford.edu/users/koller/Papers/Taskar+al:IJCAI01.pdf.
[10] Van Erp, Marieke, Willem Robert van Hage, Laura Hollink, Anthony Jameson, and Raphaël Troncy. “Detection, Representation, and Exploitation of Events in the Semantic Web,” 2013. http://ceur-ws.org/Vol-1123/proceedingsderive2013.pdf.
[11] Villazón-Terrazas, Boris, and Oscar Corcho. “Methodological Guidelines for Publishing Linked Data.” Una Profesión, Un Futuro: Actas de Las XII Jornadas Españolas de Documentación: Málaga 25, no. 26 (2011): 20.
[12] Labrinidis, Alexandros, and H. V. Jagadish. “Challenges and Opportunities with Big Data.” Proc. VLDB Endow. 5, no. 12 (August 2012): 2032–33. doi:10.14778/2367502.2367572.
[13] Curry, Edward, et al. “Big Data. Technical Working Groups White Paper,” 2014. http://bigproject.eu/sites/default/files/BIG_D2_2_2.pdf.
[14] Berners-Lee, Tim. “Linked Open Data.” 2008. http://www.w3.org/2008/Talks/0617-lod-tbl/#%281%29, accessed May 9, 2015.
[15] Porter, Michael. Competitive Advantage. New York: Free Press, 1985.
[16] Archer, Phil, Max Dekkers, Stijn Goedertier, and Nikolaos Loutas. Study on Business Models for Linked Open Government Data (BM4LOGD - SC6DI06692). Services, 2013. http://ec.europa.eu/isa/documents/study-on-business-modelsopen-government_en.pdf, accessed May 10, 2015.