<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>A Framework for Social Semantic Journalism</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Bahareh Rahmanzadeh Heravi</string-name>
          <email>Bahareh.Heravi@deri.org</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jarred McGinnis</string-name>
          <email>jarred@logomachy.org</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Digital Enterprise Research Institute (DERI) National University of Ireland</institution>
          ,
          <addr-line>Galway</addr-line>
          ,
          <country country="IE">Ireland</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Logomachy Ltd</institution>
          ,
          <addr-line>London</addr-line>
          ,
          <country country="UK">UK</country>
        </aff>
      </contrib-group>
      <fpage>13</fpage>
      <lpage>18</lpage>
      <abstract>
        <p>Increasingly, news breaks on social media, where ordinary citizens post images and videos and their own commentary in the form of text. This user-generated content (UGC) is newsworthy information and invaluable for newsrooms. In order to incorporate this data into a news story, the journalist needs to process, compile and verify information on the social web within a very short timespan. This is done mostly manually and is a time-consuming and labour-intensive process for media organisations. This paper proposes the Social Semantic Journalism framework as an assistant to journalists for breaking news production, and as a solution to the above problem.</p>
      </abstract>
      <kwd-group>
        <kwd>Social Semantic Journalism</kwd>
        <kwd>Citizen Journalism</kwd>
        <kwd>User Generated Content</kwd>
        <kwd>Social News</kwd>
        <kwd>Semantic News</kwd>
        <kwd>Social Web</kwd>
        <kwd>Semantic Web</kwd>
        <kwd>Linked Data</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        Consumers of news and information are no longer passive or isolated.
Smart phones, digital cameras, mobile internet and social media platforms have
made us all broadcasters of information. We consume information from traditional
news sources, but also through social media platforms, with one third of adults under 30
getting their news from social media [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. We form communities to inform one
another, we comment, we coordinate, and we disseminate. This ubiquity of new
technologies has made it more likely than ever that an individual or a community, not a
professional journalist, will be the initial source of information for a breaking news
event. This community-sourced data, or “citizen/social journalism”, is a valuable
source of information for news media organisations across the world.
      </p>
      <p>Journalists are already monitoring social media for scoops, details, and images, but
the process is laborious, and provides inconsistent results. In the deadline-driven
world of journalism, the need to process huge volumes of community-sourced data in
order to extract potential news stories is a universal problem. This data, known as
user-generated content (UGC), is mostly unstructured, unfiltered and unverified, and
often lacks contextual information. Traditional approaches to newsgathering are
quickly overwhelmed by the volume and velocity of information being produced.
Extracting stories from UGC goes beyond the simple transcoding of individual
streams; it is also important for news organisations to have richly annotated, analysed
and interconnected content.</p>
      <p>
        Social Semantic Journalism addresses a universal problem experienced by media
organisations: the combination of the vast amount of UGC across social media platforms
and the limited amount of time the journalist has to spare to extract potential news
stories from these mostly unstructured, unfiltered and unverified data. In this
situation, there is evidently a need for solutions that can help source, filter and verify
social media content for media organisations who are now competing with the
continuous flow of free content available 24/7 on the web, while budgets are tight and
deadlines are tighter. Social Semantic Journalism also aims to address the chief obstacle
facing news organisations, the vetting process, since the current manual process of
checking through user-generated content is considered to be overwhelming and
inadequate [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ].
      </p>
      <p>The remainder of this paper is organised as follows. Section 2 introduces Social Semantic
Journalism. Section 3 briefly describes the technologies best suited to constitute a
framework for Social Semantic Journalism. Section 4 concludes the paper with
summary remarks about the potential impact of Social Semantic Journalism.</p>
    </sec>
    <sec id="sec-2">
      <title>Social Semantic Journalism</title>
      <p>
        The user-generated content shared on social media now forms a significant source
of first-hand information and content for breaking news coverage. Every minute 347
new blog posts are created, 74 hours of new video are uploaded to YouTube, 100,000
tweets are sent and Facebook users share 684,478 pieces of content [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. Amongst
these data is valuable information that the professional journalist can use to create
breaking news stories. This huge amount of user-generated content, however, cannot
be processed manually and there is no existing search engine or online tool that can
source, aggregate, filter and verify these fast-paced and voluminous streams of data for
news reportage.
      </p>
      <p>
        Social Semantic Journalism proposes a semantics-based solution that can formalise
unstructured UGC and link it to other semantically enriched data sets in what is termed
the “Linked Data Cloud”, e.g. government datasets or DBpedia/Wikipedia, for
integration, verification and fact-checking purposes. By working with the media industry,
Semantic Web researchers can significantly add to the emerging field of
computational or data journalism by “developing techniques, methods, and user interfaces” that
can “help discover, verify, and even publish new public-interest stories at lower cost”
[
        <xref ref-type="bibr" rid="ref5">5</xref>
        ].
      </p>
      <p>
        Semantic Web technologies provide machine-readable data
structures and facilitate the integration of information from various sources that
are built using the same underlying technologies. The Semantic Web effort is
considered to be in an ideal position to make social web platforms interoperate by providing
standards to support data interchange and interoperation [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. The application of the
Semantic Web to the Social Web, termed the “Social Semantic Web”, has the
potential to create a network of interlinked and semantically enriched user-generated
knowledge bases, bringing together applications and social features of the Social Web
with knowledge representation languages and formats from the Semantic Web [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ].
      </p>
    </sec>
    <sec id="sec-3">
      <title>A Framework for Social Semantic Journalism</title>
      <p>A number of technologies will be required to produce a Social Semantic
Journalism Framework. These technologies would inevitably work together,
becoming the inputs and outputs for each other, creating a process able to meet the challenge
that social media presents to journalists and editors as they try to determine what is newsworthy
in UGC. Figure 2 illustrates the technologies and process to realise Social Semantic
Journalism.</p>
      <p>Content Discovery is the ingestion of the raw content from social media and its enrichment
with semantic metadata, which can be used by the other phases.</p>
      <p>Data Ingestion is the gathering of a representative sample of data from microblog updates
and the users posting them.</p>
      <p>Entity Extraction and Semantic Annotation involves the identification of semantic
entities such as ‘place’, ‘organisation’ and ‘person’ and linking them with relevant
semantic metadata from Linked Open Data.</p>
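      <p>As a rough illustration of the annotation step described above, the following minimal sketch matches words in a post against a small hand-written gazetteer and emits tuples that link the post to Linked Open Data resources. The gazetteer contents, the <monospace>annotate</monospace> function, the example post URI and the <monospace>mentions</monospace> predicate are assumptions made for this illustration only; they are not part of the framework, which would draw on a Linked Data source and a proper entity recogniser rather than a hard-coded dictionary.</p>
      <preformat>
```python
import re

# Hypothetical gazetteer: surface form -> (entity type, Linked Open Data URI).
# A real system would query a Linked Data source instead of a hard-coded dict.
GAZETTEER = {
    "galway": ("place", "http://dbpedia.org/resource/Galway"),
    "bbc": ("organisation", "http://dbpedia.org/resource/BBC"),
}

# Illustrative predicate URI chosen for this sketch only.
MENTIONS = "http://example.org/vocab/mentions"


def annotate(post_uri, text):
    """Return (subject, predicate, object, entity_type) tuples for known entities."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    annotations = []
    seen = set()
    for token in tokens:
        # Record each known entity once, in order of first mention.
        if token in GAZETTEER and token not in seen:
            seen.add(token)
            entity_type, uri = GAZETTEER[token]
            annotations.append((post_uri, MENTIONS, uri, entity_type))
    return annotations


result = annotate("http://example.org/post/1", "Flooding in Galway, says the BBC")
for annotation in result:
    print(annotation)
```
      </preformat>
      <p>Each tuple pairs the post with one linked resource and its entity type, which is the kind of metadata the later phases could consume.</p>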
      <sec id="sec-3-1">
        <p>Event Detection is the identification of events as they happen.</p>
        <p>Noise Detection is the detection and filtering of non-relevant content from streams where
a topic has already been identified.</p>
        <p>Burst Detection is the discovery of bursts, or sudden increases in frequency, of topic-
and/or location-specific microblog posts.</p>
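        <p>One simple way to make the idea of a burst concrete is sketched below, under stated assumptions: posts on a topic are counted per fixed-length time window, and a window is flagged when its count exceeds the mean of the preceding windows by several standard deviations. The function name <monospace>detect_bursts</monospace>, the window history of five and the threshold of three are illustrative choices, not values prescribed by the framework, and other burst detection techniques could be substituted.</p>
        <preformat>
```python
from statistics import mean, pstdev


def detect_bursts(counts, history=5, threshold=3.0):
    """Return indices of windows whose count is unusually high.

    counts: number of posts on a topic per fixed-length time window.
    A window is flagged when its count exceeds the mean of the previous
    `history` windows by more than `threshold` standard deviations.
    """
    bursts = []
    for i in range(history, len(counts)):
        past = counts[i - history:i]
        baseline = mean(past)
        spread = pstdev(past)
        # Use a floor of 1.0 so a perfectly flat history does not flag tiny changes.
        limit = baseline + threshold * max(spread, 1.0)
        if counts[i] > limit:
            bursts.append(i)
    return bursts


# Posts per minute mentioning a hypothetical topic; the spike starts at index 6.
per_minute = [4, 5, 3, 4, 6, 5, 40, 38, 6]
print(detect_bursts(per_minute))
```
        </preformat>
        <p>Only the first window of the spike is flagged here, because the spike itself inflates the baseline for the windows that follow; a production system would need to decide whether that behaviour is desirable.</p>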
        <p>Filtering and Contextualisation uses the metadata derived in the Content Discovery
phase and further refines it, associating related content, putting news
stories within the wider context of the news agenda and world events and starting to
develop a provenance trace.</p>
        <p>Contextualisation discovers background and contextual information for a specific
news story, leveraging the metadata created during the Content Discovery stage.</p>
        <p>Provenance Construction produces a provenance trace and graph to be used in
the Trust Verification stage.</p>
        <p>User and Content Geolocation approximates the relevant location of an
event/tweet by exploiting a combination of explicit GPS coordinates, disambiguation
through semantic annotation and social graph data.</p>
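        <p>The text does not specify how the three location signals are combined. As a hedged sketch, one possible approach is a simple priority order: prefer explicit GPS coordinates, then a place found by semantic annotation, then the most common location among the author's contacts. The function <monospace>estimate_location</monospace> and its parameters are hypothetical names introduced for this illustration.</p>
        <preformat>
```python
from collections import Counter


def estimate_location(gps=None, annotated_places=(), friend_locations=()):
    """Return (location, evidence) using the strongest available signal.

    gps: explicit (latitude, longitude) attached to the post, if any.
    annotated_places: place names found by semantic annotation of the text.
    friend_locations: known locations of accounts in the author's social graph.
    """
    if gps is not None:
        return gps, "gps"
    if annotated_places:
        # Take the first disambiguated place mentioned in the text.
        return annotated_places[0], "annotation"
    if friend_locations:
        # Fall back to the most common location among the author's contacts.
        place, _count = Counter(friend_locations).most_common(1)[0]
        return place, "social_graph"
    return None, "unknown"


print(estimate_location(gps=(53.27, -9.05)))
print(estimate_location(annotated_places=["Galway"]))
print(estimate_location(friend_locations=["Dublin", "Galway", "Galway"]))
```
        </preformat>
        <p>Returning the evidence type alongside the location lets later stages, such as trust verification, weigh an explicit coordinate differently from an inference drawn from the social graph.</p>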
        <p>Community Analysis relies on event, burst and noise detection to isolate users
generating timely and relevant UGC.</p>
        <p>Trust Verification utilises the provenance data and the extracted concepts from
Content Discovery and Filtering and Contextualisation.</p>
        <p>Provenance Analysis provides the analysis, abstraction and summarisation of
provenance information, which would help journalists in identifying eyewitnesses and
assessing the reputation of the source.</p>
        <p>Point-of-View Analysis provides indicators for the perspective or point-of-view of a
piece of content to inform the journalist as to the likely perspective the content takes.</p>
      </sec>
      <sec id="sec-3-2">
        <p>Veracity assesses the truthfulness of the content of a post.</p>
        <p>Publication is concerned with the annotation, publication and archival of produced news stories. This phase feeds back to the Filtering and Contextualisation phase for future historical contextualisation purposes.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Conclusions</title>
      <p>This paper introduced the Social Semantic Journalism framework as a solution that can
help journalists in the process of breaking news production when the initial source is
social media.</p>
      <p>The potential impact of a framework for Social Semantic Journalism includes a
dynamic and flexible alternative to newswire subscriptions that provides high-quality
and timely news and opens up the market to new media aggregators, curators and
commentators, creating new business opportunities and opportunities for media
exploitation and reuse. The novel ability of journalists to exploit large-scale social media
streams effectively will give a voice to citizens and enable journalists to develop richer
and more relevant stories faster.</p>
      <p>There is increasing evidence of a tipping point for technologies such as the Semantic
Web, Linked Data and natural language processing. Non-technology companies in the
news industry sector such as the BBC, the New York Times, Novosti and the Press
Association have begun to make considerable capital investments in the technologies
employed in this project (e.g. linked data, language analytics, etc.). The benefits of
using Linked Data for knowledge sharing, integration and reuse have been validated in
a variety of application domains and contexts from the digital enterprise to healthcare
and green IT. A social semantic journalism framework is an opportunity to
demonstrate the viability of these approaches for integrating social media information from a
community of users into the mainstream news media workflow.</p>
      <p>An immediate consequence of the framework would be the development of a set of
API-driven semantic services and tools to process social media content and data, and with it the
stimulation of demand for high-performance, bandwidth-hungry media applications
and services. The torrent of disparate, contradictory and unstructured social media
content is thereby made accessible, relevant and useful.</p>
    </sec>
    <sec id="sec-5">
      <title>Acknowledgement</title>
      <p>The work presented in this paper has been partially funded by Science Foundation
Ireland (SFI) under Grant Numbers 2/TIDA/I2389 and SFI/08/CE/I1380 (Lion-2).</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Sonderman</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          <article-title>One-third of adults under 30 get news on social networks now</article-title>
          . http://www.poynter.org/latest-news/mediawire/189776/one-third-of-adults-under-30-getnews-on-social-networks-now/
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Rosen</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          <year>2008</year>
          . Definition of Citizen Journalism [Online]. Available from: http://www.youtube.com/watch?v=QcYSmRZuep4.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3. DOMO.
          <article-title>How Much Data is Created Every Minute?</article-title>
          Available from: http://www.domo.com/blog/2012/06/how-much-data-is-created-every-minute/
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Hussain</surname>
            ,
            <given-names>M. M.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Howard</surname>
            ,
            <given-names>P. N.</given-names>
          </string-name>
          <year>2010</year>
          .
          <article-title>Opening Closed Regimes: Civil Society, Information Infrastructure, and Political Islam</article-title>
          .
          <source>In Annual meeting of the American Political Science Association.</source>
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Cohen</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hamilton</surname>
            ,
            <given-names>J.T.</given-names>
          </string-name>
          &amp;
          <string-name>
            <surname>Turner</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <year>2011</year>
          .
          <article-title>Computational journalism</article-title>
          .
          <source>Communications of the ACM</source>
          ,
          <volume>54</volume>
          (
          <issue>10</issue>
          ), p.
          <fpage>66</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Berrueta</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Brickley</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Decker</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fernández</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Görn</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Harth</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Heath</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Idehen</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kjernsmo</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Miles</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Passant</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Polleres</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Polo</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          &amp;
          <string-name>
            <surname>Sintek</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          (
          <year>2007</year>
          ).
          <article-title>SIOC Core Ontology Specification (W3C Member Submission)</article-title>
          .
          <source>W3C.</source>
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Tramp</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Frischmuth</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ermilov</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shekarpour</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          &amp;
          <string-name>
            <surname>Auer</surname>
            ,
            <given-names>Sö.</given-names>
          </string-name>
          (
          <year>2012</year>
          ).
          <article-title>An Architecture of a Distributed Semantic Social Network</article-title>
          .
          <source>Semantic Web Journal, Special Issue on The Personal and Social Semantic Web</source>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>8. http://dev.iptc.org/rNews.</mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>McGinnis</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wilton</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Harman</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>O'Donovan</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          (
          <year>2012</year>
          ) http://data.press.net/ontology/.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Breslin</surname>
            ,
            <given-names>J.G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Passant</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Decker</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          <source>The Social Semantic Web</source>
          . Springer, ISBN 9783642011719, 3 October
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Heravi</surname>
            ,
            <given-names>B. R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Boran</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Breslin</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          (
          <year>2012</year>
          , May).
          <source>Towards Social Semantic Journalism. Workshop on the Potential of Social Media Tools and Data for Journalism in News and Media Industry at the Sixth International AAAI Conference on Weblogs and Social Media.</source>
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Avram</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          (
          <year>2012</year>
          ).
          <article-title>Gartner's Software Hype Cycles for 2012</article-title>
          . http://www.infoq.com/news/2012/08/Gartner-Hype-Cycle-2012.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>