<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>IMGpedia: A Proposal to Enrich DBpedia with Image Meta-Data</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Benjamin Bustos</string-name>
          <email>bebustos@dcc.uchile.cl</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Aidan Hogan</string-name>
          <email>ahogan@dcc.uchile.cl</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Center for Semantic Web Research, Department of Computer Science, University of Chile</institution>
        </aff>
      </contrib-group>
      <abstract>
        <p>We introduce IMGpedia: a research proposal aiming to bridge structured knowledge-bases and multimedia content. Our concrete plan is to enrich DBpedia data with further metadata about images from Wikipedia, including content-based visual descriptors. Our concrete goal is to create a unified querying and inference system that allows for interrogating the DBpedia knowledge-base and the visual content of Wikipedia's images together. Our broader ambition is to explore methods by which multimedia data can be made a first-class citizen of the Semantic Web.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        DBpedia [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] is an ongoing effort by the Linked Data community to extract
structured content from Wikipedia and represent it in RDF. The main goal is
to enable users to query the content of Wikipedia as a whole, getting direct
answers automatically aggregated from multiple articles. The most recent
version of DBpedia contains billions of facts extracted from 125 language versions
of Wikipedia, with links to and from dozens of external datasets. Over the
past seven years, it has become the central dataset of the Linked Open Data
community [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ].
      </p>
      <p>However, DBpedia mainly focuses on extracting information from Wikipedia's
info-boxes: attribute–value panes that appear on the right-hand side of articles.
As such, aside from adding links, DBpedia ignores the images appearing in the
body of the article for a given entity as well as the structured data available in
image pages: no meta-data are extracted for images. Like many initiatives in the
Semantic Web (though not all: see, e.g., http://www.w3.org/2005/Incubator/mmsem/),
DBpedia links to but otherwise disregards multimedia content.</p>
      <p>Our proposal is thus to extract and associate meta-data from the images
embedded in Wikipedia and link the resulting corpus with the DBpedia dataset.
This dataset, which we call IMGpedia, would consider all images in an
article, all meta-data associated with the image available from Wikimedia (author,
date, size, etc.) and visual descriptors that capture the content of the image
itself.</p>
      <p>We are motivated by the idea of creating a corpus that allows for querying,
in unison, both the structured/semantic meta-data of DBpedia and the visual
content extracted from images; e.g., "give me European cathedrals that have an
image visually similar to one of the external images for Cusco Cathedral in Peru".
Likewise, we foresee the possibility of inferring new links from this dataset, e.g.,
inferring that Saddam Hussein and Donald Rumsfeld have met based on being
associated with the same image (in which they are co-depicted). The resulting
corpus may also serve as an interesting experimental dataset for the
image-processing community, where the structured data associated with images may
serve as a ground truth.</p>
    </sec>
    <sec id="sec-2">
      <title>Images and Visual Descriptors</title>
      <p>Before describing IMGpedia, we need to introduce some basic concepts about
how images are encoded and what visual descriptors are. An image is a matrix of
so-called pixels (picture elements). A pixel contains information about its color,
which can be displayed, for example, on a computer monitor. There are several
ways to encode the color information of a pixel, depending on the choice of
color space. Common color spaces are RGB (red-green-blue, used by computer
monitors) and CMYK (cyan-magenta-yellow-black, used by printers), where colors
are represented as tuples of numbers; for example, an RGB color is represented
by a 3-tuple. There are several ways to compress the image encoding, mainly
lossy compression methods (e.g., JPEG format) and lossless methods (e.g., PNG
format).</p>
      <p>A visual descriptor is a way of characterizing an image based on its content.
This can be done considering the whole image (global descriptor) or regions of
interest detected in the image (local descriptors). For this work, we will initially
focus on global descriptors since they can be computed more efficiently than
local descriptors, and similarity between them is likewise more efficient to
compute.</p>
      <p>
        Visual descriptors can be defined in several ways; e.g., based on the colours,
texture and/or shape of the image. They do not include any semantic information
about what appears in the image, which is why they are also called "low-level
features". For instance, a simple colour descriptor is the colour histogram [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], which
captures the distribution of colour in the image. We note that visual descriptors
are usually vectors of high dimensionality (tens to hundreds of real values).
      </p>
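      <p>As a minimal sketch of such a global descriptor, the following Python snippet computes a normalised colour histogram. The function name colour_histogram, the default of 4 bins per channel, and the representation of an image as a flat list of (r, g, b) tuples are illustrative assumptions; this is not the MPEG-7 colour descriptor itself.</p>
      <preformat>
```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]  # (r, g, b), each channel in 0..255


def colour_histogram(pixels: List[Pixel], bins_per_channel: int = 4) -> List[float]:
    """Return a normalised colour histogram with bins_per_channel**3 entries."""
    if not pixels:
        raise ValueError("image has no pixels")
    if 256 % bins_per_channel != 0:
        # Assumption of this sketch: bins must evenly divide the 0..255 range.
        raise ValueError("bins_per_channel must divide 256")
    width = 256 // bins_per_channel  # size of each bin along one channel
    hist = [0.0] * (bins_per_channel ** 3)
    for r, g, b in pixels:
        # Quantise each channel, then flatten the 3-D bin index to 1-D.
        ri, gi, bi = r // width, g // width, b // width
        hist[(ri * bins_per_channel + gi) * bins_per_channel + bi] += 1.0
    total = float(len(pixels))
    return [count / total for count in hist]
```
      </preformat>
      <p>With the default setting the descriptor is a vector of 64 real values that sum to one, which matches the "tens to hundreds" of dimensions mentioned above.</p>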
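      <p>Visual descriptors allow us to implement, e.g., content-based similarity search.
A similarity query over an image data set returns the images most similar,
according to their content, to a given one (the query image). This is also known as
query-by-example. Formally, let <inline-formula><tex-math>$U$</tex-math></inline-formula> be the universe of all images, let
<inline-formula><tex-math>$S \subseteq U$</tex-math></inline-formula> be the image data set, and let
<inline-formula><tex-math>$\delta : U \times U \to \mathbb{R}^{+}$</tex-math></inline-formula> be a function (the distance) that
returns how dissimilar two images are. There are two basic types of similarity
queries: (1) Range query: given the query image <inline-formula><tex-math>$q \in U$</tex-math></inline-formula> and a tolerance radius
<inline-formula><tex-math>$r \in \mathbb{R}^{+}$</tex-math></inline-formula>, return all images from <inline-formula><tex-math>$S$</tex-math></inline-formula> that are within distance <inline-formula><tex-math>$r$</tex-math></inline-formula> of <inline-formula><tex-math>$q$</tex-math></inline-formula>; (2) Top-k
query: return the <inline-formula><tex-math>$k$</tex-math></inline-formula> closest objects to <inline-formula><tex-math>$q$</tex-math></inline-formula>. If <inline-formula><tex-math>$S$</tex-math></inline-formula> is formed by all visual descriptors
(high-dimensional vectors) extracted from the images in the data set, if <inline-formula><tex-math>$q$</tex-math></inline-formula> is
the visual descriptor of the query image, and if <inline-formula><tex-math>$\delta$</tex-math></inline-formula> is any metric function (e.g., the
Euclidean distance), it is relatively straightforward to implement content-based
range and top-k queries over <inline-formula><tex-math>$S$</tex-math></inline-formula>.</p>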
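      <p>The two query types can be sketched as linear scans over the descriptors. This is only an illustration under stated assumptions: the data set is a dictionary from hypothetical image identifiers to descriptor vectors, the distance is Euclidean, and the function names range_query and top_k_query are our own. No index structure is used.</p>
      <preformat>
```python
import heapq
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]
Hit = Tuple[str, float]  # (image id, distance to the query)


def euclidean(a: Vector, b: Vector) -> float:
    """Euclidean distance between two descriptors of equal dimensionality."""
    return math.dist(a, b)


def range_query(dataset: Dict[str, Vector], q: Vector, r: float) -> List[Hit]:
    """Return all images within distance r of q, closest first."""
    hits = [(img, euclidean(vec, q)) for img, vec in dataset.items()]
    return sorted([(img, d) for img, d in hits if r >= d], key=lambda h: h[1])


def top_k_query(dataset: Dict[str, Vector], q: Vector, k: int) -> List[Hit]:
    """Return the k images closest to q, closest first."""
    hits = ((img, euclidean(vec, q)) for img, vec in dataset.items())
    return heapq.nsmallest(k, hits, key=lambda h: h[1])
```
      </preformat>
      <p>Both functions compare the query against every descriptor, so their cost grows linearly with the size of the data set; a metric or multidimensional index would be needed to avoid the full scan.</p>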
    </sec>
    <sec id="sec-3">
      <title>IMGpedia Dataset</title>
      <p>Our vision of IMGpedia is an enhanced version of DBpedia with image
entities. An image entity contains metadata (e.g., title, subject, source, format,
description, date, size, location, etc.) and content-based descriptors (e.g., colour
descriptor) of the image. Image entities can be linked with other entities (not
necessarily images).</p>
      <p>For creating the IMGpedia dataset itself, we propose the following
procedure:</p>
      <list list-type="bullet">
        <list-item>
          <p>Locate and download images/image-pages from Wikimedia.</p>
        </list-item>
        <list-item>
          <p>Extract meta-data from the image page, including its size, author, licence,
etc. Annotate images with tags computed from their (possibly many)
captions [<xref ref-type="bibr" rid="ref4">4</xref>].</p>
        </list-item>
        <list-item>
          <p>Compute the visual descriptors for the images. For this, we can use global
visual descriptors like colour and edge [<xref ref-type="bibr" rid="ref2">2</xref>], following the MPEG-7 standard [<xref ref-type="bibr" rid="ref3">3</xref>].</p>
        </list-item>
        <list-item>
          <p>Create the image entities using the extracted metadata and content-based
data.</p>
        </list-item>
        <list-item>
          <p>Represent and publish the IMGpedia dataset as Linked Data.</p>
        </list-item>
      </list>
    </sec>
    <sec id="sec-4">
      <title>Querying IMGpedia</title>
      <p>Our main research goal is to investigate methods by which semantic data (in
this case DBpedia) and multimedia data (in this case describing Wikipedia
images) can be combined such that they can be queried in a holistic manner. In
the context of IMGpedia, our approach is divided into three main parts:
materialising links between image resources, extending SPARQL to execute
content-based analysis at runtime, and inferring new links between "primary entities"
based on image data.</p>
      <p><italic>Materialising relations between images using content-based descriptors.</italic>
Low-level descriptors do not contain any semantic information about the original
image, making them hard for users to leverage in queries. This problem is known
as the semantic gap [<xref ref-type="bibr" rid="ref6">6</xref>]. However, high-level relations among image entities can
be computed from visual descriptors and similarity queries. For example, the
relation near-copy can be defined as holding between two different images whose
distance is less than some threshold. By using range queries, it would be easy to
find all pairs of near-copies among the images. Other relevant relations that can
be considered are alt-size, contains and similar. These could also be materialised
as triples and added to the structured knowledge-base, with appropriate inference
(e.g., for symmetry, reflexivity or subsumption of relations), allowing users to
specify SPARQL queries such as:</p>
      <preformat>SELECT ?usPolitician WHERE {
  db:Saddam_Hussein foaf:depiction ?img1 .
  ?usPolitician dbo:party db:Republican_Party_(US) ;
                foaf:depiction ?img2 .
  ?img1 i:nearCopy ?img2 .
}</preformat>
      <p><italic>Extending SPARQL with functions for content-based image search.</italic> Not all
content-based user requirements can be anticipated in the form of discrete relations.
Hence we propose to extend SPARQL to include content-based analysis features.
More specifically, we propose to use extensible functions in SPARQL and custom
datatypes to enable queries that combine querying of semantic content and image
content. Taking the introductory example, let's say that the user wishes to find
cathedrals in Europe with similar images to external images of Cusco Cathedral
in Peru:</p>
      <preformat>SELECT ?eurCathedral ?sim WHERE {
  db:Cusco_Cathedral foaf:depiction ?img1 .
  FILTER(i:colorRatio(?img1, i:rgb(40,100,150), i:rgb(170,200,255)) &gt; 0.2)
  ?eurCathedral rdf:type dbo:ReligiousBuilding ;
                dbo:location [ dcterms:subject dbc:Countries_in_Europe ] ;
                foaf:depiction ?img2 .
  BIND(i:sim(?img1, ?img2) AS ?sim) FILTER(?sim &gt; 0.7)
} ORDER BY ?sim</preformat>
      <p>The first FILTER uses extended functions to only consider images that have
more than 20% of their pixels falling within the cuboid of colours bounded by the
two RGB points (looking for blue sky). The subsequent BIND and FILTER allow
the images from European buildings to be filtered and ordered by similarity.</p>
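      <p>To make the semantics of the first filter concrete, the following sketch shows one way a function like i:colorRatio could be evaluated over raw pixels. The Python name colour_ratio, the inclusive bounds, and the flat pixel-list input are assumptions made for illustration; the proposal does not fix an implementation.</p>
      <preformat>
```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]  # (r, g, b), each channel in 0..255


def colour_ratio(pixels: List[Pixel], low: Pixel, high: Pixel) -> float:
    """Fraction of pixels inside the axis-aligned RGB cuboid [low, high]."""
    if not pixels:
        return 0.0
    inside = sum(
        1
        for pixel in pixels
        # A pixel is inside when every channel lies between the two bounds.
        if all(hi >= c >= lo for c, lo, hi in zip(pixel, low, high))
    )
    return inside / len(pixels)
```
      </preformat>
      <p>An image would pass the filter in the query above when colour_ratio(pixels, (40, 100, 150), (170, 200, 255)) exceeds 0.2.</p>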
      <p>A major challenge here is balancing expressivity and efficiency. In the above
case, given a reasonable query plan, the first filter can be applied over the six
images appearing in the Cusco Cathedral article, but then all images of all
religious buildings in Europe need to be compared with the images that pass
the first step. In order to improve the performance of queries, we propose to
investigate the use of image indexing techniques that allow for such filters to be
executed as a lookup, rather than as a post-filter, which should lead to more options
for query planning. For example, in the query above, a more efficient query plan
may try to bind values for ?img2 using a similarity range query (over values
bound for ?img1), allowing for a join to be computed with the knowledge-base
rather than applying a brute-force similarity filter over bindings produced by
the knowledge-base for ?img2.</p>
      <p>We see this as being one of the deepest technical challenges posed by the work:
creating cost models and query plans that combine indexes over the
knowledge-base and multimedia content appears to be a challenging but general problem.</p>
      <p><italic>Content-based knowledge discovery.</italic> A more speculative idea is to infer
new knowledge about the data using the image entities and their relations.
For example, say that two DBpedia resources are associated with the same
(near-copy of an) image. If both resources are of type dbo:Person, the relation
hasMet could be inferred. If one resource is a dbo:Person and the other is
a dbo:Place, the relation hasVisited could be inferred. Such inferences could
be axiomatised as domain-specific rules. Of course, the resulting inferences may
not always be crisp conclusions, but may be associated with a confidence value.</p>
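      <p>The two example rules can be sketched as follows. This is a hedged illustration only: the function infer_links, the dictionary inputs, the i: prefix on the inferred relations, and the ex: identifiers in the usage note are all hypothetical, near-copies are assumed to have already been collapsed into shared image identifiers, and no confidence values are computed.</p>
      <preformat>
```python
from itertools import combinations
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


def infer_links(depictions: Dict[str, Set[str]], types: Dict[str, str]) -> List[Triple]:
    """Apply two hand-written rules to entities that share a depicting image."""
    inferred: List[Triple] = []
    for a, b in combinations(sorted(depictions), 2):
        if not depictions[a].intersection(depictions[b]):
            continue  # no shared image, so no rule fires
        kinds = (types.get(a), types.get(b))
        if kinds == ("dbo:Person", "dbo:Person"):
            inferred.append((a, "i:hasMet", b))
        elif kinds == ("dbo:Person", "dbo:Place"):
            inferred.append((a, "i:hasVisited", b))
        elif kinds == ("dbo:Place", "dbo:Person"):
            inferred.append((b, "i:hasVisited", a))
    return inferred
```
      </preformat>
      <p>For instance, if ex:PersonA and ex:PersonB share one image and ex:PersonB and ex:PlaceC share another, the sketch yields one i:hasMet triple and one i:hasVisited triple.</p>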
    </sec>
    <sec id="sec-5">
      <title>Conclusions</title>
      <p>In this short paper, we have introduced and motivated IMGpedia: a proposal
to enrich DBpedia with meta-data extracted from Wikipedia images. We view
IMGpedia as a concrete use-case through which to investigate the challenges
and opportunities of combining semantic knowledge-bases with multimedia
content.</p>
      <p>Acknowledgements This work was supported by the Millennium Nucleus Center
for Semantic Web Research, Grant № NC120004, and Fondecyt, Grant № 11140900.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <given-names>J.</given-names>
            <surname>Lehmann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Isele</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Jakob</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Jentzsch</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Kontokostas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. N.</given-names>
            <surname>Mendes</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Hellmann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Morsey</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>van Kleef</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Auer</surname>
          </string-name>
          , and
          <string-name>
            <given-names>C.</given-names>
            <surname>Bizer</surname>
          </string-name>
          .
          <article-title>DBpedia - A Large-scale, Multilingual Knowledge Base Extracted from Wikipedia</article-title>
          .
          <source>Semantic Web Journal</source>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <given-names>B. S.</given-names>
            <surname>Manjunath</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.-R.</given-names>
            <surname>Ohm</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V. V.</given-names>
            <surname>Vasudevan</surname>
          </string-name>
          , and
          <string-name>
            <given-names>A.</given-names>
            <surname>Yamada</surname>
          </string-name>
          .
          <article-title>Color and texture descriptors</article-title>
          .
          <source>IEEE Transactions on Circuits and Systems for Video Technology</source>
          ,
          <volume>11</volume>
          (
          <issue>6</issue>
          ):
          <fpage>703</fpage>
          –
          <lpage>715</lpage>
          ,
          <year>2001</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <source>MPEG-7 Overview</source>
          . URL: http://mpeg.chiariglione.org/standards/mpeg-7/mpeg7.htm (accessed: 2015-01-29),
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <given-names>S.</given-names>
            <surname>Noah</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Ali</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Alhadi</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J.</given-names>
            <surname>Kassim</surname>
          </string-name>
          .
          <article-title>Going Beyond the Surrounding Text to Semantically Annotate and Search Digital Images</article-title>
          .
          In
          <source>Intelligent Information and Database Systems</source>
          , pages
          <fpage>169</fpage>
          –
          <lpage>179</lpage>
          .
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <given-names>M.</given-names>
            <surname>Schmachtenberg</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Bizer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Jentzsch</surname>
          </string-name>
          , and
          <string-name>
            <given-names>R.</given-names>
            <surname>Cyganiak</surname>
          </string-name>
          .
          <source>Linking Open Data Cloud Diagram</source>
          <year>2014</year>
          . http://lod-cloud.net/; last accessed 2015/01/30.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <given-names>A. W. M.</given-names>
            <surname>Smeulders</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Worring</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Santini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Gupta</surname>
          </string-name>
          , and
          <string-name>
            <given-names>R.</given-names>
            <surname>Jain</surname>
          </string-name>
          .
          <article-title>Content-based image retrieval at the end of the early years</article-title>
          .
          <source>IEEE Transactions on Pattern Analysis and Machine Intelligence</source>
          ,
          <volume>22</volume>
          (
          <issue>12</issue>
          ):
          <fpage>1349</fpage>
          –
          <lpage>1380</lpage>
          ,
          <year>2000</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>