<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>IMGpedia: Enriching the Web of Data with Image Content Analysis</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Sebastian Ferrada</string-name>
          <email>sferrada@dcc.uchile.cl</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Benjamin Bustos</string-name>
          <email>bebustos@dcc.uchile.cl</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Aidan Hogan</string-name>
          <email>ahogan@dcc.uchile.cl</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Center for Semantic Web Research Department of Computer Science University of Chile</institution>
        </aff>
      </contrib-group>
      <abstract>
<p>Linked Data rarely takes into account multimedia content, which forms a central part of the Web. To explore the combination of Linked Data and multimedia, we are developing IMGpedia: we compute content-based descriptors for images used in Wikipedia articles and subsequently propose to link these descriptions with legacy encyclopaedic knowledge-bases such as DBpedia and Wikidata. On top of this extended knowledge-base, our goal is to consider a unified query system that accesses both the encyclopaedic data and the image data. We could also consider enhancing the encyclopaedic knowledge based on rules applied to co-occurring entities in images, or content-based analysis, for example. Abstracting away from IMGpedia, we explore generic methods by which the content of images on the Web can be described in a standard way and can be considered as first-class citizens on the Web of Data, allowing, for example, for combining structured queries with image similarity search. This short paper thus describes ongoing work on IMGpedia, with focus on image descriptors.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
<title>Introduction</title>
      <p>
        Wikipedia centres around the curation of human-readable encyclopaedia articles;
at the time of writing, it contains about 38 million articles in almost 200
languages. In terms of structured content, the DBpedia project [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] systematically
extracts meta-data from Wikipedia articles and presents it as RDF. Another
initiative, called Wikidata [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] (organised by the Wikimedia Foundation) allows users
to directly create and curate machine-readable encyclopaedic entries. In terms of
multimedia content, Wikimedia Commons (https://commons.wikimedia.org) is a collection of 30 million
freely-usable media files (image, audio, video) that are used within Wikipedia articles.
Recently, DBpedia Commons [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] was released: a knowledge-base that takes the
meta-data from each multimedia file's description page (such as the author, size
and licensing of the media) and publishes it as RDF. However, none of the existing
knowledge-bases in this space describe aspects of the multimedia content itself.
      </p>
      <p>
        Because of this we are developing IMGpedia [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], a new knowledge-base that
uses meta-data from DBpedia Commons and enriches it with visual descriptors
of all images from the Wikimedia Commons dataset. The goal is to allow users
to perform "visuo-semantic queries" over IMGpedia: e.g., retrieve a painting by
every European painter from the 17th century, or, given a photo of yourself, obtain
the top-k most similar portraits painted by American artists. We could also infer
new knowledge about DBpedia entities given the relations between the images;
e.g., if two entities of type dbo:Person share an image (or their images are very similar), we
could infer, with some given likelihood, that they have met. Our general goal is to
investigate methods by which descriptors of the content of images, not just their
meta-data, can be integrated with the Web of Data in a standard way.
      </p>
      <p>
        In previous work [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], we proposed the general goals of IMGpedia. This paper is
a brief progress update: currently we are downloading the images from Wikimedia Commons,
computing the visual descriptors for each image, and creating the image entities that
will be linked with existing encyclopaedic knowledge-bases. We thus propose a set
of existing multimedia algorithms for extracting descriptors from images, describe
their applications, and provide reference implementations in various programming
languages such that they can be used as a standard way to (begin to) describe images
on the Web of Data. We provide some initial results on computing descriptors for
the images of Wikimedia Commons, describe some of the technical engineering
challenges faced while realising IMGpedia, and outline future plans.
      </p>
    </sec>
    <sec id="sec-2">
      <title>Reference Implementations for Visual Descriptors</title>
      <p>Aside from simply providing the URL of an image, we wish to compute visual
descriptors for each image. These descriptors are vectors produced by performing
different operations over the matrix of pixels in order to obtain certain features, such
as color distribution, shapes and textures. These vectors are later stored as part of
a metric space in which similarity search can be performed. We consider four
different image descriptors, which we have implemented in Java, Python and
C++, and which we have verified to give equivalent results across the different languages over
a dataset of 2,800 Flickr images. The first three descriptors require a preprocessing
step: converting the image to greyscale using the intensity Y = 0.299 R +
0.587 G + 0.114 B, where R, G and B are the values of each color channel (red,
green and blue, respectively). The four descriptors we compute are:
Gray Histogram Descriptor: images are partitioned into a fixed number of blocks.</p>
      <p>
        For each block, a histogram of gray intensity is computed; typically the intensity
takes 8-bit values. Finally, the descriptor is the concatenation of all histograms.
Oriented Gradients Descriptor: the image gradient is calculated via convolution
with Sobel kernels [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. A threshold is then applied to the gradient, and for those
pixels that exceed it, the orientation of the gradient is calculated. Finally, a
histogram of the orientations is computed and used as the descriptor.
Edge Histogram Descriptor: for each 2×2 pixel block, the dominant edge
orientation is computed (horizontal, vertical, either diagonal, or none), where the
descriptor is a histogram of these orientations [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ].
      </p>
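As a concrete illustration, the greyscale preprocessing and the Gray Histogram Descriptor can be sketched in Python as follows; the 2×2 block grid and 256-bin histograms are illustrative parameter choices, not necessarily those of the reference implementations:

```python
import numpy as np

def to_greyscale(img: np.ndarray) -> np.ndarray:
    """Greyscale preprocessing: Y = 0.299 R + 0.587 G + 0.114 B."""
    return (0.299 * img[..., 0] + 0.587 * img[..., 1]
            + 0.114 * img[..., 2]).astype(np.uint8)

def gray_histogram_descriptor(grey: np.ndarray, blocks=(2, 2), bins=256) -> np.ndarray:
    """Sketch of the Gray Histogram Descriptor: split the greyscale image
    into a fixed grid of blocks, histogram each block's 8-bit intensities,
    and concatenate the per-block histograms."""
    h, w = grey.shape
    bh, bw = h // blocks[0], w // blocks[1]
    hists = []
    for i in range(blocks[0]):
        for j in range(blocks[1]):
            block = grey[i * bh:(i + 1) * bh, j * bw:(j + 1) * bw]
            hist, _ = np.histogram(block, bins=bins, range=(0, 256))
            hists.append(hist)
    return np.concatenate(hists)

desc = gray_histogram_descriptor(to_greyscale(np.zeros((64, 64, 3), dtype=np.uint8)))
print(desc.shape)  # (1024,): 2x2 blocks x 256 bins each
```

The descriptor length is fixed by the grid and bin count, which is what makes these vectors directly comparable in a metric space.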
      <p>
        DeCAF7: a Caffe neural network pre-trained with the ImageNet dataset [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] is
used. To obtain the vector, each image is resized and ten overlapping patches
are extracted; each patch is given as input to the neural network and the last
hidden layer of the model is extracted as a descriptor, so the final
vector is the average of the layers over all the patches [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ].
      </p>
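The patch-averaging scheme can be sketched as follows; the 227-pixel crop size and the dummy stand-in for the Caffe forward pass are assumptions for illustration, not details taken from the reference implementations:

```python
import numpy as np

def ten_patches(img: np.ndarray, size: int = 227) -> list:
    """Sketch of ten-patch oversampling as commonly used with
    Caffe/AlexNet-style networks: four corner crops plus the centre crop,
    each also mirrored horizontally (crop size 227 is an assumption)."""
    h, w = img.shape[:2]
    ys = [0, 0, h - size, h - size, (h - size) // 2]
    xs = [0, w - size, 0, w - size, (w - size) // 2]
    patches = [img[y:y + size, x:x + size] for y, x in zip(ys, xs)]
    patches += [p[:, ::-1] for p in patches]  # horizontal mirrors
    return patches

def decaf7_descriptor(img, net_forward):
    """The final descriptor is the mean of the network activations over the
    ten patches; `net_forward` stands in for the Caffe forward pass."""
    return np.mean([net_forward(p) for p in ten_patches(img)], axis=0)

# With a dummy "network" that just averages a patch, the descriptor is scalar:
img = np.ones((256, 256, 3))
print(decaf7_descriptor(img, lambda p: p.mean()))  # 1.0
```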
      <p>The implementations, with dependency and compilation documentation, can be
found at https://github.com/scferrada/imgpedia under the GNU GPL license.</p>
    </sec>
    <sec id="sec-3">
      <title>Performance Evaluation of Descriptors</title>
      <p>In order to build the knowledge-base, some preliminary steps must be completed. First of all,
it is necessary to obtain a local copy of the Wikimedia Commons image dataset,
also so that we can keep IMGpedia updated in the future. To this end, Wikimedia provides
an rsync server for downloading the 21 TB of images in the dataset.
Currently we have downloaded about 19 TB, at an average speed of 550 GB per
day, corresponding to 14 million images out of a total of 16 million.
Each image is stored locally, along with its four descriptors.</p>
      <p>We benchmarked the computation of the descriptors in order to estimate how long the full
extraction will take. Experiments were performed on a machine with Debian 4.1.1,
a 2.2 GHz (24-core) Intel Xeon processor, and 120 GB of RAM. We computed the
descriptors for a sub-folder of the Wikimedia Commons dataset containing 57,377
images. The process comprises three steps: read the image and load it into
memory, extract the descriptor itself, and save the vector to disk. Two experiments
were performed: one using a single execution thread, and the other using 28 threads,
where each thread handles one image at a time. The neural network for DeCAF7
was used in CPU mode (rather than GPU mode).</p>
      <p>Results are shown in Table 1, where we see that DeCAF7 is the most expensive
descriptor to compute. While we see an improvement of one order of magnitude in
computation time for the first three descriptors, DeCAF7 does not benefit: the Caffe
implementation of neural networks already uses multithreading for its computation
on a single image, so assigning different images to different threads adds a slight
overhead since multiple cores are already being exploited. On the present hardware,
we estimate it would take 1.8 years to run DeCAF7 over all 16 million images;
hence, we will have to select a subset of images on which to run this descriptor.</p>
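The estimate can be sanity-checked with a back-of-the-envelope calculation; the per-image cost of roughly 3.5 seconds is an assumed value implied by the 1.8-year figure, not one reported directly in the text:

```python
# Back-of-envelope check of the 1.8-year estimate for DeCAF7.
seconds_per_image = 3.5           # assumed per-image DeCAF7 cost (CPU mode)
total_images = 16_000_000         # Wikimedia Commons images
seconds_per_year = 3600 * 24 * 365

years = total_images * seconds_per_image / seconds_per_year
print(f"{years:.1f} years")       # ~1.8 years
```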
    </sec>
    <sec id="sec-4">
      <title>Next Steps</title>
      <p>Linking IMGpedia with DBpedia, Wikidata, etc. To complete our knowledge-base
and be able to infer new relations, we must combine these visual descriptors
with the semantic entities of DBpedia and Wikidata. For this, we must extract
the links between the articles and the images that they use, since these links are not
present in DBpedia Commons and are only present in DBpedia for some images. This is an
ongoing effort in which we are researching the possible options to perform this task:
we hope to exploit the Wikimedia dumps, either to derive this information directly,
or to parse it from the articles themselves. Subsequent to this, we wish to investigate
using descriptor-based user-defined functions in SPARQL to enable visuo-semantic
queries that combine, e.g., image similarity with knowledge-base queries.
Efficient algorithms for finding relations between similar images. To facilitate
querying IMGpedia, we propose to compute static relationships between images; e.g., if
two images have a descriptor distance that tends to 0, then we can relate those
images as nearCopy of each other. Other distance-based relations, such as similar, can be
computed based on other fixed thresholds. The brute-force approach for
finding all pairs of images within a given distance requires a quadratic number of
comparisons. Thus, the challenge is to do this efficiently, where we propose to explore: building an index
structure for similarity search over the dataset; using approximate similarity search
algorithms; using self-similarity join techniques; and so forth.</p>
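As a baseline for comparison, the quadratic brute-force approach can be sketched as follows; the Euclidean distance and the toy threshold are illustrative choices:

```python
import numpy as np

def pairs_within(descriptors: np.ndarray, threshold: float):
    """Brute-force sketch: report all pairs of images whose descriptor
    (Euclidean) distance is at most the threshold. This is O(n^2)
    comparisons, which is exactly what index structures, approximate
    search and self-similarity joins aim to avoid."""
    n = len(descriptors)
    out = []
    for i in range(n):
        for j in range(i + 1, n):
            if np.linalg.norm(descriptors[i] - descriptors[j]) <= threshold:
                out.append((i, j))
    return out

vecs = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0]])
print(pairs_within(vecs, 0.5))  # [(0, 1)]: a nearCopy-style pair
```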
      <p>
        Labelling images for multimedia applications. By linking IMGpedia to existing
knowledge-bases, we hope to label images with categories, types, entities, etc. Thus
IMGpedia could serve as a useful resource for the multimedia community; e.g.,
since we are using the Caffe framework for neural networks to compute DeCAF7 [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], we
could train a network using the DBpedia Commons categories extracted for each
image as labels, allowing us to automatically classify new images in the dataset or
to benchmark performance against other classifiers. We could also label images with
the specific entities they contain using DBpedia/Wikidata, based on the article(s)
in which an image appears: we would need to tread carefully since, e.g., an image used in
an article about a person may not depict that person, but we could consider some
heuristics, such as the location of the image, and some further content analysis.
      </p>
    </sec>
    <sec id="sec-5">
      <title>Conclusions</title>
      <p>In this short paper, we have given an update on our ongoing work on IMGpedia. We
benchmarked four image descriptors for which reference implementations are made
public. We discussed a number of open engineering challenges, as well as some open
research questions that we would like to study once the knowledge-base is completed.
Acknowledgments. This work was supported by the Millennium Nucleus Center for
Semantic Web Research, Grant NC120004, and by Fondecyt, Grant 11140900. We
thank Camila Faundez for her assistance.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>1. Bustos, B., Hogan, A.: IMGpedia: a proposal to enrich DBpedia with image meta-data. In: AMW. CEUR, vol. 1378, pp. 35–39 (2015)</mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>2. Donahue, J., Jia, Y., Vinyals, O., Hoffman, J., Zhang, N., Tzeng, E., Darrell, T.: DeCAF: A deep convolutional activation feature for generic visual recognition. arXiv preprint arXiv:1310.1531 (2013)</mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>3. Lehmann, J., Isele, R., Jakob, M., Jentzsch, A., Kontokostas, D., Mendes, P.N., Hellmann, S., Morsey, M., van Kleef, P., Auer, S., Bizer, C.: DBpedia: A Large-scale, Multilingual Knowledge Base Extracted from Wikipedia. Semantic Web Journal (2014)</mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>4. Manjunath, B.S., Ohm, J.R., Vasudevan, V.V., Yamada, A.: Color and texture descriptors. IEEE Trans. on Circuits and Systems for Video Tech. 11(6), 703–715 (2001)</mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>5. Novak, D., Batko, M., Zezula, P.: Large-scale image retrieval using neural net descriptors. In: Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 1039–1040. ACM (2015)</mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>6. Sobel, I., Feldman, G.: A 3×3 isotropic gradient operator for image processing. Stanford Artificial Intelligence Project (1968)</mixed-citation>
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>7. Vaidya, G., Kontokostas, D., Knuth, M., Lehmann, J., Hellmann, S.: DBpedia Commons: Structured multimedia metadata from the Wikimedia Commons. In: The Semantic Web – ISWC 2015, pp. 281–289. Springer (2015)</mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>8. Vrandečić, D., Krötzsch, M.: Wikidata: A free collaborative knowledgebase. Comm. ACM 57, 78–85 (2014)</mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>