IMGpedia: A Proposal to Enrich DBpedia with Image Meta-Data

Benjamin Bustos and Aidan Hogan
Center for Semantic Web Research, Department of Computer Science, University of Chile
{bebustos,ahogan}@dcc.uchile.cl

Abstract. We introduce IMGpedia: a research proposal aiming to bridge structured knowledge-bases and multimedia content. Our concrete plan is to enrich DBpedia data with further meta-data about images from Wikipedia, including content-based visual descriptors. Our goal is to create a unified querying and inference system that allows for interrogating the DBpedia knowledge-base and the visual content of Wikipedia’s images together. Our broader ambition is to explore methods by which multimedia data can be made a first-class citizen of the Semantic Web.

1 Introduction

DBpedia [1] is an ongoing effort by the Linked Data community to extract structured content from Wikipedia and represent it in RDF. The main goal is to enable users to query the content of Wikipedia as a whole, getting direct answers automatically aggregated from multiple articles. The most recent version of DBpedia contains billions of facts extracted from 125 language versions of Wikipedia, with links to and from dozens of external datasets. Over the past seven years, it has become the central dataset of the Linked Open Data community [5].

However, DBpedia mainly focuses on extracting information from Wikipedia’s info-boxes: attribute–value panes that appear on the right-hand side of articles. As such, aside from adding links, DBpedia ignores the images appearing in the body of the article for a given entity, as well as the structured data available on image pages: no meta-data are extracted for images. Like many initiatives in the Semantic Web¹, DBpedia links to but otherwise disregards multimedia content.

¹ But not all: see, e.g., http://www.w3.org/2005/Incubator/mmsem/

Our proposal is thus to extract and associate meta-data from the images embedded in Wikipedia and to link the resulting corpus with the DBpedia dataset. This dataset – which we call IMGpedia – would consider all images in an article, all meta-data associated with each image available from Wikimedia (author, date, size, etc.), and visual descriptors that capture the content of the image itself.

We are motivated by the idea of creating a corpus that allows for querying, in unison, both the structured/semantic meta-data of DBpedia and the visual content extracted from images; e.g., “give me European cathedrals that have an image visually similar to one of the external images for Cusco Cathedral in Peru”. Likewise, we foresee the possibility of inferring new links from this dataset, e.g., inferring that Saddam Hussein and Donald Rumsfeld have met based on their being associated with the same image (in which they are co-depicted). The resulting corpus may also serve as an interesting experimental dataset for the image-processing community, where the structured data associated with images may serve as a ground truth.

2 Images and Visual Descriptors

Before describing IMGpedia, we need to introduce some basic concepts about how images are encoded and what visual descriptors are. An image is a matrix of so-called pixels (picture elements). A pixel contains information about its colour, which can be displayed, for example, on a computer monitor. There are several ways to encode the colour information of a pixel, depending on the selected colour space. Common colour spaces are RGB (red-green-blue, used by computer monitors) and CMYK (cyan-magenta-yellow-black, used by printers), where colours are represented as tuples of numbers; for example, an RGB colour is represented by a 3-tuple. There are also several ways to compress the image encoding, mainly lossy compression methods (e.g., the JPEG format) and lossless methods (e.g., the PNG format).
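To make these notions concrete, the following sketch (in Python, using the Pillow and NumPy libraries; the file names are purely illustrative) loads an image as a matrix of pixels, reads the RGB 3-tuple of one pixel, and re-encodes the image in a different colour space and with lossy compression:

  from PIL import Image  # Pillow
  import numpy as np

  # Load an image and view it as a matrix of pixels.
  img = Image.open("cusco_cathedral.jpg").convert("RGB")
  pixels = np.asarray(img)   # shape: (height, width, 3)

  print(pixels.shape)        # e.g., (480, 640, 3)
  print(pixels[0, 0])        # RGB 3-tuple of the top-left pixel, e.g., [170 200 255]

  # The same content can be re-encoded in another colour space or compressed format.
  img.convert("CMYK").save("cusco_cathedral.tiff")    # CMYK encoding
  img.save("cusco_cathedral_small.jpg", quality=75)   # lossy JPEG compression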
A visual descriptor is a way of characterising an image based on its content. This can be done considering the whole image (a global descriptor) or regions of interest detected in the image (local descriptors). For this work, we will initially focus on global descriptors, since they can be computed more efficiently than local descriptors, and likewise similarity between them is more efficient to compute. Visual descriptors can be defined in several ways, e.g., based on the colours, texture and/or shape of the image. They do not include any semantic information about what appears in the image, which is why they are also called “low-level features”. For instance, a simple colour descriptor is the colour histogram [2], which captures the distribution of colour in the image. We note that visual descriptors are usually vectors of high dimensionality (tens to hundreds of real values).

Visual descriptors allow us to implement, e.g., content-based similarity search. A similarity query over an image data set returns the images most similar in content to a given image (the query image); this is also known as query-by-example. Formally, let U be the universe of all images, let S ⊆ U be the image data set, and let δ : U × U → R+ be a function (the distance) that returns how dissimilar two images are. There are two basic types of similarity queries: (1) range query: given the query image q ∈ U and a tolerance radius r ∈ R+, return all images from S that are within distance r of q; (2) top-k query: return the k objects of S closest to q. If S is formed by all visual descriptors (high-dimensional vectors) extracted from the images in the data set, if q is the visual descriptor of the query image, and if δ is any metric function (e.g., the Euclidean distance), it is relatively straightforward to implement content-based range and top-k queries over S.

3 IMGpedia Dataset

Our vision of IMGpedia is an enhanced version of DBpedia with image entities. An image entity contains meta-data (e.g., title, subject, source, format, description, date, size, location, etc.) and content-based descriptors (e.g., a colour descriptor) of the image. Image entities can be linked with other entities (not necessarily images).

To create the IMGpedia dataset itself, we propose the following procedure:

– Locate and download images and image pages from Wikimedia.
– Extract meta-data from each image page, including its size, author, licence, etc. Annotate images with tags computed from their (possibly many) captions [4].
– Compute the visual descriptors for the images. For this, we can use global visual descriptors such as colour and edge [2], following the MPEG-7 standard [3]; a sketch of a simple colour descriptor is given after this list.
– Create the image entities using the extracted meta-data and content-based data.
– Represent and publish the IMGpedia dataset as Linked Data.
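To illustrate the descriptor-computation step, the following minimal sketch (again Python with Pillow and NumPy, not the MPEG-7 descriptors themselves; the function name and bin counts are our own choices) computes a simple global colour-histogram descriptor of the kind introduced in Section 2:

  import numpy as np
  from PIL import Image  # Pillow

  def colour_histogram(path, bins_per_channel=8):
      """Global colour-histogram descriptor: the distribution of RGB colours in
      an image, flattened into one high-dimensional vector (here 8^3 = 512 values)."""
      pixels = np.asarray(Image.open(path).convert("RGB")).reshape(-1, 3)
      hist, _ = np.histogramdd(pixels, bins=(bins_per_channel,) * 3,
                               range=((0, 256),) * 3)
      # Normalise so that descriptors of images of different sizes are comparable.
      hist = hist.flatten()
      return hist / hist.sum()

  # descriptor = colour_histogram("some_wikipedia_image.jpg")  # a 512-dimensional vector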
4 Querying IMGpedia

Our main research goal is to investigate methods by which semantic data (in this case DBpedia) and multimedia data (in this case describing Wikipedia images) can be combined such that they can be queried in a holistic manner. In the context of IMGpedia, our approach is divided into three main parts: materialising links between image resources, extending SPARQL to execute content-based analysis at runtime, and inferring new links between “primary entities” based on image data.

Materialising relations between images using content-based descriptors. Low-level descriptors do not contain any semantic information about the original image, making them hard for users to leverage in queries. This problem is known as the semantic gap [6]. However, high-level relations among image entities can be computed from visual descriptors and similarity queries. For example, the relation near-copy can be defined to hold between two different images whose distance δ is less than some threshold τ. By using range queries, it would be easy to find all pairs of near-copies among the images. Other relevant relations that can be considered are alt-size, contains and similar. These could also be materialised as triples and added to the structured knowledge-base, with appropriate inference – e.g., for symmetry, reflexivity or subsumption of relations – allowing users to specify SPARQL queries such as:

  SELECT ?usPolitician
  WHERE {
    db:Saddam_Hussein foaf:depiction ?img1 .
    ?usPolitician dbo:party db:Republican_Party_(US) ;
                  foaf:depiction ?img2 .
    ?img1 i:nearCopy ?img2 .
  }

Extending SPARQL with functions for content-based image search. Not all content-based user requirements can be anticipated in the form of discrete relations. Hence we propose to extend SPARQL to include content-based analysis features. More specifically, we propose to use extensible functions in SPARQL and custom datatypes to enable queries that combine querying of semantic content and image content. Taking the introductory example, suppose that the user wishes to find cathedrals in Europe that have images similar to external images of Cusco Cathedral in Peru:

  SELECT ?eurCathedral ?sim
  WHERE {
    db:Cusco_Cathedral foaf:depiction ?img1 .
    FILTER(i:colorRatio(?img1, i:rgb(40,100,150), i:rgb(170,200,255)) > 0.2)
    ?eurCathedral rdf:type dbo:ReligiousBuilding ;
                  dbo:location [ dcterms:subject dbc:Countries_in_Europe ] ;
                  foaf:depiction ?img2 .
    BIND(i:sim(?img1, ?img2) AS ?sim)
    FILTER(?sim > 0.7)
  }
  ORDER BY ?sim

The first FILTER uses extended functions to only consider images that have more than 20% of their pixels falling within the cuboid of colours bounded by the two RGB points (looking for blue sky). The subsequent BIND and FILTER allow the images from European religious buildings to be filtered and ordered by similarity.

A major challenge here is balancing expressivity and efficiency. In the above case, given a reasonable query plan, the first filter can be applied over the six images appearing in the Cusco Cathedral article, but then all images of all religious buildings in Europe need to be compared with the images that pass the first step. In order to improve the performance of such queries, we propose to investigate the use of image indexing techniques that allow such filters to be executed as a lookup, rather than as a post-filter, which should lead to more options for query planning. For example, in the query above, a more efficient query plan may try to bind values for ?img2 using a similarity range query (over the values bound for ?img1), allowing a join to be computed with the knowledge-base rather than applying a brute-force similarity filter over bindings produced by the knowledge-base for ?img2. We see this as being one of the deepest technical challenges posed by the work: creating cost models and query plans that combine indexes over the knowledge-base and multimedia content appears to be a challenging but general problem.
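For reference, the two basic similarity queries of Section 2 can be implemented by brute force over the descriptor vectors as in the following Python/NumPy sketch; a real system would instead maintain a metric or multidimensional index so that such lookups (e.g., binding ?img2 via a range query around the descriptors bound to ?img1) avoid a linear scan:

  import numpy as np

  def range_query(descriptors, q, r):
      """Indices of all descriptor vectors within Euclidean distance r of the query q."""
      dists = np.linalg.norm(descriptors - q, axis=1)
      return np.flatnonzero(dists <= r)

  def top_k_query(descriptors, q, k):
      """Indices of the k descriptor vectors closest to the query q."""
      dists = np.linalg.norm(descriptors - q, axis=1)
      return np.argsort(dists)[:k]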
Content-based knowledge discovery. A more speculative idea is to infer new knowledge about the data using the image entities and their relations. For example, say that two DBpedia resources are associated with the same image (or with near-copies of it). If both resources are of type dbo:Person, the relation hasMet could be inferred. If one resource is a dbo:Person and the other is a dbo:Place, the relation hasVisited could be inferred. Such inferences could be axiomatised as domain-specific rules. Of course, the resulting inferences may not always be crisp conclusions, but may instead be associated with a confidence value.

5 Conclusions

In this short paper, we have introduced and motivated IMGpedia: a proposal to enrich DBpedia with meta-data extracted from Wikipedia images. We view IMGpedia as a concrete use-case through which to investigate the challenges and opportunities of combining semantic knowledge-bases with multimedia content.

Acknowledgements

This work was supported by the Millennium Nucleus Center for Semantic Web Research, Grant № NC120004, and by Fondecyt, Grant № 11140900.

References

1. J. Lehmann, R. Isele, M. Jakob, A. Jentzsch, D. Kontokostas, P. N. Mendes, S. Hellmann, M. Morsey, P. van Kleef, S. Auer, and C. Bizer. DBpedia – A Large-scale, Multilingual Knowledge Base Extracted from Wikipedia. Semantic Web Journal, 2014.
2. B. S. Manjunath, J.-R. Ohm, V. V. Vasudevan, and A. Yamada. Color and texture descriptors. IEEE Transactions on Circuits and Systems for Video Technology, 11(6):703–715, 2001.
3. MPEG-7 Overview. http://mpeg.chiariglione.org/standards/mpeg-7/mpeg-7.htm (accessed 2015-01-29), 2015.
4. S. Noah, D. Ali, A. Alhadi, and J. Kassim. Going Beyond the Surrounding Text to Semantically Annotate and Search Digital Images. In Intelligent Information and Database Systems, pages 169–179, 2010.
5. M. Schmachtenberg, C. Bizer, A. Jentzsch, and R. Cyganiak. Linking Open Data Cloud Diagram 2014. http://lod-cloud.net/ (accessed 2015-01-30).
6. A. W. M. Smeulders, M. Worring, S. Santini, A. Gupta, and R. Jain. Content-based image retrieval at the end of the early years. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(12):1349–1380, 2000.