Proceedings of the First Workshop on DownScaling the Semantic Web (DownScale2012)

Hosted at ESWC2012, May 28, Heraklion, Crete, Greece

Editors
Christophe Guéret
Stefan Schlobach
Florent Pigout

Program Committee
Anna Bon, VU University Amsterdam
Stephane Boyera, World Wide Web Foundation
Philippe Cudré-Mauroux, University of Fribourg
Mathieu d'Aquin, Knowledge Media Institute, The Open University
Jérôme David, INRIA Rhône-Alpes
Victor de Boer, VU Amsterdam
Christophe Guéret, Vrije Universiteit Amsterdam
Florent Pigout, OLPC France
Mike Powell, IKM Emergent
Louiqa Raschid, University of Maryland
Laurens Rietveld, Vrije Universiteit Amsterdam
Stefan Schlobach, Vrije Universiteit Amsterdam
Elena Simperl, KIT
Andreas Thor, Department of Computer Science, University of Leipzig
Sameer Verma, San Francisco State University
Maria Esther Vidal, Dept. of Computer Science, Universidad Simón Bolívar

Table of Contents
Tackling a DISASTER using semantic technologies (p. 1)
  Guillermo González-Moriyón, Emilio Rubiera, Marcos Sacristán and Javier Collado
Linked data from your pocket (p. 6)
  Jérôme David, Jérôme Euzenat and Maria-Elena Roşoiu
The Web of Radios - Introducing African Community Radio as an interface to the Web of Data (p. 14)
  Anna Bon, Victor de Boer, Pieter De Leenheer, Chris van Aart, Nana Gyan and Hans Akkermans
Voice-based Access to Linked Market Data in the Sahel (p. 23)
  Victor de Boer, Nana Baah Gyan, Anna Bon, Pieter De Leenheer, Chris van Aart and Hans Akkermans
Semantic Web in a constrained Environment (p. 31)
  Laurens Rietveld and Stefan Schlobach
Information Ecosystems and International Development (p. 39)
  Mike Powell
Downscaling Entity Registries for Ad-Hoc Environments (p. 40)
  Philippe Cudré-Mauroux, Gianluca Demartini, Iliya Enchev, Christophe Gueret and Benoit Perroud

Tackling a DISASTER using semantic technologies

Guillermo González-Moriyón¹, Emilio Rubiera¹, Marcos Sacristán², and Javier Collado²
¹ Fundación CTIC, Gijón, Asturias, Spain
{name.surname}@fundacionctic.org, http://www.fundacionctic.org/
² Treelogic, Llanera, Asturias, Spain
{name.surname}@treelogic.com, http://www.treelogic.com/

Abstract. In the event of a disaster, coordination of emergency responders is challenging due to the diversity of support systems in use. Work is now being done to tackle this problem by leveraging semantic technologies. The chosen approach focuses on interoperability between Emergency Management Systems (EMSs) via data mediation and a reference ontology. This paper introduces the DISASTER FP7 project and outlines its main activities planned for the coming years.

Keywords: emergency management systems, ontology, data mediation, ontology modularization, data sharing

1 Introduction

Nearly 100,000 people died in Europe in the first decade of this century due to natural and industrial disasters [6]. Transport accidents and terrorism add further casualties to this figure. Moreover, material losses from natural disasters alone amounted to 150 billion euros over the same decade. Disasters such as L'Aquila, Eyjafjallajökull, the Prestige and Chernobyl will remain carved in the collective European consciousness.
Today, developed countries have a number of bodies that respond to emergencies in order to minimize casualties and economic loss. First responders include fire brigades, police, hospitals and military forces. An effective response depends on their ability to take decisions quickly, based on accurate and reliable data. Emergency Management Systems (EMSs) are information tools supporting decisive action against the clock.

Europe currently exhibits a widespread heterogeneity of EMSs, depending on the stakeholder "color" (emergency jargon to identify first responders) and the administrative division (city, region, etc.) in which they operate. Political boundaries and the compartmentalization of functional competences are a hindrance to fluid communication and information exchange between EMSs. This fragmentation has evolved over time for cultural and historical reasons, and it is an unavoidable part of the European reality. The complexity of some emergency situations requires the participation of a variety of first responders, who depend crucially on their ability to exchange data effectively (Figure 1).

Fig. 1. First responders of different "colors" react to a forest fire on the border between two European countries.

2 Requirements for effective operational data exchange

In the event of a disaster, many entities are involved. Some may be affected by the disaster, others may need to take contingency actions, while others may be accidental witnesses of the incident. Thus, each entity can provide a partial version of the total picture. The objective is to make the most complete picture available to decision makers by putting together the partial pictures coming from the different parties involved. Due to the decentralization of this information scenario, a number of conditions must be fulfilled:

– It must be possible to gather and manage heterogeneous information from the many available sources. This information includes an appropriate description of the disaster event in terms of both quality and quantity. In addition, the description of the scenario must be enriched with contextual and environmental data, e.g., infrastructures, populated areas, weather forecasts. This harvesting process often faces challenges due to the incompleteness and incorrectness of the data.
– The description of the scenario must be shared and agreed upon by the stakeholders in real time. A common picture facilitates team coordination and the synchronization of response operations, for instance, knowing in real time which areas have already been evacuated.
– Information overload must be prevented. Decisions are difficult to make when dealing with an overwhelming volume of heterogeneous data. Filtering information in terms of relevance is crucial. Moreover, the concept of relevance is subjective: each actor is interested in a different subset of the information.
– Information pieces must be referenced with respect to both geographical and temporal coordinates. To gain insight into the situation, it helps to have map interfaces and events displayed in sequence.
– Message-oriented communications among different responders must be effective regardless of cultural and technical differences. Due to the diversity at each end of the channel, many problems arise, including divergences in communication protocols, data formats and information models. In addition to the core message, the sender may decide to include additional information that enriches the message and increases its usefulness to the receiver.
– When information is being integrated from various sources and a communication breakdown occurs, previously exchanged information might still be valuable. Not all data has the same expiration date: topographic information will remain valid, whereas the current positioning of deployed units is dynamic and potentially untrustworthy afterwards.

The aforementioned conditions match some of the research topics tackled by the Semantic Web, e.g., information quality, data and model sharing, and data enrichment. These correspondences suggest that a potential solution to the EMS interoperability problem may reuse the findings of the Semantic Web community. Moreover, work on the Semantic Web has demonstrated the value of adopting open standards to reuse vocabularies and exchange data between decoupled systems in the absence of a central authority.

3 A novel approach to disasters

DISASTER (Data Interoperability Solutions at STakeholders Emergencies Reaction) is a collaborative European project including software vendors, research organizations and specialists in emergency response. The goal of DISASTER is to ease communication among existing EMSs, taking the aforementioned requirements into account.

The DISASTER project respects the diversity of EMSs in Europe. No "one size fits all" central EMS or unique exchange language is proposed. Instead, DISASTER relies on a shared reference ontology as well as on mediation techniques. As depicted in Figure 2, semantic technologies will make it possible for any EMS to exchange information and to query external sources, including Geographical Information Systems, the Linked Open Data cloud and, of course, other EMSs.

Fig. 2. DISASTER proposes data mediation based on a reference ontology to enable data exchange between existing EMSs and other systems.

When it comes to mutual understanding, the first step is to agree on shared baseline knowledge. Under our approach, this knowledge is modelled with ontologies. Ontologies have been proven to solve common understanding problems [2]. Moreover, the use of ontologies to specify knowledge in the domain of emergencies has already been discussed in a number of studies [1, 3]. Despite this body of previous work, in 2011 the European Commission identified the need to develop an ontology shared by all stakeholders (FP7 research topic SEC-2011.5.3-2).

4 Prospective work

DISASTER commenced in February 2012 and will run until 2015. The project has started by gathering requirements on different aspects: from the operational requirements of first responders to linguistic and cultural requirements.

DISASTER proposes data mediation to tackle communication problems among EMSs. Efforts are planned to bring into practice the theoretical results of the research community on data mediation. Actual interoperability among EMSs poses challenges along several dimensions of the problem: communication protocols, data models and data formats. A workable solution typically involves all of these aspects.

The main result of the project will be the DISASTER ontology. The creation of this ontology will follow a modular approach: the core module will be based on upper-level ontologies such as DOLCE [4] or SUMO [5]. This core will be complemented with transversal modules supporting the representation of temporal and spatial descriptions. These transversal modules are vital to express contextual information, and offer a chance to plug in existing domain ontologies and taxonomies.
Finally, vertical modules will extend this base functionality to different domains, at the level of both disaster descriptions and stakeholder resources. Although a modular design brings a number of difficulties at design time, this methodology allows the ontology to be extended to fit specific scenarios in the future.

It is expected that by the end of the DISASTER project a stable version of the ontology (as universal as possible) will have been produced, making it feasible to store a local copy in each emergency system. By means of this ontology, mediation can take place in order to consume external data before any connection breakdown and under offline circumstances.

The ontology will serve to combine heterogeneous information coming from diverse sources. The project will study how to coherently assemble an agreed disaster scenario description. Moreover, map-based visualization paradigms will be explored to merge context information with operational data, and user-oriented mechanisms will be defined to filter relevant subsets of information. Devices supporting on-site operations must also be taken into account. Responders are often equipped with mobile devices providing access to the information; however, connectivity issues must be addressed, as a connection cannot be guaranteed.

Finally, in order to ensure that the DISASTER solution is grounded in reality, validation will be carried out. At this early stage of the project, the details of the final project scenario are still a draft. However, one of the possibilities involves a (simulated) disaster at a large international European airport, with a transnational component featuring stakeholders of different kinds and nationalities.

Acknowledgements

The research leading to these results has received funding from the European Union Seventh Framework Programme (FP7/2007-2013) under grant agreement no. 285069.

References

1. Z. Fan and S. Zlatanova. Exploring ontology potential in emergency management. In Proceedings of the Gi4DM Conference: Geomatics for Disaster Management, 179(13):e18, 2010.
2. D. Fensel. Ontologies: Silver Bullet for Knowledge Management and Electronic Commerce, volume 48. Springer, 2003.
3. X. Li, G. Liu, A. Ling, J. Zhan, N. An, L. Li, and Y. Sha. Building a practical ontology for emergency response systems. In International Conference on Computer Science and Software Engineering, pages 222–225, 2008.
4. C. Masolo, S. Borgo, A. Gangemi, N. Guarino, A. Oltramari, and L. Schneider. WonderWeb Deliverable D17: The WonderWeb Library of Foundational Ontologies and the DOLCE ontology, 2002.
5. I. Niles and A. Pease. Towards a standard upper ontology. In Proceedings of the International Conference on Formal Ontology in Information Systems (FOIS '01), pages 2–9, 2001.
6. European Environment Agency (EEA). Mapping the impacts of natural hazards and technological accidents in Europe: an overview of the last decade. Technical Report 13, 2010.

Linked data from your pocket

Jérôme David, Jérôme Euzenat, Maria-Elena Roşoiu
INRIA & Pierre-Mendès-France University, Grenoble, France
{Jerome.David,Jerome.Euzenat,Maria.Rosoiu}@inria.fr

Abstract. This paper describes a lightweight general-purpose RDF framework for Android. It makes it possible to deal uniformly with RDF, whether it comes from the web or from applications inside the device. It extends the Android content provider framework and introduces a transparent URI dereferencing scheme that exposes device content as linked data.
1 Introduction

Smartphones are becoming the main personal information repositories. Unfortunately, this information is stored in independent silos managed by applications, and it is thus difficult to share data across them. Mobile operating systems such as Android deliver solutions to overcome this, but they are limited by the application database schemas, which must be known beforehand. The difficulty of sharing phone data at web scale is another drawback. One can synchronize application data, such as contacts or the agenda, using a Google account; however, these are not generic solutions, and there is no means of giving access to data straight from the phone.

Our goal is to provide applications with a generic layer for data delivery in RDF. Using this solution, applications can exploit device information in a uniform way without knowing the application schemas beforehand. This information can also be exposed to the web, and web information can be consumed in the same uniform manner. Moreover, we propose to do so along the linked data principles (provide RDF, describe in ontologies, use URIs, link to other sources).

For example, in the future, this application could be used as a personal assistant. When users would like to know which of their contacts will participate in a scientific conference, they can query the calendars of all their contacts in order to retrieve the answer. Of course, depending on the security settings of the corresponding contact, they may or may not be allowed to access the calendar. The mobile device information can likewise be accessed remotely, from any web browser, by anyone who has been granted access to it. In this case, the device acts like a web server.

We presented a first version of the RDF content provider in [2]. This layer, built on top of the Android content provider, made it possible to share application data inside the phone. In this paper, we extend the previous version by adding capabilities to access external RDF data and to share application data as linked data on the web. We first describe how the Android platform stores its data and how it can be extended in order to integrate RDF. Then, we present two applications that demonstrate its feasibility: the first is an RDF browser that acts like a linked data client and allows navigation through the device information; the second is an RDF server which exposes this information to the outside world. We then present the challenges raised by such applications and the solutions we implemented for them. Finally, we conclude by presenting future improvements and challenges in this field.

2 Android Content Providers

Inside the Android system, each application runs in isolation from the other applications. This Linux-based operating system assigns each application a different and unique user, and only this user is granted access to the application's files. This provides a secure environment, but prevents the exchange of data across applications. To overcome this drawback, Android provides the content provider mechanism.

Content providers enable the transfer of structured data between device applications. They encapsulate the data and control access to it through an interface. This interface empowers one to query the data or to modify it ([4], [3]).
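For illustration, the consumer side of this mechanism looks roughly as follows. This is a minimal sketch, not code from the framework described in this paper; the projection column name "name" is illustrative.

import android.app.Activity;
import android.database.Cursor;
import android.net.Uri;
import android.util.Log;

public class ContactsDemo extends Activity {
    // Query a content provider by URI; the ContentResolver routes the
    // request to whichever provider owns the "contacts" authority.
    void listContactNames() {
        Uri people = Uri.parse("content://contacts/people");
        Cursor c = getContentResolver().query(
                people,
                new String[]{"name"},  // projection: columns to return (illustrative)
                null,                  // selection
                null,                  // selection arguments
                null);                 // sort order
        while (c.moveToNext()) {
            Log.d("ContactsDemo", c.getString(0));
        }
        c.close();
    }
}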
A content provider is a subclass of ContentProvider and implements the following interface:

Cursor query( Uri id, String[] proj, String select, String[] selectArgs, String orderBy )
Uri insert( Uri id, ContentValues colValueList )
int update( Uri id, ContentValues colValueList, String select, String[] selectArgs )
int delete( Uri id, String select, String[] selectArgs )
String getType( Uri id )

With the content provider API, each data item (table or individual row) is identified by a URI with the following structure:

content://authority/path/to/data

The content: scheme is the cornerstone of each content provider URI, the authority identifies the provider, i.e., the dataset, and path/to/data identifies a particular table or individual (row) in the dataset. For example, the URI content://contacts/people refers to all the people in the contact application, and the URI content://contacts/people/33 identifies a specific instance of these, namely the one with id 33. When an application wants to access a particular piece of data, it queries its URI. This is done through a call to the ContentResolver, which routes the query to the right content provider.

From a semantic web point of view, the use of URIs to identify data is a strong point of the Android content providers. Still, there are several limitations if we want to use them as a linked data interface.

Specifically, the URIs used by content providers are local to each device, i.e., not dereferenceable on the web, and not unique. The content scheme used by providers is not a standard protocol. Furthermore, two distinct devices will use the same URI to identify different data: the URI content://contacts/people, for example, gives access to the contacts of either device.

Another drawback is the SQL interface of the Android content providers. Queries are issued in an SQL manner and the results are presented to the user as a table.

3 The RDF Content Provider Framework

We designed the RDFContentProvider framework to give a semantic web flavour to Android and to overcome these problems. It is composed of the RDFProvider API and the RDFContentResolver application. The API must be included in applications that want to access RDF providers and in applications that want to define new RDF content providers. The RDFContentResolver application records all the RDF content providers installed on the device and routes queries to the relevant provider. Figure 1 gives an overview of the framework architecture.

Fig. 1. The architecture components and the communication between them. Components drawn with a double square have a graphical user interface.

3.1 The RDF Provider API

The RDFProvider API delivers the following classes and interfaces:

– RdfContentProvider: an abstract class that should be extended in order to create an RDF content provider.
It subclasses the ContentProvider class belonging to the Android framework;
– RdfContentResolverProxy: a proxy used by applications to send queries to the RDFContentResolver application;
– Statement: a class representing an RDF statement;
– RdfCursor: an iterator over a set of RDF statements;
– RdfContentProviderWrapper: a subclass of RdfContentProvider which allows RDF content provider capabilities to be added to an existing classical content provider.

RdfContentProvider follows essentially the same kind of interface as ContentProvider. The minimal interface to implement linked data applications is:

– RdfCursor getRdf( Uri id )

The cursor iterates over a table of subject-object-predicate (or object-predicate) entries, which are the triples involving the object given as a URI. If one wants to offer a more elaborate semantic web interface, i.e., a minimal SPARQL endpoint, the following methods also have to be implemented:

– Uri[] getTypes( Uri id ): returns the RDF types of a local URI;
– Uri[] getOntologies(): the ontologies used by the provider;
– Uri[] getQueryEntities(): the classes and relations that the provider can deliver;
– Cursor query( SparqlQuery query ): returns result tuples;
– Cursor getQueries(): the triple patterns that the provider can answer.

The RDF providers that we have developed so far implement only the first three primitives.

3.2 The RDF Content Resolver Service

The RDFContentResolver service has the same goal as the ContentResolver belonging to the Android framework. It maintains the list of all the installed RDF content providers and forwards the queries it receives to the corresponding one. This application is never visible to the user; we have therefore implemented it as an Android service.

When an RDF content provider is instantiated by the system, it automatically registers with the RDFContentResolver, following a principle similar to the one used in the Android content provider framework.

The RDFContentResolver can route queries based on both local (content:) and external (http:) URIs. In the case of a local URI, i.e., one starting with the content scheme, the resolver decides to which provider it must redirect the query. In the case of an external URI, i.e., one starting with the http scheme, it automatically routes the query to the RDFHttpContentProvider (see Figure 1). The RDFHttpContentProvider retrieves RDF data from the web: it parses RDF documents and presents them as RdfCursors. So far, only the minimal interface, i.e., the getRdf( Uri id ) method, has been implemented.

3.3 RDF Providers for the Address Book, the Calendar and the Phone Sensors

The RDF content resolver application is also bundled with several RDF content providers encapsulating access to predefined Android providers.

The Android framework has applications that manage the address book and the agenda. These two applications store their data inside their own content providers. In order to expose this data as RDF, we developed the RDFContactProvider and the RDFCalendarProvider. These providers are wrapper classes for the ContactProvider and the CalendarProvider residing inside the Android framework. RDFContactProvider exposes contact data using the FOAF ontology. It provides data about a person's name (display name, given name, family name), phone number, email address, instant messenger identifiers, homepage and notes. RDFCalendarProvider provides access to the Android calendar using the RDF Calendar ontology (http://www.w3.org/TR/rdfcal/).
The data supplied by this provider contains information about events: their location, their dates (starting date, ending date, duration and time zone), the organizer of the event and a short description.

RDFPhoneSensorsContentProvider aims to expose data from the sensors embedded in the mobile device. Contrary to the others, these are not offered as content providers. At present, it only delivers the geographical position (retrieved using the Android LocationManager service). In order to express this information in RDF, we use the geo location vocabulary (http://www.w3.org/2003/01/geo/), which provides a namespace for representing lat(itude) and long(itude).

4 RDF Browser

The RDF browser acts like a linked data client. Given a URI, the browser issues an HTTP request in order to retrieve the information from the specified location. If the retrieved data contains other URIs, the user can click on them and the browser will issue a new query for that URI.

An example can be found in Figure 2. In this case, the user uses the RDF browser to get information about the contact with id 4. When the browser receives the request, it forwards it to the RDFContentResolver. Since the URI starts with the content:// scheme and has the com.android.contacts authority, the resolver routes the query to the RDFContactProvider. This provider retrieves the set of triples describing the contact and sends it to the calling application, which displays it to the user. Thereupon, the user decides to continue browsing and selects the contact's homepage. In this case, since the URI starts with the http:// scheme, the resolver routes the query to the RDFHttpContentProvider. The same process repeats, and the user can see the remotely requested file, i.e., Tim Berners-Lee's FOAF file.

Fig. 2. An example of using the RDF browser.

5 RDF Server

The RDF server is a new component added to the architecture. It exposes the data stored on the device to the outside world as RDF. Because the server must maintain a permanent connection to the Internet without user interaction, we implemented it as an Android service, i.e., a background process.

An important issue appears when one wants to get data from a device: the URIs used to query the content providers only have a local meaning. In the outside world, the URI used to query the address books of two different persons will be the same, but the contents of the address books will differ.

The server principles are quite simple. First, the server receives a request from the outside. It then dereferences the requested URI, i.e., it translates the external URI into an internal one, which has meaning inside the Android platform, and sends it on to the RDFContentResolver. In a manner similar to the one explained for the RDF browser, the set of triples is obtained; but before this set is returned by the server, the URIs in the triples are externalized and the graph is serialized using a port of Jena to the Android platform. The URI externalization process translates the local URI content://authority/path/to/data into the dereferenceable one http://deviceIPAddress:port/authority/path/to/data. Reversing the translation of such a URI is possible since both the authority and the path are kept during the externalization process.
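A minimal sketch of this translation follows; the class and method names are ours, for illustration, and do not necessarily match the actual implementation.

import android.net.Uri;

public class UriExternalizer {
    // content://authority/path/to/data -> http://host:port/authority/path/to/data
    static String externalize(Uri local, String host, int port) {
        // Uri.getPath() returns the path with its leading slash,
        // so authority and path concatenate directly.
        return "http://" + host + ":" + port + "/" + local.getAuthority() + local.getPath();
    }

    // http://host:port/authority/path/to/data -> content://authority/path/to/data
    // The reverse direction works because authority and path were preserved.
    static Uri internalize(String external, String host, int port) {
        String prefix = "http://" + host + ":" + port + "/";
        return Uri.parse("content://" + external.substring(prefix.length()));
    }
}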
Usually, mobile devices do not have a permanent IP address, and the externalized URIs are thus not stable. To overcome this, a dynamic DNS client can be used (e.g., https://market.android.com/details?id=org.l6n.dyndns&hl=en; DynDNS: http://dyn.com/dns/).

In addition, the server supports a minimal content negotiation mechanism. If a client wants to receive the data in RDF/XML, it sets the MIME type of the Accept header of its request to "application/rdf+xml" or to "application/*". Otherwise, or when the client sets the MIME type to "text/plain", the data is transmitted in the N-Triples format. Not only can the requester express its preferences regarding the format of the received data, but the default format of the transmitted data can also be specified in the server settings, as can the port on which the server listens and the domain name server to use.

An example can be found in Figure 3. In this scenario, the user retrieves information about the fourth contact from the device address book. The request is processed by the RDF server in a manner similar to the one described for the RDF browser.

Fig. 3. RDF Server response.

6 Technical Details

The RDF server included in our architecture eases the user's access to RDF data found on the web. For that purpose, we wanted to reuse an existing semantic web framework, such as Jena or Sesame. Yet these are not suitable for use on the Android platform (their code depends on libraries that are unavailable under Android). There are a few ports of these frameworks to Android: Microjena (http://poseidon.ws.dei.polimi.it/ca/?page_id=59) and Androjena (http://code.google.com/p/androjena/) are ports of Jena, and a port of Sesame to the Android platform is mentioned in [1]. We use Androjena.

A problem that arises when using this framework is that the size of the application increases substantially. This could have been avoided by reimplementing only the Jena modules that are needed in our architecture. Still, we would like to improve our architecture by adding more features (such as a SPARQL query engine) that require modules beyond those used to read/parse/write RDF, which are available in Jena.

A tool that we found useful in our development process was ProGuard (http://proguard.sourceforge.net/). ProGuard is a code shrinker, optimizer and obfuscator. It removes unused classes, methods and variables, performs some byte-code optimizations and obfuscates the code. The tool proved efficient in reducing the size of our application (our framework including Androjena) by more than half: its initial size was 6.48 MB, and after applying the tool it shrank to 2.98 MB. The existence of tools such as ProGuard is a step forward in the continuous battle between applications that require a considerable amount of space for storing their code and devices with reduced memory storage.

We are currently examining how to query the device data using SPARQL. There are two main ways of doing this:

– creating a new RDF content provider which relies on a triple store to hold the data [5], and then using SPARQL to query it; or
– translating SPARQL queries into SQL queries, and further decomposing them into a form compatible with the ContentProvider interface.

At the moment, we are investigating the second option. Several available tools can perform the translation from SPARQL to SQL, such as Virtuoso or D2RQ.
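To illustrate the target of such a translation, consider the hypothetical triple pattern ?person foaf:name ?name over the contacts provider: it would ultimately have to be answered by a single-table provider query of roughly the following form. This is our own sketch, not part of the framework; the column name is illustrative.

import android.content.ContentResolver;
import android.database.Cursor;
import android.net.Uri;

public class SparqlToProviderSketch {
    // The provider interface accepts one URI ("table") plus a projection
    // and a selection; it cannot express joins across tables or providers.
    static Cursor namesOfAllPeople(ContentResolver resolver) {
        return resolver.query(
                Uri.parse("content://contacts/people"), // the "table" bound by the pattern
                new String[]{"name"},                    // projection: the queried property
                null, null, null);                       // no selection, no ordering
    }
}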
However, these tools solve only half of the problem, because the resulting SQL queries have to be adapted to the ContentProvider interface, i.e., queries have a particular format, different from SQL. This interface allows querying only one view of a specified table at a time; hence it is not possible to ask content providers to perform joins.

Further challenges regarding security must be taken into account. The user of the application should be able to grant or deny access to his or her personal data. A specific vocabulary could be used to achieve this (http://www.w3.org/wiki/WebAccessControl). Moreover, the dangers of granting system access to a third party can be avoided by using a secure authentication protocol (http://www.w3.org/wiki/Foaf+ssl).

As can be seen, there are still technical problems in implementing a full RDF framework at the core of Android, and specific solutions must be developed.

7 Conclusion

Involving Android devices in the semantic web, both as consumers and providers of data, is an interesting challenge. As mentioned, it faces the issues of application size and of URI dereferencing in mobility situations. A next step is to provide more fine-grained and structured access to data through SPARQL querying. This promises to raise the issue of computation, and thus energy, cost on mobile platforms. A further issue will be the control of privacy in such a framework. But here too, we think that semantic technologies can help.

References

1. Mathieu d'Aquin, Andriy Nikolov, and Enrico Motta. Building SPARQL-enabled applications with Android devices. 2011.
2. Jérôme David and Jérôme Euzenat. Linked data from your pocket: the Android RDFContentProvider. In Proc. 9th demonstration track of the International Semantic Web Conference (ISWC), Shanghai (CN), pages 129–132, 2010.
3. Marko Gargenta. Learning Android. O'Reilly Media, Inc., 2011.
4. Reto Meier. Professional Android 2 Application Development. Wrox, 2011.
5. Danh Le Phuoc, Josiane Xavier Parreira, Vinny Reynolds, and Manfred Hauswirth. RDF On the Go: an RDF storage and query processor for mobile devices. In 9th International Semantic Web Conference (ISWC2010), November 2010.

The Web of Radios - Introducing African Community Radio as an interface to the Web of Data

Anna Bon¹, Victor de Boer¹, Pieter De Leenheer¹, Chris van Aart¹, Nana Baah Gyan¹, Max Froumentin², Stephane Boyera², Mary Allen³, Hans Akkermans¹
¹ Network Institute, VU University, Amsterdam, The Netherlands
{a.bon, v.de.boer, pieter.de.leenheer, c.j.van.aart, n.b.gyan}@vu.nl, hans.akkermans@akmc.nl
² World Wide Web Foundation
{maxf,boyera}@webfoundation.org
³ Sahel Eco, ACI 200 Rue 402, 03 BP 259, Bamako, Mali
mary.saheleco@afribonemali.net

Abstract. The World Wide Web as currently deployed can only be accessed using modern client devices and graphical interfaces, within an infrastructure encompassing datacenters and reliable, high-speed Internet connections. In many regions in developing countries, however, these conditions are absent. Many people living in remote rural areas in developing countries will not be able to use the Web unless they can produce and consume voice-based content using alternative interfaces such as (2G) mobile phones and radio. In this paper we introduce a radio platform, based on a use case and requirements analysis of community radio stations in Mali.
The voice-based content of this radio platform will be made publicly available using Linked Data principles, and will be ready for unexpected re-use. It will help to bring the benefits of the Web to people who are out of reach of computers and the Internet.

Keywords: community radio, voice-based interfaces, Web of Data, radio platform

1 Introduction

The World Wide Web is perfectly adapted for use by people in developed countries. It is visual, text-based, and mainly written in English or a few other world languages (Wikipedia, Global Internet Usage, http://en.wikipedia.org/wiki/Global_Internet_usage). The Web depends on the availability of computers, datacenters, glass fiber backbones, fixed and wireless networks, 3G mobile telephony and the transport of large volumes of data at high speed. In remote rural areas in many developing countries, conditions are different. Poor infrastructure, lack of equipment, low levels of literacy and the use of under-resourced local languages seriously hamper access to the Web for many people.

There is a general consensus that the global Information Society must benefit all people in the world. The United Nations Millennium Declaration contains a commitment to developing a people-centred, inclusive and development-oriented Information Society, so that people everywhere can create, access, utilize and share information and knowledge to attain the internationally agreed development goals and objectives, including the Millennium Development Goals (UNMD, United Nations Millennium Declaration, General Assembly resolution 55/2, United Nations, New York, 2000).

Yet, in many rural regions in Africa, community radio is the only source of information. People have radios at home and listen to programs broadcast in local languages every day. Many people have access to simple voice-based (2G) mobile phones, but text messaging is hardly used [1].

The availability of both mobile phones and radio is opening opportunities for new services. For example, radio listeners phone the radio station and leave voice messages that they want to have broadcast, or react to popular radio programs by leaving news, opinions, regional information, etc. Community radio here operates as an important local information hub, where people bring information for further dissemination.

Radio stations in rural areas in Africa operate under harsh conditions. Only the largest and state-financed radio stations have a computer and an Internet connection. Due to lack of funds, many radio stations still use old-fashioned analogue equipment, such as tape recorders. Yet it is to be expected that more and more radio stations will get computers and an Internet connection in the coming years.

In the current situation, the information broadcast by a community radio is volatile: it is not stored and kept for later access or re-use. Radios have no means to manage, reuse and index this voice-based content.

In this paper we introduce a radio platform as a new interface to the Web. It enables efficient management of radio content, making it accessible and searchable, so that it can serve a broad audience, e.g., Africans in the diaspora who want news from their home villages (communication possibilities with people living in the diaspora are described by Serigne Mansor Tall in: Les émigrés sénégalais et les nouvelles technologies de l'information et de la communication, http://www.unrisd.org). Additionally, the voice-based radio content on this platform may be linked to other data sources on the Web, enabling community radios in Africa to become an interface to the Web of Data. An example of a system that manages market information based on Linked Data principles and produces voice output as broadcasts for African community radios is described by De Boer et al. [2].
In the future, new applications providing locally relevant information from the Web of Data, such as pluviometric data, agricultural data and market prices, may become available through the radio platform.

The radio platform described in this paper not only facilitates the production, consumption and management of voice- and web-based radio content, but also enables access to the Web for people who do not have a computer or the Internet.

The contributions of this paper are:

– a radio platform with both a web and a voice-based mobile interface that allows content creation, retrieval and indexing of spoken radio content;
– African community radio, introduced as a new interface to the Web of Data.

This paper is structured as follows. In Section 2 we describe related work. In Section 3 we describe the architecture of the radio platform, the use cases collected from three different radios in Mali, and the principles used to manage the content. In Section 4 we describe the challenges related to organizing voice-based radio content. In Section 5 we discuss future work on the Web of Radios, including sustainability aspects.

2 Related Work

Related work on the development of a similar platform was done in Freedom Fone (http://www.freedomfone.org), a project initiated by the Kubatana Trust of Zimbabwe, a civil and information activist platform. Freedom Fone is open source software for creating audio content by phone. It provides a voice platform similar to the basic setup proposed in this paper, but without the Linked Data enabled data management.

Research on speech recognition started in the 1930s and resulted in commercial deployments of voice-based services in the 1970s. Major achievements in language recognition, mainly for English, took place in the 1980s and 1990s and culminated in the development of VoiceXML by the W3C Voice Browser group in 1999, facilitating and standardizing the development of voice applications [3].

Sheetal Agarwal et al. from IBM Research India developed a system to enable authorship of voice content for 2G phones in a web space they named the WWTW, or World-Wide Telecom Web. The system is not connected to the Web, and therefore does not allow access by third-party search engines. It represents a closed web space within the phone network. Especially the lack of open search possibilities constrains its growth [4].

From Burundi, a system has been reported [5] that uses tagging software and multimedia mobile data collection. The software is named EthnoCorder (http://www.ethnocorder.com/). The NGO that co-developed this app was Help Channel Burundi. However, because of the current unavailability of multimedia devices in the given rural context, this technical solution may still be out of reach of the community radio stations targeted in this study.

A related project on the Semantic XO and Linked Data for developing countries is described by Guéret et al. [6].
The Semantic XO is a system that connects rugged, low-power, low-cost robust small laptops (a.k.a. the XO, promoted by the One Laptop Per Child organization) for the empowerment of poor communities, using Linked Data principles to publish previously unpublished data.

De Boer et al. [2] describe a distributed voice- and web-based market information system named Radio Marché, aimed at stimulating agricultural trade in rural areas of Africa. This system connects to regionally distributed market information systems using Linked Data approaches.

3 The radio platform

The design of the radio platform described in this section is based on an extensive use case and requirements analysis performed in Mali, in collaboration with radio journalists from community radio stations. The research was done as part of the Foroba Blon project, funded by the International Press Institute, and the VOICES project (http://www.mvoices.eu), partially funded by the EU within the 7th Framework Programme. (Foroba Blon, in the Bambara language, refers to a large space where everyone has the right to speak in front of the village chief; the truth must be told here, but only respectfully, without insulting anyone. See also http://worldplantage.wordpress.com/2012/01/14/community-radio-in-tominian-and-segou-mali/.) The Foroba Blon project aims to support and promote citizen journalism in developing countries. The VOICES project aims to develop innovative mobile voice services to support users in underprivileged communities in African countries.

Fig. 1. Conceptual design of the radio platform as a voice interface to the Web for people who are out of reach of computers and the Internet, but who do have a phone or radio.

3.1 Community radio in Mali and how it operates

In Mali, many community radio stations exist. Some are state-funded and connected to the national broadcasting service ORTM (Office Radio Télévision du Mali). Others are privately funded or completely self-supporting. Depending on their business, funding scheme, size and location, some radio stations have computers and Internet access, some have computers without an Internet connection, and some do not have any computer facilities at all. All these radio stations are situated within the coverage area of mobile telephony.

The Malian community radios have large listener bases, and their radius of coverage ranges between 100 and 200 km. These radio stations create their own programs and broadcast local and regional news, music, informative programs, round table programs and paid announcements.

Three radio stations are involved in the projects described in this paper. The first is Radio ORTM Ségou, a state-owned radio that has computers and a 2 Mbps fixed-line (DSL) Internet connection. Radio ORTM Ségou broadcasts programs in French and Bambara, the most widely spoken language in Mali. The second is Radio Moutian, in Tominian. This radio is independent, and its funding is based on paid airtime for announcements and private gifts from third parties. Radio Moutian has a computer but no Internet connectivity. Programs are mainly broadcast in Bomu, a local language from the Tominian region. The third is Radio Seno in Bankass. This radio is independent from the Malian state and has only analogue equipment.
There are no computers and no Internet connection here, but the radio has many listeners in the region around Bankass. The main language spoken here is Dogon.

The activities of the three above-mentioned radio stations relate to three types of end-users or customers:

– NGOs, which buy airtime to broadcast public announcements on informative and educational topics, such as agriculture and public health. This type of service is usually based on fixed monthly subscriptions to airtime for recurring broadcasts.
– Non-commercial listeners from the region, who buy a few minutes of airtime and pay a broadcast fee per minute. The information is usually brought to the radio, or communicated by phone and subsequently written down on paper by the radio staff. Some listeners call in during a given time slot (one hour per week) and leave a short voice message (a few seconds only) as a reaction to a program broadcast on a certain popular topic. These messages are called letters to the editor (LTE).
– Journalists or trusted village reporters, who phone the radio and leave local news or interviews on a regular basis.

In the current situation, all incoming phone calls are attended by a radio staff member and annotated in tabular form on paper.

3.2 The radio platform architecture

The proposed radio platform, which we named Foroba Blon (FB), consists of a data store containing recorded voice messages and related meta-information. The interface to the FB radio platform is either purely voice-based, through mobile phones, for entering new content, or web-based.

Users of the mobile interface are the listeners from the region, who enter letters to the editor (LTE). These people only have mobile phones and no access to the Internet. Their calls are answered by the system with a pre-recorded welcome message in a local Malian voice, inviting them to leave their message. For the sake of user-friendliness, the user interface and the dialogue for this category of users are kept as short and simple as possible, since the expected callers will be unfamiliar with interactive voice response systems and may not respond to a complex computer-generated dialogue asking them to press buttons, etc.

Another category of FB users are the trusted reporters calling from the field, also using the mobile interface. They phone in and leave their spoken report for broadcasting. These users are registered beforehand, with their phone number, name, address and preferred language stored in FB. They will be trained to navigate the voice menu and use the IVR system, which asks them to press a button on the phone to confirm or answer a question about their current location, the subject of the message, etc. The FB system always answers a registered caller in his or her preferred language.

The voice messages are stored as audio files in the FB data store, together with meta-information: the date and time of the call, the length of the call in seconds, and the phone number of the caller. Messages from trusted users are linked to their owner, his or her address, and his or her preferred language. For all users of the system, confidentiality and anonymity will be ensured, in accordance with the broadcast policies used by the radio stations in Mali.

The FB radio platform also has a normal web interface, where Internet-connected end-users/customers can access and upload voice messages.
Depending on their customer relationship to the radio, they can log in to the radio platform as (i) registered users, such as NGOs and trusted reporters, or (ii) unregistered users. There is an option to sign up and create a user account by registering one's name, phone number, village and preferred language. Unregistered users can access former broadcasts, since this is public information.

For the radio staff, FB provides a web-based interface enabling them to manage the data in the data store. It provides a file list where they can access, listen to, broadcast and delete files, and add, update or delete meta-information. A radio station that has neither a computer nor Internet access has only a very limited interface to the radio platform, constrained by what a voice interface can offer: the radio user receives a welcome message asking whether she wants to hear the last 10 messages, or whether she wants to manage the welcome messages played to end-users.

The FB radio platform is hosted either locally, on a stand-alone computer, or in the cloud. FB consists of a voice platform running an open source web server and a local voice browser that handles the voice interaction. The FB radio platform uses a GSM gateway device, e.g., OfficeRoute (http://en.flossmanuals.net/freedom-fone/connecting-officeroute), which handles incoming and outbound calls and streams the voice messages to and from the phone.

The FB radio platform could in theory be physically hosted anywhere in the world, on any web server connected to the Internet. However, in this actual case in rural regions of Mali, this is not possible. First, the radio platform has to be reachable through an inexpensive local Malian phone number, so it must be connected to a Malian phone network. Second, the web service accessed over the Internet must also be accessible locally. Since Internet connectivity in Mali is usually of low bandwidth and high latency, voice web services hosted in datacenters in the US or Europe are too slow for proper deployment in Mali. For these two reasons, the system should preferably be hosted locally in Mali. In the absence of good and reliable datacenters or hosting providers in Mali, the radios can decide to deploy the FB radio platform on a local computer at their own premises. Obviously, before this can be done, the radio staff members have to be trained in the operational maintenance of the FB platform, and especially in how to cope with frequent power outages and bring the system back to a consistent state.

4 Organizing the radio content

The next challenge is how to manage spoken content in under-resourced languages such as Bomu, Bambara and Dogon. Since no interactive voice response (IVR) systems exist for these languages to date, the voice-based content cannot be indexed by conventional search engines. Collecting as much meta-information as possible is therefore essential. Very simple ways of indexing the messages are based on the owner (known through the phone number), automatic language recognition, and the time slot (e.g., all messages collected on January 13 between 10 and 11 a.m. are related to the radio program on harvesting shea nuts). The radio journalist can manually attach meta-information such as keywords, village, region, language, name or any other attribute to an audio file using the radio web interface. In the future, existing tagging systems such as EthnoCorder may be considered to facilitate meta-data collection.
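As a concrete illustration, the following sketch (using the Jena API) shows how one recorded message and its meta-information could be expressed as RDF. The namespace and the property names are invented for this example and are not the platform's actual vocabulary.

import com.hp.hpl.jena.rdf.model.Model;
import com.hp.hpl.jena.rdf.model.ModelFactory;
import com.hp.hpl.jena.rdf.model.Resource;
import com.hp.hpl.jena.vocabulary.DC;

public class MessageMetadataSketch {
    public static void main(String[] args) {
        Model m = ModelFactory.createDefaultModel();
        String fb = "http://radio.example.org/fb/";        // hypothetical namespace
        Resource msg = m.createResource(fb + "message/2012-01-13/07");
        msg.addProperty(DC.date, "2012-01-13T10:12:00");   // time slot links it to a program
        msg.addProperty(DC.language, "bm");                // Bambara
        msg.addProperty(m.createProperty(fb, "durationSeconds"), "45");
        msg.addProperty(m.createProperty(fb, "caller"), m.createResource(fb + "reporter/12"));
        msg.addProperty(m.createProperty(fb, "audio"), m.createResource(fb + "audio/07.wav"));
        m.write(System.out, "TURTLE");
    }
}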
4.1 Prepare the data for unexpected re-use

For the moment, the data on the radio platform will only be used locally, but we want to prepare the platform for the future. According to Tim Berners-Lee, it is the unexpected re-use of information which is the great value added by the Web. We will apply Linked Data principles to the audio data on the FB radio platform, following Berners-Lee's rules [7] for publishing and connecting data on the Web, while adhering to its architecture and standards:

1. using URIs as names for things;
2. using HTTP URIs so that people can look up those names;
3. providing useful information, using standards like RDF and SPARQL;
4. including links to other URIs, so that people can discover more things, especially across regions and future similar community radio platforms.

If the FB radio platform proves to be a success, other instances of FB may be installed at local radio stations in Mali and across its borders, in neighbouring countries such as Burkina Faso, Ghana, Senegal and Guinée, where conditions with regard to illiteracy, local languages, mobile telephony and community radio are similar to those in Mali. This will create a Web of African community radios that are linked to each other and will eventually become part of the Web of Data.

4.2 Organize an open source community of developers to create applications for the radio platform

In the VOICES and Foroba Blon projects, one instance of the radio platform is being developed by a small team of developers in collaboration with end-users, sponsored by the International Press Institute as a pilot project. (At the moment of writing, the use cases have been collected in Mali and the FB platform is being built accordingly; however, no feedback has yet been received from the users.) To enable further development of the radio platform, and to expand the scale of the Web of Radios, it is important to look at new ways of producing and consuming data and services. African community radios operate in a low-income region, where the sustainability of a system relies on the underlying business model. Community radios do not have enough earnings to invest in new systems, and their listener base is large but poor. Application development will therefore have to be organized in a cost-effective way. We propose to organize an open source community of developers and to rely on commons-based peer production for the development of applications that will open the Web of Data to radio using the voice modality.

5 Discussion and future work

From this paper it becomes clear that the Web of African Radios can only emerge as an interface to the Web of Data when sufficient applications are built that link voice-based content. For the navigation of voice menus and other voice-based dialogues, small subsets of the local languages such as Bambara and Bomu have to be recorded and resourced using time-consuming techniques and efforts. The user interfaces have to be extensively tested and validated with end-users in the local situation, since they are culturally sensitive. For the resourcing of more local languages, crowd-sourcing techniques may be applied.

The issue of meta-information is another important topic. In the model presented for the FB radio platform in this paper, only a small amount of meta-data is collected. When the repositories of spoken content start to become larger, new and innovative ways of describing spoken content will have to be developed.
To contribute to the critical mass of content and applications that is necessary in this rural domain, a socio-technical network has to be put in place, supported by a community of contributors: web developers, listeners who provide meta-information, local ICT entrepreneurs, and people who are willing to produce and consume data. According to Kazman and Hong-Mei Chen [8], organizing a community of developers around an open source service requires a consolidated kernel infrastructure, allowing peripheral services to be created by a decentralized community of developers. Specific social and technical mechanisms are needed to ensure long-term participation and to encourage community engagement. In this case, this is justified by the aim to open the Web of Data to people who are out of reach of computers and the Internet.

References

1. Akkermans, H., Grewal, A., Bon, A., Tuyp, W., Allen, M., Gyan, N.B.: W4RA-VOICES field report. Tech. rep., Web Alliance for Regreening Africa (2011), http://www.mvoices.eu/2011/03/25 Voices-W4RA Public Report.pdf
2. De Boer, V., De Leenheer, P., Bon, A., Gyan, N.B., Van Aart, C., Guéret, C., Tuyp, W., Boyera, S., Allen, M., Akkermans, H.: Radio Marché: Distributed Voice- and Web-Interfaced Market Information System under Rural Conditions. In preparation.
3. W3C: Voice Extensible Markup Language (VoiceXML) Version 2.0, W3C Recommendation, 16 March 2004, http://www.w3.org/TR/voicexml20/
4. Agarwal, S.K., Jain, A., Kumar, A., Rajput, N.: The World Wide Telecom Web browser. In: Proceedings of the First ACM Symposium on Computing for Development (ACM DEV '10), New York, NY, USA, ACM (2010)
5. Horst, N.: EthnoCorder in Burundi: innovation, data collection and data use. In: Participatory Learning and Action. IIED (2011), http://pubs.iied.org/pdfs/14606IIED.pdf
6. Guéret, C., Schlobach, S.: SemanticXO: connecting the XO with the world's largest information network. In: Proceedings of the First International Conference on eTechnologies and Networks for Development (ICeND2011), Communications in Computer and Information Science, Springer (2011)
7. Berners-Lee, T.: Linked Data, the four rules (2006), http://www.w3.org/DesignIssues/LinkedData.html
8. Kazman, R., Hong-Mei, C.: The Metropolis Model: a new logic for the development of crowdsourced systems. Communications of the ACM 52(7) (2009)

Voice-based Access to Linked Market Data in the Sahel

Victor de Boer, Nana Baah Gyan, Anna Bon, Pieter de Leenheer, Chris van Aart, Hans Akkermans
Dept. of Computer Science, The Network Institute, VU University, Amsterdam, The Netherlands
{v.de.boer, n.b.gyan, a.bon, pieter.de.leenheer, c.j.van.aart, j.m.akkermans}@vu.nl

Abstract. In this paper, we present our ongoing efforts to bring the Web of Data to rural communities in the Sahel region. These efforts center around RadioMarché, a market information system (MIS) which can be accessed using first-generation mobile phones. We argue that linking the locally produced and consumed data to (external) Linked Data sources will increase its value. We describe how RadioMarché data is made available as Linked Open Data and present a prototype demonstrator with voice-based access to this linked market data. Through this interface, the Linked Data can be accessed using first-generation mobile phones.
As such, these are first steps towards opening the Web of Data to local users who do not have the appropriate hardware to produce and consume Linked Data. We present a number of use cases as well as the current deployment state. We also discuss our current efforts to leverage the creation of Linked Data in development regions and to build applications on this Linked Data.

1 Introduction

Development and use of the Web of Data has until now mainly focused on developed countries, as was the case with the Web of Documents before it. 4.5 billion people, mainly in developing countries, currently cannot access the World Wide Web. The reasons for this include infrastructural ones, such as a lack of high-bandwidth Internet connections and reliable power supplies, as well as socio-economic issues, such as the high cost of buying personal computers, language mismatches, and lack of reading and writing abilities. For our case study in Mali, only 1.8% of the population has Internet access1, only 10% has access to the electricity network2, and only 26.2% is literate3.

1 http://www.internetworldstats.com/ Internet World Statistics, Miniwatts Marketing Group.
2 http://www.developingrenewables.org/energyrecipes/reports/genericData/Africa/061129%20RECIPES%20country%20info%20Mali.pdf
3 http://www.indexmundi.com/facts/indicators/SE.ADT.LITR.ZS Index Mundi 2011.

Currently, a number of efforts are being undertaken to bridge this so-called 'digital divide' in the World Wide Web, including the recent formation of the Web Foundation. As was argued in [1], while the Web of Documents has been around for 20 years, as engineers of the much newer Web of Data we have the opportunity to not let the "digital Linked Data divide" grow too large. To avoid a seemingly unbridgeable gap, we should consider the underprivileged majority as we design Linked Data architecture, describe use cases and provide access to that Linked Data. In this paper, we describe our ongoing investigations into the implementation of Linked Data-backed solutions for the rural Sahel regions.

1.1 RadioMarché

Our efforts center around a Market Information System, RadioMarché [2], a web-based market information system being developed within the VOICES project4, aimed at stimulating agricultural trade in the Sahel region. The RadioMarché system augments an already running Market Information System (MIS) that was introduced by our partner NGO, Sahel Eco5, in the Tominian area in Mali.

4 http://www.mvoices.eu
5 http://www.saheleco.net

Within RadioMarché, local market information about Non-Timber Forest Products (NTFPs) such as honey, tamarind and shea nuts is stored. A local instance of RadioMarché has a data store with rudimentary market information such as product offerings (including product type, quality, quantity, location and logistical issues) and contact details of sellers and buyers. This information is sent to community radios for radio broadcast and is made available for individual potential buyers and sellers. To overcome interfacing and infrastructural issues, RadioMarché has a voice interface which can be accessed through the normal telephone network using first-generation mobile phones.

A first version of RadioMarché was deployed in November 2011 in the Tominian region. In this paper we describe a prototype demonstrator, developed in parallel, which exposes the market data from this instance of RadioMarché using Linked Data approaches, so that new opportunities for product and service innovation in agriculture and other domains can be unleashed.
The prototype demonstrator also features rudimentary voice access to the Linked Data.

1.2 Why Linked Data?

We believe that Linked Data as a paradigm is very much suitable for knowledge sharing in developing countries. Linked Data approaches provide a particularly light-weight way to share, re-use and integrate various data sets using Web standards such as URIs and RDF. It does not require the definition of a specific database schema for a dataset [3]. We assume that the majority of the locally produced data will also be consumed locally. Although the specifics of the locally produced data will differ from use case to use case and from region to region, Linked Data provides us with a standard way of integrating the common elements of the data. Also, because we do not impose a single overarching schema on the data, data re-use for new services is easier, both within a region and across regions. The aggregated data can be used by NGOs to assess running programs and increase their own transparency and accountability. We will provide examples in Section 4.

An additional advantage is that Linked Data is well-suited to deal with multiple languages, as its core concepts are resources rather than textual terms. Where the Web of Documents, by design, is language-specific, Linked Data is designed to be "language agnostic", which suits our purpose of multilingual and voice-based access well. A single resource, identified by a URI (e.g., http://example.org/shea_nuts), can have multiple labels (e.g., "Shea Nuts"@en and "Amande de Karité"@fr). In addition to textual labels, for our voice services we add audio to the resources in the form of language-specific voice snippets, also identified through URIs.

2 Related Work

Agarwal et al. from IBM Research India developed a system to enable authoring of voice content for 2G phones in a Web space, which they named the WWTW (World Wide Telecom Web). The system creates a closed web space within the phone network. Linking from one voice site to another is done through HSTP, a protocol created by IBM. In particular, the lack of open search possibilities constrains its growth [4].

Several automated market information systems have been developed and built to support farmers and agricultural trade in developing countries. One of the well-known market information systems is ESOKO [5], an online market system developed and built in Ghana. ESOKO enables sellers and buyers to exchange market information. Google started a project in Uganda in 2009, partnering with MTN and the Grameen Foundation, to develop mobile applications that serve the needs of poor and other vulnerable individuals and communities, most of whom have limited access to information and communications technology [6]. This system is based on SMS and does not allow voice access.

The Web Foundation has started the Open (Government) Data initiative to "conduct country level actions and global actions to increase the impact and benefits of Open Data worldwide" [7]. This effort focuses on opening government data in developing countries such as Ghana. Our data is initially designed to be produced and consumed by the regional farmers themselves. Linking our regional data to the (Linked) Open government data could increase the value of both datasets. A related project on Linked Data for developing countries is described by Guéret et al. [8].
The SemanticXO is a system that connects the rugged, low-power, low-cost XO laptops for the empowerment of poor communities in developing countries.

3 The RadioMarché Linked Market Data Demonstrator

3.1 The Linked Market Data

For our experimental demonstrator, we converted a copy of the up-to-date market information from the RadioMarché prototype deployed in the Tominian region into RDF triples and stored these in a ClioPatria triple store [9]. The conversion is done using XMLRDF rewrite rules6 and can be re-run whenever the RadioMarché database is updated, to keep the database of the deployed version and the linked data store of our prototype in sync.

6 http://cliopatria.swi-prolog.org/packs/xmlrdf

Currently, we use PURLs for the resource URIs. The temporary namespace chosen is http://purl.org/collections/w4ra/radiomarche/. An HTTP request to these PURL URIs is redirected to the ClioPatria server, running at http://semanticweb.cs.vu.nl/radiomarche/. Through ClioPatria's Linked Data package, the RDF data is accessible as Linked Open Data. The result of an HTTP request for a resource is either a human-readable web page7 or the raw RDF triples describing the resource (in the case of a browser request or an RDF request, respectively). A SPARQL 1.0 endpoint is also provided at http://semanticweb.cs.vu.nl/radiomarche/sparql/.

7 For example, http://purl.org/collections/w4ra/radiomarche/village_Samoukuy/ shows all information about the Samoukuy village

As of February 2012, 31 market offerings are in the triple store. These market offerings were placed by 15 different farmers, living in 13 different villages spread across 6 regional "zones". The market offerings contain the quality, quantity and type of the product, the price, and contact information. In total, the market data consists of 721 triples. In the current version of the demonstrator, the FOAF ontology has been used to describe persons. Additionally, explicit links from the dataset to external data sources were made manually. These include links from zones and villages to GeoNames concepts, DBpedia geographical resources and DBpedia product descriptions.

3.2 Voice-based access to Linked Data

The linked market data can be browsed through the web using the above-mentioned URLs. However, as stated, our goal is to provide a voice-based interface that allows non-intrusive market information access for all users having a first-generation mobile phone. We have implemented a rudimentary version of a voice-based interface to the linked market data as described in the previous section. The voice service is built using VoiceXML [10], the industry standard for developing voice applications. Since in a deployment version we cannot assume that text-to-speech (TTS) libraries are available for the local languages, we here implement only English-language access to the data, using English TTS. Within the VOICES project, TTSs are currently being developed for local dialects of French as well as local languages such as Bambara and Bomu.

The prototype voice application is running on the Voxeo Evolution platform8. The platform includes a voice browser, which is able to interpret VoiceXML documents, includes (English) TTS, and provides a number of ways to access the voice application. These include the Skype VoIP number +990009369996162208 and the local (Dutch) phone number +31208080855. When any of these numbers is called, the voice application accesses a VoiceXML document hosted on a remote server. This document contains the dialogue structure for the application.

8 http://evolution.voxeo.com
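To give an impression of what such a dialogue document may look like, here is a minimal sketch of a VoiceXML menu (wrapped in a Python string, so that it can be generated server-side) offering the three choices described in the next paragraph. The submit URL and field name are invented; the demonstrator's actual documents, generated by PHP scripts, may differ.

# Minimal sketch of a VoiceXML dialogue document of the kind the voice
# browser retrieves. The submit URL and the field name are invented; the
# demonstrator's actual documents are generated by PHP scripts.
MAIN_MENU = """<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form id="main">
    <field name="choice" type="digits?length=1">
      <prompt>
        Press 1 to browse offerings by product.
        Press 2 to browse offerings by region.
        Press 3 to listen to the latest offering.
      </prompt>
      <filled>
        <!-- Hand the DTMF choice to a server-side script as a GET variable -->
        <submit next="http://example.org/radiomarche/menu.php"
                namelist="choice"/>
      </filled>
    </field>
  </form>
</vxml>"""

print(MAIN_MENU)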
In the current demonstrator, the caller is presented with three options: to browse the data by product or by region, or to listen to the latest offering. The caller presses a code on his or her keypad (using Dual-Tone Multi-Frequency, DTMF). The voice application interprets the choice and forwards the caller to a new voice menu. For products, the caller must select the type of product ("press 1 for Tamarind", "press 2 for Honey", etc.); for regions, the caller is presented with a list of regions to choose from. Based on the choice, the application then accesses a PHP document on the remote server, with the choice passed as an HTTP GET variable. From the choice, a SPARQL query is constructed. This SPARQL query is then passed to the RadioMarché Linked Data server, which returns the appropriate results. For a product query, all (recent) offerings of that product are returned. The SPARQL result is then transformed into VoiceXML and articulated to the caller. The prototype demonstrator and the ways of accessing it are shown in Figure 1.

Fig. 1. Schematic representation of the linked market data prototype demonstrator (components: a locally deployed RadioMarché instance, XMLRDF rewrite rules converting the market data into linked market data, a ClioPatria Linked Data server connected to the Linked Data cloud, and a VoiceXML-to-SPARQL component on the Voxeo Evolution voice browser, with both web access and voice access).

Of course, the current method of accessing the data is only one of many possibilities. The caller could be presented with advanced filtering options ("enter the maximum price for offerings of product X", "enter a date range for product offerings") or combinations of data queries. However, because of the slow and linear nature of voice interfaces when compared to visual UIs, the options have to be limited more than with visual interfaces. This means that in our research we will identify useful services on this data and provide Voice-to-SPARQL mappings for these services.

4 Current Work

Voice access to the linked market data as described above is still very much in an early prototype state. We are currently working on multiple projects to a) expand the number of interlinked datasets produced and consumed in the region and b) investigate use cases that use this Linked Data, and build services and applications for those use cases. In this section, we describe the current status of these efforts.

4.1 Other Linked Data sets

The following is a list of Linked Data sets currently being realized. Each of these will be related to the linked market data as well as to external sources.

Meeting Scheduler. Within the VOICES project, a second use case is to develop a voice-accessible meeting scheduling system. The goal of this system is to provide local NGOs with a more effective way to transfer agricultural knowledge about non-timber forest products to their farmer community. The services developed in this case study provide voice access to personal and scheduling information. By integrating this information with the market information from RadioMarché, personal profiles can be enriched with information about the type of products that specific farmers have been producing within a given period. Here, a new scheduling and notification service can re-use the market information within a region.
Citizen Journalism Data. A second use case currently under development by the same team is a voice-based journalism platform, funded through the IPI innovation fund, which allows both professional and citizen journalists to send voice-recorded news items to local community radios. The target region for this use case consists of agricultural communities, and there is a large potential for re-use of both the technical infrastructure and the data. To this end, the re-usable resources (e.g., person data, geographical or product information) in the market information data are linked to the relevant resources in the target data set using standard Linked Data relations.

Pluvial Data. We are also developing a crowdsourcing platform to transform photocopied data about rainfall in the Bankas area in Mali into Linked Open Data. This platform targets the 'diaspora', i.e., people originally from the region who have since moved to developed countries, where they may have better access to web browsers. The pluvial Linked Data acquired in this way will be linked to the aforementioned data. This can be exploited by our partner NGO, as well as other NGOs, to analyze, for example, patterns between rainfall and market offerings.

IDS Data. The Institute of Development Studies recently published an API exposing more than 30,000 publications about development research9. As part of a recent agreement, we will develop a wrapper around the IDS API to expose its content as high-quality Linked Data, enriching it with connections to other Linked Data datasets. These include both general datasets, such as DBpedia or GeoNames, as well as datasets with information from developing countries that are currently being realized. We will also develop a client application showing the advantages of this publication, exploiting the integration of the IDS data with other Linked Data sets in an information mashup.

9 http://api.ids.ac.uk

Links to External Datasets. At the same time, local and national governments as well as NGOs can exploit the linked market data for analytic purposes, monitoring the trade in NTFPs within and across regions. By linking the market information to existing agricultural vocabularies, such as FAO's Agrovoc thesaurus10, the CAB Thesaurus11, or the USDA's National Agricultural Library (NAL) thesaurus12, the aggregated market data can be used for specific analyses for government or NGO purposes.

10 http://aims.fao.org/website/AGROVOC-Thesaurus/sub
11 http://www.cabi.org/cabthesaurus/
12 http://agclass.nal.usda.gov/

4.2 Linked Data Applications

We are currently building a client web application that uses the various Linked Data sets. This application will use the market data and exploit its links to GeoNames for displaying the market offerings on a map. The links to other datasets (IDS data, pluvial data, etc.) will also be exploited to provide the user with additional information about products or regions. The application will allow the local NGO to perform various types of analysis of the market data that are useful for their educational programs. The application aims to demonstrate the added value of the Linked Data approach through the re-use of, and integration with, existing market data from various sources with differing schemas, and through the integration of the market data with publicly available knowledge from the web on agriculture and economics.

At the same time, we will continue to work on client applications for users in the developing regions themselves, focusing on voice-based access to the data. Within the VOICES project, (limited) TTS systems for smaller languages are being developed.
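As a sketch of how such a client application could consume the linked market data, the following Python fragment queries the public SPARQL endpoint mentioned in Section 3.1. The endpoint URL and the PURL namespace come from the paper; the class and property names in the query are hypothetical, since the paper does not list the vocabulary.

# Sketch of a map-client query against the RadioMarché SPARQL endpoint.
# Only the endpoint URL and the PURL namespace are taken from the paper;
# rm:Offering, rm:village and the owl:sameAs link pattern are assumptions.
from SPARQLWrapper import SPARQLWrapper, JSON

ENDPOINT = "http://semanticweb.cs.vu.nl/radiomarche/sparql/"

QUERY = """
PREFIX rm:  <http://purl.org/collections/w4ra/radiomarche/>
PREFIX owl: <http://www.w3.org/2002/07/owl#>

SELECT ?offering ?village ?geo WHERE {
  ?offering a rm:Offering ;              # hypothetical class name
            rm:village ?village .        # hypothetical property
  OPTIONAL { ?village owl:sameAs ?geo }  # manually created GeoNames link
}
"""

sparql = SPARQLWrapper(ENDPOINT)
sparql.setQuery(QUERY)
sparql.setReturnFormat(JSON)

for row in sparql.query().convert()["results"]["bindings"]:
    print(row["village"]["value"],
          row.get("geo", {}).get("value", "(no external link)"))

With the GeoNames URI in hand, the client can dereference it to obtain coordinates and place each offering on the map.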
5 Discussion

We have presented the Linked Data version of the RadioMarché system, its data, and the voice-based access to it. This system represents our first steps in bringing Linked Data to producers and consumers in developing countries. We describe a demonstrator with locally produced Linked Data which provides rudimentary voice-based access, in addition to browser-based and Linked Data application access.

Currently, the demonstrator is implemented on commercial-grade and university-provided web servers, including the Voxeo Evolution platform, PURL servers and the VU University Amsterdam web server. The voice application is also only reachable through a Dutch local phone number or Skype access. To ensure sustainability of the Linked Data and the client applications, this infrastructure needs to be moved to the developing regions themselves as much as possible. The Orange Emerginov platform13 can provide the web server and voice browser technology needed for this infrastructure, and includes local Malian phone numbers. The Linked Data servers, voice interfaces and client applications can be moved to this platform at testing or deployment time. A second option is entirely local. This version has the data and applications running on a web-connected dedicated laptop that is deployed locally. The voice channel is provided by a local voice browser and a GSM gateway device (2N OfficeRoute) connected to the laptop, which allows phone calls to be handled by the system on the laptop.

13 http://www.emerginov.org/

As was discussed in Section 1.2, we aim to include the audio language resources in the Linked Data itself. We are currently gathering language snippets that act as audio labels for resources. These will be added to the data itself so that they can be interpreted by a voice browser directly.

Acknowledgements. The authors would like to thank Mary Allen and Amadou Tangara of Sahel Eco. This research is partly funded by the European Union through the 7th Framework Programme (FP7) under grant agreement no. 269954.

References

1. Guéret, C., Schlobach, S., de Boer, V., Bon, A., Akkermans, H.: Is data sharing the privilege of a few? Bringing Linked Data to those without the web. Outrageous Ideas track, International Semantic Web Conference (ISWC 2011). Jury award winning paper, 1st place (2011)
2. de Boer, V., De Leenheer, P., Bon, A., Gyan, N.B., van Aart, C., Guéret, C., Tuyp, W., Boyera, S., Allen, M., Akkermans, H.: RadioMarché: Distributed voice- and web-interfaced market information systems under rural conditions. In: Accepted for publication in Proceedings of the 24th International Conference on Advanced Information Systems Engineering (CAiSE 2012), Gdansk, Poland, 25–29 June 2012 (2012)
3. Domingue, J., Pedrinaci, C., Maleshkova, M., Norton, B., Krummenacher, R.: Fostering a relationship between linked data and the internet of services. In: Domingue, J., Galis, A., Gavras, A., Zahariadis, T., Lambert, D., Cleary, F., Daras, P., Krco, S., Müller, H., Li, M.S., Schaffers, H., Lotz, V., Alvarez, F., Stiller, B., Karnouskos, S., Avessta, S., Nilsson, M. (eds.): The Future Internet. Lecture Notes in Computer Science, vol. 6656, pp. 351–364. Springer, Berlin/Heidelberg (2011)
4. Agarwal, S.K., Jain, A., Kumar, A., Rajput, N.: The world wide telecom web browser. In: Proceedings of the First ACM Symposium on Computing for Development, ACM DEV '10, New York, NY, USA, ACM (2010), 4:1–4:9
5. ESOKO: Esoko. http://www.esoko.com/ (2011)
6. Google AppLab: Google SMS to serve needs of poor in Uganda. http://blog.google.org/2009/06/google-sms-to-serve-needs-of-poor-in.html (2009)
7. World Wide Web Foundation: Open government data. http://www.webfoundation.org/projects/ogd/, retrieved 14-03-2012 (2012)
8. Guéret, C., Schlobach, S.: SemanticXO: connecting the XO with the world's largest information network. In: Proceedings of the First International Conference on eTechnologies and Networks for Development, ICeND2011. Communications in Computer and Information Science, Springer (2011)
9. Schreiber, G., Amin, A., van Assem, M., de Boer, V., Hardman, L., Hildebrand, M., Hollink, L., Huang, Z., van Kersen, J., de Niet, M., Omelayenko, B., van Ossenbruggen, J., Siebes, R., Taekema, J., Wielemaker, J., Wielinga, B.: MultimediaN E-Culture demonstrator. In: The Semantic Web - ISWC 2006, Athens, Georgia. LNCS, vol. 4273, pp. 951–958. Winner Semantic Web Challenge 2006. Springer Verlag (2006)
10. W3C: Voice Extensible Markup Language (VoiceXML) Version 2.0. W3C Recommendation, 16 March 2004, http://www.w3.org/TR/voicexml20/

Semantic Web in a Constrained Environment

Laurens Rietveld and Stefan Schlobach

Department of Computer Science, VU University Amsterdam, The Netherlands
{laurens.rietveld,k.s.schlobach}@vu.nl

Abstract. The Semantic Web is intrinsically constrained by its environment. These constraints act as bottlenecks and limit the performance of applications in various ways. Examples of such constraints are the limited availability of memory or disk space, or limited network bandwidth. But how do these bounds influence Semantic Web applications? In this paper we propose to study the Semantic Web as part of a constrained environment. We discuss a framework in which applications adapt to the constraints in their environment.

Keywords: downscaling, ranking, constraints, resource bounds

1 Introduction

Agreement that the Semantic Web has become a huge success story is widening: standardised languages exist for representing web data and ontologies, together with tools to deal with such semantic information. This facilitates applications that publish and consume triples in the billions, in domains as varied as the life sciences, cultural heritage, e-business and journalism. Additionally, the number of robust commercial triple stores is constantly increasing, and successful DBMSs such as Oracle have become RDF-aware. Parallelization and distribution have made expressive semantic reasoning massively scalable, and the movement towards storage and reasoning in the cloud suggests that even the last technological barriers can soon be overcome.

Although we partially share this optimistic view, it only paints a part of a larger picture. In fact, the apparent scalability critically depends on a computational infrastructure that is out of reach for the vast majority of the human population. Powerful servers, supercomputers and clusters are the privilege of a few. Often, users of the Semantic Web cannot rely on a constant and reliable Internet connection with sufficient bandwidth. In more remote regions, even the electricity supply is not always guaranteed, and restricted battery runtime has to be considered when building applications. But it is not just the infrastructure, or the lack of it, that provides unsurpassable boundaries for the Semantic Web to be used and useful.
Think, e.g., of real-time user interaction, which restricts the computation time to the often very short attention span of humans. After all, who wants to wait more than a few seconds for search results? Additionally, such interaction has to take the human processing bandwidth into account. Does anybody want to see more than 10 search results? No. In short: the Semantic Web is bound by resource availability.

Although those bounds look diverse at first glance, we suggest studying information access and processing on the Semantic Web in the context of these bounds more systematically. Such an analysis is not an end in itself: understanding the relation between Semantic Web applications and the way they are resource-bound can help when building better applications, or to build good applications more easily. A promising approach is to study the explicit and intrinsic orderings and rankings in the data and the information need. This information can then be used to deal with the resource bounds. Take as an example the human attention span, which requires applications to produce results extremely fast. This is often impossible in computationally expensive representation languages and for applications involving complex data, unless the intrinsic rankings are used to produce good results first and fast. Ranking of goodness is at the basis of such any-time approaches.

This paper is a first attempt to investigate the dependencies between particular resource bounds and the type of orderings and rankings used to overcome the boundaries. Is there a more generic relation between ranking and resources? And if so, can we explicate this relation, and use it to guide the process of building Semantic Web applications in a resource-bound world? The rest of this paper introduces a number of assumptions underlying our approach (Section 2). Next, we study those assumptions in the context of two Semantic Web applications (Section 3). We conclude with future work and open questions (Section 4).

2 Constraints & Ranking

In the introduction we claimed that resource bounds and rankings are related. This claim constitutes the foundation of a unifying idea for more easily building good applications for the Semantic Web: the idea of using orderings in the data and application requirements to deal with explicitly defined resource bounds. This section introduces those claims more systematically. We first elaborate on these statements; afterwards, in the next section, we study them in light of two very different use cases.

(1.) The world is resource bound. Our environment is full of constraints. These bounds constrain us, applications, and the way we interact with, and have to build, these applications.

(2.) Deal with these constraints. Applications have to (implicitly) deal with these constraints, which often means trading functionality of the application to remain within the given bounds (e.g., switching to off-line mode when there is no connectivity).

(3.) We need ranking. Applications can deal with these constraints by ranking results and/or tasks. This enables the application to take a sub-selection (the top-k part) of the results or tasks, and process those. In doing so, the application does not process the complete result or task set. An example where the ordering of tasks is used to adapt to changing bounds is an operating system: most operating systems rank the running tasks by importance, and give the higher-ranked tasks precedence.
This ensures that the most important tasks will still run whenever there is a high CPU load (a resource bound).

(4.) Ranking: it's all in the data. Ranking of results and/or tasks depends on rankings of data. This data often contains an explicit ordering, such as recency, or a more implicit ordering covering notions such as importance or relevance. For this framework we selected a set of ranking measures which are easy to retrieve and calculate:

– Recency: how old is this entity?
– Size: how large is this entity?
– Frequency: how often is this entity used or accessed?
– Similarity: how similar is this entity compared to others?

(5.) We need explicit bounds. With explicit bounds, an application is able to use these bounds as input, and link them to other functionality in the application. The following list is a first attempt at distinguishing the generic types of constraints relevant for Semantic Web applications:

Hardware Constraints: We consider any machine (e.g., server, PC, laptop) hardware restriction as a hardware constraint. Examples of such constraints are memory, hard disk space or battery power. Not all constraints are applicable to every domain. For example, the battery power restriction only applies to environments where laptops are used.

Network Constraints: The Semantic Web needs network connections. The connections between the nodes in such a network are constrained by, for example, network bandwidth or available connectivity.

Interaction Constraints: We consider interaction constraints to be the constraints imposed by the limits of the agent using the application. These constraints are more difficult to obtain, but they might be retrievable from access logs or user profiles. Examples are reaction time1 and processing bandwidth, i.e., the maximum amount of information an agent is capable of processing in a given situation.

1 The maximum time limit between a request of the agent to the SW application and the desired response from the application.

(6.) Tell me your bounds and ranking, and I deal with them. This requires an explicit link between the constraints and the ranking measures. Using this link, applications can adapt their ranking measures to the changing resource bounds. The ability to adapt is particularly useful because resource bounds are not static: network bandwidth may fluctuate, just as available memory can change. Applications, and the way they use the ranking measures, should change as well. Figure 1 illustrates the plausible connections between the constraints and the ranking measures. The case study descriptions in the next section elaborate on these connections in more detail.

Fig. 1: Ranking Measures vs. Constraints (connecting the ranking measures recency, size, frequency and similarity with the constraints hard disk space, battery power, memory, network bandwidth, available connectivity, reaction time and processing bandwidth)

3 Case Studies

We discuss two very different case studies, the SemanticXO and the visualization of census data, and the way rankings and resource bounds relate in them.

3.1 SemanticXO

An environment with obvious constraints is given in the SemanticXO project. The XO laptop is part of the One Laptop per Child (OLPC) project. The aim of OLPC is to create educational opportunities for the world's poorest children by providing each child with a 'rugged, low-cost, low-power, connected laptop'. The SemanticXO is a project which aims "to provide an infrastructure that is needed to integrate semantic information from the Web of Data into programs (...)
running on an XO computer" [2].

Software Stack. Currently, the SemanticXO's software stack contains [2]:

– A triple store, in charge of storing triples with meta-data.
– An HTTP server, which serves as a public interface to the triple store, provides de-referenceable URIs for the resources, and serves the files. The HTTP server is accessible by other XOs.
– A common interface (API) for accessing the triples stored locally, on another SemanticXO, or elsewhere on the Web of Data.

The SemanticXO uses this software stack to store information as a data graph in the triple store. After the graph is stored in the triple store, it is available to other XOs via the HTTP server or the API.

Problem Description. XOs are often used in a network with similar XOs, where a school server functions as an access point with an internet connection. Currently, distributing information within such a network is difficult. Moving information from one XO to another requires both XOs to be on-line and to run the same program (called an 'activity'). Situations where the recipient of information is off-line, where information may be shared asynchronously, or where information needs to be shared regardless of any running activity, are currently impossible to deal with.

The SemanticXO offers a framework with which to approach this problem. Currently, objects containing meta-information are stored locally as data store objects (DS-Objects). These objects contain meta-information as (a combination of) attribute-value pairs. More complex information, such as images, is stored separately. Generic shipping of objects requires a DS-Object to be wrapped as a Semantic DS-Object (SDS-Object), which is stored as a graph in the triple store. Via the HTTP server or the API, this information is automatically available to other XOs in the network. Using this infrastructure, the nodes can share and sync objects. This creates a network of loosely coupled triple stores between which information has to be distributed.

Constraints & Ranking. We will discuss the constraints and rankings in this case study, using the claims from the previous section.

(1.) The world is resource bound. Internet connections are unreliable or have a very limited bandwidth, and hardware may not be on par with regular laptops and desktops, to name but a few constraints.

(2.) Deal with these constraints. Not dealing with constraints such as network bandwidth will most likely result in an inadequate distribution of objects. Not all objects may be distributed to all nodes, which means SDS-Objects run the risk of not being delivered to the intended recipient.

(3.) We need ranking. Constraints may limit the number of SDS-Objects that can be shared, which means a sub-selection of the SDS-Objects should be synced. This is essentially a top-k ranking problem, where only the most useful objects are cached and synced, and others are ignored.

(4.) Ranking: it's all in the data. Relevant ranking measures are recency (i.e., how long ago the graph was created or modified), size (i.e., how big, in bytes, the graph is), and frequency (i.e., how often the graph is synced).

(5.) We need explicit bounds. The resource bounds of the SemanticXO are caused either by the hardware specification [1] or by the XO network connections. We consider the following resource bounds for the SemanticXO (a subset of the bounds described in Section 2): available memory, available hard disk space, battery power, network bandwidth and available connectivity.
(6.) Tell me your bounds and ranking, and I deal with them. By combining the resource bounds (5), the rankings (4) and the links between both (Fig. 1), the application can deal with and adapt to its resource bounds. We consider the following relations between the constraints and these ranking measures:

1. Hard Disk Space: With limited disk space available, giving precedence to smaller (size) objects will allow the SemanticXO to store a larger quantity of objects.
2. Battery Power: When battery power is running low, giving precedence to recency and frequency ensures that old objects which have not been synced often are ordered higher than others, because old objects remaining in the queue might indicate that they have not yet spread through the network.
3. Available Memory: Processing large graphs might require more memory. When available memory is low, precedence should be given to smaller graphs.
4. Network Bandwidth: With a low network bandwidth, the size of SDS-Objects should weigh more heavily, so that at least the smaller objects will still be distributed via this node.
5. Available Connectivity: When the SemanticXO is connected to a very limited number of other nodes, the graphs which are not well distributed in the network should be given precedence. This means objects which are not often synced (frequency) and older objects (recency) are ranked higher than others.

This framework is particularly useful for the SemanticXO, because hardware may differ between XOs. An XO server has completely different hardware (bounds) than a regular XO laptop. Using the described framework, the server and the laptop can both optimize their performance based on their own resource bounds.
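As an illustration of claim (6.), the sketch below combines the three ranking measures into one adaptive score. It is not part of the SemanticXO software stack; all weights, thresholds and field names are invented for the example.

import time

# Illustrative sketch (not part of the SemanticXO stack): rank candidate
# SDS-Objects for syncing, adapting the weights of the ranking measures
# to the current resource bounds, following relations 1-5 above.
# All weights, thresholds and field names are invented for this example.

def sync_order(objects, bounds):
    """objects: dicts with 'size' (bytes), 'modified' (epoch seconds) and
    'sync_count'; bounds: dict with 'disk_free_mb', 'bandwidth_kbps',
    'battery_pct' and 'peers' (number of reachable nodes)."""
    now = time.time()

    def score(obj):
        smallness = 1.0 / (1.0 + obj["size"] / 1024.0)   # favour small graphs
        age_days = (now - obj["modified"]) / 86400.0     # recency measure
        rarity = 1.0 / (1.0 + obj["sync_count"])         # frequency measure

        # Relations 1, 3 and 4: low disk space, memory or bandwidth make
        # object size weigh more heavily.
        w_small = 1.0
        if bounds["disk_free_mb"] < 100 or bounds["bandwidth_kbps"] < 64:
            w_small += 1.0
        # Relations 2 and 5: low battery or few reachable peers favour old,
        # rarely synced objects, to spread them while it is still possible.
        w_spread = 1.0
        if bounds["battery_pct"] < 20 or bounds["peers"] < 3:
            w_spread += 1.0

        return w_small * smallness + w_spread * (rarity + age_days / 30.0)

    # Top-k selection (claim 3): sync the highest-ranked objects first.
    return sorted(objects, key=score, reverse=True)

A node would then sync objects from the front of this list until one of its bounds is reached.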
3.2 Visualizing Census Data

To show that resource bounds are not limited to the environment of developing countries, we describe a totally different application, in which Dutch historical census data is visualized. This use case is part of the Data2Semantics project2, which focusses on the problems of 'how to share, publish, access, analyse, interpret and reuse scientific data'.

2 www.data2semantics.org

The dataset contains Dutch census counts between 1795 and 1971, and covers demographic information such as gender, age, location, marital status, occupation, household and religion. An example of an application making use of the (converted to RDF) dataset is shown in Figure 2.

Fig. 2: Census Visualization

The visualization has several selectors:

1. Variable selection: the variable of which the distribution is shown on the map
2. Time: the year of the census count
3. Zoom: zoom in or out of the map
4. Age: the age range for which to visualize the data
5. Aggregation Level: the aggregation level for the visualization; changing this value changes the granularity of the colors on the map

Because this visualization makes use of a SPARQL endpoint, these selectors correspond to SPARQL queries. In this respect, we can consider the variable selection, time and age as filters, and the aggregation level as a 'group by'.

Constraints & Ranking. We will discuss the constraints and rankings in this case study, using the claims from Section 2.

(1.) The world is resource bound. This visualization application is clearly resource bound. Users will expect near any-time responses when changing the sliders. Additionally, the application will have to deal with current (consumer) hardware and network constraints.

(2.) Deal with these constraints. Not adapting to the constraints leads to situations where the response time (when changing a slider) is too long.

(3.) We need ranking. A way to deal with these constraints is to pre-execute queries. Executing all possible setting combinations is not feasible, however: there are too many possible combinations of queries. Deciding which queries to pre-execute, though, is a ranking problem.

(4.) Ranking: it's all in the data. Relevant ranking measures are:

1. Query similarity: the difference between the current view and a possible query. The bigger the allowed difference, the wider the 'window' of queries being pre-executed.
2. Frequency of usage: selectors that are often used are more interesting to pre-execute. Therefore, queries where the filter value of a frequently used selector differs from the filter value in the SPARQL query of the current view should have precedence.
3. Expected number of results for a given query: e.g., decreasing the aggregation level increases the retrieved number of triples.

(5.) We need explicit bounds. The resource bounds of this visualization involve: available memory, network bandwidth, the reaction time of the user, and the processing bandwidth of the user.

(6.) Tell me your bounds and ranking, and I deal with them. By combining the resource bounds (5), the rankings (4) and the links between both (Fig. 1), the application can deal with and adapt to its resource bounds. We consider the following relations between these constraints and the ranking measures:

1. Memory: Limited memory implies limited available space to store the query results. This should give precedence to queries where the expected query result is small.
2. Network bandwidth: The smaller the network bandwidth, the smaller the expected size of query results should be.
3. Reaction time: More time might allow for a smaller query similarity, which results in a bigger window size for the pre-executed queries. A larger interval also allows for a larger query result, as there is more time available for the response to be generated and processed.
4. Processing bandwidth: The more information a user is able to process, the larger the size of the query result may be, and the bigger the allowed window size (query similarity) may be.

4 Future Work & Conclusion

This paper presented a perspective with which to consider applications in a (dynamic) resource-bound environment. The presented framework should allow Semantic Web applications to adapt to changing constraints by making use of the explicit and intrinsic orderings and rankings in the data. We showed the generic link between ranking and constraints, and put these into the context of two very different case studies.

This framework is not complete, though. A roadmap towards completion requires discussion of some unanswered questions, and an implementation and evaluation of this framework. Current open questions are:

– The ranking measures mentioned in Section 2 are very broad: recency, size, frequency and similarity. Can we keep these notions on such a generic level? Or are these ranking measures open to interpretation, depending on the application domain?
– What are the generic relations between constraints and ranking measures? For both use cases we show some links. Are these generic? Does dealing with one constraint influence other constraints? Is there a cost in dealing with constraints?
– How do the ranking measures relate to each other in such a dynamic system? How is the final ranking score determined?
– How do we define interaction constraints? Different users interact differently with a system.
– How does this approach hold up against other methods, such as the use of access control, differentiated services, or formal maximization under constraints?

The main question seems to be: does explicating these concepts and relations work in practice? This requires further implementation and evaluation of this framework in one of the use cases.

Acknowledgements. This publication was supported by the Dutch national program COMMIT.

References

1. CL1 Hardware Design Specification (2008)
2. Guéret, C., Schlobach, S.: SemanticXO: connecting the XO with the world's largest information network. In: Proceedings of the First International Conference on e-Technologies and Networks for Development (ICeND2011). Communications in Computer and Information Science, Springer, Dar-es-Salaam, Tanzania (2011)

Information Ecosystems and International Development

Mike Powell
IKM Emergent
m.powell@pop3.poptel.org.uk

Abstract. This paper looks at the purpose of developing linked open data applications of value to poorer and less well connected potential users, and considers some of the challenges that need to be met if this purpose is to be achieved. It stresses the different societal contexts in which such applications will be deployed and describes some of the potential negative and unintended consequences of enthusiastic but ill-prepared initiatives. It describes the development information environment as one of overlapping information ecologies which, as the metaphor implies, contain both interdependencies and examples of predatory, counter-developmental behaviours. From this analysis it suggests a few key areas which combine the potential of real value for users with exciting and ground-breaking technological challenges.

1 Introduction

The Downscale 2012 workshop asks software producers1 to think of the four billion people who do not have access to the internet when designing linked open data platforms, and to aim to reduce rather than add to the digital divide. This request seems entirely consistent with the purpose behind many existing linked open data initiatives, which seek to promote greater equity in governance through enabling higher levels of transparency and accountability. It could indeed be said that it is hard to see the point of an interest in linked open data solutions without having an interest in their potential social impact. If this is indeed a common perspective within the linked open data community, then the questions of what social impact is intended and how it may be achieved need to be taken seriously.

1 To avoid confusion, we use the word "development" to relate to the field of international development, struggles against poverty etc., and thus must use another word to describe the development and developers of software and other technical solutions.

This paper, whilst commending the Downscale initiative, argues that the relationship between information and poverty is far more complex than a simple lack of access to communications channels or indeed to information itself. It suggests that deeper study of the potential of information to combat poverty is required. Such deeper study needs, in each situation, to include the specific dynamics of any particular type of information and of the situation in which any intended user group finds itself. However, some general points can perhaps be outlined here.
The desired result of such preparatory work is to identify issues in people's informational needs which pose new technical challenges, as well as to become aware of, and build meaningful collaborations within, the social and organisational contexts in which any technical solutions will be applied.

2 Information and International Development

This paper is also located within a discourse about "international development". The author participated in a panel discussion at ICTD 2011 at which considerable scepticism was expressed by a predominantly technical audience about the relevance of the international development sector. Some argued that the idea of top-down investment in processes of poverty alleviation involved outsiders telling other people what they should do and was therefore patronising. Others suggested that such efforts had been largely ineffective and therefore need not be taken into account. In this author's opinion, there is some merit in both arguments. It is also undeniably the case that much economic and social development, including innovation with ICT, emerges from the internal dynamics of local societies and has no relation to the purposeful efforts of international development actors.

That said, the sector is a significant presence in many parts of the developing world. Definitions of what constitutes "official development assistance" vary, but the OECD2 calculated that it amounted to USD 127.6 billion in 2010. When we consider that some USD 2-3 billion of that amount is spent by professional development organisations just on their own information systems, including their linked open data platforms, we can see that the sector is inevitably going to impact on the information landscape. It has the potential both to offer resources and co-ordination to solutions which will help the poor or, as has been argued elsewhere, to worsen the digital divide3. It is also the case that the history of international development efforts is littered with examples of "technical experts", including in the ICT4D field, failing to recognise or adapt to the socio-cultural or physical specificities of the new environments in which they were working, and thus failing spectacularly. Thus for the lessons of its failures as much as its successes, as well as for its exchanges of ideas and the opportunities it offers to build collaborations, it would be unwise to ignore the work of the development sector.

2 OECD, Query Wizard for International Development Statistics, accessed September 15th 2011
3 See Powell, Davies and Taylor, 2012, "ICT for or against development? An introduction to the ongoing case of Web 3". IKM Emergent Working Paper, in press. The paper takes an historical overview of the use of ICT within the development sector and looks more particularly at early linked open data initiatives within it.

Indeed, one starting point for any information and development initiative should be a clear understanding of what is meant by development. Leaving aside the large number of examples which could legitimately be given of development assistance being applied for the direct economic or political benefit of the donor, there are longstanding and legitimate debates on what types of development strategy are likely to be most effective. Is it more useful to attempt to support the general development of a poorer country in the expectation that the resulting growth and educational improvements will spread to all sectors of society and, because of their wider base, be more sustainable?
Or is it better to target interventions at the specific needs and challenges of the poorest and most marginalised? As Wright et al. [1] explain in their recommendations on Open Government Data in India:

"The meaning of "open government data" and the purposes it serves will have to be re-examined from an Indian perspective. The reasons that work well in the US and the UK may not work well in India. We also have to be very careful about how we imagine the end-users of open government data. Do we visualise open data as being for the benefit of individual middle-class citizens by helping them to consume (processed) data themselves (with bus routes, for instance), or do we visualise them as being for the benefit of the poor, and thus target NGOs? Do we visualise them as being hackers or laypersons?" (p. 40)

This question is fundamental for any information-led development initiative, including those concerned only with data. If "development" initiatives are intended to be of direct benefit to the poorest, this implies improving the relative position of the poor in relation to the rest of society or, if we are thinking in terms of societies, the relative position of a poor country in relation to global levels. Given that information by itself is of little value without the capacity to make use of it, and that such capacity is strongly linked to educational levels, it follows that simply making information more generally accessible is likely to worsen, not improve, such relative positions. Wright et al., in the paper cited above, highlight the danger of what they call "elite capture of transparency"4, whereby information coming from or intended to benefit poor people is captured and exploited by others, a process which would be familiar to students of the uptake and use of public health services in Europe. Thus simply enabling the provision of data or information alone is not adequate. Questions need to be posed as to what information would be of particular value to particular groups.

4 A full case study, also referred to by Wright, is Benjamin, S., Bhuvaneswari, R., Rajan, P. and Manjunatha, 2007, "Bhoomi: 'E-Governance', Or, An Anti-Politics Machine Necessary to Globalize Bangalore?" A CASUM-m Working Paper, http://casumm.files.wordpress.com/2008/09/bhoomi-e-governance.pdf

Equally important are issues of how information is provided and curated, and to what extent the people using and producing the information have any control over the process. These issues relate to power and profit, but also to culture. All societies are based on sets of cultural understandings which may be more or less homogeneous, more or less dynamic in each situation. In recent years, changes in informational behaviours have led to new understandings of the interaction of information and culture in many parts of the world. However, as a UNRISD workshop in the run-up to the first global summit on the "Information Society" (WSIS) concluded: "it is a serious mistake to assume that they constitute a uniform process globally or share a common destination, rather than a variety of new processes each influencing and being influenced by the society in which they are taking place." [2]. Nor are such changes likely to be uniform within individual societies. In a country such as the UK, for example, the informational behaviour of an individual might be expected to depend in large degree on their age, education, and social and professional networks, as well as, possibly, on their class, gender and ethnic origin.
This is of particular importance in the context of linked open data which, in its European and North American manifestations, is very much associated with ideas of openness of information and with the role of the engaged "hacktivist" committed to interpreting and putting to public use the data made available. In our view, this model of open data leading to new analysis and real change remains largely aspirational, even in those environments with the greatest quantity of linked open data and the highest density of "hacktivists". It is a long way from being a realistic model for change in other environments, as both the Web Foundation's "Open Government Data Feasibility Studies"5 and Wright et al.'s study in India indicate. Whilst some such attitudes and skills may be shared, especially amongst the most computer-literate, in many places, it is a mistake to assume that such ideas will always be seen as positive and beneficial. This is not intended as an argument against initiatives which promote the use of linked open data, but to reinforce the point that, as with any form of "aid", what is done and how needs to be negotiated with those whom it is intended to benefit.

5 In Chile and Ghana. See http://www.webfoundation.org/projects/ogd-feasibility/

3 Information Ecology

The development information environment is multi-faceted and complex. It extends from the global to the entirely local; it covers a multitude of disciplines, cultures and languages; it operates on many levels. It can be understood as a series of overlapping information ecologies. The value of this metaphor is threefold. As predators form part of natural ecologies, so power relations impact on human ones. Ecologies are complex adaptive systems, so that an action in one part of them will have impacts elsewhere. Arguably, they are also systems which can benefit from being purposefully looked after: it is certainly possible to damage them, perhaps also to nurture them.

One feature of the power imbalances in the development field is the influence of large, well-resourced institutions using large data sets from which to generate generalised policy approaches which are then applied to local situations. This process can ignore the specificities of local circumstances and limit the freedom of people working at more local levels to develop policy which seems appropriate to them. In a similar vein, such organisations (and many others) justify their roles by making extravagant claims about the value of their knowledge and implying that the main challenge of development is getting this knowledge to the people who need it. This was an explicit argument behind the World Bank's claim to be a "knowledge bank" in the late 1990s. Such attitudes lead to one-way information flows which are often not based on sufficient information about their intended recipients for the information to be of much value. As importantly, it can be hard for information about the realities on the ground, those realities which the whole effort is intended to improve, to get fed into and used in these policy-making and communications processes.

Another aspect of the development information ecology is that although there is a certain attempt to maintain a common purpose of "co-operation for the common good" across the sector, most development organisations are in an increasingly competitive environment when it comes to seeking funds.
This means both that they try to emphasise the origin and value of "their" information, for example by seeking to encourage direct traffic through their own web sites, and also that they often have few resources to invest in collaborative information initiatives. This, as well as the complexity of the quantities and ranges of information that is relevant to development, has resulted in a very fragmented information environment. With a few exceptions, mainly limited to certain well-defined communities of interest, information resources are highly dispersed, metadata is poor, and the inbuilt algorithms of most search engines tend to recognise and reproduce the influence of the more powerful information providers.

If the above concentrates on the information ecology as it exists for professional development organisations working at the international level, the situation for community-level organisations and for individuals marginalised or living in poverty is even worse. A fragmented information environment is inconvenient and inefficient for someone working in an office in Geneva; for someone relying on a pay-per-minute connection in an internet café in Africa it is impossible. Despite all the rhetoric about knowledge equating to power, and the large ICT and communications budgets of the sector as a whole, there are incredibly few resources made available to support information processes which go beyond the idea of the passive recipient. "Passive recipients" are of no value to development. What is needed is for people to respond to information, understand it, adapt it to their own circumstances and use it: or, in other words, to go through the process whereby information becomes knowledge. As has been argued elsewhere:

Formal education undoubtedly helps, but so can many other forms of human interaction. People need to be able to validate information and to think through if and how any of it may be useful to them. In this context, and whatever other mechanisms may be available to help the process, connections with other people are essential, not only as sources of information, but also as means of validation, reflection and action. This is true for everyone, from the fraternity clubs of elite US universities to networks of the most poor and marginalised. "Ki raflé du ki amul yeeré wayé moy ki amul nit", as a Senegalese proverb has it, "the poor person is not the one without clothes but the one without anyone." [3]

The same chapter argues that most people, however marginalised, take part in a number of informational spaces, "some geared to family matters and social obligations, others related to work and income, some perhaps related to politics and governance and to faith" (p. 134). These spaces, each of which will have its own rules and norms depending on its character and purpose, are seldom seen as part of the development knowledge ecology, but in fact they are key to knowledge being created and used by the people whose actions determine whether or not any development actually takes place. At least some of these spaces are also open to interaction with external supporters, but very little work has been done at this level to really understand what types of information are most useful to participants, to what extent and how digital platforms can be used, and what issues of privacy and restricted access arise. It is an area of huge potential but also of many difficulties.
A number of further factors condition whether information can actually be used in the contexts we are discussing.

Value: in many countries much government data is inaccurate or incomplete, often out of date and, whatever the legal requirements may be, often hard to get hold of6. In the absence of reliable free information, some have sought to develop enterprise models which either reward the generation of data and/or seek to charge for its use7. These models may offer better value than public information but, if the end-user has to pay for the data thus provided, are likely to be of less benefit to the very poor.

Security: in the wrong circumstances even the most apparently innocuous information can be misused. Ushahidi8, in Kenya, rightly attracted a lot of attention by its ability to use crowd-sourcing to map emerging trouble spots in the violence that erupted after that country’s elections in 2008. In doing so, the organisation, locally based and in touch with many of the elements in Kenyan society working to end the violence, will have had to exercise judgement as to what information to make public. The same information, used differently, could have had disastrous consequences, as was demonstrated by the use of Radio Mille Collines as an agent of the genocide in Rwanda.

Cost: the factor of cost is ever present. Amidst the marvelling at the growth of mobile phone usage in Africa, issues of inclusion and exclusion can easily get lost. As mobile phones become the main vehicle for communication across extended families, the poor are left with the choice of paying up - sometimes over 20% of their income - or removing themselves from their most life-affirming community. Likewise, and in a cautionary tale for any proposed support of informational spaces, poor women in Zambia found themselves excluded from networks specifically set up to “empower” them (see, for example, [4]).

6 In addition to the two Web Foundation reports cited in vii above, see Raman, N. (2012), “Collecting data in Chennai City and the limits of openness” and Raman, B. (2012), “The Rhetoric of Transparency and its Reality: Transparent Territories, Opaque Power and Empowerment”, both in Community Informatics 8:2, Special Issue: Community Informatics and Open Government Data, http://ci-journal.net/index.php/ciej/issue/view/41
7 See for example http://www.esoko.com/about/
8 http://www.ushahidi.com/

Control: much current discourse about information, particularly in Europe and North America, concerns the desirability of its freedom. However, other cultures may have other perspectives. For example, in New Zealand the indigenous Maori community believe that information about their ancestors belongs to them and is a vital part of their identity. The insistence of the colonial authorities on collecting and retaining information about individual families was a historic bone of contention which has been re-ignited by the idea that the government had the right to put such records on-line9.

4 Implications for Developmental Linked Open Information Initiatives

All of the above pose a number of challenges for anyone wanting to develop linked information solutions in a way which goes beyond basic connection to actually help the poor and the marginalised to improve their conditions. It underlines the need for real understanding of local contexts, the desirability of working with local partners, and the value of participatory engagement with information providers and users if good choices are to be made about process, content and platform.
Some of the challenges are of a general nature; others are more directly connected to issues of linking data and information.

– The need for caution in becoming over-reliant on the data of powerful institutions like the World Bank and OECD. It is hard to avoid using their data, and it is of course positive that they are increasingly making their data openly available, so that it becomes a valuable resource to query. On the other hand, constant reference to these sources can appear to reinforce their dominant roles in the development information ecology and privilege the types of evidence favoured by global policy makers over evidence which enables learning from local contexts.
– In the same vein, there is a pressing need to make sure all relevant voices are heard and that data and other information from the grass roots becomes more visible.
– The term “linked open information” rather than “data” is used in the sub-heading above because, if the aim is to support the information needs of the poor, there is no point in privileging data over other types of information. In the situations we are discussing, people are generally lacking many types of information. This poses the challenge, which to some extent has been explored in the IKM Emergent Programme10, of using RDF and other data-linking tools to create links to metadata about other forms of information and to related social media discussions.
– Work on data itself, especially government data, in developing countries is likely to involve political issues around improving its availability, accuracy and timeliness as much as technical development. It may also involve initiatives aimed at deriving data from crowd sourcing, a practice of which there are already examples from India and Kenya.
– The issues of the platform and the affordability of its use are crucial. Mobile phones are an obvious choice, but the cost of their use, particularly for higher-bandwidth applications, can be prohibitive. It may make sense to explore the potential for partnerships with resource centres, telecentres or, where they exist, libraries.
– Reusing schemas can generate links simply and efficiently but again raises issues of politics and influence in whose schemas are used. Standards also need to be established to avoid the development of an “official” development linked open data ecosystem in which the powerful professional organisations exchange data and which excludes interaction with other sources of information and data.
– Developing thesauri and ontologies capable of transcending the multi-lingual, multi-disciplinary realities of development will be a real challenge for what are referred to in the literature as “emergent ontologies”, “heterogeneous ontologies” or “dynamic networked ontologies”. The issue of “semantic interoperability” is also highlighted in the “Report on Open Government Data in India” cited above.

9 Plenary floor exchange at First Global Congress on Community Networking, Barcelona, Nov 2000
10 See http://wiki.ikmemergent.net/index.php/Workspaces:1._Information_artefacts and http://wiki.ikmemergent.net/index.php/Workspaces:1:Linked_Open_Data

5 Conclusion

Finally, as the authors of a report on women’s use of ICT in Mozambique conclude, there are limits to what can be achieved by the use of ICTs alone. Their impact can be immeasurably strengthened if they can be deployed alongside other more traditional tools of empowerment.

“Literacy is key - without literacy there can be no empowerment, particularly for women and girls.
We therefore strongly recommend the improvement of women’s literacy in rural areas. We believe that women’s literacy, combined with increased relevance of content, could result in computer-related ICT tools becoming an asset to women’s pursuit of the means for survival and for control of their lives.” [5]

What this conclusion suggests, apart from the point it so clearly makes, is that the process of linking information with meaningful change is far from straightforward and is unlikely to be taken very far by the development of one set of technologies produced in isolation. Collaboration in more broadly based change efforts perhaps offers a better chance of greater impact, as well as providing an opportunity for discussions of assumptions, needs and options with people rooted in their communities, to better understand their needs and create a process of mutual learning.

References

1. Wright, G., Prakash, P., Abraham, S. and Shah, N. (2011). “Report on Open Government Data in India”, Centre for Internet and Society, Bangalore. http://www.transparency-initiative.org/reports/open-government-data-study-india
2. UNRISD (2005). Understanding Informational Developments: a reflection on key research issues. Conference Report. p. 2.
3. Powell and Cummings (2011). Missed understandings: how ICT might yet prompt change in development. In Zavazava and Perez-Chavolla (eds), “The Role of ICT in Advancing Growth in Least Developed Countries: Trends, Challenges and Opportunities”. International Telecommunication Union (ITU), Geneva. http://www.itu.int/pub/D-LDC-ICTLDC.2011, p. 132.
4. Abraham, K.B. (2009). The names in your address book: are mobile phone networks effective in advocating women’s rights in Zambia? In Buskens, I. and Webb, A. (Eds.), African Women and ICTs: Investigating Technology, Gender and Empowerment (Chapter 9). IDRC, Ottawa. http://www.idrc.ca/EN/Resources/Publications/Pages/IDRCBookDetails.aspx?PublicationID=61
5. Macueve, G., Mandlate, J., Ginger, L., Gaster, P. and Macome, E. (2009). Women’s use of information and communication technologies in Mozambique: a tool for empowerment. In Buskens, I. and Webb, A. (Eds.), African Women and ICTs: Investigating Technology, Gender and Empowerment (Chapter 2, p. 30).

Downscaling Entity Registries for Ad-Hoc Environments

Philippe Cudré-Mauroux1, Gianluca Demartini1, Iliya Enchev1, Christophe Guéret2, and Benoit Perroud3*

1 eXascale Infolab, University of Fribourg, Switzerland {firstname.lastname}@unifr.ch
2 Vrije Universiteit Amsterdam, the Netherlands c.d.m.gueret@vu.nl
3 VeriSign Inc., Fribourg, Switzerland bperroud@verisign.com

* Authors are listed in alphabetical order.

Abstract. Web of Objects and Linked Data applications often assume that connectivity to data repositories and entity resolution services is always available. This may not be a valid assumption in many cases. Indeed, there are about 4.5 billion people in the world who have no or limited Web access. Many data-driven applications may have a critical impact on the life of those people, but are inaccessible to those populations due to the architecture of today’s data registries. In this paper, we point out the limitations of current entity registries when deployed in poorly connected or ad-hoc environments. We then sketch new architectures based on IPv6, structured P2P networks and data replication for entity registries that could run in ad-hoc environments with limited Internet connectivity.
1 Introduction

Data registries are critical components of the Web architecture and are widely used in everyday Web activities. For example, domain name registries are databases containing registered Internet domain names. They are necessary for all Web users wishing to visit a website knowing its hostname rather than its IP address: thanks to the Domain Name System (DNS) infrastructure, such information can be obtained by recursively resolving a domain name to an IP address. Another example of registry is the Digital Object Architecture (DOA, see Section 2). It allows unique identifiers to be assigned to digital objects (e.g., scientific publications), which can then be univocally accessed by users. Their identity will thus last in time even if their physical location changes (similarly to the IP address associated with a domain name, which often changes over time).

In situations where data registries are not continuously accessible, the user experience can be strongly limited. As a basic example, if the DNS server used by a client computer is not connected to the rest of the DNS hierarchy, then only a very restricted set of Internet domains can be resolved to their IP addresses.

In addition to those traditional registries offering hash-table-like functionalities, online infrastructures and applications are increasingly turning to more flexible registries containing information about general objects or entities (i.e., object registries and entity registries) to power data-driven applications. Emerging examples of that trend are DBpedia4 and Freebase5 which, given an entity identifier (e.g., a URI), provide semi-structured metadata about the entity. Other examples are the registries used for the Web of Objects, which mediate information between networks of digital devices connecting to each other, enabling information publication or integration in sensor networks or smart building contexts, for example. In those cases, the infrastructure needed is more complex than a traditional “hostname-IP” DNS system and is closer to a global registry mapping unique identifiers to arbitrary structured data. Here too, discontinued access to the registry strongly limits the user experience: it typically results in the impossibility to publish data or to issue object queries.

In this paper, we argue that data registries increasingly represent an essential part of today’s Internet ecology (see Section 2 for a few examples), but that their current architecture precludes their use in many important contexts. For example, we can envision ad-hoc environments where the nodes self-organize without having access to third-party registries. In such transient and poorly-connected environments, nodes have a clear need to discover, connect, and exchange (structured) information with related entities locally, and should be able to do so without resorting to any outside registry. Another interesting context is data-intensive object applications, where nodes have to discover and exchange data about very large numbers of entities and should be able to do so in a peer-to-peer manner whenever possible.
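To make this dependence on registries concrete, consider the first step of almost any name-based Web interaction. The following minimal sketch (Python, standard library only; the hostname is merely an example) shows that when no registry in the DNS hierarchy is reachable, resolution fails before any application data can flow:

import socket

def resolve(hostname):
    # Map a domain name to an IP address via the local DNS resolver.
    try:
        return socket.gethostbyname(hostname)  # recursive DNS resolution
    except socket.gaierror:
        # No reachable registry: the name cannot be mapped to an address.
        return None

print(resolve("dbpedia.org"))  # an IP address when connected, None otherwise

The same failure mode applies, one layer up, to object and entity registries: without a reachable resolution service, identifiers can no longer be mapped to content or metadata.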
We propose in this paper different approaches to overcome the limitations of existing registry solutions in poorly connected or localized environments. First, we point out the limitations of current registry architectures, before proposing solutions that take into account the limited connectivity of the peers and enable the management of digital information in ad-hoc environments.

The remainder of this paper is structured as follows. In the rest of this section we motivate our work: we show why it is important to consider entity registries for ad-hoc environments and what their benefits are. In Section 2 we briefly describe existing architectures for data and entity registries. Section 3 highlights the problems of current solutions when applied to poorly connected or ad-hoc environments. Section 4 presents alternative solutions to exploit entity registries, based on IPv6 addresses, on P2P networks, and on content replication. Finally, we conclude and discuss future work in Section 5.

4 http://dbpedia.org/
5 http://www.freebase.com/

1.1 Use case: Internet-less Mesh Networks

A rapidly increasing number of applications, such as open social applications, applications relying on governmental data (data.gov) or Linked Open Data6, assume ubiquitous and continuous access to the Internet in order to power data sharing and data-driven applications. As pointed out in our previous work [6], this is not a safe assumption, as more than half of the world’s population is cut off from wide-area networks. However, people who do not have access to the Internet still generate data and need to consume knowledge. For example, children who benefit from the “One Laptop Per Child” project, which aims at providing low-cost laptops (called XO) to developing countries, can be connected to each other. XO laptops are used at school while connected to school servers, but also at home, where connectivity typically cannot be ensured. In such networks, access to centralized registries (e.g., DNS or global object registries) is intermittent at best. When functionalities based on such registries are needed (e.g., entity resolution or entity linking), they have to be emulated or replaced within the local network, and then possibly integrated with the centralized infrastructure when the link to the wide-area network is reestablished.

6 http://linkeddata.org/

We can for example imagine an XO laptop connected to a server (e.g., at school) downloading some Web pages or Wikipedia articles. Afterwards, the laptop is moved to a different location with no Internet connectivity but with the possibility to connect to other XO laptops. This scenario enables the sharing of previously downloaded documents with others who had no possibility to obtain them from the Internet, and the local publication of new entities. In this context, it is often important to identify entities in all documents to enable entity-centric document aggregation and semantic or faceted search. Such aggregations may, for example, support learning applications that present the set of documents users should read when they want to learn about a specific entity (e.g., “Malaria”). Extracting entities from HTML text may be performed automatically by tools running on the XO laptops, or even manually by exploiting crowdsourcing, which can work around the limited computational resources available [4]. After an entity occurrence is identified in the text, it has to be uniquely identified by associating it with the right entity ID in order to foster automated processing.
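This lookup step is exactly where a locally available registry fragment pays off. As a deliberately small illustration (the aliases and the use of DBpedia URIs as entity IDs are our own assumptions, not a prescribed schema), a node could link recognised mentions against a replicated table of entity descriptions:

# Hypothetical local registry fragment: surface forms -> entity IDs.
LOCAL_REGISTRY = {
    "malaria": "http://dbpedia.org/resource/Malaria",
    "paludisme": "http://dbpedia.org/resource/Malaria",  # cross-lingual alias
}

def link_occurrence(surface_form):
    # Return the entity ID for a recognised mention, or None when the
    # local fragment does not cover it and the wide-area registry is
    # unreachable.
    return LOCAL_REGISTRY.get(surface_form.lower())

print(link_occurrence("Malaria"))  # -> http://dbpedia.org/resource/Malaria

Once mentions resolve to stable identifiers, documents about the same entity can be aggregated locally, whatever the state of the uplink.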
XO users can also create new entities themselves (e.g., through data acquisition, document authoring or document enrichment), which should then be propagated and shared with the rest of the local community. For those different tasks, an entity registry containing a relevant set of entity descriptions and identifiers is necessary to streamline and support all data-driven applications in ad-hoc networks.

2 Entity Registries Today

There already exist many solutions to resolve entity names and/or get structured information about entities. One example of entity registry has been proposed in the context of the Okkam project7, where the envisioned system stores a number of entity profiles which can be accessed via keyword or structured queries. More recently, the popularity of Linked Data has made it possible to connect large entity datasets and to make them accessible via SPARQL endpoints. In addition to these, we can imagine adopting well-established platforms such as DNS or DOA and extending them to entity registries.

7 http://www.okkam.org/

2.1 DNS

The Domain Name System (DNS) is the system used on the Internet to resolve domain names to their corresponding IP addresses. Domain resolution works in a hierarchical manner; the top of the domain name space is served by so-called root name servers, pointing to authoritative name servers maintaining authoritative information for the top-level domains (a.k.a. “TLDs”, such as “.ch” or “.com”). The authoritative name servers responsible for the TLDs point in turn to further name servers, responsible for second-level domains (e.g., “unifr.ch”), and so on, processing each domain name label iteratively until the last iteration, which returns the IP address of the domain name queried. In practice, resolution results are often cached at various levels, for instance on the client side, or at the level of the DNS server provided by the Internet Service Provider, in order to limit the load on authoritative DNS servers.

Though originally not designed for this purpose, it is possible to extend the current DNS infrastructure to create a full-fledged entity registry. In that context, we recently suggested an extension of the DNS [3] to serve authoritative metadata about Internet domains, leveraging both the DNS Text Record field (“DNS TXT”) and new cryptographic features (“DNSSEC”).

2.2 DOA

The aim of the Digital Object Architecture (DOA)8 is the management of digital entities over potentially very long timeframes. There are three distinct components in DOA:

– the Resolution System (Handle System)
– the Digital Object Repository (DORepository)
– the Digital Object Registry (DORegistry)

8 http://www.cnri.reston.va.us/doa.html

The principal function of the Handle System is to map known identifiers into handle records containing useful information about the digital object being identified (e.g., IP address, public key, URL, etc.). Every identifier has two parts: a naming authority (or prefix) and a unique local name under that naming authority (the suffix), separated by “/” (e.g., “10.1045/january99-bearman”). The collection of all local names defined under a certain prefix defines the local handle namespace under that prefix (something similar to a root zone in the case of DNS). All the local namespaces (i.e., all prefixes) define the handle namespace, and a prefix can be considered analogous to a top-level domain.
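The two-part structure of handles can be illustrated in a few lines of code. This is only a sketch of the naming convention just described, not of the Handle System’s actual resolution protocol:

def split_handle(handle):
    # A handle is "<naming authority>/<local name>", split at the first "/".
    prefix, _, local_name = handle.partition("/")
    if not local_name:
        raise ValueError("not a handle: missing '/' separator")
    return prefix, local_name

print(split_handle("10.1045/january99-bearman"))  # -> ('10.1045', 'january99-bearman')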
More namespaces for Local Handle Services (LHS) can be defined in a hierarchical fashion under the Global Handle Registry (GHR); the Handle System thus provides a hierarchical service model.

The DORegistry provides services such as browsing, searching, repository management and federation for collections of digital objects that can be distributed across multiple sites, including other DO Registries. A DO Registry may manage metadata of objects from a certain repository. Another possibility is managing both metadata and the actual digital object content stored by the registry, and a third scenario is managing metadata from multiple repositories. The DO Registry can be configured for different types of metadata schemata and can be customized to provide different search, federation, handle registration, event management and other services.

The most popular application of this system is the use of Digital Object Identifiers (DOIs) to identify digital versions of written publications (e.g., scientific articles). Such identifiers, by means of an ID resolution, lead not only to the digital object but also to its metadata. The important benefit of using DOIs is persistent citation: the location of the digital object may change over time, but the identifier remains the same and its resolution leads to the new location.

2.3 Linked Data

The Linked Data movement has been pushing towards publishing and interlinking public data in standard formats, which enables the automated discovery, management and integration of structured resources online. The adopted technology is based on HTTP URIs and RDF. The resolution of an entity given its identifier boils down to three steps in that context:

1. discovering the IP address where the HTTP URI is supposed to be hosted (for example using the DNS);
2. contacting the corresponding server and negotiating the content (e.g., to serve a human-readable version of the RDF data if the client is a Web browser);
3. retrieving the structured description of the entity over HTTP.

This process is commonly called entity dereferencing since it is similar to general URI dereferencing on the Web9.

2.4 The Entity Name System

In the context of the Okkam project, the Entity Name System (ENS) [2] has been proposed. It is defined as a service to resolve entity names to their global identifiers (called Okkam IDs). This is made possible thanks to a repository of entity profiles described as sets of attribute-value pairs, and a mix of matching components that select the correct identifiers for an entity request, which may be submitted in the form of a structured (i.e., attribute-value) or unstructured (i.e., keyword) query.

9 http://www.w3.org/2001/tag/doc/httpRange-14/2007-05-31/HttpRange-14

3 Limitations of Current Entity Registries

Existing data and entity registries are a critical asset for many Internet applications. However, all current architectures present limitations when we consider situations with limited network connectivity or ad-hoc environments. If we imagine networks of computers with no connectivity to external Internet resources, it becomes clear that DNS-based entity registries typically do not work properly, as only partial information will be cached in local and accessible DNS nodes. Moreover, data in the cache may not be up to date, as DNS nodes may not frequently communicate with the rest of the DNS infrastructure.
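These limitations can be made concrete by revisiting the three dereferencing steps of Section 2.3. The sketch below (Python, standard library only; the entity URI and the requested media type are illustrative choices, and a real Linked Data server may negotiate other formats) shows that every step silently assumes wide-area connectivity:

import urllib.request

def dereference(entity_uri):
    # Step 2: content negotiation via the Accept header.
    request = urllib.request.Request(entity_uri,
                                     headers={"Accept": "text/turtle"})
    # Steps 1 and 3: DNS resolution happens implicitly when the HTTP
    # connection is opened; the structured description is then retrieved.
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.read()

# e.g., dereference("http://dbpedia.org/resource/Malaria")

Without connectivity, the call above already fails at step 1 with a resolution error, before any entity data is exchanged.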
Similarly, clients may not be able to resolve DOIs if the servers of the naming authority which issued those IDs are not reachable through the network. Entity registries like the Okkam ENS or LOD endpoints are all based on centralized solutions, which limits their reliability: in ad-hoc environments such central resolution points may or may not be reachable at a given point in time.

For the above-mentioned reasons, we claim that a decentralized solution enriched with full replication of some seed content would be a better approach for an entity registry in ad-hoc environments. A decentralized system based, for example, on structured P2P networks can provide better connectivity thanks to cheap P2P communication and can tolerate situations in which registry servers are not constantly available. Thus, in the following section, we envision solutions that consist of multiple distributed registry instances to optimize the availability of entity registries in ad-hoc environments.

4 Approaches to Downscale Entity Registries

We sketch out below three possible avenues that we have considered for downscaling entity registries in ad-hoc environments. Our solutions are based on IPv6, structured P2P and full replication technologies.

4.1 IPv6 Addresses

For many, the first solution to our problem may be provided by IPv6 technologies. IPv6 guarantees directly-resolvable IP addresses, irrespective of the network topology (e.g., rendering NAT irrelevant). Hence, local entities could be made available using local IP addresses only. One could even envision some address ranges reserved for such a use. While practical and simple, we nevertheless think that solutions directly built on bare IPv6 technologies are limited for two reasons. First, we are missing the indirection layer offered by the DNS or similar registries, which is often essential when migrating or evolving entities (migrating a resource to another IP address would require setting up some redirection mechanism between the previous identifier and the new one). Second, from a networking perspective, this would boil down to mixing two separate layers, namely the networking and the application layers, which would certainly never be endorsed by the central networking companies and bodies.

4.2 Structured P2P Proxies

P2P technologies are another potential solution to our problem. Distributed Hash-Tables (DHTs), such as Chord10 or our P-Grid system [1], provide decentralized, scalable hash-table-like functionalities that could be used to store entity identifiers as well as related meta-information in ad-hoc environments. Through dynamic load-balancing and replication, those networks provide fault-tolerant and efficient networking primitives where arbitrary requests can typically be resolved in O(log(N)) messages, where N is the number of nodes in the P2P network, from any entry point to the network.

P2P technologies have been proposed in the past to enhance or supplant DNS infrastructures11, most often to provide an alternative to ICANN or to support P2P file exchange. Such efforts have had limited success so far. We think that the CoralCDN [5] system, in particular, is relevant to our scenario, since it takes advantage of highly efficient P2P mechanisms (P2P DNS and distributed sloppy hash tables) to create P2P content distribution networks. It however suffers from several severe limitations in our context, including some reliance on high-bandwidth and wide-area connectivity and the lack of any mechanism to serve structured entity content.

10 http://pdos.csail.mit.edu/chord/
11 See for instance http://blogs.computerworld.com/17444/p2p_dns_to_take_on_icann_after_us_domain_seizures
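The hash-table-like primitive that such networks provide can itself be sketched in a few lines. The toy consistent-hashing example below only illustrates how entity identifiers are deterministically assigned to nodes on an identifier ring (the node names are invented for the example); real DHTs such as Chord or P-Grid add the O(log(N)) routing tables, replication and load balancing discussed above:

import hashlib

def ring_position(key, bits=32):
    # Hash a string onto a 2^bits identifier ring.
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % (2 ** bits)

def responsible_node(entity_id, node_names):
    # The first node clockwise from the entity's position stores it.
    ring = sorted((ring_position(n), n) for n in node_names)
    key_pos = ring_position(entity_id)
    for node_pos, name in ring:
        if node_pos >= key_pos:
            return name
    return ring[0][1]  # wrap around the ring

nodes = ["xo-laptop-1", "xo-laptop-2", "school-server"]  # hypothetical peers
print(responsible_node("http://dbpedia.org/resource/Malaria", nodes))

Because the placement rule depends only on the hash function and the current membership, any peer can locate an entity’s record without consulting an outside registry.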
One could hence consider a DHT-based CDN as a starting point to solve our problem, and enhance the infrastructure with a native entity storage system (such as our recent dipLODocus system [7]) and with semi-structured capabilities (e.g., supporting declarative queries). Using such P2P infrastructures, we could for example explore the best possible way to support both entity publication and entity search in ad-hoc networks.

4.3 Entity Nucleus and Lazy Replication

Even though supporting a full-fledged entity registry in ad-hoc settings is a necessity, there are many cases where some of the nodes might connect to centralized infrastructures intermittently. Thus, we believe that it is also essential to be able to cache authoritative or centralized information, and to be able to dynamically synchronize data with such infrastructures.

Depending on the context of the application, some core nucleus of the entity data can be identified (e.g., DBpedia entity data for LOD). In such a case, and if the entity nucleus can be pre-installed on each node, then many of the entity operations can be resolved locally without resorting to any third-party infrastructure.

Networked search and updates, however, still require distributed mechanisms to be resolved. If such operations are deemed relatively infrequent, then semi-structured P2P technologies like those described above can be applied: both distributed queries and updates can for instance be (lazily) propagated or broadcast across the network at regular intervals.

5 Conclusions

Current entity registry solutions are often based on global hierarchies or centralized online directories. Such solutions are inapplicable to many contexts, including ad-hoc networks and environments that have limited access to a wide-area connection. In this paper, we described some of the key problems related to using current entity registries in an ad-hoc context and suggested a few possible alternatives. Three potential solutions were specifically sketched, based on IPv6 technologies, scalable and structured P2P technologies, and (lazy) content replication. We now plan to implement, test and combine both current entity registry solutions and our new alternatives to determine in practice which architecture is most useful given a specific ad-hoc environment. As a start, we plan to focus on the OLPC XO context to provide a working solution and enable Open Data and Linked Entity applications for the billions of people who are currently cut off from wide-area networks.

6 Acknowledgment

This work was supported (in part) by the Swiss National Science Foundation under grant number PP00P2 128459.

References

1. Karl Aberer, Philippe Cudré-Mauroux, Anwitaman Datta, Zoran Despotovic, Manfred Hauswirth, Magdalena Punceva, and Roman Schmidt. P-Grid: a self-organizing structured P2P system. ACM SIGMOD Record, 32(3), 2003.
2. Paolo Bouquet, Heiko Stoermer, Claudia Niederee, and Antonio Mana. Entity Name System: The Backbone of an Open and Scalable Web of Data. In Proceedings of the IEEE International Conference on Semantic Computing, ICSC 2008, number CSS-ICSC 2008-4-28-25 in CSS-ICSC, pages 554–561. IEEE Computer Society, August 2008.
3. Philippe Cudré-Mauroux, Gianluca Demartini, Djellel Eddine Difallah, Ahmed Elsayed Mostafa, Vincenzo Russo, and Matthew Thomas. A Demonstration of DNS3: a Semantic-Aware DNS Service.
In ISWC 2011.
4. Gianluca Demartini, Djellel Eddine Difallah, and Philippe Cudré-Mauroux. ZenCrowd: Leveraging Probabilistic Reasoning and Crowdsourcing Techniques for Large-Scale Entity Linking. In International World Wide Web Conference (WWW), 2012.
5. Michael J. Freedman, Eric Freudenthal, and David Mazières. Democratizing content publication with Coral. In Proceedings of the 1st Conference on Symposium on Networked Systems Design and Implementation - Volume 1, NSDI’04, pages 18–18, Berkeley, CA, USA, 2004. USENIX Association.
6. Christophe Guéret, Stefan Schlobach, Victor De Boer, Anna Bon, and Hans Akkermans. Is data sharing the privilege of a few? Bringing Linked Data to those without the Web. In ISWC 2011 - Outrageous Ideas.
7. Marcin Wylot, Jigé Pont, Mariusz Wisniewski, and Philippe Cudré-Mauroux. dipLODocus[RDF] - Short and Long-Tail RDF Analytics for Massive Webs of Data. In International Semantic Web Conference, pages 778–793, 2011.