=Paper=
{{Paper
|id=Vol-289/paper-2
|storemode=property
|title=Semantic Annotation of Mobile Data for Language Access
|pdfUrl=https://ceur-ws.org/Vol-289/p02.pdf
|volume=Vol-289
|dblpUrl=https://dblp.org/rec/conf/kcap/LenceviciusR07
}}
==Semantic Annotation of Mobile Data for Language Access==
Raimondas Lencevicius, Alexander Ran
Nokia Research Center Cambridge
3 Cambridge Center, Cambridge, MA 02142
Raimondas.Lencevicius@nokia.com, Alexander.Ran@nokia.com
ABSTRACT
Mobile devices both host and collect a significant amount of data that could be interesting to users. To make this data easily accessible, it has to be stored in semantic repositories using a well-defined ontology. Relationships between data from various sources should be explicit. A natural language interface to such data is an attractive option for information access. However, there are semantic gaps between the data repositories and the formal representation of meaning produced by language understanding systems. This paper describes a solution to the issues above. We have implemented a system that converts the mobile data into RDF format and annotates it with information necessary for efficient access via natural language. We have designed and implemented the Natural Query system that automates the interface between a natural language system and the semantic data repository. Language tags are used to map between the natural language meaning representation and the repository elements. Repository graph search is used to discover the knowledge about the repository structure.

Categories and Subject Descriptors
H.3.3 [Information Search and Retrieval], H.5.2 [User Interfaces]: Natural language, I.2.4 [Knowledge Representation Formalisms and Methods]: Semantic networks

Keywords
Semantic annotation, query language, natural language.

1. INTRODUCTION
Natural language based interaction with software is increasingly viewed as a promising addition and sometimes even an alternative to graphical user interfaces (GUIs), especially in the domain of mobile devices. Mobile devices host structured and semi-structured information bases, software services, and integrated devices such as cameras, music players, etc. Mobile devices also make a perfect user interface to the real-world environment. They are constantly carried with the user [2], enabling gathering of user location information. Mobile devices are equipped with more and more sensors including GPS receivers, Bluetooth transmitters and receivers, RFID receivers and others. They also receive and store information about such events as messages, phone calls, meetings, application usage and access to digital services. It is therefore natural to expect that this data should be collected and made accessible on the mobile device.

However, there are some open questions that need to be resolved in order to make this data useful and accessible both to the programs and to the mobile device users. Collected real-world data must be structured and integrated with other information available on mobile devices such as the information found in the user’s phone book or calendar. There also needs to be an intuitive interface that allows flexible access to collected information.

Mobile devices store a rich set of structured information. The address book or phone book application contains names, phone numbers, addresses and affiliations of personal contacts. The calendar application contains entries for meetings with participants, meeting location and time. We exchange messages and calls with people and organizations listed as our contacts. All these data are related. Retrieving these data based on their relations could be very useful for device owners. With such retrieval capabilities they could learn who called them when they were in California, or when their next meeting with Ann from Accenture is. Unfortunately, the relations between different data items are not always recorded explicitly when the events occur or information is entered in some application. Therefore it is important to integrate the collected data by explicating its relation to the data available on the device. To achieve this goal, we have developed an extended PIM ontology that covers all relevant types of information available on the mobile device: from observed events, information from external data stores, to on-device data from several mobile applications. Once the data is structured and augmented with relations, it is stored in an RDF [15] repository.

So far mobile applications have been designed with their own user interfaces, mostly GUIs, and occasional dedicated hardware controls. Most of the application software on the mobile device could benefit from a natural language interface to its functionality that would simplify and streamline performing various tasks.
As a rule, language systems and mobile applications software are developed independently of each other. To recoup the investment in the development of a language system, it must be able to integrate with a broad range of information sources. Unfortunately, information bases are not designed for interaction using natural language. As a result, this integration is mostly an ad hoc, manual process. This severely limits the impact
that maturing language processing technology can have on transforming the way we interact with mobile devices.

In our research, we have investigated ways of creating robust and portable natural language interfaces to semantic repositories. We created a novel Natural Query (NQ) language and data access engine that greatly reduces the costs of providing natural language interfaces to semantic repositories. NQ can heuristically attach an operational semantic interpretation to a database-independent meaning representation of a natural language question over a given semantic repository. NQ enables us to provide a natural language interface to the integrated real-world and on-device data. NQ requires attaching basic linguistic information to structural elements of the semantic repository. In this paper we give a brief overview of such annotations for an ontology in the extended PIM domain.

The paper describes the mobile data conversion into RDF and semantic annotation (Section 2). Additional annotation and knowledge extraction is needed for an automated natural language interface to the data repository (Section 3). Our experience with the system is presented in Section 4. We finish with a description of related work and conclusions.

2. MOBILE DATA INTEGRATION INTO SEMANTIC REPOSITORY
We had to deal with two major data sources: events gathered by a data collection framework and PIM data available from PIM applications. This section describes data from both sources, and the necessary data conversion and integration into the semantic repository.

2.1 Mobile Device Data
Data on mobile devices is owned by different applications. This makes it hard to establish and explicitly indicate semantic relationships between different data items. This situation is acceptable as long as the users can only interact with their data using the limited set of functionality provided by the applications. However, if we open these data for language based access, it becomes necessary to support access to different data items using their semantic relationships. Some examples are referring to people by their affiliations, titles, city of residence or office location; referring to meetings by their participants, subject, or location; referring to received calls by the name of the caller’s organization.

In our project we dealt with data that originated from the phone book application (sometimes also called address book) and the calendar application. Data in these applications are stored in separate Symbian databases [6]. Since these databases cannot be changed without interfering with the functionality of standard applications, we chose to integrate all data in a separate semantic repository. We designed an extended Personal Information Management (PIM) ontology that adequately represented all data items that we were interested in and their relationships. We implemented a set of Python scripts that extract the data from the native databases and import them into the PIM ontology. We used an RDF repository for data storage.

We created the PIM ontology to cover all data available in the device. We considered using such standard ontologies as W3C foaf [8] and vcard [20]. However, the information available on the mobile device was richer than the types supported by standard ontologies; therefore we decided to create our own ontology. Vcard also uses string values for certain objects that we wanted to represent as full-fledged RDF objects with URIs and attributes so they would have identities and we could add information about them. For example, city and country fields are represented as strings in vcard. However, in order to represent even basic geographic relationships, cities and countries must be represented as objects.

We also considered mixing and matching types from several ontologies for our data. This approach has the advantage of using types possibly known by other systems. However, this approach leads to a rather incoherent architecture of the ontology. We decided that creating a single internally consistent ontology was preferable in our case. If needed, classes and properties in our ontology can be related to types in vcard and foaf via equivalence declarations using RDFS and OWL [21].

The main class for contacts in our ontology is the Contact class. It contains address, email, group, phoneNumber and URL attributes. Organizations and persons can be Contacts, so we have Organization and Person classes inheriting from the Contact class. In addition to inherited attributes, the Organization class also has name and representative attributes. The Person class adds affiliation, birthday, btDevice, familyName, givenName, and nickname attributes. The Affiliation class, showing the affiliation of a person with some organization, has organization and title attributes. Part of the ontology relating these classes is shown in Figure 1.

Figure 1. Part of Mobile PIM ontology

The Group class describes groups of contacts, such as office colleagues or baseball friends. It has a contacts attribute that contains contacts belonging to the group and a name attribute.

Location is a generic class describing locations that has a number of subclasses: Address, Country, GPSLocation, GSMLocation, Locality, Pcode and Region. Address represents detailed addresses and contains country, locality, pcode, pobox, region, and street attributes. Country, Locality, Pcode and Region classes are simple with just a name attribute for respective objects. The GSMLocation class describes locations as obtained from the GSM network. It has carrier, cellTower and lac attributes. Carrier is the cellular network operator, cellTower holds a single cell tower ID, and lac is a Location Area Code describing a certain region within the network. GPSLocation specifies locations using latitude and longitude attributes.
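The class hierarchy described in Section 2.1 can be illustrated with a small Python sketch. The class and attribute names are taken from the text; the dictionary layout and the helper functions `ancestors`, `all_attributes` and `is_a` are assumptions made for illustration and are not the authors' implementation, which stores the ontology in an RDF repository.

```python
# Hypothetical in-memory sketch of part of the extended PIM ontology.
# Names follow the text; the data layout is an illustrative assumption.
SUBCLASS_OF = {
    "Organization": "Contact",
    "Person": "Contact",
    "Address": "Location",
    "Country": "Location",
    "GPSLocation": "Location",
    "GSMLocation": "Location",
    "Locality": "Location",
    "Pcode": "Location",
    "Region": "Location",
}

OWN_ATTRIBUTES = {
    "Contact": ["address", "email", "group", "phoneNumber", "URL"],
    "Organization": ["name", "representative"],
    "Person": ["affiliation", "birthday", "btDevice",
               "familyName", "givenName", "nickname"],
    "Affiliation": ["organization", "title"],
    "Group": ["contacts", "name"],
    "Address": ["country", "locality", "pcode", "pobox", "region", "street"],
    "Country": ["name"],
    "Locality": ["name"],
    "Pcode": ["name"],
    "Region": ["name"],
    "GSMLocation": ["carrier", "cellTower", "lac"],
    "GPSLocation": ["latitude", "longitude"],
}


def ancestors(cls):
    """Return cls followed by its superclasses, nearest first."""
    chain = [cls]
    while chain[-1] in SUBCLASS_OF:
        chain.append(SUBCLASS_OF[chain[-1]])
    return chain


def all_attributes(cls):
    """Own attributes of cls plus everything inherited from superclasses."""
    attrs = []
    for c in ancestors(cls):
        attrs.extend(OWN_ATTRIBUTES.get(c, []))
    return attrs


def is_a(cls, target):
    """True if cls is target or a (transitive) subclass of target."""
    return target in ancestors(cls)
```

For example, `all_attributes("Person")` includes `phoneNumber` because Person inherits from Contact, while `is_a("Contact", "Person")` is false.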
The mobile device Calendar application contains information about meetings. The Meeting class has subject, location, participants, start and end attributes.

Message class objects represent messages. They indicate messageSubject, messageBody, receiver and sender.

One of the goals of the semantic web is developing standard universal ontologies. Unfortunately, neither the existing ontologies, nor the one we used in our project can be claimed to be standard. Attributes and data in different applications and domains vary significantly. For example, some calendar applications may specify participants, while others do not. Some address book applications may allow specifying birthdays for contacts, but others do not. Ontologies seem to follow in their structure the applications or uses that their creators considered at the ontology creation time. Classes are created based on particular use cases. Attributes are chosen based on data availability and planned use of that data. Rather than focusing on standardization, we discovered that an important value of an RDF ontology is its extensibility: the ability to accommodate new types and attributes at any time.

2.2 Event Data
For data collection on mobile devices, we have used one of the frameworks available within Nokia to collect events that occur on a mobile device: phone calls, SMS messages, nearby Bluetooth devices, and GSM locations. All of these events are tagged with a timestamp when they occur. For phone calls the device records the phone number called (or the phone number that called the user) and call duration. For messages, the phone number and the message text are recorded. A GSM location change event is recorded when the cell tower associated with the phone changes. Finally, the phone periodically scans for Bluetooth devices in its vicinity and records their names and IDs. All observations are stored in the objects of Observation subclasses: BTDeviceObserved, CallObserved, MessageObserved, and LocationObserved.

Although the gathered data is interesting by itself, it becomes even more useful when properly linked to the data already available in the device. For example, a user may want to know where the person who called them lives. This information could be found by relating the call log to the phone book on the device that maintains the association of phone numbers to people and their addresses. To enable this connection, it is important to collect and preserve semantically relevant information. The connection of gathered information to other data can be achieved through time and location relationships, phone numbers, email addresses, Bluetooth IDs and other inverse functional properties. Time and location can be used to relate data items that are associated with either the same time period or the same location. All event data is time stamped, which makes such associations relatively simple. Location can be related to time stamped data items through the location observed during the same period of time. Unfortunately, for establishing some other relationships there might be no generic approach. For example, in order to connect phone call and message data to other data associated with the phone number, the phone number has to be known in a standard form URI. We used the standard international form of the phone number with country code and long distance code, for example +1 555 555-5555. However, data processing may be needed to infer and attach these codes to some phone numbers that enter the system without such codes. For example, the phone number supplied via caller ID does not always include the country code. Custom code has to be written for many data items to convert them on entry into the form required by the semantic repository.

The attributes of Observation objects connect with other objects of the repository. For example, the phoneNumber attribute of a CallObserved is of type PhoneNumber, which is also used in the attribute phoneNumber of a Person or Organization class. Therefore the gathered data semantically integrates with the on-device data. Common classes are the basis for building relations between data classes belonging to different applications.

Another area where observed data integrates with on-device data is the location information. GSM locations gathered on the phone can be related to geographical locations, such as cities, states or countries. Some data processing and additional relations in the RDF repository are needed for this. We use the partOf relation between different objects to represent geographic or organizational inclusions. For example, a relation can indicate that Boston is a part of Massachusetts, which in turn is a part of the USA. This attribute is also used to describe the GSM location containment within a certain geographical object. Since GSM locations are somewhat imprecise, we have chosen to associate them with town or city level geographical entities. This provides sufficient information in most cases. If a more precise location can be determined, it could be associated with a city neighborhood, street, house or even part of the office building.

For some other data, programs or users have to add information to facilitate integration. Bluetooth device IDs need to be associated with specific persons, since such an association is not usually available in the mobile device phone book. For this reason we added the btDevice attribute to the Person class. It has to be filled in with concrete values in order to associate the BTDeviceObserved observation with a specific person carrying a Bluetooth device.

2.3 Discussion
In a number of cases we had to decide whether to represent particular entities as strings or as objects using URIs. It seems that constructing an object is almost always worthwhile, since such objects can be later used for inter-object relations. For example, by having Country, Region and City objects, we are able to indicate partOf relations between them. Also, a single URI for a particular object, for example a city, makes it possible to detect such connections as people living or working in a single city.

Overall, we found that our RDF repository is significantly more flexible than a relational database. It naturally supports multiple classes of contacts, multiple affiliations per person, and a sophisticated typing system.

3. NATURAL LANGUAGE INTERFACE
Although the repository of integrated real-world and in-device data can be used in a variety of ways, for example by querying it using SPARQL [19], we were interested in providing an intuitive and flexible user interface to it. A general natural language interface to a rich data set could be more effective than a GUI based application.
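The integration steps of Section 2.2 (bringing phone numbers into a standard international form and using the partOf relation for geographic containment) can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the conversion code used in the project: the function names, the default country code "1", the rule that a number without a leading "+" already contains its long distance code, and the sample phone book entry are all hypothetical.

```python
import re


def normalize_phone(raw, default_country_code="1"):
    """Return '+<digits>'; prepend a default country code when none is given.

    Assumption: a number without a leading '+' is a national number that
    already includes its long distance (area) code.
    """
    digits = re.sub(r"\D", "", raw)
    if raw.strip().startswith("+"):
        return "+" + digits
    return "+" + default_country_code + digits


# partOf relation stored as child -> parent (example from the text).
PART_OF = {"Boston": "Massachusetts", "Massachusetts": "USA"}


def is_part_of(place, region):
    """True if place equals region or is transitively contained in it."""
    while place is not None:
        if place == region:
            return True
        place = PART_OF.get(place)
    return False


# Hypothetical phone book: normalized number -> (person, city of residence).
phone_book = {normalize_phone("+1 555 555-5555"): ("Ann", "Boston")}


def callers_living_in(observed_numbers, region):
    """Names of callers whose phone book city lies inside region."""
    names = []
    for raw in observed_numbers:
        entry = phone_book.get(normalize_phone(raw))
        if entry and is_part_of(entry[1], region):
            names.append(entry[0])
    return names
```

With this sketch, a caller ID reported as "(555) 555-5555" resolves to the same key as "+1 555 555-5555", so the observed call can be joined with the phone book entry and then related to any enclosing region through partOf.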
As a rule, information bases and language systems are developed independently of each other. Therefore information bases are not designed for interaction using natural language and their integration is mostly an ad hoc, manual process. Figure 2 is a sketch of a typical architecture that is used to provide a natural language interface to databases and other back-end or native services.

Figure 2. Architecture sketch of Natural Language Interface to Services

The speech recognition and generation components translate between text and speech modalities. The language understanding component converts the text into a formal representation of meaning sometimes called a semantic frame [17]. The language generation component converts the formal meaning representation to a natural language text [1]. The dialog manager uses the context of conversation to complete frames received from the language understanding module or created by the custom integration code from responses of backend services. The custom integration code also translates meaning representation frames it receives from the dialog manager into a standard database query or backend specific API requests.

Let us assume the user asks the system about contacts in some organization and geographical location:

Who do I know at IBM Ulm?
Who are my contacts at IBM in Ulm?
What are the names of my contacts at IBM in Ulm?¹

¹ The name of the organization and the city were selected for shortness and carry no other information.

The operational semantics of these questions can be adequately represented with a database query. Let us consider how this request would need to be posed to an RDF repository. The SPARQL [19] query corresponding to our example question over the ontology shown in Figure 1 looks as follows:

 SELECT DISTINCT ?person ?givenName ?familyName
 FROM
 WHERE {
   ?person a pim:Person ;
           pim:givenName ?givenName ;
           pim:familyName ?familyName ;
           pim:affiliation ?affiliation ;
           pim:address ?person_address .
   ?affiliation pim:organization ?organization .
   ?organization pim:address ?organization_address ;
                 pim:name "IBM" .
   { ?person_address pim:locality "Ulm" } UNION
   { ?organization_address pim:locality "Ulm" }
 }

Unfortunately, in order for a language system to generate such a semantic representation from the original questions, the language system must contain a large amount of information about the structure of the database and its content. Such information includes the facts that IBM is a name of an organization and Ulm is a name of a city, cities can be related to organizations through their addresses, organizations are related to people through their affiliations, people are related to cities through their home and office addresses, and all these relationships and objects are represented by the specific structures and entities of the database.

Entering such information into a language system is a tedious and costly process that is not only domain dependent but is also sensitive to specific choices of database organization. There is an obvious advantage in maintaining some independence between the database and the language system. One way to achieve this independence is to have the language system generate semantic representations of the questions that are as independent of the database organization as possible.

In the example above, the semantic information contained in the question and independent of database organization amounts to the following meaning representation:

 contact.name: ?
 organization: IBM
 city: Ulm

It is possible to have the language system produce such a database-independent meaning representation of questions. But is the information in such a meaning representation of the question sufficient to perform the requested operation? Obviously there are several information gaps between this database-independent meaning representation and the database-specific semantic representation of the question in the form of a formal query.

The first gap is due to different names used to refer to the same elements in the language system and the repository. For example, the category called “city” in the language system corresponds to the attribute locality of the Address class. Therefore there is a need to maintain the mapping between the two naming systems.

The second kind of gap between the two systems is that one element in the language system may correspond to multiple elements in the repository and vice versa. In our example the reference to the address can map to home address, work address, or the organization address of the contact. This is partly due to the ambiguity of natural language, which is not the main focus of our discussion in this paper. There are also situations where the granularity of categorization is different between natural language and repository representations. This happens when several
different concepts exist in the repository for objects which are viewed as instances of the same concept in natural language. In our example this gap required the UNION in the query to represent the original natural language request.

The third and most important source of the information gap between the meaning representation of the natural language request and the SPARQL query is that the query must specify the navigation to the information in the repository using the repository structure. This information about the repository organization is entirely absent from the natural language question and cannot appear in a database-independent semantic representation.

We have designed and implemented the Natural Query (NQ) language and engine [14] that bridges the gaps identified above, thus opening a way for portable (database-independent) natural language interfaces to semantic repositories. NQ can automatically map the meaning representation produced by language systems into precise queries. NQ employs two mechanisms, language tags and data graph search, to return requested data using only the information in the database-independent meaning representation of the user request.

3.1 Language Tags
Language tags are words, expressions, and linguistic tokens attached to database elements such as classes and properties. Multiple tags can be attached to a single element and a single tag can be attached to multiple elements. Language tags are the names of the corresponding categories used by the language system(s). When a language system produces a form like the one in our example,

 contact.name: ?
 organization: IBM
 city: Ulm

under the NQ system its interpretation is:

find the attributes tagged as “name” of an instance of the class tagged as “contact” related through properties tagged as “organization” and “city” to values “IBM” and “Ulm” respectively

Figure 3. Language tags for database elements (labels in the figure: Contact, First name, Name, Last name, Address, City, in)

Language tags provide an opportunity for a semantic annotation additional to the class names and their properties. In a natural language system accessing RDF repository data, we have three layers of semantic information: the database-independent meaning representation, the data and ontology, and the language tags. It could be argued that if there were a correspondence between the categories of the database-independent meaning representation and the data and ontology, the language tags would not be needed. Unfortunately, if the ontology and language system are to be developed independently, there is no way to maintain or ensure such a match. Thus language tags provide the many-to-many mapping between the two independent systems of categorization and eliminate the first and second kind of information gaps between the meaning representation and semantic repositories.

Figure 3 illustrates language tags associated with a part of our PIM ontology. A generalization like “Contact” can be attached to specific classes like “Person” and “Organization”. A general reference like “Name” can be attached to multiple elements like “givenName”, “familyName”, and so on. In our RDF repository of real-world and in-device data, we added language tags to the RDF objects using a subproperty of the RDFS label field.

3.2 Graph search
The third gap that exists between the database-independent meaning representation of the natural language request and the formal query that actuates it over a given database is the information about the organization of the data repository. In order to navigate from the given attributes of an object to the target of the query, SPARQL queries need to know the specific path that connects them on the database graph. In current language systems, this path is encoded by the query and stored in the custom integration code for every different type of query. Thus a query defines a subgraph with given properties, some of which are specified in the database-independent meaning representation of the natural language request and some of which are encoded in the custom integration code component.

While a formal query defines a connected subgraph as illustrated in Figure 4, the database-independent meaning representation only identifies some nodes and edges of this subgraph. The identified fragment might be disconnected. In the example above it identifies the “Person” and “Organization” classes as well as the “Ulm” value of the “locality” property (by reference to its language tag “city”) and “IBM” as a value of the “name” property of an instance of the “Organization” class. This leads to an important idea: the knowledge embedded in the formal queries that know the database organization can also be extracted from the natural language meaning representation and the data repository itself.

Figure 4. Answering query via graph search
In Figure 4 it is possible to notice that for a given set of The system can answer questions ranging from “What is the
elements identified by a meaning representation of natural email of John?” to “Where does Ann work?” to “My meetings next
language request it is possible to identify the query subgraph by week in Cambridge with John from MIT” and “Who called me
searching the database. In other words, a program could find paths yesterday during the meeting with Ann?”. Some of these
connecting the nodes known from the meaning representation, questions would convert to quite complex relational or SPARQL
such as “Person”, “name”, “Organization”, “City”, “Ulm”, and queries. For example for the query “Who called me yesterday”, we
“IBM”. One of such paths is highlighted in the picture. need to find all telephone numbers of calls that occurred yesterday
and then find all people who have these telephone numbers. NQ
Therefore while traditional approaches to semantic analysis
query for this is very simple: “:select ‘Person’ :where ("Received
of natural language questions over databases rely on hand crafted
Call", Time ('yesterday'))”.
code or data for representing the information about the
organization of the database, NQ extracts such knowledge from If we classified questions according to domains, one domain
the data repository by using graph search. Given a question “Who would contain questions about the personal information data from
are my contacts at IBM in Ulm?”, NQ finds paths connecting the an address book application, for example “Who works as a real
nodes known from the database independent meaning estate broker?”. Another set of questions is about meetings, for
representation, such as “Person”, “name”, “Organization”, example, “When are my meetings next month at MIT?”. Yet
“City”, “Ulm”, and “IBM”. another set is about calls and messages, for example, “Who called
me last Friday?”. Finally there are questions spanning multiple
3.3 NQE Discussion domains, for example, “What are emails of people who
NQE may find multiple subgraphs that connect all given participated in a meeting on Monday?”, “Who called me when I
elements. In such cases we apply heuristic ranking of these was in Finland?”, and so on. All these types of queries were
subgraphs in order to determine the most relevant ones. So far we successfully created and executed on the extended PIM data store.
experimented with several ranking mechanisms all of which are
We found out that we could easily ask questions both about
variations on path length (weight) between the elements specified
the in-device data and the collected real-world data. Semantic
by the meaning representation. In all our experiments the results
integration of multiple data sources enhanced our question
retrieved by the system in response to natural language questions
answering capability significantly, allowing such questions as
correspond well with intuition of human subjects.
“Who called me when I was in Helsinki?”, “Which messages did I
The results returned by NQE are designed to support the receive during the meeting with Juha?”, etc. Although an out-of-
needs of conversational interfaces. If no results are found that pattern detection of someone’s Bluetooth device is a weak
match the elements specified in the meaning representation, NQE indication the phone user met the owner of the Bluetooth device,
returns best matches that include only a subset of elements in the in our experiments we assumed such implication. This allowed us
query. For example, if no contacts at IBM in Ulm can be found, to ask questions such as “Who did I meet last week?” or “At what
contacts at IBM in other cities would be returned as well as time did I meet Ann last Saturday?”
contact from Ulm that are not affiliated with IBM
NQE can perform basic reasoning over type hierarchy. A
“Person” is substitutable for a “Contact”, a
“MobilePhoneNumber” for a “PhoneNumber”, but the opposite is
not true. NQE supports organizational and geographic inclusion
and can perform corresponding reasoning. When a calendar
application lists meetings in Helsinki and Oulu, NQE can answer
questions regarding meetings in Finland, where these cities are
located. Similarly information about organizational structure can
be used to answer questions about Nokia while the database only
records Nokia’s internal organizations like Multimedia or
Enterprise Solutions. Finally NQE creates structures that can be
used to produce explanations regarding how the answers relate to
the questions.
We have created a proof-of-concept implementation of NQ in
Python [12] that runs on S60 [16] mobile phones. A full description
of the Natural Query system implementation is outside the scope
of this paper.
4. EXPERIENCE WITH THE SYSTEM
We tested our system on a PIM test data set containing 550
contacts with about 150 meetings and 250 phone calls, which is
normal for executives with many active contacts and frequent
meetings. The repository contained over 11000 RDF triples. We
asked over 50 natural queries corresponding to over 600
parameterized questions.
Figure 5. Example question and answer
Test NQ queries mostly returned expected answers (96%
recall, 92% precision) (Figure 5) including the approximate
answers where the exact answers were not available. For example,
the question “When was my meetings with Sam last month?” had
no exact answers, so the system returned approximate answers of
meetings with Sam that did not occur last month as well as the
meetings that occurred last month, but did not include Sam.
The performance of the system was acceptable, with answers
taking from less than a second to several seconds. The system
implementation is a prototype written in Python that was not
optimized for memory or speed. The detailed evaluation of system
performance is outside the scope of this paper. We are planning to
optimize the system performance in the near future.
5. RELATED AND FUTURE WORK
Mobile data storage in RDF repositories is investigated by the
ConnectingMe [9] project at Nokia Research Center. We have
collaborated with ConnectingMe in the ontology and repository
development. Some tools for data extraction and conversion are
shared between our two projects.
Semantic markup and annotation of web [7][4] and media
[3] data is a topic of active research. Our research is related to
mobile media data annotation. There has been a lot of research on
ontology creation tools. We used one such tool, Protégé [11],
to design our extended PIM ontology.
Event data has been gathered on mobile devices by a number
of projects including Context [13] and Reality Mining [5]. In our
work, we have extended one of the data gathering frameworks
available at Nokia.
We have not discovered any research directly corresponding
to the Natural Query approach. The Precise system by Popescu et
al. [10] attaches language tokens to database elements in a way
very similar to the language tags of NQ. The query derivation
approach of Precise is also based on database graph search. NQ
uses a more flexible data model, supports incomplete answers, and
collects data for explanations.
In the future, we plan to connect our system to such natural
language and speech systems as TINA [17] and Galaxy [18]. We
plan to perform user trials to evaluate our system and its user
interface to real world data. We will collect additional data such
as email messages, songs listened to, and pictures viewed and taken.
We will also optimize the current prototype implementation.
6. CONCLUSIONS
Mobile devices are now able to continuously collect various
events interesting to the user. Mobile devices also host structured
and semi-structured information bases. We have demonstrated the
integration of all this data using a flexible and powerful RDF
repository and a common ontology. We have designed and
implemented a query language and engine, NQ, that can
automatically map the meaning representation produced by language
systems into formal queries on RDF repositories. We have used
language tags for mapping of the meaning representation to the
data classes. NQ uses graph search to extract the information
about the repository’s structure. Our experience shows that
semantic data annotation and knowledge extraction significantly
improve the capability of natural language interfaces to mobile
data.
7. REFERENCES
[1] Baptist L. and S. Seneff, "Genesis-II: A Versatile System for Language Generation in Conversational System Applications," Proc. ICSLP '00, Vol. III, pp. 271-274, Beijing, China, Oct. 2000.
[2] Chipchase, J., “Why do People Carry Mobile Phones?”, http://www.janchipchase.com/blog/archives/2005/11/mobile_essentia.html, 2005.
[3] Davis, M., King, S., Good, N., Sarvas, R., “From Context to Content: Leveraging Context to Infer Media Metadata”, Proceedings of the 12th annual ACM international Conference on Multimedia, New York, NY, USA, pp: 188 – 195, 2004.
[4] Dill, S, et al., “SemTag and Seeker: Bootstrapping the Semantic Web via Automated Semantic Annotation”, Proceedings of the 12th international conference on World Wide Web, Budapest, Hungary, pp: 178 – 186, 2003.
[5] N. Eagle, "Machine Perception and Learning of Complex Social Systems", Ph.D. Thesis, Program in Media Arts and Sciences, Massachusetts Institute of Technology, June 2005.
[6] Edwards, L., Barker, R., et al. “Developing Series 60 Applications”, Addison Wesley 2004.
[7] M. Erdmann, A. Maedche, H.P. Schnurr, S. Staab, “From manual to semi-automatic semantic annotation: About ontology-based text annotation tools”, Proceedings of the Workshop on Semantic Annotation and Intelligent Content, 2000.
[8] FOAF Vocabulary Specification 0.9, http://xmlns.com/foaf/0.1/, 2007.
[9] Lassila, O. et al, “ConnectingMe”, http://research.nokia.com/research/projects/connectingme/index.html, 2007.
[10] Popescu, A., Etzioni, O., and Kautz, H. 2003. Towards a theory of natural language interfaces to databases. Proceedings of the 8th international Conference on intelligent User interfaces (Miami, Florida, USA, January 12 - 15, 2003). IUI '03. ACM Press, New York, NY, 149-157.
[11] Protégé Ontology Editor and Knowledge Acquisition System, http://protege.stanford.edu/, 2007.
[12] Python for S60, http://sourceforge.net/projects/pys60, 2007.
[13] Mika Raento, “Context software - A prototype platform for contextual mobile applications”. Proceedings of the International Proactive Computing Workshop. University of Helsinki, 2004.
[14] Ran, A., and Lencevicius, R., “Natural Language Query System for RDF Repositories”, To appear in Proceedings of the Seventh International Symposium on Natural Language Processing, SNLP 2007, 2007.
[15] Resource Description Framework, http://www.w3.org/RDF/, 2007.
[16] S60 platform, http://www.s60.com, 2007.
[17] S. Seneff, "TINA: A natural language system for spoken language applications," Computational Linguistics, vol. 18, no. 1, pp. 61-86, March 1992.
[18] S. Seneff, E. Hurley, R. Lau, C. Pao, P. Schmid, and V. Zue, "GALAXY-II: A Reference Architecture for Conversational System Development," Proc. ICSLP 98, Sydney, Australia, November 1998.
[19] SPARQL Query Language for RDF, http://www.w3.org/TR/rdf-sparql-query/, 2007.
[20] Vcard, http://www.w3.org/TR/vcard-rdf, 2007.
[21] Web Ontology Language, http://www.w3.org/TR/owl-features/, 2007.