Semantic Annotation of Mobile Data for Language Access

Raimondas Lencevicius
Nokia Research Center Cambridge
3 Cambridge Center
Cambridge, MA 02142
Raimondas.Lencevicius@nokia.com

Alexander Ran
Nokia Research Center Cambridge
3 Cambridge Center
Cambridge, MA 02142
Alexander.Ran@nokia.com

ABSTRACT
Mobile devices both host and collect a significant amount of data that could be interesting to users. To make this data easily accessible, it has to be stored in semantic repositories using a well-defined ontology. Relationships between data from various sources should be explicit. A natural language interface to such data is an attractive option for information access. However, there are semantic gaps between the data repositories and the formal representation of meaning produced by language understanding systems. This paper describes a solution to the issues above. We have implemented a system that converts the mobile data into RDF format and annotates it with information necessary for efficient access via natural language. We have designed and implemented the Natural Query system that automates the interface between a natural language system and the semantic data repository. Language tags are used to map between the natural language meaning representation and the repository elements. Repository graph search is used to discover the knowledge about the repository structure.

Categories and Subject Descriptors
H.3.3 [Information Search and Retrieval], H.5.2 [User Interfaces]: Natural language, I.2.4 [Knowledge Representation Formalisms and Methods]: Semantic networks

Keywords
Semantic annotation, query language, natural language.

1. INTRODUCTION
Natural language based interaction with software is increasingly viewed as a promising addition and sometimes even alternative to graphical user interfaces (GUIs), especially in the domain of mobile devices. Mobile devices host structured and semi-structured information bases, software services, and integrated devices such as cameras, music players, etc. Mobile devices also make a perfect user interface to the real-world environment. They are constantly carried with the user [2], enabling gathering of user location information. Mobile devices are equipped with more and more sensors including GPS receivers, Bluetooth transmitters and receivers, RFID receivers and others. They also receive and store information about such events as messages, phone calls, meetings, application usage and access to digital services. It is therefore natural to expect that this data should be collected and made accessible on the mobile device.

However, there are some open questions that need to be resolved in order to make this data useful and accessible both to the programs and to the mobile device users. Collected real-world data must be structured and integrated with other information available on mobile devices such as the information found in the user’s phone book or calendar. There also needs to be an intuitive interface that allows flexible access to collected information.

Mobile devices store a rich set of structured information. The address book or phone book application contains names, phone numbers, addresses and affiliations of personal contacts. The calendar application contains entries for meetings with participants, meeting location and time. We exchange messages and calls with people and organizations listed as our contacts. All these data are related. Retrieving these data based on their relations could be very useful for device owners. With such retrieval capabilities they could learn who called them when they were in California, or when their next meeting with Ann from Accenture is. Unfortunately, the relations between different data items are not always recorded explicitly when the events occur or information is entered in some application. Therefore it is important to integrate the collected data by explicating its relation to the data available on the device. To achieve this goal, we have developed an extended PIM ontology that covers all relevant types of information available on the mobile device: from observed events, information from external data stores, to on-device data from several mobile applications. Once the data is structured and augmented with relations, it is stored in an RDF [15] repository.

So far mobile applications have been designed with their own user interfaces, mostly GUIs, and occasional dedicated hardware controls. Most of the application software on the mobile device could benefit from a natural language interface to its functionality that would simplify and streamline performing various tasks. As a rule, language systems and mobile application software are developed independently of each other. To recoup the investment in the development of a language system, it must be capable of integrating with a broad range of information sources. Unfortunately, information bases are not designed for interaction using natural language. As a result, this integration is a mostly ad hoc, manual process. This severely limits the impact that maturing language processing technology can have on transforming the way we interact with mobile devices.

In our research, we have investigated ways of creating robust and portable natural language interfaces to semantic repositories. We created a novel Natural Query (NQ) language and data access engine that greatly reduces the costs of providing natural language interfaces to semantic repositories. NQ can heuristically attach an operational semantic interpretation to a database independent meaning representation of a natural language question over a given semantic repository. NQ enables us to provide a natural language interface to the integrated real-world and on-device data. NQ requires attaching basic linguistic information to structural elements of the semantic repository. In this paper we give a brief overview of such annotations for an ontology in the extended PIM domain.

The paper describes the mobile data conversion into RDF and semantic annotation (Section 2). Additional annotation and knowledge extraction is needed for an automated natural language interface to the data repository (Section 3). Our experience with the system is presented in Section 4. We finish with the description of related work and conclusions.

2. MOBILE DATA INTEGRATION INTO SEMANTIC REPOSITORY
We had to deal with two major data sources: events gathered by a data collection framework and PIM data available from PIM applications. This section describes data from both sources, the necessary data conversion and the integration into the semantic repository.

2.1 Mobile Device Data
Data on mobile devices is owned by different applications. This makes it hard to establish and explicitly indicate semantic relationships between different data items. This situation is acceptable as long as the users can only interact with their data using the limited set of functionality provided by the applications. However, if we open these data for language based access, it becomes necessary to support access to different data items using their semantic relationships. Some examples are referring to people by their affiliations, titles, city of residence or office location; referring to meetings by their participants, subject, or location; referring to received calls by the name of the caller’s organization.

In our project we dealt with data that originated from the phone book application (sometimes also called address book) and the calendar application. Data in these applications are stored in separate Symbian databases [6]. Since these databases cannot be changed without interfering with the functionality of standard applications, we chose to integrate all data in a separate semantic repository. We designed an extended Personal Information Management (PIM) ontology that adequately represented all data items that we were interested in and their relationships. We implemented a set of Python scripts that extract the data from the native databases and import them into the PIM ontology. We used an RDF repository for data storage.

We created the PIM ontology to cover all data available in the device. We considered using such standard ontologies as W3C foaf [8] and vcard [20]. However, the information available on the mobile device was richer than the types supported by standard ontologies; therefore we decided to create our own ontology. Vcard also uses string values for certain objects that we wanted to represent as full-fledged RDF objects with URIs and attributes so they would have identities and we could add information about them. For example, city and country fields are represented as strings in vcard. However, in order to represent even basic geographic relationships, cities and countries must be represented as objects.

We also considered mixing and matching types from several ontologies for our data. This approach has the advantage of using types possibly known by other systems. However, this approach leads to a rather incoherent architecture of the ontology. We decided that creating a single internally consistent ontology was preferable in our case. If needed, classes and properties in our ontology can be related to types in vcard and foaf via equivalence declarations using RDFS and OWL [21].

The main class for contacts in our ontology is the Contact class. It contains address, email, group, phoneNumber and URL attributes. Organizations and persons can be Contacts, so we have Organization and Person classes inheriting from the Contact class. In addition to inherited attributes, the Organization class also has name and representative attributes. The Person class adds affiliation, birthday, btDevice, familyName, givenName, and nickname attributes. The Affiliation class, showing the affiliation of a person with some organization, has organization and title attributes. Part of the ontology relating these classes is shown in Figure 1.

Figure 1. Part of Mobile PIM ontology

The Group class describes groups of contacts, such as office colleagues or baseball friends. It has a contacts attribute that contains contacts belonging to the group and a name attribute.

Location is a generic class describing locations that has a number of subclasses: Address, Country, GPSLocation, GSMLocation, Locality, Pcode and Region. Address represents detailed addresses and contains country, locality, pcode, pobox, region, and street attributes. Country, Locality, Pcode and Region classes are simple with just a name attribute for the respective objects. The GSMLocation class describes locations as obtained from the GSM network. It has carrier, cellTower and lac attributes. Carrier is the cellular network operator, cellTower has a single cell tower ID, and lac is a Location Area Code describing a certain region within the network. GPSLocation specifies locations using latitude and longitude attributes.

The mobile device Calendar application contains information about meetings. The Meeting class has subject, location, participants, start and end attributes.

Message class objects represent messages. They indicate messageSubject, messageBody, receiver and sender.

One of the goals of the semantic web is developing standard universal ontologies. Unfortunately, neither the existing ontologies, nor the one we used in our project can be claimed to be standard. Attributes and data in different applications and domains vary significantly. For example, some calendar applications may specify participants, while others do not. Some address book applications may allow specifying birthdays for contacts, but others do not. Ontologies seem to follow in their structure the applications or uses that their creators considered at the ontology creation time. Classes are created based on particular use cases. Attributes are chosen based on data availability and planned use of that data. Rather than focusing on standardization, we discovered that an important value of an RDF ontology is its extensibility: the ability to accommodate new types and attributes at any time.

2.2 Event Data
For data collection on mobile devices, we have used one of the frameworks available within Nokia to collect events that occur on a mobile device: phone calls, SMS messages, nearby Bluetooth devices, and GSM locations. All of these events are tagged with a timestamp when they occur. For phone calls the device records the phone number called (or the phone number that called the user) and the call duration. For messages, the phone number and the message text are recorded. A GSM location change event is recorded when the cell tower associated with the phone changes. Finally, the phone periodically scans for Bluetooth devices in its vicinity and records their names and IDs. All observations are stored in the objects of Observation subclasses: BTDeviceObserved, CallObserved, MessageObserved, and LocationObserved.

Although the gathered data is interesting by itself, it becomes even more useful when properly linked to the data already available in the device. For example, a user may want to know where the person who called them lives. This information could be found by relating the call log to the phone book on the device, which maintains the association of phone numbers to people and their addresses. To enable this connection, it is important to collect and preserve semantically relevant information. The connection of gathered information to other data can be achieved through time and location relationships, phone numbers, email addresses, Bluetooth IDs and other inverse functional properties. Time and location can be used to relate data items that are associated with either the same time period or the same location. All event data is time stamped, which makes such associations relatively simple. Location can be related to time stamped data items through the location observed during the same period of time. Unfortunately, there might be no generic approach for establishing some other relationships. For example, in order to connect phone call and message data to other data associated with the phone number, the phone number has to be known in a standard form URI. We used the standard international form of the phone number with country code and long distance code, for example +1 555 555-5555. However, data processing may be needed to infer and attach these codes to some phone numbers that enter the system without such codes. For example, the phone number supplied via caller ID does not always include the country code. Custom code has to be written for many data items to convert them on entry into the form required by the semantic repository.

The attributes of Observation objects connect with other objects of the repository. For example, the phoneNumber attribute of a CallObserved is of type PhoneNumber, which is also used in the attribute phoneNumber of a Person or Organization class. Therefore the gathered data semantically integrates with the on-device data. Common classes are the basis for building relations between data classes belonging to different applications.

Another area where observed data integrates with on-device data is the location information. GSM locations gathered on the phone can be related to geographical locations, such as cities, states or countries. Some data processing and additional relations in the RDF repository are needed for this. We use the partOf relation between different objects to represent geographic or organizational inclusions. For example, a relation can indicate that Boston is a part of Massachusetts, which in turn is a part of the USA. This attribute is also used to describe the GSM location containment within a certain geographical object. Since GSM locations are somewhat imprecise, we have chosen to associate them with town or city level geographical entities. This provides sufficient information in most cases. If a more precise location can be determined, it could be associated with a city neighborhood, street, house or even part of an office building.

For some other data, programs or users have to add information to facilitate integration. Bluetooth device IDs need to be associated with specific persons, since such an association is not usually available in the mobile device phone book. For this reason we added the btDevice attribute to the Person class. It has to be filled in with concrete values in order to associate the BTDeviceObserved observation with a specific person carrying a Bluetooth device.

2.3 Discussion
In a number of cases we had to decide whether to represent particular entities as strings or as objects using URIs. It seems that constructing an object is almost always worthwhile, since such objects can later be used for inter-object relations. For example, by having Country, Region and City objects, we are able to indicate partOf relations between them. Also, a single URI for a particular object, for example a city, makes it possible to detect such connections as people living or working in a single city.

Overall, we found that our RDF repository is significantly more flexible than a relational database. It naturally supports multiple classes of contacts, multiple affiliations per person, and a sophisticated typing system.

3. NATURAL LANGUAGE INTERFACE
Although the repository of integrated real-world and in-device data can be used in a variety of ways, for example by querying it using SPARQL [19], we were interested in providing an intuitive and flexible user interface to it. A general natural language interface to a rich data set could be more effective than a GUI based application.

As a rule, information bases and language systems are developed independently of each other. Therefore information bases are not designed for interaction using natural language and their integration is a mostly ad hoc, manual process. Figure 2 is a sketch of a typical architecture that is used to provide a natural language interface to databases and other back-end or native services.

Figure 2. Architecture sketch of Natural Language Interface to Services

The speech recognition and generation components translate between text and speech modalities. The language understanding component converts the text into a formal representation of meaning sometimes called a semantic frame [17]. The language generation component converts the formal meaning representation to a natural language text [1]. The dialog manager uses the context of conversation to complete frames received from the language understanding module or created by the custom integration code from responses of backend services. The custom integration code also translates meaning representation frames it receives from the dialog manager into a standard database query or backend specific API requests.

Let us assume the user asks the system about contacts in some organization and geographical location:

    Who do I know at IBM Ulm?
    Who are my contacts at IBM in Ulm?
    What are the names of my contacts at IBM in Ulm? (1)

(1) The names of the organization and the city were selected for brevity and carry no other information.

The operational semantics of these questions can be adequately represented with a database query. Let us consider how this request would need to be posed to an RDF repository. The SPARQL [19] query corresponding to our example question over the ontology shown in Figure 1 looks as follows:

    SELECT DISTINCT $person ?givenName ?familyName
    FROM
    WHERE {
      $person a pim:Person;
              pim:givenName ?givenName;
              pim:familyName ?familyName;
              pim:affiliation ?affiliation;
              pim:address ?person_address.
      $affiliation pim:organization $organization.
      $organization pim:address ?organization_address;
                    pim:name "IBM".
      {?person_address pim:locality "Ulm"} UNION {?organization_address pim:locality "Ulm"}
    }

Unfortunately, in order for a language system to generate such a semantic representation from the original questions, the language system must contain a large amount of information about the structure of the database and its content. Such information includes the facts that IBM is a name of an organization and Ulm is a name of a city, cities can be related to organizations through their addresses, organizations are related to people through their affiliations, people are related to cities through their home and office addresses, and all these relationships and objects are represented by the specific structures and entities of the database. Entering such information into a language system is a tedious and costly process that is not only domain dependent but is also sensitive to specific choices of database organization. There is an obvious advantage in maintaining some independence between the database and the language system.

One way to achieve this independence is to have the language system generate semantic representations of the questions that are as independent of the database organization as possible. In the example above, the semantic information contained in the question and independent of database organization amounts to the following meaning representation:

    contact.name: ?
    organization: IBM
    city: Ulm

It is possible to have the language system produce such a database independent meaning representation of questions. But is the information in such a meaning representation of the question sufficient to perform the requested operation? There are several information gaps between this database independent meaning representation and the database specific semantic representation of the question in the form of a formal query.

The first gap is due to different names used to refer to the same elements in the language system and the repository. For example, the category called “city” in the language system corresponds to the attribute locality of the Address class. Therefore there is a need to maintain the mapping between the two naming systems.

The second kind of gap between the two systems is that one element in the language system may correspond to multiple elements in the repository and vice versa. In our example the reference to the address can map to the home address, work address, or the organization address of the contact. This is partly due to the ambiguity of natural language, which is not the main focus of our discussion in this paper. There are also situations where the granularity of categorization is different between natural language and repository representations. This happens when several different concepts exist in the repository for objects which are viewed as instances of the same concept in natural language. In our example this gap required the UNION in the query to represent the original natural language request.

The third and most important source of the information gap between the meaning representation of the natural language request and the SPARQL query is due to the fact that the query must specify the navigation to the information in the repository using the repository structure. This information about the repository organization is entirely absent from the natural language question and cannot appear in a database independent semantic representation.

We have designed and implemented the Natural Query (NQ) language and engine [14] that bridges the gaps identified above, thus opening a way for portable (database independent) natural language interfaces to semantic repositories. NQ can automatically map a meaning representation produced by language systems into precise queries. NQ employs two mechanisms, language tags and data graph search, to return requested data using only the information in the database-independent meaning representation of the user request.

3.1 Language Tags
Language tags are words, expressions, and linguistic tokens attached to database elements such as classes and properties. Multiple tags can be attached to a single element and a single tag can be attached to multiple elements. Language tags are the names of the corresponding categories used by the language system(s). When a language system produces a form like the one in our example,

    contact.name: ?
    organization: IBM
    city: Ulm

under the NQ system its interpretation is:

    find the attributes tagged as “name” of an instance of the class tagged as “contact” related through properties tagged as “organization” and “city” to values “IBM” and “Ulm” respectively

Figure 3. Language tags for database elements (tags shown: Contact, Name, First name, Last name, Address, City in)

Language tags provide an opportunity for a semantic annotation additional to the class names and their properties. In a natural language system accessing RDF repository data, we have three layers of semantic information: the database-independent meaning representation, the data and ontology, and the language tags. It could be argued that if there were correspondence between the categories of the database-independent meaning representation and the data and ontology, the language tags would not be needed. Unfortunately, if the ontology and language system are to be developed independently, there is no way to maintain or ensure such a match. Thus language tags provide the many-to-many mapping between the two independent systems of categorization and eliminate the first and second kind of information gaps between the meaning representation and semantic repositories.

Figure 3 illustrates language tags associated with a part of our PIM ontology. A generalization like “Contact” can be attached to specific classes like “Person” and “Organization”. A general reference like “Name” can be attached to multiple elements like “givenName”, “familyName”, and so on. In our RDF repository of real-world and in-device data, we added language tags to the RDF objects using a subproperty of the RDFS label field.

3.2 Graph search
The third gap that exists between the database independent meaning representation of the natural language request and the formal query that actuates it over a given database is the information about the organization of the data repository. In order to navigate from the given attributes of an object to the target of the query, SPARQL queries need to know the specific path that connects them on the database graph. In current language systems, this path is encoded by the query and stored in the custom integration code for every different type of query. Thus a query defines a subgraph with given properties, some of which are specified in the database-independent meaning representation of the natural language request and some of which are encoded in the custom integration code component.

While a formal query defines a connected subgraph as illustrated in Figure 4, the database-independent meaning representation only identifies some nodes and edges of this subgraph. The identified fragment might be disconnected. In the example above it identifies the “Person” and “Organization” classes as well as the “Ulm” value of the “locality” property (by reference to its language tag “city”) and “IBM” as a value of the “name” property of an instance of the “Organization” class.

This leads to an important idea: the knowledge embedded in the formal queries that know the database organization can also be extracted from the natural language meaning representation and the data repository itself.

Figure 4. Answering query via graph search

Figure 4 shows that for a given set of elements identified by a meaning representation of a natural language request it is possible to identify the query subgraph by searching the database. In other words, a program could find paths connecting the nodes known from the meaning representation, such as “Person”, “name”, “Organization”, “City”, “Ulm”, and “IBM”. One such path is highlighted in the picture.

Therefore, while traditional approaches to semantic analysis of natural language questions over databases rely on hand crafted code or data for representing the information about the organization of the database, NQ extracts such knowledge from the data repository by using graph search. Given a question “Who are my contacts at IBM in Ulm?”, NQ finds paths connecting the nodes known from the database independent meaning representation, such as “Person”, “name”, “Organization”, “City”, “Ulm”, and “IBM”.

3.3 NQE Discussion
The NQ engine (NQE) may find multiple subgraphs that connect all given elements. In such cases we apply heuristic ranking of these subgraphs in order to determine the most relevant ones. So far we have experimented with several ranking mechanisms, all of which are variations on path length (weight) between the elements specified by the meaning representation. In all our experiments the results retrieved by the system in response to natural language questions correspond well with the intuition of human subjects.

The results returned by NQE are designed to support the needs of conversational interfaces. If no results are found that match the elements specified in the meaning representation, NQE returns best matches that include only a subset of the elements in the query. For example, if no contacts at IBM in Ulm can be found, contacts at IBM in other cities would be returned as well as contacts from Ulm that are not affiliated with IBM.

NQE can perform basic reasoning over the type hierarchy. A “Person” is substitutable for a “Contact”, a “MobilePhoneNumber” for a “PhoneNumber”, but the opposite is not true. NQE supports organizational and geographic inclusion and can perform corresponding reasoning. When a calendar application lists meetings in Helsinki and Oulu, NQE can answer questions regarding meetings in Finland, where these cities are located. Similarly, information about organizational structure can be used to answer questions about Nokia while the database only records Nokia’s internal organizations like Multimedia or Enterprise Solutions. Finally, NQE creates structures that can be used to produce explanations regarding how the answers relate to the questions.

We have created a proof of concept implementation of NQ in Python [12] that runs on S60 [16] mobile phones. A full description of the Natural Query system implementation is outside the scope of this paper.

4. EXPERIENCE WITH THE SYSTEM
We tested our system on a PIM test data set containing 550 contacts with about 150 meetings and 250 phone calls, which is normal for executives with many active contacts and frequent meetings. The repository contained over 11000 RDF triples. We asked over 50 natural queries corresponding to over 600 parameterized questions.

The system can answer questions ranging from “What is the email of John?” to “Where does Ann work?” to “My meetings next week in Cambridge with John from MIT” and “Who called me yesterday during the meeting with Ann?”. Some of these questions would convert to quite complex relational or SPARQL queries. For example, for the query “Who called me yesterday”, we need to find all telephone numbers of calls that occurred yesterday and then find all people who have these telephone numbers. The NQ query for this is very simple: “:select 'Person' :where ("Received Call", Time ('yesterday'))”.

If we classified questions according to domains, one domain would contain questions about the personal information data from an address book application, for example “Who works as a real estate broker?”. Another set of questions is about meetings, for example, “When are my meetings next month at MIT?”. Yet another set is about calls and messages, for example, “Who called me last Friday?”. Finally, there are questions spanning multiple domains, for example, “What are emails of people who participated in a meeting on Monday?”, “Who called me when I was in Finland?”, and so on. All these types of queries were successfully created and executed on the extended PIM data store.

We found that we could easily ask questions both about the in-device data and the collected real-world data. Semantic integration of multiple data sources enhanced our question answering capability significantly, allowing such questions as “Who called me when I was in Helsinki?”, “Which messages did I receive during the meeting with Juha?”, etc. Although an out-of-pattern detection of someone’s Bluetooth device is only a weak indication that the phone user met the owner of the Bluetooth device, in our experiments we assumed such an implication. This allowed us to ask questions such as “Who did I meet last week?” or “At what time did I meet Ann last Saturday?”

Figure 5. Example question and answer

Test NQ queries mostly returned expected answers (96% recall, 92% precision) (Figure 5), including the approximate answers where the exact answers were not available. For example, the question “When was my meetings with Sam last month?” had no exact answers, so the system returned approximate answers of meetings with Sam that did not occur last month as well as the meetings that occurred last month, but did not include Sam.

The performance of the system was acceptable, with answers taking from less than a second to several seconds. The system implementation is a prototype written in Python that was not optimized for memory or speed. The detailed evaluation of system performance is outside the scope of this paper. We are planning to optimize the system performance in the near future.

5. RELATED AND FUTURE WORK
Mobile data storage in RDF repositories is investigated by the ConnectingMe [9] project at Nokia Research Center. We have collaborated with ConnectingMe in the ontology and repository development. Some tools for data extraction and conversion are shared between our two projects.

Semantic markup and annotation of web [7][4] and media [3] data is a topic of active research. Our research is related to mobile media data annotation. There has been a lot of research on ontology creation tools. We used one such tool, Protégé [11], to design our extended PIM ontology.

Event data has been gathered on mobile devices by a number of projects including Context [13] and Reality Mining [5]. In our work, we have extended one of the data gathering frameworks available at Nokia.

We have not discovered any research directly corresponding to the Natural Query approach. The Precise system by Popescu et al. [10] attaches language tokens to database elements in a way very similar to the language tags of NQ. Also, the query derivation approach of Precise is based on database graph search. NQ uses a more flexible data model, supports incomplete answers, and collects data for explanations.

In the future, we plan to connect our system to such natural language and speech systems as TINA [17] and Galaxy [18]. We plan to perform user trials to evaluate our system and its user interface to real world data. We will collect additional data such as email messages, songs listened to, and pictures viewed and taken. We will also optimize the current prototype implementation.

6. CONCLUSIONS
Mobile devices are now able to continuously collect various

REFERENCES
[1] ... Applications," Proc. ICSLP '00, Vol. III, pp. 271-274, Beijing, China, Oct. 2000.
[2] Chipchase, J., “Why do People Carry Mobile Phones?”, http://www.janchipchase.com/blog/archives/2005/11/mobile_essentia.html, 2005.
[3] Davis, M., King, S., Good, N., Sarvas, R., “From Context to Content: Leveraging Context to Infer Media Metadata”, Proceedings of the 12th annual ACM international Conference on Multimedia, New York, NY, USA, pp. 188-195, 2004.
[4] Dill, S., et al., “SemTag and Seeker: Bootstrapping the Semantic Web via Automated Semantic Annotation”, Proceedings of the 12th international conference on World Wide Web, Budapest, Hungary, pp. 178-186, 2003.
[5] N. Eagle, “Machine Perception and Learning of Complex Social Systems”, Ph.D. Thesis, Program in Media Arts and Sciences, Massachusetts Institute of Technology, June 2005.
[6] Edwards, L., Barker, R., et al., “Developing Series 60 Applications”, Addison Wesley, 2004.
[7] M. Erdmann, A. Maedche, H.P. Schnurr, S. Staab, “From manual to semi-automatic semantic annotation: About ontology-based text annotation tools”, Proceedings of the Workshop on Semantic Annotation and Intelligent Content, 2000.
[8] FOAF Vocabulary Specification 0.9, http://xmlns.com/foaf/0.1/, 2007.
[9] Lassila, O. et al., “ConnectingMe”, http://research.nokia.com/research/projects/connectingme/index.html, 2007.
[10] Popescu, A., Etzioni, O., and Kautz, H. 2003. Towards a theory of natural language interfaces to databases. Proceedings of the 8th international Conference on intelligent User interfaces (Miami, Florida, USA, January 12-15, 2003). IUI '03. ACM Press, New York, NY, 149-157.
[11] Protégé Ontology Editor and Knowledge Acquisition System, http://protege.stanford.edu/, 2007.
[12] Python for S60, http://sourceforge.net/projects/pys60, 2007.
[13] Mika Raento, “Context software - A prototype platform for contextual mobile applications”.
Proceedings of the events interesting to the user. Mobile devices also host structured International Proactive Computing Workshop. University of and semi-structured information bases. We have demonstrated the Helsinki, 2004. integration of all this data using a flexible and powerful RDF [14] Ran, A., and Lencevicius, R., “Natural Language Query repository and a common ontology. We have designed and System for RDF Repositories”, To appear in Proceedings of implemented a query language and engine NQ that can the Seventh International Symposium on Natural Language automatically map meaning representation produced by language Processing, SNLP 2007, 2007. systems into formal queries on RDF repositories. We have used [15] Resource Description Framework, http://www.w3.org/RDF/, language tags for mapping of the meaning representation to the 2007. data classes. NQ uses graph search to extract the information about the repository’s structure. Our experience shows that [16] S60 platform, http://www.s60.com, 2007 semantic data annotation and knowledge extraction significantly [17] S. Seneff, "TINA: A natural language system for spoken improves the capability of natural languages interfaces to mobile language applications," Computational Linguistics, vol. 18, data. no. 1, pp. 61-86, March 1992. 7. REFERENCES [18] S. Seneff, E. Hurley, R. Lau, C. Pao, P. Schmid, and V. Zue, [1] Baptist L. and S. Seneff, "Genesis-II: A Versatile System for "GALAXY-II: A Reference Architecture for Conversational Language Generation in Conversational System System Development," Proc. ICSLP 98, Sydney, Australia, [21] Web Ontology Language, http://www.w3.org/TR/owl- November 1998. features/, 2007. [19] SPARQL Query Language for RDF, http://www.w3.org/TR/rdf-sparql-query/, 2007. [20] Vcard, http://www.w3.org/TR/vcard-rdf, 2007.
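
The type-hierarchy reasoning, inclusion reasoning, and fallback to partial matches described in Section 3.3 can be illustrated with a minimal Python sketch. It is not the NQ implementation, which this paper does not describe in detail. The dictionaries, the record layout, the sample contacts, and the function names `reaches`, `substitutable`, `located_in`, and `best_matches` are all hypothetical, and a flat list of records stands in for the RDF repository. The ranking of subgraphs by path length is not covered here.

```python
from itertools import combinations

# Hypothetical type hierarchy, child -> parent (type names from Section 3.3).
SUBTYPE_OF = {"Person": "Contact", "MobilePhoneNumber": "PhoneNumber"}

# Hypothetical inclusion facts, part -> whole (geographic and organizational).
PART_OF = {
    "Helsinki": "Finland",
    "Oulu": "Finland",
    "Multimedia": "Nokia",
    "Enterprise Solutions": "Nokia",
}

# Hypothetical sample records standing in for repository contents.
CONTACTS = [
    {"name": "Ann", "org": "IBM", "city": "Boston"},
    {"name": "Bob", "org": "Multimedia", "city": "Ulm"},
    {"name": "Eve", "org": "Enterprise Solutions", "city": "Helsinki"},
]


def reaches(start, target, parent_of):
    """Return True if target is start itself or one of its ancestors."""
    node = start
    while node is not None:
        if node == target:
            return True
        node = parent_of.get(node)
    return False


def substitutable(actual, requested):
    """A subtype may stand in for its supertype, but not the reverse."""
    return reaches(actual, requested, SUBTYPE_OF)


def located_in(part, whole):
    """Inclusion check: Helsinki is in Finland, Multimedia is in Nokia."""
    return reaches(part, whole, PART_OF)


def best_matches(records, constraints):
    """Return (matched_keys, record) pairs for the largest number of
    constraints that at least one record satisfies together."""
    keys = sorted(constraints)
    for size in range(len(keys), 0, -1):
        hits = [
            (subset, rec)
            for subset in combinations(keys, size)
            for rec in records
            if all(located_in(rec[k], constraints[k]) for k in subset)
        ]
        if hits:
            return hits
    return []
```

With the sample data, asking for contacts at IBM in Ulm finds no record that satisfies both constraints, so `best_matches(CONTACTS, {"org": "IBM", "city": "Ulm"})` falls back to single-constraint matches and returns Bob (in Ulm) and Ann (at IBM). Asking for `{"org": "Nokia"}` returns Bob and Eve through organizational inclusion. Returning the matched keys alongside each record is one simple way to keep the information needed to explain how an answer relates to the question.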