OKKAM: Towards a Solution to the "Identity Crisis" on the Semantic Web

Paolo Bouquet, Heiko Stoermer, Michele Mancioppi and Daniel Giacomuzzi
University of Trento, Dept. of Information and Communication Tech., Trento, Italy
Email: {bouquet, stoermer, manchioppi, giacomuzzi}@dit.unitn.it

Abstract— One of the pillars of the Semantic Web enterprise is the idea that if people use standard names for resources (URIs), then the integration of information from different distributed sources will happen smoothly and efficiently, simply by using URI identity as the key for merging RDF graphs into a single (virtual) graph. The question this paper tries to address is: how do we find and reuse an identifier for an entity on the (Semantic) Web? We propose a system called OKKAM that is currently under development; we discuss requirements, architecture, usage scenarios and the services we have developed so far to tackle this "Identity Crisis" on the Semantic Web.

I. INTRODUCTION

In the W3C recommendation Uniform Resource Identifier (URI): Generic Syntax [1], a resource is defined as "anything that has identity"1. This means that not only web-accessible pages and documents are resources, but also people, cities and conferences; even the concept of "car" and the property of "being the owner" of a car are resources, which can be referred to and described like any other resource (e.g. in an ontology).

Despite the generality of the Semantic Web approach, here we want to suggest that – in practice – there is an essential difference between managing and reusing identifiers of resources which correspond to "things" (in a very broad sense, ranging from electronic documents to bound books, from people to cars, from conferences to unicorns) – we will call them entities – and identifiers which correspond to abstract objects (like predicates, relations, assertions) – which we will call logical resources. Our thesis is the following: while any attempt at "forcing" the use of the same URIs for logical resources is in principle likely to fail (as every application context has its own peculiarities, and people tend to have different views even about the same domain2), the same does not hold – or holds at a level which is philosophically interesting but of little practical relevance – for entities. In other words, our claim is that there are compelling theoretical reasons why the Semantic Web (and any other semantically driven information system) should not force people to use shared URIs for logical resources, but only (or mostly) practical reasons why people do not use shared URIs for entities.

By analogy, our claim can be illustrated by the different difficulty of building white-page and yellow-page services. The former basically requires an efficient mechanism for listing entities, retrieving them, and distinguishing one entity from another; the latter always presupposes some taxonomy, which is typically either too general (and therefore does not help in discriminating services), too specific (and therefore heavy for users to master), or too complex (not usable).

It should be clear that the problem of unique identifiers for resources (in its two flavors: logical resources and entities) is crucial for achieving semantic interoperability and efficient knowledge integration. However, it is also evident that 99% of the research effort goes into (i) designing shared ontologies, or (ii) designing methods for aligning and integrating heterogeneous ontologies (with special focus on the T-Box part of the ontology). Perhaps because of its "practical" flavor, we must recognize that only a very limited effort has been devoted to the issue of identity management for entities. For example, ontology editors such as Protégé support the "bad practice" of creating a new URI for every new instance created in an ontology.

In our opinion, this problem is not only of general interest for the Semantic Web enterprise, but is one of the most critical gaps in an ideal pipeline from data to semantic representation: if we do not have a reliable (and incremental) method for supporting the reuse of URIs for the new entities that are annotated in new documents (or any other data source), we risk producing an archipelago of "semantic islands" where conceptual knowledge may (or may not) be integrated (depending on how we choose the names of classes and properties, and on the availability of cross-ontology mappings), but ground knowledge is completely disconnected. And since the most valuable knowledge is typically about individuals, we take this to be an issue that should be attacked.

1 'A resource can be anything that has identity. Familiar examples include an electronic document, an image, a source of information with a consistent purpose (e.g., "today's weather report for Los Angeles"), a service (e.g., an HTTP-to-SMS gateway), and a collection of other resources. A resource is not necessarily accessible via the Internet; e.g., human beings, corporations, and bound books in a library can also be resources. Likewise, abstract concepts can be resources, such as the operators and operands of a mathematical equation, the types of a relationship (e.g., "parent" or "employee"), or numeric values (e.g., zero, one, and infinity)' [1].
2 This is what in [2], which was co-authored by one of the authors of this paper, was called the distributed knowledge argument.
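The merging behavior that motivates this paper can be made concrete with a minimal sketch. It is not part of OKKAM itself, and the URIs and triples are invented for illustration: RDF merging is essentially set union, so two sources integrate their knowledge about a person only when they use the same URI; a source that mints its own URI for the same person stays a disconnected "semantic island".

```python
# Minimal sketch of URI-based RDF graph merging (hypothetical URIs).
# A triple is a (subject, predicate, object) tuple; a "graph" is a set of them.

def merge(*graphs):
    """Merging RDF graphs is set union: URI identity is the join key."""
    out = set()
    for g in graphs:
        out |= g
    return out

def facts_about(graph, subject):
    """All knowledge about one entity, collected across sources."""
    return {(p, o) for (s, p, o) in graph if s == subject}

# Two sources that agree on the URI for the same person:
source_a = {("ex:HeikoStoermer", "ex:affiliation", "ex:UniTrento")}
source_b = {("ex:HeikoStoermer", "ex:authorOf", "ex:OkkamPaper")}

# A third source that minted its own URI for the very same person:
source_c = {("other:H_Stoermer", "ex:memberOf", "ex:DIT")}

merged = merge(source_a, source_b, source_c)

# Knowledge integrates for the shared URI (two facts)...
print(facts_about(merged, "ex:HeikoStoermer"))
# ...but the third source remains an isolated island (one fact).
print(facts_about(merged, "other:H_Stoermer"))
```

Nothing in the merge operation itself can detect that `ex:HeikoStoermer` and `other:H_Stoermer` denote the same individual; that is exactly the gap an entity-naming service is meant to fill.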
In this paper we introduce the main requirements and a prototype implementation of OKKAM3, a service for supporting transparent integration of knowledge about entities through simple identity management support. OKKAM can be described at two different levels:
• the basic services – which belong to a module called OKKAMCORE – provide APIs to create and store URIs for entities, to add/modify/remove informal descriptions of each entity, and to index the resources in which knowledge about an entity is provided (e.g. ontologies, web pages);
• on top of OKKAMCORE, OKKAM offers a collection of advanced services, including searching for already existing entities (using different search criteria), extracting information about entities, ranking results, supporting the reuse of URIs for entities in ontology editing, and so on.

The structure of the paper is the following: in Sect. II, we introduce our motivations and the resulting goals for our project in more detail. After that, in Sect. III, the OKKAM system architecture and design approach are illustrated. Sect. IV describes the first basic services that have been implemented on top of OKKAMCORE, whereas Sect. V illustrates two usage scenarios we have addressed with the system. We conclude with a discussion of open issues and an outlook on the further development of OKKAM in Sect. VI.

II. GOALS AND MOTIVATIONS

As soon as one starts thinking about the idea of an entity repository, the temptation to build what Craig Knoblock4 called an EntityBase in one of his recent talks is very strong. In short, an EntityBase can be thought of as an entity-centric knowledge base, where knowledge is organized around entities instead of schemas (e.g. relational schemas or even ontologies). In such an approach, any entity type would be characterized by a collection of attributes (for example, for entities of type book, attributes such as "author", "title", "date of publication", "publisher"), whose semantics is known in advance and explicitly specified.

We called this a temptation because it is extremely appealing (we would always know what we know about an entity), but also very dangerous, as it presupposes a commitment on the meaning of each attribute which cannot be guaranteed in most practical situations by a repository that aims at being open, extensible and global. Therefore, an important requirement for our service is that it is light and fast, and that it is not confused with yet another attempt in the direction of CYC [3] or SUMO [4]; systems of this type offered useful approaches in certain areas, but have obviously not contributed to a solution of the identity problem in the Semantic Web. What we aim to provide is a naming service for entities and a directory containing entity profiles, not a knowledge base.

An Entity Profile stores untyped data about entities. This data supports the human user, or an application using the OKKAM API, in processing descriptions of entities, and in effect enables them to assess whether the entity they want to store knowledge about in their own local KB already has a URI in OKKAM, or whether they have to create a new one. We store untyped data because typing an entity's attributes would require us to classify the entity, which would conflict with the abovementioned goal. We do not discriminate between types of entities, because we explicitly want to be able to provide naming and descriptions for any entity.

Of course, at first sight one could think about what types of entities would be described in OKKAM, such as persons, artifacts, locations, companies etc.; this could make it appear sensible to provide a basic set of typed attributes for these entities. But we envision the system also providing support for less obvious applications such as Named Entity Recognition from the field of Natural Language Processing, which we discuss later in Section V. In these applications an entity might represent a location or a piece of text in a document, a document itself, or a collection of documents, and we end up with an unlimited set of potential types of entries, which makes it impossible to provide a common set of typed attributes. Therefore it is our opinion that only untyped descriptive metadata can provide the envisioned level of generality.

Fig. 1. Schematic overview of OKKAM, plus external K/I sources

In addition to these untyped data, OKKAM provides for the management of what we call ontology references in the Entity Profile, i.e. a set of URIs of external sources that are known to store information or knowledge about these entities, as illustrated in Fig. 1. One of the reasons to go in this direction was the motivation to make OKKAM provide a possible solution to integration issues in the Semantic Web. While a great amount of work has been performed on schema-level information integration5, entity-level information and knowledge integration still offers many opportunities for interesting approaches. One possible application we envision supporting with OKKAM is an extension-based equivalence check for classes in an alignment or integration process. Currently, in a schema-level integration process without an extension check, classes can be estimated to be equivalent, but the result of this estimation cannot be proved. Additionally, without a service that provides strong decision support about whether two individuals with the same name are actually identical (which is the current situation in the Semantic Web), an extension check will hardly deliver reliable results. With the help of Entity Profiles in OKKAM we hope to improve this situation: if two assumedly equivalent classes turn out to have identical sets of OKKAM-registered individuals associated with them, we have very strong reason to support the equivalence assumption.

The last component of the Entity Profile is a set of assertions of identity between entities. We provide these for the case where two entities with different URIs in OKKAM are later discovered to describe the exact same object, and are thus identical. One possible criticism at this point is certainly the question of how we can know and be certain about the identity of entities. The answer is that we cannot. OKKAM will suffer from the same garbage-in-garbage-out property as any other information system. But with OKKAM we can at least provide a means for the Semantic Web to store and represent such information, and we hope that by consistent use of OKKAM in Semantic Web applications we can strongly improve the current situation by enabling agents to gain a certain level of confidence that they are actually "talking about" the same objects.

III. OKKAMCORE: CHARACTERIZING ENTITIES

A. Data Model and API

OKKAMCORE provides its users with functionalities to manage and retrieve entities and assertions of identity. From a programmatic point of view, two APIs are provided:
• Publication API: enables publishing, modifying and removing entities and assertions of identity;
• Inquire API: allows retrieval of entities matching a given set of criteria.
Both APIs offer the straightforward functionalities one would expect in such a system; for the sake of brevity we do not describe the full API in this article.

Fig. 2. OKKAMCORE's Entity Relationship Diagram

The OKKAMCORE application manages data describing entities and assertions of identity between entities. The data structures we use to model this are shown in the Entity Relationship Diagram presented in Figure 2. Each Entity is in a biunivocal (one-to-one) relation with an Entity Identifier, which is created by the system and represents the URI of the entity in OKKAM, with which users can identify entities within their application or KB. Each Entity may have a Preferred Identifier, provided by the user who creates the entity, e.g. to mirror an identifier used in their information system; the relation between Entities and Preferred Identifiers is labeled as a key because Entities cannot share Preferred Identifiers. Each Entity may have any number of Alternative Identifiers; similarly to Preferred Identifiers, every Alternative Identifier can belong to only one Entity at a time. To keep the diagram simple, it does not show that there can be no overlap among the different types of identifiers. All identifiers are names for the Entities on which they are set: all identifiers set on a given Entity are therefore synonyms. Since different Entities have different names, each identifier can appear on at most one Entity, no matter whether it acts as an Entity Identifier, a Preferred Identifier or an Alternative Identifier.

Entities can have any number of Labels set on them. Each Label has a prefix and a value; a Label's prefix may be left empty. Two distinct Labels with the same prefix and value must belong to different Entities: as illustrated in Figure 2, the triple consisting of the Label's prefix, the Label's value and the Entity on which the Label is set forms a key. Each Ontology Reference has a value, and any number of Ontology References can be set on an Entity; similarly to Labels, two Ontology References with the same value can only belong to different Entities.

Assertions of Identity are uniquely identified by their Assertion of Identity Identifiers. Each Assertion of Identity involves exactly two Entities, and different Assertions of Identity cannot involve the same two Entities.

B. Architecture

The currently available implementation of OKKAMCORE is built on top of the J2EE 1.4 platform. The OKKAMCORE application is an Enterprise Application exposing the Publication and Inquire APIs as Web Services. Its architecture, as presented in Figure 3, follows the classical three-tier model that subdivides the Presentation Logic (the Web Services), the Business Logic (carried out by an EJB module), and the Data Persistence logic. The Web Services framework we adopted is Apache Axis26.

3 The system is named after Occam, a medieval philosopher whose main principle – known as "Occam's razor" – was: entities should not be multiplied beyond necessity (in Latin: entia non sunt multiplicanda praeter necessitatem).
4 Craig Knoblock's homepage: http://www.isi.edu/~knoblock/. Unfortunately, at this point no citeable publications about this topic are available.
5 It is hardly possible to cite all related work in this field. Specific to the area of the Semantic Web, the reader is referred e.g. to the publication list on http://www.ontologymatching.org for a host of publications, or to the Ontology Alignment Evaluation Initiative (http://oaei.ontologymatching.org/) which performs an alignment contest. For an overview more related to the database world, we refer e.g. to [6].
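The key constraints of this data model can be summarized in a small in-memory sketch. This is purely illustrative and not the actual J2EE implementation: the class and method names are invented, but the invariants mirror the ones described above — every identifier names at most one Entity, and an assertion of identity may link a given pair of entities only once.

```python
# Hypothetical in-memory sketch of the OKKAMCORE constraints described above
# (names invented; the real system exposes these as Web Service APIs).

class OkkamCoreSketch:
    def __init__(self):
        self.identifier_owner = {}   # identifier -> entity URI (any identifier kind)
        self.labels = {}             # entity URI -> set of (prefix, value) labels
        self.identity_pairs = set()  # frozensets of two entity URIs

    def publish_entity(self, uri, labels=(), alt_identifiers=()):
        """Publication API sketch: register an entity and its identifiers."""
        for ident in (uri, *alt_identifiers):
            # Each identifier may name at most one Entity.
            if self.identifier_owner.get(ident, uri) != uri:
                raise ValueError(f"identifier {ident!r} already names another entity")
            self.identifier_owner[ident] = uri
        self.labels.setdefault(uri, set()).update(labels)

    def assert_identity(self, uri_a, uri_b):
        """Different Assertions of Identity cannot involve the same two Entities."""
        pair = frozenset((uri_a, uri_b))
        if pair in self.identity_pairs:
            raise ValueError("these two entities are already asserted identical")
        self.identity_pairs.add(pair)

    def inquire(self, label_value):
        """Inquire API sketch: entities carrying a label with this value."""
        return {uri for uri, ls in self.labels.items()
                if any(v == label_value for _, v in ls)}

core = OkkamCoreSketch()
core.publish_entity("okkam:e1", labels={("", "Heiko Stoermer")})
core.publish_entity("okkam:e2", labels={("", "H. Stoermer")})
core.assert_identity("okkam:e1", "okkam:e2")
print(core.inquire("Heiko Stoermer"))  # {'okkam:e1'}
```

Attempting to publish a new entity that reuses `okkam:e1` as an alternative identifier, or asserting the same identity pair twice, raises an error — the behavior the key constraints in Figure 2 are meant to enforce.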
An Entity is in 6 Apache Axis 2 Home: http://ws.apache.org/axis2/ data is allowed to be stored. In particular, we want to stress the following: • first of all, we want to add a new entity only if it is not already stored in O KKA MC ORE. But this means that we need smart ways for recognizing if a new candidate entity is already stored, and for deriving when a new entity which looks like an entity already stored actually is a new one. These two requirements are crucial: failing to meet the first would lead to a lack of completeness (failing to support inferences which in theory are sound based on the fact that two names refer to the same entity); failing to meet the second would lead to a lack of correctness (false conclusions would be supported, based on the fact that two different entities have been collapsed onto a single identifier); • imagine we detect that an entity is already stored, and that we find a new occurence of that entity in a document Fig. 3. O KKA MC ORE Application Architecture where some information about it is provided. Question: what if the new information conflicts with the old one? And, even before, how do we detect that there is such an different modules: i) handling the configurations of the appli- inconsistency? cation, ii) marshalling the internal representation of the data • as it will be clarified in the section on envisaged ap- to external form, such as XML documents and iii) commu- plication scenarios, information may be imported in nication with the database. Thus, implementations exploiting O KKA MC ORE from very different sources, including hu- different database technologies may be plugged effortlessly. mans (who may be carefully making data entry), ad-hoc For this reason, during the development of the O KKA MC ORE wrappers designed to import entities from rich sources application, we implemented two different Data Persistence (e.g. 
lists of entities from Wikipedia), entity recognition modules: one based on a native XML database, and another tools (which may be extracting entity descriptions from one built on top of a relational database. free text). These potential sources may provide very At first we developed the XML Database based backend be- uneven data, including a lot of garbage, which would cause it allowed us to have a first prototype of O KKA MC ORE undermine the role of O KKA M as a general and reliable running in a very short time. The backend is based on the tool; Open Source database eXist7 . Although the flexibility of the • finally, a theoretical issue which needs to be addressed XML native database together with the XQuery expressiveness is the following: what does count as an entity? There enabled us to complete the backend relatively quickly, we is little doubt that people, organizations, cars, computer experienced scalability issues. It turned out that the number of files, electronic devices, are entities. But, for example, entities that can be managed by this backend ranges in the tens is a document an entity? Is it an abstract entity, or it is of thousands, which is way below our desired goal. Although identified with its physical realizations? If so, is every the XML database based backend performed well for testing copy of a document a different entity? Another example of the rest of the application, and made the O KKA MC ORE is: are logical resources (like concepts, relations, topics) application promptly available for tests with the other services entities? Or the entity is the linguistic expression used built on top of it, we decided to abandon this approach in favor to express a concept? But then are two linguistic formu- of a relational database based backend. lations of the same concept different entities? And fur- IV. S ERVICES thermore: are fictitious entities entities? 
Should we allow O KKA M can be viewed as a collection of services built on Pegasus and Spider Man to sneak into O KKA MC ORE? top of O KKA MC ORE. In this section we list the main services And the list can be made much longer8 . . . which – in our opinion – should belong to O KKA M, describe To address these issues, we have developed the following their implementation (if available), or present ideas on how compontents: they could be implemented. • OkkamListsManager On the WWW there are many lists of entities and are thus A. Population of O KKA MC ORE 8 We notice that a very practical version of these philosophical questions is The first important service is the one that supports the the following: what should be represented as an instance in an OWL ontology? population of O KKA MC ORE with new entities. In fact, there And what as a class/property? The issue is tricky, and we make only one are many issues that must be addressed and solved before new example: should “Pizza Margherita” be a class or aan instance of an Italian food ontology? If we check e.g. [5], we find that the answer to this type of 7 eXist Home: http://exist.sourceforge.net/ questions can be quite disappointing. a potentially important resource for O KKA M. For exam- is the most precise and complete because the user can ple Wikipedia provides lists of countries, cities, members provide information that the system can not automatically of particulars domains (e.g. Presidents of the United discover, and optimize the input in a feedback loop. States, Computer Scientists, etc) that are exactly the types • Protege Plugin of entities that we want to store in the system. With the We provide a plugin for the ontology editor Protege which objective to find a standard mechanism for integrating we describe in further detail in Sect. V-A. these entities into Okkam we developed a language (an XML Schema) that describes the input that a data source B. 
Searching for URIs has to follow to communicate with the population process Another critical service is searching for identifier of some of O KKA M. The main elements of the schema follows entity which is known either by some description (e.g. name the internal structure of Okkam, in fact we have elements for people), or by an identifier which was not issued by like ”Labels”, ”Label-prefix” or ”Label-value” that are O KKA M. This site should be held very simple, like a tra- easy to map with the tables elements of OkkamCore. ditional search engine, and based on an easy mechanism to This language is used by different wrappers that we visualize results. In a standard use case the user types in a developed and that try to convert the structure of a source keyword associated to an entity and the system searches the list into the O KKA M input standard. For lists from the repository for instances that match this label. For example Web (Wikipedia, Yahoo, Google, etc.) the main purpose if a user searches for entities that have the label ”Heiko of the wrappers is the data cleansing process from HTML Stoermer” the O KKA M Management System will search in the tags. After this step the entity collections is normalized database the instances that have this keyword and will return with the objective to delete duplicates. Entities with the the main information about these. The main data will be the same annotation label are recognized by the system and URI of the entity, the other labels associated to the entity, the O KKA M administrative user can check if there exist in our example can be ”H.Stoermer”, ”Stoermer”, Mr. Heiko conflicts from members of the list that are the same entity Stoermer” and the classes of the ontologies where the entity (from a logical point of view). During insertion, for ach is used. In our example we have different classes as ”Person” entity the system searches O KKA M if there is already an or ”PHD Student”. 
The information about the classes where entity with the same label/s. If yes, this entity is “frozen” other person use the entity, its URI, are very important because and included in a set of entities that should be checked by with this data the user can chose which entity is the correct the administrator before addition to the system, otherwise URI in the O KKA M. it is added immediately. If we have two URI’s that share the same label ”Roma”, but • OkkamDBManager one is attached to class like ”City” or ”Capital” and other refers Another important information source for O KKA M can to ”Person” or ”Customer” or ”Employee”, the user can easyily be generic databases, as far as we have access to them. understand whether he needs the first URI because he wants Examples might include direct database access to in- to speak about the capital of Italy or the second one because formation systems such as extranets, online shops or he refers to a Person named ”Roma”. The filtering process, publishing houses. In this case the transformation from with information about classes, can be performed before the the internal structure of the tables into the O KKA M input search step: the submitted query can be a pair of ”keyword - language is easier because the main objective of the class”. This means that the system will return only the URI process is writing queries that build the link between the that fulfil both the terms of the interrogation. Web O KKA M database structure and the okkam data structure. When This first use case is very simple and understandable because the transformation into the input language is completed, the most difficult process, the filtering task, is delegated to the the rows that come from the database follow the same user responsible for this operation. process that we already describe with web lists. With The Web site of the O KKA M is not the only application database sources the role of the user becomes more built on the URI database. 
There are many situations where it important because, with high numbers of entities, dupli- is very difficult to believe that users use the web site to search cation and redundancy are an increasing problem. the URI of the resources that they need. For example, if we • OkkamManualEntry have a large database with all employees of an organization is Another solution we provide to insert new entities is impossible that the designers and developers wanting to build the manual case. A Web interface provides easy access semantic application on this data search in the O KKA M web to the insert function. The user can add new entities, site all the URI’s of the persons stored in their database. This with labels, ontology references, etc., to the system using process can be simplified if they can use an automatic service, a form to specify all the information that he/she want in this case a web service, that provide an access point to describe the new entity with. As in the previous case, the O KKA M that an application can use. The developers can if the system finds a possible conflict with entities that build an application that extract the data from their database are already in Okkam, it issues a warning message that and send them to the web service which will return some informs the user of the possible error. This methodology results, URI, about the information that already are stored in of insertion is the slowest that Okkam provides, but it the O KKA M. V. T WO USAGE SCENARIOS Profile is created in O KKA M and the resulting URI is A. Runtime support for ontology editing used accordingly. For subsequent discoveries of the same named entitiy, the same URI will be used to indicate Another important area for which O KKA M has to provide that the two discovered entities are in fact the same, just services and applications are existing Semantic Web tools. In in different locations of the document. 
In particular, ontology editors are applications where users build a formalization of part of the world by means of classes and instances of these classes, all identified by URIs. One of the most widely used and important editors is Protégé, an open source product that can be extended and modified with "plug-ins" added on the core system. For the OKKAM vision it is of high importance to develop a plug-in for this application which provides a connection with the URI database when users create new instances of a class, which we are doing as illustrated in Fig. 4. If a user creates a new instance of a class, instead of assigning an arbitrary, meaningless number as ID, the plug-in will search the repository to check whether a URI already exists that can be assigned to this new instance. The selection process is envisioned to be similar to the web search use case, where a list of URIs that match the label of the new instance is visualized to the user.

Important support for all the selection processes comes from additional tools, such as WordNet, that provide information about the meaning of the classes used in the ontologies where the new instances are created. With this information the system has more data with which to recognize the correct URI to return to the users or applications that query OKKAM.

B. Supporting Knowledge Extraction and Representation

One of the scenarios we are currently implementing with the help of OKKAM is to support Knowledge Extraction (KE) processes and the resulting Knowledge Representation (KR) in a Semantic Web project9 that aims at building a large-scale Knowledge Base (KB) from information stored in distributed document bases. The architecture comprises a pipeline of processes that covers all steps from KE to the building of the KB (the so-called Semantic Resource Network) for end-user services, as illustrated in Fig. 5.

Within the pipeline there are several points of application imaginable, two of which we have currently implemented and are further described here:

• Information Extraction: Named Entity Recognition and Coreference
Whenever our NLP process recognizes a named entity in a piece of text, it interacts with OKKAM to check whether this named entity already has a unique URI. If yes, the NLP process stores locally10 the fact that a uniquely identified entity has been discovered, together with additional information such as its location in the document, etc. If the entity does not have a URI yet, an Entity Profile is created in OKKAM. This approach is equally applicable to discovered coreferences11. If the NLP process updates the Entity Profiles in OKKAM correctly, we gain direct access to search situations of the type "show me all documents that talk about this entity", as the respective links would be stored as Ontology References which we can evaluate and reason about with a higher-level service.

• Refinement: Identity Discovery
In the refinement phase, as depicted in Fig. 5, we can address shortcomings of the NLP processes in terms of discovery of identity. The VIKEF pipeline has dedicated a whole processing step to this issue, as – at the named entity extraction level – it is not always possible to detect identity between entities. Obvious examples in this case are missing correspondences between orthographic variations, hinted at already in Sect. IV-B, e.g. the fact that within one document there is a certain probability that the strings "Stoermer", "H. Stoermer" and "Heiko Stoermer" denote the exact same individual. With the support of the OKKAM system, we have implemented several heuristics to address this issue, the simplest of which performs a substring query to OKKAM and uses a string similarity measure on the results to choose candidates for establishing an assertion of identity between them, and thus to cluster annotations. A higher-level process is free either to choose one single URI for all the annotated entities or to retain the original URIs, as it is always possible to perform clustering via analysis of identity assertions in OKKAM.

9 See http://www.vikef.net for further information about the VIKEF project.
10 In fact, the annotations created in this phase are stored in an XML file, which is later refined and then used as a base for the generation of RDF annotations that will be fed into a large knowledge base.
11 A coreference is a linguistic pattern typically involving pronouns when talking about an object that has previously been named. Example: "Peter is a good runner. He does 10k in 45 minutes." The personal pronoun he establishes the coreference in this case.

VI. DISCUSSION AND CONCLUSION

OKKAM is a typical example of an application which is not based on some radically new scientific result, but aims at filling a gap by using existing technologies in a new way. In our opinion, without OKKAM (or a similar service), most Semantic Web promises will never be kept, as it provides a sort of bottom level for integration which cannot be achieved ex post when the ball stops. However, the fact that the basic technologies are already available should not lead us to underestimate the critical factors which may affect the success and adoption of OKKAM. In addition to aspects already discussed throughout the paper, we identify acceptance issues, in the sense that not every party involved in the Semantic Web may be willing to use a centrally managed service that is outside of their control. Privacy issues include all the well-known aspects of data security, access management, privacy etc. that almost all public information systems share. Last, but not least, there are of course questions of offered features and functionality, such as a really efficient and intelligent search and ranking mechanism for Entity Profiles in OKKAM, as well as performance and scalability issues which are again common to most information systems.
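The simplest identity-discovery heuristic described above – a substring query followed by a string-similarity ranking of the results – can be sketched as follows. This is a minimal, self-contained approximation, not the actual implementation: the OKKAM substring query is simulated over a local list of entity labels, `difflib.SequenceMatcher` stands in for the (unspecified) similarity measure, and the repository contents and the 0.5 threshold are illustrative assumptions.

```python
from difflib import SequenceMatcher

def similarity(a: str, b: str) -> float:
    # Normalized similarity in [0, 1]; a stand-in for the similarity
    # measure used by the real heuristics, which the paper does not name.
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def substring_query(repository, fragment):
    # Simulates the OKKAM substring query: all stored labels that
    # contain the fragment (case-insensitive).
    f = fragment.lower()
    return [label for label in repository if f in label.lower()]

def identity_candidates(repository, mention, threshold=0.5):
    # Candidates for an identity assertion with `mention`: substring
    # matches ranked by decreasing similarity, cut off at `threshold`.
    scored = [(label, similarity(mention, label))
              for label in substring_query(repository, mention)]
    return sorted((s for s in scored if s[1] >= threshold),
                  key=lambda s: -s[1])

# Hypothetical repository of entity labels, for illustration only.
repo = ["Heiko Stoermer", "H. Stoermer", "Paolo Bouquet", "Michele Mancioppi"]
for label, score in identity_candidates(repo, "Stoermer"):
    print(f"{label}: {score:.2f}")
```

A higher-level process could then either merge the matched annotations under a single URI or record pairwise identity assertions, as described above.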
Fig. 4. A Protégé plugin for generating individuals registered in OKKAM.

Fig. 5. Knowledge pipeline to be supported by OKKAM.

Our planned next steps are to address exactly these issues in the form of further research and by developing additional services on top of OKKAMCORE.

We conclude with the statement that currently, when creating ontologies, people actually perform two different tasks: they specify a conceptualization, and then "populate" such a conceptualization with instances by assigning instances to some class and specifying the values of their properties (if any). It is a trivial observation that the same domain (set of entities) may be used to populate different ontologies (e.g. we may have two different conceptualizations of Italian wines & food), and that any two ontologies (e.g. an ontology about Semantic Web researchers and another about people living in Italy) may have overlapping domains. Creating a conceptual schema and then populating it with instances address two different issues: the first is an epistemological issue (it has to do with knowledge about the world), the second is an ontological issue (it has to do with existence).

From a design perspective, what we propose is to keep these two tasks separated: on the one hand, we need a universal and non-ambiguous way to refer to the entities about which an agent may have some knowledge; on the other hand, we need a way to specify knowledge about these entities. We believe that with the help of OKKAM this goal can be achieved more cleanly for the Semantic Web, as to the existing methods of specifying knowledge in the form of ontologies and knowledge bases we add an identity and reference architecture with a central character that enables systems and agents to ensure that they "talk" and store knowledge about the same entities, if these objects share the same identifier.

VII. ACKNOWLEDGMENTS

This research was partially funded by the European Commission under the 6th Framework Programme IST Integrated Project VIKEF – Virtual Information and Knowledge Environment Framework (Contract no. 507173, Priority 2.3.1.7 Semantic-based Knowledge Systems; more information at http://www.vikef.net).

REFERENCES

[1] T. Berners-Lee, R. Fielding, and L. Masinter. RFC 3986: Uniform Resource Identifier (URI): Generic Syntax. IETF (Internet Engineering Task Force), 2005. http://www.ietf.org/rfc/rfc3986.txt.
[2] M. Bonifacio, P. Bouquet, and P. Traverso. Enabling distributed knowledge management: Managerial and technological implications. Informatik – Zeitschrift der schweizerischen Informatikorganisationen, 1:23–29, 2002.
[3] D. B. Lenat. Cyc: A large-scale investment in knowledge infrastructure. Commun. ACM, 38(11):32–38, 1995.
[4] I. Niles and A. Pease. Towards a standard upper ontology. In FOIS, pages 2–9, 2001.
[5] N. F. Noy and D. L. McGuinness. Ontology Development 101: A Guide to Creating Your First Ontology. Stanford University. http://protege.stanford.edu/publications/ontology_development/ontology101.html.
[6] E. Rahm and P. A. Bernstein. A survey of approaches to automatic schema matching. VLDB Journal, 10(4):334–350, 2001.