=Paper=
{{Paper
|id=None
|storemode=property
|title=Scenario-Driven Selection and Exploitation of Semantic Data for Optimal Named Entity Disambiguation
|pdfUrl=https://ceur-ws.org/Vol-925/paper_6.pdf
|volume=Vol-925
|dblpUrl=https://dblp.org/rec/conf/ekaw/AlexopoulosRG12
}}
==Scenario-Driven Selection and Exploitation of Semantic Data for Optimal Named Entity Disambiguation==
Panos Alexopoulos, Carlos Ruiz, and José-Manuel Gómez-Pérez

iSOCO, Avda del Partenon 16-18, 28042, Madrid, Spain, {palexopoulos,cruiz,jmgomez}@isoco.com

Abstract. The rapidly increasing use of large-scale data on the Web has made named entity disambiguation a key research challenge in Information Extraction (IE) and development of the Semantic Web. In this paper we propose a novel disambiguation framework that utilizes background semantic information, typically in the form of Linked Data, to accurately determine the intended meaning of detected semantic entity references within texts. The novelty of our approach lies in the definition of a structured semi-automatic process that enables the custom selection and use of the semantic data that is optimal for the disambiguation scenario at hand. This process allows our framework to adapt to the particular characteristics of different domains and scenarios and, as experiments show, to be more effective than approaches primarily designed to work in open domain and unconstrained situations.

===1 Introduction===
Information Extraction (IE) involves the automatic extraction of structured information from texts, such as entities and their relations, in an effort to make the information of these texts more amenable to applications related to Question Answering, Information Access and the Semantic Web. In turn, named entity resolution is an IE subtask that involves detecting mentions of named entities (e.g. people, organizations or locations) within texts and mapping them to their corresponding entities in a given knowledge source. The typical problem in this task is ambiguity, i.e. the situation that arises when a term may refer to multiple different entities. For example, "Tripoli" may refer, among others, to the capital of Libya or to the city of Tripoli in Greece.
Deciding which reference is the correct one in a given text is a challenging task that a significant number of approaches have long tried to address [2] [3] [6] [7] [5] [8]. The majority of these approaches rely on the strong contextual hypothesis of Miller and Charles [9], according to which terms with similar meanings are often used in similar contexts. The role of these contexts, which in practice serve as disambiguation evidence, is typically played by already annotated documents (e.g. Wikipedia articles) which are used to train term classifiers. These classifiers link a term to its correct meaning entity, based on the similarity between the term's textual context and the contexts of its potential entities [8] [10].

An alternative kind of disambiguation evidence that has recently begun to be used is semantic structures such as ontologies and Linked Data [7] [6] [12]. The respective approaches typically employ graph-related measures to determine the similarity between the graph formed by the entities found within the ambiguous term's textual context and the graphs formed by each candidate entity's "neighbor" entities in the ontology. The candidate entity with the best matching graph is assumed to be the correct one.

An obvious limitation of this is the need for comprehensive semantic information as input to the system; nevertheless, the increasing availability of such information on the Web, typically in the form of Linked Data, can help overcome this problem to a significant degree. Still, the effectiveness of these approaches is highly dependent on the degree of alignment between the content of the texts to be disambiguated and the semantic data to be used.
This means that the ontology's elements (concepts, instances and relations) should cover the domain(s) of the texts to be disambiguated but should not contain other additional elements that i) do not belong to the domain or ii) do belong to it but do not appear in the texts.

To show why this is important, consider an excerpt from a contemporary football match description saying that "Ronaldo scored two goals for Real Madrid". To disambiguate the term "Ronaldo" in this text using an ontology, the only contextual evidence that can be used is the entity "Real Madrid", yet there are two players with that name that are semantically related to it, namely Cristiano Ronaldo (current player) and Ronaldo Luis Nazario de Lima (former player). Thus, if both relations are considered then the term will not be disambiguated. Yet, the fact that the text describes a contemporary football match suggests that, in general, the relation between a team and its former players is not expected to appear in it. Thus, for such texts, it would make sense to ignore this relation in order to achieve more accurate disambiguation.

Unfortunately, current approaches do not facilitate such a fine-grained control over which parts of a given ontology should be used for disambiguation in a given scenario and which not. Some of them allow the constraining of the concepts to which the potential entities may belong [6] [8], but they do not do the same for relations nor do they provide any structured process and guidelines for better execution of this task. That is because their goal is to build scenario- and domain-independent disambiguation systems where a priori knowledge about what entities and relations are expected to be present in the text is usually unavailable. Indeed, this is the case in scenarios involving news articles, blog posts, tweets and generally texts whose exact content cannot really be predicted.
Yet there can also be specialized scenarios where such predictions can be safely made. One such scenario is the one above about football match descriptions. It arose in the context of the project BuscaMedia (http://www.cenitbuscamedia.es/) and involved the disambiguation of football-related entities within texts describing highlights of football matches. The nature of these texts made it safe to assume that the entities expected to be found in them were players, coaches and teams and that the relations implied between them were those of current membership (i.e. players and coaches related to their current team). A similarly specialized scenario arose in the project GLOCAL (http://glocal-project.eu/), involving the disambiguation of location entities within historical texts describing military conflicts. Again, the nature of these texts allowed us to expect to find in them, among others, military conflicts, locations where these conflicts took place and people and groups that participated in them.

Given that, in this paper we define a novel ontology-based disambiguation framework that is particularly applicable to scenarios similar to the above, where knowledge about what entities and relations are expected to be present in the texts is available. Through a structured semi-automatic process the framework enables i) the exploitation of this a priori knowledge for the selection of the subset of domain semantic information that is optimal for the disambiguation scenario at hand, ii) the use of this subset for the generation of disambiguation evidence and iii) the use of this evidence for the disambiguation of entities within the scenario's texts. As we will show in the rest of the paper, this process allows our system to be more effective in such constrained scenarios than other disambiguation approaches designed to work in unconstrained ones.

The rest of the paper is organized as follows. Section 2 presents related work while section 3 describes in detail our proposed framework. Section 4 presents experimental results regarding the framework's effectiveness in the two application scenarios mentioned above. Finally, in sections 5 and 6 we critically discuss our work, summarize its key aspects and outline the potential directions it could take in the future.

===2 Related Work===
A recent ontology-based entity disambiguation approach is described in [7] where an algorithm for entity reference resolution via Spreading Activation on RDF Graphs is proposed. The algorithm takes as input a set of terms associated with one or more ontology elements and uses the ontology graph and spreading activation in order to compute Steiner graphs, namely graphs that contain at least one ontology element for each entity. These graphs are then ranked according to some quality measures and the highest ranking graph is expected to contain the elements that correctly correspond to the entities.

Another approach is that of [4] where the application of restricted relationship graphs (RDF) and statistical NLP techniques to improve named entity annotation in challenging Informal English domains is explored. The applied restrictions are i) domain ones where various entities are a priori ruled out and ii) real world ones that can be identified using the metadata about entities as they appear in a particular post (e.g. that an artist has released only one album, or has a career spanning more than two decades).

In [5] Hassell et al. propose an approach based on the DBLP-ontology which disambiguates authors occurring in mails published in the DBLP-mailing list. They use ontology relations of length one or two, in particular the co-authorship and the areas of interest. Also, in [12] the authors take into account the semantic data's structure, which is based on the relations between the resources and, where available, the human-readable description of a resource.
Based on these characteristics, they adapt and apply two text annotation algorithms: a structure-based one (PageRank) and a content-based one.

Several approaches utilize Wikipedia as a highly structured knowledge source that combines annotated text information (articles) and semantic knowledge (through the DBPedia (http://dbpedia.org) [1] and YAGO [13] ontologies). For example, DBPedia Spotlight [8] is a tool for automatically annotating mentions of DBPedia resources in text by using i) a lexicon that associates multiple resources to an ambiguous label and which is constructed from the graph of labels, redirects and disambiguations that the DBPedia ontology has and ii) a set of textual references to DBPedia resources in the form of Wikilinks. These references are used to gather textual contexts for the candidate entities from Wikipedia articles and use them as disambiguation evidence.

A similar approach that uses the YAGO ontology is the AIDA system [6] which combines three entity disambiguation measures: the prior probability of an entity being mentioned, the similarity between the contexts of a mention and a candidate entity, and the semantic coherence among candidate entities for all mentions together. The latter is calculated based on the distance between two entities in terms of type and subclassOf edges as well as the number of incoming links that their Wikipedia articles share.

The difference between the above approaches and our framework lies in the way they treat the available semantic data. For example, Spotlight uses the DBPedia ontology only as an entity lexicon without really utilizing any of its relations, apart from the redirect and disambiguation ones. Thus, it is more text-based than ontology-based. On the other hand, AIDA builds an entity relation graph by considering only the type and subclassOf relations as well as "assumed" relations inferred by the links within the articles. The problem with this approach is that important semantic relations that are available in the ontology are not utilized and, of course, there is no control over which edges of the derived ontology graph should be utilized in the given scenario. Such control is not provided either in [7] or in any of the other aforementioned approaches, except for that of [5] which, however, is specific to the scientific publications domain.

===3 Proposed Disambiguation Framework===
Our framework targets the task of entity disambiguation based on the intuition that a given ontological entity is more likely to represent the meaning of an ambiguous term when there are many entities ontologically related to it in the text. These related entities can be seen as evidence whose quantitative and qualitative characteristics can be used to determine the most probable meaning of the term. For example, consider a historical text containing the term "Tripoli". If this term is collocated with terms like "Siege of Tripolitsa" and "Theodoros Kolokotronis" (the commander of the Greeks in this siege) then it is fair to assume that this term refers to the city of Tripoli in Greece rather than the capital of Libya.

Nevertheless, as we already showed in the introduction, which entities should serve as evidence in a given scenario, and to what extent, depends on the domain and expected content of the texts that are to be analyzed. For that reason, the key ability our framework provides to its users is to construct, in a semi-automatic manner, semantic evidence models for specific disambiguation scenarios and use them to perform entity disambiguation within them. In particular, our framework comprises the following components:

* A Disambiguation Evidence Model that contains, for a given scenario, the entities that may serve as disambiguation evidence for the scenario's target entities (i.e. entities we want to disambiguate). Each pair of a target entity and an evidential one is accompanied by a degree that quantifies the latter's evidential power for the given target entity.
* A Disambiguation Evidence Model Construction Process that builds, in a semi-automatic manner, a disambiguation evidence model for a given scenario.
* An Entity Disambiguation Process that uses the evidence model to detect and extract from a given text terms that refer to the scenario's target entities. Each term is linked to one or more possible entity URIs along with a confidence score calculated for each of them. The entity with the highest confidence should be the one the term actually refers to.

In the following paragraphs we elaborate on each of the above components.

====3.1 Disambiguation Evidence Model and its Construction====
For the purposes of this paper we define an ontology as a tuple <math>O = \{C, R, I, i_C, i_R\}</math> where

* <math>C</math> is a set of concepts.
* <math>I</math> is a set of instances.
* <math>R</math> is a set of binary relations that may link pairs of concept instances.
* <math>i_C</math> is a concept instantiation function <math>C \rightarrow I</math>.
* <math>i_R</math> is a relation instantiation function <math>R \rightarrow I \times I</math>.

The Disambiguation Evidence Model defines, for each ontology instance, which other instances should be used as evidence towards its correct meaning interpretation, and to what extent. More formally, given a domain ontology <math>O</math>, a disambiguation evidence model is defined as a function <math>dem : I \times I \rightarrow [0, 1]</math>. If <math>i_1, i_2 \in I</math> then <math>dem(i_1, i_2)</math> is the degree to which the existence, within the text, of <math>i_2</math> should be considered an indication that <math>i_1</math> is the correct meaning of any text term that has <math>i_1</math> within its possible interpretations.

To construct the optimal evidence model for a given disambiguation scenario we proceed as follows: First, based on the scenario, we determine the concepts the instances of which we wish to disambiguate (e.g. players, teams and managers for the football match scenario).
Then, for each of these concepts, we determine the related concepts whose instances may serve as contextual disambiguation evidence. The result of the above analysis should be a disambiguation evidence concept mapping function <math>ev_C : C \rightarrow C \times R^n</math> which, given a target concept <math>c_t \in C</math>, returns the concepts which may act as evidence for it along with the ontological relations whose composition links this concept to the target one. Table 1 contains an example of such a function for the football match descriptions scenario where, for instance, soccer players provide evidence for other soccer players that play in the same team. This mapping, shown in the second row of the table, is facilitated by the composition of the relation dbpprop:currentclub (that relates players to their current teams) and its inverse, is dbpprop:currentclub of (that relates teams to their current players). Table 2 illustrates a similar mapping for the military conflict texts scenario.

{| class="wikitable"
|+ Table 1. Sample Disambiguation Evidence Concept Mapping for Football Match Descriptions
! Target Concept !! Evidence Concept !! Relation(s) linking Evidence to Target
|-
| dbpedia-owl:SoccerPlayer || dbpedia-owl:SoccerClub || is dbpprop:currentclub of
|-
| dbpedia-owl:SoccerPlayer || dbpedia-owl:SoccerPlayer || dbpprop:currentclub, is dbpprop:currentclub of
|-
| dbpedia-owl:SoccerClub || dbpedia-owl:SoccerPlayer || dbpprop:currentclub
|-
| dbpedia-owl:SoccerClub || dbpedia-owl:SoccerManager || dbpedia-owl:managerClub
|-
| dbpedia-owl:SoccerManager || dbpedia-owl:SoccerClub || is dbpedia-owl:managerClub of
|}

Using the disambiguation evidence concept mapping, we can then automatically derive the disambiguation evidence model <math>dem</math> as follows: Given a target concept <math>c_t \in C</math> and an evidence concept <math>c_e \in C</math>, then for each instance <math>i_t \in i_C(c_t)</math> and <math>i_e \in i_C(c_e)</math> that are related to each other through the composition of relations <math>\{r_1, r_2, ..., r_n\} \in ev_C(c_t)</math> we derive the set of instances <math>I_t \subseteq I</math> which share common names with <math>i_t</math> and are also related to <math>i_e</math> through <math>\{r_1, r_2, ..., r_n\} \in ev_C(c_t)</math>. Then the value of <math>dem</math> for this pair of instances is computed as follows:

<math>dem(i_t, i_e) = \frac{1}{|I_t|} \qquad (1)</math>

{| class="wikitable"
|+ Table 2. Sample Disambiguation Evidence Concept Mapping for Military Conflict Texts
! Target Concept !! Evidence Concept !! Relation(s) linking Evidence to Target
|-
| dbpedia-owl:PopulatedPlace || dbpedia-owl:MilitaryConflict || dbpprop:place
|-
| dbpedia-owl:PopulatedPlace || dbpedia-owl:MilitaryConflict || dbpprop:place, dbpedia-owl:isPartOf
|-
| dbpedia-owl:PopulatedPlace || dbpedia-owl:MilitaryPerson || is dbpprop:commander of, dbpprop:place
|-
| dbpedia-owl:PopulatedPlace || dbpedia-owl:PopulatedPlace || dbpedia-owl:isPartOf
|-
| dbpedia-owl:MilitaryPerson || dbpedia-owl:MilitaryConflict || dbpprop:commander
|}

The intuition behind this formula is that the evidential power of a given entity is inversely proportional to the number of different target entities it provides evidence for. If, for example, a given military person has fought in many different locations with the same name, then that person's evidential power for this name is low.

====3.2 Entity Disambiguation Process====
The entity reference resolution process for a given text document and a disambiguation evidence model starts by extracting from the text the set of terms <math>T</math> that match some instance belonging to a target or an evidence concept, that is some <math>i \in i_C(c)</math>, <math>c \in C_t \cup C_e</math>. Along with that we derive a term-meaning mapping function <math>m : T \rightarrow I</math> that returns for a given term <math>t \in T</math> the instances it may refer to. We also consider <math>I_{text}</math> to be the set of all these instances.

Then we consider the set of potential target instances found within the text, <math>I^t_{text} \subseteq I_{text}</math>, and for each <math>i_t \in I^t_{text}</math> we derive all the instances <math>i_e</math> from <math>I_{text}</math> for which <math>dem(i_t, i_e) > 0</math>. Subsequently, by combining the evidence model <math>dem</math> with the term meaning function <math>m</math> we are able to derive an entity-term support function <math>sup : I^t_{text} \times T \rightarrow [0, 1]</math> that returns for a target entity <math>i_t \in I^t_{text}</math> and a term <math>t \in T</math> the degree to which <math>t</math> supports <math>i_t</math>:

<math>sup(i_t, t) = \frac{1}{|m(t)|} \sum_{i_e \in m(t)} dem(i_t, i_e) \qquad (2)</math>

Using this function we are able to calculate for a given term in the text the confidence that it refers to the entity <math>i_t \in m(t)</math> as follows:

<math>conf(i_t) = \frac{\sum_{t \in T} K(i_t, t)}{\sum_{i'_t \in m(t)} \sum_{t \in T} K(i'_t, t)} \cdot \sum_{t \in T} sup(i_t, t) \qquad (3)</math>

where <math>K(i_t, t) = 1</math> if <math>sup(i_t, t) > 0</math> and 0 otherwise, and the outer sum in the denominator ranges over all candidate entities <math>i'_t</math> of the term being resolved. In other words, the overall support score for a given candidate target entity is equal to the sum of the entity's partial supports (i.e. function <math>sup</math>) weighted by the relative number of terms that support it. It should be noted that in the above process we adopt the one referent per discourse approach which assumes one and only one meaning for a term in a discourse.

===4 Framework Application and Evaluation===
To evaluate the effectiveness of our framework we applied it in the two scenarios we mentioned in the introduction, the one involving disambiguation in football match descriptions and the other in texts describing military conflicts. In both cases we used DBPedia as a source of semantic information and we i) defined a disambiguation evidence model for each scenario and ii) used these models to perform entity disambiguation in a representative set of texts. Then we measured the precision and recall of the process. Precision was determined as the ratio of correctly interpreted terms (i.e. terms for which the interpretation with the highest confidence was the correct one) to the total number of interpreted terms (i.e. terms with at least one interpretation). Recall was determined as the ratio of correctly interpreted terms to the total number of annotated terms in the input texts. It should be noted that all target terms for disambiguation in the input texts were known to the knowledge base (i.e. DBPedia).

Finally, the results of the above evaluation process were compared to those achieved by two publicly available semantic annotation and disambiguation systems, namely DBPedia Spotlight [8] and AIDA [6].
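The two evaluation measures defined above can be sketched in a few lines of Python. This is only an illustration of the definitions, not the evaluation code used in the experiments: the function names and the counts in the usage note are hypothetical, and the F1 helper assumes the standard harmonic-mean definition, which the text itself does not spell out.

```python
def precision_recall(correct: int, interpreted: int, annotated: int):
    """Precision and recall as defined in the text.

    correct     -- terms whose highest-confidence interpretation is the right one
    interpreted -- terms that received at least one interpretation
    annotated   -- all annotated terms in the input texts
    """
    precision = correct / interpreted if interpreted else 0.0
    recall = correct / annotated if annotated else 0.0
    return precision, recall


def f1_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (assumed standard F1 definition)."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


if __name__ == "__main__":
    # Hypothetical counts, chosen only to exercise the functions.
    p, r = precision_recall(correct=8, interpreted=10, annotated=16)
    print(p, r, f1_measure(p, r))
```

Because a system may leave some annotated terms without any interpretation, the two denominators differ, which is why a system can combine high precision with low recall.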
The two systems (DBPedia Spotlight, available at http://dbpedia-spotlight.github.com/demo/index.html, and AIDA, available at https://d5gate.ag5.mpi-sb.mpg.de/webaida/) were chosen for comparison because i) they also use DBPedia as a knowledge source and ii) they provide some basic mechanisms for constraining the types of entities to be disambiguated, though not in the same methodical way as our framework does. In practice, the two systems merely give users the capability to select the classes whose instances are to be included in the process. In any case, it should be made clear that the goal of this comparison was not to disprove the effectiveness and value of these systems as tools for open domain and unconstrained situations but rather to verify our claim that our approach is more appropriate for disambiguation in "controlled" scenarios, i.e. scenarios in which a priori knowledge about what entities and relations are expected to be present in the text is available. A useful evaluation of popular semantic entity recognition systems for open scenarios may be found in [11].

====4.1 Football Match Descriptions Scenario====
In this scenario we had to semantically annotate a set of textual descriptions of football match highlights like the following: "It's the 70th minute of the game and after a magnificent pass by Pedro, Messi managed to beat Claudio Bravo. Barcelona now leads 1-0 against Real.". These descriptions were used as metadata of videos showing these highlights and our goal was to determine, in an unambiguous way, which were the participants (players, coaches and teams) in each video. The annotated descriptions were then to be used as part of a semantic search application where users could retrieve, with much higher accuracy, videos that showed their favorite player or team.

To achieve this goal, we applied our framework and built a disambiguation evidence model, based on DBPedia, whose evidence mapping function was that of Table 1.
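How an evidence model of this kind is derived and then used can be illustrated with a minimal Python sketch of equations 1 to 3 on toy data. Everything in it is an assumption made for illustration: the identifiers are simplified stand-ins for DBPedia resources, the name sets are invented, only the first row of Table 1 (clubs as evidence for their current players) is modelled, and the function names are hypothetical rather than part of the framework.

```python
# Toy "current club" relation: target player -> evidence club (assumed data).
CURRENT_CLUB = {
    "Pedro_Leon": "Getafe_CF",
    "Pedro_Rios_Maestre": "Getafe_CF",
    "Pedro_Rodriguez_Ledesma": "FC_Barcelona",
    "Lionel_Messi": "FC_Barcelona",
}

# Surface names under which each instance may appear in text (assumed data).
NAMES = {
    "Pedro_Leon": {"Pedro", "Pedro Leon"},
    "Pedro_Rios_Maestre": {"Pedro", "Pedro Rios"},
    "Pedro_Rodriguez_Ledesma": {"Pedro"},
    "Lionel_Messi": {"Messi"},
    "Getafe_CF": {"Getafe"},
    "FC_Barcelona": {"Barcelona"},
}


def build_dem(related, names):
    """Equation 1: dem(i_t, i_e) = 1 / |I_t|, where I_t holds the instances
    that share a name with i_t and are related to the same evidence i_e."""
    dem = {}
    for target, evidence in related.items():
        same_name = [
            other for other, ev in related.items()
            if ev == evidence and names[other] & names[target]
        ]  # always contains the target itself
        dem[(target, evidence)] = 1.0 / len(same_name)
    return dem


def meanings(term, names):
    """Term-meaning function m(t): instances that carry the given name."""
    return {inst for inst, labels in names.items() if term in labels}


def sup(target, term, dem, names):
    """Equation 2: average evidential power of the term's meanings for target."""
    candidates = meanings(term, names)
    if not candidates:
        return 0.0
    return sum(dem.get((target, e), 0.0) for e in candidates) / len(candidates)


def conf(term, terms, dem, names):
    """Equation 3, evaluated for every candidate entity of `term`."""
    candidates = meanings(term, names)
    supporters = {
        c: sum(1 for t in terms if sup(c, t, dem, names) > 0) for c in candidates
    }
    total = sum(supporters.values())
    return {
        c: (supporters[c] / total if total else 0.0)
        * sum(sup(c, t, dem, names) for t in terms)
        for c in candidates
    }


if __name__ == "__main__":
    dem = build_dem(CURRENT_CLUB, NAMES)
    scores = conf("Pedro", ["Pedro", "Messi", "Barcelona"], dem, NAMES)
    print(max(scores, key=scores.get), scores)
```

In this toy setting only the term "Barcelona" supports any candidate for "Pedro", so the Barcelona player receives the whole confidence mass; adding the player-to-player row of Table 1 would let "Messi" contribute evidence as well.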
The mapping function of Table 1 was subsequently used to automatically calculate (through equation 1) the function <math>dem</math> for all pairs of target and evidence entities. Table 3 shows a small sample of these pairs where, for example, Getafe acts as evidence for the disambiguation of Pedro Leon because the latter is one of its current players. Its evidential power for that player, however, is 0.5, since in the same team there is another player with the same name (i.e. Pedro Rios Maestre).

{| class="wikitable"
|+ Table 3. Examples of Target-Evidential Entity Pairs for the Football Scenario
! Target Entity !! Evidential Entity !! dem
|-
| dbpedia:Real Sociedad || dbpedia:Claudio Bravo (footballer) || 1.0
|-
| dbpedia:Pedro Rodriguez Ledesma || dbpedia:FC Barcelona || 1.0
|-
| dbpedia:Pedro Leon || dbpedia:Getafe CF || 0.5
|-
| dbpedia:Pedro Rios Maestre || dbpedia:Getafe CF || 0.5
|-
| dbpedia:Lionel Messi || dbpedia:FC Barcelona || 1.0
|}

Using this model, we applied our disambiguation process to 50 of the above texts, all containing ambiguous entity references. The overall number of references was 126, with about 90% of them being ambiguous. On average, each ambiguous entity reference had 3 possible interpretations, with player names being the most ambiguous. Table 4 shows the results achieved by our approach as well as by DBPedia Spotlight and AIDA. It should be noted that when using the latter systems, we used their concept selection facilities in order to constrain the space of possible interpretations. Still, as one can see from the table data, the constraining of the semantic data that our custom disambiguation evidence model facilitated (e.g. the consideration of only the current membership relation between players and teams) was more effective: it yielded the best recall and F1 measure, with precision on par with the best competing system.

{| class="wikitable"
|+ Table 4. Entity Disambiguation Evaluation Results in the Football Scenario
! System/Approach !! Precision !! Recall !! F1 Measure
|-
| Proposed Approach || 84% || 81% || 82%
|-
| AIDA || 62% || 56% || 59%
|-
| DBPedia Spotlight || 85% || 26% || 40%
|}

====4.2 Military Conflict Texts Scenario====
In this scenario our task was to disambiguate location references within a set of textual descriptions of military conflicts like the following: "The Siege of Augusta was a significant battle of the American Revolution. Fought for control of Fort Cornwallis, a British fort near Augusta, the battle was a major victory for the Patriot forces of Lighthorse Harry Lee and a stunning reverse to the British and Loyalist forces in the South". For that we again used DBPedia and defined the disambiguation evidence mapping function of Table 2 which, in turn, produced the evidence model that is (partially) depicted in Table 5.

{| class="wikitable"
|+ Table 5. Examples of Target-Evidential Entity Pairs for the Military Conflict Scenario
! Location !! Evidential Entity !! dem
|-
| dbpedia:Columbus, Georgia || James H. Wilson || 1.0
|-
| dbpedia:Columbus, New Mexico || dbpedia:Pancho Villa || 1.0
|-
| dbpedia:Beaufort County, South Carolina || dbpedia:Raid at Combahee Ferry || 1.0
|-
| dbpedia:Beaufort County, South Carolina || dbpedia:James Montgomery (colonel) || 1.0
|-
| dbpedia:Beaufort County, North Carolina || dbpedia:Battle Of Washington || 1.0
|-
| dbpedia:Beaufort County, North Carolina || dbpedia:John G. Foster || 1.0
|}

Using this model, we applied, as in the football scenario, our disambiguation process to a set of 50 military conflict texts, targeting the locations mentioned in them. The average reference ambiguity of this set was 5, over a total of 55 locations. Table 6 shows the achieved results, in which our framework obtains the highest precision, recall and F1 measure of the three systems.

{| class="wikitable"
|+ Table 6. Entity Disambiguation Evaluation Results in the Military Conflict Scenario
! System/Approach !! Precision !! Recall !! F1 Measure
|-
| Proposed Approach || 88% || 83% || 85%
|-
| DBPedia Spotlight || 71% || 69% || 70%
|-
| AIDA || 44% || 40% || 42%
|}

===5 Discussion===
It should be clear from the previous sections that our framework is not independent of the content or domain of the input texts but rather adaptable to them. That is exactly its main differentiating feature, as our purpose was not to build another generic disambiguation system but rather a reusable framework that can i) be relatively easily adapted to the particular characteristics of the domain and application scenario at hand and ii) exploit these characteristics in order to increase the effectiveness of the disambiguation process. Our motivation for that was that, as the comparative evaluation of the previous section showed, the scenario adaptation capabilities of existing generic disambiguation systems can be inadequate in certain scenarios (like the ones described in this paper), thus limiting their applicability and effectiveness.

Of course, the usability and effectiveness of our approach increase with the content specificity of the texts to be disambiguated and the availability of a priori knowledge about their content. The greater these two parameters are, the more applicable our approach is and the more effective the disambiguation is expected to be. The opposite is true as the texts become more generic and the information we have about them becomes scarcer. A method that could assess a priori how suitable our framework is for a given scenario would be useful, but it falls outside the scope of this paper. Also, the framework's approach is not completely automatic, as it requires some knowledge engineer or domain expert to manually define the scenario's disambiguation evidence mapping function.
Nevertheless, this function is defined at the schema level, which makes the number of required mappings for most scenarios rather small and manageable.

Finally, although we have not formally evaluated the scalability of our approach, the fact that our framework is based on the constraining of the semantic data to be used makes us expect that it will perform faster than traditional approaches that use the whole amount of data. Furthermore, as the disambiguation evidence model may be constructed offline and stored in some index, the most probable bottleneck of the process will be the phase of determining the candidate entities for the extracted terms rather than the resolution process. Nevertheless, a more rigorous scalability study will have to be made as part of future work.

===6 Conclusions and Future Work===
In this paper we proposed a novel framework for optimizing named entity disambiguation in well-defined and adequately constrained scenarios through the customized selection and exploitation of semantic data. First we described how, given a priori knowledge about the domain(s) and expected content of the texts that are to be analyzed, one can use the semantic data to define an evidence model that determines which semantic entities should be used as contextual evidence for the disambiguation task at hand, and to what extent. Then we described the process through which such a model can actually be used for this task. The overall framework was experimentally evaluated in two specific scenarios, and in both it achieved a higher F1 measure than existing approaches that are designed to work in open domains and unconstrained scenarios.

Future work will focus on the further automation of the disambiguation evidence model construction by means of data mining and machine learning techniques. Moreover, an online tool that enables users to dynamically build such models out of existing semantic data and use them for disambiguation purposes will be developed.
===Acknowledgements===
This work was supported by the Spanish project CENIT-2009-1026 BuscaMedia and by the European Commission under contract FP7-248984 GLOCAL.

===References===
# Auer, S., Bizer, C., Kobilarov, G., Lehmann, J., Cyganiak, R., Ives, Z.G.: DBpedia: A Nucleus for a Web of Open Data. In Proceedings of the 6th International Semantic Web Conference, pages 722-735, 2007.
# Fader, A., Soderland, S., Etzioni, O.: Scaling Wikipedia-based named entity disambiguation to arbitrary web text. In Proceedings of the WikiAI 09 - IJCAI Workshop: User Contributed Knowledge and Artificial Intelligence: An Evolving Synergy, Pasadena, CA, USA, July 2009.
# Ferragina, P., Scaiella, U.: TAGME: on-the-fly annotation of short text fragments (by Wikipedia entities). In Proceedings of the 19th ACM International Conference on Information and Knowledge Management, ACM, New York, USA, 1625-1628.
# Gruhl, D., Nagarajan, M., Pieper, J., Robson, C., Sheth, A.P.: Context and domain knowledge enhanced entity spotting in informal text. In Proceedings of the 8th International Semantic Web Conference, pages 260-276, 2009.
# Hassell, J., Aleman-Meza, B., Arpinar, I.: Ontology-driven automatic entity disambiguation in unstructured text. In Proceedings of the 3rd European Semantic Web Conference, pages 44-57, Springer Berlin, Heidelberg, 2006.
# Hoffart, J., Yosef, M.A., Bordino, I., Fürstenau, H., Pinkal, M., Spaniol, M., Taneva, B., Thater, S., Weikum, G.: Robust disambiguation of named entities in text. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, Association for Computational Linguistics, Stroudsburg, PA, USA, 782-792.
# Kleb, J., Abecker, A.: Entity Reference Resolution via Spreading Activation on RDF-Graphs. In Proceedings of the 7th European Semantic Web Conference, pages 152-166, Springer Berlin, Heidelberg, 2006.
# Mendes, P.N., Jakob, M., Garcia-Silva, A., Bizer, C.: DBpedia Spotlight: shedding light on the web of documents. In Proceedings of the 7th International Conference on Semantic Systems, ACM, New York, USA, 1-8, 2011.
# Miller, G., Charles, W.: Contextual correlates of semantic similarity. Language and Cognitive Processes, 6(1):1-28, 1991.
# Pilz, A., Paass, G.: Named entity resolution using automatically extracted semantic information. Workshop on Knowledge Discovery, Data Mining, and Machine Learning, pages 84-91, 2009.
# Rizzo, G., Troncy, R.: NERD: A Framework for Evaluating Named Entity Recognition Tools in the Web of Data. In 10th International Semantic Web Conference, Demo Session, pages 1-4, Bonn, Germany, 2011.
# Rusu, D., Fortuna, B., Mladenic, D.: Automatically Annotating Text with Linked Open Data. In 4th Linked Data on the Web Workshop (LDOW 2011), 20th World Wide Web Conference, Hyderabad, India, 2011.
# Suchanek, F.M., Kasneci, G., Weikum, G.: YAGO: A Core of Semantic Knowledge. In 16th World Wide Web Conference, 2007.