=Paper=
{{Paper
|id=None
|storemode=property
|title=Giving a Sense: A Pilot Study in Concept Annotation from Multiple Resources
|pdfUrl=https://ceur-ws.org/Vol-1422/88.pdf
|volume=Vol-1422
|dblpUrl=https://dblp.org/rec/conf/itat/SudarikovB15
}}
==Giving a Sense: A Pilot Study in Concept Annotation from Multiple Resources==
Roman Sudarikov and Ondřej Bojar
Charles University in Prague, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics
Malostranské náměstí 25, 11800 Praha 1, Czech Republic
http://ufal.mff.cuni.cz/
{sudarikov,bojar}@ufal.mff.cuni.cz

Abstract: We present a pilot study in web-based annotation of words with senses coming from several knowledge bases and sense inventories. The study is the first step in a planned larger annotation of "grounding" and should allow us to select a subset of these "dictionaries" that seem to cover any given text reasonably well and show an acceptable level of inter-annotator agreement.

Keywords: word-sense disambiguation, entity linking, linked data

===1 Introduction===

Annotated resources are very important for training, tuning or evaluating many NLP tasks. Equipped with experience in treebanking, we now move to resources for word sense disambiguation (WSD) and entity linking (EL). By EL, we mean the task of attaching a unique ID from some database to occurrences of (named) entities in text [1]. Both entity linking and word-sense disambiguation have been extensively studied, see for example [2–4]. Although only a few studies consider several knowledge bases and sense inventories at once [1, 5], the convergence between these two tasks is apparent; for example, the 2015 SemEval Task 13 promoted research in the direction of joint word sense and named entity disambiguation [6].

We understand the terms ontology, knowledge base and sense inventory in the following way:

* Ontology is a formal representation of a domain of knowledge. It is an abstract entity: it defines the vocabulary for a domain and the relations between concepts, but an ontology says nothing about how that knowledge is stored (as a physical file, in a database, or in some other form), or indeed how the knowledge can be accessed.
* Knowledge base is a database, a repository of information that can be accessed and manipulated in some predefined fashion. Knowledge is stored in a knowledge base according to an ontology.
* Sense inventory is a database, often built on the basis of a corpus, providing clustered senses for the words or expressions in the corpus.

However, we recognize the blending of knowledge bases and sense inventories, so we will use the generic terms dictionary or resource interchangeably for either of them.

In this pilot study, we examine several such dictionaries in terms of their coverage and annotator agreement. Unlike other works on "grounding", which try to link only the most important words in the sentence [7, 8], we aim at complete coverage of a given text, i.e. all content words or multi-word expressions regardless of their part of speech or role in the sentence. Some of the examined resources have a clear bias towards certain parts of speech; for example, valency dictionaries cover only verbs. We nevertheless ask our annotators to annotate even across parts of speech if the matching POS is not included in the resource. For instance, verbs can get nominal entries in Wikipedia and nouns get verb frames. (The conversion of nouns to predicates whenever possible is explicitly demanded in some frameworks, e.g. in Abstract Meaning Representation, AMR [9].)

In Section 2, we describe the sense inventories included in our experiment. Section 3 provides a unifying view on these sources and introduces our annotation interface. We conducted two experiments with English and Czech texts using the interface, slightly adapting the interface for the second run. Details are in Section 4 and Section 5.
===2 Resources Included===

Sense inventories and knowledge bases are plentiful and they differ in many aspects, including domain coverage, level of detail, frequency of updates, integration of other resources and ways of accessing them. Some of them implement the Resource Description Framework, the metadata data model designed by the W3C for better data representation in the Semantic Web, while others are simply collections of links on the web.

We selected the following subset of general resources for our experiment:

BabelNet [10] is a multilingual knowledge base which combines several knowledge resources, including Wikipedia, WordNet, OmegaWiki and Wiktionary. The sources are automatically merged and accessible via an offline Java API or an online REST API. An added benefit is the multilinguality of BabelNet: the same resource can be used for genuine (as opposed to cross-lingual) annotation in both languages of our interest, English and Czech. The main limitation is that BabelNet is not updated continuously, so we also added both live Wikipedia and Wiktionary as separate sources. BabelNet provides information about nouns, verbs, adjectives and adverbs, but as stated above, we are also interested in cross-POS annotation.

Wikipedia (http://wikipedia.org) is currently the biggest online encyclopedia, with live updates from (hundreds of) thousands of contributors, so it can cover new concepts very quickly. Wikipedia tends to nest all possible concepts as nouns; for example, en.wikipedia.org/wiki/funny redirects to the page "Humour".

Wiktionary (http://wiktionary.org) is a companion to Wikipedia that covers all parts of speech. It includes a multilingual thesaurus, phrase books and language statistics. Each word in Wiktionary can have etymology, pronunciation, sample quotations, synonyms, antonyms and translations, for better understanding of the word.

PDT-VALLEX and EngVallex (valency lexicons for Czech and English): Valency or subcategorization lexicons formally capture verb valency frames, i.e. their syntactic neighborhood in the sentence [11, 12]. We use the valency lexicons for Czech and English in their offline XML form as distributed with the tree editor TrEd 2.0 (http://ufal.mff.cuni.cz/tred/).

Google Search (GS, http://google.com): From our preliminary experiments, we had the impression that no resource covers all expressions seen in our data, but searching the web almost always provides some explanation. We thus include the top ten results returned by Google Search as a special kind of dictionary, where the "concept" is a query string and each result is considered to be its "sense".

Aside from coverage and frequency of updates, another reason to include GS is that it provides "senses" at a very different level of granularity than the others. For instance, a whole Wiktionary page can appear as one of the options among GS "senses". It will also often be a very sensible choice, despite the fact that it actually covers several different meanings of the word.

We find the task of matching senses coming from different ontologies, each providing a different angle of view or granularity, very interesting. The current experiments serve as a basis for its further investigation.
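As an illustration of how such a resource can be queried programmatically, the sketch below retrieves candidate BabelNet synset IDs for a lemma via the public HTTP API. It is only a minimal sketch: the endpoint path, API version and the placeholder key are assumptions for illustration, not the paper's own tooling (which used the offline Java API and the REST API mentioned above).

<syntaxhighlight lang="python">
# Minimal sketch: fetch candidate BabelNet synset IDs for a lemma.
# The endpoint path/version and the API key are assumptions; consult the
# current BabelNet HTTP API documentation before relying on them.
import requests

BABELNET_ENDPOINT = "https://babelnet.io/v9/getSynsetIds"  # assumed endpoint/version
API_KEY = "YOUR_BABELNET_KEY"                              # placeholder

def babelnet_selection_list(lemma, lang="EN"):
    """Return a list of BabelNet synset IDs for the given lemma."""
    params = {"lemma": lemma, "searchLang": lang, "key": API_KEY}
    response = requests.get(BABELNET_ENDPOINT, params=params, timeout=10)
    response.raise_for_status()
    # The API is expected to return a JSON list of objects such as
    # {"id": "bn:00021487n", ...}; the IDs form the selection list.
    return [entry["id"] for entry in response.json()]

if __name__ == "__main__":
    print(babelnet_selection_list("mouse"))  # e.g. ['bn:00021487n', ...]
</syntaxhighlight>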
===3 Annotation Interface===

To provide a unified view on the various resources, we use the terms query, selection list and selection. Given an expression in a text, which can be a word or a phrase, even a non-continuous one, and a resource which should be used to annotate it, the system constructs a query. Querying the resource, we get a selection list, i.e. a list of possible senses.

The process of extracting the selection list depends on the resource. It is straightforward for Google Search (each result becomes an option) and complicated for Wiktionary, see Section 3.1 below. In principle, and in order to include any conceivable resource, even field-specific or ad hoc ones, the annotator should be free to select the selection list prior to the annotation. Our annotation interface allows the query to be overwritten for cases where the automatic construction does not lead to a satisfactory selection list.

Finally, the annotator is presented with the selection list to make his choice (or multiple choices). Overall, the annotator picks one of these options:

* Whole Page means that the current URL is already a good description of the sense and no selection list is available on the page. The annotators were asked to change the query and rather obtain a selection list (e.g. a disambiguation page in Wikipedia) whenever possible.
* Bad List means that the extraction of the selection list failed to provide correct senses. The annotators were supposed to try changing the query to obtain a usable list and to resort to the "Bad List" option only if inevitable.
* None indicates that the selection list is correct but that it lacks the relevant sense.
* One or more senses selected is the desired annotation: the list, for the particular pair of selected word(s) and selected resource, was correct and the annotator was able to find the relevant sense(s) in the list.

Our annotation interface (Figure 1 in the original paper, showing the annotation of the words "DELETE key" in the sentence "Move the mouse cursor..." with Google Search "senses") displays the input sentence, tabs for individual sense inventories, the selection list from the current resource and also the complete page where the selection list comes from. The procedure is straightforward: (1) select one or more words in the sentence using checkboxes, (2) select a resource (we asked our annotators to use them all, one by one), (3) check if the selection list is OK and modify the query if needed, (4) make the annotation choice by marking one or more of the checkboxes in the selection list, and (5) save the annotation.

====3.1 Queries and Selection Lists for Individual Resources====

This is how we construct queries and extract selection lists for each of our dictionaries, given one or more words from the annotated sentence (a code sketch of two of these extractors follows the list):

* BabelNet: We search BabelNet for the lemma of the selected word (or the phrase of lemmas if more words are selected). The selection list is the list of all obtained BabelNet IDs.
* Google Search: We search for the lemmas of the selected words and return the snippets of the top ten results. The selection list is the list of the snippets' titles.
* Wikipedia: We search for the disambiguation page for the selected words and, if it is not found, we search for the page with the title matching the lemmas of the selected words. The selection list for disambiguation pages is constructed by fetching hyperlinks appearing within listings nested in particular HTML blocks. For other pages we fetch links from the Table of Contents and the first hyperlink from each listing item.
* Wiktionary: We search for the page with the title equal to the lemmas of the selected words. The selection list is created using the same heuristics as for Wikipedia.
* Vallex: We scan the XML file and return all the frames belonging to the verb with the lemma matching the selected word's lemma.
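The following is a minimal sketch, under our own assumptions, of two of the extractors described above: pulling candidate senses from a Wikipedia disambiguation page and collecting verb frames from a Vallex-style XML file. The HTML selectors and the XML element/attribute names (word, frame, lemma, id) are illustrative placeholders rather than the exact markup and schema used in the experiments.

<syntaxhighlight lang="python">
# Sketch of two selection-list extractors; the HTML structure and the XML
# schema (element/attribute names) are assumptions for illustration only.
import requests
import xml.etree.ElementTree as ET
from bs4 import BeautifulSoup

def wikipedia_selection_list(lemma, lang="en"):
    """Collect link targets from a Wikipedia disambiguation page for `lemma`."""
    url = f"https://{lang}.wikipedia.org/wiki/{lemma}_(disambiguation)"
    html = requests.get(url, timeout=10).text
    soup = BeautifulSoup(html, "html.parser")
    options = []
    # Hyperlinks inside list items of the page body serve as candidate senses.
    for li in soup.select("div.mw-parser-output li"):
        link = li.find("a", href=True)
        if link and link["href"].startswith("/wiki/"):
            options.append(link["href"].split("/wiki/")[-1])
    return options

def vallex_selection_list(lemma, vallex_xml_path):
    """Return the frames of the verb whose lemma matches the selected word."""
    tree = ET.parse(vallex_xml_path)
    frames = []
    for word in tree.iter("word"):              # assumed element name
        if word.get("lemma") == lemma:          # assumed attribute name
            frames.extend(frame.get("id") for frame in word.iter("frame"))
    return frames
</syntaxhighlight>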
===4 First Experiment===

The first experiment was held in March 2014. The 7 participating annotators (none of whom had any experience in annotation tasks) were asked to annotate sentences from PCEDT 2.0 (http://ufal.mff.cuni.cz/pcedt2.0/en/index.html) with Czech and English sources: Wikipedia and Wiktionary for both languages, BabelNet, Google Search, and the Czech and English Vallexes. Each annotator was given a set of sentences in English or Czech and they were asked to annotate as many words or phrases in each sentence as possible, with as many reasonable meanings as they could. We required the annotators to annotate across parts of speech if possible (for instance, to annotate the noun "teacher" with the corresponding verb "to teach"). This requirement appeared because we wanted to evaluate the possibility of using more abstract senses as used, for instance, in works with AMR.

====4.1 Gathered Annotations====

In total, we collected 507 annotations for 158 units; 75 of these units had more than one annotation.

The upper part of Table 1 provides details on how often each of the annotation options was picked for a given source in the first annotation experiment. Note that in the first experiment, we did not offer the "Whole Page" option. We see that the sources exhibit slightly different patterns of use. Wikipedia has many "Bad List" options selected due to the issue described in Section 4.2. GS is the most ambiguous resource: the user picked two or more senses in about one half of the GS annotations. The highest number of "Bad Lists" was received by the English Wikipedia (18 out of 40).

Table 1: Selection statistics, the first (upper part) and second (lower part) annotation experiments. The last four columns count annotations with one or more senses selected.

{| class="wikitable"
! Source !! Total !! Whole page !! Bad List !! None !! 1 !! 2 !! 3 !! 4 or more
|-
| Babelnet || 28 || - || 1 || 3 || 23 || 1 || 0 || 0
|-
| Google Search || 71 || - || 1 || 9 || 36 || 15 || 5 || 5
|-
| CS Vallex || 38 || - || 0 || 2 || 29 || 6 || 1 || 0
|-
| EN Vallex || 19 || - || 1 || 0 || 18 || 0 || 0 || 0
|-
| CS Wikipedia || 38 || - || 9 || 12 || 15 || 1 || 0 || 1
|-
| EN Wikipedia || 114 || - || 26 || 16 || 63 || 3 || 0 || 6
|-
| CS Wiktionary || 21 || - || 1 || 3 || 7 || 4 || 5 || 1
|-
| EN Wiktionary || 21 || - || 0 || 0 || 18 || 2 || 1 || 0
|-
| Babelnet || 98 || 24 || 0 || 10 || 54 || 6 || 2 || 2
|-
| Google Search || 93 || 0 || 0 || 26 || 19 || 16 || 11 || 21
|-
| EN Vallex || 15 || 4 || 0 || 3 || 6 || 2 || 0 || 0
|-
| EN Wikipedia || 103 || 23 || 7 || 36 || 35 || 2 || 0 || 0
|-
| EN Wiktionary || 98 || 17 || 23 || 4 || 40 || 9 || 2 || 3
|}

Figure 2 in the original paper (annotations from a given dictionary in the first experiment broken down by part of speech of the annotated words) shows the distribution of POSes per source. Google Search seems to be the most versatile resource, covering all parts of speech well. The relatively low use of BabelNet was due to the web API usage limit. The Vallexes work well for verbs, but cross-POS annotation remains an exception. Wikipedia and Wiktionary are indeed somewhat complementary in the POSes they cover.

====4.2 Bad List vs. None Issue====

The "Bad List" annotation should be used in two cases: (1) when the system fails to extract the selection list from a good page, and (2) when the whole page is wrong, for example when the system shows the Wikipedia page "South Africa" for the word "south". "None" was meant for correct selection lists (matching domain, reasonable options) where the right option is missing. The guidelines for the first experiment were not very clear on this, so some annotators marked problems with the selection list as "Bad List" and some used the label "None".

Manual revision revealed that only 10 out of 40 "Bad List" annotations were indeed "Bad List" in one of the two meanings described above. The right-hand part of Table 2 shows IAA after changing the wrongly annotated "Bad Lists" into "None".
====4.3 Inter-Annotator Agreement====

Inter-annotator agreement is a measure of how well two annotators make the same annotation decision for a certain item. In our case, it is measured as the percentage of cases in which a pair of annotators (2-IAA) agrees on the (set of) senses for a given annotation unit. The measurement was made pairwise over all the annotations which had more than one annotator. The results are presented in Table 2, before and after fixing the "Bad List" issue.

Table 2: Inter-annotator agreement in the first experiment, before (left) and after (right) the "Bad List" fix.

{| class="wikitable"
! Source !! Annotations !! 2-IAA !! Annotations !! 2-IAA
|-
| Babelnet || 29 || 0.69 || 29 || 0.69
|-
| GS || 120 || 0.24 || 120 || 0.24
|-
| CS Vallex || 46 || 0.58 || 46 || 0.58
|-
| EN Vallex || 19 || 1.00 || 19 || 1.00
|-
| CS Wikipedia || 47 || 0.32 || 43 || 0.35
|-
| EN Wikipedia || 183 || 0.05 || 181 || 0.10
|-
| CS Wiktionary || 38 || 0.29 || 38 || 0.35
|-
| EN Wiktionary || 25 || 0 || 25 || 0
|-
| Total: || 507 || 0.21 || 501 || 0.24
|}

In general, the IAA estimates should be treated with caution. Many units were assigned to only a single annotator, so they were not taken into account when computing IAA.

The extremely low IAA for English Wikipedia was caused by the following issue: for several units, one annotator tried to select all the senses to show that the whole page could be used, while others picked one or only a few senses. We resolved the issue by introducing the new option "Whole page" in the second experiment.

Interestingly, we see a negative correlation (Pearson correlation coefficient of -0.37) between the number of units annotated for a given source and the 2-IAA.
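A minimal sketch of how the pairwise 2-IAA described above can be computed follows; the in-memory data layout (sets of selected sense IDs per unit and annotator) is an assumption made for illustration, not the implementation actually used in the experiments.

<syntaxhighlight lang="python">
# Sketch: pairwise exact-match agreement (2-IAA) over sets of selected senses.
# The data layout is an assumption made for illustration.
from itertools import combinations

def two_iaa(annotations):
    """annotations: dict mapping unit -> {annotator: frozenset(sense_ids)}.
    Returns the fraction of annotator pairs (per unit) that selected exactly
    the same set of senses; units with a single annotator are skipped."""
    agreements, comparisons = 0, 0
    for per_annotator in annotations.values():
        if len(per_annotator) < 2:
            continue  # single-annotator units do not enter the IAA
        for a, b in combinations(per_annotator.values(), 2):
            comparisons += 1
            agreements += int(a == b)
    return agreements / comparisons if comparisons else float("nan")

example = {
    "unit1": {"A1": frozenset({"bn:00021487n"}), "A2": frozenset({"bn:00021487n"})},
    "unit2": {"A1": frozenset({"None"}), "A2": frozenset({"bn:00024529n"})},
}
print(two_iaa(example))  # 0.5
</syntaxhighlight>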
We also report Cohen's kappa [13], which reflects the agreement when agreement by chance is disregarded. In our setting, we estimate the agreement by chance as one over the length of the selection list plus two (for "None" and "Bad List"). This is a conservative estimate; in principle, the annotators were allowed to select any subset of the selection list. We compute kappa as K = (P_a - P_e) / (1 - P_e), where P_a is the total 2-IAA and P_e is the arithmetical average of the agreements by chance over all annotations. Kappa for the first experiment was 0.13.

To assess the level of uncertainty of these estimates, we use bootstrap resampling with 1000 resamples, which gives an IAA of 0.25 ± 0.1 and a kappa of 0.135 ± 0.115 for 95% of the samples.
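A sketch of the kappa correction and the bootstrap resampling under the chance-agreement estimate described above; the function names and the way per-annotation values are passed in are assumptions for illustration.

<syntaxhighlight lang="python">
# Sketch: kappa with the chance-agreement estimate 1 / (list length + extra
# options), plus bootstrap resampling of (IAA, kappa). Illustrative only.
import random

def chance_agreement(selection_list_len, extra_options=2):
    # extra_options = 2 in the first experiment ("None", "Bad List"),
    # 3 in the second one (adding "Whole Page").
    return 1.0 / (selection_list_len + extra_options)

def kappa(pa, pe):
    return (pa - pe) / (1.0 - pe)

def bootstrap_iaa(pair_agreements, chance_terms, resamples=1000, seed=0):
    """pair_agreements: list of 0/1 pairwise agreement outcomes; chance_terms:
    matching per-annotation chance-agreement estimates.
    Returns a list of (IAA, kappa) values, one per bootstrap resample."""
    rng = random.Random(seed)
    n = len(pair_agreements)
    samples = []
    for _ in range(resamples):
        idx = [rng.randrange(n) for _ in range(n)]
        pa = sum(pair_agreements[i] for i in idx) / n
        pe = sum(chance_terms[i] for i in idx) / n
        samples.append((pa, kappa(pa, pe)))
    return samples
</syntaxhighlight>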
===5 Second Experiment===

The second experiment was held in March 2015 with another group of 6 annotators. One of the annotators had experience in annotation tasks, while the others had no such experience. The setting of the experiment was slightly different. The annotators were asked to annotate only English sentences from the QTLeap project (http://qtleap.eu/), using BabelNet, Google Search, English Wikipedia, English Wiktionary and EngVallex. The guidelines were refined, asking the annotators to mark the largest possible span for each concept in the sentence, e.g. to annotate "mouse cursor" jointly as one concept and not separately as "computer pointing device" for the word "mouse" and "graphic representation of a computer mouse on the screen" for the word "cursor". The option "Whole page" was newly introduced to help users indicate that the whole page can be used as a sense.

====5.1 Gathered Annotations====

We collected 570 annotations for 35 words, 32 of which had annotations from more than one annotator. The number of units here is lower than in the first experiment because all our annotators used the same sentences. Also, for the second experiment we required the annotators to use all the resources for each unit, so we have more results per unit.

During the second experiment, the system processed 147 unique queries (in terms of selected word(s) and selected resource). All the resources received a nearly equal number of queries (about 30), except for Vallex, which received only 10 queries. The annotators changed the queries 59 times, but this also includes cases when Wikipedia used its own internal redirects, which our system did not distinguish from users' changes. BabelNet queries were changed 9 times, Google Search 2, Vallex 8, Wikipedia 21 and Wiktionary 19. Based on these numbers, GS may seem more reliable, but that is not necessarily true. One reason is that some of the changes for Wikipedia were made automatically by Wikipedia itself. The other argument is that users could limit their effort: after examining the first 10 GS results for a query, they might just pick the "Bad List" option and move on, not trying to change the query.

The POS-per-source distribution (Figure 3 in the original paper: annotations from a given dictionary in the second experiment broken down by part of speech of the annotated words) for the second experiment is similar to the first one, except for BabelNet, which did not reach any technical limit this time and was therefore used more often across all POSes.

====5.2 Coverage====

In Table 3, we show the coverage of content words in the second experiment. By content words we mean all the words in the sentence except for auxiliary verbs, punctuation, articles and prepositions. The instructions asked the annotators to annotate all content words. Each annotator completed a different number of sentences, so the number of annotated words differs. The column Attempted shows the share of content words with some annotation at all, while Labeled are words which received some sense, not just "None" or "Bad List". Both numbers are taken from the union over all annotators. BabelNet gets the best coverage in terms of Labeled annotations. The right-hand side of the table shows how many words each annotator has labeled. Since the union is considerably higher than the most productive annotator, we need to ask an important question: how many annotators do we need to achieve a perfect coverage of the sentence? (A sketch of computing this union follows the table.)

Table 3: Coverage per content word (second experiment). The left part reports the union across annotators, the right part reports the percentage of content words receiving a valid label (Labeled) for each annotator separately.

{| class="wikitable"
! Source !! Attempted !! Labeled !! A1 !! A2 !! A3 !! A4 !! A5 !! A6
|-
| Babelnet || 100% || 91% || 53% || 20% || 67% || 66% || 79% || 40%
|-
| GS || 100% || 85% || 50% || 13% || 53% || 46% || 76% || 20%
|-
| Vallex || 32% || 26% || 7% || 6% || 10% || 40% || 0% || 0%
|-
| Wikipedia || 100% || 58% || 39% || 20% || 35% || 53% || 50% || 26%
|-
| Wiktionary || 100% || 88% || 53% || 20% || 32% || 40% || 76% || 26%
|-
| Total content words || 34 || 34 || 28 || 15 || 28 || 15 || 34 || 15
|}
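A minimal sketch of computing per-annotator coverage and the union across annotators, as reported in Table 3; the data layout (a set of labeled content words per annotator) is an assumption for illustration.

<syntaxhighlight lang="python">
# Sketch: per-annotator coverage and union coverage over content words.
# The data layout (sets of labeled content-word IDs per annotator) is assumed.
def coverage(labeled_by_annotator, content_words):
    """labeled_by_annotator: dict annotator -> set of content words that got a
    valid sense (not only "None"/"Bad List"); content_words: set of all
    content words in the annotated sentences."""
    total = len(content_words)
    per_annotator = {
        ann: len(words & content_words) / total
        for ann, words in labeled_by_annotator.items()
    }
    union = set().union(*labeled_by_annotator.values()) & content_words
    return per_annotator, len(union) / total

per_ann, union_cov = coverage(
    {"A1": {"mouse", "cursor"}, "A2": {"cursor", "key"}},
    {"mouse", "cursor", "key", "page"},
)
print(per_ann, union_cov)  # {'A1': 0.5, 'A2': 0.5} 0.75
</syntaxhighlight>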
====5.3 Inter-Annotator Agreement====

The results presented in Table 4 are overall better than in the first experiment. Kappa was computed as in Section 4.3 with only one difference: we added 3 instead of 2 options when estimating the local probability of agreement by chance (to account for the new "Whole Page" option). Kappa for the second experiment was 0.40. Bootstrapping showed an IAA of 0.39 ± 0.055 and a kappa of 0.32 ± 0.06 for 95% of the central resamples. Again, the 2-IAA is negatively correlated with the number of units annotated (Pearson correlation coefficient -0.22).

Table 4: Inter-annotator agreement, second experiment.

{| class="wikitable"
! Source !! Annotations !! 2-IAA
|-
| Babelnet || 114 || 0.49
|-
| GS || 217 || 0.45
|-
| Vallex || 17 || 0.60
|-
| Wikipedia || 105 || 0.61
|-
| Wiktionary || 117 || 0.28
|-
| Total: || 570 || 0.46
|}

===6 Discussion===

Comparing the first and second experiments, one can see that we managed to improve IAA by expanding the set of available options and refining the instructions, but IAA is still not satisfactory.

For the resources where IAA reaches 60% (Vallex and Wikipedia), the coverage is rather low, 26% and 58%. BabelNet gives the best coverage but suffers in IAA. Google Search seems an interesting option for its versatility across parts of speech, on par with established knowledge bases like BabelNet in terms of inter-annotator agreement, but with much more ambiguous "senses". Cross-POS annotation does not seem very effective in practice, but a more thorough analysis is desirable.

===7 Comparison with Other Annotation Tools===

Several automatic systems for sense annotation are available. Our dataset could be used to compare them empirically on the annotations from the respective repository used by each of the tools. For now, we provide only an illustrative comparison of three systems: TAGME (http://tagme.di.unipi.it/), DBpedia Spotlight (http://dbpedia-spotlight.github.io/demo/) and Babelfy (http://babelfy.org/).

Figure 4 in the original paper provides an example of our manually collected annotations for the sentence "Move the mouse cursor to the beginning of the blank page and press the DELETE key as often as needed until the text is in the desired spot.": it lists, word by word, our manual BabelNet and Wikipedia annotations next to the outputs of the three automatic sense taggers, with overlap indicated by italics (BabelNet and Babelfy) and bold (Wikipedia and TAGME). For this sentence, the TAGME system with default settings returned three entities ("mouse cursor", "DELETE key" and "text"). DBpedia Spotlight with default settings (confidence level = 0.5) returned one entity ("mouse"). Babelfy showed the best result among these systems in terms of coverage, failing to recognize only the verb "move" and the adverbs "often" and "until", but it also provided several false meanings for the found entities.

===8 Conclusion===

In this paper, we examined how different dictionaries can be used for entity linking and word sense disambiguation. In our unifying view, based on finding the best "selection list" and selecting one or more senses from it, we tested standard inventories like BabelNet or Wikipedia, but also Google Search.

We proposed and refined annotation guidelines in two consecutive experiments, reaching an average inter-annotator agreement of about 46%, with Wikipedia and Vallex up to 60%. Higher agreement seems to go together with lower coverage, but further investigation is needed for confirmation and to find the best balance of granularity, coverage and versatility among existing sources.

===Acknowledgements===

This research was supported by the grant FP7-ICT-2013-10-610516 (QTLeap) and partially supported by SVV project number 260 224. This work has been using language resources developed, stored and distributed by the LINDAT/CLARIN project of the Ministry of Education, Youth and Sports of the Czech Republic (project LM2010013).
===References===

* [1] Demartini, G., et al.: ZenCrowd: leveraging probabilistic reasoning and crowdsourcing techniques for large-scale entity linking. In: Proceedings of the 21st International Conference on World Wide Web, ACM (2012) 469–478
* [2] Bennett, P. N., et al.: Report on the sixth workshop on exploiting semantic annotations in information retrieval (ESAIR'13). In: ACM SIGIR Forum, Volume 48, ACM (2014) 13–20
* [3] Ratinov, L., et al.: Local and global algorithms for disambiguation to Wikipedia. In: Proc. of ACL/HLT, Volume 1 (2011) 1375–1384
* [4] Navigli, R.: Word sense disambiguation: A survey. ACM Comput. Surv. 41(2) (February 2009) 10:1–10:69
* [5] Pereira, B.: Entity linking with multiple knowledge bases: An ontology modularization approach. In: The Semantic Web – ISWC 2014, Springer (2014) 513–520
* [6] Moro, A., Navigli, R.: SemEval-2015 Task 13: Multilingual All-Words Sense Disambiguation and Entity Linking. In: Proc. of SemEval-2015 (2015) In press
* [7] Ferragina, P., Scaiella, U.: TAGME: on-the-fly annotation of short text fragments (by Wikipedia entities). In: Proc. of CIKM, ACM (2010) 1625–1628
* [8] Zhang, L., Rettinger, A., Färber, M., Tadić, M.: A comparative evaluation of cross-lingual text annotation techniques. In: Information Access Evaluation. Multilinguality, Multimodality, and Visualization, Springer (2013) 124–135
* [9] Banarescu, L., et al.: Abstract Meaning Representation for Sembanking (2013)
* [10] Navigli, R., Ponzetto, S. P.: BabelNet: The automatic construction, evaluation and application of a wide-coverage multilingual semantic network. Artificial Intelligence 193 (2012) 217–250
* [11] Žabokrtský, Z., Lopatková, M.: Valency information in VALLEX 2.0: Logical structure of the lexicon. The Prague Bulletin of Mathematical Linguistics (87) (2007) 41–60
* [12] Lopatková, M., Žabokrtský, Z., Kettnerová, V.: Valenční slovník českých sloves [Valency Dictionary of Czech Verbs]. (2008)
* [13] Cohen, J.: A Coefficient of Agreement for Nominal Scales. Educational and Psychological Measurement 20(1) (1960)