Proceedings of the conference Terminology and Artificial Intelligence 2015 (Granada, Spain)

Helping term sense disambiguation with active learning

Pierre André Ménard, Centre de recherche informatique de Montréal, Canada, pamenard@gmail.com
Caroline Barrière, Centre de recherche informatique de Montréal, Canada, caroline.barriere@crim.ca
Jean Quirion, École de traduction, Université d’Ottawa, Canada, jquirion@uottawa.ca

Abstract

Our research highlights the problem of term polysemy within terminometrics studies. Terminometrics is the measure of term usage in specialized communication. Polysemy, especially within single-word terms as we will show, prevents using term corpus frequencies as appropriate statistics for terminometrics. Automatic term sense disambiguation, as a possible solution, requires human annotation to feed a supervised learning algorithm. Within our experiments, we show that although they are polysemous, terms have a strong in-domain sense bias, making random sampling of annotation data less than optimal. We suggest the use of active learning and implement it within an annotation platform as a way of reducing annotation time.

1 Introduction

In our research, we investigate the measure of term usage in specialized communication, called terminometrics (Section 2). The studied terms are provided by a term bank, and we are interested in the fact that many of these terms are polysemous, creating difficulties for our terminometrics study. We therefore investigate the presence of polysemy in term banks (Section 3). Polysemy found in term banks is problematic since it means that term occurrences in corpora may themselves be polysemous, preventing simple corpus frequencies from providing proper statistics. We confirm such polysemous occurrences within our specific terminometrics experiment in the nanotechnology domain (Section 4), as we analyse human term sense annotation results for a set of nanotechnology terms.

Results also show that terms, although polysemous, have a very strong bias toward their in-domain sense. In such a biased case, a random sampling of annotation data is far from optimal, wasting much human effort. We therefore introduce active learning (Section 5) and implement it within an annotation platform (Section 6), to obtain a sense-annotated dataset in less time.

2 Terminometrics

Terminometrics is the measure of term usage in different types of communications (Quirion, 2006). Its purpose is to determine, for a particular concept, the relative corpus frequencies of its competing terms.

The protocol of terminometrics, as defined in Quirion (2003), consists of first deciding on a domain of interest and selecting its set of concepts (most often all) from a term bank. Then, for each particular concept, the individual number of occurrences of all its competing terms is counted within different corpora from the same domain, gathered by terminologists to represent different communicative settings. Acknowledging the possible polysemy of competing terms, the protocol includes a human expert who disambiguates a randomly selected subset of occurrences in order to obtain better estimates of real frequencies.

A good example of this would be the concept of an atomic cluster within the nanotechnology domain. According to the term bank used, this notion can be expressed by the following 6 terms: atomic cluster, atom cluster, atomic aggregate, atom aggregate, cluster and aggregate. In terminometrics, comparative studies of the use of terms in specialized communications, government literature, specialized media, and general media are of interest, as they might reveal how some terms are used by the general public, while others are used in more official government documents.
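The counting step of this protocol can be illustrated with a short sketch. It is only an illustration under stated assumptions: the `TermSample` structure, the function names and all counts are hypothetical, and scaling each raw count by the expert's "yes" rate on the random sample is one plausible way to obtain better estimates of real frequencies, not necessarily the exact computation of Quirion (2003).

```python
from dataclasses import dataclass


@dataclass
class TermSample:
    raw_count: int        # occurrences of the term string in the corpus
    sample_size: int      # randomly sampled occurrences shown to the expert
    sample_in_sense: int  # sampled occurrences judged to denote the concept


def estimated_in_sense_count(s: TermSample) -> float:
    """Scale the raw count by the proportion of in-sense sampled occurrences."""
    if s.sample_size == 0:
        # Assumption: without any annotated sample, fall back on the raw count.
        return float(s.raw_count)
    return s.raw_count * s.sample_in_sense / s.sample_size


def relative_frequencies(samples: dict) -> dict:
    """Relative frequency of each competing term for one concept."""
    estimates = {t: estimated_in_sense_count(s) for t, s in samples.items()}
    total = sum(estimates.values())
    if total == 0:
        return {t: 0.0 for t in estimates}
    return {t: v / total for t, v in estimates.items()}


if __name__ == "__main__":
    # Hypothetical counts for two competing terms of the "atomic cluster" concept.
    samples = {
        "atomic cluster": TermSample(raw_count=40, sample_size=20, sample_in_sense=20),
        "cluster": TermSample(raw_count=600, sample_size=50, sample_in_sense=5),
    }
    for term, share in relative_frequencies(samples).items():
        print(f"{term}: {share:.2f}")
```

With such invented numbers, a polysemous single-word term can dominate the raw counts while accounting for a much smaller share once its out-of-domain occurrences are discounted.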
Studying the occurrence in text of different synonyms of concepts would not be problematic if each one were monosemous. Unfortunately, that is not the case. For example, referring to Table 1, the term cluster is a competing term for multiple concepts, and simply counting its occurrences in text, without disambiguation, would not be indicative of its usage for any of them.

Obviously, human annotation is costly, and the possibility of performing automatic term sense disambiguation is quite appealing. In terminometrics, concepts are evaluated one at a time, reducing the disambiguation task to a binary decision. The annotation is not a selection among N senses, but rather a yes/no decision on whether the current instance represents the current concept or not. Furthermore, term disambiguation within terminometrics cannot be dealt with similarly to more typical word-sense disambiguation, or even term-sense disambiguation relying on knowledge contained in an external resource (Barrière, 2010), since the annotator, or the algorithm, is likely to only have access to the context of occurrences to perform term disambiguation.

3 Polysemy of specialized terms

Terms for the terminometrics studies are provided by term banks. Such repositories of terms are not often investigated for the study of polysemy. In Natural Language Processing, a typical task of word sense disambiguation requires a lexicographic resource, such as WordNet (Miller, 1995), to provide a repository of possible word senses in order to disambiguate words in texts (Pantel and Lin, 2002). There is no doubt that words are polysemous, even in specific domains (Chroma, 2011; Vogel, 2007), but fewer studies show and discuss the polysemy of terms.

Terms are single-word or multi-word expressions denoting particular concepts within particular domains. A term bank is organized by domains (e.g. biology, automotive, etc.) and contains records corresponding to concepts. Each record contains at least one term, and often competing terms (synonyms) denoting that concept, possibly in more than one language. Examples of records for the term cluster, as found in the Grand Dictionnaire Terminologique (GDT) [1], are shown in Table 1.

Table 1: Different records for "cluster" and "microscope".

Domain | Terms
nanotechnology | atomic aggregate, cluster, aggregate, atom aggregate, atom cluster, atomic cluster
nanotechnology | molecular aggregate, cluster, aggregate, molecule aggregate, molecule cluster
nanotechnology | nanoaggregate, cluster, aggregate, nanocluster, nanometer-size cluster, nanoscale aggregate, nanoscale cluster
seafood | crab section, section, crab cluster, cluster
software | cluster, document cluster
mining | vein system, vein set, cluster of veins, mining cluster, cluster
internet | service cluster, cluster of service, cluster
nanotechnology | scanning tunneling electron microscope, microscope, scanning tunneling microscope, STM
nanotechnology | atomic force microscope, microscope, AFM, SFM, scanning force microscope
nanotechnology | magnetic force microscope, microscope, MFM, SMM, scanning magnetic microscope
nanotechnology | scanning probe microscope, microscope, SPM, scanned-probe microscope

There might be a misconception that specialized language is less ambiguous, and would then not provide a proper challenge for word-sense disambiguation. A study by Barrière (2007) shows the contrary, as WordNet and Termium [2] (the actual resource used in this experiment) were compared along different criteria. One criterion of comparison was coverage, and another one, of more interest to us in this research, is the degree of polysemy in relation to word specificity. Word specificity was approximated by "hit counts", as found in a very large corpus (Waterloo Terabyte Corpus, used by Terra and Clarke (2003)), with words occurring from 1 to millions of times. Figure 1 shows their results. We see how for common words (hit counts in the range log10(freq) > 3), the degree of polysemy in the term bank is even larger than in WordNet.

Figure 1: Degree of polysemy in Wordnet and Termium per term length

In our study, we wished to further characterize this degree of polysemy in terminological resources. We used a small set of 164 terms from the current experiment (presented in Section 4.1), and looked at the number of senses in two term banks: Termium and GDT. Figure 2 shows that specialized terms, especially short ones (1 to 3 words), can have many senses (records) and span many domains. This trend generally diminishes as the term length increases.

Figure 2: Degree of polysemy in Termium and GDT per term length

[1] The GDT can only be accessed via a web interface at http://www.granddictionnaire.com .
[2] Termium term bank can be accessed online at http://www.btb.termiumplus.gc.ca or downloaded at http://open.canada.ca/data/en/dataset/94fc74d6-9b9a-4c2e-9c6c-45a5092453aa

4 Experiment - Terminometrics in nanotechnology domain

Our current terminometrics study focuses on term usage in the nanotechnology domain within Canadian French. This domain, within the GDT term bank, contains 1,035 records (concepts) [3], each with its competing terms. This set of terms is what we call our nanotechnology term base, covering "the science of working with atoms and molecules to build devices that are extremely small" (Merriam-Webster dictionary).

To study the competing terms for the nanotechnology concepts, a corpus was built using documents from corporate, educational, news media and government websites. These documents were retrieved by first selecting most of the organizations originating from the province of Québec, Canada, whose core activities dealt with nanotechnology. This list was then vetted by an expert. Next, the websites of these organizations were downloaded. After this process, the corpus might still be noisy, but it does contain a majority of nanotechnology-related documents.

All terms in the nanotechnology term base are searched for in the corpus. For each of their occurrences, a window spanning 90 characters on each side of the term is extracted. This text span becomes a contextualized instance to be annotated. Table 2 shows examples of these instances.

[3] As the GDT expands every day, this number might not represent its current status.

4.1 Human annotation process

For our current annotation experiment, a total of 164 terms taken from 29 records (among the 1,035 mentioned earlier) were selected along with the complete set of instances found in the nanotechnology corpus. Each term occurred between 75 and 2,100 times in the corpus, for a total of 17,227 instances for the whole term sample. This dataset was divided into two parts distributed between 2 PhD students in terminology. As shown in Table 2, annotators were presented with a text sample containing a targeted term and were asked to indicate "yes" if the term was used in the correct nanotechnology sense and "no" otherwise. Prior to the annotation effort, the dataset was sorted by terms, as this was considered easier to annotate than a dataset in document order, which would require the annotator to constantly switch between term definitions. The annotators took a total of 82 hours (41 hours each) to annotate all the instances of the selected dataset. Each text sample was composed of the 90 characters prior to a term occurrence, the term occurrence as is, and another 90 characters following the term occurrence. The 90-character window was adjusted to avoid word truncation.

Table 2: Instances for the terms substrat (substrate) and approche ascendante (bottom-up approach)

Annotation | Instance
Yes | ... une technologie d’intégration par laquelle plusieurs nanostructures sont intégrées sur un même substrat. L’interface entre les dispositifs et d’autres systèmes (oxyde, verre) sera aussi étudiée. (... an integration technology for which many nanostructures are integrated on a substrate. The interface between the components and other systems (oxide, glass) will also be studied.)
Yes | ... dollars à Bromont dans une petite usine qui allait employer 200 personnes pour la production de substrats, que le dictionnaire définit comme un matériau sur lequel sont réalisés les éléments d’un ... (... dollars at Bromont in a small factory which was going to employ 200 people for the production of substrates, which the dictionary defines as a material on which are realized the elements of a ...)
No | ... et valoriser les boues de station d’épuration. L’investigation des possibilités d’acquérir ces substrats requiert l’inventaire des industries de la région, les quantités et les caractéristiques des ... (... and valorize the sludge of treatment plants. Investigating the possibility of acquiring these substrates requires an inventory of the region’s industries, the quantities and features of ...)
Yes | ... MNT Définition : Fabrication mécanique et contrôlée de structures moléculaires, par une approche ascendante qui consiste à les assembler, étape par étape, molécule par molécule, en se servant d’appareil ... (... MNT Definition: Mechanical and controlled fabrication of molecular structures by a bottom-up approach which consists of assembling them, step by step, molecule by molecule, by using tool ...)
No | ... Quand il est possible de le faire, l’analyse de la demande d’énergie est fondée sur une approche ascendante agrégeant les demandes par usage, par secteur d’activités économiques, par région et par ... (... When it is possible to do so, the energy demand analysis is founded on a bottom-up approach aggregating the demands by use, by economic activity sector, by region and by ...)
No | ... que beaucoup de problèmes rencontrés en pratique ne sont pas adressés par ces processus. L’approche ascendante de l’amélioration du processus consiste donc, selon ces mêmes auteurs, à implanter une équipe ... (... that many issues encountered in practice are not addressed by these processes. The bottom-up approach of process improvement therefore consists, according to these same authors, of implanting a team ...)

The annotators were also asked to indicate the difficulty level of the provided answer: standard, hard, hardest. Results showed that 626 instances (3.6%) needed a little more analysis while 222 instances (1.3%) were much harder to annotate with only the presented context. All the other instances were judged of standard difficulty, meaning that the textual contexts of the term occurrences were sufficient for the disambiguation task. In anticipation of an automatic disambiguation algorithm which would only have access to the immediate context of the term, this confirmed that for most cases, it should be possible to disambiguate with a ±90 character window [4].

[4] This claim disregards the fact that humans certainly have much a priori knowledge which they use during the disambiguation task. Nevertheless, the trigger of this a priori knowledge would still come from the limited context window.

4.2 Observations and results on polysemy

Analysis of the annotated instances reveals that 84.31% (14,524) of them occur in the correct nanotechnology sense of the term, and the remaining 15.69% (2,703 instances) are used with other meanings. To measure the overall polysemy in our dataset, we use the notion of entropy. Entropy is defined as the negated sum, over all possible events, of each event's probability multiplied by the log of that probability. In our current experiment, there are only two possible events: first, the occurrence of a term in a correct sense, let us call that x, and second, the occurrence of a term in a different sense. If P_x is the probability of the correct sense, then 1 - P_x is the probability of another sense. Then, we have the entropy, shown in Equation 1, as a sum over two possible events.

E(x) = -\left[ P_x \log_2 P_x + (1 - P_x) \log_2 (1 - P_x) \right] \qquad (1)

The resulting function is at its maximum, a value of 1, with a probability of 50% and is equal to 0 with probabilities of either 0% or 100%. In our case, P_x is the rate of occurrence of an anticipated term sense in a corpus. A term with an entropy of 0 is not ambiguous: either all or none of the term’s instances use the correct sense. A term with an entropy of 1 has 50% of its instances used in the correct sense, the remaining 50% of the instances using other meanings.

For example, the term STM (acronym of scanning tunnelling microscope) counts as a single-word term occurring a total of 341 times. Among those, 104 instances (104/341 = 0.30499) have the nanotechnology sense, which gives an entropy of 0.8873 as shown in Equation 2. This is a relatively high entropy level, close to the maximum of 1 reached at a probability of 50%. Had the case been less ambiguous, for example 5 out of 341 instances, the entropy would have been 0.1103.

E(STM) = -\left[ 0.30499 \times \log_2(0.30499) + (1 - 0.30499) \times \log_2(1 - 0.30499) \right] = 0.8873 \qquad (2)

Figure 3: Average entropy per length on gold corpus

The bottom dashed line (Figure 3) shows the average entropy over all terms having a particular word count. The top full line shows the average entropy for the 5 terms with the highest entropy (and thus the highest degree of ambiguity) of each length, emphasizing how a few terms account for many of the polysemous instances in the corpus. Examples of these very polysemous terms are tunnelling, substrat, or top-down.

These corpus results, showing an overall tendency for entropy to decrease with term length, are in line with our previous results presented in Figure 2 relating term length to the polysemy level within term banks. Nevertheless, these corpus results also show that the in-domain sense is much more likely than all other senses. This leads us to think that we should take advantage of the particularity of our task in selecting the annotation dataset, as we further describe in the next section.

5 Active learning for term sense annotation

The strong in-domain sense bias shown in the previous section indicates that random sampling, suggested by the terminometrics methodology, could lead to collecting a biased sample and provide a distorted analysis. Traditional machine learning algorithms trained on these unbalanced samples would suffer from the same bias, as less information would be available to classify the minority class. This type of algorithm would likely produce a prediction model which would only target the majority class, overlooking instances potentially useful for terminometrics experts.

To sidestep this risk, we lean toward a learning approach called active learning, which defines an iterative annotation process in order to reduce the risk of producing a biased prediction model. As shown in Figure 4, this four-step process implies interaction with an oracle, typically a human annotator who needs to be familiar with the domain’s terminology and the concepts being studied.

Figure 4: High level view of the active learning process.

The active learning process starts with a set of unlabelled data (UD) containing, in the current context, individual occurrences of a term in a corpus, described by a group of features (e.g. a bag of words made of its co-occurring words in context). At this point, the labeled dataset (LD) is empty and there is no prediction model available. The active learning algorithm starts by selecting a group of instances, called the seed S, from UD. For each instance of S, the oracle is queried to specify a label, and the labeled example is then stored in LD. The oracle annotates the instance using one value of a predefined class label set, in this case {yes, no}, yes meaning the instance is used in the targeted sense, no meaning another sense is used. When all instances in S are labeled, the active learning algorithm uses them to create a prediction model. It is important to note that there is no ideal size for the seed, but it should be sufficient to enable the algorithm to train a relevant prediction model.

Once a prediction model is available, the process takes place in the same order, but with a variant. Instead of a seed, the algorithm tentatively applies the prediction model to instances in UD (without labeling them or moving them to the labeled set) and picks an instance for which the model does not provide a sufficient level of confidence for its classification. It then submits this instance to the oracle, who applies a label. Then, the newly labeled example is added to LD. The prediction model is then retrained and the process continues until the algorithm reaches an overall level of confidence for all instances in UD.

When this stopping criterion is reached, the active learning process is complete and the prediction model can be used to annotate the remaining instances in UD, if needed, or another similar dataset. Again, the level of confidence used as the stopping criterion must be empirically defined, as there is no ideal value. Of course, a higher confidence level might increase the annotation effort needed to produce the final prediction model, while a lower value might produce a less effective prediction model using fewer instances. Fine-tuning the confidence level helps to reduce the risk of training a biased prediction model on a predominant class in a dataset.

In our current implementation of active learning, we select a seed of 20 instances with random sampling, which is then processed with RandomForest (Tin Kam Ho, 1995) as the prediction model. The oracle is then asked to annotate other blocks of 20 instances until the algorithm reaches its configured confidence level. If this level is not reached after a total of 200 instances (including the seed), a final prediction model is trained and applied on UD in order to limit the effort to annotate each expression. The features for the classification process are extracted from the 90-character window, which was judged as sufficient during the experiments (Section 4.1).

At this stage in our research, the current implementation provides a baseline on which we can later improve using different alternative models presented in the literature. Certainly, other research in word sense disambiguation has explored the empirical behaviour of active learning (e.g. Chen et al., 2006). Specific issues associated with active learning range from feature selection for particular disambiguation tasks (Palmer and Chen, 2005), model adaptation when changing domain between the training and application of the model (Chan and Ng, 2007), and the class imbalance problem (Zhu and Hovy, 2007), to deciding when the prediction algorithm stops asking for additional annotation (Zhu et al., 2008).

6 Terminometrics active-learning platform

We developed an annotation platform, shown in Figure 5, to facilitate terminometrics studies with an active learning component for term disambiguation. The platform implements the interactive active learning process described above to control and optimize the active learning between the prediction module and the human annotator. The platform will also enable future experiments within the field of terminometrics in which both the active learning algorithms and the human interaction can be further explored.

Figure 5: Annotation interface for terminometrics.

The user of this platform (typically the oracle in the active learning process) can create a corpus of documents, use this corpus to create an annotation project by defining a set of concepts, related terms and variations (plural, gender), and participate in the active learning process. At the end of the active learning process, the platform annotates the remaining instances in UD (see Figure 4) in order to estimate the distribution of occurrences of competing terms of a concept. This is used for the terminometrics analysis.

Aside from the {yes, no} classification, the interface offers two other choices: undecided and reject. The first choice allows the user to skip an instance and go to the next, while being able to later return to provide an answer. This could happen when the user wishes to see a larger context to perform the disambiguation. In fact, to help this process, the platform also provides an option to view an instance within its original document. The second choice, reject, removes the instance entirely from the unlabeled and labeled datasets. This is typically used when the user considers that the instance should not be used for the final terminometrics analysis.

In order to further reduce the annotation effort needed to perform a terminometrics study, other features, unrelated to active learning, were added to the platform. The first is a language-based document filter which can be applied during the corpus creation to try to remove documents which are not suited for the targeted analysis. Each document is analysed with a language detection algorithm to extract a confidence level associated with its deduced language. The user can then keep only the documents which are above a specific threshold and exclude the remaining ones from the corpus to be annotated. Of course, documents with no text, such as files containing only images, are also removed.

Another effort reduction feature is the duplicate context detection which takes place at the creation of an annotation project. The underlying issue is that a sentence or a whole paragraph (or sometimes complete documents) can be found in several locations within a corpus created from web sites. While each occurrence of a term (or its variations) is stored and kept for an accurate assessment of its rate of occurrence in a corpus, only unique contexts (the term occurrence and a ±90 character window) are used for the active learning process. For example, if the first context of "substrat" shown in Table 2 was found with the same prior and post context in five documents in a corpus, the oracle would be asked at most once to annotate this instance (if it is selected for annotation by the algorithm), but it would count as five occurrences in the terminometrics analysis.

The platform also facilitates the management of terminometrics studies by providing many features: an integrated storage and search capability on domain-specific corpora, a user interface specifically designed to facilitate annotation by providing in-context display of a term to validate, access to a term list with the possibility of adding and removing terms, and so on. This is an improvement over the traditional manual handling of documents and term lists, instance generation and annotation, traditionally done with folders and spreadsheets. While the upper limits of the platform have not been tested explicitly, the current experiment was done with a term list of 1,036 entries on a corpus of over 220,000 documents. As far as the sizes of the corpora and vocabulary are concerned, the platform is mainly limited by the speed and capacity of the computer that runs it.

7 Conclusion and future work

In this article, we introduced term sense disambiguation, a close cousin to word sense disambiguation, but much less studied within the NLP community. We showed how terms, especially single-word terms, are polysemous, both in term banks and in specialized corpora.

We presented the idea of using active learning within our terminometrics application, in which the in-domain sense bias is quite strong. So far, we have implemented a simple active learning algorithm, and will move toward more complex ones in the near future. The annotation platform, ready for experimentation, will allow terminologists to complete, in less time, the annotation process of the nanotechnology domain and other domains. This will provide test data, on which we can measure the different gains in terms of time and accuracy of our current and future active learning approaches.

Furthermore, we plan to push further our exploration of term disambiguation. In fact, although lexicographic and terminological resources are organized differently, the distinction between terms and words is not always that "clear-cut". Many single-word terms exist also as common words. Some specialized terms also migrate from specific domains to the general language (Meyer, 2000) when a specialized domain becomes more a part of the day-to-day life of people (e.g. computer domain). We believe there is much room to further study term polysemy in term banks, in specialized corpora and also in more general corpora where both specialized and common senses might be present.

One of the envisioned experiments is to semi-automatically annotate a whole corpus to be able to compare the current approach to a supervised learning method. This will enable us to evaluate the contribution of active learning to the raw performance of disambiguation and to the time reduction of the annotation task. A new dataset related to a domain different from nanotechnology will also be defined for this experiment to avoid evaluating the approach on the dataset used for development.

Acknowledgments

We thank the annotators Julián Zapata and Barış Bilgen.

References

Caroline Barrière. 2007. La désambiguïsation du sens en traitement automatique des langues (TAL): l’apport de ressources terminologiques et lexicographiques. In Marie-Claude L’Homme and Sylvie Vandaele, editors, Lexicographie et Terminologie: compatibilité des modèles et méthodes, pages 113–140. Presses de l’Université d’Ottawa.

Caroline Barrière. 2010. Recherche contextuelle d’équivalents en banque de terminologie. In Traitement Automatique des Langues Naturelles 2010.

Yee Sang Chan and H. T. Ng. 2007. Domain adaptation with active learning for word sense disambiguation. ACL.

Jinying Chen, Andrew Schein, Lyle Ungar, and Martha Palmer. 2006. An empirical study of the behavior of active learning for word sense disambiguation. Proceedings of the main conference on Human Language Technology Conference of the North American Chapter of the Association of Computational Linguistics, pages 120–127.

Marta Chroma. 2011. Synonymy and Polysemy in Legal Terminology and Their Applications to Bilingual and Bijural Translation. Research in Language, 9:31–50.

Ingrid Meyer. 2000. Computer Words in Our Everyday Lives: How are they interesting for terminography and lexicography? In Euralex’2000, International Congress on Lexicography, pages 39–58, Stuttgart, Germany.

George A. Miller. 1995. WordNet: a lexical database for English. Communications of the ACM, 38(11):39–41.

M. Palmer and J. Y. Chen. 2005. Towards robust high performance word sense disambiguation of English verbs using rich linguistic features. Natural Language Processing - IJCNLP 2005, Proceedings, 3651:933–944.

Patrick Pantel and Dekang Lin. 2002. Discovering word senses from text. In Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’02, pages 613–619, New York, NY, USA. ACM.

Jean Quirion. 2003. Methodology for the design of a standard research protocol for measuring terminology usage. Terminology, 9(c):29–49.

Jean Quirion. 2006. Terminometrics - an Evaluation Tool of/for Term Standardization. In TSTT’2006 - International Conference on Terminology, Standardization and Technology Transfer, pages 19–24, Beijing, China.

Egidio Terra and C. L. A. Clarke. 2003. Frequency Estimates for Statistical Word Similarity Measures. In Proceedings of the NAACL 2003, page 165.

Tin Kam Ho. 1995. Random decision forests. Proceedings of 3rd International Conference on Document Analysis and Recognition, 1:278–282.

Radek Vogel. 2007. Synonymy and polysemy in accounting terminology: fighting to avoid inaccuracy. In Proceedings of the English for Specific Purposes Terminology and Translation Workshop, Košice 13-14 September 2007. Univerzita P.J. Šafárika.

Jingbo Zhu and E. H. Hovy. 2007. Active Learning for Word Sense Disambiguation with Methods for Addressing the Class Imbalance Problem. EMNLP-CoNLL.

Jingbo Zhu, Huizhen Wang, and Eduard Hovy. 2008. Learning a Stopping Criterion for Active Learning for Word Sense Disambiguation and Text Classification. International Joint Conference on Natural Language Processing, pages 366–372.
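As a supplementary illustration, the active learning loop of Section 5 can be sketched as follows. This is a minimal, dependency-free sketch and not the platform's implementation: a toy word-vote classifier stands in for RandomForest, `oracle` is a hypothetical callback playing the role of the human annotator, instances are assumed to be tuples of context words, and the confidence threshold of 0.8 is an assumed value. Only the seed size (20), the block size (20) and the budget (200) come from the text; querying the least confident block first is one possible reading of the selection step.

```python
import random
from collections import Counter


def train(labeled):
    """Toy model: count the words seen in 'yes' and in 'no' contexts."""
    yes, no = Counter(), Counter()
    for words, label in labeled:
        (yes if label == "yes" else no).update(set(words))
    return yes, no


def confidence(model, words):
    """Return (predicted label, confidence) from add-one smoothed word votes."""
    yes, no = model
    y = sum(yes[w] for w in set(words)) + 1
    n = sum(no[w] for w in set(words)) + 1
    return ("yes" if y >= n else "no"), max(y, n) / (y + n)


def active_learning(unlabeled, oracle, seed_size=20, batch_size=20,
                    budget=200, threshold=0.8, rng=None):
    rng = rng or random.Random(0)
    pool = list(unlabeled)
    rng.shuffle(pool)  # the seed is drawn by random sampling
    labeled = [(x, oracle(x)) for x in pool[:seed_size]]
    pool = pool[seed_size:]
    model = train(labeled)
    while pool and len(labeled) < budget:
        # Rank the unlabeled instances from least to most confident.
        pool.sort(key=lambda x: confidence(model, x)[1])
        if confidence(model, pool[0])[1] >= threshold:
            break  # stopping criterion: every remaining instance is confident
        take = min(batch_size, budget - len(labeled))
        batch, pool = pool[:take], pool[take:]
        labeled += [(x, oracle(x)) for x in batch]  # query the oracle
        model = train(labeled)  # retrain on the enlarged labeled set
    # Apply the final model to whatever was never shown to the oracle.
    predictions = [(x, confidence(model, x)[0]) for x in pool]
    return labeled, predictions
```

In this sketch the oracle is only consulted for the seed and for blocks the model is unsure about, so a minority sense that is absent from the seed gets queried early instead of being silently absorbed by the majority class.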