Leveraging Wikipedia for Ontology Pattern Population from Text

Michelle Cheatham and James Lambert
Wright State University
3640 Colonel Glenn Hwy.
Dayton, OH 45435

Charles Vardeman II
University of Notre Dame
111C Information Technology Center
Notre Dame, IN 46556

Copyright held by the author(s). In A. Martin, K. Hinkelmann, A. Gerber, D. Lenat, F. van Harmelen, P. Clark (Eds.), Proceedings of the AAAI 2019 Spring Symposium on Combining Machine Learning with Knowledge Engineering (AAAI-MAKE 2019). Stanford University, Palo Alto, California, USA, March 25-27, 2019.

Abstract

Traditional approaches to populating ontology design patterns from unstructured text often involve using a dictionary, rules, or machine learning approaches that have been established based on a training set of annotated documents. While these approaches are quite effective in many cases, performance can suffer over time as the nature of the text documents changes to reflect advances in the domain of interest. This is particularly true when attempting to populate patterns related to fast-changing domains such as technology, medicine, or law. This paper explores the use of Wikipedia as a source of continually updated background knowledge to facilitate ontology pattern population as the domain changes over time.

Introduction

Two of the central underpinnings of scientific inquiry are the need to verify results through reproduction of the experiments involved and the importance of "building on the shoulders of giants." In order for these things to be possible, experimental results need to be both discoverable and reproducible. Important steps have been made recently in pursuit of this, including the relaxation of page restrictions on "methodology" sections in many academic journals and the requirements by some funding agencies that investigators make any data they collect publicly available. However, in order for previous work to be truly verifiable and reusable, researchers must be able not only to access the results of those efforts but also to understand the context in which they were created. A key element of this is the need to preserve the underlying computations and analytical process that led to prior results in a generic machine-readable format.

In previous work towards this goal, we developed an ontology design pattern (ODP) to represent the computational environment in which an analysis was performed. This model is briefly described in the Computational Environment Representation section of this work, and more detail is available in (Cheatham et al. 2017). This paper describes our work on the next step: development of an automated approach to populate the ODP based on data extracted from academic articles. We explore the performance of two common approaches to this task and show that, due to the fast-changing nature of computer technology, they lose their effectiveness over time. We then evaluate the utility of using a continuously manually-curated knowledge base to mitigate this performance degradation. The results illustrate that this method holds some promise.

The remainder of this paper is organized as follows. The section Computational Environment Representation presents the schema of the ontology we seek to populate, while the Dataset section describes the collection of academic articles we use as our training and test sets. The approach and results are presented and analyzed next, and finally some conclusions and ideas for future work in this area are discussed.

Computational Environment Representation

The Computational Environment ODP was developed over the course of several working sessions by a group of ontological modeling experts, library scientists, and domain scientists from different fields, including computational chemists and high-energy physicists interested in preserving analysis of data collected from the Large Hadron Collider at CERN. Our goal was to arrive at an ontology design pattern that is capable of answering the following competency questions:

• What environment do I need to put in place in order to replicate the work in Paper X?

• There has been an error found in Script Y. Which analyses need to be re-run?
• Based on recent research in Field Z, what tools and resources should new students work to become familiar with?

• Are the results from Study A and Study B comparable from a computational environment perspective?

We focused on creating a model to capture the actual environment present during a computational analysis. Representing all possible environments in which it is feasible for the analysis to be executed is outside of the scope of our current effort. We also do not include the runtime configuration and parameters as part of the environment. The rationale is that to some extent the same environment should be applicable to many computational analyses in the same field of study, but this would not be true if we included such analysis-specific information as runtime parameters as part of the environment. Data sources were considered outside of the confines of the computational environment for similar reasons. External web services and similar resources were not included because they are not inter-related in the same way as the environmental elements are. For instance, deciding to use a different operating system often necessitates using a different version of drivers, libraries, and software applications, whereas in most cases the entire hardware configuration could be changed with no impact on external services.

Figure 1 shows the schema that we targeted for population in this study. It is a slightly modified version of the ODP presented in (Cheatham et al. 2017) – the overall goal and competencies of the pattern remain the same, but some entities have been omitted, and properties related to the manufacturer, make, and model of computers and hardware components have been added, as well as an entity related to programming language, because this information is important for our current application goals.

Figure 1: The Computational Environment ODP
Dataset

The dataset consists of 100 academic articles published in 2012 or later (called "current") and 20 published in or before 2002 (called "old"). (Three articles from the "current" set had to be thrown out at a later stage due to irregularities that caused them to be unparseable by multiple PDF parsers.) There are 20 current papers and four old ones from each of five fields of study: biology, chemistry, engineering, mathematics, and physics. Twenty of the current articles were randomly selected to make up the current training set while the remainder form the test set. The documents were collected by searching Google Scholar using the query "<field> AND algorithm" (e.g. biology AND algorithm, chemistry AND algorithm). Patents and citations were omitted from the search criteria. Any paper with a PDF download link was searched for the terms cpu and processor, in order to quickly determine if the paper had any content related to a computational environment. If the paper contained either term, it was retained in the dataset.

We then manually created a gold standard for the dataset by going through each paper and listing any information relevant to the ODP shown in Figure 1. This process was completed by a single person, but in the few cases in which a question arose (e.g. whether an R4000 is a make or a model), the opinions of others and external knowledge sources such as manufacturers' websites and websites about historical computing technologies were consulted. A typical entry in the gold standard is shown in Listing 1.

Listing 1: Sample entry from the gold standard

bio-15
cpu_make: HT Xeon
cpu_frequency_value: 3.2
cpu_frequency_unit: GHz
cpu_num-cores: 16
memory_type: RAM
memory_size_value: 8
memory_size_unit: GB
programming-language: C/C++
os_kernel_name: Linux
os_distribution_name: Red Hat

Note that the entries given do not directly correspond to entities in the ontology. For example, the single tag memory_size_unit: GB corresponds to an instance of the Memory class in the ontology with a hasSize of some Amount that in turn hasUnit GB. Tagging based on each entity within the ontology, including those such as Memory that represent blank nodes, would have made the tagging process more arduous, however, and the information can be readily expanded to match the desired schema using SPARQL construct queries, as described in (Zhou et al. 2018).
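To give a concrete sense of this expansion step, the sketch below rewrites the flat memory tags from Listing 1 into the Memory/Amount structure just described, using a SPARQL CONSTRUCT query executed with rdflib. The namespaces and the hasPart and hasValue property names are illustrative assumptions; only hasSize and hasUnit come from the pattern, and the actual queries from (Zhou et al. 2018) may differ.

from rdflib import Graph

# Flat gold-standard tags for one article, expressed as RDF. The tag:
# namespace is an assumption made for this example.
flat_tags = """
@prefix tag: <http://example.org/tags#> .

tag:bio-15 tag:memory_size_value "8" ;
           tag:memory_size_unit  "GB" .
"""

# Expand the flat tags into the pattern's Memory -> hasSize -> Amount ->
# hasUnit structure. hasSize and hasUnit come from the ODP as described
# above; the env: namespace, hasPart, and hasValue are illustrative.
expand = """
PREFIX tag: <http://example.org/tags#>
PREFIX env: <http://example.org/compenv#>
CONSTRUCT {
    ?article env:hasPart [ a env:Memory ;
        env:hasSize [ a env:Amount ;
                      env:hasValue ?value ;
                      env:hasUnit  ?unit ] ] .
}
WHERE {
    ?article tag:memory_size_value ?value ;
             tag:memory_size_unit  ?unit .
}
"""

g = Graph()
g.parse(data=flat_tags, format="turtle")
for s, p, o in g.query(expand):
    print(s, p, o)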
A preliminary analysis of the training and test datasets indicates that approaches trained on the older articles may face difficulties. For example, Figure 2 shows the number of distinct values for each tag in the test set (left), current training set (middle) and old training set (right). From this we can see that the relative number of distinct values for each tag is about the same in the test set and the current training set (i.e. the tags with the largest number of distinct values in the test set are also those with the largest number of distinct values in the training set). This is less true for the old training set – for instance, the older articles never talk about the number of CPU cores (presumably because they were all single core) and the only programming language they mention is Fortran. It therefore may be difficult for models trained on the old training set to correctly extract information relevant to these tags in the test set. New entities becoming relevant over time are sometimes termed "emergent entities" and are particularly challenging for many NER systems (Nakashole, Tylenda, and Weikum 2013).

Figure 2: The number of distinct values for each tag in the test set and the current and old training sets.

Figure 3 shows the number of times each tag was used in the test set and both training sets. It is evident that the older articles talk more at the level of computer manufacturer, make and model, while more current articles mention more details such as information about the CPU, the graphics card, and the amount of memory. Looking at both figures (2 and 3), we see that some tags, such as CPU manufacturer, are key for this ontology population task, because there are only a few distinct values that must be recognized, but these few values occur many times within the test set. Models trained on the old training set may be at a particular disadvantage in these cases, if the values for those key tags in the older documents are not reflective of those in the test set.

Figure 3: The number of occurrences of each tag in the test set and the current and old training sets.

Approach and Results

In this work we formulate the problem of populating the computational environment pattern from text solely as a Named Entity Recognition (NER) task. This is in contrast to many other approaches that divide this process into two steps: entity recognition/typing and relation extraction (Petasis et al. 2011). In other words, we are trying to directly arrive at the tags specified in the gold standard as described in the Dataset section, with the intention of later creating SPARQL construct queries to expand these tags into the terminology used by the ODP. If more than one computational environment is described in an article, the appropriate relations between instances can then be determined based on proximity in the underlying text.

Preprocessing

In academic articles, the computational environment tends to be discussed in a small number of isolated points within a document, often a single paragraph, sentence, footnote, or caption. Because some of the techniques we employ in our approach, particularly the use of Wikipedia, are time-intensive, we begin our ontology population task by attempting to identify the key portions of each document (which we generically refer to as the "key paragraph", with the understanding that it may actually be a footnote, caption or other element). Our goal in doing this was to arrive at an ontology-agnostic approach with high recall. The approach we take is quite basic: a Python script is given the ontology (in the form of an OWL file) and the name of the academic article in which to identify the key paragraphs. The script parses the names of all entities from the ontology, splits the entire content of the academic article (including footnotes, captions, etc.) into paragraphs, and counts the number of times any ontology term appears in each paragraph. Any paragraph with a count within 90 percent of the maximum count for that article is considered a key paragraph. This approach produces a recall of .94 and a precision of .99, for an F-measure of .96.
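A minimal sketch of this scoring heuristic is shown below, assuming the ontology entity names and the article's paragraphs have already been extracted as strings (the OWL and PDF parsing performed by the actual script is omitted here):

# Count ontology term occurrences per paragraph and keep every paragraph
# scoring within 90 percent of the article's maximum, as described above.
def find_key_paragraphs(paragraphs, ontology_terms, threshold=0.9):
    def score(paragraph):
        text = paragraph.lower()
        return sum(text.count(term.lower()) for term in ontology_terms)

    scores = [score(p) for p in paragraphs]
    best = max(scores, default=0)
    if best == 0:
        return []  # no ontology term appears anywhere in the article
    return [p for p, s in zip(paragraphs, scores) if s >= threshold * best]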
The remainder of the discussion in this section assumes that the key paragraphs have been successfully identified prior to invoking the approach under consideration. The text in each key paragraph is split into sentences, tokenized, and lemmatized, and part-of-speech tagging is performed using the Stanford NLP pipeline (Manning et al. 2014).

Machine Learning

NER techniques generally fall into two categories: machine learning-based and rules-based (Nadeau and Sekine 2007). Machine learning-based NER systems typically involve training a classifier by manually tagging input documents. The classifier then attempts to learn how to correctly recognize and type entities from new documents based on the training set and a set of features such as a word's part of speech, position in a document, prefix/suffix, frequency, etc. Various approaches have been employed to model the relationship between the features and the entities, including Support Vector Machines (Isozaki and Kazawa 2002), Maximum Entropy (Chieu and Ng 2002), and neural networks (Lample et al. 2016). Because the task of manually generating training data is onerous, there are also semi-supervised and unsupervised machine learning-based NER approaches, but these often only rival, rather than exceed, the performance of supervised systems (Nadeau and Sekine 2007) and so are not considered here.

In this work we applied a Conditional Random Field (CRF) based classifier to the task of tagging information relevant to the computational environment ontology. CRF was chosen due to its long-standing popularity for information extraction from unstructured text (Kristjansson et al. 2004; Bundschus et al. 2008). We again used the Stanford NLP group's implementation (Finkel, Grenager, and Manning 2005). This classifier uses features such as word order, n-grams, part of speech, and word shape to create a probabilistic model to predict the tags in previously unseen documents. We developed models based on both the current and old training sets; an example of the tagged data used for training is shown in Listing 2 (the O symbol indicates that no tag is relevant for that token).

Listing 2: Tagged data used to train the CRF classifier

running O
on O
a O
PC O
with O
Linux os_kernel_name
CentOS os_distribution_name
5 os_distribution_version
, O
1 memory_size_value
GB memory_size_unit
memory O
and O
2.4 cpu_frequency_value
GHz cpu_frequency_unit
CPU O

Table 1: Performance of a machine learning-based approach

Dataset   Model    Precision  Recall  F-measure
Training  Current  0.95       0.93    0.94
Training  Old      0.95       0.93    0.94
Test      Current  0.90       0.52    0.66
Test      Old      1.00       0.01    0.02

We then used each model to tag the articles in the test set and assessed the performance in terms of precision, recall, and F-measure (Table 1). The F-measure of the approach on the training data is not quite 1.0 because tagging in the Stanford NLP pipeline only happens at the level of tokens, and some of the articles contain malformed tokens, such as IntelCorei7, that contain information about more than one tag. Still, the performance of both models is quite good on the training data. The F-measure using the model trained on current documents drops considerably on the test set, but 0.66 may be good enough to be of some use in many applications (others have found that the performance of NER in technical domains such as biomedicine is in the .58–.75 range (Zhang and Ciravegna 2011)). Conversely, the performance using the model trained on older articles is abysmal.
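As a concrete illustration of this feature-based setup, the sketch below trains a CRF on a Listing 2-style token/tag sequence. It uses the third-party sklearn-crfsuite package rather than the Stanford implementation employed in our experiments, and its feature set is a deliberately small subset of the features described above.

import sklearn_crfsuite

# A minimal per-token feature extractor: word shape, suffix, and immediate
# context. The Stanford classifier uses a much richer feature set.
def token_features(tokens, i):
    word = tokens[i]
    return {
        "lower": word.lower(),
        "is_digit": word.isdigit(),
        "shape": "".join("X" if c.isupper() else "x" if c.islower()
                         else "d" if c.isdigit() else c for c in word),
        "suffix3": word[-3:],
        "prev": tokens[i - 1].lower() if i > 0 else "<BOS>",
        "next": tokens[i + 1].lower() if i < len(tokens) - 1 else "<EOS>",
    }

# One training sequence corresponding to part of Listing 2.
tokens = ["with", "Linux", "CentOS", "5", ",", "1", "GB", "memory"]
tags = ["O", "os_kernel_name", "os_distribution_name",
        "os_distribution_version", "O", "memory_size_value",
        "memory_size_unit", "O"]

X = [[token_features(tokens, i) for i in range(len(tokens))]]
y = [tags]

crf = sklearn_crfsuite.CRF(algorithm="lbfgs", max_iterations=50)
crf.fit(X, y)
print(crf.predict(X))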
Rules

Rules-based NER systems allow users to craft rules using a regular expression language that can incorporate text, part of speech, and previously assigned tags. An example rule, which states that any noun phrase that is between an operating system kernel name and the word "version" should be tagged as an operating system distribution name, is shown in Listing 3.

Listing 3: Sample rule

{
  ruleType: "tokens",
  pattern: ( [ { ner: os_kernel_name } ]
             ( [ ( { pos:NN } | { pos:NNP } ) &
                 !{ word:/(?i)version/ } ] ) ),
  action: ( Annotate( $1, ner, "os_distribution_name" ) ),
  stage: 2
}
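To spell out what this rule does, the sketch below applies the same logic directly over (word, POS tag, NER tag) triples. It is a simplified stand-in written for illustration, not the TokensRegex engine itself.

# Simplified re-implementation of the Listing 3 rule: a noun immediately
# following a token already tagged os_kernel_name, other than the word
# "version", is re-tagged as os_distribution_name.
def apply_distribution_rule(tokens):
    tagged = list(tokens)
    for i in range(len(tagged) - 1):
        word, pos, _ = tagged[i + 1]
        if (tagged[i][2] == "os_kernel_name"
                and pos in ("NN", "NNP")
                and word.lower() != "version"):
            tagged[i + 1] = (word, pos, "os_distribution_name")
    return tagged

tokens = [("Linux", "NNP", "os_kernel_name"), ("CentOS", "NNP", "O"),
          ("version", "NN", "O"), ("5", "CD", "O")]
print(apply_distribution_rule(tokens))  # CentOS gets os_distribution_name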
Since the results of the machine learning-based approach were mediocre, we also implemented a rules-based approach using the Stanford NLP group's TokensRegex system (Chang and Manning 2014). A strong effort was made to establish a set of rules for each training set that was as general as possible while still maximizing F-measure. Of course, the more general the rules are, the more likely they are to conflict with one another at some point. Developing these rule sets was therefore quite a difficult task, with each one taking about two days to create. These rule sets were then used to tag the articles from the test set. The performance is shown in Table 2.

Table 2: Performance of a rules-based approach

Dataset   Model    Precision  Recall  F-measure
Training  Current  0.96       0.93    0.95
Training  Old      0.99       0.94    0.97
Test      Current  0.79       0.67    0.73
Test      Old      0.80       0.28    0.41

We again see that the performance of the rule set based on current articles drops on the test set, though it remains significantly higher than that of the machine learning-based approach (.73 versus .66 F-measure). Additionally, the performance of the rule set based on older articles, while only a little more than half that of the current rules, is much more reasonable than that of the CRF classifier using the model trained on old articles (.41 versus .02 F-measure). We tried combining the machine learning- and rules-based approaches, but the combined performance was not any better than that of the rules-based approach alone. We therefore did not consider the machine learning-based approach any further for this ontology population task.

Enhancing Rules with Background Knowledge

As shown in the previous section, the performance of named entity recognition in this domain degrades considerably over time. In this case, the rules created from articles at least ten years older than those being tagged were 44 percent less effective than rules based on contemporaneous articles (.41 versus .73 F-measure). In looking into the specifics of this issue, one underlying problem is that some tags' rules are based on other tags. For example, a computer's make can often be recognized because it follows a computer's manufacturer, as in Lenovo ThinkPad. If Lenovo is not recognized as a computer manufacturer there is a double hit to performance, because not only is that word not tagged correctly, but neither is ThinkPad. The common computer manufacturers of a decade ago (e.g. Silicon Graphics, Compaq, DEC) are not common today, which leads to the observed performance degradation. Even more problematic is that over time completely new technologies become relevant to descriptions of computational environments. A good example is GPUs, which have only become popular for parallel processing relatively recently.

Our goal in this work was to reduce the performance degradation seen as rules age by leveraging a source of background knowledge that is continuously updated. In addition, we sought to develop an approach that does not require the types of time-consuming training typical of machine learning and rules-based methods. Ideally, the approach should take the ontology (or desired set of tags) as input and require no additional configuration.

The solution we developed uses Wikipedia as the background knowledge source. We developed a custom NER module that fits into the Stanford NLP pipeline. The module takes the list of desired tags as input. We limit this list to the tags that are not related to numeric values (i.e. we omit tags like num-processors and cpu_num-cores) because querying Wikipedia for a number like "4" or "8" is not going to produce any useful information. We also provide common synonyms for tags (e.g. "processor" for CPU and "company" for manufacturer). When tagging a document, the module queries Wikipedia for each proper noun in the key paragraph(s) and, if a page is returned, determines which tag, if any, is most relevant by counting the number of times each tag appears in the first three sentences of the returned page and weighting tags that appear earlier more heavily than those that appear later. After the Wikipedia annotator finishes, the rules-based annotator is run.
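The sketch below approximates this tag-selection logic in Python against the public MediaWiki API. The actual module is a custom annotator inside the Stanford pipeline, and the exact position-weighting scheme used there is not reproduced here; the 1/(i+1) weighting is an illustrative assumption.

import re
import requests

API = "https://en.wikipedia.org/w/api.php"

def intro_sentences(title, n=3):
    """Fetch the first n sentences of a Wikipedia page's introduction."""
    params = {"action": "query", "prop": "extracts", "exintro": 1,
              "explaintext": 1, "redirects": 1, "format": "json",
              "titles": title}
    pages = requests.get(API, params=params).json()["query"]["pages"]
    text = next(iter(pages.values())).get("extract", "")
    return re.split(r"(?<=[.!?])\s+", text)[:n]

def best_tag(proper_noun, tag_synonyms):
    """tag_synonyms maps each tag to the words that signal it, e.g.
    {"cpu_manufacturer": ["processor", "company"]} (assumed input)."""
    sentences = intro_sentences(proper_noun)
    scores = {tag: 0.0 for tag in tag_synonyms}
    for i, sentence in enumerate(sentences):
        lowered = sentence.lower()
        for tag, synonyms in tag_synonyms.items():
            # Mentions in earlier sentences are weighted more heavily.
            scores[tag] += sum(lowered.count(w) for w in synonyms) / (i + 1)
    best = max(scores, key=scores.get, default=None)
    return best if best is not None and scores[best] > 0 else None

For instance, best_tag("Nvidia", {"gpu_manufacturer": ["company", "graphics"], "programming-language": ["language"]}) would typically select gpu_manufacturer, since the opening sentences of that page mention the signal words for that tag.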
Other researchers have also leveraged Wikipedia for various aspects of the NER and ontology population tasks. Many of these efforts are focused on using Wikipedia to do multilingual NER (Nothman et al. 2013; Kim, Toutanova, and Yu 2012). More related to our current work, Kazama and Torisawa determine a candidate tag for an entity by extracting the first noun after a form of the word "be" within the introductory sentence of its Wikipedia page and use that as one of the features in a CRF classifier (Kazama and Torisawa 2007). Kliegr et al. take a similar approach but rather than using a classifier, they leverage WordNet to find synonyms and hypernyms of the type identified from Wikipedia in order to arrive at one of the tags of interest (Kliegr et al. 2008). A more thorough survey of NER approaches that utilize Wikipedia can be found in (Zhang and Ciravegna 2011). The key difference between our work and existing approaches is that our results have been analyzed in a way that enables the ability of Wikipedia to mitigate the atrophy of a rules-based technique to be specifically evaluated.

Our method leads to the results shown in Table 3. While this approach is relatively basic, we see that it improves the performance of the old rule set by 20 percent (0.49 versus 0.41 F-measure). The performance of the current rule set is reduced by 4 percent (0.70 versus 0.73 F-measure), indicating that this approach should not be used unless it is warranted by changes in the domain of interest and the age of the model used for ontology population.

Table 3: Performance of a rules-based approach preceded by Wikipedia-based annotations

Dataset   Model    Precision  Recall  F-measure
Training  Current  0.93       0.93    0.93
Training  Old      0.93       0.93    0.93
Test      Current  0.71       0.70    0.70
Test      Old      0.61       0.41    0.49

Conclusions and Future Work

In this work we consider the problem of populating an ontology from a fast-changing domain: computational environments. We show that the performance of two popular approaches degrades significantly over time and propose the use of a continuously updated background knowledge source to mitigate this performance degradation. A basic implementation of this idea improved the performance of a rules-based approach based on older documents by 20 percent. It remains to be determined if this technique can achieve similar results in other fast-changing fields, such as medicine or law. In addition, it is possible that this result can be improved upon by a more advanced use of Wikipedia, such as through neural network based methods like word2vec (Mikolov et al. 2013). We plan to explore this in our future work on this topic.

All of the materials used in this project, including the article set, answer set, ontology, machine learning models, rules, and code, have been published to GitHub (https://github.com/mcheatham/compEnv-extraction).

Acknowledgments

The first and third authors acknowledge partial support by the National Science Foundation under award PHY-1247316 DASPOS: Data and Software Preservation for Open Science. The third author would like to acknowledge partial support from Notre Dame's Center for Research Computing.

References

Bundschus, M.; Dejori, M.; Stetter, M.; Tresp, V.; and Kriegel, H.-P. 2008. Extraction of semantic biomedical relations from text using conditional random fields. BMC Bioinformatics 9(1):207.

Chang, A. X., and Manning, C. D. 2014. TokensRegex: Defining cascaded regular expressions over tokens. Technical Report CSTR 2014-02, Department of Computer Science, Stanford University.

Cheatham, M.; Vardeman, C., II; Karima, N.; and Hitzler, P. 2017. Computational environment: An ODP to support finding and recreating computational analyses. In 8th Workshop on Ontology Design and Patterns – WOP2017.

Chieu, H. L., and Ng, H. T. 2002. Named entity recognition: a maximum entropy approach using global information. In Proceedings of the 19th International Conference on Computational Linguistics, volume 1, 1–7. Association for Computational Linguistics.

Finkel, J. R.; Grenager, T.; and Manning, C. 2005. Incorporating non-local information into information extraction systems by Gibbs sampling. In Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics, 363–370. Association for Computational Linguistics.

Isozaki, H., and Kazawa, H. 2002. Efficient support vector classifiers for named entity recognition.
In Proceedings of the 19th International Conference on Computational Linguistics, volume 1, 1–7. Association for Computational Linguistics.

Kazama, J., and Torisawa, K. 2007. Exploiting Wikipedia as external knowledge for named entity recognition. In Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL).

Kim, S.; Toutanova, K.; and Yu, H. 2012. Multilingual named entity recognition using parallel data and metadata from Wikipedia. In Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics: Long Papers - Volume 1, 694–702. Association for Computational Linguistics.

Kliegr, T.; Chandramouli, K.; Nemrava, J.; Svatek, V.; and Izquierdo, E. 2008. Combining image captions and visual analysis for image concept classification. In Proceedings of the 9th International Workshop on Multimedia Data Mining: held in conjunction with the ACM SIGKDD 2008, 8–17. ACM.

Kristjansson, T.; Culotta, A.; Viola, P.; and McCallum, A. 2004. Interactive information extraction with constrained conditional random fields. In AAAI, volume 4, 412–418.

Lample, G.; Ballesteros, M.; Subramanian, S.; Kawakami, K.; and Dyer, C. 2016. Neural architectures for named entity recognition. arXiv preprint arXiv:1603.01360.

Manning, C. D.; Surdeanu, M.; Bauer, J.; Finkel, J.; Bethard, S. J.; and McClosky, D. 2014. The Stanford CoreNLP natural language processing toolkit. In Association for Computational Linguistics (ACL) System Demonstrations, 55–60.

Mikolov, T.; Sutskever, I.; Chen, K.; Corrado, G. S.; and Dean, J. 2013. Distributed representations of words and phrases and their compositionality. In Advances in Neural Information Processing Systems, 3111–3119.

Nadeau, D., and Sekine, S. 2007. A survey of named entity recognition and classification. Lingvisticae Investigationes 30(1):3–26.

Nakashole, N.; Tylenda, T.; and Weikum, G. 2013. Fine-grained semantic typing of emerging entities. In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 1488–1497.

Nothman, J.; Ringland, N.; Radford, W.; Murphy, T.; and Curran, J. R. 2013. Learning multilingual named entity recognition from Wikipedia. Artificial Intelligence 194:151–175.

Petasis, G.; Karkaletsis, V.; Paliouras, G.; Krithara, A.; and Zavitsanos, E. 2011. Ontology population and enrichment: State of the art. In Knowledge-driven Multimedia Information Extraction and Ontology Evolution, 134–166. Springer-Verlag.

Zhang, Z., and Ciravegna, F. 2011. Named entity recognition for ontology population using background knowledge from Wikipedia. In Ontology Learning and Knowledge Discovery Using the Web: Challenges and Recent Advances. IGI Global. 79–104.

Zhou, L.; Cheatham, M.; Krisnadhi, A.; and Hitzler, P. 2018. A complex alignment benchmark: GeoLink dataset. In International Semantic Web Conference, 273–288. Springer.