Mapping of glossary terms from the Flora of North America to the Plant Ontology enhances both resources

Ramona Walls1,*, Hong Cui2, James A. Macklin3, Chris Mungall4, Laurel Cooper5, Dennis Stevenson1 and Pankaj Jaiswal5

1 New York Botanical Garden, Bronx, New York, USA
2 School of Information Resources and Library Science, University of Arizona, Tucson, Arizona, 85719 USA
3 Research Branch, Agriculture and Agri-Food Canada, Ottawa, Ontario, Canada
4 Lawrence Berkeley National Lab, Berkeley, California, USA
5 Department of Botany and Plant Pathology, Oregon State University, Corvallis, OR, USA

1 INTRODUCTION

Traditional taxonomic literature can provide a wealth of data, but access to that data is limited by its free-text format. Taxonomic treatments such as the Flora of North America (FNA Editorial Committee 1993) consist of terse descriptions of the characters used to identify taxa, such as:

“…Leaves usually alternate or opposite, sometimes in basal rosettes, rarely in whorls; rarely stipulate, usually petiolate, sometimes sessile…”

Converting taxonomic descriptions to a computer-readable format makes them available for automatic retrieval and large-scale analyses. Ontologies such as the Plant Ontology (PO) play a central role in automatic annotation by providing semantic meaning for the words in a description. We used automated and manual methods to map terms from the Categorical Glossary for the Flora of North America Project (http://128.2.21.109/fmi/xsl/FNA/home.xsl) to the PO.

2 METHODS

Terms from the pre-existing categories of “structure”, “feature”, or “nominative” were extracted from the FNA glossary, roughly corresponding to the PO class plant anatomical entity. An automated mapping to PO release 16 was done using Obol software (Mungall 2004). We manually checked the automated mapping and removed any matches that were incorrect. Remaining glossary terms were either manually mapped to existing PO terms, classified as inappropriate for the PO, or marked to be added to the PO.

3 RESULTS AND DISCUSSION

839 terms were extracted from the FNA glossary, compared to 1080 terms in the plant anatomical entity branch of the PO. Using text matching, Obol mapped 264 FNA terms to 313 existing PO terms or synonyms, including 49 FNA terms that matched more than one PO term or synonym. Most duplicate matches arose because the PO has many synonyms in Spanish that are identical to the English term name. Only 30 Obol matches had to be removed, in cases where the FNA has multiple terms with the same name but separate meanings that should map to separate PO terms. A curator mapped the remaining FNA glossary terms to PO terms, based on the FNA and PO definitions.

A total of 193 FNA terms mapped to existing PO primary term names and 126 mapped to existing synonyms. 333 FNA terms had the same meaning as existing PO terms and have been added as synonyms to the PO, citing the FNA glossary as the source. 143 unique new terms will be added to the PO, corresponding to 180 FNA glossary terms. 118 FNA terms could not be mapped to PO terms, either because they were too vague (12 terms, e.g., FNA:lamella, which could apply to many different tissue types), because they are subcellular components and belong in the Gene Ontology (5 terms, e.g., FNA:flagella), or because they are better modeled as qualities (93 terms, e.g., FNA:puncta is better treated as the quality punctate).

The PO is fairly extensive in its coverage of plant anatomical entities, as many of the missing terms are specialized structures found only in a few taxa. The PO benefits from this mapping through increased coverage of plant terminology. Text mining tools such as CharaParser (Cui 2012) that are being developed to mine taxonomic descriptions can now use the PO more effectively for automated text annotation and, in return, mine more candidate terms from the literature to further enrich the PO. The mapping of FNA IDs to PO IDs is available at http://tinyurl.com/FNAPOmapping.

ACKNOWLEDGEMENTS

NSF-IOS: 0822201 to the Plant Ontology Project, and the Flora of North America Association.

REFERENCES

Cui, H. 2012. CharaParser for fine-grained semantic annotation of organism morphological descriptions. J. of Am. Soc. of Information Science and Technology 63(4). doi:10.1002/asi.22618
FNA Editorial Committee, eds. 1993. Flora of North America North of Mexico. 16+ vols. New York and Oxford.
Mungall, C. J. 2004. Obol: Integrating Language and Meaning in Bio-ontologies. Comparative and Functional Genomics 5(6-7): 509–520. doi:10.1002/cfg.435