Key Phrase to Text Similarity, Clustering, and Interpretation in Hierarchical Ontologies Boris Mirkin Applied Mathematics and Informatics, National Research University Higher School of Economics Moscow Computer Science and Information Systems, Birkbeck University of London BMirkin@hse.ru Abstract. Scoring similarity between key phrases and unstructured texts is an issue which is important in both information retrieval and text analysis. Researchers from the two fields use different scoring functions, although clear delineation between the two still is lacking. We use suffix tree based score expressing the average conditional probability of a sym- bol in a common substring. Usually, a domain taxonomy serves as the source of key-phrases. Given a set of entities, such as texts or projects or working groups, one can derive clusters of key-phrases using key-phrase- to-entity scores. The clusters represent common themes in the meaning of texts or in activities of working groups. To interpret them, the domain ontology should be used. If the ontology is a rooted tree, a lifting method is proposed to find the most parsimonious interpreting head subject(s), up to a few gaps and offshoots. Some applications and application is- sues are considered. The work is being conducted jointly with T. Fenner (London), S. Nascimento (Lisbon) and E. Chernyak (Moscow).