Paraphrasing of Synonyms for a Fine-grained Data Representation

Svetla Koeva
Institute for Bulgarian Language
Bulgarian Academy of Sciences
52 Shipchenski prohod Blvd.
Sofia, 1113 Bulgaria
svetla@dcl.bas.bg

ABSTRACT
The paper addresses the question of how the paraphrasing of synonyms can be linked with a fine-grained ontology-based data representation. Our challenge is to identify for a set of synonyms (including terms and multiword expressions) the best lexical paraphrases suitable for given contexts. Our hypothesis is that: i. the minimal context in which the paraphrasing can be validated is different for different (semantic) word classes; ii. paraphrasing is defined by patterns within the minimal context containing the synonym and its dependent. For each minimal context a different set of rules is defined with respect to the modifiers and complements the words are licensed for. The extracted dependency collocations are linked with the WordNet synonyms. With this we achieve two goals: to define the lexical paraphrases suitable for a given context and to augment available lexical-semantic resources with linguistic information (the dependency collocations in which synonyms are interchangeable).

Categories and Subject Descriptors
I.2. [Artificial Intelligence]: Semantic Networks
J. [Computer applications]: Linguistics

General Terms
Languages

Keywords
semantics, synonymy, paraphrasing, dependency collocations

1. INTRODUCTION
Paraphrasing is used in many areas of Natural Language Processing – ontology linking, question answering, summarization, machine translation, etc. Paraphrasing between synonyms seems a relatively simple task, but in practice an automatic paraphrasing of synonyms might produce ungrammatical or unnatural sentences. The reason is that although there are many synonyms in any natural language, it is unusual for words defined as synonyms to have exactly the same meaning in all contexts in which they are used. In other words, the notion of absolute synonyms remains theoretical. The human knowledge about synonyms – words (and/or multiword expressions) denoting one and the same concept – and about semantic relations such as hypernymy, meronymy, antonymy, etc., is encoded in the lexical-semantic network WordNet [16]. The following test for synonymy is applied to WordNet:

Two expressions are synonymous in a linguistic context C if the substitution of one for the other in C does not alter the truth value.

The test implies that the WordNet synonyms are cognitive (or propositional) synonyms [2]. Cognitive synonymy is a sense relation that holds between two or more words used with the same meaning in a given context in which they are interchangeable. For example, the pairs {brain; encephalon}, {cry; weep}, {big; huge} are cognitive synonyms. However, cognitive synonyms may differ in their collocational range, which means that their interchangeability is restricted. For example, the words educator, pedagogue, and pedagog are synonyms linked in the WordNet with the definition 'someone who educates young people'. In the collocation with the word certified the most preferred word is educator (certified educator), followed by pedagogue, while the word pedagog is used most rarely. In the collocation Microsoft certified educator the word educator would not be replaced with either of the words pedagogue or pedagog.

Absolute synonymy is a symmetric relation of equivalence. However, the definition of synonymy as a substitution of words in a given context alters the meaning of the equivalence relation [16]:

If x is similar to y, then y is similar to x in an equivalent way.

We focus on WordNet because it is a hand-crafted (or hand-validated) lexical-semantic network and ontology and offers a large network of concepts and named entities along with an extensive multilingual lexical coverage. In this paper we present a pattern-based method for the identification of dependency collocations (pairs of grammatically dependent words that co-occur more frequently than at random) in which two words are interchangeable. The difference between grammatical and lexical collocations is pointed out by many researchers. We introduce the notion of dependency collocation, which subsumes grammatical and lexical collocations and adds the condition for a grammatical dependence (such as subject, complement, and modifier) between collocates.

WordNet, together with other semantic resources such as YAGO¹, OpenCyc², DBpedia³, etc., is part of the Linguistic Linked Open Data cloud [1]. Our aim is twofold: to define the lexical paraphrases suitable for a given context and to augment available lexical-semantic resources with linguistic information (the dependency collocations in which given words are synonyms).

¹ http://www.mpi-inf.mpg.de/departments/databases-and-informationsystems/research/yago-naga/yago/
² http://www.opencyc.org
³ http://wiki.dbpedia.org/about

2. RELATED WORK
There are various attempts to extract candidates for a paraphrase automatically, based on the Distributional hypothesis, which states that words occurring in the same contexts tend to have similar meanings [6]. Differences in the approaches can be viewed mainly with respect to the restrictions on the contexts [9]: some approaches (for example, grouping similar terms in document classification) consider all words in a document, others (focused on extracting semantic relations like synonymy) may take words in a predefined window or extract words in a specific syntactic relation to the target word. Ruiz-Casado et al. [20] label a pair of words as synonyms if the words appear in the same contexts, but in many cases this simple approach might also link hypernyms, hyponyms, antonyms, etc. Semantic relations such as purpose, agent, location, frequency, material, etc. are assigned to noun-modifier pairs based on semantic and morphological information about words [17, 18]. Experiments were performed with decision trees, instance-based learning and Support Vector Machines. Turney and Littman [21] and Turney [22] use paraphrases as features to analyze noun-modifier relations. The hypothesis, corroborated by the reported experiments, is that pairs which share the same paraphrases belong to the same semantic relation. Lin and Pantel [14] measure the similarity between paths in dependency trees, assuming that if two dependency paths tend to link the same sets of words (for example, commission, government versus crisis, problem) the meanings of the paths are similar and the words can be paraphrased (for example, finds a solution to and solves). Padó and Lapata [19] take into account context words that stand in a syntactic dependency relation to the target word and introduce an algorithm for constructing semantic space models. They rely on three parameters which guide model construction: which types of syntactic structures contribute towards the representation of lexical meaning; the importance weights of different syntactic relations; and the representation of the semantic space (as co-occurrences of words with other words, words with parts of speech, or words with argument relations such as subject, object, etc.). Heylen et al. [10] compare the performance of models using a predefined context window and those relying on syntactically related words and show that the syntactic model outperforms the other models in finding semantically similar nouns for Dutch. Ganitkevitch et al. [3] extracted a Paraphrase Database using the cosine distance between vectors of distributional features applied on parallel texts.

Hearst [7] introduces lexico-syntactic patterns (for example, X such as Y) in the task of automatic identification of semantic relations (hypernymy and hyponymy). Several techniques aim at providing support for the automatic (or semi-automatic) definition of the patterns to be used for extraction of semantic relations. Hearst [8] proposes to look for co-occurrences of word pairs appearing in a specific relation inside WordNet. Maynard et al. [15] discuss the use of information extraction techniques involving lexico-syntactic patterns to generate ontological information from unstructured text. Several approaches combine distributional similarity and lexico-syntactic patterns. Hagiwara et al. [5] describe experiments that involve training various synonym classifiers. Giovannetti et al. [4] detect semantically related words combining manually composed patterns with distributional similarity. Turney [23] proposes a supervised machine learning approach for discovering synonyms, antonyms, analogies and associations, in which all of these phenomena are subsumed by analogies. The problem of recognizing analogies is viewed as the classification of semantic relations between words.

The approach proposed here aims at the extraction of collocations in which synonyms occur and interchange, and at the generalization of the shared contexts.

3. PATTERN BASED APPROACH FOR DEPENDENCY COLLOCATIONS
The synonymy in WordNet is limited to a certain set of contexts and cannot be directly applied for automatic paraphrasing. For example, the words car, automobile and auto from the synonymous set {car; auto; automobile; machine} with the definition 'a motor vehicle with four wheels; usually propelled by an internal combustion engine' can be interchanged in the collocations with the word luxury – luxury car, luxury automobile, luxury auto, luxury machine, with the prepositional phrase with lights – car with lights, auto with lights, automobile with lights, machine with lights, and so on. On the other hand, it is hard to find examples in which the word car from the collocation car cash market is replaced by the words auto, automobile or machine.

Our challenge is to identify for a set of synonyms the best lexical paraphrases suitable for given contexts. We accept the view that the meaning of words is expressed through their relations with other words and that each word selects the set of semantic word classes with which it can express a specific meaning. For example, the word director and the word professor are similar in that they both designate a person, and this determines the fact that both nouns can co-occur with adjectives denoting height, age, etc. The subsets of adjectives that can collocate with the two words differ with respect to their meaning, and not all adjectives that are compatible with one noun are compatible with the other as well (chief executive officer, ?chief executive professor). The meaning of the word professor also implies that it may be specified with expressions for disciplines as complements (professor of physics), while, in comparison, the word director may not. Both words can be specified for institutions through selecting the respective complements. Therefore, the closer the similarity between two words is, the bigger the number of contexts which they share. Our hypothesis is that:

i. the minimal context in which the paraphrasing has to be identified is different for different word classes;
ii. paraphrasing is defined by patterns within the minimal context containing the synonym and its dependent (dependency collocations).

The minimal context for English involves different combinations of the following: one or several adjectival modifiers in pre-position; a prepositional complement in post-position; and a noun modifier in pre-position.

For adjectives the minimal context starts with the adjective (the target synonym) and ends with a noun modified by the adjective (for example new idea, new brilliant idea, fresh idea, fresh brilliant idea, but not New Idea Magazine).

For nouns the minimal context is one of the following: an adjective modifier in the leftmost position and the head noun (the target synonym) at the right position; a noun modifier in the leftmost position and the head noun (the target synonym) at the right position (for example gold light, amber light, but not Gold Light Gallery); the noun (the target synonym) in the leftmost position and a prepositional complement – a preposition and a noun at the right position (for example flood of requests, torrent of abuse).

For verbs the minimal context is one of the following: the verb (the target synonym) in the leftmost position and an object noun at the right position (for example compose music, write music, compose nice music, but not compose music online); the verb (the target synonym) in the leftmost position, a preposition and an object noun at the right position (for example lies in the hands, rests in the hands, but not rests in the hands of the United States Congress).

The dependency collocations in our approach always contain the two constituents occupying the leftmost and rightmost position in the minimal context (in some cases linked with a preposition). The minimal context is defined by linguistic rules, which describe eligible constituents between the leftmost and rightmost position. The minimal contexts and the syntactic structures of dependency collocations are different for different languages. We have developed rules for Bulgarian and English, but only rules for English are illustrated in this paper. More minimal contexts relevant for synonymy validation can be defined further, for example comprising coordinative constructions, subject-verb dependencies, and so on.

4. IMPLEMENTATION
The rules are formulated within the linguistic formalism called Est and applied through the parser ParseEst [12]. The Est formal grammar is a regular grammar. The rules are abstractions for strings of words and do not define a hierarchical (linguistic) structure. An element in a rule can be a word, a lemma, a grammatical tag, or a lexicon. The boolean operators, the Kleene star and the Kleene plus can be applied on the elements and on groups of elements. The formalism maintains unification and supports cascading application of rules by preset priority. Right and/or left context can be defined in a similar way, as a sequence of elements.

The rules have to exhaust all lexical and grammatical combinations and permutations. A given word can be specified by the class to which it belongs: lemma, part-of-speech and grammatical categories. For example, the part-of-speech tag 'NC' defines a common noun, the tag 'NCs' – a singular common noun, the regular expression 'NC.' – a singular or plural common noun, etc. The word permutations are expressed as different paths in the rules. For each minimal context, a different rule is defined with respect to the modifiers and complements the target word classes are licensed for. The rule (1) below matches a minimal context for a noun (only part of the rule is presented here).

(1)

The rule says that the head noun can be modified by a prepositional phrase in post-position. The structure of the prepositional phrase is constrained to a preposition, zero or more determiners, zero or more adjectives, and a noun. This general rule is multiplied by replacing its element "NOUN LEMMA" with the WordNet synonyms, for example l="teacher" and l="instructor". Our approach makes use of handcrafted rules running on texts preliminarily annotated with part-of-speech tags, tags for grammatical categories, and lemmas. Apache OpenNLP⁴ with pre-trained models and Stanford CoreNLP⁵ are used for the annotation of the English texts – sentence segmentation, tokenisation, and POS tagging [13].

The rules are run on a corpus⁶ [13] and match for a given pair of synonyms their minimal contexts, e.g. months of investigation ENG2014348156n, breaking the longstanding political stalemate ENG2000351165v, acute pain ENG2000769157a. For adjectives and verbs the target synonym is at the first position in the collocation. For nouns it is either at the first or at the last position of the collocation. The collocations for different word classes are extracted from the minimal contexts as follows. For nouns: the first adjective and the last noun, or the first noun, a preposition, if any, a determiner, if any, and the last noun, e.g. months of investigation. For adjectives: the first adjective and the last noun, e.g. acute pain. For verbs: the first verb, a preposition, if any, a determiner, if any, and the last noun, e.g. breaking the stalemate. The results for the Princeton WordNet 2.0 base concepts (PWN 2.0 BCS) are presented in Table 1.

Table 1. Number of Rules and Collocations for PWN 2.0 BCS

                       Nouns    Verbs     Adj    Total
Rules                   4624     2997      70     7691
Collocations          223347   396434    5108   624889
Unique collocations    59877    73201    4528   137606

The lemmas of the dependent collocates and the information for the number of occurrences in a corpus are linked with the respective WordNet literals in the field LNote (a note related to a literal), as shown in (2)⁷.

(2)
present2
proposal,2
budget,1
plan,2

Since the task is not a classification one, a validation against an annotated corpus is not applicable. A validation is performed by an expert during the development of the rules: every change within a rule has been checked against a certain number of matches.

The pattern matching approach allows a focused extraction of dependency collocations – not all collocations are extracted but only those in which a particular dependency is expected. The rules are applied without prior word sense disambiguation. However, we consider that the focused use of different minimal contexts for different semantic word classes may lead to correct identification of collocations. Sometimes even humans cannot distinguish between hypernyms and hyponyms if their lemmas coincide. The approach allows the accumulation of information in case some new rules are formulated or the existing rules are applied on different corpora.

⁴ http://incubator.apache.org/opennlp/
⁵ http://nlp.stanford.edu/software/corenlp.shtml
⁶ The experiments are made on the monolingual parts of the Bulgarian-English parallel corpus: 280.8 and 283.1 million tokens respectively.
⁷ The PWN2.0 enriched with collocations of synonyms is published at: http://dcl.bas.bg/wordnet_collocatons.xml

5. CONCLUSION AND FUTURE WORK
To conclude, it is difficult to define synonymy taking into account all the different ways in which synonyms may differ, to provide reliable tests for the identification of synonyms, and to calculate all possible contexts in which two words are synonyms. On the other hand, dependency collocations provide suitable contexts for paraphrasing with synonyms. This is a step towards improving the intuitive definitions of synonyms and towards a precise linking of the synonymous words and expressions with the contexts in which two or more words are interchangeable.

The dependency collocations consist of the head word lemma – a noun or a verb – and the dependent word lemma – an adjective or a noun – and provide information about the combinatory properties between particular semantic word classes. Each lemma which is present in the WordNet structure is classified into semantic primitives such as person, animal, plant, cognition, communication, etc. [16]. On the basis of the dependency collocations and the classification of semantic primitives, different inferences can be calculated. For example, nouns for professions participate in the following collocational patterns, generalized for parts of speech and semantic primitives:

– (dependent adjective – (head noun denoting a profession, semantic primitive: noun.person)) (for example, young engineer, blond professor);
– ((head noun denoting a profession, semantic primitive: noun.person) – (dependent noun specifying a domain, semantic primitive: noun.cognition)) (for example, director of theater, rector of university);
– ((dependent noun specifying a domain, semantic primitive: noun.cognition) – (head noun denoting a profession, semantic primitive: noun.person)) (for example, theater director, university rector);
– ((head noun denoting a profession, semantic primitive: noun.person) – (dependent noun specifying an affiliation, semantic primitive: noun.group)) (for example, teacher at university, instructor at school).

Some WordNets, for example GermaNet, distinguish between semantic classes of adjectives, thus different semantic classifications might be further applied.

One of the main goals of our future work will be to apply WordNet-based semantic classifications in order to obtain generalizations about combinatory preferences of words, in particular, to generate collocational patterns for WordNet synonyms. Further, the collocations can be extended by means of relatedness between two concepts in WordNet [11], possibly restricted to the direct hyponyms of the head collocate. Since the Princeton WordNet is converted to RDF/OWL⁸, our future plans also include the conversion of the dependency collocations of the WordNet synonyms to RDF/OWL representation.

⁸ http://www.w3.org/TR/wordnet-rdf/

6. REFERENCES
[1] Chiarcos, C., Hellmann, S., and Nordhoff, S. 2011. Towards a Linguistic Linked Open Data Cloud: The Open Linguistics Working Group. In TAL 52(3), 245–275.
[2] Cruse, D. A. 1986. Lexical Semantics. Cambridge: Cambridge University Press.
[3] Ganitkevitch, J., Van Durme, B., and Callison-Burch, C. 2013. PPDB: The paraphrase database. In Proceedings of NAACL-HLT. Atlanta, Georgia: Association for Computational Linguistics, 758–764.
[4] Giovannetti, E., Marchi, S., and Montemagni, S. 2008. Combining Statistical Techniques and Lexico-Syntactic Patterns for Semantic Relations Extraction from Text. In Proceedings of the 5th Workshop on Semantic Web Applications and Perspectives.
[5] Hagiwara, M. O. Y. and Katsuhiko, T. 2009. Supervised Synonym Acquisition using Distributional Features and Syntactic Patterns. In Information and Media Technologies 4(2), 558–582.
[6] Harris, Z. 1985. Distributional Structure. In Katz, J. J. (ed.) The Philosophy of Linguistics. New York: Oxford University Press. 26–47.
[7] Hearst, M. A. 1992. Automatic acquisition of hyponyms from large text corpora. In Proceedings of the 14th International Conference on Computational Linguistics. Nantes, France, 539–545.
[8] Hearst, M. A. 1998. Automated Discovery of WordNet Relations. Cambridge MA: MIT Press.
[9] Heylen, K., Peirsman, Y., and Geeraerts, D. 2008. Automatic Synonymy Extraction: A Comparison of Syntactic Context Models. In LOT Occasional Series, 11:101–116.
[10] Heylen, K., Peirsman, Y., Geeraerts, D., and Speelman, D. 2008. Modelling word similarity: an evaluation of automatic synonymy extraction algorithms. In Proceedings of the Sixth International Language Resources and Evaluation (LREC'08), 3243–3249.
[11] Hirst, G., and St-Onge, D. 1998. Lexical chains as representations of context for the detection and correction of malapropisms. In Fellbaum, C. (ed.) WordNet: An electronic lexical database. Cambridge MA: MIT Press. 305–332.
[12] Karagiozov, D., Belogay, A., Cristea, D., Koeva, S., Ogrodniczuk, M., Raxis, P., Stoyanov, E., and Vertan, C. 2012. i-Librarian – Free Online Library for European Citizens. In INFOtheca, no. 1, vol. XIII, May. BS Print: Belgrade. 27–43.
[13] Koeva, S., Stoyanova, I., Leseva, S., Dimitrova, T., Dekova, R., and Tarpomanova, E. The Bulgarian National Corpus: theory and practice in corpus design. In Journal of Language Modeling, 1, 65–110.
[14] Lin, D. and Pantel, P. 2001. Discovery of Inference Rules for Question Answering. Natural Language Engineering 7(4):343–360.
[15] Maynard, D., Funk, A., and Peters, W. 2009. Using Lexico-Syntactic Ontology Design Patterns for ontology creation and population. In Proceedings of ISWC Workshop on Ontology Patterns (WOP 2009), Washington, 36–52.
[16] Miller, G. A., Beckwith, R., Fellbaum, C., Gross, D., and Miller, K. J. 1990. Introduction to WordNet: An On-line Lexical Database. In International Journal of Lexicography, 3(4):235–244.
[17] Nastase, V., and Szpakowicz, S. 2003. Exploring noun-modifier semantic relations. In Proceedings of IWCS 2003, 281–301.
[18] Nastase, V., Shirabad, J. S., Sokolova, M., and Szpakowicz, S. 2006. Learning noun-modifier semantic relations with corpus-based and WordNet-based features. In Proceedings of the 21st National Conference on Artificial Intelligence, Boston, Mass., 781–787.
[19] Padó, S. and Lapata, M. 2007. Dependency-based construction of semantic space models. In Computational Linguistics, 33(2):161–199.
[20] Ruiz-Casado, M., Alfonseca, E., and Castells, P. 2005. Using context-window overlapping in Synonym Discovery and Ontology Extension. In Proceedings of the International Conference Recent Advances in Natural Language Processing, RANLP-2005, Borovets, Bulgaria.
[21] Turney, P., and Littman, M. 2003. Learning analogies and semantic relations. Technical Report ERB-1103 (NRC #46488), National Research Council, Institute for Information Technology.
[22] Turney, P. 2005. Measuring semantic similarity by latent relational analysis. In Proceedings of IJCAI 2005, 1136–1141.
[23] Turney, P. D. 2008. A Uniform Approach to Analogies, Synonyms, Antonyms and Associations. In Proceedings of the 22nd International Conference on Computational Linguistics, 905–912.
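
The extraction step described in the implementation section can be illustrated with a minimal sketch for one noun pattern only: a head noun (the target synonym), a preposition, zero or more determiners, zero or more adjectives, and a dependent noun. This is not the Est formalism or the ParseEst parser. The coarse tags NOUN, PREP, DET and ADJ, the function names match_noun_pp and shared_contexts, and the toy input are assumptions introduced here for illustration. Treating a context as shared when at least two synonyms occur in it is likewise a simplifying assumption, not a criterion stated in the paper.

```python
from collections import defaultdict


def match_noun_pp(tokens, start, synonyms):
    """Match NOUN PREP DET* ADJ* NOUN beginning at index `start`.

    `tokens` is a list of (lemma, tag) pairs with hypothetical coarse tags.
    Returns (head, preposition, first determiner or None, dependent noun),
    or None if the pattern does not match or the head is not a target synonym.
    """
    n = len(tokens)
    head, tag = tokens[start]
    if tag != "NOUN" or head not in synonyms:
        return None
    i = start + 1
    if i >= n or tokens[i][1] != "PREP":
        return None
    prep = tokens[i][0]
    i += 1
    det = None
    while i < n and tokens[i][1] == "DET":   # zero or more determiners
        if det is None:
            det = tokens[i][0]
        i += 1
    while i < n and tokens[i][1] == "ADJ":   # zero or more adjectives, skipped
        i += 1
    if i < n and tokens[i][1] == "NOUN":     # the dependent noun closes the context
        return (head, prep, det, tokens[i][0])
    return None


def shared_contexts(sentences, synonyms):
    """Group matches by (preposition, dependent noun) and keep the
    contexts in which at least two of the synonyms were observed."""
    by_context = defaultdict(set)
    for tokens in sentences:
        for start in range(len(tokens)):
            match = match_noun_pp(tokens, start, synonyms)
            if match:
                head, prep, _det, dependent = match
                by_context[(prep, dependent)].add(head)
    return {ctx: heads for ctx, heads in by_context.items() if len(heads) > 1}


# Toy lemmatised and tagged input (invented for the example).
sentences = [
    [("teacher", "NOUN"), ("at", "PREP"), ("the", "DET"),
     ("local", "ADJ"), ("school", "NOUN")],
    [("instructor", "NOUN"), ("at", "PREP"), ("school", "NOUN")],
    [("teacher", "NOUN"), ("of", "PREP"), ("physics", "NOUN")],
]
result = shared_contexts(sentences, {"teacher", "instructor"})
print(result)
```

On this toy input only the context (at, school) is attested for both teacher and instructor, so it is the only one kept. The context (of, physics) is seen with teacher alone and is dropped. A fuller version would add the adjective and verb patterns and keep the occurrence counts that the paper stores in the LNote field.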