Proceedings of the conference Terminology and Artificial Intelligence 2015 (Granada, Spain)

A logical information system proposal for browsing terminological resources

Annie Foret
IRISA, University of Rennes 1
Campus de Beaulieu, 35042 Rennes cedex, France
foret@irisa.fr

Abstract

This article presents an automated construction of a logical information context from a terminological resource available in XML; we apply it to the resource FranceTerme and to the Camelis tool, and we discuss how the resulting context can be used with such a tool dedicated to logical contexts. The purpose of this development and of the choices related to this experiment is twofold: to facilitate the use of a rich linguistic resource available as open data in XML, and to test and envision a systematic transformation of such XML resources into logical contexts. A logical view of a context makes it possible to explore information in a flexible way, without writing explicit queries; it may also provide insights into the quality of the data. Such a context can be enriched with other information (of diverse natures), and it can also be linked with other applications (according to arguments supplied by the context).

Keywords: Scientific terminology, Technological terminology, Multilingual applications, Information extraction, Textual data mining, Information retrieval, Linguistic resources, Open Data, Information Quality, Legal Information.

1 Introduction

This study aims to make linguistic data easier to exploit through the logical information systems approach: whereas such data are not always easy to use without assistance or expertise, logical information systems are specifically designed to offer flexible browsing of data organized as a logical context. Some other works use a similar framework, but their data, and their goals as well, are of a different nature: (Cellier et al., 2011) apply Logical Concept Analysis to explore sets of patterns obtained by data mining, (Quiniou et al., 2012) consider stylistic patterns, (Foret and Ferré, 2010) consider type-logical grammars, and (Falk et al., 2014) use several features, including a thematic one, to help identify new words.

In this proposal, we want both:
- to facilitate the use of a valuable linguistic resource (with a rich structure) available in XML, and to allow its flexible querying and exploration without prior knowledge;
- to test and consider a systematic transformation (a transducer) from such resources (in XML) to logical contexts; such contexts can be loaded into software allowing rich and flexible browsing of the data, combining various heterogeneous criteria; the way we represent the information in the context may also have an impact on its ease of use.

The aim here is to build a transducer that presents the data in a logical information system without losing information content, while gaining ease of exploration. Further advantages are safe navigation (the no-dead-end property) and serendipity. The resulting context is freely available¹.

¹ at http://www.irisa.fr/LIS/softwares

Terminological resource. The selected resource covers scientific and technical fields; it also interests us for the richness of its structure: its multilingual aspects, with definitions, synonymy relations, etc.; its confirmed status (with source and date of publication); variations according to domain/subdomain or according to linguistic criteria (several variants of English, for example); and the possible absence or repetition of certain types of information.
This rich structure also allows further extensions, either with existing data or with new data that we organize in a similar pattern.

Logical context. A logical context is defined by a finite set of objects $\mathcal{O}$² and a finite set of logical descriptions $d(o_i)$ expressed in a well-formed logical language $\mathcal{L}$. A logical context management system can load and manage such a context and allows it to be queried by logical requests (explicit or interactive); the answer is then a sub-context made of the objects satisfying the query. We used CAMELIS (version 1)³ for the experiment reported in this article. This software is based on Logical Concept Analysis (LCA) as defined in (Ferré and Ridoux, 2004). LCA is an extension of Formal Concept Analysis (FCA, see (Ganter and Wille, 1999)): a logical concept $c$ is a pair formed of an extent $\mathrm{ext}(c)$ (a set of objects) and an intent $\mathrm{int}(c)$ (a formula) such that the elements of $\mathrm{ext}(c)$ are exactly the objects that satisfy $\mathrm{int}(c)$. These concepts form a lattice, which underlies the incremental logical navigation tree in the left window of the software. CAMELIS is also designed to manage sets of objects of different types. Object descriptions in a given logical context can have several origins: they can be produced by a transducer or come from extrinsic judgments (personal notes, for example); combining these modes makes it possible to enrich the context and adapt it to a user's preferences.

² several objects can have the same label
³ http://www.irisa.fr/LIS/ferre/camelis/

FIGURE 1: LIS and LCA

Figure 1 illustrates how LCA generalizes database and hierarchical systems; the figure also follows the interface, which supports three modes and shows synchronized related windows (as in Figure 5): a query at the top, links in the navigation index on the left, and objects on the right.

Thereafter, we present in Section 2 our transducer, implemented in XSLT⁴, and we specify the construction methodology. In Section 3 we present how to exploit the transducer output on the FranceTerme resource, which contains terms from different scientific and technical fields; we discuss several scenarios and the benefits of this approach through this experiment. Additions and adjustments are proposed and discussed in Section 4 before concluding in Section 5.

⁴ web availability is planned.

2 The transducer methodology

2.1 Some key aspects

The transducer is designed to present data in a logical information system without losing information content, while gaining ease of exploration. The diagram in Figure 2 illustrates the approach: the automated steps (solid arrows) are distinguished from the manual or semi-manual ones (dotted lines).

FIGURE 2: global architecture

Source Document Schemas. The transducer is designed to apply automatically to documents conforming to some document specification. In general, such a specification can be automatically produced from an XML instance. We used the DTD generator⁵.
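To make the definition of a logical concept given above concrete, here is a minimal Python sketch. It assumes the simplest possible logic, where each object description is a set of atomic properties and a query is a conjunction of atoms: this is the FCA special case that LCA generalizes, whereas CAMELIS supports richer logics. The objects t1 to t3 and their properties are hypothetical.

```python
"""Logical concepts in the FCA special case: descriptions are sets of atoms."""

# Hypothetical toy context: object -> description (set of atomic properties).
CONTEXT = {
    "t1": {'Domaine is "Informatique"', 'Domaine is "Droit"'},
    "t2": {'Domaine is "Informatique"'},
    "t3": {'Domaine is "Droit"'},
}


def ext(query, context=CONTEXT):
    """Extent: objects whose description satisfies the conjunctive query."""
    return {obj for obj, desc in context.items() if query <= desc}


def intent(objects, context=CONTEXT):
    """Intent: the most specific conjunction satisfied by every given object."""
    if not objects:
        # No object at all: every atom of the context holds vacuously.
        return set().union(*context.values())
    return set.intersection(*(context[obj] for obj in objects))


def concept_of(query, context=CONTEXT):
    """Concept (extent, intent) reached from a query."""
    extent = ext(query, context)
    return extent, intent(extent, context)
```

For instance, `concept_of({'Domaine is "Droit"'})` yields the extent `{"t1", "t3"}` with the intent `{'Domaine is "Droit"'}`, and applying `ext` to that intent gives back the same extent: the objects of the concept are exactly those satisfying its intent.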
Logical information system capabilities. We recall that a logical context system must allow the loading and management of a context described by its objects and the properties of its objects. More precisely, we assume that:
- the logical context allows some inferences: at least those of classical logic (such as: if A then A or B), possibly with axioms useful to model the context and organize its presentation (for example, to reflect a taxonomy and only show some salient properties);
- some general form of information retrieval and multi-faceted means are provided, such as combinations of "logical facets" and "logical criteria";
- a modular and dynamic construction is allowed, for both sets of objects and sets of properties.

CAMELIS system. In our experiment, we used the logical context tool CAMELIS which, to our knowledge, is the only logical context management system. The tool interface displays three connected windows (see Figure 3):
- a query window (top);
- an object window (right);
- an index window (left), shown as a navigation tree.

Properties in the navigation tree are organized as a clickable summary that groups properties hierarchically. It is important to note that a navigation link there corresponds to a sub-context (as in Figure 5, the cardinality of the link's sub-context is given, and a color is also associated with each concept: two navigation links with the same color lead to the same sub-context).

In browsing mode, this tool allows three forms of query that select a sub-context:
- expert/query mode, by editing in the query window: the displayed objects become those satisfying the logical query, and the navigation tree is then presented in a form adapted to the new context;
- example/object mode, by selecting a set of objects in the object window: the query automatically becomes an expression of the common properties of these objects;
- index/property mode, by selecting one or more properties in the navigation tree: the displayed objects become all those verifying the selected properties (links), and the query window is then automatically updated.

CAMELIS general properties. Consistency between the three windows is ensured. In addition, a session never leads to an empty set of objects when following the links in the navigation tree: this important property is navigational safety.

CAMELIS update and logic modularity. Using the same interface, we can add and dynamically update objects and properties, then export them as a new logical context file (useful, for example, to generate documentation for the objects of a selected sub-context). The tool can also be adapted to use a dedicated logic, obtained by combining logic functors (Ferré and Ridoux, 2004). We do not detail these last aspects in this article.

Control. Part of the context information visible in the tool can be retrieved by other means. We built some XPath queries to control the process and to produce complementary indicators. We also built a control context (Figure 7).

2.2 Transduction overview

We give here the characteristics and main stages of the construction of the transducer, in its basic version. This construction is guided by the information in the generated DTD, by XPath queries, and by control.

Key selection. Defining a key in the source (by means of its document schema or of an XPath expression) is a preliminary stage, and the key definition plays a central role in the context construction. For this experiment with FranceTerme.xml, we used //Article/@id (XPath).

Components. The program is designed to be easy to update; it is structured by source components and by similar typical treatments. The treatment of a given source component depends on its kind (XML element, attribute), the relevant part of the context, its status (optional, repeatable or not), the domain of the source content, and the desired rendering (data type, property name, property hierarchies).

⁵ available at http://saxon.sourceforge.net/dtdgen.html; the terminological XML source file we used is accompanied by an XML Schema (xsd), but without guarantee.
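The key selection and the classification of components by status described above can be illustrated with Python's standard `xml.etree.ElementTree`, whose limited XPath subset covers these paths. This is a sketch, not the XSLT transducer itself: the sample XML is hypothetical (only `Article/@id` and `Terme[@statut='privilegie']` come from the text), and inferring the optional/repeatable status from one instance is only a heuristic, the DTD or XSD remaining the reference.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical miniature source; only Article/@id and Terme/@statut follow the text.
SAMPLE = """<Root>
  <Article id="1">
    <Terme statut="privilegie">terme A</Terme>
    <Terme statut="synonyme">terme A bis</Terme>
    <Domaine>Informatique</Domaine>
  </Article>
  <Article id="2">
    <Terme statut="privilegie">terme B</Terme>
  </Article>
</Root>"""


def keyed_articles(root):
    """Key selection, mirroring //Article/@id: key -> Article element."""
    return {art.get("id"): art for art in root.iterfind(".//Article")}


def label(article):
    """Object label taken from Terme[@statut='privilegie'] (None if absent)."""
    term = article.find("Terme[@statut='privilegie']")
    return term.text.strip() if term is not None and term.text else None


def component_status(articles):
    """Heuristic status of each child element: optional and/or repeatable."""
    present = Counter()  # number of articles containing the tag
    repeated = set()     # tags seen more than once within some article
    for art in articles.values():
        counts = Counter(child.tag for child in art)
        present.update(counts.keys())
        repeated.update(tag for tag, n in counts.items() if n > 1)
    total = len(articles)
    return {tag: {"optional": present[tag] < total, "repeatable": tag in repeated}
            for tag in present}


articles = keyed_articles(ET.fromstring(SAMPLE))
```

On this sample, `Terme` comes out as mandatory and repeatable (hence the inner loop described for it) and `Domaine` as optional and not repeated; such a table is one possible form of the per-component grid that drives the rendering choices.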
Main Loop. For each source item Article:
- each object gets a unique label (extracted by Terme[@statut='privilegie']), used for the object presentation and as a string property in the navigation index;
- the key becomes a number property articleID = ... (xslt⁶);
- the publication date is processed so that it appears in the index at different levels of date detail;
- most other components are processed to produce strings of the form: property name is "string value";
- property names may depend on several XML components (such as a Terme element with a @statut attribute); they are organized to allow their grouping and multiple levels of detail (Terme ? is more general than Terme SYNONYME);
- we also give a common prefix Plus to the properties for data that are in the source but not visible on the resource site (such as data about committees)⁷;
- for XML elements that can be repeated for a given key/object (such as Terme), we use an inner loop;
at this stage, the output context file contains the description of one object per line, with its main properties;
- other components (such as foreign equivalents or antonyms, optional or repeated) are rendered by rules of the form:
  rule extr (key=id) --> (prop1 is val1)
  which automatically associate the property with the object designated by the key.

⁷ in the navigation tree, compound property names are grouped by prefixes; details appear by opening a link Plus ?

2.3 Modularity of context

For treating a logical property related to an XML component, such as a child element (optional and repeatable) of an Article identified by its attribute id, several alternatives are possible:
- to indicate the name of the property and its value by assembling and repeating the property name for the object, using this pattern:
  mk "object" key=id, ..., prop1 is val1, prop1 is val2, ..., prop2 is ...
- to indicate each property value by transformation rules, using this pattern, when the object is assumed to be already created and associated with the key:
  rule extr (key=id) --> (prop1 is val1)
  rule extr (key=id) --> (prop1 is val2)
  this automatically adds each property value to the object with key id.

This second solution brings some modularity, since rules can be put in separate files, the properties being effectively added to the objects after a file import. We chose this approach, by means of rules and keys, for some repeatable components⁸.

⁸ Here is a typical XSLT fragment (with some special-symbol treatments for compatibility):
  select="concat($varRulePart1, 'idArticle=', $varArtId, $varRulePart2, 'Attention is ', $varQuote, translate(normalize-space(./text()), $varQuote, $varBQuote), $varQuote)"/>

3 Logical Context and facets

In this part, we discuss several possible search modes in the resulting context, where (incremental) navigation links correspond to logical concepts that can be selected.

3.1 Simple searches on several data types

Multilingual data. The FranceTerme resource contains translations in several languages, with variations for the same language. These data are attached to various domains and subdomains (possibly several for a given object).

Scenario. An exploration of the logical context can be conducted as follows, for example:
- open the Domaine ? property in the index;
- select-click Domaine is "Informatique" (computer science): this yields the corresponding sub-context (with three coherent views);
- we may further select-click Domaine is "Droit" (Law), which is automatically expressed as Domaine is "Informatique" and Domaine is "Droit" in the top window;
- open the Equivalent ? property, then open Equivalent en is "..." in the index, etc.;
- open the PubliArticle ? property, then PubliArticle date = 2014 in the index, etc.

Another simple search, on "streaming", is shown in Figure 3.

Sub-context cardinalities. In the property index tree, we may choose an order for displaying a given facet. This is useful, for example, to read directly which Equivalent en values correspond to the greatest number of French terms (Figure 3).

FIGURE 3: open facet en, before selection

Data types. Data types other than attributes and strings can be handled; Figure 4 shows a possible use of dates, allowing more or less fine-grained selections.

FIGURE 4: Facet en selected, facets date and domain opened
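The scenario and the sub-context cardinalities above can be mimicked in a few lines. This hedged sketch (not CAMELIS code) treats a query as a conjunction of properties and lists the links of one facet under the current query, with the size of the sub-context each link leads to, ordered by decreasing cardinality. Because only properties held by at least one current answer are proposed, following a link never yields an empty sub-context, in line with the navigational safety of CAMELIS recalled earlier. The objects and Equivalent en values are hypothetical.

```python
# Hypothetical toy context: object -> set of "facet is value" properties.
CONTEXT = {
    "t1": {'Domaine is "Informatique"', 'Equivalent en is "e1"'},
    "t2": {'Domaine is "Informatique"', 'Equivalent en is "e1"'},
    "t3": {'Domaine is "Informatique"', 'Domaine is "Droit"', 'Equivalent en is "e2"'},
    "t4": {'Domaine is "Droit"'},
}


def answers(query, context=CONTEXT):
    """Sub-context: objects having every property of the conjunctive query."""
    return {obj for obj, props in context.items() if query <= props}


def navigation_links(query, facet, context=CONTEXT):
    """Links of `facet` under `query`, with their cardinalities, largest first."""
    counts = {}
    for obj in answers(query, context):
        for prop in context[obj]:
            if prop.startswith(facet) and prop not in query:
                counts[prop] = counts.get(prop, 0) + 1
    # Decreasing cardinality, then alphabetical order for a stable display.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))
```

For example, under the query `{'Domaine is "Informatique"'}`, `navigation_links(query, "Equivalent en")` returns `[('Equivalent en is "e1"', 2), ('Equivalent en is "e2"', 1)]`, and the `Domaine` facet only proposes `Domaine is "Droit"` with cardinality 1.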
Exploring variants and false friends. Figure 5 comes from selections in the index tree; links of the same color characterize the same set of objects (concept). This example illustrates the identification of potential linguistic errors (in a domain/subdomain). Other examples, such as "package" (Equivalent en) or some abbreviations ("ABS"), can be highlighted as ambiguous: the dynamic navigation links (domains, etc.) then provide hints for disambiguation.

FIGURE 5: Variants

3.2 Focused search on elements and exceptions: summaries

Using the Not button at a given stage, we can arrive at a sub-context characterized by $A_2 = \mathrm{not}(A_1)\ \mathrm{and}\ A_0$, as in Figure 6. Property $A_2$ expresses a search for exceptions to $A_1$; we get a kind of sub-context summary and extra information (here, Attention is the focus element).

FIGURE 6: Focus on Attention

3.3 Other scenarios: data quality

This navigation mode makes it possible to detect anomalies; in particular, pseudo-empty properties appear on other facets (through a link like Definition is ""). These cases often correspond to existing but empty XML elements (which are not XML errors). Low cardinalities in the navigation tree may suggest exploring the link, by selecting it and opening other facets simultaneously; in this way we can analyse "the words without translation", by following the not Equivalent ? link.

Redundancies may also become easily noticeable through browsing: exploring the Antonymes facet, we can see XML structuring redundancies (this information being carried by two source elements).
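Outside the tool, the two quality checks mentioned above (pseudo-empty elements, and articles without translation) can be cross-checked on the XML source, in the spirit of the XPath control queries mentioned earlier. A minimal sketch with Python's `xml.etree.ElementTree`; the element names `Definition` and `Equivalent` mirror the property names and are assumptions about the actual FranceTerme markup, and the sample is invented.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: article 1 has an empty Definition, article 2 no Equivalent.
SAMPLE = """<Root>
  <Article id="1"><Definition></Definition><Equivalent>e1</Equivalent></Article>
  <Article id="2"><Definition>texte</Definition></Article>
</Root>"""


def pseudo_empty(root, tag):
    """Keys of articles containing a `tag` element with no text (cf. Definition is "")."""
    return sorted(
        art.get("id")
        for art in root.iterfind(".//Article")
        if any(not "".join(el.itertext()).strip() for el in art.iter(tag))
    )


def missing(root, tag):
    """Keys of articles with no `tag` element at all (cf. the not Equivalent ? link)."""
    return sorted(
        art.get("id")
        for art in root.iterfind(".//Article")
        if art.find(f".//{tag}") is None
    )


root = ET.fromstring(SAMPLE)
```

Here `pseudo_empty(root, "Definition")` returns `["1"]` and `missing(root, "Equivalent")` returns `["2"]`; the first case is well-formed XML, which is why such anomalies are easier to spot by browsing than by validation.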
3.4 Control and actions from a context

The logical context software can assign actions to objects by properties; clicking on an object label then shows a contextual action menu. This is useful in particular to inspect objects in their source XML file.

Other kinds of control (for coverage, counts, etc.) are performed:
- by XPath queries on the XML document, used to verify whether certain characteristics of particular sub-contexts (planned or explored) are consistent with the source document;
- by a meta context built following the DTD schema, whose objects are element names and (attribute name, element name) pairs. These objects are associated with actions parameterized by their label; in our case (Figure 7), the action is an XPath query using BaseX. This can easily be adapted to another set of controls.

FIGURE 7: control (meta) context

4 Refinements and user preferences

A logical context tool such as CAMELIS, by its genericity and its features, allows many alternatives to represent and use a terminological resource like FranceTerme. Some initial choices can easily be revised or completed; for example, the name of a property can be changed directly through the interface (or by a simple transformation of the context file).

The choices and refinements should provide a better context. Several quality criteria can be considered: effectively obtaining a desired result (usefulness/completeness); the number of steps needed to get there (effectiveness); rich browsing indexes (multiple views) and efficient indexes to pursue the navigation (through the proposed increments); a flexible mode of querying and ease of interpretation.

We indicate some possibilities through modifications in the experiment.

4.1 Adapting facets using rules

Domains and SubDomains data have been translated. The two resulting context files contain update rules of the form:
  rule extr Domaine is "Acoustique" --> Domain is "Acoustic"
When such a file is loaded into the context, the properties on the right-hand side are added to all objects verifying the left-hand side.

4.2 Improving grammatical categories using rules and axioms

In the original context, we can see (with an appropriate ordering) that, among the terms with a category attribute, nouns (categorie is "nm" or categorie is "n") are the majority, followed by categorie is "adj.". However, these grammatical categories are listed with various values; we observe in particular:
- a disjunction, as in categorie is "adj. ou n.m." and Equivalent en is "crossmedia (n. ou adj.)", which selects the term transmédia; but we also observe its permutation categorie is "n.m. ou adj.";
- a more or less fine granularity, as in categorie is "n.m.inv.".

Adding rules and axioms to the logical context makes it possible to harmonize these properties, resulting in a more structured navigation tree for this facet. A few lines in the resulting context define a hierarchy of categories, such as:
  rule extr categorie is "n.m.inv." --> Categorie n m inv
  ...
  axiom Categorie n m inv, Categorie n m
  axiom Categorie n m, Categorie n
  ...

Note that such improvements could apply to other terminological resources and could result from linguistic analysis or from other lexicons.

4.3 Axioms for property variants

We have seen that a property Terme ? covers three statuses (PRIVILEGIE, SYNONYME, ANTONYME). A context of axioms can facilitate a search on all or a subset of these variants:
  axiom PRIVILEGIE, SET
  axiom SYNONYME, SET
  axiom SET, ANY
  axiom ANTONYME, ANY
A query may then group several properties (having a status below another expression, like SET), as exemplified in Figure 8.

FIGURE 8: with axioms on property names
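The effect of the rules of Sections 4.1 and 4.2 and of the axioms of Sections 4.2 and 4.3 can be approximated by saturating each object's property set until a fixpoint is reached. This sketch assumes that `rule extr L --> R` adds R to objects having L, and that `axiom A, B` means that every object having A also has B (our reading of "a status below another expression like SET"). It materializes the inferred properties, which is a simplification and not necessarily how CAMELIS implements axioms; the objects t1 to t3 are hypothetical.

```python
# Rules "rule extr LHS --> RHS" and axioms "axiom A, B", both read as LHS => RHS.
RULES = [
    ('Domaine is "Acoustique"', 'Domain is "Acoustic"'),
    ('categorie is "n.m.inv."', "Categorie n m inv"),
]
AXIOMS = [
    ("Categorie n m inv", "Categorie n m"),
    ("Categorie n m", "Categorie n"),
    ("PRIVILEGIE", "SET"),
    ("SYNONYME", "SET"),
    ("SET", "ANY"),
    ("ANTONYME", "ANY"),
]


def saturate(props, implications=RULES + AXIOMS):
    """Add implied properties repeatedly until nothing changes (a fixpoint)."""
    closed = set(props)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in implications:
            if lhs in closed and rhs not in closed:
                closed.add(rhs)
                changed = True
    return closed


def select(prop, context):
    """Objects whose saturated description contains `prop`."""
    return sorted(obj for obj, props in context.items() if prop in saturate(props))
```

With three terms whose statuses are PRIVILEGIE, SYNONYME and ANTONYME, `select("SET", ...)` returns the first two and `select("ANY", ...)` all three; likewise, a term tagged `categorie is "n.m.inv."` ends up under `Categorie n m` and `Categorie n` in the navigation tree.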
This example shows a query for terms (including status variants) containing a negation ("non"). Note that such axioms can be added or modified in a modular way.

4.4 Linking data and resources

At the property level in the navigation tree. The FranceTerme website allows new terms to be selected from the description of a current term (denoted $t_1$) through a See also link. This is rendered in the context navigation tree by the facet Voir aussi (See also) is "... information on $t_j$" (denoted $f_j$), where the term to see, $t_j$, is shown with its Article key. Two ways of rendering this piece of information have been tested:
- in a basic mode, as a simple property $f_j$ of the current term $t_1$;
- in a full (reflexive) mode, where $t_1$ has $f_j$ and also $f_1$: Voir Aussi is "... information on $t_1$"⁹.

⁹ no addition is made for terms that have no link

This second mode allows the following type of scenario: while $t_1$ has property $f_j$, select this $f_j$ link in the navigation tree; by reflexive closure, $t_j$ is also in the sub-context and can be further selected.

We treated the reciprocal link of Voir Aussi in the same way, by adding (through the transducer) a Voir Depuis (See From) "..." property. This makes it possible to group into a sub-context the terms pointing to (or pointed to by) a particular word (or more).

Notes.
- In the context for FranceTerme, we observe some terms (15) satisfying the query Voir Aussi ? and not Voir Depuis ?; these are the "terms pointing to an article, but not pointed to";
- according to the resource schema, other elements (synonyms, antonyms, ...) could be treated similarly, with navigation links in the context¹⁰.

¹⁰ this treatment is not currently done for the other elements (in the source XML, these terms do not necessarily correspond to an article/object).

Linking to other resources at the level of actions. As explained about control, the logical context management software allows actions to be associated with objects. We can use this mode to associate an object with one or more processes on this item, which can be launched in the interface from a context menu related to the object (or group of objects). The setting can be provided at the transducer level; a file describing these actions can later be loaded from the interface.

We generated connections to:
- a parser, installed locally: the (open) Bonsai processing chain (Candito et al., 2010), which takes the label of the object as parameter; selecting this action on the object provides a syntactic analysis of the label expression;
- a web link to another terminological resource for French (CNRTL, http://www.cnrtl.fr/), the parameter being the term as above; selecting this action on the object opens the browser on the website page for this term, if one exists (there is none for some FranceTerme expressions);
- an XML link to a subpart of the source file, through an XPath tool (BaseX), the parameter being the object key (attribute id of Article); selecting this action on the object executes the software with a prepared BaseX request using the object key.

This list of actions is not exhaustive and can be adapted. In particular, we could consider links (local or not) with other analyzers or other linguistic resources, and retrieve results to enrich the logical context with new properties. The full-text search capabilities of BaseX could also be exploited.
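The two renderings of See also links, together with the reciprocal Voir Depuis property, can be sketched as follows. This is our reading of the text, not the author's XSLT: the article keys and links are hypothetical, the property strings abbreviate "... information on t" to "information on t", and, following the note that nothing is added for terms without links, the reflexive self-property $f_1$ is only added to terms that have at least one outgoing link.

```python
from collections import defaultdict


def link_properties(see_also, reflexive=True):
    """Voir Aussi / Voir Depuis properties from See-also edges (key -> target keys)."""
    props = defaultdict(set)
    for src, targets in see_also.items():
        if not targets:
            continue  # nothing is added for terms without links
        if reflexive:
            props[src].add(f'Voir Aussi is "information on {src}"')  # f_1 on t_1
        for tgt in targets:
            props[src].add(f'Voir Aussi is "information on {tgt}"')  # f_j on t_1
            props[tgt].add(f'Voir Depuis is "information on {src}"')  # reciprocal link
    return dict(props)


def having(props, prop):
    """Sub-context of a navigation link: objects carrying `prop`."""
    return sorted(obj for obj, ps in props.items() if prop in ps)


def pointing_not_pointed(props):
    """Objects satisfying: Voir Aussi ? and not Voir Depuis ?"""
    def has_facet(obj, facet):
        return any(p.startswith(facet) for p in props[obj])
    return sorted(o for o in props
                  if has_facet(o, "Voir Aussi") and not has_facet(o, "Voir Depuis"))


SEE_ALSO = {"1": ["2"], "2": ["1", "3"], "4": ["2"]}  # hypothetical article keys
```

With these links, selecting `Voir Aussi is "information on 2"` gives `["1", "4"]` in basic mode but `["1", "2", "4"]` in reflexive mode, so the target term itself joins the sub-context; the query `Voir Aussi ? and not Voir Depuis ?` returns `["4"]`, the only term that points to an article without being pointed to.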
5 Conclusion

The general aim of this proposal was to show how a logical concept analysis (LCA) framework and tools could be beneficial for browsing terminological resources; through this experiment, the purpose was twofold:
- to facilitate the use of a useful language resource (with a rich structure) available in XML;
- to envision a systematic transformation of such resources from XML to logical contexts.

Improvements may also be suggested and brought to the data; other treatments may also be eased: for example, a selected sub-context can be exported as text and used to generate other results (such as documentation).

We illustrated how a logical context makes it possible to explore linguistic information in a flexible way, without a priori knowledge, and also to get guidance on data quality (in the navigation tree: counts and colors for concepts, ...). New linguistic information (personal, enterprise, ...) could easily be incorporated into the initial context (if it complies with the document model and the key assumption).

Additional data to compare and enrich the content can also be added in several ways and for many languages (for French: WordNet WOLF (Sagot and Fiser, 2012), the Lefff lexicon (Sagot, 2010), etc.):
- by adding objects without confusion between sources (since a property indicating the source is associated with each object);
- by adding properties to expand the browsing possibilities;
- by adding triggered actions on objects.

Other actions corresponding to linguistic processing can be added to the context: parsing the expression (several languages), syntactic head, etc. We could also consider inverse properties (such as translation) and enrich the context with these objects.

Moreover, the development method seems transposable to other open data and linguistic XML resources. To some extent, the construction of the transducer presented here could be automated if it relies on the determination of a key and of a grid indicating, for each source component, its label, its type, its repeatability, and the way it should be rendered. A similar experiment could be carried out by adapting the standards and software tools of the semantic web. Finally, we mainly discussed browsing; future work could also concern updates.

References

Marie Candito, Joakim Nivre, Pascal Denis, and Enrique Henestroza Anguiano. 2010. Benchmarking of statistical dependency parsers for French. In Chu-Ren Huang and Dan Jurafsky, editors, COLING 2010, 23rd International Conference on Computational Linguistics, Posters Volume, 23-27 August 2010, Beijing, China, pages 108–116. Chinese Information Processing Society of China.

Peggy Cellier, Sébastien Ferré, Mireille Ducassé, and Thierry Charnois. 2011. Partial orders and logical concept analysis to explore patterns extracted by data mining. In Simon Andrews, Simon Polovina, Richard Hill, and Babak Akhgar, editors, Conceptual Structures for Discovering Knowledge - 19th International Conference on Conceptual Structures, ICCS 2011, Derby, UK, July 25-29, 2011. Proceedings, volume 6828 of Lecture Notes in Computer Science, pages 77–90. Springer.

Ingrid Falk, Delphine Bernhard, and Christophe Gérard. 2014. From non word to new word: Automatically identifying neologisms in French newspapers. In Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Hrafn Loftsson, Bente Maegaard, Joseph Mariani, Asunción Moreno, Jan Odijk, and Stelios Piperidis, editors, Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC-2014), Reykjavik, Iceland, May 26-31, 2014, pages 4337–4344. European Language Resources Association (ELRA).

Sébastien Ferré and Olivier Ridoux. 2004. Introduction to logical information systems. Inf. Process. Manage., 40(3):383–419.

Annie Foret and Sébastien Ferré. 2010. On categorial grammars as logical information systems. In Léonard Kwuida and Baris Sertkaya, editors, Formal Concept Analysis, 8th International Conference, ICFCA 2010, Agadir, Morocco, March 15-18, 2010. Proceedings, volume 5986 of Lecture Notes in Computer Science, pages 225–240. Springer.

Bernhard Ganter and Rudolf Wille. 1999. Formal concept analysis - mathematical foundations. Springer.

Solen Quiniou, Peggy Cellier, Thierry Charnois, and Dominique Legallois. 2012. What about sequential data mining techniques to identify linguistic patterns for stylistics? In Alexander F. Gelbukh, editor, Computational Linguistics and Intelligent Text Processing - 13th International Conference, CICLing 2012, New Delhi, India, March 11-17, 2012, Proceedings, Part I, volume 7181 of Lecture Notes in Computer Science, pages 166–177. Springer.

Benoît Sagot and Darja Fiser. 2012. Cleaning noisy wordnets. In Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Mehmet Ugur Dogan, Bente Maegaard, Joseph Mariani, Jan Odijk, and Stelios Piperidis, editors, Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC-2012), Istanbul, Turkey, May 23-25, 2012, pages 3468–3472. European Language Resources Association (ELRA).

Benoît Sagot. 2010. The Lefff, a freely available and large-coverage morphological and syntactic lexicon for French. In Nicoletta Calzolari, Khalid Choukri, Bente Maegaard, Joseph Mariani, Jan Odijk, Stelios Piperidis, Mike Rosner, and Daniel Tapias, editors, Proceedings of the International Conference on Language Resources and Evaluation, LREC 2010, 17-23 May 2010, Valletta, Malta. European Language Resources Association.