=Paper= {{Paper |id=Vol-1456/paper9 |storemode=property |title=Representing and Visualizing Text as Ontologies: A Case from the Patent Domain |pdfUrl=https://ceur-ws.org/Vol-1456/paper9.pdf |volume=Vol-1456 |dblpUrl=https://dblp.org/rec/conf/semweb/DasiopoulouLCW15 }} ==Representing and Visualizing Text as Ontologies: A Case from the Patent Domain== https://ceur-ws.org/Vol-1456/paper9.pdf
Representing and Visualizing Text as Ontologies:
       A Case from the Patent Domain

Stamatia Dasiopoulou¹, Steffen Lohmann², Joan Codina¹, and Leo Wanner¹,³

¹ Department of Information and Communication Technologies,
  Pompeu Fabra University, Barcelona, Spain
² Institute for Visualization and Interactive Systems,
  University of Stuttgart, Stuttgart, Germany
³ Catalan Institute for Research and Advanced Studies, Barcelona, Spain



       Abstract. This paper presents preliminary results on a framework for
       the representation and visualization of text as OWL ontologies under
       an open-domain paradigm, where no a priori schema for the facts to be
       extracted is available. The extracted ontology is visually represented as a
       specifically tailored node-link diagram. The applicability of the approach
       is demonstrated on a use case from the patent domain.


1    Introduction
Extracting ontologies from text can significantly facilitate knowledge integra-
tion and querying, through semantic alignment and mediation [6]. Only recently
though, under the Linking Open Data (LOD) paradigm of publishing and linking
structured information on the Web, has research shifted towards open-domain
approaches, where no a priori schema for the facts to be extracted is available
and the textual input is considered in its entirety [1,20].
    Within such a context, two main challenges emerge: 1) to ensure the translation
of textual input into well-formed ontologies that facilitate knowledge integration
and querying in a schema-agnostic fashion; and 2) to provide the means
for comprehensive visualizations that foster the understanding of the extracted
knowledge, particularly at the factual level, in an intuitive manner that appeals
adequately to users of diverse backgrounds and with varying levels of expertise.
    These challenges are sharply manifested in the patent domain. The highly
specialized and cross-domain terminology used in patent documents makes it
very difficult, if not impractical, to rely on the availability of predefined schemata
for the extraction of knowledge relevant to the task at hand. Moreover, the
inherent complexity of patent documents renders effective visualizations key tools
for assisting experts in quickly grasping the main elements and their interactions.
    In this paper, we present preliminary results on a framework for the repre-
sentation and visualization of text as an OWL ontology under an open-domain
paradigm, and illustrate its application with a use case from the patent domain.
Abstracting from the specifics of the various semantic parsing methodologies,
we describe an entity-relation-centric model for OWL-based text representation
together with a graphical notation for its visualization as a node-link diagram.




 2     Related Work
 In accordance with the twofold goal of the proposed framework, related ap-
 proaches to ontology extraction and visualization are discussed in the following.

 2.1   Extracting Ontologies from Text
 Although ontology learning and population from text have been the subject of
 intensive research [4,21], investigations into the conceptualization of text in its
 entirety have commenced only recently with LODifier [1] and FRED [20]. Both
 use Boxer [7] to extract Discourse Representation Structures (DRSs), namely
 discourse referents (entities) and conditions (unary and binary relations), and re-
 spective rules to translate them into ontological representations. LODifier keeps
 modeling commitments minimal, by introducing a blank node for each discourse
 referent and by using reification to capture embedded DRSs. FRED [20] im-
 plements a more earnest mapping of DRSs to OWL constructs, utilizing frame
 semantics [2], links to the DOLCE+DnS foundational ontology and heuristic
 rules that aim to maximize conformance to Semantic Web best practices.
     Both result in representations that explicitly cater for n-ary relations, which
 represent a critical share of relations for effectively capturing the richness of tex-
 tual contents. However, LODifier compromises ontology design with choices such
 as blank nodes, whereas FRED ensures high compliance with best practices, but
 the presented translations and heuristic rules are specifically tailored to DRSs.
 Instead, our goal is to provide a model for the generation of OWL representa-
 tions from text that avoids commitments to specifics of the predicate-argument
 structures.

 2.2   Visualizing Fact-based Ontologies
 Many approaches to graphically represent ontologies have been proposed in the
 last couple of years [8,14]. However, they are not tailored to the visualization
 of ontologies that are extracted from text, and have limitations in this regard.
 While some approaches (e.g., OWLViz [13] and KC-Viz [18]) merely visualize
 the class hierarchy of ontologies, others (e.g., OntoGraf [10] and FlexViz [11])
 are able to represent different types of properties. All these attempts are related
 to the visualizations generated by FRED in that they focus on terminological
 knowledge (aka TBox) and not on assertional knowledge (aka ABox), which we
 aim to visualize in our work. The same holds for ontology visualizations that
 provide more elaborate notations (e.g., Graffoo [9] and VOWL [15]), i.e., they
 also mainly address the ontology schema and are therefore less appropriate for
 the representation of fact-based ontologies extracted from text.
     This is different for visualizations of RDF and Linked Data, which are typically
 more oriented towards the ABox. Examples include RDF Gravity [12],
 Welkin [17], and LodLive [5]. Such visualizations depict the triple structure of
 RDF, but they are usually not capable of representing n-ary relations. In addition,
 they use plain node-link diagrams with only little variation in the visual
 elements.





 [Figure 1 (VOWL diagram). Classes: Entity, Relation, Attribute; external classes:
 Object, Event, Quality. Properties: participant, actor, undergoer, circumstantial,
 hasAttr, hasPart, relevance (float). Further labels: isParticipantIn, hasQuality.
 Edge labels: "Subclass of", "Subproperty of".]


      Fig. 1. Classes and properties of the core vocabulary (visualized with VOWL).



 3     Ontological Text Representation

 Aiming to abstract from predicate-argument specifics while assuring maximal
 interoperability within the Semantic Web and LOD context, we developed a
 minimal reference model for generating ontological text representations at a factual
 level.⁴ Hence, our goal is to provide core classes and properties for capturing
 the ways in which the extracted entities are interrelated, and that can be applied
 across domains, serving as anchors for attaching application-tailored class and
 property hierarchies.
     A key design decision has been to model the extracted relations as classes
 rather than properties. This is motivated by the saliency of n-ary relations in
 textual resources, and the incurred loss of semantics when, instead of preserving
 the n-ary dependencies, they are broken down into binary relations [19]. Furthermore,
 direct mappings to well-established foundational ontologies, such as DOLCE+DnS
 Ultralite⁵ and SUMO⁶, are promoted to enhance interoperability and compliance
 with ontology design practices.
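
 To illustrate the loss of semantics mentioned above, the following minimal sketch
 encodes two pumping statements once with relation individuals and once with
 binary properties only. The second statement (oil, pipe), all identifiers, the
 helper functions, and the choice of entities as triple subjects are assumptions
 made for this illustration; they are not taken from the described framework.

```python
# Illustrative sketch only: every identifier here is a hypothetical example name.
# Triples are (subject, property, object); entities point to the relation
# individual in which they participate.
NARY = {
    ("pumps_1", "rdf:type", "Pumps"),
    ("pump_1", "actor", "pumps_1"),
    ("water_1", "undergoer", "pumps_1"),
    ("passage_1", "along", "pumps_1"),
    ("pumps_2", "rdf:type", "Pumps"),
    ("pump_1", "actor", "pumps_2"),
    ("oil_1", "undergoer", "pumps_2"),
    ("pipe_1", "along", "pumps_2"),
}

# The same two statements flattened into binary properties on the pump.
BINARY = {
    ("pump_1", "pumps", "water_1"),
    ("pump_1", "pumpsAlong", "passage_1"),
    ("pump_1", "pumps", "oil_1"),
    ("pump_1", "pumpsAlong", "pipe_1"),
}


def participants_of(triples, relation):
    """Return {property: entity} for one relation individual."""
    return {p: s for s, p, o in triples if o == relation}


def candidate_paths(triples, substance):
    """Binary encoding: every path that might belong to the given substance."""
    pumps = {s for s, p, o in triples if p == "pumps" and o == substance}
    return sorted(o for s, p, o in triples if p == "pumpsAlong" and s in pumps)


print(participants_of(NARY, "pumps_1"))
print(candidate_paths(BINARY, "water_1"))
```

 With relation individuals, the path stays tied to the pumping event it belongs to.
 In the binary encoding, both paths remain candidates for the water, because the
 link between substance and path is no longer represented.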
  ⁴ Modalities, such as belief, causality, and entailment, are not considered as they can
    be covered through specialized ontologies and knowledge patterns.
  ⁵ http://www.ontologydesignpatterns.org/ont/dul/DUL.owl
  ⁶ http://www.adampease.org/OP/SUMO.owl

     In accordance with the aforementioned principles, the model comprises the
 following core classes: Entity subsumes the set of physical objects, processes, and
 substances; Relation captures n-ary interrelations between entities; Attribute
 encompasses characteristic aspects of an entity that cannot exist without it.
 Alongside, a minimal set of upper-level object properties connects individuals of
 the three classes: participant links entities to the relations in which they
 participate; actor and undergoer specialize participant in order to discriminate
 between direct participants (“who?”, “what?”, etc.) and complementary ones
 (e.g., the [pump]_actor pumps [water]_undergoer), while circumstantial
 is a specialization used as a catch-all property for other types of participation;
 hasAttr is used to associate entities with their attributes; hasPart is used to
 capture mereological relations between entities; lastly, the datatype property
 relevance captures the relevance of the extracted entities to the matter
 being considered. Figure 1 visualizes the core vocabulary using VOWL [15].
 The vocabulary is aligned with classes and properties from the DOLCE+DnS
 Ultralite ontology, which have a white font on a dark background in Figure 1.
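
 The core vocabulary described above can be written down compactly. The sketch
 below lists it as plain triples and prints Turtle-like statement lines (prefix
 declarations are omitted). The `core:` prefix is a placeholder, since no namespace
 is given here, and the `xsd:float` range of relevance is read off the "float"
 label in Figure 1 and should be treated as an assumption. Domains, ranges of the
 object properties, and the DOLCE+DnS Ultralite alignments are deliberately left out.

```python
PREFIX = "core:"  # placeholder prefix; no namespace is stated in the text

CLASSES = ["Entity", "Relation", "Attribute"]
OBJECT_PROPERTIES = [
    "participant", "actor", "undergoer", "circumstantial", "hasAttr", "hasPart",
]
# actor, undergoer and circumstantial specialize participant.
SPECIALIZES = {
    "actor": "participant",
    "undergoer": "participant",
    "circumstantial": "participant",
}


def core_vocabulary(prefix=PREFIX):
    """Return the core classes and properties as (subject, predicate, object)."""
    triples = []
    for name in CLASSES:
        triples.append((prefix + name, "rdf:type", "owl:Class"))
    for name in OBJECT_PROPERTIES:
        triples.append((prefix + name, "rdf:type", "owl:ObjectProperty"))
    for child, parent in SPECIALIZES.items():
        triples.append((prefix + child, "rdfs:subPropertyOf", prefix + parent))
    triples.append((prefix + "relevance", "rdf:type", "owl:DatatypeProperty"))
    # Assumed range, based on the "float" label shown in Figure 1.
    triples.append((prefix + "relevance", "rdfs:range", "xsd:float"))
    return triples


def to_turtle(triples):
    """Serialize triples as one statement per line."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


print(to_turtle(core_vocabulary()))
```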
     The extracted predicate-argument structures can then be translated into
 OWL representations, according to the following rules:

   – For each extracted entity, attribute, or relation, a named individual is gener-
     ated; for co-referential entities, i.e., entities referring to the same real-world
     object, a single individual is introduced.
   – For each added named individual, respective rdf:type statements are added
     based on the extracted vocabulary of entities, attributes, and relations.
   – Respective rdfs:subClassOf axioms are added for each introduced entity,
     relation, and attribute class.
   – Instigative and passive participation links between entities and relations
     are translated into respective actor and undergoer property assertions;
     likewise for circumstantial participation, where additionally the preposi-
     tions lexicalizing the participation are defined as subproperties. For example,
     given the excerpt “...connected along...”, along is added as a subproperty
     of circumstantial.
   – Links between entities and attributes as well as entities and their parts are
     captured as hasAttr and hasPart property assertions, respectively.
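
 The rules above can be sketched as a small translation function. The input
 format (nested dictionaries), the individual and class names, the role assigned
 to the drive shaft, and the use of entities as subjects of the participation
 assertions are assumptions made for this illustration; the output is a set of
 plain triples rather than a serialized OWL document.

```python
def translate(structures):
    """Translate predicate-argument structures into a set of triples."""
    triples = set()

    def add_individual(ind, cls, core_cls):
        # One named individual per id; repeated (co-referential) ids collapse
        # into a single individual because the triples are kept in a set.
        triples.add((ind, "rdf:type", "owl:NamedIndividual"))
        triples.add((ind, "rdf:type", cls))
        triples.add((cls, "rdfs:subClassOf", core_cls))

    for st in structures:
        rel = st["relation"]
        add_individual(rel["id"], rel["type"], "Relation")
        for arg in st["args"]:
            ent = arg["entity"]
            add_individual(ent["id"], ent["type"], "Entity")
            prop = arg["role"]  # "actor", "undergoer" or "circumstantial"
            if prop == "circumstantial" and arg.get("prep"):
                # The lexicalizing preposition becomes a subproperty.
                prop = arg["prep"]
                triples.add((prop, "rdfs:subPropertyOf", "circumstantial"))
            triples.add((ent["id"], prop, rel["id"]))
            for attr in ent.get("attrs", []):
                add_individual(attr["id"], attr["type"], "Attribute")
                triples.add((ent["id"], "hasAttr", attr["id"]))
            for part in ent.get("parts", []):
                add_individual(part["id"], part["type"], "Entity")
                triples.add((ent["id"], "hasPart", part["id"]))
    return triples


# Hypothetical input for "a drive shaft extending along the shank".
EXAMPLE = [{
    "relation": {"id": "extends_1", "type": "Extends"},
    "args": [
        {"role": "actor",
         "entity": {"id": "drive_shaft_1", "type": "Component",
                    "parts": [{"id": "surface_1", "type": "SpatialPart"}]}},
        {"role": "circumstantial", "prep": "along",
         "entity": {"id": "shank_1", "type": "Component"}},
    ],
}]

for triple in sorted(translate(EXAMPLE)):
    print(triple)
```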

      The result is an OWL ontology consisting primarily of assertional knowledge,
 i.e., class and object property assertions, and to a lesser extent of terminological
 knowledge, as it could be derived from links to LOD resources, such as DBpedia
 and WordNet. Further specializations and schema enrichments, according to the
 given application needs, can be acquired through ontology learning.


 4    Visualization of the Extracted Ontology

 Our visual notation for the graphical representation of the extracted ontology
 is inspired by VOWL [15], which provides user-oriented visualizations for OWL
 ontologies. VOWL has, for instance, been used to create the visualization of
 Figure 1. However, whereas VOWL focuses on the visualization of the ontol-
 ogy schema, we are interested in the visualization of facts extracted from text.
 Therefore, we could not simply reuse VOWL but developed a related ABox
 visualization that combines the strengths of VOWL with the peculiarities of
 visualizing fact-based ontologies extracted from text.




 [Figure 2 (notation legend). Entity: circle, color = type, size = pertinence.
 Attribute: color = type. Relation: rectangle. Links: actor, undergoer,
 circumstantial.]


      Fig. 2. Notation for the graphical representation of the extracted ontology.


     Figure 2 summarizes the current visual notation. We adopted the basic visual
 elements of VOWL, consisting of circles which represent the extracted entities
 and rectangles representing the relations. The colors of the circles and attributes
 can be varied depending on their type. In contrast to VOWL, relations can
 be n-ary, which requires that they are rendered as nodes. This is in line with
 our design decision to model relations as classes rather than properties in the
 extracted OWL ontologies. Furthermore, we introduced a labeled link element
 to depict prepositions that qualify circumstantial participations.
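
 As a rough sketch of how relations rendered as nodes translate into diagram
 data, the function below turns participation assertions into circle nodes,
 rectangle nodes, and links, with a label only on circumstantial links. The
 dictionary layout, the field names, and the example assertions are assumptions
 for illustration and do not describe the actual implementation.

```python
def build_diagram(assertions):
    """Build node and link lists from (entity, property, relation) assertions."""
    nodes, links = {}, []
    for entity, prop, relation in assertions:
        # Entities are drawn as circles, relations as rectangles.
        nodes.setdefault(entity, {"id": entity, "shape": "circle"})
        nodes.setdefault(relation, {"id": relation, "shape": "rectangle"})
        if prop in ("actor", "undergoer"):
            link = {"kind": prop, "label": None}
        else:
            # Any other property is treated as a preposition that
            # qualifies a circumstantial participation.
            link = {"kind": "circumstantial", "label": prop}
        link.update({"source": entity, "target": relation})
        links.append(link)
    return list(nodes.values()), links


# Hypothetical assertions for "a pump for pumping water along the passage".
ASSERTIONS = [
    ("pump", "actor", "pumps"),
    ("water", "undergoer", "pumps"),
    ("water_passage", "along", "pumps"),
]

nodes, links = build_diagram(ASSERTIONS)
print(nodes)
print(links)
```

 Because the relation is a node, a third or fourth participant only adds another
 link and does not require a different kind of edge.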
     We also adopted the idea of scaling the size of the circles, which, in VOWL,
 reflects the number of individuals that are members of a class. In our case, the
 circle size indicates the relevance values computed for the terms: Entities with
 a higher relevance value are shown in a larger size in the visualization. This
 helps to easily spot those entities that are most relevant to the matter being
 considered.
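
 One possible way to map relevance values to circle sizes is sketched below.
 It assumes relevance values normalized to [0, 1], radius bounds of 10 and 40
 (arbitrary units), and a scaling in which the circle area, not the radius,
 grows linearly with relevance. None of these choices is stated in the text.

```python
import math


def circle_radius(relevance, r_min=10.0, r_max=40.0):
    """Map a relevance value in [0, 1] to a radius with area-linear scaling."""
    clamped = min(max(relevance, 0.0), 1.0)
    area_min = math.pi * r_min ** 2
    area_max = math.pi * r_max ** 2
    area = area_min + clamped * (area_max - area_min)
    return math.sqrt(area / math.pi)


for value in (0.0, 0.5, 1.0):
    print(value, round(circle_radius(value), 2))
```

 Scaling the area rather than the radius avoids visually overstating the
 difference between entities with high and low relevance.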
     Finally, we decided to attach the attributes directly to the entity nodes in-
 stead of adding another link, as for the datatype properties in VOWL, in order to
 emphasize their strong connection and visually indicate that attributes cannot
 exist without the corresponding entities.


 5    Use Case from the Patent Domain

 Patent documents are highly idiosyncratic, verbose texts that describe elaborate
 inventions and make heavy use of specialized terminology. These characteristics,
 in combination with the continuously growing rate at which patents are filed
 worldwide, incur extensive labor and time costs for carrying out typical patent
 portfolio analysis tasks. In this context, structured representations that can assist
 experts in identifying and contrasting patents relevant to the task in question,
 by rendering semantics explicit, and visualizations that effectively summarize
 the key elements of an invention and foster understanding, can entail immediate
 competitive advantages.
     In the investigated use case, we address constructive patents, i.e., patents that
 describe the constituent parts of machine inventions and the ways in which they
 interact. In this context, it is important to specialize the described entities into
 components (e.g., coil, battery), substances, processes, and other entities (e.g.,
 temperature); likewise, for spatial and quantity attributes, such as inner charger
 and plurality of brushes, as well as spatial parts (e.g., surface, bottom). To this
 end, the upper-level model definitions have been extended accordingly through
 the introduction of respective subclasses to the classes Entity and Attribute.

 [Figure 3 (node-link diagram). Entity nodes: electric_toothbrush, handle, shank,
 toothbrush_head, drive_shaft, surface, brushes, pump, water, water_pas...,
 water_jet, water_su..., motor. Relation nodes: comprises, extends, drives,
 includes, incorporates, connects, pumps, is_connected, is_mounted. Link labels:
 along, in, to, hasPart. Attributes: inner, plurality. Legend, entity types:
 component, substance, spatial part, other; attribute types: spatial, quantity.]

      Fig. 3. Visualization of an ontology extracted from the claim text of a patent.

    Using the mate tools [3], predicate-argument structures are extracted and
 subsequently their relevance is computed following a methodology similar to one
 used for identifying relevant sentences in extractive summarization tasks [16].
 Then, OWL representations in compliance with the extended core model are
 generated, based on the transformation rules described in Section 3.
    Figure 3 shows the visualization resulting from the patent claim below, where
 the extracted entity, relation, and attribute individuals are outlined in respective
 fonts. The initial layout of the diagram has been generated with a force-directed
 algorithm and has then been manually adapted to increase its readability.
     An electric toothbrush with a water jet, the toothbrush comprising a handle,
 a shank, and a toothbrush head that incorporates the water jet, in which an in-
 ner motor is mounted in the handle and the toothbrush includes a reciprocating
 drive shaft extending along the shank to drive a plurality of brushes in the brush
 head, including a water passage extending along the shank to connect a water
 supply to the water jet, and a pump in the shank for pumping water along the
 passage that is mechanically connected directly to the surface of the drive shaft.
    In the given example, there are four types of extracted entities (components,
 substances, spatial parts, and other) and two types of attributes (spatial and
 quantity), as indicated by the different colors assigned to the entity and attribute
 nodes. As mentioned before, coreferential entities are captured by a single individual,
 upon which the respective participation links are projected. For example,
 the mentions of passage in “...a water passage extending along the shank...”
 and “...pumping water along the passage...” refer to the same passage entity;
 accordingly, there is a single “water passage” node to which the participation
 links in the extending and pumping relations have been projected.
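
 The projection of participation links onto a single individual can be sketched
 as follows. The mention identifiers, the coreference chain, and the role names
 used in the example are hypothetical, and the coreference resolution itself is
 assumed to be given as input.

```python
def project_coreference(assertions, chains):
    """Rewrite assertions so that all mentions in a chain share one individual.

    `chains` maps a canonical individual to the mentions that refer to it.
    """
    canonical = {}
    for individual, mentions in chains.items():
        for mention in mentions:
            canonical[mention] = individual
    return {
        (canonical.get(s, s), p, canonical.get(o, o))
        for s, p, o in assertions
    }


# Two mentions of the passage, one per relation (hypothetical identifiers).
MENTION_ASSERTIONS = {
    ("water_passage_m1", "actor", "extends_2"),
    ("passage_m2", "along", "pumps_1"),
}
CHAINS = {"water_passage": ["water_passage_m1", "passage_m2"]}

for triple in sorted(project_coreference(MENTION_ASSERTIONS, CHAINS)):
    print(triple)
```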
     All in all, the visualization provides an adequate representation of the patent
 claim that could be used to support analysts in understanding the elements and
 interrelations of the described invention.

 6    Conclusions and Future Work
 In this paper, we have presented an upper-level model for extracting ontological
 text representations under an open-domain paradigm that allows abstracting
 from the specifics of predicate-argument structures, and a visual notation for its
 graphical representation that focuses on the visualization of facts rather than
 the ontological schema. The applicability of the proposed representation and
 visualization framework has been demonstrated through a use case from the
 patent domain.
     Future work includes further validation and fine-tuning of the representation
 model, both through extensive evaluation in cooperation with experts from the
 patent domain and through application-driven evaluation, in which the model will
 serve as the basis for assessing semantic similarity between patents. Furthermore, future
 research will have to address enhanced visualization paradigms that are more
 tailored to the patent domain.
     General challenges with regard to the notation are improved scalability and
 readability of the visualization. A scalable visualization must be capable of representing
 larger ontologies extracted from several paragraphs of a text. In the patent
 use case, the individual claims could, for instance, form different subgraphs that
 are connected with each other according to specified dependencies.
     Generalizing from the patent domain, the presented representation and vi-
 sualization framework may serve as a valuable starting point for related cases of
 ontology extraction and visualization. The open-domain character of the ontol-
 ogy extraction and representation approach enables its wide application, along
 with the visual notation that combines the clarity of VOWL with an ABox-
 oriented view and capabilities to explicitly represent n-ary relations.

 Acknowledgments
 This work has been supported by the EU FP7-SME-606163 project iPatDoc.

 References
  1. Augenstein, I., Padó, S., Rudolph, S.: LODifier: Generating linked data from un-
     structured text. In: 9th Extended Semantic Web Conference (ESWC ’12). pp.
     210–224. Springer (2012)


  2. Baker, C., Fillmore, C., Lowe, J.: The Berkeley FrameNet project. In: 36th Annual
     Meeting of the Association for Computational Linguistics and 17th Int. Conference
     on Computational Linguistics (COLING-ACL ’98). pp. 86–90. ACL (1998)
  3. Bohnet, B., Nivre, J., Boguslavsky, I., Farkas, R., Ginter, F., Hajic, J.: Joint mor-
     phological and syntactic analysis for richly inflected languages. Transactions of the
     Association for Computational Linguistics 1, 415–428 (2013)
  4. Buitelaar, P., Cimiano, P. (eds.): Ontology Learning and Population: Bridging the
     Gap Between Text and Knowledge. IOS Press (2008)
  5. Camarda, D.V., Mazzini, S., Antonuccio, A.: LodLive, exploring the web of data.
     In: 8th International Conference on Semantic Systems (I-SEMANTICS ’12). pp.
     197–200. ACM (2012)
  6. Cimiano, P.: Ontology learning and population from text - algorithms, evaluation
     and applications. Springer (2006)
  7. Curran, J., Clark, S., Bos, J.: Linguistically motivated large-scale NLP with C&C
     and Boxer. In: 45th Annual Meeting of the Association for Computational Linguistics
     (ACL ’07). ACL (2007)
  8. Dudáš, M., Zamazal, O., Svátek, V.: Roadmapping and navigating in the ontology
     visualization landscape. In: 19th International Conference on Knowledge Engineer-
     ing and Knowledge Management (EKAW ’14). pp. 137–152. Springer (2014)
  9. Falco, R., Gangemi, A., Peroni, S., Shotton, D., Vitali, F.: Modelling OWL ontologies
     with Graffoo. In: ESWC 2014 Satellite Events. pp. 320–325. Springer (2014)
 10. Falconer, S.: OntoGraf. http://protegewiki.stanford.edu/wiki/OntoGraf (2010)
 11. Falconer, S., Callendar, C., Storey, M.A.: A visualization service for the semantic
     web. In: 17th International Conference on Knowledge Engineering and Knowledge
     Management (EKAW ’10). pp. 554–564. Springer (2010)
 12. Goyal, S., Westenthaler, R.: RDF Gravity. http://semweb.salzburgresearch.at/
     apps/rdf-gravity/ (2004)
 13. Horridge, M.: OWLViz. http://protegewiki.stanford.edu/wiki/OWLViz (2010)
 14. Katifori, A., Halatsis, C., Lepouras, G., Vassilakis, C., Giannopoulou, E.: Ontology
     visualization methods – a survey. ACM Computing Surveys 39(4) (2007)
 15. Lohmann, S., Negru, S., Haag, F., Ertl, T.: VOWL 2: User-oriented visualization
     of ontologies. In: 19th International Conference on Knowledge Engineering and
     Knowledge Management (EKAW ’14). pp. 266–281. Springer (2014)
 16. Mani, I.: Automatic summarization. John Benjamins Publishing (2001)
 17. Mazzocchi, S., Ciccarese, P.: Welkin. http://simile.mit.edu/welkin/
 18. Motta, E., Mulholland, P., Peroni, S., d’Aquin, M., Gomez-Perez, J.M., Mendez,
     V., Zablith, F.: A novel approach to visualizing and navigating ontologies. In: 10th
     International Semantic Web Conference (ISWC ’11), Part I. pp. 470–486. Springer
     (2011)
 19. Noy, N., Rector, A., Hayes, P., Welty, C.: Defining n-ary relations on the semantic
     web. http://www.w3.org/TR/swbp-n-aryRelations/ (2006)
 20. Presutti, V., Draicchio, F., Gangemi, A.: Knowledge extraction based on discourse
     representation theory and linguistic frames. In: 18th International Conference on
     Knowledge Engineering and Knowledge Management (EKAW ’12). pp. 114–129.
     Springer (2012)
 21. Wong, W., Liu, W., Bennamoun, M.: Ontology learning from text: A look back
     and into the future. ACM Computing Surveys 44(4) (2012)



