=Paper= {{Paper |id=Vol-1752/paper04 |storemode=property |title= Conceptual Modeling with Formal Concept Analysis on Natural Language Texts |pdfUrl=https://ceur-ws.org/Vol-1752/paper04.pdf |volume=Vol-1752 |authors=Mikhail Bogatyrev |dblpUrl=https://dblp.org/rec/conf/rcdl/Bogatyrev16 }} == Conceptual Modeling with Formal Concept Analysis on Natural Language Texts == https://ceur-ws.org/Vol-1752/paper04.pdf
        Conceptual Modeling with Formal Concept Analysis
                   on Natural Language Texts †
                                               © Mikhail Bogatyrev
                                               Tula State University,
                                                       Tula
                                                okkambo@mail.ru



                       Abstract                                        Conceptual modeling is one of the ways of modeling
                                                                   semantics in the Natural Language Processing (NLP)
The paper presents conceptual modelling technique on               [22]. Conceptual modeling is the process of
natural language texts. This technique combines the                conceptualization of real world phenomena and creating
usage of two conceptual modeling paradigms:                        conceptual models as a result of conceptualization.
conceptual graphs and Formal Concept Analysis.                     Conceptual model is a graph which vertices are concepts
Conceptual graphs serve as semantic models of text                 and arrows or edges are links between concepts. Every
sentences and the data source for concept lattice – the            conceptual model has its own semantics which
basic conceptual model in Formal Concept Analysis.                 represents the meanings of concepts and links.
With the use of conceptual graphs the Text Mining                      Conceptual modeling has long been applied for
problems of Named Entity Recognition and Relations                 databases and software modeling [19] and this term is
Extraction are solved. Then these solutions are applied            also used in other fields including NLP. Entity
for creating concept lattice. The main problem                     Relationship Diagram (ERD) [19] is well known
investigated in the paper is the problem of creating               representative of conceptual models. It describes the
formal contexts on a set of conceptual graphs. Its solution        structure of database in terms of entities, relationships,
is based on the analysis of semantic roles and conceptual          and constraints. These terms of entities, relationships,
patterns in conceptual graphs. Concept lattice built on            and constraints are explicitly or implicitly present at
textual data is applied for knowledge extraction.                  many other conceptual models including ones discussed
Knowledge, sometimes interpreted as facts, can be                  in this paper.
extracted by using navigation in the lattice and                       Formal Concept Analysis (FCA) [13] is the paradigm
interpretation its concepts and hierarchical links between         of conceptual modeling which studies how objects can
them. Experimental investigation of the proposed                   be hierarchically grouped together according to their
technique is performed on the annotated textual corpus             common attributes. In the FCA, its conceptual model is
consisted of descriptions of biotopes of bacteria.                 the lattice of formal concepts (concept lattice) which is
                                                                   built on the abstract sets treated as objects and their
†The paper concerns the work which is partially                    attributes. Concept lattices have been applied as an
supported by Russian Foundation of Basic Research,                 instrument for information retrieval and knowledge
grant № 15-07-05507                                                extraction in many applications. The number of FCA
                                                                   applications now is growing up including applications in
1 Introduction                                                     social science, civil engineering, planning, biology,
                                                                   psychology and linguistics [21], [22]. Several successful
                                                                   implementations of FCA methods on fact extraction on
   Knowledge extraction from textual data requires                 textual data [8] and Web data are known [15]. Although
more in-depth intensive analysis of this data. In the area         the high level of abstraction makes FCA suitable for use
of Text Mining, some variants of knowledge extraction              with data of any nature, its application to specific data
have been realized by solving such problems as                     often requires special investigation. It is fully relevant for
sentiment analysis, fact extraction and decision making            using FCA on textual data.
support. To solve these problems it is necessary to have               The main problem in creating concept lattice on
models that reflect semantics of textual data. It is               textual data is building so called formal contexts on this
especially urgent when this data is presented as                   data. Formal context is matrix representation of the
unstructured natural language texts.                               relation on the sets of objects and attributes. So it is
                                                                   needed to acquire words or word combinations from
                                                                   texts which are interpreted as objects and attributes. To
                                                                   restrict all possible combinations of words of such
Proceedings of the XVIII International Conference
                                                                   meanings we need to select from them those ones which
«Data Analytics and Management in Data Intensive
                                                                   are valued for solving concrete problem or the class of
Domains» (DAMDID/RCDL’2016), Ershovo, Russia,
                                                                   problems. As a result a concept lattice created on texts
October 11 - 14, 2016




                                                              16
becomes domain specific. This is similar to the design of         properties of A :  {m  M |  g , m  I  g  A}
ontologies and concept lattice is often considered as             and B :  {g  G |  g , m  I  m  B} then the pair
framework of ontology [21].
    Another paradigm of conceptual modeling is                    (A, B) that A  B, B  A is named as formal concept.
Conceptual Graphs (CGs) [24]. Conceptual graph is                 The sets A and B are closed by composition of mappings:
bipartite directed graph having two types of vertices:             A ''  A, B ''  B ; A and B is called the extent and the
concepts and conceptual relations. Conceptual terms of            intent of a formal context K = (G, M , I ) respectively.
entities and relationships are represented in conceptual             By other words, a formal concept is a pair (A, B) of
graphs as its concepts and conceptual relations.                  subsets of objects and attributes which are connected so
    Conceptual graphs have been applied for modeling              that every object in A has every attribute in B, for every
many real life objects including texts. Acquiring                 object in G that is not in A, there is an attribute in B that
conceptual graphs from natural language texts is non-             the object does not have and for every attribute in M that
trivial problem but it is quite solvable [3], [5].                is not in B, there is an object in A that does not have that
    The main purpose of this paper is to show how two             attribute.
paradigms of conceptual modeling - Conceptual Graphs                 The partial orders established by relations ф and Р
and Formal Concept Analysis - can be united in one
                                                                  on the set G and M induce a partial order ≤ on the set of
modeling technique. The idea of joining these two
                                                                  formal concepts. If for formal concepts (A1, B1) and (A2,
paradigms seems very attractive but not elaborated much
enough [22], [26].                                                B2), A1 ф A 2 and B2 Р B1 then (A1, B1) ≤ (A2, B2) and
    Proposed technique is used in on-going project of             formal concept (A1, B1) is less general than (A2, B2). This
creating fact extraction system working on biomedical             order is represented by concept lattice. A lattice consists
data. Experimental investigation of it is performed on the        of a partially ordered set in which every two elements
annotated textual corpus consisted of descriptions of             have a unique supremum (also called a least upper bound
biotopes of bacteria.                                             or join) and a unique infimum (also called a greatest
                                                                  lower bound or meet).
2 CGs-FCA modeling                                                  According to the central theorem of FCA [13], a
                                                                  collection of all formal concepts in the context
The proposed modeling technique named briefly as CGs-             K = (G, M , I ) with subconcept-superconcept ordering
FCA modeling is based on using conceptual graphs and
                                                                  ≤ constitutes the concept lattice of K . Its concepts are
concept lattice. It may be applied for knowledge
extraction from textual data. In CGs-FCA modeling                 subsets of objects and attributes connected each other by
conceptual graphs serve as semantic models of text                mappings A , B  and ordered by a subconcept-
sentences and the data source for formal context of               superconcept relation.
concept lattice. Concept lattice built on textual data is
applied for knowledge extraction. Knowledge,
sometimes interpreted as facts, can be extracted by using
navigation in the lattice and interpretation its concepts
and hierarchical links between them.
    To illustrate CGs-FCA modeling, consider some
FCA basics.


2.1 Formal Concept Analysis basics
There are two basic notions FCA deals with: formal
context and concept lattice [13]. Formal context is a
triple K = (G, M , I ) , where G is a set of objects, M –
set of their attributes, I  G  M – binary relation
which represents facts of belonging attributes to objects.
The sets G and M are partially ordered by relations ф
and Р , correspondingly: G = (G, ф ) , M  (M , Р ) .
Formal context may be represented by [0, 1] - matrix
K = {ki , j } in which units mark correspondence
                                                                  Figure 1 Example of formal context and concept lattice.
between objects gi G and attributes m j  M . The
concepts in the formal context have been determined by              To illustrate these abstract definitions consider an
the following way. If for subsets of objects A  G and            example. Figure 1 shows simple formal context and
                                                                  concept lattice composed on the sets G = {DNA, Virus,
attributes B  M there are exist mappings (which may              Prokaryotes, Eukaryotes, Bacterium} and M =
be functions also) A : A  B and B : B  A with                 {Membrane, Nucleus, Replication, Recombination}. The




                                                             17
set G is ordered according to sizes of its elements: DNA                 using corpus tagging and semantic models of
is smallest and bacterium is biggest ones. The set M has                  texts [10].
relative order: one part (Membrane, Nucleus)
characterizes microbiological structure of objects from               We apply the second approach and use conceptual
G, but another part (Replication, Recombination)                  graphs for representing semantics of individual sentences
characterizes the way of breeding, and these parts are            of a text.
incomparable. In the concept lattice the bacterium is
placed in the concept C1 = ({Prokaryotes, Eukaryotes,             2.2 The modeling process
Bacterium}, {Membrane, Replication}). In this concept,
three objects {Prokaryotes, Eukaryotes, Bacterium}                Consider in general the process of CGs – FCA modeling.
constitute the extent of the concept; they are united by          It includes the following steps.
their mutual attribute {Membrane, Replication} which                   1. Acquiring a set of conceptual graphs from input
constitute the intent of the concept. The concept C1 is           texts. As it is mentioned above conceptual graphs can be
more general concept than the concept C2 =                        acquired from texts by existing information systems. For
({Eukaryotes}, {Nucleus}).                                        example they can be created by our system CGs Maker
                                                                  1
    Also on the Fig. 1 there are two different branches of         . Some details about it can be found in [3], [5]. We use
concepts characterizing two families: the viruses and             verb-centered approach for creating conceptual graphs.
DNA and prokaryotes, eukaryotes and bacteria. This                According to this approach, a conceptual graph is
concept demonstrates the fact of separation of objects            constructed so that there is the central concept in it which
from the set G into two important branches. The link              is realized as a verb. If there are no verbs in a sentence
between them is the attribute “Membrane”. It is known             then method also creates conceptual graph. Verb-
[7] that viruses can have a lipid shell formed from the           centered approach is important for us since it provides
membrane of the host cell. Therefore, the membrane is             predicate forms in the structures of conceptual graphs.
positioned in the formal context on the Fig. 1 as an              These forms are mostly used for representing conceptual
attribute of the virus.                                           graph semantics.
    This example demonstrates specific ways of                        2. Aggregating the set of conceptual graphs.
extracting knowledge from conceptual lattice:                     Aggregation is needed to exclude excessive dimension of
     analyzing formal concepts in concept lattice;               conceptual models, not related to useful information. We
     analyzing conceptual structures in concept                  have tested following ways of conceptual graphs
         lattice – its sub lattices in the general case.          aggregation: conceptual graphs clustering and using
                                                                  corpus tagging together with support of concept types in
                                                                  conceptual graphs. Clusters of conceptual graphs need to
2.1.1 FCA on textual data                                         be semantically interpreted which may lead to additional
                                                                  investigations. The second method is more constructive
The main problem in applying FCA on textual data is the           since it selects those conceptual graphs which concepts
problem of building formal context. If textual data is            have mappings to certain domain. Such domain of terms
represented as natural language texts then this problem           may be presented by corpus tagging or by thesaurus.
becomes acute.                                                    Some details of aggregation are below.
     There are several approaches to the construction of              3. Creating formal contexts. This is the central point
formal contexts on the textual data, presented as separate        of CGs – FCA modeling. One or several formal contexts
documents, as data corpora. One, mostly applied variant           have been built on the aggregated conceptual graphs. The
is the context in which the objects are text documents and        number of formal concepts and the method of building
the attributes are the terms from these documents.                them depend on the problem being solved with CGs –
Another variant is building formal context directly on the        FCA modeling.
texts and the formal context may represent various                    4. Building concept lattice. Having a formal context
features of textual data:                                         as input data, a concept lattice may be created by using
       semantic relations (synonymy, hyponymy,                   various algorithms. There is a field of research in FCA
        hypernymy) in a set of words for semantic                 devoted to creating and developing algorithms for
        matching [16],                                            concept lattice creation [21]. On the current stage of CGs
                                                                  – FCA modeling technique we use standard solution of
       verb-object dependencies from texts [10],
                                                                  creating concept lattice realized in the open source tool
       words and their lexico-syntactic contexts [20].
                                                                  [27]. Nevertheless, here there are certain possibilities to
These lexical elements must be distinguished in texts as          create new algorithms oriented on specific structure of
objects and attributes. There are following approaches            formal contexts acquired from conceptual graphs. One of
to solve this problem:                                            such structure is block-diagonal structure which arises
                                                                  namely on using textual data as input.
       adding special descriptions to texts which mark               5. Knowledge extraction from concept lattice. In
        objects and attributes and partial order,                 concept lattice it is possible to identify connections

1
  The lightweight online version of CGs Maker for
simple English and Russian texts can be found at http://85.142.138.156:8888 .




                                                             18
between its concepts according to the principle of                     similar “semantically”. That means that their concepts
"common – particular". Each concept may be interpreted                 have the same type. For example different names of
as the set of potential facts of certain level, which is               bacteria belong to the type “bacterium” or the type “the
associated with other facts. So the knowledge extracted                name of bacteria”.
from concept lattice may be interpreted a s facts.                         The second way of conceptual graphs aggregation is
                                                                       based on supporting types of concepts by using external
2.3 Aggregation of conceptual graphs                                   resources. Thesaurus or corpus tagging may be such
                                                                       resource. Section 3 contains additional details.
    In the theory of conceptual graphs aggregation means
replacing conceptual graphs by more general graphs [24]
. These general graphs may be created as new graphs or                 2.4 Creating formal contexts
may be graphs or sub graphs from initial set of graphs.
Aggregation of conceptual graphs has semantic meaning                  The crucial step in the described process of CGs – FCA
and general graphs make up the context (not formal                     modeling is creating formal contexts on the set of
context) of initial set of graphs.                                     conceptual graphs.
    Clustering is a way of aggregation of conceptual                       At first glance, this problem seems simple: those
graphs. Graphs which are the nearest ones to the centers               concepts of conceptual graphs which are connected by
of clusters have been treated as general graphs.                       "attribute" relation have been put into formal context as
    We have studied several approaches for clustering                  its objects and attributes. Actually the solution is much
conceptual graphs [2] using various similarity measures.               more complex.
There are two known similarity measures proposed in
[17], the conceptual similarity                                            Fig. 2 shows an example of conceptual graph for the
                                                                       sentence “Burkholderia phytofirmans belongs to the
                             2n( c )
                   sc                                                 beta-proteobacteria and was isolated from surface-
                          n( 1 )  n( 2 )                            sterilized glomus vesiculiferum-infected onion roots.”

                                                           (1)

and relative similarity
                             2m( c )
               sr 
                      m c ( 1 )  m c ( 2 ) .          (2)

   Here  1 ,  2 - conceptual graphs,  c   1   2 is their
common sub graph, n ( i ) - number of concepts of graph
 i , m ( ) - number of relations of graph  i , m ( )
          i                                          c   i
is the number of relations of conceptual graph  i , at least
one of which belongs to the common sub graph  c .
    If two conceptual graphs have identical concepts then
their conceptual similarity has non zero value. Relative
similarity is non-zero when two conceptual graphs have
identical structures of patterns of conceptual relations.
    We used conceptual and relative similarities (1), (2)
                                                                       Figure 2 Conceptual graph for the sentence
and their combination in the experiments of conceptual
                                                                       “Burkholderia phytofirmans belongs to the beta-
graphs clustering [2]. Except traditional algorithms of
                                                                       proteobacteria and was isolated from surface-sterilized
clustering such as K-means, we used genetic clustering
                                                                       glomus vesiculiferum-infected onion roots.”
algorithm with special encoding. The peculiarity of
implementing genetic algorithms for clustering is that
                                                                           This graph has five conceptual relations “attribute”
there may be several final solutions i.e. several different
                                                                       but only four of them indicate the objects and attributes
variants of clustering.
                                                                       valid for formal context. Using “phytofirman” as object
    All numerical characteristics of conceptual graphs
                                                                       and “Burkholderia” as attribute in the formal context is
clustering results (number of clusters, dimensions of
                                                                       wrong way because “Burkholderia phytofirmans” is
clusters, etc.) are not informative. Clusters of conceptual
                                                                       known full name of this bacterium [6] and full names of
graphs need to be semantically interpreted. The way of
                                                                       bacteria have to be objects in a formal context devoted to
that interpretation depends on the nature of the problem
                                                                       bacteria. Word combinations denoting the names of
to be solved with conceptual graphs.
                                                                       bacteria must be recognized before the building of
    Both conceptual and relative similarity measures
                                                                       conceptual graphs. There is no other way of doing this
share a common sub graph  c . But two conceptual                      than to use an external source of information, for
graphs may have no common sub graph but may be                         example, the corpus tagging. So in this example the sub




                                                                  19
graph - (attribute) - < burkholderia > is              by huge amount of scientific publications in
useless for bacteria names recognizing.                             Bioinformatics and organizing them into corpora with
    Remaining elements of conceptual graph on the Fig.              access to full texts of articles via such systems as
2 are not useless and play significant roles in creating            PubMed [25]. Information resources of PubMed have
formal context. Conceptual graph on the Fig. 2 represents           been united in several subsystems presenting databases,
two facts:                                                          corpora and ontologies.
1. bacterium Burkholderia phytofirmans belongs to                   So called “research community around PubMed” [14]
     beta-proteobacteria;                                           forms data intensive domain in this area. It not only uses
2. this bacterium infects the onion.                                data from PubMed but also creates new data resources
     To provide the presence information about those and            and data mining tools including specialized languages
other facts in the formal contexts the following rules are          for effective biomedical data processing [11].
implemented as mostly important when creating formal                In our experiments we use PubMed vocabulary thesaurus
contexts.                                                           MeSH (Medical Subject Headings) as external resource
1. Not only individual concepts and relations, but also             for supporting types of concepts in conceptual graphs.
     patterns of connections between concepts in
     conceptual graphs represented as sub graphs have               3.2 Data structures
     been analyzed and processed. These patterns are
     predicate forms  -  -              Our experiments have been carried out using text corpus
     which in conceptual graphs look as the template                of bacteria biotopes which is used in the innovation
     - (patient) - < verb > - (agent) -                    named as BioNLP Shared Task [6]. Biotope is an area of
     . Not only agent and patient semantic                 uniform environmental conditions providing a living
     roles but also other similar to them (goal on the Fig.         place for plants, animals or any living organism. Biotope
     2) roles are allowed in templates.                             texts form tagged corpus. The tagging includes full
2. The hierarchy of conceptual relations in conceptual              names of bacteria, its abbreviated names and unified key
     graphs is fixed and taken into account when creating           codes in the database. We can add additional tags and we
     formal context. This hierarchy exists on the Fig.2:            do it.
     relations “agent”, “goal”, “from”, “modificator” are               A BioNLP data is always domain-specific. All the
     on the top level and relations "attribute" belong to           texts in the corpus are about bacteria themselves, their
     underlying levels. Using this hierarchy of                     areal and pathogenicity. Not every text contains these
     conceptual relations we can select for formal                  three topics but if some of them are in the text then they
     contexts more or less details from conceptual                  are presented as separate text fragments. This simplifies
     graphs.                                                        text processing.
   These empirical rules are related to the principle of                The CGs – FCA modeling environment has DBMS
pattern structures which was introduced in FCA in the               for storing and managing data used in experiments. We
work [12]. A pattern structure is the set of objects with           use relational database on the SAP-Sybase platform.
their descriptions (patterns), not attributes. Patterns also        Database stores texts, conceptual graphs, formal contexts
have similarity operation. The instrument of pattern                and concept lattices. Special indexing is applied on
structures is for creating concept lattices on the data             textual data.
being more complicated than sets of objects and
attributes.                                                         3.3 BioNLP tasks
   Conceptual graph is a pattern for the object it
                                                                    According to the BioNLP Shared Task initiative [6] there
represents. A sub graph of conceptual graph is projection
                                                                    are two main tasks solving on biomedical corpora: the
of a pattern. Namely projections are often used for
                                                                    task of Named Entity Recognition (NER) and the task of
creating formal contexts. Similarity operation on
                                                                    Relations Extraction (RE).
conceptual graphs is a measure of similarity which is
                                                                        The task of Named Entity Recognition on the corpus
applied in clustering. The relative similarity (2) is mostly
                                                                    of bacteria descriptions is formulated as seeking bacteria
close to be similarity operation for patterns.
                                                                    names presented directly in the texts or as co-references
   The CGs – FCA modeling technique was tested in
                                                                    (anaphora).
various levels of its realization for classification
                                                                         Relations Extraction means seeking links between
messages in technical support services [3], modeling
                                                                    bacteria and their habitat and probably diseases it causes.
requirements for information systems [4] and classifying
queries to biomedical systems [5].
                                                                    3.4 NER and anaphora resolution
3 CGs-FCA modeling on biomedical data
                                                                    The task of Named Entity Recognition has direct solution
                                                                    with conceptual graphs. The only problem which is here
3.1 Biomedical data intensive domain
                                                                    is anaphora resolution.
Bioinformatics is the field where Data Mining and Text                  Anaphora resolution is the problem of resolving
Mining applications are growing up rapidly. New term of             references to earlier or later items in the text. These items
“Biomedical Natural Language Processing” (BioNLP)                   are usually noun phrases representing objects called
has been appeared there [1]. This appearing is stipulated           referents but can also be verb phrases, whole sentences




                                                               20
or paragraphs. Anaphora resolution is the standard                only in one sentence. For extracting relations between
problem in NLP.                                                   bacteria on several texts we applied concept lattices.
    Biotopes texts we work with contain several types of              We had selected 130 mostly known bacteria and have
anaphora:                                                         processed corresponding corpus texts about them. All the
     hyperonym definite expressions (“bacterium” -               texts were preliminary filtered for excluding stop words
         “organism”, “cell” - “bacterium”),                       and other non-informative lexical elements.
     higher level taxa often preceded by a                           Three formal contexts of “Entity”, “Areal” and
         demonstrative determinant (“this bacteria”, “this        “Pathogenicity” were built on the texts. They have the
         organism”),                                              names of bacteria as objects and corresponding concepts
     sortal anaphors (“genus”, “species”, “strain”).             from conceptual graphs as attributes. Table 2 shows
   For anaphora detection and resolution we use a                 numerical characteristics of created contexts.
pattern-based approach. It is based on fixing anaphora
items in texts and establishing relations between these
items and bacteria names. We use double-pass algorithm
for anaphora resolution which controls so called isolated
concepts appeared on the first pass of the algorithm.
Isolated concepts are those concepts which are not
connected by relation with any other concepts. As a rule
they appear when a sentence contains abbreviations or
code of bacterium. For example, in the sentence
“Streptococcus thermophilus strain LMG 18311” there is
code of bacterium strain. This code will be presented as
isolated concept in conceptual graph. Later in another
sentence there is text fragment “…two yogurt strains of           Figure 3 Recall and precision ratio for NER solution on
S. thermophiles …” which has abbreviation of the name             60 objects
of bacterium. Having isolated concept with strain code
                                                                  Table 2 Numerical characteristics of created contexts
we can identify it with bacterium using corpus tagging.
For resolving abbreviations programming triggers which             Context           Number        Number         Number of
react to the second word after abbreviation are applied.           name              of            of             formal
   To evaluate the quality of this solution of NER the                               objects       attributes     concepts
standard characteristics of recall, precision and F-score          Entity               130             26            426
were calculated. To obtain them it was needed to mark              Areal                130             18            127
named entities manually in the texts used in experiments.          Pathogenicity        130             28            692
The table 1 contains values of recall, precision and F-                Among attributes there are bacteria properties (gram-
score compared with corresponding values from the                 negative, rod-shaped, etc.) for “Entity” context, mentions
work [23]. In this work pattern-based approach is also            of water, soil and other environment parameters for
applied and several external resources were involved in           “Areal” context and names and characteristics of
the NER solution. The Alvis system was explored in [23]           diseases for “Pathogenicity” context
and SemText is the name of our system which explores                   As it is followed from the table there is relatively
CGs – FCA modeling.                                               small number of formal concepts in the contexts. This is
                                                                  due to the sparse form of all contexts generated by
Table 1 Recall, precision and F-score for NER                     conceptual graphs.
solutions                                                              For extracting relations we use visualization on the
                                                                  current stage of modeling technique. It allows getting
                Recall         Precision F-score
                                                                  results only for relatively small lattices.
  Alvis         0,52           0,46           0,59                     Often relations between concepts in concept lattice
  SemText       0,42           0,53           0,47                may be treated as facts. Extracting facts from concept
The ratio of the values of recall and precision is more           lattices is realized by forming special views constructed
informative than their individual ones and is shown on            on the lattice and corresponded to certain property (intent
the Fig. 3. According to the table 1 and Fig. 3 we resume         in the lattice) or entity (extent in the lattice) on the set of
that there is medium quality of our solution of NER. It is        bacteria. Every view is a sub lattice. It shows the links
explained by disability of our algorithm to interpret all         between concrete bacterium and its properties.
possible isolated concepts in conceptual graph. As a                   An example of such view as the fragment of lattice is
result approximately half of marked lexical elements              shown on Fig. 4. The lattice on the Fig. 4 contains formal
were not recognized as entities.                                  concepts related to the following bacteria: Borrelia
                                                                  turicatae,      Frankia,     Legionella,       Clamydophila,
3.5 Relations extraction with concept lattices                    Thermoanaerobacter tengcongensis, Xanthomonas
                                                                  oryzae. Highlighted view on the figure illustrates gram-
Conceptual graphs represent relations between words.              negative property of bacteria. Such bacteria are resistant
Therefore they can be applied for relations extraction but        to conventional antibiotics.




                                                             21
    Using this view, some facts about bacteria can be                       solution. We guess that using patterns structures
extracted:                                                                  and their projections is that way of formalizing
                                                                            CGs – FCA modeling technique.
    only     three    bacteria   from     the   set,                  3.   The next stage of developing CGs – FCA
     Thermoanaerobacter                tengcongensis,                       modeling is creating fledged information system
     Clamydophila and Xanthomonas oryzae, are gram-                         which process user queries and produce
     negative;                                                              solutions of certain tasks on textual data. Not
    two gram-negative bacteria, Thermoanaerobacter                         only visualization but also special user oriented
     tengcongensis and Xanthomonas oryzae, have the                         interfaces to concept lattice will be created in this
     shape as rod;                                                          system.
    one of gram-negative bacteria, Clamydophila, is
     obligately pathogenic.                                         References
   Note that attribute obligately pathogenic was formed             [1] BioNLP 2014. Workshop on Biomedical Natural
directly from the two words in the text according to the                Language Processing. Proceedings of the
rule of marking words denoting extreme situation.                       Workshop. The Association for Computational
                                                                        Linguistics.      Baltimore,     2014.   155     p.
                                                                        http://acl2014.org/acl2014/W14-34/W14-34-
                                                                        2014.pdf
                                                                    [2] Bogatyrev M., Latov, V., Stolbovskaya. Application
                                                                        of Conceptual Graphs in Digital Libraries. – Proc.
                                                                        RCDL (Digital libraries: advanced methods and
                                                                        technologies, digital collections), pp. 464-468.
                                                                        2007.
                                                                    [3] Bogatyrev M., Kolosoff A. Using Conceptual
                                                                        Graphs for Text Mining in Technical Support
Figure 4 Example of view of gram-negative property of                   Services. Pattern Recognition and Machine
bacteria.                                                               Intelligence. - Lecture Notes in Computer Science,
                                                                        2011, Volume 6744/2011, pp. 466-471. Springer-
   Comparing our results of relations extraction with the               Verlag,                Heidelberg,           2011.
known ones from [23] we resume that concept lattice                     http://link.springer.com/chapter/10.1007%2F978-3-
provides principally another variant of solution of this                642-21786-9_75
task. In [23] results of relations extraction are presented         [4] Bogatyrev, M., Nuriahmetov, V., Application of
as marked words in the texts. Visualized concept lattice                Conceptual Structures in Requirements Modeling. –
is more powerful object for investigating relations.                    Proc. of the International Workshop on Concept
                                                                        Discovery in Unstructured Data (CDUD 2011) at
4 Conclusions and future work                                           the Thirteenth International Conference on Rough
                                                                        Sets, Fuzzy Sets, Data Mining and Granular
 This paper describes the idea of joining two paradigms
                                                                        Computing - RSFDGrC 2011. Moscow, Russia,
of conceptual modeling - conceptual graphs and concept
                                                                        2011, pp. 11-19.
lattices. Current results of realizing this idea as CGs –
FCA modeling on textual data show its good potential for            [5] Bogatyrev, M. Y., Vakurin V. S. Conceptual
knowledge extraction.                                                   Modeling in Biomedical Data Research.
     In spite of advantage of CGs – FCA modeling there                  Mathematical Biology and Bioinformatics. 2013.
are some problems which need to be solved for                           Vol. 8. № 1, pp.           340–349. (in Russian).
improving the quality of modeling technique.                            http://www.matbio.org/2013/Bogatyrev_8_340.pdf
     1. Conceptual graphs acquired from texts contain               [6] Bossy R, Jourde J, Manine A-P, Veber P, Alphonse
          many noise elements. Noise is constituted by the              E, Van De Guchte M, Bessières P, Nédellec C:
          text elements that contain no useful information              BioNLP 2011 Shared Task - The Bacteria Track.
          or cannot be interpreted as facts. Noise elements             BMC Bioinformatics. 2012, 13: S8, pp. 1-15.
          significantly decrease efficiency of algorithms of            http://bmcbioinformatics.biomedcentral.com/article
          CGs – FCA modeling. To exclude noise we need                  s/10.1186/1471-2105-13-S11-S3
          to distinguish textual data that can be excluded          [7] Campbell, N. A., etc., Biology: Concepts and
          from consideration, for example, information                  Connections. Benjamin-Cummings Publishing
          about when and by whom a bacterium was first                  Company, 2005.
          identified.                                               [8] Carpineto, C., & Romano, G. Exploiting the
     2. Empirical rules which we use for creating formal                Potential of Concept Lattices for Information
          contexts cannot embrace all configurations of                 Retrieval with CREDO. Journal of Universal
          conceptual graphs. More formal approach to                    Computing, 10, 8, 985-1013. 2004.
          creating formal contexts on the set of conceptual         [9] Carpineto, C., Romano, G. Using Concept Lattices
          graphs will guarantee the completeness of                     for Text Retrieval and Mining. In B. Ganter, G.




                                                               22
     Stumme, and R. Wille (Eds.), Formal Concept                 [18] Obitko, M., Snasel,V., Smid, Jan. Ontology Design
     Analysis: Foundations and Applications. Lecture                  with Formal Concept Analysis. – Proc. of the CLA
     Notes in Computer Science 3626, pp. 161-179.                     2004 International Workshop on Concept Lattices
     Springer-Verlag, Berlin, 2005.                                   and their Applications. http://ceur-ws.org/Vol-
[10] Cimiano, P. Hotho, A. Staab, S. Learning Concept                 110/paper12.pdf
     Hierarchies from Text Corpora using Formal                  [19] Olivé, Antoni, Conceptual Modeling of Information
     Concept Analysis. Journal of Artificial Intelligence             Systems. Springer-Verlag. Berlin, Heidelberg, 2007.
     Research, Volume 24, pp. 305-339. 2005.                          https://www.amazon.com/dp/3540393897/ref=rdr_
     http://arxiv.org/pdf/1109.2140.pdf                               ext_tmb
[11] Edhlund, B., McDougall, A., Pubmed Essentials,              [20] Otero P. G., Lopes G. P., Agustini, A., Automatic
     Mastering the World's Health Research Database.                  Acquisition of Formal Concepts from Text, Journal
     Form         &        Kunskap       AB,        2014.             for Language Technology and Computational
     https://www.amazon.com/PubMed-Essentials-                        Linguistics. Vol. 23(1), pp. 59-74. 2008.
     Mastering-Research-Database/dp/1312289457                   [21] Poelmans J., Kuznetsov S. O., Ignatov D. I., Dedene
[12] Ganter, B., Kuznetsov, S.O. Pattern structures and               G. Formal Concept Analysis in knowledge
     their projections. In: ICCS, pp. 129–142. 2001                   processing: A survey on models and techniques //
[13] Ganter, B., Stumme, G., Wille, R., eds., Formal                  Expert Systems with Applications. 2013. Vol. 40.
     Concept Analysis: Foundations and Applications,                  No. 16. P. 6601-6623.
     Lecture Notes in Artificial Intelligence, No. 3626,         [22] Priss, U., Linguistic Applications of Formal Concept
     Springer-Verlag. 2005                                            Analysis. In: Ganter; Stumme; Wille (eds.), Formal
[14] Hunter L, Cohen KB. Biomedical language                          Concept Analysis, Foundations and Applications.
     processing: what’s beyond PubMed? Mol. Cell.                     Springer Verlag. LNAI 3626, p. 149-160. 2005
     2006. V. 21. P. 589–594.                                    [23] Ratkovic, Z., Golik, W., Warnier, P. Event
[15] Ignatov, Dmitry I. and Kuznetsov, Sergei O. and                  extraction of bacteria biotopes: a knowledge-
     Poelmans, Jonas, Concept-based Biclustering for                  intensive     NLP-based      approach.    -    BMC
     Internet Advertisement. In: Vreeken, J and Ling, C               Bioinformatics 2012, 13, (Suppl 11): S8, pp. 1-11.
     and Zaki, MJ and Siebes, A and Yu, JX and                        http://bmcbioinformatics.biomedcentral.com/article
     Goethals, B and Webb, G and Wu, X , Eds., Proc. of               s/10.1186/1471-2105-13-S11-S8
     12th IEEE International Conference On Data                  [24] Sowa, J.F., Conceptual Structures: Information
     Mining Workshops (ICDMW 2012), pp. 123-130,                      Processing in Mind and Machine. Addison-Wesley,
     2012.                                                            London, UK. 1984
[16] Meštrović, A. Semantic Matching Using Concept               [25] U.S.       National     Library     of     Medicine.
     Lattice. Concept Discovery in Unstructured Data,                 http://www.ncbi.nlm.nih.gov/pubmed .
     CDUD 2012, pp. 49-58. http://ceur-ws.org/Vol-               [26] Wille, R. Conceptual Graphs and Formal Concept
     871/paper_6.pdf                                                  Analysis. Proceedings of the Fifth International
[17] Montes-y-Gomez, Gelbukh, Lopez-Lopez, Baeza-                     Conference on Conceptual Structures: Fulfilling
     Yates, Flexible Comparison of Conceptual Graphs.                 Peirce's Dream. 290 - 303. Springer-Verlag,
     Lecture Notes in Computer Science 2113. Springer-                London. 1997
     Verlag, 2001.                                               [27] ConExp-NG. https://github.com/fcatools/conexp-ng




                                                            23