<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Least General Generalization of the Linguistic Structures</article-title>
      </title-group>
      <contrib-group>
        <aff id="aff0">
          <label>0</label>
          <institution>Boris Galitsky Oracle Corp., Redwood Shores CA USA and Dmitry Ilvovsky National Research University Higher School of Economics</institution>
          ,
          <addr-line>Moscow</addr-line>
          ,
          <country country="RU">Russia</country>
        </aff>
      </contrib-group>
      <abstract>
<p>We convert existing training datasets into ones closed under linguistic generalization operations in order to expand infrequent cases. We transfer the definition of the least general generalization from logical formulas to linguistic structures: from words to phrases, sentences, speech acts and discourse trees. The main advantages of the resultant frameworks are explainability and learnability from a small set of samples. Learning via generalization of linguistic structures turns out to be well suited for industrial linguistic applications with limited training datasets.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
<p>Consider the pair of logical formulas camera(zoom(digital), AnyUser) and
camera(zoom(AnyZoom), beginner),
where variables (non-instantiated values, not specified in NL expressions) are capitalized. Given the
above pair of formulas, unification computes their most general specialization camera(zoom(digital),
beginner), and anti-unification computes their most specific generalization, camera(zoom(AnyZoom),
AnyUser).</p>
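<p>As a minimal illustration (our sketch, not code from the paper), anti-unification over such terms can be computed recursively, assuming terms are encoded as nested tuples with the functor first and every mismatched position is replaced by a fresh variable; a complete implementation would also reuse the same variable for repeated mismatched pairs:</p>
      <preformat>
# A term is a constant (lowercase string), a variable (capitalized
# string), or a tuple (functor, arg1, ..., argN).
_counter = [0]

def fresh_var():
    """Create a new capitalized variable name."""
    _counter[0] += 1
    return f"X{_counter[0]}"

def anti_unify(t1, t2):
    """Most specific generalization of two terms."""
    if t1 == t2:
        # Identical terms generalize to themselves.
        return t1
    if (isinstance(t1, tuple) and isinstance(t2, tuple)
            and t1[0] == t2[0] and len(t1) == len(t2)):
        # Same functor and arity: generalize argument by argument.
        return (t1[0],) + tuple(anti_unify(a, b)
                                for a, b in zip(t1[1:], t2[1:]))
    # Any other mismatch is generalized to a fresh variable.
    return fresh_var()

t1 = ("camera", ("zoom", "digital"), "AnyUser")
t2 = ("camera", ("zoom", "AnyZoom"), "beginner")
print(anti_unify(t1, t2))
# ('camera', ('zoom', 'X1'), 'X2'), i.e. camera(zoom(AnyZoom), AnyUser)
# up to variable renaming.
      </preformat>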
<p>At the syntactic level, we have the generalization (‘^’) of two noun phrases as: {NN-camera, PRP-with,
[digital], NN-zoom, [for beginners]}.</p>
      <p>We eliminate the expressions in square brackets, since they occur in one expression but not in the
other. As a result, we obtain {NN-camera, PRP-with, NN-zoom}, which is a syntactic analog of the
semantic generalization above.</p>
<p>The purpose of an abstract generalization is to find commonality between portions of text at various
semantic levels. The generalization operation occurs at the levels of text, paragraph, sentence and individual
word.</p>
<p>At each level except the lowest one (individual words), the result of generalizing two expressions
is a set of expressions. In such a set, for each pair of expressions in which one is less general than the other,
the more general one is eliminated. The generalization of two sets of expressions is the set of sets obtained by
pairwise generalization of these expressions.</p>
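<p>A sketch of this set-level operation, assuming hypothetical generalize and subsumes functions for individual expressions (subsumes(g, h) meaning g is more general than h):</p>
      <preformat>
from itertools import product

def generalize_sets(set1, set2, generalize, subsumes):
    """Pairwise generalization of two sets of expressions, keeping
    only the least general (most specific) results."""
    results = [g for e1, e2 in product(set1, set2)
               for g in generalize(e1, e2)]
    # Eliminate any result that is strictly more general than
    # another result in the set.
    return [g for g in results
            if not any(g != h and subsumes(g, h) for h in results)]
      </preformat>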
<p>
        Only a single generalization exists for a pair of words: if the words are the same and in the same form, the
result is a node with this word in this form. To involve word2vec models
        <xref ref-type="bibr" rid="ref6">(Mikolov et al., 2015)</xref>
        in computing the
generalization of two different words, we use the following rule. If subject1 = subject2, then
subject1^subject2 = &lt;subject1, POS(subject1), 1&gt;. Otherwise, if they have the same part of speech,
subject1^subject2 = &lt;*, POS(subject1), word2vecDistance(subject1, subject2)&gt;. If the parts of speech
differ, the generalization is an empty tuple, which cannot be further generalized.
      </p>
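<p>The word-level rule can be sketched as follows; word2vec_distance here is a stand-in for any similarity over pretrained embeddings (e.g., cosine similarity), not a specific library call:</p>
      <preformat>
def generalize_words(word1, pos1, word2, pos2, word2vec_distance):
    """Word-level generalization: returns a (lemma, POS, score)
    triple, or None for the empty tuple."""
    if word1 == word2 and pos1 == pos2:
        # Same word in the same form: keep it with score 1.
        return (word1, pos1, 1.0)
    if pos1 == pos2:
        # Different words, same part of speech: keep the POS and
        # score the pair by embedding similarity.
        return ("*", pos1, word2vec_distance(word1, word2))
    # Different parts of speech: the generalization is empty.
    return None
      </preformat>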
<p>For a pair of phrases, the generalization includes all maximal ordered sets of generalization nodes for
words in the phrases, such that the order of words is retained. Consider the following example:</p>
      <p>To buy digital camera today, on Monday</p>
      <p>Digital camera was a good buy today, first Monday of the month</p>
      <p>The generalization is {&lt;JJ-digital, NN-camera&gt;, &lt;NN-today, ADV, Monday&gt;}, where the generalization
of the noun phrases is followed by the generalization of the adverbial phrases. The verb buy is excluded from
both generalizations because it occurs in a different order in the two phrases: buy - digital - camera
is not a generalization phrase because buy occurs in a different sequence relative to the other generalization
nodes.</p>
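<p>A sketch of phrase-level generalization reusing the word-level rule above; for simplicity it returns one longest ordered sequence of non-empty word generalizations (a classic longest-common-subsequence dynamic program) rather than all maximal ones:</p>
      <preformat>
from functools import partial

def generalize_phrases(phrase1, phrase2, gen_word):
    """Each phrase is a list of (word, POS) pairs; gen_word returns
    a generalization node or None. Word order is retained."""
    n, m = len(phrase1), len(phrase2)
    best = [[[] for _ in range(m + 1)] for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            w1, p1 = phrase1[i - 1]
            w2, p2 = phrase2[j - 1]
            node = gen_word(w1, p1, w2, p2)
            match = best[i - 1][j - 1] + [node] if node is not None else []
            # Keep the longest of: extend the match, skip a word
            # of phrase1, or skip a word of phrase2.
            best[i][j] = max(match, best[i - 1][j], best[i][j - 1], key=len)
    return best[n][m]

# Usage with the word-level rule above and a stub distance function:
gen_word = partial(generalize_words, word2vec_distance=lambda a, b: 0.5)
      </preformat>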
<p>
        At the discourse level, rhetorical relations over elementary discourse units can be generalized as well.
Only rhetorical relations of the same class (a presentational relation, such as antithesis; a subject-matter
relation, such as condition; or a multinuclear relation, such as list) can be generalized. We use N for a
nucleus or the situations presented by this nucleus, and S for a satellite or the situations presented by this satellite.
Situations are propositions, completed actions or actions in progress, and communicative actions and states
(including beliefs, desires, approve, explain, reconcile and others). Hence we have the following
expression for the Rhetorical Structure Theory-based
        <xref ref-type="bibr" rid="ref7">(RST, Marcu, 2000)</xref>
        generalization of two texts text1 and
text2:
text1 ^ text2 = ∪i,j (rstRelation1i(…, …) ^ rstRelation2j(…, …)),
where i ∈ {RST relations in text1} and j ∈ {RST relations in text2}. Further, for a pair of RST relations,
their generalization looks as follows: rstRelation1(N1, S1) ^ rstRelation2(N2, S2) = (rstRelation1 ^
rstRelation2)(N1^N2, S1^S2).
      </p>
<p>The texts in N1, S1 are subject to generalization as phrases. The rules for rst1 ^ rst2 are as follows: if
relation_type(rst1) != relation_type(rst2), then the similarity is empty. Otherwise, we generalize the
signatures of the rhetorical relations as sentences: sentence(N1, S1) ^ sentence(N2, S2) (Iruskieta et al., 2015).</p>
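<p>A sketch of the rst1 ^ rst2 rule, with a hypothetical relation-class table (a real RST parser would supply the full inventory) and gen_text standing for the sentence-level generalization above:</p>
      <preformat>
# Hypothetical (partial) relation-class table.
RELATION_CLASS = {
    "antithesis": "presentational",
    "condition": "subject-matter",
    "list": "multinuclear",
    "contrast": "multinuclear",
}

def generalize_rst(rel1, rel2, gen_text):
    """Generalize two relations given as (name, nucleus, satellite)
    triples; relations of different classes do not generalize."""
    name1, n1, s1 = rel1
    name2, n2, s2 = rel2
    if RELATION_CLASS.get(name1) != RELATION_CLASS.get(name2):
        return None
    name = name1 if name1 == name2 else RELATION_CLASS[name1]
    # Nucleus and satellite texts are generalized as phrases.
    return (name, gen_text(n1, n2), gen_text(s1, s2))
      </preformat>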
<p>
        To optimize the calculation of the generalization score, we rely on a computational study that
determined the POS weights delivering the most accurate similarity measure between sentences
possible
        <xref ref-type="bibr" rid="ref1">(Galitsky et al., 2012)</xref>
        . The problem was formulated as finding optimal weights for nouns,
adjectives, verbs and their forms (such as gerund and past tense) such that the resultant search
relevance is maximal. Search relevance was measured as a deviation in the order of search results from
the best one for a given query (delivered by Google); the current search order was determined by
the score of generalization for the given set of POS weights (with the other generalization parameters
fixed). As a result of the optimization performed in
        <xref ref-type="bibr" rid="ref1">(Galitsky et al., 2012)</xref>
        , we obtained WNN = 1.0,
WJJ = 0.32, WRB = 0.71, WCD = 0.64, WVB = 0.83, WPRP = 0.35, excluding common frequent verbs like
get/take/set/put, for which WVBcommon = 0.57. We also set W&lt;POS,*&gt; = 0.2 (different words but the
same POS) and W&lt;*,word&gt; = 0.3 (the same word occurring as different POSs in the two sentences).
The generalization score (or similarity between sentences sent1, sent2) can then be expressed as the sum
over phrases of the weighted sum over the word generalizations of word_sent1 and word_sent2:
score(sent1, sent2) = ∑ phrase ∈ {NP, VP, …} ∑ WPOS · word_gen(word_sent1, word_sent2).
The best generalization can
then be defined as the one with the highest score. In this way we define generalization for phrases,
sentences and paragraphs, as well as for verb signatures from VerbNet, speech acts, communicative actions,
and rhetorical relations.
      </p>
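<p>A sketch of the scoring computation with the weights quoted above; the node encoding (word, POS, score) follows the word-level rule earlier, while node_weight is our assumption rather than the exact implementation of (Galitsky et al., 2012):</p>
      <preformat>
# POS weights quoted above.
POS_WEIGHTS = {"NN": 1.0, "JJ": 0.32, "RB": 0.71, "CD": 0.64,
               "VB": 0.83, "PRP": 0.35}
W_COMMON_VERB = 0.57          # get/take/set/put
W_SAME_POS_DIFF_WORD = 0.2    # node (*, POS, score)
W_SAME_WORD_DIFF_POS = 0.3    # same word, different POS
COMMON_VERBS = {"get", "take", "set", "put"}

def node_weight(word, pos):
    """Weight of one generalization node (word, POS, score)."""
    if word == "*":
        return W_SAME_POS_DIFF_WORD
    if pos == "*":
        return W_SAME_WORD_DIFF_POS
    if pos.startswith("VB") and word in COMMON_VERBS:
        return W_COMMON_VERB
    # Exact tag first, then the 2-letter tag family (NNS to NN, etc.).
    return POS_WEIGHTS.get(pos, POS_WEIGHTS.get(pos[:2], W_SAME_POS_DIFF_WORD))

def generalization_score(gen_phrases):
    """Sum over phrases (NP, VP, ...) of the weighted sum over
    word generalization nodes."""
    return sum(node_weight(word, pos) * word_score
               for phrase in gen_phrases
               for (word, pos, word_score) in phrase)
      </preformat>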
<p>
        The result of a generalization can be further generalized with other parse trees or generalizations. For a
set of sentences, the totality of generalizations forms a lattice: the order on generalizations is set by the
subsumption relation and the generalization score
        <xref ref-type="bibr" rid="ref5">(Khardon and Arias, 2006)</xref>
        .
      </p>
    </sec>
    <sec id="sec-2">
      <title>Question Answering Relevance via Generalization</title>
<p>
        To rank answers according to their relevance to a complex, paragraph-sized question, paragraph-level
generalization comes into play. It needs to involve a set of parse trees for each sentence of the
paragraph as well as the paragraph-level discourse information. We refer to the sequence of parse trees, plus a
number of arcs for inter-sentence discourse relations between the nodes for words, as a
parse thicket
        <xref ref-type="bibr" rid="ref2">(Galitsky, 2017)</xref>
        . It is a graph that includes the parse tree for each sentence, as well as
additional arcs for inter-sentence discourse relationships. We intend to demonstrate the richness of
the parse thicket representation and the robustness of the syntactic generalization operation on it for search
relevance.
      </p>
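<p>A minimal sketch of a parse thicket as a data structure, under the assumption that parse trees are stored per sentence and discourse links are typed arcs between word nodes:</p>
      <preformat>
from dataclasses import dataclass, field

@dataclass
class ThicketArc:
    """Inter-sentence arc: an RST relation, anaphora, or a link
    between communicative actions."""
    kind: str      # e.g. "rst-contrast", "anaphora", "comm-action"
    source: tuple  # (sentence_index, word_index)
    target: tuple

@dataclass
class ParseThicket:
    """Parse trees for each sentence plus inter-sentence arcs."""
    parse_trees: list = field(default_factory=list)
    arcs: list = field(default_factory=list)
      </preformat>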
<p>Our example is a web search query and its answers selected from the first page of a Google search.
Although both answers A1 and A2 share very similar keywords, we show that using discourse
information helps to differentiate them, relying on the parse thicket representation and the syntactic
generalization operation with the query Q.</p>
      <p>Q: I am buying a foreclosed house. A bank offered me to waive inspection; however I am afraid I will
not identify some problems in this property unless I call a specialist.</p>
      <p>A1: My wife and I are buying a foreclosure from a bank. In return for accepting a lower offer, they
want me to waive the inspection. I prefer to let the bank know that I would not waive the inspection …
Instead I would agree that I would pay the costs involved for a proper and thorough home
inspection…
A2: I am a foreclosure specialist in a bank which is subject to an inspection. FTC offered us to waive
inspection if we can identify our potential problems with customers we lent money to buy their
properties.</p>
      <p>The reader can see that A2 is totally irrelevant while A1 is relevant.</p>
      <p>We selected the first Google search result for the correct answer and composed a totally irrelevant
answer with the same keywords to demonstrate the role of discourse-level generalization.</p>
<p>The list of common keywords gives us a hint that both documents are about a relationship between
the same entities: a house, a buyer and a bank, in connection with a foreclosure and an inspection.
However, one can see that the relations between these entities in A1 and A2 are totally different. It is
also obvious that something beyond keyword statistics and n-gram analysis needs to be done to
figure out the correspondence between the structure of these relations in A1 and Q, and in A2 and Q.
The common keywords are buy, foreclosure, house, bank, waive, inspection.</p>
<p>One can see that the key to the right answer here is the rhetorical (RST) relation of contrast: the bank
wants the inspection waived but the buyer does not. Parse thicket generalization gives a detailed
similarity picture that looks more complete than either the bag-of-words approach or pair-wise
sentence generalization would. The similarity between Q and A1 is a parse thicket,
expressed here as a list of phrases:
[[NP [DT-a NN-bank ], NP [NNS-problems ], NP [NN*-property ], NP [PRP-i ]], VP [VB-* TO-to
NN-inspection ], VP [NN-bank VB-offered PRP-* TO-to VB-waive NN-inspection ], VP [VB-*
VB-identify NNS-problems IN-* NN*-property ], VP [VB-* {phrStr=[], roles=[A, *, *], phrDescr=[]}
DT-a NN-* ]]]
The similarity with the invalid answer A2 is a parse thicket formed as a list of phrases:
[[NP [DT-a NN-bank ], NP [PRP-i ]], [VP [VB-* VB-buying DT-a ], VP [VB-* PRP-me TO-to
VB-waive NN-inspection ], VP [VB-* {phrStr=[], roles=[], phrDescr=[]} PRP-i MD-* RB-not VB-* DT-*
NN*-* ]]</p>
<p>The important phrases of the Q ^ A1 similarity are VP [NN-bank VB-offered PRP-* TO-to VB-waive
NN-inspection] and VP [VB-* VB-identify NNS-problems IN-* NN*-property],
which can be interpreted as a key topic of both Q and A1: the bank, and not another entity, should offer to
waive the inspection. This is what differentiates A1 from A2 (where these phrases are absent). Although
bank and problems do not occur in the same sentences in Q and A1, they are linked by anaphora and
RST relations. Without any kind of discourse analysis, it would be hard to verify whether the phrases
containing bank and problems are related to each other. Notice that in A2, problems are associated with
customers, not banks, and rhetorical relations different from those common between Q and A1 help us
figure that out. Notice the semantic role attributes for verbs, such as VB-* {phrStr=[], roles=[A, *, *],
phrDescr=[]}, in the generalization result.</p>
<p>Parse thickets for Q, A1 and A2 are shown in Fig. 1a, 1b and 1c respectively. Notice the similarity in
the discourse structure of Q and A1 that is absent in A2: the RST-contrast arc. Also, there is a link for pairs of
communicative actions in Q and A1 (absent in A2): afraid-call and accept-want.</p>
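<p>Given such representations, ranking reduces to scoring the generalization of each candidate answer's parse thicket against the query's; a sketch, assuming a thicket_generalize operation built from the phrase- and discourse-level generalizations above:</p>
      <preformat>
def rank_answers(query_thicket, answer_thickets, thicket_generalize,
                 score):
    """Order candidate answers by the generalization score of their
    parse thickets against the query's parse thicket."""
    scored = [(score(thicket_generalize(query_thicket, a)), a)
              for a in answer_thickets]
    # Sort by score only, highest first.
    return [a for s, a in sorted(scored, key=lambda p: -p[0])]
      </preformat>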
      <sec id="sec-2-1">
<title>Conclusions and Future Work</title>
<p>The generalization operation described earlier can be applied to expand training sets.
It can multiply tail cases, make a dataset more balanced, and eliminate noisy samples that cannot be
generalized. We are planning to apply it in a number of industrial linguistic applications with
limited training datasets.</p>
      </sec>
      <sec id="sec-2-2">
<title>Acknowledgements</title>
        <p>The article was prepared within the framework of the HSE University Basic Research Program and
funded by the Russian Academic Excellence Project '5-100'.</p>
</sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          <string-name>
            <surname>Galitsky</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gabor</surname>
            <given-names>Dobrocsi</given-names>
          </string-name>
          , Josep Lluis de la Rosa,
          <year>2012</year>
          .
          <article-title>Inferring the semantic properties of sentences by mining syntactic parse trees</article-title>
          .
          <source>Data &amp; Knowledge Engineering</source>
          ,
          <fpage>V81</fpage>
          -82 pp
          <fpage>21</fpage>
          -
          <lpage>45</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          <string-name>
            <surname>Galitsky</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          <year>2017</year>
          .
          <article-title>Matching parse thickets for open domain question answering</article-title>
          .
          <source>Data &amp; Knowledge Engineering</source>
          , v 107, pp.
          <fpage>24</fpage>
          -
          <lpage>50</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          <string-name>
            <surname>Robinson</surname>
            <given-names>JA.</given-names>
          </string-name>
          <article-title>A machine-oriented logic based on the resolution principle</article-title>
          .
          <source>Journal of the Association for Computing Machinery</source>
          ,
          <volume>12</volume>
          :
          <fpage>23</fpage>
          -
          <lpage>41</lpage>
          ,
          <year>1965</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          <string-name>
            <surname>Plotkin</surname>
            ,
            <given-names>GD.</given-names>
          </string-name>
          <article-title>A note on inductive generalization</article-title>
          . In B. Meltzer and D. Michie, editors,
          <source>Machine Intelligence</source>
          , volume
          <volume>5</volume>
          , pages
          <fpage>153</fpage>
          -
          <lpage>163</lpage>
          . Elsevier North-Holland, New York,
          <year>1970</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          <string-name>
            <surname>Khardon</surname>
            , Roni and
            <given-names>Marta</given-names>
          </string-name>
          <string-name>
            <surname>Arias</surname>
          </string-name>
          .
          <year>2006</year>
          .
          <article-title>The subsumption lattice and query learning</article-title>
          .
          <source>Journal of Computer and System Sciences. v 72</source>
          ,
          <string-name>
            <surname>Issue</surname>
            <given-names>1</given-names>
          </string-name>
          ,
          <year>February 2006</year>
          , pp
          <fpage>72</fpage>
          -
          <lpage>94</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          <string-name>
            <surname>Mikolov</surname>
            , Tomas, Chen, Kai, Corrado;
            <given-names>G.S.</given-names>
          </string-name>
          ,
          <source>Dean; Jeffrey</source>
          .
          <year>2015</year>
          .
          <article-title>Computing numeric representations of words in a high-dimensional space</article-title>
          .
          <source>US Patent 9</source>
          ,
          <issue>037</issue>
          ,
          <fpage>464</fpage>
          ,
          <string-name>
            <surname>Google</surname>
          </string-name>
          , Inc.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          <string-name>
            <surname>Marcu</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          <year>2000</year>
          .
          <article-title>Rhetorical Parsing of Unrestricted Texts. Computational Linguistics V 2 N3</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          <string-name>
            <surname>Ettinger</surname>
          </string-name>
          , Allyson, Sudha Rao,
          <string-name>
            <surname>Hal Daumé</surname>
            <given-names>III</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Emily Bender</surname>
          </string-name>
          .
          <source>Towards Linguistically Generalizable NLP Systems: A Workshop and Shared Task</source>
          .
          <year>2017</year>
          . EMNLP, Vancouver, Canada.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          <string-name>
            <surname>Kovalerchuk</surname>
            ,
            <given-names>B</given-names>
          </string-name>
          and
          <string-name>
            <surname>Kovalerchuk</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          <year>2017</year>
          .
          <article-title>Toward Virtual Data Scientist with Visual Means</article-title>
          . IJCNN.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          <string-name>
            <given-names>T.</given-names>
            <surname>Makhalova</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Ilvovsky</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Galitsky</surname>
          </string-name>
          .
          <article-title>Pattern Structures for News Clustering</article-title>
          .
          <source>FCA4AI@ IJCAI</source>
          ,
          <fpage>35</fpage>
          -42
          <string-name>
            <given-names>B.</given-names>
            <surname>Ganter</surname>
          </string-name>
          and
          <string-name>
            <given-names>S.O.</given-names>
            <surname>Kuznetsov</surname>
          </string-name>
          , Pattern Structures and
          <string-name>
            <given-names>Their</given-names>
            <surname>Projections</surname>
          </string-name>
          .
          <source>ICCS 2001</source>
          . Vol.
          <volume>2120</volume>
          , pp.
          <fpage>129</fpage>
          -
          <lpage>142</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          <string-name>
            <given-names>S.O.</given-names>
            <surname>Kuznetsov</surname>
          </string-name>
          ,
          <article-title>Fitting Pattern Structures to Knowledge Discovery in Big Data</article-title>
          .
          <source>ICFCA 2013</source>
          . Vol.
          <volume>7880</volume>
          , pp.
          <fpage>254</fpage>
          -
          <lpage>266</lpage>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          <string-name>
            <given-names>S.O.</given-names>
            <surname>Kuznetsov</surname>
          </string-name>
          ,
          <article-title>Scalable Knowledge Discovery in Complex Data with Pattern Structures</article-title>
          .
          <source>PReMI'2013</source>
          . Vol.
          <volume>8251</volume>
          , pp.
          <fpage>30</fpage>
          -
          <lpage>41</lpage>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>