<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
<article-title>Graphs and Commonsense Knowledge Improve the Dialogue Reasoning Ability</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Minglei Gao</string-name>
          <email>minglei_gao3@tju.edu.cn</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Sai Zhang</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Xiaowang Zhang</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Zhiyong Feng</string-name>
          <email>zyfeng@tju.edu.cn</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Wenhuan Lu</string-name>
          <email>wenhuan@tju.edu.cn</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>College of Intelligence and Computing, Shenzhen Research Institute of Tianjin University, Tianjin University</institution>
          ,
          <addr-line>Tianjin</addr-line>
          ,
          <country country="CN">China</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>College of Intelligence and Computing, Tianjin University</institution>
          ,
          <addr-line>Tianjin 300350</addr-line>
          ,
          <country country="CN">China</country>
        </aff>
      </contrib-group>
      <abstract>
<p>Response retrieval is a subtask of dialogue systems. Existing methods focus on semantic matching, but their reasoning ability is insufficient: the implicit feature associations between the context and the candidate responses are not discovered, and precisely these implicit associations are the key to reasoning. In this paper, we propose a new approach based on commonsense knowledge combined with graph features. We exploit the advantages of graph structure for reasoning by placing the context and the candidate responses in the same graph and using commonsense knowledge to make the associated features explicit, thereby improving the dialogue system's reasoning ability. Experiments show good performance through the effective combination of commonsense knowledge and graph structure.</p>
      </abstract>
      <kwd-group>
<kwd>Retrieval Responses</kwd>
        <kwd>Reasoning</kwd>
        <kwd>GCN</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
<title>Introduction</title>
<p>Response retrieval is an important approach in dialogue systems. Its goal is to select the most suitable response for a given context. Retrieved responses are often fluent and natural and carry abundant information. Successful retrieval makes the dialogue proceed more accurately and smoothly, and better enhances the user experience.</p>
<p>
        Previous work mainly concentrated on the matching relationship between the context and the candidate responses. It is well known that neural networks can learn rich, multi-level semantic information [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], but their reasoning ability and their capacity to capture commonsense information are insufficient [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. Reasoning requires learning key semantic features and drawing effective inferences from the relationships between them. However, the feature information in the contextual semantics alone is not enough to support effective reasoning, which makes this problem interesting to explore.
      </p>
<p>Copyright © 2021 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).</p>
      <p>In this paper, we propose a graph reasoning model based on commonsense knowledge. Specifically, we extract the key information from the context and the candidate responses, expand it explicitly with commonsense knowledge, and construct a graph from the grammatical information of the text; effective reasoning over this structured graph information improves the reasoning ability of the model. By adding commonsense knowledge to the graph structure, an effective connection is established between the context and the candidate response, which further helps reasoning. The experimental results show that, with the help of graph structure and commonsense knowledge, our model's reasoning ability is dramatically improved.</p>
    </sec>
    <sec id="sec-2">
      <title>Our Approach</title>
      <sec id="sec-2-1">
        <title>Problem Definition</title>
<p>Given a dataset D = {(U, C)}, where U = {u_1, u_2, …, u_n} represents the dialogue context and C = {c_1, c_2, c_3, c_4} is the set of response candidates, the model is expected to learn a function f(U, c_i) that evaluates the relevance between U and c_i.</p>
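To make the setup concrete, response selection reduces to scoring every candidate with the learned function and taking the argmax. A minimal sketch, where `score` is a hypothetical stand-in for the trained model's f(U, c_i):

```python
def select_response(score, context, candidates):
    """Return the candidate c_i that maximizes the relevance f(U, c_i).

    `score` is a hypothetical stand-in for the learned scoring function;
    `context` is the utterance list U, `candidates` is the list C.
    """
    return max(candidates, key=lambda c: score(context, c))


# Toy usage with a word-overlap scorer standing in for the learned f.
overlap = lambda ctx, c: len(set(" ".join(ctx).split()) & set(c.split()))
best = select_response(overlap, ["do you like tea"],
                       ["i like tea", "nice car"])
```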
      </sec>
      <sec id="sec-2-2">
        <title>Reasoning Graph</title>
<p>We construct the context and the candidate responses into a single graph. Specifically, a grammar-parsing tool is used to analyze the context information and the candidate response. By merging the co-occurring word nodes of the context and the response, an effective connection is established in the graph. In addition, the ConceptNet knowledge base is used to find nodes related to the original nodes. ConceptNet is an extensive commonsense knowledge base containing a large number of nodes and relationships. We tag each context by part of speech, select verbs and nouns as key nodes, query the nodes surrounding these key nodes in ConceptNet, and add them to the graph. In particular, we delete the nodes whose credibility weight is less than 1 to obtain the final analysis graph. With the help of ConceptNet, the graph nodes obtain richer commonsense information, which further helps improve reasoning ability.</p>
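The construction above can be sketched in a few lines of Python. This is only an illustration: `conceptnet_edges` is a hypothetical stand-in for real ConceptNet queries, and adjacent-token edges stand in for the output of a full grammar parse.

```python
from collections import defaultdict

def build_reasoning_graph(context_tokens, response_tokens, conceptnet_edges):
    """Adjacency-set sketch of the reasoning-graph construction.

    context_tokens / response_tokens: lists of (word, POS-tag) pairs, as a
    grammar parser would produce. conceptnet_edges: hypothetical stand-in
    for ConceptNet lookups, mapping word -> [(neighbor, weight), ...].
    """
    adj = defaultdict(set)

    def add_edge(a, b):
        adj[a].add(b)
        adj[b].add(a)

    # Nodes are keyed by word, so a word co-occurring in the context and
    # the response collapses into one node that bridges the two sides.
    for seq in (context_tokens, response_tokens):
        for (w1, _), (w2, _) in zip(seq, seq[1:]):
            add_edge(w1, w2)

    # Expand key nodes (verbs and nouns) with commonsense neighbors,
    # deleting candidates whose credibility weight is below 1.
    for word, pos in context_tokens + response_tokens:
        if pos in ("VERB", "NOUN"):
            for neighbor, weight in conceptnet_edges.get(word, []):
                if weight >= 1:
                    add_edge(word, neighbor)
    return adj
```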
      </sec>
      <sec id="sec-2-3">
        <title>Model Overview</title>
        <p>
The overall model is exhibited in Figure 1. Our model includes a semantic representation module and a graph structure representation module. The semantic representation module uses Xlnet [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ], a pretrained language model, and the graph structure representation module uses a Graph Convolutional Neural Network (GCN) [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ]. A GCN can integrate and learn the feature information of a node and its connected nodes, making node representations more abundant; we use it to integrate the expanded commonsense knowledge into our core nodes. Specifically, the feature representation of each node is obtained from
Xlnet's context representation. The pretrained language model fully considers each token's context when producing its representation, so the representation is more accurate.
        </p>
        <p>[Figure 1: Model overview. The input "Article + [SEP] + Options + [SEP]" is encoded by Xlnet to obtain the node representations; Grammar Parsing and ConceptNet construct the reasoning graph, which a GCN with an attention mechanism processes to produce the results.]</p>
        <sec id="sec-2-3-2">
          <p>h_k = W ( (1 / |g_k|) Σ_{n_i ∈ g_k} h_{n_i} )   (1)</p>
          <p>where g_k = {n_0, …, n_t} is the set of nodes connected to the k-th node, |g_k| is the number of connected nodes, h_{n_i} is the representation of token n_i, and W ∈ R^{d×k} is the weight matrix.</p>
          <p>To make the features of the nodes more distinctive, each node aggregates the information of its nearby nodes. Here l is the layer index of the GCN, z_i^l is the aggregated result, N_i is the set of nodes connected to the i-th node, and h_j^l is the representation of the j-th node. In this way we obtain the neighbor information and the updated node representation h_i^{l+1}.</p>
          <p>z_i^l = Σ_{j ∈ N_i} (1 / |N_i|) h_j^l   (2)</p>
          <p>h_i^{l+1} = W^l h_i^l + z_i^l   (3)</p>
          <p>α_i^l = (h_c · W_1 h_i^l) / Σ_{j ∈ N} (h_c · W_1 h_j^l)   (4)</p>
          <p>In addition, we add an attention mechanism to learn the importance of each node; with the help of attention, the reasoning ability of the model is effectively improved.</p>
          <p>h_g = Σ_{i ∈ N} α_i^l h_i^l   (5)</p>
          <p>where h_c is the representation of the context, α_i^l is the attention weight of the i-th node, and h_g is the graph representation.</p>
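Assuming node features are stored as a NumPy matrix, the GCN update and attention readout above (Eqs. (2)-(5)) can be sketched as follows; all names and shapes are illustrative, and the plain-ratio normalization follows the formula as written rather than a softmax.

```python
import numpy as np

def gcn_layer(h, neighbors, W):
    """One propagation step following Eqs. (2)-(3): each node aggregates
    the mean of its neighbors' features, then adds its own linearly
    transformed feature. h: (n, d) node features; neighbors: list of
    neighbor-index lists; W: (d, d) layer weight matrix."""
    z = np.zeros_like(h)
    for i, nbrs in enumerate(neighbors):
        if nbrs:                        # Eq. (2): mean over connected nodes
            z[i] = h[nbrs].mean(axis=0)
    return h @ W.T + z                  # Eq. (3): W^l h_i^l + z_i^l

def attention_pool(h, h_c, W1):
    """Graph readout following Eqs. (4)-(5): score each node against the
    context representation h_c, normalize the scores as a plain ratio,
    and take the weighted sum of node features."""
    scores = h @ W1.T @ h_c             # h_c · W1 h_i for every node i
    alpha = scores / scores.sum()       # Eq. (4)
    return alpha @ h                    # Eq. (5): h_g
```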
        </sec>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Experiments</title>
      <p>
        We test our proposed model on MuTual [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], a dataset constructed from listening-test data. We use three evaluation metrics: R@1, R@2, and MRR. R@1 and R@2 are the recall at positions 1 and 2 among the 4 candidates, and MRR is the Mean Reciprocal Rank.
      </p>
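These metrics can be computed directly from the 1-based rank that the correct response receives among the 4 candidates; a minimal sketch:

```python
def recall_at_k(rank, k):
    """rank: 1-based position of the correct response among the candidates."""
    return 1.0 if rank <= k else 0.0

def evaluate(ranks):
    """Compute R@1, R@2 and MRR over a list of 1-based ranks,
    as used for evaluation on MuTual."""
    n = len(ranks)
    r1 = sum(recall_at_k(r, 1) for r in ranks) / n
    r2 = sum(recall_at_k(r, 2) for r in ranks) / n
    mrr = sum(1.0 / r for r in ranks) / n
    return r1, r2, mrr
```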
      <p>Our experimental comparison mainly includes the original Xlnet model, a variant with only commonsense knowledge node information and no graph structure, and our model. The experimental results are as follows. We can see that with only commonsense knowledge and no graph structure information, the model's performance deteriorates compared with the original Xlnet model: the commonsense knowledge consists of isolated, unconnected pieces of information, which is equivalent to introducing more noise. This noise increases the difficulty of feature extraction and degrades the model. When the graph structure is added, the isolated points are effectively combined, and the reasoning ability of the model improves through learning the relationships between the nodes.</p>
      <p>[Figure 2: Experimental results (MRR) for the original Xlnet model ("Origin"), commonsense nodes without graph structure ("Node + Origin"), and our model.]</p>
      <p>In this paper, we propose a new approach based on commonsense knowledge combined with graph reasoning to solve dialogue reasoning problems. The text is constructed into graphs through grammatical analysis, combined with the expansion of commonsense knowledge, so that the relevance between the context and the candidate response becomes more apparent, and the superiority of graphs in reasoning is fully exploited to enhance the reasoning ability of the model. In future work, we will try more model variants to test our approach.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Jawahar</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sagot</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Seddah</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>What does BERT learn about the structure of language?</article-title>
          .
          <source>Proceedings of the 57th Conference of the Association for Computational Linguistics, Volume 1: Long Papers</source>
          . pp.
          <fpage>3651</fpage>
          -
          <lpage>3657</lpage>
          . ACL, Italy, (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Zhou</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dong</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Liu</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhao</surname>
            ,
            <given-names>W.X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yu</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wu</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          :
          <article-title>Multi-turn response selection for chatbots with deep attention matching network</article-title>
          .
          <source>Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics</source>
          , Volume
          <volume>1</volume>
          : Long Papers
          . pp.
          <fpage>1118</fpage>
          -
          <lpage>1127</lpage>
          . ACL, Melbourne, Australia (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Yang</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dai</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yang</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Carbonell</surname>
            ,
            <given-names>J.G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Salakhutdinov</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Le</surname>
            ,
            <given-names>Q.V.</given-names>
          </string-name>
          :
          <article-title>Xlnet: Generalized autoregressive pretraining for language understanding</article-title>
          .
          <source>Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems</source>
          <year>2019</year>
          , pp.
          <fpage>5754</fpage>
          -
          <lpage>5764</lpage>
          , NeurIPS, Vancouver, BC, Canada (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Defferrard</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bresson</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vandergheynst</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>Convolutional neural networks on graphs with fast localized spectral filtering</article-title>
          .
          <source>Advances in Neural Information Processing Systems 29: Annual Conference on Neural Information Processing Systems</source>
          <year>2016</year>
          , pp.
          <fpage>3837</fpage>
          -
          <lpage>3845</lpage>
          , Barcelona, Spain (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Cui</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wu</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Liu</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhang</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhou</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>MuTual: A dataset for multi-turn dialogue reasoning</article-title>
          .
          <source>Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics</source>
          , pp.
          <fpage>1406</fpage>
          -
          <lpage>1416</lpage>
          . ACL, Online
          (
          <year>2020</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>