<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Neural Variational Entity Set Expansion for Automatically Populated Knowledge Graphs</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Pushpendre Rastogi</string-name>
          <email>pushpendre@jhu.edu</email>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Adam Poliak</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Vince Lyzinski</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Benjamin Van Durme</string-name>
        </contrib>
      </contrib-group>
      <fpage>49</fpage>
      <lpage>50</lpage>
      <abstract>
        <p>We propose Neural Variational Set Expansion (NVSE) to extract actionable information from a noisy knowledge graph (KG), and present a general approach for increasing the interpretability of recommendation systems. We demonstrate the usefulness of applying a variational autoencoder to the Entity Set Expansion task on a realistic, automatically generated KG.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title />
      <p>Imagine a physician trying to pinpoint a specific diagnosis or a journalist investigating abuses of governmental power. In both scenarios, a domain expert may try to find answers based on prior known, relevant entities: either a list of diagnoses with symptoms similar to those the patient is experiencing, or a list of known conspirators. Instead of manually looking for connections between potential answers and prior knowledge, a searcher would like to rely on an automatic Recommender to find the connections and answers for them, i.e. related entities.</p>
      <p>In the information retrieval (IR) community, Entity Set Expansion (ESE) is the established task of recommending entities that are similar to a provided seed set of entities. The physician and journalist in our example cannot fully take advantage of IR advances in ESE for two main reasons: recent advances 1) often assume access to a clean, large Knowledge Graph and 2) are uninterpretable.</p>
      <p>
        Many advanced ESE algorithms rely on manually curated, clean Knowledge Graphs (KGs). In real-world settings, users rarely have access to clean KGs, and instead may rely on automatically generated KGs. Such KGs are often noisy because they are created by complicated and error-prone NLP processes. For example, automatic KGs may include duplicate entities, associations (relations) between entities may be missing, and entities with similar names may be incorrectly disambiguated. These imperfections prevent machine learning approaches from performing well on automatically generated KGs. Furthermore, many ESE algorithms degrade as the sparsity and unreliability of KGs increase [
        <xref ref-type="bibr" rid="ref2 ref3">2, 3</xref>
        ]. Advanced ESE methods, especially those that rely on neural networks, are uninterpretable [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. If a physician cannot explain decisions, patients may not trust her, and if a journalist cannot demonstrate how a certain individual is acting unethically or above the law, a resulting article may lack credibility. Furthermore, uninterpretability may limit the applications of advancements in IR, and more broadly artificial intelligence, as humans "won't trust an A.I. unless it can explain itself."
      </p>
      <p>We introduce Neural Variational Set Expansion (NVSE) to advance the applicability of ESE research. NVSE is an unsupervised model based on Variational Autoencoders (VAEs) that receives a query, uses a Bayesian approach to determine a latent concept that unifies the entities in the query, and returns a ranked list of similar entities based on the previously determined unified latent concept. NVSE does not require supervised examples of queries and responses, nor pre-built clusters of entities. Instead, our method only requires sentences with linked entity mentions, i.e. spans of tokens associated with a KG entity, which are often included in automatically generated KGs.
</p>
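To make the idea concrete, the following is a minimal, purely illustrative sketch of VAE-style set expansion in the spirit described above. It is not the authors' NVSE implementation: the toy entity embeddings, the `encode` and `expand` helpers, and the cosine-similarity scoring are all assumptions standing in for a trained amortized encoder and decoder.

```python
# Hypothetical sketch (not the authors' NVSE model): encode a query
# set of entities into a Gaussian posterior over a latent "concept",
# then rank the remaining KG entities against the posterior mean.
import numpy as np

rng = np.random.default_rng(0)

# Toy KG: 6 entities with 4-dimensional embeddings, which in practice
# would be learned from sentences with linked entity mentions.
entities = ["flu", "cold", "measles", "paris", "london", "tokyo"]
E = rng.normal(size=(6, 4))
E[:3] += 2.0   # make the disease entities cluster together
E[3:] -= 2.0   # and the city entities cluster elsewhere

def encode(query_ids, W_mu, W_logvar):
    """Amortized encoder: average the query embeddings, then project
    to the mean and log-variance of the latent concept posterior."""
    h = E[query_ids].mean(axis=0)
    return W_mu @ h, W_logvar @ h

def expand(query_ids, W_mu=np.eye(4), W_logvar=np.zeros((4, 4))):
    """Rank all non-query entities by cosine similarity to the
    posterior mean (a stand-in for a decoder's reconstruction score)."""
    mu, _ = encode(query_ids, W_mu, W_logvar)
    scores = E @ mu / (np.linalg.norm(E, axis=1) * np.linalg.norm(mu))
    order = np.argsort(-scores)
    return [entities[i] for i in order if i not in query_ids]

print(expand([0, 1]))  # seed: {flu, cold} -> diseases should rank first
```

With identity projections the "posterior" collapses to the averaged query embedding; a real model would learn `W_mu` and `W_logvar` jointly with a decoder by maximizing the evidence lower bound, and the Bayesian treatment of the latent concept is what lets the ranking come with an interpretable summary of what the query has in common.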
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <surname>Mitra</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Craswell</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          :
          <article-title>Neural Models for Information Retrieval</article-title>
          . ArXiv e-prints (May
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <surname>Pujara</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Augustine</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Getoor</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          :
          <article-title>Sparsity and noise: Where knowledge graph embeddings fall short</article-title>
          .
          <source>In: EMNLP</source>
          . (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <surname>Rastogi</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lyzinski</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Van Durme</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          :
          <article-title>Vertex nomination on the cold start knowledge graph</article-title>
          .
          <source>Technical report, Human Language Technology Center of Excellence</source>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>