Seq2RDF: An End-to-End Application for
Deriving Triples from Natural Language Text

    Yue Liu, Tongtao Zhang, Zhicheng Liang, Heng Ji, Deborah L. McGuinness

          Department of Computer Science, Rensselaer Polytechnic Institute

Abstract. We present an end-to-end approach that takes unstructured
textual input and generates structured output compliant with a given
vocabulary. We treat the triples within a given knowledge graph as an
independent graph language and propose an encoder-decoder framework
with an attention mechanism that leverages knowledge graph embeddings.
Our model learns the mapping from natural language text to triple
representation in the form of subject-predicate-object using the
selected knowledge graph vocabulary. Experiments on three different
datasets show that we achieve competitive F1-measures over the baselines
using our simple yet effective approach. A demo video is included.


1     Introduction
Converting free text into usable structured knowledge for downstream applications
usually requires expert human curators, or relies on the ability of machines
to accurately parse natural language according to the meanings in the knowledge
graph (KG) vocabulary. Despite many advances in text extraction and semantic
technologies, there is not yet a simple system that generates RDF triples
from free text given a chosen KG vocabulary in a single step, which is what we
consider an end-to-end system. We aim to automate the process of translating a
natural language sentence into a structured triple representation of the form
subject-predicate-object (s-p-o for short), and we build an end-to-end model
based on an encoder-decoder architecture that learns the semantic parsing
process from text to triple without tedious feature engineering or intermediate
steps. We evaluate our approach on three different datasets and achieve
competitive F1-measures, outperforming our proposed baselines on each. The
system, dataset, and demo are publicly available.1,2

1 https://github.com/YueLiu/NeuralTripleTranslation
2 https://youtu.be/ssiQEDF-HHE

2     Our Approach
Inspired by the sequence-to-sequence model [5] from recent Neural Machine
Translation, we use this model to bridge the gap between natural language
and triple representation. We consider a natural language sentence
X = [x_1, ..., x_{|X|}] as the source sequence, and we aim to map X to an RDF
triple Y = [y_1, y_2, y_3], corresponding to s-p-o, as a target sequence that
is aligned with a given KG vocabulary set or schema. Taking DBpedia as an
example, we use a large number of existing DBpedia triples as ground-truth
facts for training. Our model learns how to form a compliant triple with
appropriate terms from the existing vocabulary. Furthermore, the architecture
of the decoder enables the model to capture the differences, dependencies, and
constraints that arise when selecting the subject, predicate, and object
respectively, which makes the model a natural fit for this learning task.
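For concreteness, the sketch below shows one such (source, target) training pair, using the example sentence from Figure 1 below; the Python representation and the particular DBpedia terms shown are illustrative assumptions, not the system's actual data format.

```python
# One illustrative training pair: a natural language sentence (source
# sequence) and a gold RDF triple whose three terms all come from the
# chosen KG vocabulary (here, DBpedia).
source_sentence = "Lake George is at the southeast base of the Adirondack Mountains"
target_triple = (
    "dbr:Lake_George_(New_York)",   # subject:   a DBpedia resource
    "dbo:location",                 # predicate: a DBpedia ontology property
    "dbr:Adirondack_Mountains",     # object:    a DBpedia resource
)
```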

[Figure 1 here: the sentence "Lake George is at the southeast base of the Adirondack Mountains" is fed to a bi-directional LSTM encoder whose forward and backward states are concatenated; the decoder attends over these states while choosing among candidate subjects (e.g., dbr:Lake_George_(New_York), dbr:George_Lake, dbr:Lake_George_(Florida), dbr:Lake_George_(New_South_Wales)), predicates (e.g., dbo:country, dbo:birthplace, dbo:location, dbo:isPartOf), and objects (e.g., dbr:Adirondacks, dbr:Adirondack_Mountains, yago:Mountain109359803, dbr:Whiteface_Mountain).]

Fig. 1: Model overview. Three colors (red, yellow, blue) represent the active
attention during s-p-o decoding, respectively. We currently generate only a
single triple per sentence, leaving the generation of multiple triples per
sentence for future work.
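As a rough sketch of the encoder stage shown in Figure 1, the module below embeds the word sequence with a bi-directional LSTM and concatenates the forward and backward states per token. The framework choice (PyTorch) and all dimensions are our assumptions; the paper does not specify them here.

```python
import torch
import torch.nn as nn

class SentenceEncoder(nn.Module):
    """Bi-directional LSTM encoder: embeds the word sequence and returns
    the concatenated forward/backward hidden state for every token."""
    def __init__(self, word_vocab_size, emb_dim=100, hidden_dim=128):
        super().__init__()
        self.embed = nn.Embedding(word_vocab_size, emb_dim)
        self.bilstm = nn.LSTM(emb_dim, hidden_dim,
                              batch_first=True, bidirectional=True)

    def forward(self, token_ids):        # (batch, seq_len)
        emb = self.embed(token_ids)      # (batch, seq_len, emb_dim)
        states, _ = self.bilstm(emb)     # (batch, seq_len, 2 * hidden_dim)
        return states                    # one context vector per source token
```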

As shown in Figure 1, the model consists of an encoder that takes in a natural
language sentence as the source sequence and a decoder that generates the
target RDF triple. The model maximizes the conditional probability

    $p(Y \mid X) = \prod_{t=1}^{3} p(y_t \mid y_{<t}, X)$,

where $y_{<t}$ denotes the target tokens decoded before step t.
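A hedged sketch of how this factorized probability can be computed: the decoder runs exactly three steps (one each for s, p, o), attending over the encoder states from the sketch above at every step. The PyTorch details, dimensions, additive attention form, and single shared output vocabulary are our assumptions, not the authors' exact design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TripleDecoder(nn.Module):
    """Decodes exactly three KG tokens (s, p, o), attending to the encoder
    states at each step, so that p(Y|X) = prod_{t=1..3} p(y_t | y_<t, X)."""
    def __init__(self, kg_vocab_size, hidden_dim=256, enc_dim=256):
        super().__init__()
        self.embed = nn.Embedding(kg_vocab_size, hidden_dim)
        self.cell = nn.LSTMCell(hidden_dim + enc_dim, hidden_dim)
        self.attn = nn.Linear(hidden_dim + enc_dim, 1)
        self.out = nn.Linear(hidden_dim, kg_vocab_size)

    def forward(self, enc_states, bos_id):
        batch, src_len, _ = enc_states.shape
        h = enc_states.new_zeros(batch, self.cell.hidden_size)
        c = torch.zeros_like(h)
        y = torch.full((batch,), bos_id, dtype=torch.long,
                       device=enc_states.device)
        log_probs = []
        for _ in range(3):                         # one step each for s, p, o
            # Attention over source tokens, conditioned on the decoder state h.
            q = h.unsqueeze(1).expand(-1, src_len, -1)
            scores = self.attn(torch.cat([q, enc_states], dim=-1)).squeeze(-1)
            ctx = (F.softmax(scores, dim=-1).unsqueeze(-1) * enc_states).sum(1)
            h, c = self.cell(torch.cat([self.embed(y), ctx], dim=-1), (h, c))
            step_lp = F.log_softmax(self.out(h), dim=-1)  # log p(y_t | y_<t, X)
            log_probs.append(step_lp)
            y = step_lp.argmax(dim=-1)             # greedy choice feeds step t+1
        return log_probs
```

Summing the chosen log-probabilities across the three steps yields log p(Y|X) for the decoded triple; at training time one would instead feed the gold s-p-o tokens (teacher forcing) rather than the greedy argmax.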