<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Generating Knowledge Graphs from Unstructured Texts: Experiences in the E-commerce Field for Question Answering</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Diogo Teles Sant'Anna</string-name>
          <email>diogoteles08@gmail.com</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Rodrigo Oliveira Caus</string-name>
          <email>rodrigo.caus@students.ic.unicamp.br</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Lucas dos Santos Ramos</string-name>
          <email>lrsantostw@gmail.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Victor Hochgreb</string-name>
          <email>victor@gobots.com.br</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Julio Cesar dos Reis</string-name>
          <email>jreis@ic.unicamp.br</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>GoBots</institution>
          ,
          <addr-line>Campinas, SP</addr-line>
          ,
          <country country="BR">Brazil</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Institute of Computing, University of Campinas</institution>
          ,
          <addr-line>Campinas - SP</addr-line>
          ,
          <country country="BR">Brazil</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Nowadays, a growing number of sales take place over the Web in e-commerce stores. Customers often have questions about a product before they buy it. By answering them instantly, online stores can improve user experience, customer satisfaction and sales conversion rate. Effective and automated customer service via computer systems requires handling large amounts of unstructured information, such as product specifications. In this paper, we define and evaluate a technique to generate knowledge graphs (KGs) by extracting relevant product information from unstructured natural language questions and answers. The knowledge encoded in the KG is used to answer new clients' questions via SPARQL requests. Our solution is evaluated in a real-world setting, using data from online stores served by GoBots, a leading e-commerce chatbot business in Latin America. The results show the benefits of exploring KGs for answering a wider spectrum of questions in real-world settings.</p>
      </abstract>
      <kwd-group>
        <kwd>natural language processing</kwd>
        <kwd>knowledge graphs</kwd>
        <kwd>e-commerce</kwd>
        <kwd>automatic question answering</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>E-commerce stores have a keen interest in automatically understanding questions
asked about their products. This helps generate immediate and accurate
answers so that customers are served quickly and efficiently, and it lowers
the cost of keeping a large team to answer questions manually.</p>
      <p>The answer to e-commerce questions must be highly accurate, because
providing the wrong information to the customer can have unintended consequences.
For example, if the customer asks whether a tire fits their car and the system
mistakenly answers yes, the customer will buy the product, but will have to
return it later. Conversely, if the system erroneously replies that it does not fit,
the customer will give up on a product they might have purchased. To answer
this question automatically, the system requires specific knowledge of whether
or not the product is compatible with the car. Questions concerning product
compatibility usually make up a large share of e-commerce sales questions.</p>
      <p>Copyright © 2020 for this paper by its authors. Use permitted under Creative
Commons License Attribution 4.0 International (CC BY 4.0).</p>
      <p>Currently, GoBots<sup>3</sup> performs automatic processing of natural language (NL)
questions through techniques that need to fulfill several requirements, including:
1) understanding sentence intent and entities; 2) constructing vocabularies to
handle synonymous terms; and 3) generating structured rules to address similar
cases. However, several questions cannot be answered automatically due to a lack
of specific knowledge about products, especially questions regarding compatibility.
Our purpose is to create a solution based on knowledge graphs to answer those
types of questions and thereby complement the existing system.</p>
      <p>In this paper, we investigate a solution to extract and structure knowledge
about compatibility of products. We assume that the knowledge needed to
answer untreated questions is found in pairs of questions and answers already
handled by human attendants. Accordingly, our solution uses existing pairs of
customers' questions and attendants' answers as its source of knowledge. Both
components of the Q&amp;A (question and answer) pair are in natural language and
present unstructured information. Our solution extracts knowledge and structures
it into a knowledge graph (KG) based on a defined domain ontology. Constructing
the KG adds value because it structures the information and enables
further reuse and querying.</p>
      <p>Our approach explores the detection of entities and intentions from input
questions and answers to create Resource Description Framework (RDF) triples.
Our solution generates a triple store, which is used in the GoBots computational
environment to help answer new clients' questions automatically. The existing
GoBots solution searches for specific product information on the generated KG
via API connections.</p>
      <p>The evaluation assessed the quality of the KG generation for a specific
domain of products and the usability of the structured knowledge to answer new
questions. We used real-world data from e-commerce businesses and applied the
solution to answer real-time questions to understand the benefits of the proposal.
The results, with quality measurements, indicate the promising effectiveness
of our methods. Our experiments focused on the automotive domain,
but our solution is applicable and extensible to other domains.</p>
      <p>The remainder of this article is organized as follows: Section 2 presents the
related work. Section 3 reports on our approach to generate a KG from pairs of
questions and answers in NL texts and its use in the deployed KG services for
obtaining answers from the KG. Section 4 shows the experimental evaluation
with the results obtained. Section 5 discusses the findings whereas Section 6
draws conclusions and future work.</p>
      <p><sup>3</sup> Official website: https://gobots.ai</p>
    </sec>
    <sec id="sec-2">
      <title>Related Work</title>
      <p>
        Automated answering solutions for e-commerce are an active research area
due to the high relevance of this type of solution to the market. Shiqian et
al. [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] proposed a new framework for automatic response generation
for product-related questions based on user reviews. In this framework, called
RAGE, the relevant reviews written about a product are extracted using
machine learning techniques. This information is incorporated to guide response
generation. The solution was applied to real data from
online stores. Although this work proposed the automatic generation of answers,
it did not investigate the structuring of knowledge about the products.
      </p>
      <p>
        Research on the construction and use of KGs has intensified. Hoffner et al. [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]
conducted an extensive investigation into the advances and challenges of using
KGs for creation of Q&amp;A systems. KG queries have been studied to answer
questions considering structured facts expressed in the knowledge base. They
complement existing work with 72 publications about 62 systems developed from
2010 to 2015. Then they identified challenges faced by those approaches and
collected solutions for them from the 72 publications. Finally, they provided
recommendations on how to develop future Semantic Question Answering systems.
      </p>
      <p>
        Kertkeidkachorn and Ichise [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] introduced a framework called T2KG to create
a KG automatically from natural language texts. In T2KG, entities from
the natural language context are mapped to the corresponding uniform resource
identifier (URI) in the KG; these entities are usually the subject or object of triples.
As a novel contribution, a rule-based and a similarity-based technique are combined
to map the predicate of a triple generated from text to its corresponding
predicate in an existing KG. The experimental results demonstrated that T2KG
can successfully generate a KG and populate an existing one with new knowledge
from text. However, the framework performs poorly when mapping predicates
containing many composite words.
      </p>
      <p>
        Along these lines, Hao et al. [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] explored neural network-based classifiers to
represent questions and their candidate answers retrieved from knowledge bases.
Their work presents the difficulties of using current query languages, such as SPARQL.
They argue that users need not only to be familiar with the particular
language grammars, but also to be aware of the architecture of the knowledge
base they are querying. By contrast, they present a knowledge-based question
answering system that takes NL as its query language, which makes it a more
user-friendly solution.
      </p>
      <p>Our literature analysis indicates improvements and contributions
in both the field of automatic question answering and the usability and
construction of KGs. Our work contributes a technique to generate KGs
from unstructured NL sentences organized in Q&amp;A pairs in an e-commerce
context. The method is able to extract specific information about products of a
given domain, and then uses it to answer new questions with high precision.</p>
    </sec>
    <sec id="sec-3">
      <title>Developed approach and software tool</title>
      <sec id="sec-3-1">
        <title>System overview</title>
        <p>
          Formally, a KG, G = (V, E), is a labeled directed graph with nodes
representing entities such as "Barack Obama" and edges representing relations between
entities, e.g., "Barack Obama" "isPresidentOf" "United States" [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ]. KGs often
assume the form of RDF triples, such that G = {t<sub>1</sub>, t<sub>2</sub>, ..., t<sub>n</sub>}. A triple is
defined as t = (s, p, o), where s, p and o are called, respectively, the subject,
predicate and object of the triple. The predicate connects the subject to the
object. In the example, "Barack Obama" is the subject, "isPresidentOf" is the
predicate and "United States" is the object of the triple t = (Barack Obama,
isPresidentOf, United States). The meaning of resources in subject, predicate and
object is encoded in a predefined ontology (cf. Subsection 3.2).
        </p>
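        <p>As an illustration only, the triple notation above can be mirrored in a few lines of Python. This is a minimal sketch, not part of the deployed system: the names Triple, graph and objects are hypothetical and were chosen for this example.</p>
        <preformat><![CDATA[
```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One RDF-style statement t = (s, p, o)."""
    subject: str
    predicate: str
    obj: str


# The KG is modelled here as a plain set of triples, G = {t1, ..., tn}.
graph = {
    Triple("Barack Obama", "isPresidentOf", "United States"),
}


def objects(graph, subject, predicate):
    """Return every o such that (subject, predicate, o) is in the graph."""
    return {t.obj for t in graph
            if t.subject == subject and t.predicate == predicate}


print(objects(graph, "Barack Obama", "isPresidentOf"))
```
]]></preformat>
        <p>A production RDF store indexes triples and identifies resources by URI; the set used here only conveys the data model.</p>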
        <p>
          The GoBots Service is capable of querying the KG Service to
automatically answer questions (b in Figure 1) that rely on specific knowledge stored in
the RDF Store. For this purpose, the KG Service is responsible for formulating
SPARQL queries [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ] (c in Figure 1) to retrieve knowledge from the RDF Store.
The knowledge found is sent back to the GoBots Service to formulate the proper
answer for the e-commerce question.
        </p>
        <p>The query requested by the GoBots Service consists of a structured
representation of the components of the question asked by a customer. The key
knowledge encoded in our KG is compatibility knowledge between
products and consumer items. The request therefore contains all the information needed to
identify the product and a consumer item (cf. Subsection 3.2). The following
information is sent to the KG Service:
          <list list-type="bullet">
            <list-item><p>the ID of the product;</p></list-item>
            <list-item><p>the set of attributes that characterizes the consumer item;</p></list-item>
            <list-item><p>the intent of the question.</p></list-item>
          </list>
        </p>
        <p>The SPARQL query formulation is managed in the Query Construction
component (c in Figure 1). It encodes the input data as subjects,
predicates and objects, based on the ontology created to support question
answering in the e-commerce domain (j in Figure 1) (cf. Subsection 3.2). The
component processes the SPARQL result, and the KG Service returns the following:
          <list list-type="bullet">
            <list-item><p>A flag indicating whether a compatibility between the given product
and the given consumer item was found.</p></list-item>
            <list-item><p>The kind of compatibility found between the product and the
consumer item (cf. Subsection 3.2).</p></list-item>
            <list-item><p>The complete attendant's answer given to the question from which the
retrieved knowledge originated.</p></list-item>
          </list>
        </p>
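        <p>The query construction step can be pictured with the following minimal sketch, which turns a request (product ID plus consumer item attributes) into a SPARQL SELECT query. It covers only the Compatibility intent. The namespace URI, the ex:id and ex:answer predicates and the function name are assumptions made for illustration; the predicates hasCompatibility, compatibleWith and type follow the triples shown in Subsection 3.3, but the exact vocabulary of the deployed ontology may differ.</p>
        <preformat><![CDATA[
```python
EX = "http://example.org/ecommerce#"  # hypothetical namespace, not from the paper


def build_compatibility_query(product_id, item_attributes):
    """Build a SPARQL SELECT for the compatibility between a product and an item.

    Assumes attribute values are plain strings without quotes; real code
    should escape literals or use a SPARQL library with parameter binding.
    """
    lines = [
        f"PREFIX ex: <{EX}>",
        "SELECT ?type ?answer WHERE {",
        f'  ?product ex:id "{product_id}" ;',        # ex:id is an assumed predicate
        "           ex:hasCompatibility ?comp .",
        "  ?comp ex:type ?type ;",
        "        ex:compatibleWith ?item .",
        "  OPTIONAL { ?comp ex:answer ?answer . }",   # ex:answer is assumed metadata
    ]
    # One triple pattern per consumer item attribute (e.g. model, year).
    for name, value in sorted(item_attributes.items()):
        lines.append(f'  ?item ex:{name} "{value}" .')
    lines.append("}")
    return "\n".join(lines)


print(build_compatibility_query("108093", {"model": "fiesta", "year": "2012"}))
```
]]></preformat>
        <p>An empty result set would correspond to the case in which no compatibility knowledge was found; otherwise ?type carries the kind of compatibility and ?answer the attendant's original answer.</p>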
        <p>In an independent and asynchronous process, the KG is updated based on
a set of questions and answers handled by human attendants. When a question
cannot be answered automatically by any of the GoBots services (d in Figure 1),
the answer manually provided by a human attendant for that question, along
with the product information (e and f in Figure 1), is stored in a database for
future processing.</p>
        <p>In the process of triple creation (g in Figure 1) (cf. Subsection 3.3), the KG
Service is responsible for processing the stored set of questions answered manually
by human attendants. This is a key source of knowledge to update the KG and
answer similar new incoming questions. This knowledge extraction process relies
on the extraction of intentions and entities (h in Figure 1) from each Q&amp;A pair
to structure the knowledge via RDF triples (i in Figure 1). SPARQL Updates
are used to insert RDF triples into the KG, implemented using an RDF Store.</p>
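        <p>To make the insertion step concrete, the sketch below serialises a list of (subject, predicate, object) tuples as a single SPARQL INSERT DATA request. It is an illustration under stated assumptions: the namespace and the function name are hypothetical, and objects are treated as literals only when they are already wrapped in double quotes.</p>
        <preformat><![CDATA[
```python
EX = "http://example.org/ecommerce#"  # hypothetical namespace, not from the paper


def to_insert_data(triples):
    """Serialise (s, p, o) tuples as one SPARQL INSERT DATA request.

    Subjects and predicates are emitted as local names in the ex: namespace.
    An object that starts with a double quote is emitted as a literal;
    any other object is emitted as an ex: resource.
    """
    body = []
    for s, p, o in triples:
        obj = o if o.startswith('"') else f"ex:{o}"
        body.append(f"  ex:{s} ex:{p} {obj} .")
    return f"PREFIX ex: <{EX}>\nINSERT DATA {{\n" + "\n".join(body) + "\n}"


print(to_insert_data([
    ("car1", "model", '"palio"'),
    ("product1", "hasCompatibility", "compatibility1-1"),
    ("compatibility1-1", "compatibleWith", "car1"),
]))
```
]]></preformat>
        <p>The resulting string would be sent to the update endpoint of whichever RDF Store is in use; the transport is deliberately left out because the paper does not name the store.</p>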
      </sec>
      <sec id="sec-3-2">
        <title>Ontology for products compatibility representation</title>
        <p>At this stage, we present the ontology defined to explicitly handle meaning
in our KG. The ontology was created for the e-commerce domain, motivated
by compatibility issues between products.</p>
        <p>
          An ontology specifies a conceptualization of a domain in terms of attributes,
relationships and concepts [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ], the latter referred to in this work as classes. Figure 2
presents an overview of the key classes and relationships in our ontology, defined
to address product compatibility issues.
        </p>
        <p>The Product class represents a product sold in an e-commerce store. The ID is an identifier
of the product. The ontology has different attributes to represent different kinds
of IDs, such as the EAN, which universally identifies products. The universal nature of
the ID enables the knowledge to be reused for products of different e-commerce stores.</p>
        <p>ConsumerItem is an abstract class whose purpose is to identify items (owned
by customers) that may interact with products of the e-commerce store.
A subclass of ConsumerItem must have proper attributes that identify its instances.
More specifically, a subclass must present a subset of attributes that uniquely
identifies an item. The importance of this subset is detailed in Subsection 3.3
(concerning the minimum entities required). As an example, we present the
ConsumerItem subclass named Car, representing the automotive domain, which has
the attributes model, brand and year. The subset of attributes that uniquely
identifies a car is model and year.</p>
        <p>The concept of compatibility is stored through relations involving the class
Compatibility. Modeling compatibility as a class makes it easier to store
metadata about the relation between the product and the ConsumerItem, for
example, the date the knowledge was retrieved, the e-commerce store selling the
product, the question it was retrieved from and the complete answer given by
the attendant.</p>
        <p>Besides considering metadata from the relation between the product and
the ConsumerItem, our ontology also differentiates compatibilities from one
another through a concept we call compatibility type. The compatibility
type indicates whether the compatibility is, for example, affirmative, negative or
conditional; these are represented in the ontology as FullCompatibility,
NoCompatibility and ConditionalCompatibility. More specifically, those types are
disjoint subclasses of the class Compatibility, which is used as an
abstract class. Every instance of Compatibility must therefore be an instance of one of the
compatibility types.</p>
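        <p>The classes described in this subsection can be approximated by the following Python sketch. It is only an analogy for readers more familiar with object models than with ontologies: the class and field names mirror the text, whereas the identity method and the answer field are assumptions introduced for this example, and Figure 2 remains the authoritative description.</p>
        <preformat><![CDATA[
```python
from dataclasses import dataclass
from enum import Enum


class CompatibilityType(Enum):
    """The disjoint subclasses of the abstract Compatibility class."""
    FULL = "FullCompatibility"
    NO = "NoCompatibility"
    CONDITIONAL = "ConditionalCompatibility"


@dataclass(frozen=True)
class Car:
    """ConsumerItem subclass for the automotive domain."""
    model: str
    year: int
    brand: str = ""  # present in the ontology, but not needed to identify a car

    def identity(self):
        # model and year are the attributes that uniquely identify a car
        return (self.model, self.year)


@dataclass(frozen=True)
class Compatibility:
    """Relation between a product and a consumer item, with metadata."""
    product_id: str
    item: Car
    kind: CompatibilityType
    answer: str = ""  # assumed field holding the attendant's full answer


fit = Compatibility("108093", Car("fiesta", 2012, "ford"), CompatibilityType.FULL)
print(fit.kind.value, fit.item.identity())
```
]]></preformat>
        <p>Using an enumeration for the compatibility type reflects the requirement that every compatibility belongs to exactly one type.</p>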
      </sec>
      <sec id="sec-3-3">
        <title>Automatic KG Generation</title>
        <p>Formally, let P be a product of an e-commerce store and id a unique identifier of
P. Let S = {(id<sub>1</sub>, q<sub>1</sub>, a<sub>1</sub>), ..., (id<sub>n</sub>, q<sub>n</sub>, a<sub>n</sub>)} be a set of question and answer pairs
about products, in which q<sub>i</sub> is the i-th question and a<sub>i</sub> is its answer, both about
the product with identifier id<sub>i</sub>. Also, let O be an ontology about the domain in which
product P is inserted. We aim to automatically generate a KG G, represented
by a set of (s, p, o) triples, which expresses knowledge about the product P
according to the ontology O.</p>
        <p>For example, let P be an armrest with a unique identifier id = 108093.
Consider a set of question and answer pairs about P such that S = {(108093, "Good
Morning. Does this armrest fit in Ford Fiesta Sedan year 2012 Rocam model?",
"The advertised product is compatible")}. Figure 3 presents this example
instantiating the ontology (cf. Subsection 3.2) to generate RDF triples. In this
example, 17 triples were inserted in the KG (12 of them are shown in Figure 3).</p>
        <p>Our solution for KG generation recognizes entities and intents (h and i in
Figure 1) in sentences from questions and answers (e in Figure 1). This
recognition is based on a machine learning model trained with examples of numerous
entities and intents from existing sentences in the GoBots data environment.</p>
        <p>In the context of natural language understanding (NLU), an intent represents the
purpose of a user's input, which is always a sentence in natural language. An intent
is given a name, often a verb or a noun, that best describes the user's intention.
For example, in the sentence "Is this product compatible with the car Ford Fusion 2019?",
the intent "Compatibility" is identified. Our solution explores a classifier trained
with several manually annotated examples of sentences and their corresponding intents.</p>
        <p>An entity represents a term or expression with a known meaning relevant
for the comprehension of the sentence. Entities have names and values. The
values are the words themselves, and names represent the meaning of the word.
For example, the sentence "Is this product compatible with the car Ford Fusion
2019?" encompasses the entities brand, model and year to identify the car. These
attributes receive the values "Ford", "Fusion" and "2019", respectively. Our
solution explores an entity extractor trained with names and values of entities
relevant to the domain of interest.</p>
        <p>Our solution defines intents and entities for extraction from text based on
the information used to build our KG. More specifically, our technique explores
question intents (from the question and answer pair) to identify whether the question
refers to compatibility, so an intent "Compatibility" is used. In a compatibility
question, the client usually describes the consumer item with which the product
might or might not be compatible. The question entities aim at identifying the
attributes of the consumer item. In the running example, assuming a car as the
consumer item, the entities "brand", "model" and "year" are used to identify
attributes of the car.</p>
        <p>For the attendant's answer in the question and answer pair, the technique
explores the intent to evaluate the type of compatibility between the product and
the consumer item. In particular, the intents used are FullCompatibility,
NoCompatibility and ConditionalCompatibility, as modeled in the ontology. Let us call this
set of intents I<sub>valid</sub>.</p>
        <p>The KG generation relies on the following functions:
          <list list-type="order">
            <list-item><p>extractEntity(F), which for a given sentence F returns a set E = {(e<sub>1</sub>, v<sub>1</sub>, c<sub>1</sub>),
(e<sub>2</sub>, v<sub>2</sub>, c<sub>2</sub>), ..., (e<sub>p</sub>, v<sub>p</sub>, c<sub>p</sub>)}, where p is the number of entities found in the
sentence; e<sub>i</sub> is the name of the i-th entity found in the sentence (e.g. "year");
v<sub>i</sub> is the value of the i-th entity found (e.g. "2010"); and c<sub>i</sub> is the confidence in
the correctness of the i-th extraction (ranging from 0 to 1).</p></list-item>
            <list-item><p>extractIntent(F), which for a given sentence F returns a pair (I, c), in which I
is the intent of the sentence F and c is the confidence of the extraction.</p></list-item>
          </list>
        </p>
        <p>The solution requires the definition of a set E<sub>min</sub>, the minimal entity set
for a given domain. It is defined as E<sub>min</sub> = {e<sub>1</sub>, e<sub>2</sub>, ..., e<sub>n</sub>}, where each element e<sub>i</sub>
refers to an entity. These are the minimal entities required in a sentence
(as an expected pattern). Triples are generated only for sentences that contain
them, on the assumption that such sentences carry meaningful knowledge. For our solution,
let E<sub>min</sub> = {model, year} be the set of minimal entities. That is, if the model and year
of the car are not provided, it is not possible to identify the car properly and
therefore it is not possible to generate knowledge about it.</p>
        <p>At this point, we define a minimum acceptable confidence threshold. The
extraction of entities or intents requires an adequate confidence level to avoid
errors. Therefore, let minConfidence be the value of the confidence threshold.
Results with a confidence below this value are ignored. For our solution,
let minConfidence = 0.8.</p>
        <p>Algorithm 1 automatically generates the KG G, taking as input a set of
questions and answers about products, S = {(id<sub>1</sub>, q<sub>1</sub>, a<sub>1</sub>), ..., (id<sub>n</sub>, q<sub>n</sub>, a<sub>n</sub>)}.
Triples are generated when the intents represent a valid relation with sufficient
confidence and the entities match the expected elements of the E<sub>min</sub> set with
sufficient confidence.</p>
        <p>Algorithm 1 Knowledge Graph Generation</p>
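        <p>Since only the caption of Algorithm 1 survives in this version of the text, the following Python sketch reconstructs its control flow from the surrounding description. It is an approximation under stated assumptions, not the authors' listing: the function and variable names are hypothetical, the extractors are passed in as callables standing for extractEntity and extractIntent, identifiers such as car1 and compatibility1-1 follow the naming of the worked example, and only the basic triples are produced (no metadata).</p>
        <preformat><![CDATA[
```python
MIN_CONFIDENCE = 0.8
E_MIN = {"model", "year"}
I_VALID = {"FullCompatibility", "NoCompatibility", "ConditionalCompatibility"}


def generate_kg(pairs, extract_entity, extract_intent):
    """Build a set of (s, p, o) triples from (product_id, question, answer) tuples.

    extract_entity(sentence) -> list of (name, value, confidence)
    extract_intent(sentence) -> (intent, confidence)
    """
    graph = set()
    cars = {}      # (model, year) -> car number, so equal cars share one identifier
    products = {}  # product id -> product number
    for product_id, question, answer in pairs:
        entities = extract_entity(question)
        q_intent, q_conf = extract_intent(question)
        a_intent, a_conf = extract_intent(answer)
        names = {name for name, _, _ in entities}
        usable = (
            E_MIN.issubset(names)
            and q_intent == "Compatibility"
            and a_intent in I_VALID
            and min(q_conf, a_conf) >= MIN_CONFIDENCE
            and all(conf >= MIN_CONFIDENCE for _, _, conf in entities)
        )
        if not usable:
            continue  # the pair does not carry reliable compatibility knowledge
        values = {name: value for name, value, _ in entities}
        car_no = cars.setdefault((values["model"], values["year"]), len(cars) + 1)
        product_no = products.setdefault(product_id, len(products) + 1)
        car, product = f"car{car_no}", f"product{product_no}"
        comp = f"compatibility{product_no}-{car_no}"
        for name, value in values.items():
            graph.add((car, name, value))
        graph.add((comp, "type", a_intent))
        graph.add((product, "hasCompatibility", comp))
        graph.add((comp, "compatibleWith", car))
    return graph
```
]]></preformat>
        <p>Passing the extractors as parameters keeps the sketch independent of any particular NLU framework; in the experiments reported in Section 4 this role is played by trained RASA NLU models.</p>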
        <p>We present a full example to illustrate the KG generation procedure.
Assume the following input instances: S = [[ID01, "Does this tire fits the
palio 2014?", "Yes the product fits your car"], [ID02, "I have a ford ka 2013 sedan
can I buy the product?", "Unfortunately this product doesn't fit in your car"]].
The execution of Algorithm 1 works as follows:
          <list list-type="order">
            <list-item><p>The algorithm takes the first input of S, that is, input = [ID01, "Does this
tire fits the palio 2014?", "Yes the product fits your car"];</p></list-item>
            <list-item><p>E receives the entities found by the entity extractor in the sentence
input.question. In this case, input.question = "Does this tire fits the palio
2014?". For this sentence, E = {(model, "palio", 0.85), (year, "2014", 0.92)};</p></list-item>
            <list-item><p>I<sub>que</sub> receives the intent found by the intent extractor in the sentence
input.question. In this case, input.question = "Does this tire fits the palio
2014?" yields I<sub>que</sub> = (Compatibility, 0.96);</p></list-item>
            <list-item><p>I<sub>ans</sub> receives the intent found by the intent extractor in the sentence
input.answer. In this case, input.answer = "Yes the product fits your car"
yields I<sub>ans</sub> = (FullCompatibility, 0.94);</p></list-item>
            <list-item><p>At this stage, the algorithm verifies that all conditions hold:
E<sub>min</sub> ⊆ E.names, where both are {model, year}; I<sub>que</sub> =
Compatibility; I<sub>ans</sub> ∈ I<sub>valid</sub>, where I<sub>valid</sub> = {FullCompatibility,
NoCompatibility, ConditionalCompatibility}; I<sub>que</sub>.confidence = 0.96 ≥ minConfidence;
I<sub>ans</sub>.confidence = 0.94 ≥ minConfidence; and each E.confidence ≥
minConfidence, that is, 0.85 ≥ minConfidence and 0.92 ≥ minConfidence,
where minConfidence = 0.8;</p></list-item>
            <list-item><p>Among the classes in our ontology (Figure 2), it finds the class that has
attributes with the same names as the entities in E.names. In this case, the
class found is Car;</p></list-item>
            <list-item><p>The solution instantiates a car and assigns it a unique identifier. Instances of
the same car (i.e., the same "model" and "year" pair) are assigned the
same identifier. For example, let the instance have the identifier "car1";</p></list-item>
            <list-item><p>Afterwards, the solution adds the attributes of that instance to G by creating
triples. In our example, the triples (car1, model, "palio") and (car1, year,
2014) are created;</p></list-item>
            <list-item><p>The algorithm also instantiates the class Product and the class Compatibility,
specifying their identifiers and attributes according to the modeling in the
ontology. For example, let the instances have, respectively, the identifiers
"product1" and "compatibility1-1", the latter representing a compatibility
between "car1" and "product1";</p></list-item>
            <list-item><p>The final stage creates the basic triples used to represent the compatibility in
the KG and adds them to G. It obtains: (compatibility1-1, type,
FullCompatibility), (product1, hasCompatibility, compatibility1-1) and (compatibility1-1,
compatibleWith, car1);</p></list-item>
            <list-item><p>Finally, processing both inputs results in the following triples: G = {(car1, model,
"palio"), (car1, year, 2014), (product1, hasCompatibility,
compatibility1-1), (compatibility1-1, type, FullCompatibility), (compatibility1-1,
compatibleWith, car1), (car2, brand, "ford"), (car2, model, "ka"), (product2,
hasCompatibility, compatibility2-2), (compatibility2-2, type, NoCompatibility),
(compatibility2-2, compatibleWith, car2)}.</p></list-item>
          </list>
        </p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Experimental Evaluation</title>
      <sec id="sec-4-1">
        <title>Quality of the generated KG evaluated with question and answer in the automotive e-commerce domain</title>
        <p>This analysis evaluates the processing of question and answer pairs for the
generation of triples to populate the KG. The questions processed were about
compatibility between automotive products and customers' cars, and they were
all manually answered by store attendants. The triples are inserted according
to Algorithm 1 and structured based on the ontology defined in Figure 2. Our
objective is to assess the capability of Algorithm 1 to generate KGs and the
quality of the result.</p>
        <p>
          In the experiments, we used RASA NLU [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ] to extract intents and entities
from a given NL sentence through previously trained models. This framework
uses a word embedding model based on StarSpace [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ] to classify intents and a
conditional random fields model to extract entities. For training RASA, data
from automotive e-commerce stores were collected from real-world operations in
the GoBots software environment, as follows:
          <list list-type="bullet">
            <list-item><p>A list of 1,147 questions manually annotated with their respective intents for
training purposes. It was divided into 28 intents, with 407 questions
classified with the Compatibility intent. The other intents were not considered for
the KG generation;</p></list-item>
            <list-item><p>Sets of 2,054 car models, 132 car brands and 106 car years, and some of their
respective synonyms, to be extracted from customers' questions as entities;</p></list-item>
            <list-item><p>A list of 274 real-world human attendant answers manually annotated with
the respective intents. It was divided into 13 intents, with 74 answers classified
with the FullCompatibility intent, 64 with the NoCompatibility intent and 43 with the
ConditionalCompatibility intent. Other intents were used to classify answers, but
only FullCompatibility was used to generate knowledge at this stage of the
research.</p></list-item>
          </list>
        </p>
        <p>The solution was evaluated over a filtered range of questions to avoid
processing questions that certainly cannot contribute knowledge in our current
solution. The complete input set S contained 25,383 question and answer pairs.
Our evaluation used all input questions that were received by one specific
automotive e-commerce store between January 2020 and April 2020 and that satisfy
the following conditions:
          <list list-type="order">
            <list-item><p>The question was classified with the intent Compatibility with a confidence
higher than 0.8.</p></list-item>
            <list-item><p>The question was answered by an attendant from the e-commerce store.</p></list-item>
          </list>
        </p>
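        <p>Algorithm 1 populated the KG with the following configuration:
          <list list-type="bullet">
            <list-item><p>The complete input set of question and answer pairs S.</p></list-item>
            <list-item><p>The question intent I<sub>que</sub> = Compatibility.</p></list-item>
            <list-item><p>The minimal question entity set E<sub>min</sub> = {model, year}.</p></list-item>
            <list-item><p>The valid answer intent set I<sub>valid</sub> = {FullCompatibility}.</p></list-item>
            <list-item><p>The confidence threshold minConfidence = 0.8<sup>4</sup>.</p></list-item>
          </list>
        </p>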
        <p>Considering the whole data set S as input, 1,744 Q&amp;A pairs generated
knowledge encoded in the KG<sup>5</sup>. In total, 20,289 new triples were added, comprising
1,534 compatibilities between products and cars.</p>
        <p>In order to understand the quality of KG generation over the whole data set
S, we applied a manual evaluation to 600 Q&amp;A pairs randomly selected from
S. Three researchers (co-authors of this paper) participated in the evaluation,
evaluating 200 pairs each. We inspected the execution log of Algorithm 1 to evaluate
the correctness of the KG generation. For this, we determined two measures:
          <list list-type="bullet">
            <list-item><p>among the Q&amp;A pairs that generated triples, which ones generated correct
triples;</p></list-item>
            <list-item><p>among the Q&amp;A pairs that did not generate triples, which ones actually had
enough information to generate triples, and so should have been used to do
so.</p></list-item>
          </list>
        </p>
        <p><sup>4</sup> This minimal value of confidence was defined based on internal assessments of the
solution that presented better results.</p>
        <p><sup>5</sup> A sample of the produced KG can be accessed at https://rodrigocaus.github.io/ecommerce-kgqa.</p>
        <p>In order to obtain those measures, we manually judged
which Q&amp;A pairs ideally should generate knowledge about FullCompatibility and
which knowledge should be generated. This judgment was made without
considering the NLU results. We considered the question and answer texts and judged whether
the question describes a Car with a valid model and year, and whether the attendant's
answer implies that the product is compatible. Let a relevant Q&amp;A pair be one
in which it is possible to determine both a valid Car and the FullCompatibility intent.</p>
        <p>This evaluation allows us to determine two metrics: Precision and Miss Rate.
Precision is defined as the ratio of the question and answer pairs that
generated correct triples to all pairs that generated triples (cf. Formula 1).
This metric indicates the reliability of our solution.</p>
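        <p>
          <disp-formula id="eq-1">
            <label>(1)</label>
            <tex-math>\mathit{Precision} = \frac{\#\,\mathit{Q\&amp;AGeneratedCorrectTriples}}{\#\,\mathit{Q\&amp;AGeneratedTriples}}</tex-math>
          </disp-formula>
        </p>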
        <p>Miss Rate is the ratio of the relevant Q&amp;A pairs that did not generate
any triples to all relevant Q&amp;A pairs (cf. Formula 2). This metric represents
the share of knowledge that should have been added to the KG but was not.</p>
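        <p>
          <disp-formula id="eq-2">
            <label>(2)</label>
            <tex-math>\mathit{MissRate} = \frac{\#\,\mathit{RelevantQ\&amp;ADidNotGenerateTriples}}{\#\,\mathit{RelevantQ\&amp;A}}</tex-math>
          </disp-formula>
        </p>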
        <p>Table 1 shows the evaluation results, from which the defined metrics were
computed. The number of pairs that generated correct triples was 34, and the number of
relevant Q&amp;A pairs was 54. Precision was 0.971 and Miss Rate was 0.352. We
understand that a high Precision and a low Miss Rate indicate good quality of
the generated KG, although a lower Miss Rate is desirable.</p>
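        <p>The two metrics are simple ratios, as the short sketch below illustrates. The counts 34 and 54 are reported in the text. The denominator 35 and the numerator 19 do not appear in the text: they are inferred here as the only integer counts consistent with the reported Precision of 0.971 and Miss Rate of 0.352, and should be checked against Table 1. The function names are hypothetical.</p>
        <preformat><![CDATA[
```python
def precision(correct_pairs, generating_pairs):
    """Share of triple-generating Q&A pairs whose triples were correct."""
    return correct_pairs / generating_pairs


def miss_rate(missed_relevant_pairs, relevant_pairs):
    """Share of relevant Q&A pairs that generated no triples."""
    return missed_relevant_pairs / relevant_pairs


# 34 and 54 come from the text; 35 and 19 are inferred from the reported ratios.
print(round(precision(34, 35), 3))  # 0.971
print(round(miss_rate(19, 54), 3))  # 0.352
```
]]></preformat>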
        <p>In order to understand what led to the generation of incorrect
knowledge and to relevant knowledge missing from the KG, we classified the NLP
errors found in the aforementioned manual evaluation. Figure 4 shows the most frequent
RASA classification errors in the evaluated question and answer pairs. Note that
an incorrect pair may contain more than one classification error. A wrong
classification of a question entity (model or year) led to the generation of incorrect triples;
failure to correctly detect question entities and answer intents led Q&amp;A pairs
not to generate triples.</p>
        <p>Table 1 reports 546 Q&amp;A pairs that did not have enough information to
generate triples according to our solution. Understanding this value requires
an analysis of the requirements that a Q&amp;A pair must meet to generate triples. One of
these requirements, I<sub>valid</sub>, determines that a pair can generate knowledge only
if the answer intent is FullCompatibility. Figure 5 shows the distribution of
answer intents over the pairs of S: pairs with FullCompatibility
account for only 17.6% of the evaluation input. This is the upper bound
on the share of Q&amp;A pairs that could generate RDF triples. It also suggests that addressing other
types of compatibility might provide ways of enhancing our KG.</p>
        <p>Fig. 5: Distribution of detected answer intents among the 25,383 Q&amp;A pairs
processed in S.</p>
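        <p>The upper-bound argument can be illustrated with a small Python sketch: the share of pairs whose answer intent belongs to the set of valid intents bounds the share of pairs that can generate triples. The function names and the toy input are hypothetical; only the FullCompatibility restriction comes from the text, and the toy labels do not reproduce the distribution of Figure 5.</p>
        <preformat><![CDATA[
```python
from collections import Counter

# Valid answer intents; the text restricts them to FullCompatibility.
I_VALID = frozenset({"FullCompatibility"})


def intent_shares(answer_intents):
    """Return the fraction of Q&A pairs per detected answer intent."""
    counts = Counter(answer_intents)
    total = sum(counts.values())
    return {intent: n / total for intent, n in counts.items()}


def upper_bound_share(answer_intents, valid_intents=I_VALID):
    """Fraction of pairs with a valid answer intent, i.e. an upper bound on
    the share of pairs that can generate triples."""
    shares = intent_shares(answer_intents)
    return sum(s for intent, s in shares.items() if intent in valid_intents)


# Toy input, NOT the paper's data; "Other" is a made-up label.
toy = [
    "FullCompatibility",
    "NoCompatibility",
    "ConditionalCompatibility",
    "Other",
    "Other",
]
print(upper_bound_share(toy))  # 1 of the 5 toy pairs is valid
```
]]></preformat>
        <p>Passing a larger set of valid intents to the same function shows how the bound would grow if other types of compatibility were also encoded.</p>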
      </sec>
      <sec id="sec-4-2">
        <title>Evaluation of querying the KG in the automotive e-commerce domain</title>
        <p>We deployed the KG to answer incoming questions about compatibility between
automotive products and cars. The aim was to evaluate the capability of the KG
Service to answer new customer questions coming from the GoBots service.</p>
        <p>In addition to the knowledge encoded in the KG by processing
question-answer pairs, an additional set of triples was inserted in the KG from a
compatibility list between cars and products provided by an automotive store
working with GoBots. In total, for answering new incoming questions, the KG
Service relied on a KG containing 1,923,053 triples, which encode 347,482
compatibilities between products and cars.</p>
        <p>The evaluation was conducted on the real-world operation of the GoBots
service, which receives a large number of questions every day and answers them
automatically in real time. When the service is not able to answer a given question,
it queries the KG Service in an attempt to build an answer, provided that the question meets
the following criteria: the intent is Compatibility with a confidence threshold of 0.8; and
the extracted entities include at least E<sub>min</sub> = {model, year}.</p>
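        <p>The two criteria can be written as a simple predicate. The following Python sketch is illustrative only: the NluResult container and the function name are hypothetical, and treating the 0.8 threshold as inclusive is our assumption.</p>
        <preformat><![CDATA[
```python
from dataclasses import dataclass, field

CONFIDENCE_THRESHOLD = 0.8            # threshold stated in the text
E_MIN = frozenset({"model", "year"})  # minimum entity set stated in the text


@dataclass
class NluResult:
    """Hypothetical container for the NLU output of one question."""
    intent: str
    confidence: float
    entities: dict = field(default_factory=dict)  # entity name -> value


def should_query_kg(result: NluResult) -> bool:
    """True when the question meets both criteria for querying the KG Service."""
    has_intent = (
        result.intent == "Compatibility"
        and result.confidence >= CONFIDENCE_THRESHOLD  # inclusive: assumption
    )
    has_entities = E_MIN.issubset(result.entities.keys())
    return has_intent and has_entities


question = NluResult("Compatibility", 0.93, {"model": "Civic", "year": "2015"})
print(should_query_kg(question))
```
]]></preformat>
        <p>A question with a confident Compatibility intent but without an extracted year would be rejected by this check, since the car could not be uniquely identified.</p>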
        <p>The service was evaluated over a period of 12 days. In this period, 2,667 questions
concerning an automotive e-commerce store met
these criteria and were submitted to the KG Service. The KG held the knowledge needed to successfully
answer 103 of these 2,667 questions,
corresponding to 3.9%.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Discussion</title>
      <p>This paper addressed the problem of extracting and structuring knowledge from
questions and answers in NL about products in e-commerce. The generated KG was
deployed in the GoBots systems environment and helped improve the quality
of automated answers to enhance customer service. The solution has the
potential to complement customer service that requires specialized attendants.</p>
      <p>The solution is scalable in the sense that e-commerce stores contain a large
number of Q&amp;A pairs: the more pairs are processed, the more knowledge becomes
available for answering questions. Also, if a universal identifier for products is
available, knowledge about a single product can be reused across different
stores, given that the sets of items sold usually overlap. Aiming
to increase the amount of knowledge available, we plan to improve the solution
to process new Q&amp;A pairs as soon as a new unprocessed question is answered by
a human attendant.</p>
      <p>The performance of triple generation is closely tied to the
effectiveness of the entity and intent extraction. Figure 4 presented only one entity
extraction error that led to the generation of incorrect triples, which explains the
satisfactory precision level. Most NLU extraction errors led to the loss of
relevant knowledge. To decrease the miss rate, we should expand the training data
sets and balance the sentences across the different intents, in order
to reduce NLU extraction errors. Producing large and balanced data
sets is a major research challenge for extending the solution to
domains other than automotive.</p>
      <p>Figure 5 showed a low number of Q&amp;A pairs with FullCompatibility. To
increase the number of questions answered with knowledge retrieved from the
KG, we shall add NoCompatibility and ConditionalCompatibility to the valid
intents so as to encode additional knowledge in our KG. This is the
clearest research path for short-term improvements to the solution. The
solution is already prepared to support such knowledge: for both Compatibility
classes, it would work in an analogous way. Further improvements are
necessary to encode the conditional aspect in our KG.</p>
      <p>The proposed solution aims to work with knowledge regarding compatibility
between products and generic ConsumerItems. Although it was evaluated in
the domain of cars (automotive), our proposal is naturally and easily extensible
to other domains, considering a new ConsumerItem CI, related to the selected
domain, and the following steps:
<list list-type="order">
<list-item><p>Updating the ontology by adding CI and its attributes. There must be a
subset of them able to uniquely identify an instance of CI.</p></list-item>
<list-item><p>Adding the attributes of CI as entities in the NLU processor and training it
with examples.</p></list-item>
<list-item><p>Generating knowledge using Algorithm 1 with E<sub>min</sub> set to the
attributes that uniquely identify CI, as determined in step 1.</p></list-item>
<list-item><p>Updating the GoBots Service to query the KG Service when the question has
the Compatibility intent and the given E<sub>min</sub> entities.</p></list-item>
</list></p>
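      <p>The steps above can be sketched as a small declarative description of a new ConsumerItem, from which the NLU entities and the minimum entity set are derived. This Python sketch is only an illustration under our own assumptions: the class, its fields, and the Smartphone example are hypothetical and are not part of the ontology or of Algorithm 1.</p>
      <preformat><![CDATA[
```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConsumerItemSpec:
    """Hypothetical description of a ConsumerItem CI for a new domain."""
    name: str
    attributes: tuple    # step 1: attributes of CI added to the ontology
    identifying: tuple   # step 1: subset that uniquely identifies an instance

    def __post_init__(self):
        if not self.identifying:
            raise ValueError("at least one identifying attribute is required")
        missing = set(self.identifying) - set(self.attributes)
        if missing:
            raise ValueError(f"undeclared identifying attributes: {sorted(missing)}")

    def nlu_entities(self):
        """Step 2: entity names the NLU processor must learn to extract."""
        return list(self.attributes)

    def e_min(self):
        """Steps 3 and 4: minimum entity set for generating and querying knowledge."""
        return frozenset(self.identifying)


# The Car domain as evaluated in the paper: model and year identify a car.
car = ConsumerItemSpec("Car", ("model", "year"), ("model", "year"))
# Illustrative new domain; the attribute names are made up.
phone = ConsumerItemSpec("Smartphone", ("brand", "model", "color"), ("brand", "model"))
print(sorted(phone.e_min()))
```
]]></preformat>
      <p>Keeping the identifying subset explicit makes the dependency between step 1 and steps 3 and 4 visible: changing the identifying attributes changes the minimum entity set in both places at once.</p>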
      <p>The knowledge generated by the presented solution can be valuable in
future use cases other than answering questions, for example, improving
product descriptions and delivering valuable recommendations. When
we find NoCompatibility knowledge for a question, our KG could be
used to find another product in the same category that might have
FullCompatibility with the ConsumerItem of the question. The solution could then suggest
this other product as a recommendation.</p>
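      <p>The recommendation idea can be sketched over an in-memory set of triples. This is a hypothetical illustration, not the deployed KG Service: the predicate names (hasCategory, noCompatibility, fullCompatibility) and all identifiers in the toy data are made up for the example.</p>
      <preformat><![CDATA[
```python
def recommend_alternatives(triples, product, consumer_item):
    """Return products of the same category as `product` that are recorded as
    fully compatible with `consumer_item`, provided `product` itself is
    recorded as not compatible."""
    kg = set(triples)
    if (product, "noCompatibility", consumer_item) not in kg:
        return []  # no NoCompatibility knowledge, so nothing to recommend
    categories = {o for s, p, o in kg if s == product and p == "hasCategory"}
    same_category = {
        s for s, p, o in kg
        if p == "hasCategory" and o in categories and s != product
    }
    return sorted(
        s for s in same_category
        if (s, "fullCompatibility", consumer_item) in kg
    )


# Toy KG with made-up identifiers and predicate names.
toy_kg = [
    ("filterA", "hasCategory", "oil_filter"),
    ("filterB", "hasCategory", "oil_filter"),
    ("wiperC", "hasCategory", "wiper"),
    ("filterA", "noCompatibility", "car_x_2015"),
    ("filterB", "fullCompatibility", "car_x_2015"),
    ("wiperC", "fullCompatibility", "car_x_2015"),
]
print(recommend_alternatives(toy_kg, "filterA", "car_x_2015"))
```
]]></preformat>
      <p>In the toy data, only the other oil filter is returned: the wiper is compatible with the car but belongs to a different category.</p>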
      <p>Further improvements to the KG are also related to its potential regarding
semantics. As future work, we plan to study how to benefit from reasoners,
for instance for inconsistency detection.</p>
    </sec>
    <sec id="sec-6">
      <title>Conclusion</title>
      <p>Advanced solutions for customer service based on automated question
answering can benefit customers' experience and sales conversion. This work aimed to
increase the effectiveness of systems that automatically answer consumer questions
about products in e-commerce platforms. We investigated ways of generating a
knowledge base, encoded in an RDF triple store, from existing unstructured
data about products. The extracted knowledge has been used by the
automated response system of the GoBots company to answer specific questions
without the direct assistance of human attendants. We showed the feasibility
of our solution for KG construction via automatic generation of RDF triples
extracted from NL messages. Experiments evaluated the effectiveness of the KG
generation and confirmed the high reliability of the solution when applied to real-world
data. We demonstrated that our solution is feasible for answering new questions
based on the constructed KG. The structured knowledge was used to answer
real-time questions in an e-commerce store, indicating its practical and direct
utility. Future work involves the extension and application of the solution to a wider
range of domains and to a higher volume of data.</p>
    </sec>
    <sec id="sec-7">
      <title>Acknowledgements</title>
      <p>This research was supported by GoBots and the São Paulo Research Foundation
(FAPESP) (Grant #2019/08609-0). The opinions expressed in this work do not necessarily reflect those of the funding
agencies.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Bocklisch</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Faulkner</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pawlowski</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nichol</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Rasa: Open source language understanding and dialogue management</article-title>
          .
          <source>arXiv preprint arXiv:1712.05181</source>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ji</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhou</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chen</surname>
          </string-name>
          , H.:
          <article-title>Driven answer generation for product-related questions in e-commerce</article-title>
          .
          <source>In: Proceedings of the Twelfth ACM International Conference on Web Search and Data Mining</source>
          . pp.
          <volume>411</volume>
          –
          <issue>419</issue>
          (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Gruber</surname>
            ,
            <given-names>T.R.:</given-names>
          </string-name>
          <article-title>A translation approach to portable ontology specifications</article-title>
          .
          <source>Knowl. Acquis</source>
          .
          <volume>5</volume>
          (
          <issue>2</issue>
          ),
          <volume>199</volume>
          –220 (Jun
          <year>1993</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Hao</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhang</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Liu</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>He</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Liu</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wu</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhao</surname>
            ,
            <given-names>J.:</given-names>
          </string-name>
          <article-title>An end-to-end model for question answering over knowledge base with cross-attention combining global knowledge</article-title>
          .
          <source>In: Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)</source>
          . pp.
          <volume>221</volume>
          –
          <issue>231</issue>
          (01
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Harris</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Seaborne</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>SPARQL 1.1 query language</article-title>
          . https://www.w3.org/TR/sparql11-query (
          <year>2013</year>
          ), accessed 2020-04-04
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6. Höffner,
          <string-name>
            <given-names>K.</given-names>
            ,
            <surname>Walter</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            ,
            <surname>Marx</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            ,
            <surname>Usbeck</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            ,
            <surname>Lehmann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            ,
            <surname>Ngonga</surname>
          </string-name>
          <string-name>
            <surname>Ngomo</surname>
          </string-name>
          ,
          <string-name>
            <surname>A.C.</surname>
          </string-name>
          :
          <article-title>Survey on challenges of question answering in the semantic web</article-title>
          .
          <source>Semantic Web Journal</source>
          <volume>8</volume>
          ,
          <issue>1</issue>
          –
          <fpage>26</fpage>
          (01
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Kertkeidkachorn</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ichise</surname>
            ,
            <given-names>R.:</given-names>
          </string-name>
          <article-title>An automatic knowledge graph creation framework from natural language text</article-title>
          .
          <source>IEICE Transactions on Information and Systems</source>
          . E101.D pp.
          <volume>90</volume>
          –
          <issue>98</issue>
          (01
          <year>2018</year>
          ). https://doi.org/10.1587/transinf.2017SWP0006
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Su</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yang</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sun</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Srivatsa</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kase</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vanni</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yan</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          :
          <article-title>Exploiting relevance feedback in knowledge graph search</article-title>
          .
          <source>In: Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining</source>
          . pp.
          <volume>1135</volume>
          –
          <issue>1144</issue>
          (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Wu</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fisch</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chopra</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Adams</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bordes</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Weston</surname>
          </string-name>
          , J.:
          <source>Starspace: Embed all the things! arXiv eprint arXiv:1709.03856</source>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>