Conversational Information Retrieval using Knowledge Graphs

Pruthvi Raj Venkatesh1,∗,†, K Chaitanya2,†, Rishu Kumar3,† and P Radha Krishna4,†

1,2,3 Openstream Technologies, No. 25, S.V. Arcade, South End Main Road, 9th Block, Jayanagar, Bengaluru, Karnataka, India, 560069
4 National Institute of Technology Campus, Hanamkonda, Telangana, India, 506004

PASIR-2022: First Workshop on Proactive and Agent-Supported Information Retrieval, October 21st, 2022, Atlanta, Georgia
∗ Corresponding author. † These authors contributed equally.
pruthvi@openstream.com (P. R. Venkatesh); chaitanya@openstream.com (K. Chaitanya); rishu@openstream.com (R. Kumar); prkrishna@nitw.ac.in (P. R. Krishna)
© 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

Abstract
Recent years have seen a huge increase in the popularity of information retrieval (IR) systems that enable users to hold natural language conversations. IR systems such as conversational agents are typically goal-oriented and use predefined queries to retrieve information from backend systems. Researchers have extended these agents to handle different modalities, such as images, sound, and video, to enhance the conversational experience. Though these systems effectively address users' information requirements, there is a need for an approach that adapts easily to diverse use cases and meets all users' information needs without the user being aware of the backend system. In this work, we propose a novel approach called Multimodal Information Retrieval System using Knowledge Graph (MIR-KG) to address this requirement. In the proposed approach, the data surfaced through the conversational agent is stored in a backend database called a knowledge graph (KG). The approach takes multimodal input, uses an offline representation of the KG called an ontology to identify entities and relations, and generates dynamic KG queries. The paper introduces a context-building technique called Multimodal Context Builder (MCB) that preserves user-provided entities in long conversations and uses the ontology to build KG queries over the context information. We compared our results with a Multi-headed Hierarchical Encoder-Decoder with attention approach and found that the proposed approach gives more detailed responses to user queries. A Training Data Generator (TDG) produces the base training set for setting up the conversational agent, eliminating the time required to collect the question-answer pairs needed in the case of goal-based modules. The proposed approach is demonstrated using an already constructed KG with data from the MMD [1] website and can also be applied to other domains.

Keywords
Question Answering System, QnA, Knowledge Graph, Multimodal, Neo4J, Ontology, Entity Identification, Entity masking, Semantic parsing, Logical forms

1. Introduction
Conversational agents (CA) enable users to post questions and get answers. Organizations have recently concentrated on enabling people to engage with these conversational agents through multimodal interactions. The approach described in this research, MIR-KG (Multimodal Information Retrieval System using Knowledge Graph), uses ontologies and knowledge graphs to power the question-answering process.

Figure 1: Multi-turn IR in the fashion domain.
The proposed approach combines transformer-based intent and multimodal entity detection from user questions with procedural dynamic query generation. The multimodal entities provided as input to the conversational agent are progressively collected into the conversation context. The dynamic query generation engine uses this context to generate multimodal knowledge graph queries. A Multimodal Knowledge Graph (MMKG) [2] is used to store the data, and an ontology represents the structural information of the MMKG database. The ontology is used by the Natural Language Understanding (NLU) engine to convert the user question into a logical form. Figure 1 shows an example of a multi-turn information retrieval session where the user is looking for a product and provides text and image input. In the first turn (Q1), the user asks for a product by giving a product description. In the second turn (Q2), the user uploads an image as input; in the third turn (Q3), the user provides the price and gender requirements of the product. The main contributions of this paper are a novel approach for multimodal conversation using an MMKG, dynamic query generation using ontologies, and an MCB approach for supporting context-driven information retrieval systems using ontologies.

2. Literature Survey
There has been considerable work on question answering systems that use knowledge graphs, as well as research on multimodal conversational IR systems. However, little work has been done using KGs together with transformer-based intent and entity identification models to generate responses in multimodal IR systems. Earlier works in the literature [3-7] concentrated on knowledge graph IR through queries generated from a trained model or an intent-and-query mapping repository. These approaches need question-query mapping training data and have limited usability in domains with minimal or no training data. The proposed approach does not depend on such training data, as ontology-based query generation is used, making it easily implementable in new domains. The works in [8-9] focused on intermodal representations, which required the development of a dedicated IR module for multimodal search. The proposed approach does not require a complex IR system, as the multimodal data is represented as node attributes in the knowledge graph and queried together with the other attributes of the node. The works in [10-15] are not ideal IR approaches for real-time interactions because they only focus on one modal representation at a time. The proposed approach does not have these limitations, as information can be retrieved from different multimodal fields or text attributes simultaneously.

3. Basic Concepts
3.1. Multimodal Knowledge Graph
A Multimodal Knowledge Graph (MMKG) is a KG whose attribute set holds values that are multimodal representations of data such as text, images, video, sound, human expressions, and so on. The representation of the data includes binary data, references (image URLs, video URLs), and embeddings (image embeddings [16], sentence embeddings [17]). Figure 2(a) shows an example of a multimodal knowledge graph where the entity set E includes Product, Sleeves, Type, Color, and Gender, and the relation set R includes HAS_SLEEVES, HAS_TYPE, HAS_COLOUR, and FOR_GENDER. The attribute set A includes both text and image data: the Product entity has the text attributes Title, Price, and Description and the image attributes ImageURL, ImageData, and ImageEmbeddings.
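To make the MMKG representation concrete, the following is a minimal sketch of how a Product node with mixed text and image attributes could be stored in Neo4J from Python. The connection URI, credentials, and example values are illustrative assumptions, not part of the original system.

from neo4j import GraphDatabase

# Illustrative connection details; replace with a real Neo4J instance.
driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

def create_product(tx, product):
    # Text attributes (title, price, description) and image attributes
    # (imageURL, imageEmbeddings) live on the same node, so multimodal
    # fields can be queried together with the text fields.
    tx.run(
        """
        MERGE (p:Product {title: $title})
        SET p.price = $price,
            p.description = $description,
            p.imageURL = $image_url,
            p.imageEmbeddings = $image_embeddings
        MERGE (c:Color {title: $color})
        MERGE (p)-[:HAS_COLOUR]->(c)
        """,
        **product,
    )

product = {
    "title": "Red Cotton T-Shirt",
    "price": 199.0,
    "description": "Casual round-neck t-shirt",
    "color": "Red",
    "image_url": "https://example.com/tshirt.jpg",
    "image_embeddings": [0.12, -0.03, 0.88],  # e.g., produced by a vision transformer [16]
}

with driver.session() as session:
    session.execute_write(create_product, product)

Keeping the embedding as a plain list property on the node is what lets a single query mix text, numeric, and embedding conditions, as exploited later in Sec. 4.8.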
3.2. Ontology
The ontology acts as an offline schema definition of the MMKG database and holds the information necessary for generating dynamic queries. An ontology is defined as a set 𝑂 = {𝐸𝑜, 𝑅𝑜, 𝐴𝑜, 𝑇𝑜}, where 𝐸𝑜 ⊆ 𝐸 is a set of entities and 𝐴𝑜 ⊆ 𝐴 is a set of attributes. 𝑅𝑜 = {(𝑒𝑛, 𝑟𝑛𝑚, 𝑒𝑚)} is a set of relation paths, where 𝑒𝑛 and 𝑒𝑚 are entities in 𝐸𝑜 and 𝑟𝑛𝑚 ∈ 𝑅 is the relation between 𝑒𝑛 and 𝑒𝑚. 𝑇𝑜 = {'Text', 'Numeric', 'Image', 'Embeddings', 'Date'} is a set of data types, and each attribute 𝑎𝑛 ∈ 𝐴𝑜 has a type 𝑡𝑛 ∈ 𝑇𝑜. Figure 2(b) shows the entities, attributes, and types; Figure 2(c) shows the relations between the entities.

Figure 2: Multimodal Knowledge Graph and Ontology.

3.3. Conversational Agent Framework
A CA framework [18] provides a platform for developing and hosting conversational agents, or bots. A CA framework contains templates, software development kits (SDKs), and tools that facilitate the creation of conversational agents. The SDK provides a dialogue or conversation flow feature to manage a long-running conversation. A dialogue flow is configurable to perform tasks such as sending messages to users, asking users to enter questions or provide more context, and calling backend APIs. The conversation flow can span a single turn or many turns, as shown in Figure 1. A CA framework has a persistent store that allows developers to keep user-provided information in user variables. User variables are configurable to persist throughout the chat session and can be initialized, reinitialized, or destroyed depending on the user input during the dialogue session. In the example in Figure 1, the dialogue flow is configured to ask the user an initial question (Q1). After the Q1 results are provided, the user is asked to provide more context in Q2 and Q3. The dialogue is configured to request more context until the user is satisfied with the results or signals to close the chat.
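The user-variable mechanism described above can be pictured with the following framework-agnostic sketch; concrete CA frameworks such as the Bot Framework SDK [18] expose an equivalent per-session persistent store, and all names below are illustrative rather than taken from any specific SDK.

class UserVariables:
    """Per-session store that persists values across dialogue turns."""

    def __init__(self):
        self._store = {}

    def set(self, key, value):
        self._store[key] = value

    def get(self, key, default=None):
        return self._store.get(key, default)

    def destroy(self):
        # Cleared when the user signals to close the chat or restarts the flow.
        self._store.clear()

# Entities recognized in turns Q1..Q3 of Figure 1 accumulate in the same
# session store until the conversation ends.
session = UserVariables()
session.set("intent", "getProduct")
session.set("entities", [("Gender", "Title", "Male", "=")])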
4. Proposed Approach
This section captures the implementation approach for the proposed Multimodal Information Retrieval System using Knowledge Graph (MIR-KG) and Multimodal Context Builder (MCB) using the text and image modalities. Figure 3 shows a representation of the process flow.

1. The user enters a question or provides more context in a CA hosted on a web application or an app.
2. The CA framework receives the question. The dialogue flow triggers the NLU service to determine the intent and entities and produce logical forms. Intents are matched against the utterances generated by the Training Data Generator (refer to Sec. 4.1).
3. Depending on the identified intent, the dialogue flow either a) initiates the dynamic query generation module, providing the context object collected in user variables, or b) stores the identified intents and entities in the user variables and requests more context by sending a message to the user, such as "can you provide more information."
4. The context object and the ontology are provided as input to the dynamic query generation module, which creates the query to execute against the MMKG.
5. The results are returned to the CA framework, which formats them for presentation in a user-friendly form.

Figure 3: Multimodal Information Retrieval System using Knowledge Graph (MIR-KG) and Multimodal Context Builder (MCB).

4.1. Training Data Generator
The problem with CAs based on trained models is obtaining or producing the data set needed for training the backend models. Though there are approaches such as synthetic data generation [19] for generating training data, they still require ground-truth question-answer pairs, which may not exist or may take time to gather from subject matter experts (SMEs). This paper proposes a novel Training Data Generator (TDG) that creates training data containing question-answer pairs from the ontology to address this problem. The TDG generates a set of tuples 𝑇 = {𝐼, 𝑄, 𝐸𝑜}, where I is the set of intents, Q is the set of questions, and 𝐸𝑜 is the entity in the ontology that is provided as the query output of question Q. The approach uses a common phrase set 𝐶𝑃 = {𝑃, 𝐸𝑜}, where P is the set of common phrases used to generate the training data for questions related to entity 𝐸𝑜. The algorithm for the TDG is shown in Algorithm 1. For every entity 𝑒𝑖 in 𝐸𝑜, the algorithm looks up the common phrases 𝑝𝑒 in P and generates the question 𝑞𝑖𝑗 by concatenating 𝑝𝑒 with the entity name of 𝑒𝑖 and a where clause 𝑤𝑒. The where clause 𝑤𝑒 is obtained by concatenating the string "where" with every pairwise combination of attributes in 𝐴𝑜, the set of attributes defined in the ontology.

The following scenario explains the TDG process for the ontology shown in Figure 2(b). Consider 𝐸𝑜 = {"Product", "Color"}, attribute sets 𝑎product = {"Price", "Description"} and 𝑎color = {"Title"}, and 𝑝product = {"I am looking for a", "show me"}. The training tuples generated for this setup are:
{"getProduct", "I am looking for a product where product price is #ent and color title is #ent", "Product"},
{"getProduct", "I am looking for a product where product description is #ent and color title is #ent", "Product"},
{"getProduct", "show me product where product price is #ent and color title is #ent", "Product"},
{"getProduct", "show me product where product description is #ent and color title is #ent", "Product"}.

Algorithm 1 Training Data Generator
Input: 𝐸𝑜 and the attribute sets 𝑎𝑒 from the ontology file
Output: List of utterances 𝑞𝑖𝑗
1: for each 𝑒𝑖 ∈ 𝐸𝑜 do
2:  for each attribute 𝑎𝑖𝑒 of 𝑒𝑖 do
3:   for each 𝑒𝑗 ∈ 𝐸𝑜 do
4:    for each attribute 𝑎𝑗𝑒 of 𝑒𝑗 do
5:     if 𝑎𝑖𝑒 ≠ 𝑎𝑗𝑒 then
6:      𝑝𝑒 ← fetch common phrase from 𝐶𝑃 for entity 𝑒𝑖
7:      𝑖𝑒 ← concat("get", 𝑒𝑖)
8:      𝑤𝑒 ← concat("where", 𝑒𝑖, 𝑎𝑖𝑒, "is #ent and", 𝑒𝑗, 𝑎𝑗𝑒, "is #ent")
9:      𝑞𝑖𝑗 ← concat(𝑝𝑒, 𝑒𝑖, 𝑤𝑒)
10:     end if
11:    end for
12:   end for
13:  end for
14: end for

As the example shows, the TDG can generate multiple combinations of questions depending on the entities and attributes present in the ontology. The generated training tuples are used for intent identification by the NLU service (Section 4.5). The approach thus reduces the overall training time because of the lower dependency on base training data.
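The following is a runnable sketch of Algorithm 1, with the ontology reduced to plain Python dictionaries that mirror the example above; the dictionary layout is an assumption made for illustration.

from itertools import product as cartesian

# Illustrative ontology fragments: attributes per entity and common phrases.
attributes = {"Product": ["Price", "Description"], "Color": ["Title"]}
common_phrases = {"Product": ["I am looking for a", "show me"]}

def generate_training_tuples(attributes, common_phrases):
    tuples = []
    for e_i, attrs_i in attributes.items():
        phrases = common_phrases.get(e_i, [])
        for a_i, (e_j, attrs_j) in cartesian(attrs_i, attributes.items()):
            for a_j in attrs_j:
                if (e_i, a_i) == (e_j, a_j):
                    continue  # skip identical attribute pairs (step 5)
                intent = "get" + e_i  # step 7
                where = (f"where {e_i.lower()} {a_i.lower()} is #ent "
                         f"and {e_j.lower()} {a_j.lower()} is #ent")  # step 8
                for p in phrases:
                    # Question = common phrase + entity name + where clause (step 9)
                    tuples.append((intent, f"{p} {e_i.lower()} {where}", e_i))
    return tuples

for t in generate_training_tuples(attributes, common_phrases):
    print(t)
# e.g., ('getProduct', 'I am looking for a product where product price is #ent
#        and color title is #ent', 'Product')

Besides the four tuples listed above, the sketch also emits Product-Product attribute pairs, exactly as the nested loops of Algorithm 1 do.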
4.2. NLU Service
The NLU service converts the natural language question into a logical form used by the dynamic query generator to generate MMKG queries. The NLU service requires intent and entity configurations to perform the following activities: 1) intent identification, 2) entity identification, and 3) logical form generation. The NLU service is built on a standard NLP framework such as spaCy [20]. Figure 4 shows the process flow of the NLU service.

4.3. Intent and Entity Configuration
This process involves configuring the intents and entities the CA system should support. The NLP pipeline uses this configuration to identify entities and intents in the question. The configuration process involves updating the intent information (the TDG output) and the entity information in the NLU service configuration. The entity information includes:

1. Synonym entities: configuration to identify entities that have discrete values. The configuration information consists of tuples {𝐸𝑡, 𝐴𝑡, 𝑉𝑡}, where 𝐸𝑡 ∈ 𝐸𝑜 and 𝐸𝑜 is the set of entities in the ontology, and 𝐴𝑡 is the attribute of 𝐸𝑡 with value 𝑉𝑡 ∈ 𝑉, where V is the set of distinct values in the MMKG for entity 𝐸𝑡. For the MMKG in Figure 2, the distinct values for Color include {Color, Blue} and {Color, Green}, and the distinct values for Gender include {Gender, Male}, {Gender, Female}, {Gender, Male | Men}, and {Gender, Female | Ladies}. Note that {Gender, Female | Ladies} also captures "Ladies" as a synonym for the value "Female".
2. Regular expression entities: similar to synonym entities, this configuration identifies entities with continuous values. In Figure 2, the Price attribute of the Product entity is a continuous variable; a value such as "250$" can be identified using a regular expression such as [0-9]+\$.
3. Phrase entities: configuration to identify entities that are embedded in phrases.

Figure 4: NLU Service pipeline with the Training Data Generator.

4.4. Entity Identification
Entity recognition uses the NLP pipeline and the entity configuration to identify the entities provided as part of the query. The NLP pipeline is a feature of the NLP framework and consists of an Entity Ruler, Named Entity Recognition (NER), and an Entity Relation Linker. The Entity Ruler identifies and marks tokens in the supplied question as entities by comparing them with the configured entities. NER assigns labels to the identified entities, and the Entity Relation Linker identifies the relationship between the entities and the entity values based on relation terms such as "greater than", "less than", "similar to", "having a value", and so on. The entity recognition module generates quadruples 𝐸𝑅 = {𝐸𝑜, 𝐴𝑜, 𝑉, 𝑅}, where R is the relation value. For the question "can you filter these sweatshirts for men and price less than 250 dollars", the word "men" is resolved to "Male" using the synonym entities, and "250 dollars" is identified using the regular expression entities. NER maps "Male" to the Gender.Title attribute and "250" to the Product.Price attribute. The Entity Relation Linker maps entities and attributes to values and generates the quadruples {"Gender", "Title", "Male", "="} and {"Product", "Price", "250", "<"}.
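A minimal sketch of the entity identification step using spaCy's Entity Ruler [20] is shown below. The synonym and regular-expression configurations of Sec. 4.3 are encoded as token patterns, and the Entity Relation Linker is reduced to a lookup of relation terms just before each entity; the labels and the relation heuristic are illustrative assumptions, not the exact pipeline components.

import spacy

nlp = spacy.blank("en")
ruler = nlp.add_pipe("entity_ruler")
ruler.add_patterns([
    # Synonym entity: "male"/"men" both resolve to Gender.Title
    {"label": "GENDER_TITLE", "pattern": [{"LOWER": {"IN": ["male", "men"]}}]},
    # Regular-expression-style entity: a number followed by a currency word
    {"label": "PRODUCT_PRICE",
     "pattern": [{"LIKE_NUM": True}, {"LOWER": {"IN": ["dollars", "$"]}}]},
])

RELATION_TERMS = {"less than": "<", "greater than": ">", "similar to": "~"}

def recognize(question):
    quads = []
    doc = nlp(question)
    for ent in doc.ents:
        entity, attribute = ent.label_.split("_")
        # Relation linking: inspect the two tokens preceding the entity
        prefix = doc[max(ent.start - 2, 0):ent.start].text.lower()
        relation = RELATION_TERMS.get(prefix, "=")
        quads.append((entity.title(), attribute.title(), ent.text, relation))
    return quads

print(recognize("can you filter these sweatshirts for men and price less than 250 dollars"))
# [('Gender', 'Title', 'men', '='), ('Product', 'Price', '250 dollars', '<')]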
4.5. Intent Identification
The question sent to the NLU service is matched against the predefined questions generated by the TDG. The proposed approach uses transformer-based models for sentence similarity to identify the TDG utterance most relevant to the supplied question. If Q is the supplied question, the entity masking module masks the entities identified by the NLP pipeline and generates 𝑄𝑚𝑎𝑠𝑘. The sentence embedding model [17] generates the sentence vector 𝑉𝑚𝑎𝑠𝑘 for 𝑄𝑚𝑎𝑠𝑘. The sentence vector 𝑉𝑚𝑎𝑠𝑘 is compared for similarity with all the questions in the tuples 𝑇 = {𝐼, 𝑄, 𝐸𝑜} generated by the TDG, using Algorithm 2. The tuple with the maximum similarity score that also exceeds the minimum threshold 𝑡𝑚𝑖𝑛 is identified as the intent of the question; 𝑡𝑚𝑖𝑛 is usually set to 0.75 or above. For example, if Q = "I am looking for t-shirts in red color and price less than 250 dollars", the entity masking module generates 𝑄𝑚𝑎𝑠𝑘 = "I am looking for #ent in #ent color and price less than #ent dollars", and the intent identification algorithm identifies the TDG tuple {"getProduct", "I am looking for a product where product price is #ent and color title is #ent", "Product"} as the intent tuple.

Algorithm 2 Intent Identification
Input: 𝑉𝑚𝑎𝑠𝑘, 𝑇
Output: TDG tuple 𝑡𝑖 = {𝐼𝑖, 𝑄𝑖, 𝐸𝑜𝑖}
1: for each 𝑞𝑖 ∈ 𝑇 do
2:  𝑞𝑖𝑣𝑒𝑐𝑡𝑜𝑟 ← getSentenceVector(𝑞𝑖)
3:  $q_i^{score} \leftarrow \dfrac{q_i^{vector} \cdot V_{mask}}{\lVert q_i^{vector} \rVert \, \lVert V_{mask} \rVert} = \dfrac{\sum_{k=1}^{n} q_{i,k}^{vector}\, v_{k}^{mask}}{\sqrt{\sum_{k=1}^{n} (q_{i,k}^{vector})^2}\, \sqrt{\sum_{k=1}^{n} (v_{k}^{mask})^2}}$
4: end for
5: 𝑞𝑚𝑎𝑥 ← the question 𝑞𝑖 with the maximum score 𝑞𝑖𝑠𝑐𝑜𝑟𝑒
6: if the score of 𝑞𝑚𝑎𝑥 ≥ 𝑡𝑚𝑖𝑛 then
7:  return the tuple 𝑡𝑖 = {𝐼𝑖, 𝑄𝑖, 𝐸𝑜𝑖} corresponding to 𝑞𝑚𝑎𝑥
8: else
9:  return unknown intent
10: end if
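The similarity computation of Algorithm 2 can be sketched as follows, assuming sentence embeddings from the sentence-transformers library [17]; the model name, threshold, and single TDG tuple are illustrative.

import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # illustrative embedding model
T_MIN = 0.75  # minimum similarity threshold t_min

tdg_tuples = [
    ("getProduct",
     "I am looking for a product where product price is #ent and color title is #ent",
     "Product"),
]

def identify_intent(q_mask):
    v_mask = model.encode(q_mask)
    best_tuple, best_score = None, -1.0
    for t in tdg_tuples:
        q_vector = model.encode(t[1])
        # Cosine similarity between the masked question and the TDG question
        score = np.dot(q_vector, v_mask) / (
            np.linalg.norm(q_vector) * np.linalg.norm(v_mask))
        if score > best_score:
            best_tuple, best_score = t, score
    return best_tuple if best_score >= T_MIN else None  # None = unknown intent

print(identify_intent("I am looking for #ent in #ent color and price less than #ent dollars"))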
4.6. Logical Form Generation
A logical form is generated containing the output from entity and intent identification. The generated logical form is a set 𝐿 = {𝑄, 𝐸𝑅, 𝑡𝑖, 𝑞𝑚𝑎𝑥}, where Q is the question asked, ER is the set of all recognized quadruples, 𝑡𝑖 is the entity tuple identified by the intent identification algorithm, and 𝑞𝑚𝑎𝑥 is the maximum similarity score of the identified intent question with the supplied question.

4.7. Multimodal Context Building (MCB)
The MCB technique uses the ontology and the entity lists provided by the NLU to generate context objects. A context object is a collection of entities recognized by the NLU over multiple turns. The CA framework (refer to Sec. 3.3) plays a key role in creating and updating the context object. Table 1 shows the working of the MCB approach for the sample conversation in Figure 1; it captures the states and responses of the different components involved in a multi-turn conversation. The CA framework is configured to control the flow by asking the questions shown in the "CA Response" column, with the user-provided response shown in the "User Question/Context" column. Once the intent I is identified in the first turn, the CA framework considers only the entities recognized by the NLU engine when building the context object in subsequent turns. If no entities are identified, the CA framework creates the context object by mapping the supplied question to the product Description attribute. The results of the dynamic query generated from the context object are shown to the user at each turn; if unsatisfied, the user provides more context to filter the results further. The MCB approach thus incrementally builds the context over multiple turns and becomes a convenient option for maintaining context. The approach is not limited by the total number of turns in the conversational flow.

4.8. Dynamic Query Generator
The dynamic query generation module processes the context object stored in the CA framework and builds the KG query. Figure 5(a) shows the query mapping between entity fields, the return entity, ontology fields, and the query components for Neo4J and SQL. Figure 5(b) shows the dynamic query generation for query Q1 in Figure 1. The dynamic query generation module contains procedural code for handling data types such as embeddings, text, numeric, and date; these rules can be altered depending on the target platform. The query object is the entity whose attributes will be displayed in the results, and this value is fetched from 𝐸𝑜 in the context object. The query's return fields are obtained from the attribute set of entity 𝐸𝑜 in the ontology, and the where clause is populated from the entities listed in the context object. Since the dynamic query generator contains generic rules keyed on data types, the approach is extensible: applying it to a different domain only requires creating the corresponding ontology, saving significant effort in training and testing.
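The following is a minimal sketch of the dynamic query generator, assuming the context object is the quadruple list produced by the MCB and the ontology is a plain dictionary; only the Text and Numeric type rules are shown, targeting Neo4J/Cypher in the spirit of Figure 5(a). The dictionary layout and relation names are illustrative.

# Illustrative ontology fragment: attribute types and relation paths.
ontology = {
    "Product": {"attributes": {"Title": "Text", "Price": "Numeric",
                               "Description": "Text"}},
    "Gender": {"attributes": {"Title": "Text"}, "relation": "FOR_GENDER"},
}

def build_query(return_entity, context_quads, ontology):
    match, where = [f"(p:{return_entity})"], []
    for entity, attribute, value, relation in context_quads:
        if entity == return_entity:
            var = "p"
        else:
            # Join the related entity via its relation path in the ontology
            var = entity.lower()
            match.append(f"(p)-[:{ontology[entity]['relation']}]->({var}:{entity})")
        # Type-specific rule: quote Text values, leave Numeric values bare
        a_type = ontology[entity]["attributes"][attribute]
        literal = f"'{value}'" if a_type == "Text" else value
        where.append(f"{var}.{attribute.lower()} {relation} {literal}")
    # Return fields come from the return entity's attribute set (Sec. 4.8)
    returns = ", ".join(f"p.{a.lower()}" for a in ontology[return_entity]["attributes"])
    return f"MATCH {', '.join(match)} WHERE {' AND '.join(where)} RETURN {returns}"

context = [("Gender", "Title", "Male", "="), ("Product", "Price", "250", "<")]
print(build_query("Product", context, ontology))
# MATCH (p:Product), (p)-[:FOR_GENDER]->(gender:Gender)
# WHERE gender.title = 'Male' AND p.price < 250 RETURN p.title, p.price, p.description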
5. Results
The proposed approach was tested by ingesting 1743 JSON records from the MMD [1] data into the Neo4J KG database, resulting in 2438 nodes and 12201 relations. The TDG generated 152 utterances, 12 intents, and 38 entities. The average response time for result generation was less than 2 seconds; the KG data load time was approximately 10 minutes, and embedding generation took around 2 seconds.

Figure 5: Dynamic generation of KG queries.

The results were compared with Multi-headed Hierarchical Encoder-Decoder with attention (MHRED-attn) results. Table 2 shows the comparison for a sample scenario. Note that when the user asks about multiple product attributes, the MHRED-attn model could not provide the required results, whereas the proposed approach returned the product attributes from the KG containing the required information.

6. Conclusion
Conversational agents enable users to perform IR by querying the underlying systems. This paper presents a novel predictive and rule-based approach that reduces training time and supports multimodal conversations using a multimodal knowledge graph. The domain ontology defines entities, attributes, and rules and drives the dynamic query generation module, which helps create conversational IR systems for a new domain with minimal effort.

References
[1] MMD: Towards Building Large Scale Multimodal Domain-Aware Conversation Systems. https://amritasaha1812.github.io/MMD/
[2] X. Zhu, Z. Li, X. Wang, X. Jiang, P. Sun, X. Wang, Y. Xiao, and N. J. Yuan. Multimodal Knowledge Graph Construction and Application: A Survey. 2022. doi: 10.48550/arXiv.2202.05786
[3] H. Jiang, B. Yang, L. Jin, and H. Wang. A BERT-Bi-LSTM-Based Knowledge Graph Question Answering Method. 2021 International Conference on Communications, Information System and Computer Engineering (CISCE), 2021, pp. 308-312. doi: 10.1109/CISCE52179.2021.9445907
[4] L. Ma, P. Zhang, D. Luo, M. Zhou, Q. Liang, and B. Wang. Answer Graph-based Interactive Attention Network for Question Answering over Knowledge Base. 2020 IEEE Intl Conf on Parallel & Distributed Processing with Applications, Big Data & Cloud Computing, Sustainable Computing and Communications, Social Computing & Networking (ISPA/BDCloud/SocialCom/SustainCom), 2020, pp. 521-528. doi: 10.1109/ISPA-BDCloud-SocialCom-SustainCom51426.2020.00091
[5] Y. Li, J. Cao, and Y. Wang. Implementation of Intelligent Question Answering System Based on Basketball Knowledge Graph. 2019 IEEE 4th Advanced Information Technology, Electronic and Automation Control Conference (IAEAC), 2019, pp. 2601-2604. doi: 10.1109/IAEAC47372.2019.8997747
[6] S. Hu, L. Zou, J. X. Yu, H. Wang, and D. Zhao. Answering Natural Language Questions by Subgraph Matching over Knowledge Graphs (Extended Abstract). 2018 IEEE 34th International Conference on Data Engineering (ICDE), 2018, pp. 1815-1816. doi: 10.1109/ICDE.2018.00265
[7] X. Dai, J. Ge, H. Zhong, D. Chen, and J. Peng. QAM: Question Answering System Based on Knowledge Graph in the Military. 2020 IEEE 19th International Conference on Cognitive Informatics and Cognitive Computing (ICCI*CC), 2020, pp. 100-104. doi: 10.1109/ICCICC50026.2020.9450261
[8] S. Zoghbi, G. Heyman, J. C. Gomez, and M.-F. Moens. Fashion Meets Computer Vision and NLP at e-Commerce Search. International Journal of Computer and Electrical Engineering (IJCEE), Vol. 8, No. 1, pp. 31-43, February 2016.
[9] K. Laenen, S. Zoghbi, and M.-F. Moens. Web Search of Fashion Items with Multimodal Querying. Proceedings of WSDM 2018: The Eleventh ACM International Conference on Web Search and Data Mining, Marina Del Rey, CA, USA, February 5-9, 2018.
[10] S. Bell and K. Bala. Learning Visual Similarity for Product Design with Convolutional Neural Networks. ACM Transactions on Graphics (TOG), Vol. 34, No. 4, pp. 1-10, July 2015.
[11] J.-H. Hsiao and L.-J. Li. On Visual Similarity based Interactive Product Recommendation for Online Shopping. 2014 IEEE International Conference on Image Processing (ICIP), pp. 3038-3041, 2014.
[12] B. Zhao, J. Feng, X. Wu, and S. Yan. Memory-Augmented Attribute Manipulation Networks for Interactive Fashion Search. IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2017), pp. 6156-6164, 2017.
[13] X. Han, Z. Wu, P. X. Huang, X. Zhang, M. Zhu, Y. Li, Y. Zhao, and L. S. Davis. Automatic Spatially-Aware Fashion Concept Discovery. 2017 IEEE International Conference on Computer Vision (ICCV), pp. 1472-1480, 2017.
[14] C. R. Sapna, M. Anagha, K. Vats, K. Baradia, T. Khan, S. Sarkar, and S. Roychowdhury. Recommendence and Fashionsence: Online Fashion Advisor for Offline Experience. ACM International Conference Proceeding Series, pp. 256-259, 2019.
[15] A. Paranjape, A. See, K. Kenealy, H. Li, A. Hardy, P. Qi, K. R. Sadagopan, N. M. Phu, D. Soylu, and C. D. Manning. Neural Generation Meets Real People: Towards Emotionally Engaging Mixed-Initiative Conversations. 3rd Proceedings of Alexa Prize, arXiv:2008.12348, 2020.
[16] Vision Transformer. https://en.wikipedia.org/wiki/Vision_transformer
[17] Sentence Embedding. https://en.wikipedia.org/wiki/Sentence_embedding
[18] Bot Framework SDK. https://docs.microsoft.com/en-us/azure/bot-service/bot-service-overview?view=azure-bot-service-4.0
[19] Synthetic Data. https://en.wikipedia.org/wiki/Synthetic_data
[20] Language Processing Pipelines · spaCy. https://spacy.io/usage/processing-pipelines