<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>International Semantic Web Conference, November</journal-title>
      </journal-title-group>
      <issn pub-type="ppub">1613-0073</issn>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Conversational GUI for Semantic Automation Layer</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Daniel Karl I. Weidele</string-name>
          <email>daniel.karl@ibm.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Gaetano Rossiello</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Gregory Bramble</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Abel N. Valente</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Sugato Bagchi</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Faisal Chowdhury</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Mauro Martino</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Nandana Mihindukulasooriya</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Haritha Ananthakrishnan</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Hendrik Strobelt</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Hima Patel</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Alfio Gliozzo</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Owen Cornec</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ankush Gupta</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Sameep Mehta</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Manish Kesarwani</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Horst Samulowitz</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Oktie Hassanzadeh</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Arvind Agarwal</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Lisa Amini</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Semantic Data Science</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Conversational Graphical User Interface</institution>
          ,
          <addr-line>Semantic Automation Layer, IBM, watsonx, Data Lakehouse</addr-line>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2023</year>
      </pub-date>
      <volume>0</volume>
      <fpage>6</fpage>
      <lpage>10</lpage>
      <abstract>
        <p>We present a Conversational Graphical User Interface for Semantic Automation Layer surfacing in IBM watsonx.data Tech Preview. Drawing from IBM pre-trained large language models (watsonx.ai), the layer semantically enriches data tables contained in a large, meta-structured data lakehouse (watsonx.data). We propose a graphical user interface backed by another fine-tuned watsonx.ai language model to provide business analysts with conversational access and control over the lakehouse using natural language.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>-</title>
      <p>CEUR
ceur-ws.org</p>
    </sec>
    <sec id="sec-2">
      <title>1. Semantic Automation Layer</title>
      <p>A semantic layer consists of business terms and glossary items that are meaningful in two main
ways. First, they make up the language of a particular business; second, they enhance
existing data fields and labels that are cryptic and hard to understand. We build on the idea of a
semantic layer because it is crucial for modern data management and data science. However, most of
the semantic layers proposed in industry require constant human effort to ensure that the term
mappings are correct for all of the data of the business. Coupled with the constantly changing
nature of business data, this tedious manual process is both taxing on employees' capacity and
costly to businesses.</p>
      <p>We therefore introduce a Semantic Automation Layer made up of four key components: (1)
Semantic Search (keyword-based search and search for joinable tables), which uses table information
to build embeddings that make enterprise data easier to find and expand on, (2) Semantic Enrichment,
which uses generative and discriminative artificial intelligence approaches to automatically caption,
describe, tag and map tables and columns, (3) Glossary Utilities, featuring semantically assisted
techniques to merge and consolidate glossaries and to capture human feedback in a more natural and
playful way than current industry practice, and (4) a Data Quality Toolkit, which provides data
profiling and quality estimation metrics to assess the quality of ingested data in a systematic
and objective manner.</p>
    </sec>
    <sec id="sec-3">
      <title>2. Conversational GUI</title>
      <p>We developed a Svelte web application that lets users interact directly with the Semantic Automation
Layer in a graphical way. The interface also comprises a conversational agent in which a
fine-tuned large language model receives instructions from users (see figure 1).</p>
      <p>To demonstrate the business value of the overall system, let us assume the following user scenario:
a business analyst (BA) would like to access and curate data from the lakehouse for further
processing in their business intelligence application. Specifically, BA would like to investigate
whether banking customer churn can be explained by economic data. BA phrases this
instruction, upon which the conversational agent recognizes it as a search, retrieves data
tables from the lakehouse and presents the results ranked by semantic relevance and with clean,
AI-generated table captions (see figure 2).</p>
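      <p>Ranking by semantic relevance can be illustrated with a minimal sketch. It assumes that the query and each table are represented by embedding vectors and that relevance is scored by cosine similarity; the paper does not state the scoring function, and the table names and three-dimensional vectors below are invented for illustration only.</p>
      <preformat>
```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_tables(query_vec, table_vecs):
    """Return table names sorted by descending similarity to the query vector."""
    scored = [(cosine_similarity(query_vec, vec), name) for name, vec in table_vecs.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [name for _, name in scored]


# Hypothetical toy embeddings; real embeddings would come from an embedding model.
tables = {
    "customer_churn": [0.9, 0.1, 0.0],
    "geographic_area": [0.2, 0.8, 0.1],
    "product_catalog": [0.0, 0.1, 0.9],
}
query = [1.0, 0.2, 0.0]
ranking = rank_tables(query, tables)
```
</preformat>
      <p>In this toy setup the table whose vector points in nearly the same direction as the query is listed first, which mirrors the ranked result list described above.</p>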
      <p>The conversational agent observes the UI context, so when BA asks to look into
the schema of this table, the agent knows which table is meant. BA could also have opened the
table via a direct click, which would have auto-generated a language utterance to keep chat
history and UI state in sync. Figure 3 shows the requested schema of the Individual customer
churn fact table including semantic table enrichments (each indicated with a light bulb icon),
from top to bottom: auto-caption, generated description, estimated primary key, and generated
tags. On the column level, the Semantic Automation Layer further generates descriptions for
otherwise cryptic column names, and maps columns to business glossary concepts.</p>
      <p>Let us further assume BA is in possession of a local CSV file, which contains unemployment
rates for different geographies. To later join this local CSV file with the customer churn table
via zip codes, BA first requests to enrich the churn table to see if geographic information can be
added to this table. The agent recognizes this intent as a search for joinable tables. BA selects
the Geographic area table and explores join quality metrics, such as the identified join column
and containment or Jaccard coefficients.</p>
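      <p>The two join quality metrics named above have standard set-based definitions, sketched here under the assumption that they are computed over the distinct values of the candidate join columns. The function names and the sample zip codes are hypothetical and are not taken from the system.</p>
      <preformat>
```python
def containment(query_values, candidate_values):
    """Fraction of distinct query values that also occur in the candidate column."""
    q, c = set(query_values), set(candidate_values)
    return len(q.intersection(c)) / len(q) if q else 0.0


def jaccard(query_values, candidate_values):
    """Size of the intersection divided by the size of the union of distinct values."""
    q, c = set(query_values), set(candidate_values)
    union = q.union(c)
    return len(q.intersection(c)) / len(union) if union else 0.0


# Hypothetical zip code columns from a churn table and a geographic table.
churn_zips = ["10001", "10002", "94105", "60601"]
geo_zips = ["10001", "10002", "94105", "73301", "30301", "02108"]

join_containment = containment(churn_zips, geo_zips)  # 3 of 4 churn zips are covered
join_jaccard = jaccard(churn_zips, geo_zips)          # 3 shared values out of 7 distinct
```
</preformat>
      <p>Containment is asymmetric and indicates how many rows of the query table would find a match, whereas the Jaccard coefficient is symmetric and drops when the candidate table holds many additional keys.</p>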
      <p>The resulting table (see figure 5) contains 42 columns, including the 23 additional geographical
area features. BA can further explore the data quality report for the joined table, comprising
metrics such as Data Duplicates, Data Completeness, Outlier Detection, and more.</p>
      <p>In a last step, which is not represented graphically here, BA imports their unemployment
data CSV into the lakehouse, before performing a final join with the churn-geo table via zip
codes.</p>
    </sec>
    <sec id="sec-4">
      <title>3. Conversational Agent</title>
      <p>The conversational agent is a fine-tuned JSON-to-JSON transformer model that receives the
current UI view ID, the context map containing user selections or other UI observations, and
the user utterance as its JSON input string. The goal of the model is to perform three tasks
in a single pass: (1) intent recognition, (2) entity extraction and (3) response generation. The
output of the transformer model is therefore a JSON string containing the view ID that maps to
the user’s intent, the required input parameters for the view, and a friendly response to
be presented in the chat component.</p>
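      <p>The input and output contract described above can be sketched as follows. The JSON key names (view, context, utterance, params, response), the view IDs and the mocked model output are all hypothetical, since the paper does not publish the actual schema; the sketch only shows how one JSON string in and one JSON string out can carry intent, entities and response together.</p>
      <preformat>
```python
import json


def build_agent_input(view_id, context, utterance):
    """Serialize the UI state and the user text into a single JSON input string."""
    return json.dumps({"view": view_id, "context": context, "utterance": utterance})


def parse_agent_output(raw):
    """Parse the JSON output and check that all three expected parts are present."""
    data = json.loads(raw)
    missing = [key for key in ("view", "params", "response") if key not in data]
    if missing:
        raise ValueError("agent output is missing keys: " + ", ".join(missing))
    return data["view"], data["params"], data["response"]


request = build_agent_input(
    view_id="search_results",
    context={"selected_table": "customer_churn"},
    utterance="Show me the schema of this table",
)

# Stand-in for the model call; a real deployment would send the request to the model.
mock_output = json.dumps({
    "view": "table_schema",                      # intent, expressed as the target view ID
    "params": {"table": "customer_churn"},       # extracted entities for that view
    "response": "Here is the schema of the customer churn table.",
})
view, params, response = parse_agent_output(mock_output)
```
</preformat>
      <p>Validating the parsed output before routing to a view is a defensive choice in this sketch, since a generative model is not guaranteed to emit every expected key.</p>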
    </sec>
  </body>
  <back>
    <ref-list />
  </back>
</article>