<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Live Study Proposal: Collecting Natural Language Trace Queries</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Sugandha Lohar</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jane Cleland-Huang</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Alexandar Rasin</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Patrick Mäder</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>DePaul University</institution>
          ,
          <addr-line>Chicago, IL 60422</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Ilmenau University of Technology</institution>
          ,
          <addr-line>Ilmenau</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <fpage>207</fpage>
      <lpage>210</lpage>
      <abstract>
        <p>[Context &amp; motivation:] Software traceability links created during the development process are subsequently underutilized because project stakeholders lack the skills they need to compose trace queries. TiQi addresses this problem by accepting natural language trace queries and transforming them into executable SQL. [Question/problem:] The TiQi engine depends on the presence of a domain model, which is best constructed by collecting samples from potential users. We want to learn which trace queries are of interest to potential stakeholders and what terminology they would choose to express those queries. [Principal ideas/results:] In this live study we will demonstrate TiQi in action and lead the participants through a series of carefully crafted 'what-if' scenarios designed to capture a variety of sample queries. [Contribution:] The study is expected to lead to a more extensive domain model, which will improve the accuracy of TiQi's query transformation process.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        In safety-critical domains traceability is prescribed by regulators [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] who typically
require traceability paths to be established between hazards, faults, mitigating
requirements, design, code, and test cases [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], among others. Manufacturers must invest
significant effort in creating trace links [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]; however, these links are often used
purely for compliance purposes and are otherwise underutilized [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. The potential
benefits of using trace links to support activities such as impact analysis and
regression test case selection go unrealized. One reason is that trace queries
can be quite complex and cut across many different artifacts. For example, a
typical trace query could synthesize data about faults, mitigating requirements,
code, test cases, and test logs, as well as the trace links that connect them. This
leads to significant complexity, especially as many software developers have only
rudimentary skills at constructing structured queries using technologies such as
SQL or XQuery [
        <xref ref-type="bibr" rid="ref1 ref2">1, 2</xref>
        ].
      </p>
      <p>
        In our previous paper presented at the International Requirements
Engineering Conference, entitled “TiQi: Towards Natural Language Trace Queries” [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ],
we presented a solution for transforming NL trace queries into SQL. Prior to
engaging in this project we assumed that we could find and adopt an open source
general NL database query language and customize it for the traceability domain;
however, an extensive search of available open-source solutions was
unsuccessful. We therefore constructed a generic NL database query mechanism and then
customized it with vocabulary and concepts from the traceability domain. The
result of this work is TiQi, which transforms spoken or written natural language
trace queries into executable SQL statements. In a series of controlled
experiments, TiQi has transformed NL queries from different projects at accuracy
rates ranging from 50% to 89%.
      </p>
      <p>[Figure 1: Traceability Information Model (TIM) excerpt. Artifact types and attributes: Hazard (ID, Hazard, Classification, Probability); Fault (ID, Contributing Fault, Severity); Environmental Assumption. Link type shown: causes.]</p>
      <p>Copyright © 2015 by the authors. Copying permitted for private and academic purposes.
This volume is published and copyrighted by its editors.</p>
      <p>TiQi prompts the user for an NL query by displaying a Traceability
Information Model (TIM), which, as depicted in Figure 1, models artifact types, their
attributes, and semantically typed links.</p>
      <p>In order to interpret queries, TiQi requires an underlying domain model
describing the terminology and concepts of the traceability and product domain.
The model covers project-specific terms, question terms, and junk terms. Project-specific
terms describe artifact types and attribute names, and can be extracted from
the raw data of the project. Question terms form the ‘glue’ which holds the
pieces of the query together. For example, terms such as <italic>show me</italic>, <italic>list all</italic>, or <italic>I’d
like to see</italic> are all synonyms for the SQL construct SELECT. Similarly, terms
such as <italic>associated with</italic>, <italic>related to</italic>, or <italic>with</italic> are synonyms for various forms of
JOIN.</p>
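      <p>The role of question terms can be illustrated with a minimal sketch. The phrases and their SQL constructs below are the examples given in the text; the dictionary layout, the function name tag_question_terms, and the longest-phrase-first matching strategy are assumptions made for illustration and are not taken from the TiQi implementation.</p>
      <preformat>
```python
import re

# Example question terms from the text, mapped to the SQL construct they signal.
# The structure of this table is hypothetical, not TiQi's actual domain model.
QUESTION_TERMS = {
    "show me": "SELECT",
    "list all": "SELECT",
    "i'd like to see": "SELECT",
    "associated with": "JOIN",
    "related to": "JOIN",
    "with": "JOIN",
}


def tag_question_terms(query):
    """Return (phrase, SQL construct) pairs in the order they occur in the query."""
    # Normalise case and typographic apostrophes before matching.
    text = query.lower().replace("\u2019", "'")
    found = []
    # Try longer phrases first so "associated with" wins over plain "with".
    for phrase in sorted(QUESTION_TERMS, key=len, reverse=True):
        pattern = r"\b" + re.escape(phrase) + r"\b"
        for match in re.finditer(pattern, text):
            found.append((match.start(), phrase, QUESTION_TERMS[phrase]))
        # Blank out matches so shorter phrases cannot reuse the same words.
        text = re.sub(pattern, lambda m: " " * len(m.group()), text)
    found.sort()
    return [(phrase, construct) for _, phrase, construct in found]
```
      </preformat>
      <p>For instance, calling tag_question_terms on "Show me all hazards associated with requirement R136" yields one SELECT marker for "show me" and one JOIN marker for "associated with". Everything else in the query would have to be resolved against project-specific terms or discarded as junk.</p>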
      <p>Our research goal is to collect sample trace queries from which we can
extend TiQi’s domain model. There are two specific research questions. First,
“What queries are of interest to project stakeholders?” and second “What
terminology do project stakeholders use to express their queries?”</p>
    </sec>
    <sec id="sec-2">
      <title>Research Design</title>
      <p>
        The study will be performed interactively as a large group, with both
paper-based and online options. The session will open with 10 minutes of training in
which the notion of trace queries and TIMs will be presented. First, demographic
data about each participant will be collected. Then each participant will be given
a TIM similar to the one shown in Figure 1. Half of the participants will work
on the Isolette System and half on the EasyClinic system [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. Three data records
will be provided as samples for each artifact type.
      </p>
      <p>In the first part of the study, specific scenarios will be used as prompts.
The participant will be asked to address each scenario by creating an NL query,
using words of their own choosing. There are nine scenarios, of which four are
shown below. Each participant will be asked to select five of them and create a
query for each. Participants will also be asked to add one or two queries of their own
that would support a Software Engineering task which they perform regularly.
Sample scenarios include:</p>
      <list list-type="order">
        <list-item>
          <p>The safety officer is worried that an important requirement, R136, is not
correctly implemented. The developer tells him that it is not only
implemented but has also passed its acceptance tests. The safety officer runs a
trace query to confirm this. What might his query be?</p>
        </list-item>
        <list-item>
          <p>Piotr discovered that one of the high-risk hazards has been neglected during
system specification. He cannot find any related requirements. Now he is
worried that other hazards might have been neglected too. Can you help
him issue a trace query to address his concern?</p>
        </list-item>
        <list-item>
          <p>Last week, Jin modified some functions in order to improve the efficiency of
the system. The system has not functioned well since then. She is not sure
whether all the test cases have passed. What trace query should she issue in order
to verify this?</p>
        </list-item>
        <list-item>
          <p>While testing the system, Paxton becomes suspicious that the test cases
aren’t sufficient to show that all requirements have been addressed. What
trace query could he issue to help him investigate this problem?</p>
        </list-item>
      </list>
      <p>To add some fun to the session, once all queries have been collected we
will randomly select three of them and feed them to TiQi. The participants will
experience TiQi’s capabilities and/or failures in real time. In terms of benefits,
all participants will learn more about natural language trace queries and will be
given access to our online TiQi tool (not yet released publicly).</p>
      <p>Because our goal is to capture sample queries from a broad range of potential
project stakeholders, we have only basic prerequisites. We require all
participants to hold an MS degree in Computer Science, Information Systems, Software
Engineering or a related field, and/or to have at least one year of IT experience.
We will capture this information during the session through the meta-questions
and will later discard any responses from people who do not meet these criteria.</p>
      <p>To analyze the data we will use data mining tools to generate a list of
terms, phrases, and their frequencies. We will then automatically filter the list
to exclude project-specific terms and to remove terms that are already known
to TiQi. This will leave the new question terms and the non-useful junk terms.
Researchers on our team will then go through this list, categorize the terms
according to the equivalent SQL expression, and add them to the model. “Junk”
terms that are neither question terms nor project-specific will be identified and
removed.</p>
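      <p>The automatic part of this analysis could look roughly like the following sketch. It assumes simple word n-gram counting as a stand-in for the unspecified data mining tools; the function name candidate_terms, its parameters, and the n-gram length limit are hypothetical. The output still contains both new question terms and junk, so the manual categorization step described above remains necessary.</p>
      <preformat>
```python
import re
from collections import Counter


def candidate_terms(queries, project_terms, known_terms, max_len=3):
    """Count word n-grams across queries, skipping project-specific and known terms.

    Returns (phrase, frequency) pairs, most frequent first.
    """
    excluded = {t.lower() for t in project_terms} | {t.lower() for t in known_terms}
    counts = Counter()
    for query in queries:
        # Lower-case and split into simple word tokens.
        words = re.findall(r"[a-z0-9']+", query.lower())
        for n in range(1, max_len + 1):
            for i in range(len(words) - n + 1):
                phrase = " ".join(words[i:i + n])
                # Only exact matches are filtered; longer phrases that merely
                # contain an excluded term are kept for manual review.
                if phrase not in excluded:
                    counts[phrase] += 1
    return counts.most_common()
```
      </preformat>
      <p>With the queries "Show me all hazards" and "Please display all hazards", the project term "hazards", and the known term "show me", the phrase "display" survives as a candidate for review while "hazards" and "show me" are filtered out.</p>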
    </sec>
    <sec id="sec-3">
      <title>Conclusion</title>
      <p>The end result of this study is expected to be a greatly expanded list of
synonyms and definitions for each of the SQL expressions in the domain model.
This expanded set of domain terminology will enable TiQi to service future NL
queries more accurately.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Mäder</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cleland-Huang</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>A visual traceability modeling language</article-title>
          .
          <source>In: MoDELS (1)</source>
          . pp.
          <fpage>226</fpage>
          -
          <lpage>240</lpage>
          (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Mäder</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cleland-Huang</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>A visual language for modeling and executing traceability queries</article-title>
          .
          <source>Software and System Modeling</source>
          <volume>12</volume>
          (
          <issue>3</issue>
          ),
          <fpage>537</fpage>
          -
          <lpage>553</lpage>
          (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Mäder</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jones</surname>
            ,
            <given-names>P.L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhang</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cleland-Huang</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Strategic traceability for safety-critical projects</article-title>
          .
          <source>IEEE Software</source>
          <volume>30</volume>
          (
          <issue>3</issue>
          )
          ,
          <fpage>58</fpage>
          -
          <lpage>66</lpage>
          (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Pruski</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lohar</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Aquanette</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ott</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Amornborvornwong</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rasin</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cleland-Huang</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>TiQi: Towards natural language trace queries</article-title>
          .
          <source>In: IEEE 22nd International Requirements Engineering Conference, RE 2014</source>
          , Karlskrona, Sweden, August 25-29, 2014. pp.
          <fpage>123</fpage>
          -
          <lpage>132</lpage>
          (
          <year>2014</year>
          ),
          <ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.1109/RE.2014.6912254">http://dx.doi.org/10.1109/RE.2014.6912254</ext-link>
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Rempel</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mäder</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kuschke</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cleland-Huang</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Mind the gap: Assessing the conformance of software traceability to relevant guidelines</article-title>
          .
          <source>In: 36th International Conference on Software Engineering (ICSE)</source>
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Shin</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cleland-Huang</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>A comparative evaluation of two user feedback techniques for requirements trace retrieval</article-title>
          .
          <source>In: Proceedings of the ACM Symposium on Applied Computing, SAC 2012</source>
          , Riva, Trento, Italy, March 26-30, 2012. pp.
          <fpage>1069</fpage>
          -
          <lpage>1074</lpage>
          (
          <year>2012</year>
          ),
          <ext-link ext-link-type="uri" xlink:href="http://doi.acm.org/10.1145/2245276.2231943">http://doi.acm.org/10.1145/2245276.2231943</ext-link>
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>