<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Inclusive Design Insights from a Preliminary Image-Based Conversational Search Systems Evaluation</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Yue Zheng</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Lei Yu</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Junmian Chen</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Tianyu Xia</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Yuanyuan Yin</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Shan Wang</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Haiming Liu</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>University of Southampton</institution>
          ,
          <addr-line>University Rd, Southampton SO17 1BJ</addr-line>
          ,
          <country country="UK">UK</country>
        </aff>
      </contrib-group>
      <fpage>86</fpage>
      <lpage>94</lpage>
      <abstract>
        <p>The digital realm has witnessed the rise of various search modalities, among which the Image-Based Conversational Search System stands out. This research delves into the design, implementation, and evaluation of this specific system, juxtaposing it against its text-based and mixed counterparts. A diverse participant cohort ensures a broad evaluation spectrum. Advanced tools facilitate emotion analysis, capturing user sentiments during interactions, while structured feedback sessions offer qualitative insights. Results indicate that while the text-based system minimizes user confusion, the image-based system presents challenges in direct information interpretation. However, the mixed system achieves the highest engagement, suggesting an optimal blend of visual and textual information. Notably, the potential of these systems, especially the image-based modality, to assist individuals with intellectual disabilities is highlighted. The study concludes that the Image-Based Conversational Search System, though challenging in some aspects, holds promise, especially when integrated into a mixed system, offering both clarity and engagement.</p>
      </abstract>
      <kwd-group>
        <kwd>Conversational Search</kwd>
        <kwd>Image-Based Search</kwd>
        <kwd>Inclusive Design</kwd>
        <kwd>Human-Computer Interaction</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        The rapid development of conversational search engines has completely changed the field of information
retrieval. A new era of natural language-based tailored search experiences is being ushered in by
platforms like ChatGPT [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. However, users with disabilities or cognitive impairments face challenges
when using standard conversational search engines, which frequently require extensive linguistic
ability and awareness of the engine’s intricacies. These users may have linguistic restrictions that limit
both receptive and expressive communication. The situation is further complicated by the paucity of
data pertaining to the various challenges linked to cognitive impairments.
      </p>
      <p>
        Currently, available tools to assist these users encompass assistive devices, text-to-speech technologies
for digital content narration, and basic messaging systems integrating voice, gestures, or sign language
[
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. Recent efforts have led to the development of text-based, image-based, and mixed search engines,
with various tasks assigned to 21 participants on these platforms. By integrating sensor technologies
to capture physiological signals, the aim is to determine the optimal search strategy for particular
user groups and to foster a more inclusive and accessible search interface.
      </p>
      <p>The key component of this strategy is the conversion of textual results from conversational search
engines into visual formats, which facilitates information understanding. Simultaneously, sensors are
utilized to naturally record user input, removing the requirement for human intervention. The objective
of this paradigm shift is to increase the accessibility of information, particularly for individuals with
disabilities, thereby improving their independence and expanding their knowledge.</p>
      <p>This research centers on an image-based conversational search system that leverages sensor data,
like gestures and eye movements, to gauge user satisfaction. This feedback refines subsequent searches,
fostering a more adaptive and user-centric experience.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
      <p>
        Recent research has delved deeply into the evolution and utility of voice assistants and conversational
interfaces [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. Studies have traced the emergence and significance of voice assistants such as Alexa, Siri, and
Cortana, emphasizing their integration across various sectors [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. Additionally, work on spoken conversational search sheds light
on user interactions in speech-only search tasks [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. The research from both 2017 and 2018 provides a
comprehensive overview of how people interact with these systems and the challenges therein [7].
      </p>
      <p>The usability and effectiveness of these conversational interfaces have been a pivotal area of research.
Ghosh assessed the System Usability Scale’s utility for evaluating voice-based user interfaces,
emphasizing the nuances of voice interactions [8]. In a similar vein, Avula and colleagues explored the dynamics
of user engagement with chatbots during collaborative searches [9]. Their study presented key findings
on how chatbots can be leveraged to enhance the search experience.</p>
      <p>Conversational search systems have been studied not just for individual use but also for collaborative
scenarios. Avula et al. focused on embedding search into conversational platforms to support
collaborative search, a novel approach to enhance user interaction and data retrieval [10]. Moreover, Kiesel and
his team delved into voice query clarification, underscoring the need for refining voice-based search
queries to improve search outcomes [11].</p>
    </sec>
    <sec id="sec-3">
      <title>3. Experiment Setup</title>
      <sec id="sec-3-1">
        <title>3.1. System Design</title>
        <p>To optimize response speed and quality, several well-established APIs were incorporated. The
choice of these specific four APIs was influenced by a combination of factors: the precision of
their image analysis algorithms, the ease of integration with Flask web applications, and the availability
of comprehensive documentation and robust developer support. These APIs are renowned for their
rigorous testing and validation in the domain of image analysis and search, thus offering dependable
and scalable solutions for image processing.</p>
        <p>From the myriad APIs suitable for image search, such as Google Cloud, Microsoft Azure, and Amazon
AWS, AWS Rekognition [12] emerged as the preferred choice. AWS Rekognition is a cloud-based solution
renowned for its sophisticated image analysis capabilities. It leverages deep learning algorithms to
undertake visual content analysis, with these algorithms being honed using extensive image databases.
This ensures precise identification of patterns and nuances within images. Notably, AWS Rekognition’s
scalability enables it to execute real-time analysis on extensive collections of images and videos. Such a
capability renders it invaluable for diverse applications spanning social media, e-commerce, and security.
The potential for enhancement further exists with the possible integration of APIs like ChatGPT and
DALL-E, both of which are OpenAI offerings.</p>
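<p>As an illustration of this label-analysis step, the post-processing of a detect_labels response can be sketched as below. The response dictionary follows the shape returned by the boto3 Rekognition client; the confidence threshold and sample labels are illustrative assumptions rather than values from the deployed system.</p>

```python
def extract_labels(response, min_confidence=80.0):
    """Filter a Rekognition detect_labels response down to confident
    label names, sorted by descending confidence.

    `response` follows the boto3 response shape:
    {"Labels": [{"Name": ..., "Confidence": ...}, ...]}
    """
    kept = [
        (lbl["Name"], lbl["Confidence"])
        for lbl in response.get("Labels", [])
        if lbl["Confidence"] >= min_confidence
    ]
    return [name for name, _ in sorted(kept, key=lambda x: -x[1])]

# Mocked response; no AWS call is made in this sketch.
sample = {"Labels": [
    {"Name": "Dog", "Confidence": 98.2},
    {"Name": "Pet", "Confidence": 91.5},
    {"Name": "Sofa", "Confidence": 62.0},
]}
print(extract_labels(sample))  # ['Dog', 'Pet']
```

<p>In the full pipeline the response dictionary would come from the boto3 client's detect_labels call on the uploaded image.</p>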
        <p>Incorporating the ChatGPT API into a search engine can amplify its efficiency. The ChatGPT API
can transform user-inputted textual descriptions into structured search queries with its advanced NLP
algorithms. This not only refines the search results but also simplifies the user journey in pinpointing
the desired images.</p>
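<p>A minimal sketch of this description-to-query step is shown below. The payload mirrors the Chat Completions request format, but the system prompt, model name, and example input are hypothetical; the request is constructed without being sent.</p>

```python
def build_query_request(description, model="gpt-3.5-turbo"):
    """Construct a chat-completion payload asking the model to turn a
    free-text description into a concise image search query.

    The dict mirrors the Chat Completions request body; it is built but
    deliberately not sent, so the sketch stays offline.
    """
    system_prompt = (
        "Rewrite the user's description as a short image search query. "
        "Return only the query text."
    )
    return {
        "model": model,  # illustrative model name
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": description},
        ],
        "temperature": 0.0,  # deterministic rewriting
    }

req = build_query_request("a photo of a golden retriever playing in snow")
print(req["messages"][1]["content"])
```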
        <p>In this paper, the ARASAAC [13] dataset is used to present the message in the image-based and mixed
search system. The ARASAAC dataset serves as an invaluable resource in the realm of augmentative
and alternative communication, facilitating the conveyance of ideas through a vast array of over
40,000 pictograms and symbols. Sponsored by the Aragonese government, ARASAAC offers symbols
that span a broad spectrum of topics, from daily activities and fauna to emotions, sustenance, and
transportation modalities. Designed for universal comprehensibility, these pictograms are especially
beneficial for individuals with communication challenges. The dataset is readily accessible in various
formats, including PNG, SVG, and EPS, through the ARASAAC website or its API. Its significance in
the field is underscored by its integration into numerous AAC tools, encompassing speech generators,
symbol libraries, and communication boards.</p>
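<p>For illustration, retrieving ARASAAC pictograms can be reduced to building the API's search and static-image URLs, as in the sketch below. The endpoint paths reflect the public ARASAAC API but should be treated as assumptions and checked against its documentation.</p>

```python
from urllib.parse import quote

ARASAAC_API = "https://api.arasaac.org/api/pictograms"

def pictogram_search_url(term, locale="en"):
    """Build the ARASAAC keyword-search URL for a pictogram.
    The /{locale}/search/{term} route follows the public ARASAAC API;
    verify the exact path against the current API documentation.
    """
    return f"{ARASAAC_API}/{locale}/search/{quote(term)}"

def pictogram_image_url(pictogram_id, size=500):
    """Static PNG URL for a pictogram id at a given pixel size, based
    on ARASAAC's published static-asset pattern (an assumption).
    """
    return (f"https://static.arasaac.org/pictograms/"
            f"{pictogram_id}/{pictogram_id}_{size}.png")

print(pictogram_search_url("movie"))
```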
        <p>The frontend comprises three segments: the text-based conversational search system, the
image-centric conversational search system, and a hybrid system. The text-based system facilitates user
interaction via text and voice, whereas the image-centric system focuses on image-based interactions.
In contrast, the hybrid system accepts image inputs and responds with a combination of images and
text [14].</p>
        <p>Regarding the backend, the system integrates OpenAI services to produce text-based chat
conversations. Concurrently, a separate backend component is responsible for generating image-text chat
interactions and offers an image search feature powered by the Google API.</p>
        <p>The user interface is pivotal for effective user-system engagement. A carefully designed UI can
significantly elevate user experience by creating an intuitive, user-centric space. To tailor the design,
thorough interviews and surveys determined user needs, leading to a blend of aesthetic and functionality.
While initial designs mirrored ChatGPT’s layout, feedback favored a conventional chat interface, with
system messages on the left and user messages on the right, ensuring familiarity and clarity. Post user
feedback, attention gravitated towards the visual design, culminating in the selection of the Material
Design paradigm, a design language conceived by Google. Material Design [15], a cohesive amalgamation
of guidelines, elements, and tools, facilitates the crafting of engaging UIs. Its open-source nature
expedites the collaboration between designers and developers, ensuring aesthetically and functionally
sound products. Material.io provides exhaustive guidelines and UI components tailored for various
platforms, including Android, Flutter, and Web. At its core, Material Design enshrines principles vital
for exceptional UI creation, ranging from accessibility benchmarks to pivotal layout and interaction
patterns.</p>
        <p>The system’s architecture adopts a tripartite division, drawing inspiration predominantly from
traditional chat application design: system responses are situated on the left, user responses on the right,
and user input at the base. Each search functionality, contingent upon its input modality, necessitates a
distinct UI design:
• Text-based Conversational Search UI: This UI adheres to the archetypal chat application layout
with a distinctive feature at its base—an input box coupled with a toggle, enabling users to
interchange between text and voice input, facilitated by the web API’s speech recognition.
• Image-based Conversational Search UI: Emulating the conventional chat layout, this UI pivots
entirely around image responses, incorporating an image upload module at its base for user input.
• Mixed Image and Text Conversational Search UI: While this UI’s foundation mirrors the
image-based counterpart, it integrates text-based responses beneath the images in the system and user
sections. Refer to Figure 1 for a detailed visual.</p>
        <p>Image recognition accuracy was paramount. Initially, labels from images were transformed into
sentences using the OpenAI API, but this resulted in unclear sentences. Leveraging the Arasaac symbol
library, where each symbol has a defined meaning, a new method matched input images to library
symbols to generate clearer sentences. This refined technique improved the system’s conversational
fluency.</p>
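<p>The matching step can be sketched as a keyword-overlap score between the labels detected in an input image and each symbol's defined meaning. The scoring rule and the miniature symbol library below are illustrative assumptions, not the system's actual implementation.</p>

```python
def match_symbol(image_labels, symbol_keywords):
    """Pick the library symbol whose keyword set overlaps most with the
    labels detected in the user's image.

    image_labels:    iterable of label strings from image analysis
    symbol_keywords: dict mapping symbol name -> set of keywords
    Returns (best_symbol, overlap_count), or (None, 0) if nothing matches.
    """
    labels = {l.lower() for l in image_labels}
    best, best_score = None, 0
    for symbol, keywords in symbol_keywords.items():
        score = len(labels & {k.lower() for k in keywords})
        if score > best_score:
            best, best_score = symbol, score
    return best, best_score

# Illustrative mini-library; real ARASAAC entries are far richer.
library = {
    "watch_movie": {"movie", "film", "cinema", "screen"},
    "eat_food":    {"food", "meal", "plate", "eating"},
}
print(match_symbol(["Screen", "Movie", "Person"], library))  # ('watch_movie', 2)
```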
        <p>Adapting the user interface for a diverse user base across various devices was critical. Emphasizing
a consistent interface, a responsive design strategy was employed to cater to different screen sizes.
Additionally, user feedback mechanisms were integrated, enabling continuous UI/UX improvements.</p>
        <p>In summation, sculpting the UI for a conversational search system proved to be an intricate endeavor,
but one that was immensely gratifying. The resultant UI, a product of meticulous attention to user
feedback and iterative design, seamlessly marries aesthetics with user-centric functionality.</p>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Experiment Design</title>
        <p>Twenty-one volunteers were enlisted for the study. The predominant group comprised 18 students
from the University of Southampton, aged between 23 and 25 years. Additionally, there was one female
participant under 18 and two participants over 50, one male and one female. None of the participants
had visual impairments or neurological or psychiatric disorders. Due to sensor inconsistencies leading
to considerable data variation, data from one participant was excluded. Consequently, data from 20
participants was utilized for the study.</p>
        <p>The primary objective was to garner sensor data from participants interacting with a search engine
across three modalities: Text, Image, and Mixed mode. Preceding the experiment, the seating position
relative to the monitor was standardized, accounting for fixed placements of both the camera (resolution:
1920×1080 pixels, 30fps) and the eye-tracker [16]. This ensured optimal facial capture and eye-tracking
data accuracy.</p>
        <p>Upon arrival, participants were directed to don the Shimmer sensor, connecting two electrodes to two
fingers and an additional electrode to the ear. Once the sensors were verified to communicate effectively
with the data platform, the experimental tasks commenced.</p>
        <p>A structured flow was followed throughout the experiment, consolidated within a unified interface.
This began with an eye-tracker calibration using the iMotions [17] software, followed by a brief system
introduction on a welcome screen.</p>
        <p>Subsequent steps included:
• Consent Form: Ensuring ethical compliance by informing participants of research objectives and
collecting informed consent.
• Demographic Input: Participants entered basic demographic details, including age, gender,
education, and any physical conditions.</p>
        <p>• Instruction Page: A preparatory guide to familiarize participants with the upcoming tasks.</p>
        <p>The crux of the experiment revolved around participants interacting with a search engine to complete
three tasks, which varied in content but maintained a consistent structure. To avoid bias arising from
the order of tasks, their sequence was randomized using a card draw system. The tasks are as follows:
Task 1: Ask the system to recommend a movie and tell you the duration of the movie.
Task 2: Ask the system to recommend an action movie and tell you the duration of the movie.
Task 3: Ask the system as follows:
• “I want to see a movie”
• “Action movie”
• “Tell me the duration of the movie”</p>
        <p>Participants were then asked to complete these tasks in each of the three systems described above,
presented in randomized order, and to fill out a questionnaire reflecting their experience with the three
search engine modes. The content of the questionnaire is shown in Table 1.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Findings and Discussion</title>
      <p>A collection of 21 data sets was gathered, each encompassing skin conductivity, heart rate, eye movement
data, and facial emotion metrics for an individual participant. Upon analysis, one data set displayed
substantial deviation, possibly attributable to incorrect sensor placement, and was consequently
excluded. This resulted in a compilation of 20 valid data sets, each generating an average of 10 minutes of
sensor data. The iMotions software captures sensor data at millisecond intervals, ensuring a precision
of 1 millisecond in the obtained CSV files.</p>
      <p>The principal objective of this study is to facilitate individuals with physical disabilities, particularly
those facing challenges using a mouse, in executing searches via sensors. As such, it is imperative to
analyze users’ physiological data during their decision-making process in a search operation. Within
our experimental framework, a user’s decision is indicated by activating the SEND button in the
dialogue system, necessitating an examination of physiological data concurrent with a mouse click. To
leverage sensor data as a surrogate for mouse actions, it’s essential to grasp users’ mouse usage patterns.
While direct investigation of physiological data at the mouse click instance is feasible, pinpointing
the exact moment of action is arduous. Moreover, the sheer volume of data from a singular moment
might result in fragmented data, increasing the risk of analytical errors due to potential physiological
signal inaccuracies. Consequently, a broader temporal scope is advocated for analyzing user operation
behavior. Prior work on the association between mouse clicks and search rankings introduced a temporal
interval spanning the onset of mouse movement to its subsequent cessation to elucidate user behavioral
traits [18]. Their findings identified an optimal duration of 40 milliseconds post-initiation of mouse
movement to balance data volume against user behavior insight. Aligning with this, our analysis
focuses on sensor data fluctuations within the 40 ms post-mouse movement, utilizing the period’s
average data as a reflection of users’ physiological traits. The subsequent challenge lies in ascertaining
the commencement point of this interval.</p>
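<p>Given millisecond-resolution exports, averaging a signal over the 40 ms window can be sketched as follows; the trace used in the example is synthetic.</p>

```python
def window_mean(samples, onset_ms, window_ms=40):
    """Average the sensor readings falling inside the `window_ms`
    interval that starts at the onset of mouse movement.

    samples: list of (timestamp_ms, value) pairs at millisecond
             resolution, as in the exported CSV files.
    """
    vals = [v for t, v in samples if onset_ms <= t < onset_ms + window_ms]
    if not vals:
        raise ValueError("no samples fall inside the window")
    return sum(vals) / len(vals)

# Toy trace: the signal steps from 1.0 to 1.2 at movement onset (t=100 ms).
trace = [(t, 1.0) for t in range(100)] + [(t, 1.2) for t in range(100, 200)]
print(window_mean(trace, onset_ms=100))  # approximately 1.2
```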
      <p>The study commenced by examining the average GSR (Galvanic Skin Response) levels of users
across the three search engine modes. GSR, an indicator of an individual’s physiological activation and
emotional state, is derived as the mean value over the entirety of the user’s interaction period. This
data, quantified in microsiemens (µS) to represent conductivity, is procured from the Shimmer sensor.
Notably, conductivity elucidates the resistance encountered by an electric current as it traverses through
a unit material area. Within the realm of GSR measurements, this is employed to signify variations in
skin conductivity, predominantly attributed to perspiration.</p>
      <p>GSR levels escalate in tandem with a user’s cognitive engagement when interacting with a computer
via voice and gestures [18]. Elevated GSR levels are typically observed during states of emotional
arousal, such as excitement or anxiety, attributed to heightened activity in the sympathetic branch
of the autonomic nervous system, resulting in increased sweat production and skin conductance.
Conversely, states of relaxation or pleasure may witness a reduction in GSR, due to elevated activity in
the parasympathetic branch, leading to diminished skin conductance.</p>
      <p>[Table 2 compares GSR, heart rate (bpm), eye fixation time (ms), and engagement mood across the Text, Image, and Mixed modes.]</p>
      <p>This investigation explored the correlation between the physiological parameters of participants
while utilizing various search engine display modes. Through analyzing fluctuations between emotional
response and cognitive load, this study aimed to discern if search engines that utilize imagery, or a
combination of imagery and text, influence users’ cognitive load or emotional states. Table 2 delineates
the variance in several datasets across different modes.</p>
      <p>To represent the correlation of data across diverse modes, data in Text mode was normalized to 1.
Subsequent data was calculated in proportion to this baseline, with results rounded to three decimal
places.</p>
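<p>The normalization step can be sketched as below; the raw values in the example are hypothetical and serve only to show the arithmetic.</p>

```python
def normalize_to_text_mode(table):
    """Express each measure as a proportion of its Text-mode value
    (so Text mode becomes 1), rounded to three decimal places.

    table: dict mapping measure -> {mode: raw value}
    """
    out = {}
    for measure, by_mode in table.items():
        base = by_mode["Text"]
        out[measure] = {mode: round(v / base, 3) for mode, v in by_mode.items()}
    return out

# Hypothetical raw values, for illustration only (not the study's data).
raw = {"GSR": {"Text": 2.0, "Image": 1.7, "Mixed": 2.5}}
print(normalize_to_text_mode(raw))  # {'GSR': {'Text': 1.0, 'Image': 0.85, 'Mixed': 1.25}}
```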
      <p>It is evident that the form of information (either Text or Image) provided by the search engine doesn’t
markedly affect heart rate. Conversely, parameters like GSR, eye fixation time, and engagement mood
exhibit significant alterations. Specifically, GSR diminishes in Image mode but escalates in Mixed mode,
suggesting that a purely image-based display might effectively curtail user stress and cognitive load.
However, a combined display appears to augment both factors. From an ocular trajectory perspective,
Image mode intensifies comprehension challenges, necessitating users to fixate longer for content
assimilation. The Mixed mode, incorporating both visual and textual elements, seems to mitigate this
challenge. Regarding facial expressions, both Image and Mixed modes induce notable shifts in users’
expressions, potentially because the image format incurs an initial learning curve, leading to evident
surprise among first-time users.</p>
      <p>In examining decision-making data shifts, it was noted that within the initial 40 milliseconds of
mouse movement, there was a discernible change in physiological signals, presumably indicating a
decision-making process. To validate if physiological data can signify decision-making during searches,
data procured by sensors in this timeframe were meticulously analyzed. Table 3 highlights the degree
of change for each dataset within this interval.</p>
      <p>Overall, all three measures rise when users make decisions. Across the three modes, the average
growth rate of GSR is 11.8%, the average growth rate of eye fixation time is 94.15%, and the average
growth rate of heart rate is 6.53%.</p>
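<p>The growth-rate computation itself is straightforward; the sketch below shows the arithmetic with hypothetical baseline and window values.</p>

```python
def growth_rate(baseline, window_value):
    """Percentage change of a measure inside the decision window
    relative to its baseline level, rounded to two decimals."""
    return round((window_value - baseline) / baseline * 100, 2)

# Illustrative numbers only; the study reports average growth of 11.8%
# (GSR), 94.15% (eye fixation time), and 6.53% (heart rate).
print(growth_rate(2.0, 2.236))  # 11.8
```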
      <p>Upon analyzing the sensor data, several patterns emerged. The physiological signals, especially
eye movements and gestures, indicated a high level of user engagement when interacting with the
image-based conversational search system. The frequency of positive feedback actions, such as nodding
in agreement, was significantly higher compared to instances of confusion or dissatisfaction.</p>
      <p>Feedback from participants further verified these findings. A significant number of users expressed a
preference for the text-based and mixed systems. Their belief in the potential of conversational search
systems, especially in helping individuals with intellectual disabilities, was also verified.</p>
      <p>The image-based conversational search system was generally well-received by users. The visual
representation of search results provided an intuitive interface, reducing the cognitive load on users.
This was particularly evident in the reduced time taken for users to comprehend search results, as
compared to traditional text-based outputs. Additionally, the adaptive feedback loop mechanism, which
refined search results based on real-time user feedback, further enhanced user satisfaction levels.</p>
      <p>One of the standout findings was the system’s potential to revolutionize the search experience
for users with disabilities. The visual-centric approach catered to those with linguistic or cognitive
challenges, making information more accessible. Furthermore, the sensor integration, which allowed
for gesture-based feedback, provided an inclusive platform for those with physical disabilities. This was
a significant stride towards creating a more equitable digital landscape.</p>
      <p>When juxtaposed with traditional text-based conversational search engines, the image-based system
showcased several advantages. The most prominent was the system’s ability to transcend language
barriers, given its reliance on universally understood visual cues. Moreover, the sensor-based feedback
mechanism provided a more organic and intuitive user experience, as opposed to the often cumbersome
text-based feedback methods. However, it’s worth noting that while the image-based system excelled
in inclusivity and user engagement, there were instances where users familiar with traditional systems
took time to adapt to this novel approach.</p>
      <p>In conclusion, the findings underscored the potential of the image-based conversational search system,
not just as a novel innovation, but as a tool for fostering inclusivity in the digital realm. The system’s
design, rooted in user-centric principles, and its adaptive capabilities, driven by sensor feedback, set it
apart as a pioneering solution in the realm of information retrieval.</p>
    </sec>
    <sec id="sec-5">
      <title>5. Conclusion and Future Recommendations</title>
      <p>The inception of this project was motivated by the objective of crafting an intuitive user interface,
prioritizing ease of use, and fostering a user-centric experience. The trajectory of the project was
marked by the adoption of a renowned frontend framework, supplemented by the material design style
for frontend development. To enhance the depth of user experience analysis, the iMotions application
was harnessed alongside sensors for data collection and assessment.</p>
      <p>One of the project’s cornerstone achievements was the design and execution of a conversational
search system. The resilience of the system’s backend was pivotal in ensuring swift data acquisition
and processing. Furthermore, the project witnessed significant strides in user interface development,
characterized by a natural, dialogue-centric design. Through the implementation of renowned front-end
frameworks and the material design aesthetic, users were guaranteed an intuitive interaction paradigm.
Optimization of the data structure was instrumental in expediting searches and rendering, assuring
users of timely feedback. This interface’s versatility across diverse devices underscored its efficacy.</p>
      <p>The system’s evaluation phase furnished indispensable insights into both system performance and
user experience. Comprehensive assessments were ensured through the deployment of varied task
levels, complemented by a counterbalancing strategy. Emotion analysis, underpinned by the iMotions
application, illuminated both the system’s strengths and areas ripe for enhancement. Participant
feedback was overwhelmingly affirmative, with particular emphasis on the system’s promise for
individuals with intellectual challenges.</p>
      <p>Emotion analysis, conducted using the iMotions application, yielded significant insights into user
experience. The text-based search system was observed to be the most intuitive and least perplexing
for users. Conversely, the image-based system demanded greater cognitive effort from users, resulting
in heightened confusion. Interestingly, the hybrid system, integrating both text and images, received
the most engagement, indicating its enhanced interactivity for users.</p>
      <p>Participant feedback corroborated these observations. A notable proportion expressed a predilection
for both the text-based and hybrid systems. Moreover, the efficacy of conversational search systems,
particularly in assisting individuals with intellectual disabilities, received affirmation.</p>
      <p>In summary, while each system presents its unique strengths, the mixed conversational search system
seemingly offers an optimal blend of clarity and engagement. The feedback further underscores the
transformative potential of these systems in enhancing technological accessibility for individuals with
disabilities.</p>
      <p>This assessment lays a robust groundwork for subsequent enhancements to the system, aiming
to cater to a broader audience without compromising its fundamental functionality and user-centric
design.</p>
      <p>We anticipate the following developments in the next stages of research and improvement: First, our
algorithms will be painstakingly fine-tuned to improve search result accuracy and make sure results
align more closely with user preferences. Next, we want to expand the scope of our emotion analysis by
including a larger number of sensors and a broader range of emotions. In addition, we have observed
from feedback that there may be a benefit to enhancing accessibility features, particularly in order to
accommodate users with cognitive disabilities, highlighting our dedication to inclusiveness. Finally, we
want to strengthen our commitment to always adapting to the changing needs and preferences of our
users by adding a real-time feedback mechanism to the system.</p>
    </sec>
    <sec id="sec-6">
      <title>Declaration on Generative AI</title>
      <p>During the preparation of this work, the authors used ChatGPT to assist with partial translation and
language polishing of the manuscript. The AI tool was solely employed to improve readability and
fluency of pre-existing content. All key scientific insights, conclusions, and technical content remain
entirely human-authored. The authors carefully reviewed and edited all AI-assisted output and take
full responsibility for the final work.</p>
      <p>[7] J. R. Trippas, D. Spina, L. Cavedon, H. Joho, M. Sanderson, Informing the design of spoken
conversational search: Perspective paper, in: Proceedings of the 2018 conference on human
information interaction &amp; retrieval, 2018, pp. 32–41.
[8] D. Ghosh, P. S. Foong, S. Zhang, S. Zhao, Assessing the utility of the system usability scale for
evaluating voice-based user interfaces, in: Proceedings of the Sixth International Symposium of
Chinese CHI, 2018, pp. 11–15.
[9] S. Avula, G. Chadwick, J. Arguello, R. Capra, Searchbots: User engagement with chatbots during
collaborative search, in: Proceedings of the 2018 conference on human information interaction &amp;
retrieval, 2018, pp. 52–61.
[10] S. Avula, J. Arguello, R. Capra, J. Dodson, Y. Huang, F. Radlinski, Embedding search into a
conversational platform to support collaborative search, in: Proceedings of the 2019 Conference
on Human Information Interaction and Retrieval, 2019, pp. 15–23.
[11] J. Kiesel, A. Bahrami, B. Stein, A. Anand, M. Hagen, Toward voice query clarification, in: The 41st
international acm sigir conference on research &amp; development in information retrieval, 2018, pp.
1257–1260.
[12] A. Mishra, Machine learning in the AWS cloud: Add intelligence to applications with Amazon</p>
      <p>Sagemaker and Amazon Rekognition, John Wiley &amp; Sons, 2019.
[13] D. Paolieri, A. Marful, Norms for a pictographic system: the aragonese portal of
augmentative/alternative communication (arasaac) system, Frontiers in psychology 9 (2018) 2538.
[14] L. Zhou, J. Gao, D. Li, H.-Y. Shum, The design and implementation of xiaoice, an empathetic social
chatbot, Computational Linguistics 46 (2020) 53–93.
[15] C. Coljee-Gray, Material design, M3 - Material Design, 2017. https://m3.material.io/, Last accessed
on 2023-8-21.
[16] T. P. Nano, Easy to use, small, portable eye tracker, 2023. https://www.tobii.com/products/
eye-trackers/screen-based/tobii-pro-nano, Last accessed on 2023-9-26.
[17] iMotions Lab, imotions lab human behaviour research platform, 2015. https://imotions.com/
products/imotions-lab/, Last accessed on 2023-9-26.
[18] Y. Shi, N. Ruiz, R. Taib, E. Choi, F. Chen, Galvanic skin response (gsr) as an index of cognitive load,
in: CHI’07 extended abstracts on Human factors in computing systems, 2007, pp. 2651–2656.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>P. P.</given-names>
            <surname>Ray</surname>
          </string-name>
          ,
          <article-title>ChatGPT: A comprehensive review on background, applications, key challenges, bias, ethics, limitations and future scope</article-title>
          ,
          <source>Internet of Things and Cyber-Physical Systems</source>
          (
          <year>2023</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>J.</given-names>
            <surname>Light</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>McNaughton</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Caron</surname>
          </string-name>
          ,
          <article-title>New and emerging aac technology supports for children with complex communication needs and their communication partners: State of the science and future research directions</article-title>
          ,
          <source>Augmentative and Alternative Communication</source>
          <volume>35</volume>
          (
          <year>2019</year>
          )
          <fpage>26</fpage>
          -
          <lpage>41</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>H.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Mulholland</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Song</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Uren</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Rüger</surname>
          </string-name>
          ,
          <article-title>Applying information foraging theory to understand user interaction with content-based image retrieval</article-title>
          ,
          <source>in: Proceedings of the third symposium on Information interaction in context</source>
          ,
          <year>2010</year>
          , pp.
          <fpage>135</fpage>
          -
          <lpage>144</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>A.</given-names>
            <surname>Kaushik</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Jacob</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Velavan</surname>
          </string-name>
          ,
          <article-title>An exploratory study on a reinforcement learning prototype for multimodal image retrieval using a conversational search interface</article-title>
          ,
          <source>Knowledge</source>
          <volume>2</volume>
          (
          <year>2022</year>
          )
          <fpage>116</fpage>
          -
          <lpage>138</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>M. B.</given-names>
            <surname>Hoy</surname>
          </string-name>
          ,
          <article-title>Alexa, Siri, Cortana, and more: an introduction to voice assistants</article-title>
          ,
          <source>Medical reference services quarterly</source>
          <volume>37</volume>
          (
          <year>2018</year>
          )
          <fpage>81</fpage>
          -
          <lpage>88</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>J. R.</given-names>
            <surname>Trippas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Spina</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Cavedon</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Sanderson</surname>
          </string-name>
          ,
          <article-title>How do people interact in conversational speech-only search tasks: A preliminary analysis</article-title>
          ,
          <source>in: Proceedings of the 2017 conference on human information interaction and retrieval</source>
          ,
          <year>2017</year>
          , pp.
          <fpage>325</fpage>
          -
          <lpage>328</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>[7] J. R. Trippas, D. Spina, L. Cavedon, H. Joho, M. Sanderson, <article-title>Informing the design of spoken conversational search: Perspective paper</article-title>, <source>in: Proceedings of the 2018 conference on human information interaction &amp; retrieval</source>, <year>2018</year>, pp. <fpage>32</fpage>-<lpage>41</lpage>.</mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>[8] D. Ghosh, P. S. Foong, S. Zhang, S. Zhao, <article-title>Assessing the utility of the system usability scale for evaluating voice-based user interfaces</article-title>, <source>in: Proceedings of the Sixth International Symposium of Chinese CHI</source>, <year>2018</year>, pp. <fpage>11</fpage>-<lpage>15</lpage>.</mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>[9] S. Avula, G. Chadwick, J. Arguello, R. Capra, <article-title>Searchbots: User engagement with chatbots during collaborative search</article-title>, <source>in: Proceedings of the 2018 conference on human information interaction &amp; retrieval</source>, <year>2018</year>, pp. <fpage>52</fpage>-<lpage>61</lpage>.</mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>[10] S. Avula, J. Arguello, R. Capra, J. Dodson, Y. Huang, F. Radlinski, <article-title>Embedding search into a conversational platform to support collaborative search</article-title>, <source>in: Proceedings of the 2019 Conference on Human Information Interaction and Retrieval</source>, <year>2019</year>, pp. <fpage>15</fpage>-<lpage>23</lpage>.</mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>[11] J. Kiesel, A. Bahrami, B. Stein, A. Anand, M. Hagen, <article-title>Toward voice query clarification</article-title>, <source>in: The 41st international ACM SIGIR conference on research &amp; development in information retrieval</source>, <year>2018</year>, pp. <fpage>1257</fpage>-<lpage>1260</lpage>.</mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>[12] A. Mishra, <source>Machine learning in the AWS cloud: Add intelligence to applications with Amazon Sagemaker and Amazon Rekognition</source>, John Wiley &amp; Sons, <year>2019</year>.</mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>[13] D. Paolieri, A. Marful, <article-title>Norms for a pictographic system: the aragonese portal of augmentative/alternative communication (arasaac) system</article-title>, <source>Frontiers in psychology</source> <volume>9</volume> (<year>2018</year>) <fpage>2538</fpage>.</mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>[14] L. Zhou, J. Gao, D. Li, H.-Y. Shum, <article-title>The design and implementation of xiaoice, an empathetic social chatbot</article-title>, <source>Computational Linguistics</source> <volume>46</volume> (<year>2020</year>) <fpage>53</fpage>-<lpage>93</lpage>.</mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>[15] C. Coljee-Gray, Material design, M3 - Material Design, <year>2017</year>. https://m3.material.io/, Last accessed on 2023-8-21.</mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>[16] Tobii Pro Nano: Easy to use, small, portable eye tracker, <year>2023</year>. https://www.tobii.com/products/eye-trackers/screen-based/tobii-pro-nano, Last accessed on 2023-9-26.</mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>[17] iMotions Lab, iMotions Lab human behaviour research platform, <year>2015</year>. https://imotions.com/products/imotions-lab/, Last accessed on 2023-9-26.</mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>[18] Y. Shi, N. Ruiz, R. Taib, E. Choi, F. Chen, <article-title>Galvanic skin response (GSR) as an index of cognitive load</article-title>, <source>in: CHI '07 extended abstracts on Human factors in computing systems</source>, <year>2007</year>, pp. <fpage>2651</fpage>-<lpage>2656</lpage>.</mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>