=Paper= {{Paper |id=Vol-2778/paper7 |storemode=property |title=The Semantic Combining for Exploration of Environmental and Disease Data Dashboard for Clinician Researchers |pdfUrl=https://ceur-ws.org/Vol-2778/paper7.pdf |volume=Vol-2778 |authors=Albert Navarro-Gallinad,Alan Meehan,Declan O'Sullivan |dblpUrl=https://dblp.org/rec/conf/semweb/Navarro-Gallinad20 }} ==The Semantic Combining for Exploration of Environmental and Disease Data Dashboard for Clinician Researchers== https://ceur-ws.org/Vol-2778/paper7.pdf
 The Semantic Combining for Exploration of
Environmental and Disease Data Dashboard for
           Clinician Researchers

          Albert Navarro-Gallinad, Alan Meehan, and Declan O’Sullivan

    ADAPT Centre for Digital Content, Trinity College Dublin, Dublin, Ireland
 School of Computer Science and Statistics, Trinity College Dublin, Dublin, Ireland
       {albert.navarro,alan.meehan,declan.osullivan}@adaptcentre.ie



        Abstract. While Semantic Web technologies facilitate the integration
        of heterogeneous data sources through the Resource Description Frame-
        work (RDF) and ontologies, they present an obstacle for non-technical
        researchers who want to access and explore the data to meet their needs.
        To address this problem, visual tools and analytical platforms with a
        user-centred approach are an emerging solution. This paper outlines the
        design of a dashboard called Semantic Combining for Exploration of En-
        vironmental and Disease data (SCEED), an initial visual tool designed
        for use by clinician researchers to explore and retrieve combined environ-
        mental and disease data for further analysis. The evaluation of SCEED
        consists of a combination of standard usability and effectiveness meth-
        ods, using the AVERT project as a case study. In the AVERT project,
        clinician researchers need to address the challenges of querying specific
        vasculitis flare clinical data for a particular patient to retrieve linked en-
        vironmental data from a triplestore, and downloading the chosen data
        as input for their statistical models. The initial evaluation has concluded
        that the SCEED dashboard is an adequate initial design to fulfil these
        needs, and points towards an interface to engage clinician researchers
        directly with Linked Data. Furthermore, this paper helps to highlight the difficulty of
        conducting usability evaluations with small sample sizes and how evalu-
        ation metrics can be combined to assess the requirements for developing
        an effective tool.

        Keywords: Semantic Web · dashboard · usability evaluation.


1     Introduction
Semantic Web technologies have a steep learning curve which can present an
obstacle for non-technical researchers when trying to access and explore the
data for their needs. Visualization tools and analytical platforms operating on
top of Semantic Web architectures can support accessing and exploring Linked
Data for non-technical or non-domain expert users by aiding query formulation
    Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons
    License Attribution 4.0 International (CC BY 4.0).


Exploration of Environmental and Disease Data Dashboard for Clinician Researchers


 in an intelligible manner [5]. A dashboard approach with multiple coordinated
 views offers advantages for statistical data including the integration of multiple
 data sources, details of the underlying data, a flexible data analysis layer and
 a reusable framework [16, 18, 3]. Furthermore, a user-centred approach to dash-
 board design provides an easy and intuitive interface to be used by a focused
 group [16, 18, 3, 11] and the use of standard usability evaluation with standard
 post questionnaires enables comparison of prototype tools with later versions of
 the tool, as well as with other tools.
     At the moment, clinician researchers typically require knowledge engineers to
 access, explore and retrieve the data that they are interested in when the datasets
 are implemented using standard Semantic Web technologies. Therefore, there is an
 opportunity to propose a semantic analytical platform following a user-centred
 approach. In particular, health-related statisticians and clinicians with statistical
 experience (hereafter clinician researchers) lack tools to explore clinical and
 environmental linked data, which will be used as input to train their models
 (described further in Section 3).
     This paper outlines the design of the Semantic Combining for Exploration
 of Environmental and Disease data (SCEED) dashboard, an initial visual tool
 designed to be used by clinician researchers to explore and retrieve combined
 environmental and disease data for further analysis. The contribution of this
 paper is the SCEED dashboard itself along with an initial evaluation of the
 usability and effectiveness through a standard usability test.
     The paper is structured as follows. Section 2 reviews related research. Section
 3 overviews the AVERT project. Section 4 outlines the design and implementa-
 tion of the SCEED dashboard. Section 5 describes the evaluation method, the
 results and analysis of the user experiment and discusses the outcome of this
 initial evaluation. Section 6 concludes the paper and states the future work.


 2    Related Work

 This section overviews the state-of-the-art in Semantic-Web based visual tools
 to meaningfully explore linked data for clinician researchers. We have classified
 the reviewed tools based on the usability evaluation for their visual techniques.
     Relevant tools where non-standard usability evaluation was used. The
 Granatum project addresses the computational challenges that genomic scientists
 have in analysing single-cell RNA sequencing data with a graphical analysis
 pipeline [25]. As part of the project, Hasnain et al. 2014 [9] evaluates the developed
 Linked Biomedical Dataspace for supplementing drug discovery with domain
 experts, following a user-centred approach for bioinformaticians and biomedical
 researchers. This dataspace uses ReVealD as the visual query system, which
 is evaluated with the 'Tracking Real-time User Experience (TRUE)' methodology,
 and later became an integrated platform for this project. Kamdar et al. 2014 [9]
 evaluates this platform for biomedical researchers with metrics such as the number
 of steps and the time taken per step to complete a task. The fact that an ad hoc usability
 questionnaire was used in this particular study hinders further comparison
 with other studies. Likewise, Villanueva-Rosales et al. 2015 [23] uses an ad hoc
 usability questionnaire to evaluate, with multidisciplinary participants, an experimental
 graphical user interface for The Earth, Life and Semantic Web Project
 [4]. In addition, Scharl et al. 2017 [21] assesses the usability of the semantic
 analytical platform with heuristic evaluation, formative usability tests
 and feedback from actual users (communication professionals), all of which are
 non-standard usability/effectiveness metrics.
     Relevant tools where standard usability evaluation was used. Sabol
 et al. 2014 [19] presents a toolchain to explore and visually analyse Linked Data
 for non-Semantic Web experts. The authors evaluate the work from a formative
 usability angle with quantitative metrics (the standard NASA Task Load Index (TLX)
 for workload, and time per task) and qualitative metrics (a think-aloud protocol).
 Furthermore, Dafli et al. 2015 [6] evaluates the usability and efficacy of the
 OpenLabyrinth extension with specific scenarios aimed at health professionals. The
 metrics used were a System Usability Scale (SUS) questionnaire, a standard method, in
 combination with an eViP questionnaire and expert reviews. This standard
 questionnaire is also used by Brașoveanu et al. 2016 [3] in combination with the
 time per task and discussion of the task results for a user study with tourism
 researchers and practitioners; and by Zained et al. 2015 [24] with an additional
 custom questionnaire designed to test the FedViz interface for researchers and
 engineers in Semantic Web.
     In contrast to the related work outlined above, our approach is focused
 on providing access and exploration of linked environmental and clinical data
 for clinician researchers. In this paper, we present a dashboard with a user-
 centred approach addressing clinician researchers assessed with a standard us-
 ability/effectiveness evaluation, enabling comparison with later versions of the
 SCEED tool, and with other related tools.


 3      AVERT Background
 AVERT1 and HELICAL2 are two projects in the field of Healthcare Data Linkage
 which share the same data integration approach based on Semantic Web
 technologies [17], providing a scalable semantic architecture for data related to
 rare chronic diseases. The data model links clinical data for patients with ANCA
 vasculitis (a rare kidney disease) with environmental data, with the goal of predicting
 when flares of the disease may occur for individual patients.

  1 https://www.tcd.ie/medicine/thkc/avert/
  2 http://helical-itn.eu/

     In the AVERT semantic architecture [17], Semantic Web technologies are
 used to combine multiple diverse data sources with spatial and temporal common
 features between medical registries and environmental data. These datasets are
 converted to RDF [12] with R2RML [7], a mapping language from relational
 databases to RDF datasets, and stored in a triplestore, allowing information retrieval
 through semantic queries. SPARQL then enables the retrieval, manipulation
 and linkage of the stored data. Currently a knowledge engineer performs these
 SPARQL queries to fulfil the clinician researchers' needs, a human-in-the-loop
 approach.
     The intention going forward in such projects is to allow the clinician re-
 searchers themselves (through a dashboard) to access and explore the clinical
 and environmental data that are represented and linked through semantic web
 technologies. The data will be used as input to train their models to poten-
 tially find associations and relationships between environmental factors and the
 disease flares of patients.
     An effective tool should thus meet the following requirements,
 extracted from expert consensus within AVERT:
     Requirement 1 : enable the clinician researcher to query specific clinical pa-
 tient data to retrieve linked environmental data, without the need for knowledge
 of the underpinning semantic web technologies;
     Requirement 2 : support the understanding of the clinician researcher in the
 use and limitations of the linked environmental data to support identification of
 flares for rare diseases;
     Requirement 3 : allow for the download of selected clinical and environmental
 data to be used as input in statistical models for data analysis.
     The SCEED dashboard is a prototype tool aimed at satisfying these three
 requirements.


 4    Design and Implementation

 The development of the dashboard was motivated by the needs of clinician re-
 searchers in the AVERT project, who are not Semantic Web experts, to identify
 relevant environmental data that should be linked to longitudinal ANCA vas-
 culitis patient clinical data to support spatio-temporal analysis of the data. This
 is done in order to support prediction of flares for individual patients and to ul-
 timately support the discovery of environmental factors that trigger the disease
 in the patient cohort.
     The dashboard operates on top of AVERT's semantic architecture (see Section
 3), where data from multiple data sources is uplifted to RDF [17]: weather and
 pollution data (27.5M triples) along with infectious disease data and clinical
 data (2.6M triples). This relevant data is retrieved from a triplestore supporting
 GeoSPARQL [1] queries, which are key for the nature of our data (which has
 spatio-temporal components).
     The initial dashboard was designed to have four tabs (see Fig. 1), each of
 which is described next.
     Query tab. In the Query tab of the dashboard, Fig. 1 part a), the user
 can select options from the different flare related parameters. These selected
 options are then substituted into a SPARQL query template, URL encoded and
 executed against the data in the triplestore. This tab is aimed towards satisfying
 Requirement 1.
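     As an illustration of this step, the following Python sketch fills a SPARQL
 query template with the selected options and URL-encodes it. It is a minimal
 sketch under stated assumptions, not the SCEED implementation: the endpoint URL,
 the ex: vocabulary, the template itself and the name build_query_url are all
 hypothetical.

```python
from string import Template
from urllib.parse import urlencode

# Hypothetical endpoint and vocabulary: none of these IRIs come from the paper.
ENDPOINT = "http://localhost:7200/repositories/avert"

QUERY_TEMPLATE = Template("""
PREFIX ex: <http://example.org/avert#>
SELECT ?date ?variable ?value WHERE {
  ?flare ex:patient "$patient_id" ;
         ex:flareDate "$flare_date" .
  ?obs ex:linkedTo ?flare ;
       ex:aggregation "$aggregation" ;
       ex:daysBefore ?d ;
       ex:date ?date ;
       ex:variable ?variable ;
       ex:value ?value .
  FILTER(?d <= $days_before)
}
""")


def build_query_url(patient_id, flare_date, days_before, aggregation):
    """Substitute the selected options into the template and URL-encode it."""
    query = QUERY_TEMPLATE.substitute(
        patient_id=patient_id,
        flare_date=flare_date,
        days_before=int(days_before),  # force a numeric value
        aggregation=aggregation,
    )
    # The SPARQL 1.1 Protocol allows a query to be sent via GET
    # in the URL-encoded "query" parameter.
    return ENDPOINT + "?" + urlencode({"query": query})


# Example call with invented option values.
url = build_query_url("P001", "2019-05-01", 7, "mean")
```

 Because the Query section only offers drop-down, numeric and radio-button
 inputs, the values substituted into such a template stay within a known set.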
     Link data tab. This is the first tab the user sees after submitting a query,
 Fig. 1 part b). The aim of this tab is to provide the environmental data linked
 to the clinical patient queried in the Query tab. This data is displayed as a
 table with a hovering feature that displays the data description, gathered from
 the climate data store provided by the European Centre for Medium-Range
 Weather Forecasts (ECMWF) parameter database3. The table displayed after
 the submission of the query can be downloaded as a CSV file (in support of
 Requirement 3 ).

 Fig. 1. SCEED dashboard multiple view after finishing the tasks from the study. a)
 Query section. The user can select an option from a dropdown display list for Patient
 ID and Flare date, a numeric value for Days before Flare and an input from the radio
 buttons in Spatial aggregation. Data tabs: b) Link Data, c) Std.Data, d) Comp.Data
 and e) Vis.Data; the different tabs allow the user to navigate, compare, visualize and
 download meaningful information. Each tab starts with an introductory text
 to guide the user and ends with a data visualization as table or graph.

     Standard data tab. In the Standard data tab, Fig. 1 part c), the user can
 compare different environmental variables for a better understanding of their
 variability (in support of Requirement 2 ), since the variables have been statistically
 standardized, that is, converted to the same scale as standard scores (Z-scores).
 The table displayed highlights some values with a colour encoding that depends on
 the category of the value, and is available to download
 as a CSV file.
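     The standardization can be sketched as follows. This is an illustrative
 example only: the paper does not state whether the sample or the population
 standard deviation is used (the sample version is assumed here), and the
 column names and values are invented.

```python
from statistics import mean, stdev


def z_scores(values):
    """Return standard scores: (x - mean) / sample standard deviation.

    Needs at least two values; a constant column maps to all zeros.
    """
    mu = mean(values)
    sigma = stdev(values)  # assumption: sample standard deviation
    if sigma == 0:
        return [0.0 for _ in values]
    return [(x - mu) / sigma for x in values]


def standardize_table(table):
    """Standardize every column of a {column name: list of values} table."""
    return {name: z_scores(column) for name, column in table.items()}


# Invented example data on two very different scales.
table = {
    "temperature_K": [281.0, 283.0, 285.0],
    "precipitation_mm": [0.0, 2.0, 10.0],
}
standardized = standardize_table(table)
```

 After standardization each column has mean 0 and unit standard deviation, so
 variables measured in different units can be compared in the same table.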
     Comparative data tab. A CSV file is stored with the data from each previously
 submitted query. These are then compared in an interactive plot with legend
 features, allowing options to be selected and deselected, which is key for comparing
 the environmental data related to multiple flares (see Fig. 1 part d). The multiline
 plot allows a user to identify trends and seasonality, make comparisons and check
 for outliers previously discovered in the standardization tab, to improve their
 comprehension of the environmental data prior to the patient's flare event
 (in support of Requirement 2 ).
     Visualization tab. In this tab, Fig. 1 part e), the user can visualize the
 last submitted query to have a cleaner view of each variable. The tab aims
 to provide a quick insight into the data prior to download, to make sure it
 meets the statistician's needs.
     The SCEED dashboard (shown in Fig. 1) is coded in Python (3.6), using
 the Dash (1.7.0) package4 as a framework that facilitates building cross-platform
 analytical platforms. The dashboard is coded dynamically: the drop-down
 options displayed for each parameter react to the data available in the triplestore
 endpoint. Therefore, if new data is added to the triplestore according to the same
 data model, the dashboard will react accordingly, showing the newly available data.
 This approach is ideal when managing both clinical data (since data collection is
 an ongoing process) and environmental time series data, which is constantly
 being updated.
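     One way such dynamic options could be derived is sketched below. The
 function converts a response in the SPARQL 1.1 Query Results JSON Format into
 the list of label/value dictionaries that a Dash dropdown accepts as its
 options. The function name, the variable name "patient" and the sample response
 are hypothetical, and the Dash callback that would return this list is omitted
 to keep the example self-contained.

```python
def bindings_to_options(sparql_json, var):
    """Turn SPARQL JSON results into Dash-style dropdown options.

    Collects the distinct values bound to `var` and returns them sorted
    as [{"label": ..., "value": ...}, ...].
    """
    values = sorted({
        binding[var]["value"]
        for binding in sparql_json["results"]["bindings"]
        if var in binding  # a variable may be unbound in some rows
    })
    return [{"label": v, "value": v} for v in values]


# Invented response, shaped like the SPARQL 1.1 JSON results format.
sample_response = {
    "head": {"vars": ["patient"]},
    "results": {"bindings": [
        {"patient": {"type": "literal", "value": "P002"}},
        {"patient": {"type": "literal", "value": "P001"}},
        {"patient": {"type": "literal", "value": "P001"}},
    ]},
}
options = bindings_to_options(sample_response, "patient")
```

 Re-running the query whenever the page loads means that patients or flare
 dates added to the triplestore appear in the dropdown without code changes.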


 5      Evaluation
 An experiment was undertaken to evaluate the usability of the initial SCEED
 dashboard in accessing environmental data linked with clinical de-identified patient
 records, by clinician researchers who have no practical experience with
 Semantic Web technologies. The user experiment was structured as a brief
 introduction to the dashboard's background, the tasks to be completed by the
 participants and a follow-up post-questionnaire.

  3 https://apps.ecmwf.int/codes/grib/param-db
  4 https://dash.plotly.com/


 5.1   Experimental Setup and Execution

 The target group was clinician researchers who are not Semantic Web experts,
 who would be users of the analytical platform and who had data exploration
 needs in the health domain similar to those of the AVERT project. These targeted
 selection criteria resulted in the recruitment of seven participants: PhD students
 (3) and professors (4), who were experienced in analysing clinical data with
 statistical models. The participants were within the 30-50 age range, with a
 female to male ratio of 2:5, and international with fluent English. This sample
 size meets the requirements for evaluating a first-stage prototype of a novel
 user-interface design that has a specialised nature (exploration of clinical and
 linked environmental data) [15].
     The experiment started with the participants signing the informed consent
 document, which commenced with a short explanation about the purpose of
 the dashboard, its main target contribution to research and a mention of the
 semantic technologies operating in the back-end. This was the first contact with
 the SCEED dashboard and the participant had not interacted with or seen the
 dashboard previously. Furthermore, each participant was asked to follow a role
 while testing SCEED: that of a researcher with access to clinical patient data
 who would like to extract and comprehend environmental data related to patient
 flares.
     Clinical data was simulated from the AVERT data model tailored to support
 the chosen tasks. Environmental data was obtained from ERA5, the fifth gen-
 eration ECMWF atmospheric reanalysis of the global climate [8], and reduced
 to four variables (columns in the data table of Fig. 1), again to support timely
 exploration of the dashboard for the given tasks.
     Each participant was asked to follow a concurrent think-aloud protocol (CTA)
 [2]. The protocol requires the observer to listen to the participant's thought
 process while they complete the tasks, as well as to encourage the think-aloud
 action. The participants' think-aloud statements and extra feedback
 were recorded by hand by the experiment designer during the evaluation, for a
 qualitative analysis with Grounded Theory [10, 22].
     As the experiment was conducted during COVID-19 restrictions, synchronous
 remote testing was the method used through a video conferencing platform
 with remote control functions. Interestingly, we observed that the remote testing
 nature of the study reinforced the ideal spectator role within the participant-
 observer interaction, an optimal testing environment for the CTA method. An
 hour was allocated to each participant to explore the dashboard, complete the
 tasks and fill out the post-questionnaire.
     Each participant was asked to complete a series of tasks carefully selected
 to assess the three core requirements of the dashboard stated previously. These
 tasks were written and given together with the informed consent document at
 the start of the video conference. The observer tracked manually the time spent
 per task with a stopwatch when the participant explicitly made a comment
 that the task had been completed. Each task was set out as an instruction
 followed by the criteria for when the task would be considered completed by
 the participant. The tasks were sequenced and numbered as follows:
  1. Submit a query for a specific patient's flare. The task will be complete
     when an environmental data table is displayed in the LinkData tab.
  2. Explain the meaning of each column in the environmental table.
     The task will be complete when the participant has hovered over the column
     headings of the main table and read the description of the environmental
     variables.
  3. Try different aggregation approaches. The task will be complete after
     the participant has explored the different spatial aggregations available in
     the Query section and the main table reacts/changes accordingly.
  4. Compare variables for the same flare in the standard data tab. The
     participant will have finished this task after selecting the Std.Data tab.
  5. Compare environmental data from different flares in the compar-
     ison tab. The participant will have finished this task when they have suc-
     cessfully compared two flares in the Comp.Data tab.
  6. Visualize Link Data variables in the visualization tab. The task will
     be completed when the participant successfully visualizes the environmental
     data prior to download in the Vis.Data tab.
  7. Download useful raw data for the researcher's needs. The participant
     will have finished this task after downloading the data from either the
     LinkData or Std.Data tabs.
     After the completion of the tasks, the participants were asked to complete a
 Post-Study System Usability Questionnaire (PSSUQ) (described further in the
 next section) to evaluate the user experience in a quantitative metric.
     The methods described above include a CTA protocol, successful completion
 of the tasks and time on task to support the PSSUQ standard questionnaire.
 The CTA protocol provides feedback to interpret the effective task completion
 and time on task in a meaningful way. Together, these methods combine quantitative
 and qualitative metrics to evaluate the usability of the SCEED tool.

 5.2   Results
 Quantitative results Time on task. The box plots in Fig. 2 compare participants'
 times spent on task; all the participants completed every task successfully. The
 spread of the time values per task, given by the length of the boxes (IQR), is below 1 min
 for the simplest tasks of submitting a query and downloading the data (T1 and
 T7); between 1-2 min for the tasks of selecting different query parameters and
 tabs (T3, T4 and T6) and around 3 min for the more complex tasks of explaining
 the meaning and comparing the data (T2 and T5). Furthermore, the median follows
 a similar pattern to the spread, with most of the tasks below 3.5 minutes,
 increased by 1 min for T2 and doubled for the most complex task of comparing
 patient flares (T5). The box plots in Fig. 2 also identify 3 outliers and
 proved to be suitable for studying the patterns of this data with a sample size of 7.
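
     The quantities read from such box plots can be computed as in the sketch
 below: the quartiles, the IQR (box length) and the points falling outside
 Tukey's fences at 1.5 IQR from the box. The times are invented for
 illustration, and the quartile interpolation method is an assumption, since
 the paper does not state how its plots were produced. The code needs Python
 3.8 or later for statistics.quantiles.

```python
from statistics import quantiles


def tukey_summary(values):
    """Quartiles, IQR and Tukey-style outliers for one task's times."""
    # "inclusive" uses linear interpolation between the observed values.
    q1, median, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    outliers = [v for v in values if v < low_fence or v > high_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "iqr": iqr,
        "outliers": outliers,
    }


# Invented times in minutes for seven participants on one task.
times = [1.0, 1.5, 2.0, 2.0, 2.5, 3.0, 9.0]
summary = tukey_summary(times)
```

 For these invented times the box spans 1.75 to 2.75 min (an IQR of 1 min) and
 the 9.0 min observation lies beyond the upper fence of 4.25 min, so it would
 be drawn as an outlier.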



 Fig. 2. Time spent to complete each task during the experiment. (Box plots with
 Tukey-style whiskers, n=7; x-axis: Tasks, T1 to T7; y-axis: Time [min], 0 to 20.)



     PSSUQ: The Post-Study System Usability Questionnaire. The PSSUQ is a
 general questionnaire with 19 questions meant to assess how usability evolves
 during the development of a system [13]; the second version of the questionnaire
 was used in this study. The PSSUQ follows a 7-point Likert Scale and assesses
 four different metrics: system usefulness (SysUse), information quality (InfoQual),
 interface quality (IntQual) and overall, averaged from questions 1-8, 9-15, 16-18
 and 1-19 respectively. The questionnaire results and aggregations per group for the
 SCEED dashboard are presented as box plots in Fig. 3. This visualization allows
 us to compare the distributions without any assumptions, again adequate for
 our sample size. Most of the PSSUQ scores have a median of 2 and a spread
 between 1-1.5 points, reduced to ≤ 1 for the four averaged metrics with larger
 sample sizes. Moreover, Q7 and Q16, which indicate the ease of learning and
 the pleasantness of the interface, got the best scores. However, Q12 and Q18, regarding
 easy finding of information and having the needed functions, have a higher
 median of 3 (on this scale, higher scores are worse), and Q9 has a spread of 3.5 points,
 indicating a diversity of opinions on the error messages. The identified outliers
 for the individual questions are from 2 participants, who were not satisfied with
 the system use and features available. Furthermore, some participants provided
 qualitative comments through the PSSUQ open comment section that are consistent with
 the previous results.
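
     The scale scores described above can be computed per participant as in the
 following sketch, which averages the answered items of each scale using the
 item ranges given in the text. The function name, the handling of unanswered
 items and the example ratings are assumptions made for illustration. Note that
 the sample sizes reported in the caption of Fig. 3 (56, 49, 21 and 133) equal
 7 participants times 8, 7, 3 and 19 items, which suggests that the box plots
 for the four scales pool the individual responses rather than plotting one
 average per participant.

```python
from statistics import mean

# Item ranges per scale, as given in the text (PSSUQ with 19 questions).
SCALES = {
    "SysUse": range(1, 9),     # questions 1-8
    "InfoQual": range(9, 16),  # questions 9-15
    "IntQual": range(16, 19),  # questions 16-18
    "Overall": range(1, 20),   # questions 1-19
}


def pssuq_scores(responses):
    """Average the answered items of each scale for one participant.

    `responses` maps question number (1-19) to a rating from 1 to 7,
    where lower is better; unanswered items are None or missing
    (assumption: they are simply skipped).
    """
    scores = {}
    for scale, items in SCALES.items():
        answered = [responses[i] for i in items
                    if responses.get(i) is not None]
        scores[scale] = mean(answered) if answered else None
    return scores


# Invented participant: rating 2 everywhere except Q12 and Q18.
ratings = {q: 2 for q in range(1, 20)}
ratings[12] = 3
ratings[18] = 3
scores = pssuq_scores(ratings)
```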


 Fig. 3. PSSUQ scores box plot with the four averaged metrics (SysUse, InfoQual,
 IntQual and Overall) on the right end with sample sizes of 56, 49, 21 and 133.
 (Box plots with Tukey-style whiskers, n=7; x-axis: PSSUQ Item/Scale, Q1 to Q19
 followed by SysUse, InfoQual, IntQual and Overall; y-axis: Score, 1 to 7.)


 Qualitative results The experiment also followed a concurrent think-aloud
 protocol providing the study with qualitative data, analysed by means of Grounded
 Theory. The observer/note-taker coded and categorized the annotations manually
 and with the note-taker's criteria alone. The categories of the annotations
 were recurrent and natural, commenting upon the important features of the
 dashboard. All the participants discussed the usefulness and understanding of
 the patient IDs exploration, the Z-scores, the comparative plot visualization
 and the downloading approach of patient flare linked environmental data in the
 dashboard. However, the rest of the features, including the dates selection (days
 before flare), the spatial aggregation feature, hovering for the variable information
 display and the usefulness of the Vis.Data tab, presented a variety of patterns.
 Participants expressed that the linked environmental data would be more useful
 if provided as a time series for a specific period instead of only dates before
 the flare. Moreover, clinician researchers wanted the possibility to explore and
 download all data in a summarized way.
     The emerging themes from the Grounded Theory analysis were (1) accessing
 flare related environmental data, (2) associating multiple patients and (3) exploring
 longitudinal data. These themes acknowledged the perception associated with
 the complex topic of comprehending linked environmental data to support rare
 disease research (Requirement 2 ).


 5.3          Discussion

 The SCEED dashboard was developed in order to support clinician researchers
 exploring clinical data linked with environmental data by querying the data,
 visualizing these datasets as tables and visualizations for comprehension and
 downloading the data for their models; without previous knowledge of Semantic
 Web technologies. We conducted a user experiment to evaluate the SCEED
 dashboard by the completion of 7 tasks using standard methodologies: time on
 task, CTA and PSSUQ; to assess the usability. These tasks were selected to
 assess the three core requirements.
     First, all the participants were successful in completing the tasks, displaying
 a similar pattern (Fig. 2). This pattern suggests that the time spent to complete
 each task increases with the complexity of the task. This complexity directly
 relates to the difficulty of fulfilling the requirements by the clinician researchers,
 supporting the achievement of Requirements 1 and 3.

     Second, the analysis of the PSSUQ responses leads to a better understanding
 of the specific features of the SCEED dashboard. The aggregated PSSUQ results in
 Fig. 3 show the known, consistent pattern of poor ratings for InfoQual relative
 to IntQual and for Q9 [14], supporting the robustness of the questionnaire with
 fewer than 15 participants [20]. Moreover, these aggregated results are lower
 than the norm defined for the PSSUQ version 2 [13] (the lower the value, the
 higher the satisfaction) and provide a reference for the next versions of the
 dashboard. The open comments of this questionnaire indicated that the system was
 easy to use and had good features, but that additional features are required to
 fulfil Requirement 2, as reflected in the Q18 score (see Fig. 3).
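
     The subscale scores discussed above can be illustrated with a minimal sketch.
 It assumes the 19-item PSSUQ (version 2) with the commonly published item
 ranges (SysUse 1-8, InfoQual 9-15, IntQual 16-18, Overall 1-19) and a 7-point
 scale on which lower ratings mean higher satisfaction. The function name, the
 data layout and the example ratings are hypothetical and are not taken from the
 SCEED evaluation.

```python
from statistics import mean

# Item ranges for the 19-item PSSUQ (version 2).
# Assumed from the published scoring rules, not taken from this paper.
SUBSCALES = {
    "SysUse": range(1, 9),     # items 1-8
    "InfoQual": range(9, 16),  # items 9-15
    "IntQual": range(16, 19),  # items 16-18
    "Overall": range(1, 20),   # items 1-19
}


def pssuq_scores(responses):
    """Return the mean rating per subscale for one participant.

    `responses` maps an item number (1-19) to a rating from 1 to 7.
    Items that are missing or set to None (not answered / not applicable)
    are skipped. Lower scores mean higher satisfaction.
    """
    scores = {}
    for name, items in SUBSCALES.items():
        ratings = [responses[i] for i in items if responses.get(i) is not None]
        scores[name] = mean(ratings) if ratings else None
    return scores


# Hypothetical ratings for a single participant (illustrative only).
example = {i: 2 for i in range(1, 20)}
example[9] = 5      # a weaker rating on item 9
example[18] = None  # item 18 left unanswered

print(pssuq_scores(example))
```

 Averaging these per-participant scores across all participants would give
 aggregated values of the kind that can be compared against a published norm.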

     Third, the CTA allowed us to understand participants' thoughts as they
 occurred while completing the tasks. The categorization of the think-aloud
 statements made clear that the dashboard needs improvements in a number of areas,
 which will be addressed in the next versions. Furthermore, the emerging themes
 of the Grounded Theory analysis support the previous statements on the
 fulfilment of Requirements 1 and 3, and point to alternative ways in which
 Requirement 2 could be addressed in the following versions. These next versions
 will adopt a more focused multiple-patient approach over a selected date range
 to improve the exploration of environmental data.

     Examining the results of the evaluation methods described above together
 yielded further insights, some of which are worth presenting. The emotional
 responses noted during the CTA provide an explanation for the three outliers in
 Fig. 2: these participants were curious and wanted to explore all the
 functionalities during the tasks. In addition, the dispersed values for task 2
 can be explained by the suboptimal formulation of the task, since some
 participants had already discovered the hover functionality while performing
 task 1, resulting in quicker times.
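
     The text does not state how the outliers in Fig. 2 were identified. As a
 minimal sketch, assuming the common boxplot rule of 1.5 times the interquartile
 range (IQR), time-on-task outliers could be flagged as follows. The function
 name and the completion times are hypothetical and are not the study data.

```python
from statistics import quantiles


def iqr_outliers(times):
    """Return the values in `times` that fall outside the 1.5 * IQR fences."""
    # Quartiles with linear interpolation over the observed range.
    q1, _, q3 = quantiles(times, n=4, method="inclusive")
    spread = q3 - q1
    low, high = q1 - 1.5 * spread, q3 + 1.5 * spread
    return [t for t in times if t < low or t > high]


# Hypothetical completion times in seconds for one task (illustrative only).
task_times = [40, 42, 45, 47, 50, 52, 55, 120]

print(iqr_outliers(task_times))
```

 Pairing each flagged time with the think-aloud notes for the same participant
 and task is one way to check whether an outlier reflects exploration rather
 than a usability problem.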

     Selecting only clinician researchers provides more relevant results for this
 first version than enlarging the sample size would. However, this poses a
 challenge when making statements about the quantitative results, which is why we
 combine different metrics in the evaluation. Another limitation of this work is
 that the qualitative data were evaluated manually and with a limited number of
 participants, which limits the depth of the qualitative results. These
 limitations will be addressed in later evaluations with a more comprehensive and
 automated design, and by increasing the sample size through the involvement of
 more real end users with different interests, for a wider acceptance of the
 prototype.

     Nevertheless, the evaluation conducted in this paper, which combines
 different standard metrics, could be beneficial for assessing other tools with
 small sample sizes. Finally, the results of this initial evaluation hold promise
 for producing an interface that will engage clinician researchers directly with
 Linked Data.




 6    Conclusions and Future Work
 From the design and the results of the evaluation, the SCEED dashboard is an
 adequate initial design to fulfil the clinician researchers' requirements of
 querying specific clinical patient data to retrieve environmental data linked to
 vasculitis patient flare clinical data from the triplestore, and of downloading
 meaningful data to be used as input for statistical models. However, new
 features are necessary for comprehending the use and limitations of the
 environmental data for rare disease flare discovery.
     The combination of measuring the time per task, CTA protocol and PSSUQ
 provided enough data to assess the usability of the dashboard, highlighting the
 successful aspects, identifying the items that need to be improved and the new
 features to be added. In the future, this methodology will be used as a baseline
 to track the evolution of the dashboard.

    Acknowledgements This research was conducted with the financial sup-
 port of HELICAL as part of the European Union’s Horizon 2020 research and
 innovation programme under the Marie Skłodowska-Curie Grant Agreement No.
 813545 at the ADAPT SFI Research Centre at Trinity College Dublin.

 References
  1. Battle, R., Kolas, D.: Enabling the geospatial Semantic Web with Parliament and
     GeoSPARQL. Semantic Web 3(4), 355–370 (2012).
     https://doi.org/10.3233/SW-2012-0065
  2. Boren, T., Ramey, J.: Thinking aloud: Reconciling theory and practice.
     IEEE Transactions on Professional Communication 43(3), 261–278 (Sep 2000).
     https://doi.org/10.1109/47.867942
  3. Braşoveanu, A.M.P., Sabou, M., Scharl, A., Hubmann-Haidvogel, A., Fischl, D.:
     Visualizing statistical linked knowledge for decision support. Semantic Web 8(1),
     113–137 (Jan 2017). https://doi.org/10.3233/SW-160225
  4. Chavira, L.A.G.: The Earth Life and Semantic Web Project Experiment GUI.
     Tech. rep., U.S. Department of Health & Human Services (Jul 2015)
  5. Dadzie, A.S., Rowe, M.: Approaches to visualising Linked Data: A survey. Semantic
     Web 2(2), 89–124 (Jan 2011). https://doi.org/10.3233/SW-2011-0037
  6. Dafli, E., Antoniou, P., Ioannidis, L., Dombros, N., Topps, D., Bamidis, P.D.:
     Virtual Patients on the Semantic Web: A Proof-of-Application Study. Journal of
     Medical Internet Research 17(1), e16 (2015). https://doi.org/10.2196/jmir.3933
  7. Das, S., Sundara, S., Atkinson, R.: R2RML: RDB to RDF Mapping Language.
     https://www.w3.org/TR/r2rml/ (2012)
  8. Copernicus Climate Change Service (C3S): ERA5: Fifth Generation of ECMWF
     Atmospheric Reanalyses of the Global Climate. Copernicus Climate Change Service
     Climate Data Store (CDS). ECMWF. https://cds.climate.copernicus.eu/cdsapp#!/home
  9. Hasnain, A., Kamdar, M.R., Hasapis, P., Zeginis, D., Warren, C.N., Deus, H.F.,
     Ntalaperas, D., Tarabanis, K., Mehdi, M., Decker, S.: Linked Biomedical
     Dataspace: Lessons Learned Integrating Data for Drug Discovery. In:
     International Semantic Web Conference 2014. pp. 114–130. Lecture Notes in
     Computer Science, Springer International Publishing (2014).
     https://doi.org/10.1007/978-3-319-11964-9_8




 10. Khan, S.: Qualitative Research Method: Grounded Theory. International Journal of
     Business and Management 9 (Oct 2014). https://doi.org/10.5539/ijbm.v9n11p224
 11. Koopman, R.J., Kochendorfer, K.M., Moore, J.L., Mehr, D.R., Wakefield, D.S.,
     Yadamsuren, B., Coberly, J.S., Kruse, R.L., Wakefield, B.J., Belden, J.L.: A Dia-
     betes Dashboard and Physician Efficiency and Accuracy in Accessing Data Needed
     for High-Quality Diabetes Care. The Annals of Family Medicine 9(5), 398–405 (Sep
     2011). https://doi.org/10.1370/afm.1286
 12. Lassila, O., Swick, R.R.: Resource Description Framework (RDF) Model and
     Syntax Specification. https://www.w3.org/TR/1999/REC-rdf-syntax-19990222/
     (1999)
 13. Lewis, J.: Psychometric Evaluation of the PSSUQ Using Data from Five Years
     of Usability Studies. Int. J. Hum. Comput. Interaction 14, 463–488 (Sep 2002).
     https://doi.org/10.1080/10447318.2002.9669130
 14. Lewis, J.R.: IBM computer usability satisfaction questionnaires: Psychometric
     evaluation and instructions for use. International Journal of Human–Computer
     Interaction 7(1), 57–78 (Jan 1995). https://doi.org/10.1080/10447319509526110
 15. Macefield, R.: How To Specify the Participant Group Size for Usability Studies: A
     Practitioner’s Guide. Journal of Usability Studies 5(1), 12 (2009)
 16. McKenna, S., Staheli, D., Fulcher, C., Meyer, M.: BubbleNet: A Cyber Security
     Dashboard for Visualizing Patterns. Computer Graphics Forum 35(3), 281–290
     (2016). https://doi.org/10.1111/cgf.12904
 17. Reddy, B.P., Houlding, B., Hederman, L., Canney, M., Debruyne, C., O’Brien, C.,
     Meehan, A., O’Sullivan, D., Little, M.A.: Data linkage in medical science using the
     resource description framework: The AVERT model. HRB Open Research 1, 20
     (Mar 2019). https://doi.org/10.12688/hrbopenres.12851.2
 18. Reynolds, D., Cyganiak, R.: The RDF Data Cube Vocabulary.
     https://www.w3.org/TR/vocab-data-cube/ (2014)
 19. Sabol, V., Tschinkel, G., Veas, E., Hoefler, P., Mutlu, B., Granitzer, M.:
     Discovery and Visual Analysis of Linked Data for Humans. The Semantic Web –
     ISWC 2014, Lecture Notes in Computer Science 8796, 309–324 (Oct 2014).
     https://doi.org/10.13140/2.1.3744.2566
 20. Sauro, J., Lewis, J.R.: Quantifying the User Experience: Practical Statistics for
     User Research. Elsevier, Cambridge, 2nd edn. (2016)
 21. Scharl, A., Herring, D., Rafelsberger, W., Hubmann-Haidvogel, A., Kamolov, R.,
     Fischl, D., Föls, M., Weichselbraun, A.: Semantic Systems and Visual Tools to
     Support Environmental Communication. IEEE Systems Journal 11(2), 762–771
     (Jun 2017). https://doi.org/10.1109/JSYST.2015.2466439
 22. Tie, Y.C., Birks, M., Francis, K.: Grounded theory research: A design
     framework for novice researchers. SAGE Open Medicine (Jan 2019).
     https://doi.org/10.1177/2050312118822927
 23. Villanueva-Rosales, N., Chavira, L.G., del Rio, N., Pennington, D.: eScience
     through the Integration of Data and Models: A Biodiversity Scenario. In: 2015
     IEEE 11th International Conference on E-Science. pp. 171–176 (Aug 2015).
     https://doi.org/10.1109/eScience.2015.77
 24. Zainab, S., Saleem, M., Mehmood, Q., Zehra, D., Decker, S., Hasnain, A.: Fed-
     Viz: A Visual Interface for SPARQL Queries Formulation and Execution. In:
     VOILA@ISWC2015 (2015)
 25. Zhu, X., Wolfgruber, T.K., Tasato, A., Arisdakessian, C., Garmire, D.G.,
     Garmire, L.X.: Granatum: A graphical single-cell RNA-Seq analysis
     pipeline for genomics scientists. Genome Medicine 9(1), 108 (Dec 2017).
     https://doi.org/10.1186/s13073-017-0492-3

