<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>Posters and Demos, October</journal-title>
      </journal-title-group>
      <issn pub-type="ppub">1613-0073</issn>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Visualization through Domain Knowledge Integration</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Andreia Almeida</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Alberto Alves</string-name>
          <email>amp.alves@campus.fct.unl.pt</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Maribel Yasmina Santos</string-name>
          <email>maribel@dsi.uminho.pt</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ana León</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>João Moura Pires</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>ALGORITMI Research Centre, University of Minho, Campus de Azurém</institution>
          ,
          <addr-line>4800-058 Guimarães</addr-line>
          ,
          <country country="PT">Portugal</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>NOVA University of Lisbon, School of Science and Technology</institution>
          ,
          <addr-line>Lisboa</addr-line>
          ,
          <country country="PT">Portugal</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Research Center on Software Production Methods (PROS), Universitat Politècnica de València</institution>
          ,
          <addr-line>Valencia</addr-line>
          ,
          <country country="ES">Spain</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2024</year>
      </pub-date>
      <volume>2</volume>
      <fpage>8</fpage>
      <lpage>31</lpage>
      <abstract>
        <p>Big Data is challenging analytical contexts, namely when aligning data and analytical requirements. While the capacity to collect and store new data is expanding rapidly, the pace at which it can be analyzed is developing more slowly. Defining these analytical requirements and selecting the most appropriate visualizations often depends on an in-depth understanding of what users need from the data. To address this problem, this paper proposes an assisted model-driven analytics approach to support visualization, taking domain knowledge and data as input. It allows the user to be guided in the mapping between domain concepts and available data, as well as in the translation of domain questions into analytical tasks that can be supported by useful visualizations for decision support. The approach is supported by a Meta-model that formalizes concepts needed to answer three fundamental questions, what, why, and how. This Meta-model contextualizes the data, the analytical tasks, and the supporting visualizations. The applicability of the proposal is shown through a demonstration case focused on the genome domain. The results highlight how useful visualizations are derived from the specified domain questions.</p>
      </abstract>
      <kwd-group>
        <kwd>Knowledge Integration</kwd>
        <kwd>Analytical requirements</kwd>
        <kwd>Analytical visualizations</kwd>
        <kwd>Model-driven analytics</kwd>
        <kwd>Conceptual meta-model</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>-</title>
      <p>CEUR
ceur-ws.org</p>
    </sec>
    <sec id="sec-2">
      <title>1. Introduction</title>
      <p>The amount of data that needs to be analyzed is continually increasing. This makes it
challenging to select and produce the most appropriate visualizations for each dataset,
especially at Big Data scale, where aligning data with analytical requirements is itself
difficult. Appropriate visualizations are needed to help users make more informed
decisions.</p>
      <p>
        In this context, the multiplicity of choices and the lack of clarity regarding analytical
objectives make it difficult for users to establish effective connections between the two for data
visualization [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. Each dataset has unique characteristics and not all types of visualizations
appropriately represent them [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. Although some studies have proposed approaches to optimize
data visualization ([
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]), several challenges exist in aligning analytical
requirements with the data, as well as translating domain questions, expressed in natural language,
into analytical tasks. These tasks are then used to design analytical visualizations.
      </p>
      <p>This paper proposes an assisted model-driven analytics approach to support analytical
visualization, using domain knowledge and data as input. After mapping the data, the approach
helps the user translate domain questions into analytical tasks that are supported by analytical
visualizations. The proposed iterative process, from the identification of the most appropriate
analytical tasks for each question to the analytical visualizations, is supported by the
model-driven analytics component of the approach. This component includes a Meta-model that
contextualizes the data, the analytical tasks applied, and the analytical visualizations that can be
used to analyze the results obtained from performing these tasks. This Meta-model formalizes
the concepts needed to answer three fundamental questions: what, the type of data the user
is dealing with; why, the reason why the user wants to analyze that data; and how, how the
visualization is implemented in terms of design choice.</p>
      <p>This paper is structured as follows. Section 2 presents related work. Section 3 presents the
proposed assisted model-driven analytics approach. Section 4 presents the proposed Meta-model.
Section 5 presents and discusses a demonstration case in the genomics domain. Finally,
Section 6 summarizes the conclusions and future work.</p>
    </sec>
    <sec id="sec-3">
      <title>2. Related Work</title>
      <p>
        In the context of research approaches to data analytics, model-driven approaches are presented
through the concept of modeling real-world domains as a knowledge base to ease the analysis
of the modeled domains. This type of approach generally focuses on facilitating visualization
design choices but is not capable of bridging the mapping of domain data into visual channels [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ].
We believe that within this space, it is possible to contribute by including conceptual
models as domain knowledge, used to relate domain concepts to domain data
and to help translate user requirements. Some data analytics approaches with a focus
on modeling, such as [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], propose a model-driven architecture that allows automation for the
creation of visualization through the translation of the user-specific objectives/goals. Other
works, e.g. [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], intend to facilitate the design choices regarding visualizations to users who lack
data analysis expertise through the use of a model-driven approach in which user requirements,
data profiling, and visualization design are considered. In addition, other works use iterative
goal-oriented models that specify visualizations to create dashboards [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] or propose visualization
frameworks that map user requirements to data visualizations [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. The work of [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] explores
how joint interactive visualization can improve the communication of knowledge between
different users, promoting mutual understanding through the visual representation of data.
      </p>
      <p>
        Although these studies share similarities with the work presented in this paper, they do not
provide sound mapping strategies between the required domain knowledge and the data. An approach
is still lacking that guides the user in aligning analytical requirements with the data and in
translating domain questions into analytical tasks, so that the analytical visualizations
adequately address the identified tasks. Our approach seeks to bridge this gap by
aligning domain knowledge, domain data, and analytical requirements with suitable
visualizations tailored to the designed analytical tasks. Another distinguishing feature of our approach is its support
for identifying analytical tasks from analytical requirements using a taxonomy that maps user
requirements into analytical tasks. To propose this taxonomy, we considered the
works of [
        <xref ref-type="bibr" rid="ref5 ref8">8, 5</xref>
        ], which describe visualization tasks at varying levels of abstraction and consider
that analytical tasks are driven by the need to perform complex actions based on thorough data
analysis ([
        <xref ref-type="bibr" rid="ref10 ref9">9, 10</xref>
        ]), and other low-level taxonomies ([
        <xref ref-type="bibr" rid="ref11 ref12">11, 12</xref>
        ]) that typically encompass simpler
actions that do not require an in-depth analysis of the overall analytical context. This taxonomy
is integrated into a Meta-model that contextualizes the domain questions, the analytical tasks
and the analytical visualizations useful for decision-making.
      </p>
    </sec>
    <sec id="sec-4">
      <title>3. Model-driven Analytics Approach</title>
      <p>
        Data-oriented analytics guides the identification of valuable insights from vast amounts of
data. This is particularly relevant in a context where Big Data imposes complex challenges in
the alignment between analytical requirements and data. Usually, the application domain is
described using a conceptual model, but the definition of the analytical requirements and the
identification of the corresponding visualizations are often done by looking into the users’ needs.
This work aims to advance model-driven analytics with an approach that considers the domain
knowledge and data as input and assists the user in translating domain questions expressed
in natural language into specific analytical tasks that are supported by useful visualizations.
The approach proposed here follows the human-in-the-loop principle described by [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], being
supported by an iterative analytical process to augment human capabilities (and not to replace
them). As [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] highlights, this iterative process i) requires interactions between the user and
the several analytical visualizations supporting many possible queries and, as such, handles
complexity through data analysis at different levels of detail; and ii) can be framed by three essential
questions: what data the user is dealing with, why the user intends to use a visualization tool,
and how the visual encoding and interaction are constructed in terms of design choices.
      </p>
      <p>
        Despite the relevance of the proposal presented in [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], it does not describe the concepts of
the domain or map the domain to data abstractions; it focuses more directly on understanding
the analytical tasks required to answer domain-specific questions and how visualizations can
support these tasks. Our approach considers the three fundamental pillars presented by [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ],
what, why, and how. However, it also proposes an analytical approach that, besides dealing
with the concepts of the domain and the data, guides the user in mapping the concepts of the
domain with the data. This is essential to ensure alignment between the domain concepts
and the available data, promoting a consistent and targeted analysis of the domain’s analytical
requirements. The approach proposed here (Figure 1) considers three main components:
        <list list-type="bullet">
          <list-item>
            <p>Domain Knowledge and Data, where a conceptual model of the domain is available
describing the main concepts and relationships, as well as the available data and the
domain questions for the data;</p>
          </list-item>
          <list-item>
            <p>Model-driven Analytics, including the proposed Meta-model that contextualizes the
data, the analytical tasks, and the analytical visualizations that can be used to analyze the
results of the analytical tasks;</p>
          </list-item>
          <list-item>
            <p>Assisted Model-driven Analytics, with four core steps guiding the proposed approach
from the domain concepts, data, and questions to the visualizations. This encompasses
the mapping of the domain concepts and data, the identification of the analytical tasks for
the defined domain question(s), the processing of these tasks and, finally, the processing
of visualizations that map the tasks’ output into useful instruments for decision support.</p>
          </list-item>
        </list>
      </p>
      <p>Consider the Domain Concepts formalized in a data model (such as a Class Diagram or
an Entity-Relationship Diagram) and a specific, already prepared dataset for analysis (Domain Data).
The first step of the assisted model-driven analytics component maps these
two pieces of information to check the alignment between them (Domain and Data
Mapping). This step maps the attributes defined in the classes of the data model
to the attributes of the domain data available for analysis, ensuring a common understanding
of the concepts and the supporting data. A list, table, or similar artefact must be made
available as a result of this mapping step (Mapped Data). This information is useful for the
identification of the analytical tasks (Analytical Tasks Identification), which translates the Domain
Questions (questions set by the domain user to be answered) into the Analytical Tasks that
are detailed with the help of the proposed Meta-model. This Meta-model includes a set of
Analytical Tasks that define an iterative sequence of analysis processes. The Data Engineer
plays a key role in supporting the identification of the analytical tasks needed to answer
the domain questions. These tasks are associated with output targets, which represent the
analytical results (Analytical Outputs) obtained after the data analysis process (Analytical
Tasks Processing). These outputs are the inputs for the visualizations (Visualizations Processing).
The Meta-model supports the identification of the appropriate visualizations according to the
obtained results. This approach adopts a human-in-the-loop philosophy: the
Domain User and the Data Engineer interact with the processing components, and the
components also interact with one another.</p>
    </sec>
    <sec id="sec-5">
      <title>4. Model-driven Analytics Meta-model</title>
      <p>The proposed approach is supported by a Meta-model that contextualizes the domain questions,
the analytical tasks and the analytical visualizations useful for decision-making. This section
first presents the proposed Meta-model and describes its main packages and concepts. The
Unified Modeling Language (UML) Package Diagram presented in Figure 2 includes three main
packages, What Dimension, Why Dimension and How Dimension, formalizing the concepts
needed to answer three fundamental questions: what is the focus of the analysis?, why are
we analysing these data?, and how can we analyse these data?. Each dimension includes its
sub-packages and the dependencies associated with them. These dependencies can be classified
into two types: import, where one package imports the functionality of another package, and
access, where one package requires concepts or functionalities present in another package.</p>
      <p>The What Dimension (Figure 3) package corresponds to the “what” component of the
Meta-model and includes the Dataset sub-package with three detailed sub-packages: Items, Attributes
and Data Types. Between these three sub-packages, there is an association between items and
their respective attributes, and each attribute is associated with a specific data type.</p>
      <p>The Why Dimension (Figure 4) includes three sub-packages: Domain Questions, Analytical
Tasks and Targets. At the sub-package level, domain questions include the user’s questions that
are translated into analytical tasks and therefore require access to the functionalities present
in that package. Furthermore, the analytical tasks sub-package requires access to the targets
sub-package to filter, select or have as expected output one of the three possible targets available
in the Meta-model, namely Attribute Target, Item Target and Dataset Target. The Analytical
Tasks also have a connection with the Attributes sub-package as the Meta-model includes a
relationship with a specific analytical task (Compute Attribute), which allows attributes to be
derived. The other dependency occurs since certain analytical tasks allow for the creation of
analytical visualizations that can be used to analyze the results of the analysis.</p>
      <p>The How Dimension (Figure 5) package includes the Charts sub-package, with the set of
analytical visualizations and their components. At a lower level, the Chart Components
sub-package depends on the Attributes package to identify the possible charts to use.</p>
      <p>Detailing the Meta-model, the “what” component, presented in Figure 3, is formalized by
specifying the different types of datasets (Network and the particular Tree type, Field, Table,
and Spatial with the particular case of Geospatial data). The items included in these datasets
aggregate different attributes that address simple data (such as a quantitative or qualitative
value) or complex data (such as temporal or spatial data) and their corresponding values. Each
attribute is associated with a specific data type. Datasets can include indexes for their items or
attributes. Items can be classified and may establish relationships between them.</p>
      <p>The ”why” component, depicted in Figure 4, addresses the Domain Questions, which
represent the questions the user wants answered and which can be translated into Analytical Tasks.
The Analytical Tasks, which determine the actions that will be applied to the data and that can
be formalized for addressing the analytical requirements of a domain, include tasks that express
actions that can be used to find insights (tasks such as relationship, pattern, find extreme, find
anomalies and find clusters), compare, determine distribution, organize, or to derive new data
(that can be the expected output or be the input of another task). An analytical task usually
selects data from a target (attribute, item or dataset), filters data from a target (attribute, item
or dataset), and has as expected output a target (attribute, item or dataset). Depending on the
analytical tasks used, some can derive visualizations to answer formulated domain questions
(identify, compare and determine distribution), while others serve as intermediate steps during
the analysis process (organize and derive).</p>
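      <p>The task classification described above can be sketched in code. The following Python fragment is a minimal illustration, not part of the Meta-model itself: the names TargetKind, TaskKind, AnalyticalTask and derives_visualization are hypothetical, and treating Compute Attribute as a derive task that selects from and outputs a dataset is an assumption based on the description of the demonstration case.</p>

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class TargetKind(Enum):
    ATTRIBUTE = "attribute"
    ITEM = "item"
    DATASET = "dataset"


class TaskKind(Enum):
    IDENTIFY = "identify"
    COMPARE = "compare"
    DETERMINE_DISTRIBUTION = "determine distribution"
    ORGANIZE = "organize"
    DERIVE = "derive"


# Tasks that can derive a visualization; the others are intermediate steps.
VISUAL_TASKS = {TaskKind.IDENTIFY, TaskKind.COMPARE, TaskKind.DETERMINE_DISTRIBUTION}


@dataclass
class AnalyticalTask:
    kind: TaskKind
    selects_from: TargetKind
    expected_output: TargetKind
    filters: Optional[TargetKind] = None  # filtering is optional in this sketch

    def derives_visualization(self) -> bool:
        return self.kind in VISUAL_TASKS


# Assumed encoding of the two tasks used later in the demonstration case.
compute_attribute = AnalyticalTask(TaskKind.DERIVE, TargetKind.DATASET, TargetKind.DATASET)
determine_distribution = AnalyticalTask(
    TaskKind.DETERMINE_DISTRIBUTION, TargetKind.DATASET, TargetKind.DATASET
)
```

      <p>Under these assumptions, only determine_distribution reports that it can derive a visualization, while compute_attribute acts as an intermediate step.</p>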
      <p>The Meta-model establishes constraints on the types of charts that can be used for each
analytical task, since the choice of chart depends on the specific tasks that the
analytical visualization must support. For example, the constraint below states that for the
Relationship analytical task, whose objective is to identify and analyse the relationships and interactions
between attributes, the chart type (ChartType) must be one of ScatterPlot, LineChart, HeatMap,
HighlightTable, Map, or SymbolMap.
        <preformat>Context t:AnalyticalTask :: ChartType
If t.Identify.Relationship-&gt;notEmpty() and t.Chart-&gt;notEmpty()
then t.Chart.type = ScatterPlot or t.Chart.type = LineChart or
     t.Chart.type = HeatMap or t.Chart.type = HighlightTable or
     t.Chart.type = Map or t.Chart.type = SymbolMap endif</preformat>
      </p>
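      <p>As a rough illustration of how such a constraint could be checked programmatically, the Python sketch below encodes the Relationship constraint as an implication. The function name and the use of plain strings for task and chart names are assumptions made for this example; only the six chart type names come from the constraint itself.</p>

```python
from typing import Optional

# Chart types allowed for the Relationship task, as listed in the constraint.
RELATIONSHIP_CHARTS = frozenset(
    {"ScatterPlot", "LineChart", "HeatMap", "HighlightTable", "Map", "SymbolMap"}
)


def satisfies_relationship_constraint(task: str, chart_type: Optional[str]) -> bool:
    """Implication: a Relationship task that has a chart must use an allowed type."""
    if task == "Relationship" and chart_type is not None:
        return chart_type in RELATIONSHIP_CHARTS
    return True  # the premise is false, so the constraint holds vacuously
```

      <p>A task without a chart, or a task other than Relationship, passes the check trivially, which mirrors the If/then form of the constraint.</p>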
      <p>The “how” component (Figure 5) addresses a set of analytical visualizations and their
components, taking into account the analytical task(s) and the type of data used to meet users’
analytical needs. Each visualization, represented by the Chart class, has derived attributes
(nMarks, nAxis and nHeader), which are obtained from the number of associations between
Chart and the corresponding components (ChartComponents). Each Chart can contain several
chart components, depending on the type of chart (ChartType). Each chart type is thus
characterized by its number of marks (nMarks), axes
(nAxis) and headers (nHeader). Headers use the data from the corresponding
attribute(s) to form a header with one or more entries and can be of type column or row;
axes use data that correlates with a range of values and can be of type x or y; and marks
have a mark type (MarkType), such as color, size, text,
shape, position or angle. These components can be associated with one or more data attributes (Attribute)
resulting from the data analysis and have specific restrictions depending on the type of chart.</p>
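      <p>The derived attributes can be illustrated with a small Python sketch in which the counts are computed from the components attached to a chart. The class and field names are hypothetical stand-ins for Chart and ChartComponents. The example instance follows the GanttChart of the demonstration case in Section 5; the header subtype "row" is an assumption, since the text does not state it.</p>

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ChartComponent:
    kind: str              # "header", "axis" or "mark"
    subtype: str           # e.g. "row", "x", "color"
    attributes: List[str]  # data attributes assigned to this component


@dataclass
class Chart:
    chart_type: str
    components: List[ChartComponent] = field(default_factory=list)

    def _count(self, kind: str) -> int:
        # Derived values: number of associated components of the given kind.
        return sum(1 for c in self.components if c.kind == kind)

    @property
    def n_header(self) -> int:
        return self._count("header")

    @property
    def n_axis(self) -> int:
        return self._count("axis")

    @property
    def n_marks(self) -> int:
        return self._count("mark")


gantt = Chart(
    "GanttChart",
    [
        ChartComponent("header", "row", ["Chrom", "Gene"]),  # subtype assumed
        ChartComponent("axis", "x", ["POS"]),
        ChartComponent("mark", "text", ["Chrom", "Gene", "Left Allele", "Right Allele"]),
        ChartComponent("mark", "shape", ["POS"]),
        ChartComponent("mark", "color", ["Gene"]),
    ],
)
```

      <p>For this instance, the sketch yields one header, one axis and three marks, matching the description of the chart in Section 5.</p>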
      <p>This component presents constraints related to whether or not the ChartComponents can
be included, as well as the number and use of each one, impacting the presentation of the
final visualization. Each group of constraints has been formulated according to the analytical
requirements needed to create each specific chart.</p>
      <p>In terms of the relationships between the classes and components of the Meta-model, each
target in the why component is linked to its respective class in the what component. In addition,
each target can be associated with an attribute that contains a specific order in which it will be
displayed in the visualization, represented by the AttributeOrder class. The Analytical Tasks
class of the why component allows the creation of zero or more visualizations depending on the
type of task and, therefore, a relationship is established between it and the Chart class of the how
component. Finally, for the visualization to be derived, each component (ChartComponents) is
assigned to an attribute resulting from the expected output of the analytical task, thus linking
the ChartComponent class to the Attribute class of the what component.</p>
      <p>Due to the size of the images and space limitations in the paper, the global Meta-model can
be found at https://bit.ly/3UVCNSI, while the full list of constraints can be found at https://bit.ly/3OYcU0D.</p>
    </sec>
    <sec id="sec-6">
      <title>5. Demonstration Case: Genomics Domain</title>
      <p>
        This section presents the application of the proposed approach to a demonstration case in the Genomics
Domain. The domain concepts are formalized in the Conceptual Schema of Genome (CSG) [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ],
a data model expressed in a UML Class Diagram. Given the size of this model, Figure 6
highlights only the classes and relationships that are considered in this demonstration case. This
model includes the Gene that is part of the ChromosomeElement, located in the Chromosome
and can be transcribed as part of the TranscriptableElement. Additionally, a Chromosome can
be located in several variations (Variation). This Variation class, Precise or Imprecise, can occur
at specific positions in the genome (VariationPosition). In addition, the Precise class includes
the reference allele (ref) and the possible alternative alleles (alt) and is associated with the
Genotype_Freq class, which records the frequency of different allele combinations.
      </p>
      <p>The domain data, a prepared dataset, includes positions in the DNA (variants) where a
variation may occur, the gene affected by each variant, and the genotype of the patient (one
or two copies of the alternative allele). The dataset contains six columns representing this
information: Chrom, POS, REF, ALT, Genotype, and Gene. Each variant is represented by its
position in the DNA. This position is given by the chromosome (Chrom), the sequence
position where the variant occurs in the chromosome (POS), the reference allele (REF), and
all possible alternative alleles that could be observed in that position (ALT). The Gene column
defines the gene or genes affected by the studied variant. The genotype (Genotype) determines
which alleles the patient has at the studied position. The reference allele (REF) is represented by
a 0, and each of the alternative alleles is represented by 1, 2, 3, …. Since humans have two copies
of each chromosome, the Genotype value also indicates whether the patient presents the variant in
one copy (heterozygote) or both copies (homozygote).</p>
      <p>The result of the first step of the proposed approach (Domain and Data Mapping) is shown
in Table 1, mapping the domain concepts and the available data. In the example provided,
there is a direct correspondence between the attributes defined in the classes of the conceptual
data model and the attributes of the data available for analysis (Domain Data). For instance,
the domain concept Chromosome.name, which is an attribute of type String in the conceptual
data model, corresponds to the attribute Chrom in the dataset, which is also of type String
and contains the actual values of the chromosome names. Similarly, the domain concept
VariationPosition.start, which is an attribute of type Long Integer and indicates the
starting position of a genetic variation, corresponds to the POS attribute in the available data,
which is also a Long Integer and stores the positions of genetic variations. This mapping
ensures that data types and attributes are compatible between the conceptual data model and the
available data, and allows for data validation, checking that all defined concepts are represented
in the available data.</p>
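      <p>The mapping and validation step can be sketched as a small Python check. This is only an illustration under stated assumptions: the Mapping class and the validate function are hypothetical names, and only the two correspondences spelled out in the text (Chromosome.name to Chrom, and VariationPosition.start to POS) are encoded, since the remaining rows of Table 1 are not reproduced here.</p>

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Mapping:
    concept: str       # attribute of the conceptual model, e.g. "Chromosome.name"
    concept_type: str  # type declared in the conceptual model
    column: str        # column of the prepared dataset


def validate(mappings: List[Mapping], concepts: List[str], columns: Dict[str, str]) -> List[str]:
    """Return human-readable problems; an empty list means concepts and data align."""
    problems = []
    mapped = {m.concept for m in mappings}
    for concept in concepts:
        if concept not in mapped:
            problems.append(f"unmapped concept: {concept}")
    for m in mappings:
        if m.column not in columns:
            problems.append(f"missing column: {m.column}")
        elif columns[m.column] != m.concept_type:
            problems.append(f"type mismatch: {m.concept} / {m.column}")
    return problems


MAPPINGS = [
    Mapping("Chromosome.name", "String", "Chrom"),
    Mapping("VariationPosition.start", "Long Integer", "POS"),
]
COLUMNS = {"Chrom": "String", "POS": "Long Integer"}  # column name to data type
problems = validate(MAPPINGS, [m.concept for m in MAPPINGS], COLUMNS)
```

      <p>An empty list of problems corresponds to the situation described above, in which every concept is represented in the data with a compatible type.</p>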
      <p>Next, the object diagrams with the instantiation of the Meta-model are presented. Figure 7
presents the ”Why Dimension”, highlighting the domain question to be discussed, the analytical
tasks needed to answer the question and the targets used by the analytical tasks.</p>
      <p>For this dataset, in Variant Call Format (VCF), the following domain question was formalized:
“What is the distribution of variants along the genome in a sample?”. In the second step of the
approach, Analytical Tasks Identification, this domain question is translated into two analytical
tasks, Compute Attribute and Determine Distribution, to analyze the data and produce the
expected result. This identification of analytical tasks was supported by the Data Engineer,
interacting with the Domain User. The first analytical task,
“Create the left and right allele attribute of a gene, for each existing position”, is considered
fundamental, since it allows
the derivation of two attributes, Left Allele and Right Allele. The expected result is a
dataset with the addition of the two derived attributes. The second analytical task, “Visualize
the distribution of variants along the genome in a sample”, requires the visualization of the
distribution of variants in the genomic dataset under analysis. This task targets the dataset produced by
the first analytical task (Compute Attribute). Its expected output is a list of multiple variants.</p>
      <p>For the “What Dimension” (Figure 8), the first analytical task selects data from a
table dataset. The table items integrate a set of attributes, each with a specific data type and the
corresponding attribute values. The dataset includes indexes for items and attributes. The
expected output target is a dataset with two new attributes (Left Allele and Right Allele)
added to the initial dataset. These attributes are derived as follows: for each position,
whenever an allele is represented by 0, the derived attribute takes the value of the REF attribute,
whereas if the allele is represented by 1, 2, 3, and so on, the derived attribute
takes the corresponding value from ALT, which holds all possible alternative alleles. The second
analytical task also selects data from a table dataset, namely the dataset resulting from
the previous task. The expected output target is a dataset with the six attributes belonging to
the initial dataset (Chrom, POS, REF, ALT, Genotype and Gene) and the two newly derived attributes
(Left Allele and Right Allele). These analytical results are produced by processing the
analytical tasks in the third step of the approach, Analytical Tasks Processing.</p>
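      <p>The Compute Attribute rule can be illustrated with the following Python sketch. It is not the implementation used in the demonstration case. It assumes the usual VCF conventions, namely that the Genotype value holds two allele indexes separated by "/" or "|" and that multiple ALT alleles are comma-separated. The function name compute_alleles is hypothetical, and missing genotypes are not handled.</p>

```python
import re
from typing import Tuple


def compute_alleles(ref: str, alt: str, genotype: str) -> Tuple[str, str]:
    """Derive (Left Allele, Right Allele) from the REF, ALT and Genotype values."""
    # Index 0 is the reference allele; 1, 2, 3, ... index the alternative alleles.
    candidates = [ref] + alt.split(",")
    left, right = (int(code) for code in re.split(r"[/|]", genotype))
    return candidates[left], candidates[right]
```

      <p>For instance, with the hypothetical values REF = T, ALT = C and Genotype = 0/1, the call compute_alleles("T", "C", "0/1") returns the pair T and C, a heterozygous position.</p>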
      <p>After obtaining the analytical outputs, the Visualizations Processing step supports the
development of a chart from the Determine Distribution task. Figure 9 shows the instantiation of
the ”How Dimension” component and the types of components required in terms of headers,
axes and marks, as well as the type of ChartType used to form the analytical visualizations. The
complete objects’ diagram joining the three dimensions can be found in 3.</p>
      <p>The visualization generated through the Tableau tool is of the GanttChart type (Figure 10),
one of the possible charts according to the restrictions of the Meta-model and the type of data.
This includes a header with the Chrom and Gene attributes, an axis corresponding to the POS
attribute and three marks: the first, a text mark, includes the Chrom, Gene, Left Allele and Right
Allele attributes; the second, a shape mark, includes the POS attribute; and, finally, the color
mark highlights the Gene. The order of the attributes in the visual representation is determined
by the hierarchy established in the data model (Domain Concepts). In this case, the Chrom
belonging to the Chromosome class according to the data mapping is presented as the first
attribute, followed by the Gene and then the numerical POS attribute. There is a hierarchy
between the Chromosome class and the Gene class, and genes are constituent parts of these
chromosomes. The POS attribute, being a numerical attribute associated with the horizontal axis
of the chart, is found after the Gene attribute in the analytical visualization. Now, it is possible
to analyze, for a given chromosome and gene, its position in the genome and the corresponding
alleles. For example, the chr1 chromosome with the BRAC1 gene at a position between 20
and 30 million corresponds to the T/C alleles, with the Left Allele represented by T and the
Right Allele represented by C.</p>
      <p>The obtained visualization supports the analysis of the defined domain question “What is the
distribution of variants along the genome in a sample?”. Although it is possible to create other
types of charts, they would not be as effective in demonstrating what the Domain User requires.
The visualization presented as a result of this demonstration case
has been validated by a domain expert, which gives confidence that the selected visualization
meets the analytical objectives. Furthermore, it is up to the user
to make the final decision regarding the choice of the most suitable visualization within the
range of possible visualizations suggested by the Meta-model.</p>
      <p>Based on this demonstration case, we found that the proposed analytical approach facilitates
the development of visualizations that effectively address domain questions. By integrating
domain knowledge and data as input, this approach aligns analytical requirements with data and
assists users in translating domain questions into actionable analytical tasks supported by useful
visualizations. The Meta-model plays a crucial role in this iterative process by contextualizing
data, identifying applicable analytical tasks, and guiding the creation of visualizations. This
approach is applicable across various domains and datasets.</p>
    </sec>
    <sec id="sec-7">
      <title>6. Conclusions</title>
      <p>In this paper, we have presented an approach to supporting analytical visualization that provides
guidance, from mapping data and identifying analytical tasks to creating analytical visualizations
capable of responding to users’ analytical needs. This approach is supported by a Meta-model,
which contextualizes the data, the analytical tasks and the analytical visualizations that make it
possible to analyze the results of these tasks. To verify the validity of the approach, we applied
it to a demonstration case in the genomics domain, presenting an example of a useful analytical
visualization.</p>
      <p>As future work, we plan further evaluation of the approach and an extension of the Meta-model
to support interaction between the user and the proposed visualizations.</p>
    </sec>
    <sec id="sec-8">
      <title>Acknowledgements</title>
      <p>This work has been supported by FCT – Fundação para a Ciência e Tecnologia within the R&amp;D
Units Project Scope UIDB/00319/2020 (ALGORITMI) and UIDB/04516/2020 (NOVA LINCS),
and by the Spanish Ministry of Universities and the Universitat Politècnica de València under
the Margarita Salas Next Generation EU grant. This paper uses icons made available by
www.flaticon.com.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>T.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Wei</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <article-title>A requirements-driven framework for automatic data visualization</article-title>
          ,
          <source>in: Enterprise, Business-Process and Information Systems Modeling</source>
          , Springer Nature Switzerland,
          <year>2023</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>A.</given-names>
            <surname>Lavalle</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Maté</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Trujillo</surname>
          </string-name>
          ,
          <article-title>Requirements-driven visualizations for big data analytics: A model-driven approach</article-title>
          , in: Conceptual Modeling, Springer International Publishing,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>M.</given-names>
            <surname>Golfarelli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Rizzi</surname>
          </string-name>
          ,
          <article-title>A model-driven approach to automate data visualization in big data analytics</article-title>
          ,
          <source>Information Visualization</source>
          (
          <year>2019</year>
          ). doi:10.1177/1473871619858933.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>S. J.</given-names>
            <surname>Mellor</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. N.</given-names>
            <surname>Clark</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Futagami</surname>
          </string-name>
          ,
          <article-title>Model-driven development - guest editor introduction</article-title>
          (
          <year>2003</year>
          ). doi:10.1109/MS.2003.1231145.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>T.</given-names>
            <surname>Munzner</surname>
          </string-name>
          ,
          <article-title>Visualization Analysis and Design</article-title>
          , A K Peters/CRC Press,
          <year>2014</year>
          . doi:10.1201/b17511.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>A.</given-names>
            <surname>Lavalle</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Maté</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Trujillo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Rizzi</surname>
          </string-name>
          ,
          <article-title>Visualization requirements for business intelligence analytics: A goal-based, iterative framework</article-title>
          ,
          <source>in: 2019 IEEE 27th International Requirements Engineering Conference (RE)</source>
          ,
          <year>2019</year>
          , pp.
          <fpage>109</fpage>
          -
          <lpage>119</lpage>
          . doi:10.1109/RE.2019.00022.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>M. J.</given-names>
            <surname>Eppler</surname>
          </string-name>
          ,
          <article-title>Facilitating knowledge communication through joint interactive visualization</article-title>
          ,
          <source>JUCS - Journal of Universal Computer Science</source>
          <volume>10</volume>
          (
          <year>2004</year>
          )
          <fpage>683</fpage>
          -
          <lpage>690</lpage>
          . doi:10.3217/jucs-010-06-0683.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>M.</given-names>
            <surname>Brehmer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Munzner</surname>
          </string-name>
          ,
          <article-title>A multi-level typology of abstract visualization tasks</article-title>
          ,
          <source>IEEE Trans. Visualization and Computer Graphics (TVCG) (Proc. InfoVis)</source>
          (
          <year>2013</year>
          )
          <fpage>2376</fpage>
          -
          <lpage>2385</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>J.</given-names>
            <surname>Heer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Shneiderman</surname>
          </string-name>
          ,
          <article-title>Interactive dynamics for visual analysis</article-title>
          ,
          <source>Communications of the ACM</source>
          <volume>55</volume>
          (
          <year>2012</year>
          )
          <fpage>45</fpage>
          -
          <lpage>54</lpage>
          . doi:10.1145/2133806.2133821.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>E. R. A.</given-names>
            <surname>Valiati</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. S.</given-names>
            <surname>Pimenta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. M. D. S.</given-names>
            <surname>Freitas</surname>
          </string-name>
          ,
          <article-title>A taxonomy of tasks for guiding the evaluation of multidimensional visualizations</article-title>
          ,
          <source>in: Proceedings of the 2006 AVI Workshop on BEyond Time and Errors: Novel Evaluation Methods for Information Visualization</source>
          , BELIV '06, Association for Computing Machinery, New York, NY, USA,
          <year>2006</year>
          , p.
          <fpage>1</fpage>
          -
          <lpage>6</lpage>
          . URL: https://doi.org/10.1145/1168149.1168169.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>M. X.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. K.</given-names>
            <surname>Feiner</surname>
          </string-name>
          ,
          <article-title>Visual task characterization for automated visual discourse synthesis</article-title>
          ,
          <source>in: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, CHI '98</source>
          , ACM Press/Addison-Wesley Publishing Co., USA,
          <year>1998</year>
          , p.
          <fpage>392</fpage>
          -
          <lpage>399</lpage>
          . URL: https://doi.org/10.1145/274644.274698. doi:10.1145/274644.274698.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>S.</given-names>
            <surname>Wehrend</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. H.</given-names>
            <surname>Lewis</surname>
          </string-name>
          ,
          <article-title>A problem-oriented classification of visualization techniques</article-title>
          ,
          <source>Proceedings of the First IEEE Conference on Visualization: Visualization '90</source>
          (
          <year>1990</year>
          )
          <fpage>139</fpage>
          -
          <lpage>143</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>A.</given-names>
            <surname>García</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. C.</given-names>
            <surname>Casamayor</surname>
          </string-name>
          ,
          <article-title>On how to generalize species-specific conceptual schemes to generate a species-independent conceptual schema of the genome</article-title>
          ,
          <source>BMC Bioinformatics</source>
          (
          <year>2021</year>
          ). doi:10.1186/s12859-021-04237-x.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>