=Paper=
{{Paper
|id=Vol-2578/BigVis12
|storemode=property
|title=A GUI Design for Composition Discovery of View Interestingness
|pdfUrl=https://ceur-ws.org/Vol-2578/BigVis12.pdf
|volume=Vol-2578
|authors=Xiaozhong Zhang,Xiaoyu Ge,Panos Chrysanthis
|dblpUrl=https://dblp.org/rec/conf/edbt/ZhangGC20
}}
==A GUI Design for Composition Discovery of View Interestingness==
<pdf width="1500px">https://ceur-ws.org/Vol-2578/BigVis12.pdf</pdf>
<pre>
               A GUI Design for Composition Discovery of View
                               Interestingness
               Xiaozhong Zhang                                                Xiaoyu Ge                            Panos K. Chrysanthis
        Computer Science Department                              Computer Science Department                    Computer Science Department
          University of Pittsburgh                                 University of Pittsburgh                       University of Pittsburgh
           xiaozhong@pitt.edu                                        xiaoyu@cs.pitt.edu                             panos@cs.pitt.edu


ABSTRACT                                                                                   However, using a single utility function to estimate the inter-
View recommendation has emerged as a powerful tool to assist                           estingness of a view is usually not enough, because the inter-
data analysts in exploring and understanding big data. Due to the                      estingness of a view usually involves multiple utility functions
large search space of possible views, finding a view that shows                        simultaneously. These utility functions measure the interesting-
interesting patterns is not a trivial task. Existing view recommen-                    ness of a view from different aspects, and need to be considered
dation approaches have proposed a variety of utility measures                          at the same time to reach a reasonable and accurate assessment.
in selecting interesting views. Even though using a single util-                       These aspects could include the relevance of the view, conciseness
ity measure or a linear combination of multiple utility measures                       of the pattern, generality of the pattern, and so on.
might be suitable in specific scenarios for view interestingness                           Several works [2, 5, 10] have adopted a view interestingness
estimation, we claim that any assumption of the composition of                         measure that involves multiple utility functions. However, all of
view interestingness could be inaccurate without verification from                     them assume that the view interestingness is a linear combination
real users.                                                                            of individual utility functions. This assumption might be suitable
   Therefore, in this paper, we propose a novel graphical user                         in specific scenarios, but is usually not accurate in a more general
interface (GUI) designed to be used in a user study to shed light                      sense. For example, deviation [7] is a commonly-used utility func-
on the composition of view interestingness. Specifically, we first                     tion. However, a view with high deviation could be uninteresting
create a classification system for view recommendation tasks, and                      if the context of the view is not relevant to the analytical task.
identify utility measures suitable for each category. Then, we de-                     Similarly, a relevant view could also be uninteresting due to the
sign a GUI that uses the identified utility measures to discover                       lack of deviation. It can be easily seen from the example that the
how users evaluate the views with respect to the utility measures,                     interestingness of a view is not a linear combination of the two
and how they assess the overall view interestingness based on the                      measures (i.e., deviation and relevance).
utility measures. Finally, we use an example to illustrate how the                         Since any assumption regarding the utility measure compo-
user answers to the questions in the GUI can be used to discover                       sition in the view interestingness could be inaccurate without
the composition form of view interestingness.                                          verification from real users, user studies that record and analyze
                                                                                       real user assessment of the view interestingness become highly
                                                                                       needed.
                                                                                           In light of the above demand, in this work, we propose a graph-
1    INTRODUCTION                                                                      ical user interface (GUI) that is designed to be used in a user study
The ubiquitously available information sources and the advance-                        to shed light on the composition of view interestingness.
ments in data storage and acquisition techniques have led to an
                                                                                       Contributions Specifically, the contributions of this paper are the
aggressive increase in the data volumes available for data analysis
                                                                                       following:
tasks. One major challenge in utilizing these abundantly available
data is discovering insights from them effectively and efficiently.                        • Create a classification system for view recommendation
Examples of an “insight” include the structure, patterns, and causal                         tasks, and identify utility measures suitable for each task
relationships. To explore these massive and structurally compli-                             category.
cated datasets, data analysts often utilize visual data analysis tools                     • Design a GUI that uses the identified utility measures to
such as Tableau [1] and Voyager [8]. However, the effectiveness of                           discover how the users evaluate the views with respect to
these tools depends on the user’s expertise and experience. Com-                             the utility measures, and how the users assess the overall
ing up with a visualization that shows interesting trends/patterns                           view interestingness based on the utility measures.
is a non-trivial issue, because the search space of possible visual-
izations is prohibitively large.                                                           • Illustrate an example of how a general composition form of
   To address the above challenge, several methods for                                       view interestingness could be derived from users’ answers
recommending views (i.e., histograms or bar charts) have recently been                       to the questions in our GUI.
proposed (e.g., [2, 3, 5, 7, 9]). These methods automatically gen-                     Outline The rest of the paper is structured as follows. Section
erate all possible views of the data, and recommend the top-k                          2 covers the background of the paper. Section 3 introduces our
interesting views, according to some utility function (e.g., devia-                    novel classification system and the utility measures suitable for
tions, data variance, usability) that measures the interestingness                     each category. Section 4 presents our proposed GUI. Section 5
of views.                                                                              gives an example to illustrate how the user feedback can be used
© 2020 Copyright for this paper by its author(s). Published in the Workshop Proceed-   to discover the composition of view interestingness. Section 6
ings of the EDBT/ICDT 2020 Joint Conference (March 30-April 2, 2020, Copen-            concludes the paper and discusses future works.
hagen, Denmark) on CEUR-WS.org. Use permitted under Creative Commons Li-
cense Attribution 4.0 International (CC BY 4.0)
2     BACKGROUND                                                          where β_i are the weights assigned to the corresponding utility
In this section, we present the necessary background details of           functions u_i, i = 1, ..., n. The weights β_i can either be specified
our work. Specifically, we discuss how views can be constructed           by the user, the system [2], or discovered during the interactive
through SQL queries, and explain how utility functions can be             recommendation process through user feedback [10].
used to recommend views.                                                     As previously mentioned, the above assumption that the view
                                                                          interestingness can be represented by a linear combination of
2.1    Views & Data Visualization                                         utility functions might be suitable in specific scenarios, but is
                                                                          usually not accurate in a more general sense. Therefore, in this
To begin, we start by describing a view (i.e., histogram or bar
                                                                          paper, we are going to introduce a user study GUI design that aims
chart) in the context of structural databases. A view v_i essentially
                                                                          to shed light on a more general form of the composition of view
represents an SQL query with a group-by clause over a database D
                                                                          interestingness.
[2, 10]. Under the typical multi-dimensional data models, data can
be modeled as a set of measure attributes M = {m_1, m_2, m_3, ...}
                                                                          3     THE CLASSIFICATION SYSTEM
and a set of dimension attributes A = {a_1, a_2, a_3, ...}. The measure
attributes (e.g., number of items sold) are the set of attributes that    In this section, we introduce a new classification system for view
contain measurable values and can be aggregated. The dimension            recommendation tasks, and identify utility measures suitable for
attributes (e.g., brand, year, color, size) are the set of attributes on  each category. The identified utility measures will be used in the
which measure attributes are viewed. To formulate an SQL query            designed GUI (Section 4) to discover the view interestingness
with a group-by clause, we also need a set of aggregation                 composition.
functions F = {f_1, f_2, f_3, ...}. Thus, we can represent each view v_i  3.1    Classification Dimensions
as a triple (a, m, f), such that the aggregation function f is applied    The classification system has two dimensions. The first dimension
to the measure attribute m and the result is grouped by the dimension     is based on the exploration nature of the task, so we call it the
attribute a. Consequently, the View Space (VS), i.e., the total number    exploration dimension. It has two categories: exploratory and
of possible views, is:                                                    targeted. A view recommendation task is exploratory if the user
                                                                          targeted. A view recommendation task is exploratory if the user
                        VS = |A| × |M| × |F|                       (1)    does not have a highly specific analytical goal in mind, and wants
Clearly, VS can be very large, especially with high-dimensional           to discover as many interesting views from the data as possible
data.                                                                     (e.g., find interesting views from the census data). On the contrary,
                                                                          a view recommendation task is targeted if the user has a highly
2.2    View Recommendation                                                specific analytical goal in mind, and is looking for specific views
In order to recommend the set of k most interesting views from            based on the goal (e.g., find interesting views about financial
a large number of views, utility scores are required to rank all          situations across different work classes from the census data).
the views. To compute such utility scores, the existing literature has       In fact, it can be seen that the exploration dimension has a
proposed a large number of utility functions. Commonly used               continuous domain, because any view recommendation task is
ones include deviation [7], diversity [4], and usability [2]. A           between the most exploratory (i.e., all possible views are candi-
utility function u() maps a view to a real number indicating the          dates for interesting views) and the most targeted (i.e., only one
interestingness of the view.                                              view is the candidate for interesting views). However, for the sake
                                                                          of the classification purpose, we have discretized the exploration
   Definition 2.1. (View Recommendation Problem) Given a data-            dimension into two categories, as mentioned above.
base D, a utility function u(), and the size of the preferred view           The second dimension is based on the comparison nature of the
recommendations k, find the top-k views v_1, v_2, ..., v_k constructed    task, so we call it the comparison dimension. It has two categories:
from D that have the highest utilities according to u() among all         non-comparative and comparative. A view recommendation task
possible views.                                                           is non-comparative if one data subset is involved in the views.
                                                                          For example, a non-comparative task could be finding interesting
   The definition is straightforward. However, as mentioned ear-
                                                                          views about female participants from the census data. The data
lier, using a single utility function to estimate the interestingness
                                                                          subset that is involved in the views is the female population. A
of a view is usually not accurate, because the interestingness is
                                                                          view recommendation task is comparative if two or more data sub-
usually determined by a combination of multiple utility functions
                                                                          sets are involved in the views, and can be compared in the views.
simultaneously. This observation leads to a refined definition.
                                                                          For example, a comparative task could be finding interesting views
   Definition 2.2. (View Recommendation with Composite Utility            about the difference between female and male participants from
Function) Given a database D, a utility function u() that is a            the census data. The two data subsets that are involved in the
composite of a set of n utility functions U = {u_1, u_2, ..., u_n}, and   views are the female population and the male population.
the size of the preferred view recommendations k, find the top-              The two dimensions together form four categories for view
k views v_1, v_2, ..., v_k constructed from D that have the highest       recommendation tasks: exploratory non-comparative, exploratory
utilities according to u() among all possible views.                      comparative, targeted non-comparative, and targeted compara-
                                                                          tive.
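
The view space of Section 2.1 and the top-k selection of Definitions 2.1 and 2.2 can be sketched in a few lines. This is a minimal illustration, not the system used in the paper: the names View, view_space, top_k_views, and to_sql are hypothetical, the SQL rendering assumes trusted identifiers, and the utility function is passed in as an arbitrary callable so that no particular composition form is assumed.

```python
from heapq import nlargest
from itertools import product
from typing import Callable, Iterable, List, NamedTuple


class View(NamedTuple):
    """A view as the triple (a, m, f) described in Section 2.1."""
    dimension: str   # a: the group-by (dimension) attribute
    measure: str     # m: the aggregated (measure) attribute
    aggregate: str   # f: the aggregation function name, e.g. "AVG"

    def to_sql(self, table: str) -> str:
        # Hypothetical rendering of the underlying group-by query.
        # Identifiers are assumed to be trusted (no escaping is done).
        return (f'SELECT "{self.dimension}", {self.aggregate}("{self.measure}") '
                f'FROM "{table}" GROUP BY "{self.dimension}"')


def view_space(dimensions: Iterable[str], measures: Iterable[str],
               aggregates: Iterable[str]) -> List[View]:
    """Enumerate all |A| x |M| x |F| candidate views (Equation 1)."""
    return [View(a, m, f) for a, m, f in product(dimensions, measures, aggregates)]


def top_k_views(views: Iterable[View], utility: Callable[[View], float],
                k: int) -> List[View]:
    """Return the k views with the highest utility, best first."""
    return nlargest(k, views, key=utility)
```

Because the utility is an opaque callable, the same selection code works whether u() is a single measure, a linear combination, or some other composition learned from user feedback.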
   It can be seen that the composition of the utility function u() in
Definition 2.2 plays an important role in the view recommendation         3.2    Utility Measures
problem. Recent works [2, 5, 10] have suggested that the utility
                                                                          In this part, we discuss different utility measures and the categories
functions U = {u_1, u_2, ..., u_n} are linearly combined to form the
                                                                          of view recommendation tasks that they are suitable for. The
composite utility function u(). In other words, they adopt the
                                                                          utility measures will be used in the GUI to allow the user to
following form of a composite utility function:
                                                                          evaluate the view with respect to them. Concrete examples for
                    u() = β_1 u_1() + ··· + β_n u_n()              (2)    the user evaluation of the utility measures based on an example
view recommendation task will be given in Section 4 during the
introduction of the GUI.
   We consider six utility measures in this section.
   Novelty Novelty measures how unfamiliar the user is with the
context of the view. The context of a view is the information car-
ried by the view other than the aggregate values. Recall that a view
can be thought of as the query result of applying an aggregation
function on a measure attribute and grouping the aggregate values
by a dimension attribute over a data subset. Based on the above
observation, the context of a view could include the data subset,
the dimension attribute, the measure attribute, and the aggregation
function.
   If we assume that the previously recommended views become
the knowledge of the user, then novel views will gradually help
the user explore unknown or unfamiliar regions of the view space,         Table 1: View Recommendation Task Categories and Suitable
thus helping to increase the diversity [6] of the recommendation          Utility Measures for each Category
set.
   The measure of Novelty is especially useful in exploratory view
recommendation tasks, because it can help the user explore the               Views with high deviation are helpful because they indicate a
view space quickly and comprehensively.                                   high correlation between the dimension attribute and the within-
   Relevance Relevance [6] measures how relevant the context of           group aggregate value differences. This measure is more useful for
the view is to the user’s analytical goal. The information contained      comparative tasks, where the views contain the aggregate values
in the context of the view, such as the data subset, the dimension        of two or more data subsets.
attribute, the measure attribute, and the aggregation function could         Generality Generality [4] measures how well the user thinks
play a role in the user’s determination of Relevance.                     the perceived patterns in the view can be generalized to
   Relevant views are helpful, because they may contain informa-          the whole data subset.
tion that can help the user reach the analytical goal. The measure           Coverage information of the view and of the bars in the view
of Relevance is especially useful for targeted view recommenda-           are among the factors that could affect the user’s determination of
tion tasks, because it can help the user quickly locate the targeted      generality. The coverage of a view is the percentage of the number
regions in the view space.                                                of records covered by the view against the number of records in
   Conciseness Conciseness [4] measures how easily the user can           the data subset. A view does not cover a given record if either its
perceive and remember the patterns in the view. The context               dimension attribute or its measure attribute is missing. Similarly, the
of the view, the number of groups in the view, the order of the           coverage of a bar in the view (i.e., an aggregate value bar) is the
groups, and the value pattern in each group are among the factors         percentage of the number of records covered by the bar against
that could affect the user’s determination of Conciseness.                the number of records in the data subset. A bar does not cover a
   Concise views are helpful because they do not overwhelm                record if its dimension attribute does not belong to the group of
the user with a large amount of information, and can be easily            the bar or its measure attribute is missing.
perceived and remembered. Conciseness is useful in all four cate-            The higher the coverage, the higher the generality of the pat-
gories of view recommendation tasks.                                      tern will be. Views with high generality are useful, because the
                                                                          patterns in the view are more likely to be valid in the whole data
   Diversity Diversity [4] measures the perceived fluctuation of          subset as well. Generality is useful in all four categories of view
the aggregate value across the groups. The number of groups               recommendation tasks.
in the view, the order of the groups, and the aggregate value in             A summary of the four categories of view recommendation
each group are among the factors that could affect the user’s             tasks and the utility measures suitable for each category is illus-
determination of the diversity.                                           trated in Table 1.
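
The suitability statements of Section 3.2 can be condensed into a small lookup. The sketch below is only a reading of that prose (Novelty for exploratory tasks, Relevance for targeted tasks, Diversity for non-comparative tasks, Deviation for comparative tasks, Conciseness and Generality for all four categories); the enum and function names are hypothetical.

```python
from enum import Enum
from typing import List


class Exploration(Enum):
    EXPLORATORY = "exploratory"
    TARGETED = "targeted"


class Comparison(Enum):
    NON_COMPARATIVE = "non-comparative"
    COMPARATIVE = "comparative"


def suitable_measures(exploration: Exploration, comparison: Comparison) -> List[str]:
    """Utility measures suitable for one of the four task categories.

    Assumption: the mapping follows the prose of Section 3.2, i.e. one
    measure depends on each classification dimension and two measures
    (Conciseness, Generality) apply to every category.
    """
    by_exploration = ("Novelty" if exploration is Exploration.EXPLORATORY
                      else "Relevance")
    by_comparison = ("Diversity" if comparison is Comparison.NON_COMPARATIVE
                     else "Deviation")
    return [by_exploration, "Conciseness", by_comparison, "Generality"]
```

For an exploratory comparative task, this yields Novelty, Conciseness, Deviation, and Generality.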
   Views with high diversity are helpful because they indicate a
high correlation between the dimension attribute and the aggregate        4   THE GUI DESIGN
values. This measure is more useful for non-comparative tasks,
                                                                          In this section, we discuss the details of our designed GUI that
where the views only contain the aggregate values of one data
                                                                          will allow users to evaluate the views with respect to the utility
subset.
                                                                          measures and the overall interestingness.
   Deviation Deviation measures the perceived fluctuation of the             The GUI will reside on a web application and be used in a user
within-group aggregate value difference across the groups. This           study. The frontend of the web application will be developed using
measure is a simplified form of the Deviation measure in [7], in a        JavaScript, and the backend will be developed using Java.
way that this measure does not normalize the aggregate values of             An example view recommendation task will be used in this
each data subset into a distribution. We adopt this simplified form       section to facilitate the GUI demonstration. The example task is
based on the fact that it is very difficult for the user to imagine the   to discover the difference between female and male participants
normalized aggregate values without the assistance of any helper          in the census data. In other words, the user’s task is to find out
views.                                                                    interesting views that show large differences between the aggre-
   The number of groups in the view, the order of the groups,             gate values of the female and the male population. It can be seen
and the aggregate value difference between the data subsets in            that this is an exploratory comparative task. So the utility mea-
each group are among the factors that could affect the user’s             sures suitable for the task are Novelty, Conciseness, Deviation,
determination of the deviation.                                           and Generality. However, in order to demonstrate the measures
                                                        Figure 1: GUI Overview


of Relevance and Diversity, we will include them in the GUI in-           The x-axis is for the dimension attribute (i.e., the attribute by
troduction as well. We will use a different view recommendation       which the result is grouped). The x-axis label is the name of the
task when introducing Relevance and Diversity.                        dimension attribute. The x-axis tick labels are the distinct values
   The GUI has three parts: View Selector, View Inspector, and        of the dimension attribute (i.e., group names).
View Evaluator. The overview of the GUI is shown in Figure 1.             The y-axis is for the aggregate values. The y-axis label is a
                                                                      combination of the name of the aggregation function and the
                                                                      name of the measure attribute (i.e., the attribute on which the
4.1    View Selector                                                  aggregation function is applied). The y-axis label in the example
The View Selector pane (Figure 1 Pane A) is the starting point of     is “AVG(capital-gain)”. The y-axis tick labels are value indicators
the user workflow. The pane contains a list of views, each of which   for the y-axis grid lines.
is in the form of a bar chart or histogram. As mentioned before,          Another component of the view is the legend, which identifies
each view is generated by applying an aggregation function f on       the two data subsets (i.e., populations) being compared. The two
a measure attribute m and grouping the result by a dimension          subsets in the example are “Female” and “Male”.
attribute a on a back-end database server.                                The main content of the view is the aggregate values across the
    Besides a thumbnail of the view, each entry in the pane also      groups of the two data subsets. There are two bars in each group.
includes a description and a progress indicator. The description      The left one is for the first subset, while the right one is for the
contains the information about the aggregation function, mea-         second subset. The two bars are in different colors, so that the user
sure attribute, and dimension attribute used to generate the view.    can easily distinguish between the two.
Examples of the description are “COUNT by occupation” and                 The last part of the View Inspector is the additional information
“AVG(capital-gain) by work class”. The progress indicator is in the   section. It contains information that cannot be easily embedded in
form of a percentage number, indicating the completion percent-       the main view. One example of such information is the coverage
age of the questions for a specific view. The entry that is being     information, as discussed in Section 3. For each subset, the cover-
selected will have a light blue background, for example the 6th       age information contains three numbers: the number of records
one in Figure 1, to help the user identify the view that is being     that the view covers, the number of all records in the subset, and
inspected.                                                            the percentage of the former against the latter.
                                                                          If the cursor hovers over one of the bars in the view, the cor-
4.2    View Inspector                                                 responding aggregate value as well as the coverage information
                                                                      for that specific bar (i.e., the number of records covered by the
After the user selects an entry in the View Selector, the selected    bar) will be displayed in the form of a tooltip. The tooltip helps
view will be displayed in the View Inspector (Figure 1 Pane B).       the user to get a precise reading of the aggregate value and an
   The example view in the View Inspector is in the form of a bar     understanding of the coverage information at a finer granularity
chart, and shows the average capital gain of the female and male      level.
population across the different marital statuses.
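
The coverage numbers shown in the additional information section and in the bar tooltips could be computed as sketched below. This is an assumption-laden illustration rather than the paper's backend: records are modeled as plain dictionaries, a missing attribute value is modeled as None, and the function names are hypothetical. Each function returns the three numbers listed for the coverage information: covered records, all records in the subset, and the percentage.

```python
from typing import Any, Mapping, Sequence, Tuple

Record = Mapping[str, Any]


def _has_values(record: Record, dimension: str, measure: str) -> bool:
    # A record is usable only if neither attribute is missing (None here).
    return record.get(dimension) is not None and record.get(measure) is not None


def _as_coverage(covered: int, total: int) -> Tuple[int, int, float]:
    percentage = 100.0 * covered / total if total else 0.0
    return covered, total, percentage


def view_coverage(subset: Sequence[Record], dimension: str,
                  measure: str) -> Tuple[int, int, float]:
    """Coverage of a view: records with both attributes present."""
    covered = sum(1 for r in subset if _has_values(r, dimension, measure))
    return _as_coverage(covered, len(subset))


def bar_coverage(subset: Sequence[Record], dimension: str, measure: str,
                 group: Any) -> Tuple[int, int, float]:
    """Coverage of one bar: usable records whose dimension value is `group`."""
    covered = sum(1 for r in subset
                  if _has_values(r, dimension, measure) and r[dimension] == group)
    return _as_coverage(covered, len(subset))
```

Both percentages are taken against the size of the whole data subset, matching the definitions in Section 3.2, so the bar coverages of a view sum to the view coverage.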
4.3    View Evaluator                                                        Deviation The question reads, “How different are the within-
The third pane of the GUI is the View Evaluator (Figure 1 Pane            group value differences across the groups? (10 being most dif-
C). It contains eight questions for the user to evaluate the selected     ferent)”. It can be seen that the value difference between the two
view.                                                                     subsets in each group fluctuates a lot across the groups, so the user
    Each question has a title, a description, and a score selector. The   could give a high score for Deviation.
title indicates the utility measure being evaluated. The description         Generality The question reads, “Based on the coverage info,
describes the utility measure and rating rules. The score selector        how well do you think the patterns will be generalized to the
is in the form of a dropdown menu. The options are integers               whole data subset? (10 being most generalizable)”. The coverage
between 0 and 10, inclusive. The rating rules are set such that           information refers to the coverage percentage of the view and
the higher the score, the more interesting the view is with respect       the individual bars. The coverage of the male population is quite
to the utility measure being evaluated. Term definitions will be          low at 20%. Besides, the tooltips show that the coverage for the
displayed when the cursor hovers over the information icon of the         “Married-spouse-absent”, “Separated”, and “Widowed” groups of
questions containing the term. For example, the definition of the         the male population is also very low. Therefore, the user could
context of the view will be displayed for the questions of Novelty        give a low score for Generality, indicating that she thinks that the
and Relevance.                                                            patterns in the view are not very likely to be valid in the whole
                                                                          data subset.
   4.3.1 Utility Measures. The first six questions are designed
                                                                             4.3.2 Overall Interestingness. Questions 7 and 8 are de-
to discover how the user would evaluate the view with respect to
                                                                          signed to discover the utility measure composition of the overall
the utility measures discussed in Section 3.
                                                                          view interestingness. In other words, they are designed to find out
   Based on the affecting factors identified for each utility measure
                                                                          how each utility measure affects the overall view interestingness.
in Section 3, each of the first six questions can be used to discover
                                                                             Question 7 reads, “Based on your scores for the utility measures
which factor(s) the user uses to assess the view with respect to the
                                                                          above, how interesting do you think the view is? (10 being most
corresponding utility measure. Below, we introduce each of the
                                                                          interesting)”. The question allows the user to consider carefully
first six questions, together with example feedback based on the
                                                                          and comprehensively the different aspects of the view before
example view in Figure 1.
                                                                          providing an overall score for the view interestingness.
   Novelty The question reads: “How novel (i.e., unfamiliar to               Question 8 reads, “If one of the utility measures of the view has
you) is the context of the view? (10 being most novel)”. If we            changed and the others remain the same, what new overall scores
assume that the user is not very familiar with the capital gains of       will you give?”. Two scenarios have been designed regarding
the two populations across different marital statuses, then the user      how each utility measure could change. They are “Up to 10” and
could give a high score, indicating that the context of the view is       “Down to 0” (i.e., the utility measure goes up to its maximum value
novel to her.                                                             and down to its minimum value). These questions are designed to
    Relevance We temporarily change the view recommendation               discover how the changes of each utility measure affect the overall
task to a targeted task to make Relevance suitable for the task. The      interestingness.
new task is to discover differences in financial situations between
female and male participants across different marital status groups.      5    COMPOSITION DISCOVERY
    The question for Relevance reads, “How relevant to your task          In this section, we discuss how the answers to the questions in our
is the context of the view? (10 being most relevant)”. Since the          proposed GUI could shed light on the general composition form
capital gain is an indicator of financial situations, the view is very    of view interestingness. Specifically, we will illustrate a potential
relevant to the task, and the user could give a high score.               composition form based on user feedback in the example below.
   Conciseness The question reads, “How easy are the patterns             For the sake of simplicity, we will refer to any specific utility
in the view to be perceived and remembered? (10 being easiest)”.          measure as M and the view interestingness as I in the example.
Assume that the user tries to remember the patterns in the view in           After the user answers all the questions for a view, for each
the following way. “For the never-married and the married with            M, we will get three readings of I for three M values. The first
the couple living together, the capital gains of the two populations      M value (i.e., m_1) is the answer to the corresponding question for
are similar. For the married with the couple not living together          M in Questions 1 to 6. The corresponding I value (i.e., i_1) is the
due to various reasons, capital gain of the female population is          answer to Question 7. The second M value (i.e., m_2) is always 10,
less than that of the male population.”                                   which corresponds to the “Up to 10” question for M in Question 8.
   The above pattern is easy to remember, but it requires some            The corresponding I value (i.e., i_2) is the answer to the above “Up
effort from the user to come up with a plan to group the original         to 10” question. The third M value (i.e., m_3) is always 0, which
marital status groups. Therefore the user could give a medium             corresponds to the “Down to 0” question for M in Question 8. The
score, indicating that some effort was required from her to perceive      corresponding I value (i.e., i_3) is the answer to the above “Down
and understand the patterns in the view.                                  to 0” question. We assume that 0 < m_1 < 10 and i_3 < i_1 < i_2 based
   Diversity We temporarily change the view recommendation                on user feedback. For instance, Table 2 shows the values of m_j
task to a non-comparative task to make Diversity suitable for             and i_j, j = 1..3, for the example feedback for the view in Figure 1.
the task. The new task is to discover interesting views for male
participants in the census data. In the new task, only the blue bars               Utility Measure    m1     i1   m2    i2   m3    i3
will remain in the view.                                                               Novelty        7      5    10    6    0     4
   The question for Diversity reads, “How different are the values                  Conciseness       5      5    10    6    0     1
across the groups? (10 being most different)”. It can be seen that                    Deviation       7      5    10    7    0     0
the value in the blue bar fluctuates a lot across the groups, so the
user could give a high score for Diversity.                                              Table 2: Example User Feedback
   The discovery of the composition form is divided into two
steps: 1) determine the basic form of each M in I , 2) refine the
basic form of each M.
   Basic form: Firstly, we use the value of |i_2 − i_3| to determine
the basic form of each M in I. We list some possible conditions
for |i_2 − i_3|, which are not exhaustive:
     • If |i_2 − i_3| is large (e.g., larger than 20/3) and i_3 = 0, which
         means that M has a large influence on I and will bring I
         to zero when it drops to zero, then we call M a type-A
         measure. According to Table 2, Deviation (|i_2 − i_3| = 7,
                                                                                      Figure 2: Example Interestingness Readings for Deviation
         i_3 = 0) could be a type-A measure.
     • If |i_2 − i_3| is moderate (e.g., between 10/3 and 20/3) and
         i_3 > 0, which means that M has a moderate influence
         on I, but will not bring I to zero when it drops to zero,                larger in the lower part of its domain, and smaller in the higher
         then we call M a type-B measure. According to Table 2,                   part of its domain. Similarly, an exponent close to 1 means that
         Conciseness (|i_2 − i_3| = 5, i_3 = 1) could be a type-B                 the influence of the utility measure is approximately consistent
         measure.                                                                 throughout its domain. An exponent larger than 1 means that the
     • If |i_2 − i_3| is small (e.g., smaller than 10/3) and i_3 > 0,             influence of the utility measure is smaller in the lower part of its
         which means that M has a small influence on I and will not               domain, and larger in the higher part of its domain.
         bring I to zero when it drops to zero, then we call M a
         type-C measure. According to Table 2, Novelty (|i_2 − i_3| = 2,
                                                                                  6     CONCLUSION
         i_3 = 4) could be a type-C measure.                                      In this work, we first create a novel classification system for view
                                                                                  recommendation tasks and identify utility measures suitable for
   If the above three types can cover all utility measures based on               each task category. Then, we design a GUI that uses the identified
user feedback, then we could build a potential form of I as follows:              utility measures to discover how the users evaluate the view with
                                                                                  respect to the utility measures and how the users assess the overall
   I = \left(\prod_{i=1}^{n} A_i\right)                                           view interestingness based on the utility measures. Finally, we
       \left(\prod_{i=1}^{n} (B_i + w_{b_i})\right)                               use an example to illustrate how user answers to the questions in
       \left(\sum_{i=1}^{n} C_i + w_c\right)        (3)                           the GUI can be used to discover the composition form of view
where A_i’s are type-A measures, B_i’s are type-B measures, C_i’s                 interestingness.
are type-C measures, and w_{b_i}’s and w_c are constants. For the sake               In the future, we plan to conduct a user study using the
of simplicity, we have omitted any non-essential scaling or offset                proposed GUI, collect and analyze the user feedback, and discover
constants in Equation 3.                                                          a general composition form of view interestingness. The
    It can be seen that the behaviors of the measures in Equation 3               discovered interestingness composition form will be shared with the
satisfy the conditions for the corresponding types. In other words,               community to advance the development of view recommendation
if all utility measures are of similar values, then the influence of              technologies.
A_i (i.e., the gradient ∂I/∂A_i) is larger than that of B_i, which in
turn is larger than that of C_i. Besides, any A_i dropping to zero
                                                                                  REFERENCES
                                                                                   [1] [n. d.]. Tableau public. http://public.tableau.com. ([n. d.]). Accessed: 2019-02-09.
will bring I to zero, while any B_i or C_i dropping to zero will not.
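
   As an illustration, the type conditions and Equation 3 can be sketched in
Python. This is a minimal sketch under stated assumptions, not an
implementation from this work: the thresholds 10/3 and 20/3 are the example
values given in the conditions above, the names classify_measure and
basic_form are hypothetical, and the constants passed to basic_form are
arbitrary placeholders.

```python
from math import prod

# Example thresholds from the text, on the 0-10 answer scale.
SMALL, LARGE = 10 / 3, 20 / 3

# Readings from Table 2, per utility measure: (m1, i1, m2, i2, m3, i3).
TABLE_2 = {
    "Novelty": (7, 5, 10, 6, 0, 4),
    "Conciseness": (5, 5, 10, 6, 0, 1),
    "Deviation": (7, 5, 10, 7, 0, 0),
}


def classify_measure(i2, i3):
    """Return 'A', 'B', or 'C' from the readings of I at M = 10 and M = 0.

    Returns None when no listed condition applies, since the conditions
    in the text are explicitly not exhaustive.
    """
    spread = abs(i2 - i3)
    if spread > LARGE and i3 == 0:
        return "A"
    if SMALL <= spread <= LARGE and i3 > 0:
        return "B"
    if spread < SMALL and i3 > 0:
        return "C"
    return None


def basic_form(a, b, c, w_b, w_c):
    """Evaluate Equation 3 for lists of type-A, type-B, and type-C values.

    w_b holds one constant per type-B measure; w_c is a single constant.
    Scaling and offset constants are omitted, as in the text.
    """
    product_a = prod(a)
    product_b = prod(b_i + w for b_i, w in zip(b, w_b))
    sum_c = sum(c) + w_c
    return product_a * product_b * sum_c


types = {name: classify_measure(r[3], r[5]) for name, r in TABLE_2.items()}
print(types)
```

   With the Table 2 readings, the sketch assigns Deviation to type A,
Conciseness to type B, and Novelty to type C. Setting a type-A input of
basic_form to zero yields zero regardless of the other inputs, whereas a zero
type-B or type-C input does not, which mirrors the behavior described for
Equation 3.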
                                                                                   [2] Humaira Ehsan, Mohamed A. Sharaf, and Panos K. Chrysanthis. 2016. MuVE:
   Refined form: Secondly, we use all three points of each M to                        Efficient Multi-Objective View Recommendation for Visual Data Exploration.
refine the basic form of M in I . We introduce a simple refinement                     In IEEE ICDE.
                                                                                   [3] H. Ehsan, M. A. Sharaf, and P. K. Chrysanthis. 2018. Efficient Recommenda-
below, which is the addition of influence change rate (i.e., gradient                  tion of Aggregate Data Visualizations. IEEE Transactions on Knowledge and
change rate) to the basic form.                                                        Data Engineering 30, 2 (2018), 263–277.
   If we denote the three value pairs for each M as X(m_1, i_1), Y(m_2, i_2),      [4] Liqiang Geng and Howard J. Hamilton. 2006. Interestingness measures for
                                                                                       data mining: A survey. ACM Comput. Surv. 38, 3 (2006), 9.
and Z(m_3, i_3), we can draw the three points onto an MI-plane. The                    https://doi.org/10.1145/1132960.1132963
three points for Deviation are illustrated in Figure 2.                            [5] Rischan Mafrur, Mohamed A. Sharaf, and Hina A. Khan. 2018. DiVE:
   If we assume that the relationship between I and each M can                         Diversifying View Recommendation for Visual Data Exploration. In ACM CIKM.
                                                                                   [6] Manasi Vartak, Silu Huang, Tarique Siddiqui, Samuel Madden, and Aditya G.
be approximated as exponential in the form of I = M^e, then we                         Parameswaran. 2016. Towards Visualization Recommendation Systems. ACM
will be able to use curve fitting to determine the exponent e for                      SIGMOD Record 45, 4 (2016), 34–39.
                                                                                   [7] Manasi Vartak, Sajjadur Rahman, Samuel Madden, Aditya G. Parameswaran,
each M. For example, the three points in Figure 2 are almost on a                      and Neoklis Polyzotis. 2015. SEEDB: Efficient Data-Driven Visualization
line, and thus the exponent e for Deviation would be close to 1.                       Recommendations to Support Visual Analytics. PVLDB 8, 13 (2015), 2182–
   Then we can refine Equation 3 by adding the exponential rela-                       2193.
                                                                                   [8] Kanit Wongsuphasawat, Dominik Moritz, Anushka Anand, Jock Mackinlay,
tionship information for each M to get:                                                Bill Howe, and Jeffrey Heer. 2016. Voyager: Exploratory analysis via faceted
                                                                                       browsing of visualization recommendations. IEEE Transactions on
   I = \left(\prod_{i=1}^{n} A_i^{e_{a_i}}\right)                                      Visualization & Computer Graphics 1 (2016), 1–1.
       \left(\prod_{i=1}^{n} (B_i^{e_{b_i}} + w_{b_i})\right)                      [9] Xiaozhong Zhang, Xiaoyu Ge, and Panos K. Chrysanthis. 2019.
       \left(\sum_{i=1}^{n} C_i^{e_{c_i}} + w_c\right)        (4)                      Leveraging Data-Analysis Session Logs for Efficient, Personalized, Interactive View
                                                                                       Recommendation. In IEEE International Conference on Cognitive Machine
                                                                                       Intelligence.
where e_{a_i}, e_{b_i}, and e_{c_i} are the exponents indicating the              [10] Xiaozhong Zhang, Xiaoyu Ge, Panos K. Chrysanthis, and Mohamed A. Sharaf.
influence change rates of the utility measures.                                        2019. ViewSeeker: An Interactive View Recommendation Tool. In Proceedings
                                                                                       of the EDBT 2019 BigVis Workshop.
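
   The refinement step can also be sketched in Python. Because the scaling
and offset constants are omitted in Equations 3 and 4, the sketch assumes,
purely for illustration, the form I = i_3 + k M^e, where the offset i_3 is
the reading at M = 0 and k is a scale constant. With three readings and three
unknowns, the exponent then follows in closed form instead of from a general
curve-fitting routine. The names fit_exponent and refined_form are
hypothetical.

```python
import math
from math import prod

# Readings from Table 2, per utility measure: (m1, i1, m2, i2, m3, i3).
TABLE_2 = {
    "Novelty": (7, 5, 10, 6, 0, 4),
    "Conciseness": (5, 5, 10, 6, 0, 1),
    "Deviation": (7, 5, 10, 7, 0, 0),
}


def fit_exponent(m1, i1, i2, i3, m_max=10.0):
    """Exponent e of I = i3 + k * M**e through (0, i3), (m1, i1), (m_max, i2).

    Assumed model (offset i3, scale k), not a form stated in the text.
    Dividing the two non-zero readings eliminates k:
        (i2 - i3) / (i1 - i3) = (m_max / m1) ** e
    """
    if not (0 < m1 < m_max and i3 < i1 < i2):
        raise ValueError("requires 0 < m1 < m_max and i3 < i1 < i2")
    return math.log((i2 - i3) / (i1 - i3)) / math.log(m_max / m1)


def refined_form(a, e_a, b, e_b, w_b, c, e_c, w_c):
    """Evaluate Equation 4: each measure is raised to its own exponent."""
    product_a = prod(x ** e for x, e in zip(a, e_a))
    product_b = prod(x ** e + w for x, e, w in zip(b, e_b, w_b))
    sum_c = sum(x ** e for x, e in zip(c, e_c)) + w_c
    return product_a * product_b * sum_c


exponents = {
    name: fit_exponent(m1, i1, i2, i3)
    for name, (m1, i1, _m2, i2, _m3, i3) in TABLE_2.items()
}
for name, e in exponents.items():
    print(f"{name}: e = {e:.2f}")
```

   For the Table 2 readings, this sketch gives an exponent of about 0.94 for
Deviation (close to 1, in line with the nearly collinear points in Figure 2),
about 0.32 for Conciseness, and about 1.94 for Novelty. These numbers depend
on the assumed offset and scale and are meant only to illustrate the three
regimes of the exponent.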
   An exponent between 0 and 1 means that the exponential curve
is concave. In other words, the influence of the utility measure is

</pre>