Trusted and Auditable Decision Aids over Data Streams
Dominic J. Duxbury, University of Manchester, M13 9PL, Manchester, UK, dominic.duxbury@manchester.ac.uk
Norman W. Paton, University of Manchester, M13 9PL, Manchester, UK, norman.paton@manchester.ac.uk
John A. Keane, University of Manchester, M13 9PL, Manchester, UK, john.keane@manchester.ac.uk

ABSTRACT
Data stream management systems exist to support dynamic analysis of streaming data, often to inform decision-making. Decision support systems exist to enable decisions to be made that take into account user priorities. However, although these categories of system are now quite mature, there has been little work investigating their use together. In this paper we bring together a well established streaming platform (Storm) and a widely used decision-support methodology (Analytic Hierarchy Process) to provide dynamic decision support over data streams. In so doing, we also investigate approaches to making recommendations auditable (using provenance) and trustable (using explanations). The resulting stream decision support system is illustrated using an application that supports train journey planning.

1    INTRODUCTION
Data streams exist as an abstraction to support analysis of dynamic data as it is produced [11]. Decision support systems exist to support users in navigating a space of options [3]. These seem to be complementary paradigms, which can be brought together to support decision making with dynamic data. Current practice in stream data processing makes extensive use of Stream Processing Engines (SPEs), which provide a framework for acting upon elements in a stream. For decision support, an interesting problem is how to build on these capabilities to support real-time decision support over streams.

For real-time decision support systems, the choices made by decision makers often affect the state of the system. It is therefore useful to model decision makers not just as users, but as components of a cyber-physical-social system (CPSS). CPSS span the physical, information, cognitive and social domains. In the CPSS field, human users are considered a component of the system, falling within the cognitive domain [7]. Human components can be a necessary part of a system, such as when making life or death decisions. Decision support systems are therefore often vital, as they bridge the information and cognitive domains by distilling data to assist decision makers.

Decision support systems are enabled by decision analysis. Decision analysis is the field concerned with the study of complex decisions. Multi-criteria decision analysis is a sub-discipline of decision analysis comprising techniques for evaluating solutions with multiple conflicting criteria [3]. A common example of this is purchasing a car; the safest car is rarely the cheapest and so these criteria are conflicting. These criteria can have different importance to different decision makers, so we require a method for users to specify their preferences. If the values of these criteria are also changing then we call the problem dynamic. In this paper we outline our approach to building a decision support platform for these dynamic multi-criteria optimisation problems.

Decision support systems are only useful if they are trusted by a decision maker. Trust is especially challenging when working with dynamic data; a decision maker does not have time to ascertain if a black box system has made a mistake, and therefore it is beneficial to provide provenance data to the decision maker, ensuring that the information motivating a recommendation is readily available. Data provenance provides a historical record of data and its origins, which allows the user to assess data quality and suitability. In addition to the underlying evidence, it is also important that the user has some understanding of the space of possible solutions; as a result, some form of explanation mechanism is required that explains how a recommendation has been arrived at, and/or describes the relationship between alternative options.

All this is required in a context where there may be genuine uncertainty relating to criteria that inform a recommendation. As such, it is important for maintaining trust to ensure that the uncertainty intrinsic in a recommendation is either presented to a user or able to be reflected within the decision-making process.

Drawing this together, we have the following five desiderata for dynamic multi-criteria decision support systems:

    (1) declarative specification of preferences,
    (2) dynamic revision of recommendations,
    (3) provenance capturing the data underpinning decisions,
    (4) explanation of how a proposal was made, and
    (5) explicit support for uncertain data.

To investigate how these desiderata can be supported in stream decision support, a running example based on train journey planning is introduced in Section 2. An architecture for dynamic decision support is described in Section 3. The application of the architecture to support the above desiderata is discussed in Section 4. Section 5 describes some related work, and conclusions are presented in Section 6.

2    MOTIVATING EXAMPLE
To illustrate multi-criteria decision support over streams, we consider an application relating to train journey planning. We assume that a user can state where they need to go from and to, along with the proposed start time. We also assume that the most suitable journey for a user may depend on different criteria, specifically the arrival time of the journey, the price of the journey, and the number of changes.

For example, in Figure 1, a decision maker must choose a route from A to F in a way that takes into account price, arrival time and number of changes.

Table 1 shows the solutions to this example. We note that the solution ABF dominates ABDF, as it is equal or better on every criterion and strictly better on price and number of changes. This leaves us with two potential solutions: ABF and ACDF. A business person may prefer ABF because it is quicker, whereas a student may prefer to save money and take ACDF. There is no optimal solution for everyone and so we require user specification of criteria preferences (Desideratum 1).

First International Workshop on Data Science for Industry 4.0.
Copyright ©2019 for the individual papers by the papers’ authors. Copying permitted for private and academic purposes. This volume is published and copyrighted by its editors.

Published in the Workshop Proceedings of the EDBT/ICDT 2019 Joint Conference (March 26, 2019, Lisbon, Portugal) on CEUR-WS.org.
Figure 1: Example Train Routing Scenario. Circles and arrows depict stations and trains respectively.

Figure 2: Prototype Architecture

Solution   Price (£)   Changes   Arrival Time
ABF        15          1         14:00
ABDF       16          2         14:00
ACDF       9           2         14:40
Table 1: The solutions to Figure 1.
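The dominance relation applied to Table 1 can be sketched in a few lines of Java. This is only an illustrative sketch and not part of the platform described in this paper: the class and method names (DominanceSketch, Journey, dominates, nonDominated) are assumptions introduced here, arrival times are encoded as minutes after midnight, and all three criteria are assumed to be minimised.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch of Pareto dominance over the journeys of Table 1 (all names are illustrative). */
final class DominanceSketch {

    /** A candidate journey; every criterion is treated as "lower is better". */
    static final class Journey {
        final String route;
        final int pricePounds;
        final int changes;
        final int arrivalMinutes; // minutes after midnight

        Journey(String route, int pricePounds, int changes, int arrivalMinutes) {
            this.route = route;
            this.pricePounds = pricePounds;
            this.changes = changes;
            this.arrivalMinutes = arrivalMinutes;
        }
    }

    /** True if a is no worse than b on every criterion and strictly better on at least one. */
    static boolean dominates(Journey a, Journey b) {
        boolean noWorse = a.pricePounds <= b.pricePounds
                && a.changes <= b.changes
                && a.arrivalMinutes <= b.arrivalMinutes;
        boolean strictlyBetter = a.pricePounds < b.pricePounds
                || a.changes < b.changes
                || a.arrivalMinutes < b.arrivalMinutes;
        return noWorse && strictlyBetter;
    }

    /** Returns the routes of all journeys that no other journey dominates. */
    static List<String> nonDominated(List<Journey> journeys) {
        List<String> routes = new ArrayList<>();
        for (Journey candidate : journeys) {
            boolean dominated = false;
            for (Journey other : journeys) {
                if (dominates(other, candidate)) {
                    dominated = true;
                    break;
                }
            }
            if (!dominated) {
                routes.add(candidate.route);
            }
        }
        return routes;
    }

    /** The three solutions of Table 1. */
    static List<Journey> table1() {
        List<Journey> journeys = new ArrayList<>();
        journeys.add(new Journey("ABF", 15, 1, 14 * 60));
        journeys.add(new Journey("ABDF", 16, 2, 14 * 60));
        journeys.add(new Journey("ACDF", 9, 2, 14 * 60 + 40));
        return journeys;
    }

    public static void main(String[] args) {
        System.out.println(nonDominated(table1())); // prints [ABF, ACDF]
    }
}
```

Filtering out dominated solutions only narrows the choice; choosing between the remaining ABF and ACDF still depends on the decision maker's preferences.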

One such criterion, arrival time, indicates the expected arrival time of a journey. This is subject to change, as trains may be delayed or lines closed. Ticket prices are also subject to change up until the time of purchase. If a train is delayed or the price increases, the resulting solution may no longer be optimal; dynamically revising recommendations (Desideratum 2) to reflect the most recent information is therefore clearly beneficial. The user may also move between stations as a part of their interaction with the system, hence requiring an entirely new set of solutions.

A decision maker may see these solutions and choose option ACDF because they believe it will only take 10 minutes. However, this route could be unreliable due to engineering works, so it may be important for the user to understand the source and derivation of criteria values (Desideratum 3) to improve trustability, or to understand the uncertainty that is characteristic of this particular train service (Desideratum 5).

Finally, after expressing their preferences, accepting criteria values and understanding uncertain aspects, a user is left with a recommended journey. It may be difficult to trust this recommendation without understanding why it was selected. Therefore we should provide the user with an explanation of where the recommendation falls in the solution space, so that they can understand the trade-offs being made, and how this ties into their criteria preferences (Desideratum 4).

3   ARCHITECTURE
To evaluate our approach, a prototype platform has been developed. This platform implements our desiderata from Section 1, whilst providing decision support for train route planning. The system utilises a micro-services architecture, shown in Figure 2.

The decision maker operates the decision support system through the user interface. The user inputs details for a planned trip: an origin station, a destination station and a departure time. The user must also specify their preferences with regard to the criteria. This information is sent with a request to open a web-sockets connection to the Application Controller. The Application Controller holds the state of the train journeys (solutions) within the system. The controller uses the planned trip to build an HTTP request to send to the Timetable Service.

Our architecture requires a solution service to generate the initial solution space. The Timetable Service is the implementation of the solution service for the train route planning scenario. The service generates a list of train journeys between the requested origin and destination stations at the specified departure time. Initial values are then calculated for all criteria. The Timetable Service returns an unranked list of train journeys, which are passed from the Application Controller to the Live Train Service. A streaming component is also required to update the dynamic criteria and to produce a new ranking in real-time. The Live Train Service is an implementation of this component for the train scenario. In this case the Live Train Service must update the expected train arrival time. The Live Train Service is initialised with a list of train journeys, which are ranked by the Ranking Service. A stream of UK-wide train updates from National Rail is filtered, and matching updates are used to update criteria values. The updated list of train journeys is then re-ranked by the Ranking Service. The output stream of ranked train journeys is communicated to the User Interface over web-sockets.

The Ranking Service accepts a specification of preferences and a list of solutions, to produce a ranking. This ranking is calculated through the application of the Analytic Hierarchy Process, a popular method for multi-criteria decision analysis. The criteria and criteria behaviour are specified through the configuration. For example we specify that price is a criterion and should be minimised. This allows the service to remain generic. The other generic component is the provenance sub-system. The provenance sub-system generates, stores and serves provenance data within the platform. This sub-system is made up of a message queue, a database (Prov DB) and two services: one for generating provenance (Prov Generator Service) and one for serving it (Prov Provider Service). The sub-system receives messages from the streaming service, which are processed to produce provenance graphs.

3.1    Architecture Components
In this subsection, we provide further details of the components in Figure 2.

Live Train Service. The Live Train Service applies Apache Storm to transform streams of tuples. Apache Storm is an open source SPE which utilises three abstractions: spouts, bolts and topologies. Spouts produce streams. Bolts consume any number of streams to produce new output streams. A topology describes a network of spouts and bolts. Within our streaming component we instrument these operators to extract provenance data. We extend the base classes for bolts and spouts to produce two
Operator            Input               Output
NationalRailSpout   N/A                 <timestamp :: Timestamp, id :: trainID, destination :: String, newExpectedArrival :: Timestamp>
DelayBolt           NationalRailSpout   <timestamp :: Timestamp, journeys :: [Journey]>
RankingBolt         DelayBolt           <timestamp :: Timestamp, rankedJourneys :: [<score :: Double, journey :: Journey>]>
Table 2: Input and output types for each operator


new provenance-aware classes: ProvenanceAwareBolt and ProvenanceAwareSpout. An example of a bolt extending ProvenanceAwareBolt is shown in Listing 1. The execute method defines how a bolt processes each tuple and declareOutputFields declares the shape of tuples in the output stream. An operator inheriting from these classes will write provenance information concerning its inputs and outputs to the provenance sub-system.

Figure 3: Provenance graph for a train schedule update

For the train route scenario we have three operators: NationalRailSpout, DelayBolt and RankingBolt. The NationalRailSpout produces a stream of delays, the DelayBolt applies relevant delays to a list of journeys and the RankingBolt interfaces with the Ranking Service to calculate a score for each journey. Table 2 shows the input and output tuples for each operator. We instrument all the operators to supply us with provenance regarding the history of solutions, their criteria values and the resulting ranking.
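To make the role of the DelayBolt more concrete, the following plain-Java sketch shows one way a delay update could be applied to a list of journeys. It is a hypothetical illustration rather than the operator used in the prototype: it does not depend on Storm, the TrainUpdate fields simply mirror the NationalRailSpout output in Table 2, and the Journey structure and the matching rule (an update is relevant only if it concerns the final train of a journey and its destination) are assumptions made here. Timestamps are plain numbers (seconds after midnight) for brevity.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Sketch of applying a train update to journeys (hypothetical names and matching rule). */
final class DelaySketch {

    /** Mirrors the NationalRailSpout output tuple of Table 2. */
    static final class TrainUpdate {
        final long timestamp;
        final String trainId;
        final String destination;
        final long newExpectedArrival;

        TrainUpdate(long timestamp, String trainId, String destination, long newExpectedArrival) {
            this.timestamp = timestamp;
            this.trainId = trainId;
            this.destination = destination;
            this.newExpectedArrival = newExpectedArrival;
        }
    }

    /** Assumed journey shape: its trains in order, its final station and its expected arrival. */
    static final class Journey {
        final List<String> trainIds;
        final String destination;
        final long expectedArrival;

        Journey(List<String> trainIds, String destination, long expectedArrival) {
            this.trainIds = trainIds;
            this.destination = destination;
            this.expectedArrival = expectedArrival;
        }
    }

    /** Assumption: only an update for the last train into the journey's destination matters. */
    static boolean isRelevant(Journey journey, TrainUpdate update) {
        if (journey.trainIds.isEmpty()) {
            return false;
        }
        String lastTrain = journey.trainIds.get(journey.trainIds.size() - 1);
        return lastTrain.equals(update.trainId) && journey.destination.equals(update.destination);
    }

    /** Returns a new list in which every relevant journey carries the updated arrival time. */
    static List<Journey> applyUpdate(List<Journey> journeys, TrainUpdate update) {
        List<Journey> updated = new ArrayList<>();
        for (Journey journey : journeys) {
            updated.add(isRelevant(journey, update)
                    ? new Journey(journey.trainIds, journey.destination, update.newExpectedArrival)
                    : journey);
        }
        return updated;
    }

    /** Two journeys to F; train T2 is reported as arriving at 14:10 instead of 14:00. */
    static List<Journey> demo() {
        List<Journey> journeys = new ArrayList<>();
        journeys.add(new Journey(Arrays.asList("T1", "T2"), "F", 50400L));
        journeys.add(new Journey(Arrays.asList("T3", "T4", "T5"), "F", 52800L));
        return applyUpdate(journeys, new TrainUpdate(49000L, "T2", "F", 51000L));
    }
}
```

In a topology, logic of this kind would sit inside the execute method of a bolt, with the updated journey list emitted downstream for re-ranking.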

    public class ExampleBolt extends ProvenanceAwareBolt {
        // Called once for each incoming tuple.
        public void execute(Tuple tuple) { }
        // Declares the fields of the tuples this bolt emits.
        public void declareOutputFields(Declarer declarer) { }
    }

Listing 1: Code for a provenance aware bolt

Ranking Service. To calculate a recommendation we apply the Analytic Hierarchy Process (AHP) [14]. AHP is a structured technique for organising and analysing complex decisions. AHP consists of an overall goal, a group of options or alternatives for reaching the goal and a group of factors or criteria that relate the alternatives to the goal; the criteria can be further broken down. These criteria generally have different values for different decision makers and so the algorithm requires users to express their preferences. The user preferences are expressed in the form of pairwise comparisons. For instance, a decision maker could express that “Price is more important than Travel Duration”. Pairwise comparisons are easy for a user to express and model the user's knowledge within the system. The comparisons are then used to generate weightings for each criterion.

To produce a ranking, criteria values must also be scored. To do this the values are first normalised according to the range of values across all solutions using the following formula:

\[ \mathrm{Norm}(x) = \frac{x - \mathit{minX}}{\mathit{maxX} - \mathit{minX}} \]

where minX and maxX are the smallest and largest criteria values respectively. The values are then compared pairwise to generate a comparison matrix. For three solutions S_1, S_2 and S_3 and a criterion X with normalised criteria values x_1, x_2, x_3, we would generate a comparison matrix C:

\[ C = \begin{array}{c|ccc}
      & S_1 & S_2 & S_3 \\ \hline
  S_1 & 1 & f(x_1, x_2) & f(x_1, x_3) \\
  S_2 & f(x_2, x_1) & 1 & f(x_2, x_3) \\
  S_3 & f(x_3, x_1) & f(x_3, x_2) & 1
\end{array} \]

We provide two separate formulas for comparing criteria values, depending on whether the values fall along a linear scale (1) or an exponential scale (2). These formulas map two normalised values (x, y) to the fundamental scale proposed by Saaty [14]. For the train route planning scenario we apply the first formula (1), because all criteria form a linear scale. For example, train prices might be £10, £15, £20 for three alternative routes and not £10, £100, £1000.

\[ f(x, y) = |(x - y) \times 8| + 1 \tag{1} \]

\[ f(x, y) = \frac{e^{x}}{e^{y}} \tag{2} \]

The entries of the principal eigenvector of the comparison matrix for each criterion represent the scores for the respective criteria values of each solution. The criteria value scores are then multiplied by the relevant criteria weightings and summed across each solution. This process produces the scores which are used to derive a global ranking.

The normalisation of criteria values can cause some brittleness in the results when we only have a small range. If the algorithm is supplied with two journeys, one costing £50 and another £51, these are seen as the best and worst possible price and so scored accordingly. It would be beneficial for the algorithm to recognise that there is little difference between these two prices. We aim to solve this by allowing those implementing the framework to specify a range of possible values for a criterion.

The decision support component operates over web-sockets. The service requires a configuration file when a connection is opened, providing information about criteria. Critically, the configuration indicates the number of criteria and whether numerical criteria should be maximised or minimised. The configuration also allows us to indicate how we should compare non-numerical criteria. Once a connection is opened, AHP is applied to a stream of solutions, producing a stream of rankings.

Provenance Sub-system. The provenance sub-system processes messages from the streaming system and stores the output in a database for future querying. To store this data we choose to conform to the PROV standard [10]. PROV defines a data model consisting of a set of vertices and edges for modelling provenance as graphs. We adapt a subset of these to map to concepts from data
stream analysis. For vertices we use entities, activities and agents.
For edges we use wasGeneratedBy, used and wasAssociatedWith.
    The PROV data model describes entities as “an immutable
piece of state”, activities as “dynamic aspects of the world which
produce entities” and agents as “parties which take a role in activ-
ities”. We model stream elements as entities, stream operations as
activities and stream operators as agents. Note, we call a set of in-
puts and outputs a stream operation. The stream operator refers
to the operator applied to these inputs to produce the outputs.
Figure 4: Cumulative Distribution Function for Arrival Time

    Edges describe the relationships between two vertices. wasGeneratedBy links an entity to the activity which generated it. used links an activity to an entity it consumed. wasAssociatedWith links an activity to an agent associated with it. We say a stream element wasGeneratedBy a stream operation. The operation used a stream element or window of elements. The operation also wasAssociatedWith the operator which was applied. An example provenance graph is shown in Figure 3. This example shows the derivation for an expected train arrival time. The new arrival time wasGeneratedBy an operation which used the scheduled arrival time and the schedule delay. The operation wasAssociatedWith the delay operator (DelayBolt).
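The mapping described above can be sketched as a small in-memory graph. This is a minimal sketch under stated assumptions, not the Prov Generator Service itself: the class names (ProvSketch, Vertex, Edge, Graph), the vertex identifiers and the derivedFrom query are all hypothetical, and only the three vertex kinds and three edge kinds named in the text are modelled. The example graph reproduces the derivation described for Figure 3.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch of the PROV subset used for streams (all names are illustrative). */
final class ProvSketch {

    enum VertexType { ENTITY, ACTIVITY, AGENT }

    enum EdgeType { WAS_GENERATED_BY, USED, WAS_ASSOCIATED_WITH }

    static final class Vertex {
        final String id;
        final VertexType type;

        Vertex(String id, VertexType type) {
            this.id = id;
            this.type = type;
        }
    }

    static final class Edge {
        final Vertex from;
        final EdgeType type;
        final Vertex to;

        Edge(Vertex from, EdgeType type, Vertex to) {
            this.from = from;
            this.type = type;
            this.to = to;
        }
    }

    static final class Graph {
        private final List<Edge> edges = new ArrayList<>();

        /** Adds an edge after checking that it links the vertex kinds the text describes. */
        void add(Vertex from, EdgeType type, Vertex to) {
            boolean valid;
            switch (type) {
                case WAS_GENERATED_BY: // entity -> activity
                    valid = from.type == VertexType.ENTITY && to.type == VertexType.ACTIVITY;
                    break;
                case USED: // activity -> entity
                    valid = from.type == VertexType.ACTIVITY && to.type == VertexType.ENTITY;
                    break;
                default: // WAS_ASSOCIATED_WITH: activity -> agent
                    valid = from.type == VertexType.ACTIVITY && to.type == VertexType.AGENT;
                    break;
            }
            if (!valid) {
                throw new IllegalArgumentException(type + " cannot link " + from.type + " to " + to.type);
            }
            edges.add(new Edge(from, type, to));
        }

        /** Ids of the entities used by the operation that generated the given entity. */
        List<String> derivedFrom(String entityId) {
            List<String> sources = new ArrayList<>();
            for (Edge generated : edges) {
                if (generated.type != EdgeType.WAS_GENERATED_BY || !generated.from.id.equals(entityId)) {
                    continue;
                }
                for (Edge used : edges) {
                    if (used.type == EdgeType.USED && used.from == generated.to) {
                        sources.add(used.to.id);
                    }
                }
            }
            return sources;
        }
    }

    /** The derivation of an expected arrival time, as described for Figure 3. */
    static Graph figure3() {
        Vertex newArrival = new Vertex("expectedArrival", VertexType.ENTITY);
        Vertex scheduled = new Vertex("scheduledArrival", VertexType.ENTITY);
        Vertex delay = new Vertex("scheduleDelay", VertexType.ENTITY);
        Vertex operation = new Vertex("delayOperation", VertexType.ACTIVITY);
        Vertex operator = new Vertex("DelayBolt", VertexType.AGENT);

        Graph graph = new Graph();
        graph.add(newArrival, EdgeType.WAS_GENERATED_BY, operation);
        graph.add(operation, EdgeType.USED, scheduled);
        graph.add(operation, EdgeType.USED, delay);
        graph.add(operation, EdgeType.WAS_ASSOCIATED_WITH, operator);
        return graph;
    }
}
```

Calling figure3().derivedFrom("expectedArrival") walks one wasGeneratedBy edge and then the used edges of that operation, which is the kind of lookup a user needs when asking where a criterion value came from.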

3.2    Framework Concepts
In the remainder of this section, we explain what we mean by explanation and uncertainty and how these concepts surface within our architecture.

Explanation. The AHP algorithm outputs a weight vector for criteria and a score for each solution. Whilst this is useful for constructing a ranking, these values are difficult for a human to interpret. Therefore we require some further explanation of how the system arrived at a recommendation. Fundamentally we describe explanation as a description of how a set of criteria preferences are used by AHP to select a solution from a solution space. Perhaps the most important part is an explanation of the trade-offs and benefits of a recommendation and how this ties into the specified user preferences. For instance, in the case of train route planning, a user could specify that price is critical to them. Assuming the system recommends ABC, the cheapest option, a simple explanation would be that ABC is the cheapest train and price is the most important criterion.

Our recommendations are dynamic and so it is important that an explanation can be processed by the user quickly. This led us towards visual forms of explanation such as bar and spider charts. Spider charts visualise multi-variate data as a shape constructed from three or more quantitative variables across axes stemming from the same point. Typically a chart with a larger area represents a better solution, but these charts can be misleading as the order of criteria can greatly affect the area. For this reason we chose instead to visualise the solution space through bar charts where the values for each criterion and solution are plotted side-by-side. Bar charts are one of the simplest forms of data visualisation, leaving less room for misinterpretation.

Uncertainty. Uncertainty is modelled using cumulative distribution functions (CDFs) drawn from historical data. These functions capture information regarding the potential values of an uncertain criterion for a particular solution. Arrival time is an uncertain criterion for train route planning. We derive a CDF of arrival times for a journey from the historical performance of the trains travelling the same route. These CDFs are a simple model, capturing the distribution of potential criteria values. Through this distribution we can view the probability of the potential risks (lateness) for a journey. CDFs serve as an alternative to criteria values for uncertain criteria, but we require a method of comparing two CDFs. To do this we extract three key values from the distribution: optimistic, expected and pessimistic values. For a CDF f we define optimistic, expected and pessimistic values as x such that f(x) = 0.05, f(x) = 0.5 and f(x) = 0.95 respectively. An example for train arrival times is shown in Figure 4. The user interface allows the decision maker to toggle which of these three values is fed into the ranking algorithm.

4    MOTIVATING EXAMPLE APPLICATION
In this section we explain how the user interacts with the system and how this interface supports the five desiderata from Section 1. The user interface aims to target end-users, rather than decision scientists [16]. The user interface for the train route planner is shown in Figure 5.

For a decision maker planning a train journey, the first task is to specify the planned trip. The top left corner shows the trip input form, where the user can input where they wish to travel From (Origin Station), To (Destination Station) and the time they are Leaving At (Departure Time). Once these values are set the user can click Calculate Routes to generate a set of possible journeys. The next task is for the user to specify their preferences (Desideratum 1). In our user interface these pairwise user preferences are located in the bottom left. In Figure 5 the preferences are set to default, with all criteria equal. Each pair can be set through a drop-down menu to one of five potential values:

    (1) X is much more important than Y,
    (2) X is more important than Y,
    (3) X is just as important as Y,
    (4) X is less important than Y,
    (5) X is much less important than Y.

These preferences can be changed at any point, triggering the system to re-rank the journeys.

Once the planned trip and preferences have been detailed the user is presented with the top five ranked journeys (the fourth and fifth fall below the fold). Immediately the user can
view criteria values of each journey (Price, Arrival Time and Transfers). These values and the resultant ranking are updated continuously once routes have been calculated (Desideratum 2). To prevent information overload some extra details are hidden. Clicking the plus next to Journey Path displays the information needed to undertake a journey, including the journey path and the trains of which the journey is composed. Each journey also has a View Detail button, which allows the user to view provenance information in a pop-up window (Desideratum 3). The design for this window is shown in Figure 6. Here the user can view the history of values for Arrival Time and the data sources.

Figure 5: Route Planning User Interface

Figure 6: Provenance Data for an Arrival Time

The values for each of the criteria are shown in the bar charts at the top of Figure 5, with the x-axes ordered according to the ranking. These charts allow the user to visually compare a recommendation (the furthest left value) to the solution space (all other values). The charts are also ordered according to the weighting calculated through AHP, with the most important criterion appearing on the left. This means a user can understand both the trade-offs of a recommendation and how this ties into their specified preferences (Desideratum 4).

Finally the user can toggle between Pessimistic, Expected and Optimistic modes for the predicted arrival time by clicking the corresponding button. These modes simply change the value extracted from the CDF, as described in Section 3.2 (Desideratum 5). Expected values are more useful for users making a journey many times (such as commuters), whereas pessimistic values would be more important in a scenario where a user is travelling for something more time critical (such as a job interview).

5   RELATED WORK
This paper has proposed an approach for the integration of streaming data with decision support methodologies, with a view
to enabling users to make decisions that reflect their priorities in the context of a
changing physical environment. In this section, we review related work on the intersection
of cyber-physical systems (CPS) with decision support, stream data analytics and
provenance for data streams.
   In relation to CPS, decision support is growing in significance. CPS with key decision
support components are being widely adopted in the medical field [4, 19]. These systems
advise doctors in the diagnosis and treatment of patients. Liu et al. [7] outline a
framework in the context of command and control, highlighting how decision support can be
integrated within a larger CPS and the benefits of doing so. Wang et al. [18] make the
argument for referring to CPS as cyber-physical-social systems (CPSS). That paper argues
for the importance of the human aspect within CPS, identifying that users should be more
closely integrated within the systems they control. Our architecture fulfils this paradigm
by improving extraction of knowledge (pairwise comparisons) and presentation of knowledge
(recommendations).
   There is a substantial body of work on stream data analyses, often investigating how
specific analyses can be carried out efficiently on rapidly streaming data (e.g. [2, 15]).
In this paper, the focus has been more on the intersection of streaming and decision
support architectures than on algorithms for stream analytics, although this architectural
work would benefit from, and presents specific requirements for, efficient
multi-dimensional optimisation over streams (e.g. [5]).
   It has been recognised that multi-criteria decision support systems need to operate in
dynamic environments. For example, Benítez et al. [1] and Raharjo et al. [13] consider
making incremental responses to changes in criteria, but there has been less of a focus on
responding to changes in criteria values.
   It has also been recognised that provenance for data streams is both important for
specific streaming applications where decisions may be audited and challenging in relation
to scalability [9]. Previous work has involved designing generic approaches to collecting
and storing provenance data [8, 12]. These systems provide a generic interface for
provenance management but no integration with streaming systems. Lim et al. have looked at
integrating provenance with streaming systems in the context of sensor networks [6], and
Wang et al. provided provenance for medical event streams [17]. These papers engineer a
solution for generating and managing provenance specific to their respective areas rather
than seeking to integrate provenance generation into generic SPEs.

6   CONCLUSIONS
Decision support systems use user-specified criteria to compare candidate solutions within
a multi-dimensional space of alternatives. This requirement for user-driven comparison of
candidate outcomes is widely recognised in decision support, and seems relevant to
streaming applications in transport, healthcare, command and control, etc. In this paper we
have identified five desiderata for trusted and auditable decision aids over data streams,
described an architecture that supports these desiderata, and illustrated its use in a
journey planning application. Future work includes the evaluation of the approach in
different applications, scalability of decision support over high-velocity data streams,
and investigation of different approaches to uncertainty.

ACKNOWLEDGMENTS
Dominic Duxbury is supported by an EPSRC iCASE award in association with BAE Systems. The
authors would also like to recognise Andrew Campbell and Joseph Allen for their assistance
in designing the user interface.

REFERENCES
 [1] J. Benítez, X. Delgado-Galván, J. Izquierdo, and R. Pérez-García. 2012. An approach to
     AHP decision in a dynamic context. Decision Support Systems 53, 3 (2012), 499–506.
     DOI:http://dx.doi.org/10.1016/j.dss.2012.04.015
 [2] Mohamed Medhat Gaber, Arkady Zaslavsky, and Shonali Krishnaswamy. 2005. Mining Data
     Streams: A Review. SIGMOD Rec. 34, 2 (June 2005), 18–26.
     DOI:http://dx.doi.org/10.1145/1083784.1083789
 [3] Salvatore Greco, Matthias Ehrgott, and José Rui Figueira. 2016. Multiple Criteria
     Decision Analysis. Springer, New York, NY.
 [4] Yu Jiang, Houbing Song, Rui Wang, Ming Gu, Jiaguang Sun, and Lui Sha. 2017.
     Data-centered runtime verification of wireless medical cyber-physical system. IEEE
     Transactions on Industrial Informatics 13, 4 (Aug. 2017), 1900–1909.
     DOI:http://dx.doi.org/10.1109/TII.2016.2573762
 [5] M. Kontaki, A. N. Papadopoulos, and Y. Manolopoulos. 2008. Continuous K-dominant
     Skyline Computation on Multidimensional Data Streams. In Proceedings of the 2008 ACM
     Symposium on Applied Computing (SAC ’08). ACM, New York, NY, USA, 956–960.
     DOI:http://dx.doi.org/10.1145/1363686.1363908
 [6] Hyo-Sang Lim, Yang-Sae Moon, and Elisa Bertino. 2010. Provenance-based Trustworthiness
     Assessment in Sensor Networks. In Proceedings of the Seventh International Workshop on
     Data Management for Sensor Networks (DMSN ’10). ACM, New York, NY, USA, 2–7.
     DOI:http://dx.doi.org/10.1145/1858158.1858162
 [7] Zhong Liu, Dong Sheng Yang, Ding Wen, Wei Ming Zhang, and Wenji Mao. 2011.
     Cyber-physical-social systems for command and control. IEEE Intelligent Systems 26, 4
     (2011), 92–96. DOI:http://dx.doi.org/10.1109/MIS.2011.69
 [8] Peter Macko and Margo Seltzer. 2012. A General-purpose Provenance Library. In
     Proceedings of the 4th USENIX Conference on Theory and Practice of Provenance
     (TaPP’12). USENIX Association, Berkeley, CA, USA, 6–6.
     http://dl.acm.org/citation.cfm?id=2342875.2342881
 [9] Archan Misra, Marion Blount, Anastasios Kementsietsidis, Daby Sow, and Min Wang. 2008.
     Advances and Challenges for Scalable Provenance in Stream Processing Systems. In
     Provenance and Annotation of Data and Processes, Juliana Freire, David Koop, and Luc
     Moreau (Eds.). Springer Berlin Heidelberg, Berlin, Heidelberg, 253–265.
[10] Luc Moreau, Paolo Missier, James Cheney, and Stian Soiland-Reyes. 2013. PROV-N: The
     Provenance Notation. World Wide Web Consortium, United States.
[11] S. Muthukrishnan. 2005. Data Streams: Algorithms and Applications. now, 2600 AD Delft,
     The Netherlands. https://ieeexplore.ieee.org/document/8186985
[12] Priya Narasimhan and Peter Triantafillou. 2012. SPADE: support for provenance auditing
     in distributed environments. In Proceedings of the 13th International Middleware
     Conference. Springer, New York, NY, 101–120.
     https://dl.acm.org/citation.cfm?id=2442634
[13] Hendry Raharjo, Min Xie, and Aarnout C. Brombacher. 2009. On modeling dynamic
     priorities in the analytic hierarchy process using compositional data analysis.
     European Journal of Operational Research 194, 3 (2009), 834–846.
     DOI:http://dx.doi.org/10.1016/j.ejor.2008.01.012
[14] R. W. Saaty. 1987. The analytic hierarchy process-what it is and how it is used.
     Mathematical Modelling 9, 3-5 (1987), 161–176.
     DOI:http://dx.doi.org/10.1016/0270-0255(87)90473-8
[15] Jonathan A. Silva, Elaine R. Faria, Rodrigo C. Barros, Eduardo R. Hruschka, André C.
     P. L. F. de Carvalho, and João Gama. 2013. Data Stream Clustering: A Survey. ACM
     Comput. Surv. 46, 1, Article 13 (July 2013), 31 pages.
     DOI:http://dx.doi.org/10.1145/2522968.2522981
[16] Sajid Siraj, Ludmil Mikhailov, and John A. Keane. 2015. PriEsT: an interactive
     decision support tool to estimate priorities from pairwise comparison judgments. ITOR
     22, 2 (2015), 217–235. DOI:http://dx.doi.org/10.1111/itor.12054
[17] Min Wang, Marion Blount, John Davis, Archan Misra, and Daby Sow. 2007. A
     Time-and-value Centric Provenance Model and Architecture for Medical Event Streams. In
     Proceedings of the 1st ACM SIGMOBILE International Workshop on Systems and Networking
     Support for Healthcare and Assisted Living Environments (HealthNet ’07). ACM, New
     York, NY, USA, 95–100. DOI:http://dx.doi.org/10.1145/1248054.1248082
[18] Ying Ming Wang and Kwai Sang Chin. 2011. Fuzzy analytic hierarchy process: A
     logarithmic fuzzy preference programming methodology. International Journal of
     Approximate Reasoning 52, 4 (2011), 541–553.
     DOI:http://dx.doi.org/10.1016/j.ijar.2010.12.004
[19] Yin Zhang, Meikang Qiu, Chun-Wei Tsai, Mohammad Mehedi Hassan, and Atif Alamri. 2017.
     Health-CPS: Healthcare Cyber-Physical System Assisted by Cloud and Big Data. IEEE
     Systems Journal 11, 1 (March 2017), 88–95.
     DOI:http://dx.doi.org/10.1109/JSYST.2015.2460747