Speculative Execution of Similarity Queries: Real-Time Parameter Optimization through Visual Exploration

T. Spinner1, U. Schlegel1, M. Schall1,2, F. Sperrle1, R. Sevastjanova1, B. Gobbo3, J. Rauscher1, M. El-Assady1, D. Keim1
1 University of Konstanz
2 University of Applied Sciences Konstanz
3 Politecnico di Milano

© 2021 Copyright for this paper by its author(s). Published in the Workshop Proceedings of the EDBT/ICDT 2021 Joint Conference (March 23–26, 2021, Nicosia, Cyprus) on CEUR-WS.org. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0)

Figure 1: The SimSearch workspace, built around the central projection view (a), showing the projected results of the similarity search algorithm for the query defined in the parameter designer (b). The tabular results view (c) shows the same results, linked to the projected nodes by color and hover events. Both the projection and table view switch to a speculative execution state on hovering a weight slider, indicating the changes occurring if the weight is adjusted accordingly.

ABSTRACT
The parameters of complex analytical models often have an unpredictable influence on the models’ results, rendering parameter tuning a non-intuitive task. By concurrently visualizing both the model and its results, visual analytics tackles this issue, supporting the user in understanding the connection between abstract model parameters and model results. We present a visual analytics system enabling result understanding and model refinement on a ranking-based similarity search algorithm. Our system (1) visualizes the results in a projection view, mapping their pair-wise similarity to screen distance, (2) indicates the influence of model parameters on the results, and (3) implements speculative execution to enable real-time iterative refinement on the time-intensive offline similarity search algorithm.

1 INTRODUCTION
Similarity search in large database systems is a crucial feature in many applications and often requires a manual adjustment of parameters to suit various search scenarios [17]. Such parameters are hard to optimize by randomly probing the search space, but they significantly influence the retrieved results’ quality [7]. In many cases, even experts with prior domain knowledge struggle to understand the inner workings of the used mining models and the influence of abstract model parameters, which prevents them from reaching the desired analysis goal. Systematic steering and exploration of different parameter settings can help to obtain the proper combination more effectively. Thus, domain experts need concurrent access to models, parameters, and results, enabling them to understand how parameters influence the results and how they can be refined to match the analysis goal.

Visual analytics enables users to explore and analyze data and models by providing integrated visual representations for data, models, and parameters. Such visual techniques enable interactive parameter adjustment during exploration and analysis [6]. Visual analytics bridges the gap between heuristics to find suitable parameters and domain experts with the knowledge to steer results in a human-centered direction. For instance, a visual interactive what-if analysis helps experts understand black-box model decisions by enabling direct data and parameter manipulation [13]. The comprehensive understanding of the relationship between model parameter choices and outcome is a fundamental requirement for well-informed decision making [20]. By applying standard visual analytics techniques, such as aggregation, filtering, or speculative execution [18], the vast result and parameter spaces can be interactively explored, despite the algorithms being time- and resource-consuming. Thus, visual analytics supports the comprehension of parameter choices in similarity search applications for users and domain experts. Visual analytics enables informed reasoning about a query’s results, allows the understanding and diagnosis of parameters, and supports the user in refining those parameters to get the best possible results.

We propose a visual analytics workspace to support users in result understanding and model refinement on a ranking-based similarity search algorithm in the context of large data foundations. Our system consists of a user-centered visualization of parameters and results to facilitate the users’ exploration and understanding of the parameter choices. We enable users to interactively update model parameters based on their domain knowledge and findings during the analysis process. Our visual analytics system further facilitates real-time analysis using speculative execution on a time-intensive similarity search algorithm, enabling online exploration and execution of the offline algorithm.

Summarizing, we present a visual analytics system for similarity search, providing the following main contributions: (1) Our system supports the understanding of results and parameters, emphasizing the most critical data characteristics by mapping similarity to spatial distance and highlighting communities of similar attribute combinations. (2) Our system enables the diagnosis of results and parameters by allowing the real-time interactive exploration of the parameter space to investigate the influence of parameter choices, enabled by the speculative execution. (3) Our system supports the refinement of the involved parameters, supporting the iterative guided optimization of the model to solve a given analysis task.

2 RELATED WORK
To cover the various involved research domains and applications, we structure our related work into sub-topics, each summarizing the most relevant works regarding one aspect of our approach.

Visual Analytics Foundations: Similarity searches in large database systems are often automatically executed using predefined similarity functions and distance measures. However, user-adaptable similarity search applications increase in importance, and user integration rises [17]. Visual analytics combines automated analysis techniques with interactive visualizations to enable users to understand and reason about large datasets [6]. Sacha et al. [14] have presented a knowledge generation model that describes how knowledge is generated during the analysis process, building upon prior methodologies in visual analytics [2, 12]. Besides the computer system that visualizes and models data, they describe the human as a core element whose creativity, interaction abilities, and perception help find and comprehend patterns hidden in the data.

Weight Space Exploration: As visual analytics is concerned with integrating human knowledge with automated machine learning, it is frequently used for model exploration and optimization. Sedlmair et al. [16] provide a conceptual framework of visual parameter space analysis, structuring the design space. Pajer et al. [10] present a tool for the visual analysis and exploration of weight spaces, tackling the problem of setting abstract weight parameters. Their tool supports the understanding of sensitivity and helps identify weight regions of interest for a desired output. Mühlbacher et al. [9] present TreePOD, a sensitivity-aware approach to selecting Pareto-optimal decision trees. In contrast to most existing work, we tackle the exploratory analysis of similarity queries and rely on the analyst’s intuition rather than on quality metrics.

Parameter Optimization for Mining Models: Parameter optimization for data mining systems or hyper-parameter optimization in machine learning is an open problem that frequently occurs in scientific or industrial use-cases. Analytic optimization or exhaustive search for parameter optimization is often impossible in these models due to black-box methods or high-dimensional parameter spaces. Torsney et al. [22] apply a guided semi-automatic method to this problem by first sampling from the parameter space and then guiding the user by estimating the effects of parameter changes on the result. Schall et al. [15] propose a heat-map method to superimpose the prediction of a deep neural network over its input image. This allows the model engineer to identify problems in the prediction and tune the hyper-parameters accordingly. The resulting workflow is iterative and guided by the provided visualization. This method is applied to offline handwriting recognition, where spatial information is essential but not available in ground-truth data.

Speculative Execution and Guidance: Sperrle et al. [18] present an adaptation of speculative execution for visual analytics to support exploratory model analysis and optimization. Inspired by speculative execution in CPUs, they define it as “the proactive, near-real-time computation of competing model alternatives” to support model state-space exploration. Our system uses speculative execution to execute queries automatically using adapted weights, serving two purposes: first, speculatively preparing those results while the system would otherwise idle enables a near-real-time analysis of related parameter configurations. Second, our system compares all obtained results and guides the user in their exploration by visually highlighting alternative feature weights that produce significantly different results. In recent years, such guidance has been identified as one of the main challenges in visual analytics [3, 4], characterized by user and machine teaching each other while mutually learning from each other [19]. Such guidance enables a more efficient human-machine collaboration and paves the way towards true mixed-initiative [1] systems.

Application Background: Related to our application, we focus on work for similarity search on heterogeneous data collections. Gionis et al. [5] tackle the curse of dimensionality for search in high-dimensional attribute spaces by hashing data entities and performing an approximate nearest-neighbor search on the hashes. Sun et al. [21] present a metapath-based search algorithm, deriving similarity from linkage paths in the network, addressing the advent of heterogeneous information networks. Patroumpas and Skoutas [11] frame the problem as search on enriched, geographical data, i.e., geospatial attributes with additional textual, numerical, or temporal information. Our approach builds upon their work, tackling the open challenge of user-centered model optimization.

3 THE SIMILARITY SEARCH SYSTEM
While search is an essential tool to locate entities of interest in large data foundations, it has significant limitations when the data distribution is unknown and, hence, explorative access to the data is required. Specifically, the exact attribute combination of the results might not be known beforehand, or multiple entities in a particular region might be of interest. The used similarity search (SimSearch) algorithm [11] fulfills these requirements by considering entities that feature attribute combinations close to the desired search parameters. By specifying the number 𝑘 of ranked closest matches, the analyst can explore the region of interest and refine the search parameters according to the analysis goal. The high-dimensional search space poses particular challenges for the visual representation of the results: pair-wise distances between entities and the root search have to be considered, as well as the influence of each single search attribute.

The variety of data types and domains that might occur in the data attributes requires the concurrent use of different distance functions, rendering an objective comparison between the obtained distances impossible. For example, a geospatial attribute might have a real-world geographical distance function associated, while a numeric attribute could, for example, have a logarithmic distance function defined. Figure 4a illustrates the non-comparability of those two measures in a two-axis plot. The similarity search algorithm allows specifying weight parameters in the interval [0; 1] to balance the distance functions between different attributes, tackling this problem. Figure 4b illustrates how applying weights can scale the search space accordingly. However, no objective can be optimized to automatically determine the ideal set of weights for a query since it heavily depends on the data domain and the analysis task, rendering human feedback essential for parameter optimization.

We, therefore, identify three fundamental challenges: (1) the high-dimensional and interconnected results must be presented such that the analyst understands their meaning, mapping similarity to the spatial distances in the visualization, (2) the analyst must understand the influence of the parameters on the results, and (3) the interactive exploration of the parameter space must be possible to refine the parameters targeting the analysis goal.

Our proposed visual analytics system makes the similarity search model accessible in a comprehensive workspace, combining different views and panels to address the identified challenges.

3.1 The Similarity Search Backend
To avoid computationally-, time-, and storage-expensive operations in the frontend, our implementation splits the SimSearch system into frontend (2a) and backend (2b). The backend interfaces with the similarity search model, which is exposed via a REST API. The result of a request to the SimSearch API consists of (1) a ranked list of the top-𝑘 similar results together with (2) a 𝑘 × 𝑘 cross-similarity matrix, denoting the pair-wise similarities between every two entities. The raw results are cached by the backend application for later search queries with similar parameters. The results are then transformed from the 𝑛-dimensional attribute space down to the two-dimensional screen space and converted into a graph representation using a specified projection algorithm. We include different projection methods to achieve good results for varying search attributes and input parameters: for low-dimensional searches (𝑛 ≤ 4), the system supports PCA and MDS, based directly on the attribute values or the cross-similarity matrix, respectively. For higher-dimensional searches, UMAP [8] can provide fast and stable projections highlighting connections in the data while preserving its global topology. The decisive criterion for choosing the provided projection methods was their ability to derive a stable transformation under a changing set of input vectors. The cross-similarity matrix is filtered for its top 𝑘 values and converted into a list representation to reduce network load and computational complexity in the frontend. Both results, the projected graph and the cross-similarity list, are then cached and returned to the frontend application. Figure 2 shows the architectural details of the system, including data paths, caching, and the applied data transformations.

Figure 2: The SimSearch visual analytics system’s architecture, split into frontend (a) and backend (b) applications. Search queries are issued to the SimSearch engine, which returns (1) a table of the top-𝑘 ranked results and (2) a 𝑘 × 𝑘 cross-similarity matrix, encoding the pair-wise similarities between entities. The results are cached, filtered, projected, and transformed by the SimSearch visual analytics backend before delivering them to the SimSearch workspace frontend.

Caching Strategy and Requirements: The cache has a crucial impact on the system’s responsiveness, requiring the caching strategy to obtain the best possible balance between data topicality and system performance. Since this choice is strongly dependent on the frequency with which data evolution events occur in the data foundation, we tackle this challenge by occasionally querying the similarity search engine despite the results already being present in the cache. Since this strategy triggers a request of multiple similar parameter combinations, the results in the local search space are updated, maximizing the probability of future cache hits with the most recent data entities. The cache’s required storage space is negligible since, in a typical scenario, 𝑘 = 50 can be taken as a reasonable upper bound for the top-𝑘 results of interest. The storage consumption for a query grows linearly with 𝑘, except for the 𝑘 × 𝑘 cross-similarity matrix, which grows quadratically. Taking the upper bound of 𝑘 = 50, we can estimate its storage consumption as 50 · 50 · 64 bit = 160 000 bit = 20 kB.

3.2 The Similarity Search Workspace
To allow the interactive analysis of the SimSearch algorithm’s results and enable informed decision making during the parameter tuning process, our proposed similarity search workspace combines multiple components in a comprehensive user interface, shown in Figure 1.

Central Projection View (1a): After defining a search query and receiving the similarity search engine results, the analyst must understand (1) the connection between results and root query and (2) the pair-wise relationship between the results. The SimSearch workspace is built around a central projection view, mapping the 𝑛-dimensional data points to the two-dimensional screen space while preserving the distances between entities as well as possible. The search attributes are projected as an additional, virtual entity to set the result entities into relation with the specified search parameters.

Besides the spatial position of entities in the result space, the pair-wise relation between entities is essential to interpret connections and reveal proximities in the data that the projection could not preserve. Therefore, we indicate these relations by extracting the top 𝑘 values from the cross-similarity matrix and displaying them as links between the respective entities. The edges’ line width is proportional to the similarity between two entities, visually highlighting the most important connections.

Important information for each entity is attached directly to the projected node: the similarity rank is annotated persistently on each node, while the exact attribute combinations and similarity scores for each attribute can be displayed by hovering over an entity either in the projection view or in the tabular results view. By coloring the results according to their spatial position in the projection using a two-dimensional colormap, the entities are visually clustered and linked to the tabular results view.

Besides displaying the inter-linkage, we also apply k-means clustering to the projected points, reducing visual clutter by forming local groups and highlighting results with spatial proximity in the projection space instead of the attribute space. While the cross-similarity would ideally correspond with the k-means clusters in the 𝑛-dimensional space, this is not valid for the projected entities since not all information can be preserved in the projection. Therefore, the clustered entities can share similar attributes, which, at the same time, might diverge from the most similar entities denoted by the cross-similarity matrix. That is, entities might be close in only a subset of their attributes, causing them to be assigned to the same cluster, while the total similarity across all attributes might be vanishing, preventing their cross-similarity link from being strong enough to be displayed.

Tabular Result View (1b): Complementing the projection view, we include the tabular results view in the SimSearch workspace, showing the ranked entities together with their attribute set and the corresponding similarity scores. The table’s rows are linked to the nodes in the projection view, simultaneously highlighting a specific node in both views on mouse hover. By clicking the table header for one attribute column, the table can be re-ordered according to the values in that column, enabling the direct comparison between the individual similarity scores for each attribute.

Parameter Designer (1c): The parameter designer is the primary interface for specifying and refining search queries, projection settings, and weight parameters. Search attributes can be added from a list of all available attributes in the dataset, allowing the analyst to set a target value for each selected parameter. A slider attached to each attribute enables the analyst to set the attribute’s relative importance concerning all other defined attributes, giving full control over the balance between attributes and their corresponding distance function.

To diagnose the weight parameters’ influence on the result set, hovering over a weight slider triggers the projection and tabular result view to switch to the speculative execution state. In the speculative execution state, the views indicate the change in the result set under a speculative decrease and increase of the respective attribute weight. In the projection view, this is done by inserting the possible new positions of the entities under the changing projection, marking the results under a positive weight adjustment with a red outline and the results under a negative weight adjustment with a green outline. Complementing the projection, the tabular results view is extended by two additional columns, indicating the change in each result entity’s rank and marking entities that are descending from the top 𝑘 results, causing them to lose their place in the table.

Time-consuming search operations are executed speculatively before an actual user interaction is performed, enabling the iterative refinement of search parameters. When the user performs an action, and the resulting parameter combination causes a cache hit, the results can be delivered and visualized in real-time. Besides the increase in responsiveness, more and more discrete samples of the local search space are present in the cache with the ongoing analysis process. By setting the frequency of an entity in the latest result sets into relation with the total number of results, we derive a measure for an entity’s stability over the changing search parameters, as shown in Figure 3. The stability is then mapped to the node size in the projection view, with larger nodes indicating entities that appear more frequently in the recent result sets.

Figure 3: Cached subsets of the search space covered by three consecutive searches (Search A, Search B, Search C) with slightly changed parameters. The stability of the central entities is maximal, while the stability for the border-cases vanishes.

4 USE-CASES
We show the applicability and advantages of the proposed SimSearch workspace based on two exemplary use-cases. The first use-case (subsection 4.1) is hands-on and describes in detail how our proposed system can be used to reach the analysis goal, while the second use-case (subsection 4.2) demonstrates how our system can be applied to varying tasks and domains.

4.1 Assessing the Local Business Landscape
This use-case is based on a real-world, large-scale (≈ 120 GB) dataset containing information about companies in Italy.

In the use-case, a small company with ≈ 50 employees plans to expand, for which several potential new locations are considered. Since the company is dependent on the local infrastructure and other supplying companies, geographical proximity to those companies is an essential requirement. Simultaneously, the company wants to avoid direct local competition through other companies working in the same sector and having a similar corporate structure. Our proposed SimSearch workspace supports the search and interactive exploration of the potential company locations to fulfill the company’s requirements.

By specifying the attribute combinations in the parameter designer according to the desired or declined company profiles together with the considered company location, the local search space can be explored. The projection view reveals the most similar companies and indicates their pair-wise relationships, revealing communities and enabling the analyst to assess the most influential search attributes. In doing so, it becomes clear that the geolocation only has marginal influence on the search results, and the shown companies are too far away for a business relationship. Since the numerical search attributes, such as the number of employees, cannot be objectively compared to the geospatial company location, the weights in the parameter designer have to be iteratively refined to match the analyst’s understanding of each attribute’s desired influence on the results. Figure 4 shows how the weight adjustment helps to balance the different distance functions. By indicating the changes in the result set for a possible weight adjustment, the analyst can exploit the system’s speculative execution feature to observe changes in real-time and assess the most purposeful operation before the actual, time-consuming execution. Using the tabular results view, the analyst can verify the possible changes in detail by observing how each attribute’s ranking would change under the operation or if the company would be excluded from the result set. By iteratively refining the search parameters, the analyst can explore the search space ideally for each potential location, leading to well-informed decision making for the new location.

Figure 4: Similarity search results {𝑐0, 𝑐1, 𝑐2} and cross-similarities 𝑠(𝑐𝑎, 𝑐𝑏) with 𝑎, 𝑏 ∈ {0, 1, 2}. Panel (a) plots ∆ geolocation against ∆ num_employees; panel (b) plots w0 · ∆ geolocation against w1 · ∆ num_employees. Search attributes originating from different data domains render an objective comparison of the similarity scores impossible (a). By applying weightings {𝑤1, 𝑤2}, the analyst can adjust the distance functions according to their domain knowledge (b).

4.2 Mail Forwarding
This use-case is based on internal mail forwarding within a large company. Incoming postal mail is automatically opened and digitized, using an OCR system, on arrival at the company headquarters. The digitized mail item is then used as a search query on a structured database of the company’s customers, contracts, products, or projects to electronically forward the scanned document to the staff who are responsible for the task and need the document. This use-case requires both a robust search engine for retrieving database entries (e.g., contracts) containing keywords, names, or numerical values similar to the query document and semantic understanding of the content to weigh these attributes.

The structured data in the database consists of categorical attributes, person or item names, as well as spatial, temporal, numerical, or general ontological values. These may occur within the scanned document with different individual similarities as well as in many different combinations. Thus, the need arises to weigh these database attributes against each other to model the overall semantic similarity. This configuration of the search query is likely done by a human engineer with expert domain knowledge. One approach here is to use a set of example documents for evaluation, repeatedly querying for them and modifying the attribute weights until relevant database entries are found with high overall similarity to the query document, with less relevant entries being significantly dissimilar.

We propose to use SimSearch in this process to both see the overall similarity of the different database entries using the current configuration and identify clusters in the embedding space. The embedding method will be chosen to reflect the expert’s domain knowledge of semantically different and similar documents. Cross-similarities will show potential misclassifications. This allows adjusting the weights of the similarity search to increase the similarity to semantically relevant documents and separate them from semantically distinct ones.

5 DISCUSSION AND FUTURE WORK
While the presented similarity search workspace implements a variety of features and techniques to make the data search space and the model parameter space accessible to the analyst, possible extensions could further strengthen the system’s usefulness. Such extensions could include improvements to the search functionality and the explanation of results or the implementation of advanced guiding techniques. Furthermore, the presented approach could be generalized to other domains and tasks with a similar problem setting, i.e., where high-dimensional result entities of complex mining models have to be visualized, and the model must be refined to match a particular analysis task.

Extending the Search Functionality: Additional views could augment the existing visualizations with an abstract overview of possible actions and the resulting changes, enabling the analyst to identify possible changes at first glance before descending into detailed views. For example, an additional view visualizing all possible weight combinations probed by the speculative execution component and their likely outcomes could provide first hints where the region of interest might be located. Additional interestingness measures could augment the parameter designer’s weight sliders with information on the intervals corresponding with the most significant changes in the result set. Extending the interestingness feature, decision boundaries could be estimated by probing the search space in regions with a high gradient, providing a sensitivity analysis for each parameter.

Extending Guidance: The system currently provides orienting guidance to users, alerting them to similar weight configurations that produce significantly different search results. In addition to highlighting different possible weight settings, the system could actively propose user actions like moving weight sliders or switching to different projection methods. By analyzing and learning from user interactions, the system could identify the users’ preferences and provide suggestions adapted to their understanding of the domain and analysis task. By giving the system more initiative in the exploration process, the system should become both more effective and efficient to use.

Generalization as Visual Analytics Technique: There are several other problems in automated data mining pipelines with the same or a similar structure as the similarity search application addressed in the presented system, such as clustering, classification, or graph merging. Specifically, our approach can be generalized to understand, diagnose, and refine models where (1) the result is a number of 𝑛-dimensional entities with arbitrary distance functions associated, and (2) the outcome depends on a set of parameters whose influence on individual results is opaque.

Scalability: The system’s scalability is directly dependent on the underlying similarity search algorithm. Despite implementing various techniques (caching, speculative execution) to enable interactive visual analytics on the offline search algorithm, the similarity search model’s response time is the limiting factor for the approach. While response times of 1–30 s can be bridged by applying the implemented techniques, longer response times render an online analysis increasingly difficult since (1) non-ideal sampling points might have been chosen for speculative execution or (2) the analyst might change the search space context more rapidly than results can be preemptively queried and cached. The response times of the similarity search algorithm could be reduced by parallelizing the main stages of the algorithm, namely (1) generating a ranked list of results for each queried attribute and (2) compiling the ranked lists into a list of top-𝑘 results [11].

Limitations and Future Work: Currently, views of higher abstraction giving the analyst reference points on promising analysis directions are missing. We will tackle this issue by adding a third view to the similarity search workspace, showing all possible weight combinations in a matrix view and indicating the regions of the highest expected result change. Currently, the analyst has to evaluate the speculative changes in results manually by observing the predicted outcomes and comparing them across the different parameter combinations. In future versions, we will automatically highlight regions of interest using the number of changes in the result set for each combination as an interestingness measure. This functionality will be strengthened by implementing interactive, adaptive guidance. If one operation has significantly higher interestingness than others, it will be actively proposed as a possibly rewarding action. Furthermore, by tracking recent interactions of the user with the system, we will estimate the likelihood of future interactions based on the history, adapting the guidance to user preferences. Although the presented use-cases demonstrate our approach’s applicability in different real-world scenarios and data domains, a future user study will further validate the system’s usefulness and provide insights on both benefits and open challenges. Besides measuring quantitative criteria, such as task completion time, and comparing the analysis results to ground-truth data, an additional qualitative evaluation will expose additional user requirements and future

are coupled with the parameter refinement functionality, integrating the speculative results into their visual representation. We show our proposed similarity search workspace’s applicability and usefulness based on two use-cases, both anchored in real-world application examples and datasets.

ACKNOWLEDGEMENTS
This work has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 825041.

REFERENCES
[1] James F. Allen, Curry I. Guinn, and Eric Horvitz. 1999. Mixed-initiative interaction. IEEE Intelligent Systems and their Applications 14, 5, 14–23.
[2] Matthew Brehmer and Tamara Munzner. 2013. A multi-level typology of abstract visualization tasks. IEEE Trans. on Vis. and Comput. Graphics 19, 12, 2376–2385.
[3] Davide Ceneda, Theresia Gschwandtner, and Silvia Miksch. 2019. A Review of Guidance Approaches in Visual Data Analysis: A Multifocal Perspective. Comput. Graphics Forum 38, 3, 861–879.
[4] C. Collins, N. Andrienko, T. Schreck, J. Yang, J. Choo, U. Engelke, A. Jena, and T. Dwyer. 2018. Guidance in the human–machine analytics process. Visual Informatics 2, 166–180.
[5] Aristides Gionis, Piotr Indyk, and Rajeev Motwani. 1999. Similarity Search in High Dimensions via Hashing. In Proc. of the 25th Intl. Conference on Very Large Data Bases (VLDB ’99). San Francisco, CA, USA, 518–529.
[6] Daniel Keim, Jörn Kohlhammer, Geoffrey Ellis, and Florian Mansmann. 2010. Mastering The Information Age – Solving Problems with Visual Analytics. Eurographics Association.
[7] Sean D MacArthur, Carla E Brodley, Avinash C Kak, and Lynn S Broderick. 2002. Interactive content-based image retrieval using relevance feedback. Comput. Vision and Image Understanding 88, 2, 55–75.
[8] Leland McInnes, John Healy, and James Melville. 2018. UMAP: Uniform Manifold Approximation and Projection for Dimension Reduction. arXiv:stat.ML/1802.03426
[9] T. Mühlbacher, L. Linhardt, T. Möller, and H. Piringer. 2018. TreePOD: Sensitivity-Aware Selection of Pareto-Optimal Decision Trees. IEEE Trans. on Vis. and Comput. Graphics 24, 1, 174–183.
[10] S. Pajer, M. Streit, T. Torsney-Weir, F. Spechtenhauser, T. Möller, and H. Piringer. 2017. WeightLifter: Visual Weight Space Exploration for Multi-Criteria Decision Making. IEEE Trans. on Vis.
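The caching strategy described in Section 3.1, serving results from the cache while occasionally re-querying the engine on a hit, can be sketched as follows. This is a minimal illustration, not the authors' implementation: the class name `RefreshingCache`, the `refresh_probability` parameter, and the synchronous refresh are assumptions, since the paper does not specify how often or how the refresh is triggered. The helper `cross_similarity_bytes` simply reproduces the 𝑘 × 𝑘 storage estimate for 64-bit values.

```python
import random
from typing import Any, Callable, Dict, Hashable, Optional


class RefreshingCache:
    """Result cache that occasionally re-queries the engine despite a hit.

    Hypothetical sketch: the refresh policy (a fixed probability per hit)
    is an assumption made for illustration only.
    """

    def __init__(
        self,
        query_engine: Callable[[Hashable], Any],
        refresh_probability: float = 0.1,
        rng: Optional[random.Random] = None,
    ) -> None:
        if not 0.0 <= refresh_probability <= 1.0:
            raise ValueError("refresh_probability must lie in [0, 1]")
        self._query_engine = query_engine
        self._refresh_probability = refresh_probability
        self._rng = rng or random.Random()
        self._store: Dict[Hashable, Any] = {}
        self.engine_calls = 0  # counts how often the slow engine was queried

    def get(self, key: Hashable) -> Any:
        hit = key in self._store
        # random() lies in [0, 1): probability 0 never refreshes, 1 always does.
        refresh = hit and self._rng.random() < self._refresh_probability
        if not hit or refresh:
            self._store[key] = self._query_engine(key)
            self.engine_calls += 1
        return self._store[key]


def cross_similarity_bytes(k: int, bits_per_value: int = 64) -> int:
    """Storage of a k x k cross-similarity matrix in bytes."""
    return k * k * bits_per_value // 8
```

A key would typically be a hashable description of the query, for example a sorted tuple of attribute and weight pairs. A higher `refresh_probability` trades responsiveness for data topicality, which is the balance the caching paragraph refers to.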
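The link extraction described for the projection view (keeping the top 𝑘 values of the cross-similarity matrix and drawing them as edges whose width is proportional to the similarity) might look like the sketch below. The function name `top_k_links`, the `max_width` parameter, and the choice to scale widths relative to the strongest link are illustrative assumptions; the sketch also assumes a symmetric matrix with non-negative similarities and ignores the diagonal.

```python
from typing import List, Sequence, Tuple

Link = Tuple[int, int, float, float]  # (entity i, entity j, similarity, line width)


def top_k_links(
    matrix: Sequence[Sequence[float]], k: int, max_width: float = 6.0
) -> List[Link]:
    """Return the k strongest pair-wise links of a symmetric similarity matrix."""
    n = len(matrix)
    # Only the upper triangle is needed: the matrix is assumed symmetric and
    # the diagonal (self-similarity) carries no link information.
    pairs = [(matrix[i][j], i, j) for i in range(n) for j in range(i + 1, n)]
    pairs.sort(key=lambda pair: pair[0], reverse=True)
    strongest = pairs[:k]
    if not strongest:
        return []
    top = strongest[0][0]
    return [
        (i, j, sim, max_width * sim / top if top > 0 else 0.0)
        for sim, i, j in strongest
    ]
```

Converting the matrix to such a short list in the backend is also what reduces the payload sent to the frontend, since at most 𝑘 entries are transferred instead of 𝑘 × 𝑘.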
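One possible reading of the stability measure is sketched below. The paper only states that an entity's frequency in the latest result sets is set into relation with the total number of results; the sketch assumes this means the fraction of recent result sets that contain the entity. The function names and the radius range used for the node-size mapping are hypothetical.

```python
from collections import Counter
from typing import Dict, Hashable, Iterable, Sequence


def entity_stability(
    recent_result_sets: Sequence[Iterable[Hashable]],
) -> Dict[Hashable, float]:
    """Fraction of recent result sets in which each entity appears (assumed definition)."""
    if not recent_result_sets:
        return {}
    counts: Counter = Counter()
    for result in recent_result_sets:
        counts.update(set(result))  # count each entity at most once per result set
    total = len(recent_result_sets)
    return {entity: count / total for entity, count in counts.items()}


def node_radius(stability: float, min_radius: float = 4.0, max_radius: float = 12.0) -> float:
    """Map a stability in [0, 1] linearly to a node radius (illustrative range)."""
    return min_radius + stability * (max_radius - min_radius)
```

Under this reading, an entity returned by every recent query has stability 1 and is drawn largest, while border cases that appear in only one of the searches shrink towards the minimum radius.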
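To make the effect of the weights concrete, the following sketch combines a geographical distance with a logarithmic distance on the number of employees and ranks two made-up companies under two weight settings. It is an illustration under stated assumptions: the actual SimSearch scoring function is not given in the paper, so the weighted sum, the haversine distance in kilometres, the base-10 logarithm, and all example values are hypothetical. Only the attribute names `geolocation` and `num_employees` and the weight interval [0; 1] are taken from the text.

```python
import math
from typing import Dict, List, Tuple

Entity = Dict[str, object]


def haversine_km(a: Tuple[float, float], b: Tuple[float, float]) -> float:
    """Great-circle distance between two (latitude, longitude) pairs in km."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (
        math.sin((lat2 - lat1) / 2) ** 2
        + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2
    )
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def log_distance(a: float, b: float) -> float:
    """Distance between two positive numbers on a log10 scale."""
    return abs(math.log10(a) - math.log10(b))


def weighted_distance(query: Entity, entity: Entity, weights: Dict[str, float]) -> float:
    """Weighted sum of per-attribute distances (assumed combination rule)."""
    if any(not 0.0 <= w <= 1.0 for w in weights.values()):
        raise ValueError("weights must lie in [0, 1]")
    geo = haversine_km(query["geolocation"], entity["geolocation"])
    size = log_distance(query["num_employees"], entity["num_employees"])
    return weights["geolocation"] * geo + weights["num_employees"] * size


def rank(query: Entity, entities: Dict[str, Entity], weights: Dict[str, float]) -> List[str]:
    """Entity names ordered from most to least similar."""
    return sorted(entities, key=lambda name: weighted_distance(query, entities[name], weights))


# Made-up example data: a query near Milan and two candidate companies.
QUERY = {"geolocation": (45.4642, 9.1900), "num_employees": 50}
COMPANIES = {
    "near_but_large": {"geolocation": (45.5845, 9.2744), "num_employees": 500},
    "far_but_similar_size": {"geolocation": (41.9028, 12.4964), "num_employees": 55},
}
```

With a very small geolocation weight, the company of similar size ranks first even though it is far away; raising the geolocation weight and lowering the employee weight moves the nearby company to the top. Because kilometres and log-scaled head counts are not comparable units, no weight setting is objectively correct, which is why the weights are left to the analyst.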
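The speculative execution discussed here can be pictured as prefetching the results for neighboring weight configurations while the system would otherwise idle. The sketch below is a simplified illustration, not the system's actual scheduler: the fixed `step` size, the choice of one decrease and one increase per attribute, and the function names are assumptions, and `run_query` stands in for the slow call to the similarity search engine.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Dict, List, Tuple

Weights = Dict[str, float]
CacheKey = Tuple[Tuple[str, float], ...]


def neighbor_configs(weights: Weights, step: float = 0.1) -> List[Weights]:
    """Weight configurations reachable by one slider step down or up."""
    configs = []
    for name, value in weights.items():
        for delta in (-step, step):
            # Clamp to the valid weight interval [0, 1].
            candidate = min(1.0, max(0.0, round(value + delta, 6)))
            if candidate != value:
                configs.append({**weights, name: candidate})
    return configs


def freeze(weights: Weights) -> CacheKey:
    """Hashable, order-independent cache key for a weight configuration."""
    return tuple(sorted(weights.items()))


def prefetch(
    weights: Weights,
    run_query: Callable[[Weights], Any],
    cache: Dict[CacheKey, Any],
    step: float = 0.1,
    max_workers: int = 4,
) -> int:
    """Query all uncached neighbors in parallel; return how many were fetched."""
    pending = [c for c in neighbor_configs(weights, step) if freeze(c) not in cache]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for config, result in zip(pending, pool.map(run_query, pending)):
            cache[freeze(config)] = result
    return len(pending)
```

The two limitations named in the scalability discussion show up directly in such a scheme: a poorly chosen `step` samples the wrong neighbors, and if `run_query` takes longer than the analyst needs to move on, the prefetched entries arrive too late to produce cache hits.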
and Comput. Graphics 23, 1, 611–620. points for improvements of the system. [11] Kostas Patroumpas and Dimitrios Skoutas. 2020. Similarity search over en- riched geospatial data. In Proc. of the Sixth Intl. ACM SIGMOD Workshop on Managing and Mining Enriched Geo-Spatial Data. ACM. [12] Peter Pirolli and Stuart Card. 2005. The sensemaking process and leverage 6 CONCLUSION points for analyst technology as identified through cognitive task analysis. In Proc. of Intl. Conference on Intelligence Analysis, Vol. 5. McLean, VA, USA, 2–4. Applying complex data mining models to large data foundations [13] Dominik Sacha, Michael Sedlmair, Leishi Zhang, John A Lee, Jaakko Peltonen, introduces particular challenges to the analysis process. Both the Daniel Weiskopf, Stephen C North, and Daniel A Keim. 2017. What you see is what you can change: Human-centered machine learning by interactive parameter space and the search space might be opaque, requiring visualization. Neurocomputing 268, 164–175. manual probing to approach the regions of interest and, hence, [14] Dominik Sacha, Andreas Stoffel, Florian Stoffel, Bum Chul Kwon, Geoffrey Ellis, and Daniel A. Keim. 2014. Knowledge Generation Model for Visual rendering an interactive exploration impossible. Applying visual Analytics. IEEE Trans. on Vis. and Comput. Graphics 20, 12, 1604–1613. analytics, models, parameters, and results can be made accessible [15] Martin Schall, Dominik Sacha, Manuel Stein, Matthias O Franz, and Daniel A through interconnected visualizations, revealing hidden connec- Keim. 2018. Visualization-assisted development of deep learning models in offline handwriting recognition. In Symp. on Vis. in Data Science at IEEE VIS. tions between components and providing advanced mechanisms, [16] M. Sedlmair, C. Heinzl, S. Bruckner, H. Piringer, and T. Möller. 2014. Visual such as speculative execution, to enable the real-time exploration Parameter Space Analysis: A Conceptual Framework. IEEE Trans. on Vis. 
and of otherwise time-consuming data processing pipelines. Comput. Graphics 20, 12, 2161–2170. [17] Thomas Seidl and Hans-Peter Kriegel. 1997. Efficient user-adaptable similarity The presented system implements views and techniques to search in large multimedia databases. In VLDB, Vol. 97. 506–515. make the parameters and results of a novel similarity search [18] Fabian Sperrle, Jürgen Bernard, Michael Sedlmair, Daniel Keim, and Menna- tallah El-Assady. 2019. Speculative Execution for Guided Visual Analytics. algorithm accessible to the analyst. Specifically, we provide a arXiv:cs.HC/1908.02627v1 projected view of the search results, highlighting the similarity [19] Fabian Sperrle, Astrik Jeitler, Jürgen Bernard, Daniel A. Keim, and Mennatallah to the root query, the pair-wise similarity between the result en- El-Assady. 2020. Learning and Teaching in Co-Adaptive Guidance for Mixed- Initiative Visual Analytics. In EuroVis Workshop on Visual Analytics (EuroVA), tities, the stability of the results, as well as communities of close K. Vrotsou and C. Turkay (Eds.). The Eurographics Association. entities. The projected view is complemented with and linked to [20] Thilo Spinner, Udo Schlegel, Hanna Schäfer, and Mennatallah El-Assady. 2019. a tabular view of the results, indicating their rank and providing explAIner: A visual analytics framework for interactive and explainable ma- chine learning. IEEE Trans. on Vis. and Comput. Graphics 26, 1, 1064–1074. sorting functions on distinct attributes or their corresponding [21] Yizhou Sun, Jiawei Han, Xifeng Yan, Philip S. Yu, and Tianyi Wu. 2011. Path- similarity. Supporting parameter refinement and search space Sim: Meta Path-Based Top-K Similarity Search in Heterogeneous Information Networks. Proc. of the VLDB Endowment 4, 11, 992–1003. 
exploration, the system implements speculative execution on the [22] Thomas Torsney-Weir, Ahmed Saad, Torsten Moller, Hans-Christian Hege, time-consuming similarity search operation, presenting the user Britta Weber, Jean-Marc Verbavatz, and Steven Bergner. 2011. Tuner: Prin- with possible outcomes of parameter changes on-demand before cipled parameter finding for image segmentation algorithms using visual response surface exploration. IEEE Trans. on Vis. and Comput. Graphics 17, 12, actually performing an action. The projection and tabular views 1892–1901.
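
As a minimal sketch of the interestingness measure named under Limitations and Future Work, the number of changes in the result set can be counted per speculative weight combination and used to rank the combinations. The function names, the tuple encoding of weight combinations, and the use of the symmetric set difference as the change count are assumptions made for illustration; they are not taken from the SimSearch implementation.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical encoding: one weight per queried attribute.
Weights = Tuple[float, ...]


def result_changes(baseline: Sequence[str], speculative: Sequence[str]) -> int:
    """Count entries that enter or leave the result set.

    Assumption: a "change" is an entry present in exactly one of the two
    result sets; rank changes within the set are ignored.
    """
    return len(set(baseline) ^ set(speculative))


def rank_by_interestingness(
    baseline: Sequence[str],
    speculative_results: Dict[Weights, Sequence[str]],
) -> List[Tuple[Weights, int]]:
    """Order weight combinations by descending number of result changes."""
    scored = [
        (weights, result_changes(baseline, results))
        for weights, results in speculative_results.items()
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Toy data: current top-3 result ids and three speculative outcomes.
    current = ["a", "b", "c"]
    speculative = {
        (0.2, 0.8): ["a", "b", "d"],
        (0.5, 0.5): ["a", "b", "c"],
        (0.9, 0.1): ["d", "e", "f"],
    }
    for weights, changes in rank_by_interestingness(current, speculative):
        print(weights, changes)
```

A combination whose count clearly exceeds the others would be the natural candidate to highlight or to propose as a possibly rewarding action; a rank-aware measure could replace the set-based count if position changes inside the result list also matter.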