<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>International Conference on Design of Experimental Search &amp; Information REtrieval Systems, September</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Exploring Datasets via Cell-Centric Indexing</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Jef Heflin</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Brian D. Davison</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Haiyan Jia</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Computer Science &amp; Engineering, Lehigh University</institution>
          ,
          <addr-line>113 Research Dr., Bethlehem, PA, 18015</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Journalism and Communication, Lehigh University</institution>
          ,
          <addr-line>33 Coppee Dr., Bethlehem, PA, 18015</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2021</year>
      </pub-date>
      <volume>1</volume>
      <fpage>5</fpage>
      <lpage>18</lpage>
      <abstract>
        <p>We present a novel approach to dataset search and exploration. Cell-centric indexing is a unique indexing strategy that enables a powerful, new interface. The strategy treats individual cells of a table as the indexed unit, and combining this with a number of structure-specific fields enables queries that cannot be answered by a traditional indexing approach. Our interface provides users with an overview of a dataset repository, and allows them to efficiently use various facets to explore the collection and identify datasets that match their interests.</p>
      </abstract>
      <kwd-group>
        <kwd>cell-centric indexing</kwd>
        <kwd>dataset search</kwd>
        <kwd>exploratory interface</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>The twenty-first century has experienced an information explosion; data is growing exponentially and users’ information retrieval needs are becoming much more complicated [1]. Given people’s increasing interest in datasets, there is a need for user-friendly search services for data journalists, scientists, decision makers, and the general public to locate datasets that can meet their data needs.</p>
      <p>Even though users, under many circumstances, are not experts in the domain in which they search, they should be able to easily use such an application; the query process should be responsive and efficient. The result should provide a general picture of what the dataset is about, and offer enough information for the searcher to know how likely the dataset is to contain the data that they look for.</p>
      <p>Traditional database management systems group data by tables and then organize this data into rows and columns. When users are aware of the database schema, they can construct queries, but what if users are simply trying to find which tables in a large data lake are relevant to their needs? One approach is to simply index information about the table in various fields: e.g., title, description, columns, etc. While this approach may be sufficient for some queries, in many cases the user will not be able to determine if the table is a perfect match until they have downloaded the (potentially large) table.</p>
      <p>Studies have shown that understanding what is inside the content of a dataset, rather than simply the dataset descriptions and metadata, could be critical for users’ evaluation of whether any of the search results sufficiently matches the search need, especially for non-expert users. For instance, a recent user study [2] has revealed that query refinement, as a result of unsatisfying search results, is negatively associated with user experience with dataset search tools. What reduces the need for query refinement is a preview of the dataset content, which helps users gauge the relevance of the datasets. Similarly, an experimental study that explored novel dataset search engine prototypes found that interfaces with a content preview feature were perceived as more usable. In particular, non-expert users reported greater benefits from the content preview, as they rated the interfaces with higher levels of usefulness, ease of use, usability, and technology adoption intention than expert users did [3]. These findings indicate the strong need for understanding the actual content of datasets, even at the cell level.</p>
      <p>To enable sufficient query refinement for schema-optional queries, we present the novel concept of cell-centric indexing. The key idea is that we use individual cells of a table as the fundamental unit and build inverted indices on these cells. These indices provide different fields that index both the content of the cell and its context. For our purposes, the context includes other cell values in the same row, the name of the column (if available), and metadata about the containing dataset. This approach allows us to refine our search by row descriptors, column descriptors, or both at the same time. In essence we free the data from how it is structured, and schema information, when available, is merely one of the many ways to locate the data of interest. Thus, we take the view that, fundamentally, users are searching for specific data (i.e., particular cells or collections thereof), and the tables are merely artifacts of how the data is stored.</p>
      <p>We recognize that this approach also has downsides. In particular, an index of cells (and their contexts) will incur substantial storage overhead in comparison to an index of dataset metadata. Moreover, if the desired search result is one or more datasets, at run-time there will be additional processing to assemble the cell-specific results to enable retrieval and ranking at that level of granularity. However, our cell-centric approach gives us some additional flexibility, and we believe that good system design, appropriate data structures, and efficient algorithms can ameliorate the costs.</p>
      <p>This paper incorporates material previously presented in a poster [4] and a workshop [5]. The contributions of this paper are:</p>
      <p>• We propose cell-centric indexing as an innovative approach to an information retrieval system. A cell-centric index enables a user to find data without having to know the pre-existing structure of each table;</p>
      <p>• We describe the mechanisms of one implementation of a cell-centric dataset search engine, including the structure and method of data storage and querying of our server; and,</p>
      <p>• We describe a novel prototype interface that leverages cell-centric indexing in order to give users summaries of a dataset repository in terms of titles, content, and column names. The user can filter on any of these facets to generate more specific summaries.</p>
      <p>The rest of the paper is organized as follows: we first discuss related work, briefly describe the idea of cell-centric indexing and its advantages and disadvantages, introduce the structure of our server and the methodology involved in querying, and finally describe a prototype interface.</p>
      <sec id="sec-1-2">
        <title>2. Related Work</title>
        <p>Scholars have investigated exploratory search to help searchers succeed in an unfamiliar area by proposing novel information retrieval algorithms and systems; some of them propose innovative user interfaces, while others try to predict the user’s information need and use the prediction to better facilitate the subsequent interaction. Chapman et al. [6] have reviewed different approaches to dataset search. Google’s dataset search [7] is an example of a traditional approach to indexing web datasets: the system crawls the Web and indexes datasets that have metadata expressed in the schema.org (or a related) format. The only required properties are name and description.</p>
        <p>Derthick et al. [8] describe a visual query language that dynamically links queries and visualizations. The system helps a user to locate information in a multi-object database, illustrates the complicated relationships between attributes of multiple objects, and assists the user to clearly express their information retrieval needs in their queries. Similarly, Yogev et al. [9] demonstrate an exploratory search approach for entity-relationship data, combining an expressive query language, exploratory search, and entity-relationship graph navigation. Their work enables people with little to no query language expertise to query rich entity-relationship data.</p>
        <p>In the domain of web search, Koh et al. [10] devise a user interface that supports creativity in education and research. The model allows users to send their query to their desired commercial search engine or social platform in iterations. As the system goes through each iteration, it combines the text and image results into a composition space. Addressing a similar problem, Bozzon et al. [11] design an interactive user interface that employs exploratory search and Yahoo! Query Language (YQL) to empower users to iteratively investigate results across multiple sources.</p>
        <p>A tag cloud is a common and useful visualization of data that represents relative importance or frequency via size. Some researchers have adapted this idea to visualize query results. Fan et al. [12] focus on designing an interactive user interface with image clouds. The interface enables users to comprehend their latent query intentions and direct the system to form their personalized image recommendations. Dunaiski et al. [13] design and evaluate a search engine that incorporates exploratory search to ease researchers’ scouting for academic publications and citation data. Its user interface unites concept lattices and tag clouds to present the query result in a readable composition to promote further exploratory search. On the other hand, Zhang et al. focus their work on knowledge graph data [14]. They combine faceted browsing with contextual tag clouds to create a system that allows users to rapidly explore graphs with billions of edges by visualizing conditional dependencies between selected classes and other data. Although they don’t use tag clouds, Singh et al. [15] also display conditional dependencies in their data outline tool. For a given pivot attribute and set of automatically determined compare attributes, they show conditional values, grouped into clusters of interaction units.</p>
        <p>Other scholars have investigated query languages and models. Ianina et al. [1] concentrate on developing an exploratory search system that gives the user a way to conduct long text queries while minimizing the risk of returning empty results, since the iterative “query–browse–refine” process [16] may be time-consuming and require expertise. Meanwhile, Ferré and Hermann [17] focus more on the query language, LISQL, and offer a search system that integrates LISQL and faceted search. The system helps users to build complex queries and enlightens users about their position in the data navigation process.</p>
        <p>Yet another approach is to predict the user’s search intent so that better search results can be presented. Peltonen et al. [18] utilize negative relevance feedback in an interactive intent model to direct the search. Negative relevance feedback predicts the most relevant keywords, which are later arranged in a radar graph, where the center denotes the user, to represent the user’s intent. Likewise, Ruotsalo et al. [19] propose a similar intent radar model that predicts a user’s next query in an interactive loop. The model uses reinforcement learning to control the exploration and exploitation of the results.</p>
      </sec>
      <sec id="sec-1-3">
        <title>3. Cell-Centric Indexing</title>
        <p>We define a table as T = ⟨l, H, V⟩ where l is the label of the table, H = ⟨h_1, h_2, ..., h_m⟩ is a list of the column headers, and V is an n × m matrix of the values contained in the table. v_i,j refers to the value in the i-th row and the j-th column, which has the heading h_j. We note that this model can be easily extended to include other metadata, as appropriate.</p>
        <p>A naïve approach to indexing a collection of datasets would be to simply treat each table as a document, and have separate fields for the label, column headings, and (possibly) values. When terms are used consistently and the user is familiar with the terminology, this may work well. However, this approach has several weaknesses:</p>
        <p>• Any query on values has lost the context of what column the value appears in and what identifying information might be present elsewhere in the same row. For example, a table that contains capitals like (Paris, France) and (Austin, Texas) is unlikely to be relevant to a query about “Paris Texas” but would otherwise match.</p>
        <p>• It is difficult to determine which new terms can be used to refine the query. Users would need to download some of the datasets and choose distinctive terms from the most relevant ones.</p>
        <p>• A user’s constraint could be represented in different tables in very different ways. If the user is looking for “California Housing Prices”, there may be a table with some variant of that name, there may be a “Real Estate Prices” table with rows specific to California, or there may be a “Housing Prices” table that has a column for each state, including California. A user should be able to explore the collection to see how the data is organized and what terminology is used.</p>
        <p>We have proposed cell-centric indexing as a novel way to address the problems above. Rather than treating the table as the indexed object, each datum (cell in the table) is an indexed object. In its simplest form, we propose four fields: content: the value of the cell; title: the label of the dataset the cell appears in; columnName: the header of the column the cell appears in; and rowContext: the values in all cells in the same row as the indexed cell. Formally, a cell value v_i,j from table T = ⟨l, H, V⟩ can be indexed with: content = v_i,j, title = l, columnName = h_j, and rowContext = v_i,1 ∪ v_i,2 ∪ ... ∪ v_i,m. This index would allow users to find all cells that have a column header token in common regardless of dataset, or all cells that appear in the same row as some identifying token, or to look for the occurrence of specific values in specific columns.</p>
        <p>However, in this form, users still need to know which keywords to use and which fields to use them in. A cell-centric index alone is not helpful to a user who is not already familiar with the collection of datasets. In order to support the user in exploring the data, we propose the abstraction of conditional frequency vectors (CFVs). Let I be a set of items, D be a set of descriptors (e.g., tags that describe the items), and P ⊆ I × D be a set of item and descriptor pairs ⟨i, d⟩. Let Q be a query, where P(Q) ⊆ P represents the pairs for only those items that match Q. Then a CFV for Q and P is a set of descriptor-frequency pairs where the frequency is the number of times that the corresponding descriptor occurs within P(Q): {⟨d, f⟩ | f = #{⟨i, d⟩ | ⟨i, d⟩ ∈ P(Q)}}. For cell-centric indexing, the items I are the set of all cells regardless of source dataset, and P_j pairs cells with terms from the j-th field. For example, if a cell c_5 was in a column titled “Real Estate Price”, then P_columnName includes the pairs ⟨c_5, real⟩, ⟨c_5, estate⟩, and ⟨c_5, price⟩. Typically, we sort the CFV in order of descending frequency.</p>
      </sec>
      <sec id="sec-1-4">
        <title>4. System Architecture</title>
        <p>The architecture of the system is depicted in Figure 1. At the core of our system is an Elasticsearch server. Elasticsearch [20] is a scalable, distributed search engine that also supports complex analytics. Our system has two main functions: 1) parse collections of datasets, map them into the fields of a cell-centric index, and send indexing requests to Elasticsearch; and, 2) given a user query, issue a series of queries to Elasticsearch and construct histograms (CFVs) for each field. The Query Processor translates our high-level query API into specific Elasticsearch queries, and assembles the results into CFVs.</p>
      </sec>
    </sec>
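The conditional frequency vector abstraction can be illustrated with a minimal in-memory sketch. This is our own Python rendering of the definition (the actual system computes CFVs via Elasticsearch aggregations, as described later); the cell and descriptor names are toy examples.

```python
from collections import Counter

def cfv(pairs, matching_items):
    """Conditional frequency vector: for the item/descriptor pairs whose
    item matches the query, count each descriptor and return the pairs
    sorted by descending frequency."""
    counts = Counter(d for i, d in pairs if i in matching_items)
    return counts.most_common()

# Toy pairs for the columnName field: cell c5 came from a column titled
# "Real Estate Price", so it pairs with each column-name token.
pairs = [("c5", "real"), ("c5", "estate"), ("c5", "price"),
         ("c6", "price"), ("c7", "city")]

# A query matching cells c5 and c6: "price" co-occurs with both.
print(cfv(pairs, {"c5", "c6"}))  # [('price', 2), ('real', 1), ('estate', 1)]
```

Sorting by descending frequency means the most informative refinement terms surface first, which is exactly what the histogram interface later displays.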
    <sec id="sec-2">
      <title>4.1. Index Definition</title>
      <p>In Elasticsearch, a mapping defines how a document will be indexed: what fields will be used and how they will be processed. In cell-centric indexing, the cell is the document, and our index must have fields that describe cells.</p>
    </sec>
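A cell-centric index definition of this kind can be sketched as the body of an Elasticsearch create-index request. This is a hedged illustration under our own naming: the custom "wordDelimiter" analyzer configuration shown here is our assumption about how such an analyzer could be composed from Elasticsearch's built-in `letter` tokenizer and `word_delimiter_graph` filter, not the paper's exact configuration.

```python
def cell_index_body():
    """Sketch of a cell-centric index body (settings + mappings) in the
    shape Elasticsearch's create-index API expects."""
    def text(analyzer):
        return {"type": "text", "analyzer": analyzer}

    return {
        "settings": {
            "analysis": {
                "analyzer": {
                    # Assumed composition: split at non-letters, and also
                    # at case transitions ("birthDate" -> "birth", "date").
                    "wordDelimiter": {
                        "type": "custom",
                        "tokenizer": "letter",
                        "filter": ["word_delimiter_graph", "lowercase"],
                    }
                }
            }
        },
        "mappings": {
            "properties": {
                "columnName": text("wordDelimiter"),
                "content": text("stop"),
                "contentNumeric": {"type": "double"},
                "rowContext": text("stop"),
                "title": text("stop"),
                "fullTitle": {"type": "keyword"},
                "tags": text("stop"),
                "notes": text("stop"),
                "organization": text("stop"),
                "setId": {"type": "keyword"},
            }
        },
    }

body = cell_index_body()
```

The keyword fields (fullTitle, setId) are deliberately left unanalyzed so that exact dataset identities survive intact.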
    <sec id="sec-3">
      <p>Our mapping is summarized in Table 1. In addition to the four fields mentioned in Section 3, we have fields for the fullTitle (used to identify which specific datasets match the query) and metadata such as tags, notes, organization, and setId. The setId allows us to distinguish between different datasets with the same title, and to get an accurate count of how many datasets match a query. Note that content is divided into two fields, content and contentNumeric, for reasons that will be described below. For each field, we give its type and, if applicable, the analyzer used to process text from the field.</p>
      <table-wrap id="table-1">
        <label>Table 1</label>
        <caption>
          <p>Index mapping: the type of each field and, if applicable, its analyzer.</p>
        </caption>
        <table>
          <thead>
            <tr><th>Field</th><th>Type</th><th>Analyzer</th></tr>
          </thead>
          <tbody>
            <tr><td>columnName</td><td>text</td><td>wordDelimiter</td></tr>
            <tr><td>content</td><td>text</td><td>stop</td></tr>
            <tr><td>contentNumeric</td><td>double</td><td>N/A</td></tr>
            <tr><td>rowContext</td><td>text</td><td>stop</td></tr>
            <tr><td>title</td><td>text</td><td>stop</td></tr>
            <tr><td>fullTitle</td><td>keyword</td><td>N/A</td></tr>
            <tr><td>tags</td><td>text</td><td>stop</td></tr>
            <tr><td>notes</td><td>text</td><td>stop</td></tr>
            <tr><td>organization</td><td>text</td><td>stop</td></tr>
            <tr><td>setId</td><td>keyword</td><td>N/A</td></tr>
          </tbody>
        </table>
      </table-wrap>
      <p>We use three types of fields: text, keyword, and double. Text fields are tokenized and processed by word analyzers, whereas keyword fields are indexed as is (without tokenization or further processing). Double fields are used to store 64-bit floating point numbers. Most of our fields are text fields, but contentNumeric is a double field, which allows it to store both integer and real numeric values, and both fullTitle and setId are keyword fields, since we want users to be able to view the complete name of the dataset in the result, and there is no need to parse setIds.</p>
      <p>All text fields require an analyzer, which determines how to tokenize the field and whether any additional processing is required. We use two built-in Elasticsearch analyzers: the stop analyzer divides text at all non-letter characters and removes 33 stop words (such as “a”, “the”, “to”, etc.). For most fields, we use the stop analyzer, but we use the wordDelimiter analyzer for the columnName field. In addition to dividing text at all non-letter characters, it also divides text at letter case transitions (e.g., “birthDate” is tokenized to “birth” and “date”). This analyzer does not remove stop words.</p>
      <sec id="sec-3-1">
        <title>4.2. Indexing a Dataset</title>
        <p>The system loads each dataset using the following process:</p>
        <p>1. Read the metadata, which can include title, tags, notes, and organization. If the original table is formatted as CSV, this data might be contained in a separate file in the same directory, or in a row of a repository index file. If the table is formatted using JSON, the metadata may be specified along with the content, and there may be many datasets described in a single file.</p>
        <p>2. Read the column headings ⟨h_1, h_2, ..., h_m⟩.</p>
        <p>3. For each row in the dataset:</p>
        <p>a) Read the row values ⟨v_i,1, v_i,2, ..., v_i,m⟩.</p>
        <p>b) Create rowContext by concatenating the values in the row. Note that, to avoid creating a different large context string for each value in the row, we create a single rowContext. This means that each value is also part of its own row context. This decision helps make the system more efficient. An additional efficiency consideration is that each value included in rowContext is truncated to its first 100 characters.</p>
        <p>c) Build an index request for each cell value v_i,j. If the content is numeric (integer or real), it is indexed in the contentNumeric field; otherwise it is indexed in the content field. The columnName field is indexed with the corresponding header h_j. The title is indexed twice: once as a tokenized field that can be used in queries, and again as a keyword field that preserves the order of the title and can be used to precisely identify the dataset the cell originated from. All other metadata fields are indexed in a straightforward way.</p>
      </sec>
    </sec>
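The loading process in Section 4.2 can be sketched as a generator that turns one table row into one document per cell. This is our own minimal Python rendering (function and variable names are ours); field names follow Table 1, and each rowContext value is truncated to 100 characters as described above.

```python
def cell_documents(title, headers, rows, metadata=None):
    """Yield one index document per cell of a table.

    A single rowContext string is built per row (so each value is also
    part of its own row context), with each value truncated to its
    first 100 characters.
    """
    metadata = metadata or {}
    for row in rows:
        row_context = " ".join(str(v)[:100] for v in row)
        for header, value in zip(headers, row):
            doc = {
                "title": title,        # tokenized, usable in queries
                "fullTitle": title,    # keyword, identifies the dataset
                "columnName": header,
                "rowContext": row_context,
                **metadata,            # tags, notes, organization, setId
            }
            # Numeric content goes to contentNumeric, all else to content.
            try:
                doc["contentNumeric"] = float(value)
            except (TypeError, ValueError):
                doc["content"] = str(value)
            yield doc

# A one-row toy table: "Kenya" is textual, 1 is numeric.
docs = list(cell_documents("2004 Olympic Medals",
                           ["Country", "Gold"],
                           [["Kenya", 1]]))
```

Each emitted dictionary corresponds to one index request against the mapping of Table 1.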
    <sec id="sec-4">
      <title>4.3. Querying the Index</title>
      <p>Our Query Processor takes a conjunctive, fielded query and returns a histogram for each response field. The response fields are fields that contain information that helps the user understand the characteristics of cells that match the query. Currently, the response fields are title, columnName, content, rowContext, and fullTitle. Given a query Q, the query process is:</p>
      <p>1. Issue query Q requesting term aggregations for title, columnName, content, rowContext, and fullTitle. Term aggregations are a feature of Elasticsearch that return a list of terms that appear in the selected documents, along with their frequencies, i.e., CFVs for Q.</p>
      <p>2. Calculate the min mn and max mx for matching contentNumeric data.</p>
      <p>3. Select a representative set S of matching numeric content by issuing a percentile query against the contentNumeric field that excludes the top and bottom 5 percent of the data.</p>
      <p>4. Calculate the mean μ and standard deviation σ of set S.</p>
      <p>5. If mn &lt; μ − 1.5σ and mx &gt; μ + 1.5σ (i.e., the numeric data is not particularly skewed), build an aggregation query for contentNumeric data using ranges calculated from μ and σ: the lowest range is [mn, μ − 1.5σ] and the highest is [μ + 1.5σ, mx], with 3 intermediate ranges of uniform size and the middle range [μ − 0.5σ, μ + 0.5σ]. If the data are skewed, the ranges are shifted appropriately.</p>
      <p>6. Issue an Elasticsearch histogram aggregation query with the calculated ranges. Treat each range as a content term, and insert these terms and their frequencies into the content CFV.</p>
      <p>7. Return CFVs for each response field.</p>
      <p>Much of the processing above allows the system to dynamically determine buckets for numeric content that provide a useful picture of its distribution. Unlike textual terms, numeric terms exhibit greater variability. Histograms built using distinct numeric strings are unlikely to have significant value. For example, “135”, “135.0” and “1.35E+2” are all equivalent, while many users might consider “135.0001” to be close enough. To address this, we create ranges over numeric values. Our approach computes the mean and standard deviation over the middle 90% of the data, thus removing the influence of outliers, and then specifies the buckets to have a width of one standard deviation, with one bucket centered over the mean. Once the histogram of numeric ranges is created, its data is merged with the content histogram to produce a single histogram that shows frequencies of textual terms and numeric ranges.</p>
    </sec>
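The bucket construction in steps 2–6 can be written out for the unskewed case (mn &lt; μ − 1.5σ and mx &gt; μ + 1.5σ): three middle buckets one standard deviation wide, centered on the mean, with the outer buckets absorbing the tails. A minimal sketch under our own naming:

```python
def numeric_ranges(mn, mx, mu, sigma):
    """Bucket boundaries for the contentNumeric histogram (unskewed
    case). Boundaries: mn, mu - 1.5*sigma, mu - 0.5*sigma,
    mu + 0.5*sigma, mu + 1.5*sigma, mx."""
    edges = [mn, mu - 1.5 * sigma, mu - 0.5 * sigma,
             mu + 0.5 * sigma, mu + 1.5 * sigma, mx]
    # Pair consecutive edges into (low, high) ranges.
    return list(zip(edges, edges[1:]))

# e.g. mean 10, standard deviation 2, observed min 0 and max 30:
print(numeric_ranges(0, 30, 10, 2))
# [(0, 7.0), (7.0, 9.0), (9.0, 11.0), (11.0, 13.0), (13.0, 30)]
```

Each resulting range would then be issued as one bucket of an Elasticsearch range aggregation and treated as a single content "term" in the merged histogram.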
    <sec id="sec-5">
      <p>An example of a typical use case is demonstrated using Figures 2–4. In this specific case, the user wants to find a dataset containing data on Kenya’s performance in the 2004 Athens Olympics. Initially, the user is presented with the graphs in Fig. 2. These histograms show the most frequent title and column terms in the collection of indexed datasets. However, the example histograms do not initially show anything regarding the Olympics. By using the “More Items” button at the bottom of the title histogram, the user can find the term Olympics and add it directly to their query. After this term is added, the screen changes to that shown in Fig. 3. The user can now look through all 4 histograms and decide which term best helps them get to their desired data. Once again using the “More Items” button, the term Athens can be found.</p>
      <sec id="sec-5-1">
        <title>5.1. Pre-query Histograms</title>
        <p>Before any search parameters are set, the user is shown two pre-query histograms that return up to the 50 most frequent title and column name tokens within the current repository (see Fig. 2). The column name and title histograms provide a good overview and are vital in allowing the user to explore the datasets without prior knowledge of their contents. The pre-query histograms are presented to the user when there are no active queries, such as when the page is initially loaded or when all queries have been deleted. Clicking on a histogram bar will automatically add the corresponding term to the query and generate the standard set of histograms.</p>
      </sec>
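A pre-query histogram of this kind corresponds to a terms aggregation with no query constraint. The request body below is our own sketch of such a call (names are ours, and we elide the detail that aggregating on an analyzed text field in Elasticsearch additionally requires fielddata or a keyword sub-field):

```python
def pre_query_histogram_request(field, size=50):
    """Sketch of an Elasticsearch request body for a pre-query
    histogram: the `size` most frequent tokens of `field` across all
    indexed cells, with no hits returned (aggregations only)."""
    return {
        "size": 0,                   # suppress individual hits
        "query": {"match_all": {}},  # no active query terms yet
        "aggs": {
            "histogram": {
                "terms": {"field": field, "size": size}
            }
        },
    }

req = pre_query_histogram_request("title")
```

The same shape, with `match_all` replaced by the user's conjunctive query, would produce the per-field result histograms of the next section.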
      <sec id="sec-5-2">
        <title>5.2. Results Histograms</title>
        <p>The standard screen displays the user’s current query and five histograms. Each histogram is associated with a field, and tokens are sorted in descending frequency of co-occurrence with the query. The length of a bar indicates how many cells match the query. As with the pre-query histograms, clicking on a bar adds the associated term to the query and generates a new set of result histograms. Below each histogram is an option to show more results in the histogram. Initially, each histogram presents the top 10 results; however, the top 25 results are pre-fetched, which allows the newly requested results to be automatically added to the histogram.</p>
        <p>Due to the connection between the count of matched cells and bar length, there is the possibility that the first bar will be significantly larger than all remaining bars, making them difficult to see or select. To combat this, we compare the counts of the two most frequent results. If the first result contains 10 times more hits than the second most frequent, we change the scale of the histograms to logarithmic, thus making it easier to visualize distinctions in skewed distributions.</p>
        <p>Figure 3 shows the response of our prototype interface to the query with title=“olympics”. It displays a CFV for each field as a histogram; the longer the red bar, the more frequently that term co-occurs with the query. As we can see, 318 datasets contain matches, and after “olympics,” the most common title word is “summer.” The most frequently occurring terms in the column names of matching cells are “RANK” and “attempts”. The content histogram combines terms with numeric ranges. In particular, the first, second, and fifth rows were all inserted by the numeric range processing (as described previously in Sect. 4.3). For this query, there are many cells with values between 0 and 4, and slightly fewer with values between 4 and 21. The next most common content values are the terms “olympics” and “summer.” Note that the figure does not show the histogram for full titles that corresponds to this query (but it is still part of the prototype interface). As discussed in the next paragraph, this histogram indicates how many matching cells are in each dataset.</p>
        <p>The user can refine their query and create new histograms by clicking on any terms in the result. For example, if the user clicks on “athens” in the content histogram (after scrolling down), the system will display a new set of histograms summarizing the datasets that have “athens” as a content field and “olympics” in the title; in other words, the query will be title=“olympics” and content=“athens”. For this refinement, we show the Full Title histogram (see Fig. 4). In this histogram, the bars represent the number of cells in a dataset that match the user’s query. The user can add a bar to the query to get specific information about the distribution of terms in the chosen dataset. Additionally, this enables the option to search for the dataset, which is accomplished using a Google query of the dataset’s full title.<sup>1</sup> The user can continue to explore the dataset collection by adding terms to and removing terms from the query.</p>
        <fig id="fig-4">
          <caption><p>Figure 4: Results of Full Title Histogram with query: title=olympics, content=athens.</p></caption>
        </fig>
      </sec>
    </sec>
    <sec id="sec-conclusion">
      <title>6. Conclusion</title>
      <p>We have proposed cell-centric indexing as an innovative approach to information retrieval over tabular datasets. Such indices support richer queries about tables that do not require the user to know the pre-existing structure of each table. They also provide the potential for new exploratory interfaces, and we describe one that gives users summaries of a dataset repository in terms of titles, content, and column names. The user can filter on any of these facets to generate more specific summaries. Future work will test the effectiveness of this novel approach in facilitating dataset searches, especially amongst non-expert users.</p>
    </sec>
    <sec id="sec-ack">
      <title>Acknowledgments</title>
      <p>This material is based upon work supported by the National Science Foundation under Grant No. III-1816325. Lixuan Qiu and Drake Johnson contributed to early drafts of this paper. We thank Alex Johnson, dePaul Miller, Keith Register, and Xuewei Wang for contributions to the system implementation.</p>
    </sec>
    <sec id="sec-6">
      <p><sup>1</sup>Many of our dataset collections do not have a URL recorded, which is why we do not simply link to the dataset as a result.</p>
    </sec>
    <sec id="sec-refs">
      <title>References</title>
      <p>[1] A. Ianina, L. Golitsyn, K. Vorontsov, Multi-objective topic modeling for exploratory search in tech news, in: A. Filchenkov, L. Pivovarova, J. Žižka (Eds.), Artificial Intelligence and Natural Language, Communications in Computer and Information Science, vol. 789, Springer, 2017, pp. 181–193.</p>
      <p>[2] H. Borchart, Effects of content preview on query refinement in dataset search, Senior Project Report, Cognitive Science Program, Lehigh University, Bethlehem, PA, 2021.</p>
      <p>[3] L. Miller, Facilitating dataset search of non-expert users through heuristic and systematic information processing, Honors Thesis, Cognitive Science Program, Lehigh University, Bethlehem, PA, 2020.</p>
      <p>[4] D. Johnson, K. Register, B. D. Davison, J. Heflin, An exploratory interface for dataset repositories using cell-centric indexing, in: Proceedings of the 2020 IEEE International Conference on Big Data (IEEE BigData 2020), 2020, pp. 5716–5718. Poster paper.</p>
      <p>[5] L. Qiu, H. Jia, B. D. Davison, J. Heflin, An architecture for cell-centric indexing of datasets, in: Proceedings of PROFILES’20: 7th International Workshop on Dataset PROFILing and Search, 2020, pp. 82–96. Held with ISWC 2020.</p>
      <p>[6] A. Chapman, E. Simperl, L. Koesten, G. Konstantinidis, L.-D. Ibáñez, E. Kacprzak, P. Groth, Dataset search: a survey, The VLDB Journal 29 (2020) 251–272.</p>
      <p>[7] N. Noy, M. Burgess, D. Brickley, Google dataset search: Building a search engine for datasets in an open Web ecosystem, in: Proceedings of The Web Conference, 2019, pp. 1365–1375.</p>
      <p>[8] M. Derthick, J. Kolojejchick, S. F. Roth, An interactive visual query environment for exploring data, in: Proceedings of the 10th Annual ACM Symposium on User Interface Software and Technology, UIST ’97, Association for Computing Machinery, New York, NY, USA, 1997, pp. 189–198. doi:10.1145/263407.263545.</p>
      <p>[9] S. Yogev, H. Roitman, D. Carmel, N. Zwerdling, Towards expressive exploratory search over entity-relationship data, in: Proceedings of the 21st International Conference on World Wide Web, WWW ’12 Companion, Association for Computing Machinery, New York, NY, USA, 2012, pp. 83–92. doi:10.1145/2187980.2187990.</p>
      <p>[10] E. Koh, A. Kerne, R. Hill, Creativity support: Information discovery and exploratory search, in: Proceedings of the 30th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR ’07, Association for Computing Machinery, New York, NY, USA, 2007, pp. 895–896. doi:10.1145/1277741.1277963.</p>
      <p>[11] A. Bozzon, M. Brambilla, S. Ceri, P. Fraternali, Liquid query: Multi-domain exploratory search on the web, in: Proceedings of the 19th International Conference on World Wide Web, WWW ’10, Association for Computing Machinery, New York, NY, USA, 2010, pp. 161–170. doi:10.1145/1772690.1772708.</p>
      <p>[12] J. Fan, D. A. Keim, Y. Gao, H. Luo, Z. Li, JustClick: Personalized image recommendation via exploratory search from large-scale Flickr images, IEEE Transactions on Circuits and Systems for Video Technology 19 (2008) 273–288.</p>
      <p>[13] M. Dunaiski, G. J. Greene, B. Fischer, Exploratory search of academic publication and citation data using interactive tag cloud visualizations, Scientometrics 110 (2017) 1539–1571.</p>
      <p>[14] X. Zhang, D. Song, S. Priya, J. Heflin, Infrastructure for efficient exploration of large scale linked data via contextual tag clouds, in: International Semantic Web Conference, Springer, 2013, pp. 687–702.</p>
      <p>[15] M. Singh, M. J. Cafarella, H. V. Jagadish, DBExplorer: Exploratory search in databases, in: E. Pitoura, S. Maabout, G. Koutrika, A. Marian, L. Tanca, I. Manolescu, K. Stefanidis (Eds.), Proceedings of the 19th International Conference on Extending Database Technology, EDBT 2016, Bordeaux, France, March 15-16, 2016, OpenProceedings.org, 2016, pp. 89–100. doi:10.5441/002/edbt.2016.11.</p>
      <p>[16] R. W. White, R. A. Roth, Exploratory Search: Beyond the Query-Response Paradigm, Synthesis Lectures on Information Concepts, Retrieval, and Services, Morgan &amp; Claypool Publishers, 2009. doi:10.2200/S00174ED1V01Y200901ICR003.</p>
      <p>[17] S. Ferré, A. Hermann, Semantic search: Reconciling expressive querying and exploratory search, in: International Semantic Web Conference, Springer, 2011, pp. 177–192.</p>
      <p>[18] J. Peltonen, J. Strahl, P. Floréen, Negative relevance feedback for exploratory search with visual interactive intent modeling, in: Proceedings of the 22nd International Conference on Intelligent User Interfaces, IUI ’17, Association for Computing Machinery, New York, NY, USA, 2017, pp. 149–159. doi:10.1145/3025171.3025222.</p>
      <p>[19] T. Ruotsalo, J. Peltonen, M. J. A. Eugster, D. Głowacka, P. Floréen, P. Myllymäki, G. Jacucci, S. Kaski, Interactive intent modeling for exploratory search, ACM Trans. Inf. Syst. 36 (2018). doi:10.1145/3231593.</p>
      <p>[20] C. Gormley, Z. Tong, Elasticsearch: The Definitive Guide: A Distributed Real-Time Search and Analytics Engine, O’Reilly Media, Inc., 2015.</p>
    </sec>
  </body>
  <back>
    <ref-list />
  </back>
</article>