Assessing Ontologies Usage Likelihood via Search Trends

Mattia FUMAGALLI (a), Tania BAILONI (b) and Fausto GIUNCHIGLIA (b)
(a) Conceptual and Cognitive Modeling Research Group (CORE), Free University of Bozen-Bolzano, Bolzano, Italy
(b) Department of Information Engineering and Computer Science (DISI), University of Trento, Italy

Abstract. The generation of high-quality and re-usable ontologies depends on effective methodologies aimed at supporting the crucial process of identifying the ontology requirements, in terms of the number of potential end-users and the end-users' informational needs. It is widely recognized that the exploitation of competency questions (CQs) plays an important role in this requirement definition phase. In this paper, we introduce a new general approach that exploits (web) search trends, and the huge number of searches that people make every day with web search engines, as a pivotal complementary source of information for identifying the informational needs of large numbers of end-users. To achieve this goal we use the "auto-suggest" results provided by search engines like Bing and Google as a goldmine of data and insights. We select a set of keywords to identify the ontology terminology, and we collect and analyze a large corpus of web search queries (WSQs) related to the selected set of keywords. In turn, we identify the search trends related to the collected WSQs and we show how the corpus of selected WSQs can be used to assess the usage likelihood of a selected ontology w.r.t. the identified (web) search trends. The experimental results are used to discuss the practical utility of the proposed approach.

Keywords. ontologies, web search, search trends, ontologies design, topic modeling, usage likelihood

Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). This paper was written under contract with the University of Trento.

1. Introduction

Ontologies are the main backbone structure of many semantic applications and are central to supporting semantic interoperability. Building useful, high-quality and re-usable ontologies is not a trivial task and mainly depends on effective methodologies aimed at supporting the process of ontology engineering [1]. In this respect, within the ontology requirements definition phase, which is essential for the ontology development life-cycle [2], the identification of the number of possible end-users of the ontology and of the end-users' informational needs is one of the most pivotal activities. Understanding how many users may need the ontology, and the users' view of the knowledge to be encoded in it, is key for defining the function of the ontology and enabling its re-usability.

In devising methodologies for ontology development, the key role of competency questions (CQs) for identifying ontology requirements is widely recognized [3,4]. CQs are specific questions about a given domain of information, and represent valuable information sources for checking the scope of the ontologies being developed.

In a similar spirit, the work presented in this paper aims at devising a methodology to exploit the huge number of searches that people make every day with large-scale cross-domain web search engines as a valuable source of information for the ontology requirements definition phase.

Nowadays people use web search engines to find information about almost everything.
An outstanding example is provided by Google (https://www.google.com/), which processes over 3.5 billion searches per day and 1.2 trillion searches per year worldwide (https://www.internetlivestats.com/google-search-statistics/). These searches provide insights about many different people's needs and can be analyzed and exploited in different ways. The words or strings of words that web search engine users type into a search box are called web search queries (WSQs). These queries are very different from the ones expressed in standard query languages. They are indeed often given in plain text with optional "search directives" (such as "and"/"or", with "-" to exclude), but they are not constrained by fixed syntax rules, as command languages with special parameters are [5].

WSQs can be grouped into three main broad categories [6], namely informational, navigational, and transactional (or "do", "know", "go" [7]). This classification was empirically derived through the analysis of the queries of some of the most used search engines [8] and shows how WSQs are real-world instances of different kinds of natural language questions, providing hints about different motivations and semantic needs.

WSQs encode people's search interest trends. They represent a goldmine of insights for today's knowledge engineers and can complement the role of CQs in identifying users' views and their semantic needs. Moreover, WSQs can provide statistically relevant information about the ontology usage likelihood, namely the number of possible end-users of the ontology. This aspect is central to supporting the re-usability of the ontology. Finally, WSQs can be easily collected by analyzing the suggested results of the most used search engines.

Given the lack of work on exploiting WSQs for ("data-driven") ontology assessment and development [9], we propose a new approach to support the ontology requirements specification phase. We select a set of keywords (KWDs) to identify the ontology terminology. We collect and analyze a large set of WSQs related to the selected keywords and, in turn, we identify the search trends related to the collected WSQs. As a final step, we show how the corpus of selected WSQs can be used to assess a given ontology w.r.t. the identified (web) search trends. The main contributions are:

• given a set of selected keywords, a procedure for gathering, processing and analyzing the WSQs provided by the auto-suggested results of search engines like Google and Bing (https://www.bing.com/) (Sections 2 and 3);
• given a set of selected WSQs, a machine-learning-based pipeline for identifying web search trends (Section 4);
• (the beginning of) a search-driven ontology assessment method, where we provide a first test by assessing the usage likelihood of 8 state-of-the-art (SoA) ontologies against a gold-standard data set, manually created by us, with around 8,000 WSQs, grouped by 36 input core keywords and associated to the 8 input SoA ontologies (Section 5).

The rest of the paper is organized as follows: Section 6 describes the related work and Section 7 discusses conclusions and future work.

2. Ontologies and Search Trends

Suppose that there is a need to capture the requirements, in terms of semantic needs and user view, of the knowledge to be encoded in an ontology used to search for information in the travel domain. This activity requires that knowledge engineers somehow take into account the end-users' needs, and it is necessary to ground the ontology building process [2,4].
A first central task consists in identifying the objectives and the purpose of the ontology from a user's point of view, namely determining the domain of interest and the scope. For instance, the ontology may be used to find the best flights in terms of price, or it may be used to find the least crowded places to go in a certain period, and so forth. This identification step involves the selection of a lexicon for the given ontology, namely a set of core keywords that will be central in the final application and will be used to identify the questions that users may run over the system. In the context of the travel domain, for instance, we may have keywords like booking, place, vacation, route, seat number, period, destination, hotel, price and guide.

The keywords for the determination of the ontology purpose are usually collected in two ways: i. from application-specific documents, usually by running knowledge extraction tools [10]; ii. directly from the suggestions of the possible end-users, usually through interviews. Another source of information for selecting the lexicon of an ontology, especially if this ontology will be implemented to support mainstream applications (e.g., e-commerce websites), can be the web search data offered by free online services like Google Trends (https://trends.google.com/trends). Given a domain of interest, these services can provide multiple suggestions and related keywords, relying on the huge number of searches that people run every day over mainstream search engines. Selecting the concepts of the ontology by using keyword suggestion tools like Google Trends helps in intercepting people's informational needs about a given selected topic. This way of deriving the ontology lexicon, while being complementary to the more traditional ones (see i. and ii. above), introduces two main advantages as well. Firstly, extracting keywords from search trends allows collecting new useful insights about the number of searches for a given term (for instance, it may be possible to give more weight to the keyword "travel", instead of "vacation", given the much higher number of searches about travels). Secondly, search trend suggestion tools provide a highly scalable means by which information about the domain of interest and scope of the ontology can be collected (notice that web searches are not made by experts, and these data should not be considered as a replacement of data gathered from domain experts).

Another central task in the identification of the requirements, which directly follows the keyword selection task, consists of identifying the questions that users may run over the ontology. As for the selection of the lexicon, the suggestion tools provided by mainstream search engines can play a pivotal role. Along with the competency questions (CQs) [3] provided by domain experts and possible end-users, insights about WSQs related to the selected keywords can be used as a complementary source of information to specify the semantic needs and user views of the knowledge to be encoded in the ontology, and to assess the usage likelihood of an ontology.
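To make the auto-suggest gathering idea concrete, the sketch below shows one way to collect suggested WSQs for a seed keyword. It relies on an unofficial, undocumented suggest endpoint (an assumption on our part; it is rate-limited and may change at any time) rather than on the tools actually used later in this paper, and the function name and modifier list are our own illustrative choices.

```python
# Sketch: collecting auto-suggest WSQs for a seed keyword.
# Assumption: the unofficial Google suggest endpoint below returns a JSON
# array of the form [query, [suggestion, ...]]; it is undocumented and
# may change, so this is an illustration, not the paper's pipeline.
import requests

def autosuggest(seed: str, modifiers=("what", "where", "can", "which")):
    """Return auto-suggested queries for `seed` combined with question modifiers."""
    suggestions = set()
    for mod in modifiers:
        resp = requests.get(
            "https://suggestqueries.google.com/complete/search",
            params={"client": "firefox", "q": f"{mod} {seed}"},
            timeout=10,
        )
        resp.raise_for_status()
        suggestions.update(resp.json()[1])
    return sorted(suggestions)

print(autosuggest("flights")[:5])  # queries of the kind shown in Table 1
```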
Table 1. Examples of WSQs for the "travel" domain

WSQ1  are flights still going to china
WSQ2  are flights cheaper on boxing day
WSQ3  when flights get delayed
WSQ4  can flights be cancelled due to snow
WSQ5  flights where you can choose seats
WSQ6  which flights allow pets
WSQ7  why flights to brazil are so expensive
WSQ8  what flights go from terminal 2 manchester

WSQs are essentially questions, at a conceptual level, that an ontology may be able to answer. For instance, the ontology may be used to address questions like "are flights still going to China?", or "can flights be cancelled due to snow?" (see Table 1 for more examples). Differently from competency questions, which are identified through interviews with domain experts and end-user brainstorming, WSQs are derived by analyzing the suggestion results of general (e.g., Google and Bing) or vertical search engines (e.g., the Library of Congress, https://www.loc.gov/, or Nuroa, https://www.nuroa.co.uk/). WSQs represent a significant corpus of information from which the users' view and the usage likelihood of the knowledge to be encoded in the ontology can be extracted. In this paper, we organize this corpus of information into (web) search trends, namely sets of weighted keywords derived from sets of web search queries. WSQs can then be used during the test workflow to assess the ontology w.r.t. some given (web) search trends.

Using WSQs to identify search trends and assessing ontologies w.r.t. search trends is an open research problem [11]. WSQs, besides being written in natural language, i.e., plain text, are indeed very noisy and not constrained by fixed syntax rules. We model this problem as a process whose main goal is to calculate the similarity of an ontology vocabulary w.r.t. a selected corpus of (web) search trends. The focus of our approach is mainly to support ontology engineers in properly verifying: a. whether an ontology can be used to address the semantic needs expressed by the identified search trends; b. the usage likelihood of the ontology.

3. From Keyword Selection to WSQs

Let us imagine that a knowledge engineer has to develop a semantic web application (i.e., a web service like an e-commerce website, a vertical search engine on a website, and so forth) and needs to determine the knowledge and information that are necessary for structuring the semantics of the data with an ontology. The first task she has to address is to identify a corpus of keywords and queries that concerns the target domain. Once the corpus is determined, she is able to develop the ontology from scratch or to select, from a number of existing ontologies, the most appropriate one for the application.

3.1. Gathering Data

Services that provide data and insights about search keywords and web search queries are pivotal means by which the initial data selection process can be facilitated. In what follows, through a running example, we describe how we addressed the task of gathering and analyzing WSQs given a set of keywords.

As a first step, in order to make the approach as general as possible, we focused on very broad domains of interest, and we investigated what the most searched and commonly used keywords over the web might be.
As input references we considered:

• the most commonly used entities in the Google Knowledge Graph (https://developers.google.com/knowledge-graph), namely the knowledge graph used by Google and its services to enhance search engine results;
• Rosch's [12] work on basic level categories, which provides a cognitive grounding for the selection of the most informative categories and related keywords;
• the Schema.org (https://schema.org/) vocabulary and core entities, itself related to the Google Knowledge Graph and one of the most used semantic resources for structuring and semantically enriching web document content.

After the analysis of these resources, we collected 36 keywords. 29 of them were identified within the common Schema.org types used by the Knowledge Graph Search API, namely: Action; Book Series; Book; Creative Work; Educational Organization; Event; Government Organization; Local Business; Movie Series; Movie; Music Album; Music Group; Music Recording; Offer; Organization; Periodical; Person; Place; Product; Recipe; Restaurant; Review; Sports Team; TV Episode; TV Series; Vehicle; Video Game Series; Video Game; Website. 7 of them were identified within Rosch's basic level categories, namely: Bird; Clothing; Fish; Fruit; Furniture; Tool; Tree. Notice that we produced the final list with the purpose of covering a very broad set of common-sense everyday searches, without claiming to be exhaustive.

As a second step, starting from the selected keywords, we collected the corresponding WSQs. In order to gather these data we used two main tools, namely Google Trends and Answer The Public (https://answerthepublic.com/). Google Trends provides reliable and updated insights about Google searches, while Answer The Public offers a very easy-to-use (free) service to collect a good amount of information about web searches, using the Google and Bing APIs. Both of these services, given a set of keywords, provide the corresponding "most typed" WSQs. Moreover, Answer The Public organizes WSQs according to different criteria, or "modifiers". For instance, WSQs can be categorized per keyword, typology (e.g., 'questions' or 'comparison'), or type of adverb, like 'what' or 'where'. This organization can be used as a starting point for a fine-grained categorization of WSQs (for instance, WSQs can be grouped as 'temporal' or 'spatial', according to their corresponding modifiers). By merging and processing (e.g., we needed to delete some noisy or irrelevant WSQs) the results gathered with the above-introduced services, we collected around 8,000 WSQs. All these queries have been categorized in relation to the 36 input keywords listed above. Moreover, four main preliminary categories of queries, each grouping specific 'modifiers', have been identified (a sketch of this grouping in code follows the list):

• Questions, i.e., WSQs characterized by modifiers like 'how', 'are', 'what', 'where', 'who', 'which', 'will', 'when', 'can' and 'why';
• Prepositions, i.e., WSQs characterized by modifiers like 'for', 'near', 'to', 'with', 'without', 'is' and 'can';
• Comparisons, i.e., WSQs used to compare something with something else (e.g., iPhone vs. Samsung), characterized by modifiers like 'and', 'like', 'or', 'versus' and 'vs';
• Related and Alphabeticals, i.e., general WSQs mainly related to the reference keyword, which cannot properly be assigned to the categories above.
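A minimal sketch of this four-way grouping follows, under the assumption of a simple precedence rule (a leading question word first, then comparison and preposition words anywhere in the query); the function name and the 'general' fallback label are our own illustrative choices, since modifiers like 'can' and 'is' appear in more than one of the lists above.

```python
# Sketch: assigning a WSQ to one of the four categories above.
# The modifier sets mirror the lists in the text; the precedence rule
# is our own simplification of the Answer The Public organization.
QUESTIONS = {"how", "are", "what", "where", "who", "which", "will", "when", "can", "why"}
PREPOSITIONS = {"for", "near", "to", "with", "without", "is", "can"}
COMPARISONS = {"and", "like", "or", "versus", "vs"}

def categorize(wsq: str) -> tuple[str, str]:
    """Return (category, modifier) for a raw web search query."""
    tokens = wsq.lower().split()
    if tokens and tokens[0] in QUESTIONS:
        return "question", tokens[0]
    for tok in tokens:
        if tok in COMPARISONS:
            return "comparison", tok
        if tok in PREPOSITIONS:
            return "preposition", tok
    return "general", ""  # 'Related and Alphabeticals' fallback

print(categorize("which flights allow pets"))  # ('question', 'which')
print(categorize("website like youtube"))      # ('comparison', 'like')
```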
3.2. WSQ Data Set Generation

After grouping all the WSQs for all the selected keywords, we processed the data to decrease the noise. Notice that, at the current state, we performed this step manually, but we are evaluating solutions for automatic support (e.g., by means of ad hoc NLP techniques) as part of the future work.

As a first step, we checked every single WSQ independently of its categorization. We corrected queries with typos, and we dropped duplicated WSQs and WSQs written in languages other than English (the language we selected as reference). For instance, we dropped queries like 'periodical ka hindi' or 'website kaise banaye in hindi' that ask for information in Hindi.

Table 2. Example of WSQ outputs and categorization for the keyword "website"

Keyword   Category      Modifier   WSQ
website   question      are        are website expenses deductible
                                   are website terms of use required
                        can        can website access camera
                                   can website detect vpn
                        when       when website was created
                                   when website was published
                        which      which website to watch anime
                        who        who website belongs to
                                   who website is registered to
          preposition   for        website for photographers
                                   website for selling items
                        is         website is under maintenance
                                   website is not secure
                        with       website with free images
                                   website with games
          comparison    like       website like youtube
                                   website like airbnb
          general       -          website visitor counter
                                   website url
                                   neargroup website

As a second step, we assessed the WSQs' main categories and modifiers. Each single WSQ was checked according to its categorization, namely, we checked whether it was correctly categorized. Moreover, some groups of WSQs associated with specific modifiers were discarded because of their incompatibility with the ontology query answering capability. For instance, we dropped those WSQs that imply deep and complex processing, like those with modifiers such as 'how' and 'why' (e.g., 'how bird eat', 'why fruit is good for you'), and those that are pointless, like 'near me', 'furniture is' and 'person is dead'.

The main output is represented by Table 2 (above). Here we have a categorization of some WSQs for the keyword 'website', with related categories and modifiers. The table follows the structure of the categorization: from the general keyword to the specific WSQ.

Figure 1. WSQs dataset overview - number of WSQs per (a) category, (b) modifier, and (c) keyword

The distribution of the WSQs over the whole dataset, given their keywords, categories and modifiers, is shown in Figure 1. From the chart, it is possible to see the size of the selected keywords, categories and modifiers in terms of the number of associated WSQs. Notice that the number of queries in the different categorizations is quite unbalanced, this being motivated by the available extracted data (e.g., there are more examples of 'general' queries, and there are more examples of queries about 'TV series' than queries about 'trees'). Looking at Figure 1(a), the 'general' category is the broadest, with 67% of the total number of queries. Figure 1(b) shows the distribution of the queries in terms of modifiers. Modifiers providing comparisons are the most represented across the selected WSQs: 'with' and 'like', for instance, are the broadest groups, with 390 and 359 queries respectively, while modifiers like 'are' and 'can' (categorized as prepositions) have only 36 and 31 queries each. Figure 1(c) shows the distribution of the queries among the keyword classification. Most of the queries are related to 'TV series' (562), while 'action', for instance (clearly a much more abstract keyword), has only 39 related queries.
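Although the cleaning described in this subsection was performed manually, parts of it lend themselves to automation. The sketch below illustrates this, assuming the third-party langdetect package as a stand-in for the manual language check; all names are illustrative, and language detection on very short queries is known to be unreliable.

```python
# Sketch: an automated approximation of the manual cleaning step
# (deduplication, non-English removal, dropping 'how'/'why' queries).
# Assumption: langdetect (pip install langdetect) replaces the manual
# language check; it is noisy on short strings, so this is only a sketch.
from langdetect import detect, LangDetectException

DROPPED_MODIFIERS = {"how", "why"}  # imply processing beyond query answering

def clean(wsqs):
    seen, kept = set(), []
    for q in wsqs:
        q = q.strip().lower()
        if not q or q in seen:                 # drop duplicates
            continue
        seen.add(q)
        if q.split()[0] in DROPPED_MODIFIERS:  # e.g. 'how bird eat'
            continue
        try:
            if detect(q) != "en":              # e.g. 'website kaise banaye in hindi'
                continue
        except LangDetectException:
            continue
        kept.append(q)
    return kept

print(clean(["website like youtube", "website like youtube", "how bird eat"]))
```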
4. Search Trends Determination

The goal here is to extract web search trends from the selected WSQ dataset, where we assume that each set of WSQs associated with a selected keyword consists of a mixture of web search trends (WSTs), and a WST (from now on we use 'WST', 'trend' and 'search trend' interchangeably) is a set of weighted keywords. Loosely speaking, our basic assumption is that the semantics of search trends is somehow 'hidden' inside the multiple collected WSQs. As a result, the process of extracting web search trends consists in uncovering this hidden semantics, i.e., the trends that characterize a given set of WSQs.

In order to achieve this task we adopted one of the foundational techniques in topic modeling, namely the Latent Dirichlet Allocation (LDA) [13] approach. We applied LDA by using the Parallel Topic Model of the Mallet library (http://mallet.cs.umass.edu/diagnostics.php) [14] with the sparse LDA sampling scheme and data structure [15]. LDA is a generative probabilistic model that uses Dirichlet priors for the document-topic and word-topic distributions [16]. We considered this widely applied algorithm suitable for achieving our WST determination goal (in the context of the current assessment method set-up). Notice that we are aware of other topic-extraction algorithms and of recent developments of LDA (e.g., LDA + embeddings, like lda2vec [17]); however, the evaluation of other topic-extraction algorithms is out of the scope of this paper.

Adapting the formal description of LDA, we modeled a set of documents W = (w1, ..., wn), each document being a set of WSQs. Similarly, we modeled K as the set of all the keywords collected over all the documents in W, and Kw = (k1, ..., kn) as the set of keywords in a document w. Then, given a set of trends T = (t1, ..., tn), where a trend is a distribution over the keywords in K (e.g., Creative-Work = ⟨0.3 Movies, 0.4 Books, 0 Fruit, 0.2 Document, 0.1 Price⟩), we looked for: (i) the probability of each trend ti occurring in a document wi (from 0 to 1); (ii) the weight of each keyword ki for a given trend ti (from 0 to n).

Starting from the data set of WSQs described in Section 3, in order to identify a group of search trends we adopted the two following approaches:

(a) we manually grouped the WSQs associated with the source keywords into 5 broader documents, associating each of these documents with a trend and then calculating the weight of each keyword for the given trends (from now on we call this approach the "semi-automatic approach");
(b) we automatically identified 5 search trends from the complete list of WSQ documents (one per keyword) and then calculated the weight of each keyword for the given trends (from now on we call this approach the "(fully-)automatic approach").

In both cases (a. and b.) we achieved the goal by addressing the following steps: 1. we took a set of WSQs (associated with a keyword or an arbitrary group) as an input document; 2. we processed each document via an NLP pipeline that performs various steps, including: 2(a). tokenization; 2(b). lower-casing all characters; 2(c). filtering out stop-words; 2(d). finding the corresponding synonyms and hypernyms in WordNet (https://wordnet.princeton.edu/) for each extracted term; 3. we applied the LDA algorithm over the whole set of documents to extract the specific probability distribution for 5 topics and the weights of all the keywords (when the input documents are divided per keyword, the trend extraction is automatically derived; when the input documents are the 5 groups we manually defined, the trend extraction is mapped to the manual grouping).
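A compact stand-in for this pipeline is sketched below, assuming scikit-learn's LatentDirichletAllocation in place of Mallet's Parallel Topic Model; the WordNet expansion step 2(d) is omitted, and the in-line documents are tiny placeholders for the real per-keyword WSQ files.

```python
# Sketch: trend determination with scikit-learn instead of Mallet.
# Each document is the concatenation of the cleaned WSQs for one keyword;
# fit_transform yields (i) trend probabilities per document, and
# components_ yields (ii) keyword weights per trend.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

documents = {  # placeholders for the real per-keyword WSQ documents
    "website": "website like youtube website for photographers website with games",
    "flights": "can flights be cancelled due to snow which flights allow pets",
    "fruit":   "which fruit is good for you fruit with most vitamin c",
}

vectorizer = CountVectorizer(stop_words="english", lowercase=True)
X = vectorizer.fit_transform(documents.values())

lda = LatentDirichletAllocation(n_components=5, random_state=0)
doc_topics = lda.fit_transform(X)  # rows: documents, cols: P(trend_i), sum to 1

terms = vectorizer.get_feature_names_out()
for i, topic in enumerate(lda.components_):  # unnormalized keyword weights
    top = topic.argsort()[-5:][::-1]
    print(f"trend{i}:", [(terms[j], round(float(topic[j]), 2)) for j in top])
```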
Figure 2. Search trends visualization according to the keyword (kwd) weights generated by means of LDA

The semi-automatic grouping of the documents containing the lists of WSQs led to the identification of the following trends:

• trend0: fruit, fish, restaurant;
• trend1: offer, event, action;
• trend2: person, fish, bird, tree;
• trend3: creative-work, review, periodical, book, book-series, music-recording, movie-series, movie, tv-series, tv-episode, videogame-series, videogame, clothing, furniture, website, recipe, music-album, tool, vehicle, product, review;
• trend4: local-business, restaurant, place, music-group, government-organization, organization, sports-team, educational-organization.

The automatic extraction of 5 trends from the documents containing the lists of WSQs, grouped according to the reference keyword, generated the following correlations:

• trend5: product, review, action, movie, tv-episode, fruit;
• trend6: bird, video-game, local-business, offer, clothing, videogame-series, movie-series, fish, restaurant, tv-series, furniture;
• trend7: person, event, place, tool, recipe, vehicle;
• trend8: music-group, music-recording, music-album, sports-team, tree;
• trend9: website, government-organization, organization, book, creative-work, educational-organization, periodical, book-series.

Each identified trend, along with its top keywords, is shown in Figure 2. Each semi-automatically derived trend can be described as follows: trend0 is clearly about food products and facilities; trend1 is a more abstract trend, highly characterized by events, happenings, activities and other occurrences; trend2 groups the main categories of living beings; trend3 is clearly about media and creative works; finally, trend4 is heavily characterized by keywords about organizations. In turn, each automatically derived trend can be described as follows: trend5 is closely related to media and creative works, with a high focus on the keyword review; trend6 is again closely related to creative works, with a strong focus on the keyword series; trend7 is heavily related to the keyword place; trend8 is focused on music and music groups; and trend9 is clearly related to organizations and concrete media objects like books.

The first observation is that for the semi-automatically generated trends, as expected, we obtained very coherent groupings, which are influenced by the manual selection of the WSQs before the trend definition phase. The second observation is that the distribution of the weights over the keywords was more balanced for the automatically generated trends. This is because in the semi-automatic case we generated documents with huge amounts of WSQs, with multiple overlapping keywords, thus allowing the generation of a broader set of identifying keywords but, at the same time, limiting the "inclusiveness" of the trend (i.e., to be identified as related to one of these trends, the weight of the related keyword must be higher).

5. Assessment via Search Trends

After producing a data set of WSQs, given a set of selected input keywords, and after determining a set of WSTs, we are ready to select the ontologies to be assessed. At this point, we performed two phases.
Firstly (Section 5.1), we selected a set of SoA ontologies, and: i. we generated a corpus, i.e., what we call here the "core" data set, from each ontology via an NLP pipeline (as in Section 4); ii. we mapped the WSQs of the data set described in Section 3 to the selected ontologies, in order to generate a reference gold standard data set. Notice that the creation of the gold standard is not a mandatory step for the final assessment; however, for the sake of research, we generated this data set in order to better understand the efficiency of the approach we are proposing. Secondly (Section 5.2), we ran the assessment and compared the corpus derived from the ontology concepts with the trends generated through the processes (semi-automatic and automatic) described in Section 4.

Figure 3. An overview of the process for assessing ontologies via search trends

The results were then tested against the results provided by the gold standard data set derived by means of the previous phase. Figure 3 provides an overall view of the entire pipeline and shows how the above-mentioned phases are combined with the phases described in the previous sections. The blue boxes in the diagram represent the activities we run along the pipeline; the doc icons represent the related outputs. Activities and outputs grouped by the dotted line represent the phase dedicated to the gold standard generation (to be considered as optional). Notice that, as one of the contributions of this paper, the pipeline we implemented using the RapidMiner (http://www.rapidminer.com) framework and the annotated data set have been published free for research purposes (https://github.com/Matt-81/ontologies-by-trends).

5.1. Ontologies Core Data Set and Gold-standard Data Set Generation

As a first activity, we assessed most of the available ontologies by looking at the Linked Open Vocabularies (LOV) catalog (https://lov.linkeddata.es/dataset/lov) and other sources (see for instance Datahub, https://datahub.io/). Following the running example introduced in the previous sections, we took into account both general-purpose and domain-specific ontologies, where the former are better at answering queries from a wide range of subjects, while the latter are better at answering queries on particular, domain-specific subjects. For the selection of the domain ontologies we considered the most specific keywords from the list defined in Section 3 (e.g., 'movie', 'tv episode', 'tv series' or 'local business' and 'offer'). At the end of the analysis process we identified four generic (or general-purpose) and four domain-specific ontologies, namely: Schema.org, OpenCyc (https://old.datahub.io/dataset/opencyc), SUMO (http://www.adampease.org/OP/), DBpedia (http://dbpedia.org/ontology/), GR (http://www.heppnetz.de/ontologies/goodrelations/v1), EBUCore (https://www.ebu.ch/metadata/ontologies/ebucore/index.html), BioTop (http://biotopontology.github.io/) and MO (http://musicontology.com/).

Table 3. Distribution of WSQs over ontologies

(a) Matches per ontology        (b) WSQs per number of mapped ontologies

Ontology     Matches no.        Ontologies no.   WSQs no.
Schema.org   5332               0                1097
DBpedia      2919               1                3247
OpenCyc      1095               2                1323
SUMO         1169               3                1503
BioTop       96                 4                298
GR           172                5                86
EBUCore      283                6                7
MO           491                7                0

After the identification of the candidate ontologies, we addressed the mapping with the given input WSQs to generate the gold standard data set (for the gold standard generation task, a master's student and a postdoctoral researcher in computer science, with high expertise in knowledge engineering, were involved).
We analyzed every single ontology using Protégé (https://protege.stanford.edu/), in order to check whether each ontology contains concepts or properties that can be mapped to the input keywords coming from every single WSQ. The modifiers associated with the WSQs were particularly useful to check for object or data properties. For instance, the WSQ 'product like kindle', asking for products that are similar to the Kindle product, can be mapped into Schema.org with the property 'isSimilarTo', which has both domain and range in the classes 'Product' and 'Service'. This WSQ can also be mapped into DBpedia, OpenCyc and GR for similar reasons. To give another example, we mapped the query 'who organization members', which asks for the members of the WHO (World Health Organization), to Schema.org, because of the property 'member', which has its domain in the class 'Organization' and its range in the 'Organization' and 'Person' classes. This mapping is feasible also for DBpedia, thanks to the property 'organisation member', which has its domain in the class 'Organisation' and its range in 'Organisation member'. The same can be said for the other ontologies, i.e., OpenCyc, SUMO and EBUCore, where the mapping is supported by 'member' properties.

At the end of the mapping process, we generated a data set with very fine-grained information about which ontology (among the analyzed ones) can be used to answer the collected WSQs. The main insight is that 85% of the collected WSQs have a match in at least one of the selected ontologies. Table 3 shows the number of matches found for each ontology (a), and the number of queries grouped by the number of different ontologies mapped (b). Firstly, as expected, it can be noticed that, considering the distribution of the matches across the ontologies, the general-purpose ontologies (i.e., Schema.org, DBpedia, OpenCyc and SUMO) have the highest coverage, and Schema.org is the topmost in the ranking. Moreover, it can be observed that 1097 suggestions (15%) cannot be matched. The majority of the queries (3247 WSQs, 43%) have only one match; however, many of them can be mapped over two or three ontologies (20% and 17%, respectively). It can be further noticed that the maximum number of ontologies with a "query match" is six, thus no suggestion can be found in all the eight ontologies studied. Similarly, the analysis of the mapping of the WSQs grouped by the keyword categorization highlighted differences in the distribution of the matches. For instance, all the WSQs with the keyword 'music recording' found a match, while less than half of those with the keyword 'clothing' can be answered using the selected ontologies. Analyzing the distribution of the WSQs, grouped by keyword, over the ontologies was helpful to understand the "coverage" of each keyword among the ontologies (i.e., which ontology maps a specific keyword); in fact, some keywords have matches in only one ontology (i.e., 'clothing', 'furniture', 'local business', 'movie series' and 'recipe'), while others are present in more than one (e.g., the keyword 'fruit' can be found in OpenCyc and SUMO).
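The mapping above was performed by hand in Protégé. As an illustration of how such label checks could be partially automated, the sketch below assumes the rdflib package and a local RDF dump of an ontology (the file name is hypothetical); it only tests lexical label overlap, a much weaker criterion than the property-level reasoning used in the manual annotation.

```python
# Sketch: a rough automated aid for the manual Protégé check, testing
# whether an ontology declares an entity whose rdfs:label matches a
# WSQ token or bigram. Assumes rdflib; 'schemaorg.ttl' is a hypothetical
# local copy of one of the selected ontologies.
from rdflib import Graph
from rdflib.namespace import RDFS

g = Graph()
g.parse("schemaorg.ttl")  # hypothetical local ontology dump

labels = {str(o).lower() for o in g.objects(None, RDFS.label)}

def has_match(wsq: str) -> bool:
    """True if any token or bigram of the WSQ matches an ontology label."""
    tokens = wsq.lower().split()
    bigrams = [" ".join(b) for b in zip(tokens, tokens[1:])]
    return any(t in labels for t in tokens + bigrams)

print(has_match("product like kindle"))  # True if a 'product' label exists
```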
As a final step, we extracted the core data set and the gold standard data set, where: i. the former is generated from a set of text files, each one representing an ontology and collecting the related triples with class-property relations (e.g., 'Person'-'DomainOf'-'hasName'; 'Album'-'DomainOf'-'hasAuthor'); ii. the latter is generated from a set of text files, each one representing an ontology and collecting the list of WSQs that can be addressed by that ontology according to the above-described annotation task. We then ran the extraction of the two data sets by applying the same NLP pipeline we used to derive the web search trends (see Section 4), and, for each ontology, we generated a corpus of weighted keywords from the corresponding triples and a corpus of weighted keywords from the annotation.

5.2. Assessing Ontologies

The goal here is to run a preliminary experiment to assess ontologies w.r.t. a given set of search trends, and thus to try to infer their usage likelihood. In order to achieve this goal, we ran two main experimental trials. The first one was to evaluate the ontologies w.r.t. the semi-automatically generated web search trends, i.e., trend0, trend1, trend2, trend3, trend4. The second was to evaluate the ontologies w.r.t. the automatically generated web search trends, i.e., trend5, trend6, trend7, trend8, trend9. In both trials, the corpus derived directly from the ontologies was tested against the corpus generated from the gold standard.

The results of the first and second trials are shown in Table 4. In the leftmost column, the input corpus for each ontology is reported. The manually annotated gold standard data sets are denoted as "-gold"; the data sets generated directly from the ontologies are denoted as "-core". All the other columns report a confidence prediction value (from 0 to 1) for each reference trend. We take this confidence as the measure to assess the given input ontology (i.e., the probability of a trend occurring in an ontology, see Section 4).
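Under the same assumptions as the Section 4 sketch, these confidence values can be read as topic (trend) proportions inferred for each ontology corpus. A minimal, self-contained sketch follows; the corpora contents and names are illustrative stand-ins, not the real data set.

```python
# Sketch: scoring ontology corpora against learned trends.
# `wsq_docs` stands in for the per-keyword WSQ documents of Section 4;
# `ont_docs` stands in for the "-core"/"-gold" text files.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

wsq_docs = [
    "are flights still going to china can flights be cancelled due to snow",
    "website like youtube website for photographers website with games",
    "which fruit is good for you fish restaurant with a view",
]
ont_docs = {
    "mo-core": "album domainof hasauthor record domainof hastrack",
    "mo-gold": "who music album of the year website with free music",
}

vectorizer = CountVectorizer(stop_words="english")
X = vectorizer.fit_transform(wsq_docs)
lda = LatentDirichletAllocation(n_components=3, random_state=0).fit(X)

# Trend proportions for unseen documents: one row per ontology corpus,
# one column per trend; each row sums to 1 and plays the role of the
# confidence values reported in Table 4.
conf = lda.transform(vectorizer.transform(ont_docs.values()))
for name, row in zip(ont_docs, conf):
    print(name, [round(float(v), 2) for v in row])
```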
See for instance the example of DBpedia and Schema.org, which are associated with trends like trend0 , trend1 and trend9 . On the other hand, for what concerns the more specific and domain-oriented ontologies, it is possible to notice that there can be a mismatch between their target and the trends they are associated with. One familiar with the Music Ontology, for instance, would have expected a strong connection with trend3 , trend4 or trend8 . However, mo is associated with trend1 and trend9 (and that is very strong compared to the expected trends). While we are careful not to draw overly general conclusions from this preliminary experiment with a small set of ontologies, we still observe a few salient phenomena. First of all, the differences between the “-core” and “-gold” results, suggest a further analysis of how the “-core” corpora can be directly extracted from the ontologies. The challenge here is to properly identify the choices applied during the annotation process and the generation of the gold standard, and to devise the proper automatic solution to better simulate that choices. In the above experiment, for instance, we implemented the NLP pipeline we described in Section 4. More NLP options need to be considered and compared. The similar impact of the semi-automatic grouping and the automatic group- ing suggests, on the other hand, that: i. grouping manually WSQs does not necessarily imply a more understandable or clear-cut division of trends, rather it may bias the gener- ation process (the confidence values for the semi-automatically generated trends are in- deed more unbalanced, compared to the automatically generated trends; see, for instance trend1 vs. trend2 ); ii. the WSQs preprocessing phase, where WSQs may be ‘cleaned’ or ‘deleted’, has a central role in the generation of trends, thus multiple versions of WSQs text files, created after the keywords’ selection step, should be tested and compared. Fi- nally, a more efficient set-up for the generation of trends might be found, and differ- ent approaches should be tested and compared (see for instance the integration of LDA with embeddings [17]). We foresee these improvements of our setup as immediate future work. 6. Related work Our work is mainly related to the huge research effort in ontology (functional) evalua- tion [18,19,20,21]. More specifically, the work that most overlaps with our efforts is that on data driven and competency questions (CQs) ontology evaluation [9,3], where the main goal is to support the ontology development requirement specification phase and facilitating the reuse of these semantic data structures. This work has been extensive and has exploited a huge amount of methods and techniques including, e.g. OntoKeeper [22] and TONE [1] (the former being a semiotic-driven approach for assessing biomedical ontologies, the latter being a very high precision evaluation method based on the con- cepts semantic richness notion). Our work differs from this in two major respects. The first is that we ground our data-driven approach on information gathered from the search data coming from large-scale web search engines. The formalization and the experimen- tal set-up of our method are then heavily influenced by the nature of this kind of data source and, in particular, by the necessity of adapting and modeling this data in order to make them exploitable in the context of ontology evaluation. 
Our goal is to propose a practically useful method to extend the assessment and analysis opportunities that are currently available to knowledge engineers. The second difference, which is actually a consequence of the first, is that, by offering a method for exploiting web search data, we allow knowledge engineers to rely on a huge amount of valuable information that can be integrated with the information gathered by means of more traditional methods. This new, highly scalable approach to intercepting people's informational needs about specific domains will indeed help knowledge engineers to better understand the real potential of an ontology for a given domain of application.

As a last consideration, it is important to observe that the method we applied to identify web search trends has been widely studied in the context of topic modeling [23,24]. Our approach introduces a new perspective on the application of these techniques to the web search data scenario.

7. Conclusion and Future Work

In this paper, we have proposed a general approach for the assessment of ontologies according to (web) search trends, namely sets of weighted keywords derived from sets of web search queries. This, in turn, has allowed us to better understand how this assessment method can play a central role in the ontology engineering phase, in particular by supporting the identification of ontologies' usage likelihood.

Future work will concentrate on a more fine-grained implementation of the search trends determination phase and on an extension of the experimental set-up, in order to improve the understanding of the prediction results and to better simulate the results related to the gold-standard data sets. Another future goal is to generate a broad categorization of web search trends, considering a large number of search keywords and related domains of interest, and to assess most of the currently available ontologies w.r.t. these identified trends.

Acknowledgements. This work has been supported by the project "DELPhi - DiscovEring Life Patterns" funded by the MIUR Progetti di Ricerca di Rilevante Interesse Nazionale (PRIN) 2017 - no. 1062, 31.05.2019.

References

[1] Demaidi MN, Gaber MM. TONE: A Method for Terminological Ontology Evaluation. In: Proceedings of the ArabWIC 6th Annual International Conference Research Track; 2019. p. 1-10.
[2] De Nicola A, Missikoff M, Navigli R. A software engineering approach to ontology building. Information Systems. 2009;34(2):258-275.
[3] Bezerra C, Freitas F, Santana F. Evaluating ontologies with competency questions. In: 2013 IEEE/WIC/ACM International Joint Conferences on Web Intelligence (WI) and Intelligent Agent Technologies (IAT). vol. 3. IEEE; 2013. p. 284-285.
[4] Grüninger M, Fox MS. The role of competency questions in enterprise engineering. In: Benchmarking—Theory and Practice. Springer; 1995. p. 22-31.
[5] Mihajlovic V, Hiemstra D, Blok HE, Apers PM. Exploiting query structure and document structure to improve document retrieval effectiveness. Centre for Telematics and Information Technology (CTIT); 2006.
[6] Figueroa A. Exploring effective features for recognizing the user intent behind web queries. Computers in Industry. 2015;68:162-169.
[7] Gibbons K. Do, Know, Go: How to Create Content at Each Stage of the Buying Cycle. Search Engine Watch. 2014;24.
[8] Jansen BJ, Booth DL, Spink A. Determining the informational, navigational, and transactional intent of Web queries. Information Processing & Management. 2008;44(3):1251-1266.
[9] Brewster C, Alani H, Dasmahapatra S, Wilks Y. Data driven ontology evaluation. In: Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC'04); 2004.
[10] Al-Aswadi FN, Chan HY, Gan KH. Automatic ontology construction from text: a review from shallow to deep learning trend. Artificial Intelligence Review. 2019:1-28.
[11] McDaniel M, Storey VC. Evaluating Domain Ontologies: Clarification, Classification, and Challenges. ACM Computing Surveys (CSUR). 2019;52(4):1-44.
[12] Rosch E, Mervis CB, Gray WD, Johnson DM, Boyes-Braem P. Basic objects in natural categories. Cognitive Psychology. 1976;8(3):382-439.
[13] Blei DM, Ng AY, Jordan MI. Latent Dirichlet allocation. Journal of Machine Learning Research. 2003;3(Jan):993-1022.
[14] Newman D, Asuncion A, Smyth P, Welling M. Distributed algorithms for topic models. Journal of Machine Learning Research. 2009;10(Aug):1801-1828.
[15] Yao L, Mimno D, McCallum A. Efficient methods for topic model inference on streaming document collections. In: Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining; 2009. p. 937-946.
[16] Grinshpan AZ. An inequality for multiple convolutions with respect to Dirichlet probability measure. Advances in Applied Mathematics. 2017;82:102-119.
[17] Moody CE. Mixing Dirichlet topic models and word embeddings to make lda2vec. arXiv preprint arXiv:1605.02019. 2016.
[18] Hlomani H, Stacey D. Approaches, methods, metrics, measures, and subjectivity in ontology evaluation: A survey. Semantic Web Journal. 2014;1(5):1-11.
[19] Gangemi A, Catenacci C, Ciaramita M, Lehmann J. Modelling ontology evaluation and validation. In: European Semantic Web Conference. Springer; 2006. p. 140-154.
[20] Giunchiglia F, Fumagalli M. Entity Type Recognition - dealing with the Diversity of Knowledge. In: Seventeenth International Conference on Principles of Knowledge Representation and Reasoning; 2020.
[21] Giunchiglia F, Fumagalli M. Teleologies: Objects, actions and functions. In: International Conference on Conceptual Modeling. Springer; 2017. p. 520-534.
[22] Manion F, Liang C, Harris M, Wang D, He Y, Tao C, et al. OntoKeeper: Semiotic-driven ontology evaluation tool for biomedical ontologists. In: 2018 IEEE International Conference on Bioinformatics and Biomedicine (BIBM). IEEE; 2018. p. 1614-1617.
[23] Wallach HM. Topic modeling: beyond bag-of-words. In: Proceedings of the 23rd International Conference on Machine Learning; 2006. p. 977-984.
[24] Jelodar H, Wang Y, Yuan C, Feng X, Jiang X, Li Y, et al. Latent Dirichlet Allocation (LDA) and topic modeling: models, applications, a survey. Multimedia Tools and Applications. 2019;78(11):15169-15211.