Towards a Distributional Semantic Web Stack

André Freitas 1, Edward Curry 1, Siegfried Handschuh 1,2
1 Insight Centre for Data Analytics, National University of Ireland, Galway
2 School of Computer Science and Mathematics, University of Passau

Abstract. The capacity of distributional semantic models (DSMs) to discover similarities over large-scale, heterogeneous and poorly structured data makes them a promising universal and low-effort framework for supporting semantic approximation and knowledge discovery. This position paper explores the role of distributional semantics in the Semantic Web vision, building on state-of-the-art distributional-relational models and categorizing and generalizing existing approaches into a Distributional Semantic Web stack.

1 Introduction

Distributional semantics is based on the idea that semantic information can be extracted from lexical co-occurrence in large-scale data corpora. The simplicity of its vector space representation, its ability to automatically derive meaning from large-scale unstructured and heterogeneous data, and its built-in semantic approximation capabilities make distributional semantic models a promising way to bring additional flexibility into existing knowledge representation frameworks.

Distributional semantic approaches are being used to complement the semantics of structured knowledge bases, generating hybrid distributional-relational models. These hybrid models are built to support semantic approximation and can be applied to selective reasoning mechanisms, reasoning over incomplete knowledge bases (KBs), semantic search, schema-agnostic queries over structured knowledge bases, and knowledge discovery.

2 Distributional Semantic Models

Distributional semantic models (DSMs) are semantic models based on the statistical analysis of co-occurrences of words in large corpora. Distributional semantics allows the construction of a quantitative model of meaning, in which the degree of semantic association between different words can be quantified relative to a reference corpus. With the availability of large Web corpora, comprehensive distributional models can be built effectively.

A DSM is represented as a vector space model in which each dimension represents a context C, i.e. the linguistic or data context in which a target term T occurs. A context can be defined using documents, co-occurrence windows (a number of neighboring words or data elements) or syntactic features. The distributional interpretation of a target term is defined by a weighted vector of the contexts in which the term occurs, giving the term a geometric interpretation in a distributional vector space. The weights associated with the vectors are defined by a weighting scheme W, which re-calibrates the relevance of more generic or more discriminative contexts. A semantic relatedness measure S between two words in the dataset can be calculated using different similarity/distance measures, such as cosine similarity or Euclidean distance. As the dimensionality of the distributional space can grow large, dimensionality reduction approaches d can be applied. Different DSMs are built by varying the parameters of the tuple (T, C, W, d, S). Examples of distributional models include Latent Semantic Analysis, Random Indexing, Dependency Vectors and Explicit Semantic Analysis, among others. Distributional semantic models can be specialized to different application areas using different corpora.
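To make the tuple (T, C, W, d, S) concrete, the sketch below builds a minimal word-context DSM: contexts C are sliding co-occurrence windows, the weighting scheme W is positive pointwise mutual information (PPMI), and the relatedness measure S is cosine similarity (the dimensionality reduction step d is omitted). This is an illustrative sketch over a toy corpus, not a reference to any specific system discussed in this paper.

```python
import math
from collections import Counter, defaultdict

def cooccurrences(corpus, window=2):
    """Context model C: count co-occurrences within a +/- `window` word window."""
    cooc = defaultdict(Counter)
    for sentence in corpus:
        tokens = sentence.lower().split()
        for i, target in enumerate(tokens):
            for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
                if i != j:
                    cooc[target][tokens[j]] += 1
    return cooc

def ppmi_vectors(cooc):
    """Weighting scheme W: positive pointwise mutual information."""
    total = sum(sum(ctx.values()) for ctx in cooc.values())
    t_counts = {t: sum(ctx.values()) for t, ctx in cooc.items()}
    c_counts = Counter()
    for ctx in cooc.values():
        c_counts.update(ctx)
    vectors = {}
    for t, ctx in cooc.items():
        pmi = {c: math.log2((n * total) / (t_counts[t] * c_counts[c]))
               for c, n in ctx.items()}
        vectors[t] = {c: w for c, w in pmi.items() if w > 0}
    return vectors

def cosine(u, v):
    """Relatedness measure S: cosine similarity over sparse vectors."""
    dot = sum(w * v.get(c, 0.0) for c, w in u.items())
    norm = (math.sqrt(sum(w * w for w in u.values()))
            * math.sqrt(sum(w * w for w in v.values())))
    return dot / norm if norm else 0.0

corpus = ["the child plays in the garden",
          "the daughter plays in the garden",
          "a car drives on the road"]
vectors = ppmi_vectors(cooccurrences(corpus))
print(cosine(vectors["child"], vectors["daughter"]))  # high: shared contexts
print(cosine(vectors["child"], vectors["car"]))       # low: disjoint contexts
```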
3 Distributional-Relational Models (DRMs)

Distributional-Relational Models (DRMs) are models in which the semantics of a structured knowledge base (KB) is complemented by a distributional semantic model. A DRM is a tuple (DSM, KB, RC, F, H, OP), where: DSM is the associated distributional semantic model; KB is the structured dataset, with elements E and tuples Ω; RC is the reference corpus, which can be unstructured, structured or both, and which can be internal (based on the co-occurrence of elements within the KB) or external (a separate reference corpus); F is a map which translates each element $e_i \in E$ into a vector $\vec{e_i}$ in the distributional vector space $VS_{DSM}$, using the natural language label and the entity type of $e_i$; H is a set of threshold values for S above which two terms are considered equivalent; OP is a set of operations over $\vec{e_i}$ in $VS_{DSM}$ and over E and Ω in the KB. The set of operations may include search, query and graph navigation operations using the distance measure S.

A DRM supports a double perspective on semantics, keeping the fine-grained, precise semantics of the structured KB while complementing it with the distributional model. Two main categories of DRMs and associated applications can be distinguished.

Semantic Matching & Commonsense Reasoning: In this category the RC is unstructured and distinct from the KB; the large-scale unstructured RC is used as a commonsense knowledge base. Freitas & Curry [1] define a DRM (the τ-Space) for supporting schema-agnostic queries over a structured KB: terms used in the query are projected into the distributional vector space and are semantically matched with terms in the KB via distributional semantics, using commonsense information embedded in the large-scale unstructured corpus RC. In a different application scenario, Freitas et al. [3] use the τ-Space to support selective reasoning over commonsense KBs: distributional semantics is used to select the facts which are semantically relevant under a specific reasoning context, allowing the scoping of the reasoning process and coping with the incomplete knowledge of commonsense KBs. Pereira da Silva & Freitas [2] use the τ-Space to support approximate reasoning over logic programs. The sketch below illustrates the matching pattern shared by these approaches.
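The following is a minimal sketch of the semantic matching pattern in this category: query terms are projected into the distributional space (the role of F) and matched against KB term labels whenever their relatedness S exceeds a threshold from H. It is an illustrative approximation, not the τ-Space implementation; the `vectors` lookup (any DSM, e.g. the PPMI sketch above or pre-trained word vectors) and the threshold value 0.5 are assumptions.

```python
import math

def cosine(u, v):
    """Relatedness measure S over sparse term vectors."""
    dot = sum(w * v.get(c, 0.0) for c, w in u.items())
    norm = (math.sqrt(sum(w * w for w in u.values()))
            * math.sqrt(sum(w * w for w in v.values())))
    return dot / norm if norm else 0.0

def semantic_match(query_term, kb_terms, vectors, threshold=0.5):
    """Project a query term into VS_DSM (F) and return the KB terms whose
    relatedness S exceeds the threshold H, best matches first."""
    q = vectors.get(query_term.lower(), {})
    scored = ((t, cosine(q, vectors.get(t.lower(), {}))) for t in kb_terms)
    return sorted(((t, s) for t, s in scored if s >= threshold),
                  key=lambda m: m[1], reverse=True)

# Hypothetical usage: the query predicate "daughter" is matched against the
# labels of KB predicates, even though "daughter" never occurs in the KB.
# semantic_match("daughter", ["child", "spouse", "birthPlace"], vectors)
```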
Knowledge Discovery: In this category the structured KB itself is used as the distributional reference corpus (RC = KB). Implicit and explicit semantic associations are used to derive new meaning and to discover new knowledge. The use of structured data as a distributional corpus is a recurring pattern in knowledge discovery applications, where knowledge emerging from similarity patterns in the data can be used to retrieve similar entities and to expose implicit associations. In this context, the ability to represent the attributes of KB entities in a vector space, combined with the use of vector similarity measures as a way to retrieve and compare similar entities, can define universal mechanisms for knowledge discovery and semantic approximation. Novacek et al. [5] describe an approach that treats web data as a bottom-up phenomenon, capturing meaning that is not associated with explicit semantic descriptions, and apply it to entity consolidation in the life sciences domain. Speer et al. [4] propose AnalogySpace, a DRM over a commonsense KB which uses Latent Semantic Indexing to build the analogical closure of a semantic network through dimensionality reduction. AnalogySpace was used to reduce the sparseness of the KB, generalizing its knowledge and allowing users to explore implicit associations. Cohen et al. [6] introduce PSI, a predication-based semantic indexing approach for biomedical data, which was used for similarity-based retrieval and for the detection of implicit associations. The sketch below illustrates the dimensionality reduction pattern behind this category.
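As an illustration of the RC = KB case, the sketch below builds an entity-by-property matrix directly from KB triples and applies a truncated SVD, the dimensionality reduction step d used by AnalogySpace-style approaches, so that entities with similar latent profiles become close in the reduced space even when their explicit properties only partially overlap. The triples and all names are toy assumptions; this is not the AnalogySpace or PSI implementation.

```python
import numpy as np

# Toy KB triples (subject, predicate, object); the KB is its own corpus (RC = KB).
triples = [
    ("aspirin",     "treats", "headache"),
    ("aspirin",     "treats", "fever"),
    ("ibuprofen",   "treats", "headache"),
    ("ibuprofen",   "treats", "inflammation"),
    ("paracetamol", "treats", "fever"),
]

entities = sorted({s for s, _, _ in triples})
features = sorted({(p, o) for _, p, o in triples})
e_idx = {e: i for i, e in enumerate(entities)}
f_idx = {f: j for j, f in enumerate(features)}

# Entity x (predicate, object) matrix: KB attributes as distributional contexts.
M = np.zeros((len(entities), len(features)))
for s, p, o in triples:
    M[e_idx[s], f_idx[(p, o)]] = 1.0

# Truncated SVD (d): project entities into a k-dimensional latent space.
U, S, Vt = np.linalg.svd(M, full_matrices=False)
k = 2
E = U[:, :k] * S[:k]

def similarity(a, b):
    """Relatedness S between two entities in the reduced space."""
    u, v = E[e_idx[a]], E[e_idx[b]]
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Entities remain comparable even where their explicit properties differ,
# exposing implicit associations through the latent space.
print(similarity("aspirin", "ibuprofen"))
print(similarity("aspirin", "paracetamol"))
```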
4 The Distributional Semantic Web Stack

DRMs provide universal mechanisms with features that are fundamental for semantic systems: (i) built-in semantic approximation for terminological and instance data; (ii) the ability to use large-scale unstructured data as commonsense knowledge; (iii) the ability to detect emerging implicit associations in the KB; (iv) simplicity of use, supported by the vector space model abstraction; and (v) robustness with regard to poorly structured, heterogeneous and incomplete data. These features provide a framework for a robust and easy-to-deploy semantic approximation component grounded in large-scale data. Considering the relevance of these features for the deployment of semantic systems in general, this paper synthesizes its vision by proposing a Distributional Semantic Web stack abstraction (Figure 1), complementing the Semantic Web stack. At the bottom of the stack, unstructured and structured data can be used as reference corpora together with the target KB (RDF(S)). Different elements of the distributional model are included as optional and composable elements of the architecture. The approximate search and query operations layer accesses the DSM layer, supporting users with semantically flexible search and query operations. A graph navigation layer defines graph navigation algorithms (e.g. spreading activation, bi-directional search) using the semantic approximation and the distributional information from the layers below.

Fig. 1: (A) Depiction of an example DRM (τ-Space); (B) the Distributional Semantic Web stack.

Acknowledgment: This publication was supported in part by Science Foundation Ireland (SFI) (Grant Number SFI/12/RC/2289) and by the Irish Research Council.

References

1. Freitas, A., Curry, E.: Natural Language Queries over Heterogeneous Linked Data Graphs: A Distributional-Compositional Semantics Approach. In: Proc. of the 19th Intl. Conf. on Intelligent User Interfaces (IUI) (2014)
2. Pereira da Silva, J.C., Freitas, A.: Towards an Approximative Ontology-Agnostic Approach for Logic Programs. In: Proc. of the 8th Intl. Symposium on Foundations of Information and Knowledge Systems (2014)
3. Freitas, A., Pereira da Silva, J.C., Curry, E., Buitelaar, P.: A Distributional Semantics Approach for Selective Reasoning on Commonsense Graph Knowledge Bases. In: Proc. of the 19th Intl. Conf. on Applications of Natural Language to Information Systems (NLDB) (2014)
4. Speer, R., Havasi, C., Lieberman, H.: AnalogySpace: Reducing the Dimensionality of Common Sense Knowledge. In: Proc. of the 23rd Intl. Conf. on Artificial Intelligence, 548-553 (2008)
5. Novacek, V., Handschuh, S., Decker, S.: Getting the Meaning Right: A Complementary Distributional Layer for the Web Semantics. In: Proc. of the Intl. Semantic Web Conference, 504-519 (2011)
6. Cohen, T., Schvaneveldt, R.W., Rindflesch, T.C.: Predication-based Semantic Indexing: Permutations as a Means to Encode Predications in Semantic Space. AMIA Annu. Symp. Proc., 114-118 (2009)
7. Turney, P.D., Pantel, P.: From Frequency to Meaning: Vector Space Models of Semantics. J. Artif. Int. Res. 37(1), 141-188 (2010)