Towards a Distributional Semantic Web Stack

André Freitas 1, Edward Curry 1, Siegfried Handschuh 1,2
1 Insight Centre for Data Analytics, National University of Ireland, Galway
2 School of Computer Science and Mathematics, University of Passau

Abstract. The capacity of distributional semantic models (DSMs) to discover similarities over large-scale, heterogeneous and poorly structured data makes them a promising universal and low-effort framework for supporting semantic approximation and knowledge discovery. This position paper explores the role of distributional semantics in the Semantic Web vision, building on state-of-the-art distributional-relational models and categorizing and generalizing existing approaches into a Distributional Semantic Web stack.

1 Introduction

Distributional semantics is based on the idea that semantic information can be extracted from lexical co-occurrence in large-scale data corpora. The simplicity of its vector space representation, its ability to automatically derive meaning from large-scale unstructured and heterogeneous data, and its built-in semantic approximation capabilities make distributional semantic models a promising way to bring additional flexibility into existing knowledge representation frameworks.

Distributional semantic approaches are being used to complement the semantics of structured knowledge bases, generating hybrid distributional-relational models. These hybrid models are built to support semantic approximation and can be applied to selective reasoning mechanisms, reasoning over incomplete knowledge bases (KBs), semantic search, schema-agnostic queries over structured knowledge bases, and knowledge discovery.

2 Distributional Semantic Models

Distributional semantic models (DSMs) are semantic models based on the statistical analysis of co-occurrences of words in large corpora. Distributional semantics allows the construction of a quantitative model of meaning, in which the degree of semantic association between different words can be quantified relative to a reference corpus. With the availability of large Web corpora, comprehensive distributional models can be built effectively.

A DSM is represented as a vector space model in which each dimension represents a context C, i.e. the linguistic or data context in which a target term T occurs. A context can be defined using documents, co-occurrence windows (a number of neighboring words or data elements) or syntactic features. The distributional interpretation of a target term is defined by a weighted vector of the contexts in which the term occurs, giving the term a geometric interpretation in a distributional vector space. The weights associated with the vectors are defined by a weighting scheme W, which re-calibrates the relevance of more generic or more discriminative contexts. A semantic relatedness measure S between two words in the dataset can be calculated using different similarity/distance measures, such as cosine similarity or Euclidean distance. As the dimensionality of the distributional space can grow large, dimensionality reduction approaches d can be applied. Different DSMs are built by varying the parameters of the tuple (T, C, W, d, S). Examples of distributional models include Latent Semantic Analysis, Random Indexing, Dependency Vectors and Explicit Semantic Analysis, among others. Distributional semantic models can be specialized to different application areas using different corpora.
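To make the tuple (T, C, W, d, S) concrete, the sketch below builds a minimal word-context DSM: contexts C are sliding co-occurrence windows, the weighting scheme W is positive pointwise mutual information (PPMI), and the relatedness measure S is cosine similarity (the dimensionality reduction step d is omitted). This is an illustrative sketch over a toy corpus, not a reference to any specific system discussed in this paper.

```python
import math
from collections import Counter, defaultdict

def cooccurrences(corpus, window=2):
    """Context model C: count co-occurrences within a +/- `window` word window."""
    cooc = defaultdict(Counter)
    for sentence in corpus:
        tokens = sentence.lower().split()
        for i, target in enumerate(tokens):
            for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
                if i != j:
                    cooc[target][tokens[j]] += 1
    return cooc

def ppmi_vectors(cooc):
    """Weighting scheme W: positive pointwise mutual information."""
    total = sum(sum(ctx.values()) for ctx in cooc.values())
    t_counts = {t: sum(ctx.values()) for t, ctx in cooc.items()}
    c_counts = Counter()
    for ctx in cooc.values():
        c_counts.update(ctx)
    vectors = {}
    for t, ctx in cooc.items():
        pmi = {c: math.log2((n * total) / (t_counts[t] * c_counts[c]))
               for c, n in ctx.items()}
        vectors[t] = {c: w for c, w in pmi.items() if w > 0}
    return vectors

def cosine(u, v):
    """Relatedness measure S: cosine similarity over sparse vectors."""
    dot = sum(w * v.get(c, 0.0) for c, w in u.items())
    norm = (math.sqrt(sum(w * w for w in u.values()))
            * math.sqrt(sum(w * w for w in v.values())))
    return dot / norm if norm else 0.0

corpus = ["the child plays in the garden",
          "the daughter plays in the garden",
          "a car drives on the road"]
vectors = ppmi_vectors(cooccurrences(corpus))
print(cosine(vectors["child"], vectors["daughter"]))  # high: shared contexts
print(cosine(vectors["child"], vectors["car"]))       # low: disjoint contexts
```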
3 Distributional-Relational Models (DRMs)

Distributional-Relational Models (DRMs) are models in which the semantics of a structured knowledge base (KB) is complemented by a distributional semantic model. A DRM is a tuple (DSM, KB, RC, F, H, OP), where: DSM is the associated distributional semantic model; KB is the structured dataset, with elements E and tuples Ω; RC is the reference corpus, which can be unstructured, structured or both, and which can be internal (based on the co-occurrence of elements within the KB) or external (a separate reference corpus); F is a map which translates each element $e_i \in E$ into a vector $\vec{e_i}$ in the distributional vector space $VS_{DSM}$, using the natural language label and the entity type of $e_i$; H is a set of threshold values for S above which two terms are considered equivalent; OP is a set of operations over $\vec{e_i}$ in $VS_{DSM}$ and over E and Ω in the KB. The set of operations may include search, query and graph navigation operations using the distance measure S.

A DRM supports a double perspective on semantics, keeping the fine-grained, precise semantics of the structured KB while complementing it with the distributional model. Two main categories of DRMs and associated applications can be distinguished.

Semantic Matching & Commonsense Reasoning: In this category the RC is unstructured and distinct from the KB; the large-scale unstructured RC is used as a commonsense knowledge base. Freitas & Curry [1] define a DRM (the τ-Space) for supporting schema-agnostic queries over a structured KB: terms used in the query are projected into the distributional vector space and are semantically matched with terms in the KB via distributional semantics, using commonsense information embedded in the large-scale unstructured corpus RC. In a different application scenario, Freitas et al. [3] use the τ-Space to support selective reasoning over commonsense KBs: distributional semantics is used to select the facts which are semantically relevant under a specific reasoning context, allowing the scoping of the reasoning process and coping with the incomplete knowledge of commonsense KBs. Pereira da Silva & Freitas [2] use the τ-Space to support approximate reasoning over logic programs. The sketch below illustrates the matching pattern shared by these approaches.
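The following is a minimal sketch of the semantic matching pattern in this category: query terms are projected into the distributional space (the role of F) and matched against KB term labels whenever their relatedness S exceeds a threshold from H. It is an illustrative approximation, not the τ-Space implementation; the `vectors` lookup (any DSM, e.g. the PPMI sketch above or pre-trained word vectors) and the threshold value 0.5 are assumptions.

```python
import math

def cosine(u, v):
    """Relatedness measure S over sparse term vectors."""
    dot = sum(w * v.get(c, 0.0) for c, w in u.items())
    norm = (math.sqrt(sum(w * w for w in u.values()))
            * math.sqrt(sum(w * w for w in v.values())))
    return dot / norm if norm else 0.0

def semantic_match(query_term, kb_terms, vectors, threshold=0.5):
    """Project a query term into VS_DSM (F) and return the KB terms whose
    relatedness S exceeds the threshold H, best matches first."""
    q = vectors.get(query_term.lower(), {})
    scored = ((t, cosine(q, vectors.get(t.lower(), {}))) for t in kb_terms)
    return sorted(((t, s) for t, s in scored if s >= threshold),
                  key=lambda m: m[1], reverse=True)

# Hypothetical usage: the query predicate "daughter" is matched against the
# labels of KB predicates, even though "daughter" never occurs in the KB.
# semantic_match("daughter", ["child", "spouse", "birthPlace"], vectors)
```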
Knowledge Discovery: In this category the structured KB itself is used as the distributional reference corpus (RC = KB). Implicit and explicit semantic associations are used to derive new meaning and to discover new knowledge. The use of structured data as a distributional corpus is a recurring pattern in knowledge discovery applications, where knowledge emerging from similarity patterns in the data can be used to retrieve similar entities and to expose implicit associations. In this context, the ability to represent the attributes of KB entities in a vector space, combined with the use of vector similarity measures as a way to retrieve and compare similar entities, can define universal mechanisms for knowledge discovery and semantic approximation. Novacek et al. [5] describe an approach that treats web data as a bottom-up phenomenon, capturing meaning that is not associated with explicit semantic descriptions, and apply it to entity consolidation in the life sciences domain. Speer et al. [4] propose AnalogySpace, a DRM over a commonsense KB which uses Latent Semantic Indexing to build the analogical closure of a semantic network through dimensionality reduction. AnalogySpace was used to reduce the sparseness of the KB, generalizing its knowledge and allowing users to explore implicit associations. Cohen et al. [6] introduce PSI, a predication-based semantic indexing approach for biomedical data, which was used for similarity-based retrieval and for the detection of implicit associations. The sketch below illustrates the dimensionality reduction pattern behind this category.
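As an illustration of the RC = KB case, the sketch below builds an entity-by-property matrix directly from KB triples and applies a truncated SVD, the dimensionality reduction step d used by AnalogySpace-style approaches, so that entities with similar latent profiles become close in the reduced space even when their explicit properties only partially overlap. The triples and all names are toy assumptions; this is not the AnalogySpace or PSI implementation.

```python
import numpy as np

# Toy KB triples (subject, predicate, object); the KB is its own corpus (RC = KB).
triples = [
    ("aspirin",     "treats", "headache"),
    ("aspirin",     "treats", "fever"),
    ("ibuprofen",   "treats", "headache"),
    ("ibuprofen",   "treats", "inflammation"),
    ("paracetamol", "treats", "fever"),
]

entities = sorted({s for s, _, _ in triples})
features = sorted({(p, o) for _, p, o in triples})
e_idx = {e: i for i, e in enumerate(entities)}
f_idx = {f: j for j, f in enumerate(features)}

# Entity x (predicate, object) matrix: KB attributes as distributional contexts.
M = np.zeros((len(entities), len(features)))
for s, p, o in triples:
    M[e_idx[s], f_idx[(p, o)]] = 1.0

# Truncated SVD (d): project entities into a k-dimensional latent space.
U, S, Vt = np.linalg.svd(M, full_matrices=False)
k = 2
E = U[:, :k] * S[:k]

def similarity(a, b):
    """Relatedness S between two entities in the reduced space."""
    u, v = E[e_idx[a]], E[e_idx[b]]
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Entities remain comparable even where their explicit properties differ,
# exposing implicit associations through the latent space.
print(similarity("aspirin", "ibuprofen"))
print(similarity("aspirin", "paracetamol"))
```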
4 The Distributional Semantic Web Stack

DRMs provide universal mechanisms with features that are fundamental for semantic systems: (i) built-in semantic approximation for terminological and instance data; (ii) the ability to use large-scale unstructured data as commonsense knowledge; (iii) the ability to detect emerging implicit associations in the KB; (iv) simplicity of use, supported by the vector space model abstraction; and (v) robustness with regard to poorly structured, heterogeneous and incomplete data. These features provide a framework for a robust and easy-to-deploy semantic approximation component grounded in large-scale data. Considering the relevance of these features for the deployment of semantic systems in general, this paper synthesizes its vision by proposing a Distributional Semantic Web stack abstraction (Figure 1), complementing the Semantic Web stack. At the bottom of the stack, unstructured and structured data can be used as reference corpora together with the target KB (RDF(S)). Different elements of the distributional model are included as optional and composable elements of the architecture. The approximate search and query operations layer accesses the DSM layer, supporting users with semantically flexible search and query operations. A graph navigation layer defines graph navigation algorithms (e.g. spreading activation, bi-directional search) using the semantic approximation and the distributional information from the layers below.

Fig. 1: (A) Depiction of an example DRM (τ-Space); (B) the Distributional Semantic Web stack.

Acknowledgment: This publication was supported in part by Science Foundation Ireland (SFI) (Grant Number SFI/12/RC/2289) and by the Irish Research Council.

References

1. Freitas, A., Curry, E.: Natural Language Queries over Heterogeneous Linked Data Graphs: A Distributional-Compositional Semantics Approach. In: Proc. of the 19th Intl. Conf. on Intelligent User Interfaces (IUI) (2014)
2. Pereira da Silva, J.C., Freitas, A.: Towards an Approximative Ontology-Agnostic Approach for Logic Programs. In: Proc. of the 8th Intl. Symposium on Foundations of Information and Knowledge Systems (2014)
3. Freitas, A., Pereira da Silva, J.C., Curry, E., Buitelaar, P.: A Distributional Semantics Approach for Selective Reasoning on Commonsense Graph Knowledge Bases. In: Proc. of the 19th Intl. Conf. on Applications of Natural Language to Information Systems (NLDB) (2014)
4. Speer, R., Havasi, C., Lieberman, H.: AnalogySpace: Reducing the Dimensionality of Common Sense Knowledge. In: Proc. of the 23rd Intl. Conf. on Artificial Intelligence, 548-553 (2008)
5. Novacek, V., Handschuh, S., Decker, S.: Getting the Meaning Right: A Complementary Distributional Layer for the Web Semantics. In: Proc. of the Intl. Semantic Web Conference, 504-519 (2011)
6. Cohen, T., Schvaneveldt, R.W., Rindflesch, T.C.: Predication-based Semantic Indexing: Permutations as a Means to Encode Predications in Semantic Space. AMIA Annu. Symp. Proc., 114-118 (2009)
7. Turney, P.D., Pantel, P.: From Frequency to Meaning: Vector Space Models of Semantics. J. Artif. Int. Res. 37(1), 141-188 (2010)