An Elephant in the Dark
    Creating Semantic Representations of Perceived Data
                 with Conceptual Spaces

               Hadi Banaee, Erik Schaffernicht, and Amy Loutfi

 Centre for Applied Autonomous Sensor Systems, School of Science and Technology,
                   Örebro University, SE-701 82 Örebro, Sweden
             {hadi.banaee,erik.schaffernicht,amy.loutfi}@oru.se


      Abstract. This paper discusses the task of creating semantic represen-
      tations to describe numerical observations using conceptual spaces. The
      theory of conceptual spaces is considered as a semantic representation to
      conceptualise the perceived numerical information and to infer linguistic
      descriptions. We propose a data-driven approach to construct concep-
      tual spaces from numerical data automatically. First, the elements of a
      conceptual space are derived based on a set of numerical observations
      in order to semantically represent the concepts of a given data set. This
      data-driven conceptual space is then employed for the task of semantic
      inference, in order to linguistically describe unknown perceived observa-
      tions.

      Keywords: Semantic representation · Conceptual spaces.


1   Motivation
Some Hindus bring an elephant to be exhibited in a dark room. A number of men
touch and feel the elephant in the dark and, depending upon where they touch it,
they believe the elephant to be like a water spout (trunk), a fan (ear), a pillar
(leg) and a throne (back)... [1].
    This is the beginning of an ancient parable, called The Elephant in the Dark,
to demonstrate the problem of perception limitations. In this story, the individ-
uals have their own perceptions of the elephant (an unknown concept for them)
and therefore use their own inference to explain it. This is the problem of de-
scribing a concept based on the perceived information. The men sought to map
or categorise the perceived information according to similar concepts that were
known to them. However, their failure to successfully describe the concept of
Elephant was due to the limitations of their sensory perceptions.
    Describing unknown observations in natural language appears to be an easy
task for humans. Both speakers and hearers have a great deal of common sense
understanding of the concepts and properties that enable them to describe such
observations. An example is the description of “Hippogriffs” in J.K. Rowling’s
Harry Potter books as: Hippogriffs have the bodies, hind legs, and tails of horses,
but the front legs, wings, and heads of eagles, with cruel beaks and large orange
eyes [2]. This description uses familiar concepts that are most similar (eagle
and horse), together with perceivable features (orange, large, etc.) that are
understandable for humans. However, deriving descriptions for unknown concepts
is no trivial task in artificial intelligence (AI). This task is especially crucial if
the information given to the system is in the form of numeric or non-symbolic
measurements (e.g., sensor data).

[Figure: Perceived Numerical Information -> Semantic Representations (Data-driven Conceptual Spaces) -> Inferred Linguistic Descriptions]
         Fig. 1: Semantic representation for describing numerical data.

    One goal of cognitive science is to construct artificial systems that can under-
stand and model the cognitive activities of humans, such as concept learning and
semantic inference [3]. However, a critical issue is how the given information is to
be modelled in knowledge representation frameworks [4,5]. Concerning the task
of the semantic description of concepts by means of perceived data, two aspects
need to be considered: induction and semantics. Inductive inference performs a
generalisation from a number of observations, which infers the characteristics of
the concepts. Semantic inference is the process of inferring meaningful descrip-
tions or truth conditions from semantically enriched information represented
in logical or natural-language sentences. Neither symbolic nor sub-symbolic approaches
satisfactorily address these two AI problems simultaneously.
    To bridge this gap, the theory of conceptual spaces was introduced by Gärdenfors [6]
as a mid-level representation that addresses both concept learning and
semantic inference problems [3]. A conceptual space consists of a set of quality
dimensions in various domains. These are placed within a geometrical structure
in order to model, categorise, and represent the concepts [6]. This paper consid-
ers the task of semantic representation in describing the numerical observations.
The semantic representation task investigates representational models that
bind perceived numerical data as input to a set of linguistic characterisations
as output (see Fig. 1). Our claim here is that conceptual spaces
can serve as a semantic representation that conceptualises the perceived
numerical information and can be used to infer linguistic descriptions.
    Conceptual spaces are principally derived in a knowledge-driven manner,
on the assumption that there is prior knowledge from perceptual mechanisms
or experts that manually initialise the elements of the conceptual space (i.e.,
domains, quality dimensions, and concepts’ regions) [7]. However, the challenge
discussed here is how to automatically construct a conceptual space from given
information [8] to perform concept learning and semantic inference tasks. This is
an important motivation, given the growing class of problems that involve
complex observations for which little or no prior knowledge about their
semantic significance is available [9].

2   On the notion of Semantic Representation
The notion of a semantic representation has been used in a variety of ways in
different areas such as knowledge representation in AI, cognitive science, and
philosophy of language. Two prominent traditions for semantic representations
exist [10]. One is to study the semantics of words by representing the relations
of the words in natural language. For such representations, also called amodal
approaches [11], the input is linguistic information. Another tradition focuses
on conceptual structures for the representation of meanings, which considers the
relations between concepts and percepts to model the semantics. In this case
of semantic representations, also called experiential [12], the input is a set of
perceptual information. The origin of this kind of semantic representation is the
study of cognitive semantics, wherein the focus is on the meaning of the concepts
as a cognitive phenomenon [13]. Cognitive semantics considers the meaning of
linguistic expressions as mental entities coming from our perceptions. The per-
ceptual information is then formed as concepts in our mind. This point of view is
opposed by the realist approaches that define semantics as something out in the
world [6]. In the realist view, semantics can be represented using, e.g., abstract
propositions and description logic, and can be modelled and verified by truth
conditions. Within cognitive semantics, however, meaning is a conceptual structure
that comes before truth [6]. Semantic representations, from the cognitive point
of view, should be conceptual structures that represent both perceptual and
linguistic information. In this work, the notion of a semantic representation follows
the latter definition: a conceptual representation is first constructed from
perceptual information, and semantically enriched descriptions are then inferred.
A semantic representation of knowledge thus provides a conceptual structure
for the meaning of perceived concepts [10]. This kind of representation eases
semantic reasoning about the perceived information.


3   Data-driven Conceptual Spaces as Semantic
    Representations
The main contribution of this study is to investigate the possibility of inferring
human-understandable semantics for any given data set through data-driven
conceptual spaces. This section explains how a conceptual space can be automatically
constructed from the observed information, and how such a data-driven
conceptual space can then be used to infer semantic descriptions for any unseen
observation. The process is then assessed by applying the approach to a data
set of leaf examples. The formal definitions, the proposed algorithms, and the
technical aspects of the framework are elaborated in detail in [14].

Identifying Quality Dimensions The origin of quality dimensions is still an
open question in the field of conceptual spaces [6]. Once the process of constructing
a conceptual space starts, as Quine noted in [15], some innate quality dimensions
are needed to make concept learning possible. However, there is no unique
way to specify which set of dimensions is sufficient to characterise the concepts.
In many developed examples of conceptual spaces, determining the quality
dimensions relies on background knowledge. Phenomenal (human perceptual)
quality dimensions are usually chosen by experts, and the scientific (sensory)
quality dimensions are usually inferred from the perceived behaviours [6].
However, this issue is more challenging when dealing with systems where there
is no prior knowledge to explain the semantics of dimensions. An agreed point
in the literature of conceptual spaces is that it is almost impossible to provide a
complete list of human perceptual quality dimensions [16].

    Fig. 2: Steps for constructing a conceptual space from numerical data.

Construction of Conceptual Spaces The framework we propose provides
a procedure to utilise machine learning algorithms for the task of identifying
relevant features and concepts in a numerical data set, to specify the domains
and quality dimensions of a conceptual space in a data-driven manner. Our un-
derlying assumption for the use of machine learning techniques is that highly
discriminative and distinctive features are adequate choices for quality dimen-
sions and domains, since they allow clear separation of the different concepts.
    To identify those discriminative and distinctive features, we use information-
theoretic measures like joint mutual information to rank the relevance of a feature
in relation to each concept. We represent the feature-concept associations in a
weighted bipartite graph and use a heuristic search, based on finding maximum
bi-cliques, to group high-ranked features, which are then chosen as domains for
the different concepts. After determining the quality dimensions and domains,
the concept representation is constructed from the available instances. To this end,
two properties are estimated: the concept’s convex regions and the concept’s
salience weights in relation to the quality dimensions. This calculation is formulated
on the basis of the observations associated with the concepts, without involving
external knowledge. Fig. 2 illustrates the steps of constructing a conceptual
space from a set of numerical data in a data-driven manner.
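
As a rough illustration of the feature-ranking step, the following Python sketch scores each feature by its mutual information with a one-vs-rest concept label. It is a simplification under stated assumptions and not the procedure elaborated in [14]: it uses plain rather than joint mutual information, relies on equal-width binning, and omits the bipartite-graph and bi-clique grouping of features into domains. The feature names, the toy values, and all function names are hypothetical.

```python
import math
from collections import Counter


def discretise(values, bins=3):
    """Map numeric values to equal-width bin indices (assumed preprocessing)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # guard against constant features
    return [min(int((v - lo) / width), bins - 1) for v in values]


def mutual_information(xs, ys):
    """I(X;Y) in bits between two equally long discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum((c / n) * math.log2(c * n / (px[x] * py[y]))
               for (x, y), c in pxy.items())


def rank_features(data, labels, concept, bins=3):
    """Rank features by their relevance to one concept (one-vs-rest)."""
    target = [int(lab == concept) for lab in labels]
    scores = {name: mutual_information(discretise(col, bins), target)
              for name, col in data.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical toy observations: three features, two concepts "A" and "B".
data = {
    "elongation": [0.10, 0.20, 0.15, 0.80, 0.90, 0.85],  # separates A from B
    "width":      [0.30, 0.42, 0.55, 0.45, 0.58, 0.70],  # partly informative
    "noise":      [0.50, 0.10, 0.90, 0.50, 0.10, 0.90],  # uninformative
}
labels = ["A", "A", "A", "B", "B", "B"]

ranking = rank_features(data, labels, concept="A")
for name, score in ranking:
    print(f"{name}: {score:.3f} bits")
```

In this toy setting the perfectly separating feature receives the highest score and the uninformative one a score of zero; high-ranked features would then be the candidates for quality dimensions.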

Semantic Inference in Conceptual Spaces The semantic inference process
is introduced to linguistically represent a new observation within the built
conceptual space. First, a symbol space is introduced which includes the semantics
of the corresponding concepts and quality dimensions. Then, the inference of
linguistic descriptions for an unknown observation is performed in two phases: (1)
Inclusion: the new instance is localised in the built conceptual space to determine
its associated concepts and quality dimensions. This determination is done
by considering the inclusion of the instance and the use of similarity measures in
that space. (2) Realisation: the lexicalisation of the instance is then induced by
extracting the semantic labels of the associated concepts and quality dimensions.
Fig. 3 illustrates the steps of semantic inference for a new observation within a
constructed conceptual space.

     Fig. 3: Steps of semantic inference in constructed conceptual spaces.

                  Fig. 4: The conceptual space of leaf data set.
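
The two phases can be sketched in Python as follows. This is a minimal illustration under assumptions that are not taken from [14]: each concept's convex region is approximated by an axis-aligned bounding box of its instances, salience weights are taken as normalised inverse spreads, similarity is measured by a weighted Euclidean distance to the concept centre, and the symbol space is reduced to a pair of words per quality dimension. The concept names, dimension words, and numeric values are hypothetical.

```python
import math


def build_concept(instances):
    """Estimate a concept from its observations (assumed simplification)."""
    dims = list(zip(*instances))
    region = [(min(d), max(d)) for d in dims]      # box = simple convex region
    centre = [sum(d) / len(d) for d in dims]
    inv = [1.0 / (hi - lo + 1e-9) for lo, hi in region]
    weights = [w / sum(inv) for w in inv]          # low spread -> high salience
    return {"region": region, "centre": centre, "weights": weights}


def includes(concept, point):
    """Inclusion test: does the point fall inside the concept's region?"""
    return all(lo <= x <= hi for (lo, hi), x in zip(concept["region"], point))


def distance(concept, point):
    """Weighted Euclidean distance from the point to the concept centre."""
    return math.sqrt(sum(w * (x - c) ** 2 for w, c, x in
                         zip(concept["weights"], concept["centre"], point)))


def describe(space, dim_words, point):
    """Inclusion phase followed by a template-based realisation phase."""
    inside = [name for name, c in space.items() if includes(c, point)]
    if inside:
        return f"This observation belongs to {inside[0]}."
    nearest = min(space, key=lambda name: distance(space[name], point))
    words = []
    for (lo, hi), (low_word, high_word), x in zip(
            space[nearest]["region"], dim_words, point):
        if x < lo:
            words.append(low_word)
        elif x > hi:
            words.append(high_word)
    return (f"This unknown observation is like {nearest}, "
            f"but it is {' and '.join(words)}.")


# Hypothetical two-dimensional space: (elongation, lobedness).
space = {
    "maple":  build_concept([(0.20, 0.80), (0.30, 0.90), (0.25, 0.85)]),
    "willow": build_concept([(0.90, 0.10), (0.80, 0.20), (0.85, 0.15)]),
}
dim_words = [("rounded", "elongated"), ("smooth-edged", "lobed")]

known = describe(space, dim_words, (0.25, 0.85))
unknown = describe(space, dim_words, (0.50, 0.85))
print(known)
print(unknown)
```

An observation inside a region is named directly, whereas one outside every region is described through its most similar concept plus the dimensions on which it deviates, mirroring the "is like ..., but it is ..." pattern of the leaf example.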


A Case Study: Leaf Data Set The plausibility of the proposed approach
is tested using a set of leaf samples. The output conceptual space is then used
to infer linguistic descriptions for a set of new leaves. Fig. 4 shows the derived
conceptual space of six leaf concepts. The approach has specified six quality
dimensions that are grouped in three domains within the data-driven constructed
conceptual space. By applying the semantic inference on the conceptual space of
leaves, an unknown perceived leaf can be located within the space, and then be
characterised by its associated concepts and quality dimensions. For example, a
new leaf can be linguistically described as: “This unknown leaf is like Japanese
Maple leaves, but it is oval with a lobed margin.”

4   Related Work on Conceptual Spaces and AI
The aim of representing knowledge in a conceptual space is to develop an intu-
itive interpretation of the relationship between symbolic and sub-symbolic infor-
mation [6,3]. Gärdenfors has discussed thoroughly the role of conceptual spaces
as a knowledge representation framework in AI systems [5], focusing on the tasks
of induction and reasoning [17]. Concept formation tightly connects the theory
of conceptual spaces to the learning problem. Learning is typically
approached with connectionist methods (i.e., neural network architectures [18]),
but such solutions neglect the explainability of the involved concepts
or of the learned model itself. In addition to the theoretical AI problems, the feasibility
of using conceptual spaces has been studied in various application domains
of AI, such as geographical measurement [19], cognitive robotics [20,21], and vi-
sual perception [22]. Using data mining approaches in the process of deriving
conceptual spaces has been studied in a few isolated works. Keßler [8] outlined
the idea of using conceptual spaces to describe data, with some discussions on
the possibility of automatically generating such spaces from databases. Lee [23]
proposed a data mining method coupled with conceptual spaces, which addresses
cognitive tasks such as concept formation using clustering techniques. The main
drawback of these approaches is that they rely on knowing about the semantics
of an application area beforehand in order to directly determine the domains
and the quality dimensions.

5   Conclusions
This paper presents the notion of data-driven conceptual spaces as a tool for
creating semantic representations in order to linguistically describe numerical
data. The proposed approach applies to certain classes of problems: it
targets applications wherein the input data is difficult to interpret at first glance.
Within such applications, the task of specifying the interpretable domains and
dimensions based on human perceptions is non-trivial. These classes of problems
usually deal with raw sensor data (sometimes multi-variate data) with little or
no prior knowledge about their semantics [9]. One issue in constructing conceptual
spaces in a data-driven way is the semantics of domains. The feature grouping
method is based on how well a subset of the features distinctly represents the
various concepts. However, there still exists the problem of verifying the semantic
dependency of the quality dimensions within a domain. Regarding this problem,
Gärdenfors [6] suggests that whether two quality dimensions are integral
can be decided by empirical testing based on expert judgements, and not
necessarily by analytical techniques. It is seemingly difficult to establish
the semantic dependency of the features analytically.
For example, the values of RGB as the dimensions of the colour domain do not
indicate their semantic relations. Indeed, solving the issue of domain specification
could lead to a general solution to the problem of determining an evaluation
criterion for choosing between competing conceptual spaces, an issue raised
by Gärdenfors in [17].

References
 1. Rumi, J.a.D., Arberry, A.J.: Tales from the Masnavi. Curzon paperbacks. Curzon
    Press (1993)
 2. Rowling, J.K.: Harry Potter And The Prisoner Of Azkaban. New York: Arthur A.
    Levine Books (1999)
 3. Aisbett, J., Gibbon, G.: A general formulation of conceptual spaces as a meso level
    representation. Artificial Intelligence 133(1) (2001) 189–232
 4. Gärdenfors, P.: Symbolic, conceptual and subconceptual representations. In: Hu-
    man and Machine Perception. Springer (1997) 255–270
 5. Gärdenfors, P.: Conceptual spaces as a framework for knowledge representation.
    Mind and Matter 2(2) (2004) 9–27
 6. Gärdenfors, P.: Conceptual spaces: The geometry of thought. MIT press (2000)
 7. Rickard, J.T., Aisbett, J., Gibbon, G.: Reformulation of the theory of conceptual
    spaces. Information Sciences 177(21) (2007) 4539–4565
 8. Keßler, C.: Conceptual spaces for data descriptions. In: The cognitive approach
    to modeling environments (CAME), workshop at GIScience. (2006) 29–35
 9. Rickard, J.T.: A concept geometry for conceptual spaces. Fuzzy optimization and
    decision making 5(4) (2006) 311–329
10. Griffiths, T.L., Steyvers, M., Tenenbaum, J.B.: Topics in semantic representation.
    Psychological review 114(2) (2007) 211
11. Feng, Y., Lapata, M.: Visual information in semantic representation. Association
    for Computational Linguistics (2010) 91–99
12. Vigliocco, G., Meteyard, L., Andrews, M., Kousta, S.: Toward a theory of semantic
    representation. Language and Cognition 1(2) (2009) 219–247
13. Allwood, J.S., Gärdenfors, P.: Cognitive semantics: Meaning and cognition. Vol-
    ume 55. John Benjamins Publishing (1999)
14. Banaee, H.: From Numerical Sensor Data to Semantic Representations: A
    Data-driven Approach for Generating Linguistic Descriptions. PhD thesis, Örebro
    University, School of Science and Technology (2018)
15. Quine, W.V.O.: Ontological relativity and other essays. Number 1. Columbia
    University Press (1969)
16. Gärdenfors, P.: Semantics based on conceptual spaces. In: Indian Conference on
    Logic and Its Applications, Springer (2011) 1–11
17. Gärdenfors, P.: Induction, conceptual spaces and AI. The Dynamics of Thought
    (2005) 109–124
18. Sun, R.: Artificial intelligence: Connectionist and symbolic approaches. (1999)
19. Adams, B., Raubal, M.: Conceptual space markup language (CSML): Towards the
    cognitive semantic web. In: ICSC. (2009) 253–260
20. LeBlanc, K.: Cooperative anchoring: sharing information about objects in multi-
    robot systems. (2010)
21. Chella, A., Frixione, M., Gaglio, S.: Anchoring symbols to conceptual spaces:
    the case of dynamic scenarios. Robotics and Autonomous Systems 43(2) (2003)
    175–188
22. Chella, A., Frixione, M., Gaglio, S.: A cognitive architecture for artificial vision.
    Artificial Intelligence 89(1-2) (1997) 73–111
23. Lee, I.: Data mining coupled conceptual spaces for intelligent agents in data-rich
    environments. In: International Conference on Knowledge-Based and Intelligent
    Information and Engineering Systems, Springer (2005) 42–48