<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Mining Real-Estate Listings Based on Decision Systems over Ontological Graphs</article-title>
        <subtitle>Extended Abstract</subtitle>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Krzysztof Pancerz</string-name>
          <email>kpancerz@wszia.edu.pl</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Olga Mich</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>University of Information Technology and Management</institution>,
          <addr-line>Sucharskiego Str. 2, 35-225 Rzeszow</addr-line>,
          <country country="PL">Poland</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>University of Management and Administration</institution>,
          <addr-line>Akademicka Str. 4, 22-400 Zamosc</addr-line>,
          <country country="PL">Poland</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>In this paper, we describe the process of mining real-estate listings based on decision systems over ontological graphs. Such decision systems have been proposed to deal with data in the form of concepts linked by different semantic relations. Special attention is paid to the preprocessing steps that transform advertisements in textual form into decision systems (decision tables) defined over ontological graphs.</p>
      </abstract>
      <kwd-group>
        <kwd>decision systems</kwd>
        <kwd>ontological graphs</kwd>
        <kwd>data mining</kwd>
        <kwd>real-estate listings</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <p>
        Text mining is a rapidly growing application of knowledge discovery in data (cf. [<xref ref-type="bibr" rid="ref2">2</xref>]). Advertisements, for example real-estate ones, constitute a special kind of textual data. In advertisements, data take the form of loosely coupled words (terms, concepts) rather than full, grammatically correct sentences. Moreover, the underlying data are qualitative. Textual data pose several challenges: understanding the semantics of the data and the semantic relations between them, taking external knowledge into account in data classification, and encoding textual data for classifiers that work with numerical data.
      </p>
      <p>
        Semantic relations between concepts play an important role in, among other fields, cognitive psychology, linguistics, and, more recently, computer science. In [<xref ref-type="bibr" rid="ref4">4</xref>], two general types of relations between words (concepts) were distinguished: paradigmatic relations (relations between words belonging to the same grammatical category) and syntagmatic relations (relations between words that go together in a syntactic structure). Paradigmatically related words are, to some degree, grammatically substitutable for each other. In our research, we are interested in paradigmatic relations. As will be shown later, in real-estate listings we do not analyze the semantic structure of sentences. Instead, we try to derive knowledge about the concepts (terms) they contain, for example, whether two concepts are synonyms or whether one concept can be replaced with another, more general one.
      </p>
      <p>
        In our research, we use the following taxonomy of types of semantic relations, modeled on the Wikisaurus project [<xref ref-type="bibr" rid="ref1">1</xref>], which aims at creating a thesaurus of semantically related terms: synonymy, antonymy, hyponymy/hyperonymy (subclass/superclass), and meronymy/holonymy (part/whole). In the approach presented in this paper, we are interested in synonymy and hyponymy/hyperonymy. We use the following notation: isSyn denotes synonymy, and (u, v) ∈ isSyn means that "u is a synonym of v"; isGen denotes hyponymy, and (u, v) ∈ isGen means that "u is a hyponym of v" ("u is generalized by v"); isSpec denotes hyperonymy, and (u, v) ∈ isSpec means that "u is a hyperonym of v" ("u is specialized by v"). Additionally, we take into consideration a semantic relation called "being an instance", which links an example (instance) to the concept it exemplifies. This kind of relation is important in mining real-estate listings because they include, for example, instances of places.
      </p>
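      <p>As an illustration only, the relations introduced above can be stored as sets of ordered pairs. The sketch below assumes that isSpec is the converse of isGen and that hyponymy may be followed transitively; the concept names are hypothetical and are not taken from the ontological graphs used in this work.</p>
      <preformat>
```python
# Hypothetical concept pairs; only the relation names follow the notation above.
is_syn = {("flat", "apartment"), ("apartment", "flat")}
is_gen = {("villa", "house"), ("house", "building")}  # (u, v): u is a hyponym of v

# Assumption: isSpec is the converse of isGen.
is_spec = {(v, u) for (u, v) in is_gen}


def generalizations(concept, gen_pairs):
    """Return every concept reachable from `concept` by following isGen pairs."""
    found = set()
    frontier = [concept]
    while frontier:
        current = frontier.pop()
        for u, v in gen_pairs:
            if u == current and v not in found:
                found.add(v)
                frontier.append(v)
    return found


print(sorted(generalizations("villa", is_gen)))  # ['building', 'house']
```
      </preformat>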
      <p>
        Knowledge about semantic relations between concepts is captured in ontologies. Our approach is based on the definitions of ontology given by Neches et al. [<xref ref-type="bibr" rid="ref5">5</xref>] and Kohler et al. [<xref ref-type="bibr" rid="ref3">3</xref>]: an ontology is constructed on the basis of a controlled vocabulary and the relationships between the concepts in that vocabulary. Formally, an ontology can be represented by means of graph structures. Let O be a given ontology. An ontological graph is a quadruple OG = (C, E, R, ρ), where C is a nonempty, finite set of nodes representing concepts in the ontology O, E ⊆ C × C is a finite set of edges representing relations between concepts from C, R is a family of semantic descriptions of types of relations (represented by edges) between concepts, and ρ: E → R is a function assigning a semantic description of the relation to each edge. We also consider subgraphs of ontological graphs, which we call local ontological graphs. Figure 1 shows exemplary ontological graphs representing the real-estate domain.
      </p>
      <p>[Figure 1, panels a) and b): exemplary ontological graphs representing the real-estate domain.]</p>
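      <p>The quadruple definition of an ontological graph given above can be sketched as a small data structure. This is a minimal sketch, not the implementation used in this work: the class name, the field names (including rho for the edge-labelling function), and the example concepts are hypothetical, and a local ontological graph is assumed here to be the subgraph induced by a chosen subset of concepts.</p>
      <preformat>
```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OntologicalGraph:
    """Sketch of an ontological graph: concepts, edges, relation types, edge labels."""
    concepts: frozenset        # C: nonempty, finite set of concepts
    edges: frozenset           # E: set of (u, v) pairs with u and v in C
    relation_types: frozenset  # R: semantic descriptions of relation types
    rho: dict                  # edge -> relation type, defined for every edge

    def __post_init__(self):
        if not self.concepts:
            raise ValueError("C must be nonempty")
        for u, v in self.edges:
            if u not in self.concepts or v not in self.concepts:
                raise ValueError(f"edge ({u}, {v}) leaves C")
        if set(self.rho) != set(self.edges):
            raise ValueError("rho must be defined on exactly the edges in E")
        if not set(self.rho.values()).issubset(self.relation_types):
            raise ValueError("rho must map into R")

    def local_graph(self, kept):
        """Assumption: a local graph is the subgraph induced by `kept`."""
        kept = frozenset(kept).intersection(self.concepts)
        edges = frozenset(e for e in self.edges if e[0] in kept and e[1] in kept)
        return OntologicalGraph(kept, edges, self.relation_types,
                                {e: self.rho[e] for e in edges})


# Hypothetical fragment of a graph for a Place attribute.
place = OntologicalGraph(
    concepts=frozenset({"place", "city", "village"}),
    edges=frozenset({("city", "place"), ("village", "place")}),
    relation_types=frozenset({"isGen", "isSyn", "isSpec"}),
    rho={("city", "place"): "isGen", ("village", "place"): "isGen"},
)
local = place.local_graph({"city", "place"})
print(sorted(local.edges))  # [('city', 'place')]
```
      </preformat>
      <p>Validating the edges and labels at construction time keeps every instance consistent with the constraints on E and on the labelling function stated in the definition.</p>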
      <p>
        In [<xref ref-type="bibr" rid="ref12">12</xref>], information (decision) systems were proposed as knowledge representation systems. In the simplest case, they consist of vectors of numbers or symbols (attribute values) describing objects from a given universe of discourse. In our research, we are interested in mining textual data in the form of concepts (words, terms). Therefore, in [<xref ref-type="bibr" rid="ref7">7</xref>] and [<xref ref-type="bibr" rid="ref10">10</xref>], ontologies were incorporated into information (decision) systems, i.e., attribute values were considered in the ontological (semantic) space. Information (decision) systems over ontological graphs can be created in different ways. In [<xref ref-type="bibr" rid="ref7">7</xref>], two approaches were mentioned. In the first one, attribute values are concepts from the ontologies assigned to attributes, which yields a simple information (decision) system over ontological graphs. In the second one, attribute values are local ontological graphs of the ontologies assigned to attributes, which yields a complex information (decision) system over ontological graphs. In the case of real-estate listings, whether a client was interested in a given property can serve as the decision attribute.
      </p>
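      <p>A simple decision system of the kind described above can be pictured as a table whose cells hold concepts rather than numbers. The following sketch makes that concrete under stated assumptions: the attribute names, concept sets, and decision classes are hypothetical, and each ontological graph is reduced to its set of concepts.</p>
      <preformat>
```python
# Hypothetical concept sets standing in for the graphs assigned to each attribute.
concepts_of = {
    "Place": {"place", "city", "village"},
    "Property": {"property", "house", "villa", "flat"},
}
decision_values = {"interested", "not interested"}  # hypothetical decision classes


def make_object(values, decision):
    """Build one row, checking that each value is a concept of its attribute."""
    for attribute, value in values.items():
        if value not in concepts_of[attribute]:
            raise ValueError(f"{value!r} is not a concept for {attribute}")
    if decision not in decision_values:
        raise ValueError(f"unknown decision {decision!r}")
    return {**values, "Decision": decision}


table = [
    make_object({"Place": "village", "Property": "house"}, "interested"),
    make_object({"Place": "city", "Property": "flat"}, "not interested"),
]
print(table[0]["Place"])  # village
```
      </preformat>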
      <p>
        The main goal of this paper is to show how to transform real-estate listings into simple information (decision) systems over ontological graphs. This important preprocessing step is depicted in general form in Figure 2. We can distinguish three main steps:
      </p>
      <list list-type="bullet">
        <list-item>
          <p>Stemming: determining the basic grammatical forms (roots) of the words appearing in advertisements, for example, using the popular Porter stemming algorithm [<xref ref-type="bibr" rid="ref13">13</xref>].</p>
        </list-item>
        <list-item>
          <p>Attributation: assigning concepts (built from words) appearing in advertisements to the proper attributes as their values, according to the defined ontological graphs.</p>
        </list-item>
        <list-item>
          <p>Deinstantiation: replacing instances appearing in advertisements with the most specific concepts (with respect to the hyponymy/hyperonymy relation) whose instances they are.</p>
        </list-item>
      </list>
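      <p>The first two steps listed above might be sketched as follows. This is an illustration under strong assumptions: the normalization function is a crude stand-in for stemming and is not the Porter algorithm, the listing is assumed to be comma-separated, and the per-attribute vocabularies are hypothetical rather than derived from the ontological graphs used in this work.</p>
      <preformat>
```python
def normalize(term):
    """Crude stand-in for stemming: lower-case and drop a plural 's'.

    This is NOT the Porter algorithm.
    """
    term = term.strip().lower()
    if term.endswith("s") and not term.endswith("ss") and len(term) > 3:
        term = term[:-1]
    return term


# Hypothetical vocabularies: terms (concepts or instances) known to each attribute.
vocabulary = {
    "Transaction": {"for sale", "for rent"},
    "Place": {"warsaw", "poland", "city", "village"},
    "Property": {"villa", "house", "flat"},
}


def attribute_terms(advertisement):
    """Assign each recognized term of a comma-separated listing to its attribute."""
    row = {}
    for raw in advertisement.split(","):
        term = normalize(raw)
        for attribute, terms in vocabulary.items():
            if term in terms:
                row.setdefault(attribute, []).append(term)
    return row


row = attribute_terms("For sale, Warsaw, Poland, Villa, bedrooms: 4, 437 m2")
print(row)
```
      </preformat>
      <p>Terms that match no vocabulary, such as the number of bedrooms, are simply dropped in this sketch because only three attributes are modeled.</p>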
      <p>Deinstantiation is an important step if we are interested in more general knowledge derived from real-estate listings, for example, that a client is interested only in houses located in a village, regardless of which particular village.</p>
      <p>Let us consider, as an example, the following advertisement: "For sale, Warsaw, Poland, Villa, bedrooms: 4, 437 m<sup>2</sup>". Let us also assume that we take into consideration three attributes, Transaction, Place, and Property, with proper ontological graphs assigned to them. After performing our procedure, the advertisement becomes one object (row) in an information (decision) system:</p>
      <table-wrap>
        <table>
          <thead>
            <tr><th>U/A</th><th>Transaction</th><th>Place</th><th>Property</th></tr>
          </thead>
          <tbody>
            <tr><td>u<sub>1</sub></td><td /><td /><td /></tr>
          </tbody>
        </table>
      </table-wrap>
      <p>
        Given an information (decision) system over ontological graphs, we can apply different machine learning and data mining methods to extract valuable knowledge. In our previous papers, we considered the following for such systems: rough sets [<xref ref-type="bibr" rid="ref9">9</xref>], decision rules [<xref ref-type="bibr" rid="ref8">8</xref>], decision rules based on the DRSA approach [<xref ref-type="bibr" rid="ref6">6</xref>], and neural networks [<xref ref-type="bibr" rid="ref11">11</xref>].
      </p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>1. The Wikisaurus Homepage: http://en.wiktionary.org/wiki/Wiktionary:Wikisaurus</mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Kargupta</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Joshi</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sivakumar</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yesha</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          :
          <source>Data Mining: Next Generation Challenges and Future Directions</source>
          . The MIT Press, Cambridge, MA (
          <year>2004</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3. Kohler, J.,
          <string-name>
            <surname>Philippi</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Specht</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          , Ruegg, A.:
          <article-title>Ontology based text indexing and querying for the semantic web</article-title>
          .
          <source>Knowledge-Based Systems</source>
          <volume>19</volume>,
          <fpage>744</fpage>–<lpage>754</lpage>
          (
          <year>2006</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Murphy</surname>
            ,
            <given-names>M.L.</given-names>
          </string-name>
          :
          <article-title>Semantic Relations and the Lexicon</article-title>
          . Cambridge University Press (
          <year>2008</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Neches</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fikes</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Finin</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gruber</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Patil</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Senator</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Swartout</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          :
          <article-title>Enabling technology for knowledge sharing</article-title>
          .
          <source>AI Magazine</source>
          <volume>12</volume>(<issue>3</issue>),
          <fpage>36</fpage>–<lpage>56</lpage>
          (
          <year>1991</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Pancerz</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          :
          <article-title>Dominance-based rough set approach for decision systems over ontological graphs</article-title>
          . In:
          <string-name><surname>Ganzha</surname>, <given-names>M.</given-names></string-name>,
          <string-name><surname>Maciaszek</surname>, <given-names>L.</given-names></string-name>,
          <string-name><surname>Paprzycki</surname>, <given-names>M.</given-names></string-name>
          (eds.)
          <source>Proceedings of the FedCSIS 2012</source>, pp.
          <fpage>323</fpage>–<lpage>330</lpage>. Wroclaw, Poland
          (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Pancerz</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          :
          <article-title>Toward information systems over ontological graphs</article-title>
          . In:
          <string-name><surname>Yao</surname>, <given-names>J.</given-names></string-name>,
          <string-name><surname>Yang</surname>, <given-names>Y.</given-names></string-name>,
          <string-name><surname>Slowinski</surname>, <given-names>R.</given-names></string-name>,
          <string-name><surname>Greco</surname>, <given-names>S.</given-names></string-name>,
          <string-name><surname>Li</surname>, <given-names>H.</given-names></string-name>,
          <string-name><surname>Mitra</surname>, <given-names>S.</given-names></string-name>,
          <string-name><surname>Polkowski</surname>, <given-names>L.</given-names></string-name>
          (eds.)
          <source>Rough Sets and Current Trends in Computing, Lecture Notes in Artificial Intelligence</source>
          , vol.
          <volume>7413</volume>
          , pp.
          <fpage>243</fpage>–<lpage>248</lpage>
          . Springer-Verlag, Berlin Heidelberg (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Pancerz</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          :
          <article-title>Decision rules in simple decision systems over ontological graphs</article-title>
          . In:
          <string-name><surname>Burduk</surname>, <given-names>R.</given-names></string-name>,
          <string-name><surname>Jackowski</surname>, <given-names>K.</given-names></string-name>,
          <string-name><surname>Kurzynski</surname>, <given-names>M.</given-names></string-name>,
          <string-name><surname>Wozniak</surname>, <given-names>M.</given-names></string-name>,
          <string-name><surname>Zolnierek</surname>, <given-names>A.</given-names></string-name>
          (eds.)
          <source>Proceedings of the CORES'2013, Advances in Intelligent Systems and Computing</source>
          , vol.
          <volume>226</volume>
          , pp.
          <fpage>111</fpage>–<lpage>120</lpage>
          . Springer International Publishing (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Pancerz</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          :
          <article-title>Semantic relationships and approximations of sets: An ontological graph based approach</article-title>
          . In:
          <source>Proceedings of the HSI'2013</source>, pp.
          <fpage>62</fpage>–<lpage>69</lpage>. Sopot, Poland
          (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Pancerz</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          :
          <article-title>Some remarks on complex information systems over ontological graphs</article-title>
          . In:
          <string-name><surname>Gruca</surname>, <given-names>A.</given-names></string-name>,
          <string-name><surname>Czachorski</surname>, <given-names>T.</given-names></string-name>,
          <string-name><surname>Kozielski</surname>, <given-names>S.</given-names></string-name>
          (eds.)
          <source>Man-Machine Interactions 3, Advances in Intelligent Systems and Computing</source>
          , vol.
          <volume>242</volume>
          , pp.
          <fpage>377</fpage>–<lpage>384</lpage>
          . Springer International Publishing (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Pancerz</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lewicki</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Encoding symbolic features in simple decision systems over ontological graphs for PSO and neural network based classifiers</article-title>
          .
          <source>Neurocomputing</source>
          <volume>144</volume>
          ,
          <fpage>338</fpage>–<lpage>345</lpage>
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name><surname>Pawlak</surname>, <given-names>Z.</given-names></string-name>:
          <source>Rough Sets. Theoretical Aspects of Reasoning about Data</source>
          . Kluwer Academic Publishers, Dordrecht (
          <year>1991</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name><surname>Porter</surname>, <given-names>M.</given-names></string-name>:
          <article-title>An algorithm for suffix stripping</article-title>
          .
          <source>Program</source>
          <volume>14</volume>
          ,
          <fpage>130</fpage>–<lpage>137</lpage>
          (
          <year>1980</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>