<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Uncertainty Issues in Automating Process Connecting Web and User</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Alan Eckhardt</string-name>
          <email>alan.eckhardt@mff.cuni.cz</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Tomáš Horváth</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Dušan Maruščák</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Róbert Novotný</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Peter Vojtáš</string-name>
          <email>peter.vojtas@mff.cuni.cz</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Charles University</institution>
          ,
          <addr-line>Prague</addr-line>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>P. J. Šafárik University Košice</institution>
        </aff>
      </contrib-group>
      <abstract>
        <p>We are interested in replacing human processing of web resources by automated processing. Based on an experimental system we identify uncertainty issues which make this process difficult for automated processing. We show these uncertainty issues are connected with Web content mining and user preference mining. We conclude with a discussion of possible future development heading to an extension of web modeling standards with uncertainty features.</p>
      </abstract>
      <kwd-group>
        <kwd>Uncertainty modeling</kwd>
        <kwd>Uncertain reasoning</kwd>
        <kwd>World Wide Web</kwd>
        <kwd>Web content mining</kwd>
        <kwd>User profile mining</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>The amount of data accessible on the Web is a great challenge for web search systems.
Using these data (and the information and knowledge hidden in them) can be a
competitive advantage for both companies and individuals. Hence Web search
systems form a part of many different systems, ranging from marketing, competitor
and/or price tracking systems to private decision support systems.</p>
      <p>
        The main vision of Semantic Web [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] is to automate some web search activities
that a human is able to do personally but that are time-consuming or tedious.
Automating this human search will speed up the process of searching, find a wider
range of resources and, when necessary, soften and optimize our search criteria.
      </p>
      <p>
        We quote the Uncertainty Reasoning for the World Wide Web (URW3) Incubator
Group charter [
        <xref ref-type="bibr" rid="ref22">22</xref>
        ]: “…as work with semantics and services (on the Web) grows more
ambitious, there is increasing appreciation of the need for principled approaches to
representing and reasoning under uncertainty. In this Charter, the term «uncertainty»
is intended to encompass a variety of forms of incomplete knowledge, including
incompleteness, inconclusiveness, vagueness, ambiguity, and others. The term
«uncertainty reasoning» is meant to denote the full range of methods designed for
representing and reasoning with knowledge when Boolean truth values are unknown,
unknowable, or inapplicable. Commonly applied approaches to uncertainty reasoning
include probability theory, Dempster-Shafer theory, fuzzy logic, and numerous other
methodologies.” In this paper we use the term “uncertainty” in this wider (generic)
sense, and we would like to contribute to these efforts (for a related discussion
see [
        <xref ref-type="bibr" rid="ref23">23</xref>
        ]).
      </p>
      <p>In this paper we concentrate especially on issues connected with replacing human
abilities on the web by software. From this point of view, some sorts of uncertainty
are not “human_to_machine_web” specific: faulty sensors, input errors, data
recorded statistically, medical diagnosis, weather prediction, gambling, etc. These are
difficult for a human alone and also occur outside the web.</p>
      <p>
        According to Turtle and Croft [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ], uncertainty in information retrieval can be
found especially in three areas: “Firstly, there is the problem of the representation
and annotation of a resource (service). Difficulties arise also in case when attempting
to represent the degree to which a resource is relevant to the task. The second
problem is the representation of the kind of information, action, that a user needs to
retrieve, perform (this need is especially difficult since it typically changes during the
session). Thirdly, it is necessary to match user needs to resource concepts.”
      </p>
      <p>In our opinion, these areas of uncertainty apply also to our case, when replacing
human activities on the web by software. Specific tasks connected to these three
problems are depicted in Figure 1 and we will discuss them in this paper.</p>
      <p>Fig. 1. The chain from the Web to the user: Web → Web Content Mining → Middleware → User Profile Mining → User</p>
      <p>Our goal is to discuss uncertainty issues based on a system integrating the whole
chain of tools from the Web to the user. The uncertainty problem here appears as a
problem of two inductive procedures. Two types of data mining that appear in such
systems will be discussed here: one is Web content mining and the second is user profile
(preference) mining. Middleware does the matching part and query evaluation
optimization.</p>
      <p>As a motivating example, assume that we have users looking for a hotel in a certain
region. The amount of data is huge and the data are distributed over several sites.
Moreover, users have different preferences, which are soft and difficult to express in a
standard query language.</p>
      <p>
        From the middleware point of view, there is no chance to evaluate user’s query
over all data. For middleware we have decided to use Fagin threshold algorithm [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ],
which can find the best (top-k) answers without examining all objects. Fagin's algorithm
works under the following assumptions. First, we have access to objects (in our case
hotels) in different lists, each ordered by the user's preference on a particular attribute and equipped with a
numerical score ranging from 0 to 1, e.g. f1(x) = cheap(x), f2(x) = close(x), … Second,
we have a combination function computing the total fuzzy preference value of an object
based on the preference values of its attributes, e.g. @(x) = ((3*cheap(x) + close(x))/4).
      </p>
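      <p>The threshold algorithm can be sketched as follows. This is an illustrative Python sketch for our running example, not the system's implementation: the hotel names and scores are hypothetical, and the combination function is the example @(x) = ((3*cheap(x) + close(x))/4) above.</p>

```python
# Illustrative sketch of Fagin's threshold algorithm (TA) for top-k retrieval.
# Hotel names and scores are hypothetical; the combination function is the
# running example @(x) = (3*cheap(x) + close(x)) / 4.

def combine(cheap, close):
    return (3 * cheap + close) / 4

# Two lists, each sorted by descending preference score for one attribute.
cheap_list = [("H3", 0.9), ("H1", 0.7), ("H2", 0.4), ("H4", 0.1)]
close_list = [("H1", 1.0), ("H2", 0.8), ("H4", 0.5), ("H3", 0.2)]

def random_access(lst, obj):
    return dict(lst)[obj]

def top_k(k):
    seen = {}                      # object -> combined score
    for (o1, s1), (o2, s2) in zip(cheap_list, close_list):
        for o in (o1, o2):         # sorted access, then random access
            if o not in seen:
                seen[o] = combine(random_access(cheap_list, o),
                                  random_access(close_list, o))
        # Threshold: best possible score of any object not yet seen.
        threshold = combine(s1, s2)
        best = sorted(seen.items(), key=lambda p: -p[1])[:k]
        if len(best) == k and best[-1][1] >= threshold:
            return best            # no unseen object can beat the current top-k
    return sorted(seen.items(), key=lambda p: -p[1])[:k]
```

The point of the threshold test is that the algorithm may stop after reading only a prefix of each sorted list, which is exactly why the middleware never has to evaluate the query over all data.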
      <p>In a practical application we have to consider different users with possibly
different attribute orderings f1u, f2u and combination functions @u. These represent the
overall user preference @u(f1u, f2u) and the user profile for this task. The task of the
user profile mining part is to find these particular attribute orderings and the
combination function (using the user's ranking of a sample of hotels).</p>
      <p>On the web side of our system, the information of vendors, companies or
advertisers is very often presented on Web pages in a structured layout
containing data records. These serve for company presentation and are assumed to be
visited mainly by potential customers personally.</p>
      <p>Structured data objects are a very important type of information on the Web
for systems dealing with competitor tracking, market intelligence or the tracking of
pricing information from sources such as vendors.</p>
      <p>We need to bring these data into our middleware. Due to the size of the Web, the
bottleneck is the degree of automation of data extraction. We have to balance the
tradeoff between the degree of automation of Web data extraction and the amount of
user (administrator) effort needed to train the data extractor for a special type of
pages (increasing precision).</p>
      <p>The first restriction we make is to consider Web pages containing several
structured data records. This is usually the case for Web pages of companies and
vendors containing information about products and services and, in our case, hotels.
The main problem is to extract the data, and especially the attribute values, into the middleware.</p>
      <p>Although we use a system which has these modules in an experimental implementation,
we do not present the system itself here. Our main contributions are:
• identification of some uncertainty issues in a web content mining system
extracting attribute values from structured pages with several records;
• identification of some uncertainty issues in the user profile model and in
profile mining methods;
• a discussion of the coupling of these systems via a middleware based on the Fagin
threshold algorithm complemented by various storage and querying
methods.
We point to uncertainty issues by inserting (UNC) at the appropriate place in the text.</p>
    </sec>
    <sec id="sec-2">
      <title>Uncertainty in Web Content Mining</title>
      <p>In this section we describe our experience with a system for information extraction
from certain types of web pages and try to point out the places where uncertainty
occurs.</p>
      <p>Using our motivation as a running example, imagine a user looking for a hotel in a
certain location. A relevant page for a user searching for hotels can look like the one in
Figure 2. Comparing more similar pages would increase the chance of finding the best
hotel. An automated tool would enhance this search.</p>
      <p>
        For structured Web data extraction it is possible to use semiautomatic systems like
Lixto [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], Stalker [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ] or WIEN [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ]. These require user-preannotated pages, which
are used in the training process. Moreover, they are most suitable for pages which
have dynamic content but a relatively fixed structure.
      </p>
      <p>
        Our solution is based on a different approach. Instead of training techniques, we use
automatic discovery of data regions which encompass multiple similar data
records on the page. This is supported by an extraction ontology [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], which is used to
extract the values from data records. There are many ways to search for similar
records in the source tree. The system IEPAD [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] uses the features of a Patricia tree (radix
tree) to find repeating sequences. This system is outperformed by the MDR
system [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] which operates directly on the DOM tree of the input, in which it searches for
repeating node sequences with the same parent. However, both methods search for
objects of interest in the whole web document. This can be time consuming and, as
we have experienced, it surprisingly decreases precision. Furthermore, these systems
do not extract attribute values from data records.
      </p>
      <p>In this paper we consider a system as a sequence of both data record extraction and
attribute value selection, with the possibility of the ontology starting almost from scratch
(e.g. from user search key words).</p>
      <p>The system operates in several phases, which are described in the
following sections.</p>
      <sec id="sec-2-1">
        <title>Data Regions and Data Records Discovery</title>
        <p>
          The first step in the extraction process is the retrieval of relevant web pages. For
automatic localization of such resources we use the system Egothor (see [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ], [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ]
and [
          <xref ref-type="bibr" rid="ref24">24</xref>
          ]), which is an open-source, high-performance,
full-featured text search engine. This system is used for
downloading the HTML source codes of relevant pages.
        </p>
        <p>
          In the next step we build a DOM model [
          <xref ref-type="bibr" rid="ref21">21</xref>
          ] of the
web page under consideration. This model is used for
both data region and data records extraction. Figure 2
shows an example of relevant web page. This page
contains summary information about three hotels, i. e.
three data records. All of them form a single data region.
        </p>
        <p>Our goal is to automatically discover this data region
and the records within it. (It should be noted that the
discovery process is not limited to single-region
pages.)</p>
        <p>To reduce the search space and to increase precision,
we prune the input DOM tree, omitting elements which
do not contain any textual information in their subtrees.</p>
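        <p>As a minimal illustration of this pruning step (an illustrative sketch with a toy node class, not the system's DOM implementation):</p>

```python
# Minimal sketch of pruning a DOM-like tree: drop elements whose subtree
# contains no text. The Node class is a hypothetical stand-in for a DOM node.

class Node:
    def __init__(self, tag, text="", children=None):
        self.tag = tag
        self.text = text
        self.children = children or []

def prune(node):
    """Return a pruned copy of the subtree, or None if it holds no text."""
    kept = [c for c in (prune(ch) for ch in node.children) if c is not None]
    if not kept and not node.text.strip():
        return None                      # zero-weighted node: omit it
    return Node(node.tag, node.text, kept)
```

Pruning like this shrinks the search space for the subsequent data record discovery, which is exactly the motivation stated above.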
        <p>An example of such a tree is shown in Figure 3 – the
numbers in black circles represent the relevance of the
particular node. Zero-weighted nodes are omitted from
the data record search. (UNC1) Identifying nodes with
relevant information in the sub-tree is the first
uncertainty problem we point out in our system.</p>
        <p>Next, we use breadth-first partial tree alignment to
detect data regions and records by taking element tuples,
triples, etc. and comparing their corresponding subtrees
by various metrics (e.g. the tree Levenshtein distance).
(UNC2) Tuning the similarity measures for the discovery of similar tags is another
uncertainty problem in our system.</p>
        <p>Fig. 3. DOM subtree</p>
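        <p>One of the metrics mentioned above can be sketched as follows; this illustrative Python sketch compares the flattened tag sequences of two subtrees by Levenshtein distance (a simplification of full tree edit distance, not the system's actual metric):</p>

```python
# Sketch of a subtree similarity metric: Levenshtein distance over the
# tag sequences of two subtrees, normalized to a degree in [0, 1].

def levenshtein(a, b):
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (x != y)))  # substitution
        prev = cur
    return prev[-1]

def similarity(tags_a, tags_b):
    """Normalized similarity; 1.0 means identical tag sequences."""
    longest = max(len(tags_a), len(tags_b), 1)
    return 1 - levenshtein(tags_a, tags_b) / longest
```

The uncertainty (UNC2) then amounts to choosing the similarity threshold above which two subtrees are considered records of the same region.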
        <p>Most often, every repeated sequence of tags discovered in section 2.1 makes up a
real data record (a single hotel). All attributes of this record can be found in one
subtree and we can proceed to the attribute extraction using the ontology. However,
non-contiguous data records can pose a problem in the region discovery phase.
Typically a data record constitutes a single visual region; nevertheless, in the HTML
code two or more records can occur in a single table, which means that attributes of
these records have a common subtree. It is therefore necessary to identify
non-contiguous data records and separate the attributes of these records (UNC3).</p>
      </sec>
      <sec id="sec-2-2">
        <title>Attribute Values Extraction</title>
        <p>As we have mentioned before, we use an ontology to extract the actual attribute
values of the products on the page. This ontology is dynamic – it starts from scratch,
containing the user's search keywords, and subsequently evolves with new key words
and typical values (using standard vocabularies). It is represented in OWL syntax with
additional annotation properties and allows the specification of value extraction
parameters: e.g. a regular expression which can be used to match the attribute values,
an explicit enumeration of possible attribute values, or tuning parameters (such as
the maximum or minimum attribute value length). It is evident that a richer ontology
leads to better results in the extraction process. An example of an ontology specification
can be seen in Figure 4:
&lt;owl:DatatypeProperty rdf:ID="hasPrice"&gt;
  &lt;rdfs:domain rdf:resource="#Hotel"/&gt;
  &lt;p1:maxLength rdf:datatype="http://www.w3.org/2001/XMLSchema#string"&gt;
    10
  &lt;/p1:maxLength&gt;
  &lt;p1:pattern rdf:datatype="http://www.w3.org/2001/XMLSchema#string"&gt;
    (\$)? ?[\d]{1,10} ?(.){1,3}
  &lt;/p1:pattern&gt;
  &lt;rdfs:label rdf:datatype="http://www.w3.org/2001/XMLSchema#string"&gt;
    PRICE
  &lt;/rdfs:label&gt;
&lt;/owl:DatatypeProperty&gt;</p>
        <p>The extraction process can be improved in various ways. We have experimented with
data extraction from the detail page (which is usually linked from the summary page),
including OCR usage and a special technique based on a text difference algorithm and
style sheet analysis for better attribute value extraction. Additionally, it is possible to
employ approximate regular expression matching, which allows us to detect and
repair mistyped or mismatched attribute values.</p>
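        <p>To illustrate how such extraction parameters are applied, the following Python sketch uses the hasPrice pattern and maxLength from Figure 4; the input record text is hypothetical and the helper is not part of the system's code:</p>

```python
import re

# Applying the extraction parameters of the hasPrice property from Figure 4:
# a regular expression (p1:pattern) and a maximum value length (p1:maxLength).
# The input record text below is hypothetical.

PRICE_PATTERN = re.compile(r"(\$)? ?[\d]{1,10} ?(.){1,3}")
MAX_LENGTH = 10

def extract_price(text):
    for m in PRICE_PATTERN.finditer(text):
        value = m.group(0).strip()
        if len(value) <= MAX_LENGTH:      # honour the p1:maxLength parameter
            return value
    return None                           # no value found for this attribute
```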
      </sec>
      <sec id="sec-2-3">
        <title>Semantic Web infrastructure</title>
        <p>
          User preference mining is done locally and assumes the extracted data are stored in
middleware. Extracted data have to be modeled on an OWA (open world assumption)
model, and hence traditional database models are not appropriate. We are compatible
with a semantic web infrastructure described in [
          <xref ref-type="bibr" rid="ref20">20</xref>
          ]. The storage is based on the ideas
of Data Pile described in [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ]. A typical record schema resembles an RDF statement
together with some statements about this statement (nevertheless, we do not need reification
here).
        </p>
        <p>resource | attribute | value | Extracted_from | Extracted_by | Using_Ontology
Hotel1 | Price | V1 | URL1.html | Tool1 | O1
Hotel1 | Distance | D1 | URL1.html | Tool1 | O1</p>
        <p>If a value of an attribute is missing, for our middleware system it means that the
record is missing (thus implementing the OWA). Note that we have records without any
uncertainty degree attached. Any application can evaluate it according to the
remaining values (e.g. it can be known that Tool1 is highly reliable at extracting
prices, but less so at distances).</p>
        <p>To know what we are looking for and which attribute values to extract, we need to
know the user's interest. For the middleware we moreover need to know the orderings of
the particular attributes and the combination function.</p>
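        <p>The storage schema above can be mimicked by provenance-annotated facts; the following is an illustrative Python sketch (using the example values URL1.html, Tool1, O1 from the table), not the Data Pile implementation of [2]:</p>

```python
# Toy sketch of the data-pile style storage above: each fact is a
# (resource, attribute, value) statement plus statements about its origin.

pile = [
    {"resource": "Hotel1", "attribute": "Price",    "value": "V1",
     "extracted_from": "URL1.html", "extracted_by": "Tool1", "using_ontology": "O1"},
    {"resource": "Hotel1", "attribute": "Distance", "value": "D1",
     "extracted_from": "URL1.html", "extracted_by": "Tool1", "using_ontology": "O1"},
]

def lookup(resource, attribute):
    """Missing attribute = missing record (open world assumption):
    we return None rather than asserting that no such value exists."""
    for fact in pile:
        if fact["resource"] == resource and fact["attribute"] == attribute:
            return fact["value"]
    return None
```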
      </sec>
      <sec id="sec-2-4">
        <title>Usage of user profiles as the user preference model</title>
        <p>One possibility for modeling user preferences is to use user profiles. We work with the
assumption that we have a set of user profiles P1,…,Pk and that we know the ideal hotel
for each profile. These profiles may be created as clusters of users or manually by
an expert in the field (a hotel-keeper in our example). Manual creation is more
suitable because we then know more details about the users, but it is often impossible.
Independently of the way the profiles are created, we have ratings of hotels associated
with each profile, thus knowing the best and worst hotels for that profile.</p>
        <p>We propose computing the distance di of the user User1 from each profile Pi
in the following way:</p>
        <p>di = ( Σj=1,…,n | Rating(User1, oj) − Rating(Pi, oj) | ) / n    (1)</p>
        <p>Equation (1) represents the average difference between the user’s rating of an
object oj and profile’s Pi’s rating.</p>
        <p>The ideal hotel for the user can be computed as an average of the ideal hotels for each
profile Pi, weighted by the inverse of the distance di (see (2)). The average is computed
over the attributes of the hotels. Formally,</p>
        <p>IdealHotel(User1) = ( Σi (1/di) · IdealHotel(Pi) ) / ( Σi 1/di )    (2)</p>
        <p>Then, IdealHotel(User1) is the weighted centroid of the profiles' best hotels. An
example of the data, the user profiles' best hotels and the user's best hotel is shown in Figure 5. The user's
best hotel is clearly closest to Profile 3.</p>
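        <p>Equations (1) and (2) can be sketched directly in code; this is an illustrative Python sketch with hypothetical ratings and attribute vectors, not the system's implementation:</p>

```python
# Sketch of equations (1) and (2): average rating difference to each profile,
# then an inverse-distance weighted centroid of the profiles' best hotels.

def profile_distance(user_ratings, profile_ratings):
    """Equation (1): mean absolute rating difference over the sample."""
    n = len(user_ratings)
    return sum(abs(user_ratings[o] - profile_ratings[o])
               for o in user_ratings) / n

def ideal_hotel(distances, best_hotels):
    """Equation (2): attribute-wise average of the profiles' best hotels,
    weighted by 1/di (a tiny epsilon guards against a zero distance)."""
    weights = {p: 1.0 / (d + 1e-9) for p, d in distances.items()}
    total = sum(weights.values())
    dims = len(next(iter(best_hotels.values())))
    return [sum(weights[p] * best_hotels[p][k] for p in best_hotels) / total
            for k in range(dims)]
```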
        <p>Fig. 5. Positions of best hotels for the user profiles and for the user</p>
      <p>After the computation of the ideal hotel for the user, we use it to compute the
ratings of the remaining hotels. A disadvantage of this user model is that it cannot be used
in the Fagin threshold algorithm.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Uncertainty in User Preference Mining</title>
      <p>In our approach, user preferences are expressed in the form of classification rules,
where the values of attributes are assigned grades corresponding to the orderings
of the domains of these attributes. The higher the grade, the more appropriate
(preferable) the value of an attribute is for the given user. This form of grading
corresponds to the truth values well known in the fuzzy community, and thus the orderings
correspond to fuzzy functions.</p>
      <p>
        The combination function can be represented by a fuzzy aggregation function (see
[
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]). Fuzzy aggregation functions are monotone functions of n variables, with the
range of the unit interval [0, 1] of real numbers (in practical applications we use only
a finite part of it).
      </p>
      <p>The main assumption of our learning of user preferences is that we have a
(relatively small) sample of objects (hotels) evaluated by the user. We would like to
learn his/her preferences from this sample evaluation. The point is to use the learned
user preferences to retrieve the top-k objects from a much larger amount of data.
Moreover, using the user's sample evaluation, we do not have to deal with the problem
of matching the query language and the document language. These ratings are a form of
QBE – querying by example.</p>
      <p>
        There are many approaches to user modeling; one of the most widely used is the
collaborative filtering method [
        <xref ref-type="bibr" rid="ref28">28</xref>
        ]. Our method is content-based filtering – it uses information
about the attributes of objects.
      </p>
      <sec id="sec-3-1">
        <title>Learning Local Preferences</title>
        <p>
          In [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ] and [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ] we have described several techniques for learning a user's preferences
for particular attributes (UNC5), represented by fuzzy functions f1, f2, … on the attribute
domains. These techniques use regression methods. A problem occurs here: there can
be a potentially large number of hotels of one sort (e.g. cheap ones), but the detection of
the user's preference (cheap, medium or expensive) should not be influenced by the
number of such hotels. Regression is typically influenced by the number of objects. We have
introduced a special discretization technique to get the user's true local preference
(for details see [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ] and [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ]).
        </p>
        <p>Another approach, not using regression, is the following. A view of the whole
domain of the attribute Price is shown in Figure 6: with increasing price, the
rating decreases. This can be formalized (the details are out of the scope of this paper)
and we have also experimented with this possibility. These methods also give a local
preference in the form of a fuzzy function (here small, cheap, …) and hence are usable
for the Fagin threshold algorithm.</p>
        <p>Fig. 6. Ratings for the whole attribute domain (number of objects and rating per price interval [35,70), [70,100), [100,150))</p>
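        <p>The non-regression approach can be sketched as follows; this illustrative Python sketch derives a local preference (a fuzzy function on the Price domain) by normalizing the average rating per price interval, with hypothetical ratings, and is not the formalization omitted above:</p>

```python
# Sketch: derive a local preference fuzzy function from the average user
# rating per price interval (intervals as in Figure 6, ratings hypothetical).

avg_rating = {(35, 70): 4.5, (70, 100): 3.0, (100, 150): 1.5}

def local_preference(avg_rating):
    lo, hi = min(avg_rating.values()), max(avg_rating.values())
    # Normalize each interval's average rating to a fuzzy degree in [0, 1].
    degrees = {iv: (r - lo) / (hi - lo) for iv, r in avg_rating.items()}
    def cheap(price):
        for (a, b), deg in degrees.items():
            if a <= price < b:
                return deg
        return 0.0
    return cheap
```

The resulting function plays the role of f1(x) = cheap(x) in the threshold algorithm's sorted lists.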
      </sec>
      <sec id="sec-3-2">
        <title>Learning Combination Function</title>
        <p>
          Second assumption of the Fagin’s model [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ] is to have a combination function @,
which combines the particular attribute preference degrees f1, f2, … (local
preferences) to an overall score – @(f1, f2, …) - according to which the top-k answers
will be computed.
        </p>
        <p>
          There are several ways to learn (UNC6) the combination function, and several
models. Learning it is an instance of classification trees with monotonicity constraints (see
[
          <xref ref-type="bibr" rid="ref17">17</xref>
          ], where more references to ordinal classification are presented).
        </p>
        <p>
          We learn the aggregation function by the method of Inductive Generalized
Annotated Programming (IGAP) described in [
          <xref ref-type="bibr" rid="ref13 ref14">13, 14</xref>
          ]. The result of IGAP is a set of
Generalized Annotated Program rules in which the combination function has the form
of a function annotating the head of the rule – here the quality of the hotel:
User1_hotel(H) good in degree at least @( f1(x), f2(y), …)
IF User1_hotel_price(x) good in degree at least f1(x) AND
        </p>
        <p>
          User1_hotel_distance(y) good in degree at least f2(y)
Note that these are rules of generalized annotated programs [
          <xref ref-type="bibr" rid="ref25">25</xref>
          ].
        </p>
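          <p>IGAP itself is described in [13, 14]; as a much simpler illustrative stand-in (not IGAP), the following Python sketch shows what "learning @" amounts to: fitting the weights of a linear combination function from rated samples, with nonnegative weights enforcing monotonicity. The sample data are hypothetical.</p>

```python
# Illustrative stand-in for learning a combination function @: fit the
# weights of a weighted average by stochastic gradient descent on squared
# error, projecting weights to be nonnegative (so @ stays monotone).

def learn_weights(samples, ratings, steps=2000, lr=0.05):
    """samples: list of (f1, f2, ...) local preference tuples in [0, 1];
    ratings: the user's overall score for each sample."""
    m = len(samples[0])
    w = [1.0 / m] * m
    for _ in range(steps):
        for fs, y in zip(samples, ratings):
            pred = sum(wi * fi for wi, fi in zip(w, fs))
            err = pred - y
            w = [max(0.0, wi - lr * err * fi) for wi, fi in zip(w, fs)]
    return w

def combine(w, fs):
    return sum(wi * fi for wi, fi in zip(w, fs))
```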
      </sec>
    </sec>
    <sec id="sec-4">
      <title>The Implementation and Experiments</title>
      <p>Our Web content mining system has a modular implementation which allows
additional modules to be incorporated (e.g. preference-based
querying). Communication between modules is based on the traditional
Observer/Listener design pattern. All modules which require communication with
other ones have to implement a Listener interface. All listeners are bound to the
central Bus, which manages the communication between them. Each listener can
specify the range of broadcast and received events it supports.</p>
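      <p>The Bus/Listener communication can be sketched as follows; the class and event names in this Python sketch are illustrative, not the system's actual Java API:</p>

```python
# Sketch of the central Bus from the Observer/Listener design described
# above: listeners register with the event types they accept.

class Bus:
    def __init__(self):
        self.listeners = []          # (accepted event types, listener)

    def register(self, listener, accepts):
        self.listeners.append((set(accepts), listener))

    def broadcast(self, event_type, payload):
        for accepts, listener in self.listeners:
            if event_type in accepts:
                listener(event_type, payload)

bus = Bus()
log = []
bus.register(lambda t, p: log.append((t, p)), accepts={"record_extracted"})
bus.broadcast("record_extracted", {"hotel": "H1"})
bus.broadcast("page_loaded", {"url": "URL1.html"})   # no listener accepts this
```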
      <p>
        We have proposed and implemented a middleware system for performing top-k
queries over RDF data. As a Java library, our system can be used either on the server
side, for example in a Web service, or on the client side. In both cases, it gathers
information from local or Web data sources and combines them into one ordered list.
To avoid reordering each time a user comes with a different ordering, we have designed
a general method using B+ trees to simulate an arbitrary fuzzy ordering of a domain ([
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]).
There are several implemented classes for standard user scoring functions and for the Fagin
TA and NRA algorithms.
      </p>
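      <p>The idea of serving sorted access under a user-specific fuzzy ordering without re-sorting can be illustrated for a peaked fuzzy function over an index sorted by the raw domain value; this Python sketch is only an illustration of the idea, not the B+ tree method of [6]:</p>

```python
import bisect

# Sketch: given an index sorted by the raw value, serve sorted (descending
# score) access for a fuzzy function whose score decreases with distance
# from a peak, by merging outward from the peak. Scores assumed in [0, 1].

def sorted_access(index, score, peak):
    """index: list of (value, object) pairs sorted by value."""
    right = bisect.bisect_left(index, (peak, ""))
    left = right - 1
    while left >= 0 or right < len(index):
        l = score(index[left][0]) if left >= 0 else -1   # -1 = exhausted side
        r = score(index[right][0]) if right < len(index) else -1
        if l >= r:
            yield index[left][1], l
            left -= 1
        else:
            yield index[right][1], r
            right += 1
```

Such a generator is exactly the sorted-access interface that the TA and NRA implementations consume.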
      <p>A detailed description of the experiments is out of the scope of this paper. We can
conclude that the experiments have shown this solution to be viable.</p>
    </sec>
    <sec id="sec-5">
      <title>Conclusions and Future Work</title>
      <p>Using an experimental implementation, in this paper we have identified several
uncertainty challenges, namely when
(UNC1) identifying HTML nodes with relevant information in the sub-tree,
(UNC2) tuning similarity measures for the discovery of similar tag subtrees,
(UNC3) identifying single data records in non-contiguous HTML source,
(UNC4) extracting attribute values,
(UNC5) learning the user's preferences for particular attributes, and
(UNC6) learning the user's preference combination function.</p>
      <p>We have experimented with some candidate solutions.</p>
      <p>Models and methods in these experiments can be based on models of fuzzy
description logic (FDL).</p>
      <p>
        One possibility is to use an FDL with both concepts and roles fuzzified (see e.g.
        [
        <xref ref-type="bibr" rid="ref26">26</xref>
        ]). One problem with embedding an FDL with fuzzy roles into OWL is that fuzzy role
assertions consist of a subject, predicate, object and a fuzzy value. Such quadruples cannot be directly
modeled by RDF data.
      </p>
      <p>
        A second possibility is to use an FDL where only concepts are fuzzified and roles
remain crisp (hence both roles and fuzzy concepts can be modeled by RDF data).
One such example is fEL@ introduced in [
        <xref ref-type="bibr" rid="ref27">27</xref>
        ].
      </p>
      <p>Acknowledgement. This work was supported in part by Czech projects 1ET
100300517 and 1ET 100300419 and Slovak projects VEGA 1/3129/06 and NAZOU.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Baumgartner</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Flesca</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gottlob</surname>
          </string-name>
          , G.:
          <article-title>Visual Web Information Extraction</article-title>
          . VLDB Conference,
          <year>2001</year>
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Bednárek</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Obdržálek</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yaghob</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zavoral</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          :
          <article-title>Data Integration Using DataPile Structure</article-title>
          ,
          <source>In: Proceedings of the 9th East-European Conference on Advances in Databases and Information Systems, ADBIS</source>
          <year>2005</year>
          , Tallinn,
          <source>ISBN 9985-59-545-9</source>
          ,
          <year>2005</year>
          ,
          <fpage>178</fpage>
          -
          <lpage>188</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Berners-Lee</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hendler</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lassila</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          :
          <article-title>The Semantic Web</article-title>
          . In: Scientific American Magazine, May 2001
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Chang</surname>
            ,
            <given-names>C-H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lui</surname>
            ,
            <given-names>S-L.</given-names>
          </string-name>
          :
          <article-title>IEPAD: Information extraction based on pattern discovery</article-title>
          .
          <source>WWW-10</source>
          ,
          <year>2001</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Liu</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Grossman</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhai</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          :
          <article-title>Mining Data Records in Web Pages</article-title>
          .
          <source>In: Proc. SIGKDD '03</source>
          , August 24-27,
          <year>2003</year>
          , Washington, DC, USA.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Eckhardt</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pokorny</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vojtas</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>A system recommending top-k objects for multiple users preferences</article-title>
          .
          <source>In: 2007 IEEE Conference on Fuzzy Systems, IEEE</source>
          <year>2007</year>
          ,
          <fpage>1101</fpage>
          -
          <lpage>1106</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Eckhardt</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Horváth</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vojtáš</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>PHASES: A User Profile Learning Approach for Web Search</article-title>
          .
          <source>Accepted as short paper for WI'07 Web Intelligence Conference</source>
          ,
          <year>November 2007</year>
          , Fremont CA
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Eckhardt</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Horváth</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vojtáš</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>Learning different user profile annotated rules for fuzzy preference top-k querying</article-title>
          .
          <source>Accepted for SUM'07 Scalable Uncertainty Management Conference</source>
          ,
          <year>October 2007</year>
          , Washington DC Area
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Embley</surname>
            ,
            <given-names>D. W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Campbell</surname>
            ,
            <given-names>D. M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Smith</surname>
            ,
            <given-names>R. D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Liddle</surname>
            ,
            <given-names>S. W.</given-names>
          </string-name>
          :
          <article-title>Ontology-Based Extraction and Structuring of Information from Data-Rich Unstructured Documents</article-title>
          .
          <source>CIKM</source>
          <year>1998</year>
          ,
          <fpage>52</fpage>
          -
          <lpage>59</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Fagin</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lotem</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Naor</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Optimal Aggregation Algorithms for Middleware</article-title>
          .
          <source>In: Proc. 20th ACM Symposium on Principles of Database Systems</source>
          ,
          <fpage>102</fpage>
          -
          <lpage>113</lpage>
          ,
          <year>2001</year>
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Galamboš</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          :
          <article-title>Dynamization in IR Systems</article-title>
          .
          <source>In: Proc. IIPWM'04 - Intelligent Information Processing and Web Mining</source>
          , ed. M. A. Klopotek, Springer,
          <year>2004</year>
          ,
          <fpage>297</fpage>
          -
          <lpage>310</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Galamboš</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          :
          <article-title>Semi-automatic stemmer evaluation</article-title>
          , ibid.,
          <fpage>209</fpage>
          -
          <lpage>218</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Gurský</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Horváth</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Novotný</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vaneková</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vojtáš</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>UPRE: User preference based search system</article-title>
          ,
          <source>2006 IEEE/WIC/ACM International Conference on Web Intelligence (WI'06)</source>
          ,
          <source>IEEE</source>
          <year>2006</year>
          , pp.
          <fpage>841</fpage>
          -
          <lpage>844</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Horváth</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vojtáš</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>Ordinal Classification with Monotonicity Constraints</article-title>
          .
          <source>In: Proceedings of the 6th Industrial Conference on Data Mining (ICDM '06)</source>
          , Leipzig, Germany,
          <year>2006</year>
          : LNAI 4065, Springer,
          <year>2006</year>
          , ISBN 3-540-36036-0, pp.
          <fpage>217</fpage>
          -
          <lpage>225</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Kushmerick</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          :
          <article-title>Wrapper induction: efficiency and expressiveness</article-title>
          .
          <source>Artificial Intelligence</source>
          ,
          <volume>118</volume>
          :
          <fpage>15</fpage>
          -
          <lpage>68</lpage>
          ,
          <year>2000</year>
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>Muslea</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Minton</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Knoblock</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>A hierarchical approach to wrapper induction</article-title>
          .
          <source>Conf. on Autonomous Agents</source>
          ,
          <year>1999</year>
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <surname>Potharst</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Feelders</surname>
            ,
            <given-names>A. J.</given-names>
          </string-name>
          :
          <article-title>Classification trees for problems with monotonicity constraints</article-title>
          .
          <source>In: ACM SIGKDD Explorations Newsletter</source>
          ,
          <volume>4</volume>
          ,
          <issue>1</issue>
          (
          <year>June 2002</year>
          ): ACM Press,
          <year>2002</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>10</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <surname>Turtle</surname>
            ,
            <given-names>H. R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Croft</surname>
            ,
            <given-names>W. B.</given-names>
          </string-name>
          :
          <article-title>Uncertainty in Information Retrieval Systems</article-title>
          .
          <source>In: Proc. Second Workshop Uncertainty Management and Information Systems: From Needs to Solutions</source>
          , Catalina, Calif.,
          <year>1993</year>
          ; as quoted in S. Parsons:
          <article-title>Current Approaches to Handling Imperfect Information in Data and Knowledge Bases</article-title>
          .
          <source>IEEE TKDE 8</source>
          ,
          <issue>3</issue>
          (
          <year>1996</year>
          )
          <fpage>353</fpage>
          -
          <lpage>372</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          19.
          <string-name>
            <surname>Vojtáš</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>EL description logic with aggregation of user preference concepts</article-title>
          .
          In: Duží, M. et al. (eds.),
          <source>Information modeling and Knowledge Bases XVIII</source>
          , IOS Press, Amsterdam,
          <year>2007</year>
          ,
          <fpage>154</fpage>
          -
          <lpage>165</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          20.
          <string-name>
            <surname>Yaghob</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zavoral</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          :
          <article-title>Semantic Web Infrastructure using DataPile</article-title>
          ,
          <source>In: Proc. 2006 IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology</source>
          , Butz, C. J. et al. (eds.),
          <source>IEEE</source>
          <year>2006</year>
          ,
          <fpage>630</fpage>
          -
          <lpage>633</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          21.
          <article-title>W3C Document Object Model</article-title>
          . http://www.w3.org/DOM/
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          22.
          <article-title>Charter of W3C Uncertainty Reasoning for the World Wide Web Incubator Group</article-title>
          , http://www.w3.org/2005/Incubator/urw3/charter
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          23.
          <article-title>Wiki of W3C Uncertainty Reasoning for the World Wide Web XG Search:</article-title>
          http://www.w3.org/2005/Incubator/urw3/wiki/FrontPage
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>24. http://www.egothor.org/</mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          25.
          <string-name>
            <surname>Kifer</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Subrahmanian</surname>
            ,
            <given-names>V. S.</given-names>
          </string-name>
          :
          <article-title>Theory of generalized annotated logic programming and its applications</article-title>
          ,
          <source>J. Logic Programing</source>
          ,
          <volume>12</volume>
          (
          <year>1992</year>
          ) pp
          <fpage>335</fpage>
          -
          <lpage>367</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          26.
          <string-name>
            <surname>Stoilos</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stamou</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tzouvaras</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pan</surname>
            ,
            <given-names>J. Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Horrocks</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          :
          <article-title>The fuzzy description logic f-shin</article-title>
          .
          <source>Proceedings of URSW</source>
          ,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          27.
          <string-name>
            <surname>Vojtáš</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>EL description logic with aggregation of user preference concepts</article-title>
          .
          In: M. Duží et al. (eds.),
          <source>Information modelling and Knowledge Bases XVIII</source>
          , IOS Press, Amsterdam,
          <year>2007</year>
          ,
          <fpage>154</fpage>
          -
          <lpage>165</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          28.
          <string-name>
            <surname>Aggarwal</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>Collaborative Crawling: Mining User Experiences for Topical Resource Discovery</article-title>
          ,
          <source>IBM Research Report</source>
          ,
          <year>2002</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>