<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Using WordNet Glosses to Refine Google Queries</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Jan Nemrava</string-name>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>University of Economics</institution>
          ,
          <addr-line>Prague, W.Churchill Sq. 4, 130 67 Praha 3</addr-line>
          ,
          <country>Czech Republic</country>
        </aff>
      </contrib-group>
      <fpage>85</fpage>
      <lpage>94</lpage>
      <abstract>
        <p>This paper describes one way to overcome some of the major limitations of current fulltext search engines. It addresses the synonymy of web search engine results by clustering them into the relevant synonym category of a given word. It employs the WordNet lexical database and several linguistic techniques to classify the results on a search engine result page (SERP) into the appropriate synonym category according to WordNet synsets. Some methods to refine the classification are proposed, and initial experiments and results are described and discussed.</p>
      </abstract>
      <kwd-group>
        <kwd>text mining</kwd>
        <kwd>text classification</kwd>
        <kwd>web search engine</kwd>
        <kwd>WordNet gloss</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        Fulltext search engines have recently become a basic tool for acquiring arbitrary
information from the World Wide Web. The number of queries submitted to
Google rises rapidly, and so does the number of indexed pages. 'To google'
has become a commonly used verb describing the act of searching for information on
the Internet. Google now operates Internet domains in 135 countries
and, with its 88 language interfaces, is the world's leading search engine. This
makes Google and other search engines the most convenient tools for
easy access to any kind of information from a desktop PC, and it makes the
proclaimed information society viable. Nevertheless, some limitations still
play an important role when searching for information through keyword-based
search interfaces. One of the major problems of keyword-based web search
is that people tend to submit overly general queries (according to Search Engine
Journal [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], in 2004 more than 50% of all queries were only one or two words
long), which leads to a huge number of hits returned for a given query. One way
to deal with the huge number of returned web pages is to arrange the results
according to their proper meaning, using synonyms or word sense
disambiguation. The purpose of this paper is to describe techniques for
arranging the returned web sites into appropriate synonym classes, using the large lexical
database WordNet (http://wordnet.princeton.edu/) to discover the synonyms and Hearst patterns to
discover is-a relations between the queried term and its possible superclass (i.e.
hypernym) concept.
      </p>
      <sec id="sec-1-1">
        <p>The structure of this paper is as follows: Section 2 describes our motivation,
and Section 3 describes all the information sources that were used. Our goals
and the techniques used in this approach, together with examples and some
drawbacks and limitations, are discussed in Section 4. Before concluding, Section
5 discusses relevant work on this topic.
As stated in the Introduction, the problem of ambiguous queries presents
a strong limitation of current web search technology. Some query refinement
techniques are already emerging that allow users to zoom into a more specific
query, but most of the time they only provide a single "query modification"
list without distinguishing between the real meanings of the given word (e.g.
Ask Jeeves, http://www.ask.com). Another query refinement method, recently introduced by the leading
fulltext search engines, offers real-time suggestions while the user is typing
the query. One of the advantages is that the user sees the most suitable word
form for a particular search in real time (though the suggested word may
not be the grammatically or semantically best one, it is the one that is</p>
      </sec>
      <sec id="sec-1-2">
        <p>used by most users). Google Suggest (http://www.google.com/webhp?complete=1) is a good example of this method.
To our knowledge, no fulltext search engine is currently able to
separate returned results according to their meanings. Some efforts can be seen
in Vivisimo (http://www.vivisimo.com), but its approach has not been made public.</p>
        <p>
          In this paper we would like to present approach that use existing dictionary and
glosses describing its concepts together with the largest text corpora available,
the Internet, to discover meanings that the word inserted can carry. This work
was inspired by Philipp Cimiano’s work on Pankow [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ] system and the idea of
using heterogeneous evidence to confirm is-a relations.
        </p>
      </sec>
    </sec>
    <sec id="sec-2">
      <title>Information Sources</title>
      <p>
        In this section, we will describe the above mentioned techniques in detail. All
approaches used here are well known among the Semantic Web [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] community.
They are frequently used for ontology learning and for creating is-a
relations and taxonomies. Namely, they are:
– WordNet: a large lexical database containing words organized in synsets
(synonym sets).
– Hearst patterns: a technique exploiting certain lexico-syntactic patterns to
discover is-a relations between two given concepts.
– monothetic clustering: an information retrieval technique used for grouping
documents according to a specified feature.
– fulltext search engine: the Google API interface.
      </p>
      <p>– NLP: natural language processing techniques.</p>
      <sec id="sec-2-1">
        <title>WordNet</title>
        <p>
          The main source of information is WordNet [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ]. WordNet is a large lexical
database containing about 150,000 words organized into over 115,000 synsets, for
a total of 203,000 word-sense pairs. Each word comes with a short
description called a gloss. The glosses are usually one or two sentences long. Although
all ordinary parts of speech are present, it is the nouns that are of
major importance for us, because one of them is most likely a superordinate concept (a
hypernym) of the given word. This is the key idea of this paper.
        </p>
        <p>After a user inserts a proper noun, it is looked up in WordNet and all
its meanings stored in WordNet are extracted together with their glosses. Each
synonym has exactly one gloss. Each gloss is preprocessed and then labeled
by a POS tagger. The preprocessing consists of eliminating punctuation,
hyphenation and stop words. POS tagging follows, and only the nouns are kept and
saved as candidate nouns. Candidate nouns are words that can potentially be
selected as a hypernym for the given term.</p>
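        <p>The preprocessing pipeline above can be sketched as follows, using the SYN 1 gloss for Pluto that appears later in this paper. This is only an illustration: the stop-word list is abridged, and the tiny POS lexicon is a hypothetical stand-in for a real POS tagger, which the paper does not name.</p>

```python
import re

# Abridged stop-word list and a hypothetical mini POS lexicon; a real
# system would use a full stop-word list and a trained POS tagger.
STOP_WORDS = {"a", "an", "and", "the", "of", "all", "from", "has", "in", "by", "most"}
POS_LEXICON = {  # word -> part of speech (illustrative stand-in)
    "planet": "NN", "planets": "NN", "sun": "NN", "orbit": "NN",
    "small": "JJ", "farthest": "JJ", "known": "JJ", "elliptical": "JJ",
}

def candidate_nouns(gloss):
    """Extract candidate hypernym nouns from a WordNet gloss."""
    words = re.findall(r"[a-z]+", gloss.lower())               # drop punctuation
    words = [w for w in words if w not in STOP_WORDS]          # drop stop words
    nouns = [w for w in words if POS_LEXICON.get(w) == "NN"]   # keep nouns only
    seen, result = set(), []
    for n in nouns:                                            # deduplicate, keep order
        if n not in seen:
            seen.add(n)
            result.append(n)
    return result

gloss_syn1 = ("a small planet and the farthest known planet from the sun; "
              "has the most elliptical orbit of all the planets")
print(candidate_nouns(gloss_syn1))  # -> ['planet', 'sun', 'orbit', 'planets']
```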
      </sec>
      <sec id="sec-2-2">
        <title>Hearst Patterns</title>
        <p>
          Hearst patterns are lexico-syntactic patterns first used by M. A. Hearst [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ] in
1992. These patterns indicate the existence of a class/subclass relation in an
unstructured data source, e.g. web pages. Examples of the lexico-syntactic patterns
described in [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ] are the following:
– NP0 such as NP1, NP2, ..., NPn−1 (and | or) NPn
– such NP0 as NP1, NP2, ..., NPn−1 (and | or) NPn
– NP1, NP2, ..., NPn−1 (and | or) other NP0
– NP0 (including | especially) NP1, NP2, ..., NPn−1 (and | or) NPn
– and the very common "NPi is a NP0"
Hearst first noticed that from the patterns above we can derive that for all NPi,
1 ≤ i ≤ n, hyponym(NPi, NP0) holds. Given two terms t1 and t2, we can record
how many times these patterns indicate an is-a relation between
t1 and t2. Some normalizing technique should be employed, as some of the
patterns will likely occur more frequently than others. Although Cimiano
[
          <xref ref-type="bibr" rid="ref3">3</xref>
          ] noticed that Hearst patterns occur relatively rarely in a closed corpus, and, as
described later, the same holds on the Internet, their results provide valuable
information. The main drawback is that Google search offers no
proximity operators, and since the query must be requested as an exact match, the user
must enter the whole pattern in its exact word order. For example, searching for the pattern "planets
such as Pluto, Neptune and Uranus" returns about 50 results, while
"planets such as Pluto, Uranus and Neptune" returns none. The most powerful
pattern, which we use for primary decisions, is "NPi is a NP0".
        </p>
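        <p>The patterns above can be instantiated as exact-match query strings for a given (term, candidate hypernym) pair. A minimal sketch; the helper name is our own, and the naive "+s" pluralization mirrors the simple rules described later in the paper.</p>

```python
def hearst_queries(term, candidate):
    """Exact-match query strings instantiating the Hearst patterns
    for one (term, candidate hypernym) pair."""
    return [
        f'"{term} is a {candidate}"',        # NPi is a NP0 (primary pattern)
        f'"{term} is {candidate}"',          # variant without the article
        f'"{candidate}s such as {term}"',    # NP0 such as NPi (naive plural)
        f'"{term} and other {candidate}s"',  # NPi and other NP0 (naive plural)
    ]

print(hearst_queries("Pluto", "planet"))
```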
      </sec>
      <sec id="sec-2-3">
        <title>Clustering</title>
        <p>
          Associating documents with the relevant category (a synonym category in our case) is
a task very similar to a classic information retrieval task that van
Rijsbergen [
          <xref ref-type="bibr" rid="ref16">16</xref>
          ] called polythetic clustering, in which a document's membership in a cluster is
based on a sufficient fraction of the terms that define the cluster. As stated in [
          <xref ref-type="bibr" rid="ref17">17</xref>
          ],
creating is-a relations is a special case of polythetic clustering in which a subclass
belongs to only one superclass; membership is therefore based
on a single feature, and such clusters are called monothetic.
        </p>
        <p>This alternative form of clustering has two advantages over the polythetic
variety. The first is the relative ease with which one can understand the topic
covered by each cluster. The second is that
one can guarantee that a document within a cluster is about that cluster's
topic. Neither of these is guaranteed with polythetic clusters.</p>
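        <p>A monothetic assignment of result snippets to synonym classes can be sketched as follows, assuming each class is defined by the single hypernym noun found for it. The snippets and class features here are invented for illustration.</p>

```python
def monothetic_clusters(snippets, features):
    """Assign each snippet to the first cluster whose single defining
    feature (a hypernym noun) occurs in its text. Membership rests on
    exactly one feature, which is what makes the clusters monothetic."""
    clusters = {f: [] for f in features}
    unassigned = []
    for s in snippets:
        text = s.lower()
        for f in features:
            if f in text:
                clusters[f].append(s)
                break
        else:                      # no feature matched
            unassigned.append(s)
    return clusters, unassigned

snippets = [
    "Pluto is the ninth planet from the sun",
    "In Greek mythology Pluto was the god of the underworld",
    "Pluto the cartoon dog first appeared in 1930",
]
clusters, rest = monothetic_clusters(snippets, ["planet", "god", "cartoon"])
print({k: len(v) for k, v in clusters.items()})
```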
      </sec>
      <sec id="sec-2-4">
        <title>Google API</title>
        <p>The world's leading fulltext search engine provides direct access to its huge database
through the Google API (http://www.google.com/apis). The number of queries per day is limited.</p>
        <sec id="sec-2-4-1">
          <p>Compared to the HTML-based interface the API is relatively slow, but it provides easy access from
any programming language. Each query is answered in the same way as in the
HTML interface: the user gets the number of results, web page titles, links and
snippets (short descriptions of web pages based either on the META description tag or on a
part of the text with emphasized keywords). Our algorithm searches for very specific
text patterns, and we are interested only in the aggregate number of results.
The next section describes the application of the information sources described above and
some initial results.</p>
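          <p>A thin client around such an API might cache hit counts and guard the daily quota. Everything here is hypothetical: the Google SOAP API used in this paper is long retired, so the backend function stands in for whatever hit-count service is available, and the toy counts are made up.</p>

```python
class QuotaError(RuntimeError):
    pass

class CountingSearchClient:
    """Cache aggregate hit counts and respect a daily query quota
    (the Google API allowed 1,000 queries per day)."""

    def __init__(self, backend, daily_limit=1000):
        self.backend = backend          # hypothetical hit-count function
        self.daily_limit = daily_limit
        self.used = 0
        self.cache = {}

    def hit_count(self, query):
        if query in self.cache:         # cached answers cost no quota
            return self.cache[query]
        if self.used >= self.daily_limit:
            raise QuotaError("daily query limit reached")
        self.used += 1
        count = self.backend(query)     # aggregate number of results only
        self.cache[query] = count
        return count

# toy backend with made-up counts, for illustration only
fake_counts = {'"Pluto is a planet"': 1550}
client = CountingSearchClient(lambda q: fake_counts.get(q, 0), daily_limit=2)
print(client.hit_count('"Pluto is a planet"'))  # -> 1550
```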
        </sec>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Discovering the synonym classes</title>
      <p>As already described in the section about WordNet, certain nouns from the
so-called glosses are of main interest to us. According to our observations, a gloss
mostly contains one noun that is a hypernym of the given concept. This is the core
prerequisite of our method, as our aim is to find that hypernym noun among the
words of the gloss. After some simple NLP methods are applied, we retrieve the candidate
nouns for each gloss. What follows is a description of a concrete situation that our
script has to deal with. The example is the term Pluto, which can be found in three
different contexts according to WordNet: Pluto can be a planet, a god or
a cartoon character.</p>
      <p>– WordNet glosses for the concept Pluto
- SYN 1: a small planet and the farthest known planet from the sun; has
the most elliptical orbit of all the planets
- SYN 2: (Greek mythology) the god of the underworld in ancient mythology;
brother of Zeus and husband of Persephone
- SYN 3: a cartoon character created by Walt Disney
– Candidate nouns for the concept Pluto:</p>
      <p>- SYN 1: planet; sun; orbit; planets
- SYN 2: Greek; god; underworld; mythology; brother; Zeus; husband; Persephone
- SYN 3: cartoon; character; Walt; Disney
– Patterns applied to SYN 1 (the number of returned results is in brackets)
- "Pluto is a planet" (1550), "Pluto is planet" (145)
- "Pluto is a sun" (2), "Pluto is sun" (0)
- "Pluto is a orbit" (0), "Pluto is orbit" (1)
- "Pluto is a planets" (0), "Pluto is planets" (0)
It is necessary to take into consideration the total number of web pages on which
the words are mentioned and to use this value to normalize the counts:
w(i) = tf(i) / TC(i)
(1)
where i represents the i-th synonym class, tf(i) is the number of results for the given
pattern, and TC(i) is the number of web pages returned when the two terms are queried
without any constraints; it represents the popularity of the given pair of terms.
The candidate hypernym noun is then simply the one with the highest value in the
synonym-class array.</p>
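      <p>Equation (1) and the selection of the maximal candidate, equation (2), can be sketched as follows. The "is a" hit counts are taken from the Pluto example above, while the co-occurrence totals TC(i) are invented for illustration, since the paper does not report them.</p>

```python
def best_hypernym(pattern_hits, cooccurrence_counts):
    """Compute w(i) = tf(i) / TC(i) for each candidate noun (equation 1)
    and return the candidate with the maximal weight (equation 2)."""
    weights = {}
    for noun, tf in pattern_hits.items():
        tc = cooccurrence_counts[noun]
        weights[noun] = tf / tc if tc else 0.0
    return max(weights, key=weights.get), weights

# "is a" hit counts for Pluto from the example above; the TC(i) values
# below are made-up illustrative numbers, not figures from the paper.
pattern_hits = {"planet": 1550, "sun": 2, "orbit": 1, "planets": 0}
cooccurrence = {"planet": 200000, "sun": 150000, "orbit": 80000, "planets": 120000}
best, w = best_hypernym(pattern_hits, cooccurrence)
print(best)  # -> planet
```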
      <p>W = max(w(i))
(2)
This candidate noun needs to be validated and confirmed by another Hearst
pattern. The problem of the required strict word order was mentioned in the
previous section. We must cope with this problem in order to find another pattern
to validate the results of the "is a" step. The pattern NPn−1 and other NP0 was
chosen because we expect its bias due to strict word order to be the lowest among
all the remaining patterns. For this pattern we had to create the plural form
of each candidate noun. Some simple rules were adopted, such as replacing a final
"y" with the suffix "ies". No language
exceptions were taken into consideration.</p>
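      <p>The naive pluralization rules can be sketched as follows. As stated above, language exceptions are ignored, which is why the already-plural candidate "planets" produces the ill-formed "planetss" seen in the validation list below.</p>

```python
def naive_plural(noun):
    """Naive pluralization: a final 'y' becomes 'ies', otherwise
    append 's'. Deliberately ignores all language exceptions."""
    if noun.endswith("y"):
        return noun[:-1] + "ies"
    return noun + "s"

for noun in ["planet", "galaxy", "planets"]:
    print(f'"Pluto and other {naive_plural(noun)}"')
```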
      <p>– Patterns tested in the validation step (returned hits are in brackets)
- "Pluto and other planets" (57)
- "Pluto and other planet" (0)
- "Pluto and other suns" (0)
- "Pluto and other sun" (0)
- "Pluto and other orbits" (0)
- "Pluto and other orbit" (0)
- "Pluto and other planetss" (0)
- "Pluto and other planets" (57)</p>
      <p>The maximum value in this array determines the validation-step candidate. If both
patterns determine the same noun, it is accepted as the hypernym noun. In the
opposite case, some other technique to confirm or reject the hypothesis should
be applied; the possibilities are discussed in the last section. The process of
searching for the right hypernym noun is repeated for all the synonym classes
given by WordNet. The next paragraphs discuss some results obtained on
a test set.</p>
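      <p>Combining the two steps might look like the following sketch: accept a candidate only when the "is a" step and the "and other" validation step agree, and otherwise leave the hypothesis open for further checks. The input numbers echo the Pluto example; keying the validation hits by the singular candidate noun is our own simplification.</p>

```python
def confirmed_hypernym(isa_weights, validation_hits):
    """Pick the top candidate from the 'is a' step and from the
    'and other' validation step; accept only when both agree."""
    isa_best = max(isa_weights, key=isa_weights.get)
    val_best = max(validation_hits, key=validation_hits.get)
    if isa_best == val_best:
        return isa_best
    return None  # hypothesis neither confirmed nor rejected yet

# normalized weights from the "is a" step (illustrative values) and
# validation hit counts keyed by the singular candidate noun
isa = {"planet": 0.00775, "sun": 0.00001, "orbit": 0.00001, "planets": 0.0}
validation = {"planet": 57, "sun": 0, "orbit": 0, "planets": 0}
print(confirmed_hypernym(isa, validation))  # -> planet
```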
      <p>The test set consisted of about 50 proper nouns from the space, travel and zodiac
areas. At the beginning it was necessary to check manually whether all the words
from the test set are listed in WordNet. The result was that 96% (i.e. 48 of
50) of the proper nouns have a gloss in WordNet. Then the script described above
was run on each of the 50 test words. After all the tests had been carried out,
it was necessary to check the correspondence of the discovered hypernyms with
the real-world concepts.</p>
      <p>We discovered that 62% of the test set (31 words, containing 61
synonym classes in total) were assigned a hypernym correctly, corresponding
to real-life objects. Nine words had all their meanings assigned
incorrectly. The remaining 16% contained a mistake in the assignment of some of their
synonym classes. A more detailed analysis of the incorrectly labeled words can be
found in Table 2.</p>
      <p>Mining for other synonyms than those explicitly stated in WordNet would
definitely provide better results in some cases; on the other hand, the risk
of a wrongly assigned hypernym noun would undoubtedly rise.</p>
      <sec id="sec-3-1">
        <title>Results</title>
        <p>We tested a set of 50 proper nouns from several different areas, such as astronomy
and the zodiac. Some of them were chosen because they had been tested with the
above-mentioned PANKOW system. On these 50 test concepts, with 92 synonyms in
total, we obtained a precision of 62 percent. The results matched our expectations,
and given that this technique has only recently been implemented
and is far from mature, we find them satisfactory. Several drawbacks
and suggestions for future work are discussed in this section and in the
conclusion.</p>
        <p>One of the drawbacks is the system's speed, which depends on Google API
responses, which have recently been quite slow. The average time to resolve one synonym
class is about 50 seconds, with an average of 20 Google queries per synonym class.
Another objective drawback is the limitation of the current Google web search
interface: it has no proximity operators, and a query must either be submitted as
an exact match or connected with the boolean AND operator. Besides these
technological problems, the number of daily queries is limited to one thousand,
which is sufficient to process only about twenty concepts; this currently
presents the main obstacle.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Related work</title>
      <p>
        This section discusses work related to the exploitation of WordNet glosses
for query refinement. Since word ambiguity is an important
issue in the Information Retrieval community, a lot of effort has been invested
in discovering how to deal with the problem. The importance of disambiguated
words and concepts further increased with the introduction of ontologies as the core of
the so-called Semantic Web, and nowadays there is an enormous effort in this
research field. The most successful approaches so far either reuse knowledge
stored in existing sources (exploiting the structure of Web directories [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], dictionaries or
tagged corpora) or make use of the inherent redundancy of the information
present on the Internet (e.g. Armadillo [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] or KnowItAll [
        <xref ref-type="bibr" rid="ref6">6</xref>
          ]). Both of these systems
continually and automatically expand the initially given lexicon by learning
to recognize regularities in large repositories, either regularities internal to a
single document or external ones across a set of documents.
      </p>
      <p>
        Query refinement based on concept hierarchies was discussed, for example,
in [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] or by Kruschwitz in [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. Project that also use similar ideas to ours is
one called WordNet::Similarity [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ]. It is a tool kit written in Perl implementing
several algorithms for measuring semantic similarity and relatedness between
WordNet concepts. Two of algorithms (lesk and vector measures in concrete)
uses WordNet glosses. Lesk finds overlaps between two given glosses to count
the relatedness of them. The vector measure creates a cooccurrence matrix for
each word used in the WordNet glosses from a given corpus, and then represents
each gloss/concept with a vector that is the average of these cooccurrence
vectors.
      </p>
      <p>
        The project that inspired this work is called PANKOW (Pattern-based Annotation
through Knowledge on the Web) and was created by Cimiano et al. [
        <xref ref-type="bibr" rid="ref4">4</xref>
          ]. That
work focuses on applying Hearst patterns to a given ontology to discover
is-a relations solely from the Internet. Some of the data tested in our paper were
actually taken from their work.
      </p>
    </sec>
    <sec id="sec-5">
      <title>Conclusions</title>
      <p>In this paper we presented an approach for discovering the synonym classes of given
proper nouns. We used some freely accessible information sources and connected
them to obtain new features for discovering the meanings of a given proper
noun. A list of commonly used proper nouns was collected, and the proposed
method was tested on it. On 50 test concepts with 92 synonyms in
total, we obtained a precision of 62 percent.</p>
      <p>It remains for further work to find out how to exploit the WordNet hierarchy and
involve the glosses of class instances and subconcepts. Introducing another
validation pattern would definitely increase the precision of the system. So far, the
system can handle only single-word queries; handling multi-word queries and
deriving the proper synonym categories for them could be an interesting challenge. Another
task would be to implement a way to deal with words and concepts not
included in WordNet. A system similar to Cimiano's PANKOW might be beneficial
for this task.</p>
      <p>Although this application has certain drawbacks, we showed that the idea of
exploiting WordNet glosses for discovering certain facts about given concepts is
viable, and with some improvements in speed and precision it could serve as a
helpful tool for inexperienced Internet users.</p>
      <sec id="sec-5-1">
        <title>ACKNOWLEDGEMENTS</title>
        <p>The author would like to thank Vojtech Svatek for his comments and help.
The research has been partially supported by FRVS grant no. 501/G1.</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Baker</surname>
            <given-names>L.</given-names>
          </string-name>
          :
          <article-title>Search Engine Users Prefer Two Word Phrases</article-title>
          , Search Engine Journal http://www.searchenginejournal.com/index.php?p=
          <fpage>238</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Berners-Lee</surname>
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hendler</surname>
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lassila</surname>
            <given-names>O.:</given-names>
          </string-name>
          <article-title>The semantic web</article-title>
          .
          <source>Scientific American</source>
          , May
          <year>2001</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Cimiano</surname>
            <given-names>P.</given-names>
          </string-name>
          et al.:
          <source>Learning Taxonomic Relations from Heterogeneous Evidence</source>
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Cimiano</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Staab</surname>
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Learning by googling</article-title>
          .
          <source>SIGKDD Explor. Newsl. 6</source>
          ,
          <issue>2</issue>
          (Dec.
          <year>2004</year>
          ),
          <fpage>24</fpage>
          -
          <lpage>33</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Ciravegna</surname>
            <given-names>F.</given-names>
          </string-name>
          et al.:
          <article-title>Learning to Harvest Information for the Semantic Web</article-title>
          ,
          <source>Proceedings of the 1st European Semantic Web Symposium</source>
          , Heraklion, Greece, May
          <volume>10</volume>
          -12,
          <year>2004</year>
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Etzioni</surname>
            <given-names>O.</given-names>
          </string-name>
          et al.:
          <article-title>KnowItNow: Fast, Scalable Information Extraction from the Web</article-title>
          ,
          <source>Proceedings of Human Language Technology Conference and Conference on Empirical Methods in Natural Language Processing</source>
          , p.
          <fpage>563</fpage>
          -
          <lpage>570</lpage>
          ,
          <year>October 2005</year>
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Fellbaum</surname>
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>WordNet, an electronic lexical database</article-title>
          , MIT Press,
          <year>1998</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Hearst</surname>
            <given-names>M. A.</given-names>
          </string-name>
          :
          <article-title>Automatic Acquisition of Hyponyms from Large Text Corpora</article-title>
          .
          <source>In Proceedings of the Fourteenth International Conference on Computational Linguistics</source>
          , pages
          <fpage>539</fpage>
          -
          <lpage>545</lpage>
          , Nantes, France,
          <year>July 1992</year>
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Kavalec</surname>
            ,
            <given-names>M</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Svatek</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          :
          <article-title>Information Extraction and Ontology Learning Guilded by Web Directory</article-title>
          ,
          <source>Lyon</source>
          <volume>21</volume>
          .
          <fpage>07</fpage>
          .
          <year>2002</year>
          26.07.
          <year>2002</year>
          . In: AUSSENAC-GILLES, Nathalie,
          <string-name>
            <surname>MAEDCHE</surname>
          </string-name>
          , Alexander (ed.).
          <source>Workshop 16. Natural Language Processing and Machine Learning for Ontology Engineering</source>
          . Lyon : University Claude Bernard,
          <year>2002</year>
          , s.
          <volume>3942</volume>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Kruschwitz</surname>
            <given-names>U.</given-names>
          </string-name>
          :
          <article-title>Intelligent document retrieval : exploiting markup structure</article-title>
          , Dordrecht : Springer 2005, ISBN - 1-
          <fpage>4020</fpage>
          -3767-8
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Navigli</surname>
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Velardi</surname>
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>Structural Semantic Interconnections: A Knowledge-Based Approach to Word Sense Disambiguation</article-title>
          ,
          <source>IEEE Transactions on Pattern Analysis and Machine Intelligence</source>
          , vol.
          <volume>27</volume>
          , no.
          <issue>7</issue>
          , pp.
          <fpage>1075</fpage>
          -
          <lpage>1086</lpage>
          ,
          <year>July 2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Parent</surname>
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mobasher</surname>
            <given-names>B.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Lytinen</surname>
            <given-names>S.:</given-names>
          </string-name>
          <article-title>An adaptive agent for web exploration based on concept hierarchies</article-title>
          .
          <source>In Proceedings of the International Conference on Human Computer Interaction</source>
          . New Orleans, LA,
          <year>August 2001</year>
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Pedersen</surname>
            <given-names>T.</given-names>
          </string-name>
          , et al.:
          <article-title>Wordnet::similarity - measuring the relatedness of concepts</article-title>
          .
          <source>In Appears in the Proceedings of the Nineteenth National Conference on Artificial Intelligence (AAAI-04)</source>
          ,
          <year>2004</year>
          . http://citeseer.ist.psu.edu/644388.html
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Porter</surname>
            <given-names>M.</given-names>
          </string-name>
          : Porter Stemmer Algorithm, [online], http://tartarus.org/~martin/PorterStemmer/
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Ratnaparkhi</surname>
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Adwait Ratnaparkhi's Research Interests</article-title>
          , [online], http://www.cis.upenn.edu/~adwait/statnlp.html.
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>van Rijsbergen</surname>
            <given-names>C. J.</given-names>
          </string-name>
          :
          <article-title>Information retrieval (second edition), Chapter 3</article-title>
          , Butterworths
          , London,
          <year>1979</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <surname>Sanderson</surname>
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Croft</surname>
            <given-names>B.</given-names>
          </string-name>
          :
          <article-title>Deriving concept hierarchies from text</article-title>
          , [online] citeseer.ist.psu.edu/cimiano03deriving.html
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <surname>Weiss S.M</surname>
          </string-name>
          . et al.:
          <source>Text Mining - Predictive Methods for Analyzing Unstructured Information</source>
          . Springer,
          <year>2005</year>
          , ISBN 0-387-95433-3.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>