<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Using WordNet Glosses to Refine Google Queries</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author role="corresp">
							<persName><forename type="first">Jan</forename><surname>Nemrava</surname></persName>
							<email>nemrava@vse.cz</email>
							<affiliation key="aff0">
								<orgName type="department">Department of Information and Knowledge Engineering</orgName>
								<orgName type="institution">University of Economics</orgName>
								<address>
									<addrLine>W.Churchill Sq. 4</addrLine>
									<postCode>130 67</postCode>
									<settlement>Prague, Praha 3</settlement>
									<country key="CZ">Czech Republic</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Using WordNet Glosses to Refine Google Queries</title>
					</analytic>
					<monogr>
						<imprint>
							<date/>
						</imprint>
					</monogr>
					<idno type="MD5">F06BC38646C237B962ADDCA9301B932A</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-24T14:34+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>text mining</term>
					<term>text classification</term>
					<term>web search engine</term>
					<term>WordNet gloss</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>This paper describes one way to overcome some of the major limitations of current fulltext search engines. It deals with the synonymy of web search engine results by clustering them into the relevant synonym category of a given word. It employs the WordNet lexical database and several linguistic approaches to classify the results on a search engine result page (SERP) into the appropriate synonym category according to WordNet synsets. Some methods to refine the classification are proposed, and initial experiments and results are described and discussed.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1">Introduction</head><p>Fulltext search engines have recently become a basic tool for acquiring arbitrary information from the World Wide Web. The number of queries submitted to Google is rising rapidly, and so is the number of indexed pages. 'To Google' has become a commonly used verb describing the act of searching for any information on the Internet. Google now operates Internet domains in 135 countries and, with its 88 language interfaces, is the world's leading search engine. This makes Google and other search engines the most convenient tools for easy access to any kind of information from a desktop PC, and makes the proclaimed information society viable. Nevertheless, some limitations still play an important role when searching for information through keyword-based search interfaces. One of the major problems of keyword-based web search is that people tend to insert queries that are too general (according to Search Engine Journal <ref type="bibr" target="#b0">[1]</ref>, in 2004 more than 50% of all queries were one or two words long), which leads to a huge number of hits returned for a given query. One way to deal with such a huge number of returned web pages is to arrange the results according to their proper meaning, using synonyms or word sense disambiguation. The purpose of this paper is to describe techniques for arranging the returned web pages into appropriate synonym classes, using the large lexical database WordNet<ref type="foot" target="#foot_0">1</ref> to discover synonyms and Hearst patterns to discover is-a relations between the queried term and its possible superclass (i.e. hypernym) concept.</p><p>The structure of this paper is as follows: Section 2 describes our motivation, and Section 3 describes all the information sources used. Section 4 presents our goals and the techniques applied in this approach, together with examples, and discusses some drawbacks and limitations. Before concluding, Section 5 discusses relevant related work. </p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2">Motivation</head><p>As stated in the Introduction, the problem of ambiguous queries presents a strong limitation of current web search technology. Some query refinement techniques are already emerging that allow users to zoom into a more specific query, but most of the time they only provide "query modification" suggestions as a single list, without distinguishing between the real meanings of the given word (e.g. Ask Jeeves<ref type="foot" target="#foot_2">2</ref>). Another query refinement method recently introduced by the leading fulltext search engine offers real-time suggestions while the user is typing the query. One advantage is that the user sees the most suitable word form for a particular search in real time (the suggested word may not be the grammatically or semantically best one, but it is the one used by most users). Google Suggest<ref type="foot" target="#foot_3">3</ref> is a good example of this method. To our knowledge, there is no fulltext search engine able to separate the returned results according to their meanings. Some effort in this direction can be seen in Vivisimo<ref type="foot" target="#foot_4">4</ref>, but its approach is not publicly known. In this paper we present an approach that uses an existing dictionary and the glosses describing its concepts, together with the largest text corpus available, the Internet, to discover the meanings that an inserted word can carry. This work was inspired by Philipp Cimiano's work on the PANKOW <ref type="bibr" target="#b3">[4]</ref> system and the idea of using heterogeneous evidence to confirm is-a relations.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3">Information Sources</head><p>In this section we describe the above-mentioned techniques in detail. All approaches used here have long been well known in the Semantic Web <ref type="bibr" target="#b1">[2]</ref> community. They are frequently used for ontology learning and for creating is-a relations and taxonomies. Namely, they are:</p><p>-WordNet -a large lexical database containing words organized into synsets (synonym sets). -Hearst patterns -a technique exploiting certain lexico-syntactic patterns to discover is-a relations between two given concepts. -monothetic clustering -an information retrieval technique for grouping documents according to a specified feature. -fulltext search engine -the Google API interface. -NLP -natural language processing techniques.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1">WordNet</head><p>The main source of information is WordNet <ref type="bibr" target="#b6">[7]</ref>. WordNet is a huge lexical database containing about 150,000 words organized into over 115,000 synsets, for a total of 203,000 word-sense pairs. Each word comes along with a short description called a gloss. Glosses are usually one or two sentences long. All ordinary parts of speech are present, but the nouns are of major importance for us, because one of them is most likely a superconcept (a hypernym) of the given word. This is the key idea of this paper. After a user inserts a proper noun, it is looked up in WordNet, and all its meanings stored there are extracted together with their glosses. Each synonym has exactly one gloss. Each gloss is preprocessed and then labeled by a POS tagger. Preprocessing consists of eliminating punctuation, hyphenation and stop words. The next step is POS tagging; only the nouns are kept and saved as candidate nouns. Candidate nouns are words that can potentially be selected as a hypernym for the given term.</p></div>
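The gloss-to-candidate-nouns step can be sketched as follows. This is a minimal illustration, not the authors' implementation: the stop-word list is a tiny stand-in for a full one, and the `is_noun` predicate stands in for a real POS tagger (the paper does not name the tagger used), here simulated by a hand-made noun set for the example gloss.

```python
import re

# Illustrative stop list only; the real system would use a fuller one.
STOP_WORDS = {"the", "of", "in", "and", "or", "a", "an", "to", "from"}

def candidate_nouns(gloss, is_noun):
    """Strip punctuation, hyphenation and stop words from a gloss, then
    keep the tokens the POS predicate accepts, deduplicated in order."""
    tokens = re.findall(r"[A-Za-z]+", gloss)  # drops punctuation and hyphens
    seen, out = set(), []
    for tok in tokens:
        if tok.lower() in STOP_WORDS or not is_noun(tok):
            continue
        if tok.lower() not in seen:
            seen.add(tok.lower())
            out.append(tok)
    return out

# Stand-in for a real POS tagger: membership in a hand-made noun set.
NOUNS = {"Greek", "god", "underworld", "mythology", "brother",
         "Zeus", "husband", "Persephone"}
gloss = ("(Greek mythology) the god of the underworld in ancient mythology; "
         "brother of Zeus and husband of Persephone")
print(candidate_nouns(gloss, lambda t: t in NOUNS))
```

Run on the Pluto "god" gloss from Section 4, this yields the same candidate nouns as listed there (order may differ, since duplicates such as "mythology" are kept only once).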
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2">Hearst Patterns</head><p>Hearst patterns are lexico-syntactic patterns first used by M. A. Hearst <ref type="bibr" target="#b7">[8]</ref> in 1992. These patterns indicate the existence of a class/subclass relation in an unstructured data source, e.g. web pages. Examples of the lexico-syntactic patterns described in <ref type="bibr" target="#b7">[8]</ref> are the following:</p><p>-NP0 such as NP1, NP2, ..., NPn−1 (and | or) NPn -such NP0 as NP1, NP2, ..., NPn−1 (and | or) NPn -NP1, NP2, ..., NPn−1 (and | or) other NP0 -NP0 (including | especially) NP1, NP2, ..., NPn−1 (and | or) NPn -and the very common "NPi is a NP0". Hearst noticed that from the patterns above we can derive that hyponym(NPi, NP0) holds for all NPi, 1 ≤ i ≤ n. Given two terms t1 and t2, we can record how many times these patterns indicate an is-a relation between t1 and t2. Some normalizing technique should be employed, as some of the patterns are likely to occur more frequently than others. Although Cimiano <ref type="bibr" target="#b2">[3]</ref> noticed that Hearst patterns occur relatively rarely in a closed corpus, and, as described later, the same holds on the Internet, their matches provide valuable information. The main drawback is that Google search does not offer proximity operators, so when the query is requested as an exact match the user must enter the exact word order of the whole pattern. For example, searching for the pattern "planets such as Pluto, Neptune and Uranus" provides about 50 results, while "planets such as Pluto, Uranus and Neptune" returns none. The most powerful pattern, which we use for primary decisions, is "NPi is a NP0".</p></div>
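Instantiating the patterns for one (term, hypernym) pair can be sketched as below. This is an illustrative sketch, not the paper's code: the pluralization uses only the simple "y" → "ies" rule mentioned later in the text, and each string would be sent as an exact-phrase query.

```python
def hearst_queries(term, hyper):
    """Instantiate the lexico-syntactic patterns for one (term, hypernym)
    pair as exact-phrase queries; word order inside each phrase is fixed,
    which is exactly the limitation discussed above."""
    plural = hyper[:-1] + "ies" if hyper.endswith("y") else hyper + "s"
    return [
        f'"{plural} such as {term}"',     # NP0 such as NP1
        f'"such {plural} as {term}"',     # such NP0 as NP1
        f'"{term} and other {plural}"',   # NP1 and other NP0
        f'"{plural} including {term}"',   # NP0 including NP1
        f'"{term} is a {hyper}"',         # the most powerful pattern
    ]

print(hearst_queries("Pluto", "planet"))
```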
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.3">Clustering</head><p>Associating documents with the relevant category (a synonym category in our case) is a task very similar to the classic information retrieval task named by van Rijsbergen <ref type="bibr" target="#b15">[16]</ref> polythetic clustering, where a document's membership in a cluster is based on a sufficient fraction of the terms that define the cluster. As stated in <ref type="bibr" target="#b16">[17]</ref>, creating is-a relations is a special case of polythetic clustering in which a subclass belongs to only one superclass; membership is then based on a single feature, which yields so-called monothetic clusters. This alternative form of clustering has two advantages over the polythetic variety. The first is the relative ease with which one can understand the topic covered by each cluster. The second is that one can guarantee that a document within a cluster is about that cluster's topic. Neither would be possible with polythetic clusters.</p></div>
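A minimal sketch of monothetic assignment in our setting: each result snippet joins the cluster whose single defining feature (the synonym class's hypernym noun) occurs in its text. The snippets below are invented for illustration; real input would come from the search engine.

```python
def monothetic_assign(snippets, features):
    """Monothetic clustering: each document joins the first cluster whose
    single defining feature occurs in its text; anything else stays
    unclassified (keyed by None)."""
    clusters = {name: [] for name in features}
    clusters[None] = []
    for doc in snippets:
        text = doc.lower()
        for name, feature in features.items():
            if feature in text:
                clusters[name].append(doc)
                break
        else:
            clusters[None].append(doc)
    return clusters

# Hypothetical snippets for the query "Pluto".
snippets = [
    "Pluto is the farthest planet from the sun.",
    "In Greek mythology the god Pluto ruled the underworld.",
    "Pluto the cartoon dog first appeared in 1930.",
    "Pluto sightings reported.",
]
features = {"SYN1": "planet", "SYN2": "god", "SYN3": "cartoon"}
print(monothetic_assign(snippets, features))
```

Because membership depends on exactly one feature, every document inside a cluster is guaranteed to mention that cluster's topic word, which is the advantage noted above.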
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.4">Google API</head><p>The world's leading fulltext search engine provides direct access to its huge databases through the Google API <ref type="foot" target="#foot_5">5</ref>. The number of daily queries is limited, and compared to the HTML-based interface it is relatively slow, but it provides easy access from any programming language. Each query is answered in the same way as in the HTML interface: the user can get the number of results, web page titles, links and snippets (a short description of the web page based either on the META description tag or on a portion of the text with the keywords emphasized). Our algorithm searches for very specific text patterns, and we are interested only in the aggregate number of results.</p><p>The next section describes the application of the information sources described above and some initial results.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4">Discovering the synonym classes</head><p>As already described in the section on WordNet, certain nouns from the so-called glosses are of main interest to us. According to our observations, a gloss mostly contains one noun that is a hypernym of the given concept. This is the core prerequisite of our method, as our aim is to find that hypernym noun among the words of the gloss. After some simple NLP methods are applied, we retrieve the candidate nouns for each gloss. What follows is a description of a concrete situation that our script has to deal with. The example is the term Pluto, which can be found in three different contexts according to WordNet: Pluto can be a planet, a god or a cartoon character.</p><p>-WordNet glosses for the concept Pluto -SYN 1 a small planet and the farthest known planet from the sun; has the most elliptical orbit of all the planets -SYN 2 (Greek mythology) the god of the underworld in ancient mythology; brother of Zeus and husband of Persephone -SYN 3 a cartoon character created by Walt Disney -Candidate nouns for the concept Pluto.</p><p>-SYN 1 planet;sun;orbit;planets; -SYN 2 Greek;god;underworld;mythology;brother;Zeus;husband;Persephone; -SYN 3 cartoon;character;Walt;Disney; -Patterns applied to SYN 1 -the number of returned results is in brackets -"Pluto is a planet" (1550), "Pluto is planet" (145) -"Pluto is a sun" (2), "Pluto is sun" (0) -"Pluto is a orbit" (0), "Pluto is orbit" (1) -"Pluto is a planets" (0), "Pluto is planets" (0)</p><p>It is necessary to take into consideration the total number of web pages where the words are mentioned and to use this value to normalize the counts.</p><formula xml:id="formula_0">w(i) = tf(i)/TC(i)<label>(1)</label></formula><p>where i represents the i-th synonym class, tf is the number of results for the given pattern, and TC is the number of web pages returned when the two terms are queried without any constraints; it represents the popularity of the given pair of terms.</p><p>The candidate for the hypernym noun is then simply the one with the highest value in the synonym class array.</p><formula xml:id="formula_1">W = max(w(i))<label>(2)</label></formula><p>This candidate noun needs to be validated and confirmed by another Hearst pattern. The problem with the necessity of strict word order was mentioned in the previous section. We had to cope with this problem when choosing another pattern to validate the results of the "is a" step. The pattern NPn−1 and other NP0 was chosen, because we expect its bias due to strict word order to be the lowest among the remaining patterns. For this pattern we had to create a plural form of each candidate noun. Some simple rules were adopted, such as replacing a final "y" with the suffix "ies"; no language exceptions were taken into consideration.</p><p>-Patterns tested in the validation step (returned hits are in brackets) -"Pluto and other planets" (57) -"Pluto and other planet" (0) -"Pluto and other suns" (0) -"Pluto and other sun" (0) -"Pluto and other orbits" (0) -"Pluto and other orbit" (0) -"Pluto and other planetss" (0) -"Pluto and other planets" (57)</p><p>The maximum value in this array again determines a candidate. If both patterns select the same noun, it is accepted as the hypernym noun; otherwise, some other technique to confirm or reject the hypothesis should be applied. The possibilities are discussed in the last section. The process of searching for the right hypernym noun is repeated for all synonym classes given by WordNet. The next paragraphs discuss results obtained on a test set. The test set consisted of about 50 proper nouns from the space, travel and zodiac domains. At the beginning it was necessary to check manually whether all the words from the test set are listed in WordNet; 96% of the proper nouns (48 of 50) have a gloss in WordNet. The script described above was then run on each of the 50 test words. 
After all the tests had been carried out, it was necessary to check the correspondence of the discovered hypernyms with real-world concepts. We found that 62% of the test set (31 words, containing 61 synonym classes in total) were assigned a hypernym correctly, corresponding to real-life objects. Nine words and all their meanings were assigned wrongly. The remaining 16% contained a mistake in the assignment of some of their synonym classes. A more detailed analysis of the incorrectly labeled words can be found in Table <ref type="table" target="#tab_1">2</ref>.</p><p>Mining for synonyms other than those explicitly stated in WordNet would definitely provide better results in some cases; on the other hand, the risk of wrongly assigned hypernym nouns would undoubtedly rise. </p></div>
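The selection and validation steps above can be sketched as follows. The hit counts for the "is a" pattern are the Pluto numbers reported in the text; the co-occurrence totals TC(i) are hypothetical placeholders, since the real values come from unconstrained Google queries.

```python
def pluralize(noun):
    # The simple rule from the text: a final "y" becomes "ies",
    # otherwise append "s"; no irregular forms are handled.
    return noun[:-1] + "ies" if noun.endswith("y") else noun + "s"

def best_candidate(candidates, tf, tc):
    """Eq. (1): w(i) = tf(i) / TC(i); Eq. (2): pick the maximal weight."""
    weights = {c: tf[c] / tc[c] for c in candidates if tc[c] > 0}
    return max(weights, key=weights.get)

# tf: hits for '"Pluto is a <noun>"' as reported above;
# tc: hypothetical co-occurrence counts for Pluto + noun.
tf = {"planet": 1550, "sun": 2, "orbit": 0, "planets": 0}
tc = {"planet": 500_000, "sun": 900_000, "orbit": 300_000, "planets": 400_000}

primary = best_candidate(list(tf), tf, tc)   # "is a" step -> "planet"
# The validation step would then query e.g. '"Pluto and other planets"'
# and accept the hypernym only if both steps agree.
print(primary, pluralize(primary))
```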
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.1">Results</head><p>We tested a set of 50 proper nouns from several different areas, such as astronomy and the zodiac. Some of these were chosen because they had been tested with the above-mentioned PANKOW system. From these 50 test concepts with 92 synonyms in total, we achieved a precision of 62 percent. The results matched our expectations, and considering that this technique was only recently implemented and is far from mature, we find them satisfactory. Several drawbacks and suggestions for future work are discussed in this section and in the conclusion.</p><p>One of the drawbacks is the system speed, which depends on the Google API responses, which have recently been quite slow. The average time to resolve one synonym class is about 50 seconds, with an average of 20 Google queries per synonym class. Another objective drawback is the limitation of the current Google web search interface: it has no proximity operators, and the query must either be inserted as an exact match or connected with the AND boolean operator. Besides these technological problems, there is also a limit of one thousand queries per day, which is sufficient to process only about twenty concepts and currently presents the main obstacle.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5">Related work</head><p>This section discusses work related to the exploitation of WordNet glosses for query refinement. Since word ambiguity presents an important issue for the Information Retrieval community, a lot of effort has been invested in discovering how to deal with the problem. The importance of disambiguated words and concepts further increased with the introduction of ontologies as the core of the so-called Semantic Web, and nowadays there is an enormous effort in this research field. The most successful approaches so far either reuse knowledge stored in existing sources (exploiting the structure of Web directories <ref type="bibr" target="#b8">[9]</ref>, dictionaries or tagged corpora) or make use of the inherent redundancy of the information present on the Internet (e.g. Armadillo <ref type="bibr" target="#b4">[5]</ref> or KnowItAll <ref type="bibr" target="#b5">[6]</ref>). Both of these systems continually and automatically expand the initially given lexicon by learning to recognize regularities in large repositories, either regularities internal to a single document or external ones across a set of documents. Query refinement based on concept hierarchies was discussed, for example, in <ref type="bibr" target="#b11">[12]</ref> or by Kruschwitz in <ref type="bibr" target="#b9">[10]</ref>. A project that also uses ideas similar to ours is WordNet::Similarity <ref type="bibr" target="#b12">[13]</ref>, a toolkit written in Perl implementing several algorithms for measuring semantic similarity and relatedness between WordNet concepts. Two of its algorithms (namely the lesk and vector measures) use WordNet glosses. Lesk finds overlaps between two given glosses to compute their relatedness. 
The vector measure creates a co-occurrence matrix from a given corpus for each word used in the WordNet glosses, and then represents each gloss/concept with a vector that is the average of these co-occurrence vectors. The project that inspired this work is called PANKOW (Pattern-based Annotation through Knowledge on the Web) and was created by <ref type="bibr">Cimiano et al. [4]</ref>. That work focuses on applying Hearst patterns over a given ontology to discover is-a relations solely from the Internet. Some of the data tested in our paper were actually taken from their work.</p></div>
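The gloss-overlap idea behind the lesk measure can be sketched in a few lines. This is a simplified illustration only: the actual lesk measure in WordNet::Similarity scores multi-word overlaps more highly, and the second gloss below is invented for the example.

```python
import re

# Illustrative stop list; content words only should count toward overlap.
STOP = {"the", "of", "a", "an", "in", "and", "to", "from", "our"}

def lesk_overlap(gloss_a, gloss_b):
    """Simplified Lesk relatedness: the number of distinct content words
    shared by two glosses."""
    content = lambda g: set(re.findall(r"[a-z]+", g.lower())) - STOP
    return len(content(gloss_a) & content(gloss_b))

a = "a small planet and the farthest known planet from the sun"
b = "a typical star like our sun"   # hypothetical second gloss
print(lesk_overlap(a, b))           # the glosses share only "sun"
```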
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6">Conclusions</head><p>In this paper we presented an approach for discovering the synonym classes of given proper nouns. We used some freely accessible information sources and connected them to derive new features for discovering the meanings of a given proper noun. A list of commonly used proper nouns was collected, and the proposed method was tested on it. From 50 test concepts with 92 synonyms in total, we achieved a precision of 62 percent.</p><p>It remains for further work to find out how to exploit the WordNet hierarchy and involve the glosses of class instances and subconcepts. Introducing another validation pattern would definitely increase the precision of the system. So far, the system can handle only single-word queries; handling multi-word queries and deriving the proper synonym categories for them could be an interesting challenge. Another task would be to implement a way of dealing with words and concepts not included in WordNet; a system similar to Cimiano's PANKOW might be beneficial here.</p><p>Although this application has certain drawbacks, we showed that the idea of exploiting WordNet glosses for discovering certain facts about given concepts is viable, and with some improvements in speed and precision it could serve as a helpful tool for inexperienced Internet users.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Fig. 1 .</head><label>1</label><figDesc>Fig. 1. A context suggestion interface</figDesc><graphic coords="2,59.16,284.91,326.11,59.32" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head>Table 1 .</head><label>1</label><figDesc>Overall precision</figDesc><table><row><cell cols="2">Total number of words in list 50 (100%)</cell></row><row><cell>Words listed in WordNet</cell><cell>48 (96%)</cell></row><row><cell>Correct</cell><cell>39 (78%)</cell></row><row><cell>-completely correct</cell><cell>31 (62%)</cell></row><row><cell>-partially correct</cell><cell>8 (16%)</cell></row><row><cell>Wrong</cell><cell>9 (18%)</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_1"><head>Table 2 .</head><label>2</label><figDesc>Statistics of wrongly discovered terms</figDesc><table><row><cell>Number of wrong instances</cell><cell>17 (100%)</cell></row><row><cell>Both patterns wrong</cell><cell>7 (41%)</cell></row><row><cell cols="2">"is a" correct, "and other" wrong 4 (23%)</cell></row><row><cell cols="2">"is a" wrong, "and other" correct 6 (35%)</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_2"><head>Table 3 .</head><label>3</label><figDesc>Examples of negatively labeled synonyms.</figDesc><table><row><cell cols="3">Proper Noun "Is a" pattern "and other" pattern</cell></row><row><cell>Greenland</cell><cell>island</cell><cell>Arctic</cell></row><row><cell>Reykjavik</cell><cell>Iceland</cell><cell>Iceland</cell></row><row><cell>Kenya</cell><cell>Great</cell><cell>Great</cell></row><row><cell>Luxembourg</cell><cell>-</cell><cell>-</cell></row><row><cell>Luxembourg</cell><cell>city</cell><cell>city</cell></row></table></figure>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="1" xml:id="foot_0">http://wordnet.princeton.edu/</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" xml:id="foot_1">V. Snášel, K. Richta, J. Pokorný (Eds.): Dateso 2006, pp. 85-94, ISBN 80-248-1025-5.</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="2" xml:id="foot_2">http://www.ask.com</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="3" xml:id="foot_3">http://www.google.com/webhp?complete=1</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="4" xml:id="foot_4">http://www.vivisimo.com</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="5" xml:id="foot_5">http://www.google.com/apis</note>
		</body>
		<back>

			<div type="acknowledgement">
<div xmlns="http://www.tei-c.org/ns/1.0"><head>ACKNOWLEDGEMENTS</head><p>The author would like to thank Vojtech Svatek for his comments and help. The research has been partially supported by the FRVS grant no. 501/G1.</p></div>
			</div>

			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">Search Engine Users Prefer Two Word Phrases</title>
		<author>
			<persName><forename type="first">L</forename><surname>Baker</surname></persName>
		</author>
		<ptr target="http://www.searchenginejournal.com/index.php?p=238" />
	</analytic>
	<monogr>
		<title level="j">Search Engine Journal</title>
		<imprint/>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<monogr>
		<title level="m" type="main">The semantic web</title>
		<author>
			<persName><forename type="first">T</forename><surname>Berners-Lee</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Hendler</surname></persName>
		</author>
		<author>
			<persName><forename type="first">O</forename><surname>Lassila</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2001-05">May 2001</date>
			<publisher>Scientific American</publisher>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<monogr>
		<title level="m" type="main">Learning Taxonomic Relations from Heterogeneous Evidence</title>
		<author>
			<persName><forename type="first">P</forename><surname>Cimiano</surname></persName>
		</author>
		<imprint/>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">Learning by googling</title>
		<author>
			<persName><forename type="first">P</forename><surname>Cimiano</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Staab</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">SIGKDD Explor. Newsl</title>
		<imprint>
			<biblScope unit="volume">6</biblScope>
			<biblScope unit="issue">2</biblScope>
			<biblScope unit="page" from="24" to="33" />
			<date type="published" when="2004-12">Dec. 2004</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">Learning to Harvest Information for the Semantic Web</title>
		<author>
			<persName><forename type="first">F</forename><surname>Ciravegna</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 1st European Semantic Web Symposium</title>
				<meeting>the 1st European Semantic Web Symposium<address><addrLine>Heraklion, Greece</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2004">May 10-12, 2004</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">KnowItNow: Fast, Scalable Information Extraction from the Web</title>
		<author>
			<persName><forename type="first">O</forename><surname>Etzioni</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of Human Language Technology Conference and Conference on Empirical Methods in Natural Language Processing</title>
				<meeting>Human Language Technology Conference and Conference on Empirical Methods in Natural Language Processing</meeting>
		<imprint>
			<date type="published" when="2005-10">October 2005</date>
			<biblScope unit="page" from="563" to="570" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<monogr>
		<title level="m" type="main">WordNet, an electronic lexical database</title>
		<author>
			<persName><forename type="first">C</forename><surname>Fellbaum</surname></persName>
		</author>
		<imprint>
			<date type="published" when="1998">1998</date>
			<publisher>MIT Press</publisher>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">Automatic Acquisition of Hyponyms from Large Text Corpora</title>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">A</forename><surname>Hearst</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the Fourteenth International Conference on Computational Linguistics</title>
				<meeting>the Fourteenth International Conference on Computational Linguistics<address><addrLine>Nantes, France</addrLine></address></meeting>
		<imprint>
			<date type="published" when="1992-07">July 1992</date>
			<biblScope unit="page" from="539" to="545" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">Information Extraction and Ontology Learning Guilded by Web Directory</title>
		<author>
			<persName><forename type="first">M</forename><surname>Kavalec</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Svatek</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Workshop 16. Natural Language Processing and Machine Learning for Ontology Engineering</title>
				<editor>
			<persName><forename type="first">Aussenac-Gilles</forename></persName>
		</editor>
		<editor>
			<persName><forename type="first">Nathalie</forename><surname>Maedche</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">Alexander</forename></persName>
		</editor>
		<meeting><address><addrLine>Lyon; Lyon</addrLine></address></meeting>
		<imprint>
			<publisher>University Claude Bernard</publisher>
			<date type="published" when="2002">21. 2002. 2002. 2002</date>
			<biblScope unit="page">3942</biblScope>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<analytic>
		<title level="a" type="main">Intelligent document retrieval : exploiting markup structure</title>
		<author>
			<persName><forename type="first">U</forename><surname>Kruschwitz</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Dordrecht</title>
				<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="2005">2005</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<analytic>
		<title level="a" type="main">Structural Semantic Interconnections: A Knowledge-Based Approach to Word Sense Disambiguation</title>
		<author>
			<persName><forename type="first">R</forename><surname>Navigli</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Velardi</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">IEEE Transactions on Pattern Analysis and Machine Intelligence</title>
		<imprint>
			<biblScope unit="volume">27</biblScope>
			<biblScope unit="issue">7</biblScope>
			<biblScope unit="page" from="1075" to="1086" />
			<date type="published" when="2005-07">July 2005</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<analytic>
		<title level="a" type="main">An adaptive agent for web exploration based on concept hierarchies</title>
		<author>
			<persName><forename type="first">S</forename><surname>Parent</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Mobasher</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Lytinen</forename><forename type="middle">S</forename></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the International Conference on Human Computer Interaction</title>
				<meeting>the International Conference on Human Computer Interaction<address><addrLine>New Orleans, LA</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2001-08">August 2001</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<analytic>
		<title level="a" type="main">Wordnet::similarity -measuring the relatedness of concepts</title>
		<author>
			<persName><forename type="first">T</forename><surname>Pedersen</surname></persName>
		</author>
		<ptr target="http://citeseer.ist.psu.edu/644388.html" />
	</analytic>
	<monogr>
		<title level="m">Appears in the Proceedings of the Nineteenth National Conference on Artificial Intelligence (AAAI-04)</title>
				<imprint>
			<date type="published" when="2004">2004</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b13">
	<monogr>
		<title level="m" type="main">Porter Stemmer Algorithm</title>
		<author>
			<persName><forename type="first">M</forename><surname>Porter</surname></persName>
		</author>
		<ptr target="http://tartarus.org/~martin/PorterStemmer/" />
		<imprint/>
	</monogr>
</biblStruct>

<biblStruct xml:id="b14">
	<monogr>
		<author>
			<persName><forename type="first">A</forename><surname>Ratnaparkhi</surname></persName>
		</author>
		<ptr target="http://www.cis.upenn.edu/~adwait/statnlp.html" />
		<title level="m">Adwait Ratnaparkhi&apos;s Research Interests</title>
				<imprint/>
	</monogr>
</biblStruct>

<biblStruct xml:id="b15">
	<monogr>
		<author>
			<persName><forename type="first">C</forename><forename type="middle">J</forename><surname>Van Rijsbergen</surname></persName>
		</author>
		<title level="m">Information retrieval (second edition), Chapter 3, Butterworths</title>
				<meeting><address><addrLine>London</addrLine></address></meeting>
		<imprint>
			<date type="published" when="1979">1979</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b16">
	<monogr>
		<title level="m" type="main">Deriving concept hierarchies from text</title>
		<author>
			<persName><forename type="first">M</forename><surname>Sanderson</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Croft</surname></persName>
		</author>
		<ptr target="citeseer.ist.psu.edu/cimiano03deriving.html" />
		<imprint/>
	</monogr>
</biblStruct>

<biblStruct xml:id="b17">
	<monogr>
		<title level="m" type="main">Text Mining -Predictive Methods for Analyzing Unstructured Information</title>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">M</forename><surname>Weiss</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2005">2005</date>
			<publisher>Springer</publisher>
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
