=Paper=
{{Paper
|id=None
|storemode=property
|title=
HNews: An Enhanced Multilingual Hyperlinking News Platform
|pdfUrl=https://ceur-ws.org/Vol-704/29.pdf
|volume=Vol-704
|dblpUrl=https://dblp.org/rec/conf/iir/CaoPB11
}}
==
HNews: An Enhanced Multilingual Hyperlinking News Platform
==
Diego De Cao, Daniele Previtali, and Roberto Basili
University of Roma Tor Vergata, Roma, Italy
{decao,previtali,basili}@info.uniroma2.it
Abstract. In this paper, we describe the HNews platform, a Web-based system addressing the general problem of aggregating and enriching news from different sources and languages. In the indexing stage, the news items gathered from RSS feeds or video streams are analyzed through Information Extraction tools. Their topical category and the Named Entity mentions they contain are recognized and used to create semantic metadata, so as to enrich the information available for each news item. Moreover, a robust unsupervised Word Sense Disambiguation algorithm is applied to the available texts, which are thus further semantically annotated. This is used to align news items in different languages, such as Italian and English, and to support cross-lingual search. As a result, advanced search features, such as cross-lingual or typed entity-based queries, are enabled in HNews. In this paper, we also present the browser, which makes use of a spatial metaphor for the arrangement of the retrieved news. It makes it possible to capture different aspects such as the "semantic" similarity among news items, the timeliness of individual news items, and their relevance with respect to an incoming user query.
1 Introduction
As globalization proceeds, information access across language boundaries is becoming a critical issue. The World Wide Web has become accessible to more and more countries, and technological advances are overcoming the network, interface and computer system differences that constrain information access. In particular, the World Wide Web is becoming a major medium for news delivery (e.g. broadcasting) and content creation. Consequently, the ability to search news from different sources, media and languages holds an increasing appeal for users. The application of Information Retrieval techniques to the problems raised by news aggregation in such heterogeneous scenarios is becoming a crucial technological challenge. Currently, the major technology enabling information access across different sources is news aggregator software. Aggregators reduce the time and effort needed to regularly check websites for updates, as well as to create a unique integrated information space or a "personal newspaper". Correspondingly, every aggregator has explored ways of integrating some Information Retrieval capability in order to reduce the effort required to satisfy real user information needs.
Research on ad hoc retrieval systems focuses on a variety of methods, ranging from strongly lexicalized, statistical methods for relevance modeling to more semantically oriented techniques, usually based on deeper levels of text analysis and language processing paradigms (such as parsing or semantic disambiguation processes). In the former, so-called shallow, approaches, documents are retrieved through simple matching mechanisms between texts and queries; ranking according to relevance is a side effect of statistical models of term co-occurrences in texts. Semantic approaches attempt to exploit, to a certain degree, the syntactic and semantic information made available by linguistic analysis. In the attempt to reproduce some level of text understanding, a meaning surrogate of the input text is obtained that also includes semantic (and sometimes syntactic) indexes. These are metadata that restrict the potential interpretations of the texts and are expected to improve retrieval accuracy. For example, a smaller number of false positives is expected, as constraints at the semantic level can be imposed to re-rank candidate documents. The technology that supports the detection, disambiguation and formalization of meaningful information, nowadays called semantic metadata, from unstructured texts is Information Extraction (hereafter IE), as it has been studied since the early 90s ([1]).
Notice that the availability of semantic information is particularly useful in cross-lingual scenarios, where strongly lexicalized statistical methods are not of much help: they cannot, in fact, be used to retrieve documents expressed in a language different from the one used to query the IR system. In these cases, beyond merely accepting extended character sets and performing language identification, information retrieval systems should also be able to help in locating information across language boundaries. Moreover, access to distributed information is complex also due to the heterogeneity of the sources and to the diversity of interests, expectations and purposes of the target retrieval processes. Heterogeneity here characterizes:
– Data typologies, as sources of information are characterized by different media and content types.
– Data formats, as even the same content can be made available through a medium according to different levels of granularity and quality. Formats may vary widely across and within archives.
– Contents, as the source information is not characterized by a specific knowledge domain but is spread across heterogeneous semantic dimensions.
– Languages, as even an individual structured archive may well include documents expressed in different natural languages.
The major media channel for news is television. In previous work, the problem of extracting semantic metadata from broadcast TV and radio news was discussed and a corresponding system, RitroveRAI, was presented [2]. It makes use of human language technologies for IE over multimedia data (i.e. speech recognition and grammatical analysis of incoming news). The HNews system, presented in this paper, represents an extension of RitroveRAI, as it integrates the indexing of video news with the gathering and annotation of news from different Web sources. News items derived from newspaper portals on the Web are characterized by texts that are less noisy than speech transcriptions. As a consequence, in HNews a set of different natural language processing modules is applied and a comprehensive enrichment of individual news items with semantic metadata is obtained. The next section presents the overall structure of the applied indexing process, while Section 3 describes the search interface. Section 4 concludes this work by discussing applications and extensions of the system.
2 Enriching Web news through semantic metadata
RSS is a family of web feed formats used to publish frequently updated news content in a standardized format, and it is adopted by most Web newspapers for publishing timely updates. The HNews platform exploits these news aggregation standards and collects news updates from independent RSS feed sources. However, as rich metadata are required to improve the quality of the retrieval process, the limitations of current RSS protocols must be carefully handled in a system such as HNews. The idea of applying IE to content requires that the RSS-supported data gathering process be followed by an in-depth analysis performed by a family of advanced natural language processing tools. Unfortunately, the RSS feeds of most newspapers make available only a summary of each news item, on which only small-scale linguistic analysis is possible. The summary, in fact, is usually very short and insufficient for accurate extraction (e.g. sense disambiguation is more complex when shorter contexts are targeted). As an extension of a classical news aggregator, HNews therefore provides crawling capabilities, using the RSS links to access the corresponding complete news content, i.e. the full Web article pages. A specific RSS processor has been developed for each individual newspaper source for this purpose.
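As an illustration of this gather-then-crawl step, the following minimal sketch reads an RSS feed and follows each entry's link to retrieve the full article text. It is not the HNews implementation: the feedparser/requests/BeautifulSoup stack and the per-source CSS selector (article_selector) are assumptions standing in for the dedicated per-newspaper RSS processors mentioned above.

```python
import feedparser                 # pip install feedparser
import requests
from bs4 import BeautifulSoup     # pip install beautifulsoup4

def gather_full_articles(feed_url, article_selector="div.article-body"):
    """Read an RSS feed and follow each entry's link to fetch the full text.

    `article_selector` is a hypothetical per-source extraction rule: each
    newspaper portal needs its own, which is why HNews uses dedicated
    RSS processors per source.
    """
    items = []
    feed = feedparser.parse(feed_url)
    for entry in feed.entries:
        try:
            page = requests.get(entry.link, timeout=10)
            page.raise_for_status()
        except requests.RequestException:
            continue  # skip unreachable articles, keep only the RSS summary
        soup = BeautifulSoup(page.text, "html.parser")
        body = soup.select_one(article_selector)
        items.append({
            "title": entry.get("title", ""),
            "summary": entry.get("summary", ""),  # the short RSS text
            "content": body.get_text(" ", strip=True) if body else "",
            "published": entry.get("published", ""),
            "source_url": entry.link,
        })
    return items
```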
Once the full textual content is made available, a cascade of NLP tools is applied to extend the content with semantically rich metadata. The most interesting of these metadata is the set of entities, such as the persons, locations, organizations and temporal expressions mentioned in the news item. Accordingly, a statistical Named Entity Recognition module is applied to detect and compile lists of NEs (i.e. persons, cities, companies) that will be part of the metadata related to the targeted item. The applied NER tool is further described in Section 2.1. A relevant piece of information used to organize and retrieve news items is the editorial class, i.e. the topical category corresponding to the news content. While these categories are usually made available by the different providers through the RSS format itself, the reference classification scheme adopted varies considerably from one source to another. In order to obtain a unified and comparable scheme, news items from different sources are classified by the HNews system into a set of predefined editorial categories, inspired by previous work on this problem [2]. The supervised statistical classification process adopted in HNews is described in Section 2.2. Further analysis is carried out to disambiguate the senses of relevant words through a Word Sense Disambiguation (WSD) stage, as discussed in Section 2.3. WSD here is applied especially to natural language queries in order to support cross-lingual search. Finally, the comprehensive set of information about an underlying news item (i.e. title, text content, named entities, topical category and word senses) is indexed through a well-known engine, Lucene [3]. The above process is applied to the textual components of Web pages; through a separate, independent workflow described in the presentation of the RitroveRAI system [2], HNews is also able to index broadcast TV news, whenever the segmented news items and their speech transcriptions are made available (see, for example, [4]). The rich semantic metadata in HNews make it possible to integrate semantically typed information, so that heterogeneous sources and different languages coexist, and to support flexible forms of querying and conceptual aggregation. While Section 3 will discuss the navigation capabilities and the resulting information space made available by the dedicated HNews GUI, the rest of this section describes the main HLT processing stages applied by HNews.
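Before the individual stages, the overall cascade can be pictured as a single enrichment-and-indexing function. HNews indexes with Lucene [3]; the sketch below uses the pure-Python Whoosh library as a stand-in for Lucene, and the ner, classify and disambiguate callables are hypothetical placeholders for the modules described in Sections 2.1, 2.2 and 2.3.

```python
import os
from whoosh.fields import Schema, TEXT, ID, KEYWORD
from whoosh.index import create_in

# One index field per kind of metadata produced by the HLT cascade.
schema = Schema(
    title=TEXT(stored=True),
    content=TEXT(stored=True),
    category=ID(stored=True),          # topical class (Section 2.2)
    person=KEYWORD(commas=True),       # Named Entities (Section 2.1)
    organisation=KEYWORD(commas=True),
    location=KEYWORD(commas=True),
    senses=KEYWORD(commas=True),       # WordNet sense ids (Section 2.3)
)

os.makedirs("hnews_index", exist_ok=True)
ix = create_in("hnews_index", schema)

def index_item(item, ner, classify, disambiguate):
    """Enrich one gathered news item and add it to the index.

    `ner`, `classify` and `disambiguate` are hypothetical callables standing
    in for the HNews NER, Rocchio categorizer and PageRank-based WSD modules.
    """
    entities = ner(item["content"])  # e.g. {"person": [...], "location": [...]}
    writer = ix.writer()
    writer.add_document(
        title=item["title"],
        content=item["content"],
        category=classify(item["content"]),
        person=",".join(entities.get("person", [])),
        organisation=",".join(entities.get("organisation", [])),
        location=",".join(entities.get("location", [])),
        senses=",".join(disambiguate(item["content"])),
    )
    writer.commit()
```

Typed fields of this kind are what later enables Boolean queries over entities and categories, such as the Person/Organisation constraints shown in Section 3.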
2.1 Statistical Named Entity Recognition
The Named Entity Recognition (NER) task aims at the identification of all named locations, persons and organizations, as well as dates, times, monetary amounts and other numerical expressions that appear in free form in a text. NER is a crucial step for the enrichment proposed by HNews, as it greatly improves the quality of news aggregation. A reference system for NER is certainly IdentiFinder [5], which makes use of a variant of a Hidden Markov Model to identify names, dates and numerical quantities. In the original proposal, the states of the HMM are designed to correspond to the above classes, with an additional state for tokens belonging to none of the classes. Each individual word is assumed to be either part of a specific pre-determined class or not part of any class. According to the definition of the task, one of the class labels, or the label representing "none of the classes", is assigned to every word. IdentiFinder uses word features, which are language-dependent, such as capitalization, numeric symbols and special characters, because they give good evidence for identifying tokens. A version of an HMM-based NER has been designed at our Lab and trained on annotated Web material in Italian for the major categories of persons, locations, organizations and dates. The resulting NER module is applied in the HNews workflow for both news and query processing. Moreover, given the extremely noisy nature of speech transcriptions, two different HMM-based recognizers are adopted for Web texts and for the segments of broadcast TV news, respectively. While accuracy close to 87% is obtained over standard textual input, a non-negligible performance drop is observed over transcribed speech material, where the reachable accuracy is about 70% on average.
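To make the IdentiFinder-style model concrete, the following toy sketch decodes a token sequence with Viterbi over one state per entity class plus a NONE state, emitting the capitalization features mentioned above. All probabilities are invented for illustration; the real tables are estimated from the annotated Web material.

```python
import math

# Toy IdentiFinder-style HMM: one state per entity class plus a NONE state.
STATES = ["PER", "LOC", "ORG", "NONE"]

def word_feature(token):
    """Language-dependent surface feature, in the spirit of IdentiFinder."""
    if token.istitle():
        return "Cap"
    if token.isdigit():
        return "Num"
    return "Low"

EMIT = {  # P(feature | state), invented toy values
    "PER": {"Cap": 0.80, "Low": 0.15, "Num": 0.05},
    "LOC": {"Cap": 0.70, "Low": 0.25, "Num": 0.05},
    "ORG": {"Cap": 0.60, "Low": 0.30, "Num": 0.10},
    "NONE": {"Cap": 0.10, "Low": 0.80, "Num": 0.10},
}
TRANS = {s: {t: 0.6 if s == t else 0.4 / 3 for t in STATES} for s in STATES}
START = {s: 1.0 / len(STATES) for s in STATES}

def viterbi(tokens):
    """Return the most likely state sequence for the token list."""
    feats = [word_feature(t) for t in tokens]
    # Each layer maps a state to (best log-probability, best path so far).
    layer = {s: (math.log(START[s] * EMIT[s][feats[0]]), [s]) for s in STATES}
    for f in feats[1:]:
        new_layer = {}
        for s in STATES:
            score, path = max(
                (layer[p][0] + math.log(TRANS[p][s]), layer[p][1])
                for p in STATES
            )
            new_layer[s] = (score + math.log(EMIT[s][f]), path + [s])
        layer = new_layer
    return max(layer.values())[1]

print(viterbi("Ranieri guida la Roma".split()))  # toy run with toy tables
```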
2.2 News Categorization
Text categorization has traditionally been modeled as a supervised machine learning task [6]. In HNews, a simple yet efficient model, i.e. Rocchio, is applied. The model, described in [7], is a profile-based classifier, in which a specific cross-validation process allows optimization at the individual class level and yields performance close to that of state-of-the-art systems (e.g. Support Vector Machines). Given the set $R_i$ of training documents classified under the topic $C_i$ (positive examples), the set $\bar{R}_i$ of documents not belonging to $C_i$ (negative examples), a document $d_h$ and a feature $f$, the Rocchio model [8, 7] defines the weight $\Omega_f^i$ of $f$ in the profile of $C_i$ as:

$$\Omega_f^i = \max\left( 0,\; \frac{\beta}{|R_i|} \sum_{d_h \in R_i} \omega_f^h \;-\; \frac{\gamma}{|\bar{R}_i|} \sum_{d_h \in \bar{R}_i} \omega_f^h \right) \qquad (1)$$

where $\omega_f^h$ is the weight of the feature $f$ in the document $d_h$. In equation 1, the parameters $\beta$ and $\gamma$ control the relative impact of positive and negative examples and determine the weight of $f$ in the $i$-th profile. In [8], the values $\beta = 16$, $\gamma = 4$ were first proposed for the categorization of low quality images. These parameters, however, depend greatly on the training corpus, and different settings of their values produce significant performance variations.
Notice that, in equation 1, features with a negative difference between positive and negative relevance are set to 0. This represents an elegant feature selection method: the 0-valued features are irrelevant in the similarity estimation. As a result, the remaining features are used optimally, i.e. only for the classes for which they are selective. In this way, the minimal set of truly irrelevant features (those having a weight of 0 for all the classes) can be better captured and removed.
In [7], a modified Rocchio model is presented that makes use of a single parameter $\gamma_i$ as follows:

$$\Omega_f^i = \max\left( 0,\; \frac{1}{|R_i|} \sum_{d_h \in R_i} \omega_f^h \;-\; \frac{\gamma_i}{|\bar{R}_i|} \sum_{d_h \in \bar{R}_i} \omega_f^h \right) \qquad (2)$$
Moreover, a practical method for estimating suitable values of the $\gamma_i$ vector has been introduced. Each category in fact has its own set of relevant and irrelevant features, and equation 2 depends on $\gamma_i$ for each class $i$. Now, if we assume that the optimal values of these parameters can be obtained by estimating their impact on the classification performance, nothing prevents us from deriving this estimation independently for each class $i$. This results in a vector of $\gamma_i$, each one optimizing the performance of the classifier over the $i$-th class. The estimation of the $\gamma_i$ is carried out through a typical cross-validation process. Two data sets are used: a training set (about 70% of the annotated data) and a validation set (the remaining 30%). First the categorizer is trained on the training set, where the feature weights ($\omega_f^d$) are estimated. Then the profile vectors $\Omega_f^i$ for the different classes are built by setting the parameters $\gamma_i$ to the values that optimize accuracy on the validation set. The resulting categorizer is then tested on a separate test set. Results on the Reuters benchmark are around 85%, close to those of more complex state-of-the-art classification models ([7, 9]). An extensive discussion of the performance reached over different benchmarks is reported in [7].
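A compact sketch of equation 2 and of the per-class estimation of $\gamma_i$ follows. It assumes, for brevity, a multi-class argmax over profile similarities (the paper tunes each binary classifier separately), and the candidate gamma grid is invented for illustration.

```python
import numpy as np

def rocchio_profiles(X, y, gammas):
    """Build one profile per class following equation (2).

    X      : (n_docs, n_features) array of term weights (the omega_f^d)
    y      : (n_docs,) array of class labels
    gammas : one gamma_i per class, in np.unique(y) order
    """
    classes = np.unique(y)
    profiles = np.zeros((len(classes), X.shape[1]))
    for i, c in enumerate(classes):
        pos, neg = X[y == c], X[y != c]
        # max(0, mean positive weight - gamma_i * mean negative weight)
        profiles[i] = np.maximum(
            0.0, pos.mean(axis=0) - gammas[i] * neg.mean(axis=0))
    return classes, profiles

def fit_gammas(X_tr, y_tr, X_va, y_va, grid=(1, 2, 4, 8, 16)):
    """Pick each gamma_i independently by accuracy on the validation split."""
    classes = np.unique(y_tr)
    gammas = np.ones(len(classes))
    for i in range(len(classes)):
        best = -1.0
        for g in grid:
            trial = gammas.copy()
            trial[i] = g
            cls, prof = rocchio_profiles(X_tr, y_tr, trial)
            pred = cls[np.argmax(X_va @ prof.T, axis=1)]  # similarity argmax
            acc = float(np.mean(pred == y_va))
            if acc > best:
                best, gammas[i] = acc, g
    return gammas
```

The max(0, ...) in the profile construction is exactly the feature selection effect discussed above: features whose negative evidence outweighs the positive one simply drop out of the class profile.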
Fig. 1. Categorization of Web news in HNews
A typical example of the obtained results is reported in Figure 1, which shows the results of a retrieval session in HNews: the third column reports the topical classification of each retrieved news item. While the last two entries are derived from the "Corriere della Sera" (CdS) portal, and their topic label is already available, i.e. "Categoria: Politica", the first originates from Ansa and lacks any topic label. Column 3 in the Figure reports the labels automatically assigned by the HNews classifier, which in the last two cases confirms the original CdS classification. Notice that while the first two news items deal with similar topics, their focus is different, and this is well reflected by the classifier.
2.3 Applying Word Sense Disambiguation for Query Expansion in CLIR
Lexical ambiguity is a fundamental aspect of natural language. Word Sense Disambiguation (WSD) investigates methods to automatically determine the intended sense of a word in a given context, according to a predefined set of sense definitions. These are usually provided by a reference semantic lexicon. The impact of WSD on IR tasks is still an open issue and large-scale assessment is needed. Unsupervised systems are certainly very interesting for their applicability to non-English, i.e. resource-poor, languages. While state-of-the-art systems are usually supervised, porting them to other languages is mostly expensive, as large-scale resources are needed. For this reason, unsupervised approaches to inductive WSD are very appealing. In the framework of the HNews architecture, we adopt a network-based model of WSD built on WordNet, as discussed in [10]. In [11], a variant of the PageRank algorithm, called personalized PageRank, is presented. WordNet is taken as a network of senses and a random walk model of its links is defined. Then, sentences, or entire texts, are used to initialize the state of the WordNet network, and the stable state of this "random walk" is assumed as the posterior statistics across the senses of the targeted words. While the approach can be applied either to individual words or to entire sentences, in [10] it has been shown that a distributional approach can improve the personalized PageRank disambiguation algorithm, both in accuracy and in time complexity. The initial state is determined by a topical expansion of individual sentences through the use of Latent Semantic Analysis [12]: each sentence is first mapped into a vector in the LSA space, and the words closest to that vector are retained and added to the sentence terms. Initializing the network with this expanded lexicon allows the resulting PageRank to converge faster and to more accurate sense distributions. Details of the technique, together with a detailed evaluation of the adopted system for English, are discussed in [10]; in [13] we reported an evaluation of the applicability of the WSD system to the Italian language. Results over the Senseval '07 benchmark are about 71.5% F1 for English, while on the Evalita '07 benchmark for Italian the system reaches about 52% F1. The gap is mainly due to the different WordNet versions used for the two languages. For its application in HNews, a large collection of Web news, considered as the specific domain corpus, has been used to derive the LSA model in which the distributional evidence is represented. We first built the classical vector space model and then applied Latent Semantic Analysis. Notice that the topical similarity across news items is able to better characterize the typical contexts for the suitable word senses of all terms in a news item. When a sentence, a document or a query is input, it is first expanded with the set of its closest words in the LSA space. The expanded query is then used to trigger the personalized PageRank, which provides the final preferences over the senses of individual terms, e.g. the nouns, verbs or adjectives used in a query.
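The following sketch illustrates the personalized PageRank step with networkx over a toy fragment of the sense network. The synset names and the role played by the LSA-expanded context are illustrative assumptions, not the actual HNews graph.

```python
import networkx as nx

# Toy fragment of the WordNet sense graph: nodes are synsets, edges are
# semantic relations. HNews walks the full WordNet network.
G = nx.Graph()
G.add_edges_from([
    ("bank.n.01", "financial_institution.n.01"),   # money sense
    ("financial_institution.n.01", "loan.n.01"),
    ("bank.n.09", "slope.n.01"),                   # river-side sense
    ("slope.n.01", "river.n.01"),
])

def disambiguate(target_word, context_senses):
    """Rank the senses of `target_word` by personalized PageRank mass.

    `context_senses` plays the role of the LSA-expanded context: senses of
    the words that LSA found closest to the input sentence or query.
    """
    # Concentrate the restart probability on the (expanded) context.
    personalization = {n: 0.0 for n in G}
    for s in context_senses:
        personalization[s] = 1.0 / len(context_senses)
    ranks = nx.pagerank(G, alpha=0.85, personalization=personalization)
    candidates = [n for n in G if n.startswith(target_word + ".")]
    return sorted(candidates, key=ranks.get, reverse=True)

# A financial context pushes the money sense of "bank" first.
print(disambiguate("bank", ["loan.n.01", "financial_institution.n.01"]))
```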
Fig. 2. CLIR in HNews: Italian and English news in response to a query in English
While senses can be part of the document index, the most interesting aspect of the adoption of WSD in HNews is that it is an enabling factor for Cross Language Information Retrieval (CLIR). In fact, although WordNet [14] was developed for the English language, several versions for other languages, aligned with the English sense hierarchy, have been developed. MultiWordNet [15] is a WordNet for Italian that is strictly aligned with Princeton WordNet 1.6 at the synset level. A large number of English synsets are in fact put in correspondence with an Italian synset: the words in the latter are thus synonyms of each other while being specific "translations" of the words in the source English synset. The application of our WSD model to CLIR is thus straightforward. First, an English query is processed and its terms and Named Entities are extracted. Then nouns and verbs are disambiguated through our personalized PageRank and their WordNet senses are detected. Finally, a translated query in Italian is obtained. It includes the original (language-neutral) Named Entities as well as the Italian words obtained from the MultiWordNet synsets corresponding to the selected senses of the English words. In this way two versions (English and Italian) of the same query are obtained, and documents written in both languages can be returned. In Figure 2, the response to the English query "The death of Arafat, leader of Palestina" is shown, where the first hits include news in both Italian and English. Notice that the scores separate meaningful from irrelevant news well.
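A minimal sketch of this query translation step follows. The paper relies on MultiWordNet; here NLTK's Open Multilingual WordNet, which also aligns Italian lemmas to Princeton synsets, is used as an accessible stand-in, and pick_sense is a hypothetical hook for the WSD module of Section 2.3.

```python
import nltk
from nltk.corpus import wordnet as wn

nltk.download("wordnet", quiet=True)
nltk.download("omw-1.4", quiet=True)   # aligned multilingual lemmas

def translate_query(english_terms, named_entities, pick_sense):
    """Build the Italian side of a CLIR query from disambiguated English terms.

    `pick_sense` maps a word to its chosen Synset (e.g. the top sense from
    the personalized PageRank WSD sketched above); Named Entities pass
    through unchanged as language-neutral tokens.
    """
    italian_terms = []
    for term in english_terms:
        synset = pick_sense(term)
        if synset is not None:
            # Italian lemmas aligned to the same synset, via OMW.
            italian_terms.extend(synset.lemma_names("ita"))
    return list(named_entities) + italian_terms

# Naive sense picker (first WordNet sense) standing in for the WSD module.
first_sense = lambda w: (wn.synsets(w) or [None])[0]
print(translate_query(["death", "leader"], ["Arafat"], first_sense))
```

Issuing both the original English terms and the translated Italian ones against the same index is what allows a single query to return hits in both languages, as in Figure 2.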
3 The retrieval interface
In a general IR scenario, the user interface must support the user in submitting queries to the system and trigger the navigation or browsing of the returned documents. An example of the browser interface is shown in Figure 3.
Fig. 3. HNews search interface
The interface is composed of five different frames, i.e. the top one, three in the middle and the footer one (see the red boxes in Figure 3). The top frame provides the query interface, where users can edit their queries as:
– simple expressions, i.e. bags of keywords
– short texts in two languages, which are analyzed by the HLT tools
– Boolean combinations of simple queries
A variable set of constraints, i.e. individual simple queries, can be designed by the user according to the different types of metadata considered. Relevant fields of the indexes range in fact from the topical category, to the full text content, to Named Entities. The query shown in Fig. 3 is the expression

(Content: "Roma in Campionato" AND Person:"Ranieri" AND Organisation:"Roma")

that is, "Find all news that discuss the Roma team in the football league and Ranieri, i.e. the current coach." It is also possible to specify whether an individual condition must or may be satisfied, adding some further flexibility to the Boolean combination of individual constraints (Fig. 3).
The middle frame includes three individual frames. In the central one the returned results for a query are shown, as already seen in Fig. 1. A user click on any result triggers multiple visualization actions. The middle left frame is used to show the video or the photos related to the selected news item. The middle right frame shows the metadata related to individual news items, such as publication dates and times, Web source or editorial category. The bottom frame, at the footer, changes according to individual selections among the returned news items and shows their textual contents.
Whenever an interesting news item has been found, the user can browse the Web, as links to the originating pages (at the source RSS portal) are made available. Moreover, the HNews platform also supports a spatial metaphor in which an information space is shown to the user, centered on the news item selected from the set of returned answers. The space is obtained through the above-mentioned Latent Semantic Analysis of the underlying collection. It aims at capturing the semantic relatedness between news items (either Web or multimedia) as well as between news items and Named Entities. A graph local to one selected news item can in fact be obtained by retrieving all the news items closest to it in the LSA space. The graph also draws arcs among these items whenever sufficiently close (i.e. similar) news pairs are involved. An example of the spatial view is shown in Figure 4. In the navigation tool, different visual layers are made available to capture different useful information:
– Every link represents the similarity of a news pair in the LSA space (the same LSA performed for the WSD step and described in Section 2.3), so that the closer two news items are, the more relevant they are for the same queries.
– Different font sizes are used to discriminate timeliness: the more recent a news item is, the larger its font will be. The resulting zooming effect tries to balance coverage against timeliness: very old news items will not limit the visualization of more recent material, although they are still retrieved and shown.
– Colors from green to red are used to represent the relevance of a news item for the input query: green expresses more relevant, i.e. better, responses, while red is used for the worse ones.
– Shapes discriminate semantic types. While news items (i.e. textual objects) are shown as colored boxes, ellipses are used to represent entities and their semantic types, such as persons vs. organizations. Notice that typed entities are represented in the LSA space just like documents, so that their positioning in the network and their similarities are seamlessly computed and depicted.
The resulting graphs have the desirable side effect of expressing a global view of the information space. Links express distances in the LSA space, which implies that news items naturally organize into visual clusters, usually made of topically related material. In the example in Figure 4, the news aggregates form two major regions: the bottom one mostly concerns stories related to the "economics" class, while the upper right one mainly concerns "sport" topics.
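A sketch of how such a similarity graph can be derived from the collection follows, assuming scikit-learn for the vector space model and LSA, and networkx for the graph; the similarity threshold is an invented parameter.

```python
import networkx as nx
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def news_graph(texts, titles, n_dims=50, threshold=0.5):
    """Link news items whose LSA vectors are sufficiently close.

    `threshold` is an invented similarity cutoff; HNews instead derives the
    local graph from the nearest neighbours of the selected item in the
    LSA space.
    """
    tfidf = TfidfVectorizer().fit_transform(texts)   # classical VSM first
    n_dims = min(n_dims, tfidf.shape[1] - 1)
    lsa = TruncatedSVD(n_components=n_dims).fit_transform(tfidf)
    sim = cosine_similarity(lsa)
    G = nx.Graph()
    G.add_nodes_from(titles)
    for i in range(len(titles)):
        for j in range(i + 1, len(titles)):
            if sim[i, j] >= threshold:
                G.add_edge(titles[i], titles[j], weight=float(sim[i, j]))
    return G
```

Connected components of such a graph correspond to the visual clusters of topically related material described above.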
As presented in Section 2.3, the system is also able to search news in different languages as a side effect of word sense disambiguation. Figure 2 shows how a query can be submitted in English through a specific parameter (CLIR: en -> it) of the query frame (i.e. the upper frame). We have already discussed the returned results, which also include news from Italian channels such as Corriere della Sera or Ansa. Notice that the reverse direction (i.e. queries in Italian retrieving documents in English) is also currently supported.
4 Conclusions
In this paper, the HNews system for the semantic annotation and indexing of news from Web newspapers or TV broadcasts has been presented. A complex family of language processing tools is adopted for Information Extraction, i.e. the automatic recognition of different types of semantic information. Different models and approaches are exploited in HNews to extract rich metadata, such as Named Entities, topical categories or word senses. The resulting platform also supports cross-lingual queries and advanced Boolean combinations of simple queries acting over texts and metadata. In particular, two languages are currently supported: Italian and English. News items are downloaded from the Web on a continuous basis. Indexing proceeds from RSS feeds, so that published material is available in almost real time. One of the most innovative aspects of the HNews system with respect to previous experiences in this area (e.g., the Prestospace system, [2]) is the browsing modality offered, which integrates search and semantic navigation functionalities. A quantitative model of semantic similarity is in fact defined over the rich set of metadata, and connected graphs of news items and Named Entities in the information space are correspondingly obtained. These are quite effective for quickly focusing on the information of highest relevance and interest, as their conceptual, rather than merely textual, nature is made explicit in the graph. HNews is the basis for future developments aimed at supporting the creation of a community of readers, and also producers, of news and other content. In this way, the HNews portal would be directly reusable to support the gathering of user-generated content. The exploitation of the latter for developing models of large-scale, realistic opinion mining processes is the focus of future research enabled by the HNews system.
Fig. 4. HNews news navigator
References
1. Pazienza, M.T., ed.: Information Extraction: A Multidisciplinary Approach to an Emerging Information Technology. International Summer School, SCIE-97, Frascati, Italy, 14-18, 1997. Volume 1299 of Lecture Notes in Computer Science, Springer (1997)
2. Basili, R., Cammisa, M., Donati, E.: RitroveRAI: A web application for semantic indexing and hyperlinking of multimedia news. In: [9], 97–111
3. Gospodnetić, O.: Advanced Text Indexing with Lucene. O’Reilly Media (2003)
4. Messina, A., Boch, L., Dimino, G., Bailer, W., Schallauer, P., Allasia, W., Groppo, M., Vigilante, M., Basili, R.: Creating rich metadata in the TV broadcast archives environment: The Prestospace project. In: Proceedings of the AXMEDIS Conference (2006)
5. Bikel, D., Schwartz, R., Weischedel, R.: An algorithm that learns what’s in a name.
Machine Learning Journal (1999)
6. Sebastiani, F.: Machine learning in automated text categorization. ACM Comput-
ing Surveys 34(1) (2002) 1–47
7. Basili, R., Moschitti, A., Pazienza, M.: NLP-driven IR: Evaluating performance over a text classification task. In: Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI 2001), Seattle, Washington, USA (2001)
8. Ittner, D.J., Lewis, D.D., Ahn, D.D.: Text categorization of low quality images.
In: Proceedings of SDAIR-95, 4th Annual Symposium on Document Analysis and
Information Retrieval. (1995)
9. Gil, Y., Motta, E., Benjamins, V.R., Musen, M.A., eds.: The Semantic Web - ISWC 2005, 4th International Semantic Web Conference, ISWC 2005, Galway, Ireland, November 6-10, 2005, Proceedings. Volume 3729 of Lecture Notes in Computer Science, Springer (2005)
10. De Cao, D., Basili, R., Luciani, M., Mesiano, F., Rossi, R.: Robust and efficient
page rank for word sense disambiguation. In: Proceedings of TextGraphs-5: Graph-
based Methods for Natural Language Processing, Uppsala, Sweden (2010)
11. Agirre, E., Soroa, A.: Personalizing pagerank for word sense disambiguation. In:
Proceedings of the 12th Conference of the European Chapter of the Association
for Computational Linguistics. EACL ’09, Morristown, NJ, USA, Association for
Computational Linguistics (2009) 33–41
12. Landauer, T., Dumais, S.: A solution to Plato's problem: The latent semantic anal-
ysis theory of acquisition, induction and representation of knowledge. Psychological
Review 104 (1997) 211–240
13. De Cao, D., Basili, R., Luciani, M., Mesiano, F., Rossi, R.: Enriched page rank for
multilingual word sense disambiguation. In: Proceedings of the 2nd Italian Information Retrieval Workshop (2011)
14. Miller, G., Beckwith, R., Fellbaum, C., Gross, D., Miller, K.: An on-line lexical
database. International Journal of Lexicography 13(4) (1990) 235–312
15. Pianta, E., Bentivogli, L., Girardi, C.: MultiWordNet: developing an aligned multilingual database. In: Proceedings of the First International Conference on Global WordNet, Mysore, India (January 21-25, 2002)