    Multilingual Expert Search using Linked Open
         Data as Interlingual Representation

                     Daniel M. Herzig and Hristina Taneva

                                Institute AIFB
                       Karlsruhe Institute of Technology
                          76128 Karlsruhe, Germany
              herzig@kit.edu, hristina.taneva@student.kit.edu



      Abstract. Most Information Retrieval models take documents as Bag-
      of-Words and are thereby bound to the language of the documents.
      In this paper, we present an approach using Linked Open Data re-
      sources, i.e. URIs, as interlingual document representations. Documents
      and queries are summarized by the resources they contain. We show the
      applicability of our approach for multilingual retrieval with a case study
      on expert search.


1    Introduction

When encountering a problem, there are often two ways to get to a solution:
either acquire the knowledge to solve the problem oneself, or ask
for outside help, preferably somebody who has expertise and experience in the
needed domain. The first case is often not feasible or would require too much
time. In the second case, the subsequent problem of finding the right expert
arises. We address this problem and present an approach for expert search in
this paper.
    Identifying who an expert is for a certain domain can be done in many
ways. One possible solution is to use documents and assume that the authors
have expertise on the topics they wrote about. We apply this assumption and
consider documents for the identification of experts.
    Obviously, the more specific the problem is, the harder it is to find an expert.
Thus, extending the considered search space even across languages improves the
situation. The scenario of considering documents in different languages is not an
artificial one, e.g. global companies have product documentations in many lan-
guages or online developer forums have discussion threads in different languages.
Our approach addresses the problem of how to deal with different languages by
applying an interlingual representation for documents based on Linked Open
Data resources.

Expert Search Track at CriES We participated with our approach in the expert search
track of the Cross-lingual Expert Search Workshop (CriES) at CLEF 2010 [18].
The setting and the evaluations presented in this paper are provided by the
workshop. The task of the expert search track was to find experts for 60 topics
consisting of 15 topics in each of the four languages English, Spanish, French,
and German in the Yahoo! Answers data corpus [19]1 . The data corpus consists
of 780193 threads, i.e. questions and answers, in the categories "Health", "Computer
& Internet", and "Science & Math", in four languages written by 169819
users, i.e. experts. Table 1 gives an overview of the data set. More details and
an overview of the results of the workshop can be found in [18].
Language   Threads          Users
English    712370 (91%)     149410 (88%)
Spanish     38722  (5%)      11931  (7%)
French      19867  (3%)       5749  (3%)
German       9234  (1%)       3152  (2%)

Table 1. Overview of the data corpus regarding the size and language distribution.

  This paper is organized as follows. After the introduction in this section, we
describe the usage of Linked Open Data as an interlingual representation in
Section 2. In Section 3, we present our model for expert search, how we create
profiles between resources and experts and how we estimate parameters. Sec-
tion 4 presents the evaluation and Section 5 discusses related work. Finally, we
conclude in Section 6.


2   Multilingual IR using Linked Open Data

Most common models in Information Retrieval see documents as Bag-of-Words,
i.e. they disregard the order of the words and take the collection of words as the
representation of a document. As a consequence, this representation is directly
bound to the language of the document. When using keyword queries in one
language, relevant documents in other languages are probably not retrieved.
We propose an approach using Linked Open Data resources as document rep-
resentation. Linked Open Data (LOD) refers to interlinked, publicly available,
and structured datasets on the web using semantic web standards, in particular
the Resource Description Framework (RDF) [4, 7].
     The first principle of LOD states that things should be identified by Uniform
Resource Identifiers (URIs), where things, i.e. resources, can be virtually every-
thing. The notion is not limited to physical things, but comprises also abstract or
intangible concepts, like happiness or fire alarm. URIs are not necessarily human
readable, since they are meant to be processed by machines. Therefore, human
readable labels are often assigned to URIs. Since there can be multiple labels
in different languages for one URI, the URI itself can be seen as an interlingual
representation for the resource it identifies. Figure 1 illustrates this with
the resource representing Germany and its labels in several languages.
     The resource in Figure 1 is taken from DBpedia2 . DBpedia is a popular
LOD dataset extracted from Wikipedia, which exploits the interlanguage links
of Wikipedia for the labels.

1 This dataset is provided by the Yahoo! Research Webscope program (see http://
  research.yahoo.com/) under the following ID: L6. Yahoo! Answers Comprehensive
  Questions and Answers (version 1.0)
2 http://dbpedia.org, Aug 4 2010

[Figure 1: the resource db:Germany connected by rdfs:label to the literals
"Alemania", "Allemagne", "Germany", and "Deutschland", with the prefixes
db = "http://dbpedia.org/resource/" and rdfs = "http://www.w3.org/2000/01/rdf-schema#".]

Fig. 1. The resource representing "Germany" with human readable labels in different
languages.

Since not all articles have a corresponding article in
all other languages, some resources do not have labels in all languages. Figure 2
gives an overview of the number of articles in the considered languages, which
directly corresponds to the number of resources and their labels. We use these
Wikipedia resources in our approach to capture the aboutness [10] of documents.
However, our approach is not limited to resources from Wikipedia. Other LOD
and RDF resources could be used likewise, e.g. AGROVOC3, a conceptualization
of the agricultural domain, which features labels in five languages and could be used
for documents in this domain.
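
To make the interlingual character of such URIs concrete, the labels of a resource can be looked up directly. The following Python sketch, which is illustrative and not part of our experimental setup, queries the public DBpedia SPARQL endpoint for the rdfs:label values of db:Germany in the four corpus languages; it assumes the SPARQLWrapper library is installed.

```python
# Illustrative sketch (not part of our setup): retrieve the multilingual
# rdfs:label values that make a DBpedia URI usable as an interlingual
# representation. Assumes the SPARQLWrapper library and the public endpoint.
from SPARQLWrapper import SPARQLWrapper, JSON

sparql = SPARQLWrapper("http://dbpedia.org/sparql")
sparql.setQuery("""
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    SELECT ?label WHERE {
        <http://dbpedia.org/resource/Germany> rdfs:label ?label .
        FILTER (lang(?label) = "en" || lang(?label) = "es" ||
                lang(?label) = "fr" || lang(?label) = "de")
    }
""")
sparql.setReturnFormat(JSON)

for binding in sparql.query().convert()["results"]["bindings"]:
    print(binding["label"]["xml:lang"], binding["label"]["value"])
# Expected: en Germany, es Alemania, fr Allemagne, de Deutschland
```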
    We used the Wikipedia Miner Toolkit to extract the resources from documents [13].
The miner identifies possible candidates in the text and then disambiguates and
verifies them up to a given confidence value by using the link structure of Wikipedia
and the surrounding terms, see [12] for details. The extracted resources form a
Bag-of-Resources representation of the document as illustrated in Figure 3. Each
resource identifies unambiguously one thing. As mentioned above, these resources
are interlingual even though they often have English names.

[Figure 2: bar chart "Number of Articles in Wikipedia" for EN, ES, FR, and DE
(y-axis: number of articles, up to 3,000,000).]

Fig. 2. Number of articles in Wikipedia for different languages as of September 2009.
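
As a rough illustration of the extraction step, the sketch below replaces the Wikipedia Miner with a greedy longest-match lookup against a small, hypothetical anchor dictionary mapping surface forms to DBpedia URIs. The real toolkit [13] additionally disambiguates and verifies candidates using Wikipedia's link structure, so this is only a minimal stand-in.

```python
import re
from collections import Counter

# Hypothetical anchor dictionary (surface form -> resource URI); in our setup
# this role is played by the Wikipedia Miner Toolkit [13].
ANCHORS = {
    "world cup": "db:FIFA_World_Cup",
    "bulgaria": "db:Bulgaria",
    "united states": "db:United_States",
    "germany": "db:Germany",
}

def bag_of_resources(text, max_len=3):
    """Greedy longest-match linking of token n-grams to resources."""
    tokens = re.findall(r"[a-z0-9à-ÿ]+", text.lower())
    bag, i = Counter(), 0
    while i < len(tokens):
        for n in range(max_len, 0, -1):     # prefer longer surface forms
            phrase = " ".join(tokens[i:i + n])
            if phrase in ANCHORS:
                bag[ANCHORS[phrase]] += 1
                i += n
                break
        else:
            i += 1                          # no resource starts at this token
    return bag

print(bag_of_resources("Bulgaria's best World Cup performance was in the "
                       "1994 World Cup in the United States."))
# Counter({'db:FIFA_World_Cup': 2, 'db:Bulgaria': 1, 'db:United_States': 1})
```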
    Although the advantage of Linked Open Data is the connection between
resources, we omit this feature and leave the exploitation of links between re-
sources for future work. For now, we use only the resources. Therefore, the
current approach seems similar to concept-based IR approaches, especially since
Wikipedia has been frequently used as a concept space [5], but also EuroWord-
Net4 or UWN [6]. However, the difference is that using resources makes it possible to directly
3
    http://www.fao.org/agrovoc/, Aug 4 2010
4
    http://www.illc.uva.nl/EuroWordNet/, Aug 4 2010

exploit additional information from other Linked Open Data sources, e.g. the
resource representing Germany from Figure 1 is linked through the typed link
owl : sameAs [2] to resources representing the same thing, e.g. to the resource
from The New York Times5 or from Geonames6 . Besides information about the
same thing, the typed links between different resources can be exploited as well.
This allows enriching the representation with additional information and adapting
it to specific use cases or domains.

[Figure 3: document 1 (English): "Bulgaria's best World Cup performance was in
the 1994 World Cup in the United States, where they beat defending champions
Germany to reach the semi-finals." → bag of resources 1: db:Bulgaria,
db:FIFA_World_Cup, db:United_States, db:Germany.
Document 2 (German): "Deutschland ist ein föderalistischer Staat in Mitteleuropa.
Deutschland ist Gründungsmitglied der Europäischen Union und mit knapp 82
Millionen Einwohnern deren bevölkerungsreichstes Land." → bag of resources 2:
db:Germany, db:Sovereign_state, db:Central_Europe, db:Citizen.]

Fig. 3. Text documents in different languages and their interlingual representation in
Linked Open Data resources.


3     Expert Search

The Yahoo! Answers data corpus contains discussion threads consisting of an
initial question and subsequent answers. The problem of expert search in this
context is to find users who are likely able to answer a given question q, i.e. a
topic, based on the threads in the data corpus.
    We apply mixture language models. Potential experts are ranked according
to the probability that the expert ex ∈ E can answer the given question q ∈ Q,
i.e. P(ex|q). A question q is modeled as a Bag-of-Resources: q = {r1, ..., rn}.

    P(ex|q) ∝ P(ex) · P(q|ex) = P(ex) · ∏_{i=1}^{n} P(ri|ex)    (1)

    We apply Bayes' theorem and assume P(q) and the prior P(ex) to be constant.
The probability P(ri|ex) is approximated as a weighted sum of several
features f and smoothed by information over the entire corpus C.

    P(ri|ex) = Σ_f ( λf · Pf(ri|ex) ) + λC · PC(ri)

    s.t.  Σ_f λf + λC = 1
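
Under these definitions, the ranking itself is a short computation. The sketch below is our own minimal illustration of Equation (1) combined with the mixture, not the original implementation: profiles, lambdas, and p_corpus are assumed to be precomputed lookup tables, and the product is evaluated in log-space to avoid numerical underflow.

```python
import math

def score(expert, query_resources, profiles, lambdas, p_corpus, lambda_c=0.1):
    """Log of Equation (1) with the mixture smoothing; the constant prior
    P(ex) is dropped since it does not affect the ranking."""
    log_p = 0.0
    for r in query_resources:
        p_r = sum(lambdas[f] * profiles[f].get(expert, {}).get(r, 0.0)
                  for f in profiles)
        p_r += lambda_c * p_corpus.get(r, 0.0)
        if p_r == 0.0:
            return float("-inf")            # resource unseen even in the corpus
        log_p += math.log(p_r)
    return log_p

def rank(experts, query_resources, profiles, lambdas, p_corpus):
    """Experts ordered by descending score for the given Bag-of-Resources."""
    key = lambda ex: score(ex, query_resources, profiles, lambdas, p_corpus)
    return sorted(experts, key=key, reverse=True)
```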

5
    http://data.nytimes.com/55761554936313344161, Aug 4 2010
6
    http://sws.geonames.org/2921044/about.rdf, Aug 4 2010

3.1     Expert - Resource Profiles

One answer per thread is marked by the questioner or by votes of other users
as the best answer to the question. The user who gave the best answer is
identifiable by their ID. All other answers do not have a user ID. We exploit this
setting by building two different models. These models are illustrated in Figure 4
and explained below.


[Figure 4: a discussion thread with question q and several answers; q and the best
answer a* are combined into abest, linked to the expert ID of its author, while the
remaining answers form aall.]

Fig. 4. One example discussion thread. The initial question q and the best answer a*
are combined into abest and subject to the Best-Answer Model. All other answers are
put together as aall and considered by the All-other-Answers Model.


Best-Answer Model
This model takes the question q and the best answer a* together as abest and
relates abest to the expert who gave the best answer, as illustrated in Figure 4.
The idea behind this model is that the user obviously understood the question,
because he was able to give the best answer. Therefore, he holds expertise about
the covered resources. Formally, the model is defined as follows, where freq(r, a)
is the frequency of resource r in a.
    Pbest(r|ex) = Σ_{abest} P(r|abest) · ( P(ex|abest) · P(abest) ) / P(ex)

with

    P(r|abest) = freq(r, abest) / Σ_{r∈abest} freq(r, abest)

    P(ex|abest) = 1 iff ex is the author of abest, 0 otherwise

    P(abest) = 1/|Q| ,   P(ex) = 1/|E|


All-other-Answers Model
This model relates all answers aall, except the best answer, to the expert who
gave the best answer. The assumption behind this model is that an expert who
gave the best answer might also be able to tell that the other answers are not correct.
Therefore, we assume that the expert has expertise about the resources covered
by these answers as well, at least to some extent. Formally, the model is defined
analogously to the previous one:

    Pall(r|ex) = Σ_{aall} P(r|aall) · ( P(ex|aall) · P(aall) ) / P(ex)
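
Both profiles reduce to counting resource frequencies in the answers attributed to an expert. The following sketch assumes a simplified data layout of our own choosing (one dict per thread with the best author, the resources of q plus a*, and the resources of the other answers); the constant factor P(abest)/P(ex) = |E|/|Q| is the same for every expert and resource and is therefore omitted.

```python
from collections import Counter, defaultdict

def build_profiles(threads):
    """threads: iterable of dicts with keys 'best_author', 'best_resources'
    (resources of q and a*), and 'other_resources' (all other answers).
    Returns unnormalized Pbest and Pall as expert -> resource -> weight;
    the global factor |E|/|Q| is constant and omitted."""
    p_best = defaultdict(Counter)
    p_all = defaultdict(Counter)
    for t in threads:
        ex = t["best_author"]
        for profile, key in ((p_best, "best_resources"),
                             (p_all, "other_resources")):
            counts = Counter(t[key])
            total = sum(counts.values())
            for r, c in counts.items():
                profile[ex][r] += c / total    # accumulate P(r|a) per expert
    return p_best, p_all
```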



3.2   Parameter Estimation

The mixture model presented in the previous section makes it possible to balance
the influence of each model through the corresponding parameter λf. In our case, we
need to determine λbest, the weight for the Best-Answer Model, and λall, the
weight for the All-other-Answers Model. The smoothing parameter λC remains
fixed at λC = 0.1. Hence, λbest + λall = 0.9 must hold.
In order to examine the effect of different parameter configurations on the per-
formance of the retrieval, we used the 60 topics provided by the workshop along
with the given a priori relevance information. The a priori relevance information is
directly taken from the data set, i.e. each topic has exactly one relevant expert,
namely the one who wrote the best answer for this topic. This setting is not
optimal, since these questions are part of the data corpus itself and not distinct
from it. Furthermore, it can be assumed that there is more than one relevant
expert per question and that judging the performance by the occurrence of just
one expert in the result set will deviate from the actual result. However, it al-
lows at least a rough estimate of a parameter configuration. We used the Mean
Average Precision (MAP) to measure the performance.
    Since the 60 topics are part of the Best-Answer Model, a correlation between
λbest and the MAP can be assumed in this setting. The MAP was measured
in steps of 0.05 from λbest = 0, i.e. the performance without the Best-Answer
Model, to λbest = 0.9, the performance of the Best-Answer Model alone. The
observed MAP for the different parameters is shown in the left plot of Figure 5.
As assumed, a correlation between λbest and the MAP can be observed. Re-
markably, the MAP decreases for λbest = 0.9, despite the assumed correlation.
This suggests that information is lost if the All-other-Answers Model is not involved
and, as a consequence, the optimal parameter configuration cannot be the max-
imum observed MAP. We used least-squares curve fitting to approximate the ob-
served values, i.e. the red line in Figure 5. The maximum of the fitted curve is at
λbest = 0.66. Comparing the estimated values with the actual MAP computed
ex post with the entire assessments shows that the maximum is even lower at
about λbest = 0.53, see right plot of Figure 5.
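
The fit itself can be reproduced with standard least-squares tooling. The sketch below uses numpy's polyfit on synthetic placeholder values (the real measurements are the black dots in Figure 5) and reads the estimate off the vertex of the fitted parabola.

```python
import numpy as np

# Sweep lambda_best from 0 to 0.9 in steps of 0.05; the MAP values here are
# synthetic placeholders, not our measured numbers.
lambdas = np.arange(0.0, 0.95, 0.05)
rng = np.random.default_rng(0)
map_values = (0.15 + 0.6 * lambdas - 0.45 * lambdas**2
              + rng.normal(0.0, 0.01, lambdas.size))

# Degree-2 least-squares fit (the red curve in Figure 5) and its maximum.
a, b, c = np.polyfit(lambdas, map_values, deg=2)
best_lambda = -b / (2.0 * a)      # vertex of the fitted parabola
print(f"estimated lambda_best = {best_lambda:.2f}")
```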


4     Evaluation
We submitted three runs with different configurations; see Table 2 for an overview
of the results. Some of the 60 topics are very short and many are written
in rather colloquial language and grammar or use abbreviations, e.g. "Why
do women get PMS?" or "hab es runtergeladen wie kann ich bei msn chatten?"
(German: "downloaded it, how can I chat on msn?"), which caused problems
for the Wikipedia Miner when identifying resources.

Fig. 5. Parameter estimation through non-linear curve fitting (red curve) over the
Mean Average Precision for a parameter sweep on λbest (black dots). The left plot
shows the a priori estimation computed with the relevance information about the best
expert only. The right plot shows the actual MAP computed ex post with the entire
assessments.

For 11 topics the Wikipedia Miner did not identify any resources. In these
cases, we extracted the resources manually, e.g. the resources db:Woman and
db:Premenstrual_syndrome for the first question mentioned before and the re-
sources db:MSN and db:Online_chat for the latter. We did this for run1 and
run3 and left the topics untouched for run2, in order to see how the approach
performs without any manual intervention.

                 Strict                     Lenient                    Parameters
Run Id           P@10          MRR          P@10          MRR          λbest λall λC
run3             0.49 (+157%)  0.76 (+90%)  0.87 (+123%)  0.93 (+48%)  0.7   0.2  0.1
run1             0.48 (+153%)  0.77 (+93%)  0.86 (+121%)  0.94 (+49%)  0.6   0.3  0.1
run2             0.35 (+84%)   0.65 (+63%)  0.61 (+56%)   0.74 (+17%)  0.6   0.3  0.1
BM25 + Z-Score   0.19          0.40         0.39          0.63

Table 2. Results of the runs submitted to the CriES pilot challenge. The percentages
show the performance against the BM25 + Z-Score baseline.


    Table 2 shows the results for the top 10 retrieved experts. Precision at cut-
off level 10 (P@10) and Mean Reciprocal Rank (MRR) are used as evaluation
measures. Precision/Recall curves for each run are presented in Figure 6 using
strict and lenient assessments [18]. All three runs exceed the standard IR base-
line, BM25 + Z-Score [18]. The baseline uses machine translation to translate
the topics into the four languages and matches them against monolingual indexes.
The results retrieved from the four monolingual indexes are combined for each
The results retrieved from the four monolingual indexes are combined for each
expert using the Z-Score [15].
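
For reference, the two measures are straightforward to compute per topic and are then averaged over all topics. The sketch below uses our own function names, with ranking the ordered list of retrieved experts for one topic and relevant the set of experts judged relevant.

```python
def precision_at_k(ranking, relevant, k=10):
    """Fraction of the top-k retrieved experts that are relevant (P@10 for k=10)."""
    return sum(1 for ex in ranking[:k] if ex in relevant) / k

def reciprocal_rank(ranking, relevant):
    """1/rank of the first relevant expert, 0 if none is retrieved."""
    for i, ex in enumerate(ranking, start=1):
        if ex in relevant:
            return 1.0 / i
    return 0.0

# MRR over a run is the mean of reciprocal_rank over all topics, e.g.
# mrr = sum(reciprocal_rank(r, rel) for r, rel in per_topic) / len(per_topic)
```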
Fig. 6. Precision/Recall curves based on interpolated recall for strict (left plot) and
lenient (right plot) assessment.

    Besides retrieving relevant experts for a topic, one main aim of our approach
was to cross the language barrier and find experts regardless of their language.
Figure 7 visualizes the language distribution of the retrieved experts for each
topic language for run1. In order to facilitate the comparison, the distribution
of threads and experts in the data set is displayed on the right. One can see
that indeed experts in all four languages were retrieved in most cases. Further,
the domination of English-speaking experts is due to the proportions in the data
set and, in addition, due to the larger underlying resource space, as illustrated
in Figure 2. However, a bias towards the language of the topic is also observ-
able, because not all resources have labels in all other languages, as discussed in
Section 2.

[Figure 7: four panels of boxplots (English, Spanish, French, and German topics)
showing how many of the top 10 retrieved experts fall into each language (EN, ES,
FR, DE); a fifth panel shows the percentage of threads ("Threads in the data set")
and users ("Users in the data set") per language in the data corpus.]

Fig. 7. Boxplots illustrating the distribution of the top 10 retrieved experts by language
for each topic language. The rightmost plot shows the distribution by language of
threads and experts in the Yahoo! Answers data set.


5         Related Work
Our approach uses Language Models, which have been studied by [9] and applied
by [3] for expert search. The latter compares two models with different search
strategies. The first model collects all documents for every candidate and then
identifies the relevant topics in these documents. The second model first finds
the significant documents for a given topic and then discovers the associated
experts. Our approach is based on a model similar to the second one. We first
find the documents comprising the resources of the query and then relate the
resources to the expert who gave the best answer.
    Using concepts instead of terms was studied by [14, 16, 17]. These approaches
use Explicit Semantic Analysis and match topics to documents in a concept space
consisting of Wikipedia articles. Our approach also uses Wikipedia as back-
ground knowledge and a representation similar to the concept representation, as
long as only the resources are considered. However, as discussed in Section 2,
using URIs instead of concepts makes it possible to draw information from other sources
and facilitates the usage of connections between the resources. In order to ex-
tract the resources from the documents, we screened several tools that deal with
Named Entity Recognition and Extraction. The Enrycher web service [21] tries
to extract not just resources, but also triples, which connect these resources.
The OpenCalais Web Service analyzes text and returns semantic metadata [1].
However, these approaches do not work with all four languages. The advantage
of the Wikipedia Miner [13], besides fast and precise results, is that the resource
space is clearly defined, i.e. all articles of Wikipedia, and that it supports the
language of the loaded Wikipedia file. Hence, we chose [13] for our approach.
    Not just indexing the terms of a document, but the idea of indexing what a
document is about, i.e. topic indexing, was introduced by [10]. Another approach
to topic indexing by embedding background knowledge derived from Wikipedia
was introduced by [11]. All relevant topics being mentioned in a document are
linked to Wikipedia articles. The titles of the articles are used as index terms.
This approach is similar to ours; however, our approach is not limited to Wikipedia,
and the usage of URIs instead of terms allows the links between the
URIs to be exploited at a later stage.
    A different approach to multilingual IR was introduced by [8], who uses a
multilingual ontology to map a term to the appropriate concept. However, it
does not consider the disambiguation of terms, an aspect covered by our approach,
since URIs are not ambiguous and the URIs are determined using the disam-
biguation of [13]. [20] examines the impact of the use of semantic annotations
on the performance in monolingual and cross-language retrieval.

6   Conclusions and Future Work
We presented an approach for the Expert Search challenge of the CriES Work-
shop at CLEF 2010 using Linked Open Data resources, i.e. URIs, as interlingual
document representations. We used Wikipedia as the corpus of resources, but the
approach is not limited to the usage of Wikipedia. Resources are extracted from the
documents using the Wikipedia Miner Toolkit [13] and used to create Expert-
Resource profiles. A mixture model is applied for the retrieval and ranking of
experts for a given topic. Topics are also represented as a Bag-of-Resources.
    Our approach yielded solid results by exceeding the standard BM25 + Z-
Score baseline by 17% to 157% regarding Mean Reciprocal Rank and Precision
at 10. Another advantage of our approach is that not the entire documents need
to be indexed, but just a summary consisting of several URIs, which decreases
the index size.
    In the future, we plan to use more features of Linked Open Data for IR, in
particular exploiting the links between resources to leverage their interconnections.

7   Acknowledgments
We thank Philipp Sorg for the helpful discussions and his valuable feedback.
We also thank David Milne for developing the Wikipedia Miner and making it
available as open source. Research reported in this paper was supported by the
German Federal Ministry of Education and Research (BMBF) under the iGreen
project (grant 01IA08005K).

References
 1. OpenCalais. http://www.opencalais.com, Aug 4 2010.
 2. OWL Web Ontology Language Overview. W3C Recommendation, World Wide Web
    Consortium, February 2004.
 3. K. Balog, L. Azzopardi, and M. de Rijke. A language modeling framework for
    expert finding. Information Processing & Management, 45(1):1–19, 2009.
 4. C. Bizer, T. Heath, and T. Berners-Lee. Linked data - the story so far. Int. J.
    Semantic Web Inf. Syst., 5(3):1–22, 2009.
 5. P. Cimiano, A. Schultz, S. Sizov, P. Sorg, and S. Staab. Explicit vs. latent concept
    models for cross-language information retrieval. In Proceedings of the Int. Joint
    Conf. on Artificial Intelligence (IJCAI), pages 1513–1518. AAAI Press, July 2009.
 6. G. de Melo and G. Weikum. Towards a universal wordnet by learning from com-
    bined evidence. In Proc. of the 18th ACM Conf. on Information and Knowledge
    Management (CIKM 2009), pages 513–522, New York, NY, USA, 2009. ACM.
 7. F. Manola and E. Miller. RDF Primer. http://www.w3.org/TR/rdf-primer/, Feb.
    2004. W3C Recommendation.
 8. J. Guyot, S. Radhouani, and G. Falquet. Ontology-based multilingual information
    retrieval. In CLEF Workhop, Working Notes Multilingual Track, pages 21–23,
    2005.
 9. D. Hiemstra. Using Language Models for Information Retrieval. PhD thesis,
    University of Twente, 2001.
10. M. Maron. On indexing, retrieval and the meaning of about. Journal of the
    American Society for Information Science, pages 38–43, 1977.
11. O. Medelyan, I. H. Witten, and D. Milne. Topic indexing with wikipedia. In Proc.
    of the AAAI WikiAI workshop, 2008.
12. D. Milne and I. H. Witten. Learning to link with wikipedia. In Proc. of the 17th
    ACM Conf. on Information and knowledge management (CIKM), pages 509–518,
    Napa Valley, California, USA, 2008. ACM.
13. D. Milne and I. H. Witten. An open-source toolkit for mining wikipedia. In Proc.
    New Zealand Computer Science Research Student Conf., volume 9, 2009.
14. M. Potthast, B. Stein, and M. Anderka. A wikipedia-based multilingual retrieval
    model. In ECIR, pages 522–530, 2008.
15. J. Savoy. Data fusion for effective European monolingual information retrieval.
    In Multilingual Information Access for Text, Speech and Images, pages 233–244.
    2005.
16. P. Sorg and P. Cimiano. Cross-lingual information retrieval with explicit semantic
    analysis. In Working Notes for the CLEF 2008 Workshop, 2008.
17. P. Sorg and P. Cimiano. An experimental comparison of explicit semantic anal-
    ysis implementations for cross-language retrieval. In Proc. of the Int. Conf. on
    Applications of Natural Language to Information Systems (NLDB), pages 36–48.
    Springer, June 2009.
18. P. Sorg, P. Cimiano, and S. Sizov. Overview of the cross-lingual expert search
    (CriES) pilot challenge. In Working Notes of the CLEF 2010 Lab Sessions, 2010.
19. M. Surdeanu, M. Ciaramita, and H. Zaragoza. Learning to rank answers on large
    online QA collections. In Proc. of the 46th Annual Meeting of the Association for
    Computational Linguistics: Human Language Technologies (ACL-08: HLT), pages
    719–727, 2008.
20. M. Volk and P. Buitelaar. Ontologies in cross-language information retrieval. In
    Proceedings of WOW2003, 2003.
21. T. Štajner, D. Rusu, L. Dali, B. Fortuna, D. Mladenić, and M. Grobelnik. Enrycher:
    Service Oriented Text Enrichment. In Proc. of SiKDD, 2009.