When is the Structural Context Effective?

                        Muhammad Ali Norozi                                              Paavo Arvola
           Dept. of Computer and Information Science                            School of Information Sciences
         Norwegian University of Science and Technology                             University of Tampere
                      Trondheim, Norway                                               Tampere, Finland
                         mnorozi@idi.ntnu.no                                        paavo.arvola@uta.fi


ABSTRACT                                                                formation is the context. The representation of the semi-
Structural context surrounding the relevant information is              structured documents aims to follow the established struc-
intuitively and empirically considered important in informa-            ture of documents, i.e., an academic book is typically com-
tion retrieval. Utilizing this context in scoring has improved          posed of hchaptersi, hsectionsi, hsubsectionsi etc., struc-
the retrieval effectiveness. In this study we will objectively          tures. hchapter1i is followed by hchapter2i and within
look into the significance of the structural context in contex-         hchapter1i, hsection1i is followed by hsection2i. Elements
tualization process, and try to answer the core question of             hsection1i and hsection2i are siblings, and hence most
under which circumstances do we need to deal with the such              likely, semantically related. The following element takes
types of context?                                                       the concepts further from the preceding elements, and the
                                                                        preceding elements provide the basics or foundation for the
Categories and Subject Descriptors                                      following elements. Therefore, together in the document
                                                                        order, the preceding and following elements form a strong
H.3.3 [Info. Search and Retrieval]: Search process
                                                                        and connected perspective (the kinship structural context),
1.    INTRODUCTION                                                      surrounding the relevant information. Two general types of
                                                                        context can be distinguished based on the standard relation-
   Document parts, referred to as elements, have both a hier-
                                                                        ships. Hierarchical context, for one, refers to the ancestors,
archical and a sequential relationship with each other. The
                                                                        whereas horizontal refers to the preceding and following el-
hierarchical relationship is a partial order of the elements,
                                                                        ements [3]. In existing studies, context has been referred to
which can be represented with a directed acyclic graph, or
                                                                        externally as the hyperlink structure of the elements as well.
more precisely, a tree. In the hierarchy of a document, the
                                                                        The context is internal when it is considered from within the
upper elements form the context of the lower ones. In ad-
                                                                        document, and it is external when it is considered outside
dition to the hierarchical order, the sequential relationship
                                                                        the document(s).
corresponds to the order of the running text. From this
                                                                           Contextualization [3] is a re-scoring model, where the ba-
perspective, the context covers the surroundings of an ele-
                                                                        sic score, usually obtained from a full-text retrieval model,
ment. An implicit chronological order of a document’s text
                                                                        of a contextualized document or element is re-enforced by
is formed, when the document is read by a user.
                                                                        the weighted scores of the contextualizing documents or el-
   In focused retrieval, the use of context is a driving force
                                                                        ements (elements in the sub-tree of interest or structural
to alleviate or “un-bias” the retrieval of items with varying
                                                                        context). In this section, we will formalize the context from
length. Namely, information retrieval is based on evidence
                                                                        in and outside the document using contextualization model.
of the retrievable units at hand, and longer text units have
indeed more textual evidence. This has led to a play-safe               2.1   Structural Context
strategy where the larger elements are favoured by retrieval               Structural context is the sub-tree of interest from the hi-
systems. How effective the context is to neutralize the side-           erarchical tree structure of the semi-structured document.
effects or bias because of size or length (smaller elements             Internally, in hierarchical contextualization [3], the intrin-
with less textual evidence gets same opportunity to satisfy             sic tree structure within the XML document is employed.
the users need), has been reported experimentally in many               Structural context in hierarchical or vertical contextualiza-
studies [1–3, 6, 9, 10, 8, 7]. The question asked here is:              tion is the context based on parent-child relationship in doc-
why the structural context is important in the retrieval of             ument’s hierarchical structure. An element’s parent or an-
focused items? In addition, we also ask if the use of context,          cestors are accounted to be the structural context, while
under certain circumstances (worst-case), would harm the                contextualizing the element itself. The sub-tree of interest
retrieval. This means if the context is poor or even mislead-           is shown in Figure 1(a). Horizontal contextualization [3]
ing.                                                                    takes into account the sibling elements in the document’s
2.    CONTEXT                                                           hierarchical structure as the structural context. If we visu-
  In semi-structured documents, context of an element cov-              alize the document’s hierarchically tree structure, horizontal
ers everything in the document excluding the element it-                structural context is horizontal, as it is based on one level
self. The surrounding items or elements of the relevant in-             (the same level as the element to be contextualized) of the
                                                                        tree at a time (see Figure 1(b)). The most recent form of
Copyright remains with the authors and/or original copyright holders.   contextualization, the Kinship contextualization [7], is both
DIR 2013, April 26, 2013, Delft, The Netherlands.                       horizontal (siblings) and vertical (ancestors & descendants
                                                   article                                                                                                                               article                                                                                                                                article
                                                     <1>                                                                                                                                   <1>                                                                                                                                    <1>


                    header                                                            body                                                                header                                                            body                                                                 header                                                            body
                    <1.1>                                                            <1.2>                                                                <1.1>                                                            <1.2>                                                                 <1.1>                                                            <1.2>


                                                                                                                                                                                                                                                                              title       id     revision    categories            sec                              sec                            sec
                                                                                                                                       title       id     revision    categories            sec                              sec                            sec
 title       id     revision    categories            sec                              sec                            sec                                                                                                                                                    <1.1.1>   <1.1.2>    <1.1.3>      <1.1.4>
                                                                                                                                      <1.1.1>   <1.1.2>    <1.1.3>      <1.1.4>                                                                                                                                                  <1.2.1>                          <1.2.2>                        <1.2.3>
<1.1.1>   <1.1.2>    <1.1.3>      <1.1.4>                                                                                                                                                 <1.2.1>                          <1.2.2>                        <1.2.3>
                                                    <1.2.1>                          <1.2.2>                        <1.2.3>


                                                                                                                                                                                                                                                                                                 timestamp    category        st           p                   st               list          st        weblink
                                                                                                                                                          timestamp    category        st           p                   st               list          st        weblink
                    timestamp    category        st           p                   st               list          st        weblink                                                                                                                                                               <1.1.3.1>   <1.1.4.1>    <1.2.1.1>    <1.2.1.2>           <1.2.2.1>         <1.2.2.2>    <1.2.3.1>    <1.2.3.2>
                                                                                                                                                          <1.1.3.1>   <1.1.4.1>    <1.2.1.1>    <1.2.1.2>           <1.2.2.1>         <1.2.2.2>    <1.2.3.1>    <1.2.3.2>
                    <1.1.3.1>   <1.1.4.1>    <1.2.1.1>    <1.2.1.2>           <1.2.2.1>         <1.2.2.2>    <1.2.3.1>    <1.2.3.2>


                                                                                                                                                                                                                                                                                                                                    b            link          entry           entry            entry
                                                                                                                                                                                             b            link          entry           entry            entry                                                                 <1.2.1.2.1>   <1.2.1.2.2>    <1.2.2.2.1>     <1.2.2.2.2>      <1.2.2.2.3>
                                                       b            link          entry           entry            entry                                                                <1.2.1.2.1>   <1.2.1.2.2>    <1.2.2.2.1>     <1.2.2.2.2>      <1.2.2.2.3>
                                                  <1.2.1.2.1>   <1.2.1.2.2>    <1.2.2.2.1>     <1.2.2.2.2>      <1.2.2.2.3>                                                                                                                                                                                                                                                                                               Out-links In-links
                                (a) Hierarchical                                                                                                                        (b) Horizontal                                                                                                                                (c) Kinship                                                                                         (d) Citation

                                                                      Figure 1: Structural context, the sub-tree of interest, example taken from [7]

elements) but intrinsically non-hierarchical perspective of                                                                                                                                                                                                                 didate to satisfy the user’s need and therefore, the elements
the hierarchical information. Structural context is hence                                                                                                                                                                                                                   should be contextualized by the elements in the sub-tree
both vertical and horizontal in the document’s hierarchical                                                                                                                                                                                                                 of interest. Hence, the good structural context contains a
form, Figure 1(c).                                                                                                                                                                                                                                                          strong likelihood factor that should be used to deduce that
   And externally, in citation contextualization [8], the doc-                                                                                                                                                                                                              the contextualized element is a good candidate for the posed
ument’s hyperlink structure is taken in to account. The                                                                                                                                                                                                                     query.
structural context here is based on the hyperlinks’ graph of                                                                                                                                                                                                                   The tree-structure of the XML document (Figure 1) is as-
documents hyper-linking (connecting) one another in form                                                                                                                                                                                                                    sumed to be a graph. In order for the structural context to
of inlinks (indegree) and outlinks (outdegree). In this case,                                                                                                                                                                                                               take part in the contextualization process, each of the nodes
the sub-graphs instead of tree of interest are the out-links                                                                                                                                                                                                                in the sub-tree of interest should possess an impact factor.
graph and the in-links graphs (see Figure 1(d)).                                                                                                                                                                                                                            Conceptually, the impact factor is produced in the follow-
                                                                                                                                                                                                                                                                            ing manner: Myriad of random surfers traverse the XML
2.2                       Why Structural Context?                                                                                                                                                                                                                           graphs. In particular, at any time step a random surfer is
   Structural context is the essential component of the Con-                                                                                                                                                                                                                found at an element and either (a) makes a next move to the
textualization model [1]. With contextualization model, us-                                                                                                                                                                                                                 sub-element of the existing element by traversing the con-
ing the structural context, the aim is to rank higher an el-                                                                                                                                                                                                                tainment edge, or (b) makes a move to the parent-element
ement in a good context (strong evidence in the structural                                                                                                                                                                                                                  of the existing element, or (c) jumps randomly to another
context) than an identical element in a not so good context                                                                                                                                                                                                                 element in the XML graph. As the time goes on (the num-
(less or no evidence in the structural context) within the                                                                                                                                                                                                                  ber iterations), the expected percentage of surfer at each
document. And therefore, retrieve elements independent of                                                                                                                                                                                                                   node converges to a limit, the dominant eigenvector of the
their sizes. A small element, in term of size, can be viewed                                                                                                                                                                                                                XML graph. This limit provides the impact or strength of
and hence scored in relation to its structural context, and                                                                                                                                                                                                                 each element in the structural context of the element to be
its smaller size (which means having less evidence in total)                                                                                                                                                                                                                contextualized, in the form of g function. All the elements,
doesn’t stop it from being selected as one of the best results.                                                                                                                                                                                                             in the structural context of the contextualized element, are
   In order to cope up with the “biasness” issue (described                                                                                                                                                                                                                 considered for contextualization; where the contextualiza-
earlier), in contextualization model, the weight of a relevant                                                                                                                                                                                                              tion vector g identifies the importance of each of the unit of
element is adjusted by the basic weights of the elements in                                                                                                                                                                                                                 the structural context (Equation 1).
the structural context (its contextualizing elements). In ad-
dition to basic weights, each element in the structural con-                                                                                                                                                                                                                2.4                    Generalized Combination Functions
text of the contextualized element, should possess an impact                                                                                                                                                                                                                  The generalized re-ranking combination function based on
factor. An higher impact factor shows the importance of the                                                                                                                                                                                                                 the random walk principle, which also captures the struc-
contextualizing element and vice versa. The role and rela-                                                                                                                                                                                                                  tural context, can be formally defined as follows:
tion of elements in the structural context are operationalized
by giving the element a contextualizing weight. A contex-                                                                                                                                                                                                                                              CR(x, f, Cx , g k )                                                        =                   (1 − f ) · BS(x) +
                                                                                                                                                                                                                                                                                                                                                                                                          X
tualization vector is defined to capture the impact factor                                                                                                                                                                                                                                                                                                                                                      BS(y) · g k (y)
of each contextualizing element, and this contextualization                                                                                                                                                                                                                                                                                                                                                        y∈Cx
vector is represented by a g function in Equation 1.                                                                                                                                                                                                                                                                                                                                                  f·                  X                    (1)
                                                                                                                                                                                                                                                                                                                                                                                                                              g k (y)
2.3                       Contextualization and Random Walks                                                                                                                                                                                                                                                                                                                                                          y∈Cx

   Random walk principle is employed, for contextualization,                                                                                                                                                                                                                where
to induce a similarity structure over the documents based                                                                                                                                                                                                                     • BS(x) is the basic score of contextualized element x
on the containment and reverse-containment relationships                                                                                                                                                                                                                         (text-based score, e.g., tf · ief )
(element, sub-element and vice versa). Hence, these rela-                                                                                                                                                                                                                     • f is a parameter which determines the weight of the
tionships affect the weight each element, in the structural                                                                                                                                                                                                                      context in the overall scoring.
context, has in contextualization.                                                                                                                                                                                                                                            • Cx is the kinship context surrounding the contextu-
   The premise is that good structural context (identified by                                                                                                                                                                                                                    alizing element x, i.e., Cx ⊆ structural context(x),
random walk and the contextualization model [7]) provides                                                                                                                                                                                                                        ⊆, because only the structural context containing the
evidence that an element in focused retrieval is a good can-                                                                                                                                                                                                                     query terms are considered.
     • g k (y) is the generalized contextualization vector based    computing g k vector is feasible for a reasonably large XML
       on random walk, which gives the authority weight (the        document collections. At the query time, the scores from
       impact) of y, the contextualizing elements (elements in      g k vector and the basic scores are combined to produce an
       structural context) of x in the sub-tree of interest.        overall ranking score, using Equation 1.
                                                                       In the generalized combination function given (Equation 1),
3.    EFFECTS OF CONTEXTUALIZATION ON                               the contextualization force has to be parametrized. In our
      DIFFERENT TEST COLLECTIONS                                    earlier work [7], the contextualization force was tuned and
   Structural context in the contextualization framework, is        reported the values leading to best overall performance. In
independent of the basic weighting scheme of the elements           the parametrization process it was found that the optimal
and it could be applied on the top of any query language,           values of contextualization force f (from Equation 1) lies in
retrieval systems and test collections. The effects of contex-      the range, (f ∈ {.25,..., 2.50}). These optimal values for f
tualization on different test collections have been observed        are obtained by using cross-validation technique. A 68-fold2
in the existing studies. Contextualization model has been           cross-validation (or complete cross-validation) technique has
applied on the top of different and competitive baseline sys-       been performed - by randomly partitioning the collection
tems using a diverse set of test collections, e.g., semantically    into 68 training and test samples based on the number of as-
annotated Wikipedia collection from INEX 20091 , IEEE col-          sessed topics. Of the 68 samples, a single sample is retained
lection, and iSearch scientific collection [3, 7, 8]. In order to   as the validation set for testing, and remaining 67 samples
get the best possible baseline system, a data fusion was per-       are used as training set. The cross-validation process is re-
formed based on sum of normalized scores (CombSUM) [11]             peated 68 times (for each fold), with each of 68 samples used
and Reciprocal Ranking [4] of INEX 2009 submitted runs.             exactly only once as validation set. These 68 independent
   In the experimental evaluation the retrieval effectiveness       or unseen samples are then combined to produce a single or
at different granularity levels were observed. Mainly, re-          a set of estimations for parameter f .
trieval effectiveness at paragraph, article and INEX’s fo-
cused retrieval level selection has been observed. The ap-          3.2      Query Term Probabilities
proaches were evaluated using the evaluation framework pro-           If a relevant element does not contain any of the query
vided by TREC and INEX evaluation initiatives. The re-              term(s), it does not match to the query. Hence, in order
ported results were shown to be promising using both TREC           to retrieve such elements, some expansive methods, such as
and INEX evaluation framework [3, 7].                               contextualization, ought to be used. It seems obvious that,
   The focused task in INEX ad-hoc track is to retrieve             in a relevant small element, the probability of occurrence
most focused elements satisfying an information need with-          of a query term is smaller than in a larger element. In or-
out overlapping elements. An overlapping result list means          der to demonstrate this lack of evidence on small elements,
that the elements in the result list may have a descendant re-      we calculated some posteriori probabilities for query term
lationship with each other and share the same text content.         occurrences in a relevant document (Rd ) and in a relevant
For instance, in Figure 1 the hentryi element h1.2.2.2.1i           paragraph (Rp , i.e., the relevant hpi elements from the XML
and the hseci element h1.2.2i are overlapping. In the ex-           graph), based on INEX 2009, 68 topics (title field) and their
isting studies, in the focused retrieval task, the INEXs’ fo-       relevance assessments. The probabilities are calculated as
cused approach is followed, considering a result list where         the fraction of relevant elements containing any query term,
only one of the overlapping elements from each branch is            or all query terms over all relevant elements of same kind.
selected. This means that including the hseci element in            The probability of occurrence of any query term (from the
the results would mean excluding the entry element in the           query Q) in a Rp and in a Rd respectively are:
results or vice versa.                                                               !                          !
   Contextualization and the fusion approach as scoring meth-
                                                                            [                           [
                                                                        P       q Rp = 0.847, P             q Rd = 0.995
ods, however, do not take any stand on which elements                        q∈Q                        q∈Q
should be selected from each branch. Thus a structural fu-
sion has been performed, where the element level selection          This means that the probability of occurrence of none of the
is taken from the baseline run and subsequently re-rank the         query terms in Rp and a Rd is 0.153 and 0.005 respectively3 .
elements of the baseline run.                                       Accordingly, the probabilities of occurrence of all the query
                                                                    terms in Rp and Rd , respectively are:
3.1     Test Settings                                                    Y
                                                                             P (q|Rp ) = 0.127,
                                                                                                   Y
                                                                                                       P (q|Rd ) = 0.469
  The hierarchical structure of XML documents in the Wiki-                 q∈Q                    q∈Q
pedia 2009 collection, are captured using the dewey encoding
scheme (as shown in Figure 1). This way each element in the         The difference in the amount of evidence at different gran-
document possess a unique index within the document, and            ularity levels become even more obvious, when we draw the
together with document’s unique id, this becomes unique for         frequencies of the query terms in this picture. A query term
the entire collection. The tree structure of XML documents          occurs on average 3.4 times in a Rp and 45.4 times in a Rd .
are converted into a matrix, and random walk is performed           4.     WORST CASE ANALYSIS
on this matrix at indexing time, as it is described in detail, in     Worst-case for a document d, in contextualization models,
our earlier work [7]. The contextualization vector g k from         means when structural context of element x is chosen such
Equation 1 is computed off-line for each and every XML              that:
document in the Wikipedia collection. This suggests that                    structural context(x) ∈  / elementsy (d)       (2)
1
                                                                         (∀ elements y in document d          where x and y ∈ d )
 Wikipedia collection containing 2.66 million semantically          2
annotated XML documents (50, 7 Gb) and 68 related topics              68, because of the 68 topics from INEX 2009.
                                                                    3
provided by the INEX 2009 ad-hoc track [5].                           Test is performed without stemming or stop-word removal
                 1.0
                                                                  worst-case-p
                                                               CRhierarchical                cally into the hypothesis that structural context gathered
                 0.9
                                                               Baselinep
                                                                       Fusion                from within the document, “horizontally” and “vertically”
                 0.8                                           Baselinea
                                                                       Fusion                using the hierarchical tree structure of document, and from
                 0.7                                           CRworst-case-a
                                                                 hierarchical
                                                                                             outside, using the hyperlinks graph structure of documents
                 0.6                                                                         referencing each other, influences the retrieval effectiveness.
     Precision


                 0.5                                                                         Worst-case experiments also support the theoretical sound-
                 0.4                                                                         ness of contextualization, i.e., if we apply contextualization
                 0.3                                                                         blindly, in the worst case, we would have as good result
                 0.2                                                                         as the basic scoring method. The results obtained in this
                 0.1
                                                                                             study are in-line with the earlier work on contextualiza-
                 0.0
                                                                                             tion [1, 3, 6, 7, 9, 10]. In this study we have experimented
                    0.0   0.1   0.2   0.3   0.4    0.5   0.6     0.7     0.8     0.9   1.0   with semi-artificial data, in the sense that we muddled the
                                                  Recall                                     context for the worst-case analysis. However, in real data the
Figure 2: Precision - recall, worst-case scenario at                                         quality of context varies as well. For example in Wikipedia
article (a) and paragraph (p) granulation and the                                            there are different kinds of pages ranging from listings to top-
fusion baseline systems.                                                                     ically very coherent documents. In order to get the best re-
                                                                                             sults in retrieval, analysing the quality and topical coherency
                                                                                             of context would be of great benefit. The analysis of con-
The non-structural context (Equation 2), should theoreti-
                                                                                             text may be topic dependent, since some queries may have
cally expose the worst-case effects of the contextualization
                                                                                             contextual parts. For instance a query: “Losses Belgium in
model. Non-structural context is structural by definition,
                                                                                             WW2”, crave for answers about Belgium in the context of
but physically not in the structural context of element x.
                                                                                             WW2.
How should we interpret the non-structural context, in or-
der to experimentally visualize the worst-case scenario? In-                                 6.    REFERENCES
stead of taking the actual and true structural context, we
randomly select the structural context from another non-                                     [1]  P. Arvola, M. Junkkari, and J. Kekäläinen. General-
relevant but retrieved document. Such a document (re-                                             ized Contextualization Method for XML Information
trieved but not relevant) would have misleading evidence                                          Retrieval. In Proc. of the 14th ACM CIKM, pages 20–
(false positive) and hence best suited for the worst-case eval-                                   27. ACM, 2005.
uation. Randomly selecting a document with zero basic                                        [2] P. Arvola, J. Kekäläinen, and M. Junkkari. The Effect
score would be trivial and not suitable for our purposes.                                         of Contextualization at Different Granularity Levels in
   By applying this simplistic approach on every element to                                       Content-oriented XML Retrieval. In Proc. of the 17th
be contextualized, we can formulate the worst-case scenario.                                      ACM CIKM, pages 1491–1492. ACM, 2008.
We have used the reciprocal rank fusion approach (fusing                                     [3] P. Arvola, J. Kekäläinen, and M. Junkkari. Contextu-
98 INEX 2009 runs) as the baseline system, for worst-case                                         alization Models for XML Retrieval. Info. Processing
analysis, which has been used before in our earlier work, find                                    & Management, pages 1–15, 2011.
further details from [8]:    X            1                                                  [4] G. Cormack, C. Clarke, and S. Buettcher. Reciprocal
            RRScore(e, q) =                                 (3)                                   rank fusion outperforms condorcet and individual rank
                                  k + rank(r,  e, q)
                             r∈R                                                                  learning methods. In Proc. of the 32nd international
where                                                                                             ACM SIGIR conference on Research and development
    • R is the set of runs (rankings)                                                             in information retrieval, pages 758–759. ACM, 2009.
    • and rank(r, e, q) returns the rank of element e as a                                   [5] S. Geva, J. Kamps, M. Lethonen, R. Schenkel, J. Thom,
      result of query q in run r.                                                                 and A. Trotman. Overview of the INEX 2009 ad-hoc
    • If e is not in the ranking, rank(r, e, q) is not defined                                    track. Focused Ret. and Evaluation, pages 4–25, 2010.
                                 1
      and the outcome of k+rank(r,e,q)    is 0.                                              [6] Y. Mass and M. Mandelbrod. Component Ranking and
    • The parameter k is for tuning.                                                              Automatic Query Refinement for XML Retrieval. Ad-
Figure 2 reveals the worst-case depiction of the contextual-                                      vances in XML IR, pages 1–18, 2005.
ization model. Not unexpectedly, the worst-case scenario is                                  [7] M. A. Norozi, P. Arvola, and A. P. de Vries. Contex-
as good as the baseline system, slightly better but not sig-                                      tualization using hyperlinks and internal hierarchical
nificant enough to be visible statistically. We can claim here                                    structure of wikipedia documents. In Proc. of the 21st
that, when the structural context is chosen randomly (hap-                                        ACM CIKM, pages 734–743. ACM, 2012.
hazardly), in the worst-case, the contextualization method                                   [8] M. A. Norozi, A. P. de Vries, and P. Arvola. Contex-
will not be worse than the basic scoring method.                                                  tualization from the Bibliographic Structure. In Proc.
                                                                                                  of the ECIR 2012 Workshop on Task-Based and Aggre-
5.               CONCLUSIONS AND FUTURE WORK                                                      gated Search (TBAS2012), page 9, 2012.
   Structural context is the sub-tree of interest, utilized in                               [9] P. Ogilvie and J. Callan. Hierarchical Language Models
conjunction with contextualization model, improves the re-                                        for XML Component Retrieval. Advances in XML IR,
trieval effectiveness. We have presented an exploratory and                                       pages 269–285, 2005.
theoretical study into the use of structural context from el-                                [10] G. Ramirez Camps. Structural Features in XML Re-
ements in the hierarchical structure of information, to im-                                       trieval. PhD thesis, SIKS, the Dutch Research School
prove retrieval performance. We looked into the structural                                        for Information and Knowledge Systems., 2007.
context from document’s hierarchical structure internally,                                   [11] J. A. Shaw and E. A. Fox. Combination of multiple
and hyperlinks structure externally. We looked theoreti-                                          searches. In The 2nd TREC. Citeseer, 1994.