<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Extracting Semantic Relationships between Wikipedia Categories</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Sergey Chernov</string-name>
          <email>chernov@l3s.de</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Tereza Iofciu</string-name>
          <email>iofciu@l3s.de</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Wolfgang Nejdl</string-name>
          <email>nejdl@l3s.de</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Xuan Zhou</string-name>
          <email>zhou@l3s.de</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>L3S Research Centre, University of Hannover</institution>
          ,
          <addr-line>Expo Plaza 1, 30539, Hannover</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>The Wikipedia is the largest online collaborative knowledge sharing system, a free encyclopedia. Built upon traditional wiki architectures, its search capabilities are limited to title and full-text search. We suggest that semantic information can be extracted from Wikipedia by analyzing the links between categories. The results can be used for building a semantic schema for Wikipedia which could improve its search capabilities and provide contributors with meaningful suggestions for editing the Wikipedia pages. We analyze relevant measures for inferring the semantic relationships between page categories of Wikipedia. Experimental results show that Connectivity Ratio positively correlates with the semantic connection strength.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1 Introduction</title>
      <p>
        The Wikipedia [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] is a freely accessible Web encyclopedia. The Wikipedia project
started in 2001 as a complement to the expert-written Nupedia and is currently run by
the Wikimedia Foundation. There are Wikipedia versions in 200 languages, with more
than 3,700,000 articles and 760,000 registered users. An especially interesting aspect
of Wikipedia is the categorization and linkage within its content. Pages in Wikipedia
are explicitly assigned to one or more categories. Categories should represent major
topics, and their main use within Wikipedia is in finding useful information. There are
two types of categories. The first type is used for classifying pages with respect to
topics. Such categories can form a hierarchy; for example, a page can be assigned to the
category Science or to one of its subcategories, such as Biology or Geography. The second
type of category is the list, which usually contains links to instances of some concept; for
example, List of Asian Countries points to 54 Asian countries. There are also
numerous links between pages. While most of them are created to provide efficient navigation
over the Wikipedia contents, they also represent semantic relationships between
pages or categories.
      </p>
      <p>As in most wikis, the search capabilities of Wikipedia are limited to
traditional full-text search. Search could benefit from the rich Wikipedia semantics
and allow complex queries such as find Countries which had Democratic Non-Violent
Revolutions. Using categories as a loose database schema, we can enrich Wikipedia
search capabilities with such complex query types. Wikipedia categories can be
organized in a graph, where the nodes are categories and the edges are hyperlinks. For
example, if some page from the category “Countries” points to a page from the
category “Capitals”, we can establish a connection “Countries to Capitals”. However, not
all hyperlinks in Wikipedia are semantically significant enough to be used to
facilitate search. The problem is how to distinguish strong semantic relationships from
irregular and navigational links.</p>
      <p>In this paper we propose two measures for automatic filtering of strong semantic
connections between Wikipedia categories. One measure is the number of links
between pages in two categories, and the other is Connectivity Ratio. They can be applied
to inlinks or outlinks separately. For evaluation, we apply these measures to the
English Wikipedia and perform a user study to assess how semantically strong the extracted
relationships are. We observe that both the number of links and Connectivity Ratio
correlate with semantic connection strength. This supports our hypothesis, although many more
experiments are needed to achieve a convincing evaluation.</p>
      <p>The rest of the paper is organized as follows. Related work is discussed in
Section 2. In Section 3 we describe in detail the problem of discovering strong semantic
relationships between categories and the possible use of a semantic schema in Wikipedia.
In Section 4 we describe our analysis of factors relevant for discovering semantic
links, and we present our experiments in Section 5. We conclude and outline future research
directions in Section 6.</p>
    </sec>
    <sec id="sec-2">
      <title>2 Related Work</title>
      <p>The idea of bringing semantics into Wikipedia is not new; several studies on this topic have
been carried out in the last few years.</p>
      <p>
        The semantic relationships in Wikipedia were discussed in [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. The authors
considered the use of link types for search and reasoning, and its computational feasibility. The
distinctive feature of their proposal is the incorporation of semantic information directly into wiki pages.
Later, the semantic links proposal was extended in [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] to the Semantic Wikipedia
vision. According to this model, page annotations should contain the following key
elements: categories, typed links, and attributes. Typed links of the form is capital of are
introduced via the markup extension [[is capital of::England]]; each link can be assigned
multiple types. The authors also proposed the use of semantic templates, based on the
existing Wikipedia templates. We follow this approach, but concentrate on automatic
extraction instead of manual link assignment. Also, our goal is to enable better search
on Wikipedia rather than to provide means for full-fledged reasoning, so we can tolerate
a higher level of inconsistency in annotations and use ill-defined schemas. A system
for semantic wiki authoring is presented in [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. It aids users in specifying link types
while they enter the wiki text. This approach considers ontology-like wiki types, using
“is a” or “instance of” relationship types. Since the prototype supports manual editing,
it does not discuss automatic relationship assignment. Our approach could be used as an
additional feature in this system.
      </p>
      <p>
        One of the first attempts to automatically extract the semantic information from
Wikipedia is presented in [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], which aims at building an ontology from the Wikipedia
collection. That work focuses on the extraction of categories using links and surrounding
text, while we aim at extracting semantic links using assigned categories. The paper [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]
shows the importance of automatic extraction of link types, and illustrates several basic
link types, like synonyms, homonyms, etc. It also suggests using properties for dates
and locations. However, it does not propose any concrete solutions or report experimental
results. Studies of history flow in Wikipedia are presented in [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]. The work is focused
on discovering collaboration patterns in page editing history. Using an original
visualization tool they discovered editing patterns like statistical corroboration, negotiation,
authorship, etc. This work does not consider semantic annotation of Wikipedia articles.
      </p>
      <p>
        The link structures in Wikipedia have been studied recently. The work from [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ]
presents an analysis of a Wikipedia snapshot from March 2005. It shows that Wikipedia
links form a scale-free network and that the distributions of indegree and outdegree of Wikipedia
pages follow a power law. In [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] the authors try to find the most authoritative pages in
different domains like Countries, Cities, People, etc., using the PageRank [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] and HITS [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]
algorithms. The paper reports that Wikipedia forms a single connected graph
without isolated components or outliers.
      </p>
    </sec>
    <sec id="sec-3">
      <title>3 Problem</title>
      <p>The use of semantic links can be illustrated by the example mentioned in
Section 1. Consider the query find Countries which had Democratic Non-Violent
Revolutions. When we run a full-text search for Country Revolution Democracy we get many
pages which contain all the keywords, but most of them do not talk about particular
countries. In a database-like view, the target page of our query should belong to the
Countries category, and it should have a connection to a page in the category
Revolutions which mentions the word Democracy. In the current Wikipedia, there is actually a link
between the pages Ukraine and Orange Revolution. If we put all pages with the Country to
Revolution link type into a separate inverted list<sup>1</sup>, we can force the previous query to
return more relevant results.</p>
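      <p>As a minimal sketch of this idea, the following Python fragment builds an inverted list keyed by link type and uses it to answer the example query. The function names, the second pair of pages and all page texts are hypothetical and serve only as an illustration; this is not the implementation used in our experiments.</p>
      <preformat>
```python
from collections import defaultdict


def build_link_type_index(page_categories, page_links):
    """Map each (source category, target category) pair to the linked page pairs."""
    index = defaultdict(list)
    for source, target in page_links:
        for source_cat in page_categories.get(source, ()):
            for target_cat in page_categories.get(target, ()):
                index[(source_cat, target_cat)].append((source, target))
    return index


def typed_query(index, page_text, source_cat, target_cat, keyword):
    """Return source pages that link to a target-category page mentioning keyword."""
    hits = []
    for source, target in index.get((source_cat, target_cat), []):
        mentions = keyword.lower() in page_text.get(target, "").lower()
        if mentions and source not in hits:
            hits.append(source)
    return hits


# Toy data: only the Ukraine / Orange Revolution link comes from the text above;
# the other pages and all page texts are hypothetical placeholders.
page_categories = {
    "Ukraine": ["Countries"],
    "Orange Revolution": ["Revolutions"],
    "Country X": ["Countries"],
    "Revolution Y": ["Revolutions"],
}
page_links = [("Ukraine", "Orange Revolution"), ("Country X", "Revolution Y")]
page_text = {
    "Orange Revolution": "placeholder text mentioning democracy",
    "Revolution Y": "placeholder text about an armed uprising",
}

index = build_link_type_index(page_categories, page_links)
result = typed_query(index, page_text, "Countries", "Revolutions", "democracy")
print(result)  # ['Ukraine']
```
      </preformat>
      <p>The keyword test is applied only to pages reached through the Countries to Revolutions link type, which is what separates this lookup from a plain full-text search over all pages.</p>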
      <p>However, it is infeasible to maintain and index all possible links between Wikipedia
categories. An example of typical Wikipedia linkage between categories is shown in
Fig. 1. Ovals correspond to categories, squares contain the lists of pages, and arrows
show the existence of at least one hyperlink between categories. The category Republics is
pointed to by the Female Singers, Egg, and Non-violent Revolutions categories. It also
points to the Capitals in Europe, Spanish-American War People and Non-violent
Revolutions categories. Some of these links can be converted into strong semantic
relationships, like “Republics to Non-violent Revolutions”, while relationships like
“Egg to Countries” are not regular semantic connections and are only used for navigation
or other unimportant purposes. It is useless to type and index such
“LinkSourceCategory to LinkTargetCategory” relationships, as they cannot help users in search. Instead,
we need to filter out unimportant links and extract semantically significant
relationships from Wikipedia. This could be achieved by analyzing the link density and link
structures between the categories.</p>
      <p>Besides search, the prominent semantic relationships can be of use in template
generation and data cleaning. For example, if some pages in Countries have no
link to pages in Capitals, the system could suggest that users add the missing link.</p>
      <p>
        One may want to create more precise link types and distinguish between the types
“Country has Capital” and “Country changed Capital”. However, this task is much more
challenging and is not the focus of this paper, in which we concentrate on selecting only
coarse-grained semantic relationships.
        <fn id="fn1"><label>1</label><p>Inverted indices are used in information retrieval for keyword search; for details see [<xref ref-type="bibr" rid="ref14">14</xref>].</p></fn>
      </p>
    </sec>
    <sec id="sec-4">
      <title>4 Approach to Extracting Semantic Links</title>
      <p>This section presents our approaches to extracting semantically important relationships
from the links in Wikipedia. This task can be seen as an automatic construction of a
database schema, where we want to emphasize the meaningful relationships between
categories and disregard unimportant ones.</p>
      <p>It seems reasonable that highly connected categories represent strong semantic
relations. For example, if a considerable percentage of pages from category “Country”
have links to category “Capital”, we can infer that there must be a “Country to Capital”
relationship between the two categories. On the other hand, if there are only
a few links between two categories like “Actor” and “Capital”, it seems that there is no
regular semantic relationship like “Actor to Capital”.</p>
      <p>We conduct experiments to test this filtering method. In the experiments, we extract
a core set of pages which have a common topic (in our case the common topic is
Countries). For these pages we extract all the categories they belong to, and also two lists of
categories, one for the pages with links toward Countries (inlink pages) and one for the
pages referred to by Countries (outlink pages). The experiments with these lists can give
an idea about which link direction is more important for semantic relationship
discovery. During the experiments we test two measures used for finding the strong semantic
connections:</p>
      <list list-type="order">
        <list-item><p>Number of links between categories. The more links we have between pages
in two categories, the stronger their semantic connection should be. As we study
the effect of outgoing links and incoming links separately, each time only links in
one direction are considered.</p></list-item>
        <list-item><p>Connectivity Ratio. We can normalize the number of links by the category size
to reduce the skew toward large categories. We call this normalized value
Connectivity Ratio, and it represents the density of linkage between two sets (in one
direction). Namely,
          <disp-formula><tex-math>\mathrm{ConnectivityRatio}_{i} = \frac{NL_{ij}}{NP_{i}}</tex-math></disp-formula>
where <italic>NL<sub>ij</sub></italic> is the number of links from category <italic>i</italic> to category <italic>j</italic><sup>2</sup>, and <italic>NP<sub>i</sub></italic> is the
total number of pages in category <italic>i</italic>.</p></list-item>
      </list>
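      <p>The two measures can be sketched in a few lines of Python. The sketch assumes that the number of links is counted as in Section 5.2, that is, as the number of pages in the source category with at least one link to a page in the target category. The function name and the toy pages are hypothetical, and the fragment is an illustration rather than the code used in the experiments.</p>
      <preformat>
```python
def connectivity_ratio(source_pages, target_pages, links):
    """Return (NL_ij, NL_ij / NP_i) for links from category i to category j.

    NL_ij is counted as the number of source pages that have at least one
    link to a page of the target category (an assumption of this sketch).
    """
    source_pages = set(source_pages)
    target_pages = set(target_pages)
    if not source_pages:
        raise ValueError("the source category must contain at least one page")
    linking_pages = {
        src for src, dst in links
        if src in source_pages and dst in target_pages
    }
    return len(linking_pages), len(linking_pages) / len(source_pages)


# Hypothetical toy categories and links.
capitals = ["Berlin", "Paris", "Rome", "Madrid"]
countries = ["Germany", "France", "Italy"]
links = [
    ("Berlin", "Germany"),
    ("Paris", "France"),
    ("Paris", "Germany"),   # a second link from Paris is not counted twice
    ("Rome", "Italy"),
    ("Madrid", "Lisbon"),   # the target is outside the Countries set
]

number_of_links, ratio = connectivity_ratio(capitals, countries, links)
print(number_of_links, ratio)  # 3 0.75
```
      </preformat>
      <p>In this toy example three of the four source pages link into the target set, so the raw count is 3 and the normalized value is 0.75, independent of how large the source category is in absolute terms.</p>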
      <p>We have received a valuable comment from the anonymous reviewers that the size of the
target category is also important for normalization and that <italic>NP<sub>j</sub></italic> could be included
in the formula. We agree with this viewpoint and will experiment in the future with further
modifications of Connectivity Ratio.</p>
    </sec>
    <sec id="sec-5">
      <title>5 Experimental Studies</title>
      <p>In this section we describe our experiment setup and discuss the results.</p>
      <sec id="sec-5-1">
        <title>5.1 Collection</title>
        <p>
          For experiments we used the Wikipedia XML corpus [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ], which is available to the
participants of the INEX 2006 evaluation forum<sup>3</sup>. This corpus is based on the English
Wikipedia dump and has about 668,670 pages, which belong to 63,879 distinct
categories<sup>4</sup>; only pages from the article namespace are included. We exported the dataset into
a MySQL 4.1 database; the data size was about 1.2 gigabytes.
        </p>
        <p>For the experiments we selected three sets of pages, which we called Countries,
Inset and Outset. The Countries set consists of 257 pages devoted to countries, which were
manually extracted from the “List of countries” Wikipedia page; this set represents the
Countries category. We did not use the Countries category directly, since it contains
subcategories like European countries, African countries, etc., rather than separate pages
for countries. Since in this paper we do not consider the hierarchical nature of categories,
we selected countries as described above. We also built the Inset, which contains all
Wikipedia pages that point to any of the pages in Countries, and the Outset, which contains
the pages pointed to by the pages in Countries. The statistics summary for the selected
sets is presented in Table 1.
          <fn id="fn2"><label>2</label><p>In current experiments j always corresponds to the Countries set.</p></fn>
          <fn id="fn3"><label>3</label><p>http://inex.is.informatik.uni-duisburg.de/2006/</p></fn>
          <fn id="fn4"><label>4</label><p>Some category names differ only by a space character before the name, or by slightly different
spelling. Our experimental setup does not assume the use of NLP techniques, so we did not remove
these inconsistencies and treated these categories as distinct.</p></fn>
        </p>
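        <p>The construction of the two sets can be sketched as follows, assuming the internal links are available as (source page, target page) pairs. The function name and the toy link list are hypothetical; the actual sets were derived from the corpus stored in the database.</p>
        <preformat>
```python
def build_inset_outset(core_pages, links):
    """Return (inset, outset) for a core set of pages.

    inset:  every page with at least one link to a core page
    outset: every page that is linked from at least one core page
    Core pages that link to each other appear in both sets, since the
    description above does not exclude them (an assumption of this sketch).
    """
    core = set(core_pages)
    inset = {src for src, dst in links if dst in core}
    outset = {dst for src, dst in links if src in core}
    return inset, outset


# Hypothetical toy link list.
countries = ["Germany", "France"]
links = [
    ("Berlin", "Germany"),
    ("Germany", "Berlin"),
    ("Germany", "France"),
    ("Rhine", "Germany"),
    ("France", "Euro"),
    ("Euro", "Rhine"),      # touches no core page, so it is ignored
]

inset, outset = build_inset_outset(countries, links)
print(sorted(inset))   # ['Berlin', 'Germany', 'Rhine']
print(sorted(outset))  # ['Berlin', 'Euro', 'France']
```
        </preformat>
        <p>The categories of the pages in each set can then be collected and counted, which yields the two category lists used in the experiments.</p>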
        <p>Each page consists of the name of the page, a list of associated categories, and
a list of links, which can be internal links (pointing to Wikipedia pages) or external links
(pointing to pages on the Web). In our experiments, we only considered internal links.</p>
      </sec>
      <sec id="sec-5-2">
        <title>5.2 Results</title>
        <p>
          The main evaluation criterion for our task is the quality of the extracted semantic
relationships. To enable quantitative comparison between semantic connections, we introduce
the Semantic Connection Strength measure (SCS). It takes the value 0, 1 or 2, where
value 2 represents a strong semantic relationship, value 1 represents an average
relationship and value 0 represents a weak relationship<sup>5</sup>. In our assessment, the assessors were
instructed as follows: category A is strongly related to category B (value 2)
if they believe that every page in A should conceptually have at least one semantic link
to B; A and B are averagely related (value 1) if they believe 50% of pages in A should
have semantic links to B; otherwise, A and B are weakly related (value 0). This
evaluation setup is somewhat similar to the one from [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ], although we measure semantic connection
between categories rather than terms. Our experimental results showed that the level of
disagreement between assessors could be high (sometimes it reached 40%). This indicates
that SCS is a very subjective measure and should be improved in the future. In the current
experiments, only assessments made by one person were used, because we found
important inconsistencies in the other assessments and they could not be resolved before the
submission deadline.
        </p>
        <p>In the first set of experiments, we tested whether the number of links between
categories is a good indicator of the level of semantic relationship. By “number of links
between categories” we mean the number of pages in the source category which have at
least one link to any page in the target category.</p>
        <p>We ranked the categories from Inset and Outset by the number of pages in them,
because, given the way we obtain Inset and Outset, this is exactly the ranking by
number of links between categories. From each of the obtained rankings we selected
100 sample categories using a fixed interval, such that they are uniformly distributed
across each ranking. For example, out of 15,000 we select categories number 1, 150,
300, 450, ..., 15000. These sample categories with the corresponding numbers of links are
listed in Table 2.
          <fn id="fn5"><label>5</label><p>Intermediate values are also possible when averaging the assessment results.</p></fn>
        </p>
        <p>[Plot axis: Categories, in rank intervals 1-20, 21-40, 41-60, 61-80, 81-100]</p>
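        <p>A possible way to draw such a sample is sketched below. The sketch starts at the first rank and uses a constant step, so the selected ranks (1, 151, 301, ...) differ slightly from the ranks quoted in the example above; both the offset and the function name are assumptions made for illustration.</p>
        <preformat>
```python
def sample_fixed_interval(ranking, sample_size):
    """Pick sample_size items spread evenly over a ranked list."""
    if sample_size not in range(1, len(ranking) + 1):
        raise ValueError("sample_size must be between 1 and len(ranking)")
    step = len(ranking) // sample_size
    # Take every step-th item, starting with the top-ranked one.
    return ranking[::step][:sample_size]


# Ranks 1..15000 stand in for a ranked list of categories.
ranking = list(range(1, 15001))
sample = sample_fixed_interval(ranking, 100)
print(len(sample), sample[:3], sample[-1])  # 100 [1, 151, 301] 14851
```
        </preformat>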
        <p>The SCS values of the sample sets were averaged over every 20 categories, and the
results are shown in Fig. 2. On the ordinate we put the average SCS, and
on the abscissa we show categories in decreasing order of the number of links between
pages in them and pages in Countries. We can see from the plot that by using Inset we
obtained stronger semantic relationships in comparison to Outset. This could be either
a sign of superior importance of inlinks or just a special property of the category
Countries. We will try to answer this question in the next set of experiments with more
categories considered.</p>
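        <p>The averaging step can be sketched as follows. The SCS judgements in the example are hypothetical and are not taken from our assessment data; the function name is likewise an assumption.</p>
        <preformat>
```python
def average_in_bins(scores, bin_size=20):
    """Average consecutive groups of bin_size scores (the last group may be shorter)."""
    averages = []
    for start in range(0, len(scores), bin_size):
        group = scores[start:start + bin_size]
        averages.append(sum(group) / len(group))
    return averages


# Hypothetical SCS judgements (0, 1 or 2) for 40 ranked sample categories.
scs_by_rank = [2] * 10 + [1] * 10 + [1] * 5 + [0] * 15
bin_averages = average_in_bins(scs_by_rank)
print(bin_averages)  # [1.5, 0.25]
```
        </preformat>
        <p>Because each point is a mean over 20 judgements, the plotted values are generally not integers even though every single judgement is 0, 1 or 2.</p>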
        <p>The better performance of Inset is also observed in the second set of experiments,
where we used Connectivity Ratio as the ranking factor. The results are given in Table 3
and Fig. 3. The performance of the Connectivity Ratio measure is up to 25% better than
that of the number of links, which indicates the advantage of the normalization.</p>
        <p>The results are considerably lower than 2, which shows that many weak semantic
connections still get to the top of the ranking and there is much room for improvement. The
results are not integers (0, 1 or 2) since they are averaged over intervals of 20 judgements.</p>
        <p>We expected the Connectivity Ratio to rank semantically strong relationships higher,
and our pilot experiments supported this hypothesis. While the current experiments are
certainly not sufficient to prove the general effectiveness of Connectivity Ratio on any pair of
categories, the monotonic decrease of both plots in Fig. 3 shows a correlation between
SCS and Connectivity Ratio, which means it is worth working on it further. The
important problem is to find relevant factors to include in a category ranking algorithm;
Connectivity Ratio behaves like a relevant factor for our ranking of categories by semantic
relatedness.</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>6 Conclusions and Future Work</title>
      <p>We have observed that, for a given category, inlinks perform better than
outlinks. This could be either a sign of the importance of inlinks or evidence
of a special property of the category Countries. We will try to answer this question in the next
set of experiments with more categories considered.</p>
      <p>We also show that the normalized Connectivity Ratio is a better measure than the raw number of links for extracting
the semantic relationships between categories. We believe this result might be skewed
toward our core Countries category, as it is natural that there are many inlinks to the
pages representing countries (consider that every event must happen in a country). The
results we obtained are also influenced by the ranking scheme we chose. It is
necessary to improve the Connectivity Ratio formula so that it brings out more relevant
relations and removes the trivial ones.</p>
      <p>For our future experiments we want to select more categories as a starting set and
remove the bias introduced by the Countries category. The assessment of semantic
relationships should be improved by taking into account possible information needs. It would
be interesting to study the cardinality of link type relationships. For example, “Actor to
BirthYear”<sup>6</sup> is an n:1 relation, while “Actor to Film” is an n:n relation. Another interesting
aspect is to investigate bidirectional relationships, category sizes and their indegree. We
are also going to apply link analysis algorithms for establishing the semantic authorities
among categories.
        <fn id="fn6"><label>6</label><p>In Wikipedia there are dozens of categories like “born 1970”, “born 1971”, etc., which
represent persons who were born in a particular year. We are grateful to the anonymous reviewers,
who made a good point that BirthYear is actually a relation, not a category. But we decided
to keep this example to show a common inconsistency of real-world data and to underline the
difficulties one has to consider while moving from theory to practice.</p></fn>
      </p>
    </sec>
    <sec id="sec-7">
      <title>Acknowledgments</title>
      <p>We would like to thank Paul Chirita for numerous discussions and Michal Kopycki and
Przemyslaw Rys for their invaluable help with the experimental setup.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1. Wikipedia, the Free Encyclopedia. http://wikipedia.org, accessed in
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <given-names>David</given-names>
            <surname>Aumueller</surname>
          </string-name>
          .
          <article-title>SHAWN: Structure Helps a Wiki Navigate</article-title>
          .
          <source>In Proceedings of BTW Workshop WebDB Meets IR</source>
          ,
          <year>March 2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <given-names>Francesco</given-names>
            <surname>Bellomi</surname>
          </string-name>
          and
          <string-name>
            <given-names>Roberto</given-names>
            <surname>Bonato</surname>
          </string-name>
          .
          <article-title>Network Analysis for Wikipedia</article-title>
          .
          <source>In Proceedings of Wikimania</source>
          <year>2005</year>
          , The First International Wikimedia Conference.
          <source>Wikimedia Foundation</source>
          ,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <given-names>Abraham</given-names>
            <surname>Bookstein</surname>
          </string-name>
          , Vladimir Kulyukin, Timo Raita, and
          <string-name>
            <given-names>John</given-names>
            <surname>Nicholson</surname>
          </string-name>
          .
          <article-title>Adapting Measures of Clumping Strength to Assess Term-Term Similarity</article-title>
          .
          <source>Journal of the American Society for Information Science and Technology</source>
          ,
          <volume>54</volume>
          (
          <issue>7</issue>
          ):
          <fpage>611</fpage>
          -
          <lpage>620</lpage>
          ,
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <given-names>Sergey</given-names>
            <surname>Brin</surname>
          </string-name>
          and
          <string-name>
            <given-names>Lawrence</given-names>
            <surname>Page</surname>
          </string-name>
          .
          <article-title>The Anatomy of a Large-scale Hypertextual Web Search Engine</article-title>
          .
          <source>Computer Networks and ISDN Systems</source>
          ,
          <volume>30</volume>
          (
          <issue>1-7</issue>
          ):
          <fpage>107</fpage>
          -
          <lpage>117</lpage>
          ,
          <year>1998</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <given-names>Ludovic</given-names>
            <surname>Denoyer</surname>
          </string-name>
          and
          <string-name>
            <given-names>Patrick</given-names>
            <surname>Gallinari</surname>
          </string-name>
          .
          <article-title>The Wikipedia XML Corpus</article-title>
          . Technical report
          ,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <given-names>Daniel</given-names>
            <surname>Kinzler</surname>
          </string-name>
          .
          <article-title>WikiSense - Mining the Wiki</article-title>
          .
          <source>In Proceedings of Wikimania</source>
          <year>2005</year>
          , The First International Wikimedia Conference.
          <source>Wikimedia Foundation</source>
          ,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <given-names>Jon</given-names>
            <surname>Kleinberg</surname>
          </string-name>
          .
          <article-title>Authoritative sources in a hyperlinked environment</article-title>
          .
          <source>Technical Report RJ 10076, IBM</source>
          ,
          <year>1997</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <given-names>Natalia</given-names>
            <surname>Kozlova</surname>
          </string-name>
          .
          <article-title>Automatic Ontology Extraction for Document Classification</article-title>
          .
          <source>Master's thesis</source>
          , Saarland University, Germany,
          <year>February 2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <given-names>Markus</given-names>
            <surname>Krötzsch</surname>
          </string-name>
          , Denny Vrandecic, and Max Völkel.
          <article-title>Wikipedia and the Semantic Web - The Missing Links</article-title>
          .
          <source>In Proceedings of Wikimania</source>
          <year>2005</year>
          , The First International Wikimedia Conference.
          <source>Wikimedia Foundation</source>
          ,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <given-names>Fernanda B.</given-names>
            <surname>Viegas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Martin</given-names>
            <surname>Wattenberg</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Kushal</given-names>
            <surname>Dave</surname>
          </string-name>
          .
          <article-title>Studying Cooperation and Conflict between Authors with History Flow Visualizations</article-title>
          .
          <source>In Proceedings of SIGCHI</source>
          <year>2004</year>
          , Vienna, Austria, pages
          <fpage>575</fpage>
          -
          <lpage>582</lpage>
          . ACM Press,
          <year>2004</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12. Max Völkel, Markus Krötzsch, Denny Vrandecic, Heiko Haller, and
          <string-name>
            <given-names>Rudi</given-names>
            <surname>Studer</surname>
          </string-name>
          .
          <article-title>Semantic Wikipedia</article-title>
          .
          <source>In Proceedings of the 15th international conference on World Wide Web</source>
          , Edinburgh, Scotland,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <given-names>Jakob</given-names>
            <surname>Voss</surname>
          </string-name>
          .
          <article-title>Measuring Wikipedia</article-title>
          .
          <source>In 10th International Conference of the International Society for Scientometrics and Informetrics</source>
          , Stockholm,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <given-names>Ian H.</given-names>
            <surname>Witten</surname>
          </string-name>
          , Alistair Moffat, and
          <string-name>
            <given-names>Timothy C.</given-names>
            <surname>Bell</surname>
          </string-name>
          .
          <source>Managing Gigabytes: Compressing and Indexing Documents and Images</source>
          . Morgan Kaufmann,
          <year>1999</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>