<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>November</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <pub-date>
        <year>2000</year>
      </pub-date>
      <volume>20</volume>
      <issue>2000</issue>
      <fpage>95</fpage>
      <lpage>97</lpage>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>-</title>
      <p>
        1 Introduction
2
pages (reciprocal or not) is used as a basis for ranking web pages in search
on sales resulting from such visits. Finally, the existence of links between
thousands of rings into a hierarchy, and allow to search web rings by topic.
will of course miss the most general web rings.
ther simply for the number of click-throughs, or by distributing commissions
ties. Many sites link to each other voluntarily [1], and monitoring the trac
Web rings turn out be very useful for casual browsing, since they provide
can be hidden and be more or less obvious to the user. Very often, each
rings are accessed through ring portal sites [6] which organize hundreds of
often depends on the eorts of a ring administrator. Man y rings provide
They help organize collections of pages, create directories, or form
communiHyperlinks are one of the most important features of the World Wide Web.
shortcuts, some have broken links, and even when all the links exist some
member site of a web ring will link to an index page which itself links to
In reality these rings are obviously much more complicated. Their quality
results [3, 4, 5].
to dene and detect w eb rings automatically [
        <xref ref-type="bibr" rid="ref3">7</xref>
        ], even though this criterion
and advertising banners [2]. Sites also frequently pay for incoming trac,
eisite, in such a way that the member sites together form a ring. Most web
structure. In its most ideal form, sites link to a "next" and a "previous"
all the member sites of the ring. Some suggest using this spoke structure
that o ws from one site to another is the basis for both aliate programs
One particular link structure that has become very widespread is the ring
ring to sample the main page on a few dozen related sites. This makes it
example, it is often very dicult to come up with an accurate searc h query
Conventional search engines tend to be rather ineectiv e in this case. For
that captures the nuances of the pages that one is trying to discover. On
an ecien t way to discover and explore smaller sites about a particular topic.
possible to nd great sources of information that one w ould otherwise not
the other hand, web ring portal sites make it easy to nd a ring co vering a
given generic topic. It is then very easy to quickly follow the links in the
to a web ring, is how large such a ring should become. Obviously, a web
site for another is the rst site’s loss and the second site’s gain, mem bers of
dieren t consideration should limit the size of web rings. Since web rings
bring him back at a later stage, especially if another member of the web ring
sales. And if users are going to leave a site anyhow, it is best if they leave
slight advantage over directories in that it is possible to surf from one site
web rings recognize that helping a visitor nd what he needs ma y ultimately
ring that is very large might as well simply be a list that doesn’t connect
to the next without returning to the directory every time.
often form web rings as well. While one might think that a user leaving one
Surprisingly, competing sites such as online stores or online auction sites
it by going to a partner site [11].
      </p>
      <p>
        One question asked by web ring administrators as new sites get added
links back. Thus, by belonging to a web ring a site can increase trac and
are typically organized in a hierarchy, the size of a ring should be determined
back to itself { a line instead of a loop. However, we will argue that a
even have known to search for. Note that for this purpose web rings have a
sites. The question then becomes when to split these rings up. Is it better
far a user is likely to surf before he stops. We briey summarize this model
a random walk, in which the increments are independent and identically zn
distributed. Thus the value of the nth page is given by
to have one large ring, or several rings of smaller size?
here. It is assumed that the pages encountered while browsing have a value
Web rings that are allowed to grow freely as people add their sites to the
A simple answer is given by the Law of Surng [8 ], which tells us how
or utility. In particular, the utility of the nth page is These values follow xn.
ring soon become very large. Some rings can have in excess of a thousand
cien tly connected web, the threshold will be independent of n, so that
the probability p(x) that a surfer surfs x links is
are distributed according to the inverse Gaussian distribution [
        <xref ref-type="bibr" rid="ref4">10</xref>
        ]. That is,
= x is a constant. The number of pages visited before &lt; x is then xn xn
such that it is optimal to continue only if &gt; xn xn
With a nite (non-zero) discoun t rate &gt; 0, and a very large and
sufknown, since in the continuous limit of the random walk specied b y Eq.
(1) one obtains true Brownian motion. In that limit the rst passage times
p(x) = r 2 x 2 (x : 3=2e 2x )2
      </p>
      <p>0.15
particular site among these 2n sites that a surfer needs to nd.</p>
      <p>(a)
(b) 1-e
(c)
rings is larger than the expected probability of nding it in one large ring:
Putting everything together, we see that splitting one ring into two
equally sized rings will be optimal (i.e. lead to a higher probability of nding
dy p(x)dx: n1 Zn2n Zy1
the target site) provided the expected probability of nding the page in t wo
particular, there is a regime in which the inverse Gaussian distribution is
care is taken in the integral approximation in order to avoid the divergence
split the ring if &lt; 1 . p12
initially chose the incorrect ring, and then assume that we can not return
inequality.</p>
      <p>We note that small modications to the model ma y change the threshold
discrete case by explicitly summing the sums instead of approximating them
dependence on ring size.</p>
      <p>These results can be applied recursively. Any ring can be replaced by
probability of error that we allow until we split the ring. For example, if we
always better to keep the ring intact, since there is no way to satisfy the
at zero, we get that a large ring should always be split provided that &lt; . 12
two rings and a decision of which ring to follow. Thus, as long as there is a
What happens if we assume a dieren t form for the Law of Surng? In
other hand, the probability of chosing the correct ring is or less, it is 21
distribution. It may at rst be surprising that this result is independen t of
Note that we obtain the same result if we solve the problem for the
the size of the original ring. However, this must be true since a power-law
to the correct ring once we are done surng our rst c hoice, we should only
by integrals.
organized as a hierarchical tree structure, rather than in a ring. If, on the
will be advantageous to split the ring. This result is true for any power-law
distribution does not provide a natural length scale that is necessary for a
That is, as long as the sites can be divided into two meaningful groups, it
approximated by a power-law with exponent 3=2. In this case, if proper
The parameters in the model that helps to construct such a hierarchy
links. It is then only a question of deciding how to optimally organize sites
between this model and real usage data from a portal site. Combining the
[12].
can in principle be measured, even though in practice there are a number
into rings, and how to t rings in to a hierarchy.
have a circular structure is not important { what matters is the ease with
The model presented in the last section suggests that there is an optimal
then lead to automated ways of optimally organizing an index to web sites
which one navigates from one site to the next by following the web ring
of challenges. It should then be possible to make quantitative comparisons
tree structure to organize web rings of similar topic. The fact that web rings
ideas presented here with machine learning and clustering algorithms may</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          <string-name>
            <given-names>VLDB</given-names>
            <surname>Conference</surname>
          </string-name>
          , Edinburgh, Scotland,
          <year>1999</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          <article-title>large-scale knowledge bases from the web"</article-title>
          ,
          <source>Proceedings of the 25th</source>
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>R.</given-names>
            <surname>Kumar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Raghavan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Rajagopalan</surname>
          </string-name>
          and
          <string-name>
            <given-names>A.</given-names>
            <surname>Tomkins</surname>
          </string-name>
          ,
          <article-title>"Extracting</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [10]
          <string-name>
            <surname>Raj</surname>
            <given-names>S.</given-names>
          </string-name>
          <string-name>
            <surname>Chhikara</surname>
            and
            <given-names>J.</given-names>
          </string-name>
          <string-name>
            <surname>Leroy</surname>
            <given-names>Folks</given-names>
          </string-name>
          , The Inverse Gaussian Distribution,
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          <string-name>
            <given-names>Marcel</given-names>
            <surname>Dekker</surname>
          </string-name>
          , Inc.,
          <year>1989</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>