<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>Google Censors Itself for China. BBC News (Jan.</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Position Paper: A Study of Web Search Engine Bias and its Assessment</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Ing-Xiang Chen</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Cheng-Zen Yang</string-name>
          <email>czyang@syslab.cse.yzu.edu.tw</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Dept. of Computer Sci. and Eng., Yuan Ze University</institution>
          ,
          <addr-line>135 Yuan-Tung Road, Chungli, Taiwan 320, ROC</addr-line>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2006</year>
      </pub-date>
      <volume>26</volume>
      <issue>2006</issue>
      <fpage>22</fpage>
      <lpage>26</lpage>
      <abstract>
        <p>Search engine bias has received serious attention in recent years. Several pioneering studies have reported that bias perceivably exists with respect to the URLs in the search results. In contrast, the potential bias with respect to the content of the search results has not been comprehensively studied. In this paper, we propose a two-dimensional approach that assesses both the indexical bias and the content bias in search results. Statistical analyses are further performed to establish the significance of the bias assessment. The results show that content bias and indexical bias are both influential in bias assessment and that they complement each other, providing a panoramic view in the two-dimensional representation.</p>
      </abstract>
      <kwd-group>
        <kwd>search engine bias</kwd>
        <kwd>indexical bias</kwd>
        <kwd>content bias</kwd>
        <kwd>information quality</kwd>
        <kwd>automatic assessment</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. INTRODUCTION</title>
      <p>
        In recent years, an increasingly huge amount of information has
been published and pervasively communicated over the World
Wide Web (WWW). Web search engines have accordingly
become the most important gateway to access the WWW and
even an indispensable part of today’s information society as well.
According to [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ][7], most users grow accustomed to a few particular search
interfaces and thus rely mainly on these Web search engines to
find information. Unfortunately, owing to limitations of
current search technology, differing operating
strategies, or even political or cultural factors, Web search
engines have their own preferences and prejudices toward Web
information [
        <xref ref-type="bibr" rid="ref6">10</xref>
        ][
        <xref ref-type="bibr" rid="ref7">11</xref>
        ][
        <xref ref-type="bibr" rid="ref8">12</xref>
        ]. As a result, the information sources and
content types indexed by different Web search engines are
unbalanced. In past studies
[
        <xref ref-type="bibr" rid="ref6">10</xref>
        ][
        <xref ref-type="bibr" rid="ref7">11</xref>
        ][
        <xref ref-type="bibr" rid="ref8">12</xref>
        ], such unbalanced item selection in Web search
engines is termed search engine bias.
      </p>
      <p>In our observations, search engine bias can be incurred from three</p>
      <p>
        Recently, the issue of search engine bias has been noticed, and
several studies have been proposed to investigate the
measurement of search engine bias. In [
        <xref ref-type="bibr" rid="ref6">10</xref>
        ][
        <xref ref-type="bibr" rid="ref7">11</xref>
        ][
        <xref ref-type="bibr" rid="ref8">12</xref>
        ], an effective
method is proposed to measure search engine bias by
comparing the URLs of the indexed items retrieved by a search
engine with those retrieved by a pool of search engines. The result of such
an assessment is termed the indexical bias.
Although assessing indexed URLs is an efficient and
effective way to estimate search engine bias, the
indexical bias provides only a partial view.
In our observations, two search engines with the same degree of
indexical bias may return different page content and thus differ
semantically. In such a case, overweighting
specific content may result in significant content
bias that cannot be revealed by simply assessing the indexed
URLs. In addition, if a search result contains redirection links to
other URLs that are absent from the search result, these absent
URLs can still be accessed via the redirection links. In this case, the
search engine reports only the intermediate URLs and
may therefore appear to have poor indexical bias performance even though
the content it leads to is not biased. Analyzing the page content therefore helps reveal a
panoramic view of search engine bias.
      </p>
      <p>In this paper, we examine the real bias events in the current Web
environment and study the influences of search engine bias upon
the information society. We assert that assessing the content bias
through the content majorities and minorities existing in Web
search engines as the other dimension can help evaluate search
engine bias more thoroughly. Therefore, a two-dimensional
assessment mechanism is proposed to assess search engine bias.
In the experiments, the two-dimensional bias distribution and the
statistical analyses sufficiently expound the bias performance of
each search engine.</p>
    </sec>
    <sec id="sec-2">
      <title>2. LITERATURE REVIEW</title>
      <p>
        Recently, some pioneering studies have been conducted to discuss
search engine bias by measuring the retrieved URLs of Web
search engines. In 2002, Mowshowitz and Kawaguchi first
proposed measuring the indexed URLs of a search engine to
determine the search engine bias since they asserted that a Web
search engine is a retrieval system containing a set of items that
represent messages [
        <xref ref-type="bibr" rid="ref6">10</xref>
        ][
        <xref ref-type="bibr" rid="ref7">11</xref>
        ][
        <xref ref-type="bibr" rid="ref8">12</xref>
        ]. In their method, a vector-based
statistical analysis is used to measure search engine bias by
selecting a pool of Web search engines as an implicit norm, and
comparing the occurring frequencies of the retrieved URLs by
each search engine in the norm. Therefore, bias is assessed by
calculating the deviation of URLs retrieved by a Web search
engine from those of the norm.
      </p>
      <p>
        In [
        <xref ref-type="bibr" rid="ref7">11</xref>
        ], a simple example is illustrated to assess indexical bias of
three search engines with two queries and the top ten results of
each query. Thus, a total of 60 URL entries were retrieved and
analyzed, and 44 distinct URLs with occurring frequencies were
transformed into the basis vector. The similarity between the two
basis vectors was then calculated by using a cosine metric. The
result of search engine bias is obtained by subtracting the cosine
value from one and gains a result between 0 and 1 to represent the
degree of bias.
      </p>
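      <p>The URL-based computation described above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the implementation used in the cited work: the function names <monospace>cosine</monospace> and <monospace>indexical_bias</monospace>, the engine labels, and the toy URLs are all hypothetical, and the norm is assumed to be the pooled URL counts of all engines.</p>
      <preformat>
```python
from collections import Counter
from math import sqrt


def cosine(u, v):
    """Cosine similarity of two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def indexical_bias(engine_urls, pool_urls):
    """Return 1 - cosine between an engine's URL counts and the pooled counts."""
    engine_counts = Counter(engine_urls)
    pool_counts = Counter(pool_urls)
    basis = sorted(pool_counts)  # distinct URLs seen by the whole pool
    engine_vec = [engine_counts[url] for url in basis]
    norm_vec = [pool_counts[url] for url in basis]
    return 1.0 - cosine(engine_vec, norm_vec)


# Hypothetical toy data: three engines, one query, three results each.
results = {
    "engine_a": ["u1", "u2", "u3"],
    "engine_b": ["u1", "u2", "u4"],
    "engine_c": ["u5", "u6", "u7"],
}
pool = [url for urls in results.values() for url in urls]
biases = {name: indexical_bias(urls, pool) for name, urls in results.items()}
```
      </preformat>
      <p>In this toy setting, the engine whose URLs are shared with no other engine (engine_c) receives a higher bias value than the engines that overlap with the pool.</p>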
      <p>
        Vaughan and Thelwall further used such a URL-based approach
to investigate the causes of search engine coverage bias in
different countries [
        <xref ref-type="bibr" rid="ref14">18</xref>
        ]. They asserted that search engine coverage bias is affected
not by the language of a site but by the visibility
of the indexed sites. If a Web search engine indexes many highly visible
sites, that is, Web sites linked to by many other Web sites,
the search engine has a high coverage ratio. Since they calculated
the search engine coverage ratio based on the number of URLs
retrieved by a search engine, the assessment still cannot clearly
show how much information is covered. Furthermore, the
experimental sites were retrieved from only three search engines,
with domain names from four countries and with Chinese and English
pages; results from so few samples may not generalize
to other countries.
      </p>
      <p>
        In 2003, Chen and Yang used an adaptive vector model to explore
the effects of content bias [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. Since their study was targeted on
the Web contents retrieved by each search engine, the content
bias was normalized to present the bias degree. Although the
assessment appropriately reveals content bias, the study ignores
the normalization influences of contents among each retrieved
item. Consequently, the content bias may be over-weighted with
some rich-context items. Furthermore, the study cannot determine
whether the results are statistically significant.
      </p>
      <p>From the past literature on search engine bias assessment, we
argue that without considering the Web content, the bias
assessment tells users only part of the reality. Moreover, how to
appropriately assess search engine bias from both views needs
further study. In this paper, we propose an improved
assessment method for content bias and present a
two-dimensional strategy for bias assessment.</p>
    </sec>
    <sec id="sec-3">
      <title>3. THE BIAS ASSESSMENT METHOD</title>
      <p>
        To assess the bias of a search engine, a norm must first be
generated. In traditional content analysis studies, the norm is
usually obtained through careful examination by subject experts [5].
However, manually examining Web page content to obtain the
norm is impractical because the Web space is changing rapidly
and the number of Web pages is extremely large. Therefore, an
implicit norm is generally used in current studies [
        <xref ref-type="bibr" rid="ref6">10</xref>
        ][
        <xref ref-type="bibr" rid="ref7">11</xref>
        ][
        <xref ref-type="bibr" rid="ref8">12</xref>
        ].
The implicit norm is defined by a collection of search results from
several representative search engines. To avoid unfairly favoring
certain search engines, a search engine is not considered if
it uses another search engine's kernel without any refinement, or if the
number of pages it indexes is not comparably large.
      </p>
      <p>
        Since assessing the retrieved URLs of search engines cannot
represent the whole view of search engine bias, the assessment
scheme needs to consider other aspects to fill the gap. In
the current cyber-society, information is delivered to people
through various Web pages. Although these Web pages are
presented with photos, animations, and various multimedia
technologies, the main content still consists of hypertextual
information that is composed of different HTML tags [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ].
Therefore, in our approach, the hypertextual content is assessed to
reveal another bias aspect.
      </p>
      <p>To appropriately present Web contents, we use a weighted vector
approach to represent Web pages and compute the content bias.
The following subsections elaborate the generation of an implicit
bias norm, a two-dimensional assessment scheme, and a weighted
vector approach for content bias assessment.</p>
    </sec>
    <sec id="sec-4">
      <title>3.1 Bias Norm Generation</title>
      <p>
        Following the definition of bias in [
        <xref ref-type="bibr" rid="ref6">10</xref>
        ][
        <xref ref-type="bibr" rid="ref7">11</xref>
        ][
        <xref ref-type="bibr" rid="ref8">12</xref>
        ], the implicit norm used in
our study is generated from the vector collection of a set of
comparable search engines to approximate the ideal. This
approximation is necessary because changes in Web space
are extremely frequent and divergent, so traditional
methods in which subject experts generate norms manually are
time-consuming and impractical. On the other hand,
search engines can be implicitly viewed as experts in reporting
search results. The norms can be generated by selecting some
representative search engines and synthesizing their search results.
However, the representative search engines
should be selected cautiously to avoid generating biased norms
that favor specific search engines.
The selection of representative search engines is based on the
following criteria:
        <list list-type="order">
          <list-item><p>The search engines are general-purpose, covering different subject
areas. Search engines for special domains are not considered.
Search engines designed for specific users, e.g. localized search engines,
are also disregarded.</p></list-item>
          <list-item><p>The search engines are comparable to each other and to the
search engines to be assessed. Search engines are excluded if
the number of indexed pages is not large enough.</p></list-item>
          <list-item><p>Search engines are not considered if they use another search
engine's core without any refinement. For example, Lycos
started to use the crawling core provided by FAST in 1999. If
both are selected to form the norms, their bias values are
unfairly lowered. However, if a search engine uses another engine's
kernel but incorporates its own searching rules, it is still
considered because it may provide different views.</p></list-item>
          <list-item><p>Metasearch engines are considered if they have their
own processing rules. We assume that these rules are not
prejudiced in favor of certain search engines. If such
prejudices exist, they will be revealed by the assessment, and
the biased metasearch engine will be excluded.</p></list-item>
        </list>
      </p>
    </sec>
    <sec id="sec-5">
      <title>3.2 The Two-dimensional Assessment Scheme</title>
      <p>Since both indexical bias and content bias are important to
represent the bias performance of a search engine, we assess
search engine bias from both aspects and present it
in a two-dimensional view. Figure 1 depicts the
two-dimensional assessment process. For each query string, the
corresponding query results are retrieved from the Web search
engines. The URL locator then parses the search results and
fetches the Web pages. The document parser extracts the feature
words and computes the content vectors. Stop words are also
filtered out in this stage. Finally, the feature information is stored in
the database for the subsequent bias measurement.</p>
      <p>[Figure 1: The two-dimensional assessment process. Components: Query, Search Engines, Web Pages, URL Locator, Document Parser, Vocabulary Entries, Bias Assessor, Bias Report.]</p>
      <p>The bias assessor collects two kinds of information: the URL
indexes and the representative vocabulary vectors (RVV) for the
corresponding Web contents. The URL indexes are used to
compute the indexical bias, and the RVV vectors are used to
compute the content bias. After the assessment, the assessor
generates bias reports.</p>
    </sec>
    <sec id="sec-6">
      <title>3.3 The Weighted Vector Model</title>
      <p>
        Web contents are mainly composed of different HTML tags, each of
which carries its own specific meaning in a Web page.
For example, a title tag represents the name of a Web page, which
is shown in the browser window caption bar. Different headings
represent differing importance in a Web page. In HTML there are
six levels of headings. H1 is the most important; H2 is slightly
less important, and so on down to H6, the least important [
        <xref ref-type="bibr" rid="ref10">14</xref>
        ]. In
content bias assessment, how to represent a Web document plays
an important role to reflect the reality of assessment.
      </p>
      <p>
        Here we adopt a weighted vector approach to measure content
bias [
        <xref ref-type="bibr" rid="ref4">8</xref>
        ]. It is based on a vector space model [
        <xref ref-type="bibr" rid="ref11">15</xref>
        ] but adapted to
emphasize the feature information in Web pages. Because the
features in &lt;title&gt;, &lt;H1&gt;, or &lt;H2&gt; tags usually indicate important
information and are used more often in the Web documents,
features in these tags are appropriately weighted to represent Web
contents. Since the number of the total Web documents can only
be estimated by sampling or assumption, this model is more
appropriate to represent and assess the contents of Web
documents.
      </p>
      <p>Since the search results are query-specific, query strings on
different subjects are used to obtain the corresponding representative
vocabulary vectors (RVV) for the search engines. Each RVV represents
the search content of a search engine and is determined by
examining the first m URL entries in the search result list. Every
word in the URL entries is parsed to filter out stop words and to
extract feature words. The RVV consists of a series of vocabulary
entries VEi with eight fields: the i-th feature word, its overall
frequency f, its document frequency d, the number of documents
n, its title frequency t, its H1 frequency H, its H2 frequency h, and
its score S. The score S is determined as follows:
        <disp-formula><tex-math>S = (f + t \cdot w_t + H \cdot w_H + h \cdot w_h) \times \log\left(\frac{n}{d}\right) \qquad (1)</tex-math></disp-formula>
where w_t, w_H, and w_h are the respective tag weights. The scores are
used in similarity computations.</p>
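      <p>As a worked illustration of the score S, the following Python sketch evaluates the weighted expression for a single vocabulary entry. It is a minimal sketch under assumptions: the function name <monospace>entry_score</monospace> is hypothetical, the default tag weights are placeholders (the text does not state the values used), and the natural logarithm is assumed because the base is not specified.</p>
      <preformat>
```python
from math import log


def entry_score(f, t, H, h, n, d, w_t=2.0, w_H=1.5, w_h=1.2):
    """Score S = (f + t*w_t + H*w_H + h*w_h) * log(n / d) for one entry.

    f: overall frequency, t: title frequency, H: H1 frequency,
    h: H2 frequency, n: number of documents, d: document frequency.
    The default weights are placeholders, not values from the paper.
    """
    if not d > 0:
        raise ValueError("document frequency d must be positive")
    return (f + t * w_t + H * w_H + h * w_h) * log(n / d)


# Hypothetical entry: 12 occurrences overall, 2 in titles, 1 in an H1 tag,
# found in 4 of the 10 examined documents.
score = entry_score(f=12, t=2, H=1, h=0, n=10, d=4)

# A word that occurs in every document gets log(1) = 0 and so scores zero.
common_word_score = entry_score(f=5, t=0, H=0, h=0, n=10, d=10)
```
      </preformat>
      <p>The logarithmic factor plays the role of an inverse document frequency: words that appear in every examined document contribute nothing to the vector, whatever their tag weights.</p>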
      <p>After all RVV vectors are computed, empty entries are
inserted where necessary so that the entries in each RVV correspond exactly to the
entries in the norm for similarity computation. The cosine
function is then used to compute the similarity between RVVi of the i-th
search engine and the norm N:
        <disp-formula><tex-math>Sim(RVV_i, N) = \cos(RVV_i, N) = \frac{\sum_j S_{RVV_i,j} \cdot S_{N,j}}{\sqrt{\sum_j S_{RVV_i,j}^2} \cdot \sqrt{\sum_j S_{N,j}^2}} \qquad (2)</tex-math></disp-formula>
where S_{RVV_i,j} is the j-th entry score of RVVi, and S_{N,j} is the j-th
entry score of the norm. Finally, the content bias value
CB(RVVi, N) is defined as
        <disp-formula><tex-math>CB(RVV_i, N) = 1 - Sim(RVV_i, N) \qquad (3)</tex-math></disp-formula></p>
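      <p>The alignment and cosine steps can be sketched as follows. This is a hedged illustration, not the authors' code: the function name <monospace>content_bias</monospace> and the example scores are hypothetical, each vector is assumed to be a mapping from feature word to score, and words absent from the norm are simply ignored, which is one possible reading of the alignment step.</p>
      <preformat>
```python
from math import sqrt


def content_bias(rvv_scores, norm_scores):
    """Return CB = 1 - cos(RVV, N) for two word-to-score mappings."""
    # Words missing from the RVV act as the inserted empty (zero) entries.
    aligned = [rvv_scores.get(word, 0.0) for word in norm_scores]
    reference = list(norm_scores.values())
    dot = sum(a * b for a, b in zip(aligned, reference))
    len_rvv = sqrt(sum(a * a for a in aligned))
    len_norm = sqrt(sum(b * b for b in reference))
    if len_rvv == 0 or len_norm == 0:
        return 1.0  # no usable overlap with the norm: treat as maximal bias
    return 1.0 - dot / (len_rvv * len_norm)


# Hypothetical scores for illustration only.
norm = {"iraq": 3.0, "war": 4.0, "news": 0.0}
same_direction = {"iraq": 6.0, "war": 8.0}
disjoint = {"news": 5.0}

bias_same = content_bias(same_direction, norm)
bias_disjoint = content_bias(disjoint, norm)
```
      </preformat>
      <p>An RVV that is proportional to the norm yields a bias near 0, while an RVV that shares no weighted entries with the norm yields a bias of 1.</p>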
    </sec>
    <sec id="sec-7">
      <title>4. EXPERIMENTS AND DISCUSSIONS</title>
      <p>
        We conducted experiments to study bias in currently popular
search engines with the proposed two-dimensional assessment
scheme. Ten search engines are included in the assessment:
About, AltaVista, Excite, Google, Inktomi, Lycos, MSN,
Overture, Teoma, and Yahoo!. To compute the RVV vectors, the top
m = 10 URLs from the search results are processed because
the first result screen is requested for 85% of the queries [
        <xref ref-type="bibr" rid="ref12">16</xref>
        ],
and it usually shows the top ten results. To generate the norm, we
used a weighted term-frequency-inverse-document-frequency
(TFIDF) strategy to select the feature information from the ten search
engines. The size of N thus adapts to each query to
appropriately represent the norm.
      </p>
      <p>
        We have conducted experiments to measure the biases of ten
general search engines. The indexical bias is assessed according
to the approach proposed by Mowshowitz and Kawaguchi
[
        <xref ref-type="bibr" rid="ref6">10</xref>
        ][
        <xref ref-type="bibr" rid="ref7">11</xref>
        ][
        <xref ref-type="bibr" rid="ref8">12</xref>
        ]. The content bias is assessed according to the
proposed weighted vector model. In the experiments, queries from
different subjects were tested. Two of the experimental results are
reported and discussed here. The first is a summarization of ten
hot queries. This study shows the average bias performance of
Web search engines according to their content bias and indexical
bias values. The second is a case study on overwhelming
redefinition power of search engines reported in [
        <xref ref-type="bibr" rid="ref9">13</xref>
        ]. In this
experiment, the two-dimensional assessment shows that most
search engines report similar indexical and content bias rankings,
except Overture.
      </p>
    </sec>
    <sec id="sec-8">
      <title>4.1 The Assessment Results of Hot Queries</title>
      <p>
        In this experiment, we randomly chose ten hot queries from Lycos
50 [
        <xref ref-type="bibr" rid="ref18">22</xref>
        ]. For each of them, we collected 100 Web pages from ten
search engines. The queries are “Final Fantasy”, “Harry Potter”,
“Iraq”, “Jennifer Lopez”, “Las Vegas”, “Lord of the Rings”,
“NASCAR”, “SARS”, “Tattoos”, and “The Bible”. The
assessment results of their indexical bias and content bias values
are shown in Table 1 and Table 2.
In Figure 2, the average bias performance is further displayed in a
two-dimensional diagram. In the figure, two additional dotted
lines represent the respective mean values of
bias. The results show that Google has the lowest indexical and
content bias values, which means that Google outperforms the others
in bias performance. Google's best bias performance
indicates that both the sites and the contents it retrieves represent the
majority on the Web and may satisfy most user needs. From
the average results, we found that most of the search engines
show similar rankings in both indexical bias and content bias.
      </p>
      <p>However, when we review the bias performance of Yahoo!, we
see that it has quite good content bias performance,
ranked second best, but only a medium indexical bias
ranking. Such inconsistent bias performance shows that Yahoo! can
discover similar major contents from different Web sites.
Such differences cannot be revealed when users
consider only the indexical bias as the panorama of search engine bias.
In our experiments, a one-way analysis of variance (ANOVA)
was conducted to analyze the statistical significance of the bias
performance of each search engine. The ANOVA analyses in
Table 5 and Table 6 indicate that the content bias of Yahoo! is
more statistically significant than its indexical bias.</p>
      <p>In Table 3 and Table 4, the ANOVA results of the averaged
indexical bias and content bias are presented to show the
statistical significance of the differences among the experimental search engines.
Both ANOVA results reveal statistically significant differences among the
ten search engines over the hot query terms (p ≤ 0.05). The
p-values in the tables measure the credibility of the null hypothesis.
The null hypothesis here is that there is no significant
difference among the search engines. If the p-value is less than
or equal to the widely accepted value 0.05, the null hypothesis is
rejected.</p>
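      <p>For readers unfamiliar with one-way ANOVA, the following Python sketch computes the F statistic that underlies such a test. It is an illustration only: the function name <monospace>one_way_anova_f</monospace> is hypothetical, the sample values are invented and are not the paper's measurements, and the sketch stops at the F statistic, which would still have to be compared against an F distribution to obtain a p-value.</p>
      <preformat>
```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of numeric samples."""
    k = len(groups)
    total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / total
    means = [sum(g) / len(g) for g in groups]
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of the observations around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between, df_within = k - 1, total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical bias values for three engines over four queries each.
samples = [
    [0.20, 0.25, 0.22, 0.21],
    [0.40, 0.45, 0.42, 0.41],
    [0.30, 0.35, 0.32, 0.31],
]
f_stat, df_b, df_w = one_way_anova_f(samples)
```
      </preformat>
      <p>A large F value relative to the F distribution with (df_between, df_within) degrees of freedom corresponds to a small p-value, in which case the null hypothesis of equal mean bias is rejected.</p>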
      <p>Since there are significant differences among the search engines, we
further analyze the variance across the different hot query terms.
Table 5 and Table 6 show the ANOVA results of the indexical bias
and content bias of each search engine over the ten hot
query terms. Table 5 indicates that About, AltaVista,
Google, Lycos, and Overture are significant, and Table 6 shows
that About, Google, MSN, and Yahoo! are significant. According to the
ANOVA analyses, the indexical bias of MSN and Yahoo!
is less significant, but the content bias assessment reveals
complementary information. The two-dimensional assessment
scheme gives users a panoramic view of search engine bias.</p>
      <p>[Figure 2: Two-dimensional view of the average bias performance; vertical axis: Indexical Bias.]</p>
      <p>[Table 1 and Table 2: indexical bias and content bias values per query and search engine. Queries: Final Fantasy, Harry Potter, Iraq, Jennifer Lopez, Las Vegas, Lord of the Rings, NASCAR, SARS, Tattoos, The Bible. Search engines: About, AltaVista, Excite, Google, Inktomi, Lycos, MSN, Overture, Teoma.]</p>
    </sec>
    <sec id="sec-9">
      <title>4.2 The Case of “Second Superpower”</title>
      <p>
        To further assess bias events happening on the Web, we used a
real Googlewashed event to assess the bias
performance of Web search engines. In this experiment, we
retrieved the search results and the Web pages from the ten
search engines about one month after the event happened. As
reported in [
        <xref ref-type="bibr" rid="ref9">13</xref>
        ], Tyler's original concept of “Second Superpower”
was flooded by Google with Moore's alternative definition in
seven weeks. As a matter of fact, the idea of “second superpower”
first appeared in the New York Times written by Tyler to describe
the global anti-war protests [
        <xref ref-type="bibr" rid="ref13">17</xref>
        ]. After a while, Moore's essay used
the term to describe another totally different meaning, the
influence of the Internet and other interactive media [
        <xref ref-type="bibr" rid="ref5">9</xref>
        ].
In Figure 3, the two-dimensional assessment result shows that the
Googlewashed effect indeed worsens the bias performance of
Google. The two-dimensional analysis also shows that the
Googlewashed effect was perceptible in both Google and Yahoo!, since
Yahoo! cooperated with Google at that time (in fact,
Yahoo! is identical to Google for this query).</p>
      <p>[Figure 3: Two-dimensional bias assessment for the query “Second Superpower”; vertical axis: Indexical Bias, 0.0 to 0.8; horizontal axis: 0.0 to 0.8.]</p>
      <p>Interestingly, Figure 3 shows that the indexical bias ranking of
Overture is relatively higher than its content bias ranking. After manually
reviewing all 100 Web pages for this query, we discovered
that there are actually several definitions of “Second
Superpower,” not just Tyler’s and Moore’s. Although most
contents retrieved by Overture point to the major viewpoints
appearing in the norm, they are retrieved from diverse URLs rather
than mirror sites, and thus the search results incur a high indexical
bias value. This study shows that the indexical bias cannot tell
the whole story, whereas a two-dimensional scheme reflects a more
comprehensive view of search engine bias.
      </p>
    </sec>
    <sec id="sec-10">
      <title>5. CONCLUSION</title>
      <p>Since Web search engines have become an essential gateway to
the Internet, their preferences or biases toward Web contents have deeply
affected users' browsing behavior and may influence how users
view the Web. Recently, some studies of search engine bias
have proposed measuring the deviation of the sites retrieved by a
Web search engine from the norm for each specific query. These
studies have presented an efficient way to assess search engine
bias. However, such an assessment method ignores the content
information in Web pages and thus cannot present search
engine bias thoroughly.</p>
      <p>In this paper, we assert that both indexical bias and content bias
are important to present search engine bias. Therefore, we study the
content bias existing in current popular Web search engines and
propose a two-dimensional assessment scheme to complement
the indexical bias. The experimental results have shown that
such a two-dimensional scheme can expose the blind spot of the
one-dimensional bias assessment approach and provide users with a
more thorough view of search engine bias. Statistical analyses
further show that such a two-dimensional scheme can fulfill the
task of bias assessment and reveal more detailed information
about search engine bias.</p>
      <sec id="sec-10-1">
        <p>[7] iProspect Search Engine User Attitudes (April-May, 2004);
www.iprospect.com/premiumPDFs/iProspectSurveyComplete.pdf.</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <surname>Brin</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Page</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <article-title>The Anatomy of a Large-Scale Hypertextual Web Search Engine</article-title>
          .
          <source>In Proceedings of the 7th International World Wide Web Conference (Brisbane, Australia</source>
          ,
          <year>1998</year>
          ), ACM Press, New York,
          <fpage>107</fpage>
          -
          <lpage>117</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>I.-X.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Yang</surname>
            ,
            <given-names>C.-Z.</given-names>
          </string-name>
          ,
          <article-title>Evaluating Content Bias and Indexical Bias in Web Search Engines</article-title>
          .
          <source>In Proceedings of International Conference on Informatics, Cybernetics and Systems (ICICS 2003)</source>
          (Kaohsiung, Taiwan, ROC,
          <year>2003</year>
          ),
          <fpage>1597</fpage>
          -
          <lpage>1605</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <surname>Gikandi</surname>
            <given-names>D.</given-names>
          </string-name>
          ,
          <source>Maximizing Search Engine Positioning (April</source>
          <volume>2</volume>
          ,
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [8]
          <string-name>
            <surname>Jenkins</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Inman</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <article-title>Adaptive Automatic Classification on the Web</article-title>
          .
          <source>In Proceedings of the 11th International Workshop on Database and Expert Systems Applications (Greenwich</source>
          , London, U.K.,
          <year>2000</year>
          ),
          <fpage>504</fpage>
          -
          <lpage>511</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [9]
          <string-name>
            <surname>Moore</surname>
            ,
            <given-names>J.F.</given-names>
          </string-name>
          ,
          <article-title>The Second Superpower Rears its Beautiful Head</article-title>
          (March 31,
          <year>2003</year>
          ); cyber.law.harvard.edu/people/jmoore/secondsuperpower.html.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [10]
          <string-name>
            <surname>Mowshowitz</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Kawaguchi</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <article-title>Assessing Bias in Search Engines</article-title>
          .
          <source>Information Processing &amp; Management</source>
          ,
          <volume>38</volume>
          ,
          <issue>1</issue>
          (Jan.
          <year>2002</year>
          ),
          <fpage>141</fpage>
          -
          <lpage>156</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [11]
          <string-name>
            <surname>Mowshowitz</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Kawaguchi</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <article-title>Bias on the Web</article-title>
          .
          <source>Commun. ACM</source>
          ,
          <volume>45</volume>
          ,
          <issue>9</issue>
          (Sep.
          <year>2002</year>
          ),
          <fpage>56</fpage>
          -
          <lpage>60</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [12]
          <string-name>
            <surname>Mowshowitz</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Kawaguchi</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <article-title>Measuring Search Engine Bias</article-title>
          .
          <source>Information Processing &amp; Management</source>
          ,
          <volume>41</volume>
          ,
          <issue>5</issue>
          (Sep.
          <year>2005</year>
          ),
          <fpage>1193</fpage>
          -
          <lpage>1205</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [13]
          <string-name>
            <surname>Orlowski</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <article-title>Anti-war Slogan Coined, Repurposed and Googlewashed . . . in 42 Days</article-title>
          .
          <source>The Register</source>
          (April 3,
          <year>2003</year>
          ); www.theregister.co.uk/content/6/30087.html.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [14]
          <string-name>
            <surname>Raggett</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <article-title>Getting Started with HTML</article-title>
          ,
          <source>W3C Consortium</source>
          (May 24,
          <year>2005</year>
          ); www.w3.org/MarkUp/Guide/.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [15]
          <string-name>
            <surname>Salton</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wong</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Yang</surname>
            ,
            <given-names>C. S.</given-names>
          </string-name>
          ,
          <article-title>A Vector Space Model for Automatic Indexing</article-title>
          .
          <source>Commun. ACM</source>
          ,
          <volume>18</volume>
          ,
          <issue>11</issue>
          (Nov.
          <year>1975</year>
          ),
          <fpage>613</fpage>
          -
          <lpage>620</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [16]
          <string-name>
            <surname>Silverstein</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Henzinger</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Marais</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Moricz</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <article-title>Analysis of a Very Large AltaVista Query Log</article-title>
          ,
          <source>ACM SIGIR Forum</source>
          ,
          <volume>33</volume>
          ,
          <issue>1</issue>
          (Fall
          <year>1999</year>
          ),
          <fpage>6</fpage>
          -
          <lpage>12</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [17]
          <string-name>
            <surname>Tyler</surname>
            ,
            <given-names>P.E.</given-names>
          </string-name>
          ,
          <article-title>A New Power in the Streets</article-title>
          .
          <source>New York Times</source>
          (Feb. 17,
          <year>2003</year>
          ); foi.missouri.edu/voicesdissent/newpower.html.
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [18]
          <string-name>
            <surname>Vaughan</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Thelwall</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <article-title>Search Engine Coverage Bias: Evidence and Possible Causes</article-title>
          ,
          <source>Information Processing &amp; Management</source>
          ,
          <volume>40</volume>
          ,
          <issue>4</issue>
          (July
          <year>2004</year>
          ),
          <fpage>693</fpage>
          -
          <lpage>707</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [19]
          <string-name>
            <surname>Zittrain</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Edelman</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <source>Documentation of Internet Filtering in Saudi Arabia</source>
          , (Sep. 12,
          <year>2002</year>
          ); cyber.law.harvard.edu/filtering/saudiarabia/.
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [20]
          <string-name>
            <surname>Zittrain</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Edelman</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <article-title>Localized Google search result exclusions</article-title>
          ,
          (Oct. 26,
          <year>2002</year>
          ); cyber.law.harvard.edu/filtering/google/.
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [21]
          <string-name>
            <surname>Zittrain</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Edelman</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <article-title>Internet Filtering in China</article-title>
          .
          <source>IEEE Internet Computing</source>
          ,
          <volume>7</volume>
          ,
          <issue>2</issue>
          (March/April,
          <year>2003</year>
          ),
          <fpage>70</fpage>
          -
          <lpage>77</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [22]
          50.lycos.com.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>