A Collaborative Multi-Agent Approach to Web
             Information Foraging

                        Yassine Drias and Gabriella Pasi

                  Università degli Studi di Milano-Bicocca, DiSCo
                             y.drias@campus.unimib.it
                               pasi@disco.unimib.it


      Abstract. In this paper the task of Information Foraging (IF) is consid-
      ered as a useful paradigm to address Exploratory Search. In the context
      of IF, a Web navigation strategy is introduced and formalized, and a
      multi-agent based model is proposed to exploit a collaborative approach
      to Information Foraging. A system based on this model has been devel-
      oped, and its preliminary evaluations on the ACM and DBLP reposito-
      ries are reported. The results are promising and show the ability of the
      proposed Web Information Foraging system to find relevant Web pages.

      Keywords: Web Information Foraging, Collaborative Search, Multi-
      Agent Systems


1   Introduction
Nowadays, the human ability to explore the huge amount of data on the Web
is relatively limited [1] [14]. The usual way in which users address the task of
locating information relevant to their needs is by means of Web Search Engines,
which support the task of Information Retrieval (IR).
    In [2], the author pointed out that Web queries can be classified in three
categories based on the user intent. They can be navigational, when users aim
to reach a particular Web page; informational, when the goal is to get infor-
mation on a certain topic, or transactional if the purpose is to reach Web
pages where further interaction would take place. More recently, the Exploratory
Search paradigm has been introduced, to address situations in which users in-
tend not only to find relevant documents, but they are also interested in learning,
discovering and understanding complex and new topics [5]. Besides querying, Ex-
ploratory Search systems are also based on Web browsing strategies. The idea
is to go beyond returning a set of ranked results as an answer to a query [4, 13].
    A recent paradigm related to Exploratory Search, and which aims at discov-
ering paths leading to relevant information on the Web is Information Foraging.
The idea of Information Foraging is grounded on the Optimal Foraging The-
ory [12] developed by anthropologists to model animals’ behavior while foraging
food. The Information Foraging Theory was first developed in [9]. The authors
established their study on the similarity between the animal food foraging be-
havior and the behavior of humans while seeking information on the Web. The
II

theory is based on the assumption that, when searching for information, users
rely on their senses to perceive the information scent that helps them to reach
their goal just like animals do when they follow the scent of their preys.
    The task of Web Information Foraging consists in browsing the Web to collect
information related to specific user’s needs. An automatic Information Foraging
process simulates the user’s behavior while surfing on the Web with the aim
of finding relevant information. It starts with a topical user interest, then by
using an information scent measure [10] it aims to find the surfing paths that
lead to Web pages relevant to the user. Recent works are focused on defining
Information Foraging systems that are able to discover in an automatic way
the surfing paths that could be useful to users in order to help them to learn,
discover and augment their knowledge on a specific topic [7, 8].
    In this paper, we propose a new collaborative Web Information Foraging
model; to make Web Information Foraging more effective and efficient, we pro-
pose to address it in a collaborative way. In fact, as outlined in [6], a collaborative
approach to the task of Web Search allows to achieve a higher recall and has
the potential to improve users’ search skills. The collaborative aspect of our
model relies on creating several aliases of a user seeking information on the Web
by means of a multi-agent system. In this paper we also report some prelimi-
nary evaluations conducted on large datasets including the ACM and the DBLP
repositories, which show the effectiveness of the proposed approach.


2    A Collaborative Approach to Web Information
     Foraging


The Web can be represented as a directed graph G(P, L), where the vertices P
correspond to Web pages, and the edges L represent the connections between
pages. A directed edge connects a page pi to a page pj if on page pi there exists
a link referring to page pj . Let us consider a user with a topical interest, who
starts surfing on the Web from an initial Web page to reach a page contain-
ing relevant information to her/his interest. The sequence of visited Web pages
during this process is called a surfing path. The transition choices that the user
makes when surfing define her/his surfing strategy, which affects the way s/he
selects the next Web page to visit, based on the information scent s/he perceives
of each accessible page. The information scent of a given page is an estimation of
how much useful information the user is likely to get on this page. Web surfing
strategies are related to users’ behavior, which may depend on the users’ famil-
iarity with both the Web and computer science technologies, and also on their
purpose behind Web surfing.
    In Section 2.1 we propose and formalize a Web surfing strategy, and we
exploit it by defining and implementing a collaborative (multi-agent) system for
Information Foraging.
                                                                                   III

2.1   The employed Web surfing strategy
In their Web navigation paths, users aim at increasing the information scent
they perceive when moving from one Web page to another. If at time t a user is
on a page pi , her/his motivation to move to a new page pj relies on the increase
of information scent s/he would obtain in moving from the former to the latter
page. The information scent increase at time t+1 with respect to time t can
be modeled by a quantity proportional to the similarity between the content
of the new selected page pj and the user’s interest, which consist in a vector
of keywords representing the user’s information need and her/his motivation
behind Web surfing. We propose Formula (1) to estimate the information scent
a user would get from a given Web page pj .

                       Inf oScent(pj ) = Inf oScent(pi ) + S                      (1)
     where S is the similarity between the user’s interest and the content of page
pj . If we suppose that pi is the initial Web page from which the foraging starts,
then the information scent at page pi would be equal to the similarity between
its content and the user’s interest.
In [3] different Web surfing strategies have been analysed but not formally mod-
eled; among the proposed strategies the rational surfing behavior is related to
users who start Web surfing with specific goals, but who might incur in some
random choices due to a lack of knowledge on certain topics or unfamiliarity
with the Web.
     We propose a formalization of the rational surfing behavior in Formula (2),
which defines the probability P (pi , pj ) to move from a page pi to one of its linked
pages pj .

                                 Inf oScent(pj )α × sim(pi , pj )β
               P (pi , pj ) =                                                     (2)
                             Σl∈Ni (Inf oScent(pl )α × sim(pi , pl )β )
    where α and β are parameters that control the relative weight of the InfoScent
and the content-based similarity of pages pi and pj . Ni is the set of outgoing
pages from page pi .
    The outcome of a Web information foraging process consists in a surfing path
containing all the Web pages visited by the user during the Web browsing. The
path starts from an initial page and ends with a target page which contains
information relevant to the user’s interests. The number of Web pages contained
in the surfing path is called the surfing depth. A relevance score can be associated
with a surfing path, by assessing the similarity of the last page in the path and the
formal representation of the user’s interest (we do this by formally representing
the above objects as vectors of terms and by applying the Cosine Similarity).
    In order to make the foraging process operational and efficient, and to the
aim of defining effective systems based on this paradigm, it is unrealistic to
provide just a path as a potential answer to an information need. To tackle the
above issues, we propose a collaborative model that simulates the user’s surfing
activity starting from several distinct initial pages; the proposed approach is
presented in Section 2.2.
IV

2.2    Web Information Foraging using a Multi-agent System

The collaborative approach to information foraging proposed in this Section is
based on the Web surfing strategy described in section 2.1. By this approach
the foraging activity motivated by an information need is carried out by several
agents that act as aliases of a specific Web user. The Web is the environment
where the agents operate. We adopt a graph-based representation of the Web
as previously explained, and we assume that each agent browses a part of the
Web graph with the aim of finding relevant information. Starting with the same
user interest, the agents forage information simultaneously in the Web with the
goal of reaching relevant Web pages. The collaboration feature is ensured through
the blackboard communication model, which offers real-time communication and
results sharing between the agents as suggested in [6]. The system is parallelized
and decentralized, and it is composed of two kinds of agents that can work
simultaneously:

 – A group of foraging agents that have the task of simulating the behavior of
   a specific Web user when searching for information on the Web;
 – A blackboard-handler agent that takes in charge the management of the
   blackboard system and the sorting of the partial solutions proposed by the
   foraging agents.

Figure 1 illustrates the architecture of the multi-agent system that is composed
by the following modules:


      Fig. 1. Architecture of the Web Information Foraging Multi-Agent System.


Foraging Agents Each foraging agent has the task of simulating the rational
behavior of a single Web user when searching for information on the Web. More
concretely, the agent must determine at every click corresponding to a page pi
the best move to another page pj .
    At the beginning of the process, several foraging agents are launched at the
same time and each of them starts its foraging from an initial Web page randomly
selected. Each time a foraging agent makes a move towards a new Web page
                                                                                 V

using Formula (2), it evaluates its new partial solution (actual surfing path)
and communicates it to the other agents by writing it on the blackboard. The
blackboard-handler agent will then sort the solutions according to their relevance
to the user interest and will share the top solutions on the blackboard. If the
proposed solution by the foraging agent is a top solution then the agent continues
surfing on the same path, else the agent abandons its current surfing path and
starts surfing again from a new Web page defined randomly. Procedure 1 presents
the pseudo-code of the foraging agents’ behavior.


    Procedure 1: Foraging Agent
     1. Select a Web page at random ;
     2. Perceive the environment and check the neighborhood of the current Web
       page ;
     3. Move to a new Web page using Formula 2 ;
     4. Add the new Web page to the surfing path ;
     5. Communicate the surfing path to the other agents by writing it on the
       blackboard ;
     6. Wait for the blackboard-handler agent to sort the solutions ;
     7. Check the blackboard ;
     if 8. the surfing path belongs to the top solutions then
          go to 2 ;
     else
          9. Abandon the current surfing path and Go to 1 ;


Blackboard System The blackboard system is a communication model that
allows information sharing between the agents in our system. More concretely, it
is an area of shared memory accessible in read and write mode by all the agents.
The blackboard facilitates the collaboration between the agents as it offers them
the possibility to exchange knowledge and share their partial solutions and sug-
gestions for the sake of finding the best information sources that give answers
to the user’s interest.

Blackboard-Handler Agent The main function of this agent is to sort the
solutions reported by the foraging agents in the blackboard according to their
relevance to the user interest. Each time a foraging agent reports a new solution
in the blackboard, the blackboard-handler agent inserts it in the right position
in order to keep the top solutions sorted. Procedure 2 shows the pseudo-code of
the blackboard-handler agent.

3     Preliminary Experiments
In this section we report the preliminary experiments that we have undertaken
on both the DBLP and ACM repositories. Both repositories give access to sci-
VI


 Procedure 2: Blackboard-Handler Agent
      1. Check the blackboard ;
      2. Sort the solutions proposed by the foraging agents according to their
       relevance ;
      3. As soon as a new solution is reported by a foraging agent insert it in the right
       position in the sorted list containing the top solutions ;
      4. Update the top solutions list on the blackboard ;
      5. Go to 3;


entific publications through their websites where each scientific publication has
a corresponding Web page. Based on experiments conducted on the considered
datasets, we fixed the values of the empirical parameters of Formula (2) as fol-
lows: α = 2, β = 1. Furthermore, We used 80 foraging agents and fixed the size
of the top solutions list to 25.


3.1     Datasets and Examples of Results

We built our first dataset using the XML version of the DBLP website available
on http://dblp.uni-trier.de/xml. We constructed a partial graph of the website
considering 65,336 articles that represent the nodes of the graph. After that, we
linked all the articles that share at least one author in common. The obtained
graph contains 65,336 nodes, it has a size of 640,810 and a density equal to
0.15 ∗ 10−3 .
    As a second Dataset, we used the ACM-Citation-network V8 [11], which
contains 2,381,688 papers extracted from ACM along with 10,476,564 citation
relationships. Each paper is associated with an abstract, a list of authors, a year,
a venue, and a title.
    Table 1 shows the top five results on the DBLP dataset for the user’s in-
terest described in the first column. The second column exhibits the last page
of the surfing path found by the system. The third and fourth columns present
respectively the score of the surfing path and the surfing depth.
    Table 2 presents the outcomes yielded by our approach on the ACM citation
network. For each user interest we give the full surfing path starting from the
first visited page. We notice that the last page in the surfing path is the most
relevant to the user. However, the other pages are also related to the user’s
interest, and they may be interesting to her/him.


3.2     Comparative Evaluations

In order to validate our approach, we compared the results produced by our
system to those returned by the DBLP and the ACM local search engines. In
fact, both repositories are equipped with local search engines (http://dblp.uni-
trier.de/search/, http://dl.acm.org/), which give the Web users the opportunity
to perform a keyword-based search for scientific publications. We selected about
                                                                                  VII

      User’s Interest            Last Web page in the Surfing Path        Score Surfing
                                                                                Depth
                              Approximate Retrieval: A Comparison of       0.86    4
                                Information Retrieval and Database
                                        Management Systems
      Approximate,         Exploiting the Functionality of Object-Oriented 0.67     3
Personalized, Information,       Database Management Systems for
   Retrieval, Database,                 Information Retrieval
 Management, Systems             Models for Integrated Information         0.62     3
                                   Retrieval and Database Systems
                             Personalized Information Retrieval System     0.57     3
                                   Enriching Information Retrieval         0.44     2
             Table 1. Top 5 ranked results for a user interest on DBLP


20 user’s interests representing information needs related to scientific domains
and related research fields. We introduced the selected user’s interests separately
as inputs to our system, and then we performed a search with those same user’s
interests by using the local search engines of each website. The results produced
by our approach were similar to those of the DBLP and ACM search engines,
however there are some differences it is worth to point out. The first difference is
related to the fact that although the datasets we used do not cover the complete
DBLP and ACM repositories, this did not affect the results quality returned
by our approach regarding the tested user’s interests. The second difference
resides in the way of presenting the search results to the user. Both DBLP and
ACM employ Information Retrieval techniques, for each search a set of results is
returned. The results are ranked according to their relevance to the considered
query, and each of them direct to a Web page dedicated to a scientific publication.
On the other hand, our approach returns a set of surfing paths also ranked by
their relevance. Each surfing path contains one or more Web pages with the
last one being the most relevant. Our way of presenting the search results using
Information Foraging takes into consideration the relations between the different
scientific publications/Web pages, unlike the classical way used by the majority
of search engines. For instance, the knowledge about the co-authorship relation
in our DBLP dataset and the knowledge about the citation relation in the ACM-
Citation-network enhance the search process by allowing the system to benefit
from those relations, thus not being only dependent on the keywords describing
the user’s interest.

3.3   Users’ Feedback
We asked a group of 10 Master’s and Ph.D. students to give us 3 topics related
to their research area on which they want to get scientific articles. At the end we
got 30 different user interests, each one of them is modeled by a set of keywords
that describe the information need of the user. After that, we have run our
program for each user interest and presented the outcomes to the corresponding
VIII

 User’s Interests                           Surfing Path                             Surfing
                                                                                     Depth
    Ant, System,    1     The cloud-based framework for ant colony optimization
    Optimization    2           Ant system: optimization by a colony                     2
                                         of cooperating agents
                     1     Automatic detection of photoresist residual layer in
Neural, Networks            lithography using a neural classification approach           2
                     2                      Neural Networks
                     1      Counting solutions of CSPs: a structural approach
Satisfiability, Sat, 2            CNF satisfiability test by counting and
  Optimization                           polynomial average time                         3
                     3 Global Optimization for Satisfiability (SAT) Problem
                     1    Automated content transformation with adjustment for
  Information,                 visual presentation related to terminal types
     Foraging        2     The Web Book and the Web Forager: an information              3
                                   workspace for the World-Wide Web
                     3          Information foraging in information
                                          access environments
  Information, 1          How far we’ve come: Impact of 20 years of multimedia
    Retrieval,       2 Content-Based Image Retrieval at the End of the Early Years
   Algorithms        3           Text databases and information retrieval
                     4 Information retrieval: data structures and algorithms             4

Table 2. Results for different users’ interests on the ACM-Citation-network V8 dataset


user. We then asked each user to determine the position of the most relevant
result among the list of ranked results.
    Figure 2 details the feedback we got from the users. The values on the hor-
izontal axis represent the position of the surfing path that is the most relevant
to the user while the values on the vertical axis correspond to the rate of the
satisfied user interests.


4      Conclusion

We proposed in this paper a new collaborative multi-agent approach to Web
Information Foraging where the foraging agents cooperate to find relevant Web
pages according to a user’s interest. We defined a Web surfing strategy inspired
from real Web users’ behavior and gave the pseudo code of the software agents.
In order to obtain a preliminary evaluation of our approach and compare it to
classical IR, we considered two datasets with different sizes.
    The results we achieved show the ability of our approach to find relevant Web
pages in an effective way based on a user interest. Our approach is also capable
of exploiting relations related to the Web structure. For instance, it can use the
citations and co-author relations in order to access scientific publications.
                                                                                       IX


               Fig. 2. Outcomes of a user study involving 10 students.


References

 1. Berman, F.: From teragrid to knowledge grid. Communications of the ACM 44,
    27–28 (2001)
 2. Broder, A.Z.: A taxonomy of web search. sigir forum. SIGIR Forum 36(2), 3–10
    (2002)
 3. J., L., S.W, Z.: Characterizing web usage regularities with information foraging
    agents. IEEE Trans. Knowl. Data Eng 40, 74787491 (June 2004)
 4. Marchionini, G.: Exploratory search: from finding to understanding. Commun.
    ACM 49(4), 41–46 (2006), http://doi.acm.org/10.1145/1121949.1121979
 5. Mirizzi, R., Noia, T.D.: From exploratory search to web search and back. In: Pro-
    ceedings of the Third Ph.D. Workshop on Information and Knowledge Manage-
    ment, PIKM 2010, Toronto, Ontario, Canada, October 30, 2010. pp. 39–46 (2010),
    http://doi.acm.org/10.1145/1871902.1871910
 6. Morris, M.R.: Collaborative search revisited. In: Proceedings of the 2013 Confer-
    ence on Computer Supported Cooperative Work. pp. 1181–1192. CSCW ’13, ACM,
    New York, NY, USA (2013), http://doi.acm.org/10.1145/2441776.2441910
 7. Paik, J., Pirolli, P.: ACT-R models of information foraging in geospatial intelligence
    tasks. Computational & Mathematical Organization Theory 21(3), 274–295 (2015),
    http://dx.doi.org/10.1007/s10588-015-9185-x
 8. Piorkowski, D., Fleming, S., Scaffidi, C., Bogart, C., Burnett, M., John,
    B., Bellamy, R., Swart, C.: Reactive information foraging: An empirical
    investigation of theory-based recommender systems for programmers. In:
    Proceedings of the SIGCHI Conference on Human Factors in Comput-
    ing Systems. pp. 1471–1480. CHI ’12, ACM, New York, NY, USA (2012),
    http://doi.acm.org/10.1145/2207676.2208608
 9. Pirolli, P., Card, S.K.: Information foraging. Psychological Review 106(4), 643–675
    (1999)
10. Pirolli, P., Fu, W.T.: Snif-act: A model of information foraging on the world wide
    web. User Modeling 2003, 9th International Conference 22-26, 45–54 (2003)
11. Tang, J., Zhang, J., Yao, L., Li, J., Zhang, L., Su, Z.: Arnetminer: Extraction and
    mining of academic social networks. In: KDD’08. pp. 990–998 (2008)
12. Werner, E.E., Hall, D.J.: Optimal foraging and the size selection of prey by the
    bluegill sunfish (lepomis macrochirus). Ecology 55(5), 1042 (1974)
X

13. White, R.W., Roth, R.A.: Exploratory Search: Beyond the Query-
    Response     Paradigm.     Synthesis   Lectures     on   Information    Concepts,
    Retrieval,   and    Services,   Morgan      &    Claypool    Publishers    (2009),
    http://dx.doi.org/10.2200/S00174ED1V01Y200901ICR003
14. Zhu, Y., Zhong, N., Xiong, Y.: Data explosion, data nature and dataology. Proceed-
    ings of the 2009 International Conference on Brain Informatics, BI’09, Springer-
    Verlag (2009)