=Paper= {{Paper |id=None |storemode=property |title=Caching Schema for Mobile Web information Retrieval |pdfUrl=https://ceur-ws.org/Vol-702/paper11.pdf |volume=Vol-702 |dblpUrl=https://dblp.org/rec/conf/www/LeeGKT02 }} ==Caching Schema for Mobile Web information Retrieval== https://ceur-ws.org/Vol-702/paper11.pdf
                          Caching Schema
               for Mobile Web Information Retrieval
                  R. Leey, K. Goshimay, Y. Kambayashiy and H. Takakuraz
                     Graduate School of Informatics, Kyoto Universityy
                         Data Processing Center, Kyoto Universityz
                                Sakyo Kyoto 606-8501, Japan
                        fryong, gossy, yahikog@db.soc.i.kyoto-u.ac.jp,
                              takakura@rd.kudpc.kyoto-u.ac.jp

     Abstract
     The web cache management in mobile devices becomes to be an important problem.
     In this paper, we discuss that the traditional cache method LRU(Least Recently Used)
     is not sucient in mobile environments. Since the data related to the places already
     visited by the mobile user may not be used again, a user behavior in the mobile en-
     vironments should be considered. In another aspect of web data uses, we are often
     connecting to the web with the wired broadband networks. Before traveling with the
     mobile devices that are limited in the storage space and the power, carrying out the web
     data in the mobile cache will be a good strategy to reduce the e orts in mobile-based
     web information retrieval. In the pre-fetching of web data, we can improve the cache
     eciency signi cantly by dealing with metadata such as selected URL's (with proper
     keywords) instead of taking out the whole web contents. In this paper, caching algo-
     rithms considering web contents, metadata and user behavior history are developed.
     In order to determine priority of web pages, word relationships obtained from a volume
     of web pages are used.


1 Introduction                                            rithms to determine the priorities of contents
                                                          in the cache. For example, in the traditional
One of the serious bottle-necks of mobile                 cache, data used recently will have high pri-
systems is slow communication speed. If                   ority. In mobile applications, if a user vis-
required information is stored in the cache               ited a place, information related to the place
of the mobile system, it will be retrieved                may not be required later. As the cache size
rapidly. As cache size is limited in mobile               is limited we will store URL's instead of web
systems, we need to develop proper algo-                  contents if the priority is not very high.
rithms for such a purpose.                                   If a user is located at place A, there are
   In this paper, we will discuss priority com-           some possibility to visit place B, if both are
putation algorithms for mobile systems to be              related. Strength of relationships is calcu-
used for sight-seeing. Although the applica-              lated by the number of appearance of (A,B)
tion is limited to identify the problems, it will         pair in one paragraph (or one web page).
be rather easy to extend the result to other              Usually the pair (A,B) appear frequently if
cases.                                                    they are located in a short distance. There
   Compared with traditional CPU cache and                are, however, cases they are far a part. In
web cache, we have to develop di erent algo-              such cases these locations are semantically
                                                     1
                                                    110
related (for example, having similar nature).           we can use complicated algorithms for web
   In order to nd out relationship among                cache[1, 2].
words, we rst classi ed words into G-
words(geo-words, related to location names)               1. Data sizes distribute from very small to
and N-words(non-geo-words). We have col-                     very large(for example, video).
lected 2 million web pages related to Kyoto               2. We can use disks for cache, so the cache
City in Japan, which are the basis of deriving               size can be rather large.
word relationships.
   In Section 2, we will discuss the character-           3. Update is performed at web pages.
istics of mobile cache algorithms by compar-                 Some cache contents will become out-
ing other typical cache algorithms. Section 3                of-dates easily.
shows organization of mobile cache for travel             4. There are some related pages (shown by
guide as example. How to determine prior-                    links).
ities of cache contents using word relation-
ships is discussed in Section 4. Consideration            5. There are several di erent usage pat-
of web environments is discussed in Section                  terns, which will appear periodically.
5. Section 6 shows some problems found by                    For example, usage pattern from 9 a.m.,
simulation.                                                  just before 5 p.m., and nights may be
                                                             di erent.
2 Characteristics of Mobile                                The most serious problem is size[2, 4]. If
  Cache                                                 we put one large data object in the cache,
                                                        other data cannot be stored in the cache. We
In this section we will compare traditional             need to consider the size beside the recency
CPU cache, web cache, and mobile cache to               treated by LRU. Furthermore, because of the
be discussed in this paper, to identify prob-           above 5, we need to adjust usage patterns. A
lems of mobile cache.                                   simple way is to use popularity (how many
                                                        times a page is used in 24 hours).
2.1 Traditional CPU Cache                                  We have developed a complicated web
Requirements and characteristics for CPU                cache algorithm considering size, recency and
cache are as follows.                                   popularity[3].

 1. All the data are equal size.                        2.3 Cache for Mobile Systems
 2. Cache size is very small.                           Mobile systems usually have a small amount
                                                        of cache and data usage patterns is quite dif-
 3. Update of data is performed in cache.               ferent from the above two cases[6, 8].
 4. In many case data are accessed sequen-                 Usually cache is used to utilize part data.
    tially.                                             Requirement for mobile systems is quite dif-
                                                        ferent. For example, a person visit a famous
   As speed of CPU is very high, we need to             sight-seeing spot, the map to approach the
use very ecient algorithm for cache replace-           place may not be used again, although it may
ment. LRU(Least Recently Used) is very                  be used frequently before. Another example
popular, since the computational overhead               is that all the lunch information will not be
is very low and reasonably good results are             utilized after the user takes a lunch.
obtained[7].                                               Since the communication speed is very
                                                        slow, the cache system have to store data to
2.2 Web Cache                                           be used near future, by predicting user's be-
                                                        havior. For example, a person is going to
Unlike CPU, the communication speed is                  a some place by a car, parking information
very slow for the case of web cache, thus               near the place should be stored in advance.
                                                   2
                                                  111
  As a cache size is small, we may have to              3.1 Cache Contents
store URL's instead of contents. To nd                  As a strategy to use contents in mobile en-
proper URL's, keywords for each URL are                 vironment, there are the following three pat-
required.                                               terns.
  As a summary, requirements and charac-
teristics of mobile cache are as follows.                1. Storing selected web contents into mo-
                                                            bile devices
 1. The cache size is rather small.                         If all necessary web contents are stored
 2. Cache algorithm should use prediction                   in cache, we do not need to use the
    information of the owner's behavior.                    web. Communication cost will be re-
                                                            duced. Since some web contents may be
        If the data object is used and                     updated frequently, the user cannot set
         it is predicted not to be used                     the latest information of such pages by
         again, contents will be erased from                this method.
         the cache. The URL and some                     2. Downloading all necessary web informa-
         meta information (usage informa-                   tion when requested
         tion) should be stored in the cache,               Conventional cellular phones use this
         unless usage is sure to be never oc-               method, since they cannot store a large
         curred again.                                      amount of data.
        By predicting future user behavior,
         we can order the URL's stored in                3. Storing metadata which help retrieval
         the cache using metadata attached                  necessary web pages
         to them. Contents of important                     Instead of storing all the web contents,
         URL's are retrieved in advanced,                   we can store only URL's in order to re-
         since the communication speed is                   duce the storage cost. Web page selec-
         slow.                                              tion is performed before getting the con-
                                                            tents. We can store a large number of
 3. By the current web page used by the                     URL's for small cache area.
    user, we can add URL's from its link                  According to the travel plan, we can de-
    information if required.                            termine the priority of web pages.
  For traveling purpose, before travel we can           Web pages with the top level priority
store URL's, which may be required during                   Contents of these web pages are actu-
the travel. By actual usage of web pages we                 ally stored in the cache, if they will
have to modify the URL list by the above 3.                 not change frequently. Otherwise, only
                                                            URL's are stored.
3 Mobile Travel Guide                                   Web pages with the second level priority
                                                            Only parts of web contents are stored
In this section, we will discuss functional re-             with URL's. These partial contents
quirements for a mobile travel guide support                are used to nd proper URL's. Link
system. Mobile environment has many con-                    information may be also stored in
straints such as limited storages, slow com-                cache. For partial contents, we can use
munication, poor user interfaces and slow                   frequently appearing keywords and/or
operation speed. To overcome these prob-                    the top parts (including the titles) of
lems, we will discuss generalized cache algo-               pages.
rithms for mobile systems. Unlike conven-
tional cache, metadata are also cached and              Web pages with the third level priority
user behavior is re ected to cached contents                Only URL's are stored. Figure 1 shows
as discussed in Section 2.                                  the organization of the system to be
                                                   3
                                                  112
    discussed in this paper. According to              ment with enough bandwidth. This phase
    the user behavior history, priorities of           consists of two kinds of works. One is to de-
    web pages are dynamically modi ed.                 cide visiting places in the target area. The
    We use word relationships to determine             other is to download data to be used oine
    the priority.                                      which will be needed for guide.
                                                          First, a user can get knowledge of the
                                                       places by browsing web pages, and determine
                                                       his objective spots and visiting order.
                                                          Then, required metadata are stored into
                                                       the mobile device. The required metadata
                                                       are
                                                          lists of G-words, keywords, and URL's,
                                                           and their relationships
                                                          G-words of user's destination and key-
                                                           words of his interests
                                                          initial values of user status (e.g. money
                                                           conditions, previously visited, etc.)

      Figure 1: System Organization                       In addition, web page contents themselves
                                                       are downloaded to store into web page cache.
                                                       That reduces access frequency of mobile re-
                                                       trieval.
3.2 System Functions
The system has functions of a web page                 3.2.2 Retrieval-and-Guide Phase
cache management and a dynamic travel
guide, using metadata such as relationships            In retrieval-and-guide phase, two functions
of geowords or keywords derived from the               are provided:
web, and user environment parameters. Two
phases are considered in the system, planning             web page cache management based on
phase and retrieval-and-guide phase.                       page priority ranking
   Planning phase is performed at home be-
fore the departure of a tour. In this phase,              active dynamic travel guide and page
the system works to support of surveying                   pre-fetching
destination area by retrieving web pages, de-
cision making of target spots and visiting                Active dynamic travel guide is a function
order, and storing metadata (relations, user           to recommend the user web pages related to
status) into the mobile devices.                       the current location, interesting keywords or
   On the other hand, retrieval-and-guide              his other conditions. This function consid-
phase is during the travel in outdoors. This           ers users who have not well prepared or un-
phase works for ecient management of web              expected plan changes. Guide selects pages
page cache based on page priority ranking,             which have high priority in the cache and
and active dynamic travel guide by sugges-             show the user, as if it says \Why don't you
tion of closely related web pages to the user.         visit this spot written on this page?" In this
                                                       way, the user can nd the next destination
3.2.1 Planning Phase                                   suitable to him. We regard this function as
                                                       a kind of guide agent. Moreover, the speed
In planning phase, we assume that the user             of retrieval can be improved by pre-fetching
can use the wired communication environ-               the target pages.
                                                  4
                                                 113
4 A Page Priority Ranking
  Algorithm with Geograph-
  ical Relationships
In this section, we will discuss a replace-
ment algorithm for web page cache which
stores web contents in mobile environment.
The algorithm calculates the priority rank-
ing of pages based on user environment and
makes low priority pages to replace. Though
various factors (parameters) can be used as                          Figure 2: G-G model
user environment, we rst introduce a cache
model based on only location information               for current user and possibility to be accessed
(geowords), and we will extend to the model            in the near future, based on weight values of
using keywords representing user's interests           G-G relations.
in the next step.
                                                       4.3 Priority Computation for G-
4.1 Word Relationships                                     words
From the web pages related to Kyoto, we                To describe the algorithm based on above-
have derived relationship among words[5].              mentioned basic policies, we introduce the
Words are classi ed into G-words(geo-                  following de nitions.
words, location names) and N-words(non-
geo-words). If two words are appeared fre-             De nition 1
quently in one paragraph, or one web page,                         ( i j ) is a weight of G-G rela-
                                                           W eight g ; g

we assume that there are strong relationships              tion between geoword i and j . If there
                                                                                          g        g

among the two words we use frequency to                    are no G-G relations between i and j ,      g           g

show the strength of the relationship.                             ( i j ) = 0.
                                                           W eight g ; g


                                                       De nition 2
4.2 G-G Model                                              Given a geoword j related to a page , a
                                                                                 g                             p

We rst introduce the model using only G-G                  score of p by gj , Score[gj ](p), is a prior-
relations (relations between geowords). We                 ity value of determined by j . In G-G
                                                                         p                         g

call this model G-G model in the later of the              model,       [ j ]( ) =
                                                                     S core g    p    ( j U ).
                                                                                          W eight g ; g

paper.                                                 De nition 3
   G-G model has a graph structure shown in                Given geowords 1
Figure 2. Nodes in this graph are classi ed                                        n related to a
                                                                                 g ; :::; g


into three groups. The rst group in the left               page i, a score set of i ,
                                                                 p                         p   ( i ),
                                                                                               ScoreSet p


side is a page cache , which is the set of web
                    P
                                                           is a set f   [ 1 ]( i )
                                                                      S core g       p   [ n ]( i )g.
                                                                                         ; :::; S core g       p


pages i . The second group in the center is
      p
                                                         Using these de nitions, the algorithm of
a geoword set , called index of . Each j
               G                  P        g
                                                       page priority ranking in G-G model is de-
represent regional references of each of the           scribed as follows.
web pages. The third group in the right side
is a single node, user's current location.              1. Calculate          ( i ) for each i in .
                                                                        S coreS et p                       p       P
   There exist weighted G-G relations be-                  (In each          ( i ), scores are sorted
                                                                      ScoreSet p
tween U and each j . A pair with no G-G
      g             g
                                                           in descending order)
relations is treated as a zero-weighted G-G
relation.                                               2. Sort the pages i by their score sets with
                                                                             p

   The purpose of this algorithm is to rank                higher score. (i.e. by descending lexico-
the pages in cache in the order of priority
          p         P                                      graphic order)
                                                  5
                                                 114
  This order is the priority ranking of web              pages will be modi ed by user's history of
pages in P .                                             visits.
                                                            Another example is that pages about
4.4 Keywords                                             downtown which have a lot of restaurant in-
                                                         formation are likely to be more important at
Geowords in G-G Model can be equivalently                lunch or dinner time. How much money a
replaced by keywords, which represent inter-             user has, will change the priority of restau-
ests of the user. Moreover, an integrated                rants, high grade or fast foods.
model GK-GK Model can be introduced as il-                  These examples show that it is important
lustrated in Figure 3. By using this extended            to apply parameters for satisfying priority
model, the same algorithm can be applied for             ranking re ecting various user environments.
calculating page priority to which both cur-             In this section, we discuss methods to intro-
rent location and a user interest are consid-            duce these parameters in addition to rela-
ered.                                                    tionship to the ranking algorithm.
                                                         5.1 Score Control with Parameters
                                                         In order to control value of score, we de ne
                                                         Score Control Parameter as follows.
                                                         De nition 4
                                                             Score Control Parameter PARAMi is
                                                             expressed by the triplet (Vi ; fi ; i ).
                                                           Vi is a set of criterion values for increasing
                                                         and decreasing scores. Each element of Vi
          Figure 3: GK-GK Model                          means, for instance, user's condition such as
                                                         history of visit or money conditions etc., and
                                                         values of external environments such as time
                                                         or weather etc. fi is a function which gives
5 User Environments                                      increment of score, which is determined by
                                                         an index c (either geowords or keywords) and
In the previous section, we discussed a cache            a set of criterion values Vi . i is a constant
model and an algorithm using geoword sand                number representing a weight of PARAMi .
keywords as user status. However, further-               By the parameters de ned in above, follow-
more consideration is required in order to               ing de nition can be introduced.
judge page priority ranking re ecting users'
requests more e ectively.                                De nition 5
   Assume that a user is trying to retrieve                  Score by index c of page p,
and get information about his current or                     FixedScore[c](p) is
neighborhood location from the web pages in
order to his next visiting place. He queries
the system the pages related to his current                                    +
                                                                                       X
                                                              FixedScore[c](p) = Score[c](p)
                                                                                     i fi(c; Vi )
                                                                                              
location and interest. But when he checked                                               i
the returned web page, it is about the lo-
cation where he visited just now. He will be                When executing the above algorithm,
disappointed because he does not to want in-             FixedScore[c](p) should be used as elements
formation about places which he had visited.             of score sets instead of scores. By adding the
   In this case, it is natural that a user hopes         value f (c; V ) determined by c and V to the
to know the places where he has never vis-               score, result ranking of priority can be con-
ited. That is to say, priority of cached web             trolled to re ect user's demand more. i is
                                                    6
                                                   115
                                                           applied this case.
                                                              In order to calculate temporal priority
                                                           changes, present time t should be set as an
                                                           element of V . To derive the function f , an
                                                           appropriate database which provides a corre-
                                                           spondences table between geoword and avail-
                                                           able time period is prepared.
                                                           Money Conditions It is expected that
                                                           electrical payment using mobile devices will
                                                           be more popular in the near future. In
                                                           such environments, since users' money can
Figure 4: Score Control with User Parame-                  be managed with their mobile equipments,
ters                                                       that is also possible to be used for parame-
                                                           ters of the page priority ranking algorithm.
a weight value which indicates how strongly                One of its advantages is, for example, a guide
each PARAMi should a ect on the scores.                    for rich users to area which has many lux-
  Concrete examples of user parameters for                 ury shops or facilities for which expensive en-
score control are as follows.                              trance fee is required. Instead, users without
                                                           much money can be lead to reasonable shops
                                                           or spots.
Order of Visit For a user at a particular                     For this purpose, the user's available
time, information about a place which he has               money m is set as V . Incremental function f
ever visited tend to be less important. Fur-               is prepared a database providing correspon-
thermore the later he visited a place, the less            dences between geowords and money ranges.
importance and motivation to visit.
   In order to calculate, geowords sequence of
visiting g1 ; g2 ; : : : ; gn and the past visited
         f              g
                                                           6 Simulation
history g;1 ; g;2 ; : : : ; g;m are added to the
        f                    g
                                                           We had made simulations of ranking web
set of user status values V . The function f is            pages in order to evaluate our algorithm.
de ned to be maximum at the nearest future,                   First, random 100, 500, and 1000 pages
minimum at the latest, and the farther from                (URL's) related to Kyoto City are prepared
present the closer to 0. For example, given                as target web page sets to rank. They are ex-
present time t0 and the place (geoword) vis-               tracted from our web page collection which
ited at time t be gt , the incremental value is            have been gathered for our developing re-
represented as following formula:                          gional search engine. Each of the pages has
                                                           its related geowords.
                f (gt ; V ) = et;1 t0                         Next, assumed current locations of the
                                                           mobile user are determined. we chose two
Temporal Changes of Priority Ge-                           geowords Ginkakuji and Gion at random,
owords of downtown, such as Kawaramachi                    which are names of regions in Kyoto.
and Gion for instance, are likely to have                     Then these page sets, their related ge-
higher importance at lunch or dinner time                  owords, and user's current locations are
because users wants to have a meal at that                 adapted to our page priority ranking algo-
time. That fact has few concerns with re-                  rithm. The most basic model G-G model
lations between regions. Moreover, touristic               (considering only relations of geowords) is
facilities (e.g. museums or zoos) usually have             used. Six trials are done by the combination
business hours, therefore we must consider                 of three page sets and two user locations.
whether they are available or not now. Tem-                   As a result, we succeeded to place pages
poral events such as festivals are also to be              about current locations or neighborhood to
                                                      7
                                                     116
the upper ranking. However, an important                many geowords but have rich information,
tendency was found. In every trial of rank-             we must consider more semantics and mean-
ing, similar pages are found with high prior-           ings of pages, such as geographical cover-
ity, for example,                                       age/popularity or degree of details. That is
                                                        one of our future work.
     www.momonga.org/kyoto/spot-j.html
      (lists of attractive spots with beautiful         Acknowledgment
        owers in Kyoto)                                  This work has been supported by 'Universal
      gourmet.yahoo.co.jp/gourmet/restaurant/
                                                        Design in Digital City' Project in CREST of
  

      Kinki/Kyoto/list/area genre/260005 0207
                                                        JST (Japan Science and Technology Corpo-
      (lists of restaurants in Kyoto)                   ration).
     www.kyoto-np.co.jp/kp/topics/2000jun/
      bk index.html
                                                        References
      (news back numbers in Kyoto)                       [1] M. Abrams, C. R. Standridge, G. Abdulla,
                                                             S. Wiliams, and E. A. Fox, \Caching Prox-
Common characteristics of these pages are                    ies: Limitations and Potentials,\ Proc. of
that they includes many kinds of geowords                    the 4th International WWW Conference,
in Kyoto. We should exclude such pages to                    1995.
get better results.                                      [2] C. Aggarwal, J.L. Wolf, and P.S. Yu,
   In order to avoid such cases, we must con-                \Caching on the World Wide Web,\ IEEE
sider more semantics and meanings of pages,                  Transactions on knowledge and data engi-
such as geographical coverage/popularity or                  neering, 11(1), 1999.
degree of details.                                       [3] K. Cheng, Y. Kambayashi, \LRU-SP: A size
                                                             adjustable popularity-aware LRU replace-
                                                             ment algorithm for Web Caching,\ IEEE
7 Conclusion                                                 COMPSAC, IEEE CS Press, pp. 48-53,
                                                             2000.
In this paper, we proposed a web page                    [4] R. Karedla, J. S. Love, and B. G.
cache management method in mobile envi-                      Wheery, \Caching Strategies to Improve
ronments.                                                    Disk System Performance,\ IEEE Com-
   An application system for travel guide                    puter, 27(3):38-46, 1994.
support using the cache management meth-                 [5] R. Lee, H. Takakura, and Y. Kambayashi,
ods is suggested. Metadata are stored in                     \Visual Query Processing for GIS with Web
mobile devices at planning phase, and web                    Contents,\ Proc. of the 6th IFIP Work-
retrieval during travel is supported by ef-                  ing Conference on Visual Database Systems,
  cient cache management at retrieval-and-                   May 2002. (to appear)
guide phase. The system provides another                 [6] Q. Ren, M.H. Dunham, \Using Seman-
function, an active and dynamic travel guide.                tic Caching to Manage Location Depen-
   A priority ranking algorithm for web pages                dent Data in Mobile Computing," The 6th
is suggested for the system. It uses relation-               Annual International Conference on Mobile
ships between users' current locations and                   Computing and Networking (MobiCom'00),
geographical information of web pages. This                  August 2000.
algorithm can be extended into using key-                [7] A. Silberschatz and P. B. Galvin, \Oper-
words of interests, or other various user pa-                ating Systems Concepts,\ Addison Wesley,
rameters such as history of visit, time, money               Reading, MA, fourth edition, 1994.
conditions, and so on.                                   [8] B. Zheng, D. L. Lee, \Semantic Caching
   As a result of some simulations, a prob-                  in Location-Dependent Query Process-
lem is revealed that pages which have many                   ing," The 7th International Symposium on
kinds of geowords tend to ranked higher.                     Spatial and Temporal Databases, Springer-
                                                             Verlag, LNCS 2121, pp. 97-113, 2001.
In order to nd variable pages without
                                                   8
                                                  117