=Paper=
{{Paper
|id=None
|storemode=property
|title=Caching Schema for Mobile Web information Retrieval
|pdfUrl=https://ceur-ws.org/Vol-702/paper11.pdf
|volume=Vol-702
|dblpUrl=https://dblp.org/rec/conf/www/LeeGKT02
}}
==Caching Schema for Mobile Web information Retrieval==
Caching Schema
for Mobile Web Information Retrieval
R. Leey, K. Goshimay, Y. Kambayashiy and H. Takakuraz
Graduate School of Informatics, Kyoto Universityy
Data Processing Center, Kyoto Universityz
Sakyo Kyoto 606-8501, Japan
fryong, gossy, yahikog@db.soc.i.kyoto-u.ac.jp,
takakura@rd.kudpc.kyoto-u.ac.jp
Abstract
The web cache management in mobile devices becomes to be an important problem.
In this paper, we discuss that the traditional cache method LRU(Least Recently Used)
is not sucient in mobile environments. Since the data related to the places already
visited by the mobile user may not be used again, a user behavior in the mobile en-
vironments should be considered. In another aspect of web data uses, we are often
connecting to the web with the wired broadband networks. Before traveling with the
mobile devices that are limited in the storage space and the power, carrying out the web
data in the mobile cache will be a good strategy to reduce the e orts in mobile-based
web information retrieval. In the pre-fetching of web data, we can improve the cache
eciency signi cantly by dealing with metadata such as selected URL's (with proper
keywords) instead of taking out the whole web contents. In this paper, caching algo-
rithms considering web contents, metadata and user behavior history are developed.
In order to determine priority of web pages, word relationships obtained from a volume
of web pages are used.
1 Introduction rithms to determine the priorities of contents
in the cache. For example, in the traditional
One of the serious bottle-necks of mobile cache, data used recently will have high pri-
systems is slow communication speed. If ority. In mobile applications, if a user vis-
required information is stored in the cache ited a place, information related to the place
of the mobile system, it will be retrieved may not be required later. As the cache size
rapidly. As cache size is limited in mobile is limited we will store URL's instead of web
systems, we need to develop proper algo- contents if the priority is not very high.
rithms for such a purpose. If a user is located at place A, there are
In this paper, we will discuss priority com- some possibility to visit place B, if both are
putation algorithms for mobile systems to be related. Strength of relationships is calcu-
used for sight-seeing. Although the applica- lated by the number of appearance of (A,B)
tion is limited to identify the problems, it will pair in one paragraph (or one web page).
be rather easy to extend the result to other Usually the pair (A,B) appear frequently if
cases. they are located in a short distance. There
Compared with traditional CPU cache and are, however, cases they are far a part. In
web cache, we have to develop di erent algo- such cases these locations are semantically
1
110
related (for example, having similar nature). we can use complicated algorithms for web
In order to nd out relationship among cache[1, 2].
words, we rst classi ed words into G-
words(geo-words, related to location names) 1. Data sizes distribute from very small to
and N-words(non-geo-words). We have col- very large(for example, video).
lected 2 million web pages related to Kyoto 2. We can use disks for cache, so the cache
City in Japan, which are the basis of deriving size can be rather large.
word relationships.
In Section 2, we will discuss the character- 3. Update is performed at web pages.
istics of mobile cache algorithms by compar- Some cache contents will become out-
ing other typical cache algorithms. Section 3 of-dates easily.
shows organization of mobile cache for travel 4. There are some related pages (shown by
guide as example. How to determine prior- links).
ities of cache contents using word relation-
ships is discussed in Section 4. Consideration 5. There are several di erent usage pat-
of web environments is discussed in Section terns, which will appear periodically.
5. Section 6 shows some problems found by For example, usage pattern from 9 a.m.,
simulation. just before 5 p.m., and nights may be
di erent.
2 Characteristics of Mobile The most serious problem is size[2, 4]. If
Cache we put one large data object in the cache,
other data cannot be stored in the cache. We
In this section we will compare traditional need to consider the size beside the recency
CPU cache, web cache, and mobile cache to treated by LRU. Furthermore, because of the
be discussed in this paper, to identify prob- above 5, we need to adjust usage patterns. A
lems of mobile cache. simple way is to use popularity (how many
times a page is used in 24 hours).
2.1 Traditional CPU Cache We have developed a complicated web
Requirements and characteristics for CPU cache algorithm considering size, recency and
cache are as follows. popularity[3].
1. All the data are equal size. 2.3 Cache for Mobile Systems
2. Cache size is very small. Mobile systems usually have a small amount
of cache and data usage patterns is quite dif-
3. Update of data is performed in cache. ferent from the above two cases[6, 8].
4. In many case data are accessed sequen- Usually cache is used to utilize part data.
tially. Requirement for mobile systems is quite dif-
ferent. For example, a person visit a famous
As speed of CPU is very high, we need to sight-seeing spot, the map to approach the
use very ecient algorithm for cache replace- place may not be used again, although it may
ment. LRU(Least Recently Used) is very be used frequently before. Another example
popular, since the computational overhead is that all the lunch information will not be
is very low and reasonably good results are utilized after the user takes a lunch.
obtained[7]. Since the communication speed is very
slow, the cache system have to store data to
2.2 Web Cache be used near future, by predicting user's be-
havior. For example, a person is going to
Unlike CPU, the communication speed is a some place by a car, parking information
very slow for the case of web cache, thus near the place should be stored in advance.
2
111
As a cache size is small, we may have to 3.1 Cache Contents
store URL's instead of contents. To nd As a strategy to use contents in mobile en-
proper URL's, keywords for each URL are vironment, there are the following three pat-
required. terns.
As a summary, requirements and charac-
teristics of mobile cache are as follows. 1. Storing selected web contents into mo-
bile devices
1. The cache size is rather small. If all necessary web contents are stored
2. Cache algorithm should use prediction in cache, we do not need to use the
information of the owner's behavior. web. Communication cost will be re-
duced. Since some web contents may be
If the data object is used and updated frequently, the user cannot set
it is predicted not to be used the latest information of such pages by
again, contents will be erased from this method.
the cache. The URL and some 2. Downloading all necessary web informa-
meta information (usage informa- tion when requested
tion) should be stored in the cache, Conventional cellular phones use this
unless usage is sure to be never oc- method, since they cannot store a large
curred again. amount of data.
By predicting future user behavior,
we can order the URL's stored in 3. Storing metadata which help retrieval
the cache using metadata attached necessary web pages
to them. Contents of important Instead of storing all the web contents,
URL's are retrieved in advanced, we can store only URL's in order to re-
since the communication speed is duce the storage cost. Web page selec-
slow. tion is performed before getting the con-
tents. We can store a large number of
3. By the current web page used by the URL's for small cache area.
user, we can add URL's from its link According to the travel plan, we can de-
information if required. termine the priority of web pages.
For traveling purpose, before travel we can Web pages with the top level priority
store URL's, which may be required during Contents of these web pages are actu-
the travel. By actual usage of web pages we ally stored in the cache, if they will
have to modify the URL list by the above 3. not change frequently. Otherwise, only
URL's are stored.
3 Mobile Travel Guide Web pages with the second level priority
Only parts of web contents are stored
In this section, we will discuss functional re- with URL's. These partial contents
quirements for a mobile travel guide support are used to nd proper URL's. Link
system. Mobile environment has many con- information may be also stored in
straints such as limited storages, slow com- cache. For partial contents, we can use
munication, poor user interfaces and slow frequently appearing keywords and/or
operation speed. To overcome these prob- the top parts (including the titles) of
lems, we will discuss generalized cache algo- pages.
rithms for mobile systems. Unlike conven-
tional cache, metadata are also cached and Web pages with the third level priority
user behavior is re ected to cached contents Only URL's are stored. Figure 1 shows
as discussed in Section 2. the organization of the system to be
3
112
discussed in this paper. According to ment with enough bandwidth. This phase
the user behavior history, priorities of consists of two kinds of works. One is to de-
web pages are dynamically modi ed. cide visiting places in the target area. The
We use word relationships to determine other is to download data to be used oine
the priority. which will be needed for guide.
First, a user can get knowledge of the
places by browsing web pages, and determine
his objective spots and visiting order.
Then, required metadata are stored into
the mobile device. The required metadata
are
lists of G-words, keywords, and URL's,
and their relationships
G-words of user's destination and key-
words of his interests
initial values of user status (e.g. money
conditions, previously visited, etc.)
Figure 1: System Organization In addition, web page contents themselves
are downloaded to store into web page cache.
That reduces access frequency of mobile re-
trieval.
3.2 System Functions
The system has functions of a web page 3.2.2 Retrieval-and-Guide Phase
cache management and a dynamic travel
guide, using metadata such as relationships In retrieval-and-guide phase, two functions
of geowords or keywords derived from the are provided:
web, and user environment parameters. Two
phases are considered in the system, planning web page cache management based on
phase and retrieval-and-guide phase. page priority ranking
Planning phase is performed at home be-
fore the departure of a tour. In this phase, active dynamic travel guide and page
the system works to support of surveying pre-fetching
destination area by retrieving web pages, de-
cision making of target spots and visiting Active dynamic travel guide is a function
order, and storing metadata (relations, user to recommend the user web pages related to
status) into the mobile devices. the current location, interesting keywords or
On the other hand, retrieval-and-guide his other conditions. This function consid-
phase is during the travel in outdoors. This ers users who have not well prepared or un-
phase works for ecient management of web expected plan changes. Guide selects pages
page cache based on page priority ranking, which have high priority in the cache and
and active dynamic travel guide by sugges- show the user, as if it says \Why don't you
tion of closely related web pages to the user. visit this spot written on this page?" In this
way, the user can nd the next destination
3.2.1 Planning Phase suitable to him. We regard this function as
a kind of guide agent. Moreover, the speed
In planning phase, we assume that the user of retrieval can be improved by pre-fetching
can use the wired communication environ- the target pages.
4
113
4 A Page Priority Ranking
Algorithm with Geograph-
ical Relationships
In this section, we will discuss a replace-
ment algorithm for web page cache which
stores web contents in mobile environment.
The algorithm calculates the priority rank-
ing of pages based on user environment and
makes low priority pages to replace. Though
various factors (parameters) can be used as Figure 2: G-G model
user environment, we rst introduce a cache
model based on only location information for current user and possibility to be accessed
(geowords), and we will extend to the model in the near future, based on weight values of
using keywords representing user's interests G-G relations.
in the next step.
4.3 Priority Computation for G-
4.1 Word Relationships words
From the web pages related to Kyoto, we To describe the algorithm based on above-
have derived relationship among words[5]. mentioned basic policies, we introduce the
Words are classi ed into G-words(geo- following de nitions.
words, location names) and N-words(non-
geo-words). If two words are appeared fre- De nition 1
quently in one paragraph, or one web page, ( i j ) is a weight of G-G rela-
W eight g ; g
we assume that there are strong relationships tion between geoword i and j . If there
g g
among the two words we use frequency to are no G-G relations between i and j , g g
show the strength of the relationship. ( i j ) = 0.
W eight g ; g
De nition 2
4.2 G-G Model Given a geoword j related to a page , a
g p
We rst introduce the model using only G-G score of p by gj , Score[gj ](p), is a prior-
relations (relations between geowords). We ity value of determined by j . In G-G
p g
call this model G-G model in the later of the model, [ j ]( ) =
S core g p ( j U ).
W eight g ; g
paper. De nition 3
G-G model has a graph structure shown in Given geowords 1
Figure 2. Nodes in this graph are classi ed n related to a
g ; :::; g
into three groups. The rst group in the left page i, a score set of i ,
p p ( i ),
ScoreSet p
side is a page cache , which is the set of web
P
is a set f [ 1 ]( i )
S core g p [ n ]( i )g.
; :::; S core g p
pages i . The second group in the center is
p
Using these de nitions, the algorithm of
a geoword set , called index of . Each j
G P g
page priority ranking in G-G model is de-
represent regional references of each of the scribed as follows.
web pages. The third group in the right side
is a single node, user's current location. 1. Calculate ( i ) for each i in .
S coreS et p p P
There exist weighted G-G relations be- (In each ( i ), scores are sorted
ScoreSet p
tween U and each j . A pair with no G-G
g g
in descending order)
relations is treated as a zero-weighted G-G
relation. 2. Sort the pages i by their score sets with
p
The purpose of this algorithm is to rank higher score. (i.e. by descending lexico-
the pages in cache in the order of priority
p P graphic order)
5
114
This order is the priority ranking of web pages will be modi ed by user's history of
pages in P . visits.
Another example is that pages about
4.4 Keywords downtown which have a lot of restaurant in-
formation are likely to be more important at
Geowords in G-G Model can be equivalently lunch or dinner time. How much money a
replaced by keywords, which represent inter- user has, will change the priority of restau-
ests of the user. Moreover, an integrated rants, high grade or fast foods.
model GK-GK Model can be introduced as il- These examples show that it is important
lustrated in Figure 3. By using this extended to apply parameters for satisfying priority
model, the same algorithm can be applied for ranking re ecting various user environments.
calculating page priority to which both cur- In this section, we discuss methods to intro-
rent location and a user interest are consid- duce these parameters in addition to rela-
ered. tionship to the ranking algorithm.
5.1 Score Control with Parameters
In order to control value of score, we de ne
Score Control Parameter as follows.
De nition 4
Score Control Parameter PARAMi is
expressed by the triplet (Vi ; fi ; i ).
Vi is a set of criterion values for increasing
and decreasing scores. Each element of Vi
Figure 3: GK-GK Model means, for instance, user's condition such as
history of visit or money conditions etc., and
values of external environments such as time
or weather etc. fi is a function which gives
5 User Environments increment of score, which is determined by
an index c (either geowords or keywords) and
In the previous section, we discussed a cache a set of criterion values Vi . i is a constant
model and an algorithm using geoword sand number representing a weight of PARAMi .
keywords as user status. However, further- By the parameters de ned in above, follow-
more consideration is required in order to ing de nition can be introduced.
judge page priority ranking re ecting users'
requests more e ectively. De nition 5
Assume that a user is trying to retrieve Score by index c of page p,
and get information about his current or FixedScore[c](p) is
neighborhood location from the web pages in
order to his next visiting place. He queries
the system the pages related to his current +
X
FixedScore[c](p) = Score[c](p)
i fi(c; Vi )
location and interest. But when he checked i
the returned web page, it is about the lo-
cation where he visited just now. He will be When executing the above algorithm,
disappointed because he does not to want in- FixedScore[c](p) should be used as elements
formation about places which he had visited. of score sets instead of scores. By adding the
In this case, it is natural that a user hopes value f (c; V ) determined by c and V to the
to know the places where he has never vis- score, result ranking of priority can be con-
ited. That is to say, priority of cached web trolled to re ect user's demand more. i is
6
115
applied this case.
In order to calculate temporal priority
changes, present time t should be set as an
element of V . To derive the function f , an
appropriate database which provides a corre-
spondences table between geoword and avail-
able time period is prepared.
Money Conditions It is expected that
electrical payment using mobile devices will
be more popular in the near future. In
such environments, since users' money can
Figure 4: Score Control with User Parame- be managed with their mobile equipments,
ters that is also possible to be used for parame-
ters of the page priority ranking algorithm.
a weight value which indicates how strongly One of its advantages is, for example, a guide
each PARAMi should a ect on the scores. for rich users to area which has many lux-
Concrete examples of user parameters for ury shops or facilities for which expensive en-
score control are as follows. trance fee is required. Instead, users without
much money can be lead to reasonable shops
or spots.
Order of Visit For a user at a particular For this purpose, the user's available
time, information about a place which he has money m is set as V . Incremental function f
ever visited tend to be less important. Fur- is prepared a database providing correspon-
thermore the later he visited a place, the less dences between geowords and money ranges.
importance and motivation to visit.
In order to calculate, geowords sequence of
visiting g1 ; g2 ; : : : ; gn and the past visited
f g
6 Simulation
history g;1 ; g;2 ; : : : ; g;m are added to the
f g
We had made simulations of ranking web
set of user status values V . The function f is pages in order to evaluate our algorithm.
de ned to be maximum at the nearest future, First, random 100, 500, and 1000 pages
minimum at the latest, and the farther from (URL's) related to Kyoto City are prepared
present the closer to 0. For example, given as target web page sets to rank. They are ex-
present time t0 and the place (geoword) vis- tracted from our web page collection which
ited at time t be gt , the incremental value is have been gathered for our developing re-
represented as following formula: gional search engine. Each of the pages has
its related geowords.
f (gt ; V ) = et;1 t0 Next, assumed current locations of the
mobile user are determined. we chose two
Temporal Changes of Priority Ge- geowords Ginkakuji and Gion at random,
owords of downtown, such as Kawaramachi which are names of regions in Kyoto.
and Gion for instance, are likely to have Then these page sets, their related ge-
higher importance at lunch or dinner time owords, and user's current locations are
because users wants to have a meal at that adapted to our page priority ranking algo-
time. That fact has few concerns with re- rithm. The most basic model G-G model
lations between regions. Moreover, touristic (considering only relations of geowords) is
facilities (e.g. museums or zoos) usually have used. Six trials are done by the combination
business hours, therefore we must consider of three page sets and two user locations.
whether they are available or not now. Tem- As a result, we succeeded to place pages
poral events such as festivals are also to be about current locations or neighborhood to
7
116
the upper ranking. However, an important many geowords but have rich information,
tendency was found. In every trial of rank- we must consider more semantics and mean-
ing, similar pages are found with high prior- ings of pages, such as geographical cover-
ity, for example, age/popularity or degree of details. That is
one of our future work.
www.momonga.org/kyoto/spot-j.html
(lists of attractive spots with beautiful Acknowledgment
owers in Kyoto) This work has been supported by 'Universal
gourmet.yahoo.co.jp/gourmet/restaurant/
Design in Digital City' Project in CREST of
Kinki/Kyoto/list/area genre/260005 0207
JST (Japan Science and Technology Corpo-
(lists of restaurants in Kyoto) ration).
www.kyoto-np.co.jp/kp/topics/2000jun/
bk index.html
References
(news back numbers in Kyoto) [1] M. Abrams, C. R. Standridge, G. Abdulla,
S. Wiliams, and E. A. Fox, \Caching Prox-
Common characteristics of these pages are ies: Limitations and Potentials,\ Proc. of
that they includes many kinds of geowords the 4th International WWW Conference,
in Kyoto. We should exclude such pages to 1995.
get better results. [2] C. Aggarwal, J.L. Wolf, and P.S. Yu,
In order to avoid such cases, we must con- \Caching on the World Wide Web,\ IEEE
sider more semantics and meanings of pages, Transactions on knowledge and data engi-
such as geographical coverage/popularity or neering, 11(1), 1999.
degree of details. [3] K. Cheng, Y. Kambayashi, \LRU-SP: A size
adjustable popularity-aware LRU replace-
ment algorithm for Web Caching,\ IEEE
7 Conclusion COMPSAC, IEEE CS Press, pp. 48-53,
2000.
In this paper, we proposed a web page [4] R. Karedla, J. S. Love, and B. G.
cache management method in mobile envi- Wheery, \Caching Strategies to Improve
ronments. Disk System Performance,\ IEEE Com-
An application system for travel guide puter, 27(3):38-46, 1994.
support using the cache management meth- [5] R. Lee, H. Takakura, and Y. Kambayashi,
ods is suggested. Metadata are stored in \Visual Query Processing for GIS with Web
mobile devices at planning phase, and web Contents,\ Proc. of the 6th IFIP Work-
retrieval during travel is supported by ef- ing Conference on Visual Database Systems,
cient cache management at retrieval-and- May 2002. (to appear)
guide phase. The system provides another [6] Q. Ren, M.H. Dunham, \Using Seman-
function, an active and dynamic travel guide. tic Caching to Manage Location Depen-
A priority ranking algorithm for web pages dent Data in Mobile Computing," The 6th
is suggested for the system. It uses relation- Annual International Conference on Mobile
ships between users' current locations and Computing and Networking (MobiCom'00),
geographical information of web pages. This August 2000.
algorithm can be extended into using key- [7] A. Silberschatz and P. B. Galvin, \Oper-
words of interests, or other various user pa- ating Systems Concepts,\ Addison Wesley,
rameters such as history of visit, time, money Reading, MA, fourth edition, 1994.
conditions, and so on. [8] B. Zheng, D. L. Lee, \Semantic Caching
As a result of some simulations, a prob- in Location-Dependent Query Process-
lem is revealed that pages which have many ing," The 7th International Symposium on
kinds of geowords tend to ranked higher. Spatial and Temporal Databases, Springer-
Verlag, LNCS 2121, pp. 97-113, 2001.
In order to nd variable pages without
8
117