<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">A review of web crawling approaches</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Elda</forename><surname>Xhumari</surname></persName>
							<email>elda.xhumari@fshn.edu.al</email>
							<affiliation key="aff0">
								<orgName type="department">Department of Informatics</orgName>
								<orgName type="institution">University of Tirana</orgName>
								<address>
									<addrLine>Boulevard &quot;Zogu I&quot;</addrLine>
									<postCode>1001</postCode>
									<settlement>Tirana</settlement>
									<country key="AL">Albania</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Izaura</forename><surname>Xhumari</surname></persName>
							<email>izaura.xhumari@fshn.edu.al</email>
							<affiliation key="aff1">
								<orgName type="department">Department of Informatics</orgName>
								<orgName type="institution">University of Tirana</orgName>
								<address>
									<addrLine>Boulevard &quot;Zogu I&quot;</addrLine>
									<postCode>1001</postCode>
									<settlement>Tirana</settlement>
									<country key="AL">Albania</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">A review of web crawling approaches</title>
					</analytic>
					<monogr>
						<imprint>
							<date/>
						</imprint>
					</monogr>
					<idno type="MD5">0BA0F4F07E124461823D0D6DED00C908</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-24T07:08+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>Web crawler</term>
					<term>Algorithms</term>
					<term>Types of web crawlers</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>Websites are becoming ever richer in information, in many different formats. The data such sites hold today runs to millions of terabytes, yet not all of the information on the net is useful. One way to make internet browsing as efficient as possible for the user is to employ a web crawler. This study presents the web crawler methodology: the first steps of its development, how it works, the different types of web crawlers, the benefits of using them, and a comparison of their operating methods, including the advantages and disadvantages of the algorithms they use.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">Introduction</head><p>The World Wide Web is a large collection of data, and one that continues to grow day by day. Using the internet to access information on the World Wide Web has become an important part of everyday life. Limited bandwidth, storage capacity, and computing resources, combined with the rapid growth of the Web, have created unforeseen scaling challenges for search engines. The two most important characteristics of the Web, the large volume of data and the speed at which it changes, make web crawling difficult, as a large number of pages are added, changed, and deleted every day. Although search engine technology has scaled dramatically to keep pace with the growth of the Web, general-purpose search engines and crawlers face the following limitations: it is impossible for them to index and analyze all pages and keep their search indexes up to date; they may return hundreds or more links for a user's query, and because the query may be misinterpreted, the pages behind those links may not be closely related to it; they may not satisfy queries issued with different backgrounds, purposes, and time frames; and dynamic content on the Web, such as news and financial data, grows and changes frequently, while many search engines can take up to a month to refresh their indexes, so query results may no longer be valid by the time the request is made. A technology is therefore needed that enables fast crawling, assembles web pages of the highest possible content and quality, and keeps those pages up to date. This "problem" can be solved using indexes built by a web crawler. A web crawler, otherwise known as a "web spider", is a program that browses the World Wide Web by "clicking" on any link it finds and automatically collects the information it encounters.
However, building a large index of web pages is not the only web crawler application <ref type="bibr" target="#b0">[1]</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">How does it work?</head><p>The crawler maintains a list of unvisited URLs called the frontier. The list is initialized with URLs provided by a user or another program. Each crawl cycle involves selecting a URL from the list, retrieving the corresponding page via HTTP, analyzing it to extract URLs and specific information, and finally adding the newly discovered, unvisited URLs to the frontier. Before being added to the list, these URLs may be assigned a score reflecting the benefit expected from visiting the corresponding page. The crawl process may end once a certain number of pages have been crawled. If the crawler is ready to visit another page and the frontier is empty, the situation signals a dead end for the crawler: with no new pages to visit, it stops.</p></div>
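<div xmlns="http://www.tei-c.org/ns/1.0"><p>The crawl cycle just described can be sketched in a few lines of Python. This is a minimal, illustrative loop, not any particular crawler's implementation: the injectable fetch callable and the regex-based link extraction are simplifying assumptions.</p><p>
```python
from collections import deque
from urllib.parse import urljoin
import re

def crawl(seeds, fetch, max_pages=50):
    """One frontier-driven crawl cycle after another: pop a URL, fetch
    its page, extract links, and enqueue unseen ones. Stops after
    max_pages pages or when the frontier runs dry (the 'dead end' case).
    fetch(url) is any callable returning the page's HTML."""
    frontier = deque(seeds)
    visited = set()
    pages = {}
    while frontier and max_pages > len(pages):
        url = frontier.popleft()
        if url in visited:
            continue
        visited.add(url)
        try:
            html = fetch(url)
        except Exception:
            continue  # unreachable page: just move on
        pages[url] = html
        # naive href extraction; a production crawler would parse the HTML
        for link in re.findall(r'href="([^"]*)"', html):
            absolute = urljoin(url, link)
            if absolute not in visited:
                frontier.append(absolute)
    return pages
```
</p></div>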
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">Types of web crawler</head><p>Different types of web crawlers are available, depending on how the web pages are crawled and how future web pages are retrieved and accessed. Some of them are as follows.</p><p>A. Incremental Crawler An incremental web crawler is one of the traditional crawlers: it continually updates an existing set of downloaded pages instead of restarting the crawling process from scratch each time. This requires some way to determine whether a page has changed since it was last downloaded. Pages can appear multiple times in the crawl order, and crawling is an ongoing process that conceptually never ends. To keep the content of downloaded web pages up to date, an incremental web crawler interleaves revisits of previously downloaded pages with first visits to new pages <ref type="bibr" target="#b1">[2]</ref>. The goal is to achieve freshness and coverage at the same time. The advantage of an incremental web crawler is that only valuable data is delivered to the user; network bandwidth is saved and data enrichment is achieved.</p></div>
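<div xmlns="http://www.tei-c.org/ns/1.0"><p>One common way an incremental crawler can decide whether a previously downloaded page has changed is to compare content fingerprints between visits. The hash-based comparison below is an illustrative assumption, not a method prescribed by the text.</p><p>
```python
import hashlib

def page_fingerprint(html):
    """Hash of the page body: a cheap proxy for 'has this page changed
    since it was last downloaded?' (assumes a byte-level difference is
    an acceptable stand-in for a content change)."""
    return hashlib.sha256(html.encode("utf-8")).hexdigest()

def refresh(store, fetch):
    """Incremental pass: re-download every known page and record only
    those whose content actually changed, instead of recrawling the
    whole collection from scratch."""
    changed = []
    for url, old_fp in store.items():
        new_fp = page_fingerprint(fetch(url))
        if new_fp != old_fp:
            store[url] = new_fp
            changed.append(url)
    return changed
```
</p></div>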
<div xmlns="http://www.tei-c.org/ns/1.0"><head>B. Form Focused Crawler</head><p>The Form Focused Crawler deals with the sparse distribution of forms on the Web. It avoids crawling through unproductive links by restricting the search to a specific topic, learning the characteristics of links and pages that lead to pages containing searchable forms, and using appropriate stopping criteria. The crawler uses two classifiers, one for pages and one for links, to guide its search; a third classifier, the form classifier, is then used to filter out useless forms.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>C. Focused Crawler</head><p>A Focused Crawler collects documents that are specific and relevant to the given topic. Because of the way it works, this crawler is sometimes also known as a Topic Crawler. A focused crawler tends to download pages that are related to one another, determining whether a given page is similar to the specified topic. One of its advantages is economy in hardware and network resources: it reduces the amount of network traffic, logging, and downloads <ref type="bibr" target="#b2">[3]</ref>. A focused crawler searches, acquires, indexes, and maintains pages for specific groups of topics that represent a relatively narrow segment of the web. It is driven by a classifier that learns to recognize relevance from examples embedded in a taxonomy, and a distiller that identifies the current high-priority points online.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Figure 2: General architecture of Focused Web Crawler</head><p>Focused crawling is a new approach to increasing the accuracy of expert internet search. An ideal focused crawler would download only those pages that are related to the topic, ignoring all others, and would estimate the probability that a link leads to topic-relevant content before downloading it. A focused crawler has three main components: a classifier that makes relevance judgments on crawled pages to decide whether to expand their links, a distiller that measures the centrality of crawled pages to determine visit priorities, and a crawler with dynamically reconfigurable priority controls governed by the classifier and distiller.</p><p>The focused crawler also aims to provide a simple alternative for overcoming the problem of pages that are relevant to the topic in question but ranked low. The idea is to recursively execute an exhaustive search up to a certain depth, starting from the relatives of a highly ranked page <ref type="bibr" target="#b3">[4]</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>D. Parallel Crawler</head><p>As the size of the internet grows, it becomes difficult to retrieve the entire web, or a major portion of it, with a single process. Therefore, many search engines run multiple crawling processes in parallel so that the download rate is maximized. This kind of crawler is known as a parallel crawler <ref type="bibr" target="#b4">[5]</ref>. In other words, when multiple crawlers run in parallel, they are referred to as parallel crawlers. A parallel crawler consists of multiple crawling processes, referred to as C-procs, which can run on a network of workstations <ref type="bibr" target="#b5">[6]</ref>. Parallel crawlers rely on page freshness and page selection. A parallel crawler may run on a local network or be distributed across geographically different locations. Parallelization of the crawling system is extremely important from the point of view of downloading documents in a reasonable amount of time.</p></div>
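<div xmlns="http://www.tei-c.org/ns/1.0"><p>A common way to divide work among parallel crawling processes, sketched below under the assumption of host-based partitioning (one of several possible schemes, not mandated by the text), is to hash each URL's host name so that all pages of one site land on the same C-proc.</p><p>
```python
from urllib.parse import urlsplit
import hashlib

def assign_process(url, n_procs):
    """Partition URLs among n_procs crawling processes (C-procs) by
    hashing the host name. Keeping each site on one process also keeps
    per-host politeness rules easy to enforce."""
    host = urlsplit(url).netloc
    digest = hashlib.md5(host.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % n_procs
```
</p></div>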
<div xmlns="http://www.tei-c.org/ns/1.0"><head>E. Distributed Crawler</head><p>Distributed web crawling is a distributed computing technique. Many crawlers work together on the crawling task in order to achieve as much web coverage as possible <ref type="bibr" target="#b6">[7]</ref>. Since the nodes are geographically distributed, a central server manages their communication and synchronization. It mainly uses the PageRank algorithm to increase its efficiency and search quality. The advantage of a distributed web crawler is that it is resilient to system crashes and other events, and can be adapted to many crawling applications.</p><p>To design an efficient distributed web crawler, the task must be divided among multiple machines working as a synchronized process. Large websites should be assigned individually across the network, with fair and reasonable opportunities for synchronous access. Synchronous distribution also saves network bandwidth resources <ref type="bibr" target="#b3">[4]</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">Web crawling algorithms</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>i. Breadth First Search</head><p>Breadth First Search starts with a small set of pages and then explores other pages by following their links in breadth-first order. In practice, websites are not traversed in strictly breadth-first order; a variety of policies can be used, for example crawling the most important pages first. This method is used by many search engines, and it balances the load between servers. The breadth-first algorithm works level by level: it starts at the root URL and searches all the neighbouring URLs at the same level. If the desired URL is found, the search terminates; if not, the search proceeds down to the next level, and the process repeats until the goal is reached. If all URLs are scanned but the objective is not found, a failure is reported. Breadth First Search is generally used when the objective lies in the shallower parts of a deep tree <ref type="bibr">[8]</ref>.</p></div>
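<div xmlns="http://www.tei-c.org/ns/1.0"><p>The level-by-level search can be sketched over a link graph represented as an adjacency dictionary (a simplification: a real crawler discovers the graph by fetching pages).</p><p>
```python
from collections import deque

def bfs_crawl(graph, root, goal):
    """Level-by-level search over a link graph (adjacency dict).
    Returns the depth at which goal is found, or None on failure."""
    frontier = deque([(root, 0)])
    seen = {root}
    while frontier:
        url, depth = frontier.popleft()
        if url == goal:
            return depth
        for nxt in graph.get(url, []):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, depth + 1))
    return None  # every URL scanned, objective not found
```
</p></div>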
<div xmlns="http://www.tei-c.org/ns/1.0"><head>ii. Depth First Search</head><p>Depth First Search is an algorithm for traversing or searching tree or graph data structures. It systematically examines the search space starting from the root node and penetrating deeper through the child nodes. If there is more than one child, priority is given to the leftmost child, and the algorithm descends until no more children are available; it then backtracks to the nearest unexplored node and proceeds in a similar manner. The algorithm ensures that every edge is visited exactly once. It is suitable for deep search problems, but when the branches are large this algorithm can end up in an endless loop.</p></div>
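<div xmlns="http://www.tei-c.org/ns/1.0"><p>The same traversal with an explicit stack, including a depth cap as one possible guard against the endless descent the text warns about (the cap is an added assumption, not part of the classic algorithm):</p><p>
```python
def dfs_crawl(graph, root, goal, max_depth=10):
    """Depth-first traversal with an explicit stack; max_depth bounds
    the descent so cycles or huge branches cannot loop forever."""
    stack = [(root, 0)]
    seen = set()
    while stack:
        url, depth = stack.pop()
        if url == goal:
            return True
        if url in seen or depth >= max_depth:
            continue
        seen.add(url)
        # push children right-to-left so the leftmost child is explored first
        for nxt in reversed(graph.get(url, [])):
            stack.append((nxt, depth + 1))
    return False
```
</p></div>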
<div xmlns="http://www.tei-c.org/ns/1.0"><head>iii. Best First Search</head><p>Best-first algorithms are often used to find good search paths. Best First Search is a search algorithm that roams a graph starting from the most promising node, selected according to a specified rule. The basic idea is that, given a frontier of URLs, the best URL is selected according to some evaluation criterion such as precision, recall, or F-score. In this algorithm, the URL selection process is driven by the lexical similarity between the topic keywords and the URL's source page: the similarity between a page and the topic keywords is used to estimate the relevance of all of the page's outbound links.</p></div>
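<div xmlns="http://www.tei-c.org/ns/1.0"><p>A minimal best-first frontier can be built on a priority queue. The keyword-overlap score below is a crude stand-in for the lexical similarity measure the text describes; the page representation (text plus outbound links) is an illustrative assumption.</p><p>
```python
import heapq

def similarity(text, keywords):
    """Fraction of topic keywords that occur in the text: a simple
    stand-in for lexical similarity."""
    words = set(text.lower().split())
    hits = sum(1 for k in keywords if k in words)
    return hits / len(keywords)

def best_first(pages, seeds, keywords, limit=10):
    """Always expand the URL whose source page scored highest;
    outbound links inherit the source page's score.
    heapq is a min-heap, so scores are stored negated."""
    heap = [(-1.0, s) for s in seeds]
    heapq.heapify(heap)
    seen = set(seeds)
    order = []
    while heap and limit > len(order):
        neg_score, url = heapq.heappop(heap)
        text, links = pages.get(url, ("", []))
        order.append(url)
        score = similarity(text, keywords)
        for nxt in links:
            if nxt not in seen:
                seen.add(nxt)
                heapq.heappush(heap, (-score, nxt))
    return order
```
</p></div>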
<div xmlns="http://www.tei-c.org/ns/1.0"><head>iv. Fish-Search Algorithm</head><p>The main principle of the algorithm is as follows: it takes an initial URL and a search query as input, and dynamically builds a priority list (initialized with the initial URL) of the next URLs (referred to as nodes) to be explored. In each step the first node is removed from the list and processed. As the text of each document becomes available, a scoring component analyses it and assesses whether it is relevant or irrelevant to the search query (value 1 or 0); based on that result, a heuristic decides whether or not to pursue the search in that direction. Whenever a document is retrieved, it is scanned for links. The nodes reached by these links are assigned a depth value: if the parent is relevant, the depth of its children is set to a predetermined value; otherwise, the depth of the children is set to one less than the depth of the parent. When the depth reaches zero, that direction is cut off and none of its children are added to the list <ref type="bibr" target="#b7">[9]</ref>.</p></div>
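<div xmlns="http://www.tei-c.org/ns/1.0"><p>The depth-budget heuristic can be sketched as follows. The binary substring-match relevance test and the in-memory page map are illustrative assumptions, kept deliberately simple to show only the depth propagation rule.</p><p>
```python
from collections import deque

def fish_search(pages, seed, query, width=3, depth_limit=3):
    """Fish-Search sketch: children of a relevant page (binary score 1)
    get a fresh depth budget; children of an irrelevant page inherit
    depth - 1, and a depth of zero kills that direction."""
    relevant = lambda text: 1 if query in text else 0
    frontier = deque([(seed, depth_limit)])
    seen = {seed}
    found = []
    while frontier:
        url, depth = frontier.popleft()
        text, links = pages.get(url, ("", []))
        score = relevant(text)
        if score:
            found.append(url)
        child_depth = depth_limit if score else depth - 1
        if child_depth > 0:
            for nxt in links[:width]:
                if nxt not in seen:
                    seen.add(nxt)
                    frontier.append((nxt, child_depth))
    return found
```
</p><p>With depth_limit=2, a chain of two irrelevant pages exhausts the budget, so a relevant page behind them is never reached: exactly the cut-off behaviour described above.</p></div>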
<div xmlns="http://www.tei-c.org/ns/1.0"><head>v. Shark-Search Algorithm</head><p>The Fish-Search algorithm's main flaw is that its relevance computation is too coarse: it knows only 1 and 0, relevant and irrelevant. Secondly, each node's potential score has low precision, taking only three values (0, 0.5, and 1). To address these disadvantages, Michael Hersovici <ref type="bibr" target="#b8">[10]</ref> proposed an improved Shark-Search algorithm, which mainly refines the computation of page-query relevance and the method for computing a node's potential score. In detail:</p><p>-A vector space model is used to compute the relevance between a page and the user's query. -The information given by the anchor text near each hyperlink is taken into account, and its relevance to the user's query is computed. -Both factors are combined in the formula for a child node's potential score. Through these improvements, the Shark-Search algorithm is considerably more efficient than Fish-Search <ref type="bibr" target="#b1">[2]</ref>.</p></div>
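<div xmlns="http://www.tei-c.org/ns/1.0"><p>The continuous relevance score that Shark-Search substitutes for Fish-Search's binary 0/1 test can be illustrated with cosine similarity over term-frequency vectors (one standard vector space model formulation; the exact weighting used by Shark-Search is not specified here).</p><p>
```python
import math
from collections import Counter

def cosine_similarity(text_a, text_b):
    """Vector-space relevance on term-frequency vectors: 0.0 for no
    shared terms, approaching 1.0 for identical term distributions."""
    va = Counter(text_a.lower().split())
    vb = Counter(text_b.lower().split())
    dot = sum(va[t] * vb[t] for t in va)
    norm = math.sqrt(sum(c * c for c in va.values())) * \
           math.sqrt(sum(c * c for c in vb.values()))
    return dot / norm if norm else 0.0
```
</p></div>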
<div xmlns="http://www.tei-c.org/ns/1.0"><head>vi. Page Rank Algorithm</head><p>In the PageRank algorithm, the web crawler judges the importance of the pages of each website by the total number of links or citations per page. PageRank is calculated from the relatedness between web pages. The website ranking is computed as:</p><formula xml:id="formula_0">𝑃𝑅(𝐴) = (1 − 𝑑) + 𝑑 ( 𝑃𝑅(𝑇 1 ) / 𝐶(𝑇 1 ) + ⋯ + 𝑃𝑅(𝑇 𝑛 ) / 𝐶(𝑇 𝑛 ) ),<label>(1)</label></formula><p>where PR(A) is the PageRank of page A, d is the damping factor, and 𝑇 1 , …, 𝑇 𝑛 are the pages that link to A, with C(𝑇 𝑖 ) the number of outbound links on page 𝑇 𝑖 .</p><p>To find the PageRank of a page A, called PR(A), you must first find all the pages that link to A. If a page 𝑇 1 links to A, then C(𝑇 1 ) gives the number of outbound links on page 𝑇 1 . The same procedure is applied to pages 𝑇 2 , 𝑇 3 , and all other pages that link to A, and the sum of their contributions gives the PageRank of the page <ref type="bibr" target="#b9">[11]</ref>. Table <ref type="table">1</ref> summarizes the advantages and limitations of the web crawling algorithms.</p></div>
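<div xmlns="http://www.tei-c.org/ns/1.0"><p>Formula (1) can be evaluated iteratively; the sketch below applies it to a tiny hand-made link graph (the graph and the fixed iteration count are illustrative choices).</p><p>
```python
def pagerank(links, d=0.85, iterations=50):
    """Iterative evaluation of formula (1):
    PR(A) = (1 - d) + d * sum of PR(T)/C(T) over pages T linking to A,
    the non-normalised variant given in the text."""
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    pr = {p: 1.0 for p in pages}
    for _ in range(iterations):
        new = {}
        for page in pages:
            inbound = sum(pr[src] / len(out)
                          for src, out in links.items() if page in out)
            new[page] = (1 - d) + d * inbound
        pr = new
    return pr

# Example: B and C each link only to A, while A links back to B,
# so A accumulates the most rank and C (no inbound links) the least.
ranks = pagerank({"A": ["B"], "B": ["A"], "C": ["A"]})
```
</p></div>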
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.">Approach</head><p>A traditional crawler worked simply by extracting static data from HTML code, and until recently most websites could be handled by the same crawling process. Crawling is no longer as simple as it was a few years ago, due to the increasing use of JavaScript frameworks such as Angular, React, and Meteor. Many websites are JavaScript-heavy and generate content through asynchronous JavaScript calls after the page has loaded. These frameworks make developers' lives simpler and provide many benefits for creating dynamic sites. To crawl this type of website, web crawlers use Selenium.</p><p>Selenium is a web browser automation tool originally designed to automate web applications for testing purposes. It is now used for many other applications, such as automating web-based administration tasks and interacting with platforms that do not provide an API, as well as for web crawling.</p><p>Building a focused web crawler with the Selenium tool is a good way to collect useful information. Focused crawling is an approach to increasing the accuracy of expert internet search: an ideal focused crawler would download only the relevant pages, ignoring all others, and would estimate the probability that a link leads to a site related to the specific topic before downloading it.</p><p>One use case for a focused web crawler is extracting financial data. The financial market is a place of risk and instability; it is hard to predict how the curve will go, and sometimes, for investors, one decision can be a make-or-break move. That is why experienced practitioners never lose track of financial data. Financial data, when extracted and analyzed in real time, can provide a wealth of information for investments and trading, and people in different positions scrape financial data for varied purposes.</p></div>
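<div xmlns="http://www.tei-c.org/ns/1.0"><p>Once Selenium has rendered a JavaScript-heavy page, the crawler reads the resulting DOM through the driver's page_source attribute, and link extraction becomes ordinary HTML parsing. The sketch below shows that parsing step using only the standard library; feeding it driver.page_source instead of a raw HTTP response is the assumed Selenium integration point.</p><p>
```python
from html.parser import HTMLParser

class LinkExtractor(HTMLParser):
    """Collects href targets from anchor tags. In a Selenium-based
    crawler the input would be driver.page_source, i.e. the DOM after
    the page's JavaScript has run, rather than the raw HTTP response."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def extract_links(rendered_html):
    parser = LinkExtractor()
    parser.feed(rendered_html)
    return parser.links
```
</p></div>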
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6.">Conclusions</head><p>A web crawler is the essential information-retrieval tool that roams the Web and downloads the web documents that suit the user's needs. Web crawlers are used by search engines and other users to ensure that their databases are regularly kept up to date. This article has presented a review of the different types of crawling technologies and algorithms, and of why "focused crawling" technology is being used. The crawling algorithm is the most important part of any search engine. Focused crawlers use more complex systems and techniques to identify information of high relevance and quality. The searching algorithm is the heart of the search engine system, and the choice of algorithm has a significant impact on the work and effectiveness of the focused crawler and the search engine.</p><p>In conclusion, the focused crawler, compared to other crawlers, is intended for advanced web users: it focuses on a specific topic and does not waste resources on irrelevant material.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Table 1: Advantages and limitations of web crawling algorithms</head><p>Breadth First Search. Advantages: suitable when the solution lies in the shallow part of a deep tree. Limitations: if the solution is far away it consumes time, and it consumes a large amount of memory.</p><p>Depth First Search. Advantages: suitable for deep search problems; consumes very little memory. Limitations: if the branches are large, the algorithm can end in an endless cycle.</p><p>Fish Search. Advantages: helpful in forming the priority list. Limitations: network resource usage is high; fish-search crawlers place a significant load not only on the network but also on web servers.</p><p>Shark Search. Advantages: refines the page-query relevance computation and the potential-score computation. Limitations: network resource usage is high.</p><p>Page Rank. Advantages: the most important pages are returned quickly, since rank is calculated from a page's popularity. Limitations: favours older pages, because a new page, even a very good one, will not have many inbound links unless it is part of an existing website.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Figure 1 :</head><label>1</label><figDesc>Figure 1: The Data Flow of a Crawler</figDesc><graphic coords="2,64.80,235.95,204.40,283.35" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_1"><head>Figure 3 :</head><label>3</label><figDesc>Figure 3: The best-first algorithm pseudo-code</figDesc><graphic coords="4,64.80,464.71,229.15,261.11" type="bitmap" /></figure>
		</body>
		<back>
			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">A framework for dynamic indexing from hidden web</title>
		<author>
			<persName><forename type="first">Hasan</forename><surname>Mahmud</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Moumie</forename><surname>Soulemane</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Mohammad</forename><surname>Rafiuzzaman</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">International Journal of Computer Science Issues</title>
		<imprint>
			<biblScope unit="volume">8</biblScope>
			<date type="published" when="2011">2011</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<monogr>
		<title level="m" type="main">New Focused Crawling Algorithm</title>
		<author>
			<persName><forename type="first">Su</forename><surname>Guiyang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Li</forename><surname>Jianhua</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Ma</forename><surname>Yinghua</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Li</forename><surname>Shenghong</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2004">2004</date>
			<pubPlace>Shanghai; P. R. China</pubPlace>
		</imprint>
		<respStmt>
			<orgName>Shanghai Jiaotong University</orgName>
		</respStmt>
	</monogr>
	<note>New Focused Crawling Algorithm</note>
</biblStruct>

<biblStruct xml:id="b2">
	<monogr>
		<title level="m" type="main">Web Crawling</title>
		<author>
			<persName><forename type="first">Christopher</forename><surname>Olston</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Marc</forename><surname>Najork</surname></persName>
		</author>
		<imprint/>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<author>
			<persName><forename type="first">Gautam</forename><surname>Pant</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Padmini</forename><surname>Srinivasan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Filippo</forename><surname>Menczer</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Crawling the Web (4-6</title>
				<imprint>
			<date type="published" when="2004">2004</date>
		</imprint>
		<respStmt>
			<orgName>Department of Management Sciences School of Library and Information Science, The University of Iowa</orgName>
		</respStmt>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">Web Crawler: A Review</title>
		<author>
			<persName><forename type="first">Dhiraj</forename><surname>Khurana</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Satish</forename><surname>Kumar</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">IJCSMS International Journal of Computer Science &amp; Management Studies</title>
		<imprint>
			<biblScope unit="volume">12</biblScope>
			<biblScope unit="issue">01</biblScope>
			<date type="published" when="2012-01">January 2012</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">Study of Web Crawler and its Different Types</title>
		<author>
			<persName><forename type="first">Trupti</forename><forename type="middle">V</forename><surname>Udapure</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Ravindra</forename><forename type="middle">D</forename><surname>Kale</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Rajesh</forename><forename type="middle">C</forename><surname>Dharmik</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">IOSR Journal of Computer Engineering (IOSR-JCE)</title>
		<imprint>
			<biblScope unit="volume">16</biblScope>
			<biblScope unit="issue">1</biblScope>
			<biblScope unit="page" from="1" to="05" />
			<date type="published" when="2014-02">Feb. 2014</date>
		</imprint>
	</monogr>
	<note>Ver. VI</note>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">Review of Web Crawlers with Specification and Working</title>
		<author>
			<persName><forename type="first">Yugandhara</forename><surname>Patil</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Sonal</forename><surname>Patil</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Junghoo Cho and Hector Garcia-Molina -Effective Page Refresh Policies for Web Crawlersǁ ACM Transactions on Database Systems</title>
				<imprint>
			<date type="published" when="2003">2016. January 2016. 2003</date>
			<biblScope unit="volume">5</biblScope>
		</imprint>
	</monogr>
	<note>Review of Web Crawlers with Specification and Working</note>
</biblStruct>

<biblStruct xml:id="b7">
	<monogr>
		<title level="m" type="main">Focused Web Crawling Algorithms</title>
		<author>
			<persName><forename type="first">Andas</forename><surname>Amrin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><forename type="middle">X</forename></persName>
		</author>
		<imprint>
			<date type="published" when="2015">2015</date>
			<pubPlace>Shanghai, China</pubPlace>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">The Shark-Search algorithm. An application: Tailored Web Site Mapping</title>
		<author>
			<persName><forename type="first">Michael</forename><surname>Hersovici</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Michal</forename><surname>Jacovi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Yoelle</forename><forename type="middle">S</forename><surname>Maarek</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Computer Networks and ISDN Systems</title>
		<imprint>
			<biblScope unit="volume">30</biblScope>
			<biblScope unit="page" from="317" to="326" />
			<date type="published" when="1998">1998</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<analytic>
		<title level="a" type="main">A Kind of Algorithm For Page Ranking Based on Classified Tree In Search Engine</title>
		<author>
			<persName><forename type="first">Chong</forename><surname>Tian</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc International Conference on Computer Application and System Modeling</title>
				<meeting>International Conference on Computer Application and System Modeling<address><addrLine>ICCASM</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2010">2010</date>
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
