<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Compression of a Dictionary</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Jan</forename><surname>Lánský</surname></persName>
							<affiliation key="aff0">
								<orgName type="department">Faculty of Mathematics and Physics Department of Software Engineering</orgName>
								<orgName type="institution">Charles University</orgName>
								<address>
									<addrLine>Malostranské nám. 25</addrLine>
									<postCode>118 00</postCode>
									<settlement>Praha 1</settlement>
									<country key="CZ">Czech Republic</country>
								</address>
							</affiliation>
							<affiliation key="aff1">
								<orgName type="department">Faculty of Mathematics and Physics Department of Software Engineering</orgName>
								<orgName type="institution">Charles University</orgName>
								<address>
									<addrLine>Malostranské nám. 25</addrLine>
									<postCode>118 00</postCode>
									<settlement>Praha 1</settlement>
									<country key="CZ">Czech Republic</country>
								</address>
							</affiliation>
						</author>
						<author role="corresp">
							<persName><forename type="first">Michal</forename><surname>Žemlička</surname></persName>
							<email>michal.zemlicka@mff.cuni.cz</email>
							<affiliation key="aff0">
								<orgName type="department">Faculty of Mathematics and Physics Department of Software Engineering</orgName>
								<orgName type="institution">Charles University</orgName>
								<address>
									<addrLine>Malostranské nám. 25</addrLine>
									<postCode>118 00</postCode>
									<settlement>Praha 1</settlement>
									<country key="CZ">Czech Republic</country>
								</address>
							</affiliation>
							<affiliation key="aff1">
								<orgName type="department">Faculty of Mathematics and Physics Department of Software Engineering</orgName>
								<orgName type="institution">Charles University</orgName>
								<address>
									<addrLine>Malostranské nám. 25</addrLine>
									<postCode>118 00</postCode>
									<settlement>Praha 1</settlement>
									<country key="CZ">Czech Republic</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Compression of a Dictionary</title>
					</analytic>
					<monogr>
						<imprint>
							<date/>
						</imprint>
					</monogr>
					<idno type="MD5">CA967CCAA2E1796AA28E145B319F579B</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-24T14:34+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>Some text compression methods benefit from using more complex compression units than single characters. The coder and decoder can then be synchronized by transferring the unit dictionary together with the compressed message. We propose a dictionary compression method based on a proper ordering of the nodes of the tree-organized dictionary. This reordering allows a better compression ratio to be achieved. The proposed dictionary compression method has been tested on dictionaries for word- and syllable-based compression methods. It appears effective for compressing dictionaries of syllables and promising for larger dictionaries of words.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1">Introduction</head><p>Dictionaries are used in many applications. The space occupied by a dictionary sometimes matters and should be taken into account; it is then reasonable to consider storing the dictionary in a compressed form. We propose here a method for the compression of dictionaries. We focus on dictionaries used for text compression, or more precisely on the compression of dictionaries used by word- or syllable-based document compression methods. The comparisons with other methods are therefore oriented towards this topic.</p><p>The paper is structured as follows: Part two describes the dictionaries and gives some formal definitions. Part three reviews existing methods used to store dictionaries. Part four is dedicated to the newly proposed methods. Part five presents comparisons of the tested methods. The last part, six, is dedicated to the summary.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2">Dictionary</head><p>We assume that a dictionary is a set of ordered pairs (string, number), where the string is a string over an alphabet Σ and the number is an integer in the range 1 to n, n being the number of ordered pairs in the dictionary.</p><p>It is sometimes useful to partition the set of all strings (dictionary items) into several disjoint categories. The union of the categories may not cover the set of all possible strings over Σ; in that case it is necessary to ensure that the input strings always fit into the given categories.</p><p>For text compression purposes this requirement can be met e.g. by a proper input string selection (a partition of the input message into properly formed subparts). Words and syllables are special types of such strings.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3">Existing Methods for the Compression of a Dictionary</head><p>Papers on word- and syllable-based compression methods commonly attach little importance to the compression of the dictionary, as the dictionary often makes up only a small part of the resulting compressed message. This holds for very large documents, but for middle-sized documents the dictionary takes a larger part of the compressed message, so its size matters more.</p><p>The following two approaches are the most widely used: The first approach codes the strings (words or syllables) contained in the dictionary one by one. In the second approach the dictionary is compressed as a whole: all the strings are concatenated using special separators, and the resulting file is then compressed by some general-purpose method.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1">Compression of Dictionary Character-by-Character -CD</head><p>A method for encoding strings using a partitioning of the strings into five categories, similar to the method TD3 described below, is described in <ref type="bibr" target="#b0">[1]</ref>. Every string is encoded as a string type code followed by the encoded string length and by the codes of the individual symbols. The string type is encoded using binary phase coding (c1), the string length by an adaptive Huffman code (c2), and the individual symbols also by adaptive Huffman codes (letters by c3, digits by c4, and other characters by c5). Lower- and upper-case letters share the same code value c3; they are distinguished by the syllable type. All adaptive Huffman trees are initialized according to the language specification. Examples are given in Fig. <ref type="figure">1</ref>.</p><p>code("to") = c1(mixed), c2(2), c3('t'), c3('o') code("153") = c1(numeric), c2(3), c4('1'), c4('5'), c4('3') code(". ") = c1(other), c2(2), c5('.'), c5('0') Fig. <ref type="figure">1</ref>. An example of coding a string by the CD method</p><p>It is not necessary to know the whole dictionary at the beginning: individual items can be compressed on the fly, so new items can be encoded whenever they are encountered. The other methods discussed in this paper need to compress the whole dictionary at once.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2">External Compression of a Dictionary</head><p>Let us have a separator that is not part of the used alphabet, and let all the strings forming the dictionary be concatenated into a single string using this separator. The resulting string is then encoded using an arbitrary compression method. In <ref type="bibr" target="#b1">[2]</ref> the authors tried to encode the dictionary of words using the gzip, PPM, and bzip2 methods and found bzip2 the best for this purpose. We tried to encode the dictionary using bzip2 <ref type="bibr" target="#b2">[3]</ref> (denoted in the tables as BzipD, bzip-compressed dictionary) and LZW <ref type="bibr" target="#b3">[4]</ref> (denoted in the tables as LZWD, LZW-compressed dictionary).</p></div>
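The external approach can be sketched in a few lines of Python, using the standard `bz2` module in place of the bzip2 tool. The NUL-byte separator and the function names are assumptions of this sketch; any symbol outside the dictionary's alphabet would serve as the separator.

```python
import bz2

# Minimal sketch of the BzipD approach: join the dictionary strings with
# a separator that cannot occur in them, then compress the whole blob.
# The NUL byte as separator is an assumption of this sketch.
def compress_dictionary(strings, sep=b"\x00"):
    blob = sep.join(s.encode("utf-8") for s in strings)
    return bz2.compress(blob)

def decompress_dictionary(data, sep=b"\x00"):
    blob = bz2.decompress(data)
    return [s.decode("utf-8") for s in blob.split(sep)]
```

Swapping `bz2` for `gzip` (or an LZW codec for LZWD) changes only the two compression calls; the separator scheme stays the same.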
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4">Trie-Based Compression of a Dictionary</head><p>When designing the methods TD1, TD2, and TD3 introduced here, we decided to represent the dictionary by the trie data structure [5, Section 6.3: Digital Searching, pp. 492-512]. A trie T is a tree of maximal degree n, where n is the size of the alphabet of symbols Σ, satisfying the following conditions: The root represents the empty string. Let the string α be represented by the node A and the string β by the node B. If the node A is the father of the node B, then β is created by appending one symbol of Σ to α. For all nodes A and B there exists a node C that represents the common prefix of the strings α and β, and this node lies on both paths (endpoints included) from the root to A and from the root to B.</p><p>The dictionary trie is created from the strings appearing in the text, and then the trie is encoded. During this encoding a unique number is assigned to each string using a depth-first traversal of the trie.</p></div>
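The trie construction and the depth-first numbering described above can be sketched as follows. The node layout and names are illustrative, not the paper's code; the only commitments taken from the text are that a node knows its sons, whether it represents an item, and that numbering follows a DFS.

```python
# Illustrative sketch of the dictionary trie and the DFS numbering.
class TrieNode:
    def __init__(self):
        self.sons = {}           # extension symbol -> child node
        self.represents = False  # True if this node represents a dictionary item

def build_trie(strings):
    root = TrieNode()
    for s in strings:
        node = root
        for ch in s:
            node = node.sons.setdefault(ch, TrieNode())
        node.represents = True
    return root

def number_items(node, prefix="", numbers=None):
    """Assign a unique number (1..n) to each represented string
    by a depth-first traversal, visiting sons in symbol order."""
    if numbers is None:
        numbers = {}
    if node.represents:
        numbers[prefix] = len(numbers) + 1
    for ch in sorted(node.sons):
        number_items(node.sons[ch], prefix + ch, numbers)
    return numbers
```

Because the traversal is deterministic, the decoder can rebuild exactly the same string-to-number mapping from the encoded trie alone.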
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.1">Basic Version -TD1</head><p>Trie compression of a dictionary (TD) is based on coding the structure of the trie representing the dictionary. For each node in the trie we know the following: whether the node represents a string (represents), the number of sons (count), the array of sons (son), and the first symbol of the extension for each son (extension). The basic version of such an encoding (TD1) is given by the recursive procedure EncodeNode1 in Fig. <ref type="figure">2</ref>, which traverses the trie by depth-first search (DFS). To encode the whole dictionary, we call this procedure on the root of the trie representing the dictionary.</p><p>In the procedure EncodeNode1 we code only the number of sons and the distances between the extensions of the sons. For non-leaf nodes we must encode in one bit whether the node represents a dictionary item (e.g. a syllable or a word) or not. Leaves always represent dictionary items, so this bit can be omitted for them. The differences between the extensions of the sons are given as distances between the binary values of the extending characters. For coding the number of sons and the distances between them we use the Elias gamma and delta codes <ref type="bibr" target="#b5">[6]</ref>. We have tested other Elias codes too, but we achieved the best results with the gamma and delta codes. The number of sons and the distances between them can reach the value 0, but the standard versions of the gamma and delta codes start from 1 and thus do not support this value. We therefore use slight modifications of the Elias gamma and delta codes: gamma 0 (x) = gamma(x + 1) and delta 0 (x) = delta(x + 1). An example is given in Fig. <ref type="figure">3</ref>. The example dictionary contains the strings ".\n", "ACM", "AC", "to", and "the". Let us introduce the TD1 method by coding the root of the trie representing our example dictionary:</p><p>In a node we must first encode the number of its sons. The root has 3 sons, hence we write gamma 0 (3) = 00001.</p><p>Then we state that the word represented by the node itself (the empty string) is not part of the dictionary by writing a bit 0.</p><p>The value of the first son is encoded as the distance between its value and zero: delta 0 (46 − 0) = 0100101111.</p><p>Then the first subtrie is encoded by a recursive call of the encoding procedure on the first son of the actual node.</p><p>When the first subtrie is fully encoded, we specify the second son. The difference between the first and the second son is 65 − 46, hence we write delta 0 (65 − 46) = 000110011.</p><p>Then we encode the second subtrie, the third son, and the subtrie rooted in it. Now the whole node and all its subtries are encoded. As our example node is the root, we have encoded the whole trie representing the dictionary.</p></div>
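The shifted Elias codes used above can be sketched as follows. Note that the paper's example bit strings may follow a different bit-ordering convention than the textbook form below; the code lengths, which are what matter for the compression ratio, agree.

```python
# Textbook Elias gamma and delta codes, plus the shifted variants
# gamma0/delta0 used by TD1 so that the value 0 can be coded.
def elias_gamma(x: int) -> str:
    assert x >= 1
    b = bin(x)[2:]                  # binary representation of x
    return "0" * (len(b) - 1) + b   # length-1 zeros, then the bits

def elias_delta(x: int) -> str:
    assert x >= 1
    b = bin(x)[2:]
    return elias_gamma(len(b)) + b[1:]  # code the length, drop the leading 1

def gamma0(x: int) -> str:
    return elias_gamma(x + 1)  # gamma0(x) = gamma(x + 1)

def delta0(x: int) -> str:
    return elias_delta(x + 1)  # delta0(x) = delta(x + 1)
```

For the root of the example trie, gamma0(3) is a 5-bit code and delta0(46 − 0) a 10-bit code, matching the code lengths in the walkthrough.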
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.2">Version with Translator -TD2</head><p>In version TD1 the distances between sons are coded according to the binary values of the extending symbols. These distances are encoded by the Elias delta code, which represents smaller numbers by shorter codes and larger numbers by longer codes. In version TD2 we reorder the symbols of the alphabet according to the types of the symbols and their frequencies typical for the given language. In our example the orders 0-27 are reserved for lower-case letters, 28-53 for upper-case letters, 54-63 for digits, and 64-255 for other symbols. Some examples are given in Table <ref type="table">1</ref>. Table <ref type="table">1</ref>. An example of the new ordering of the symbols: symbol 'e' 't' 'a' 'I' 'T' 'A' '0' '1' '2' ' ' ',' '.' ord(symbol) 0 1 2 28 29 30 54 55 56 64 65 66</p><p>Replacing the expression "son[i] → extension" by the expression "ord(son[i] → extension)" in lines 08 and 11 of procedure TD1, we get procedure TD2 (Fig. <ref type="figure">4</ref>).</p><p>Let us demonstrate this method on an example (Fig. <ref type="figure">5</ref>). The example dictionary again contains the strings ".\n", "ACM", "AC", "to" and "the". We will describe the work of the coding procedure EncodeNode2 on the node labelled 't'.</p><p>In a node we must first encode the number of its sons. Our node has two sons, hence we write gamma 0 (2) = 011.</p><p>Then we state that the already represented word (the string "t") is not part of the dictionary by writing a bit 0.</p><p>Fig. <ref type="figure">5</ref>. Example of a dictionary for TD2 and TD3</p><p>The new ordering described in version TD2 assigns each symbol type an interval of the new orders. The function first returns, for each symbol type, the lowest order available for that type; it is described in Tab. 2. We count (Fig. <ref type="figure">6</ref>, line 10) and code (Fig. <ref type="figure">6</ref>, line 11) the distances between the sons. For the first son of a node whose type is known, we can use the function first to decrease the value of the distance and shorten the code. Modifying line 06 of version TD2 and inserting lines 07 and 08, we get version TD3.</p><p>Let us show the differences between TD3 and TD2 on our example (Fig. <ref type="figure">5</ref>).</p><p>Let us go directly to the node 't'. Here we must first encode the number of sons of this node (2), so we write gamma 0 (2) = 011.</p><p>Then we state that the already represented word (the string "t") is not part of the dictionary by writing a bit 0.</p><p>The value of the first son of our node is encoded as the distance between its value (3) and zero (it is the first son), decreased by the value of the function first for a lower-case letter (0). The encoded value is delta 0 (3 − 0 − 0) = 01100. It is possible to restrict the shift interval by first of the encoded character type because we know this type: in the subtrie of the node 't' only lower-case letters occur. The encoded value is the same as in TD2, but the calculation differs.</p><p>The other codes are produced analogously. The German language has many different and long word forms, so the trie representing such a dictionary is quite sparse; therefore the TD3 method outperformed the other methods only for dictionaries of very large documents.</p><p>English typically uses fewer word forms than Czech and German, and these word forms are often shorter. The trie is then quite sparse for smaller documents, and therefore our compression method outperforms the other ones only for larger documents.</p><p>Czech documents are typically made from lots of middle-sized word forms, and the dictionary tries are therefore quite dense. This is why the method has been so successful on the dictionaries of Czech documents.</p></div>
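The TD2 translator and the TD3 first function can be sketched as below. Only the interval boundaries (Table 2) follow the paper; the letter-frequency string is an illustrative assumption (Table 1 suggests 'e', 't', 'a' lead the lower-case interval, and the upper-case letters follow their own frequency order, which this sketch simplifies).

```python
import string

# Interval start for each symbol type (the function first of Table 2).
FIRST = {"lower": 0, "upper": 28, "digit": 54, "other": 64}

def symbol_type(ch: str) -> str:
    if ch.islower():
        return "lower"
    if ch.isupper():
        return "upper"
    if ch.isdigit():
        return "digit"
    return "other"

def build_order(lower_freq="etaoinsrhldcumfpgwybvkxjqz"):
    """Build the TD2 translator ord(): each byte value gets a new order
    inside its type's interval. The frequency string is an assumption of
    this sketch; a real coder would use per-language statistics."""
    order = {}
    for i, ch in enumerate(lower_freq):
        order[ch] = FIRST["lower"] + i
        order[ch.upper()] = FIRST["upper"] + i   # simplification, see lead-in
    for i, ch in enumerate(string.digits):
        order[ch] = FIRST["digit"] + i
    nxt = FIRST["other"]
    for code in range(256):            # remaining symbols in byte order
        ch = chr(code)
        if ch not in order:
            order[ch] = nxt
            nxt += 1
    return order
```

With a known son type, TD3 then codes the first son's distance as ord(son) − first(type); for the node 't', whose sons are all lower-case, first('lower') = 0, so the subtraction leaves the code unchanged, exactly as in the walkthrough.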
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6">Conclusion</head><p>We have proposed three methods for the compression of dictionaries based on representing the dictionary by a trie data structure. One of them (TD3) compressed the dictionaries of syllables for the given files better than all other tested methods. It has also been the most successful method for compressing dictionaries of words of large documents.</p><p>Such dictionaries are used by many word- and syllable-based compression algorithms. Improving the compression ratio of the dictionary improves (although with a smaller impact) the overall compression ratio of these methods. </p></div><figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_2"><head>Table 2 .</head><label>2</label><figDesc>Values of the function first</figDesc><table><row><cell>type of symbols</cell><cell>lower-case letter</cell><cell>upper-case letter</cell><cell>digit</cell><cell>other</cell></row><row><cell>first(type)</cell><cell>0</cell><cell>28</cell><cell>54</cell><cell>64</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_3"><head>Table 3 .</head><label>3</label><figDesc>Dictionary of syllables: Compression ratio (Compared with the size of a whole file) in bits per character</figDesc><table><row><cell>-</cell><cell cols="2">File size 100 B 1 kB 10 kB 50 kB 200 kB 500 kB 2 MB</cell></row><row><cell cols="3">Lang. Method 1 kB 10 kB 50 kB 200 kB 500 kB 2MB 5 MB</cell></row><row><cell cols="3">CZ LZWD 5.359 3.233 1.423 0.562 0.343 0.204 --</cell></row><row><cell cols="2">CZ CD</cell><cell>3.741 2.432 1.130 0.461 0.284 0.169 --</cell></row><row><cell cols="3">CZ BzipD 5.285 2.952 1.227 0.468 0.285 0.168 --</cell></row><row><cell cols="2">CZ TD1</cell><cell>4.124 2.232 0.870 0.315 0.185 0.115 --</cell></row><row><cell cols="2">CZ TD2</cell><cell>2.944 1.594 0.638 0.240 0.143 0.093 --</cell></row><row><cell cols="2">CZ TD3</cell><cell>2.801 1.532 0.612 0.226 0.134 0.081 --</cell></row><row><cell cols="3">EN LZWD 4.580 1.715 0.732 0.426 0.269 0.152 0.059</cell></row><row><cell cols="2">EN CD</cell><cell>2.983 1.287 0.583 0.360 0.234 0.133 0.052</cell></row><row><cell cols="3">EN BzipD 4.390 1.523 0.626 0.353 0.222 0.124 0.047</cell></row><row><cell cols="2">EN TD1</cell><cell>3.792 1.276 0.506 0.272 0.158 0.086 0.033</cell></row><row><cell cols="2">EN TD2</cell><cell>2.871 0.954 0.384 0.212 0.124 0.069 0.028</cell></row><row><cell cols="2">EN TD3</cell><cell>2.666 0.890 0.354 0.195 0.116 0.063 0.024</cell></row><row><cell cols="3">GE LZWD 4.259 2.995 1.139 0.580 0.345 0.202 0.104</cell></row><row><cell cols="2">GE CD</cell><cell>3.068 2.360 0.997 0.530 0.315 0.185 0.091</cell></row><row><cell cols="3">GE BzipD 4.127 2.689 0.949 0.479 0.285 0.166 0.087</cell></row><row><cell cols="2">GE TD1</cell><cell>3.952 2.539 0.832 0.377 0.207 0.122 0.045</cell></row><row><cell cols="2">GE TD2</cell><cell>3.020 1.914 0.627 0.284 0.157 0.097 0.035</cell></row><row><cell cols="2">GE TD3</cell><cell>2.730 1.805 0.599 0.275 0.150 0.086 
0.033</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_4"><head>Table 4 .</head><label>4</label><figDesc>Dictionary of words: Compression ratio (Compared with the size of a whole file) in bits per character</figDesc><table><row><cell>-</cell><cell cols="2">File size 100 B 1 kB 10 kB 50 kB 200 kB 500 kB 2 MB</cell></row><row><cell cols="3">Lang. Method 1 kB 10 kB 50 kB 200 kB 500 kB 2MB 5 MB</cell></row><row><cell cols="3">CZ LZWD 5.984 4.549 3.076 1.934 1.557 1.161 --</cell></row><row><cell>CZ CD</cell><cell>4.378 3.830 2.948 1.968 1.648 1.260 --</cell></row><row><cell cols="2">CZ BzipD 5.784 4.045 2.559 1.582 1.255 0.921 --</cell></row><row><cell>CZ TD1</cell><cell>8.443 6.520 4.146 2.250 1.713 1.178 --</cell></row><row><cell>CZ TD2</cell><cell>5.935 4.531 2.874 1.550 1.176 0.814 --</cell></row><row><cell>CZ TD3</cell><cell>5.781 4.462 2.844 1.534 1.167 0.800 --</cell></row><row><cell cols="2">EN LZWD 4.699 2.195 1.203 0.872 0.687 0.443 0.189</cell></row><row><cell>EN CD</cell><cell>3.100 1.776 1.095 0.847 0.695 0.454 0.197</cell></row><row><cell cols="2">EN BzipD 4.508 1.915 1.002 0.714 0.563 0.361 0.154</cell></row><row><cell>EN TD1</cell><cell>6.320 3.144 1.698 1.108 0.813 0.498 0.191</cell></row><row><cell>EN TD2</cell><cell>4.526 2.142 1.144 0.753 0.554 0.341 0.132</cell></row><row><cell>EN TD3</cell><cell>4.219 2.062 1.110 0.734 0.544 0.333 0.128</cell></row><row><cell cols="2">GE LZWD 4.712 3.634 1.819 1.227 0.996 0.706 0.716</cell></row><row><cell>GE CD</cell><cell>3.582 3.091 1.787 1.293 1.096 0.799 0.789</cell></row><row><cell cols="2">GE BzipD 4.409 3.216 1.506 1.001 0.797 0.558 0.565</cell></row><row><cell>GE TD1</cell><cell>7.187 5.748 2.585 1.700 1.383 0.945 0.844</cell></row><row><cell>GE TD2</cell><cell>4.985 3.885 1.691 1.094 0.875 0.601 0.534</cell></row><row><cell>GE TD3</cell><cell>4.699 3.776 1.660 1.085 0.867 0.591 0.532</cell></row></table></figure>
		</body>
		<back>
			<div type="annex">
<div xmlns="http://www.tei-c.org/ns/1.0"><p>The value of the first son of 't' is encoded as the distance between its value 3 and zero: delta 0 (3 − 0) = 01100.</p><p>Then the first subtrie of the node 't' is encoded by a recursive call of the encoding procedure on the first son of the actual node.</p><p>When the subtrie of the node 't' is fully encoded, we specify the second son of the node 't'. The difference between the first and the second son is 6 − 3, hence we write delta 0 (6 − 3) = 01100.</p><p>Then we encode the second subtrie. Now the whole node and all its subtries are encoded.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.3">Version Using Types of Strings -TD3</head><p>Words and syllables are special types of strings. We distinguish five types of words (and syllables): lower-words (from lower-case letters), upper-words (from upper-case letters), mixed-words (having the first letter upper-case and the following letters lower-case), numeric-words (from digits), and other-words (from special characters). For some nodes in the trie we know the type of the coded string (IsKnownTypeOfSons in Fig. <ref type="figure">6</ref>) and we can use this information.</p><p>If a string begins with a lower-case letter (a lower-word or lower-syllable), the following letters must be lower-case too; in the trie, each son of a lower-case letter can only be a lower-case letter. The situation is similar for other-words and numeric-words. If a string begins with an upper-case letter, we must look at the second symbol to recognize the type of the string (mixed or upper). In our example (Fig. <ref type="figure">5</ref>) we know for the nodes 't', 'o', 'h' and 'e' that all their sons are lower-case letters. </p></div>
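The five categories can be recognized with a few character tests. This classifier is our own sketch, not the paper's code; it treats a single upper-case letter as an upper-word, a corner case the paper leaves ambiguous.

```python
def word_type(s: str):
    """Classify a string into the five TD3 categories, or None if it
    fits none of them (sketch; single capitals count as upper-words)."""
    if s.isdigit():
        return "numeric"
    if s.isalpha():
        if s.islower():
            return "lower"
        if s.isupper():
            return "upper"
        if s[0].isupper() and s[1:].islower():
            return "mixed"
        return None                     # e.g. "tX" mixes cases illegally
    if not any(c.isalnum() for c in s):
        return "other"
    return None                         # e.g. "ab1" spans categories
```

Inputs falling into the None branch illustrate why the input message must be partitioned into properly formed subparts (Section 2) before these categories can be relied upon.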
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5">Results</head><p>We have tested three versions of the method compressing the dictionary using the trie data structure (TD -variants TD1, TD2, TD3), one method compressing the dictionary character-by-character (CD), and two methods applying an external compression tool to the concatenated dictionary items (LZWD, BzipD).</p><p>We have tested the dictionaries of words and syllables for variously sized documents written in the following three languages: English (EN), German (GE), and Czech (CZ).</p><p>For the dictionaries of syllables the best method appears to be TD3, which outperformed all other tested methods on all tested document sizes. For example, when compressing a 10 kB document, the TD3-compressed dictionary takes about 770 bytes whereas the second best method (CD) takes about 1450 bytes. For the compression of dictionaries of words, the best-performing method has been CD for small documents (up to 10 kB), BzipD for middle-sized documents, and TD3 for large documents. The boundary between 'middle-sized' and 'large' documents depends on the language: for Czech it was about 50 kB, for English about 200 kB, and for German about 2 MB.</p><p>It seems that the success of the TD methods (TD3 included) grows with the average arity of the trie nodes. Syllables are short and the trie representing a dictionary of syllables is typically dense, hence the TD3 method has always been the best.</p></div>			</div>
			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">Text compression: Syllables</title>
		<author>
			<persName><forename type="first">J</forename><surname>Lánský</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Žemlička</surname></persName>
		</author>
		<ptr target="http://sunsite.informatik.rwth-aachen.de/Publications/CEUR-WS//Vol-129/paper4.pdf" />
	</analytic>
	<monogr>
		<title level="m">DATESO 2005</title>
				<editor>
			<persName><forename type="first">K</forename><surname>Richta</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">V</forename><surname>Snášel</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">J</forename><surname>Pokorný</surname></persName>
		</editor>
		<meeting><address><addrLine>Prague, Czech</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2005">2005</date>
			<biblScope unit="page" from="32" to="45" />
		</imprint>
		<respStmt>
			<orgName>Technical University</orgName>
		</respStmt>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">Parsing strategies for BWT compression</title>
		<author>
			<persName><forename type="first">R</forename><forename type="middle">Y K</forename><surname>Isal</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Moffat</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Data Compression Conference</title>
				<meeting><address><addrLine>Los Alamitos, CA, USA</addrLine></address></meeting>
		<imprint>
			<publisher>IEEE CS Press</publisher>
			<date type="published" when="2001">2001</date>
			<biblScope unit="page" from="429" to="438" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<monogr>
		<author>
			<persName><forename type="first">J</forename><surname>Seward</surname></persName>
		</author>
		<ptr target="http://sources.redhat.com/bzip2/" />
		<title level="m">The bzip2 and libbzip2 official home page</title>
				<imprint>
			<date type="published" when="2005-02">February 2005</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">A technique for high performance data compression</title>
		<author>
			<persName><forename type="first">T</forename><forename type="middle">A</forename><surname>Welch</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">IEEE Computer</title>
		<imprint>
			<biblScope unit="volume">17</biblScope>
			<biblScope unit="page" from="8" to="19" />
			<date type="published" when="1984">1984</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<monogr>
		<title level="m" type="main">The Art of Computer Programming</title>
		<author>
			<persName><forename type="first">D</forename><surname>Knuth</surname></persName>
		</author>
		<imprint>
			<date type="published" when="1997">1997</date>
			<publisher>Addison-Wesley</publisher>
			<biblScope unit="volume">3</biblScope>
		</imprint>
	</monogr>
	<note>: Sorting and Searching. Third edn</note>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">Universal codeword sets and representation of the integers</title>
		<author>
			<persName><forename type="first">P</forename><surname>Elias</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">IEEE Trans. on Information Theory</title>
		<imprint>
			<biblScope unit="volume">21</biblScope>
			<biblScope unit="page" from="194" to="203" />
			<date type="published" when="1975">1975</date>
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
