<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">UAEMex at ImageCLEF 2016: Handwritten Scanned Document Retrieval Task</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Miguel</forename><forename type="middle">Ángel</forename><surname>García Calderón</surname></persName>
							<affiliation key="aff0">
								<orgName type="institution">Autonomous University of the State of Mexico (UAEMex)</orgName>
								<address>
									<country key="MX">Mexico</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">René</forename><forename type="middle">Arnulfo</forename><surname>García Hernández</surname></persName>
							<email>renearnulfo@hotmail.com</email>
							<affiliation key="aff0">
								<orgName type="institution">Autonomous University of the State of Mexico (UAEMex)</orgName>
								<address>
									<country key="MX">Mexico</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Yulia</forename><surname>Ledeneva</surname></persName>
							<email>yledeneva@yahoo.com</email>
							<affiliation key="aff0">
								<orgName type="institution">Autonomous University of the State of Mexico (UAEMex)</orgName>
								<address>
									<country key="MX">Mexico</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">UAEMex at ImageCLEF 2016: Handwritten Scanned Document Retrieval Task</title>
					</analytic>
					<monogr>
						<imprint>
							<date/>
						</imprint>
					</monogr>
					<idno type="MD5">5672DF99F4BA4AACBD805C32E31C744C</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-24T03:18+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>Information Retrieval</term>
					<term>Longest Common Subsequence</term>
					<term>Free Text Search</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>This paper describes the participation of the Autonomous University of the State of Mexico (UAEMex) in the ImageCLEF 2016 Handwritten Scanned Document Retrieval Task. We propose a skip-character text search method based on the Longest Common Subsequence. Our system splits the query into individual characters and finds every longest common subsequence within one line of text.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1">Introduction</head><p>This paper describes the free text search method used by UAEMex in the ImageCLEF 2016 <ref type="bibr" target="#b2">[3]</ref> handwritten retrieval task <ref type="bibr" target="#b3">[4]</ref>. The first edition of the handwritten retrieval challenge has a single task, targeted at free text search. Treating the transcript text character by character, we use a skip-character text search method based on the Longest Common Subsequence (LCS) problem.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2">Fixed Gap Longest Common Subsequence</head><p>The LCS problem is, given two sequences, to find the length of the longest subsequence present in both of them. A subsequence of a string is obtained by deleting zero or more symbols (not necessarily consecutive ones) from the string <ref type="bibr" target="#b1">[2]</ref>. To extract non-consecutive subsequences, Iliopoulos and Rahman <ref type="bibr" target="#b0">[1]</ref> propose a variant of LCS called the Fixed Gap Longest Common Subsequence (FGLCS) problem, in which k is the fixed gap constraint and the distance between two consecutive matches must be at most k+1. Figure <ref type="figure" target="#fig_0">1</ref> shows an example of LCS and FGLCS searching.</p></div>
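<div xmlns="http://www.tei-c.org/ns/1.0"><p>The gap constraint can be illustrated with a small dynamic-programming sketch. It is not the implementation used for the submitted runs; it assumes that the limit of k+1 on the distance between consecutive matches applies to both strings, and the function name fglcs_length is hypothetical.</p>

```python
def fglcs_length(x: str, y: str, k: int) -> int:
    """Length of the longest common subsequence of x and y in which
    consecutive matches are at most k + 1 positions apart in both strings.

    Naive sketch: assumes the gap constraint applies to x and to y.
    """
    n, m = len(x), len(y)
    # t[i][j] = best length of a gap-constrained common subsequence
    # whose last match pairs x[i] with y[j]; stays 0 when x[i] != y[j].
    t = [[0] * m for _ in range(n)]
    best = 0
    for i in range(n):
        for j in range(m):
            if x[i] != y[j]:
                continue
            # Best predecessor match inside the allowed window of k + 1
            # positions before i in x and before j in y.
            prev = 0
            for pi in range(max(0, i - k - 1), i):
                for pj in range(max(0, j - k - 1), j):
                    prev = max(prev, t[pi][pj])
            t[i][j] = prev + 1
            best = max(best, t[i][j])
    return best


print(fglcs_length("abc", "axbxc", 0))  # 1: with gap 0 matches must be adjacent
print(fglcs_length("abc", "axbxc", 1))  # 3: one skipped character is allowed
```

<p>With gap 0 the search reduces to finding the longest common substring, while larger gaps tolerate characters inserted between matches.</p></div>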
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3">Free Text Search</head><p>The proposed method is based on FGLCS search and is divided into three phases. The system is designed to cope with transcriptions that contain incomplete or non-existent words.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1">Preprocessing Phase</head><p>1. Delete non-alphabetic characters from the transcript file. 2. Delete the line breaks in every segment so that each segment becomes a single line. 3. Split the line into individual characters.</p></div>
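<div xmlns="http://www.tei-c.org/ns/1.0"><p>The three preprocessing steps might be sketched as follows. The function name preprocess_segment is hypothetical, and two details are assumptions rather than part of the description above: spaces are kept as word separators, and each line break is replaced by a single space so that words on adjacent lines do not merge.</p>

```python
def preprocess_segment(segment: str) -> list[str]:
    """Turn a multi-line transcript segment into a list of characters."""
    # Step 2: join the lines of the segment into one line.
    # Assumption: a line break becomes a single space.
    one_line = " ".join(segment.splitlines())
    # Step 1: drop every character that is neither a letter nor a space.
    # Assumption: spaces are kept so that word boundaries survive.
    letters_only = "".join(ch for ch in one_line if ch.isalpha() or ch == " ")
    # Collapse runs of spaces left behind by the removed characters.
    cleaned = " ".join(letters_only.split())
    # Step 3: split the line into individual characters.
    return list(cleaned)


chars = preprocess_segment("the 2nd\nline, here!")
print("".join(chars))  # the nd line here
```

<p>The steps are applied in a slightly different order than listed, which does not change the result under the stated assumptions.</p></div>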
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2">String Matching Phase</head><p>First, each query is split on spaces into words; then, for every word in the query, the FGLCS is searched in the current segment.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.3">Ranking Phase</head><p>Every FGLCS is checked to confirm that its words appear in the same order as in the query; if they do, the confidence score is calculated using equation (1). The system considers a result relevant if its confidence is greater than 0.5.</p><p>• q = number of characters in the query</p><p>• s = number of characters in the longest sequence</p><p>• c = confidence</p><formula xml:id="formula_0">c = \frac{s}{q}<label>(1)</label></formula><p>We tested confidence thresholds of 0.9, 0.8, 0.7, 0.6 and 0.5. The best confidence threshold was 0.5.</p></div>
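<div xmlns="http://www.tei-c.org/ns/1.0"><p>A minimal sketch of the relevance decision, assuming that equation (1) is the ratio of the number of characters in the longest sequence (s) to the number of characters in the query (q). The function names and the handling of an empty query are illustrative assumptions.</p>

```python
def confidence(query: str, matched: str) -> float:
    """Confidence c = s / q under the assumed reading of equation (1)."""
    q = len(query)    # characters in the query
    s = len(matched)  # characters in the longest matched sequence
    if q == 0:
        # Assumption: an empty query never matches anything.
        return 0.0
    return s / q


def is_relevant(query: str, matched: str, threshold: float = 0.5) -> bool:
    """A result is kept only if its confidence exceeds the threshold."""
    return confidence(query, matched) > threshold


print(confidence("abcd", "abc"))   # 0.75
print(is_relevant("abcd", "abc"))  # True
print(is_relevant("abcd", "ab"))   # False: 0.5 is not greater than 0.5
```
</div>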
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.4">Submitted Runs</head><p>In this section, the nine free text search runs submitted by UAEMex are presented.</p><p>To account for badly transcribed words, we varied the gap value to retrieve more words; however, retrieval performance decreased.</p><p>Run1: FGLCS search with gap = 0.</p><p>Run2: FGLCS search with gap = 1.</p><p>Run3: FGLCS search with gap = 2.</p><p>Run4: FGLCS search with gap = 3.</p><p>Run5: Union of Run1 + Run2.</p><p>Run6: Union of Run1 + Run2 + Run3.</p><p>Run7: Union of Run1 + Run3.</p><p>Run8: Union of Run1 + Run2 + Run3 + Run4.</p></div>
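<div xmlns="http://www.tei-c.org/ns/1.0"><p>The union runs can be pictured as merging the result sets of several single-gap runs. The sketch below is only one possible reading: the data layout (a mapping from a query and segment pair to a confidence), the function name union_runs, and the rule of keeping the highest confidence for duplicated results are all assumptions, since the merging rule is not specified here.</p>

```python
# Hypothetical layout: (query, segment id) mapped to a confidence score.
Run = dict[tuple[str, str], float]


def union_runs(*runs: Run) -> Run:
    """Merge several runs, keeping the highest confidence per result."""
    merged: Run = {}
    for run in runs:
        for key, score in run.items():
            # Assumption: a result found by several runs keeps its best score.
            merged[key] = max(score, merged.get(key, 0.0))
    return merged


run1 = {("query one", "seg1"): 0.9}
run2 = {("query one", "seg1"): 0.6, ("query one", "seg2"): 0.7}
print(union_runs(run1, run2))
# {('query one', 'seg1'): 0.9, ('query one', 'seg2'): 0.7}
```
</div>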
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.5">Results</head><p>In this section, the results of the runs submitted by UAEMex are presented. Results marked with '-' could not be analyzed. Only the segment-based measures are included; those for bounding boxes are omitted. The presented results were obtained using only n-best file No. 20 of the n-best files provided by the organizers.</p><p>The following four metrics were used to evaluate the accuracy of the submissions on the development and test sets: Global Average Precision (Segm_gAP), Mean Average Precision (Segm_mAP), Global Normalized Discounted Cumulative Gain (Segm_gNDCG) and Mean Normalized Discounted Cumulative Gain (Segm_mNDCG) (see Table <ref type="table" target="#tab_0">1</ref> and Table <ref type="table" target="#tab_1">2</ref>).</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4">Conclusions</head><p>This paper presents results for free text search using LCS and describes the participation of UAEMex in the ImageCLEF 2016 Handwritten Scanned Document Retrieval Task. The proposed method works with both dictionary words and nonexistent words. There are large differences between the results on the development set (Table <ref type="table" target="#tab_0">1</ref>) and the test set (Table <ref type="table" target="#tab_1">2</ref>). We assume that the poor test results arise because we used only one of the n-best files provided by the organizers.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Fig. 1 .</head><label>1</label><figDesc>Fig. 1. Example of FGLCS with gap 1 and FGLCS with gap 0.</figDesc><graphic coords="2,156.00,148.80,283.68,94.32" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_1"><head>Fig. 2 .</head><label>2</label><figDesc>Fig. 2. Example of text segment.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_2"><head>Fig. 3 .</head><label>3</label><figDesc>Fig. 3. Example of text segment after line break deletion.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head>Table 1 .</head><label>1</label><figDesc>The results of the development set.</figDesc><table><row><cell>Run</cell><cell>Segm_gAP</cell><cell>Segm_mAP</cell><cell>Segm_gNDCG</cell><cell>Segm_mNDCG</cell></row><row><cell>RUN1</cell><cell>61.11</cell><cell>38.55</cell><cell>69.08</cell><cell>41.69</cell></row><row><cell>RUN2</cell><cell>47.61</cell><cell>32.33</cell><cell>59.39</cell><cell>37.56</cell></row><row><cell>RUN3</cell><cell>30.22</cell><cell>20.32</cell><cell>43.64</cell><cell>27.11</cell></row><row><cell>RUN4</cell><cell>-</cell><cell>-</cell><cell>-</cell><cell>-</cell></row><row><cell>RUN5</cell><cell>51.21</cell><cell>36.92</cell><cell>64.55</cell><cell>40.70</cell></row><row><cell>RUN6</cell><cell>27.62</cell><cell>19.82</cell><cell>53.82</cell><cell>28.90</cell></row><row><cell>RUN7</cell><cell>0.15</cell><cell>1.69</cell><cell>1.93</cell><cell>2.82</cell></row><row><cell>RUN8</cell><cell>26.24</cell><cell>19.64</cell><cell>53.37</cell><cell>28.81</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_1"><head>Table 2 .</head><label>2</label><figDesc>The results of the test set.</figDesc><table><row><cell>Run</cell><cell>Segm_gAP</cell><cell>Segm_mAP</cell><cell>Segm_gNDCG</cell><cell>Segm_mNDCG</cell></row><row><cell>RUN1</cell><cell>0.26</cell><cell>0.39</cell><cell>1.22</cell><cell>0.39</cell></row><row><cell>RUN2</cell><cell>-</cell><cell>-</cell><cell>-</cell><cell>-</cell></row><row><cell>RUN3</cell><cell>-</cell><cell>-</cell><cell>-</cell><cell>-</cell></row><row><cell>RUN4</cell><cell>-</cell><cell>-</cell><cell>-</cell><cell>-</cell></row><row><cell>RUN5</cell><cell>3.51</cell><cell>0.94</cell><cell>10.15</cell><cell>1.52</cell></row><row><cell>RUN6</cell><cell>-</cell><cell>-</cell><cell>-</cell><cell>-</cell></row><row><cell>RUN7</cell><cell>-</cell><cell>-</cell><cell>-</cell><cell>-</cell></row><row><cell>RUN8</cell><cell>-</cell><cell>-</cell><cell>-</cell><cell>-</cell></row></table></figure>
		</body>
		<back>
			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">Algorithms for computing variants of the longest common subsequence problem</title>
		<author>
			<persName><forename type="first">C</forename><forename type="middle">S</forename><surname>Iliopoulos</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">S</forename><surname>Rahman</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Theoretical Computer Science</title>
		<imprint>
			<biblScope unit="volume">395</biblScope>
			<biblScope unit="page" from="255" to="267" />
			<date type="published" when="2008">2008</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">An optimal algorithm for the longest common subsequence problem</title>
		<author>
			<persName><forename type="first">H</forename><surname>Lin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Lu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Fang</surname></persName>
		</author>
		<idno type="DOI">10.1109/SPDP.1991.21820</idno>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the Third IEEE Symposium on Parallel and Distributed Processing</title>
				<meeting>the Third IEEE Symposium on Parallel and Distributed Processing<address><addrLine>Dallas, TX</addrLine></address></meeting>
		<imprint>
			<date type="published" when="1991">1991</date>
			<biblScope unit="page" from="630" to="639" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">General Overview of ImageCLEF at the CLEF 2016 Labs</title>
		<author>
			<persName><forename type="first">M</forename><surname>Villegas</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Müller</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Seco De Herrera</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Schaer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Bromuri</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Gilbert</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Piras</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Yan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Ramisa</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Dellandrea</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Gaizauskas</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Mikolajczyk</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Puigcerver</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">H</forename><surname>Toselli</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">A</forename><surname>Sánchez</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Vidal</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="s">Lecture Notes in Computer Science</title>
		<imprint>
			<date type="published" when="2016">2016</date>
			<publisher>Springer International Publishing</publisher>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">Overview of the ImageCLEF 2016 Handwritten Scanned Document Retrieval Task</title>
		<author>
			<persName><forename type="first">M</forename><surname>Villegas</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Puigcerver</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">H</forename><surname>Toselli</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">A</forename><surname>Sánchez</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Vidal</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">CLEF2016 Working Notes. CEUR Workshop Proceedings, CEUR-WS.org</title>
				<meeting><address><addrLine>Évora, Portugal</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2016-09">September 5-8, 2016</date>
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
