<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Breaking CAPTCHAs with Convolutional Neural Networks</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Martin</forename><surname>Kopp</surname></persName>
							<affiliation key="aff0">
								<orgName type="department">Faculty of Information Technology</orgName>
								<orgName type="institution">Czech Technical University in Prague</orgName>
								<address>
									<addrLine>Thákurova 9</addrLine>
									<postCode>160 00</postCode>
									<settlement>Prague</settlement>
								</address>
							</affiliation>
							<affiliation key="aff1">
								<orgName type="department">Cisco Systems</orgName>
								<orgName type="institution">Cognitive Research Team</orgName>
								<address>
									<settlement>Prague</settlement>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Matěj</forename><surname>Nikl</surname></persName>
							<affiliation key="aff0">
								<orgName type="department">Faculty of Information Technology</orgName>
								<orgName type="institution">Czech Technical University in Prague</orgName>
								<address>
									<addrLine>Thákurova 9</addrLine>
									<postCode>160 00</postCode>
									<settlement>Prague</settlement>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Martin</forename><surname>Holeňa</surname></persName>
							<affiliation key="aff0">
								<orgName type="department">Faculty of Information Technology</orgName>
								<orgName type="institution">Czech Technical University in Prague</orgName>
								<address>
									<addrLine>Thákurova 9</addrLine>
									<postCode>160 00</postCode>
									<settlement>Prague</settlement>
								</address>
							</affiliation>
							<affiliation key="aff2">
								<orgName type="department">Institute of Computer Science</orgName>
								<orgName type="institution">Academy of Sciences of the Czech Republic Pod Vodárenskou věží</orgName>
								<address>
									<postCode>182 07</postCode>
									<settlement>Prague</settlement>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Breaking CAPTCHAs with Convolutional Neural Networks</title>
					</analytic>
					<monogr>
						<imprint>
							<date/>
						</imprint>
					</monogr>
					<idno type="MD5">B0110927E35DBB4BA83D6727E82FED01</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-24T10:12+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>CAPTCHA</term>
					<term>convolutional neural networks</term>
					<term>network security</term>
					<term>optical character recognition</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>This paper studies reverse Turing tests to distinguish humans and computers, called CAPTCHA. Contrary to classical Turing tests, in this case the judge is not a human but a computer. The main purpose of such tests is securing user logins against the dictionary or brute force password guessing, avoiding automated usage of various services, preventing bots from spamming on forums and many others.</p><p>Typical approaches to solving text-based CAPTCHA automatically are based on a scheme specific pipeline containing hand-designed pre-processing, denoising, segmentation, post processing and optical character recognition. Only the last part, optical character recognition, is usually based on some machine learning algorithm. We present an approach using neural networks and a simple clustering algorithm that consists of only two steps, character localisation and recognition. We tested our approach on 11 different schemes selected to present very diverse security features. We experimentally show that using convolutional neural networks is superior to multi-layered perceptrons.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1">Introduction</head><p>The acronym CAPTCHA <ref type="foot" target="#foot_0">1</ref> stands for Completely Automated Public Turing test to tell Computers and Humans Apart, and was coined in 2003 by von Ahn et al. <ref type="bibr" target="#b17">[20]</ref>. The fundamental idea is to use hard AI problems that are easily solved by most humans but infeasible for current computer programs. Captcha is widely used to distinguish human users from computer bots and automated scripts. Nowadays, it is an established security mechanism to prevent automated posting on internet forums, voting in online polls, downloading files in large amounts and many other abuses of web services.</p><p>There are many available captcha schemes, ranging from classical text-based and image-based designs to many unusual custom-designed solutions, e.g. <ref type="bibr" target="#b1">[3,</ref><ref type="bibr" target="#b2">4]</ref>. Because most of the older schemes have already been proven vulnerable to attacks and thus found unsafe <ref type="bibr" target="#b4">[7,</ref><ref type="bibr" target="#b16">19]</ref>, new schemes are being invented. Despite that trend, there are still many places where the classical text-based schemes are used as the main or at least as a fallback solution. For example, Google uses text-based schemes when users fail its newer image-based ones.</p><p>This paper focuses on automatic character recognition from multiple text-based CAPTCHA schemes using artificial neural networks (ANNs) and clustering. The ultimate goal is to take a captcha challenge as input and output a transcription of the text presented in the challenge. 
Contrary to most prior art, our approach is general and can solve multiple schemes without modification of any part of the algorithm.</p><p>The experimental part compares the performance of shallow (only one hidden layer) and deep (multiple hidden layers) ANNs and shows the benefits of using convolutional neural networks (CNNs) over multi-layer perceptrons (MLPs).</p><p>The rest of this paper is organised as follows. The related work is briefly reviewed in the next section. Section 3 surveys the current captcha solutions. Section 4 presents our approach to breaking captcha challenges. The experimental evaluation is summarised in Section 5, followed by the conclusion.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2">Related Work</head><p>Most papers about breaking captcha focus heavily on one particular scheme. One example is <ref type="bibr" target="#b8">[11]</ref>, with preprocessing, text alignment and everything else fitted to the reCaptcha 2011 scheme. To our knowledge, the most general approach was presented in <ref type="bibr" target="#b4">[7]</ref>. It is based on an effective selection of the best segmentation cuts, which are presented to a k-NN classifier. It was tested on many up-to-date text-based schemes with better results than specialized solutions.</p><p>The most recent approaches use neural networks <ref type="bibr" target="#b16">[19]</ref>. Their results are not yet as impressive as those of the previous approaches, but the neural-net-based approaches improve very quickly. Our work is based on CNNs, motivated by their success in pattern recognition, e.g. <ref type="bibr" target="#b3">[6,</ref><ref type="bibr" target="#b11">14]</ref>.</p><p>The Microsoft researcher Chellapilla, who intensively studied human interaction proofs, stated that, depending on the cost of the attack, automated scripts should not be more successful than 1 in 10,000 attempts, while the human success rate should approach 90% <ref type="bibr" target="#b7">[10]</ref>. This is generally considered too ambitious a goal after the publication of <ref type="bibr" target="#b5">[8]</ref>, which measured the human success rate in completing captcha challenges, and <ref type="bibr" target="#b6">[9]</ref>, which showed that random guesses can be successful. Consequently, a captcha is considered compromised when the attacker's success rate surpasses 1%.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3">Captcha Schemes Survey</head><p>This section surveys the currently available captcha schemes and challenges they present.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1">Text-Based</head><p>The first ever use of a captcha was in 1997 by the software company AltaVista, which sought a way to prevent automated submissions to its search engine. It was a simple text-based test, sufficient at the time, but it quickly proved ineffective as computer character recognition improved. The most commonly used techniques to prevent automatic recognition can be divided into two groups: anti-recognition features and anti-segmentation features.</p><p>Anti-recognition features, such as different character sizes, fonts and rotations, were a straightforward first step towards more sophisticated captcha schemes. All of these features are well accepted by humans, since we learn several shapes of letters from childhood, e.g. the handwritten alphabet, small letters and capitals. An effective way of reducing classifier accuracy is distortion, a technique in which ripples and warp are added to the image. But excessive distortion can make reading very difficult even for humans, so the usage of this feature is slowly vanishing, being replaced by anti-segmentation features.</p><p>Anti-segmentation features are not designed to complicate single-character recognition; instead, they try to make the automated segmentation of the captcha image unmanageable. The first two features used for this purpose were added noise and a confusing background. But it turned out that both are a bigger obstacle for humans than for computers, and they were therefore replaced by occlusion lines; an example can be seen in Figure <ref type="figure" target="#fig_0">1</ref>. The most recent anti-segmentation feature is called negative kerning: the neighbouring characters are moved so close to each other that they can eventually overlap. 
It turns out that humans are still able to read the overlapping text with only a small error rate, but for computers it is almost impossible to find the right segmentation. </p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2">Audio-Based</head><p>From the beginning, the adoption of captcha schemes was problematic. Users were annoyed by captchas that were hard to solve and had to try multiple times. The people affected the most were those with visual impairments or various reading disorders such as dyslexia. Soon, an alternative emerged in the form of audio captchas. Instead of displaying images, a voice reading letters and digits is played. In order to remain effective and secure, the captcha has to be resistant to automated sound analysis. For this purpose, various background noises and sound distortions are added. This scheme is now a standard alternative option on major websites that use captcha.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.3">Image-Based</head><p>Currently, the most prominent design is the image-based captcha. A series of images showing various objects is presented to the user, and the task is to select the images matching a topic given by a keyword or by an example image. For example, the user is shown a series of images of various landscapes and is asked to select those with trees, as in Figure <ref type="figure" target="#fig_1">2</ref>. This type of captcha has gained huge popularity, especially on touchscreen devices, where tapping the screen is preferable to typing. In the case of Google reCaptcha, there are nine images, of which 4–6 are the correct answer. In order to successfully complete the challenge, a user is allowed one wrong answer. A relatively new but fast-spreading type of image captcha combines the pattern recognition task presented above with object localisation; the number of squares was also increased from 9 to 16.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.4">Other Types</head><p>In parallel with the image-based captchas developed by Google and other big players, many alternative schemes have appeared. They are variations of text-based schemes hidden in video instead of a distorted image, or simple logical games and puzzles. As an example of an easy-to-solve logical game, we selected noughts and crosses, Figure <ref type="figure" target="#fig_2">3</ref>. All of these have recently been dominated by Google's noCaptcha button, which uses browser cookies, user profiles and history to track users' behaviour and distinguish real users from bots. </p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4">Our Approach</head><p>Our algorithm has two main stages: localisation and recognition. The localisation can be further divided into heat map generation and clustering. Consequently, our algorithm consists of three steps:</p><p>1. Create a heat map using a sliding window with an ANN that classifies whether or not there is a character in the center.</p><p>2. Use the k-means algorithm to determine the most probable locations of characters from the heat map.</p><p>3. Recognize the characters using another specifically trained ANN.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.1">Heatmap Generation</head><p>We decided to use the sliding window technique to localize characters within a CAPTCHA image. This approach is well known in the context of object localization <ref type="bibr" target="#b13">[16]</ref>.</p><p>A sliding window is a rectangular region of fixed width and height that slides across an image. Each such window serves as input for a feed-forward ANN with a single output neuron, whose output value is the probability that the input image has a character in the center. Figure <ref type="figure" target="#fig_3">4</ref> shows an example of such a heat map. To enable character localization even at the very edge of an image, one can pad the input image with black pixels. </p></div>
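As a rough illustration, the sliding-window heat map described above can be sketched as follows. This is a minimal sketch, not the authors' implementation: it assumes a grayscale image given as a NumPy array and a hypothetical `net` callable mapping a flattened window to a probability in [0, 1].

```python
import numpy as np

def heat_map(img, net, win=32):
    """Slide a win x win window over a black-padded image; `net` returns
    the probability that a character is centered in the window
    (illustrative sketch; `net` is a stand-in for the localization ANN)."""
    pad = win // 2
    padded = np.pad(img, pad, mode="constant")  # black border so characters
    h, w = img.shape                            # at the edge stay reachable
    heat = np.zeros((h, w))
    for y in range(h):
        for x in range(w):
            window = padded[y:y + win, x:x + win]
            heat[y, x] = net(window.ravel())
    return heat
```

Any per-window classifier can be plugged in as `net`; the black padding is what makes windows centered on edge pixels well defined.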
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.2">Clustering</head><p>When a heat map is complete, all points with a value greater than 0.5 are added to the list of points to be clustered. As this is still work in progress, we simplified the situation by assuming that the number of characters within the image, and therefore the correct number of clusters k, is known in advance; we then use k-means clustering to determine windows with characters close to their center. However, almost any clustering algorithm can be used, preferably one that can determine the correct number of clusters itself.</p><p>The k centroids are initialized uniformly from left to right, vertically in the middle, as this provides a good initial estimate. Figure <ref type="figure" target="#fig_4">5</ref> illustrates the whole idea. </p></div>
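The clustering step can be sketched as follows, under the assumptions stated above: k is known, points are heat-map positions exceeding 0.5, and centroids start evenly spaced left to right at mid-height. The function name and loop cap are illustrative, not from the paper.

```python
import numpy as np

def locate_characters(heat, k, iters=50):
    """k-means over above-threshold heat-map points (sketch).
    Centroids are initialized uniformly from left to right, vertically
    in the middle; k is assumed to be known in advance."""
    ys, xs = np.nonzero(np.asarray(heat) > 0.5)       # points to cluster
    points = np.column_stack([xs, ys]).astype(float)  # (x, y) pairs
    width = np.asarray(heat).shape[1]
    centroids = np.column_stack([
        (np.arange(k) + 0.5) * width / k,             # evenly spaced in x
        np.full(k, points[:, 1].mean()),              # middle height
    ])
    for _ in range(iters):
        # assign each point to its nearest centroid, then recompute means
        d = np.linalg.norm(points[:, None] - centroids[None], axis=2)
        labels = d.argmin(axis=1)
        new = np.array([points[labels == j].mean(axis=0)
                        if (labels == j).any() else centroids[j]
                        for j in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    return centroids[np.argsort(centroids[:, 0])]     # left-to-right order
```

Since characters in a text captcha are laid out horizontally, the uniform left-to-right initialization usually starts each centroid near one character, which is why it works well here.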
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.3">Recognition</head><p>Assuming that the character localization worked well, the windows containing characters are now ready to be recognized. This task is known to be easy for computers; in fact, they are even better at it than humans <ref type="bibr" target="#b7">[10]</ref>. Again, a feed-forward ANN is used, this time with an output layer of 36 neurons estimating the probability distribution over the classes: digits 0-9 and uppercase letters A-Z. Finally, the CAPTCHA transcription is created by writing the recognized characters in ascending order of their x-axis coordinates.</p></div>
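The final transcription step amounts to taking the argmax over the 36 classes per window and sorting by x coordinate, which can be sketched as follows (`recognize` is a hypothetical stand-in for the recognition ANN, assumed to return 36 class scores for a window):

```python
import numpy as np

CLASSES = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"   # the 36 output classes

def transcribe(located_windows, recognize):
    """Recognize each (x, window) pair and emit the characters in
    ascending order of x (sketch; `recognize` stands in for the ANN)."""
    labelled = [(x, CLASSES[int(np.argmax(recognize(w)))])
                for x, w in located_windows]
    return "".join(ch for _, ch in sorted(labelled))
```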
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5">Experimental Evaluation</head><p>This section describes the selection of a captcha suite and generation of the labelled database, followed by a detailed description of the artificial neural networks used in our experiments. The last part of this section presents results of the experiments.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.1">Experimental Setup</head><p>Training an ANN usually requires a lot of training examples (on the order of millions in the case of a very deep CNN). It is advised to have at least several times as many examples as the network has parameters <ref type="bibr" target="#b10">[13]</ref>. Manually downloading, cropping and labelling such a high number of examples is infeasible. Therefore, we tested three captcha providers with obtainable source code in order to generate large enough datasets: Secureimage PHP Captcha [5], captchas.net [2] and BotDetect captcha <ref type="bibr" target="#b0">[1]</ref>. We selected the last one, as it provides the most variable set of schemes.</p><p>BotDetect CAPTCHA is a paid, up-to-date service used by many government institutions and companies all around the world <ref type="bibr" target="#b0">[1]</ref>. They offer a free licence with access to obfuscated source code. We selected 11 very diverse schemes out of the 60 available, see Figure <ref type="figure">6</ref> for example images, and generated 100,000 images cropped to one character for each scheme. The cropping is done to 32x32 pixel windows, which is the size of the sliding window. The cropped images are then used for training both the localization and the recognition ANN. The testing set consists of 1000 whole captcha images with 5 characters each.</p><p>The schemes display various security features such as random lines and other objects occluding the characters, jagged or translucent character edges and global warp. The scheme s10 -Circles stands out with its colour-inverting, randomly placed circles. This property could make it harder to recognize than the others, because the solver needs to account for random parts of the characters and their background switching colours.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.2">Artificial Neural Networks</head><p>The perceptron with a single hidden layer (SLP), the perceptron with three hidden layers (MLP) and convolutional neural networks were tested for both localization and recognition. In all ANNs, rectified linear units were used as activation functions.</p><p>The first experiment tested the influence of the number of hidden neurons of an SLP. The number of hidden neurons used for the localization network was lns={15,30,60,90} and the number of neurons for the recognition network was rns={30,60,120,180,250}. The results depicted in Figure <ref type="figure">7</ref> show the recognition rate for 1000 whole captcha images (all characters have to be correctly recognized) on the scheme s10, which we selected because we consider it the most difficult one. The next experiment was the same, but an MLP with three hidden layers was used instead of the SLP. The results, depicted in Figure <ref type="figure" target="#fig_6">8</ref>, suggest that adding more hidden layers improves the accuracy of neither the localization nor the recognition. Therefore, the remaining experiments were done using the SLP, as it can be trained faster.</p><p>Both CNN architectures resemble LeNet-5, presented in <ref type="bibr" target="#b14">[17]</ref> for handwritten digit recognition. The localization CNN consists of two convolutional layers with six and sixteen 5x5 kernels, each of them followed by a 2x2 max pooling layer; the last layer of the network is a fully connected output layer. Table <ref type="table">1</ref>: Results of the statistical test of Friedman <ref type="bibr" target="#b9">[12]</ref> and the correction for simultaneous hypotheses testing by Holm <ref type="bibr" target="#b12">[15]</ref> and Shaffer <ref type="bibr" target="#b15">[18]</ref>. The rejection thresholds are computed for the family-wise significance level p = 0.05 for a single scheme. 
The recognition CNN contains an additional fully connected layer with 120 neurons right before the output layer, as illustrated in Figure <ref type="figure" target="#fig_7">9</ref>.</p></div>
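For orientation, the layer sizes implied by this LeNet-5-like architecture for a 32x32 input window follow from simple arithmetic (assuming stride-1 "valid" convolutions and non-overlapping 2x2 pooling, which matches LeNet-5 but is our reading, not an explicit statement of the paper):

```python
def conv_out(size, kernel, stride=1):
    # spatial output size of a 'valid' convolution
    return (size - kernel) // stride + 1

def pool_out(size, window=2):
    # spatial output size of non-overlapping max pooling
    return size // window

s = pool_out(conv_out(32, 5))   # 6 kernels 5x5: 32 -> 28, pool -> 14
s = pool_out(conv_out(s, 5))    # 16 kernels 5x5: 14 -> 10, pool -> 5
flat = 16 * s * s               # 400 features enter the final layers
```

The 400 flattened features then feed either the output layer directly (localization) or the extra 120-neuron fully connected layer (recognition).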
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.3">Results</head><p>After choosing the right architectures, we tested the accuracy of captcha transcription on each scheme separately, with both the training and testing sets generated by the same scheme. All images in the test set contained 5 characters, and only a successful transcription of all of them was accepted as a correct answer. The results, depicted in Figure <ref type="figure" target="#fig_8">10</ref>, show the appealing performance of all tested configurations. In most cases it does not matter whether the localization network is an SLP or a CNN, but the CNN clearly outperforms the SLP in the role of the recognition network. This observation is also confirmed by the statistical test of Friedman <ref type="bibr" target="#b9">[12]</ref> with corrections for simultaneous hypothesis testing by Holm <ref type="bibr" target="#b12">[15]</ref> and Shaffer <ref type="bibr" target="#b15">[18]</ref>, see Table <ref type="table">1</ref>.</p><p>A subsequent experiment tested the accuracy of captcha transcription when the training and testing sets consist of images generated by all the schemes mixed together, see Figure <ref type="figure" target="#fig_9">11</ref>. In this experiment the CNN outperformed the SLP not only in recognition but also in localization accuracy. The most visible difference is on schemes s08, s18 and s41. The overall performance is again compared by the statistical test, with results summarized in Table <ref type="table">2</ref>. All accuracies are lower than in the previous experiment, as the complexity of the data set has grown (the data were generated by multiple schemes) while the number of training examples remained the same.</p><p>Table <ref type="table">2</ref>: Results of the statistical test of Friedman <ref type="bibr" target="#b9">[12]</ref> and the correction for simultaneous hypotheses testing by Holm <ref type="bibr" target="#b12">[15]</ref> and Shaffer <ref type="bibr" target="#b15">[18]</ref>. 
The rejection thresholds are computed for the family-wise significance level p = 0.05 for all schemes.</p><p>The last experiment tested the accuracy of captcha transcription in a leave-one-scheme-out scenario. The training set contained images generated by only 10 schemes, and the images used for testing were all generated by the last, yet unseen scheme. Trying to recognize characters from images generated by an unknown scheme is a challenging task; furthermore, the schemes were selected to differ from each other as much as possible. The results are depicted in Figure <ref type="figure" target="#fig_10">12</ref>. All configurations using a perceptron as the recognition classifier fail on all except the simplest schemes, e.g. s12 and s16. The combination of two CNNs is the best in all cases, with the only exception being the scheme s30, where the combination of the localization perceptron and the recognition CNN is the best. Overall, the accuracy may seem relatively low, especially for schemes s10, s30, s31 and s41, but let us recall that a recognition rate of 1% is already considered enough to compromise a scheme. The failure of CNNs on scheme s41 is understandable, as the spiderweb background confuses the convolutional kernels learned on the other schemes. This is the most important experiment, showing the ability to solve yet unseen captchas. The ranking of all algorithms is summarized in Table <ref type="table" target="#tab_2">3</ref> and the statistical tests in Table <ref type="table" target="#tab_3">4</ref>. Table 4: Results of the statistical test of Friedman <ref type="bibr" target="#b9">[12]</ref> and the correction for simultaneous hypotheses testing by Holm <ref type="bibr" target="#b12">[15]</ref> and Shaffer <ref type="bibr" target="#b15">[18]</ref>. The rejection thresholds are computed for the family-wise significance level p = 0.05 for the leave-one-scheme-out scenario. 
The above experiments show that most current schemes can be compromised using either two convolutional networks or a localization perceptron and a recognition CNN.</p></div>
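The average rankings reported in Table 3 can be reproduced from per-scheme accuracies in the usual way: rank the algorithms on each scheme (rank 1 = highest accuracy) and average across schemes. A sketch, where `accuracies` is a hypothetical schemes-by-algorithms matrix and ties are ignored for simplicity:

```python
import numpy as np

def average_ranks(accuracies):
    """Average rank of each algorithm across schemes; higher accuracy
    means a lower (better) rank. Sketch only; ties are not handled."""
    acc = np.asarray(accuracies, dtype=float)    # shape (schemes, algorithms)
    order = (-acc).argsort(axis=1)               # best algorithm first per row
    ranks = np.empty_like(order)
    rows = np.arange(acc.shape[0])[:, None]
    ranks[rows, order] = np.arange(1, acc.shape[1] + 1)
    return ranks.mean(axis=0)
```

The Friedman test then asks whether these average ranks differ more than chance would allow; Holm and Shaffer adjust the per-pair rejection thresholds.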
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6">Conclusion</head><p>In this paper, we presented a novel captcha recognition approach which can fully replace the state-of-the-art scheme-specific pipelines. Our approach not only consists of fewer steps, but is also more general, as it can be applied to a wide variety of captcha schemes without modification. We were able to compromise 10 out of 11 schemes using two CNNs, or a localization perceptron and a recognition CNN, without previously seeing any example image generated by that particular scheme. Furthermore, we were able to break all 11 captcha schemes with an accuracy higher than 50%, using a CNN for both the localization and the recognition, when we included example images of each character generated by the particular scheme in the training set. Let us recall that a 1% recognition rate is enough for a scheme to be considered compromised.</p><p>We experimentally compared the ability of SLPs, MLPs and CNNs to transcribe characters from captcha images. According to our experiments, CNNs perform much better in both localization and recognition.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Figure 1 :</head><label>1</label><figDesc>Figure 1: Older Google reCaptcha with the occlusion line.</figDesc><graphic coords="2,98.07,637.22,158.04,65.75" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_1"><head>Figure 2 :</head><label>2</label><figDesc>Figure 2: Current Google reCaptcha with image recognition challenge.</figDesc><graphic coords="2,345.88,407.90,158.04,239.78" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_2"><head>Figure 3 :</head><label>3</label><figDesc>Figure 3: A noughts and crosses game used as a captcha.</figDesc><graphic coords="3,84.40,220.40,205.38,82.36" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_3"><head>Figure 4 :</head><label>4</label><figDesc>Figure 4: Example of a heat map for a challenge generated by scheme s16.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_4"><head>Figure 5 :</head><label>5</label><figDesc>Figure 5: Heatmap clustering on random character locations</figDesc><graphic coords="3,322.41,353.37,110.71,159.14" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_5"><head>Figure 6 :Figure 7 :</head><label>67</label><figDesc>Figure 6: Schemes generated by the BotDetect captcha</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_6"><head>Figure 8 :</head><label>8</label><figDesc>Figure 8: Comparison of MLP recognition rate on the scheme s10, depending on the number of neurons used by the localization network (lns) and the recognition network (rns).</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_7"><head>Figure 9 :</head><label>9</label><figDesc>Figure 9: The architecture of a character recognition CNN.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_8"><head>Figure 10 :</head><label>10</label><figDesc>scheme</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_9"><head>Figure 11 :</head><label>11</label><figDesc>scheme</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_10"><head>Figure 12 :</head><label>12</label><figDesc>Figure 12: The accuracy of captcha image transcription in leave-one-scheme-out scenario.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_2"><head>Table 3 :</head><label>3</label><figDesc>Average Rankings of the algorithms</figDesc><table><row><cell>Algorithm</cell><cell>Ranking</cell></row><row><cell>CNN+CNN SLP+CNN CNN+SLP SLP+SLP</cell><cell>1.27 2.00 3.27 3.45</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_3"><head>Table 4 :</head><label>4</label><figDesc>Results of the statistical test of Friedman</figDesc><table /></figure>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="1" xml:id="foot_0">The acronym captcha will be written in lowercase for better readability.</note>
		</body>
		<back>

			<div type="acknowledgement">
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Acknowledgement</head><p>The research reported in this paper has been supported by the Czech Science Foundation (GA ČR) grant 17-01251 and student grant SGS17/210/OHK3/3T/18.</p></div>
			</div>

			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<monogr>
		<ptr target="www.captcha.com[Cited2017-06-01" />
		<title level="m">Botdetect captcha generator</title>
				<imprint>
			<date type="published" when="2017">2017</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<monogr>
		<ptr target="2017-06-01" />
		<title level="m">Metal captcha</title>
				<imprint>
			<date type="published" when="2017">2017</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<monogr>
		<ptr target=".www.wordpress.org/plugins/resisty[Cited2017-06-01" />
		<title level="m">Resisty captcha</title>
				<imprint>
			<date type="published" when="2017">2017</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">Multiple object recognition with visual attention</title>
		<author>
			<persName><forename type="first">Jimmy</forename><surname>Ba</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Volodymyr</forename><surname>Mnih</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Koray</forename><surname>Kavukcuoglu</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">International Conference on Learning Representations</title>
				<imprint>
			<date type="published" when="2015">2015</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">The end is nigh: Generic solving of textbased captchas</title>
		<author>
			<persName><forename type="first">Elie</forename><surname>Bursztein</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Jonathan</forename><surname>Aigrain</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Angelika</forename><surname>Moscicki</surname></persName>
		</author>
		<author>
			<persName><forename type="first">John</forename><forename type="middle">C</forename><surname>Mitchell</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">8th USENIX Workshop on Offensive Technologies (WOOT 14)</title>
				<imprint>
			<date type="published" when="2014">2014</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">How good are humans at solving captchas? a large scale evaluation</title>
		<author>
			<persName><forename type="first">Elie</forename><surname>Bursztein</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Steven</forename><surname>Bethard</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Celine</forename><surname>Fabry</surname></persName>
		</author>
		<author>
			<persName><forename type="first">John</forename><forename type="middle">C</forename><surname>Mitchell</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Dan</forename><surname>Jurafsky</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">IEEE Symposium on Security and Privacy</title>
				<imprint>
			<publisher>IEEE</publisher>
			<date type="published" when="2010">2010</date>
			<biblScope unit="page" from="399" to="413" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">Text-based captcha strengths and weaknesses</title>
		<author>
			<persName><forename type="first">Elie</forename><surname>Bursztein</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Matthieu</forename><surname>Martin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">John</forename><surname>Mitchell</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 18th ACM conference on Computer and communications security</title>
				<meeting>the 18th ACM conference on Computer and communications security</meeting>
		<imprint>
			<publisher>ACM</publisher>
			<date type="published" when="2011">2011</date>
			<biblScope unit="page" from="125" to="138" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">Designing human friendly human interaction proofs (hips)</title>
		<author>
			<persName><forename type="first">Kumar</forename><surname>Chellapilla</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Kevin</forename><surname>Larson</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Patrice</forename><surname>Simard</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Mary</forename><surname>Czerwinski</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the SIGCHI conference on Human factors in computing systems</title>
				<meeting>the SIGCHI conference on Human factors in computing systems</meeting>
		<imprint>
			<publisher>ACM</publisher>
			<date type="published" when="2005">2005</date>
			<biblScope unit="page" from="711" to="720" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">Breaking recaptchas with unpredictable collapse: heuristic character segmentation and recognition</title>
		<author>
			<persName><forename type="first">Claudia</forename><surname>Cruz-Perez</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Oleg</forename><surname>Starostenko</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Fernando</forename><surname>Uceda-Ponga</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Vicente</forename><surname>Alarcon-Aquino</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Leobardo</forename><surname>Reyes-Cabrera</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Pattern Recognition</title>
				<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="2012">2012</date>
			<biblScope unit="page" from="155" to="165" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<analytic>
		<title level="a" type="main">The use of ranks to avoid the assumption of normality implicit in the analysis of variance</title>
		<author>
			<persName><forename type="first">Milton</forename><surname>Friedman</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Journal of the American Statistical Association</title>
		<imprint>
			<biblScope unit="volume">32</biblScope>
			<biblScope unit="issue">200</biblScope>
			<biblScope unit="page" from="675" to="701" />
			<date type="published" when="1937">1937</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<monogr>
		<title level="m" type="main">Deep Learning</title>
		<author>
			<persName><forename type="first">Ian</forename><surname>Goodfellow</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Yoshua</forename><surname>Bengio</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Aaron</forename><surname>Courville</surname></persName>
		</author>
		<ptr target="http://www.deeplearningbook.org" />
		<imprint>
			<date type="published" when="2016">2016</date>
			<publisher>MIT Press</publisher>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<analytic>
		<title level="a" type="main">Multi-digit number recognition from street view imagery using deep convolutional neural networks</title>
		<author>
			<persName><forename type="first">Ian</forename><surname>Goodfellow</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Yaroslav</forename><surname>Bulatov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Julian</forename><surname>Ibarz</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Sacha</forename><surname>Arnoud</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Vinay</forename><surname>Shet</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">International Conference on Learning Representations</title>
				<imprint>
			<date type="published" when="2014">2014</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<analytic>
		<title level="a" type="main">A simple sequentially rejective multiple test procedure</title>
		<author>
			<persName><forename type="first">Sture</forename><surname>Holm</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Scandinavian journal of statistics</title>
		<imprint>
			<biblScope unit="page" from="65" to="70" />
			<date type="published" when="1979">1979</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b13">
	<analytic>
		<title level="a" type="main">Beyond sliding windows: Object localization by efficient subwindow search</title>
		<author>
			<persName><forename type="first">Ch</forename><surname>Lampert</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">B</forename><surname>Blaschko</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Hofmann</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">CVPR 2008</title>
				<meeting><address><addrLine>Los Alamitos, CA, USA</addrLine></address></meeting>
		<imprint>
			<publisher>Max-Planck-Gesellschaft</publisher>
			<date type="published" when="2008">2008</date>
			<biblScope unit="page" from="1" to="8" />
		</imprint>
		<respStmt>
			<orgName>IEEE Computer Society</orgName>
		</respStmt>
	</monogr>
</biblStruct>

<biblStruct xml:id="b14">
	<analytic>
		<title level="a" type="main">Gradient-based learning applied to document recognition</title>
		<author>
			<persName><forename type="first">Yann</forename><surname>LeCun</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Léon</forename><surname>Bottou</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Yoshua</forename><surname>Bengio</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Patrick</forename><surname>Haffner</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the IEEE</title>
				<meeting>the IEEE</meeting>
		<imprint>
			<date type="published" when="1998">1998</date>
			<biblScope unit="volume">86</biblScope>
			<biblScope unit="page" from="2278" to="2324" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b15">
	<analytic>
		<title level="a" type="main">Multiple hypothesis testing</title>
		<author>
			<persName><forename type="first">Juliet</forename><forename type="middle">Popper</forename><surname>Shaffer</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Annual review of psychology</title>
		<imprint>
			<biblScope unit="volume">46</biblScope>
			<biblScope unit="issue">1</biblScope>
			<biblScope unit="page" from="561" to="584" />
			<date type="published" when="1995">1995</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b16">
	<analytic>
		<title level="a" type="main">Captcha recognition with active deep learning</title>
		<author>
			<persName><forename type="first">F</forename><surname>Stark</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Hazırbaş</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Triebel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Cremers</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">GCPR Workshop on New Challenges in Neural Computation</title>
				<imprint>
			<date type="published" when="2015">2015</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b17">
	<analytic>
		<title level="a" type="main">Captcha: Using hard AI problems for security</title>
		<author>
			<persName><forename type="first">Luis</forename><surname>Von Ahn</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Manuel</forename><surname>Blum</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Nicholas</forename><forename type="middle">J</forename><surname>Hopper</surname></persName>
		</author>
		<author>
			<persName><forename type="first">John</forename><surname>Langford</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Advances in Cryptology-EUROCRYPT 2003</title>
				<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="2003">2003</date>
			<biblScope unit="page" from="294" to="311" />
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
