<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <issn pub-type="ppub">1613-0073</issn>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Breaking CAPTCHAs with Convolutional Neural Networks</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Martin Kopp</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
<string-name>Matěj Nikl</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
<string-name>Martin Holeňa</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Cisco Systems, Cognitive Research Team in Prague</institution>
        </aff>
        <aff id="aff1">
          <label>1</label>
<institution>Faculty of Information Technology, Czech Technical University in Prague</institution>
          ,
          <addr-line>Thákurova 9, 160 00 Prague</addr-line>
        </aff>
        <aff id="aff2">
          <label>2</label>
<institution>Institute of Computer Science, Academy of Sciences of the Czech Republic</institution>
          ,
          <addr-line>Pod Vodárenskou věží 2, 182 07 Prague</addr-line>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2017</year>
      </pub-date>
      <volume>1885</volume>
      <fpage>93</fpage>
      <lpage>99</lpage>
      <abstract>
<p>This paper studies reverse Turing tests, called CAPTCHAs, used to distinguish humans from computers. Contrary to classical Turing tests, in this case the judge is not a human but a computer. The main purposes of such tests are securing user logins against dictionary or brute-force password guessing, preventing automated usage of various services, stopping bots from spamming on forums, and many others. Typical approaches to solving text-based CAPTCHAs automatically are based on a scheme-specific pipeline containing hand-designed pre-processing, denoising, segmentation, post-processing and optical character recognition. Only the last part, optical character recognition, is usually based on a machine learning algorithm. We present an approach using neural networks and a simple clustering algorithm that consists of only two steps, character localisation and recognition. We tested our approach on 11 different schemes selected to present very diverse security features. We experimentally show that using convolutional neural networks is superior to multi-layered perceptrons.</p>
      </abstract>
      <kwd-group>
        <kwd>CAPTCHA</kwd>
        <kwd>convolutional neural networks</kwd>
        <kwd>network security</kwd>
        <kwd>optical character recognition</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1 Introduction</title>
<p>
        The acronym CAPTCHA stands for Completely
Automated Public Turing test to tell Computers and Humans
Apart, and was coined in 2003 by von Ahn et al. [
        <xref ref-type="bibr" rid="ref21">20</xref>
        ]. The
fundamental idea is to use hard AI problems that are easily solved
by most humans but infeasible for current computer
programs. Captcha is widely used to distinguish human
users from computer bots and automated scripts. (The
acronym captcha will be written in lowercase hereafter for better
readability.) Nowadays, it is an established security mechanism to prevent
automated posting on internet forums, voting in online
polls, bulk downloading of files and many other
abusive usages of web services.
      </p>
      <p>
        There are many available captcha schemes, ranging from
classical text-based over image-based to many unusual
custom-designed solutions, e.g. [
        <xref ref-type="bibr" rid="ref4 ref5">3, 4</xref>
        ]. Because most of
the older schemes have already been proven vulnerable to
attacks and thus found unsafe [
        <xref ref-type="bibr" rid="ref20 ref8">7, 19</xref>
        ], new schemes are
being invented. Despite that trend, there are still many
places where the classical text-based schemes are used as
the main or at least as a fallback solution. For example,
Google uses its text-based schemes when you fail in the
newer image-based ones.
      </p>
<p>This paper focuses on automatic character
recognition across multiple text-based CAPTCHA schemes using
artificial neural networks (ANNs) and clustering. The
ultimate goal is to take a captcha challenge as input and
output a transcription of the text presented in the
challenge. Contrary to most prior art, our approach is
general and can solve multiple schemes without modification
of any part of the algorithm.</p>
<p>The experimental part compares the performance of
shallow (only one hidden layer) and deep (multiple hidden
layers) ANNs and shows the benefits of using
convolutional neural networks (CNNs) over multi-layered perceptrons
(MLPs).</p>
      <p>The rest of this paper is organised as follows. The
related work is briefly reviewed in the next section. Section 3
surveys the current captcha solutions. Section 4 presents
our approach to breaking captcha challenges. The
experimental evaluation is summarised in Section 5 followed by
the conclusion.</p>
    </sec>
    <sec id="sec-2">
<title>2 Related Work</title>
<p>
        Most papers about breaking captchas focus heavily on one
particular scheme. One example is [
        <xref ref-type="bibr" rid="ref12">11</xref>
        ], with
preprocessing, text alignment and everything else fitted
to the reCaptcha 2011 scheme. To our knowledge, the
most general approach was presented in [
        <xref ref-type="bibr" rid="ref8">7</xref>
        ]. That approach
is based on an effective selection of the best segmentation
cuts, which are presented to a k-NN classifier. It was tested
on many up-to-date text-based schemes, with better results
than specialized solutions.
      </p>
<p>
        The most recent approaches use neural networks [
        <xref ref-type="bibr" rid="ref20">19</xref>
        ].
Their results are not yet as impressive as those of the previous
approaches, but the neural-net-based approaches are improving
very quickly. Our work is based on CNNs, motivated
by their success in pattern recognition, e.g. [
        <xref ref-type="bibr" rid="ref15 ref7">6, 14</xref>
        ].
      </p>
<p>
        The Microsoft researcher Chellapilla, who intensively
studied human interaction proofs, stated that, depending on
the cost of the attack, automated scripts should not be more
successful than 1 in 10,000 attempts, while the human success
rate should approach 90% [
        <xref ref-type="bibr" rid="ref11">10</xref>
        ]. This is generally considered
too ambitious a goal after the publication of [
        <xref ref-type="bibr" rid="ref9">8</xref>
        ], showing
the human success rate in completing captcha challenges,
and [
        <xref ref-type="bibr" rid="ref10">9</xref>
        ], showing that random guesses can be successful.
Consequently, a captcha is considered compromised when
the attacker's success rate surpasses 1%.
      </p>
    </sec>
    <sec id="sec-3">
<title>3 Captcha Schemes Survey</title>
<p>This section surveys the currently available captcha
schemes and the challenges they present.</p>
      <sec id="sec-3-1">
<title>3.1 Text-Based</title>
<p>The first ever use of a captcha was in 1997 by the software
company AltaVista, which sought a way to prevent
automated submissions to its search engine. It was a simple
text-based test, which was sufficient at that time, but it
was quickly proven ineffective as computer
character-recognition success rates improved. The most
commonly used techniques to prevent automatic recognition
can be divided into two groups, called anti-recognition
features and anti-segmentation features.</p>
<p>The anti-recognition features, such as different sizes and
fonts of characters or rotation, were a straightforward first
step towards more sophisticated captcha schemes. All those
features are well accepted by humans, as we learn several
shapes of letters from childhood, e.g. the handwritten
alphabet, small letters, capitals. An effective way of reducing
the classifier accuracy is distortion, a
technique in which ripples and warp are added to the image.
But excessive distortion can make reading very difficult even for
humans, and thus the usage of this feature is slowly vanishing,
being replaced by anti-segmentation features.</p>
<p>The anti-segmentation features are not designed to
complicate single-character recognition; instead, they try
to make the automated segmentation of the captcha image
unmanageable. The first two features used for this
purpose were added noise and a confusing background, but
it turned out that both of them are a bigger obstacle for
humans than for computers, and therefore they were replaced
by occlusion lines; an example can be seen in Figure 1.
The most recent anti-segmentation feature is called
negative kerning: the neighbouring characters
are moved so close to each other that they can eventually
overlap. It turned out that humans are still able to read the
overlapping text with only a small error rate, but for
computers it is almost impossible to find the right segmentation.</p>
      </sec>
      <sec id="sec-3-2">
        <title>3.2 Audio-Based</title>
        <p>From the beginning, the adoption of captcha schemes was
problematic. Users were annoyed by captchas that were
hard to solve and had to try multiple times. The people
affected the most were those with visual impairments or
various reading disorders such as dyslexia. Soon, an
alternative emerged in the form of audio captchas. Instead of
displaying images, a voice reading letters and digits is played.
In order to remain effective and secure, the captcha has to
be resistant to automated sound analysis. For this purpose
various background noises and sound distortions are added.
This scheme is now a standard alternative option
on major websites that use captcha.</p>
      </sec>
      <sec id="sec-3-3">
        <title>3.3 Image-Based</title>
<p>Currently, the most prominent design is the image-based
captcha. A series of images showing various objects is
presented to the user, and the task is to select the images
matching a topic given by a keyword or by an example image.
For example, the user is shown a series of images of
various landscapes and is asked to select those with trees, as
in Figure 2. This type of captcha has gained huge
popularity especially on touchscreen devices, where tapping
the screen is preferable to typing. In the case of Google
reCaptcha there are nine images, of which 4&#8211;6 are
the correct answer. In order to successfully complete the
challenge, a user is allowed to have one wrong answer.</p>
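        <p>As a back-of-the-envelope illustration (our own arithmetic, not a figure from the paper), consider a bot that marks a uniformly random subset of the nine images: if the grader tolerates at most one mismatched image, the expected pass rate already exceeds the 1% compromise threshold mentioned in Section 2.</p>

```python
# Our own back-of-the-envelope arithmetic (not from the paper): pass rate
# of a bot that guesses a uniformly random subset of the nine images,
# assuming the grader tolerates at most one mismatched image.
from math import comb

n = 9           # images per challenge
tolerated = 1   # at most one wrong answer is allowed
# number of subsets within Hamming distance <= 1 of the correct subset
passing = sum(comb(n, d) for d in range(tolerated + 1))
pass_rate = passing / 2 ** n   # 10/512, roughly 1.95%
```

        <p>Under this toy model the random-guess pass rate is 10/512, about 1.95%, which illustrates why image captchas also rely on tracking signals rather than the selection task alone.</p>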
<p>A relatively new but fast-spreading type of image captcha
combines the pattern recognition task presented above
with object localisation. The number of squares has also been
increased from 9 to 16.</p>
        <p>In parallel with the image-based captchas developed by
Google and other big players, many alternative schemes have
appeared. They are various text-based
schemes hidden in a video instead of a distorted image, or
simple logical games or puzzles. As an example of an
easy-to-solve logical game we selected noughts and crosses,
Figure 3. All of these have recently been dominated by Google's
noCaptcha button, which uses browser cookies, user profiles
and history to track user behaviour and distinguish real
users from bots.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4 Our Approach</title>
      <p>
        Our algorithm has two main stages, localisation and
recognition. The localisation can be further divided into heat
map generation and clustering. Consequently, our
algorithm consists of three steps:
1. Create a heat map using a sliding window with an
ANN that classifies whether or not there is a character in
the center of the window.
2. Use the k-means algorithm to determine the most
probable locations of characters from the heat map.
3. Recognize the characters using another specifically
trained ANN.
We decided to use the sliding-window technique to
localize characters within a CAPTCHA image. This approach
is well known in the context of object localization [
        <xref ref-type="bibr" rid="ref17">16</xref>
        ].
A sliding window is a rectangular region of fixed width
and height that slides across an image. Each window
serves as an input to a feed-forward ANN with a
single output neuron, whose output value is the probability that
the input image has a character in its center. Figure 4
shows an example of such a heat map. To enable
character localization even at the very edge of an image, one can
pad each input image with black pixels.
When a heat map is complete, all points with a value greater
than 0.5 are added to the list of points to be clustered. As
this is still work in progress, we simplified the situation by
knowing the number of characters within the image in
advance; therefore, knowing the correct number of
clusters k, we decided to use k-means clustering to determine
the windows with characters close to their centers. Almost
any clustering algorithm could be used, though, preferably
one that can determine the correct number of clusters.
The k centroids are initialized uniformly from left to
right, vertically in the middle, as this provides a good
initial estimate. Figure 5 illustrates the whole idea
((a) initial centroids, (b) final centroids).
Assuming that the character localization part worked well,
the windows containing characters are now ready to be
recognized. This task is known to be easy for computers to
solve; in fact, they are even better at it than humans [
        <xref ref-type="bibr" rid="ref11">10</xref>
        ].
      </p>
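      <p>The localisation stage described above can be sketched as follows. This is a minimal sketch, not the paper's implementation: `net_score` is a hypothetical stand-in for the trained localisation ANN, and the toy k-means assumes every cluster keeps at least one point.</p>

```python
# Sketch of the localisation stage: heat map from a sliding window over a
# black-padded image, then k-means on points scoring above 0.5, with
# centroids initialized uniformly left to right, vertically in the middle.
import numpy as np

def heat_map(image, net_score, win=32):
    """Slide a win x win window over the image (padded with black pixels
    so even edge characters can be localized) and record the network's
    probability that a character sits in the window center."""
    h, w = image.shape
    pad = win // 2
    padded = np.zeros((h + 2 * pad, w + 2 * pad))  # black-pixel padding
    padded[pad:pad + h, pad:pad + w] = image
    heat = np.zeros((h, w))
    for y in range(h):
        for x in range(w):
            heat[y, x] = net_score(padded[y:y + win, x:x + win])
    return heat

def localise(heat, k, iters=20):
    """Cluster heat-map points above 0.5 into k character locations."""
    ys, xs = np.where(heat > 0.5)
    pts = np.stack([xs, ys], axis=1).astype(float)
    h, w = heat.shape
    cx = (np.arange(k) + 0.5) * w / k              # uniform left-to-right
    centroids = np.stack([cx, np.full(k, h / 2)], axis=1)
    for _ in range(iters):
        assign = np.argmin(((pts[:, None] - centroids) ** 2).sum(-1), axis=1)
        for j in range(k):
            if np.any(assign == j):                # no empty-cluster handling
                centroids[j] = pts[assign == j].mean(axis=0)
    return centroids                               # (x, y) window centers
```

      <p>In practice `net_score` would be the localisation ANN of step 1; here any function mapping a 32x32 window to a probability fits.</p>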
<p>Again, a feed-forward ANN is used, this time with an
output layer of 36 neurons estimating the
probability distribution over the classes: digits 0&#8211;9 and
uppercase letters A&#8211;Z. Finally, the CAPTCHA transcription is
created by writing the recognized characters in ascending
order of their x-axis coordinates.</p>
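      <p>The transcription step can be illustrated with a small sketch; the `one_hot` helper and the `(x, probabilities)` pairing are our illustrative assumptions, not the paper's interface.</p>

```python
# Illustrative sketch of the transcription step: each located window
# yields a 36-way probability vector (digits 0-9, then uppercase A-Z),
# and recognized characters are emitted in ascending x order.
import string
import numpy as np

CLASSES = string.digits + string.ascii_uppercase   # 36 classes

def transcribe(windows):
    """windows: list of (x_coordinate, probability_vector) pairs."""
    ordered = sorted(windows, key=lambda w: w[0])  # left to right
    return ''.join(CLASSES[int(np.argmax(p))] for _, p in ordered)

def one_hot(ch):
    """Hypothetical helper producing a perfectly confident prediction."""
    v = np.zeros(len(CLASSES))
    v[CLASSES.index(ch)] = 1.0
    return v
```

      <p>For example, `transcribe([(30, one_hot('B')), (5, one_hot('7'))])` yields '7B': the window at x=5 is written before the one at x=30.</p>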
    </sec>
    <sec id="sec-5">
      <title>Experimental Evaluation</title>
<p>This section describes the selection of a captcha suite and
the generation of the labelled database, followed by a detailed
description of the artificial neural networks used in our
experiments. The last part of this section presents the results
of the experiments.</p>
      <sec id="sec-5-1">
<title>5.1 Experimental Setup</title>
<p>
          Training an ANN usually requires a lot of training
examples (on the order of millions in the case of a very deep
CNN); it is advised to have at least several times the
number of parameters in the network [
          <xref ref-type="bibr" rid="ref14">13</xref>
          ]. Manually
downloading, cropping and labelling such a high number of
examples is infeasible. Therefore, we tested three captcha
providers with obtainable source code, so that we could
generate large enough datasets: Secureimage PHP Captcha [
          <xref ref-type="bibr" rid="ref6">5</xref>
          ],
captchas.net [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ] and BotDetect captcha [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ]. We selected the
last one, as it provides the most variable set of schemes.
        </p>
<p>
          BotDetect CAPTCHA is a paid, up-to-date service
used by many government institutions and companies all
around the world [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ]. They offer a free licence with
access to obfuscated source code. We selected 11 very
diverse schemes out of the 60 available, see Figure 6 for
example images, and generated 100,000 images cropped
to one character for each scheme. The cropping is done to
32x32-pixel windows, which is the size of the sliding
window. The cropped images are then used for training the
localization as well as the recognition ANN. The testing set
consists of 1000 whole captcha images with 5 characters
each.
        </p>
<p>The schemes display various security features, such as
random lines and other objects occluding the characters,
jagged or translucent character edges, and global warp.
The scheme s10 (Circles) stands out with its colour-inverting,
randomly placed circles. This property could make it
harder to recognize than the others, because the solver needs
to account for random parts of the characters and their
background switching colours.</p>
      </sec>
      <sec id="sec-5-2">
<title>5.2 Artificial Neural Networks</title>
<p>A perceptron with a single hidden layer (SLP), a
perceptron with three hidden layers (MLP) and convolutional
neural networks were tested in both localization and
recognition. In all ANNs, rectified linear units were used as
activation functions.</p>
<p>The first experiment tested the influence of the number of
hidden neurons in an SLP. The number of hidden neurons
used for the localization network was lns={15,30,60,90},
and the number of neurons for the recognition network was
rns={30,60,120,180,250}. The results depicted in
Figure 7 show the recognition rate on 1000 whole captcha
images (all characters have to be correctly recognized) of
the scheme s10. The scheme s10 was selected because we
consider it the most difficult one.</p>
        <p>[Figure 6: (a) Snow (s04), (b) Stitch (s08), (c) Circles (s10),
(d) Mass (s14), (e) BlackOverlap (s16), (f) Overlap2 (s18),
(g) FingerPrints (s25), (h) ThinWavyLetters (s30),
(i) Chalkboard (s31), (j) Spiderweb (s41), (k) MeltingHeat2 (s52)]</p>
<p>The next experiment was the same, but an MLP with
three hidden layers was used instead of the SLP. The results,
depicted in Figure 8, suggest that adding more hidden
layers improves the accuracy of neither the localization nor
the recognition. Therefore, the remaining experiments were
done using the SLP, as it can be trained faster.</p>
<p>
          Both CNN architectures resemble the LeNet-5
presented in [
          <xref ref-type="bibr" rid="ref18">17</xref>
          ] for handwritten digit recognition. The
localization CNN consists of two convolutional layers with
six and sixteen 5x5 kernels, each of them followed by a
2x2 max-pooling layer; finally, the last layer of the
network is a fully connected output layer.
        </p>
        <p>[Figures 7 and 8: Accuracy comparison on the s10 scheme]</p>
<p>The recognition CNN contains an additional
fully connected layer with 120 neurons right before the output
layer, as illustrated in Figure 9.</p>
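        <p>As a sanity check on the layer sizes above (assuming stride-1 'valid' convolutions and non-overlapping 2x2 pooling, as in LeNet-5), the 32x32 sliding window shrinks to a 400-dimensional feature vector before the fully connected part:</p>

```python
# Sanity check of the layer sizes in the LeNet-5-like nets described
# above, assuming 'valid' stride-1 convolutions and 2x2 max pooling.
def conv(size, kernel):      # valid convolution output side length
    return size - kernel + 1

def pool(size, window=2):    # non-overlapping max pooling
    return size // window

side = 32                    # sliding-window size
side = pool(conv(side, 5))   # six 5x5 kernels: 32 -> 28 -> 14
side = pool(conv(side, 5))   # sixteen 5x5 kernels: 14 -> 10 -> 5
features = side * side * 16  # flattened input to the fully connected part
```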
      </sec>
      <sec id="sec-5-3">
<title>5.3 Results</title>
<p>
          After choosing the right architectures, we proceeded to
test the accuracy of captcha transcription on each scheme
separately, where both the training and testing sets were
generated by the same scheme. All images in the test set
contained 5 characters, and only a successful transcription of
all of them was accepted as a correct answer. The results,
depicted in Figure 10, show appealing performance of all
tested configurations. In most cases it does not matter
whether the localization network is an SLP or a CNN, but the
CNN clearly outperforms the SLP in the role of the
recognition network. This observation is also confirmed by the
statistical test of Friedman [
          <xref ref-type="bibr" rid="ref13">12</xref>
          ] with corrections for
simultaneous hypothesis testing by Holm [
          <xref ref-type="bibr" rid="ref16">15</xref>
          ] and Shaffer [
          <xref ref-type="bibr" rid="ref19">18</xref>
          ],
see Table 1.
        </p>
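        <p>The Friedman statistic behind this comparison can be sketched as follows; this is a minimal version on toy numbers, assuming no ties in the per-scheme rankings, and does not reproduce the paper's measured values.</p>

```python
# Minimal sketch of the Friedman rank statistic used to compare the
# classifier configurations across schemes (toy data, no tie handling).
import numpy as np

def friedman_statistic(results):
    """results: one row per scheme (block), one column per configuration
    (higher is better). Returns the chi-square statistic and mean ranks."""
    n, k = results.shape
    # rank within each scheme: 1 = worst ... k = best (assumes no ties)
    ranks = np.argsort(np.argsort(results, axis=1), axis=1) + 1
    mean_ranks = ranks.mean(axis=0)
    chi2 = 12 * n / (k * (k + 1)) * np.sum(mean_ranks ** 2) - 3 * n * (k + 1)
    return chi2, mean_ranks
```

        <p>If one configuration wins on every scheme, the statistic reaches its maximum N(k-1); with four schemes and three configurations that is 8.</p>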
<p>[Figure 9: two blocks of 5x5 convolution kernels, each followed by
2x2 max-pooling kernels, then fully connected layers. Figure 11: All
Schemes Accuracy. Figure 12: Leave-one-out Scheme Accuracy.]</p>
        <p>A subsequent experiment tested the accuracy of captcha
transcription when the training and testing sets consist of
images generated by all schemes. Both the training and the
testing set contained examples generated by all schemes. The
results are depicted in Figure 11. In this experiment the CNN
outperformed the SLP not only in recognition but also
in localization accuracy. The most visible difference
is on schemes s08, s18 and s41. Overall performance is again
compared by the statistical test, with the results summarized
in Table 2. All accuracies are lower than in the previous
experiment, as the data-set complexity grew (the data were
generated by multiple schemes) while the number of
training examples remained the same.</p>
<p>The last experiment tested the accuracy of captcha
transcription in a leave-one-scheme-out scenario. The training
set contained images generated by only 10 schemes, and
the images used for testing were all generated by the
remaining, yet unseen scheme. Trying to recognize characters from
images generated by an unknown scheme is a
challenging task; furthermore, the schemes were selected to differ
from each other as much as possible. The results are
depicted in Figure 12. All configurations using a perceptron
as the recognition classifier fail on all except the
simplest schemes, e.g. s12 and s16. The combination of two
CNNs is the best in all cases, the only exception being
the scheme s30, where the combination of the localization
perceptron and the recognition CNN is the best.
Overall, the accuracy may seem relatively low, especially for
schemes s10, s30, s31 and s41, but let us recall that a
recognition rate of 1% is already considered enough to
compromise a scheme. The failure of CNNs on scheme s41 is
understandable, as the spiderweb background confuses the
convolutional kernels learned on the other schemes.</p>
<p>This is the most important experiment, showing the
ability to solve a yet unseen captcha. The ranking of all
algorithms is summarized in Table 3 and the statistical tests in
Table 4.</p>
<p>The above experiments show that most of the current
schemes can be compromised using either two convolutional
networks or a localization perceptron and a recognition CNN.</p>
      </sec>
      <sec id="sec-5-4">
        <title>Conclusion</title>
        <p>In this paper, we presented a novel captcha recognition
approach which can fully replace the state-of-the-art
scheme-specific pipelines. Our approach not only consists of fewer
steps, but it is also more general, as it can be applied to a
wide variety of captcha schemes without modification. We
were able to compromise 10 out of 11 schemes using two CNNs
or a localization perceptron and a recognition CNN,
without previously seeing any example image generated by the
particular scheme. Furthermore, we were able to break all
11 captcha schemes, with an accuracy higher than
50%, using a CNN for the localization as
well as for the recognition, when we included example images of each
character generated by the particular scheme in the training set.
Let us recall that a 1% recognition rate is enough for a scheme
to be considered compromised.</p>
        <p>We experimentally compared the ability of the SLP, MLP
and CNN to transcribe characters from captcha images.
According to our experiments, CNNs perform much
better in both localization and recognition.</p>
      </sec>
      <sec id="sec-5-5">
        <title>Acknowledgement</title>
        <p>The research reported in this paper has been supported by
the Czech Science Foundation (GA ČR) grant 17-01251
and student grant SGS17/210/OHK3/3T/18.</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
<mixed-citation>[1] BotDetect CAPTCHA generator [online]. www.captcha.com [Cited 2017-06-01].</mixed-citation>
      </ref>
      <ref id="ref2">
<mixed-citation>[2] Free captcha-service [online], 2017. [Cited 2017-06-01].</mixed-citation>
      </ref>
      <ref id="ref4">
<mixed-citation>[3] Metal captcha [online], 2017. www.heavygifts.com/metalcaptcha [Cited 2017-06-01].</mixed-citation>
      </ref>
      <ref id="ref5">
<mixed-citation>[4] Resisty captcha [online], 2017. www.wordpress.org/plugins/resisty [Cited 2017-06-01].</mixed-citation>
      </ref>
      <ref id="ref6">
<mixed-citation>[5] Secureimage PHP captcha [online]. www.phpcaptcha.org [Cited 2017-06-01].</mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>Jimmy</given-names>
            <surname>Ba</surname>
          </string-name>
          , Volodymyr Mnih, and
          <string-name>
            <given-names>Koray</given-names>
            <surname>Kavukcuoglu</surname>
          </string-name>
          .
          <article-title>Multiple object recognition with visual attention</article-title>
          .
          <source>In International Conference on Learning Representations</source>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>Elie</given-names>
            <surname>Bursztein</surname>
          </string-name>
          , Jonathan Aigrain, Angelika Moscicki, and John C Mitchell.
<article-title>The end is nigh: Generic solving of text-based captchas</article-title>
          .
          <source>In 8th USENIX Workshop on Offensive Technologies (WOOT 14)</source>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>Elie</given-names>
            <surname>Bursztein</surname>
          </string-name>
          , Steven Bethard, Celine Fabry, John C Mitchell, and
          <string-name>
            <given-names>Dan</given-names>
            <surname>Jurafsky</surname>
          </string-name>
          .
          <article-title>How good are humans at solving captchas? a large scale evaluation</article-title>
          .
          <source>In 2010 IEEE Symposium on Security and Privacy</source>
          , pages
          <fpage>399</fpage>
          -
          <lpage>413</lpage>
          . IEEE,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>Elie</given-names>
            <surname>Bursztein</surname>
          </string-name>
, Matthieu Martin, and
          <string-name>
            <given-names>John</given-names>
            <surname>Mitchell</surname>
          </string-name>
          .
<article-title>Text-based captcha strengths and weaknesses</article-title>
          .
          <source>In Proceedings of the 18th ACM conference on Computer and communications security</source>
          , pages
          <fpage>125</fpage>
          -
          <lpage>138</lpage>
          . ACM,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [10]
<string-name>
            <given-names>Kumar</given-names>
            <surname>Chellapilla</surname>
          </string-name>
          , Kevin Larson, Patrice Simard, and
          <string-name>
            <given-names>Mary</given-names>
            <surname>Czerwinski</surname>
          </string-name>
          .
          <article-title>Designing human friendly human interaction proofs (hips)</article-title>
          .
          <source>In Proceedings of the SIGCHI conference on Human factors in computing systems</source>
          , pages
          <fpage>711</fpage>
          -
          <lpage>720</lpage>
          . ACM,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
[11] Claudia Cruz-Perez, Oleg Starostenko, Fernando Uceda-Ponga, Vicente Alarcon-Aquino, and Leobardo Reyes-Cabrera.
          <article-title>Breaking reCAPTCHAs with unpredictable collapse: heuristic character segmentation and recognition</article-title>
          .
          <source>In Pattern Recognition</source>
          , pages
          <fpage>155</fpage>
          -
          <lpage>165</lpage>
          . Springer,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>Milton</given-names>
            <surname>Friedman</surname>
          </string-name>
          .
          <article-title>The use of ranks to avoid the assumption of normality implicit in the analysis of variance</article-title>
          .
          <source>Journal of the american statistical association</source>
          ,
          <volume>32</volume>
          (
          <issue>200</issue>
          ):
          <fpage>675</fpage>
          -
          <lpage>701</lpage>
          ,
          <year>1937</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [13]
<string-name>
            <given-names>Ian</given-names>
            <surname>Goodfellow</surname>
          </string-name>
          , Yoshua Bengio, and
          <string-name>
            <given-names>Aaron</given-names>
            <surname>Courville</surname>
          </string-name>
          .
          <source>Deep Learning</source>
          . MIT Press,
          <year>2016</year>
          . http://www. deeplearningbook.org.
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [14]
<string-name>
            <given-names>Ian</given-names>
            <surname>Goodfellow</surname>
          </string-name>
, Yaroslav Bulatov, Julian Ibarz, Sacha Arnoud, and Vinay Shet.
          <article-title>Multi-digit number recognition from street view imagery using deep convolutional neural networks</article-title>
          .
          <source>In International Conference on Learning Representations</source>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>Sture</given-names>
            <surname>Holm</surname>
          </string-name>
          .
          <article-title>A simple sequentially rejective multiple test procedure</article-title>
          .
          <source>Scandinavian Journal of Statistics</source>
          , pages
          <fpage>65</fpage>
          -
          <lpage>70</lpage>
          ,
          <year>1979</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>C. H.</given-names>
            <surname>Lampert</surname>
          </string-name>
          , M. B. Blaschko, and
          <string-name>
            <given-names>T.</given-names>
            <surname>Hofmann</surname>
          </string-name>
          .
          <article-title>Beyond sliding windows: Object localization by efficient subwindow search</article-title>
          .
          <source>In CVPR 2008</source>
          , pages
          <fpage>1</fpage>
          -
          <lpage>8</lpage>
          , Los Alamitos, CA, USA,
          <year>2008</year>
          .
          Max-Planck-Gesellschaft, IEEE Computer Society.
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>Yann</given-names>
            <surname>LeCun</surname>
          </string-name>
          , Léon Bottou, Yoshua Bengio, and
          <string-name>
            <given-names>Patrick</given-names>
            <surname>Haffner</surname>
          </string-name>
          .
          <article-title>Gradient-based learning applied to document recognition</article-title>
          .
          <source>Proceedings of the IEEE</source>
          ,
          <volume>86</volume>
          (
          <issue>11</issue>
          ):
          <fpage>2278</fpage>
          -
          <lpage>2324</lpage>
          ,
          <year>1998</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>Juliet Popper</given-names>
            <surname>Shaffer</surname>
          </string-name>
          .
          <article-title>Multiple hypothesis testing</article-title>
          .
          <source>Annual Review of Psychology</source>
          ,
          <volume>46</volume>
          (
          <issue>1</issue>
          ):
          <fpage>561</fpage>
          -
          <lpage>584</lpage>
          ,
          <year>1995</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>F.</given-names>
            <surname>Stark</surname>
          </string-name>
          , C. Hazırbaş,
          <string-name>
            <given-names>R.</given-names>
            <surname>Triebel</surname>
          </string-name>
          , and
          <string-name>
            <given-names>D.</given-names>
            <surname>Cremers</surname>
          </string-name>
          .
          <article-title>CAPTCHA recognition with active deep learning</article-title>
          .
          <source>In GCPR Workshop on New Challenges in Neural Computation</source>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>Luis</given-names>
            <surname>von Ahn</surname>
          </string-name>
          , Manuel Blum, Nicholas J. Hopper, and
          <string-name>
            <given-names>John</given-names>
            <surname>Langford</surname>
          </string-name>
          .
          <article-title>CAPTCHA: Using hard AI problems for security</article-title>
          .
          <source>In Advances in Cryptology - EUROCRYPT 2003</source>
          , pages
          <fpage>294</fpage>
          -
          <lpage>311</lpage>
          . Springer,
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>