<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>Series</journal-title>
      </journal-title-group>
      <issn pub-type="ppub">1613-0073</issn>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>How to Mimic Humans, Guide for Computers</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Martin Kopp</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Matouš Pištora</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Martin Holeňa</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Cisco Systems, Cognitive Research Team in Prague</institution>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Faculty of Information Technology, Czech Technical University in Prague Thákurova 9</institution>
          ,
          <addr-line>160 00 Prague</addr-line>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Institute of Computer Science, Academy of Sciences of the Czech Republic Pod Vodárenskou věží 2</institution>
          ,
          <addr-line>182 07 Prague</addr-line>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2016</year>
      </pub-date>
      <volume>1649</volume>
      <fpage>110</fpage>
      <lpage>117</lpage>
      <abstract>
        <p>This paper studies reverse Turing tests to tell humans and computers apart. Contrary to classical Turing tests, the judge is not a human but a computer. These tests are often called Completely Automated Public Turing tests to tell Computers and Humans Apart (CAPTCHA). The main purpose of such tests is to prevent the automated usage of various services, for example to stop bots from spamming forums or to secure user logins against dictionary or brute-force password guessing, among many others. Over the years, a diversity of tests has appeared. In this paper, we focus on the two most classical and widespread schemes, text-based and audio-based CAPTCHA, and on their use in the Czech internet environment. The goal of this paper is to point out flaws and weak spots of commonly used solutions and the consequent security risks. To this end, we pipelined several relatively simple algorithms, such as the flood fill algorithm and k-nearest neighbours, to overcome CAPTCHA challenges on several web pages, including those of the state administration.</p>
      </abstract>
      <kwd-group>
        <kwd>CAPTCHA</kwd>
        <kwd>machine learning</kwd>
        <kwd>network security</kwd>
        <kwd>optical character recognition</kwd>
        <kwd>speech recognition</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1 Introduction</title>
      <p>In the past few decades, the rise of the internet has
revolutionised our lives. We use it for work, study, socialising,
shopping and many other activities on a daily basis. With
the increasing popularity of the web, many public services
have become a target of malicious activity of some kind.
There have been attempts to, e.g., exploit mail servers for
sending massive amounts of spam messages, create numerous
fake profiles on social networks or make fraudulent offers
on online marketplaces. To block the access of
automated scripts and bots, web sites started to
use various captcha-based security protocols in the hope of
ensuring their safety. Over the years, such schemes have
evolved into one of the standard security measures.</p>
      <p>
        The acronym CAPTCHA stands for Completely
Automated Public Turing test to tell Computers and Humans
Apart, and was coined in 2003 by von Ahn et al. [
        <xref ref-type="bibr" rid="ref18">19</xref>
        ]. The
fundamental idea of its authors is to use a hard, yet unsolved,
AI problem that is easy for humans to solve. In
theoretical computer science, the standard Turing test [
        <xref ref-type="bibr" rid="ref17">18</xref>
        ] is
defined as a test in which a human judge has to
consistently distinguish whether he/she is communicating via
text with a human counterpart or with a computer pretending
to be a human. However, for automatic and efficient
testing, the judge must also be a computer. This is where
the captcha, often called a reverse Turing test, comes into play.
In this paper, we write captcha in lowercase for typographical reasons.
      </p>
      <p>Nowadays, a captcha is a program that generates a test
which most humans are able to solve, but current
computer programs are not. It is mainly used on websites
to distinguish whether the user is a human or a robot. The
need for this type of challenge arose with the increasing
number of internet bots and automated scripts attempting
to exploit public web services. Today, it is an
established security mechanism to prevent mass mailing of spam
messages, mass posting on internet forums, mass voting in
online polls and downloading files in large amounts.</p>
      <p>
        Interesting work has been done by the Microsoft
researcher Chellapilla [
        <xref ref-type="bibr" rid="ref10">11</xref>
        ], who calls these tests Human
Interaction Proofs. His work focuses on identifying
effective distortion features and specifying best practices for
designing captchas that are resistant to computers while
remaining relatively easy for humans to solve. He also
states that, depending on the cost of the attack, automated
scripts should not succeed in more than 1 in 10 000
attempts, while the human success rate should approach 90%.
This is generally considered too ambitious a goal, as
random guesses can be successful [
        <xref ref-type="bibr" rid="ref9">10</xref>
        ]; consequently, a
captcha is considered compromised when the attacker's
success rate surpasses 1%.
      </p>
      <p>This is a work in progress, and we started it with the
websites most familiar to our everyday life, which are
Czech websites. More precisely, we focused on
webpages of the state administration and similar institutions, to demonstrate
the vulnerability of sometimes critical systems of the
national infrastructure. The main purpose of this paper is to
show that the captcha schemes used on such webpages are
easy to solve and therefore unsafe, and to alert the
responsible offices. This is especially alarming for webpages
such as those of the State Office for Nuclear Safety or the Czech State
Administration of Land Surveying and Cadastre.</p>
      <p>The rest of this paper is organised as follows. The
related work is briefly reviewed in the next section.
Section 3 surveys the current captcha solutions. Section 4
presents our approach to breaking text-based and
audio-based captcha challenges. The experimental evaluation is
summarised in Section 5, and the paper closes with a
conclusion.</p>
    </sec>
    <sec id="sec-2">
      <title>2 Related work</title>
      <p>
        Most papers about breaking captchas focus heavily on a
particular scheme; an example is [
        <xref ref-type="bibr" rid="ref11">12</xref>
        ], which targets
the reCAPTCHA scheme of 2011. To our knowledge, the most
general approach is presented in [
        <xref ref-type="bibr" rid="ref5">6</xref>
        ]. It is based
on an effective selection of the best segmentation cuts and was
tested on many up-to-date text-based schemes, with better
results than most specialised solutions. However, even that
work focused solely on text-based schemes. We
directed our efforts differently: instead of
targeting one particular scheme, we tried to break captchas
of different types, all in the Czech internet
environment. Unfortunately, we found only text-based and audio-based
captchas. Therefore, we tried to break both of them
on several web sites, including those of the Czech State
Administration of Land Surveying and Cadastre [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] and the State Office
for Nuclear Safety [
        <xref ref-type="bibr" rid="ref3">4</xref>
        ].
      </p>
      <p>
        The most recent approaches use neural networks, such as
[
        <xref ref-type="bibr" rid="ref15">16</xref>
        ]. Their results are still not as impressive as those
of the previous approaches, but neural-network-based
approaches are improving very quickly. We intend to use
convolutional neural networks in our future work as well. In this
paper, however, we used techniques as simple as possible to
show that even with them, we were able to compromise all
captcha schemes presented in this study.
      </p>
      <p>
        Not all captcha schemes offer audio as an
alternative. Consequently, less effort has been spent
on this topic. One of the first really successful attacks is
well described in [
        <xref ref-type="bibr" rid="ref16">17</xref>
        ], followed by even greater success
in [
        <xref ref-type="bibr" rid="ref7">8</xref>
        ]. More recent results of the same team are presented
in [
        <xref ref-type="bibr" rid="ref6">7</xref>
        ]. Our reason for investigating audio captchas
is to decide whether it is generally easier to break text-based or
audio-based captchas when both are available. Again, we
used only the simplest techniques to point out the
vulnerability of audio-based captchas.
      </p>
      <p>
        An excellent assessment of the human success rate in
completing captcha challenges can be found in [
        <xref ref-type="bibr" rid="ref8">9</xref>
        ]. As our
paper is a work in progress, we have human results only for
the audio-based schemes.
      </p>
    </sec>
    <sec id="sec-3">
      <title>3 Captcha schemes survey</title>
      <p>This section surveys the currently available captcha
schemes and the challenges they present.</p>
      <p>The first ever use of a captcha was in 1997 by the software
company AltaVista, which sought a way to prevent
automated submissions to its search engine. It was a
simple text-based test that was sufficient for that time, but
it was eventually proven ineffective. By then, the
computer recognition rates of single characters were
already on par with those of humans, and thus the
development of captchas shifted to preventing
segmentation through noise addition, cluttering and various other
anti-segmentation techniques. As the amount of distortion
and clutter was increased to prevent captchas from being broken,
the challenges risked becoming
almost illegible, and the design of human-friendly, yet secure
captchas became a serious challenge. The techniques most
commonly used to prevent automatic recognition
can be divided into two groups: anti-recognition
features and anti-segmentation features.</p>
      <p>Anti-recognition features, such as using characters of different
sizes in multiple fonts, were a straightforward
first step for text-based captcha schemes. These and
other anti-recognition features, like character rotation, are
typically no problem for humans, because we see them on an
everyday basis. The only exception is distortion, a technique
in which ripples and warping are added to the
image. It is one of the easiest and most effective ways of
reducing classifier accuracy, but excessive distortion
can make the text very difficult to read even for humans, and thus
the use of this feature is slowly vanishing. Due to advances in
pattern recognition and optical character recognition, all
those features became obsolete and were to some extent
replaced by anti-segmentation features.</p>
      <p>Anti-segmentation features are not designed to
complicate the recognition of a single character; instead, they try
to make the segmentation of the captcha image
unmanageable while preserving its readability for humans. The first two
features used for this purpose were added noise and a
confusing background, but it turned out that both of them are a
bigger obstacle for humans than for computers. After that,
occluding lines appeared in the wild. A good
implementation of occluding lines is one of the most effective
and human-friendly ways of preventing segmentation; an
example can be seen in Figure 1. The most recent feature
is called negative kerning: neighbouring
letters are moved so close to each other that they may
overlap. It turned out that humans are still able to
read the overlapping text with only a small error rate, but
for computers it is almost impossible to find the right
segmentation.</p>
      <p>From the beginning, the adoption of captcha schemes was
not ideal. Users were annoyed by captchas that
were hard to solve and had to try multiple times in order
to pass them. The people affected the most were those
with visual impairments or reading disorders such
as dyslexia. Soon, an alternative emerged in the form
of audio captchas. Instead of looking at an image and
transcribing the displayed characters, the user is given
the option, usually alongside a traditional text-based
captcha, to play a sound puzzle and write down the characters
that he/she heard. To remain effective and secure,
the captcha has to be resistant to automated sound
analysis; for this purpose, various background noise and sound
distortions are added. Still, a human visitor should have no
problem hearing and recognising the code. Nowadays,
this scheme is a standard option on major websites
that implement captchas.</p>
      <p>The major anti-automation tools are changing speakers,
including both males and females of ages ranging from
children to retired people. Most current solutions rely on
added noise, whose level of sophistication varies widely,
ranging from buzzing and singing birds to human speakers
played backwards.</p>
      <sec id="sec-3-1">
        <title>3.3 Image-based</title>
        <p>With the advancement of captchas, criticism soon began to
appear. Having to solve a puzzle every time
one wants to enter a site is at least annoying and
discouraging for the common user. It is in everyone's best interest
to keep customers satisfied and to make their
user experience as pleasant as possible. To preserve
security against spam-bots, new captcha designs were
developed, the most prominent of which is the image-based
captcha. The user is presented with a series of images
showing various objects, and the task usually lies in
detecting which of them share a common topic and selecting
them. For example, a user is shown a series of images of
various landscapes and is asked to select those with trees,
as in Figure 2. This type of captcha has gained huge
popularity on touchscreen devices such as tablets and
smartphones, where simply tapping the screen is preferable
to typing a code.</p>
      </sec>
      <sec id="sec-3-2">
        <title>3.4 Other types</title>
        <p>In parallel with the image-based captcha developed by
Google and other big players, many alternative schemes
appeared. They include variations of text-based
schemes hidden in video instead of a distorted image, and
simple logical games or puzzles. As an example of an easy-to-solve
logical game, we selected noughts and crosses,
Figure 5. The metal captcha can be considered a special type of
text-based scheme: instead of automatically distorted characters,
it shows the user logos of metal bands, which are typically
unreadable, see Figure 3. All of these have recently been dominated
by Google's noCaptcha button, Figure 4. According to Google, this single button
can distinguish between humans and computers. It uses
browser cookies and tracks user behaviour on the
webpage in some way, but the implementation details were not disclosed.</p>
        <p>
          Next, we describe the algorithm pipelines for both text-based
and audio-based captcha schemes. We are
aware that there are some very advanced approaches, e.g.
[
          <xref ref-type="bibr" rid="ref15 ref5">6, 16</xref>
          ], but we intentionally used simple algorithms in the
basic pipeline of preprocessing, segmentation and recognition. Our
motivation is to show that even with simple approaches,
most captchas currently used in the Czech internet
environment can be compromised.
        </p>
        <p>Text-based captchas are still the most widely used
ones. Their goal is to present an image with distorted
characters, combining anti-recognition and anti-segmentation
features in such a way that humans can easily read
it but computers cannot. Our goal, on the contrary, is to
recognise all those characters automatically.</p>
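        <p>The first step in our pipeline is the conversion of
the image to greyscale. The image is converted from the RGB colour space
as a weighted sum of its three colour channels:</p>
        <disp-formula>
          <label>(1)</label>
          <tex-math>I = 0.299\,R + 0.587\,G + 0.114\,B</tex-math>
        </disp-formula>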
        <p>
          This equation was adopted in Recommendation ITU-R BT.601
by the International Telecommunication Union [
          <xref ref-type="bibr" rid="ref13">14</xref>
          ].
        </p>
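        <p>A minimal Python sketch of this conversion, assuming the image is stored as a list of rows of (R, G, B) tuples with 8-bit channels and using the BT.601 luma weights 0.299, 0.587 and 0.114. The helper name to_greyscale is hypothetical and only illustrates the weighted sum.</p>

```python
def to_greyscale(rgb_image):
    """Convert an RGB image to greyscale with the BT.601 luma weights.

    Assumption: rgb_image is a list of rows, each row a list of (R, G, B)
    tuples with channel values in 0..255. Intensities are rounded to int.
    """
    return [
        [round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row]
        for row in rgb_image
    ]


if __name__ == "__main__":
    image = [[(255, 255, 255), (0, 0, 0)], [(255, 0, 0), (0, 128, 0)]]
    print(to_greyscale(image))
```

        <p>Rounding keeps the result in the integer range 0..255, which is convenient for the histogram-based thresholding that follows.</p>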
        <p>The image is then transformed into a binary image by a
thresholding method. Pixels with an intensity higher than
the threshold are converted to white and those
with a lower intensity are converted to black. For a given
threshold T, the value Y(x) assigned to a pixel with intensity x is:</p>
        <disp-formula>
          <label>(2)</label>
          <tex-math>Y(x) = \begin{cases} 0 &amp; \text{if } x &lt; T,\\ 1 &amp; \text{otherwise.} \end{cases}</tex-math>
        </disp-formula>
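        <p>
          The threshold is computed by iterating through all
possible thresholds and selecting the one that minimises the
within-class variance. This method was proposed by Otsu
in [
          <xref ref-type="bibr" rid="ref12">13</xref>
          ]. The class probabilities and the class variances are
computed from the image brightness histogram:
        </p>
        <disp-formula>
          <label>(3)</label>
          <tex-math>\sigma_\omega^2(t) = \omega_0(t)\,\sigma_0^2(t) + \omega_1(t)\,\sigma_1^2(t)</tex-math>
        </disp-formula>
        <disp-formula>
          <label>(4)</label>
          <tex-math>\omega_0(t) = \sum_{i=0}^{t-1} p(i)</tex-math>
        </disp-formula>
        <disp-formula>
          <label>(5)</label>
          <tex-math>\omega_1(t) = \sum_{i=t}^{L-1} p(i)</tex-math>
        </disp-formula>
        <p>where p(i) is the probability of greyscale level i, L is the
number of greyscale levels, and σ0²(t) and σ1²(t) are the variances of the
grey levels within the two classes separated by the threshold t.</p>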
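        <p>The threshold search can be sketched in Python directly from the class probabilities and the within-class variance defined above. This is a minimal sketch, assuming a greyscale image given as a list of rows of integer levels in 0..L−1 with L = 256; class 0 holds levels 0..t−1 and class 1 holds levels t..L−1. The function names are hypothetical, and the exhaustive loop favours readability over speed.</p>

```python
def otsu_threshold(grey, levels=256):
    """Return the threshold t minimising the within-class variance (Otsu).

    Class 0 holds grey levels 0..t-1, class 1 holds levels t..levels-1.
    grey is a list of rows of ints in 0..levels-1.
    """
    hist = [0] * levels
    for row in grey:
        for value in row:
            hist[value] += 1
    total = sum(hist)
    p = [count / total for count in hist]  # grey-level probabilities p(i)

    best_t, best_within = 0, float("inf")
    for t in range(1, levels):
        w0, w1 = sum(p[:t]), sum(p[t:])  # class probabilities omega_0, omega_1
        if w0 == 0 or w1 == 0:
            continue  # one class is empty, so this split is meaningless
        mu0 = sum(i * p[i] for i in range(t)) / w0
        mu1 = sum(i * p[i] for i in range(t, levels)) / w1
        var0 = sum((i - mu0) ** 2 * p[i] for i in range(t)) / w0
        var1 = sum((i - mu1) ** 2 * p[i] for i in range(t, levels)) / w1
        within = w0 * var0 + w1 * var1  # within-class variance sigma_omega^2(t)
        if best_within > within:  # keep the first minimum on ties
            best_t, best_within = t, within
    return best_t


def binarise(grey, threshold):
    """A pixel becomes 1 (white) when x >= T and 0 (black) otherwise."""
    return [[1 if x >= threshold else 0 for x in row] for row in grey]
```

        <p>For example, an image containing dark pixels around level 10 and bright pixels around level 200 yields a threshold between the two groups, and binarise then maps the dark group to 0 and the bright group to 1.</p>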
        <p>The next step is noise removal, for which we used
morphological operations followed by the flood fill algorithm.
Morphological operations are a simple yet powerful
approach to removing speckles and occluding lines. The
closing operation fills small holes and gaps in the
image, while the opening operation disconnects loosely connected
segments and removes small points and lines. The four basic binary
morphological operations, dilation ⊕, erosion ⊖, opening ◦ and
closing •, are defined as follows:</p>
        <disp-formula>
          <label>(6)</label>
          <tex-math>X \oplus H = \{(x, y) : H_{(x,y)} \cap X \neq \emptyset\}</tex-math>
        </disp-formula>
        <disp-formula>
          <label>(7)</label>
          <tex-math>X \ominus H = \{(x, y) : H_{(x,y)} \subseteq X\}</tex-math>
        </disp-formula>
        <disp-formula>
          <label>(8)</label>
          <tex-math>X \circ H = (X \ominus H) \oplus H</tex-math>
        </disp-formula>
        <disp-formula>
          <label>(9)</label>
          <tex-math>X \bullet H = (X \oplus H) \ominus H</tex-math>
        </disp-formula>
        <p>where X is the original image, H the structuring element
and H(x, y) the translation of H by the vector (x, y). The
effect of the opening operation can be described as erasing
the object border and then regrowing it. If an object is
small enough to be erased entirely as border in the first
step, there is subsequently nothing left to regrow, and thus
it is deleted.</p>
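        <p>To make the four set definitions above concrete, here is a small Python sketch that represents the binary image as the set X of foreground (row, col) coordinates inside a fixed image shape; pixels outside the image are treated as background. The 3×3 square structuring element is our assumption for illustration, since the text does not fix H, and all function names are hypothetical.</p>

```python
def translate(H, x, y):
    """H(x, y): the structuring element H shifted by the vector (x, y)."""
    return {(x + dx, y + dy) for (dx, dy) in H}


def dilate(X, H, shape):
    """Dilation: pixels whose shifted H intersects the foreground set X."""
    rows, cols = shape
    return {(x, y) for x in range(rows) for y in range(cols)
            if not translate(H, x, y).isdisjoint(X)}


def erode(X, H, shape):
    """Erosion: pixels whose shifted H lies entirely inside X."""
    rows, cols = shape
    return {(x, y) for x in range(rows) for y in range(cols)
            if translate(H, x, y).issubset(X)}


def opening(X, H, shape):
    """Opening: erosion followed by dilation (removes small specks)."""
    return dilate(erode(X, H, shape), H, shape)


def closing(X, H, shape):
    """Closing: dilation followed by erosion (fills small holes)."""
    return erode(dilate(X, H, shape), H, shape)


# Assumption: a symmetric 3x3 square structuring element centred at the origin.
SQUARE_3X3 = {(dx, dy) for dx in (-1, 0, 1) for dy in (-1, 0, 1)}
```

        <p>Opening a 3×3 block together with an isolated pixel returns just the block, which is the speckle removal described above, while closing a 3×3 block with a one-pixel hole fills the hole.</p>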
        <p>Next, we count the areas of all connected
components (in terms of the number of pixels they contain) and delete the
ones with an area below a certain threshold. The idea is to
iterate over each pixel of the image and, when a white pixel is
found, use the flood fill algorithm to count the number of
pixels in its area. Individual characters are large objects
and as such can easily be distinguished from noise by an
empirically set threshold. The objects with an area
below the threshold are then deleted, which results
in an almost noiseless image.</p>
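        <p>A minimal Python sketch of this area filter, assuming a binary image stored as a list of rows of 0/1 values with white (1) as foreground and 4-connectivity between pixels; the connectivity, the iterative stack-based flood fill and the helper names are our assumptions, and min_area stands for the empirically set threshold.</p>

```python
def flood_fill(image, start, visited):
    """Collect the 4-connected white (1) component containing start."""
    rows, cols = len(image), len(image[0])
    stack = [start]
    visited.add(start)
    component = []
    while stack:
        r, c = stack.pop()
        component.append((r, c))
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            inside = nr in range(rows) and nc in range(cols)
            if inside and image[nr][nc] == 1 and (nr, nc) not in visited:
                visited.add((nr, nc))
                stack.append((nr, nc))
    return component


def remove_small_components(image, min_area):
    """Keep only components with at least min_area pixels."""
    visited = set()
    cleaned = [[0] * len(row) for row in image]
    for r, row in enumerate(image):
        for c, value in enumerate(row):
            if value == 1 and (r, c) not in visited:
                component = flood_fill(image, (r, c), visited)
                if len(component) >= min_area:
                    for pr, pc in component:
                        cleaned[pr][pc] = 1
    return cleaned
```

        <p>An explicit stack is used instead of recursion, so large characters do not hit Python's recursion limit.</p>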
        <p>After this step, even with our simplistic approach, only the individual
characters and a few lines remain. First, we isolate all
the objects left in the image by iterating
through every pixel: when an unlabelled pixel with the
foreground colour is found, the flood fill algorithm is used to
paint its object with a new unique colour. Due to the nature of
occluding lines, they are generally horizontal. This
is unlike any of the characters the captchas contain, and as
such the isolated lines can easily be eliminated by
deleting all objects whose height is below an empirically set
threshold.</p>
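        <p>The labelling and the height filter can be sketched in Python as follows, assuming a binary image of 0/1 rows and 4-connectivity; here the "unique colour" of each object is an integer label, a breadth-first flood fill replaces recursion, and min_height stands for the empirical threshold. The function names are hypothetical.</p>

```python
from collections import deque


def label_objects(binary):
    """Paint every 4-connected foreground object with its own integer label."""
    rows, cols = len(binary), len(binary[0])
    labels = [[0] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if binary[r][c] == 1 and labels[r][c] == 0:
                count += 1  # new unique colour for an unlabelled object
                labels[r][c] = count
                queue = deque([(r, c)])
                while queue:
                    y, x = queue.popleft()
                    for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                        inside = ny in range(rows) and nx in range(cols)
                        if inside and binary[ny][nx] == 1 and labels[ny][nx] == 0:
                            labels[ny][nx] = count
                            queue.append((ny, nx))
    return labels, count


def drop_flat_objects(labels, count, min_height):
    """Delete objects whose height (in rows) is below min_height."""
    top, bottom = {}, {}
    for r, row in enumerate(labels):
        for label in row:
            if label:
                top.setdefault(label, r)  # first (highest) row of the object
                bottom[label] = r  # last (lowest) row seen so far
    keep = {label for label in range(1, count + 1)
            if bottom[label] - top[label] + 1 >= min_height}
    return [[label if label in keep else 0 for label in row] for row in labels]
```

        <p>A horizontal occluding line one pixel thick has height 1, so any min_height of 2 or more removes it while keeping taller characters.</p>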
        <p>If the number of isolated objects equals the desired
number of characters, the captcha is considered successfully
segmented. Otherwise, there are two possibilities. If the
number of objects is greater than the number of characters, some
speckles or line segments are left.
They are eliminated by deleting the objects with the lowest
pixel count, which usually provides good results. If there
are fewer objects than characters, multiple characters are
connected, either by a remaining
line or because they overlap. This situation is resolved by the
X-axis projection algorithm.</p>
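        <p>The count check can be sketched in Python as a small helper, assuming each object is a list of (row, col) pixels and that the expected captcha length is known. Restoring the left-to-right order by the leftmost column is our assumption, and the helper name is hypothetical.</p>

```python
def reconcile_object_count(objects, expected):
    """Match the number of segmented objects to the expected captcha length.

    objects: list of objects, each a list of (row, col) pixels.
    Returns (objects, needs_split). If there are too many objects, those
    with the lowest pixel count are dropped; if there are too few, joined
    characters still have to be split (needs_split is True).
    """
    if len(objects) > expected:
        largest = sorted(objects, key=len, reverse=True)[:expected]
        # Restore the reading order using each object's leftmost column.
        largest.sort(key=lambda obj: min(col for _, col in obj))
        return largest, False
    return objects, expected > len(objects)
```
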
        <p>Its main idea is that, for two or more joined characters,
the pixel count between them is generally lower than in
the centre of a character. First, we construct the X-axis
projection by summing the pixels of each column. Next, all
local minima are found, which are later considered as
cutting points. The next step is to remove all local
minima whose pixel count is under an empirically set
threshold, to eliminate most cutting points positioned in the
middle of a character. All remaining possible segmentations
into two parts are then considered for the subsequent
classification, and the cutting point that maximises the
classification performance is selected. Fortunately, such joined
characters are a really rare event.</p>
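        <p>A simplified Python sketch of the projection-based split for a binary object given as a list of 0/1 rows. It computes the X-axis projection, takes interior local minima as candidate cuts and keeps the cut with the highest total score. Two assumptions apply: the empirical threshold filter on the minima is omitted, and score is a hypothetical callable standing in for the classifier's confidence on a segment.</p>

```python
def column_projection(binary):
    """X-axis projection: number of foreground pixels in each column."""
    return [sum(column) for column in zip(*binary)]


def candidate_cuts(projection):
    """Interior local minima of the projection (plateaus yield several cuts)."""
    return [i for i in range(1, len(projection) - 1)
            if projection[i - 1] >= projection[i]
            and projection[i + 1] >= projection[i]]


def best_split(binary, score):
    """Try every candidate cut and keep the one with the highest total score.

    score: hypothetical callable mapping a segment (list of rows) to a
    confidence value, higher meaning better. Returns (cut, left, right),
    or None when the projection has no interior local minimum.
    """
    best = None
    for cut in candidate_cuts(column_projection(binary)):
        left = [row[:cut] for row in binary]
        right = [row[cut:] for row in binary]
        value = score(left) + score(right)
        if best is None or value > best[0]:
            best = (value, cut, left, right)
    return None if best is None else best[1:]
```

        <p>For two 2-column blocks joined by a one-pixel bridge, the projection is [3, 3, 1, 3, 3] and the only candidate cut is column 2.</p>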
        <p>When the segmentation step is done, each segment is
resized to 20×20 pixels, resulting in a vector of 400 binary
values. These vectors are then used as features for the k-nn
classifier, whose parameters are discussed in
Section 5.</p>
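        <p>A minimal Python sketch of the feature extraction and classification step, assuming nearest-neighbour resampling for the resize (the resampling method is not stated in the text) and the Manhattan distance for k-nn. Segments are lists of 0/1 rows, and all names are hypothetical.</p>

```python
from collections import Counter

SIZE = 20  # segments are resized to SIZE x SIZE = 400 binary values


def resize_nearest(segment, size=SIZE):
    """Nearest-neighbour resampling of a binary segment to size x size."""
    h, w = len(segment), len(segment[0])
    return [[segment[i * h // size][j * w // size] for j in range(size)]
            for i in range(size)]


def to_feature_vector(segment):
    """Flatten the resized segment into a vector of 400 binary values."""
    return [value for row in resize_nearest(segment) for value in row]


def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))


def knn_predict(train_x, train_y, query, k, distance=manhattan):
    """Majority label among the k training vectors closest to query."""
    ranked = sorted(zip(train_x, train_y), key=lambda pair: distance(pair[0], query))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]
```

        <p>Because every segment is mapped to the same 400-dimensional space, segments of different original sizes can be compared directly; for binary vectors the Manhattan distance simply counts differing pixels.</p>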
      </sec>
      <sec id="sec-3-3">
        <title>4.2 Audio captcha</title>
        <p>For audio-based captchas, the pipeline is even simpler.
The most advanced audio captchas look like the one in
Figure 10: a human speaker with a lot of noise, which makes
a good segmentation very hard. By contrast, the ones
we found on the Czech internet look more like Figure 11:
a synthetic voice is used and the level of added noise
is almost negligible. Therefore, we can simply skip the
noise cancellation step. Furthermore, the segmentation is
much simpler than in the text-based case. The audio data
are normalised to zero mean and unit variance, and the
segmentation is based on amplitude thresholding with
an empirically set threshold.</p>
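        <p>A minimal Python sketch of the normalisation and amplitude-threshold segmentation, assuming the audio is a list of samples. Since a voiced character crosses zero many times, the sketch merges loud samples separated by at most min_gap quiet samples into one segment; min_gap is our assumption and is not stated in the text, and threshold stands for the empirically set value.</p>

```python
import statistics


def normalise(samples):
    """Scale the signal to zero mean and unit (population) variance."""
    mean = statistics.fmean(samples)
    std = statistics.pstdev(samples)
    if std == 0:
        raise ValueError("constant signal cannot be normalised")
    return [(s - mean) / std for s in samples]


def segment_by_amplitude(samples, threshold, min_gap):
    """Return (start, end) index pairs of regions with |amplitude| >= threshold.

    Loud samples separated by at most min_gap quiet samples are merged
    into one segment (assumption: one segment per spoken character).
    """
    segments = []
    start = last_active = None
    for i, value in enumerate(samples):
        if abs(value) >= threshold:
            if start is None:
                start = i
            elif i - last_active > min_gap:
                segments.append((start, last_active + 1))
                start = i
            last_active = i
    if start is not None:
        segments.append((start, last_active + 1))
    return segments
```

        <p>Because the signal is normalised first, the threshold is expressed in standard deviations, so one value can serve recordings with different loudness.</p>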
        <p>
          According to [
          <xref ref-type="bibr" rid="ref14">15</xref>
          ], speech signals are time-varying
signals that are stationary over short time periods
(5–100 ms). Changes of the signal then reflect
different phonemes. The information in a speech signal is
actually represented by the short-term amplitude spectrum of
the speech waveform. Therefore, we split the waveform of each
character into 10 bins, extracted the means and variances of
the amplitudes in each bin, and used them as features. The
last feature is the length of the sound wave in seconds.
        </p>
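        <p>The 21 features can be sketched in Python as follows: 10 bin means, 10 bin variances and the duration. This is a minimal sketch, assuming one segmented character given as a list of samples and a known sample rate; taking the absolute value of each sample as its "amplitude" is our assumption, since the raw signal has a mean close to zero. The function name is hypothetical.</p>

```python
import statistics


def audio_features(segment, sample_rate, bins=10):
    """Return [mean_1, var_1, ..., mean_bins, var_bins, duration_in_seconds]."""
    n = len(segment)
    if bins > n:
        raise ValueError("segment is shorter than the number of bins")
    amplitudes = [abs(s) for s in segment]  # assumption: sample magnitude
    features = []
    for b in range(bins):
        # Nearly equal, non-empty bins covering the whole segment.
        chunk = amplitudes[b * n // bins:(b + 1) * n // bins]
        features.append(statistics.fmean(chunk))
        features.append(statistics.pvariance(chunk))
    features.append(n / sample_rate)  # length of the sound wave in seconds
    return features
```
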
        <p>The resulting feature vectors, containing 21 scalar values (10 means,
10 variances and the length), are then presented to a k-nn classifier.</p>
        <p>We now describe the experiments we have done
so far: the setting of the k-nn parameters for both audio-based and
text-based captchas, and the evaluation of the recognition rate
for each analysed scheme. Because this is a work
in progress, some values are still missing and not all
experiments have been finished yet.</p>
        <p>
          We have tested text-based captcha
recognition on the following web sites: cuzk.cz [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ],
mojedatovaschranka.cz [3], sujb.cz [
          <xref ref-type="bibr" rid="ref3">4</xref>
          ], uloz.to [
          <xref ref-type="bibr" rid="ref4">5</xref>
          ] and
centralniregistrdluzniku.cz [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ], and audio-based
captcha recognition on sujb.cz [
          <xref ref-type="bibr" rid="ref3">4</xref>
          ] and, again, uloz.to [
          <xref ref-type="bibr" rid="ref4">5</xref>
          ].
        </p>
      </sec>
      <sec id="sec-3-4">
        <title>5.1 Parameter setting</title>
        <p>For the parameter setting, we used a total of 510 manually
labelled text-based and audio-based captchas. We used 3-fold
cross-validation, entailing 340 samples for training and the
remaining 170 for testing.</p>
        <p>Our experimental results on the uloz.to dataset suggest
that the best option for the text-based captchas is the
Manhattan distance with k = 6, see Figure 12. For the audio-based
captchas, the graph looked quite similar, but the Euclidean metric
turned out to be the best: together with k = 9, it
achieved a recognition rate of 86.5%, followed by the cosine
metric with 84.1%.</p>
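        <p>The parameter search can be sketched in Python as a grid over k and the three metrics, scored by 3-fold cross-validation. This is a minimal sketch under stated assumptions: each sample is a (feature vector, label) pair, folds are formed by index modulo 3 (so 510 samples give 340 for training and 170 for testing in each fold), accuracy is per classified item, and all names are hypothetical.</p>

```python
import math
from collections import Counter


def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))


def euclidean(a, b):
    return math.dist(a, b)


def cosine(a, b):
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0:
        return 1.0  # treat a zero vector as maximally distant
    return 1.0 - sum(x * y for x, y in zip(a, b)) / norm


METRICS = {"manhattan": manhattan, "euclidean": euclidean, "cosine": cosine}


def knn_predict(train, query, k, distance):
    """Majority label among the k (vector, label) pairs closest to query."""
    ranked = sorted(train, key=lambda pair: distance(pair[0], query))
    return Counter(label for _, label in ranked[:k]).most_common(1)[0][0]


def cross_validated_accuracy(data, k, distance, folds=3):
    """Fraction of items classified correctly when each fold is held out once."""
    correct = 0
    for fold in range(folds):
        test = [item for i, item in enumerate(data) if i % folds == fold]
        train = [item for i, item in enumerate(data) if i % folds != fold]
        correct += sum(knn_predict(train, x, k, distance) == y for x, y in test)
    return correct / len(data)


def grid_search(data, ks, folds=3):
    """Return (accuracy, k, metric_name) of the best combination."""
    results = [(cross_validated_accuracy(data, k, METRICS[name], folds), k, name)
               for name in METRICS for k in ks]
    return max(results)
```

        <p>Ties in accuracy are broken by the larger k and then by the metric name, which keeps the result deterministic.</p>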
        <p>uloz.to was chosen as the primary testing dataset
for multiple reasons. It has the most advanced captchas we
found on the Czech internet in both the text-based and
audio-based cases, and we did not find any design or
implementation flaws such as the one at the cuzk.cz web site. Therefore,
we expected the parameters set on the uloz.to dataset to
be robust enough even for the other schemes, and according
to Figure 13, this is more or less true.</p>
        <p>uloz.to: uloz.to is a file-sharing service that uses
captchas to prevent automated file downloading. It
supports both text-based and audio-based schemes. Its
text-based scheme is very good compared to the others we
analysed: it uses distortion, rotation, a lot of noise and
occluding lines. Its audio captcha uses a single synthetic
voice with a weak added noise signal.</p>
        <p>We have analysed 510 samples of audio-based and
text-based challenges. Our average recognition rate for whole
captchas, estimated by 10-fold cross-validation, was 14%
for text-based and 86% for audio-based captchas. A
14% recognition rate may not seem much, but let us recall
that a captcha scheme is considered compromised once the
attacker's success rate exceeds 1%. Furthermore, we asked up to ten
humans to solve random audio captchas, and their success
rate ranged from 54% to 76%. In other words, the
computer outperformed humans in a test that should tell
them apart.</p>
        <p>sujb.cz: The State Office for Nuclear Safety uses a
captcha to secure its public forum. Both text-based and
audio-based schemes are available, and both are easy to solve:
they lack noise and anti-recognition features. The
text-based scheme has occluding lines, but they have a
different colour than the characters, so they are easy to filter out.</p>
        <p>The overall recognition rate was 98% for audio-based
and 86% for text-based captchas. However, we have to admit
that we used only 50 images and audio files to obtain these
results.</p>
        <p>cuzk.cz: The Czech State Administration of Land
Surveying and Cadastre uses only a text-based captcha to
prevent automatic queries to its database. The images
generated by its scheme look good at first sight, but
there is a serious design bug. The captcha image is not
served as a standard GIF or JPEG file but as an .axd
resource, i.e. through an HTTP handler used by ASP.NET
applications, so the image is generated at runtime.
Simply refreshing the image (not the whole page) then
generates a new captcha image containing the same
characters.</p>
        <p>Thanks to this bug, we were able to obtain and label
2100 different images. The flaw can easily be exploited
to achieve nearly 100% precision by downloading more
and more images until we are sure about the correct
recognition. To be fair, we did not use this bug in our evaluation
and still obtained a 46% captcha recognition rate.</p>
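        <p>The "download more images until sure" idea can be illustrated by a per-position majority vote over several recognised readings of the same challenge. This is a minimal sketch, assuming the readings are strings produced by a recogniser; discarding readings whose length differs from the most common length is our assumption, and the helper name is hypothetical. As stated above, this was not used in our evaluation.</p>

```python
from collections import Counter


def vote(readings):
    """Combine several readings of one challenge by per-position majority."""
    length = Counter(len(r) for r in readings).most_common(1)[0][0]
    usable = [r for r in readings if len(r) == length]
    return "".join(Counter(chars).most_common(1)[0][0] for chars in zip(*usable))
```

        <p>Each additional refresh adds one more independent reading, so an occasional misread character is outvoted by the others.</p>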
        <p>mojedatovaschranka.cz: This scheme is quite weak: it lacks
any anti-segmentation features, its noise has a different
colour than the characters, and it uses only digits. Our result is an 82%
recognition rate over a testing set of 50 samples.</p>
        <p>centralniregistrdluzniku.cz: This page serves as the
central registry of debtors, and a captcha must be solved
before you can upload your customer experience with a
company. The adopted scheme is again easy to solve: the
distortion is weak and the occluding lines have a different colour
than the characters. Our result is a 61%
recognition rate over a testing set of 50 samples.</p>
      </sec>
      <sec id="sec-3-5">
        <title>5.3 Summary</title>
        <p>The final results are summarised in Table 1. The reported
numbers are captcha recognition rates estimated by
10-fold cross-validation. Some values are missing because
the audio-based captcha alternative is available only for
uloz.to and sujb.cz.</p>
        <p>Finally, the overall misclassification overview is given
for the text-based captchas in Figure 19 and for the
audio-based captchas in Table 2.</p>
        <p>This research was driven by the curiosity of security
enthusiasts and will be used for academic purposes only. None of
us has any malevolent or business intentions.</p>
        <p>We have tested the security of several captcha solutions
across the Czech internet environment, intentionally using
off-the-shelf algorithms to simulate simple attacks.
The final result is that the current state is alarming: all
tested solutions have been compromised, with recognition
rates well over 1%. The most secure solution was the
text-based scheme at uloz.to, where we achieved only a 14%
recognition rate. On the other hand, on its audio-based captchas
we were about 10% more accurate than humans in terms of the average
recognition rate.</p>
        <p>The second most secure were the challenges generated at
the web site of the Czech State Administration of Land
Surveying and Cadastre. This captcha is used to block
automated queries to the database and should prevent
massive downloads of private information about the
ownership of real estate. Our recognition rate was almost
one half, more precisely 46%. However, due to the design flaw
of this captcha described in Section 5, it can easily be
boosted to almost 100% precision.</p>
        <p>The key messages of this paper are as follows. For defenders: do not rely
on any captcha as the only defence against automation, and
never use a captcha as the only security solution. For the
attacker: if you can choose, try audio captchas, as they
are typically easier to break.</p>
        <p>As for our future work, we are still preparing a more
complete survey of captcha solutions used on the Czech
internet. We are especially searching for more state
administration pages that use completely insufficient solutions or
have design flaws. Currently, we are devoting our research
efforts to the application of convolutional neural networks in
this context, as we believe that they can replace our whole
text-based pipeline. We are also starting to pay attention
to image-based captchas like the one in Figure 2.</p>
      </sec>
      <sec id="sec-3-6">
        <title>Acknowledgement</title>
        <p>The research reported in this paper has been supported by
the Student Grant SGS16/119/OHK3/1T/18.</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          <source>[1] Nahlížení do katastru nemovitostí [online]</source>
          ,
          <year>2004</year>
          –2016. [Cited 2016-06-01].
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <source>Centrální registr dlužníků</source>
          [online],
          <year>2016</year>
          . [Cited 2016-06-01].
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [4]
          <source>Státní úřad pro jadernou bezpečnost</source>
          [online],
          <year>2016</year>
          . [Cited 2016-06-01].
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [5]
          <source>Ulož.to</source>
          [online],
          <year>2016</year>
          . [Cited 2016-06-01].
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>Elie</given-names>
            <surname>Bursztein</surname>
          </string-name>
          , Jonathan Aigrain, Angelika Moscicki, and John C Mitchell.
          <article-title>The end is nigh: Generic solving of text-based captchas</article-title>
          .
          <source>In 8th USENIX Workshop on Offensive Technologies (WOOT 14)</source>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>Elie</given-names>
            <surname>Bursztein</surname>
          </string-name>
          , Romain Beauxis, Hristo Paskov, Daniele Perito, Celine Fabry, and John Mitchell.
          <article-title>The failure of noise-based non-continuous audio captchas</article-title>
          .
          <source>In 2011 IEEE Symposium on Security and Privacy (SP)</source>
          , pages
          <fpage>19</fpage>
          -
          <lpage>31</lpage>
          . IEEE,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>Elie</given-names>
            <surname>Bursztein</surname>
          </string-name>
          and
          <string-name>
            <given-names>Steven</given-names>
            <surname>Bethard</surname>
          </string-name>
          .
          <article-title>Decaptcha: breaking 75% of eBay audio captchas</article-title>
          .
          <source>In Proceedings of the 3rd USENIX Conference on Offensive Technologies</source>
          , page 8. USENIX Association
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>Elie</given-names>
            <surname>Bursztein</surname>
          </string-name>
          , Steven Bethard, Celine Fabry, John C Mitchell, and
          <string-name>
            <given-names>Dan</given-names>
            <surname>Jurafsky</surname>
          </string-name>
          .
          <article-title>How good are humans at solving captchas? A large scale evaluation</article-title>
          .
          <source>In 2010 IEEE Symposium on Security and Privacy</source>
          , pages
          <fpage>399</fpage>
          -
          <lpage>413</lpage>
          . IEEE,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>Elie</given-names>
            <surname>Bursztein</surname>
          </string-name>
          , Matthieu Martin, and
          <string-name>
            <given-names>John</given-names>
            <surname>Mitchell</surname>
          </string-name>
          .
          <article-title>Text-based captcha strengths and weaknesses</article-title>
          .
          <source>In Proceedings of the 18th ACM conference on Computer and communications security</source>
          , pages
          <fpage>125</fpage>
          -
          <lpage>138</lpage>
          . ACM,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>Kumar</given-names>
            <surname>Chellapilla</surname>
          </string-name>
          , Kevin Larson, Patrice Simard, and
          <string-name>
            <given-names>Mary</given-names>
            <surname>Czerwinski</surname>
          </string-name>
          .
          <article-title>Designing human friendly human interaction proofs (HIPs)</article-title>
          .
          <source>In Proceedings of the SIGCHI conference on Human factors in computing systems</source>
          , pages
          <fpage>711</fpage>
          -
          <lpage>720</lpage>
          . ACM,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>Claudia</given-names>
            <surname>Cruz-Perez</surname>
          </string-name>
          , Oleg Starostenko,
          <string-name>
            <given-names>Fernando</given-names>
            <surname>Uceda-Ponga</surname>
          </string-name>
          , Vicente Alarcon-Aquino, and Leobardo Reyes-Cabrera
          .
          <article-title>Breaking recaptchas with unpredictable collapse: heuristic character segmentation and recognition</article-title>
          .
          <source>In Pattern Recognition</source>
          , pages
          <fpage>155</fpage>
          -
          <lpage>165</lpage>
          . Springer,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>Nobuyuki</given-names>
            <surname>Otsu</surname>
          </string-name>
          .
          <article-title>A threshold selection method from gray-level histograms</article-title>
          .
          <source>Automatica</source>
          ,
          <volume>11</volume>
          (
          <issue>285-296</issue>
          ):
          <fpage>23</fpage>
          -
          <lpage>27</lpage>
          ,
          <year>1975</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [14]
          ITU-R Rec. BT.601:
          <article-title>Studio encoding parameters of digital television for standard 4:3 and wide-screen 16:9 aspect ratios</article-title>
          .
          <source>ITU-R Rec. BT</source>
          ,
          <volume>656</volume>
          ,
          <year>1995</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>Urmila</given-names>
            <surname>Shrawankar</surname>
          </string-name>
          and
          <string-name>
            <given-names>Vilas M</given-names>
            <surname>Thakare</surname>
          </string-name>
          .
          <article-title>Techniques for feature extraction in speech recognition system: A comparative study</article-title>
          .
          <source>arXiv preprint arXiv:1305.1145</source>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>F.</given-names>
            <surname>Stark</surname>
          </string-name>
          , C. Hazırbaş,
          <string-name>
            <given-names>R.</given-names>
            <surname>Triebel</surname>
          </string-name>
          , and
          <string-name>
            <given-names>D.</given-names>
            <surname>Cremers</surname>
          </string-name>
          .
          <article-title>Captcha recognition with active deep learning</article-title>
          .
          <source>In GCPR Workshop on New Challenges in Neural Computation</source>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>Jennifer</given-names>
            <surname>Tam</surname>
          </string-name>
          , Jiri Simsa, Sean Hyde, and Luis V Ahn.
          <article-title>Breaking audio captchas</article-title>
          .
          <source>In Advances in Neural Information Processing Systems</source>
          , pages
          <fpage>1625</fpage>
          -
          <lpage>1632</lpage>
          ,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>Alan M</given-names>
            <surname>Turing</surname>
          </string-name>
          .
          <article-title>Computing machinery and intelligence</article-title>
          .
          <source>Mind</source>
          ,
          <volume>59</volume>
          (
          <issue>236</issue>
          ):
          <fpage>433</fpage>
          -
          <lpage>460</lpage>
          ,
          <year>1950</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>Luis</given-names>
            <surname>Von Ahn</surname>
          </string-name>
          , Manuel Blum, Nicholas J Hopper, and
          <string-name>
            <given-names>John</given-names>
            <surname>Langford</surname>
          </string-name>
          .
          <article-title>Captcha: Using hard AI problems for security</article-title>
          .
          <source>In Advances in Cryptology - EUROCRYPT 2003</source>
          , pages
          <fpage>294</fpage>
          -
          <lpage>311</lpage>
          . Springer,
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>