=Paper=
{{Paper
|id=Vol-2243/paper10
|storemode=property
|title=Neural Cryptanalysis of Classical Ciphers
|pdfUrl=https://ceur-ws.org/Vol-2243/paper10.pdf
|volume=Vol-2243
|authors=Riccardo Focardi,Flaminia L. Luccio
|dblpUrl=https://dblp.org/rec/conf/ictcs/FocardiL18
}}
==Neural Cryptanalysis of Classical Ciphers==
Neural Cryptanalysis of Classical Ciphers ? Riccardo Focardi and Flaminia L. Luccio Università Ca’ Foscari, Venezia {focardi,luccio}@unive.it Abstract. Cryptanalysis identifies weaknesses of ciphers and investigates meth- ods to exploit them in order to compute the plaintext and/or the secret cipher key. Exploitation is nontrivial and, in many cases, weaknesses have been shown to be effective only on reduced versions of the ciphers. In this paper we apply artificial neural networks to automatically “assist” cryptanalysts into exploiting cipher weaknesses. The networks are trained by providing data in a form that points out the weakness together with the encryption key, until the network is able to generalize and predict the key (or evaluate its likelihood) for any possible ciphertext. We illustrate the effectiveness of the approach through simple classical ciphers, by providing the first ciphertext-only attack on substitution ciphers based on neural networks. Keywords: Artificial neural networks; cryptography; cryptanalysis. 1 Introduction Cryptography aims at decorrelating as much as possible plaintexts and keys from ci- phertexts, so that computing the key from the plaintext and the ciphertext is infeasible, even when the attacker can choose the plaintexts to encrypt or ciphertexts to decrypt (the so-called chosen-plaintext and chosen-ciphertext scenarios). It is thus very hard to imagine that any fully automated technique will possibly break complex ciphers. On the other side, cryptanalysis looks for ways to break (part of) ciphers, in a white-box fashion: for example, by exploiting weaknesses of internal operations it might become possible to compute part of the key bits reducing the cost of brute-forcing attacks. However, exploitation is nontrivial and, in many cases, weaknesses have been shown to be effective only on reduced versions of the ciphers (see, e.g., [7,8]). In this paper, we apply artificial neural networks to automatically “assist” cryptan- alysts into exploiting cipher weaknesses. Our threat model assumes that the attacker knows or finds new weaknesses of a cipher and trains neural networks to exploit these weaknesses automatically. The networks are trained by providing data in a form that points out the weakness together with the encryption key, until the network is able to generalize and predict the key (or estimate its likelihood) for any possible ciphertext. In other words, neural networks become a tool for the cryptanalyst to automate and ? Work partially supported by CINI Cybersecurity National Laboratory within the project Filiera- Sicura funded by CISCO Systems Inc. and Leonardo SpA. optimize her attacks: the cryptanalyst points out the weakness and the neural network learns how to exploit it. We illustrate the effectiveness of the approach through simple classical ciphers such as, shift (Caesar), Vigenère and substitution. We are aware that these ciphers are considered extremely weak, however they are well suited to illustrate the approach. Cryptanalysis of these ciphers can be done by only inspecting the statistical structure of the ciphertext.1 Thus, we train the neural networks by providing simple statistical features (such as frequencies of single letters, digrams and trigrams) together with the adopted key until the network generalizes the attack and becomes able to predict a key (or compute its likelihood) from any given ciphertext. Interestingly, we provide the first ciphertext-only attack on substitution ciphers based on neural networks. Our results are preliminary but we believe that the proposed approach is a promising step towards a more efficient and partially automated cryptanalysis of modern ciphers. In particular, our approach might be adopted to cryptanalyze existing broken ciphers, e.g., in the automotive industry [21,22], or to automate (and hopefully optimize) known attacks in reduced versions of state-of-the-art, secure ciphers. Contributions. We contribute as follows: (i) we show a very efficient and fast way to cryptanalyze shift ciphers using neural networks. The trained neural network is able to recover the key by providing as input the relative frequencies of the ciphertext letters; (ii) we show how the neural network for shift ciphers can be applied to break Vigenère poly-alphabetic ciphers. The neural network is applied to subtexts until it provides an accurate result and outputs the cipher’s key; (iii) we provide a very accurate and efficient strategy to cyptanalyse substitution ciphers. The neural network takes as input the 3-grams (three letters) frequencies and provides a measure of the goodness of the plaintext, with respect to English. This allows for searching the key-space efficiently until the correct plaintext is found. To the best of our knowledge this is the first attack on substitution ciphers based on neural networks. Related work Recently, lots of work has been done in order to apply different computa- tional intelligence techniques to the field of cryptography (see, e.g., [6] for an overview). In this paper we only concentrate on machine learning techniques and more specifically on neural networks. The idea of relating the fields of cryptography and machine learning goes back to 1993 when Rivest in [18], firstly discussed how, at that time, each field was contributing with techniques and insights to the other. After that, much work has been done in this direction, in particular for what concerns the use of machine learning techniques to help cryptographers to develop new good ciphers. In this paper we are interested in solving a different problem, i.e., the application of neural networks to automatically “assist” cryptanalysts into exploiting cipher weaknesses. In this regard, the literature is more limited and we discuss it below. For what concerns attacks to classical ciphers, [4] proposes the use of neural networks to run known-plaintext attacks to Vigenère ciphers which, in fact, can be trivially achieved by computing the difference between ciphertexts and plaintexts. Our attacks, instead, 1 Notice that, when a pair plaintext-ciphertext is known computing the key of these ciphers is trivial, so we will consider ciphertext-only attack scenarios. 2 are ciphertext-only. In [15], the author shows how recurrent neural networks may be used to learn decryption algorithms for the Vigenère, Autokey, and Enigma ciphers. This approach, as the author states, is data inefficient as it requires at least a million training examples, i.e., pairs of plaintext and ciphertext to learn a cipher. From a practical point of view this translates to a few days of calculations on a k40 GPU to get 96 − 97% accuracy. Our networks are trained in a matter of seconds, which is what you would expect for the cryptanalysis of simple ciphers. Another limitation is the key-phrases size, which is limited to 1-6 characters. A recent work by Gomez et al. [13] introduces CipherGAN, a tool based on Generative Adversarial Networks (GANs). The authors experimentally show that CipherGAN can decrypt shift and Vigenère ciphers with an accuracy of 98.7% for shift ciphers and of 75.7% for Vigenère ciphers, which turns out to be far less accurate (and much more computational intensive) than ours. Finally, Liu et al. in [16] propose a novel cost function for unsupervised learning and a stochastic primal-dual gradient (SPDG) algorithm, and they show how they can be applied in real-world scenarios, in particular to break the Caesar cipher, but they do not consider more complex ciphers such as Vigenère and substitution. Regarding attacks to other ciphers, in [5] the authors propose an application of neural networks to recover in the last round part of the key an hypothetical cryptographic algorithm based on the Feistel architecture, while in [9] the authors present an application of a neural cryptanalysis approach to the Simplified DES (S-DES), and they were able to obtain the correct values for three key bits. These results are quite ad-hoc and their practical impact and how they relate to known cryptanalytic attacks is unclear. Neural networks have been used for the different problem of cipher classification: in [19] the authors present a neural network that by analyzing some features of the cipher is able to classify it as Playfair, Vigenère or Hill cipher. Another example of application of machine learning techniques to cryptography is presented in [17] where the authors propose to apply these techniques in the context of side channel attacks, which we do not consider here. Paper Outline. The paper is organized as follows. In Section 2 we briefly recall neural network techniques and we give some background on classical ciphers. In Section 3 we describe our proposed approach starting with the shift cipher, then Vigenère, and finally the substitution cipher. We conclude in Section 4 with some final remarks. 2 Background In this section we now shortly recall what an artificial neural network is, and how are classical ciphers defined. Artificial neural networks. An artificial neural network is a network composed of different internal layers, called hidden layers, which may vary both in their number and in the number of internal nodes called neurons. The first layer, typically depicted on the left is called input layer, where the data flow in. It is connected to the second layer and so on up to the output layer that produces the model prediction. Each neuron has a set of input values z1 , z2 , . . . which are real numbers between 0 and 1, and has also a set of weights w1 , w2 , . . . associated to each input, and an 3 overall bias b. Initially, all the weights are set randomly, and then the algorithm changes during its execution. Each neuron has also an activation function, that applies non- linear transformations to the inputs and weights. The functions we will be using for our tests will be two very commonly used ones, i.e., the sigmoid and the rectifier functions. Given a setPof input values z1 , z2 , . . ., a set of weights w1 , w2 , . . ., and a bias b we define z = j wj zj + b. Formally, the sigmoid function σ is the non-linear 1 function σ(z) = 1+exp(−z) , while the rectifier function f is an always positive function f (z) = max{0, z}. The interested reader can refer to [14] for more detail. Classical ciphers. We now shortly recall the classical ciphers that we will analyse in the next sections. We consider a set of plaintexts P , a set of ciphertexts C and a set of keys K. Since in the present work we analyse plaintexts written in English language, we assume that all operations are in Z26 . The shift cipher or Caesar cipher, is a mono-alphabetic cipher that maps a letter of the plaintext each time to the same letter of the ciphertext. Formally: Definition 1. [20] Given a plaintext x ∈ P , a ciphertext y ∈ C, and a key k ∈ K, where P = C = K = Z26 , a shift cipher (or Caesar cipher) is defined by the encryption function Ek (x) = x+k mod 26, and by the decryption function Dk (y) = y−k mod 26. Note that, the number of different keys is limited to 26 (all values between 0 and 25), so a trivial brute-force attack is feasible. The Vigenère cipher, is a poly-alphabetic cipher that maps a letter of the alphabet into a set of different letters. The cipher works on blocks of m letters with a key of length m. Formally: Definition 2. [20] Given a plaintext x ∈ P , a ciphertext y ∈ C, and a key k ∈ K, where P = C = K = Zm m 26 , and where Z26 is Z26 × Z26 × . . . × Z26 , m times, for a key K = (k1 , . . . , km ), the Vigenère cipher is defined by the encryption function Ek1 ,...,km (x1 , . . . , xm ) = (x1 + k1 , . . . , xm + km ) mod 26, and by the decryption function Dk1 ,...,km (y1 , . . . , ym ) = (y1 − k1 , . . . , ym − km ) mod 26. The number of possible keys is 26m , i.e., all the possible sequences of letters of length m. For m big enough this prevents brute-force attacks. Finally, a substitution cipher is a mono-alphabetic cipher that maps the letters of the alphabet into a generic permutation. Formally: Definition 3. [20] Given a plaintext x ∈ P , a ciphertext y ∈ C, and a key ρ ∈ K, where P = C = Z26 , and K = {ρ | ρ is a permutation of 0, 1, . . . , 25}, a substitution cipher is defined by the encryption function Eρ (x) = ρ(x), and by the decryption function Dρ (y) = ρ−1 (x), where ρ−1 is the inverse permutation of ρ. Note that, in this case the number of all possible keys is 26!, i.e., the number of all permutations of 26 numbers, which is approximately 4 × 1026 > 288 . Even if this is less than the recommended size for cryptographic keys, i.e., 2128 , it is still very hard to run a brute-force attack even using powerful parallel computers. 4 Fig. 1: Caesar cipher circularly shifts the frequency histogram key positions to the right. 3 Cryptanalysis of classic ciphers with neural networks In this section we use neural networks to automate attacks on the classic ciphers of Section 2, by exploiting known statistical weaknesses. We start with Caesar (shift) cipher and we show how to train a neural network to recover the key by simply providing the relative frequencies of ciphertext letters (cf. Section 3.1); then, we show how to apply the Caesar neural network to Vigenère subtexts so to efficiently recover the cipher key (cf. Section 3.2); finally, we train a neural network with 3-grams frequencies in order to compute the likelihood of a substitution key and we use the neural network to efficiently search the key space of substitution ciphers (cf. Section 3.3). Experimental setup. We consider a corpus of 1 million words extracted from the British National Corpus [1]. Neural networks are implemented using the Keras library [2] with TensorFlow back-end [3]. We run all our tests on a MacBook Pro with 2 GHz Intel Core i7 and 16 GB of RAM, without any GPU or multi-core optimization. We empirically found that simple networks with just one hidden layer were sufficient and could be trained very efficiently with 100% accuracy. This is consistent with the fact that the considered ciphers are simple. More sophisticated networks might be necessary to perform attacks on more complex ciphers. 3.1 Predicting the Caesar cipher key We start with the toy example of the Caesar (shift) cipher. As we have previously pointed out, this cipher only has 26 keys and can be easily brute-forced by inspecting all the possible 26 decryptions. However, to automate the analysis it would be desirable to detect the correct decryption without human intervention. Because of the very small key-space, in this specific example a check of all possible decryptions against a dictionary would suffice but we investigate a more direct, cryptanalytic approach to directly compute the key, based on the particular encryption function. In fact, the shift cipher is a particularly weak form of mono-alphabetic cipher that preserves the relative order of the frequencies of the alphabet characters. 5 If we measure the frequencies of the characters in a ciphertext, the obtained histogram is a circular shift of the plaintext frequency histogram. Along the brute-force idea above, it is enough to shift the histogram until it “fits” the one characterizing the plaintext language, e.g., English (that can be easily computed from long enough English texts) to find the decryption key. Figure 1(a) shows the relative frequencies computed from the British National Corpus [1] and from a sample plaintext of 200 words. We notice that, even if there are differences, the shape of the histogram is preserved. Figure 1(b) illustrates the relative frequencies after encryption with key 3. It is clearly visible how the shape of the histogram is (circularly) shifted to the right of three positions. We now show that brute-forcing the key is not necessary as we can directly train a neural network to recognize the relative shift for the frequency histogram. In particular, we approach the problem as follows: 1. we take a big enough dataset of English plaintexts encrypted under random keys; 2. we compute the frequencies of single letters of the ciphertexts; 3. we train a neural network by providing the frequencies as input and the correspond- ing key as output; 4. we test the network on an independent dataset. Thus, intuitively, the cryptanalyst only points out a weakness (the frequency of single letters in this case) and leaves to the neural network the job of correlating this to the cipher key. The fact that the key can be computed directly from the histogram is interesting as it shows the power of neural networks for the cryptanalysis of the cipher: we just provide the histograms and keys of the training cases and this makes it possible to “recognize” the key (on different ciphers) without trying the actual shifts. Experimental results. We use a rather standard neural network for classification, with an output layer of 26 neurons, one for each possible key. The only hidden layer is also composed of 26 neurons, as we empirically checked that we did not require more layers and thus more complexity. The activation function is sigmoid σ (cf. Section 2). We trained the network using 5097 ciphertexts of 100 words obtained by encrypting different plaintexts under random keys. We tested the model against the same number of ciphertexts obtained by different plaintexts encrypted under random keys with a 100% accuracy, confirming that the model did not overfit the training data. Training takes about 30 seconds and cracking a ciphertext is almost instantaneous. The model is extremely accurate up to about 30 words. Figure 2 shows the worst-case classification value for the selected keys that we got out of 5097 cracking different ciphertexts of lengths 30, 50, 100 and 200, finding respectively key 1 (Figure 2a), 19 (Figure 2b), 17 (Figure 2c) and 16 (Figure 2d). The attacker picks the key with highest score. For length 30 the best candidate is not so far from the second best, while for bigger texts the prediction is neat. This is due to the fact that relative frequencies of letters in short sentences deviate more from the average ones in English. 3.2 Breaking Vigenère cipher through the Caesar classifier We now consider the Vigenère cipher that is one of the first poly-alphabetic ciphers ever proposed. The idea is exactly to contrast the problem of preserving the letter frequencies 6 Fig. 2: Worst-case classification value for different text sizes. typical of mono-alphabetic ciphers (cf. subsection 3.1). Figure 3(b) shows the histogram of the frequencies after a Vigenère encryption. Compared to Figure 3(a) we clearly notice a more uniform shape due to the poly-alphabetic nature of the cipher. A Vigenère cipher (cf. Definition 2) adds a password of length m to each plaintext block of length m. A very efficient cryptanalysis technique for this cipher is due to Friedman [10,11,12]. The idea is to use statistical measures of coincidence such as the Index of Coincidence (IC) to check weather a subtext of the given cipher looks like English in terms of frequencies. IC is the probability that two randomly selected letters are equal and is computed as the sum of the squared probabilities of the letters. It is preserved by mono-alphabetic transformations. Thus, in order to detect m it is then enough to try all possible key lengths and compute the IC of the subtext only composed of the letters at distance m in the text. The IC will be close to the one of English only when m is correct, i.e., when subtext letters are shifted by the same value. After m has been found, Friedman adopted a variant of IC called Mutual Index of Coincidence (MIC) to detect the relative shift due to the different password letters. Friedman’s cryptanalysis is very efficient. Notice, in fact, that brute-forcing the key length m by repeating an efficient computation (whose complexity is constant) m times is perfectly fine since encryption and decryption have a complexity polynomial in m. Here, we resort to the same idea of brute-forcing m but we fully reuse the Caesar classifier of subsection 3.1 to check the correctness of m by applying it to the m subtexts. 7 Fig. 3: Comparison between Caesar’s and Vigenère’s ciphertext frequencies. So our attack requires O(m2 ) invocations of the neural network Caesar classifier which, in turn, is very efficient. In fact, the Vigenère cipher is the application of m different Caesar ciphers on m subtexts composed of the letters at distance m. Interestingly, when we apply the Caesar classifier on Vigenère (poly-alphabetic) encryption we typically get a rather uniform distribution without a neat choice of a key. Thus, our Caesar classifier can be used to check that a certain m is correct and, when it is the case, it directly returns the Vigenère key. More precisely our attack proceeds as follow. Let C = c0 , . . . , cn−1 be the Vigenère ciphertext to crack and let MAX be the maximum key length the attacker wants to try. Then: For each integer m in [1, MAX ]: 1. Consider all m subtexts Si of C composed of the letters at distance m, i.e., c0 , cm , c2m , . . ., c1 , cm+1 , c2m+1 , . . ., . . ., cm−1 , c2m−1 , c3m−1 , . . .; 2. Apply the Caesar classifier to all the Si ’s; 3. If the classifier returns a value bigger than a threshold t for all the Si ’s output the found key. The complexity of this attack is O(M AX 2 ) and can break a cipher almost immediately. Experimental results. We have tested the attack on texts of length 400 and 1000 words with thresholds 0.95 and 0.98, respectively. Keys were chosen at random from length 5 to length 8. For the texts of 400 words we had about 1% of spurious keys, i.e., keys surviving the thresholds but incorrect. This usually happens when the key is particularly redundant (equal letters) and the classifier detects a wrong key size. It is worth noticing that even when spurious keys are found the attack always outputs also the correct key. For texts of 1000 words the attack is very accurate and we did not get any spurious keys in the tests. As for the Ceasar cipher, cracking a ciphertext is almost instantaneous. 3.3 Neural-cryptanalysis of the substitution cipher Even if it belongs to the mono-alphabetic class of ciphers, the substitution cipher (cf. Definition 3) presents the very challenging issue of an extremely large key-space. Recall 8 that keys are permutations of the alphabet, thus giving 26! ≈ 288.38 possible keys, i.e., more than 88-bits keys. This cipher suffers from the same problem of the shift cipher for what concerns letter frequencies: they are preserved by the encryption. However, differently from the shift cipher, here the histogram is permuted based on the substitution key and the trivial attack of choosing the permutation that best fits the English histogram does not produce any useful plaintext. This is due to the fact that many letters in English present very similar frequencies and it is very likely that they get swapped when shorter texts are sampled. Example 1 (Failure of trivial frequency cryptanalysis). Consider the following plaintext of 200 words (with spaces removed, and cut down to only three lines): MAYWANTTOMODIFYSUCHFORMULATIONSALONGTHELINESAREADERISLIKEL YTOFEELATTHISPOINTORMANYREADERSMAYRESPONDBYORPOSSIBLYBETTE RBYTACKLINGTHEISSUEOFHETEROGENEOUSREADERRESPONSESMOREDI... The following ciphertext has been obtained by encrypting the plaintext under the ran- dom substitution VETISLFMBDGNCYQHJPXZAORKUW (meaning that A becomes V, B becomes E, and so on): CVURVYZZQCQIBLUXATMLQPCANVZBQYXVNQYFZMSNBYSXVPSVISPBXNBGSN UZQLSSNVZZMBXHQBYZQPCVYUPSVISPXCVUPSXHQYIEUQPHQXXBENUESZZS PEUZVTGNBYFZMSBXXASQLMSZSPQFSYSQAXPSVISPPSXHQYXSXCQPSIB... By mapping the letters of the ciphertexts depending on the relative frequencies with respect to the relative frequencies in English we obtain the following unsatisfactory result, that only matches 5 of the plaintext letters B, E, H, K and V: GOFYORIITGTLAPFNCUHPTSGCDOIATRNODTRWIHEDARENOSEOLESANDAKED FITPEEDOIIHANMTARITSGORFSEOLESNGOFSENMTRLBFTSMTNNABDFBEIIE SBFIOUKDARWIHEANNCETPHEIESTWERETCNSEOLESSENMTRNENGTSELA... This is a particularly unfortunate case but it is very hard to get more than 10-11 correct letters out of 26 with this method. It is well known that by looking for n-grams, i.e., sequences of n letters, it is possible to have a much more accurate model of a language. However, since each letter is replaced with a different one independently of the other letters, there is no direct way to use n-gram frequencies and come up with a possible substitution key, as we did in the example above for frequencies of single letters. Instead, the commonly adopted idea is to use n-grams to evaluate how far a given text is from an English one, so that it is possible to look for a better substitution, e.g., by swapping two letters. Intuitively, n-grams are used to define a score function that tells how good is a given key and allows for searching a better one by random swaps. As we did before, we leverage on this idea but we leave to neural networks the task of evaluating a text based on n-grams. In particular we proceed as follows: 1. we take a big enough dataset of English plaintexts encrypted under random keys; 2. we compute the frequencies of 3-grams of both plaintexts and ciphertexts; 9 For i ∈ [1, MAXITER]: 1. Pick a random keyi (or the one maximising the match with single frequencies for i = 1); 2. Use the neural network to get a goodnessi value; 3. For MAXSWAPS iterations: (a) swap two letters in the key and recompute goodnessi ; (b) if goodnessi is better save the new key in keyi ; Output keyj such that goodnessj = max({goodnessi | i ∈ [1, MAXITER]}). Table 1: Attack strategy 3. we train a neural network by providing the frequencies as input and one bit of output: 1 when the input is a plaintext, and 0 when it is a ciphertext; 4. we test the neural network on an independent dataset. As before, we only provided the neural network with what we know is a feature used by cryptanalysis and we leave to the network the job of devising a classifier for us. In fact, the neural network is trained to distinguish a plaintext by a ciphertext encrypted under a random permutation of the alphabet. We have no guarantees that this will provide a good classifier which is able to tell “how” similar is a text to a plaintext and, consequently, how good is a key, but experimental results have confirmed that the method is effective. Experimental results. To limit the dimension of the network we have limited the 3-grams frequencies to the most likely 3-grams in English, that we obtained by the National British Corpus [1]. Our previous experiment was based on 2-grams, so we decided to keep the same number of inputs, i.e., 262 = 676. More specifically, the neural network has a hidden layer of 676 neurons and a single neuron as output, giving the float value in the interval [0,1]. Experiments have shown that increasing the size just slowed down training without improving accuracy. The attack strategy picks a random key, decrypts the ciphertext and uses the neural network to estimate how close the obtained plaintext is to English and, consequently, to the target plaintext. It then swaps random letters looking for a better plaintext. If the neural network gives a better score the new key is kept otherwise a new random swap is tried. The attacker can select the bound MAXSWAPS , that is the maximum number of possible swaps. To prevent that the attack only finds a local maximal solution, the attack can be restarted with a new initial random key. The attack can be improved by picking as starting key for the first iteration the one maximizing the similarity of single letter frequencies illustrated in Example 1. Table 1 summarizes the attack strategy. Example 2 (Attack strategy for substitution cipher). Consider again the ciphertext of Example 1. We start from the (unsatisfactory) decryption that is obtained by making the frequency of single letters match: GOFYORIITGTLAPFNCUHPTSGCDOIATRNODTRWIHEDARENOSEOLESANDAKED FITPEEDOIIHANMTARITSGORFSEOLESNGOFSENMTRLBFTSMTNNABDFBEIIE 10 Fig. 4: Number of wrong substitutions in the recovered keys out of 2500 texts. SBFIOUKDARWIHEANNCETPHEIESTWERETCNSEOLESSENMTRNENGTSELA... The neural network gives a goodness value of about 0.38, confirming that the text is far from being an English sentence. The attack strategy picks a random swap of letter C with H, giving the following text with goodness = 0.78: GOFYORIITGTLAPFNHUCPTSGHDOIATRNODTRWICEDARENOSEOLESANDAKED FITPEEDOIICANMTARITSGORFSEOLESNGOFSENMTRLBFTSMTNNABDFBEIIE SBFIOUKDARWICEANNHETPCEIESTWERETHNSEOLESSENMTRNENGTSELA... Since goodness is improved the new key is kept and the process goes on up to the following text with goodness = 0.9998 recovering the full plaintext: MAYWANTTOMODIFYSUCHFORMULATIONSALONGTHELINESAREADERISLIKEL YTOFEELATTHISPOINTORMANYREADERSMAYRESPONDBYORPOSSIBLYBETTE RBYTACKLINGTHEISSUEOFHETEROGENEOUSREADERRESPONSESMOREDI... We have tested the technique on 2500 texts of 200 words. As detailed in Figure 4, in about 58% of the cases the key has been fully recovered. In 93% of the cases the key has at most 2 wrong mappings, which are trivially recoverable by manual inspection. Interestingly, in about 70% of the cases the best key has been found at the first iteration. The attacks have been performed with MAXITER = 10 and MAXSWAPS = 400 and took about 30 seconds for each ciphertext. However, since in 70% of the cases the key is found at the first iteration, the attack only takes 3 seconds to find the correct key. 4 Conclusions In this paper we have shown how to leverage on existing cryptanalysis techniques for classic ciphers in order to devise neural networks that can automate part of the cryptanalytic attack. We have pointed out that neural networks allow the cryptanalyst to only focus on the interesting features that characterize the weakness of the cipher, leaving to the networks the job of correlating the features and producing either the key material (in case of Caesar and Vigenère) or a measure of the goodness of the brute-forced key. We have also presented the first attack on substitution ciphers based on neural networks. 11 As a future work we plan to investigate the general applicability of our approach by identifying classes of ciphers that can be analyzed using techniques similar to the ones presented here. In particular, we intend to target already broken ciphers, such as the ones in the automotive industry [21,22], and reduced-round standard ciphers. References 1. British national corpus. http://www.natcorp.ox.ac.uk/. 2. Keras: The python deep learning library. https://keras.io/. 3. Tensorflow: An open-source machine learning framework for everyone. https://www. tensorflow.org/. 4. M. Al-Ubaidy. Black-box attack using neuro-identifier. Cryptologia, 24(8):358–372, 2004. 5. A. Albassal and A. Wahdan. Neural network based cryptanalysis of a Feistel type block cipher. In Int. Conf. on Electrical, Electronic and Computer Eng., pages 231–237, 2004. 6. W.S. Awad and E.M.El-Alfy. Computational intelligence in cryptology. In Improving Information Security Practices through Computational Intelligence, chapter 2, pages 28–44. Hershey, PA: IGI Global, 2015. 7. E. Biham and A. Shamir. Differential cryptanalysis of DES-like cryptosystems. Journal of Cryptology, 4(1):3–72, Jan 1991. 8. A. Biryukov, O. Dunkelman, N. Keller, D. Khovratovich, and A. Shamir. Key Recovery Attacks of Practical Complexity on AES-256 Variants with up to 10 Rounds. In Advances in Cryptology – EUROCRYPT 2010, pages 299–319. Springer Berlin Heidelberg, 2010. 9. M. Danziger and H. M. A. Henriques. Improved cryptanalysis combining differential and artificial neural network schemes. In International Telecommunications Symposium, August 2014. 10. M. Friedman. The use of ranks to avoid the assumption of normality implicit in the analysis of variance. Journal of the American Statistical Association, 32:675–701, 1937. 11. M. Friedman. A correction: The use of ranks to avoid the assumption of normality implicit in the analysis of variance. Journal of the American Statistical Association, 34:109, 1939. 12. M. Friedman. A comparison of alternative tests of significance for the problem of m rankings. The Annals of Mathematical Statistics, 11:86–92, 1940. 13. A.N. Gomez, S. Huang, I. Zhang, B.M. Li, M. Osama, and L. Kaiser. Unsupervised cipher cracking using discrete GANs. In Int. Conf. on Learning Representations, 2018. 14. I. Goodfellow, Y. Bengio, and A. Courville. Deep Learning. MIT Press, 2016. http: //www.deeplearningbook.org. 15. S. Greydanus. Learning the Enigma with recurrent neural networks. CoRR, abs/1708.07576, 2017. 16. Y. Liu, J. Chen, and L. Deng. Unsupervised sequence classification using sequential output statistics. In 31st Annual Conf. on Neural Information Processing Systems, 2017. 17. H. Maghrebi, T. Portigliatti, and E. Prouff. Breaking cryptographic implementations using deep learning techniques. In SPACE, LNCS 10076, pages 3–26. Springer, 2016. 18. RL. Rivest. Cryptography and machine learning. In Advances in Cryptology (ASIACRYPT’91), volume 739 of Lecture Notes in Computer Science. Springer, 1993. 19. G. Sivagurunathan, V. Rajendran, and T. Purusothaman. Classification of substitution ciphers using neural networks. Int. Jour. of Comp. Sc. and Network Security, 10(3):274–279, 2007. 20. D.R. Stinson. Cryptography, Theory and Practice. CRC Press, 2003. Third edition. 21. R. Verdult, F.D. Garcia, and J. Balasch. Gone in 360 seconds: Hijacking with Hitag2. In 21st USENIX, pages 237–252, 2012. 22. R. Verdult, F.D. Garcia, and B. Ege. Dismantling megamos crypto: Wirelessly lockpicking a vehicle immobilizer. In 22th USENIX, pages 703–718, 2013. 12