Decryption Through the Likelihood of Frequency of Letters

Barbara Sánchez Rinza brinza@cs.buap.mx Computer Science Benemérita Universidad Autónoma de Puebla

14 Sur y Av. San Claudio 72000 Puebla Pue México

Fernando Zacarias Flores fzflores@yahoo.com.mx Computer Science Benemérita Universidad Autónoma de Puebla

14 Sur y Av. San Claudio 72000 Puebla Pue México

Mauricio Luna Pérez Computer Science Benemérita Universidad Autónoma de Puebla

14 Sur y Av. San Claudio 72000 Puebla Pue México

Marco Antonio Martínez Cortés Computer Science Benemérita Universidad Autónoma de Puebla

14 Sur y Av. San Claudio 72000 Puebla Pue México

Keywords: Probability, Decrypt

Decrypting information by means of probability requires a thorough analysis, because one must know the relative frequency of each letter in the language under study, which here is Spanish. Besides the probabilities of single letters, one can also consider those of syllables, groups of three or four letters, and even whole words. The frequencies observed in the ciphertext are then compared with the frequencies of the language, and ciphertext symbols are replaced by their likely correspondents. Finally, a text analyzer is passed over the result to obtain the decrypted text.

Introduction

Cryptography is the science of altering the linguistic representation of a message [1]. There are different methods for this, the most common being encryption. Encryption masks the original content of the information through a conversion governed by an algorithm that also allows the reverse process, decryption. Using this or other techniques allows an exchange of messages that can be read only by their intended recipients, called 'coherent' recipients. A coherent recipient is the person to whom the sender intends the message to be directed. The coherent recipient therefore knows the method used to mask the message: either they have the means to run the message through the reverse cryptographic process, or they can infer the process that turns it back into a readable message. The original information to be protected is called plaintext or cleartext. Encryption is the process of converting plaintext into unreadable text called ciphertext or cryptogram. In general, the concrete application of the encryption algorithm (also called the cipher) relies on a key: secret information that adapts the encryption algorithm for each different use [2].

Decryption is the reverse process, which recovers the plaintext from the ciphertext and the key. A cryptographic protocol specifies the details of how algorithms and keys (and other primitive operations) are used to achieve the desired effect. The set of protocols, encryption algorithms, key management processes and user actions together constitutes a cryptosystem, which is what the end user works and interacts with. In this work we first need a ciphertext that meets certain requirements: the cipher must be bijective, so that each element of the domain maps to exactly one element of the codomain and vice versa. In addition, Kerckhoffs's rules must be taken into account [3].
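The bijectivity requirement can be illustrated with a small Python sketch. It assumes a simple monoalphabetic substitution over a 27-letter Spanish alphabet; the names `is_bijective_key`, `invert_key`, `substitute` and the shift-by-3 `KEY` are hypothetical and are not taken from the paper.

```python
ALPHABET = "ABCDEFGHIJKLMNÑOPQRSTUVWXYZ"

def is_bijective_key(key, alphabet=ALPHABET):
    """True if `key` maps the alphabet onto itself one-to-one."""
    return set(key.keys()) == set(alphabet) and set(key.values()) == set(alphabet)

def invert_key(key):
    """Swap the direction of the mapping (only meaningful for a bijective key)."""
    return {cipher: plain for plain, cipher in key.items()}

def substitute(text, key):
    """Replace each letter found in `key`; leave every other character as is."""
    return "".join(key.get(ch, ch) for ch in text)

# Hypothetical example key: every letter shifted three positions.
KEY = {p: ALPHABET[(i + 3) % len(ALPHABET)] for i, p in enumerate(ALPHABET)}

if __name__ == "__main__":
    cipher = substitute("HOLA MUNDO", KEY)
    print(cipher, "->", substitute(cipher, invert_key(KEY)))
```

Because the key is bijective, `invert_key` is well defined and decryption recovers the plaintext exactly; if two plaintext letters shared one ciphertext letter, the inverse would be ambiguous.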

Development work

Frequencies in Spanish

Decrypting a text by probability requires knowing how often each letter of the alphabet is used; this work considers only the Spanish language [5].

The Spanish frequency statistics used in this study were:

1. Frequency of trigraphs
2. Frequency of digraphs
3. Most common words
4. Frequency of letters at the beginning of words
5. Frequency of letters in Spanish
6. Frequency of words
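The same kinds of statistics can be gathered from a ciphertext for comparison. The following Python sketch is one possible way to do so; the function names are hypothetical, and two choices are assumptions rather than details from the paper: accented vowels are folded onto their base letter, and letter groups are counted inside words only, not across word boundaries.

```python
from collections import Counter

ALPHABET = set("ABCDEFGHIJKLMNÑOPQRSTUVWXYZ")
# Assumption: accented vowels count as their base letter.
FOLD = str.maketrans("ÁÉÍÓÚÜ", "AEIOUU")

def words_of(text):
    """Uppercase the text and split it into runs of alphabet letters."""
    words, current = [], []
    for ch in text.upper().translate(FOLD):
        if ch in ALPHABET:
            current.append(ch)
        elif current:
            words.append("".join(current))
            current = []
    if current:
        words.append("".join(current))
    return words

def ngram_counts(text, n):
    """Count groups of n letters (n=1 letters, n=2 digraphs, n=3 trigraphs)."""
    counts = Counter()
    for word in words_of(text):
        for i in range(len(word) - n + 1):
            counts[word[i:i + n]] += 1
    return counts

def initial_letter_counts(text):
    """Count the first letter of every word."""
    return Counter(word[0] for word in words_of(text))

def word_counts(text):
    """Count whole words."""
    return Counter(words_of(text))
```

For example, `ngram_counts(ciphertext, 2).most_common(10)` lists the ten most frequent digraphs of a ciphertext, which can then be set against the Spanish digraph statistics.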

Trigraph Frequencies

Letter frequency statistics vary from one source to another, depending on the corpus each author chose to compile them. Differences usually arise according to whether the corpus is literary or consists of texts of different origins. Table 1 shows each letter of the Spanish alphabet with its respective percentage frequency.

Most Frequent words

The vowels make up about 46.38% of the text. The high-frequency letters account for 67.56% of the text. The medium-frequency letters account for 25% of the text [4].
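These three percentages can be checked against the individual values in Table 1. The short Python sketch below assumes the grouping of Table 1 into seven high-frequency and seven medium-frequency letters; the variable names are illustrative only.

```python
# Percentages taken from Table 1.
HIGH = {"E": 16.78, "A": 11.96, "O": 8.69, "L": 8.37,
        "S": 7.88, "N": 7.01, "D": 6.87}
MEDIUM = {"R": 4.94, "U": 4.80, "I": 4.15, "T": 3.31,
          "C": 2.92, "P": 2.76, "M": 2.12}

freq = {**HIGH, **MEDIUM}

vowels = round(sum(freq[v] for v in "AEIOU"), 2)
high = round(sum(HIGH.values()), 2)
medium = round(sum(MEDIUM.values()), 2)

print(vowels, high, medium)  # 46.38 67.56 25.0
```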

In the dictionary the most common vowel is A, but in written texts it is E, because of prepositions, conjunctions, verb forms, etc. The most common consonants are L, S, N and D, which together account for about 30%. The six least frequent letters are V, Ñ, J, Z, X and K (together just over 1%). The average length of a Spanish word is 5.9 letters. The index of coincidence for Spanish is 0.0775. To help solve the cipher, Table 2 lists the most frequently used words in a text of 10,000 words. Table 3 then shows the frequencies of four-letter words.
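The index of coincidence mentioned above is commonly computed as the probability that two letters drawn at random from a text are equal, that is, the sum of n(n-1) over all letter counts n divided by N(N-1) for a text of N letters. A minimal Python sketch, with a hypothetical function name and assuming every alphabetic character counts as a letter:

```python
from collections import Counter

def index_of_coincidence(text):
    """Probability that two letters picked at random from `text` coincide."""
    letters = [ch for ch in text.upper() if ch.isalpha()]
    total = len(letters)
    if total < 2:
        return 0.0
    counts = Counter(letters)
    return sum(c * (c - 1) for c in counts.values()) / (total * (total - 1))
```

A bijective letter-for-letter substitution only renames the letters, so it leaves this value unchanged. A long ciphertext whose index is close to 0.0775 is therefore consistent with Spanish plaintext under such a cipher.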

Most common words

Frequency digraphs

The size of the corpus is 60,115 letters. The frequencies are absolute. The digraphs are read by row and column, in that order. Table 4 shows the digraphs, that is, the pairings of one letter with another.

Most common initial letter

The most frequent letters in Spanish that start a word are listed in Table 5.

Results

As stated above, the ciphertext used had to come from a bijective cipher and comply with Kerckhoffs's rules. The ciphertext and the decrypted text are shown in Figure 1.
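A first pass based on single-letter frequencies could look like the following Python sketch. It is an illustration under stated assumptions, not the authors' implementation: ciphertext letters are ranked by how often they occur and paired with the Spanish letters in the order of their percentages in Table 1. The names `SPANISH_ORDER`, `first_pass_key` and `apply_key` are hypothetical, and letters without a listed percentage are left unresolved.

```python
from collections import Counter

# Spanish letters from most to least frequent, following the percentages
# in Table 1 (letters below 0.5% are not ranked there and are omitted).
SPANISH_ORDER = "EAOLSNDRUITCPMYQBH"

def first_pass_key(ciphertext):
    """Pair ciphertext letters, ranked by frequency, with SPANISH_ORDER."""
    counts = Counter(ch for ch in ciphertext.upper() if ch.isalpha())
    # Ties are broken alphabetically so the result is deterministic.
    ranked = sorted(counts, key=lambda ch: (-counts[ch], ch))
    return dict(zip(ranked, SPANISH_ORDER))

def apply_key(ciphertext, key, unknown="_"):
    """Replace known letters; mark letters with no guess yet as `unknown`."""
    return "".join(
        key.get(ch, unknown) if ch.isalpha() else ch
        for ch in ciphertext.upper()
    )
```

On short texts the observed ranking rarely matches the language ranking exactly, which is why later passes with syllables and words are needed to correct the initial guesses.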

Four-letter words

Distribution of letters in literary texts: E 16.78%, R 4.94%, Y 1.54%, J 0.30%, A 11.96%, U 4.80%, Q -

Word frequency: PARA 67

Conclusions

We conclude that this decryption method works well, although it needs further tuning, because its results depend on the particular text and on how much ciphertext is available. We also observed that it only decrypts text produced by a bijective cipher. In this work, as the results in Figure 1 show, several processes are applied: first the probabilities of the most frequent letters in Spanish are used, then those of the most frequent syllables, then those of words, and finally the result is passed through a text analyzer. As Figure 1 shows, a large percentage of the information is decoded but, as mentioned above, this depends on having enough text to process.
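When the original plaintext is available for evaluation, the share of the information that has been decoded after each pass can be quantified. The Python sketch below is one hypothetical way to measure it, assuming the decrypted and original texts are aligned character by character; the paper does not state how its percentage was obtained.

```python
def decoded_percentage(decrypted, original):
    """Percentage of letters of `original` recovered at the same position."""
    pairs = [
        (d, o)
        for d, o in zip(decrypted.upper(), original.upper())
        if o.isalpha()
    ]
    if not pairs:
        return 0.0
    hits = sum(1 for d, o in pairs if d == o)
    return 100.0 * hits / len(pairs)
```

Comparing this value after the first and second passes shows how much each additional statistic contributes for a given amount of ciphertext.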

Fig. 1. Each of the texts worked with: 01 encrypted text, 02 text after the first pass, 03 text after the second pass, and the original decrypted text

Table 1. Frequency of letters in Spanish

High frequency     Medium frequency   Low frequency      Frequencies below 0.5%
Letter  Freq. %    Letter  Freq. %    Letter  Freq. %    Letters
E       16.78      R       4.94       Y       1.54       G, F, V, W,
A       11.96      U       4.80       Q       1.53       J, Z, X, K, Ñ
O       8.69       I       4.15       B       0.92
L       8.37       T       3.31       H       0.89
S       7.88       C       2.92
N       7.01       P       2.76
D       6.87       M       2.12

Table 2. Most frequent words of one, two and three letters

Most frequent words    Two-letter words     Three-letter words
Word   Frequency       Word   Frequency     Word   Frequency
DE     778             DE     778           QUE    289
LA     460             LA     460           LOS    196
EL     339             EL     339           DEL    156
EN     302             EN     302           LAS    114
QUE    289             SE     119           POR    110
Y      226             -      98            CON    82
A      213             -      74            UNA    78
LOS    196             -      64            MAS    36
DEL    156             -      63            SUS    27
SE     119             -      47            HAN    19
LAS    114

Table 3. Frequency of four-letter words

Table 4. Frequency of digraphs (row: first letter; columns A to M: second letter)

   A B C D E F G H I J K L M
A  12 14 54 64 15 5 8 4 10 8 41 30
B  11514 1 12
C  395178 803
D  321 2 841 30
E  20 5 47 26 17 8 21 6 9 3 44 26
F  29121
G  121251
H  1535
I  43 8 42 29 40 5 81 14 16
J  45
K  1
L  445 5 35 1 3289 5
M  32 104230
N  41 2 33 37 41 10 6 2 28 15 4
O  19 17 28 26 16 6 5 5 4 1 22 33
P  3011658
Q
R  74 1 12 10 94 1 12 45 1 1 6 15
S  32 2 18 15 57 3 2 4 41 15 7
T  6016735
U  13 6 11 5 52 1 399 6
V  121 1515
W  11
X  14
Y  5 1 3 2 5 1 11 1

Table 5. Frequency of initial letters

Letter     P      C      D      E    S    A    L    R    M    N    T
Frequency  1,128  1,081  1,012  989  789  761  435  425  403  346  298

Letter     Q    I    H    U    G    V    F    O    B    J   Y   W   Z  K
Frequency  286  281  230  219  206  183  177  169  124  47  27  19  2  1

Liddell and Scott's Greek-English Lexicon. Oxford University Press, 1984.

Códigos y claves secretas: programas en Basic, basado a su vez en un estudio lexicográfico del diario El País. Anaya Multimedia, Mexico, 1986.

William F. Friedman and Lambros D. Callimahos. Military Cryptanalytics, Part I, Vol. 2. Cryptographic Series, 1962. Aegean Park Press, Laguna Hills, CA, 1985.

Wayne G. Barker. Cryptograms in Spanish. Aegean Park Press, Laguna Hills, CA.