Single-table substitution cipher principle and algorithm implementation

Source: Internet
Author: User

To understand the single-table substitution cipher, you first need to understand substitution ciphers in general, so here is a brief introduction:
Substitution is one of the most basic techniques used in classical cryptography.
A substitution cipher first establishes a substitution table. To encrypt, each plaintext character is looked up in the table and replaced by its corresponding character. Replacing the plaintext characters one after another produces a meaningless string, the ciphertext. The key of a substitution cipher is its substitution table.
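The table lookup described above can be sketched as follows. This is a minimal sketch, assuming lowercase ASCII letters and a key given as a 26-letter permutation; the names `Table`, `tableFromString`, `substitute`, and `invert` are illustrative and not taken from the original program. Because the key is the table itself, decryption simply uses the inverse table.

```cpp
#include <array>
#include <cassert>
#include <string>

// table[i] is the ciphertext letter for plaintext letter 'a' + i.
using Table = std::array<char, 26>;

// Hypothetical helper: build a table from a 26-letter key string.
Table tableFromString(const std::string& key) {
    assert(key.size() == 26);
    Table t{};
    for (int i = 0; i < 26; ++i) t[i] = key[i];
    return t;
}

// Inverse table for decryption: inverse[c - 'a'] is the plaintext letter
// that encrypts to c. Requires the table to be a permutation of 'a'..'z'.
Table invert(const Table& table) {
    Table inverse{};
    for (int i = 0; i < 26; ++i) {
        assert(table[i] >= 'a' && table[i] <= 'z');
        inverse[table[i] - 'a'] = static_cast<char>('a' + i);
    }
    return inverse;
}

// Replace every lowercase letter by its table entry; anything that is not
// a lowercase ASCII letter (spaces, punctuation) is copied unchanged.
std::string substitute(const std::string& text, const Table& table) {
    std::string out = text;
    for (char& ch : out) {
        if (ch >= 'a' && ch <= 'z') ch = table[ch - 'a'];
    }
    return out;
}
```

Encryption is `substitute(plaintext, t)` and decryption is `substitute(ciphertext, invert(t))`; the same function serves both directions because only the table changes.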
Depending on whether encryption and decryption use one substitution table or several, substitution ciphers are divided into single-table (monoalphabetic) substitution ciphers and multi-table (polyalphabetic) substitution ciphers.
A single-table substitution cipher encrypts with one fixed substitution table. Single-table substitution ciphers include the general single-table substitution cipher, the shift cipher, the affine cipher, and the key phrase (keyword) cipher.
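The shift, affine, and key phrase ciphers differ only in how the fixed table is generated. The sketch below assumes letters numbered a = 0 through z = 25 and the usual textbook formulas e(x) = (x + k) mod 26 and e(x) = (a·x + b) mod 26. For the key phrase cipher it uses one common construction (distinct key letters first, then the unused letters in alphabetical order); some textbooks order the remaining letters differently. The function names are illustrative.

```cpp
#include <array>
#include <cassert>
#include <string>

using Table = std::array<char, 26>;  // table[i] = ciphertext letter for 'a' + i

// Shift cipher: plaintext letter x maps to (x + k) mod 26.
Table shiftTable(int k) {
    Table t{};
    for (int x = 0; x < 26; ++x)
        t[x] = static_cast<char>('a' + ((x + k) % 26 + 26) % 26);
    return t;
}

// Affine cipher: x maps to (a*x + b) mod 26. The table is a permutation
// only when gcd(a, 26) == 1, i.e. a is odd and not a multiple of 13.
Table affineTable(int a, int b) {
    assert(a % 2 != 0 && a % 13 != 0);
    Table t{};
    for (int x = 0; x < 26; ++x)
        t[x] = static_cast<char>('a' + ((a * x + b) % 26 + 26) % 26);
    return t;
}

// Key phrase cipher (one common variant): distinct lowercase letters of the
// phrase in order of appearance, followed by the unused letters a..z.
Table keywordTable(const std::string& phrase) {
    Table t{};
    std::array<bool, 26> used{};
    int n = 0;
    auto add = [&](char ch) {
        if (ch >= 'a' && ch <= 'z' && !used[ch - 'a']) {
            used[ch - 'a'] = true;
            t[n++] = ch;
        }
    };
    for (char ch : phrase) add(ch);          // key letters first, repeats skipped
    for (char ch = 'a'; ch <= 'z'; ++ch) add(ch);  // then the rest of the alphabet
    return t;
}
```

For example, `keywordTable("key phrase")` starts `keyphras` and continues with `bcdfg...`; the resulting table is then used for encryption by ordinary lookup.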
The following introduces a direct attack on single-table substitution ciphers.

For a natural language, if you take a sufficiently long text that is not from a specialized book, you will find that the frequencies of the letters (or characters) reflect the statistical characteristics of that language. Statistics over a large body of text give the probability with which each letter appears in the language, so we obtain a probability distribution on the language's alphabet. Treating English plaintext as generated letter by letter through independent trials, Beker (1982) counted a total sample of 100,362 letters and obtained the single-letter probability distribution shown in the following table:

According to the table above, the English letters arranged in decreasing order of probability are:

E T A O I N S H R D L C U M W F G Y P B V K J X Q Z
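Producing such a ranking from a sample of text is mechanical. A minimal sketch, assuming ASCII text and ignoring case; `letterCounts`, `letterProbability`, and `frequencyOrder` are illustrative names, and this does not reproduce Beker's sampling method.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <string>

// Count each letter, folding 'A'..'Z' onto 'a'..'z' and skipping every
// other character (spaces, digits, punctuation).
std::array<int, 26> letterCounts(const std::string& text) {
    std::array<int, 26> counts{};
    for (char ch : text) {
        if (ch >= 'a' && ch <= 'z') ++counts[ch - 'a'];
        else if (ch >= 'A' && ch <= 'Z') ++counts[ch - 'A'];
    }
    return counts;
}

// Relative frequency of letter i: its count divided by the number of letters.
double letterProbability(const std::array<int, 26>& counts, int i) {
    int total = 0;
    for (int n : counts) total += n;
    return total == 0 ? 0.0 : static_cast<double>(counts[i]) / total;
}

// Letters ordered by decreasing count (ties kept in alphabetical order),
// i.e. the sample's own version of "E T A O I N S H R ...".
std::string frequencyOrder(const std::array<int, 26>& counts) {
    std::string letters = "abcdefghijklmnopqrstuvwxyz";
    std::stable_sort(letters.begin(), letters.end(), [&](char x, char y) {
        return counts[x - 'a'] > counts[y - 'a'];
    });
    return letters;
}
```

On a short sample the order is noisy; it only approaches the table above as the text gets long, which is the point made in the next paragraphs.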

In the table above, many letters have nearly equal probabilities. For convenience, the English letters are often grouped by probability as follows:

--------------------------
Very high frequency   E
High frequency        T A O I N S H R
Medium frequency      D L
Low frequency         C U M W F G Y P B
Very low frequency    V K J X Q Z
--------------------------
Other languages and data also have single-letter statistical characteristics similar to those of English. If we count the letters of any sufficiently long English text whose content is not too unusual, the result will be essentially the same as the table above. This shows that the probability of each letter appearing in an English text is basically predictable, which provides a basis for cryptanalysis.
However, the single-letter statistics of a language fail to reflect the characteristics of English in at least the following two respects:
⑴ From the single-letter statistics of English, the probability of the two-letter combination QE would be computed as P(QE) = P(Q) × P(E) = 0.00095 × 0.12702 ≈ 1.21 × 10^-4.
That is, QE should appear about 121 times in 10^6 two-letter samples, which does not match actual English text, where QE essentially never appears.
⑵ The four-letter strings send and sedn have equal probability under the single-letter statistics, which is also inconsistent with actual English. In short, the single-letter statistics of a natural language reflect only the probability of each individual letter, not the correlation between neighboring letters. To capture the two-letter statistics of a natural language, we need to examine how often adjacent letter pairs occur. Treating English as a sequence of two-letter groups produced by independent trials, Beker (1982) gave the two-letter (digram) frequencies of English. The 30 most probable digrams, in decreasing order of probability, are:

th he in er an re ed on es st en at to nt ha nd ou ea ng as or ti is et it ar te se hi of
If we count any sufficiently long English text whose content is not too unusual, the result will be essentially the same as the probabilities above. This shows that the probabilities of two-letter combinations in English text are also basically predictable, which provides another basis for cryptanalysis.
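The send/sedn argument in ⑵ can be written out explicitly. Assume, for illustration, that the single-letter model treats letters as independent, while a digram model conditions each letter on the one before it (a first-order Markov assumption that is not stated in the original text):

```latex
\begin{aligned}
\text{single-letter model:}\quad
P(\mathrm{send}) &= P(s)\,P(e)\,P(n)\,P(d) = P(s)\,P(e)\,P(d)\,P(n) = P(\mathrm{sedn}),\\[4pt]
\text{digram model:}\quad
P(\mathrm{send}) &= P(s)\,P(e \mid s)\,P(n \mid e)\,P(d \mid n),\\
P(\mathrm{sedn}) &= P(s)\,P(e \mid s)\,P(d \mid e)\,P(n \mid d),
\qquad P(y \mid x) = \frac{P(xy)}{P(x)}.
\end{aligned}
```

Under the digram model the two strings differ only in the factors P(n|e)P(d|n) versus P(d|e)P(n|d). Since nd is among the 30 most frequent digrams listed above while dn is not, this is where the digram model can tell the two strings apart, something the product of single-letter probabilities never can.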

Similarly, we can examine the frequency of three-letter groups (trigrams) in English text. Again according to Beker's 1982 statistics (total sample 100,360), the 20 most probable trigrams, in decreasing order of probability, are:

the ing and her ere ent tha nth was eth for dth hat she ion his sth ers ver
In particular, the frequency of "the" is almost three times that of "ing".
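Digram and trigram counts can be gathered the same way as single letters, by sliding a window of n letters over the text. In this sketch `ngramCounts` and `topNgrams` are illustrative names, and n-grams are counted only inside words. Whether Beker's counts crossed word boundaries is not stated here, so treat that as an assumption you may want to change.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Count n-grams (n adjacent letters) inside each word. Letters are
// lowercased; any non-letter ends the current word, so an n-gram never
// spans a space or punctuation mark.
std::map<std::string, int> ngramCounts(const std::string& text, std::size_t n) {
    assert(n > 0);
    std::map<std::string, int> counts;
    std::string word;  // current run of letters
    auto flush = [&]() {
        for (std::size_t i = 0; i + n <= word.size(); ++i) ++counts[word.substr(i, n)];
        word.clear();
    };
    for (char ch : text) {
        if (ch >= 'a' && ch <= 'z') word += ch;
        else if (ch >= 'A' && ch <= 'Z') word += static_cast<char>(ch - 'A' + 'a');
        else flush();
    }
    flush();  // the last word
    return counts;
}

// The k most frequent n-grams, highest count first (ties stay alphabetical).
std::vector<std::pair<std::string, int>> topNgrams(const std::map<std::string, int>& counts,
                                                  std::size_t k) {
    std::vector<std::pair<std::string, int>> items(counts.begin(), counts.end());
    std::stable_sort(items.begin(), items.end(),
                     [](const std::pair<std::string, int>& a,
                        const std::pair<std::string, int>& b) { return a.second > b.second; });
    if (items.size() > k) items.resize(k);
    return items;
}
```

Calling it with n = 2 gives a digram list comparable to the one above, and n = 3 gives trigrams; in a ciphertext, the most frequent trigram is a natural first candidate for "the".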
It should be emphasized that statistical analysis requires a large enough amount of ciphertext; otherwise the attack becomes more difficult. In actual communication, characters other than letters, such as punctuation and digits, also appear, and their statistical characteristics must be taken into account too. Data formats and header information are important to the security of a cipher and also play an important role in cryptanalysis.
The following statistical features of English are helpful when analyzing or attacking a ciphertext:
⑴ The article "the" has a great influence on English statistics, making t, h, th, he, and the high-probability elements in the single-letter, two-letter, and three-letter statistics.
⑵ More than half of English words end in e, s, d, or t.
⑶ About half of English words begin with t, a, s, or w.
⑷ ① When analyzing or attacking a ciphertext, first look for a breakthrough point; generally, start with a and I. (The only letters that can stand alone as a word are a and I; since a is more frequent, try it first.)
② If an apostrophe (') is followed by a single letter, that letter can only be one of t, s, l, d, v; if it is followed by two letters, they can only be re or ll (re if the two letters differ, ll if they are the same).
③ Other rules: if a four-letter word ends in two identical letters, they are almost always ll;
A three-letter word starting with a is most often and;
If q appears in a two-letter combination q*, then * is u (q is almost always followed by u);
If a word begins and ends with the same letter, that letter is most likely s, t, or d;
Two-consonant combinations often contain n or t; io, ou, and ea are the most common two-vowel combinations;
If the first two letters of a word are consonants, the second letter is most likely r, l, or h;
If a word ends in three consonants, the most common endings are -ght and -tch;
Two-letter combinations that are common in both orders: er-re, es-se, an-na, it-ti, on-no, en-ne, ot-to, ed-de, st-ts, at-ta, ar-ra, in-ni;
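Several of the rules above (a four-letter word ending in a double letter, a word that begins and ends with the same letter) are about where letters repeat inside a word, and a single-table substitution preserves that repetition structure exactly. A common way to exploit this is to compare word patterns. A minimal sketch, assuming lowercase words; `wordPattern` and `samePattern` are illustrative names.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Word pattern: replace each letter by the index of its first occurrence,
// so "letting" -> "0.1.2.2.3.4.5". A ciphertext word can only decrypt to a
// plaintext word with the same pattern under a single-table substitution.
std::string wordPattern(const std::string& word) {
    std::string seen;     // distinct letters in order of first appearance
    std::string pattern;
    for (char ch : word) {
        std::size_t pos = seen.find(ch);
        if (pos == std::string::npos) {
            pos = seen.size();
            seen += ch;
        }
        if (!pattern.empty()) pattern += '.';
        pattern += std::to_string(pos);
    }
    return pattern;
}

bool samePattern(const std::string& a, const std::string& b) {
    return wordPattern(a) == wordPattern(b);
}
```

For instance, the rule about four-letter words ending in a double letter is a statement about the pattern `0.1.2.2`, and a candidate plaintext word can be rejected immediately when its pattern differs from the ciphertext word's.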

Most frequently used short words
1-letter: a, i, o
2-letter: of, to, in, is, it, be, as, at
3-letter: the, and, for, are, but, not, you, all
4-letter: that, with, have, this, will, your, from, they
5-letter: which, would, there

Vowels: a, e, i, o, u
The most common letter after a vowel is n; the most common letter before a vowel is h.

Common prefixes: re, co, un, com, il, ir, up
Most common three-letter suffixes: ion, ing
Most common endings: ed, es, er

Examples
1. Attack example:
Given the ciphertext: grox cmryyvlyic coxn combod wocckqoc dy okmr ydrob droi ypdox sxfoxd k myno li voddsxq okmr voddob yp dro kvzrklod bozbocoxd kxydrob yxo

The first step of the attack is to build a frequency distribution table of the letters that appear in the ciphertext.

The second step is to use the frequency statistics of the ciphertext letters, together with the single-letter frequency table, to guess which plaintext letters some of the ciphertext letters correspond to. In this example:
Ciphertext letter:   o  d  b  v
Plaintext letter:    e  t  r  l
The third step is to make further guesses using the rules of the natural language. d often appears at the beginning or end of words, so guess that it corresponds to t; o often appears at the end of words, so guess that it corresponds to plaintext e; k appears alone as a word and fairly frequently, so guess that it corresponds to a.
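While the key is being guessed, it helps to print the ciphertext with only the letters guessed so far filled in. A minimal sketch, assuming lowercase ciphertext; `PartialKey`, `guess`, and `partialDecrypt` are hypothetical names, and '.' marks letters that are still unknown.

```cpp
#include <array>
#include <cassert>
#include <string>

// key[c - 'a'] is the plaintext letter guessed for ciphertext letter c,
// or 0 if no guess has been made yet.
using PartialKey = std::array<char, 26>;

// Record the guess "ciphertext letter c corresponds to plaintext letter p".
void guess(PartialKey& key, char c, char p) {
    assert(c >= 'a' && c <= 'z');
    key[c - 'a'] = p;
}

// Decrypt with the guesses made so far: unknown letters become '.', and
// spaces or other non-letters are kept so word boundaries stay visible.
std::string partialDecrypt(const std::string& ciphertext, const PartialKey& key) {
    std::string out;
    for (char ch : ciphertext) {
        if (ch >= 'a' && ch <= 'z') out += key[ch - 'a'] ? key[ch - 'a'] : '.';
        else out += ch;
    }
    return out;
}
```

With the three guesses from this step (d to t, o to e, k to a), the fragment "dro k myno" is shown as "t.e a ...e", which already suggests "the" for dro.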
Using two-letter and three-letter statistics together with the spelling patterns of vowels and consonants, the following can be guessed:

It is then not difficult to guess that grox is when, okmr is each, voddsxq is letting, kxydrob is another, and dy is to.
This gives the following table:

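Each guessed word pair fixes several letters at once, and all guesses must agree: a single-table substitution maps each ciphertext letter to exactly one plaintext letter and vice versa. A sketch of that consistency check, with `mergeCribs` as a hypothetical name and lowercase input assumed:

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Merge guessed (ciphertext word, plaintext word) pairs into one key.
// Returns false on any contradiction. On success, key[c - 'a'] holds the
// plaintext letter for ciphertext letter c (0 where still unknown).
bool mergeCribs(const std::vector<std::pair<std::string, std::string>>& cribs,
                std::array<char, 26>& key) {
    key.fill(0);
    std::array<char, 26> backward{};  // plaintext letter -> ciphertext letter
    for (const auto& crib : cribs) {
        const std::string& cword = crib.first;
        const std::string& pword = crib.second;
        if (cword.size() != pword.size()) return false;
        for (std::size_t i = 0; i < cword.size(); ++i) {
            char c = cword[i], p = pword[i];
            assert(c >= 'a' && c <= 'z' && p >= 'a' && p <= 'z');
            char& fwd = key[c - 'a'];
            char& back = backward[p - 'a'];
            if ((fwd && fwd != p) || (back && back != c)) return false;  // conflict
            fwd = p;
            back = c;
        }
    }
    return true;
}
```

With the five guesses above (grox/when, okmr/each, voddsxq/letting, kxydrob/another, dy/to) the merge succeeds. Every recovered pair also satisfies c = p + 10 (mod 26), for example w to g and e to o, so this particular ciphertext turns out to be a shift cipher with key 10.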
Further analysis determines the plaintext letters corresponding to c, w, i, n, f, q, and z. The restored plaintext is as follows:
when schoolboys send secret messages to each other they often invent a code by letting each letter of the alphabet represent another one.

Algorithm implementation
#include <iostream>
#include <fstream>
#include <cstdlib>
using namespace std;

// Substitution table: c[i] is the ciphertext letter for plaintext letter 'a' + i.
// c[26] is the space written for every character that is not a lowercase letter.
const char c[27] = {'d','j','k','z','u',
                    'x','c','m','l','i','w','b','v','n','o','p',
                    'q','a','r','s','g','h',
                    'f','t','y','e',' '};

void encryption(ifstream& fin, ofstream& fout);
void decryption(ifstream& fin, ofstream& fout);

int main()
{
    ifstream fin;
    ofstream fout;
    fin.open("file1_1.in");
    if (fin.fail())
    {
        cout << "File open error! (Input)" << endl;
        exit(1);
    }
    fout.open("file1_1.out");
    if (fout.fail())
    {
        cout << "File open error! (Output)" << endl;
        exit(1);                        // stop if the output file cannot be created
    }
    encryption(fin, fout);
    fin.close();
    fout.close();
    return 0;
}

void encryption(ifstream& fin, ofstream& fout)   // encryption process
{
    char next;
    while (fin.get(next))               // read every character, including spaces
    {
        if (next >= 'a' && next <= 'z')
        {
            int i = next - 'a';         // position of the plaintext letter
            fout << c[i];               // write its substitute from the table
        }
        else
        {
            fout << ' ';                // any other character becomes a space
        }
    }
}

void decryption(ifstream& fin, ofstream& fout)   // decryption process
{
    char ch;
    while (fin.get(ch))
    {
        for (int i = 0; i <= 26; i++)   // search the table for the ciphertext character
        {
            if (ch == c[i])
            {
                if (i == 26)
                    fout << ' ';
                else
                    fout << char('a' + i);   // the table index gives back the plaintext letter
                break;                        // table entries are distinct
            }
        }                                     // characters not in the table (e.g. '\n') are skipped
    }
}

Input file file1_1.in

// file1_1.in
we will attack tomorrow morning

Output file file1_1.out

// file1_1.out
fu flbb dssdkw sovoaaof voanlnc
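To check the table without creating files, the same lookup rules can be applied to strings. A minimal sketch: `encryptString` and `decryptString` are hypothetical helpers, not part of the original program; they mirror the program's encryption and decryption functions and reuse its table `c`.

```cpp
#include <cassert>
#include <string>

// The program's table: c[i] is the ciphertext letter for 'a' + i,
// and c[26] is the space that replaces every non-lowercase character.
const char c[27] = {'d','j','k','z','u','x','c','m','l','i','w','b','v',
                    'n','o','p','q','a','r','s','g','h','f','t','y','e',' '};

// Same rule as the file-based encryption: look up lowercase letters,
// write a space for anything else (apostrophes, capitals, newlines).
std::string encryptString(const std::string& in) {
    std::string out;
    for (char next : in) out += (next >= 'a' && next <= 'z') ? c[next - 'a'] : ' ';
    return out;
}

// Same rule as the file-based decryption: find the character in the table
// and turn its index back into a letter; unknown characters are dropped.
std::string decryptString(const std::string& in) {
    std::string out;
    for (char ch : in) {
        for (int i = 0; i <= 26; ++i) {
            if (ch == c[i]) {
                out += (i == 26) ? ' ' : static_cast<char>('a' + i);
                break;
            }
        }
    }
    return out;
}
```

Note that the scheme is lossy: an input such as "we'll" encrypts to "fu bb" because the apostrophe becomes a space, and the newline at the end of the input file likewise appears as a trailing space in the output file.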
