How to prevent algorithm inversion

If you are unlucky enough to face an opponent who understands the Win32 application environment well, your software will sooner or later be torn apart by a debugger. But you are far from defeated. Even if anti-debugging (ANTI-DEBUG), the first line of defense in software protection, has fallen, your opponent has only obtained a large amount of assembly code, and there is still a long way from code to algorithm. So you have a second line of defense: anti-analysis. On this line there are a number of ways to limit a cracker's ability to master your registration algorithm, and so prevent keygens or crack patches from appearing.

I. Preface

The purpose of software protection is to provide full functionality only to legitimate users, so software protection must include the verification of user legitimacy, which is usually implemented by means of registration code verification.

(1) The user submits a user code U to the software author and applies for registration.

(2) The software author computes the registration code R = f(U) and sends it back to the legitimate user.

(3) The user enters U and R in the software's registration interface.

(4) The software checks whether F(U, R) takes a legal value to decide whether the user is legitimate.

Some of the common terms are described below:

(1) User code U: identifies the user. It may simply be a user name chosen by the user. This kind of user code is only weakly tied to the user's identity: if one legitimate pair of user code and registration code is leaked, anyone can use it to register the software. It may also be derived from the hardware characteristics of the user's machine. This kind is strongly tied to the user's identity and effectively prevents one registration from being shared by many people, but it is inconvenient for legitimate users: once the machine is replaced or upgraded, they must apply for a new registration code.

(2) Registration code R: verifies the user's identity. It may correspond one-to-one with the user code, or one user code may have several registration codes, or one registration code may match several user codes. If the value spaces of the user code and registration code are very large, then even without a one-to-one correspondence the probability that an illegal user "happens" to hit a legitimate pair is very small.

(3) Keygen: we call the lowercase f in R = f(U) the keygen (registration machine). A keygen can compute the matching registration code for any user code.

(4) Validation function: we call the uppercase F in F(U, R) the validation function. The software uses it to check the registration code: F(U, R) takes the legal value if and only if R = f(U) holds.

(5) Algorithm inversion: the process by which a cracker derives the keygen f from the validation function F is called algorithm inversion. This is why the construction of F is so critical.

In the "primary phase" of software registration protection there is no essential difference between the validation function and the keygen, namely F(U, R) = f(U) - R. This is dangerous: the validation function itself contains the keygen. The cracker only needs to trace the running software and copy the assembly code that computes f(U) out of the validation function to use it as a keygen, without even needing to understand how f works.

The improved approach is to first find the inverse function f^-1 of f, so that U = f^-1(R), and then let F(U, R) = f^-1(R) - U. This is much safer: the software no longer contains the keygen f, and the cracker must fully understand the f^-1 algorithm before he can derive f from it. The reader may be puzzled: if the cracker first picks an R and then uses the validation function directly to compute U = f^-1(R), he can register with that U and R, so why must he derive f? In practice U and R are usually given as strings while the validation function usually works on numbers, so U and R are first converted to numeric forms U' and R'. The keygen f and its inverse f^-1 are therefore really composite functions, and because the validation function only needs to test the legality of U' and R', it need not be exactly equal to f^-1.

Assume:		U' = f1(U), R' = f2(U'), R = f3(R')
Keygen f:	R = f(U) = f3(f2(f1(U)))
Inverse f^-1:	U = f^-1(R) = f1^-1(f2^-1(f3^-1(R)))
Validation F:	F(U, R) = f2^-1(f3^-1(R)) - f1(U)

It may seem that deriving f^-1 from F(U, R) = f2^-1(f3^-1(R)) - f1(U) is easier than deriving f. The key point is that f1 is usually a transformation built on the ASCII table, so even a successfully derived f^-1 is of little value:

(1) Only 95 of the 256 possible byte values are printable ASCII characters that can be typed on a keyboard. If U is 8 characters or longer and R is chosen at random, the probability that f^-1(R) yields a "usable" U is at most about (95/256)^8, roughly 0.00036.

(2) U may also be restricted to a specific format, such as an e-mail address, in which case using f^-1 directly is almost equivalent to exhaustive search.

(3) The user is not necessarily free to choose U. If U is a machine fingerprint, for example, f^-1 has no direct use at all.

Of course, just as the cracker runs into characters that "cannot be displayed or typed" when converting U' back to U, the software author meets the same trouble when converting R' to R in the keygen. R can usually be written directly as a number. Another common method is the look-up table: for example, converting 19760510 to base 36 gives a five-digit number with digits 11 27 19 11 2, and if the base-36 digits are taken from 0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ, then 19760510 becomes "BRJB2".
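The look-up table conversion can be sketched in a few lines of Python. This is only an illustration: the function name to_base36 is an assumption and does not come from the original article, while the digit table is the one given in the text.

```python
DIGITS = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"  # look-up table from the text

def to_base36(value):
    """Convert a non-negative integer to a base-36 string via table look-up."""
    if value < 0:
        raise ValueError("value must be non-negative")
    if value == 0:
        return DIGITS[0]
    out = []
    while value > 0:
        value, digit = divmod(value, 36)  # peel off the lowest base-36 digit
        out.append(DIGITS[digit])
    return "".join(reversed(out))         # digits were produced lowest first

print(to_base36(19760510))  # BRJB2
```

Going the other way needs no table at all in Python: int("BRJB2", 36) gives back 19760510.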

These are the basic concepts of software registration algorithms. To keep crackers from inverting the registration algorithm, special measures are also needed:

1. Cleverly construct f and f^-1 so that computing f from f^-1 is infeasible.

2. Cleverly construct F so that F has no direct visible relation to f or f^-1.

Below we explore three tactics, "fortress tactics", "guerrilla warfare" and "trap tactics", for achieving these two goals.

II. Fortress tactics

In the field of communications, research on authentication started very early and has produced excellent cryptographic algorithms such as hashing and asymmetric encryption. Among them, the MD5 and RSA algorithms are well suited to software registration algorithms.

The principle of the MD5 hash algorithm is:

(1) Define four 32-bit constants: A=0x67452301, B=0xefcdab89, C=0x98badcfe, D=0x10325476 (stored low byte first, these are the byte sequences 01 23 45 67, 89 ab cd ef, fe dc ba 98, 76 54 32 10).

(2) Pad the plaintext according to fixed rules and split it into blocks made up of 32-bit words (sixteen words, or 512 bits, per block); let the number of blocks be n.

(3) Run A, B, C, D through nonlinear transformations against each block in turn, n blocks in all; the final result is A', B', C', D'.

(4) Concatenate A', B', C', D' into a 128-bit message digest.

The MD5 algorithm is characterized by:

(1) Messages of any length can eventually be transformed into a fixed-length (128-bit) hash value called the message digest.

(2) The same message always yields the same digest. Two different messages yielding the same digest is called a "collision". Collisions must exist, since infinitely many messages map onto 128 bits, and practical collisions have been published since 2004, so MD5 should not be relied on where collision resistance matters.

(3) A message cannot be recovered from its message digest.

MD5 is not normally used to encrypt messages directly: since it has no inverse, what it "encrypts" can never be decrypted, and such encryption has no practical value. Its purpose is message authentication, as in digital signatures. For example, suppose a sender wants to pass a message to a receiver. Encrypting the plaintext a into the ciphertext b is not enough: however strong the encryption, it only prevents the content from being read in transit and does not stop an attacker from tampering with b itself, so after decrypting, the receiver cannot be sure the content is authentic. The sender therefore also computes c = MD5(a) and sends both b and c. The receiver decrypts b to obtain a, computes MD5(a) in the same way, and if the result equals the c that was received, a can be accepted as genuine.

For the same reason MD5 is not suitable for direct use as a keygen. If R = MD5(U) were the keygen, then because MD5 has no inverse, the validation function would have to contain the keygen itself. Instead, MD5 can be used as follows:

(1) Let the keygen be f: R = f(U).

(2) Choose a constant a and let MD5(a) = b.

(3) Let the validation function be F: F(U, R) = MD5[f^-1(R) - U + a].

Clearly the legal value of F is b: as long as U and R satisfy R = f(U), F must equal b. Because MD5 is irreversible, the cracker cannot learn from b that f^-1(R) - U + a should equal a, so he cannot obtain the exact expression of f^-1, let alone the keygen f. And although the expression f^-1(R) - U + a contains a, the cracker cannot tell which part is a and which is f^-1. For example, if U = f^-1(R) = R * 5 + 19 and a = 7, then F(U, R) = MD5(R * 5 + 26 - U): a and f^-1 have merged, so how is the cracker to separate them?
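The example can be tried out with Python's standard hashlib module. This is a minimal sketch, not the author's implementation: the function names are illustrative, and hashing the integer through its decimal string form is an assumption, since the article does not say how the number is fed to MD5.

```python
import hashlib

A = 7  # the secret constant a; in real software only b would be embedded

def _digest(x):
    # Assumption: integers are hashed via their decimal string form.
    return hashlib.md5(str(x).encode("ascii")).hexdigest()

B = _digest(A)  # b = MD5(a); the software would ship this as a literal

def keygen(u):
    """Author side: R = f(U), the inverse of U = R * 5 + 19."""
    if (u - 19) % 5 != 0:
        raise ValueError("this toy f is only defined when U - 19 is a multiple of 5")
    return (u - 19) // 5

def validate(u, r):
    """Software side: F(U, R) = MD5(R * 5 + 26 - U); legal when it equals b."""
    return _digest(r * 5 + 26 - u) == B

print(validate(1019, keygen(1019)))  # True
print(validate(1019, 201))           # False
```

The validation code contains only the merged constant 26 and the digest b, so neither a = 7 nor the offset 19 appears in it on its own.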

In fact there are many ways to use MD5 to construct F so that the cracker cannot see how F relates to f and f^-1, let alone invert it. The above is only an extremely simple example, and I trust readers can come up with their own variations. MD5 is widely used and its implementations are rarely buggy; an Internet search turns up plenty of libraries implementing it, which can be called directly according to their documentation.

Of course this use of MD5 has its weakness. Because MD5 is completely irreversible, a must be a constant. Once a cracking group obtains one legitimate pair U, R, they can trace the program to recover the legal values of a and b, from them obtain f^-1, and go on to derive the keygen f. The precondition, of course, is that they first get hold of a legitimate pair U, R.

The principle of the RSA asymmetric algorithm is as follows (the proof uses Fermat's little theorem and Euler's theorem from number theory; interested readers can look it up, it is omitted here):

(1) Choose two different primes p and q, and let n = pq, m = (p-1)(q-1).

(2) Choose e < m that is coprime to m (a prime is the usual choice).

(3) Compute d such that e*d mod m = 1.

(4) For any a < n, if b = a^e mod n, then a = b^d mod n.
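The four steps can be checked with deliberately tiny numbers. The sketch below is for illustration only: the primes 61 and 53 and the exponent 17 are arbitrary toy values chosen here, far too small for real use, and the three-argument pow with a negative exponent requires Python 3.8 or later.

```python
from math import gcd

# Step (1): two different primes (toy size; real keys use primes hundreds of digits long)
p, q = 61, 53
n = p * q                  # 3233
m = (p - 1) * (q - 1)      # 3120

# Step (2): e must be coprime to m
e = 17
assert gcd(e, m) == 1

# Step (3): d is the modular inverse of e, so that e*d mod m = 1
d = pow(e, -1, m)

# Step (4): b = a^e mod n, and b^d mod n recovers a
a = 65
b = pow(a, e, n)
recovered = pow(b, d, n)
print(d, recovered)
```

With these values d comes out as 2753, and raising b to the power d modulo n returns the original 65.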

So before confidential communication starts, the receiver can randomly generate n, e and d and send n and e to the sender as the public key. The sender encrypts the confidential message a into the ciphertext b = a^e mod n. The receiver decrypts b with the private key d to recover the plaintext a = b^d mod n. The message a must be less than n; otherwise it can be split into fragments ai < n that are encrypted and sent separately.

If the communication is monitored by a third party, d never appears on the wire, so the eavesdropper obtains at most n, e and the ciphertext b, and without d cannot compute the plaintext a = b^d mod n. To deduce d from n and e, the way d is generated forces him to factor the large number n. As is well known, computing n = pq is easy, but recovering p and q from n is far harder. If n is large enough, say 1024 bits, then by the author's estimate at the time of writing, factoring n would take computer systems worth tens of millions of dollars a full year.

RSA is easy to apply to software protection:

(1) The software author uses R = U^d mod n as the keygen.

(2) The software uses U = R^e mod n as the validation function.
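A minimal sketch of this split between author and software, assuming the user code has already been converted to an integer smaller than n. The key values are toy numbers chosen for illustration (n = 61 * 53), and the function names are hypothetical.

```python
N, E = 3233, 17   # public values embedded in the software (toy size)
D = 2753          # private exponent, kept by the author and never shipped

def keygen(u):
    """Author side: R = U^d mod n."""
    if not 0 <= u < N:
        raise ValueError("user code must be an integer in [0, n)")
    return pow(u, D, N)

def validate(u, r):
    """Software side: accept when R^e mod n equals U."""
    return pow(r, E, N) == u

r = keygen(1234)
print(validate(1234, r))            # True
print(validate(1234, (r + 1) % N))  # False
```

Only N and E appear in validate, which is the point of the scheme: tracing the shipped code reveals nothing about D unless n can be factored.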

Even if the cracker traces the software's entire execution, he never sees d and so cannot write a keygen. RSA is not easy to implement, however. It poses several difficulties:

(1) How to handle big-number arithmetic. Mainstream RSA operates on numbers of 1024 bits or more, while most compilers only support integer arithmetic up to 64 bits, far short of what RSA needs.

(2) How to perform the modular exponentiation b = a^e mod n quickly. This involves a good deal of mathematical technique; Montgomery's algorithm can even carry out the operation without the most time-consuming step, division.

(3) How to quickly solve the congruence x*y mod m = 1.

(4) How to quickly generate random primes of a specified length.
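Problems (2) and (3) have short textbook solutions, sketched below. These are the plain square-and-multiply method and the extended Euclidean algorithm, not Montgomery's algorithm mentioned above, and the function names are illustrative. Python integers are arbitrary precision, so problem (1) is hidden here; in C or C++ a big-number type would be needed first.

```python
def mod_pow(base, exp, mod):
    """Square-and-multiply: compute base**exp % mod for exp >= 0."""
    result = 1 % mod
    base %= mod
    while exp > 0:
        if exp & 1:                   # current low bit of the exponent is set
            result = result * base % mod
        base = base * base % mod      # square for the next bit
        exp >>= 1
    return result

def mod_inverse(x, m):
    """Solve x*y mod m = 1 for y using the extended Euclidean algorithm."""
    old_r, r = x % m, m
    old_s, s = 1, 0                   # invariant: old_s * x = old_r (mod m)
    while r != 0:
        k = old_r // r
        old_r, r = r, old_r - k * r
        old_s, s = s, old_s - k * s
    if old_r != 1:
        raise ValueError("x has no inverse modulo m")
    return old_s % m

print(mod_pow(4, 13, 497))    # 445
print(mod_inverse(17, 3120))  # 2753
```

The loop in mod_pow runs once per bit of the exponent, so a 1024-bit exponent costs about a thousand squarings rather than an astronomically large number of multiplications.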

These problems are not independent; they are interlocked, each linked to the next. If you are interested in implementing RSA yourself, I believe that as you solve them you will come to see that RSA really is a "beautiful" algorithm.

There are many open-source libraries implementing RSA, such as Crypto++, MIRACL, FreeLIP and RSAREF, and software authors can use them to add RSA protection to their software. I have studied these libraries: the code is efficient and runs fast, but either the data structures are too complex or the coding style is messy, and with my limited skill and patience I could not read them through. Because the principle of RSA is fairly complex and space is limited, it cannot be covered fully here. Interested readers can find my article "RSA and large number operations" on the Kanxue forum (www.pediy.com), which contains a detailed introduction to the principle and easy-to-understand MFC source code.

An RSA-protected program can fairly be called an impregnable fortress, but we all know the story of the Maginot Line. Because RSA is so well known, and software authors usually implement it with public libraries, using RSA carries some risks:

1. RSA itself is strong enough, but users often rely on third-party public code. That code may contain vulnerabilities, and many senior crackers around the world study such code for weaknesses. Once a vulnerability is found, the software's security is buried along with it: every piece of fixed public code contains characteristic strings, and searching for those strings easily reveals which software uses the flawed code.

2. Users of RSA often do not understand its details and may unknowingly fall to unconventional attacks on the way RSA is used. For example, using different e and d but the same n in different products invites a "common modulus attack", and in fields such as e-mail encryption, certain choices of e, d and n are exposed to a "chosen-ciphertext attack".

3. In principle RSA encryption and decryption look interchangeable. In communications, however, the decryption side is never exposed to eavesdroppers, so although both are modular exponentiations, their actual implementations often differ: the decryption side usually uses the Chinese remainder theorem for speed, and that requires the original p and q. Such libraries must be used very carefully in software protection. If the author picks the wrong RSA library function and the validation function uses the Chinese remainder theorem, p and q ship inside the software and the RSA line of defense exists in name only.

4. Some libraries generate random primes with a pseudo-random number generator, which produces exactly the same "random" sequence under exactly the same initial conditions. A cracker who obtains the library may be able to reconstruct how p and q were generated from the value of n, and so break the RSA line of defense.

5. RSA has "weak keys" built from certain special primes. Some libraries do not exclude these primes when generating random primes, leaving the RSA defense feeble in front of a number theory expert.

Although RSA protection implemented in software has these risks, RSA is after all very complex. It is an optimistic but fair estimate that RSA will leave more than 90% of crackers staring at obscure assembly code, unable to get their bearings.

This section essentially borrows cryptographic knowledge from the field of communications. Cryptography is in fact of great use in software protection, and if you study it you will find there are many other ways to build a "fortress".

III. Guerrilla warfare

The first tenet of guerrilla warfare: break the whole into parts. This is very effective against powerful opponents. In software protection, the guerrilla tactic is to "dismember" the validation function F into several different Fi and hide each Fi in as obscure a corner of the program as possible.

Passing any single Fi is only a necessary condition for a registration code to be legal, not a sufficient one; a truly legal registration code passes every Fi. Even if the cracker wipes out one or several of the Fi, as long as he has not found them all he cannot see the whole of F, cannot invert the algorithm, and cannot write a keygen.

Of course, decomposing F into a series of necessary but not sufficient Fi is not easy and calls for fairly specialized mathematics, but piecewise functions at least offer a simple way to do it:

(1) Split R into several segments Ri.

(2) Construct a different keygen algorithm fi for each segment, so that Ri = fi(U).

(3) Let each validation function Fi be built from the inverse fi^-1 of fi.

It is a bit of a hassle, but definitely worth it. For example, we can let F1 use MD5, F2 use RSA and F3 use a custom, unpublished algorithm. When the user enters the registration code, only F1 is used to verify it, the registration code is stored in encrypted form in a data file of custom format, and if validation passes (assume the cracker always finds a way to get past it) the user is congratulated on a successful registration. The other two validation functions stay hidden and are called only on specific operations, for example re-reading the registration code when the user saves a file or uses some advanced feature. I have even come across software that calls a validation function while handling the window destroy message (WM_DESTROY) as the program closes. Once any validation function finds the registration code illegal, it clears the code and returns the software to the unregistered state, or in more extreme cases the software "commits suicide".

The second tenet of guerrilla warfare: mix the false with the true. A cracker facing guerrilla tactics is in a very passive position. Unless the validation functions he has found already force a one-to-one correspondence between U and R, he can never be sure whether other validation functions still lie buried in the software, and the software author has no need to make U and R correspond one-to-one. Not knowing how many validation functions there are is a real headache for a cracker trying to write a keygen.

With a little linear algebra we can tie several of the Ri (note: several, not all) together in each Fi:

Let Ra = 3U, Rb = 5U, Rc = 7U. Then:

Fa = 7Ra + 11Rb + 5Rc - 111U

Fb = 11Ra + 7Rb + 3Rc - 89U

Fc = 5Ra + 3Rb + 11Rc - 107U
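A short sketch shows why each of these checks is necessary but not sufficient. It assumes, as the coefficients imply, that a legal registration code makes every Fi equal to 0; the function names and the forged values are illustrative.

```python
def keygen(u):
    """Author side: Ra = 3U, Rb = 5U, Rc = 7U."""
    return 3 * u, 5 * u, 7 * u

def Fa(u, ra, rb, rc):
    return 7 * ra + 11 * rb + 5 * rc - 111 * u

def Fb(u, ra, rb, rc):
    return 11 * ra + 7 * rb + 3 * rc - 89 * u

def Fc(u, ra, rb, rc):
    return 5 * ra + 3 * rb + 11 * rc - 107 * u

# A legal code makes every check evaluate to 0.
u = 42
legal = keygen(u)
print([f(u, *legal) for f in (Fa, Fb, Fc)])    # [0, 0, 0]

# A cracker who has seen only Fa can satisfy it with infinitely many forgeries,
# for example (8, 5, 0) for U = 1, but the hidden checks still reject them.
forged = (8, 5, 0)
print([f(1, *forged) for f in (Fa, Fb, Fc)])   # [0, 34, -52]
```

One equation in three unknowns leaves a two-dimensional family of solutions, so knowing Fa alone does not pin down the keygen.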

This way, finding any one of Fa, Fb and Fc, or even a fragment of one, tells the cracker nothing. A better idea is to let the number of Ri taking part in the linear system slightly exceed the number of validation functions, so the system is underdetermined: the software author keeps one particular solution of the system as the keygen, while the cracker cannot tell how many validation functions there are in the end, just as an occupying army never knows how many guerrillas it faces and ends up hysterically seeing a guerrilla in everyone.

If we treat a pair U, R as the horizontal and vertical coordinates of a point in the plane, the keygen f is a plane curve made up of the legal U, R. We can then construct several surface equations in space as validation functions F, on the condition that f lies on all of these surfaces. With some knowledge of analytic geometry in space you can construct a number of surface equations as validation functions, and even consider parametric equations, so that even a cracker who obtains every Fi still needs considerable skill at solving equations to find f.

The importance of mathematics bears repeating: number theory, algebra, linear algebra, geometry, analytic geometry, calculus (I personally think a Fourier transform would make a very interesting validation function) and probability theory can all serve as weapons of software protection. Rest assured that with 30% of the math skill you can defeat 60% of your opponents, because the information you and they hold is asymmetric.

The third tenet of guerrilla warfare: strategic relocation. Guerrillas often get rounded up because informers give away their whereabouts. The Achilles heel of our guerrilla tactics is that every validation function must access the registration code, and the registration code has only one source: what the user typed into the registration interface. The cracker will follow the code that reads the registration code from the interface and watch the memory address where it is stored; as soon as a validation function touches that address, it gives itself away. The registration code thus becomes the cracker's key to finding the validation functions: in theory, if he just holds on to that key he will find them all. The answer is relocation on a large scale. Guerrilla warfare means being able to "run" and wear the enemy down, so the software must keep the registration code on the move, and by varied means:

(1) Memory copy. This routine method is easily caught by the cracker's BPM memory breakpoints.

(2) Write the code to the registry or a file, then read it back into a different memory address from another piece of code. This method is defeated by registry and file monitoring tools.

(3) Copy the registration code to several addresses at once, so the cracker cannot tell which one is its new home. If the enemy perseveres and follows them all, this at least costs him a lot of energy.

(4) After moving the code repeatedly with the same function, suddenly move it with another function whose first half is identical and whose second half differs. An exhausted enemy can easily lose track of the registration code this way.

(5) "Smuggle" parts of the registration code to different addresses and reassemble them later, which easily catches the enemy off guard.

(6) Use the above methods repeatedly. If copy and paste alone can drive the opponent to distraction, why not?

The initiative is always in your hands, and you can invent any number of relocation tricks to deal with the poor cracker. You may occasionally meet a tireless and heroic opponent who eventually clears out your guerrillas one by one, but as long as you carefully apply the three tenets of guerrilla warfare and combine them flexibly, more than 90% of crackers will run around like headless flies, then shut down the machine and give up. You may of course receive plenty of verbal "attention", such as "pervert" or "#$%$*^", but that only shows your opponent lacks poise and has no real effect on you.

IV. Trap tactics

A "trap" leads the cracker astray into confusion he cannot get out of. We should not blindly aim to make the code unreadable to the cracker: that is not easy to achieve, and holding such a goal easily leads to complacency and underestimating the opponent. Better to lure the opponent into complacency instead, so that he has not fully understood the verification process but believes he has. The "traps" introduced here are only some of my personal ideas; experts may merely smile at them, but I hope they at least provide some inspiration.

The first trap I call the "random trap". The idea is to prepare several validation functions, and each time the program runs it calls just one of them at random. The random number can be generated at program startup: an application usually does a lot of data initialization there, and random number generation code is itself rather obscure, so a random number slipped in at that point should be well hidden. After tracing the program, a cracker will usually conclude that it has only one validation function and confidently release a keygen. When others register with that keygen, the program draws a different random number, calls a different validation function, and registration naturally fails.

Unfortunately you cannot predict your opponent's skill and habits. Some stubborn novices (like me) and some experts with particular habits will trace your program several times or even dozens of times, and your random trap will be discovered. So I strongly recommend keeping the probability that a given hidden validation function is called at around 1%. The downside is that some incomplete keygens will circulate and let your software run many times in the registered state. The upside, according to psychologists, is that people are most attached to what they have invested in deeply yet cannot control. Someone who registers your software with an incomplete keygen thinks, "Ha, now this thing is mine!" and feels very good about it; his psychological attachment to your software is generally no weaker than to software he has paid for. When your software suddenly sees through his illegal status, he is frustrated and upset, and may well feel the urge to get "his own thing" back, even if it means paying. In that case, congratulations: you have gained an unusually loyal registered user. There is a side benefit too: your opponent the cracker takes a blow to his reputation. Many expert crackers do not crack for a living; what they care about is reputation, and a blow like this may well discourage him from cracking any further. So congratulations again: you have removed a difficult enemy.
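The random trap can be sketched as follows. Everything here is hypothetical: the two toy checks, the way R is split into a low and a high part, and the function names are placeholders invented for illustration, since the article does not specify concrete validation functions. Only the idea (one draw at startup, about 1% chance of the hidden check) comes from the text.

```python
import random

HIDDEN_PROBABILITY = 0.01   # the roughly 1% suggested in the text

def keygen(u):
    """Author side (toy): the high part and low part of R both depend on U."""
    return ((u * 7) % 1000) * 1000 + (u * 3 + 1) % 1000

def check_main(u, r):
    """Exposed check: looks only at the low three digits of R."""
    return r % 1000 == (u * 3 + 1) % 1000

def check_hidden(u, r):
    """Rarely used check: looks only at the high part of R."""
    return r // 1000 == (u * 7) % 1000

def pick_validator(roll):
    """Choose the validator for this run from a number in [0, 1)."""
    return check_hidden if roll < HIDDEN_PROBABILITY else check_main

# At program startup, buried among other initialization work:
validator = pick_validator(random.random())

# A keygen reverse-engineered from check_main alone fills in only the low part.
forged = (123 * 3 + 1) % 1000
print(check_main(123, forged), check_hidden(123, forged))  # True False
```

A code from the real keygen passes whichever validator is drawn, while the forged code works on about 99 runs in 100 and then fails.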

The second trap I call the "integer trap". Since validation functions usually convert the user code and registration code into integers, we can exploit the cracker's familiarity with this. For example, split the user code into two integers U1, U2 and the registration code into two integers R1, R2, and let the keygen be R1 = U1 + 115, R2 = U2 + 351. There are two validation functions: F1 is [(R1-U1)^2 + (R2-U2)^2]^(1/2) = 369, and F2 is (R2-U2)/(R1-U1) = 3. We expose F1 and hide F2. A cracker who finds F1 first sees that it uses all of U and R, and that F1 is obviously a Pythagorean equation; a table look-up gives its only solution in positive integers, up to order: R1-U1 = 81 and R2-U2 = 360. He readily concludes that the keygen is R1 = U1 + 81, R2 = U2 + 360. But we do not use the integer solution of F1 at all, only values that happen to satisfy F1 under the computer's integer arithmetic: the square root of 115^2 + 351^2 = 136426 is about 369.36, which truncates to 369, and 351/115 truncates to 3. You can think of U as a point in a plane coordinate system: the points satisfying F1 form a circle with center U and radius 369, the points satisfying F2 form a straight line through U with slope 3, and the legal R is the intersection of the two. Most mathematical functions we learn are continuous while computers process discrete values, so traps of this kind are easy to construct but not easy for the cracker to get around.
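The numbers in this example can be checked directly. The sketch below assumes the validation functions use truncating integer arithmetic, which is what makes the trap work; math.isqrt needs Python 3.8 or later, and the function names are illustrative.

```python
from math import isqrt

def keygen(u1, u2):
    """Author side: R1 = U1 + 115, R2 = U2 + 351."""
    return u1 + 115, u2 + 351

def check_f1(u1, u2, r1, r2):
    """Exposed check: integer square root of the squared distance equals 369."""
    dx, dy = r1 - u1, r2 - u2
    return isqrt(dx * dx + dy * dy) == 369

def check_f2(u1, u2, r1, r2):
    """Hidden check: integer quotient (R2-U2)/(R1-U1) equals 3."""
    dx, dy = r1 - u1, r2 - u2
    return dx != 0 and dy // dx == 3

u1, u2 = 1000, 2000
legal = keygen(u1, u2)
forged = (u1 + 81, u2 + 360)   # the "obvious" Pythagorean solution

print(check_f1(u1, u2, *legal), check_f2(u1, u2, *legal))    # True True
print(check_f1(u1, u2, *forged), check_f2(u1, u2, *forged))  # True False
```

The forged pair satisfies F1 exactly (81^2 + 360^2 = 369^2) but gives an integer quotient of 4, so the hidden F2 rejects it.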

The third trap I call the "function trap". Expert crackers are usually very confident in their powers of analysis and induction. When they meet a function call while tracing, they like to pass the function different parameters and work out what it does from how the return value changes. We can build a trap on this habit. First we expose F1. F1 takes R as its parameter and returns R', where R' and R are related by some regular rule, and the caller then checks R' = U, leading the cracker to believe that all F1 does is F1(R) = R'. In fact F1 makes an open feint while acting in secret: before returning R', it quietly writes R' to the address of some variable. Lying in ambush elsewhere, F2 restores R from the R' stored at that address and performs a completely different validation. This trap catches big game easily but often fails against small rabbits: novice crackers instinctively press F8 to step into every call they see, like novice CS players who hold down the trigger the moment they see an enemy, and nothing will stop them.

The examples above leave room for variation and can also be combined. In the final analysis, the essence of trap tactics is to use the opponent's experience to mislead him. I think it is well worth studying the habits and preferences of crackers in order to devise more powerful traps against ever stronger opponents.

V. Conclusion

Preventing algorithm inversion stops keygens from spreading, but it does nothing against crack patches; it must be used together with anti-patching (anti-brute-force) and other techniques.

Reposted from the Kanxue (See Snow) forum

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
