Common protection technologies
Java bytecode is easy to decompile because it retains a high level of abstraction. This section describes several common methods for protecting Java bytecode from decompilation. In general, these methods cannot prevent decompilation outright; they only make it more difficult, and each has its own applicable environment and weaknesses.
Isolate Java programs
The simplest method is to keep users from obtaining the Java class files at all. This is the most fundamental form of protection, and it can be implemented in several ways. For example, developers can place key Java classes on a server, so that clients use the functionality through the server's interfaces rather than accessing the class files directly. An attacker then has no class files to decompile. A growing number of standards and protocols support providing services through interfaces, such as HTTP, Web Services, and RPC. However, many applications are not suited to this method. For example, a standalone program that runs entirely on a single machine cannot be isolated this way. See figure 1.
Figure 1 isolate Java programs
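The isolation idea can be sketched as follows. This is a minimal illustration and not the article's implementation: the class names LicenseService and LicenseClient, the /check path, and the "DEMO-" serial rule are all hypothetical. The sensitive check lives only in the server class, and the client sees nothing but an HTTP interface.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

// Server side: the sensitive logic is deployed here and never shipped to users.
class LicenseService {
    // Hypothetical validation rule, standing in for the protected logic.
    private static boolean isValid(String serial) {
        return serial != null && serial.startsWith("DEMO-") && serial.length() == 9;
    }

    // Starts an HTTP server on the loopback interface; port 0 picks a free port.
    static HttpServer start(int port) {
        try {
            HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", port), 0);
            server.createContext("/check", exchange -> {
                String query = exchange.getRequestURI().getQuery(); // e.g. "serial=DEMO-1234"
                String serial = (query != null && query.startsWith("serial="))
                        ? query.substring("serial=".length()) : null;
                byte[] body = (isValid(serial) ? "valid" : "invalid")
                        .getBytes(StandardCharsets.UTF_8);
                exchange.sendResponseHeaders(200, body.length);
                try (OutputStream out = exchange.getResponseBody()) {
                    out.write(body);
                }
            });
            server.start();
            return server;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}

// Client side: only this class is distributed; it contains no validation logic.
class LicenseClient {
    static String check(int port, String serial) {
        try {
            HttpRequest request = HttpRequest.newBuilder(
                    URI.create("http://127.0.0.1:" + port + "/check?serial=" + serial))
                    .GET().build();
            return HttpClient.newHttpClient()
                    .send(request, HttpResponse.BodyHandlers.ofString()).body();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }
}
```

Decompiling LicenseClient reveals only the URL of the service, not the rule applied on the server. A real deployment would also need transport security and authentication, which this sketch omits.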
Encrypt class files
To prevent class files from being decompiled directly, many developers encrypt certain key class files, such as the classes that handle registration codes and serial numbers. Before these encrypted classes can be used, the program must first decrypt them and then load them into the JVM. The decryption can be performed in hardware or in software.
In practice, developers usually load the encrypted classes through a custom ClassLoader (note that applets cannot use a custom class loader for security reasons). The custom class loader first locates the encrypted class, then decrypts it, and finally loads the decrypted class into the JVM. In this protection scheme the custom class loader is itself a key class. Because it is not encrypted, it is likely to be the first target of an attack. Once the decryption key and algorithm are recovered from it, the encrypted classes are easy to decrypt. See figure 2.
Figure 2 encryption of class files
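The loading sequence described above (find the encrypted class, decrypt it, define it in the JVM) can be sketched as follows. This is a minimal sketch under stated assumptions: the class name DecryptingClassLoader, the in-memory map of encrypted classes, and the choice of AES-GCM with a 12-byte IV prepended to the ciphertext are illustrative and do not come from the article.

```java
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.Map;
import javax.crypto.Cipher;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;

class DecryptingClassLoader extends ClassLoader {
    private static final int IV_LENGTH = 12; // bytes, prepended to each ciphertext

    // Hypothetical storage: class name -> IV followed by encrypted class bytes.
    private final Map<String, byte[]> encryptedClasses;
    private final SecretKey key;

    DecryptingClassLoader(ClassLoader parent, Map<String, byte[]> encryptedClasses, SecretKey key) {
        super(parent);
        this.encryptedClasses = encryptedClasses;
        this.key = key;
    }

    // Build-time helper: encrypts class bytes and prepends a random IV.
    static byte[] encrypt(byte[] plain, SecretKey key) {
        try {
            byte[] iv = new byte[IV_LENGTH];
            new SecureRandom().nextBytes(iv);
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(128, iv));
            byte[] ciphertext = cipher.doFinal(plain);
            byte[] blob = new byte[IV_LENGTH + ciphertext.length];
            System.arraycopy(iv, 0, blob, 0, IV_LENGTH);
            System.arraycopy(ciphertext, 0, blob, IV_LENGTH, ciphertext.length);
            return blob;
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Run-time helper: reads the IV from the blob and decrypts the remainder.
    static byte[] decrypt(byte[] blob, SecretKey key) {
        try {
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.DECRYPT_MODE, key, new GCMParameterSpec(128, blob, 0, IV_LENGTH));
            return cipher.doFinal(blob, IV_LENGTH, blob.length - IV_LENGTH);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Called by loadClass when the parent loader cannot find the class.
    @Override
    protected Class<?> findClass(String name) throws ClassNotFoundException {
        byte[] blob = encryptedClasses.get(name);
        if (blob == null) {
            throw new ClassNotFoundException(name);
        }
        byte[] classBytes = decrypt(blob, key);
        return defineClass(name, classBytes, 0, classBytes.length);
    }
}
```

The sketch also shows the weakness noted above: the loader and the key it holds are present in clear form, so anyone who decompiles the loader can call decrypt on every protected class.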
Convert to native code
Converting the program to native code is also an effective way to prevent decompilation, because native code is usually difficult to decompile. Developers can convert the entire application to native code, or only its key modules. If only key modules are converted, the Java program must call them through JNI.
Of course, this technique sacrifices Java's cross-platform nature. Different versions of the native code must be maintained for different platforms, which increases support and maintenance work. For some key modules, however, this solution is often necessary.
To ensure that the native code is not modified or replaced, it usually needs to be signed. Before using the native code, the program should verify it to confirm that it has not been tampered with. If the signature check succeeds, the program calls the relevant JNI methods. See figure 3.
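The verify-then-call sequence can be sketched as follows. As a simplification, this sketch compares a SHA-256 digest of the library file against an expected value rather than checking a full digital signature, and the names NativeQuestionBank, readQuestion, and loadVerified are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

class NativeQuestionBank {
    // Hypothetical JNI method; its body would live in a C or C++ library.
    static native byte[] readQuestion(int id);

    // Returns the SHA-256 digest of the data as a lowercase hex string.
    static String sha256Hex(byte[] data) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256").digest(data);
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    // Loads the native library only if its digest matches the expected value.
    // Note: the file could still be swapped between the check and the load.
    static void loadVerified(Path library, String expectedSha256Hex) throws IOException {
        byte[] bytes = Files.readAllBytes(library);
        if (!sha256Hex(bytes).equalsIgnoreCase(expectedSha256Hex)) {
            throw new SecurityException("Native library failed integrity check: " + library);
        }
        System.load(library.toAbsolutePath().toString());
    }
}
```

A caller would invoke loadVerified once at startup and only then call readQuestion. Because the expected digest is embedded in Java code, the checking class itself still needs protection, for example by obfuscation.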
Figure 3 Converting to native code
Code obfuscation
Code obfuscation reorganizes and processes class files so that the processed code has the same functionality (semantics) as the original. However, obfuscated code is hard to decompile usefully: the code a decompiler produces is obscure and difficult to understand, so it is hard to recover the true semantics of the program. In theory, an attacker with enough time can still crack obfuscated code, and some people are even developing deobfuscation tools. In practice, however, given the variety of obfuscation techniques and the maturity of obfuscation theory, obfuscating Java code is an effective deterrent to decompilation. Because obfuscation is an important technique for protecting Java programs, it is described in detail below. See figure 4.
Figure 4 code obfuscation
Summary of several technologies
The above technologies have different application environments and each has its own weaknesses. Table 1 compares the characteristics of these technologies.
Table 1 Comparison of different protection technologies
Obfuscation Technology
So far, obfuscation is the most basic method of protecting Java programs. Java obfuscation tools are available as commercial, free, and open source products, and Sun also provides its own obfuscation tool. Most of them obfuscate class files; a few first process the source code and then the class files, which strengthens the obfuscation. Relatively successful obfuscation tools currently include the JProof series from 1stBarrier, Jshrink from Eastridge, and SourceGuard from 4thpass.com. Based on the target of the obfuscation, the main techniques can be classified as symbol (lexical) obfuscation, data obfuscation, control obfuscation, and preventive obfuscation (prevent transformation).
Symbol Obfuscation
A class contains a lot of information that is irrelevant to program execution, such as method names and variable names. These names usually carry meaning. For example, a method named getKeyLength() probably returns the length of a key. Symbol obfuscation scrambles this information and replaces it with meaningless names. For example, all variables are numbered starting from variant_001, and all methods are numbered starting from method_001. This makes the decompiled code harder to understand. The names of private methods and local variables can be changed without affecting the program. However, interface names, public methods, and member variables that external modules reference usually have to keep their names; otherwise the external modules can no longer find them. For this reason, most obfuscation tools provide a wide range of options for symbol obfuscation, letting users choose which symbols to obfuscate.
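The effect of renaming can be illustrated with a before-and-after pair. Both classes are hypothetical examples written for this illustration; the numbered names follow the scheme described above rather than the output of any particular tool.

```java
// Before symbol obfuscation: the names reveal what the code does.
class KeyInfo {
    private final byte[] keyBytes;

    KeyInfo(byte[] keyBytes) {
        this.keyBytes = keyBytes.clone();
    }

    int getKeyLength() {
        return keyBytes.length;
    }
}

// After symbol obfuscation: identical behavior, but the names carry no meaning.
class C001 {
    private final byte[] v001;

    C001(byte[] p001) {
        this.v001 = p001.clone();
    }

    int m001() {
        return v001.length;
    }
}
```

A decompiler recovers both classes equally well, but a reader of C001 has to infer from usage that m001 returns a key length.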
Data Obfuscation
Data obfuscation obfuscates the data that the program uses. There are many techniques, which fall mainly into two groups: changing data storage and encoding (store and encode transform) and changing data access (access transform).
Changing data storage and encoding disrupts the way the program stores its data. For example, an array with 10 members can be split into 10 variables whose names are then scrambled, or a two-dimensional array can be converted into a one-dimensional array. Complex data structures can themselves be broken up, for example by replacing one complex class with several classes.
The other approach is to change data access. For example, when an array is accessed, its subscript can be put through some calculation first. Figure 5 shows an example.
Figure 5 Change Data Access
In practice, the two approaches are generally combined, so that both data storage and data access are disrupted. After data obfuscation, the semantics of the program becomes more complex, which increases the difficulty of decompilation.
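The two transformations can be combined in a small sketch. The classes below are hypothetical: PlainTable is the original code, and ObfuscatedTable flattens the two-dimensional array into one dimension (storage transform) and scrambles the subscript with an arithmetic mapping (access transform). The constants 7 and 5 are arbitrary; the multiplier only needs to be coprime to the array size so that every logical cell maps to a distinct slot.

```java
// Original: a straightforward 3 x 4 table.
class PlainTable {
    private final int[][] cells = new int[3][4];

    void set(int row, int col, int value) {
        cells[row][col] = value;
    }

    int get(int row, int col) {
        return cells[row][col];
    }
}

// Obfuscated: same behavior, different storage layout and subscript calculation.
class ObfuscatedTable {
    private static final int COLS = 4;
    private static final int SIZE = 12; // 3 rows * 4 columns

    // Storage transform: the 2-D array is flattened into a 1-D array.
    private final int[] d = new int[SIZE];

    // Access transform: the flat index i is mapped to slot (7 * i + 5) mod 12.
    // Because 7 and 12 are coprime, the mapping is a permutation of 0..11.
    private static int slot(int row, int col) {
        return ((row * COLS + col) * 7 + 5) % SIZE;
    }

    void set(int row, int col, int value) {
        d[slot(row, col)] = value;
    }

    int get(int row, int col) {
        return d[slot(row, col)];
    }
}
```

Callers see the same set and get behavior, but the decompiled code no longer shows a recognizable row-and-column layout.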
Control Obfuscation
Control obfuscation obfuscates the control flow of the program, making the control flow harder to decompile. Changing the control flow generally requires extra computation and extra control flow, so it has some negative impact on performance. Sometimes you need to weigh program performance against the degree of obfuscation. Control obfuscation is the most complex and most intricate kind of obfuscation. Its techniques can be divided into the following types:
Adding obfuscating control. By adding extra, complex control flow, you can hide the original semantics of the program. For example, for two statements A and B executed in order, we can add a control condition that decides whether B is executed. This makes the program harder to decompile. However, none of the added control may change the fact that B is executed. Figure 6 shows three ways of adding obfuscating control to this example.
Figure 6 Three methods of adding obfuscating control
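One way to guard statement B without changing behavior is an opaque predicate, a condition whose value is fixed but not obvious. The sketch below is an illustration written for this purpose and does not reproduce the three methods in the figure; the class and method names are hypothetical.

```java
class OpaquePredicateDemo {
    // Original: statement A followed by statement B.
    static int original(int x) {
        int a = x + 3; // statement A
        int b = a * 2; // statement B
        return b;
    }

    // Obfuscated: B is guarded by a condition that is always true.
    // x * (x + 1) is a product of two consecutive integers, so it is always
    // even, and this still holds when the int multiplication overflows.
    static int obfuscated(int x) {
        int a = x + 3; // statement A
        int b;
        if ((x * (x + 1)) % 2 == 0) {
            b = a * 2; // statement B, always executed
        } else {
            b = a ^ 0x5A5A; // decoy branch, never executed
        }
        return b;
    }
}
```

The decoy branch costs a multiplication and a comparison at run time, which is the performance trade-off mentioned above.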
Reorganizing the control flow. Reorganizing the control flow is also an important obfuscation method. For example, the obfuscator can inline the code of a method into its caller, or conversely turn a piece of code in the program into a function call. In addition, a loop can be split into multiple loops or converted into a recursive process. This method is the most complex and is the subject of much research.
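The loop transformations mentioned above can be sketched on a simple summation. The class FlowReorganizationDemo is a hypothetical example; a real obfuscator would apply such rewrites at the bytecode level rather than in source.

```java
class FlowReorganizationDemo {
    // Original: a single loop.
    static int sumLoop(int[] values) {
        int total = 0;
        for (int v : values) {
            total += v;
        }
        return total;
    }

    // Variant 1: the loop is split into two loops over the two halves.
    static int sumSplit(int[] values) {
        int total = 0;
        int mid = values.length / 2;
        for (int i = 0; i < mid; i++) {
            total += values[i];
        }
        for (int i = mid; i < values.length; i++) {
            total += values[i];
        }
        return total;
    }

    // Variant 2: the loop is converted into a recursive process.
    static int sumRecursive(int[] values) {
        return step(values, 0, 0);
    }

    private static int step(int[] values, int index, int acc) {
        if (index == values.length) {
            return acc;
        }
        return step(values, index + 1, acc + values[index]);
    }
}
```

All three methods compute the same result. The recursive variant uses one stack frame per element, so it is only suitable for short arrays.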
Preventive Obfuscation
This kind of obfuscation is usually designed against specific decompilers. In general, these techniques exploit weaknesses or bugs in a decompiler. For example, some decompilers do not decompile the instructions that follow a return, so some obfuscation schemes place code after the return statement. The effectiveness of such obfuscation varies from one decompiler to another. A good obfuscation tool usually combines all of these obfuscation techniques.
Case Analysis
In practice, protecting a large Java program usually requires a combination of these methods rather than a single one, because each method has its own weaknesses and applicable environment. Using them together makes the protection more effective. Other related security technologies, such as authentication, digital signatures, and PKI, are often needed as well.
The example in this article is a Java application: a mock exam program for the SCJP (Sun Certified Java Programmer) certification. The application carries a large number of practice questions, all of which are stored encrypted in files. Because the question bank is the core of the software, the classes that access the question bank are its core classes. Once these classes are decompiled, all of the questions can be recovered. Now let's consider how to protect the questions and the related classes.
In this example, we use a combination of protection techniques: native code and obfuscation. Because the software is released mainly on Windows, only one version of the native code needs to be maintained after conversion. Obfuscation is also very effective for Java programs and is well suited to independently distributed applications such as this one.
In the specific solution, we divide the program into two parts: the question bank access module, implemented as a native code library, and the other modules, developed in Java. In this way, the question management module cannot be decompiled as Java bytecode. The modules developed in Java still need obfuscation. See figure 7 for the solution.
Figure 7 SCJP protection solution diagram
For the question management module, because the program is mainly used on Windows, the question bank access module is developed in C++ and exposes a set of access interfaces. To protect these interfaces, we also added an initialization interface, which must be called before each use of the question bank access interfaces. The interfaces fall into two types:
1. Initialization interface
Before using the question bank module, the client must call the initialization interface and provide a random number as a parameter. From this random number, the question bank management module and the client each generate the same session key using an agreed algorithm, and this key is used to encrypt all input and output data. In this way, only an authorized (valid) client can generate the correct session key and access the question bank. It is difficult for an invalid client to generate the correct session key, so it cannot obtain the contents of the question bank. If a higher level of confidentiality is required, two-way authentication can also be used.
2. Data access interface
After authentication is complete, the client can access the data in the question bank normally. However, all input and output data is encrypted with the session key, so the data can only be used together with the correct question bank management module. The sequence diagram in figure 8 shows the interaction between the question bank management module and the other parts.
Figure 8 interaction process between the question bank management module and other parts
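The session key exchange can be sketched in Java as follows. The article does not specify the key derivation algorithm, so everything concrete here is an assumption made for illustration: both sides are assumed to hold a shared secret, the session key is derived as HMAC-SHA256 of the client's random number under that secret, and data is encrypted with AES-GCM. The class name SessionCrypto is hypothetical, and in the article's design the module side would be implemented in C++.

```java
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.Arrays;
import javax.crypto.Cipher;
import javax.crypto.Mac;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;
import javax.crypto.spec.SecretKeySpec;

class SessionCrypto {
    private static final int IV_LENGTH = 12; // bytes, prepended to each message

    // Both sides call this with the same shared secret (an assumption of this
    // sketch) and the random number supplied at initialization.
    static SecretKey deriveSessionKey(byte[] sharedSecret, byte[] clientRandom) {
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(sharedSecret, "HmacSHA256"));
            byte[] digest = mac.doFinal(clientRandom);
            return new SecretKeySpec(Arrays.copyOf(digest, 16), "AES"); // 128-bit key
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Encrypts input or output data under the session key.
    static byte[] encrypt(SecretKey sessionKey, byte[] plain) {
        try {
            byte[] iv = new byte[IV_LENGTH];
            new SecureRandom().nextBytes(iv);
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.ENCRYPT_MODE, sessionKey, new GCMParameterSpec(128, iv));
            byte[] ciphertext = cipher.doFinal(plain);
            byte[] blob = Arrays.copyOf(iv, IV_LENGTH + ciphertext.length);
            System.arraycopy(ciphertext, 0, blob, IV_LENGTH, ciphertext.length);
            return blob;
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Decrypts data; fails if the session key is not the one used to encrypt.
    static byte[] decrypt(SecretKey sessionKey, byte[] blob) {
        try {
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.DECRYPT_MODE, sessionKey,
                    new GCMParameterSpec(128, blob, 0, IV_LENGTH));
            return cipher.doFinal(blob, IV_LENGTH, blob.length - IV_LENGTH);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

A client that does not know the shared secret derives a different key, so its requests fail to decrypt and the responses it receives are unreadable. This mirrors the behavior described for invalid clients.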