How can bytecode prevent memory errors and improve code quality?

Source: Internet
Author: User

 

Getting started with javap

Most Java programmers know that their programs are generally not compiled into native code but into the bytecode format executed by the Java Virtual Machine (JVM). However, few Java programmers have ever read bytecode, because their tools do not encourage them to look at it. Most Java debugging tools do not allow single-stepping through bytecode. They either display source lines or nothing.
Fortunately, the JDK provides javap, a command-line tool that makes it easy to view bytecode. Let's look at an example:

public class ByteCodeDemo {
    public static void main(String[] args) {
        System.out.println("Hello world");
    }
}

After compiling this class, you could open the .class file in a hex editor and translate the bytecode by hand according to the virtual machine specification. Fortunately, there is a simpler way. The JDK includes a command-line disassembler, javap, which converts bytecode into a readable mnemonic form. You can obtain a bytecode listing by passing the -c option to javap as follows:

javap -c ByteCodeDemo

The output is similar to the following:

public class ByteCodeDemo extends java.lang.Object {
public ByteCodeDemo();
public static void main(java.lang.String[]);
}
Method ByteCodeDemo()
0 aload_0
1 invokespecial #1
4 return
Method void main(java.lang.String[])
0 getstatic #2
3 ldc #3
5 invokevirtual #4
8 return

You can learn a lot about bytecode from this short listing alone. Start with the first instruction of the main method:

0 getstatic #2

The leading integer is the offset of the instruction within the method, so the first instruction starts at 0. The offset is followed by the mnemonic of the instruction. In this example, the getstatic instruction pushes a static field onto a data structure called the operand stack. Subsequent instructions can refer to the values on this stack. The getstatic mnemonic is followed by the field to be pushed, which in this example is "#2". If you inspect the bytecode directly, you will see that the field information is not embedded in the instruction itself; like all constants used by a Java class, it is stored in a shared pool. Storing field information in a constant pool reduces the size of bytecode instructions, because an instruction only needs to store an index into the constant pool rather than the whole constant. In this example, the field information is at index #2 in the constant pool. The order of entries in the constant pool is compiler-dependent, so you may see something other than #2 in your environment.
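To see where the constant pool sits in a class file, the sketch below reads the first few fields of the class file format: the magic number, the minor and major version, and constant_pool_count (the number of pool entries plus one). The names ClassHeaderSketch and readHeader are illustrative, and main assumes the class was compiled to ClassHeaderSketch.class on the classpath; if the file cannot be found it just says so.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;

class ClassHeaderSketch {
    // Returns {magic, minor_version, major_version, constant_pool_count}.
    static int[] readHeader(InputStream in) throws IOException {
        DataInputStream data = new DataInputStream(in);
        int magic = data.readInt();               // 0xCAFEBABE for a valid class file
        int minor = data.readUnsignedShort();
        int major = data.readUnsignedShort();
        int poolCount = data.readUnsignedShort(); // the constant pool entries follow
        return new int[] { magic, minor, major, poolCount };
    }

    public static void main(String[] args) throws IOException {
        try (InputStream in =
                 ClassHeaderSketch.class.getResourceAsStream("ClassHeaderSketch.class")) {
            if (in == null) {
                System.out.println("class file not found on the classpath");
                return;
            }
            int[] h = readHeader(in);
            System.out.printf("magic=%08X version=%d.%d constant_pool_count=%d%n",
                              h[0], h[2], h[1], h[3]);
        }
    }
}
```

Indexes such as #2 in the javap listing refer to entries in the pool that starts right after these header fields.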

Having analyzed the first instruction, you can easily guess the meaning of the others. The ldc (load constant) instruction pushes the constant "Hello world" onto the operand stack. The invokevirtual instruction calls the println method, popping its two arguments from the operand stack. Remember that an instance method such as println takes two arguments: the string above, plus the implicit this reference.
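The three instructions of main can be replayed by hand against an explicit stack. The following is only a sketch of the idea, not how a JVM is implemented; the class OperandStackSketch and its run method are made up for illustration, and a PrintStream parameter stands in for System.out.

```java
import java.io.PrintStream;
import java.util.ArrayDeque;
import java.util.Deque;

class OperandStackSketch {
    // Replays getstatic, ldc and invokevirtual; returns the stack depth afterwards.
    static int run(PrintStream target) {
        Deque<Object> stack = new ArrayDeque<>();
        stack.push(target);         // getstatic: push the static field (System.out in the real code)
        stack.push("Hello world");  // ldc: push the string constant
        // invokevirtual println: pop the argument first, then the receiver ("this")
        String argument = (String) stack.pop();
        PrintStream receiver = (PrintStream) stack.pop();
        receiver.println(argument);
        return stack.size();
    }

    public static void main(String[] args) {
        run(System.out);
    }
}
```

After the call both values have been consumed, so the stack is empty again when return executes.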

How to prevent memory errors with bytecode

Java is often hailed as a "secure" language for Internet software development. How can a language that looks so much like C++ on the surface be secure? One important security concept it introduces is the prevention of memory-related errors. Attackers exploit memory errors to inject malicious code into otherwise secure programs. Java bytecode is the first line of defense against such attacks, as the following example shows:

public float add(float f, int n) {
    return f + n;
}

If you add this method to the example above, recompile it, and run javap again, you will see bytecode similar to this:

Method float add(float, int)
0 fload_1
1 iload_2
2 i2f
3 fadd
4 freturn

At the beginning of the method, the virtual machine places the method's parameters in a data structure called the local variable table. As the name implies, the local variable table also holds any local variables you declare. In this example, the method starts with three entries in the local variable table, all of them parameters of the add method. Position 0 holds the this reference, and positions 1 and 2 hold the float and int parameters respectively.
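The numbering of parameter positions can be sketched in a few lines. The helper below is hypothetical (LocalSlotsSketch and parameterSlots are not part of any JDK API). It also relies on two facts the text above does not mention: static methods have no this reference, so their parameters start at position 0, and long and double values occupy two consecutive positions.

```java
import java.util.Arrays;

class LocalSlotsSketch {
    // Returns the local variable table position of each parameter.
    static int[] parameterSlots(boolean isStatic, Class<?>... parameterTypes) {
        int next = isStatic ? 0 : 1; // position 0 holds "this" for instance methods
        int[] slots = new int[parameterTypes.length];
        for (int i = 0; i < parameterTypes.length; i++) {
            slots[i] = next;
            // long and double take two positions; every other type takes one
            boolean wide = parameterTypes[i] == long.class || parameterTypes[i] == double.class;
            next += wide ? 2 : 1;
        }
        return slots;
    }

    public static void main(String[] args) {
        // add(float, int) is an instance method: prints [1, 2]
        System.out.println(Arrays.toString(parameterSlots(false, float.class, int.class)));
    }
}
```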

To operate on these variables, they must first be loaded (pushed) onto the operand stack. The first instruction, fload_1, pushes the float at position 1 onto the operand stack, and the second instruction, iload_2, pushes the int at position 2. Note the i and f prefixes on these instructions: they show that Java bytecode instructions are strongly typed. If the type of an operand does not match the type the instruction expects, the VM rejects the bytecode as unsafe. Better still, bytecode is designed so that this type-safety check needs to be performed only once, when the class is loaded.
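The listing also contains an i2f instruction, which converts the int on top of the stack to a float so that the typed fadd instruction receives two floats. At the source level this corresponds to an explicit cast, as the sketch below suggests. The class AddSketch and the method addExplicit are illustrative names; the two methods are expected to compile to the same instruction sequence, but check with javap in your own environment.

```java
class AddSketch {
    // Mixed-type addition: the compiler inserts the int-to-float conversion (i2f).
    float add(float f, int n) {
        return f + n;
    }

    // The same computation with the conversion written out.
    float addExplicit(float f, int n) {
        return f + (float) n;
    }

    public static void main(String[] args) {
        AddSketch sketch = new AddSketch();
        System.out.println(sketch.add(1.5f, 2));         // prints 3.5
        System.out.println(sketch.addExplicit(1.5f, 2)); // prints 3.5
    }
}
```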

How does this type safety enhance security? If an attacker could trick the virtual machine into treating an int as a float, or vice versa, it would be easy to corrupt calculations in a predictable way. If those calculations involve bank balances, the security implications are obvious. Even more dangerous would be tricking the VM into treating an int as an Object reference. In most cases this would simply crash the VM, but the attacker only needs to find one hole. And remember that attackers do not search for such holes by hand: it is quite easy to write a program that generates hundreds of millions of malformed bytecode sequences, looking for the lucky one that compromises the VM.
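To make the danger concrete, the sketch below contrasts a proper int-to-float conversion with a reinterpretation of the same 32 bits as a float. In Java the reinterpretation is only possible through an explicit library call, Float.intBitsToFloat. The class and method names are illustrative.

```java
class TypeConfusionSketch {
    // Value-preserving conversion, which is what the i2f instruction performs.
    static float convert(int n) {
        return (float) n;
    }

    // Reads the raw bit pattern of the int as if it were a float.
    static float reinterpret(int n) {
        return Float.intBitsToFloat(n);
    }

    public static void main(String[] args) {
        int balance = 1000;
        System.out.println("converted:     " + convert(balance));
        System.out.println("reinterpreted: " + reinterpret(balance)); // a tiny value close to zero
    }
}
```

A balance of 1000 that is silently reinterpreted becomes a number very close to zero, which is the kind of corruption the typed instructions rule out.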

Another memory-safety protection in bytecode concerns array operations. The aastore and aaload instructions operate on Java arrays of object references, and they always check array bounds. If the caller goes past the end of the array, these instructions throw an ArrayIndexOutOfBoundsException. Perhaps the most important checks of all apply to branch instructions, such as those whose mnemonics start with if. In bytecode, a branch instruction can only jump to another instruction in the same method. The only ways to transfer control out of a method are to return, throw an exception, or execute an invoke instruction. This not only shuts down many attacks, but also prevents the nasty errors caused by dangling references or stack corruption. If you have ever had a system debugger open your program at a seemingly random location in the code, you are familiar with these errors.
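The bounds check is easy to observe from source code. In the minimal sketch below (the class and method names are illustrative), a store into a String array, which compiles to aastore, is attempted one position past the end of the array.

```java
class BoundsCheckSketch {
    // Returns true if the out-of-range store was rejected by the VM.
    static boolean storePastEndIsRejected() {
        String[] names = new String[2];
        int index = names.length; // one past the last valid index
        try {
            names[index] = "overflow"; // reference-array store: aastore
            return false;              // not reached
        } catch (ArrayIndexOutOfBoundsException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("rejected: " + storePastEndIsRejected()); // prints rejected: true
    }
}
```

The memory next to the array is never touched; the store is refused before it happens.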

One important thing to remember about all these checks is that they are performed by the virtual machine at the bytecode level, not merely by the compiler at the source level. A compiler for a language such as C++ may prevent some of the memory errors discussed above at compile time, but those protections apply only at the source level. The operating system will happily load and execute any machine code, whether it was generated by a careful C++ compiler or by a malicious attacker. Put simply, C++ is object-oriented only at the source level, while Java's object-oriented features extend to the level of compiled code.

Analyze bytecode to improve code quality

The memory and security protections of Java bytecode are at work whether or not we look at them, so why bother examining the bytecode at all? In many cases, knowing how the compiler translates your code into bytecode can help you write more efficient code and, in some cases, prevent hard-to-detect errors. Consider the following example:

// Returns the concatenation of str1 and str2
String concat1(String str1, String str2) {
    return str1 + str2;
}

// Appends str2 to str1
void concat2(StringBuffer str1, String str2) {
    str1.append(str2);
}

Guess how many method calls are required for each method. Compile these methods and run javap. You will get the following output:

Method java.lang.String concat1(java.lang.String, java.lang.String)
0 new #5
3 dup
4 invokespecial #6
7 aload_1
8 invokevirtual #7
11 aload_2
12 invokevirtual #7
15 invokevirtual #8
18 areturn
Method void concat2(java.lang.StringBuffer, java.lang.String)
0 aload_1
1 aload_2
2 invokevirtual #7
5 pop
6 return

The concat1 method performs five costly operations: a new, an invokespecial, and three invokevirtuals. That is considerably more work than concat2, which performs a single invokevirtual. Most Java programmers have been told that String is immutable and that StringBuffer is more efficient for concatenation; analyzing the code with javap makes the point vivid. If you are unsure whether two language constructs perform equally well, use javap to examine the bytecode. Be careful with just-in-time (JIT) compilers, however: a JIT compiler recompiles bytecode into native code and can perform additional optimizations that javap cannot reveal. Unless you have the source code of your VM, you should supplement bytecode analysis with performance benchmarks.
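A benchmark to accompany the bytecode analysis might start out like the sketch below. It is deliberately naive: a single timed run with no warm-up, so the numbers are only a rough indication, and the class and method names are made up for illustration. Keep in mind as well that the bytecode javac emits for the + operator depends on the compiler version, so your javap output may differ from the listing above.

```java
class ConcatBenchmarkSketch {
    // Repeated concatenation with the + operator.
    static String joinWithPlus(String part, int count) {
        String result = "";
        for (int i = 0; i < count; i++) {
            result = result + part;
        }
        return result;
    }

    // The same result built by appending to one StringBuffer.
    static String joinWithBuffer(String part, int count) {
        StringBuffer buffer = new StringBuffer();
        for (int i = 0; i < count; i++) {
            buffer.append(part);
        }
        return buffer.toString();
    }

    public static void main(String[] args) {
        int count = 20_000;
        long start = System.nanoTime();
        String plus = joinWithPlus("x", count);
        long middle = System.nanoTime();
        String buffered = joinWithBuffer("x", count);
        long end = System.nanoTime();
        System.out.println("same result:  " + plus.equals(buffered));
        System.out.println("String +:     " + (middle - start) / 1_000_000 + " ms");
        System.out.println("StringBuffer: " + (end - middle) / 1_000_000 + " ms");
    }
}
```

For measurements you intend to rely on, a dedicated benchmarking harness that handles warm-up and repeated runs is a better choice than hand-written timing code.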

The last example shows how examining bytecode can help prevent errors in a program. Create the following two classes, making sure each is in its own file.

public class ChangeALot {
    public static final boolean debug = false;
    public static boolean log = false;
}

public class EternallyConstant {
    public static void main(String[] args) {
        System.out.println("EternallyConstant beginning execution");
        if (ChangeALot.debug)
            System.out.println("Debug mode is on");
        if (ChangeALot.log)
            System.out.println("Logging mode is on");
    }
}

If you run EternallyConstant, you will get the information:

EternallyConstant beginning execution

Now edit ChangeALot and change the values of both debug and log to true. Recompile only ChangeALot. Run EternallyConstant again and you will see the following output:

EternallyConstant beginning execution
Logging mode is on

What happened to the debug variable? Even if you set debug to true, the message "Debug mode is on" does not appear. The answer is in the bytecode. Run javap on EternallyConstant and you will see:

Method void main(java.lang.String[])
0 getstatic #2
3 ldc #3
5 invokevirtual #4
8 getstatic #5
11 ifeq 22
14 getstatic #2
17 ldc #6
19 invokevirtual #4
22 return

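Surprise! There is an ifeq check on the log field, but the code does not check the debug field at all. Because debug is declared final and initialized with a constant, the compiler treated it as a compile-time constant when it compiled EternallyConstant: it evaluated the test once, found it false, and left the debug branch out of the bytecode. Recompiling only ChangeALot therefore has no effect on that branch.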
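One way to keep a flag from being copied into other classes at compile time is to initialize it with an expression that is not a compile-time constant, such as a method call. The sketch below illustrates the idea; the classes FlagsDemo and Flags and their field names are hypothetical, and whether this trade-off suits a given code base is a judgment call, since the branch is then evaluated at run time.

```java
class FlagsDemo {
    static String describe() {
        StringBuilder message = new StringBuilder("start");
        if (Flags.INLINED_DEBUG)   // value known at compile time, branch may be dropped
            message.append(" inlined-debug");
        if (Flags.RUNTIME_DEBUG)   // read from Flags at run time via getstatic
            message.append(" runtime-debug");
        return message.toString();
    }

    public static void main(String[] args) {
        System.out.println(describe()); // prints start
    }
}

class Flags {
    // Constant expression: the compiler may copy the value into classes that read it.
    static final boolean INLINED_DEBUG = false;

    // Method call: not a compile-time constant, so readers look the field up at run time.
    static final boolean RUNTIME_DEBUG = Boolean.parseBoolean("false");
}
```

Running javap -c on FlagsDemo should show a getstatic and a branch for RUNTIME_DEBUG only, mirroring the log field in the listing above.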
