Java memory allocations and deep parsing of string types

Last Update:2018-05-23 Source: Internet

Author: User


I. Introduction

Among Java's data types, String is a special case and a frequent interview topic. This article uses an in-depth look at Java memory allocation to work through several commonly confused questions about String. The questions covered are listed below; readers already familiar with them can skip this article.

1. What memory does "Java memory" actually refer to? Why is it divided into areas, and how? What is the role of each area, and how is its size set?

2. Why is concatenation with String less efficient than with StringBuffer or StringBuilder? How are StringBuffer and StringBuilder related, and how do they differ?

3. What is a constant in Java? What is the difference between String s = "s" and String s = new String("s")?

This article was compiled from a number of sources. If you find any errors, please point them out.

II. Java memory allocation

1. JVM introduction

The Java Virtual Machine (JVM) is an abstract computer that runs all Java programs. It is the runtime environment of the Java language and one of Java's most attractive features. The JVM defines its own complete abstract architecture, including a processor, stacks, and registers, along with a corresponding instruction set. It hides the details of the underlying operating system platform, so a Java program only needs to be compiled to target code (bytecode) for the JVM and can then run unmodified on multiple platforms.

The job of a runtime JVM instance is to run one Java program. When a Java program starts, a virtual machine instance is created. When the program exits, the instance goes away with it. If you run three Java programs at the same time on one computer, you get three JVM instances, and each program runs in its own instance.

The JVM's architecture consists of several major subsystems and memory areas (the figure from the original article is not available):

Garbage collector (garbage collection): reclaims objects in heap memory that are no longer in use, that is, objects that are no longer referenced.

Class loader subsystem: in addition to locating and importing binary class files, it verifies the correctness of imported classes, allocates and initializes memory for class variables, and helps resolve symbolic references.

Execution engine: executes the instructions contained in the methods of loaded classes.

Runtime data areas (the Java memory allocation area): also called virtual machine memory or Java memory. The virtual machine sets aside a region of the computer's memory to store many things, for example bytecode, other information obtained from loaded class files, objects created by the program, parameters passed to methods, return values, local variables, and so on.
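As a small illustration of the class loader subsystem listed above, the following sketch asks the JVM which loader defined a class. The class name LoaderDemo and the method loaderNameOf are hypothetical names chosen for this example; the only library call used is Class.getClassLoader(), which by convention returns null for classes defined by the bootstrap loader.

```java
// Illustrative sketch only: LoaderDemo and loaderNameOf are made-up names.
class LoaderDemo {
    // Returns a printable name for the loader that defined the given class.
    static String loaderNameOf(Class<?> type) {
        ClassLoader loader = type.getClassLoader();
        // getClassLoader() returns null when the bootstrap loader defined the class.
        return loader == null ? "bootstrap" : loader.getClass().getName();
    }

    public static void main(String[] args) {
        // Core classes such as String come from the bootstrap loader.
        System.out.println("String     -> " + loaderNameOf(String.class));
        // Application classes are normally defined by a different loader.
        System.out.println("LoaderDemo -> " + loaderNameOf(LoaderDemo.class));
    }
}
```

The exact name printed for the application loader depends on the JDK version and on how the program is launched.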

2. Java memory partitions

From the previous section we know that the runtime data areas are what is meant by Java memory. They store many kinds of data, and without partitioning they would be disorganized and hard to manage. Depending on the data being stored, Java memory is typically divided into five areas: the program counter register, the native method stack, the method area, the stack, and the heap.

Program counter register (PC register): The JVM supports multiple threads running concurrently, and each new thread gets its own PC register when it is created. If the thread is executing a Java (non-native) method, the PC register holds the address of the JVM instruction currently being executed; if the method is native, the value of the PC register is undefined. The PC register is wide enough to hold a returnAddress or a native pointer on the specific platform.

Stack (Java stack): The JVM allocates a stack for each newly created thread, and a Java program runs by operating on these stacks. The stack stores a thread's state in frames, and the JVM performs only two operations on it: pushing and popping frames. The method a thread is currently executing is called the thread's current method, and the frame used by the current method is called the current frame. When a thread invokes a Java method, the JVM pushes a new frame onto that thread's Java stack, and this frame becomes the current frame. While the method executes, the frame holds parameters, local variables, intermediate results, and other data. Viewed this way, a stack is a per-thread storage area with last-in, first-out behavior. Its related setting parameter:

-Xss: sets the thread stack size
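The one-frame-per-call behavior described above can be observed from ordinary Java code. The sketch below is only an illustration: FrameDemo, currentDepth, and depthAfter are hypothetical names, and it relies on Thread.getStackTrace(), which a JVM is permitted to truncate in unusual situations.

```java
// Illustrative sketch only: FrameDemo and its methods are made-up names.
class FrameDemo {
    // Number of frames currently reported for the calling thread's stack.
    static int currentDepth() {
        return Thread.currentThread().getStackTrace().length;
    }

    // Calls itself `levels` more times, then reports the depth at the innermost call.
    static int depthAfter(int levels) {
        return levels == 0 ? currentDepth() : depthAfter(levels - 1);
    }

    public static void main(String[] args) {
        int base = depthAfter(0);
        int deeper = depthAfter(5);
        // Each extra nested call pushes one more frame onto this thread's stack.
        System.out.println("extra frames for 5 nested calls: " + (deeper - base));
    }
}
```

Recursion that never terminates keeps pushing frames until the thread's stack is exhausted and a StackOverflowError is thrown; a larger -Xss value raises that limit.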

Native method stack: stores the invocation state of native methods.

Method area: When the virtual machine loads a class file, it parses type information from the binary data in the class file and places that information (class information, constants, static variables, and so on) in the method area. This memory area is shared by all threads. The method area contains a special region called the constant pool, which is closely related to how String values are resolved.

Heap: The Java heap is the largest piece of memory managed by the Java virtual machine and is shared by all threads. Its only purpose is to hold object instances, and almost all object instances are allocated here. A reference held in a local variable, however, is stored on the stack. Therefore, executing String s = new String("s") allocates memory in two places: the String object is allocated in the heap, and the reference to it (the address of the heap object, that is, a pointer) is stored in the stack.
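The distinction between a heap object and the reference that points to it can be sketched in a few lines. ReferenceDemo and compare are hypothetical names used only for this example.

```java
// Illustrative sketch only: ReferenceDemo and compare are made-up names.
class ReferenceDemo {
    // Returns { s == alias, s == other, s.equals(other) }.
    static boolean[] compare() {
        String s = new String("s");      // the object is on the heap; `s` is a local reference
        String alias = s;                // copies the reference, not the object
        String other = new String("s");  // a second heap object with equal contents
        return new boolean[] { s == alias, s == other, s.equals(other) };
    }

    public static void main(String[] args) {
        boolean[] r = compare();
        System.out.println("s == alias      : " + r[0]); // true, same object
        System.out.println("s == other      : " + r[1]); // false, different objects
        System.out.println("s.equals(other) : " + r[2]); // true, same characters
    }
}
```

The == operator compares references, while equals compares contents, which is why the second and third results differ.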

The Java virtual machine has an instruction that allocates new objects on the heap but no instruction that frees memory, just as Java code cannot explicitly release an object. The virtual machine itself decides how and when to release memory occupied by objects that the running program no longer references, and it normally delegates this task to the garbage collector. Its related setting parameters:

-Xms: sets the initial heap size
-Xmx: sets the maximum heap size
-XX:MaxTenuringThreshold: sets the maximum number of collections an object survives in the young generation before it is promoted
-XX:PretenureSizeThreshold: objects larger than the specified size are allocated directly in the old generation
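To see the effect of the heap flags above, a program can ask the runtime for its current limits. This is a minimal sketch; HeapSizeDemo and its method names are hypothetical, and the values printed depend entirely on the JVM, the machine, and the flags used.

```java
// Illustrative sketch only: HeapSizeDemo and its methods are made-up names.
class HeapSizeDemo {
    // Upper bound the JVM will try to use for the heap (roughly what -Xmx controls).
    static long maxHeapBytes() {
        return Runtime.getRuntime().maxMemory();
    }

    // Heap memory currently reserved by the JVM (starts near what -Xms controls).
    static long committedHeapBytes() {
        return Runtime.getRuntime().totalMemory();
    }

    public static void main(String[] args) {
        long mib = 1024L * 1024L;
        System.out.println("max heap       : " + maxHeapBytes() / mib + " MiB");
        System.out.println("committed heap : " + committedHeapBytes() / mib + " MiB");
    }
}
```

Running it once with, for example, -Xms64m -Xmx128m and once without flags should show different numbers.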

The Java heap is the main area managed by the garbage collector and is therefore also called the GC heap (garbage-collected heap). Most current garbage collectors use generational collection, so the Java heap can be subdivided into the young generation and the old generation (the figure from the original article is not available). The idea behind generational collection is to scan and collect the young generation frequently, which is called a minor collection, and to check and collect the old generation much less often, which is called a major collection. This way, not every GC has to examine every object in memory, which leaves more resources for the application. Put another way: when an allocation fails for lack of memory, a young-generation GC (young GC) runs; if that still cannot satisfy the allocation, the entire heap and the method area are collected (full GC).

Readers may have a question here: what about the permanent generation (Permanent Generation)? Does it not belong to the Java heap? Correct, it does not. The permanent generation is the method area described above; more precisely, it was HotSpot's implementation of the method area up to Java 7, and Java 8 replaced it with Metaspace. It stores type information loaded by the class loader (class information, constants, static variables, and so on). This information is long-lived and is not cleaned up by ordinary minor collections, so an application that loads a very large number of classes may run into "PermGen space" errors. Its related setting parameters:

-XX:PermSize: sets the initial size of the perm area
-XX:MaxPermSize: sets the maximum size of the perm area

Young generation: further divided into Eden and the survivor area, and the survivor area is divided into from space and to space. Eden is where objects are initially allocated. By default, from space and to space are the same size. When the JVM performs a minor GC, it copies objects still alive in Eden to the survivor area and copies objects in the survivor area that have survived long enough to the tenured (old) area. The JVM distinguishes from space and to space to make this GC more efficient, because it separates object reclamation from object promotion. Two parameters control the young generation size:

-Xmn: sets the young generation size

-XX:SurvivorRatio: sets the size ratio of Eden to one survivor space

Old generation: When the old generation runs short of space, the JVM performs a major collection on it. If, after a full garbage collection, the survivor and old areas still cannot hold the objects copied from Eden, the JVM cannot make room in Eden for new objects and an out-of-memory error (java.lang.OutOfMemoryError) occurs.
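The generations described above are visible at run time through the standard java.lang.management API. The sketch below lists the memory pools the running JVM reports; MemoryPoolDemo and poolNames are hypothetical names, and the pool names themselves vary with the JVM version and the garbage collector in use.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only: MemoryPoolDemo and poolNames are made-up names.
class MemoryPoolDemo {
    // Collects the names of all memory pools of the given type (HEAP or NON_HEAP).
    static List<String> poolNames(MemoryType type) {
        List<String> names = new ArrayList<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() == type) {
                names.add(pool.getName());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        // Heap pools typically correspond to Eden, survivor, and old-generation spaces.
        System.out.println("heap pools     : " + poolNames(MemoryType.HEAP));
        // Non-heap pools include the method area implementation (PermGen or Metaspace).
        System.out.println("non-heap pools : " + poolNames(MemoryType.NON_HEAP));
    }
}
```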

III. In-depth analysis of the String type

Let's start with Java's data types. They fall into two broad categories: primitive types and reference types. A variable of a primitive type holds the value itself, while a variable of a reference type holds a reference to an object, which is usually the object's memory address. The figure below showed how primitive and reference types are subdivided; it is, of course, only one way of classifying them.

(The original picture is missing)

Three points about the figure are worth noting:

The char type can be treated as its own category, so primitive types are often classified as numeric types, the character type (char), and the boolean type.
The returnAddress type is used internally by the Java virtual machine to implement the finally clause in Java programs.
Where is String? It belongs to the class types under reference types. The rest of this section digs into the String type.

1. The nature of String

Open the String source code and the class comment contains this passage: "Strings are constant; their values cannot be changed after they are created. String buffers support mutable strings. Because String objects are immutable they can be shared." This summarizes one of the most important properties of String: a String is an immutable constant, and it is thread-safe (it can be shared).

Next, the String class is declared with the final modifier, which gives its second property: String cannot be subclassed.

Below are the field definitions of the String class in the JDK source examined here, which show at the implementation level that the value of a String is immutable.

private final char value[];
private final int count;

Now look at the concat method of the String class. The first step of its implementation has to be enlarging the storage behind value, which it does by defining a new, larger character array buf. The second step copies the characters of the original value into buf and then copies the string being appended into buf, so that buf holds the concatenated result. Here is the crux: if value were not final, the method could simply point value at buf and return this, with no need for a new String object. But because value is final, it cannot be pointed at the new array buf. The last statement of concat is "return new String(0, count + otherLen, buf);", which returns a new String object. That is the whole story.

Summary: a String is essentially a character array with two properties: 1. the class cannot be subclassed; 2. its value is immutable.
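The behavior of concat described above can be checked directly. In this sketch, ConcatDemo and join are hypothetical names; the only library call is String.concat.

```java
// Illustrative sketch only: ConcatDemo and join are made-up names.
class ConcatDemo {
    // Thin wrapper so the result of concat can be inspected by callers.
    static String join(String left, String right) {
        return left.concat(right);
    }

    public static void main(String[] args) {
        String original = "my";
        String joined = join(original, "String");
        System.out.println(original);            // my (the original is unchanged)
        System.out.println(joined);              // myString
        System.out.println(joined == original);  // false (concat returned a new object)
    }
}
```

Because the original object never changes, any number of references can share it safely.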

2. How to define a String

Before discussing how to define a String, let's look at the constant pool, which was mentioned earlier in the description of the method area. A slightly more formal definition follows.

The constant pool holds data that is determined at compile time and saved in the compiled .class file. It includes constants from classes, methods, interfaces, and so on, including string constants. The constant pool is also dynamic: new constants can be added to it at run time, and the intern() method of the String class is a typical use of this feature. If this is unclear for now, the intern() method comes up again later in this article. The virtual machine maintains a constant pool for each loaded type. The pool is an ordered set of the constants the type uses, including direct constants (string, integer, and floating-point constants) and symbolic references to other types, fields, and methods. (How symbolic references differ from object references is left for the reader to look up.)
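A short sketch of the intern() method mentioned above: it returns the pooled instance that has the same characters as the string it is called on. InternDemo and check are hypothetical names used only for illustration.

```java
// Illustrative sketch only: InternDemo and check are made-up names.
class InternDemo {
    // Returns { onHeap == literal, onHeap.intern() == literal }.
    static boolean[] check() {
        String literal = "myString";             // refers to the pooled constant
        String onHeap = new String("myString");  // a separate object with the same characters
        return new boolean[] { onHeap == literal, onHeap.intern() == literal };
    }

    public static void main(String[] args) {
        boolean[] r = check();
        System.out.println("onHeap == literal          : " + r[0]); // false
        System.out.println("onHeap.intern() == literal : " + r[1]); // true
    }
}
```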

There are three ways to define a String:

With the keyword new, for example: String s1 = new String("myString");
Directly, with a literal, for example: String s1 = "myString";
By concatenation, for example: String s1 = "my" + "String"; This approach is the most involved, so a few examples of the Java String constant pool problem are given here.
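The three ways listed above can be compared side by side. In this sketch, DefineDemo and compare are hypothetical names. It relies on two language rules: a concatenation of literals is a compile-time constant, while a concatenation involving a non-final variable is computed at run time.

```java
// Illustrative sketch only: DefineDemo and compare are made-up names.
class DefineDemo {
    // Returns { viaNew == literal, folded == literal, runtime == literal, runtime.equals(literal) }.
    static boolean[] compare() {
        String viaNew = new String("myString"); // way 1: always a new object
        String literal = "myString";            // way 2: the pooled constant
        String folded = "my" + "String";        // way 3a: folded by the compiler into "myString"
        String part = "my";                     // not final, so not a constant
        String runtime = part + "String";       // way 3b: built at run time as a new object
        return new boolean[] {
            viaNew == literal, folded == literal, runtime == literal, runtime.equals(literal)
        };
    }

    public static void main(String[] args) {
        boolean[] r = compare();
        System.out.println("new String == literal     : " + r[0]); // false
        System.out.println("\"my\" + \"String\" == literal : " + r[1]); // true
        System.out.println("part + \"String\" == literal : " + r[2]); // false
        System.out.println("contents equal            : " + r[3]); // true
    }
}
```

Declaring part as final would make the last concatenation a compile-time constant as well, and the third comparison would then be true.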

The first way, with the keyword new: at compile time, the compiler checks the string constant pool for "myString". If it is not there, space is set aside in the constant pool to hold "myString"; if it is already there, no new space is needed. This guarantees that the constant pool holds only one "myString" constant, which saves memory. Then space is allocated on the heap for the new String instance, and a slot named s1 is created on the stack whose value is the address of that heap instance; in other words, s1 refers to the newly created String instance. This is the murkiest point: what is the relationship between the new instance on the heap and the "myString" in the constant pool? We will come back to this question after analyzing the second way.

The second way, direct definition: at compile time, the compiler checks the string constant pool for "myString". If it is not there, space is set aside in the constant pool to store "myString"; if it is, no new space is needed. Then a slot named s1 is created on the stack, and its value is the address of "myString" in the constant pool. What is the difference between a string constant in the constant pool and a String object on the heap? And why can a directly defined string also call the methods of a String object?

With these questions in mind, let's discuss the relationship between a String object on the heap and a string constant in the constant pool. Please keep in mind that this is only a discussion, because the author is also unclear on this part.

The first conjecture: because a directly defined string can also call the methods of a String object, we can assume that a String instance (object) is actually created in the constant pool. For String s1 = new String("myString"), a String instance is created in the constant pool at compile time, then a String instance is created on the heap, and the reference s1 points to the heap instance. At this point, nothing references the instance in the pool. When String s1 = "myString" is executed, s1 points directly to the instance in the pool if one already exists; otherwise the instance is first created in the pool and s1 then points to it. (The figure from the original article is not available.)

Under this conjecture, a string constant in the constant pool is itself a String instance, and a String instance on the heap is a clone of it.

The second conjecture is the explanation most often found online, but it is not clearly argued and leaves some questions unexplained. The following draws on an article about the relationship between Java String objects and string constants.

During the resolution phase, the virtual machine encounters the string constant "myString" and looks it up in an internal list of string constants. If it is not found, the virtual machine creates a String object s1 on the heap containing the character sequence [myString], and then saves the character sequence and the corresponding String object in the internal list as a name-value pair ([myString], s1). (The figure from the original article is not available.)

If the virtual machine later encounters the same string constant "myString", it finds the same character sequence in the internal list and returns a reference to the corresponding String object. The key to maintaining this internal list is that any particular character sequence appears in it only once.

For example, with String s2 = "myString", at run time s2 receives s1 from the internal list, so s2 and s1 point to the same String object.

This conjecture has an obvious problem, located in the passage that was marked in red in the original article. The proof is simple: any Java developer should know what the following code prints.

String s1 = new String("myString");
String s2 = "myString";
System.out.println(s1 == s2);

According to the reasoning above, this should print true, but it actually prints false, because s1 points to a String object on the heap and s2 points to a string constant in the constant pool.

Although the second conjecture is not convincing, the article it comes from mentions something, a list of string constants, that may be the key to explaining the problem.

This article has raised three questions and offers only conjectures about them. Readers who know the real answer are invited to help with the analysis. Thank you!

What is the relationship between the new instance on the heap and the "myString" in the constant pool?
What is the difference between a string constant in the constant pool and a String object on the heap?
Why can a directly defined string also call the methods of a String object?

3. How String, StringBuffer, and StringBuilder are related and how they differ

The nature of String was analyzed above. What follows is a brief look at StringBuffer and StringBuilder.

Both StringBuffer and StringBuilder extend the abstract class AbstractStringBuilder, which, like String, defines char[] value and int count, but without the final modifier. We can therefore conclude that String, StringBuffer, and StringBuilder are all essentially character arrays. The difference is that a concatenation on a String returns a new String instance every time, whereas the append method of StringBuffer and StringBuilder returns this. That is why String is not recommended for large numbers of concatenations and StringBuffer or StringBuilder is recommended instead. So when should StringBuffer be used, and when StringBuilder?
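The point that append returns this, so that one builder object is reused across many concatenations, can be sketched as follows. AppendDemo and its methods are hypothetical names used only for this example.

```java
// Illustrative sketch only: AppendDemo and its methods are made-up names.
class AppendDemo {
    // True when append hands back the very builder it was called on.
    static boolean appendReturnsSameBuilder() {
        StringBuilder sb = new StringBuilder("my");
        return sb.append("String") == sb;
    }

    // Joins all parts using a single builder instead of repeated String concatenation.
    static String joinWithBuilder(String[] parts) {
        StringBuilder sb = new StringBuilder();
        for (String part : parts) {
            sb.append(part); // mutates the builder in place, no new String per step
        }
        return sb.toString(); // one String is created at the end
    }

    public static void main(String[] args) {
        System.out.println(appendReturnsSameBuilder());                        // true
        System.out.println(joinWithBuilder(new String[] { "my", "String" })); // myString
    }
}
```

Because append returns the builder itself, calls can also be chained, as in sb.append("a").append("b").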

To see the difference between StringBuffer and StringBuilder, open their source code and compare their append() implementations.

The original article showed two screenshots here (not available): the first was the implementation of append() in StringBuffer, and the second was the implementation in StringBuilder. The difference is obvious at a glance: StringBuffer adds the synchronized modifier to the method, which provides synchronization and makes it usable in a multithreaded environment. The price is lower execution efficiency. Therefore, use StringBuffer for string concatenation when the buffer is shared between threads, and use StringBuilder in a single-threaded environment, where it is more efficient.
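The effect of the synchronized append can be sketched by letting several threads append to one shared StringBuffer. BufferThreadsDemo and appendFromThreads are hypothetical names; the thread and iteration counts are arbitrary values chosen for the example.

```java
// Illustrative sketch only: BufferThreadsDemo and appendFromThreads are made-up names.
class BufferThreadsDemo {
    // Starts `threads` workers that each append `perThread` characters to one shared buffer,
    // waits for all of them, and returns the final length.
    static int appendFromThreads(int threads, int perThread) {
        StringBuffer shared = new StringBuffer();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) {
                    shared.append('x'); // synchronized, so no append is lost
                }
            });
            workers[i].start();
        }
        try {
            for (Thread worker : workers) {
                worker.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }
        return shared.length();
    }

    public static void main(String[] args) {
        System.out.println(appendFromThreads(4, 10000)); // 40000
    }
}
```

Replacing StringBuffer with StringBuilder in this sketch would remove that guarantee: the result could be too short, or an exception could be thrown, because StringBuilder is not designed for concurrent use.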

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
