Java Memory Allocation and an In-Depth Analysis of the String Type

Source: Internet
Author: User


Summary: This article introduces the concept, structure, and allocation mechanism of Java memory. On that basis, it analyzes the String type in depth and explains the behavior of String objects in terms of memory allocation.

Keywords: Java memory, String, StringBuffer, StringBuilder

I. Introduction

Among all the data types in the Java language, String is a special one, and it comes up often in interviews. This article uses Java memory allocation to analyze many of the confusing questions about String. The questions covered are listed below; if you are already comfortable with them, you can skip this article.

1. What is Java memory? Why is it divided into areas, and how? What is the role of each area? How do you set the size of each area?

2. Why is String concatenation less efficient than StringBuffer or StringBuilder? How are StringBuffer and StringBuilder related, and how do they differ?

3. What is a constant in Java? What is the difference between String s = "s" and String s = new String("s")?

This article was written after collecting and summarizing material from several sources. If you find any errors, corrections are welcome!

II. Java Memory Allocation

1. Introduction to the JVM
The Java Virtual Machine (JVM) is an abstract computer on which all Java programs run. It is the runtime environment of the Java language and one of Java's most attractive features. Like a real machine, the JVM defines its own architecture, including a processor, stacks, and registers, together with a corresponding instruction set. The JVM hides the details of the underlying operating system platform, so a Java program only needs to be compiled to the target code (bytecode) that the JVM executes, and it can then run on multiple platforms without modification.
A running JVM instance is responsible for running one Java program. When a Java program starts, a virtual machine instance is born; when the program exits, that instance dies. If three Java programs run on the same computer at the same time, there are three JVM instances, and each program runs in its own instance.
The JVM architecture includes several major subsystems and memory areas:
Garbage collector (Garbage Collection): reclaims objects in heap memory (Heap) that are no longer in use, that is, objects that are no longer referenced.
Class loader subsystem (Classloader Sub-System): besides locating and importing binary class files, it must also verify the correctness of the imported classes, allocate and initialize memory for class variables, and help resolve symbolic references.
Execution engine (Execution Engine): executes the instructions contained in the methods of loaded classes.
Runtime data area (Java Memory Allocation Area): also called virtual machine memory or Java memory. While running, the virtual machine sets aside a region of the computer's memory to store many things, for example bytecode, other information extracted from loaded class files, objects created by the program, parameters passed to methods, return values, and local variables.

2. Java memory areas
As the previous section shows, the runtime data area is what we call Java memory, and a great deal has to be stored in it. Without dividing and managing this memory, it would be a mess; programs like order and dislike disorder. Based on the kind of data stored, Java memory is usually divided into five areas: the program counter register (Program Counter Register), the Java stack (Stack), the native method stack (Native Stack), the method area (Method Area), and the heap (Heap).
Program counter (Program Counter Register): also called the PC register. The JVM supports many threads running at the same time, and each new thread gets its own PC register when it is created. If the thread is executing a Java (non-native) method, the PC register holds the address of the JVM instruction currently being executed. If the method is native, the value of the PC register is undefined. The PC register is wide enough to hold a returnAddress or a native pointer on the specific platform.
Stack (Stack): the Java stack. The JVM allocates a stack for each newly created thread, so a Java program does its work through stack operations. The stack stores the thread's state in frames, and the JVM performs only two operations on it: pushing a frame and popping a frame. The method a thread is currently executing is called the thread's current method, and the frame used by the current method is called the current frame. When a thread invokes a Java method, the JVM pushes a new frame onto the thread's Java stack, and that frame becomes the current frame. While the method executes, the frame holds its parameters, local variables, intermediate results, and other data. From the point of view of Java's allocation mechanism, the stack is a storage area set up for a thread when the operating system creates the process or thread (the thread, in an operating system that supports multithreading), and it behaves in a last-in, first-out manner. Related parameters:

  • -Xss -- sets the stack size of each thread
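The frame-per-call behavior described above can be observed with a minimal sketch. The class StackDepthDemo and its methods are hypothetical names introduced only for illustration. The depth reached depends on the JVM, the size of each frame, and the -Xss setting, so no particular number should be expected.

```java
// Hypothetical demo class: counts how many frames fit on the current thread's stack.
class StackDepthDemo {
    private static int depth = 0;

    // Each call pushes a new frame; the frame would be popped when the call returns.
    private static void recurse() {
        depth++;
        recurse();
    }

    // Recurses until the thread's stack is exhausted and returns the depth reached.
    static int measureDepth() {
        depth = 0;
        try {
            recurse();
        } catch (StackOverflowError e) {
            // The stack is full: no more frames can be pushed for this thread.
        }
        return depth;
    }

    public static void main(String[] args) {
        System.out.println("Frames pushed before StackOverflowError: " + measureDepth());
    }
}
```

Running the same class with a larger stack, for example with -Xss4m, would typically report a greater depth.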

Native method stack (Native Stack): stores the call state of native methods.

Method area (Method Area): when the virtual machine loads a class file, it parses the type information (class information, constants, static variables, and so on) from the binary data in the class file and places it in the method area, which is shared by all threads. The method area contains a special region called the constant pool (Constant Pool), which is closely tied to the analysis of the String type below. Note that since JDK 7 the HotSpot VM stores interned strings in the heap rather than in the method area; the discussion below follows the older layout.

Heap (Heap): the Java heap is the largest piece of memory managed by the JVM and is shared by all threads. Its only purpose is to store object instances: almost all object instances are allocated here, while references to them (for local variables) are allocated on the stack. So executing String s = new String("s") allocates memory in two places: memory for the String object in the heap, and memory for the reference (the heap object's address, that is, a pointer) on the stack.

The JVM has an instruction for allocating new objects on the heap but none for freeing memory, just as Java code cannot explicitly free an object. The virtual machine itself decides how and when to free the memory occupied by objects that the running program no longer references, and it normally delegates this task to the garbage collector (Garbage Collection). Related parameters:

  • -Xms -- sets the initial heap size

  • -Xmx -- sets the maximum heap size

  • -XX:MaxTenuringThreshold -- sets the maximum number of collections an object can survive in the young generation before it is promoted to the old generation

  • -XX:PretenureSizeThreshold -- objects larger than this size are allocated directly in the old generation
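To check what the heap parameters above amount to in a running JVM, a small sketch like the following can help. The class HeapSettingsDemo and its method names are hypothetical; the calls on Runtime and RuntimeMXBean are standard library APIs. The values printed depend entirely on the machine and on the flags passed at startup.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

// Hypothetical demo class: reports heap sizes and the flags the JVM was started with.
class HeapSettingsDemo {
    // Upper bound the JVM will try to use for the heap (influenced by -Xmx).
    static long maxHeapBytes() {
        return Runtime.getRuntime().maxMemory();
    }

    // Heap memory currently reserved by the JVM (starts near -Xms and may grow).
    static long committedHeapBytes() {
        return Runtime.getRuntime().totalMemory();
    }

    // Part of the committed heap that is currently unused.
    static long freeHeapBytes() {
        return Runtime.getRuntime().freeMemory();
    }

    // Command-line arguments passed to the JVM itself, such as -Xms or -Xmx.
    static List<String> jvmArguments() {
        return ManagementFactory.getRuntimeMXBean().getInputArguments();
    }

    public static void main(String[] args) {
        System.out.println("max heap (bytes):       " + maxHeapBytes());
        System.out.println("committed heap (bytes): " + committedHeapBytes());
        System.out.println("free heap (bytes):      " + freeHeapBytes());
        System.out.println("JVM arguments:          " + jvmArguments());
    }
}
```

Starting the program with, say, -Xms64m -Xmx256m and comparing the output with a run without flags is a quick way to see their effect.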


The Java heap is the main area managed by the garbage collector, so it is also called the garbage-collected heap (Garbage Collected Heap). Today's collectors are mostly based on generational collection, so the Java heap can be subdivided into a young generation (Young Generation) and an old generation (Old Generation). The idea of generational collection is to scan and reclaim young objects at a high frequency, which is called a minor collection, and to check old objects (the old generation) much less often, which is called a major collection. This way, the collector does not need to examine every object in memory on each GC, which leaves more system resources for the application. When an allocation fails for lack of memory, the JVM first runs a GC on the young generation (Young GC). If that still does not free enough space, it runs a GC on the entire heap and the method area (Full GC).
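How often the minor and major collections described above actually run can be inspected through the standard management beans. The sketch below uses the hypothetical class name GcReportDemo. The collector names it prints depend on the JVM and on which collector is selected, so they are not listed here.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class: lists the garbage collectors active in this JVM.
class GcReportDemo {
    // Returns the names of the collectors the JVM exposes.
    static List<String> collectorNames() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(gc.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // getCollectionCount() and getCollectionTime() return -1 when undefined.
            System.out.println(gc.getName()
                    + ": collections=" + gc.getCollectionCount()
                    + ", time(ms)=" + gc.getCollectionTime());
        }
    }
}
```

With a generational collector, one bean usually corresponds to young-generation collections and another to old-generation or full collections.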

Readers may ask: what about the permanent generation (Permanent Generation)? Isn't that part of the Java heap? Good question. The so-called permanent generation is in fact the method area mentioned above. It stores the type information (class information, constants, static variables, and so on) loaded by the class loaders. This information has a long life cycle, and GC rarely reclaims PermGen space while the program is running. As a result, an application that loads a very large number of classes may run out of PermGen space. The permanent generation exists in the HotSpot VM up to JDK 7; JDK 8 replaced it with Metaspace, and the parameters below no longer apply there. Related parameters:

  • -XX:PermSize -- sets the initial size of the permanent generation

  • -XX:MaxPermSize -- sets the maximum size of the permanent generation

Young generation (Young Generation): it can be further divided into the Eden space and the Survivor space, and the Survivor space is divided into From Space and To Space. Eden is where objects are initially allocated. By default, From Space and To Space are the same size. During a minor GC, the JVM copies the surviving objects in Eden to a Survivor space, and copies objects that have survived long enough in the Survivor space to the Tenured (old) area. To make this kind of GC efficient, the JVM splits the Survivor space into From Space and To Space so that reclaiming objects and promoting objects can be handled separately. Two parameters control the size of the young generation:

  • -Xmn -- sets the size of the young generation

  • -XX:SurvivorRatio -- sets the ratio of the Eden space to one Survivor space
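The areas described above (Eden, Survivor, old generation, and the non-heap areas) can be listed at run time through the standard MemoryPoolMXBean API. The following is a sketch only, with the hypothetical class name MemoryPoolDemo. Pool names differ between JVM versions and collectors, for example "PS Eden Space", "G1 Eden Space", or "Metaspace".

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;

// Hypothetical demo class: lists the memory pools of the running JVM.
class MemoryPoolDemo {
    // Counts the pools of the given type (HEAP or NON_HEAP).
    static int countPools(MemoryType type) {
        int count = 0;
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() == type) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            MemoryUsage usage = pool.getUsage(); // may be null if the pool is no longer valid
            long used = (usage == null) ? -1 : usage.getUsed();
            System.out.println(pool.getType() + "  " + pool.getName() + "  used(bytes)=" + used);
        }
    }
}
```

Changing -Xmn or -XX:SurvivorRatio and rerunning the program shows how the young-generation pools are resized.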

Old generation (Old Generation): when the old generation runs out of space, the JVM performs a major collection on it. If, after a full collection, the Survivor and old areas still cannot hold the objects copied out of Eden, the JVM cannot make room in Eden for new objects, and an OutOfMemoryError ("out of memory") occurs.


III. In-Depth Analysis of the String Type

Let's start with Java's data types. Java data types can be classified in several ways; one common way divides them into primitive types and reference types. A variable of a primitive type holds the value itself, while a variable of a reference type holds a reference to an actual object, and its value is usually the object's memory address. The original article illustrated the subdivisions of primitive and reference types with a figure; it is only one of several possible classifications.

Three points to note about that classification:

  • The char type can be treated as a category of its own. Many classifications split the primitive types into numeric types, the character type (char), and the boolean type.

  • The returnAddress type is used internally by the JVM to implement finally clauses in Java programs.

  • Where is the String type? It is a class type, under the reference types. The exploration of the String type starts below!
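The difference between a primitive variable, which holds the value itself, and a reference variable, which holds the address of an object, can be illustrated with a short sketch. The class ValueVsReferenceDemo and its method names are hypothetical.

```java
// Hypothetical demo class: contrasts copying a primitive with copying a reference.
class ValueVsReferenceDemo {
    // Copying a primitive copies the value; changing the copy leaves the original alone.
    static int copyPrimitive() {
        int a = 1;
        int b = a;
        b = 2;
        return a; // still 1
    }

    // Copying a reference copies the address; both variables see the same object.
    static int copyReference() {
        int[] a = {1};
        int[] b = a;
        b[0] = 2;
        return a[0]; // now 2, because a and b refer to the same array
    }

    public static void main(String[] args) {
        System.out.println("primitive original after copy changed: " + copyPrimitive());
        System.out.println("object seen through original reference: " + copyReference());
    }
}
```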


1. The essence of String
Open the source code of String and the class comment contains this passage: "Strings are constant; their values cannot be changed after they are created. String buffers support mutable strings. Because String objects are immutable they can be shared." It sums up one of the most important features of String: a String is an immutable constant and can therefore be shared.
Next, the String class is declared with the final modifier, which gives its second feature: the String class cannot be subclassed.
Below are the member variable declarations of the String class, as found in older JDK sources. They make clear at the implementation level that a String's value is immutable.
private final char value[];
private final int count;
With that in mind, look at the concat method of the String class. Its first step is to grow the storage: it allocates a new, larger character array buf. Its second step is to copy the characters of the original value into buf and then copy the characters of the string being appended after them, so that buf holds the concatenated result. Here is the key point. If value were not final, the method could simply point value at buf and return this, with no need to return a new String object. Unfortunately, because value is final, it cannot be pointed at the newly allocated array buf. So what happens? "return new String(0, count + otherLen, buf);" is the last statement of String's concat method: it returns a new String object. Now the whole picture is clear!

Summary: a String is essentially a character array with two features: 1. the class cannot be subclassed; 2. it is immutable.
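The effect of immutability on concat can be checked directly. The sketch below uses the hypothetical class name ImmutableStringDemo. It relies only on documented behavior of String.concat, including the documented special case that an empty argument returns the same object.

```java
// Hypothetical demo class: shows that concat never modifies the original String.
class ImmutableStringDemo {
    // The original string keeps its value after concat is called on it.
    static boolean originalUnchanged() {
        String a = "my";
        a.concat("String"); // result deliberately ignored
        return a.equals("my");
    }

    // concat with a non-empty argument returns a different object.
    static boolean concatReturnsNewObject() {
        String a = "my";
        String b = a.concat("String");
        return b != a && b.equals("myString");
    }

    // concat with an empty argument returns the same object (documented special case).
    static boolean emptyConcatReturnsSameObject() {
        String a = "my";
        return a.concat("") == a;
    }

    public static void main(String[] args) {
        System.out.println("original unchanged:         " + originalUnchanged());
        System.out.println("concat returns new object:  " + concatReturnsNewObject());
        System.out.println("empty concat returns this:  " + emptyConcatReturnsSameObject());
    }
}
```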


2. How Strings are defined
Before discussing how a String is defined, let's look at the concept of the constant pool. It was mentioned earlier in the description of the method area; here is a more formal definition.
The constant pool holds data that is determined at compile time and saved in the compiled .class file. It includes the constants of classes, methods, interfaces, and fields, as well as string constants. The constant pool is also dynamic: new constants can be added to it at run time, and the intern() method of the String class is a typical use of this feature. The virtual machine maintains a constant pool for each loaded type. The pool is an ordered set of the constants used by that type, including direct constants (string, integer, and floating-point constants) and symbolic references to other types, fields, and methods (how these differ from object references is left for the reader to explore).
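The dynamic side of the pool mentioned above can be seen with intern(). In this sketch the class InternDemo and the helper buildAtRuntime are hypothetical names. The strings are assembled with StringBuilder so that they are created at run time rather than taken from compile-time literals.

```java
// Hypothetical demo class: shows how intern() maps equal strings to one pooled object.
class InternDemo {
    // Builds a new String object at run time; it is not taken from the pool.
    static String buildAtRuntime(String left, String right) {
        return new StringBuilder().append(left).append(right).toString();
    }

    // Two strings built at run time are equal in content but are distinct objects.
    static boolean distinctBeforeIntern() {
        String first = buildAtRuntime("run", "time");
        String second = buildAtRuntime("run", "time");
        return first != second && first.equals(second);
    }

    // intern() returns the single pooled representative for that character sequence.
    static boolean sameAfterIntern() {
        String first = buildAtRuntime("run", "time");
        String second = buildAtRuntime("run", "time");
        return first.intern() == second.intern();
    }

    public static void main(String[] args) {
        System.out.println("distinct objects before intern: " + distinctBeforeIntern());
        System.out.println("same object after intern:       " + sameAfterIntern());
    }
}
```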

There are three ways to define a String:

  • With the keyword new, for example: String s1 = new String("myString");

  • Directly with a literal, for example: String s1 = "myString";

  • By concatenation, for example: String s1 = "my" + "String"; this case is more involved and is not covered here. For more information, see the Java String constant pool examples.
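For the third way, a short sketch may help show why it is more involved. When every operand is a compile-time constant, the compiler folds the concatenation into a single pooled literal; when an operand is an ordinary variable, the concatenation happens at run time and produces a new object. The class ConcatConstantDemo and its method names are hypothetical.

```java
// Hypothetical demo class: compile-time versus run-time string concatenation.
class ConcatConstantDemo {
    // Both operands are literals, so the compiler folds them into "myString".
    static boolean literalConcatIsPooled() {
        String s1 = "my" + "String";
        String s2 = "myString";
        return s1 == s2; // true: both refer to the pooled literal
    }

    // 'part' is not final, so the concatenation is evaluated at run time.
    static boolean runtimeConcatIsPooled() {
        String part = "my";
        String s1 = part + "String";
        return s1 == "myString"; // false: s1 is a newly created object
    }

    // A final variable initialized with a literal is itself a compile-time constant.
    static boolean finalVariableConcatIsPooled() {
        final String part = "my";
        String s1 = part + "String";
        return s1 == "myString"; // true: folded by the compiler
    }

    public static void main(String[] args) {
        System.out.println("literal + literal pooled:        " + literalConcatIsPooled());
        System.out.println("variable + literal pooled:       " + runtimeConcatIsPooled());
        System.out.println("final variable + literal pooled: " + finalVariableConcatIsPooled());
    }
}
```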

The first way, definition with the keyword new, works as follows. At compile time, the compiler first checks whether "myString" already exists in the String constant pool. If it does not, space is set aside in the constant pool to store "myString"; if it does, no new space is needed, which guarantees that the pool holds only one "myString" constant and saves memory. Then space is allocated in the heap for the new String instance, and space is allocated on the stack for a variable named s1 whose value is the memory address of the String instance in the heap; in other words, s1 points to the new String instance. Here is the murkiest point of all: what is the relationship between the new instance in the heap and the "myString" in the constant pool? We will return to this question after analyzing the second way.

The second way, direct definition, works as follows. At compile time, the compiler first checks whether "myString" already exists in the String constant pool. If it does not, space is set aside in the constant pool to store "myString"; if it does, no new space is needed. Then space is allocated on the stack for a variable named s1 whose value is the memory address of "myString" in the constant pool. This raises two questions: what is the difference between a String constant in the constant pool and a String object in the heap, and why can the methods of a String object be called on a directly defined String?
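The observable difference between the two ways can be sketched as follows. The class LiteralVsNewDemo and its method names are hypothetical. The sketch only shows what reference comparison reports; it does not settle how the JVM stores the constant internally.

```java
// Hypothetical demo class: compares literal definitions with definitions via new.
class LiteralVsNewDemo {
    // Two identical literals refer to one shared object.
    static boolean literalsShareOneObject() {
        String a = "myString";
        String b = "myString";
        return a == b; // true
    }

    // Each use of new creates a separate object, even for equal content.
    static boolean newCreatesDistinctObjects() {
        String a = new String("myString");
        String b = new String("myString");
        return a != b && a.equals(b); // true
    }

    // A literal is an ordinary String reference, so String methods can be called on it.
    static int literalLength() {
        return "myString".length(); // 8
    }

    public static void main(String[] args) {
        System.out.println("literals share one object:    " + literalsShareOneObject());
        System.out.println("new creates distinct objects: " + newCreatesDistinctObjects());
        System.out.println("length of the literal:        " + literalLength());
    }
}
```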

With these questions in mind, let's discuss the relationship between String objects in the heap and String constants in the constant pool. Keep in mind that this is only a discussion, because I am not entirely clear on it myself.
The first conjecture: because a directly defined String can also call the methods of a String object, a String instance (object) must in fact also be created in the constant pool. For String s1 = new String("myString");, a String instance is first created in the constant pool at compile time, then a copy of it is created and stored in the heap, and the reference s1 points to the instance in the heap; at this point nothing references the instance in the pool. When String s1 = "myString"; is executed and a "myString" instance already exists in the pool, s1 points directly to that instance; otherwise, an instance is created in the pool and s1 points to it.

In other words, this conjecture assumes that a String constant in the constant pool is itself a String instance, and that the String instance in the heap is a copy of it.

The second conjecture is the one most often presented on the Internet, but the reasoning behind it is not clear and it leaves some problems unexplained. It describes the relationship between a Java String object and a String constant as follows.
During the resolution phase, the VM encounters the String constant "myString" and looks it up in an internal String constant list. If it is not found, a String object s1 containing the character sequence [myString] is created in the heap, and the character sequence and the corresponding String object are saved to the internal String constant list as a name-value pair ([myString], s1).

If the VM later encounters the same String constant myString, it finds the same character sequence in the internal list and returns the reference to the corresponding String object. The key property of this internal list is that any given character sequence appears in it only once.
For example, for String s2 = "myString", at run time s2 receives the reference s1 from the internal String constant list, so s2 and s1 point to the same String object.
This conjecture has an obvious problem, which a simple test exposes. Java programmers should know the result of the following code.
String s1 = new String("myString");
String s2 = "myString";
System.out.println(s1 == s2); // by the logic of the conjecture above this would print true, but the actual result is false, because s1 points to a String object in the heap and s2 points to the String constant in the constant pool.

Although this explanation is not fully convincing, the article it comes from mentions a String constant list, which may be the key to explaining the problem.

The answers to the three questions raised in this article remain conjecture. If you know how this really works, please help with the analysis. Thank you!

  • What is the relationship between the new instance in the heap and the "myString" in the constant pool?

  • What is the difference between a String constant in the constant pool and a String object in the heap?

  • Why can the methods of a String object be called on a directly defined String?

3. String, StringBuffer, and StringBuilder: relationships and differences
The essence of String was analyzed above. This section looks at StringBuffer and StringBuilder.

Both StringBuffer and StringBuilder extend the abstract class AbstractStringBuilder. Like String, this abstract class declares char[] value and int count, but without the final modifier. We can therefore conclude that String, StringBuffer, and StringBuilder are all essentially character arrays. The difference shows up in concatenation: String returns a new String instance every time, whereas the append method of StringBuffer and StringBuilder returns this. That is why String is not recommended for large numbers of concatenations, and StringBuffer or StringBuilder is recommended instead. So when should StringBuffer be used, and when StringBuilder?
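The contrast between the two concatenation styles can be sketched as follows. The class AppendDemo and its method names are hypothetical. The sketch shows the behavior (a new String per step versus one reused builder) and does not attempt to measure performance.

```java
// Hypothetical demo class: String concatenation versus StringBuilder.append.
class AppendDemo {
    // Each iteration produces a new String object and copies the old characters.
    static String joinWithString(int n) {
        String result = "";
        for (int i = 0; i < n; i++) {
            result = result + i;
        }
        return result;
    }

    // One builder is reused; append writes into its internal array.
    static String joinWithBuilder(int n) {
        StringBuilder builder = new StringBuilder();
        for (int i = 0; i < n; i++) {
            builder.append(i);
        }
        return builder.toString();
    }

    // append returns the builder itself, which is what allows call chaining.
    static boolean appendReturnsThis() {
        StringBuilder builder = new StringBuilder();
        return builder.append("x") == builder;
    }

    public static void main(String[] args) {
        System.out.println(joinWithString(5));   // 01234
        System.out.println(joinWithBuilder(5));  // 01234
        System.out.println("append returns this: " + appendReturnsThis());
    }
}
```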

To see the difference between StringBuffer and StringBuilder, open their source code and compare the implementations of the append() method.

The two implementations, shown as figures in the original article, differ in one obvious way. StringBuffer adds the synchronized modifier to the method, which makes it safe to use in multi-threaded environments at the cost of lower execution efficiency; StringBuilder has no such modifier. Therefore, use StringBuffer to concatenate strings in a multi-threaded environment, and use StringBuilder in a single-threaded environment, where it is more efficient.
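The multi-threaded case can be sketched with a StringBuffer shared between several threads. The class SharedBufferDemo and its method name are hypothetical. Because each append call on StringBuffer is synchronized, no appended character is lost and the final length is predictable; the same code with StringBuilder carries no such guarantee.

```java
// Hypothetical demo class: several threads append to one shared StringBuffer.
class SharedBufferDemo {
    // Starts the given number of threads, each appending one character repeatedly,
    // and returns the final length of the shared buffer.
    static int appendFromThreads(int threads, int appendsPerThread) {
        StringBuffer shared = new StringBuffer();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < appendsPerThread; i++) {
                    shared.append('x'); // synchronized in StringBuffer
                }
            });
            workers[t].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join(); // wait until every thread has finished appending
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for workers", e);
            }
        }
        return shared.length();
    }

    public static void main(String[] args) {
        // Every append is synchronized, so the length equals threads * appendsPerThread.
        System.out.println("final length: " + appendFromThreads(4, 1000));
    }
}
```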

