Beginner's class: A detailed look at data synchronization in Java multithreaded development

Source: chinaitlab Author: chinaitlab

 

Variables in Java can be divided into local variables and class variables. A local variable is defined inside a method, for example a variable declared in the run method. Each thread works with its own copy of such variables, so they are not shared between threads and do not need to be synchronized. Class variables are defined in the class body, and their scope is the entire class. They can be shared by multiple threads, so access to them must be synchronized.
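The point about local variables can be illustrated with a minimal sketch. The class name LocalVariableDemo and the method sumInParallel are hypothetical names chosen for this illustration, not part of the original article. Each thread keeps its running total in a local variable and writes its result to its own array slot, so no synchronization is needed for the computation itself.

```java
import java.util.Arrays;

// Hypothetical demo class: every worker uses only local variables while computing.
class LocalVariableDemo {

    // Each thread sums 1..limit in a local variable and stores the result
    // in its own slot, so no two threads ever write the same location.
    static int[] sumInParallel(int threadCount, int limit) {
        int[] results = new int[threadCount];
        Thread[] threads = new Thread[threadCount];
        for (int i = 0; i < threadCount; i++) {
            final int slot = i;
            threads[i] = new Thread(() -> {
                int sum = 0; // local variable: private to this thread
                for (int k = 1; k <= limit; k++) {
                    sum += k;
                }
                results[slot] = sum;
            });
            threads[i].start();
        }
        for (Thread t : threads) {
            try {
                t.join(); // after join, the main thread sees the worker's write
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return results;
    }

    public static void main(String[] args) {
        // prints [5050, 5050, 5050, 5050]
        System.out.println(Arrays.toString(sumInParallel(4, 100)));
    }
}
```

Every thread arrives at the same total because none of them can see or disturb another thread's sum.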

 

Data synchronization means that only one thread at a time can access the synchronized class variables. Other threads can access these variables only after the current thread has finished with them. "Access" here means access that includes write operations. If every thread that touches the variables only reads them, data synchronization is generally not required.

 

What happens if access to shared class variables is not synchronized? Let's first look at what the following code does:

package test;

public class MyThread extends Thread {
    public static int n = 0;

    public void run() {
        int m = n;
        // Qualified call: a bare yield() is rejected by Java 14 and later.
        Thread.yield();
        m++;
        n = m;
    }

    public static void main(String[] args) throws Exception {
        MyThread myThread = new MyThread();
        Thread[] threads = new Thread[100];
        for (int i = 0; i < threads.length; i++)
            threads[i] = new Thread(myThread);
        for (int i = 0; i < threads.length; i++)
            threads[i].start();
        for (int i = 0; i < threads.length; i++)
            threads[i].join();
        System.out.println("n = " + MyThread.n);
    }
}

 

One possible result of running the code above is:

 

n = 59

 

Many readers may find this result strange. The program starts 100 threads, and each thread increments the static variable n by 1. The join method is then used to wait until all 100 threads have finished, and the value of n is printed. One would expect the result to be n = 100, yet it is less than 100.

 

The culprit behind this result is the "dirty data" that we so often hear about. The yield() statement in the run method is what triggers it. ("Dirty data" can also occur without the yield statement, but it is much less obvious and tends to show up only when 100 is changed to a larger number. In this example, yield is called to amplify the effect.) The yield method signals that the calling thread is willing to give up the CPU for the moment, so that the CPU has the opportunity to run other threads.

To illustrate how this program produces "dirty data", assume that only two threads are created: thread1 and thread2. Because the start method of thread1 is called first, the run method of thread1 generally runs first. When thread1 executes the first line (int m = n;), the value of n is assigned to m. When it reaches the yield call on the second line, thread1 temporarily stops running. While thread1 is paused, thread2, which has been in the ready state until now, obtains the CPU and starts running. When thread2 executes the first line (int m = n;), n is still 0, because thread1 yielded before writing anything back, so m in thread2 is also 0. Both threads now hold m = 0. After their yield calls return, each adds 1 starting from 0. So no matter which thread finishes first, the final value of n is 1, even though n was assigned once by thread1 and once by thread2. One possible ordering is:

1. thread1: int m = n;   (m is 0, n is 0)
2. thread1: yield();     (thread1 pauses)
3. thread2: int m = n;   (m is 0, n is 0)
4. thread2: yield();     (thread2 pauses)
5. thread1: m++; n = m;  (n is 1)
6. thread2: m++; n = m;  (n is 1 again, so one increment is lost)
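The two-thread scenario above depends on how the scheduler happens to interleave the threads, so it does not reproduce on every run. The following sketch forces that interleaving so the lost update always occurs. It assumes a java.util.concurrent.CountDownLatch as a stand-in for yield(); the class name LostUpdateDemo and the method runOnce are hypothetical names used only for this illustration.

```java
import java.util.concurrent.CountDownLatch;

// Hypothetical demo class: forces both threads to read n before either writes it.
class LostUpdateDemo {
    static int n = 0; // shared and deliberately left unsynchronized

    static int runOnce() {
        n = 0;
        CountDownLatch bothHaveRead = new CountDownLatch(2);
        Runnable task = () -> {
            int m = n;                // read: both threads see 0
            bothHaveRead.countDown();
            try {
                // Stand-in for yield(): wait until the other thread has read too.
                bothHaveRead.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            m++;                      // add 1 to the private copy
            n = m;                    // write back: both threads write 1
        };
        Thread thread1 = new Thread(task, "thread1");
        Thread thread2 = new Thread(task, "thread2");
        thread1.start();
        thread2.start();
        try {
            thread1.join();
            thread2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return n;
    }

    public static void main(String[] args) {
        System.out.println("n = " + runOnce()); // prints n = 1, not n = 2
    }
}
```

Because neither thread can pass the latch until both have read n, each one increments a stale copy of 0, and one of the two increments is always lost.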

 

Someone may ask: if only n++ is used, can "dirty data" still be produced? The answer is yes. But n++ is just one statement, so how can the CPU be handed over to another thread in the middle of it? In fact, that is only how it looks on the surface. Once the Java compiler has translated it into the intermediate language (also called bytecode), n++ is no longer a single statement. Let's see what intermediate language the following Java code is compiled to.

 

Java source code

public void run() { n++; }

The compiled intermediate language code

001 public void run()
002 {
003   aload_0
004   dup
005   getfield
006   iconst_1
007   iadd
008   putfield
009   return
010 }

 

As we can see, the run method contains only the statement n++, but after compilation there are 7 intermediate language statements. We don't need to know what all of them do. Just look at statements 005, 007, and 008. Line 005 is getfield. From its English meaning we can tell that it gets a value, and because there is only one variable, n, it must be getting the value of n. It is not hard to guess that iadd on line 007 adds 1 to that value. You have probably guessed the meaning of putfield on line 008 as well: it writes the incremented value back to the class variable n. At this point you may have another question: executing n++ only needs to add 1 to n, so why does it take so many steps? The answer involves the Java memory model.

 

The Java memory model is divided into a main memory area and a working memory area. Main memory holds all Java instances. That is, after we create an object with new, the object and its member variables are stored in this area, and n in the MyThread class is stored here as well. Main memory is shared by all threads. The working memory area is the thread stack mentioned earlier. It holds the variables defined in the run method and in the methods that run calls, that is, the method variables. When a thread wants to modify a variable in main memory, it does not modify the variable directly. Instead, it copies the variable into the working memory of the current thread, modifies the copy, and then overwrites the value in main memory.

 

With the Java memory model in mind, it is easy to see why n++ is not an atomic operation. It has to go through a copy, add 1, and write-back sequence, much like the sequence simulated in the MyThread class. As you can imagine, if thread1 is interrupted for some reason right after executing getfield, the result will be similar to that of the MyThread class. To solve this problem completely, access to n must be synchronized in some way, so that only one thread can operate on n at a time. This is also known as making the operation on n atomic.
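The article does not say here which synchronization method to use, so the following is only a sketch of two common options, not necessarily the approach the author has in mind: a synchronized block around the read, add, and write-back steps, and java.util.concurrent.atomic.AtomicInteger. The class name SynchronizedCounterDemo, the LOCK object, and the method names are hypothetical.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical demo class showing two ways to make the increment atomic.
class SynchronizedCounterDemo {
    private static final Object LOCK = new Object();
    private static int n = 0; // only accessed while holding LOCK
    private static final AtomicInteger atomicN = new AtomicInteger();

    // Starts threadCount threads running the same task and waits for all of them.
    private static void runAll(int threadCount, Runnable task) {
        Thread[] threads = new Thread[threadCount];
        for (int i = 0; i < threads.length; i++) {
            threads[i] = new Thread(task);
            threads[i].start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
    }

    // Option 1: only one thread at a time may run the read, add, write-back steps.
    static int countWithLock(int threadCount) {
        synchronized (LOCK) { n = 0; }
        runAll(threadCount, () -> {
            synchronized (LOCK) {
                int m = n;
                Thread.yield(); // yielding no longer matters: the lock is still held
                m++;
                n = m;
            }
        });
        synchronized (LOCK) { return n; }
    }

    // Option 2: AtomicInteger performs the whole increment as one atomic step.
    static int countWithAtomic(int threadCount) {
        atomicN.set(0);
        runAll(threadCount, () -> atomicN.incrementAndGet());
        return atomicN.get();
    }

    public static void main(String[] args) {
        System.out.println("n = " + countWithLock(100));   // prints n = 100
        System.out.println("n = " + countWithAtomic(100)); // prints n = 100
    }
}
```

In both variants no thread can observe n between another thread's read and its write-back, so all 100 increments are counted.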

 

Address: http://tech.ccidnet.com/art/3539/20090325/1720555_1.html
