Easily learn multithreading (I) -- the big data era requires multithreading
Facing the demands of big data and high concurrency, how can our systems survive in such a fiercely competitive environment? We cannot avoid writing concurrent programs. From the very beginning of software design, we should consider which parts of a serial program would run more efficiently in parallel, and this brings in multi-task collaboration and data sharing.
The multi-core era has long since replaced the single-core one. Look at the computers of the colleagues around you: at least quad-core processors, to say nothing of workstations and servers. A serial program runs on only one core, which means on a quad-core machine it uses less than a quarter of the CPU, undoubtedly a great waste of resources. Java's support for multithreading therefore provides a solid foundation for making use of multi-core hardware.
So from today on, let's talk about multithreading. I will try my best to use my understanding and experience to explain it and hope it will help you.
Getting started with multithreaded programming is not difficult. What is difficult is using multithreading to solve the complex business requirements that arise in practice. In that process we need to consider concurrency control, data synchronization, data sharing, semaphores, task collaboration, and so on.
What kind of problems count as complicated? In my understanding, even processing tens of thousands of records is already a non-trivial requirement, to say nothing of high concurrency. And once the data reaches terabytes, even the most ordinary query becomes a complex requirement.
Just as the president once described China's national conditions: however tiny a problem is, multiplied by 1.3 billion it becomes a big problem; however huge a problem is, divided by 1.3 billion it becomes very simple.
There are many examples of multithreading in everyday life. I clearly remember a text called "Overall Planning Method" from junior high school, which was mainly about making proper use of time. It gave this example: to take a bath, we first fetch water and then boil it. While the water is boiling, we can either stand there waiting for it to boil, or we can use that time to prepare for the bath, for example by finding and washing the basin. Once the water is boiled, we can bathe right away.
In this process, the author stressed that we should be people who know how to manage our lives: there is no need to stand around waiting for the water to boil when that waiting time can be spent on preparation. In effect, the author was also giving a first lesson to those of us who would later learn multithreading. As the old saying goes, those who do not know how to live do not know how to learn.
Let's look at the example again: boiling the water and taking the bath are serial, that is, one must finish before the other can start. But while waiting for the water to boil, work can be done in parallel: we can make other preparations for the bath, such as finding a basin and getting the soap and shampoo ready. Parallel processing here is also asynchronous processing; in fact, asynchrony and concurrency are inseparable, because it is asynchronous execution that allows the tasks to proceed in parallel.
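To make the boiling-water example concrete, here is a minimal Java sketch of the same idea: the water boils on its own thread while the main thread prepares the basin, and the bath waits for both to finish. The class name and the sleep() durations are only stand-ins for the real work.

// A minimal sketch of the "boil water while preparing" idea.
// The sleep() calls stand in for real work; all names are illustrative only.
public class BathDemo {

    public static void main(String[] args) throws InterruptedException {
        // Boiling the water runs on its own thread (the "parallel" part).
        Thread boilWater = new Thread(() -> {
            try {
                System.out.println("Start boiling water...");
                Thread.sleep(3000);          // pretend boiling takes 3 seconds
                System.out.println("Water is boiled.");
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        boilWater.start();

        // Meanwhile the main thread prepares the basin, soap and shampoo.
        System.out.println("Preparing basin, soap and shampoo...");
        Thread.sleep(1000);                  // pretend preparation takes 1 second
        System.out.println("Preparation done.");

        // Bathing is serial: it cannot start until the water is ready.
        boilWater.join();
        System.out.println("Take a bath.");
    }
}

Running it, the preparation messages appear before "Water is boiled.", showing that the two activities overlap, while join() guarantees the bath starts only after the water is ready.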
I have repeatedly stressed that multithreading is used to deal with complex requirements, and handling those complex problems with multiple threads is genuinely hard. But as the saying goes, nobody becomes a fat man in one bite: we still need to start from the foundation.
It is just like web frameworks: once you have learned Struts1, studying Struts2 and SpringMVC feels familiar and far less complicated.
The same is true of multithreading. With a solid grasp of the basics, the concurrency tools packaged in the JDK no longer look complicated. Moreover, any complicated concurrency requirement met in a business system is composed of simpler sub-problems; master the fundamentals and you can adapt them to whatever comes up.
Multithreading also has its limits. Using it blindly or misusing it will not improve system performance; on the contrary, it can seriously hurt it. Each thread occupies memory of its own, so a large number of threads will compete for memory, and careless handling can easily lead to an out-of-memory error. At the same time, creating and recycling large numbers of threads puts heavy pressure on the GC and prolongs pause times. This touches on JVM tuning and garbage collection mechanisms, which I will introduce in detail later if the opportunity arises.
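One common way to keep the thread count under control is to submit tasks to a fixed-size pool instead of starting a new thread for every task. The sketch below assumes nothing about your business code; the pool size and task count are illustrative only.

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// A minimal sketch: instead of starting one thread per task (which can
// exhaust memory and stress the GC), submit tasks to a fixed-size pool.
public class BoundedPoolDemo {

    public static void main(String[] args) throws InterruptedException {
        // Size the pool roughly to the number of cores; tune for your workload.
        int poolSize = Runtime.getRuntime().availableProcessors();
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);

        for (int i = 0; i < 10_000; i++) {
            final int taskId = i;
            pool.submit(() -> {
                // Stand-in for the real work of one task.
                return taskId * 2;
            });
        }

        pool.shutdown();                          // accept no new tasks
        pool.awaitTermination(1, TimeUnit.MINUTES);
    }
}

Ten thousand tasks are handled here by only a handful of threads, so memory use stays flat no matter how many tasks are submitted.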
From today on, I will talk about several multithreading topics based on my own understanding, hoping it helps fellow travelers on this path.
Generally, how does one increase the data processing speed by using multiple threads?
To its users, multi-threaded "acceleration" is largely an illusion. Strictly speaking, multithreading does not make the work itself faster; it only spreads out the CPU's load. For the speed to genuinely increase, the CPU would have to be improved to accurately recognize the logical dependencies between statements, and that is difficult, or at least looks quite difficult at present.
Take a real-world example: a tire repairman faces customers with different needs. Some drop off their car and will not pick it up until the next day; others need the repair done on the spot so they can get back on the road. If he works single-threaded, the jobs are done strictly one by one, which is extremely uncomfortable for the customer in a hurry: he has to wait for the repairman to finish the car in front of him (whose owner will not even collect it until tomorrow) before his own tire gets fixed. That is unfriendly to the customers being served.
With "multithreading", the repairman handles the urgent tires first, fixing them in a short time, and works on the tires that will be collected tomorrow whenever he is free. The result is that for customers in a hurry the service feels very efficient: they get their tires repaired as soon as they arrive. For customers who do not care about time, nothing gets noticeably slower. By comparison, this creates the illusion that multithreading has improved efficiency.
In fact, look closer: when the repairman puts down the tire he is currently working on to pick up an urgent one, he must remember how far he got with the abandoned tire, set it aside in a safe place, and then take over the urgent job. When he later returns to the set-aside tire, he has to check its state and recall where he left off before he can continue. Total efficiency definitely drops, because he keeps switching and has to remember the state of every set-aside tire, and none of that bookkeeping would be needed if he simply repaired one tire after another. Seen this way, the "multithreaded" style actually reduces efficiency.
The same is true when the CPU serves applications. When a high-priority task arrives, the CPU immediately drops what it is doing to handle it, and only after that task finishes does it return to the lower-priority work. The CPU's overall efficiency definitely goes down, but the user being served gets a better experience: he does not have to wait.
Thread switching itself costs time. With only one or two threads the cost is negligible, but once a large number of threads arrive, the CPU spends a lot of time juggling them: it has to decide which thread has the higher priority, and the more threads there are to judge, the longer that takes. Switching also gets more expensive because the state of every thread has to be saved and restored. The more time spent on thread switching, the less time is left to actually run the program, so efficiency inevitably drops.
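You can get a feel for this by splitting the same fixed amount of CPU-bound work first across a handful of threads and then across a couple of thousand, and comparing the wall-clock times. The sketch below is a rough illustration rather than a rigorous benchmark; the iteration count and thread counts are arbitrary.

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// A rough sketch (not a rigorous benchmark): the same fixed amount of
// CPU-bound work is split across a few threads and across many threads.
// The many-thread run is usually no faster and often slower, because of
// thread creation, scheduling, and context-switch overhead.
public class SwitchingOverheadDemo {

    // Burn some CPU; the loop bound is arbitrary.
    static long busyWork(long iterations) {
        long sum = 0;
        for (long i = 0; i < iterations; i++) {
            sum += i;
        }
        return sum;
    }

    static long runWith(int threads, long totalIterations) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        long perTask = totalIterations / threads;
        long start = System.nanoTime();
        for (int i = 0; i < threads; i++) {
            pool.submit(() -> busyWork(perTask));
        }
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.MINUTES);
        return (System.nanoTime() - start) / 1_000_000;   // elapsed milliseconds
    }

    public static void main(String[] args) throws InterruptedException {
        long total = 4_000_000_000L;
        int cores = Runtime.getRuntime().availableProcessors();
        System.out.println(cores + " threads:    " + runWith(cores, total) + " ms");
        System.out.println("2000 threads: " + runWith(2000, total) + " ms");
    }
}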
A practical problem: processing a table's data with multiple threads
Idea: load part of table A's data into memory at a time, start multiple threads to process it, and keep the processing results in memory; once the results reach a certain number, write them to table B. (A minimal code sketch follows the steps below.)
1: Load data from table A. If the average row size in table A is 1 KB, loading 1000 rows takes about 1 MB of memory; you can size each batch with this kind of calculation. After a batch is loaded, save it into a List, which serves as the source dataset.
2: Start multiple threads that repeatedly fetch records from the source dataset and save the processing results into a result dataset. When the result dataset reaches 1000 records (a number you set yourself), pause the other threads and write the results to table B; after the write, clear the result dataset and continue. When the source dataset has been fully processed, clear it and load the next batch from table A into it, repeating until table A has no more data.
3: When all data has been processed and all threads have finished, if there is still data left in the result dataset, write it to table B as well, and then write the data to table C.
4: Clear the table.
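Below is a minimal Java sketch of the pipeline described in steps 1 through 3. The database is replaced by stub methods (loadBatchFromA and writeBatchToB are hypothetical names; a real implementation would use JDBC or similar), and rows are plain strings, so the sketch only shows the batching, the shared result buffer, and the flush logic. As a simplification, instead of pausing all the other threads as in step 2, whichever worker fills the buffer flushes it; the writes are batched in the same way.

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// A minimal sketch of the batch pipeline: load from table A in batches,
// process with a thread pool, buffer results, flush to table B in batches.
public class TablePipelineDemo {

    static final int BATCH_SIZE = 1000;   // rows loaded from table A per batch
    static final int FLUSH_SIZE = 1000;   // results buffered before writing to B

    // Stub: pretend to load the next batch from table A; empty list = no more data.
    static List<String> loadBatchFromA(int batchNo) {
        if (batchNo >= 5) return Collections.emptyList();      // 5 fake batches
        List<String> rows = new ArrayList<>();
        for (int i = 0; i < BATCH_SIZE; i++) {
            rows.add("A-row-" + batchNo + "-" + i);
        }
        return rows;
    }

    // Stub: pretend to write a batch of results into table B.
    static void writeBatchToB(List<String> results) {
        System.out.println("Wrote " + results.size() + " rows to table B");
    }

    // Stand-in for the per-row processing done by the worker threads.
    static String process(String row) {
        return row.toUpperCase();
    }

    public static void main(String[] args) throws Exception {
        int threads = Runtime.getRuntime().availableProcessors();
        ExecutorService pool = Executors.newFixedThreadPool(threads);

        // Result buffer shared by the workers; flushed to B at FLUSH_SIZE.
        List<String> resultBuffer = new ArrayList<>();
        Object lock = new Object();

        int batchNo = 0;
        List<String> source;
        while (!(source = loadBatchFromA(batchNo++)).isEmpty()) {   // step 1: load a batch
            List<Future<?>> futures = new ArrayList<>();
            for (String row : source) {                             // step 2: process in parallel
                futures.add(pool.submit(() -> {
                    String result = process(row);
                    List<String> toWrite = null;
                    synchronized (lock) {
                        resultBuffer.add(result);
                        if (resultBuffer.size() >= FLUSH_SIZE) {
                            toWrite = new ArrayList<>(resultBuffer);
                            resultBuffer.clear();
                        }
                    }
                    if (toWrite != null) {
                        writeBatchToB(toWrite);                     // flush outside the lock
                    }
                }));
            }
            for (Future<?> f : futures) f.get();                    // wait for this batch
        }

        pool.shutdown();
        synchronized (lock) {                                       // step 3: flush the remainder
            if (!resultBuffer.isEmpty()) {
                writeBatchToB(resultBuffer);
            }
        }
    }
}

Note that each worker copies the full buffer and flushes it outside the synchronized block, so the (stubbed) database write does not serialize the processing threads.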