In tumor big-data mining we often need to process text files with tens of billions of lines, frequently hundreds of GB in size. If the file structure is simple and uniform, sed and awk handle them conveniently and quickly. Sometimes, however, the processing logic is more complex, so I usually use Java instead. A plain Java program runs its logic on a single thread by default, though, so on a multi-core lab server it pays to make full use of every core. In that case I recommend splitting the work across several threads that run concurrently (in parallel), which speeds up the computation.
Here is an example of a parallel computation. The example is simple: it accumulates three sums and prints the final results. We run it both single-threaded and multi-threaded. The single-threaded version executes the three loops one after another, while the multi-threaded version starts three threads at the same time (the server has more than three CPU cores, so the threads genuinely run in parallel rather than merely being interleaved concurrently on fewer cores).
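Whether three threads really run in parallel depends on how many cores the JVM can see. A minimal sketch for checking this before choosing a thread count, using the standard Runtime.availableProcessors() call; the class name CoreCheck is hypothetical:

```java
// Minimal sketch: report how many cores the JVM can use before deciding
// how many worker threads to start. CoreCheck is a hypothetical name.
public class CoreCheck {

    // Number of processors available to the JVM (always at least 1).
    public static int cores() {
        return Runtime.getRuntime().availableProcessors();
    }

    public static void main(String[] args) {
        int n = cores();
        System.out.println("Available cores: " + n);
        // Three threads need at least three cores to run truly in parallel;
        // with fewer cores they are time-sliced (concurrent, not parallel).
        System.out.println(n >= 3
                ? "3 threads can run in parallel"
                : "3 threads will share cores concurrently");
    }
}
```

On a shared server or inside a container, the reported number can be lower than the physical core count, so it is a reasonable upper bound for the number of CPU-bound worker threads.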
First, the single-threaded version:
public class NoThreading {
    public static void main(String[] args) {
        long startTime = System.currentTimeMillis();
        int sum_i = 0;
        int sum_j = 0;
        int sum_k = 0;
        for (int i = 0; i < 10000; i++) {
            sum_i += 1;
            /* Inner loop only adds run time; the same below */
            for (int a = 0; a < 100000; a++) {
                String s = "To cost some time";
                String[] ss = s.split("");
            }
        }
        for (int j = 0; j < 10000; j++) {
            sum_j += 2;
            for (int a = 0; a < 100000; a++) {
                String s = "To cost some time";
                String[] ss = s.split("");
            }
        }
        for (int k = 0; k < 10000; k++) {
            sum_k += 3;
            for (int a = 0; a < 100000; a++) {
                String s = "To cost some time";
                String[] ss = s.split("");
            }
        }
        long endTime = System.currentTimeMillis();
        System.out.println(sum_i + "\t" + sum_j + "\t" + sum_k);
        System.out.println("Run Time: " + (endTime - startTime) + "ms");
    }
}
Output:
10000	20000	30000
Run Time: 663587ms
The screenshot shows CPU utilization while the program runs: only one CPU core reaches 100%.
Here is the multithreaded version:
class Count_i {
    public int sum_i = 0;

    public synchronized void count() {
        for (int i = 0; i < 10000; i++) {
            sum_i += 1;
            /* Inner loop only adds run time; the same below */
            for (int a = 0; a < 100000; a++) {
                String s = "To cost some time";
                String[] ss = s.split("");
            }
        }
    }
}

class Count_j {
    public int sum_j = 0;

    public synchronized void count() {
        for (int j = 0; j < 10000; j++) {
            sum_j += 2;
            for (int a = 0; a < 100000; a++) {
                String s = "To cost some time";
                String[] ss = s.split("");
            }
        }
    }
}

class Count_k {
    public int sum_k = 0;

    public synchronized void count() {
        for (int k = 0; k < 10000; k++) {
            sum_k += 3;
            for (int a = 0; a < 100000; a++) {
                String s = "To cost some time";
                String[] ss = s.split("");
            }
        }
    }
}

class Mul_thread_i extends Thread {
    public Count_i c_i;

    public Mul_thread_i(Count_i acc) {
        this.c_i = acc;
    }

    public void run() {
        c_i.count();
    }
}

class Mul_thread_j extends Thread {
    public Count_j c_j;

    public Mul_thread_j(Count_j acc) {
        this.c_j = acc;
    }

    public void run() {
        c_j.count();
    }
}

class Mul_thread_k extends Thread {
    public Count_k c_k;

    public Mul_thread_k(Count_k acc) {
        this.c_k = acc;
    }

    public void run() {
        c_k.count();
    }
}

public class ThreeThreading_save {
    public static void main(String[] args) throws InterruptedException {
        long startTime = System.currentTimeMillis();
        Count_i ci = new Count_i();
        Count_j cj = new Count_j();
        Count_k ck = new Count_k();
        Mul_thread_i aa = new Mul_thread_i(ci);
        Mul_thread_j bb = new Mul_thread_j(cj);
        Mul_thread_k cc = new Mul_thread_k(ck);
        // Start all three threads so they run at the same time
        aa.start();
        bb.start();
        cc.start();
        // Wait for every thread to finish before reading the sums
        aa.join();
        bb.join();
        cc.join();
        System.out.println(ci.sum_i);
        System.out.println(cj.sum_j);
        System.out.println(ck.sum_k);
        long endTime = System.currentTimeMillis();
        System.out.println("Run Time: " + (endTime - startTime) + "ms");
    }
}
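The three Count_x / Mul_thread_x class pairs above differ only in the amount added per iteration. As a sketch (not the original author's code), they could be collapsed into one parameterized Runnable; the names SumTask and runAll and the innerWork parameter are hypothetical, with innerWork standing in for the 100000-iteration busy loop:

```java
// Sketch: one parameterized Runnable replaces the three Count_x / Mul_thread_x
// pairs. SumTask, runAll and innerWork are hypothetical names.
public class SumTask implements Runnable {
    private final int step;        // amount added per outer iteration (1, 2 or 3)
    private final int iterations;  // outer loop count (10000 in the article)
    private final int innerWork;   // busy-work loop count (100000 in the article)
    private long sum = 0;

    public SumTask(int step, int iterations, int innerWork) {
        this.step = step;
        this.iterations = iterations;
        this.innerWork = innerWork;
    }

    @Override
    public void run() {
        for (int i = 0; i < iterations; i++) {
            sum += step;
            // Busy work that only exists to consume CPU time
            for (int a = 0; a < innerWork; a++) {
                String[] ss = "To cost some time".split("");
            }
        }
    }

    // Only read this after the thread running the task has been joined
    public long getSum() {
        return sum;
    }

    // Starts one thread per step value, waits for all, returns the sums in order.
    public static long[] runAll(int[] steps, int iterations, int innerWork)
            throws InterruptedException {
        SumTask[] tasks = new SumTask[steps.length];
        Thread[] threads = new Thread[steps.length];
        for (int t = 0; t < steps.length; t++) {
            tasks[t] = new SumTask(steps[t], iterations, innerWork);
            threads[t] = new Thread(tasks[t]);
            threads[t].start();
        }
        long[] sums = new long[steps.length];
        for (int t = 0; t < steps.length; t++) {
            threads[t].join(); // join() makes the worker's writes visible here
            sums[t] = tasks[t].getSum();
        }
        return sums;
    }

    public static void main(String[] args) throws InterruptedException {
        long start = System.currentTimeMillis();
        long[] sums = runAll(new int[] {1, 2, 3}, 10000, 100000);
        for (long s : sums) {
            System.out.println(s);
        }
        System.out.println("Run Time: " + (System.currentTimeMillis() - start) + "ms");
    }
}
```

Handing a Runnable to new Thread(...) instead of subclassing Thread keeps the work separate from the threading mechanism. No synchronized is needed here because each thread updates only its own SumTask, and reading each sum after join() is what guarantees the main thread sees the worker's final value.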
Here is the output:
10000
20000
30000
Run Time: 221227ms
CPU status: this time three CPU cores reach 100% utilization.
CPU status at idle, for comparison:
To sum up: when we need to process a large workload on a machine with several CPU cores, we can divide the work into a few reasonable parts, start one thread per part so that they run at the same time, wait until all the subtasks have finished, and then let the main thread carry on with the follow-up processing.
In this example the run time dropped from 663587 ms to 221227 ms, roughly a threefold speedup with three threads. Thread safety, of course, is an issue that needs attention; for lack of time, I will describe it in detail later.
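The split / run / wait / combine pattern summarized above can also be written with a thread pool from java.util.concurrent instead of hand-made Thread subclasses. A minimal sketch, assuming the task is summing an array; ChunkedSum and processChunk are hypothetical names, and processChunk stands in for whatever per-record logic a real job would run:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Sketch of split -> run in parallel -> combine using a fixed thread pool.
// ChunkedSum and processChunk are hypothetical names.
public class ChunkedSum {

    // Placeholder per-chunk work: sums data[from, to).
    static long processChunk(long[] data, int from, int to) {
        long s = 0;
        for (int i = from; i < to; i++) {
            s += data[i];
        }
        return s;
    }

    public static long parallelSum(long[] data, int nThreads)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(nThreads);
        try {
            List<Future<Long>> parts = new ArrayList<>();
            int chunk = (data.length + nThreads - 1) / nThreads; // ceiling division
            for (int from = 0; from < data.length; from += chunk) {
                final int f = from;
                final int t = Math.min(from + chunk, data.length);
                Callable<Long> task = () -> processChunk(data, f, t);
                parts.add(pool.submit(task)); // each chunk runs on a pool thread
            }
            long total = 0;
            for (Future<Long> p : parts) {
                total += p.get(); // blocks until that chunk has finished
            }
            return total;
        } finally {
            pool.shutdown(); // let the pool threads exit once work is done
        }
    }

    public static void main(String[] args) throws Exception {
        long[] data = new long[1_000_000];
        for (int i = 0; i < data.length; i++) {
            data[i] = i;
        }
        int n = Runtime.getRuntime().availableProcessors();
        System.out.println(parallelSum(data, n));
    }
}
```

Each task returns its own partial result through a Future, so the workers share no mutable state, and the main thread combines the parts only after Future.get() has returned for each of them.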
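On the thread-safety point: problems start when several threads update the same variable. A small sketch, using the hypothetical class SharedCounter, that contrasts a plain long field (whose ++ is a non-atomic read-modify-write) with java.util.concurrent.atomic.AtomicLong:

```java
import java.util.concurrent.atomic.AtomicLong;

// Sketch: several threads add to ONE shared total. SharedCounter is hypothetical.
public class SharedCounter {
    private long unsafeTotal = 0;                          // plain field: ++ is not atomic
    private final AtomicLong safeTotal = new AtomicLong(); // atomic read-modify-write

    // Returns {unsafeTotal, safeTotal} after all threads have finished.
    public static long[] run(int nThreads, int perThread) throws InterruptedException {
        SharedCounter c = new SharedCounter();
        Thread[] threads = new Thread[nThreads];
        for (int t = 0; t < nThreads; t++) {
            threads[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    c.unsafeTotal++;               // data race: updates can be lost
                    c.safeTotal.incrementAndGet(); // never loses an update
                }
            });
            threads[t].start();
        }
        for (Thread th : threads) {
            th.join();
        }
        return new long[] {c.unsafeTotal, c.safeTotal.get()};
    }

    public static void main(String[] args) throws InterruptedException {
        long[] r = run(4, 1_000_000);
        System.out.println("expected: " + 4_000_000L);
        System.out.println("unsafe:   " + r[0]); // varies between runs
        System.out.println("atomic:   " + r[1]); // always equals the expected total
    }
}
```

The unsafe total varies from run to run and is often lower than expected, while the atomic total is always exact. For heavily contended counters, java.util.concurrent.atomic.LongAdder is a common alternative to AtomicLong.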
Using Java multithreading to improve data processing efficiency