Using Java multithreading to improve data processing efficiency

Source: Internet
Author: User

Mining tumor big data often means processing text files with tens of billions of lines, frequently hundreds of GB in size. When the file structure is simple and uniform, sed and awk handle the job conveniently and quickly. When the processing logic is more complex, I usually turn to Java. However, a Java program runs on a single thread unless it creates more, so on the lab's multi-core server it uses only one core by default. To make full use of every core, it is worth splitting the work across multiple threads that run concurrently (in parallel), which increases processing speed.

Here is an example of parallel computation. It is deliberately simple: accumulate three sums and print the final results. We run it both single-threaded and multithreaded. The single-threaded version executes the three loops sequentially; the multithreaded version starts three threads at once (the server has more than three CPU cores, so the threads truly run in parallel rather than merely concurrently).
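Whether three threads really run in parallel depends on how many cores the JVM can see. A minimal sketch of that check follows; the class name `CoreCheck` and the helper `workerCount` are hypothetical names introduced for illustration, while `Runtime.getRuntime().availableProcessors()` is the standard JDK call.

```java
public class CoreCheck {
    /**
     * Hypothetical helper: start no more workers than there are tasks,
     * no more than there are cores, and always at least one.
     */
    static int workerCount(int tasks, int cores) {
        return Math.max(1, Math.min(tasks, cores));
    }

    public static void main(String[] args) {
        // Number of processors available to this JVM (may change at run time).
        int cores = Runtime.getRuntime().availableProcessors();
        System.out.println("Available cores: " + cores);
        System.out.println("Workers for 3 tasks: " + workerCount(3, cores));
    }
}
```

If the reported core count is below three, the three threads in the example would still run concurrently, but they would share cores and the speedup would be smaller.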

First, the single-threaded version:

public class NoThreading {
    public static void main(String[] args) {
        long startTime = System.currentTimeMillis();
        int sum_i = 0;
        int sum_j = 0;
        int sum_k = 0;
        for (int i = 0; i < 10000; i++) {
            sum_i += 1;
            /* Busy work to lengthen the run time; the same in the loops below */
            for (int a = 0; a < 100000; a++) {
                String s = "To cost some time";
                String[] ss = s.split("");
            }
        }
        for (int j = 0; j < 10000; j++) {
            sum_j += 2;
            for (int a = 0; a < 100000; a++) {
                String s = "To cost some time";
                String[] ss = s.split("");
            }
        }
        for (int k = 0; k < 10000; k++) {
            sum_k += 3;
            for (int a = 0; a < 100000; a++) {
                String s = "To cost some time";
                String[] ss = s.split("");
            }
        }
        long endTime = System.currentTimeMillis();
        System.out.println(sum_i + "\t" + sum_j + "\t" + sum_k);
        System.out.println("Run Time:" + (endTime - startTime) + "ms");
    }
}

Result:

10000    20000    30000
Run Time:663587ms

CPU utilization while the program runs (figure): only one CPU reaches 100% utilization.

Here is the multithreaded version:

class Count_i {
    public int sum_i = 0;

    public synchronized void count() {
        for (int i = 0; i < 10000; i++) {
            sum_i += 1;
            /* Busy work to lengthen the run time; the same in the classes below */
            for (int a = 0; a < 100000; a++) {
                String s = "To cost some time";
                String[] ss = s.split("");
            }
        }
    }
}

class Count_j {
    public int sum_j = 0;

    public synchronized void count() {
        for (int j = 0; j < 10000; j++) {
            sum_j += 2;
            for (int a = 0; a < 100000; a++) {
                String s = "To cost some time";
                String[] ss = s.split("");
            }
        }
    }
}

class Count_k {
    public int sum_k = 0;

    public synchronized void count() {
        for (int k = 0; k < 10000; k++) {
            sum_k += 3;
            for (int a = 0; a < 100000; a++) {
                String s = "To cost some time";
                String[] ss = s.split("");
            }
        }
    }
}

class Mul_thread_i extends Thread {
    public Count_i c_i;

    public Mul_thread_i(Count_i acc) {
        this.c_i = acc;
    }

    public void run() {
        c_i.count();
    }
}

class Mul_thread_j extends Thread {
    public Count_j c_j;

    public Mul_thread_j(Count_j acc) {
        this.c_j = acc;
    }

    public void run() {
        c_j.count();
    }
}

class Mul_thread_k extends Thread {
    public Count_k c_k;

    public Mul_thread_k(Count_k acc) {
        this.c_k = acc;
    }

    public void run() {
        c_k.count();
    }
}

public class ThreeThreading_save {
    public static void main(String[] args) throws InterruptedException {
        long startTime = System.currentTimeMillis();
        Count_i ci = new Count_i();
        Count_j cj = new Count_j();
        Count_k ck = new Count_k();
        Mul_thread_i aa = new Mul_thread_i(ci);
        Mul_thread_j bb = new Mul_thread_j(cj);
        Mul_thread_k cc = new Mul_thread_k(ck);
        aa.start();
        bb.start();
        cc.start();
        aa.join();
        bb.join();
        cc.join();
        System.out.println(ci.sum_i);
        System.out.println(cj.sum_j);
        System.out.println(ck.sum_k);
        long endTime = System.currentTimeMillis();
        System.out.println("Run Time:" + (endTime - startTime) + "ms");
    }
}

Result:

10000
20000
30000
Run Time:221227ms

CPU status while the program runs (figure): three CPUs reach 100% utilization.

CPU status at idle, for comparison (figure).
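The three counter classes and three thread classes in the multithreaded program differ only in the amount added per iteration, so they could be folded into one class. The following is a sketch of that idea, not the original author's code: `ThreeThreadsCompact`, `Counter`, and `runAll` are hypothetical names, and the artificial delay loop is left out to keep it short. Because each counter is touched by only one worker thread and read by the main thread only after `join()`, the sketch also drops `synchronized`.

```java
public class ThreeThreadsCompact {
    /** One counter parameterized by its step (hypothetical replacement for Count_i/j/k). */
    static class Counter implements Runnable {
        private final int step;
        private final int iterations;
        int sum = 0; // written only by the owning thread; read by the caller after join()

        Counter(int step, int iterations) {
            this.step = step;
            this.iterations = iterations;
        }

        @Override
        public void run() {
            for (int i = 0; i < iterations; i++) {
                sum += step;
            }
        }
    }

    /** Starts one thread per step value and returns the sums in the same order. */
    static int[] runAll(int iterations, int... steps) throws InterruptedException {
        Counter[] counters = new Counter[steps.length];
        Thread[] threads = new Thread[steps.length];
        for (int t = 0; t < steps.length; t++) {
            counters[t] = new Counter(steps[t], iterations);
            threads[t] = new Thread(counters[t]);
            threads[t].start();
        }
        int[] sums = new int[steps.length];
        for (int t = 0; t < steps.length; t++) {
            threads[t].join(); // after join(), the worker's writes are visible here
            sums[t] = counters[t].sum;
        }
        return sums;
    }

    public static void main(String[] args) throws InterruptedException {
        int[] sums = runAll(10000, 1, 2, 3);
        System.out.println(sums[0] + "\t" + sums[1] + "\t" + sums[2]);
    }
}
```

Adding a fourth task then only requires passing another step value to `runAll`, rather than writing two more classes.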

To sum up: when we have a large amount of work to process and the computer has more than one CPU, we can divide the work sensibly into several parts, start a thread for each part so they run at the same time, and, once all the subtasks have finished, hand the results back to the main thread for further processing. In this example the run time dropped from 663,587 ms to 221,227 ms, roughly a threefold speedup. Of course, thread safety needs attention; for lack of time, I will cover it in detail in a later post.
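The split, run, and combine pattern summarized here can also be sketched with the JDK's `ExecutorService` instead of hand-written `Thread` subclasses. This is an illustrative sketch under assumptions, not the method used in the article: the class `ChunkedSum`, the method `parallelSum`, and the toy task (summing the integers 1 to n) are all hypothetical. Each subtask accumulates into a local variable and returns its partial result, so the threads share no mutable state and the thread-safety question does not arise for this particular task.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ChunkedSum {
    /** Sums 1..n by splitting the range into `parts` chunks run on a fixed thread pool. */
    static long parallelSum(long n, int parts)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(parts);
        try {
            List<Callable<Long>> tasks = new ArrayList<>();
            long chunk = (n + parts - 1) / parts; // ceiling division
            for (int p = 0; p < parts; p++) {
                final long from = p * chunk + 1;
                final long to = Math.min(n, (p + 1) * chunk);
                // Each subtask works on its own range and its own local accumulator.
                tasks.add(() -> {
                    long partial = 0;
                    for (long v = from; v <= to; v++) {
                        partial += v;
                    }
                    return partial;
                });
            }
            long total = 0;
            // invokeAll blocks until every subtask has finished.
            for (Future<Long> f : pool.invokeAll(tasks)) {
                total += f.get(); // the main thread combines the partial results
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(parallelSum(1_000_000L, 4));
    }
}
```

Returning partial results and combining them in the main thread is one simple way to sidestep shared-state problems; tasks that must update a common structure would need synchronization or concurrent collections instead.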
