Performance and scalability of Java concurrency programming


Reflection on performance

For a given operation, performance is usually limited by the scarcity of one particular resource, the "shortest plank" of the barrel, as the saying goes: CPU clock cycles, memory, network bandwidth, I/O bandwidth, database requests (often the real bottleneck in today's high-concurrency systems), disk space, and so on. When an operation is constrained by a particular resource, we call it resource-"intensive": CPU-intensive, database-intensive, and so on.

Compared with a single thread, multithreading shows its advantage best when multiple CPUs are available. On a single core it can actually cost more, because of coordination between threads, increased context switching, thread creation and destruction, and thread scheduling. If multithreading is used excessively, these costs can make the program perform worse than a single-threaded version.

So before writing a multithreaded program, we should first consider whether the environment the program will run in is actually suited to multiple threads.
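As a first sanity check, the sketch below (the class and method names are mine, not from the article) sizes a thread pool from the number of available processors, since on a single-core machine a large pool mostly buys the context-switching and coordination overhead described above:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class PoolSizing {
    // For CPU-bound work, a pool larger than the CPU count mostly adds the
    // coordination and context-switching costs described above, so size the
    // pool to the machine it actually runs on.
    static ExecutorService newCpuBoundPool() {
        int cpus = Runtime.getRuntime().availableProcessors();
        return Executors.newFixedThreadPool(Math.max(1, cpus));
    }
}
```

On a single-core host this yields a one-thread pool, which degrades gracefully to (nearly) single-threaded execution instead of paying for switches between many runnable threads.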

Performance and scalability

Scalability is a program's ability to increase its throughput or processing capacity as computing resources are added.

When tuning for scalability, we should parallelize the computation as much as possible.

Consider the three-tier program model: a presentation layer, a business logic layer, and a persistence layer, independent of one another. If we merge the three layers into a single application, its raw performance will certainly be higher than that of an application split into layers and distributed across many systems. But at the same time its scalability is reduced.

Once that single system reaches the limit of its own processing power, improving it further becomes very difficult. So we usually favor the design with better scalability.

Evaluating performance tradeoffs

Avoid premature optimization: first make the program correct, then make it fast, and only if it is not already fast enough. As is often said, if there is nothing wrong with the program, don't touch it.


Most optimizations are tradeoffs: for example, spending more memory to reduce latency, or accepting extra overhead in exchange for safety.

Sometimes the trade is not worth it. Before optimizing, ask a few questions:

  • What does "faster" mean here?

  • Under what conditions will this approach actually run faster? Under low load or high load? With large data sets or small ones? Can the answer be verified by measurement?

  • How often do those conditions arise in the actual running environment? Can that answer be verified by measurement?

  • Can this code be used in other environments, under different conditions?

  • Finally, what costs, such as other computing resources, are being sacrificed for this gain, and is the sacrifice worth it?



Don't guess; let measurement be the benchmark.



Amdahl's Law

Given more computing resources, the highest speedup a program can theoretically achieve depends on the proportions of its parallelizable and serial components. Let F be the fraction of the work that must execute serially and N the number of processors; then:

speedup <= 1 / (F + (1 - F) / N)

So the smaller the fraction that must execute serially, the higher the maximum speedup that can be achieved.
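The bound is easy to compute directly. A small sketch (the class and method names are mine):

```java
public class Amdahl {
    // Maximum theoretical speedup on n processors when a fraction f of the
    // work (0 <= f <= 1) must run serially: speedup <= 1 / (f + (1 - f) / n).
    static double maxSpeedup(double f, int n) {
        return 1.0 / (f + (1.0 - f) / n);
    }
}
```

Even a modest serial fraction caps the gain hard: with F = 0.5 the speedup can never exceed 2 no matter how many processors are added, and with F = 0.1 it can never exceed 10.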


No program is fully parallelizable; even the simplest concurrent program has a serial part, such as fetching a runnable task from the work queue.

For example, as the number of threads grows, the throughput of ConcurrentLinkedQueue can be twice or more that of a LinkedList wrapped with Collections.synchronizedList.


This is because in the first queue only the update of the head/tail pointer must execute serially, while in the second the entire insert or delete operation executes serially under one lock.
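The two designs from the comparison above can be sketched side by side (the wrapper class and method names are mine; the article's "synchronizedlinkedlist" is rendered here as a LinkedList behind Collections.synchronizedList):

```java
import java.util.Collections;
import java.util.LinkedList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

public class QueueChoices {
    // Non-blocking queue: only the atomic update of the head/tail pointer is
    // serialized; the rest of each insert/remove proceeds in parallel.
    static Queue<String> nonBlocking() {
        return new ConcurrentLinkedQueue<>();
    }

    // Coarse-locked list: every whole operation (insert, delete, traversal
    // step) is serialized under a single lock.
    static List<String> coarseLocked() {
        return Collections.synchronizedList(new LinkedList<>());
    }
}
```

Under a single thread both behave the same; the difference only appears under contention, where the coarse lock forces each complete operation into the serial fraction of Amdahl's law.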


Context Switches

Context switching is part of thread scheduling. A switch occurs when the number of runnable threads exceeds the number of CPUs: the operating system saves the current thread's execution context and installs the context of the newly scheduled thread. (Anyone who has learned Android development knows Android has an important class called Context; that is an unrelated use of the word.)

When a thread blocks waiting for a contended lock, the JVM suspends it and allows it to be swapped out. If threads block frequently, they cannot use their full scheduling time slice; more context switches occur, scheduling overhead grows, and throughput drops.

On most general-purpose processors, a context switch costs the equivalent of 5,000 to 10,000 clock cycles, that is, a few microseconds.

The vmstat command on Unix systems and the perfmon tool on Windows can both report the number of context switches and the percentage of execution time spent in the kernel. A kernel occupancy above 10% usually indicates very frequent scheduling activity, likely caused by I/O or contention for locks.

Memory Synchronization

The visibility guarantees provided by synchronized and volatile may use special instructions called memory fences (memory barriers). A memory fence inhibits some compiler optimizations: most operations cannot be reordered across it.
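A minimal sketch of the volatile case (the class and field names are mine): the volatile write acts as the fence, preventing the preceding ordinary write from being reordered past it and making both writes visible to a thread that later reads the flag.

```java
public class Flag {
    // The volatile write/read pair provides the memory fence described above:
    // the write to 'ready' publishes the earlier write to 'value', and neither
    // the compiler nor the CPU may reorder the two across the fence.
    private volatile boolean ready = false;
    private int value = 0;

    void publish(int v) {
        value = v;      // ordinary write...
        ready = true;   // ...made visible by the volatile write (fence)
    }

    Integer tryRead() {
        return ready ? value : null;  // volatile read pairs with the write
    }
}
```

Without volatile on ready, a reading thread could observe ready == true but a stale value, exactly the reordering the fence forbids.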

Distinguish uncontended from contended synchronization. Uncontended synchronization has minimal impact on the overall performance of the application; contended synchronization is what hurts throughput, while omitting needed synchronization destroys safety and leads to a very painful debugging process.

Modern JVMs can optimize away locks that can never be contended, removing unnecessary synchronization overhead.

For example:

synchronized (new Object()) {}

A synchronization block like this, on an object no other thread can ever see, is usually optimized away entirely.

There are also optimizations such as lock coarsening:

public String getStoogeNames() {
    List<String> stooges = new Vector<String>();
    stooges.add("ADF");
    stooges.add("ADF");
    stooges.add("ADF");
    return stooges.toString();
}

In the code above, the stooges lock is acquired and released at least four times (once for each add and once for toString()), because every method of Vector is synchronized. A smart runtime compiler can analyze these calls and merge them into a single lock acquisition and release; it may even determine that the result never changes and simply return the result of the first execution thereafter.

The cost of uncontended synchronization is already very small, so we should focus on the places where contention actually occurs.
