Use Java to construct highly scalable applications


http://www.ibm.com/developerworks/cn/java/j-lo-scalbility?S_tact=105agx52&s_cmp=tec-csdn#Resources

 


How to implement an efficient, thread-safe queue


Level: Advanced

Dai Xiaojun (daixiaoj@cn.ibm.com), Software Engineer, IBM China Software Development Center
Gan Zhi (ganzhi@cn.ibm.com), Senior Software Engineer, IBM China Software Development Center
Qi Yao (qiyaoj@cn.ibm.com), Software Engineer, IBM China Software Development Center
Luo Zhida (luozd@cn.ibm.com), Software Engineer, IBM China Software Development Center

October 10, 2008

Now that CPUs have entered the multi-core era, software performance tuning is no longer a simple task. Programs without parallelism may run slower on new hardware than they did before, because as the number of cores increases, chip manufacturers sensibly lower the clock frequency to achieve the best performance-to-power ratio. Writing multi-threaded applications is much simpler for Java programmers than for C/C++ programmers. However, making a multi-threaded program perform well is not easy. Software developers should not be surprised to find during testing that a parallel program is no faster than its serial version. After all, the parallel software development guidelines that were widely accepted before the multi-core era are usually too simplistic and arbitrary.

In this article, we will introduce the general steps to improve the performance of Java multi-threaded applications. With some simple rules provided in this article, we can obtain high-performance and scalable applications.


Why is the performance not growing?

Multi-core hardware can bring a substantial increase in performance, which is easy to observe with simple tests. If we write a multi-threaded program in which each thread accumulates into a local variable, we can easily see throughput multiply as cores and threads are added. That looks very easy, doesn't it? The References section includes an example. In contrast to such tests, however, we seldom see this kind of perfect scalability in real software applications. Two factors stand in the way. First, there are theoretical limits. Second, implementation problems often arise during software development. Consider the three performance curves shown in Figure 1:

Figure 1. Performance Curve


As software engineers who strive for perfection, we hope to see program performance grow linearly as the number of threads increases, that is, the linear curve in Figure 1. What we least want to see is the green curve: no matter how many new CPUs are added, performance does not increase at all. (Curves on which performance declines as CPUs are added also occur in real projects.) The red line in the figure shows that the usual 90-10 rule does not apply to scalability. Suppose 10% of the computation in a program cannot be parallelized; its scaling then follows the red line. As the figure shows, even when 90% of the code parallelizes perfectly, we achieve only about five times the performance. If any part of a task cannot be parallelized, then in the real world our performance curve falls roughly within the gray area in Figure 1.
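The roughly fivefold figure is consistent with Amdahl's law, the usual formula for this kind of theoretical limit. The sketch below is an illustration added here, not code from the article, and the class and method names are hypothetical. It computes the ideal speedup for a program in which a given fraction of the work parallelizes perfectly.

```java
// Illustrative sketch: Amdahl's law, speedup = 1 / ((1 - p) + p / n),
// where p is the parallel fraction of the work and n the number of CPUs.
class AmdahlSpeedup {

    /** Ideal speedup for parallel fraction p (0..1) on n processors. */
    static double speedup(double parallelFraction, int processors) {
        if (parallelFraction < 0.0 || parallelFraction > 1.0 || processors < 1) {
            throw new IllegalArgumentException("need 0 <= p <= 1 and n >= 1");
        }
        double serialFraction = 1.0 - parallelFraction;
        return 1.0 / (serialFraction + parallelFraction / processors);
    }

    public static void main(String[] args) {
        // With 10% serial code the curve flattens quickly as CPUs are added.
        for (int n : new int[] {1, 2, 4, 8, 16, 64}) {
            System.out.printf("%2d CPUs -> %.2fx%n", n, speedup(0.9, n));
        }
    }
}
```

With p = 0.9 the formula gives about 4.7x on 8 processors, and the speedup can never exceed 1 / (1 - p) = 10x however many processors are added.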

In this article, we will not try to challenge the theoretical limits. Instead, we want to explain how a Java programmer can get as close to them as possible, which is not an easy task.

What causes poor scalability?

There are many reasons for poor scalability, the most significant of which is the abuse of locks. This is hardly surprising, because it is how we were taught: "Want thread safety? Add a lock." Think of Python's global interpreter lock and Java's Collections.synchronizedXXX() family of methods. What could be wrong with following the giants? Using locks to protect critical sections is indeed convenient, and it makes correctness easier to ensure. However, a lock also means that only one thread can enter the critical section while the other threads wait. If you observe that the CPUs are idle yet the software runs slowly, it is wise to check how locks are being used.

Java Lock Monitor (JLM), part of Performance Inspector, is a good open-source tool for examining lock usage in Java programs.

Tune a multi-threaded application

Next, we present an example program and demonstrate how to achieve better scalability on a multi-core platform. The example is a hypothetical log server that receives logs from multiple sources and saves them to the file system. For simplicity, the example code contains no network-related code; the main() function starts multiple threads that send log messages to the log server. For impatient readers, let's look at the optimization results first:

Figure 2. Log server optimization results


In Figure 2, the blue curve is the old-style, lock-based log server (LogServerBad), and the green curve is the log server after performance optimization (LogServerGood). You can see that, unlike LogServerBad, the performance of LogServerGood increases linearly as the number of threads increases. If you do not mind using a third-party library, the LockFreeQueue from the Kunming project can provide even better scalability:

Figure 3. Using a lock-free data structure


In Figure 3, the third curve shows the performance after the ConcurrentLinkedQueue from the standard library is replaced with LockFreeQueue. When the number of threads is small, there is little difference between the two curves, but once the number of threads grows beyond a certain point, the lock-free data structure has a clear advantage.
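The article's LogServerBad and LogServerGood sources are part of the download and are not reproduced here. As a rough illustration of the general idea only, the following sketch shows producers handing messages to a ConcurrentLinkedQueue instead of contending for one lock around the output. The class name, the method names, and the in-memory StringBuilder standing in for the log file are all assumptions made for this example.

```java
import java.util.concurrent.ConcurrentLinkedQueue;

// Hypothetical sketch, not the article's LogServerGood. Producers enqueue
// messages without sharing a lock; a single thread drains the queue.
class QueueLogServerSketch {
    private final ConcurrentLinkedQueue<String> pending = new ConcurrentLinkedQueue<>();
    // Stand-in for the log file; touched only by the draining thread.
    private final StringBuilder sink = new StringBuilder();

    /** Called concurrently by producer threads; does not block on a lock. */
    void log(String message) {
        pending.offer(message);
    }

    /** Called by a single writer thread; returns how many messages it wrote. */
    int drain() {
        int written = 0;
        String message;
        while ((message = pending.poll()) != null) {
            sink.append(message).append('\n');
            written++;
        }
        return written;
    }

    /** Starts the producers, waits for them to finish, then drains once. */
    static int runDemo(int producers, int messagesPerProducer) {
        QueueLogServerSketch server = new QueueLogServerSketch();
        Thread[] threads = new Thread[producers];
        for (int i = 0; i < producers; i++) {
            final int id = i;
            threads[i] = new Thread(() -> {
                for (int j = 0; j < messagesPerProducer; j++) {
                    server.log("source-" + id + " message-" + j);
                }
            });
            threads[i].start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting", e);
            }
        }
        return server.drain();
    }

    public static void main(String[] args) {
        System.out.println("messages written: " + runDemo(4, 1000));
    }
}
```

In a real server the writer thread would run alongside the producers and write to a file; the demo drains only after the producers finish to keep the sketch short.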

The following describes the tools and techniques used in the above example to help us create highly scalable Java applications.

Use JLM to analyze applications

JLM provides lock hold time and contention statistics for Java applications and the JVM. It provides the following functions:

  • Counters for contended locks

    • Number of successful lock acquisitions
    • Number of recursive acquisitions
    • Number of times the requesting thread was blocked waiting for the lock
    • Cumulative lock hold time. On platforms that support 3-tier spin locking, the following are also available:
      • Number of times the requesting thread went through the spin loop while requesting the lock
      • Number of times the requesting thread went through the outer loop (thread yield loop) while requesting the lock
  • Use the rtdriver tool to collect more detailed information
    • jlmlitestart: collects counters only
    • jlmstart: collects counters and hold-time statistics
    • jlmstop: stops data collection
    • jlmdump: prints the collected data and continues collecting
  • Garbage collection (GC) time is removed from lock hold times
    • GC time is removed from the hold time of every lock held across a GC cycle

Use AtomicInteger to count

When we implement a counter or random number generator shared by multiple threads, we usually protect the shared variable with a lock. The drawback is that heavy lock contention hurts throughput, because contended synchronization is very expensive.

A volatile variable can hold shared state at a lower cost than synchronization, but it only guarantees that other threads immediately see writes to it; it cannot guarantee that a read-modify-write sequence is atomic. Therefore, a volatile variable alone cannot be used to implement a correct counter or random number generator.

Starting with JDK 5, the java.util.concurrent.atomic package introduces atomic variables, including AtomicInteger, AtomicLong, and AtomicBoolean, as well as the array classes AtomicIntegerArray and AtomicLongArray. Atomic variables provide atomic equivalents of operations such as ++, --, +=, and -=. With these classes, you can implement more efficient counters and random number generators.
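As a minimal sketch of the counter case, the following example increments one AtomicInteger from several threads without any lock. The class and method names are hypothetical and chosen only for this illustration.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative sketch: a counter shared by several threads, with no lock.
class AtomicCounterSketch {
    private final AtomicInteger count = new AtomicInteger();

    /** Atomic equivalent of ++count. */
    int increment() {
        return count.incrementAndGet();
    }

    int get() {
        return count.get();
    }

    /** Runs the given number of threads and returns the final count. */
    static int countWithThreads(int threads, int incrementsPerThread) {
        AtomicCounterSketch counter = new AtomicCounterSketch();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < incrementsPerThread; j++) {
                    counter.increment();
                }
            });
            workers[i].start();
        }
        for (Thread t : workers) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting", e);
            }
        }
        return counter.get();
    }

    public static void main(String[] args) {
        // No increments are lost, unlike with a plain or volatile int field.
        System.out.println(countWithThreads(4, 10_000));
    }
}
```

Replacing the AtomicInteger with a volatile int and `count++` would compile, but concurrent increments could then overwrite each other because the read, add, and write are separate steps.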

Use a lightweight thread pool: Executor

Most concurrent applications are organized around tasks. Often we create a separate thread to execute each task, which causes two problems. First, a large number of threads (more than 100) consumes system resources, increases thread scheduling overhead, and degrades performance. Second, for short-lived tasks, frequently creating and destroying threads is unwise, because the overhead of creating and destroying a thread may exceed the performance benefit of multithreading.

A more reasonable way to use multithreading is a thread pool. java.util.concurrent provides a flexible thread pool implementation: the Executor framework. The framework supports asynchronous task execution with many different execution policies, and it provides a standard way to decouple task submission from task execution, with Runnable as the common description of a task. Executor implementations also provide lifecycle support and hook functions for adding features such as statistics gathering, application management, and monitoring.

Executing tasks on pool threads reuses existing threads instead of creating new ones, which reduces the overhead of thread creation and destruction when many tasks are processed. In addition, a worker thread usually already exists when a task arrives, so the task is not delayed by waiting for a thread to be created, which improves responsiveness. By sizing the thread pool appropriately, you can have enough threads to keep the processors busy while preventing so many threads from competing for resources that the application spends excessive resources on thread management.

The Executor framework provides several useful preconfigured thread pools, which are created by calling the static factory methods of the Executors class:

  • newFixedThreadPool: provides a thread pool with a fixed maximum number of threads.
  • newCachedThreadPool: provides a thread pool with no upper limit on the number of threads.
  • newSingleThreadExecutor: provides a single-threaded executor that guarantees tasks are executed in the order imposed by the task queue (FIFO, LIFO, or priority order).
  • newScheduledThreadPool: provides a thread pool with a fixed maximum number of threads that supports delayed and periodic task execution.
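A short sketch of the first factory method follows. It submits many small tasks to a fixed-size pool and collects the results through Future objects. The class name, method name, and the sum-of-squares workload are assumptions made for illustration; only the java.util.concurrent calls come from the JDK.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Illustrative sketch: many short tasks share a small, fixed set of threads.
class FixedPoolSketch {

    /** Computes 1*1 + 2*2 + ... + n*n, one task per term. */
    static long sumSquares(int n, int poolSize) {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        try {
            List<Future<Long>> results = new ArrayList<>();
            for (int i = 1; i <= n; i++) {
                final long value = i;
                Callable<Long> task = () -> value * value;
                results.add(pool.submit(task)); // submission is decoupled from execution
            }
            long total = 0;
            for (Future<Long> result : results) {
                total += result.get(); // waits until that task has finished
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("task failed", e.getCause());
        } finally {
            pool.shutdown(); // let the worker threads exit once the queue is empty
        }
    }

    public static void main(String[] args) {
        System.out.println(sumSquares(10, 4));
    }
}
```

Tasks this small are dominated by scheduling overhead, so the example shows the API shape rather than a realistic workload.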

Use concurrent data structures

The Collections Framework has brought Java programmers a great deal of convenience, but in the multi-core era it has become somewhat inadequate. Data shared between threads is almost always stored in a data structure such as a map, stack, queue, list, or set. By default, these data structures in the Collections Framework are not thread-safe, which means they cannot safely be accessed by multiple threads at the same time. The JDK provides thread-safe wrappers for these classes through Collections.synchronizedCollection and related methods, but they are implemented with the synchronized keyword, which amounts to putting one global lock around the entire data structure to guarantee thread safety.

java.util.concurrent provides more efficient collections, such as ConcurrentHashMap, ConcurrentLinkedQueue, ConcurrentSkipListMap/ConcurrentSkipListSet, and CopyOnWriteArrayList/CopyOnWriteArraySet. These data structures are designed for concurrent access by multiple threads and use fine-grained locks and newer lock-free algorithms. Besides delivering higher performance under multi-threaded load, they offer atomic operations suited to concurrent applications, such as put-if-absent.
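To make put-if-absent concrete, here is a minimal sketch of a per-key hit counter built on ConcurrentMap.putIfAbsent. The class and method names are hypothetical; the point is that the check and the insertion happen as one atomic step, so two threads racing on a new key end up sharing one counter.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicLong;

// Illustrative sketch: per-key counters without a lock around the whole map.
class HitCounterSketch {
    private final ConcurrentMap<String, AtomicLong> hits = new ConcurrentHashMap<>();

    /** Records one hit for the key and returns the new count. */
    long record(String key) {
        AtomicLong counter = hits.get(key);
        if (counter == null) {
            AtomicLong fresh = new AtomicLong();
            // putIfAbsent returns the value already present, or null if
            // our new counter was the one that got installed.
            AtomicLong existing = hits.putIfAbsent(key, fresh);
            counter = (existing != null) ? existing : fresh;
        }
        return counter.incrementAndGet();
    }

    /** Returns the current count for the key, or 0 if it was never recorded. */
    long get(String key) {
        AtomicLong counter = hits.get(key);
        return (counter == null) ? 0 : counter.get();
    }

    public static void main(String[] args) {
        HitCounterSketch counter = new HitCounterSketch();
        counter.record("/index");
        counter.record("/index");
        System.out.println(counter.get("/index"));
    }
}
```

Writing the same logic with a separate containsKey check followed by put on a synchronized map would need an extra lock held across both calls to be correct.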

Other considerations

Do not put too much pressure on the memory system

If threads need to allocate memory while they run, that in itself is not a problem in Java. Modern JVMs are highly optimized and usually reserve a buffer for each thread, so as long as that buffer is not exhausted, allocation does not touch the global heap. Once the thread-local buffer is used up, however, the JVM has to allocate from the global heap, which usually reduces scalability severely. Pressure on the GC reduces scalability further: although parallel GC is available, its scalability is usually not ideal. If code that runs in a loop needs to allocate temporary objects on every iteration, consider using ThreadLocal and SoftReference to reduce memory allocation.
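One way to combine the two classes is sketched below, under the assumption that the temporary object is a fixed-size byte buffer. Each thread caches its own buffer behind a SoftReference, so the buffer is reused across iterations but can still be reclaimed by the GC when memory runs low. The class name and buffer size are hypothetical.

```java
import java.lang.ref.SoftReference;

// Illustrative sketch: reuse one scratch buffer per thread instead of
// allocating a new temporary array on every iteration.
class ScratchBufferSketch {
    private static final int SIZE = 8192; // assumed size, for illustration only

    private static final ThreadLocal<SoftReference<byte[]>> CACHE = new ThreadLocal<>();

    /** Returns this thread's buffer, allocating it only when necessary. */
    static byte[] buffer() {
        SoftReference<byte[]> ref = CACHE.get();
        byte[] buf = (ref == null) ? null : ref.get();
        if (buf == null) {
            // First use on this thread, or the GC cleared the soft reference.
            buf = new byte[SIZE];
            CACHE.set(new SoftReference<>(buf));
        }
        return buf;
    }

    public static void main(String[] args) {
        byte[] first = buffer();
        byte[] second = buffer();
        // Same array while a strong reference to it is still held.
        System.out.println(first == second);
    }
}
```

The caller must treat the contents as garbage on each use, since the buffer still holds whatever the previous iteration wrote.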

Use ThreadLocal

The ThreadLocal class can hold state that is private to a thread, which is very convenient for some applications, and it generally has a positive effect on scalability: because each thread gets its own private copy of a variable, the threads do not need to synchronize with each other. Note that before JDK 1.6 the ThreadLocal implementation was very inefficient. If you need to use ThreadLocal on JDK 1.5 or earlier, evaluate its performance impact carefully. Similarly, the ReentrantReadWriteLock implementation in JDK 6 is currently also quite inefficient. If you want to improve scalability by exploiting the fact that read locks do not exclude each other, profile first to confirm that it actually helps.
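A common use of ThreadLocal is to give each thread its own instance of a class that is not thread-safe. The sketch below does this for SimpleDateFormat, which a log server might use for timestamps; that use case, the class name, and the pattern are assumptions for illustration. ThreadLocal.withInitial requires Java 8 or later.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

// Illustrative sketch: one SimpleDateFormat per thread, so no synchronization
// is needed even though SimpleDateFormat itself is not thread-safe.
class PerThreadFormatSketch {
    private static final ThreadLocal<SimpleDateFormat> FORMAT =
        ThreadLocal.withInitial(() -> {
            SimpleDateFormat f = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss", Locale.ROOT);
            f.setTimeZone(TimeZone.getTimeZone("UTC"));
            return f;
        });

    /** Formats a timestamp using the calling thread's private formatter. */
    static String format(long epochMillis) {
        return FORMAT.get().format(new Date(epochMillis));
    }

    public static void main(String[] args) {
        System.out.println(format(System.currentTimeMillis()));
    }
}
```

The alternative of sharing one formatter behind a lock would serialize every caller, which is exactly the contention this section tries to avoid.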

Lock granularity is very important

A coarse-grained global lock guarantees thread safety but hurts application performance. Thinking carefully about lock granularity is important when building highly scalable Java applications. When the numbers of CPUs and threads are small, a global lock does not cause fierce contention, so acquiring it is cheap (JVMs optimize for this case). As the numbers of CPUs and threads grow, contention for the global lock becomes increasingly fierce. Apart from the one CPU that holds the lock, every CPU that tries to acquire it cannot continue working and sits idle, so CPU utilization across the system is low and the hardware cannot be used to its full capacity. When we encounter a heavily contended global lock, we can try splitting it into multiple fine-grained locks, each protecting part of the shared resource. Reducing lock granularity reduces contention. java.util.concurrent.ConcurrentHashMap improves the performance of HashMap in multi-threaded applications this way. With the default constructor, ConcurrentHashMap uses 16 locks to protect the whole hash map. Through constructor parameters, users can request thousands of locks, which is equivalent to dividing the hash map into thousands of fragments, each protected by its own lock.
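The idea of splitting one lock into several can be sketched independently of ConcurrentHashMap. In the hypothetical class below, a key's hash selects one of N stripes, and each stripe has its own lock and its own slice of the data, so threads working on different stripes do not block each other. This is an illustration of the technique, not the JDK's implementation.

```java
// Illustrative sketch of lock striping: N locks, each guarding one slice.
class StripedCounterSketch {
    private final Object[] locks;
    private final long[] counts;

    StripedCounterSketch(int stripes) {
        locks = new Object[stripes];
        counts = new long[stripes];
        for (int i = 0; i < stripes; i++) {
            locks[i] = new Object();
        }
    }

    /** Maps a key to a stripe index; the mask keeps the hash non-negative. */
    private int stripeFor(Object key) {
        return (key.hashCode() & 0x7fffffff) % locks.length;
    }

    /** Only threads that hit the same stripe contend with each other. */
    void increment(Object key) {
        int stripe = stripeFor(key);
        synchronized (locks[stripe]) {
            counts[stripe]++;
        }
    }

    /** Sums all stripes; takes each lock in turn, so it is not one atomic snapshot. */
    long total() {
        long sum = 0;
        for (int i = 0; i < locks.length; i++) {
            synchronized (locks[i]) {
                sum += counts[i];
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        StripedCounterSketch counter = new StripedCounterSketch(16);
        for (int i = 0; i < 1000; i++) {
            counter.increment("key-" + i);
        }
        System.out.println(counter.total());
    }
}
```

For ConcurrentHashMap itself, the parameter in question is the concurrencyLevel argument of the three-argument constructor (initialCapacity, loadFactor, concurrencyLevel). Since Java 8 the class no longer uses a fixed set of segment locks and treats that argument only as a sizing hint, so the 16-lock description applies to the older implementations the article was written against.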

Conclusion

Choose a suitable profiling tool and examine the hotspots in its results. Use data structures designed for multi-threaded access, thread pools, and fine-grained locks to reduce those hotspots. Repeat this process to keep improving application scalability.

Building highly scalable Java applications on multi-core hardware is not easy. Reducing contention and synchronization between threads is the key to improving scalability. The common tools and techniques introduced in this article can help, but much still depends on the specific application.




Download

Description: Java program examples used in this article
Name: Javascale.zip
Size: 10 KB
Download method: HTTP

References

  • Download the source code of all projects from the website of the Kunming open-source project.
  • The author's blog has a simple program that measures multi-core computing performance. Although the example is written in C++, its conclusions apply equally to Java programs.
  • See the Python documentation for more information about the global interpreter lock and the reasons for its existence.
  • Download the open-source tool Performance Inspector to observe lock usage.
  • See Brian Goetz's "Java theory and practice: Going atomic" for more information about AtomicInteger.