JVM performance optimizations for improved Java scalability

Last Update:2015-03-21 Source: Internet

Author: User

Tags benchmark


Many programmers approach JVM performance issues by spending a lot of time tuning application-level bottlenecks. If you have been following this series, you will know that I look at the problem more systemically: I have argued that JVM technology itself limits the scalability of Java enterprise applications. First, let's list some of the leading factors.

Mainstream hardware servers provide large amounts of memory.
Distributed systems require large amounts of memory, and the demand continues to grow.
A typical Java application instance uses a heap of approximately 1 GB to 4 GB, far less than what a hardware server can manage and what a distributed application requires. This is referred to as the Java memory wall, illustrated in the figure below, which shows the evolution of memory usage in Java application servers and general Java applications.

Figure 1 Java Memory Wall (1980~2010)


This brings us to the following JVM performance issues:

1) If the application is given too little memory, it will run out of memory. The JVM cannot free memory as fast as the application needs it, and eventually it throws an OutOfMemoryError or shuts down completely. So you have to give the application more memory.

2) If you add memory to a response-time-sensitive application, the Java heap will eventually fragment unless you restart the system or optimize the application. When fragmentation occurs, the application can hang for anywhere from 100 milliseconds to 100 seconds, depending on the application, the size of the Java heap, and other JVM tuning parameters.

Most discussion about pauses focuses on average or target pause times and rarely covers the worst-case pause that occurs when the heap must be compacted. In production, the worst-case pause is approximately 1 second for every gigabyte of live data in the heap.
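The rule of thumb above (roughly one second of worst-case pause per gigabyte of live data) can be turned into a quick estimate. The following is a minimal sketch, not a measurement tool: the class name `PauseEstimate` and the constant are illustrative assumptions, and real pause times depend on the JVM, the collector, and the hardware.

```java
/** Back-of-the-envelope pause estimate based on the rule of thumb quoted above. */
class PauseEstimate {
    // Assumption: about 1 second of worst-case pause per gigabyte of live data.
    static final double SECONDS_PER_LIVE_GB = 1.0;

    /** Estimated worst-case pause, in seconds, for the given live data size. */
    static double worstCaseSeconds(double liveDataGb) {
        if (liveDataGb < 0) {
            throw new IllegalArgumentException("liveDataGb must be non-negative");
        }
        return liveDataGb * SECONDS_PER_LIVE_GB;
    }

    /** Largest live data size, in GB, whose estimated pause fits the budget. */
    static double maxLiveDataGb(double pauseBudgetSeconds) {
        if (pauseBudgetSeconds < 0) {
            throw new IllegalArgumentException("pauseBudgetSeconds must be non-negative");
        }
        return pauseBudgetSeconds / SECONDS_PER_LIVE_GB;
    }

    public static void main(String[] args) {
        double[] liveSetsGb = {2.0, 4.0, 16.0};
        for (double gb : liveSetsGb) {
            System.out.printf("%.0f GB live data: about %.0f s worst-case pause%n",
                    gb, worstCaseSeconds(gb));
        }
    }
}
```

Under this assumption, a pause budget of a few seconds translates directly into a live data set of only a few gigabytes.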

A pause of 2 to 4 seconds is unacceptable for most enterprise applications, so although a Java application instance may need more memory, it is typically allocated only 2 GB to 4 GB. On some 64-bit systems, with extensive JVM tuning for scalability, it is possible to run 16 GB or even 20 GB heaps and still meet typical response-time SLAs. But this is still far from what modern servers and applications call for, and current JVM technology cannot avoid pausing the application while heap compaction is in progress. As a result, Java application developers struggle with two tasks that most of us complain about:

Architecting or modeling deployments as large pools of instances, which leads to complex monitoring and management operations.
Repeatedly tuning the JVM and the application to avoid stop-the-world pauses. The most that programmers can hope for is that a pause does not occur during peak system load. I call this an impossible goal.

Now let's look a little deeper at Java scalability issues.

Over-provisioning or over-instantiation of Java deployments

In order to make full use of memory resources, it is common practice to deploy a Java application across many application server instances rather than on one or a few larger ones. While running 16 application server instances on one server can use all of the available memory, it does not address the cost of monitoring and managing that many instances, especially if your application is deployed on multiple servers.

Another problem is that the memory provisioned for peak load is not needed on most days, which creates a huge waste. In some cases, a physical machine may hold no more than three "large" application server instances. Such deployments are even less economical and less environmentally friendly, especially during off-peak periods.

Let's compare the two deployment architectures. On the left is a deployment of many small application server instances, and on the right is a deployment of a few large application server instances. Both handle the same load. Which deployment architecture is more economical?

Figure 2 Large Application Server deployment scenario

From: Azul Systems

As I said before, concurrent compaction makes the large application server deployment model feasible and can break the limits of JVM scalability. At present, only the Azul Zing JVM provides concurrent compaction, and Zing is a server-side JVM. We would be happy to see more and more developers take on Java scalability issues at the JVM level.

Because performance tuning is still the primary way to address Java scalability problems, let's start by looking at the main tuning parameters and what they can achieve.

Tuning parameters: Some examples

The best-known tuning parameter is "-Xmx", which lets you specify the Java heap size, although the actual result may vary from one JVM implementation to another.

Some JVMs count the memory needed by internal structures (such as compiler threads, garbage collector structures, the code cache, and so on) within the "-Xmx" setting, while others allocate it outside. Therefore, the size of a Java process does not necessarily match the "-Xmx" setting.
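One way to see the gap between the heap setting and the rest of the JVM's memory is to ask the running JVM through the standard `java.lang.management` API. The sketch below is illustrative only: the class name `HeapVersusProcess` is an assumption, and the non-heap figure covers only what the JVM reports through this API, so thread stacks and other native allocations are not included.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;

/** Prints the maximum heap next to the non-heap memory the JVM reports. */
class HeapVersusProcess {
    static long toMiB(long bytes) {
        return bytes / (1024L * 1024L);
    }

    /** Maximum heap the JVM will attempt to use; this roughly tracks the heap size setting. */
    static long maxHeapBytes() {
        return Runtime.getRuntime().maxMemory();
    }

    /** Memory committed outside the Java heap, as reported by the JVM. */
    static long nonHeapCommittedBytes() {
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        MemoryUsage nonHeap = memory.getNonHeapMemoryUsage();
        return nonHeap.getCommitted();
    }

    public static void main(String[] args) {
        System.out.println("Max heap (MiB): " + toMiB(maxHeapBytes()));
        System.out.println("Non-heap committed (MiB): " + toMiB(nonHeapCommittedBytes()));
    }
}
```

Comparing these numbers with the process size reported by the operating system gives a rough idea of how much memory sits outside the configured heap on a particular JVM.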

If your application's object allocation rate, object lifetimes, or object sizes exceed what the JVM's memory configuration can accommodate, an OutOfMemoryError is thrown as soon as the maximum available memory threshold is reached, and the process stops.
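How close a running application is to that threshold can be checked from inside the process with `java.lang.Runtime`. This is a minimal sketch under stated assumptions: the class name `HeapHeadroom` is hypothetical, and a single reading says little on its own because the used figure includes garbage that has not yet been collected.

```java
/** Reports how much of the maximum heap is currently in use. */
class HeapHeadroom {
    /** Fraction of the maximum heap that is used, between 0 and 1. */
    static double usedFraction(long usedBytes, long maxBytes) {
        if (usedBytes < 0 || maxBytes <= 0) {
            throw new IllegalArgumentException("usedBytes must be >= 0 and maxBytes > 0");
        }
        return (double) usedBytes / (double) maxBytes;
    }

    /** Reads the current values from the running JVM. */
    static double currentUsedFraction() {
        Runtime runtime = Runtime.getRuntime();
        // totalMemory() is the heap currently reserved; freeMemory() is the unused part of it.
        long used = runtime.totalMemory() - runtime.freeMemory();
        return usedFraction(used, runtime.maxMemory());
    }

    public static void main(String[] args) {
        System.out.printf("Heap in use: %.1f%% of maximum%n", currentUsedFraction() * 100.0);
    }
}
```

Sampling this value over time under realistic load is more informative than a one-off reading.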

When your application is struggling with memory availability, the most efficient remedy is to restart the process with more memory specified through "-Xmx". To avoid frequent restarts, most enterprise production environments tend to configure the memory required for peak load, which results in over-provisioning.

Tip: Tune for production load

A common mistake Java developers make is to tune heap settings in a lab environment and then forget to readjust them when moving to production. Production and lab environments differ, so remember to readjust the heap settings based on production load.

Generational garbage collector tuning

Other common tuning options are "-Xns" and "-XX:NewSize", which adjust the size of the young generation, the part of the heap dedicated to allocating new objects.

Most developers tune the size of the young generation based on a lab environment, which means there is a risk of failure under production load. The young generation is typically set to about one third to one half of the heap size, but this is not a firm rule, because the right size depends on the application logic. It is better to investigate the actual promotion rate from the young generation to the old generation and the size of the old generation, and on that basis make the young generation as large as possible without causing promotion failures. (A promotion failure is a sign that the old generation is too small. It triggers garbage collection activity that can end in an out-of-memory error.)
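A first step in that investigation is to look at how the heap of a running JVM is actually divided. The sketch below lists the heap memory pools through the standard `java.lang.management` API. The class name `HeapPools` is an assumption, and the pool names it prints depend on the JVM and the collector in use.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;
import java.util.ArrayList;
import java.util.List;

/** Lists the heap memory pools (young and old generation areas) of the running JVM. */
class HeapPools {
    static List<String> describeHeapPools() {
        List<String> lines = new ArrayList<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() != MemoryType.HEAP) {
                continue; // skip code cache, class metadata, and other non-heap pools
            }
            MemoryUsage usage = pool.getUsage();
            if (usage == null) {
                continue; // the pool is no longer valid
            }
            long usedMiB = usage.getUsed() / (1024L * 1024L);
            long max = usage.getMax(); // -1 when the maximum is undefined
            String maxText = max < 0 ? "undefined" : (max / (1024L * 1024L)) + " MiB";
            lines.add(pool.getName() + ": used " + usedMiB + " MiB, max " + maxText);
        }
        return lines;
    }

    public static void main(String[] args) {
        for (String line : describeHeapPools()) {
            System.out.println(line);
        }
    }
}
```

Watching how the old generation pool grows between collections under production-like load gives a rough view of the promotion rate.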

Another young-generation option is "-XX:SurvivorRatio", which sizes the survivor spaces relative to eden and thereby influences how long objects can stay in the young generation before they are moved to the old generation. To set this value correctly, you need to know how often the young generation is collected and be able to estimate how long new objects remain referenced in the application, as well as the allocation rate.
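On HotSpot, the SurvivorRatio value is the ratio of the eden size to the size of one survivor space, and the young generation holds eden plus two survivor spaces. The sketch below works through that arithmetic for the young generation sizes that appear in Listing 1 and Listing 2 later in this article. The class name `SurvivorSizing` is an assumption, and a real JVM aligns these sizes, so actual values can differ slightly.

```java
/** Approximate eden and survivor sizes for a given young generation and SurvivorRatio. */
class SurvivorSizing {
    /** Size of one survivor space: young = eden + 2 * survivor, eden = ratio * survivor. */
    static long survivorMiB(long youngMiB, int survivorRatio) {
        if (youngMiB <= 0 || survivorRatio <= 0) {
            throw new IllegalArgumentException("youngMiB and survivorRatio must be positive");
        }
        return youngMiB / (survivorRatio + 2);
    }

    /** Eden is whatever remains after the two survivor spaces. */
    static long edenMiB(long youngMiB, int survivorRatio) {
        return youngMiB - 2 * survivorMiB(youngMiB, survivorRatio);
    }

    public static void main(String[] args) {
        // Listing 1: 1 GB initial young generation, SurvivorRatio=16.
        System.out.println("ratio 16: eden " + edenMiB(1024, 16)
                + " MiB, each survivor " + survivorMiB(1024, 16) + " MiB");
        // Listing 2: 2 GB young generation, SurvivorRatio=2.
        System.out.println("ratio 2: eden " + edenMiB(2048, 2)
                + " MiB, each survivor " + survivorMiB(2048, 2) + " MiB");
    }
}
```

A large ratio leaves very small survivor spaces, so objects that outlive one young collection are promoted sooner. A small ratio gives them more room to die young.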

Concurrent garbage collection tuning

For pause-sensitive applications, concurrent garbage collection is recommended. Parallel approaches can yield very good throughput benchmark scores, but parallel GC is not conducive to short response times. Concurrent GC is currently the only effective way to achieve consistent behavior with minimal stop-the-world interruptions. Different JVMs provide different settings for concurrent GC. The Oracle JVM (HotSpot) provides "-XX:+UseConcMarkSweepGC", and in the future G1 will become the default concurrent garbage collector for the Oracle JVM.
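Whichever collector is selected, it is worth confirming what the running JVM actually uses. The following sketch lists the active collectors through the standard `java.lang.management` API. The class name `ActiveCollectors` is an assumption, collector names vary between JVMs and versions, and the reported time is an approximate accumulated figure rather than a measure of individual pauses.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

/** Shows which garbage collectors the running JVM has registered. */
class ActiveCollectors {
    static List<String> collectorNames() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(gc.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // getCollectionCount() and getCollectionTime() return -1 when undefined.
            System.out.println(gc.getName() + ": " + gc.getCollectionCount()
                    + " collections, about " + gc.getCollectionTime() + " ms in total");
        }
    }
}
```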

Performance tuning is not a real solution

Perhaps you have noticed that in the discussion above of how to set these parameters "correctly", I deliberately put the word "correctly" in quotation marks. That is because, in my personal experience, there is no correct setting in the strict sense when it comes to performance tuning. Every setting is for a specific scenario. Given that application scenarios change, JVM performance tuning is at best a stopgap measure.

Take the heap setting as an example: if a 2 GB heap can handle 200,000 concurrent users, it may not be able to cope with 400,000 concurrent users.

Or take "-XX:SurvivorRatio": what happens when a setting tuned for a load that grows to 10,000 transactions per millisecond is hit with 50,000 transactions per millisecond?

Most enterprise application workloads are dynamic, and technologies such as dynamic memory management and dynamic compilation are what make Java well suited to enterprise-class applications. Let's take a look at two configurations.

Listing 1. Startup options for application (1)

>java -Xmx12g -XX:MaxPermSize=64m -XX:PermSize=32m -XX:MaxNewSize=2g -XX:NewSize=1g -XX:SurvivorRatio=16 -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:MaxTenuringThreshold=0 -XX:CMSInitiatingOccupancyFraction=60 -XX:+CMSParallelRemarkEnabled -XX:+UseCMSInitiatingOccupancyOnly -XX:ParallelGCThreads=12 -XX:LargePageSizeInBytes=256m ...

Listing 2. Startup options for application (2)

>java -Xms8g -Xmx8g -Xmn2g -XX:PermSize=64m -XX:MaxPermSize=256m -XX:-OmitStackTraceInFastThrow -XX:SurvivorRatio=2 -XX:-UseAdaptiveSizePolicy -XX:+UseConcMarkSweepGC -XX:+CMSConcurrentMTEnabled -XX:+CMSParallelRemarkEnabled -XX:+CMSParallelSurvivorRemarkEnabled -XX:CMSMaxAbortablePrecleanTime=10000 -XX:+UseCMSInitiatingOccupancyOnly -XX:CMSInitiatingOccupancyFraction=63 -XX:+UseParNewGC -Xnoclassgc ...

The two configurations differ greatly because they are for two different applications. Each is "correct" in the sense that it was configured and tuned for its own application. Both worked well in the lab environment, but both eventually failed in production. Listing 1 performed poorly in production because dynamic load had not been taken into account. Listing 2 did not account for changes in the application's characteristics in production. In both cases the development team was blamed, but does the blame belong there?
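Because long option lists like these drift between lab and production, it can help to have the application report the options it was actually started with. The sketch below reads them through the standard `RuntimeMXBean`. The class name `StartupFlags` and the simple prefix filter are illustrative assumptions.

```java
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

/** Reports the -X and -XX options the running JVM was started with. */
class StartupFlags {
    /** Keeps only arguments that start with -X, which also covers -XX options. */
    static List<String> tuningFlags(List<String> jvmArgs) {
        List<String> flags = new ArrayList<>();
        for (String arg : jvmArgs) {
            if (arg.startsWith("-X")) {
                flags.add(arg);
            }
        }
        return flags;
    }

    public static void main(String[] args) {
        // Arguments passed to the JVM itself, not to the application's main method.
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        List<String> flags = tuningFlags(jvmArgs);
        if (flags.isEmpty()) {
            System.out.println("No -X or -XX options were passed; JVM defaults apply.");
        }
        for (String flag : flags) {
            System.out.println(flag);
        }
    }
}
```

Logging this list at startup makes it easier to confirm that a production instance is not still running with lab settings.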

Are the workarounds feasible?

Some enterprises go as far as recycling object space by precisely measuring the size of their transaction objects and "trimming" the architecture to fit that size. This can reduce fragmentation enough to get through a full day of trading without heap compaction. Another approach is to make sure objects are referenced only for a short time, so that they are collected directly in the young generation instead of surviving long enough to be moved to the old generation, which avoids compaction scenarios. Both options work, but they create challenges for application developers and designers.
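The first workaround amounts to pre-allocating fixed-size objects and reusing them, so that steady-state processing allocates nothing new on the heap. The following is a minimal sketch of that idea, not a description of any particular enterprise's design: the class name `TransactionBufferPool`, the use of byte arrays, and the sizes are all assumptions, and the pool does not guard against releasing the same buffer twice.

```java
import java.util.ArrayDeque;
import java.util.Arrays;

/** A small pool of pre-allocated, fixed-size buffers that are reused instead of reallocated. */
class TransactionBufferPool {
    private final ArrayDeque<byte[]> free = new ArrayDeque<>();
    private final int bufferSize;

    TransactionBufferPool(int poolSize, int bufferSize) {
        if (poolSize <= 0 || bufferSize <= 0) {
            throw new IllegalArgumentException("poolSize and bufferSize must be positive");
        }
        this.bufferSize = bufferSize;
        // All allocation happens up front, so the buffers settle in the heap once.
        for (int i = 0; i < poolSize; i++) {
            free.push(new byte[bufferSize]);
        }
    }

    /** Returns a pre-allocated buffer, or null when the pool is exhausted. */
    synchronized byte[] acquire() {
        return free.poll();
    }

    /** Clears the buffer and returns it to the pool for reuse. */
    synchronized void release(byte[] buffer) {
        if (buffer == null || buffer.length != bufferSize) {
            throw new IllegalArgumentException("buffer does not belong to this pool");
        }
        Arrays.fill(buffer, (byte) 0);
        free.push(buffer);
    }

    synchronized int available() {
        return free.size();
    }

    public static void main(String[] args) {
        TransactionBufferPool pool = new TransactionBufferPool(4, 256);
        byte[] buffer = pool.acquire();
        buffer[0] = 42; // stand-in for writing a transaction into the buffer
        pool.release(buffer);
        System.out.println("Buffers available: " + pool.available());
    }
}
```

The cost is visible even in this small sketch: the application has to know its object sizes in advance and manage object lifetimes by hand, which is exactly the burden the text describes.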

Who protects the performance of the application?

A portal application may fail right at its peak load. A trading application may fail every time the market falls or rises. An e-commerce site may be unable to cope with the peak shopping season. These are real-world cases, and most of them are caused by JVM performance tuning. When money is lost, the development team is blamed. Perhaps the development team deserves the blame on some occasions, but what responsibility should the JVM provider take?

First, it makes sense that JVM providers give priority to tuning parameters, at least in the short term. Some new tuning options serve specific, emerging enterprise application scenarios. More tuning options also lighten the workload of the JVM support team by shifting performance optimization onto application developers. But I personally think this leads to a heavier support load in the long run. Tuning options can also postpone some worst-case scenarios, though not indefinitely.

There is no doubt that JVM development teams work hard, and that only application implementers fully understand the specific needs of their applications. However, application implementers and developers cannot predict how load will vary over time. JVM providers have also analyzed Java's performance and scalability problems and what they themselves can solve. Instead of providing more tuning parameters, the more direct path is to optimize or innovate garbage collection algorithms. More interesting still, imagine what would happen if the OpenJDK community came together and rethought the Java garbage collector!

Benchmark for JVM Performance

JVM providers sometimes use tuning parameters as a competitive tool, because different tuning can improve the performance of their JVMs in a predictable environment. The last article in this series will examine benchmarks as a measure of JVM performance.

Challenges for JVM Developers

True enterprise scalability requires the JVM to adapt to dynamic and flexible application workloads. This is the key to sustaining consistent performance in terms of throughput and response time. It is the JVM developers' mission to make that happen, so it is time to call on our Java developer community to meet the real Java scalability challenges.

• Continuous tuning

For a given application, the only thing that should need to be specified at the outset is how much memory it requires. After that, the JVM should take responsibility for the rest and adapt to dynamic application load and changing runtime scenarios.

• Number of JVM instances vs. instance extensibility

Now that servers support very large amounts of memory, why can't a JVM instance use it effectively? Splitting an application across many small application server instances is wasteful from both an economic and an environmental perspective. Modern JVMs need to keep up with hardware and application trends.

• Real-world performance and scalability

Enterprises should not have to perform extreme performance tuning to meet their application performance requirements. JVM providers and the OpenJDK community need to address the core issues of Java scalability and eliminate stop-the-world operations.

Conclusion

If the JVM did this and provided a concurrent compacting garbage collection algorithm, the JVM would no longer be a limiting factor in Java scalability, and Java application developers would not have to spend painful hours working out how to configure the JVM for optimal performance. There would then be more interesting innovation at the Java application level rather than endless JVM tuning. I challenge JVM developers and providers to do what is needed and, in the words of Oracle's campaign, to make the future Java.

About the author

Eva Andreasson has more than 10 years of experience in JVM technology, SOA, cloud computing, and other enterprise-class middleware solutions. In 2001, she joined the start-up company Appeal Virtual Solutions (later acquired by BEA) as a JRockit JVM developer. Eva holds two patents in the field of garbage collection research and algorithms. She also pioneered deterministic garbage collection, which later became the JRockit Real Time product. On the technical side, Eva has worked closely with Sun and Intel and has been involved in many integration projects across the JRockit product line, WebLogic, and Coherence. In 2009, Eva joined Azul Systems as product manager, responsible for the development of the new Zing Java platform. Recently, she joined Cloudera as a senior product manager, responsible for Cloudera's Hadoop distribution and dedicated to the development of a highly scalable, distributed data processing framework.


This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
