I. Tuning focuses on several areas
- Memory tuning
- CPU usage tuning
- Lock contention tuning
- I/O tuning
II. Twitter's biggest enemy: latency
What causes latency?
- The biggest factor is GC
- Others include lock contention and thread scheduling, I/O, and inefficient choices of algorithm or data structure
III. Memory performance tuning
(1) Memory consumption tuning
Causes of OutOfMemoryError: the data volume may genuinely be too large, the application may be trying to hold too much data at once, or there may be a memory leak.
To determine whether the data volume is genuinely too large:
- Check the GC logs and compare memory use before and after a full GC; if little memory is reclaimed, the live data set really is that large
- Try increasing the JVM heap size
- Ask whether the data really needs to be in memory; alternatives include an LRU cache that evicts cold entries, or soft references (SoftReference)
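The LRU eviction mentioned above can be sketched with a plain `LinkedHashMap` in access order (a minimal illustration; the class name `LruCache` is ours, not from the talk):

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Minimal LRU cache: a LinkedHashMap in access order evicts the
// least-recently-used entry once capacity is exceeded.
public class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int capacity;

    public LruCache(int capacity) {
        super(16, 0.75f, true); // true = iterate in access order
        this.capacity = capacity;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > capacity; // called after each put
    }
}
```

In a real service the evicted value would be swapped out to slower storage rather than simply dropped.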
Data bloat (fat data)
- It happens when you try to do something unusual with your data, e.g. loading the entire social graph into a single JVM instance, or loading all user metadata into a single JVM instance
- At Twitter's scale, it pays to do extra work to shrink the internal data representation
Causes of data bloat:
(1) Object headers (a JVM object header is normally two machine words: 64 bits on a 32-bit JVM, 128 bits on a 64-bit JVM; so even new java.lang.Object() is not free, and new byte[0] pays the header plus an array-length field). More on object headers: http://blog.csdn.net/wenniuwuren/article/details/50939410
(2) Alignment padding
Consider this example:

```java
public static class D { byte d1; }
public static class E extends D { byte e1; }
```

Because each class's fields are padded to the JVM's alignment boundary separately, new D() occupies far more space than its single byte of payload suggests, and new E() occupies even more. For the detailed space calculation see: http://blog.csdn.net/wenniuwuren/article/details/50958892
Most JVMs today are 64-bit, and 64-bit pointers make far worse use of the CPU cache than 32-bit pointers. It is therefore recommended to add the JVM flag -XX:+UseCompressedOops, which compresses 64-bit pointers to 32 bits while still addressing a large heap, giving the best of both worlds. In addition, keep the maximum heap below about 30 GB, since compressed oops stop working once the heap approaches 32 GB.
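A launch line using compressed oops might look like this (the heap sizes and jar name are illustrative, not recommendations from the talk):

```shell
# Illustrative flags: explicit heap kept below the ~32 GB
# compressed-oops limit, with pointer compression enabled.
java -Xms4g -Xmx4g -XX:+UseCompressedOops -jar app.jar
```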
Avoid wrapper classes for primitive types where possible
In Scala 2.7.7, a Seq[Int] stored boxed values: the space cost was roughly (overhead + 32*length) bytes, versus (overhead + 4*length) bytes for unboxed storage.
This was fixed in Scala 2.8. The lesson:
- You may not know the performance characteristics of the libraries you use (e.g. whether an Int is actually stored as a primitive int)
- You may never discover such problems unless you run the code under a profiler
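The boxing cost is easy to reproduce in plain Java (a minimal illustration of ours, not from the talk): an int[] stores 4 bytes per element inline, while a List&lt;Integer&gt; stores a reference per element plus a boxed heap object, and every read unboxes.

```java
import java.util.List;

public class BoxingDemo {
    // Sums a primitive array: no per-element allocation, values inline.
    static long sumPrimitive(int[] xs) {
        long s = 0;
        for (int x : xs) s += x;
        return s;
    }

    // Sums a boxed list: each element is a separate heap object,
    // and each access auto-unboxes it.
    static long sumBoxed(List<Integer> xs) {
        long s = 0;
        for (Integer x : xs) s += x; // unboxing on every element
        return s;
    }
}
```

Both return the same result; the difference is memory layout and allocation, which only shows up under a profiler or in GC logs.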
Map footprints
- Guava's MapMaker.makeMap() occupies 2272 bytes
- MapMaker.concurrencyLevel(1).makeMap() occupies 352 bytes
Use ThreadLocal with care
- A typical problem is the M*N blow-up of pooled resources: e.g. a 200-thread pool where each thread caches 50 connections ends up holding 10,000 connections
- Consider using a synchronized shared object instead, or simply creating a new object each time
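The M*N blow-up can be sketched as follows (the Connection class and the counts are illustrative): because ThreadLocal.withInitial runs its supplier once per thread, a per-thread cache of 50 connections across 200 threads materializes 10,000 connections.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class ThreadLocalBlowup {
    static final AtomicInteger LIVE_CONNECTIONS = new AtomicInteger();

    // Stand-in for a pooled resource; counts how many instances exist.
    static class Connection {
        Connection() { LIVE_CONNECTIONS.incrementAndGet(); }
    }

    // Anti-pattern: every thread lazily builds its own cache of 50 connections.
    static final ThreadLocal<Connection[]> PER_THREAD_CACHE =
            ThreadLocal.withInitial(() -> {
                Connection[] cache = new Connection[50];
                for (int i = 0; i < cache.length; i++) cache[i] = new Connection();
                return cache;
            });

    // Starts 200 threads, each touching its cache once; returns the
    // total number of connections created (200 threads * 50 each).
    public static int run() {
        Thread[] threads = new Thread[200];
        for (int i = 0; i < threads.length; i++) {
            threads[i] = new Thread(() -> PER_THREAD_CACHE.get());
            threads[i].start();
        }
        for (Thread t : threads) {
            try { t.join(); } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return LIVE_CONNECTIONS.get();
    }
}
```

A single shared pool would cap the count at the pool size instead.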
IV. Fighting latency
The performance triangle
- Figure 1: memory footprint, latency, and throughput trade off against one another
- Figure 2: compactness (which reduces memory footprint), higher throughput, and faster response
How does the young generation work?
- All new objects are allocated in Eden; because young-generation GC compacts, allocation is a simple pointer bump
- When Eden fills up, a stop-the-world minor GC runs and the survivors are copied to a Survivor space
- After surviving several minor GCs, objects are promoted (tenured) to the old generation
The idealized young generation
- Eden is large enough to hold more than one full set of concurrent request/response objects (so requests complete without a stop-the-world pause, and throughput stays high)
- Each Survivor space is large enough to hold the live objects plus those still aging (reducing premature promotion to the old generation)
- The tenuring threshold is set so that long-lived objects are promoted at just the right time (freeing Survivor space)
Start tuning with the young generation
- Print verbose GC logs with flags such as -XX:+PrintGCDetails, -XX:+PrintGCDateStamps, -XX:+PrintHeapAtGC, -XX:+PrintTenuringDistribution, etc.
- Watch Survivor occupancy and set an appropriate Survivor size
- Watch the tenuring threshold so that long-lived objects are promoted to the old generation quickly
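Putting the pieces together, a young-generation tuning run might start from a line like this (all sizes and thresholds are examples to be adjusted against the logs, not recommendations from the talk):

```shell
# Illustrative young-generation sizing plus the GC logging flags
# mentioned above; tune the numbers against what the logs show.
java -Xms4g -Xmx4g \
     -XX:NewSize=1g -XX:SurvivorRatio=8 \
     -XX:MaxTenuringThreshold=4 \
     -XX:+PrintGCDetails -XX:+PrintGCDateStamps \
     -XX:+PrintHeapAtGC -XX:+PrintTenuringDistribution \
     -jar app.jar
```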
(1) -XX:+PrintHeapAtGC

```
Heap after GC invocations=1 (full 0):
 par new generation   total 943744K, used 54474K [0x0000000757000000, 0x0000000797000000, 0x0000000797000000)
  eden space 838912K,   0% used [0x0000000757000000, 0x0000000757000000, 0x000000078a340000)
  from space 104832K,  51% used [0x00000007909a0000, 0x0000000793ed2ae0, 0x0000000797000000)
  to   space 104832K,   0% used [0x000000078a340000, 0x000000078a340000, 0x00000007909a0000)
 concurrent mark-sweep generation total 1560576K, used 0K [0x0000000797000000, 0x00000007f6400000, 0x00000007f6400000)
 concurrent-mark-sweep perm gen total 159744K, used 38069K [0x00000007f6400000, 0x0000000800000000, 0x0000000800000000)
}
```
(2) -XX:+PrintTenuringDistribution

```
Desired survivor size 53673984 bytes, new threshold 4 (max 6)
- age 1:  9165552 bytes,  9165552 total
- age 2:  2493880 bytes, 11659432 total
- age 3:  6817176 bytes, 18476608 total
- age 4: 36258736 bytes, 54735344 total
: 899459K->74786K(943744K), 0.0654030 secs] 1225769K->401096K(2504320K), 0.0657530 secs] [Times: user=0.55 sys=0.00, real=0.07 secs]
```
CMS tuning
- The CMS collector needs headroom: give it as much memory as possible
- Reduce fragmentation to avoid full GCs
- -XX:CMSInitiatingOccupancyFraction=N, where N is typically 75-80 (starting too early reduces throughput; starting too late causes concurrent mode failure)
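A typical CMS configuration along these lines might look like this (the occupancy value is within the 75-80 range above; the rest of the line is illustrative):

```shell
# Illustrative CMS setup: start the concurrent cycle at 75% old-gen
# occupancy, and honor that fraction rather than the adaptive default.
java -Xms4g -Xmx4g \
     -XX:+UseConcMarkSweepGC \
     -XX:CMSInitiatingOccupancyFraction=75 \
     -XX:+UseCMSInitiatingOccupancyOnly \
     -jar app.jar
```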
Is response still too slow?
- If too many objects survive each minor GC, try shrinking the young generation, shrinking the Survivor spaces, and lowering the tenuring threshold
- If there are too many threads, look for the smallest workable concurrency level, or spread the load across more JVM instances
- To reduce lock contention, try volatile instead of synchronized where it suffices, and try the Atomic* atomic classes
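A minimal sketch of the last point (the class names are ours): an AtomicLong counter keeps the atomicity of a synchronized counter but replaces the monitor with a lock-free compare-and-swap.

```java
import java.util.concurrent.atomic.AtomicLong;

public class Counters {
    // Lock-based counter: every increment acquires the object's monitor,
    // so contended threads block.
    static class SyncCounter {
        private long value;
        synchronized void inc() { value++; }
        synchronized long get() { return value; }
    }

    // Lock-free counter: incrementAndGet uses a CAS loop, so contended
    // threads retry briefly instead of blocking on a lock.
    static class AtomicCounter {
        private final AtomicLong value = new AtomicLong();
        void inc() { value.incrementAndGet(); }
        long get() { return value.get(); }
    }
}
```

Note that volatile alone only gives visibility, not atomic read-modify-write; for counters the Atomic* classes are the right substitute.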
Fighting CMS fragmentation with slab allocation
Apache Cassandra uses slab allocation internally: each slab is a 2 MB chunk, and byte[] values are copied into it using CAS. Before slab allocation, Cassandra spent 30-60 seconds per hour on GC for a given workload; afterwards, the same workload spent about 5 seconds on GC over 3 days and 10 hours.
Slab allocation has limitations: when the cache fills up its contents must be written to disk, and objects must be serialized to binary form.
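The slab scheme described above can be sketched as follows (a simplified illustration of the idea, not Cassandra's actual code): one large byte[] per slab, with a CAS-advanced offset so many threads can copy values in without locking, and without creating small fragmented objects for CMS to manage.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Sketch of a Cassandra-style slab: one 2 MB byte[] with a
// CAS-advanced bump pointer for lock-free, fragmentation-free
// placement of serialized values.
public class Slab {
    public static final int SLAB_SIZE = 2 * 1024 * 1024;

    private final byte[] data = new byte[SLAB_SIZE];
    private final AtomicInteger nextOffset = new AtomicInteger();

    /**
     * Copies value into the slab and returns its offset,
     * or -1 if the slab has no room (caller rotates to a new slab).
     */
    public int allocate(byte[] value) {
        while (true) {
            int current = nextOffset.get();
            int next = current + value.length;
            if (next > SLAB_SIZE) return -1;
            // CAS claims the range [current, next); if another thread
            // won the race, loop and try the new offset.
            if (nextOffset.compareAndSet(current, next)) {
                System.arraycopy(value, 0, data, current, value.length);
                return current;
            }
        }
    }

    public byte[] read(int offset, int length) {
        byte[] out = new byte[length];
        System.arraycopy(data, offset, out, 0, length);
        return out;
    }
}
```

This is where the limitations noted above come from: only byte[] payloads fit, so objects must be serialized before being cached.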
Twitter engineers talk about JVM tuning