Some changes to the internal implementation of the String class in Java 1.7.0_06


Original article: Java-performance; translated for importnew.com by Xiacholin
Link: http://www.importnew.com/7656.html

ChangeLog:

    • November 19, 2013: added the changes in Java 8.
    • November 28, 2013: added the changes in Java 7u40 (thanks to Sunny Chan and his colleagues for drawing my attention to this new JDK version).

Sharing an underlying char[]

The original String class contained 4 non-static fields:

    • char[] value stores the characters of the string.
    • int offset is the index of the first character of the string in the value array.
    • int count is the length of the string.
    • int hash caches the hash code of the string.

As you can see, most String objects have offset = 0 and count = value.length. The only exceptions are strings created by String.substring, either directly or indirectly through APIs such as Pattern.split that call it.

A String created by String.substring shared the internal char[] value array with the original string. This design had two benefits:

    1. It saved memory by sharing the character data.
    2. String.substring ran in constant time, O(1).

However, this design could cause memory leaks. If you extracted a very short substring from a very long string and then no longer needed the long string (so it became eligible for GC), the substring still held a reference to the char[] value array containing the entire original string, so that array could not be reclaimed. The workaround was to copy the substring with the new String(String) constructor, which removed the dependency between the short substring and its long parent string.
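To make the old layout concrete, here is a minimal model of a string that stores a shared char[] together with offset and count. OldStyleString is a hypothetical illustration of mine, not JDK code: substring reuses the parent's array in O(1), and trimmedCopy plays the role of the new String(String) workaround by copying only the visible characters.

```java
import java.util.Arrays;

/**
 * Hypothetical, simplified model of the pre-1.7.0_06 String layout:
 * a possibly shared char[] plus offset and count. Not the JDK class.
 */
final class OldStyleString {
    final char[] value;  // may be shared with other instances
    final int offset;    // index of the first character in value
    final int count;     // number of characters in this string

    OldStyleString(char[] value, int offset, int count) {
        this.value = value;
        this.offset = offset;
        this.count = count;
    }

    static OldStyleString of(String s) {
        return new OldStyleString(s.toCharArray(), 0, s.length());
    }

    /** O(1): reuses the parent's array, so the whole array stays reachable. */
    OldStyleString substring(int begin, int end) {
        if (begin < 0 || end > count || begin > end) {
            throw new IndexOutOfBoundsException(begin + ".." + end);
        }
        return new OldStyleString(value, offset + begin, end - begin);
    }

    /** The new String(String) workaround: copy only the visible characters. */
    OldStyleString trimmedCopy() {
        return new OldStyleString(
                Arrays.copyOfRange(value, offset, offset + count), 0, count);
    }

    @Override
    public String toString() {
        return new String(value, offset, count);
    }
}
```

In this model a 6-character substring of a long string still points at the long array, which is exactly what kept large strings alive; after trimmedCopy, the long array can be collected once the parent is gone.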

Since Java 1.7.0_06 (including Java 8), the String class no longer has the offset and count fields, which means the char[] value is no longer shared between strings. You can forget about the memory leak described above and about using new String(String) to avoid it. Keep in mind, however, that String.substring now runs in linear time rather than constant time.
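If some code relied on constant-time substring (for example, a tokenizer slicing a large input buffer), one option on Java 1.7.0_06 and later is to pass around java.nio.CharBuffer.wrap(CharSequence, int, int) views instead of copies. The sketch below is only an assumption about how such code might look (SliceViews is a hypothetical helper). Note that a view keeps the whole source string reachable, which reintroduces the retention trade-off described above.

```java
import java.nio.CharBuffer;

/** Hypothetical helper contrasting a copied slice with an O(1) view. */
final class SliceViews {
    private SliceViews() {}

    /** Copies end - begin characters: linear time on Java 1.7.0_06 and later. */
    static String copySlice(String s, int begin, int end) {
        return s.substring(begin, end);
    }

    /** Wraps the range without copying; the view keeps s reachable. */
    static CharSequence viewSlice(String s, int begin, int end) {
        return CharBuffer.wrap(s, begin, end);
    }
}
```

The view works anywhere a CharSequence is accepted (for example Pattern.matcher); call toString() on it only when an independent String is really needed.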

Changes to the hashing algorithm

This section applies only to Java 7u6 and later Java 7 updates; the feature was removed in Java 8.

Another change to String in this update is a new hashing algorithm. Oracle states that the new algorithm gives a better distribution of hash codes, which should improve the performance of hash-based collections: HashMap, Hashtable, HashSet, LinkedHashMap, LinkedHashSet, WeakHashMap and ConcurrentHashMap. Unlike the change described in the first section, this change is experimental and disabled by default.

As you might have guessed, this change applies only to String keys. To enable it, set the system property jdk.map.althashing.threshold to a non-negative integer value (the default is -1). This value is the collection size threshold at which the new hashing method starts being used. Note that the hashing method is switched only on a rehash. So if a map was last rehashed at size = 160 and jdk.map.althashing.threshold = 200, the new hashing method will be used only when the map grows to about 320 (its next rehash).
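The rehash rule can be sketched as follows. This is a simplified model of the description above, not the JDK's code (AltHashingSwitch is a hypothetical name, and the real implementation differs in detail): the property, set for example with -Djdk.map.althashing.threshold=200 on the command line, is read once, and the decision is re-evaluated only when a map rehashes.

```java
/**
 * Hypothetical, simplified model of the Java 7u6+ alternative-hashing switch,
 * following the description in the text (not the JDK's actual code).
 */
final class AltHashingSwitch {
    /** Read once; -1 (the default) disables alternative hashing. */
    static final int THRESHOLD =
            Integer.getInteger("jdk.map.althashing.threshold", -1);

    private AltHashingSwitch() {}

    /** Evaluated only when a map rehashes, with the size at that moment. */
    static boolean useAltHashing(int sizeAtRehash, int threshold) {
        return threshold >= 0 && sizeAtRehash >= threshold;
    }

    static boolean useAltHashing(int sizeAtRehash) {
        return useAltHashing(sizeAtRehash, THRESHOLD);
    }
}
```

With a threshold of 200, useAltHashing(160, 200) is false, so a map last rehashed at size 160 keeps the old hashing until its next rehash, where useAltHashing(320, 200) is true.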

The String class now has a hash32() method, whose result is cached in the int hash32 field. The biggest difference of this method is that the same string may produce different hash32() results on different JVM runs (in fact, the results will differ in most cases, because the seed is initialized with one call to System.currentTimeMillis() and two calls to System.nanoTime()). As a result, the iteration order of some collections differs from one program run to the next.
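The effect of a clock-derived seed can be illustrated with a small randomized string hash. This is a hypothetical sketch, not the JDK's hash32(): SeededStringHash and its seed formula are my own, mixing one System.currentTimeMillis() call with two System.nanoTime() calls as described above, and the per-character mixing is borrowed from the MurmurHash3 32-bit round and finalizer purely for illustration.

```java
/**
 * Illustrative randomized string hash, NOT the JDK's hash32(): the seed is
 * drawn from the clock once per JVM run, so results differ between runs.
 */
final class SeededStringHash {
    static final int SEED = initSeed();

    private SeededStringHash() {}

    private static int initSeed() {
        long now = System.currentTimeMillis();
        long n1 = System.nanoTime();
        long n2 = System.nanoTime();
        return (int) (now ^ (now >>> 32) ^ n1 ^ (n2 >>> 16));
    }

    static int hash(String s) {
        return hash(s, SEED);
    }

    static int hash(String s, int seed) {
        int h = seed;
        for (int i = 0; i < s.length(); i++) {
            // Murmur3-style round: every step is invertible for a fixed char.
            int k = s.charAt(i) * 0xcc9e2d51;
            k = Integer.rotateLeft(k, 15) * 0x1b873593;
            h ^= k;
            h = Integer.rotateLeft(h, 13) * 5 + 0xe6546b64;
        }
        h ^= s.length();
        // Murmur3 fmix32 finalizer: spreads the bits of h.
        h ^= h >>> 16;
        h *= 0x85ebca6b;
        h ^= h >>> 13;
        h *= 0xc2b2ae35;
        h ^= h >>> 16;
        return h;
    }
}
```

Within one run hash(s) is stable, which is all a HashMap needs; across runs SEED changes, and for a fixed string every mixing step is invertible, so different seeds always give different hash values. String.hashCode, by contrast, maps "Aa" and "BB" to the same value on every run.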

To be honest, this change surprised me a little. If the original hashCode method works well, why do we need a new hashing method? I decided to reuse the test program from my article on hashCode method performance tuning to count how many duplicate hash values hash32 produces.

String.hash32() is not public, so I looked at the HashMap source code to find out how it is called. The answer is sun.misc.Hashing.stringHash32(String).
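Because hash32() is not public, calling it from your own code requires reflection. A minimal sketch, assuming sun.misc.Hashing exposes a public static int stringHash32(String) as described above. Hash32Probe is a hypothetical name; on JDKs where the class or method does not exist (the feature was removed in Java 8), it simply returns null.

```java
import java.lang.reflect.Method;

/**
 * Hypothetical helper: calls sun.misc.Hashing.stringHash32(String) through
 * reflection if the running JDK has it, otherwise returns null.
 * Uses no Java 8 APIs, so it also compiles on the affected Java 7 updates.
 */
final class Hash32Probe {
    private Hash32Probe() {}

    static Integer tryStringHash32(String s) {
        try {
            Class<?> hashing = Class.forName("sun.misc.Hashing");
            Method m = hashing.getMethod("stringHash32", String.class);
            return (Integer) m.invoke(null, s); // static method: no receiver
        } catch (ReflectiveOperationException | RuntimeException e) {
            // Class or method missing, access denied, or a SecurityException.
            return null;
        }
    }
}
```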

Testing on the same dataset (1 million distinct keys), String.hash32 produced 304 duplicate hash values, whereas String.hashCode produced none. I think we need to wait for further improvements from Oracle, or for more real-world usage.
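The duplicate-counting part of such a test can be sketched as below. This is not the author's original test program (which is not reproduced here); DuplicateHashCounter and its StringHash interface are hypothetical, and the interface is declared locally so the helper also compiles on Java 7. It counts how many distinct keys map to a hash value already produced by an earlier key, for any hash function you pass in.

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical helper for comparing hash functions on the same key set. */
final class DuplicateHashCounter {
    private DuplicateHashCounter() {}

    /** Any String -> int hash function (declared here for Java 7 compatibility). */
    interface StringHash {
        int hash(String s);
    }

    /**
     * Counts keys whose hash value was already produced by an earlier key.
     * The keys are assumed to be distinct, so every hit is a real collision.
     */
    static int countDuplicates(Collection<String> distinctKeys, StringHash fn) {
        Set<Integer> seen = new HashSet<>(distinctKeys.size() * 2);
        int duplicates = 0;
        for (String key : distinctKeys) {
            if (!seen.add(fn.hash(key))) {
                duplicates++;
            }
        }
        return duplicates;
    }
}
```

Running it twice on the same key set, once with String.hashCode and once with a function wrapping sun.misc.Hashing.stringHash32, gives the kind of comparison reported above.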

The new hashing algorithm can severely affect highly concurrent, multithreaded code

This section applies to Java 7u6 (inclusive) up to Java 7u40 (exclusive). This code has been removed in Java 8. See the next section for Java 7u40 and later.

Oracle left a bug in the hash implementations of HashMap, Hashtable, HashSet, LinkedHashMap, LinkedHashSet and WeakHashMap. Only ConcurrentHashMap is unaffected. This is because all the non-concurrent classes now have the following field:

```java
/**
 * A randomizing value associated with this instance that is applied to
 * hash code of keys to make hash collisions harder to find.
 */
transient final int hashSeed = sun.misc.Hashing.randomHashSeed(this);
```

This means that sun.misc.Hashing.randomHashSeed is called every time a map or set instance is created. randomHashSeed in turn calls java.util.Random.nextInt. Random is known to be unfriendly to multithreaded code: it keeps its state in a single private final AtomicLong seed field. Atomics perform well under low or moderate contention, but poorly under heavy contention.
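Why contention hurts here: java.util.Random advances its seed with a compare-and-set loop on one shared AtomicLong. The model below (CasLcg is a hypothetical name) follows the linear congruential algorithm documented in the Random Javadoc, so for the same seed it produces the same nextInt() sequence as new Random(seed). When many threads call it at once, they repeatedly fail the compareAndSet and spin in the retry loop.

```java
import java.util.concurrent.atomic.AtomicLong;

/**
 * Simplified model of java.util.Random's generator step, following the
 * algorithm documented in its Javadoc: every call does a compareAndSet
 * on ONE shared AtomicLong.
 */
final class CasLcg {
    private static final long MULTIPLIER = 0x5DEECE66DL;
    private static final long ADDEND = 0xBL;
    private static final long MASK = (1L << 48) - 1;

    private final AtomicLong seed;

    CasLcg(long seed) {
        // Same initial scrambling as the Random(long) constructor.
        this.seed = new AtomicLong((seed ^ MULTIPLIER) & MASK);
    }

    int next(int bits) {
        long oldSeed;
        long nextSeed;
        do {
            oldSeed = seed.get();
            nextSeed = (oldSeed * MULTIPLIER + ADDEND) & MASK;
        } while (!seed.compareAndSet(oldSeed, nextSeed)); // retried under contention
        return (int) (nextSeed >>> (48 - bits));
    }

    int nextInt() {
        return next(32);
    }
}
```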

As a result, many high-load web applications that process HTTP/JSON/XML requests may be affected by this bug, because existing parsers use the affected classes to represent name-value pairs. These parsers are also likely to use nested maps, which further increases the number of map instances created per second.

How can this problem be solved?

1. The ConcurrentHashMap approach: call randomHashSeed only when the system property jdk.map.althashing.threshold is set, as ConcurrentHashMap does. Unfortunately, this fix is available only to the JDK core developers.

```java
/**
 * A randomizing value associated with this instance that is applied to
 * hash code of keys to make hash collisions harder to find.
 */
private transient final int hashSeed = randomHashSeed(this);

private static int randomHashSeed(ConcurrentHashMap instance) {
    if (sun.misc.VM.isBooted() && Holder.ALTERNATIVE_HASHING) {
        return sun.misc.Hashing.randomHashSeed(instance);
    }
    return 0;
}
```

2. The hacker way: modify the sun.misc.Hashing class. This is strongly discouraged, but if you still want to work around the bug, the idea is as follows. java.util.Random is not final, so it can be extended, and Java 7 includes a thread-local subclass of Random, java.util.concurrent.ThreadLocalRandom, which uses a ThreadLocal<ThreadLocalRandom> internally (thanks to Benjamin Possolo for pointing out that I had omitted this class in a previous article). In addition, ThreadLocalRandom is CPU cache-aware: each instance has 64 bytes of padding (the size of a cache line) after its seed, which reduces the likelihood that the seeds of 2 different instances end up in the same cache line.

You can then replace the field sun.misc.Hashing.Holder.SEED_MAKER with an instance of a Random subclass based on ThreadLocalRandom. Don't worry that the field is private, static and final: reflection can help you.

```java
public class Hashing {
    private static class Holder {
        static final java.util.Random SEED_MAKER;
        // ... (assigned in a static initializer; rest of the class omitted)
    }
    // ...
}
```
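The Random subclass part of the idea might look like the sketch below. It is a hypothetical illustration (ThreadLocalDelegatingRandom is my name, not the author's code): overriding the protected next(int) method makes Random's public methods such as nextInt() draw from the calling thread's ThreadLocalRandom, so no AtomicLong is shared between threads. The reflective replacement of SEED_MAKER itself is deliberately left out, since it is the part the author advises against.

```java
import java.util.Random;
import java.util.concurrent.ThreadLocalRandom;

/**
 * Hypothetical Random subclass: every call is served by the calling
 * thread's own ThreadLocalRandom. Seeding has no effect on the output.
 */
final class ThreadLocalDelegatingRandom extends Random {
    private static final long serialVersionUID = 1L;

    @Override
    protected int next(int bits) {
        // Take the top `bits` bits of a per-thread random int.
        return ThreadLocalRandom.current().nextInt() >>> (32 - bits);
    }
}
```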

The new hashing algorithm in Java 7u40 and later no longer affects highly concurrent, multithreaded code

Oracle fixed the bug described above in Java 7u40.

They used the approach described in the previous section: sun.misc.Hashing.randomHashSeed is invoked only when alternative hashing is enabled (by setting the system property jdk.map.althashing.threshold). Oracle modified only two classes, HashMap and Hashtable, which indirectly fixes HashSet, LinkedHashMap and LinkedHashSet, because these three classes are built on HashMap. The only class left unchanged is WeakHashMap, but I can't think of a scenario in which this class would be instantiated in large numbers.
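The essence of the 7u40 fix is to touch the shared seed generator only when it is actually needed. Below is a hypothetical model (LazySeedMap is not a JDK class) in which a shared java.util.Random stands in for sun.misc.Hashing.randomHashSeed and a counter records how often it is used: creating maps with alternative hashing disabled never reaches the contended generator.

```java
import java.util.Random;
import java.util.concurrent.atomic.AtomicInteger;

/**
 * Hypothetical model of the Java 7u40 fix: the shared (and, under load,
 * contended) seed generator is consulted only if alternative hashing is on.
 */
final class LazySeedMap {
    /** Stands in for the shared generator behind randomHashSeed. */
    private static final Random SEED_MAKER = new Random();

    /** Counts how often the shared generator was used (for illustration). */
    static final AtomicInteger SEED_SOURCE_CALLS = new AtomicInteger();

    final int hashSeed;

    LazySeedMap(boolean altHashingEnabled) {
        // With alternative hashing off, the seed is simply 0 and the shared
        // generator is never touched, so there is no CAS contention.
        this.hashSeed = altHashingEnabled ? randomHashSeed() : 0;
    }

    private static int randomHashSeed() {
        SEED_SOURCE_CALLS.incrementAndGet();
        return SEED_MAKER.nextInt();
    }
}
```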

Related articles

This article recently sparked an intense discussion on Reddit. I recommend taking a look at:

    • A post by Reddit user Bondolo explaining the reasons behind these changes.
    • String.intern in Java 6, 7 and 8: string pooling

Summary

    • Starting with Java 1.7.0_06, String.substring creates a new char[] value for each substring instead of sharing the parent string's array. This means String.substring now runs in linear time rather than constant time. In return, String objects are slightly smaller (8 bytes less than before), and String.substring can no longer cause memory leaks (for more on Java object memory layout, see String packing part 1: converting characters to bytes).
    • The following applies to Java 7u6 and later and was removed in Java 8. Starting with Java 1.7.0_06, String has a second hash function, hash32. This method is not public and can be accessed only via reflection or by calling sun.misc.Hashing.stringHash32(String). It is used only when the size of one of the 7 hash-based JDK collections exceeds the threshold set by the system property jdk.map.althashing.threshold. This is an experimental feature, and I do not recommend relying on it in your code yet.
    • The following applies to Java 7u6 (inclusive) through Java 7u40 (exclusive) and not to Java 8. The new hash implementation introduced a performance bug in all non-concurrent map and set classes in these versions. The bug affects only multithreaded applications that create many map instances per second. See the third section of this article for details. Java 7u40 fixed this bug.
