How LinkedIn engineers optimized their Java code
Link to the original article: How LinkedIn Engineers Optimized Their Java Code
Recently, while browsing the technical blogs of major companies, I came across a very good post on LinkedIn's engineering blog. The post introduces Feed Mixer, the mid-tier service behind the LinkedIn feed, which powers multiple distribution channels such as the LinkedIn web homepage, university pages, company pages, and the mobile client (an architecture diagram appears in the original post).
Feed Mixer uses a library called SPR (pronounced "super"). The post describes how the team optimized SPR's Java code; their optimization experience follows.
1. Be cautious with Java list traversal
Traversing a list in Java is trickier than it looks. Take the following two pieces of code as an example:
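The two snippets were lost in this copy of the article, so here is a minimal reconstruction of what code A and code B presumably looked like, inferred from the discussion below (the Bar element type appears later in the article; the class and method names are made up for illustration):

    import java.util.List;

    class TraversalSketch {
        // Code A: for-each loop, which desugars to iterator-based traversal
        static void traverseA(List<Bar> bars) {
            for (Bar bar : bars) {
                process(bar);
            }
        }

        // Code B: index-based traversal using get(i)
        static void traverseB(List<Bar> bars) {
            for (int i = 0; i < bars.size(); i++) {
                process(bars.get(i));
            }
        }

        static void process(Bar bar) { /* stand-in for the loop body */ }
    }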
When code A executes, an iterator is created for the abstract list, whereas code B calls get(i) directly to fetch each element, saving the iterator overhead relative to code A.
There are trade-offs here, however. Code A uses an iterator, which guarantees O(1) access to the next element (via the hasNext() and next() methods), so the whole loop runs in O(n). In code B, each call to _bars.get(i) can cost O(n) (assuming the list is a composite list), which makes the whole loop of code B O(n²). (If the list in code B is an ArrayList, though, get(i) costs O(1) and the loop is O(n).) So when deciding which traversal to use, you need to consider the underlying implementation of the list, the average length of the list, and the memory used. In the end, because they needed to optimize for memory and ArrayList lookups are O(1) in most of their cases, they chose the approach of code B.
2. Estimate collection sizes at initialization
The Java documentation tells us: "An instance of HashMap has two parameters that affect its performance: initial capacity and load factor. […] When the number of entries in the hash table exceeds the product of the load factor and the current capacity, the hash table is rehashed […] If many mappings are to be stored in a HashMap instance, creating it with a sufficiently large capacity will allow the mappings to be stored more efficiently than letting it perform automatic rehashing as needed to grow the table."
In LinkedIn's practice, a common pattern is to traverse an ArrayList and put its elements into a HashMap. Initializing that HashMap with its expected size avoids the overhead of rehashing. The initial size can be set to the size of the input list divided by the default load factor (0.7 is used here):
- Code before optimization:
    HashMap<String, Foo> _map;

    void addObjects(List<Foo> input) {
        _map = new HashMap<String, Foo>();
        for (Foo f : input) {
            _map.put(f.getId(), f);
        }
    }
- Code after optimization:
    HashMap<String, Foo> _map;

    void addObjects(List<Foo> input) {
        _map = new HashMap<String, Foo>((int) Math.ceil(input.size() / 0.7));
        for (Foo f : input) {
            _map.put(f.getId(), f);
        }
    }
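If Guava is already a dependency (the article uses its cache later), it ships a helper that performs this capacity calculation for you. A minimal sketch under that assumption (the FooIndexer class name is made up; Foo and getId() come from the snippets above):

    import com.google.common.collect.Maps;
    import java.util.HashMap;
    import java.util.List;

    class FooIndexer {
        static HashMap<String, Foo> index(List<Foo> input) {
            // newHashMapWithExpectedSize sizes the map so that input.size()
            // entries can be inserted without triggering a rehash.
            HashMap<String, Foo> map = Maps.newHashMapWithExpectedSize(input.size());
            for (Foo f : input) {
                map.put(f.getId(), f);
            }
            return map;
        }
    }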
3. Lazy expression evaluation
In Java, all arguments of a method call are evaluated (left to right) before the method itself is invoked, even when an argument is an expression whose value may never be needed. This rule can cause unnecessary work. Consider the following scenario, which uses a ComparisonChain to compare two Foo objects. The advantage of such a comparison chain is that the comparison stops as soon as one of the compare steps returns a non-zero value, avoiding many needless comparisons. For example, here the objects are compared first by their score, then by their position, and finally by their _bar property:
    public class Foo {
        private float _score;
        private int _position;
        private Bar _bar;

        public int compareTo(Foo other) {
            return ComparisonChain.start()
                .compare(_score, other.getScore())
                .compare(_position, other.getPosition())
                .compare(_bar.toString(), other.getBar().toString())
                .result();
        }
    }
However, this implementation always creates two String objects to hold the values of _bar.toString() and other.getBar().toString(), even when the comparison never needs those strings. To avoid this overhead, you can implement a Comparator for Bar objects:
    public class Foo {
        private float _score;
        private int _position;
        private Bar _bar;
        private final BarComparator BAR_COMPARATOR = new BarComparator();

        public int compareTo(Foo other) {
            return ComparisonChain.start()
                .compare(_score, other.getScore())
                .compare(_position, other.getPosition())
                .compare(_bar, other.getBar(), BAR_COMPARATOR)
                .result();
        }

        private static class BarComparator implements Comparator<Bar> {
            @Override
            public int compare(Bar a, Bar b) {
                return a.toString().compareTo(b.toString());
            }
        }
    }
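A brief usage sketch (assuming Foo declares implements Comparable&lt;Foo&gt;, which the snippet above implies but does not show): sorting a list drives compareTo, and with the comparator version Bar.toString() runs only for pairs whose score and position both tie.

    import java.util.Collections;
    import java.util.List;

    class Ranking {
        // Collections.sort calls Foo.compareTo for each comparison;
        // the Bar strings are built lazily, only when actually compared.
        static void rank(List<Foo> foos) {
            Collections.sort(foos);
        }
    }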
4. Compile regular expressions in advance
String operations are expensive in Java. Fortunately, Java provides tools to make regular-expression work as efficient as possible. Dynamic regular expressions are rare in practice; in the example below, each call to String.replaceAll() applies a constant pattern to the input. We can therefore precompile the pattern, saving CPU and memory overhead.
- Before optimization:
    private String transform(String term) {
        return term.replaceAll(_regex, _replacement);
    }
- After optimization:
    private final Pattern _pattern = Pattern.compile(_regex);

    private String transform(String term) {
        return _pattern.matcher(term).replaceAll(_replacement);
    }
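Compiled Pattern instances are immutable and safe for concurrent use, so they are often stored as static constants. A minimal sketch with an illustrative whitespace-collapsing regex (class and method names are made up):

    import java.util.regex.Pattern;

    class Normalizer {
        // Compiled once per class load and shared by every call.
        private static final Pattern WHITESPACE = Pattern.compile("\\s+");

        static String normalize(String term) {
            return WHITESPACE.matcher(term).replaceAll(" ");
        }
    }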
5. Cache results whenever you can
Storing results in a cache is another way to avoid repeated work. But a cache only pays off for repeated operations over the same data set (for example, preprocessing of configurations, or some kinds of string processing). Several LRU (Least Recently Used) cache implementations exist; LinkedIn chose Guava's cache (the original post links to their specific reasons). The rough code is as follows:
    private final int MAX_ENTRIES = 1000;
    private final LoadingCache<String, String> _cache;

    // Initializing the cache
    _cache = CacheBuilder.newBuilder()
        .maximumSize(MAX_ENTRIES)
        .build(new CacheLoader<String, String>() {
            @Override
            public String load(String key) throws Exception {
                return expensiveOperationOn(key);
            }
        });

    // Using the cache
    String output = _cache.getUnchecked(input);
6. String's intern method is useful but dangerous
String's intern feature can sometimes be used in place of a cache.
From the documentation, we know:
"A pool of strings, initially empty, is maintained privately by the class String. When the intern method is invoked, if the pool already contains a string equal to this String object as determined by the equals(Object) method, then the string from the pool is returned. Otherwise, this String object is added to the pool and a reference to this String object is returned."
This feature is similar to a cache, with one limitation: you cannot set a maximum number of entries. So if the interned strings are unbounded (for example, strings representing unique IDs), they make memory usage grow rapidly. LinkedIn got burned by this once: they used intern() on some key values, and everything looked fine in offline simulation, but once deployed to production, the system's memory usage shot up (because large numbers of unique strings were being interned). LinkedIn therefore switched to an LRU cache, whose maximum number of entries can be bounded.
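A minimal sketch of that idea, reusing the Guava cache from section 5: an identity-mapping LRU cache behaves like a size-bounded intern pool (the class name and the bound are illustrative, not LinkedIn's actual code):

    import com.google.common.cache.CacheBuilder;
    import com.google.common.cache.CacheLoader;
    import com.google.common.cache.LoadingCache;

    class BoundedInterner {
        private static final int MAX_ENTRIES = 1000;

        // Maps each string to a canonical copy of itself, like String.intern(),
        // but evicts the least recently used entries beyond MAX_ENTRIES.
        private static final LoadingCache<String, String> POOL =
            CacheBuilder.newBuilder()
                .maximumSize(MAX_ENTRIES)
                .build(new CacheLoader<String, String>() {
                    @Override
                    public String load(String key) {
                        return key; // first occurrence becomes the canonical copy
                    }
                });

        static String dedup(String s) {
            return POOL.getUnchecked(s);
        }
    }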
Final Result
After these optimizations, SPR's memory usage dropped by 75%, and feed-mixer's memory usage dropped by 50% (a chart appears in the original post). The changes created fewer objects, which reduced GC frequency and cut the latency of the whole service by 25%.
Note: This article is translated from the LinkedIn engineering blog.