How LinkedIn engineers are optimizing their Java code


Original post: How LinkedIn engineers optimize their Java code

While recently browsing the technical blogs of major companies, I found a very good post on LinkedIn's engineering blog. It describes feed mixer, LinkedIn's feed middle tier, which supports multiple distribution channels such as the LinkedIn web home page, university pages, company pages, and the mobile clients (the original post includes a diagram).

Feed mixer is built on a library called SPR ("super"). The post is about optimizing SPR's Java code. Here is how they summarize their optimization experience.

1. Be careful with list traversal in Java

Traversing a list in Java can be more subtle than it looks. Take the following two snippets as an example:

    • a:

          private final List<Bar> _bars;

          for (Bar bar : _bars) {
              // do important stuff
          }

    • b:

          private final List<Bar> _bars;

          for (int i = 0; i < _bars.size(); i++) {
              Bar bar = _bars.get(i);
              // do important stuff
          }

Code a creates an iterator over the abstract list, while code b calls get(i) directly, avoiding the iterator overhead that code a pays.

In fact, there is a trade-off here. Code a uses an iterator, which guarantees that fetching each element costs O(1) (via the hasNext() and next() methods), so the whole traversal is O(n). But in code b, each call to _bars.get(i) costs O(n) if the list is a LinkedList, making the whole loop O(n^2); if the list is an ArrayList, get(i) is O(1) and the loop is O(n). So when deciding which traversal style to use, we need to consider the underlying implementation of the list, the average length of the list, and the memory used. Because the team needed to optimize memory, and ArrayList lookups are O(1) in most of their cases, they ultimately chose the style in code b.
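The trade-off can be demonstrated with a short sketch; the class name, helper methods, and sample data below are illustrative, not from the original post:

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

public class TraversalDemo {
    // Iterator-based traversal (code a): O(n) for any List implementation.
    static long sumWithIterator(List<Integer> bars) {
        long total = 0;
        for (int bar : bars) {
            total += bar;
        }
        return total;
    }

    // Index-based traversal (code b): O(n) for ArrayList, where get(i) is O(1),
    // but O(n^2) for LinkedList, where get(i) walks the list from one end.
    static long sumWithIndex(List<Integer> bars) {
        long total = 0;
        for (int i = 0; i < bars.size(); i++) {
            total += bars.get(i);
        }
        return total;
    }

    public static void main(String[] args) {
        List<Integer> arrayList = new ArrayList<>(List.of(1, 2, 3, 4, 5));
        List<Integer> linkedList = new LinkedList<>(arrayList);
        // Both styles produce the same result; only the cost profile differs.
        System.out.println(sumWithIterator(arrayList)); // 15
        System.out.println(sumWithIndex(linkedList));   // 15, but quadratic on large LinkedLists
    }
}
```

Both loops compute the same answer; the point is that the index-based loop is only a safe default when the list is backed by an array.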

2. Estimate the size of the collection at initialization time

The Java documentation tells us: "An instance of HashMap has two parameters that affect its performance: initial capacity and load factor. [...] When the number of entries in the hash table exceeds the product of the load factor and the current capacity, the hash table is rehashed [...] If many mappings are to be stored in a HashMap instance, creating it with a sufficiently large capacity will allow the mappings to be stored more efficiently than letting it perform automatic rehashing as needed to grow the table."

In LinkedIn's practice, a common pattern is traversing an ArrayList and putting its elements into a HashMap. Initializing the HashMap with the expected size avoids the overhead of rehashing. The initial capacity can be set to the input list size divided by the load factor (0.7 is used here; note that HashMap's actual default load factor is 0.75):

  • Pre-optimization code:

        private HashMap<String, Foo> _map;

        void addObjects(List<Foo> input) {
            _map = new HashMap<String, Foo>();
            for (Foo f : input) {
                _map.put(f.getId(), f);
            }
        }

  • Optimized code:

        private HashMap<String, Foo> _map;

        void addObjects(List<Foo> input) {
            _map = new HashMap<String, Foo>((int) Math.ceil(input.size() / 0.7));
            for (Foo f : input) {
                _map.put(f.getId(), f);
            }
        }
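The sizing arithmetic is worth spelling out. A minimal sketch (the class and method names are mine, not from the post) of the capacity calculation used above:

```java
public class MapSizing {
    // Initial capacity chosen so that `expectedSize` entries fit without a
    // rehash, using the 0.7 divisor from the article (HashMap's default load
    // factor is actually 0.75).
    static int initialCapacity(int expectedSize) {
        return (int) Math.ceil(expectedSize / 0.7);
    }

    public static void main(String[] args) {
        // For 1000 expected entries: ceil(1000 / 0.7) = 1429, and the resize
        // threshold 1429 * 0.7 ≈ 1000.3 is just above the expected size.
        System.out.println(initialCapacity(1000)); // 1429
    }
}
```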


3. Defer expression evaluation

In Java, all method arguments are fully evaluated (left to right) before the method is invoked, whenever an argument is an expression. This rule can cause unnecessary work. Consider comparing two Foo objects with a ComparisonChain. One advantage of such a comparison chain is that as soon as one compareTo call returns a non-zero value, the whole comparison ends, avoiding many meaningless comparisons. For example, the objects in this scenario are compared first by score, then by position, and finally by the _bar property:

    public class Foo {
        private float _score;
        private int _position;
        private Bar _bar;

        public int compareTo(Foo other) {
            return ComparisonChain.start()
                .compare(_score, other.getScore())
                .compare(_position, other.getPosition())
                .compare(_bar.toString(), other.getBar().toString())
                .result();
        }
    }

But this implementation always constructs two String objects to hold the values of _bar.toString() and other.getBar().toString(), even when the two strings may never be compared. To avoid this overhead, implement a Comparator for Bar objects:

    public class Foo {
        private float _score;
        private int _position;
        private Bar _bar;
        private static final BarComparator BAR_COMPARATOR = new BarComparator();

        public int compareTo(Foo other) {
            return ComparisonChain.start()
                .compare(_score, other.getScore())
                .compare(_position, other.getPosition())
                .compare(_bar, other.getBar(), BAR_COMPARATOR)
                .result();
        }

        private static class BarComparator implements Comparator<Bar> {
            @Override
            public int compare(Bar a, Bar b) {
                return a.toString().compareTo(b.toString());
            }
        }
    }

4. Pre-compiling regular expressions

String operations are relatively expensive in Java. Fortunately, Java provides tools to make regular expressions as efficient as possible. Dynamic regular expressions are rare in practice; in the next example, every call to String.replaceAll() applies the same constant pattern to the input. So we pre-compile the pattern, saving CPU and memory overhead.

  • Before optimization:

        private String transform(String term) {
            return term.replaceAll(_regex, _replacement);
        }

  • After optimization:

        private final Pattern _pattern = Pattern.compile(_regex);

        private String transform(String term) {
            return _pattern.matcher(term).replaceAll(_replacement);
        }
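As a concrete, runnable sketch of the same idea (the class name and the whitespace-collapsing pattern are my own illustration, not from the post):

```java
import java.util.regex.Pattern;

public class RegexDemo {
    // Compiled once at class load; String.replaceAll would recompile the
    // pattern on every single call.
    private static final Pattern WHITESPACE = Pattern.compile("\\s+");

    static String transform(String term) {
        // Reuses the compiled Pattern to collapse runs of whitespace.
        return WHITESPACE.matcher(term).replaceAll(" ");
    }

    public static void main(String[] args) {
        System.out.println(transform("a   b\t c")); // a b c
    }
}
```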

5. Cache results when you can

Storing results in a cache is another way to avoid unnecessary work. But caching only helps for operations applied repeatedly to the same data (such as preprocessing of configuration or some string processing). Several LRU (least recently used) cache implementations exist; LinkedIn uses Guava caches (see the original post for the specific reasons), with code roughly as follows:

    private final int MAX_ENTRIES = 1000;
    private final LoadingCache<String, String> _cache;

    // Initializing the cache
    _cache = CacheBuilder.newBuilder()
        .maximumSize(MAX_ENTRIES)
        .build(new CacheLoader<String, String>() {
            @Override
            public String load(String key) throws Exception {
                return expensiveOperationOn(key);
            }
        });

    // Using the cache
    String output = _cache.getUnchecked(input);
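If Guava is not available, the same size-bounded LRU behavior can be sketched with the standard library alone. This is my own minimal analogue of the cache above, not LinkedIn's code:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// A minimal LRU cache built on LinkedHashMap: access order plus a
// removeEldestEntry override gives eviction of the least recently used entry.
public class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    public LruCache(int maxEntries) {
        super(16, 0.75f, true); // accessOrder = true -> LRU iteration order
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Called after each put; evicts the eldest entry once over capacity.
        return size() > maxEntries;
    }
}
```

Usage mirrors a plain map: `new LruCache<String, String>(1000)` bounds the cache at 1000 entries, unlike String interning in section 6, which has no such bound.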

6. String's intern method is useful, but also dangerous

The intern feature of String can sometimes be used instead of caching.

From the Java documentation, we know:
"A pool of strings, initially empty, is maintained privately by the class String. When the intern method is invoked, if the pool already contains a string equal to this String object as determined by the equals(Object) method, then the string from the pool is returned. Otherwise, this String object is added to the pool and a reference to this String object is returned."

This feature is similar to a cache, with one limitation: you cannot set a maximum number of entries. So if the interned strings are unbounded (for example, strings representing unique IDs), memory consumption grows quickly. LinkedIn stumbled over this once: they interned some key values, and everything was fine in offline simulation, but once deployed, the system's memory usage shot up (because large numbers of unique strings were interned). So LinkedIn ultimately chose an LRU cache instead, which bounds the maximum number of entries.
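The pooling behavior itself is easy to observe. A small sketch (my own illustration):

```java
public class InternDemo {
    public static void main(String[] args) {
        String a = new String("feed"); // a fresh heap object, not the pooled literal
        String b = "feed";             // refers to the interned literal

        System.out.println(a == b);          // false: distinct objects
        System.out.println(a.intern() == b); // true: intern returns the pooled string
    }
}
```

Every distinct value passed to intern() stays in the pool, which is exactly why interning unbounded unique IDs grows memory without limit.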

Final result

These optimizations reduced SPR's memory footprint by 75%, which in turn reduced feed mixer's memory footprint by 50% (the original post includes a chart). They also reduced object creation, and therefore GC frequency, and the latency of the whole service dropped by 25%.

Note: This article is translated from the LinkedIn technology blog.
