Deduplicating a List in Java, and deduplicating with Stream

Last Update:2018-04-12 Source: Internet

Author: User

Tags addall

Learn a little bit of programming every day PDF ebook, video tutorial free download:
Http://www.shitanlife.com/code

Problem

As Internet technology matures, systems lean more and more toward decentralized, distributed, and stream computing, so many things that used to be done on the database side are now done on the Java side. Today someone asked: if a database field has no index, how should you deduplicate on that field? Everyone agreed to do it in Java, but how?

Answer

I suddenly remembered that I had written an article about deduplicating a list, so I dug it out. The approach there was to override the hashCode and equals methods of the objects in the list, throw them into a HashSet, and then take them back out. That was the answer I memorized like a dictionary entry when I first learned Java. In interviews, for example, someone who claims three years of Java can recite the difference between Set and HashMap, but does not know how either is implemented. In other words, beginners only memorize characteristics.

When you actually use something in a project, though, you need to be sure it really works that way. Memorization is useless here; you can only trust results. You need to know how HashSet does the deduplication for you. To look at it from another angle: can you deduplicate without HashSet? The simplest and most direct way is to compare each element against the data already seen and, if none of it matches, append the element to the end. HashSet merely speeds this process up.

First, here is the object we want to deduplicate, User:

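    import com.google.common.collect.Lists; // Guava
    import lombok.AllArgsConstructor;
    import lombok.Builder;
    import lombok.Data;

    @Data
    @Builder
    @AllArgsConstructor
    public class User {
        private Integer id;
        private String name;
    }

    List<User> users = Lists.newArrayList(
            new User(1, "a"),
            new User(1, "b"),
            new User(2, "b"),
            new User(1, "a"));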

The goal is to extract the users with distinct ids. To avoid nitpicking, let us fix one rule: any one record per unique id will do, and we do not care which of the records sharing an id is kept.

The most intuitive way

The idea is to start with an empty list and use it to store the elements already traversed.

    @Test
    public void dis1() {
        List<User> result = new LinkedList<>();
        for (User user : users) {
            // Has a user with the same id already been kept?
            boolean b = result.stream().anyMatch(u -> u.getId().equals(user.getId()));
            if (!b) {
                result.add(user);
            }
        }
        System.out.println(result);
    }

With HashSet

Anyone who has memorized the characteristics knows that HashSet can deduplicate, but how does it do it? Those who memorized a little deeper will answer: through the hashCode and equals methods. And how exactly does it use those two? People who have not read the source cannot go any further, and the interview ends there.

In fact, HashSet is implemented with HashMap (before I read the source I had always assumed that HashMap's keys were implemented with a HashSet; it is exactly the other way around). I will not go into detail here; looking at HashSet's constructor and its add method is enough to understand.

    public HashSet() {
        map = new HashMap<>();
    }

    /**
     * Clearly, this returns false if the element already exists and true if it does not.
     */
    public boolean add(E e) {
        return map.put(e, PRESENT) == null;
    }

From this you can see that HashSet's deduplication is built on HashMap, and HashMap relies entirely on the implementations of hashCode and equals. Now everything connects: if you want to use HashSet, you must pay attention to your own implementations of these two methods.
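To make the delegation concrete, here is a minimal sketch of a set built on top of a HashMap in the same way. MiniHashSet is a hypothetical name used only for illustration; it is a simplified stand-in, not the JDK class.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified stand-in for java.util.HashSet (illustration only).
class MiniHashSet<E> {
    // One shared dummy value stored for every key, like HashSet's PRESENT.
    private static final Object PRESENT = new Object();
    private final Map<E, Object> map = new HashMap<>();

    // Map.put returns the previous value, or null when the key was absent,
    // so add returns true only the first time an element is inserted.
    boolean add(E e) {
        return map.put(e, PRESENT) == null;
    }

    int size() {
        return map.size();
    }
}

class MiniHashSetDemo {
    public static void main(String[] args) {
        MiniHashSet<String> set = new MiniHashSet<>();
        System.out.println(set.add("a")); // true: first insertion
        System.out.println(set.add("a")); // false: duplicate key
        System.out.println(set.size());   // 1
    }
}
```

Whether two elements count as "the same key" is decided entirely by the map, which is why the element's hashCode and equals matter.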

In this problem we deduplicate by id, so the basis for comparison is the id. Modify User as follows:

    @Override
    public boolean equals(Object o) {
        if (this == o) {
            return true;
        }
        if (o == null || getClass() != o.getClass()) {
            return false;
        }
        User user = (User) o;
        return Objects.equals(id, user.id);
    }

    @Override
    public int hashCode() {
        return Objects.hash(id);
    }

    // hashCode, as computed in Arrays.hashCode:
    // result = 31 * result + (element == null ? 0 : element.hashCode());

Here Objects.hash calls Arrays.hashCode, whose core line is shown above. Multiplying by 31 is equivalent to (x << 5) - x.
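Both claims are easy to check. The sketch below writes the recurrence out by hand and compares it with Objects.hash and Arrays.hashCode; manualHash and times31 are hypothetical helper names introduced for this example.

```java
import java.util.Arrays;
import java.util.Objects;

class HashCodeDemo {
    // The same recurrence as Arrays.hashCode(Object[]), written out by hand.
    static int manualHash(Object... elements) {
        int result = 1;
        for (Object element : elements) {
            result = 31 * result + (element == null ? 0 : element.hashCode());
        }
        return result;
    }

    // 31 * x expressed with a shift. The parentheses matter:
    // x << 5 - x would parse as x << (5 - x).
    static int times31(int x) {
        return (x << 5) - x;
    }

    public static void main(String[] args) {
        Integer id = 1;
        System.out.println(Objects.hash(id) == manualHash(id));                     // true
        System.out.println(Objects.hash(id) == Arrays.hashCode(new Object[] {id})); // true
        System.out.println(times31(12345) == 31 * 12345);                           // true
    }
}
```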

The final implementation is as follows:

    @Test
    public void dis2() {
        Set<User> result = new HashSet<>(users);
        System.out.println(result);
    }

Deduplicating with Java's Stream

Back to the original question. It came up because work is being moved from the database side to the Java side, so the amount of data may be fairly large, say 100,000 records. For large data sets, the Stream functions are the simplest tool, and Stream happens to provide a distinct function. So how should it be used?

users.parallelStream().distinct().forEach(System.out::println);

It takes no lambda as a parameter, which means there is no way to supply a custom condition. Fortunately the Javadoc states the criterion used for deduplication:

Returns a stream consisting of the distinct elements (according to {@link Object#equals(Object)}) of this stream.

There is also a rule we all had to memorize: when equals returns true, hashCode must return the same value for both objects. Memorized on its own, the logic is a little confusing, but once you understand how HashMap is implemented it stops sounding like a tongue twister. HashMap first locates the entry by hashCode and only then compares with equals.
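The order of those two checks can be observed directly. In the sketch below, BrokenUser deliberately violates the rule (equals looks at id, hashCode at name), while GoodUser uses id for both; the class names are hypothetical and exist only for this demonstration.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

class ContractDemo {
    // Deliberately broken: equals compares id, but hashCode is derived from name.
    static final class BrokenUser {
        final Integer id;
        final String name;
        BrokenUser(Integer id, String name) { this.id = id; this.name = name; }
        @Override public boolean equals(Object o) {
            return o instanceof BrokenUser && Objects.equals(id, ((BrokenUser) o).id);
        }
        @Override public int hashCode() { return Objects.hashCode(name); }
    }

    // Consistent: both methods use id only.
    static final class GoodUser {
        final Integer id;
        final String name;
        GoodUser(Integer id, String name) { this.id = id; this.name = name; }
        @Override public boolean equals(Object o) {
            return o instanceof GoodUser && Objects.equals(id, ((GoodUser) o).id);
        }
        @Override public int hashCode() { return Objects.hashCode(id); }
    }

    static int brokenSize() {
        Set<BrokenUser> set = new HashSet<>();
        set.add(new BrokenUser(1, "a"));
        set.add(new BrokenUser(1, "b")); // equals says "same", but the hash differs
        return set.size();
    }

    static int goodSize() {
        Set<GoodUser> set = new HashSet<>();
        set.add(new GoodUser(1, "a"));
        set.add(new GoodUser(1, "b"));
        return set.size();
    }

    public static void main(String[] args) {
        System.out.println(brokenSize()); // 2: the hashes differ, so equals is never consulted
        System.out.println(goodSize());   // 1
    }
}
```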

Therefore, to deduplicate with distinct you must override hashCode and equals, unless the default implementations already do what you want.
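As a minimal sketch of that requirement, the following uses a plain User class (no Lombok, so the example stands alone) whose equals and hashCode look only at id. It uses a sequential stream, for which the Javadoc says the selection of distinct elements is stable, so the first user seen for each id is kept.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Objects;
import java.util.stream.Collectors;

class DistinctDemo {
    // Illustrative User: equality is defined by id alone.
    static final class User {
        final Integer id;
        final String name;
        User(Integer id, String name) { this.id = id; this.name = name; }
        @Override public boolean equals(Object o) {
            if (this == o) return true;
            if (o == null || getClass() != o.getClass()) return false;
            return Objects.equals(id, ((User) o).id);
        }
        @Override public int hashCode() { return Objects.hash(id); }
        @Override public String toString() { return id + ":" + name; }
    }

    // distinct() relies on the equals/hashCode defined above.
    static List<User> dedup(List<User> users) {
        return users.stream().distinct().collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<User> users = Arrays.asList(
                new User(1, "a"), new User(1, "b"), new User(2, "b"), new User(1, "a"));
        System.out.println(dedup(users)); // [1:a, 2:b]
    }
}
```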

So why is that? Click through and look at the implementation.

    <P_IN> Node<T> reduce(PipelineHelper<T> helper, Spliterator<P_IN> spliterator) {
        // If the stream is SORTED then it should also be ORDERED so the following will also
        // preserve the sort order
        TerminalOp<T, LinkedHashSet<T>> reduceOp
                = ReduceOps.<T, LinkedHashSet<T>>makeRef(LinkedHashSet::new, LinkedHashSet::add,
                                                         LinkedHashSet::addAll);
        return Nodes. ...

Internally it is implemented with reduce. Seeing that immediately suggests a way to implement distinctByKey myself: take each element of the stream and check it against a hash-based set that I maintain myself; if the key is already there, skip the element, otherwise put it in. In fact, this is the same idea as the most straightforward method at the beginning.

    @Test
    public void dis3() {
        users.parallelStream()
                .filter(distinctByKey(User::getId))
                .forEach(System.out::println);
    }

    public static <T> Predicate<T> distinctByKey(Function<? super T, ?> keyExtractor) {
        // Thread-safe set of the keys seen so far; add returns false for a repeated key.
        Set<Object> seen = ConcurrentHashMap.newKeySet();
        return t -> seen.add(keyExtractor.apply(t));
    }

Of course, with a parallel stream the element that survives for each key is not necessarily the first one; it is effectively random.
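If it matters which element survives, one alternative (a different technique from the filter above, not something the article itself uses) is to collect into a LinkedHashMap keyed by the extracted key and keep the first value on collisions. The sketch below assumes a sequential stream; firstByKey is a hypothetical helper name.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

class FirstByKeyDemo {
    // Illustrative User without equals/hashCode overrides: the approach is non-invasive.
    static final class User {
        final Integer id;
        final String name;
        User(Integer id, String name) { this.id = id; this.name = name; }
        Integer getId() { return id; }
        @Override public String toString() { return id + ":" + name; }
    }

    // Keeps the first element seen for each key, in encounter order.
    static <T, K> List<T> firstByKey(List<T> items, Function<? super T, ? extends K> keyExtractor) {
        Map<K, T> byKey = items.stream().collect(Collectors.toMap(
                keyExtractor,
                Function.identity(),
                (first, later) -> first, // on a duplicate key, keep the earlier element
                LinkedHashMap::new));    // preserves insertion order of keys
        return new ArrayList<>(byKey.values());
    }

    public static void main(String[] args) {
        List<User> users = Arrays.asList(
                new User(1, "a"), new User(1, "b"), new User(2, "b"), new User(1, "a"));
        System.out.println(firstByKey(users, User::getId)); // [1:a, 2:b]
    }
}
```

The trade-off is that the whole result is materialized in a map before anything is emitted, whereas the filter-based predicate passes elements downstream as they arrive.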

The method above is the best one found so far, and it is non-invasive. But if you insist on using distinct, the only option is to override hashCode and equals, just as in the HashSet approach.

Summary

Whether you can really use these tools only shows in practice; otherwise, when the moment comes that you actually need them, it is hard to produce them on the spot, or you end up taking a risk. And if you want to use them with confidence, you need to understand the rules and how they are implemented. For example, how do the implementations of LinkedHashSet and HashSet differ?

Attached is the remarkably simple source code of LinkedHashSet:

    public class LinkedHashSet<E>
            extends HashSet<E>
            implements Set<E>, Cloneable, java.io.Serializable {

        private static final long serialVersionUID = -2851667679971038690L;

        public LinkedHashSet(int initialCapacity, float loadFactor) {
            super(initialCapacity, loadFactor, true);
        }

        public LinkedHashSet(int initialCapacity) {
            super(initialCapacity, .75f, true);
        }

        public LinkedHashSet() {
            super(16, .75f, true);
        }

        public LinkedHashSet(Collection<? extends E> c) {
            super(Math.max(2*c.size(), 11), .75f, true);
            addAll(c);
        }

        @Override
        public Spliterator<E> spliterator() {
            return Spliterators.spliterator(this, Spliterator.DISTINCT | Spliterator.ORDERED);
        }
    }
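The visible difference between the two classes is iteration order. As a small sketch (dedupKeepOrder is a hypothetical helper name), deduplicating through a LinkedHashSet keeps elements in first-insertion order, while HashSet makes no promise about order:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;

class OrderDemo {
    // Removes duplicates while keeping the order in which elements first appeared.
    static <T> List<T> dedupKeepOrder(List<T> items) {
        return new ArrayList<>(new LinkedHashSet<>(items));
    }

    public static void main(String[] args) {
        List<Integer> ids = Arrays.asList(3, 1, 3, 2, 1);
        System.out.println(dedupKeepOrder(ids));  // [3, 1, 2]
        // HashSet also removes duplicates, but its iteration order is unspecified.
        System.out.println(new HashSet<>(ids));
    }
}
```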


This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
