Background
The alarm subsystem monitors metric data for all ports of 40,000 large network elements and decides, based on threshold configuration, whether an alarm should be raised. Every 5 minutes the acquisition/data-processing subsystem collects 240,000 data points and sends 240,000 messages to the alarm subsystem; these messages cover 1 million entity metric values. The alarm subsystem is deployed across multiple nodes to share the load, with each node handling the data of different network element types, different entities, and different metrics. Filtering such massive data inevitably relies on a large number of set operations, and using them improperly causes performance bottlenecks.
Example
The set of entities monitored by an alarm node changes dynamically, so each node must maintain its own monitoring list on the fly. The code therefore uses Collection.removeAll to compute the set difference: it finds the newly added entities, then goes on to compute the historical averages for those new entities and to discard data for entities that have dropped out of the list.
package com.coshaho.hash;

import java.util.ArrayList;
import java.util.List;

public class HashObject {
    public static void main(String[] args) {
        List<String> list1 = new ArrayList<String>();
        List<String> list2 = new ArrayList<String>();

        // Compute the difference of two 2000-element lists
        for (int i = 0; i < 2000; i++) {
            list1.add("" + i);
            list2.add("" + (i + 1));
        }
        long startTime = System.currentTimeMillis();
        list1.removeAll(list2);
        long endTime = System.currentTimeMillis();
        System.out.println("2000 list remove all cost: " + (endTime - startTime) + "ms.");

        // Compute the difference of two 10000-element lists
        list1.clear();
        list2.clear();
        for (int i = 0; i < 10000; i++) {
            list1.add("" + i);
            list2.add("" + (i + 1));
        }
        startTime = System.currentTimeMillis();
        list1.removeAll(list2);
        endTime = System.currentTimeMillis();
        System.out.println("10000 list remove all cost: " + (endTime - startTime) + "ms.");

        // Compute the difference of two 50000-element lists
        list1.clear();
        list2.clear();
        for (int i = 0; i < 50000; i++) {
            list1.add("" + i);
            list2.add("" + (i + 1));
        }
        startTime = System.currentTimeMillis();
        list1.removeAll(list2);
        endTime = System.currentTimeMillis();
        System.out.println("50000 list remove all cost: " + (endTime - startTime) + "ms.");
    }
}
In the above code, we compute the difference of two lists at lengths 2000, 10000, and 50000. The timings are as follows:
2000 list remove all cost: 46ms.
10000 list remove all cost: 1296ms.
50000 list remove all cost: 31028ms.
As you can see, each time the amount of data grows 5 times, the time taken by the ArrayList difference operation grows roughly 30 times. For a difference operation over hundreds of thousands of elements, the time cost would be unacceptable.
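The growth is roughly quadratic because ArrayList.removeAll must answer a contains query for every element, and contains on an ArrayList is itself a linear scan. The sketch below is a simplified equivalent of that work (the real JDK implementation goes through batchRemove, but the cost is the same O(n * m)):

import java.util.Iterator;
import java.util.List;

public class NaiveRemoveAll {
    // For each of the n elements of list1, scan the m elements of list2:
    // O(n * m) comparisons when both collections are ArrayLists.
    static <T> void naiveRemoveAll(List<T> list1, List<T> list2) {
        Iterator<T> it = list1.iterator();
        while (it.hasNext()) {
            if (list2.contains(it.next())) { // linear scan per element
                it.remove();
            }
        }
    }
}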
Equals
In entity filtering, to pick out the entity data we care about, we inevitably use Collection.contains to filter by entity ID, which in turn uses String's equals method to decide whether two IDs are equal. For our purposes, two strings are equal when they have the same length and the characters at each corresponding position are equal. Comparing a huge number of strings pairwise with this algorithm means an enormous amount of computation and a large performance cost. This is where hashCode becomes especially important.
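As an illustration of that cost, character-by-character comparison boils down to something like this (a simplified sketch, not the actual JDK source):

public class StringCompareSketch {
    // Two strings are equal if they have the same length and
    // equal characters at every position: O(length) per comparison.
    static boolean stringsEqual(String a, String b) {
        if (a.length() != b.length()) {
            return false;
        }
        for (int i = 0; i < a.length(); i++) {
            if (a.charAt(i) != b.charAt(i)) {
                return false;
            }
        }
        return true;
    }
}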
Hashcode
hashCode is an int. If two objects are equal (equals returns true), their hashCodes must be equal; conversely, if two objects have different hashCodes, equals must return false. Under an ideal hash algorithm, unequal objects would never share a hashCode, so every equals comparison would reduce to a comparison of hashCodes, and the amount of computation would drop dramatically. In reality, no hash algorithm can achieve this: hashCode is an int, so its range is limited, and once there are more objects than possible int values, some unequal objects must map to the same hashCode. Unequal objects sharing the same hashCode is called a hash collision.
However, a good hash algorithm has a very low probability of collision. For example, a 0.01% collision probability means that out of 10,000 equals comparisons between unequal objects, on average only one pair collides, that is, only one comparison has to fall through to the main logic of equals. So when designing an equals method, we can first compare the two objects' hashCodes: if they differ, return false; only if they match do we run the main comparison logic of equals.
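A minimal sketch of this pattern follows (EntityId is a hypothetical class invented for illustration, not code from the alarm subsystem): it caches the hashCode at construction, compares hashCodes first, and only falls through to the expensive field comparison when they match.

public final class EntityId {
    private final String id;
    private final int hash; // cached at construction time

    public EntityId(String id) {
        this.id = id;
        this.hash = id.hashCode();
    }

    @Override
    public int hashCode() {
        return hash;
    }

    @Override
    public boolean equals(Object obj) {
        if (this == obj) {
            return true;
        }
        if (!(obj instanceof EntityId)) {
            return false;
        }
        EntityId other = (EntityId) obj;
        if (hash != other.hash) {
            return false; // different hashCodes: cannot be equal
        }
        return id.equals(other.id); // main comparison logic
    }
}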
The original hashCode method is implemented natively by the virtual machine and is typically derived from the object's address. String overrides hashCode with the following code:
// Object
public native int hashCode();

// String
public int hashCode() {
    int h = hash;
    if (h == 0 && value.length > 0) {
        char val[] = value;
        for (int i = 0; i < value.length; i++) {
            h = 31 * h + val[i];
        }
        hash = h;
    }
    return h;
}
HashMap
HashMap is a container that stores entries hashed by the key's hashCode. It stores data in an array of buckets, where each bucket holds a linked list (and, since JDK 8, a red-black tree once a list grows long). [Figure: HashMap structure, bucket array with linked lists / red-black tree]
In the simplest scheme, a key's position in the array is computed as the remainder hashCode % array length (in reality the JDK uses a better dispersal algorithm). As you can imagine, with the same hash algorithm, the longer the array, the lower the probability of hash collisions, but the more space is used.
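A minimal sketch of this simplest scheme (the real JDK additionally spreads the hash bits and uses a power-of-two bitmask instead of %):

public class BucketIndexSketch {
    // Simplified bucket selection: hashCode modulo array length.
    // Math.floorMod keeps the result non-negative even for negative hashCodes.
    static int bucketIndex(Object key, int arrayLength) {
        return Math.floorMod(key.hashCode(), arrayLength);
    }

    public static void main(String[] args) {
        System.out.println(bucketIndex("port-0001", 16)); // a bucket in [0, 16)
    }
}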
The JDK uses 0.75 as the default ratio of element count to array length (the load factor). The default initial array length is 16 (the capacity is kept at a power of 2 so HashMap can compute bucket indexes efficiently); when the number of elements grows to 16 * 0.75 = 12, the array length doubles and every element's position is recalculated. With huge amounts of data, we should give the HashMap a sufficiently large initial capacity when constructing it, and when performance comes first we can also lower the load factor appropriately. Part of the HashMap source:
/**
 * The default initial capacity - MUST be a power of two.
 */
static final int DEFAULT_INITIAL_CAPACITY = 1 << 4; // aka 16

/**
 * The maximum capacity, used if a higher value is implicitly specified
 * by either of the constructors with arguments.
 * MUST be a power of two <= 1<<30.
 */
static final int MAXIMUM_CAPACITY = 1 << 30;

/**
 * The load factor used when none specified in constructor.
 */
static final float DEFAULT_LOAD_FACTOR = 0.75f;

/**
 * Constructs an empty <tt>HashMap</tt> with the specified initial
 * capacity and load factor.
 *
 * @param initialCapacity the initial capacity
 * @param loadFactor      the load factor
 * @throws IllegalArgumentException if the initial capacity is negative
 *         or the load factor is nonpositive
 */
public HashMap(int initialCapacity, float loadFactor) {
    if (initialCapacity < 0)
        throw new IllegalArgumentException("Illegal initial capacity: " + initialCapacity);
    if (initialCapacity > MAXIMUM_CAPACITY)
        initialCapacity = MAXIMUM_CAPACITY;
    if (loadFactor <= 0 || Float.isNaN(loadFactor))
        throw new IllegalArgumentException("Illegal load factor: " + loadFactor);

    this.loadFactor = loadFactor;
    threshold = initialCapacity;
    init();
}

/**
 * Constructs an empty <tt>HashMap</tt> with the specified initial
 * capacity and the default load factor (0.75).
 *
 * @param initialCapacity the initial capacity.
 * @throws IllegalArgumentException if the initial capacity is negative.
 */
public HashMap(int initialCapacity) {
    this(initialCapacity, DEFAULT_LOAD_FACTOR);
}

/**
 * Constructs an empty <tt>HashMap</tt> with the default initial capacity
 * (16) and the default load factor (0.75).
 */
public HashMap() {
    this(DEFAULT_INITIAL_CAPACITY, DEFAULT_LOAD_FACTOR);
}
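As a usage sketch (the sizing formula below is a common rule of thumb, not from the original article): if we expect roughly N entries, an initial capacity of at least N / loadFactor keeps the map from resizing and rehashing while it fills.

import java.util.HashMap;
import java.util.Map;

public class PresizedMapExample {
    public static void main(String[] args) {
        int expectedEntries = 1_000_000; // e.g. one collection cycle of metric data
        float loadFactor = 0.75f;

        // Capacity large enough that expectedEntries stays below the resize threshold.
        int initialCapacity = (int) (expectedEntries / loadFactor) + 1;

        Map<String, Double> metricsByEntityId = new HashMap<>(initialCapacity, loadFactor);
        metricsByEntityId.put("port-0001", 99.5); // hypothetical entity ID and value
        System.out.println(metricsByEntityId.size());
    }
}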
Performance Considerations for Set Operations on Big Data
From the above analysis, we know that in performance-first scenarios, set operations on big data should use hash-based collections (HashMap, HashSet, Hashtable) to store the data. Modifying the code from the beginning of the article to use HashSet.removeAll gives the following:
package com.coshaho.hash;

import java.util.Collection;
import java.util.HashSet;

public class HashObject {
    public static void main(String[] args) {
        Collection<String> list1 = new HashSet<String>();
        Collection<String> list2 = new HashSet<String>();

        // Compute the difference of two 2000-element sets
        for (int i = 0; i < 2000; i++) {
            list1.add("" + i);
            list2.add("" + (i + 1));
        }
        long startTime = System.currentTimeMillis();
        list1.removeAll(list2);
        long endTime = System.currentTimeMillis();
        System.out.println("2000 list remove all cost: " + (endTime - startTime) + "ms.");

        // Compute the difference of two 10000-element sets
        list1.clear();
        list2.clear();
        for (int i = 0; i < 10000; i++) {
            list1.add("" + i);
            list2.add("" + (i + 1));
        }
        startTime = System.currentTimeMillis();
        list1.removeAll(list2);
        endTime = System.currentTimeMillis();
        System.out.println("10000 list remove all cost: " + (endTime - startTime) + "ms.");

        // Compute the difference of two 50000-element sets
        list1.clear();
        list2.clear();
        for (int i = 0; i < 50000; i++) {
            list1.add("" + i);
            list2.add("" + (i + 1));
        }
        startTime = System.currentTimeMillis();
        list1.removeAll(list2);
        endTime = System.currentTimeMillis();
        System.out.println("50000 list remove all cost: " + (endTime - startTime) + "ms.");
    }
}
The results are as follows:
2000 list remove all cost: 31ms.
10000 list remove all cost: 0ms.
50000 list remove all cost: 16ms.