Contents
- Recommendation 60: Where performance matters, prefer arrays
- Recommendation 61: Use variable-length arrays if necessary
- Recommendation 62: Beware of shallow copies of arrays
- Recommendation 63: Specify an initial capacity for a collection when the size is known
- Recommendation 64: Many algorithms can find extreme values; choose the right one for the occasion
Oh, it turns out the river is neither as shallow as Uncle Ox said, nor as deep as the little squirrel said; only by trying it himself could he know.
--- from the fable "The Pony Crosses the River"
Handling data is a basic capability of every language, and Java offers especially many options: a data set may or may not allow duplicate elements, may or may not allow null, may or may not sort itself automatically, may be blocking or non-blocking, may behave as a stack or as a queue...
This chapter focuses on the three data structures we use most (arrays, ArrayList, and HashMap) to illustrate the points to watch during development, and the discussion extends naturally to Set, Queue, Stack, and so on.
Recommendation 60: Where performance matters, prefer arrays
Arrays are used rather rarely in day-to-day system development; we usually only catch sight of them when reading open-source projects. In Java they are not as convenient as the collection classes List, Set, and Map, but when it comes to handling primitive types, arrays still dominate, and the collection classes themselves are implemented on top of arrays underneath. Take summing a data set as an example:
// sum an int array
public static int sum(int[] datas) {
    int sum = 0;
    for (int i = 0; i < datas.length; i++) {
        sum += datas[i];
    }
    return sum;
}
This sums an int array: it takes each element out and adds it up. For primitive types this is the most efficient approach; a collection comes second. Now look at the same sum written against a List:
// sum a List
public static int sum(List<Integer> datas) {
    int sum = 0;
    for (int i = 0; i < datas.size(); i++) {
        sum += datas.get(i);
    }
    return sum;
}
Note the line sum += datas.get(i); — an unboxing operation actually happens here: the Integer object is automatically converted to a primitive int via its intValue method. For a system where performance is critical this approach is risky, especially when executed a huge number of times. First, when the List is populated every int is boxed into an Integer object; although Java maintains an Integer cache, values outside its range produce new Integer objects. And as we know, primitive types are operated on in stack memory while objects live in heap memory: stack memory is fast but small, heap memory is large but slow, so primitives have the performance advantage. Second, every summation (or any other traversal-based calculation) has to unbox again, producing unnecessary overhead. In an actual test, summing primitives with an array turned out to be roughly ten times as fast as with a collection.
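To see the boxing and unboxing cost for yourself, a rough micro-benchmark can be sketched as follows. This is my own illustrative test, not the author's: the element count, the timing approach, and the class name SumBenchmark are arbitrary assumptions, and the exact ratio will vary with the JVM and hardware.

import java.util.ArrayList;
import java.util.List;

public class SumBenchmark {
    public static void main(String[] args) {
        int n = 10_000_000;                        // assumed data volume, for illustration only
        int[] array = new int[n];
        List<Integer> list = new ArrayList<Integer>(n);
        for (int i = 0; i < n; i++) {
            array[i] = i;
            list.add(i);                           // boxing: each int is wrapped into an Integer here
        }

        long start = System.nanoTime();
        long arraySum = 0;
        for (int v : array) {                      // pure int arithmetic, no objects involved
            arraySum += v;
        }
        System.out.println("array: " + (System.nanoTime() - start) / 1_000_000 + " ms, sum=" + arraySum);

        start = System.nanoTime();
        long listSum = 0;
        for (Integer v : list) {                   // unboxing: intValue() is called for every element
            listSum += v;
        }
        System.out.println("list:  " + (System.nanoTime() - start) / 1_000_000 + " ms, sum=" + listSum);
    }
}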
Note: In scenarios with high performance requirements, use arrays instead of collections.
Recommendation 61: Use variable-length arrays if necessary
Arrays in Java have a fixed length: once declared and initialized, the length cannot change, which is quite inconvenient in practice. Suppose we want to keep statistics on the students in a class. Since we do not know in advance how many students the class will have (students may enroll, drop out, or transfer at any time), we need an array "big enough" to hold them all — but how big is big enough? Twenty years ago a desktop with 64 MB of memory was excellent; today you would be embarrassed to mention a machine with less than 2 GB. "Big enough", in other words, is relative to the situation at the time, and as the environment changes it may well turn into "not big enough". The data then exceeds the array's maximum capacity — what do we do? In fact the problem can be solved, somewhat indirectly, by expanding the array, as the following code shows:
public static <T> T[] expandCapacity(T[] datas, int newLen) {
    // the new length cannot be negative
    newLen = newLen < 0 ? 0 : newLen;
    // generate a new array and copy the original values
    return Arrays.copyOf(datas, newLen);
}
The code above uses the copyOf method of the Arrays utility class: it produces a new array of length newLen, copies the original values into it, and fills the extra elements with the default value for the type (0, false, or null). It is used like this:
public class Client61 {
    public static void main(String[] args) {
        // a class accommodates at most 60 students
        Stu[] stuNums = new Stu[60];
        // stuNums initialization ...
        // occasionally a class holds 80 people, so the array is expanded
        stuNums = expandCapacity(stuNums, 80);
        /* initialize the 20 students over the original limit ... */
    }

    public static <T> T[] expandCapacity(T[] datas, int newLen) {
        // the new length cannot be negative
        newLen = newLen < 0 ? 0 : newLen;
        // generate a new array and copy the original values
        return Arrays.copyOf(datas, newLen);
    }
}

class Stu {
}
In this roundabout way the problem of a variable-length array is solved; in fact the automatic resizing of the collection classes works on a similar principle. In real-world development, if you really do need a variable-length data set, arrays should still be considered rather than ruled out just because their length is fixed.
Recommendation 62: Beware of shallow copies of arrays
Consider an example: the first box holds 7 balloons of different colors, and we now want to put 7 balloons into a second box as well, identical except that the last balloon is changed to blue. Since 6 of the balloons are the same, it is natural to think we can get there by copying the balloons from the first box and then modifying one. Look at the implementation:
import java.util.Arrays;
import org.apache.commons.lang.builder.ToStringBuilder;

public class Client62 {
    public static void main(String[] args) {
        // number of balloons
        int ballonNum = 7;
        // the first box
        Balloon[] box1 = new Balloon[ballonNum];
        // initialize the balloons in the first box
        for (int i = 0; i < ballonNum; i++) {
            box1[i] = new Balloon(Color.values()[i], i);
        }
        // the balloons in the second box are copied from the first box
        Balloon[] box2 = Arrays.copyOf(box1, box1.length);
        // modify the color of the last balloon
        box2[6].setColor(Color.Blue);
        // print the balloon colors in the first box
        for (Balloon b : box1) {
            System.out.println(b);
        }
    }
}

// balloon colors
enum Color {
    Yellow, Red, Orange, Green, Indigo, Blue, Violet
}

// balloon
class Balloon {
    // number
    private int id;
    // color
    private Color color;

    Balloon(Color _color, int _id) {
        color = _color;
        id = _id;
    }

    public int getId() {
        return id;
    }

    public void setId(int id) {
        this.id = id;
    }

    public Color getColor() {
        return color;
    }

    public void setColor(Color color) {
        this.color = color;
    }

    @Override
    public String toString() {
        // use ToStringBuilder from apache-commons-lang to build the string
        return new ToStringBuilder(this).append("number", id).append("color", color).toString();
    }
}
The color of the last balloon in the second box has certainly been changed to blue, but we did it by copying the balloons from the first box and then modifying the copy — does that affect the balloons in the first box? Let's look at the output:
The color of the last balloon in the first box has also been modified, even though we only wanted to change the balloon in the second box. Why? This is the classic shallow copy (shallow clone) problem, already described in the chapter on serialization, although there is one small difference here: the elements of the array do not implement the Serializable interface.
Indeed, the array produced by the copyOf method is a shallow copy, and the rule is exactly the same as for a shallow copy in serialization: primitive types have their values copied directly, everything else has only its reference (the address) copied. Note that the clone method of an array behaves the same way — also a shallow copy — and so does the clone method of the collection classes, so extra care is needed when copying.
Once the problem is identified, the solution is simple: traverse box1, create a new Balloon object for each element, and place it into the box2 array. The code is quite straightforward.
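For completeness, here is a minimal sketch of such a deep copy, assuming the Balloon class and the box1 array from the listing above; the loop rebuilds every element, so box2 no longer shares any Balloon objects with box1.

// deep copy: create a new Balloon for every element instead of copying references
Balloon[] box2 = new Balloon[box1.length];
for (int i = 0; i < box1.length; i++) {
    box2[i] = new Balloon(box1[i].getColor(), box1[i].getId());
}
// changing the copy now leaves the first box untouched
box2[6].setColor(Color.Blue);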
The most common place this trick is used is in business processing with a collection (a List, say): you find you need to copy the elements of the collection, the collection provides no copy method, and writing one yourself is tedious — so you simply convert the list to an array with List.toArray, copy it with Arrays.copyOf, and convert it back to a collection. Simple and convenient! Unfortunately, this runs straight into the shallow-copy trap: although a shallow copy solves the business problem in many cases, more often it leaves a hidden pitfall behind, so beware, and beware again.
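A sketch of that round trip, again using the Balloon array from the example above (the variable names are my own). Note that the resulting list is a new container, but its elements are still the very same objects, so this is once more only a shallow copy.

import java.util.Arrays;
import java.util.List;

// List -> array -> copy -> List
List<Balloon> srcList = Arrays.asList(box1);
Balloon[] tmp = srcList.toArray(new Balloon[0]);
List<Balloon> copyList = Arrays.asList(Arrays.copyOf(tmp, tmp.length));
// copyList.get(6) == srcList.get(6) is still true: both point at the same Balloon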
Recommendation 63: Specify an initial capacity for a collection when the size is known
We use collections such as ArrayList, Vector, and HashMap all the time, usually just writing new followed by the class name and then calling add, remove, and so on. Because the length is managed automatically, we never need to worry about running out of room — a genuine virtue, but one that comes with things we must pay attention to.
Let's take ArrayList as an example and look at how Java implements dynamic length management, starting with the add method. The code (JDK 7) is as follows:
public boolean add(E e) {
    // extend the length
    ensureCapacityInternal(size + 1);  // Increments modCount!!
    // append the element
    elementData[size++] = e;
    return true;
}
We know that ArrayList is a variable-size array, but underneath it stores its data in an ordinary array (the elementData field), and since the length of an array is fixed, growing dynamically means the array must be expanded. The ensureCapacityInternal method provides this, with the following code:
private void ensureCapacityInternal(int minCapacity) {
    // modify the counter
    modCount++;
    // overflow-conscious code
    if (minCapacity - elementData.length > 0)
        grow(minCapacity);
}

private void grow(int minCapacity) {
    // overflow-conscious code
    // the original (currently defined) array length
    int oldCapacity = elementData.length;
    // new length = original length + original length shifted right by one ==> 1.5 times the original
    int newCapacity = oldCapacity + (oldCapacity >> 1);
    if (newCapacity - minCapacity < 0)
        newCapacity = minCapacity;
    if (newCapacity - MAX_ARRAY_SIZE > 0)
        newCapacity = hugeCapacity(minCapacity);
    // minCapacity is usually close to size, so this is a win:
    // array copy, generating a new array
    elementData = Arrays.copyOf(elementData, newCapacity);
}

private static int hugeCapacity(int minCapacity) {
    if (minCapacity < 0) // overflow
        throw new OutOfMemoryError();
    return (minCapacity > MAX_ARRAY_SIZE) ? Integer.MAX_VALUE : MAX_ARRAY_SIZE;
}
Let's briefly analyze this source code (it is from JDK 7, which optimizes the handling compared with earlier versions). Start with the first method, ensureCapacityInternal; the name roughly means "ensure internal capacity". One clarification: size is the number of elements currently stored, not the capacity of the ArrayList — the capacity is the length of the elementData array. The parameter minCapacity is the minimum capacity to check for, so the method's job is to guarantee that elementData is no shorter than minCapacity, calling grow to enlarge it if necessary. Growing the capacity is also a structural change, so modCount is incremented by 1.
The grow method first expands the capacity by 1.5 times; here oldCapacity >> 1 is a binary right shift, equivalent to dividing by 2 — if you don't know that, go brush up on bit operations. It then compares this tentative new capacity (not yet the real capacity; call it the expected capacity) with the minimum capacity actually required, and if it is still too small, raises it to that minimum. Next it checks whether the capacity exceeds MAX_ARRAY_SIZE, whose value is Integer.MAX_VALUE - 8, i.e. 8 less than the maximum int (the JDK comment explains that some VMs reserve a few header words in an array). If the limit is exceeded, hugeCapacity is called to check whether the capacity has overflowed the int range. Cases anywhere near the maximum int are extremely rare — with that much data nobody would use an ArrayList as the container — so you will probably never see hugeCapacity run. Finally, with the new capacity settled, Arrays.copyOf generates a new array, and copyOf also takes care of copying the data into it.
Back to the point: pay attention to how the array length is computed. elementData is not lengthened by 1 every time an element is added; instead, when elementData reaches its limit it is expanded by 1.5 times in one go, which avoids the performance cost of calling copyOf over and over — otherwise every added element would trigger an expansion and performance would suffer badly. You may wonder why the factor is 1.5 rather than 2.5 or 3.5. The reason is a trade-off: the larger the expansion, the more memory is wasted (with 1.5x growth at most about 33% of the array space is wasted, whereas 2.5x growth can waste around 60%), while too small an expansion means memory must be reallocated many times, which hurts performance badly. Testing shows that 1.5x growth satisfies both the performance requirement and the memory consumption.
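As a rough illustration of that growth rule, the following sketch (mine, not JDK code) applies the same oldCapacity + (oldCapacity >> 1) arithmetic repeatedly, starting from the default capacity of 10 discussed below:

public class GrowthDemo {
    public static void main(String[] args) {
        int capacity = 10;                          // the default initial capacity
        int expansions = 0;
        while (capacity < 1000) {                   // how far must it grow to hold 1000 elements?
            capacity = capacity + (capacity >> 1);  // the 1.5x rule used by grow()
            expansions++;
            System.out.println("expansion " + expansions + ": capacity = " + capacity);
        }
        // prints 15, 22, 33, 49, 73, 109, 163, 244, 366, 549, 823, 1234 -- 12 expansions in total
    }
}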
Now that we understand how ArrayList expands, one question remains: what is the default length of elementData? The answer is 10. If we declare an ArrayList with the default constructor, e.g. new ArrayList(), the initial length of elementData is 10. Let's look at the three constructors of ArrayList:
// no-argument constructor
public ArrayList() {
    this(10);
}

// constructs an empty list with the specified initial capacity
public ArrayList(int initialCapacity) {
    super();
    if (initialCapacity < 0)
        throw new IllegalArgumentException("Illegal Capacity: " + initialCapacity);
    this.elementData = new Object[initialCapacity];
}

// constructs a list containing the elements of the specified collection,
// in the order they are returned by the collection's iterator
public ArrayList(Collection<? extends E> c) {
    elementData = c.toArray();
    size = elementData.length;
    // c.toArray might (incorrectly) not return Object[] (see 6260652)
    if (elementData.getClass() != Object[].class)
        elementData = Arrays.copyOf(elementData, size, Object[].class);
}
ArrayList(): the default constructor, which provides an empty list with an initial capacity of 10.
ArrayList(int initialCapacity): constructs an empty list with the specified initial capacity.
ArrayList(Collection<? extends E> c): constructs a list containing the elements of the specified Collection, in the order they are returned by the Collection's iterator.
From this we can see that if no initial capacity is set, the list grows by the 1.5x rule, and every expansion means an array copy; with a large volume of data such copying is resource-hungry and inefficient. So if we know roughly how long an ArrayList is going to be, setting an initial capacity on it can noticeably improve system performance.
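A minimal sketch of the difference; the expected element count of one million and the class name are my own illustrative assumptions:

import java.util.ArrayList;
import java.util.List;

public class InitialCapacityDemo {
    public static void main(String[] args) {
        int expected = 1_000_000;                              // assumed known data volume

        // grows from capacity 10 and copies the backing array on every expansion
        List<Integer> withoutHint = new ArrayList<Integer>();

        // allocates the backing array once; no expansion occurs while adding 'expected' elements
        List<Integer> withHint = new ArrayList<Integer>(expected);

        for (int i = 0; i < expected; i++) {
            withoutHint.add(i);
            withHint.add(i);
        }
    }
}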
Other collections such as Vector are similar to ArrayList; only the expansion factor differs — Vector doubles its capacity. Interested readers can look at the JDK source code of Vector and HashMap.
Recommendation 64: Many algorithms can find extreme values; choose the right one for the occasion
Sorting a batch of data and then looking for the maximum or minimum value is basic data-structure knowledge. In Java we can either write the algorithm ourselves or sort the array first and then take the value. Using the maximum value as the example, let's look at the options:
(1) Implement it yourself: find the maximum with a single scan
First, the algorithm that finds the maximum value by scanning the array directly; the code is as follows:
public static int max(int[] data) {
    int max = data[0];
    for (int i : data) {
        max = max > i ? max : i;
    }
    return max;
}
This is the maximum-value algorithm we use most often, and also the fastest: it needs no sorting, just one pass over the array to find the maximum.
(2) Sort first, then take the value
For the maximum value we can also sort first and then take the last element; the code is as follows:
public static int max(int[] data) {
    // sort a copy so the caller's array keeps its original order
    int[] copy = data.clone();
    Arrays.sort(copy);
    return copy[copy.length - 1];
}
In terms of efficiency, the hand-written scan is of course faster — it computes the maximum in a single pass. In actual tests, however, if the array holds fewer than about 10,000 elements there is essentially no difference: both finish within the same millisecond. In that case you may as well skip writing the algorithm and simply sort the array and take the value directly.
If the array has more than about 10,000 elements, decide according to the actual situation: implementing the scan yourself improves performance, while sorting first and then taking the value is simple and easy to understand. Leaving performance aside, either can be chosen, and the latter is actually more convenient and easier to think of.
Now the question: why does the code sort a copy made with data.clone() rather than the array itself? Because an array is also an object — without the copy, sorting would change the order of the original array's elements, which matters unless the element order is irrelevant. And what if we want the element next to the maximum, i.e. the second largest? Note that array elements can repeat, so the maximum may occur more than once, and simply sorting and taking the second-to-last element does not solve the problem.
What is needed here is a special kind of sorting: eliminate the duplicates first, then sort. We could of course write that ourselves, but the collection classes already provide an excellent tool, and writing it by hand would be reinventing the wheel. An array cannot reject duplicate data, but a Set can, and the Set subclass TreeSet additionally sorts automatically. The code is as follows:
public static int getSecond(Integer[] data) {
    // convert to a List
    List<Integer> dataList = Arrays.asList(data);
    // convert to a TreeSet, which removes duplicate elements and sorts in ascending order
    TreeSet<Integer> ts = new TreeSet<Integer>(dataList);
    // get the largest value smaller than the maximum, i.e. the second largest
    return ts.lower(ts.last());
}
Removing the duplicate elements and arranging them in ascending order is handled by the TreeSet class, and the lower method then finds the largest value smaller than the maximum. See how simple the program is? If we wrote the code ourselves we would need at least two passes to compute the second-largest value, and the complexity of the code would rise considerably. So in practice, when looking for extreme values — the maximum, the minimum, the second largest, the second smallest, and so on — using a collection is the simplest approach; from a performance standpoint, of course, the array remains the best choice.
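A quick usage sketch of getSecond; the numbers are made up for illustration and deliberately contain the maximum value twice:

Integer[] data = {3, 7, 9, 9, 5, 1};
// sorting alone would put 9 in both of the last two positions;
// the TreeSet drops the duplicate, so the second-largest distinct value is 7
System.out.println(getSecond(data));   // prints 7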
Note: For computing extreme values, a collection is the simplest to use, while an array gives the best performance.
Reprinted from "Writing High-Quality Code: 151 Suggestions for Improving Java Programs" (Chapter 5: Arrays and Collections, Recommendations 60–64).