Writing High-Quality Code: 151 Suggestions for Improving Java Programs (Chapter 1: Arrays and Collections, Suggestions 60 ~ 64)

Source: Internet
Author: User

Oh, now it understood: the river was neither as shallow as the old cow had said nor as deep as the squirrel had said. You only know once you have tried it yourself.

--- Fable: The Pony Crosses the River

Data processing is a required capability in every language, and Java offers even more choices than most: a data set may allow duplicates or forbid them, allow null or forbid it, sort automatically or not, block or not block, behave as a stack or as a queue ......

This chapter focuses on the three most used data containers (arrays, ArrayList, and HashMap), explains the precautions to take during development, and extends the discussion to Set, Queue, Stack, and so on.

Suggestion 60: For performance, arrays are the first choice

Arrays are rarely used in actual system development; we usually see them only when reading open-source projects, because in Java they are not as convenient to use as the List, Set, and Map collection classes. For primitive types, however, arrays are dominant, and the collection classes themselves are implemented on top of arrays. For example, to calculate the sum of a data set:

    // Sum an int array
    public static int sum(int[] datas) {
        int sum = 0;
        for (int i = 0; i < datas.length; i++) {
            sum += datas[i];
        }
        return sum;
    }

This sums an array of type int by taking every element and adding them together. For primitive types, an array is the most efficient choice for this kind of algorithm, with collections coming second. Here is the same summation using a List:

    // Sum a List of Integer
    public static int sum(List<Integer> datas) {
        int sum = 0;
        for (int i = 0; i < datas.size(); i++) {
            sum += datas.get(i);
        }
        return sum;
    }

Pay attention to the line sum += datas.get(i); an unboxing operation happens here: the Integer object is automatically converted to a primitive int through the intValue method. For a system whose performance is close to its limit, this is a risky choice, especially with large data volumes. First, when the List is populated, every int has to be boxed into an Integer object. There is an Integer cache, but a value outside the cache range still produces a new Integer object. Primitive local variables are handled on the stack, while objects are allocated on the heap; the stack is fast but small, whereas the heap is slower but large, so in terms of performance primitive types have the advantage. Second, the summation (or any other traversal) has to unbox every element, which adds unnecessary overhead. In our actual tests, summing primitive values with an array was about 10 times as fast as doing it with a collection.

  Note: In scenarios with demanding performance requirements, use arrays instead of collections.
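To make the hidden conversions visible, here is a minimal sketch that spells out what auto-boxing and auto-unboxing do in the List version. The class name BoxingSketch and its helper methods are made up for this illustration, and it is not a benchmark; it only shows where the extra object work happens.

```java
import java.util.ArrayList;
import java.util.List;

class BoxingSketch {
    // Primitive version: no objects are created while summing.
    static int sumArray(int[] datas) {
        int sum = 0;
        for (int value : datas) {
            sum += value;
        }
        return sum;
    }

    // Collection version with the unboxing step written out by hand.
    static int sumList(List<Integer> datas) {
        int sum = 0;
        for (int i = 0; i < datas.size(); i++) {
            Integer boxed = datas.get(i);
            sum += boxed.intValue(); // what auto-unboxing does here
        }
        return sum;
    }

    // Building the List: every int is boxed on the way in.
    static List<Integer> toList(int[] datas) {
        List<Integer> list = new ArrayList<Integer>(datas.length);
        for (int value : datas) {
            list.add(Integer.valueOf(value)); // what auto-boxing does here
        }
        return list;
    }

    public static void main(String[] args) {
        int[] datas = {1, 2, 3, 1000};
        System.out.println(sumArray(datas));        // prints 1006
        System.out.println(sumList(toList(datas))); // prints 1006
    }
}
```

Both methods return the same result; the difference is only in how much work is done per element.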

Suggestion 61: Use a variable-length array if necessary

An array in Java has a fixed length: once it is declared and initialized, the length cannot be changed, which is inconvenient in practice. Suppose you need to keep records for the students of a class. Because we do not know how many students a class will have (students may enroll, drop out, or transfer at any time), we need an array large enough to hold them all. But how large is large enough? Twenty years ago, 64 MB of memory in a desktop computer was considered generous; today, with less than 2 GB (which is already on the small side), you would be embarrassed to discuss your computer's configuration with others. "Big enough" only holds for the current scenario. As the environment changes, "big enough" may turn into "too small", and the maximum capacity of the array will be exceeded. How can this be solved? In fact, the problem can be solved by expanding the array. The code is as follows:

    public static <T> T[] expandCapacity(T[] datas, int newLen) {
        // Must not be negative
        newLen = newLen < 0 ? 0 : newLen;
        // Create a new array and copy the original values
        return Arrays.copyOf(datas, newLen);
    }

The code above uses the copyOf method of the Arrays utility class to create a new array of length newLen and copy the original values into it. Any extra elements hold the default value (0, false, or null, depending on the type). It is used as follows:

    import java.util.Arrays;

    public class Client61 {
        public static void main(String[] args) {
            // A class can hold up to 60 students
            Stu[] stuNums = new Stu[60];
            // Initialize stuNums ......
            // Occasionally a class holds 80 students, so the array is expanded
            stuNums = expandCapacity(stuNums, 80);
            /* Initialize the 20 additional students ...... */
        }

        public static <T> T[] expandCapacity(T[] datas, int newLen) {
            // Must not be negative
            newLen = newLen < 0 ? 0 : newLen;
            // Create a new array and copy the original values
            return Arrays.copyOf(datas, newLen);
        }
    }

    class Stu {
    }

In this way the variable-length problem of arrays is solved in a roundabout manner. In fact, collections maintain their length automatically on a similar principle. In actual development, if you really need a variable-length data set, arrays are still worth considering and should not be ruled out just because their length is fixed.
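As a quick check of the padding behavior described above, the following sketch expands a String array and an int array and prints the results. The class name ExpandSketch and the sample values are made up for this illustration.

```java
import java.util.Arrays;

class ExpandSketch {
    // Same idea as expandCapacity in the text.
    static <T> T[] expandCapacity(T[] datas, int newLen) {
        newLen = newLen < 0 ? 0 : newLen;
        return Arrays.copyOf(datas, newLen);
    }

    public static void main(String[] args) {
        String[] names = {"Ann", "Bob"};
        String[] bigger = expandCapacity(names, 4);
        // Object arrays are padded with null
        System.out.println(Arrays.toString(bigger)); // prints [Ann, Bob, null, null]

        // Primitive arrays use the Arrays.copyOf overloads and are padded with 0
        int[] scores = Arrays.copyOf(new int[] {90, 85}, 3);
        System.out.println(Arrays.toString(scores)); // prints [90, 85, 0]
    }
}
```

Note that the original array is left untouched; the caller has to keep the returned array, as the stuNums = expandCapacity(stuNums, 80) assignment does.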

Suggestion 62: Watch out for shallow copies of arrays

Consider this example. The first box contains seven balloons: red, orange, yellow, green, indigo, blue, and violet. Now we want to put seven balloons in the second box as well, with the last one changed to blue, that is, red, orange, yellow, green, indigo, blue, and blue. It is natural to think that the second box can be produced by copying the balloons of the first box; after all, six of the balloons are the same. Let's look at the implementation code:

    import java.util.Arrays;
    import org.apache.commons.lang.builder.ToStringBuilder;

    public class Client62 {
        public static void main(String[] args) {
            // Number of balloons
            int balloonNum = 7;
            // The first box
            Balloon[] box1 = new Balloon[balloonNum];
            // Initialize the balloons in the first box
            for (int i = 0; i < balloonNum; i++) {
                box1[i] = new Balloon(Color.values()[i], i);
            }
            // The balloons in the second box are copied from the first box
            Balloon[] box2 = Arrays.copyOf(box1, box1.length);
            // Change the color of the last balloon
            box2[6].setColor(Color.Blue);
            // Print the balloon colors in the first box
            for (Balloon b : box1) {
                System.out.println(b);
            }
        }
    }

    // Balloon colors
    enum Color {
        Red, Orange, Yellow, Green, Indigo, Blue, Violet
    }

    // Balloon
    class Balloon {
        // Number
        private int id;
        // Color
        private Color color;

        public Balloon(Color _color, int _id) {
            color = _color;
            id = _id;
        }

        public int getId() {
            return id;
        }

        public void setId(int id) {
            this.id = id;
        }

        public Color getColor() {
            return color;
        }

        public void setColor(Color color) {
            this.color = color;
        }

        @Override
        public String toString() {
            // Uses ToStringBuilder from the apache-commons-lang package
            return new ToStringBuilder(this).append("id", id).append("color", color).toString();
        }
    }

The last balloon in the second box has undoubtedly been changed to blue. But we got there by copying the balloons of the first box and then modifying the copy. Does that affect the colors in the first box?

Running the program shows that it does: the last balloon in the first box has changed color too. We only wanted to modify the balloon in the second box, so why? This is a typical shallow copy (shallow clone) problem. It was mentioned in the earlier discussion of serialization, but there is a difference here: the elements of the array do not implement the Serializable interface.

Indeed, the array produced by the copyOf method is a shallow copy, exactly like the shallow copy discussed under serialization: primitive values are copied directly, and for everything else only the reference is copied. Note that the clone method of an array behaves the same way, and so does the clone method of the collections, so take extra care when copying.

Once the problem is identified, the solution is simple: traverse every element of box1, create a new Balloon object for each one, and place it in the box2 array. The code is straightforward.

This pitfall shows up most often when a collection (such as a List) is used in business processing. For example, you find that the elements of a collection need to be copied, but the collection provides no copy method, and writing one yourself seems troublesome. So you simply convert it to an array with List.toArray, copy it with Arrays.copyOf, and convert it back to a collection. Simple and convenient! Unfortunately, you have just walked straight into the shallow copy trap. A shallow copy often does solve the business problem at hand, but it leaves hidden risks behind, so be careful.
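The element-by-element copy described above might look like the following sketch. The copy constructor Balloon(Balloon other) and the helper deepCopy are additions made for this illustration and do not appear in the original Client62; the Balloon class is also trimmed down so that the example is self-contained.

```java
class DeepCopySketch {
    enum Color { Red, Orange, Yellow, Green, Indigo, Blue, Violet }

    static class Balloon {
        private final int id;
        private Color color;

        Balloon(Color color, int id) {
            this.color = color;
            this.id = id;
        }

        // Copy constructor (hypothetical helper, not in the original class)
        Balloon(Balloon other) {
            this(other.color, other.id);
        }

        Color getColor() {
            return color;
        }

        void setColor(Color color) {
            this.color = color;
        }
    }

    // Creates a new array holding a new Balloon object for every element.
    static Balloon[] deepCopy(Balloon[] source) {
        Balloon[] target = new Balloon[source.length];
        for (int i = 0; i < source.length; i++) {
            target[i] = source[i] == null ? null : new Balloon(source[i]);
        }
        return target;
    }

    public static void main(String[] args) {
        Balloon[] box1 = new Balloon[Color.values().length];
        for (int i = 0; i < box1.length; i++) {
            box1[i] = new Balloon(Color.values()[i], i);
        }
        Balloon[] box2 = deepCopy(box1);
        box2[6].setColor(Color.Blue);
        System.out.println(box1[6].getColor()); // prints Violet
        System.out.println(box2[6].getColor()); // prints Blue
    }
}
```

Because Color is an enum, sharing its constants between the two boxes is harmless; only the mutable Balloon objects need to be duplicated.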
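The round trip through an array can be reproduced in a few lines. In this sketch the class name ListCopySketch and the method copyViaArray are made up, and StringBuilder merely stands in for any mutable business object.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ListCopySketch {
    // List -> array -> Arrays.copyOf -> List, as described in the text.
    static List<StringBuilder> copyViaArray(List<StringBuilder> source) {
        StringBuilder[] asArray = source.toArray(new StringBuilder[0]);
        StringBuilder[] copied = Arrays.copyOf(asArray, asArray.length);
        return new ArrayList<StringBuilder>(Arrays.asList(copied));
    }

    public static void main(String[] args) {
        List<StringBuilder> original = new ArrayList<StringBuilder>();
        original.add(new StringBuilder("red"));

        List<StringBuilder> copy = copyViaArray(original);
        copy.get(0).append("-changed");

        // Both lists still point at the same element object
        System.out.println(original.get(0));                // prints red-changed
        System.out.println(original.get(0) == copy.get(0)); // prints true
    }
}
```

The two lists are independent containers, so adding or removing elements in one does not affect the other; it is only the elements themselves that are shared.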

Suggestion 63: Specify the initial capacity of a collection in the appropriate scenarios

We often use ArrayList, Vector, HashMap, and other collections. Usually we just declare one with new and the class name, then operate on it with add, remove, and other methods; because it manages its own length, we do not have to worry much about it. That is indeed a great advantage, but there are still things we must pay attention to.

The following uses ArrayList as an example to explain how Java implements dynamic length management. First, read the add method. The code (JDK 7) is as follows:

    public boolean add(E e) {
        // Expand the length if needed
        ensureCapacityInternal(size + 1);  // Increments modCount!!
        // Append the element
        elementData[size++] = e;
        return true;
    }

We know that ArrayList is a variable-size list, but underneath it stores its elements in an array (the elementData variable), and the length of an array is fixed. To achieve a dynamic length, the array has to be expanded, and the ensureCapacityInternal method does exactly that. The code is as follows:

    private void ensureCapacityInternal(int minCapacity) {
        // Update the modification counter
        modCount++;
        // overflow-conscious code
        if (minCapacity - elementData.length > 0)
            grow(minCapacity);
    }

    private void grow(int minCapacity) {
        // overflow-conscious code
        // Length of the array as last defined (the original length)
        int oldCapacity = elementData.length;
        // New length = original length + original length shifted right by one bit,
        // that is, 1.5 times the original length
        int newCapacity = oldCapacity + (oldCapacity >> 1);
        if (newCapacity - minCapacity < 0)
            newCapacity = minCapacity;
        if (newCapacity - MAX_ARRAY_SIZE > 0)
            newCapacity = hugeCapacity(minCapacity);
        // minCapacity is usually close to size, so this is a win:
        // copy the array into a newly created array
        elementData = Arrays.copyOf(elementData, newCapacity);
    }

    private static int hugeCapacity(int minCapacity) {
        if (minCapacity < 0) // overflow
            throw new OutOfMemoryError();
        return (minCapacity > MAX_ARRAY_SIZE) ? Integer.MAX_VALUE : MAX_ARRAY_SIZE;
    }

Let's analyze the source code, which JDK 7 optimized compared with earlier versions. Start with the first method, ensureCapacityInternal, whose name means "ensure internal capacity". Note that size is the number of elements currently stored, not the capacity of the ArrayList; the capacity is the length of the elementData array. The minCapacity parameter is the minimum capacity required: the method makes sure that the length of elementData is no less than minCapacity, and if it is less, grow is called to increase the capacity. Adding an element is a structural modification, so modCount is increased by 1.

Now the grow method. It first increases the capacity to 1.5 times the old value; oldCapacity >> 1 is a binary right shift, which is equivalent to dividing by 2. It then compares this tentative new capacity (not yet applied) with the minimum capacity actually required, and if the tentative value is smaller, the required minimum is used instead. Next it checks whether the capacity exceeds MAX_ARRAY_SIZE, whose value is Integer.MAX_VALUE - 8, that is, 8 less than the maximum int value. The comment in the JDK source explains that some VMs reserve header words in an array, so attempts to allocate larger arrays may fail with OutOfMemoryError. If the limit is exceeded, hugeCapacity is called to check whether the requested capacity has overflowed the int range. In practice the maximum int value is rarely approached, and nobody would use an ArrayList as the container for that much data, so hugeCapacity will probably never run. Finally, with the new capacity determined, Arrays.copyOf creates the new array and copies the existing data into it.

Back to the main topic. Note that the array is not lengthened by 1 every time an element is added. Instead, when elementData is full, it is expanded to 1.5 times its length, which avoids the overhead of calling copyOf over and over; otherwise every added element would trigger a resize, and performance would be far worse. You may wonder: why 1.5 times, and not 2.5, 3, or 3.5 times? As I see it, the reason is this: if a single expansion is too large, more memory is wasted (with 1.5 times, at most 33% of the array space is unused; with 2.5 times, up to 60%), whereas if a single expansion is too small, memory has to be reallocated for the array many times and the performance cost is serious. Testing has shown that expanding by 1.5 times both meets the performance requirements and keeps memory consumption down.

Now that we know how ArrayList expands, another question arises: what is the default length of elementData? The answer is 10. If we declare an ArrayList the default way, for example new ArrayList(), the initial length of elementData is 10. Let's look at the three constructors of ArrayList:
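To get a feel for the 1.5 times rule, the sketch below replays the capacity arithmetic of grow outside of ArrayList. The class GrowthSketch and its methods are made up for this illustration; it uses long and ignores the MAX_ARRAY_SIZE handling, so it is a simplified model rather than the JDK's code.

```java
class GrowthSketch {
    // Same rule as grow(): old + (old >> 1), but never less than old + 1,
    // which mirrors the minCapacity check when one element is appended.
    static long nextCapacity(long oldCapacity) {
        return Math.max(oldCapacity + (oldCapacity >> 1), oldCapacity + 1);
    }

    // Number of array copies needed before targetSize elements fit.
    static int resizesNeeded(long initialCapacity, long targetSize) {
        long capacity = initialCapacity;
        int resizes = 0;
        while (capacity < targetSize) {
            capacity = nextCapacity(capacity);
            resizes++;
        }
        return resizes;
    }

    public static void main(String[] args) {
        long capacity = 10;
        StringBuilder steps = new StringBuilder("10");
        for (int i = 0; i < 5; i++) {
            capacity = nextCapacity(capacity);
            steps.append(" -> ").append(capacity);
        }
        System.out.println(steps); // prints 10 -> 15 -> 22 -> 33 -> 49 -> 73
        System.out.println(resizesNeeded(10, 1000)); // prints 12
    }
}
```

Under this model, growing from the default capacity of 10 to 1,000 elements takes 12 array copies, whereas growing by one slot at a time would take 990.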

    // Constructs an empty list with an initial capacity of ten.
    public ArrayList() {
        this(10);
    }

    // Constructs an empty list with the specified initial capacity.
    public ArrayList(int initialCapacity) {
        super();
        if (initialCapacity < 0)
            throw new IllegalArgumentException("Illegal Capacity: " + initialCapacity);
        this.elementData = new Object[initialCapacity];
    }

    // Constructs a list containing the elements of the specified collection,
    // in the order they are returned by the collection's iterator.
    public ArrayList(Collection<? extends E> c) {
        elementData = c.toArray();
        size = elementData.length;
        // c.toArray might (incorrectly) not return Object[] (see 6260652)
        if (elementData.getClass() != Object[].class)
            elementData = Arrays.copyOf(elementData, size, Object[].class);
    }

ArrayList(): the default constructor, which creates an empty list with an initial capacity of 10.

ArrayList(int initialCapacity): constructs an empty list with the specified initial capacity.

ArrayList(Collection<? extends E> c): constructs a list containing the elements of the specified collection, arranged in the order returned by the collection's iterator.

From this we can see that if no initial capacity is set, the list starts at the default capacity and expands by the 1.5 times rule. Every expansion is an array copy; with a large amount of data these copies consume a lot of resources and are very inefficient. Therefore, if we know the likely length of an ArrayList, setting its initial capacity can significantly improve performance.

Other collections, such as Vector and HashMap, work on a similar principle, although the expansion factor differs. If you are interested, check the JDK source code of Vector and HashMap.
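A minimal sketch of setting the capacity up front is shown below. The class PresizeSketch, the method fill, and the size of 100000 are made up for this illustration; ArrayList(int) and ensureCapacity(int) are the standard java.util.ArrayList members.

```java
import java.util.ArrayList;
import java.util.List;

class PresizeSketch {
    // The backing array is allocated once, so none of these add() calls
    // has to expand and copy it.
    static List<Integer> fill(int expectedSize) {
        List<Integer> list = new ArrayList<Integer>(expectedSize);
        for (int i = 0; i < expectedSize; i++) {
            list.add(i);
        }
        return list;
    }

    public static void main(String[] args) {
        List<Integer> scores = fill(100000);
        System.out.println(scores.size()); // prints 100000

        // If the size only becomes known after construction,
        // ensureCapacity reserves the space in a single step.
        ArrayList<Integer> later = new ArrayList<Integer>();
        later.ensureCapacity(100000);
        later.add(1);
        System.out.println(later.size()); // prints 1
    }
}
```

The capacity is only a hint about memory to reserve; size() still reports the number of elements actually added.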

Suggestion 64: Choose among the various maximum/minimum algorithms as the situation requires

Sorting a batch of data and finding its maximum or minimum value is basic data structure knowledge. In Java we can write the algorithm ourselves, or sort the array first and then take the value. The following examples illustrate several approaches:

(1) Implement it yourself and find the maximum value quickly

First, look at the algorithm that finds the maximum with a simple linear search. The code is as follows:

    public static int max(int[] data) {
        int max = data[0];
        for (int i : data) {
            max = max > i ? max : i;
        }
        return max;
    }

This is the maximum algorithm we use most often, and also the fastest. No sorting is needed; a single traversal of the array finds the maximum.

(2) Sort first, then take the value

You can also sort the array first and then take the maximum. The code is as follows:

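    public static int max(int[] data) {
        // Sort a copy so that the order of the caller's array is not changed
        int[] sorted = data.clone();
        Arrays.sort(sorted);
        return sorted[sorted.length - 1];
    }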

In terms of efficiency, the hand-written search is of course faster: one traversal yields the maximum. In actual tests, however, when the array has fewer than 10,000 elements there is basically no difference between the two, both finishing within the same millisecond, so you can skip writing your own algorithm and simply sort the array and take the value.

If the array has more than 10,000 elements, decide based on the actual situation: implementing it yourself improves performance, while sorting first and then taking the value is simple and easy to understand. If the performance difference does not matter, either will do, and the latter is arguably preferable because it is more convenient and comes to mind more easily.

Note that the array should be copied with data.clone() before sorting. Why? Because an array is also an object, and sorting it without copying would change the order of the elements in the original array, which is acceptable only if that order does not matter. And what if we want the element second only to the maximum (the runner-up)? Keep in mind that elements in an array can repeat, so the maximum may occur more than once; simply sorting and taking the second-to-last element does not solve the problem.

This calls for a special approach: remove the duplicates first, then sort. You could of course write the algorithm yourself, but the collection classes already provide a very good solution, and writing your own would be reinventing the wheel. An array cannot remove duplicates, but a Set can, and the TreeSet implementation of Set also sorts automatically. The code is as follows:

    public static int getSecond(Integer[] data) {
        // Convert to a List
        List<Integer> dataList = Arrays.asList(data);
        // Convert to a TreeSet, which removes duplicates and sorts in ascending order
        TreeSet<Integer> ts = new TreeSet<Integer>(dataList);
        // Get the largest value smaller than the maximum, that is, the runner-up
        return ts.lower(ts.last());
    }

Removing duplicates and sorting in ascending order are both handled by the TreeSet class, and the lower method then finds the largest value smaller than the maximum. The program above is very simple, isn't it? What if we wrote it ourselves? A hand-written version has to deal with duplicates itself, for example by traversing the array twice or by tracking two values at once, and the code becomes considerably more complex. Therefore, in practical applications, using a collection is the easiest way to find extreme values, including the maximum, the minimum, and the runner-up. Of course, in terms of performance, arrays remain the best choice.

  Note: For maximum/minimum calculations, collections are the simplest to use, while arrays give the best performance.
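For comparison, a hand-written version could look like the sketch below. The class SecondLargestSketch and the method secondLargest are made up for this illustration; this variant walks the array once while tracking both the maximum and the runner-up, and returns null when no runner-up exists (for example, when all elements are equal).

```java
class SecondLargestSketch {
    // Returns the largest value strictly smaller than the maximum,
    // or null if the array has fewer than two distinct values.
    static Integer secondLargest(int[] data) {
        Integer max = null;
        Integer second = null;
        for (int value : data) {
            if (max == null || value > max) {
                // New maximum: the old maximum becomes the runner-up
                second = max;
                max = value;
            } else if (value < max && (second == null || value > second)) {
                // Strictly between the runner-up and the maximum
                second = value;
            }
            // value == max is a duplicate of the maximum and is skipped
        }
        return second;
    }

    public static void main(String[] args) {
        System.out.println(secondLargest(new int[] {3, 9, 9, 5})); // prints 5
        System.out.println(secondLargest(new int[] {7, 7}));       // prints null
    }
}
```

The extra branches and the null handling are exactly the bookkeeping that TreeSet hides. Note that the TreeSet version shown earlier would throw a NullPointerException in the all-equal case, because lower returns null and the method unboxes the result to int.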
