Effective Java, Third Edition: Item 45. Use streams judiciously

Source: Internet
Author: User

Tips
The English edition of "Effective Java, Third Edition" has been published. Many readers will know the second edition, which is often counted among the four classic Java books. It was published in 2009, however, nearly eight years ago, and the Java language has changed profoundly since then with the releases of Java 6, 7, 8, and even 9.
This is a first translation of the new edition into Chinese, shared here for everyone to learn from.

45. Use streams judiciously

The streams API was added in Java 8 to ease the task of performing bulk operations, sequentially or in parallel. The API provides two key abstractions: the stream, which represents a finite or infinite sequence of data elements, and the stream pipeline, which represents a multistage computation on these elements. The elements in a stream can come from anywhere. Common sources include collections, arrays, files, regular expression pattern matchers, pseudorandom number generators, and other streams. The data elements in a stream can be object references or primitive values. Three primitive types are supported: int, long, and double.
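As a rough illustration of a few of the sources listed above, the sketch below obtains streams from a collection, a primitive array, a regular expression, and a pseudorandom number generator. The class name StreamSources and its method names are made up for this example.

```java
import java.util.Arrays;
import java.util.Random;
import java.util.regex.Pattern;
import java.util.stream.IntStream;
import java.util.stream.Stream;

// Hypothetical helper class; the names are illustrative only.
class StreamSources {
    // A stream of object references backed by a collection.
    static Stream<String> fromCollection() {
        return Arrays.asList("staple", "petals").stream();
    }

    // A primitive int stream backed by an array.
    static IntStream fromArray() {
        return Arrays.stream(new int[] {1, 2, 3});
    }

    // A stream of the pieces produced by splitting input around a regex match.
    static Stream<String> fromRegex() {
        return Pattern.compile(",").splitAsStream("x,y,z");
    }

    // A finite stream of four pseudorandom ints in the range [0, 10).
    static IntStream fromRandom() {
        return new Random().ints(4, 0, 10);
    }

    public static void main(String[] args) {
        System.out.println(fromCollection().count());
        System.out.println(fromArray().sum());
        System.out.println(fromRegex().count());
        System.out.println(fromRandom().count());
    }
}
```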

A stream pipeline consists of a source stream followed by zero or more intermediate operations and one terminal operation. Each intermediate operation transforms the stream in some way, such as mapping each element to a function of that element or filtering out all elements that do not satisfy some condition. Intermediate operations all transform one stream into another, whose element type may be the same as that of the input stream or different from it. The terminal operation performs a final computation on the stream resulting from the last intermediate operation, such as storing its elements into a collection, returning a certain element, or printing all of its elements.
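The anatomy described above (a source, intermediate operations, and a final terminal, or "end", operation) can be sketched as follows. The class PipelineAnatomy and the method longWordLengths are hypothetical names chosen for this example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical example class; not part of the original text.
class PipelineAnatomy {
    // Returns the lengths of all words that are longer than three letters.
    static List<Integer> longWordLengths(List<String> words) {
        return words.stream()                    // source: a collection
                .filter(w -> w.length() > 3)     // intermediate: drop short words
                .map(String::length)             // intermediate: Stream<String> to Stream<Integer>
                .collect(Collectors.toList());   // terminal: store the elements in a List
    }

    public static void main(String[] args) {
        // prints [6, 6]
        System.out.println(longWordLengths(Arrays.asList("staple", "pet", "petals", "a")));
    }
}
```

Note how the map step changes the element type from String to Integer, while the filter step leaves it unchanged.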

Stream pipelines are evaluated lazily: evaluation does not start until the terminal operation is invoked, and data elements that are not required to complete the terminal operation are never computed. This lazy evaluation is what makes it possible to work with infinite streams. Note that a stream pipeline without a terminal operation is a silent no-op, so don't forget to include one.
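A small sketch may make the laziness concrete. The first method builds a pipeline but never attaches a terminal operation, so its lambda never runs; the second takes a finite prefix of an infinite stream. The class LazyPipeline and its method names are assumptions made for illustration.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical example class; not part of the original text.
class LazyPipeline {
    // Builds a pipeline without a terminal operation and reports how many
    // elements the map stage processed.
    static int mappedWithoutTerminalOperation() {
        AtomicInteger mapped = new AtomicInteger();
        Stream<Integer> pending = Stream.of(1, 2, 3)
                .map(i -> { mapped.incrementAndGet(); return i * i; });
        // 'pending' is never consumed, so the lambda above has not been invoked.
        return mapped.get();
    }

    // Takes the first n squares from the infinite stream 1, 2, 3, ...
    static List<Integer> firstSquares(int n) {
        return Stream.iterate(1, i -> i + 1)     // infinite source
                .map(i -> i * i)
                .limit(n)                        // only a finite prefix is requested
                .collect(Collectors.toList());   // terminal operation starts evaluation
    }

    public static void main(String[] args) {
        System.out.println(mappedWithoutTerminalOperation()); // prints 0
        System.out.println(firstSquares(5));                  // prints [1, 4, 9, 16, 25]
    }
}
```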

The streams API is fluent: it is designed to allow all of the calls that make up a pipeline to be chained into a single expression. In fact, multiple pipelines can be chained together into a single expression.

By default, stream pipelines run sequentially. Making a pipeline execute in parallel is as simple as invoking the parallel method on any stream in the pipeline, but it is seldom appropriate to do so (Item 48).

The streams API is sufficiently versatile that practically any computation can be performed using streams, but just because you can doesn't mean you should. When used appropriately, streams can make programs shorter and clearer; when used inappropriately, they can make programs difficult to read and maintain. There are no hard and fast rules for when to use streams, but there are heuristics.

Consider the following program, which reads the words from a dictionary file and prints all the anagram groups whose size meets a user-specified minimum. Two words are anagrams if they consist of the same letters in a different order. The program reads each word from a user-specified dictionary file and places the words into a map. The map key is the word with its letters alphabetized, so the key for "staple" is "aelpst", and the key for "petals" is also "aelpst": the two words are anagrams, and all anagrams share the same alphabetized form (sometimes known as an alphagram). The map value is a list containing all of the words that share an alphabetized form. After the dictionary has been processed, each list is a complete anagram group. The program then iterates through the map's values() view and prints each list whose size meets the threshold:

// Prints all large anagram groups in a dictionary iteratively
import java.io.File;
import java.io.IOException;
import java.util.*;

public class Anagrams {
    public static void main(String[] args) throws IOException {
        File dictionary = new File(args[0]);
        int minGroupSize = Integer.parseInt(args[1]);

        Map<String, Set<String>> groups = new HashMap<>();
        try (Scanner s = new Scanner(dictionary)) {
            while (s.hasNext()) {
                String word = s.next();
                groups.computeIfAbsent(alphabetize(word),
                    (unused) -> new TreeSet<>()).add(word);
            }
        }

        for (Set<String> group : groups.values())
            if (group.size() >= minGroupSize)
                System.out.println(group.size() + ": " + group);
    }

    private static String alphabetize(String s) {
        char[] a = s.toCharArray();
        Arrays.sort(a);
        return new String(a);
    }
}

One step in this program is worth noting. The insertion of each word into the map uses the computeIfAbsent method, which was added in Java 8. This method looks up a key in the map: if the key is present, the method simply returns the value associated with it. If not, the method computes a value by applying the given function object to the key, associates this value with the key, and returns the computed value. The computeIfAbsent method simplifies the implementation of maps that associate multiple values with each key.
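To see computeIfAbsent in isolation, here is a minimal sketch of a map that associates several values with each key. The class MultimapDemo and the method byFirstLetter are hypothetical names introduced for this example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical example class; not part of the original text.
class MultimapDemo {
    // Groups the given words by their first letter.
    static Map<Character, List<String>> byFirstLetter(String... words) {
        Map<Character, List<String>> groups = new HashMap<>();
        for (String word : words) {
            // If the key is absent, a new list is created, stored, and returned;
            // otherwise the existing list is returned. Either way we can add to it.
            groups.computeIfAbsent(word.charAt(0), unused -> new ArrayList<>())
                  .add(word);
        }
        return groups;
    }

    public static void main(String[] args) {
        Map<Character, List<String>> groups = byFirstLetter("stop", "pots", "spot");
        System.out.println(groups.get('s')); // prints [stop, spot]
        System.out.println(groups.get('p')); // prints [pots]
    }
}
```

Without computeIfAbsent, the loop body would need an explicit get, a null check, and a put before the add.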

Now consider the following program, which solves the same problem but makes heavy use of streams. Note that the entire program, with the exception of the code that opens the dictionary file, is contained in a single expression. The only reason the dictionary is opened in a separate expression is to allow the use of the try-with-resources statement, which ensures that the dictionary file is closed:

// Overuse of streams - don't do this!
import static java.util.stream.Collectors.groupingBy;

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.stream.Stream;

public class Anagrams {
    public static void main(String[] args) throws IOException {
        Path dictionary = Paths.get(args[0]);
        int minGroupSize = Integer.parseInt(args[1]);

        try (Stream<String> words = Files.lines(dictionary)) {
            words.collect(
                groupingBy(word -> word.chars().sorted()
                    .collect(StringBuilder::new,
                        (sb, c) -> sb.append((char) c),
                        StringBuilder::append).toString()))
                .values().stream()
                .filter(group -> group.size() >= minGroupSize)
                .map(group -> group.size() + ": " + group)
                .forEach(System.out::println);
        }
    }
}

If you find this code hard to read, don't worry; you're not alone. It is shorter, but it is also less readable, especially to programmers who are not experts in the use of streams. Overusing streams makes programs hard to read and maintain.

Luckily, there is a happy medium. The following program solves the same problem, using streams without overusing them. The result is a program that is both shorter and clearer than the original:

// Tasteful use of streams enhances clarity and conciseness
import static java.util.stream.Collectors.groupingBy;

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.stream.Stream;

public class Anagrams {
    public static void main(String[] args) throws IOException {
        Path dictionary = Paths.get(args[0]);
        int minGroupSize = Integer.parseInt(args[1]);

        try (Stream<String> words = Files.lines(dictionary)) {
            words.collect(groupingBy(word -> alphabetize(word)))
                .values().stream()
                .filter(group -> group.size() >= minGroupSize)
                .forEach(g -> System.out.println(g.size() + ": " + g));
        }
    }

    // alphabetize method is the same as in original version
}

Even if you have had little previous exposure to streams, this program is not hard to understand. It opens the dictionary file in a try-with-resources block, obtaining a stream consisting of all the lines in the file. The stream variable is named words to suggest that each element in the stream is a word. The pipeline on this stream has no intermediate operations; its terminal operation collects all the words into a map that groups the words by their alphabetized form (Item 46). This is exactly the same map that was constructed in both previous versions of the program. Then a new Stream<List<String>> is opened on the values() view of the map. The elements in this stream are, of course, the anagram groups. The stream is filtered so that all of the groups whose size is less than minGroupSize are ignored, and finally, the remaining groups are printed by the terminal operation forEach.

Note that the lambda parameter names were chosen carefully. The parameter g in the program above should really be named group, but the resulting line of code would be too wide for the book. In the absence of explicit types, careful naming of lambda parameters is essential to the readability of stream pipelines.

Note also that word alphabetization is done in a separate alphabetize method. This enhances readability by providing a name for the operation and keeping implementation details out of the main program. Using helper methods is even more important for readability in stream pipelines than in iterative code, because pipelines lack explicit type information and named temporary variables.

The alphabetize method could have been reimplemented to use streams, but a stream-based alphabetize method would have been less clear, more difficult to write correctly, and probably slower. These deficiencies result from Java's lack of support for primitive char streams (which is not to imply that Java should have supported char streams; it would have been infeasible to do so). To demonstrate the hazards of processing char values with streams, consider the following code:

"Hello world!".chars().forEach(System.out::print);

You might expect it to print Hello world!, but if you run it, you'll find that it prints 721011081081113211911111410810033. This happens because the elements of the stream returned by "Hello world!".chars() are not char values but int values, so the int overloading of print is invoked. It is admittedly confusing that a method named chars returns a stream of int values. You can fix the program by using a cast to force the invocation of the correct overloading:

"Hello world!".chars().forEach(x -> System.out.print((char) x));

Ideally, however, you should refrain from using streams to process char values.
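To make the comparison concrete, the sketch below places the array-based alphabetize method from the iterative Anagrams program next to a stream-based equivalent adapted from the "overuse" version. The class name Alphabetize and the method names withArray and withStream are illustrative assumptions.

```java
import java.util.Arrays;

// Hypothetical example class; not part of the original text.
class Alphabetize {
    // Array-based version, as in the iterative Anagrams program.
    static String withArray(String s) {
        char[] a = s.toCharArray();
        Arrays.sort(a);
        return new String(a);
    }

    // Stream-based version. chars() yields an IntStream, so every element must
    // be cast back to char; without the cast, append(int) would add digits.
    static String withStream(String s) {
        return s.chars().sorted()
                .collect(StringBuilder::new,
                        (sb, c) -> sb.append((char) c),
                        StringBuilder::append)
                .toString();
    }

    public static void main(String[] args) {
        System.out.println(withArray("staple"));  // prints aelpst
        System.out.println(withStream("petals")); // prints aelpst
    }
}
```

Both methods return the same result, but the stream version needs a three-argument collect and an easily forgotten cast.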

When you start using streams, you may feel the urge to convert all your loops into streams, but resist the urge. While it may be possible, it will likely harm the readability and maintainability of your code base. As a rule, even moderately complex tasks are best accomplished using some combination of streams and iteration, as illustrated by the Anagrams programs above. So refactor existing code to use streams, and use them in new code, only where it makes sense to do so.

As shown in the programs in this item, stream pipelines express repeated computation using function objects (typically lambdas or method references), while iterative code expresses repeated computation using code blocks. There are some things you can do from code blocks that you can't do from function objects:

• From a code block, you can read or modify any local variable in scope; from a lambda, you can only read final or effectively final variables [JLS 4.12.4], and you can't modify any local variables.
• From a code block, you can return from the enclosing method, break or continue an enclosing loop, or throw any checked exception that this method is declared to throw; from a lambda you can do none of these things.
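The two restrictions above can be sketched side by side. In this hypothetical example (the class BlockVersusLambda and its methods are not from the book), the loop updates local variables and returns from the method mid-iteration, while the lambda only reads an effectively final parameter.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical example class; not part of the original text.
class BlockVersusLambda {
    // Code block: returns how many values were consumed before the running
    // total first exceeded the limit, or -1 if it never did.
    static int countUntilExceeds(List<Integer> values, int limit) {
        int total = 0;  // local variables modified inside the loop
        int count = 0;
        for (int v : values) {
            total += v;
            count++;
            if (total > limit)
                return count;  // returns from the enclosing method mid-iteration
        }
        return -1;
    }

    // Lambda: may read the effectively final parameter 'limit' but cannot
    // modify a local variable. The commented-out lines would not compile.
    static long countAbove(List<Integer> values, int limit) {
        // int hits = 0;
        // values.forEach(v -> { if (v > limit) hits++; });
        return values.stream().filter(v -> v > limit).count();
    }

    public static void main(String[] args) {
        List<Integer> values = Arrays.asList(3, 4, 5, 6);
        System.out.println(countUntilExceeds(values, 10)); // prints 3
        System.out.println(countAbove(values, 4));         // prints 2
    }
}
```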

If a computation is best expressed using these techniques, then it's probably not a good match for streams. Conversely, streams make it very easy to do some things:
• Uniformly transform sequences of elements
• Filter sequences of elements
• Combine sequences of elements using a single operation (for example, to add them, concatenate them, or compute their minimum)
• Accumulate sequences of elements into a collection, perhaps grouping them by some common attribute
• Search a sequence of elements for an element satisfying some criterion

If a computation is best expressed using these techniques, then it is a good candidate for streams.
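Each of the five stream-friendly tasks listed above maps onto a short pipeline. The sketch below shows one possible form of each over a small word list; the class StreamStrengths, its WORDS constant, and its method names are invented for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.stream.Collectors;

// Hypothetical example class; not part of the original text.
class StreamStrengths {
    static final List<String> WORDS =
            Arrays.asList("staple", "petals", "stop", "pots", "a");

    // Uniformly transform: convert every word to upper case.
    static List<String> upperCased() {
        return WORDS.stream().map(String::toUpperCase).collect(Collectors.toList());
    }

    // Filter: keep only words of at most four letters.
    static List<String> shortWords() {
        return WORDS.stream().filter(w -> w.length() <= 4).collect(Collectors.toList());
    }

    // Combine with a single operation: add up all the word lengths.
    static int totalLength() {
        return WORDS.stream().mapToInt(String::length).sum();
    }

    // Accumulate into a collection, grouped by a common attribute (length).
    static Map<Integer, List<String>> byLength() {
        return WORDS.stream().collect(Collectors.groupingBy(String::length));
    }

    // Search: the first word starting with the given letter, if there is one.
    static Optional<String> firstStartingWith(char letter) {
        return WORDS.stream().filter(w -> w.charAt(0) == letter).findFirst();
    }

    public static void main(String[] args) {
        System.out.println(upperCased());
        System.out.println(shortWords());
        System.out.println(totalLength());          // prints 21
        System.out.println(byLength().get(6));      // prints [staple, petals]
        System.out.println(firstStartingWith('p')); // prints Optional[petals]
    }
}
```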

One thing that is hard to do with streams is to access corresponding elements from multiple stages of a pipeline simultaneously: once you map a value to some other value, the original value is lost. One workaround is to map each value to a pair object containing the original value and the new value, but this is not a satisfying solution, especially if the pair objects are required for multiple stages of a pipeline. The resulting code is messy and verbose, which defeats a primary purpose of streams. When it is applicable, a better workaround is to invert the mapping when you need access to the earlier-stage value.

For example, let's write a program to print the first twenty Mersenne primes. A Mersenne number is a number of the form 2^p - 1. If p is prime, the corresponding Mersenne number may be prime; if so, it's a Mersenne prime. As the initial stream in our pipeline, we want all the prime numbers. Here's a method to return that (infinite) stream. We assume a static import has been used for easy access to the static members of BigInteger:

static Stream<BigInteger> primes() {
    return Stream.iterate(TWO, BigInteger::nextProbablePrime);
}

The name of the method (primes) is a plural noun describing the elements of the stream. This naming convention is highly recommended for all methods that return streams because it enhances the readability of stream pipelines. The method uses the static factory Stream.iterate, which takes two parameters: the first element in the stream, and a function to generate the next element in the stream from the previous one. Here is the program to print the first twenty Mersenne primes:

public static void main(String[] args) {
    primes().map(p -> TWO.pow(p.intValueExact()).subtract(ONE))
        .filter(mersenne -> mersenne.isProbablePrime(50))
        .limit(20)
        .forEach(System.out::println);
}

This program is a straightforward encoding of the prose description above: it starts with the primes, computes the corresponding Mersenne numbers, filters out all but the primes (the magic number 50 controls the probabilistic primality test), limits the resulting stream to twenty elements, and prints them out.

Now suppose that we want to precede each Mersenne prime with its exponent (p). This value is present only in the initial stream, so it is inaccessible in the terminal operation, which prints the results. Fortunately, it's easy to compute the exponent of a Mersenne number by inverting the mapping that took place in the first intermediate operation. The exponent is simply the number of bits in the binary representation, so this terminal operation generates the desired result:

.forEach(mp -> System.out.println(mp.bitLength() + ": " + mp));
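Putting the pieces together, a self-contained sketch might look like the following. The class name MersennePrimes and the helper method labeled are illustrative assumptions, not from the book. TWO is declared locally because BigInteger.TWO is only public from Java 9 onward, and the sketch asks for five primes rather than twenty so that it runs quickly.

```java
import static java.math.BigInteger.ONE;

import java.math.BigInteger;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical example class; not part of the original text.
class MersennePrimes {
    // Declared here so the sketch also compiles on Java 8.
    private static final BigInteger TWO = BigInteger.valueOf(2);

    // Infinite stream of (probable) primes: 2, 3, 5, 7, 11, ...
    static Stream<BigInteger> primes() {
        return Stream.iterate(TWO, BigInteger::nextProbablePrime);
    }

    // Returns the first n Mersenne primes, each formatted as "exponent: value".
    static List<String> labeled(int n) {
        return primes().map(p -> TWO.pow(p.intValueExact()).subtract(ONE))
                .filter(mersenne -> mersenne.isProbablePrime(50))
                .limit(n)
                // The exponent p is recovered as the bit length of 2^p - 1.
                .map(mp -> mp.bitLength() + ": " + mp)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // prints 2: 3, 3: 7, 5: 31, 7: 127, 13: 8191 (one per line)
        labeled(5).forEach(System.out::println);
    }
}
```

Recovering p from bitLength works because 2^p - 1 is written as exactly p one-bits in binary; no pair object is needed to carry the exponent through the pipeline.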

There are plenty of tasks where it is not obvious whether to use streams or iteration. For example, consider the task of initializing a new deck of cards. Assume that Card is an immutable value class that encapsulates a Rank and a Suit, both of which are enum types. This task is representative of any task that requires computing all the pairs of elements that can be chosen from two sets. Mathematicians call this the Cartesian product of the two sets. Here's an iterative implementation with a nested for-each loop that should look very familiar to you:

// Iterative Cartesian product computation
private static List<Card> newDeck() {
    List<Card> result = new ArrayList<>();
    for (Suit suit : Suit.values())
        for (Rank rank : Rank.values())
            result.add(new Card(suit, rank));
    return result;
}

And here is a stream-based implementation that makes use of the intermediate operation flatMap. This operation maps each element in a stream to a stream and then concatenates all of these new streams into a single stream (or flattens them). Note that this implementation contains a nested lambda (rank -> new Card(suit, rank)):

// Stream-based Cartesian product computation
private static List<Card> newDeck() {
    return Stream.of(Suit.values())
        .flatMap(suit ->
            Stream.of(Rank.values())
                .map(rank -> new Card(suit, rank)))
        .collect(toList());
}

Which of the two versions of newDeck is better? It boils down to personal preference and the environment in which you're programming. The first version is simpler and perhaps feels more natural. A larger fraction of Java programmers will be able to understand and maintain it, but some programmers will feel more comfortable with the second (stream-based) version. It's a bit more concise and not too difficult to understand if you're reasonably well versed in streams and functional programming. If you're not sure which version you prefer, the iterative version is probably the safer choice. If you prefer the stream version and you believe that other programmers who will work with the code share your preference, then you should use it.

In summary, some tasks are best accomplished with streams, and others with iteration. Many tasks are best accomplished by combining the two approaches. There are no hard and fast rules for choosing which approach to use for a task, but there are some useful heuristics. In many cases, it will be clear which approach to use; in some cases, it won't. If you're not sure whether a task is better served by streams or iteration, try both and see which works better.
