Java Streams (i) API introduction

The java.util.stream package was introduced in Java 8. This addition is designed to help developers perform a series of operations on collections at a higher level of abstraction.
With the help of the java.util.stream package, we can write concise, declarative expressions over collections, arrays, and other data sources that may be processed in parallel, moving from external iteration to internal iteration.

Higher-level abstraction

Consider the following problem: we want to collect the students in a class who are from Shaanxi. Before Java 8, we would generally implement it like this.

        public List<Student> isFromShannxi(List<Student> students) {
            Objects.requireNonNull(students, "Cannot find students");
            List<Student> shannxiStudents = new ArrayList<>();
            for (Student student : students) {
                if (student.isFrom(Constants.Province.SHANNXI)) {
                    shannxiStudents.add(student);
                }
            }
            return shannxiStudents;
        }

Every Java developer can probably write this in a few minutes. However, each time you have to write a lot of boilerplate code: create a collection to hold the results, iterate over the incoming collection, filter by a specific condition, add the matching elements to the result collection, and return it. This code has several drawbacks:
1. It is hard to maintain. You cannot guess the intent without reading the whole loop, and if the names are not meaningful or there are no comments (imagine the variables above being named xs for "student" and lzsxdxs for "students from Shaanxi"), it becomes even harder. Do not assume such code does not exist; I have had to take over code like this, and it is quite painful.
2. It is hard to scale to parallel execution; rewriting this kind of loop for parallel processing is quite difficult.

The following shows the use of streams to achieve the same functionality

        public List<Student> isFromShannxi(Stream<Student> stream) {
            Objects.requireNonNull(stream, "Cannot find students");
            return stream
                    .filter(student -> student.isFrom(Constants.Province.SHANNXI))
                    .collect(toList());
        }

From this code we can see what the program means: filter the stream and collect the filtered result into a list, with the element type inferred by the compiler. The whole program reads like a description of what is being done, without excessive intermediate variables, which reduces GC pressure and improves efficiency. Let us now explore what streams offer.

Stream

The java.util.stream package introduces the Stream type.
A stream differs from a collection in several ways (a short sketch follows the list):

  • No storage. A stream is not a data structure that stores elements; instead, it carries elements from a source such as a data structure, an array, a generator function, or an I/O channel through a pipeline of computational operations.
  • Functional in nature. An operation on a stream produces a result but does not modify its source. For example, filtering a stream obtained from a collection produces a new stream without the filtered-out elements, rather than removing elements from the collection.
  • Laziness. Many stream operations are lazy.
  • Eager evaluation. A terminal operation, invoked after a series of lazy operations, triggers evaluation and produces the final result.
  • Possibly unbounded. A collection has a finite size, but a stream need not.
  • Consumable. The elements of a stream are visited only once during its lifetime, just as with an iterator; a new stream must be generated to visit the same elements again.
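
A minimal sketch of two of these properties (the class and variable names are just illustrative): intermediate operations do nothing until a terminal operation runs, and a stream can be consumed only once.

        import java.util.Arrays;
        import java.util.List;
        import java.util.stream.Collectors;
        import java.util.stream.Stream;

        public class StreamProperties {
            public static void main(String[] args) {
                List<String> names = Arrays.asList("ann", "bob", "carol");

                // Laziness: peek() prints nothing until the terminal collect() runs.
                Stream<String> pipeline = names.stream()
                        .peek(n -> System.out.println("visiting " + n))
                        .filter(n -> n.length() > 3);
                System.out.println("pipeline built, nothing visited yet");
                System.out.println("matches: " + pipeline.collect(Collectors.toList()));

                // Consumable: a stream cannot be traversed twice.
                Stream<String> once = names.stream();
                once.forEach(System.out::println);
                try {
                    once.forEach(System.out::println);   // second traversal fails
                } catch (IllegalStateException e) {
                    System.out.println("stream already consumed: " + e.getMessage());
                }
            }
        }
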
Java provides the following ways to generate a stream (a combined sketch follows the list):
  • Collection.stream() creates a stream from a collection;
  • Collection.parallelStream() creates a parallel stream from a collection;
  • Arrays.stream(Object[]);
  • Stream.of(Object[]), IntStream.range(int, int), Stream.iterate(Object, UnaryOperator), Stream.empty(), Stream.generate(Supplier);
  • BufferedReader.lines();
  • Random.ints();
  • BitSet.stream(), Pattern.splitAsStream(java.lang.CharSequence), and JarFile.stream().
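
A short, runnable sketch of a few of these sources (the class name and sample data are illustrative):

        import java.io.BufferedReader;
        import java.io.StringReader;
        import java.util.Arrays;
        import java.util.List;
        import java.util.stream.Collectors;
        import java.util.stream.IntStream;
        import java.util.stream.Stream;

        public class StreamSources {
            public static void main(String[] args) {
                List<String> list = Arrays.asList("a", "b", "c");

                Stream<String> fromCollection = list.stream();          // sequential stream over a collection
                Stream<String> fromParallel   = list.parallelStream();  // parallel stream over a collection
                Stream<String> fromArray      = Arrays.stream(new String[] {"x", "y"});
                Stream<String> fromValues     = Stream.of("one", "two", "three");
                Stream<Integer> iterated      = Stream.iterate(1, n -> n * 2).limit(5);  // 1, 2, 4, 8, 16
                Stream<Double>  generated     = Stream.generate(Math::random).limit(3);  // three random numbers
                Stream<String>  lines         = new BufferedReader(new StringReader("line1\nline2")).lines();

                System.out.println(IntStream.range(0, 5).sum());            // 0+1+2+3+4 = 10
                System.out.println(iterated.collect(Collectors.toList()));  // [1, 2, 4, 8, 16]
                System.out.println(lines.count());                          // 2
            }
        }
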
Intermediate operations of a stream (no side effects; a combined example follows the list):
  • filter(Predicate) keeps the elements that match the predicate.
  • map(Function) transforms each element by applying the given function.
  • flatMap(Function) maps each element to a stream and flattens the resulting streams into one.
  • distinct() removes duplicate elements.
  • sorted() sorts the elements by natural order.
  • sorted(Comparator) sorts the elements using the given comparator.
  • limit(long) truncates the stream to the given length.
  • skip(long) discards the first N elements.
  • takeWhile(Predicate) (Java 9) takes elements while the predicate holds.
  • dropWhile(Predicate) (Java 9) discards elements while the predicate holds.
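
A combined sketch of several of these intermediate operations chained into one pipeline (the sample data and names are illustrative):

        import java.util.Arrays;
        import java.util.List;
        import java.util.stream.Collectors;

        public class IntermediateOps {
            public static void main(String[] args) {
                List<Integer> numbers = Arrays.asList(5, 3, 3, 8, 1, 9, 2);

                List<Integer> result = numbers.stream()
                        .filter(n -> n > 1)            // keep elements matching the predicate
                        .map(n -> n * 10)              // transform each element
                        .distinct()                    // drop duplicates
                        .sorted()                      // natural order
                        .skip(1)                       // discard the first element
                        .limit(3)                      // keep at most three elements
                        .collect(Collectors.toList());
                System.out.println(result);            // [30, 50, 80]

                // flatMap flattens a stream of collections into a single stream of elements.
                List<List<String>> nested = Arrays.asList(Arrays.asList("a", "b"), Arrays.asList("c"));
                List<String> flat = nested.stream()
                        .flatMap(List::stream)
                        .collect(Collectors.toList());
                System.out.println(flat);              // [a, b, c]
            }
        }
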
Terminal operations (a combined example follows the list):
  • forEach(Consumer) performs an action for each element of the stream.
  • toArray() creates an array from the elements of the stream.
  • reduce(...) aggregates the elements of the stream into a single summary value.
  • collect(...) aggregates the elements of the stream into a summary result container.
  • min(Comparator) returns the minimum element of the stream according to the comparator.
  • max(Comparator) returns the maximum element of the stream according to the comparator.
  • count() returns the number of elements in the stream.
  • {any,all,none}Match(Predicate) returns whether any/all/none of the elements of the stream match the predicate.
  • findFirst() returns the first element of the stream, if any.
  • findAny() returns some element of the stream, if any.
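
A combined sketch of the terminal operations above (the sample data and names are illustrative):

        import java.util.Arrays;
        import java.util.Comparator;
        import java.util.List;
        import java.util.Optional;

        public class TerminalOps {
            public static void main(String[] args) {
                List<String> words = Arrays.asList("stream", "map", "filter", "collect");

                long size = words.stream().count();                                                      // 4
                Optional<String> shortest = words.stream().min(Comparator.comparingInt(String::length)); // "map"
                boolean anyLong = words.stream().anyMatch(w -> w.length() > 5);                          // true
                boolean allLong = words.stream().allMatch(w -> w.length() > 5);                          // false ("map")
                Optional<String> first = words.stream().filter(w -> w.startsWith("f")).findFirst();      // "filter"
                int totalLength = words.stream().reduce(0, (len, w) -> len + w.length(), Integer::sum);  // 22
                Object[] asArray = words.stream().toArray();

                System.out.println(size + " " + shortest.orElse("-") + " " + anyLong + " " + allLong
                        + " " + first.orElse("-") + " " + totalLength + " " + asArray.length);
            }
        }
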
Stream operations and pipelines

Stream operations are divided into intermediate and terminal operations, which are combined to form stream pipelines. A stream pipeline consists of a source (for example, a Collection, an array, a generator function, or an I/O channel); zero or more intermediate operations, such as Stream.filter or Stream.map; and a terminal operation, such as Stream.forEach or Stream.reduce. Terminal operations such as Stream.forEach or IntStream.sum traverse the stream to produce a result or a side effect. After the terminal operation has executed, the stream pipeline is considered consumed and can no longer be used; if you need to traverse the same data source again, you must go back to the data source to obtain a new stream.
Processing streams lazily allows for significant efficiencies: filtering, mapping, and summing can be fused into a single pass over the data, with minimal intermediate state. Laziness also makes it possible to avoid examining all the data when it is not necessary; for an operation such as "find the first string longer than 1000 characters", it suffices to examine just enough strings to find one with the desired property, without examining all of the strings available from the source. (This behavior becomes even more important when the input stream is infinite and not merely large.) An example of short-circuiting on an infinite stream is sketched below.

Intermediate operations are further divided into stateless and stateful operations. Stateless operations, such as filter and map, retain no state from previously seen elements when processing a new element; each element can be processed independently of operations on other elements. Stateful operations, such as distinct and sorted, may incorporate state from previously seen elements when processing new elements. Stateful operations may need to process the entire input before producing a result; for example, you cannot produce any result from sorting a stream until you have seen all of its elements. As a consequence, under parallel computation, some pipelines containing stateful intermediate operations may require multiple passes over the data or may need to buffer significant data. Pipelines containing only stateless intermediate operations can be processed in a single pass, whether sequential or parallel, with minimal data buffering.

Further, some operations are deemed short-circuiting. An intermediate operation is short-circuiting if, when presented with infinite input, it may produce a finite stream as a result. A terminal operation is short-circuiting if, when presented with infinite input, it may terminate in finite time. Having a short-circuiting operation in the pipeline is a necessary, but not sufficient, condition for processing an infinite stream to terminate normally in finite time.
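
A minimal sketch of short-circuiting on an infinite stream, in the spirit of the "first string longer than 1000 characters" example above (the class name is illustrative):

        import java.util.Optional;
        import java.util.stream.Stream;

        public class ShortCircuit {
            public static void main(String[] args) {
                // An infinite stream of ever longer strings: "a", "aa", "aaa", ...
                Stream<String> infinite = Stream.iterate("a", s -> s + "a");

                // findFirst() is a short-circuiting terminal operation: only enough elements
                // are generated and inspected to find the first match, so this terminates.
                Optional<String> firstLong = infinite
                        .filter(s -> s.length() > 1000)
                        .findFirst();

                System.out.println(firstLong.map(String::length).orElse(0));   // 1001
            }
        }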

Now consider counting the students in a class who are from Shaanxi and have taken the power-electronics elective.
Processing elements with an explicit for-loop is inherently serial. Streams facilitate parallel execution by reframing the computation as a pipeline of aggregate operations rather than as imperative actions on each individual element. All stream operations can execute either sequentially or in parallel. Unless parallelism is explicitly requested, the stream implementations in the JDK create sequential streams. For example, Collection has the methods Collection.stream() and Collection.parallelStream(), which produce sequential and parallel streams respectively; other stream-bearing methods, such as IntStream.range(int, int), produce sequential streams, but they can be efficiently parallelized by invoking their BaseStream.parallel() method. Written with a stream, the sequential version of this count looks like this:

        return students.stream()
                .filter(student -> student.isFrom(Constants.Province.SHANNXI))
                .filter(student -> student.getScores().containsKey(Constants.Course.POWER_ELECTRONICS))
                .count();

This program executes very quickly, because there are only a few dozen students in a class. But if we needed to select, from every university in the country, the students from Shaanxi who have taken the power-electronics course, it would run for quite a while. At that point we want to switch from serial to parallel execution to make full use of multi-core CPU resources. The traditional approach would require restructuring the code, but changing a stream is very simple: we only need to add one line. We will analyze the principles and constraints of turning a sequential stream into a parallel one later.

        return students.stream()
                .parallel()
                .filter(student -> student.isFrom(Constants.Province.SHANNXI))
                .filter(student -> student.getScores().containsKey(Constants.Course.POWER_ELECTRONICS))
                .count();

The only difference between the serial and parallel versions of this example is the creation of the initial stream: adding the parallel() call (equivalently, a parallel stream could have been obtained directly with parallelStream() instead of stream()). When the terminal operation is initiated, the stream pipeline executes sequentially or in parallel depending on the mode of the stream on which it is invoked. You can determine whether a stream executes sequentially or in parallel with the isParallel() method, and you can change the mode with the BaseStream.sequential() and BaseStream.parallel() operations. Except for operations explicitly documented as nondeterministic, such as findAny(), whether a stream executes sequentially or in parallel should not change the result of the computation. Most stream operations accept parameters that describe user-specified behavior, which are typically lambda expressions. To preserve correct behavior, these behavioral parameters must be non-interfering and, in most cases, must be stateless. Such parameters are always instances of a functional interface, such as Function, and are often lambda expressions or method references. A short sketch of switching execution modes follows.
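
A small sketch of inspecting and switching the execution mode with isParallel(), parallel(), and sequential() (the class name and data are illustrative):

        import java.util.Arrays;
        import java.util.List;
        import java.util.stream.Stream;

        public class ParallelMode {
            public static void main(String[] args) {
                List<Integer> numbers = Arrays.asList(1, 2, 3, 4, 5);

                Stream<Integer> sequential = numbers.stream();
                System.out.println(sequential.isParallel());   // false: sequential by default

                Stream<Integer> parallel = numbers.parallelStream();
                System.out.println(parallel.isParallel());     // true

                // The whole pipeline runs in the mode set by the last sequential()/parallel()
                // call before the terminal operation starts.
                long evens = numbers.stream()
                        .parallel()
                        .filter(n -> n % 2 == 0)
                        .sequential()                          // switches the pipeline back to sequential
                        .count();
                System.out.println(evens);                     // 2
            }
        }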

Do not modify the data source

Streams enable you to execute possibly-parallel aggregate operations over a variety of data sources, including even non-thread-safe collections such as ArrayList. This is possible only if we can prevent interference with the data source while the stream pipeline is executing. Except for the escape-hatch operations iterator() and spliterator(), execution begins when the terminal operation is invoked and ends when the terminal operation completes. For most data sources, preventing interference means ensuring that the data source is not modified at all during the execution of the stream pipeline. The notable exception is streams whose sources are concurrent collections, which are specifically designed to handle concurrent modification; concurrent stream sources are those whose Spliterator reports the CONCURRENT characteristic. A behavioral parameter is said to interfere with a non-concurrent data source if it modifies, or causes to be modified, the stream's data source. This requirement applies to all pipelines, not just parallel ones: unless the stream source is concurrent, modifying a stream's data source during execution of the stream pipeline can cause exceptions or incorrect results. For well-behaved stream sources, however, the source can be modified before the terminal operation commences, and those modifications will be reflected in the stream, as shown below.

        public static void modifyStream() {
            List<String> list = new ArrayList<>(Arrays.asList("one", "two"));
            Stream<String> stream = list.stream();
            list.add("three");
            System.out.println(stream.collect(Collectors.joining(", ", "[", "]")));
        }

First we create a list containing two strings, "one" and "two". Then we create a stream from that list. Next we modify the list by adding a third string, "three". Finally, the elements of the stream are collected and joined together. Since the list was modified before the terminal collect operation began, the result is the string "[one, two, three]". All streams returned from the JDK collections, and most other JDK classes, are well-behaved in this manner. A sketch of the opposite case, interfering with the source while the pipeline is running, follows.
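
By contrast, here is a sketch of interference: the source is modified while the pipeline is executing. With ArrayList this typically fails fast with a ConcurrentModificationException, though the exact behavior is not guaranteed (the class name is illustrative):

        import java.util.ArrayList;
        import java.util.Arrays;
        import java.util.ConcurrentModificationException;
        import java.util.List;
        import java.util.stream.Collectors;

        public class Interference {
            public static void main(String[] args) {
                List<String> list = new ArrayList<>(Arrays.asList("one", "two"));

                try {
                    // Modifying a non-concurrent source while the pipeline is executing
                    // interferes with it; ArrayList usually fails fast.
                    String joined = list.stream()
                            .peek(s -> list.add("three"))              // interference!
                            .collect(Collectors.joining(", "));
                    System.out.println(joined);
                } catch (ConcurrentModificationException e) {
                    System.out.println("interference detected: " + e);
                }
            }
        }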

Stateless behavior

If the behavioral parameters of a stream operation are stateful, the results of a parallel execution may be nondeterministic. A stateful lambda is one whose result depends on state that might change during execution of the pipeline, for example a mapper passed to map() that records which elements it has already seen; if the operation runs in parallel, the same input may produce different results because of differences in thread scheduling, whereas a stateless lambda always produces the same result. Therefore, it is best to avoid stateful behavioral parameters in stream operations altogether. A sketch of such a stateful mapper follows.
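
A sketch of a stateful mapper over a small list with duplicates (the names and data are illustrative); the synchronized set makes the lambda thread-safe but not deterministic:

        import java.util.Arrays;
        import java.util.Collections;
        import java.util.HashSet;
        import java.util.List;
        import java.util.Set;
        import java.util.stream.Collectors;

        public class StatefulLambda {
            public static void main(String[] args) {
                List<Integer> source = Arrays.asList(1, 1, 2, 2, 3, 3);
                Set<Integer> seen = Collections.synchronizedSet(new HashSet<>());

                // Stateful mapper: which of two equal elements is mapped to 0 depends on
                // which thread reaches it first, so the output may vary between runs.
                List<Integer> result = source.parallelStream()
                        .map(e -> seen.add(e) ? e : 0)
                        .collect(Collectors.toList());

                System.out.println(result);   // e.g. [1, 0, 2, 0, 3, 0] or [0, 1, 0, 2, 3, 0]
            }
        }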

Side Effects

Side effects in behavioral parameters to stream operations are, in general, discouraged, as they can often lead to unwitting violations of the statelessness requirement, as well as other thread-safety hazards.
If the behavioral parameters do have side effects, then unless explicitly stated, there are no guarantees as to the visibility of those side effects to other threads, nor any guarantee that different operations on the same element within the same stream pipeline are executed in the same thread. Further, the ordering of those effects may be surprising. Even when a pipeline is constrained to produce a result that is consistent with the encounter order of the stream source (for example, IntStream.range(0, 5).parallel().map(x -> x * 2).toArray() must produce [0, 2, 4, 6, 8]), no guarantees are made as to the order in which the mapper function is applied to individual elements, or in what thread any behavioral parameter is executed for a given element. Many computations where one might be tempted to use side effects can be expressed more safely and efficiently without them, for example by using reduction instead of mutable accumulators. However, side effects such as using println() for debugging purposes are usually harmless. A small number of stream operations, such as forEach() and peek(), can operate only via side effects; these should be used with care. As an example of how to transform a stream pipeline that inappropriately uses side effects into one that does not, the following code searches a stream of strings for those matching a given regular expression and puts the matches into a list.

        ArrayList<String> results = new ArrayList<>();
        stream.filter(s -> pattern.matcher(s).matches())
              .forEach(s -> results.add(s));  // Unnecessary use of side-effects!

This code unnecessarily uses side effects. If executed in parallel, the non-thread-safety of ArrayList would cause incorrect results, and adding the needed synchronization would cause contention, undermining the benefit of parallelism. Furthermore, the side effects here are completely unnecessary: the forEach() can simply be replaced with a reduction operation that is safer, more efficient, and more amenable to parallelization:

        List<String> results =
            stream.filter(s -> pattern.matcher(s).matches())
                  .collect(Collectors.toList());  // No side-effects!

Order

A stream may or may not have a defined encounter order; whether it does depends on the stream source and the intermediate operations. Some stream sources (such as List or arrays) are intrinsically ordered, while others (such as HashSet) are not. Some intermediate operations, such as sorted(), may impose an encounter order on an otherwise unordered stream, while others, such as BaseStream.unordered(), may render an ordered stream unordered. Further, some terminal operations, such as forEach(), may ignore encounter order.

If a stream is ordered, most operations are constrained to operate on the elements in their encounter order: if the source of a stream is a List containing [1, 2, 3], then the result of executing map(x -> x * 2) must be [2, 4, 6]. However, if the source has no defined encounter order, then any permutation of the values [2, 4, 6] would be a valid result.

For parallel streams, relaxing the ordering constraint can sometimes enable more efficient execution. Certain aggregate operations, such as filtering duplicates (distinct()) or grouped reductions (Collectors.groupingBy()), can be implemented more efficiently if ordering of the elements is not relevant. Similarly, operations that are intrinsically tied to encounter order, such as limit(), may require buffering to ensure proper ordering, undermining the benefit of parallelism. In cases where the stream has an encounter order but the user does not particularly care about it, explicitly de-ordering the stream with unordered() may improve parallel performance for some stateful or terminal operations. However, most stream pipelines still parallelize efficiently even under ordering constraints. A small ordering sketch follows.
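
A small ordering sketch contrasting forEach() with forEachOrdered() on a parallel stream, and de-ordering with unordered() (the class name and data are illustrative):

        import java.util.Arrays;
        import java.util.List;

        public class OrderingDemo {
            public static void main(String[] args) {
                List<Integer> numbers = Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8);

                // forEach() on a parallel stream makes no guarantee about encounter order ...
                numbers.parallelStream().forEach(n -> System.out.print(n + " "));
                System.out.println();   // e.g. "5 6 7 8 3 4 1 2" -- may vary between runs

                // ... while forEachOrdered() respects the encounter order of the source.
                numbers.parallelStream().forEachOrdered(n -> System.out.print(n + " "));
                System.out.println();   // always "1 2 3 4 5 6 7 8"

                // Explicitly dropping the encounter order can help some parallel operations.
                long distinctCount = numbers.parallelStream()
                        .unordered()
                        .distinct()      // no longer has to preserve encounter order
                        .count();
                System.out.println(distinctCount);   // 8
            }
        }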

Reduce operations

A reduce operation takes a sequence of input elements and combines them into a single summary result by repeated application of a combining operation, such as finding the sum or maximum of a set of numbers. The stream classes have general reduction operations in multiple forms, reduce() and collect(), as well as specialized reduction forms such as sum(), max(), and count(). For example, to sum the numbers from 1 to 100, the traditional for loop and the stream-based reduction look like this.

        // Traditional for loop
        int sum = 0;
        for (int x : numbers) {
            sum += x;
        }
        return sum;

        // Stream reduction
        return numbers.stream()
                .reduce(0, Integer::sum);
Mutable reduction

Suppose we want to concatenate the strings in a collection into a single string. We can do this with stream operations as follows.

        String result = strings.stream()
                .reduce("", String::concat);

        StringBuilder result = strings.stream()
                .collect(StringBuilder::new,
                        (sb, s) -> sb.append(s),
                        (sb, sb2) -> sb.append(sb2));

        StringBuilder result = strings.stream()
                .collect(StringBuilder::new,
                        StringBuilder::append,
                        StringBuilder::append);

In the first example above, the strings can indeed be concatenated, but it is inefficient because a new String object is created at every step. We should use a StringBuilder instead; the second and third snippets concatenate with a StringBuilder, and the third simply replaces the lambdas of the second with method references. But this still feels a bit cumbersome. Is there a simpler way? Don't worry: the library designers anticipated this scenario, and Collectors.joining() is dedicated to solving exactly this problem. Let's look at an example using the ready-made API.

        String result = strings.stream()
                .collect(Collectors.joining(""));

The joining method has several overloads that let you add a delimiter, a prefix, and a suffix, such as Collectors.joining(", ", "[", "]"). If the strings in our test are "Hello" and "World", the result is "[Hello, World]", as the sketch below shows.
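
A tiny sketch of these overloads (the class name and data are illustrative):

        import java.util.Arrays;
        import java.util.List;
        import java.util.stream.Collectors;

        public class JoiningDemo {
            public static void main(String[] args) {
                List<String> strings = Arrays.asList("Hello", "World");

                // Delimiter, prefix, and suffix variants of Collectors.joining().
                System.out.println(strings.stream().collect(Collectors.joining(", ", "[", "]"))); // [Hello, World]
                System.out.println(strings.stream().collect(Collectors.joining("-")));            // Hello-World
            }
        }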

Concluding remarks

Streams provide very powerful functionality, and the Collectors class in the java.util.stream package is a great companion to Stream; combined, they become even more powerful while staying simple to use. The code listings above can be downloaded from GitHub.

Reference Documents

Java Streams
Should I return a stream or a collection?
Java 8: getting the minimum and maximum values of a stream
java.util.stream
Java 8 functional programming
