Java 8 New Features Tour: Using the Stream API to Process Collections
In this article of the "Java 8 New Features Tour" series, we explain in depth, with code demonstrations, how to traverse collections with streams, how to create streams from collections and arrays, and how to aggregate stream values.
In the previous article, "traversing, filtering, processing collections, and using Lambda expression enhancement methods", I explained and demonstrated how to traverse collections with lambda expressions and method references, how to filter a collection with the Predicate interface, and how to implement default methods in an interface. It closed with a demonstration of static methods in interfaces.
The source code is on my GitHub; you can clone it from here.
Content list
- Use a stream to traverse a collection.
- Create a stream from a collection or an array.
- Aggregate the values in a stream.
1. Use a stream to traverse a collection
Introduction:
Java's collections framework, with interfaces such as List and Map and classes such as ArrayList and HashMap, makes it easy to manage ordered and unordered collections. The framework has been improved continuously since it was first introduced. In Java SE 8, we can use the Stream API to manage, traverse, and aggregate collections. A collection-based stream is not the same thing as an input/output stream.
How does it work?
It takes a new approach: data is processed as a whole rather than item by item. When you use a stream, you do not need to care about the details of looping or traversal. You create a stream directly from a collection and then use it for many operations, such as traversal, filtering, and aggregation. I will start with the SequentialStream class in the com.tm.java8.features.stream.traversing package of the Java8Features project. Java SE 8 has two kinds of collection streams: serial streams (called sequential streams in the API documentation) and parallel streams.
List<Person> people = new ArrayList<>();
people.add(new Person("Mohamed", 69));
people.add(new Person("Doaa", 25));
people.add(new Person("Malik", 6));
Predicate<Person> pred = (p) -> p.getAge() > 65;
displayPeople(people, pred);
...........
private static void displayPeople(List<Person> people, Predicate<Person> pred) {
    System.out.println("Selected:");
    people.forEach(p -> {
        if (pred.test(p)) {
            System.out.println(p.getName());
        }
    });
}
Of the two, the serial stream is the simpler one. It is similar to an iterator that processes one element of the collection at a time, although the syntax is different from what you have seen before. In this code, I created an ArrayList of people and assigned it to a List variable. It contains three instances of the Person class. Then a Predicate declares a condition, and only the people who meet it are displayed. The displayPeople() method (lines 48 to 52 in the source file) loops through the collection and tests each item one by one. Run this code and you get the following result:
Selected:
Mohamed
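For readers who want to run the example on its own, here is a minimal self-contained sketch. The article does not list the Person class, so the nested Person below is a hypothetical stand-in with only a name and an age; the helper names samplePeople and selectNames are also assumptions made for illustration, and selectNames returns the matching names instead of printing inside the loop.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

class PredicateLoopDemo {

    // Hypothetical stand-in for the article's Person class, which is not listed.
    static class Person {
        private final String name;
        private final int age;

        Person(String name, int age) {
            this.name = name;
            this.age = age;
        }

        String getName() { return name; }
        int getAge() { return age; }
    }

    // The same three people used in the article.
    static List<Person> samplePeople() {
        List<Person> people = new ArrayList<>();
        people.add(new Person("Mohamed", 69));
        people.add(new Person("Doaa", 25));
        people.add(new Person("Malik", 6));
        return people;
    }

    // Collects the names of everyone who passes the predicate, in list order.
    static List<String> selectNames(List<Person> people, Predicate<Person> pred) {
        List<String> names = new ArrayList<>();
        people.forEach(p -> {
            if (pred.test(p)) {
                names.add(p.getName());
            }
        });
        return names;
    }

    public static void main(String[] args) {
        Predicate<Person> pred = p -> p.getAge() > 65;
        System.out.println("Selected:");
        selectNames(samplePeople(), pred).forEach(System.out::println);
    }
}
```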
Now I will refactor this code to use a stream. First, I comment out the loop. Below the commented-out code, I start with the collection object people and call its stream() method. A stream, like a collection, has a generic type. If you get a stream from a collection, each item in the stream has the same type as the items in the collection. My collection holds instances of the Person class, so the stream uses the same generic type.
    System.out.println("Selected:");
    // people.forEach(p -> {
    //     if (pred.test(p)) {
    //         System.out.println(p.getName());
    //     }
    // });
    people.stream().forEach(p -> System.out.println(p.getName()));
}
You call the stream() method to obtain a stream object and then perform operations on it. I simply call the forEach method, which takes a lambda expression as its argument. Each item in the list is handed to the lambda in turn: the parameter, the lambda operator, and then the body that does the processing. Here I simply print each person's name with System.out. Save and run the code. Because no filtering is done, every element in the list is printed:
Selected:
Mohamed
Doaa
Malik
Once you have a stream, using the predicate becomes easy. With the collection's forEach method, I had to call the predicate's test method explicitly, but a stream has a method named filter. It accepts a predicate object; every predicate has a test method, so filter knows how to call it. So I change the code a little: I move the .forEach() call down two lines and, on the empty line in between, call the filter method.
people.stream()
    .filter(pred)
    .forEach(p -> System.out.println(p.getName()));
The filter method accepts an instance of the Predicate interface, so I pass in my predicate object. It returns a new, filtered Stream, and on that object I can call forEach(). When I run this code, only the items that meet the predicate's condition are displayed. You can do much more with streams; the Java SE 8 API documentation lists the available operations.
Selected:
Mohamed
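Because filter returns another stream, further operations can be chained onto it. The sketch below is only an illustration: it filters, sorts by age, maps each person to a name, and collects the result into a list. The nested Person class and the helper names samplePeople and namesYoungerThan are hypothetical and not part of the article's project.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

class StreamChainDemo {

    // Hypothetical stand-in for the article's Person class.
    static class Person {
        private final String name;
        private final int age;

        Person(String name, int age) {
            this.name = name;
            this.age = age;
        }

        String getName() { return name; }
        int getAge() { return age; }
    }

    static List<Person> samplePeople() {
        return Arrays.asList(
                new Person("Mohamed", 69),
                new Person("Doaa", 25),
                new Person("Malik", 6));
    }

    // Names of the people younger than maxAge, youngest first.
    static List<String> namesYoungerThan(List<Person> people, int maxAge) {
        return people.stream()
                .filter(p -> p.getAge() < maxAge)                 // keep matching items
                .sorted(Comparator.comparingInt(Person::getAge))  // order by age
                .map(Person::getName)                             // Person -> String
                .collect(Collectors.toList());                    // gather into a List
    }

    public static void main(String[] args) {
        System.out.println(namesYoungerThan(samplePeople(), 65)); // [Malik, Doaa]
    }
}
```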
The API documentation shows that in addition to filtering, you can also aggregate, sort, and more. Before wrapping up this demonstration, I want to show you the important difference between a serial stream and a parallel stream. An important goal of Java SE 8 is to make better use of multi-CPU systems. Java can coordinate the work across multiple CPUs at runtime; all you need to do is switch the serial stream to a parallel stream.
Syntactically, there are two ways to make that switch. I copy the SequentialStream class: in the package view, I copy and paste the class, rename it ParallelStream, and open the new class. In this version I have deleted the commented-out code, since I no longer need it. You can create a parallel stream in two ways. The first is to call the parallelStream() method on the collection. Now I have a stream whose work can be distributed across processors automatically.
private static void displayPeople(List<Person> people, Predicate<Person> pred) {
    System.out.println("Selected:");
    people.parallelStream()
        .filter(pred)
        .forEach(p -> System.out.println(p.getName()));
}
Running this code gives exactly the same result: the data is filtered and returned.
Selected:
Mohamed
The second way to create a parallel stream is to call stream() and then call parallel() on the resulting stream. Essentially, it does the same thing: it starts with a serial stream and converts it into a parallel one. It is still a stream, so it can be filtered and processed in the same way as before, but now its work can be split across multiple threads.
people.stream()
    .parallel()
    .filter(pred)
    .forEach(p -> System.out.println(p.getName()));
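To see the two ways side by side, here is a small sketch that uses a plain list of names instead of Person objects to stay short. The class and method names are assumptions made for illustration. Both pipelines collect their results into a list so that they can be compared; the collected list keeps the order of the source list even when the stream is parallel.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

class ParallelCreationDemo {

    // Way 1: ask the collection for a parallel stream directly.
    static List<String> viaParallelStream(List<String> names, Predicate<String> pred) {
        return names.parallelStream()
                .filter(pred)
                .collect(Collectors.toList());
    }

    // Way 2: start with a serial stream and switch it to parallel.
    static List<String> viaParallel(List<String> names, Predicate<String> pred) {
        return names.stream()
                .parallel()
                .filter(pred)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("Mohamed", "Doaa", "Malik");
        Predicate<String> startsWithM = n -> n.startsWith("M");

        System.out.println(viaParallelStream(names, startsWithM)); // [Mohamed, Malik]
        System.out.println(viaParallel(names, startsWithM));       // [Mohamed, Malik]

        // isParallel() reports whether a terminal operation would run in parallel.
        System.out.println(names.stream().isParallel());            // false
        System.out.println(names.stream().parallel().isParallel()); // true
    }
}
```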
Summary
There is no clear rule for when a parallel stream is better than a serial one. It depends on the size and complexity of the data and on the capabilities of your hardware, in particular how many CPUs it has. The only advice I can give is to test with your own application and data. Establish a baseline by timing the operation, then try the serial stream and the parallel stream to see which one suits you better.
2. Create a stream from a collection or an array
Introduction
The Java SE 8 Stream API is designed to help manage collections of data, meaning objects from the collections framework such as array lists or hash maps. However, you can also create a stream directly from an array.
How does it work?
In the eg.com.tm.java8.features.stream.creating package of the Java8Features project, I created a class named ArrayToStream. In its main method, I created an array of three elements, each an instance of the Person class.
public static void main(String args[]) {
    Person[] people = {
        new Person("Mohamed", 69),
        new Person("Doaa", 25),
        new Person("Malik", 6)};
    for (int i = 0; i < people.length; i++) {
        System.out.println(people[i].getInfo());
    }
}
The Person class has setters and getters for its private members, plus a getInfo() method that returns a concatenated string.
public String getInfo() {
    return name + " (" + age + ")";
}
Now, if you want to process this array with a stream, you might think you need to convert the array into an array list and then create a stream from that list. However, you can create a stream directly from an array, in two ways. First, I do not need the three lines of the for loop, so I comment them out. Then, below them, I declare an object of type Stream.
Stream is an interface in the java.util.stream package. When I press Ctrl+Space and select it, the IDE prompts for the generic element type, which is the type of item the stream manages. Here the element type is Person, matching the type of the array elements. I name my new stream object stream, all lowercase. This is the first way to create a stream: call the of() method on the Stream interface. Note that there are two versions of this method.
The first takes a single object and the second takes multiple objects. Here I pass one argument, the array named people. Java treats the array as the argument list of the multiple-object (varargs) version, so each element of the array becomes an element of the stream. Stream.of() therefore takes the array and wraps it in a stream. Now I can use lambda expressions, filters, method references, and other stream features. I call the stream's forEach method and pass in a lambda expression: the current person object, the lambda operator, and then a call to the object's getInfo() method to get the person's information.
Person[] people = {
    new Person("Mohamed", 69),
    new Person("Doaa", 25),
    new Person("Malik", 6)};
// for (int i = 0; i < people.length; i++) {
//     System.out.println(people[i].getInfo());
// }
Stream<Person> stream = Stream.of(people);
stream.forEach(p -> System.out.println(p.getInfo()));
Save and run this code to get the result. The elements are printed in the same order in which I put them into the array. This is the first way: use the Stream.of() method.
Mohamed (69)
Doaa (25)
Malik (6)
The other way does essentially the same thing. I copy the line above and comment out the first version. This time, instead of Stream.of(), I use the class named Arrays, from the java.util package, and call its method named stream. Note that the stream method can wrap various kinds of arrays, both arrays of primitive types and arrays of objects.
// Stream<Person> stream = Stream.of(people);
Stream<Person> stream = Arrays.stream(people);
stream.forEach(p -> System.out.println(p.getInfo()));
Save and run the code. The stream does essentially the same thing as before.
Mohamed (69)
Doaa (25)
Malik (6)
Conclusion
So for an array of objects, Stream.of() and Arrays.stream() do essentially the same thing: both convert the array to a stream. Arrays.stream() also has overloads for arrays of primitive types such as int[]. Once you have the stream, you can use lambda expressions, filters, method references, and other features.
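One detail is worth a small illustration: the two methods only behave alike for arrays of objects. For a primitive array such as int[], Arrays.stream() returns an IntStream of the individual values, while Stream.of() produces a stream with a single element, the array itself. The sketch below uses hypothetical class and method names.

```java
import java.util.Arrays;
import java.util.stream.IntStream;
import java.util.stream.Stream;

class ArrayStreamDemo {

    // Arrays.stream(int[]) yields one stream element per array value.
    static int sumOfAges(int[] ages) {
        IntStream ageStream = Arrays.stream(ages);
        return ageStream.sum();
    }

    // Stream.of(int[]) yields a Stream<int[]> whose only element is the array.
    static long elementsSeenByStreamOf(int[] ages) {
        Stream<int[]> wrapped = Stream.of(ages);
        return wrapped.count();
    }

    public static void main(String[] args) {
        String[] names = {"Mohamed", "Doaa", "Malik"};
        // For object arrays, both factory methods stream the three elements.
        System.out.println(Stream.of(names).count());     // 3
        System.out.println(Arrays.stream(names).count()); // 3

        int[] ages = {69, 25, 6};
        System.out.println(sumOfAges(ages));              // 100
        System.out.println(elementsSeenByStreamOf(ages)); // 1
    }
}
```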
3. Aggregate stream values
Introduction
Previously, I described how to use a stream to iterate over a collection. You can also use a stream to aggregate the items in a collection, for example to compute a count, a sum, or an average. When you perform these operations, it is important to understand how parallel streams behave.
How does it work?
I will demonstrate it in the eg.com.tm.java8.features.stream.aggregating package of the Java8Features project. First, we use the ParallelStreams class. In the main method of this class, I created an array list of strings and used a loop to add 10,000 elements to it. Then (lines 35 and 36 in the source file) I created a stream and used the forEach method to print each item in the stream.
public static void main(String args[]) {
    System.out.println("Creating list");
    List<String> strings = new ArrayList<>();
    for (int i = 0; i < 10000; i++) {
        strings.add("Item " + i);
    }
    strings.stream()
        .forEach(str -> System.out.println(str));
}
After running this code, I get the expected result. The items are printed in the same order in which they were added to the list.
.........
Item 9982
Item 9983
Item 9984
Item 9985
Item 9986
Item 9987
Item 9988
Item 9989
Item 9990
Item 9991
Item 9992
Item 9993
Item 9994
Item 9995
Item 9996
Item 9997
Item 9998
Item 9999
Now let's see what happens when the stream is converted to a parallel stream. As I described earlier, you can either call parallelStream() on the collection or call parallel() on the stream.
I will use the second way. Now I have a parallel stream, whose work can be spread across multiple processors depending on the load.
strings.stream()
    .parallel()
    .forEach(str -> System.out.println(str));
Run the code again and watch what happens. Note that the last element printed is not the last element in the list, which would be Item 9999. If I scroll through the output, I can see that the processing jumps around. This is because the data is divided into multiple chunks at runtime.
.........
Item 5292
Item 5293
Item 5294
Item 5295
Item 5296
Item 5297
Item 5298
Item 5299
Item 5300
Item 5301
Item 5302
Item 5303
Item 5304
Item 5305
Item 5306
Item 5307
Item 5308
Item 5309
Item 5310
Item 5311
Each chunk is then assigned to an available processor. The code after the forEach() call runs only when all chunks have been processed. Essentially, when forEach() is called, the whole job is split up as needed. This may or may not improve performance, depending on the size of the dataset and the capabilities of your hardware. This example also shows that if you need to process the items one by one in the order they were added, a parallel stream may not be suitable.
A serial stream guarantees that items are processed in a consistent order. A parallel stream, by design, can be more efficient when the work can be split up. Parallel streams are therefore well suited to aggregate operations, where the collection is treated as a whole. I will use examples to demonstrate counting, averaging, and summing the elements of a collection.
We start with counting, in the main method of this class, using the same basic code: create a list of 10,000 strings, then use forEach to process each item.
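If the order matters but you still want a parallel stream, the Stream API offers forEachOrdered(), which performs the action in the stream's encounter order at the cost of some of the parallel speed-up. The sketch below is an illustration with hypothetical class and method names; it records the order in which items are visited so that it can be compared with the source list.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class OrderedParallelDemo {

    static List<String> buildList(int size) {
        List<String> strings = new ArrayList<>(size);
        for (int i = 0; i < size; i++) {
            strings.add("Item " + i);
        }
        return strings;
    }

    // Visits the items with a parallel stream but in encounter order.
    static List<String> visitInOrder(List<String> items) {
        // A synchronized list, because the action may run on different threads.
        List<String> seen = Collections.synchronizedList(new ArrayList<>());
        items.parallelStream().forEachOrdered(seen::add);
        return seen;
    }

    public static void main(String[] args) {
        List<String> strings = buildList(10000);
        List<String> seen = visitInOrder(strings);
        System.out.println(seen.get(seen.size() - 1)); // Item 9999
        System.out.println(seen.equals(strings));      // true
    }
}
```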
public static void main(String args[]) {
    System.out.println("Creating list");
    List<String> strings = new ArrayList<>();
    for (int i = 0; i < 10000; i++) {
        strings.add("Item " + i);
    }
    strings.stream()
        .forEach(str -> System.out.println(str));
}
In this example, I want to count the elements of the collection directly instead of processing them one by one. So I comment out the original code and use the following code instead. Because you cannot know in advance how many elements a stream holds, count() returns a long, so I store the result in a long variable.
I name this variable count. Calling strings.stream().count() returns a long value. I then concatenate the value with "Count: " and print it with System.out.
// strings.stream()
//     .forEach(str -> System.out.println(str));
long count = strings.stream().count();
System.out.println("Count: " + count);
Save and run the code. The output is shown below. Counting the elements of the collection is almost instantaneous.
Creating list
Count: 10000
Now let's make a small change to the code and add two zeros, so that it processes 1,000,000 strings. I run the code again, and the result still comes back quickly.
Creating list
Count: 1000000
Now, I use parallel streams for processing to see what will happen. I will add the parallel method below:
// strings.stream()
//     .forEach(str -> System.out.println(str));
long count = strings.stream().parallel().count();
System.out.println("Count: " + count);
Then I ran this code and found that it took a little longer. To measure it properly, I can benchmark it by capturing timestamps before and after the operation and doing some math. Results may differ from system to system. In my experience, though, a simple collection of simple types like this does not gain much from a parallel stream. Still, I encourage you to run your own benchmarks; it is a bit of trouble, but it is how you find out.
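The timestamp approach described above could look roughly like the sketch below. It is not taken from the article's project: the class and method names are assumptions, and a single run measured with System.nanoTime() is only a rough indication, since JIT warm-up and other activity on the machine affect the numbers. No timings are shown because they vary from system to system.

```java
import java.util.ArrayList;
import java.util.List;

class CountBenchmark {

    static List<String> buildList(int size) {
        List<String> strings = new ArrayList<>(size);
        for (int i = 0; i < size; i++) {
            strings.add("Item " + i);
        }
        return strings;
    }

    static long serialCount(List<String> strings) {
        return strings.stream().count();
    }

    static long parallelCount(List<String> strings) {
        return strings.stream().parallel().count();
    }

    public static void main(String[] args) {
        System.out.println("Creating list");
        List<String> strings = buildList(1_000_000);

        // Timestamp before and after each operation, then subtract.
        long start = System.nanoTime();
        long serial = serialCount(strings);
        long serialMicros = (System.nanoTime() - start) / 1_000;

        start = System.nanoTime();
        long parallel = parallelCount(strings);
        long parallelMicros = (System.nanoTime() - start) / 1_000;

        System.out.println("Serial count:   " + serial + " in " + serialMicros + " us");
        System.out.println("Parallel count: " + parallel + " in " + parallelMicros + " us");
    }
}
```

For more trustworthy numbers, repeat each measurement several times and discard the first runs, or use a dedicated harness such as JMH.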
Now let's look at the sum and the average. I will use the SumAndAverage class. This time, I have a list of three person objects, each with a different age. My goal is to calculate the sum of the three ages and their average. I add a new line of code after all the person objects have been added to the list, and declare an integer variable named sum.
First, I call people.stream() to get a stream. On that stream, I can call the mapToInt() method. Note that there are two similar map methods, mapToDouble() and mapToLong(). The purpose of these methods is to extract a primitive value from each object and create a stream of primitives. You do that with a lambda expression. I chose mapToInt() because each person's age is an integer.
The lambda expression starts with a variable that represents the current person. Then, after the lambda operator, the expression p.getAge() returns an integer. The result of mapToInt() is an IntStream; the other two methods return a DoubleStream and a LongStream. Because the stream is known to hold numeric values, I can call its sum() method. That adds up the ages of all the person objects in the collection. In one more statement, I print the result with System.out, concatenating "Total of ages " with the sum.
List<Person> people = new ArrayList<>();
people.add(new Person("Mohamed", 69));
people.add(new Person("Doaa", 25));
people.add(new Person("Malik", 6));
int sum = people.stream()
    .mapToInt(p -> p.getAge())
    .sum();
System.out.println("Total of ages " + sum);
Save and run the code. The sum of the three ages is 100.
Total of ages 100
Calculating the average is very similar. However, an average requires a division, and dividing by zero is a possibility, for example when the stream is empty. Therefore, the average operation returns an Optional variable.
Optional comes in several flavors for different data types. For the average I want a double value, so I create a variable of type OptionalDouble. Note that OptionalInt and OptionalLong exist as well. I name the variable avg. The code is the same as for the sum: start with people.stream(), call mapToInt() with the same lambda expression, and finally call the average() method.
Now I have an OptionalDouble. Before using it, you can call isPresent() to make sure it really holds a double value. So I use a bit of if/else code whose condition is avg.isPresent(). If it is true, I print the label "Average: " and the average value with System.out. In the else clause, I simply print "average wasn't calculated".
OptionalDouble avg = people.stream()
    .mapToInt(p -> p.getAge())
    .average();
if (avg.isPresent()) {
    System.out.println("Average: " + avg);
} else {
    System.out.println("average wasn't calculated");
}
In this example I know it will succeed, because all three people have an age. That is not always the case, though. As I mentioned earlier, if there is a division by zero, you do not get a double value back. I save and run this code. Note in the output that OptionalDouble is a wrapper object.
Total of ages 100
Average: OptionalDouble[33.333333333333336]
The actual value is wrapped inside this object. Go back to the code and, on the avg object, call the getAsDouble() method.
if (avg.isPresent()) {
    System.out.println("Average: " + avg.getAsDouble());
} else {
    System.out.println("average wasn't calculated");
}
Now, I can get the value of the double type. Run the code again and the output result is as follows:
Total of ages 100
Average: 33.333333333333336
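To see the empty case that OptionalDouble guards against, here is a short sketch that averages a list of ages directly. It uses a list of Integer values instead of Person objects to stay short, and the class and method names are assumptions made for illustration. With an empty list there is nothing to divide by, so average() returns an empty OptionalDouble; orElse() is one way to supply a fallback value.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.OptionalDouble;

class EmptyAverageDemo {

    // Average of the ages; empty when the list has no elements.
    static OptionalDouble averageAge(List<Integer> ages) {
        return ages.stream()
                .mapToInt(Integer::intValue)
                .average();
    }

    public static void main(String[] args) {
        OptionalDouble avg = averageAge(Arrays.asList(69, 25, 6));
        System.out.println(avg.isPresent());   // true
        System.out.println(avg.getAsDouble()); // 33.333333333333336

        OptionalDouble none = averageAge(Collections.<Integer>emptyList());
        System.out.println(none.isPresent());  // false
        System.out.println(none.orElse(0.0));  // 0.0 (fallback value)
    }
}
```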
Conclusion
With streams and lambda expressions, you can aggregate the values of a collection with very little code.
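As a closing illustration, the count, sum, and average can also be obtained from one pass over the data with IntStream.summaryStatistics(). This is an alternative to the separate sum() and average() calls, not something the article's project uses, and the class and method names are hypothetical.

```java
import java.util.IntSummaryStatistics;
import java.util.stream.IntStream;

class SummaryStatsDemo {

    // Gathers count, sum, min, max, and average in a single pass.
    static IntSummaryStatistics ageStats(int... ages) {
        return IntStream.of(ages).summaryStatistics();
    }

    public static void main(String[] args) {
        IntSummaryStatistics stats = ageStats(69, 25, 6);
        System.out.println("Count: " + stats.getCount());        // Count: 3
        System.out.println("Total of ages " + stats.getSum());   // Total of ages 100
        System.out.println("Average: " + stats.getAverage());    // Average: 33.333333333333336
        System.out.println("Youngest: " + stats.getMin());       // Youngest: 6
        System.out.println("Oldest: " + stats.getMax());         // Oldest: 69
    }
}
```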