Java 8 vs. Scala (ii): Stream vs. Collection

Editor's note: The previous article compared lambda expressions in Java 8 and Scala. This article is the second part of Hussachai Puripunpinyo's Java vs. Scala trilogy and focuses on streams and collections. It was compiled by OneAPM engineers.

To start with a brief introduction: a collection is a finite set of data, while a stream is a sequence of data that can be finite or infinite.

The Streams API is a new API in Java 8, used mainly for manipulating collections and streaming data. The Collections API changes the state of the data set, whereas the Streams API does not. For example, calling Collections.sort(list) sorts the list that is passed in, while calling list.stream().sorted() works on a copy of the data and leaves the original intact. The Java 8 API documentation has more information about streams.

The following comparison between collections and streams is excerpted from the Java 8 documentation. I strongly recommend reading the full version.
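The difference between sorting in place and sorting through a stream can be sketched as follows. This is only an illustration; the class name SortInPlaceVsStream and the sample numbers are made up for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;

class SortInPlaceVsStream {

    // Returns a sorted copy; the list passed in is left untouched.
    static List<Integer> sortedCopy(List<Integer> source) {
        return source.stream().sorted().collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Integer> numbers = new ArrayList<>(Arrays.asList(3, 1, 2));

        List<Integer> copy = sortedCopy(numbers);
        System.out.println(numbers); // original order is preserved
        System.out.println(copy);    // sorted copy

        Collections.sort(numbers);   // sorts the list itself
        System.out.println(numbers); // now sorted in place
    }
}
```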

Streams and collections have the following differences:

1. No storage. A stream is not a data structure that stores elements. Instead, it conveys elements from a source through a pipeline of computational operations.

2. Functional in nature. An operation on a stream produces a result but does not modify the source data.

3. Laziness-seeking. Many stream operations, such as filtering, mapping, and duplicate removal, can be implemented lazily. For example, when searching for the first element that satisfies a condition, we can return as soon as it is found without examining the remaining elements.

4. Possibly unbounded. Collections have a finite size, but streams need not. A client can keep taking elements from a stream until a certain condition is met, which collections cannot do.

5. Consumable. The elements of a stream can be visited only once during the life of the stream.
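Two of these properties, "possibly unbounded" and "consumable", are easy to see in a few lines of Java. The sketch below is my own illustration; the class and method names are hypothetical.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class StreamProperties {

    // An unbounded source made finite by the short-circuiting limit(n).
    static List<Integer> firstPowersOfTwo(int n) {
        return Stream.iterate(1, x -> x * 2)
                .limit(n)
                .collect(Collectors.toList());
    }

    // A stream can be traversed only once; a second terminal operation fails.
    static boolean secondTraversalFails() {
        Stream<String> letters = Stream.of("a", "b", "c");
        letters.count(); // first (and only allowed) terminal operation
        try {
            letters.count();
            return false;
        } catch (IllegalStateException expected) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(firstPowersOfTwo(5));
        System.out.println(secondTraversalFails());
    }
}
```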

Both Java and Scala make it easy to process the values in a collection in parallel. In Java, you only need to call parallelStream() or stream().parallel() instead of stream(). In Scala, you call the par function before calling other methods. Adding parallelism can improve the performance of your program. Unfortunately, much of the time it actually runs slower. Parallelism is a feature that is very easy to misuse.

Note: the Javadoc describes parallelStream() as returning a possibly parallel stream with the collection as its source, so it may also return a sequential stream. (Someone has researched why the API is specified this way.)
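As a small illustration, the two methods below compute the same sum, one sequentially and one through parallelStream(). The names are hypothetical, and the sketch says nothing about which variant is faster; that depends on the data size and the machine.

```java
import java.util.ArrayList;
import java.util.List;

class ParallelSum {

    static long sequentialSum(List<Integer> values) {
        return values.stream().mapToLong(Integer::longValue).sum();
    }

    // Only the source of the pipeline changes; the rest is identical.
    static long parallelSum(List<Integer> values) {
        return values.parallelStream().mapToLong(Integer::longValue).sum();
    }

    public static void main(String[] args) {
        List<Integer> values = new ArrayList<>();
        for (int i = 1; i <= 1000; i++) {
            values.add(i);
        }
        System.out.println(sequentialSum(values));
        System.out.println(parallelSum(values));
    }
}
```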


Java's Stream API is lazy. This means that until a terminal operation (such as a collect() call) is specified, none of the intermediate operations (such as filter) are executed. Lazy stream processing exists mainly to make the Stream API more efficient. For example, when you filter, map, and sum a stream, laziness lets all of the operations run in a single pass without building intermediate collections. Lazy execution also lets each operation process only as much data as necessary. Scala's collections, in contrast, are evaluated eagerly. Does this mean the Java Stream API always beats Scala in benchmarks? If you only compare the Java Stream API with the Scala Collection API, then yes, the Java Stream API comes out ahead. But Scala offers more options. By simply calling toStream, you can convert a collection to a Stream, or you can work with the data through a view, which is a collection with lazy processing.
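One way to observe this laziness is to count how often the filter predicate runs. In the sketch below (hypothetical names, counter passed in by the caller), building the pipeline alone runs nothing; the predicate is only invoked once the terminal sum() is called.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Stream;

class LazyPipeline {

    // Builds filter + map but attaches no terminal operation.
    static Stream<Integer> buildOnly(List<Integer> values, AtomicInteger filterCalls) {
        return values.stream()
                .filter(v -> { filterCalls.incrementAndGet(); return v % 2 == 0; })
                .map(v -> v * 10);
    }

    // Adds a terminal operation, which triggers a single pass over the data.
    static int sumOfEvensTimesTen(List<Integer> values, AtomicInteger filterCalls) {
        return buildOnly(values, filterCalls).mapToInt(Integer::intValue).sum();
    }

    public static void main(String[] args) {
        List<Integer> values = Arrays.asList(1, 2, 3, 4);

        AtomicInteger calls = new AtomicInteger();
        buildOnly(values, calls);
        System.out.println("filter calls without terminal operation: " + calls.get());

        calls.set(0);
        int sum = sumOfEvensTimesTen(values, calls);
        System.out.println("sum = " + sum + ", filter calls: " + calls.get());
    }
}
```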

Below is a brief introduction to Scala's Stream and View.

Scala's Stream

Scala's Stream differs from Java's. With a Scala Stream, you do not need to call a terminal operation to get a result. Stream is an abstract class that extends AbstractSeq and mixes in the LinearSeq and GenericTraversableTemplate traits, so you can treat a Stream as a Seq.

If you are unfamiliar with Scala, you can think of Seq as the counterpart of List in Java. (List in Scala is not an interface.)

It is important to know that the elements of a Stream are computed lazily, which is why a Stream can represent an infinite sequence. If you end up computing every element, a Stream performs about the same as a List. Once elements have been computed, their values are cached. Stream has a force function that forces the whole stream to be evaluated and returns the result. Be careful not to call it on an infinite stream, and avoid any other API that forces the entire stream to be processed, such as size, toList, and foreach, because these evaluate a Scala Stream implicitly.

The following implements the Fibonacci sequence with a Scala Stream.

def fibFrom(a: Int, b: Int): Stream[Int] = a #:: fibFrom(b, a + b)
val fib1 = fibFrom(0, 1) // 0 1 1 2 3 5 8 …
val fib5 = fibFrom(0, 5) // 0 5 5 10 15 …
// fib1.force // Don't do this: it evaluates the stream endlessly and you soon get an OutOfMemoryError
// fib1.size // Don't do this either, for the same reason as above.
fib1.take(10) // Do this. It takes the first 10 elements of the infinite Stream.
fib1.take(20).foreach(println(_)) // Prints the first 20 numbers

:: is a common method for prepending an element to a collection. #:: does the same, but lazily (method names in Scala can be arbitrary symbols).
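For comparison, a similar infinite Fibonacci sequence can be sketched in Java with Stream.iterate. This is not from the original article; the names are hypothetical, and unlike a Scala Stream, a Java stream does not cache its elements and can be traversed only once.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class FibonacciStream {

    // Infinite stream of Fibonacci-style numbers starting from a and b.
    // Each step carries the pair (current, next) and emits the current value.
    static Stream<Long> fibFrom(long a, long b) {
        return Stream.iterate(new long[] {a, b}, p -> new long[] {p[1], p[0] + p[1]})
                .map(p -> p[0]);
    }

    // limit(n) plays the role of take(n) in the Scala example.
    static List<Long> firstN(long a, long b, int n) {
        return fibFrom(a, b).limit(n).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(firstN(0, 1, 10));
        System.out.println(firstN(0, 5, 5));
    }
}
```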

Scala's View

Again, Scala's collections are strict, whereas a view is non-strict. A view is a collection built on top of a base collection, in which all transformations are deferred. You can convert a strict collection to a view by calling the view function, and convert it back by calling the force method. A view does not cache results; the transformations are performed again every time it is accessed. It is like a database view, in that it is a virtual collection.
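Java has no direct counterpart to a Scala view. As a rough analogy only, a Supplier that rebuilds a stream pipeline on every call behaves in a similar way: nothing is cached, and the transformation runs again on each traversal. All names in this sketch are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class ViewLikeSupplier {

    // Every get() creates a fresh lazy pipeline over the base list.
    static Supplier<Stream<Integer>> doubledView(List<Integer> base, AtomicInteger mapCalls) {
        return () -> base.stream().map(v -> { mapCalls.incrementAndGet(); return v * 2; });
    }

    public static void main(String[] args) {
        AtomicInteger mapCalls = new AtomicInteger();
        Supplier<Stream<Integer>> view = doubledView(Arrays.asList(1, 2, 3), mapCalls);

        List<Integer> first = view.get().collect(Collectors.toList());
        List<Integer> second = view.get().collect(Collectors.toList());

        System.out.println(first);
        System.out.println(second);
        // The mapping function ran once per element per traversal.
        System.out.println("map calls: " + mapCalls.get());
    }
}
```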

Create a data set.

public class Pet {
    public static enum Type {
        CAT, DOG
    }
    public static enum Color {
        BLACK, WHITE, BROWN, GREEN
    }
    private String name;
    private Type type;
    private LocalDate birthdate;
    private Color color;
    private int weight;
    ...
}

Suppose you have a collection of pets. The following sections use that collection to illustrate each operation.

Filter

Requirement: Filter the chubby pets out of the collection, where chubby means weighing more than 50 pounds. We also want only the pets born before January 1, 2013. The following code snippets show different ways to implement the filter.

Java Method 1: Traditional way

//Before Java 8
List<Pet> tmpList = new ArrayList<>();
for(Pet pet: pets){
    if(pet.getBirthdate().isBefore(LocalDate.of(2013, Month.JANUARY, 1))
            && pet.getWeight() > 50){
        tmpList.add(pet);
    }
}

This approach is common in imperative languages. First you create a temporary collection, then you iterate over all the elements and store those that meet the criteria in the temporary collection. It is a bit verbose, but the result and the efficiency are very good. I am sorry to disappoint you, though: the traditional approach is faster than the Streams API. Still, there is no need to worry about performance here, because the conciseness of the code matters more than the slight performance gain.

Java Method 2: Streams API

//Java 8 - Stream
pets.stream()
    .filter(pet -> pet.getBirthdate().isBefore(LocalDate.of(2013, Month.JANUARY, 1)))
    .filter(pet -> pet.getWeight() > 50)
    .collect(toList())

The code above filters the elements of the collection with the Streams API. I deliberately called filter twice to show that the Streams API is designed like the Builder pattern: before a builder's build method is called, you can chain various methods together. In the Streams API, the counterpart of build is called a terminal operation, and the other operations are called intermediate operations. A terminal operation differs from build in that it can be called only once per stream. There are many terminal operations to choose from, such as collect, count, min, max, iterator, and toArray, which produce a result. Other terminal operations, such as forEach, consume the values. So, which do you find more readable, the traditional approach or the Streams API?
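To try both Java approaches side by side, here is a self-contained sketch. The trimmed-down Pet class, the sample pets, and the helper names are assumptions made for the example; only the filter conditions come from the snippets above.

```java
import java.time.LocalDate;
import java.time.Month;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class ChubbyPetFilter {

    // Hypothetical, trimmed-down stand-in for the article's Pet class.
    static final class Pet {
        private final String name;
        private final LocalDate birthdate;
        private final int weight;

        Pet(String name, LocalDate birthdate, int weight) {
            this.name = name;
            this.birthdate = birthdate;
            this.weight = weight;
        }

        String getName() { return name; }
        LocalDate getBirthdate() { return birthdate; }
        int getWeight() { return weight; }
    }

    static final LocalDate CUTOFF = LocalDate.of(2013, Month.JANUARY, 1);

    // Method 1: explicit loop with a temporary list.
    static List<Pet> withLoop(List<Pet> pets) {
        List<Pet> result = new ArrayList<>();
        for (Pet pet : pets) {
            if (pet.getBirthdate().isBefore(CUTOFF) && pet.getWeight() > 50) {
                result.add(pet);
            }
        }
        return result;
    }

    // Method 2: two chained filters that run only when collect() is called.
    static List<Pet> withStream(List<Pet> pets) {
        return pets.stream()
                .filter(pet -> pet.getBirthdate().isBefore(CUTOFF))
                .filter(pet -> pet.getWeight() > 50)
                .collect(Collectors.toList());
    }

    // Made-up sample data.
    static List<Pet> samplePets() {
        return Arrays.asList(
                new Pet("Tom", LocalDate.of(2012, Month.MAY, 1), 60),
                new Pet("Rex", LocalDate.of(2014, Month.MARCH, 10), 70),
                new Pet("Kitty", LocalDate.of(2011, Month.JANUARY, 1), 20));
    }

    static List<String> names(List<Pet> pets) {
        return pets.stream().map(Pet::getName).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(names(withLoop(samplePets())));
        System.out.println(names(withStream(samplePets())));
    }
}
```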

Java Method 3: Collections API

//Java 8 - Collection
pets.removeIf(pet -> !(pet.getBirthdate().isBefore(LocalDate.of(2013, Month.JANUARY, 1))
                && pet.getWeight() > 50));
//Applying De Morgan's law.
pets.removeIf(pet -> pet.getBirthdate().toEpochDay() >= LocalDate.of(2013, Month.JANUARY, 1).toEpochDay()
                || pet.getWeight() <= 50);

This approach is the shortest. However, it modifies the original collection, which the previous approaches do not. The removeIf function takes a Predicate (a functional interface) as its argument. Predicate is a behavioral parameter with a single abstract method named test, which takes one object and returns a boolean. Note that the condition must be negated with "!" here, or you can apply De Morgan's law so that the code looks like the second statement.
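The equivalence of the negated condition and its De Morgan form can be checked with a small sketch. The simplified Pet class and sample data below are assumptions for illustration; note that both methods mutate the list they are given.

```java
import java.time.LocalDate;
import java.time.Month;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class RemoveIfDemo {

    // Hypothetical, trimmed-down stand-in for the article's Pet class.
    static final class Pet {
        final String name;
        final LocalDate birthdate;
        final int weight;

        Pet(String name, LocalDate birthdate, int weight) {
            this.name = name;
            this.birthdate = birthdate;
            this.weight = weight;
        }
    }

    static final LocalDate CUTOFF = LocalDate.of(2013, Month.JANUARY, 1);

    // Removes every pet that is NOT (born before the cutoff AND over 50 pounds).
    static void removeWithNegation(List<Pet> pets) {
        pets.removeIf(pet -> !(pet.birthdate.isBefore(CUTOFF) && pet.weight > 50));
    }

    // Same predicate after De Morgan: NOT born before the cutoff OR at most 50 pounds.
    static void removeWithDeMorgan(List<Pet> pets) {
        pets.removeIf(pet -> !pet.birthdate.isBefore(CUTOFF) || pet.weight <= 50);
    }

    // Made-up sample data in a mutable list, because removeIf modifies it.
    static List<Pet> samplePets() {
        return new ArrayList<>(Arrays.asList(
                new Pet("Tom", LocalDate.of(2012, Month.MAY, 1), 60),
                new Pet("Rex", LocalDate.of(2014, Month.MARCH, 10), 70),
                new Pet("Kitty", LocalDate.of(2011, Month.JANUARY, 1), 20)));
    }

    public static void main(String[] args) {
        List<Pet> first = samplePets();
        removeWithNegation(first);
        List<Pet> second = samplePets();
        removeWithDeMorgan(second);
        System.out.println(first.size() + " pet(s) left: " + first.get(0).name);
        System.out.println(second.size() + " pet(s) left: " + second.get(0).name);
    }
}
```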

Scala Methods: Collection, View, and Stream

//Scala - strict collection
pets.filter { pet => pet.getBirthdate.isBefore(LocalDate.of(2013, Month.JANUARY, 1))
}.filter { pet => pet.getWeight > 50 } //List[Pet]
//Scala - non-strict collection
pets.view.filter { pet => pet.getBirthdate.isBefore(LocalDate.of(2013, Month.JANUARY, 1))
}.filter { pet => pet.getWeight > 50 } //SeqView[Pet]
//Scala - stream
pets.toStream.filter { pet => pet.getBirthdate.isBefore(LocalDate.of(2013, Month.JANUARY, 1))
}.filter { pet => pet.getWeight > 50 } //Stream[Pet]

Scala's solutions are similar to the Java Streams API. The only difference is that you call the view function to turn a strict collection into a non-strict one, or the toStream function to turn a strict collection into a stream.

From here on, let's go straight to the code.


Group

Groups the elements of a collection by one of their attributes. The result is a Map<T, List<T>>, where T is a generic type.

Requirement: Group pets by type, such as dog, cat, and so on.

Note: groupingBy is a static helper method for
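A minimal Java sketch of the grouping could look like this. It uses Collectors.groupingBy from java.util.stream; the simplified Pet class and the sample pets are assumptions made for the example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class GroupPetsByType {

    enum Type { CAT, DOG }

    // Hypothetical, trimmed-down stand-in for the article's Pet class.
    static final class Pet {
        private final String name;
        private final Type type;

        Pet(String name, Type type) {
            this.name = name;
            this.type = type;
        }

        String getName() { return name; }
        Type getType() { return type; }
    }

    // Groups pets by type; each map value holds the pets of that type.
    static Map<Type, List<Pet>> byType(List<Pet> pets) {
        return pets.stream().collect(Collectors.groupingBy(Pet::getType));
    }

    static List<String> names(List<Pet> pets) {
        return pets.stream().map(Pet::getName).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Pet> pets = Arrays.asList(
                new Pet("Tom", Type.CAT),
                new Pet("Rex", Type.DOG),
                new Pet("Kitty", Type.CAT));

        Map<Type, List<Pet>> groups = byType(pets);
        System.out.println("cats: " + names(groups.get(Type.CAT)));
        System.out.println("dogs: " + names(groups.get(Type.DOG)));
    }
}
```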


Sort

Sorts the elements of a collection by their attributes. The result can be any type of collection that keeps the elements in the configured order.

Requirement: Sort by type, name, and color.
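In Java, one way to express this is a chained Comparator passed to sorted(). The sketch below uses a simplified, hypothetical Pet class and made-up data; enum values compare in declaration order.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

class SortPets {

    enum Type { CAT, DOG }
    enum Color { BLACK, WHITE, BROWN, GREEN }

    // Hypothetical, trimmed-down stand-in for the article's Pet class.
    static final class Pet {
        private final String name;
        private final Type type;
        private final Color color;

        Pet(String name, Type type, Color color) {
            this.name = name;
            this.type = type;
            this.color = color;
        }

        String getName() { return name; }
        Type getType() { return type; }
        Color getColor() { return color; }

        @Override
        public String toString() { return type + " " + name + " " + color; }
    }

    // Sorts by type first, then name, then color; the input list is not modified.
    static List<Pet> sortedByTypeNameColor(List<Pet> pets) {
        return pets.stream()
                .sorted(Comparator.comparing(Pet::getType)
                        .thenComparing(Pet::getName)
                        .thenComparing(Pet::getColor))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Pet> pets = Arrays.asList(
                new Pet("Rex", Type.DOG, Color.BLACK),
                new Pet("Tom", Type.CAT, Color.WHITE),
                new Pet("Kitty", Type.CAT, Color.BROWN),
                new Pet("Kitty", Type.CAT, Color.BLACK));
        sortedByTypeNameColor(pets).forEach(System.out::println);
    }
}
```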


Map

Applies the given function to each element of the collection. The type of the result depends on the function you define.

Requirement: Convert each pet into a string in the format "%s - name: %s, color: %s".
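A possible Java version of this mapping is sketched below. I am assuming that the first placeholder is the pet's type; the simplified Pet class and the sample data are likewise hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class DescribePets {

    enum Type { CAT, DOG }
    enum Color { BLACK, WHITE, BROWN, GREEN }

    // Hypothetical, trimmed-down stand-in for the article's Pet class.
    static final class Pet {
        private final String name;
        private final Type type;
        private final Color color;

        Pet(String name, Type type, Color color) {
            this.name = name;
            this.type = type;
            this.color = color;
        }

        String getName() { return name; }
        Type getType() { return type; }
        Color getColor() { return color; }
    }

    // Maps each Pet to a String, so the element type changes from Pet to String.
    static List<String> describe(List<Pet> pets) {
        return pets.stream()
                .map(pet -> String.format("%s - name: %s, color: %s",
                        pet.getType(), pet.getName(), pet.getColor()))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Pet> pets = Arrays.asList(
                new Pet("Tom", Type.CAT, Color.WHITE),
                new Pet("Rex", Type.DOG, Color.BROWN));
        describe(pets).forEach(System.out::println);
    }
}
```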

Find First

Returns the first value that matches the specified predicate.

Requirement: Find a pet named Handsome. No matter how many pets are named Handsome, just take the first one.

This one is a bit tricky. Did you notice that in Scala I used the find function instead of filter? If you replace find with filter, every element in the collection is evaluated, because Scala collections are strict. In the Java Streams API, however, you can safely use filter, because it only evaluates elements up to the first match instead of the whole collection. This is the benefit of lazy execution!
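The Java behavior can be checked by counting predicate evaluations. In this sketch the pets are reduced to a plain list of names, and the method and counter are hypothetical; with a sequential stream, filter followed by findFirst stops at the first match.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;
import java.util.concurrent.atomic.AtomicInteger;

class FindFirstHandsome {

    // Returns the first name equal to wanted and records how many names were checked.
    static Optional<String> firstNamed(List<String> names, String wanted, AtomicInteger checks) {
        return names.stream()
                .filter(name -> { checks.incrementAndGet(); return name.equals(wanted); })
                .findFirst();
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("Lucky", "Handsome", "Rocky", "Handsome");
        AtomicInteger checks = new AtomicInteger();

        Optional<String> found = firstNamed(names, "Handsome", checks);

        System.out.println(found.orElse("not found"));
        // Only the elements up to the first match were tested.
        System.out.println("predicate evaluations: " + checks.get());
    }
}
```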

Next, let me show some more examples of lazy evaluation in Scala collections. Suppose filter always returns true and we then take the second value. What will the result be?

pets.filter { x => println(x.getName); true }.apply(1) // (1)
pets.toStream.filter { x => println(x.getName); true }.apply(1) // (2)

As shown above, expression (1) prints the names of all the pets in the collection, whereas (2) prints only the names of the first two pets. This is the benefit of a lazy collection: evaluation is always deferred.

pets.view.filter { x => println(x.getName); true }.apply(1) // (3)

Will expression (3) give the same result as (2)? Wrong! Its result is the same as (1). Do you know why?

Comparing some common operations in Java and Scala (filter, group, map, and find), it is clear that Scala's approach is more concise than Java's. Which one do you prefer? Which is more readable?

In the next part of this series, we will compare which approach is faster. Stay tuned!

Original link: Https://

