The Stream API in Java Programming

Alongside lambda expressions on the language side, Java 8 introduced an entirely new library that makes processing datasets easy: the Stream API.

At the core of the Stream API are operations to filter, map, and reduce the data in collections.

Declarative Programming

The Stream API is used in a functional style, so programs can be quite compact. The following example gives a first impression of several Stream API methods working together:

import java.util.Arrays;
import java.util.Objects;

Object[] words = { " ", '3', null, "2", 1, "" };

Arrays.stream( words )                 // Create a new stream from the array
      .filter( Objects::nonNull )      // Keep only non-null references
      .map( Objects::toString )        // Convert objects to strings
      .map( String::trim )             // Trim leading and trailing whitespace
      .filter( s -> ! s.isEmpty() )    // Keep only non-empty strings
      .map( Integer::parseInt )        // Convert strings to integers
      .sorted()                        // Sort the integers
      .forEach( System.out::println ); // Prints 1, 2, and 3 on separate lines

While the classes from the Collection API implement optimal storage forms for data, the task of the Stream API is to conveniently query and aggregate the data. The Stream API emphasizes the what, not the how. Traversals and iterations don’t occur in the code; instead, the Fluent API declaratively describes what the result should look like. The library ultimately implements the how. For example, an implementation can decide whether processing is sequential or parallel, whether the order is important, whether all data must be cached for sorting purposes, and so on.
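To illustrate the point about leaving the how to the library, here is a minimal sketch in which the same declarative pipeline runs either sequentially or in parallel. The class name DeclarativeSum and the method sumOfEvenSquares are hypothetical names chosen for this illustration and do not come from the original text.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Stream;

class DeclarativeSum {
    // Describes WHAT to compute: the sum of the squares of all even numbers.
    // Whether the work runs sequentially or in parallel is a property of the stream.
    static int sumOfEvenSquares(List<Integer> numbers, boolean parallel) {
        Stream<Integer> stream = parallel ? numbers.parallelStream() : numbers.stream();
        return stream.filter(n -> n % 2 == 0)
                     .mapToInt(n -> n * n)
                     .sum();
    }

    public static void main(String[] args) {
        List<Integer> numbers = Arrays.asList(1, 2, 3, 4, 5, 6);
        System.out.println(sumOfEvenSquares(numbers, false)); // 56
        System.out.println(sumOfEvenSquares(numbers, true));  // 56
    }
}
```

The pipeline code is identical in both cases. Note that a parallel stream is not automatically faster, especially for small inputs like this one.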

Figure: The Pipeline Principle for Streams from the Previous Example

Internal versus External Iteration

The first thing you notice about the Stream API is that the classic loop is missing. Usually, you would use loops to run through data and make queries on the elements. Traditional loops are always sequential and run from element to element, from beginning to end. The same rule is true for an iterator. The Stream API takes a different approach. With its help, the external iteration (controlled by loops from the developer) can be replaced by an internal iteration (the Stream API fetches data). For example, when forEach(...) requests data, the data source is tapped and the data retrieved, but not before.

One advantage is that we specify only which data structure should be traversed, while the implementation itself can determine and optimize how this is done internally. If you write the loop yourself, the processing always runs element by element, whereas an internal iteration can also be parallelized, with subproblems computed by multiple execution units.
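The difference between external and internal iteration can be sketched with the same computation written both ways. The class and method names below are hypothetical and serve only as an illustration.

```java
import java.util.Arrays;
import java.util.List;

class IterationStyles {
    // External iteration: the developer's loop controls the traversal.
    static int totalLengthExternal(List<String> words) {
        int total = 0;
        for (String word : words) {
            total += word.length();
        }
        return total;
    }

    // Internal iteration: the stream performs the traversal itself.
    static int totalLengthInternal(List<String> words) {
        return words.stream()
                    .mapToInt(String::length)
                    .sum();
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("filter", "map", "reduce");
        System.out.println(totalLengthExternal(words)); // 15
        System.out.println(totalLengthInternal(words)); // 15
    }
}
```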

Note: Various collections also offer a forEach(...) method that runs over all elements and calls a method on the consumer passed to it. However, this doesn’t make the classic for loop (or even the extended for loop) obsolete. Besides being easy to write and debug, the usual loop still has some advantages. forEach(...) usually gets the executable code via a lambda expression, which has its limitations. For example, a lambda expression may not write to local variables of the enclosing method (all local variables accessed by the lambda expression must be effectively final), and a lambda expression passed to forEach(...) may not throw checked exceptions. Inside a loop, neither is an issue. In addition, a loop can be terminated early with break, which doesn’t exist in lambda expressions (return in a lambda corresponds to continue).
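The limitations described in the note can be made concrete with a small sketch. The names are hypothetical, and the one-element array used as a counter is a common workaround assumed here for illustration, not something taken from the original text.

```java
import java.util.Arrays;
import java.util.List;

class LoopVersusLambda {
    // Classic loop: may update a local variable and leave early with break.
    static int countUntilNegative(List<Integer> values) {
        int count = 0;
        for (int value : values) {
            if (value < 0) {
                break; // stops the whole loop
            }
            count++;
        }
        return count;
    }

    // forEach with a lambda: return only skips the current element (like continue).
    // A plain local int could not be incremented here, so the counter lives in an
    // effectively final one-element array (a workaround, assumed for this sketch).
    static int countNonNegative(List<Integer> values) {
        int[] count = { 0 };
        values.forEach(value -> {
            if (value < 0) {
                return; // skips this element only; iteration continues
            }
            count[0]++;
        });
        return count[0];
    }

    public static void main(String[] args) {
        List<Integer> values = Arrays.asList(3, 7, -1, 5);
        System.out.println(countUntilNegative(values)); // 2
        System.out.println(countNonNegative(values));   // 3
    }
}
```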

What Is a Stream?

A stream is a sequence of data, but it is not itself a data source that stores data the way a data structure does. The data from the stream is processed through a chain of downstream processing steps such as the following:

Filter
Map
Reduce

Processing along a chain is referred to as a pipeline and consists of three components:

The pipeline starts with a data source, such as an array, a data structure, or a generator.
Various processing steps follow, such as filtering (elements disappear from the stream) or mapping (a data type can also be converted into another data type). These changes along the way are called intermediate operations. The result of an intermediate operation is again a stream.
At the end, the result is collected, and this result is no longer a stream. One example is a reduction, such as determining a maximum or concatenating strings.

The underlying data structure isn’t changed; rather, at the end of the intermediate operations, a terminal operation asks for the result. An example of a terminal operation is forEach(...), which is located at the end of the chain and at which the stream ends.
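The three components of a pipeline can be labeled in a short sketch. The class name PipelineParts and the method shoutLongWords are hypothetical; the example assumes a list of strings as the data source and string concatenation as the final result.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class PipelineParts {
    static String shoutLongWords(List<String> words) {
        return words.stream()                           // 1. data source
                    .filter(w -> w.length() > 3)        // 2. intermediate operation: elements disappear
                    .map(String::toUpperCase)           // 2. intermediate operation: elements are converted
                    .collect(Collectors.joining(", ")); // 3. terminal operation: result is a String, not a stream
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("map", "filter", "reduce", "sum");
        System.out.println(shoutLongWords(words)); // FILTER, REDUCE
        System.out.println(words);                 // [map, filter, reduce, sum] (source unchanged)
    }
}
```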

Unlike forEach(...), many terminal operations reduce the passing data to a single value. Such operations, for example, methods that simply count elements or calculate totals, are called reducing operations. In the API, ready-made methods are available for standard reductions, for instance, calculating a total, maximum, or average, but general reductions are also possible via your own functions, for example, calculating a product instead of a total.
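As a rough sketch of these reductions, the following example uses the ready-made sum(), max(), and average() methods of IntStream and then a general reduce(...) with a custom function to calculate a product. The class and method names are hypothetical.

```java
import java.util.stream.IntStream;

class Reductions {
    static int sum(int... values) {
        return IntStream.of(values).sum();
    }

    static int max(int... values) {
        // max() returns an OptionalInt; getAsInt() throws if the stream was empty.
        return IntStream.of(values).max().getAsInt();
    }

    static double average(int... values) {
        // average() returns an OptionalDouble for the same reason.
        return IntStream.of(values).average().getAsDouble();
    }

    static int product(int... values) {
        // General reduction: start with the identity 1 and multiply element by element.
        return IntStream.of(values).reduce(1, (a, b) -> a * b);
    }

    public static void main(String[] args) {
        System.out.println(sum(2, 3, 4));     // 9
        System.out.println(max(2, 3, 4));     // 4
        System.out.println(average(2, 3, 4)); // 3.0
        System.out.println(product(2, 3, 4)); // 24
    }
}
```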

Lazy Love

All intermediate operations are “lazy” because they postpone computations until they are needed. As shown in the first example, when the elements are taken from the array, each one is passed to the next processing step in order. If a filter removes an element from the stream, it is gone and doesn’t need to be considered in a later step. The data therefore doesn’t physically exist multiple times, for example, in a separate data structure holding all elements after each step.

In contrast to intermediate operations, terminal operations must produce a result: These operations are “eager.” Basically, everything is deferred until a value is needed, that is, until a terminal operation actually asks for the result.
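Laziness can be observed by recording when each step actually runs. The following sketch, with the hypothetical class name LazyPipeline, writes a log entry in every step of a sequential stream; nothing is logged by filter or map until the terminal forEach(...) starts pulling elements.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Stream;

class LazyPipeline {
    static List<String> trace() {
        List<String> log = new ArrayList<>();

        // Building the pipeline runs neither the filter nor the map function.
        Stream<String> pipeline = Stream.of("a", "", "b")
            .filter(s -> { log.add("filter '" + s + "'"); return !s.isEmpty(); })
            .map(s -> { log.add("map '" + s + "'"); return s.toUpperCase(); });
        log.add("pipeline built");

        // Only the terminal operation triggers the processing, element by element.
        pipeline.forEach(s -> log.add("forEach '" + s + "'"));
        return log;
    }

    public static void main(String[] args) {
        trace().forEach(System.out::println);
        // pipeline built
        // filter 'a'
        // map 'a'
        // forEach 'A'
        // filter ''
        // filter 'b'
        // map 'b'
        // forEach 'B'
    }
}
```

The empty string is seen by the filter only; because it is removed there, the map step never processes it.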

State: Yes or No?

Intermediate operations may or may not have state. A filter operation, for example, has no state because, to accomplish its task, it must look only at the current element, not at preceding ones. A sort operation, on the other hand, has state: It needs all elements to be buffered because knowing only the current element isn’t sufficient for sorting; knowledge of all the other elements is needed too.
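The effect of a stateful operation can be made visible in a small sketch. In the following example (the class name StatefulSort is hypothetical), peek(...) logs each element before it reaches sorted(); in this sequential stream, all elements arrive at the sort step before the first one is passed on.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Stream;

class StatefulSort {
    static List<String> trace() {
        List<String> log = new ArrayList<>();
        Stream.of(3, 1, 2)
              .peek(n -> log.add("before sort: " + n))     // stateless: sees one element at a time
              .sorted()                                    // stateful: buffers all elements first
              .forEach(n -> log.add("after sort: " + n));
        return log;
    }

    public static void main(String[] args) {
        trace().forEach(System.out::println);
        // before sort: 3
        // before sort: 1
        // before sort: 2
        // after sort: 1
        // after sort: 2
        // after sort: 3
    }
}
```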


Editor’s note: This post has been adapted from a section of the book Java: The Comprehensive Guide by Christian Ullenboom.