Spring Batch - Partitioning

Metehan Alan
Feb 21, 2021


In this article, we will look at Spring Batch partitioning, which uses multiple threads to process ranges of a data set in a Spring Boot application. I have implemented several batch jobs in my professional career and used partitioning in them, so I am glad to share my experience here. I hope you enjoy it!

What Is Spring Batch Framework?

You probably already know the Spring Batch framework if you want to learn about partitioning in it. Nevertheless, in brief, Spring Batch is a powerful framework for developing robust batch applications. It provides reusable functions that are essential for processing large volumes of records, including logging/tracing, transaction management, job processing statistics, job restart, skip, and resource management.

Spring Batch Architecture

Normally, when you work with a single thread, the architecture of the batch looks like the figure above. Each record in your data set passes through the reading, processing, and writing steps on one thread, one after another, since Spring Batch is single-threaded by default.
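To make the single-threaded flow concrete, here is a minimal, framework-free sketch in plain Java. It is not Spring Batch code: the class name `SingleThreadedStep`, the method `run`, and the `chunkSize` parameter are hypothetical names chosen for illustration. It only mimics the idea that one thread reads, processes, and writes every record in order, with writes grouped into chunks.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Function;

/** Framework-free sketch of a single-threaded read-process-write loop (hypothetical names). */
final class SingleThreadedStep {

    /**
     * Reads items one by one, processes each of them, and hands the results
     * to the writer in chunks of at most chunkSize items.
     * Everything runs on the calling thread. Returns the number of items written.
     */
    static <I, O> int run(Iterator<I> reader,
                          Function<I, O> processor,
                          Consumer<List<O>> writer,
                          int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        int written = 0;
        List<O> chunk = new ArrayList<>(chunkSize);
        while (reader.hasNext()) {
            chunk.add(processor.apply(reader.next()));   // read + process
            if (chunk.size() == chunkSize) {
                writer.accept(new ArrayList<>(chunk));   // write a full chunk
                written += chunk.size();
                chunk.clear();
            }
        }
        if (!chunk.isEmpty()) {
            writer.accept(new ArrayList<>(chunk));       // write the last, partial chunk
            written += chunk.size();
        }
        return written;
    }
}
```

Because a single thread walks through the whole data set, the total run time grows with the number of records, which is the limitation partitioning addresses.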

Most batch processing problems can be solved with a single thread. In a few more complex scenarios, however, single-threaded processing takes too long and parallel processing is needed, which can be achieved with multi-threaded models. Partitioning is one of the methods we can use at this point.

Partitioning in Spring Batch

Spring Batch Partitioning Example

Partitioning uses multiple threads, each of which processes one range of the data set. The ranges can be defined programmatically. How many threads to create for the partitions depends on the use case.
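As an illustration of defining ranges programmatically, the sketch below splits an inclusive ID range into contiguous sub-ranges. It is plain Java under stated assumptions: the class `RangePartitions`, the method `split`, and the idea of partitioning by a numeric ID are hypothetical. In Spring Batch itself this logic would typically live in a `Partitioner`, whose `partition(int gridSize)` method returns a `Map<String, ExecutionContext>`; here a `long[]` of `{start, end}` stands in for the `ExecutionContext`.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Plain-Java sketch of range partitioning (hypothetical names, no Spring types). */
final class RangePartitions {

    /**
     * Splits the inclusive range [minId, maxId] into at most gridSize
     * contiguous, non-overlapping ranges of roughly equal size.
     * Each value is a two-element array {start, end}, both inclusive.
     */
    static Map<String, long[]> split(long minId, long maxId, int gridSize) {
        if (gridSize <= 0 || maxId < minId) {
            throw new IllegalArgumentException("invalid range or grid size");
        }
        long total = maxId - minId + 1;
        long size = (total + gridSize - 1) / gridSize; // ceiling division
        Map<String, long[]> partitions = new LinkedHashMap<>();
        long start = minId;
        for (int i = 0; start <= maxId; i++) {
            long end = Math.min(start + size - 1, maxId);
            partitions.put("partition" + i, new long[] {start, end});
            start = end + 1;
        }
        return partitions;
    }
}
```

For example, `RangePartitions.split(1, 10, 3)` yields the ranges 1 to 4, 5 to 8, and 9 to 10, so each worker thread would receive its own slice of the IDs.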

Partitioning is really useful when we have a huge amount of data, such as millions of records. If you are not working with that much data, a single thread is enough; otherwise, we can’t rely on a single thread.

Spring Batch Partitioning Example

The dependency required for this project is Spring Batch.

I implemented the whole example in a single job configuration class, where we create the beans needed to run the job. Normally, you would split it into separate classes such as Partitioner, Reader, Processor, and Writer.

First of all, we define a job that starts with the MasterStep. The partitioner and the slave step are both registered on the MasterStep. To keep the example simple, I reduced steps such as Processor and Writer to a single slave step.
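The structure described above could look roughly like the following configuration. This is a sketch, not the original code of this article: it assumes Spring Batch 4.x (where `JobBuilderFactory` and `StepBuilderFactory` are available), and the class name `PartitionJobConfig`, the bean names, the `GRID_SIZE` value, and the `partitionIndex` key are all hypothetical. The slave step is reduced to a tasklet that only prints which partition it handles.

```java
import java.util.HashMap;
import java.util.Map;

import org.springframework.batch.core.Job;
import org.springframework.batch.core.Step;
import org.springframework.batch.core.configuration.annotation.EnableBatchProcessing;
import org.springframework.batch.core.configuration.annotation.JobBuilderFactory;
import org.springframework.batch.core.configuration.annotation.StepBuilderFactory;
import org.springframework.batch.core.partition.support.Partitioner;
import org.springframework.batch.item.ExecutionContext;
import org.springframework.batch.repeat.RepeatStatus;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.core.task.SimpleAsyncTaskExecutor;

@Configuration
@EnableBatchProcessing
public class PartitionJobConfig {

    // Hypothetical number of partitions; choose it based on your use case.
    private static final int GRID_SIZE = 4;

    private final JobBuilderFactory jobs;
    private final StepBuilderFactory steps;

    public PartitionJobConfig(JobBuilderFactory jobs, StepBuilderFactory steps) {
        this.jobs = jobs;
        this.steps = steps;
    }

    @Bean
    public Job partitionJob() {
        // The job consists of the master step only.
        return jobs.get("partitionJob").start(masterStep()).build();
    }

    @Bean
    public Step masterStep() {
        // The master step splits the work and runs the slave step once per partition.
        return steps.get("masterStep")
                .partitioner("slaveStep", partitioner())
                .step(slaveStep())
                .gridSize(GRID_SIZE)
                .taskExecutor(new SimpleAsyncTaskExecutor())
                .build();
    }

    @Bean
    public Partitioner partitioner() {
        // One ExecutionContext per partition; a real partitioner would store a data range here.
        return gridSize -> {
            Map<String, ExecutionContext> partitions = new HashMap<>();
            for (int i = 0; i < gridSize; i++) {
                ExecutionContext context = new ExecutionContext();
                context.putInt("partitionIndex", i);
                partitions.put("partition" + i, context);
            }
            return partitions;
        };
    }

    @Bean
    public Step slaveStep() {
        // Stand-in for the real reader/processor/writer logic.
        return steps.get("slaveStep")
                .tasklet((contribution, chunkContext) -> {
                    Object index = chunkContext.getStepContext()
                            .getStepExecutionContext().get("partitionIndex");
                    System.out.println(Thread.currentThread().getName()
                            + " handles partition " + index);
                    return RepeatStatus.FINISHED;
                })
                .build();
    }
}
```

In a real job, the tasklet would be replaced by a chunk-oriented step whose reader uses the values stored in the partition's `ExecutionContext` to select its slice of the data.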

We use SimpleAsyncTaskExecutor, which is the simplest multi-threaded implementation of the TaskExecutor interface. It starts a new thread for each task and does not reuse threads.
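To show what "one new thread per task" means in practice, here is a small plain-Java analogy. It does not use Spring's `SimpleAsyncTaskExecutor`; the class `ThreadPerTaskDemo`, the `THREAD_PER_TASK` executor, and the `sumRanges` method are hypothetical names, and summing IDs is only a placeholder for real partition work.

```java
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executor;
import java.util.concurrent.atomic.AtomicLong;

/** Plain-Java analogy of a thread-per-task executor (hypothetical names). */
final class ThreadPerTaskDemo {

    /** Starts a brand-new thread for every task; nothing is pooled or reused. */
    static final Executor THREAD_PER_TASK = task -> new Thread(task).start();

    /**
     * Sums every inclusive {start, end} range on its own thread,
     * waits for all threads to finish, and returns the grand total.
     */
    static long sumRanges(List<long[]> ranges) {
        AtomicLong total = new AtomicLong();
        CountDownLatch done = new CountDownLatch(ranges.size());
        for (long[] range : ranges) {
            THREAD_PER_TASK.execute(() -> {
                try {
                    long sum = 0;
                    for (long id = range[0]; id <= range[1]; id++) {
                        sum += id;
                    }
                    total.addAndGet(sum);
                } finally {
                    done.countDown(); // signal completion even if the work fails
                }
            });
        }
        try {
            done.await(); // block until every partition has been processed
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for partitions", e);
        }
        return total.get();
    }
}
```

Since no threads are reused or capped in this style of executor, the number of partitions directly determines the number of threads, so a pooled executor may be a better fit when there are many partitions.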


Written by Metehan Alan

Backend Engineer at FREE NOW
