Spring Batch – Chunk based Steps – Basics

Till now we have been creating steps using Tasklets. Let us look at another way of doing it, using chunks. A chunk based step is item based, so within such a step we expect to process items individually. This type of step has 3 main components:

  1. Item Reader : Responsible for all the input of the step.
  2. Item Processor : This one is optional. It is used when a transformation, validation or any other additional logic needs to be applied to each item.
  3. Item Writer : Provides the output of the step.

Let's build a chunk step with just a Reader and a Writer, starting with pom.xml


<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <groupId>com.mynotes.spring.batch</groupId>
    <artifactId>batch-chunk-basics</artifactId>
    <version>0.0.1-SNAPSHOT</version>
    <packaging>jar</packaging>

    <name>batch-basics</name>
    <description>Demo project for Spring batch-chunk-basics</description>

    <parent>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-parent</artifactId>
        <version>1.4.3.RELEASE</version>
        <relativePath /> <!-- lookup parent from repository -->
    </parent>

    <properties>
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
        <project.reporting.outputEncoding>UTF-8</project.reporting.outputEncoding>
        <java.version>1.8</java.version>
    </properties>

    <dependencies>
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-batch</artifactId>
        </dependency>
        <dependency>
            <groupId>mysql</groupId>
            <artifactId>mysql-connector-java</artifactId>
        </dependency>
    </dependencies>
    <build>
        <plugins>
            <plugin>
                <groupId>org.springframework.boot</groupId>
                <artifactId>spring-boot-maven-plugin</artifactId>
            </plugin>
        </plugins>
    </build>
</project>

application.properties


spring.datasource.url=jdbc:mysql://localhost:3306/spring_batch?autoReconnect=true&useSSL=false
spring.datasource.username=root
spring.datasource.password=admin

Launcher class Application.java


package com.mynotes.spring.batch;

import org.springframework.batch.core.configuration.annotation.EnableBatchProcessing;
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;

@SpringBootApplication
@EnableBatchProcessing
public class Application {

    public static void main(String[] args) {
        SpringApplication.run(Application.class, args);
    }
}

Let's write our step reader, MyStepReader.java


package com.mynotes.spring.batch;

import java.util.Iterator;
import java.util.List;
import org.springframework.batch.item.ItemReader;

public class MyStepReader implements ItemReader<String> {

    private final Iterator<String> dataList;

    public MyStepReader(List<String> dataList) {
        this.dataList = dataList.iterator();
    }

    @Override
    public String read() throws Exception {
        if (this.dataList.hasNext()) {
            return this.dataList.next();
        } else {
            return null;
        }
    }
}

Above we implemented the ItemReader interface. It has a single method, read(), which returns one item: the individual unit of processing. For example, if we have to read 1000 files, one file is one unit. Here we have used a simple list of Strings. Whatever type you return from read() is passed on to the ItemProcessor (if you have one) or to the ItemWriter. The read() method is called over and over within the context of a chunk until it returns null. Returning null tells Spring Batch that the input has been exhausted. If you never return null when you are done, the step will keep on going.
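To make the "call read() until it returns null" contract concrete, here is a minimal plain-Java sketch. It does not use Spring Batch at all: SimpleReader and drain are hypothetical names made up for this illustration, and the real framework does considerably more (transactions, restart, and so on).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Plain-Java illustration only; not Spring Batch's implementation.
final class ReadUntilNullSketch {

    // Hypothetical stand-in for a single-method reader contract.
    interface SimpleReader<T> {
        T read();
    }

    // Calls read() repeatedly and stops at the first null,
    // which is the signal that the input is exhausted.
    static <T> List<T> drain(SimpleReader<T> reader) {
        List<T> items = new ArrayList<>();
        T item;
        while ((item = reader.read()) != null) {
            items.add(item);
        }
        return items;
    }

    public static void main(String[] args) {
        Iterator<String> data = Arrays.asList("aaa", "bbb", "ccc").iterator();
        SimpleReader<String> reader = () -> data.hasNext() ? data.next() : null;
        System.out.println(drain(reader)); // prints [aaa, bbb, ccc]
    }
}
```

If the lambda never returned null, the while loop in drain would never end, which is the same situation as a reader that forgets to signal the end of its input.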

Let's write our writer, MyStepWriter.java


package com.mynotes.spring.batch;

import java.util.List;
import org.springframework.batch.item.ItemWriter;

public class MyStepWriter implements ItemWriter<String> {

    @Override
    public void write(List<? extends String> items) throws Exception {
        System.out.println("Writer chunk size: " + items.size());

        for (String item : items) {
            System.out.println("Writer::::::" + item);
        }
    }
}

Above we implemented the ItemWriter interface, which has a single method, write(), that takes a list of items to be written. You may be wondering why the ItemReader returns a single item while the ItemWriter takes a list. This is because, while the items are read individually, all the items within a chunk are written at once. So if the chunk size is 10, read() gets called 10 times to fill up the chunk, and then this list is passed to the writer. This can be used for optimisation, for example by writing the whole chunk in a single batch operation.
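The read-one-at-a-time, write-a-list-at-once behaviour can be pictured with a small plain-Java loop. This is only a sketch of the idea under simplifying assumptions (no transactions, no error handling); runStep is a hypothetical helper, and the reader and writer are modelled with java.util.function types rather than the Spring Batch interfaces.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Supplier;

// Plain-Java illustration only; not Spring Batch's implementation.
final class ChunkLoopSketch {

    // Reads items one at a time, buffers up to chunkSize of them,
    // then hands the whole buffer to the writer in a single call.
    // Returns the number of chunks that were written.
    static <T> int runStep(Supplier<T> reader, Consumer<List<T>> writer, int chunkSize) {
        int chunksWritten = 0;
        boolean exhausted = false;
        while (!exhausted) {
            List<T> chunk = new ArrayList<>();
            while (chunk.size() < chunkSize) {
                T item = reader.get();
                if (item == null) { // null means: no more input
                    exhausted = true;
                    break;
                }
                chunk.add(item);
            }
            if (!chunk.isEmpty()) { // a partial last chunk is still written
                writer.accept(chunk);
                chunksWritten++;
            }
        }
        return chunksWritten;
    }

    public static void main(String[] args) {
        Iterator<String> data =
                Arrays.asList("aaa", "bbb", "ccc", "ddd", "eee", "fff", "ggg").iterator();
        Supplier<String> reader = () -> data.hasNext() ? data.next() : null;
        Consumer<List<String>> writer =
                chunk -> System.out.println("Writer chunk size: " + chunk.size() + " " + chunk);
        System.out.println("Chunks written: " + runStep(reader, writer, 3));
    }
}
```

With seven items and a chunk size of 3, the writer is called three times, with 3, 3 and 1 items: the last chunk is simply whatever was left when the reader returned null.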

Now let's see our job in JobConfiguration.java


package com.mynotes.spring.batch;

import java.util.Arrays;
import java.util.List;

import org.springframework.batch.core.Job;
import org.springframework.batch.core.Step;
import org.springframework.batch.core.configuration.annotation.JobBuilderFactory;
import org.springframework.batch.core.configuration.annotation.StepBuilderFactory;
import org.springframework.batch.core.launch.support.RunIdIncrementer;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class JobConfiguration {

    @Autowired
    private JobBuilderFactory jobBuilderFactory;

    @Autowired
    private StepBuilderFactory stepBuilderFactory;

    @Bean
    public MyStepReader myReader() {
        List<String> dataList = Arrays.asList("aaa", "bbb", "ccc", "ddd", "eee", "fff");
        return new MyStepReader(dataList);
    }

    @Bean
    public MyStepWriter myWriter() {
        return new MyStepWriter();
    }

    @Bean
    public Step step1() {
        return stepBuilderFactory.get("step1")
                .<String, String>chunk(3)
                .reader(myReader())
                .writer(myWriter())
                .build();
    }

    @Bean
    public Job transitionJob() {
        return jobBuilderFactory.get("chunkBasic1")
                .incrementer(new RunIdIncrementer())
                .start(step1())
                .build();
    }
}

Above we first initialized our reader in myReader(); myWriter() is quite straightforward. Then we created step1. .<String,String>chunk(3) tells Spring Batch that our chunk size is 3. The generics here are the type returned by the Reader and the type accepted by the Writer. These can actually be different if you are using a processor that converts one into the other. We then wired in our Reader and Writer, and created our job using step1. Let's run it:
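To illustrate how the two type parameters can differ once a processor sits between reader and writer, here is a plain-Java sketch. It is not Spring Batch code: runStep is a hypothetical helper, the processor is modelled as a java.util.function.Function, and the "writer" just records each chunk. In this sketch the input type is String and the output type is Integer, which would correspond to a step declared with <String, Integer> instead of <String, String>.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.Function;

// Plain-Java illustration only; not Spring Batch's implementation.
final class ProcessorTypesSketch {

    // I is the type the reader produces, O is the type the writer receives.
    static <I, O> List<List<O>> runStep(Iterator<I> reader, Function<I, O> processor, int chunkSize) {
        List<List<O>> writtenChunks = new ArrayList<>();
        while (reader.hasNext()) {
            // Read phase: collect up to chunkSize items of the input type.
            List<I> inputs = new ArrayList<>();
            while (inputs.size() < chunkSize && reader.hasNext()) {
                inputs.add(reader.next());
            }
            // Process phase: convert each item to the output type.
            List<O> outputs = new ArrayList<>();
            for (I item : inputs) {
                outputs.add(processor.apply(item));
            }
            // Write phase: the stand-in writer records the whole chunk at once.
            writtenChunks.add(outputs);
        }
        return writtenChunks;
    }

    public static void main(String[] args) {
        List<String> data = Arrays.asList("a", "bb", "ccc", "dddd");
        // The processor turns each String into its length (an Integer).
        System.out.println(runStep(data.iterator(), String::length, 3)); // prints [[1, 2, 3], [4]]
    }
}
```

The reader side only ever sees Strings and the writer side only ever sees Integers; the processor is the single place where the conversion happens.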

[Screenshot: console output of the job run (spring-batch-chunk-basics1)]

As you can see, each time a chunk fills up it is passed to the write() method, which prints the chunk size and then each item.

Spring Batch provides many out-of-the-box ItemReaders and ItemWriters. You can use them according to your needs.
