Spring Batch : Basics

Most enterprises rely heavily on batch processing to perform business operations. These operations involve automated, complex processing of large volumes of information that is most efficiently handled without user interaction. They typically include time-based events (e.g. month-end calculations, notices or correspondence), periodic application of complex business rules processed repetitively across very large data sets (e.g. insurance benefit determination or rate adjustments), or the integration of information received from internal and external systems that typically requires formatting, validation and processing in a transactional manner into the system of record.

Spring Batch is a lightweight, comprehensive batch framework designed to enable the development of robust batch applications vital for the daily operations of enterprise systems. Spring Batch is not a scheduling framework. There are many good enterprise schedulers available in both the commercial and open source spaces, such as Quartz, Tivoli, Control-M, etc. Spring Batch is intended to work in conjunction with a scheduler, not to replace one.

Let's look at some of the terminology used in the Spring Batch framework:

JOB : A large unit of work, composed of one or more steps.

STEP : A step represents an independent piece of processing that makes up a job. A job can have many steps. There are 2 different types of step:

  • Tasklet : A single interface with a single method, ‘execute’. You can write whatever code you want within it, and Spring will run it every time the Job is run. You can FTP a file, send emails, etc. There are some out-of-the-box Tasklets that you can use to speed up your development, like the system command tasklet, which takes a command-line string for you to run.
  • Chunk : This is the other type of step. A chunk-based step is item based, so we expect to process items individually. Within this type of step, there are 3 main components :
    1) Item Reader : Responsible for all the input of the step.
    2) Item Processor : This one is optional. It is used when additional transformation, validation or other logic needs to be applied to each item.
    3) Item Writer : Provides the output of the step.
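
The read-process-write cycle described above can be sketched in plain Java. This is a simplified model and not Spring Batch's implementation: the class `ChunkLoopSketch`, its `run` method and the chunk size of 2 are illustrative assumptions. It reads items one at a time, optionally transforms each, and hands them to the writer one chunk at a time. A processor that returns null drops the item, which mirrors how an item processor can filter.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Function;

// Simplified, dependency-free model of a chunk-based step (hypothetical names).
class ChunkLoopSketch {

    // Reads from 'reader' until it is exhausted, passing items to 'writer' in chunks.
    // Returns the number of chunks written.
    static <I, O> int run(Iterator<I> reader,
                          Function<I, O> processor,
                          Consumer<List<O>> writer,
                          int chunkSize) {
        int chunks = 0;
        List<O> chunk = new ArrayList<>();
        while (reader.hasNext()) {
            O processed = processor.apply(reader.next()); // process one item
            if (processed != null) {                      // null means "filter this item out"
                chunk.add(processed);
            }
            if (chunk.size() == chunkSize) {
                writer.accept(new ArrayList<>(chunk));    // write the whole chunk at once
                chunk.clear();
                chunks++;
            }
        }
        if (!chunk.isEmpty()) {                           // flush the final, partial chunk
            writer.accept(new ArrayList<>(chunk));
            chunks++;
        }
        return chunks;
    }

    public static void main(String[] args) {
        List<String> written = new ArrayList<>();
        int chunks = ChunkLoopSketch.<String, String>run(
                Arrays.asList("a", "b", "c").iterator(),
                s -> s.toUpperCase(),
                items -> written.addAll(items),
                2);
        System.out.println(chunks + " chunks, items = " + written);
    }
}
```

In a real chunk-based step the reader, processor and writer would be Spring Batch components wired through the step builder; the loop above only illustrates the order in which they cooperate.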

The following figure illustrates a simple batch job:

simplebatchjob

The following figure illustrates a batch job that has two steps:

batchjobwithmultiplesteps
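
One rough way to picture a multi-step job is as an ordered list of steps run one after another. The sketch below is a plain-Java model with hypothetical names (`MiniJob`, `MiniStep`); it assumes the simple sequential case, where a failing step stops the job and later steps do not run. With the real API, the equivalent wiring would chain steps on the job builder with `.start(...)` followed by `.next(...)`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a step: a name plus some work that may throw.
class MiniStep {
    final String name;
    final Runnable work;

    MiniStep(String name, Runnable work) {
        this.name = name;
        this.work = work;
    }
}

// Hypothetical stand-in for a job: runs its steps in order.
class MiniJob {
    private final List<MiniStep> steps;
    final List<String> completedSteps = new ArrayList<>();

    MiniJob(MiniStep... steps) {
        this.steps = Arrays.asList(steps);
    }

    // Returns "COMPLETED" if every step finished, otherwise "FAILED".
    String run() {
        for (MiniStep step : steps) {
            try {
                step.work.run();
                completedSteps.add(step.name);
            } catch (RuntimeException e) {
                return "FAILED"; // remaining steps are not executed
            }
        }
        return "COMPLETED";
    }

    public static void main(String[] args) {
        MiniJob job = new MiniJob(
                new MiniStep("step1", () -> System.out.println("reading input")),
                new MiniStep("step2", () -> System.out.println("writing report")));
        System.out.println(job.run() + " " + job.completedSteps);
    }
}
```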

Let's see this in action. We will create a Spring Boot application with the batch starter. pom.xml:

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <groupId>com.mynotes.spring.batch</groupId>
    <artifactId>batch-basics</artifactId>
    <version>0.0.1-SNAPSHOT</version>
    <packaging>jar</packaging>

    <name>batch-basics</name>
    <description>Demo project for Spring Boot</description>

    <parent>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-parent</artifactId>
        <version>1.4.3.RELEASE</version>
        <relativePath /> <!-- lookup parent from repository -->
    </parent>

    <properties>
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
        <project.reporting.outputEncoding>UTF-8</project.reporting.outputEncoding>
        <java.version>1.8</java.version>
    </properties>

    <dependencies>
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-batch</artifactId>
        </dependency>
        <dependency>
            <groupId>mysql</groupId>
            <artifactId>mysql-connector-java</artifactId>
        </dependency>
    </dependencies>

    <build>
        <plugins>
            <plugin>
                <groupId>org.springframework.boot</groupId>
                <artifactId>spring-boot-maven-plugin</artifactId>
            </plugin>
        </plugins>
    </build>

</project>

Spring Batch saves job-related details in a database. I will cover that later; for now, just give it a datasource. I am using MySQL. application.properties:


spring.datasource.url=jdbc:mysql://localhost:3306/spring_batch?autoReconnect=true&useSSL=false
spring.datasource.username=root
spring.datasource.password=admin

spring.jpa.hibernate.ddl-auto=validate
logging.level=DEBUG
spring.jpa.show-sql=false

Next comes our launcher class, BatchBasicsApplication.java. We just need to add the @EnableBatchProcessing annotation to it to enable batch processing.


package com.mynotes.spring.batch;

import org.springframework.batch.core.configuration.annotation.EnableBatchProcessing;
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;

@SpringBootApplication
@EnableBatchProcessing
public class BatchBasicsApplication {

    public static void main(String[] args) {
        SpringApplication.run(BatchBasicsApplication.class, args);
    }
}

Let's write our actual job in a separate class, JobConfiguration.java:


package com.mynotes.spring.batch;

import org.springframework.batch.core.Job;
import org.springframework.batch.core.Step;
import org.springframework.batch.core.StepContribution;
import org.springframework.batch.core.configuration.annotation.JobBuilderFactory;
import org.springframework.batch.core.configuration.annotation.StepBuilderFactory;
import org.springframework.batch.core.scope.context.ChunkContext;
import org.springframework.batch.core.step.tasklet.Tasklet;
import org.springframework.batch.repeat.RepeatStatus;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class JobConfiguration {

    @Autowired
    private JobBuilderFactory jobBuilderFactory;

    @Autowired
    private StepBuilderFactory stepBuilderFactory;

    // A tasklet step: execute() prints a message and signals that it is done.
    @Bean
    public Step step1() {
        return stepBuilderFactory.get("step1")
                .tasklet(new Tasklet() {
                    @Override
                    public RepeatStatus execute(StepContribution contribution, ChunkContext chunkContext) throws Exception {
                        System.out.println("FIRST JOB - STEP 1");
                        return RepeatStatus.FINISHED;
                    }
                }).build();
    }

    // The job consists of the single step defined above.
    @Bean
    public Job helloWorldJob() {
        return jobBuilderFactory.get("MyFirstJob")
                .start(step1())
                .build();
    }

}

The @Configuration class is picked up by Spring's component scan. We autowire 2 factories: a JobBuilderFactory and a StepBuilderFactory. We then create step1 using stepBuilderFactory. For simplicity we are using a tasklet. Since Tasklet has a single method, you can also write it as a lambda expression instead of an anonymous class. We then create a job using jobBuilderFactory and add step1 to it. Output:
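
The lambda form mentioned above works because `Tasklet` declares a single abstract method. To keep the example runnable without Spring on the classpath, the sketch below uses a stand-in interface, `MiniTasklet`, which is hypothetical and not part of Spring Batch (its parameter and return types are simplified to strings). It only shows that the anonymous class and the lambda are interchangeable. With the real API, a lambda of the shape `(contribution, chunkContext) -> { ...; return RepeatStatus.FINISHED; }` would be passed to `.tasklet(...)` in the same way.

```java
// Hypothetical stand-in for Spring Batch's Tasklet: one abstract method,
// so it can be implemented with a lambda. Types are simplified for the sketch.
@FunctionalInterface
interface MiniTasklet {
    String execute(String stepName);
}

class TaskletLambdaSketch {

    // Anonymous inner class, the style used in JobConfiguration.
    static MiniTasklet anonymousStyle() {
        return new MiniTasklet() {
            @Override
            public String execute(String stepName) {
                System.out.println("FIRST JOB - " + stepName);
                return "FINISHED";
            }
        };
    }

    // The same behavior written as a Java 8 lambda.
    static MiniTasklet lambdaStyle() {
        return stepName -> {
            System.out.println("FIRST JOB - " + stepName);
            return "FINISHED";
        };
    }

    public static void main(String[] args) {
        System.out.println(anonymousStyle().execute("STEP 1"));
        System.out.println(lambdaStyle().execute("STEP 1"));
    }
}
```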

spring-batch-basics1

The job repository keeps track of job executions, their status, etc. It holds information such as whether a job succeeded or failed, when the failure occurred, and from where the job has to restart. If you don't provide one, Spring Batch will use a map-based (in-memory) one, but it is highly recommended that you use a database.

batch-arch

Let's discuss a few things before we look into the tables. As we know, a Job is a flow of steps that you progress through.

spring-batch-basics2

Let's say we have a Job called ‘logsDump’ that runs every night. For each logical run of the Job we create a JobInstance, for example the logsDump job for 27/01/2017, so you would have one JobInstance for each day. Each time we launch the Job we create a JobExecution. A JobInstance can have many JobExecutions. Ideally there would be only one, but let's say a JobInstance was started and failed halfway through the job. We fix the data and rerun the Job. For that we use the same JobInstance and create a second JobExecution for the same day. We continue this until a JobExecution completes. Once it has completed, the JobInstance cannot be run again. Following is a glance at the tables created. We will explore them more in later posts:
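
The relationship between a JobInstance and its JobExecutions can be modeled in a few lines of plain Java. Everything in this sketch is hypothetical (`JobInstanceSketch`, the `jobName|runDate` key, the use of `IllegalStateException`); it is not how Spring Batch stores its metadata, only an illustration of the rule that a failed instance may be retried while a completed one may not.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Dependency-free model of the JobInstance / JobExecution relationship (hypothetical).
class JobInstanceSketch {

    // Key: job name + identifying parameter, e.g. "logsDump|2017-01-27".
    // Value: the status of every execution attempted for that instance.
    private final Map<String, List<String>> executionsByInstance = new HashMap<>();

    // Launches one execution for the given logical run and records its status.
    String launch(String jobName, String runDate, boolean succeeds) {
        String instanceKey = jobName + "|" + runDate;
        List<String> executions =
                executionsByInstance.computeIfAbsent(instanceKey, k -> new ArrayList<>());
        if (executions.contains("COMPLETED")) {
            // A completed instance is never run again.
            throw new IllegalStateException("Instance already completed: " + instanceKey);
        }
        String status = succeeds ? "COMPLETED" : "FAILED";
        executions.add(status);
        return status;
    }

    // Number of executions recorded for one instance.
    int executionCount(String jobName, String runDate) {
        List<String> executions = executionsByInstance.get(jobName + "|" + runDate);
        return executions == null ? 0 : executions.size();
    }

    public static void main(String[] args) {
        JobInstanceSketch repo = new JobInstanceSketch();
        repo.launch("logsDump", "2017-01-27", false); // first attempt fails
        repo.launch("logsDump", "2017-01-27", true);  // rerun after fixing the data
        System.out.println(repo.executionCount("logsDump", "2017-01-27")); // prints 2
    }
}
```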

spring-batch-basics3

OK, our previous ‘MyFirstJob’ ran successfully. Let's try to run it again.

spring-batch-basics3

The job didn't run. Let's look at the BATCH_JOB_EXECUTION table:

spring-batch-basics4

The second execution didn't run since the first was COMPLETED. Had the status been something else, like FAILED, it would have run. Notice that the log says the Job was started without any parameters. The first run used no parameters, so to run the job again you have to send a different parameter. We will get into Job Parameters in a later post, but for now let's take a simple approach using a RunIdIncrementer. Note that for production it's highly recommended to use a Job Parameter, but for now let's keep things simple. So we change our job definition:


// Requires: import org.springframework.batch.core.launch.support.RunIdIncrementer;
@Bean
public Job helloWorldJob() {
    return jobBuilderFactory.get("MyFirstJob")
            .incrementer(new RunIdIncrementer())
            .start(step1())
            .build();
}

It will execute, printing the sysout. Let's see the table:

spring-batch-basics5
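
To see why the incrementer lets the job run again: each launch receives a fresh value for a numeric job parameter, so the parameters differ from the previous run and a new JobInstance is created. The sketch below models that idea in plain Java. `RunIdSketch` and its `next` method are hypothetical, and the `run.id` key is used on the assumption that it matches the incrementer's default parameter name.

```java
import java.util.HashMap;
import java.util.Map;

// Plain-Java model of an incrementing run id (hypothetical, not the Spring Batch class).
class RunIdSketch {

    static final String KEY = "run.id"; // assumed parameter name

    // Returns a copy of the previous parameters with the run id increased by one.
    // When no run id exists yet, numbering starts at 1.
    static Map<String, Long> next(Map<String, Long> previous) {
        Map<String, Long> params = new HashMap<>();
        if (previous != null) {
            params.putAll(previous);
        }
        long current = params.containsKey(KEY) ? params.get(KEY) : 0L;
        params.put(KEY, current + 1);
        return params;
    }

    public static void main(String[] args) {
        Map<String, Long> first = next(null);
        Map<String, Long> second = next(first);
        // Different parameters mean the two launches count as different instances.
        System.out.println(first + " vs " + second);
    }
}
```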
