Sunday, August 10, 2014

Leveraging Spring Batch to Run Your Workflows

In most enterprise applications that involve rapid data transactions and large data volumes, there is almost always a requirement for a batch processing framework to handle the bulk data processing. Most often, this bulk processing job is an EOD, weekly, or monthly job in which data is propagated from one source to another, often tunneled through a data processing engine in the middle. Spring Batch is one of the standout frameworks today that is tailor-made for this kind of use case. It even contains many off-the-shelf components for reading from and writing to various data sources, e.g. files (plain text, CSV, XML) and databases (DB2, SQL), which makes it really easy to develop a working solution very quickly. The result may even be a single bean configuration file containing all the definitions for data reading, tokenization, POJO mapping, data processing, and writing to the end source. I will share some very good Spring Batch references for anyone who is a little new to Spring Batch.
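As a taste of those off-the-shelf components, here is a sketch of how a flat-file reader could be wired up entirely in bean configuration. The bean id, file name, field names, and target class below are illustrative assumptions, not part of any particular application:

```xml
<!-- Illustrative only: reads delimited records from a CSV file and maps each row to a POJO -->
<bean id="tradeFileReader"
      class="org.springframework.batch.item.file.FlatFileItemReader">
  <property name="resource" value="file:trades.csv" />
  <property name="lineMapper">
    <bean class="org.springframework.batch.item.file.mapping.DefaultLineMapper">
      <property name="lineTokenizer">
        <bean class="org.springframework.batch.item.file.transform.DelimitedLineTokenizer">
          <property name="names" value="isin,quantity,price" />
        </bean>
      </property>
      <property name="fieldSetMapper">
        <bean class="org.springframework.batch.item.file.mapping.BeanWrapperFieldSetMapper">
          <property name="targetType" value="com.example.Trade" />
        </bean>
      </property>
    </bean>
  </property>
</bean>
```

Tokenization, POJO mapping, and the data source are all declared in one place, with no custom reading code at all.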

However, in this blog post, I will try to explore Spring Batch further and see how it can really help an enterprise application as a workflow engine. Large enterprise applications rarely have only one use case for bulk data processing. As more applications get integrated, the core framework needs to support many different kinds of bulk data processing, and in fact, until the requirement arrives, you may not know what you really need.

If the aforementioned argument makes sense for your architecture as well, then Spring Batch is even better suited. In my experience with Spring Batch over the last few months, what I realized more and more is how beautifully its core architecture can be separated from its implementations. In most cases, you will be working with the implementations that fit your use cases. However, when you peel them off, what you find is a very extensible and very powerful framework that can help you run many different kinds of business workflows, with whatever capabilities you want it to have.

The key is to write a core component in your framework that embeds Spring Batch and leverages Inversion of Control to externalize the eventual business implementation details from the framework. I will provide a design-level idea for this in a bit.

Whatever core capabilities are required, you need to orchestrate them in your framework implementation.

I encountered the requirement to design a workflow engine using Spring Batch with the following capabilities.

  • Extensible to easily configure new workflows
This is what I meant earlier by externalizing the business implementation. We needed to support accelerated development to process different types of workflows as and when they arrive.
  • Parallel Processing Capability
Once again, Spring Batch has this capability off the shelf with the concept of Multi-Threaded Steps. [1]
All you have to do is specify your way of splitting the work into a set of workers. Using the partitioning concept in Spring Batch, you can define a Master Step which delegates the work to a set of Slave Steps, which complete the task in a multi-threaded manner.


  • Configurable
With Spring Batch, this becomes very easy because everything can be configured using the bean configuration.
  • Recoverability
This is an important aspect of any workflow process. Because the data being processed is bound to contain faulty records, it is essential that you have control over how the workflow processes the data. If something bad happens, you should be able to bail out before completing the workflow and manually invoke it later, after correcting whatever went wrong.

Once again, Spring Batch has an off-the-shelf feature for this called the Job Repository, which has its own data schema and stores the state of each job it runs in a specified data store (e.g. a database). You can make use of this for your recoverability purposes, or, if it feels overcomplicated in your opinion (like mine), you can design your own solution; Spring has enough room in its architecture to plug in yours.
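To make the "design your own" option concrete, here is a minimal sketch of such a solution, assuming a simple status record keyed by workflow name. The class and method names are my own inventions for illustration, not part of Spring Batch:

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Minimal sketch of a hand-rolled recoverability store (illustrative only).
 * Tracks the last known status of each workflow so a failed run can be
 * detected and manually re-invoked after the faulty data is corrected.
 */
public class WorkflowStatusStore {

    public enum Status { RUNNING, COMPLETED, FAILED }

    // In a real system this map would be a database table.
    private final Map<String, Status> statuses = new HashMap<>();

    public void markRunning(String workflow)   { statuses.put(workflow, Status.RUNNING); }
    public void markCompleted(String workflow) { statuses.put(workflow, Status.COMPLETED); }
    public void markFailed(String workflow)    { statuses.put(workflow, Status.FAILED); }

    /** A workflow may be manually re-invoked only if its last run failed. */
    public boolean isRestartable(String workflow) {
        return statuses.get(workflow) == Status.FAILED;
    }
}
```

The point is only that the state machine is small; persisting it and consulting it before launching a job is all the "framework" you need if the full Job Repository is more than you want.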

Here I will get back to the externalization piece, which helps easily configure many workflows through a straightforward interface implementation. As good engineers, we make it a practice to separate the interface from the implementation, and that is exactly what we can do here.

For a Spring Batch job, there are three key extension points to look at.
  1. ItemReader
  2. ItemProcessor
  3. ItemWriter
[Diagram: Externalizing Implementation from Core Architecture]
The above diagram depicts a reference design that externalizes the implementation details out of the core classes and into the business layer using Inversion of Control. To make it more understandable, I will use a few code snippets to show how this can be achieved.

1. Sample Bean Configuration


<!-- Batch job -->
<job id="batchjob" xmlns="http://www.springframework.org/schema/batch">
  <step id="masterStep">
    <partition step="slave" partitioner="lotPartitioner">
      <handler task-executor="taskExecutor" />
    </partition>
  </step>
</job>

<!-- Slave step -->
<step id="slave" xmlns="http://www.springframework.org/schema/batch">
  <tasklet>
    <chunk reader="customItemReader" writer="customItemWriter"
           processor="customItemProcessor" commit-interval="10" />
  </tasklet>
</step>
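The lotPartitioner bean referenced above is the piece you supply: Spring Batch's Partitioner interface asks you to split the work into a map of execution contexts, one per slave. Stripped of the Spring types, the splitting logic is just bucketing, sketched here in plain Java (the class name, partition keys, and round-robin scheme are illustrative assumptions):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Plain-Java sketch of the splitting logic behind a partitioner.
 * Illustrative only: a real lotPartitioner would implement Spring Batch's
 * Partitioner interface and return a Map of ExecutionContext objects.
 */
public class LotSplitter {

    /** Distribute work items round-robin into gridSize partitions. */
    public static Map<String, List<String>> split(List<String> lots, int gridSize) {
        Map<String, List<String>> partitions = new HashMap<>();
        for (int i = 0; i < gridSize; i++) {
            partitions.put("partition" + i, new ArrayList<String>());
        }
        for (int i = 0; i < lots.size(); i++) {
            partitions.get("partition" + (i % gridSize)).add(lots.get(i));
        }
        return partitions;
    }
}
```

Each resulting partition is handed to one slave step, and the taskExecutor runs the slaves concurrently.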

We have set a customized reader, writer, and processor for the slave step. Let's look at one of the three, customItemProcessor, to see how simply we can externalize the implementation using IoC.


2. Reference Implementation of Custom Item Processor


public class CusItemProcessor implements ItemProcessor<BaseItem, BaseItem> {

 @Autowired
 private IWorkflowLifeCycleMethods lifecycleMethods;

 @Override
 public BaseItem process(BaseItem item) throws Exception {
  // Delegate the business-specific processing to the injected implementation.
  return lifecycleMethods.process(item);
 }

}

In this manner, we have abstracted the core classes and made sure that, for each new use case of the workflow engine, we only need to implement the set of methods defined in the IWorkflowLifeCycleMethods interface. Using dependency injection, we can then configure different implementations to meet the different needs of different workflows.
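To round this off, here is a minimal, self-contained sketch of what such an interface and one workflow-specific implementation might look like. The interface shape is inferred from the processor snippet above; BaseItem and the validation rule are purely illustrative assumptions:

```java
/** Illustrative base type carried through the workflow (an assumption, not a Spring Batch type). */
class BaseItem {
    private final String payload;
    private boolean valid = true;

    BaseItem(String payload) { this.payload = payload; }
    String getPayload()      { return payload; }
    boolean isValid()        { return valid; }
    void invalidate()        { valid = false; }
}

/** The externalized lifecycle contract, inferred from the processor snippet. */
interface IWorkflowLifeCycleMethods {
    BaseItem process(BaseItem item) throws Exception;
}

/** One workflow-specific implementation; another workflow would plug in its own. */
class TradeWorkflowLifeCycle implements IWorkflowLifeCycleMethods {
    @Override
    public BaseItem process(BaseItem item) throws Exception {
        // Hypothetical business rule: flag empty records instead of failing the job.
        if (item.getPayload() == null || item.getPayload().isEmpty()) {
            item.invalidate();
        }
        return item;
    }
}
```

Swapping TradeWorkflowLifeCycle for another implementation in the bean configuration is all it takes to stand up a new workflow on the same core classes.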

For further reading, I encourage you to refer to the book "Spring Batch in Action", which has a wealth of information.

Some other useful links:


Cheers.!!
