First batch job on Podcastpedia.org with EasyBatch

Remember the first batch job for Podcastpedia.org, presented in Spring Batch Tutorial with Spring Boot and Java Configuration? There, the job read submitted podcasts from a .csv file and added them to the Podcastpedia.org directory (database). Today I will present how I automated the creation of that input file with the help of Easy Batch. Why Easy Batch? Because, after seeing my initial post, its founder, Mahmoud Ben Hassine, contacted me and asked me to have a look at Easy Batch and give it a try. I did, and I am happy about that. Read on to find out why…
1. Job description
The batch job is fairly simple: it reads database entries containing the submitted podcasts from one table and generates a properly formatted .csv file.
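For illustration, the generated file starts with the header line defined in the job launcher further below, followed by one pre-formatted line per suggested podcast. The data line here is a made-up example, just to show the shape of the output:

FEED_URL; IDENTIFIER_ON_PODCASTPEDIA; CATEGORIES; LANGUAGE; MEDIA_TYPE; UPDATE_FREQUENCY; KEYWORDS; FB_PAGE; TWITTER_PAGE; GPLUS_PAGE; NAME_SUBMITTER; EMAIL_SUBMITTER
http://example.com/feed.xml; example-podcast; Technology; en; audio; weekly; java,batch; ; ; ; Jane Doe; jane@example.com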
2. Project setup
Reading from a database and writing to a flat file requires the following libraries in the classpath, which also bring the transitive dependency easybatch-core:
<dependency>
    <groupId>org.easybatch</groupId>
    <artifactId>easybatch-flatfile</artifactId>
    <version>${easybatch.version}</version>
</dependency>
<dependency>
    <groupId>org.easybatch</groupId>
    <artifactId>easybatch-jdbc</artifactId>
    <version>${easybatch.version}</version>
</dependency>
The current version is 2.2.0.
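The ${easybatch.version} placeholder used above can be defined once in the POM's properties section; a minimal sketch:

<properties>
    <!-- Easy Batch version used throughout the POM -->
    <easybatch.version>2.2.0</easybatch.version>
</properties>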
3. Implementation
3.1. Job launcher
For simplicity I chose to launch the job from a main method:
package org.podcastpedia.batch.jobs.generatefilefromsuggestions;

import java.io.File;
import java.io.FileWriter;
import java.sql.Connection;
import java.sql.DriverManager;
import java.text.DateFormat;
import java.text.SimpleDateFormat;
import java.util.Calendar;
import java.util.Date;

import org.easybatch.core.api.EasyBatchReport;
import org.easybatch.core.impl.EasyBatchEngine;
import org.easybatch.core.impl.EasyBatchEngineBuilder;
import org.easybatch.jdbc.JdbcRecordReader;

public class JobLauncher {

    private static final String OUTPUT_FILE_HEADER = "FEED_URL; IDENTIFIER_ON_PODCASTPEDIA; CATEGORIES; LANGUAGE; MEDIA_TYPE; UPDATE_FREQUENCY; KEYWORDS; FB_PAGE; TWITTER_PAGE; GPLUS_PAGE; NAME_SUBMITTER; EMAIL_SUBMITTER";

    public static void main(String[] args) throws Exception {

        // connect to MySQL database
        Class.forName("com.mysql.jdbc.Driver").newInstance();
        Connection connection = DriverManager.getConnection(System.getProperty("db.url"),
                System.getProperty("db.user"), System.getProperty("db.pwd"));

        FileWriter fileWriter = new FileWriter(getOutputFilePath());
        fileWriter.write(OUTPUT_FILE_HEADER + "\n");

        // Build an easy batch engine
        EasyBatchEngine easyBatchEngine = new EasyBatchEngineBuilder()
            .registerRecordReader(new JdbcRecordReader(connection,
                "SELECT * FROM ui_suggested_podcasts WHERE insertion_date >= STR_TO_DATE('" + args[0] + "', '%Y-%m-%d %H:%i')"))
            .registerRecordMapper(new CustomMapper())
            .registerRecordProcessor(new Processor(fileWriter))
            .build();

        // Run easy batch engine
        EasyBatchReport easyBatchReport = easyBatchEngine.call();

        // close file writer
        fileWriter.close();

        System.out.println(easyBatchReport);
    }

    private static String getOutputFilePath() throws Exception {

        // create, if not existent, a "weeknum" directory in the given "output.directory.base" directory
        Date now = new Date();
        Calendar calendar = Calendar.getInstance();
        calendar.setTime(now);
        int weeknum = calendar.get(Calendar.WEEK_OF_YEAR);

        String targetDirPath = System.getProperty("output.directory.base") + String.valueOf(weeknum);
        File targetDirectory = new File(targetDirPath);
        if (!targetDirectory.exists()) {
            boolean created = targetDirectory.mkdir();
            if (!created) {
                throw new Exception("Target directory could not be created");
            }
        }

        // build the file name based on the current time, to be placed in the "weeknum" directory
        DateFormat dateFormat = new SimpleDateFormat("yyyy-MM-dd HH.mm");
        String outputFileName = "suggestedPodcasts " + dateFormat.format(now) + ".csv";

        String filePath = targetDirPath + "/" + outputFileName;

        return filePath;
    }
}
Let’s have a look at the different components of the JobLauncher class:
3.2. Connect to MySQL
Class.forName("com.mysql.jdbc.Driver").newInstance();
Connection connection = DriverManager.getConnection(System.getProperty("db.url"),
        System.getProperty("db.user"), System.getProperty("db.pwd"));
When using JDBC outside of an application server, the DriverManager class manages the establishment of connections. You have to:
“Specify to the DriverManager which JDBC drivers to try to make Connections with. The easiest way to do this is to use Class.forName() on the class that implements the java.sql.Driver interface. With MySQL Connector/J, the name of this class is com.mysql.jdbc.Driver. With this method, you could use an external configuration file to supply the driver class name and driver parameters to use when connecting to a database.” [2]
Once the MySQL driver has been registered, you can obtain a Connection to the database by calling the DriverManager.getConnection() method with the given MySQL database URL.
Note – make sure you also have the MySQL JDBC connector in your classpath:
<!-- MySQL JDBC connector -->
<dependency>
    <groupId>mysql</groupId>
    <artifactId>mysql-connector-java</artifactId>
    <version>5.1.31</version>
</dependency>
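The database coordinates (db.url, db.user, db.pwd) and the output location (output.directory.base) are read from system properties, and the start date is passed as the first program argument in the %Y-%m-%d %H:%i format expected by the SQL query. A hypothetical invocation could therefore look like this, with all values being placeholders and assuming a classpath that contains the project classes plus all dependencies:

java -Ddb.url=jdbc:mysql://localhost:3306/podcastpedia \
     -Ddb.user=pcm -Ddb.pwd=secret \
     -Doutput.directory.base=/tmp/suggested-podcasts/ \
     -cp "target/classes:target/dependency/*" \
     org.podcastpedia.batch.jobs.generatefilefromsuggestions.JobLauncher "2014-06-01 00:00"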
3.3. Create an Easy Batch engine
Creating an Easy Batch engine is straightforward and can be done through the EasyBatchEngineBuilder API as follows:
// Build an easy batch engine
EasyBatchEngine easyBatchEngine = new EasyBatchEngineBuilder()
    .registerRecordReader(new JdbcRecordReader(connection,
        "SELECT * FROM ui_suggested_podcasts WHERE insertion_date >= STR_TO_DATE('" + args[0] + "', '%Y-%m-%d %H:%i')"))
    .registerRecordMapper(new CustomMapper())
    .registerRecordProcessor(new Processor(fileWriter))
    .build();
This is actually the whole batch configuration for the job. Short and clear:
- register a record reader, in our case a JdbcRecordReader, for which you need to specify the connection created earlier and the SQL string to execute against it
- register a custom record mapper
- register a processor
Note: You don’t need to iterate over the JDBC ResultSet yourself; Easy Batch will do it for you.
3.3.1. Mapping the database records
To map the database record to the domain object I defined a CustomMapper:
public class CustomMapper implements RecordMapper<SuggestedPodcast> {

    @SuppressWarnings("rawtypes")
    @Override
    public SuggestedPodcast mapRecord(Record record) throws Exception {
        JdbcRecord jdbcRecord = (JdbcRecord) record;
        ResultSet resultSet = jdbcRecord.getRawContent();

        SuggestedPodcast response = new SuggestedPodcast();
        response.setMetadataLine(resultSet.getString("metadata_line"));

        return response;
    }
}
For that I had to implement the RecordMapper interface with its single mapRecord() method.
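The SuggestedPodcast domain class itself is not listed in this post. For the way it is used in this job, a minimal sketch only needs the metadataLine property; the real class in the project may well carry more fields:

public class SuggestedPodcast {

    // the pre-formatted CSV line stored in the metadata_line column
    private String metadataLine;

    public String getMetadataLine() {
        return metadataLine;
    }

    public void setMetadataLine(String metadataLine) {
        this.metadataLine = metadataLine;
    }
}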
3.3.2. Processing records
Easy Batch lets you define your batch processing business logic through the RecordProcessor interface. This is where you define what to do for each record. The processor is registered in the line:
.registerRecordProcessor(new Processor(fileWriter))
My custom processor extends AbstractRecordProcessor, an abstract record processor implementation to be extended by clients that do not need to implement RecordProcessor.getEasyBatchResult():
public class Processor extends AbstractRecordProcessor<SuggestedPodcast> {

    private FileWriter fileWriter;

    public Processor(FileWriter fileWriter) {
        this.fileWriter = fileWriter;
    }

    @Override
    public void processRecord(SuggestedPodcast record) throws Exception {
        fileWriter.write(record.getMetadataLine() + "\n");
        fileWriter.flush();
    }
}
The “logic” is very simple: for each record it appends a new line to the end of the output file.
3.4. Execution and reporting
The Easy Batch engine records several metrics during record processing and provides a complete report at the end of execution. This report is an instance of the EasyBatchReport class and contains the following information:
- The batch start and end times
- The batch duration
- The data source name
- The total number of records
- The number of filtered, ignored and rejected records
- The number of records processed with errors
- The number of successfully processed records
- The average record processing time
- The computation result, if any
You obtain an Easy Batch report when running the Easy Batch engine:
// Run easy batch engine
EasyBatchReport easyBatchReport = easyBatchEngine.call();
Check out the Easy Batch user guide for other report formatting options.
Conclusion
For this easy job I had to implement, Easy Batch proved to be a simple, yet powerful batch framework, with good samples and documentation. Before starting my next batch job I will definitely have a look at Easy Batch first, before considering the “mightier” Spring Batch framework. But, to quote the author of Easy Batch, from Easy Batch vs Spring Batch: Feature comparison:
“Choose the right tool for the right job! If your application requires advanced features like retry on failure, remoting or flows, then go for Spring Batch (or an implementation of JSR352). If you don’t need all this advanced stuff, then Easy Batch can be very handy to simplify your batch application development.”
Resources
Source Code – GitHub
- podcastpedia-easybatch – the project presented in this tutorial; please submit a pull request for any improvement proposals
Web
- https://github.com/j-easy/easy-batch
- Tutorials
- User guide – https://github.com/j-easy/easy-batch/wiki
- Tweets ETL tutorial
- Connecting to MySQL Using the JDBC DriverManager Interface