
Parallel Job Execution in Pentaho with Dynamic Configuration


We had to design and build a data warehouse for a multi-tenant architecture. The data model (dimensions and facts) was common, but there were multiple clients, each with its own data metrics and source databases. We had initially developed the ETL (Extract, Transform & Load) jobs for a single client, and these now had to be scaled to meet the new requirement.

To load data for another client using the same ETL job, one can change the source and target database configurations in the jdbc.properties file, but this is not a scalable approach, as the properties file has to be modified every time a job is executed. Moreover, since Pentaho refers to JNDI (Java Naming and Directory Interface) connections by name, one cannot define two JNDI connections with the same name. There is also a good chance that the same job will be executed in parallel for different source and target databases; in such situations, parameters may clash across parallel executions because of the shared kettle.properties file.
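For illustration, a simple-jndi entry in PDI’s jdbc.properties typically looks like the snippet below (the connection name “warehouse”, host, and credentials are made-up values). Because jobs reference the connection by its JNDI name, two clients cannot share one jdbc.properties file with different databases behind the same name; each client needs its own copy of the file:

warehouse/type=javax.sql.DataSource
warehouse/driver=com.mysql.jdbc.Driver
warehouse/url=jdbc:mysql://client1-db-host:3306/warehouse
warehouse/user=etl_user
warehouse/password=etl_password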

An inefficient way to tackle this problem would be to create a separate copy of every job for each client. Imagine a scenario with hundreds of ETL jobs and tens of clients! It would lead to huge duplication of effort and would be operationally ineffective.

We therefore wanted to achieve our objective with minimal rework, using a design that would be maintainable and operationally effective.

The Solution:

Suppose PDI is installed on a server at “/opt/data-integration” and a PDI job is run through the kitchen command below. By default, PDI looks for kettle.properties and repositories.xml under “$KETTLE_HOME/.kettle” and for jdbc.properties under “/opt/data-integration/simple-jndi”.

/opt/data-integration/kitchen.sh -rep test -job load_data_job

To overcome this, one can point PDI to different configuration paths by setting the KETTLE_HOME and KETTLE_JNDI_ROOT environment variables when firing the kitchen command.

For example:

KETTLE_HOME="/data/client1/" KETTLE_JNDI_ROOT="/data/client1/jndi/" /opt/data-integration/kitchen.sh -rep test -job load_data_job

KETTLE_HOME="/data/client2/" KETTLE_JNDI_ROOT="/data/client2/jndi/" /opt/data-integration/kitchen.sh -rep test -job load_data_job

In the above example, we have placed the kettle.properties and jdbc.properties files for the two clients in separate locations.
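As a rough sketch, the per-client layout could look like the following (the directory names are illustrative; kettle.properties and repositories.xml live in the .kettle folder under each KETTLE_HOME, and jdbc.properties lives under each KETTLE_JNDI_ROOT):

/data/client1/.kettle/kettle.properties
/data/client1/.kettle/repositories.xml
/data/client1/jndi/jdbc.properties

/data/client2/.kettle/kettle.properties
/data/client2/.kettle/repositories.xml
/data/client2/jndi/jdbc.properties

Both jdbc.properties files can define the JNDI connection under the same name, each pointing to that client’s own source and target databases, so the jobs themselves remain untouched.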

If you run the first kitchen command, PDI looks for kettle.properties under “/data/client1/.kettle” and for the jdbc.properties file under “/data/client1/jndi/”.

If you run the second kitchen command, PDI looks for kettle.properties under “/data/client2/.kettle” and for the jdbc.properties file under “/data/client2/jndi/”.

Similarly, for each new client one can set up the configuration in a new location and point the kitchen command to the new kettle.properties and jdbc.properties files. This allows existing ETL jobs to be reused, avoids conflicts between parallel executions, and provides the flexibility to scale when required.
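For example, a small wrapper script along the following lines (an illustrative sketch; the client list, log file locations, and job name are assumptions) can launch the same job for several clients in parallel, each run picking up its own configuration:

#!/bin/bash
# Run the same PDI job for several clients in parallel, each with its own
# kettle.properties and jdbc.properties, then wait for all runs to finish.
for client in client1 client2; do
  KETTLE_HOME="/data/${client}/" \
  KETTLE_JNDI_ROOT="/data/${client}/jndi/" \
  /opt/data-integration/kitchen.sh -rep test -job load_data_job \
    > "/data/${client}/load_data_job.log" 2>&1 &
done
wait

Backgrounding each kitchen invocation (the trailing &) lets the client loads run concurrently, while the per-client KETTLE_HOME and KETTLE_JNDI_ROOT values keep their configurations isolated.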

