Data Cloning Through Pentaho Data Integration Clone Step

Splitting rows based on a column value. The input data comprises of ticket booking records defining number of seats booked at event, section and row level. It also contains the starting and last seat number.

 

Objective

It was required to split ticket blocks within event_name + section_name + row_name as follows:

    • convert the record where num_seats > 1 into as many records as num_seats
    • assign values as follow in split records
      • num_seats = 1 for each record
      • seat_num = individual seat within the block
  1. original seat_num + i where “i” is counter from 0 to num_seats – 1
        • last_seat = new value of the seat_num as above
      • population logic of rest of the column remains unchanged

Sample Data Input:

 

Expected Output:

So based on the above screenshots, we need to split the incoming input rows based on the num_seats field .So for first input row where num_seats=4 we need to generate 4 records as per the rules defined above.

Solution:

Pentaho provides a clone row step that can clone objects or rows in the same way as the main row based on a column value.

Refer to the below screenshot for the solution:

 

  1. Table input: This step will load the input data.
  2. Clone row: This step will create the clone objects or rows similar to the main row
      1. Nr clone in field: will specify column value to be used for cloning
      2. Add clone flag to output: will put the flag=N for the original row and Flag=Y for clone rows
      3. Clone num field (seat_index_rownum): will add the index value (0,1,2,..).
  1. Filter rows: Remove the original (non-cloned) row (where clone?=N).

 

  1. Calculator and Select Values: Calculate the seat number and replace the original fields (num_seats, seat_num, last_seat, etc.) with the new values.

 

  1. Table Output: Loading the data into the target table.

Tags :

Let’s create new possibilities with technology