In the last Post I explained how I implemented Surrogate Key using SSIS, for Dimension Table Updates.
I tried to do the Same thing with Kettle and I was not amazed that How easy it was. Kettle gives you so many options to do a task.
There are many ways to implement it. I’ve implemented it using “Add Sequence” and “Modified JavaScript”.
1. Logic is to First get the Last Used Surrogate Value. This can be done in two ways, either we can use a sequence (if the DB supports) or we can use Kettle’s own sequence generator. In this example I’ve used the DB sequence as I want this example to be similar to the SSIS exercise which I did although Latter should be a better option as it’ll have Performance benefits.
2. Next to pass this value to a Kettle Variable using “Set Variable” Step.
3. Save this Transformation which will act as a Configuration Transformation for the Main Transformation.
4. Create a New transformation which will do the Actual Loading.
5. Get the Staging Data by using Input Step (I’ve used Flat file Input to keep things simple)
6. Then I used “Modified JavaScript” Step.
In this step first Get the Variable Value, which we set in the Step2. Then increment this Value for Each row and we’re done.
7. Now just pass the Output of above step to a Output step (a DB table or a Flat file).
Another alternative could’ve been: Pick Max value from the Primary Key column of the Table and assign it the Variable, but that can be a performance issue with Larger tables.
Issues which I faced:
How to use Variables into “Modified JavaScript Step”? There is a nice method “getVariable” available under “Special Functions” Category with which we can reference the Variable which we are using in the Transformation. This is the thing which I like most about Kettle, It is an ETL Tool with a Programmers Mind-set rather than being a mere Drag-N-Drop Tool.
Another issue was with Formatting. With the JavaScript I had converted few variables from String to Number and Vice-versa, without passing the Optional Formatting Parameter.
So a thing to note always use the formatter explicitly otherwise Kettle will format the Variables as per its Logic.
So I was expecting “1234” and I got “1234.00” or “000000001234”.
str2num(MAX_ID, "#")
num2str(strVarValue, "#")
I kept getting this Error “Sequence Not Found” when I validated the Transformation, although the Sequence was there. I tried the SQL Editor available inside the Kettle and even it verified that Sequence was there. It was really annoying. I restarted the Database, Cleared the Cache of the DB but nothing worked. Actual issue was that I had created my DB schema under a DB with Multiple Schemas in it and while creating the Connection in Kettle I had filtered it (Attached image). But it seems Kettle and DB2 aren’t so friendly and that’s why the Error. Solution to this issue is that Create a OS User with same name as Schema and then Create the User Objects in the schema with the connection under this New User’s Credentials. |
Job:
Setup Variable Transformation:
Set Variable and Add Sequence Step:
Main Transformation:
Modified JavaScript Step
No comments:
Post a Comment