Tuesday, June 21, 2011

Kettle – Best Practices

Following on from the last post, I'll discuss more of the ETL best practices I follow with Kettle.

• Enable cache for lookup on Dimension tables.

This improves the performance of your transformation for small and medium-sized dimension tables. But you need to monitor the logs to check whether it is actually beneficial. In some cases it just adds unnecessary load on the ETL server.

It's always a dilemma for an ETL developer whether to cache dimension tables while doing lookups. Usually the right approach is to collect performance data for the ETL jobs where caching is implemented. This data usually includes the time taken:

  • To fetch Dimension table into memory
  • To filter this data
  • To update the cache if the underlying lookup table data changes.
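To make the measurements above concrete, here is a minimal sketch in Python rather than Kettle itself. It times the three things listed (loading the dimension table into memory, looking rows up in it, and reloading it after a change) and compares cached lookups against one query per row. The table `dim_customer`, its columns, and all function names are hypothetical, and an in-memory SQLite database stands in for the real dimension database, so the absolute numbers will not match a real ETL server.

```python
import sqlite3
import time


def build_dimension(conn, rows):
    """Create a small stand-in dimension table (hypothetical schema)."""
    conn.execute(
        "CREATE TABLE dim_customer ("
        "customer_code TEXT PRIMARY KEY, customer_key INTEGER)"
    )
    conn.executemany(
        "INSERT INTO dim_customer VALUES (?, ?)",
        [(f"C{i:05d}", i) for i in range(rows)],
    )
    conn.commit()


def load_cache(conn):
    """Fetch the whole dimension table into memory and time the fetch."""
    start = time.perf_counter()
    cache = dict(
        conn.execute("SELECT customer_code, customer_key FROM dim_customer")
    )
    return cache, time.perf_counter() - start


def lookup_cached(cache, codes):
    """Resolve business codes to surrogate keys from the in-memory cache."""
    start = time.perf_counter()
    keys = [cache.get(code) for code in codes]
    return keys, time.perf_counter() - start


def lookup_uncached(conn, codes):
    """Resolve the same codes with one database query per incoming row."""
    start = time.perf_counter()
    keys = []
    for code in codes:
        row = conn.execute(
            "SELECT customer_key FROM dim_customer WHERE customer_code = ?",
            (code,),
        ).fetchone()
        keys.append(row[0] if row else None)
    return keys, time.perf_counter() - start


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    build_dimension(conn, 10_000)
    codes = [f"C{i:05d}" for i in range(0, 10_000, 3)]

    cache, load_time = load_cache(conn)
    _, cached_time = lookup_cached(cache, codes)
    _, uncached_time = lookup_uncached(conn, codes)

    # Simulate a change in the lookup table, then time the cache reload.
    conn.execute("INSERT INTO dim_customer VALUES ('C99999', 99999)")
    _, refresh_time = load_cache(conn)

    print(f"load cache:      {load_time:.4f}s")
    print(f"cached lookup:   {cached_time:.4f}s")
    print(f"uncached lookup: {uncached_time:.4f}s")
    print(f"refresh cache:   {refresh_time:.4f}s")
```

Caching pays off only when the load time plus the refresh time is smaller than the time saved on lookups, which is why these figures are worth logging per job rather than assuming.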

Another good practice, which I learned from Informatica, is to place these cache files on faster drives.

