An Extract, Transform, and Load (ETL) tool needs to be robust, scalable, fault tolerant, and capable of high throughput, much like an e-commerce transaction system. Designing such a system on a distributed-computing backbone can be extremely rewarding: mid-size to large organizations often collect data from multiple sources and bring it all together into an integrated warehouse, resulting in thousands of batch and real-time jobs running over the course of a day.
For example, retailers collect inventory, sales, finance, marketing, clickstream, and competitor data several times a day. Aggregating this data with ETL jobs that run only once daily can slow down decision-support systems and rules engines, which must feed essential decisions (such as dynamic prices) back into the system to control demand.
For many e-commerce analytics and data-mining solutions, a slow ETL tool can become a major bottleneck. Commercial and open-source tools can implement such workflows, but it is often better to build a homegrown ETL tool on sound design and distributed-computing principles.
Learn how to build your own ETL solution and use a task queue to scale it horizontally, as sketched below.
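To make the task-queue idea concrete, here is a minimal sketch assuming Celery with a Redis broker; the broker URL, the worker command, and the extract/transform/load stubs are illustrative assumptions, not the whitepaper's actual design.

```python
# A minimal sketch of a horizontally scalable ETL step using a task queue.
# Assumes Celery with a Redis broker; helpers below are illustrative stubs.
from celery import Celery

app = Celery("etl", broker="redis://localhost:6379/0")


def extract(source: str, table: str) -> list:
    """Stub extract step; in practice, pull rows from the source system."""
    return [{"id": 1, "amount": 42.0}]


def transform(row: dict) -> dict:
    """Stub transform step; in practice, clean and normalize each row."""
    return {**row, "amount_cents": int(row["amount"] * 100)}


def load(table: str, rows: list) -> None:
    """Stub load step; in practice, bulk-insert into the warehouse."""
    print(f"loading {len(rows)} rows into {table}")


@app.task(bind=True, max_retries=3)
def run_etl_job(self, source: str, table: str) -> None:
    """One ETL unit of work; any worker on any machine can pick it up."""
    try:
        rows = [transform(r) for r in extract(source, table)]
        load(table, rows)
    except Exception as exc:
        # Retry with exponential backoff for fault tolerance.
        raise self.retry(exc=exc, countdown=2 ** self.request.retries)


# A scheduler (cron, Celery beat, an API hook) enqueues jobs:
#   run_etl_job.delay("sales_db", "orders")
#   run_etl_job.delay("clickstream", "page_views")
# Scaling out is then just a matter of starting more workers:
#   celery -A <your_module> worker --concurrency=8
```

Because the queue decouples job producers from workers, throughput grows by simply starting more worker processes on more machines, and a failed job can be retried without stalling the rest of the pipeline.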
Download the whitepaper to read more: http://lf1.me/Ncc/