Data Warehousing with Hadoop

Data warehouses are under strain, trying to cope with growing demands on their finite resources. The sudden growth in the volume of data generated in the world has also affected data warehouses, because the amount of data they handle is expanding. This is partly because more structured data is being created, but also because we often have to meet regulatory requirements designed to maintain queryable access to historical data. In addition, the processing power in data warehouses is often used to perform transformations of the relational data as it either enters the warehouse itself or is loaded into a child data mart (a separate subset of the data warehouse) for a separate analytics application. Demand is also rising for analysts to issue new queries against the structured data stored in warehouses, and these ad hoc queries can consume significant data processing resources. Sometimes a one-time report suffices; at other times, exploratory analysis is required to find questions that haven’t been asked yet and that may yield significant business value. The bottom line is that data warehouses are often used for purposes beyond their original design.

Hadoop can provide significant relief in this situation. At a high architectural level, Hadoop can live alongside data warehouses and fulfill some of the purposes that they aren’t designed for.

Hadoop can modernize a data warehousing ecosystem in several ways: by providing a landing zone for all data, by persisting that data as a queryable archive of cold data, by leveraging Hadoop’s large-scale batch processing efficiencies to pre-process and transform data for the warehouse, and by enabling an environment for ad hoc data discovery.
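As a rough illustration of the pre-processing role described above, the following sketch mimics the map, shuffle, and reduce phases of a batch job in plain Python. It does not call any Hadoop API. The record layout (date, store ID, amount) and the function names are assumptions made purely for illustration: raw lines from the landing zone are cleansed and rolled up into summary rows that would be small enough to load into a warehouse.

```python
from itertools import groupby
from operator import itemgetter


def map_record(line):
    """Map step: parse one raw CSV line into a ((date, store_id), amount) pair.

    Lines that are malformed or carry a non-numeric amount are dropped,
    which stands in for a simple cleansing pass over landing-zone data.
    """
    parts = line.strip().split(",")
    if len(parts) != 3:
        return
    date, store_id, amount = parts
    try:
        value = float(amount)
    except ValueError:
        return
    yield (date, store_id), value


def reduce_group(key, amounts):
    """Reduce step: collapse all amounts for one key into a summary row."""
    date, store_id = key
    return (date, store_id, round(sum(amounts), 2))


def run_batch(lines):
    """Run map, then sort by key (the 'shuffle'), then reduce each key group."""
    mapped = [pair for line in lines for pair in map_record(line)]
    mapped.sort(key=itemgetter(0))
    return [
        reduce_group(key, [value for _, value in group])
        for key, group in groupby(mapped, key=itemgetter(0))
    ]
```

Calling `run_batch` on a list of raw lines returns one `(date, store_id, total)` tuple per key, sorted by key. In a real deployment the same map and reduce logic would run in parallel across the cluster rather than in a single process.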

Opinions are divided on how far Hadoop’s role should extend. On one hand, the Hadoop hype machine is in full gear and bent on world domination. This camp sees Hadoop replacing the relational database products that now power the world’s data warehouses. The argument here is compelling: Hadoop is cheap and scalable, and it has queryable interfaces that are becoming increasingly fast and more closely compliant with ANSI SQL, the standard query language for relational database systems.

On the other hand, many relational warehouse vendors have gone out of their way to resist the appeal of all the Hadoop hype. Understandably, they won’t roll over and make way for Hadoop to replace their relational database offerings. They’ve adopted what we consider to be a protectionist stance, drawing a line between structured data, which they consider to be the exclusive domain of relational databases, and unstructured data, which is where they feel Hadoop can operate. In this model, they’re positioning Hadoop as solely a tool to transform unstructured data into a structured form for relational databases to store.

Allen Scott
