Big Data: Why We Need Hadoop?

The software industry is full of buzzwords, and it is always hard to pin down a clear meaning for “big data”. The confusion gets much worse when IT experts try to attract attention to their own projects by labelling them “big data,” even though there is nothing big about them at all.

At its heart, big data is simply a way of describing data problems that we are unable to solve with traditional tools. To analyse the true nature of big data issues, experts focus on the “3 V’s of big data,” which capture the basic characteristics that make a data challenge “big”:

Volume: Data sets ranging from dozens of terabytes up to petabytes.

Variety: Data drawn from many different sets and formats, ranging from raw text (which, from a software development perspective, has little or no discernible structure; most people call this unstructured data) to log files (usually described as semi-structured) to data organised in strongly typed rows and columns (structured data, as stored in an RDBMS).

Velocity: Much of the data flowing into our companies is relevant only for a limited period of time, and this window often closes well before the data has been transferred and loaded into a data warehouse for deeper analysis (for example, financial securities ticker data can reveal a buying opportunity, but only for a short while). The more data that enters our organization per second, the bigger our velocity challenge becomes.

Each of these criteria poses its own distinct challenge to anyone who wants to analyse the data and extract information from it. The three V’s therefore offer an easy way to identify big data problems and give a clear picture of what has become a vague buzzword. The usual rule of thumb is that if our data storage and analysis work exhibits any of these three properties, we very probably have a big data challenge.
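As a rough illustration, the rule of thumb can be written as a simple checklist. This is a minimal sketch under stated assumptions: the `Workload` fields, the function names, and the 10 TB volume cut-off are hypothetical choices made for illustration (the cut-off loosely echoes the “dozens of terabytes” in the Volume definition), not a standard definition of big data.

```python
from dataclasses import dataclass

# Hypothetical threshold: "dozens of terabytes" is loosely rounded down to 10 TB here.
VOLUME_THRESHOLD_TB = 10.0


@dataclass
class Workload:
    """Hypothetical description of a data workload, used only for this sketch."""
    volume_tb: float            # total data size, in terabytes
    structure_kinds: set[str]   # e.g. {"structured", "semi-structured", "unstructured"}
    relevance_window_s: float   # how long the data stays useful, in seconds
    load_latency_s: float       # time to transfer and load it into a data warehouse


def big_data_vs(w: Workload) -> list[str]:
    """Return which of the three V's a workload exhibits."""
    vs: list[str] = []
    if w.volume_tb >= VOLUME_THRESHOLD_TB:
        vs.append("volume")
    if len(w.structure_kinds) > 1:  # more than one kind of structure to handle
        vs.append("variety")
    if w.relevance_window_s < w.load_latency_s:  # data goes stale before it is loaded
        vs.append("velocity")
    return vs


def is_big_data(w: Workload) -> bool:
    """Rule of thumb: exhibiting any one of the three V's is enough."""
    return bool(big_data_vs(w))
```

For example, a small ticker feed such as `Workload(volume_tb=0.5, structure_kinds={"structured"}, relevance_window_s=5, load_latency_s=3600)` is not large and has a single structure, yet `big_data_vs` returns `["velocity"]` because the data goes stale long before a warehouse load would finish, so `is_big_data` reports `True`.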

In software development, Hadoop is considered a classic information technology tool, and it is well suited to most big data problems, especially those involving high volumes of data and data with a variety of distinct structures. But there are many big data problems for which Hadoop is not well suited, in particular analysing high-velocity data the instant it enters an organization. Data velocity problems involve analysing data in motion, whereas Hadoop is designed to analyse static data. The conclusion to draw is that although Hadoop is an elegant tool for big data analysis, it is not meant to solve all of our big data problems. Contrary to some of the buzz and hype, the big data domain as a whole is not synonymous with Hadoop.
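To make the static-versus-moving distinction concrete, here is a minimal single-process Python sketch. It is an analogy only, not Hadoop code and not Hadoop's API: a batch word count in the map/reduce style that produces its answer only after the whole, already-stored dataset has been read, next to a streaming version that updates its answer as each record arrives. The function names and the ticker-style sample records are hypothetical.

```python
from collections import Counter
from typing import Iterable, Iterator


def batch_word_count(lines: list[str]) -> Counter:
    """Batch style: the whole (static) dataset exists before processing starts,
    and a result is available only after every record has been read."""
    # Map phase: emit a (word, 1) pair for every word in every record.
    pairs = [(word, 1) for line in lines for word in line.split()]
    # Reduce phase: sum the emitted counts per word.
    totals: Counter = Counter()
    for word, count in pairs:
        totals[word] += count
    return totals


def streaming_word_count(lines: Iterable[str]) -> Iterator[Counter]:
    """Streaming style: update the running result as each record arrives
    and emit a snapshot immediately, without waiting for the full dataset."""
    running: Counter = Counter()
    for line in lines:
        running.update(line.split())
        yield running.copy()  # copy so later updates don't alter earlier snapshots
```

With `lines = ["buy ACME", "sell ACME"]`, `batch_word_count(lines)` returns a single final `Counter` with `ACME` counted twice, while `streaming_word_count(lines)` yields a running snapshot after each record, so the first snapshot already reports `ACME` once. A real streaming system would read from an unbounded source rather than a finite list, which is exactly the situation a batch design cannot serve in real time.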
