The industry is full of buzzwords, and pinning down a clear meaning for
“big data” is a perennial problem. The confusion gets worse when IT experts
try to attract attention to their own projects by labelling them “big data,”
even though there is nothing at all big about them. At
its heart, big data is simply a way of thinking about data problems that we
are unable to solve using traditional tools. To examine the true nature of
big data problems, experts focus on the “3 V’s of big data,” which describe
the basic characteristics that make a data problem “big”: volume, variety,
and velocity.
Volume: large data sets, ranging from dozens of terabytes up to petabytes.
Variety: data made up of multiple kinds of sets, ranging from raw text (which,
from a software development perspective, has little or no discernible
structure; most people call this unstructured data) to log files
(usually referred to as semi-structured) to data ordered in strongly
typed rows and columns (structured data, as in an RDBMS).
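
To make the three kinds of data described above concrete, here is a minimal Python sketch. The sample text, the log line layout, the `LOG_PATTERN` regular expression, and the order columns are all made up for illustration; real sources will look different.

```python
import csv
import io
import re

# Unstructured: raw text with no layout that software can rely on.
raw_text = "Customer called to say the delivery arrived late but intact."

# Semi-structured: a log line with a loose, repeating layout (hypothetical format).
log_line = "2024-01-15 10:32:07 ERROR payment-service Timeout after 30s"
LOG_PATTERN = re.compile(
    r"(?P<date>\S+) (?P<time>\S+) (?P<level>\w+) (?P<service>\S+) (?P<message>.*)"
)

# Structured: typed rows and columns, as they would sit in a relational table.
csv_data = "order_id,amount,currency\n1001,25.50,USD\n1002,14.00,EUR\n"


def parse_log(line):
    """Extract named fields from a log line, or return None if it does not match."""
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else None


def parse_orders(text):
    """Read CSV rows and convert each column to its intended type."""
    reader = csv.DictReader(io.StringIO(text))
    return [
        {
            "order_id": int(row["order_id"]),
            "amount": float(row["amount"]),
            "currency": row["currency"],
        }
        for row in reader
    ]


# Without further text analysis, splitting into words is about all we can do.
word_count = len(raw_text.split())
log_record = parse_log(log_line)
orders = parse_orders(csv_data)
```

The further down this list the data sits, the less guessing the code has to do: the raw text needs real text analysis, the log line needs a pattern that may not always match, and the rows convert directly to typed values.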
Velocity: the data flowing into our companies is relevant only for a
limited period of time, a window that commonly shuts well before the data
has been transferred and loaded into a data warehouse for deeper analysis
(for example, financial securities ticker data, which can help us identify a
buying opportunity, but only for a little while). The more data
entering our organization per second, the bigger our velocity challenge.
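
The relevance window can be sketched in a few lines of Python. Everything here is hypothetical: the `Tick` record, the five-second window, and the one-hour warehouse load delay are assumptions chosen only to show the effect, not figures from any real trading system.

```python
from dataclasses import dataclass


@dataclass
class Tick:
    symbol: str
    price: float
    timestamp: float  # seconds since an arbitrary reference point


def is_actionable(tick, now, window_seconds):
    """A tick is useful only while its relevance window is still open."""
    return 0 <= now - tick.timestamp <= window_seconds


def actionable_ticks(ticks, now, window_seconds):
    """Return the ticks that are still worth acting on at time `now`."""
    return [t for t in ticks if is_actionable(t, now, window_seconds)]


ticks = [
    Tick("ACME", 101.2, 0.0),
    Tick("ACME", 100.9, 4.0),
    Tick("ACME", 100.7, 9.0),
]

WINDOW_SECONDS = 5.0       # assumed relevance window
BATCH_LOAD_DELAY = 3600.0  # assumed time to load the data into a warehouse

# Analysed as the data arrives, the latest tick is still actionable.
fresh = actionable_ticks(ticks, now=10.0, window_seconds=WINDOW_SECONDS)

# Analysed after the warehouse load, every window has already shut.
after_load = actionable_ticks(
    ticks, now=9.0 + BATCH_LOAD_DELAY, window_seconds=WINDOW_SECONDS
)
```

In this sketch `fresh` holds only the most recent tick and `after_load` is empty, which is the velocity problem in miniature: the data is still there, but its value has expired.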
Each of the above criteria poses its own distinct challenge to a person
wanting to analyse the data and extract information. As such, the three
V’s are an easy way to identify big data problems and to give a clear
picture of what has become a vague buzzword. The usual rule of thumb is that
if our data storage and analysis work exhibits any of these three properties,
it is very probable that we’ve got ourselves a big data challenge.
In software development, Hadoop is considered a classic information technology
tool, and it is well suited to most big data problems, especially those
involving high volumes of data and data with a variety of distinct
structures. But there are many big data problems for which Hadoop isn’t well
suited, in particular analysing high-velocity data the instant it enters an
organization. Data velocity problems involve the analysis of moving data, but
Hadoop is designed to analyse static data. The conclusion to draw from this is
that although Hadoop is an elegant tool for big data analysis, it is
not meant to solve all our big data problems. Despite some of the buzz and hype,
the big data domain as a whole isn’t synonymous with Hadoop.
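
The difference between analysing static data and analysing moving data can be illustrated without any big data tooling. The sketch below is plain Python, not Hadoop or any streaming framework's API; the function names and the sample readings are assumptions made for illustration.

```python
def batch_mean(stored_values):
    """Batch style: the whole data set is at rest before analysis starts."""
    return sum(stored_values) / len(stored_values)


def streaming_means(values):
    """Streaming style: update the answer as each value arrives."""
    count = 0
    mean = 0.0
    for value in values:
        count += 1
        # Incremental update: no need to keep or re-read earlier values.
        mean += (value - mean) / count
        yield mean


readings = [10.0, 20.0, 30.0, 40.0]

# One answer, available only after all the data has been stored.
final = batch_mean(readings)

# An up-to-date answer after every single reading.
running = list(streaming_means(iter(readings)))
```

Both approaches end at the same mean, but the streaming version has a usable answer after the first reading, while the batch version has nothing to say until the full data set is in place. That gap is why a batch-oriented tool is a poor fit for velocity problems.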