Big Data: Data Volumes and Varying Data Structures
It is no exaggeration to say that we are now living in an advanced state of the
information age. Data is being generated and stored electronically by networked
sensors at very large volumes, at an accelerating pace, and in mind-boggling
varieties. Devices such as smartphones, digital cameras, automobiles, televisions,
and equipment in industry and health care all contribute to the exploding volumes
of data sets. This data can be browsed, captured, and shared, but its greatest
potential remains largely untapped. The value lies in its potential to provide
insight that can resolve vexing business challenges, open new domains, reduce
costs, and improve the overall health of our societies.
In the mid-2000s, software companies such as Google and Yahoo! needed an
approach to analysing the huge volumes of data that their search engines were
storing. Hadoop is a by-product of that work, representing an elegant and
cost-effective way of reducing big analytical problems to small, manageable
tasks.
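The divide-and-conquer idea behind Hadoop's MapReduce model can be sketched in a few lines of plain Python. This is only an illustration of the concept, not Hadoop's actual API; the function names and sample lines are invented:

```python
from collections import defaultdict

def map_phase(line):
    # Each mapper sees only a small chunk of the input and
    # emits a (word, 1) pair for every word it finds.
    return [(word.lower(), 1) for word in line.split()]

def reduce_phase(pairs):
    # The reducer sums the counts for each word across all
    # mapper outputs, turning many small results into one answer.
    totals = defaultdict(int)
    for word, count in pairs:
        totals[word] += count
    return dict(totals)

lines = ["big data big problems", "small manageable tasks"]
pairs = [p for line in lines for p in map_phase(line)]
print(reduce_phase(pairs))  # {'big': 2, 'data': 1, 'problems': 1, ...}
```

In a real Hadoop cluster the map and reduce phases run in parallel across many machines, but the shape of the computation is the same.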
Structured data has a very high degree of organization and is usually the kind
of data we see in relational databases (RDBMSs) or spreadsheets. Because of its
well-defined structure, it maps easily to one of the standard data types (or to
user-defined, custom data types based on those standard types). It can be
searched using standard search algorithms and processed in well-defined ways.
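As a minimal sketch of this point, the table and column names below are invented, but they show how structured data is declared with fixed, standard types and queried directly against a known schema:

```python
import sqlite3

# An in-memory relational table with a well-defined schema:
# every column has a declared, standard data type.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, customer TEXT, total REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                 [(1, "Alice", 19.99), (2, "Bob", 5.50)])

# Because the structure is known in advance, a standard query
# returns typed results with no extra parsing on our side.
rows = conn.execute(
    "SELECT customer, total FROM orders WHERE total > 10").fetchall()
print(rows)  # [('Alice', 19.99)]
```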
Semi-structured data (such as what you might see in log files) is a bit more
complex to work with than structured data. Typically, this kind of data appears
in text files that have some degree of order; for example, tab-delimited files
where columns are separated by a tab character. So instead of issuing a database
query against a known schema and knowing exactly what results will come back,
we need to explicitly assign data types to the data elements extracted from
semi-structured data sets.
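For instance, a tab-delimited log line carries some order, but the reader must still assign a type to every field by hand. The field layout below (timestamp, user id, bytes sent) is hypothetical:

```python
import csv
import io
from datetime import datetime

# A hypothetical tab-delimited log: timestamp, user id, bytes sent.
raw = "2015-03-01 12:00:00\t42\t1024\n2015-03-01 12:00:05\t7\t512\n"

records = []
for fields in csv.reader(io.StringIO(raw), delimiter="\t"):
    # Unlike a database query, nothing declares the types here:
    # we must explicitly convert each extracted element ourselves.
    records.append({
        "when": datetime.strptime(fields[0], "%Y-%m-%d %H:%M:%S"),
        "user_id": int(fields[1]),
        "bytes": int(fields[2]),
    })

print(sum(r["bytes"] for r in records))  # 1536
```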
Unstructured data has none of the benefits that structure and a schema bring to
a data set. Analysing it by classical means is complex and costly at best, and
logistically impossible at worst. Imagine having many years' worth of notes
typed by call-centre operators describing customer activity. Without a powerful
set of text analytics tools, it would be extremely difficult to discern any
relevant behaviour patterns. Moreover, the sheer volume of such data often poses
virtually insurmountable challenges to traditional data mining techniques.
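To make the difficulty concrete, even a simple keyword scan over free-text operator notes (the notes below are fabricated examples) only counts raw matches; it says nothing about the behaviour patterns behind them:

```python
import re

# Fabricated free-text operator notes: no schema, no fixed fields,
# inconsistent spelling and capitalization.
notes = [
    "Cust called re: late delivery, very upset, promised refund",
    "caller asked about upgrade options. happy with service",
    "REFUND issued after complaint about delivery again",
]

# A naive classical approach: count keyword hits with a regex.
pattern = re.compile(r"\b(refund|complaint)\b", re.IGNORECASE)
hits = sum(len(pattern.findall(note)) for note in notes)
print(hits)  # 3
```

The scan finds three keyword occurrences, but linking them to customers, dates, or causes would require far more sophisticated text analytics than pattern matching.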