Big Data: HBase as a Distributed, Persistent, Multidimensional Sorted Map
We are already familiar with the characteristics and nature of
HBase. As defined earlier, HBase is a big data store that runs on top of Hadoop and is
modeled on Google’s BigTable distributed storage system, which Google
defined as a “sparse, distributed, persistent multidimensional sorted map.”
Previously, we saw what “sparse” means and how HBase is designed to fit
this sparseness: it wastes no costly storage space on null values, and it
lets us add data fields dynamically over time without redesigning the
schema or disrupting any operation.
Here we examine the last three characteristics in the definition: how HBase
behaves as a distributed, persistent, multidimensional sorted map.
HBase is distributed and persistent
BigTable is a distributed and persistent data store. Persistent
simply means that the data we store in BigTable (and HBase, for that matter)
persists, or remains, after our program or session ends. That is
straightforward enough, but we should spend a
little more time thinking about how the data is persisted. In the BigTable paper,
Google describes a distributed file system called the Google File System, or
GFS. Just as HBase is an open source implementation of Google’s
BigTable, HDFS is an open source implementation of GFS. By
default, HBase relies on HDFS to persist its data
to disk. Although we can use other distributed data stores with HBase, the
large majority of HBase installations use HDFS. This makes sense
given that HBase is the “Hadoop Database”; it is already built into the name, for goodness’ sake.
HDFS is an important enabling technology
not only for Hadoop but also for HBase. By storing data in HDFS, HBase
gains reliability, availability, scalability, and high performance,
all on cost-effective distributed servers.
HBase is a multidimensional sorted map
Let’s start with the basics. A map (also called an associative array) is an abstract
collection of key-value pairs in which each key is unique. This definition
is crucial to our understanding of HBase because the HBase data model is often described
in other ways, most often (and incompletely) as a column-oriented store. At bottom, HBase
is a key-value
data store in which each key is unique, meaning it appears at most once in the
HBase data store. The map is also sorted and multidimensional. The
keys are stored in sorted, byte-lexicographical order. Every
value can have multiple versions, which makes the data model multidimensional.
By default, data versions are identified by a timestamp.
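
To make the idea of a sorted, multidimensional map more concrete, here is a minimal in-memory sketch in Python. It is only a toy model of the data layout described above, not the HBase API: the class name SortedVersionedMap, its put/get/scan methods, and the sample row keys and column names are all hypothetical and chosen for illustration. It relies on the fact that Python compares bytes objects byte by byte, which matches byte-lexicographical ordering.

```python
class SortedVersionedMap:
    """Toy model (hypothetical, not the HBase API):
    row key -> column -> timestamp -> value."""

    def __init__(self):
        # bytes row key -> {bytes column -> {int timestamp -> bytes value}}
        self._rows = {}

    def put(self, row, column, value, timestamp):
        # Each (row, column, timestamp) combination holds at most one value.
        self._rows.setdefault(row, {}).setdefault(column, {})[timestamp] = value

    def get(self, row, column, timestamp=None):
        versions = self._rows.get(row, {}).get(column, {})
        if not versions:
            return None
        if timestamp is None:
            # No timestamp requested: return the most recent version.
            timestamp = max(versions)
        return versions.get(timestamp)

    def scan(self):
        # Python orders bytes objects by unsigned byte value, position by
        # position, i.e. in byte-lexicographical order.
        for row in sorted(self._rows):
            yield row, self._rows[row]


table = SortedVersionedMap()
table.put(b"row2", b"info:name", b"Ann", timestamp=1)
table.put(b"row10", b"info:name", b"Bob", timestamp=1)
table.put(b"row1", b"info:name", b"Cy", timestamp=1)
table.put(b"row2", b"info:name", b"Anne", timestamp=2)  # newer version, same cell

# Byte-wise ordering puts b"row10" before b"row2" because b"1" < b"2".
row_order = [row for row, _ in table.scan()]

latest = table.get(b"row2", b"info:name")               # newest version
older = table.get(b"row2", b"info:name", timestamp=1)   # explicit older version
```

Note that the rows come back as row1, row10, row2 rather than in numeric order. This is why row keys containing numbers are often zero-padded when a numeric ordering is wanted. The timestamp acts as the extra dimension: the same row and column can hold several values, one per version.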