HBase Data Model: Column Qualifiers and Versioning (Part – 2)

We have already examined the dos and don’ts of row keys and column families in detail in our previous post.

HBase Architecture: ZooKeeper Znodes (Part-6)

ZooKeeper enables coordination and synchronization through what it calls “znodes”, which are organized as a directory tree and addressed by paths that resemble the file path names we would see in a Unix file system.
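To make the "directory tree" idea concrete, here is a minimal in-memory sketch of a hierarchical namespace addressed by Unix-style paths. It is a toy model, not the ZooKeeper client API. The class `ZnodeTree` and its methods are hypothetical names, and the `/hbase/rs/...` paths are only illustrative. It mirrors one real ZooKeeper rule: a znode can only be created under a parent that already exists.

```python
class ZnodeTree:
    """Toy, in-memory model of ZooKeeper's hierarchical namespace.
    Hypothetical illustration only; this is not the ZooKeeper client API."""

    def __init__(self):
        # Every absolute path maps to the data stored at that znode; "/" is the root.
        self.nodes = {"/": b""}

    @staticmethod
    def _parent(path):
        # "/hbase/rs" -> "/hbase", "/hbase" -> "/"
        return path.rsplit("/", 1)[0] or "/"

    def create(self, path, data=b""):
        if not path.startswith("/") or (path != "/" and path.endswith("/")):
            raise ValueError(f"invalid path: {path}")
        if path in self.nodes:
            raise KeyError(f"node already exists: {path}")
        # As in ZooKeeper, a znode can only be created under an existing parent.
        if self._parent(path) not in self.nodes:
            raise KeyError(f"parent does not exist for: {path}")
        self.nodes[path] = data

    def get(self, path):
        return self.nodes[path]

    def children(self, path):
        # Direct children only, returned as names relative to `path`.
        prefix = path.rstrip("/") + "/"
        return sorted(
            p[len(prefix):]
            for p in self.nodes
            if p != "/" and p.startswith(prefix) and "/" not in p[len(prefix):]
        )


tree = ZnodeTree()
tree.create("/hbase")
tree.create("/hbase/rs")
tree.create("/hbase/rs/server1", b"alive")
print(tree.children("/hbase/rs"))  # ['server1']
```

Just as in a file system, each znode has exactly one parent. A path therefore uniquely identifies a node.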

HBase Architecture: ZooKeeper and HBase Reliability (Part-5)

In the previous parts of this “HBase Architecture” series, we have seen RegionServers,

HBase Architecture: MasterServers(Part-4)

In HBase Architecture: Part-1, the fact that we started our discussion of the architecture by describing RegionServers instead of the MasterServer may have surprised you.

HBase Architecture: Compactions (Part-3)

Previously, we looked at RegionServers and learned how regions work. Now we examine compactions.

HBase Architecture: Regions (Part-2)

Previously, we introduced the HBase architecture and examined its most basic component, the RegionServer.

HBase Data Model: Row Keys and Column Families (Part – 1)

An HBase data store comprises one or more tables, which are indexed by row keys. Data is stored in rows with columns, and rows can have multiple versions.
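A handy way to picture this logical model is as nested maps: row key, then "family:qualifier" column, then timestamp, then value. The sketch below models it that way. It is a simplified illustration under stated assumptions, not the HBase client API. `ToyTable`, `put`, `get`, `scan` and the `max_versions` parameter are hypothetical names chosen for the example.

```python
from collections import defaultdict


class ToyTable:
    """Toy model of HBase's logical data model (not the HBase client API):
    row key -> "family:qualifier" column -> timestamp -> value."""

    def __init__(self, max_versions=3):
        # Hypothetical per-table limit on how many versions of a cell are kept.
        self.max_versions = max_versions
        self.rows = defaultdict(lambda: defaultdict(dict))

    def put(self, row, column, value, ts):
        cells = self.rows[row][column]
        cells[ts] = value
        # Keep only the newest `max_versions` timestamps for this cell.
        for old_ts in sorted(cells, reverse=True)[self.max_versions:]:
            del cells[old_ts]

    def get(self, row, column, versions=1):
        # Return up to `versions` values, newest timestamp first.
        cells = self.rows.get(row, {}).get(column, {})
        newest_first = sorted(cells.items(), reverse=True)
        return [value for _, value in newest_first[:versions]]

    def scan(self):
        # Row keys come back in sorted order.
        return sorted(self.rows)


t = ToyTable(max_versions=2)
t.put("row1", "info:name", "a", ts=1)
t.put("row1", "info:name", "b", ts=2)
t.put("row1", "info:name", "c", ts=3)   # ts=1 is evicted (only 2 versions kept)
t.put("row0", "info:name", "z", ts=1)
print(t.get("row1", "info:name"))              # ['c']
print(t.get("row1", "info:name", versions=5))  # ['c', 'b']
print(t.scan())                                # ['row0', 'row1']
```

Because a plain read returns only the newest timestamp, older versions stay available for an explicit multi-version read until they are pruned.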

Big Data: What is Sparse Data in HBase?

As we might have guessed, Google’s BigTable distributed data storage system (DDSS) was designed to meet the demands of big data. Big data applications store massive amounts of data, but that content is also often variable.

Big Data: Evolution of HBase

I think everybody remembers their first experience surfing the World Wide Web; we just knew that it was an incredible innovation for the IT industry.

Big Data: CAP Theorem

Selecting a database that fits our application’s requirements is a daunting task, since there is no “one size fits all”.

Pig Latin Operators in Hadoop

Pig Latin has a simple syntax with powerful semantics that we will use to carry out two primary operations:

Hadoop integration with R

Developers and programmers continue to explore various approaches to leveraging the distributed computation benefits of MapReduce and the almost limitless storage capacity of HDFS in an intuitive manner that can be exploited from R.

Machine Learning with Mahout and Collaborative Filtering

Machine learning refers to a field of artificial intelligence (A.I.) that provides tools enabling computers to improve their analysis on the basis of previous events.

MapReduce Reducer Class:

The MapReduce API has two main classes that do the MapReduce work for us: the Mapper class and the Reducer class.
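To show how the two roles fit together, here is a minimal single-process simulation of the classic word-count job. It covers the map step, the shuffle that groups values by key, the sort, and the reduce step. This is plain Python, not Hadoop's Java `Mapper`/`Reducer` API. `word_count_mapper`, `word_count_reducer` and `run_job` are hypothetical names. The line index stands in for the byte offset that Hadoop’s `TextInputFormat` would supply as the input key.

```python
from collections import defaultdict


def word_count_mapper(key, line):
    """Map step: emit (word, 1) for every word in one input line.
    `key` (the line index here) is unused, as in the classic example."""
    for word in line.split():
        yield word.lower(), 1


def word_count_reducer(word, counts):
    """Reduce step: sum all the counts emitted for one word."""
    yield word, sum(counts)


def run_job(lines, mapper, reducer):
    # Map phase, plus a "shuffle" that groups every emitted value by its key.
    intermediate = defaultdict(list)
    for line_index, line in enumerate(lines):
        for key, value in mapper(line_index, line):
            intermediate[key].append(value)
    # Reduce phase: keys are handled in sorted order, mirroring the sort step.
    results = []
    for key in sorted(intermediate):
        results.extend(reducer(key, intermediate[key]))
    return results


print(run_job(["the cat", "The dog"], word_count_mapper, word_count_reducer))
# [('cat', 1), ('dog', 1), ('the', 2)]
```

The mapper sees one record at a time and never the whole data set. The reducer sees all the values for one key at once, which is what lets the framework run many mappers and reducers in parallel.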

Pig Architecture and Application Flow in Hadoop

“Simple” often means “elegant” when it comes to those remarkable architectural drawings for that new Silicon Valley mansion we have planned for when the money starts rolling in after we implement Hadoop.

YARN: Yet Another Resource Negotiator

With the release of Hadoop 2, however, YARN was introduced, opening the door to a whole new world of data processing opportunities.

MapReduce Mapper Class:

The Mapper class is responsible for providing the implementation of the map tasks in a MapReduce job.

Managing files with Hadoop File System Commands

HDFS is one of the two main components of the Hadoop framework; the other is the computational paradigm known as MapReduce.

Data Replication in Hadoop: Replicating Data Blocks (Part – 1)

In HDFS, the data block size needs to be large enough to warrant the resources dedicated to an individual unit of data processing. On the other hand,

Hadoop Java API for MapReduce

Hadoop went through a big API change in its 0.20 release, and that API is the basic interface in version 1.0.

Input Splits and Key-Value Terminologies for MapReduce

As we already know, in Hadoop files are composed of individual records, which are ultimately processed one by one by mapper tasks.

Log Data Ingestion with Flume

Some of the data that ends up in HDFS might land there through database load operations or other types of batch processes.

NameNode Design and How It Works in HDFS

Whenever a user stores a file in HDFS, the file is first broken down into data blocks, and (by default) three replicas of these data blocks are stored on slave nodes (DataNodes) throughout the Hadoop cluster.
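The arithmetic of splitting and replicating is easy to sketch. The example below assumes the Hadoop 2.x defaults of a 128 MB block size and a replication factor of 3. It deliberately uses a naive round-robin placement across a hypothetical list of DataNodes (`dn1` … `dn4`) and is not HDFS’s actual rack-aware placement policy. `split_into_blocks` and `place_replicas` are illustrative names, not Hadoop APIs.

```python
BLOCK_SIZE = 128 * 1024 * 1024  # 128 MB, the default dfs.blocksize in Hadoop 2.x
REPLICATION = 3                 # default replication factor


def split_into_blocks(file_size, block_size=BLOCK_SIZE):
    """Return the sizes of the blocks a file of `file_size` bytes is split into.
    Only the last block may be smaller than `block_size`."""
    full, remainder = divmod(file_size, block_size)
    return [block_size] * full + ([remainder] if remainder else [])


def place_replicas(num_blocks, datanodes, replication=REPLICATION):
    """Assign each block to `replication` distinct DataNodes, round-robin.
    Simplified on purpose: real HDFS uses a rack-aware placement policy."""
    if replication > len(datanodes):
        raise ValueError("not enough DataNodes for the replication factor")
    return {
        block: [datanodes[(block + i) % len(datanodes)] for i in range(replication)]
        for block in range(num_blocks)
    }


MB = 1024 * 1024
blocks = split_into_blocks(300 * MB)
print([size // MB for size in blocks])  # [128, 128, 44]
print(place_replicas(len(blocks), ["dn1", "dn2", "dn3", "dn4"]))
# {0: ['dn1', 'dn2', 'dn3'], 1: ['dn2', 'dn3', 'dn4'], 2: ['dn3', 'dn4', 'dn1']}
```

A 300 MB file thus occupies three blocks and nine block replicas in total. The last block only uses the 44 MB it actually needs, not a full 128 MB.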

Checkpointing process in HDFS

As we already know, HDFS is a journaled file system: new changes to files in HDFS are captured in an edit log that’s stored on the NameNode in a file named edits.
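The idea behind journaling and checkpointing can be sketched in a few lines. Changes go only to the edit log. The full namespace is reconstructed by replaying that log on top of the last checkpointed image, which HDFS keeps in a file called fsimage. A checkpoint merges the log into a new image and starts an empty log. This is a toy model with hypothetical names (`ToyNameNode`, `log`, `checkpoint`) and made-up operation strings. It says nothing about how the NameNode or Secondary NameNode actually store or transfer these files.

```python
class ToyNameNode:
    """Toy model of journaling plus checkpointing (hypothetical names):
    namespace = last checkpointed image + replayed edit log."""

    def __init__(self):
        self.fsimage = {}  # last checkpointed namespace: path -> replication factor
        self.edits = []    # journal of changes made since that checkpoint

    def log(self, op, path, replication=None):
        # Changes are only appended to the edit log; the image stays untouched.
        self.edits.append((op, path, replication))

    @staticmethod
    def _replay(namespace, edits):
        for op, path, replication in edits:
            if op in ("create", "setrep"):
                namespace[path] = replication
            elif op == "delete":
                namespace.pop(path, None)
        return namespace

    def current_namespace(self):
        # What a restart would rebuild: a copy of the image with the edits replayed.
        return self._replay(dict(self.fsimage), self.edits)

    def checkpoint(self):
        # Merge the edits into a new image, then start a fresh, empty edit log.
        self.fsimage = self.current_namespace()
        self.edits = []


nn = ToyNameNode()
nn.log("create", "/a.txt", 3)
nn.log("create", "/b.txt", 3)
nn.log("delete", "/a.txt")
print(nn.current_namespace(), len(nn.edits))  # {'/b.txt': 3} 3
nn.checkpoint()
print(nn.fsimage, nn.edits)                   # {'/b.txt': 3} []
```

Without periodic checkpoints, the edit log keeps growing and every restart has to replay all of it. Checkpointing bounds that replay to the changes made since the last checkpoint.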

Concept of Data compression in Hadoop

The massive data volumes that are very common in a typical Hadoop deployment make compression a necessity.

