
Latest articles in category "Big Data"

HBase Data Model: Column Qualifiers and Versioning (Part – 2)

We have already examined the dos and don’ts of row keys and column families in detail in our previous post.

By Jayden Bell posted   6 months ago

HBase Architecture: ZooKeeper Znodes (Part-6)

ZooKeeper enables coordination and synchronization through what we call “znodes”, which are presented as a directory tree and resemble the file path names we would see in a Unix file system.

By Jayden Bell posted   6 months ago

HBase Architecture: ZooKeeper and HBase Reliability (Part-5)

In the previous parts of this series, “HBase Architecture”, we have seen RegionServers,

By Jayden Bell posted   6 months ago

HBase Architecture: MasterServers(Part-4)

In HBase Architecture: Part-1, we started our discussion of the architecture by describing RegionServers instead of the MasterServer, which may have surprised you.

By Jayden Bell posted   6 months ago

HBase Architecture: Compactions (Part-3)

Previously, we looked at RegionServers and learned how regions work. Here we examine compactions.

By Jayden Bell posted   6 months ago

HBase Architecture: Regions (Part-2)

Previously, we were introduced to the HBase architecture and examined its most basic component, the RegionServer.

By Jayden Bell posted   6 months ago

HBase Data Model: Row Keys and Column Families (Part – 1)

An HBase data store comprises one or more tables, which are indexed by row keys. Data is stored in rows with columns, and rows can have multiple versions.

By Jayden Bell posted   6 months ago

Big Data: What is Sparse Data in HBase?

As we might have guessed, Google’s BigTable distributed data storage system (DDSS) was designed to meet the demands of big data. Big data applications store massive amounts of data, but big data content is also often variable.

By marcel ethan posted   6 months ago

Big Data: Evolution of HBase

I think everybody remembers their first surfing experience on the World Wide Web; we just knew that it was an incredible innovation for the IT industry.

By Manoj Bhatt posted   6 months ago

Big Data: CAP Theorem

Selecting a database that fits our application requirements is a daunting task, since “no one size fits all”.

By Manoj Bhatt posted   6 months ago

Big Data: NoSQL Data Stores

NoSQL data stores initially embraced the notion of “Just Say No to SQL”, as a reaction to the perceived limitations of SQL-based relational databases (RDBMS).

By zack mathews posted   6 months ago

Machine Learning with Mahout and Collaborative Filtering

Machine learning refers to a field of artificial intelligence (AI) that provides tools enabling computers to improve their analysis on the basis of previous events.

By Andrew Watson posted   7 months ago

MapReduce Reducer Class

The MapReduce API has two main classes that do the map-reduce work for us: the Mapper class and the Reducer class.

By Ailsa Singh posted   7 months ago

YARN: Yet Another Resource Negotiator

With the release of Hadoop 2, however, YARN was introduced, which opened the door to a whole new world of data processing opportunities.

By Mikki Halpin posted   7 months ago

Log Data Ingestion with Flume

Some of the data that ends up in HDFS might land there through database load operations or other types of batch processes.

By Jayne Spooner posted   7 months ago

Name Node Design and its working in HDFS

Whenever a user stores a file in HDFS, the file is first broken down into data blocks, and three replicas of these data blocks are stored on slave nodes (data nodes) throughout the Hadoop cluster.

By Jayden Bell posted   7 months ago

Check pointing process in HDFS

As we already know, HDFS is a journaled file system, where new changes to files in HDFS are captured in an edit log that’s stored on the NameNode in a file named edits.

By Jayden Bell posted   7 months ago

Slave Node Server Design for HDFS

When choosing storage options, consider the impact of using commodity drives rather than more expensive enterprise-quality drives.

By zack mathews posted   7 months ago

Hadoop Distributed File System (HDFS)

HDFS is a file system unlike any that most of us have encountered before. It is not a POSIX-compliant file system, which basically means it does not provide the same guarantees as a regular file system.

By Jayne Spooner posted   7 months ago

Fraud Detection with Hadoop

Fraud is a major concern across all industries. Just name the industry (banking, insurance, government, health care, or retail, for example) and we will find fraud. At the same time.

By john rob posted   7 months ago

Big Data: Why we need Hadoop?

The software industry is full of buzzwords; it is always hard to pin down the clear meaning of “big data”.

By Tom Cruser posted   7 months ago

Big Data Strategies: Limitations with Scale-Up and Scale-Out Systems

Deploying a scale-out solution has required significant engineering effort; the system developer often needs to handcraft the mechanisms for data partitioning.

By Royce Roy posted   7 months ago

Big Data Revolution in Software Development

Look around at the technology we have today, and it's easy to come to the conclusion that it's all about data. Organizations are flooded with data.

By David Miller posted   7 months ago
