
Latest blogs in category "Big Data"

HBase Architecture: Introduction and RegionServers (Part 1)

The reason that folks such as chief financial officers are excited by the thought of using Hadoop is that it lets us store massive amounts of data across a cluster of low-cost commodity servers, which is music to the ears of financially minded people.

By Jayden Bell posted   5 months ago

Big Data: HBase as Distributed, Persistent, Multidimensional Sorted Map

By now we are well acquainted with the power-packed characteristics and nature of HBase.

By marcel ethan posted   5 months ago

Big Data: ACID versus BASE Data Stores

Back in our school days, almost all of us studied the difference between “ACID” and “BASE” in chemistry.

By zack mathews posted   5 months ago

Clustering and Classification with Mahout

Unlike the supervised learning method described earlier for Mahout’s recommendation engine feature, clustering is a kind of unsupervised learning, where the labels of the data points are not known ahead of time and must be inferred from the data without

By Tom Cruser posted   5 months ago

MapReduce Driver Class:

Although the mapper and reducer implementations are all we need to perform the MapReduce job, there is one more piece of code necessary in MapReduce:

By Andrew Watson posted   5 months ago

YARN’s Resource Management

The key component of YARN is the Resource Manager, which governs and maintains all the data processing resources in the Hadoop cluster. In other words, the Resource Manager is a dedicated scheduler whose task is to assign resources to requesti

By Mikki Halpin posted   5 months ago

Data Replication in Hadoop: Slave node disk failures (Part 2)

Hadoop was originally designed to store data at petabyte scale, with any potential limitations to scaling out minimized.

By Elena Glibart posted   5 months ago

Writing and Reading Data from HDFS

To create new files in HDFS, a set of processes has to take place (refer to the adjoining figure to see the components involved

By Jayden Bell posted   5 months ago

HDFS Federation and High availability

Before Hadoop 2 came into the picture, Hadoop clusters had to live with the fact that the NameNode placed limits on the degree to which they could scale.

By Jayden Bell posted   5 months ago

Storing Data in HDFS

Just to be clear, storing data in HDFS is not entirely the same as saving files on your personal computer. In fact, quite a number of differences exist — most having to do with optimizations that make HDFS able to scale out easily across thousands of slave nodes and perform well with batch workloads.

By Glen Martin posted   5 months ago

Hadoop Distributions: Cloudera

We have seen that the Hadoop ecosystem has several component parts, all of which exist as their own Apache projects. Since Hadoop has become extremely popular and widely used, it is also going through a few significant changes, and the various versions of these open source community components may not be fully compatible with the rest of the components.

By naomi burke posted   5 months ago

Pros and Cons of Hadoop System

As with any tool, it's important to understand when Hadoop is a good fit for the problem in question.

By zack mathews posted   5 months ago

Graph Analysis with Hadoop

We are all already familiar with log data, relational data, text data, and binary data, but we will soon hear about another form of information: graph data.

By ezra heywood posted   5 months ago

Image Classification with Hadoop

Image classification starts with the notion that we build a training set and that computers are equipped to recognize and categorize what they’re processing. In the same way that having more data helps generate better fraud detection and risk-predictive models, it also helps systems classify images better.

By ben reitman posted   5 months ago

Social Sentiment Analysis with Hadoop

Social sentiment analysis is easily the most overrated of the Hadoop applications, which should be no surprise, given that we live in a world with a constantly connected and expressive population.

By ben reitman posted   5 months ago

Risk Modelling with Hadoop

Risk modelling is another major use case that’s energized by Hadoop. We will find that it closely resembles the fraud detection use case in that it is a model-based discipline.

By jacob rasel posted   5 months ago

Data Warehousing with Hadoop

Data warehouses are stretched to their limits, trying to cope with growing demands on their finite resources.

By Allen Scott posted   5 months ago

Big Data: Data Volumes and Varying Data Structures

It would not be wrong to say that we are all now living in an advanced state of the information age. Data is being generated and stored electronically by networked sensors in very large volumes, at an accelerating pace, and in mind-boggling varieties.

By Elena Glibart posted   5 months ago

Log Data Analysis with Hadoop

Log analysis is a common practice that Hadoop handles easily. Indeed, the early applications of Hadoop were for the large-scale analysis of clickstream logs: logs that record data about the web pages that users browse

By Samuel Fernandes posted   5 months ago

Big Data Strategies: Share Nothing Approach

Anyone with children will have spent considerable time teaching the little ones that it's good to share.

By Samuel Fernandes posted   6 months ago

Big Data Strategies: Classic data processing systems (Scale-Up and Scale-Out)

The fundamental reason that big data mining systems were rare and expensive is that scaling a system to process large data sets is very difficult.

By Jayden Bell posted   6 months ago
