Latest blogs in category "Hadoop"

Pig Script Interfaces and Mode of Running in Hadoop

All scripts are run on a single machine without requiring Hadoop MapReduce and HDFS. This can be useful for developing and testing Pig logic.

By Chris S posted   one year ago

Introduction to Oozie in Hadoop

Moving data and running different kinds of applications in Hadoop is great stuff, but it’s only half the battle. For Hadoop’s efficiencies to truly start paying off for us, start thinking about how we can tie together a number of these actions to for

By Jayden Bell posted   one year ago

Statistical Analysis in Hadoop

Big data is all about applying analytics to more data, for more people. To carry out this task, big data practitioners use new tools — such as Hadoop — to explore and understand data in ways that previously might not have been possible (challenges that were “too complex,” “too expensive,” or “too slow”). Some of the “bigger analytics” that we often hear mentioned when Hadoop comes up in a conversation revolve around concepts such as machine learning, data mining, and predictive analytics.

By Andrew Watson posted   one year ago

Pig Data Types in Hadoop

We have already seen the Pig architecture and the Pig Latin application flow, and we learned the Pig design principles in the previous post.

By Ailsa Singh posted   one year ago

Pig Design Principles in Hadoop

Pig Latin is the language used to write Pig programs. Pig converts a Pig Latin script into MapReduce tasks that can be run within a Hadoop cluster.

By Glen Martin posted   one year ago

Introduction to Pig in Hadoop

Java MapReduce programs and the Hadoop Distributed File System (HDFS) provide us with a powerful distributed computing framework, but they come with one major drawback — relying on them limits the use of Hadoop to Java programmers who can think in Map and Reduce terms when writing programs.

By Glen Martin posted   one year ago

Hadoop File System Commands: ls Command Output Analysis

In this post, I explain the Hadoop File System commands.

By David Miller posted   one year ago

HDFS Architecture in Hadoop

The core concept of HDFS is that it can be made up of dozens, hundreds, or even thousands of individual computers, where the system’s files are stored in directly attached disk drives.

By Jayden Bell posted   one year ago

Three modes of Hadoop Cluster Architecture

Hadoop is primarily structured and designed to be deployed on a massive cluster of networked systems or nodes, featuring master nodes (which host the services that maintain Hadoop’s storage and processing power) and slave nodes (where the data sets are stored and processed). We can, however, run Hadoop on a single computer, which is a great way to learn the basics of Hadoop by experimenting in a controlled space.

By Felix Pickles posted   one year ago

Why do we need MapReduce in Hadoop?

After we have stored piles and piles of data in HDFS (a distributed storage system spread over an expandable cluster of individual slave nodes), the first question that comes to mind is “How can we analyse or query this data?” Transferring all this data to a central node for processing isn’t going to work, since we would be waiting forever for the data to transfer over the network (not to mention waiting for everything to be processed serially). So what’s the solution? The solution is MapReduce!

By Jayne Spooner posted   one year ago

Various Data compression codecs in Hadoop

Here we list and describe some common codecs that are supported by the Hadoop framework.

By marcel ethan posted   one year ago

Hadoop Toolbox

Besides the major contribution of Amazon EMR and its related tools, many other companies also provide useful Hadoop tools.

By Felix Pickles posted   one year ago

Concept of Map Reduce in Hadoop

Though MapReduce as a technology is relatively new, it builds upon much of the fundamental work from both mathematics and computer science, particularly approaches that look to express operations that would then be applied to each element in a set of data. Indeed the individual concepts of functions called map and reduce come straight from functional programming languages where they were applied to lists of input data.

By Andrew Watson posted   one year ago
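
To make the functional-programming roots of map and reduce concrete, here is a minimal sketch in plain Python. It is not Hadoop code and does not use any Hadoop API: it only shows a map step applied to each element of a list and a reduce step that folds the results together. The function names `to_pairs` and `add_pair` and the sample input are assumptions made up for this illustration.

```python
from functools import reduce

# Hypothetical sample input: each string stands in for one line of a file.
lines = ["Hadoop stores data", "Hadoop processes data"]

def to_pairs(line):
    """Map step: turn one line into a list of (word, 1) pairs."""
    return [(word.lower(), 1) for word in line.split()]

def add_pair(totals, pair):
    """Reduce step: fold one (word, count) pair into the running totals."""
    word, count = pair
    totals[word] = totals.get(word, 0) + count
    return totals

# Apply the map function to every element of the input list.
mapped = map(to_pairs, lines)

# Flatten the per-line lists into one sequence of pairs.
pairs = [pair for line_pairs in mapped for pair in line_pairs]

# Fold all pairs into a single dictionary of word counts.
word_counts = reduce(add_pair, pairs, {})

print(word_counts)
```

In a real MapReduce job the map calls would run in parallel on the nodes that hold the data, and the framework would group the pairs by key before the reduce step. This sketch does everything in one process purely to show the two functions.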

Hadoop Distributions: EMC, Hortonworks and MapR

Besides Cloudera, there are a few other popular Hadoop distributions that are well suited to commercial and development purposes.

By Derek Honeybun posted   one year ago

Hadoop Distributed processing with MapReduce

MapReduce comprises the sequential processing of operations on distributed volumes of data sets.

By Nigel Bunyan posted   one year ago
