
Machine Learning with Mahout and Collaborative Filtering



Machine learning is a field of artificial intelligence (AI) that provides tools enabling computers to improve their analysis on the basis of previous events. These systems leverage historical data from past attempts at solving a task to refine the performance of future attempts at the same task. In terms of expected results, machine learning may sound a lot like that other buzzword, “data mining”; however, the former concentrates on prediction through analysis of prepared training data, whereas the latter focuses on knowledge discovery from unprocessed raw data. For this reason, machine learning relies heavily on statistical modelling techniques and draws on the fields of probability theory and pattern recognition.

Mahout is a popular open source project from Apache that provides Java libraries for distributed and scalable machine learning algorithms. These algorithms cover classic machine learning tasks such as classification, clustering, association rule analysis, and recommendations.

Although Mahout libraries are engineered to work within an Apache Hadoop context, they are also compatible with any system that supports the MapReduce framework. In addition, Mahout offers Java libraries for collections and common math operations (linear algebra and statistics) that can be used independently, without Hadoop. Mahout libraries are implemented in Java MapReduce and run on a cluster as collections of MapReduce tasks on either YARN (with MapReduce v2) or MapReduce v1.

Mahout is a continuously growing project with multiple contributors. At present, the collection of algorithms available in the Mahout libraries is by no means complete; however, the set of algorithms implemented for use continues to expand with time.

There are three key kinds of Mahout algorithms that support statistical analysis:

1.    Collaborative filtering

2.    Clustering

3.    Classification

Collaborative filtering

Mahout was specifically designed to serve as a recommendation engine, employing what is known as a collaborative filtering algorithm. Mahout combines this with the wealth of clustering and classification algorithms at its disposal to generate more precise and refined recommendations from the input data. These recommendations are typically derived from user preferences, taking the user’s behaviour into consideration. By comparing a user’s previous selections with those of other users, it is possible to identify that user’s nearest neighbours (people with a similar decision history) and predict future selections based on the behaviour of those neighbours.
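
The nearest-neighbour idea can be illustrated with a small, self-contained Java sketch. This is not Mahout's API or its implementation: the class NearestNeighbours, its methods, the sample users, and the choice of cosine similarity (with unrated items treated as zero) are all assumptions made for illustration only.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

// Hypothetical helper (not part of Mahout): ranks other users by how
// closely their rating history matches that of a target user.
class NearestNeighbours {

    // Cosine similarity between two rating maps (item -> rating).
    // Assumption: items a user has not rated count as zero.
    static double similarity(Map<String, Double> a, Map<String, Double> b) {
        double dot = 0.0;
        for (Map.Entry<String, Double> e : a.entrySet()) {
            Double other = b.get(e.getKey());
            if (other != null) {
                dot += e.getValue() * other;
            }
        }
        double norms = norm(a) * norm(b);
        return norms == 0.0 ? 0.0 : dot / norms;
    }

    // Euclidean length of a rating vector.
    static double norm(Map<String, Double> ratings) {
        double sum = 0.0;
        for (double v : ratings.values()) {
            sum += v * v;
        }
        return Math.sqrt(sum);
    }

    // Returns up to k users, most similar first, excluding the target user.
    static List<String> nearest(String user,
                                Map<String, Map<String, Double>> ratings,
                                int k) {
        Map<String, Double> target = ratings.getOrDefault(user, Map.of());
        List<String> others = new ArrayList<>();
        for (String u : ratings.keySet()) {
            if (!u.equals(user)) {
                others.add(u);
            }
        }
        others.sort(Comparator
                .comparingDouble((String u) -> similarity(target, ratings.get(u)))
                .reversed());
        return new ArrayList<>(others.subList(0, Math.min(k, others.size())));
    }

    public static void main(String[] args) {
        // Made-up sample data: user -> (item -> rating).
        Map<String, Map<String, Double>> ratings = Map.of(
                "alice", Map.of("A", 5.0, "B", 1.0),
                "bob", Map.of("A", 5.0, "B", 1.0, "C", 4.0),
                "carol", Map.of("A", 1.0, "B", 5.0));
        System.out.println(nearest("alice", ratings, 1)); // prints [bob]
    }
}
```

In this toy data set, bob rates the shared items exactly as alice does, so he ranks ahead of carol, whose ratings run the opposite way. A production recommender would normally use a tested library rather than hand-written similarity code.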

Consider a “taste profile” engine such as Netflix: an engine that recommends titles on the basis of a user’s previous ratings and viewing habits. In this example, the user’s own history is compared with the trends of users with similar tastes in the same Netflix community to build a recommendation for content that the user has not yet viewed.
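
As a rough illustration of that last step, the sketch below scores titles a user has not yet rated as a similarity-weighted average of the ratings given by like-minded users. It is a simplified assumption about how such an engine could work, not a description of Netflix's or Mahout's actual method; the class TasteProfile, the method predictUnseen, and the sample weights are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch (not Mahout's or Netflix's implementation).
class TasteProfile {

    // seen:             the user's own ratings (title -> rating)
    // neighbourRatings: neighbour -> (title -> rating)
    // weights:          neighbour -> similarity to the user (assumed > 0)
    // Returns a predicted rating for every title the user has not rated.
    static Map<String, Double> predictUnseen(
            Map<String, Double> seen,
            Map<String, Map<String, Double>> neighbourRatings,
            Map<String, Double> weights) {
        Map<String, Double> weightedSum = new HashMap<>();
        Map<String, Double> weightTotal = new HashMap<>();
        for (Map.Entry<String, Map<String, Double>> n : neighbourRatings.entrySet()) {
            double w = weights.getOrDefault(n.getKey(), 0.0);
            if (w <= 0.0) {
                continue; // ignore neighbours with no positive similarity
            }
            for (Map.Entry<String, Double> r : n.getValue().entrySet()) {
                if (seen.containsKey(r.getKey())) {
                    continue; // only score content not yet viewed
                }
                weightedSum.merge(r.getKey(), w * r.getValue(), Double::sum);
                weightTotal.merge(r.getKey(), w, Double::sum);
            }
        }
        Map<String, Double> predicted = new HashMap<>();
        for (Map.Entry<String, Double> e : weightedSum.entrySet()) {
            predicted.put(e.getKey(), e.getValue() / weightTotal.get(e.getKey()));
        }
        return predicted;
    }

    public static void main(String[] args) {
        // Made-up sample data for illustration.
        Map<String, Double> seen = Map.of("A", 5.0);
        Map<String, Map<String, Double>> neighbours = Map.of(
                "bob", Map.of("A", 5.0, "C", 4.0),
                "carol", Map.of("C", 2.0, "D", 3.0));
        Map<String, Double> weights = Map.of("bob", 0.75, "carol", 0.25);
        System.out.println(predictUnseen(seen, neighbours, weights));
    }
}
```

Title A is skipped because the user has already rated it, while C and D receive predicted scores that lean towards the opinion of the more similar neighbour.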
