Machine Learning with
Mahout and Collaborative Filtering
Machine learning refers to a branch of artificial intelligence (AI) that provides
tools enabling computers to improve their analysis on the basis of previous events.
These computer systems leverage historical data from past attempts at solving a
task in order to refine the performance of future attempts at identical tasks.
In terms of expected results, machine learning may sound a lot like
that other buzzword “data mining”; however, the former concentrates
on prediction via an analysis of prepared training data, whereas the latter focuses on knowledge
discovery from unprocessed raw data. For this reason, machine learning
relies heavily upon statistical modelling techniques and draws from the fields of
probability theory and pattern recognition.
Mahout is a popular open source project from Apache that provides Java libraries for
distributed and scalable machine-learning algorithms. These algorithms cover
classic machine learning tasks such as classification, clustering,
association rule analysis, and recommendations.
Although Mahout libraries are engineered to work within an Apache Hadoop context, they
are also compatible with any system supporting the MapReduce framework. For
instance, Mahout offers Java libraries for Java collections and common math
operations (linear algebra and statistics) that can be used independently
without Hadoop. Mahout libraries are implemented in Java MapReduce and run on a
cluster as collections of MapReduce tasks on either YARN (with MapReduce v2)
or MapReduce v1.
Mahout is a continuously growing project with multiple contributors. At present, the
collection of algorithms available in the Mahout libraries is by no means
complete; however, the collection
of algorithms implemented for use continues to expand with time.
There are three key kinds of Mahout algorithms that support statistical analysis:
collaborative filtering, clustering, and classification.
Mahout was specifically designed to serve as a recommendation engine, employing
what is known as a collaborative filtering algorithm. Mahout integrates the
wealth of clustering and classification algorithms at its disposal to generate
more precise and refined recommendations on the basis of input data. These
recommendations are often applied against user preferences, taking into
consideration the behaviour of the user. By comparing a user’s previous
selections with those of other users, it is possible to identify the nearest neighbours (persons with a
similar decision history) to that user and predict future selections based on
the behaviour of the neighbours.
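The nearest-neighbour idea described above can be illustrated with a minimal sketch in plain Java. This is not Mahout's implementation and uses no Mahout classes; the class name `SelectionRecommender`, the choice of Jaccard similarity, and the sample users and items are all assumptions made for illustration. It ranks other users by how much their selection history overlaps with the target user's, then counts which unseen items the closest neighbours selected.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical illustration class; not part of Mahout.
public class SelectionRecommender {

    // Jaccard similarity: shared selections divided by all distinct selections.
    static double jaccard(Set<String> a, Set<String> b) {
        if (a.isEmpty() && b.isEmpty()) return 0.0;
        Set<String> shared = new HashSet<>(a);
        shared.retainAll(b);
        Set<String> all = new HashSet<>(a);
        all.addAll(b);
        return (double) shared.size() / all.size();
    }

    // Returns up to k users whose selection history is most similar to the target's.
    static List<String> nearestNeighbours(String target, Map<String, Set<String>> history, int k) {
        Set<String> mine = history.get(target);
        List<String> others = new ArrayList<>(history.keySet());
        others.remove(target);
        // Sort by descending similarity to the target user.
        others.sort((u, v) -> Double.compare(
                jaccard(mine, history.get(v)), jaccard(mine, history.get(u))));
        return new ArrayList<>(others.subList(0, Math.min(k, others.size())));
    }

    // Counts how many neighbours selected each item the target has not selected yet.
    static Map<String, Integer> recommend(String target, Map<String, Set<String>> history, int k) {
        Set<String> mine = history.get(target);
        Map<String, Integer> votes = new TreeMap<>();
        for (String neighbour : nearestNeighbours(target, history, k)) {
            for (String item : history.get(neighbour)) {
                if (!mine.contains(item)) votes.merge(item, 1, Integer::sum);
            }
        }
        return votes;
    }

    // Made-up selection histories, used only for this example.
    static Map<String, Set<String>> sampleHistory() {
        Map<String, Set<String>> h = new LinkedHashMap<>();
        h.put("alice", new HashSet<>(Arrays.asList("A", "B", "C")));
        h.put("bob", new HashSet<>(Arrays.asList("A", "B", "C", "D")));
        h.put("carol", new HashSet<>(Arrays.asList("A", "B", "E")));
        h.put("dave", new HashSet<>(Arrays.asList("X", "Y")));
        return h;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> history = sampleHistory();
        System.out.println(nearestNeighbours("alice", history, 2)); // [bob, carol]
        System.out.println(recommend("alice", history, 2));         // {D=1, E=1}
    }
}
```

In this toy data, bob and carol share most of alice's selections, so the items they chose that alice has not (D and E) become the candidate recommendations, while dave's unrelated history is ignored.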
Consider a “taste profile” engine such as Netflix, an engine that recommends ratings on
the basis of a user’s previous scoring and viewing habits. In this
example, the behavioural patterns for a user are compared with the user’s
history, and with the trends of users with similar tastes belonging to the same
Netflix community, to build a recommendation for content not yet viewed by the
user in question.
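To make the rating scenario concrete, the following plain-Java sketch predicts a score for a title the user has not yet rated by taking a similarity-weighted average of other users' ratings. It is only an illustration under stated assumptions: it does not reflect how Netflix or Mahout actually compute recommendations, the class name `RatingPredictor` is hypothetical, cosine similarity over co-rated titles is just one possible similarity measure, and the ratings are invented.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration class; not part of Mahout or any real service.
public class RatingPredictor {

    // Cosine similarity over the titles both users have rated; 0 if none in common.
    static double similarity(Map<String, Double> a, Map<String, Double> b) {
        double dot = 0.0, normA = 0.0, normB = 0.0;
        for (Map.Entry<String, Double> e : a.entrySet()) {
            Double other = b.get(e.getKey());
            if (other != null) {
                dot += e.getValue() * other;
                normA += e.getValue() * e.getValue();
                normB += other * other;
            }
        }
        if (normA == 0.0 || normB == 0.0) return 0.0;
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    // Similarity-weighted average of other users' ratings for the given title.
    // Returns NaN when no similar user has rated the title.
    static double predict(String target, String title, Map<String, Map<String, Double>> ratings) {
        Map<String, Double> mine = ratings.get(target);
        double weighted = 0.0, totalWeight = 0.0;
        for (Map.Entry<String, Map<String, Double>> e : ratings.entrySet()) {
            if (e.getKey().equals(target)) continue;
            Double theirs = e.getValue().get(title);
            if (theirs == null) continue;       // this user has not rated the title
            double w = similarity(mine, e.getValue());
            if (w <= 0.0) continue;             // ignore users with no overlap
            weighted += w * theirs;
            totalWeight += w;
        }
        return totalWeight == 0.0 ? Double.NaN : weighted / totalWeight;
    }

    // Made-up ratings on a 1 to 5 scale, used only for this example.
    static Map<String, Map<String, Double>> sampleRatings() {
        Map<String, Map<String, Double>> r = new HashMap<>();
        Map<String, Double> u = new HashMap<>();
        u.put("M1", 5.0); u.put("M2", 1.0);
        Map<String, Double> v = new HashMap<>();
        v.put("M1", 5.0); v.put("M2", 1.0); v.put("M3", 4.0);
        Map<String, Double> w = new HashMap<>();
        w.put("M1", 1.0); w.put("M2", 5.0); w.put("M3", 2.0);
        r.put("u", u); r.put("v", v); r.put("w", w);
        return r;
    }

    public static void main(String[] args) {
        // User "u" has not rated M3; the estimate leans towards "v", whose tastes match.
        System.out.println(predict("u", "M3", sampleRatings()));
    }
}
```

With this sample data the predicted rating for M3 is roughly 3.44: it sits closer to the 4 given by the like-minded user than to the 2 given by the user whose tastes differ.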