Image classification starts with the idea that we build a training set so that computers can learn to recognize and categorize what they are processing. Just as more data helps produce better fraud-detection and risk-prediction models, more data also helps a system classify images more accurately. Historically, however, this required so much processing software and hardware that it limited the scale of deployments. Image classification is a hot topic in the Hadoop world because, until Hadoop came along, no mainstream technology made this kind of expensive processing possible at such a massive and cost-efficient scale.

In this use case, the data sets serve as the training sets, and the data models are the classifiers. Classifiers recognize features or patterns within sound, image, or video and classify them accordingly. Classifiers are built and iteratively refined from training sets so that they score well on both precision (a measure of exactness) and recall (a measure of coverage). Hadoop is well suited to image classification because it provides a massively parallel processing ecosystem not only for building classifier models (iterating over training sets) but also for scaling, almost without limit, as those classifiers run across larger collections of unstructured data. Social media and multimedia services such as YouTube, Facebook, Instagram, and Flickr are all sources of unstructured binary data. We can use Hadoop to scale the processing of massive volumes of stored images and video for multimedia semantic classification.
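To make the precision and recall scores mentioned above concrete, here is a minimal sketch of how they are computed from a classifier's results. The counts are hypothetical (imagine a "cat" classifier evaluated against a labelled test set); only the two formulas come from the standard definitions.

```python
def precision(tp, fp):
    # Exactness: of everything the classifier labelled positive,
    # what fraction was actually positive?
    return tp / (tp + fp)

def recall(tp, fn):
    # Coverage: of everything that was actually positive,
    # what fraction did the classifier find?
    return tp / (tp + fn)

# Hypothetical evaluation run: the classifier tagged 100 images as "cat";
# 80 of those really were cats (true positives), 20 were not (false
# positives), and it missed 40 actual cats (false negatives).
tp, fp, fn = 80, 20, 40

print(precision(tp, fp))  # 0.8
print(recall(tp, fn))     # ≈ 0.667
```

Iterating over the training set means adjusting the model until both numbers are acceptably high; improving one often costs the other, which is why both are tracked.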

Although this explanation focuses on image analysis, Hadoop can be applied to audio or voice analytics as well. For instance, we could build a system that instantly distinguishes the whisper of the wind from the whisper of a human voice, or the sound of human footsteps in the surrounding parkland from that of animals and other wildlife.

We understand that this may have a bit of a Star Trek feel to it, but real implementations exist today. In fact, IBM has built one of the largest image classification systems on the planet: the IBM Multimedia Analysis and Retrieval System (IMARS).

Image classification has many applications, and being able to compute it at huge scale with Hadoop opens up many options for data analysis, because other tools can consume the classification information generated for the images. The technique is also valuable in the health industry, where early testing has shown it can help reduce the number of missed or inaccurate diagnoses, saving time, money, and, most of all, lives.

  Modified On Mar-14-2018 02:54:45 AM
