Fraud Detection with Hadoop

Fraud is a major concern across all industries. Name any industry (banking, insurance, government, health care, or retail, for example) and we will find fraud. At the same time, we will find people who are willing to invest an incredible amount of time and money to try to prevent it. After all, if fraud were easy to detect, there wouldn’t be so much investment around it. In today’s interconnected world, the sheer volume and complexity of transactions make it harder than ever to find fraud. What used to be called “searching for a needle in a haystack” has become the task of “finding the finest needle in a stack of needles.” Ironically, although the huge volume of transactions makes fraud harder to spot, the same volume can help create much better fraud-detection models, an area where Hadoop shines.

Traditional approaches to fraud prevention aren’t particularly efficient. For instance, the management of improper payments is usually handled by data analysts auditing what amounts to a very small sample of claims, paired with requesting medical documentation from targeted submitters. The standard term for this model is pay and chase: claims are accepted and paid out, and processes then look for intentional or unintentional overpayments by way of post-payment review of those claims. (The U.S. Internal Revenue Service (IRS) uses the pay-and-chase approach on tax returns.)

So how does this fraud detection really work? Because of the shortcomings of traditional technologies, fraud models are built by sampling the data and using that sample to build a set of fraud-detection models. A model built this way can only learn the patterns that happen to appear in the sample. When we compare this approach with a Hadoop-anchored fraud department that uses the complete data set (no sampling) to generate the models, the contrast is easy to see.
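To make the sampling limitation concrete, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the claim records are synthetic, and the names `make_claims` and `fraud_counts`, the 1% sample size, and the fraud rate are hypothetical rather than taken from any real system. It only shows that a small audit sample can see no more fraudulent claims than the full data set contains, and usually sees far fewer.

```python
import random
from collections import Counter

def make_claims(n_claims=10_000, fraud_every=500, n_providers=37):
    """Build a hypothetical claim list; every `fraud_every`-th claim is fraudulent."""
    return [
        {
            "claim_id": i,
            "provider": f"P{i % n_providers:02d}",
            "is_fraud": i % fraud_every == 0,
        }
        for i in range(1, n_claims + 1)
    ]

def fraud_counts(claims):
    """Count fraudulent claims per provider."""
    return Counter(c["provider"] for c in claims if c["is_fraud"])

claims = make_claims()
rng = random.Random(42)  # fixed seed so the run is repeatable
sample = rng.sample(claims, k=len(claims) // 100)  # a 1% audit sample (assumed size)

full_counts = fraud_counts(claims)
sample_counts = fraud_counts(sample)

print("fraudulent claims in full data set:", sum(full_counts.values()))
print("fraudulent claims seen in 1% sample:", sum(sample_counts.values()))
print("providers with fraud, full data set:", len(full_counts))
print("providers with fraud, sample:", len(sample_counts))
```

Any provider whose fraudulent claims all fall outside the sample is invisible to a model trained on that sample, which is the gap a full-data-set approach is meant to close.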

The most common recurring theme across Hadoop use cases is that it helps businesses break through the glass ceiling on the volume and variety of data that can be incorporated into decision analytics. The more data we have (and the more history we store), the better our models can be. Mixing non-traditional forms of data with our set of historical transactions can make our fraud models even more robust.

For building fraud-detection models, Hadoop is well suited to three tasks. It handles volume, which means processing the full data set with no data sampling. It manages new varieties of data; examples are the inclusion of proximity-to-care-services and social circles to decorate the fraud model. It also maintains an agile environment by enabling different kinds of analysis and changes to existing models.
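As a rough illustration of what processing the full data set can look like, the following plain-Python sketch imitates the map, shuffle, and reduce steps of a MapReduce-style job on a tiny in-memory list. It is not Hadoop code and does not use any Hadoop API. The claim records, the function names `map_claim`, `shuffle`, and `reduce_provider`, and the "five times the median" flagging rule are all hypothetical and chosen only for the example.

```python
from collections import defaultdict
from statistics import median

# Hypothetical claim records: (provider_id, claim_amount)
CLAIMS = [
    ("P01", 120.0), ("P02", 95.0), ("P01", 130.0),
    ("P03", 110.0), ("P02", 105.0), ("P04", 2400.0),
    ("P03", 90.0), ("P04", 2600.0),
]

def map_claim(record):
    """Map step: emit a (key, value) pair for one claim."""
    provider, amount = record
    return provider, amount

def shuffle(pairs):
    """Group values by key, as a framework would between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_provider(provider, amounts):
    """Reduce step: total billed amount for one provider."""
    return provider, sum(amounts)

# Run every claim through map, shuffle, and reduce (no sampling).
totals = dict(
    reduce_provider(key, values)
    for key, values in shuffle(map(map_claim, CLAIMS)).items()
)

# Illustrative rule only: flag providers billing more than 5x the median total.
typical = median(totals.values())
flagged = sorted(p for p, total in totals.items() if total > 5 * typical)

print(totals)
print(flagged)
```

In a real cluster the map and reduce functions would run in parallel across many nodes, so the same per-provider totals could be computed over every claim on record instead of an audit sample.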
