How Data Science is reinventing fraud detection

This article explains how data science and machine learning can be used for fraud detection, by analyzing past data and identifying new patterns. To build effective fraud detection models, companies must first analyze threats, then collect and analyze their data.

Elise Andro Data Scientist

According to the Association of Professionals and Directors of Accounting and Management (APDC), internal fraud costs French companies 5% of their turnover.

As with external fraud such as cyberattacks, companies are still poorly equipped to deal with the risk of internal fraud: 6 out of 10 companies have not allocated a specific budget to fighting fraud (Euler Hermes and DFCG study, 2020).

How can businesses protect themselves from this threat?

Let's start at the beginning.

Where does fraud come from? In other words, why does an individual commit fraud?

In the 1960s, the American criminologist Donald Cressey developed the theory known as the “fraud triangle”. According to this theory, three factors can push an employee, a manager or a director into fraud:

  • Pressure: often financial, such as an expensive lifestyle or addictions, but sometimes coming from the company itself, for example through unattainable objectives.
  • Opportunity: weaknesses in the company's internal control.
  • Rationalization: reasoning that justifies the fraud and makes it acceptable (“everyone does it”, “the amount is ridiculous compared to the company’s income”, etc.).

Faced with this, the companies' first response: prevention

To deal with the risk of fraud, the first line of defense is, of course, prevention.

The first preventive measure is to map threats and draw up a clear action plan that is communicated throughout the company. For a prevention policy to work, the essential point is to involve as many people as possible and to embed its practices in the company culture.

However, this is not enough

Prevention must be accompanied by internal controls, audits and warning systems. However, manual controls are time-consuming and often ineffective. A manager who has to approve expense reports or small transactions, for example, will tend to skim over them, because these tasks are far from their daily priorities. Audits have a similar problem: having teams of auditors analyze thousands of lines of data is very slow and, above all, expensive.

This is where data science comes in

Data science, and machine learning in particular, is already widely used for problems such as spam detection, medical diagnosis and media content recommendation. Machine learning consists of analyzing data in order to understand, predict and classify information, which makes it well suited to fraud detection.

In practice, it makes it possible to analyze past frauds and identify where they originated and the environment in which they were committed. Teams can then take action, for example by applying new automatic controls or by defining points of vigilance to highlight to employees as part of the prevention plan.
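Here is a minimal illustration of analyzing past frauds. The sketch computes the fraud rate by department and by transaction type from a handful of audited transactions. The records, the field names (`department`, `kind`, `is_fraud`) and the helper `fraud_rate_by` are all hypothetical. In a real project, the input would be the company's own audit history, usually handled with a tool such as pandas.

```python
from collections import defaultdict

# Hypothetical audited history: each transaction's context and audit outcome
HISTORY = [
    {"department": "sales", "kind": "expense_report", "is_fraud": True},
    {"department": "sales", "kind": "expense_report", "is_fraud": False},
    {"department": "sales", "kind": "supplier_payment", "is_fraud": False},
    {"department": "purchasing", "kind": "supplier_payment", "is_fraud": True},
    {"department": "purchasing", "kind": "supplier_payment", "is_fraud": True},
    {"department": "purchasing", "kind": "expense_report", "is_fraud": False},
    {"department": "it", "kind": "expense_report", "is_fraud": False},
]


def fraud_rate_by(records, field):
    """Map each value of `field` to the share of its records that were fraudulent."""
    totals = defaultdict(int)
    frauds = defaultdict(int)
    for rec in records:
        key = rec[field]
        totals[key] += 1
        frauds[key] += rec["is_fraud"]  # a bool counts as 0 or 1
    return {key: frauds[key] / totals[key] for key in totals}


if __name__ == "__main__":
    for field in ("department", "kind"):
        rates = fraud_rate_by(HISTORY, field)
        # Highest-risk contexts first: candidates for new automatic controls
        for key, rate in sorted(rates.items(), key=lambda kv: kv[1], reverse=True):
            print(f"{field}={key}: {rate:.0%}")
```

The contexts with the highest rates show where new controls or points of vigilance would be most useful.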

In concrete terms, how does that work?

There are two main families of machine learning algorithms: supervised models and unsupervised models. Before going further, note that neither family gives a definitive answer about the presence or absence of fraud. Instead, they provide a probability of fraud (for example, “this transaction has an 86% chance of being fraudulent”).

Supervised models learn from so-called “labelled” data, that is, data that contains the expected result. An example is a dataset of accounting transactions that indicates, for each line, whether or not it is fraudulent. The model then learns to recognize situations similar to the fraudulent ones it was shown. This method is effective at detecting fraud that resembles past cases.
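To make the idea concrete, here is a minimal sketch of a supervised model. It is a logistic regression, trained by gradient descent on a few labelled transactions, that outputs a fraud probability rather than a yes/no answer. The features (amount, weekend posting, manual journal entry), the training data and the function names are hypothetical and chosen only for illustration. In practice, one would typically use a library such as scikit-learn (`LogisticRegression` and its `predict_proba` method) and far more data. The model is written in plain Python here so that it runs without dependencies.

```python
import math

# Hypothetical labelled history: (features, label), where label 1 = fraudulent.
# Features are illustrative assumptions: [amount in thousands of euros,
# posted on a weekend (0/1), manual journal entry (0/1)]
TRAINING_DATA = [
    ([0.2, 0, 0], 0),
    ([0.5, 0, 0], 0),
    ([1.1, 0, 1], 0),
    ([0.3, 1, 0], 0),
    ([0.8, 0, 0], 0),
    ([0.4, 0, 1], 0),
    ([4.8, 1, 1], 1),
    ([4.9, 0, 1], 1),
    ([3.5, 1, 1], 1),
    ([4.7, 1, 0], 1),
]


def sigmoid(z):
    """Numerically stable logistic function, mapping a score to (0, 1)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic_regression(data, epochs=2000, learning_rate=0.1):
    """Fit weights and bias by batch gradient descent on the log-loss."""
    n_features = len(data[0][0])
    weights, bias = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for x, y in data:
            # Prediction error for this labelled example
            error = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias) - y
            for j in range(n_features):
                grad_w[j] += error * x[j]
            grad_b += error
        weights = [w - learning_rate * g / len(data) for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / len(data)
    return weights, bias


def fraud_probability(model, x):
    """Probability that transaction x is fraudulent, according to the model."""
    weights, bias = model
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


if __name__ == "__main__":
    model = train_logistic_regression(TRAINING_DATA)
    for tx in ([4.6, 1, 1], [0.3, 0, 0]):
        print(tx, f"{fraud_probability(model, tx):.0%} chance of fraud")
```

The output is a probability, so each company chooses the threshold above which a transaction is sent for review.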

The limit of this approach is the creativity of fraudsters: when it comes to fraud, techniques constantly adapt and evolve.

This is where unsupervised models come in. Here, the model is trained on a dataset without labels and must find similarities or unusual behaviors within the data on its own. An unsupervised model can therefore uncover new types of fraud by sifting through volumes of information too large to analyze manually.
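An unsupervised detector needs no labels. The sketch below is a deliberately simple stand-in for the methods used in practice, such as scikit-learn's `IsolationForest` or clustering. It flags outliers with the modified z-score, 0.6745 × (x − median) / MAD, where MAD is the median absolute deviation, and uses the commonly cited cut-off of 3.5. The records and field names are hypothetical expense reports.

```python
import statistics

# Hypothetical, unlabelled expense reports: amount in euros, hour of posting
RECORDS = [
    {"amount": 120, "hour": 10},
    {"amount": 95, "hour": 11},
    {"amount": 130, "hour": 10},
    {"amount": 110, "hour": 11},
    {"amount": 105, "hour": 12},
    {"amount": 2400, "hour": 11},  # unusually large amount
    {"amount": 115, "hour": 2},    # posted in the middle of the night
    {"amount": 125, "hour": 10},
]


def modified_z_scores(values):
    """Modified z-score of each value: 0.6745 * (x - median) / MAD."""
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    if mad == 0:
        # More than half the values equal the median: the score is undefined
        return [0.0] * len(values)
    return [0.6745 * (v - med) / mad for v in values]


def find_anomalies(records, fields, threshold=3.5):
    """Indices of records that are outliers on at least one field."""
    flagged = set()
    for field in fields:
        scores = modified_z_scores([r[field] for r in records])
        flagged.update(i for i, s in enumerate(scores) if abs(s) > threshold)
    return sorted(flagged)


if __name__ == "__main__":
    for i in find_anomalies(RECORDS, ["amount", "hour"]):
        print("review record", i, RECORDS[i])
```

The detector is never told what fraud looks like. It only reports what stands out from the rest of the data, and an analyst then decides whether it is fraud.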

By combining the two, it becomes possible to continuously monitor and report the risk of known types of fraud while also detecting new sources of threats within the organization.
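Here is a minimal sketch of how the two families might be combined in a monitoring loop. Each transaction carries a probability from a supervised model and an anomaly flag from an unsupervised detector. Any transaction that either resembles known fraud or behaves unusually is queued for review. The `Scored` structure, the 0.5 threshold and the sample values are illustrative assumptions, not a prescribed design.

```python
from typing import List, NamedTuple, Optional, Tuple


class Scored(NamedTuple):
    transaction_id: str
    fraud_probability: float  # from a supervised model: known fraud patterns
    is_anomaly: bool          # from an unsupervised detector: unusual behavior


def alert_reason(t: Scored, threshold: float = 0.5) -> Optional[str]:
    """Why a transaction should be reviewed, or None if no alert is raised."""
    if t.fraud_probability >= threshold:
        return "resembles known fraud"
    if t.is_anomaly:
        return "unusual behavior, possible new type of fraud"
    return None


def review_queue(transactions: List[Scored], threshold: float = 0.5) -> List[Tuple[Scored, str]]:
    """Alerted transactions, highest supervised probability first."""
    alerts = []
    for t in transactions:
        reason = alert_reason(t, threshold)
        if reason is not None:
            alerts.append((t, reason))
    return sorted(alerts, key=lambda pair: pair[0].fraud_probability, reverse=True)


# Hypothetical scores for four transactions
SAMPLE = [
    Scored("T1", 0.86, False),
    Scored("T2", 0.10, True),
    Scored("T3", 0.05, False),
    Scored("T4", 0.62, True),
]

if __name__ == "__main__":
    for t, reason in review_queue(SAMPLE):
        print(t.transaction_id, reason)
```

Checking the supervised probability first means a transaction that is both similar to past fraud and unusual is reported under the known pattern. That is one possible ordering among several.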

Other techniques can of course complement these methods. Examples include automatic controls on transactions based on business rules, a risk scoring system for teams or departments to prioritize prevention and control actions, and data sampling to restrict manual controls to the riskiest transactions.
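The complementary techniques listed above can also be sketched in a few lines. The example applies three business rules, derives a risk score per team (the share of its transactions that break at least one rule) and picks a risk-based sample for manual control. The approval limit, the rules themselves and the transaction fields are hypothetical examples, not recommended controls.

```python
import datetime

APPROVAL_LIMIT = 500.0  # hypothetical: expenses at or above this need a second approval


def rule_hits(tx, seen_invoices):
    """Apply the business rules and return the label of every rule the transaction breaks."""
    hits = []
    if 0.9 * APPROVAL_LIMIT <= tx["amount"] < APPROVAL_LIMIT:
        hits.append("just below approval limit")
    if tx["date"].weekday() >= 5:  # Saturday = 5, Sunday = 6
        hits.append("posted on weekend")
    if tx["invoice"] in seen_invoices:
        hits.append("duplicate invoice number")
    seen_invoices.add(tx["invoice"])
    return hits


def audit(transactions, sample_size=2):
    seen = set()
    hits = {}
    for tx in transactions:  # processed in order, so duplicates flag the later copy
        hits[tx["id"]] = rule_hits(tx, seen)
    # Team risk score: share of the team's transactions breaking at least one rule
    totals, flagged = {}, {}
    for tx in transactions:
        totals[tx["team"]] = totals.get(tx["team"], 0) + 1
        flagged[tx["team"]] = flagged.get(tx["team"], 0) + bool(hits[tx["id"]])
    team_risk = {team: flagged[team] / totals[team] for team in totals}
    # Risk-based sampling: manual control only on the transactions with the most rule hits
    ranked = sorted(transactions, key=lambda tx: len(hits[tx["id"]]), reverse=True)
    sample = [tx["id"] for tx in ranked[:sample_size] if hits[tx["id"]]]
    return hits, team_risk, sample


D = datetime.date
TRANSACTIONS = [
    {"id": "E1", "team": "sales", "amount": 480.0, "date": D(2024, 3, 9), "invoice": "F-101"},
    {"id": "E2", "team": "sales", "amount": 120.0, "date": D(2024, 3, 11), "invoice": "F-102"},
    {"id": "E3", "team": "purchasing", "amount": 1200.0, "date": D(2024, 3, 12), "invoice": "F-101"},
    {"id": "E4", "team": "purchasing", "amount": 300.0, "date": D(2024, 3, 13), "invoice": "F-103"},
    {"id": "E5", "team": "it", "amount": 495.0, "date": D(2024, 3, 14), "invoice": "F-104"},
]

if __name__ == "__main__":
    hits, team_risk, sample = audit(TRANSACTIONS)
    print("team risk:", team_risk)
    print("manual review sample:", sample)
```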

Where should my business start?

  1. Analyze threats and map risks.
  2. Identify the areas your controls do not cover and develop a prevention plan.
  3. Collect, structure and analyze your data.

From there, you will be able to build effective models to:

  • Quickly detect frauds that have been committed
  • Identify sources of threats
  • Anticipate and take appropriate measures
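The third step, collecting and structuring data, often comes down to turning raw exports into one clean row per entity that a model can consume. The sketch below parses a hypothetical CSV export of expense reports, skips incomplete lines and builds features for each employee. The file layout, column names and chosen features are assumptions for illustration only.

```python
import csv
import io
from collections import defaultdict
from datetime import date

# Hypothetical export from an expense-management tool (collection step)
RAW_EXPORT = """employee,date,amount
alice,2024-03-04,120.50
alice,2024-03-09,480.00
bob,2024-03-05,75.00
bob,2024-03-06,80.00
bob,2024-03-07,
"""


def load_lines(text):
    """Parse the CSV, skipping rows with a missing amount (data-quality step)."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        if not row["amount"]:
            continue  # in practice, log it and fix it at the source
        rows.append({
            "employee": row["employee"],
            "date": date.fromisoformat(row["date"]),
            "amount": float(row["amount"]),
        })
    return rows


def employee_features(rows):
    """Structuring step: one feature row per employee, ready to feed a model."""
    grouped = defaultdict(list)
    for r in rows:
        grouped[r["employee"]].append(r)
    features = {}
    for employee, items in grouped.items():
        amounts = [r["amount"] for r in items]
        features[employee] = {
            "n_transactions": len(items),
            "total_amount": sum(amounts),
            "max_amount": max(amounts),
            "weekend_share": sum(r["date"].weekday() >= 5 for r in items) / len(items),
        }
    return features


if __name__ == "__main__":
    for employee, feats in employee_features(load_lines(RAW_EXPORT)).items():
        print(employee, feats)
```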
