Personalised fraud detection
Fraud is expensive, affects common resources and prices, and is therefore important to detect and prevent. Soft fraud, the exaggeration of legitimate claims, is diffuse and difficult to spot. A sustainable welfare system and efficient insurance operations require effective measures to limit fraud. Tax avoidance and tax evasion are other important types of fraud, and we are also interested in detecting money laundering. We develop adaptive tools that use “all data”, including payment logs, relational networks and other available digital records, while complying with strict privacy protection regulations.
A further objective is to combine the many available fraud detection models optimally, exploiting the strengths of each predictor while mitigating its weaknesses, and still obtaining a coherent quantification of the uncertainty in the fraud prediction. A related objective is the development of new individualised anti-money laundering solutions. Currently, suspicious transactions are detected by labour-intensive, semi-manual approaches that are restricted to customers who differ significantly from the norm. Since the volume of banking transactions is steadily increasing, automated, intelligent tools are needed. The aim is to significantly increase the number of correctly identified money laundering transactions.
Fraud detection can be seen as a regression/forecasting problem in which fraud (true/false) is the response, possibly together with the associated economic loss, and there are very many covariates; once interactions are included, the number of covariates becomes huge. Typically, only a few cases are investigated, while a large number of fraud cases remain undetected. The objective is to produce a trustworthy probability of fraud for each case.
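As a minimal sketch of this regression view, assuming standardised numeric covariates, the code below expands p covariates with all pairwise interactions (p + p(p-1)/2 columns, so 50 covariates already give 1,275) and fits an L2-penalised logistic regression by plain gradient descent to obtain a fraud probability per case. The function names and the choice of ridge logistic regression are illustrative assumptions, not the project's actual model, and the data are synthetic.

```python
from itertools import combinations

import numpy as np


def add_pairwise_interactions(X):
    """Append every product x_i * x_j (i < j) to the original covariates."""
    pairs = [X[:, i] * X[:, j] for i, j in combinations(range(X.shape[1]), 2)]
    return np.column_stack([X] + pairs) if pairs else X


def fit_ridge_logistic(X, y, lam=1.0, lr=0.1, n_iter=500):
    """L2-penalised logistic regression (intercept unpenalised), fitted by gradient
    descent on the mean log loss plus lam / (2n) * ||beta||^2."""
    n, p = X.shape
    beta, b0 = np.zeros(p), 0.0
    for _ in range(n_iter):
        prob = 1.0 / (1.0 + np.exp(-(X @ beta + b0)))
        resid = prob - y
        beta -= lr * (X.T @ resid + lam * beta) / n
        b0 -= lr * resid.mean()
    return beta, b0


def fraud_probability(X, beta, b0):
    """Predicted probability of fraud for each row of X."""
    return 1.0 / (1.0 + np.exp(-(X @ beta + b0)))


# Synthetic example: 200 cases, 50 covariates, roughly 5% labelled as fraud.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 50))
y = (rng.random(200) < 0.05).astype(float)
Z = add_pairwise_interactions(X)  # 50 + 1225 = 1275 columns
beta, b0 = fit_ridge_logistic(Z, y)
probs = fraud_probability(Z, beta, b0)
print(Z.shape, probs.min(), probs.max())
```

With more columns than cases, as here, the penalty is what keeps the problem well posed; whether the resulting probabilities are trustworthy still has to be checked by calibration on held-out cases.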
Ensemble methods for fraud detection
Combining results obtained by different statistical and machine learning procedures can be beneficial. We constructed a toolbox for combining fraud forecasting models that exploits the time series aspect of the data, the available covariates and the probabilistic confidence in the classification obtained by each individual model. The toolbox has been tested on data from the Norwegian Tax Administration.
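The combination rule used in the toolbox is not described here. As one hedged illustration of how a combination can exploit the time ordering of the data, the sketch below uses exponentially weighted averaging: after each observed outcome, every model's weight is multiplied by exp(-eta * log loss), so models whose probability forecasts were more accurate in earlier periods gain influence. The function name `aggregate_forecasts` and the parameter `eta` are assumptions for illustration.

```python
import math


def aggregate_forecasts(forecasts, outcomes, eta=1.0, eps=1e-12):
    """Combine K probability forecasts over time with exponentially weighted averaging.

    forecasts: list of T rows, each holding K fraud probabilities (one per model).
    outcomes:  list of T observed labels (1 = fraud, 0 = not fraud).
    Returns the combined probability at each time step and the final weights.
    """
    k = len(forecasts[0])
    weights = [1.0 / k] * k
    combined = []
    for probs, y in zip(forecasts, outcomes):
        # The forecast for this period only uses weights learned from earlier periods.
        combined.append(sum(w * p for w, p in zip(weights, probs)))
        # Penalise each model by its log loss on the newly observed outcome.
        losses = [-math.log(max(p if y == 1 else 1.0 - p, eps)) for p in probs]
        weights = [w * math.exp(-eta * loss) for w, loss in zip(weights, losses)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return combined, weights


# Model A tracks the outcomes well, model B does not.
forecasts = [[0.9, 0.4], [0.1, 0.6], [0.8, 0.5], [0.2, 0.7]]
outcomes = [1, 0, 1, 0]
combined, weights = aggregate_forecasts(forecasts, outcomes)
print(combined, weights)
```

With `eta=1` and the log loss, this update coincides with Bayesian model averaging over the individual forecasters. Note that such simple pooling does not model the dependence between the forecasts.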
Combining dependent probability forecasts
However, different fraud forecasting methods are likely to be dependent, in particular if they are based on the same covariates. Ensemble methods that ignore this dependence cannot recognise when the same forecast is simply repeated several times, which leads to spurious confidence in the ensemble forecast. The idea is to construct a joint model for the outcome (fraud/not fraud) and the forecasts that captures their dependencies, based on a pair-copula construction. We developed the pair-copula methodology over the last decade, and it is now used with great success in many areas. The new method will be tested on simulated data and on real data from our partners.
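To make the pair-copula construction concrete, here is a minimal sketch of the density of a three-dimensional D-vine with order 1-2-3, in which every building block is a bivariate Gaussian copula: c(u1, u2, u3) = c12(u1, u2) · c23(u2, u3) · c13|2(h(u1 | u2), h(u3 | u2)), where h is the conditional distribution function of a pair copula. The Gaussian family, the function names and the use of three continuous variables are simplifying assumptions; a joint model including the binary fraud outcome would need pair copulas with a discrete margin and, in practice, other copula families.

```python
import math
from statistics import NormalDist

import numpy as np

STD_NORMAL = NormalDist()


def pair_density(u, v, rho):
    """Bivariate Gaussian copula density c(u, v; rho)."""
    x, y = STD_NORMAL.inv_cdf(u), STD_NORMAL.inv_cdf(v)
    one_minus = 1.0 - rho * rho
    expo = -(rho * rho * (x * x + y * y) - 2.0 * rho * x * y) / (2.0 * one_minus)
    return math.exp(expo) / math.sqrt(one_minus)


def h(u, v, rho):
    """h-function: conditional distribution C(u | v) of a Gaussian pair copula."""
    x, y = STD_NORMAL.inv_cdf(u), STD_NORMAL.inv_cdf(v)
    return STD_NORMAL.cdf((x - rho * y) / math.sqrt(1.0 - rho * rho))


def dvine3_density(u1, u2, u3, rho12, rho23, rho13_2):
    """Density of the D-vine 1-2-3 built from Gaussian pair copulas."""
    tree1 = pair_density(u1, u2, rho12) * pair_density(u2, u3, rho23)
    tree2 = pair_density(h(u1, u2, rho12), h(u3, u2, rho23), rho13_2)
    return tree1 * tree2


def gaussian_copula_density(u, corr):
    """Full multivariate Gaussian copula density, used only as a sanity check."""
    x = np.array([STD_NORMAL.inv_cdf(ui) for ui in u])
    quad = x @ (np.linalg.inv(corr) - np.eye(len(x))) @ x
    return float(np.exp(-0.5 * quad) / math.sqrt(np.linalg.det(corr)))


# Variables 1 and 2 are strongly dependent, as two forecasts built on similar covariates.
rho12, rho23, rho13_2 = 0.9, 0.6, 0.3
rho13 = rho12 * rho23 + rho13_2 * math.sqrt((1 - rho12**2) * (1 - rho23**2))
corr = np.array([[1.0, rho12, rho13], [rho12, 1.0, rho23], [rho13, rho23, 1.0]])
u = (0.7, 0.8, 0.75)
vine_value = dvine3_density(*u, rho12, rho23, rho13_2)
gauss_value = gaussian_copula_density(u, corr)
print(vine_value, gauss_value)
```

With Gaussian pair copulas the vine reproduces the trivariate Gaussian copula with correlation rho13 = rho12·rho23 + rho13|2·sqrt((1 - rho12²)(1 - rho23²)), which the last lines check numerically. The benefit of the construction is that each pair can instead use a copula family chosen to fit that particular dependence.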
Network analysis for fraud detection
Fraud can spread directly or indirectly from one fraudster to another, so knowledge about the social relations between users/customers can help uncover fraud. Understanding the structure of such networks and how they evolve over time is expected to significantly improve fraud detection models. We build these networks and extract useful characteristics to produce better fraud forecasts and to provide additional insight into how fraud spreads. We are currently working with insurance fraud data and will later turn to tax avoidance and money laundering data.
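A minimal sketch of the kind of network characteristics that can be extracted, assuming relations are given as an undirected edge list and that some customers are already flagged as fraudsters. The helper names and the features (`degree`, `fraud_neighbours`, `fraud_share`, `fraud_two_hops`) are hypothetical illustrations of direct and indirect exposure to fraud, not the project's feature set.

```python
from collections import defaultdict


def build_graph(edges):
    """Undirected graph stored as an adjacency set per node."""
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def network_features(graph, known_fraud):
    """Per-node features: degree, number and share of neighbours flagged as fraud,
    and number of flagged nodes exactly two steps away (indirect exposure)."""
    features = {}
    for node, nbrs in graph.items():
        flagged = sum(1 for n in nbrs if n in known_fraud)
        # Nodes reachable in two steps, excluding the node itself and its neighbours.
        second = set()
        for n in nbrs:
            second |= graph[n]
        second -= nbrs | {node}
        features[node] = {
            "degree": len(nbrs),
            "fraud_neighbours": flagged,
            "fraud_share": flagged / len(nbrs) if nbrs else 0.0,
            "fraud_two_hops": sum(1 for n in second if n in known_fraud),
        }
    return features


edges = [("A", "B"), ("B", "C"), ("C", "D"), ("A", "E")]
feats = network_features(build_graph(edges), known_fraud={"C"})
print(feats["A"], feats["B"])
```

Such per-node features can be appended to the covariates of any fraud forecasting model; recomputing them on successive snapshots of the network is one simple way to follow how it evolves over time.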
A machine learning model for suspicious transactions
Most supervised anti-money laundering methods assume that suspicious activities are labelled as such by experts, while legitimate activities are simply sampled at random from the complete population of activities. This is motivated by the fact that the chance of a randomly chosen activity being suspicious is almost zero. We challenge this view by 1) modelling suspicious transactions directly instead of via accounts or parties, and 2) showing that the current practice of excluding activities labelled as non-suspicious by experts leads to significantly worse performance. The method is being transferred to DNB, and the approach will be published.
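The two labelling schemes can be sketched as follows, assuming transactions are identified by sortable ids. `build_training_set` is a hypothetical helper: with `exclude_cleared=True` it mimics the common practice of training only on expert-flagged suspicious transactions plus a random sample of the population, and with `exclude_cleared=False` it also keeps the transactions that experts investigated and labelled non-suspicious as negatives. For a like-for-like comparison, the random sample is drawn in both cases only from transactions no expert has labelled.

```python
import random


def build_training_set(suspicious, expert_cleared, population, n_random,
                       exclude_cleared, seed=0):
    """Assemble (transaction_id, label) pairs for a supervised AML model (hypothetical).

    suspicious:      ids experts labelled suspicious (label 1).
    expert_cleared:  ids experts investigated and labelled non-suspicious (label 0).
    population:      ids of all transactions; random negatives are drawn from here.
    exclude_cleared: True drops the expert-cleared cases, as in common practice.
    """
    rng = random.Random(seed)
    labelled = set(suspicious) | set(expert_cleared)
    # Random negatives come only from transactions no expert has labelled.
    candidates = sorted(t for t in population if t not in labelled)
    negatives = rng.sample(candidates, min(n_random, len(candidates)))
    data = [(t, 1) for t in suspicious] + [(t, 0) for t in negatives]
    if not exclude_cleared:
        # Also keep the cases experts looked at and cleared, as labelled negatives.
        data += [(t, 0) for t in expert_cleared]
    return data


population = list(range(100))
current = build_training_set([1, 2], [3, 4, 5], population, 10, exclude_cleared=True)
proposed = build_training_set([1, 2], [3, 4, 5], population, 10, exclude_cleared=False)
print(len(current), len(proposed))
```

Because both calls use the same seed, they share the same random negatives and differ only in whether the expert-cleared transactions are included, so any performance difference between models trained on them can be attributed to that choice.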
Local Gaussian discrimination with discrete and continuous variables
We generalise classical discriminant analysis (LDA and QDA) by replacing the Gaussian class distributions with locally Gaussian ones. This relaxes the dependence structure from global pairwise dependence to local pairwise dependence. Because the method relies on pairwise dependence, discrete and categorical variables can also be combined with the continuous variables in a unified framework. The method will be evaluated on simulated data and on real data from one or more of the partners, and the approach will be published.
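For reference, here is a minimal sketch of the classical QDA classifier that the local Gaussian method generalises: each class gets one global mean vector and covariance matrix, and a case is assigned to the class with the largest discriminant score log π_k - ½ log|Σ_k| - ½ (x - μ_k)ᵀ Σ_k⁻¹ (x - μ_k). LDA is the special case with one covariance matrix shared by all classes. The local Gaussian version, in which these parameters vary with the point x, is not shown; the class name and the synthetic data are illustrative assumptions.

```python
import numpy as np


class QDA:
    """Classical quadratic discriminant analysis: one global Gaussian per class."""

    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.params_ = []
        for k in self.classes_:
            Xk = X[y == k]
            mean = Xk.mean(axis=0)
            cov = np.cov(Xk, rowvar=False)
            _, logdet = np.linalg.slogdet(cov)
            log_prior = np.log(len(Xk) / len(X))
            self.params_.append((mean, np.linalg.inv(cov), logdet, log_prior))
        return self

    def decision_scores(self, X):
        """Score log pi_k - 0.5 log|S_k| - 0.5 * squared Mahalanobis distance, per class."""
        scores = []
        for mean, precision, logdet, log_prior in self.params_:
            d = X - mean
            maha = np.einsum("ij,jk,ik->i", d, precision, d)
            scores.append(log_prior - 0.5 * logdet - 0.5 * maha)
        return np.column_stack(scores)

    def predict(self, X):
        return self.classes_[np.argmax(self.decision_scores(X), axis=1)]


rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 1.0, (100, 2)), rng.normal(4.0, 0.5, (100, 2))])
y = np.array([0] * 100 + [1] * 100)
model = QDA().fit(X, y)
print(model.predict(np.array([[0.0, 0.0], [4.0, 4.0]])))
```

Here each class's dependence between variables is summarised by a single covariance matrix for the whole space; the local Gaussian generalisation lets these pairwise dependencies change from one region of the data to another.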