Executive Summary
Under EU AML regulations, financial institutions generate a constant stream of financial crime risk signals from various processes (e.g. transaction monitoring or client due diligence reviews). Most predictive models used today to prioritize these risk signals are black boxes that provide little insight into how risks were identified. Explainable predictive models can help streamline the operational processes that address these risk signals by offering insight into the risks of individual cases, based on historical information.
The benefits of explainable models, such as the Explainable Boosting Machines algorithm, are threefold: improving operational understanding of the risks involved in a case, decreasing case review duration, and enabling case prioritization strategies. Combined, these benefits can deliver sizeable operational gains in both value and timely risk identification.
Introduction
“Why did the Transaction Monitoring system prioritize this? This transaction and this client literally have no risk factors. What am I supposed to analyse here? This does not make sense, I am wasting my time on this case!”
– Disgruntled Anti-Money Laundering analysts everywhere
In the Financial Economic Crime (FEC) domain, particularly at financial institutions, regulations require the installation of various systems which generate risk signals in a constant stream. These include Transaction Monitoring (TM) systems, Ongoing Client Due Diligence (ODD) systems, Event-driven client risk assessments (EDR), Sanctions screening systems, and more. Given that the risk tolerance for facilitating crime at financial institutions is generally very low (deliberate offenders excepted), these systems combined generate so many risk signals1 that the workforce required to address them amounts to roughly 20% of all banking staff in the Netherlands2.
Problematically, because of the low risk appetite, the risk signals tend to be of low quality. Since missing a risk signal is perceived as very costly, it is generally preferred to generate many false signals in order not to miss a single true risk signal.
Data scientists, in turn, have a field day coming up with exciting models which may capture some decent risk signals from the enormous heap of available data at financial institutions. All kinds of methods can be used: from Neural Networks to simple rule-based systems, and from Linear Regressions to Artificial Intelligence implementations (e.g. Large Language Models). Some of these methods (e.g. rules) can give a clear reason why a risk signal is generated. However, many of the better-performing models cannot, including models which are the de facto industry standard for predictive models3. Enter the best of both worlds: Explainable Boosting Machine (EBM) models.
This article will shed light on the business value an explainable predictive model can generate, using the example of predicting client money laundering risk scores. With mock data, we show the insights which can be delivered to operating staff and which would, learning from our own experience implementing such models in practice since 2022, greatly improve their understanding of case risk factors, optimize case review time, and open the door to risk-based case hibernation strategies4.
On the nature of explainability in predictive models
For any artificial intelligence model, there are various levels of model explainability to consider. We adhere to a three-way framework for explainability of models we create:
1. Process explainability – understandable model inputs, decisions and assumptions made in developing the model, and governance of the model
2. Technical explainability – explainable to stakeholders in terms of model creation, technical implementation, maintenance, and periodic updating
3. Result explainability – explaining the outputs the model generates to stakeholders and follow-up operators, including:
   - Global (generic) explanation of model outputs
   - Local (case-by-case) explanation of results
Facets 1 and 2 of the framework apply to all models. Where explainable models shine is in facet 3: the explainability of results. A distinction is made here between global and local explainability: “global explainability” refers to the reasons why a model outputs results in general, whereas “local explainability” is all about explaining why the model produced this specific result.
Many models can tell you at a global level why they output results, and may show overviews of which input data led to the decisions they made. However, that general overview may bear very little relation to the outlier case an analyst is presented with (and financial economic crime cases are often – and hopefully – outliers), making it fairly useless in operational practice (though convenient for model governance purposes). An explainable model, discussed next, generates more than that general overview.
The benefits of explainable model outputs
The benefits that a model with explainable outputs brings to operational FEC processes are threefold, and combined can result in large gains in both risk mitigation and operational performance. In areas where historically few model outputs are useful, such as Transaction Monitoring (with on average only 5-10% of all model outputs being useful)5, such improvements quickly lead to sizeable financial benefits:
- Improve operational understanding of the risks involved in a case, as the analyst sees which risk factors the model deemed relevant for each individual case. Starting the analysis with those risk factors in mind, the analyst immediately knows what behaviour to look for, rather than starting blank (or worse, with a limited checklist) as if there were no specific reason the case was flagged by the model.
- Decrease case review time, since highlighting the specific case risk factors allows an analyst to scope their review to them. Businesses can thus employ a risk-based approach (particularly in event-driven case reviews) to limit unrelated client research activities.
- Enable case hibernation strategies6, as clarity on each individual case’s risk factors also allows a risk-based approach to prioritizing – and de-prioritizing – cases for review. The latter can be taken to its end-point of ‘hibernating’ (i.e. not analysing) cases until new client risk factors or client information present themselves7.
Best in class: Explainable Boosting Machines
To reap those benefits, we need a model which allows us to globally and locally explain its results with best-in-class performance. Enter the Explainable Boosting Machine model. This article does not intend to cover the technical background of the EBM model, since both InterpretML and Microsoft already do stellar jobs at that themselves. Instead, we show what is possible on a functional level, which illustrates the business value8.
Imagine you have set up a client event-driven review (EDR) system outputting thousands of monthly review cases, and have decided to prioritize which of those to analyse first, using just 8 input data elements:
- The client’s current Financial Economic Crime CDD risk classification (CDD_risk_classification)
- The number of previous TM alerts the client received (Number_of_previous_alerts)
- The number of previous unusual transaction reports filed for the client (Number_of_OT_filings)
- The number of cash transactions conducted by the client (Number_of_Cash_transactions)
- A check if the client is related to higher risk jurisdictions (High_Risk_Country_Nexus)
- The client’s highest product risk classification (Product_risk_classification)
- A check if the client is related to a politically exposed person (PEP_status)
- A check if the client is sanctioned in any way (Sanctions_status)
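To make this concrete, a mock dataset along these lines can be assembled in a few lines of Python. This is a sketch only: every value and distribution below is invented for illustration, and only the column names match the 8 elements above. The 15% share of ‘interesting’ cases mirrors the ratio used for the mock data in this article.

```python
# Sketch: assembling an illustrative mock EDR dataset with the 8 features above.
# All values and distributions are invented, not real client data.
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
n = 5_000

df = pd.DataFrame({
    "CDD_risk_classification": rng.integers(1, 4, n),   # 1 = low .. 3 = high
    "Number_of_previous_alerts": rng.poisson(1.5, n),
    "Number_of_OT_filings": rng.poisson(0.3, n),
    "Number_of_Cash_transactions": rng.poisson(4.0, n),
    "High_Risk_Country_Nexus": rng.integers(0, 2, n),   # 0/1 flag
    "Product_risk_classification": rng.integers(1, 4, n),
    "PEP_status": rng.integers(0, 2, n),                # 0/1 flag
    "Sanctions_status": (rng.random(n) < 0.01).astype(int),
})

# Mock target: ~15% of cases labelled as 'interesting' (elevated risk),
# driven by a noisy combination of a few of the features.
risk = (
    0.8 * df["Product_risk_classification"]
    + 0.5 * df["Number_of_OT_filings"]
    + 0.1 * df["Number_of_Cash_transactions"]
    + rng.normal(0, 1, n)
)
df["elevated_risk"] = (risk > np.quantile(risk, 0.85)).astype(int)
```

Any labelled history of EDR outcomes with comparable features would serve the same purpose in a real implementation.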
With our mock dataset, we train our EBM model to show us how these 8 input data features relate to the historical classification of EDRs into those which received elevated risk scores (interesting cases) versus those which received normal risk scores (not interesting cases)9. The algorithm tells us the average weight of each input data factor across all predictions, allowing us to understand which input data was important in making the risk predictions:
Graph 1: Global Explainable boosting machines data weights on EDR data

What is interesting is that out of our 8 input data elements, the model deems 3 (PEP_status, Sanctions_status, High_Risk_Country_Nexus) unimportant as stand-alone contributors to risk, and only 5 (highlighted orange) important. However, the EBM model is able to discern non-linear relationships as well as interaction relationships between input data. So we do see one of them (High_Risk_Country_Nexus) return as a predictor of risk in combination with other input data variables.
When it comes to case-by-case explanations, we can retrieve even more information from the model. Graph 2 contains the single-case (local explanation) view of how its prediction came to be.
Graph 2: Local Explainable boosting machines output for a single EDR case

In this case, the EBM model predicted that the case should feature some risk (predicted class: 1.0, with a calculated probability of 0.857 that the risk is present). It tells us that the primary reasons are the product risk classification (top orange bar) and the interaction of previous unusual transaction reports with cash triggers (second orange bar from the top). It also shows us that there were some factors which actually lower the calculated risk of the case (in blue).10
Is the above view the best method to present outcomes to your analysts? Probably not, as embedding the results in your case-management system with a thorough explanation is a more logical route. However, it does show exactly how the model came to its predictions, aiding the analyst in their course of action and limiting their time spent on this case.
Similarly, this level of case-by-case transparency also allows financial institutions to explore their willingness to accept the automated processing of low-risk cases. Unlike most other models, an EBM can document minutely why certain cases are predicted to be of lower risk, which allows financial institutions to apply their financial crime risk appetite at a detailed level across output cases.
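In its simplest form, such a risk-appetite cut is a threshold on the model’s per-case probability. The sketch below is purely illustrative: the `scored` DataFrame, the 0.10 threshold, and the routing labels are all assumptions, not a recommended policy, and any real deployment would follow the controlled-deployment caveats in footnote 7.

```python
# Sketch: routing scored cases by an illustrative risk-appetite threshold.
# The DataFrame, threshold, and labels are assumptions for demonstration.
import pandas as pd

scored = pd.DataFrame({
    "case_id": ["C-001", "C-002", "C-003", "C-004"],
    "risk_probability": [0.04, 0.86, 0.09, 0.31],  # per-case model output
})

# Threshold to be set per the institution's documented risk appetite.
AUTO_CLOSE_THRESHOLD = 0.10

scored["route"] = scored["risk_probability"].apply(
    lambda p: "auto_close" if p < AUTO_CLOSE_THRESHOLD else "analyst_review"
)
print(scored)
```

Because each auto-closed case carries its own local explanation, the rationale for every automated decision remains auditable after the fact.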
A note on EBM model performance compared to other models
At this point, critical readers will start asking things like “okay, so what are the downsides of this method?”. Naturally, there are drawbacks to the EBM model, yet they are no different from those underlying this family of predictive models11, which includes the “XGBoost” method commonly used to predict risk across the financial economic crime domain. Such drawbacks include the need to control model bias to avoid discrimination, and the need for careful input data curation.
Indeed, in practical financial crime system implementations we have found the EBM model to generate predictive results at least on par with industry-standard black-box models, similar to what Microsoft Research found in their own benchmarks and what academic inquiries conclude.
“So, that means it’s crazy expensive then, with a host of licensing fees?” Well, no: the EBM model is just as free and open source as other models commonly used in the financial crime domain. Truth be told, the real reason EBMs are not the financial crime domain’s standard predictive model seems to be a simple lack of familiarity with them.
Conclusion
Given the potential personal, perceptual, and financial impact of financial crime investigation cases on clients, financial institutions should always strive for maximum understanding of model outputs to be absolutely sure of the need to review the cases such models generate.
Here, using an explainable model which can explain the factors behind each prediction on a case-by-case level has the advantages of greatly improving an analyst’s understanding of case risk factors, optimizing case review time, and opening the door to case hibernation strategies.
Throughout this article, we have shown that the EBM model is well suited to such tasks and, given its benefits, deserves consideration for deployment in anti-money laundering, sanctions investigations, and other anti-financial crime production processes.
Interested in how to improve the explainability of your models and achieve the benefits described here?
Footnotes
- A total of 1,896,176 reports were filed with the Netherlands FIU in 2022. Knowing that only 5-10% of signals tends to be reported to an FIU (see footnote 5), we can assume the number of risk signals handled in 2022 by institutions with an AML reporting requirement to be roughly 10x higher. ↩︎
- Sourced from the 2022 Annual Report of ABN Amro, where a rough headcount of over 4,000 anti-money laundering staff can be found on p41, versus the total FTE count of 20,038 noted on p49. ↩︎
- We are aware that there are numerous ways to attempt to explain a black-box model, such as calculating SHAP values. However, all of such methods come with their associated drawbacks. So, if an innately explainable model performs on par with a black-box model which requires the introduction of fallible methods in an attempt to explain it, why choose the black-box model in the first place? See Rudin (2019) for more on this topic. ↩︎
- A case hibernation strategy in the domain of Financial Economic Crime refers to the periodic risk-based de-prioritization of cases which can be reasonably estimated to be of a lower potential money laundering, sanctions, or economic crime risk. The case is ‘hibernated’, only to be reviewed when new risk signals on the related client, transaction, or counterparty arise. ↩︎
- Based on our own experience on Transaction Monitoring improvement programs in the Netherlands, and internationally based on the Bank of International Settlement’s 2023 Aurora project findings (p19). ↩︎
- see footnote 2. ↩︎
- Such implementations, however, should only take place in line with a financial institution’s risk appetite, when a predictive model has proven itself highly reliable in practical testing, and in a controlled deployment featuring periodic assessments. ↩︎
- For those interested to test out the model itself in Python, check out an EBM example script with some mock data for cryptocurrency and financial market predictions on our GitHub to get you going quick. ↩︎
- In our mock dataset, the ratio of ‘interesting’ cases which were deemed of elevated risk is set to 15%, which is in line with the 2022 German FIU rate of reported cases actually processed (p19) (chosen given its stricter reporting requirement than the Netherlands FIU). ↩︎
- The model also displays an intercept which is constant across all cases and negative, meaning that for this model, cases are by default more likely to be considered as not containing risk. ↩︎
- Being the category of Generalized Additive Models. ↩︎