
AI Case Study

A large European bank identifies issues in post-trade operations by analysing shared mailboxes with machine learning

A large European bank is using re:infer's technology to identify inefficiencies from communication data. Using supervised and unsupervised learning, the system analysed 300 of the bank's shared mailboxes to identify issues in operations. It then conducted a more targeted analysis of the fixed income mailboxes to identify and quantify specific issues that needed attention. The bank identified several areas for improvement and implemented targeted solutions to improve efficiency.


Financial Services


Project Overview

"In post-trade operations, re:infer is being used to discover manual processes, quantify failures, highlight root-cause errors, detect incorrect reference data and provide a bridge to downstream automation.

The first step is to get a holistic view of the communication flows across an organization and understand how email is being processed. This can be done quickly by building a broad taxonomy that describes the post-trade processes and coarse-grained exception types, and applying it to the client’s email data.

Business users simply create a taxonomy for their data using a multi-stage learning process. A taxonomy consists of business-relevant labels that are used to tag each conversation (together with confidence levels) in order to drive downstream processing.
In the case of post-trade operations, a taxonomy will typically describe a hierarchical process model that identifies the function and exception type, e.g. “Settlement > Fails > Counterparty Lack”.

At a large European bank, all communication flows are handled manually and are entirely independent of the core IT systems and structured data flow.

In this instance, the taxonomy was applied to 300 shared mailboxes, representing around 40% of the total email corpus for operations and over 5 TB of data.

Key observations:

* A very large volume of historical data with a total of 3 billion emails and nearly 150 TB of data stored on internal email servers, around 10% of which is operations data spread across 3,000–4,000 shared mailboxes.
* A high volume of emails with around 500k to 1 million emails per day being handled across the middle and back office teams.
* Manual triage with heavy use of shared mailboxes for manually categorizing, prioritizing and routing emails, accounting for around 5–10% of the effort in each operations team.
* Communication is mostly internal with around 70% of emails being sent between internal teams and evidence of multiple handoffs between shared mailboxes.
* Teams organized around mailbox structures rather than being aligned to a consistent model such as client/product/function.
* Unbounded client communication channel with over 500 unique email addresses used by operations to communicate with a single client over a four-week period.
* Data duplication due to heavy use of distribution lists, causing data to be copied to multiple teams and individuals. In one case an external client cc’d over 20 separate distribution lists when sending an email.
* Data deletion: 50% of operations email data is from the past 2 years, indicating that a large volume of older data is deleted after that period.
* Inappropriate use of mailboxes for document management with some shared mailboxes having in excess of 20,000 subfolders.
* Very poor search performance impacting the usability of workstations and degrading end-user productivity.

It is clear to see how deeply ingrained email and chat applications can become in large organizations. In this specific case, the operations team were using Microsoft Outlook as their primary tool for workflow, document management and enterprise search — none of which it was designed for.

The next step is to focus on specific business areas that are generating a high number of exceptions and to refine the taxonomy to understand root causes.

One area that was quickly identified as problematic in this case study was EMEA fixed income settlements and the related treasury functions.

Analysis of the fixed income mailboxes helped identify (and quantify) specific issues that needed to be addressed:

* Client behaviors that cause operational issues e.g. lack of cash/stock to deliver, late settlement instructions etc. These issues were mostly rectified by the business working together with the clients, but in some cases led to clients being off-boarded.
* A significant number of inter-company trades failing settlement due to inventory management issues, specifically a lack of cash/stock to deliver on the intended settlement date (ISD). These issues led to a new initiative to improve the real-time accuracy of main-firm positions and front-to-back visibility of positions through shared infrastructure.
* Sub-optimal collateral and funding processes due to inaccurate inventory and matching/settlement fails not being included in short term funding ladders. These issues led to a large volume of status inquiries over chat and email from the treasury trading desks, finance, and collateral teams.
* Ineffective trade matching process on repo flows, with most errors only being identified through the settlement instruction process, which is much later in the trade lifecycle than is necessary. For the most part, trade matching issues are managed manually over email with the counterparties and custodians.
* On average, around 8–10% of trades settle late, i.e. settlement occurs later than the intended settlement date (ISD), despite most settlement instructions being matched on time.
* Late settling trades cause operational backlogs, contrary to the widely held belief that operational processes are volume insensitive. Ordinarily, late settling trades split roughly into thirds: around one-third settle within 1–3 days, one-third within 4–10 days and one-third after more than 10 days. During times of stress, the average time to remediate late settling trades increases.
* Custodians send over 3,000 error messages per day, or around 500k messages over a 6-month period. Around one-third of these were booking issues, one-third related to settlement and inventory management problems, and one-third were largely due to processing latency or other types of unnecessary waste.
* Trade lifecycle events sent over email present risk, for example trade confirmations/allocations and corporate action instructions. These would typically go through a manual triage process and then lead to manual data entry into core IT systems or market matching utilities. This led to an improvement program to either move lifecycle events onto systemic flows (e.g. FIX) or to automate the triage process and call downstream robots for processing.
* Sub-optimal process design, e.g. when a client lacked the cash/stock to deliver, the original process was to let the trade fail settlement. The process was subsequently changed so that the trading desk asked the client for partial settlement and covered the shortfall through the repo desk, thereby avoiding settlement failure and improving the accuracy of positions.
* Sub-optimal organizational design creating a high number of handoffs between operational teams, e.g. unmatched settlement instructions sent to the settlement fails team would always create a request to the confirmations team to obtain the trade details agreed with the client, and any trade amendments required would then create further requests to the middle office and trade support teams. These handoffs were eliminated by re-organizing the teams and mailbox structures around front-to-back product flow, so each team member could manage a settlement fail independently."
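
The late-settlement figures above (8–10% of trades settling late, split roughly into thirds across three lateness bands) can be reproduced from trade records with very little code. The following is a minimal sketch only, not the bank's or re:infer's method: the function names, the input shape (pairs of ISD and actual settlement date), the use of calendar days rather than business days, and the sample data are all assumptions made for illustration.

```python
from collections import Counter
from datetime import date


def lateness_bucket(isd: date, settled: date) -> str:
    """Classify one trade by calendar days between ISD and actual settlement.

    Assumption: calendar days are used; the source does not say whether
    the bands are measured in calendar or business days.
    """
    days_late = (settled - isd).days
    if days_late <= 0:
        return "on time"
    if days_late <= 3:
        return "1-3 days"
    if days_late <= 10:
        return "4-10 days"
    return ">10 days"


def late_settlement_profile(trades):
    """Return the late rate and each band's share of the late trades.

    `trades` is a list of (isd, settled) date pairs (hypothetical shape).
    """
    counts = Counter(lateness_bucket(isd, settled) for isd, settled in trades)
    total = sum(counts.values())
    late = total - counts["on time"]
    shares = {
        bucket: (counts[bucket] / late if late else 0.0)
        for bucket in ("1-3 days", "4-10 days", ">10 days")
    }
    return {"late_rate": late / total if total else 0.0, "late_shares": shares}


# Made-up sample: 7 on-time trades and 3 late ones, one per band.
isd = date(2024, 1, 10)
sample = [(isd, isd)] * 7 + [
    (isd, date(2024, 1, 12)),  # 2 days late
    (isd, date(2024, 1, 15)),  # 5 days late
    (isd, date(2024, 1, 25)),  # 15 days late
]
profile = late_settlement_profile(sample)
print(profile)
```

Tracking the same profile week by week is one way to see the stress effect described above, where the share of trades in the longer bands grows as backlogs build.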

Reported Results

"The above issues were all identified from communication data using re:infer, and led to a broad program of targeted improvements that encompassed:

* Review of problematic clients
* Bug fixes to underlying systems
* Shared front-to-back views of trade status and positions (“Trade Tracker”)
* Process improvements and redesign
* Automation of manual triage process
* Automatic triggering of downstream robots
* Automatic case creation in downstream workflow systems
* Phasing out of manual reporting
* Development of intelligent alerting and risk indicators

As the initial analysis approach was entirely data-driven, the impact of these changes could be measured against historical levels."
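
To illustrate how hierarchical labels with confidence levels could drive the automated triage, robot triggering and case creation listed above, here is a minimal sketch. It does not use or represent re:infer's API, which the source does not describe: the `Prediction` type, the `ROUTES` table, the target names and the 0.8 threshold are all hypothetical. Only the label format ("Settlement > Fails > Counterparty Lack") comes from the text.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Prediction:
    """One predicted label for a conversation (hypothetical shape)."""
    label: str         # e.g. "Settlement > Fails > Counterparty Lack"
    confidence: float  # between 0.0 and 1.0


def label_path(label: str) -> tuple[str, ...]:
    """Split a hierarchical label into its levels."""
    return tuple(part.strip() for part in label.split(">"))


# Hypothetical routing table keyed by label prefix; targets are made-up names.
ROUTES = {
    ("Settlement", "Fails"): "settlement_fails_robot",
    ("Settlement",): "settlement_case_queue",
}


def route(pred: Prediction, threshold: float = 0.8) -> str:
    """Pick a downstream target, falling back to manual triage.

    Low-confidence predictions and labels with no configured route
    stay with a human. The most specific matching prefix wins.
    """
    if pred.confidence < threshold:
        return "manual_triage"
    path = label_path(pred.label)
    for depth in range(len(path), 0, -1):
        target = ROUTES.get(path[:depth])
        if target is not None:
            return target
    return "manual_triage"


print(route(Prediction("Settlement > Fails > Counterparty Lack", 0.93)))
print(route(Prediction("Settlement > Fails > Counterparty Lack", 0.55)))
```

Matching on the longest label prefix means a new, more specific exception type added to the taxonomy is still handled by its parent's route until a dedicated one is configured.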


"Re:infer combines unsupervised and supervised machine learning techniques to automatically detect concepts and intents in communication data, avoiding the need for complex heuristics."





"As banks seek new efficiency levers they have started to explore how new technology can accelerate opportunity discovery and transformation efforts.

Communication channels such as email and chat contain a rich history of the process exceptions and manual work being performed across an organization."



"A key differentiating feature is that re:infer only requires 15–20 data samples to detect new concepts with high confidence."
