
AI Case Study

The SEC's Division of Economic and Risk Analysis identifies potentially fraudulent activity by analysing regulatory filings with machine learning and natural language processing

The U.S. Securities and Exchange Commission's Division of Economic and Risk Analysis (DERA) uses machine learning to detect potential investment adviser misconduct or SEC rule violations, allowing staff to allocate limited investigative resources more efficiently.


Public And Social Sector


Project Overview

"[M]achine learning is now integrated into several risk assessment programs... DERA staff currently ingests a large corpus of structured and unstructured data from regulatory filings of investment advisers into a Hadoop computational cluster. This is one of the big data computing environments we use at the Commission, which allows for the distributed processing of very large data files." DERA has built a modelling system on this data using machine learning and NLP, and acknowledges its limits: "But the results can also generate false positives or, more colloquially, false alarms. In particular, identification of a heightened risk of misconduct or SEC rule violation often can be explained by non-nefarious actions and intent. Because we are aware of this possibility, expert staff knows to critically examine and evaluate the output of these models."

Reported Results

Specific figures are undisclosed; however: "The results are impressive. Back-testing analyses show that the algorithms are five times better than random at identifying language in investment adviser regulatory filings that could merit a referral to enforcement. But given the demonstrated ability of these machine learning algorithms to guide staff to high risk areas, they are becoming an increasingly important factor in the prioritization of examinations. This enables the deployment of limited resources to areas of the market that are most susceptible to possible violative conduct."
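The source does not say how "five times better than random" was measured. One common reading is lift: the hit rate among filings the model flags, divided by the hit rate across all filings. The sketch below works through that arithmetic under this assumption. The function name and every number in it are hypothetical and are not DERA figures.

```python
def lift_over_random(flagged_hits: int, flagged_total: int,
                     population_hits: int, population_total: int) -> float:
    """Ratio of the hit rate among flagged filings to the overall hit rate.

    A value of 1.0 means the model does no better than picking filings
    at random; a value of 5.0 means five times better than random.
    """
    if flagged_total <= 0 or population_total <= 0 or population_hits <= 0:
        raise ValueError("totals and population_hits must be positive")
    precision = flagged_hits / flagged_total          # hit rate when following the model
    base_rate = population_hits / population_total    # hit rate when picking at random
    return precision / base_rate


# Made-up back-test: 1,000 filings, 40 of which merited a referral (4%).
# The model flags 100 filings, 20 of which merited a referral (20%).
lift = lift_over_random(flagged_hits=20, flagged_total=100,
                        population_hits=40, population_total=1000)
print(round(lift, 2))
```

In this invented example, staff reviewing only the flagged filings would find referral-worthy language at five times the rate of a random sample of the same size.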


"DERA’s modeling staff takes over with a two-stage approach. In the first, they apply unsupervised learning algorithms to identify unique or outlier reporting behaviors. This includes both topic modeling and tonality analysis. Topic modeling lets the data define the themes of each filing. Tonality analysis gauges the negativity of a filing by counting the appearance of certain financial terms that have negative connotations. The output from the first stage is then combined with past examination outcomes and fed into a second stage [machine learning] algorithm to predict the presence of idiosyncratic risks at each investment adviser."
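The tonality step described above, counting financial terms with negative connotations and then looking for outlier filings, can be sketched in a few lines. This is a minimal illustration, not DERA's implementation: the term list, the z-score outlier rule, the threshold, and the sample filings are all assumptions made for this example, and the topic-modeling and second-stage prediction steps are not covered.

```python
import re
from statistics import mean, pstdev

# Hypothetical term list for illustration; the source does not say which terms are counted.
NEGATIVE_TERMS = {"loss", "default", "litigation", "penalty", "impairment"}


def tonality_score(text: str) -> float:
    """Share of word tokens in `text` that appear in NEGATIVE_TERMS."""
    tokens = re.findall(r"[a-z]+", text.lower())
    if not tokens:
        return 0.0
    hits = sum(1 for token in tokens if token in NEGATIVE_TERMS)
    return hits / len(tokens)


def flag_outliers(scores: dict, threshold: float = 1.5) -> list:
    """Return ids whose score is more than `threshold` population standard
    deviations above the mean score (an assumed, simple outlier rule)."""
    values = list(scores.values())
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return []  # all filings score the same, so none stands out
    return [fid for fid, s in scores.items() if (s - mu) / sigma > threshold]


# Invented filing excerpts, keyed by a made-up adviser id.
filings = {
    "A": "The fund reported steady growth and stable fees.",
    "B": "Client assets grew and the adviser added staff.",
    "C": "Fees were unchanged and no new products were offered.",
    "D": "The adviser disclosed a penalty, pending litigation, and a loss after default.",
}
scores = {fid: tonality_score(text) for fid, text in filings.items()}
print(flag_outliers(scores))
```

In the approach the quote describes, scores like these would be one of several first-stage outputs that are combined with past examination outcomes before a second-stage model makes its prediction.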


Legal And Compliance



"[R]egulators can benefit from understanding the likely outcomes of investor behaviors. The harder truth is that applying machine learning methods is not always simple. Outcomes are often unobservable. Fraud, for example, is what social scientists call a latent variable. You don’t see it until it’s found. So, it is more challenging for machine learning algorithms to make accurate predictions of possible fraud than shopping decisions, where retailers have access to full transaction histories—that is, complete outcomes for each action... Two years ago, tackling these types of issues at the Commission was still on the horizon."



"[S]tructured and unstructured data from regulatory filings of investment advisers into a Hadoop computational cluster".
