AI Case Study
The US National Security Agency targets a false positive rate of 0.008% for identifying potential terrorist accomplices using machine learning
The US National Security Agency (NSA) uses a random forest, a machine learning method, to identify potential couriers delivering information to terrorists in Pakistan. The model is based on phone records and other information from a mass surveillance program covering the country's cellular networks. The NSA targets a 50% false negative rate and a 0.008% false positive rate with this method.
Public And Social Sector
From Ars Technica: "According to the documents, SKYNET engages in mass surveillance of Pakistan's mobile phone network, and then uses a machine learning algorithm on the cellular network metadata of 55 million people to try and rate each person's likelihood of being a terrorist. Pakistan has a population of around 192 million people, with about 120 million cellular handsets in use at the end of 2012, when the SKYNET presentation was made. The NSA analysed 55 million of those mobile phone records. SKYNET's classification algorithm analyses the metadata and ground truths, and then produces a score for each individual based on their metadata. The objective is to assign high scores to real terrorists and low scores to the rest of the innocent population." The machine learning algorithm is trained on known couriers.
From Ars Technica: "In 2014, the former director of both the CIA and NSA proclaimed that 'we kill people based on metadata.' Now, a new examination of previously published Snowden documents suggests that many of those people may have been innocent. Last year, The Intercept published documents detailing the NSA's SKYNET programme."
The NSA claims it targets a 50% false negative rate and 0.008% false positive rate. Ars Technica points out, however, that "quality evaluation is especially important in this case, as the random forest method is known to overfit its training sets, producing results that are overly optimistic."
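To put the quoted rates in perspective, the arithmetic can be sketched as follows. The 55 million records, the 0.008% false positive rate and the 50% false negative rate come from the text above; the number of real targets is a purely hypothetical assumption used for illustration, as are the function names. Treating all 55 million people as innocent when counting false positives is an approximation.

```python
# Figures quoted in the case study.
RECORDS_ANALYSED = 55_000_000       # mobile phone records analysed
FALSE_POSITIVE_RATE = 0.008 / 100   # 0.008% expressed as a fraction
FALSE_NEGATIVE_RATE = 0.50          # 50%

# ASSUMPTION: not from the source, chosen only to illustrate the effect.
HYPOTHETICAL_TRUE_TARGETS = 1_000


def expected_false_positives(innocent_count, fpr):
    """Expected number of innocent people scored above the threshold."""
    return innocent_count * fpr


def precision(true_targets, innocent_count, fpr, fnr):
    """Share of flagged people who are real targets."""
    true_positives = true_targets * (1 - fnr)
    false_positives = innocent_count * fpr
    return true_positives / (true_positives + false_positives)


flagged_innocents = expected_false_positives(RECORDS_ANALYSED, FALSE_POSITIVE_RATE)
share_correct = precision(
    HYPOTHETICAL_TRUE_TARGETS,
    RECORDS_ANALYSED - HYPOTHETICAL_TRUE_TARGETS,
    FALSE_POSITIVE_RATE,
    FALSE_NEGATIVE_RATE,
)

print(f"Expected innocent people flagged: {flagged_innocents:,.0f}")
print(f"Share of flagged people who are real targets: {share_correct:.1%}")
```

Even at the targeted rate, 0.008% of 55 million is about 4,400 innocent people flagged. Under the hypothetical assumption of 1,000 real targets, only around one in ten flagged people would be a real target.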
From Ars Technica: "The random forest method uses random subsets of the training data to create a 'forest' of decision 'trees,' and then combines those by averaging the predictions from the individual trees. SKYNET's algorithm takes the 80 properties of each cellphone user and assigns them a numerical score—just like a spam filter."
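The general idea in the quote above (random subsets of the training data, one tree per subset, scores averaged across trees) can be illustrated with a minimal sketch. This is not SKYNET's implementation, which is not public. It assumes depth-one trees (stumps) to stay short, three synthetic features rather than 80, and randomly generated toy data; all function names are hypothetical.

```python
import random


def fit_stump(rows, labels, feature_ids):
    """Fit a depth-one tree: the split with the lowest weighted Gini impurity."""
    best = None  # (impurity, feature, threshold, left_mean, right_mean)
    for f in feature_ids:
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue
            pl, pr = sum(left) / len(left), sum(right) / len(right)
            gini = len(left) * 2 * pl * (1 - pl) + len(right) * 2 * pr * (1 - pr)
            if best is None or gini < best[0]:
                best = (gini, f, t, pl, pr)
    if best is None:  # no usable split: fall back to the base rate
        base = sum(labels) / len(labels)
        return (0, float("inf"), base, base)
    return best[1:]


def fit_forest(rows, labels, n_trees=25, n_features=2, seed=0):
    """Train each tree on a bootstrap sample and a random subset of features."""
    rng = random.Random(seed)
    n = len(rows)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        feats = rng.sample(range(len(rows[0])), n_features)
        forest.append(
            fit_stump([rows[i] for i in idx], [labels[i] for i in idx], feats)
        )
    return forest


def score(forest, row):
    """Average the per-tree predictions into one score between 0 and 1."""
    votes = [pl if row[f] <= t else pr for f, t, pl, pr in forest]
    return sum(votes) / len(votes)


# Synthetic toy data: the label depends only on feature 0.
data_rng = random.Random(42)
rows = [[data_rng.random() for _ in range(3)] for _ in range(200)]
labels = [1 if r[0] > 0.5 else 0 for r in rows]

forest = fit_forest(rows, labels)
high = score(forest, [0.9, 0.5, 0.5])
low = score(forest, [0.1, 0.5, 0.5])
print(f"score for a row resembling the positive class: {high:.2f}")
print(f"score for a row resembling the negative class: {low:.2f}")
```

A production random forest would grow deeper trees and would be evaluated on held-out data, which matters here because scores measured on the training set tend to look better than they are.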
From Ars Technica: "In addition to processing logged cellular phone call data (so-called 'DNR' or Dialled Number Recognition data, such as time, duration, who called whom, etc.), SKYNET also collects user location, allowing for the creation of detailed travel profiles. Turning off a mobile phone gets flagged as an attempt to evade mass surveillance. Users who swap SIM cards, naively believing this will prevent tracking, also get flagged (the ESN/MEID/IMEI burned into the handset makes the phone trackable across multiple SIM cards).... the machine learning algorithm uses more than 80 different properties to rate people."
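The point about SIM swapping can be made concrete with a small sketch: because the handset identifier stays the same, grouping records by handset reveals every SIM card used in it. The records, the identifiers and the function names below are all hypothetical, and the use of a per-SIM identifier (labelled IMSI here) is an assumption not stated in the source.

```python
from collections import defaultdict

# Hypothetical records: (handset identifier, SIM identifier).
records = [
    ("imei-A", "sim-1"),
    ("imei-A", "sim-2"),  # same handset, different SIM card
    ("imei-B", "sim-3"),
]


def sims_per_handset(records):
    """Collect the set of SIM identifiers observed for each handset."""
    seen = defaultdict(set)
    for imei, imsi in records:
        seen[imei].add(imsi)
    return seen


def handsets_with_multiple_sims(records):
    """Return handsets that appeared with more than one SIM card."""
    return sorted(
        imei for imei, sims in sims_per_handset(records).items() if len(sims) > 1
    )


print(handsets_with_multiple_sims(records))  # ['imei-A']
```

In this toy data, swapping the SIM in handset A links both SIM cards to the same device instead of hiding it.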