AI Case Study

Ohio State University researchers use NLP to identify software security flaws reported on Twitter

Researchers from Ohio State University, Leidos, and FireEye LLC train a convolutional neural network (CNN) to identify tweets referring to software security vulnerabilities and gauge their severity.



Internet Services Consumer

Project Overview

The research is "the first study of whether natural language processing techniques can be used to analyze users’ opinions about the severity of software vulnerabilities reported online." A tweet is deemed to describe a severe threat if it meets at least one of three criteria: "(1) does the author believe that their followers should be worried about the threat? (2) is the vulnerability easily exploitable? and (3) could the threat affect a large number of users? If one or more of these criteria are met, then we consider the threat to be severe."
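
The "one or more of these criteria" rule quoted above is a logical OR over three yes/no judgments. A minimal sketch in Python, assuming a hypothetical `SeverityAnnotation` record whose field names are invented here for illustration and are not taken from the researchers' annotation scheme:

```python
from dataclasses import dataclass


@dataclass
class SeverityAnnotation:
    """Hypothetical record of one annotator's answers for a tweet."""
    author_urges_concern: bool  # criterion (1): followers should be worried
    easily_exploitable: bool    # criterion (2): vulnerability is easy to exploit
    affects_many_users: bool    # criterion (3): large number of users affected


def is_severe(a: SeverityAnnotation) -> bool:
    # Severe if one or more of the three criteria are met.
    return a.author_urges_concern or a.easily_exploitable or a.affects_many_users


print(is_severe(SeverityAnnotation(False, True, False)))   # True
print(is_severe(SeverityAnnotation(False, False, False)))  # False
```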

Reported Results

"Without much hyperparameter tuning on the development set, the convolutional neural network consistently achieves higher precision at the same level of recall as compared to logistic regression".


"Specifically, given a named entity and tweet, our goal is to estimate the probability the tweet describes a cybersecurity threat towards the entity, pthreat, and also the probability that the threat is severe." The researchers used logistic regression for a baseline and a 1D CNN for threat severity prediction.


Information Technology



"Software vulnerabilities are flaws in computer systems that leave users open to attack; vulnerabilities are generally unknown at the time a piece of software is first published, but are gradually identified over time. As new vulnerabilities are discovered and verified they are assigned CVE numbers (unique identifiers), and entered into the National Vulnerability Database (NVD). To help prioritize response efforts, vulnerabilities in the NVD are assigned severity scores using the Common Vulnerability and Scoring System (CVSS). As the rate of discovered vulnerabilities has increased in recent years, the need for efficient identification and prioritization has become more crucial."



"6,000 tweets annotated with opinions toward threat severity... To collect tweets describing cybersecurity events for annotation, we tracked the keywords “ddos” and “vulnerability” from Dec 2017 to July 2018 using the Twitter API... For threat existence classification, we randomly split our dataset of 6,000 tweets into a training set of 4,000 tweets, a development set of 1,000 tweets,
and test set of 1,000 tweets. For the threat severity classifier, we only used data from 2nd phase of annotation. This dataset consists of 1,966 tweets that were judged by the mechanical turk workers".
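
The random 4,000 / 1,000 / 1,000 split described above can be reproduced in a few lines. The sketch below uses only the Python standard library. The helper name `split_dataset`, the fixed seed, and the integer placeholders standing in for tweets are assumptions made for illustration, since the source does not say how the researchers implemented the split.

```python
import random


def split_dataset(items, n_train, n_dev, n_test, seed=0):
    """Shuffle a copy of items and cut it into train/dev/test partitions."""
    if n_train + n_dev + n_test != len(items):
        raise ValueError("split sizes must sum to the dataset size")
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    train = shuffled[:n_train]
    dev = shuffled[n_train:n_train + n_dev]
    test = shuffled[n_train + n_dev:]
    return train, dev, test


tweet_ids = list(range(6000))  # placeholder IDs, not real tweets
train, dev, test = split_dataset(tweet_ids, 4000, 1000, 1000)
print(len(train), len(dev), len(test))  # 4000 1000 1000
```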