AI Case Study
Uber automates map data issue detection from its customer support tickets
Uber automatically detects map data issues from customer problem tickets, using an NLP system to identify map-related language so that the appropriate maps team can assess and fix each problem.
"To address the problem of large-scale ticket analysis, we built a natural language processing (NLP) platform that looks for map data-related issues in the text of tickets. This platform can then specify which specific type of map data triggered the ticket, so that the appropriate team within our maps organization can assess the issue and determine a solution.
Lacking upfront categorization, we need to deal with free-form text and detect signals associated with maps from it. In written language, people naturally phrase issues in a variety of ways when submitting a ticket. For example, one person could write that a location is “wrong” while another might describe it as “incorrect” or “off.” Understanding the multiple ways in which people can refer to the same thing is a challenge suited specifically for NLP."
Technical And Product Support
"Uber reviews and addresses feedback in customer support tickets, which are submitted by riders, driver-partners, eaters, and delivery-partners on the Uber platform. Some of these tickets point out location problems, giving us one means of identifying and fixing errors in our map data. Manually poring over the tickets to find those that point out inaccurate map data would not be scalable. Therefore, we use machine learning (ML) and big data processing to automate this workflow."
Results are undisclosed; however, the company is looking to expand the technology to non-English markets.
"The requirement for detecting errors in a map data type can be modeled as a classification problem in machine learning, in which a classification model will predict the probability that a ticket is related to errors in the map data type based on its learning from training data. We start with Logistic Regression (or Softmax, if multiclass classification fits better for our use case) for our first version algorithm.
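The quoted approach — logistic regression for binary detection, softmax when multiclass classification fits better — can be sketched in plain Python. The feature values and weights below are illustrative only; Uber does not disclose its actual features or coefficients:

```python
import math

def sigmoid(z):
    """Logistic function: maps a real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_map_issue_probability(features, weights, bias):
    """Binary logistic regression: P(ticket reports a map data error)."""
    score = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(score)

def softmax(scores):
    """Multiclass variant: one probability per map data type."""
    shifted = [s - max(scores) for s in scores]  # shift for numerical stability
    exps = [math.exp(s) for s in shifted]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical ticket encoded as a 3-dimensional feature vector.
p = predict_map_issue_probability([0.8, 0.1, 0.5], [1.4, -0.3, 0.7], -0.2)
# Hypothetical per-class scores, e.g. road segment / POI / address classes.
probs = softmax([2.0, 0.5, 0.1])
```

The trained model outputs a probability per map data type, so a ticket can be routed once any class exceeds a chosen confidence threshold.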
Logistic Regression takes numerical vectors as its input. Therefore, we need to encode a ticket, including free-form text and contact type (category), into a numerical vector. For free-form text, a naive approach would be to use a pre-defined vocabulary to encode by word frequency. That approach would make the vector sparse and require much more training data to train the classification model effectively, which is infeasible in most cases. Therefore, we want to embed the text of a ticket into a dense vector such that similar tickets will be close to each other in the embedding vector space. Each word is mapped to a vector with 300 dimensions. We use dimensional reduction technology such as t-SNE and PCA to reduce the dimension of word vectors to three so that we can visualize them in 3D.
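The contrast the post draws — a sparse word-frequency vector over a fixed vocabulary versus a dense averaged embedding — can be illustrated with a toy example. The 3-dimensional hand-made vectors stand in for the 300-dimensional Word2Vec embeddings; all words and values here are hypothetical:

```python
def bag_of_words(text, vocabulary):
    """Sparse encoding: count each vocabulary word's frequency in the text."""
    tokens = text.lower().split()
    return [tokens.count(word) for word in vocabulary]

def average_embedding(text, word_vectors, dim):
    """Dense encoding: average the per-word vectors of known words."""
    tokens = [t for t in text.lower().split() if t in word_vectors]
    if not tokens:
        return [0.0] * dim
    summed = [sum(word_vectors[t][i] for t in tokens) for i in range(dim)]
    return [s / len(tokens) for s in summed]

word_vectors = {  # toy embeddings; "wrong" and "incorrect" deliberately close
    "address":   [0.9, 0.1, 0.0],
    "wrong":     [0.1, 0.8, 0.2],
    "incorrect": [0.1, 0.7, 0.3],
}
vocab = ["address", "wrong", "incorrect", "pickup"]
sparse = bag_of_words("The address is wrong", vocab)
dense = average_embedding("The address is wrong", word_vectors, 3)
```

With real embeddings, tickets saying "wrong" and "incorrect" map to nearby points in the vector space, which is exactly why the dense representation handles varied phrasing better than the sparse one.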
After preprocessing, we use the NLP model, consisting of contact type indexer, contact type one hot encoder, Word2Vec model, and Logistic Regression model, trained and saved as a Spark pipeline model."
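The quoted pipeline (category indexer → one-hot encoder → text embedding → classifier) can be mimicked in plain Python. The function names and the toy one-dimensional "embedding" below are illustrative stand-ins for the Spark ML stages, not Uber's implementation:

```python
def index_contact_types(contact_types):
    """StringIndexer analogue: assign each category a stable integer index."""
    return {c: i for i, c in enumerate(sorted(set(contact_types)))}

def one_hot(index, size):
    """OneHotEncoder analogue: 1.0 at the category's position, 0.0 elsewhere."""
    vec = [0.0] * size
    vec[index] = 1.0
    return vec

def toy_embed(text):
    """Stand-in for the Word2Vec stage (the real pipeline yields 300 dims)."""
    return [float(len(text.split()))]

def featurize(ticket_text, contact_type, index_map):
    """Concatenate the one-hot contact type with the text embedding, since the
    Logistic Regression stage expects a single numerical vector."""
    return one_hot(index_map[contact_type], len(index_map)) + toy_embed(ticket_text)

# Hypothetical contact type categories and ticket.
index_map = index_contact_types(["driver-issue", "map-issue", "fare-issue"])
features = featurize("pickup pin is in the wrong place", "map-issue", index_map)
```

Packaging all four stages as one pipeline, as the quote describes, guarantees that the same indexing, encoding, and embedding transformations applied at training time are reapplied verbatim when scoring new tickets.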
Text from customer-submitted problem tickets; the Word2Vec model is trained on a million randomly sampled tickets.