AI Case Study
Facebook optimises experiments for server performance and ranking improvement using reinforcement learning
Internet Services Consumer
"A typical approach for tuning parameters of an online system is to manually run small grid searches to optimize each parameter separately. Bayesian optimization constructs a statistical model of the relationship between the parameters and the online outcomes of interest, and uses that model to decide which experiments to run. This model-based approach has several key advantages, especially for tuning online machine learning systems.
We have used the approach described in the paper to optimize a number of systems at Facebook, and describe two such optimizations in the paper. The first was to optimize 6 parameters of one of Facebook’s ranking systems. These particular parameters were involved in the indexer, which aggregates content to be sent to the prediction models. The second example was to optimize 7 numeric compiler flags for HHVM. The goal of this optimization was to reduce CPU usage on the web servers, with a constraint on not increasing peak memory usage.
The first 30 iterations were randomly generated configurations (up to the green line), after which Bayesian optimization was used to identify parameter configurations to be evaluated. In this experiment, and many others that we’ve run, we found that Bayesian optimization was able to find better configurations that are more likely to satisfy the constraints. The compiler flag settings found in this optimization were made the new defaults in the open source HHVM."
"We have found that with our approach described in the paper, Bayesian optimization is an effective and robust tool for optimizing via noisy experiments. It has made engineers’ lives easier by replacing manual exploration of multidimensional spaces, and has improved the online performance of backend systems and machine learning infrastructure.
Bayesian optimization allows us to borrow strength across rounds of experimentation: A test of a parameter value in a continuous space gives us information about not only its outcome, but also those of nearby points. This allows us to greatly reduce the number of experiments needed to explore the space.
Better experiment outcomes: The model is able to identify parts of the space that are unlikely to provide a good outcome, and avoid running tests of those parameter values. This improves the experience inside the treatment groups."
Bayesian optimization, randomized experiments, quasi-Monte Carlo methods.
"Facebook relies on a large suite of backend systems to serve billions of people each day. Many of these systems have a large number of internal parameters. For example, the web servers that power Facebook use the HipHop Virtual Machine (HHVM) to serve requests, and HHVM has dozens of parameters that control the just-in-time compiler."
User traffic statistics, server performance metrics including CPU time, memory usage and database fetches.