AI Case Study

Airbnb improves search results ranking with the use of deep learning



Internet Services Consumer

Project Overview

"The application to search ranking is one of the biggest machine learning success stories at Airbnb. Much of the initial gains were driven by a gradient boosted decision tree model. The gains, however, plateaued over time. This paper discusses the work done in applying neural networks in an attempt to break out of that plateau. We present our perspective not with the intention of pushing the frontier of new modeling techniques. Instead, ours is a story of the elements we found useful in applying neural networks to a real life product.

Starting with this background, the current paper discusses our experiences in transitioning one of the at-scale search engines on the internet to deep learning. The paper is targeted towards teams that have a machine learning system in place and are starting to think about neural networks (NNs). For teams starting to explore machine learning, we would recommend a look at [27] as well.
The search ranking model under discussion is part of an ecosystem of models, all of which contribute towards deciding which listings to present to the guest. These include models that predict the likelihood of a host accepting the guest’s request for booking, models that predict the probability the guest will rate the on-trip experience 5-star, etc. Our current discussion is focused on one particular model in this ecosystem. Considered the most complex piece of the ranking puzzle, this model is responsible for ordering the listings available according to the guest’s likelihood of booking.
A typical guest search session is depicted in Figure 1. It is common for guests to do multiple searches, clicking through some of the listings to view their details page. Successful sessions end with the guest booking a listing. The searches performed by a guest and their interactions are logged. While training, a new model has access to the logs of the current and previous models used in production. The new model is trained to learn a scoring function that assigns impressions of the booked listings in the logs at as high a rank as possible, similar to [19]. The new model is then tested online in an A/B testing framework to see if it can achieve a statistically significant increase in conversions compared to the current model."
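The passage does not name the offline metric, but one common way to measure how close a scoring function gets to placing the booked listing "as high a rank as possible" is NDCG with binary relevance. The minimal sketch below assumes each logged search has exactly one booked listing; the function names and the toy scores are hypothetical, not taken from Airbnb's pipeline.

```python
import math

def booked_listing_ndcg(scores, booked_index):
    """NDCG for one logged search that has exactly one booked listing.

    scores: model scores for the listings shown in that search (hypothetical input).
    booked_index: position in `scores` of the listing the guest booked.
    """
    # Rank listings by descending score; rank 1 is the top slot.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    rank = order.index(booked_index) + 1
    # With one relevant item the ideal DCG is 1 / log2(2) = 1,
    # so NDCG reduces to the discounted gain at the booked listing's rank.
    return 1.0 / math.log2(rank + 1)

def mean_ndcg(searches):
    """Average NDCG over (scores, booked_index) pairs from the logs."""
    return sum(booked_listing_ndcg(s, b) for s, b in searches) / len(searches)

# Two toy searches: the booked listing is ranked 1st in the first and 2nd in the second.
searches = [([0.9, 0.2, 0.4], 0), ([0.1, 0.7, 0.3], 2)]
print(f"mean NDCG: {mean_ndcg(searches):.3f}")
```

For these toy searches the script prints `mean NDCG: 0.815`, the average of 1.0 (booked listing on top) and 1/log2(3) (booked listing second). A model that moves booked listings upward raises this number.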

Reported Results


"The first architecture that we finally managed to get online was a simple single hidden layer NN with 32 fully connected ReLU activations that proved booking-neutral against the GBDT model. The NN was fed by the same features as the GBDT model. The training objective for the NN was also kept invariant w.r.t. the GBDT model: minimizing the L2 regression loss where booked listings are assigned a utility of 1.0 and listings that are not booked a utility of 0.0.
The value of the whole exercise was that it validated that the entire NN pipeline was production ready and capable of serving live traffic. Aspects of this pipeline are discussed later under the feature engineering and system engineering sections."
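A minimal NumPy sketch of the architecture and objective quoted above: one fully connected hidden layer of 32 ReLU units feeding a linear output score, trained on the L2 loss against utilities of 1.0 (booked) and 0.0 (not booked). The He-style initialization, plain full-batch gradient descent, learning rate, and the 8 toy features are assumptions for illustration, not Airbnb's production setup.

```python
import numpy as np

def init_params(n_features, n_hidden=32, seed=0):
    """Random init for a single-hidden-layer network (He-style scaling, an assumption)."""
    rng = np.random.default_rng(seed)
    return {
        "W1": rng.normal(0.0, np.sqrt(2.0 / n_features), size=(n_features, n_hidden)),
        "b1": np.zeros(n_hidden),
        "w2": rng.normal(0.0, np.sqrt(1.0 / n_hidden), size=n_hidden),
        "b2": 0.0,
    }

def forward(params, X):
    """Return pre-activations, ReLU activations, and one scalar score per listing."""
    z = X @ params["W1"] + params["b1"]
    h = np.maximum(z, 0.0)
    y = h @ params["w2"] + params["b2"]
    return z, h, y

def l2_loss(y, utility):
    """Mean squared error between predicted scores and booking utilities."""
    return float(np.mean((y - utility) ** 2))

def gd_step(params, X, utility, lr):
    """One full-batch gradient-descent update of all parameters on the L2 loss."""
    z, h, y = forward(params, X)
    dy = 2.0 * (y - utility) / X.shape[0]      # dL/dy for the mean squared error
    dz = np.outer(dy, params["w2"]) * (z > 0)  # backprop through output weights and ReLU
    params["w2"] -= lr * (h.T @ dy)
    params["b2"] -= lr * float(dy.sum())
    params["W1"] -= lr * (X.T @ dz)
    params["b1"] -= lr * dz.sum(axis=0)

def train(X, utility, steps=300, lr=0.01, seed=0):
    """Train from scratch and record the loss before and after every step."""
    params = init_params(X.shape[1], seed=seed)
    losses = [l2_loss(forward(params, X)[2], utility)]
    for _ in range(steps):
        gd_step(params, X, utility, lr)
        losses.append(l2_loss(forward(params, X)[2], utility))
    return params, losses

# Toy data: 8 hypothetical listing features; "booked" depends on two of them.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 8))
utility = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(float)  # 1.0 booked, 0.0 not booked
params, losses = train(X, utility)
print(f"L2 loss: {losses[0]:.3f} -> {losses[-1]:.3f}")
```

At serving time only `forward` is needed: listings are sorted by the score `y`. Regressing to 0/1 utilities keeps the objective identical to the GBDT's, which is what makes a booking-neutral result a clean check of the pipeline rather than of a new loss.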



Product Marketing


"The home sharing platform at Airbnb is a two-sided marketplace for hosts to rent out their spaces, referred to as listings, to be booked by prospective guests from all around the world. A typical booking starts with the guest issuing a search for homes available in a particular geographic location. The task of search ranking is to respond to the guest with an ordered list of a handful of listings from the thousands available in the inventory.
The very first implementation of search ranking was a manually crafted scoring function. Replacing the manual scoring function with a gradient boosted decision tree (GBDT) model gave one of the largest step improvements in homes bookings in Airbnb’s history, with many successful iterations to follow. The gains in online bookings eventually saturated with long spells of neutral experiments. This made the moment ripe for trying sweeping changes to the system."
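For readers less familiar with the GBDT baseline, the sketch below shows gradient boosting under the L2 regression loss the case study attributes to that model. Each round fits a depth-1 regression tree (a stump) to the current residuals, which for squared loss are the negative gradient up to a constant factor. Depth-1 trees, the shrinkage of 0.1, 50 rounds, and the toy data are simplifying assumptions; this is a generic illustration, not Airbnb's GBDT configuration.

```python
import numpy as np

def fit_stump(X, residual):
    """Best single-feature threshold split for the residuals under squared error."""
    best = None
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j]):
            left = X[:, j] <= t
            if left.all():  # a split must leave listings on both sides
                continue
            lv, rv = residual[left].mean(), residual[~left].mean()
            sse = ((residual[left] - lv) ** 2).sum() + ((residual[~left] - rv) ** 2).sum()
            if best is None or sse < best[0]:
                best = (sse, j, t, lv, rv)
    if best is None:  # every feature is constant: fall back to a constant update
        m = residual.mean()
        return 0, np.inf, m, m
    return best[1:]

def predict_stump(stump, X):
    j, t, lv, rv = stump
    return np.where(X[:, j] <= t, lv, rv)

def fit_gbdt(X, utility, n_trees=50, learning_rate=0.1):
    """L2 gradient boosting: start from the mean utility, then add shrunken stumps."""
    base = float(utility.mean())
    pred = np.full(len(utility), base)
    stumps = []
    for _ in range(n_trees):
        residual = utility - pred  # negative gradient of 0.5 * (utility - pred)**2
        stump = fit_stump(X, residual)
        pred += learning_rate * predict_stump(stump, X)
        stumps.append(stump)
    return base, learning_rate, stumps

def predict_gbdt(model, X):
    base, learning_rate, stumps = model
    return base + learning_rate * sum(predict_stump(s, X) for s in stumps)

# Toy data: 3 hypothetical listing features; "booked" depends on the first one.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
utility = (X[:, 0] > 0).astype(float)
model = fit_gbdt(X, utility)
mse = float(np.mean((predict_gbdt(model, X) - utility) ** 2))
baseline = float(np.mean((utility.mean() - utility) ** 2))
print(f"training L2 loss: constant baseline {baseline:.3f}, boosted {mse:.3f}")
```

Because each stump's leaf values are residual means, every shrunken step can only lower the training L2 loss. Production systems reach for deeper trees and libraries built for this, but the fit-the-residuals loop is the same.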