
AI Case Study

Alibaba improves online ad click performance by 45% using reinforcement learning to optimise real-time bidding

Alibaba uses reinforcement learning to optimise its bidding in real-time ad auctions


Consumer Goods And Services

Retail General

Project Overview

"Our main goal is to derive the optimal bidding policy in a reinforcement learning fashion. For most performance-driven campaigns, the optimization target is to maximize the user responses on the displayed ads if the bid leads to auction winning."
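The optimisation target in this quote can be made concrete for a single auction. The sketch below is illustrative only: it assumes integer bid prices, a known discrete distribution `market_pmf` of the market price (the highest competing bid), a predicted click-through rate `pctr` for the impression, and a second-price payment rule. None of these names come from the case study.

```python
def win_probability(bid, market_pmf):
    """P(win) = P(market price <= bid); ties are counted as wins (an assumption)."""
    return sum(market_pmf[: bid + 1])


def expected_clicks(bid, pctr, market_pmf):
    """Expected reward of one auction: a click is only possible if the bid wins."""
    return pctr * win_probability(bid, market_pmf)


def expected_cost(bid, market_pmf):
    """Expected spend, assuming the winner pays the market price (second price)."""
    return sum(price * p for price, p in enumerate(market_pmf[: bid + 1]))
```

For example, with `market_pmf = [0.1, 0.2, 0.3, 0.4]` and `pctr = 0.05`, a bid of 2 wins with probability 0.6, yields 0.03 expected clicks and costs 0.8 in expectation. Raising the bid buys more clicks but spends more budget, which is the trade-off a bidding policy has to manage.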

According to the research paper, the bidding decision process was treated as a "reinforcement learning problem, where the state space is represented by the auction information and the campaign’s real-time parameters, while an action is the bid price to set. By modeling the state transition via auction competition, we build a Markov Decision Process framework for learning the optimal bidding policy to optimize the advertising performance in the dynamic real-time bidding environment. Furthermore, the scalability problem from the large real-world auction volume and campaign budget is well handled by state value approximation using neural networks."
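One way to write down the framework described above is as a finite-horizon Bellman recursion. The notation is illustrative and assumes a second-price auction: t is the number of auctions left, b the remaining budget in integer price units, x the impression's features with distribution p_x(x) (summed over a discrete set of impression types for simplicity), θ(x) the predicted click-through rate, and m(δ, x) the probability that the market price equals δ. V(t, b) is the expected number of clicks obtainable from that state.

```latex
V(t, b) = \sum_{x} p_x(x) \max_{0 \le a \le b} \Big[ \sum_{\delta=0}^{a} m(\delta, x)\big(\theta(x) + V(t-1, b-\delta)\big) + \sum_{\delta=a+1}^{\infty} m(\delta, x)\, V(t-1, b) \Big],
\qquad V(0, b) = 0 .
```

Winning at market price δ earns an expected θ(x) clicks and moves the state to (t-1, b-δ); losing moves it to (t-1, b) at no cost.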

Reported Results

"In our empirical study, the proposed solution has achieved 16.7% and 7.4% performance gains against the state-of-the-art methods on two large-scale real-world datasets. In addition, our proposed system has been deployed into a commercial RTB platform. We have performed an online A/B testing, where a 44.7% improvement in click performance was observed against a most widely used method in the industry."


"With an MDP formulation, the state transition and reward function are captured via modeling the auction competition and user click, respectively. The optimal bidding policy is then derived using dynamic programming. Furthermore, to deal with the large-scale auction volume and campaign budget, we proposed neural network models to fit the differential of the values between two consecutive states."
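The dynamic programming step and the "differential of the values between two consecutive states" can be sketched on a toy problem. Define D(t, b) = V(t, b+1) - V(t, b). Under a second-price assumption, the value of bidding a is V(t-1, b) plus the sum over δ ≤ a of m(δ)·g(δ), where g(δ) = θ(x) - Σ_{k=b-δ}^{b-1} D(t-1, k). Because extra budget never lowers the value, D is non-negative and g only decreases as δ grows, so the best bid is the largest δ ≤ b with g(δ) ≥ 0. This Python sketch tabulates V and D exactly for small T and B; the quoted approach instead fits the differential with a neural network so it scales to real budgets and auction volumes. The names `solve_rtb_mdp` and `optimal_bid`, the discrete impression types, and a market price distribution that does not depend on x are assumptions made for illustration.

```python
def optimal_bid(theta, d_prev, b, max_price):
    """Largest delta <= min(b, max_price) with g(delta) >= 0, where
    g(delta) = theta - sum(d_prev[k] for k in range(b - delta, b))."""
    bid, budget_value_lost = 0, 0.0
    for delta in range(1, min(b, max_price) + 1):
        budget_value_lost += d_prev[b - delta]  # adds D(t-1, b-delta)
        if theta < budget_value_lost:
            break  # g only decreases from here on because D >= 0
        bid = delta
    return bid


def solve_rtb_mdp(T, B, impression_types, market_pmf):
    """Tabulate V[t][b] (expected clicks) and D[t][b] = V[t][b+1] - V[t][b].

    impression_types: hypothetical list of (probability, pctr) pairs standing
        in for p_x(x) and theta(x).
    market_pmf: hypothetical m(delta) for delta = 0..len-1, summing to 1 and
        independent of x; a winning bid pays delta (second-price assumption).
    """
    max_price = len(market_pmf) - 1
    V = [[0.0] * (B + 1) for _ in range(T + 1)]  # V[0][b] = 0: no auctions left
    D = [[0.0] * B for _ in range(T + 1)]
    for t in range(1, T + 1):
        v_prev, d_prev = V[t - 1], D[t - 1]
        for b in range(B + 1):
            total = 0.0
            for p_x, theta in impression_types:
                a = optimal_bid(theta, d_prev, b, max_price)
                # value(a) = V(t-1, b) + sum_{delta <= a} m(delta) * g(delta)
                gain = sum(market_pmf[d] * (theta + v_prev[b - d] - v_prev[b])
                           for d in range(a + 1))
                total += p_x * (v_prev[b] + gain)
            V[t][b] = total
        # The quantity the quoted approach fits with a neural network at scale.
        D[t] = [V[t][b + 1] - V[t][b] for b in range(B)]
    return V, D
```

With one impression type of pCTR 0.1 and a market price of 0 or 1 with equal probability, `solve_rtb_mdp(2, 1, [(1.0, 0.1)], [0.5, 0.5])` gives `V[2][1]` of about 0.175 expected clicks, which matches enumerating the two auctions by hand (bid 1 in the first auction, then bid whatever budget is left).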



Campaign Management


"The majority of online display ads are served through real-time bidding (RTB) — each ad display impression is auctioned off in real-time when it is just being generated from a user visit. To place an ad automatically and optimally, it is critical for advertisers to devise a learning algorithm to cleverly bid an ad impression in real-time."
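To make "auctioned off in real-time" concrete, here is a minimal sketch of a single sealed-bid auction for one impression. It assumes a second-price rule with an optional reserve price, a common RTB setup that the quoted text does not itself state; `second_price_auction` and `AuctionResult` are hypothetical names. From one advertiser's point of view, the highest competing bid is the market price it must beat and, under this rule, the amount it pays on winning.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class AuctionResult:
    winner: Optional[str]  # None when no bid meets the reserve price
    price: float


def second_price_auction(bids: Dict[str, float], reserve: float = 0.0) -> AuctionResult:
    """Highest eligible bid wins and pays max(second-highest eligible bid, reserve)."""
    eligible = sorted(
        ((amount, bidder) for bidder, amount in bids.items() if amount >= reserve),
        reverse=True,  # ties are broken by bidder name here, an arbitrary choice
    )
    if not eligible:
        return AuctionResult(winner=None, price=0.0)
    _, winner = eligible[0]
    runner_up = eligible[1][0] if len(eligible) > 1 else reserve
    return AuctionResult(winner=winner, price=max(runner_up, reserve))
```

For bids of 1.2, 0.8 and 1.0 from advertisers "a", "b" and "c", advertiser "a" wins and pays 1.0, the second-highest bid.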



"Two real-world datasets are used in our experimental study, namely iPinYou and YOYI. iPinYou is one of the mainstream RTB ad companies in China. The whole dataset comprises 19.5M impressions, 14.79K clicks and 16.0K CNY expense on 9 different campaigns over 10 days in 2013. We follow [31] for splitting the train/test sets and feature engineering. YOYI is a leading RTB company focusing on multi-device display advertising in China. YOYI dataset comprises 441.7M impressions, 416.9K clicks and 319.5K CNY expense during 8 days in Jan. 2016. The first 7 days are set as the training data while the last day is set as the test data."
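The dataset figures above imply some useful averages, and the YOYI split is a simple chronological one. This sketch computes click-through rate, cost per thousand impressions (CPM) and cost per click (CPC) from the reported totals, and shows a day-based split; `summarise`, `split_by_day` and the `(day_index, ...)` record layout are hypothetical. The day split mirrors only the YOYI description, since iPinYou follows the split of [31].

```python
# Figures as reported in the quoted text: (impressions, clicks, spend in CNY).
REPORTED = {
    "iPinYou": (19.5e6, 14.79e3, 16.0e3),
    "YOYI": (441.7e6, 416.9e3, 319.5e3),
}


def summarise(impressions: float, clicks: float, spend_cny: float) -> dict:
    """Derived averages: click-through rate, cost per mille and cost per click."""
    return {
        "ctr_percent": 100 * clicks / impressions,
        "cpm_cny": 1000 * spend_cny / impressions,
        "cpc_cny": spend_cny / clicks,
    }


def split_by_day(records, n_train_days=7):
    """Chronological split in the style described for YOYI: the first
    n_train_days days are training data and the remaining day(s) are test data.
    `records` is a hypothetical list of (day_index, ...) tuples, day_index from 0."""
    train = [r for r in records if r[0] < n_train_days]
    test = [r for r in records if r[0] >= n_train_days]
    return train, test


for name, figures in REPORTED.items():
    stats = summarise(*figures)
    print(f"{name}: CTR {stats['ctr_percent']:.3f}%, "
          f"CPM {stats['cpm_cny']:.2f} CNY, CPC {stats['cpc_cny']:.2f} CNY")
```

From the reported totals, iPinYou works out to roughly 0.076% CTR, 0.82 CNY CPM and 1.08 CNY CPC, and YOYI to roughly 0.094% CTR, 0.72 CNY CPM and 0.77 CNY CPC, which shows how rare clicks are relative to impressions in both datasets.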
