AI Case Study
Researchers at the University of Bologna observe that their reinforcement-learning pricing algorithms can learn to collude to raise prices
Researchers at the University of Bologna developed two pricing algorithms using reinforcement learning. When they set them loose in a controlled environment, they observed that the algorithms, although autonomous, learned to respond to each other's behaviour and pushed prices above the level either would have set alone. The algorithms had no prior knowledge of the environment and learnt to collude purely by trial and error, which lends weight to existing concerns about automated pricing.
Industry
Project Overview
"Within a pricing context, these systems are given a goal such as to maximize overall profit; then they experiment with different strategies in a simulated environment to find the optimal one. A new paper now suggests that these systems could pose a huge problem: they quickly learn to collude.
Researchers at the University of Bologna in Italy created two simple reinforcement-learning-based pricing algorithms and set them loose in a controlled environment. They discovered that the two completely autonomous algorithms learned to respond to one another’s behavior and quickly pulled the price of goods above where it would have been had either operated alone.
“What is most worrying is that the algorithms leave no trace of concerted action,” the researchers wrote."
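To make the setup concrete, the following is a minimal sketch of two independent learners repeatedly setting prices in a shared market. It is not the researchers' code: the source says only that the algorithms were "reinforcement-learning-based", so tabular Q-learning is assumed here, and the price grid, cost, demand curve, and every function and parameter name are hypothetical choices made for illustration. Each agent sees only last period's pair of prices and its own profit; there is no channel between the agents and no instruction to cooperate.

```python
import random

PRICES = [1.0, 1.5, 2.0, 2.5]  # hypothetical discrete price grid
COST = 1.0                     # hypothetical marginal cost


def demand(price):
    """Toy linear demand (an assumption, not taken from the source)."""
    return max(0.0, 3.0 - price)


def profits(p0, p1):
    """The cheaper seller serves the whole market; equal prices split it."""
    if p0 < p1:
        return (p0 - COST) * demand(p0), 0.0
    if p1 < p0:
        return 0.0, (p1 - COST) * demand(p1)
    half = 0.5 * (p0 - COST) * demand(p0)
    return half, half


def train(episodes=20000, alpha=0.1, gamma=0.9, seed=0):
    """Run two independent tabular Q-learners against each other."""
    rng = random.Random(seed)
    n = len(PRICES)
    # One Q-table per agent: q[agent][state][action].
    # A state encodes last period's pair of price indices.
    q = [[[0.0] * n for _ in range(n * n)] for _ in range(2)]
    state = 0
    for t in range(episodes):
        # Exploration rate decays linearly to a small floor.
        eps = max(0.01, 1.0 - t / (0.8 * episodes))
        actions = []
        for agent in range(2):
            if rng.random() < eps:
                actions.append(rng.randrange(n))   # explore
            else:
                row = q[agent][state]
                actions.append(row.index(max(row)))  # exploit
        rewards = profits(PRICES[actions[0]], PRICES[actions[1]])
        next_state = actions[0] * n + actions[1]
        for agent in range(2):
            # Standard Q-learning update using only the agent's own profit.
            best_next = max(q[agent][next_state])
            old = q[agent][state][actions[agent]]
            target = rewards[agent] + gamma * best_next
            q[agent][state][actions[agent]] = old + alpha * (target - old)
        state = next_state
    return q, state


if __name__ == "__main__":
    q_tables, final_state = train()
    n = len(PRICES)
    print("final prices:", PRICES[final_state // n], PRICES[final_state % n])
```

In this toy market, undercutting drives a one-shot competitive price toward cost, while the joint-profit-maximising price is 2.0. Whether a given run settles above the competitive level depends on the seed and the learning parameters, so the sketch illustrates the mechanism and does not reproduce the paper's results.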
Reported Results
"“They learn to collude purely by trial and error, with no prior knowledge of the environment in which they operate, without communicating with one another, and without being specifically designed or instructed to collude.” This risks driving up the price of goods and ultimately harming consumers."
Technology
Function
Background
"Pricing algorithms have become ubiquitous in online retail as automated systems have grown increasingly affordable and easy to implement. But while companies like airlines and hotels have long used machines to set their prices, pricing systems have evolved. They have moved from rule-based programs to reinforcement-learning ones, where the logic of deciding a product’s price is no longer within a human’s control."
Benefits
Data