top of page

AI Case Study

Researchers improve film box office success prediction using machine learning transduction algorithms

Researchers determine that predicting box office success for movies using machine learning is enhanced when considering that films share characteristics like director which make them more or less successful.


Public And Social Sector

Education And Academia

Project Overview

"In the recent past, machine learning algorithms have been used effectively to identify interesting patterns from volumes of data, and aid the decision making process in business environments. In this paper, we aim to use the power of such algorithms to predict the pre- release box-office success of motion pictures.

The problem of forecasting the box-office collection for a movie is reduced to the problem of classifying the movie into one of several categories based on its revenue. We propose a novel approach to constructing and using a graph network between movies, thus alleviating the movie independence assumption that traditional learning algorithms make. Specifically, the movie network is first used with a transductive algorithm to construct features for clas- sification. Subsequently, a classifier is learned and used to classify new movies with respect to their predicted box-office collection.

However, in the case of movie gross prediction problem, movies are generally not independent. In fact, there is an underlying graph structure that we could identify among movies. For example, a movie can be connected to another movie if they share actors and/or directors, if they have the same genre, if one is a sequel to the other, or if they are released around the same time. If we consider common actors or directors, the intuition is that the reputation of an actor or a director who worked in a movie can be transferred to a different movie in which the actor or the director took part. Thus, we believe that the reputation of Steven Spielberg as the director of a yet to be released movie will have positive effect on the success of that movie compared to the success of a yet to be released movie directed by a rookie director."

Reported Results

The researchers found that their "hypothesis that considering a dependency relation between movies helps improve the prediction accuracy is confirmed". Box office success for films can then be better predicted considering non-independent data features along with a film's independent characteristics.


"Transductive algorithms are algorithms which are trained on specific training instances to reason on specific test instances. Suppose we add a new node to the graph, we would have to run the algorithm again to predict the revenue for the newly added node. The algorithm that we use in this work uses the matrix factorization technique to factorize the adjacency matrix in a graph network. The factors generated as a result of this factorization can be seen as link features and can be further used as features for classification."


R And D

Core Research And Development


"Much better marketing strategies can be designed and better choices can be made by the cinema production companies in the presence of a strong estimator of a movie’s anticipated success. With this in mind, researchers in the past have tried to identify factors that influence the success of a movie and computed correlations between those variables and a movie’s box-office gross".



"977 movies released as ‘wide releases’ between the years 2006 and 2011. There are approximately 150 movies in each year and for each movie we collected features such as actor and director profiles (to compute star value), genre, release date, sequel information, budget, runtime, number of theaters and MPAA rating. Movies released between the years 2006 and 2010 are considered as training instances and movies released in the year 2011 are considered as test instances."

bottom of page