AI Case Study

Researchers at the University of Bristol detect trends in historical data using AI

Researchers from the University of Bristol, the University of Cardiff, and the website FindMyPast used machine learning to analyse historical archives. The team analysed more than 35 million articles from British newspapers dated between 1800 and 1950. By analysing the content of the articles, the team identified patterns such as increasing mentions of women in the news during the 20th century, and accurately estimated important historical dates.

Industry

Public And Social Sector

Education And Academia

Project Overview

In 2016, a team of researchers from the University of Bristol, the University of Cardiff, and the website FindMyPast, led by Nello Cristianini, Professor of Artificial Intelligence at the University of Bristol, used AI to analyse roughly 14% of local British newspapers from 1800 to 1950, covering more than 35 million articles and 28.6 billion words.

The abstract of the paper: "Previous studies have shown that it is possible to detect macroscopic patterns of cultural change over periods of centuries by analyzing large textual time series, specifically digitized books. This method promises to empower scholars with a quantitative and data-driven tool to study culture and society, but its power has been limited by the use of data from books and simple analytics based essentially on word counts. This study addresses these problems by assembling a vast corpus of regional newspapers from the United Kingdom, incorporating very fine-grained geographical and temporal information that is not available for books. The corpus spans 150 years and is formed by millions of articles, representing 14% of all British regional outlets of the period. Simple content analysis of this corpus allowed us to detect specific events, like wars, epidemics, coronations, or conclaves, with high accuracy, whereas the use of more refined techniques from artificial intelligence enabled us to move beyond counting words by detecting references to named entities. These techniques allowed us to observe both a systematic underrepresentation and a steady increase of women in the news during the 20th century and the change of geographic focus for various concepts. We also estimate the dates when electricity overtook steam and trains overtook horses as a means of transportation, both around the year 1900, along with observing other cultural transitions. We believe that these data-driven approaches can complement the traditional method of close reading in detecting trends of continuity and change in historical corpora."
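The word-count analysis the abstract describes, including dating the point at which one term overtakes another, can be illustrated with a minimal sketch. This is not the researchers' pipeline, which the abstract does not detail. The function names, the regular-expression tokenizer, the crossover rule, and the toy articles below are all assumptions made for illustration.

```python
import re
from collections import Counter


def yearly_relative_frequency(articles, term):
    """Map each year to occurrences of `term` divided by total tokens that year.

    `articles` is an iterable of (year, text) pairs. Tokenization is a
    simplifying assumption: lowercase runs of the letters a-z.
    """
    hits = Counter()
    totals = Counter()
    for year, text in articles:
        tokens = re.findall(r"[a-z]+", text.lower())
        totals[year] += len(tokens)
        hits[year] += tokens.count(term)
    return {year: hits[year] / totals[year] for year in totals if totals[year]}


def first_crossover_year(series_a, series_b):
    """Return the first shared year in which series_a rises above series_b.

    A crossover requires that series_a was at or below series_b in the
    previous shared year. Returns None if no such year exists.
    """
    was_at_or_below = False
    for year in sorted(set(series_a) & set(series_b)):
        if series_a[year] > series_b[year]:
            if was_at_or_below:
                return year
        else:
            was_at_or_below = True
    return None


# Toy data, invented for illustration only (not from the study's corpus).
articles = [
    (1890, "The steam engine and the steam train"),
    (1890, "Steam power and electricity"),
    (1900, "Electricity and steam"),
    (1910, "Electricity, electricity, steam"),
]

steam = yearly_relative_frequency(articles, "steam")
electricity = yearly_relative_frequency(articles, "electricity")
crossover = first_crossover_year(electricity, steam)
print(crossover)
```

Dividing by the total number of tokens per year matters because the volume of digitized text varies widely across the period; raw counts would mostly track how much was printed rather than what was discussed.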

Reported Results

The study detected trends in the historical record as reflected in newspaper archives, including shifts in gender bias, technology uptake, and broad political awareness, and it dated specific events accurately. This allows for a quantitative description of forces previously described from manual surveys.

Technology

Function

Background

Historians are finding that practical applications of deep learning can help them work with large amounts of information much more easily.

Benefits

Data

More than 35 million articles from British newspapers from 1800 to 1950.