AI Case Study

The New York Times automates historic photo content identification and detail extraction with Google computer vision

The New York Times is digitising its archive of several million historic photos using Google Cloud services. Machine learning identifies and extracts text in the scanned photos automatically, adding important content and context to the files without human intervention.

Industry

Consumer Goods And Services

Media And Publishing

Project Overview

Google Cloud's computer vision services, chiefly the Cloud Vision API, allow text to be extracted automatically from the digitised photos. "To preserve this priceless history, and to give The Times the ability to enhance its reporting with even more visual storytelling and historical context, The Times is digitizing its archive, using Cloud Storage to store high-resolution scans of all of the images in the morgue. Storing the images is only one half of the story. To make an archive like The Times’ morgue even more accessible and useful, it’s beneficial to leverage additional GCP features. In the case of The Times, one of the bigger challenges in scanning their photo archive has been adding data regarding the contents of the images. The Cloud Vision API can help fill that gap. This is only the start of what’s possible for companies with physical archives. They can use the Vision API to identify objects, places and images. For example, if we pass the black and white photo above through the Cloud Vision API with Logo Detection, we can see that Pennsylvania Station is recognized. Furthermore, AutoML can be used to better identify images in collections using a corpus of already captioned images."
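The case study does not disclose how The Times calls the Vision API, so the following is only a minimal sketch of what a text-detection request for one scanned image could look like. It builds the JSON body for the Cloud Vision REST method images:annotate and reads the detected text out of a response. The bucket name, object name and helper function names are hypothetical, and the sketch deliberately stops short of sending the request, which would require Google Cloud credentials.

```python
import json

# Public REST endpoint for image annotation in the Cloud Vision API.
VISION_ENDPOINT = "https://vision.googleapis.com/v1/images:annotate"


def build_annotate_request(gcs_uri, feature_types=("TEXT_DETECTION",)):
    """Build the JSON body for one image stored in Cloud Storage."""
    return {
        "requests": [
            {
                "image": {"source": {"imageUri": gcs_uri}},
                "features": [{"type": t} for t in feature_types],
            }
        ]
    }


def extract_text(response):
    """Return the full detected text from an annotate response, or ''."""
    responses = response.get("responses") or [{}]
    annotations = responses[0].get("textAnnotations", [])
    # The first annotation holds the whole detected text;
    # later entries are individual words with bounding boxes.
    return annotations[0]["description"] if annotations else ""


# Hypothetical bucket and object name, for illustration only.
body = build_annotate_request(
    "gs://example-morgue-scans/photo-0001-back.jpg",
    feature_types=("TEXT_DETECTION", "LOGO_DETECTION"),
)
print(json.dumps(body, indent=2))
```

In practice the body would be sent as an authenticated POST to VISION_ENDPOINT, and the text returned by extract_text could then be stored alongside the scan as searchable metadata.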

Reported Results

"Of course, the digital text transcription isn’t perfect, but it’s faster and more cost effective than alternatives for processing millions of images."

Technology

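Implementation details are undisclosed. The described use is text detection and annotation through the Cloud Vision API, with AutoML suggested for classifying images; in essence, a binary/multi-class classification or annotation exercise.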

Function

Information Technology

Knowledge Management

Background

"For over 100 years, The Times has archived approximately five to seven million of its old photos in hundreds of file cabinets three stories below street level near their Times Square offices in a location called the “morgue.” Many of the photos have been stored in folders and not seen in years. Although a card catalog provides an overview of the archive’s contents, there are many details in the photos that are not captured in an indexed form."

Benefits

Data

Digitised historic photos