AI Case Study

MarineLives aims to use machine learning to assemble additional historical knowledge about the UK and European population of the 17th century

MarineLives, a collaborative transcription project, aims to analyse historical data with the use of machine learning. By studying the degree of sophistication of signatures, marks and initials found in manuscripts from the English High Court of Admiralty from 1627 to 1677, the project aims to gather information on the literacy of the English and European population at the time.

Industry

Project Overview

"The Kaggle Research Competition is a machine learning focused community owned by Google, for the algorithmic identification of marks, initials, and signatures. MarineLives, a collaborative transcription project, works with primary manuscripts from the English High Court of Admiralty from 1627 to 1677. This court dealt with marine matters, such as sea disputes with ship owners or sailors on ships. Approximately six million words and full-text transcriptions were assembled over the last six years.

"We don't have a complete sample of the English or European population from the 17th century—we deal with 17th-century data," Greenstreet said. "But we do have a lot of mariners and merchants and other related trades."

The historian David Cressy proposed that if someone could sign their name, it meant they were literate, whereas the use of a crude mark, or of initials (a sort of intermediate step between a mark and a signature), was a sign that someone was less literate.

"If we can distinguish the degree of sophistication of the signature and of the mark and of the initial, then we can assemble additional knowledge," Greenstreet said. "If we can work at a much greater scale, we can assemble some pretty powerful data."

MarineLives is currently in the process of forming a charity called Chronoscopic Education, which will become the home for MarineLives and a number of other projects they run. Their aim is to create a collaborative research community.

"It depends upon us assembling a fully annotated data set," Greenstreet said. "We're aiming for about 10,000 images, which will be annotated in some shape or form, and we're still debating that annotation."

Greenstreet said the project is a bit like dragging a horse to water because historians typically have limited computer science skills. It is therefore important to show historians the power of working with data at a larger scale."

Reported Results

Planned; results not yet available

Technology

Function

Background

"In the United Kingdom, Colin Greenstreet, Founder and Co-Director of MarineLives, is heading up a Kaggle Research Competition. Greenstreet said that the competition is tentatively scheduled to begin March 2019, with a second competition likely that November."

Benefits

Data

"Approximately six million words and full-text transcriptions were assembled over the last six years."