CUST-3 Archive – records of UK overseas trade, 1697-1780
Data Transformation – Image to Data
with London School of Economics

In 2021, the London School of Economics provided our first significant data collection challenge. Our objective was to create AI models to extract text from 35,000 photographs of documents detailing historical British trade and deliver it in Excel format.

The images of these records were drawn from the UK’s national customs and excise archive, held at The National Archives in Kew. The documents are tabulated manuscripts written in historical scripts.

We developed bespoke AI models to recognise column breaks on the page and to deal with non-standard, inconsistent and sometimes unclear characters.

CUST-3 (TNA) is an astonishing resource that comprehensively captures British global trade from 1697 to 1780 – a period of empire building. The tables also record government taxation, which makes them an important resource for understanding the rise of the modern fiscal state.

The image below gives an example of output from our model, which identifies and links tabulated manuscript data. Jamaican imports included ‘Elephants Teeth’ (ivory?), Tortoise Shells, and Snuff. Also shown is a snippet of the data collected from the images. We are now developing web-based tools to identify and rapidly correct errors generated by the AI models; these errors currently affect around 5% of the data collected. Rapid, intelligent identification of AI-generated errors for human checking is becoming a key part of our service offering.