Challenge
Thousands of pages of handwritten wills from the Ely Consistory Court, held at Cambridgeshire Archives, contain rich legal and genealogical data. These documents, written in Latin and early English, were inconsistently photographed and difficult to read, and they existed in no structured digital format that could support analysis.
Solution
Osiris-AI:
- Enhanced 11,466 manuscript images using AI-powered restoration and noise reduction
- Trained a bilingual (Latin and early English) starter handwritten text recognition (HTR) model as a base for fine-tuning, achieving ~85% accuracy
- Developed a rule-based parser to extract structured entries from key phrases in wills
- Built a prototype tagger model for segmenting relationships and named entities
- Parsed and structured 609 pages into spreadsheet format for research use
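To make the rule-based parsing step more concrete, here is a minimal sketch of how key phrases in a transcribed will might be turned into spreadsheet rows. It is an illustration only: the phrase pattern, the field names (`relation`, `name`, `item`), the helper functions, and the sample text are all assumptions made for this example, not the rules or data used in the project.

```python
import csv
import io
import re

# Hypothetical key-phrase rule: "I give [and bequeath] (un)to my <relation> <Name> <item>".
# Real wills vary far more in spelling and phrasing; this pattern is only illustrative.
BEQUEST_RULE = re.compile(
    r"I (?:give and bequeath|give|bequeath) (?:un)?to my "
    r"(?P<relation>\w+) "
    r"(?P<name>[A-Z]\w+(?: [A-Z]\w+)*) "
    r"(?P<item>[^.;]+)"
)

FIELDS = ["relation", "name", "item"]


def parse_bequests(text):
    """Return one dict per bequest phrase matched in the transcribed text."""
    rows = []
    for match in BEQUEST_RULE.finditer(text):
        row = {field: match.group(field).strip() for field in FIELDS}
        rows.append(row)
    return rows


def to_csv(rows):
    """Serialise parsed rows as CSV text that a spreadsheet can open."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Invented sample transcription, in modernised spelling for readability.
sample = (
    "I give and bequeath to my son John Smith my best cow. "
    "I give to my daughter Agnes one featherbed."
)

rows = parse_bequests(sample)
print(to_csv(rows))
```

A rule set like this is easy to audit and extend phrase by phrase, which suits a small starter dataset; its weakness is that any spelling or wording the rules do not anticipate is silently skipped, so unmatched text would need to be reviewed separately.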
Impact
The project demonstrated a scalable pipeline for transforming early modern legal texts into usable datasets. With further training, the models and methods can support deeper analysis of inheritance patterns, naming customs, and social relations across centuries of English history.