Challenge
Victorian prison registers are densely structured manuscript records with hand-drawn tables and irregular handwriting. Sparse content, skewed baselines, and complex formatting made layout recognition and accurate transcription difficult.
Solution
Osiris-AI:
- Prepared and processed 500 images of Victorian prison registers
- Collaboratively built a training dataset
- Developed a tailored pilot HTR model through iterative retraining, achieving around 85% accuracy
- Cropped images to focus on table content, improving segmentation performance
- Created a custom postprocessing pipeline to cluster rows and columns
- Extracted structured fields (e.g., names, ages, birthplaces, offenses) using regex-driven parsing
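The last two steps can be illustrated with a minimal sketch. It is not the project's actual pipeline: the `Word` type, the tolerance values, the four-column layout, and the function names are all assumptions made for illustration. It groups recognized words into rows by vertical position, splits each row into cells at wide horizontal gaps, and then applies a regular expression to pull out the named fields.

```python
import re
from dataclasses import dataclass

@dataclass
class Word:
    """One recognized word with a hypothetical bounding box (pixels)."""
    text: str
    x0: float  # left edge
    x1: float  # right edge
    y: float   # vertical centre

def cluster_rows(words, y_tol=12.0):
    """Group words into rows: a word joins the current row when its
    y is within y_tol of that row's running mean y."""
    rows = []
    for w in sorted(words, key=lambda w: w.y):
        if rows:
            mean_y = sum(v.y for v in rows[-1]) / len(rows[-1])
            if abs(w.y - mean_y) <= y_tol:
                rows[-1].append(w)
                continue
        rows.append([w])
    # Put each row into reading order (left to right).
    return [sorted(r, key=lambda w: w.x0) for r in rows]

def split_cells(row, gap_tol=30.0):
    """Split a left-to-right row into cells wherever the horizontal gap
    between consecutive words exceeds gap_tol."""
    cells = [[row[0].text]]
    for prev, cur in zip(row, row[1:]):
        if cur.x0 - prev.x1 > gap_tol:
            cells.append([cur.text])
        else:
            cells[-1].append(cur.text)
    return [" ".join(c) for c in cells]

# Assumed column order: name, age, birthplace, offense (tab-separated).
ROW_PATTERN = re.compile(
    r"^(?P<name>[^\t]+)\t(?P<age>\d{1,3})\t(?P<birthplace>[^\t]+)\t(?P<offense>[^\t]+)$"
)

def extract_records(words):
    """Return one dict per row that matches ROW_PATTERN; skip the rest."""
    records = []
    for row in cluster_rows(words):
        match = ROW_PATTERN.match("\t".join(split_cells(row)))
        if match:
            record = match.groupdict()
            record["age"] = int(record["age"])
            records.append(record)
    return records

# Invented sample data, with slightly skewed baselines.
sample = [
    Word("John", 10, 50, 100), Word("Smith", 55, 110, 103),
    Word("34", 200, 220, 98), Word("Leeds", 300, 350, 101),
    Word("Larceny", 450, 520, 102),
    Word("Mary", 10, 50, 140), Word("Jones", 55, 110, 143),
    Word("27", 200, 220, 139), Word("York", 300, 340, 141),
    Word("Assault", 450, 520, 142),
]
records = extract_records(sample)
```

Rows that do not match the pattern are skipped here; a real pipeline would more likely log them for manual review, since handwriting recognition errors often break the expected layout.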
Impact
The project produced a working HTR model for prison records and an adaptable pipeline for turning raw handwriting into structured CSV datasets. The tools and methods now form a robust foundation for scaling the transcription of 19th-century penal records, supporting both research and digital archive efforts.
