Osiris-AI Logo with tagline

Apply our advanced technology to transform your handwritten and printed documents into structured data and interactive visualisations.

What we do
Make your archives useful for research and visible to a wider audience

Who we are
Historians and computer scientists from the University of Cambridge

How it works
Watch the video

Contact us
Challenge us


Unlocking archives through intelligent automation.

From image to insight: Osiris-AI transforms complex historical documents into structured, searchable data.

Osiris-AI develops advanced tools that transform complex archival and historical materials into structured, searchable data. Our technology automates processes that once required extensive manual work, from page segmentation and handwriting recognition to data extraction from complex layouts, enabling faster, more accurate research and digital preservation.

Learn more

How it works...

Who we are.

Dr Oliver Buxton Dunn

After graduating with an MPhil in History from the University of Cambridge, Oliver completed a PhD at the EUI in Florence, Italy. He then worked as a researcher and lecturer at Cambridge. Since 2022, he has been teaching and undertaking research at Carlos III University in Madrid.

Oliver Buxton Dunn

Head of Research and Development

Dr Alexis Litvine

Alexis holds a PhD in history from the University of Cambridge, after graduating from Sciences-Po (Paris) and the Ecole Normale Supérieure. He specialises in quantitative analysis, geospatial modelling and data, and digital strategies for information retrieval.

Alexis Litvine

Head of company operations

Dr Yiannos Stathopoulos

Yiannos holds a PhD in Computer Science from the University of Cambridge, an MSc in Computer Science from the University of Oxford, and an MSc in Statistics from the University of Nottingham. Yiannos advises our tech team... and produces exquisite models for us to use.

Yiannos Stathopoulos

Co-founder and tech advisor

Ruth Murphy


Ruth holds a PhD in Italian from the University of Cambridge and is currently a postdoctoral researcher at the University of Sheffield. As a talented linguist, she oversees modern transcription work for Osiris.

Chloe Ashley


Chloe Ashley helps with company administration and client care. She also lends a hand with the more delicate transcription tasks and data QA. She holds a degree in English Literature and Hispanic Studies from Queen Mary, University of London.

Stan Hinton


Stan graduated in history from Cambridge after a successful early career in the tech industry. He specialises in UI components and annotation schemes for AI.

Charlie Cook


Charlie is an undergraduate mathematics student at the University of Aberdeen with an enthusiasm for coding and AI. Charlie is involved in model training and fine-tuning, as well as script construction and refinement.

Sam Hoyle


Sam is now an ESRC-funded postgraduate student at the University of Durham, specialising in maritime violence in the Indian Ocean. He has helped us with transcription, segmentation, and image processing since 2023.

Our services.

Osiris-AI turns complex historical records into trusted, structured data using tried-and-tested methods developed in collaboration with leading scholars since 2020.

πŸ–ΌοΈAdvanced image preparation

Image enhancement, deskewing, dewarping, page splitting, and CV/AI optimisations.

✍️Handwriting & print text recognition

Large base models and bespoke HTR/OCR pipelines adapted to historical scripts and document types.

✅AI output validation & quality assurance

Detect hallucinations, missing content, misrecognition, and numeric errors with audit-ready logs.

🧩Layout & structure analysis

Custom page segmentation and layout parsing for reliable structured data extraction.

🧠Multi-model language processing

Carefully constrained multimodal LLM workflows for entity recognition and relationship extraction.

👥Crowdsourced ground truth platforms

Web tools for expert/public validation, correction, and training data creation.

πŸ—ΊοΈHistorical GIS & spatial analysis

Link records to place, movement, and spatial context for mapping and analytics.

πŸ§‘β€πŸ’ΌConsultancy & deployment

Workflow design, grant support, on-site deployment, and ML infrastructure planning.

📦Research-ready exports

Outputs delivered in JSON/CSV/XML/PDF, with structure retained and QA trails included.

🔒 Ethical and transparent by design

  • No subscriptions or hidden fees. Project-based pricing for the work you need.
  • Your data stays private. Processing on secure, isolated servers (yours or ours), not third-party clouds.
  • Guaranteed deletion on completion. Files and models can be permanently deleted after delivery and QA.
  • Transparent error tracking. Outputs are auditable, correctable, and fit for integration.

Accuracy you can measure, and trace.

General-purpose LLMs can transcribe reasonably well, but they also invent plausible, correct-looking text. Osiris-AI measures and controls this risk explicitly.

Osiris gets the best from fast-evolving AI through research and development to provide trustworthy and useful information from complex records.

Why generic LLM transcription breaks in production

  • Plausible hallucinations: extra words that look authentic but were never in the source.
  • Loss of structure: word position, layout, and tables are not reliably preserved.
  • Token limits: large documents must be chunked, increasing drift and QA burden.

How Osiris-AI manages this for you

  • Two-pronged approach: deterministic feature extraction (segmentation + HTR) plus multimodal LLMs.
  • Structure-first: layout, coordinates, and document structure are retained for downstream use.
  • Token-level verification: every AI output is checked against the source.
  • Batch-safe pipelines: designed for scale without manual chunking.
  • Customisable models: configurations tuned to your material, with transparent error reporting.
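As a rough illustration of the token-level verification idea above, the Python sketch below aligns an LLM transcription against the output of a deterministic recogniser and flags every token that has no aligned counterpart. It is not Osiris-AI's actual pipeline: the helper name `flag_unsupported_tokens`, the whitespace tokenisation, the sample strings, and the use of `difflib` for alignment are all assumptions made for illustration.

```python
from difflib import SequenceMatcher


def flag_unsupported_tokens(reference_text: str, llm_text: str) -> list[tuple[int, str]]:
    """Return (position, token) pairs from llm_text with no aligned match in reference_text.

    Hypothetical helper: reference_text stands in for deterministic HTR/OCR output.
    """
    ref_tokens = reference_text.lower().split()
    llm_tokens = llm_text.split()
    matcher = SequenceMatcher(
        a=ref_tokens,
        b=[token.lower() for token in llm_tokens],
        autojunk=False,
    )
    flagged = []
    for tag, _i1, _i2, j1, j2 in matcher.get_opcodes():
        # "insert" and "replace" mark spans of the LLM output that
        # disagree with the reference and therefore need review.
        if tag in ("insert", "replace"):
            flagged.extend((j, llm_tokens[j]) for j in range(j1, j2))
    return flagged


# Made-up example strings, not real project data.
htr_output = "Received of Mr Smith the sum of 12 pounds"
llm_output = "Received of Mr John Smith the sum of 15 pounds"
print(flag_unsupported_tokens(htr_output, llm_output))
# → [(3, 'John'), (8, '15')]
```

A flagged token is a disagreement, not a proven hallucination: the reference recogniser can be wrong too, so in practice such tokens would be routed to review rather than deleted automatically.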

We regularly benchmark our models against competing systems (including industry leaders). The examples below report overall accuracy (F1) against ground truth and break errors down by type, isolating hallucination risk, numeric failures, and OCR noise.

18th-century English handwriting

HTR benchmark against ground truth.

English language newspaper print (1925)

OCR benchmark against ground truth.

Overall accuracy is reported as F1 against ground truth. Error categories isolate hallucination risk, numeric failures, and OCR noise, helping teams prioritise what to check and what to fix.
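For readers unfamiliar with F1, here is a minimal Python sketch of one common way to compute it for transcription: token-level precision and recall over a bag of words. The page does not specify how the benchmark F1 is calculated, so the function name `token_f1`, the whitespace tokenisation, and the example strings are illustrative assumptions rather than a description of the actual benchmark.

```python
from collections import Counter


def token_f1(ground_truth: str, prediction: str) -> float:
    """Token-level F1 between a ground-truth transcription and a prediction.

    Assumed definition: tokens are whitespace-separated and compared as
    multisets, so word order is ignored.
    """
    truth_counts = Counter(ground_truth.split())
    pred_counts = Counter(prediction.split())
    # Multiset intersection counts each shared token at most as often
    # as it appears in both texts.
    overlap = sum((truth_counts & pred_counts).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(pred_counts.values())  # share of predicted tokens that are correct
    recall = overlap / sum(truth_counts.values())    # share of true tokens that were recovered
    return 2 * precision * recall / (precision + recall)


# Made-up example: one wrong number and one invented word.
truth = "paid 12 pounds to Mr Smith"
predicted = "paid 15 pounds to Mr John Smith"
print(round(token_f1(truth, predicted), 3))
# → 0.769
```

In this example an invented word ("John") lowers precision and a misread number ("15" for "12") lowers both precision and recall, which is why a single F1 figure is more informative when paired with a breakdown of errors by type.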

Past projects.

Trusted by

Get in touch with us ...

Whether you need to extract data at scale, want to integrate geospatial data and historical records or need data consultancy, we'd love to hear from you.
