Artificial intelligence is being used at the University of Innsbruck, as part of the EU's READ project, to automatically transcribe previously unintelligible historical documents, and the system is being made available as a phone app.
How do you find a text in ancient manuscripts, and do it fast? Until recently, computers weren't very good at reading handwritten scripts, but now artificial intelligence has produced a breakthrough.
The Tyrolean State Archive in Innsbruck stores countless documents dating from the 11th century onwards — mostly official records, legal documents and other important handwritten documents from the past. Transcribing these books isn't easy. But this archive is working with scientists to automate the transcription using cutting-edge computer technologies.
"With difficult scripts I believe the new technique will have problems. But with relatively nice calligraphy, the new system has great advantages and helps us a lot," says the Director of the archive, Christoph Haidacher.
To digitise such books, scientists working on a European research project, READ, designed a simple-to-use system based on a specially-developed smartphone application: it detects when pages are turned and automatically takes high-resolution photos of each page.
"We use, of course, a combination of low-tech and high-tech. A dark tent is a relatively simple, low-tech accessory. But it works with a high-tech app running on a smartphone that is connected to the Transkribus platform: the app uploads the images to the server that performs the recognition of the handwritten text," says the READ project co-ordinator & Researcher in Digitalisation & Digital Preservation at the University of Innsbruck, Dr. Guenter Muehlberger.
Transkribus simplifies tasks that would often take years of work, helping scholars with complex handwriting and unusual layouts. It is currently being used to transcribe the 500-page "Hero Book", the most significant anthology of Medieval German texts commissioned by Maximilian I in the early 16th century.
"The big advantage of this system is that it provides a link between the image and the text, and does it in a very simple way, so the transcriber has the full overview immediately. That's a way to reduce error to a minimum, which can't be achieved with any other system," says Professor of Literature & Cultural History at the University of Innsbruck, Mario Klarer.
The server at the University of Innsbruck uses machine-learning algorithms to teach the computer new writing styles. After users transcribe part of the text manually, the software engine learns to identify the characters and then finishes the task automatically with impressive accuracy: higher than 95% for historical documents, regardless of their language or writing style.
"I give the computer an image and a part of corresponding text, and based on that the computer can learn the handwritten script and similar fonts," says Dr. Muehlberger.
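The article does not say how the 95% figure is measured. One common convention in handwriting recognition is character-level accuracy: compare the automatic transcription with a manual reference and count how many single-character edits separate them. The sketch below assumes that convention; the function names are illustrative and are not part of Transkribus.

```python
def edit_distance(reference: str, hypothesis: str) -> int:
    """Levenshtein distance: minimum number of single-character
    insertions, deletions and substitutions turning one string into the other."""
    # previous[j] holds the distance between the reference prefix
    # processed so far and the first j characters of the hypothesis.
    previous = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        current = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            substitution = previous[j - 1] + (ref_char != hyp_char)
            insertion = current[j - 1] + 1
            deletion = previous[j] + 1
            current.append(min(substitution, insertion, deletion))
        previous = current
    return previous[-1]


def character_accuracy(reference: str, hypothesis: str) -> float:
    """Share of the reference transcription recovered correctly,
    computed as 1 minus the character error rate (assumed metric)."""
    if not reference:
        raise ValueError("reference transcription must not be empty")
    return 1.0 - edit_distance(reference, hypothesis) / len(reference)
```

Under this assumed metric, an automatic transcription that reads "Maximilan" where the manual reference reads "Maximilian" has one missing character out of ten, so `character_accuracy("Maximilian", "Maximilan")` gives 0.9. An accuracy above 95% would correspond to fewer than one wrong character in every twenty.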
The system can transcribe text in any language, bringing together scholars and scientists, archivists and volunteers from many countries. The developers are planning to make Transkribus commercially available to users around the world.
"We were even surprised by the great success of the project and that so many institutions are in touch with us expressing an interest. And as we want to continue to offer and expand this service, we are starting a spin-off company," concludes Dr. Muehlberger.
You can learn more about the Transkribus system on its website: https://read.transkribus.eu.