Blog No. 9 Registry of Deeds and Virtual Record Treasury of Ireland
Guest Blog Article by Dr. David Brown.
Beyond 2022 is committed to finding replacements for the records lost in the destruction of the Public Record Office of Ireland (PROI) in 1922 and, wherever possible, obtaining digital copies of these replacements to restore to the virtual shelves of the digitally reconstructed record treasury. The PROI contained tens of thousands of deeds: originals, copies of deeds, and copies of the memorials held at the Registry of Deeds. These were lodged as evidence in court cases, to support probates, to facilitate purchases and sales under the various land acts of the nineteenth century, and for a host of other purposes. There could be many copies of the same deed in the PROI, made for different purposes over a long period of time, and as a result the Registry of Deeds holds replacements for tens of thousands of the items that went up in smoke in 1922.
Smoke billowing from the Four Courts, 1922
Credit: Dixon Slides, Dublin City Libraries
At an early stage in both Beyond 2022 and the Property Registration Authority’s (PRA) own digitisation initiatives, it became apparent that there was no point in amassing thousands of digital images unless the content was searchable. The collections are simply too large to expect users to page through every image in the hope of finding the information they are looking for. At Beyond 2022, we began to experiment seriously with Transkribus. Transkribus is a machine learning platform that allows the computer to infer likely matches for words or letters from a ground truth of pre-prepared, perfect transcription, even if the computer has never seen that style of handwriting before. Transkribus converts the ground truth into a mathematical model based on the geometry of the letters. When the computer comes across a new letter in a new document, it compares it with all the words and letters in its memory and suggests the most likely match. The computer cannot ‘think’ as such, but it also cannot forget, and this is what makes machine learning such a powerful technology.
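For readers who like to see an idea in code, the matching step described above can be imitated with a toy example. This is only a minimal sketch of the general principle of comparing a new letter against remembered examples and picking the closest one. It is not how Transkribus is implemented, and the letters, the three-number "shape" descriptions and the function names are all invented for illustration.

```python
import math

# Hypothetical "ground truth": each known letter is paired with a made-up
# list of numbers standing in for the geometry of its shape.
GROUND_TRUTH = {
    "a": (0.9, 0.1, 0.4),
    "d": (0.8, 0.7, 0.3),
    "o": (1.0, 0.0, 0.5),
}


def rank_matches(features):
    """Return the known letters ordered from most to least similar."""
    scored = [
        (math.dist(features, shape), letter)
        for letter, shape in GROUND_TRUTH.items()
    ]
    return [letter for _, letter in sorted(scored)]


def most_likely(features):
    """Suggest the single closest match for a newly seen letter."""
    return rank_matches(features)[0]


# A new, unseen letter whose shape is closest to the stored "d".
print(most_likely((0.82, 0.65, 0.3)))
```

The more examples the memory holds, the more likely it is that something close to a new letter is already in it, which is why the amount of ground truth matters so much.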
Normally, 15,000 words of ground truth will get you started and the results will be usable. During the 2020 lockdown, PRA staff produced 140,000 words of perfect diplomatic transcription from the transcript books. This effort has resulted in a highly accurate ground truth that would enable the first 100 years of transcript books, once scanned and processed, to be fully searchable. It would be a truly remarkable resource. The initial sample volumes of transcript books have now been released on the VRTI website, so the first 1,500 pages of newly scanned material can be searched for the first time.
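To show why machine-readable text makes such a difference, here is a minimal sketch of a word index built over transcribed pages. It assumes the transcriptions are already available as plain text. The page identifiers and sample sentences are invented for illustration, and this is not a description of how the VRTI website's search actually works.

```python
import re
from collections import defaultdict

WORD = re.compile(r"[a-z0-9']+")


def build_index(pages):
    """Map each lower-cased word to the set of page ids that contain it."""
    index = defaultdict(set)
    for page_id, text in pages.items():
        for word in WORD.findall(text.lower()):
            index[word].add(page_id)
    return index


def search(index, query):
    """Return the sorted page ids that contain every word in the query."""
    words = WORD.findall(query.lower())
    if not words:
        return []
    hits = set.intersection(*(index.get(word, set()) for word in words))
    return sorted(hits)


# Invented sample text, not real Registry of Deeds content.
SAMPLE_PAGES = {
    "vol1-p001": "A memorial of an indenture of lease dated the first day of May",
    "vol1-p002": "A memorial of a deed of mortgage bearing date the tenth of June",
}

index = build_index(SAMPLE_PAGES)
print(search(index, "deed mortgage"))
```

Once every page has been turned into text, a lookup like this replaces paging through thousands of images by hand.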
The ground truth text prepared by PRA staff in 2020 was, in fact, so accurate and of such high value that it was incorporated into a general AI model, called B2022 English M4, that can be accessed through the Transkribus platform. This model is available free of charge and will enable any user to experiment with converting their own historical documents into text. The PRA text has already helped hundreds of users around the world to interpret their 18th-century documents, so that single major effort back in 2020 is helping to reveal the secrets of thousands of documents.
You can download Transkribus or Transkribus Lite from the READ Cooperative website, and the B2022 English M4 model is available there for you to use. We hope you find it useful, and do let the community of users know about your projects. If you have any questions or issues using Transkribus, please contact the READ Cooperative team in Innsbruck, who provide the support for this platform, directly.
About: Dr. David Brown is Archival Discovery lead for Beyond 2022 / Virtual Record Treasury of Ireland and a member of the Registry of Deeds Digitisation Strategy Advisory Group.