Rethinking the abbreviation: questions and challenges of machine reading medieval scripta

Speakers: Estelle Guéville (Louvre Abu Dhabi), David Joseph Wrisley (NYU Abu Dhabi) @DJWrisley

Every medievalist will recognize the barriers to access (time, distance, paleographic skill, condition) and how digitization has expanded the way we work. It is now possible to view facsimiles from the other side of the world (say, from Abu Dhabi). Inaccessibility in 2020 has been a question not only of distance from physical archives, but also of distance from the familiar infrastructures in which we work (offices, libraries, print-only resources). We know that across the GLAM sector the creation of transcriptions and metadata is being revived in a time of social distancing (Ferraiolo, 2020). This spring we focused our collective attention on what can be done in isolation, with the kind of time and attention span that we might not have had in a regular work year. A slow focus on creating "ground truth" for neural automatic transcription systems has led us to rethink what transcription really means for pre-modern writing systems (Kirmizialtin and Wrisley, 2020). Usually the notion of the "unread" carries a Morettian connotation, pointing to all the texts that we do not have in our possession or have not had the time to read. There is, however, an unread within medieval manuscripts. It has always been there, right before our eyes, and most medievalists prefer to resolve or expand it: the abbreviation. Our case study uses a thirteenth-century Latin Bible in the Paris tradition, held in the Louvre Abu Dhabi collection. This Bible has a very regular script and page layout, which, in the absence of a colophon, makes it almost impossible to date or to determine its exact provenance. As is often the case when palaeography and textual clues fail, medievalists have looked to the decoration, which, in our case, has been compared to that of other manuscripts from the Rouen region. Computational text studies have shown that micro-features are useful for textual forensics (Pinche et al., 2019; Byszuk and Khismatulin, 2019; Kestemont et al., 2019).
What new knowledge about manuscripts can automatic transcription help uncover: authorship, localisation or scribal habit? Are there patterns in the use of abbreviations, or even of letter forms (s and ſ; d and ꝺ; r and ꝛ), that can help us re-read our texts in manuscript? Can we link variance in abbreviations to a change of hand or to other material traces of change? In the case of charters, the use of s and ſ and of d and ꝺ has been studied (Stutzmann, 2011), leading the author to propose a classification and to make assertions about textual evolution. Yet this study was based on manual encoding, which, while feasible for charters, is not an option when working with manuscripts of many hundreds of folios. Studying the use of those specific letter forms, the number of abbreviations per line, their location in the word, in the line and in the sentence, or even the recurrence of abbreviations for the same words (e.g. quoque, deus, super) and their differences, is a huge task made possible by computational methods.
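To give a sense of what such counts might look like in practice, here is a minimal Python sketch. It is not the workflow used for the Louvre Abu Dhabi Bible. It assumes a transcription in which variant letter forms and abbreviation signs are kept as distinct Unicode characters. The set of abbreviation signs, the function names and the sample lines are all hypothetical and chosen for illustration only.

```python
import unicodedata
from collections import Counter

# Letter-form pairs named in the abstract: regular form -> variant form.
LETTER_PAIRS = {"s": "ſ", "d": "ꝺ", "r": "ꝛ"}

# Assumption: a small, incomplete set of abbreviation signs for illustration.
ABBREVIATION_MARKS = {
    "\u0303",  # combining tilde
    "\u0304",  # combining macron
    "\ua751",  # p with stroke through descender (per)
    "\ua753",  # p with flourish (pro)
    "\u204a",  # Tironian et
}


def letter_form_counts(text):
    """Return {regular: (count_regular, count_variant)} for each letter pair."""
    counts = Counter(unicodedata.normalize("NFD", text))
    return {base: (counts[base], counts[variant])
            for base, variant in LETTER_PAIRS.items()}


def abbreviations_per_line(lines):
    """Count abbreviation signs in each transcribed line."""
    return [
        sum(ch in ABBREVIATION_MARKS for ch in unicodedata.normalize("NFD", line))
        for line in lines
    ]


def abbreviation_positions(word):
    """Classify each abbreviation sign in a word as initial, medial or final."""
    word = unicodedata.normalize("NFD", word)
    # Combining marks sit on the preceding base letter, so positions are
    # measured in base letters rather than in raw code points.
    last = sum(1 for ch in word if not unicodedata.combining(ch)) - 1
    positions = []
    index = -1
    for ch in word:
        if not unicodedata.combining(ch):
            index += 1
        if ch in ABBREVIATION_MARKS:
            if index <= 0:
                positions.append("initial")
            elif index == last:
                positions.append("final")
            else:
                positions.append("medial")
    return positions


# Hypothetical sample lines, not taken from the manuscript.
sample_lines = ["ſu\ua751 terram", "dn\u0304s ꝺeus"]
print(letter_form_counts(" ".join(sample_lines)))
print(abbreviations_per_line(sample_lines))
print([abbreviation_positions(w) for w in sample_lines[0].split()])
```

Normalizing to NFD first means that precomposed characters and base-plus-combining sequences are counted the same way. Whether that is appropriate depends on the transcription guidelines chosen for the ground truth.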