First steps towards an UnEdition of Cyfraith Hywel
Like many medieval outputs, the medieval Welsh legal tradition of Cyfraith Hywel did not exist as a singular concept but instead was a living and evolving entity attested from the 12th century in almost 80 manuscripts and books. As one of the most prolific and comprehensive sources of medieval Welsh life, this tradition and, in particular, the alterations evident between its branches and manuscripts have the potential to reveal a great deal about an otherwise opaque period of Welsh history. Despite this promise, the size and complexity of the tradition have long been a stumbling block to the high-level holistic analysis required to identify these tradition-wide shifts, a fact that has caused great frustration for Cyfraith Hywel scholars. Advances in computer-aided text analysis (corpus linguistics, natural language processing, etc.) are a light at the end of the centuries-long tunnel that is this challenge, because they have demonstrable scope for effective exploration of large-scale textual data. However, in order to apply these methods it is necessary to dramatically rethink how Cyfraith Hywel is made available to the scholarly community and to embrace the concept of a machine-readable, malleable ‘UnEdition’, the potential of which is the central theme of my PhD thesis.
The most prominent edition of Cyfraith Hywel to date is an aggregated text composed by Aneurin Owen in 1841. Celebrated for encompassing the three primary branches and the majority of Cyfraith Hywel manuscripts in one easy-to-navigate edition, this was a truly invaluable addition to Cyfraith Hywel scholarship that made the text and some of the variations contained within more widely accessible. Despite its continued popularity, there are a number of flaws to this aggregated approach, the most notable being that the editor must select ‘model’ manuscripts to act as the base texts, with all other manuscripts represented by footnotes marking variations. Not only does this impose a false sense of hierarchy on the tradition but, more importantly, it masks the structural and organisational foibles that make each manuscript unique. Within the Cyfraith Hywel tradition, only two of the forty manuscripts created before the Acts of Union (1536 and 1543) have been judged to be identical,[1] meaning that there are thirty-eight that are unique; such variety cannot be accounted for by a single, static aggregated edition.
To counteract this issue, there has been a push in the last half-century or so to create editions of specific manuscripts. These have been exceptionally well received and, in turn, have led to a greater understanding of how manuscripts might relate to one another. Equally, steps have been taken to widen access to individual manuscripts and their associated editions by publishing them digitally. Digitised images of the manuscripts are available at various host institutions and the Cyfraith Hywel website offers a comprehensive index of manuscript contents with an associated version of Aneurin Owen’s aggregated edition. There are even digital proxies of many Cyfraith Hywel texts available on the Welsh Prose website, each of which is accompanied by manuscript-accurate encoding. Whilst these digital proxies approach an UnEdition in concept, incorporating additional metadata and facilitating new avenues of research, they are still static documents much like their physical counterparts found in scholarly editions. In direct contrast to Owen’s edition, these have the depth of detail but lack the capacity for efficient comparative research.
In an attempt to advance this situation, I created, over the course of my thesis, what might be considered the first steps towards an UnEdition of Cyfraith Hywel by giving the text of 21 manuscripts a computer-readable XML structure. The intent was to explore how document manipulation and the application of computer-aided textual research methods, such as those grounded in natural language processing, statistical analysis or machine learning, might offer an alternative perspective on the tradition and lead to new insights and avenues of thought. The process of creating this encoded corpus was lengthy, if not inherently difficult (beyond those difficulties that are regularly associated with the interpretation of medieval text). The first and most important stage was establishing the structure that would be used for each text. For Cyfraith Hywel it was logical to use the pre-existing internal structure of subdivisions, each being the treatment of an individual piece of law. The great variety found between the manuscripts made this division unclear at points, so existing editions and other scholarly sources were consulted to try to achieve a diplomatic result. Following these broad divisions, each word was encoded with additional metadata concerning its grammatical function and, most importantly, its dictionary or lemma form. One of the greatest challenges facing the creation of effective digital editions of medieval texts is the inconsistency found in spelling and grammatical conventions. For modern languages, the rules governing language production are quite firm and the samples available for analysis are exceptionally numerous. This means that it is possible to train a computer to recognise patterns of language usage and then independently analyse an unknown piece of language with reasonable accuracy (e.g. speech recognition, spam detection in emails).
For medieval material, the samples are limited and the attested language is in constant flux with, at times, generations between surviving iterations of a text. As a result, it can be difficult to create accurate means of computer-aided text identification. By including the lemma form in the mark-up, it is possible with additional analysis to determine, for example, the variety of vocabulary, free from grammatical and spelling variance, found within an individual section of text and to compare this with other areas. The final stage of the thesis, following the application of this structural, grammatical and semantic encoding, was to test the usefulness of the machine-readable text to the effective access and understanding of this complex tradition.
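As a rough illustration of the kind of lemma-based comparison described above, the following Python sketch counts tokens, distinct spellings and distinct lemmas in each section of a small encoded sample. It is only a sketch under stated assumptions: the element and attribute names (manuscript, section, w, lemma, pos), the siglum and the handful of sample words are invented for illustration and do not reproduce the encoding scheme or the data used in the thesis.

```python
import xml.etree.ElementTree as ET

# Invented sample, NOT transcribed from any manuscript. The tag and
# attribute names are hypothetical stand-ins for a real encoding scheme.
SAMPLE = """
<manuscript siglum="X">
  <section n="1">
    <w lemma="brenin" pos="noun">brenhin</w>
    <w lemma="brenin" pos="noun">brenin</w>
    <w lemma="llys" pos="noun">llys</w>
  </section>
  <section n="2">
    <w lemma="cyfraith" pos="noun">kyfreith</w>
    <w lemma="cyfraith" pos="noun">cyfreith</w>
  </section>
</manuscript>
"""


def vocabulary_by_section(xml_text):
    """Return token, spelling and lemma counts for each <section>."""
    root = ET.fromstring(xml_text.strip())
    summary = {}
    for section in root.iter("section"):
        words = section.findall("w")
        # Distinct surface spellings as they appear in the text.
        forms = {w.text for w in words}
        # Distinct dictionary forms, which collapse spelling variants.
        lemmas = {w.get("lemma") for w in words}
        summary[section.get("n")] = {
            "tokens": len(words),
            "forms": len(forms),
            "lemmas": len(lemmas),
        }
    return summary


summary = vocabulary_by_section(SAMPLE)
for number, counts in summary.items():
    print(number, counts)
```

In this toy sample the first section contains three spellings but only two lemmas, which is the effect the lemma mark-up is intended to expose: variation in spelling is separated from variation in vocabulary.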
Whilst not at this stage a fully realised UnEdition, the resultant machine-readable corpus was shown to open the tradition to a new perspective for analysis. By viewing the textual data from a distance, rather than focusing upon those unique features that stand out most keenly in traditional approaches, it was possible to identify some unexpected similarities and differences within and between established groupings of texts. Beyond this, for the first time, it was possible to provide a statistical breakdown of the size and scope of the Cyfraith Hywel tradition and the subdivisions contained within. I discussed some of the results of this analysis at the 20/20 Dark Archives conference (the video of this is available via their website), but with direct reference to the editorial impact of this format, the benefits go beyond the ability to access this new range of information. The flexibility afforded by the digital format promotes an alternative approach to reading the tradition, one that actually seems far closer to the original intent of the material. Cyfraith Hywel is not a narrative tradition. Instead it is composed of many short snippets of information, known as tractates, designed to facilitate easy use by practising lawyers. The arrangement of material within each manuscript of Cyfraith Hywel doesn’t seem random, but neither does it imply an intent for the manuscript to be read from cover to cover. A searchable and malleable digital version of the content can allow the reader to emulate this approach, navigating through the text, or indeed multiple versions of the text, to seek the points that are of interest to them. UnEditions fully embrace the flexibility and promise offered by the digital format and, whilst they may never replace the traditional edition or the traditional approaches to medieval literary scholarship, they do offer an excellent supplementary counterpoint.
To move forwards and take the encoded manuscripts created for my thesis into the sphere of a fully realised UnEdition, the next step would be to create a user-friendly interface that makes the language analysis techniques available to a wider audience without the requirement for technical expertise. By bridging the gap between individuals with the technical expertise required to create and implement the software for analysis and those with the specialised subject knowledge necessary for analysis of the content, it may be possible to push through the decades-long methodological impasse that has been facing Cyfraith Hywel scholars and gain a holistic perspective on these fascinating texts.
Zoe Bartliff
[1] Jenkins, D. (2000), Excursus: the lawbooks and their relation, in T. M. Charles-Edwards, M. Owen & P. Russell, eds, ‘The Welsh King and his Court’, University of Wales Press, pp. 10–14.