Using Semantic Textual Encoding to Support the Navigation and Analysis of Medieval Texts

Zoe Bartliff (University of Glasgow) @ZBartliff

Medieval texts, with a slight refocusing of perception, can be considered a large-scale data-set (LSDS): they comprise a wealth of data, much of it accessible only in limited or hard-to-find formats. The issues traditionally encountered by medieval scholars are the same as those found in working with LSDS. Aside from the difficulty of navigating and processing the sheer volume of data contained within even a single text of the medieval corpus, issues arise in estimating the size and shape of the material, in identifying the relationships that exist between it and other manuscripts, and in seeking, sorting and filtering the textual information.
The research presented within this paper proposes the application of semantically focused XML encoding to support a data-driven approach to medieval textual analysis. Utilising twenty-six of the manuscripts from the Cyfraith Hywel legal corpus as a case study, this paper briefly treats the process of encoding the texts, before presenting select highlights that have resulted from the subsequent analysis. Owing in part to the tenacity of the tradition, which spans several centuries, and in part to its deep cultural ties with medieval Welsh society, Cyfraith Hywel is widely considered to be the most important source available to modern scholars on the state of the Welsh legal system of its time, as well as a valuable reflection of the cultural and historical situation within Wales throughout the Anglo-Norman invasion. The temporal and structural depth of these manuscripts is matched by the breadth of their content. Covering topics that range from the rights, dues and duties of each of the King’s officers to the valuation of animal parts and the place and protections of women, Cyfraith Hywel is a veritable social compendium, constructed to embody the culture of medieval Wales in so far as it is applicable to the law.
Numerous techniques have been employed to aid navigation of the Cyfraith Hywel corpus, primarily through the creation of scholarly editions. These approaches, like the majority of access methods for LSDS, permit the researcher either to aggregate the data, thereby gaining a surface-level overview, or to 'drill down' and explore a smaller sample of the data in greater detail. This limitation on access to the text has resulted in the application of research methodologies that are not replicable with additional data-sets. Furthermore, much scholarship on Cyfraith Hywel, often by the admission of the academic who conducted the research, lacks sufficient data to present a representative and convincing argument for the corpus as a whole, or to create a protocol allowing the inclusion of additional material that might progress the research further. This lack of inherent replicability and expandability within the discipline has left Cyfraith Hywel at something of a methodological impasse, one which is reflected throughout medieval scholarship.
Through the application of semantically focused encoding, the edited corpus that forms the core of this research is opened up to both horizontal and vertical examination of the material, whilst concurrently providing opportunities for replicable and expandable exploratory data analysis.
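By way of illustration only, the following minimal sketch suggests how semantically encoded XML might support both kinds of examination. It is not the encoding scheme used in this research: the element names (`corpus`, `manuscript`, `clause`), the `siglum` and `topic` attributes, the sigla `MS1` and `MS2`, and the placeholder text are all hypothetical. It also assumes that a "horizontal" examination compares one topic across manuscripts and a "vertical" examination profiles the topics within a single manuscript.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical markup: every element, attribute and siglum here is invented
# for illustration and does not reproduce the project's actual encoding.
SAMPLE = """
<corpus>
  <manuscript siglum="MS1">
    <clause topic="officers">placeholder text</clause>
    <clause topic="animals">placeholder text</clause>
    <clause topic="women">placeholder text</clause>
  </manuscript>
  <manuscript siglum="MS2">
    <clause topic="officers">placeholder text</clause>
    <clause topic="officers">placeholder text</clause>
    <clause topic="animals">placeholder text</clause>
  </manuscript>
</corpus>
"""

root = ET.fromstring(SAMPLE)


def across_manuscripts(corpus, topic):
    """'Horizontal' view (assumed sense): count clauses on one topic per manuscript."""
    return {
        ms.get("siglum"): len(ms.findall(f"clause[@topic='{topic}']"))
        for ms in corpus.findall("manuscript")
    }


def within_manuscript(corpus, siglum):
    """'Vertical' view (assumed sense): tally every topic inside one manuscript."""
    ms = corpus.find(f"manuscript[@siglum='{siglum}']")
    return Counter(clause.get("topic") for clause in ms.findall("clause"))


print(across_manuscripts(root, "officers"))
print(within_manuscript(root, "MS2"))
```

Because both functions depend only on the markup and not on any particular manuscript, the same queries could in principle be rerun unchanged when further encoded manuscripts are added, which is the sense of replicability and expandability intended above.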