Digital scholarship blog

Enabling innovative research with British Library digital collections

08 July 2016

A temporal & spatial investigation of disease in 19th century British newspapers

We're excited to present this guest post by Catherine Porter, Spatial Humanities Project, Department of History, Lancaster University, based on a talk she gave at an internal seminar (a second talk from the same seminar is also available). Over to Cat...

In recent years, libraries and archives have worked tirelessly to digitise their collections. For researchers, one of the most exciting is the British Library’s 19th Century Newspaper Collection, containing 30 titles and in excess of 30 billion words. But using these digitised corpora for research poses new methodological questions. The greatest challenge for the historian, for instance, is how to fully explore a topic of interest in textual archives that often run to millions, if not billions, of words. Certainly, the close-reading techniques traditionally used (systematically reading or searching a piece of text in full, often a slow and laborious process) are still applicable, but fully investigating 100 years of even a single publication by conventional methods is impossible. To follow a theme across an entire century of newspaper publication, we must find new ways to interact with these digital archives.

I am a post-doctoral Research Associate on the European Research Council project ‘Spatial Humanities: Texts, GIS & Places’, based at Lancaster University, where we have made significant strides in finding ways to investigate digital texts more fully. The project team has developed a suite of techniques called Geographical Text Analysis (GTA) (see Porter et al., 2015), combining corpus linguistic methodologies with geospatial technologies so that we can analyse and ‘map’ themes of interest in digital texts both temporally and spatially.

GTA works by first carrying out keyword searches in Lancaster University’s CQPweb, a corpus linguistic software platform that houses specially prepared digital copies of the BL 19th Century Newspaper Collection. The search results are exported as a text file, which is then 'geoparsed' (see Grover et al., 2012): place-names occurring within a set span of words either side of the keywords of interest are identified and tagged, and each is allocated coordinates using a gazetteer (a list of place-names and their associated coordinates) such as GeoNames. The resulting file contains the keywords of interest, the text either side of each keyword to provide context, the place-name, and its coordinates, which together enable us to map the textual information using GIS software.

Screenshot of text in CQPweb
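To make the span-based tagging step more concrete, here is a minimal Python sketch. It is not the project's pipeline: real geoparsers typically use named-entity recognition and disambiguate between places that share a name, whereas this sketch simply looks up single words in a tiny hand-made dictionary. The names `GAZETTEER`, `GeoHit` and `geoparse_span`, the default 10-word span, the approximate coordinates and the sample sentence are all hypothetical, standing in for GeoNames and for text exported from CQPweb.

```python
import re
from dataclasses import dataclass

# Hypothetical toy gazetteer standing in for GeoNames:
# place-name -> (latitude, longitude), approximate values for illustration only.
GAZETTEER = {
    "london": (51.507, -0.128),
    "manchester": (53.481, -2.243),
    "liverpool": (53.408, -2.992),
}


@dataclass
class GeoHit:
    keyword: str   # the search term that was matched
    place: str     # place-name found near the keyword
    lat: float
    lon: float
    context: str   # the words in the span around the keyword


def tokenize(text: str) -> list[str]:
    # Lower-case word tokens; punctuation and digits are dropped.
    return re.findall(r"[a-z]+", text.lower())


def geoparse_span(text: str, keywords: set[str], span: int = 10) -> list[GeoHit]:
    """Tag gazetteer place-names within `span` words either side of each keyword.

    Hypothetical helper: single-word place-names only, no disambiguation.
    """
    tokens = tokenize(text)
    hits = []
    for i, tok in enumerate(tokens):
        if tok not in keywords:
            continue
        # Window of `span` tokens before and after the keyword, clipped to the text.
        window = tokens[max(0, i - span): i + span + 1]
        for word in window:
            if word in GAZETTEER:
                lat, lon = GAZETTEER[word]
                hits.append(GeoHit(tok, word, lat, lon, " ".join(window)))
    return hits


if __name__ == "__main__":
    # Invented sample sentence, not taken from the newspaper collection.
    sample = ("Cholera has broken out in Liverpool, and the authorities in "
              "London fear the fever may spread.")
    for hit in geoparse_span(sample, {"cholera", "fever"}, span=5):
        print(hit.keyword, hit.place, hit.lat, hit.lon)
    # prints:
    # cholera liverpool 53.408 -2.992
    # fever london 51.507 -0.128
```

Each record pairs a keyword with a nearby place-name, its coordinates and the surrounding words, mirroring the kind of output file described above; writing such records to CSV would make them loadable in GIS software. Note how the span matters: with a 5-word span, "cholera" is linked only to Liverpool and "fever" only to London, whereas a wider span would link each keyword to both places.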

A key objective of the Spatial Humanities project is to investigate change in mortality in 19th century England and Wales. The role newspapers played in publicising matters of health and disease, such as disease outbreaks, government health policies, and preventative measures, is a key question that can add to historical debates on mortality, yet it has not been fully explored because of the challenges outlined above. We are currently using GTA to investigate what 19th century newspapers said about public health and disease. We are also directly comparing the outputs from GTA with official health statistics to interrogate the temporal and spatial relationship between mortality and newspaper interest.

Our research questions include whether 19th century British newspapers discussed the places in England and Wales that were in greatest need. Using various forms of spatial analysis, we have highlighted those places where mentions of disease were statistically significant. Although discussion of disease was global in scope, references to other countries largely concerned how disease was affecting British people abroad, such as colonial officials and soldiers. Unsurprisingly, in the newspapers investigated so far, we have found that discussion of disease focused on urban settlements in England and Wales and was particularly London-centric, as shown in the following map.

A map of significant disease mentions in the Era newspaper, related to Crowding, Food and Water, and Respiratory diseases in 19th century London (based on disease categories devised by Woods, 2000). Base map Copyright David Rumsey Map Collection.

The value of our Geographical Text Analysis methodology is clear. For the first time, we have at our disposal a suite of techniques for semi-automated, systematic searching of digital texts, enabling us to investigate key themes of interest and to analyse textual information both temporally and spatially. GTA has implications not only for research in general but also for teaching, in disciplines such as the digital humanities, geohumanities, corpus linguistics, history and geography.

For anyone interested in finding out more, please feel free to contact me or Professor Ian Gregory.