THE BRITISH LIBRARY

The Newsroom blog

07 January 2019

Heritage Made Digital - the newspapers

The British Library is currently engaged on a major programme entitled Heritage Made Digital. The aim of the programme is to transform digital access to the British Library's heritage collections by streamlining digitisation workflows, undertaking strategically led digitisation and making existing digitised content available as openly as copyright and licensing agreements allow. Heritage Made Digital is embracing a wide range of materials, from manuscripts through to sounds, and one of its major elements is newspapers. 


Unfit newspaper volumes awaiting conservation inspection

The first thing to ask is why the British Library needs to be digitising newspapers, when we already have a very productive relationship with family history company Findmypast, which selects and digitises newspapers for the British Newspaper Archive, providing us with digital preservation copies in the process. It has digitised over 20 million pages from our collection, and adds hundreds of thousands of extra pages each month.

The simple answer is that there is more that we would like to see digitised that isn't likely to get digitised soon otherwise. The greater part of the newspapers processed by Findmypast come from our microfilmed copies, because digitising from microfilm is so much easier and quicker (about eighteen times quicker than digitising from print). But only a third of our collection of some 60 million newspaper issues has been microfilmed. Of the newspapers for which we have only print, some get digitised, but many do not. In part this is because of the condition of many of the newspapers, often produced using low-quality newsprint and for many years not stored in optimum conditions. We define the preservation status of our newspapers under three categories: good, poor and unfit. Unfit newspapers no one gets to see, even onsite, unless we have a microfilm or digital access version. And around 4.5% of our collection (or 20 million pages) is in an unfit state, with no microfilmed or digitised copy available. That's a lot of newspapers not to be making available at all.

So, for Heritage Made Digital, we have chosen to concentrate on newspapers in a poor or unfit condition. This is not as straightforward as it might sound, since few runs of a newspaper title (i.e. from its first date to its last date) exist under one condition status. One volume may be good, another poor, another unfit (e.g. with a broken spine, crumbling pages etc). Therefore, although we want to concentrate on poor or unfit newspapers, we also want to digitise full runs of newspaper titles, because this will make best sense for researchers. In practice, we find that 40% of the volumes we are digitising for Heritage Made Digital are in a poor or unfit state. 

We have set other restrictions for ourselves, with the aim of offering the best result for the widest range of research users. We are only digitising newspapers that are out of copyright, so that we can make the results freely available online - both the digitised pages and the data created by digitisation. Calculating when a newspaper goes out of copyright is complicated, but we are sticking to a 140-year rule - so the run of the newspaper has to have ended by 1878. 
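As a rough illustration of how the 140-year rule turns into a cut-off date, here is a minimal Python sketch. The function names and the reference year of 2018 are assumptions made for the example (2018 is simply the year that yields the 1878 cut-off stated above); the run dates are taken from titles named later in this post. It is not a description of how copyright is actually assessed, which, as noted, is more complicated.

```python
# Sketch only: the 140-year figure and the 1878 cut-off come from the post;
# the reference year and function names are assumptions for illustration.
RULE_YEARS = 140


def latest_eligible_end_year(reference_year: int, rule_years: int = RULE_YEARS) -> int:
    """Return the last year in which a title's run may have ended."""
    return reference_year - rule_years


def is_eligible(run_end_year: int, reference_year: int = 2018) -> bool:
    """True if a run ended on or before the cut-off year."""
    return run_end_year <= latest_eligible_end_year(reference_year)


# Run dates (first year, last year) as given for titles listed in this post.
titles = {
    "The Bee-Hive / The Penny Bee-Hive": (1862, 1876),
    "Morning Herald": (1801, 1869),
    "The Sun / The Sun & Central Press": (1801, 1876),
}

for title, (first_year, last_year) in titles.items():
    print(f"{title}: eligible={is_eligible(last_year)}")
```

Keeping the rule as a single subtraction makes it easy to see why the cut-off moves forward by one year as each year passes.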

Next, we are primarily digitising newspapers that were published in London but which were distributed outside London as well. So, not newspapers for the areas of London only (i.e. London regionals), but metropolitan newspapers with a wider circulation. Curiously enough, this is a neglected area for newspaper digitisation. The British Newspaper Archive focusses heavily on British regional newspapers, while the main UK national newspapers available digitally are almost entirely those where the title still exists (e.g. The Guardian, The Times). In other words, we have identified a gap, one which we think will make a significant difference to what is available online so far.

We are not in competition with Findmypast, however - in fact, we are working closely with them. All the newspapers that we digitise will be made freely available via the British Library's catalogue, but they will also be made available via the British Newspaper Archive (a subscription site). That means that almost all of our digitised newspapers will be searchable - by title, date and word - in one place. As things stand, the newspapers will be appearing on the BNA first, and secondly (at a date still to be determined) through the British Library catalogue, using the Universal Viewer display tool (a development project still in progress).


Waiting to be digitised

So, what are we digitising?

It will be around 1.3 million pages, 1 million from print and another 300,000 from microfilm. We're still choosing the titles to digitise, even as we start digitising, as we find out more by assessing preservation need and through research, but it will be somewhere around 180 newspaper titles, many of them short runs of a year or less. We can't provide a definitive list as yet, but these are some of the titles (with title changes) that have gone to our imaging studios already:

  • Baldwin's London Weekly Journal (1803-1836)
  • The Bee-Hive / The Penny Bee-Hive (1862-1876)
  • The British Liberator (1833)
  • Colored News (1855)
  • Illustrated Sporting News and Theatrical and Music Review / Illustrated Sporting and Theatrical News (1862-1870)
  • The Lady's Newspaper and Pictorial Times (1847-1863)
  • Mirror of the Times (1800-1823)
  • Morning Herald (1801-1869)
  • The News / The News and Sunday Herald / The News and Sunday Globe (1805-1839)
  • People's Weekly Police Gazette (1835-1836)
  • Pictorial Times (1843-1848)
  • The Saint James's Chronicle (1801-1866)
  • The Sun / The Sun & Central Press (1801-1876)

There is a lot more that we have planned. We're exploring academic partnerships (we're already working closely with the recently-announced British Library/Alan Turing Institute data science project Living with Machines). We're aiming to do creative things with the data. We will be publishing blog posts, both about the content and about the decisions we're making on what gets digitised. We will be producing online guides and research tools, aimed at both the specialist and the general user.

We think that we have come up with a model for the digitisation of newspapers, in particular the way in which we are working in partnership with Findmypast, which will be particularly productive. We certainly hope to build on it beyond the life of the project. We can't show you any newspapers digitised through Heritage Made Digital, or offer any free datasets, as yet. But we will do soon.

It's worth remembering that the British Library has 60 million newspapers, from 1619 to the present day. After a decade or more of intensive work, we have digitised just 5%. There is a long, long way to go.