The Newsroom blog

News about yesterday's news, and where news may be going

4 posts from January 2019

31 January 2019

The anatomy of news

“I hear new news every day”, wrote the scholar Robert Burton in 1628, “and those ordinary rumours of war, plagues, fires, inundations, thefts, murders, massacres, meteors, comets, spectrums, prodigies, apparitions, of towns taken, cities besieged in France, Germany, Turkey, Poland, daily musters and preparations, and such like.” For Burton, this firehose of news amounted to a “vast confusion”, though his attitude seems to have been one of wonder rather than fear.

Burton was an Oxford man, but made regular trips to London. There he would have paid a visit to the Exchange, gathering up news and gossip from the merchants crowding the surrounding streets, before moving on to St. Paul’s Churchyard, perhaps stopping to buy a pamphlet from a hawker on the way. In front of the Cathedral he might have picked up some more pamphlets from the many booksellers lining the border of its square, or a copy of Nathaniel Butter and Nicholas Bourne’s new news publication, an innovative weekly format copied from the continent, although, somewhat disappointingly, it wouldn’t have contained any domestic news.

This short walk helps us understand how Burton perceived a world of overwhelming information. But what would he have made of the 21st century? Indeed, what would he have made of the 19th? Had he been writing, say, 250 years later, in 1872, Burton would surely have been overwhelmed by the number of titles available to him on a daily basis.

A late-seventeenth-century London coffee house (Usage terms: Creative Commons Attribution Non Commercial Share Alike licence. Held by © Trustees of the British Museum)

The 19th century is a new world for me, coming from a background of 17th century newspapers. And it is a different world. There’s the name, for one thing: the Oxford English Dictionary records the first use of the word ‘newspaper’, to mean a publication of regular, periodical news, in 1688. My own work is on the first half of the 17th century, when the word ‘news-book’ was most common, as was a host of words and phrases like ‘coranto’, ‘weekly news-sheet’, ‘weekly pamphlet’ and ‘Mercuries’, with overlapping, shifting and slightly different meanings.

This naming change can be useful – it helps us to grasp the real intellectual and material differences between the news world of the 17th century and that of the 19th. Although the change was gradual and not always linear – changes and innovations often moved backwards as well as forwards – the march of progress did eventually pick up pace. 17th century news looked very different, much like a few sheets of A4 paper folded in half, with news in a single column. It was called a news-book because it looked like a small book. The way information was organised was different, too: early 17th century news-books contained a series of paragraphs each from a particular place, recording all the news collected from that place. The invention of the ‘article’, a unit of news based on one particular subject or event, was not to happen for some time.

Newspaper front pages showing the evolution from one to eight columns

The evolution from one to eight columns

This categorical divide also continues with the data. I estimate there are 1,000,000 words in Early English Books Online’s entire periodicals collection. The British Library’s collection of 19th century news runs to hundreds of millions of pages (we wrote recently that the collection consists of 60 million issues, 450 million pages... perhaps four trillion words... twenty-six trillion characters…). The other seismic change is that a computer can be taught to read (with varying accuracy) 19th century news. For the 17th, it’s still very difficult.

This Optical Character Recognition is what allows me to load up the British Newspaper Archive and check if my great-great-granddad committed any crimes in 1839 (still can’t find anything), for example, or check Limerick hurling scores from 1887. This difference isn’t just trivial: it represents a complete step-change in the way we approach newspaper history. For one thing, the datasets increase in size, by orders of magnitude. I have created a dataset of about 15,000 rows, manually collected, by reading 17th century news and noting down bits of information in a spreadsheet. 15,000 rows, from about 400 newspaper issues, which took many months to create. Yesterday, in a few hours, I created a dataset of N-Grams (basically sequences of adjacent words) from a single issue of one 19th century title. It contained 150,000 rows.
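
For readers who have not met N-Grams before, here is a minimal sketch in Python of how such rows can be generated from a piece of text. It is an illustration only, not the process used on the newspaper issue: the function name, the tokenising rule and the sample sentence (a Burton phrase with a made-up ending) are all assumptions.

```python
import re
from collections import Counter


def ngrams(text, n=2):
    """Count the word n-grams (runs of n adjacent words) in a text.

    Hypothetical helper for illustration. Tokenising is deliberately
    crude: lower-case everything and keep runs of letters, digits and
    apostrophes. Real OCR text would need more care.
    """
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    # Pair each token with its n-1 successors by zipping shifted copies.
    grams = zip(*(tokens[i:] for i in range(n)))
    return Counter(" ".join(gram) for gram in grams)


# Made-up sample sentence, not taken from any newspaper.
sample = "I hear new news every day and new news every hour"
counts = ngrams(sample, 2)

# Each distinct n-gram would become one row in a dataset.
for gram, count in counts.most_common(3):
    print(gram, count)
```

An eleven-word sentence yields ten two-word N-Grams, so it is easy to see how a full newspaper issue, counted at several lengths of N-Gram, runs to a very large number of rows.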

150,000 rows of generated data, from one issue. Multiply that by about 250 for a weekday title, then by hundreds of titles, then by 200 years and the potential for ‘big data’ is rather astonishing. Of course, this data is not as rich with information as my humble spreadsheet, nor does it record any kind of fine-grained detail, but it does change the types of processing, computing power and storage needed, and most importantly, the types of intellectual questions that are and are not answerable. My 17th century dataset is like interviewing everyone in a small town, in some detail; the 19th century datasets we’ll be working with on our Heritage Made Digital newspapers project record the cosmos – albeit from far away. We don’t know much, but we know it about an enormous number of things. But the differences extend past volume: there is also a step-change in readership and scope.

The 19th century newspaper was everywhere. Some of the most popular 17th century newsbooks were probably printed in weekly runs of about 2,000; by 1863, the Daily Telegraph had a circulation of 120,000 per day. In 1628 Burton was overwhelmed by information in London and Oxford but elsewhere the firehose could be a drip, or a drought. By the 19th century news surged through the country’s arteries, veins and capillaries: at first everywhere within the reach of the train; eventually by the telegraph, information finally travelling at the speed of light, in dots and dashes. It was the most pervasive cultural object of the century.

Graph showing newspaper titles held by the British Library, year by year, 1621-1900

Newspaper titles held by the British Library, year by year, 1621-1900

Even accounting for the reuse and sharing of copies this is a fundamentally very different type of cultural artefact. If I analyse every page of news in the early 17th century, I have a vast record of events, and the thoughts and feelings of a select group of people. In the 19th century, the newspaper is a reasonable proxy for the way society thinks. To me it seems as though news in the 19th century captures a good proportion of a collective consciousness. It is a reasonable (though problematic) way to infer societal change. Through the newspaper’s great reach we can understand historical forces. The articles and personalities in the 19th century newspaper can tell us about structures of power. Its advertisements identify trends, economic forces and the changing roles within the family. The words themselves and their frequencies can help us understand the use of language, or uncover drifts in sentiments towards political movements, ideologies and so forth. In the 17th century the readership is so small, such a small part of the diet of information ingested by both important and ordinary people, that the questions we ask of its remains are different. Not less important, certainly not less interesting, but surely of a different kind.

Yes, the 19th century news world feels like a different one to the 17th. A mostly new world, with some evidence of the ruins of its earlier civilisation: the old towers are fallen, though echoes of their presence remain. The vast confusion had been replaced with one infinitely greater. Our job is to find, research and understand the new techniques that are necessary to make sense of this information overload.

Yann Ryan

Curator, Newspaper Data

22 January 2019

African newspapers

We're delighted to be able to announce a significant new digital resource using newspapers from the British Library's collection. African Newspapers: The British Library Collection is being offered by the academic publisher Readex. The collection comprises sixty-four newspaper titles, all dating from before 1901, that were published throughout Africa, chiefly in English.

Front page of website African Newspapers: The British Library Collection

The British Library has substantial collections of newspapers from the African continent, particularly for the period of the British Empire, almost none of which have been available digitally before now. Ranging from 1840 to 1900, the newspapers cover the period of European exploration, colonialism and the first steps towards self-governance. The newspapers contain news reports, articles, letters, advertisements, shipping reports and obituaries, providing an invaluable portrait of a continent in transition.

The available titles are still being added to, but the finished resource will include such titles as Central African Times, Egyptian Gazette, Times of Marocco and the West African Reporter. The territories covered include the countries now known as Djibouti, Ghana, Nigeria and South Africa, and the islands of Mauritius and Saint Helena. All are fully word-searchable and browsable.

The Egyptian Gazette, from African Newspapers: The British Library Collection

African Newspapers: The British Library Collection is available to British Library readers at our St Pancras and Boston Spa locations, as one of the many electronic resources that we provide onsite. A few we can offer for free remote access to those with a British Library reader pass, including Readex's World Newspaper Archive: African Newspapers, 1800-1922 and Rand Daily Mail (which is partly based on the British Library's run of this key South African title), via our Remote eResources facility. African Newspapers: The British Library Collection is not available for remote access, but has greatly expanded the number of newspapers from this period of African history which can now be searched in depth via a single interface.

We have a mixed model for the digitisation of newspapers. For British and Irish newspapers, chiefly regional, we work with family history company Findmypast, which produces the British Newspaper Archive website. Recently we announced that the British Library has started digitising newspapers itself, concentrating on some out-of-copyright (pre-1878) newspapers published in London, whose physical originals are often in a poor or unfit state. And we work with academic resource providers such as Readex, Cengage Gale and Adam Matthew Digital, who create thematic packages which sometimes include British Library newspapers and periodicals and are marketed to educational institutions. It's a complicated picture, and not everything can be made accessible to anyone anywhere, but through such collaborations we can make far more available digitally than we could ever achieve alone.

16 January 2019

Multi-title newspaper volumes – historic practice vs. modern process

The British Library (and its previous incarnation as a department of the British Museum) has been collecting newspapers for over 200 years. The ways in which these items have been acquired, processed, and stored have changed over time as priorities, policies, locations and technologies have developed, but some historic practices have had interesting implications for our current Heritage Made Digital programme to digitise a number of 19th century newspapers. The newspapers we are digitising are mostly London based, and we are focussing on titles that have a number of volumes in poor or unfit condition, with the aim of filling in some gaps that currently exist in the digital archive.

Multi-title newspaper volumes in the British Library

Multi-title newspaper volumes

One of the practices that has had a real impact on the way we have approached this digitisation project is the way in which items have been bound in the past for storage and preservation. When thinking about how newspapers are stored by the British Library, most users probably imagine that complete runs of a title, for instance the Morning Herald, would be held together. But that hasn’t always been the case. For ease of processing and storage and to conserve space, many newspapers were held in annual sequences, rather than in title runs. So newspapers published in 1832 were held together, and these were followed by the titles published in 1833.

For many newspapers, particularly the dailies, this simply means that one or more volumes of a title are held in each yearly sequence. But for a significant part of the collection, mostly weekly titles of 12 pages or less, for which the British Library collected only one edition, there simply wasn’t enough material to make up a bound volume on their own. This has meant that there are many volumes containing two, three or sometimes more titles for a year. For example, a volume currently sat on my desk contains both The Ballot and the Weekly Times for 1831.

The Ballot and the Weekly Times newspapers in one volume

The Ballot and the Weekly Times in one volume

This practice saved space and money by reducing the number of bindings produced for each year. There were no strict rules about how many titles or pages constituted a volume, and practices varied over the years, but mostly newspapers of a similar size are bound together, as this makes them easier to store. In most cases all of the items bound into a volume are newspapers, but occasionally they also contain periodicals, and these fall under the care of other departments within the British Library.

This has led to some complications in the workflow of our digitisation processes. The catalogue records for newspapers do not contain details of how they are bound, so we are often unaware of whether items are bound individually or in multi-title volumes, or which items are bound together. This sometimes means that a single volume is called up multiple times by our digitisation team, as several titles from our list are bound together for one or more years. It has also made the process of scanning titles more laborious and complicated. Staff do not simply open a volume and scan the contents; they have to identify the correct title, and work out where it starts and ends, checking this against the details that are on the catalogue records.

It has also raised some interesting questions for our digitisation project. What impact does going through the digitisation process several times have on volumes, particularly as many are already in a poor or unfit condition? If one of the titles we are digitising is bound in a multi-title volume, should we be digitising all of the other titles with which it is bound? Should we be digitising the periodicals that are contained within these volumes, even though they are not officially part of the newspaper collection? How far should what is digitised follow the physical reality of what is archived?

We are still working to answer some of these questions. In general we have had to stick to digitising only the items already on our list, as otherwise the numbers could spiral out of control, and we might end up digitising large numbers of titles that do not meet our criteria (i.e. in a poor or unfit condition; out of copyright; and with a circulation beyond London). We look closely at the other titles we come across, and assess them against our objectives, but in most cases there are reasons why they had not already been selected.

Despite their complications these multi-title volumes do also provide opportunities. I will talk in a future post about serendipity and its role in newspaper research, but it has also played a very small role in our selection process. As mentioned above, in most cases we have stuck only to those titles already on our selection list, but there have been a few occasions when, looking at a volume, we have stumbled across another newspaper that has proved interesting enough to make it onto our list. It has also made us think a lot about how and why things were done in the past, and how practices evolve, giving us a better understanding of how the collection was, how it currently is, and how it could be in the future.

Beth Gaskell

Curator, Newspaper Digitisation

07 January 2019

Heritage Made Digital - the newspapers

The British Library is currently engaged on a major programme entitled Heritage Made Digital. The aim of the programme is to transform digital access to the British Library's heritage collections by streamlining digitisation workflows, undertaking strategically led digitisation and making existing digitised content available as openly as copyright and licensing agreements allow. Heritage Made Digital is embracing a wide range of materials, from manuscripts through to sounds, and one of its major elements is newspapers. 

Unfit newspaper volumes awaiting conservation inspection at the British Library

Unfit newspaper volumes awaiting conservation inspection

The first thing to ask is why the British Library needs to be digitising newspapers, when we already have a very productive relationship with family history company Findmypast, which selects and digitises newspapers for the British Newspaper Archive, providing us with digital preservation copies in the process. It has digitised over 20 million pages from our collection, and adds hundreds of thousands of extra pages each month.

The simple answer is that there is more that we would like to see digitised that isn't likely to get digitised soon otherwise. The greater part of newspapers processed by Findmypast come from our microfilmed copies, because it is so much easier and quicker to do so (about eighteen times quicker than digitising from print). But only a third of our collection of some 60 million newspaper issues has been microfilmed. Of the newspapers for which we have only print, some get digitised, but many do not. In part this is because of the condition of many of the newspapers, often produced using low-quality newsprint and for many years not stored in optimum conditions. We define the preservation status of our newspapers under three categories: good, poor and unfit. Unfit no one gets to see, even onsite, unless we have a microfilm or digital access version. And around 4.5% of our collection (or 20 million pages) is in an unfit state and with no microfilmed or digitised copy available. That's a lot of newspapers not to be making available at all.

So, for Heritage Made Digital, we have chosen to concentrate on newspapers in a poor or unfit condition. This is not as straightforward as it might sound, since few runs of a newspaper title (i.e. from its first date to its last date) exist under one condition status. One volume may be good, another poor, another unfit (e.g. with a broken spine, crumbling pages etc). Therefore, although we want to concentrate on poor or unfit newspapers, we also want to digitise full runs of newspaper titles, because this will make best sense for researchers. In practice, we find that 40% of the volumes we are digitising for Heritage Made Digital are in a poor or unfit state. 

We have set other restrictions for ourselves, with the aim of offering the best result for the widest range of research users. We are only digitising newspapers that are out of copyright, so that we can make the results freely available online - both the digitised pages and the data created by digitisation. Calculating when a newspaper goes out of copyright is complicated, but we are sticking to a 140-year rule - so the run of the newspaper has to have ended by 1878. 

Next, we are primarily digitising newspapers that were published in London but which were distributed outside London as well. So, not newspapers for the areas of London only (i.e. London regionals), but metropolitan newspapers with a wider circulation. Curiously enough, this is a neglected area for newspaper digitisation. The British Newspaper Archive focusses heavily on British regional newspapers, while the main UK national newspapers available digitally are almost entirely those where the title still exists (e.g. The Guardian, The Times). In other words, we have identified a gap, one which we think will make a significant difference to what is available online so far.

We are not in competition with Findmypast, however - in fact, we are working closely with them. Every newspaper that we digitise will be made freely available via the British Library's catalogue, but they will also be made available via the British Newspaper Archive (a subscription site). That means that almost all of our digitised newspapers will be searchable - by title, date and word - in the one place. As things stand, the newspapers will be appearing on the BNA first, and secondly (at a date still to be determined) through the British Library catalogue, using the Universal Viewer display tool (a development project still in progress).

Pile of British Library newspaper volumes

Waiting to be digitised

So, what are we digitising?

It will be around 1.3 million pages, 1 million from print and another 300,000 from microfilm. We're still choosing the titles to digitise, even as we start digitising, as we find out more through a process of preservation need and research, but it will be somewhere around 180 newspaper titles, many of them short runs of a year or less. We can't provide a definitive list as yet, but these are some of the titles (with title changes) that have gone to our imaging studios already:

  • Baldwin's London Weekly Journal (1803-1836)
  • The Bee-Hive / The Penny Bee-Hive (1862-1876)
  • The British Liberator (1833)
  • Colored News (1855)
  • Illustrated Sporting News and Theatrical and Music Review / Illustrated Sporting and Theatrical News (1862-1870)
  • The Lady's Newspaper and Pictorial Times (1847-1863)
  • Mirror of the Times (1800-1823)
  • Morning Herald (1801-1869)
  • The News / The News and Sunday Herald / The News and Sunday Globe (1805-1839)
  • People's Weekly Police Gazette (1835-1836)
  • Pictorial Times (1843-1848)
  • The Saint James's Chronicle (1801-1866)
  • The Sun / The Sun & Central Press (1801-1876)

There is a lot more that we have planned. We're exploring academic partnerships (we're already working closely with the recently-announced British Library/Alan Turing Institute data science project Living with Machines). We're aiming to do creative things with the data. We will be publishing blog posts, both about the content and about the decisions we're making on what gets digitised. We will be producing online guides and research tools, aimed at both the specialist and the general user.

We think that we have come up with a model for the digitisation of newspapers, in particular the way in which we are working in partnership with Findmypast, which will be particularly productive. We certainly hope to build on it beyond the life of the project. We can't show you any newspapers digitised through Heritage Made Digital, or offer any free datasets, as yet. But we will do soon.

It's worth remembering that the British Library has 60 million newspapers, from 1619 to the present day. After a decade or more of intensive work, we have digitised just 5%. There is a long, long way to go.