Picture this - visualising newspaper data
What are newspapers made of?
One could say that they are made of accounts of current events, collected in the form of a document for the interest of a particular readership. One could equally say that they are made of assumptions that covertly or overtly express an ideology of one kind or another. One could be literal and say that newspapers are made of paper, generally derived from a rag or wood pulp confection, cut to a set shape and overlaid with ink in the form of recognisable objects. Newspapers are, most simply, made of words, numbers and pictures - but mostly words.
Participants at British Library newspaper data visualisation workshop, 30 October 2019
Newspapers are also made of data. Data is a newspaper's underlying code. Beyond the plain text there exist underlying collections of terms from which we may discover ideas, clues, connections and patterns that reveal all the more of what a set of newspapers has to say.
The digitisation of historic newspapers is creating not just a digital simulacrum of our physical newspaper archives, but a vast collection of data that can be derived from the digitisation process. Most users of digitised historic newspapers will be aware of optical character recognition, or OCR, the process by which the text on an old newspaper page, which we see as words but which a machine understands to be images, is converted by that machine into words that it can recognise. Some will also know that older type, or poor quality microfilmed newspapers, can lead to inaccurate OCR, as the machine struggles to interpret the muddy images it sees into words that we will recognise. Some may also know that specialist software enables a digitised newspaper page to be broken down into its constituent parts, such as articles, illustrations or advertisements.
But there is much more that can be done when we digitally analyse this initial layer of information still further. Software programmes can highlight proper names (people, places, organisations), word frequencies, patterns of words and recurrent phrases. It is a form of indexing, as though someone had read through an entire run of a newspaper and produced an index. Indexes to annual volumes of newspapers were not uncommon in the nineteenth century, when people would visit clubs or newspaper reading rooms to leaf their way through past newspapers, the best known being Palmer's Index to The Times. Now software can do this basic work but also much more. It can uncover those patterns which may reveal an underlying history; hidden truths just waiting for the right software programme to bring them to the surface.
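For the curious, here is a minimal sketch in Python of the simplest kind of derived data mentioned above: word frequencies and recurrent phrases. It is an illustration only, not the software used for the Library's newspapers. The function names and the sample sentence are made up for the example, and real OCR text would need far more careful cleaning.

```python
import re
from collections import Counter


def tokenize(text):
    """Lower-case the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def word_frequencies(text, top=5):
    """Return the `top` most common words with their counts."""
    return Counter(tokenize(text)).most_common(top)


def recurrent_phrases(text, n=2, min_count=2):
    """Return n-word phrases that occur at least `min_count` times."""
    tokens = tokenize(text)
    # Slide a window of n words across the token list.
    ngrams = zip(*(tokens[i:] for i in range(n)))
    counts = Counter(" ".join(gram) for gram in ngrams)
    return [(phrase, c) for phrase, c in counts.most_common() if c >= min_count]


# Invented sample text, standing in for a newspaper article.
sample = (
    "The inquest was held at the Town Hall. "
    "The inquest heard that the fire began at the Town Hall stables."
)

print(word_frequencies(sample, top=3))
print(recurrent_phrases(sample))
```

Run over a whole year of a newspaper rather than two sentences, counts like these are the raw material for the term lists and phrase lists described in this post.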
The opportunities created by derived data are exciting a growing number of data science and digital humanities scholars, who find newspapers an especially fruitful source of enquiry for their large numbers, their consistency of form and their geographical, social or political identity. Digitised newspapers reflect the flow of time, turning news into history.
So the specialists are being well catered for, but what about the rest of us? Here at the British Library, as we digitise more and more newspapers and so create an ever greater reservoir of re-usable data, we are interested in opening up newspaper data to other kinds of users. We shouldn't all have to be experts in specialist file formats or programming languages to get something out of newspaper data. We should be thinking equally of those who would just like to have a spreadsheet with a clear set of fields that they can sort (by place, date, title etc.), and maybe some guidance on easy-to-use visualisation tools that enable anyone to produce a graph, pie-chart or stylish map.
All of this we are going to do. We will have news soon about what we are going to be making available to all. But we are also looking at ways in which such data might inspire creativity. In partnership with the London College of Communication, we have organised some trial workshops in newspaper data visualisation.
The first of these took place early in October, a report on which was published on this blog. For this workshop we had a mixed group of volunteers, though several were newspaper history specialists. We gave them some sample nineteenth century British newspaper stories and invited them to rethink what they saw in visual terms, with reference to the data that could be derived from the stories, either by machine or human.
The results were fascinating, at times inspiring (we now know that inside some newspaper historians lies an artist just waiting to be set free). However, for a second workshop at the end of October we changed tack. Instead of giving the volunteers stories we gave them one of four sample newspapers from the nineteenth century but asked them to concentrate on sets of terms, phrases and story headlines that we had generated from an entire year of the newspaper (we chose 1880 and the newspapers the Illustrated Police News, Hull Packet, Newcastle Courant and Manchester Weekly Times). Analysing an entire year yielded more meaningful results from which we expected the volunteers to be able to create their own visual impressions, rough sketches inevitably, but with the hope of showing the potential.
Our volunteers were another mixed group, this time with more people from the creative side of things (including some art and design students), but again some with expertise in newspapers. What we saw was a rich set of different responses to the data. Some worked with the terms as presented. One created an idea for a headline generator, focussing on disturbing stories reporting on violence towards women, from the Illustrated Police News. Another, who was a poet, uncovered patterns within the terms that revealed found poetry.
Named entities from the Hull Packet for 1880 (names and organisations)
Others worked with the data to visualise the newspaper form differently, one that made its underlying messages more apparent, or at least arranged in a new light. One design student reinvented her newspaper as an unfolding square, with its messages on the outside leading to greater discovery within. One group extracted the major components of their newspaper (advertising, law reports, local news, entertainments) which they laid out on the floor, with lengths of string indicating which kinds of newspaper component were most prominent (advertising had by far the longest string).
Another group started with an academic research question (they wanted to know how they could find out how much advertising space was being bought for different products) and imagined a form of digital analysis which measured space alongside subject, extrapolating the potential for visual analysis in a most interesting way. Another participant saw the opportunity for presenting nineteenth century newspapers in a twenty-first century format, and likewise twenty-first century subjects in a nineteenth century newspaper format, to make that which could appear alien to a young audience of today more meaningful, and revelatory.
I noted three things in particular. The first was that data on its own was not helpful. It appeared to lack meaning. A set of terms only became meaningful to the volunteers when they could see it in the context of the newspaper from which the data was generated. To understand and value derived data, we need knowledge of its roots.
Secondly, it was noticeable how people did not so much work with the data as use it as a springboard for their own analyses of what was significant about the newspaper before them. The data encouraged creative thinking without necessarily being used directly as the basis of the creative object. Derived data can form the building blocks of a new kind of historical enquiry, but it can also - quite literally - inspire. It encourages us to think, and to visualise, analytically.
Lastly, people see things differently because they are different, and in this variety lies such opportunity. A newspaper historian sees the patterns of news. A designer sees how a raw idea may be made both beautiful and practical. A poet sees poetry.
We will be exploring this area further. Although our primary goal in making more historical newspaper data available is to assist academic as well as general researchers, we want to see where the creative impulse may take us. It could lead to different kinds of newspapers being digitised, or derived data being made in forms most suitable for creative inspiration. It could lead to beautiful things.