THE BRITISH LIBRARY

Digital scholarship blog

Enabling innovative research with British Library digital collections


15 August 2018

Seeking researchers to work on an ambitious data science and digital humanities project

If you follow @BL_DigiSchol or the #DigitalHumanities hashtag on Twitter, you might have seen a burst of data science, history and digital humanities jobs being advertised. In this post, Dr Mia Ridge of the Library's Digital Scholarship team provides some background to contextualise the jobs advertised with the 'Living with Machines' project.

We are seeking to fill several new roles whose holders will collaborate on an exciting new project developed by the British Library and The Alan Turing Institute, the national centre for data science and artificial intelligence. You'd be working with an inter-disciplinary group of investigators. The project is led by Ruth Ahnert (QMUL), and co-led by (in alphabetical order): Adam Farquhar (British Library), Emma Griffin (UEA), James Hetherington (Alan Turing Institute), Jon Lawrence (Exeter), Barbara McGillivray (Alan Turing Institute and Cambridge) and Mia Ridge (British Library).

In its early stages of development, the project, called Living with Machines, brings together national-scale digital collections and data, advanced data science techniques, and fundamental humanities questions. It will look at the social and cultural impact of mechanisation across the long nineteenth century, using data science methods to track both the application of technology to our social and economic lives and the human response to its introduction. The project will initially work with digitised newspaper collections, but will look to include a variety of sources and formats held by the British Library and other institutions.

So what does this mean for you? The project name is both a reference to the impact of the Industrial Revolution and a nod to the impact of computational methods on scholarship. It will require radical collaboration between historians, data scientists, geographers, computational linguists, and curators to enable new intersections between computational methods and historical inquiry. 

We’re looking to recruit people interested in examining the impact - the challenges, as well as the opportunities - of intensely interdisciplinary collaboration, while applying transformative, data-science-driven methods to enable new research questions and approaches to historical sources. As a multidisciplinary project, it will require people with enough perspective on their own disciplines to explain often tacit knowledge about the norms and practices of those disciplines. Each team member will play an active part in relevant aspects of the research process, including outreach and publications, while gaining experience working on a very large-scale project. We're looking for people who enjoy collaboration and solving problems in a complex environment.

As the job titles below indicate, the project will require people with a range of skills and experience. Outputs will range from visualisations, to monographs and articles, to libraries of code; from training workshops and documentation, to work ensuring the public and other researchers can meaningfully access and interpret the results of data science processes. A number of roles are offered to help make this a reality.

Jobs currently advertised:

Further opportunities are currently being scoped and are likely to include:

  • Data and Content Manager (British Library)
  • Rights Assurance (British Library)
  • Digital Curator (British Library, contributing to the development and implementation of the digital scholarship and public outreach strands of the project)
  • Digital Systems Engineer (British Library, suitable for Research Software Engineers or other software developers)

You may have noticed that the British Library is also currently advertising for a Curator, Newspaper Data (closes Sept 9). This isn’t related to Living with Machines, but its approach of applying data-driven journalism and visualisation techniques to historical collections should offer some lovely synergies and opportunities to share work in progress with the project team.

Keep an eye out for press releases and formal announcements from the institutions involved, but in the meantime, please share the job ads with people who might be suitable for any of these roles. If you have any questions about the roles, HR@turing.ac.uk is a great place to start.

13 August 2018

The Parts of a Playbill

Beatrice Ashton-Lelliott is a PhD researcher at the University of Portsmouth studying the presentation of nineteenth-century magicians in biographies, literature, and the popular press. She is currently a research placement student on the British Library’s In the Spotlight project, cleaning and contextualising the crowdsourced playbills data. She can be found on Twitter at @beeashlell and you can help out with In the Spotlight at playbills.libcrowds.com.

In the Spotlight is a brilliant tool for spotting variations between playbills across the eighteenth and nineteenth centuries. The site provides participants with access to thousands of digitised playbills, and the sheets of the playbills in the site’s collections often have lists of the cast, scenes, and any innovative ‘machinery’ involved in the production. Whilst the most famous actors obviously needed to be emphasised and drew more crowds (e.g., any playbills featuring Mr Kean tend to have his name in huge letters), from the playbills in In the Spotlight’s volumes that doesn’t always seem to be the case with playwrights. Sometimes they’re mentioned by name, but in many cases famous playwrights aren't named on the playbill. I’ve speculated previously that this is because these playwrights were so famous that audiences would already have heard by word of mouth or through the press that a new play of theirs was out, so there was no point in adding the name.

What can you expect to see on a playbill?

The basics of a playbill are: the main title of the performance, a subtitle, often the current date, future or past dates of performances, the cast and characters, scenery, short or long summaries of the scenes to be acted, whether the performance is to benefit anyone, and where tickets can be bought from. There are definitely surprises though: the In the Spotlight team have also come across apologies from theatre managers for actors who were scheduled to perform not turning up, or performing drunk! The project forum has a thread for interesting things 'spotted on In the Spotlight', and we always welcome posts from others.

Crowds would often react negatively if the scheduled performers weren’t on stage. Gilli Bush-Bailey also notes in The Performing Century (2007) that crowds would be used to seeing the same minor actors reappear across several parts of the performance and playbills, stating that ‘playbills show that only the lesser actors and actresses in the company appear in both the main piece and the following farce or afterpiece’ (p. 185), with bigger names at theatres royal committing only to either a tragic or comic performance.

In our late 18th century playbills on the site, users can see a fairly standard format in structure and font.

In this 1797 playbill from the Margate volume, the font is uniform, with variations in size to emphasise names and performance titles.

How did playbills change over time?

In the 19th century, all kinds of new and exciting fonts are introduced, as well as more experimentation in the structuring of playbills. The type of performance also influences the layout of the playbill: for instance, a circus playbill will often be divided into a grid-like structure to describe each act and feature illustrations, and early magician playbills often change orientation half-way down the playbill to give more space to describe their tricks and stage.

1834 Birmingham playbill

This 1834 Birmingham playbill is much lengthier than the previous example, showing a variety of fonts and featuring more densely packed text. Although this may look more like an information overload, the mix of fonts and variations in size still make the main points of the playbill eye-catching to passersby. 

James Gregory’s ‘Parody Playbills’ article, stimulated by the In the Spotlight project, contains a lot of great examples and further insights into the deeper meaning of playbills and their structure.

Works Cited

Davis, T. C. and P. Holland. (2007). The Performing Century: Nineteenth-Century Theatre History. Basingstoke: Palgrave Macmillan.

Gregory, J. (2018) ‘Parody Playbills: The Politics of the Playbill in Britain in the Eighteenth and Nineteenth Centuries’ in eBLJ.

08 August 2018

Visualising the Endangered Archives Programme project data on Africa, Part 2. Data visualisation tools

Sarah FitzGerald is a linguistics PhD researcher at the University of Sussex investigating the origins and development of Cameroon Pidgin English. She is currently a research placement student in the British Library’s Digital Scholarship Team, using data from the Endangered Archives Programme to create data visualisations.

When I wrote last week that the Endangered Archives Programme (EAP) receives the most applications for archives in Nigeria, Ghana and Malawi, I am reasonably sure you were able to digest that news without difficulty.

Is that still the case if I add that Ethiopia, South Africa and Mali come in fourth, fifth and sixth place; and that the countries for which only a single application has been received include Morocco, Libya, Mauritania, Chad, Eritrea, and Egypt?

What if I give you the same information via a handy interactive map?

This map, designed using Tableau Public, shows the location of every archive for which the EAP received an application between 2004 and 2017. Once you know that the darker the colour, the more applications received, you can see at a glance how the applications have been distributed. If you want more information you can hover your cursor over each country to see its name and number of associated applications.

My placement at the British Library centres on using data visualisations such as this to tell the story of the EAP projects in Africa.

Photo from a Cameroonian photographic archive (EAP054)

When not undertaking a placement I am a linguist. This doesn’t require a lot of data visualisation beyond the tools available in Excel. In my previous blog I discussed how useful Excel tools have been for giving me an overview of the EAP data. But there are some visualisations you can’t create in Excel, such as an interactive heat map, so I had to explore what other tools are available.

Inspired by this excellent blog from a previous placement student I started out by investigating Tableau Public primarily to look for ways to represent data using a map.

Tableau Public is freely available online. It is fairly intuitive to use and has a wide range of possible graphs and charts, not just maps. You upload a spreadsheet and it will tell you how to do the rest. There are also many instructional videos online that show you the range of possibilities available.

As well as the heat map above, I also used this tool to examine which countries applications are coming from.

This map shows that the largest numbers of applications have come from the USA and the UK, but people from Canada, South Africa and Malawi have also applied for a lot of grants.

Malawi has a strong showing on both maps. There have been 23 applications to preserve archives in Malawi, and 21 applicants from within Malawi.

Paper from the Malawi news archive (EAP942)

Are these the same applications?

My spreadsheet suggests that they are. I can also see that there seem to be links between certain countries, such as Canada and Ethiopia, but in order to properly understand these connections I need a tool that can represent networks – something Tableau Public cannot do.

After some investigation (read ‘googling’) I found Gephi: free, open-source software designed specifically for visualising networks.

Of all the software I have used in this project so far, Gephi is the least intuitive. But it can be used to create informative visualisations, so it is worth the effort to learn. The Gephi team does provide a step-by-step guide to getting started, but the first step is to upload a spreadsheet detailing your ‘nodes’ and ‘edges’.

Having no idea what either of these was, I stalled at step one.

Further googling turned up this useful blog post written for complete beginners, which informed me that nodes are the individual members of a network – in my case, countries. My list of nodes includes both the country of the archive and the country of the applicant. Edges are the links between nodes, so each application creates a link, or edge, between the two countries, or nodes, involved.

Once I understood the jargon, I was able to use Gephi’s guide to create the network below, which shows all applications between 2004 and 2017 regardless of whether they were successful in acquiring a grant.

Gephi graph

In this visualisation the size of each country relates to the number of applications it features in, as country of archive, country of applicant, or both. The colours show related groups.

Each line shows the direction and frequency of application. The line always travels in a clockwise direction from country of applicant to country of archive; the thicker the line, the more applications. Where the country of applicant and the country of archive are the same, the line becomes a loop.
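The node and edge structure described above can be sketched in a few lines of plain Python. This is only an illustration of the concept, not the EAP data: the country pairs below are invented, and the real project used a spreadsheet uploaded to Gephi rather than code.

```python
from collections import Counter

# Invented example applications, each a (country of applicant,
# country of archive) pair. One row = one application.
applications = [
    ("UK", "Nigeria"),
    ("USA", "Nigeria"),
    ("Canada", "Ethiopia"),
    ("Canada", "Ethiopia"),
    ("Malawi", "Malawi"),  # applicant and archive country match: a self-loop
]

# Nodes: every country that appears as either an applicant
# or an archive location.
nodes = sorted({country for pair in applications for country in pair})

# Edges: directed links from applicant country to archive country.
# Repeated applications increase the edge's weight, which Gephi
# draws as a thicker line.
edges = Counter(applications)

print(nodes)
print(edges[("Canada", "Ethiopia")])  # weight 2: two Canada -> Ethiopia applications
```

Counting repeated applicant-archive pairs like this is exactly what the edge weight in the network diagram encodes.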

I love network maps because you can learn so much from them. In this one, for example, you can see (among other things):

  • strong links between the USA and West Africa
  • multiple Canadian applications for Sierra Leonean and Ethiopian archives
  • UK applications to a diverse range of countries
  • links between Egypt and Algeria and between Tunisia and Morocco

The last tool I explored was Google Fusion Tables, which can be used to present information from a spreadsheet on a map. Once you have coordinates for your locations, Fusion Tables are incredibly easy to use (and will fill in coordinates for you in many cases). You upload the spreadsheet, pick the information to include and it’s done. It is so intuitive that I have yet to do much reading on how it works – hence my lack of a firm decision on how to use it.

There is currently a Fusion Table over on the EAP website with links to every project the programme has funded. It is possible to include all sorts of information for each archive location, so I plan to create something more in-depth for the African archives that can potentially be used as a tool by researchers.

The next step for my project is to apply these tools to the data in order to create a range of visualisations which will be the stars of my third and final blog at the beginning of September, so watch this space.