Digital scholarship blog

5 posts from September 2015

30 September 2015

New opportunities for digital humanities PhD research at the BL

The British Library is looking for university partners to co-supervise collaborative research projects that will draw on – and develop – aspects of digital scholarship.

Funding is available from the Arts & Humanities Research Council’s Collaborative Doctoral Partnerships (CDP) programme, through which the Library works with UK university partners to deliver a bespoke PhD research and training programme.

Our current CDP opportunities include one project to examine the culture and evolution of scientific research, another project to investigate the changing nature of publishing in digital environments, and a third project that will apply digital techniques to forge fresh insights into the construction of scientific knowledge in the 18th Century:

“The Working Life of Scientists: Exploring the Culture of Scientific Research through Personal Archives” will involve the detailed mapping of the personal relationships of 20th century British scientists. It will draw on the Library’s Contemporary Archives and Manuscripts collections, which include personal archives and correspondence from the fields of computer science and programming, cybernetics and artificial intelligence, as well as evolutionary, developmental and molecular biology. The project will provide a unique opportunity to investigate the roles of culture, imagination, argumentation, creativity, discovery and curiosity in scientific enquiry.

“Digital Publishing and the Reader” will identify and examine new technologies used in publishing in the UK. It has a particular emphasis on examples which encourage interaction between readers, texts and authors, such as text-based online gaming, online comics, or online publishing relating to campaigns and activism. The project will inform how emerging media and new communication technologies should be recorded or collected as part of a national collection on British written culture.

“Hans Sloane’s Books: Evaluating an Enlightenment Library” will break new ground by developing digital tools to cross-reference, contextualise and analyse the intellectual significance of the library of Hans Sloane (1660-1753): physician, collector and posthumous ‘founding father’ of the British Museum. The project will draw on in-house digital-curatorial expertise to develop software tools to interrogate Library datasets and to devise ways of ordering and visualising the data. This will enable the first full evaluation of the contribution of Sloane’s library to the Enlightenment scientific community. 

Academics with interests in digital humanities and digital scholarship are invited to develop any of these research themes with a view to co-supervising a PhD project with the British Library. The projects would start in October 2016. A fully-funded AHRC studentship will be allocated to each partner university. Once recruited, the PhD students will get staff-level access to Library collections, expertise and facilities, as well as financial support for research-related costs of up to £1,000 a year.

The application deadline is 27 November.

View full application guidelines and further details about all current AHRC CDP research themes and partnership opportunities.

18 September 2015

Working with news data

The British Library has a vast news collection. We have some 60 million newspaper issues (around 450 million pages) dating from the 1620s to the present day, 60,000 television and radio news programmes from 2010 onwards, and we are archiving over 1,000 UK news websites on a regular basis. Just as the newspaper industry is moving into other media in a cross-platform world, so we are following suit in how we archive news, so that we can offer the optimum research service in the future.


News collections at the British Library

To make such a vision work we have to get the data right. The Library's Explore catalogue works well for finding a volume of newspapers to be delivered to a researcher's desk, but is not readily open for any sort of content analysis of the news collection. Our different news collections - for newspapers, web, TV and radio - come together via Explore, but not easily so, because of the different ways in which they are held and described (most of our newspaper records are at title level, the TV and radio news records are at programme level, while the web archive operates best at page level). We are some way off from presenting a unified news collection, and could be doing so much more to serve new kinds of research enquiry by taking a more data-driven approach to our news holdings.

These needs were the drivers behind a workshop on 7 September 2015, co-organised by BL Labs and the Library's News & Moving Image team, entitled Working with news data across different media. This brought together researchers, developers and content owners to look at ways in which changes in archive news data management can be of benefit to researchers. The event was part of an ongoing process from BL Labs looking at how the Library's digital collections can be made available for researchers, and was the starting point for a discussion we need to have with researchers and content managers as to how best to pursue an archive news data strategy.

The day began with an introduction to the Library's digital collections and the work of BL Labs by Mahendra Mahey. Luke McKernan, Lead Curator News & Moving Image, then gave a talk on the Library's news collections. He outlined what the Library has to offer researchers at present in terms of news data for onsite analysis: 

  • 2 million 19th century British newspaper pages (XML, page images)
  • UK television news data 2010 onwards – EPG (Electronic Programme Guide) data for 45,000 programmes, subtitles (XML) for c.25,000 programmes, some speech-to-text files for 2011 broadcasts (XML)
  • UK radio news data 2010 onwards – EPG data for 15,000 programmes, some speech-to-text files for 2011 broadcasts (XML)
  • a possible selection of Web news data
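Much of this data is supplied as XML. As a rough illustration of how a researcher might begin working with the subtitle or speech-to-text files, the sketch below uses Python's standard library to extract spoken text from a toy file; note that the element names here are invented for the example, not the Library's actual schema:

```python
import xml.etree.ElementTree as ET

# A toy subtitle file; the <subtitle>/<line> element names are
# invented for this sketch, not the Library's actual schema.
sample = """
<programme title="Evening News" broadcast="2011-03-01">
  <subtitle start="00:00:04" end="00:00:07"><line>Good evening.</line></subtitle>
  <subtitle start="00:00:07" end="00:00:12"><line>Our top story tonight.</line></subtitle>
</programme>
"""

root = ET.fromstring(sample)
# Join all subtitle lines into one text stream for analysis
text = " ".join(line.text for line in root.iter("line"))
word_count = len(text.split())
print(word_count)  # a simple word count across all subtitle lines
```

From a starting point like this, the same pass over thousands of programme files would support word-frequency or topic analysis across the broadcast news collection.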

Additionally there is selected data and page images from The Financial Times. The Financial Times is partnering with The British Library to make its historical archive available on a royalty free basis for academic research purposes. Any researcher interested in taking advantage of this should contact Luke McKernan for further information.

The British Library is also planning to make available title-level records for all 34,000 newspaper titles that it holds as open data. We will have more news on this initiative in due course.

There are goals beyond these that the Library could strive for. What about an open news dataset shared with other institutions? What about an archive news data model to bring together such collections? And how about the ultimate aim of having all of our news collections identified at issue rather than title level? That will be a huge undertaking, but the goal must be for us to be able to offer to future users a digital picture of what happened in any one place at any one time, contributing to an overall 'news' picture. This would mean not just what was reported in a local newspaper on any one day, but what people from that locality heard, read or saw that helped make up their understanding of the world. That's how we gather our news today; it is also a model for understanding how news has perhaps always operated, and certainly how news archives can be approached in their totality.


Laughing at Victorian jokes

A number of short presentations then followed, from projects either using the Library's news collections or with whom we have collaborated on news-related initiatives:

  • Glen Robson of the National Library of Wales spoke about implementing the IIIF image format for their public domain newspapers, which could lead to cross-institutional sharing of newspaper collections by using this standardised image retrieval framework.
  • Dr Katrina Navickas of the University of Hertfordshire, a BL Labs competition winner, talked about developing her winning idea, the 'Political Meetings Mapper', which uses automated processes to identify meetings of the Chartist movement in 19th century newspapers.
  • Dr Bob Nicholson, Edge Hill University, winner of the 2014 BL Labs competition, spoke about the Victorian Meme Machine project, tracking down jokes in Victorian newspapers and mapping these automatically to contemporary images. He called for focussed datasets rather than just presenting digitised newspapers in their entirety, and for newspaper data to be linked out to other forms of data.
  • Martin Stabe, Head of Interactive News at the Financial Times spoke on ways in which the newspaper's archive could be opened up for research. The newspaper is taking bold steps in exploring alternative ways of opening up its archives beyond the tried-and-tested subscription models.
  • Melvin Wevers, PhD student within the Translantis project at the University of Utrecht, introduced the Texcavator tool, which is being used to analyse the Dutch National Library's digital newspaper collection to study Dutch public discourse, and which has also been applied to some UK newspapers (including sample data from the Financial Times).
  • Ian Tester, director of Partner Products at Findmypast Ltd, who manage the British Newspaper Archive of digitised newspapers from the British Library, spoke on the diverse research opportunities that the archive now provides, with an emphasis on the many kinds of book now being published that have made often unexpected use of the digital archive.
  • Mark Flashman and Michael Satterthwaite from the BBC Rewind project spoke about the ways the BBC is applying innovative digital applications to digital storytelling and opening up news archives through projects such as the World Service Radio Archive, News Timeliner and Your Story. They stressed the importance of achieving good things with small amounts of data first, and of working for 'good enough' results rather than perfection.
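Glen Robson's point about IIIF is worth unpacking: the IIIF Image API addresses images through a predictable URL pattern ({identifier}/{region}/{size}/{rotation}/{quality}.{format}), which is what makes cross-institutional retrieval of newspaper page images feasible. A minimal sketch of building such a request URL (the base URL and identifier below are placeholders, not a real endpoint):

```python
def iiif_image_url(base, identifier, region="full", size="full",
                   rotation=0, quality="default", fmt="jpg"):
    """Build a IIIF Image API request URL from its path components."""
    return f"{base}/{identifier}/{region}/{size}/{rotation}/{quality}.{fmt}"

# Request the full page image from a hypothetical newspaper image server
url = iiif_image_url("https://example.org/iiif", "newspaper-page-001")
print(url)  # https://example.org/iiif/newspaper-page-001/full/full/0/default.jpg
```

Because every IIIF-compliant server answers the same URL grammar, the same client code can fetch page images from any participating institution.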

The workshop then divided up into four groups to consider four questions which could help us shape how we develop things next. They were:

  • What’s the best way to get the most out of hack events? What have people learned, what are the issues, the best way to overcome them and how to get the most from them?
  • What is the best way to work across a heterogeneous collection of news data, with particular focus on the data available from the British Library (though not exclusively)? What are the challenges and how do we get over them?
  • How might the British Library most usefully work with third parties to get the best out of news data? What are the issues and challenges?
  • What do researchers want from news data? What are the issues and the challenges?

We're still working on assimilating the answers to those questions, as we start to shape our news data plans, including further such events. Our thanks to everyone who attended the workshop and who supplied such stimulating and useful contributions. The next step will be a news hackathon, which we will be hosting on November 16th at the British Library in London. More news on this will be published soon.

14 September 2015

#CitizenHums and #MakingBigDataHuman

Last week I had the pleasure of attending the Citizen Humanities Comes of Age: Crowdsourcing for the Humanities in the 21st Century symposium. The two-day event was organised by King's College London’s Department of Digital Humanities (DDH) and Stanford University’s Center for Spatial and Textual Analysis (CESTA) with the aim of exploring “ways in which humanities and cultural heritage research is enriched through scholarly crowdsourcing”.

Through the magic of Twitter I was also able to (sort of) attend the Making Big Data Human conference too (luckily my colleague Stella was actually there for the full experience, more on that in a future post). It turned out a really nice correspondence of ideas kept occurring throughout, too numerous to capture them all here, but a couple of good examples came during a talk @jfwinters gave.


This statement simultaneously resonated with a topic we were similarly grappling with over at #CitizenHums. Would it be better practice for institutional crowdsourcing initiatives to be specifically research question led, rather than, say, collection led? While making as much of British Library collections accessible to as wide an audience as possible will remain a central driver, it's a useful reminder that there are always data collecting decisions being made that will impact future reuse of this content by researchers. To avoid creating potentially irreplaceable gaps in our datasets, we would do well to consider, even if only informally, the wide variety of explicit future research uses of the data we're crowdsourcing.


This is true of any data collection exercise as well and resonated with an interesting conversation around, as @Mia Ridge describes it, the machine learning + crowdsourcing ecosystem.  The planning of any crowdsourcing project must include consideration of what can be done programmatically at any stage, where pre-processing can bring efficiencies to the tasks we ask of volunteers, while datasets and volunteer responses can be made open to help inform the development of machine learning driven solutions.

When we released 1 million images online from British Library collections, we hoped to surface tags from the Flickr crowd to make them more discoverable, but we also hoped that the dataset might prove a good training ground for machine learning. Happily both machine and human have taken up the challenge since, and the results have been staggering in terms of making that massive collection more accessible.

Similarly we are posting all the data and contributions from our card catalogue conversion experiment LibCrowds online in the hopes that it might be used to train typeset and handwriting OCR or test other automatic ways in which complex but common library issues such as these can be programmatically resolved alongside human interventions.

All in all a fantastic two days and a many thanks to @StuartDunnCeRch and all the organisers for putting it together!

Nora McGregor
Digital Curator, Digital Research Team




04 September 2015

What is a Game Jam? By Adam Crymble

Maybe you've been to an 'unconference', and you might even have participated in a 'hackathon'. But what's a Game Jam?


(Image: from Flickr user galant, CC-BY 2.0)

Game Jams originated in the amateur video-game making community. Building video games is easier than you might think, with a number of platforms that can get you up and running with a viable game in a matter of hours (assuming you've got some decent coding skills). Thanks to these platforms and many decades of great video games, there's a large and engaged community of amateur video game makers who build, share, and play each other's games. But coding can be an isolating experience. Particularly if you live on the other side of the world from your online game-making friends. So Game Jams were a way for people to get together virtually and challenge each other to build a game on a particular theme, usually over a defined period of time (a week, for example).

The virtual format is more inclusive than an in-person event, because it means anyone in the world can participate and travel costs become moot. It also takes advantage of the fact that many people's best computer isn't a laptop; an in-person hackathon would force those participants onto a less than ideal machine with a small screen.

So now that you know what a Game Jam is, why not join one? Starting today, British Library Labs, in conjunction with the Digital History Research Centre at the University of Hertfordshire, are hosting a 'Crowdsourcing Game Jam'. We're challenging anyone who is interested to build a game that makes the process of crowdsourcing more fun.

The Game Jam is open from 4-11 September 2015, and we've set aside £500 to work with someone with a viable idea for a game. Full details are available on the Game Jam website.

So get Jamming. Take lots of pictures. Tweet at us (@adam_crymble; @benosteen; @BL_Labs).

Blog about your games. And submit them to our Game Jam!

If you need to reach us with questions or comments, you can find me at

What makes the Crowdsourcing Arcade Machine tick?


Can crowdsourcing be done in public? I've spent a few days building a large arcade-style cabinet that is tough and rugged and that the general public can interact with. There is no external keyboard or mouse, but otherwise you can think of it as a normal computer.

The joystick and two buttons are a constraint, intended to encourage more casual applications and use. Can a machine that looks like it has come from the 1980s, help with crowdsourcing applications? Are there any games that can both run with these constraints AND provide data about cultural collections?

To start this conversation properly, we have just launched a Game Jam: This is open to anyone who wants to write something that fits with this machine. We are interested in prototypes, full functioning games or even just ideas of what might make for a fun game. The only key point is that there is some aspect to the game which might tell us something interesting and new about our collections. 


  • Raspberry Pi 2 - Quad core 900MHz by default, but can be overclocked if necessary.
    • Running Raspbian by default, but will run whatever flavour of OS needed for a game.
  • 4:3 LCD screen, up to 1280x1024 screen resolution.
  • It should have a wifi connection in most locations (however, as you might expect with wifi, it may not work all the time!)
  • Illuminated marquee
  • Stereo sound (Speakers above the screen)
  • Joystick - movement is mapped to the up, down, left, and right cursor keys
  • Two input buttons - also mapped to key presses, Left Ctrl and Left Alt by default, but can be changed if necessary.
  • Up to two auxiliary buttons - on the front of the cabinet, also mapped to key presses.
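Because the cabinet presents all of its controls as ordinary key presses, a game needs nothing more than standard keyboard handling. A minimal sketch of the default mapping described above (the action names here are invented for illustration, not part of the cabinet's setup):

```python
# Default cabinet mapping: the controller hardware turns joystick and
# button input into ordinary key presses, so a game just reads key events.
CABINET_KEYS = {
    "Up": "move_up",
    "Down": "move_down",
    "Left": "move_left",
    "Right": "move_right",
    "Control_L": "button_1",  # first input button (Left Ctrl by default)
    "Alt_L": "button_2",      # second input button (Left Alt by default)
}

def handle_key(key_name):
    """Translate a raw key event name into a game action, or None if unmapped."""
    return CABINET_KEYS.get(key_name)

print(handle_key("Control_L"))  # button_1
```

Any game framework that can read cursor keys plus Ctrl and Alt should therefore run on the machine without modification.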


From top to bottom: Raspberry Pi 2, Amplifier and Power source (5v and 12v)


The underside of the control panel and the bottom of the mounted LCD. Uses standard arcade controls (Happ brand in this case) and an I-PAC2 to map these onto keyboard presses for convenience.