Digital scholarship blog

2 posts from August 2015

13 August 2015

Fin; or reflections on thirty months of Digital Research

Thirty months ago I joined the British Library Digital Research Team. In that time we (often with the folks from British Library Labs) have achieved a huge amount, not least putting over one million public domain images on Flickr, developing our internal training provision, and repurposing British Library collections to enrich the education and outlook of computer science and game design students. This week I say goodbye.


The Digital Research Team was created in 2010 with a broad mission that covers everything from enabling computational analysis of large-scale digitised collections and creative reuse of openly licensed collections to advocacy of clear data citation and digital skills training. I have often summed up our role by saying that we are here to ensure that the British Library's digital collections are used in ways that go beyond looking at them on a webpage: an open, data, and creativity orientated approach that is at the forefront of the British Library's vision.


I came to the team from academia and a background in studying long eighteenth-century satirical prints. My data was small, perspectives narrow, and skills modest, but my eyes, ears, and mind open. And they needed to be, for in my first month in the job the British Library celebrated enhanced powers to collect non-print materials published in the UK. In effect this meant that this library of around 170 million things had the power to collect the UK web domain. Since then the library has collected over 2 billion web pages, fundamentally changing our collection profile (see the UK Web Archive blog for more), making the British Library a place full of data as much as books. Even the beloved manuscript, I soon learnt, was not 'safe' from the bitstream, for also changing our collection profile was the small but growing volume of floppy disks, CD-ROMs, hard drives, and email archives that are the archives of life in the 'Information Age'. And these personal digital archives are more than just collections of 'proper' born-digital documents typed up on personal computers: they include software, browser caches, spam, and downloads folders. In fact they include every bit on every disk: captures of whole computing environments that can be booted up to offer an experiential window into a person's interaction with their machine.


I say 'can', but in most cases they aren't. For as unpublished material these archives, like their paper counterparts, can only be made available to readers once we are sure we have complied with things like the Data Protection Act, a time-consuming process that requires people to examine each and every digital object. This clash of possibilities speaks to two overarching themes of my thirty months with the Digital Research Team. The first is the gap that often appears between well thought out established practice and the demands of large and/or complex digital collections: in the case of born-digital manuscript collections, responsibilities to both readers and depositors compete when faced with hundreds of thousands of files. The second is the important - but often forgotten - role of decisions made by people in the creation, management, and marshalling of large and/or complex digital collections. This role may be self-evident. But data does tend to flatten and depersonalise. And interfaces to data tend to emphasise those qualities in their haste to ensure that experiences are smooth, that tensions recede from view. As someone trained to trace the provenance of evidence and to examine the role of agency and power in humanistic phenomena, I see it as important to put the personal back into our use of data. Why? Well, when you search Explore the British Library and Google Books you don't just search databases of 56 million things and over 30 million books respectively; rather, you search accumulations of human labour, expertise, and decision making shaped (and constrained) by local, temporal, and organisational priorities and worldviews. When you browse Wikipedia, Wikimedia Commons, or Wikisource you rely on the product of human labour mediated through community guidelines and practices that - perhaps inevitably - introduce prejudices. When you use any computational process to take data in and push data out, the bit in the middle isn't the work of a machine but the work of people instructing a machine, people - as Mia Ridge, Ramon Amaro and the Software Sustainability Institute, among others, remind us - with opinions, perspectives, fears, and dreams. And when you seek solace in a standard, you seek solace in something that, as a product of human agency, can never wholly be neutral.


This may all sound a bit negative. But my point is that many of the achievements of the Digital Research Team stem from this sort of thinking, an approach that is deeply critical of techno-evangelist perspectives on the role of digital collections, methods, and approaches in society and culture. We don't assume that digital technology is the solution but rather that an approach that sees people using digital technology is one solution among many possible solutions. My job over the last thirty months has been to collaborate with amazing people both inside and outside the British Library to choose the right solutions. As I move to a new position outside the British Library, I look forward to seeing the fruits of these and future decisions appear on the Digital Scholarship Blog.

James Baker -- Curator, Digital Research -- @j_w_baker

05 August 2015

Crowdsourcing as Interesting Decisions: Update from BL Labs 2015 Competition Winner

Posted by Mahendra Mahey (BL Labs Manager) on behalf of Adam Crymble, Lecturer in Digital History at the University of Hertfordshire and one of the winners of the 2015 British Library Labs competition, who describes the current progress of his project, ‘Mechanical Curator Arcade’.

When I was nine years old my friend Robbie and I spent an inordinate amount of time in the local video game arcade, and far more money than either of us would like to admit. We watched enviously as the teenagers hogged the Street Fighter II machine near the entrance. Robbie and I retreated deeper into the arcade, where we found a favourite in The Simpsons Arcade Game.


We even beat it once.

Like many children of the 1970s, 80s, and 90s, we found video games a staple of our formative years. Many of us have developed a superhuman ability to stare at screens for long periods without blinking. We know instinctively that there is something behind this wall, and that some combination of buttons will help us discover it.


But how many of us know why a game is fun? I only recently began to ask myself that question, and I came across a quote attributed to renowned video game maker Sid Meier, the creator of the Civilization franchise. Meier noted that 'a game is a series of interesting choices'.

Not everyone agrees with that definition, but it's a surprisingly simple and astute observation. Games lay down a series of rules - they generate the conditions of a virtual universe. We learn the rules, and our challenge is to win the game by making choices that lead us through that world, to victory.

But a game is about more than just choices. A game is about losing. Or at least, the threat of losing. If we make the wrong choice - jump on a prickly enemy, for example - we're punished. We die.

This revelation has been important for me, because for the past few months I've been trying to make crowdsourcing fun. Crowdsourcing is an increasingly common practice amongst historians, whereby a simple but repetitive task - such as transcribing or tagging a huge set of images - is shared across a large number of volunteers. It adheres to the adage, 'many hands make light work'. Like games, crowdsourcing is inherently about choices. Depending on the task, the volunteer makes a choice. If they're transcribing handwritten documents, they have to decide what word they see on the screen. If they're asked to tag a historic image, they have to decide the appropriate tag.

In order to make crowdsourcing more fun, some projects have attempted to offer a series of incentives. High scores and leaderboards are popular now in 'gamified' crowdsourcing experiences. But I've yet to come across a crowdsourcing game in which you can REALLY lose. It's all carrot, and no stick, and that's why it's no fun.

Counterintuitive, perhaps, but once you hit the age of 5 and your competitive streak kicks in, it's the threat of losing that makes you want to win. And this is where crowdsourcing faces its biggest challenge if we want users to have a 'fun' experience. Because in order for you to lose, the maker of the game needs to know when you've done something wrong - when you've broken the rules of the virtual universe. That's easy enough for Super Mario, because the game is programmed to check when you've bumped into a bad guy, or fallen down a hole. But in crowdsourcing, we have no idea if you've given us the right answer - if you've tagged the image correctly, or transcribed the word right. If we knew that, we wouldn't have to ask you to do it in the first place. That means we can't punish you consistently. And it means you won't have fun the minute you realise that. Because at that point, your interesting decisions become meaningless and any correct information you provide comes down to your good will rather than your desire to win.

That's where we currently stand in our efforts to make crowdsourcing fun. It's a big challenge, but it's one I believe someone out there can tackle. So in the spirit of crowdsourcing, we're turning to the crowd, and we're hosting a virtual 'Game Jam' from 4-11 September 2015 to engage with amateur video game makers everywhere who think they've got the answer.

To help them get started with an appropriate crowdsourcing task, we've put together a sample set of these historic images - around 100 to 200 illustrations each of people, music, architecture, flora, fauna and even cycling - along with several hundred images that we know very little about. We thought this might help to validate the results of the crowdsourced content.

The sample link is:

An ideal game draws a random image from the set and, through gameplay, the player tells us something about the content of the image. Perhaps they choose from our limited set of tags (flora, fauna, mineral, human portrait, landscape, manmade - e.g. machine, buildings, ship, abstract, artistic, music, map), or gamemakers can opt to be more creative.
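As a rough illustration only, and not anything the project has built, here is one way a game might use the images we already know about to make losing possible. It assumes the sample set is loaded as a list of images, some carrying a known tag and some not. The names `Image`, `Game`, `known_tag` and `lives` are hypothetical, and the tag list is one reading of the set given above.

```python
import random
from dataclasses import dataclass, field
from typing import Optional

# One reading of the limited tag set described in the post (an assumption).
TAGS = ("flora", "fauna", "mineral", "human portrait", "landscape",
        "manmade", "abstract", "artistic", "music", "map")


@dataclass
class Image:
    image_id: str
    known_tag: Optional[str] = None  # None for images we know very little about


@dataclass
class Game:
    images: list
    lives: int = 3
    score: int = 0
    collected: list = field(default_factory=list)  # (image_id, tag) pairs for unknown images

    def draw(self, rng=random):
        """Draw a random image from the set."""
        return rng.choice(self.images)

    def answer(self, image, tag):
        """Record the player's tag; punish only where we know the right answer."""
        if tag not in TAGS:
            raise ValueError(f"unknown tag: {tag}")
        if image.known_tag is None:
            # We cannot check this answer, so we simply keep it as crowdsourced data.
            self.collected.append((image.image_id, tag))
            return "recorded"
        if tag == image.known_tag:
            self.score += 1
            return "correct"
        self.lives -= 1  # the 'stick': a wrong answer on a known image costs a life
        return "wrong"

    @property
    def game_over(self):
        return self.lives <= 0
```

If the player cannot tell which images are already known, every choice carries some risk. This only reduces, rather than removes, the problem that we cannot punish wrong answers consistently.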

If we like what we see, we've set aside up to £500 (courtesy of the Andrew W. Mellon Foundation) to work with someone to polish their game and release it as part of our 'Mechanical Curator Arcade Game', a 1980s-style arcade console that we're planning to install in the British Library this autumn. The Game Jam is open to anyone, but only those over the age of 18 are eligible to work for us.

All completed games (whether they fit the crowdsourcing theme or not) will also be eligible to enter the British Library Labs Awards, with a chance to win an additional £500 in prizes, as long as they use British Library digital content such as the sounds and images from the open collections.

If you're up for the challenge, you can find out more on our Game Jam event page. We're looking forward to working with one of you. Do get in touch if you'd like to discuss ideas. We're here to listen and learn.