THE BRITISH LIBRARY

Digital scholarship blog

Enabling innovative research with British Library digital collections

Introduction

Tracking exciting developments at the intersection of libraries, scholarship and technology.

15 June 2018

Team @BL_DigiSchol join @thecarpentries at #CarpentryCon2018 in Dublin

Conference opening
CarpentryCon 2018
UCD campus
UCD Dublin

Members of the Digital Scholarship team, Alex, Rossitza and Stella, attended the inaugural conference of The Carpentries community, held on the relaxing campus of University College Dublin from 29 May to 1 June. The atmosphere at the event was energising thanks to the enthusiasm of the community members who volunteer to teach computational, coding and data science skills to researchers worldwide.

The theme of the event “Building Locally, Connecting Globally” permeated the rich programme of talks and interactive sessions that focused on sharing knowledge, networking and developing new content and strategies for strengthening and growing The Carpentries. A report on the conference has been published by Belinda Weaver and this blog post by Raniere Silva summarises well some of the key messages.

Our team exhibited a poster on the Digital Scholarship staff training programme that creates opportunities for staff at The British Library to develop the necessary skills and knowledge to support emerging areas of modern scholarship.

Poster
Digital Scholarship poster
Team poster
Rossitza, Stella and Alex

Particularly relevant for us, therefore, were the sessions led by Belinda Weaver and Chris Erdmann about growing the software and data skills training provision for library professionals. We engaged in a conversation with members of The Library Carpentry community about how best to review and create new curricula and resources, as well as how the needs of the broader cultural heritage professionals may vary. There are opportunities to work with university departments, professional bodies and regional consortia to get library and other GLAM professionals involved with The Library Carpentry. Watch this space for our team's involvement with The Carpentries; for further updates, follow The Library Carpentry blog and Twitter feed, and The Carpentry Clippings newsletter.

Below are just a few highlights from the sessions we took part in:

@frameshiftlic : Diversity and inclusion go hand in hand. Much more needs to be done to increase diversity and inclusivity in the technology sector.

The Carpentries community uses GitHub to maintain training materials, and good guidance was provided on how to clone and fork repositories and submit pull requests. A great teaching resource, Happy Git and GitHub for the useR, is being developed for Software Carpentry by Jennifer Bryan.

Greg Wilson offered advice on how to keep refreshing teaching methods and content for both the learners’ and instructors’ benefit. His reading list for engaging learners includes The Discussion Book: 50 Great Ways to Get People Talking and Understanding How We Learn: A Visual Guide.

Tracy Teal talked about the funding model, operations and infrastructure of The Carpentries, which has updated its website, logo, handbook and Code of Conduct. Curriculum development, equality and inclusion, and building local capacity for training remain high priorities for the community.

Most engaging was the interactive breakout session on developing a new Software Carpentry lesson on High Performance Computing (HPC). The session leader, Alan O’Cais, used the classroom engagement platform Socrative to gather attendees’ feedback on existing lessons, appropriate content and the learner profile.

Other great sessions covered best approaches to teaching live coding at university, post-workshop community development strategies, and how organisations such as The Software Sustainability Institute and ELIXIR have been supporting The Carpentries community initiatives.

#CarpentryCon 2018 delegates. Image by Bérénice Batut available at https://flic.kr/p/252fVid under CC-BY-SA 2.0



01 June 2018

Interactive Fiction Summer School and Settle Stories

As a PhD student, I’m privileged to spend three years of my life investigating a subject I find fascinating, but one of the absolute highlights of the first year of my study was the week I spent attending the Interactive Fiction Summer School at the British Library last July. My research explores how mobile phones are changing storytelling, so interactive fiction was a subject I was keen to find out more about – and how better to do so than by learning from the experts how to write my own?

It was an excellent course, as we learned not only about the mechanics of writing stories where the reader plays a part in deciding what happens – how to make your reader’s choices both engaging and manageable, for instance – but also about storytelling more generally: how to generate momentum and make your ending both surprising and inevitable. Over the course of the week, we each wrote our own interactive stories, drawing on what we learned from our tutors and getting to grips with the mechanics of the form: my own story ended up unexpectedly drawing upon my experiences teaching in Japan.

One of the week’s many highpoints was a session on the use of conflict in interactive fiction, run by Rob Sherman, who shared a thought-provoking work he’d created for the housing and homelessness charity Shelter, about a woman struggling to keep her family safe and happy in a world of rising costs, lowering wages, and disappearing support. This year, Rob is leading the British Library's summer school, curating sessions from a range of experts including the poet and interactive writer Abigail Parry (last year’s excellent course leader), Gavin Inglis, and Hannah Powell-Smith.

The summer school had other benefits too, including spending time with fascinating and creative people interested in the storytelling possibilities of interactive fiction, sharing ideas, and collaborating. I recall one particularly memorable session working with two of my fellow students on a story about a performance artist who decides to enact that old myth about frogs in boiling water herself, and ends up boiled to death in an underground swimming pool as part of an installation about the damage we’re doing to the environment.

The summer school attracted a wide range of people, from young would-be writers, to academics and storytelling professionals. One of my fellow students was Sita Brand, director of Settle Stories, whose annual festival of storytelling takes place in the picturesque Yorkshire market town of Settle. Sita invited me to speak at this year’s Festival, and so I found myself this April talking to an audience about my research into how mobile phones are influencing storytelling and being interviewed by Dave Driver for Dry Stone Radio. (You can hear the interview here – from 1:34 on.)

Telephone box at Settle Stories, where attendees pick up the phone handset and dial for a story

If I've whetted your appetite and you are interested in attending this summer's Interactive Fiction Summer School at the British Library, which is on the theme of Infinite Journeys, booking details are here.  It runs for five days, beginning Monday 23 July and ending on Friday 27 July. Also, if you are interested in my research on fiction being written for smartphones, then I'm giving a Feed the Mind talk on Mobile Stories: New Kinds of Fiction? on Monday 11 June, 12:30-13:30, booking details here.

This guest post is by Alastair Horne; you can follow him on Twitter as @pressfuturist, and also on Instagram.

14 May 2018

Seeing British Library collections through a digital lens

Digital Curator Mia Ridge writes: in this guest post, Dr Giles Bergel describes some experiments with the Library's digitised images...

The University of Oxford’s Visual Geometry Group has been working with a number of British Library curators to apply computer vision technology to their collections. On April 5 of this year I was invited by BL Digital Curator Dr. Mia Ridge to St. Pancras to showcase some of this work and to give curators the opportunity to try the tools out for themselves.  

Visual Geometry’s VISE tool matching two identical images from separate books digitised for the British Library’s Two Centuries of Indian Print project.

Computer vision - the extraction of meaning from images - has made considerable strides in recent years, particularly through the application of so-called ‘deep learning’ to large datasets. Cultural collections provide some of the most interesting test-cases for computer vision researchers, due to their complexity, to the intensity of interest that researchers bring to them, and to their importance for human well-being. Can computers see collections as humans do? Computer vision is perhaps better regarded as a powerful lens rather than as a substitute for human curation. A computer can search a large collection of images far more quickly than a single picture researcher can: while it will not bring the same contextual understanding to bear on an image, it has the advantage of speed and comprehensiveness. Sometimes, a computer vision system can surprise the researcher by suggesting similarities that weren’t readily apparent.

As a relatively new technology, computer vision attracts legitimate concerns about privacy, ethics and fairness. By making its state of the art tools freely available, Visual Geometry hope to encourage experimentation and responsible use, and to enlist users to help determine what they can and cannot do. Cultural collections provide a searching test-case for the state of the art, due to their diversity as media (prints, paintings, stamped images, photographs, film and more), each of which invites different responses. One BL curator made a telling point by searching the BBC News collection with the term 'football': the system was presented with images previously tagged with that word that related to American, Gaelic, Rugby and Association football. Although inconclusive due to lack of sufficiently specific training data, the test asked whether a computer could (or should) pick the most popular instances; attempt to generalise across multiple meanings; or discern separate usages. Despite increases in processing power and improvements in software methods, computers' ability to generalise, to extract semantic meaning from images or texts, and to cope with overlapping or ambiguous concepts remains very basic.

Other tests with BL images have been more immediately successful. Visual Geometry's Traherne tool, developed originally to detect differences in typesetting in early printed books, worked well with many materials that exhibit small differences, such as postage stamps or doctored photographs. Visual Geometry's Image Search Engine (VISE) has shown itself capable of retrieving matching illustrations in books digitised for the Library's Indian Print project, as well as certain bookbinding features, or popular printed ballads. Some years ago Visual Geometry produced a search interface for the Library's 1 Million Images release. A collaboration between the Library's Endangered Archives programme and Oxford researcher David Zeitlyn on the archive of Cameroonian studio photographer Jacques Toussele employed facial recognition as well as pattern detection. VGG's facial recognition software works on video (BBC News, for example) as well as still photographs and art, and is soon to be freely released to join other tools under the banner of the Seebibyte Project.    

I'll be returning to the Library in June to help curators explore using the tools with their own images. For more information on the work of Visual Geometry on cultural collections, subscribe to the project's Google Group or contact Giles Bergel.      

Dr. Giles Bergel is a digital humanist based in the Visual Geometry Group in the Department of Engineering Science at the University of Oxford.  

The event was supported by the Seebibyte project under an EPSRC Programme Grant EP/M013774/1

 

11 May 2018

Digital Conversations @BL: Empowering Technologies

As part of our Digital Conversations series, we invite you to join us for an evening discussing how Indigenous communities are using new technologies to preserve and promote their culture. The event takes place on Tuesday 22 May, 18:30 - 20:30, in the British Library Knowledge Centre; to book a ticket go here.

Various topics and examples of how digital platforms are used by communities across the world will be covered, including crowdfunding, crowdsourcing, apps, Etsy, videogames and social media. Chaired by Niamh Moore, our panel includes the following speakers:

  • Gloria O’Neill, President and Chief Executive Officer of Cook Inlet Tribal Council and Upper One Games, the first indigenous-owned video game developer and publisher in the United States, is here in person to discuss BAFTA winning game Never Alone (Kisima Inŋitchuŋa), which shares Alaska Native culture and values with the world at large

 

  • Felicity Wright, who has over twenty years' experience working with non-profit Aboriginal-owned social enterprises. She currently works for Injalak Arts in Gunbalanya, Australia, which sells its art and craft work to global audiences from its Etsy store. This will be a recorded presentation introduced by Jo Pilcher, University of Brighton, whose PhD research is about Aboriginal Australian textile production in the Northern Territory

 

  • Michael Wynne, Digital Applications Librarian, will give an overview of Mukurtu, a free, mobile, and open source content management system platform, which is built with indigenous communities

We are looking forward to a fascinating and lively discussion, so please prepare your questions for the panel! If you can't attend in person, please follow #BLdigital for the twitter stream during the event.

Game play scene from Never Alone featuring the owl man

This event is sponsored by the Eccles Centre for American Studies at the British Library. It will include a combination of in-person, Skype and pre-recorded presentations. This post is by Digital Curator Stella Wisdom, on twitter as @miss_wisdom.

08 May 2018

The Italian Academies database – now available in XML

Dr Mia Ridge writes: in 2017, we made XML and image files from a four-year, AHRC-funded project, The Italian Academies 1525-1700, available through the Library's open data portal. The original data structure was quite complex, so we would be curious to hear feedback from anyone reusing the converted form for research or visualisations.

In this post, Dr Lisa Sampson, Reader in Early Modern Italian Studies at UCL, and Dr Jane Everson, Emeritus Professor of Italian literature, RHUL, provide further information about the project...

New research opportunities for students of Renaissance and Baroque culture! The Italian Academies database is now available for download. It's in a format called XML, which represents the original structure of the database.
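For readers who have not worked with XML before, here is a minimal sketch of how such a file might be read with Python's standard library. The element names (`academies`, `academy`, `name`, `city`) and the three sample records are assumptions made up for illustration; the real export will use its own structure, so check the downloaded files before adapting this.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical sample: these element names are NOT the real export schema.
SAMPLE_XML = """
<academies>
  <academy><name>Accademia degli Intronati</name><city>Siena</city></academy>
  <academy><name>Accademia degli Incogniti</name><city>Naples</city></academy>
  <academy><name>Accademia degli Oziosi</name><city>Naples</city></academy>
</academies>
"""


def academies_per_city(xml_text):
    """Count <academy> records by the text of their <city> child element."""
    root = ET.fromstring(xml_text.strip())
    return Counter(
        academy.findtext("city", default="unknown")
        for academy in root.iter("academy")
    )


counts = academies_per_city(SAMPLE_XML)
for city, total in counts.most_common():
    print(city, total)
```

For a downloaded file, `ET.parse(path).getroot()` would replace `ET.fromstring`, with the element names adjusted to match the actual export.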

This dedicated database results from an eight-year project, funded by the Arts and Humanities Research Council UK, and provides a wealth of information on the Italian learned academies. Around 800 such institutions flourished across the peninsula over the sixteenth and seventeenth centuries, making major contributions to the cultural and scientific debates and innovations of the period, as well as forming intellectual networks across Europe. This database lists a total of 587 Academies from Venice, Padua, Ferrara, Bologna, Siena, Rome, Naples, and towns and cities in southern Italy and Sicily active in the period 1525-1700. Also listed are more than 7,000 members of one or more academies (including major figures like Galileo, as well as women and artists), and almost 1,000 printed works connected with academies held in the British Library. The database therefore provides an essential starting point for research into early modern culture in Italy and beyond. It is also an invitation to further scholarship and data collection, as these totals constitute only a fraction of the data relating to the Academies.

Laura Terracina, nicknamed Febea, of the Accademia degli Incogniti, Naples

The database is designed to permit searches from many different perspectives and to allow easy searching across categories. In addition to the three principal fields – Academies, People, Books – searches can be conducted by title keyword, printer, illustrator, dedicatee, censor, language, gender, nationality among others. The database also lists and illustrates the mottoes and emblems of the Academies (where known) and similarly of individual academy members. Illustrations from the books entered in the database include frontispieces, colophons, and images from within texts.

Emblem of the Accademia degli Intronati, Siena


The database thus aims to promote research on the Italian Academies in disciplines ranging from literature and history, through art, science, astronomy, mathematics, printing and publishing, censorship, politics, religion and philosophy.

The Italian Academies project, which created this database, began in 2006 as a collaboration between the British Library and Royal Holloway University of London, funded by the Arts and Humanities Research Council and led by Jane Everson. The objective was the creation of a dedicated resource on the publications and membership of the Italian learned Academies active in the period between 1525 and 1700. The software for the database was designed in-house by the British Library, and the first tranche of data was completed in 2009, listing information for academies in four cities (Naples, Siena, Bologna and Padua). A second phase, listing information for many more cities, including in southern Italy and Sicily, developed the database further between 2010 and 2014, with a major research grant from the AHRC and collaboration with the University of Reading.

The exciting possibilities now opened up by the British Library’s digital data strategy look set to stimulate new research and collaborations by making the records even more widely available, and easily downloadable, in line with Open Access goals. The Italian Academies team is now working to develop the project further with the addition of new data, and the incorporation into a hub of similar resources.

The Italian Academies project team members welcome feedback on the records and on the adoption of the database for new research (contact: www.italianacademies.org).

The original database remains accessible at http://www.bl.uk/catalogues/ItalianAcademies/Default.aspx 

An Introduction to the database, its aims, contents and objectives is available both at this site and at the new digital data site: https://data.bl.uk/iad/

Jane E. Everson, Royal Holloway University of London

Lisa Sampson, University College London

04 May 2018

What do deep learning, community archives, Livy and the politics of artefacts have in common?

They're all topics we've discussed in the British Library's Digital Scholarship Reading Group. Digital Curator Mia Ridge explains...

A few months after I submitted my PhD and joined the Library's Digital Scholarship team, I realised that it'd be hard to keep up with trends in digital scholarship unless I made a special effort. I also figured I couldn't be the only person in that situation. I've always loved a reading group, so a Digital Scholarship Reading Group seemed a good way to read and discuss at least one topical article a month and meet other people in the Library at the same time.

It'd be boring if it was just members of the Digital Scholarship team violently agreeing with each other, so after a few pilot sessions, I organised posters for staff notice boards to help make it clear that all were welcome, regardless of job title or seniority. After a year or so, we changed the time to allow for more people who work set shifts to attend.

There's a bit of admin each month  - whoever's coordinating the group for that month will update the standing calendar entry, post upcoming topics on our internal staff network, and sometimes ask the Internal Communications team to include them in newsletters.

We usually have eight to ten people turn up, but our last session had over 20 people! This may have been because we had a special guest speaker (thank you, Jane Winters!), because it was about digital humanities rather than digital scholarship, or the result of working with the Internal Comms team to send an all-staff email invitation to attend. The discussion is richest when we have people from a range of different departments and disciplines. A nice side-effect of encouraging attendance from across the Library is learning a little more about people's roles in other departments.

Reading group notice

I've experimented with different ways of selecting articles - an internal poll seemed to work well - and I love it when an attendee suggests a topic from their field, especially as quite a few Library staff are working towards formal degrees outside work. Other discussions are inspired by topics in the news or questions we've been asked as digital curators. I've experimented with length and tone, from academic, peer-reviewed articles to news or magazine articles and videos. Providing a few options for a particular topic seems to work well, as when we had a TED talk, scholarly article or peer-reviewed technical article on the same topic of bias in algorithms.

We usually meet on the first Tuesday of each month. Thanks to everyone who's added to the conversation, suggested a topic or article, or coordinated the discussion - I hope you've all enjoyed it as much as I have! Get in touch (@mia_out or via bl.uk/digital) if you'd like to know more or to suggest a topic or article.

Here's what we've met to discuss so far:

Allington, Daniel, Sarah Brouillette, and David Golumbia, ‘Neoliberal Tools (and Archives): A Political History of Digital Humanities’, Los Angeles Review of Books <https://lareviewofbooks.org/article/neoliberal-tools-archives-political-history-digital-humanities/>
Boyd, danah, and Kate Crawford, ‘Critical Questions for Big Data: Provocations for a Cultural, Technological, and Scholarly Phenomenon’, Information, Communication & Society, 15 (2012), 662–79 <https://doi.org/10.1080/1369118X.2012.678878>
Brügger, Niels, ‘Digital Humanities in the 21st Century: Digital Material as a Driving Force’, Digital Humanities Quarterly, 010 (2016)
Buolamwini, Joy, ‘How I’m Fighting Bias in Algorithms’ <https://www.ted.com/talks/joy_buolamwini_how_i_m_fighting_bias_in_algorithms>
Buolamwini, Joy, and Timnit Gebru, ‘Gender Shades: Intersectional Accuracy Disparities in Commercial Gender Classification’, 15
Deloit, Corine, Neil Wilson, Luca Costabello, and Pierre-Yves Vandenbussche, ‘The British National Bibliography: Who Uses Our Linked Data?’ (presented at the International Conference on Dublin Core and Metadata Applications, Copenhagen, Denmark, 2016) <http://dcevents.dublincore.org/IntConf/dc-2016/paper/viewFile/420/471>
Digital Humanities Research, Teaching and Practice in the UK Landscape Report, 2017
Dinsman, Melissa, ‘The Digital in the Humanities: An Interview with Bethany Nowviskie - Los Angeles Review of Books’, Los Angeles Review of Books <https://lareviewofbooks.org/article/digital-humanities-interview-bethany-nowviskie/>
Drucker, Johanna, ‘Humanities Approaches to Graphical Display’, 5 (2011) <http://www.digitalhumanities.org/dhq/vol/5/1/000091/000091.html>
Earhart, Amy E., and Toniesha L. Taylor, ‘Pedagogies of Race: Digital Humanities in the Age of Ferguson’, in Debates in the Digital Humanities <http://dhdebates.gc.cuny.edu/debates/text/72>
Elford, Jana Smith, ‘Recovering Women’s History with Network Analysis: A Case Study of the Fabian News’, The Journal of Modern Periodical Studies, 6 (2016), 191–213 <https://doi.org/10.5325/jmodeperistud.6.2.0191>
Evans, Meredith R., ‘Modern Special Collections: Embracing the Future While Taking Care of the Past’, New Review of Academic Librarianship, 21 (2015), 116–28 <https://doi.org/10.1080/13614533.2015.1040926>
Gilliland, Anne, and Andrew Flinn, ‘Community Archives: What Are We Really Talking About?’, in Keynote Speech Delivered at the CIRN Prato Community Informatics Conference, 2013 <http://ccnr.infotech.monash.edu/assets/docs/prato2013_papers/gilliland_flinn_keynote.pdf>
Graham, Shawn, Ian Milligan, and Scott Weingart, ‘Putting Big Data to Good Use: Historical Case Studies’, in The Historian’s Macroscope: Big Digital History, 2014 <http://www.themacroscope.org/?page_id=599>
Grimmelmann, James, ‘The Virtues of Moderation’, TECHNOLOGY Vol., 17 (2015), 68
Jardine, Lisa, and Anthony Grafton, ‘“Studied for Action”: How Gabriel Harvey Read His Livy’, Past & Present, 1990, 30–78 <http://www.jstor.org/stable/650933>
LeCun, Yann, Yoshua Bengio, and Geoffrey Hinton, ‘Deep Learning’, Nature, 521 (2015), 436–44 <https://doi.org/10.1038/nature14539>
Lohr, Steve, ‘Facial Recognition Is Accurate, If You’re a White Guy’, The New York Times, 14 February 2018, section Technology <https://www.nytimes.com/2018/02/09/technology/facial-recognition-race-artificial-intelligence.html>
Moravec, Michelle, ‘Feminist Research Practices and Digital Archives’, Australian Feminist Studies, 32 (2017), 186–201 <https://doi.org/10.1080/08164649.2017.1357006>
Prescott, Andrew, ‘Searching for Dr. Johnson: The Digitisation of the Burney Newspaper Collection’, 2018, 49–71 <https://doi.org/10.1163/9789004362871_004>
Rawson, Katie, and Trevor Muñoz, ‘Against Cleaning’, 2016 <http://www.curatingmenus.org/articles/against-cleaning/>
Self, Will, and William Watkins, ‘There Will Be Blood’, Times Higher Education (THE), 14 July 2016 <https://www.timeshighereducation.com/digital-editions/14-july-2016-digital-edition>
Standing, Susan, and Craig Standing, ‘The Ethical Use of Crowdsourcing’, Business Ethics: A European Review, n/a-n/a <https://doi.org/10.1111/beer.12173>
Verwayen, Harry, Julia Fallon, Julia Schellenberg, and Panagiotis Kyrou, Impact Playbook for Museums, Libraries and Archives (Europeana Foundation, 2017)
Winner, Langdon, ‘Do Artifacts Have Politics?’, Daedalus, 1980, 121–136 <http://www.jstor.org/stable/20024652>
Witmore, Michael, ‘Latour, the Digital Humanities, and the Divided Kingdom of Knowledge’, New Literary History, 47 (2016), 353–75 <https://doi.org/10.1353/nlh.2016.0018>

01 May 2018

New Digital Curator in the Digital Scholarship Team

Hello all! My name is Adi Keinan-Schoonbaert, and I’m the new Digital Curator for Asian and African collections at the British Library. One of the core remits of the Digital Scholarship team is to enable and encourage the reuse of the Library’s digital collections. When it comes to Asian and African collections, there are always interesting projects and initiatives going on. One is the Two Centuries of Indian Print project, which just started a second phase in March 2018 – a project with a strong Digital Humanities strand led by Digital Curator Tom Derrick. Another example is a collaborative transcription project, supporting the transcription of handwritten historical Arabic scientific works for Handwritten Text Recognition (HTR) research with the help of volunteers.

To give a bit of a background about myself and how I got to the Library: I’m an archaeologist and heritage professional by education and practice, with a PhD in Heritage Studies from University College London (2013). As a field archaeologist I used to record large quantities of excavation-related data – all manually, on paper. This was probably the first time I saw the potential of applying digital tools and technologies to record, manage and share archaeological data.

My first meaningful engagement with archaeological data and digital technologies started in 2005, when I joined the Israeli-Palestinian Archaeology Working Group (IPAWG) to create a database of all archaeological sites surveyed or excavated by Israel in the West Bank since its occupation in 1967, and to link it with a Geographic Information System (GIS), enabling the spatial visualisation and querying of this data for the first time. The research potential of this GIS-linked database proved so great that I decided to explore it further in a PhD dissertation. My dissertation focused on archaeological databases covering the occupied West Bank, and I was especially interested in the nature of archaeological records and the way they reflect particular research interests and heritage management priorities, as well as variability in data quality, coverage, accuracy and reliability.

Following my PhD I stayed at UCL Institute of Archaeology as a post-doctoral research associate, and participated in a project called MicroPasts, a UCL-British Museum collaboration. This project used web-based, crowdsourcing methods to allow traditional academics and other communities in archaeology to co-produce innovative open datasets. The MicroPasts crowdsourcing platform provided a great variety of projects through which people could contribute – from transcribing British Museum card catalogues, through tagging videos on the Roman Empire, to photomasking images in preparation for 3D modelling of museum objects.

With the main phase of the MicroPasts project coming to an end, I joined the British Library as Digital Curator (Polonsky Fellow) for the Hebrew Manuscripts Digitisation Project. This role allowed me to create and implement a digital strategy for engaging, accessing and promoting a specific digitised collection, working closely with curators and the Digital Scholarship team. My work included making the collection digitally accessible (on data.bl.uk, working with British Library Labs) and encouraging open licensing, creating a website, promoting the collection in different ways, researching available digital methods to explore and exploit collections in novel ways, and implementing tools such as an online catalogue records viewer (TEI XML), OpenRefine, and 3D modelling.

A six-month backpacking trip to Asia unexpectedly prepared me for my new role at the Library. I was delighted to join – or re-join – the Library’s Digital Research team, this time as Digital Curator for Asian and African Collections. I find these collections especially intriguing due to their diversity, richness and uniqueness. These include mostly manuscripts, printed books, periodicals, newspapers, photographs and e-resources from Africa, the Middle East (including Qatar Digital Library), Central Asia, East Asia (including the International Dunhuang Project), South Asia, SE Asia – as well as the Visual Arts materials.

I’m very excited to join the Library’s Digital Research team, to work alongside Neil Fitzgerald, Nora McGregor, Mia Ridge and Stella Wisdom, and to learn from their rich experience. Feel free to get in touch with us via digitalresearch@bl.uk or Twitter - @BL_AdiKS for me, or @BL_DigiSchol for the Digital Scholarship team.

25 April 2018

Some challenges and opportunities for digital scholarship in 2018

In this post, Digital Curator Dr Mia Ridge shares her presentation notes for a talk on 'challenges and opportunities for digital scholarship' at the British Library's first Research Collaboration 'Open House'.

I'm part of a team that supports the creation and innovative use of the British Library's digital collections. Our working definition of digital scholarship is 'using computational methods to answer existing research questions or challenge existing theoretical paradigms'. In this post/talk, my perspective is informed by my knowledge of the internal processes necessary to support digital scholarship and of the issues that some scholars face when using digital/digitised collections, so I'm not by any means claiming this is a complete list.

Opportunities in digital scholarship

  • Scale: you can explore a bigger body of material computationally - 'reading' thousands, or hundreds of thousands, of volumes of text, images or media files - while retaining the ability to examine individual items closely as research questions arise from that distant reading
  • Perspective: you can see trends, patterns and relationships not apparent from close reading individual items, or gain a broad overview of a topic
  • Speed: you can test an idea or hypothesis on a large dataset; prototype new interfaces; generate classification data about people, places, concepts; transcribe content

Together, these opportunities enable new research questions.

Sample digital scholarship tools and methods

Some of these processes help get data ready for analysis (e.g. turning images of items into transcribed and annotated texts), while others support the analysis of large collections at scale, improve discoverability or enable public engagement.

  • OCR, HTR - optical character recognition, handwritten text recognition
  • Data visualisation for analysis or publication
  • Text and data mining - applying classifications to or analysing texts, images or media. Key terms include natural language processing, corpus linguistics, sentiment analysis, applied machine learning. Examples include: Voyant tools, Clarifai image classification.
  • Mapping and GIS - assigning coordinates to quantitative or qualitative data
  • Public participation and learning including crowdsourcing, citizen science/history. Examples include In the Spotlight, transcribing information from historical playbills.
  • Creative and emerging formats including games
An experiment with image classification with Clarifai

Putting it all together, we have case studies like the Political Meetings Mapper by Dr Katrina Navickas, a BL Labs winner in 2015. This project, based on digitised 19th-century newspapers, used Python scripts to calculate the date of each meeting and to extract and geocode meeting locations, creating a map of Chartist meetings.
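To give a flavour of what 'calculating a meeting date' and 'geocoding a location' can involve, here is a minimal sketch in Python. It is not the project's actual code: the function names, the phrases it recognises ('Monday next', 'Tuesday last') and the tiny gazetteer with approximate coordinates are all assumptions made for illustration.

```python
import re
from datetime import date, timedelta
from typing import Dict, Optional, Tuple

WEEKDAYS = ["monday", "tuesday", "wednesday", "thursday",
            "friday", "saturday", "sunday"]

# Hypothetical mini-gazetteer with approximate (latitude, longitude) values.
# A real project would use a much fuller, historically aware gazetteer.
GAZETTEER: Dict[str, Tuple[float, float]] = {
    "manchester": (53.48, -2.24),
    "leeds": (53.80, -1.55),
}


def resolve_meeting_date(published: date, notice: str) -> Optional[date]:
    """Turn a phrase such as 'on Monday next' into a calendar date,
    counted from the date the newspaper issue was published."""
    pattern = r"\b(" + "|".join(WEEKDAYS) + r")\s+(next|last)\b"
    match = re.search(pattern, notice.lower())
    if match is None:
        return None
    target = WEEKDAYS.index(match.group(1))  # Monday is 0, as in date.weekday()
    if match.group(2) == "next":
        # Between 1 and 7 days after publication.
        days_ahead = (target - published.weekday() - 1) % 7 + 1
        return published + timedelta(days=days_ahead)
    # 'last': between 1 and 7 days before publication.
    days_back = (published.weekday() - target - 1) % 7 + 1
    return published - timedelta(days=days_back)


def geocode_places(notice: str) -> Dict[str, Tuple[float, float]]:
    """Return coordinates for every gazetteer place named in the notice."""
    lowered = notice.lower()
    return {
        name: coords
        for name, coords in GAZETTEER.items()
        if re.search(r"\b" + re.escape(name) + r"\b", lowered)
    }


# An invented notice, dated as if printed in an issue of 1 May 1841.
notice = "A public meeting will be held on Monday next at Manchester."
meeting_date = resolve_meeting_date(date(1841, 5, 1), notice)
meeting_places = geocode_places(notice)
```

Real newspaper text is much messier than this (OCR errors, varied phrasing, places that have changed name), which is one reason the 'cleaning' challenges discussed later in this post matter.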

The Library has created a data portal, data.bl.uk, containing openly licensed datasets. We aim to describe collections in terms of their data format (images, full text, metadata, etc.), licences, temporal and geographic scope, originating purpose (e.g. specific digitisation projects or exhibitions) and collection, and related subjects or themes. Other datasets may be available by request, or digitised via funded partnerships.

We're aware that, currently, it can be hard to use the datasets from data.bl.uk as they can be too large to easily download, store and manipulate. This leads me neatly onto...

Challenges in digital scholarship

  • Digitisation and cataloguing backlog - the material you want mightn't be available without a special digitisation project
  • Providing access to assets for individual items - between copyright and technology, scholars don't always have the ability to download OCR/HTR text, or download all digitised media about an item
  • Providing access to collections as datasets - moving more material into the 'sweet spot' of material that's nicely digitised in suitable formats, usable sizes, with open licences allowing for re-use is an ongoing (and expensive, time-consuming) process
  • 'Cleaning' historical data and dealing with gaps in both tools provision and source collections - none of these processes are straightforward
  • Providing access to platforms or suites of tools - how much should the Library take on for researchers, and how much should other institutions or individuals provide?
  • Skills - where will researchers learn digital scholarship methods?
  • Peer review - what if your discipline lacks DS-skilled peers? How can peers judge a website or database if they've only had experience with monographs or articles? How can scholars overcome prejudice about the 'digital'?
  • Versioning datasets as annotations or classifications change, software tools improve over time, transcriptions are corrected, etc - some of these changes may affect the argument you're making
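
On the versioning point, one low-tech habit is to record a checksum of the exact file you analysed, so you (or a reviewer) can later tell whether a dataset has changed since you made your argument. The sketch below uses Python's standard hashlib module; the function name and the idea of noting the result alongside your citation are suggestions for illustration, not an established Library practice.

```python
import hashlib
from pathlib import Path


def dataset_fingerprint(path, chunk_size=1024 * 1024):
    """Return the SHA-256 checksum of a file as a hexadecimal string.

    The file is read in chunks, so a large dataset does not need to
    fit in memory all at once.
    """
    digest = hashlib.sha256()
    with Path(path).open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

If a corrected transcription or a re-run of OCR alters even one character of the file, the checksum changes, which gives a simple signal that results may need re-checking.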

Overall, I hope the opportunities outweigh the challenges, and it's certainly possible to start with small projects with existing tools and digital sources to explore the potential of a larger project.

If you've used BL data, you can enter the BL Labs awards - they don't close until October so you have time to start an experimental project now! You can also ask the Labs team to reality check your digital scholarship idea based on Library collections and data.

Digital scholarship is constantly shifting so on another date I might have come up with different opportunities and challenges. Let me know if you have challenges or opportunities that you think could be included in this very brief overview!