25 April 2018
Some challenges and opportunities for digital scholarship in 2018
In this post, Digital Curator Dr Mia Ridge shares her presentation notes for a talk on 'challenges and opportunities for digital scholarship' at the British Library's first Research Collaboration 'Open House'.
I'm part of a team that supports the creation and innovative use of the British Library's digital collections. Our working definition of digital scholarship is 'using computational methods to answer existing research questions or challenge existing theoretical paradigms'. In this post/talk, my perspective is shaped by my knowledge of the internal processes needed to support digital scholarship and of the issues some scholars face when using digital/digitised collections. I'm not by any means claiming this is a complete list.
Opportunities in digital scholarship
- Scale: you can explore a much larger body of material computationally - 'reading' thousands, or hundreds of thousands, of volumes of text, images or media files - while retaining the ability to examine individual items as research questions arise from that distant reading
- Perspective: you can see trends, patterns and relationships not apparent from close reading of individual items, or gain a broad overview of a topic
- Speed: you can test an idea or hypothesis on a large dataset; prototype new interfaces; generate classification data about people, places, concepts; transcribe content
Together, these opportunities enable new research questions.
Sample digital scholarship tools and methods
Some of these processes help get data ready for analysis (e.g. turning images of items into transcribed and annotated texts), while others support the analysis of large collections at scale, improve discoverability or enable public engagement.
- OCR, HTR - optical character recognition, handwritten text recognition
- Data visualisation for analysis or publication
- Text and data mining - applying classifications to, or analysing, texts, images or media. Key terms include natural language processing, corpus linguistics, sentiment analysis and applied machine learning. Examples include Voyant Tools and Clarifai image classification.
- Mapping and GIS - assigning coordinates to quantitative or qualitative data
- Public participation and learning, including crowdsourcing and citizen science/history. One example is In the Spotlight, a crowdsourcing project for transcribing information from historical playbills.
- Creative and emerging formats including games
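As a minimal illustration of text and data mining, the sketch below counts the most frequent words across a few OCR'd texts, the kind of quick overview that tools such as Voyant Tools offer. It is not how any particular tool works internally; the stopword list and sample snippets are hypothetical placeholders, and real OCR output would usually need cleaning first.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list; real projects would use a fuller one.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "at", "on", "was", "were", "for"}


def word_frequencies(texts):
    """Count lower-cased word tokens across all texts, skipping stopwords."""
    counts = Counter()
    for text in texts:
        # [a-z]+ keeps runs of letters only, so digits and punctuation are dropped
        tokens = re.findall(r"[a-z]+", text.lower())
        counts.update(t for t in tokens if t not in STOPWORDS)
    return counts


if __name__ == "__main__":
    # Hypothetical snippets standing in for OCR text from digitised newspapers
    sample = [
        "A meeting of the Chartists was held at the Market Cross.",
        "The Chartists met again on Monday evening at the Market Cross.",
    ]
    for word, n in word_frequencies(sample).most_common(3):
        print(word, n)
```

Scaling the same idea up to thousands of volumes is mostly a matter of reading files in a loop, which is where download size and storage start to matter.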
Putting it all together, we have case studies like the Political Meetings Mapper by Dr Katrina Navickas, a BL Labs winner in 2015. Based on digitised 19th-century newspapers, the project used Python scripts to calculate meeting dates and to extract and geocode meeting locations, producing a map of Chartist meetings.
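To give a flavour of the steps involved, here is a hedged sketch, not the project's actual code: it assumes a notice names a weekday ("on Monday next") that should be resolved to the first such day after the newspaper's publication date, and a place introduced by "in". The `GAZETTEER` dictionary, its approximate coordinates and the function names are hypothetical; a real project would use a historical gazetteer or geocoding service.

```python
import datetime as dt
import re

# Hypothetical mini-gazetteer mapping place names to approximate (latitude, longitude).
GAZETTEER = {
    "manchester": (53.48, -2.24),
    "leeds": (53.80, -1.55),
}

WEEKDAYS = ["monday", "tuesday", "wednesday", "thursday", "friday", "saturday", "sunday"]


def resolve_weekday(published, weekday_name):
    """Return the first date strictly after `published` falling on weekday_name.

    Assumption: phrases like 'on Monday next' mean the next such day after publication.
    """
    target = WEEKDAYS.index(weekday_name.lower())
    days_ahead = (target - published.weekday()) % 7 or 7  # same weekday -> a week later
    return published + dt.timedelta(days=days_ahead)


def parse_notice(text, published):
    """Extract (meeting date, place name, coordinates) from a simple meeting notice.

    Returns None when no weekday or capitalised place name is found; coordinates
    are None when the place is not in the hypothetical gazetteer.
    """
    day = re.search(r"\bon (" + "|".join(WEEKDAYS) + r")\b", text, re.IGNORECASE)
    place = re.search(r"\bin ([A-Z][a-z]+)", text)
    if not (day and place):
        return None
    date = resolve_weekday(published, day.group(1))
    coords = GAZETTEER.get(place.group(1).lower())
    return date, place.group(1), coords


if __name__ == "__main__":
    notice = "A public meeting will be held in Manchester on Monday next."
    print(parse_notice(notice, dt.date(1839, 5, 4)))
```

Real notices are far messier than this, which is why extraction and geocoding usually need manual checking.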
The Library has created a data portal, data.bl.uk, containing openly licensed datasets. We aim to describe collections in terms of their data format (images, full text, metadata, etc.), licences, temporal and geographic scope, originating purpose (e.g. specific digitisation projects or exhibitions), source collection, and related subjects or themes. Other datasets may be available by request, or digitised via funded partnerships.
We're aware that, currently, it can be hard to use the datasets from data.bl.uk as they can be too large to easily download, store and manipulate. This leads me neatly onto...
Challenges in digital scholarship
- Digitisation and cataloguing backlog - the material you want mightn't be available without a special digitisation project
- Providing access to assets for individual items - because of copyright and technical constraints, scholars can't always download the OCR/HTR text or all the digitised media for an item
- Providing access to collections as datasets - moving more material into the 'sweet spot' of material that's well digitised in suitable formats and usable sizes, with open licences that allow re-use, is an ongoing (and expensive, time-consuming) process
- 'Cleaning' historical data and dealing with gaps in both tools provision and source collections - none of these processes are straightforward
- Providing access to platforms or suites of tools - how much should the Library take on for researchers, and how much should other institutions or individuals provide?
- Skills - where will researchers learn digital scholarship methods?
- Peer review - what if your discipline lacks peers skilled in digital scholarship? How can peers judge a website or database if they've only had experience with monographs or articles? How can scholars overcome prejudice about the 'digital'?
- Versioning datasets as annotations or classifications change, software tools improve over time, transcriptions are corrected, etc - some of these changes may affect the argument you're making
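One lightweight way to keep track of which version of a dataset an argument rests on, offered here as a suggestion rather than a Library practice, is to record a checksum for each data file used and check it again later. The sketch below assumes local copies of the files; the function names and manifest format are hypothetical. Hashing in chunks also means very large downloads need not fit in memory.

```python
import hashlib
import json
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in 1 MiB chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_manifest(paths, manifest_path):
    """Record the checksum of each data file used in an analysis as JSON."""
    manifest = {str(p): sha256_of(p) for p in paths}
    Path(manifest_path).write_text(json.dumps(manifest, indent=2, sort_keys=True))
    return manifest


def changed_files(manifest_path):
    """List recorded files that are now missing or whose contents have changed."""
    manifest = json.loads(Path(manifest_path).read_text())
    return [p for p, h in manifest.items() if not Path(p).exists() or sha256_of(p) != h]
```

Storing the manifest alongside your results makes it easier to say exactly which version of a dataset your analysis used, even after corrections are released.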
Overall, I hope the opportunities outweigh the challenges, and it's certainly possible to start with small projects with existing tools and digital sources to explore the potential of a larger project.
If you've used BL data, you can enter the BL Labs awards - they don't close until October so you have time to start an experimental project now! You can also ask the Labs team to reality check your digital scholarship idea based on Library collections and data.
Digital scholarship is constantly shifting so on another date I might have come up with different opportunities and challenges. Let me know if you have challenges or opportunities that you think could be included in this very brief overview!