Digital scholarship blog


20 April 2022

Importing images into Zooniverse with a IIIF manifest: introducing an experimental feature

Digital Curator Dr Mia Ridge shares news from a collaboration between the British Library and Zooniverse that means you can more easily create crowdsourcing projects with cultural heritage collections. There's a related blog post on Zooniverse, Fun with IIIF.

IIIF manifests - text files that tell software how to display images, sound or video files alongside metadata and other information about them - might not sound exciting, but by linking to them you can view and annotate collections from around the world. The IIIF (International Image Interoperability Framework) standard makes images (or audio, video or 3D files) more re-usable - they can be displayed on another site alongside the original metadata and information provided by the source institution. If an institution updates a manifest - perhaps adding information from updated cataloguing or crowdsourcing - any sites that display that image automatically get the updated metadata.

Playbill showing the title after other large text

We've posted before about how we used IIIF manifests as the basis for our In the Spotlight crowdsourced tasks on LibCrowds.com. Playbills are great candidates for crowdsourcing because they are hard to transcribe automatically, and the layout and information present varies a lot. Using IIIF meant that we could access images of playbills directly from the British Library servers without needing server space and extra processing to make local copies. You didn't need technical knowledge to copy a manifest address and add a new volume of playbills to In the Spotlight. This worked well for a couple of years, but over time we'd found it difficult to maintain bespoke software for LibCrowds.

When we started looking for alternatives, the Zooniverse platform was an obvious option. Zooniverse hosts dozens of historical or cultural heritage projects, and hundreds of citizen science projects. It has millions of volunteers, and a 'project builder' that means anyone can create a crowdsourcing project - for free! We'd already started using Zooniverse for other Library crowdsourcing projects such as Living with Machines, which showed us how powerful the platform can be for reaching potential volunteers. 

But that experience also showed us how complicated the process of getting images and metadata onto Zooniverse could be. Using Zooniverse for volumes of playbills for In the Spotlight would require some specialist knowledge. We'd need to download images from our servers, resize them, generate a 'manifest' list of images and metadata, then upload it all to Zooniverse; and repeat that for each of the dozens of volumes of digitised playbills.

Fast forward to summer 2021, when we had the opportunity to put a small amount of funding into some development work by Zooniverse. I'd already collaborated with Sam Blickhan at Zooniverse on the Collective Wisdom project, so it was easy to drop her a line and ask if they had any plans or interest in supporting IIIF. It turned out they had, but until then they hadn't had the resources or an interested organisation to take it forward.

We came up with a brief outline of what the work needed to do, taking the ability to recreate some of the functionality of In the Spotlight on Zooniverse as a goal. Therefore, 'the ability to add subject sets via IIIF manifest links' was key. ('Subject set' is Zooniverse-speak for the 'set of images or other media' that forms the basis of crowdsourcing tasks.) And of course we wanted the ability to set up some crowdsourcing tasks with those items… The Zooniverse developer, Jim O'Donnell, shared his work in progress on GitHub, and I was easily able to set up a test project and ask people to help create sample data for further testing.

If you have a Zooniverse project and a IIIF manifest address to hand, you can try out the import for yourself: add 'subject-sets/iiif?env=production' to your project builder URL. For example, if your project number is xxx, then the URL to access the IIIF manifest import would be https://www.zooniverse.org/lab/xxx/subject-sets/iiif?env=production

Paste a manifest URL into the box. The platform parses the file to present a list of metadata fields, which you can flag as hidden or visible in the subject viewer (the public task interface). When you're happy, you can click a button to upload the manifest as a new subject set (like a folder of items), and your images are imported. (Don't worry if it initially says '0 subjects'.)
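Under the hood, the importer is simply reading the manifest like any other IIIF client would. As a rough illustration only - this is not the Zooniverse importer's actual code, and it assumes a IIIF Presentation API 2.x manifest at a hypothetical address - the metadata fields and images it offers could be pulled out with a few lines of Python:

# Rough illustration only, not the Zooniverse importer itself.
# Assumes a IIIF Presentation API 2.x manifest at a hypothetical address.
import requests

manifest_url = "https://example.org/iiif/manifest.json"  # hypothetical manifest address
manifest = requests.get(manifest_url).json()

# Descriptive metadata: the label/value pairs offered as hidden or visible fields
for field in manifest.get("metadata", []):
    print(field.get("label"), ":", field.get("value"))

# One image per canvas: the items that would become subjects in the new subject set
for canvas in manifest["sequences"][0]["canvases"]:
    image_url = canvas["images"][0]["resource"]["@id"]
    print(canvas.get("label"), image_url)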

 

Screenshot of manifest import screen

You can try out our live task and help create real data for testing ingest processes at https://frontend.preview.zooniverse.org/projects/bldigital/in-the-spotlight/classify

This is a very brief introduction, with more to come on managing data exports and IIIF annotations once you've set up, tested and launched a crowdsourced workflow (task). We'd love to hear from you - how might this be useful? What issues do you foresee? How might you want to expand or build on this functionality? Email digitalresearch@bl.uk or tweet @mia_out @LibCrowds. You can also comment on GitHub https://github.com/zooniverse/Panoptes-Front-End/pull/6095 or https://github.com/zooniverse/iiif-annotations

Digital work in libraries is always collaborative, so I'd like to thank British Library colleagues in Finance, Procurement, Technology, Collection Metadata Services and various Collections departments; the Zooniverse volunteers who helped test our first task and of course the Zooniverse team, especially Sam, Jim and Chris for their work on this.

 

12 April 2022

Making British Library collections (even) more accessible

Daniel van Strien, Digital Curator, Living with Machines, writes:

The British Library’s digital scholarship department has made many digitised materials available to researchers. This includes a collection of books digitised in partnership with Microsoft and processed using Optical Character Recognition (OCR) software to make the text machine-readable. There is also a collection of books digitised in partnership with Google.

Since being digitised, the collection has been used for many different projects. These include recent work to augment the dataset with genre metadata and a project using machine learning to tag images extracted from the books. The books have also served as training data for a historic language model.

This blog post will focus on two challenges of working with this dataset: size and documentation, and discuss how we’ve experimented with one potential approach to addressing these challenges. 

One of the challenges of working with this collection is its size. The OCR output is over 20GB. This poses some challenges for researchers and other interested users wanting to work with these collections. Projects like Living with Machines are one avenue in which the British Library seeks to develop new methods for working at scale. For an individual researcher, one of the possible barriers to working with a collection like this is the computational resources required to process it. 

Recently we have been experimenting with a Python library, datasets, to see if this can help make this collection easier to work with. The datasets library is part of the Hugging Face ecosystem. If you have been following developments in machine learning, you have probably heard of Hugging Face already. If not, Hugging Face is a delightfully named company focusing on developing open-source tools aimed at democratising machine learning. 

The datasets library is a tool that aims to make it easier for researchers to share and process large machine learning datasets efficiently. Whilst this was the library’s original focus, there are also other use cases for which it may help make datasets held by the British Library more accessible.

Some features of the datasets library:

  • Tools for efficiently processing large datasets 
  • Support for easily sharing datasets via a ‘dataset hub’ 
  • Support for documenting datasets hosted on the hub (more on this later). 

As a result of these and other features, we have recently worked on adding the British Library books dataset to the Hugging Face hub. Making the dataset available via the datasets library has made it more accessible in a few different ways.

Firstly, it is now possible to download the dataset in two lines of Python code: 

from datasets import load_dataset
ds = load_dataset('blbooks', '1700_1799')

We can also use the datasets library to process large datasets. For example, suppose we only want to include data with a high OCR confidence score (this partially helps filter out text with many OCR errors):

ds.filter(lambda example: example['mean_wc_ocr'] > 0.9)

One of the particularly nice features here is that the library uses memory mapping to store the dataset under the hood. This means that you can process data that is larger than the RAM you have available on your machine. This can make the process of working with large datasets more accessible. We could also use this as a first step in processing data before getting back to more familiar tools like pandas. 

dogs_data = ds['train'].filter(lambda example: "dog" in example['text'].lower())
df = dogs_data.to_pandas()
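The same memory-mapped processing also works for deriving new information across the whole collection in batches, so nothing needs to fit into RAM at once. Here is a minimal sketch (not from the original workflow), assuming the 'blbooks' configuration and 'text' column shown above; the added 'n_words' column is purely illustrative:

# Minimal sketch: add a word-count column in batches, so the collection
# never needs to fit into RAM. 'n_words' is an illustrative column name.
from datasets import load_dataset

ds = load_dataset('blbooks', '1700_1799')
ds = ds.map(
    lambda batch: {'n_words': [len(text.split()) for text in batch['text']]},
    batched=True,
)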

In a follow-on blog post, we’ll dig into the technical details of the datasets library. Whilst making the technical processing of datasets more accessible is one part of the puzzle, there are also non-technical challenges to making a dataset more usable.

 

Documenting datasets 

One of the challenges of sharing large datasets is documenting the data effectively. Traditionally, libraries have mainly focused on describing material at the ‘item level’, i.e. documenting one item at a time. However, there is a difference between documenting one book and 100,000 books. There are no easy answers to this, but one possible avenue libraries could explore is the use of datasheets. Timnit Gebru et al. proposed the idea in ‘Datasheets for Datasets’. A datasheet aims to provide a structured format for describing a dataset, covering questions such as how and why it was constructed, what the data consists of, and how it could potentially be used. Crucially, datasheets also encourage a discussion of the bias and limitations of a dataset. Whilst you can identify some of these limitations by working with the data, there is also a crucial amount of information known by the data’s curators that might not be obvious to end-users. Datasheets offer one possible way for libraries to begin communicating this information more systematically.

The dataset hub adopts the practice of writing datasheets and encourages users of the hub to write a datasheet for their dataset. For the British Library books, we have attempted to write one of these dataset cards. Whilst it is certainly not perfect, it hopefully begins to outline some of the challenges of this dataset and gives end-users a better sense of how they should approach it.

16 March 2022

Getting Ready for Black Theatre and the Archive: Making Women Visible, 1900-1950

Following on from last week’s post, have you signed up for our Wikithon already? If you are interested in Black theatre history and making women visible, and want to learn how to edit Wikipedia, please do join us online, on Monday 28th March, from 10am to 1.30pm BST, over Zoom.

Remember the first step is to book your place here, via Eventbrite.

Finding Sources in The British Newspaper Archive

We are grateful to the British Newspaper Archive and Findmypast for granting our participants access to their resources on the day of the event. If you’d like to learn more about the Archive beforehand, there are some handy guides below.

Front page of the British Newspaper Archive website, showing the search bar and advertising Findmypast.
The British Newspaper Archive Homepage

I used a quick British Newspaper Archive search to look for information on Una Marson, a playwright and artist whose work is very important in the timeframe of this Wikithon (1900-1950). As you can see, there were over 1000 results. I was able to view images of Una at gallery openings and art exhibitions, and read all about her work.

Page of search results on the British Newspaper Archive, looking for articles about Una Marson.
A page of results for Una Marson on the British Newspaper Archive

Findmypast focuses more on legal records of people, living and dead. It’s a dream website for genealogists and those interested in social history. They’ve recently uploaded the results of the 1921 census, so there is a lot of material about people’s lives in the early 20th century.

Image of the landing page for the 1921 Census of England and Wales on Findmypast.
The Findmypast 1921 Census Homepage.

 

Here’s how to get started with Findmypast in 15 minutes, using a series of ‘how to’ videos. This handy blog post offers a beginner's guide on how to search Findmypast's family records, and you can always use  Findmypast’s help centre to seek answers to frequently asked questions.

Wikipedia Preparation

If you’d like to get a head start, you can download and read our handy guide to setting up your Wikipedia account, which you can access here. There is also advice available on creating your account, Wikipedia's username policy and how to create your user page.

The Wikipedia logo, a white globe made of jigsaw pieces with letters and symbols on them in black.
The Wikipedia Logo, Nohat (concept by Paullusmagnus), CC BY-SA 3.0, via Wikimedia Commons

Once you have done that, or if you already have a Wikipedia account, please join our event dashboard and go through the introductory exercises, which cover:

  • Wikipedia Essentials
  • Editing Basics
  • Evaluating Articles and Sources
  • Contributing Images and Media Files
  • Sandboxes and Mainspace
  • Sources and Citations
  • Plagiarism

These are all short exercises that will help familiarise you with Wikipedia and its processes. Don’t have time to do them? We get it, and that’s totally fine - we’ll cover the basics on the day too!

You may want to verify your Wikipedia account - this function exists to make sure that people are contributing responsibly to Wikipedia. The easiest and swiftest way to verify your account is to do 10 small edits. You could do this by correcting typos or adding in missing dates. However, another way to do this is to find articles where citations are needed, and add them via Citation Hunt. For further information on adding citations, watching this video may be useful.

Happier with an asynchronous approach?

If you cannot join the Zoom event on Monday 28th March, but would like to contribute, please do check out and sign up to our dashboard. The online dashboard training exercises will be an excellent starting point. From there, all of your edits and contributions will be registered, and you can be proud of yourself for making the world of Wikipedia a better place, in your own time.

This post is by Wikimedian in Residence Dr Lucy Hinnie (@BL_Wikimedian).

14 March 2022

The Lotus Sutra Manuscripts Digitisation Project: the collaborative work between the Heritage Made Digital team and the International Dunhuang Project team

Digitisation has become one of the key tasks for curatorial roles within the British Library. It rests on two main pillars: making collection items accessible to everybody around the world, and preserving unique and sometimes very fragile items. Digitisation involves many different teams and workflow stages including retrieval, conservation, curatorial management, copyright assessment, imaging, workflow management, quality control, and the final publication to online platforms.

The Heritage Made Digital (HMD) team works across the Library to assist with digitisation projects. An excellent example of the collaborative nature of the relationship between the HMD and International Dunhuang Project (IDP) teams is the quality control (QC) of the Lotus Sutra Project’s digital files. It is crucial that images meet the quality standards of the digital process. As a Digitisation Officer in HMD, I am in charge of QC for the Lotus Sutra Manuscripts Digitisation Project, which is currently conserving and digitising nearly 800 Chinese Lotus Sutra manuscripts to make them freely available on the IDP website. The manuscripts were acquired by Sir Aurel Stein after they were discovered  in a hidden cave in Dunhuang, China in 1900. They are thought to have been sealed there at the beginning of the 11th century. They are now part of the Stein Collection at the British Library and, together with the international partners of the IDP, we are working to make them available digitally.

The majority of the Lotus Sutra manuscripts are scrolls and, after they have been treated by our dedicated Digitisation Conservators, our expert Senior Imaging Technician Isabelle does an outstanding job of imaging the fragile manuscripts. My job is then to prepare the images for publication online. This includes checking that they have the correct technical metadata such as image resolution and colour profile, are an accurate visual representation of the physical object and that the text can be clearly read and interpreted by researchers. After nearly 1000 years in a cave, it would be a shame to make the manuscripts accessible to the public for the first time only to be obscured by a blurry image or a wayward piece of fluff!

With the scrolls measuring up to 13 metres long, most are too long to be imaged in one go. They are instead shot in individual panels, which our Senior Imaging Technicians digitally “stitch” together to form one big image. This gives online viewers a sense of the physical scroll as a whole, in a way that would not be possible in real life for those scrolls that are more than two panels in length unless you have a really big table and a lot of specially trained people to help you roll it out. 
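Conceptually, stitching is just joining the panel images edge to edge; in practice our Senior Imaging Technicians align overlapping panels in professional imaging software. A toy sketch of the naive version, using the Pillow library and hypothetical file names, gives a sense of the idea:

# Toy sketch only: naive side-by-side joining of panel images with Pillow.
# Real stitching aligns overlapping panels and is done in professional imaging software.
from PIL import Image

panel_files = ['panel_1.tif', 'panel_2.tif', 'panel_3.tif']  # hypothetical file names
panels = [Image.open(f) for f in panel_files]

stitched = Image.new('RGB', (sum(p.width for p in panels), max(p.height for p in panels)))
x = 0
for panel in panels:
    stitched.paste(panel, (x, 0))
    x += panel.width

stitched.save('stitched_scroll.tif')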

Photo showing the three individual panels of Or.8210/S.1530 with breaks in between
Or.8210/S.1530: individual panels
Photo showing the three panels of Or.8210/S.1530 as one continuous image
Or.8210/S.1530: stitched image

 

This post-processing can create issues, however. Sometimes an error in the stitching process can cause a scroll to appear warped or wonky. In the stitched image for Or.8210/S.6711, the ruled lines across the top of the scroll appeared wavy and misaligned. But when I compared this with the images of the individual panels, I could see that the lines on the scroll itself were straight and unbroken. It is important that the digital images faithfully represent the physical object as far as possible; we don’t want anyone thinking these flaws are in the physical item and writing a research paper about ‘Wonky lines on Buddhist Lotus Sutra scrolls in the British Library’. Therefore, I asked the Senior Imaging Technician to restitch the images together: no more wonky lines. However, we accept that the stitched images cannot be completely accurate digital surrogates, as they are created by the Imaging Technician to represent the item as it would be seen if it were to be unrolled fully.

 

Or.8210/S.6711: distortion from stitching. The ruled line across the top of the scroll is bowed and misaligned

 

Similarly, our Senior Imaging Technician applies ‘digital black’ to make the image background a uniform colour. This is to hide any dust or uneven background and ensure the object is clear. If this is accidentally overused, it can make it appear that a chunk has been cut out of the scroll. Luckily this is easy to spot and correct, since we retain the unedited TIFFs and RAW files to work from.

 

Or.8210/S.3661, panel 8: overuse of digital black when filling in a tear in the scroll, which appears as a large black line down the centre of the image

 

Sometimes the scrolls are wonky, or dirty or incomplete. They are hundreds of years old, and this is where it can become tricky to work out whether there is an issue with the images or the scroll itself. The stains, tears and dirt shown in the images below are part of the scrolls and their material history. They give clues to how the manuscripts were made, stored, and used. This is all of interest to researchers and we want to make sure to preserve and display these features in the digital versions. The best part of my job is finding interesting things like this. The fourth image below shows a fossilised insect covering the text of the scroll!

 

Black stains: Or.8210/S.2814, panel 9
Torn and fragmentary panel: Or.8210/S.1669, panel 1
Insect droppings obscuring the text: Or.8210/S.2043, panel 1
Fossilised insect covering text: Or.8210/S.6457, panel 5

 

We want to minimise the handling of the scrolls as much as possible, so we will only reshoot an image if it is absolutely necessary. For example, I would ask a Senior Imaging Technician to reshoot an image if debris is covering the text and makes it unreadable - but only after inspecting the scroll to ensure it can be safely removed and is not stuck to the surface. However, if some debris such as a small piece of fluff, paper or hair, appears on the scroll’s surface but is not obscuring any text, then I would not ask for a reshoot. If it does not affect the readability of the text, or any potential future OCR (Optical Character Recognition) or handwriting analysis, it is not worth the risk of damage that could be caused by extra handling. 

Reshoot: Or.8210/S.6501: debris over text / No reshoot: Or.8210/S.4599: debris not covering text

 

These are a few examples of the things to which the HMD Digitisation Officers pay close attention during QC. Only through this careful process can we ensure that the digital images accurately reflect the physicality of the scrolls and represent their original features. By developing a QC process that applies the best techniques and procedures, working to defined standards and guidelines, we succeed in making these incredible items accessible to the world.

Read more about the Lotus Sutra Project on the IDP Blog

IDP website: IDP.BL.UK

And IDP twitter: @IDP_UK

Dr Francisco Perez-Garcia

Digitisation Officer, Heritage Made Digital: Asian and African Collections

Follow us @BL_MadeDigital

08 March 2022

Black Theatre and the Archive: Making Women Visible, 1900-1950

On International Women’s Day 2022 we are pleased to announce our upcoming online Wikithon event, Black Theatre and the Archive: Making Women Visible, 1900-1950, which will take place on Monday 28th March, 10:00 – 13:30 BST. Working with one of the Library’s notable collections, the Lord Chamberlain’s Plays, we will be looking to increase the visibility and presence of Black women on Wikipedia, with a specific focus on twentieth century writers and performers of works in the collection, such as Una Marson and Pauline Henriques, alongside others who are as yet lesser-known than their male counterparts.

The Lord Chamberlain’s Plays are the largest single manuscript collection held by the Library. Between 1824 and 1968 all plays in the UK were submitted to the Lord Chamberlain’s Office for licensing, a practice rooted in two important acts of Parliament related to theatre in the UK: the Stage Licensing Act of 1737 and the Theatres Act of 1843. You can watch Dr Alexander Lock, Curator of Modern Archives and Manuscripts at the British Library, discussing this collection with Giuliano Levato, who runs the People of Theatre vlog, in the video below.

The Lord Chamberlain’s Plays with British Library Curator Dr Alexander Lock on People of Theatre - The Vlog for Theatregoers

We are delighted to be collaborating with Professor Kate Dossett of the University of Leeds. Kate is currently working on ‘Black Cultural Archives & the Making of Black Histories: Archives of Surveillance and Black Transnational Theatre’, a project supported by an Independent Social Research Foundation Fellowship and a Fellowship from our very own Eccles Centre. Her work is crucial in shining light on the understudied area of Black theatre history in the first half of the twentieth century.

A woman and a man sit behind a desk with an old-fashioned microphone that says ‘BBC’. The woman is on the left, holding a script, looking at the microphone. The man is also holding a script and looking away.
Pauline Henriques and Sam Selvon in 1952. Image: BBC UK Government, Public domain, via Wikimedia Commons.

Our wikithon is open to everyone: you can register for free here. We will be blogging in the run up to the event with details on how to prepare. We are thankful to be supported by the British Newspaper Archive and FindMyPast, who will provide registered participants access to their online resources for the day of the event. You can also access 1 million free newspaper pages at any time, as detailed in this blog post.  

We hope to consider a variety of questions: what does a timeline of Black British theatre history look like, who gets to decide its parameters, and how can we make women more visible in these studies? We will think about the traditions shaping Black British theatre and the collections that help us understand this field of study, such as the Lord Chamberlain’s Plays. This kind of hands-on historical research helps us to better represent marginalised voices in the present day.

This will be the first in a series of three Wikithons exploring different elements of the Lord Chamberlain’s Plays; we will host the other two later in 2022. Please follow this blog and our Twitter @BL_DigiSchol, and keep an eye on our Wiki Project Page for updates.

Art + Feminism Barnstar: a black and white image of a fist holding a paintbrush in front of a green star.
Art + Feminism Barnstar, by Ilotaha13, (CC BY-SA 4.0)

We are running this workshop as part of the Art + Feminism Wiki movement, with an aim to expand and amplify knowledge produced by and about Black women. As they state in their publicity materials:

Women make up only 19% of biographies on English Wikipedia, and women of colour even fewer. Wikipedia's gender trouble is well-documented: in a 2011 survey (the 2010 UNU-MERIT survey), the Wikimedia Foundation found that less than 10% of its contributors identify as female; more recent research, such as the 2013 Benjamin Mako Hill survey, points to 16% globally and 22% in the US. The data relative to trans and non-binary editors is basically non-existent. That's a big problem. While the reasons for the gender gap are up for debate, the practical effect of this disparity is not: gaps in participation create gaps in content.

We want to combat this imbalance directly. As a participant at this workshop, you will receive training on creating and editing Wikipedia articles to communicate the central role played by Black women in British theatre making between 1900 and 1950. You will also be invited to explore resources that can enable better citation justice for women of colour knowledge producers and greater awareness of archive collections documenting Black British histories. With expert support from Wikimedians and researchers alike, this is a great opportunity to improve Wikipedia for the better.

This post is by Wikimedian in Residence Dr Lucy Hinnie (@BL_Wikimedian) and Digital Curator Stella Wisdom (@miss_wisdom).

14 February 2022

PhD Placement on Mapping Caribbean Diasporic Networks through Correspondence

Every year the British Library hosts a range of PhD placement scheme projects. If you are interested in applying for one of these, the 2022 opportunities are advertised here. There are currently 15 projects available across Library departments, all starting from June 2022 onwards and ending before March 2023. If you would like to work with born-digital collections, you may want to read last week’s Digital Scholarship blog post about two projects on enhanced curation, hybrid archives and emerging formats. However, if you are interested in Caribbean diasporic networks and want to experiment with creating network analysis visualisations, then read on to find out more about the “Mapping Caribbean Diasporic Networks through correspondence (2022-ACQ-CDN)” project.

This is an exciting opportunity to be involved with the preliminary stages of a project to map the Caribbean Diasporic Network evident in the ‘Special Correspondence’ files of the Andrew Salkey Archive. This placement will be based in the Contemporary Literary and Creative Archives team at the British Library with support from Digital Scholarship colleagues. The successful candidate will be given access to a selection of correspondence files to create an item level dataset and explore the content of letters from the likes of Edward Kamau Brathwaite, C.L.R. James, and Samuel Selvon.

Photograph of Andrew Salkey
Photograph of Andrew Salkey, from the Andrew Salkey Archive, Deposit 10310. With kind permission of Jason Salkey.

The main outcome envisaged for this placement is to develop a dataset, using a sample of ten files, linking the data and mapping the correspondents’ names, the locations they were writing from, and the dates of the correspondence in a spreadsheet. The placement student will also learn how to use the Gephi Open Graph Visualisation Platform to create a visual representation of this network, associating individuals with each other and mapping their movement across the world between the 1950s and 1990s.

Gephi is open-source software for visualising and analysing networks. Its developers provide a step-by-step guide to getting started; the first step is to upload a spreadsheet detailing your ‘nodes’ and ‘edges’. As an example of how Gephi can be used, we've included a visualisation below, created by previous British Library research placement student Sarah FitzGerald from the University of Sussex, who used data from the Endangered Archives Programme (EAP) to map all EAP applications received between 2004 and 2017.
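To give a sense of what those spreadsheets look like, here is a minimal sketch - not part of the placement brief - of how an item-level correspondence spreadsheet could be turned into the nodes and edges files Gephi expects on import. The file name and the 'writer' and 'recipient' columns are purely illustrative assumptions:

# Minimal sketch, not part of the placement brief: build Gephi-style nodes and
# edges CSVs from a hypothetical item-level correspondence spreadsheet.
import pandas as pd

letters = pd.read_csv('salkey_correspondence.csv')  # hypothetical file and columns

# Edges: one row per writer-recipient pair, weighted by the number of letters
edges = (letters.rename(columns={'writer': 'Source', 'recipient': 'Target'})
                .groupby(['Source', 'Target']).size()
                .reset_index(name='Weight'))
edges.to_csv('edges.csv', index=False)

# Nodes: every person who appears as a writer or a recipient
people = pd.unique(pd.concat([edges['Source'], edges['Target']]))
nodes = pd.DataFrame({'Id': people, 'Label': people})
nodes.to_csv('nodes.csv', index=False)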

Gephi network visualisation diagram
Network visualisation of EAP Applications created by Sarah FitzGerald

In this visualisation the size of each country relates to the number of applications it features in, as country of archive, country of applicant, or both. The colours show related groups. Each line shows the direction and frequency of application. The line always travels in a clockwise direction from country of applicant to country of archive; the thicker the line, the more applications. Where the country of applicant and country of archive are the same, the line becomes a loop. If you want to read more about the other visualisations that Sarah created during her project, please check out these two blog posts:

We hope this new PhD placement will offer the successful candidate the opportunity to develop their specialist knowledge through access to the extensive correspondence series in the Andrew Salkey archive, and to undertake practical research in a curatorial context by improving the accessibility of linked metadata for this collection material. This project is a vital building block in improving the Library’s engagement with this material and exploring the ways it can be accessed by a wider audience.

If you want to apply, details are available on the British Library website at https://www.bl.uk/research-collaboration/doctoral-research/british-library-phd-placement-scheme. Applications for all 2022/23 PhD Placements close on Friday 25 February 2022, 5pm GMT. The application form and guidelines are available online here. Please address any queries to research.development@bl.uk

This post is by Digital Curator Stella Wisdom (@miss_wisdom) and Eleanor Casson (@EleCasson), Curator in Contemporary Archives and Manuscripts.

10 February 2022

In conversation: Meet Silvija Aurylaitė, the new British Library Labs Manager

The newly appointed manager of British Library Labs (BL Labs), Silvija Aurylaitė, is excited to start leading the BL Labs transformation with a new focus on computational creative thinking. BL Labs is a welcoming space for everyone curious about computational research using the British Library’s digital collections. We welcome all researchers - data scientists, digital humanists, artists and creative practitioners alike.

Image of BL Labs Manager Silvija Aurylaite
Introducing Silvija Aurylaitė, new manager of BL Labs

Find out more from Silvija, in conversation with Maja Maricevic, BL Head of Higher Education and Science.

 

Maja: The Labs have a proud history of experimenting and innovating with the British Library’s digital collections. Can you tell us more about your own background?

Silvija: Ever since I discovered the BL Labs in London 8 years ago, I have been immersed in the world of experimentation with digital collections. I started researching collections from open GLAMs (galleries, libraries, archives and museums) around the world and the implications of copyright and licensing for creative reuse. In a large ecosystem of open digital collections, my special interest has been identifying content that people can use to bring their creative ideas, such as new design works, to life.

Inspired by the Labs, I started developing my own curatorial web project, which won the Europeana Creative Design Challenge in 2015. The award gave me the chance to work with a team of international experts to learn new skills in areas such as IT, copyright and social entrepreneurship. This experience later evolved into ‘Revivo Images’, a pilot website that gives guidance on open image collections around the world, carefully selected for quality and for the reliability of their copyright and licence information, with explanations of how to use the databases. It was the result of collaboration with a great interdisciplinary team including an IT lead, programmers, curators, designers and a copywriter.

All this gave me invaluable experience in overseeing a digital collections web project from vision to implementation. I learned about curating content from across collections, building an image database and mapping metadata using various standards. We also used AI and human input to create keywords and thematic catalogues, and designed a simple minimalist user interface.

What I most enjoyed about this journey, actually, was meeting a great range of people in many creative fields, from professional animators to students looking for a theme for their BA final thesis - and learning what excited them most, and what barriers they faced in using open collections. I met many of them at various art festivals, universities, design schools and events where I delivered talks and creative workshops in my free time to spread the word about open digital collections for creativity. For two years I was also responsible for the ‘Bridgeman Education’ online database, one of the largest digital image collections, with over 1,300,000 images from the GLAM sector, designed for the use of art images in higher education curricula. I had the opportunity to talk to many librarians, lecturers and students from around the world about what they find most useful in this new digital turn.

As a result of this, I am particularly excited about introducing the Labs to university students: from students in computer science departments with coding skills, to researchers in the social sciences and humanities, to creativity champions in fashion, graphic design or jewellery who might be attracted to the aesthetic qualities of our collections, or those looking to pick up creative coding skills.

The landscape has changed a lot in the 8 years since I learned about the Labs, and I have gradually started my own journey of learning code and algorithmic thinking. In my previous role at the British Library, as Rights Officer for the Heritage Made Digital project, we were already approaching digital collections as data. Now we are all embracing computational data science methods to gain new insights into digital collections, and that is what the future British Library Labs is going to celebrate.

 

Maja: You have a strong connection to the BL Labs since you were the Labs volunteer 8 years ago. What most inspired you when you first heard of the Labs?

Silvija: Personally, the Labs were my first professional experience abroad after my MA studies in intellectual history at the American university in Budapest, and happened to be one of the main incentives to stay in London.

This city has attracted me for its serendipity - you can have a great range of urban experiences from attending the oldest special interest societies and visiting antiquarian bookshops to meeting founders of latest startups in their regular gatherings and getting up to speed with the mindset of perpetual innovation.

When I first heard about the Labs in one of its public events, this sentence struck me: “experiment with the BL digital collections to create something new”, with the “new” being undefined and open. I had this idea of a perpetuity - the possibility of endlessly combining the knowledge and aesthetics of the past, safeguarded by one of the biggest libraries of the world, with the creative visions, skills and technology of today and tomorrow.

Such endless new experiences of digital collections can be accelerated by creating a dedicated space for experimentation - a collider or a matchmaker - that contributes to the diverse serendipitous urban experience of London itself. This is how I see the Labs.

Looking at it from a user point of view, I am particularly excited about ‘semiotic democracy’, or ‘the ability of users to produce and disseminate new creations and to take part in public cultural discourse’[1] (Stark, 2006). I believe this new playful approach to digitised out-of-copyright cultural materials will fundamentally change the way we see GLAMs. We’ll look at them less and less as spaces we visit only as recipients, to learn about the past as it used to be, and more and more as places where we are co-creators, able to enter into a meaningful dialogue and reshape meanings, narratives and experiences.

 

Maja: Prior to the Labs appointment, you also gained significant rights management experience. What have you learned that will be useful for the Labs?

Silvija: It was a delight to work with Matthew Lambert, the Head of Copyright, Policy & Assurance, for the Heritage Made Digital project, led by Sandra Tuppen, in setting up the British Library’s copyright workflow for both current and historical digitisation projects. This project now allows users to explore the BL’s digital images in the Universal Viewer with attributed rights statements and usage terms.

These last 3.5 years were a great exercise in dealing with very large, often very messy, data to create complex systems, policies and procedures which allow oversight of all important aspects of the digital data, including copyright and licensing, data protection and sensitivities. Of course, such work in the Library is of massive importance because it affects the level of freedom we later have to experiment, reuse and do further research based on this data.

Personally, the Heritage Made Digital project is also very precious to me because of its collaborative nature. The team uses an MS SharePoint tool to facilitate data contributions from across many departments in the BL, and they are just fantastic at promoting and celebrating digitisation as a common effort to make content publicly accessible. I will definitely use this experience to suggest solutions for registering and documenting both the BL’s datasets and related reuse projects as a similar collaborative project within the Library.

 

Maja: There is so much that is changing in digital research all the time. Are there particular current developments that you find exciting and why?

Silvija: Yes! First, I find the moment of change itself exciting - there is no book about the tools we use today that won’t be out of date tomorrow. This is a good neuroplasticity exercise that trains the mind not to sleep and to stay constantly attentive to new developments and opportunities.

Second, I absolutely love to see how many people, from creators to researchers and library staff, are gradually and naturally embracing code languages. With this comes associated critical thinking, such as the ability to bypass often outdated database interfaces and reveal exciting data insights, simply by having a liberating package of new digital skills.

And, third, I am super excited about the possibility of upscaling and creating a bigger impact with existing breakthrough projects and brilliant ideas relating to the British Library’s data. I believe this could be done by finding consensus on how we want to register and document data science initiatives - finalised, ongoing and most wanted, both internally and externally - and then by promoting this knowledge further.

This would allow us to enter a new stage of the BL Labs. The new ecosystem of re-use would promote sustainability, reproducibility, adaptation and crowdsourced improvement of existing projects, giving us new super powers!

[1] Stark, Elisabeth (2006). Free culture and the internet: a new semiotic democracy. openDemocracy (20 June). URL: https://www.opendemocracy.net/en/semiotic_3662jsp

07 February 2022

New PhD Placements on Enhanced Curation: Hybrid Archives and Emerging Formats

The British Library is accepting applications for the new round of 2022 PhD Placement opportunities: there are 15 projects available across Library departments, all starting from June 2022 onwards and ending before March 2023. Two of the projects within the Contemporary British Collections department focus on Enhanced Curation as an approach to add to the research value of an archival object or digital publication.

“Developing an enhanced curation framework for contemporary hybrid archives (2022-CB-HAC)” will outline a framework for Enhanced Curation in relation to contemporary hybrid archives. These archival collections are the record of the creative and professional lives of prominent individuals in UK society, containing both paper and digital material. So far we have defined Enhanced Curation as the means by which the research value of these records can be enhanced through the creation, collection, and interrogation of the contextual information which surrounds them.

Luckily, we’re in a privileged position – most of our archive donors are living individuals who can illuminate their creative practice for us in real-time. Similarly, with forensic techniques, we’re capturing more data than ever before when we acquire an archive. The truly live questions are then – how can we use this position to best effect? What can we do with what we’re already collecting? What else should we be collecting? And how can we represent this data in engaging and enlightening new ways for the benefit of everyone, including our researchers and exhibition audiences?

Enhanced Curation, as we see it, is about bringing these dynamic collections to life for as many people as possible.  In approaching these questions, the chosen student will engage in a mixture of theoretical and practical work – first outlining the relevant debates and techniques in and around curation, archival science, museology and digital humanities, and then recommending a course of action for one particular hybrid personal archive. This is a collaborative exercise, though, and they will be provided with hands-on training for working with (and getting the most out of) this growing collection area by specialist curatorial staff at the Library.

Photograph of a floppy disk and its case
Floppy disk from the Will Self archive.

“Collecting complex digital publications: Testing an enhanced curation method (2022-CB-EF)” focuses on the Library’s collection of emerging formats. Emerging formats are defined as born-digital publications whose structure, technical dependencies and highly interactive nature challenge our traditional collection methods. These publications include apps, such as the interactive adventure 80 Days, as well as digital interactive narratives, such as the examples collected in the UK Web Archive Interactive Narratives and New Media Writing Prize collections. Collecting and preserving these digital formats in their entirety might not always be possible: there are many challenges and implications in terms of technical capabilities, software and hardware dependencies, copyright restrictions and long-term solutions that are effective against technical obsolescence.

The collection and creation of contextual information is one approach to filling in the gaps and enhancing curation for these digital publications. The placement student will help us test a collection matrix for contextual information relating to emerging formats, which includes – but is not limited to – webpages, interviews, reviews, blog posts and screenshots/screencasts of a work in use. These might be collected using a variety of methods (e.g. web archiving, direct transfer from the author, etc.) as well as created by the student themselves (e.g. interviews with the author, video recordings of usage, etc.). Through this placement, the student will have the opportunity to participate in a network of cultural heritage institutions concerned with the preservation of digital publications while helping develop one of the Library’s contemporary collections.

Photograph of a man looking at an iPad screen and reading an app
Interacting with the American Interior app on iPad.

Both PhD Placements are offered for 3 months full time, or part-time equivalent. They can be undertaken as hybrid placements (i.e. remotely, with some visits to the British Library building in London, St. Pancras), with the option of a fully remote placement for “Collecting complex digital publications: Testing an enhanced curation method”.

Applications for all 2022/23 PhD Placements close on Friday 25 February 2022, 5pm GMT. The application form and guidelines are available online here. Please address any queries to research.development@bl.uk

This post is by Giulia Carla Rossi, Curator of Digital Publications (on Twitter as @giugimonogatari), and Callum McKean, Digital Lead Curator, Contemporary Archives and Manuscripts.
