THE BRITISH LIBRARY

Digital scholarship blog


23 October 2020

BL Labs Public Award Runner Up (Research) 2019 - Automated Labelling of People in Video Archives


People 'automatically' identified in digital TV news related programme clips.

Guest blog post by Andrew Brown (PhD researcher), Ernesto Coto (Research Software Engineer) and Andrew Zisserman (Professor) of the Visual Geometry Group, Department of Engineering Science, University of Oxford, and BL Labs Public Award Runner-up for Research, 2019. Posted on their behalf by Mahendra Mahey, Manager of BL Labs.

In this work, we automatically identify and label (tag) people in large video archives without the need for any manual annotation or supervision. The project was carried out with the British Library on a sample of 106 videos from their “Television and radio news” archive: a large collection of news programmes from the last 10 years. This archive serves as an important and fascinating resource for researchers and the general public alike. However, the sheer scale of the data, coupled with a lack of relevant metadata, makes indexing, analysing and navigating this content an increasingly difficult task. Relying on human annotation is no longer feasible, and without an effective way to navigate these videos, this bank of knowledge is largely inaccessible.

As users, we are typically interested in human-centric queries such as:

  • “When did Jeremy Corbyn first appear in a Newsnight episode?” or
  • “Show me all of the times when Hugh Grant and Shirley Williams appeared together.”

Currently this is nigh on impossible without trawling through hundreds of hours of content. 

We posed the following research question:

Is it possible to enable automatic person-search capabilities such as these in the archive, without the need for any manual supervision or labelling?

The answer is “yes”, and the method is described next.

Video Pre-Processing

The basic unit which enables person labelling in videos is the face-track: a group of consecutive face detections within a shot that correspond to the same identity. Face-tracks are extracted from all of the videos in the archive. The task of labelling the people in the videos is then to assign a label to each of these extracted face-tracks. The video below gives an example of two face-tracks found in a scene.


Two face-tracks found in British Library digital news footage by Visual Geometry Group - University of Oxford.

Techniques at Our Disposal

The base technology used for this work is a state-of-the-art convolutional neural network (CNN), trained for facial recognition [1]. The CNN extracts feature-vectors (lists of numbers) from face images, which characterise the identity of the depicted person. To label a face-track, the distance between the feature-vector for the face-track and the feature-vector for a face image of known identity is computed. The face-track is labelled as depicting that identity if the distance is smaller than a certain threshold (i.e. they match). We also use a speaker recognition CNN [2] that works in the same way, except that it labels speech segments from unknown identities using speech segments from known identities within the video.
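To make the matching step concrete, here is a minimal Python sketch of threshold-based labelling. The function name, the toy two-element vectors and the threshold value are our own illustrative choices; real feature-vectors are high-dimensional CNN outputs.

```python
import math

def match_identity(track_vec, known_vecs, threshold=1.0):
    """Label a face-track with its closest known identity, if close enough.

    track_vec:  feature-vector of the face-track (a list of numbers)
    known_vecs: dict mapping a name to the feature-vector of a known face
    threshold:  maximum distance that still counts as a match
                (the value here is a placeholder, not the one used in [1])
    """
    best_name, best_dist = None, float("inf")
    for name, vec in known_vecs.items():
        dist = math.dist(track_vec, vec)  # Euclidean distance between vectors
        if dist < best_dist:
            best_name, best_dist = name, dist
    # No label is assigned when no known face is close enough
    return best_name if best_dist < threshold else None
```

A face-track is simply left unlabelled when nothing in the reference set is close enough, which is what keeps false matches rare.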

Labelling the Face-Tracks

Our method for automatically labelling the people in the video archive is divided into three main stages:

(1) Our first labelling method uses what we term a “celebrity feature-vector bank”, which consists of names of people that are likely to appear in the videos, and their corresponding feature-vectors. The names are automatically sourced from IMDB cast lists for the programmes (the titles of the programmes are freely available in the meta-data). Face-images for each of the names are automatically downloaded from image-search engines. Incorrect face-images and people with no images of themselves on search engines are automatically removed at this stage. We compute the feature-vectors for each identity and add them to the bank alongside the names. The face-tracks from the video archives are then simply labelled by finding matches in the feature-vector bank.
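As a rough sketch of how such a bank might be assembled, assuming a generic `embed` function standing in for the face-recognition CNN (all names here are hypothetical, not the project's actual code):

```python
def build_feature_bank(images_by_name, embed):
    """Build a 'celebrity feature-vector bank': one mean feature-vector per name.

    images_by_name: dict mapping a cast-list name to a list of face images
                    gathered from image search
    embed:          function mapping one face image to its feature-vector
    """
    bank = {}
    for name, images in images_by_name.items():
        vecs = [embed(img) for img in images]
        if not vecs:
            continue  # no images survived filtering: leave this name out
        dim = len(vecs[0])
        # Average the vectors so each identity has a single entry in the bank
        bank[name] = [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]
    return bank
```

Dropping names with no surviving images mirrors the automatic filtering described above.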

Face-tracks from the video archives are labelled by finding matches in the feature-vector bank.

(2) Our second labelling method uses the idea that if a name is spoken, or found displayed in a scene, then that person is likely to be found within that scene. The task is then to automatically determine whether there is a correspondence or not. Text is automatically read from the news videos using Optical Character Recognition (OCR), and speech is automatically transcribed using Automatic Speech Recognition (ASR). Names are identified and they are searched for on image search engines. The top ranked images are downloaded and the feature-vectors are computed from the faces. If any are close enough to the feature-vectors from the face-tracks present in the scene, then that face-track is labelled with that name. The video below details this process for a written name.


Using text or spoken word and face recognition to identify a person in a news clip.

(3) For our third labelling method, we use speaker recognition to identify any non-labelled speaking people. We use the labels from the previous two stages to automatically acquire labelled speech segments from the corresponding labelled face-tracks. For each remaining non-labelled speaking person, we extract the speech feature-vector and compute the distance of it to the feature-vectors of the labelled speech segments. If one is close enough, then the non-labelled speech segment and corresponding face-track is assigned that name. This process manages to label speaking face-tracks with visually challenging faces, e.g. deep in shadow or at an extremely non-frontal pose.

Indexing and Searching Identities

The results of our work can be browsed via a web search engine of our own design. A search bar allows for users to specify the person or group of people that they would like to search for. People’s names are efficiently indexed so that the complete list of names can be filtered as the user types in the search bar. The search results are returned instantly with their associated metadata (programme name, date and time) and can be displayed in multiple ways. The video associated with each search result can be played, visualising the location and the name of all identified people in the video. See the video below for more details. This allows for the archive videos to be easily navigated using person-search, thus opening them up for use by the general public.
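The as-you-type filtering can be sketched with a sorted index and binary search; this is our own simplified illustration, not the engine's actual implementation.

```python
import bisect

class NameIndex:
    """Sorted index of people's names supporting fast prefix filtering,
    as a search bar might use while the user types."""

    def __init__(self, names):
        # Sort case-insensitively, keeping the original spelling for display
        pairs = sorted({(n.lower(), n) for n in names})
        self._keys = [k for k, _ in pairs]
        self._names = [n for _, n in pairs]

    def filter(self, prefix):
        """Return every indexed name starting with `prefix` (case-insensitive)."""
        p = prefix.lower()
        lo = bisect.bisect_left(self._keys, p)
        hi = bisect.bisect_right(self._keys, p + "\uffff")
        return self._names[lo:hi]
```

Because the keys are kept sorted, each keystroke costs two binary searches rather than a scan over the full list of names.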


Archive videos easily navigated using person-search.

For examples of more of our Computer Vision research and open-source software, visit the Visual Geometry Group website.

This work was supported by the EPSRC Programme Grant Seebibyte EP/M013774/1.

[1] Qiong Cao, Li Shen, Weidi Xie, Omkar M. Parkhi, and Andrew Zisserman. VGGFace2: A dataset for recognising faces across pose and age. In Proc. International Conference on Automatic Face & Gesture Recognition, 2018.

[2] Joon Son Chung, Arsha Nagrani and Andrew Zisserman. VoxCeleb2: Deep Speaker Recognition. In Proc. INTERSPEECH, 2018.

BL Labs Public Awards 2020

Inspired by this work that uses the British Library's digital archived news footage? Have you done something innovative using the British Library's digital collections and data? Why not consider entering your work for a BL Labs Public Award 2020 and win fame, glory and even a bit of money?

This year's public and staff awards 2020 are open for submission; the deadline for entry to both is Monday 30 November 2020.

Whilst we welcome projects on any use of our digital collections and data (especially in research, artistic, educational and community categories), we are particularly interested in entries in our public awards that have focused on anti-racist work, about the pandemic or that are using computational methods such as the use of Jupyter Notebooks.

20 October 2020

The Botish Library: developing a poetry printing machine with Python


This is a guest post by Giulia Carla Rossi, Curator of Digital Publications at the British Library. You can find her @giugimonogatari.

In June 2020 the Office for Students announced a campaign to fill 2,500 new places on artificial intelligence and data science conversion courses in universities across the UK. While I’m not planning to retrain in cyber, I was lucky enough to be in the cohort for the trial run of one of these courses: Birkbeck’s Postgraduate Certificate in Applied Data Science. The course started as a collaborative project between the British Library, The National Archives and Birkbeck, University of London, to develop a computing course aimed at professionals working in the cultural heritage sector. The trial run has now ended and the course is set to start in full from January 2021.

The course is designed for graduates who are new to computer science – which was perfect for me, as I had no previous coding knowledge besides some very basic HTML and CSS. It was a very steep learning curve, starting from scratch and ending with developing my own piece of software, but it was great to see how code could be applied to everyday issues to facilitate and automate parts of our workload. The fact that it was targeted at information professionals and that we could use existing datasets to learn from real life examples made it easier to integrate study with work. After a while, I started to look at the everyday tasks in my to-do list and wonder “Can this be solved with Python?”

After a taught module (Demystifying Computing with Python), students had to work on an individual project module and develop a piece of software based on their work (to solve an issue, facilitate a task, re-use and analyse existing resources). I had an idea of the themes I wanted to explore – as Curator of Digital Publications, I’m interested in new media and platforms used to deliver content, and how text and stories are shaped by these tools. When I read about French company Short Édition and the short story vending machine in Canary Wharf I knew I had found my project.

My project is to build a stand-alone printer that prints random poems from a dataset of out-of-copyright texts. A little portable Bot-ish (sic!) Library to showcase the British Library collections and fill the world with more poetry.

A Short Story Station in Canary Wharf, London and my own sketch of a printing machine. (photo by the author)

 

Finding poetry

For my project, I decided to use the British Library’s “Digitised printed books (18th-19th century)” collection. This comprises over 60,000 volumes of 18th and 19th century texts, digitised in partnership with Microsoft and made available under Public Domain Mark. My work focused on the metadata dataset and the dataset of OCR derived text (shout out to the Digital Research team for kindly providing me with this dataset, as its size far exceeded what my computer is able to download).

The British Library actively encourages researchers to use its “digital collection and data in exciting and innovative ways”, and projects with similar goals to mine had been undertaken before. In 2017, Dr Jennifer Batt worked with staff at the British Library on a data mining project: her goal was to identify poetry within a dataset of 18th century digitised newspapers from the British Library’s Burney Collection. In her research, Batt argued that employing a set of recurring words didn’t help her find poetry within the dataset, as only very few of the poems included key terms like ‘stanza’ and ‘line’, and none included the word ‘poem’. In my case, I chose to work with the metadata dataset first, as a way of filtering books based on their title: while, as Batt showed, it’s unlikely that a poem itself includes a term defining its poetic style, I was quite confident that such terms might appear in the title of a poetry collection.

My first step, then, was to identify books containing poetry by searching through the metadata dataset using key words associated with poetry. My goal was not to find all the poetry in the dataset, but to identify books containing some form of poetry that could be reused to create my printer dataset. I used the Poetry Foundation’s online “Glossary of Poetic Terms - Forms & Types of Poems” to identify key terms to use, eliminating the anachronisms (no poetry slam in the 19th century, I'm afraid) and ambiguous terms (“romance” returned too many results that weren’t relevant to my research). The result was 4,580 book titles containing one or more poetry-related words.
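In code, that first filtering step amounts to a keyword search over titles. A minimal sketch (the term list below is a hypothetical subset of the glossary-derived list, not my full list):

```python
import re

# Hypothetical subset of the Poetry Foundation glossary terms,
# after removing anachronisms and ambiguous words like 'romance'
POETRY_TERMS = ["poem", "sonnet", "ballad", "rhyme", "verse", "ode", "elegy"]

def find_poetry_titles(metadata):
    """Return the (book_id, title) pairs whose title mentions a poetry term.

    metadata: iterable of (book_id, title) pairs from the metadata dataset
    """
    # \b keeps 'ode' from matching inside 'model'; the optional 's' catches plurals
    pattern = re.compile(r"\b(" + "|".join(POETRY_TERMS) + r")s?\b", re.IGNORECASE)
    return [(book_id, title) for book_id, title in metadata if pattern.search(title)]
```

The word boundaries and the case-insensitive flag matter here: 19th-century titles capitalise freely, and without `\b` short terms would match inside unrelated words.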

 

A screenshot showing key terms: 'poem', 'sonnet', 'ballad', 'rhyme', 'verse', etc.
My list of poetry terms used to search through the dataset

 

 

Creating verses: when coding meets grammar

I then wanted to extract individual poems from my dataset. The variety of book structures and poetry styles made it impossible to find a blanket rule that could be applied to all books. I chose to test my code out on books that I knew had one poem per page, so that I could extract pages and easily get my poems. Because of its relatively simple structure - and possibly because of some nostalgia for my secondary school Italian class - I started my experiments with Giacomo Pincherle’s 1865 translation of Dante’s sonnets, “In Omaggio a Dante. Dante's Memorial. [Containing five sonnets from Dante, Petrarch and Metastasio, with English versions by G. Pincherle, and five original sonnets in English by G. Pincherle.]”

Once I solved the problem of extracting single poems, the issue was ‘reshaping’ the text to match the print edition. Line breaks are essential to the meaning of a poem, and the OCR text was just one continuous string that completely disregarded the metre and rhythm of the original work. The rationale behind my choice of book was also that sonnets present a fairly regular structure, which I was hoping could be of use when reshaping the text. The idea of using the poem’s metre as a tool to determine line length seemed the most effective choice: by knowing the type of metre used (iambic pentameter, terza rima, etc.) it’s possible to anticipate the number of syllables for each line and where line breaks should occur.

So I created a function to count how many syllables a word has, following English grammar rules. As is often the case with coding, someone has likely already encountered the same problem as you and, if you’re lucky, they have found a solution: I used a function found online as my base (thank you, StackOverflow), building on it in order to cover as many grammar rules (and exceptions) as I was aware of. I used the same model and adapted it to Italian grammar rules, in order to account for the Italian sonnets in the book as well. I then decided to combine the syllable count with the use of capitalisation at the beginning of a line. This increased the chances of a successful result in case the syllable count returned a wrong result (which might happen whenever typos appear in the OCR text).
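As an illustration of the approach (not the actual course project code), here is a bare-bones English syllable counter plus a line-reshaping function that combines the syllable target with the capitalisation cue; the tolerance of two syllables before a capitalised word is an invented value.

```python
def count_syllables(word):
    """Rough English syllable count: count groups of vowels, then drop a
    final silent 'e' (as in 'stone'). Real words have many exceptions."""
    word = word.lower().strip(".,;:!?'\"")
    vowels = "aeiouy"
    count, prev_vowel = 0, False
    for ch in word:
        is_vowel = ch in vowels
        if is_vowel and not prev_vowel:
            count += 1  # a new group of consecutive vowels starts here
        prev_vowel = is_vowel
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1  # silent final 'e'
    return max(count, 1)

def reshape(text, per_line=10):
    """Break one continuous OCR string into verse lines of roughly
    `per_line` syllables, starting a new line early when a capitalised
    word appears near the target."""
    lines, current, count = [], [], 0
    for word in text.split():
        early = word[0].isupper() and count >= per_line - 2
        if current and (count >= per_line or early):
            lines.append(" ".join(current))
            current, count = [], 0
        current.append(word)
        count += count_syllables(word)
    if current:
        lines.append(" ".join(current))
    return lines
```

With `per_line=10`, an iambic pentameter string such as "Shall I compare thee to a summers day Thou art more lovely and more temperate" comes back as two lines, the capital 'T' of 'Thou' confirming the break.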

 

An image showing the poem 'To My Father', both written as a string of lines, and in its original form
The same sonnet restructured so that each line is a new string (above), and matches the line breaks in the print edition (below)

 

It was very helpful that all books in the datasets were digitised and are available to access remotely (you can search for them on the British Library catalogue by using the search term “blmsd”), so I could check and compare my results to the print editions from home even during lockdown. I also tested my functions on sonnets from Henry Thomas Mackenzie Bell’s “Old Year Leaves Being old verses revived. [With the addition of two sonnets.]” and Welbore Saint Clair Baddeley’s “Legend of the Death of Antar, an eastern romance. Also lyrical poems, songs, and sonnets.”

Another image showing a poem, this time a sonnet, written as both a string of lines, and in its original form
Example of sonnet from Legend of the Death of Antar, an eastern romance. The function that divides the poems into lines could be adapted to accommodate breaks between stanzas as well.

 

Main challenges and gaps in research

  • Typos in the OCR text: Errors and typos were introduced when the books in the collection were first digitised, which translated into exceptions to the rules I devised for identifying and restructuring poems. In order to ensure the text of every poem has been correctly captured and that typos have been fixed, some degree of manual intervention might be required.
  • Scalability: The variety of poetry styles and book structures, paired with the lack of tagging around verse text, makes it impossible to find a single formula that can be applied to all cases. What I created is quite dependent on a book having one poem per page and using capitalisation in a certain way.
  • Time constraints: The time limit we had to deliver the project - and my very-recently-acquired-and-still-very-much-developing skill set - meant I had to focus on a limited number of books and prioritise writing the software over building the printer itself.

 

Next steps

One of the outputs of this project is a JSON file containing a dictionary of poetry books. After searching for poetry terms, I paired the poetry titles and their related metadata with their pages from the OCR dataset, so the resulting file combines useful data from the two original datasets (book IDs, titles, authors’ names and the OCR text of each book). It’s also slightly easier to navigate than the OCR dataset, as books can be retrieved by ID and each page is an item in a list that can be easily called. One of the next steps will be to upload this onto the British Library data repository, in the hope that people might be encouraged to use it and conduct further research around this data collection.
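A sketch of that combined structure (the field names and book ID below are illustrative, not the dataset's actual schema):

```python
def combine(metadata, ocr_pages):
    """Merge the metadata and OCR datasets into one dictionary keyed by
    book ID, so each book's pages can be retrieved directly by index.

    metadata:  dict mapping book_id -> {"title": ..., "author": ...}
    ocr_pages: dict mapping book_id -> list of page texts
    """
    books = {}
    for book_id, meta in metadata.items():
        books[book_id] = {
            "title": meta["title"],
            "author": meta["author"],
            "pages": ocr_pages.get(book_id, []),  # each page is one list item
        }
    return books
```

The merged dictionary can then be written out with `json.dump` to produce a single file that is easier to navigate than the raw OCR dataset.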

Another, very obvious, next step is: building the printer! The individual components have already been purchased (Adafruit IoT Pi Printer Project Pack and Raspberry Pi 3). I will then have to build the thermal printer with Raspberry Pi and connect it to my poetry dataset. It’s interesting to note that other higher education institutions and libraries have been experimenting with similar ideas - like the University of Idaho Library’s Vandal Poem of the Day Bot and the University of British Columbia’s randomised book recommendations printer for libraries.

Component parts of the Adafruit IoT Pi Printer Project Pack. (photo by the author)

My aim when working on this project was for the printer to be used to showcase British Library collections; the idea was for it to be located in a public area in the Library, to reach new audiences that might not necessarily be there for research purposes. The printer could also be reprogrammed to print different genres and be customised for different occasions (e.g. exhibitions, anniversary celebrations, etc.). All of this was planned before Covid-19 happened, so it might be necessary to slightly adapt things now - and any suggestions are very welcome! :)

Finally, none of this would have been possible without Nora McGregor, Stelios Sotiriadis, Peter Wood, the Digital Scholarship and BL Labs teams, and the support of my line manager and my team.

19 October 2020

The 2020 British Library Labs Staff Award - Nominations Open!


Looking for entries now!

Nominate an existing British Library staff member or a team that has done something exciting, innovative and cool with the British Library’s digital collections or data.

The 2020 British Library Labs Staff Award, now in its fifth year, gives recognition to current British Library staff who have created something brilliant using the Library’s digital collections or data.

Perhaps you know of a project that developed new forms of knowledge, or an activity that delivered commercial value to the library. Did the person or team create an artistic work that inspired, stimulated, amazed and provoked? Do you know of a project developed by the Library where quality learning experiences were generated using the Library’s digital content? 

You may nominate a current member of British Library staff, a team, or yourself (if you are a member of staff), for the Staff Award using this form.

The deadline for submission is 0700 (GMT), Monday 30 November 2020.

Nominees will be highlighted on Tuesday 15 December 2020 at the online British Library Labs Annual Symposium where some (winners and runners-up) will also be asked to talk about their projects (everyone is welcome to attend, you just need to register).

You can see the projects submitted by members of staff and public for the awards in our online archive.

Last year's winner (2019) focused on the brilliant work of the Imaging Team for the 'Qatar Foundation Partnership Project Hack Days', which were sessions organised for the team to experiment with the Library's digital collections.

The runner-up for the BL Labs Staff Award in 2019 was the Heritage Made Digital team and their social media campaign to promote the British Library's digital collections, one language a week, from letters 'A' to 'U' (#AToUnknown).

In the public Awards, last year's winners (2019) drew attention to artistic, research, teaching & learning, and community activities that used our data and / or digital collections.

British Library Labs is a project within the Digital Scholarship department at the British Library that supports and inspires the use of the Library's digital collections and data in exciting and innovative ways. It was previously funded by the Andrew W. Mellon Foundation and is now solely funded by the British Library.

If you have any questions, please contact us at labs@bl.uk.

12 October 2020

Fiction Readers Wanted for PhD Research Study


This is a guest post by British Library collaborative doctoral student Carol Butler; you can follow her on Twitter as @fantomascarol.

Update: Due to a phenomenal response, Carol has recruited enough interviewees for the study, so the link to the application form has been removed (13/10/2020).

In 2016 I started a PhD project in partnership with the British Library and the Centre for Human-Computer Interaction Design (CHCID) at City, University of London. My research has focused on the phenomena of fiction authors interacting with readers through online media, such as websites, forums and social media, to promote and discuss their work. My aim is to identify potential avenues for redesigning or introducing new technology to better support authors and readers. I am now in my fourth and final year, aiming to complete my research this winter.

The internet has impacted how society interacts with almost everything, and literature has been no exception. It’s often thought that if a person or a business is not online, they are effectively invisible, and over the last ten years or so it has become increasingly common – expected, even - for authors to have an online presence allowing readers, globally, to connect with them.

Opportunities for authors and readers to interact together existed long before the internet, through events such as readings, signings, and festivals. The internet does not replace these – indeed, festivals have grown in popularity in recent years, and many have embraced technology to broaden their engagement outside of the event itself. However, unlike organised events, readers and authors can potentially interact online far more directly, outside of formal mediation. Perceived benefits from this disintermediation are commonly hailed – i.e. that it can break down access barriers for readers (e.g. geography and time, so they can more easily learn about the books they enjoy and the person behind the story), and help authors to better understand their market and the reception of their books. However, being a relatively new phenomenon, we don’t know much yet about how interacting with each other online may differ from doing so at a festival or event, and what complications the new environment may introduce to the experience, or even exacerbate. It is this research gap that my work has been addressing.

Early in my research, I conducted interviews with fiction authors and readers who use different online technologies (e.g. social media such as Twitter and Facebook, forums such as Reddit, or literary-specific sites such as GoodReads) to interact with other readers and authors. All participants generously shared their honest, open accounts about what they do, where and why, and where they encounter problems. It became clear that, although the benefits to being online are widely accepted and everyone had good experiences to report, in reality, people’s reasons for being online were riddled with contradictions, and, in some cases, it was debatable whether the positives outweighed the negatives, or whether the practice served a meaningful purpose at all. Ultimately – it’s complex, and not everything we thought we knew is necessarily as clear cut as it’s often perceived. 

This led me to make a U-turn in my research. Before working out how to improve technology to better support interactions as they currently stand, I needed to find out more about people’s motivations to be online, and to question whether we were focused on the right problem in the first place. From this I’ve been working to reframe how we, in the research field of Human-Computer Interaction, may understand the dynamics between authors and readers, by building a broader picture of context and influences in the literary field.

I’m going to write another blog post in the coming months to talk about what I’ve found, and what I think we need to focus on in the near future. In particular, I think it is important to improve support for authors, as many find themselves in a tricky position because of the expectation that they are available and public-facing, effectively 24/7. However, before I expand on that, I am about to embark on one final study to address some outstanding questions I have about the needs of their market – fiction readers. 

Over the next few weeks, I will be recruiting people who read fiction – whether they interact online about reading or not - to join me for what I am informally referring to as ‘an interview with props’. This study is happening a few months later than I’d originally intended, as restrictions in relation to Covid-19 required me to change my original plans (e.g. to meet people face-to-face). My study has ‘gone digital’, changing how I can facilitate the sessions, and what I can realistically expect from them.

I will be asking people to join me to chat online, using Zoom, to reflect on a series of sketched interface design ideas I have created, and to discuss their current thoughts about authors being available online. The design sketches represent deviations from the technology currently in common use - some significant, and some subtle. The designs are not being tested on behalf of any affiliated company, and neither do I necessarily anticipate any of them to be developed into working technology in the future. Ultimately, they are probes to get us talking about broader issues surrounding author and reader interactions, and I’m hoping that by getting people's perspectives on them, I’ll learn as much about why the designs *don’t* work as about why they do, to help inform future research and design work.

I’ve been ‘umming and ahhing’ about how best to share these designs with participants through a digital platform. Sitting together in the same room, as I’d originally planned, we could all move them around, pick them up, take a red pen to them, make notes on post-its, and sketch alternative ideas on paper. There are fantastic online technologies available these days, which have proved invaluable during this pandemic. But they can’t provide the same experience that being physically present together can (a predicament which, perhaps ironically, is fitting with the research problem itself!).

A screen image of the Miro platform, showing a drawing of a person wearing glasses, with a text box underneath saying Favourite Author
A sneaky peek at a sketch in the making, on Miro

I have decided to use a website called Miro.com to facilitate the study – an interactive whiteboard tool that allows participants to add digital post-it notes, doodles, and more. I’ve never used it before now, and to my knowledge there is no published research out there (yet) by others in my research field who have used it with participants, so I cannot learn from their experience. I think I must prepare myself for a few technical glitches! But I am hopeful that participants will enjoy the experience, which will be informal, encouraging, and in no way a judgement of their abilities with the technology. I am confident that their contribution will greatly help my work – and future work which will help authors and readers in the real world.

If anyone who is reading this is interested in participating, please do get in touch. Information about the study and how to contact me can be found here or please email carol.butler@city.ac.uk.

Update (13/10/2020): Thanks to everyone who has applied.

23 September 2020

Mapping Space, Mapping Time, Mapping Texts


For many people, our personal understanding of time has been challenged during the Covid-19 pandemic, with minutes, hours and days of the week all seeming to merge together into "blursday", without our pre-pandemic routines to help us mark points in time.

Talking of time, the AHRC-funded Chronotopic Cartographies research project has spent the last few years investigating how we might use digital tools to analyse, map, and visualise the spaces, places and time within literary texts. It draws on the literary theorist Mikhail Bakhtin's concept of the 'chronotope': a way of describing how time and place are linked and represented in different literary genres.

To showcase research from this project, next Tuesday (29th September 2020) we are co-hosting with them an online interdisciplinary conference: "Mapping Space, Mapping Time, Mapping Texts". 

Many blue dots connected with purple lines, behind text saying Mapping Space, Mapping Time, Mapping Texts

The "Mapping Space, Mapping Time, Mapping Texts" registration page is here. Once you have signed up, you will receive an email with links to recorded keynotes and webinar sessions. You will also received an email with links to the Flickr wall of virtual research posters and hangout spaces, on the morning of the conference.

The conference will go live from 09.00 BST; all webinars and live Q&A sessions will be held in Microsoft Teams. If you don't have Teams installed, you can install it before the event here. We appreciate that many participants will be joining from different time zones and that attendees may want to dip in and out of sessions, so please join at whatever pace suits you.

Our keynote speakers: James Kneale, Anders Engberg-Pedersen and Robert T. Tally Jr have provided recordings of their presentations and will be joining the event for live Q&A sessions over the course of the day. You can watch the keynote recordings at any time, but if you want to have the conference experience, then log in to the webinars at the times below so you can participate "live" across the day. Q&A sessions will be held after each keynote at the times below.

Schedule:

9.00 BST: Conference goes live, keynotes and posters available online, urls sent via email.

9.30: Short introduction and welcome from Sally Bushell

10.00-11.00: First Keynote: James Kneale

11.00-11.30: Live Q&A (chaired by Rebecca Hutcheon)

14.00-15.00: Second Keynote: Anders Engberg-Pedersen

15.00-15.30: Live Q&A (chaired by Duncan Hay)

17.00-18.00: Third Keynote: Robert T. Tally Jr

18.00-18.30: Live Q&A (chaired by Sally Bushell)

In the breaks between sessions, please do browse the online Flickr wall of research posters and hang out in the conference's virtual chat room.

We very much look forward to seeing you on-screen, on the day (remember it is Tuesday, not Blursday!).

This post is by Digital Curator Stella Wisdom (@miss_wisdom).

11 September 2020

BL Labs Public Awards 2020: enter before 0700 GMT Monday 30 November 2020!

Add comment

The sixth BL Labs Public Awards, in 2020, formally recognise outstanding and innovative work that has been carried out using the British Library's data and/or digital collections by researchers, artists, entrepreneurs, educators, students and the general public.

The closing date for entering the Public Awards is 0700 GMT on Monday 30 November 2020 and you can submit your entry any time up to then.

Please help us spread the word! We want to encourage anyone interested to submit an entry over the next few months; who knows, you could even win fame and glory! We really hope to have another year of fantastic projects, inspired by our digital collections and data, to showcase at our annual online awards symposium on 15 December 2020 (which is also open for registration).

This year, BL Labs is commending work in four key areas that have used or been inspired by our digital collections and data:

  • Research - A project or activity that shows the development of new knowledge, research methods, or tools.
  • Artistic - An artistic or creative endeavour that inspires, stimulates, amazes and provokes.
  • Educational - Quality learning experiences created for learners of any age and ability that use the Library's digital content.
  • Community - Work that has been created by an individual or group in a community.

What kind of projects are we looking for this year?

Whilst we are really happy for you to submit work on any subject that uses our digital collections, in this significant year we are particularly interested in entries with a focus on anti-racist work, or on projects about lockdown and the global pandemic. We are also keen to receive submissions that have used Jupyter Notebooks to carry out computational work on our digital collections and data.

After the submission deadline has passed, entries will be shortlisted and selected entrants will be notified via email by midnight on Friday 4th December 2020. 

A prize of £150 in British Library online vouchers will be awarded to the winner, and £50 in the same format to the runner-up, in each Awards category at the Symposium. Even if you don't win, entering is a chance to showcase your work to a wide audience, and in the past this has often resulted in major collaborations.

The talent of the BL Labs Awards winners and runners-up over the last five years has led to a remarkable and varied collection of innovative projects, described in our 'Digital Projects Archive'. In 2019, the Awards commended work in four main categories: Research, Artistic, Community and Educational:

BL Labs Award Winners for 2019
(Top-Left) Full-Text search of Early Music Prints Online (F-TEMPO) - Research, (Top-Right) Emerging Formats: Discovering and Collecting Contemporary British Interactive Fiction - Artistic
(Bottom-Left) John Faucit Saville and the theatres of the East Midlands Circuit - Community commendation
(Bottom-Right) The Other Voice (Learning and Teaching)

For further detailed information, please visit BL Labs Public Awards 2020, or contact us at labs@bl.uk if you have a specific query.

Posted by Mahendra Mahey, Manager of British Library Labs.

04 September 2020

British Library Joins Share-VDE Linked Data Community

Add comment

This blog post is by Alan Danskin, Collection Metadata Standards Manager, British Library. metadata@bl.uk

What is Share-VDE and why has the British Library joined the Share-VDE Community?

Share-VDE is a library-driven initiative bringing library catalogues together in a shared Virtual Discovery Environment. It uses linked data technology to create connections between bibliographic information contributed by different institutions.

Example SVDE page showing Tim Berners-Lee linked info to publications, wikipedia, and other external sites
Figure 1: SVDE page for Sir Tim Berners-Lee

For example, searching for Sir Tim Berners-Lee retrieves metadata contributed by different members, including links to his publications. The search also returns links to external sources of information, including Wikipedia.

The British Library will be the first institution to contribute its national bibliography to Share-VDE and we also plan to contribute our catalogue data. By collaborating with the Share-VDE community we will extend access to information about our collections and services and enable information to be reused.

The Library also contributes to Share-VDE by participating in community groups working to develop the metadata model and Share-VDE functionality. This gives us a practical approach to bridging the differences between the IFLA Library Reference Model (LRM) and the BIBFRAME initiative, led by the Library of Congress.

Share-VDE is promoted by the international bibliographic agency Casalini Libri and @Cult, a solutions developer working in the cultural heritage sector.

Andrew MacEwan, Head of Metadata at the British Library, explained that, “Membership of the Share-VDE community is an exciting opportunity to enrich the Library’s metadata and open it up for re-use by other institutions in a linked data environment.”

Tiziana Possemato, Chief Information Officer at Casalini Libri and Director of @Cult, said "We are delighted to collaborate with the British Library and extremely excited about unlocking the wealth of data in its collections, both to further enrich the Virtual Discovery Environment and to make the Library's resources even more accessible to users."

For further information, see:

  • SHARE-VDE
  • Linked Data
  • Linked Open Data

The British Library is the national library of the United Kingdom and one of the world's greatest research libraries. It provides world class information services to the academic, business, research and scientific communities and offers unparalleled access to the world's largest and most comprehensive research collection. The Library's collection has developed over 250 years and exceeds 150 million separate items representing every age of written civilisation and includes books, journals, manuscripts, maps, stamps, music, patents, photographs, newspapers and sound recordings in all written and spoken languages. Up to 10 million people visit the British Library website - www.bl.uk - every year where they can view up to 4 million digitised collection items and over 40 million pages.

Casalini Libri is a bibliographic agency producing authority and bibliographic data; a library vendor, supplying books and journals, and offering a variety of collection development and technical services; and an e-content provider, working both for publishers and libraries.

@Cult is a software development company, specializing in data conversion for LD; and provider of Integrated Library System and Discovery tools, delivering effective and innovative technological solutions to improve information management and knowledge sharing.

24 August 2020

Not Just for Kids: UK Digital Comics, from creation to consumption

Add comment

This is a guest post by Linda Berube, an AHRC Collaborative Doctoral Partnership student based at the British Library and City, University of London. If you would like to know more about Linda's research, please do email her at Linda.Berube@city.ac.uk.

“There are those who claim that Britain no longer has a comics industry.” (John Freeman, downthetubes.net, quoting Lew Stringer)

Freeman goes on to say that despite the evidence supporting such a view (have you ever really looked at a WH Smith comics rack? He has: see his photo of one here), the British comics industry is not just licensed content from the United States, and it has continued to produce new publications. Maybe the newsstand is simply not the best place to look for them.

For the newsstand does not tell the whole story. Comics are not all kiddie and superhero characters now, if they ever were (Sabin 1993). Not that there is anything wrong with that content, but prevailing attitudes about the perceived lack of seriousness of these types of comics can inhibit a consideration of comics as cultural objects in their own right, worthy of research. Novelist Susan Hill (2017) expressed a widely held view when she asked: "Is it better for young people to read nothing at all than read graphic novels - which are really only comics for an older age group?". No amount of book awards, academic departments or academic journals has eliminated such sentiments[1].

The best place to see all that UK comics have to offer is online. Digital comics have brought not only a whole new audience but new creators, as well as new business models and creative processes. My research, funded by the Arts and Humanities Research Council's Collaborative Doctoral Partnership Programme, will take a deep dive into these models and processes, from creation to consumption. For this work, I have the considerable support of supervisors Ian Cooke and Stella Wisdom (British Library) and Ernesto Priego and Stephann Makri (Human-Computer Interaction Design Centre, City, University of London)[2].

A cartoon of a spaceship on the left and a large smartphone screen on the right, showing two people talking to each other
Figure 1: Charisma.ai uses innovative technology to create comics

This particular point in time offers an excellent opportunity to consider the digital comics landscape, and specifically the UK one. We seem to be past the initial enthusiasm for digital technologies, when babies and bathwater were ejected with abandon (see McCloud 2000, for example), and probably still in the middle of a retrenchment, so to speak, of that enthusiasm (see Priego 2011, pp278-280). To date, there have been few attempts at viewing the creation-to-consumption process of print comics in its entirety, and no complete studies of the production and communication models of digital comics. While Benatti (2019) analysed the changes to the roles of authors, readers, and publishers prompted by the creation of webcomics, she admits that “the uncertain future of the comics print communications circuit makes the establishment of a parallel digital circuit…more necessary than ever for the development of the comics medium” (p316).

Screen capture of a website showing the covers of three comics, the first comic shows a rocket leaving earth, the second a Christmas wreath and a pair of crutches, the third 4 people next to a beach
Figure 2: Helen Greetham is part of the international Spider Forest Webcomic collective, one way of distributing and marketing digital comics

Benatti was using the wider publishing industry's process models, and the disruption caused by digital technology, as a lens through which to view webcomics. Indeed, historians have discovered cohesive patterns in the development of ideas, especially as embodied in print books. These patterns, most often described as cycles, chains, or circuits, follow the book through various channels of creation, production, and consumption (see Darnton 1982, and the diagram of his Communication Circuit below, for example). However, they have undergone a significant transformation, disruption even, when considered in the context of the digital environment (Murray and Squires 2013 have updated Darnton for the digital and self-publishing age). At first, it seemed that the disruption would prove terminal for certain types of communication, most especially books and newspapers in print.

A diagram of Darntons Communication Circuit
Figure 3: Robert Darnton’s Communication Circuit

What about the production patterns for comics within this publishing context? Have print comics given way to digital comics? And are digital comics the revolution they once seemed?

My research, a scoping study in its first year looking at the UK comics landscape and interviewing comics gatekeepers-mediators (CGMs)[3], seeks to address the gap in the understanding of the creation-to-consumption process for digital comics. This first year's work will be followed by research into the creative process of digital comics writers and artists, and what readers might contribute to that process. It will be the first such research to investigate cohesive patterns and production models through interdisciplinary empirical research for UK digital comics: analysing how an idea and a digital comic object is formed, communicated, discussed and transformed by all the participants involved, from authors to CGMs to readers.

References:

Benatti, Francesca (2019). ‘Superhero comics and the digital communications circuit: a case study of Strong Female Protagonist’, Journal of Graphic Novels and Comics, 10(3), pp306-319. Available at: DOI: 10.1080/21504857.2018.1485720.

Darnton, R. (1982). ‘What Is the History of Books?’, Daedalus, 111(3), pp65-83. Available at: www.jstor.org/stable/20024803. Also available at: https://dash.harvard.edu/bitstream/handle/1/3403038/darnton_historybooks.pdf

Freeman, John (2020). ‘British Comics Industry Q&A’, downthetubes.net: exploring comics and more on the web since 1998. Quoting British comics creator and archivist Lew Stringer in a 2015 assessment of newsstand comics on his Blimey! It’s Another Blog About Comics blog. Available at: https://downthetubes.net/?page_id=7110.

Hill, Susan (2017). Jacob’s Room Is Full of Books: A Year of Reading. Profile Books.

McCloud, Scott (2000). Reinventing Comics: How Imagination and Technology Are Revolutionizing an Art Form.  New York, N.Y: Paradox Press.

Murray, P.R.  and Squires, C. (2013). ‘Digital Publishing Communications Circuit’, Book 2.0, 3(1), pp3-23. Available at: DOI: https://doi.org/10.1386/btwo.3.1.3_1. See also: Stirling University, Book Unbound https://www.bookunbound.stir.ac.uk/research/.

Priego, Ernesto (2011). The Comic Book in the Age of Digital Reproduction. City, University of London. Journal contribution. https://doi.org/10.6084/m9.figshare.754575.v4.

Sabin, Roger (1993). Adult Comics: An Introduction. London: Routledge. See Part 1: Britain: 1. The first adult comics; 2. Kid's stuff; 3. Underground comix; 4. 2000AD: 'The comic of tomorrow!'; 5. Fandom and direct sales; 6. 'Comics grow up!': dawn of the graphic novel; 7. From boom to bust; 8. Viz: 'More fun than a jammy bun!'; 9. The future.


Footnotes

1. For example, the Pulitzer Prize[Maus]; The Guardian’s First Book Award 2001 [Jimmy Corrigan]; Man Booker Prize longlist [Sabrina], not to mention the Journal of Graphic Novels and Comics. The fact that graphic novels are singled out from comics here is another entire blog post… ↩︎

2. Ernesto does a nice line in comics himself: see Parables of Care. Creative Responses to Dementia Care, As Told by Carers and I Know How This Ends: Stories of Dementia Care, as well as The Lockdown Chronicles. ↩︎

3. The word ‘publisher’, at least in its traditional sense, just does not seem to apply to the various means of production and distribution. ↩︎