THE BRITISH LIBRARY

Digital scholarship blog

125 posts categorized "Experiments"

30 January 2019

Reading 35,000 Books: The UCD Contagion Project and the British Library Digital Corpus - Workshop & Roundtable


A guest post by Gerardine Meaney, Professor of Cultural Theory in the School of English, Drama and Film, and Derek Greene, Assistant Professor at the School of Computer Science, both at University College Dublin (UCD), who are organising a FREE workshop and roundtable together with the BL Labs team on 20 February 2019 at the British Library in London.

How do you set about finding specific references and thematic associations in the massive digital resource represented by the British Library Nineteenth Century Book Corpus, originally digitised through a collaboration with Microsoft?

The Contagion, Biopolitics and Cultural Memory project at UCD set out to illuminate culturally and historically specific understandings of disease and contagion that appear within the fiction in the corpus. To do so, the project team extracted over 35,000 unique English-language volumes out of a total of 65,000 and built a searchable interface of 12.3 million individual pages of text, which can be filtered and sorted using the corpus metadata (e.g. author, title, year). The interface incorporates an index of the topical catalogue of volumes used by the British Library from 1823 to 1985 (the Alston Index). Using a combination of OCR and manual annotation, we have extracted data from the top two levels of the index, covering over 98% of the English-language texts in the corpus. So, for the first time, it is possible to reliably identify and extract fiction, drama, history, topography, etc. from the corpus.
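As a rough illustration of what filtering by metadata and by Alston Index heading involves, the Python sketch below selects English-language fiction and sorts it by year. It is not the project's code: the `Volume` record, its field names, the heading string "Fiction" and the sample records are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Volume:
    # Hypothetical metadata record; field names are illustrative only.
    volume_id: str
    title: str
    year: int
    language: str
    alston_class: str  # top-level Alston Index heading (hypothetical values)


def filter_volumes(volumes: List[Volume],
                   alston_class: Optional[str] = None,
                   language: Optional[str] = None,
                   year_from: Optional[int] = None,
                   year_to: Optional[int] = None) -> List[Volume]:
    """Keep volumes that match every filter that is set, sorted by year."""
    selected = []
    for v in volumes:
        if alston_class is not None and v.alston_class != alston_class:
            continue
        if language is not None and v.language != language:
            continue
        if year_from is not None and v.year < year_from:
            continue
        if year_to is not None and v.year > year_to:
            continue
        selected.append(v)
    return sorted(selected, key=lambda v: v.year)


# Invented sample records, not real corpus entries.
corpus = [
    Volume("v1", "A Three-Volume Romance", 1862, "English", "Fiction"),
    Volume("v2", "Travels in the North", 1851, "English", "Topography"),
    Volume("v3", "A Village Tale", 1848, "English", "Fiction"),
    Volume("v4", "Un roman", 1870, "French", "Fiction"),
]

fiction = filter_volumes(corpus, alston_class="Fiction", language="English")
print([v.volume_id for v in fiction])  # ['v3', 'v1']
```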

Extracting data from 35,000 digitised books

To allow researchers to further filter the corpus to identify texts from niche topic areas, the interface supports the semi-automatic creation of word lexicons, built on modern “word embedding” natural language processing methods. By combining the resulting lexicons with existing corpus metadata and the data extracted from the digitised version of the Alston Index, researchers can efficiently create and export small topical sub-corpora for subsequent close reading.
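One common way to build a lexicon semi-automatically from word embeddings is to start from a few seed words, rank the rest of the vocabulary by cosine similarity to the seeds, and let a researcher accept or reject the suggestions. The sketch below assumes that approach; the project's actual method may differ. The three-dimensional `toy_embeddings` are made up for illustration; real embeddings (for example, word2vec vectors trained on the corpus) have hundreds of dimensions.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def suggest_terms(embeddings, seeds, top_k=3, min_sim=0.5):
    """Rank non-seed words by similarity to the mean seed vector.

    Returns (word, similarity) pairs for a researcher to accept or reject,
    which is what makes the lexicon-building semi-automatic.
    """
    seed_vecs = [embeddings[w] for w in seeds if w in embeddings]
    if not seed_vecs:
        return []
    dim = len(seed_vecs[0])
    centroid = [sum(vec[i] for vec in seed_vecs) / len(seed_vecs) for i in range(dim)]
    scored = [(word, cosine(vec, centroid))
              for word, vec in embeddings.items() if word not in seeds]
    scored = [(w, s) for w, s in scored if s >= min_sim]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Made-up 3-dimensional vectors, for illustration only.
toy_embeddings = {
    "contagion": [0.9, 0.1, 0.0],
    "fever": [0.8, 0.2, 0.1],
    "cholera": [0.85, 0.05, 0.05],
    "infection": [0.7, 0.3, 0.0],
    "sermon": [0.1, 0.9, 0.2],
    "carriage": [0.0, 0.1, 0.9],
}

suggestions = suggest_terms(toy_embeddings, ["contagion", "fever"])
print([word for word, _ in suggestions])  # ['cholera', 'infection']
```

Accepted suggestions would then be added to the seed list and the process repeated until the lexicon stabilises.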

The Contagion project team is currently using information retrieval and word embeddings to identify texts for close reading. This combination allows us to track key trends pertaining to illness and contagion in the corpus, and interpret these findings with particular reference to current and historical debates surrounding biopolitics, medical culture and migration. Clusters of associations between contagion, poverty and morality are identifiable within the corpus. However, to date our research indicates that Victorians were more worried about religious contamination from migrants and minorities than they were about contagious diseases.
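To show how a lexicon can be turned into a ranked reading list, here is a deliberately simple retrieval sketch that scores each text by the share of its tokens that appear in the lexicon. This density score is an assumption chosen for clarity; it is not necessarily the retrieval model the project uses, and the sample `docs` are invented.

```python
import re
from collections import Counter


def tokenize(text):
    # Lowercase and keep runs of letters; crude, but adequate for a sketch.
    return re.findall(r"[a-z]+", text.lower())


def lexicon_score(text, lexicon):
    """Fraction of tokens in `text` that belong to `lexicon` (0.0 for empty text)."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    counts = Counter(tokens)
    hits = sum(counts[term] for term in lexicon)
    return hits / len(tokens)


def rank_for_close_reading(docs, lexicon, top_n=2):
    """Return ids of the top_n documents by lexicon density, skipping zero scores."""
    scored = [(doc_id, lexicon_score(text, lexicon)) for doc_id, text in docs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [doc_id for doc_id, score in scored[:top_n] if score > 0]


# Invented one-sentence "volumes", for illustration only.
docs = {
    "vol_a": "The fever spread through the court; cholera followed the fever.",
    "vol_b": "A quiet tale of courtship in a country parish.",
    "vol_c": "Fear of contagion kept the family indoors.",
}
lexicon = {"fever", "cholera", "contagion", "infection"}

ranked = rank_for_close_reading(docs, lexicon)
print(ranked)  # ['vol_a', 'vol_c']
```

A fuller system would more likely weight rare terms more heavily (for example with TF-IDF or BM25) and work over whole page texts rather than single sentences.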

A key feature of the project is the intersection of methodologies and concepts from English literature, automated text mining, and medical humanities. This involves using data analytics as a mode of interpretation, not a substitute for it: a way of engaging with the extent and complexity of cultural production in the nineteenth century. Cultural data resists giving definitive yes or no answers to the questions researchers put to it, but the more cultural data we analyse, the better we can map the processes of cultural change and continuity in all their complexity. The tracking of themes, topics, and associations enabled by the new interface offers an opportunity to work with, and far beyond, the existing canon of nineteenth-century fiction, itself radically expanded by the last 20 years of scholarship. For example, the identification within the corpus of a very large collection of three-volume novels indicates that the popular novel is very well represented, while the ability to identify and extract ‘Collected Works’ indicates which writers their contemporaries expected to remain central to the tradition of fiction.

On 20 February 2019, the FREE ‘Reading 35,000 Books’ workshop and roundtable will present the project’s work to date. It will also include a discussion, by scholars of nineteenth-century literature and the British Library Labs team, of the future development and use of the new searchable interface, including exporting topical sub-corpora for further research.

The event is supported by the Irish Research Council.

 

28 January 2019

BL Labs 2018 Teaching & Learning Award Winner: 'Pocket Miscellanies'


This guest blog is by the 2018 BL Labs Teaching & Learning Award winner, Jonah Coman.

The Pocket Miscellanies were born in response to a cluster of problems posed by digitisation and access to medieval content. Medieval images are rarely seen by non-medievalists and members of the general public outside of meme-based content. Offline and analog, the medievalist has no freely available tools with which to show a non-specialist what their research is about. The digital and physical zines showcase close-reading snippets of the digitised medieval manuscripts held by the British Library and over 70 other institutions.


Figure 1. Leather binder with the first ten issues of Pocket Miscellanies. Photo © Eleanor May Baker.

Teaching and learning resource

The topics of the Pocket Miscellanies were chosen to showcase the diversity of human representation in medieval manuscripts. This project is as political as it is educational. The first ten little volumes (#1 Adam, #2 Eve, #3 Temptation by the Snake, #4 Sex, #5 Sodom, #6 Trans bodies, #7 People of colour, #8 Racism, #9 Disability and #10 Mobility aids) set up the political project of this ongoing collection, concentrating on disenfranchised communities, such as people of colour, LGBTQ people and disabled people, in medieval visual culture. To date, ten zines have been published, but the project has been expanded to include over 80 topics, to be released gradually in the future.

The Pocket Miscellanies are distributed both online and offline, as pocket-sized concertina books (usually distributed as collections), so that learners from communities outwith the most obvious user groups (researchers, teachers, educators) can access the digital content provided by national, regional and university libraries with comprehensive medieval digital collections.

Publication DIY: online and offline

From a feminist medievalist position, the zine format was the obvious choice for distributable political scholarship. Zines (short for magazines) are DIY radical publications that elide the strictures of book publishing. Zine distribution models rely on sharing via social interaction: a zine can be a reminder of a discussion or a political statement. Zines democratise knowledge that mainstream works might be afraid to tackle, or that mainstream publication systems, concerned with sales rather than radical ideas, might suppress. The small, folded formats native to zines are also reminiscent of the materiality and physical formats of medieval and early modern books created for English readers, such as the Sarum books of hours and the folding almanac.

The Pocket Miscellanies have two pathways to impact. The digital version has been shared with medievalist and historian teachers and educators via the Issuu publication platform, garnering nearly a thousand unique readers in the months it has been online. The paper copy, very small in size, can be, and has been, distributed at conferences (Bodies Ignored in Leeds, Permeable Bodies in London), at other public events (Edinburgh Pride, Glasgow and Dundee Zine Fest, Edinburgh book art and comic book conventions) and to non-specialists in casual conversation. Over 3,000 paper copies have been printed and distributed free of charge since August. Both pathways have the advantage of accessibility (they are quick-and-dirty guides for non-specialists to the most common depictions of a specific motif), as well as a history within the DIY teaching community.


Figure 2. Poster and zine display at the BL Labs Symposium, 12 Nov 2018. Photo © Ash.

The online version of the zines links to the digitised sources hosted on each library’s own website, and is easily editable and correctable. Since the initial publication of the online zines, each individual zine has been permanently undergoing improvement, thanks to its digital form, via the open loop of online feedback and consumption facilitated by Twitter and Issuu. I use crowd-sourced information about specific themes to amend the content to reflect cutting-edge scholarship in the field: information that has not yet been published and, sometimes, may never be. In this way, state-of-the-art research can be integrated into a quick publication and distribution circuit.


Figure 3. Screenshot from the Issuu.com/MxComan online library.

The paper copies are easily distributable in offline, analog spaces and provide a physical token of the learning experience. I use the zine, an independent publishing method historically widespread in queer communities, to create an analog counterpart to 'viral content'. Zines are bricolage-fuelled, cheaply printed, freely distributed and easily discarded methods of teaching and sharing information. Using this medium, I created small chapbooks that can be printed at home, mixed and shared, carried in a pocket and left in community spaces and flier racks.


Figure 4. A bundle of the ten original zines. Photo © Ana Hine.

Rip-and-mix: how copyright can be the enemy of knowledge

Working with digitised content from tens of libraries across the world has proved frustrating because of the diversity of copyright policies. Modern libraries and research centres have a lot of power as gatekeepers of historical material. Texts and images that have long been out of copyright (virtually anything produced in the Middle Ages) are nonetheless protected by many institutions under copyright terms that prohibit reproduction, especially commercial reproduction. This affects which images researchers choose to present to the wider public; most academic publications will never be able to include the number of colour illustrations that the self-published zine format allows. The collaborative and radical DIY ethics of zine-making allow Pocket Miscellanies to be a disruptive alternative to the mainstream publishing industry, bringing cutting-edge research into print (with full-colour illustrations) right now, at very small cost and at an extremely agile pace.

Copyright is where zines have historically been, and still are, so radical. Reproduction rights are different from publication rights; strict reproduction and redistribution rights are essentially violated by any dissemination of an image anywhere other than its website of origin. Attaching a ‘medieval reaction’ image to a tweet or Facebook post, or pinning it to a Pinterest board, is essentially in violation of most museums’ and auction houses’ extremely strict CC-BY-NC-ND+ rules. On the other hand, 'publication' rights are eschewed by zines since, technically, zines are not publications. Unlike magazines, journals or books, zines do not have ISBNs and cannot count towards the REF, so they are essentially outlaws in terms of publication rights. Unlike mainstream publications, zines are predicated on an anarchist, bootleg, rip-and-mix aesthetic.

The Pocket Miscellany zines posed hard choices. Do I follow the anarchic, disruptive and historically radical tradition of the zine, using any digitised image I can find, disregarding the copyright statements and challenging the hegemonic hold institutions have over historical images via aberrant legalities? Or do I create a series of zines only with images obtained through legitimate channels, accepting academic strictures for the advantage of being able to share them far and wide without breaking copyright terms? In the end, the content of the zines, showing collections of the same visual motif in a context of continuity, dictated my choice: having as many varied examples of one image as possible was more important than being able to sell these zines in bookshops and gift shops. At the same time, I chose to use only images that may be used in a non-commercial capacity, so none from libraries with ‘no derivatives’ policies. These choices (half-punk, half-tame) made selling these zines in any form and at any price point impossible, so their production relies on donations.

The Pocket Miscellanies are an ongoing project. As I mentioned, I have over 80 topics planned, and half a dozen collaborations in the works. If you would like to share your expertise on a specific topic, please get in touch via Twitter @MxComan; if you want to support the project, as well as get your hands on some paper goodies, you can do so on Patreon. If you are organising a conference and want to distribute any of the zines related to the conference, or, even better, have me deliver an impact, public engagement and zine-making workshop at your conference, get in touch and we can discuss it further.

Watch Jonah receiving the winning award for Teaching and Learning, and talking about Pocket Miscellanies on our YouTube channel (clip runs from 10.32):

Find out more about Digital Scholarship and BL Labs. If you have a project which uses British Library digital content in innovative and interesting ways, consider applying for an award this year! The 2019 BL Labs Symposium will take place on Monday 11 November at the British Library.

25 January 2019

BL Labs 2018 Artistic Award Winner: 'Another Intelligence Sings'


This guest blog is by the winners of the BL Labs Artistic Award for 2018, Robert Walker, Rose Leahy and Amanda Baum, for 'Another Intelligence Sings'.

When the natural world is recorded, it is quantised for the human ear, to wavelengths within our perception and timeframes within our conception. Yet the machine learning algorithm sits outside the human sensorium, outside the human lifespan. An algorithm is agnostic to the source, the intention and the timescale of data. By feeding it audio samples of lava and larvae, geological tensions and fleeting courtship, the seismic and the somatic, the many voices of life are woven into a song no one lifespan or life form could sing.

Another Intelligence Sings (AI Sings) is an immersive audio-tactile installation inviting you to experience the sounds of our biological world as recounted through an AI. Through the application of neural networks to field recordings from the British Library sound archive, a nonhuman reading of the data emerges. Presenting an alternative composition of Earth’s songs, AI Sings explores an expanded view of what might be perceived as intelligent.

The breadth of the British Library Wildlife and Environmental Sounds archive enabled us to take a cross section of the natural world, from primordial physical phenomena to the great beasts of the savannas to the songbirds of the British countryside. The final soundscape was created using two different neural networks, WaveNet and NSynth. We trained WaveNet, a human speech synthesis neural network developed by Google’s DeepMind, on many hours of field recordings, including those from the British Library archives.

NSynth is an augmented version of WaveNet that was built and trained by Magenta, Google AI’s creative lab. NSynth creates sounds that are not a simple crossfade or blend but something genuinely new, based on the perceived formal musical qualities of the two source sounds. We used it to create mixtures of specific audio samples: for example, sea lion meets mosquito, leopard meets horse, and mealworm meets ocean.
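The contrast between a crossfade and an NSynth mixture can be sketched in a few lines. A crossfade blends the two waveforms sample by sample; NSynth instead encodes each sound into an embedding with a WaveNet-style autoencoder, combines the embeddings, and decodes the result. In the sketch below, `toy_encode` and `toy_decode` are hypothetical stand-ins (a cube and a signed cube root) for the learned networks, and equal-weight interpolation of the embeddings is an assumption; the point is only that mixing in a nonlinear latent space does not give the same result as mixing the raw samples.

```python
import math


def crossfade(a, b, alpha=0.5):
    """Blend two waveforms sample by sample (the 'simple crossfade')."""
    return [(1 - alpha) * x + alpha * y for x, y in zip(a, b)]


def latent_mix(a, b, encode, decode, alpha=0.5):
    """Encode both sounds, interpolate their embeddings, then decode."""
    za, zb = encode(a), encode(b)
    z = [(1 - alpha) * p + alpha * q for p, q in zip(za, zb)]
    return decode(z)


def toy_encode(samples):
    # Hypothetical stand-in for a learned encoder: any nonlinear map will do.
    return [x ** 3 for x in samples]


def toy_decode(latent):
    # Inverse of toy_encode (signed cube root), standing in for a learned decoder.
    return [math.copysign(abs(z) ** (1 / 3), z) for z in latent]


a = [1.0, -1.0]  # toy "sound" A
b = [0.0, 0.0]   # toy "sound" B

print(crossfade(a, b))                           # [0.5, -0.5]
print(latent_mix(a, b, toy_encode, toy_decode))  # roughly [0.794, -0.794]
```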

Click here to play a 4-minute clip of the sound from the installation.

Through this use of the technology, AI Sings reorients the algorithm’s focus away from the human expression of individual thought and towards an amalgam of geological and biological processes. The experience aims to enable humans to meditate on the myriad intelligences around and beyond us and to expand our view of what might be perceived as intelligent. This feeds into our ongoing body of shared work, which raises questions about the use of artificial intelligence in society. Previously, we used a neural network to find linguistic patterns imperceptible to human readers in order to mediate our collectively written piece Weaving Worlds (2016). In AI Sings we continue this thread, asking which perspectives an AI can bring that human perception cannot.


AI Sings takes digital archive content and makes it into a tactile, sensuous, and playful experience. By making the archive material an experiential encounter, we were able to encourage listeners to enter into a world where they could be immersed and engaged in the data. Soft, tactile materials such as hair and foam invited people to enter into and interact with the work. In particular, we found that the playful nature of the materials in the piece meant that children were keen to experience the work, and listen to the soundscape, thereby extending the audience of the archive material to one it may not usually reach.

By addressing the need for experiential, visceral and poetic encounters with AI, Another Intelligence Sings goes beyond the conceptual and engages people in the technology which is so rapidly transforming society. We hope this work shows how the creative application of AI opens up new possibilities in the field of archivology, from being a tool of categorisation to becoming a means of expanding the cultural role of the library in the future.

The piece premiered at the V&A Digital Design Weekend 2018 on 22 September as part of the London Design Festival, where it was exhibited to over 22,000 visitors. Following the weekend, we were invited by Open Cell, London’s newly opened bioart and biodesign studio and exhibition space, to showcase the piece on their site.

More about the project can be found on our websites:

www.baumleahy.com + www.irr.co + www.amandabaum.com + www.roseleahy.com

Watch the AI Sings team receiving their award and talking about their project on our YouTube channel (clip runs from 8.18):

 


24 January 2019

Innovation Fellow for Interactive Fiction in the Emerging Formats Project


There’s an episode of the bookshop-based comedy Black Books in which Fran, played by Tamsin Greig, starts a new job. She has no idea what her role actually consists of, and yet, somehow, she becomes good at it and delivers a rousing presentation, all without ever fully understanding what she has done. Every new research project feels somewhat like this. There are usually continuities from previous projects, but because each one is new there will inevitably be things you don’t know, and how do you find out what you don’t know if you don’t know it?

Fortunately, thanks to the Library’s excellent Web Archiving, Contemporary British and Digital Scholarship teams, I’ve managed to fill in most of those blanks pretty quickly. My name’s Lynda Clark and I’m currently undertaking an AHRC/M3C Innovation Placement, working on a six-month research project called ‘Emerging Formats: Discovering and Collecting Contemporary British Interactive Fiction’. My primary goals are to get a sense of the ‘shape’ of contemporary British web-based interactive fiction (the kinds of tools British creators are using and the works they are making with them) and to explore how those works might be preserved for future readers and researchers.

Boxes created at an Emerging Formats project workshop hosted by the British Library

I’m a maker of interactive fiction myself and have produced a variety of works, often silly (almost always silly, in fact) but sometimes more serious, the most substantial of which was my interactive novella Writers Are Not Strangers, produced as part of my recently submitted creative-critical PhD thesis. Even amongst my own modest back catalogue there is a fair amount of variation in styles, interfaces and tools used, some of which I know will likely scupper the web crawlers commonly used to archive web-based digital work. Six months isn’t long to find a solution to this challenge, but I’m hoping I can at the very least start to create a record of works to preserve, and categorically determine what doesn’t work, so that future researchers can move towards what does.
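Why might some interactive fiction defeat a web crawler? The sketch below assumes a simple crawler that discovers pages only by following `href` attributes in the HTML it downloads. A story written as linked static pages exposes its passages; one whose passages are rendered by JavaScript at runtime exposes none. Both page fragments, and the `renderPassage` function they mention, are hypothetical, and real archiving crawlers use more sophisticated link extraction than this.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects href values from <a> tags, as a simple static crawler would."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def static_links(html):
    collector = LinkCollector()
    collector.feed(html)
    return collector.links


# Hypothetical story built from linked static pages.
static_story = (
    '<p>You wake up.</p>'
    '<a href="door.html">Open the door</a> '
    '<a href="window.html">Look out of the window</a>'
)

# Hypothetical story whose passages are drawn by a script at runtime.
scripted_story = (
    '<div id="story"></div>'
    '<script>renderPassage("start");</script>'
    '<a onclick="renderPassage(\'door\')">Open the door</a>'
)

print(static_links(static_story))    # ['door.html', 'window.html']
print(static_links(scripted_story))  # []
```

In the scripted case the choices exist only as JavaScript calls, so a crawler of this kind would archive the first screen and nothing more.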

This is where you come in. If you’re a UK-based creator of web-based interactive fiction, please nominate your work for inclusion in the UK Web Archive, where it could (technology permitting) be included in a collection. This will mean the system takes regular ‘snapshots’ of the nominated website and stores them forever! You can make your nominations via the UKWA’s site or by contacting me.

This post is by the Library's Innovation Fellow for Interactive Fiction Lynda Clark, on twitter as @Notagoth. You can find out more about the Library's Emerging Formats project here.

21 January 2019

Can you help us with user experience testing for books designed for mobile devices?


On 7 and 8 February, we’ll be running some user testing sessions to help us understand how people might want to use new types of digital publications in our collections. We are interested in understanding what the user needs are for books published as apps or written specifically for use on mobile devices. If you use such publications in your work, or for reading for pleasure, we’d really like to hear from you.

We are carrying out these user testing sessions at the British Library in London and at the library of Trinity College Dublin on Thursday 7 and Friday 8 February. If you are interested in taking part, please follow the relevant link and complete the short form to sign up:


The British Library and the other five UK Legal Deposit Libraries have been collecting various types of born-digital publications since 2013, as outlined in The Legal Deposit Libraries (Non-Print Works) Regulations. These publications have mainly comprised eBooks and eJournals, as well as archived UK websites. Over the past couple of years, we have set up the “Emerging Formats” project to investigate new forms of digital publications whose structure and interactive features are more complex and pose new challenges for collection and preservation.

The Emerging Formats project focuses on three formats: eBook mobile apps, web-based interactive narratives, and structured data.

EBook apps are digital books published as mobile apps, incorporating storytelling into the interactive functionality of mobile technology. They often rely heavily on the specific hardware and software they were created for, strengthening the relationship between content and device. They cover many genres, from poetry and academic writing to cookery and children’s fiction. They are often compared to games, as they require a significant level of interaction and reader engagement for the story to progress. Inkle’s 80 Days, Faber & Faber’s T.S. Eliot’s The Waste Land and Nosy Crow’s Goldilocks and Little Bear are all examples of eBook mobile apps.

Interactive narratives are text-based stories which rely on the reader to make decisions that determine how the narrative unfolds. While sharing interactive features with eBook mobile apps (as well as a dependency on device functionalities, such as cameras and location tracking), this format is web-based and not packaged as an app. The genres of writing are again quite varied, although fiction seems to lend itself well to this particular format. Editions at Play, a collaboration between Visual Editions and Google’s Creative Lab, has published a number of interactive narratives, ranging from a ghost story personalised to the reader’s surroundings (Breathe) to a Google Street View-based love story (Entrances & Exits).

The British Library held a workshop last November for internal teams as well as external stakeholders to better understand what content has been created, why it is complex and what the challenges are in preserving these complex digital objects.

The next step for the Emerging Formats project is to understand users’ expectations and requirements when accessing these types of publication. To achieve this, we are running onsite user experience testing with the help of an external agency, Bunnyfoot Ltd.

We are looking for participants who have some familiarity with using mobile devices to read and interact with eBooks created for mobile platforms and/or web-based interactive narratives. However, there’s no requirement that they are “expert users” of any kind. We’d like to include people who use these types of publication in their research (e.g. Digital Humanities, experimental literature, Human Computer Interaction, Digital Media, education specialists etc); people who create publications of this type as part of their practice; as well as people who read these types of publications for pleasure.

We have two days of testing booked in, for Thursday 7th and Friday 8th February, at the British Library in London and at Trinity College in Dublin. The sessions will last for about 1 hour, and Bunnyfoot will offer a £50 incentive for anyone taking part.

If you are interested in taking part in our user experience testing, please complete the brief screening survey linked at the beginning of the post (make sure you select the one relevant to your location).

To find out further information about the Emerging Formats project, please see our project page.

This post is by Ian Cooke, Head of Contemporary British Publications, on twitter as @IanCooke13 and Giulia Carla Rossi, Curator of Digital Publications on twitter as @giugimonogatari.

17 January 2019

BL Labs 2018 Research Award Winner: 'The Delius Catalogue of Works'


This guest blog is by the winners of the BL Labs Research Award for 2018: Joanna Bullivant and Daniel M. Grimley of the Faculty of Music, University of Oxford; and David Lewis and Kevin Page of the University of Oxford e-Research Centre.

The Delius Catalogue of Works is a new, freely accessible digital catalogue of the complete works of Frederick Delius (1862-1934).

Explore more here: https://delius.music.ox.ac.uk

The Delius Catalogue (DCW) was created as part of a project called ‘Delius, Modernism, and the Sound of Place’ (https://deliusmodernism.wordpress.com), a collaboration between the University of Oxford, the British Library, and the Royal Library, Denmark, which was funded by the Arts and Humanities Research Council. The project as a whole sought to better understand Delius and his music. Delius has been understood as an English portraitist, someone who wrote impressionistic works depicting natural scenes, whose music was strongly linked to the English landscape, and who had little interest in large-scale musical construction or in the details of performance.

However, Delius also lived and wrote music all over the world (in Scandinavia, Florida, Germany and France), and was the friend of many important modern artists, writers and musicians including Edvard Grieg, Edvard Munch, August Strindberg and Paul Gauguin. He also left behind very substantial sketches and other manuscripts that help us to understand his music, the vast majority of which are in the British Library.

Within the project, our aim in creating the DCW was to make a clear and up-to-date catalogue of Delius’s works which was both of a high scholarly standard and accessible to a variety of users (such as scholars, performers and students). We also wanted to integrate the catalogue as far as possible with the British Library’s own manuscript catalogue, to showcase the Library’s Delius collections and enable users of the catalogue to understand and have access to the physical manuscripts. This was a challenge both in terms of research (collecting and presenting information in a clear and concise manner) and web design (presenting it in the best possible manner).

Creating the catalogue was greatly helped by the decision to use MerMEId (Metadata Editor and Repository for MEI Data), specialist software created by Axel Teich Geertinger and his team at the Royal Library, Denmark, originally for creating a catalogue of the works of Carl Nielsen (http://www.kb.dk/dcm/cnw/navigation.xq). MerMEId is built on an eXist XML database with Lucene-based searching, and most of its functionality is implemented using XQuery and XSLT.

The core catalogue data is stored as MEI, an XML-based standard for the encoding and markup of musical data, inspired by TEI for text. MerMEId’s combination of open-source, standards-based technologies gave great flexibility to customise both the data model and the user interface to suit the application. In the DCW, we adapted genre categories, improved site accessibility, and adapted things like instrumental abbreviations and references to Delius reference works for our purposes. We also adapted the conceptual cataloguing model FRBR (https://www.ifla.org/files/assets/cataloguing/frbr/frbr_2008.pdf) in order to create records for each work that were narrative and hierarchical.
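For readers unfamiliar with MEI, the Python sketch below parses a minimal, MEI-style work record with the standard library's `xml.etree.ElementTree`. The namespace URI is MEI's; the element structure is simplified and has not been validated against any particular version of the MEI schema or against MerMEId's own files. Within MerMEId itself, queries like this would run as XQuery against the eXist database rather than in Python.

```python
import xml.etree.ElementTree as ET

MEI_NS = {"mei": "http://www.music-encoding.org/ns/mei"}

# Simplified, illustrative record; not copied from the Delius Catalogue.
record = """<mei xmlns="http://www.music-encoding.org/ns/mei">
  <meiHead>
    <workDesc>
      <work>
        <titleStmt>
          <title>Brigg Fair</title>
          <respStmt>
            <persName role="composer">Frederick Delius</persName>
          </respStmt>
        </titleStmt>
      </work>
    </workDesc>
  </meiHead>
</mei>"""


def summarise_work(xml_text):
    """Pull the work title and composer out of a simplified MEI-style record."""
    root = ET.fromstring(xml_text)
    work = root.find(".//mei:work", MEI_NS)
    title = work.find("mei:titleStmt/mei:title", MEI_NS).text
    composer = work.find(".//mei:persName[@role='composer']", MEI_NS).text
    return {"title": title, "composer": composer}


print(summarise_work(record))  # {'title': 'Brigg Fair', 'composer': 'Frederick Delius'}
```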

In the case of a work with a straightforward history like Brigg Fair, this meant adopting a standard presentational format in which the catalogue gave catalogue numbers, dedicatee, date of composition, a short introduction, duration, instrumentation, a musical incipit, and information in chronological order on manuscript sources, performance history and documents such as letters or bibliographic items:


See: https://delius.music.ox.ac.uk/catalogue/document.html?doc=delius_briggfair.xml

In a work with a more complicated history like the Piano Concerto, however, the model may be adapted to show a long compositional process and multiple versions:


See: https://delius.music.ox.ac.uk/catalogue/document.html?doc=delius_pianoconc.xml

By creating multiple “versions” of the work in MerMEId to reflect its journey through different stages of composition, and by noting extant manuscript and print sources and performances in each case, we can clearly and consistently both narrate the story of each work and show how existing sources and versions fit into it.
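The version-by-version narrative described above maps naturally onto a small hierarchical data model. In this sketch, a `Work` holds several `Version`s (loosely analogous to FRBR expressions), each with dated events such as manuscript sources, prints and performances, which are then listed chronologically. The class names, fields and the `Example Concerto` data are hypothetical and do not reproduce MerMEId's MEI model.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Event:
    year: int
    kind: str          # e.g. "manuscript", "print" or "performance"
    description: str


@dataclass
class Version:
    label: str         # roughly a FRBR "expression" of the work
    events: List[Event] = field(default_factory=list)


@dataclass
class Work:
    title: str
    versions: List[Version] = field(default_factory=list)


def narrate(work: Work) -> List[str]:
    """List each version's events in chronological order, version by version."""
    lines = []
    for version in work.versions:
        lines.append(f"{work.title}: {version.label}")
        for event in sorted(version.events, key=lambda e: e.year):
            lines.append(f"  {event.year} {event.kind}: {event.description}")
    return lines


# Hypothetical example data, not drawn from the Delius Catalogue.
example = Work("Example Concerto", [
    Version("Version 1", [
        Event(1901, "performance", "first performance (hypothetical)"),
        Event(1900, "manuscript", "autograph draft (hypothetical)"),
    ]),
    Version("Version 2", [
        Event(1905, "print", "first edition (hypothetical)"),
    ]),
])

for line in narrate(example):
    print(line)
```

Keeping sources and performances attached to the version they belong to is what lets a single record tell the story of a work with a complicated history.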

The data available in the British Library Archives and Manuscripts catalogue was essential for creating the Delius catalogue. At the ‘Sources’ level of each catalogue record, users can link directly to the manuscript and thus see how to access the physical manuscript, and how extant manuscripts relate to the history of each work, as in the Caprice and Elegy:


See: https://delius.music.ox.ac.uk/catalogue/document.html?doc=delius_capriceelegy.xml

As well as fostering understanding of Delius’s works and their connection to the British Library’s outstanding manuscript collections, this project has led to exciting ongoing work. A subsequent project involving the same team digitised some of the British Library’s Delius manuscripts and other materials, and created a variety of articles, teaching resources and other metadata to showcase them. These are now part of the Library’s new online learning resource Discovering Music: https://www.bl.uk/20th-century-music.

We intend to expand our work to other composers, continuing to explore ways to make their music and manuscripts more accessible to a wide variety of people.

Watch Joanna Bullivant and David Lewis receiving their award on behalf of their team, and talking about their project on our YouTube channel (clip runs from 10.36):

Find out more about Digital Scholarship and BL Labs. If you have a project which uses British Library digital content in innovative and interesting ways, consider applying for an award this year! The 2019 BL Labs Symposium will take place on Monday 11 November at the British Library.

15 January 2019

The BL Labs Symposium, 2018


On Monday 12th November, 2018, the British Library hosted the sixth annual BL Labs Symposium, celebrating all things digital at the BL. This was our biggest symposium yet, with the conference centre at full capacity: proof, if any were needed, of the importance of using British Library digital collections and technologies for innovative projects in the heritage sector.

The delegates were welcomed by our Chief Executive, Roly Keating, and there followed a brilliant keynote by Daniel Pett, Head of Digital and IT at the Fitzwilliam Museum, Cambridge. In his talk, Dan reflected on his 3D modelling projects at the British Museum and the Fitzwilliam, and talked about the importance of experimenting with, re-imagining, and re-mixing cultural heritage digital collections in Galleries, Libraries, Archives and Museums (GLAMs).

This year’s symposium had a strong focus on 3D, with a series of fascinating talks and demonstrations throughout the day by visual artists, digital curators, and pioneers of 3D photogrammetry and data visualisation technologies. The full programme is still viewable on the Eventbrite page, and videos and slides of the presentations will be uploaded in due course.


Each year, BL Labs recognises excellent work that has used the Library's digital content in five categories. The 2018 winners, runners up and honourable mentions were announced at the symposium and presented with their awards throughout the day. This year’s Award recipients were:

Research Award:

Winner: The Delius Catalogue of Works by Joanna Bullivant, Daniel Grimley, David Lewis and Kevin Page at the University of Oxford

Honourable Mention: Doctoral theses as alternative forms of knowledge: Surfacing ‘Southern’ perspectives on student engagement with internationalisation by Catherine Montgomery and a team of researchers at the University of Bath

Honourable Mention: HerStories: Sites of Suffragette Protest and Sabotage by Krista Cowman at the University of Lincoln and Rachel Williams, Tamsin Silvey, Ben Ellwood and Rosie Ryder of Historic England

Artistic Award:

Winner: Another Intelligence Sings by Amanda Baum, Rose Leahy and Rob Walker

Runner Up: Nomad by independent researcher Abira Hussein, and Sophie Dixon and Edward Silverton of Mnemoscene

Teaching & Learning Award:

Winner: Pocket Miscellanies by Jonah Coman

Runner Up: Pocahontas and After by Michael Walling, Lucy Dunkerley and John Cobb of Border Crossings

Commercial Award:

Winner: The Library Collection: Fashion Presentation at London Fashion Week, SS19 by Nabil Nayal in association with Colette Taylor of Vega Associates

Runner Up: The Seder Oneg Shabbos Bentsher by David Zvi Kalman, Print-O-Craft Press

Staff Award:

Winner: The Polonsky Foundation England and France Project: Manuscripts from the British Library and the Bibliothèque nationale de France, 700-1200 by Tuija Ainonen, Clarck Drieshen, Cristian Ispir, Alison Ray and Kate Thomas

Runner Up: The Digital Documents Harvesting and Processing Tool by Andrew Jackson, Sally Halper, Jennie Grimshaw and Nicola Bingham

The judging process is always difficult, as the projects up for consideration are so diverse! We would also like to thank all the other entrants for their high-quality submissions, and we encourage anyone considering applying for a 2019 award to do so!

We will be posting guest blogs by the award recipients over the coming months, so tune in to read more about their projects.

And finally, save the date for this year's symposium, which will be held at the British Library on Monday 11th November, 2019.

07 December 2018

Introducing an experimental format for learning about content mining for digital scholarship


This post by the British Library’s Digital Curator for Western Heritage Collections, Dr Mia Ridge, reports on an experimental format designed to provide more flexible and timely training on fast-moving topics like text and data mining.

This post covers two topics – firstly, an update to the established format of sessions on our Digital Scholarship Training Programme (DSTP) to introduce ‘strands’ of related modules that cumulatively make up a ‘course’, and secondly, an overview of subjects we’ve covered related to content mining for digital scholarship with cultural heritage collections.

Introducing ‘strands’

The Digital Research team have been running the DSTP for some years now. It’s been very successful, but we know that it's hard for people to get away for a whole day, so we wanted to break courses that previously took up 5 or 6 hours of a day into smaller modules. Shorter sessions (talks or hands-on workshops) lasting an hour, or at most two, seemed to fit more easily into busy diaries. We can also reach more people with talks than with hands-on workshops, which are limited by the number of training laptops and the need to offer more individual support.

A 'strand' is a new, flexible format for learning and maintaining skills, with training delivered through shorter modules that combine to build attendees’ knowledge of a particular topic over time. We can repeat individual modules: for example, a shorter ‘Introduction to’ session might run more often, while more advanced sessions can target people with some existing knowledge. I haven’t formally evaluated it, but I suspect that the ability to pick and choose sessions means that attendees at each module are more engaged, which makes for a better session for everyone. We've seen a lot of uptake; in some cases the 40 or so available places go almost immediately, so offering shorter sessions seems to be working.

Designing courses as individual modules makes it easier to update individual sections as technologies and platforms change. This format has several other advantages: staff find it easier to attend hour-long modules, and they can try out methods on their own collections between sessions. It takes time for attendees to collect and prepare their own data for processing with digital methods (not to mention the extra preparation time and complexity for the instructor), so we've avoided using attendees' own data in traditional day-long workshops; the gaps between modules give them time to do this.

New topics can be introduced on a 'just in time' basis as new tools and techniques emerge. This seemed to address many of the issues I was having in putting together a new course on content mining. It also makes it easier to tackle a new subject than with the established 5-6 hour format, as I can pilot short sessions and use the lessons learnt in planning the next module.

The modular format also means we can invite international experts and collaborators to give talks on their specialisms with relatively low organisational overhead, as we already run regular ‘21st Century Curatorship’ talks for staff. We can also link relevant staff talks, or our monthly ‘Hack and Yack’ and Digital Scholarship Reading Group sessions, to specific strands.

We originally planned to start each strand with an introductory module outlining key concepts and terms, but in practice we dived straight into the first strand, as we already had suitable talks lined up.

Content mining for digital scholarship with cultural heritage collections

Tom and Nora trying out AntConc

From the course blurb: ‘Content mining (sometimes ‘text and data mining’) is a form of computational processing that uses automated analytical techniques to analyse text, images, audio-visual material, metadata and other forms of data for patterns, trends and other useful information. Content mining methods have been applied to digitised and digital historic, cultural and scientific collections to help scholars answer new research questions at scale, analysing hundreds or hundreds of thousands of items. In addition to supporting new forms of digital scholarship that apply content mining methods, methods like Named Entity Recognition or Topic Modelling can make collection items more discoverable. Content mining in cultural heritage draws on data science, 'distant reading' and other techniques to categorise items; identify concepts and entities such as people, places and events; apply sentiment analysis and analyse items at scale.’

An easily updatable mixture of introductory talks, tutorial sessions, hands-on workshops and case studies from external experts fits perfectly into the modular format, and it's worked out well, with a range of topics and formats offered so far. Sessions have included:

An Introduction to Machine Learning

Computational models for detecting semantic change in historical texts (Dr Barbara McGillivray, Alan Turing Institute)

Computer vision tools with Dr Giles Bergel, from the University of Oxford's Visual Geometry Group

Jupyter Notebooks/Python for simple processing and visualisations of data from In the Spotlight

Listening to the Crowd: Data Science to Understand the British Museum's Visitors (Taha Yasseri, Turing/OII)

Visualising cultural heritage collections (Olivia Fletcher Vane, Royal College of Art)

An Introduction to Corpus Linguistics for the Humanities (Ruth Byrne, BL and Lancaster PhD student)

Corpus Analysis with AntConc

What’s next?

My colleagues Nora McGregor, Stella Wisdom and Adi Keinan-Schoonbaert have some great ‘strands’ planned for the future, including Stella’s on ‘Emerging Formats’ and Adi’s on ‘Place’, so watch this space for updates!