Digital scholarship blog

03 October 2019

BL Labs Symposium (2019): Book your place for Mon 11-Nov-2019

Posted by Mahendra Mahey, Manager of BL Labs

The BL Labs team are pleased to announce that the seventh annual British Library Labs Symposium will be held on Monday 11 November 2019, from 9:30 - 17:00* (see note below) in the British Library Knowledge Centre, St Pancras. The event is FREE, and you must book a ticket in advance to reserve your place. Last year's event was the largest we have ever held, so please don't miss out and book early!

*Please note that directly after the Symposium, we have teamed up with an interactive/immersive theatre company called 'Uninvited Guests' for a specially organised early evening event for Symposium attendees (the full cost is £13, with some concessions available). Read more at the bottom of this post!

The Symposium showcases innovative and inspiring projects which have used the British Library's digital content. Last year's Award winners drew attention to artistic, research, teaching & learning, and commercial activities that used our digital collections.

The annual event provides a platform for the development of ideas and projects, facilitating collaboration, networking and debate in the Digital Scholarship field, as well as focusing on the creative reuse of the British Library's and other organisations' digital collections and data across many other sectors. Read what groups of Master's Library and Information Science students from City University London (#CityLIS) said about the Symposium last year.

We are very proud to announce that this year's keynote will be delivered by scientist Armand Leroi, Professor of Evolutionary Biology at Imperial College London.

Professor Armand Leroi of Imperial College London will be giving the keynote at this year's BL Labs Symposium (2019)

Professor Armand Leroi is an author, broadcaster and evolutionary biologist.

He has written and presented several documentary series for Channel 4 and BBC Four. His latest documentary, The Secret Science of Pop (BBC Four, 2017), presented the results of an analysis of over 17,000 Western pop songs from the US Billboard Top 100 charts between 1960 and 2010, carried out with colleagues from Queen Mary University of London, with further work published through the Royal Society. Armand has a special interest in how we can apply techniques from evolutionary biology to ask important questions about culture, the humanities and what is unique about us as humans.

Previously, Armand presented Human Mutants, a three-part documentary series about human deformity for Channel 4, accompanied by an award-winning book, Mutants: On Genetic Variety and the Human Body. He also wrote and presented a two-part series, What Makes Us Human, also for Channel 4. On BBC Four, Armand presented the documentaries What Darwin Didn't Know and Aristotle's Lagoon, also releasing the book The Lagoon: How Aristotle Invented Science, which looks at Aristotle's impact on science as we know it today.

Armand's keynote will reflect on his interest and experience in applying techniques from evolutionary biology, such as bioinformatics, data mining and machine learning, to ask meaningful 'big' questions about culture, the humanities and what makes us human.

The title of his talk will be 'The New Science of Culture'. Armand will follow in the footsteps of previous prestigious BL Labs keynote speakers: Dan Pett (2018); Josie Fraser (2017); Melissa Terras (2016); David De Roure and George Oates (2015); Tim Hitchcock (2014); and Bill Thompson and Andrew Prescott (2013).

The Symposium will be introduced by the British Library's new Chief Librarian, Liz Jolly. The day will include an update and exciting news from Mahendra Mahey (BL Labs Manager at the British Library) about the work of BL Labs, highlighting innovative collaborations, including how it is working with Labs around the world to share experiences, knowledge and lessons learned. There will be news from the Digital Scholarship team about the exciting projects they have been working on, such as Living with Machines and other initiatives, together with a special insight from the British Library's Digital Preservation team into how they preserve our digital collections and data for future generations.

Throughout the day, there will be several announcements and presentations showcasing work from projects nominated for the BL Labs Awards 2019, which recognise work that has used the British Library's digital content in artistic, research, teaching & learning, and commercial activities.

There will also be a chance to find out who has been nominated and recognised for the British Library Labs Staff Award 2019, which highlights the work of an outstanding individual (or team) at the British Library who has worked creatively and originally with the Library's digital collections and data (nominations close at midday on 5 November 2019).

As is our tradition, the Symposium will have plenty of opportunities for networking throughout the day, culminating in a reception for delegates and British Library staff to mingle and chat over a drink and nibbles.

Finally, we have teamed up with the interactive/immersive theatre company 'Uninvited Guests', who will give a specially organised performance for BL Labs Symposium attendees directly after the Symposium. This participatory performance will take the audience on a journey through a world on the cusp of a technological disaster, in which our period of history could vanish forever from human memory because digital information has been wiped out for good. How can we leave a trace of our existence for those born later? Don't miss the chance to book this unique event at 5pm, specially organised to coincide with the end of the BL Labs Symposium. For more information, and for booking (spaces are limited), please visit here (the full cost is £13, with some concessions available). Please note, if you are unable to join the 5pm show, there will be another performance at 19:45 the same evening (book here for that one).

So don't forget to book your place for the Symposium today, as we predict another full house and we don't want you to miss out.

We look forward to seeing new faces and meeting old friends again!

For any further information, please contact labs@bl.uk

02 October 2019

The 2019 British Library Labs Staff Award - Nominations Open!

Looking for entries now!

A set of four light bulbs side by side, with the third switched on: a visual metaphor for an 'idea'.
Nominate a British Library staff member or a team that has done something exciting, innovative and cool with the British Library’s digital collections or data.

The 2019 British Library Labs Staff Award, now in its fourth year, gives recognition to current British Library staff who have created something brilliant using the Library’s digital collections or data.

Perhaps you know of a project that developed new forms of knowledge, or an activity that delivered commercial value to the library. Did the person or team create an artistic work that inspired, stimulated, amazed and provoked? Do you know of a project developed by the Library where quality learning experiences were generated using the Library’s digital content? 

You may nominate a current member of British Library staff, a team, or yourself (if you are a member of staff), for the Staff Award using this form.

The deadline for submission is 12:00 (BST), Tuesday 5 November 2019.

Nominees will be highlighted on Monday 11 November 2019 at the British Library Labs Annual Symposium where some (winners and runners-up) will also be asked to talk about their projects.

You can see the projects submitted by members of staff for the last two years' awards in our online archive, as well as blogs for last year's winners and runners-up.

The Staff Award complements the British Library Labs Awards, introduced in 2015, which recognise outstanding work in the broader community. Last year's Staff Award winner recognised the brilliant work of the 'Polonsky Foundation England and France Project: Digitising and Presenting Manuscripts from the British Library and the Bibliothèque nationale de France, 700–1200'.

The runner-up for the BL Labs Staff Award last year was the 'Digital Documents Harvesting and Processing Tool (DDHAPT)', which was designed to overcome the problem of finding individual known documents in the United Kingdom's Legal Deposit Web Archive.

In the public competition, last year's winners drew attention to artistic, research, teaching & learning, and commercial activities that used our digital collections.

British Library Labs is a project within the Digital Scholarship department at the British Library that supports and inspires the use of the Library's digital collections and data in exciting and innovative ways. It was previously funded by the Andrew W. Mellon Foundation and is now solely funded by the British Library.

If you have any questions, please contact us at labs@bl.uk.

 

14 September 2019

BL Labs Awards 2019: enter before 21:00 on Sunday 29th September! (deadline extended)

We have extended the deadline for the BL Labs Awards to 21:00 (BST) on Sunday 29th September; submit your entry here. If you have already entered, you don't need to resubmit; however, we are happy to receive updated entries too.

The BL Labs Awards formally recognise outstanding and innovative work that has been created using the British Library's digital collections and data.

Submit your entry, and help us spread the word to all interested parties!

This year, BL Labs is commending work in four key areas:

  • Research - A project or activity that shows the development of new knowledge, research methods, or tools.
  • Commercial - An activity that delivers or develops commercial value in the context of new products, tools, or services that build on, incorporate, or enhance the Library's digital content.
  • Artistic - An artistic or creative endeavour that inspires, stimulates, amazes and provokes.
  • Teaching / Learning - Quality learning experiences created for learners of any age and ability that use the Library's digital content.

After the submission deadline of 21:00 (BST) on Sunday 29th September has passed, the entries will be shortlisted. Shortlisted entrants will be notified via email by midnight (BST) on Thursday 10th October 2019.

A prize of £500 will be awarded to the winner and £100 to the runner-up in each Award category at the BL Labs Symposium on 11th November 2019 at the British Library, St Pancras, London.

The talent of the BL Labs Awards winners and runners-up over the last four years has produced a remarkable and varied collection of innovative projects. In 2018, the Awards commended work in four main categories – Research, Artistic, Commercial and Teaching & Learning:

  • Research category Award (2018) winner: The Delius Catalogue of Works: the production of a comprehensive catalogue of works by the composer Delius, based on research using (and integrated with) the BL’s Archives and Manuscripts Catalogue by Joanna Bullivant, Daniel Grimley, David Lewis and Kevin Page from Oxford University’s Music department.
  • Artistic Award (2018) winner: Another Intelligence Sings (AI Sings): an interactive, immersive sound-art installation which uses AI to transform environmental sound recordings from the BL's sound archive, by Amanda Baum, Rose Leahy and Rob Walker, independent artists and experience designers.
  • Commercial Award (2018) winner: Fashion presentation for London Fashion Week by Nabil Nayal: the Library collection - a fashion collection inspired by digitised Elizabethan-era manuscripts from the BL, culminating in several fashion shows/events/commissions including one at the BL in London.
  • Teaching and Learning (2018) winner: Pocket Miscellanies: ten online pocket-book ‘zines’ featuring images taken from the BL digitised medieval manuscripts collection by Jonah Coman, PhD student at Glasgow School of Art.

For further information about BL Labs or our Awards, please contact us at labs@bl.uk.

Posted by Mahendra Mahey, Manager of British Library Labs.

13 September 2019

Results of the RASM2019 Competition on Recognition of Historical Arabic Scientific Manuscripts

This blog post is by Dr Adi Keinan-Schoonbaert, Digital Curator for Asian and African Collections, British Library. She's on Twitter as @BL_AdiKS.

 

Earlier this year, the British Library, in collaboration with PRImA Research Lab and the Alan Turing Institute, launched a competition on the Recognition of Historical Arabic Scientific Manuscripts, or RASM2019 for short. The competition was held in the context of the 15th International Conference on Document Analysis and Recognition (ICDAR2019), and was the second of its type, following RASM2018.

The Library has an extensive collection of Arabic manuscripts, comprising almost 15,000 works. We have been digitising several hundred manuscripts as part of the British Library/Qatar Foundation Partnership, making them available on the Qatar Digital Library. A natural next step is the creation of machine-readable content from the scanned images, enabling enhanced search and whole new avenues of research.

Running a competition helps us identify software providers and tool developers, as well as introducing us to the specific challenges that pattern recognition systems face when dealing with historic, handwritten materials. For this year's competition we provided a ground truth set of 120 images and associated XML files: 20 pages to train text recognition systems to automatically identify Arabic script, and 100 pages to evaluate that training.
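For anyone planning to work with ground truth of this kind, here is a minimal sketch of reading text regions from a PAGE XML file, PRImA's usual ground-truth format. The namespace URI and file name are assumptions; check the xmlns attribute on the PcGts root element of the actual files.

```python
# A minimal sketch: list the text regions and transcriptions in one
# PAGE XML ground-truth file. Namespace and file name are assumptions.
import xml.etree.ElementTree as ET

NS = {"pc": "http://schema.primaresearch.org/PAGE/gts/pagecontent/2017-07-15"}

tree = ET.parse("ground_truth_page_001.xml")  # hypothetical file name
for region in tree.getroot().iter(f"{{{NS['pc']}}}TextRegion"):
    coords = region.find("pc:Coords", NS)                # polygon outline
    text = region.find("pc:TextEquiv/pc:Unicode", NS)    # transcription
    print(region.get("id"),
          coords.get("points") if coords is not None else "",
          (text.text or "")[:40] if text is not None else "")
```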

Aside from providing larger training and evaluation sets, for this year’s competition we’ve added an extra challenge – marginalia. Notes written in the margins are often less consistent and less coherent than main blocks of text, and can go in different directions. The competition set out three different challenges: page segmentation, text line detection and Optical Character Recognition (OCR). Tackling marginalia was a bonus challenge!

We had just one submission for this year’s competition – RDI Company, Cairo University, who previously participated in 2018 and did very well. RDI submitted three different methods, and participated in two challenges: text line segmentation and OCR. When evaluating the results, PRImA compared established systems used in industry and academia – Tesseract 4.0, ABBYY FineReader Engine 12 (FRE12), and Google Cloud Vision API – to RDI’s submitted methods. The evaluation approach was the same as last year’s, with PRImA evaluating page analysis and recognition methods using different evaluation metrics, in order to gain an insight into the algorithms.
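PRImA's published layout-evaluation metrics are considerably more sophisticated, weighting different error types, but a simple intersection-over-union comparison of ground-truth and detected region polygons conveys the basic idea. This toy sketch uses the shapely library and made-up coordinates:

```python
# Toy region-overlap score: 1.0 means a perfect match, 0.0 no overlap.
# This is an illustration only, not PRImA's actual evaluation metric.
from shapely.geometry import Polygon

def iou(gt_points, detected_points):
    gt, detected = Polygon(gt_points), Polygon(detected_points)
    union = gt.union(detected).area
    return gt.intersection(detected).area / union if union else 0.0

gt_region = [(100, 100), (900, 100), (900, 1200), (100, 1200)]
detected  = [(120, 130), (880, 130), (880, 1150), (120, 1150)]
print(f"IoU: {iou(gt_region, detected):.2f}")  # ~0.88 for this close match
```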

 

Results

Challenge 1 - Page Layout Analysis

The first challenge set out to identify regions on a page, finding where blocks of text are located. RDI did not participate in this challenge, so the analysis covered only the established industry systems mentioned above. The results can be seen in the chart below:

Chart showing RASM2019 page segmentation results

 

Google did relatively well here, and the results are quite similar to last year’s. Despite dealing with the more challenging marginalia text, Google’s previous accuracy score (70.6%) has gone down only very slightly to a still impressive 69.3%.

Example image showing Google's page segmentation

 

Tesseract 4 and FRE12 scored very similarly, with Tesseract decreasing from last year's 54.5%. Interestingly, FRE12's performance on text blocks including marginalia (42.5%) was better than last year's FRE11 performance without marginalia (40.9%). Analysis showed that Tesseract and FRE often misclassified text areas as illustrations, with FRE doing better than Tesseract in this regard.

 

Challenge 2 - Text Line Segmentation

The second challenge looked into segmenting text into distinct text lines. RDI submitted three methods for this challenge, all of which returned the text lines of the main text block (as they did not wish to participate in the marginalia challenge). Results were then compared with Tesseract and FineReader, and are reflected below:

Chart showing RASM2019 text line segmentation results

 

RDI did very well with its three methods, with accuracy ranging between 76.6% and 77.6%. However, despite not attempting to segment marginalia text lines, their methods did not perform as well as last year's method (81.6% accuracy). Their methods did seem to detect some marginalia, though very little overall, as seen in the screenshot below.

Example image showing RDI's text line segmentation results

 

Tesseract and FineReader again scored lower than RDI, both with decreased accuracy compared to RASM2018's results (Tesseract 4 with 44.2%, FRE11 with 43.2%); this is due to the additional marginalia challenge. The Google method does not detect text lines, so the text line chart above does not include its results.

 

Challenge 3 - OCR Accuracy

The third and last challenge was all about text recognition, tackling the correct identification of characters and words in the text. Evaluation for this challenge was conducted four times: 1) on the whole page, including marginalia, 2) only on main blocks of text, excluding marginalia, 3) using the original texts, and 4) using normalised texts. Text normalisation was performed for both ground truth and OCR results, due to the historic nature of the material, occasional unusual spelling, and use/lack of diacritics. All methods performed slightly better when not tested on marginalia; accuracy rates are demonstrated in the charts below:

Chart showing OCR accuracy results for the main text body only (normalised, no marginalia)

Chart showing OCR accuracy results for all text regions (normalised, with marginalia)
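To make "normalised character accuracy" concrete: scoring is typically based on the edit distance between the ground-truth and OCR texts. Here is a minimal sketch with a crude normalisation step that strips combining marks (which covers Arabic short-vowel diacritics); it assumes the python-Levenshtein package, and RASM2019's actual normalisation rules may well differ.

```python
# Minimal character-accuracy sketch with a crude normalisation step.
# Assumes: pip install python-Levenshtein
import unicodedata
import Levenshtein

def normalise(text):
    # Decompose, then drop combining marks (e.g. Arabic fatha/kasra).
    return "".join(ch for ch in unicodedata.normalize("NFD", text)
                   if not unicodedata.combining(ch))

def char_accuracy(ground_truth, ocr_output):
    distance = Levenshtein.distance(ground_truth, ocr_output)
    return 1 - distance / max(len(ground_truth), 1)

gt, ocr = "كِتَاب", "كتاب"          # same word, with and without diacritics
print(char_accuracy(gt, ocr))                        # penalised
print(char_accuracy(normalise(gt), normalise(ocr)))  # 1.0 after normalising
```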

 

It is evident that there are minor differences in the character accuracies for the three RDI methods, with RDI2 performing slightly better than the others. When comparing the OCR accuracy between texts with and without marginalia, there are slightly higher success rates for the latter, though the difference is not significant. This means that tested methods performed on the marginalia almost as well as they did on the main text, which is encouraging.

Compared with RASM2018, RDI's results are good but not as good as last year's (85.44% accuracy), likely a result of adding marginalia to the recognition challenge. Google performed very well too, considering they did not specifically train or optimise for this competition. Tesseract's results went down from 30.45% to 25.13%, while FineReader Engine 12 performed better than its previous version, FRE11, going up from 12.23% to 17.53% accuracy. This is still very low, however, as handwritten texts are not among their target material.

 

Further Thoughts

RDI-Corporation has its own historical Arabic handwritten and typewritten OCR system, built using a range of historical manuscripts. Its methods have done well, given the very challenging nature of the documents. Neither Tesseract nor ABBYY FineReader produced usable results, but that is not surprising, since both are optimised for printed texts and target contemporary material rather than historical manuscripts.

As next steps, we would like to test these materials with Transkribus, which produced promising results for early printed Indian texts (see e.g. Tom Derrick’s blog post – stay tuned for some even more impressive results!), and potentially Kraken as well. All ground truth will be released through the Library’s future Open Access repository (now in testing phase), as well as through the website of IMPACT Centre for Competence. Watch this space for any developments!

 

15 August 2019

Creating Geo-located Digital Sound Walks

A few months ago, here at the British Library, we held an interesting Exploring with Sound Walks event, which discussed digital projects that connect literature, sound recordings, place, technology and walking. Several digital tools were mentioned by the presenters at this event, so this post, by Marcin Barski, is a practical guide to creating geo-located sound walks.

We hope you are inspired to create your own walks, listen to sound walks, vote for your favourite (you need to be logged in to vote), and maybe attend one of the Sound Walk Sunday events taking place on the 1st and throughout the month of September 2019. If you can easily travel to London, you may also be interested in attending a Sound Walk Sunday walkshop in and around the British Library, taking place 10:00-13:00 on Saturday 31st August.

Man standing next to a tree, wearing headphones and listening to a sound walk experience
Image copyright Stefaan van Biesen

Over to Marcin for his advice on creating sound walks:

Let's start with some basic definitions. A sound walk is any activity that involves both walking and some form of listening. Listening is a much broader term than most of us would suspect. At its most basic, it is the act of giving attention to sound; but although it happens mostly involuntarily, in certain circumstances and contexts we can direct it at topics or phenomena that would otherwise be lost in the very rich audiosphere that incessantly surrounds us.

It's rather important to understand that those topics and phenomena do not necessarily need to be of an auditory nature. Using pre-recorded sets of narratives, spoken word or studio-engineered music, we can make our audience aware of stories normally hidden from sight.

In recent years several tools have been developed that help sound walk artists, educators and creators place sounds in exact locations. Once placed, we need to tell our audience how to find and experience them. This can be achieved by voice instructions, QR codes or most commonly (and conveniently) by using mobile apps that determine the user's position via GPS and trigger sounds in the exact locations where we would like them to be heard.
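Under the hood, the trigger logic in these apps amounts to checking whether the listener's GPS position has entered a radius around each placed sound. A minimal sketch of that idea, with illustrative coordinates and file names:

```python
# Play a sound when the listener comes within a radius of its location.
# Coordinates, radius and file name below are illustrative only.
import math

def haversine_m(lat1, lon1, lat2, lon2):
    # Great-circle distance in metres between two GPS points.
    r = 6_371_000
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = math.radians(lat2 - lat1), math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

sound_spots = [
    {"file": "plane_tree_story.mp3", "lat": 51.5300, "lon": -0.1260, "radius_m": 30},
]

def check_triggers(user_lat, user_lon):
    for spot in sound_spots:
        if haversine_m(user_lat, user_lon, spot["lat"], spot["lon"]) <= spot["radius_m"]:
            print("play:", spot["file"])  # hand off to an audio player here

check_triggers(51.53001, -0.12598)  # a position ~2 m away: triggers playback
```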

Below you will find a quick guide on how to start creating your own geo-located sound walk, along with descriptions of some of the tools that can make the process smooth and stress free.

  1. Know your subject

The very first thing you will need to do is to decide what story you want to tell. Do you know it well enough already, or is there a need to do some research? Is the story self-explanatory, or will you need to explain some or all details to your audience? And - actually, maybe the first question to ask - is it a story at all? Some sound walks can be based on natural recordings and music only, meaning the whole covered area changes its character without a single word.

Once your story is ready, decide how it should be conveyed to your audience. Do you prefer to tell the story yourself, or maybe it will be better to find and interview other people with significant knowledge of the topic? Creating a sound walk can sometimes be similar to working on a radio piece, in which important elements are delivered by experts or insiders. Will you need to record everything yourself or is it possible to find archival sounds that would add something valuable to your content?

  2. Choose the area

In the next step you will need to decide what area should be covered with your sounds and narratives. Would you like the sounds to be located only in the described and meaningful locations, or would you prefer to have the whole area covered with some more or less abstract background recordings? In the latter case, you need to take into consideration that the larger the area, the more sounds will be needed to fill all the silent spaces.

Bear in mind that your audience will most likely experience your piece on foot, hence the distances between the particular spots should not be too large. The best results are achieved when distances between the sounds allow for smooth and undistracted strolling.

Remember to consider safety! Don't force your audience to walk in hazardous or restricted areas. They will be using headphones, so choose a route away from busy roads.

  3. Choose your tools

Each app you are going to use will come with specific requirements for the format or duration of your recordings. In most cases these requirements will be easy to meet, but make sure you are doing it correctly right from the beginning to avoid the hassle of file conversion or additional editing in later stages. You will find helpful descriptions of some of the available apps below.

Creating a geo-located sound walk can be fun not only for you, but also for others. Consider working in a group in which all of you have assigned tasks or subjects to cover. You will be surprised at how social and creative an activity it can become. Some people create sound walks with children, or with their local community groups.

  4. Recording and editing

We don't all carry high quality microphones in our pockets. Of course, if you want to create an audiophile experience, you will need to secure professional audio recorders and microphones; however, technology is no longer a barrier. You can even use your smartphone to make your recordings - their quality will definitely be good enough to record speech.

If you'd like to use background sounds, and you have no means of making the recordings yourself, there are repositories of sounds available on the Internet. An impressive collection of sounds can be found, for example, on the British Library's SoundCloud channel. You can also search http://archive.org and http://freesound.org - both available for you to use free of charge.

There is plenty of open-source and free sound-editing software on the Internet. If you're not a professional sound designer, Audacity will most likely be enough for you. It's easy to use and has all the features you may need. It's also quite popular, so you will find many helpful tutorials online.

  5. Placing sounds

This is probably the most pleasant and at the same time the most challenging part of the work. Be prepared to spend hours in your chosen area and to have your patience tested. Although most of the apps allow for fairly accurate placing of sounds, you will need to test each single location yourself. Sometimes you will need to move a sound by a few metres, other times you will want to change the way in which two sounds interact with each other. Wear comfortable shoes and submit to the trial and error process. Despite the challenges, trust us, it's fun!

  6. Go public and advertise

Once you are sure that all of your recordings are out there and in the exact places you want them to be, you can make your walk public. In most of the available apps you can publish with just one click. And when it's public, don't forget to tell everyone to try it. It's very rewarding to hear back from your audience - you will realise how much your work has re-shaped their perception of the chosen space.

Person standing in front of a church building, they are wearing headphones and listening to a sound walk
Image copyright Stefaan van Biesen

Here is a list of digital tools and platforms available for making sound walks:

Echoes

Echoes gives you the freedom to explore breath-taking GPS-triggered audio tours wherever you are. With the Echoes Creator, you can quickly and easily upload audio, images, and text, geolocate them on the map, and publish them for the world to see. Just add shapes to the map, which create geofenced areas. These will trigger content when your listeners physically walk inside them.

Echoes is free to use and available at http://echoes.xyz

PlaceCloud

Placecloud's mission is to reveal the cultural significance of everyday places. To achieve this, they have invented something called 'placecasts', or place-specific podcasts: short audio recordings with GPS coordinates attached to them. Users can listen to them while being physically present in the places they refer to.

Placecloud keeps the process simple: many of the steps described above won't be necessary when working with this tool. By adding your recordings you become part of a wider community of 'placecasters'.

http://www.placecloud.io

VoiceMap

VoiceMap is a tool for digital storytelling in public spaces. It's designed for storytellers and passionate locals all over the world who can - in an easy way - share their thoughts and narratives about the places they live in. As a creator you can guide your audience around your city - and get paid for this.

http://voicemap.me

Locosonic

Similar to Echoes, Locosonic is designed for creating "movies for your ears" - as they call it. Locosonic Soundscapes link sounds, music and stories to a location. While exploring an area, you will hear the Soundscape that matches your location. Like an additional sense, Locosonic allows you to experience places through their stories and music.

http://www.locosonic.com

CGeomap

CGeomap is a collaborative tool which allows people to work together on the same project. Very easy to use, it simultaneously creates an online map and a browser-based web app, geolocating audio, text and visual content, with no need to install anything on your device.

It is more limited in terms of sound than Echoes or Locosonic, but it adds media to your walk and simultaneously generates an online media map, accessible to all on desktop. An extra feature allows the user to shift, while walking, from one map to another, activating up to three layers of content in one place.

Info: http://bit.ly/300hpMS

Aporee

radio aporee ::: miniatures for mobiles is a platform for creating sound walks. These are created and organised with a web-based editing tool and listened to with a mobile phone app, while walking outside at the site the piece was created for. In addition to the phone apps, a (prototype) browser-based web app is also available, with no need to install anything on your device.

https://aporee.org/mfm/


We hope you have fun making and listening to sound walks! Sound Walk Sunday events are taking place on the 1st and throughout the month of September 2019. One of them, "Ecumenopolis – the whole world is one city" by Geert Vermeire, is being made for walkers around the British Library in London, the State Library of Moscow, the National Library of Greece and the City Library of Sao Paulo, so we can't wait to listen to this work.

This post is introduced by Digital Curator Stella Wisdom (@miss_wisdom) and Andrew Stuck from the Museum of Walking.  Many thanks to Marcin Barski, curator, music publisher, sound and installation artist, co-founder of the Instytut Pejzażu Dźwiękowego (Polish Soundscape Institute) for writing this practical guide to creating sound walks.

10 June 2019

Collaborative Digital Scholarship in Action: A Case Study in Designing Impactful Student Learning Partnerships

The Arts and Sciences (BASc) department at University College London has been at the forefront of pioneering a renascence of liberal arts and sciences degrees in the UK. As part of its Core modules offering, students select an interdisciplinary elective in Year 2 of their academic programme – from a range of modules specially designed for the department by University College London academics and researchers.

When creating my own module – Information Through the Ages (BASC0033) – as part of this elective set, I was keen to ensure that the student learning experience was both supported and developed in tandem with professional practices and standards. Enabling students to progress the skills developed on the module beyond its own assignments would aid them in their own unique academic degree programmes, and also provide substantial evidence of their employability and skills base to future employers. Partnering with the British Library to design a data science and data curation project as part of the module's core assignments therefore seemed an excellent opportunity: it offered both a research-based educative framework and a valuable chance for students to engage in a real-world collaboration. Providing students with external industry partners can be an important fillip to their motivation and to the learning experience overall, as they see their assessed work move beyond the confines of the academy to have an impact in the wider world.

Through discussions with my British Library co-collaborators, Mahendra Mahey and Stella Wisdom, we alighted on the Microsoft Books/BL 19th Century collection dataset as offering excellent potential for the student groups' data curation projects. With its 60,000 public domain volumes, associated metadata and over a million extracted images, it presented exciting, undiscovered territory across which our student groups might roam and rove, with the results of their work having the potential to benefit future British Library researchers.

We therefore felt that structuring the group project around wrangling a subset of this data – discovering, researching, cleaning and refining it, with each group's output a curated version of the original dataset – presented a number of significant benefits. Students were able to explore and develop technical skills such as data curation, software knowledge, archival research, report writing, project development and collaborative working practices, alongside a real-world digital scholarship learning experience – with the outcomes in turn supporting the British Library's Digital Scholarship remit of enabling innovative research based on the Library's digital collections.

Students observed that “working with the data did give me more practical insight to the field of work involved with digitisation work, and it was an enriching experience”, including how they “appreciated how involved and hands-on the projects were, as this is something that I particularly enjoy”. Data curation training was provided on site at the British Library, with the session focused on the use of OpenRefine, “a powerful tool for working with messy data: cleaning it; transforming it from one format into another; and extending it with web services and external data.”[1] Student feedback also told us that we could have provided further software training, and more guided dataset exploration/navigation resources, with groups keen to learn more nuanced data curation techniques – something we will aim to respond to in future iterations of the module – but overall, as one student succinctly noted, “I had no idea of the digitalization process and I learned a lot about data science. The training was very useful and I acquired new skills about data cleaning.”

Overall, we had five student groups wrangling the BL 19th Century collection, producing final data subsets in the following areas: Christian and Christian-related texts; Queens of Britain 1510-1946; female authors, 1800-1900 (here's a heatmap this student group produced of the spread of published titles by female authors in the 19th century); Shakespearean works, other authors' adaptations of those works, and commentary on Shakespeare or his writing; and travel-related books.
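As a purely hypothetical illustration of the wrangling involved, a group might pull a first-pass themed subset out of the collection metadata with a few lines of pandas; the file and column names here are assumptions to be matched against the metadata export actually supplied, and real curation (as the groups found) needs research and manual review on top.

```python
# First-pass filter for travel-related titles in the collection metadata.
# File name and column name ("Title") are assumptions, not the real schema.
import pandas as pd

books = pd.read_csv("bl_19c_books_metadata.csv")   # hypothetical export
mask = books["Title"].str.contains(r"\btravels?\b|\bvoyages?\b",
                                   case=False, na=False)
travel = books[mask]
travel.to_csv("travel_subset_candidates.csv", index=False)
print(len(travel), "candidate travel-related records for manual review")
```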

In particular, it was excellent to see students fully engaging with the research process around their chosen data subset – exploring its cultural and institutional contexts, as well as navigating metadata/data schemas, requirements and standards.

For example, the Christian texts group considered the issue of different languages as part of their data subset of texts, following this up with textual content analysis to enable accurate record querying and selection. In their project report they noted that “[u]sing our dataset and visualisations as aids, we hope that researchers studying the Bible and Christianity can discover insights into the geographical and temporal spread of Christian-related texts. Furthermore, we hope that they can also glean new information regarding the people behind the translations of Bibles as well as those who wrote about Christianity.”

Similarly, the student group focused on travel-related texts discussed in their team project summary that "[t]he particular value of this curated dataset is that future researchers may be able to use it in the analysis of international points of view. In these works, many cities and nations are being written about from an outside perspective. This perspective is one that can be valuable in understanding historical relations and frames of reference between groups around the world: for instance, the work “Travels in France and Italy, in 1817 and 1818”, published in New York, likely provides an American perspective of Europe, while “Four Months in Persia, and a Visit to Trans-Caspia”, published in London, might detail an extended visit of a European in Persia, both revealing unique perspectives about different groups of people. A comparable work, that may have utilized or benefitted from such a collection, is Hahner’s (1998) “Women Through Women’s Eyes: Latin American Women in Nineteenth Century Travel Accounts.” In it, Hahner explores nineteenth century literature written to unearth the perspectives on Latin American women, specifically noting that the primarily European author’s writings should be understood in the context of their Eurocentric view, entrenched in “patriarchy” and “colonialism” (Hahner, 1998:21). Authors and researchers with a similar intent may use [our] curated British Library dataset comparably – that is, to locate such works.”

Data visualisations by the travel books group

Over the ten weeks of the module, alongside their group data curation projects, students covered lecture topics as varied as Is a Star a Document?; "Truthiness" and Truth in a Post-Truth World; Organising Information: Classification, Taxonomies and Beyond!; and Information & Power. They worked on an individual archival GIF project, drawing on an institutional archival collection to create (and publish on social media) an animated GIF. They also spent time in classroom discussions considering questions such as: What happens when information is used for dis-informing or mis-informing purposes? How do the technologies available to us in the 21st century potentially impact the (data) collection process and its outputs and outcomes? How might ideas about collections and collecting be transformed in a digital context? What exactly do we mean by the concepts of data and information? And, given that classifying or grouping something first requires a series of "rules" determining the grouping process – who decides what the rules are, and how might such decisions influence our very understanding of the information the system is supposedly designed to facilitate access to? These dialogues were all situated within the context of both "traditional" collections systems and atypical sites of information storage and collection. The module aimed to enable students to gain an in-depth knowledge, understanding and critical appreciation of the concept of information – from historical antecedents to digital scientific and cultural heritage forms – in the context of libraries, archives, galleries and museums (including alternative, atypical and emergent sources), and of how technological, social, cultural and other changes fundamentally affect our concept of "information."

“I think this module was particularly helpful in making me look at things in an interdisciplinary light”, one student observed in module evaluation feedback, with others going on to note that “I think the different formats of work we had to do was engaging and made the coursework much more interesting than just papers or just a project … the collaboration with the British Library deeply enriched the experience by providing a direct and visible outlet for any energies expended on the module. It made the material seem more applicable and the coursework more enjoyable … I loved that this module offered different ways of assessment. Having papers, projects, presentations, and creative multimedia work made this course engaging.”

Situating the module's assessments within such contexts, I hope, encouraged students to understand the critical, interdisciplinary focus of the field of information studies – in particular the use of information in the context of empire-making and consolidation, and how histories of information, knowledge and power intersect. Combined with a collaborative, interdisciplinary curriculum design approach, which encouraged and supported students to gain technical abilities and navigate teamwork practices, we hope this module can point some useful ways forward in creating and developing engaging learning experiences with real-world impact.

This blog post is by Sara Wingate-Gray (UCL Senior Teaching Fellow & BASC0033 module leader), Mahendra Mahey (BL Labs Manager) and Stella Wisdom (BL Digital Curator for Contemporary British Collections).

29 March 2019

Staying Late at the Library ... to Algorave

Blog article by Algorave audio-visual artist Coral Manton. Coral is curating this British Library Lates Algorave in collaboration with British Library Events, BL Labs, Digital Scholarship and The Alan Turing Institute.

On 5th April, British Library Lates will host an Algorave in the atrium. Algorave artists will live-code music and visuals, writing code sequences that generate algorithmic beats beneath the iconic King's Library Tower.

Alex McLean AKA Yaxu live coding on stage with light projections

The scene grew out of a reaction to 'black-boxing' in electronic music - where the audience is unable to interface with the 'live-ness' of what the performer is making. Nothing is hidden at an Algorave: you can see what the performer is doing through code projected onto the walls in real time. The creative process is open and shared with the audience, and code is shared freely. Performers share their screens with the crowd, taking them on a journey through making, unmaking and remaking - thought processes laid bare in lines of improvised code weaving its way through practised shaping of sound.

Coral Manton AKA Coral live coding on stage with light projections

As a female coder, becoming part of the Algorave community has led me to reflect on the power of seeing women coding live, and how this encourages greater participation from women. Algorave attempts to maintain a positive gender balance. More than this, there is the joy of seeing women confidently and openly experimenting with code, sharing their practice, making mistakes, revelling in uncertainty and error, crashing, restarting and crashing again to cheers from a supportive crowd willing the performances to continue - sharing the anarchic joy of failure in a community where failure leads to new possibilities.

ALGOBABEZ AKA Shelly Knotts and Joanne Armitage live coding on stage with rear light projections

Algorave is a fun word - an algorithmic rave - a scene where people come together to create and dance to music generated by code. Technically, Algorave is described as "sounds wholly or partly characterised by the emission of a succession of repetitive conditionals". The performers write lines of code that create cyclic patterns of music, layered to create an evolving composition. The same applies to the visuals: live-coded, audio-reactive patterns show shapes bouncing, revolving and repeating to the beat of the music. All of this creates a shared club experience like no other.
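To give a flavour of what those looping patterns look like in code, here is a tiny sketch in FoxDot, one of the Python-based live-coding tools listed at the end of this post. It assumes FoxDot is installed and SuperCollider is running, and the patterns themselves are just illustrative:

```python
# A tiny live-coding sketch: each line is a looping pattern that the
# performer edits while it plays. Requires FoxDot plus SuperCollider.
from FoxDot import *

Clock.bpm = 130
d1 >> play("x-o-")                    # kick / hi-hat / snare / hi-hat cycle
b1 >> bass([0, 0, 3, 4], dur=1/2)     # repeating bassline
p1 >> pluck([0, 2, 4, 7], dur=1/4, amp=0.8)
p1.every(8, "reverse")                # flip the melody every 8 beats
```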

Visual artist Antonio Roberts AKA hellocatfood: "I like to do Algorave because I think it turns an otherwise perfect black box computer into a live performance instrument. Playing at an Algorave forces me to abandon what I know and respond to everything happening around me. It shows me that even something as meticulously designed as a computer is a living tool that is subject to randomness and mistakes."

Antonio Roberts AKA hellocatfood live coding on stage with rear light projections

Algorave is an open, non-hierarchical global community, with its hub in Sheffield. There have been Algoraves in over 50 cities around the world. Algorave is not a franchise; it is a free culture, and anyone can put on an Algorave - although their approach should align with the ethos of the community. Algorave collapses hierarchies - headliners are generally frowned upon. Diversity is key to the Algorave community: it is open to everyone and actively promotes diversity in line-ups and audiences. The community is active both online and at live events organised by community members. The software people use is created within the community and is open source, so there is little barrier to participation. If you are interested in Algorave, come along, speak to the performers, join the online community, download some software (e.g. IXI Lang, Pure Data, Max/MSP, SuperCollider, Extempore, Fluxus, TidalCycles, Gibber, Sonic Pi, FoxDot and Cyril) and get coding.

If this sounds like your scene or you want to know more, please join us at the Algorave Late Event. Tickets available here: https://www.bl.uk/events/late-at-the-library-algorave

Also check out https://algorave.com & https://toplap.org

26 March 2019

BL Labs Staff Award Runners Up: 'The Digital Documents Harvester'

This guest blog is by Jennie Grimshaw on behalf of her team who were the BL Labs Staff Award runners up for 2018.

The UK Legal Deposit Web Archive (LDWA) contains terabytes of data harvested from the UK web domain. It has a public search interface at https://webarchive.org.uk/, but finding individual documents in what is, in effect, a vast unstructured dataset is challenging. The analogy of looking for a needle in a haystack is entirely appropriate.

The Digital Documents Harvesting and Processing Tool (DDHAPT) was designed to overcome the problem of finding individual known documents in the LDWA. It is an adaptation of the web archiving software that enables selectors to set up regular in-depth crawls of targeted, document-heavy websites. The system then extracts new PDFs published since its previous visit and presents them to the selector in a list, with the most recent at the top:

DDH image 1
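Conceptually, the "new since the previous visit" step is a set difference between the PDF links found in the current crawl and those recorded last time. This toy sketch, with invented URLs and a made-up state file, illustrates the idea; the real tool does this inside the Library's web-archiving infrastructure:

```python
# Toy "new PDFs since last crawl" check via set difference.
# URLs and the state file name are invented for illustration.
import json

with open("previous_crawl_pdfs.json") as f:
    seen = set(json.load(f))

current = {
    "https://www.gov.uk/some-department/annual-report-2019.pdf",
    "https://www.gov.uk/some-department/new-white-paper.pdf",
}

for url in sorted(current - seen):
    print("present to selector:", url)     # newest-first in the real tool

with open("previous_crawl_pdfs.json", "w") as f:
    json.dump(sorted(seen | current), f)   # remember for the next visit
```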

The selector can then view an image of the document on the screen by clicking on the title. If the document is in scope, basic metadata is created by completing an on-screen form. If the document doesn’t make the grade for the creation of an individual record, it can be removed from the list of new documents for selection by clicking on the green Ignore button on the right of the screen.

The metadata we create records the title and subtitle, publication year and publisher, edition, series, personal and corporate authors and ISBN (if present). Some fields such as title, publication year and publisher are automatically populated.  A broad subject heading is assigned from a pick list. Our aim is to create a “good enough” record that can stand without upgrading by the digital cataloguers, avoiding double handling.

DDH image 2
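As a hypothetical example of one such auto-population step, an ISBN candidate can be pulled out of a document's extracted text with a regular expression; the snippet and pattern below are illustrative, not the tool's actual code.

```python
# Illustrative ISBN extraction from a document's extracted text.
import re

ISBN_RE = re.compile(r"ISBN[\s:]*((?:97[89][-\s]?)?(?:\d[-\s]?){9}[\dXx])")

text = "First published 2019. ISBN 978-1-86514-123-4."   # invented snippet
match = ISBN_RE.search(text)
if match:
    isbn = re.sub(r"[-\s]", "", match.group(1))
    print("ISBN field:", isbn)   # -> 9781865141234
```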

To save time and avoid transcription errors, the system allows the selector to highlight information in the document such as personal author, publisher, series title or ISBN. On mouse-up, a list of fields appears; clicking on the appropriate field automatically transfers the highlighted data into it.

DDH image 3

Once the metadata has been created, the selector clicks on a submit button which starts the process of loading it into the British Library catalogue and the catalogues of the other five legal deposit libraries – the national libraries of Scotland and Wales, the Universities of Oxford and Cambridge, and Trinity College Dublin. The document remains in the Legal Deposit Web Archive. Its URL in the web archive is recorded in the metadata and creates the link between the document and its catalogue record. Readers who find the record in the British Library’s public catalogue or those of any of the legal deposit libraries can then click on the “I want this” button and view the document on screen.

The DDHAPT is currently being used to monitor the publications of Westminster government departments and help us ensure that future generations of researchers can reliably access known official documents via the catalogues of the six legal deposit libraries. However, we intend to extend its use to cover the output of other non-commercial publishers such as campaigning charities, think tanks, academic research centres, and pressure groups as a way of making their archived publications easily discoverable.

Normally, material collected under the non-print legal deposit regulations can, by law, only be viewed on the premises of one of the six legal deposit libraries. However, the libraries have negotiated licences with the UK government and many other non-commercial online publishers that allow us to make their archived websites, and the documents on them, openly available remotely. These licences lift non-print legal deposit restrictions and allow us to make the documents covered by them available 24/7 from anywhere in the world.

In these ways the DDHAPT improves the discoverability of non-commercially published documents collected under non-print legal deposit, facilitates metadata creation through auto-population of some fields, and avoids double handling through creation of good quality metadata at the point of selection.

Watch the Digital Documents Harvester team receiving their award and talking about their project on our YouTube channel (clip runs from 8.15 to 14.45):

Find out more about Digital Scholarship and BL Labs. If you have a project which uses British Library digital content in innovative and interesting ways, consider applying for an award this year! The 2019 BL Labs Symposium will take place on Monday 11 November at the British Library.