THE BRITISH LIBRARY

Digital scholarship blog

15 posts categorized "Science"

11 September 2020

BL Labs Public Awards 2020: enter before 0700 GMT Monday 30 November 2020!


The sixth BL Labs Public Awards (2020) formally recognise outstanding and innovative work that has been carried out using the British Library's data and/or digital collections by researchers, artists, entrepreneurs, educators, students and the general public.

The closing date for entering the Public Awards is 0700 GMT on Monday 30 November 2020 and you can submit your entry any time up to then.

Please help us spread the word! We want to encourage anyone interested to submit an entry over the next few months; who knows, you could even win fame and glory! We really hope to have another year of fantastic projects, inspired by our digital collections and data, to showcase at our annual online awards symposium on 15 December 2020 (which is also open for registration).

This year, BL Labs is commending work in four key areas that have used or been inspired by our digital collections and data:

  • Research - A project or activity that shows the development of new knowledge, research methods, or tools.
  • Artistic - An artistic or creative endeavour that inspires, stimulates, amazes and provokes.
  • Educational - Quality learning experiences created for learners of any age and ability that use the Library's digital content.
  • Community - Work that has been created by an individual or group in a community.

What kind of projects are we looking for this year?

Whilst we are happy for you to submit work on any subject that uses our digital collections, in this significant year we are particularly interested in entries that focus on anti-racist work, or on projects about lockdown and the global pandemic. We are also keen to receive submissions that have used Jupyter Notebooks to carry out computational work on our digital collections and data.

After the submission deadline has passed, entries will be shortlisted, and selected entrants will be notified by email by midnight on Friday 4 December 2020.

A prize of £150 in British Library online vouchers will be awarded to the winner, and £50 to the runner-up, in each Awards category at the Symposium. Even if you don't win, entering gives you the chance to showcase your work to a wide audience, and in the past this has often led to major collaborations.

The talent of the BL Labs Awards winners and runners-up over the last five years has produced a remarkable and varied collection of innovative projects, described in our 'Digital Projects Archive'. In 2019, the Awards commended work in four main categories: Research, Artistic, Community and Educational.

BL Labs Award Winners for 2019
(Top-Left) Full-Text search of Early Music Prints Online (F-TEMPO) - Research, (Top-Right) Emerging Formats: Discovering and Collecting Contemporary British Interactive Fiction - Artistic
(Bottom-Left) John Faucit Saville and the theatres of the East Midlands Circuit - Community commendation
(Bottom-Right) The Other Voice (Learning and Teaching)

For further detailed information, please visit BL Labs Public Awards 2020, or contact us at labs@bl.uk if you have a specific query.

Posted by Mahendra Mahey, Manager of British Library Labs.

20 May 2020

Bringing Metadata & Full-text Together


This is a guest post by enthusiastic data and metadata nerd Andy Jackson (@anjacks0n), Technical Lead for the UK Web Archive.

In Searching eTheses for the openVirus project we put together a basic system for searching theses. This only used the information from the PDFs themselves, which meant the results looked like this:

openVirus EThOS search results screen

The basics are working fine, but the document titles are largely meaningless, the last-modified dates are clearly suspect (26 theses in the year 1600?!), and the facets aren’t terribly useful.

The EThOS metadata has much richer information that the EThOS team has collected and verified over the years. This includes:

  • Title
  • Author
  • DOI, ISNI, ORCID
  • Institution
  • Date
  • Supervisor(s)
  • Funder(s)
  • Dewey Decimal Classification
  • EThOS Service URL
  • Repository (‘Landing Page’) URL

So, the question is, how do we integrate these two sets of data into a single system?

Linking on URLs

The EThOS team supplied the PDF download URLs for each record, but we need a common identifier to merge the two datasets. Fortunately, both contain the EThOS Service URL, which looks like this:

https://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.755301

This (or just the uk.bl.ethos.755301 part) can be used as the ‘key’ for the merge, leaving us with one data set that contains the download URLs alongside all the other fields. We can then process the text from each PDF, and look up the URL in this metadata dataset, and merge the two together in the same way.
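In outline, that merge is a simple keyed join. Here is a sketch in Python, where the field names (`service_url` and the record dicts themselves) are hypothetical stand-ins for the real datasets:

```python
import re

def ethos_uin(service_url):
    """Extract the uk.bl.ethos identifier to use as the merge key."""
    m = re.search(r"(uk\.bl\.ethos\.\d+)", service_url)
    return m.group(1) if m else None

def merge_records(metadata_rows, fulltext_rows):
    """Join EThOS metadata and extracted full-text records on the uin.

    Both inputs are lists of dicts; 'service_url' is an assumed field
    holding the EThOS Service URL in each dataset.
    """
    by_uin = {ethos_uin(r["service_url"]): r for r in metadata_rows}
    merged = []
    for doc in fulltext_rows:
        meta = by_uin.get(ethos_uin(doc["service_url"]))
        if meta is not None:
            merged.append({**meta, **doc})  # metadata fields plus text fields
    return merged
```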

Except… it doesn’t work.

The web is a messy place: those PDF URLs may have been direct downloads in the past, but many of them are now chains of redirects rather than simple links. As an example, this original download URL:

http://repository.royalholloway.ac.uk/items/bf7a78df-c538-4bff-a28d-983a91cf0634/1/10090181.pdf

Now redirects (HTTP 301 Moved Permanently) to the HTTPS version:

https://repository.royalholloway.ac.uk/items/bf7a78df-c538-4bff-a28d-983a91cf0634/1/10090181.pdf

Which then redirects (HTTP 302 Found) to the actual PDF file:

https://repository.royalholloway.ac.uk/file/bf7a78df-c538-4bff-a28d-983a91cf0634/1/10090181.pdf

So, to bring this all together, we have to trace these links between the EThOS records and the actual PDF documents.

Re-tracing Our Steps

While the crawler we built to download these PDFs worked well enough, it isn't quite as sophisticated as our main crawler, which is based on Heritrix 3. In particular, Heritrix produces detailed crawl logs that can be used to trace crawler activity. This functionality would be fairly easy to add to Scrapy, but that hasn't been done yet. So, another approach is needed.

To trace the crawl, we need to be able to look up URLs and then analyse what happened. In particular, for every starting URL (a.k.a. seed) we want to check if it was a redirect and if so, follow that URL to see where it leads.

We already use content (CDX) indexes to allow us to look up URLs when accessing content. In particular, we use OutbackCDX as the index, and then the pywb playback system to retrieve and access the records and see what happened. So one option is to spin up a separate playback system and query that to work out where the links go.

However, as we only want to trace redirects, we can do something a little simpler. We can use the OutbackCDX service to look up what we got for each URL, and use the same warcio library that pywb uses to read the WARC record and find any redirects. The same process can then be repeated with the resulting URL, until all the chains of redirects have been followed.

This leaves us with a large list, linking every URL we crawled back to the original PDF URL. This can then be used to link each item to the corresponding EThOS record.
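Once each crawled response's redirect target (if any) has been pulled out of the WARC records via OutbackCDX and warcio, the chain-following step itself is simple. A sketch, assuming a `redirects` mapping has been built beforehand (URL to Location header, or None for a final, non-redirect response):

```python
def resolve_chains(seed_urls, redirects, max_hops=10):
    """Follow each seed URL's redirect chain to its final URL.

    `redirects` maps a URL to the target it redirected to, or None
    if the response was not a redirect. Returns a lookup table from
    final URL back to the original seed, for linking crawled PDFs
    to their EThOS records.
    """
    final_to_seed = {}
    for seed in seed_urls:
        url, hops = seed, 0
        while redirects.get(url) is not None and hops < max_hops:
            url = redirects[url]
            hops += 1  # max_hops guards against redirect loops
        final_to_seed[url] = seed
    return final_to_seed
```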

This large look-up table allowed the full-text and metadata to be combined. It was then imported into a new Solr index that replaced the original service, augmenting the records with the new metadata.

Updating the Interface

The new fields are accessible via the same API as before – see this simple search as an example.

The next step was to update the UI to take advantage of these fields. This was relatively simple, as it mostly involved exchanging one field name for another (e.g. from last_modified_year to year_i), and adding a few links to take advantage of the fact we now have access to the URLs to the EThOS records and the landing pages.

The result can be seen at:

EThOS Faceted Search Prototype

The Results

This new service provides a much better interface to the collection, and really demonstrates the benefits of combining machine-generated and manually curated metadata.

New openVirus EThOS search results interface

There are still some issues with the source data that need to be resolved at some point. In particular, there are now only 88,082 records, which indicates that some gaps and mismatches emerged during the process of merging these records together.

But it’s good enough for now.

The next question is: how do we integrate this into the openVirus workflow? 

 

14 May 2020

Searching eTheses for the openVirus project


This is a guest post by Andy Jackson (@anjacks0n), Technical Lead for the UK Web Archive and enthusiastic data-miner.

Introduction

The COVID-19 outbreak is an unprecedented global crisis that has prompted an unprecedented global response. I’ve been particularly interested in how academic scholars and publishers have responded:

It’s impressive how much has been done in such a short time! But I also saw one comment that really stuck with me:

“Our digital libraries and archives may hold crucial clues and content about how to help with the #covid19 outbreak: particularly this is the case with scientific literature. Now is the time for institutional bravery around access!”
– @melissaterras

Clearly, academic scholars and publishers are already collaborating. What could digital libraries and archives do to help?

Scale, Audience & Scope

Almost all the efforts I’ve seen so far are focused on helping scientists working on the COVID-19 response to find information from publications that are directly related to coronavirus epidemics. The outbreak is much bigger than this. In terms of scope, it’s not just about understanding the coronavirus itself. The outbreak raises many broader questions, like:

  • What types of personal protective equipment are appropriate for different medical procedures?
  • How effective are the different kinds of masks when it comes to protecting others?
  • What coping strategies have proven useful for people in isolation?

(These are just the examples I’ve personally seen requests for. There will be more.)

Similarly, the audience is much wider than the scientists working directly on the COVID-19 response. From medical professions wanting to know more about protective equipment, to journalists looking for context and counter-arguments.

As a technologist working at the British Library, I felt like there must be some way I could help this situation. Some way to help a wider audience dig out any potentially relevant material we might hold?

The openVirus Project

While looking out for inspiration, I found Peter Murray-Rust’s openVirus project. Peter is a vocal supporter of open source and open data, and had launched an ambitious attempt to aggregate information relating to viruses and epidemics from scholarly publications.

In contrast to the other efforts I’d seen, Peter wanted to focus on novel data-mining methods, and on pulling in less well-known sources of information. This dual focus on text analysis and on opening up underutilised resources appealed to me. And I already had a particular resource in mind…

EThOS

Of course, the British Library has a very wide range of holdings, but as an ex-academic scientist I’ve always had a soft spot for EThOS, which provides electronic access to UK theses.

Through the web interface, users can search the metadata and abstracts of over half a million theses. Furthermore, to support data mining and analysis, the EThOS metadata has been published as a dataset. This dataset includes links to institutional repository pages for many of the theses.

Although doctoral theses are not generally considered to be as important as journal articles, they are a rich and underused source of information, capable of carrying much more context and commentary than a brief article[1].

The Idea

Having identified EThOS as source of information, the idea was to see if I could use our existing UK Web Archive tools to collect and index the full-text of these theses, build a simple faceted search interface, and perform some basic data-mining operations. If that worked, it would allow relevant theses to be discovered and passed to the openVirus tools for more sophisticated analysis.

Preparing the data sources

The links in the EThOS dataset point to the HTML landing page for each thesis, rather than to the full text itself. To get to the text, the best approach would be to write a crawler to find the PDFs. However, it would take a while to create something that could cope with the variety of ways the landing pages are formatted. For machines, it's not always easy to find the link to the actual thesis!

However, many of the universities involved have given the EThOS team permission to download a copy of their theses for safe-keeping. The URLs of the full-text files are only used once (to collect each thesis shortly after publication), but have nevertheless been kept in the EThOS system since then. These URLs are considered transient (i.e. likely to ‘rot’ over time) and come with no guarantees of longer-term availability (unlike the landing pages), so are not included in the main EThOS dataset. Nevertheless, the EThOS team were able to give me the list of PDF URLs, making it easier to get started quickly.

This is far from ideal: we will miss theses that have been moved to new URLs, and from universities that do not take part (which, notably, includes Oxford and Cambridge). This skew would be avoided if we were to use the landing-page URLs provided for all UK digital theses to crawl the PDFs. But we need to move quickly.

So, while keeping these caveats in mind, the first task was to crawl the URLs and see if the PDFs were still there…

Collecting the PDFs

A simple Scrapy crawler was created, one that could read the PDF URLs and download them without overloading the host repositories. The crawler itself does nothing with them, but by running behind warcprox the web requests and responses (including the PDFs) can be captured in the standardised Web ARChive (WARC) format.
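The key trick is that the crawler itself needs no WARC-writing logic: it just routes all its traffic through warcprox, which records every request and response as it passes. As a minimal stand-in for the real Scrapy spider, here is a sketch of the same pattern using only the standard library (the proxy address is an assumption, and for HTTPS the warcprox CA certificate would also need to be trusted):

```python
import time
import urllib.request

# Assumption: a local warcprox instance listening on port 8000,
# writing everything that passes through it to WARC files.
WARCPROX = {"http": "http://localhost:8000", "https": "http://localhost:8000"}

def polite_fetch_all(urls, delay=1.0, proxies=WARCPROX):
    """Fetch each URL through the capturing proxy, pausing between
    requests so the host repositories are not overloaded."""
    opener = urllib.request.build_opener(urllib.request.ProxyHandler(proxies))
    statuses = {}
    for url in urls:
        try:
            with opener.open(url, timeout=60) as resp:
                statuses[url] = resp.status
        except Exception as exc:
            statuses[url] = repr(exc)  # record failures, keep crawling
        time.sleep(delay)
    return statuses
```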

For 35 hours, the crawler attempted to download the 130,330 PDF URLs. Quite a lot of URLs had already changed, but 111,793 documents were successfully downloaded. Of these, 104,746 were PDFs.

All the requests and responses generated by the crawler were captured in 1,433 WARCs each around 1GB in size, totalling around 1.5TB of data.

Processing the WARCs

We already have tools for handling WARCs, so the task was to re-use them and see what we get. As this collection is mostly PDFs, Apache Tika and PDFBox are doing most of the work, but the webarchive-discovery wrapper helps run them at scale and add in additional metadata.

The WARCs were transferred to our internal Hadoop cluster, and in just over an hour the text and associated metadata were available as about 5GB of compressed JSON Lines.
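Downstream tools can then stream that output record by record. A sketch, assuming gzip compression and one JSON object per line (the field names in any real record will differ):

```python
import gzip
import json

def read_jsonl_gz(path):
    """Stream records one at a time from a gzip-compressed
    JSON Lines file, without loading the whole file into memory."""
    with gzip.open(path, "rt", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:  # tolerate blank lines
                yield json.loads(line)
```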

A Legal Aside

Before proceeding, there's a legal problem that we need to address. Despite being freely available over the open web, these documents are made available under rights and licences that can be extremely varied and complex.

There’s no problem gathering the content and using it for data mining. The problem is that there are limitations on what we can redistribute without permission: we can’t redistribute the original PDFs, or any close approximation.

However, collections of facts about the PDFs are fine.

But for the other openVirus tools to do their work, we need to be able to find out what each thesis is about. So how can we make this work?

One answer is to generate statistical summaries of the contents of the documents. For example, we can break the text of each document up into individual words, and count how often each word occurs. These word frequencies are no substitute for the real text, but they are redistributable and suitable for answering simple queries.
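A minimal sketch of that reduction:

```python
import re
from collections import Counter

def word_frequencies(text):
    """Reduce a document to redistributable word counts.

    Lower-cases the text, splits on anything that isn't a letter,
    and counts occurrences; the original prose cannot be
    reconstructed from the result."""
    return Counter(re.findall(r"[a-z]+", text.lower()))
```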

These simple queries can be used to narrow down the overall dataset, picking out a relevant subset. Once the list of documents of interest is down to a manageable size, an individual researcher can download the original documents themselves, from the original hosts[2]. As the researcher now has local copies, they can run their own tools over them, including the openVirus tools.

Word Frequencies

A second, simpler Hadoop job was created, post-processing the raw text and replacing it with the word-frequency data. This produced 6GB of uncompressed JSON Lines data, which could then be loaded into an instance of the Apache Solr search tool [3].

While Solr provides a user interface, it’s not really suitable for general users, nor is it entirely safe to expose to the World Wide Web. To mitigate this, the index was built on a virtual server well away from any production systems, and wrapped with a web server configured in a way that should prevent problems.

The API this provides (see the Solr documentation for details) enables us to find which theses include which terms. Here are some example queries:
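For instance, a query for theses mentioning a set of terms can be built against Solr's standard select handler like this (the base URL and the `text` field name are placeholders, not the real service):

```python
from urllib.parse import urlencode

def solr_query_url(base, terms, rows=10):
    """Build a Solr select-handler URL matching documents that
    contain every term. `base` is the Solr core URL."""
    q = " AND ".join(f"text:{t}" for t in terms)
    return f"{base}/select?{urlencode({'q': q, 'rows': rows, 'wt': 'json'})}"
```

Fetching such a URL returns a JSON response whose `response.docs` list holds the matching records.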

This is fine for programmatic access, but with a little extra wrapping we can make it more useful to more people.

APIs & Notebooks

For example, I was able to create live API documentation and a simple user interface using Google’s Colaboratory:

Using the openVirus EThOS API

Google Colaboratory is a proprietary platform, but those notebooks can be exported as more standard Jupyter Notebooks. See here for an example.

Faceted Search

Having carefully exposed the API to the open web, I was also able to take an existing browser-based faceted search interface and modify it to suit our use case:

EThOS Faceted Search Prototype

Best of all, this is running on the Glitch collaborative coding platform, so you can go look at the source code and remix it yourself, if you like:

EThOS Faceted Search Prototype – Glitch project

Limitations

The main limitation of using word-frequencies instead of full-text is that phrase search is broken. Searching for face AND mask will work as expected, but searching for “face mask” doesn’t.
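A toy example makes the problem concrete: two documents with different word order produce identical frequency records, so a boolean AND is answerable but adjacency is not:

```python
from collections import Counter

def matches_all(freqs, terms):
    """Boolean AND over a word-frequency record: order-insensitive."""
    return all(freqs.get(t, 0) > 0 for t in terms)

# Same counts, different word order: no query over the frequencies
# alone can tell the phrase "face mask" apart from scattered words.
doc_a = Counter("wear a face mask outdoors".split())
doc_b = Counter("wear a mask outdoors face".split())
```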

Another problem is that the EThOS metadata has not been integrated with the raw text search. This would give us a much richer experience, like accurate publication years and more helpful facets[4].

In terms of user interface, the faceted search UI above is very basic, but for the openVirus project the API is likely to be of more use in the short term.

Next Steps

To make the search more usable, the next logical step is to attempt to integrate the full-text search with the EThOS metadata.

Then, if the results look good, we can start to work out how to feed the results into the workflow of the openVirus tool suite.

 


1. Even things like negative results, which are informative but can be difficult to publish in article form. ↩︎

2. This is similar data sharing pattern used by Twitter researchers. See, for example, the DocNow Catalogue. ↩︎

3. We use Apache Solr a lot so this was the simplest choice for us. ↩︎

4. Note that since writing this post, this limitation has been rectified. ↩︎

 

03 October 2019

BL Labs Symposium (2019): Book your place for Mon 11-Nov-2019


Posted by Mahendra Mahey, Manager of BL Labs

The BL Labs team are pleased to announce that the seventh annual British Library Labs Symposium will be held on Monday 11 November 2019, from 9:30 - 17:00* (see note below) in the British Library Knowledge Centre, St Pancras. The event is FREE, and you must book a ticket in advance to reserve your place. Last year's event was the largest we have ever held, so please don't miss out and book early!

*Please note, that directly after the Symposium, we have teamed up with an interactive/immersive theatre company called 'Uninvited Guests' for a specially organised early evening event for Symposium attendees (the full cost is £13 with some concessions available). Read more at the bottom of this posting!

The Symposium showcases innovative and inspiring projects which have used the British Library's digital content. Last year's Award winners drew attention to artistic, research, teaching & learning, and commercial activities that used our digital collections.

The annual event provides a platform for the development of ideas and projects, facilitating collaboration, networking and debate in the Digital Scholarship field, as well as focusing on the creative reuse of the British Library's and other organisations' digital collections and data in many other sectors. Read what groups of Master's Library and Information Science students from City, University of London (#CityLIS) said about the Symposium last year.

We are very proud to announce that this year's keynote will be delivered by scientist Armand Leroi, Professor of Evolutionary Biology at Imperial College London.

Armand Leroi
Professor Armand Leroi from Imperial College
will be giving the keynote at this year's BL Labs Symposium (2019)

Professor Armand Leroi is an author, broadcaster and evolutionary biologist.

He has written and presented several documentary series on Channel 4 and BBC Four. His latest documentary, The Secret Science of Pop (BBC Four, 2017), presented the results of analysing over 17,000 western pop songs from the US Billboard Hot 100 charts between 1960 and 2010, work carried out with colleagues from Queen Mary University of London, with further results published through the Royal Society. Armand has a special interest in how we can apply techniques from evolutionary biology to ask important questions about culture, the humanities and what is unique about us as humans.

Previously, Armand presented Human Mutants, a three-part documentary series about human deformity, for Channel 4, accompanied by an award-winning book, Mutants: On Genetic Variety and the Human Body. He also wrote and presented a two-part series, What Makes Us Human, for Channel 4. On BBC Four, Armand presented the documentaries What Darwin Didn't Know and Aristotle's Lagoon, and wrote the accompanying book The Lagoon: How Aristotle Invented Science, looking at Aristotle's impact on science as we know it today.

Armand's keynote will reflect on his interest and experience in applying techniques from evolutionary biology, such as bioinformatics, data-mining and machine learning, to ask meaningful 'big' questions about culture, the humanities and what makes us human.

The title of his talk will be 'The New Science of Culture'. Armand will follow in the footsteps of previous prestigious BL Labs keynote speakers: Dan Pett (2018); Josie Fraser (2017); Melissa Terras (2016); David De Roure and George Oates (2015); Tim Hitchcock (2014); Bill Thompson and Andrew Prescott in 2013.

The symposium will be introduced by the British Library's new Chief Librarian, Liz Jolly. The day will include an update and exciting news from Mahendra Mahey (BL Labs Manager at the British Library) about the work of BL Labs, highlighting the innovative collaborations BL Labs has been working on, including how it is working with Labs around the world to share experiences, knowledge and lessons learned. There will be news from the Digital Scholarship team about the exciting projects they have been working on, such as Living with Machines and other initiatives, together with a special insight from the British Library's Digital Preservation team into how they preserve our digital collections and data for future generations.

Throughout the day, there will be several announcements and presentations showcasing work from projects nominated for the BL Labs Awards 2019, recognising work that has used the British Library's digital content in artistic, research, educational and commercial activities.

There will also be a chance to find out who has been nominated and recognised for the British Library Staff Award 2019 which highlights the work of an outstanding individual (or team) at the British Library who has worked creatively and originally with the British Library's digital collections and data (nominations close midday 5 November 2019).

As is our tradition, the Symposium will have plenty of opportunities for networking throughout the day, culminating in a reception for delegates and British Library staff to mingle and chat over a drink and nibbles.

Finally, we have teamed up with the interactive/immersive theatre company 'Uninvited Guests', who will give a specially organised performance for BL Labs Symposium attendees directly after the symposium. This participatory performance will take the audience on a journey through a world on the cusp of a technological disaster: our period of history could vanish forever from human memory because digital information will be wiped out for good. How can we leave a trace of our existence for those born later? Don't miss the chance to book this unique 5pm event, specially organised to coincide with the end of the BL Labs Symposium. For more information and booking (spaces are limited), please visit here (the full cost is £13, with some concessions available). Please note, if you are unable to join the 5pm show, there will be another performance at 19:45 the same evening (book here for that one).

So don't forget to book your place for the Symposium today, as we predict another full house and we don't want you to miss out.

We look forward to seeing new faces and meeting old friends again!

For any further information, please contact labs@bl.uk

10 June 2019

Collaborative Digital Scholarship in Action: A Case Study in Designing Impactful Student Learning Partnerships


The Arts and Sciences (BASc) department at University College London has been at the forefront of pioneering a renascence of liberal arts and sciences degrees in the UK. As part of its Core modules offering, students select an interdisciplinary elective in Year 2 of their academic programme – from a range of modules specially designed for the department by University College London academics and researchers.

When creating my own module, Information Through the Ages (BASC0033), as part of this elective set, I was keen to ensure that the student learning experience was both supported and developed in tandem with professional practices and standards. Enabling students to progress the skills developed on the module beyond its own assignments would aid them not only in their own unique academic degree programmes but also provide substantial evidence of their employability and skills base to future employers. Partnering with the British Library to design a data science and data curation project as part of the module's core assignments therefore seemed an excellent opportunity: it offered both a research-based educative framework for students and a valuable chance for them to engage in a real-world collaboration. Providing students with external industry partners can be an important fillip to their motivation and to the learning experience overall, as they see their assessed work move beyond the confines of the academy to have an impact in the wider world.

Through discussions with my British Library co-collaborators, Mahendra Mahey and Stella Wisdom, we alighted on the Microsoft Books/BL 19th Century collection dataset as providing excellent potential for student groups to work with for their data curation projects. With its 60,000 public domain volumes, associated metadata and 1 million+ extracted images, it presented as exciting, undiscovered territory across which our student groups might roam and rove, with the results of their work having the potential to benefit future British Library researchers.

We therefore felt that structuring the group project around wrangling a subset of this data (discovering, researching, cleaning and refining it, with each group producing a curated version of the original dataset as its output) offered a number of significant benefits. Students were able to explore and develop technical skills such as data curation, software knowledge, archival research, report writing, project development and collaborative working practices, alongside having a real-world digital scholarship learning experience, with the outcomes in turn supporting the British Library's Digital Scholarship remit of enabling innovative research based on the Library's digital collections.

Students observed that “working with the data did give me more practical insight to the field of work involved with digitisation work, and it was an enriching experience”, including how they “appreciated how involved and hands-on the projects were, as this is something that I particularly enjoy”. Data curation training was provided on site at the British Library, with the session focused on the use of OpenRefine, “a powerful tool for working with messy data: cleaning it; transforming it from one format into another; and extending it with web services and external data.”[1] Student feedback also told us that we could have provided further software training, and more guided dataset exploration/navigation resources, with groups keen to learn more nuanced data curation techniques – something we will aim to respond to in future iterations of the module – but overall, as one student succinctly noted, “I had no idea of the digitalization process and I learned a lot about data science. The training was very useful and I acquired new skills about data cleaning.”

Overall, we had five student groups wrangling the BL 19th Century collection, producing final data subsets in the following areas: Christian and Christian-related texts; Queens of Britain 1510-1946; female authors, 1800-1900 (here's a heatmap this student group produced of the spread of published titles by female authors in the 19th century); Shakespearean works, other author’s adaptations on those works, and any commentary on Shakespeare or his writing; and travel-related books.

In particular, it was excellent to see students fully engaging with the research process around their chosen data subset – exploring its cultural and institutional contexts, as well as navigating metadata/data schemas, requirements and standards.

For example, the Christian texts group considered the issue of different languages as part of their data subset of texts, following this up with textual content analysis to enable accurate record querying and selection. In their project report they noted that “[u]sing our dataset and visualisations as aids, we hope that researchers studying the Bible and Christianity can discover insights into the geographical and temporal spread of Christian-related texts. Furthermore, we hope that they can also glean new information regarding the people behind the translations of Bibles as well as those who wrote about Christianity.”

Similarly, the student group focused on travel-related texts discussed in their team project summary that “[t]he particular value of this curated dataset is that future researchers may be able to use it in the analysis of international points of view. In these works, many cities and nations are being written about from an outside perspective. This perspective is one that can be valuable in understanding historical relations and frames of reference between groups around the world: for instance, the work “Travels in France and Italy, in 1817 and 1818”, published in New York, likely provides an American perspective of Europe, while “Four Months in Persia, and a Visit to Trans-Caspia”, published in London, might detail an extended visit of a European in Persia, both revealing unique perspectives about different groups of people. A comparable work, that may have utilized or benefitted from such a collection, is Hahner’s (1998) “Women Through Women’s Eyes: Latin American Women in Nineteenth Century Travel Accounts.” In it, Hahner explores nineteenth century literature written to unearth the perspectives on Latin American women, specifically noting that the primarily European author’s writings should be understood in the context of their Eurocentric view, entrenched in “patriarchy” and “colonialism” (Hahner, 1998:21). Authors and researchers with a similar intent may use [our] curated British Library dataset comparably – that is, to locate such works.”

Data visualisation by travel books group

Over the ten weeks of the module, alongside their group data curation projects, students covered lecture topics as varied as Is a Star a Document?; "Truthiness" and Truth in a Post-Truth World; Organising Information: Classification, Taxonomies and Beyond!; and Information & Power. They also worked on an individual archival GIF project, drawing on an institutional archival collection to create (and publish on social media) an animated GIF, and spent time in classroom discussions considering questions such as:

  • What happens when information is used for dis-informing or mis-informing purposes?
  • How do the technologies available to us in the 21st century potentially impact on the (data) collection process and its outputs and outcomes?
  • How might ideas about collections and collecting be transformed in a digital context?
  • What exactly do we mean by the concepts of Data and Information?
  • Choosing how to classify or group something first requires a series of "rules" or instructions that determine the grouping process – but who decides what the rules are, and how might such decisions influence our very understanding of the information the system is supposedly designed to facilitate access to?

These dialogues were all situated within the context of both "traditional" collections systems and atypical sites of information storage and collection. The module aimed to give students an in-depth knowledge, understanding and critical appreciation of the concept of information, from historical antecedents to digital scientific and cultural heritage forms, in the context of libraries, archives, galleries and museums (including alternative, atypical and emergent sources), and of how technological, social, cultural and other changes fundamentally affect our concept of “information.”

“I think this module was particularly helpful in making me look at things in an interdisciplinary light”, one student observed in module evaluation feedback, with others going on to note that “I think the different formats of work we had to do was engaging and made the coursework much more interesting than just papers or just a project … the collaboration with the British Library deeply enriched the experience by providing a direct and visible outlet for any energies expended on the module. It made the material seem more applicable and the coursework more enjoyable … I loved that this module offered different ways of assessment. Having papers, projects, presentations, and creative multimedia work made this course engaging.”

Situating the module’s assessments within such contexts, I hope, encouraged students to understand the critical, interdisciplinary focus of the field of information studies, in particular the use of information in the context of empire-making and consolidation, and how histories of information, knowledge and power intersect. Combined with a collaborative, interdisciplinary curriculum design approach, which encouraged and supported students to gain technical abilities and navigate teamwork practices, we hope this module can point to some useful ways forward in creating and developing engaging learning experiences that have real-world impact.

This blog post is by Sara Wingate-Gray (UCL Senior Teaching Fellow & BASC0033 module leader), Mahendra Mahey (BL Labs Manager) and Stella Wisdom (BL Digital Curator for Contemporary British Collections).

25 January 2019

BL Labs 2018 Artistic Award Winner: 'Another Intelligence Sings'

Add comment

This guest blog is by the winners of the BL Labs Artistic Award for 2018, Robert Walker, Rose Leahy and Amanda Baum, for 'Another Intelligence Sings'.

When the natural world is recorded, it is quantised for the human ear, to wavelengths within our perception and timeframes within our conception. Yet the machine learning algorithm sits outside the human sensorium, outside the human lifespan. An algorithm is agnostic to the source, the intention and the timescale of data. By feeding it audio samples of lava and larvae, geological tensions and fleeting courtship, the seismic and the somatic, the many voices of life are woven into a song no one lifespan or life form could sing.

Another Intelligence Sings (AI Sings) is an immersive audio-tactile installation inviting you to experience the sounds of our biological world as recounted through an AI. Through the application of neural networks to field recordings from the British Library sound archive, a nonhuman reading of the data emerges. Presenting an alternative composition of Earth’s songs, AI Sings explores an expanded view of what might be perceived as intelligent.

The breadth of the British Library Wildlife and Environmental Sounds archive enabled us to take a cross section of the natural world, from primordial physical phenomena to the great beasts of the savannas to the songbirds of the British countryside. The final soundscape was created using two different neural networks, WaveNet and NSynth. We trained WaveNet, Google’s most advanced human speech synthesis neural network, on many hours of field recordings, including those from the British Library archives.

NSynth is an augmented version of WaveNet that was built and trained by Magenta, Google AI’s creative lab. NSynth creates sounds that are not a simple crossfade or blend but something genuinely new, based on the perceived formal musical qualities of the two source sounds. We used it to create mixtures between specific audio samples: for example, sea lion meets mosquito, leopard meets horse, and mealworm meets ocean.
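NSynth's actual encoder and decoder are deep WaveNet-style networks, but the core idea, blending two sounds in a learned latent space rather than crossfading their waveforms, can be sketched in a few lines. Everything here (the tiny lists standing in for latent codes, the `interpolate` helper) is a hypothetical illustration, not the Magenta API:

```python
# Conceptual sketch only: in the real system, an encoder network maps each
# audio clip to a latent code, the codes are blended, and a decoder network
# synthesises new audio. Here short lists of floats stand in for latent codes.

def interpolate(code_a, code_b, t=0.5):
    """Blend two latent codes elementwise: t=0 gives code_a, t=1 gives code_b."""
    return [(1 - t) * a + t * b for a, b in zip(code_a, code_b)]

sea_lion = [0.8, -0.2, 0.5]   # stand-in latent code for one source sound
mosquito = [0.0, 0.6, -0.1]   # stand-in latent code for the other

hybrid = interpolate(sea_lion, mosquito)  # "sea lion meets mosquito"
print(hybrid)
```

Because the blend happens in the learned representation, the decoded result inherits structural qualities of both sources rather than simply layering one recording over the other.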

Click here to play a 4 minute clip of the sound from the installation.

Through this use of the technology, AI Sings reorients the algorithm’s focus, away from the human expression of individual thought and towards an amalgam of geological and biological processes. The experience aims to enable humans to meditate on the myriad intelligences around and beyond us and expand our view of what might be perceived as intelligent. This feeds into our ongoing body of shared work, which raises questions about the use of artificial intelligence in society. Previously, we have used a neural network to find linguistic patterns not perceivable to human reading to mediate our collectively written piece Weaving Worlds (2016). In AI Sings we continue this thread of asking which perspectives an AI can bring that human perception cannot.


AI Sings takes digital archive content and makes it into a tactile, sensuous, and playful experience. By making the archive material an experiential encounter, we were able to encourage listeners to enter into a world where they could be immersed and engaged in the data. Soft, tactile materials such as hair and foam invited people to enter into and interact with the work. In particular, we found that the playful nature of the materials in the piece meant that children were keen to experience the work, and listen to the soundscape, thereby extending the audience of the archive material to one it may not usually reach.

By addressing the need for experiential, visceral and poetic encounters with AI, Another Intelligence Sings goes beyond the conceptual and engages people in the technology which is so rapidly transforming society. We hope this work shows how the creative application of AI opens up new possibilities in the field of archivology, from being a tool of categorisation to becoming a means of expanding the cultural role of the library in the future.

The piece premiered at the V&A Digital Design Weekend 2018 on 22nd of September as part of the London Design Festival, where it was exhibited to over 22,000 visitors. Following the weekend we were invited by Open Cell, London’s newly opened bioart and biodesign studio and exhibition space, to showcase the work on their site.

More about the project can be found on our websites:

www.baumleahy.com + www.irr.co + www.amandabaum.com + www.roseleahy.com

Watch the AI Sings team receiving their award and talking about their project on our YouTube channel (clip runs from 8.18):

 

Find out more about Digital Scholarship and BL Labs. If you have a project which uses British Library digital content in innovative and interesting ways, consider applying for an award this year! The 2019 BL Labs Symposium will take place on Monday 11 November at the British Library.

14 March 2018

Working with BL Labs in search of Sir Jagadis Chandra Bose

Add comment

The 19th Century British Library Newspapers Database offers a rich mine of material for a comprehensive view of British life in the nineteenth and early twentieth centuries. The online archive comprises 101 full-text titles of local, regional, and national newspapers across the UK and Ireland, and thanks to optical character recognition they are all fully searchable. This allows for extensive data mining across several million newspaper pages. It’s like going through the proverbial haystack looking for the equally proverbial needle, but with a magnet in hand.
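A magnet-in-hand search of OCR'd text can be as simple as a regular expression with spelling variants, allowing for OCR errors and the period's inconsistent transliteration of names. The pattern and the page snippets below are invented for illustration, not drawn from the actual database:

```python
# Toy sketch: filtering OCR'd newspaper text for mentions of a name,
# accepting both "Jagadis" and "Jagadish" and an optional middle name.
# The page snippets are invented stand-ins for article text.

import re

PATTERN = re.compile(r"\bJagadis?h?\s+(?:Chandra\s+)?Bose\b", re.IGNORECASE)

pages = [
    "Sir Jagadis Chandra Bose, the man who discovered a heart beat in trees...",
    "The Nobel Prize in Physics was awarded to Marconi and Braun in 1909.",
    "A lecture by Sir Jagadish Bose at Whitehall drew a large audience.",
]

hits = [page for page in pages if PATTERN.search(page)]
print(len(hits))  # 2
```

Against the real database the same idea scales up: a query with variant spellings pulls candidate articles out of millions of pages, which can then be read closely, as in the obituaries discussed below.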

For my current research project on the role of the radio during the British Raj, I wanted to find out more about Sir Jagadis Chandra Bose (1858–1937), whose contributions to the invention of wireless telegraphy were hardly acknowledged during his lifetime and all but forgotten during the twentieth century.

Jagadish Chandra Bose in Royal Institution, London
(Image from Wikimedia Commons)

The person who is generally credited with having invented the radio is Guglielmo Marconi (1874–1937). In 1909, he and Karl Ferdinand Braun (1850–1918) were awarded the Nobel Prize in Physics “in recognition of their contributions to the development of wireless telegraphy”. What is generally not known is that almost ten years before that, Bose invented a coherer that would prove to be crucial for Marconi’s successful attempt at wireless telegraphy across the Atlantic in 1901. Bose never patented his invention, and Marconi reaped all the glory.

In his book Jagadis Chandra Bose and the Indian Response to Western Science, Subrata Dasgupta gives us four reasons as to why Bose’s contributions to radiotelegraphy have been largely forgotten in the West throughout the twentieth century. The first reason, according to Dasgupta, is that Bose changed his research interests around 1900. Instead of continuing and focusing his work on wireless telegraphy, Bose became interested in the physiology of plants and the similarities between inorganic and living matter in their responses to external stimuli. Bose’s name thus lost currency in his former field of study.

A second reason that contributed to the erasure of Bose’s name is that he did not leave a legacy in the form of students. He did not, as Dasgupta puts it, “found a school of radio research” that could promote his name despite his personal absence from the field. Also, and thirdly, Bose sought no monetary gain from his inventions and only patented one of his several inventions. Had he done so, chances are that his name would have echoed loudly through the century, just as Marconi’s has done.

“Finally”, Dasgupta writes, “one cannot ignore the ‘Indian factor’”. Dasgupta wonders how seriously the scientific western elite really took Bose, who was the “outsider”, the “marginal man”, the “lone Indian in the hurly-burly of western scientific technology”. And he wonders how this affected “the seriousness with which others who came later would judge his significance in the annals of wireless telegraphy”.

And this is where the BL’s online archive of nineteenth-century newspapers comes in. Looking at newspaper coverage about Bose in the British press at the time suggests that Bose’s contributions to wireless telegraphy were soon to be all but forgotten during his lifetime. When Bose died in 1937, Reuters Calcutta put out a press release that was reprinted in several British newspapers. As an example, the following notice was published in the Derby Evening Telegraph of November 23rd, 1937, on Bose’s death:

Newspaper clipping announcing death of JC Bose
Notice in the Derby Evening Telegraph of November 23rd, 1937

This notice is as short as it is telling in what it says and does not say about Bose and his achievements: he is remembered as the man “who discovered a heart beat in trees”. He is not remembered as the man who almost invented the radio. He is remembered for the Western honours that were bestowed upon him (the Knighthood and his Fellowship of the Royal Society), and he is remembered as the founder of the Bose Research Institute. He is not remembered for his career as a researcher and inventor; a career that spanned five decades and saw him travel extensively in India, Europe and the United States.

The Derby Evening Telegraph is not alone in this act of partial remembrance. Similar articles appeared in Dundee’s Evening Telegraph and Post and The Gloucestershire Echo on the same day. The Aberdeen Press and Journal published a slightly extended version of the Reuters press release on November 24th that includes a brief account of a lecture by Bose in Whitehall in 1929, during which Bose demonstrated “that plants shudder when struck, writhe in the agonies of death, get drunk, and are revived by medicine”. However, there is again no mention of Bose’s work as a physicist or of his contributions to wireless telegraphy. The same is true for obituaries published in The Nottingham Evening Post on November 23rd, The Western Daily Press and Bristol Mirror on November 24th, another article published in the Aberdeen Press and Journal on November 26th, and two articles published in The Manchester Guardian on November 24th.

The exception to the rule is the obituary published in The Times on November 24th. Granted, with a total of 1116 words it is significantly longer than the Reuters press release, but this is also partly the point, as it allows for a much more comprehensive account of Bose’s life and achievements. But even if we only take the first two sentences of The Times obituary, which roughly add up to the word count of the Reuters press release, we are already presented with a different account altogether:

“Our Calcutta Correspondent telegraphs that Sir Jagadis Chandra Bose, F.R.S., died at Giridih, Bengal, yesterday, having nearly reached the age of 79. The reputation he won by persistent investigation and experiment as a physicist was extended to the general public in the Western world, which he frequently visited, by his remarkable gifts as a lecturer, and by the popular appeal of many of his demonstrations.”

We know that he was a physicist; the focus is on his skills as a researcher and on his talents as a lecturer rather than on his Western titles and honours, which are mentioned in passing as titles to his name; and we immediately get a sense of the significance of his work within the scientific community and for the general public. And later on in the article, it is finally acknowledged that Bose “designed an instrument identical in principle with the 'coherer' subsequently used in all systems of wireless communication. Another early invention was an instrument for verifying the laws of refraction, reflection, and polarization of electric waves. These instruments were demonstrated on the occasion of his first appearance before the British Association at the 1896 meeting at Liverpool”.

Posted by BL Labs on behalf of Dr Christin Hoene, a BL Labs Researcher in Residence at the British Library. Dr Hoene is a Leverhulme Early Career Fellow in English Literature at the University of Kent. 

If you are interested in working with the British Library's digital collections, why not come along to one of our events that we are holding at universities around the UK this year? We will be holding a roadshow at the University of Kent on 25 April 2018. You can see a programme for the day and book your place through this Eventbrite page. 

09 September 2016

BL Labs Symposium (2016): book your place for Mon 7th Nov 2016

Add comment


Posted by Hana Lewis, BL Labs Project Officer.

The BL Labs team are pleased to announce that the fourth annual British Library Labs Symposium will be held on Monday 7th November, from 9:30 - 17:30 in the British Library Conference Centre, St Pancras. The event is free, although you must book a ticket in advance. Don't miss out!

The Symposium showcases innovative projects which use the British Library’s digital content, and provides a platform for development, networking and debate in the Digital Scholarship field.

Professor Melissa Terras will be giving the keynote at this year's Symposium

This year, Dr Adam Farquhar, Head of Digital Scholarship at the British Library, will launch the Symposium. This will be followed by a keynote from Professor Melissa Terras, Director of UCL Centre for Digital Humanities. Roly Keating, Chief Executive of the British Library, will present awards to the BL Labs Competition (2016) finalists, who will also give presentations on their winning projects. 

After lunch, Stella Wisdom, Digital Curator at the British Library, will announce the winners of the Shakespeare Off the Map 2016 competition, which challenged budding designers to use British Library digital collections as inspiration in the creation of exciting interactive digital media. The winners of the BL Labs Awards (2016), which recognise projects that have used the British Library’s digital content in exciting and innovative ways, will then be announced, and presentations will be given by the winners in each of the Awards’ categories: Research, Commercial, Artistic and Teaching / Learning. A British Library Staff Award will also be presented this year, recognising an outstanding individual or team who has played a key role in innovative work with the British Library's digital collections.

The Symposium will close with an endnote, followed by a networking reception at which delegates and staff can mingle over a drink.

So book your place for the Symposium today!

For any further information please contact labs@bl.uk