THE BRITISH LIBRARY

Digital scholarship blog


20 October 2020

The Botish Library: developing a poetry printing machine with Python


This is a guest post by Giulia Carla Rossi, Curator of Digital Publications at the British Library. You can find her @giugimonogatari.

In June 2020 the Office for Students announced a campaign to fill 2,500 new places on artificial intelligence and data science conversion courses in universities across the UK. While I’m not planning to retrain in cyber, I was lucky enough to be in the cohort for the trial run of one of these courses: Birkbeck’s Postgraduate Certificate in Applied Data Science. The course started as a collaborative project between The British Library, The National Archives and Birkbeck University to develop a computing course aimed at professionals working in the cultural heritage sector. The trial run has now ended and the course is set to start in full from January 2021.

The course is designed for graduates who are new to computer science – which was perfect for me, as I had no previous coding knowledge besides some very basic HTML and CSS. It was a very steep learning curve, starting from scratch and ending with developing my own piece of software, but it was great to see how code could be applied to everyday issues to facilitate and automate parts of our workload. The fact that it was targeted at information professionals and that we could use existing datasets to learn from real life examples made it easier to integrate study with work. After a while, I started to look at the everyday tasks on my to-do list and wonder “Can this be solved with Python?”

After a taught module (Demystifying Computing with Python), students had to work on an individual project module and develop a piece of software based on their work (to solve an issue, facilitate a task, re-use and analyse existing resources). I had an idea of the themes I wanted to explore – as Curator of Digital Publications, I’m interested in new media and platforms used to deliver content, and how text and stories are shaped by these tools. When I read about French company Short Édition and the short story vending machine in Canary Wharf I knew I had found my project.

My project is to build a stand-alone printer that prints random poems from a dataset of out-of-copyright texts. A little portable Bot-ish (sic!) Library to showcase the British Library collections and fill the world with more poetry.

This is a compilation of two images, a portable printer and a design sketch of the same by the author.
A Short Story Station in Canary Wharf, London and my own sketch of a printing machine. (photo by the author)

 

Finding poetry

For my project, I decided to use the British Library’s “Digitised printed books (18th-19th century)” collection. This comprises over 60,000 volumes of 18th and 19th century texts, digitised in partnership with Microsoft and made available under Public Domain Mark. My work focused on the metadata dataset and the dataset of OCR derived text (shout out to the Digital Research team for kindly providing me with this dataset, as its size far exceeded what my computer is able to download).

The British Library actively encourages researchers to use its “digital collection and data in exciting and innovative ways” and projects with similar goals to mine had been undertaken before. In 2017, Dr Jennifer Batt worked with staff at the British Library on a data mining project: her goal was to identify poetry within a dataset of 18th Century digitised newspapers from the British Library’s Burney Collection. In her research, Batt argued that employing a set of recurring words didn’t help her find poetry within the dataset, as only very few of the poems included key terms like ‘stanza’ and ‘line’ – and none included the word ‘poem’. In my case, I chose to work with the metadata dataset first, as a way of filtering books based on their title, and while, as Batt proved, it’s unlikely that a poem itself includes a term defining its poetry style, I was quite confident that such terms might appear in the title of a poetry collection.

My first step then was to identify books containing poetry, by searching through the metadata dataset using key words associated with poetry. My goal was not to find all the poetry in the dataset, but to identify books containing some form of poetry, that could be reused to create my printer dataset. I used the Poetry Foundation’s online “Glossary of Poetic Terms - Forms & Types of Poems” to identify key terms to use, eliminating the anachronisms (no poetry slam in the 19th century, I'm afraid) and ambiguous terms (“romance” returned too many results that weren’t relevant to my research). The result was 4580 book titles containing one or more poetry-related words.
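As a rough sketch of this filtering step (the field names, record shape and short term list below are my own illustration, not the dataset’s actual schema), the title search might look like:

```python
import re

# A few key terms drawn from the Poetry Foundation glossary, minus
# anachronisms and ambiguous words like "romance" (a short illustrative
# list, not the full list used for the project).
POETRY_TERMS = ["poem", "sonnet", "ballad", "rhyme", "verse", "ode", "elegy"]

# Match whole words only (with an optional plural "s"), so that e.g.
# "universe" does not match "verse".
PATTERN = re.compile(r"\b(" + "|".join(POETRY_TERMS) + r")s?\b", re.IGNORECASE)

def find_poetry_titles(records):
    """Keep the metadata records whose title mentions a poetry term."""
    return [r for r in records if PATTERN.search(r.get("title", ""))]

# Hypothetical records in the rough shape of the metadata dataset
sample = [
    {"id": "000000001", "title": "Sonnets and Other Verses"},
    {"id": "000000002", "title": "A History of Birmingham"},
]
poetry = find_poetry_titles(sample)  # keeps only the first record
```

The word-boundary regex matters here: plain substring matching would flag every title containing “universe” or “code” as poetry.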

 

A screenshot showing key terms defined as 'poem, sonnet, ballad, rhyme, verse etc.
My list of poetry terms used to search through the dataset

 

 

Creating verses: when coding meets grammar

I then wanted to extract individual poems from my dataset. The variety of book structures and poetry styles made it impossible to find a blanket rule that could be applied to all books. I chose to test my code out on books that I knew had one poem per page, so that I could extract pages and easily get my poems. Because of its relatively simple structure - and possibly because of some nostalgia for my secondary school Italian class - I started my experiments with Giacomo Pincherle’s 1865 translation of Dante’s sonnets, “In Omaggio a Dante. Dante's Memorial. [Containing five sonnets from Dante, Petrarch and Metastasio, with English versions by G. Pincherle, and five original sonnets in English by G. Pincherle.]”

Once I solved the problem of extracting single poems, the issue was ‘reshaping’ the text to match the print edition. Line breaks are essential to the meaning of a poem and the OCR text was just one continuous string of text that completely disregarded the metric and rhythm of the original work. The rationale behind my choice of book was also that sonnets present a fairly regular structure, which I was hoping could be of use when reshaping the text. The idea of using the poem’s metre as a tool to determine line length seemed the most effective choice: by knowing the type of metre used (iambic pentameter, terza rima, etc.) it’s possible to anticipate the number of syllables for each line and where line breaks should occur.

So I created a function to count how many syllables a word has following English grammar rules. As is often the case with coding, someone has likely already encountered the same problem as you and, if you’re lucky, they have found a solution: I used a function found online as my base (thank you, StackOverflow), building on it in order to cover as many grammar rules (and exceptions) as I was aware of. I used the same model and adapted it to Italian grammar rules, in order to account for the Italian sonnets in the book as well. I then decided to combine the syllable count with the use of capitalisation at the beginning of a line. This increased the chances of a successful result in case the syllable count returned a wrong result (which might happen whenever typos appear in the OCR text).
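A minimal sketch of this approach (not the actual project code: the counter below is the common vowel-group heuristic of the kind found on StackOverflow, and the ten-syllable target assumes an iambic pentameter sonnet):

```python
def count_syllables(word):
    """Rough English syllable count: count groups of consecutive vowels,
    then discount a silent final "e" (heuristic only; typos in the OCR
    text will still produce wrong counts)."""
    word = word.lower().strip(".,;:!?\"'")
    if not word:
        return 0
    vowels = "aeiouy"
    count, prev_was_vowel = 0, False
    for ch in word:
        is_vowel = ch in vowels
        if is_vowel and not prev_was_vowel:
            count += 1
        prev_was_vowel = is_vowel
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1  # silent "e", as in "verse"
    return max(count, 1)

def reshape_poem(ocr_text, syllables_per_line=10):
    """Greedily rebuild line breaks from a continuous OCR string: start
    a new line at the first capitalised word once the syllable target is
    reached (the print editions begin each line with a capital)."""
    lines, current, count = [], [], 0
    for word in ocr_text.split():
        if current and count >= syllables_per_line and word[0].isupper():
            lines.append(" ".join(current))
            current, count = [], 0
        current.append(word)
        count += count_syllables(word)
    if current:
        lines.append(" ".join(current))
    return lines
```

Combining the two signals is what makes this workable: the capitalisation check stops a slightly wrong syllable count from shifting every subsequent line break.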

 

An image showing the poem 'To My Father', both written as a string of lines, and in its original form
The same sonnet restructured so that each line is a new string (above), and matches the line breaks in the print edition (below)

 

It was very helpful that all books in the datasets were digitised and are available to access remotely (you can search for them on the British Library catalogue by using the search term “blmsd”), so I could check and compare my results to the print editions from home even during lockdown. I also tested my functions on sonnets from Henry Thomas Mackenzie Bell’s “Old Year Leaves Being old verses revived. [With the addition of two sonnets.]” and Welbore Saint Clair Baddeley’s “Legend of the Death of Antar, an eastern romance. Also lyrical poems, songs, and sonnets.”

Another image showing a poem, this time a sonnet, written as both a string of lines, and in its original form
Example of sonnet from Legend of the Death of Antar, an eastern romance. The function that divides the poems into lines could be adapted to accommodate breaks between stanzas as well.

 

Main challenges and gaps in research

  • Typos in the OCR text: Errors and typos were introduced when the books in the collection were first digitised, which translated into exceptions to the rules I devised for identifying and restructuring poems. In order to ensure the text of every poem has been correctly captured and that typos have been fixed, some degree of manual intervention might be required.
  • Scalability: The variety of poetry styles and book structures, paired with the lack of tagging around verse text, make it impossible to find a single formula that can be applied to all cases. What I created is quite dependent on a book having one poem per page, and using capitalisation in a certain way.
  • Time constraint: The time limit we had to deliver the project - and my very-recently-acquired-and-still-very-much-developing skill set - meant I had to focus on a limited number of books and had to prioritise writing the software over building the printer itself.

 

Next steps

One of the outputs of this project is a JSON file containing a dictionary of poetry books. After searching for poetry terms, I paired the poetry titles and their metadata with the corresponding pages from the OCR dataset, so the resulting file combines useful data from the two original datasets (book IDs, titles, authors’ names and the OCR text of each book). It’s also slightly easier to navigate compared to the OCR dataset as books can be retrieved by ID, and each page is an item in a list that can be easily called. One of the next steps will be to upload this onto the British Library data repository, in the hope that people might be encouraged to use it and conduct further research around this data collection.
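As an illustration of the shape this file takes (the field names and book ID below are my invention, not the published schema):

```python
import json

# Hypothetical entry: books keyed by ID, each holding its metadata
# plus a list of OCR pages
poetry_books = {
    "000000001": {
        "title": "In Omaggio a Dante. Dante's Memorial.",
        "author": "Pincherle, Giacomo",
        "pages": ["...OCR text of page 1...", "...OCR text of page 2..."],
    }
}

# Round-trip through JSON, then retrieve a book by its ID and a page
# by its position in the list
books = json.loads(json.dumps(poetry_books))
first_page = books["000000001"]["pages"][0]
```

Keying books by ID and keeping pages as a plain list is what makes this easier to navigate than the raw OCR dataset: both lookups are a single indexing operation.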

Another, very obvious, next step is: building the printer! The individual components have already been purchased (Adafruit IoT Pi Printer Project Pack and Raspberry Pi 3). I will then have to build the thermal printer with Raspberry Pi and connect it to my poetry dataset. It’s interesting to note that other higher education institutions and libraries have been experimenting with similar ideas - like the University of Idaho Library’s Vandal Poem of the Day Bot and the University of British Columbia’s randomised book recommendations printer for libraries.

A photograph of technical components
Component parts of the Adafruit IoT Pi Printer Project Pack. (photo by the author)
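Once assembled, the printing loop itself could be very small. Here is a sketch with the thermal printer replaced by a stand-in `printer` callable (the real device would be driven from Python on the Raspberry Pi; 32 characters per line is a typical width for small thermal printers, but check your model):

```python
import random
import textwrap

def print_random_poem(poems, width=32, printer=print):
    """Pick a poem at random and send it, line by line, to `printer`
    (a stand-in for the thermal printer's print method)."""
    title, lines = random.choice(list(poems.items()))
    printer(title.upper())
    printer("-" * width)
    for line in lines:
        # Wrap long lines to the printer width; keep blank lines
        for chunk in textwrap.wrap(line, width) or [""]:
            printer(chunk)

# Collect output in a list instead of sending it to real hardware
output = []
print_random_poem({"To My Father": ["First line of the sonnet,", ""]},
                  printer=output.append)
```

Passing the printer in as a callable keeps the poem-selection logic testable without the hardware attached.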

My aim when working on this project was for the printer to be used to showcase British Library collections; the idea was for it to be located in a public area in the Library, to reach new audiences that might not necessarily be there for research purposes. The printer could also be reprogrammed to print different genres and be customised for different occasions (e.g. exhibitions, anniversary celebrations, etc.). All of this was planned before Covid-19 happened, so it might be necessary to slightly adapt things now - and any suggestions on this are very welcome! :)

Finally, none of this would have been possible without Nora McGregor, Stelios Sotiriadis, Peter Wood, the Digital Scholarship and BL Labs teams, and the support of my line manager and my team.

19 October 2020

The 2020 British Library Labs Staff Award - Nominations Open!


Looking for entries now!

A set of 4 light bulbs presented next to each other, the third light bulb is switched on. The image is supposed to be a metaphor representing an 'idea'
Nominate an existing British Library staff member or a team that has done something exciting, innovative and cool with the British Library’s digital collections or data.

The 2020 British Library Labs Staff Award, now in its fifth year, gives recognition to current British Library staff who have created something brilliant using the Library’s digital collections or data.

Perhaps you know of a project that developed new forms of knowledge, or an activity that delivered commercial value to the library. Did the person or team create an artistic work that inspired, stimulated, amazed and provoked? Do you know of a project developed by the Library where quality learning experiences were generated using the Library’s digital content? 

You may nominate a current member of British Library staff, a team, or yourself (if you are a member of staff), for the Staff Award using this form.

The deadline for submission is 0700 (GMT), Monday 30 November 2020.

Nominees will be highlighted on Tuesday 15 December 2020 at the online British Library Labs Annual Symposium where some (winners and runners-up) will also be asked to talk about their projects (everyone is welcome to attend, you just need to register).

You can see the projects submitted by members of staff and public for the awards in our online archive.

Last year's winner (2019) focused on the brilliant work of the Imaging Team for the 'Qatar Foundation Partnership Project Hack Days', which were sessions organised for the team to experiment with the Library's digital collections.

The runner-up for the BL Labs Staff Award in 2019 was the Heritage Made Digital team and their social media campaign to promote the British Library's digital collections one language a week from letters 'A' to 'U' (#AToUnknown).

In the public Awards, last year's winners (2019) drew attention to artistic, research, teaching & learning, and community activities that used our data and / or digital collections.

British Library Labs is a project within the Digital Scholarship department at the British Library that supports and inspires the use of the Library's digital collections and data in exciting and innovative ways. It was previously funded by the Andrew W. Mellon Foundation and is now solely funded by the British Library.

If you have any questions, please contact us at labs@bl.uk.

25 September 2020

Making Data Into Sound


This is a guest post by Anne Courtney, Gulf History Cataloguer with the Qatar Digital Library, https://www.qdl.qa/en 

Sonification

Over the summer, I’ve been investigating the sonification of data. On the Qatar Digital Library (QDL) project, we generate a large amount of data, and I wanted to experiment with different methods of representing it. Sonification was a new technique for me, which I learnt about through this article: https://programminghistorian.org/en/lessons/sonification.

 

What is sonification?

Sonification is a method of representing data in an aural format rather than a visual one, such as a graph. It is particularly useful for showing changes in data over time. Different trends are highlighted depending on the choices made during the process, in the same way as they would be when drawing a graph.

 

How does it work?

First, all the data must be put in the right format:

An example of data in Excel showing listed longitude points of
Figure 1: Excel data of longitude points where the Palsgrave anchored

Then, the data is used to generate a midi file. The Programming Historian provides an example Python script for this; by changing parts of it, it is possible to change the tempo, note length, scale, and other features.

Figure 2: Python script ready to output a midi file of occurrences of Anjouan over time
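At the heart of such a script is a mapping from data values onto MIDI pitches. A rough sketch, with illustrative value ranges (the Programming Historian lesson uses the MIDITime library for this step and for writing the .mid file itself):

```python
def value_to_pitch(value, lo, hi, pitch_lo=48, pitch_hi=84):
    """Linearly map a data value onto a MIDI pitch range
    (48 = C3, 84 = C6): larger values give higher notes."""
    fraction = (value - lo) / (hi - lo)
    return round(pitch_lo + fraction * (pitch_hi - pitch_lo))

# E.g. longitudes of anchorages between 0 and 90 degrees east:
# the further east, the higher the pitch
pitches = [value_to_pitch(lon, 0, 90) for lon in (0, 45, 90)]  # [48, 66, 84]
```

The choice of pitch range and scale is one of the interpretive decisions mentioned above: it shapes which trends the listener actually hears.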

Finally, to overlay the different midi files, edit them, and change the instruments, I used MuseScore, freely-downloadable music notation software. Other alternatives include LMMS and Garageband:

A music score with name labels of where the Discovery, Palsgrave, and Mary anchored on their journeys, showing different pitches and musical notations.
Figure 3: The score of the voyages of the Discovery, Palsgrave, and Mary, labelled to show the different places where they anchored.

 

The sound of authorities

Each item which the Qatar project catalogues has authority terms linked to it, which list the main subjects and places connected to the item. As each item is dated, it is possible to trace trends in subjects and places over time by assigning the dates of the items to the authority terms. Each authority term ends up with a list of dates when it was mentioned. By assigning different instruments to the different authorities, it is possible to hear how they are connected to each other.
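The grouping step described here can be sketched as follows (the item structure is illustrative, not the QDL catalogue’s actual format):

```python
from collections import defaultdict

def dates_by_authority(items):
    """For each authority term, collect the dates of the catalogued
    items that mention it; each term ends up with a list of dates."""
    timeline = defaultdict(list)
    for item in items:
        for term in item["authorities"]:
            timeline[term].append(item["date"])
    return timeline

# Two hypothetical catalogue items
items = [
    {"date": 1820, "authorities": ["Zanzibar", "Slave Trade"]},
    {"date": 1860, "authorities": ["Zanzibar"]},
]
timeline = dates_by_authority(items)
```

Each term’s list of dates then becomes the note times for the instrument assigned to that term.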

This sound file contains the sounds of places connected with the trade in enslaved people, and how they intersect with the authority term ‘slave trade’. The file begins in 1700 and finishes in 1900. One of the advantages of sonification is that the silence is as eloquent as the data. The authority terms are mentioned more at the end of the time period than the start, and so the piece becomes noisier as the British increasingly concern themselves with these areas. The pitch of the instruments is determined, in this instance, by the months of the records in which they are mentioned.

Authorities

The authority terms are represented by these instruments:

Anjouan: piccolo

Madagascar: cello

Zanzibar: horn

Mauritius: piano

Slave Trade: tubular bell

 

Listening for ships

Ships

This piece follows the journeys of three ships from March 1633 to January 1637. In this example, the pitch is important because it represents longitude; the further east the ships travel, the higher the pitch. The Discovery and the Palsgrave mostly travelled together from Gravesend to India, and they both made frequent trips between the Gulf and India. The Mary set out from England in April 1636 to begin her own journey to India. The notes represent the time the ships spent in harbour, and the silence is the time spent at sea. The Discovery is represented by the flute, the Palsgrave by the violin, and the Mary by the horn.

18 September 2020

Hiring a new Wikimedian in Residence


Are you passionate about helping people and organisations build and preserve open knowledge to share and use freely? Have you got experience organising online events, workshops and training sessions? Then you may be interested in applying to be our new Wikimedian in Residence.

In collaboration with Wikimedia UK, the British Library is working on contributing and improving content, data, and metadata across the Wikimedia family of platforms.

I recently ran a “World of Wikimedia” series of remote guest lectures for Library staff, to inspire my colleagues. To further assist with this work, the Library is hiring a Wikimedian in Residence to join the Digital Scholarship team, on a part-time basis (18 hours per week) for 12 months.

8 people standing outside the entrance of the British Library
A Wikipedians in Residence group photo, taken at GLAMcamp London, 15-16 September 2012 (photo by Rock drum, Wikimedia Commons / CC-BY-SA-3.0)

Since hosting a successful Wikipedian in Residence in 2012 (this was Andrew Gray, standing second in from the right in the above photo; you can read about his residency here), many staff across the British Library have engaged with Wikimedia projects, holding edit-a-thons and adding digital collections to Wikimedia Commons.

Now, with generous funding from the Eccles Centre for American Studies, we are looking for a proactive and self-motivated individual who can coordinate and support these activities. Furthermore, we are hoping for someone who can really help the Library to actively engage with the Wikidata, Wikibase and Wikisource platforms and communities, increasing the visibility and enrichment of data, collections, and research materials which the Library holds about underrepresented populations.

If this sounds like something you can do, then please do apply. The vacancy ref is 03423, the closing date is 8th October 2020 and the interview date is 23rd October 2020. The post is part time, 2.5 days per week, for 12 months, and initially work will be done remotely, in light of the current COVID-19 situation. However, longer term, it is likely that there will be a mix of remote and on-site working.

During my time working in the Library, we have hosted a number of wonderful residencies, including Christopher Green, Rob Sherman and Sarah Cole, who each brought fresh skills, knowledge and enthusiasm, into the Library. So I very much hope that this new residency will do the same.

This post is by Digital Curator Stella Wisdom (@miss_wisdom).

11 September 2020

BL Labs Public Awards 2020: enter before 0700 GMT Monday 30 November 2020!


The sixth BL Labs Public Awards 2020 formally recognises outstanding and innovative work that has been carried out using the British Library’s data and / or digital collections by researchers, artists, entrepreneurs, educators, students and the general public.

The closing date for entering the Public Awards is 0700 GMT on Monday 30 November 2020 and you can submit your entry any time up to then.

Please help us spread the word! We want to encourage anyone interested to submit over the next few months – who knows, you could even win fame and glory, priceless! We really hope to have another year of fantastic projects to showcase at our annual online awards symposium on 15 December 2020 (which is open for registration too), inspired by our digital collections and data!

This year, BL Labs is commending work in four key areas that have used or been inspired by our digital collections and data:

  • Research - A project or activity that shows the development of new knowledge, research methods, or tools.
  • Artistic - An artistic or creative endeavour that inspires, stimulates, amazes and provokes.
  • Educational - Quality learning experiences created for learners of any age and ability that use the Library's digital content.
  • Community - Work that has been created by an individual or group in a community.

What kind of projects are we looking for this year?

Whilst we are really happy for you to submit your work on any subject that uses our digital collections, in this significant year, we are particularly interested in entries that may have a focus on anti-racist work or projects about lockdown / the global pandemic. We are also curious and keen to have submissions that have used Jupyter Notebooks to carry out computational work on our digital collections and data.

After the submission deadline has passed, entries will be shortlisted and selected entrants will be notified via email by midnight on Friday 4th December 2020. 

A prize of £150 in British Library online vouchers will be awarded to the winner and £50 in the same format to the runner up in each Awards category at the Symposium. Of course, if you enter, it will at least be a chance to showcase your work to a wide audience, and in the past this has often resulted in major collaborations.

The talent of the BL Labs Awards winners and runners up over the last five years has led to the production of a remarkable and varied collection of innovative projects described in our 'Digital Projects Archive'. In 2019, the Awards commended work in four main categories – Research, Artistic, Community and Educational:

BL Labs Award Winners for 2019
(Top-Left) Full-Text search of Early Music Prints Online (F-TEMPO) - Research, (Top-Right) Emerging Formats: Discovering and Collecting Contemporary British Interactive Fiction - Artistic
(Bottom-Left) John Faucit Saville and the theatres of the East Midlands Circuit - Community commendation
(Bottom-Right) The Other Voice (Learning and Teaching)

For further detailed information, please visit BL Labs Public Awards 2020, or contact us at labs@bl.uk if you have a specific query.

Posted by Mahendra Mahey, Manager of British Library Labs.

07 September 2020

When is a persistent identifier not persistent? Or an identifier?


Ever wondered what that bar code on the back of every book is? It’s an ISBN: an International Standard Book Number. Every modern book published has an ISBN, which uniquely identifies that book, and anyone publishing a book can get an ISBN for it whether an individual or a huge publishing house. It’s a little more complex than that in practice but generally speaking it’s 1 book, 1 ISBN. Right? Right.

Except…

If you search an online catalogue, such as WorldCat or The British Library for the ISBN 9780393073775 (or the 10-digit equivalent, 0393073777) you’ll find results appear for two completely different books:

  1. Waal FD. The Bonobo and the Atheist: In Search of Humanism Among the Primates. New York: W. W. Norton & Co.; 2013. 304 p. http://www.worldcat.org/oclc/1167414372
  2. Lodge HC. The Storm Has Many Eyes; a Personal Narrative. 1st edition. New York: New York Norton; 1973. http://www.worldcat.org/oclc/989188234

A screen grab of the main catalogue showing a search for ISBN 0393073777 with the above two results

In fact, things are so confused that the cover of one book gets pulled in for the other as well. Investigate further and you’ll see that it’s not a glitch: both books have been assigned the same ISBN. Others have found the same:

“However, if the books do not match, it’s usually one of two issues. First, if it is the same book but with a different cover, then it is likely the ISBN was reused for a later/earlier reprinting. … In the other case of duplicate ISBNs, it may be that an ISBN was reused on a completely different book. This shouldn’t happen because ISBNs are supposed to be unique, but exceptions have been found.” — GoodReads Librarian Manual: ISBN-10, ISBN-13 and ASINS

While most publishers stick to the rules about never reusing an ISBN, it’s apparently common knowledge in the book trade that ISBNs from old books get reused for newer books, sometimes accidentally (due to a typo), sometimes intentionally (to save money), and that has some tricky consequences.

I recently attended a webinar entitled “Identifiers in Heritage Collections - how embedded are they?” from the Persistent Identifiers as IRO Infrastructure (“HeritagePIDs”) project, part of AHRC’s Towards a National Collection programme. As quite often happens, the question was raised: what Persistent Identifier (PID) should we use for books and why can’t we just use ISBNs? Rod Page, who gave the demo that prompted this discussion, also wrote a short follow-up blog post about what makes PIDs work (or not) which is worth a look before you read the rest of this.

These are really valid questions and worth considering in more detail, and to do that we need to understand what makes a PID special. We call them persistent, and indeed we expect some sort of guarantee that a PID remains valid for the long term, so that we can use it as a link or placeholder for the referent without worrying that the link will get broken. But we also expect PIDs to be actionable: an identifier can be made into a valid URL by following some rules, so that we can directly obtain the object referenced, or at least some information about it.

Actionability implies two further properties: an actionable identifier must be

  1. Unique: guaranteed to have only one identifier for a given object (of a given type); and
  2. Unambiguous: guaranteed that a single identifier refers to only one object

Where does this leave us with ISBNs?

Well, first up, they’re not actionable to start with: given an ISBN, there’s no canonical way to obtain information about the book referenced, although in practice there are a number of databases that can help. There is, in fact, an actionable ISBN standard: ISBN-A permits converting an ISBN into a DOI with all the benefits of the underlying DOI and Handle infrastructure. Sadly, creation of an ISBN-A isn’t automatic and publishers have to explicitly create the ISBN-A DOI in addition to the already-created ISBN; most don’t.

More than that though, it’s hard to make them actionable since ISBNs fail on both uniqueness and unambiguity. Firstly, as seen in the example I gave above, ISBNs do get recycled. They’re not supposed to be:

“Once assigned to a monographic publication, an ISBN can never be reused to identify another monographic publication, even if the original ISBN is found to have been assigned in error.” — International ISBN Agency. ISBN Users’ Manual [Internet]. Seventh Edition. London, UK: International ISBN Agency; 2017 [cited 2020 Jul 23]. Available from: https://www.isbn-international.org/content/isbn-users-manual

Yet they are, so we can’t rely on a given ISBN referring to only one book.[1]

Secondly, and perhaps more problematic in day-to-day use, a given book may have multiple ISBNs. To an extent this is reasonable: different editions of the same book may have different content, or at the very least different page numbering, so a PID should be able to distinguish these for accurate citation. Unfortunately the same edition of the same book will frequently have multiple ISBNs; in particular each different format (hardback, paperback, large print, ePub, MOBI, PDF, …) is expected to have a distinct ISBN. Even if all that changes is the publisher, a new ISBN is still created:

“We recently encountered a case where a publisher had licensed a book to another publisher for a different geographical market. Both books used the same ISBN. If the publisher of the book changes (even if nothing else about the book has changed), the ISBN must also change.” — Everything you wanted to know about the ISBN but were too afraid to ask

Again, this is reasonable since the ISBN is primarily intended for stockkeeping by book sellers[2], and for them the difference between a hardback and paperback is important because they differ in price if nothing else. This has bitten more than one librarian when trying to merge data from two different sources (such as usage and pricing) using the ISBN as the “obvious” merge key. It makes bibliometrics harder too, since you can’t easily pull out a list of all citations of a given edition in the literature, just from a single ISBN.

So where does this leave us?

I’m not really sure yet. ISBNs as they are currently specified and used by the book industry aren’t really fit for purpose as a PID. But they’re there and they sort-of work and establishing a more robust PID for books would need commitment and co-operation from authors, publishers and libraries. That’s not impossible: a lot of work has been done recently to make the ISSN (International Standard Serial Number, for journals) more actionable.

But perhaps there are other options. Where publishers, booksellers and libraries are primarily interested in IDs for stock management, authors, researchers and scholarly communications librarians are more interested in the scholarly record as a whole and tracking the flow of ideas (and credit for those) which is where PIDs come into their own. Is there an argument for a coalition of these groups to establish a parallel identifier system for citation & credit that’s truly persistent? It wouldn’t be the first time: ISNIs (International Standard Name Identifiers) and ORCIDs (Open Researcher and Contributor IDs) both identify people, but for different purposes in different roles and with robust metadata linking the two where possible.

I’m not sure where I’m going with this train of thought so I’ll leave it there for now, but I’m sure I’ll be back. The more I dig into this the more there is to find, including the mysterious, long-forgotten and no-longer accessible Book Item & Component Identifier proposal. In the meantime, if you want a persistent identifier and aren’t sure which one you need these Guides to Choosing a Persistent Identifier from Project FREYA should get you started.


  1. Actually, as my colleague pointed out, even DOIs potentially have this problem, although I feel they can mitigate it better with metadata that allows rich expression of relationships between DOIs.  ↩︎

  2. In fact, the newer ISBN-13 standard is simply an ISBN-10 encoded as an “International Article Number”, the standard barcode format for almost all retail products, by sticking the “Bookland” country code of 978 on the front and recalculating the check digit. ↩︎
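The conversion described in footnote 2 is mechanical enough to sketch in a few lines of Python (a toy illustration, not production-grade ISBN handling or validation):

```python
def isbn10_to_isbn13(isbn10: str) -> str:
    """Convert an ISBN-10 to ISBN-13 by prefixing the 'Bookland'
    country code 978 and recalculating the check digit."""
    digits = isbn10.replace("-", "").replace(" ", "")
    if len(digits) != 10:
        raise ValueError("expected a 10-character ISBN-10")
    body = "978" + digits[:9]  # drop the old ISBN-10 check digit
    # EAN-13 check digit: weight digits alternately 1, 3 from the left,
    # then take whatever brings the total up to a multiple of 10.
    total = sum(int(d) * (1 if i % 2 == 0 else 3) for i, d in enumerate(body))
    check = (10 - total % 10) % 10
    return body + str(check)

print(isbn10_to_isbn13("0-306-40615-2"))  # -> 9780306406157
```

Note the check digit changes under conversion (2 becomes 7 here), which is one more reason the "same" book can carry two superficially unrelated identifiers in different datasets.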

04 September 2020

British Library Joins Share-VDE Linked Data Community

Add comment

This blog post is by Alan Danskin, Collection Metadata Standards Manager, British Library. metadata@bl.uk

What is Share-VDE and why has the British Library joined the Share-VDE Community?

Share-VDE is a library-driven initiative bringing library catalogues together in a shared Virtual Discovery Environment. It uses linked data technology to create connections between bibliographic information contributed by different institutions.

Example SVDE page linking Tim Berners-Lee to his publications, Wikipedia, and other external sites
Figure 1: SVDE page for Sir Tim Berners-Lee

For example, searching for Sir Tim Berners-Lee retrieves metadata contributed by different members, including links to his publications. The search also returns links to external sources of information, including Wikipedia.
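As a toy illustration of how linked data makes this possible (the identifiers below, apart from Wikidata's Q80 for Berners-Lee, are invented and are not Share-VDE's actual data model): each fact is a subject-predicate-object triple, so statements contributed by different institutions can be pooled around a shared identifier.

```python
# Hypothetical triples about one entity, contributed by different sources.
# "wikidata:Q80" is the real Wikidata identifier for Tim Berners-Lee;
# the other identifiers are made up for illustration.
triples = [
    ("svde:agent/berners-lee", "owl:sameAs", "wikidata:Q80"),
    ("svde:agent/berners-lee", "creatorOf", "inst_a:work/weaving-the-web"),
    ("svde:agent/berners-lee", "creatorOf", "inst_b:work/www-proposal"),
]

# A discovery environment can gather everything known about the entity,
# regardless of which institution contributed each statement.
about = [(p, o) for s, p, o in triples if s == "svde:agent/berners-lee"]
print(len(about))  # -> 3 statements, pooled from multiple contributors
```

The point is the join on a shared subject identifier: once catalogues agree on (or map between) identifiers, their statements aggregate into one view.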

The British Library will be the first institution to contribute its national bibliography to Share-VDE and we also plan to contribute our catalogue data. By collaborating with the Share-VDE community we will extend access to information about our collections and services and enable information to be reused.

The Library also contributes to Share-VDE by participating in the community groups working to develop the metadata model and Share-VDE functionality. This gives us a practical approach for bridging differences between the IFLA Library Reference Model (LRM) and the BIBFRAME initiative, led by the Library of Congress.

Share-VDE is promoted by the international bibliographic agency Casalini Libri and @Cult, a solutions developer working in the cultural heritage sector.

Andrew MacEwan, Head of Metadata at the British Library, explained that, “Membership of the Share-VDE community is an exciting opportunity to enrich the Library’s metadata and open it up for re-use by other institutions in a linked data environment.”

Tiziana Possemato, Chief Information Officer at Casalini Libri and Director of @Cult, said "We are delighted to collaborate with the British Library and extremely excited about unlocking the wealth of data in its collections, both to further enrich the Virtual Discovery Environment and to make the Library's resources even more accessible to users."

For further information, see:

  • Share-VDE

  • Linked Data

  • Linked Open Data

The British Library is the national library of the United Kingdom and one of the world's greatest research libraries. It provides world class information services to the academic, business, research and scientific communities and offers unparalleled access to the world's largest and most comprehensive research collection. The Library's collection has developed over 250 years and exceeds 150 million separate items representing every age of written civilisation and includes books, journals, manuscripts, maps, stamps, music, patents, photographs, newspapers and sound recordings in all written and spoken languages. Up to 10 million people visit the British Library website - www.bl.uk - every year where they can view up to 4 million digitised collection items and over 40 million pages.

Casalini Libri is a bibliographic agency producing authority and bibliographic data; a library vendor, supplying books and journals, and offering a variety of collection development and technical services; and an e-content provider, working both for publishers and libraries.

@Cult is a software development company, specializing in data conversion for LD; and provider of Integrated Library System and Discovery tools, delivering effective and innovative technological solutions to improve information management and knowledge sharing.

22 July 2020

World of Wikimedia

Add comment

During recent months of working from home, the Wikimedia family of platforms, including Wikidata and Wikisource, have enabled many librarians and archivists to do meaningful work, to enhance and amplify access to the collections that they curate.

I’ve been very encouraged to learn from other institutions and initiatives who have been working with these platforms. So I recently invited some wonderful speakers to give a “World of Wikimedia” series of remote guest lectures for staff, to inspire my colleagues in the British Library.

Circle of logos from the Wikimedia family of platforms
Logos of the Wikimedia Family of platforms

Stuart Prior from Wikimedia UK kicked off this season with an introduction to Wikimedia and the projects within it, and how it works with galleries, libraries, archives and museums. He was followed by Dr Martin Poulter, who had been the Bodleian Library’s Wikimedian In Residence. Martin shared his knowledge of how books, authors and topics are represented in Wikidata, how Wikidata is used to drive other sites, including Wikipedia, and how Wikipedia combines data and narrative to tell the world about notable books and authors.

Continuing with the theme of books, Gavin Willshaw spoke about the benefits of using Wikisource for optical character recognition (OCR) correction and staff engagement. He gave an overview of the National Library of Scotland’s fantastic project to upload 3,000 digitised Scottish chapbooks to Wikisource during the Covid-19 lockdown, focusing on how the project came about, its impact, and how the Library plans to take activity in this area forward in the future.

Illustration of two 18th century men fighting with swords
Tippet is the dandy---o. The toper's advice. Picking lilies. The dying swan, shelfmark L.C.2835(14), from the National Library of Scotland's Scottish Chapbooks collection

Closing the World of Wikimedia season, Adele Vrana and Anasuya Sengupta gave an extremely thought provoking talk about Whose Knowledge? This is a global multilingual campaign, which they co-founded, to centre the knowledges of marginalised communities (the majority of the world) online. Their work includes the annual #VisibleWikiWomen campaign to make women more visible on Wikipedia, which I blogged about recently.

One of the silver linings of the Covid-19 lockdown has been that I’ve been able to attend a number of virtual events that I would not have been able to travel to had they been physical events. These have included the LD4 Wikidata Affinity Group online meetings: a biweekly Zoom call on Tuesdays at 9am PDT (5pm BST).

I’ve also remotely attended some excellent online training sessions: “Teaching with Wikipedia: a practical 'how to' workshop”, run by Ewan McAndrew, Wikimedian in Residence at The University of Edinburgh, and “Wikimedia and Libraries - Running Online Workshops”, organised by the Chartered Institute of Library and Information Professionals in Scotland (CILIPS) and presented by Dr Sara Thomas, Scotland Programme Coordinator for Wikimedia UK and previously the Wikimedian in Residence at the Scottish Library and Information Council. From attending the latter, I learned of an online “How to Add Suffragettes & Women Activists to Wikipedia” half-day edit-a-thon taking place on the 4th July, organised by Sara, Dr t s Beall and Clare Thompson from the Protests and Suffragettes project. This wonderful project recovers and celebrates the histories of women activists in Govan, Glasgow.

We have previously held a number of in-person Wikipedia edit-a-thon events at the British Library, but this was the first time that I had attended one remotely, via Zoom, so this was a new experience for me. I was very impressed with how it had been organised: using breakout rooms for newbies and more experienced editors, building multiple short comfort breaks into the schedule, and setting very do-able, bite-size tasks that were achievable in the time available. They used a comprehensive but easy-to-understand shared spreadsheet for managing the tasks that attendees were working on. This is definitely an approach and a template that I plan to adopt and adapt for any future edit-a-thons I am involved in planning.

Furthermore, it was a very fun and friendly event. The organisers had created We Can [edit]! Zoom background template images for attendees to use, and I learned how to use twinkles on video calls! This is when attendees raise both hands and wiggle their fingers pointing upwards, to indicate agreement with what is being said without causing a sound clash. This hand signal is borrowed from the American Sign Language sign for applause, and it is also used by the Green Party and the Occupy movement.

With enthusiasm fired up from my recent edit-a-thon experience, last Saturday I joined the online Wikimedia UK 2020 AGM. Lucy Crompton-Reid, Chief Executive of Wikimedia UK, gave updates on changes in the global Wikimedia movement, such as implementing the 2030 strategy, rebranding Wikimedia, the Universal Code of Conduct and plans for Wikipedia’s 20th birthday. Lucy also announced that the three trustees who stood for the board, Kelly Foster, Nick Poole and Doug Taylor, were all elected. Nick and Doug have both been on the board since July 2015 and were re-elected, and I was delighted to learn that Kelly is a new trustee joining the board for the first time. Kelly has previously been a trainer at BL Wikipedia edit-a-thon events, and she coached me to create my first Wikipedia article, on Coventry godcakes, at a Wiki-Food and (mostly) Women edit-a-thon in 2017.

In addition to these updates, Gavin Willshaw gave a keynote presentation about the NLS Scottish chapbooks Wikisource project that I mentioned earlier, and there were three lightning talks: Andy Mabbett, 'Wiki Hates Newbies'; Clare Thompson, Lesley Mitchell and Dr t s Beall, 'Protests and Suffragettes: Highlighting 100 years of women’s activism in Govan, Glasgow, Scotland'; and Jason Evans, 'An update from Wales'.

Before the event ended, there was a 2020 Wikimedia UK annual awards announcement, where libraries and librarians did very well indeed:

  • UK Wikimedian of the Year was awarded to librarian Caroline Ball for education work and advocacy at the University of Derby (do admire her amazing Wikipedia dress in the embedded tweet below!)
  • Honourable Mention to Ian Watt for outreach work, training, and efforts around Scotland's COVID-19 data
  • Partnership of the Year was given to National Library of Scotland for the WikiSource chapbooks project led by Gavin Willshaw
  • Honourable Mention to University of Edinburgh for work in education and Wikidata
  • Up and Coming Wikimedian was a joint win to Emma Carroll for work on the Scottish Witch data project and Laura Wood Rose for work at University of Edinburgh and on the Women in Red initiative
  • Michael Maggs was given an Honorary Membership, in recognition of his very significant contribution to the charity over a number of years.

Big congratulations to all the winners. Their fantastic work, and also in Caroline's case, her fashion sense, is inspirational!

For anyone interested, the next online event that I’m planning to attend is a #WCCWiki Colloquium organised by The Women’s Classical Committee, which aims to increase the representation of women classicists on Wikipedia. Maybe I’ll virtually see you there…

This post is by Digital Curator Stella Wisdom (@miss_wisdom).