Digital scholarship blog


20 September 2022

Learn more about Living with Machines at events this autumn

Digital Curator and Living with Machines Co-Investigator Dr Mia Ridge writes…

The Living with Machines research project is a collaboration between the British Library, The Alan Turing Institute and various partner universities. Our free exhibition at Leeds City Museum, Living with Machines: Human stories from the industrial age, opened at the end of July. Read on for information about adult events around the exhibition…

AI evening panels and workshop, September 2022

We’ve put together some great panels with expert speakers guaranteed to get you thinking about the impact of AI with their thought-provoking examples and questions. You'll have a chance to ask your own questions in the Q&A, and to mingle with other attendees over drinks.

We’ve also collaborated with AI Tech North to offer an exclusive workshop looking at the practical aspects of ethics in AI. If you’re using or considering AI-based services or tools, this might be for you. Our events are also part of the jam-packed programme of the Leeds Digital Festival #LeedsDigi22, where we’re in great company.

The role of AI in Creative and Cultural Industries

Thu, Sep 22, 17:30 – 19:45 BST

Leeds City Museum • Free but booking required

https://www.eventbrite.com/e/the-role-of-ai-in-creative-and-cultural-industries-tickets-395003043737

How will AI change what we wear, the TV and films we watch, what we read? 

Join our fabulous Chair Zillah Watson (independent consultant, ex-BBC) and panellists Rebecca O’Higgins (Founder, KI-AH-NA), Laura Ellis (Head of Technology Forecasting, BBC) and Maja Maricevic (Head of Higher Education and Science, British Library) for an evening that'll help you understand the future of these industries for audiences and professionals alike.

Maja's written a blog post on The role of AI in creative and cultural industries with more background on this event.

 

Workshop: Developing ethical and fair AI for society and business

Thu, Sep 29, 13:30 - 17:00 BST

Leeds City Museum • Free but booking required

https://www.eventbrite.com/e/workshop-developing-ethical-and-fair-ai-for-society-and-business-tickets-400345623537

 

Panel: Developing ethical and fair AI for society and business

Thu, Sep 29, 17:30 – 19:45 BST

Leeds City Museum • Free but booking required

https://www.eventbrite.com/e/panel-developing-ethical-and-fair-ai-for-society-and-business-tickets-395020706567

AI is coming, so how do we live and work with it? What can we all do to develop ethical approaches to AI to help ensure a more equal and just society? 

Our expert Chair, Timandra Harkness, and panellists Sherin Mathew (Founder & CEO of AI Tech UK), Robbie Stamp (author and CEO at Bioss International), Keely Crockett (Professor in Computational Intelligence, Manchester Metropolitan University) and Andrew Dyson (Global Co-Chair of DLA Piper’s Data Protection, Privacy and Security Group) will present a range of perspectives on this important topic.

 

Wikipedia editathon and Museum Late, October 6, 2022

We've put two events on the same day so that you can combine them into one awesome experience, or just come along to one!

Living with Machines Wikithon

Thu, October 6, 13:00 – 16:30 BST

Leeds City Museum • Free but booking required: https://my.leedstickethub.co.uk/19104.

Ever wanted to try editing Wikipedia, but haven't known where to start? Join us for a session with our brilliant Wikipedian-in-residence to help improve Wikipedia’s coverage of local lives and topics at an editathon themed around our exhibition. 

Everyone is welcome! You won’t require any previous Wiki experience but please bring your own laptop for this event. Find out more, including how you can prepare, in my blog post on the Living with Machines site, Help fill gaps in Wikipedia: our Leeds editathon.

 

Museum Lates

Thu, October 6, 18:00 BST

Leeds City Museum • £5, booking required: https://my.leedstickethub.co.uk/19101

See our exhibition while enjoying a range of activities, including some we've repeated from the family / school holidays programme because they sounded so fun we wanted to try them ourselves. Highlights include:

  • Leeds-based folk singers bringing historical protest songs from the Library's collections of ballads to life
  • Weaving demonstrations on a dobby loom by local textile artist, Agnis Smallwood
  • Weaving kits to try your hand at and take home
  • Lego robotics workshops
  • Pub quiz and plant workshops


16 August 2022

#WikiLibCon22: An International Experience

It was with a little bit of apprehension that I made my way to Ireland, in late July. After two years of limited travel, and international restrictions, it felt strange to be standing in line at an airport, passport in hand, on my way to an in-person conference. Mixed in with the nervousness, however, was excitement. I was on my way to the first ever Wikimedia + Libraries Convention, hosted at Maynooth University. I’m happy to report that it was a fantastic event and worth every minute of travel nerves.

Logo for Wikimedia and Libraries Convention. Image credit: Bridges2Information, CC BY-SA 4.0, via Wikimedia Commons

A lot of hard work and inspiration had gone into making this event happen: with just three months to prepare, the organising committee outdid themselves at every turn. Laurie Bridges (Oregon State), Dr Rebecca O’Neill (Wikimedia Community Ireland), Dr Núria Ferran Ferrer (University of Barcelona) and Wikimedian of the Year 2022, Dr Nkem Osuigwe, arranged a weekend packed with fascinating talks, wonderful networking opportunities, and even some traditional Irish dancing. (Thankfully, the participants were observing this part!)

For me, the highlight of the weekend was meeting such a broad community of Wikimedians and library specialists. Having started my post remotely, the opportunity to interact with people from all over the world, in person, felt too good to be true, but as this photo demonstrates, it really did happen.

Group photo of participants at WikiLibCon22, outside St Patrick's College, Maynooth.
Participants in front of St Patrick’s College, Maynooth by B20180, CC BY-SA 4.0, via Wikimedia Commons

I did a lot of tweeting over the weekend, trying to capture these excellent presentations. You can catch a lot of impressions and fun memories of the weekend over on Twitter using the #WikiLibCon22 hashtag.

There were many highlights over the course of the two days. The keynote presentation by Dr Nkem Osuigwe was outstanding. She spoke about ‘Wikimedia Through The Prism Of Critical Librarianship’. I could not possibly do justice to the depth of thought in this excellent piece, but certain observations and quotes stood out. Nkem described critical librarianship as 'seek[ing] to find out who is misrepresented, underrepresented or not even seen at all, [a system which] seeks to uphold the human rights of user communities; to find out inequities within the system'. This is a very powerful statement which ties in closely with the Wikimedia aims of knowledge equity and global knowledge. As Nkem pointed out, there are over 6000 living languages, between 1000 and 2000 of them in Africa alone. Wikipedia now exists in over 300 languages, but that is only a small fraction of the world's linguistic diversity.

Many things in Nkem’s presentation have stuck with me, and the proverb “Until the lions have their own historians, the history of the hunt will always glorify the hunter” is one of the strongest. It was a true privilege to hear Nkem speak, and to meet so many wonderful people from the African Library and Information Associations and Institutions (AfLIA).

Image of Nkem Osuigwe presenting at WikiLibCon
Dr Nkem Osuigwe, B20180, CC BY-SA 4.0, via Wikimedia Commons

Participants came from all over the world, and from all different areas of Wikipedia. Viral hit Annie Rauwerda, of the famous @depthsofwiki account, was there to talk about her work in outreach and exploring the engagement potential of social media, while public librarian and author Amber Morrell spoke about her experience using TikTok @storytimeamber to educate and entertain. Unfortunately, I could not attend all of these papers in person, as I was presenting with Satdeep Gill (Wikimedia Foundation) on the work that the British Library and Two Centuries of Indian Print have done on Wikisource and Bengali books.

Other standout talks included Felix Nartey of the Wikimedia Foundation giving the second day keynote on ‘Wikimedia and Libraries: Working Together To Build The Infrastructure For Free Knowledge’. I attended an excellent workshop on importing bibliographic data to Wikidata, run by Dr Ursula Oberst (Leiden), and an insightful reflective talk by Liam Wyatt (Wikimedia Foundation) and Alice Kibombo (Wikimedia Community User Group Uganda) on ‘Libraries and Wikimedia: Where Have We Come From and Where Are We Going?’. I wanted to say particular thanks to Alice, who chaired our panel on Wikimedians in Residence. I was really pleased to talk alongside Rachel Helps (Brigham Young) and Kim Gile (Kansas City Public Library), sharing our experiences of Residencies and the role of a Resident. In her presentation with Liam, Alice asked a crucial question of all participants: 'Are we equipped to lead the change we'd like to see?' That has stuck with me. I feel strongly that after an event like #WikiLibCon22, we are certainly on the right path.

NB: You can see some of the presentations on Commons, as well as images from the event.

This post is by Wikimedian in Residence Dr Lucy Hinnie (@BL_Wikimedian).

05 August 2022

Burmese Script Conversion using Aksharamukha

This blog post is by Dr Adi Keinan-Schoonbaert, Digital Curator for Asian and African Collections, British Library. She's on Twitter as @BL_AdiKS.

 

Curious about Myanmar (Burma)? Did you know that the British Library has a large collection of Burmese materials, including manuscripts dating back to the 17th century, early printed books, newspapers, periodicals, as well as current material?

You can search our main online catalogue Explore the British Library for printed material, or the Explore Archives and Manuscripts catalogue for manuscripts. But to improve your chances of discovering printed resources, you will need to search the Explore catalogue by typing in the transliteration of the Burmese title and/or author using the Library of Congress romanisation rules. This means that searching for an item using the original Burmese script, or using what you would intuitively consider to be the romanised version of Burmese script, is not going to get you very far (not yet, anyway).

Excerpt from the Library of Congress romanisation scheme

 

This is because of the way we catalogue Burmese collection items at the Library, following a policy of transliterating Burmese using the Library of Congress (LoC) rules. In theory, the benefit of this system specifically for Burmese is that it enables two-way transliteration, i.e. the romanisation can be precisely reversed to recover the Burmese script. However, the system has a major drawback: romanised versions of Burmese script are so far removed from their phonetic renderings that most Burmese speakers are completely unable to recognise the words.
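The 'two-way' property can be illustrated with a toy converter: if every mapping is one-to-one, inverting the table recovers the original script exactly. This is only a sketch – the three characters below are an illustrative sample, not the actual LoC table:

```python
# Toy illustration of a reversible (two-way) transliteration scheme.
# The mappings below are a tiny illustrative sample, NOT the full
# Library of Congress romanisation table for Burmese.
TO_ROMAN = {"\u1000": "k", "\u1002": "g", "\u1019": "m"}  # ka, ga, ma
TO_BURMESE = {v: k for k, v in TO_ROMAN.items()}  # invert the table

def romanise(text):
    """Replace each Burmese character with its romanised form."""
    return "".join(TO_ROMAN.get(ch, ch) for ch in text)

def deromanise(text):
    """Invert the mapping to recover the original script.
    (Real LoC values are often multi-character, e.g. 'kh', so a full
    implementation would need longest-match parsing, not char-by-char.)"""
    return "".join(TO_BURMESE.get(ch, ch) for ch in text)

sample = "\u1000\u1019"
assert deromanise(romanise(sample)) == sample  # the round trip is lossless
```

The round-trip assertion is exactly the reversibility the LoC scheme promises; the trade-off, as described above, is that the romanised output bears little resemblance to how the words sound.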

Because the LoC scheme is unintuitive for Burmese speakers and does not reflect the spoken language, British Library catalogue records for Burmese printed materials end up virtually inaccessible to users. And we’re not alone with this problem – other libraries worldwide that hold Burmese collections and use the LoC romanisation scheme face the same issues.

The Buddha at Vesali in a Burmese manuscript, from the Henry Burney collection. British Library, Or. 14298, f. 1

 

One useful solution could be a tool that converts the LoC romanisation output into Burmese script, and vice versa – similar to how you would use Google Translate. Maria Kekki, our Curator for Burmese collections, discovered the online tool Aksharamukha, which facilitates conversion between various scripts – also referred to as transliteration (transliteration into the Roman alphabet is specifically called romanisation). It supports 120 scripts and 21 romanisation methods, and luckily, Burmese is one of them.

Aksharamukha: Script Converter screenshot

 

Using Aksharamukha has already been of great help to Maria. Instead of painstakingly converting Burmese script into its romanised version by hand, she can now copy-paste the conversion and make any necessary adjustments. She has also noticed that she makes fewer errors this way! However, the tool was missing one important thing – the ability to transliterate Burmese script directly using the LoC romanisation system.

Such functionality would not only save our curatorial and acquisitions staff a significant amount of time, but would also help other libraries that hold Burmese collections and follow the LoC guidelines. It would also allow Burmese speakers to find material much more easily, both in our catalogue and in collections around the world.

To this end, Maria got in touch with the developer of Aksharamukha, Vinodh Rajan – a computer scientist who is also an expert in writing systems, languages and digital humanities. Vinodh was happy to implement two things: (1) adding the LoC romanisation scheme as one of the transliteration options, and (2) adding spaces between words (the LoC system applies different spacing rules to words of Pali and English origin, which are written together in Burmese script).

Vinodh demonstrating the new Aksharamukha functionality, June 2022

 

Last month (July 2022) Vinodh implemented the new system, and we have to say, the result is fantastic! Readers can now copy-paste transliterated text into the Library’s catalogue search box to see if we hold items of interest. It is also a significant improvement for our cataloguing and acquisition processes, making it much easier to create acquisition and minimal records. As a next step, we will look into updating all of our Burmese catalogue records to include Burmese script (alongside transliteration), and consider a similar course of action for other South and Southeast Asian scripts.

I should mention that, as a bonus, Aksharamukha’s codebase is fully open source, available on GitHub, and well documented. If you have feedback or notice any bugs, please feel free to raise an issue on GitHub. Thank you, Vinodh, for making this happen!

 

18 July 2022

UK Digital Comics: More of the same but different? [1]

This is a guest post by Linda Berube, an AHRC Collaborative Doctoral Partnership student based at the British Library and City, University of London. If you would like to know more about Linda's research, please do email her at Linda.Berube@city.ac.uk.

When I last wrote a post for the Digital Scholarship blog in 2020 (Berube, 2020), I was a fairly new PhD student, fresh out of the starting blocks, taking on the challenge of UK digital comics research. My research involves an analysis of the systems and processes of UK digital comics publishing as a means of understanding how digital technology has affected, and perhaps transformed, them. For this work, I have the considerable support of supervisors Ian Cooke and Stella Wisdom (British Library) and Ernesto Priego and Stephann Makri (Human-Computer Interaction Design Centre, City, University of London).

Little did I, or the rest of the world for that matter, know the transformations to daily life that the pandemic was about to bring. There was no less of an impact felt in the publishing sector, and certainly in comics publishing. Still, despite all the obstacles to meetings, people from traditional[2] large and small press publishers, media and video game companies publishing comics, as well as creators and self-publishers, gave generously of their time to discuss comics with me. I am currently speaking with comics readers and observing their reading practices, again all via remote meetings. To all these people, this PhD student owes a debt of gratitude for their enthusiastic participation.

British Comics Publishing: It’s where we’re at

Digital technology has had a significant impact on British comics publishing, but not as pervasively as expected from initial prognostications by scholars and the comics press. Back in 2020, I observed:

  This particular point in time offers an excellent opportunity to consider the digital comics, and specifically UK, landscape. We seem to be past the initial enthusiasm for digital technologies when babies and bathwater were ejected with abandon (see McCloud 2000, for example), and probably still in the middle of a retrenchment, so to speak, of that enthusiasm (see Priego 2011 pp278-280, for example). (Berube, 2020).

But ‘retrenchment’ might be a strong word. According to my research findings to date, and in keeping with those of the broader publishing sector (Thompson, 2010; 2021), the comics publishing process has most definitely been ‘revolutionized’ by digital technology. All comics begin life as digital files until they are published in print. Even those creators who still draw by hand must convert their work to digital versions that can be sent to a publisher or uploaded to a website or publishing platform. And, while print comics have by no means been completely supplanted by digital comics (in fact a significant number of those interviewed voiced a preference for print), reading on digital devices (laptops, tablets, smartphones) has become popular enough for publishers to provide access through ebook and app technology. Even those publishers I interviewed who were most resistant to digital felt compelled ‘to dabble in digital comics’ (according to one small press publisher), at least by providing pdf versions on Gumroad or another storefront. The restrictions on print distribution and bookstore sales during Covid lockdown compelled some publishers not only to provide more access to digital versions, but in some cases even to sell digital-exclusive versions, in other words comics only offered digitally.

Everywhere you look, a comic

The visibility of digital comics across sectors including health, economics, education, literacy and even the hard sciences was immediately obvious from a mapping exercise of UK comics publishers, producers and platforms, as well as through interviews. What this means is that comics – the creation and reading of them – are used to teach and to learn about multiple topics, including archiving (specifically UK Legal Deposit) (Figure 1) and anthropology (specifically Smartphones and Smart Ageing) (Figure 2):

Cartoon drawing of two people surrounded by comics and zines
Figure 1: Panel from 'The Legal Deposit and You', by Olivia Hicks (British Library, 2018). Reproduced with permission from the British Library.

 

Cartoon drawing of two women sitting on a sofa looking at and discussing content on a smartphone
Figure 2: Haapio-Kirk, L., Murariu, G., and Hahn, A. (artist) (2022) 'Beyond Anthropomorphism Palestine', Anthropology of Smartphones and Smart Ageing (ASSA) Blog. Based on Maya de Vries and Laila Abed Rabho’s research in Al-Quds (East Jerusalem). Available at: https://wwwdepts-live.ucl.ac.uk/anthropology/assa/discoveries/beyond-anthropomorphism/ . Reproduced with permission.

Moreover, comics in their incarnation as graphic novels have grabbed literary prizes, for example Jimmy Corrigan: the smartest kid on earth (Jonathan Cape, 2001) by Chris Ware won the Guardian First Book Award in 2001, and Sabrina (Granta, 2018) by Nick Drnaso was longlisted for the Man Booker Prize in 2018 (somewhat controversially, see Nally, 2018).

Just Like Reading a Book, But Not…

But by extending the definition of digital comics[3] to include graphic novels mostly produced as ebooks, the ‘same-ness’ of reading in print became evident over the course of interviews with publishers and creators. Publishing a comic in pdf format, whether on a website, on a publishing platform, or as a book, is simply the easiest, most cost-effective way to do it:

  We’re print first in our digital workflow—Outside of graphic novels, with other types of books we occasionally have the opportunity to work with the digital version as a consideration at the outset, in which case the tagging/classes are factored in at the beginning stages (a good example would be a recent straight-to-digital reflowable ebook). This is the exception though, and also does not apply to graphic novels, which are all print-led. (Interview with publisher, December 2020)

Traditional book publishers have not been the only ones taking up comics – gaming and media companies have acquired the rights to comics brands previously published in print. For more and different sectors, comics have become an increasingly attractive option, especially for their multimedia appeal. However, what these companies do with the comics is a mixture of the same, for instance being print-led as described in the comment above, and different, for example conversion to digital interactive versions, or apps with more functionality than the ebook format.

It's How You Read Them

Comics formatted especially for reading on apps, such as 2000 AD, ComiXology, and Marvel Unlimited, vary in the types of reading experiences they offer. While some have retained the ‘multi-panel display’ experience of reading a print comic book, others have gone beyond the ‘reads like a book’ experience. ComiXology, a digital distribution platform for comics owned by Amazon, pioneered the “guided view” technology now used by the likes of Marvel and DC, where readers view one panel at a time. Some of the comics readers I have interviewed refer to this as ‘the cinematic experience’: readers page through the comic one panel or scene at a time, as if watching it on film or TV.

These reading technologies do tend to work better on a tablet than on a smartphone. The act of scrolling required to read webcomics on the WEBTOON app (and others, such as Tapas), designed to be read on smartphones, produces the same kind of ‘cinematic’ effect: readers I have interviewed of comics on both the ComiXology and WEBTOON apps describe the exact same experience – the build-up of “anticipation” and “tension”, being “on the edge of my seat” as they page or scroll down to the next scene or panel. WEBTOON creators employ certain techniques to create that tension in the vertical format, for example the use of white space between panels: the more space, the more scrolling, the more “edge of the seat” experience. Major comics publishers have started creating ‘vertical’ (scrolling on phones) comics: Marvel launched its Infinity Comics to appeal to the smartphone webcomics reader.

So, it would seem that good old-fashioned comics pacing, combined with publishing through apps designed for digital devices, provides a different-but-same reading experience: a uniquely digital one.

Same But Different: I’m still here

So, here I am, still a PhD student currently conducting research with comics readers, as part of my research and as part of a secondment with the BL supported by AHRC Additional Student Development funding. This additional funding has afforded me the opportunity to employ UX (user behaviour/experience) techniques with readers, primarily through conducting reading observation sessions and activities. I will be following up this blog with an update on this research as well as a call for participation into more reader research.

References 

Berube, L. (2020) ‘Not Just for Kids: UK Digital Comics, from creation to consumption’, British Library Digital Scholarship Blog, 24 August 2020. Available at: https://blogs.bl.uk/digital-scholarship/2020/08/not-just-for-kids-uk-digital-comics-from-creation-to-consumption.html

Drnaso, N. (2018) Sabrina. London, England: Granta Books.

McCloud, S. (2000) Reinventing Comics: How Imagination and Technology Are Revolutionizing an Art Form. New York, N.Y.: Paradox Press.

Nally, C. (2018) ‘Graphic Novels Are Novels: Why the Booker Prize Judges Were Right to Choose One for Its Longlist’, The Conversation, 26 July. Available at: https://theconversation.com/graphic-novels-are-novels-why-the-booker-prize-judges-were-right-to-choose-one-for-its-longlist-100562.

Priego, E. (2011) The Comic Book in the Age of Digital Reproduction. [Thesis] University College London, pp. 278-280. Available at: https://doi.org/10.6084/m9.figshare.754575.v4

Ware, C. (2001) Jimmy Corrigan: the smartest kid on earth. London, England: Jonathan Cape.

Notes

[1] “More of the same but different”, a phrase used by a comics creator I interviewed in reference to what comics readers want to read.↩︎

[2] By ‘traditional’, I am referring to publishers who contract with comics creators to undertake the producing, publishing, distribution, selling of a comic, retaining rights for a certain period of time and paying the creator royalties. In my research, publishers who transacted business in this way included multinational and small press publishers. Self-publishing is where the creator owns all the rights and royalties, but also performs the production, publishing, distribution work, or pays for a third-party to do so. ↩︎

[3] For this research, digital comics include a diverse selection of what is produced electronically or online: webcomics, manga, applied comics, experimental comics, as well as graphic novels [ebooks].  I have omitted animation. ↩︎

27 June 2022

IIIF-yeah! Annual Conference 2022

At the beginning of June, Neil Fitzgerald, Head of Digital Research, and I attended the annual International Image Interoperability Framework (IIIF) Showcase and Conference in Cambridge, MA. The showcase was held in the Massachusetts Institute of Technology’s iconic lecture theatre 10-250, and the conference in the Fong Auditorium of Boylston Hall on Harvard’s campus. There was a stillness on the MIT campus; in contrast, Harvard Yard was busy with sightseeing members of the public and the dismantling of marquees from the commencements of the previous weeks.

View of the Massachusetts Institute of Technology Dome
IIIF Consortium sticker reading IIIF-yeah!
Conference participants outside Boylston Hall, Harvard Yard


The conference atmosphere was energising, with participants excited to be back at an in-person event – the last one was held in 2019 in Göttingen, with virtual meetings in the meantime. During the last decade IIIF has kept growing, as reflected by its fast-expanding community and the IIIF Consortium, which now comprises 63 organisations from across the GLAM and commercial sectors.

The Showcase on June 6th was an opportunity to welcome those new to IIIF and highlight recent community developments. I had the pleasure of presenting the work of the British Library and Zooniverse to enable new IIIF functionality on Zooniverse in support of our In the Spotlight project, which crowdsources information about the Library’s historical playbills collection. Other presentations covered the use of IIIF with audio, maps, and in teaching, learning and museum contexts, as well as the exciting plans to extend IIIF standards to 3D data. Harvard University gave an update on their efforts to adopt IIIF across the organisation, and their IIIF resources webpage is a useful reference. I was particularly impressed by the Leventhal Map and Education Center’s digital maps initiatives, including their collaboration on Allmaps, a set of open source tools for curating, georeferencing and exploring IIIF maps.

The following two days were packed with brilliant presentations on IIIF infrastructure, collections enrichment, IIIF resource discovery, IIIF-enabled digital humanities teaching and research, improving user experience, and more. Digirati presented a new IIIF manifest editor, which is being further developed to support various use cases. Ed Silverton reported on the newest features of the Exhibit tool, which we at the British Library have started using to share engaging stories about our IIIF collections.

Ed Silverton presenting a slide about the Exhibit tool
Conference presenters talking about the Audiovisual Metadata Platform
Conference reception under a marquee in Harvard Yard

I was interested to hear about Getty’s vision of IIIF as an enabling technology, how it fits within their shared data infrastructure, and their multiple use cases, including driving image backgrounds based on colour palette annotations and the Quire publication process. It was great to hear how IIIF has been used in digital humanities research, as in the Mapping Colour in History project at Harvard, which enables historical analysis of artworks through pigment data annotations, or how IIIF helps to solve some of the challenges of remote resource aggregation for the Paul Laurence Dunbar initiative.

There was also much excitement about the detektIIIF browser extension for Chrome and Firefox, which detects IIIF resources in websites and helps collect and export IIIF manifests. Zentralbibliothek Zürich’s customised version, ZB-detektIIIF, allows scholars to create IIIF collections in JSON-LD and link to the Mirador Viewer. There were several great presentations about IIIF players and tools for audio-visual content, such as Avalon, Aviary, Clover, the Audiovisual Metadata Platform and the Mirador video extension. And no IIIF Conference is ever complete without a #FunWithIIIF presentation by Cogapp’s Tristan Roddis – this one capturing 30 cool projects using IIIF content and technology!
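Most of the viewers and tools mentioned above ultimately fetch pixels through the IIIF Image API, whose request URLs follow the fixed pattern {identifier}/{region}/{size}/{rotation}/{quality}.{format}. As a minimal sketch of that pattern (the server and identifier below are placeholders, not real Library endpoints):

```python
def iiif_image_url(base, identifier, region="full", size="max",
                   rotation=0, quality="default", fmt="jpg"):
    """Build a IIIF Image API request URL following the pattern
    {base}/{identifier}/{region}/{size}/{rotation}/{quality}.{format}."""
    return f"{base}/{identifier}/{region}/{size}/{rotation}/{quality}.{fmt}"

# e.g. request a 50%-scaled rendering of a hypothetical playbill scan
url = iiif_image_url("https://example.org/iiif", "playbill-001", size="pct:50")
print(url)  # https://example.org/iiif/playbill-001/full/pct:50/0/default.jpg
```

Because every compliant image server answers the same URL grammar, a viewer like Mirador or a crawler like detektIIIF can work against any institution's collection without custom code.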

We all enjoyed lots of good conversations during the breaks and social events, and some great tours were on offer. Personally, I chose to visit the Boston Public Library’s Leventhal Map and Education Center, with its exhibition about environment and social justice, and the BPL Digitisation Studio, the latter equipped with Internet Archive scanning stations and an impressive maps photography room.

Boston Public Library book trolleys
Boston Public Library Maps Digitisation Studio
Rossitza Atanassova outside Boston Public Library


I was also delighted to pay a visit to the Harvard Libraries digitisation team who generously showed me their imaging stations and range of digitised collections, followed by a private guided tour of the Houghton Library’s special collections and beautiful spaces. Huge thanks to all the conference organisers, the local committee, and the hosts for my visits, Christine Jacobson, Bill Comstock and David Remington. I learned a lot and had an amazing time. 

Finally, all presentations from the three days have been shared, with some highlights captured on Twitter under #iiif. In addition, this week the Consortium is offering four free online workshops to share IIIF best practices and tools with the wider community. Don’t miss your chance to attend.

This post is by Digital Curator Rossitza Atanassova (@RossiAtanassova)

16 June 2022

Working With Wikidata and Wikimedia Commons: Poetry Pamphlets and Lotus Sutra Manuscripts

Greetings! I’m Xiaoyan Yang, from Beijing, China, an MSc student at University College London. It was a great pleasure to have the opportunity to do a four-week placement at the British Library and Wikimedia UK under the supervision of Lucy Hinnie, Wikimedian in Residence, and Stella Wisdom, Digital Curator, Contemporary British Collections. I mainly focused on the Michael Marks Awards for Poetry Pamphlets Project and Lotus Sutra Project, and the collaboration between the Library and Wikimedia.

What interested you in applying for a placement at the Library?

This kind of placement, in world-famous cultural institutions such as the Library and Wikimedia, is a brand-new experience for me. Because my undergraduate major was economic statistics, most of my past internships were in commercial and Internet technology companies. The driving force of my interest in digital humanities research, especially in linked data, knowledge graphs and visualisation, is to better combine information technologies with cultural resources, in order to reach a wider audience and promote the transmission of cultural and historical memory in a more accessible way.

Libraries are institutions for the preservation and dissemination of knowledge for the public, and the British Library is without doubt one of the largest and best libraries in the world. It has long been a leader and innovator in resource preservation and digitisation. The International Dunhuang Project (IDP), initiated by the British Library, is now one of the most representative transnational collaborative digital humanities projects in the field. I applied for a placement hoping to learn more about the use of digital resources in real projects and about the process of collaboration, from initial design through to implementation. I also wanted the chance to get involved in the practice of linked data, to accumulate experience, and to find directions for future improvement.

I would like to thank Dr Adi Keinan-Schoonbaert for her kind introduction to the British Library's Asian and African digitisation projects, especially the IDP, which enabled me to learn more about librarian-led practices in this area. At the same time, I was very happy to sit in on the weekly meetings of the Digital Scholarship Team during this placement, which allowed me to observe how collaboration between different departments is carried out and managed in a large cultural heritage organisation like the British Library.

Excerpt from Lotus Sutra Or.8210/S.155, an old scroll showing vertical lines of Chinese script. Kumārajīva, CC BY 4.0, via Wikimedia Commons

What is the most surprising thing you have learned?

In short, it is surprisingly easy to contribute knowledge to Wikimedia. In this placement, one of my very first tasks was to upload information about the winning and shortlisted poems of the Michael Marks Awards for Poetry Pamphlets for each year from 2009 to the latest, 2021, to Wikidata. The first step was to check whether each poem and its author and publisher already existed in Wikidata. If not, I created an item page for it. Before I started, I thought the process would be very complicated, but once I began following the manual, I found it was actually really easy: I just needed to click "Create a new Item".

I will always remember that the first person item I created was Sarah Jackson, one of the poets shortlisted for this award in 2009. The unique QID Q111940266 was generated automatically. With such a simple operation, anyone can contribute to the vast knowledge world of Wiki. Many people whom I have never met may read this item page in the future, a page created and perfected by me at this moment. This feeling is magical and gives me a real sense of achievement. There are also many useful guides, examples and batch-loading tools, such as QuickStatements, that help users start editing with joy. Useful guides include the Wikidata help pages for QuickStatements and material from the University of Edinburgh.
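As an illustration of how the "does this item already exist?" check could be automated: the sketch below builds a request URL for the Wikidata Action API's wbsearchentities module, one way to test whether a label such as a poet's name already has an item. The API module and parameters are standard; the helper function name is ours, not part of any library.

```python
from urllib.parse import urlencode

WIKIDATA_API = "https://www.wikidata.org/w/api.php"

def build_search_url(label, language="en"):
    """Build a wbsearchentities request URL for checking whether
    a label (e.g. a poet's name) already has a Wikidata item."""
    params = {
        "action": "wbsearchentities",
        "search": label,
        "language": language,
        "type": "item",
        "format": "json",
    }
    return WIKIDATA_API + "?" + urlencode(params)

# Fetching this URL (e.g. with urllib.request) returns JSON search
# results; an empty result list suggests a new item may be needed.
url = build_search_url("Sarah Jackson")
print(url)
```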

Image of a Wikimedia SPARQL query to determine a list of information about the Michael Marks Poetry Pamphlet uploads.
An example of one of Xiaoyan’s queries - you can try it here!
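For readers who cannot see the screenshot, a query along these lines can be sketched in Python. This is only an illustration: P166 ('award received') is a real Wikidata property, but the award QID below is a placeholder, not the actual item for the Michael Marks Awards.

```python
from urllib.parse import urlencode

# Placeholder QID -- substitute the real Wikidata item for the
# Michael Marks Awards for Poetry Pamphlets.
AWARD_QID = "Q000000"

sparql = f"""
SELECT ?person ?personLabel WHERE {{
  ?person wdt:P166 wd:{AWARD_QID} .  # P166 = 'award received'
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en". }}
}}
"""

# The Wikidata Query Service accepts the query as a URL parameter;
# fetching this URL returns the result set.
url = "https://query.wikidata.org/sparql?" + urlencode({"query": sparql})
print(url)
```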

How do you hope to use your skills going forward?

My current dissertation research focuses on regional classical Chinese poetry in the Hexi Corridor. This geographical area is deeply bound up with the history of the Silk Road and has inspired and attracted many poets to visit and write. My project aims to build a proper ontology and knowledge map, then, combining GIS visualisation with text analysis, to explore the historical, geographic, political and cultural changes in this area from the perspectives of time and space. Wikidata provides a standard way to undertake this work.

Thanks to Dr Martin Poulter’s wonderful training and Stuart Prior’s kind instructions, I quickly picked up some practical skills in constructing Wiki queries. The timeline and geographical visualisation tools offered by the Wikidata Query Service inspired me to develop my skills in this field further. What’s more, although I haven’t yet had a chance to experience Wikibase, I am now very interested in it, thanks to Dr Lucy Hinnie’s and Dr Graham Jevon’s introduction, and I will definitely try it in the future.

Would you like to share some Wiki advice with us?

Wiki is very friendly for self-directed learning: the Help pages present various manuals and examples, all of which are very good learning resources. I will keep learning and exploring in the future.

I do want to share my feelings and a little experience with Wikidata. In the Michael Marks Awards for Poetry Pamphlets project, all the properties needed to describe poets, poems and publishers could easily be found in the existing Wikidata property list. However, in the second project, on the Lotus Sutra, I encountered more difficulties. For example, it is hard to find suitable items and properties on Wikidata to represent passages of a scroll’s text content or its binding design, and this information is currently better represented on Wikimedia Commons.

However, as I studied more and more Wikidata examples, I came to understand Wikidata and the purpose of these restrictions better. Maintaining concise structured data and accurate correlations is one of Wikidata’s main purposes: it encourages the reuse of existing properties and places restrictions on long free-text descriptions. This feature of Wikidata therefore needs to be taken into account from the outset when designing metadata frameworks for data uploading.

In the end, I would like to sincerely thank my direct supervisor Lucy for her kind guidance, help, encouragement and affirmation, as well as the British Library and Wikimedia platform. I have received so much warm help and gained so much valuable practical experience, and I am also very happy and honored that by using my knowledge and technology I can make a small contribution to linked data. I will always cherish the wonderful memories here and continue to explore the potential of digital humanities in the future.

This post is by Xiaoyan Yang, an MSc student at University College London, and was edited by Wikimedian in Residence Dr Lucy Hinnie (@BL_Wikimedian) and Digital Curator Stella Wisdom (@miss_wisdom).

20 April 2022

Importing images into Zooniverse with a IIIF manifest: introducing an experimental feature

Digital Curator Dr Mia Ridge shares news from a collaboration between the British Library and Zooniverse that means you can more easily create crowdsourcing projects with cultural heritage collections. There's a related blog post on Zooniverse, Fun with IIIF.

IIIF manifests - text files that tell software how to display images, sound or video files alongside metadata and other information about them - might not sound exciting, but by linking to them, you can view and annotate collections from around the world. The IIIF (International Image Interoperability Framework) standard makes images (or audio, video or 3D files) more re-usable: they can be displayed on another site alongside the original metadata and information provided by the source institution. If an institution updates a manifest - perhaps adding information from updated cataloguing or crowdsourcing - any site that displays that image automatically gets the updated metadata.
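To make that concrete, here is a sketch of how a program might read such a manifest. The fragment below is hand-made in the shape of a IIIF Presentation 2.x manifest; the URLs and labels are illustrative, not a real British Library record.

```python
import json

manifest_json = """
{
  "@id": "https://example.org/iiif/book1/manifest",
  "label": "Example volume",
  "metadata": [{"label": "Date", "value": "1890"}],
  "sequences": [{
    "canvases": [{
      "label": "f. 1r",
      "images": [{"resource": {
        "@id": "https://example.org/iiif/book1/f1r/full/full/0/default.jpg"
      }}]
    }]
  }]
}
"""

manifest = json.loads(manifest_json)

# Walk sequences -> canvases -> images to collect the image URLs a
# viewer (or a crowdsourcing platform) would display.
image_urls = [
    image["resource"]["@id"]
    for sequence in manifest.get("sequences", [])
    for canvas in sequence.get("canvases", [])
    for image in canvas.get("images", [])
]
print(manifest["label"], image_urls)
```

Because the manifest carries both the image locations and the institution's metadata, any consuming site that re-reads it picks up updates automatically.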

Playbill showing the title after other large text

We've posted before about how we used IIIF manifests as the basis for our In the Spotlight crowdsourced tasks on LibCrowds.com. Playbills are great candidates for crowdsourcing because they are hard to transcribe automatically, and the layout and information present varies a lot. Using IIIF meant that we could access images of playbills directly from the British Library servers without needing server space and extra processing to make local copies. You didn't need technical knowledge to copy a manifest address and add a new volume of playbills to In the Spotlight. This worked well for a couple of years, but over time we'd found it difficult to maintain bespoke software for LibCrowds.

When we started looking for alternatives, the Zooniverse platform was an obvious option. Zooniverse hosts dozens of historical or cultural heritage projects, and hundreds of citizen science projects. It has millions of volunteers, and a 'project builder' that means anyone can create a crowdsourcing project - for free! We'd already started using Zooniverse for other Library crowdsourcing projects such as Living with Machines, which showed us how powerful the platform can be for reaching potential volunteers. 

But that experience also showed us how complicated the process of getting images and metadata onto Zooniverse could be. Using Zooniverse for volumes of playbills for In the Spotlight would require some specialist knowledge. We'd need to download images from our servers, resize them, generate a 'manifest' list of images and metadata, then upload it all to Zooniverse; and repeat that for each of the dozens of volumes of digitised playbills.

Fast forward to summer 2021, when we had the opportunity to put a small amount of funding into some development work by Zooniverse. I'd already collaborated with Sam Blickhan at Zooniverse on the Collective Wisdom project, so it was easy to drop her a line and ask if they had any plans or interest in supporting IIIF. It turned out they had, but until then they hadn't had the necessary resources or an interested organisation.

We came up with a brief outline of what the work needed to do, taking the ability to recreate some of the functionality of In the Spotlight on Zooniverse as a goal. Therefore, 'the ability to add subject sets via IIIF manifest links' was key. ('Subject set' is Zooniverse-speak for 'set of images or other media' that are the basis of crowdsourcing tasks.) And of course we wanted the ability to set up some crowdsourcing tasks with those items… The Zooniverse developer, Jim O'Donnell, shared his work in progress on GitHub, and I was very easily able to set up a test project and ask people to help create sample data for further testing. 

If you have a Zooniverse project and a IIIF address to hand, you can try out the import for yourself: add 'subject-sets/iiif?env=production' to your project builder URL. e.g. if your project is number #xxx then the URL to access the IIIF manifest import would be https://www.zooniverse.org/lab/xxx/subject-sets/iiif?env=production

Paste a manifest URL into the box. The platform parses the file to present a list of metadata fields, which you can flag as hidden or visible in the subject viewer (the public task interface). When you're happy, you can click a button to upload the manifest as a new subject set (like a folder of items), and your images are imported. (Don't worry if it initially says '0 subjects'.)

 

Screenshot of manifest import screen

You can try out our live task and help create real data for testing ingest processes at https://frontend.preview.zooniverse.org/projects/bldigital/in-the-spotlight/classify

This is a very brief introduction, with more to come on managing data exports and IIIF annotations once you've set up, tested and launched a crowdsourced workflow (task). We'd love to hear from you - how might this be useful? What issues do you foresee? How might you want to expand or build on this functionality? Email digitalresearch@bl.uk or tweet @mia_out @LibCrowds. You can also comment on GitHub https://github.com/zooniverse/Panoptes-Front-End/pull/6095 or https://github.com/zooniverse/iiif-annotations

Digital work in libraries is always collaborative, so I'd like to thank British Library colleagues in Finance, Procurement, Technology, Collection Metadata Services and various Collections departments; the Zooniverse volunteers who helped test our first task and of course the Zooniverse team, especially Sam, Jim and Chris for their work on this.

 

12 April 2022

Making British Library collections (even) more accessible

Daniel van Strien, Digital Curator, Living with Machines, writes:

The British Library’s digital scholarship department has made many digitised materials available to researchers. These include a collection of books digitised by the British Library in partnership with Microsoft and processed using Optical Character Recognition (OCR) software to make the text machine-readable, as well as a collection of books digitised in partnership with Google.

Since being digitised, this collection has been used for many different projects, including recent work to augment the dataset with genre metadata and a project using machine learning to tag images extracted from the books. The books have also served as training data for a historic language model.

This blog post will focus on two challenges of working with this dataset: size and documentation, and discuss how we’ve experimented with one potential approach to addressing these challenges. 

One of the challenges of working with this collection is its size. The OCR output is over 20GB. This poses some challenges for researchers and other interested users wanting to work with these collections. Projects like Living with Machines are one avenue in which the British Library seeks to develop new methods for working at scale. For an individual researcher, one of the possible barriers to working with a collection like this is the computational resources required to process it. 

Recently we have been experimenting with a Python library, datasets, to see if this can help make this collection easier to work with. The datasets library is part of the Hugging Face ecosystem. If you have been following developments in machine learning, you have probably heard of Hugging Face already. If not, Hugging Face is a delightfully named company focusing on developing open-source tools aimed at democratising machine learning. 

The datasets library aims to make it easier for researchers to share and efficiently process large datasets for machine learning. Whilst this was the library’s original focus, there may also be other use cases for which it can help make datasets held by the British Library more accessible.

Some features of the datasets library:

  • Tools for efficiently processing large datasets 
  • Support for easily sharing datasets via a ‘dataset hub’ 
  • Support for documenting datasets hosted on the hub (more on this later). 

As a result of these and other features, we have recently worked on adding the British Library books dataset to the Hugging Face hub. Making the dataset available via the datasets library has made it more accessible in a few different ways.

Firstly, it is now possible to download the dataset in two lines of Python code: 

```python
from datasets import load_dataset

ds = load_dataset('blbooks', '1700_1799')
```

We can also use the datasets library to process large datasets. For example, suppose we only want to include data with a high OCR confidence score (this partially helps filter out text with many OCR errors):

```python
ds.filter(lambda example: example['mean_wc_ocr'] > 0.9)
```

One of the particularly nice features here is that the library uses memory mapping to store the dataset under the hood. This means that you can process data that is larger than the RAM you have available on your machine. This can make the process of working with large datasets more accessible. We could also use this as a first step in processing data before getting back to more familiar tools like pandas. 

```python
dogs_data = ds['train'].filter(lambda example: 'dog' in example['text'].lower())
df = dogs_data.to_pandas()
```

In a follow on blog post, we’ll dig into the technical details of datasets in some more detail. Whilst making the technical processing of datasets more accessible is one part of the puzzle, there are also non-technical challenges to making a dataset more usable. 

 

Documenting datasets 

One of the challenges of sharing large datasets is documenting the data effectively. Traditionally, libraries have mainly focused on describing material at the ‘item level’, i.e. documenting one item at a time. However, there is a difference between documenting one book and 100,000 books. There are no easy answers to this, but one possible avenue libraries could explore is the use of datasheets. Timnit Gebru et al. proposed the idea in ‘Datasheets for Datasets’. A datasheet aims to provide a structured format for describing a dataset, covering questions like how and why it was constructed, what the data consists of, and how it could potentially be used. Crucially, datasheets also encourage a discussion of the biases and limitations of a dataset. Whilst you can identify some of these limitations by working with the data, there is also a crucial amount of information known by the curators of the data that might not be obvious to end-users. Datasheets offer one possible way for libraries to begin communicating this information more systematically.

The dataset hub adopts the practice of writing datasheets and encourages its users to write one for their dataset. For the British Library books dataset, we have attempted to write one of these dataset cards. Whilst it is certainly not perfect, it hopefully begins to outline some of the challenges of this dataset and gives end-users a better sense of how to approach it.

Digital scholarship blog recent posts

Archives

Tags

Other British Library blogs