THE BRITISH LIBRARY

Digital scholarship blog

170 posts categorized "Data"

08 April 2020

Legacies of Catalogue Descriptions and Curatorial Voice: a new AHRC project

This guest post is by James Baker, Senior Lecturer in Digital History and Archives at the School of History, Art History and Philosophy, University of Sussex. James has a background in the history of the printed image, archival theory, art history, and computational analysis. He is author of The Business of Satirical Prints in Late-Georgian England (2017), the first monograph on the infrastructure of the satirical print trade circa 1770-1830, and a member of the Programming Historian team.

I love a good catalogue. Whether describing historic books, personal papers, scientific objects, or works of art, catalogue entries are the stuff of historical research, brief insights into many possible avenues of discovery. As a historian, I am trained to think critically about catalogues and the entries they contain, to remember that they are always crafted by people, institutions, and temporally specific ways of working, and to consider what that reality might do to my understanding of the past those catalogues and entries represent. Recently, I've started to make these catalogues my objects of historical study, to research what they contain, the labour that produced them, and the socio-cultural forces that shaped that labour, with a particular focus on the anglophone printed catalogue circa 1930-1990. One motivation for this is purely historical, to elucidate what I see as an important historical phenomenon. But another is about now, about how those catalogues are used and reused in the digital age. Browse the shelves of a university library and you'll quickly see that circumstances of production are encoded into the architecture of the printed catalogue: title pages, prefaces, fonts, spines, and the quality of paper are all signals of their historical nature. But when their entries - as many have been over the last 30 years - are moved into a database and online, these cues become detached, and their replacement - a bibliographic citation - is insufficient to evoke their historical specificity and does little to alert the user to the myriad texts they are navigating each time they search an online catalogue.

It is these interests and concerns that underpin "Legacies of Catalogue Descriptions and Curatorial Voice: Opportunities for Digital Scholarship", a collaboration between the Sussex Humanities Lab, the British Library, and Yale University Library. This 12-month project funded by the Arts and Humanities Research Council aims to open up new and important directions for computational, critical, and curatorial analysis of collection catalogues. Our pilot research will investigate the temporal and spatial legacy of a catalogue I know well - the landmark ‘Catalogue of Political and Personal Satires Preserved in the Department of Prints and Drawings in the British Museum’, produced by Mary Dorothy George between 1930 and 1954, 1.1 million words of text to which all scholars of the long-eighteenth century printed image are indebted, and which forms the basis of many catalogue entries at other institutions, not least those of our partners at the Lewis Walpole Library. We are particularly interested in tracing the temporal and spatial legacies of this catalogue, and plan to repurpose corpus linguistic methods developed in our "Curatorial Voice" project (generously funded by the British Academy) to examine the enduring legacies of Dorothy George's "voice" beyond her printed volumes.

Participants at the Curatorial Voices workshop, working in small groups and drawing images on paper.
Some things we got up to at our February 2019 Curatorial Voice workshop. What a difference a year makes!

But we also want to demonstrate the value of these methods to cultural institutions. Alongside their collections, catalogues are central to the identities and legacies of these institutions. And so we posit that being better able to examine their catalogue data can help cultural institutions get on with important catalogue-related work: to target precious cataloguing and curatorial labour towards the records that need the most attention, to produce empirically grounded guides to best practice, and to enable more critical user engagement with 'legacy' catalogue records (for more information, see our paper ‘Investigating Curatorial Voice with Corpus Linguistic Techniques: the case of Dorothy George and applications in museological practice’, Museum & Society, 2020).

A table with boxes of black and red lines which visualise the representation of spatial and non-spatial sentence parts in the descriptions of the satirical prints.
An analysis of our BM Satire Descriptions corpus (see doi.org/10.5281/zenodo.3245037 for how we made it and doi.org/10.5281/zenodo.3245017 for our methods). In this visualization - a snapshot of a bigger interactive - one box represents a single description, red lines are sentence parts marked ‘spatial’, and black lines are sentence parts marked as ‘non-spatial’. This output was based on iterative machine learning analysis with Method52. The data used is published by ResearchSpace under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0) license.

Over the course of the "Legacies" project, we had hoped to run two capability building workshops aimed at library, archives, and museum professionals. The first of these was due to take place at the British Library this May, and the aim of the workshop was to test our still very much work-in-progress training module on the computational analysis of catalogue data. Then Covid-19 hit and, like most things in life, the plan had to be dropped.

The new plan is still in development, but the project team know that we need input from the community to make the training module of greatest benefit to that community. The current plan is that in late summer we will run some ad hoc virtual training sessions on computational analysis of catalogue data. And so we are looking for library, archives, and museum professionals who produce or work with catalogue data to be our crash test dummies, to run through parts of the module, to tell us what works, what doesn't, and what is missing. If you'd be interested in taking part in one of these training sessions, please email James Baker and tell me why. We look forward to hearing from you.

"Legacies of Catalogue Descriptions and Curatorial Voice: Opportunities for Digital Scholarship" is funded under the Arts and Humanities Research Council (UK) “UK-US Collaboration for Digital Scholarship in Cultural Institutions: Partnership Development Grants” scheme. Project Reference AH/T013036/1.

27 January 2020

How historians can communicate their research online

This blog post is by Jonathan Blaney (Institute of Historical Research), Frances Madden (British Library), Francesca Morselli (DANS), Jane Winters (School of Advanced Study, University of London)

This blog will be published in several other locations including the FREYA blog and the IHR blog

Large satellite receiver
Source: Joshua Hoehne, Unsplash

On 4 December 2019, the FREYA project in collaboration with UCL Centre for Digital Humanities, Institute of Historical Research, the British Library and DARIAH-EU organized a workshop in London on identifiers in research. In particular this workshop - mainly directed to historians and humanities scholars - focused on ways in which they can build and manage an online profile as researchers, using tools such as ORCID IDs. It also covered best practices and methods of citing digital resources to make humanities researchers' work connected and discoverable to others. The workshop had 20 attendees, mainly PhD students from the London area but also curators and independent researchers.

Presentations

Frances Madden from the British Library introduced the day, which was supported by the FREYA project, funded under the EU’s Horizon 2020 programme. FREYA aims to increase the use of persistent identifiers (PIDs) across the research landscape by building up services and infrastructure. The British Library is leading on the humanities and social sciences aspect of this work.

Frances described how PIDs are central to scholarly communication becoming effective and easy online. We will need PIDs not just for publications but for grey literature, for data, for blog posts, presentations and more. This is clearly a challenge for historians to learn about and use, and the workshop is a contribution to that effort.

PIDs: some historical context

Jonathan Blaney from the Institute of Historical Research said that there is a context to citation and the persistent identifiers which have grown up around traditional forms of print citation. These are almost invisible to us because they are deeply familiar. He gave an example of a reference to the gospel story of the woman taken in adultery:

John 7:53-8:11

There are three conventions here: the name ‘John’ (attached to this gospel since about the 2nd century), the chapter divisions (medieval and ascribed to the English bishop Stephen Langton) and the verse divisions (from the middle of the 16th century).

When learning new forms of referencing, such as the ones under discussion at the workshop, Jonathan suggested that historians should remember their implicit knowledge has been learned. He finished with an anecdote about Harry Belafonte, retold in Anthony Grafton’s The Footnote: A Curious History. As a young sailor Belafonte wanted to follow up on references in a book he had read. The next time he was on shore leave he went to a library and told the librarian:

“Just give me everything you’ve got by Ibid.”

People in conference room watching a presentation

Demonstrating the benefits

Prof Jane Winters from the School of Advanced Study introduced what she claimed was her most egotistical presentation by explaining her own choices in curating her online presence and also what was beyond her control. She showed the different results of web searches for herself using Google and DuckDuckGo and pointed out how things she had almost forgotten about can still feature prominently in results.

Jane described her own use of Twitter, and highlighted both the benefits and challenges of using social media to communicate research and build an online profile. It was the relatively rigid format of her institutional staff profile that led her to create her own website. Although Jane has an ORCID ID and a page on Humanities Commons, for example, there are many online services she has chosen not to use, such as academia.edu.

This is all very much a matter of personal choice, dependent upon people’s own tastes and willingness to engage with a particular service.

How to use what’s available

Francesca Morselli from DANS gave a presentation aiming to provide useful resources about identifiers for researchers as well as explaining in a simple yet exhaustive way how they "work" and the rationale behind them.

Most importantly PIDs ensure:

  1. Citability and discoverability (both for humans and machine)
  2. Disambiguation (between similar objects)
  3. Linking to related resources
  4. Long-term archiving and findability
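To make the disambiguation and linking points concrete, here is a minimal Python sketch of two small, well-documented mechanics: the check character at the end of an ORCID iD (ISO 7064 MOD 11-2) and the way a bare DOI becomes a resolvable link through https://doi.org/. The function names are our own illustration and were not part of the presentation; the ORCID iD in the usage note is the sample iD that ORCID uses in its own documentation.

```python
def orcid_check_digit(base_digits: str) -> str:
    """Return the ISO 7064 MOD 11-2 check character for the first 15 digits."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid: str) -> bool:
    """Check the shape and checksum of an iD written as 0000-0000-0000-0000."""
    compact = orcid.replace("-", "")
    if len(compact) != 16 or not compact[:15].isdigit():
        return False
    return orcid_check_digit(compact[:15]) == compact[15].upper()


def doi_to_url(doi: str) -> str:
    """Turn a DOI (with or without a common prefix) into an https://doi.org/ link."""
    doi = doi.strip()
    for prefix in ("https://doi.org/", "http://doi.org/", "doi.org/", "doi:"):
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
            break
    return "https://doi.org/" + doi
```

For example, `is_valid_orcid("0000-0002-1825-0097")` returns `True`, while changing the final digit makes the check fail. This only tests that an iD is well formed; it says nothing about whether the iD has actually been registered.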

Francesca then introduced the support provided by projects and infrastructures: FREYA, DARIAH-EU and ORCID. Among the FREYA project pillars (PID graph, PID Commons, PID Forum), the last of these is available for anyone interested in identifiers.

The DARIAH-EU infrastructure for Arts and Humanities has recently launched the DARIAH Campus platform which includes useful resources on PIDs and managing research data (i.e. all materials which are used in supporting research). In 2018 DARIAH also organized a winter school on Open Data Citation, whose resources are archived here.

Dariah

 

A Publisher’s Perspective

Kath Burton from Routledge Journals emphasised how much use publishers make of digital tools to harvest content, including social media crawlers, data harvesters and third-party feeds.

The importance of maximising your impact online when publishing was explained, both before publishing (filling in the metadata, giving a meaningful title) and afterwards (linking to the article from social media and websites), as well as how publishers can help support this.

Kath went on to give an example of Taylor & Francis’s interest in the possibilities of online scholarly communication by describing its commitment to publishing 3D models of research objects, which it does via a Sketchfab page.

Breakout Groups

After the presentations and a coffee break there were group discussions about what everyone had just heard. During the first part, the groups were asked what was new to them in the presentations. It was clear from discussions around the room that attendees had heard much which was new to them. For example, some attendees had ORCID IDs but many were surprised at the range of things for which they could be used, such as in journal articles and logging into systems. They were also struck by the range of things in which publishers were interested such as research data. Many were really interested in the use of personal websites to manage their profile.

When asked what tallied with their experiences, it became clear that they were keen to engage with these systems, setting up ORCID IDs and Humanities Commons profiles but that they felt that they were too early on in their careers to have anything to contribute to these platforms and felt they were designed for established researchers. Jane Winters stressed that one could adopt a broad approach to the term ‘publications’, including posters, presentations and blog posts and encouraged all to share what they had.

Lastly, discussion turned to how the group cites digital resources. This led to an interesting conversation around the citation of archived web pages and how to cite webpages which might change over time, with tools such as the Internet Archive being mentioned. There was also discussion about whether one can cite resources such as Wikipedia and it was clear that this was not something which had been encouraged. Jonathan, who has researched this subject, mentioned that he had found established academics are happier to cite Wikipedia than those earlier in their careers.

Conclusions

The workshop effectively demonstrated the sheer range of online tools, social media forums and publishing venues (both formal and informal) through which historians can communicate their research online. This is both an opportunity and a problem. It is a challenge to develop an online presence - to decide which methods are most appropriate for different kinds of research and different personalities - but that is just the first step. For research communication to be truly valuable, it is necessary to focus your effort, manage your online activities and take control of how you appear to others in digital spaces. PIDs are invaluable in achieving this, and in helping you to establish a personal research profile that stays with you as you move through your career. At the start of the day, the majority of those who attended the workshop did not know very much about PIDs and how you can put them to use, but we hope that they came away with an enhanced understanding of the issues and possibilities, the awareness that it does not take much effort or skill to make a real difference to how you are perceived online, and some practical advice about next steps.

It was apparent that, with some admirable exceptions, neither higher education institutions nor PID organisations are successfully communicating the value and importance of PIDs to early career researchers. Workshop attendees particularly welcomed the opportunity to hear from a publisher and senior academic about how PIDs are used to structure, present and disseminate academic work. The clear link between communicating research online and public engagement also emerged during the course of the day, and there is obvious potential for collaboration between PID organisations and those involved with training focused on impact and public engagement. We ended the day with lots of ideas for further advocacy and training, and a shared appreciation for the value of PIDs for helping historians to reach out to a range of different audiences online.

20 January 2020

Using Transkribus for Arabic Handwritten Text Recognition

This blog post is by Dr Adi Keinan-Schoonbaert, Digital Curator for Asian and African Collections, British Library. She's on Twitter as @BL_AdiKS.

 

In the last couple of years we’ve teamed up with PRImA Research Lab in Salford to run competitions for automating the transcription of Arabic manuscripts (RASM2018 and RASM2019), in an ongoing effort to identify good solutions for Arabic Handwritten Text Recognition (HTR).

I’ve been curious to test our Arabic materials with Transkribus – one of the leading tools for automating the recognition of historical documents. We’ve already tried it out on items from the Library’s India Office collection as well as early Bengali printed books, and we were pleased with the results. Several months ago the British Library joined the READ-COOP – the cooperative taking up the development of Transkribus – as a founding member.

As with other HTR tools, Transkribus’ HTR+ engine cannot start automatic transcription straight away, but first needs to be trained on a specific type of script and handwriting. This is achieved by creating a training dataset – a transcription of the text on each page, as accurate as possible, and a segmentation of the page into text areas and lines, demarcating the exact location of the text. Training sets therefore consist of a set of images and an equivalent set of XML files, containing the location and transcription of the text.

A screenshot from Transkribus, showing the segmentation and transcription of a page from Add MS 7474.
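To give a feel for what such an XML file holds, here is a minimal Python sketch that reads line outlines and transcriptions from one ground-truth file. It assumes the file follows the PAGE XML layout, in which each `TextLine` has a `Coords` element with a `points` attribute and a `TextEquiv/Unicode` transcription; this post does not name the format, so treat that as an assumption. The sample document, its coordinates and its text are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Invented two-line sample in a PAGE-style layout (not taken from the real dataset).
SAMPLE_PAGE_XML = """
<PcGts xmlns="http://schema.primaresearch.org/PAGE/gts/pagecontent/2013-07-15">
  <Page imageFilename="example_page.jpg" imageWidth="1000" imageHeight="1400">
    <TextRegion id="region_1">
      <Coords points="90,190 910,190 910,340 90,340"/>
      <TextLine id="line_1">
        <Coords points="100,200 900,200 900,260 100,260"/>
        <TextEquiv><Unicode>مثال</Unicode></TextEquiv>
      </TextLine>
      <TextLine id="line_2">
        <Coords points="100,270 900,270 900,330 100,330"/>
        <TextEquiv><Unicode>سطر ثان</Unicode></TextEquiv>
      </TextLine>
    </TextRegion>
  </Page>
</PcGts>
"""


def local_name(tag: str) -> str:
    """Drop the '{namespace}' prefix so the code works across schema versions."""
    return tag.rsplit("}", 1)[-1]


def read_text_lines(xml_text: str) -> list:
    """Return one dict per TextLine: its id, outline polygon and transcription."""
    root = ET.fromstring(xml_text.strip())
    lines = []
    for elem in root.iter():
        if local_name(elem.tag) != "TextLine":
            continue
        points, text = "", ""
        for child in elem:
            name = local_name(child.tag)
            if name == "Coords":
                points = child.get("points", "")
            elif name == "TextEquiv":
                for sub in child:
                    if local_name(sub.tag) == "Unicode":
                        text = sub.text or ""
        # "x1,y1 x2,y2 ..." becomes [(x1, y1), (x2, y2), ...]
        polygon = [tuple(int(v) for v in pair.split(",")) for pair in points.split()]
        lines.append({"id": elem.get("id"), "polygon": polygon, "text": text})
    return lines


for line in read_text_lines(SAMPLE_PAGE_XML):
    print(line["id"], len(line["polygon"]), "points:", line["text"])
```

Matching on local tag names rather than a fixed namespace is a deliberate choice here, since different tools and schema versions may declare different namespace URIs.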

 

This process can be done in Transkribus, but in this case I already had a training set created using PRImA’s software Aletheia. I used the dataset created for the competitions mentioned above: 120 transcribed and ground-truthed pages from eight manuscripts digitised and made available through QDL. This dataset is now freely accessible through the British Library’s Research Repository.

Transkribus recommends creating a training set of at least 75 pages (between 5,000 and 15,000 words); however, I was interested to find out a few things. First, the methods submitted for the RASM2019 competition worked on a training set of 20 pages, with an evaluation set of 100 pages. Therefore, I wanted to see how Transkribus’ HTR+ engine dealt with the same scenario. It should be noted that the RASM2019 methods were evaluated using PRImA’s evaluation methods, which is not the case for Transkribus’ own evaluation method; the results shown here are therefore not directly comparable, but give some idea of how Transkribus performed on the same training set.

I created four different models to see how Transkribus’ recognition algorithms deal with a growing training set. The models were created as follows:

  • Training model of 20 pages, and evaluation set of 100 pages
  • Training model of 50 pages, and evaluation set of 70 pages
  • Training model of 75 pages, and evaluation set of 45 pages
  • Training model of 100 pages, and evaluation set of 20 pages
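The four splits above can be sketched in a few lines of Python. This is only an illustration: the post does not say which pages went into which set, so the sketch assumes the first pages of an ordered list are used for training and the remainder for evaluation. The page identifiers are placeholders.

```python
def split_pages(pages: list, train_size: int) -> tuple:
    """Split an ordered list of pages into (training, evaluation) parts."""
    if not 0 < train_size < len(pages):
        raise ValueError("train_size must leave at least one page in each set")
    return pages[:train_size], pages[train_size:]


# Placeholder identifiers standing in for the 120 ground-truthed pages.
pages = [f"page_{i:03d}" for i in range(1, 121)]

for train_size in (20, 50, 75, 100):
    training, evaluation = split_pages(pages, train_size)
    print(f"training: {len(training):>3} pages, evaluation: {len(evaluation):>3} pages")
```

With this approach each larger training set contains all the pages of the smaller ones, and no page is ever used for both training and evaluation within a single model.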

The graphs below show each of the four iterations, from top to bottom:

CER of 26.80% for a training set of 20 pages

CER of 19.27% for a training set of 50 pages

CER of 15.10% for a training set of 75 pages

CER of 13.57% for a training set of 100 pages

The results can be summed up in a table:

Training Set (pp.)   Evaluation Set (pp.)   Character Error Rate (CER)   Character Accuracy
20                   100                    26.80%                       73.20%
50                   70                     19.27%                       80.73%
75                   45                     15.10%                       84.90%
100                  20                     13.57%                       86.43%

 

Indeed the accuracy improved with each iteration of training: the more training data the neural networks in Transkribus’ HTR+ engine have, the better the results. With a training set of 100 pages, Transkribus automatically transcribed the remaining 20 pages with an accuracy rate of 86.43%, which is pretty good for historical handwritten Arabic script.
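For readers unfamiliar with the metric, here is a minimal Python sketch of how a Character Error Rate is commonly computed: the edit (Levenshtein) distance between the ground-truth text and the automatic transcription, divided by the length of the ground truth, with character accuracy as its complement. This is a generic illustration; the exact evaluation procedure inside Transkribus is not described in this post and may differ, for example in how it normalises text.

```python
def edit_distance(reference: str, hypothesis: str) -> int:
    """Minimum number of character insertions, deletions and substitutions."""
    previous = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        current = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            cost = 0 if ref_char == hyp_char else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution or match
        previous = current
    return previous[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """CER = edit distance / number of characters in the ground truth."""
    if not reference:
        raise ValueError("reference text must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


def character_accuracy(cer: float) -> float:
    """Character accuracy is simply the complement of the CER."""
    return 1.0 - cer


cer = character_error_rate("abcd", "abxd")  # one wrong character out of four
print(f"CER {cer:.2%}, accuracy {character_accuracy(cer):.2%}")
```

Note that this counts Unicode code points; for Arabic script with diacritics, whether combining marks are counted separately is a choice that can shift the figures.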

As a next step, we could consider (1) adding more ground-truthed pages from our manuscripts to increase the size of the training set, and by that improve HTR accuracy; (2) adding other open ground truth datasets of handwritten Arabic to the existing training set, and checking whether this improves HTR accuracy; and (3) running a few manuscripts from QDL through Transkribus to see how its HTR+ engine transcribes them. If accuracy is satisfactory, we could see how to scale this up and make those transcriptions openly available and easily accessible.

In the meantime, I’m looking forward to participating in the OpenITI AOCP workshop entitled “OCR and Digital Text Production: Learning from the Past, Fostering Collaboration and Coordination for the Future,” taking place at the University of Maryland next week, and catching up with colleagues on all things Arabic OCR/HTR!

 

13 December 2019

Do you want to see my butterfly collection?

Posted on behalf of Sara Lucas Agutoli, artist, associate professor at the Accademia di Belle Arti di Bologna, BL Labs Artist in residence and runner up in the BL Labs Artistic Award 2019.

Sara Lucas Agutoli
Artist: Sara Lucas Agutoli
(Copyright: Ilenia Arosio)

Sara Lucas Agutoli lives and works between London and Bologna.  Her academic research focuses on the concepts of true and false in art, in particular in photography. In her art S. L. Agutoli merges popular themes with a learned and symbolic system of citations. Working with different media, she reflects on the idea of ongoing transformation – of the spaces, of the body, as well as of aesthetics – and creates personal architectures drawing on her inner experiences, knowledge and visions.

When occupied with my full time job, I often spend the time wandering on the net, looking for pictures that trigger my interest, either because they are odd and curious or aesthetically pleasant and elegant.

Since 2011 I’ve enjoyed calling myself a cyber-flâneur [1]: unlike the Parisian strollers described by Baudelaire, I walked through cyber avenues, getting lost amid different digital archives. I glimpsed through collections of images instead of windows, stared at close-ups of manuscripts instead of sunsets on rivers. The net was my city and I just followed my nose walking through it. I wanted to make my curiosity an aesthetic operation. In doing so I’ve come to believe that online archives are my personal church of Saint-Julien-le-Pauvre, the chosen venue for my cyber-dadaist performances (see: https://www.moma.org/collection/works/184056).

For years my working activity followed a pattern: a few months of research, during which I spent hours and hours on Flickr Commons browsing the online archives of museums and institutions and saving selected images on my hard disk, followed by months in the studio working creatively with the pictures accumulated.

I did accumulate images and emotions, from advertising to family album pictures. I wanted to explore how photography was used in different parts of the world, in different eras and in different economic contexts.

In 2011, while in Montreal for my first art residency, I analysed the different uses of vernacular photography in the 50s in North America and Italy. To do so, I used the open archives of many North American libraries and institutions (New York Public Library, Congregation of Sister of St. Joseph in Canada, California Historical Society and many others) and a private physical archive located in a tin box in my grandmother's house.

This led to a series of pictures inspired by this contrast. The series was exhibited in a solo show called Fermez les yeux.

Sara Mickey: Fermez les yeux

The vastness and the richness of topics of the images I accumulated constantly triggered my creativity and my sense of humour. They often made me ask myself “why do those pictures exist?”

The images – especially those more vernacular, random and unforeseen – became the objects trouvés I could rework using my imagination and reality.

During this dadaist-inspired net-surfing, the most fertile encounter of recent years has been the one with the collections of two major London institutions: the British Library and the Wellcome Collection digital archives.

I was about to move from Italy to London and so my artistic research was about to change, inspired by this encounter.

I started to become interested in the aesthetics of the Victorian era and in the concept of the museum as an extension of a wunderkammer.

I started collecting images of naturalia [2] and decided to transform them into artificialia in my studio. And so I did, creatively merging and morphing these images. In 2013 I produced a digital collage of a scientific illustration of a butterfly and a medical lithograph of a vulva, and it was exhibited in public space in Bologna during the CHEAP poster festival.

Cheap Poster Festival
Posters as part of the CHEAP poster festival

This collage of images from the British Library and the Wellcome Collection became the first piece of the larger project Il muro delle meraviglie – the wall of wonders – for which I chose to use the wall of my living room in my home/atelier in NW London.

Il muro delle meraviglie started as a joke to mock the colonialist aesthetic of Victorian museum collections and it became a work of art. Among the wonders I added subsequently, you can find that first collage of the butterfly and the vulva, which I decided to call “Do you want to see my butterfly collection?” to make my queer/feminist perspective encounter the delicacy of the naturalistic illustration of a butterfly.

The title, in Italian, refers to an apparently naïve question which has an explicit sexual allusion.

The person who asks “come and see my butterfly collection” might be suggesting it to obtain something more, as the butterfly is used as a metaphor for the female sex.

Sara Deep Thrash
Installation at DEEP THRASH

This work criticises the male chauvinist obsession with cataloguing, understood as an activity aimed more at showing off than simply showing.

It represents a feminist critique and re-appropriation of such images.

Here the butterflies become proper “c*nts” and give visibility to the female genitalia.

It was first exhibited in 2013 on the streets of Bologna (IT) during the CHEAP festival and at a Queer demonstration thanks to C*ntemporary.

If I didn’t have access to the BL and the Wellcome digital archives, all of this wouldn’t have been possible.

Finally, I would like to thank BL Labs for the support I have received, and I am excited about the new experiments and projects waiting for me around the corner.

Footnotes

  1. Flâneur: a French term meaning ‘stroller’ or ‘loafer’, used by the nineteenth-century French poet Charles Baudelaire to identify an observer of modern urban life. Dada raised the tradition of flânerie to the level of an aesthetic operation. The Parisian walk described by Walter Benjamin in the 1920s is utilized as an art form that inscribes itself directly in real space and time, rather than on a medium.
  2. Naturalia: creatures and natural objects, with a particular interest in monsters.

29 November 2019

Introducing Filipe Bento - BL Labs Technical Lead

Posted by Filipe Bento, BL Labs Technical Lead

Filipe Bento

I am passionate about libraries and digital initiatives within them, and am particularly interested in Open Knowledge, scholarly communication, scientific information dissemination, (Linked) Open Data, and all the innovative services that can be offered to promote their ultimate dissemination and usage, not only within academia, but also within the wider community such as industry and society. I have over twenty years' experience in developing and supporting library tools, some of which have replaced manual methods with automation to make the lives of people who work in or use libraries easier.

Before working at the British Library, I was an independent consultant in the areas of digital strategies and initiatives, library technologies, information management, digital policies, Software as a Service (SaaS) and Open Source Software (OSS). Previous to that, I worked at EBSCO Information Services in several roles, firstly as the Discovery Service Engineering Support Team Manager (Europe and Latin America) and for three years as the Software Services, Application Programming Interfaces (API) and Applications (Apps) manager. My last role at EBSCO was implementing and managing the EBSCO App Store which involved working with several departments within the organisation such as marketing and legal.

Filipe Bento giving a talk at the BAD conference in the Azores
Giving a talk at the National Congress of BAD (Portuguese Librarians, Archivists and Documentalists Association), in the Azores

I helped the University of Aveiro's Library become the first Portuguese adopter of reference Open Source Software (OSS)  - OJS [Open Journal Systems] and implemented the institutional digital repository DSpace for the university (which included a massive data transformation and records deposit, often from citations exported from Scopus). I started my career as a lecturer and then as a computer specialist at the University of Aveiro’s Library, coordinating the development of information systems for its many branches for over fifteen years.

My PhD research in Information and Communication in Digital Platforms gave me the opportunity to connect with my professional interests in libraries, especially in the areas of information discovery. In my PhD, I was able to implement VuFind with innovative community features, as a proposal for the university, which involved engaging actively in its developer community, providing general and technical support in the process. My thesis is available via the link "Search 4.0: Integration and Cooperation Confluence in Scientific Information Discovery".

University of Aveiro (main campus), Portugal

I have also been very active in a number of communities: I am a former chairman of the board of USE.pt, the Portuguese Ex Libris Systems’ Users Association, and a former member of the DigiMedia Research Center - Digital Media and Interaction at the University of Aveiro.

In my personal life I had been a radio and club DJ and worked on a number of personal music projects. I enjoy photography and video and am a keen traveler. I especially like being behind the wheels of cars / motorbikes and the propellers of drones.

I am really excited to be joining the BL Labs team as I believe it provides an excellent opportunity to apply my skills, knowledge and expertise in library digital collections development, systems, data and APIs in a digital scholarship and wider context. I am really looking forward to offering practical advice and implementations in providing access to data, data curation, data visualisation, text and data mining and interactive web-based computing environments such as Jupyter Notebooks, to name a few. BL Labs and the British Library offer a rich, innovative and stimulating environment to explore what staff and users want to do with the Library's incredible and diverse digital collections.

30 October 2019

Workshop on “Digitisation Workflows & Digital Research Studies Methodologies”

In this post, Nicolas Moretto, Metadata Systems Analyst at the British Library, reflects on his work trip to India.

Earlier this year I was given the opportunity to attend a workshop on “Digitisation Workflows & Digital Research Studies Methodologies” held at the National Centre for Biological Sciences (NCBS) in Bangalore, India.

The workshop, which was held on the NCBS campus in the northern part of Bangalore, was jointly organised by Tom Derrick (Two Centuries of Indian Print - 2CIP) and our host Venkat Srinivasan who is the archivist at NCBS. Tom represented the 2CIP project while I attended to cover different metadata aspects. The event was attended by colleagues from 26 different institutions. Tom and I were kindly provided with accommodation on the campus.

a photo showing the workshop participants sitting outside the main building at NCBS campus

Attendees of the workshop outside the NCBS main building

The workshop was intended as an opportunity to learn more about cataloguing, digitisation and OCR, and for the Indian participants to meet colleagues from Bangalore and other parts of India, share experiences, exchange ideas and discuss common standards and best practices. The chance to meet with colleagues working on similar activities – and encountering similar challenges – was an important aspect of the workshop. Most attendees were not professional archivists but had come into archives from academic and other backgrounds and had been exposed to archives and cultural heritage in different ways. All participants shared a high level of enthusiasm for archives and a passion for preserving cultural heritage and the memory of their communities.

workshop participants sitting at desks during the workshop one group of workshop participants in discussion
On the left: The Safeda Room at NCBS. On the right: the NCBS campus offered space for discussions during the breaks

 

The topics of the two-day workshop ranged from talks on the description and arrangement of material (archival and related discovery standards) and presentations on specific projects to digitisation workflows and OCR. Tom gave a practical demo of OCR tools for Indic scripts. I gave a presentation on each day, covering metadata description as well as reuse and discovery.

Ten of the Indian institutions presented five-minute lightning talks covering a diverse range of initiatives and describing their archival collections. The Ashoka Archives of Contemporary India presented their collection, which includes the Mahatma Gandhi papers as well as material from other Indian politicians and academics. The Keystone Foundation gave an overview of the opportunities and challenges around their work with indigenous communities in India. Their aim is to challenge traditional portrayals of indigenous culture by employing oral history interviews, which give a voice to parts of the culture that would otherwise remain unheard. The French Institute of Pondicherry featured material that had been digitised for several Endangered Archives Programme (EAP) projects, including ceiling murals and glass frames. The participants from FLAME University presented a project of digitising Indian cookbooks, showing the interdependencies between caste and cooking. The multimedia resource Sahapedia (https://www.sahapedia.org/) was presented as a way of curating Indian heritage in an online environment. All participants were looking for ways to make cultural heritage more accessible using digital tools. On the afternoon of the second day, the participants had an opportunity to undertake a hands-on activity testing OCR tools using their own material.

The workshop was well received and feedback was overall positive. The participants voiced interest in receiving more in-depth practical training and how-to guides around cataloguing and metadata capture, setting up systems as well as preservation and conservation.

Maya Dodd speaking during her presentation Venkat shows a group of participants some documents inside the NCBS archive
On the left: Maya Dodd from FLAME University presents the Indian recipes project. On the right: Venkat giving a tour of the NCBS archive

 

On the evening of the first day, Venkat gave us a tour of the NCBS archives, which he had built up from scratch, working with NCBS researchers and with the help of student volunteers. The archive was remarkably open, inviting in students and staff even if they did not have an explicit research interest. Venkat was very interested in maintaining it as an open space. His archive is accompanied by an open and evolving exhibition space, which students can contribute to.

Setting up archives in India is not an easy undertaking, and Venkat has put in a tremendous effort to make it work. Even the essentials can be difficult to come by: for example, there is no supplier of archival materials in India, so Venkat had to import all his acid-free boxes from Germany.

On my last day, I accompanied Tom on a visit to the Karnataka State Central Library. The Director of the Department of Public Libraries, Dr. Satish Kumar Hosamani, was not present, but his team kindly offered to give us a tour of the library. The Librarian showed us the round reading room, the newspaper reading room and the collection of rare books and manuscripts. The State Library is planning to digitise these in the near future. This activity is currently awaiting approval and funding from the Karnataka state government.

A view outside the front of the State Central Library  A view of the reading room inside the State Central Library

On the left: Karnataka State Central Library in Cubbon Park. On the right: the round reading room in the State Central Library

 

Trying to find our way to the library, we discovered the existence of a “British Library Road” in Bangalore but were unable to reach it due to the customary extremely heavy traffic in Bangalore. Getting to and from destinations usually took a long time. The best way to get around over short distances was by “Tuk-tuk”, the ever-present means of transport in Indian cities.

A screenshot of Google Maps centred on British Library Road, Bangalore A photo taken from a tuk tuk of congested traffic in Bangalore
On the left: British Library Road in Bangalore. On the right: view from a Tuk-Tuk - the traffic in Bangalore was eternally gridlocked!

 

03 October 2019

BL Labs Symposium (2019): Book your place for Mon 11-Nov-2019


Posted by Mahendra Mahey, Manager of BL Labs

The BL Labs team are pleased to announce that the seventh annual British Library Labs Symposium will be held on Monday 11 November 2019, from 9:30 - 17:00* (see note below) in the British Library Knowledge Centre, St Pancras. The event is FREE, and you must book a ticket in advance to reserve your place. Last year's event was the largest we have ever held, so please don't miss out and book early!

*Please note that, directly after the Symposium, we have teamed up with an interactive/immersive theatre company called 'Uninvited Guests' for a specially organised early evening event for Symposium attendees (the full cost is £13 with some concessions available). Read more at the bottom of this posting!

The Symposium showcases innovative and inspiring projects which have used the British Library’s digital content. Last year's Award winners drew attention to artistic, research, teaching & learning, and commercial activities that used our digital collections.

The annual event provides a platform for the development of ideas and projects, facilitating collaboration, networking and debate in the Digital Scholarship field as well as being a focus on the creative reuse of the British Library's and other organisations' digital collections and data in many other sectors. Read what groups of Master's Library and Information Science students from City University London (#CityLIS) said about the Symposium last year.

We are very proud to announce that this year's keynote will be delivered by scientist Armand Leroi, Professor of Evolutionary Biology at Imperial College, London.

Armand Leroi
Professor Armand Leroi from Imperial College
will be giving the keynote at this year's BL Labs Symposium (2019)

Professor Armand Leroi is an author, broadcaster and evolutionary biologist.

He has written and presented several documentary series on Channel 4 and BBC Four. His latest documentary was The Secret Science of Pop for BBC Four (2017), presenting the results of an analysis of over 17,000 western pop songs from 1960 to 2010 from the US Billboard top 100 charts, carried out together with colleagues from Queen Mary University, with further work published through the Royal Society. Armand has a special interest in how we can apply techniques from evolutionary biology to ask important questions about culture, humanities and what is unique about us as humans.

Previously, Armand presented Human Mutants, a three-part documentary series about human deformity for Channel 4, which also appeared as an award-winning book, Mutants: On Genetic Variety and the Human Body. He also wrote and presented a two-part series, What Makes Us Human, also for Channel 4. On BBC Four Armand presented the documentaries What Darwin Didn't Know and Aristotle's Lagoon, and also released the book The Lagoon: How Aristotle Invented Science, looking at Aristotle's impact on science as we know it today.

Armand's keynote will reflect on his interest and experience in applying techniques he has used over many years from evolutionary biology, such as bioinformatics, data-mining and machine learning, to ask meaningful 'big' questions about culture, humanities and what makes us human.

The title of his talk will be 'The New Science of Culture'. Armand will follow in the footsteps of previous prestigious BL Labs keynote speakers: Dan Pett (2018); Josie Fraser (2017); Melissa Terras (2016); David De Roure and George Oates (2015); Tim Hitchcock (2014); Bill Thompson and Andrew Prescott (2013).

The symposium will be introduced by the British Library's new Chief Librarian, Liz Jolly. The day will include an update and exciting news from Mahendra Mahey (BL Labs Manager at the British Library) about the work of BL Labs, highlighting innovative collaborations BL Labs has been working on, including how it is working with Labs around the world to share experiences, knowledge and lessons learned. There will be news from the Digital Scholarship team about the exciting projects they have been working on, such as Living with Machines and other initiatives, together with a special insight from the British Library’s Digital Preservation team into how they attempt to preserve our digital collections and data for future generations.

Throughout the day, there will be several announcements and presentations showcasing work from nominated projects for the BL Labs Awards 2019, which were recognised last year for work that used the British Library’s digital content in Artistic, Research, Educational and commercial activities.

There will also be a chance to find out who has been nominated and recognised for the British Library Staff Award 2019 which highlights the work of an outstanding individual (or team) at the British Library who has worked creatively and originally with the British Library's digital collections and data (nominations close midday 5 November 2019).

As is our tradition, the Symposium will have plenty of opportunities for networking throughout the day, culminating in a reception for delegates and British Library staff to mingle and chat over a drink and nibbles.

Finally, we have teamed up with the interactive/immersive theatre company 'Uninvited Guests' who will give a specially organised performance for BL Labs Symposium attendees, directly after the symposium. This participatory performance will take the audience on a journey through a world that is on the cusp of a technological disaster. Our period of history could vanish forever from human memory because digital information will be wiped out for good. How can we leave a trace of our existence to those born later? Don't miss out on a chance to book for this unique event at 5pm, specially organised to coincide with the end of the BL Labs Symposium. For more information, and for booking (spaces are limited), please visit here (the full cost is £13 with some concessions available). Please note, if you are unable to join the 5pm show, there will be another performance at 19:45 the same evening (book here for that one).

So don't forget to book your place for the Symposium today, as we predict it will be another full house and we don't want you to miss out.

We look forward to seeing new faces and meeting old friends again!

For any further information, please contact labs@bl.uk

02 October 2019

The 2019 British Library Labs Staff Award - Nominations Open!


Looking for entries now!

A set of 4 light bulbs presented next to each other, the third light bulb is switched on. The image is supposed to be a metaphor representing an 'idea'
Nominate a British Library staff member or a team that has done something exciting, innovative and cool with the British Library’s digital collections or data.

The 2019 British Library Labs Staff Award, now in its fourth year, gives recognition to current British Library staff who have created something brilliant using the Library’s digital collections or data.

Perhaps you know of a project that developed new forms of knowledge, or an activity that delivered commercial value to the library. Did the person or team create an artistic work that inspired, stimulated, amazed and provoked? Do you know of a project developed by the Library where quality learning experiences were generated using the Library’s digital content? 

You may nominate a current member of British Library staff, a team, or yourself (if you are a member of staff), for the Staff Award using this form.

The deadline for submission is 12:00 (BST), Tuesday 5 November 2019.

Nominees will be highlighted on Monday 11 November 2019 at the British Library Labs Annual Symposium where some (winners and runners-up) will also be asked to talk about their projects.

You can see the projects submitted by members of staff for the last two years' awards in our online archive, as well as blogs for last year's winners and runners-up.

The Staff Award complements the British Library Labs Awards, introduced in 2015, which recognise outstanding work that has been done in the broader community. Last year's winner focused on the brilliant work of the 'Polonsky Foundation England and France Project: Digitising and Presenting Manuscripts from the British Library and the Bibliothèque nationale de France, 700–1200'.

The runner-up for the BL Labs Staff Award last year was the 'Digital Documents Harvesting and Processing Tool (DDHAPT)', which was designed to overcome the problem of finding individual known documents in the United Kingdom's Legal Deposit Web Archive.

In the public competition, last year's winners drew attention to artistic, research, teaching & learning, and commercial activities that used our digital collections.

British Library Labs is a project within the Digital Scholarship department at the British Library that supports and inspires the use of the Library's digital collections and data in exciting and innovative ways. It was previously funded by the Andrew W. Mellon Foundation and is now solely funded by the British Library.

If you have any questions, please contact us at labs@bl.uk.