THE BRITISH LIBRARY

Digital scholarship blog

5 posts from July 2019

29 July 2019

Invitation to join ‘Digital Cultural Heritage Innovation Labs Book Sprint’, Doha, Qatar, 23-27 September 2019

Posted by Mahendra Mahey, Manager of BL Labs, and Milena Dobreva-McPherson, Associate Professor of Library and Information Studies, UCL Qatar.

Building Digital Cultural Heritage Innovation Labs

Calling all of you who work in and/or do research in Digital Cultural Heritage Innovation Labs! Join us in Doha, Qatar, 23-27 September 2019 for a week-long Book Sprint!

Apply now by midnight 5 August 2019 for one of the fully funded trips to take part!

We want to create a new guide for setting up, running and maintaining a Digital Cultural Heritage Innovation Lab. Let’s share our experiences (both awesome and challenging) far and wide so that other organisations don’t have to reinvent the wheel.

This is a fantastic opportunity to contribute to a legacy for the Cultural Heritage sector that is bigger than any of us individually. It’s going to be a lot of hard work, but it will also be a fun, creative, and rewarding process!

The idea for the sprint came from the Building Library Labs event we held at the British Library in September 2018, work which we built on in Copenhagen in March 2019.

The event is generously sponsored by UCL Qatar, Qatar University Library and Book Sprints Ltd.

We will let applicants know by 8 August 2019 if they have been successful.

If you are not chosen, or simply can’t make it, don’t worry! We will find other ways to get you involved after the book is published. We intend to promote the work as part of ‘International Open Access Week’, which takes place from 21 to 27 October 2019. We also want to make sure the book is a ‘living publication’ that is constantly updated and amended online, so that it stays relevant and useful to the broader cultural heritage sector and possibly beyond.

If you have any specific questions before you apply, please feel free to email Mahendra at mahendra.mahey@bl.uk or Milena at m.dobreva@ucl.ac.uk.

22 July 2019

Our highlights from Digital Humanities 2019: Nora and Giorgia

We've put together a series of posts about our experiences at the Digital Humanities conference in Utrecht this month. In this post, Digital Curator Nora McGregor and Dr Giorgia Tolfo from the British Library / Alan Turing Institute's Living with Machines project share their impressions. See also Mia and Yann's post, and Rossitza and Daniel's post.

Lunchtime at TivoliVredenburg music hall, viewed from Cloud Nine

Nora McGregor

My most exciting discovery was the Libraries & Digital Humanities Special Interest Group (@LibsDH) of the Alliance of Digital Humanities Organizations (ADHO) (@ADHOrg). I found my PEOPLE! This is a loosely joined cohort of folks from libraries across the world with a peculiar passion for all things supporting digital scholarship. We held a casual, brief and efficient gathering over lunch, where talk turned to joining forces to develop a summer school (in the vein of the popular and prolific week-long Rare Books and Digital Humanities affairs) to address the specific digital skills training needs of librarians.

Giorgia Tolfo

Which talk were you most looking forward to, and why?

DH2019 offered a plethora of panels and workshops to choose from. When I first read the programme I felt like a hungry person at the supermarket, craving everything on the shelves. Since I couldn’t eat everything, I had to focus on the panels whose topics I knew were, or sounded, relevant to the Living with Machines project, an interdisciplinary project at the crossroads between historical research and artificial intelligence, in collaboration with the Alan Turing Institute.

As my role involves an in-depth knowledge of digitisation strategies for newspapers and data models, my attention was immediately drawn to the Oceanic Exchanges panel, which focussed on case studies around the spread of news and/or the translation of concepts across the Atlantic Ocean as they emerged in newspapers. Among these studies, one I was particularly interested in was on the concept of italianità (= Italianness) in Italian and US-based Italian ethnic newspapers at the time of the unification of Italy.

What did you learn?

What I found most interesting, beyond the content of the individual research cases presented, was that regardless of the focus of the project, the digital humanities community has a shared underlying methodology, as well as commonly known concerns and issues that we are trying to face both independently and together.

Among the latter there is certainly a problem with the availability of, and access to, datasets. Due to copyright limitations or a lack of funds to digitise new material, some potentially relevant datasets aren’t available, in some cases forcing research questions to be reshaped according to what is available. The impact of this is the blurring of the distinction between historical research and storytelling. Which stories emerge from data analysis and visualisation? Are these universal or just some among the many possible ones? Are the sources biased or reliable? These are epistemological problems that need to be addressed carefully.

On the other hand, in terms of shared methodology, there is an increasing awareness of the need (and effort) to focus on integration, sustainability and shareability. Hence the interest of many research teams in common data models, linked open data, the use of standard languages and methodologies, and scalable and reusable components.

Anything else?

Well, the fun run! I was one of the 25 enthusiastic people who set the alarm clock for 6am just to run... for fun!

Our highlights from Digital Humanities 2019: Rossitza and Daniel

We've put together a series of posts about our experiences at the Digital Humanities conference in Utrecht this month. In this post, Digital Curator Dr Rossitza Atanassova and Daniel Van Strien from the British Library / Alan Turing Institute's Living with Machines project share their impressions. See also Mia and Yann's post, and Nora and Giorgia's post.

Rossitza Atanassova

I loved the variety of topics and formats in the conference programme, and I have tweeted about some of the most interesting talks I attended. I have to say movement between sessions was a bit complicated by the proliferation of stairs and escalators in the venue, which otherwise presented great views of Utrecht and offered comfy cushions to relax on during lunch! Like Mia and Nora I was inspired by the @LibsDH meetup, whilst my most surprising encounter was with the winning skeleton-poster.

Gender and Intersectional Identities in DH poster by @jotis13 @quinnanya @khetiwe24 @RHendery

Of particular interest to me were the sessions on digitised newspapers and the related conversations between researchers and collection-holding institutions. Back in the office I will reflect on some of the discussions and will continue to engage with the ‘Researchers & Libraries working together on improving digitised newspapers’ and the Digital Historical Periodica groups. Many of the talks illustrated the importance of semantic annotations for synoptic examination of historical periodicals, and I hope to apply at work what I learned in the excellent pre-conference workshop on Named Entity Processing delivered by @ImpressoProject.

I also found the panel on Exploring AV Corpora in the Humanities enjoyable and cool, in particular the presentation on the Distant Viewing Toolkit (DVT) for the Cultural Analysis of Moving Images. Outside the conference, I had fun walking along the artistic light-themed route to explore Utrecht city centre. I enjoyed the conference so much that I have submitted a DH2020 reviewer self-nomination!

Installation by Erik Groen, Ganzenmarkt Tunnel, Utrecht

Daniel Van Strien

I thought I would focus on a couple of sessions relating to OCR that I would be keen to explore further as part of the Living with Machines project. In particular, I am keen to explore two OCR tools: Transkribus and Kraken.

Transkribus was discussed in the context of doing OCR on newspapers as part of the Impresso project in the paper ‘Improving OCR of Black Letter in Historical Newspapers: The Unreasonable Effectiveness of HTR Models on Low-Resolution Images’. Although I had heard about the tool before, I was particularly interested to hear how it was being used with newspapers, as I had primarily heard about its use in handwritten text recognition. The paper also gave an initial idea of how much ground truth data might need to be generated before training a new OCR engine for newspaper text. As part of the Impresso project, 167 pages of ground truth data were created: not trivial by any means, but much less than might be expected. With this amount of data the project was able to achieve a substantial improvement in OCR quality over various versions of the ABBYY software.

The second tool was Kraken, which was introduced in the paper ‘Kraken - an Universal Text Recognizer for the Humanities’. I was particularly interested to hear how easily this tool could be trained with new annotations to recognise new types and languages. For the most part Living with Machines will rely on previously generated OCR, but there may be occasions when it is worth investing time to try to produce more accurate OCR. On those occasions, testing Kraken further would be a good starting point, particularly because it makes it relatively easy to train on data annotated at the line rather than the word level. This makes annotating the ground truth data (a little) less painful and time-consuming.



Our highlights from Digital Humanities 2019: Mia and Yann

In this series of posts, Digital Curator (and Co-Investigator on the Living with Machines project) Dr Mia Ridge has collected impressions of the Digital Humanities conference, held in Utrecht from 8 to 12 July. In this post, Mia and Yann Ryan, Curator, Newspaper Data, share their impressions. See also Rossitza and Daniel's post, and Nora and Giorgia's post.

As my colleague Rossitza posted beforehand, a lot of the Digital Scholarship team were at the DH2019 conference. Before we left, I asked everyone to note which sessions they were looking forward to, what they would bring back from the conference to their work, and anything interesting or cool they spotted at the conference.

Mia Ridge

I’d reviewed some conference proposals so I knew there’d be lots of interesting talks, but I was particularly looking forward to lots of conversations at the conference - some online, some in person. (Apparently I tweeted quite a lot.) A lot of those conversations ended up being about improving the discoverability of digitised newspapers and the experience of researching with them. There was a strong theme around thinking about cultural heritage organisations as partners in research rather than simply as ‘data providers’. If you’re a researcher or GLAM practitioner who’d like to continue the conversation, join the Periodica discussion list or check out the notes from the impromptu meetup on the Friday, ‘DH2019 Lunch session - Researchers & Libraries working together on improving digitised newspapers’.

I went to some sessions that were outside my usual areas of focus (media studies, VR/AR) and others that were familiar territory (designing data structures, working with union catalogue data). I’ve shared my more detailed but very rough and ready DH2019 conference notes on my own blog. Finally, I really enjoyed the 'Libraries and DH conversation', and both the libraries and newspapers conversations will inform my work in digital scholarship and on Living with Machines.

Ad hoc session 'Researchers & Libraries working together on improving digitised newspapers'. Photo by @MartijnKleppe

Yann Ryan

DH2019 was my first mega-conference and I found it a really useful, if overwhelming, experience. Picking talks was a bit like trying to work out your schedule at Glastonbury: at both, it’s worth keeping in mind that you’re always going to miss something, and anyway the best bits always happen in the spaces in between, whether that is browsing the posters or just hanging around the communal area chatting to new friends.

It was fun to see the ways in which derived newspaper data – word embeddings, named entities and so forth – are being used by researchers in practice, and I loved hearing about the ways in which this material is bringing new insights to historical themes and concepts. It was also a great place to learn about new projects: I was particularly excited by the Impresso project (a platform for browsing digitised newspapers) and the Amsterdam Time Machine.

I learned a great deal about how researchers are working with data, as well as the format and size of newspaper datasets they need or expect. The Heritage Made Digital project will release open datasets based on the newspapers we’re digitising, and hearing how others are using similar material will help to inform the best way to carry that out.

My single favourite thing was Repetition And Popularity In Early Modern Songs, a poster for a project which measured the repetitiveness of early modern song lyrics against the number of times they were reprinted. Turns out more repetitive songs got reprinted sooner and lasted longer, which is a bit like modern pop music!

01 July 2019

British Library Digital Scholarship at Digital Humanities 2019

BL_DigiSchol Twitter Profiles Collage

 

Members of the Library’s Digital Scholarship Department will be present at DH2019 - the biggest representation of our team so far at this important DH event. We are all really excited about it, especially the first-timers amongst us!

Below we highlight the team’s contributions to the DH2019 Programme and hope to see you at these sessions. We will also be attending some of the pre-conference workshops and will record our #DH2019 impressions in a post-conference blogpost, so watch this space.

If you would like to arrange a casual meetup, do message us at @BL_DigiSchol or via our personal Twitter accounts. See you in Utrecht #DH2019!

 

Monday 8th July

Libraries As Research Partner in Digital Humanities

DH 2019 Pre-Conference, The Hague

Mahendra Mahey et al.

 

Wednesday 10th July 

A National Library’s digitisation guide for Digital Humanists

Rossitza Ilieva Atanassova

(11:00-12:30 SP-04: Cultural Heritage, Art/ifacts and Institutions)

This short paper will give practical advice about the Library’s digitisation planning process for scholars who wish to use digitised resources in their research. The information will help scholars understand the institutional context, the roles involved in digitisation, the preparation stages and documentation required, typical timelines and the decision-making that happens at different stages. With this knowledge it is hoped that DH scholars will be better prepared for the process and will factor it into their research funding proposals. They will also gain an understanding of the Library’s considerations and policy for making existing digitised resources available for reuse, and how scholars could request this for their projects. In making the policy and processes at the institution more transparent, the presentation will expose some of the hidden labour undertaken by cultural heritage staff to enable Digital Humanities (DH) research.

 

The Past, Present and Future of Digital Scholarship with Newspaper Collections

Mia Ridge (1), Giovanni Colavizza (2), Laurel Brake (3), Maud Ehrmann (4), Jean-Philippe Moreux (6), Andrew Prescott (5)

(1) British Library; (2) The Alan Turing Institute; (3) Birkbeck, Univ of London; (4) EPFL; (5) University of Glasgow; (6) Bibliothèque nationale de France

(14:00-15:30 P-07: History and Historiographies)

Historical newspapers are of interest to many humanities scholars as sources of information and language closely tied to a particular time, social context and place. Digitised newspapers are also of interest to many data-driven researchers who seek large bodies of text on which they can try new methods and tools. Recently, large consortium projects applying data science and computational methods to historical newspapers at scale have emerged, including NewsEye, impresso, Oceanic Exchanges and Living with Machines.

This multi-paper panel draws on the work of a range of interdisciplinary newspaper-based digital humanities and/or data science projects, alongside 'provocations' from two senior scholars who will provide context for current ambitions. As a unique opportunity for stakeholders to engage in dialogue, for the DH2019 community to ask their own questions of newspaper-based projects, and for researchers to map methodological similarities between projects, it aims to have a significant impact on the field.



Thursday 11th July

The Complexities of Video Games and Education: In the Library, the Museum, Schools and Universities

Stella Wisdom (1), Andrew Burn (2), Sally Bushell (3), James Butler (3), Xenia Zeiler (4), Duncan Hay (3)

(1) British Library, United Kingdom; (2) University College London Institute of Education, United Kingdom; (3) Lancaster University, United Kingdom; (4) University of Helsinki, Finland

(11:00-12:30 P-15: Cultural Heritage, Art/ifacts and Institutions)

This panel explores several research projects that use video games and digital game making tools as methods for engaging learners of all ages with digitised collections from libraries, archives and museums to facilitate new understandings of historical and cultural events, or create new media adaptations and interpretations of classic literary works.

 

Data Science & Digital Humanities: new collaborations, new opportunities and new complexities

Beatrice Alex (1), Anne Alexander (2), David Beavan (3), Eirini Goudarouli (4), Leonardo Impett (6), Barbara McGillivray (2), Nora McGregor (5), Mia Ridge (5)

(1) University of Edinburgh; (2) University of Cambridge; (3) The Alan Turing Institute; (4) The National Archives; (5) British Library; (6) Bibliotheca Hertziana - Max Planck Institute for Art History

(11:00-12:30 P-17: Scholarly Communities, Communication, Pedagogy)

This panel highlights the emerging collaborations and opportunities between the fields of Digital Humanities (DH), Data Science (DS) and Artificial Intelligence (AI). It charts the enthusiastic progress of a national-level research institute focussed on DS & AI, as it engages non-STEM disciplines. We discuss the exciting work and learnings from various new activities, across a number of high-profile institutions. As these initiatives push intellectual and computational boundaries, the panel considers the gains, benefits and complexities encountered. The panel finally turns towards the future of such interdisciplinary working, considering how DS & DH collaborations can grow, with a view towards a manifesto.