THE BRITISH LIBRARY

Digital scholarship blog

90 posts categorized "Humanities"

29 November 2019

Introducing Filipe Bento - BL Labs Technical Lead


Posted by Filipe Bento, BL Labs Technical Lead

I am passionate about libraries and digital initiatives within them, and am particularly interested in Open Knowledge, scholarly communication, scientific information dissemination, (Linked) Open Data, and all the innovative services that can be offered to promote their dissemination and usage, not only within academia, but also within the wider community such as industry and society. I have over twenty years' experience in developing and supporting library tools, some of which have replaced manual methods with automation to make life easier for people who work in or use libraries.

Before working at the British Library, I was an independent consultant in the areas of digital strategies and initiatives, library technologies, information management, digital policies, Software as a Service (SaaS) and Open Source Software (OSS). Previous to that, I worked at EBSCO Information Services in several roles, firstly as the Discovery Service Engineering Support Team Manager (Europe and Latin America) and for three years as the Software Services, Application Programming Interfaces (API) and Applications (Apps) manager. My last role at EBSCO was implementing and managing the EBSCO App Store which involved working with several departments within the organisation such as marketing and legal.

Filipe Bento giving a talk at the National Congress of BAD (Portuguese Librarians, Archivists and Documentalists Association), in the Azores

I helped the University of Aveiro's Library become the first Portuguese adopter of reference Open Source Software (OSS), namely OJS (Open Journal Systems), and implemented the institutional digital repository DSpace for the university (which included a massive data transformation and records deposit, often from citations exported from Scopus). I started my career as a lecturer and then as a computer specialist at the University of Aveiro’s Library, coordinating the development of information systems for its many branches for over fifteen years.

My PhD research in Information and Communication in Digital Platforms gave me the opportunity to connect with my professional interests in libraries, especially in the areas of information discovery. In my PhD, I was able to implement VuFind with innovative community features, as a proposal for the university, which involved engaging actively in its developer community, providing general and technical support in the process. My thesis is available via the link "Search 4.0: Integration and Cooperation Confluence in Scientific Information Discovery".

University of Aveiro (main campus), Portugal

I have also been very active in a number of communities: I am a former chairman of the board of USE.pt, the Portuguese Ex Libris Systems’ Users Association, and a former member of the DigiMedia Research Center - Digital Media and Interaction at the University of Aveiro.

In my personal life I have been a radio and club DJ and worked on a number of personal music projects. I enjoy photography and video and am a keen traveller. I especially like being behind the wheels of cars and motorbikes, and the propellers of drones.

I am really excited to be joining the BL Labs team, as I believe it provides an excellent opportunity to apply my skills, knowledge and expertise in library digital collections development, systems, data and APIs in a digital scholarship and wider context. I am really looking forward to offering practical advice and implementations in providing access to data, data curation, data visualisation, text and data mining, and interactive web-based computing environments such as Jupyter Notebooks, to name a few. BL Labs and the British Library offer a rich, innovative and stimulating environment in which to explore what staff and users want to do with the Library's incredible and diverse digital collections.

03 October 2019

BL Labs Symposium (2019): Book your place for Mon 11-Nov-2019


Posted by Mahendra Mahey, Manager of BL Labs

The BL Labs team are pleased to announce that the seventh annual British Library Labs Symposium will be held on Monday 11 November 2019, from 9:30 - 17:00* (see note below) in the British Library Knowledge Centre, St Pancras. The event is FREE, and you must book a ticket in advance to reserve your place. Last year's event was the largest we have ever held, so please don't miss out and book early!

*Please note that, directly after the Symposium, we have teamed up with an interactive/immersive theatre company called 'Uninvited Guests' for a specially organised early evening event for Symposium attendees (the full cost is £13 with some concessions available). Read more at the bottom of this posting!

The Symposium showcases innovative and inspiring projects which have used the British Library’s digital content. Last year's Award winners drew attention to artistic, research, teaching & learning, and commercial activities that used our digital collections.

The annual event provides a platform for the development of ideas and projects, facilitating collaboration, networking and debate in the Digital Scholarship field as well as being a focus on the creative reuse of the British Library's and other organisations' digital collections and data in many other sectors. Read what groups of Master's Library and Information Science students from City University London (#CityLIS) said about the Symposium last year.

We are very proud to announce that this year's keynote will be delivered by scientist Armand Leroi, Professor of Evolutionary Biology at Imperial College, London.

Professor Armand Leroi from Imperial College will be giving the keynote at this year's BL Labs Symposium (2019)

Professor Armand Leroi is an author, broadcaster and evolutionary biologist.

He has written and presented several documentary series on Channel 4 and BBC Four. His latest documentary was The Secret Science of Pop for BBC Four (2017), presenting the results of an analysis of over 17,000 western pop songs from the US Billboard top 100 charts between 1960 and 2010, carried out together with colleagues from Queen Mary University, with further work published through the Royal Society. Armand has a special interest in how we can apply techniques from evolutionary biology to ask important questions about culture, humanities and what is unique about us as humans.

Previously, Armand presented Human Mutants, a three-part documentary series about human deformity for Channel 4, and wrote the accompanying award-winning book, Mutants: On Genetic Variety and the Human Body. He also wrote and presented a two-part series, What Makes Us Human, also for Channel 4. On BBC Four, Armand presented the documentaries What Darwin Didn't Know and Aristotle's Lagoon, and also released the book The Lagoon: How Aristotle Invented Science, which looks at Aristotle's impact on science as we know it today.

Armand's keynote will reflect on his interest and experience in applying techniques he has used over many years from evolutionary biology, such as bioinformatics, data mining and machine learning, to ask meaningful 'big' questions about culture, humanities and what makes us human.

The title of his talk will be 'The New Science of Culture'. Armand will follow in the footsteps of previous prestigious BL Labs keynote speakers: Dan Pett (2018); Josie Fraser (2017); Melissa Terras (2016); David De Roure and George Oates (2015); Tim Hitchcock (2014); Bill Thompson and Andrew Prescott (2013).

The symposium will be introduced by the British Library's new Chief Librarian, Liz Jolly. The day will include an update and exciting news from Mahendra Mahey (BL Labs Manager at the British Library) about the work of BL Labs, highlighting the innovative collaborations it has been working on, including how it is working with Labs around the world to share experiences, knowledge and lessons learned. There will be news from the Digital Scholarship team about the exciting projects they have been working on, such as Living with Machines and other initiatives, together with a special insight from the British Library’s Digital Preservation team into how they attempt to preserve our digital collections and data for future generations.

Throughout the day, there will be several announcements and presentations showcasing work from nominated projects for the BL Labs Awards 2019, which were recognised last year for work that used the British Library’s digital content in Artistic, Research, Educational and Commercial activities.

There will also be a chance to find out who has been nominated and recognised for the British Library Staff Award 2019 which highlights the work of an outstanding individual (or team) at the British Library who has worked creatively and originally with the British Library's digital collections and data (nominations close midday 5 November 2019).

As is our tradition, the Symposium will have plenty of opportunities for networking throughout the day, culminating in a reception for delegates and British Library staff to mingle and chat over a drink and nibbles.

Finally, we have teamed up with the interactive/immersive theatre company 'Uninvited Guests', who will give a specially organised performance for BL Labs Symposium attendees directly after the symposium. This participatory performance will take the audience on a journey through a world that is on the cusp of a technological disaster. Our period of history could vanish forever from human memory because digital information will be wiped out for good. How can we leave a trace of our existence to those born later? Don't miss the chance to book this unique event at 5pm, specially organised to coincide with the end of the BL Labs Symposium. For more information, and for booking (spaces are limited), please visit here (the full cost is £13 with some concessions available). Please note, if you are unable to join the 5pm show, there will be another performance at 19:45 the same evening (book here for that one).

So don't forget to book your place for the Symposium today, as we predict it will be another full house and we don't want you to miss out.

We look forward to seeing new faces and meeting old friends again!

For any further information, please contact labs@bl.uk

14 September 2019

BL Labs Awards 2019: enter before 21:00 on Sunday 29th September! (deadline extended)


We have extended the deadline for our BL Labs Awards to 21:00 (BST) on Sunday 29th September; submit your entry here. If you have already entered, you don't have to resubmit; however, we are happy to receive updated entries too.

The BL Labs Awards formally recognise outstanding and innovative work that has been created using the British Library’s digital collections and data.

Submit your entry, and help us spread the word to all interested parties!

This year, BL Labs is commending work in four key areas:

  • Research - A project or activity that shows the development of new knowledge, research methods, or tools.
  • Commercial - An activity that delivers or develops commercial value in the context of new products, tools, or services that build on, incorporate, or enhance the Library's digital content.
  • Artistic - An artistic or creative endeavour that inspires, stimulates, amazes and provokes.
  • Teaching / Learning - Quality learning experiences created for learners of any age and ability that use the Library's digital content.

After the submission deadline of 21:00 (BST) on Sunday 29th September for entering the BL Labs Awards has passed, the entries will be shortlisted. Selected shortlisted entrants will be notified via email by midnight BST on Thursday 10th October 2019. 

A prize of £500 will be awarded to the winner and £100 to the runner up in each Awards category at the BL Labs Symposium on 11th November 2019 at the British Library, St Pancras, London.

The talent of the BL Labs Awards winners and runners up over the last four years has led to the production of a remarkable and varied collection of innovative projects. In 2018, the Awards commended work in four main categories – Research, Artistic, Commercial and Teaching & Learning:

Photo collage

  • Research category Award (2018) winner: The Delius Catalogue of Works: the production of a comprehensive catalogue of works by the composer Delius, based on research using (and integrated with) the BL’s Archives and Manuscripts Catalogue by Joanna Bullivant, Daniel Grimley, David Lewis and Kevin Page from Oxford University’s Music department.
  • Artistic Award (2018) winner: Another Intelligence Sings (AI Sings): an interactive, immersive sound-art installation, which uses AI to transform environmental sound recordings from the BL’s sound archive, by Amanda Baum, Rose Leahy and Rob Walker, independent artists and experience designers.
  • Commercial Award (2018) winner: Fashion presentation for London Fashion Week by Nabil Nayal: the Library collection - a fashion collection inspired by digitised Elizabethan-era manuscripts from the BL, culminating in several fashion shows/events/commissions including one at the BL in London.
  • Teaching and Learning (2018) winner: Pocket Miscellanies: ten online pocket-book ‘zines’ featuring images taken from the BL digitised medieval manuscripts collection by Jonah Coman, PhD student at Glasgow School of Art.

For further information about BL Labs or our Awards, please contact us at labs@bl.uk.

Posted by Mahendra Mahey, Manager of British Library Labs.

13 September 2019

Results of the RASM2019 Competition on Recognition of Historical Arabic Scientific Manuscripts


This blog post is by Dr Adi Keinan-Schoonbaert, Digital Curator for Asian and African Collections, British Library. She's on Twitter as @BL_AdiKS.

 

Earlier this year, the British Library in collaboration with PRImA Research Lab and the Alan Turing Institute launched a competition on the Recognition of Historical Arabic Scientific Manuscripts, or in short, RASM2019. This competition was held in the context of the 15th International Conference on Document Analysis and Recognition (ICDAR2019). It was the second competition of this type, following RASM2018 which took place in 2018.

The Library has an extensive collection of Arabic manuscripts, comprising almost 15,000 works. We have been digitising several hundred manuscripts as part of the British Library/Qatar Foundation Partnership, making them available on Qatar Digital Library. A natural next step would be the creation of machine-readable content from scanned images, for enhanced search and whole new avenues of research.

Running a competition helps us identify software providers and tool developers, and introduces us to the specific challenges that pattern recognition systems face when dealing with historic, handwritten materials. For this year’s competition we provided a ground truth set of 120 images and associated XML files: 20 pages to be used to train text recognition systems to automatically identify Arabic script, and 100 pages to evaluate the training.

Aside from providing larger training and evaluation sets, for this year’s competition we’ve added an extra challenge – marginalia. Notes written in the margins are often less consistent and less coherent than main blocks of text, and can go in different directions. The competition set out three different challenges: page segmentation, text line detection and Optical Character Recognition (OCR). Tackling marginalia was a bonus challenge!

We had just one submission for this year’s competition – RDI Company, Cairo University, who previously participated in 2018 and did very well. RDI submitted three different methods, and participated in two challenges: text line segmentation and OCR. When evaluating the results, PRImA compared established systems used in industry and academia – Tesseract 4.0, ABBYY FineReader Engine 12 (FRE12), and Google Cloud Vision API – to RDI’s submitted methods. The evaluation approach was the same as last year’s, with PRImA evaluating page analysis and recognition methods using different evaluation metrics, in order to gain an insight into the algorithms.

 

Results

Challenge 1 - Page Layout Analysis

The first challenge set out to identify regions in a page, and to find out where blocks of text are located on the page. RDI did not participate in this challenge, so the analysis covered only the common industry software mentioned above. The results can be seen in the chart below:

Chart showing RASM2019 page segmentation results

 

Google did relatively well here, and the results are quite similar to last year’s. Despite dealing with the more challenging marginalia text, Google’s previous accuracy score (70.6%) has gone down only very slightly to a still impressive 69.3%.

Example image showing Google’s page segmentation

 

Tesseract 4 and FRE12 scored very similarly, with Tesseract decreasing from last year’s 54.5%. Interestingly, FRE12’s performance on text blocks including marginalia (42.5%) was better than last year’s FRE11 performance without marginalia, scoring at 40.9%. Analysis showed that Tesseract and FRE often misclassified text areas as illustrations, with FRE doing better than Tesseract in this regard.

 

Challenge 2 - Text Line Segmentation

The second challenge looked into segmenting text into distinct text lines. RDI submitted three methods for this challenge, all of which returned the text lines of the main text block (as they did not wish to participate in the marginalia challenge). Results were then compared with Tesseract and FineReader, and are reflected below:

Chart showing RASM2019 text line segmentation results

 

RDI did very well with its three methods, with an accuracy level ranging between 76.6% and 77.6%. However, despite not attempting to segment marginalia text lines, their methods did not perform as well as last year’s method (with 81.6% accuracy). Their methods did seem to detect some marginalia, though very little overall, as seen in the screenshot below.

Example image showing RDI’s text line segmentation results

 

Tesseract and FineReader again scored lower than RDI, both with decreasing accuracy compared to RASM2018’s results (Tesseract 4 with 44.2%, FRE11 with 43.2%). This is due to the additional marginalia challenge. The Google method does not detect text lines, therefore the Text Line chart above does not include their results.

 

Challenge 3 - OCR Accuracy

The third and last challenge was all about text recognition, tackling the correct identification of characters and words in the text. Evaluation for this challenge was conducted four times: 1) on the whole page, including marginalia, 2) only on main blocks of text, excluding marginalia, 3) using the original texts, and 4) using normalised texts. Text normalisation was performed for both ground truth and OCR results, due to the historic nature of the material, occasional unusual spelling, and use/lack of diacritics. All methods performed slightly better when not tested on marginalia; accuracy rates are demonstrated in the charts below:

Chart showing OCR accuracy results, for main text body only (normalised, no marginalia)
 
Chart showing OCR accuracy results for all text regions (normalised, with marginalia)
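
The text normalisation mentioned above (applied to both ground truth and OCR output before comparison) can be illustrated with a minimal Python sketch. The competition's actual normalisation rules are not given in this post, so the rules below are assumptions for illustration only: strip combining marks (which covers Arabic short-vowel diacritics), drop the tatweel elongation character, and collapse whitespace. The function name `normalise` is hypothetical.

```python
import unicodedata

TATWEEL = "\u0640"  # Arabic elongation mark (kashida); dropping it is an assumption


def normalise(text: str) -> str:
    """Return text with combining marks and tatweel removed and whitespace collapsed.

    This is an illustrative rule set, not the one used in RASM2019.
    """
    composed = unicodedata.normalize("NFC", text)
    kept = [
        ch for ch in composed
        # Unicode category "Mn" (non-spacing mark) includes Arabic diacritics
        if unicodedata.category(ch) != "Mn" and ch != TATWEEL
    ]
    # split()/join collapses runs of whitespace and trims both ends
    return " ".join("".join(kept).split())
```

With rules like these, two transcriptions that differ only in their use of diacritics normalise to the same string, so that difference no longer counts against the OCR result.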

 

It is evident that there are minor differences in the character accuracies for the three RDI methods, with RDI2 performing slightly better than the others. When comparing the OCR accuracy between texts with and without marginalia, there are slightly higher success rates for the latter, though the difference is not significant. This means that tested methods performed on the marginalia almost as well as they did on the main text, which is encouraging.

Compared with RASM2018’s results, RDI’s results are good but not as good as last year’s (85.44% accuracy), which is likely to be a result of adding marginalia to the recognition challenge. Google performed very well too, considering they did not specifically train or optimise for this competition. Tesseract’s results went down from 30.45% to 25.13%, and FineReader Engine 12 performed better than its previous version FRE11, going up from 12.23% to 17.53% accuracy. However, this is still very low, as handwritten texts are not part of their target material.
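
For readers unfamiliar with how a character accuracy figure such as those above can be derived, here is a minimal Python sketch. It assumes one common definition, 1 minus the Levenshtein edit distance divided by the length of the ground truth; the exact metric and tooling used by PRImA for RASM2019 are not described in this post and may differ. The function names are hypothetical.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def character_accuracy(ground_truth: str, ocr_text: str) -> float:
    """Assumed definition: 1 - distance / len(ground_truth), floored at 0."""
    if not ground_truth:
        return 1.0 if not ocr_text else 0.0
    distance = edit_distance(ground_truth, ocr_text)
    return max(0.0, 1.0 - distance / len(ground_truth))
```

Under this definition, an OCR result with one wrong character in a four-character ground truth scores 0.75, i.e. 75% character accuracy.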

 

Further Thoughts

RDI-Corporation has its own historical Arabic handwritten and typewritten OCR system, which has been built using different historical manuscripts. Its methods have done well, given the very challenging nature of the documents. Neither Tesseract nor ABBYY FineReader produces usable results, but that’s not surprising since they are both optimised for printed texts, and target contemporary material rather than historical manuscripts.

As next steps, we would like to test these materials with Transkribus, which produced promising results for early printed Indian texts (see e.g. Tom Derrick’s blog post – stay tuned for some even more impressive results!), and potentially Kraken as well. All ground truth will be released through the Library’s future Open Access repository (now in testing phase), as well as through the website of IMPACT Centre for Competence. Watch this space for any developments!

 

21 August 2019

Chevening British Library Fellowship working with Chinese historical texts


Chevening is the UK government’s international awards programme aimed at developing global leaders. Since 2015, the Foreign and Commonwealth Office (FCO) has partnered with the British Library to offer professionals two new fellowships every year. These fellowships are unique opportunities for one-year placements at the Library, working with exceptional collections under the Library’s custodianship. Past and present Chevening Fellows at the Library have focused on geographically diverse collections, from Latin America through Africa to South Asia, with different themes such as Nationalism, Independence, and Partition in South Asia, 1900-1950 and Big Data and Libraries.

We are thrilled to announce that one of the two placements available for the 2020/2021 academic year will focus on automating the recognition of historical Chinese handwritten texts. This is a special opportunity to work in the Library’s Digital Scholarship Department, and engage with unique historical collections digitised as part of the International Dunhuang Project and the Lotus Sutra Manuscripts Digitisation Project. Focusing on material from Dunhuang (China), part of the Stein collection, this Fellowship will engage with new digital tools and techniques in order to explore possible solutions to automate the transcription of these handwritten texts.

Chinese Lotus Sutra scroll with Tibetan divination texts on the back (Shelfmark: Or.8210/S.155). Digitised as part of the Lotus Sutra Manuscripts Digitisation Project. © The British Library

 

The context for this fellowship is the Library’s efforts towards making its collection items available in machine-readable format, to enable full-text search and analysis. The Library has been digitising its collections at scale for over two decades, with digitisation opening up access to diversely rich collections. However, it’s important for us to further support discovery and digital research by unlocking the huge potential in automatically transcribing our collections. Until recently, Western language print collections have been the main focus, especially newspaper collections. A flagship collaboration with the Alan Turing Institute, a project called “Living with Machines,” is underway to apply Optical Character Recognition (OCR) to UK newspapers, design and implement new methods in data science and artificial intelligence, and analyse these materials at scale.

Taking a broader perspective on Library collections, we have started to explore opportunities with non-Latin collections too. Members of the Digital Scholarship team are engaging closely with the exploration of OCR and Handwritten Text Recognition (HTR) systems for Bangla and Arabic. Digital Curators Tom Derrick, Nora McGregor and Adi Keinan-Schoonbaert have teamed up with PRImA Research Lab and the Alan Turing Institute to run four competitions in 2017-2019, inviting providers of text recognition methods to try them out on our historical material. Another initiative which Tom is engaged with is exploring Transkribus for Bengali printed texts. He trained Transkribus’ HTR+ recognition engine, which ended up transcribing this material at 94% character accuracy! Tom and Adi’s recent blog post in EuropeanaTech Insight (issue on OCR) summarises these initiatives.

Regions and text lines demarcated as ground truth for RASM2019 ICDAR2019 Competition on Recognition of Historical Arabic Scientific Manuscripts (Shelfmark: Add MS 7474). Digitised and available on Qatar Digital Library.

 

The Chevening Fellow will contribute to our efforts to identify OCR/HTR systems that can tackle digitised historical collections. They will explore the current landscape of Chinese handwritten text recognition, look into methods, challenges, tools and software, use them to test our material, and demonstrate digital research opportunities arising from the availability of these texts in machine-readable format.

This fellowship programme will start in September 2020 for a 12-month period of project-based activity at the British Library. The successful candidate will receive support and supervision from Library staff, and will benefit from professional development opportunities, networking and stakeholder engagement, gaining access to a range of organisational training and development opportunities (such as the Digital Scholarship Training Programme), as well as staff-level access to unique British Library collections and research resources.

For more information and to apply, please visit the Chevening British Library Fellowship page: https://www.chevening.org/fellowship/british-library/, and the “Automating the recognition of historical Chinese handwritten texts” Fellow page: https://www.chevening.org/fellowship/british-library-chinese-handwritten-texts/.

Applications close at 12pm (GMT), 5 November 2019. Good luck!

 

This blog post is by Dr Adi Keinan-Schoonbaert, Digital Curator for Asian and African Collections, British Library. She's on Twitter as @BL_AdiKS.

20 August 2019

Reflections from the First Sub-Saharan African Workshop on Digital Innovation Labs in Cultural Heritage Institutions


Guest posting by Milena Dobreva-McPherson, Associate Professor of Library and Information Studies, UCL Qatar, with contributions from Tuesday Bwalya, Lecturer, Library and Information Science Department, The University of Zambia (UNZA), and Fidelity Phiri, Visiting Researcher, UCL Qatar.

Recently UCL Qatar joined forces with the National Museums Board of Zambia to deliver a day-long workshop on Innovation Labs in Cultural Heritage Institutions which was hosted on 1 August, 2019 by the Livingstone Museum, Zambia. This workshop was the first of its kind in Sub Saharan Africa and was made possible with the support of the Africa and the Middle East Teaching Fund of the UCL Global Engagement Office. Initially planned for 15 professionals from the cultural heritage sector, it attracted 27 participants (see Fig. 1) coming from six towns located in four out of the ten provinces in Zambia (see map).

Fig. 1. Participants by sector and gender in the First Sub Saharan Workshop on Innovation Labs in Cultural Heritage Institutions in Zambia, 1 August 2019

After two vibrant events about Digital Innovation Labs in Cultural Heritage organisations, this was the first event bringing together a higher proportion of participants from museums and archives in addition to the libraries represented. The Building Library Labs event was the first of its kind ever held at the British Library in September 2018, followed by a second workshop in Copenhagen (March, 2019); both attracted mostly library professionals though there were a few attendees from Archives, Galleries and Museums.  

Innovation Labs emerged in the mid 2000s as specialised library units supporting a variety of users in experimenting with digital content. However, engaging users with digital content is equally important for museums, archives and galleries, and the exchange of institutional experience across the digital cultural heritage sector is essential for professionals who work there, especially when the number of Innovation Labs around the world is growing steadily. The presenters at the event in Zambia included Milena Dobreva-McPherson and Fidelity Phiri (UCL Qatar), Mr Tuesday Bwalya (University of Zambia), Mr Fred Nyambe (Registrar of Collections, Livingstone Museum) and Mr Brian Mwale (Chief Librarian, National Archives of Zambia). Fiona Clancy (Digitisation Workflow Manager, British Library), Mahendra Mahey (BL Labs Manager, British Library), and Somia Salim, who is an MA student in Library and Information Studies at UCL Qatar, also contributed online (see full programme with links to some of the presentations).

The call for innovation in the heritage sector was clearly communicated in the welcome address delivered on behalf of the Livingstone district acting commissioner Harriet Kawina; this was duly reported in several Zambian national newspapers (see, for example, Fig. 2).

Fig. 2. Article on the event in the MAST independent newspaper, 5 August 2019

The mixture of presentations discussing current trends in user engagement with digital content, alongside local examples of digitisation projects and how they work in reality, created a great opportunity to discuss the stumbling blocks in opening content for wider access and use. For some Zambian institutions, the main issue is a lack of coherent and systematic digitisation efforts, and there was a shared feeling amongst attendees that they need more guidance and clear policies on digitisation to follow, which are not currently in place. Other institutions have accumulated digital content but keep it available only internally, without systematically considering access and use by external audiences through online platforms.

The workshop discussions were lively and engaged; they identified that there is definitely greater scope to learn from each other locally. In addition, there was a growing realisation amongst organisations that opening their digital content for use by an external audience is the next step on the agenda for those who have already accumulated it. The feedback of one of the participants, which perhaps summarised this most clearly, suggested what needs to happen after this workshop in three steps:

  • Put the knowledge acquired in the workshop to use ASAP.
  • Conduct a follow-up workshop to determine progress in the innovation labs created.
  • Organise a massive awareness campaign to introduce potential users to the innovation labs created.

The workshop participants also experienced the traditional scheduled power outage for the day, which explains why the photo illustrating the presentation of certificates is a bit dark (but hey, in the digital world we can easily fix such glitches!).

Fig. 3. Participant receiving a certificate from Associate Professor Milena Dobreva

Bringing knowledge about innovation labs to the Sub-Saharan region for the first time, fostering dialogue between representatives of different cultural heritage institutions, and discussing how to improve access to digital content are just a humble first step in what we hope will help local institutions to improve user engagement and overcome the current digital divide, which keeps available digital content hidden from the world. Read more about Innovation Labs and the digital divide.

Dr Milena Dobreva-McPherson is Associate Professor of Library and Information Studies at UCL Qatar, with international experience of working in Bulgaria, Scotland and Malta. After graduating with an M.Sc. (Hons) in Informatics in 1991, Milena specialised in digital humanities and digital cultural heritage at the Bulgarian Academy of Sciences, where she earned her PhD in Informatics and Applied Mathematics in 1999 and served as the Founding Head of the first Digitisation Centre in Bulgaria (2004); she was also a member of the Executive Board of the National Commission of UNESCO. Milena’s research interests are in the areas of innovation diffusion in the cultural heritage sector, citizen science, and users of digital libraries. Milena is a member of the editorial boards of the IFLA Journal (Sage) and the International Journal on Digital Libraries (IJDL, Springer), and a member of the steering committees of the three biggest conference series in digital libraries, IJDL, TPDL and ICADL. She is also a consultant for the Europeana Task Force on Research Requirements.

 

Mr Tuesday Bwalya is a Lecturer in the Library and Information Science Department at The University of Zambia (UNZA). He holds a Master’s Degree in Information Science from China. In addition, Mr Bwalya has received training in India and Belgium in Library Automation with Free and Open Source Library Management Systems such as Koha and ABCD. His research interests include free and open source library management systems; open access publishing; database systems; web development; records management; cataloguing and classification.

 

Fidelity Phiri is currently employed as Librarian at Moto Moto Museum and is a visiting researcher at UCL Qatar. He has worked for the National Museums Board of Zambia since 2001. He holds a Bachelor's degree in Library and Information Science from the University of Zambia. Fidelity also graduated from UCL Qatar in April 2019 with a Master’s degree in Library and Information Studies. His research interests are in bibliometric studies and digital humanities/units that provide access to digital collections.

Acknowledgements: We would like to thank Fred Nyambe for the photos and Dania Jalees for the infographic and the editing.

22 July 2019

Our highlights from Digital Humanities 2019: Nora and Giorgia

We've put together a series of posts about our experiences at the Digital Humanities conference in Utrecht this month. In this post, Digital Curator Nora McGregor and Dr Giorgia Tolfo from the British Library / Alan Turing Institute's Living with Machines project share their impressions. See also Mia and Yann's post, and Rossitza and Daniel's post.

Lunchtime at TivoliVredenburg music hall, viewed from Cloud Nine

Nora McGregor

My most exciting discovery was the Libraries & Digital Humanities Special Interest Group (@LibsDH) of the Alliance of Digital Humanities Organizations (ADHO) (@ADHOrg). I found my PEOPLE! This is a loosely joined cohort of folks from libraries across the world with a peculiar passion for all things supporting digital scholarship. We held a casual, brief and efficient gathering over lunch, where talk turned to joining forces to develop a summer school (in the vein of the popular and prolific week-long Rare Books and Digital Humanities affairs) to address the specific digital skills training needs of librarians.

Giorgia Tolfo

Which talk were you most looking forward to, and why?

DH2019 offered a plethora of panels and workshops to choose from. When I first read the programme I felt like a hungry person at the supermarket, craving everything on the shelves. Since I couldn’t eat everything, I had to focus on the panels whose topics I knew were, or sounded, relevant to the Living with Machines project, an interdisciplinary project at the crossroads between historical research and artificial intelligence, run in collaboration with the Alan Turing Institute.

As my role involves an in-depth knowledge of digitisation strategies for newspapers and data models, my attention was immediately drawn to the Oceanic Exchanges panel, which focussed on case studies around the spread of news and/or the translation of concepts across the Atlantic Ocean as it emerged in newspapers. Among these studies, one I was particularly interested in was on the concept of italianità (Italianness) in Italian and US-based Italian ethnic newspapers at the time of the unification of Italy.

What did you learn?

What I found most interesting, beyond the content of the individual research cases presented, was that regardless of the focus of the project, the digital humanities community has an underpinning shared methodology, as well as commonly known concerns and issues that we are trying to face both independently and together.

Among the latter there is certainly a problem with the availability of, and access to, datasets. Due to copyright limitations or a lack of funds to digitise new material, some possibly relevant datasets aren’t available, in some cases forcing the research questions to be reshaped according to what is available. The impact of this is the blurring of the distinction between historical research and storytelling. Which stories emerge from data analysis and visualisation? Are these universal or just some among the many possible ones? Are the sources biased or reliable? These are epistemological problems that need to be addressed carefully.

On the other hand, in terms of shared methodology, there is an increasing awareness of the need (and effort) to focus on integration, sustainability and shareability. Hence the interest of many research teams in common data models, linked open data, the use of standard languages and methodologies, and scalable and reusable components.

Anything else?

Well, the fun run! I was one of the 25 enthusiastic people who set the alarm clock for 6am just to run... for fun!

Our highlights from Digital Humanities 2019: Rossitza and Daniel

We've put together a series of posts about our experiences at the Digital Humanities conference in Utrecht this month. In this post, Digital Curator Dr Rossitza Atanassova and Daniel Van Strien from the British Library / Alan Turing Institute's Living with Machines project share their impressions. See also Mia and Yann's post, and Nora and Giorgia's post.

Rossitza Atanassova

I loved the variety of topics and formats in the conference programme and I have tweeted about some of the most interesting talks I attended. I have to say movement between sessions was a bit complicated by the proliferation of stairs and escalators in the venue, which otherwise presented great views of Utrecht and offered comfy cushions to relax on during lunch! Like Mia and Nora I was inspired by the @LibsDH meetup, whilst my most surprising encounter was with the winning skeleton-poster.

Gender and Intersectional Identities in DH poster by @jotis13 @quinnanya @khetiwe24 @RHendery

Of particular interest to me were the sessions on digitised newspapers and related conversations between researchers and collection-holding institutions. Back in the office I will reflect on some of the discussions and will continue to engage with the ‘Researchers & Libraries working together on improving digitised newspapers’ and the Digital Historical Periodica Groups. Many of the talks illustrated the importance of semantic annotations for synoptic examination of historical periodicals, and I hope to apply at work what I learned from the excellent pre-conference workshop on Named Entity Processing delivered by @ImpressoProject.

I also found the panel on Exploring AV Corpora in the Humanities enjoyable and cool, in particular the presentation on the Distant Viewing Toolkit (DVT) for the Cultural Analysis of Moving Images. Outside the conference I had fun taking a walk along the artistic light-themed route to explore Utrecht city centre. I enjoyed the conference so much that I have submitted a DH2020 reviewer self-nomination!

Installation by Erik Groen, Ganzenmarkt Tunnel, Utrecht

Daniel Van Strien

I thought I would focus on a couple of sessions relating to OCR at the conference that I would be keen to explore further as part of the Living with Machines project. In particular, I am keen to further explore two tools for OCR: Transkribus and Kraken.

Transkribus was discussed in the context of doing OCR on newspapers as part of the Impresso project in the paper ‘Improving OCR of Black Letter in Historical Newspapers: The Unreasonable Effectiveness of HTR Models on Low-Resolution Images’. Although I had previously heard about the tool, I was particularly interested to hear how it was being used to work with newspapers, as I had primarily heard about its use in handwritten text recognition. The paper also gave some initial idea of how much ground truth data might need to be generated before training a new OCR engine for newspaper text. As part of the Impresso project, 167 pages of ground truth data were created, which is not trivial by any means but much lower than might be expected. With this amount of data the project was able to generate a substantial improvement in the quality of OCR over various versions of ABBYY software.

The second tool was Kraken, which was introduced in the paper ‘Kraken - an Universal Text Recognizer for the Humanities’. I was particularly interested to hear how this tool could be easily trained with new annotations to recognise new types and languages. For the most part Living with Machines will be relying on previously generated OCR, but there may be occasions when it is worth investing time to try to produce more accurate OCR. For these occasions, testing Kraken further would be a nice starting point, particularly because of the relative ease with which it can be trained on data annotated at the line rather than the word level. This makes annotating the ground truth data (a little) less painful and time consuming.
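To make the idea of comparing OCR output against line-level ground truth a bit more concrete, here is a minimal sketch of one common way to score it: a character error rate (CER) computed over pairs of lines. This is not taken from either paper or from Transkribus or Kraken themselves; the function names and the sample lines are hypothetical, and a real evaluation would probably also normalise whitespace and Unicode first.

```python
from typing import Sequence


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum insertions, deletions and substitutions."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(
                min(
                    previous[j] + 1,         # delete char_a
                    current[j - 1] + 1,      # insert char_b
                    previous[j - 1] + cost,  # substitute (or match)
                )
            )
        previous = current
    return previous[-1]


def character_error_rate(
    ground_truth_lines: Sequence[str], ocr_lines: Sequence[str]
) -> float:
    """Total edit distance divided by total ground truth characters."""
    if len(ground_truth_lines) != len(ocr_lines):
        raise ValueError("Each ground truth line needs exactly one OCR line.")
    total_chars = sum(len(line) for line in ground_truth_lines)
    if total_chars == 0:
        raise ValueError("Ground truth contains no characters.")
    total_errors = sum(
        edit_distance(truth, ocr)
        for truth, ocr in zip(ground_truth_lines, ocr_lines)
    )
    return total_errors / total_chars


# Hypothetical sample: two transcribed lines and their OCR output.
ground_truth = ["The quick brown fox", "jumps over the dog"]
ocr_output = ["The qnick brown fox", "jumps ovcr the dog"]
cer = character_error_rate(ground_truth, ocr_output)
print(f"Character error rate: {cer:.3f}")
```

Working at the line level means each ground truth line only has to be paired with one OCR line, rather than aligning individual words, which is part of why line-level annotation is less laborious.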


