THE BRITISH LIBRARY

Digital scholarship blog

108 posts categorized "Events"

13 February 2018

BL Labs 2017 Symposium: Samtla, Research Award Runner Up

Samtla (Search And Mining Tools for Labelling Archives) was developed to address a need in the humanities for research tools that help to search, browse, compare, and annotate documents stored in digital archives. The system was designed in collaboration with researchers at Southampton University, whose research involved locating shared vocabulary and phrases across an archive of Aramaic Magic Texts from Late Antiquity. The archive contained texts written in Aramaic, Mandaic, Syriac, and Hebrew languages. Due to the morphological complexity of these languages, where morphemes are attached to a root morpheme to mark gender and number, standard approaches and off-the-shelf software were not flexible enough for the task, as they tended to be designed to work with a specific archive or user group. 

Figure 1: Samtla supports tolerant search allowing queries to be matched exactly and approximately.

Samtla is designed to extract the same or similar information that may be expressed by authors in different ways, whether in the choice of vocabulary or the grammar. Traditionally, search and text mining tools have been based on words, which limits their use to corpora containing languages where 'words' can be easily identified and extracted from text, e.g. languages with a whitespace character like English, French, German, etc. Word models tend to fail when the language is morphologically complex, like Aramaic and Hebrew. Samtla addresses these issues by adopting a character-level approach stored in a statistical language model. This means that rather than extracting words, we extract character sequences representing the morphology of the language, which we then use to match the search terms of the query and rank the documents according to the statistics of the language. Character-based models are language independent as there is no need to preprocess the document, and we can locate words and phrases with a lot of flexibility. As a result, Samtla compensates for variability in language use, spelling errors made by users when they search, and errors in the document introduced by the digitisation process (e.g. OCR errors).
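To make the character-level idea concrete, here is a minimal sketch in Python. It is not Samtla's implementation: it swaps the statistical language model for a much simpler overlap score (the fraction of the query's character trigrams found in each document), and the function names and sample documents are hypothetical. It only illustrates why matching on character sequences tolerates spelling and OCR errors without any word segmentation.

```python
from collections import Counter


def char_ngrams(text, n=3):
    """Return a Counter of overlapping character n-grams in the text."""
    text = text.lower()
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def score(query, document, n=3):
    """Fraction of the query's n-grams that also occur in the document.

    Assumption: a plain overlap ratio stands in for the statistical
    language model described in the post.
    """
    q = char_ngrams(query, n)
    d = char_ngrams(document, n)
    total = sum(q.values())
    if total == 0:
        return 0.0
    matched = sum(min(count, d[gram]) for gram, count in q.items())
    return matched / total


def search(query, documents, n=3):
    """Rank document ids by descending overlap, dropping non-matches."""
    scored = [(score(query, text, n), doc_id) for doc_id, text in documents.items()]
    ranked = sorted(scored, key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for s, doc_id in ranked if s > 0]


# Hypothetical mini-archive: "b" contains an OCR-style spelling error.
documents = {
    "a": "the digitisation of manuscripts",
    "b": "the digitizatoin of manuscripts",
    "c": "board games in libraries",
}
results = search("digitisation", documents)
```

In this toy archive the exact match ranks first, the misspelled document still matches approximately, and the unrelated document is dropped. A word-based index would have missed the misspelled document entirely.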

Figure 2: Samtla's document comparison tool displaying a semantically similar passage between two Bibles from different periods.

The British Library have been very supportive of the work by openly providing access to their digital archives. The archives ranged in domain, topic, language, and scale, which enabled us to test Samtla’s flexibility to its limits. One of the biggest challenges we faced was indexing larger-scale archives of several gigabytes. Some archives also contained a scan of the original document together with metadata about the structure of the text. This provided a basis for developing new tools that brought researchers closer to the original object, including highlighting the named entities over both the raw text and the scanned image.

Currently we are focusing on developing approaches for leveraging the semantics underlying text data in order to help researchers find semantically related information. Semantic annotation is also useful for labelling text data with named entities and sentiments. Our current aim is to develop approaches for annotating text data in any language or domain, which is challenging because languages encode the semantics of a text in different ways.

As a first step we are offering labelled data to researchers, as part of a trial service, in order to help speed up the research process, or provide tagged data for machine learning approaches. If you are interested in participating in this trial, then more information can be found at www.samtla.com.

Figure 3: Samtla's annotation tools label the texts with named entities to provide faceted browsing and data layers over the original image.

 If this blog post has stimulated your interest in working with the British Library's digital collections, start a project and enter it for one of the BL Labs 2018 Awards! Join us on 12 November 2018 for the BL Labs annual Symposium at the British Library.


Posted by BL Labs on behalf of Dr Martyn Harris, Prof Dan Levene, Prof Mark Levene and Dr Dell Zhang.

05 February 2018

8th Century Arabic science meets today's computer science

Or, Announcing a Competition for the Automatic Transcription of Historical Arabic Scientific Manuscripts 

“An impartial view of Digital Humanities (DH) scholarship in the present day reveals a stark divide between ‘the West and the rest’…Far fewer large-scale DH initiatives have focused on Asia and the non-Western world than on Western Europe and the Americas…Digital databases and text corpora – the ‘raw material’ of text mining and computational text analysis – are far more abundant for English and other Latin alphabetic scripts than they are for Chinese, Japanese, Korean, Sanskrit, Hindi, Arabic and other non-Latin orthographies…Troves of unread primary sources lie dormant because no text mining technology exists to parse them.”

-Dr. Thomas Mullaney, Associate Professor of Chinese History at Stanford University

Supporting the use of Asian & African Collections in digital scholarship means shining a light on this stark divide and seeking ways to close the gap. In this spirit, we are excited to announce the ICFHR2018 Competition on Recognition of Historical Arabic Scientific Manuscripts.

The Competition

Drawing together experts from the British Library, The Alan Turing Institute, Qatar Digital Library and PRImA Research Lab, our aim in launching this competition is to play an active role in advancing the state of the art in handwritten text recognition technologies for Arabic. For our first challenge we are focussing on finding an optimal solution for accurately and automatically transcribing historical Arabic scientific handwritten manuscripts.

Though such technologies are still in their infancy, unlocking historical handwritten Arabic manuscripts for large-scale text analysis has the potential to truly transform research. In conjunction with the competition we hope to build and make freely available a substantial image and ground truth dataset to support continued efforts in this area.

Enter the Competition

Organisers

  • Apostolos Antonacopoulos, Professor of Pattern Recognition, University of Salford, and Head of the Pattern Recognition and Image Analysis (PRImA) research lab
  • Christian Clausner, Research Fellow at the Pattern Recognition and Image Analysis (PRImA) research lab
  • Nora McGregor, Digital Curator at British Library, Asian & African Collections
  • Daniel Lowe, Curator at British Library, Arabic Collections
  • Daniel Wilson-Nunn, PhD student at University of Warwick & Turing PhD Student based at Alan Turing Institute
  • Bink Hallum, Arabic Scientific Manuscripts Curator at British Library/Qatar Foundation Partnership

Further reading

For more on recent Digital Research Team text recognition and transcription projects see:

 

This post is by Nora McGregor, Digital Curator, British Library. She is on twitter as @ndalyrose

17 January 2018

BL Labs 2017 Symposium: Keynote Talk by Josie Fraser

The fifth annual British Library Labs Symposium kicked off with an inspiring keynote speech by Josie Fraser, entitled ‘Open, Digital, Inclusive: Unleashing Knowledge’.

As well as working as senior technology adviser within the National Technology Team at the UK Government's Department for Digital, Culture, Media and Sport, Josie is currently the Chair of Wikimedia UK.

Josie discussed the impact of the open knowledge movement on education and learning. She looked at the powerful role that Wikimedia UK and Wikimedians have played in bringing UK cultural institutions and their digital collections to new and wider audiences. Her talk also explored how open knowledge partnerships are driving diversity and better representation for all online. At the end, she took questions from the audience and invited them to join her in exploring ideas and opportunities for the future.

You can see a video of the full talk, with an introduction by Dr Adam Farquhar, Head of Digital Scholarship at the British Library, here:

You can follow this link to see her slides:

 https://www.slideshare.net/labsbl/open-digital-inclusive-unleashing-knowledge

The sixth BL Labs Symposium will be on the 12th November 2018.

Posted by Eleanor Cooper, Project Officer BL Labs.

18 December 2017

Workshop report: Identifiers for UK theses

Along with the Universities of Southampton and London South Bank, EThOS and DataCite UK have been investigating if having persistent identifiers (PIDs) for both a thesis and its data would help to liberate data from the appendices of the PDF document. With some funding from Jisc in 2014, we ran a survey and some case studies looking at the state of linking to research data underlying theses to see where improvements could be made. Since then, there has been some slow but steady progress towards realising its recommendations. Identifiers are now visible in EThOS itself (see image below) and a small number of UK institutions are now assigning Digital Object Identifiers (DOIs) to their theses on a regular basis. Many more are implementing ORCID iDs for their post-graduate students. We wanted to reignite the conversation around unlocking thesis data and see what was needed to progress it further.

EThOS_CambridgeRecord_JBB

On 4th December 2017, we ran a workshop to hear what progress is being made and what the remaining barriers are to applying persistent identifiers to theses and thesis data. We heard from both the University of Cambridge and the London School of Hygiene and Tropical Medicine, both of whom are assigning DOIs to published theses on a regular basis. They gave an outline of how they have got to this point, including the case made within the university to ensure DOIs were available for theses.

As institutions start to identify their theses with DOIs, we need to ensure that these identifiers are picked up and usable in EThOS. Heather Rosie (EThOS Metadata Manager) explained how the lack of any consistent identifier for theses up to this point hinders disambiguation: due to errors in titles and different representations of author names, we simply do not know how many theses have been published in the UK. But Heather also highlighted what institutions can do to help ensure any available identifiers make their way into EThOS, namely making sure they are available for harvest, especially via OAI-PMH.
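For readers unfamiliar with OAI-PMH harvesting, the sketch below shows roughly what "available for harvest" means in practice. It is an illustration only, not a description of how EThOS harvests: the repository URL, record identifier and DOI are made up, and the response is a hand-written sample rather than real repository output. The `ListRecords` verb, the `oai_dc` metadata prefix and the two XML namespaces are part of the OAI-PMH and Dublin Core specifications.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Namespaces defined by OAI-PMH 2.0 and Dublin Core.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None):
    """Build a ListRecords request URL for a repository's OAI-PMH endpoint."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec
    return base_url + "?" + urlencode(params)


def extract_identifiers(xml_text):
    """Map each record's OAI identifier to its dc:identifier values."""
    root = ET.fromstring(xml_text)
    found = {}
    for record in root.iterfind(".//oai:record", NS):
        oai_id = record.findtext("oai:header/oai:identifier", namespaces=NS)
        values = [el.text.strip() for el in record.iterfind(".//dc:identifier", NS) if el.text]
        found[oai_id] = values
    return found


# Hand-written sample response; the repository, record and DOI are hypothetical.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:repository.example.ac.uk:1234</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>An example thesis</dc:title>
          <dc:identifier>https://doi.org/10.9999/example.1234</dc:identifier>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""

url = list_records_url("https://repository.example.ac.uk/oai")
identifiers = extract_identifiers(SAMPLE)
```

If a thesis DOI is exposed in `dc:identifier` like this, a harvester can pick it up alongside the title and author without any manual matching.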

Based on the morning’s presentations there was broad discussion around the remaining issues that institutions still have in applying their DOIs or ORCIDs to their published theses. These included barriers such as:

  • Low priority due to lack of buy in or interest from both researchers and institutional decision-makers. Interest could be increased by improving understanding of what PIDs are and what they can do, particularly the tangible benefits they provide
  • A single institution may use multiple systems to manage different pieces of information about its researchers and their outputs. This creates internally competing systems that overlap; uneven resource; and a lack of clarity about what details go where
  • Further technical barriers include having to rely on the suppliers of non-open source systems to make the appropriate changes. Even where plug-ins for open-source systems are developed at an institution, the associated workflow might not be appropriate for all other users. Finally, technical support teams tend to be removed from Library staff
  • Sustainability of using the identifiers, especially in terms of cost.

The second half of the workshop looked towards both the future and the past: whether the British Library digitising its large collection of legacy theses on microfilm might be a way to make them available to users, but also to ensure they are digitally preserved and assigned persistent identifiers. Paul Joseph from the University of British Columbia (UBC) gave us a great example to consider here: they have digitised 32,000 theses (both doctoral and masters level) and made them openly available through their repository, assigning DOIs as they did so. A major concern for UK universities undertaking a similar endeavour is the inability to confirm that third-party rights have been cleared in the thesis. But it was interesting to hear that, under their clear take-down policy, UBC only receive 2-3 take-down notices per year.

The final discussions of the day covered community needs for the future. This included two topics carried over from the morning’s session, on how we make the case for wider application of identifiers to theses to researchers and senior management and what can be done to make technical integration and workflow changes possible or easier. We also dug down into the other persistent identifiers related to theses that would support the needs of the UK community (such as organisation identifiers and funding identifiers), the potential for the Library to mass-digitize theses and assign DOIs to them and the other steps that can be taken to break data out of the thesis.

Through these discussions we got a strong steer as to what we at the British Library need to do to help to support the community in using persistent identifiers as a way of encouraging greater availability of doctoral research. These include providing:

  • more advocacy for PIDs – for example to students & research managers. We heard that a message from BL goes a long way – ‘we have to ask you to claim an ORCID iD because the British Library says so’, or ‘DOIs are needed because national thesis policy says so’
  • metadata guidance for libraries. What we already provide is great but we could do more of it, e.g. best practice examples, support desk, engage with system suppliers on behalf of institutions
  • preservation of digital theses. This is urgently needed
  • a big piece of IPR work to give institutions the confidence to make legacy theses open access without express permission, including a press campaign to drive interest & support.

But it is not only the Library that attendees thought may influence developments. There was also a clear appetite for stronger mandates from funders to support the deposit of open theses and reduction of embargo periods. There was also interest in national-level activities such as a national strategy for UK theses or a Scholarly Communication Licence for theses.

It’s clear there’s still a lot to be done before we’re at a stage where we can rely on persistent identifiers to help us jail-break research data out of thesis appendices. But we’ll continue to work with the community on this through EThOS and DataCite UK. We hope to hold a webinar in 2018 to talk more about the outcomes of this workshop, but in the meantime you can direct any questions on this work to datasets@bl.uk.

This post is by Rachael Kotarski, the British Library's Data Services Lead, on twitter as @RachPK.

04 November 2017

International Games Week 2017

Today at the British Library we are hosting a pop-up game parlour for International Games Week. So if you are in the Library between 10:00 and 16:00 come play some games!

We have our usual favourites, including Animal Upon Animal, Biblios, Carcassonne, Dobble, Pandemic, Rhino Hero, Scrabble and Ticket To Ride Europe.

Plus some new ones, including The Hollow Woods: Storytelling Card Game, which revives the Victorian craze for ‘myrioramas’, and Great Scott! - The Game of Mad Invention, a Victorian-themed card game for 3 to 5 players made by Sinister Fish Games, which uses images selected from the British Library’s Mechanical Curator collection on Flickr in its artwork.

Great Scott! - The Game of Mad Invention

It is always lovely to see the British Library’s digital collections being used in creative projects, and this week Robin David won the BL Labs Commercial Award for his game Movable Type, which also used the Mechanical Curator images in the artwork for a card-drafting, word-building game that has been described as Scrabble crossed with Sushi Go. Movable Type was a successful Kickstarter campaign in 2016 and sold out quickly, but we understand a new Kickstarter is being launched very soon. We'll keep you posted!

Cassie Elle's explanation of Movable Type by Robin David

In addition to board and card games, we are also delighted to host Sally Bushell and James Butler from Lancaster University, with whom the British Library are working on the AHRC-funded project Creating a Chronotopic Ground for the Mapping of Literary Texts. They have been using Minecraft for The Lakescraft Project, which created a teaching resource that provides a fun and innovative means of introducing concepts centred around the literary, linguistic, and psychological analysis of the Lake District's landscape. This is a fascinating initiative, and I'm pleased to report that Lakescraft has evolved into a broader project called Litcraft, which uses the approach to explore literature set in other locations.

Introduction to The Lakescraft Project

Introductory video for Litcraft's first public release: R.L.Stevenson's Treasure Island

So there are lots of exciting, fun games happening today in the British Library, and if you can't be here in person, do keep an eye on social media using the hashtag #ALAIGW. Also do check out what games clubs and events may be running in your local library.

This post is by Digital Curator Stella Wisdom, you can follow her on twitter @miss_wisdom

17 October 2017

Imaginary Cities – Collaborations with Technologists

Posted by Mahendra Mahey (Manager of BL Labs) on behalf of Michael Takeo Magruder (BL Labs Artist/Researcher in Residence).

In developing the Imaginary Cities project, I enlisted two long-standing colleagues to help collaboratively design the creative-technical infrastructures required to realise my artistic vision.

The first area of work sought to address my desire to create an automated system that could take a single map image from the British Library’s 1 Million Images from Scanned Books Flickr Commons collection and from it generate an endless series of everchanging aesthetic iterations. This initiative was undertaken by the software architect and engineer David Steele who developed a server-side program to realise this concept.

David’s server application links to a curated set of British Library maps through their unique Flickr URLs. The high-resolution maps are captured and stored by the server, and through a pre-defined algorithmic process are transformed into ultra-high-resolution images that appear as mandala-esque ‘city plans’. This process of aesthetic transformation is executed once per day, and is affected by two variables. The first is simply the passage of time, while the second is based on external human or network interaction with the original source maps in the digital collection (such as changes to metadata tags, view counts, etc.).


Time-lapse of algorithmically generated images (showing days 1, 7, 32 and 152) constructed from a 19th-century map of Paris

The second challenge involved transforming the algorithmically created 2D assets into real-time 3D environments that could be experienced through leading-edge visualisation systems, including VR headsets. This work was led by the researcher and visualisation expert Drew Baker, and was done using the 3D game development platform Unity. Drew produced a working prototype application that accessed the static image ‘city plans’ generated by David’s server-side infrastructure, and translated them into immersive virtual ‘cityscapes’.

The process begins with the application analysing an image bitmap and converting each pixel into a 3D geometry that is reminiscent of a building. These structures are then textured and aligned in a square grid that matches the original bitmap. Afterwards, the camera viewpoint descends into the newly rezzed city and can be controlled by the user.

Analysis and transformation of the source image bitmap
View of the procedurally created 3D cityscape
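The pixel-to-building step described above can be sketched in a few lines. This is not Drew's Unity application: it is a plain Python illustration, and the rule that brighter pixels become taller boxes, along with the names `bitmap_to_buildings`, `max_height` and `cell_size`, are assumptions made for the example. The post itself only says that each pixel becomes a building-like geometry aligned on a square grid matching the bitmap.

```python
def bitmap_to_buildings(bitmap, max_height=10.0, cell_size=1.0):
    """Turn each pixel of a greyscale bitmap into one box on a square grid.

    bitmap is a list of rows, each a list of values from 0 to 255.
    Assumption: pixel brightness is mapped linearly to building height.
    """
    buildings = []
    for row, line in enumerate(bitmap):
        for col, value in enumerate(line):
            buildings.append({
                "x": col * cell_size,   # grid position mirrors the pixel column
                "z": row * cell_size,   # grid position mirrors the pixel row
                "width": cell_size,
                "depth": cell_size,
                "height": max_height * value / 255,
            })
    return buildings


# A hypothetical 2 x 2 bitmap yields four boxes laid out like the pixels.
tiny_bitmap = [
    [0, 255],
    [128, 64],
]
city = bitmap_to_buildings(tiny_bitmap)
```

A game engine would then create a textured mesh for each box. The important property is that the grid layout of the city mirrors the source bitmap.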

At present I am still working with David and Drew to refine and expand these amazing systems that they have created. Moving forward, our next major task will be to successfully use the infrastructures as the foundation for a new body of artwork.

You can see a presentation from me at the British Library Labs Symposium 2017 at the British Library Conference Centre Auditorium in London, on Monday 30th of October, 2017. For more information and to book (registration is FREE), please visit the event page.

About the collaborators:

David Steele

David Steele is a computer scientist based in Arlington, Virginia, USA specialising in progressive web programming and database architecture. He has been working with a wide range of web technologies since the mid-nineties and was a pioneer in pairing cutting-edge clients to existing corporate infrastructures. His work has enabled a variety of advanced applications from global text messaging frameworks to re-entry systems for the space shuttle. He is currently Principal Architect at Crunchy Data Solutions, Inc., and is involved in developing massively parallel backup solutions to protect the world's ever-growing data stores.

Drew Baker

Drew Baker is an independent researcher based in Melbourne, Australia. Over the past 20 years he has worked in visualisation of archaeology and cultural history. His explorations in 3D digital representation of spaces and artefacts as a research tool for both virtual archaeology and broader humanities applications laid the foundations for the London Charter, establishing internationally-recognised principles for the use of computer-based visualisation by researchers, educators and cultural heritage organisations. He is currently working with a remote community of Indigenous Australian elders from the Warlpiri nation in the Northern Territory’s Tanami Desert, digitising their intangible cultural heritage assets for use within the Kurdiji project, an initiative that seeks to improve mental health and resilience in the nation’s young people through the use of mobile technologies.

26 September 2017

BL Labs Symposium (2017), Mon 30 Oct: book your place now!

Posted by Mahendra Mahey, BL Labs Manager

The BL Labs team are pleased to announce that the fifth annual British Library Labs Symposium will be held on Monday 30 October, from 9:30 - 17:30 in the British Library Conference Centre, St Pancras. The event is FREE, although you must book a ticket in advance. Don't miss out!

The Symposium showcases innovative projects which use the British Library’s digital content, and provides a platform for development, networking and debate in the Digital Scholarship field.

Josie Fraser will be giving the keynote at this year's Symposium

This year, Dr Adam Farquhar, Head of Digital Scholarship at the British Library, will launch the Symposium and Josie Fraser, Senior Technology Adviser on the National Technology Team, based in the Department for Digital, Culture, Media and Sport in the UK Government, will be presenting the keynote. 

There will be presentations from BL Labs Competition (2016) runners up, artist/researcher Michael Takeo Magruder about his 'Imaginary Cities' project and lecturer/researcher Jennifer Batt about her 'Datamining verse in Eighteenth Century Newspapers' project.

After lunch, the winners of the BL Labs Awards (2017) will be announced, followed by presentations of their work. The Awards celebrate researchers, artists, educators and entrepreneurs from around the world who have made use of the British Library's digital content and data, in each of the Awards’ categories:

  • BL Labs Research Award. Recognising a project or activity which shows the development of new knowledge, research methods or tools.
  • BL Labs Artistic Award. Celebrating a creative or artistic endeavour which inspires, stimulates, amazes and provokes.
  • BL Labs Commercial Award. Recognising work that delivers or develops commercial value in the context of new products, tools or services that build on, incorporate or enhance the British Library's digital content.
  • BL Labs Teaching / Learning Award. Celebrating quality learning experiences created for learners of any age and ability that use the British Library's digital content.
  • BL Labs Staff Award. Recognising an outstanding individual or team who have played a key role in innovative work with the British Library's digital collections.  

The Symposium's endnote will be followed by a networking reception to conclude the event, at which delegates and staff can mingle over a drink.

Tickets are going fast, so book your place for the Symposium today!

For any further information please contact labs@bl.uk

11 August 2017

Last Chance to Book for Game Library Camp Tomorrow

Tomorrow afternoon is Game Library Camp here at the British Library. So if you are in or near London, and are interested in libraries and games (all types of games, including board games, table top roleplaying, live action roleplaying (though please don't bring any foam replica weapons!), videogames, interactive fiction etc.), then please book a free place from https://gamelibcamp.eventbrite.co.uk.

The event is happening on Saturday 12 August, 12:30 to 16:30, at the Knowledge Centre, The British Library, 96 Euston Rd, London, NW1 2DB. For info on how to get here, go to https://www.bl.uk/aboutus/quickinfo/loc/stp. Please note lunch is not provided, but there are cafés on site, or bring your own snacks. We'll be using #GameLibCamp17 to discuss the event on Twitter etc.

At a library camp the participants lead the agenda – in fact, there isn’t an agenda until attendees pitch (bad tent pun, groan!) and decide what they’d like to talk about at the start of the event. The only requirement for a session is that it fits within the theme. If you already have an idea for a talk, discussion, game or activity, you can propose your suggestion beforehand on this page http://gamesandglams.blogspot.co.uk/p/game-library-camp-sessions.html. We'll have the use of a number of rooms at the British Library's Knowledge Centre, so will be able to run a few sessions in parallel during the event. Also, please do bring games along if you want to run a game! This is totally encouraged.

Programme:

  • Registration from 12 noon
  • Introduction and session pitches 12:30pm
  • 1st session 1pm - 1:40pm
  • 2nd session 1:45pm - 2:25pm
  • 3rd session 2:30pm - 3:10pm
  • 4th session 3:15pm - 3:55pm
  • Closing session 4pm
  • Finish by 4:30pm
  • Post-event social meetup at The Somers Town Coffee House

In the words of experienced Library Campers Sue Lawson and Richard Veevers who run the http://www.librarycamp.co.uk website: "there's no cost, there are no keynotes and library camp is open to anyone: public/private/whatever sector and you don't have to work in a library".

This specific library camp is intended as a warm up to International Games Week in the autumn and to inspire librarians and library staff from all sectors to host their own game events. We also totally welcome colleagues from, and people who visit, other cultural heritage organisations, museums, archives etc. who participate in games projects and events, both game making and game playing.  

Furthermore, if you are interested, but you can't attend tomorrow, I recommend joining the online discussion group Games & GLAMS set up by British Library collaborator, Sarah Cole, that focuses on game related activities in the Galleries, Libraries, Archives and Museums sector. It's open to anyone with an interest in games in any of these areas. There is also an associated Games & GLAMS Twitter account: @Games_GLAMS.

This post is by Digital Curator Stella Wisdom, on twitter as @miss_wisdom. Stella is co-organising Game Library Camp with Darren Edwards of Bournemouth Libraries and the lead on International Games Week in the UK, and Gary Green from Surrey Libraries.