Digital scholarship blog

Enabling innovative research with British Library digital collections

30 posts categorized "Africa"

06 September 2018

Visualising the Endangered Archives Programme project data on Africa, Part 3. Finishing up

Sarah FitzGerald is a linguistics PhD researcher at the University of Sussex investigating the origins and development of Cameroon Pidgin English. She is currently a research placement student in the British Library’s Digital Scholarship Team, using data from the Endangered Archives Programme to create data visualisations.

This summer I have taken a break by working hard; I’ve broadened my academic horizons by ignoring academia completely; and I’ve felt at home while travelling hundreds of miles a week. But above all else, I’ve just had a really nice time.

In my last two blogs I covered the early stages of my placement at the British Library, and discussed the data visualisation tools I’ve been exploring.

In this final blog I am going to outline the later stages of my project. I am also going to talk about my experience of undertaking a British Library placement, what I’ve learned and whether it was worth it (spoiler alert: it was).

What I’ve been doing

The final stages of my project have mostly consisted of two separate lines of investigation.

Firstly, I have been finding out as much as I can about the Endangered Archives Programme (EAP)’s projects in Africa, and finding the best ways to visualise that information, in order to build a bank of visualisations that the EAP team can use when they talk about their work. Visualisations, such as the one below showing the number of applications related to each region of Africa by year, can make tables of data much easier to understand.

a line graph showing Endangered Archives applications by region year on year.

Secondly, I was curious about why some project applications get funded and some do not. I wanted to know if I could identify any patterns in the reasons why projects get rejected.

This gave me the opportunity to apply my skills as a linguist to the data, albeit on a small scale. I decided to examine the feedback given to unsuccessful applicants by the panel that awards the EAP grants to see if I could identify any patterns. To do this I created a corpus, or electronic database, of texts. This could then be run through corpus analysis software to look for patterns.

screenshot showing a window of AntConc corpus analysis software listing word tokens along with their frequency and rank

This image shows a word list created for my corpus using AntConc software, which is a free and open source corpus analysis tool.
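For readers curious about what a word list like this contains, here is a minimal Python sketch of the same idea: count every word token across a set of texts and rank the words by frequency. It is not how AntConc itself is implemented; the tokenisation rule (runs of ASCII letters, optionally with one internal apostrophe), the tie-breaking rule and the two feedback sentences are all assumptions made up for illustration, not EAP data.

```python
import re
from collections import Counter

def word_list(texts):
    """Return (rank, frequency, word) tuples, most frequent word first.

    Words with equal frequency are ordered alphabetically and still get
    consecutive ranks (a simplifying assumption for this sketch).
    """
    counts = Counter()
    for text in texts:
        # Tokenise: lowercase, then take runs of ASCII letters, allowing one internal apostrophe
        counts.update(re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower()))
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [(rank, freq, word) for rank, (word, freq) in enumerate(ranked, start=1)]

if __name__ == "__main__":
    # Invented feedback snippets, not real EAP panel feedback
    feedback = [
        "The application lacks a clear budget.",
        "The budget is unclear and the archive is not at risk.",
    ]
    for rank, freq, word in word_list(feedback)[:5]:
        print(rank, freq, word)
```

In a real corpus the top of such a list is dominated by function words like "the" and "is", which is why corpus tools are usually combined with concordance views or stop lists before patterns in the feedback become visible.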

My analysis allowed me to identify a number of issues common to many unsuccessful applications. In addition to applications that fall outside the scope of the EAP, there are proposals which would make excellent projects but whose applications lack the information needed to award a grant.

Based on my analysis I was able to make a number of recommendations about additional information EAP could provide for applicants which might help to prevent potentially valuable archives being lost due to poor applications.

What I’ve learned

As well as learning about visualisation software I’ve learned a lot this summer about the EAP archives.

I’ve found out where applications are coming from, and which African countries have the most associated applications. I’ve learned that there are many great data visualisation tools available for free online. I’ve learned that there are over 70 different languages represented in the EAP archived projects from Africa.

black and white photo of James Ssali and an unknown woman dressed smartly and standing arm-in-arm before the camera
James Ssali and an unknown woman, from the Ham Musaka archive, Uganda (EAP656)

One of the most interesting things I’ve learned is just how much archival material is available for research, and on an incredibly broad range of topics. The materials digitised and preserved in Africa over the last 13 years include:

This wealth of information provides so much opportunity for research and these are just the archives from Africa. The EAP funds projects all over the world.

colour photograph of two pages from the opened Shui manuscript.
Shui manuscript from China (EAP143)

In addition to learning about the EAP archives, I’ve learned a lot from working at the British Library more generally. The scale of the work carried out here is immense, and until I had worked here for three months I don’t think I fully appreciated just how large the challenges it faces are.

In addition to preserving a copy of every book published in the UK, the BL is also building large digital archives to support the way modern scholarship has developed. It is digitising books, audio and websites, as well as historical documents such as the records of the East India Company.

Colour photograph of East India House and the street outside bustling with pedestrians and horse and cart traffic
View of East India House by Thomas Hosmer Shepherd

Was it worth it?

A PhD is an intense thing to undertake and you have a time limit to complete it. At first glance, taking three months out to work on a placement with little direct relevance to my PhD might seem a bit foolish, particularly when it means a daily commute from Brighton to London.

Far from wasting my time, however, this placement has been an enriching experience. My PhD is on the origins and development of Cameroon Pidgin English. This placement has given me a break from my work while broadening my understanding of African culture and the context in which the language I study is spoken.

I’ve always had an interest in data visualisation and my placement has given me time to play with visualisation tools and gain a real understanding of the resources available. I feel refreshed and ready for the new term despite having worked full time all summer.

The break has also given me thinking space: it has allowed ideas to percolate and given me new skills which I can apply to my work. Taking a break from academia has given me more perspective on my work and more options for how to develop it.

colour photograph of the bronze statue of Isaac Newton outside the British Library site in London
The British Library, St Pancras

Finally, the travel has been a lot, but my supervisors have been very flexible, allowing me to work from home two days a week. The upside of coming to London regularly has been getting to work with interesting people.

Working in a large institution could be an intimidating and isolating experience, but it has been anything but. The digital scholarship team have been welcoming and interested; in particular, I have had two very supportive supervisors. The British Library are really keen to support and develop placement students, and there is a lovely community of PhD students at the BL, some on placements and some doing their PhDs here.

I have had a great time at the British Library this summer and can only recommend the scheme to anyone thinking of applying for a placement next year.

08 August 2018

Visualising the Endangered Archives Programme project data on Africa, Part 2. Data visualisation tools

Sarah FitzGerald is a linguistics PhD researcher at the University of Sussex investigating the origins and development of Cameroon Pidgin English. She is currently a research placement student in the British Library’s Digital Scholarship Team, using data from the Endangered Archives Programme to create data visualisations.

When I wrote last week that the Endangered Archives Programme (EAP) receive the most applications for archives in Nigeria, Ghana and Malawi, I am reasonably sure you were able to digest that news without difficulty.

Is that still the case if I add that Ethiopia, South Africa and Mali come in fourth, fifth and sixth place; and that the countries for which only a single application has been received include Morocco, Libya, Mauritania, Chad, Eritrea, and Egypt?

What if I give you the same information via a handy interactive map?

This map, designed using Tableau Public, shows the location of every archive for which the EAP received an application between 2004 and 2017. Once you know that darker colours mean more applications, you can see at a glance how the applications are distributed. For more detail, you can hover your cursor over each country to see its name and number of associated applications.

My placement at the British Library centres on using data visualisations such as this to tell the story of the EAP projects in Africa.

EAP054
Photo from a Cameroonian photographic archive (EAP054)

When not undertaking a placement I am a linguist. This doesn’t require a lot of data visualisation beyond the tools available in Excel. In my previous blog I discussed how useful Excel tools have been for giving me an overview of the EAP data. But there are some visualisations you can’t create in Excel, such as an interactive heat map, so I had to explore what other tools are available.

Inspired by this excellent blog post from a previous placement student, I started by investigating Tableau Public, primarily looking for ways to represent data on a map.

Tableau Public is freely available online, though it is proprietary software rather than open source. It is fairly intuitive to use and offers a wide range of graphs and charts, not just maps. You upload a spreadsheet and it will guide you through the rest. There are also many instructional videos online that show you the range of possibilities available.

As well as the heat map above, I also used this tool to examine which countries the applications are coming from.

This map shows that the largest number of applications have come from the USA and UK, but people from Canada, South Africa and Malawi have also applied for a lot of grants.

Malawi has a strong showing on both maps. There have been 23 applications to preserve archives in Malawi, and 21 applicants from within Malawi.

EAP942
Paper from the Malawi news archive (EAP942)

Are these the same applications?

My spreadsheet suggests that they are. I can also see that there seem to be links between certain countries, such as Canada and Ethiopia, but to understand these connections properly I need a tool that can represent networks, something Tableau Public cannot do.

After some investigation (read ‘googling’) I found Gephi, a free, open source program designed specifically for visualising networks.

Of all the software I have used in this project so far, Gephi is the least intuitive, but it can be used to create informative visualisations, so it is worth the effort to learn. Gephi does provide a step-by-step guide to getting started, but the first step is to upload a spreadsheet detailing your ‘nodes’ and ‘edges’.

Having no idea what either of these was, I stalled at step one.

Further googling turned up this useful blog post written for complete beginners, which informed me that nodes are the individual members of a network: in my case, countries. My list of nodes includes both the country of the archive and the country of the applicant. Edges are the links between nodes, so each application creates a link, or edge, between the two countries, or nodes, involved.
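As a rough illustration of the nodes-and-edges idea, the Python sketch below turns a list of (country of applicant, country of archive) pairs into two tables that could be saved as CSV files for Gephi’s spreadsheet import. The column names `Id`, `Label`, `Source`, `Target`, `Type` and `Weight` are the headers Gephi’s CSV import is generally documented to recognise (worth checking against the Gephi documentation for your version); the function names and the sample pairs are hypothetical.

```python
import csv
from collections import Counter

def build_network(applications):
    """Turn (applicant_country, archive_country) pairs into Gephi-style node and edge rows.

    Repeated pairs are merged into one edge whose Weight is the number of applications.
    An application where both countries are the same becomes a self-loop.
    """
    weights = Counter(applications)
    countries = sorted({country for pair in weights for country in pair})
    nodes = [{"Id": c, "Label": c} for c in countries]
    edges = [
        {"Source": applicant, "Target": archive, "Type": "Directed", "Weight": count}
        for (applicant, archive), count in sorted(weights.items())
    ]
    return nodes, edges

def write_csv(path, rows):
    """Write a list of dicts (all with the same keys) to a CSV file with a header row."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=list(rows[0]))
        writer.writeheader()
        writer.writerows(rows)

if __name__ == "__main__":
    # Hypothetical sample, not real EAP application data
    sample = [("Canada", "Ethiopia"), ("Canada", "Ethiopia"),
              ("Malawi", "Malawi"), ("USA", "Ghana")]
    nodes, edges = build_network(sample)
    for edge in edges:
        print(edge)
```

Calling `write_csv("nodes.csv", nodes)` and `write_csv("edges.csv", edges)` would then produce the two files to upload. Aggregating repeated pairs before import means each pair of countries appears once, with its count carried in the weight, rather than relying on the import step to merge duplicate edges.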

Once I understood the jargon, I was able to use Gephi’s guide to create the network below, which shows all applications between 2004 and 2017, regardless of whether they were successful in acquiring a grant.

Gephi Graph

In this visualisation the size of each country relates to the number of applications it features in, as country of archive, country of applicant, or both. The colours show related groups.

Each line shows the direction and frequency of applications. The line always travels clockwise from the country of applicant to the country of archive, and the thicker the line, the more applications. Where the country of applicant and country of archive are the same, the line becomes a loop.
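The node sizes described above count how many applications each country features in, whichever role it plays. One subtlety is that a domestic application (one of the loops) should count once for that country, not twice; in standard graph theory a loop adds two to a node’s degree, so a plain degree count would overstate such countries. A minimal Python sketch of the once-per-application count, using hypothetical (applicant, archive) pairs:

```python
from collections import Counter

def applications_per_country(applications):
    """Count the applications each country features in, as applicant, archive, or both.

    Each (applicant_country, archive_country) pair is turned into a set, so an
    application where both countries are the same is counted once for that country.
    """
    counts = Counter()
    for applicant, archive in applications:
        counts.update({applicant, archive})
    return counts

if __name__ == "__main__":
    # Hypothetical sample, not real EAP data
    sample = [("Canada", "Ethiopia"), ("Canada", "Ethiopia"),
              ("Malawi", "Malawi"), ("USA", "Ghana")]
    for country, n in applications_per_country(sample).most_common():
        print(country, n)
```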

I love network maps because you can learn so much from them. In this one, for example, you can see (among other things):

  • strong links between the USA and West Africa
  • multiple Canadian applications for Sierra Leonean and Ethiopian archives
  • UK applications to a diverse range of countries
  • links between Egypt and Algeria and between Tunisia and Morocco

The last tool I explored was Google Fusion Tables, which can be used to present information from a spreadsheet on a map. Once you have coordinates for your locations, Fusion Tables are incredibly easy to use (and they will fill in coordinates for you in many cases). You upload the spreadsheet, pick the information to include, and it’s done. It is so intuitive that I have yet to do much reading on how it works, which is also why I haven’t yet decided how to use it.

There is currently a Fusion Table on the EAP website with links to every project the programme has funded. It is possible to include all sorts of information for each archive location, so I plan to create something more in-depth for the African archives that could potentially be used as a tool by researchers.

The next step for my project is to apply these tools to the data in order to create a range of visualisations which will be the stars of my third and final blog at the beginning of September, so watch this space.

01 August 2018

Visualising the Endangered Archives Programme project data on Africa, Part 1. The project

Sarah FitzGerald is a linguistics PhD researcher at the University of Sussex investigating the origins and development of Cameroon Pidgin English. She is currently a research placement student in the British Library’s Digital Scholarship Team, using data from the Endangered Archives Programme to create data visualisations.

This month I have learned:

  • that people in Canada are most likely to apply for grants to preserve archives in Ethiopia and Sierra Leone, whereas those in the USA are more interested in endangered archives in Nigeria and Ghana
  • that people in Africa who want to preserve an archive are more likely to run a pilot project before applying for a big grant whereas people from Europe and North America go big or go home (so to speak)
  • that the African countries in which endangered archives are most often identified are Nigeria, Ghana and Malawi
  • and that Eastern and Western African countries are more likely to be studied by academics in Europe and North America than those of Northern, Central or Southern Africa
EAP051
Idrissou Njoya and Nji Mapon examine Mapon's endangered manuscript collection in Cameroon (EAP051)

I have learned all of this, and more, from sifting through 14 years of the Endangered Archives Programme’s grant application data for Africa.

Why am I sifting through this data?

Well, I am currently halfway through a three-month placement at the British Library working with the Digital Scholarship team on data from the Endangered Archives Programme (EAP). This is a programme which gives grants to people who want to preserve and digitise pre-modern archives under threat anywhere in the world.

EAP466
Manuscript of the Riyadh Mosque of Lamu, Kenya (EAP466)

The focus of my placement is to look at how the programme has worked in the specific case of Africa over the 14 years it has been running. I’ll be using this data to create visualisations that will help provide information for anyone interested in the archives, and for the EAP team.

Over the next weeks I will be writing a series of blog posts detailing my work. This first post gives an overview of the project and its initial stages. My second post will discuss the types of data visualisation software I have been learning to use. Then, at the end of my project, I will be writing a post about my findings, using the visualisations.

The EAP has funded the preservation of a range of important archives in Africa over the last decade and a half. Some interesting examples include a project to preserve botanical collections in Kenya, and one which created a digital record of endangered rock inscriptions in Libya. However, my project is more concerned with the metadata surrounding these projects – who is applying, from where, and for what type of archive etc.

EAP265
Tifinagh rock inscriptions in the Tadrart Acacus mountains, Libya (EAP265)

I’m also concerned with finding the most useful ways to visualise this information.

For 14 years the details of each application have been recorded in MS Excel spreadsheets. Over time this system has evolved, so my first step was to fill in information gaps in the spreadsheets. This was a time-consuming task as gap filling had to be done manually by combing through individual application forms looking for the missing information.

Once I had a complete data set, I was able to use a free and open source tool called OpenRefine to clean up the spreadsheet. OpenRefine can be used to edit and regularise spreadsheet data, fixing inconsistencies in spelling or formatting quickly and thoroughly. There is an excellent article available here if you are interested in learning more about how to use OpenRefine and what you can do with it.
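One way OpenRefine finds inconsistent spellings is clustering by key collision: each value is reduced to a normalised key, and values that share a key are offered as candidates to merge. The Python sketch below approximates its default ‘fingerprint’ keying (trim, lowercase, strip accents and punctuation, then sort and de-duplicate the words). It is a simplified stand-in written for illustration, not OpenRefine’s own code, and the sample country spellings are invented.

```python
import re
import unicodedata
from collections import defaultdict

def fingerprint(value):
    """Approximate a fingerprint key: order, case, accents and punctuation are ignored."""
    value = value.strip().lower()
    # Strip accents: decompose characters, then drop the combining marks
    value = unicodedata.normalize("NFKD", value)
    value = "".join(ch for ch in value if not unicodedata.combining(ch))
    value = re.sub(r"[^\w\s]", "", value)  # drop punctuation such as commas and apostrophes
    tokens = sorted(set(value.split()))    # sort and de-duplicate the words
    return " ".join(tokens)

def cluster(values):
    """Group values sharing a fingerprint; keep only groups with more than one spelling."""
    groups = defaultdict(list)
    for v in values:
        groups[fingerprint(v)].append(v)
    return [vals for vals in groups.values() if len(set(vals)) > 1]

if __name__ == "__main__":
    # Invented spellings, for illustration only
    countries = ["Côte d'Ivoire", "Cote d'Ivoire", "Republic of Malawi",
                 "Malawi, Republic of", "Ivory Coast", "Ghana"]
    for group in cluster(countries):
        print(group)
```

Key collision only catches variants built from the same words: it would not group "Ivory Coast" with "Côte d'Ivoire", so some decisions still need a human eye.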

With a clean, complete spreadsheet I could start looking at what the data could tell me about the EAP projects in Africa.

I used Excel visualisation tools to give me an overview of the information in the spreadsheet. I am very familiar with Excel, so this allowed me to explore lots of questions relatively quickly.

Major vs Pilot Chart

For example, there are two types of project that the EAP funds: small-scale, exploratory pilot studies and larger-scale major projects. I wondered which type of application was more likely to be awarded a grant. Using Excel it was easy to create the charts above, which show that major projects are actually more likely to be funded than pilots.
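The comparison behind a chart like this is the share of applications of each type that were funded. A minimal Python sketch, assuming each application has been reduced to a (project type, funded?) pair; the labels "pilot" and "major" and the toy records are placeholders, not the EAP figures.

```python
from collections import Counter

def success_rates(applications):
    """Return the fraction of applications funded for each project type.

    applications: iterable of (project_type, funded) pairs, where funded is True/False.
    """
    totals, funded = Counter(), Counter()
    for project_type, was_funded in applications:
        totals[project_type] += 1
        funded[project_type] += int(bool(was_funded))
    return {project_type: funded[project_type] / totals[project_type]
            for project_type in totals}

if __name__ == "__main__":
    # Invented toy records, for illustration only
    toy = [("pilot", True), ("pilot", False), ("pilot", False),
           ("major", True), ("major", False)]
    for project_type, rate in success_rates(toy).items():
        print(f"{project_type}: {rate:.0%} funded")
```

Comparing rates rather than raw counts matters here, because the two project types need not receive the same number of applications.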

Of course, the question of why this might be still remains, but knowing this is the pattern is a useful first step for investigation.

Another chart that was quick to make shows the number of applicants from each continent by year.

Continent of Applicant Chart

This chart reveals that, with the exception of the first three years of the programme, most applications to preserve African archives have come from people living in Africa. On average, applications from North America and Europe seem roughly equal in number. Applications from elsewhere are almost non-existent: there have been three applications from Oceania and one from Asia over the 14 years the EAP has been running.
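Behind a chart like this sits a simple cross-tabulation: count applications by year and continent of applicant, and the proportion coming from any one continent in each year follows directly. In Excel this would typically be a PivotTable; the Python sketch below does the same under the assumption that each application is reduced to a (year, continent) pair. The function names and records are hypothetical.

```python
from collections import Counter, defaultdict

def continent_by_year(applications):
    """Cross-tabulate (year, continent) pairs into {year: {continent: count}}, sorted by year."""
    table = defaultdict(Counter)
    for year, continent in applications:
        table[year][continent] += 1
    return {year: dict(counts) for year, counts in sorted(table.items())}

def share_from(table, continent):
    """Fraction of each year's applications that came from the given continent."""
    return {year: counts.get(continent, 0) / sum(counts.values())
            for year, counts in table.items()}

if __name__ == "__main__":
    # Invented records, for illustration only
    records = [(2005, "Europe"), (2005, "Africa"), (2005, "Europe"),
               (2010, "Africa"), (2010, "Africa"), (2010, "North America")]
    table = continent_by_year(records)
    print(table)
    print(share_from(table, "Africa"))
```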

This type of visualisation gives an overview at a glance in a way that a table cannot. But there are some things Excel tools can’t do.

I want to see if there are links between applicants from specific North American or European countries and archives in particular African countries, but Excel tools are not designed to map networks. Nor can Excel be used to present data on a map, which is something that the EAP team is particularly keen to see, so my next step is to explore the free software available which can do this.

This next stage of my project, in which I explore a range of data visualisation tools, will be detailed in a second blog post coming soon.

01 May 2018

New Digital Curator in the Digital Scholarship Team

Adi Keinan-Schoonbaert

Hello all! My name is Adi Keinan-Schoonbaert, and I’m the new Digital Curator for Asian and African collections at the British Library. One of the core remits of the Digital Scholarship team is to enable and encourage the reuse of the Library’s digital collections. When it comes to Asian and African collections, there are always interesting projects and initiatives going on. One is the Two Centuries of Indian Print project, which just started a second phase in March 2018 – a project with a strong Digital Humanities strand led by Digital Curator Tom Derrick. Another example is a collaborative transcription project, supporting the transcription of handwritten historical Arabic scientific works for Handwritten Text Recognition (HTR) research with the help of volunteers.

To give a bit of a background about myself and how I got to the Library: I’m an archaeologist and heritage professional by education and practice, with a PhD in Heritage Studies from University College London (2013). As a field archaeologist I used to record large quantities of excavation-related data – all manually, on paper. This was probably the first time I saw the potential of applying digital tools and technologies to record, manage and share archaeological data.

My first meaningful engagement with archaeological data and digital technologies started in 2005, when I joined the Israeli-Palestinian Archaeology Working Group (IPAWG) to create a database of all archaeological sites surveyed or excavated by Israel in the West Bank since its occupation in 1967, and to link it with a Geographic Information System (GIS), enabling the spatial visualisation and querying of this data for the first time. The research potential of this GIS-linked database proved so great that I decided to explore it further in a PhD dissertation. My dissertation focused on archaeological databases covering the occupied West Bank, and I was especially interested in the nature of archaeological records and the way they reflect particular research interests and heritage management priorities, as well as variability in data quality, coverage, accuracy and reliability.

Following my PhD I stayed at UCL Institute of Archaeology as a post-doctoral research associate, and participated in a project called MicroPasts, a UCL-British Museum collaboration. This project used web-based, crowdsourcing methods to allow traditional academics and other communities in archaeology to co-produce innovative open datasets. The MicroPasts crowdsourcing platform provided a great variety of projects through which people could contribute – from transcribing British Museum card catalogues, through tagging videos on the Roman Empire, to photomasking images in preparation for 3D modelling of museum objects.

With the main phase of the MicroPasts project coming to an end, I joined the British Library as Digital Curator (Polonsky Fellow) for the Hebrew Manuscripts Digitisation Project. This role allowed me to create and implement a digital strategy for engaging, accessing and promoting a specific digitised collection, working closely with curators and the Digital Scholarship team. My work included making the collection digitally accessible (on data.bl.uk, working with British Library Labs) and encouraging open licensing, creating a website, promoting the collection in different ways, researching available digital methods to explore and exploit collections in novel ways, and implementing tools such as an online catalogue records viewer (TEI XML), OpenRefine, and 3D modelling.

A six-month backpacking trip to Asia unexpectedly prepared me for my new role at the Library. I was delighted to join – or re-join – the Library’s Digital Research team, this time as Digital Curator for Asian and African Collections. I find these collections especially intriguing due to their diversity, richness and uniqueness. They consist mostly of manuscripts, printed books, periodicals, newspapers, photographs and e-resources from Africa, the Middle East (including Qatar Digital Library), Central Asia, East Asia (including the International Dunhuang Project), South Asia and Southeast Asia – as well as the Visual Arts materials.

I’m very excited to join the Library’s Digital Research team, to work alongside Neil Fitzgerald, Nora McGregor, Mia Ridge and Stella Wisdom, and to learn from their rich experience. Feel free to get in touch with us via [email protected] or Twitter: @BL_AdiKS for me, or @BL_DigiSchol for the Digital Scholarship team.

12 March 2018

The Ground Truth: Transcribing historical Arabic Scientific Manuscripts for OCR research

Announcing a collaborative transcription project to support state-of-the-art research in automatic handwritten text recognition for historical Arabic texts

Cultural heritage institutions around the world are digitising hundreds of thousands of pages of historical Arabic manuscript and archive collections. Making these fully text searchable has the potential to truly transform scholarship, opening up this rich content for discovery and enabling large-scale analysis.

Computer scientists and scholars are working on this challenge, building systems which can automatically transcribe images of handwritten text, but for historical Arabic script a solution remains just out of reach.

Our aim is to contribute to continued research in this area by building an open image and ground truth dataset of historical handwritten Arabic texts, ensuring historical Arabic collections benefit from state-of-the-art developments in handwritten text recognition.

What is Ground Truth?

Optical Character Recognition (OCR) systems essentially turn a picture of text into text itself—in other words, producing something like a .TXT or .DOC file from a scanned .JPG of a printed or handwritten page. Most OCR systems require ground truth, a set of files which represent the truthful record of elements of an image, for training and evaluation purposes.

The ground truth of an image’s text content, for instance, is the complete and accurate record of every character and word in the image.

By knowing what the system is supposed to recognise on a page of handwritten text, researchers can both train their system to recognise the characters and test how well the system does once trained.
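To make the ‘test’ half concrete: a common way to score a recogniser against ground truth is the character error rate (CER), the minimum number of single-character insertions, deletions and substitutions (the Levenshtein edit distance) needed to turn the ground-truth text into the system’s output, divided by the length of the ground truth. The Python below is a general-purpose sketch of that measure, not the evaluation method of any project named in this post. It compares Unicode code points, so how Arabic letters and diacritics are encoded will affect the score.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum insertions, deletions and substitutions turning a into b."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if the characters match)
            ))
        prev = curr
    return prev[-1]

def character_error_rate(ground_truth, ocr_output):
    """Edit distance between output and ground truth, relative to the ground-truth length."""
    if not ground_truth:
        raise ValueError("ground truth must be non-empty")
    return edit_distance(ground_truth, ocr_output) / len(ground_truth)

if __name__ == "__main__":
    # Toy example: the recogniser drops one letter (alif) from a four-letter word
    print(character_error_rate("كتاب", "كتب"))
```

A CER of 0 means a perfect transcription; because insertions are counted, the rate can exceed 1 for very poor output.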

Transcription

View more transcriptions in progress from this manuscript (Or 3366) on the platform

A collaborative approach

This project is a proof of concept exploring whether the creation of such a dataset can be done collaboratively at scale, using the collective expertise of volunteers around the world. At the heart of this approach is the Library’s enduring commitment to creating new and interesting ways to connect diverse communities of interest and expertise, be it scholars, the general public, computer scientists, students, and curators, around our collections. For this we are utilising a free and open-source platform, From the Page, which allows anyone with an interest in historical Arabic manuscripts to experience them up close, many for the first time, to discuss, learn and share expertise in their transcription.

Helping transform research

The Digital Scholarship Department was able to fund the development of this open source platform to support Right-to-Left transcription, a feature which will benefit any scholar wishing to use the software for their own transcription needs. Any transcriptions produced in this pilot will be transformed into ground truth resources, hosted by the British Library and made freely available, without rights restriction, for anyone wishing to advance the state-of-the-art in optical character recognition technology. Specifically, resources created will be contributed to ground-breaking projects already underway such as Transkribus, the Open Islamic Texts Initiative, the IMPACT Centre of Competence Image and Ground Truth Resources and more!

Visit the new Arabic Scientific Manuscripts of the British Library transcription platform and download our Getting Started Guide for more detail (an Arabic version will be available shortly). 

  

Posted by Nora McGregor, Digital Curator, British Library

 

17 July 2017

A Wonderland of Knowledge - Behind the Scenes of the British Library (Nadya Miryanova work experience)

Posted by Nadya Miryanova, BL Labs School Work Placement Student, currently studying at Lady Eleanor Holles, working with Mahendra Mahey, Manager of BL Labs.

British Library
Introduction to the British Library

Day 1

It was with a mixture of anticipation, curiosity and excitement that I opened the door to the staff entrance and started my two-week work placement in the world’s largest library. I have been placed with BL Labs in the Digital Scholarship department, where I am working with Mahendra Mahey (Project Manager of BL Labs) for the next two weeks. After the inescapable health and safety induction, I am now extremely well acquainted with the BL’s elaborate fire alarm system; following lunch at the staff restaurant, Mahendra gave me an introduction to the British Library and explained the work undertaken by BL Labs.

When most people hear the word ‘library’, conventional ideas typically spring to mind, including a copious number of books, and, of course, a disgruntled librarian ironically rather loudly encouraging silence every five minutes. I must admit that initially, my perspective was the same.

However, my viewpoint was soon to be completely turned around.

BL interior
British Library interior

An extraordinary institution, the British Library is indeed widely known for its remarkable collection of books: it is home to around 14 million. However, contrary to popular belief, these are only a small part of the Library’s vast collections. In fact, the British Library has an extremely diverse range of items, from patents to musical scores, and from ancient artefacts dating as far back as 1000 BC to this morning’s newspapers, giving a grand total of approximately 200 million documented items. I was also delighted to discover that the British Library has the world’s largest collection of stamps! It is estimated that if somebody looked at 5 items each day, it would take an astonishing 80,000 years to see the whole of the BL’s collections.

I learnt that the objective of BL Labs is to encourage scholars, innovators, artists, entrepreneurs and educators to work with the Library’s digital collections, supporting its mission to try to ensure that the wealth and diversity of the Library’s intellectual digital heritage is available for the research, creativity and fulfilment of everyone. At BL Labs, anyone is invited to address important research questions or develop ideas that use the Library’s digital content and data, whether by entering the annual Awards, becoming involved in a collaborative project, or simply using the collections in whatever way they want.

Although I was initially a little nervous on entering this immense institution, my fears evaporated completely when, on my very first day, I was welcomed into a friendly atmosphere, created by the sincere kindness and interest shown to me by every member of the Library’s staff.

Books Image
The King’s Library, the book collection of George III given to the nation by George IV

Day 2

At precisely 9 o’clock in the morning, I found myself seated at my office desk, looking at the newly filled out Outlook calendar on my computer to see what new and exciting tasks I would be faced with that day and looking out for any upcoming events. My Tuesday consisted mostly of independent work at my desk, and after a quick catch-up with Mahendra at 9.30, where we discussed the working plan for the day and reviewed yesterday’s work, I sat down to start my second full day of work at the British Library.

BL labs symposium
British Library Labs leaflet

Between 2013 and 2016, British Library Labs held a competition that looked for transformative project ideas using the British Library’s digital collections and data in new and exciting ways. The BL Labs Awards recognise outstanding and innovative work that has been carried out using these collections. Mahendra had previously introduced me to the Labs Competition and Awards pages of the BL Labs website, and my main objective was to update the idea and project submissions on these pages: specifically, adding the remaining Competition 2016 entries, and reviewing the 2015 and 2014 entries to check that they were complete, with none missing. The competition entries can be accessed via the online archive.

This was an excellent opportunity for me to work on a new editing platform and further develop my editing skills, which will doubtless prove very useful in everyday life and in the future. As I worked through editing and updating the pages, what struck me most was the incredible diversity of ideas within the competition entries. From a project exploring Black Abolitionists and their presence in Britain to the proposed creation of a Victorian meme machine, and from a planned Political Meetings Mapper to a suggested Alice in Wonderland bow tie design, each idea was unique and original, even though every entry followed the same brief. I was mesmerised by the amount of thought and careful planning evident in every submission; each one was intricately detailed and provided a thorough plan of work.

An example of a Victorian meme

Having finished lunch relatively early, I still had half an hour of my break left, so I took the opportunity to explore the library. I walked down to the visitors’ entrance and took a moment to admire the King’s Library, a majestic tower of books standing at the centre of the British Library. Stepping closer, I could read some of the lettering on the spines, and was delighted to see a book of Catullus’ poetry, which I had previously studied for my Latin GCSE. The scope of knowledge within this library is practically endless, and it led me to reflect on the importance of the work of BL Labs. Thinking back to the competition entries, I saw that they prove the possibilities for projects are virtually limitless. BL Labs gives scholars, academics and students easy access to digitised collections, including books, from anywhere in the world, and without this access many of the wonderful projects that the BL currently works on would not be possible.

With that thought fresh in my mind, I was brought back to reality and returned to my desk to continue working, this time on my mini-project. My last task of the day was brainstorming ideas for it. A clear focus soon emerged: I decided to explore the Russian-language titles among the 65,000 digitised 19th-century Microsoft books. Later on, I will be writing a blog post about my experience of working on this project.

Day 3

As the Piccadilly line train arrived at St Pancras, I managed, for the first time that week, to step off and head in the right direction (needless to say, my sense of direction is not the best). Feeling rather proud of myself, I walked with a skip in my step, ready to immerse myself in whatever work awaited me that day.

I looked at the day’s schedule and my heart leapt: I was to attend my first ever proper staff meeting. It was a very technical meeting, opened by the Head of Digital Scholarship, Adam Farquhar, who talked about current activities in the Digital Scholarship department. Everyone contributed to the general discussion, and Mahendra talked about the development of BL Labs and the progress made so far. The meeting also gave me an opportunity to talk about some of the things I was doing, and I found everybody very receptive and supportive. I found it very interesting to be introduced to colleagues who work in this area at the British Library day to day, and enjoyed hearing about the many projects currently under way.

SherlockNet web interface

I then began transcribing YouTube interviews with the winners of the 2016 BL Labs competition, starting with SherlockNet. The SherlockNet team used convolutional neural networks to automatically tag and caption the British Library’s Flickr collection of digitised images, taken largely from 19th-century books. If that doesn’t sound impressive enough, consider that this entry was submitted by three undergraduate students who were just 19 years old. My work involved listening carefully to each interview and typing exactly what Luda Zhao, Karen Wang and Brian Do said into a separate Word document. This document would then be used to make subtitles for the final film, and would prove invaluable when creating a storyboard for the final cut-down interview.

British Library Alice in Wonderland Poster

Day 4

As I turned the corner of Midland Road and stood facing the traffic lights, my gaze wandered over to the now familiar Alice in Wonderland poster with ‘British Library’ printed on it in block capitals. I smiled as I looked up at the Cheshire Cat perched neatly on top of the first ‘I’ in the words ‘British Library’, and the cat smiled back, revealing a wide, toothy grin. Alice, likewise, was looking up at the Cheshire Cat, and in that moment her situation felt very real to me. She was surrounded by the entirely new world of Wonderland, and in a similar way I find myself in a parallel world of continuous learning, discovering something new each day, with the British Library as my Wonderland. A well-known passage from Lewis Carroll’s Alice’s Adventures in Wonderland came to mind:

‘Would you tell me, please, which way I ought to go from here?’ (Alice)

‘That depends a good deal on where you want to get to,’ said the Cat.

‘I don’t much care where--’ said Alice.

‘Then it doesn’t matter which way you go,’ said the Cat.

‘--so long as I get somewhere,’ Alice added as an explanation.

‘Oh, you’re sure to do that,’ said the Cat, ‘if you only walk long enough.’

With this in mind, I briskly walked over to the doors of the office.

I began the day mostly working on my own project, further classifying a sub-collection of Russian titles from the digitised collection of 65,000 books, mostly from the 19th century. I worked on refining the organisation and categorisation of these books, establishing a clear, methodical approach that began with sorting them into two categories: fiction and non-fiction. Curiously, the majority of the titles were non-fiction. Through e-mail correspondence with Katya Rogatchevskaia, Lead Curator, East European Collections, I discovered that most of the digitised books had been acquired around the time they were published, so they were selected by Katya’s distant predecessors, a fact I found remarkable.

The Act of Abdication of Nicholas II and his brother Grand Duke Michael, published as a placard that would be distributed by hand or pasted to walls (shelfmark: HS.74/1870), an example of a Russian-language title that is now digitised

For the second half of the day, I focused once more on the YouTube transcriptions and finished transcribing the SherlockNet interviews. I then discussed with Mahendra how I would storyboard the interviews in preparation for the film editing process. First, I picked out the sections of each interview most suitable for the film, marking the exact times when each person started and finished speaking, and then placed this series of timings in chronological order. I was also able to choose the music for the end product (possibly my favourite part!), basing my selection on the mood of the videos and my perception of the individuals’ characters. I ended the day by finding a copyright-free YouTube music page with an assortment of possible tracks, and narrowed the selection down to four possible soundtracks, including titles such as ‘Spring in my Step’ and ‘Good Starts’.

Day 5

As I swiped my staff pass across the reader at the entrance to the building, I checked my phone for the time. It was 8.30am, and at the same moment I caught sight of the date: Friday 14th July. I stopped in my tracks. Today marked the end of my first full working week at the British Library, and I could hardly believe how quickly the time had gone! It reminded me forcibly of the inscription on my clock at home, ‘tempus fugit’ (time flees), because if there’s one thing that has gone abnormally fast during my time at the BL, it’s time.

Digitised Hebrew Manuscript available through the British Library

In the morning, I attended a meeting about an event Mahendra is planning around the digitised Hebrew manuscripts, and I was lucky enough to meet Ilana Tahan, Lead Curator of Hebrew and Christian Orient Collections. The meeting included a telephone call with Eva Frojmovic, an academic at the Centre for Jewish Studies in the School of Fine Art, History of Art and Cultural Studies at the University of Leeds. The discussion centred mostly on an event at which the BL would present its collection of digitised Hebrew manuscripts in order to promote their free use by the general public. These very beautiful manuscripts could have a wide audience, reaching beyond academia and potentially being used in creative and artistic settings.

Contrary to popular belief, the collection of 1,302 digitised manuscripts can be used by anyone and everyone, opening up exciting possibilities for new projects. The amazing thing about digital collections is that they make it possible for someone who does not live in London to view them wherever they are in the world, and to use them to enhance any learning experience, from seminars or lessons to PhD research projects. The physical manuscripts can, of course, also be consulted at the British Library. The structure and timings of the event were discussed, and dates were set for the next meeting and for the event itself.

To finish the meeting, Mahendra explained how handwriting recognition would be used to transcribe the manuscripts. There are 22 letters in the Hebrew alphabet, and each handwritten letter is recognised by the computer as a shape, although it is important that the computer has ground truth (i.e. examples of manuscripts transcribed by humans) to learn from. Each letter and word is then recognised and processed, converting the original handwritten Hebrew script into machine-readable Hebrew text. This would allow anyone to search the manuscripts for words quickly and easily using a computerised search tool.

Ilana Tahan, Lead Curator of Hebrew and Christian Orient Collections, looking through Hebrew manuscripts

For most of the afternoon, I moved between several projects: doing more work on the YouTube transcriptions, developing my mini-project, and creating a table of the outstanding posts still to be published on the British Library's Digital Scholarship blog.

At the end of the day, I reviewed my first week with Mahendra, evaluating the progress I had made. Throughout the week, I feel that I have developed a number of invaluable skills and gained an incredible insight into the working world.

I will be writing about my second week, as well as my mini-project, soon, so please come and visit this blog again if you are interested in finding out more about the work being done at the British Library.


03 November 2016

Black Abolitionist Performances and their Presence in Britain - An update!

Posted by Hannah-Rose Murray, finalist in the BL Labs Competition 2016.

Reflecting on an incredible journey over the last few months, it is remarkable how quickly five months have flown by! In May, I was chosen as one of the finalists for the British Library Labs Competition 2016, and my project has focused on black abolitionist performances and their presence in Britain during the nineteenth century. Black men and women had an impact in nearly every part of Great Britain, and it is no surprise to learn that their lectures were held in famous meeting halls, taverns, the houses of wealthy patrons, theatres, and churches across the country: we inevitably and unknowingly walk past sites with a rich history of Black Britain every day.

I was inspired to apply for this competition by last year’s winner, Katrina Navickas. Her project focused on the Chartist movement, and in particular on using the digitised nineteenth-century newspaper database to find the locations of Chartist meetings around the country. Katrina and the Labs team wrote code to identify these meetings in the Chartist newspaper, and it churned out hundreds of results that would have taken her years to find manually.

I wanted to do the same thing, but with black abolitionist speeches. However, there was an inherent problem: these abolitionists travelled to Britain between 1830 and 1900 and gave lectures in large cities and small towns, so their lectures were covered in numerous city and provincial newspapers. The scale of the project was perhaps one of the most difficult things we have had to deal with.

When searching the newspapers, one of the first things we found was that the OCR (Optical Character Recognition) is patchy at best. OCR is the process that turns scanned images into machine-readable text, and its quality depends on many factors: the quality of the scan itself, the quality of the paper the newspaper was printed on, and whether the page has been damaged or ‘muddied.’ If the OCR is unintelligible, the text will not be ‘read’ properly, so there could be hundreds of references to Frederick Douglass that are not accessible or ‘readable’ to us through an electronic search (see the image below).

An excerpt from a newspaper article about a public meeting about slavery, from the Leamington Spa Courier, 20 February 1847

In order to sort through both the ‘muddied’ and the ‘clean’ OCR, we need to teach the computer what is ‘positive text’ (i.e., language that uses words such as ‘abolitionist’, ‘black’, ‘fugitive’, ‘negro’) and what is ‘negative text’ (language that does not relate to abolition). For example, an advert for one of Frederick Douglass’s lectures (Leamington Spa Courier, 20 February 1847) contains key words that are likely to appear in other adverts, reports and commentaries: ‘Frederick Douglass’, ‘fugitive’, ‘slave’, ‘American’, and ‘slavery.’ I can find this advert by searching the digitised database, but there are perhaps hundreds more waiting to be uncovered.
We found examples where the name ‘Frederick’ had been ‘read’ as ‘F!e83hrick’ or something similar. The image below shows some OCR from the Aberdeen Journal, 5 February 1851, of an article about “three fugitive slaves.” The heading ‘Fugitive Slaves’ is completely illegible, as is William’s name before ‘Crafts.’ If I searched for William Craft using a search engine, this result would be unlikely to appear because of the poor OCR.

OCR from the Aberdeen Journal, 5 February 1851, of an article about “three fugitive slaves.”
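Poor OCR defeats exact searching because a search for ‘Frederick’ only finds those exact characters. As a rough illustration only (not the method the Labs team used), the Python sketch below compares each OCR token with a search term using the standard library’s `difflib.SequenceMatcher` similarity ratio, so a garbled form such as ‘F!e83hrick’ can still be flagged. The function name `fuzzy_find`, the sample sentence and the 0.6 cut-off are all hypothetical and would need tuning against real OCR.

```python
import difflib

def fuzzy_find(term, ocr_text, cutoff=0.6):
    """Return (token, score) pairs whose similarity to term is at least cutoff.

    The default cutoff of 0.6 is an illustrative assumption, not a tested value.
    """
    target = term.lower()
    matches = []
    # Split on whitespace only, so OCR noise such as '!' or digits stays inside the token.
    for token in ocr_text.split():
        cleaned = token.strip(".,;:'\"()").lower()
        score = difflib.SequenceMatcher(None, target, cleaned).ratio()
        if score >= cutoff:
            matches.append((token, round(score, 2)))
    return matches

# Hypothetical OCR output in the style of the garbled examples above.
sample = "Mr F!e83hrick Douglass, the fugitive slave, lectured here."

print("frederick" in sample.lower())    # → False: an exact search finds nothing
print(fuzzy_find("Frederick", sample))  # → [('F!e83hrick', 0.63)]
```

Fuzzy matching on its own also produces false positives on a large corpus, which is one reason a classifier that looks at the surrounding words is useful.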

I have spent several years transcribing black abolitionist speeches, and most of these transcriptions will act as the ‘positive’ text. ‘Negative’ text can refer to other lectures with a similar structure that do not relate to abolition, for example prison reform meetings or meetings about church finances. Contrasting the two kinds of text should help the computer pick out abolitionist language. We can then test how well this works against some of the data we already have, and once the results show we are on the right track, we can apply it to a larger data set.

All of this data is built into what is called a classifier, created by Ben O’Steen, Technical Lead of BL Labs. The classifier reads the OCR and collects newspaper references, but it works differently from a search engine because it measures words by weight and frequency. It also relies on probability: for example, if an article mentions ‘fugitive’ and ‘slave’ in the same section, the classifier assigns a higher probability that the article discusses someone like Frederick Douglass or William Craft. A search engine, on the other hand, might match ‘fugitive’ and ‘slave’ even when they appear in different articles on the same newspaper page.
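The post does not describe the internals of Ben O’Steen’s classifier, so the following is only a hedged sketch of one common way to combine word frequencies with probability: a multinomial naive Bayes classifier in plain Python. It learns how often each word appears in ‘positive’ (abolitionist) and ‘negative’ training text, then estimates the probability that a new passage is positive. The class name `NaiveBayes`, the `tokenize` helper and the four tiny training snippets are hypothetical stand-ins for the real transcriptions.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lower-case the text and keep runs of letters (digits and OCR symbols are dropped)."""
    return re.findall(r"[a-z]+", text.lower())

class NaiveBayes:
    """Minimal two-label multinomial naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.word_counts = {"positive": Counter(), "negative": Counter()}
        self.doc_counts = Counter()

    def train(self, text, label):
        # Word frequencies per label are the 'weights' this model learns.
        self.word_counts[label].update(tokenize(text))
        self.doc_counts[label] += 1

    def prob_positive(self, text):
        """Estimate P(positive | text); both labels need at least one training example."""
        vocab_size = len(set(self.word_counts["positive"]) | set(self.word_counts["negative"]))
        total_docs = sum(self.doc_counts.values())
        log_scores = {}
        for label, counts in self.word_counts.items():
            total_words = sum(counts.values())
            score = math.log(self.doc_counts[label] / total_docs)  # prior for the label
            for word in tokenize(text):
                # Smoothing stops an unseen word from zeroing out the whole probability.
                score += math.log((counts[word] + 1) / (total_words + vocab_size))
            log_scores[label] = score
        # Normalise the two log scores into a probability without overflow.
        top = max(log_scores.values())
        pos = math.exp(log_scores["positive"] - top)
        neg = math.exp(log_scores["negative"] - top)
        return pos / (pos + neg)

# Hypothetical training snippets standing in for transcribed speeches and other meetings.
clf = NaiveBayes()
clf.train("Frederick Douglass the fugitive slave lectured against American slavery", "positive")
clf.train("William Craft a fugitive slave addressed the meeting on slavery in America", "positive")
clf.train("A meeting of ratepayers on prison reform and the new gaol", "negative")
clf.train("Church finances were discussed at the vestry meeting", "negative")

print(clf.prob_positive("A fugitive slave from America spoke against slavery"))    # well above 0.5
print(clf.prob_positive("The vestry discussed church finances and prison reform"))  # well below 0.5
```

In this sketch word order is ignored, so the ‘same section’ behaviour would come from scoring one article or section at a time rather than a whole page; in practice a threshold for flagging articles would be chosen by testing against references that are already known.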

We’re currently processing the results of the classifier and adjusting it to try to reach a higher accuracy. This involves some human effort, as I double-check the references to see whether the results actually contain an abolitionist speech. So far, we have had a few references to abolitionist speeches, but the classifier’s biggest difficulty is language. For example, there were hundreds of results from the 1830s and the 1860s. I instantly knew that the 1830s results would relate to the Chartist movement, because the Chartists used words like ‘slavery’ when describing labour conditions, and frequently compared these conditions to ‘negro slavery’ in the US. The large number of references from the 1860s reflects the renewed interest in American slavery during the American Civil War, when thousands of articles discussed the Union, the Confederacy, slavery and the position of black people as fugitives or soldiers. Several times, the results focused on fugitive slaves in America rather than in Britain.

Another result referred to a West Indian lion tamer in London! This is a fascinating story, and part of the hidden history we see as central to the project, but it is obviously not an abolitionist speech. To start with, we are working on restricting our date range to 1845 to 1860, to avoid the numerous mentions of the Chartists and the Civil War. This is one way in which we have had to be flexible with the project’s initial proposal.

Aside from the work on the classifier, we have also been exploring ways to improve the OCR: is it better to apply OCR correction software, to completely re-OCR the collection, or perhaps a combination of both? We have sent some small samples to Overproof, a company based in Canberra, Australia, which specialises in OCR correction and has provided promising results. The results are obviously on a small scale, but it has been really interesting to see how much today’s software improves on the OCR produced when some of these newspapers were originally scanned, some ten years earlier. We have also sent the same sample to the IMPACT Centre of Competence in Digitisation, whose mission is to make the digitisation of historical printed text “better, faster, cheaper”, and which provides tools, services and facilities to advance the state of the art in document imaging, language technology and the processing of historical text. Preliminary results will be presented at the Labs Symposium.

Updated website

Before I started working with the Library, I had designed a website at http://www.frederickdouglassinbritain.com. Its structure was rudimentary and slightly awkward, and it was dwarfed by the numerous pages I kept adding to it. As the project progressed, I wanted to improve the website at the same time, and with the invaluable help of Dr Mike Gardner from the University of Nottingham, I relaunched it at the end of October. Initially, I had two maps: one showing the speaking locations of Frederick Douglass, and another showing the speaking locations of other black abolitionists such as William and Ellen Craft, William Wells Brown and Moses Roper (shown below).

Left map showing the speaking locations of Frederick Douglass. Right map showing speaking locations by other black abolitionists such as William and Ellen Craft, William Wells Brown and Moses Roper.

Working with Mike, we not only improved the aesthetics of the website and the maps (making them more professional) but also used clustering to highlight the areas where these men and women spoke most often. This avoided the ‘busy’ appearance of the first maps, which had one pin per location, and allows visitors to explore individual places and lectures more efficiently. Furthermore, on the black abolitionist speaking locations map (below right), a user can choose an individual and see only their lectures, or choose two or three to compare patterns in who gave these lectures and where they travelled.
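The post does not say how the clustering on the new maps was implemented; web-mapping plugins such as Leaflet.markercluster can do this automatically in the browser. Purely as an illustrative sketch of the idea, the Python below groups lecture locations into fixed-size latitude/longitude grid cells, optionally filtered to chosen speakers, and returns one cluster per cell with a count and an average position. The function name `grid_cluster`, the 0.5-degree cell size and the sample records and coordinates are hypothetical.

```python
from collections import defaultdict

def grid_cluster(lectures, cell_size_deg=0.5, speakers=None):
    """Group (speaker, place, lat, lon) records into square grid cells.

    If speakers is given, only lectures by those people are clustered,
    mirroring the option to view one or more individuals on the map.
    """
    cells = defaultdict(list)
    for speaker, place, lat, lon in lectures:
        if speakers is not None and speaker not in speakers:
            continue
        # Floor division puts every point in exactly one cell, including negative longitudes.
        key = (int(lat // cell_size_deg), int(lon // cell_size_deg))
        cells[key].append((place, lat, lon))

    clusters = []
    for members in cells.values():
        clusters.append({
            "count": len(members),
            # One marker is drawn at the mean position of the cell's lectures.
            "centre": (sum(m[1] for m in members) / len(members),
                       sum(m[2] for m in members) / len(members)),
            "places": [m[0] for m in members],
        })
    # Busiest areas first.
    clusters.sort(key=lambda c: c["count"], reverse=True)
    return clusters

# Hypothetical records with approximate coordinates, for illustration only.
lectures = [
    ("Frederick Douglass", "London venue A", 51.51, -0.12),
    ("William Craft", "London venue B", 51.52, -0.09),
    ("Frederick Douglass", "Leeds venue", 53.80, -1.55),
    ("Moses Roper", "Bristol venue", 51.45, -2.59),
]

for cluster in grid_cluster(lectures):
    print(cluster["count"], cluster["places"])
```

A fixed grid is the simplest approach, but it can split a dense area across a cell boundary; distance-based clustering avoids that at the cost of more computation.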

The new map interface for my website.

Events

I am very passionate about public engagement and regard it as an essential part of being an academic, since it is so important to engage and share with, and learn from, the public. We have created two events. The first, on 6 October as part of Black History Month, was a performance here at the Library celebrating the lives of two formerly enslaved people, William and Ellen Craft. Joe Williams of Heritage Corner in Leeds, an actor and researcher who has performed as figures such as Frederick Douglass and the black circus entertainer Pablo Fanque, had been writing a play about the Crafts, and because it fitted so well with the project, we invited Joe and actress Martelle Edinborough, who played Ellen, to London for a performance. Both Joe and Martelle were incredible, and they really brought the Crafts’ story and the project to life. In the Q&A afterwards, the audience responded very positively to the performance and to the Crafts’ story of heroism and bravery.

(Left to Right) Martelle Edinborough, Hannah-Rose Murray and Joe Williams

The next event is a walking tour, taking place on Saturday 26 November. I’ve devised this tour around central London, highlighting six sites where black activists made an indelible mark on British society during the nineteenth century. It is a way of showing how we walk past these sites on a daily basis, and how we need to recognise the contributions of these individuals to British history.

Hopefully this project will inspire others to research and use digital scholarship to find more ‘hidden voices’ in the archive. In terms of black history specifically, people of colour were actors, sailors, boxers, students and authors as well as lecturers, and there is so much more to uncover about their contribution to British history. My personal journey with the Library and the Labs team has also been a rewarding experience. It has further convinced me that we need stronger networks of collaboration between scholars and computer scientists, and of the value of digital humanities in general. Academics could harness the power of technology to bring their research to life, an important and necessary tool for public engagement. I hope to continue working with the Labs team to fine-tune some of the results, as well as writing some pages about black abolitionists for the new website. I’m very grateful to the Library and the Labs team for their support, their patience and this amazing opportunity. I’ve learned so much about digital humanities, and I see this project, with its combination of manual and technological methods, as a model for how we should move forward in the future. The project will shape my career in new and exciting ways, and working with one of the best libraries in the world has been a really gratifying experience.

I am really excited that I will be in London in a few days’ time to present my findings. Why don’t you come and join us at the British Library Labs Symposium, from 09:30 to 17:30 on Monday 7 November 2016?

20 September 2016

Black Abolitionists: Performance and Discussion for Black History Month by Hannah-Rose Murray

Posted by Mahendra Mahey on behalf of Hannah-Rose Murray, a finalist in the BL Labs Competition 2016.

To celebrate Black History Month in October 2016, you are welcome to attend an evening of performance at 7pm on 6 October, hosted by the British Library Labs project and the Eccles Centre for American Studies in the Auditorium, Conference Centre, British Library, St Pancras, London, UK.

I am very lucky to be one of the finalists in the Labs Competition 2016, and together we have organised an event that celebrates our project. Through my work with the Labs team, we are attempting to use machine learning to search the digitised newspaper collections for black abolitionist speeches and performances that have never been discovered before (read more here). This stems from my PhD project, which focuses on African Americans in Britain during the nineteenth century and the myriad ways they resisted British racism.

Two of the individuals I study are William and Ellen Craft, and we are really pleased to be working with two performers who will bring this incredible history to light on the evening of the 6th.

Ellen Craft dressed as a man to escape from slavery. Image from "The Underground Railroad from Slavery to Freedom", 2nd ed.

William and Ellen Craft were born enslaved in Georgia. Ellen worked as a house servant, and when she was 20 she married William (although slave marriages were not legally recognised in the South). They were determined to escape, fearing that their master would sell them separately further South, and they did not want to raise children in slavery. In 1848, they devised an ingenious escape plan: Ellen would pose as a gentleman with William as her manservant, and they would catch a series of trains and steamboats to the North. Ellen was fair-skinned because her father was the plantation owner who had enslaved and raped her mother, so she could pass for a white person; however, she could not read or write. To overcome this, Ellen strapped a bandage to her right hand, giving her an excuse not to write in case she was asked. This was an incredibly dangerous undertaking: if caught, William and Ellen would have been tortured and almost certainly separated and sent to different parts of the South, never to see each other again. It is a testament to their bravery that they succeeded.


For a short time, the Crafts settled in Boston, but in the eyes of the American government they were still legally enslaved. When slave catchers threatened to seize them and return them to slavery, they set sail for England, where they remained for over a decade. The Crafts soon became part of an abolitionist network in which hundreds of African Americans travelled to Britain to lecture against slavery, to raise money to purchase enslaved family members, or to live relatively safely from the violence they had experienced in America. British audiences were fascinated by their incredible escape, and shocked that a ‘white’ person like Ellen could ever have been enslaved. Both William and Ellen travelled around Britain to educate Britons about the true nature of slavery and to demand their support in helping Americans abolish it.

During the evening, performer and writer Joe Williams will play William Craft. Joe has an MA from Leeds University’s School of Performance and Cultural Industries and is the founder of Heritage Corner, which focuses on African narratives in British history. He has written and performed works on leading abolitionists as well as on Victorian circus genius Pablo Fanque.

Martelle Edinborough will play Ellen Craft. Martelle has stage, film and television credits that include commercials and short films. Martelle has recently worked with the Leeds based Geraldine Connor Foundation on Forrest Dreaming and Chicken Shop Shakespeare’s contribution to this year’s Ilkley Literature Festival.

There will be a short welcome and an introduction to the Crafts, after which the hour-long performance will begin, with time for a Q&A afterwards.

Tickets are £8 (with some concessions available), and available here.

Please note that a small number of free seats are available for community residents in Camden (London, England). If you think you are eligible, please contact Emma Morgan, Community Engagement Manager at the British Library.
