THE BRITISH LIBRARY

Digital scholarship blog

17 posts categorized "Africa"

19 February 2019

BL Labs 2018 Teaching & Learning Award Runner Up: 'Pocahontas and After'

This guest blog is by Border Crossings, the 2018 BL Labs Teaching & Learning Award Runners Up, for their project, 'Pocahontas and After'.

BorderCross image 1

Two images, each showing two young women dressed to show their culture, their pride, their sense of self. The first image dates from 1907, and shows The Misses Simeon, from the Stoney-Nakoda people of Western Canada, photographed by Byron Harmon. The second was taken in 2018 by John Cobb at Marlborough Primary School, West London, and shows a pupil of Iraqi heritage called Rose Al Saria, pictured with her sister. It was Rose who chose the particular archive image as the basis for her self-portrait, and who conceptualised the way it would be configured and posed.

This pair of photos is just one example in Border Crossings' exhibition Pocahontas and After, which was recently honoured in the British Library’s Labs Teaching and Learning category. The exhibition - which was seen by more than 20,000 people at Syon House last summer, and goes to St Andrews in February - represents the culmination of a sustained period of education and community work, beginning with the 2017 ORIGINS Festival. During the Festival, we not only held a ceremony for three indigenous women to commemorate Pocahontas at Syon, where she had stayed in the summer of 1616: we also brought indigenous artists into direct contact with the diverse communities around the House, in the two Primary Schools where they led workshops and study sessions, in the wonderful CARAS refugee group, and through our network of committed and energetic festival volunteers. In the following months, a distilled group from each of these partners worked closely with heritage experts from the archives (including the British Library’s own Dr. Philip Hatfield), Native American cultural consultants, and our own artistic staff to explore the ways in which Native American people have been presented in the past.

Their journeys into the archives were rich and challenging. What we think of as "realistic" photographs of indigenous people often turned out to be nothing of the kind. Edward Curtis, for example, apparently carried a chest of "authentic" costumes and props with him, which he used in his photographs to recreate the life of "the vanishing race" as he imagined it might have been in some pre-contact Romantic idyll. In other words, the archive photos are often about the photographer and the viewer, far more than they are about the subject.

BorderCross image 2

BorderCross image 3

As our volunteers came to realise this, they became more and more assertive of the need for agency in contemporary portraiture. Complex and fascinating decisions started to be made, placing the generation of meaning in the bodies of the people photographed. For example, Sebastian Oliver Wallace-Odi, who has Ghanaian heritage, saw how Ronald Mumford’s archive photo had been contrived to show “British patriotism” from First Nations chiefs, riding a car bedecked in a Union Jack, during the First World War. Philip showed him how other photos demonstrated the presence of Mounties at the shoot, emphasising the lack of agency from the subjects. Sebastian countered it with an image in which the red, white and blue flag is the symbol of the London Underground where his father works, and the car, like his shirt, is distinctly African.

What I love about this exhibition is that the meaning generated does not reside in one image or the other within the pair - but is rather in the energising of the space between, the dialogue between past and present, between different cultures, between human beings portrayed in different ways. It seems to me to be at once a way of honouring the indigenous subjects portrayed in the archive photographs, and of reinventing the form that was often too reductive in its attempts to categorise them.

Thanks to the Heritage Lottery Fund for supporting this project. Photos from the British Library digital collections.

Michael Walling - Artistic Director, Border Crossings. www.bordercrossings.org.uk

Watch the Border Crossings team receiving their Runner Up award and talking about their project on our YouTube channel (clip runs from 3.46 to 10.09):

Find out more about Digital Scholarship and BL Labs. If you have a project which uses British Library digital content in innovative and interesting ways, consider applying for an award this year! The 2019 BL Labs Symposium will take place on Monday 11 November at the British Library.

13 February 2019

BL Labs 2018 Artistic Award Runner Up: 'Nomad'

Nomad is a collaborative project between Abira Hussein, an independent researcher and curator, and Sophie Dixon and Ed Silverton of Mnemoscene. They were the runners up in the BL Labs Artistic Award category for 2018, and they've written a guest blog post about their project for the Digital Scholarship blog.

Nomad: Reconnecting Somali heritage

The project has been supported by the Heritage Lottery Fund and premiered at the British Library and British Museum during the Somali Week Festival 2018. Centred around workshops engaging Somali communities in London, Nomad explores the creative use of Mixed Reality and web-based technology to contextualise archival Somali objects with the people and traditions to which they belong.

Nomad 1

Nomad began with three Somali heritage objects - a headrest, bowl, and incense burner - which had been digitised at the British Museum. Thanks to Object Journeys, a previous project Abira was involved in, they were freely available to use.

Our goal was to reflect the utilitarian nature of the objects by showing their intended use. Furthermore, in Somali culture, songs and poetry are very important and we wanted to reconnect the objects to the sounds and traditions to which they belonged.

Our approach was to use Microsoft’s Mixed Reality HoloLens headset to show a Nomadic Somali family using the objects in real, everyday spaces. When wearing the headset the user can select different objects to reveal different members of the family, seeing how the object would be used, and hearing the songs which would have accompanied their use.

You can get a taste of the HoloLens experience in this short video (1 minute).

To create these ephemeral figures we used motion capture and 3D modelling, creating the clothing by referencing archival photographs held at the Powell Cotton Museum in Kent.

We used the British Library’s John Low collection as the source for the sounds you hear in the Mixed Reality experience. John Low travelled across Somalia between 1983 and 1986 working for an NGO to support community development. In his spare time he made field recordings with different tribes and dialects, providing an insight into the diversity of Somali oral traditions. The collection includes work songs reflecting pastoral life and poems, also known as Gabay, which are often recited in communal settings.

Nomad 2
Workshop held at the British Museum during the Somali Week Festival

With support from the Heritage Lottery Fund we toured the Mixed Reality experience to different Somali communities in London. The immersive experience became a way to inspire and encourage communities to share their own stories, to be part of an openly accessible archive representing their own narratives for Somali cultural heritage.

These workshops were exciting events in which participants handled real objects, tried the Mixed Reality experience and took part in the photogrammetry process to capture 3D models of the objects they had brought to the workshops.

To make the objects and sounds accessible to all, we also created Web-based Augmented Reality postcards to be used in the workshops. 

Nomad 3
Workshop participants looking at 3D objects using web-based Augmented Reality on their mobile phones

From the workshops we have 3D models, photographs and audio recordings which we’re currently adding to an online archive using the Universal Viewer. For updates about the archive and to find out more about our project please visit us at nomad-project.co.uk.

Watch the Nomad team receiving their award and talking about their project on our YouTube channel (clip runs from 4:15 to 8:16):

05 February 2019

BL Labs 2018 Research Award Honourable Mention: 'Doctoral theses as alternative forms of knowledge: Surfacing "Southern" perspectives on student engagement with internationalisation'

This guest blog is by Professor Catherine Montgomery, recipient of one of two Honourable Mentions in the 2018 BL Labs Awards Research category for her work with the British Library's EThOS collection.

British Library slide 1

 ‘Contemporary universities are powerful institutions, interlinked on a global scale; but they embed a narrow knowledge system that reflects and reproduces social inequalities on a global scale’ (Connell, 2017).

Having worked with doctoral students for many years and learned much in this process, my curiosity was sparked by the EThOS collection at the British Library. EThOS houses a large proportion of UK doctoral theses completed in British Universities and comprises a digital repository of around 500,000 theses. Doctoral students use this repository regularly but mostly as a means of exploring examples of doctorates in their chosen area of research. In my experience, doctoral students are often looking at formats or methodologies when they consult EThOS rather than exploring the knowledge provided in the theses.

So when I began to think about the EThOS collection as a whole, I came to the conclusion that it is a vastly under-used but incredibly powerful resource. Doctoral knowledge is not often thought of as a coherent body of knowledge, although individual doctoral theses are sometimes quoted and consulted by academics and other doctoral students. It is also important to remember that of 84,630 Postgraduate Research students studying full time in the UK in 2016/17, half of them, 42,325, were non-UK students, with 29,875 students being from beyond the EU. So in this sense, the knowledge represented in the EThOS collection is an important international body of knowledge.

So I began to explore the EThOS collection with some help from a group of PhD students (Gihan Ismail, Luyao Li and Yanru Xu, all doctoral candidates at the Department of Education at the University of Bath) and the EThOS library team. I wanted to interrogate the collection for a particular field of knowledge and because my research field is internationalisation of higher education, I carried out a search in EThOS for theses written in the decade 2008 to 2018 focusing on student engagement with internationalisation. This generated an initial data set of 380 doctoral theses which we downloaded into the software package NVivo. We then worked on refining the data set, excluding theses irrelevant to the topic (I was focusing on higher education so, for example, internationalisation at school-level topics were excluded) coming up with a final data set of 94 theses around the chosen topic. The EThOS team at the British Library helped at this point and carried out a separate search, coming up with a set of 78 theses using a specific adjacent word search and they downloaded these into a spreadsheet for us. The two data sets were consistent with each other which was really useful triangulation in our exploration of the use of the EThOS repository.

This description makes it sound very straightforward but there were all sorts of challenges, many of them technology related, including the fact that we were working with very large amounts of text as each of the 380 theses was around 100,000 words long or more and this slowed down the NVivo software and sometimes made it crash. There were also challenges in the search process as some earlier theses in the collection were in different formats; some were scanned and therefore not searchable.

The outcomes of the work with the EThOS collection were fascinating. Various patterns emerged from the analysis of the doctoral theses and the most prominent of these were insights into the geographies of student engagement with internationalisation; issues of methodologies and theory; and different constructions of internationalisation in higher education.

The theses were written by students from 38 different countries of the globe and examined internationalisation of higher education in African countries, the Americas and Australia, across the Asian continent and Europe. Despite this diversity amongst the students, most of the theses investigated internationalisation in the UK or international students in the UK. The international students also often carried out research on their own countries’ higher education systems and there was some limited comparative research but all of these compared their own higher education systems with one or (rarely) two others. There was only a minority of students who researched the higher education systems of international contexts different from their own national context.

A similar picture emerged when I considered the sorts of theories and ideas students were using to frame their research. There was a predominance of Western theory used by the international students to cast light on their non-western educational contexts, with many theses relying on concepts commonly associated with Western theory such as social capital, global citizenship or communities of practice. The ways in which the doctoral theses constructed ideas of internationalisation also appeared in many cases to be following a well-worn track and explored familiar concepts of internationalisation including challenges of pedagogy, intercultural interaction and the student experience. Having said this, there were also some innovative, creative and critical insights into students engaging with internationalisation, showing that alternative perspectives and different ways of thinking were generated by the theses of the EThOS collection.

Raewyn Connell, an educationalist whose work I used in the analysis for this project, tells us that in an unequal society we need ‘the view-from-below’ to challenge dominant ways of thought. I would argue that we should think about doctoral knowledge as ‘the view-from-below’, and that doctoral theses can offer us alternative perspectives and challenges to the previous narratives of issues such as internationalisation. However, it may be that the academy will need to make space for these alternative or ‘Southern’ perspectives to come in and this will rely on the capacity of the participants, both supervisors and students, to be open to negotiation in theories and ideas, something which another great scholar, Boaventura De Sousa Santos, describes as intercultural translation of knowledge.

I am very grateful indeed to the British Library and the EThOS team for developing this incredible source of digital scholarship and for their support in this project. I was delighted to be given an honourable mention in the British Library Research Lab awards and I am intending to take this work forward and explore the EThOS repository further. I was fascinated and excited to find that a growing number of countries are also developing and improving access to their doctoral research repositories (Australia, Canada, China, South Africa and USA to name but a few). This represents a huge comparative and open access data set which could be used to explore alternative perspectives on ‘taken-for-granted’ knowledge. Where better to start than with doctoral theses?

More information on the project can be found in this published article:

Montgomery, C. (2018). Surfacing ‘Southern’ perspectives on student engagement with internationalisation: doctoral theses as alternative forms of knowledge. Journal of Studies in International Education, 23(1), 123-138. https://doi.org/10.1177/1028315318803743

British Library slide 2

Watch Professor Montgomery receiving her award and talking about her project on our YouTube channel (clip runs from 6.57 to 10.39):

06 September 2018

Visualising the Endangered Archives Programme project data on Africa, Part 3. Finishing up

Sarah FitzGerald is a linguistics PhD researcher at the University of Sussex investigating the origins and development of Cameroon Pidgin English. She is currently a research placement student in the British Library’s Digital Scholarship Team, using data from the Endangered Archives Programme to create data visualisations.

This summer I have taken a break by working hard, I’ve broadened my academic horizons by ignoring academia completely, and I’ve felt at home while travelling hundreds of miles a week. But above all else, I’ve just had a really nice time.

In my last two blogs I covered the early stages of my placement at the British Library, and discussed the data visualisation tools I’ve been exploring.

In this final blog I am going to outline the later stages of my project. I am also going to talk about my experience of undertaking a British Library placement, what I’ve learned and whether it was worth it (spoiler alert: it was).

What I’ve been doing

The final stages of my project have mostly consisted of two separate lines of investigation.

Firstly, I have been working on finding out as much as I can about the Endangered Archives Programme (EAP)’s projects in Africa and finding the best ways to visualise that information in order to create a sort of bank of visualisations that the EAP team can use when they are talking about the work that they do. Visualisations, such as the one below showing the number of applications related to each region of Africa by year, can make tables of data much easier to understand.

Chart

Secondly, I was curious about why some project applications get funded and some do not. I wanted to know if I could identify any patterns in the reasons why projects get rejected.

This gave me the opportunity to apply my skills as a linguist to the data, albeit on a small scale. I decided to examine the feedback given to unsuccessful applicants by the panel that awards the EAP grants to see if I could identify any patterns. To do this I created a corpus, or electronic database, of texts. This could then be run through corpus analysis software to look for patterns.

AntConc

This image shows a word list created for my corpus using AntConc software, which is a free and open source corpus analysis tool.
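As a rough illustration of what a word list is, the sketch below counts how often each word appears across a small set of texts. It is not how AntConc works internally, and the two feedback sentences are invented placeholders rather than real EAP panel feedback.

```python
import re
from collections import Counter

# Invented placeholder texts; these are NOT real EAP panel feedback.
feedback_texts = [
    "The application lacks detail on the condition of the archive.",
    "The budget lacks detail and the archive is outside the scope.",
]


def word_list(texts):
    """Return (word, count) pairs across all texts, most frequent first."""
    counts = Counter()
    for text in texts:
        # Lowercase the text and treat runs of letters/apostrophes as words.
        counts.update(re.findall(r"[a-z']+", text.lower()))
    return counts.most_common()


ranked = word_list(feedback_texts)
for word, count in ranked[:5]:
    print(f"{count:>3}  {word}")
```

In a real corpus the top of such a list is dominated by function words like "the", so the interesting patterns usually sit a little further down, in repeated content words.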

My analysis allowed me to identify a number of issues common to many unsuccessful applications. In addition to applications outside of the scope of EAP there are also proposals which would make excellent projects but their applications lack the necessary information to award a grant.

Based on my analysis I was able to make a number of recommendations about additional information EAP could provide for applicants which might help to prevent potentially valuable archives being lost due to poor applications.

What I’ve learned

As well as learning about visualisation software I’ve learned a lot this summer about the EAP archives.

I’ve found out where applications are coming from, and which African countries have the most associated applications. I’ve learned that there are many great data visualisation tools available for free online. I’ve learned that there are over 70 different languages represented in the EAP archived projects from Africa.

EAP656
James Ssali and an unknown woman, from the Ham Musaka archive, Uganda (EAP656)

One of the most interesting things I’ve learned is just how much archival material is available for research – and on an incredibly broad range of topics. The materials digitised and preserved in Africa over the last 13 years include:

This wealth of information provides so much opportunity for research and these are just the archives from Africa. The EAP funds projects all over the world.

EAP143
Shui manuscript from China (EAP143)

In addition to learning about the EAP archives I’ve learned a lot from working in the British Library more generally. The scale of the work that is carried out is immense and I don’t think I fully appreciated before working here for three months just how large the challenges they face are.

In addition to preserving a copy of every book published in the UK, the BL is also working to create large digital archives in order to facilitate the way that modern scholarship has developed. They are digitising books, audio, websites, as well as historical documents such as the records of the East India Company.

East India House
View of East India House by Thomas Hosmer Shepherd

Was it worth it?

A PhD is an intense thing to undertake and you have a time limit to complete it. At first glance, taking three months out to work on a placement with little direct relevance to my PhD might seem a bit foolish, particularly when it means a daily commute from Brighton to London.

Far from wasting my time, however, this placement has been an enriching experience. My PhD is on the origins and development of Cameroon Pidgin English. This placement has given me a break from my work while broadening my understanding of African culture and the context in which the language I study is spoken.

I’ve always had an interest in data visualisation and my placement has given me time to play with visualisation tools and gain a real understanding of the resources available. I feel refreshed and ready for the new term despite having worked full time all summer.

The break has also given me thinking space: it has allowed ideas to percolate and given me new skills which I can apply to my work. Taking a break from academia has given me more perspective on my work and more options for how to develop it.

BL
The British Library, St Pancras

Finally, the travel has been a lot but my supervisors have been very flexible, allowing me to work from home two days a week. The up-side of coming to London regularly has been getting to work with interesting people.

Working in a large institution could be an intimidating and isolating experience but it has been anything but. The digital scholarship team have been welcoming and interested, and in particular I have had two very supportive supervisors. The British Library are really keen to support and develop placement students, and there is a lovely community of PhD students at the BL, some on placements, some doing their PhD here.

I have had a great time at the British Library this summer and can only recommend the scheme to anyone thinking of applying for a placement next year.

08 August 2018

Visualising the Endangered Archives Programme project data on Africa, Part 2. Data visualisation tools

Sarah FitzGerald is a linguistics PhD researcher at the University of Sussex investigating the origins and development of Cameroon Pidgin English. She is currently a research placement student in the British Library’s Digital Scholarship Team, using data from the Endangered Archives Programme to create data visualisations.

When I wrote last week that the Endangered Archives Programme (EAP) receive the most applications for archives in Nigeria, Ghana and Malawi, I am reasonably sure you were able to digest that news without difficulty.

Is that still the case if I add that Ethiopia, South Africa and Mali come in fourth, fifth and sixth place; and that the countries for which only a single application has been received include Morocco, Libya, Mauritania, Chad, Eritrea, and Egypt?

What if I give you the same information via a handy interactive map?

This map, designed using Tableau Public, shows the location of every archive for which the EAP received an application between 2004 and 2017. Once you know that the darker the colour, the more applications received, you can see at a glance how the applications have been distributed. If you want more information you can hover your cursor over each country to see its name and number of associated applications.
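The figure behind each shade on a map like this is simply a count of applications per country. A minimal sketch of that tally is shown here, assuming a spreadsheet exported as CSV; the column names and rows are invented for illustration and are not the EAP's actual data or schema.

```python
import csv
import io
from collections import Counter

# Invented sample rows; the column names are assumptions, not the EAP schema.
SAMPLE = """application_id,country_of_archive
A1,Nigeria
A2,Ghana
A3,Nigeria
A4,Malawi
"""


def applications_per_country(csv_text, column="country_of_archive"):
    """Count how many rows name each country in the given column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return Counter(row[column].strip() for row in reader)


counts = applications_per_country(SAMPLE)
for country, n in counts.most_common():
    print(country, n)
```

A mapping tool then only needs to translate each count into a colour intensity.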

My placement at the British Library centres on using data visualisations such as this to tell the story of the EAP projects in Africa.

EAP054
Photo from a Cameroonian photographic archive (EAP054)

When not undertaking a placement I am a linguist. This doesn’t require a lot of data visualisation beyond the tools available in Excel. In my previous blog I discussed how useful Excel tools have been for giving me an overview of the EAP data. But there are some visualisations you can’t create in Excel, such as an interactive heat map, so I had to explore what other tools are available.

Inspired by this excellent blog from a previous placement student I started out by investigating Tableau Public primarily to look for ways to represent data using a map.

Tableau Public is freely available online, although it is not open source. It is fairly intuitive to use and has a wide range of possible graphs and charts, not just maps. You upload a spreadsheet and it will tell you how to do the rest. There are also many instructional videos online that show you the range of possibilities available.

As well as the heat map above, I also used this tool to examine which countries applications are coming from.

This map shows that the largest number of applications have come from the USA and UK, but people from Canada, South Africa and Malawi have also applied for a lot of grants.

Malawi has a strong showing on both maps. There have been 23 applications to preserve archives in Malawi, and 21 applicants from within Malawi.

EAP942
Paper from the Malawi news archive (EAP942)

Are these the same applications?

My spreadsheet suggests that they are. I can also see that there seem to be links between certain countries, such as Canada and Ethiopia, but in order to properly understand these connections I need a tool that can represent networks – something Tableau Public cannot do.

After some investigation (read ‘googling’) I was able to find Gephi, free, open source software designed specifically for visualising networks.

Of all the software I have used in this project so far, Gephi is the least intuitive. But it can be used to create informative visualisations so it is worth the effort to learn. Gephi do provide a step by step guide to getting started, but the first step is to upload a spreadsheet detailing your ‘nodes’ and ‘edges’.

Having no idea what either of these were I stalled at step one.  

Further googling turned up this useful blog post written for complete beginners which informed me that nodes are individual members of a network. So in my case countries. My list of nodes includes both the country of the archive and the country of the applicant. Edges are the links between nodes. So each application creates a link, or edge, between the two countries, or nodes, involved.

Once I understood the jargon, I was able to use Gephi’s guide to create the network below, which shows all applications between 2004 and 2017 regardless of whether they were successful in acquiring a grant.

Gephi Graph

In this visualisation the size of each country relates to the number of applications it features in, as country of archive, country of applicant, or both. The colours show related groups.

Each line shows the direction and frequency of application. The line always travels in a clockwise direction from country of applicant to country of archive; the thicker the line, the more applications. Where the country of applicant and country of archive are the same, the line becomes a loop.
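To make the nodes and edges idea concrete, here is a minimal sketch that turns a list of applications into a node table and a weighted edge table. The country pairs are invented, not real EAP data. The Id/Label and Source/Target/Weight headers follow the convention commonly used for Gephi spreadsheet import, but treat them as an assumption and check them against Gephi's own guide.

```python
import csv
import io
from collections import Counter

# Invented (country of applicant, country of archive) pairs; not EAP data.
applications = [
    ("Canada", "Ethiopia"),
    ("Canada", "Ethiopia"),
    ("USA", "Nigeria"),
    ("Malawi", "Malawi"),  # same country on both sides: drawn as a loop
]


def build_tables(pairs):
    """Return the sorted node names and a weight for each directed edge."""
    nodes = sorted({country for pair in pairs for country in pair})
    edge_weights = Counter(pairs)  # repeated pairs give a heavier edge
    return nodes, edge_weights


def to_csv(header, rows):
    """Serialise a header and rows to CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()


nodes, edge_weights = build_tables(applications)
nodes_csv = to_csv(["Id", "Label"], [(name, name) for name in nodes])
edges_csv = to_csv(
    ["Source", "Target", "Weight"],
    [(src, dst, w) for (src, dst), w in sorted(edge_weights.items())],
)
print(nodes_csv)
print(edges_csv)
```

Each country appears once as a node however many applications it features in, while the weight column carries the frequency that the line thickness represents.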

I love network maps because you can learn so much from them. In this one, for example, you can see (among other things):

  • strong links between the USA and West Africa
  • multiple Canadian applications for Sierra Leonean and Ethiopian archives
  • UK applications to a diverse range of countries
  • links between Egypt and Algeria and between Tunisia and Morocco

The last tool I explored was Google Fusion Tables. These can be used to present information from a spreadsheet on a map. Once you have coordinates for your locations, Fusion Tables are incredibly easy to use (and will fill in coordinates for you in many cases).  You upload the spreadsheet, pick the information to include and it’s done. It is so intuitive that I have yet to do much reading on how it works – hence the lack of decision on how to use it.

There is currently a Fusion-based Table over on the EAP website with links to every project they have funded. It is possible to include all sorts of information for each archive location so I plan to create something more in depth for the African archives that can potentially be used as a tool by researchers.

The next step for my project is to apply these tools to the data in order to create a range of visualisations which will be the stars of my third and final blog at the beginning of September, so watch this space.

01 August 2018

Visualising the Endangered Archives Programme project data on Africa, Part 1. The project

Sarah FitzGerald is a linguistics PhD researcher at the University of Sussex investigating the origins and development of Cameroon Pidgin English. She is currently a research placement student in the British Library’s Digital Scholarship Team, using data from the Endangered Archives Programme to create data visualisations.

This month I have learned:

  • that people in Canada are most likely to apply for grants to preserve archives in Ethiopia and Sierra Leone, whereas those in the USA are more interested in endangered archives in Nigeria and Ghana
  • that people in Africa who want to preserve an archive are more likely to run a pilot project before applying for a big grant whereas people from Europe and North America go big or go home (so to speak)
  • that the African countries in which endangered archives are most often identified are Nigeria, Ghana and Malawi
  • and that Eastern and Western African countries are more likely to be studied by academics in Europe and North America than those of Northern, Central or Southern Africa

EAP051
Idrissou Njoya and Nji Mapon examine Mapon's endangered manuscript collection in Cameroon (EAP051)

I have learned all of this, and more, from sifting through 14 years of the Endangered Archives Programme’s grant application data for Africa.

Why am I sifting through this data?

Well, I am currently half way through a three-month placement at the British Library working with the Digital Scholarship team on data from the Endangered Archives Programme (EAP). This is a programme which gives grants to people who want to preserve and digitise pre-modern archives under threat anywhere in the world.

EAP466
Manuscript of the Riyadh Mosque of Lamu, Kenya (EAP466)

The focus of my placement is to look at how the project has worked in the specific case of Africa over the 14 years the programme has been running. I’ll be using this data to create visualisations that will help provide information for anyone interested in the archives, and for the EAP team.

Over the next weeks I will be writing a series of blog posts detailing my work. This first post gives an overview of the project and its initial stages. My second post will discuss the types of data visualisation software I have been learning to use. Then, at the end of my project, I will be writing a post about my findings, using the visualisations.

The EAP has funded the preservation of a range of important archives in Africa over the last decade and a half. Some interesting examples include a project to preserve botanical collections in Kenya, and one which created a digital record of endangered rock inscriptions in Libya. However, my project is more concerned with the metadata surrounding these projects – who is applying, from where, and for what type of archive etc.

EAP265
Tifinagh rock inscriptions in the Tadrart Acacus mountains, Libya (EAP265)

I’m also concerned with finding the most useful ways to visualise this information.

For 14 years the details of each application have been recorded in MS Excel spreadsheets. Over time this system has evolved, so my first step was to fill in information gaps in the spreadsheets. This was a time-consuming task as gap filling had to be done manually by combing through individual application forms looking for the missing information.

Once I had a complete data set, I was able to use a free and open source tool called OpenRefine to clean up the spreadsheet. OpenRefine can be used to find and regularise inconsistencies in spreadsheet data, such as variant spellings or formatting, quickly and thoroughly. There is an excellent article available here if you are interested in learning more about how to use OpenRefine and what you can do with it.
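To give a flavour of what this kind of clean-up involves, here is a minimal Python sketch loosely modelled on the idea of "fingerprint" keying, where variant spellings are reduced to a common key so that they can be grouped and reviewed. It is an illustration rather than OpenRefine's own code, and the country values and function names below are made up for the example; they are not taken from the EAP spreadsheets.

```python
import string
import unicodedata
from collections import defaultdict


def fingerprint(value: str) -> str:
    """Reduce a cell value to a key so that near-identical spellings collide."""
    # Decompose accented characters and drop the accents, e.g. "Côte" -> "Cote".
    decomposed = unicodedata.normalize("NFKD", value)
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    # Lowercase and remove ASCII punctuation.
    cleaned = ascii_only.lower().translate(str.maketrans("", "", string.punctuation))
    # Split on whitespace (this also trims), de-duplicate and sort the tokens.
    tokens = sorted(set(cleaned.split()))
    return " ".join(tokens)


def cluster(values):
    """Group values sharing a fingerprint; keep only groups with more than one variant."""
    groups = defaultdict(set)
    for value in values:
        groups[fingerprint(value)].add(value)
    return {key: sorted(variants) for key, variants in groups.items() if len(variants) > 1}


# Hypothetical cell values, invented for illustration only.
countries = [
    "Nigeria", "nigeria ", "NIGERIA.", "Ghana",
    "Côte d'Ivoire", "Cote d'Ivoire", "Malawi",
]

clusters = cluster(countries)
for key, variants in clusters.items():
    print(key, "<-", variants)
```

Each printed group is a set of candidates for merging into a single spelling. A person still decides which form to keep, which is also how the task felt in practice.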

With a clean, complete spreadsheet I could start looking at what the data could tell me about the EAP projects in Africa.

I used Excel visualisation tools to give me an overview of the information in the spreadsheet. I am very familiar with Excel, so this allowed me to explore lots of questions relatively quickly.

Major vs Pilot Chart

For example, there are two types of project that the EAP funds: small-scale, exploratory pilot studies and larger-scale major projects. I wondered which type of application was more likely to be awarded a grant. Using Excel it was easy to create the charts above, which show that major projects are actually more likely to be funded than pilots are.

Of course, the question of why this might be still remains, but knowing this is the pattern is a useful first step for investigation.
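For readers who prefer code to spreadsheets, the same comparison can be sketched in a few lines of Python. The rows below are invented purely to show the calculation and do not reflect the real EAP figures; the column values ("major", "pilot", "funded", "rejected") are also assumptions about how such data might be coded.

```python
from collections import Counter

# Hypothetical rows of (project_type, outcome). NOT real EAP data.
applications = [
    ("major", "funded"), ("major", "funded"), ("major", "rejected"),
    ("pilot", "funded"), ("pilot", "rejected"), ("pilot", "rejected"),
]


def funding_rates(rows):
    """Return the share of applications funded for each project type."""
    totals = Counter(project_type for project_type, _ in rows)
    funded = Counter(
        project_type for project_type, outcome in rows if outcome == "funded"
    )
    # A Counter returns 0 for a missing key, so a type with no funded rows gives 0.0.
    return {project_type: funded[project_type] / totals[project_type]
            for project_type in totals}


rates = funding_rates(applications)
for project_type, rate in sorted(rates.items()):
    print(f"{project_type}: {rate:.0%} of applications funded")
```

Comparing rates rather than raw counts matters here, because the two project types need not receive the same number of applications.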

Another chart that was quick to make shows the number of applicants from each continent by year.

Continent of Applicant Chart

This chart reveals that, with the exception of the first three years of the programme, most applications to preserve African archives have come from people living in Africa. On average, applications from North America and Europe seem to be roughly equal in number. Applications from elsewhere are almost non-existent: there have been three from Oceania and one from Asia over the 14 years the EAP has been running.

This type of visualisation gives an overview at a glance in a way that a table cannot. But there are some things Excel tools can’t do.

I want to see if there are links between applicants from specific North American or European countries and archives in particular African countries, but Excel tools are not designed to map networks. Nor can Excel be used to present data on a map, which is something that the EAP team is particularly keen to see, so my next step is to explore the free software available which can do this.
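Before choosing a tool, it helps to see what a "network" looks like as data. A minimal sketch, assuming each application records the applicant's country and the archive's country: counting how often each pair occurs gives a weighted edge list, which is a common starting point for network and mapping software. The pairs below are invented for illustration and are not EAP data.

```python
from collections import Counter

# Hypothetical (applicant_country, archive_country) pairs. NOT real EAP data.
applications = [
    ("United Kingdom", "Nigeria"),
    ("United Kingdom", "Nigeria"),
    ("United States", "Ghana"),
    ("United Kingdom", "Malawi"),
    ("United States", "Ghana"),
    ("Canada", "Kenya"),
]


def weighted_edges(pairs):
    """Count repeated pairs and return ((source, target), weight), heaviest first."""
    return Counter(pairs).most_common()


edges = weighted_edges(applications)
for (source, target), weight in edges:
    print(f"{source} -> {target}: {weight}")
```

Each line of output is one link in the network, and the weight says how many applications it represents.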

This next stage of my project, in which I explore a range of data visualisation tools, will be detailed in a second blog post coming soon.

01 May 2018

New Digital Curator in the Digital Scholarship Team

Add comment

Hello all! My name is Adi Keinan-Schoonbaert, and I’m the new Digital Curator for Asian and African collections at the British Library. One of the core remits of the Digital Scholarship team is to enable and encourage the reuse of the Library’s digital collections. When it comes to Asian and African collections, there are always interesting projects and initiatives going on. One is the Two Centuries of Indian Print project, which just started a second phase in March 2018 – a project with a strong Digital Humanities strand led by Digital Curator Tom Derrick. Another example is a collaborative transcription project, supporting the transcription of handwritten historical Arabic scientific works for Handwritten Text Recognition (HTR) research with the help of volunteers.

To give a bit of a background about myself and how I got to the Library: I’m an archaeologist and heritage professional by education and practice, with a PhD in Heritage Studies from University College London (2013). As a field archaeologist I used to record large quantities of excavation-related data – all manually, on paper. This was probably the first time I saw the potential of applying digital tools and technologies to record, manage and share archaeological data.

My first meaningful engagement with archaeological data and digital technologies started in 2005, when I joined the Israeli-Palestinian Archaeology Working Group (IPAWG) to create a database of all archaeological sites surveyed or excavated by Israel in the West Bank since its occupation in 1967, and to link it with a Geographic Information System (GIS), enabling the spatial visualisation and querying of this data for the first time. The research potential of this GIS-linked database proved so great that I decided to explore it further in a PhD dissertation. My dissertation focused on archaeological databases covering the occupied West Bank, and I was especially interested in the nature of archaeological records and the way they reflect particular research interests and heritage management priorities, as well as variability in data quality, coverage, accuracy and reliability.

Following my PhD I stayed at UCL Institute of Archaeology as a post-doctoral research associate, and participated in a project called MicroPasts, a UCL-British Museum collaboration. This project used web-based, crowdsourcing methods to allow traditional academics and other communities in archaeology to co-produce innovative open datasets. The MicroPasts crowdsourcing platform provided a great variety of projects through which people could contribute – from transcribing British Museum card catalogues, through tagging videos on the Roman Empire, to photomasking images in preparation for 3D modelling of museum objects.

With the main phase of the MicroPasts project coming to an end, I joined the British Library as Digital Curator (Polonsky Fellow) for the Hebrew Manuscripts Digitisation Project. This role allowed me to create and implement a digital strategy for engaging, accessing and promoting a specific digitised collection, working closely with curators and the Digital Scholarship team. My work included making the collection digitally accessible (on data.bl.uk, working with British Library Labs) and encouraging open licensing, creating a website, promoting the collection in different ways, researching available digital methods to explore and exploit collections in novel ways, and implementing tools such as an online catalogue records viewer (TEI XML), OpenRefine, and 3D modelling.

A six-month backpacking trip to Asia unexpectedly prepared me for my new role at the Library. I was delighted to join – or re-join – the Library’s Digital Research team, this time as Digital Curator for Asian and African Collections. I find these collections especially intriguing due to their diversity, richness and uniqueness. These include mostly manuscripts, printed books, periodicals, newspapers, photographs and e-resources from Africa, the Middle East (including Qatar Digital Library), Central Asia, East Asia (including the International Dunhuang Project), South Asia, SE Asia – as well as the Visual Arts materials.

I’m very excited to join the Library’s Digital Research team, to work alongside Neil Fitzgerald, Nora McGregor, Mia Ridge and Stella Wisdom, and to learn from their rich experience. Feel free to get in touch with us via digitalresearch@bl.uk or Twitter - @BL_AdiKS for me, or @BL_DigiSchol for the Digital Scholarship team.

12 March 2018

The Ground Truth: Transcribing historical Arabic Scientific Manuscripts for OCR research

Add comment

Announcing a collaborative transcription project to support state-of-the-art research in automatic handwritten text recognition for historical Arabic texts

Cultural heritage institutions around the world are digitising hundreds of thousands of pages of historical Arabic manuscript and archive collections. Making these fully text searchable has the potential to truly transform scholarship, opening up this rich content for discovery and enabling large-scale analysis.

Computer scientists and scholars are working on this challenge, building systems which can automatically transcribe images of handwritten text, but for historical Arabic script a solution remains just out of reach.

Our aim is to contribute to continued research in this area by building an open image and ground truth dataset of historical handwritten Arabic texts, ensuring historical Arabic collections benefit from state-of-the-art developments in handwritten text recognition.

What is Ground Truth?

Optical Character Recognition (OCR) systems essentially turn a picture of text into text itself—in other words, producing something like a .TXT or .DOC file from a scanned .JPG of a printed or handwritten page. Most OCR systems require ground truth, a set of files which represent the truthful record of elements of an image, for training and evaluation purposes.

The ground truth of an image’s text content, for instance, is the complete and accurate record of every character and word in the image.

By knowing what the system is supposed to recognise on a page of handwritten text, researchers can both train their system to recognise the characters and test how well it performs once trained.
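To make the evaluation side concrete, here is a minimal Python sketch of one widely used measure, the character error rate: the number of single-character insertions, deletions and substitutions needed to turn a system's output into the ground truth, divided by the length of the ground truth. This is an illustration under our own assumptions, not a description of how any particular research team scores its system, and the two short strings are invented examples rather than text from the manuscripts.

```python
def edit_distance(reference: str, hypothesis: str) -> int:
    """Levenshtein distance: minimum single-character edits between two strings."""
    # previous[j] holds the distance between the reference prefix seen so far
    # and the first j characters of the hypothesis.
    previous = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        current = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            cost = 0 if ref_char == hyp_char else 1
            current.append(min(
                previous[j] + 1,         # delete a reference character
                current[j - 1] + 1,      # insert a hypothesis character
                previous[j - 1] + cost,  # substitute, or match at no cost
            ))
        previous = current
    return previous[-1]


def character_error_rate(ground_truth: str, system_output: str) -> float:
    """Edits needed per ground-truth character; 0.0 means a perfect transcription."""
    if not ground_truth:
        raise ValueError("ground truth must not be empty")
    return edit_distance(ground_truth, system_output) / len(ground_truth)


# Invented example: the system output is missing one letter.
ground_truth = "كتاب العلوم"
system_output = "كتاب العلم"

cer = character_error_rate(ground_truth, system_output)
print(f"Character error rate: {cer:.2%}")
```

The calculation is only as trustworthy as the transcription it is compared against, which is why accurate ground truth matters so much.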

Transcription
View more transcriptions in progress from this manuscript (Or 3366) on the platform 

A collaborative approach

This project is a proof of concept exploring whether the creation of such a dataset can be done collaboratively at scale, using the collective expertise of volunteers around the world. At the heart of this approach is the Library’s enduring commitment to creating new and interesting ways to connect diverse communities of interest and expertise around our collections, be they scholars, the general public, computer scientists, students or curators. For this we are utilising a free and open-source platform, From the Page, which allows anyone with an interest in historical Arabic manuscripts to experience them up close, many for the first time, and to discuss, learn and share expertise in their transcription.

Helping transform research

The Digital Scholarship Department was able to fund the development of this open source platform to support Right-to-Left transcription, a feature which will benefit any scholar wishing to use the software for their own transcription needs. Any transcriptions produced in this pilot will be transformed into ground truth resources, hosted by the British Library and made freely available, without rights restriction, for anyone wishing to advance the state-of-the-art in optical character recognition technology. Specifically, resources created will be contributed to ground-breaking projects already underway such as Transkribus, the Open Islamic Texts Initiative, the IMPACT Centre of Competence Image and Ground Truth Resources and more!

Visit the new Arabic Scientific Manuscripts of the British Library transcription platform and download our Getting Started Guide for more detail (an Arabic version will be available shortly). 

  

Posted by Nora McGregor, Digital Curator, British Library