01 August 2018
Visualising the Endangered Archives Programme project data on Africa, Part 1. The project
Sarah FitzGerald is a linguistics PhD researcher at the University of Sussex investigating the origins and development of Cameroon Pidgin English. She is currently a research placement student in the British Library’s Digital Scholarship Team, using data from the Endangered Archives Programme to create data visualisations.
This month I have learned:
- that people in Canada are most likely to apply for grants to preserve archives in Ethiopia and Sierra Leone, whereas those in the USA are more interested in endangered archives in Nigeria and Ghana
- that people in Africa who want to preserve an archive are more likely to run a pilot project before applying for a big grant whereas people from Europe and North America go big or go home (so to speak)
- that the African countries in which endangered archives are most often identified are Nigeria, Ghana and Malawi
- and that Eastern and Western African countries are more likely to be studied by academics in Europe and North America than those of Northern, Central or Southern Africa
I have learned all of this, and more, from sifting through 14 years of the Endangered Archive Programme’s grant application data for Africa.
Why am I sifting through this data?
Well, I am currently half way through a three-month placement at the British Library working with the Digital Scholarship team on data from the Endangered Archives Programme (EAP). This is a programme which gives grants to people who want to preserve and digitise pre-modern archives under threat anywhere in the world.
The focus of my placement is to look at how the project has worked in the specific case of Africa over the 14 years the programme has been running. I’ll be using this data to create visualisations that will help provide information for anyone interested in the archives, and for the EAP team.
Over the next weeks I will be writing a series of blog posts detailing my work. This first post gives an overview of the project and its initial stages. My second post will discuss the types of data visualisation software I have been learning to use. Then, at the end of my project, I will be writing a post about my findings, using the visualisations.
The EAP has funded the preservation of a range of important archives in Africa over the last decade and a half. Some interesting examples include a project to preserve botanical collections in Kenya, and one which created a digital record of endangered rock inscriptions in Libya. However, my project is more concerned with the metadata surrounding these projects – who is applying, from where, and for what type of archive etc.
I’m also concerned with finding the most useful ways to visualise this information.
For 14 years the details of each application have been recorded in MS Excel spreadsheets. Over time this system has evolved, so my first step was to fill in information gaps in the spreadsheets. This was a time-consuming task as gap filling had to be done manually by combing through individual application forms looking for the missing information.
Once I had a complete data set, I was able to a free and open source software called OpenRefine to clean up the spreadsheet. OpenRefine can be used to edit and regularise spreadsheet data such as spelling or formatting inconsistencies quickly and thoroughly. There is an excellent article available here if you are interested in learning more about how to use OpenRefine and what you can do with it.
With a clean, complete, spreadsheet I could start looking at what the data could tell me about the EAP projects in Africa.
I used Excel visualisation tools to give me an overview of the information in the spreadsheet. I am very familiar with Excel, so this allowed me to explore lots of questions relatively quickly.
For example, there are two types of projects that EAP fund. Small scale, exploratory, pilot studies and larger scale main projects. I wondered which type of application was more likely to be successful in being awarded a grant. Using Excel it was easy to create the charts above which show that major projects are actually more likely to be funded than pilots are.
Of course, the question of why this might be still remains, but knowing this is the pattern is a useful first step for investigation.
Another chart that was quick to make shows the number of applicants from each continent by year.
This chart reveals that, with the exception of the first three years of the programme, most applications to preserve African archives have come from people living in Africa. Applications from North America and Europe on average seem to be pretty equal. Applications from elsewhere are almost non-existent, there have been three applications from Oceania, and one from Asia over the 14 years the EAP has been running.
This type of visualisation gives an overview at a glance in a way that a table cannot. But there are some things Excel tools can’t do.
I want to see if there are links between applicants from specific North American or European countries and archives in particular African countries, but Excel tools are not designed to map networks. Nor can Excel be used to present data on a map, which is something that the EAP team is particularly keen to see, so my next step is to explore the free software available which can do this.
This next stage of my project, in which I explore a range of data visualisation tools, will be detailed in a second blog post coming soon.