Digital scholarship blog

Enabling innovative research with British Library digital collections

08 August 2018

Visualising the Endangered Archives Programme project data on Africa, Part 2. Data visualisation tools

Sarah FitzGerald is a linguistics PhD researcher at the University of Sussex investigating the origins and development of Cameroon Pidgin English. She is currently a research placement student in the British Library’s Digital Scholarship Team, using data from the Endangered Archives Programme to create data visualisations

When I wrote last week that the Endangered Archives Programme (EAP) receive the most applications for archives in Nigeria, Ghana and Malawi, I am reasonably sure you were able to digest that news without difficulty.

Is that still the case if I add that Ethiopia, South Africa and Mali come in fourth, fifth and sixth place; and that the countries for which only a single application has been received include Morocco, Libya, Mauritania, Chad, Eritrea, and Egypt?

What if I give you the same information via a handy interactive map?

This map, designed using Tableau Public, shows the location of every archive that the EAP received between 2004 and 2017. Once you know that the darker the colour the more applications received, you can see at a glance how the applications have been distributed. If you want more information you can hover your cursor over each country to see its name and number of associated applications.

My placement at the British Library centres on using data visualisations such as this to tell the story of the EAP projects in Africa.

EAP054
Photo from a Cameroonian photographic archive (EAP054)

When not undertaking a placement I am a linguist. This doesn’t require a lot of data visualisation beyond the tools available in Excel. In my previous blog I discussed how useful Excel tools have been for giving me an overview of the EAP data. But there are some visualisations you can’t create in Excel, such as an interactive heat map, so I had to explore what other tools are available.

Inspired by this excellent blog from a previous placement student I started out by investigating Tableau Public primarily to look for ways to represent data using a map.

Tableau Public is open source and freely available online. It is fairly intuitive to use and has a wide range of possible graphs and charts, not just maps. You upload a spreadsheet and it will tell you how to do the rest. There are also many instructional videos online that show you the range of possibilities available.

As well as the heat map above, I also used this tool to examine which countries applications are coming from.

This map shows that the largest number of applications have come from the USA and UK, but people from Canada, South Africa and Malawi have also applied for a lot of grants.

Malawi has a strong showing on both maps. There have been 23 applications to preserve archives in Malawi, and 21 applicants from within Malawi.

EAP942
Paper from the Malawi news archive (EAP942)

Are these the same applications?

My spreadsheet suggests that they are. I can also see that there seems to be links between certain countries, such as Canada and Ethiopia, but in order to properly understand these connections I need a tool that can represent networks – something Tableau Public cannot do.

After some investigation (read ‘googling’) I was able to find Gephi, free, open source software designed specifically for visualising networks.

Of all the software I have used in this project so far, Gephi is the least intuitive. But it can be used to create informative visualisations so it is worth the effort to learn. Gephi do provide a step by step guide to getting started, but the first step is to upload a spreadsheet detailing your ‘nodes’ and ‘edges’.

Having no idea what either of these were I stalled at step one.  

Further googling turned up this useful blog post written for complete beginners which informed me that nodes are individual members of a network. So in my case countries. My list of nodes includes both the country of the archive and the country of the applicant. Edges are the links between nodes. So each application creates a link, or edge, between the two countries, or nodes, involved.

Once I understood the jargon, I was able to use Gephi’s guide to create the network below which shows all applications between 2004 and 2017 regardless of whether they were successful in acquiring a grant. Gephi GraphIn this visualisation the size of each country relates to the number of applications it features in, as country of archive, country of applicant, or both.  The colours show related groups.

Each line shows the direction and frequency of application. The line always travels in a clockwise direction from country of applicant to country of archive, the thicker the line the more applications. Where the country of applicant and country of archive are the same the line becomes a loop.

I love network maps because you can learn so much from them. In this one, for example, you can see (among other things):

  • strong links between the USA and West Africa
  • multiple Canadian applications for Sierra Leonean and Ethiopian archives
  • UK applications to a diverse range of countries
  • links between Egypt and Algeria and between Tunisia and Morocco

The last tool I explored was Google Fusion Tables. These can be used to present information from a spreadsheet on a map. Once you have coordinates for your locations, Fusion Tables are incredibly easy to use (and will fill in coordinates for you in many cases).  You upload the spreadsheet, pick the information to include and it’s done. It is so intuitive that I have yet to do much reading on how it works – hence the lack of decision on how to use it.

There is currently a Fusion-based Table over on the EAP website with links to every project they have funded. It is possible to include all sorts of information for each archive location so I plan create something more in depth for the African archives that can potentially be used as a tool by researchers.

The next step for my project is to apply these tools to the data in order to create a range of visualisations which will be the stars of my third and final blog at the beginning of September, so watch this space.

.