THE BRITISH LIBRARY

Digital scholarship blog


28 August 2018

Student project report: Scribal Handwriting: An automated manuscript analysis tool

In 2017-18, Dr Mia Ridge worked with three groups of second year students on UCL's Computer Science course to apply their skills to collections and digital scholarship-based projects. In this post, Francesco Benintende (francescobenintende@icloud.com), Kamil Zajac (kamil.zajac.16@ucl.ac.uk) and Andrei Maxim (andrei.maxim.16@ucl.ac.uk) explain how they worked with curator Alison Hudson on 'Scribal Handwriting: An automated manuscript analysis tool'. A video of their final presentation is available online and their project page contains more technical information.

The challenge

The team was challenged to create a tool for palaeographers (researchers who analyse handwriting to determine the date of a manuscript and sometimes even its scribe and place of production). To help with this task, we designed a tool to quickly find occurrences of similar handwritten characters across a collection of documents, a lengthy and repetitive task if done manually. Typically, researchers compare features such as script, size and ink across different manuscripts to establish possible similarities between manuscripts and scribes.

Our mission was to create a faster, more reliable tool for palaeographers. Our aim was to speed up their research process by automating the comparison of characters.

Our approach

To create a solution for this challenge, our first approach consisted of problem research and user needs analysis. During this phase we made sure we highlighted the main features necessary for the application. We wanted to improve current methods and to understand the needs of future users. This phase was characterised by interviews, questionnaires and surveys aimed at people with a similar technical level and background to our future users. This helped us tailor an appropriate user interface for researchers. In addition, we tried to understand the current limitations of research in the palaeography field. Our initial research can be found at http://students.cs.ucl.ac.uk/2017/group33/initial_research.html.

After acquiring this initial information, creating the first prototype of the web application and testing the user response to the graphic components, we shifted our focus to building a system that would recognise characters written in similar scripts. This was the main phase of the project's development, consisting mainly of testing and evaluating different methods to find and compare characters' features.

In our final phase, we were concerned with testing and evaluating our web application overall.

The solution

Our solution is a web app that allows researchers to create an account, upload and maintain a collection of manuscripts. With this, they can perform character searches in their personal collection. Furthermore, it allows researchers to perform analyses on these documents from anywhere.

 

Manuscript Collection page of Scribal Handwriting, where users can save documents.


To power our web app, we created our own algorithm to compare two characters or two ligatures. (A ligature is two characters written as one shape, as in the example of ‘NT’ below.) The algorithm finds candidate characters in the selected pages and compares each with the character being searched for. This analysis relies on converting the images of characters into ‘functions’ and then comparing them, which lets us identify similar patterns in the characters, such as recurrent shapes and angles, and use that information to judge two characters as similar.
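The team's exact matching method isn't spelled out here, but the underlying idea, scoring how well a selected character image matches each position on a page, can be sketched with normalised cross-correlation, a standard template-matching technique. Everything below (the toy page, the plus-shaped 'glyph') is illustrative, not the project's actual code:

```python
import numpy as np

def normalised_xcorr(page: np.ndarray, glyph: np.ndarray) -> np.ndarray:
    """Slide `glyph` over `page`, scoring each position by normalised
    cross-correlation: ~1.0 for an identical patch, ~0 for an unrelated one."""
    gh, gw = glyph.shape
    g = (glyph - glyph.mean()) / (glyph.std() + 1e-9)
    scores = np.zeros((page.shape[0] - gh + 1, page.shape[1] - gw + 1))
    for y in range(scores.shape[0]):
        for x in range(scores.shape[1]):
            patch = page[y:y + gh, x:x + gw]
            p = (patch - patch.mean()) / (patch.std() + 1e-9)
            scores[y, x] = (p * g).mean()
    return scores

# Toy example: plant a plus-shaped 'glyph' in a blank 'page', then find it again.
glyph = np.array([[0., 1., 0.],
                  [1., 1., 1.],
                  [0., 1., 0.]])
page = np.zeros((10, 10))
page[4:7, 5:8] = glyph
scores = normalised_xcorr(page, glyph)
best = tuple(map(int, np.unravel_index(scores.argmax(), scores.shape)))
print(best)  # (4, 5): the top-left corner of the planted glyph
```

A production tool would work on greyscale scans, use a faster FFT-based correlation and tolerate variation in size and slant, but the scoring idea is the same.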

 

 

Example of a character search, using a selected image (right) to find matches in a page (left) with our algorithm.

 

Evaluation

Overall our solution offers a considerable improvement over current manual methods, as it enables researchers to work on important documents without having to consult them physically. It also offers useful data about the results, indicating which are most similar in shape and size to the original character. This can help scholars think about how scribes' work, or even a pen's sharpness, might change over the course of many pages. It might also offer new ways of arguing which parts of a manuscript were copied out by different scribes. Such arguments are often based largely or entirely on more subjective appraisals; while these are still necessary, this app is a useful addition to palaeographers' toolkits. The app also usefully places the results in context: their location is shown on the full page, and the excerpts include a few letters to either side. Our testing showed that the main limitations on performance are larger images (in pixels) and damaged manuscripts.

Within those limits, our web app is able to consistently find occurrences of characters in different manuscripts.

 

Example of character search results for the “m” character.

Conclusion

This project allowed our team to experience real-world applications of image processing, as well as gaining a unique insight into the world of palaeography and its research procedures. The team also had to consider how to develop a web application for an audience that might not have used this sort of program before, and how to make a website that could work on a wide range of devices, from smartphones to relatively old desktop computers. Moreover, we outlined future improvements that would make the web app a consumer-grade tool for researchers: the use of machine learning to improve performance, a mobile version to allow researchers to work from their smartphones, a version of the app to analyse shapes and decoration as well as text, and an improved algorithm to analyse damaged documents where there is less contrast between the colour of the ink and the colour of the parchment.

23 August 2018

BL Labs Symposium (2018): Book your place for Mon 12-Nov-2018

The BL Labs team are pleased to announce that the sixth annual British Library Labs Symposium will be held on Monday 12 November 2018, from 9:30 - 17:30 in the British Library Knowledge Centre, St Pancras. The event is free, and you must book a ticket in advance. Last year's event was a sell out, so don't miss out!

The Symposium showcases innovative and inspiring projects which use the British Library’s digital content, providing a platform for development, networking and debate in the Digital Scholarship field as well as being a focus on the creative reuse of digital collections and data in the cultural heritage sector.

We are very proud to announce that this year's keynote will be delivered by Daniel Pett, Head of Digital and IT at the Fitzwilliam Museum, University of Cambridge.

Daniel Pett
Daniel Pett will be giving the keynote at this year's BL Labs Symposium. Photograph Copyright Chiara Bonacchi (University of Stirling).

Dan read archaeology at UCL and Cambridge (but played too much rugby) and then worked in IT on the trading floor of Dresdner Kleinwort Benson. Until February this year, he was Digital Humanities lead at the British Museum, where he designed and implemented digital practices connecting humanities research, museum practice, and the creative industries. He is an advocate of open access, open source and reproducible research. He designed and built the award-winning Portable Antiquities Scheme database (which holds records of over 1.3 million objects) and enabled collaboration through projects working on linked and open data (LOD) with the Institute for the Study of the Ancient World (New York University) (ISAWNYU) and the American Numismatic Society. He has worked with crowdsourcing and crowdfunding (MicroPasts), and developed the British Museum's 3D capture reputation. He holds honorary posts at the UCL Institute of Archaeology and the Centre for Digital Humanities and publishes regularly in the fields of museum studies, archaeology and digital humanities.

Dan's keynote will reflect on his years of experience in assessing the value, impact and importance of experimenting with, re-imagining and re-mixing cultural heritage digital collections in Galleries, Libraries, Archives and Museums. Dan will follow in the footsteps of previous prestigious BL Labs keynote speakers: Josie Fraser (2017); Melissa Terras (2016); David De Roure and George Oates (2015); Tim Hitchcock (2014); and Bill Thompson and Andrew Prescott in 2013.

Stella Wisdom (Digital Curator for Contemporary British Collections at the British Library) will give an update on some exciting and innovative projects she and other colleagues have been working on within Digital Scholarship. Mia Ridge (Digital Curator for Western Heritage Collections at the British Library) will talk about 'Living with Machines', a major and ambitious data science/digital humanities project the British Library is about to embark upon in collaboration with the Alan Turing Institute for data science and artificial intelligence.

Throughout the day, there will be several announcements and presentations from nominated and winning projects for the BL Labs Awards 2018, which recognise work that has used the British Library's digital content in four areas: Research, Artistic, Commercial, and Educational. The closing date for the BL Labs Awards is 11 October 2018, so it's not too late to nominate someone, a team, or your own project! There will also be a chance to find out who has been nominated and recognised for the British Library Staff Award 2018, which showcases the work of an outstanding individual or team at the British Library who has worked creatively and originally with the British Library's digital collections and data (nominations close 12 October 2018).

Adam Farquhar (Head of Digital Scholarship at the British Library) will give an update about the future of BL Labs and report on a special event held in September 2018 for invited attendees from National, State, University and Public Libraries and Institutions around the world, where they were able to share best practice in building 'labs'-style environments for their institutions' digital collections and data.

There will be a 'sneak peek' of an art exhibition in development entitled 'Imaginary Cities' by the visual artist and researcher Michael Takeo Magruder. His practice draws upon information systems such as live and algorithmically generated data, 3D printing and virtual reality, combining modern and traditional techniques such as gold and silver gilding and etching. Michael's exhibition will build on the work he has been doing with BL Labs over the last few years using digitised 18th- and 19th-century urban maps, bringing analogue and digital outputs together. The exhibition will be staged in the British Library's entrance hall in April and May 2019 and will be free to visit.

Finally, we have an inspiring talk lined up to round the day off (more information about this will be announced soon), and - as is our tradition - the symposium will conclude with a reception at which delegates and staff can mingle and network over a drink and nibbles.

So book your place for the Symposium today and we look forward to seeing new faces and meeting old friends again!

For any further information, please contact labs@bl.uk

Posted by Mahendra Mahey and Eleanor Cooper (BL Labs Team)

15 August 2018

Seeking researchers to work on an ambitious data science and digital humanities project at the British Library and Alan Turing Institute (London)

If you follow @BL_DigiSchol or #DigitalHumanities hashtags on Twitter, you might have seen a burst of data science, history and digital humanities jobs being advertised. In this post, Dr Mia Ridge of the Library's Digital Scholarship team provides some background to contextualise the jobs advertised with the 'Living with Machines' project.

We are seeking to appoint several people to new roles, collaborating on an exciting new project developed by the British Library and The Alan Turing Institute, the national centre for data science and artificial intelligence. You'd be working with an interdisciplinary group of investigators. The project is led by Ruth Ahnert (QMUL), and co-led by (in alphabetical order): Adam Farquhar (British Library), Emma Griffin (UEA), James Hetherington (Alan Turing Institute), Jon Lawrence (Exeter), Barbara McGillivray (Alan Turing Institute and Cambridge) and Mia Ridge (British Library).

In its early stages of development, the project, called Living with Machines, brings together national-scale digital collections and data, advanced data science techniques, and fundamental humanities questions. It will look at the social and cultural impact of mechanisation across the long nineteenth century, using data science methods both to track the application of technology to our social and economic lives and to trace the human response to its introduction. The project will initially work with digitised newspaper collections, but will look to include a variety of sources and formats held by the British Library and other institutions.

So what does this mean for you? The project name is both a reference to the impact of the Industrial Revolution and a nod to the impact of computational methods on scholarship. It will require radical collaboration between historians, data scientists, geographers, computational linguists, and curators to enable new intersections between computational methods and historical inquiry. 

We’re looking to recruit people interested in examining the impact - the challenges, as well as the opportunities - of intensely interdisciplinary collaboration, while applying transformative data-science driven approaches to enable new research questions and approaches to historical sources. As a multidisciplinary project, it will require people with enough perspective on their own disciplines to explain often tacit knowledge about the norms and practices of those disciplines. Each team member will play an active part in relevant aspects of the research process, including outreach and publications, while gaining experience working on a very large-scale project. We're looking for people who enjoy collaboration and solving problems in a complex environment.

As the job titles below indicate, the project will require people with a range of skills and experience. Outputs will range from visualisations, to monographs and articles, to libraries of code; from training workshops and documentation, to work ensuring the public and other researchers can meaningfully access and interpret the results of data science processes. A number of roles are offered to help make this a reality.

Jobs currently advertised:

The British Library jobs are now advertised, closing September 21.

You may have noticed that the British Library is also currently advertising for a Curator, Newspaper Data (closes Sept 9). This isn't part of Living with Machines, but with its approach of applying data-driven journalism and visualisation techniques to historical collections, it should have some lovely synergies and opportunities to share work in progress with the project team. There's also a Research Software Engineer post advertised that will work closely with many of the same British Library teams.

If you're applying for these posts, you may want to check out the Library's visions and values on the refreshed 'Careers' website.

Keep an eye out for press releases and formal announcements from the institutions involved, but in the meantime, please share the job ads with people who might be suitable for any of these roles. If you have any questions about the roles, HR@turing.ac.uk is a great place to start.

13 August 2018

The Parts of a Playbill

Beatrice Ashton-Lelliott is a PhD researcher at the University of Portsmouth studying the presentation of nineteenth-century magicians in biographies, literature, and the popular press. She is currently a research placement student on the British Library’s In the Spotlight project, cleaning and contextualising the crowdsourced playbills data. She can be found on Twitter at @beeashlell and you can help out with In the Spotlight at playbills.libcrowds.com.

In the Spotlight is a brilliant tool for spotting variations between playbills across the eighteenth and nineteenth centuries. The site provides participants with access to thousands of digitised playbills, and the sheets in the site's collections often list the cast, scenes, and any innovative 'machinery' involved in the production. Whilst the most famous actors obviously needed to be emphasised and drew more crowds (e.g., any playbills featuring Mr Kean tend to have his name in huge letters), in the playbills in In the Spotlight's volumes that doesn't always seem to be the case for playwrights. Sometimes they're mentioned by name, but in many cases famous playwrights aren't named on the playbill. I've speculated previously that this is because these playwrights were so famous that audiences would already have heard, by word of mouth or through the press, that a new play of theirs was out, so there was no point in adding the name.

What can you expect to see on a playbill?

The basics of a playbill are: the main title of the performance, a subtitle, often the current date, future or past dates of performances, the cast and characters, scenery, short or long summaries of the scenes to be acted, whether the performance is to benefit anyone, and where tickets can be bought from. There are definitely surprises though: the In the Spotlight team have also come across apologies from theatre managers for actors who were scheduled to perform not turning up, or performing drunk! The project forum has a thread for interesting things 'spotted on In the Spotlight', and we always welcome posts from others.

Crowds would often react negatively if the scheduled performers weren’t on stage. Gilli Bush-Bailey also notes in The Performing Century (2007) that crowds would be used to seeing the same minor actors reappear across several parts of the performance and playbills, stating that ‘playbills show that only the lesser actors and actresses in the company appear in both the main piece and the following farce or afterpiece’ (p. 185), with bigger names at theatres royal committing only to either a tragic or comic performance.

Our late 18th-century playbills on the site show quite a standard format in structure and font.

In this 1797 playbill from the Margate volume, the font is uniform, with variations in size to emphasise names and performance titles.

How did playbills change over time?

In the 19th century, all kinds of new and exciting fonts are introduced, as well as more experimentation in the structure of playbills. The type of performance also influences the layout: for instance, a circus playbill will often be divided into a grid-like structure to describe each act and feature illustrations, and early magician playbills often change orientation halfway down to give more space to describe the tricks and stage.

1834 Birmingham playbill

This 1834 Birmingham playbill is much lengthier than the previous example, showing a variety of fonts and featuring more densely packed text. Although this may look more like an information overload, the mix of fonts and variations in size still make the main points of the playbill eye-catching to passersby. 

James Gregory’s ‘Parody Playbills’ article, stimulated by the In the Spotlight project, contains a lot of great examples and further insights into the deeper meaning of playbills and their structure.

Works Cited

Davis, T. C. and P. Holland, eds. (2007). The Performing Century: Nineteenth-Century Theatre History. Basingstoke: Palgrave Macmillan.

Gregory, J. (2018) ‘Parody Playbills: The Politics of the Playbill in Britain in the Eighteenth and Nineteenth Centuries’ in eBLJ.

08 August 2018

Visualising the Endangered Archives Programme project data on Africa, Part 2. Data visualisation tools

Sarah FitzGerald is a linguistics PhD researcher at the University of Sussex investigating the origins and development of Cameroon Pidgin English. She is currently a research placement student in the British Library’s Digital Scholarship Team, using data from the Endangered Archives Programme to create data visualisations.

When I wrote last week that the Endangered Archives Programme (EAP) receives the most applications for archives in Nigeria, Ghana and Malawi, I am reasonably sure you were able to digest that news without difficulty.

Is that still the case if I add that Ethiopia, South Africa and Mali come in fourth, fifth and sixth place; and that the countries for which only a single application has been received include Morocco, Libya, Mauritania, Chad, Eritrea, and Egypt?

What if I give you the same information via a handy interactive map?

This map, designed using Tableau Public, shows the location of every archive for which the EAP received an application between 2004 and 2017. The darker the colour, the more applications received, so you can see at a glance how the applications have been distributed. If you want more information you can hover your cursor over each country to see its name and number of associated applications.

My placement at the British Library centres on using data visualisations such as this to tell the story of the EAP projects in Africa.

Photo from a Cameroonian photographic archive (EAP054)

When not undertaking a placement I am a linguist. This doesn’t require a lot of data visualisation beyond the tools available in Excel. In my previous blog I discussed how useful Excel tools have been for giving me an overview of the EAP data. But there are some visualisations you can’t create in Excel, such as an interactive heat map, so I had to explore what other tools are available.

Inspired by this excellent blog from a previous placement student I started out by investigating Tableau Public primarily to look for ways to represent data using a map.

Tableau Public is free to use and available online. It is fairly intuitive and offers a wide range of possible graphs and charts, not just maps. You upload a spreadsheet and it will tell you how to do the rest. There are also many instructional videos online that show you the range of possibilities available.

As well as the heat map above, I also used this tool to examine which countries applications are coming from.

This map shows that the largest number of applications have come from the USA and UK, but people from Canada, South Africa and Malawi have also applied for a lot of grants.

Malawi has a strong showing on both maps. There have been 23 applications to preserve archives in Malawi, and 21 applicants from within Malawi.

Paper from the Malawi news archive (EAP942)

Are these the same applications?

My spreadsheet suggests that they are. I can also see that there seem to be links between certain countries, such as Canada and Ethiopia, but in order to properly understand these connections I need a tool that can represent networks – something Tableau Public cannot do.

After some investigation (read ‘googling’) I was able to find Gephi, free, open source software designed specifically for visualising networks.

Of all the software I have used in this project so far, Gephi is the least intuitive. But it can be used to create informative visualisations, so it is worth the effort to learn. Gephi does provide a step-by-step guide to getting started, but the first step is to upload a spreadsheet detailing your ‘nodes’ and ‘edges’.

Having no idea what either of these were, I stalled at step one.

Further googling turned up this useful blog post written for complete beginners, which informed me that nodes are individual members of a network: in my case, countries. My list of nodes includes both the country of the archive and the country of the applicant. Edges are the links between nodes, so each application creates a link, or edge, between the two countries, or nodes, involved.

Once I understood the jargon, I was able to use Gephi’s guide to create the network below, which shows all applications between 2004 and 2017, regardless of whether they were successful in acquiring a grant.

Gephi network graph
In this visualisation the size of each country relates to the number of applications it features in, as country of archive, country of applicant, or both. The colours show related groups.

Each line shows the direction and frequency of application. The line always travels in a clockwise direction from country of applicant to country of archive, the thicker the line the more applications. Where the country of applicant and country of archive are the same the line becomes a loop.
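For readers who want to try this themselves, Gephi's edge spreadsheet is just a table of Source, Target and Weight columns. A minimal sketch of how such a directed, weighted edge list could be built from application records; the country pairs below are made up for illustration, not the actual EAP figures:

```python
import csv
import io
from collections import Counter

# Hypothetical application records: (country of applicant, country of archive).
applications = [
    ("Canada", "Ethiopia"), ("Canada", "Ethiopia"),
    ("UK", "Nigeria"), ("Malawi", "Malawi"),
]

# Each distinct (source, target) pair becomes one weighted, directed edge;
# a same-country pair like (Malawi, Malawi) draws as a loop in Gephi.
edges = Counter(applications)
buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(["Source", "Target", "Type", "Weight"])
for (source, target), weight in sorted(edges.items()):
    writer.writerow([source, target, "Directed", weight])
print(buf.getvalue())
```

Saving this output as a .csv file gives a spreadsheet Gephi can import directly as an edge table.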

I love network maps because you can learn so much from them. In this one, for example, you can see (among other things):

  • strong links between the USA and West Africa
  • multiple Canadian applications for Sierra Leonean and Ethiopian archives
  • UK applications to a diverse range of countries
  • links between Egypt and Algeria and between Tunisia and Morocco

The last tool I explored was Google Fusion Tables, which can be used to present information from a spreadsheet on a map. Once you have coordinates for your locations, Fusion Tables are incredibly easy to use (and will fill in coordinates for you in many cases). You upload the spreadsheet, pick the information to include, and it’s done. It is so intuitive that I have yet to do much reading on how it works – hence my lack of a final decision on how best to use it.

There is currently a Fusion Table over on the EAP website with links to every project they have funded. It is possible to include all sorts of information for each archive location, so I plan to create something more in-depth for the African archives that can potentially be used as a tool by researchers.

The next step for my project is to apply these tools to the data in order to create a range of visualisations which will be the stars of my third and final blog at the beginning of September, so watch this space.

06 August 2018

Reminder about the 2018 BL Labs Awards: enter before midnight Thursday 11th October!

With three months to go before the submission deadline, we would like to remind you about the 2018 British Library Labs Awards!

The BL Labs Awards are a way of formally recognising outstanding and innovative work that has been created using the British Library’s digital collections and data.

Have you been working on a project that uses digitised material from the British Library's collections? If so, we'd like to encourage you to enter that project for an award in one of our categories.

This year, BL Labs will be giving awards for work in four key areas:

  • Research - A project or activity which shows the development of new knowledge, research methods, or tools.
  • Commercial - An activity that delivers or develops commercial value in the context of new products, tools, or services that build on, incorporate, or enhance the Library's digital content.
  • Artistic - An artistic or creative endeavour which inspires, stimulates, amazes and provokes.
  • Teaching / Learning - Quality learning experiences created for learners of any age and ability that use the Library's digital content.

BL Labs Awards 2017 winners: top left, Research Award winner, 'A large-scale comparison of world music corpora with computational tools'; top right, Commercial Award winner, 'Movable Type: The Card Game'; bottom left, Artistic Award winner, 'Imaginary Cities'; bottom right, Teaching / Learning Award winner, 'Vittoria's World of Stories'.

There is also a Staff Award which recognises a project completed by a staff member or team, with the winner and runner up being announced at the Symposium along with the other award winners.

The closing date for entering your work for the 2018 round of BL Labs Awards is midnight BST on Thursday 11th October (2018). Please submit your entry and/or help us spread the word to all interested and relevant parties over the next few months. This will ensure we have another year of fantastic digital-based projects highlighted by the Awards!

Read more about the Awards (FAQs, Terms & Conditions etc), practice your application with this text version, and then submit your entry online!

The entries will be shortlisted after the submission deadline (11/10/2018) has passed, and selected shortlisted entrants will be notified via email by midnight BST on Friday 26th October 2018. 

A prize of £500 will be awarded to the winner and £100 to the runner up in each of the Awards categories at the BL Labs Symposium on 12th November 2018 at the British Library, St Pancras, London.

The talent of the BL Labs Awards winners and runners-up from the last three years has resulted in a remarkable and varied collection of innovative projects. You can read about some of last year's winners and runners-up in our other blog posts.

British Library Labs Staff Award Winner – Two Centuries of Indian Print

To act as a source of inspiration for future awards entrants, all entries submitted for awards in previous years can be browsed in our online Awards archive.

For any further information about BL Labs or our Awards, please contact us at labs@bl.uk.

01 August 2018

Visualising the Endangered Archives Programme project data on Africa, Part 1. The project

Sarah FitzGerald is a linguistics PhD researcher at the University of Sussex investigating the origins and development of Cameroon Pidgin English. She is currently a research placement student in the British Library’s Digital Scholarship Team, using data from the Endangered Archives Programme to create data visualisations.

This month I have learned:

  • that people in Canada are most likely to apply for grants to preserve archives in Ethiopia and Sierra Leone, whereas those in the USA are more interested in endangered archives in Nigeria and Ghana
  • that people in Africa who want to preserve an archive are more likely to run a pilot project before applying for a big grant whereas people from Europe and North America go big or go home (so to speak)
  • that the African countries in which endangered archives are most often identified are Nigeria, Ghana and Malawi
  • and that Eastern and Western African countries are more likely to be studied by academics in Europe and North America than those of Northern, Central or Southern Africa
Idrissou Njoya and Nji Mapon examine Mapon's endangered manuscript collection in Cameroon (EAP051)

I have learned all of this, and more, from sifting through 14 years of the Endangered Archive Programme’s grant application data for Africa.

Why am I sifting through this data?

Well, I am currently half way through a three-month placement at the British Library working with the Digital Scholarship team on data from the Endangered Archives Programme (EAP). This is a programme which gives grants to people who want to preserve and digitise pre-modern archives under threat anywhere in the world.

Manuscript of the Riyadh Mosque of Lamu, Kenya (EAP466)

The focus of my placement is to look at how the project has worked in the specific case of Africa over the 14 years the programme has been running. I’ll be using this data to create visualisations that will help provide information for anyone interested in the archives, and for the EAP team.

Over the next weeks I will be writing a series of blog posts detailing my work. This first post gives an overview of the project and its initial stages. My second post will discuss the types of data visualisation software I have been learning to use. Then, at the end of my project, I will be writing a post about my findings, using the visualisations.

The EAP has funded the preservation of a range of important archives in Africa over the last decade and a half. Some interesting examples include a project to preserve botanical collections in Kenya, and one which created a digital record of endangered rock inscriptions in Libya. However, my project is more concerned with the metadata surrounding these projects – who is applying, from where, and for what type of archive etc.

Tifinagh rock inscriptions in the Tadrart Acacus mountains, Libya (EAP265)

I’m also concerned with finding the most useful ways to visualise this information.

For 14 years the details of each application have been recorded in MS Excel spreadsheets. Over time this system has evolved, so my first step was to fill in information gaps in the spreadsheets. This was a time-consuming task as gap filling had to be done manually by combing through individual application forms looking for the missing information.

Once I had a complete data set, I was able to use a free, open-source tool called OpenRefine to clean up the spreadsheet. OpenRefine can be used to edit and regularise spreadsheet data, such as spelling or formatting inconsistencies, quickly and thoroughly. There is an excellent article available here if you are interested in learning more about how to use OpenRefine and what you can do with it.
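As a rough illustration of what OpenRefine's clustering does under the hood, its default 'fingerprint' key-collision method lowercases each value, strips punctuation and sorts the unique tokens, so variant spellings of the same entry collide on one key and can be merged. A minimal sketch with made-up column values:

```python
import re
from collections import defaultdict

def fingerprint(value: str) -> str:
    """OpenRefine-style fingerprint key: lowercase, strip punctuation,
    then sort the unique tokens, so variant spellings collide."""
    tokens = re.sub(r"[^\w\s]", "", value.lower()).split()
    return " ".join(sorted(set(tokens)))

# Hypothetical messy column values from a spreadsheet.
values = ["United Kingdom", "united kingdom", "Kingdom, United", "Nigeria"]
clusters = defaultdict(list)
for v in values:
    clusters[fingerprint(v)].append(v)

# Clusters with more than one member are candidates to merge into one spelling.
print({key: vs for key, vs in clusters.items() if len(vs) > 1})
```

OpenRefine presents these clusters in its interface and lets you pick which spelling to keep for each one.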

With a clean, complete, spreadsheet I could start looking at what the data could tell me about the EAP projects in Africa.

I used Excel visualisation tools to give me an overview of the information in the spreadsheet. I am very familiar with Excel, so this allowed me to explore lots of questions relatively quickly.

Major vs Pilot Chart

For example, there are two types of project that the EAP funds: small-scale, exploratory pilot studies and larger-scale main projects. I wondered which type of application was more likely to be awarded a grant. Using Excel it was easy to create the charts above, which show that major projects are actually more likely to be funded than pilots.
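The calculation behind such a chart is just a funded-per-submitted ratio for each project type. A minimal sketch with made-up records (the real figures live in the EAP spreadsheets):

```python
from collections import Counter

# Hypothetical records of (project type, whether a grant was awarded).
apps = [("major", True), ("major", True), ("major", False),
        ("pilot", True), ("pilot", False), ("pilot", False)]

totals, funded = Counter(), Counter()
for ptype, awarded in apps:
    totals[ptype] += 1
    funded[ptype] += awarded  # True counts as 1

success_rates = {t: funded[t] / totals[t] for t in totals}
print(success_rates)
```

Plotting these per-type rates side by side gives the same comparison as the Excel charts above.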

Of course, the question of why this might be still remains, but knowing this is the pattern is a useful first step for investigation.

Another chart that was quick to make shows the number of applicants from each continent by year.

Continent of Applicant Chart

This chart reveals that, with the exception of the first three years of the programme, most applications to preserve African archives have come from people living in Africa. Applications from North America and Europe seem on average to be pretty equal. Applications from elsewhere are almost non-existent: there have been three applications from Oceania, and one from Asia, over the 14 years the EAP has been running.

This type of visualisation gives an overview at a glance in a way that a table cannot. But there are some things Excel tools can’t do.

I want to see if there are links between applicants from specific North American or European countries and archives in particular African countries, but Excel tools are not designed to map networks. Nor can Excel be used to present data on a map, which is something that the EAP team is particularly keen to see, so my next step is to explore the free software available which can do this.

This next stage of my project, in which I explore a range of data visualisation tools, will be detailed in a second blog post coming soon.