THE BRITISH LIBRARY

Digital scholarship blog

42 posts categorized "Research collaboration"

13 August 2018

The Parts of a Playbill


Beatrice Ashton-Lelliott is a PhD researcher at the University of Portsmouth studying the presentation of nineteenth-century magicians in biographies, literature, and the popular press. She is currently a research placement student on the British Library’s In the Spotlight project, cleaning and contextualising the crowdsourced playbills data. She can be found on Twitter at @beeashlell and you can help out with In the Spotlight at playbills.libcrowds.com.

In the Spotlight is a brilliant tool for spotting variations between playbills across the eighteenth and nineteenth centuries. The site provides participants with access to thousands of digitised playbills, and the sheets in the site’s collections often list the cast, scenes, and any innovative ‘machinery’ involved in the production. Whilst the most famous actors obviously needed to be emphasised because they drew more crowds (e.g., any playbills featuring Mr Kean tend to have his name in huge letters), the playbills in In the Spotlight’s volumes suggest this wasn’t always the case for playwrights. Sometimes they’re mentioned by name, but in many cases famous playwrights aren't named on the playbill at all. I’ve speculated previously that this is because these playwrights were so famous that audiences would already have heard by word of mouth or through the press that a new play of theirs was out, so there seemed little point in adding the name to the bill.

What can you expect to see on a playbill?

The basics of a playbill are: the main title of the performance, a subtitle, often the current date, future or past dates of performances, the cast and characters, scenery, short or long summaries of the scenes to be acted, whether the performance is a benefit for anyone, and where tickets can be bought. There are definitely surprises though: the In the Spotlight team have also come across apologies from theatre managers for scheduled actors who failed to turn up, or who performed drunk! The project forum has a thread for interesting things 'spotted on In the Spotlight', and we always welcome posts from others.

Crowds would often react negatively if the scheduled performers weren’t on stage. Gilli Bush-Bailey also notes in The Performing Century (2007) that crowds would be used to seeing the same minor actors reappear across several parts of the performance and playbills, stating that ‘playbills show that only the lesser actors and actresses in the company appear in both the main piece and the following farce or afterpiece’ (p. 185), with bigger names at theatres royal committing only to either a tragic or comic performance.

Looking at the late 18th-century playbills on the site, users can see quite a standard format in structure and font.

In this 1797 playbill from the Margate volume, the font is uniform, with variations in size to emphasise names and performance titles.

How did playbills change over time?

In the 19th century, all kinds of new and exciting fonts are introduced, as well as more experimentation in the structuring of playbills. The type of performance also influences the layout of the playbill: for instance, a circus playbill is often divided into a grid-like structure to describe each act and may feature illustrations, and early magician playbills often change orientation half-way down the sheet to give more space to describe the tricks and stage.

1834 Birmingham playbill

This 1834 Birmingham playbill is much lengthier than the previous example, showing a variety of fonts and featuring more densely packed text. Although this may look like information overload, the mix of fonts and variations in size still makes the main points of the playbill eye-catching to passers-by.

James Gregory’s ‘Parody Playbills’ article, stimulated by the In the Spotlight project, contains a lot of great examples and further insights into the deeper meaning of playbills and their structure.

Works Cited

Davis, T. C. and Holland, P. (eds) (2007). The Performing Century: Nineteenth-Century Theatre History. Basingstoke: Palgrave Macmillan.

Gregory, J. (2018). ‘Parody Playbills: The Politics of the Playbill in Britain in the Eighteenth and Nineteenth Centuries’, eBLJ.

08 August 2018

Visualising the Endangered Archives Programme project data on Africa, Part 2. Data visualisation tools


Sarah FitzGerald is a linguistics PhD researcher at the University of Sussex investigating the origins and development of Cameroon Pidgin English. She is currently a research placement student in the British Library’s Digital Scholarship Team, using data from the Endangered Archives Programme to create data visualisations.

When I wrote last week that the Endangered Archives Programme (EAP) receives the most applications for archives in Nigeria, Ghana and Malawi, I am reasonably sure you were able to digest that news without difficulty.

Is that still the case if I add that Ethiopia, South Africa and Mali come in fourth, fifth and sixth place; and that the countries for which only a single application has been received include Morocco, Libya, Mauritania, Chad, Eritrea, and Egypt?

What if I give you the same information via a handy interactive map?

This map, designed using Tableau Public, shows the location of every archive for which the EAP received an application between 2004 and 2017. Once you know that the darker the colour, the more applications received, you can see at a glance how the applications have been distributed. If you want more information you can hover your cursor over each country to see its name and number of associated applications.

My placement at the British Library centres on using data visualisations such as this to tell the story of the EAP projects in Africa.

Photo from a Cameroonian photographic archive (EAP054)

When not undertaking a placement I am a linguist. This doesn’t require a lot of data visualisation beyond the tools available in Excel. In my previous blog I discussed how useful Excel tools have been for giving me an overview of the EAP data. But there are some visualisations you can’t create in Excel, such as an interactive heat map, so I had to explore what other tools are available.

Inspired by this excellent blog from a previous placement student I started out by investigating Tableau Public primarily to look for ways to represent data using a map.

Tableau Public is freely available online. It is fairly intuitive to use and has a wide range of possible graphs and charts, not just maps. You upload a spreadsheet and it will tell you how to do the rest. There are also many instructional videos online that show you the range of possibilities available.

As well as the heat map above, I also used this tool to examine which countries applications are coming from.

This map shows that the largest number of applications have come from the USA and UK, but people from Canada, South Africa and Malawi have also applied for a lot of grants.

Malawi has a strong showing on both maps. There have been 23 applications to preserve archives in Malawi, and 21 applicants from within Malawi.

Paper from the Malawi news archive (EAP942)

Are these the same applications?

My spreadsheet suggests that they are. I can also see that there seem to be links between certain countries, such as Canada and Ethiopia, but in order to properly understand these connections I need a tool that can represent networks – something Tableau Public cannot do.

After some investigation (read ‘googling’) I was able to find Gephi, a free, open-source software package designed specifically for visualising networks.

Of all the software I have used in this project so far, Gephi is the least intuitive. But it can be used to create informative visualisations, so it is worth the effort to learn. Gephi does provide a step-by-step guide to getting started, but the first step is to upload a spreadsheet detailing your ‘nodes’ and ‘edges’.

Having no idea what either of these were I stalled at step one.  

Further googling turned up this useful blog post written for complete beginners, which informed me that nodes are individual members of a network – in my case, countries. My list of nodes includes both the country of the archive and the country of the applicant. Edges are the links between nodes, so each application creates a link, or edge, between the two countries, or nodes, involved.
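To make this concrete, here is a minimal sketch (not my actual workflow) of how a spreadsheet of applications could be turned into the node and edge tables Gephi expects, using pandas. The file name and column names are hypothetical.

```python
import pandas as pd

apps = pd.read_csv("applications.csv")  # hypothetical file: one row per application

# Nodes: every country that appears either as applicant or as archive location
countries = pd.unique(apps[["applicant_country", "archive_country"]].values.ravel())
nodes = pd.DataFrame({"Id": countries, "Label": countries})

# Edges: one directed edge per applicant-country -> archive-country pair,
# weighted by the number of applications linking them
edges = (apps.groupby(["applicant_country", "archive_country"])
             .size()
             .reset_index(name="Weight")
             .rename(columns={"applicant_country": "Source",
                              "archive_country": "Target"}))

nodes.to_csv("nodes.csv", index=False)
edges.to_csv("edges.csv", index=False)
```

Gephi can then import nodes.csv and edges.csv through its spreadsheet import, treating the Weight column as the strength of each connection.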

Once I understood the jargon, I was able to use Gephi’s guide to create the network below, which shows all applications between 2004 and 2017 regardless of whether they were successful in acquiring a grant.

Gephi Graph

In this visualisation the size of each country relates to the number of applications it features in, as country of archive, country of applicant, or both. The colours show related groups.

Each line shows the direction and frequency of application. The line always travels in a clockwise direction from country of applicant to country of archive; the thicker the line, the more applications. Where the country of applicant and country of archive are the same, the line becomes a loop.

I love network maps because you can learn so much from them. In this one, for example, you can see (among other things):

  • strong links between the USA and West Africa
  • multiple Canadian applications for Sierra Leonean and Ethiopian archives
  • UK applications to a diverse range of countries
  • links between Egypt and Algeria and between Tunisia and Morocco

The last tool I explored was Google Fusion Tables. These can be used to present information from a spreadsheet on a map. Once you have coordinates for your locations, Fusion Tables are incredibly easy to use (and will fill in coordinates for you in many cases). You upload the spreadsheet, pick the information to include and it’s done. It is so intuitive that I have yet to do much reading on how it works – hence I haven’t yet decided exactly how I will use it.
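For anyone who prefers to script this step, here is a rough sketch of one alternative way to put the same spreadsheet data on a map, using the folium Python library rather than Fusion Tables. The file name and column names below are hypothetical.

```python
import pandas as pd
import folium

# Hypothetical spreadsheet: one row per archive, with name, latitude, longitude
archives = pd.read_csv("archive_locations.csv")

m = folium.Map(location=[0, 20], zoom_start=3)  # roughly centred on Africa
for _, row in archives.iterrows():
    folium.Marker([row["latitude"], row["longitude"]],
                  popup=row["name"]).add_to(m)

m.save("eap_archives_map.html")  # an interactive HTML map you can open in a browser
```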

There is currently a Fusion Table over on the EAP website with links to every project the programme has funded. It is possible to include all sorts of information for each archive location, so I plan to create something more in-depth for the African archives that can potentially be used as a tool by researchers.

The next step for my project is to apply these tools to the data in order to create a range of visualisations which will be the stars of my third and final blog at the beginning of September, so watch this space.

01 August 2018

Visualising the Endangered Archives Programme project data on Africa, Part 1. The project


Sarah FitzGerald is a linguistics PhD researcher at the University of Sussex investigating the origins and development of Cameroon Pidgin English. She is currently a research placement student in the British Library’s Digital Scholarship Team, using data from the Endangered Archives Programme to create data visualisations.

This month I have learned:

  • that people in Canada are most likely to apply for grants to preserve archives in Ethiopia and Sierra Leone, whereas those in the USA are more interested in endangered archives in Nigeria and Ghana
  • that people in Africa who want to preserve an archive are more likely to run a pilot project before applying for a big grant whereas people from Europe and North America go big or go home (so to speak)
  • that the African countries in which endangered archives are most often identified are Nigeria, Ghana and Malawi
  • and that Eastern and Western African countries are more likely to be studied by academics in Europe and North America than those of Northern, Central or Southern Africa
Idrissou Njoya and Nji Mapon examine Mapon's endangered manuscript collection in Cameroon (EAP051)

I have learned all of this, and more, from sifting through 14 years of the Endangered Archives Programme’s grant application data for Africa.

Why am I sifting through this data?

Well, I am currently half way through a three-month placement at the British Library working with the Digital Scholarship team on data from the Endangered Archives Programme (EAP). This is a programme which gives grants to people who want to preserve and digitise pre-modern archives under threat anywhere in the world.

Manuscript of the Riyadh Mosque of Lamu, Kenya (EAP466)

The focus of my placement is to look at how the project has worked in the specific case of Africa over the 14 years the programme has been running. I’ll be using this data to create visualisations that will help provide information for anyone interested in the archives, and for the EAP team.

Over the next weeks I will be writing a series of blog posts detailing my work. This first post gives an overview of the project and its initial stages. My second post will discuss the types of data visualisation software I have been learning to use. Then, at the end of my project, I will be writing a post about my findings, using the visualisations.

The EAP has funded the preservation of a range of important archives in Africa over the last decade and a half. Some interesting examples include a project to preserve botanical collections in Kenya, and one which created a digital record of endangered rock inscriptions in Libya. However, my project is more concerned with the metadata surrounding these projects – who is applying, from where, and for what type of archive etc.

Tifinagh rock inscriptions in the Tadrart Acacus mountains, Libya (EAP265)

I’m also concerned with finding the most useful ways to visualise this information.

For 14 years the details of each application have been recorded in MS Excel spreadsheets. Over time this system has evolved, so my first step was to fill in information gaps in the spreadsheets. This was a time-consuming task as gap filling had to be done manually by combing through individual application forms looking for the missing information.

Once I had a complete data set, I was able to use a free and open-source tool called OpenRefine to clean up the spreadsheet. OpenRefine can be used to edit and regularise spreadsheet data such as spelling or formatting inconsistencies quickly and thoroughly. There is an excellent article available here if you are interested in learning more about how to use OpenRefine and what you can do with it.
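For readers who prefer working in code, the same kind of regularisation can be sketched in pandas. This is only an illustration of the idea, not my actual cleaning steps, and the file name, column name and corrections are hypothetical.

```python
import pandas as pd

apps = pd.read_csv("eap_applications.csv")  # hypothetical file name

# Trim stray whitespace and regularise a couple of (invented) inconsistent spellings
apps["archive_country"] = (apps["archive_country"]
                           .str.strip()
                           .replace({"Cote d'Ivoire": "Côte d'Ivoire",
                                     "Gambia": "The Gambia"}))

apps.to_csv("eap_applications_clean.csv", index=False)
```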

With a clean, complete, spreadsheet I could start looking at what the data could tell me about the EAP projects in Africa.

I used Excel visualisation tools to give me an overview of the information in the spreadsheet. I am very familiar with Excel, so this allowed me to explore lots of questions relatively quickly.

Major vs Pilot Chart

For example, there are two types of project that the EAP funds: small-scale, exploratory pilot studies and larger-scale major projects. I wondered which type of application was more likely to be awarded a grant. Using Excel it was easy to create the charts above, which show that major projects are actually more likely to be funded than pilots are.
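As a rough illustration of the same comparison outside Excel, the sketch below computes and plots the funding rate by project type with pandas and matplotlib. The 'project_type' and 'funded' columns are hypothetical stand-ins for however the spreadsheet actually records this.

```python
import pandas as pd
import matplotlib.pyplot as plt

apps = pd.read_csv("eap_applications_clean.csv")  # hypothetical file name

# Percentage of applications funded, by project type ('Pilot' or 'Major')
success_rate = apps.groupby("project_type")["funded"].mean() * 100

ax = success_rate.plot(kind="bar", title="Funding success rate by project type")
ax.set_ylabel("Applications funded (%)")
plt.tight_layout()
plt.savefig("major_vs_pilot.png")
```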

Of course, the question of why this might be still remains, but knowing this is the pattern is a useful first step for investigation.

Another chart that was quick to make shows the number of applicants from each continent by year.

Continent of Applicant Chart

This chart reveals that, with the exception of the first three years of the programme, most applications to preserve African archives have come from people living in Africa. Applications from North America and Europe on average seem to be pretty equal. Applications from elsewhere are almost non-existent: there have been three applications from Oceania and one from Asia over the 14 years the EAP has been running.
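Again purely as an illustration, a continent-by-year chart of this kind could be sketched with a pandas crosstab; the 'year' and 'applicant_continent' column names are hypothetical.

```python
import pandas as pd
import matplotlib.pyplot as plt

apps = pd.read_csv("eap_applications_clean.csv")  # hypothetical file name

# Count applications per year, split by continent of applicant
by_year = pd.crosstab(apps["year"], apps["applicant_continent"])

ax = by_year.plot(kind="bar", stacked=True, figsize=(10, 5),
                  title="EAP applications by continent of applicant")
ax.set_xlabel("Year")
ax.set_ylabel("Number of applications")
plt.tight_layout()
plt.savefig("continent_of_applicant.png")
```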

This type of visualisation gives an overview at a glance in a way that a table cannot. But there are some things Excel tools can’t do.

I want to see if there are links between applicants from specific North American or European countries and archives in particular African countries, but Excel tools are not designed to map networks. Nor can Excel be used to present data on a map, which is something that the EAP team is particularly keen to see, so my next step is to explore the free software available which can do this.

This next stage of my project, in which I explore a range of data visualisation tools, will be detailed in a second blog post coming soon.

20 July 2018

Conference on Human Information Interaction and Retrieval


This is a guest post by Carol Butler, who is doing a collaborative PhD research project with the Centre for Human Computer Interaction Design at City University of London and the British Library, on the topic of new technologies challenging author and reader roles. There is an introduction to her project in this previous post, and you can follow her on twitter as @fantomascarol.

The cold weather back in March seems a distant memory with all the warm weather we've been having recently. However, earlier in the year I braved the snow with my Doc Martens and woolly hat to visit New Brunswick, NJ, for the annual ACM SIGIR Conference on Human Information Interaction and Retrieval (CHIIR), hosted this year by Rutgers University.

An intimate and welcoming conference with circa 130 attendees, CHIIR (pronounced ‘cheer’) attracts academics and industry professionals from the fields of Library and Information Science and Human Computer Interaction. I went to present early findings from my PhD study at their Doctoral Consortium, from which I have my first ever academic publication and where I gained helpful feedback and insight from a group of senior academics and fellow PhD students. Most notable was the help of Hideo Joho from the University of Tsukuba, who kindly shared his time for a one-to-one mentoring session.

As CHIIR is a single-track conference, I was able to attend the majority of talks. In the main, presentations looked at better understanding how people search for information using digital tools and catalogues, and how we can improve their experience beyond offering the standard ’10 blue links’ we have come to expect from search results. Examples included early-stage research into search interfaces that understand vocal, ‘conversational’ enquiries and the new challenges this may present, and concepts such as VR goggles that can show us useful information about the world around us.

As well as seeking to better connect people with useful new information, talks also detailed how people re-find information they’ve already seen, for example in emails; on Twitter; or from their entire browsing history and file directory. There were also fascinating discussions about how people search in different languages, whether switching between languages fluidly if bilingual, or through viewing search results in different languages, side by side.

How people learn from the information they find was an important and tricky issue raised by a number of speakers, reminding us that finding information is not, in fact, the end goal. All in all, it was a very thought-provoking and enjoyable conference, and I am thankful to the organisers (ACM SIGIR), who awarded me a student travel grant to enable me to attend. Next year’s conference is considerably closer to home, in Glasgow, for more details check out http://sigir.org/chiir2019 – I very much hope to go again, and maybe I'll see some of you there!

Not all UX problems are digital: I concluded my journey with a fascinating chat with the conductor on the train home, who shared top-secret info about how he and his colleagues work around usability problems with their paper ticketing process.

16 July 2018

Crowdsourcing comedy: date and genre results from In the Spotlight


Beatrice Ashton-Lelliott is a PhD researcher at the University of Portsmouth studying the presentation of nineteenth-century magicians in biographies, literature, and the popular press. She is currently a research placement student on the British Library’s In the Spotlight project, cleaning and contextualising the crowdsourced playbills data. She can be found on Twitter at @beeashlell and you can join the In the Spotlight project at playbills.libcrowds.com.

In this blog post I discuss the data created so far by In the Spotlight volunteers via crowdsourcing – which has already thrown out quite a few surprises along the way! All of the data which I discuss was cleaned using OpenRefine, with some manual intervention by me to group categories such as genre. My first post below highlights the most notable results to come out of the date and genre tasks so far, and a second post will present similar findings for play titles and playwrights.

Dates

I started off by analysing the dates generated by the projects as, to be honest, it seemed easiest! One of the problems we’ve encountered with the date tasks, however, is that a number of the playbills do not show a full date.  This is notable in itself but unsurprising – why would a playbill in the eighteenth or nineteenth century need a full date when they weren’t expected to last two hundred years into the future? With that in mind, this is by no means an exhaustive data set.

After creating a simple graph of the most popular dates, it became clear that we had a huge spike in the number of performances in 1825. Was something relevant to theatre history happening during this year, or were the original collectors of these playbills just unusually pro-active in 1825 after taking some time off? Was the paper stock quality better, so more playbills have lasted? The outside influence of the original collector or owner of these playbills is also something to consider: for instance, maybe he was more interested in one type of performance than others, had more time to collect playbills in certain years or in certain places, and so on. A final potential factor is that this data only comes from the volumes added to the site projects so far, and so isn’t indicative of the Library’s playbills as a whole.

Aside from source or collector influence, some other possible explanations do present themselves. Britain in general was growing exponentially, with London in particular becoming one of the biggest cities in the world, and this era also saw the birth of railways and the extravagant influence of figures such as George IV. As this is coming off the back of what seems to be a very slow year in 1824, however, perhaps it is best just to chalk this up to the activity of the collectors. We also have another noticeable spike in 1829, but by no means as dramatic as that of 1825. I’ve spent a bit of time comparing the number of performances seen in the volumes with other online performance date tools, such as UMass's Adelphi calendar and Godwin’s Diary to compare numbers, but would love to hear any further insights into this!

alt="Graph of most popular dates"
A graph showing the most popular performance dates
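For anyone curious how such a graph can be produced, here is a minimal sketch (not the actual analysis) that counts transcribed performances per year with pandas; the file and column names are hypothetical.

```python
import pandas as pd
import matplotlib.pyplot as plt

dates = pd.read_csv("playbill_dates.csv")  # hypothetical: one row per transcribed playbill

# Parse whatever full dates we have and count performances per year
dates["year"] = pd.to_datetime(dates["date"], errors="coerce").dt.year
per_year = dates["year"].dropna().astype(int).value_counts().sort_index()

ax = per_year.plot(kind="bar", figsize=(12, 4),
                   title="Transcribed performances per year")
ax.set_ylabel("Number of playbills")
plt.tight_layout()
plt.savefig("performances_per_year.png")
```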

Genre

The main issue I faced in working with the genre data was the wide variety of descriptors used on the playbills themselves. For instance, I encountered burlesque, burletta and burlesque burletta – which of the first two categories would the last one go under? When I went back to the playbills themselves, it was also clear that many of the ‘genres’ generated were more like comments from theatre managers or just descriptions e.g. ‘an amusing sketch’. With this in mind, genre was the dataset which I ‘interfered’ with the most from a cleaning point of view.

Some of the calls I made were to group anything cited as ‘dramatic ___’ with drama more widely, unless it had a notable second qualifier, such as pantomime, Romance or sketch. I also grouped anything mentioning ‘historical’ together, as from a research point of view this is probably the most prominent aspect, grouped harlequinades with pantomimes (although I know this might be controversial!) and grouped anything which involved a large organisation, such as military, Masonic or national performances, under ‘organisational’. Some were difficult to separate – I did wonder about grouping variety and vaudeville together, but as there were so few of each it seemed better to leave them be.
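To give a flavour of how this grouping can be applied consistently, here is a small sketch expressing some of the calls above as a Python function. It is illustrative only, and the file and column names are hypothetical.

```python
import pandas as pd

genres = pd.read_csv("playbill_genres.csv")  # hypothetical: 'genre' as transcribed

def group_genre(raw):
    """Collapse transcribed genre descriptors into broader groups."""
    g = str(raw).strip().lower()
    if g.startswith("dramatic") and not any(q in g for q in ("pantomime", "romance", "sketch")):
        return "drama"
    if "historical" in g:
        return "historical"
    if "harlequinade" in g or "pantomime" in g:
        return "pantomime"
    if any(o in g for o in ("military", "masonic", "national")):
        return "organisational"
    return g

genres["genre_grouped"] = genres["genre"].apply(group_genre)
print(genres["genre_grouped"].value_counts().head(10))
```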

With these qualifications in mind, by far the most popular genre in the collections was farce, which I kept distinct from comedy, clocking up 537 performances from the projects. This was closely followed by comedy more generally with 527 performances, with drama (197), melodrama (150) and tragedy (135) trailing afterwards. Once again, it could purely be that the original collectors of these volumes had more of a taste for comedy than drama, but there is such a wide gap in popularity in the volumes so far that it seems fair to conclude that the regional theatre-going public of the late eighteenth and early nineteenth centuries preferred to be cheered rather than saddened by their entertainment.

alt="Graph of the most popular genres"
A graph showing the most popular genres in records transcribed to date

You can contribute to this research

The more contributions we receive, the more accurate the titles, genre and dates results will be, so whether you’re looking out for your local theatre or interested in the more unusual performances which crop up, get involved with the project today at playbills.libcrowds.com. In the Spotlight is well on the way to hitting 100,000 contributions – make sure that you’re one of them!

14 May 2018

Seeing British Library collections through a digital lens


Digital Curator Mia Ridge writes: in this guest post, Dr Giles Bergel describes some experiments with the Library's digitised images...

The University of Oxford’s Visual Geometry Group has been working with a number of British Library curators to apply computer vision technology to their collections. On April 5 of this year I was invited by BL Digital Curator Dr. Mia Ridge to St. Pancras to showcase some of this work and to give curators the opportunity to try the tools out for themselves.  

Visual Geometry’s VISE tool matching two identical images from separate books digitised for the British Library’s Two Centuries of Indian Print project.

Computer vision - the extraction of meaning from images - has made considerable strides in recent years, particularly through the application of so-called ‘deep learning’ to large datasets. Cultural collections provide some of the most interesting test-cases for computer vision researchers, due to their complexity; the intensity of interest that researchers bring to them; and to their importance for human well-being. Can computers see collections as humans do? Computer vision is perhaps better regarded as a powerful lens rather than as a substitute for human curation. A computer can search a large collection of images far more quickly than can a single picture researcher: while it will not bring the same contextual understanding to bear on an image, it has the advantage of speed and comprehensiveness. Sometimes, a computer vision system can surprise the researcher by suggesting similarities that weren’t readily apparent.
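To give a very rough sense of how a machine can match pictures at all (this is emphatically not how VISE works, which relies on far more sophisticated geometric matching), here is a toy sketch of near-duplicate detection using a simple 8x8 ‘average hash’. The image file names are hypothetical.

```python
from PIL import Image

def average_hash(path, size=8):
    """Reduce an image to a 64-bit fingerprint: greyscale, shrink, threshold by the mean."""
    img = Image.open(path).convert("L").resize((size, size))
    pixels = list(img.getdata())
    mean = sum(pixels) / len(pixels)
    return sum(1 << i for i, p in enumerate(pixels) if p > mean)

def hamming(a, b):
    """Count differing bits between two fingerprints; a small count means visually similar."""
    return bin(a ^ b).count("1")

# Hypothetical file names: two scans of the same printed illustration
h1 = average_hash("book_a_plate_3.jpg")
h2 = average_hash("book_b_plate_7.jpg")
print("likely a match" if hamming(h1, h2) <= 10 else "probably different")
```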

As a relatively new technology, computer vision attracts legitimate concerns about privacy, ethics and fairness. By making its state-of-the-art tools freely available, Visual Geometry hope to encourage experimentation and responsible use, and to enlist users to help determine what they can and cannot do. Cultural collections provide a searching test-case for the state of the art, due to their diversity as media (prints, paintings, stamped images, photographs, film and more), each of which invites different responses. One BL curator made a telling point by searching the BBC News collection with the term 'football': the system was presented with images previously tagged with that word that related to American, Gaelic, Rugby and Association football. Although inconclusive due to lack of sufficiently specific training data, the test asked whether a computer could (or should) pick the most popular instances; attempt to generalise across multiple meanings; or discern separate usages. Despite increases in processing power and in software methods, computers' ability to generalise; to extract semantic meaning from images or texts; and to cope with overlapping or ambiguous concepts remains very basic.

Other tests with BL images have been more immediately successful. Visual Geometry's Traherne tool, developed originally to detect differences in typesetting in early printed books, worked well with many materials that exhibit small differences, such as postage stamps or doctored photographs. Visual Geometry's Image Search Engine (VISE) has shown itself capable of retrieving matching illustrations in books digitised for the Library's Indian Print project, as well as certain bookbinding features, or popular printed ballads. Some years ago Visual Geometry produced a search interface for the Library's 1 Million Images release. A collaboration between the Library's Endangered Archives programme and Oxford researcher David Zeitlyn on the archive of Cameroonian studio photographer Jacques Toussele employed facial recognition as well as pattern detection. VGG's facial recognition software works on video (BBC News, for example) as well as still photographs and art, and is soon to be freely released to join other tools under the banner of the Seebibyte Project.    

I'll be returning to the Library in June to help curators explore using the tools with their own images. For more information on the work of Visual Geometry on cultural collections, subscribe to the project's Google Group or contact Giles Bergel.      

Dr. Giles Bergel is a digital humanist based in the Visual Geometry Group in the Department of Engineering Science at the University of Oxford.  

The event was supported by the Seebibyte project under an EPSRC Programme Grant EP/M013774/1

 

08 May 2018

The Italian Academies database – now available in XML


Dr Mia Ridge writes: in 2017, we made XML and image files from The Italian Academies 1525-1700, a four-year, AHRC-funded project, available through the Library's open data portal. The original data structure was quite complex, so we would be curious to hear feedback from anyone reusing the converted form for research or visualisations.

In this post, Dr Lisa Sampson, Reader in Early Modern Italian Studies at UCL, and Dr Jane Everson, Emeritus Professor of Italian literature, RHUL, provide further information about the project...

New research opportunities for students of Renaissance and Baroque culture! The Italian Academies database is now available for download. It's in a format called XML which represents the original structure of the database.
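As a pointer for anyone downloading the files, here is a minimal sketch of reading such XML with Python's standard library. The element names below are hypothetical placeholders, so check the actual files on data.bl.uk for the real structure.

```python
import xml.etree.ElementTree as ET

tree = ET.parse("italian_academies.xml")  # hypothetical file name
root = tree.getroot()

# List every academy and its city, assuming hypothetical <academy>, <name>, <city> elements
for academy in root.iter("academy"):
    name = academy.findtext("name", default="?")
    city = academy.findtext("city", default="?")
    print(f"{name} ({city})")
```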

This dedicated database results from an eight-year project, funded by the Arts and Humanities Research Council UK, and provides a wealth of information on the Italian learned academies. Around 800 such institutions flourished across the peninsula over the sixteenth and seventeenth centuries, making major contributions to the cultural and scientific debates and innovations of the period, as well as forming intellectual networks across Europe. This database lists a total of 587 Academies from Venice, Padua, Ferrara, Bologna, Siena, Rome, Naples, and towns and cities in southern Italy and Sicily active in the period 1525-1700. Also listed are more than 7,000 members of one or more academies (including major figures like Galileo, as well as women and artists), and almost 1,000 printed works connected with academies held in the British Library. The database therefore provides an essential starting point for research into early modern culture in Italy and beyond. It is also an invitation to further scholarship and data collection, as these totals constitute only a fraction of the data relating to the Academies.

Laura Terracina, nicknamed Febea, of the Accademia degli Incogniti, Naples

The database is designed to permit searches from many different perspectives and to allow easy searching across categories. In addition to the three principal fields – Academies, People, Books – searches can be conducted by title keyword, printer, illustrator, dedicatee, censor, language, gender, nationality among others. The database also lists and illustrates the mottoes and emblems of the Academies (where known) and similarly of individual academy members. Illustrations from the books entered in the database include frontispieces, colophons, and images from within texts.

Emblem of the Accademia degli Intronati, Siena


The database thus aims to promote research on the Italian Academies in disciplines ranging from literature and history, through art, science, astronomy, mathematics, printing and publishing, censorship, politics, religion and philosophy.

The Italian Academies project which created this database began in 2006 as a collaboration between the British Library and Royal Holloway, University of London, funded by the Arts and Humanities Research Council and led by Jane Everson. The objective was the creation of a dedicated resource on the publications and membership of the Italian learned academies active in the period between 1525 and 1700. The software for the database was designed in-house by the British Library and the first tranche of data was completed in 2009, listing information for academies in four cities (Naples, Siena, Bologna and Padua). A second phase, listing information for many more cities, including in southern Italy and Sicily, developed the database further, between 2010 and 2014, with a major research grant from the AHRC and collaboration with the University of Reading.

The exciting possibilities now opened up by the British Library’s digital data strategy look set to stimulate new research and collaborations by making the records even more widely available, and easily downloadable, in line with Open Access goals. The Italian Academies team is now working to develop the project further with the addition of new data, and the incorporation into a hub of similar resources.

The Italian Academies project team members welcome feedback on the records and on the adoption of the database for new research (contact: www.italianacademies.org).

The original database remains accessible at http://www.bl.uk/catalogues/ItalianAcademies/Default.aspx 

An Introduction to the database, its aims, contents and objectives is available both at this site and at the new digital data site: https://data.bl.uk/iad/

Jane E. Everson, Royal Holloway University of London

Lisa Sampson, University College London

25 April 2018

Some challenges and opportunities for digital scholarship in 2018


In this post, Digital Curator Dr Mia Ridge shares her presentation notes for a talk on 'challenges and opportunities for digital scholarship' at the British Library's first Research Collaboration 'Open House'.

I'm part of a team that supports the creation and innovative use of the British Library's digital collections. Our working definition of digital scholarship is 'using computational methods to answer existing research questions or challenge existing theoretical paradigms'. In this post/talk, my perspective is informed by my knowledge of the internal processes necessary to support digital scholarship and of the issues that some scholars face when using digital/digitised collections, so I'm not by any means claiming this is a complete list.

Opportunities in digital scholarship

  • Scale: you can explore a bigger body of material computationally - 'reading' thousands, or hundreds of thousands, of volumes of text, images or media files - while retaining the ability to examine individual items as research questions arise from that distant reading
  • Perspective: you can see trends, patterns and relationships not apparent from close reading individual items, or gain a broad overview of a topic
  • Speed: you can test an idea or hypothesis on a large dataset; prototype new interfaces; generate classification data about people, places, concepts; transcribe content

Together, these opportunities enable new research questions.

Sample digital scholarship tools and methods

Some of these processes help get data ready for analysis (e.g. turning images of items into transcribed and annotated texts), while others support the analysis of large collections at scale, improve discoverability or enable public engagement.

  • OCR, HTR - optical character recognition, handwritten text recognition
  • Data visualisation for analysis or publication
  • Text and data mining - applying classifications to or analysing texts, images or media. Key terms include natural language processing, corpus linguistics, sentiment analysis, applied machine learning. Examples include: Voyant tools, Clarifai image classification (a toy sketch follows below).
  • Mapping and GIS - assigning coordinates to quantitative or qualitative data
  • Public participation and learning including crowdsourcing, citizen science/history. Examples include In the Spotlight, transcribing information from historical playbills.
  • Creative and emerging formats including games
An experiment with image classification with Clarifai
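As a toy illustration of the very simplest kind of text and data mining listed above, the sketch below counts word frequencies in a single OCR'd text file. Real analyses would use tools such as Voyant or dedicated NLP libraries, and the file name here is hypothetical.

```python
import re
from collections import Counter

# Hypothetical file name for a single OCR'd volume
with open("digitised_volume.txt", encoding="utf-8") as f:
    words = re.findall(r"[a-z]+", f.read().lower())

for word, count in Counter(words).most_common(20):
    print(f"{word}: {count}")
```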

Putting it all together, we have case studies like the Political Meetings Mapper by Dr Katrina Navickas, BL Labs Winner 2015. This project, based on digitised 19th century newspapers, used Python scripts to calculate meeting dates, and to extract and geocode meeting locations, to create a map of Chartist meetings.
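The project's own scripts aren't reproduced here, but the general approach can be sketched as: find a date in the notice text with a regular expression, then attach coordinates to any recognised place names. Everything below (the notice text, the tiny gazetteer and its approximate coordinates) is illustrative only.

```python
import re
from datetime import datetime

# Tiny illustrative gazetteer; coordinates are approximate
GAZETTEER = {"Manchester": (53.48, -2.24),
             "Leeds": (53.80, -1.55)}

notice = ("A public meeting of the Chartists will be held at Carpenters' Hall, "
          "Manchester, on Monday, 15 July 1839, at eight o'clock.")

# Pull out the first thing that looks like a full date, e.g. '15 July 1839'
match = re.search(r"\b(\d{1,2} [A-Z][a-z]+ \d{4})\b", notice)
meeting_date = datetime.strptime(match.group(1), "%d %B %Y") if match else None

# Attach coordinates to any place names we recognise
places = [(name, coords) for name, coords in GAZETTEER.items() if name in notice]

print(meeting_date, places)
```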

The Library has created a data portal, data.bl.uk, containing openly licensed datasets. We aim to describe collections in terms of their data format (images, full text, metadata, etc.), licences, temporal and geographic scope, originating purpose (e.g. specific digitisation projects or exhibitions) and collection, and related subjects or themes. Other datasets may be available by request, or digitised via funded partnerships.

We're aware that, currently, it can be hard to use the datasets from data.bl.uk as they can be too large to easily download, store and manipulate. This leads me neatly onto...

Challenges in digital scholarship

  • Digitisation and cataloguing backlog - the material you want mightn't be available without a special digitisation project
  • Providing access to assets for individual items - between copyright and technology, scholars don't always have the ability to download OCR/HTR text, or download all digitised media about an item
  • Providing access to collections as datasets - moving more material into the 'sweet spot' of material that's nicely digitised in suitable formats, usable sizes, with open licences allowing for re-use is an on-going (and expensive, time-consuming) process
  • 'Cleaning' historical data and dealing with gaps in both tools provision and source collections - none of these processes are straightforward
  • Providing access to platforms or suites of tools - how much should the Library take on for researchers, and how much should other institutions or individuals provide?
  • Skills - where will researchers learn digital scholarship methods?
  • Peer review - what if your discipline lacks DS-skilled peers? How can peers judge a website or database if they've only had experience with monographs or articles? How can scholars overcome prejudice about the 'digital'?
  • Versioning datasets as annotations or classifications change, software tools improve over time, transcriptions are corrected, etc - some of these changes may affect the argument you're making

Overall, I hope the opportunities outweigh the challenges, and it's certainly possible to start with small projects with existing tools and digital sources to explore the potential of a larger project.

If you've used BL data, you can enter the BL Labs awards - they don't close until October so you have time to start an experimental project now! You can also ask the Labs team to reality check your digital scholarship idea based on Library collections and data.

Digital scholarship is constantly shifting so on another date I might have come up with different opportunities and challenges. Let me know if you have challenges or opportunities that you think could be included in this very brief overview!