THE BRITISH LIBRARY

Digital scholarship blog


07 December 2018

Introducing an experimental format for learning about content mining for digital scholarship


This post by the British Library’s Digital Curator for Western Heritage Collections, Dr Mia Ridge, reports on an experimental format designed to provide more flexible and timely training on fast-moving topics like text and data mining.

This post covers two topics – firstly, an update to the established format of sessions on our Digital Scholarship Training Programme (DSTP) to introduce ‘strands’ of related modules that cumulatively make up a ‘course’, and secondly, an overview of subjects we’ve covered related to content mining for digital scholarship with cultural heritage collections.

Introducing ‘strands’

The Digital Research team have been running the DSTP for some years now. It's been very successful, but we know that it's hard for people to get away for a whole day, so we wanted to break courses that might previously have taken 5 or 6 hours into smaller modules. Shorter sessions (talks or hands-on workshops) of an hour or at most two seemed to fit more flexibly into busy diaries. We can also reach more people with talks than with hands-on workshops, which are limited by the number of training laptops and the need to offer more individual support.

A 'strand' is a new, flexible format for learning and maintaining skills, with training delivered through shorter modules that combine to build attendees' knowledge of a particular topic over time. We can repeat individual modules – for example, a shorter 'Introduction to' session might run more often, while more advanced sessions can target people with some existing knowledge. I haven't formally evaluated it, but I suspect that the ability to pick and choose sessions means that attendees for each module are more engaged, which makes for a better session for everyone. We've seen a lot of uptake – in some cases the 40 or so places available go almost immediately – so offering shorter sessions seems to be working.

Designing courses as individual modules makes it easier to update individual sections as technologies and platforms change. This format has several other advantages: staff find it easier to attend hour-long modules, and they can try out methods on their own collections between sessions. It takes time for attendees to collect and prepare their own data for processing with digital methods (not to mention preparation time and complexity for the instructor), so we've stayed away from this in traditional workshops.

New topics can be introduced on a 'just in time' basis as new tools and techniques emerge. This seemed to address lots of issues I was having in putting together a new course on content mining. It also makes it easier to tackle a new subject than the established 5-6 hour format would, as I can pilot short sessions and use the lessons learnt in planning the next module.

The modular format also means we can invite international experts and collaborators to give talks on their specialisms with relatively low organisational overhead, as we regularly run '21st Century Curatorship' talks for staff. We can link relevant staff talks, or our monthly 'Hack and Yack' and Digital Scholarship Reading Group sessions, to specific strands.

We originally planned to start each strand with an introductory module outlining key concepts and terms, but in reality we dived straight into the first one, as we already had suitable talks lined up.

Content mining for digital scholarship with cultural heritage collections

Tom and Nora trying out AntConc

From the course blurb: ‘Content mining (sometimes ‘text and data mining’) is a form of computational processing that uses automated analytical techniques to analyse text, images, audio-visual material, metadata and other forms of data for patterns, trends and other useful information. Content mining methods have been applied to digitised and digital historic, cultural and scientific collections to help scholars answer new research questions at scale, analysing hundreds or hundreds of thousands of items. In addition to supporting new forms of digital scholarship that apply content mining methods, methods like Named Entity Recognition or Topic Modelling can make collection items more discoverable. Content mining in cultural heritage draws on data science, 'distant reading' and other techniques to categorise items; identify concepts and entities such as people, places and events; apply sentiment analysis and analyse items at scale.’
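
As a rough illustration of one of the methods mentioned above, the short Python sketch below runs Named Entity Recognition over a snippet of text using the open-source spaCy library. The sample sentence, and the choice of spaCy and its small English model, are my own illustrative assumptions rather than anything used on the training programme itself.

# A minimal sketch of Named Entity Recognition (NER) with spaCy; the sample text is invented.
# Requires: pip install spacy && python -m spacy download en_core_web_sm
import spacy

nlp = spacy.load("en_core_web_sm")

text = ("The playbill advertises a performance at the Theatre Royal, "
        "Margate, attended by George IV in July 1825.")

doc = nlp(text)
for ent in doc.ents:
    # Prints each recognised entity with its type, e.g. 'Margate' GPE, 'George IV' PERSON
    print(ent.text, ent.label_)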

An easily updatable mixture of introductory talks, tutorial sessions, hands-on workshops and case studies from external experts fits perfectly into the modular format, and it's worked out well, with a range of topics and formats offered so far. Sessions have included:

  • An Introduction to Machine Learning
  • Computational models for detecting semantic change in historical texts (Dr Barbara McGillivray, Alan Turing Institute)
  • Computer vision tools with Dr Giles Bergel, from the University of Oxford's Visual Geometry Group
  • Jupyter Notebooks/Python for simple processing and visualisations of data from In the Spotlight
  • Listening to the Crowd: Data Science to Understand the British Museum's Visitors (Taha Yasseri, Turing/OII)
  • Visualising cultural heritage collections (Olivia Fletcher Vane, Royal College of Art)
  • An Introduction to Corpus Linguistics for the Humanities (Ruth Byrne, BL and Lancaster PhD student)
  • Corpus Analysis with AntConc

What’s next?

My colleagues Nora McGregor, Stella Wisdom and Adi Keinan-Schoonbaert have some great ‘strands’ planned for the future, including Stella’s on ‘Emerging Formats’ and Adi’s on ‘Place’, so watch this space for updates!

06 September 2018

Visualising the Endangered Archives Programme project data on Africa, Part 3. Finishing up


Sarah FitzGerald is a linguistics PhD researcher at the University of Sussex investigating the origins and development of Cameroon Pidgin English. She is currently a research placement student in the British Library’s Digital Scholarship Team, using data from the Endangered Archives Programme to create data visualisations.

This summer I have taken a break by working hard; I’ve broadened my academic horizons by ignoring academia completely; and I’ve felt at home while travelling hundreds of miles a week. But above all else, I’ve just had a really nice time.

In my last two blogs I covered the early stages of my placement at the British Library, and discussed the data visualisation tools I’ve been exploring.

In this final blog I am going to outline the later stages of my project. I am also going to talk about my experience of undertaking a British Library placement, what I’ve learned, and whether it was worth it (spoiler alert: it was).

What I’ve been doing

The final stages of my project have mostly consisted of two separate lines of investigation.

Firstly, I have been working on finding out as much as I can about the Endangered Archives Programme (EAP)’s projects in Africa and finding the best ways to visualise that information, in order to create a bank of visualisations that the EAP team can use when talking about the work they do. Visualisations, such as the one below showing the number of applications related to each region of Africa by year, can make tables of data much easier to understand.

Chart: number of applications related to each region of Africa, by year

Secondly, I was curious about why some project applications get funded and some do not. I wanted to know if I could identify any patterns in the reasons why projects get rejected.

This gave me the opportunity to apply my skills as a linguist to the data, albeit on a small scale. I decided to examine the feedback given to unsuccessful applicants by the panel that awards the EAP grants to see if I could identify any patterns. To do this I created a corpus, or electronic database, of texts. This could then be run through corpus analysis software to look for patterns.


This image shows a word list created for my corpus using AntConc software, which is a free and open source corpus analysis tool.
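
For readers curious about what goes on behind a word list like the one above, here is a minimal Python sketch that builds a simple frequency list from a folder of plain-text files. The folder name and the crude tokenisation are assumptions for illustration only; AntConc itself offers far more, including keyword lists, concordances and collocates.

# Minimal sketch of a word-frequency list, similar in spirit to AntConc's word list view.
# 'feedback_corpus' is a hypothetical folder of plain-text files.
import re
from collections import Counter
from pathlib import Path

counts = Counter()
for path in Path("feedback_corpus").glob("*.txt"):
    text = path.read_text(encoding="utf-8").lower()
    counts.update(re.findall(r"[a-z']+", text))  # very simple word tokenisation

for word, freq in counts.most_common(20):
    print(f"{freq:6d}  {word}")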

My analysis allowed me to identify a number of issues common to many unsuccessful applications. In addition to applications that fall outside the scope of the EAP, there are also proposals which would make excellent projects, but whose applications lack the information necessary to award a grant.

Based on my analysis I was able to make a number of recommendations about additional information EAP could provide for applicants which might help to prevent potentially valuable archives being lost due to poor applications.

What I’ve learned

As well as learning about visualisation software I’ve learned a lot this summer about the EAP archives.

I’ve found out where applications are coming from, and which African countries have the most associated applications. I’ve learned that there are many great data visualisation tools available for free online. I’ve learned that there are over 70 different languages represented in the EAP archived projects from Africa.

James Ssali and an unknown woman, from the Ham Musaka archive, Uganda (EAP656)

One of the most interesting things I’ve learned is just how much archival material is available for research – and on an incredibly broad range of topics. The materials digitised and preserved in Africa over the last 13 years include everything from manuscript collections and photographic archives to rock inscriptions.

This wealth of information provides so much opportunity for research and these are just the archives from Africa. The EAP funds projects all over the world.

Shui manuscript from China (EAP143)

In addition to learning about the EAP archives, I’ve learned a lot from working in the British Library more generally. The scale of the work carried out here is immense, and I don’t think I fully appreciated, before working here for three months, just how large the challenges the Library faces are.

In addition to preserving a copy of every book published in the UK, the BL is also working to create large digital archives to support the ways in which modern scholarship has developed. It is digitising books, audio and websites, as well as historical documents such as the records of the East India Company.

View of East India House by Thomas Hosmer Shepherd

Was it worth it?

A PhD is an intense thing to undertake and you have a time limit to complete it. At first glance, taking three months out to work on a placement with little direct relevance to my PhD might seem a bit foolish, particularly when it means a daily commute from Brighton to London.

Far from wasting my time, however, this placement has been an enriching experience. My PhD is on the origins and development of Cameroon Pidgin English. This placement has given me a break from my work while broadening my understanding of African culture and the context in which the language I study is spoken.

I’ve always had an interest in data visualisation and my placement has given me time to play with visualisation tools and gain a real understanding of the resources available. I feel refreshed and ready for the new term despite having worked full time all summer.

The break has also given me thinking space; it has allowed ideas to percolate and given me new skills which I can apply to my work. Taking a break from academia has given me more perspective on my work and more options for how to develop it.

The British Library, St Pancras

Finally, the travel has been demanding, but my supervisors have been very flexible, allowing me to work from home two days a week. The upside of coming to London regularly has been getting to work with interesting people.

Working in a large institution could be an intimidating and isolating experience, but it has been anything but. The Digital Scholarship team have been welcoming and interested; in particular, I have had two very supportive supervisors. The British Library is really keen to support and develop placement students, and there is a lovely community of PhD students at the BL, some on placements, some doing their PhDs here.

I have had a great time at the British Library this summer and can only recommend the scheme to anyone thinking of applying for a placement next year.

23 August 2018

BL Labs Symposium (2018): Book your place for Mon 12-Nov-2018


The BL Labs team are pleased to announce that the sixth annual British Library Labs Symposium will be held on Monday 12 November 2018, from 9:30 - 17:30 in the British Library Knowledge Centre, St Pancras. The event is free, and you must book a ticket in advance. Last year's event was a sell-out, so don't miss out!

The Symposium showcases innovative and inspiring projects which use the British Library’s digital content, providing a platform for development, networking and debate in the Digital Scholarship field, as well as a focus for the creative reuse of digital collections and data in the cultural heritage sector.

We are very proud to announce that this year's keynote will be delivered by Daniel Pett, Head of Digital and IT at the Fitzwilliam Museum, University of Cambridge.

Daniel Pett will be giving the keynote at this year's BL Labs Symposium. Photograph Copyright Chiara Bonacchi (University of Stirling).

Dan read archaeology at UCL and Cambridge (but played too much rugby) and then worked in IT on the trading floor of Dresdner Kleinwort Benson. Until February this year, he was Digital Humanities lead at the British Museum, where he designed and implemented digital practices connecting humanities research, museum practice, and the creative industries. He is an advocate of open access, open source and reproducible research. He designed and built the award-winning Portable Antiquities Scheme database (which holds records of over 1.3 million objects) and enabled collaboration through projects working on linked and open data (LOD) with the Institute for the Study of the Ancient World (New York University) (ISAWNYU) and the American Numismatic Society. He has worked with crowdsourcing and crowdfunding (MicroPasts), and built the British Museum's reputation for 3D capture. He holds honorary posts at the UCL Institute of Archaeology and the Centre for Digital Humanities, and publishes regularly in the fields of museum studies, archaeology and digital humanities.

Dan's keynote will reflect on his years of experience in assessing the value, impact and importance of experimenting with, re-imagining and re-mixing cultural heritage digital collections in Galleries, Libraries, Archives and Museums. Dan will follow in the footsteps of previous prestigious BL Labs keynote speakers: Josie Fraser (2017); Melissa Terras (2016); David De Roure and George Oates (2015); Tim Hitchcock (2014); and Bill Thompson and Andrew Prescott in 2013.

Stella Wisdom (Digital Curator for Contemporary British Collections at the British Library) will give an update on some exciting and innovative projects she and other colleagues have been working on within Digital Scholarship. Mia Ridge (Digital Curator for Western Heritage Collections at the British Library) will talk about 'Living with Machines', a major and ambitious data science/digital humanities project the British Library is about to embark upon in collaboration with the Alan Turing Institute for data science and artificial intelligence.

Throughout the day, there will be several announcements and presentations from nominated and winning projects for the BL Labs Awards 2018, which recognise work that has used the British Library’s digital content in four areas: Research, Artistic, Commercial, and Educational. The closing date for the BL Labs Awards is 11 October 2018, so it's not too late to nominate someone, a team, or your own project! There will also be a chance to find out who has been nominated and recognised for the British Library Staff Award 2018, which showcases the work of an outstanding individual (or team) at the British Library who has worked creatively and originally with the British Library's digital collections and data (nominations close 12 October 2018).

Adam Farquhar (Head of Digital Scholarship at the British Library) will give an update about the future of BL Labs and report on a special event held in September 2018 for invited attendees from National, State, University and Public Libraries and Institutions around the world, where they were able to share best practices in building 'labs'-style environments for their institutions' digital collections and data.

There will be a 'sneak peek' of an art exhibition in development entitled 'Imaginary Cities' by the visual artist and researcher Michael Takeo Magruder. His practice draws upon working with information systems such as live and algorithmically generated data, 3D printing and virtual reality, combining modern and traditional techniques such as gold and silver gilding and etching. Michael's exhibition will build on the work he has been doing with BL Labs over the last few years using digitised 18th- and 19th-century urban maps, bringing analogue and digital outputs together. The exhibition will be staged in the British Library's entrance hall in April and May 2019 and will be free to visit.

Finally, we have an inspiring talk lined up to round the day off (more information about this will be announced soon), and - as is our tradition - the symposium will conclude with a reception at which delegates and staff can mingle and network over a drink and nibbles.

So book your place for the Symposium today and we look forward to seeing new faces and meeting old friends again!

For any further information, please contact labs@bl.uk

Posted by Mahendra Mahey and Eleanor Cooper (BL Labs Team)

13 August 2018

The Parts of a Playbill


Beatrice Ashton-Lelliott is a PhD researcher at the University of Portsmouth studying the presentation of nineteenth-century magicians in biographies, literature, and the popular press. She is currently a research placement student on the British Library’s In the Spotlight project, cleaning and contextualising the crowdsourced playbills data. She can be found on Twitter at @beeashlell and you can help out with In the Spotlight at playbills.libcrowds.com.

In the Spotlight is a brilliant tool for spotting variations between playbills across the eighteenth and nineteenth centuries. The site provides participants with access to thousands of digitised playbills, and the sheets in the site’s collections often list the cast, scenes, and any innovative ‘machinery’ involved in the production. Whilst the most famous actors obviously needed to be emphasised because they drew more crowds (e.g. any playbills featuring Mr Kean tend to have his name in huge letters), from the playbills in In the Spotlight’s volumes that doesn’t always seem to be the case with playwrights. Sometimes they’re mentioned by name, but in many cases famous playwrights aren't named on the playbill. I’ve speculated previously that this is because these playwrights were so famous that audiences would hear by word of mouth or through the press that a new play of theirs was out, so it was assumed there was no point in adding the name as audiences would already know.

What can you expect to see on a playbill?

The basics of a playbill are: the main title of the performance, a subtitle, often the current date, future or past dates of performances, the cast and characters, scenery, short or long summaries of the scenes to be acted, whether the performance is to benefit anyone, and where tickets can be bought from. There are definitely surprises though: the In the Spotlight team have also come across apologies from theatre managers for actors who were scheduled to perform not turning up, or performing drunk! The project forum has a thread for interesting things 'spotted on In the Spotlight', and we always welcome posts from others.

Crowds would often react negatively if the scheduled performers weren’t on stage. Gilli Bush-Bailey also notes in The Performing Century (2007) that crowds would be used to seeing the same minor actors reappear across several parts of the performance and playbills, stating that ‘playbills show that only the lesser actors and actresses in the company appear in both the main piece and the following farce or afterpiece’ (p. 185), with bigger names at theatres royal committing only to either a tragic or comic performance.

From our late 18th-century playbills on the site, users can see quite a standard format in structure and font.

In this 1797 playbill from the Margate volume, the font is uniform, with variations in size to emphasise names and performance titles.

How did playbills change over time?

In the 19th century, all kinds of new and exciting fonts are introduced, as well as more experimentation in the structuring of playbills. The type of performance also influences the layout of the playbill; for instance, a circus playbill will often be divided into a grid-like structure to describe each act and feature illustrations, and early magician playbills often change orientation half-way down to give more space to describe their tricks and stage.

1834 Birmingham playbill

This 1834 Birmingham playbill is much lengthier than the previous example, showing a variety of fonts and featuring more densely packed text. Although this may look like information overload, the mix of fonts and variations in size still make the main points of the playbill eye-catching to passersby.

James Gregory’s ‘Parody Playbills’ article, stimulated by the In the Spotlight project, contains a lot of great examples and further insights into the deeper meaning of playbills and their structure.

Works Cited

Davis, T. C. and P. Holland (eds). (2007). The Performing Century: Nineteenth-Century Theatre History. Basingstoke: Palgrave Macmillan.

Gregory, J. (2018) ‘Parody Playbills: The Politics of the Playbill in Britain in the Eighteenth and Nineteenth Centuries’ in eBLJ.

01 August 2018

Visualising the Endangered Archives Programme project data on Africa, Part 1. The project


Sarah FitzGerald is a linguistics PhD researcher at the University of Sussex investigating the origins and development of Cameroon Pidgin English. She is currently a research placement student in the British Library’s Digital Scholarship Team, using data from the Endangered Archives Programme to create data visualisations.

This month I have learned:

  • that people in Canada are most likely to apply for grants to preserve archives in Ethiopia and Sierra Leone, whereas those in the USA are more interested in endangered archives in Nigeria and Ghana
  • that people in Africa who want to preserve an archive are more likely to run a pilot project before applying for a big grant whereas people from Europe and North America go big or go home (so to speak)
  • that the African countries in which endangered archives are most often identified are Nigeria, Ghana and Malawi
  • and that Eastern and Western African countries are more likely to be studied by academics in Europe and North America than those of Northern, Central or Southern Africa
Idrissou Njoya and Nji Mapon examine Mapon's endangered manuscript collection in Cameroon (EAP051)

I have learned all of this, and more, from sifting through 14 years of the Endangered Archives Programme’s grant application data for Africa.

Why am I sifting through this data?

Well, I am currently half way through a three-month placement at the British Library working with the Digital Scholarship team on data from the Endangered Archives Programme (EAP). This is a programme which gives grants to people who want to preserve and digitise pre-modern archives under threat anywhere in the world.

Manuscript of the Riyadh Mosque of Lamu, Kenya (EAP466)

The focus of my placement is to look at how the project has worked in the specific case of Africa over the 14 years the programme has been running. I’ll be using this data to create visualisations that will help provide information for anyone interested in the archives, and for the EAP team.

Over the next weeks I will be writing a series of blog posts detailing my work. This first post gives an overview of the project and its initial stages. My second post will discuss the types of data visualisation software I have been learning to use. Then, at the end of my project, I will be writing a post about my findings, using the visualisations.

The EAP has funded the preservation of a range of important archives in Africa over the last decade and a half. Some interesting examples include a project to preserve botanical collections in Kenya, and one which created a digital record of endangered rock inscriptions in Libya. However, my project is more concerned with the metadata surrounding these projects – who is applying, from where, and for what type of archive etc.

Tifinagh rock inscriptions in the Tadrart Acacus mountains, Libya (EAP265)

I’m also concerned with finding the most useful ways to visualise this information.

For 14 years the details of each application have been recorded in MS Excel spreadsheets. Over time this system has evolved, so my first step was to fill in information gaps in the spreadsheets. This was a time-consuming task as gap filling had to be done manually by combing through individual application forms looking for the missing information.

Once I had a complete data set, I was able to use a free and open-source tool called OpenRefine to clean up the spreadsheet. OpenRefine can be used to edit and regularise spreadsheet data, such as spelling or formatting inconsistencies, quickly and thoroughly. There is an excellent article available here if you are interested in learning more about how to use OpenRefine and what you can do with it.
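
OpenRefine is a point-and-click tool, but for anyone who prefers scripting, the pandas sketch below shows the same sort of regularisation (trimming whitespace, unifying capitalisation, and mapping spelling variants onto one canonical value). The file and column names, and the example variants, are hypothetical.

# A rough pandas equivalent of some typical OpenRefine clean-up steps (names are invented).
import pandas as pd

df = pd.read_excel("eap_africa_applications.xlsx")  # hypothetical file name

# Trim stray whitespace and unify capitalisation in a country column
df["Country"] = df["Country"].str.strip().str.title()

# Map known spelling/formatting variants onto a single canonical value
variants = {
    "Congo, Dem. Rep.": "Democratic Republic of the Congo",
    "Drc": "Democratic Republic of the Congo",
}
df["Country"] = df["Country"].replace(variants)

print(df["Country"].value_counts())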

With a clean, complete spreadsheet, I could start looking at what the data could tell me about the EAP projects in Africa.

I used Excel visualisation tools to give me an overview of the information in the spreadsheet. I am very familiar with Excel, so this allowed me to explore lots of questions relatively quickly.

Charts: grant outcomes for major and pilot projects

For example, there are two types of project that the EAP funds: small-scale, exploratory pilot studies, and larger-scale major projects. I wondered which type of application was more likely to be awarded a grant. Using Excel it was easy to create the charts above, which show that major projects are actually more likely to be funded than pilots.
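
The same comparison can also be reproduced outside Excel. The pandas/matplotlib sketch below assumes hypothetical 'Project type' and 'Outcome' columns and simply cross-tabulates and plots them; it is a sketch of the approach, not the spreadsheet's actual structure.

# Sketch of the major-vs-pilot comparison in pandas/matplotlib (column names are assumptions).
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_excel("eap_africa_applications.xlsx")  # hypothetical file name

# Count applications by project type and outcome, then plot as grouped bars
outcome_by_type = pd.crosstab(df["Project type"], df["Outcome"])
outcome_by_type.plot(kind="bar")
plt.ylabel("Number of applications")
plt.title("Grant outcomes for pilot and major projects")
plt.tight_layout()
plt.show()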

Of course, the question of why this might be so still remains, but knowing the pattern is a useful first step for investigation.

Another chart that was quick to make shows the number of applicants from each continent by year.

Chart: number of applicants from each continent by year

This chart reveals that, with the exception of the first three years of the programme, most applications to preserve African archives have come from people living in Africa. Applications from North America and Europe seem, on average, to be roughly equal. Applications from elsewhere are almost non-existent: there have been three applications from Oceania and one from Asia over the 14 years the EAP has been running.

This type of visualisation gives an overview at a glance in a way that a table cannot. But there are some things Excel tools can’t do.

I want to see if there are links between applicants from specific North American or European countries and archives in particular African countries, but Excel tools are not designed to map networks. Nor can Excel be used to present data on a map, which is something that the EAP team is particularly keen to see, so my next step is to explore the free software available which can do this.
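
To give a flavour of the free tools available, the sketch below uses the Python networkx library to build a simple network linking applicant countries to archive countries. Everything about the data shape here is an assumption, and this is only one possible approach among the free tools out there.

# Hypothetical sketch: a network of applicant countries linked to archive countries.
# Column names and plotting choices are assumptions, not the EAP's actual workflow.
import pandas as pd
import networkx as nx
import matplotlib.pyplot as plt

df = pd.read_excel("eap_africa_applications.xlsx")  # hypothetical file name

G = nx.Graph()
for _, row in df.iterrows():
    a, b = row["Applicant country"], row["Archive country"]
    if G.has_edge(a, b):
        G[a][b]["weight"] += 1  # weight the edge by how often the pairing occurs
    else:
        G.add_edge(a, b, weight=1)

nx.draw_networkx(G, with_labels=True, node_size=300, font_size=8)
plt.axis("off")
plt.show()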

This next stage of my project, in which I explore a range of data visualisation tools, will be detailed in a second blog post coming soon.

16 July 2018

Crowdsourcing comedy: date and genre results from In the Spotlight


Beatrice Ashton-Lelliott is a PhD researcher at the University of Portsmouth studying the presentation of nineteenth-century magicians in biographies, literature, and the popular press. She is currently a research placement student on the British Library’s In the Spotlight project, cleaning and contextualising the crowdsourced playbills data. She can be found on Twitter at @beeashlell and you can join the In the Spotlight project at playbills.libcrowds.com.

In this blog post I discuss the data created so far by In the Spotlight volunteers via crowdsourcing – which has already thrown up quite a few surprises along the way! All of the data I discuss was cleaned using OpenRefine, with some manual intervention by me to group categories such as genre. This first post highlights the most notable results to come out of the date and genre tasks so far, and a second post will present similar findings for play titles and playwrights.

Dates

I started off by analysing the dates generated by the projects as, to be honest, it seemed easiest! One of the problems we’ve encountered with the date tasks, however, is that a number of the playbills do not show a full date. This is notable in itself but unsurprising – why would a playbill in the eighteenth or nineteenth century need a full date when it wasn’t expected to last two hundred years into the future? With that in mind, this is by no means an exhaustive data set.
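
One practical consequence of incomplete dates is that any analysis has to fall back on whatever fragment is present. The small Python sketch below, using invented sample values, pulls out a year wherever one appears so that partially dated playbills can still be counted.

# Sketch of extracting a year from crowdsourced date strings, many of which are incomplete.
# The sample values are invented for illustration.
import re
from collections import Counter

raw_dates = ["Monday, June 6, 1825", "1825", "Saturday evening", "March 1829", "1824"]

years = []
for value in raw_dates:
    match = re.search(r"\b(17|18)\d{2}\b", value)  # any four-digit year in the 1700s or 1800s
    if match:
        years.append(match.group())

print(Counter(years))  # Counter({'1825': 2, '1829': 1, '1824': 1})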

After creating a simple graph of the most popular dates, it became clear that we had a huge spike in the number of performances in 1825. Was something relevant to theatre history happening during this year, or were the sources of the playbill collections just unusually pro-active in 1825 after taking some time off? Was the paper stock quality better, so more playbills have lasted? The outside influence of the original collector or owner of these playbills is also something to consider, for instance, maybe he was more interested in one type of performance than others, had more time to collect playbills in certain years or in certain places, and so on. A final potential factor is that this data also only comes from the volumes added to the site projects so far, and so isn’t indicative of the Library’s playbills as a whole.

Aside from source or collector influence, some other possible explanations do present themselves. Britain in general was growing exponentially, with London in particular becoming one of the biggest cities in the world, and this era also saw the birth of railways and the extravagant influence of figures such as George IV. As this is coming off the back of what seems to be a very slow year in 1824, however, perhaps it is best just to chalk this up to the activity of the collectors. We also have another noticeable spike in 1829, but by no means as dramatic as that of 1825. I’ve spent a bit of time comparing the number of performances seen in the volumes with other online performance date tools, such as UMass's Adelphi calendar and Godwin’s Diary to compare numbers, but would love to hear any further insights into this!

A graph showing the most popular performance dates

Genre

The main issue I faced in working with the genre data was the wide variety of descriptors used on the playbills themselves. For instance, I encountered burlesque, burletta and burlesque burletta – which of the first two categories would the last one go under? When I went back to the playbills themselves, it was also clear that many of the ‘genres’ generated were more like comments from theatre managers or just descriptions e.g. ‘an amusing sketch’. With this in mind, genre was the dataset which I ‘interfered’ with the most from a cleaning point of view.

Some of the calls I made were to group anything cited as ‘dramatic ___’ with drama more widely, unless it had a notable second qualifier, such as pantomime, Romance or sketch. I also grouped anything mentioning ‘historical’ together, as from a research point of view this is probably the most prominent aspect, grouped harlequinades with pantomimes (although I know this might be controversial!) and grouped anything which involved a large organisation, such as military, Masonic or national performances, under ‘organisational’. Some were difficult to separate – I did wonder about grouping variety and vaudeville together, but as there were so few of each it seemed better to leave them be.
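
In scripting terms this kind of grouping is essentially a lookup table. The sketch below shows the idea as a plain Python dictionary; the mappings shown are just a few examples based on the categories mentioned above, not the full set of judgement calls actually made.

# Hedged sketch of grouping genre variants with a lookup table.
# Only a handful of example mappings are shown; the real cleaning involved many more decisions.
genre_groups = {
    "dramatic sketch": "sketch",
    "dramatic romance": "romance",
    "historical drama": "historical",
    "harlequinade": "pantomime",
    "military spectacle": "organisational",
}

def normalise_genre(raw):
    cleaned = raw.strip().lower()
    return genre_groups.get(cleaned, cleaned)  # fall back to the original label

print(normalise_genre("Harlequinade"))  # -> pantomime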

With these qualifications in mind, by far the most popular genre in the collections was farce, which I kept distinct from comedy, clocking up 537 performances across the projects. This was closely followed by comedy more generally with 527 performances, with drama (197), melodrama (150) and tragedy (135) trailing afterwards. Once again, it could purely be that the original collectors of these volumes had more of a taste for comedy than drama, but there is such a wide gap in popularity in the volumes so far that it seems fair to conclude that the regional theatre-going public of the late eighteenth and early nineteenth centuries preferred to be cheered rather than saddened by their entertainment.

A graph showing the most popular genres in records transcribed to date

You can contribute to this research

The more contributions we receive, the more accurate the titles, genre and dates results will be, so whether you’re looking out for your local theatre or interested in the more unusual performances which crop up, get involved with the project today at playbills.libcrowds.com. In the Spotlight is well on the way to hitting 100,000 contributions – make sure that you’re one of them!

14 May 2018

Seeing British Library collections through a digital lens


Digital Curator Mia Ridge writes: in this guest post, Dr Giles Bergel describes some experiments with the Library's digitised images...

The University of Oxford’s Visual Geometry Group has been working with a number of British Library curators to apply computer vision technology to their collections. On April 5 of this year I was invited by BL Digital Curator Dr. Mia Ridge to St. Pancras to showcase some of this work and to give curators the opportunity to try the tools out for themselves.  

Visual Geometry’s VISE tool matching two identical images from separate books digitised for the British Library’s Two Centuries of Indian Print project.

Computer vision - the extraction of meaning from images - has made considerable strides in recent years, particularly through the application of so-called ‘deep learning’ to large datasets. Cultural collections provide some of the most interesting test-cases for computer vision researchers, due to their complexity; the intensity of interest that researchers bring to them; and to their importance for human well-being. Can computers see collections as humans do? Computer vision is perhaps better regarded as a powerful lens rather than as a substitute for human curation. A computer can search a large collection of images far more quickly than can a single picture researcher: while it will not bring the same contextual understanding to bear on an image, it has the advantage of speed and comprehensiveness. Sometimes, a computer vision system can surprise the researcher by suggesting similarities that weren’t readily apparent.

As a relatively new technology, computer vision attracts legitimate concerns about privacy, ethics and fairness. By making its state of the art tools freely available, Visual Geometry hope to encourage experimentation and responsible use, and to enlist users to help determine what they can and cannot do. Cultural collections provide a searching test-case for the state of the art, due to their diversity as media (prints, paintings, stamped images, photographs, film and more) each of which invite different responses. One BL curator made a telling point by searching the BBC News collection with the term 'football': the system was presented with images previously tagged with that word that related to American, Gaelic, Rugby and Association football. Although inconclusive due to lack of sufficiently specific training data, the test asked whether a computer could (or should) pick the most popular instances; attempt to generalise across multiple meanings; or discern separate usages. Despite increases in processing power and in software methods, computers' ability to generalise; to extract semantic meaning from images or texts; and to cope with overlapping or ambiguous concepts remains very basic.  

Other tests with BL images have been more immediately successful. Visual Geometry's Traherne tool, developed originally to detect differences in typesetting in early printed books, worked well with many materials that exhibit small differences, such as postage stamps or doctored photographs. Visual Geometry's Image Search Engine (VISE) has shown itself capable of retrieving matching illustrations in books digitised for the Library's Indian Print project, as well as certain bookbinding features, or popular printed ballads. Some years ago Visual Geometry produced a search interface for the Library's 1 Million Images release. A collaboration between the Library's Endangered Archives programme and Oxford researcher David Zeitlyn on the archive of Cameroonian studio photographer Jacques Toussele employed facial recognition as well as pattern detection. VGG's facial recognition software works on video (BBC News, for example) as well as still photographs and art, and is soon to be freely released to join other tools under the banner of the Seebibyte Project.    
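
For readers curious how image matching can work at its most basic, here is a toy Python sketch of an 'average hash' comparison between two images; the file names are invented, and VISE and the other VGG tools use far more robust geometric and learned features than this.

# Toy sketch of image similarity via an 'average hash' (file names are invented).
# This only illustrates the idea; real systems like VISE use much more sophisticated methods.
import numpy as np
from PIL import Image

def average_hash(path, size=8):
    # Shrink to a small greyscale thumbnail, then mark each pixel as above/below the mean
    img = Image.open(path).convert("L").resize((size, size))
    pixels = np.asarray(img, dtype=np.float32)
    return pixels > pixels.mean()

def hamming_distance(hash_a, hash_b):
    return int(np.count_nonzero(hash_a != hash_b))

h1 = average_hash("illustration_book1.png")
h2 = average_hash("illustration_book2.png")
print(hamming_distance(h1, h2))  # small distances suggest near-identical images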

I'll be returning to the Library in June to help curators explore using the tools with their own images. For more information on the work of Visual Geometry on cultural collections, subscribe to the project's Google Group or contact Giles Bergel.      

Dr. Giles Bergel is a digital humanist based in the Visual Geometry Group in the Department of Engineering Science at the University of Oxford.  

The event was supported by the Seebibyte project under an EPSRC Programme Grant EP/M013774/1

 

08 May 2018

The Italian Academies database – now available in XML


Dr Mia Ridge writes: in 2017, we made XML and image files from a four-year, AHRC-funded project, The Italian Academies 1525-1700, available through the Library's open data portal. The original data structure was quite complex, so we would be curious to hear feedback from anyone reusing the converted form for research or visualisations.
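
For anyone planning to reuse the converted files, a first look might resemble the hedged Python sketch below. The element and attribute names are purely illustrative assumptions – the XML retains the original database's own, quite complex, structure, so check the actual files on the open data portal.

# Hedged sketch of a first look at the downloaded XML using Python's standard library.
# Element names such as 'academy' and 'name' are assumptions for illustration only.
import xml.etree.ElementTree as ET

tree = ET.parse("italian_academies.xml")  # hypothetical file name
root = tree.getroot()

# List the first few top-level record elements, whatever they turn out to be
for child in list(root)[:10]:
    print(child.tag, child.attrib)

# If records include an element like 'academy' with a 'name' child, something like:
for academy in root.iter("academy"):
    print(academy.findtext("name", default="(unknown)"))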

In this post, Dr Lisa Sampson, Reader in Early Modern Italian Studies at UCL, and Dr Jane Everson, Emeritus Professor of Italian literature, RHUL, provide further information about the project...

New research opportunities for students of Renaissance and Baroque culture! The Italian Academies database is now available for download. It's in a format called XML which represents the original structure of the database.

This dedicated database results from an eight-year project, funded by the Arts and Humanities Research Council UK, and provides a wealth of information on the Italian learned academies. Around 800 such institutions flourished across the peninsula over the sixteenth and seventeenth centuries, making major contributions to the cultural and scientific debates and innovations of the period, as well as forming intellectual networks across Europe. This database lists a total of 587 Academies from Venice, Padua, Ferrara, Bologna, Siena, Rome, Naples, and towns and cities in southern Italy and Sicily active in the period 1525-1700. Also listed are more than 7,000 members of one or more academies (including major figures like Galileo, as well as women and artists), and almost 1,000 printed works connected with academies held in the British Library. The database therefore provides an essential starting point for research into early modern culture in Italy and beyond. It is also an invitation to further scholarship and data collection, as these totals constitute only a fraction of the data relating to the Academies.

Laura Terracina, nicknamed Febea, of the Accademia degli Incogniti, Naples

The database is designed to permit searches from many different perspectives and to allow easy searching across categories. In addition to the three principal fields – Academies, People, Books – searches can be conducted by title keyword, printer, illustrator, dedicatee, censor, language, gender and nationality, among others. The database also lists and illustrates the mottoes and emblems of the Academies (where known), and similarly of individual academy members. Illustrations from the books entered in the database include frontispieces, colophons, and images from within texts.

Emblem of the Accademia degli Intronati, Siena


The database thus aims to promote research on the Italian Academies in disciplines ranging from literature and history, through art, science, astronomy, mathematics, printing and publishing, censorship, politics, religion and philosophy.

The Italian Academies project which created this database began in 2006 as a collaboration between the British Library and Royal Holloway, University of London, funded by the Arts and Humanities Research Council and led by Jane Everson. The objective was the creation of a dedicated resource on the publications and membership of the Italian learned academies active in the period between 1525 and 1700. The software for the database was designed in-house by the British Library, and the first tranche of data was completed in 2009, listing information for academies in four cities (Naples, Siena, Bologna and Padua). A second phase, listing information for many more cities, including in southern Italy and Sicily, developed the database further between 2010 and 2014, with a major research grant from the AHRC and collaboration with the University of Reading.

The exciting possibilities now opened up by the British Library’s digital data strategy look set to stimulate new research and collaborations by making the records even more widely available, and easily downloadable, in line with Open Access goals. The Italian Academies team is now working to develop the project further with the addition of new data, and the incorporation into a hub of similar resources.

The Italian Academies project team members welcome feedback on the records and on the adoption of the database for new research (contact: www.italianacademies.org).

The original database remains accessible at http://www.bl.uk/catalogues/ItalianAcademies/Default.aspx 

An Introduction to the database, its aims, contents and objectives is available both at this site and at the new digital data site: https://data.bl.uk/iad/

Jane E. Everson, Royal Holloway University of London

Lisa Sampson, University College, London