THE BRITISH LIBRARY

Digital scholarship blog

136 posts categorized "Data"

11 September 2018

Building Library Labs around the world - the event and complete our survey!


Posted by Mahendra Mahey, BL Labs Manager.

Building Library Labs

Around the world, leading national, state, university and public libraries are creating 'digital lab type environments' so that their digitised and born digital collections / data can be opened up and re-used for creative, innovative and inspiring projects by everyone such as digital researchers, artists, entrepreneurs and educators.

BL Labs, which has now been running for five years, is organising what we believe will be the first ever event of its kind in the world! We are bringing together national, state and university libraries with existing or planned digital 'Labs-style' teams for an invite-only workshop this Thursday 13 September and Friday 14 September, 2018.

A few months ago, we sent out special invitations to these organisations. We were delighted by the excitement generated, and by the tremendous response we received. Over 40 institutions from North America, Europe, Asia and Africa will be attending the workshop at the British Library this week. We have planned plenty of opportunities for networking, sharing lessons learned, and telling each other about innovative projects and services that are using digital collections / data in new and interesting ways. We aim to work together in the spirit of collaboration so that we can continue to build even better Library Labs for our users in the future.

Our packed programme includes:

  • 6 presentations covering topics such as those in our international Library Labs Survey;
  • 4 stories of how national Library Labs are developing in the UK, Austria, Denmark and the Netherlands;
  • 12 lightning talks with topics ranging from 3D-Imaging to Crowdsourcing;
  • 12 parallel discussion groups focusing on subjects such as funding, technical infrastructure and user engagement;
  • 3 plenary debates looking at the value to national Libraries of Labs environments and digital research, and how we will move forward as a group after this event.

We will collate and edit the outputs of this workshop in a report detailing the current landscape of digital Labs in national, state, university and public Libraries around the world.

If you represent one of these institutions, it's still not too late to participate, and you can do so in a few ways:

  • Our 'Building Library Labs' survey is still open, and if you work in or represent a digital Library Lab in one of our sectors, your input will be particularly valuable;
  • You may be able to participate remotely in this week's event in real time through Skype;
  • You can contribute to a collaborative document which delegates are adding to during the event.

If you are interested in one of these options, contact: mahendra.mahey@bl.uk.

Please note that the event is being videoed, and we will be putting up clips on our YouTube channel soon after the workshop.

We will also return to this blog and let you know how we got on, and how you can access some of the other outputs from the event. Watch this space!


06 September 2018

Visualising the Endangered Archives Programme project data on Africa, Part 3. Finishing up


Sarah FitzGerald is a linguistics PhD researcher at the University of Sussex investigating the origins and development of Cameroon Pidgin English. She is currently a research placement student in the British Library’s Digital Scholarship Team, using data from the Endangered Archives Programme to create data visualisations.

This summer I have taken a break by working hard, I’ve broadened my academic horizons by ignoring academia completely, and I’ve felt at home while travelling hundreds of miles a week. But above all else, I’ve just had a really nice time.

In my last two blogs I covered the early stages of my placement at the British Library, and discussed the data visualisation tools I’ve been exploring.

In this final blog I am going to outline the later stages of my project. I am also going to talk about my experience of undertaking a British Library placement, what I’ve learned and whether it was worth it (spoiler alert: it was).

What I’ve been doing

The final stages of my project have mostly consisted of two separate lines of investigation.

Firstly, I have been working on finding out as much as I can about the Endangered Archives Programme (EAP)’s projects in Africa and finding the best ways to visualise that information, in order to create a sort of bank of visualisations that the EAP team can use when they are talking about the work that they do. Visualisations, such as the one below showing the number of applications related to each region of Africa by year, can make tables of data much easier to understand.

Chart

Secondly, I was curious about why some project applications get funded and some do not. I wanted to know if I could identify any patterns in the reasons why projects get rejected.

This gave me the opportunity to apply my skills as a linguist to the data, albeit on a small scale. I decided to examine the feedback given to unsuccessful applicants by the panel that awards the EAP grants to see if I could identify any patterns. To do this I created a corpus, or electronic database, of texts. This could then be run through corpus analysis software to look for patterns.

AntConc

This image shows a word list created for my corpus using AntConc software, which is a free and open source corpus analysis tool.
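As a rough illustration of what a word list is, the sketch below counts word frequencies across a handful of texts in Python. It is not how AntConc works internally; the function name `word_list`, the tokenising rule and the two feedback snippets are all assumptions invented for this example.

```python
import re
from collections import Counter


def word_list(texts):
    """Return (word, count) pairs, most frequent first, ties in alphabetical order."""
    counts = Counter()
    for text in texts:
        # Lowercase the text and treat runs of letters (with internal
        # apostrophes) as words. This tokenising rule is an assumption.
        counts.update(re.findall(r"[a-z]+(?:'[a-z]+)*", text.lower()))
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical feedback snippets, invented purely for illustration.
feedback = [
    "The application lacks detail on the condition of the archive.",
    "The archive falls outside the scope of the programme.",
]

ranked = word_list(feedback)
for word, count in ranked[:5]:
    print(f"{count:>3}  {word}")
```

A real corpus would be read from files rather than typed in, and function words such as "the" would usually be filtered out or ignored when looking for meaningful patterns.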

My analysis allowed me to identify a number of issues common to many unsuccessful applications. In addition to applications that fall outside the scope of EAP, there are proposals which would make excellent projects but whose applications lack the information the panel needs to award a grant.

Based on my analysis I was able to make a number of recommendations about additional information EAP could provide for applicants which might help to prevent potentially valuable archives being lost due to poor applications.

What I’ve learned

As well as learning about visualisation software I’ve learned a lot this summer about the EAP archives.

I’ve found out where applications are coming from, and which African countries have the most associated applications. I’ve learned that there are many great data visualisation tools available for free online. I’ve learned that there are over 70 different languages represented in the EAP archived projects from Africa.

EAP656
James Ssali and an unknown woman, from the Ham Musaka archive, Uganda (EAP656)

One of the most interesting things I’ve learned is just how much archival material is available for research – and on an incredibly broad range of topics. The materials digitised and preserved in Africa over the last 13 years include:

This wealth of information provides so much opportunity for research and these are just the archives from Africa. The EAP funds projects all over the world.

EAP143
Shui manuscript from China (EAP143)

In addition to learning about the EAP archives I’ve learned a lot from working in the British Library more generally. The scale of the work that is carried out is immense and I don’t think I fully appreciated before working here for three months just how large the challenges they face are.

In addition to preserving a copy of every book published in the UK, the BL is also working to create large digital archives in order to facilitate the way that modern scholarship has developed. They are digitising books, audio and websites, as well as historical documents such as the records of the East India Company.

East India House
View of East India House by Thomas Hosmer Shepherd

Was it worth it?

A PhD is an intense thing to undertake and you have a time limit to complete it. At first glance, taking three months out to work on a placement with little direct relevance to my PhD might seem a bit foolish, particularly when it means a daily commute from Brighton to London.

Far from wasting my time, however, this placement has been an enriching experience. My PhD is on the origins and development of Cameroon Pidgin English. This placement has given me a break from my work while broadening my understanding of African culture and the context in which the language I study is spoken.

I’ve always had an interest in data visualisation and my placement has given me time to play with visualisation tools and gain a real understanding of the resources available. I feel refreshed and ready for the new term despite having worked full time all summer.

The break has also given me thinking space; it has allowed ideas to percolate and given me new skills which I can apply to my work. Taking a break from academia has given me more perspective on my work and more options for how to develop it.

BL
The British Library, St Pancras

Finally, the travel has been a lot but my supervisors have been very flexible, allowing me to work from home two days a week. The up-side of coming to London regularly has been getting to work with interesting people.

Working in a large institution could be an intimidating and isolating experience but it has been anything but. The digital scholarship team have been welcoming and interested, and in particular I have had two very supportive supervisors. The British Library are really keen to support and develop placement students, and there is a lovely community of PhD students at the BL, some on placements, some doing their PhD here.

I have had a great time at the British Library this summer and can only recommend the scheme to anyone thinking of applying for a placement next year.

23 August 2018

BL Labs Symposium (2018): Book your place for Mon 12-Nov-2018


The BL Labs team are pleased to announce that the sixth annual British Library Labs Symposium will be held on Monday 12 November 2018, from 9:30 to 17:30 in the British Library Knowledge Centre, St Pancras. The event is free, but you must book a ticket in advance. Last year's event was a sell-out, so don't miss out!

The Symposium showcases innovative and inspiring projects which use the British Library’s digital content, providing a platform for development, networking and debate in the Digital Scholarship field, as well as a focus on the creative reuse of digital collections and data in the cultural heritage sector.

We are very proud to announce that this year's keynote will be delivered by Daniel Pett, Head of Digital and IT at the Fitzwilliam Museum, University of Cambridge.

Daniel Pett
Daniel Pett will be giving the keynote at this year's BL Labs Symposium. Photograph Copyright Chiara Bonacchi (University of Stirling).

Dan read archaeology at UCL and Cambridge (but played too much rugby) and then worked in IT on the trading floor of Dresdner Kleinwort Benson. Until February this year, he was Digital Humanities lead at the British Museum, where he designed and implemented digital practices connecting humanities research, museum practice, and the creative industries. He is an advocate of open access, open source and reproducible research. He designed and built the award-winning Portable Antiquities Scheme database (which holds records of over 1.3 million objects) and enabled collaboration through projects working on linked and open data (LOD) with the Institute for the Study of the Ancient World (New York University) (ISAWNYU) and the American Numismatic Society. He has worked with crowdsourcing and crowdfunding (MicroPasts), and developed the British Museum's 3D capture reputation. He holds Honorary posts at UCL Institute of Archaeology and the Centre for Digital Humanities and publishes regularly in the fields of museum studies, archaeology and digital humanities.

Dan's keynote will reflect on his years of experience in assessing the value, impact and importance of experimenting with, re-imagining and re-mixing cultural heritage digital collections in Galleries, Libraries, Archives and Museums. Dan will follow in the footsteps of previous prestigious BL Labs keynote speakers: Josie Fraser (2017); Melissa Terras (2016); David De Roure and George Oates (2015); Tim Hitchcock (2014); and Bill Thompson and Andrew Prescott in 2013.

Stella Wisdom (Digital Curator for Contemporary British Collections at the British Library) will give an update on some exciting and innovative projects she and other colleagues have been working on within Digital Scholarship. Mia Ridge (Digital Curator for Western Heritage Collections at the British Library) will talk about 'Living with Machines', a major and ambitious data science/digital humanities project that the British Library is about to embark upon in collaboration with the Alan Turing Institute for data science and artificial intelligence.

Throughout the day, there will be several announcements and presentations from nominated and winning projects for the BL Labs Awards 2018, which recognise work that has used the British Library’s digital content in four areas: Research, Artistic, Commercial, and Educational. The closing date for the BL Labs Awards is 11 October 2018, so it's not too late to nominate someone/a team, or enter your own project! There will also be a chance to find out who has been nominated and recognised for the British Library Staff Award 2018, which showcases the work of an outstanding individual (or team) at the British Library who has worked creatively and originally with the British Library's digital collections and data (nominations close 12 October 2018).

Adam Farquhar (Head of Digital Scholarship at the British Library) will give an update about the future of BL Labs and report on a special event held in September 2018 for invited attendees from National, State, University and Public Libraries and Institutions around the world, where they were able to share best practices in building 'labs style environments' for their institutions' digital collections and data.

There will be a 'sneak peek' of an art exhibition in development entitled 'Imaginary Cities' by the visual artist and researcher Michael Takeo Magruder. His practice draws upon working with information systems such as live and algorithmically generated data, 3D printing and virtual reality, and combining modern and traditional techniques such as gold / silver gilding and etching. Michael's exhibition will build on the work he has been doing with BL Labs over the last few years using digitised 18th and 19th century urban maps, bringing analog and digital outputs together. The exhibition will be staged in the British Library's entrance hall in April and May 2019 and will be free to visit.

Finally, we have an inspiring talk lined up to round the day off (more information about this will be announced soon), and - as is our tradition - the symposium will conclude with a reception at which delegates and staff can mingle and network over a drink and nibbles.

So book your place for the Symposium today and we look forward to seeing new faces and meeting old friends again!

For any further information, please contact labs@bl.uk

Posted by Mahendra Mahey and Eleanor Cooper (BL Labs Team)

08 August 2018

Visualising the Endangered Archives Programme project data on Africa, Part 2. Data visualisation tools


Sarah FitzGerald is a linguistics PhD researcher at the University of Sussex investigating the origins and development of Cameroon Pidgin English. She is currently a research placement student in the British Library’s Digital Scholarship Team, using data from the Endangered Archives Programme to create data visualisations.

When I wrote last week that the Endangered Archives Programme (EAP) receive the most applications for archives in Nigeria, Ghana and Malawi, I am reasonably sure you were able to digest that news without difficulty.

Is that still the case if I add that Ethiopia, South Africa and Mali come in fourth, fifth and sixth place; and that the countries for which only a single application has been received include Morocco, Libya, Mauritania, Chad, Eritrea, and Egypt?

What if I give you the same information via a handy interactive map?

This map, designed using Tableau Public, shows the location of every archive for which the EAP received an application between 2004 and 2017. Once you know that the darker the colour the more applications received, you can see at a glance how the applications have been distributed. If you want more information you can hover your cursor over each country to see its name and number of associated applications.
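The figure behind each country's colour is simply a count of applications. As a minimal sketch of that aggregation step (not the Tableau workflow itself), the Python below tallies applications per archive country for a range of years. The field names `year` and `archive_country` and the sample rows are hypothetical, not taken from the EAP spreadsheet.

```python
from collections import Counter


def applications_per_country(rows, first_year=2004, last_year=2017):
    """Count applications per archive country within an inclusive year range."""
    return Counter(
        row["archive_country"]
        for row in rows
        if first_year <= row["year"] <= last_year
    )


# Hypothetical rows, one per application, invented for illustration.
rows = [
    {"year": 2004, "archive_country": "Nigeria"},
    {"year": 2010, "archive_country": "Nigeria"},
    {"year": 2012, "archive_country": "Ghana"},
    {"year": 2018, "archive_country": "Ghana"},  # outside the range, so ignored
]

counts = applications_per_country(rows)
for country, total in counts.most_common():
    print(country, total)
```

A mapping tool then only needs to translate each total into a shade, with larger totals drawn darker.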

My placement at the British Library centres on using data visualisations such as this to tell the story of the EAP projects in Africa.

EAP054
Photo from a Cameroonian photographic archive (EAP054)

When not undertaking a placement I am a linguist. This doesn’t require a lot of data visualisation beyond the tools available in Excel. In my previous blog I discussed how useful Excel tools have been for giving me an overview of the EAP data. But there are some visualisations you can’t create in Excel, such as an interactive heat map, so I had to explore what other tools are available.

Inspired by this excellent blog from a previous placement student I started out by investigating Tableau Public primarily to look for ways to represent data using a map.

Tableau Public is freely available online. It is fairly intuitive to use and has a wide range of possible graphs and charts, not just maps. You upload a spreadsheet and it will tell you how to do the rest. There are also many instructional videos online that show you the range of possibilities available.

As well as the heat map above, I also used this tool to examine which countries applications are coming from.

This map shows that the largest number of applications have come from the USA and UK, but people from Canada, South Africa and Malawi have also applied for a lot of grants.

Malawi has a strong showing on both maps. There have been 23 applications to preserve archives in Malawi, and 21 applicants from within Malawi.

EAP942
Paper from the Malawi news archive (EAP942)

Are these the same applications?

My spreadsheet suggests that they are. I can also see that there seem to be links between certain countries, such as Canada and Ethiopia, but in order to properly understand these connections I need a tool that can represent networks – something Tableau Public cannot do.

After some investigation (read ‘googling’) I was able to find Gephi, free, open source software designed specifically for visualising networks.

Of all the software I have used in this project so far, Gephi is the least intuitive. But it can be used to create informative visualisations so it is worth the effort to learn. Gephi do provide a step by step guide to getting started, but the first step is to upload a spreadsheet detailing your ‘nodes’ and ‘edges’.

Having no idea what either of these were I stalled at step one.  

Further googling turned up this useful blog post written for complete beginners which informed me that nodes are individual members of a network. So in my case countries. My list of nodes includes both the country of the archive and the country of the applicant. Edges are the links between nodes. So each application creates a link, or edge, between the two countries, or nodes, involved.
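To make the jargon concrete, here is a minimal Python sketch that turns a list of applications into nodes and weighted edges. The function `build_network` and the sample applications are invented for illustration; this is not the spreadsheet used in the project, and the column layout Gephi expects on import is not shown.

```python
from collections import Counter


def build_network(applications):
    """Build nodes and weighted, directed edges.

    `applications` is an iterable of (applicant_country, archive_country)
    pairs. Each distinct pair becomes one edge whose weight is the number
    of applications sharing that pair.
    """
    edge_weights = Counter(applications)
    # Every country that appears as applicant or archive is a node.
    nodes = sorted({country for pair in edge_weights for country in pair})
    edges = [(src, dst, weight) for (src, dst), weight in sorted(edge_weights.items())]
    return nodes, edges


# Hypothetical applications, invented for illustration.
apps = [
    ("Canada", "Ethiopia"),
    ("Canada", "Ethiopia"),
    ("Malawi", "Malawi"),  # applicant and archive in the same country
    ("UK", "Ghana"),
]

nodes, edges = build_network(apps)
print(nodes)
print(edges)
```

An edge whose source and target are the same country, like the Malawi example, is what a network tool draws as a loop.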

Once I understood the jargon, I was able to use Gephi’s guide to create the network below which shows all applications between 2004 and 2017 regardless of whether they were successful in acquiring a grant.

Gephi Graph

In this visualisation the size of each country relates to the number of applications it features in, as country of archive, country of applicant, or both. The colours show related groups.

Each line shows the direction and frequency of application. The line always travels in a clockwise direction from country of applicant to country of archive; the thicker the line, the more applications. Where the country of applicant and country of archive are the same the line becomes a loop.

I love network maps because you can learn so much from them. In this one, for example, you can see (among other things):

  • strong links between the USA and West Africa
  • multiple Canadian applications for Sierra Leonean and Ethiopian archives
  • UK applications to a diverse range of countries
  • links between Egypt and Algeria and between Tunisia and Morocco

The last tool I explored was Google Fusion Tables. These can be used to present information from a spreadsheet on a map. Once you have coordinates for your locations, Fusion Tables are incredibly easy to use (and will fill in coordinates for you in many cases). You upload the spreadsheet, pick the information to include and it’s done. It is so intuitive that I have yet to do much reading on how it works – hence the lack of decision on how to use it.

There is currently a Fusion-based Table over on the EAP website with links to every project they have funded. It is possible to include all sorts of information for each archive location so I plan to create something more in depth for the African archives that can potentially be used as a tool by researchers.

The next step for my project is to apply these tools to the data in order to create a range of visualisations which will be the stars of my third and final blog at the beginning of September, so watch this space.

06 August 2018

Reminder about the 2018 BL Labs Awards: enter before midnight Thursday 11th October!


With just over two months to go before the submission deadline, we would like to remind you about the 2018 British Library Labs Awards!

The BL Labs Awards are a way of formally recognising outstanding and innovative work that has been created using the British Library’s digital collections and data.

Have you been working on a project that uses digitised material from the British Library's collections? If so, we'd like to encourage you to enter that project for an award in one of our categories.

This year, BL Labs will be giving awards for work in four key areas:

  • Research - A project or activity which shows the development of new knowledge, research methods, or tools.
  • Commercial - An activity that delivers or develops commercial value in the context of new products, tools, or services that build on, incorporate, or enhance the Library's digital content.
  • Artistic - An artistic or creative endeavour which inspires, stimulates, amazes and provokes.
  • Teaching / Learning - Quality learning experiences created for learners of any age and ability that use the Library's digital content.

BLAwards2018
BL Labs Awards 2017 Winners: Top-Left (Research Award Winner – A large-scale comparison of world music corpora with computational tools), Top-Right (Commercial Award Winner – Movable Type: The Card Game), Bottom-Left (Artistic Award Winner – Imaginary Cities) and Bottom-Right (Teaching / Learning Award Winner – Vittoria’s World of Stories)

There is also a Staff Award which recognises a project completed by a staff member or team, with the winner and runner up being announced at the Symposium along with the other award winners.

The closing date for entering your work for the 2018 round of BL Labs Awards is midnight BST on Thursday 11th October (2018). Please submit your entry and/or help us spread the word to all interested and relevant parties over the next few months. This will ensure we have another year of fantastic digital-based projects highlighted by the Awards!

Read more about the Awards (FAQs, Terms & Conditions etc), practise your application with this text version, and then submit your entry online!

The entries will be shortlisted after the submission deadline (11/10/2018) has passed, and selected shortlisted entrants will be notified via email by midnight BST on Friday 26th October 2018. 

A prize of £500 will be awarded to the winner and £100 to the runner up in each of the Awards categories at the BL Labs Symposium on 12th November 2018 at the British Library, St Pancras, London.

The talent of the BL Labs Awards winners and runners up from the last three years has resulted in a remarkable and varied collection of innovative projects. You can read about some of last year's Awards winners and runners up in our other blogs, links below:

BLAwards2018-Staff
British Library Labs Staff Award Winner – Two Centuries of Indian Print

To act as a source of inspiration for future awards entrants, all entries submitted for awards in previous years can be browsed in our online Awards archive.

For any further information about BL Labs or our Awards, please contact us at labs@bl.uk.

30 July 2018

British Library Labs Staff Awards 2018: Looking for entries now!


Four-light-bulbs

Nominate a British Library staff member or a team that has done something exciting, innovative and cool with the British Library’s digital collections or data.

The 2018 British Library Labs Staff Award, now in its third year, gives recognition to current British Library staff who have created something brilliant using the Library’s digital collections or data.

Perhaps you know of a project that developed new forms of knowledge, or an activity that delivered commercial value to the library. Did the person or team create an artistic work that inspired, stimulated, amazed and provoked? Do you know of a project developed by the Library where quality learning experiences were generated using the Library’s digital content? 

You may nominate a current member of British Library staff, a team, or yourself, for the Staff Award using this form.

The deadline for submission is 12:00 (BST), Friday 12 October 2018.

Nominees will be highlighted on Monday 12 November 2018 at the British Library Labs Annual Symposium where some (winners and runners-up) will also be asked to talk about their projects.

You can see the projects submitted by members of staff for the last two years' awards in our online archive, as well as blogs for last year's winners and runners-up.

The Staff Award complements the British Library Labs Awards, introduced in 2015, which recognise outstanding work that has been done in the broader community. Last year’s winners in the public competition drew attention to artistic, research, teaching & learning, and commercial activities that used our digital collections.

British Library Labs is a project within the Digital Scholarship department at the British Library that supports and inspires the use of the Library's digital collections and data in exciting and innovative ways. It is funded by the Andrew W. Mellon Foundation.

If you have any questions, please contact us at labs@bl.uk.

@bl_labs #bldigital @bl_digischol

16 July 2018

Crowdsourcing comedy: date and genre results from In the Spotlight


Beatrice Ashton-Lelliott is a PhD researcher at the University of Portsmouth studying the presentation of nineteenth-century magicians in biographies, literature, and the popular press. She is currently a research placement student on the British Library’s In the Spotlight project, cleaning and contextualising the crowdsourced playbills data. She can be found on Twitter at @beeashlell and you can join the In the Spotlight project at playbills.libcrowds.com.

In this blog post I discuss the data created so far by In the Spotlight volunteers via crowdsourcing – which has already thrown up quite a few surprises along the way! All of the data which I discuss was cleaned using OpenRefine, with some manual intervention by me to group categories such as genre. My first post below highlights the most notable results to come out of the date and genre tasks so far, and a second post will present similar findings for play titles and playwrights.

Dates

I started off by analysing the dates generated by the projects as, to be honest, it seemed easiest! One of the problems we’ve encountered with the date tasks, however, is that a number of the playbills do not show a full date. This is notable in itself but unsurprising – why would a playbill in the eighteenth or nineteenth century need a full date when it wasn’t expected to last two hundred years into the future? With that in mind, this is by no means an exhaustive data set.

After creating a simple graph of the most popular dates, it became clear that we had a huge spike in the number of performances in 1825. Was something relevant to theatre history happening during this year, or were the sources of the playbill collections just unusually pro-active in 1825 after taking some time off? Was the paper stock quality better, so more playbills have lasted? The outside influence of the original collector or owner of these playbills is also something to consider, for instance, maybe he was more interested in one type of performance than others, had more time to collect playbills in certain years or in certain places, and so on. A final potential factor is that this data also only comes from the volumes added to the site projects so far, and so isn’t indicative of the Library’s playbills as a whole.

Aside from source or collector influence, some other possible explanations do present themselves. Britain in general was growing exponentially, with London in particular becoming one of the biggest cities in the world, and this era also saw the birth of railways and the extravagant influence of figures such as George IV. As this is coming off the back of what seems to be a very slow year in 1824, however, perhaps it is best just to chalk this up to the activity of the collectors. We also have another noticeable spike in 1829, but by no means as dramatic as that of 1825. I’ve spent a bit of time comparing the number of performances seen in the volumes with other online performance date tools, such as UMass's Adelphi calendar and Godwin’s Diary, but would love to hear any further insights into this!

A graph showing the most popular performance dates

Genre

The main issue I faced in working with the genre data was the wide variety of descriptors used on the playbills themselves. For instance, I encountered burlesque, burletta and burlesque burletta – which of the first two categories would the last one go under? When I went back to the playbills themselves, it was also clear that many of the ‘genres’ generated were more like comments from theatre managers or just descriptions e.g. ‘an amusing sketch’. With this in mind, genre was the dataset which I ‘interfered’ with the most from a cleaning point of view.

Some of the calls I made were to group anything cited as ‘dramatic ___’ with drama more widely, unless it had a notable second qualifier, such as pantomime, Romance or sketch. I also grouped anything mentioning ‘historical’ together, as from a research point of view this is probably the most prominent aspect, grouped harlequinades with pantomimes (although I know this might be controversial!) and grouped anything which involved a large organisation, such as military, Masonic or national performances, under ‘organisational’. Some were difficult to separate – I did wonder about grouping variety and vaudeville together, but as there were so few of each it seemed better to leave them be.
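The grouping calls described above can be written down as explicit rules. The Python sketch below is one possible reading, not the OpenRefine steps actually used: the function `group_genre`, the order in which the rules are tried, and the choice to file a qualified 'dramatic ___' label under its second qualifier are all assumptions.

```python
import re

# Second qualifiers that keep a 'dramatic ___' label out of plain drama.
QUALIFIERS = ("pantomime", "romance", "sketch")
# Words that signal a performance tied to a large organisation.
ORGANISATIONAL = ("military", "masonic", "national")


def group_genre(raw):
    """Map a transcribed genre label to a broader group (assumed rule order)."""
    words = re.findall(r"[a-z]+", raw.lower())
    if "historical" in words:
        return "historical"
    if any(word in words for word in ORGANISATIONAL):
        return "organisational"
    if "harlequinade" in words:
        return "pantomime"
    if words and words[0] == "dramatic":
        for qualifier in QUALIFIERS:
            if qualifier in words:
                return qualifier
        return "drama"
    # Anything else is left as transcribed, just lowercased and tidied.
    return " ".join(words)


for label in ["Dramatic Sketch", "dramatic piece", "Grand Historical Drama", "Harlequinade"]:
    print(label, "->", group_genre(label))
```

Labels that match none of the rules, such as variety and vaudeville, pass through unchanged, which mirrors the decision to leave those categories as they are.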

With these qualifications in mind, by far the most popular genre in the collections was farce, which I kept distinct from comedy, clocking up 537 performances from the projects. This was closely followed by comedy more generally with 527 performances, with drama (197), melodrama (150) and tragedy (135) trailing afterwards. Once again, it could purely be that the original collectors of these volumes had more of a taste for comedy than drama, but there is such a wide gap in popularity from the volumes so far that it seems fair to conclude that the regional theatre-going public of the late eighteenth and early nineteenth centuries preferred to be cheered rather than saddened by their entertainment.

A graph showing the most popular genres in records transcribed to date

You can contribute to this research

The more contributions we receive, the more accurate the results for titles, genres and dates will be, so whether you’re looking out for your local theatre or interested in the more unusual performances which crop up, get involved with the project today at playbills.libcrowds.com. In the Spotlight is well on the way to hitting 100,000 contributions – make sure that yours is one of them!

14 May 2018

Seeing British Library collections through a digital lens


Digital Curator Mia Ridge writes: in this guest post, Dr Giles Bergel describes some experiments with the Library's digitised images...

The University of Oxford’s Visual Geometry Group has been working with a number of British Library curators to apply computer vision technology to their collections. On April 5 of this year I was invited by BL Digital Curator Dr. Mia Ridge to St. Pancras to showcase some of this work and to give curators the opportunity to try the tools out for themselves.  

Visual Geometry’s VISE tool matching two identical images from separate books digitised for the British Library’s Two Centuries of Indian Print project.

Computer vision - the extraction of meaning from images - has made considerable strides in recent years, particularly through the application of so-called ‘deep learning’ to large datasets. Cultural collections provide some of the most interesting test-cases for computer vision researchers, due to their complexity, the intensity of interest that researchers bring to them, and their importance for human well-being. Can computers see collections as humans do? Computer vision is perhaps better regarded as a powerful lens than as a substitute for human curation. A computer can search a large collection of images far more quickly than a single picture researcher can: while it will not bring the same contextual understanding to bear on an image, it has the advantage of speed and comprehensiveness. Sometimes, a computer vision system can surprise the researcher by suggesting similarities that weren’t readily apparent.

As a relatively new technology, computer vision attracts legitimate concerns about privacy, ethics and fairness. By making its state-of-the-art tools freely available, Visual Geometry hopes to encourage experimentation and responsible use, and to enlist users to help determine what the tools can and cannot do. Cultural collections provide a searching test-case for the state of the art, due to their diversity as media (prints, paintings, stamped images, photographs, film and more), each of which invites different responses. One BL curator made a telling point by searching the BBC News collection with the term 'football': the system was presented with images previously tagged with that word that related to American, Gaelic, Rugby and Association football. Although inconclusive due to lack of sufficiently specific training data, the test asked whether a computer could (or should) pick the most popular instances; attempt to generalise across multiple meanings; or discern separate usages. Despite increases in processing power and in software methods, computers' ability to generalise; to extract semantic meaning from images or texts; and to cope with overlapping or ambiguous concepts remains very basic.

Other tests with BL images have been more immediately successful. Visual Geometry's Traherne tool, developed originally to detect differences in typesetting in early printed books, worked well with many materials that exhibit small differences, such as postage stamps or doctored photographs. Visual Geometry's Image Search Engine (VISE) has shown itself capable of retrieving matching illustrations in books digitised for the Library's Indian Print project, as well as certain bookbinding features, or popular printed ballads. Some years ago Visual Geometry produced a search interface for the Library's 1 Million Images release. A collaboration between the Library's Endangered Archives programme and Oxford researcher David Zeitlyn on the archive of Cameroonian studio photographer Jacques Toussele employed facial recognition as well as pattern detection. VGG's facial recognition software works on video (BBC News, for example) as well as still photographs and art, and is soon to be freely released to join other tools under the banner of the Seebibyte Project.    

I'll be returning to the Library in June to help curators explore using the tools with their own images. For more information on the work of Visual Geometry on cultural collections, subscribe to the project's Google Group or contact Giles Bergel.      

Dr. Giles Bergel is a digital humanist based in the Visual Geometry Group in the Department of Engineering Science at the University of Oxford.  

The event was supported by the Seebibyte project under EPSRC Programme Grant EP/M013774/1.