19 February 2021
Keen followers of this blog may remember a post from last December, which shared details of a virtual workshop about AI and Archives: Current Challenges and Prospects of Digital and Born-digital archives. This topic was one of three workshop themes identified by the Archives in the UK/Republic of Ireland & AI (AURA) network, which is a forum promoting discussions on how Artificial Intelligence (AI) can be applied to cultural heritage archives, and to explore issues with providing access to born digital and hybrid digital/physical collections.
The first AURA workshop on Open Data versus Privacy organised by Annalina Caputo from Dublin City University, took place on 16-17 November 2020. Rachel MacGregor provides a great write-up of this event here.
Here at the British Library, we teamed up with our friends at The National Archives to curate the second AURA workshop exploring the current challenges and prospects of born-digital archives, this took place online on 28-29 January 2021. The first day of the workshop held on 28 January was organised by The National Archives, you can read more about this day here, and the following day, 29 January, was organised by the BL, videos and slides for this can be found on the AURA blog and I've included them in this post.
The format for both days of the second AURA workshop comprised of four short presentations, two interactive breakout room sessions and a wider round-table discussion. The aim being that the event would generate dialogue around key challenges that professionals across all sectors are grappling with, with a view to identifying possible solutions.
The first day covered issues of access both from infrastructural and user’s perspectives, plus the ethical implications of the use of AI and advanced computational approaches to archival practices and research. The second day discussed challenges of access to email archives, and also issues relating to web archives and emerging format collections, including web-based interactive narratives. A round-up of the second day is below, including recorded videos of the presentations for anyone unable to attend on the day.
Kicking off day two, a warm welcome to the workshop attendees was given by Rachel Foss, Head of Contemporary Archives and Manuscripts at the British Library, Larry Stapleton, Senior academic and international consultant from the Waterford Institute of Technology and Mathieu d’ Aquin, Professor of Informatics at the National University of Ireland Galway.
The morning session on Email Archives: challenges of access and collaborative initiatives was chaired by David Kirsch, Associate Professor, Robert H. Smith School of Business, University of Maryland. This featured two presentations:
The first of these was about Working with ePADD: processes, challenges and collaborative solutions in working with email archives, by Callum McKean, Curator for Contemporary Literary and Creative Archives, British Library and Jessica Smith, Creative Arts Archivist, John Rylands Library, University of Manchester. Their slides can be viewed here and here. Apologies that the recording of Callum's talk is clipped, this was due to connectivity issues on the day.
The second presentation was Finding Light in Dark Archives: Using AI to connect context and content in email collections by Stephanie Decker, Professor of History and Strategy, University of Bristol and Santhilata Venkata, Digital Preservation Specialist & Researcher at The National Archives in the UK.
After their talks, the speakers proposed questions and challenges that attendees could discuss in smaller break-out rooms. Questions given by speakers of the morning session were:
- Are there any other appraisal or collaborative considerations that might improve our practices and offer ways forward?
- What do we lose by emphasizing usability for researchers?
- Should we start with how researchers want to use email archives now and in the future, rather than just on preservation?
- Potentialities of email archives as organizational, not just individual?
These questions led to discussions about, file formats, collection sizes, metadata standards and ways to interpret large data sets. There was interest in how email archives might allow researchers to reconstruct corporate archives, e.g. understand social dynamics of the office and understand decision making processes. It was felt that there is a need to understand the extent to which email represents organisation-level context. More questions were raised including:
- To what extent is it part of the organisational records and how should it be treated?
- How do you manage the relationship between constant organisational functions and structure (a CEO) and changing individuals?
- Who will be looking at organisational email in the future and how?
It was mentioned that there is a need to distinguish between email as data and email as an artifact, as the use-cases and preservation needs may be markedly different.
Duties of care that exist between depositors, tool designers, archivists and researchers was discussed and a question was asked about how we balance these?
- Managing human burden
- Differing levels of embargo
- Institutional frameworks
There was discussion of the research potential for comparing email and social media collections, e.g. tweet archives and also the difficulties researchers face in getting access to data sets. The monetary value of email archives was also raised and it was mentioned that perceived value, hasn’t been translated into monetary value.
Researcher needs and metadata was another topic brought up by attendees, it was suggested that the information about collections in online catalogues needs to be descriptive enough for researchers to decide if they wish to visit an institution, to view digital collections on a dedicated terminal. It was also suggested that archives and libraries need to make access restrictions, and the reasoning for these, very clear to users. This would help to manage expectations, so that researchers will know when to visit on-site because remote access is not possible. It was mentioned that it is challenging to identify use cases, but it was noted that without deeper understanding of researcher needs, it can be hard to make decisions about access provision.
It was acknowledged that the demands on human-processing are still high for born digital archives, and the relationship between tools and professionals still emergent. So there was a question about whether researchers could be involved in collaborations more, and to what extent will there be an onus on their responsibilities and liabilities in relation to usage of born digital archives?
Lots of food for thought before the break for lunch!
The afternoon session chaired by Nicole Basaraba, Postdoctoral Researcher, Studio Europa, Maastricht University, discussed Emerging Formats, Interactive Narratives and Socio-Cultural Questions in AI.
The first afternoon presentation Collecting Emerging Formats: Capturing Interactive Narratives in the UK Web Archive was given by Lynda Clark, Post-doctoral research fellow in Narrative and Play at InGAME: Innovation for Games and Media Enterprise, University of Dundee, and Giulia Carla Rossi, Curator for Digital Publications, British Library. Their slides can be viewed here.
The second afternoon session was Women Reclaiming AI: a collectively designed AI Voice Assistant by Coral Manton, Lecturer in Creative Computing, Bath Spa University, her slides can be seen here.
Following the same format as in the morning, after these presentations, the speakers proposed questions and challenges that attendees could discuss in smaller break-out rooms. Questions given by speakers of the afternoon session were:
- Should we be collecting examples of AIs, as well as using AI to preserve collections? What are the Implications of this
- How do we get more people to feel that they can ask questions about AI?
- How do we use AI to think about the complexity of what identity is and how do we engineer it so that technologies work for the benefit of everyone?
There was a general consensus, which acknowledged that AI is becoming a significant and pervasive part of our life. However it was felt that there are many aspects we don't fully understand. In the breakout groups workshop participants raised more questions, including:
- Where would AI-based items sit in collections?
- Why do we want it?
- How to collect?
- What do we want to collect? User interactions? The underlying technology? Many are patented technologies owned by corporations, so this makes it challenging.
- What would make AI more accessible?
- Some research outputs may be AI-based - do we need to collect all the code, or just the end experience produced? If the latter, could this be similar to documenting evidence e.g. video/sound recordings or transcripts.
- Could or should we use AI to collect? Who’s behind the AI? Who gets to decide what to archive and how? Who’s responsible for mistakes/misrepresentations made by the AI?
There was debate about how to define AI in terms of a publication/collection item, it was felt that an understanding of this would help to decide what archives and libraries should be collecting, and understand what is not being collected currently. It was mentioned that a need for user input is a critical factor in answering questions like this. A number of challenges of collecting using AI were raised in the group discussions, including:
- Lack of standardisation in formats and metadata
- Questions of authorship and copyright
- Ethical considerations
- Engagement with creators/developers
It was suggested that full scale automation is not completely desirable and some kind of human element is required for specialist collections. However, AI might be useful for speeding up manual human work.
There was discussion of problems of bias in data, that existing prejudices are baked into datasets and algorithms. This led to more questions about:
- Is there is a role for curators in defining and designing unbiased and more representative data sets to more fairly reflect society?
- Should archives collect training data, to understand underlying biases?
- Who is the author of AI created text and dialogue? Who is the legally responsible person/orgnisation?
- What opportunities are there for libraries and archives to teach people about digital safety through understanding datasets and how they are used?
Participants also questioned:
- Why do we humanise AI?
- Why do we give AI a gender?
- Is society ready for a genderless AI?
- Could the next progress in AI be a combination of human/AI? A biological advancement? Human with AI “components” - would that make us think of AIs as fallible?
With so many questions and a lack of answers, it was felt that fiction may also help us to better understand some of these issues, and Rachel Foss ended the roundtable discussion by saying that she is looking forward to reading Kazuo Ishiguro’s new novel Klara and the Sun, about an artificial being called Klara who longs to find a human owner, which is due to be published next month by Faber.
Thanks to everyone who spoke at and participated in this AURA workshop, to make it a lively and productive event. Extra special thanks to Deirdre Sullivan for helping to run the online event smoothly. Looking ahead, the third workshop on Artificial Intelligence and Archives: What comes next? is being organised by the University of Edinburgh in partnership with the AURA project team, and is scheduled to take place on Tuesday 16 March 2021. Please do join the AURA mailing list and follow #AURA_network on social media to be part of the network's ongoing discussions.