29 November 2018
Dealing with computer viruses in digital collections
Evanthia Samaras, PhD placement - Digital Preservation
Malware, or ‘malicious software’, such as computer viruses, is a significant digital collection care challenge. The British Library collects a large range of digital content, so it is important that we identify any malware that could put the digital collections, or our users, at risk. We also need to properly consider the question: how should we deal with malware-infected materials in digital collections?
How do we identify malware?
The Library has strict processes in place to check for malware in digital collections. For example:
- As part of our Flashback disk imaging project, we have scanned over 16,000 floppy, CD and DVD discs from the 1980s to 2000s for malware using anti-virus software. Infected items are then moved to a designated ‘quarantine’ area.
- For websites collected as part of the UK Web Archive, the Library scans every file collected (several billion files each year!). Website files infected with malware are quarantined and ‘deactivated’ using an encryption tool so that the files cannot be read or opened (see this blog for more information).
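The scan, quarantine and ‘deactivate’ steps described above can be illustrated with a minimal Python sketch. This is not the Library's actual tooling: the function names, the set of known-bad SHA-256 digests (a simple stand-in for a real anti-virus engine) and the XOR transform (a stand-in for a proper encryption tool) are all assumptions made for illustration.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def deactivate(data: bytes, key: bytes) -> bytes:
    """XOR the data with a repeating key.

    Assumption: this is reversible obfuscation standing in for a real
    encryption tool. It is NOT secure encryption.
    """
    if not key:
        raise ValueError("key must not be empty")
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))


def quarantine_infected(collection: Path, quarantine: Path,
                        known_bad: set, key: bytes) -> list:
    """Move files whose digest is in known_bad into the quarantine folder.

    Each quarantined copy is stored in 'deactivated' form, so it cannot be
    opened or run by accident, and the original file is removed.
    """
    quarantine.mkdir(parents=True, exist_ok=True)
    moved = []
    # Build the file list up front so new quarantine files are not rescanned.
    for path in sorted(p for p in collection.rglob("*") if p.is_file()):
        digest = sha256_of(path)
        if digest in known_bad:
            target = quarantine / f"{digest}.quarantined"
            target.write_bytes(deactivate(path.read_bytes(), key))
            path.unlink()
            moved.append(target)
    return moved
```

Because XOR with the same key is its own inverse, calling `deactivate` again on a quarantined file restores the original bytes. That mirrors the idea that a quarantined item is made unreadable rather than destroyed, so it remains available for later processing.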
We do more virus checking than many other libraries around the world, especially for our web archives.
What are the options for dealing with malware?
The four main options for dealing with malware-infected material are:
- Discard the infected material.
- Put aside and quarantine (then process at a later date).
- Fix the infected material (try to remove the malware).
- Try to get another clean version from publisher/donor.
There is also a fifth option: keeping the malware as a collection in its own right.
Should we collect malware?
Scholars such as Jonathan Farbowitz of New York University argue that we should be preserving malware. He suggests that:
Malware is a form of cultural heritage and an important part of the historical record… If malware were not preserved, a significant portion of contemporary computer users’ experiences as well as the “texture” of the internet and of computing itself would be lost (pp. 10, 12).
If the British Library were to start forming collections of malware, how could we ensure they are maintained safely over time?
Computer security and anti-virus software companies collect examples of malware for research and development (see the Anti-Malware Testing Standards Organization’s Real-Time Threat List). Therefore, it is indeed possible to keep malware in controlled environments over time to facilitate study.
But it is less clear whether libraries should take custodianship of such material. Could it jeopardise the ongoing care of our digital collections?
Malware in the future
It is expected that the British Library will have to deal with malware for many years to come. Making sure our collections remain safe and usable for our readers is a priority for the Library. Yet it is also important that we consider what our readers may want to access in the future. Perhaps malware could be a collection in its own right? But for now, we will continue to tread with caution when dealing with malware in our digital collections.