A New Playback Tool for the UK Web Archive
We are delighted to announce that the UK Web Archive will be working with Rhizome to build a version of pywb (Python Wayback) that we hope will greatly improve the quality of playback for access to our archived content.
What is playback of a web archive?
When we archive the web, just downloading the content is not enough. Data can be copied from the web into an archive in a variety of ways, but making that archive accessible takes more than opening the downloaded files in a web browser. Pages and scripts coming out of the archive need to be presented in a way that lets them work just like the originals, even though they are no longer located on their original servers. Today’s web users have come to expect interactive features and dynamic layouts on all types of websites. Faithfully reproducing these behaviors in the archive has become an increasingly complex challenge, requiring web archive playback software that keeps pace with the evolution of the web as a whole.
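To give a flavour of what presenting archived pages "just like the originals" involves, here is a minimal sketch of one common playback step: rewriting the links in an archived page so that they point back into the archive rather than out to the live web. The archive prefix, the function name and the timestamp format are assumptions made for illustration, not pywb's actual code; real playback tools also have to handle scripts, stylesheets, HTTP headers and much more.

```python
import re
from urllib.parse import urljoin

# Hypothetical archive address, used for illustration only.
ARCHIVE_PREFIX = "https://archive.example.org/wayback"


def rewrite_links(html, page_url, timestamp):
    """Point href/src attributes at the archive instead of the live web.

    Simplified sketch: only quoted href and src attributes are handled.
    """
    def replace(match):
        attr, quote, target = match.group(1), match.group(2), match.group(3)
        # Resolve relative links against the address the page was captured from.
        absolute = urljoin(page_url, target)
        return f"{attr}={quote}{ARCHIVE_PREFIX}/{timestamp}/{absolute}{quote}"

    return re.sub(r'\b(href|src)=(["\'])(.*?)\2', replace, html)


page = '<a href="/about">About</a><img src="http://example.com/logo.png">'
rewritten = rewrite_links(page, "http://example.com/index.html", "20180101000000")
print(rewritten)
```

With this in place, following a link in the archived page requests the archived copy of the target rather than whatever the live site serves today.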
Currently, we use the OpenWayback playback system, originally developed by the Internet Archive. In more recent years, Rhizome have led the development of a new playback engine, called pywb (Python Wayback). This Python toolkit for accessing web archives is part of the Webrecorder project, and provides a modern and powerful alternative that is run as an open source project. pywb has been adopted rapidly: the toolkit is already being used by the Portuguese Web Archive, perma.cc, the UK National Archives, the UK Parliamentary Archive, and a number of others.
We will need to modify pywb to meet our needs, but as strong believers in open source development, we will do all of this work in the open and, wherever appropriate, fold the improvements back into the core pywb project.
If all goes to plan, we expect to contribute the following back to pywb for others to use:
- Support for the OpenSearch Wayback API, improving compatibility with OpenWayback-style lookup services, which we use to record which URLs have been archived.
- Support for the Hadoop WebHDFS API, so archived content can be delivered from a Hadoop cluster like ours.
- Improved localization support, so we can support our partner Legal Deposit Libraries and the communities they serve.
- Support for whitelists/blacklists of open access versus blocked or restricted URLs, to help us manage access effectively.
- Support for a simplified version of the recently proposed Raw Mementos extension of the Memento standard.
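As an illustration of the WebHDFS item above, the sketch below builds the kind of request a playback tool could use to read one archived record directly from a Hadoop cluster. The `op=OPEN`, `offset` and `length` parameters are part of the standard WebHDFS REST API; the host name, port, file path and function name are placeholders, and this is not a description of how the pywb integration will actually be written.

```python
from urllib.parse import quote, urlencode


def webhdfs_open_url(namenode, hdfs_path, offset, length):
    """Build a WebHDFS URL that reads `length` bytes starting at `offset`.

    `namenode` is a placeholder "host:port" string for the cluster's
    WebHDFS endpoint; `hdfs_path` is the absolute path of the archive file.
    """
    query = urlencode({"op": "OPEN", "offset": offset, "length": length})
    return f"http://{namenode}/webhdfs/v1{quote(hdfs_path)}?{query}"


# Hypothetical example: fetch a 2048-byte record stored 1024 bytes into a file.
url = webhdfs_open_url("namenode.example.org:50070", "/warcs/example.warc.gz", 1024, 2048)
print(url)
```

Reading by byte range matters because a single archive file usually holds many captured resources, and playback only needs the one record that the lookup service points to.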
Other UKWA-specific changes, like theming, implementing our Legal Deposit restrictions, and deployment support, will be maintained separately.
Initially, we will work with Rhizome to ensure our staff and curators can access our archived material via both pywb and OpenWayback. If the new playback tool performs as expected, we will move towards using pywb to support public access to all our web archives.