UK Web Archives Forum @ BBC Broadcasting House
Friday 19th June saw the first UK Web Archives Forum at Broadcasting House. BBC Archives set it up as an opportunity to bring the British Library, the National Archives and Channel 4 together with BBC Archives to discuss current archiving policies and practice in the ever-shifting world of web and social media archiving. Representatives from each of these institutions were present, including the BBC's own Web Archives team.
The session was very well received, and everyone involved came away with lots of new ideas and potential future collaborations. Presentations and overviews of the current state of play in web archiving were shared, followed by in-depth discussions on the moving landscape of web archiving methodology and the challenges of archiving social media.
Of great interest was the work underway at BBC Archives, the British Library and the National Archives on archiving Twitter communications. Other major areas of interest were standards and practices. The BBC has, for example, adopted a number of solutions for web archiving, including crawling WARCs, generating PDFs, screencasts and document archiving, to ensure all bases are covered in preserving bbc.co.uk. It was interesting to see the scale adopted by the British Library in preserving the .uk web domain. The National Archives also explained their challenges in archiving .gov websites and the large array of government-funded organisations at national levels.
It was decided that we would meet again in the future to look at more collaborations, quality assurance of our archive results and how best to tackle future online distribution platforms, especially social media and mobile applications, where younger generations are now consuming content at a faster rate than ever before. There are a lot of exciting challenges in the area of web archiving, so a forum to discuss and shape policies and practices is vital. We hope to work with the Digital Preservation Coalition on future workshops in this field, to help provide common standards for all concerned and to share those standards with the wider UK web archiving community.
Some of the tools under discussion:
- GNU Wget is a free software package for retrieving files using HTTP, HTTPS and FTP
- scrapy.org - An open source and collaborative framework for extracting the data you need from websites.
- BL PDF Presentation: Beyond the Harvest: Long Term Preservation of the UK Web Archive
- Info on the Web ARChive (WARC) archive format
- The Chilling Effects database collects and analyses legal complaints and requests for removal of online materials
- Zotero is free and open-source reference management software to manage bibliographic data and related research materials (such as PDF files)
- PhantomJS is a scripted, headless browser used for automating web page interaction
- http://keepvid.com/ KeepVid Video Downloader is a free web application that allows you to download videos
- Webrecorder - Create high-quality, verifiable archival recording of the content you browse. Download and preserve the content for future use.
- mirror-web.com - Web Archive Company
- Hanzo Archives - Web Archive Company
- Internet Memory Foundation - Web Archive Company
By Carl Davies, Archive Manager, Radio & Multiplatform, BBC Engineering