UK Web Archive blog

Information from the team at the UK Web Archive, the Library's premier resource of archived UK websites

4 posts from November 2012

29 November 2012

Monarchy and New Media: bookings open

Bookings are now open for this one-day conference in London on Thursday 7 February 2013. We at the UK Web Archive are joint organisers with the Institute of Historical Research (University of London) and the Royal Archives.

The end of the Diamond Jubilee year affords an opportunity to look back and examine a neglected aspect of the history of the monarchy: the engagement with new forms of media. The event will include reflections on royal engagement with successive new technologies: telegraphy, radio, newsreel and television.

The event will also see the formal launch of our own jubilee collection, with reflections on our experience of creating it in collaboration with the Royal Archives and the IHR, and one historian’s engagement with the collection itself (our very own Peter Webster).

Booking costs a very reasonable £10, and further details, including a programme and a booking form, may be found here.

22 November 2012

Upgrading the Wayback Machine

We will shortly upgrade our deployment of the Open Source Wayback Machine, the software made openly available by the Internet Archive to enable browsing of timed snapshots of an archived site. (See it in action in the UK Web Archive.) We're deploying a new version made available by the Internet Archive earlier this year.

Users will immediately see some enhancements. The banner at the top will now include more information about the number of instances of each site that are available, and an easier way of navigating between them. The information will be available in Welsh, in recognition of the archive's remit for the whole of the UK, and there's also a handy Help link. For now, however, it will no longer be possible to minimise the banner and then reveal it again; once the banner has been minimised, the page must be reloaded to see it.

Behind the scenes, the new version reads directly from our Hadoop Distributed File System (HDFS), which is more cost-effective, simpler to administer, more robust, and easier to scale up to cater for growing levels of usage.

15 November 2012

Non-Print Legal Deposit Regulations 2013: what will they say?

Next year we anticipate that regulations will come into force which provide for legal deposit for non-print works, mirroring the longstanding situation for printed works. The final draft of the regulations, to be laid before Parliament, has recently been published.
Here's a summary of their impact in relation to web archiving.

What will we be collecting?

The regulations cover four deposit models, of which the most relevant for the web archiving team are that:

(i) a deposit library may copy UK publications from the open web, including websites, plus open access journals and books, government publications etc.;
(ii) a deposit library may collect other password-protected material by harvesting, subject to giving at least one month’s written notice for the publisher to provide access credentials (with some limited exemptions).

The regulations apply to any digital or other non-print publication, except:

(i) film and recorded sound where the audio-visual content predominates [but, for example, web pages containing video clips alongside text or images are within scope];
(ii) private intranets and emails;
(iii) personal data on social networking sites, or personal data that is only available to restricted groups.

The regulations apply to online publications:

(i) that are issued from a .uk or other UK geographic top-level domain; or
(ii) where part of the publishing process takes place in the UK;
(iii) but excluding any that are only accessible to audiences outside the UK.
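The three territorial tests above can be read as a simple predicate. The following is only an illustrative sketch of that reading, not a statement of how the regulations are applied in practice: the `Publication` fields, the function name and the `UK_GEOGRAPHIC_TLDS` tuple are all hypothetical, and the tuple lists only `.uk` because the summary above does not enumerate the other UK geographic domains.

```python
from dataclasses import dataclass

# Hypothetical list for illustration: the summary names ".uk" and
# "other UK geographic top-level domain" without listing the others.
UK_GEOGRAPHIC_TLDS = (".uk",)


@dataclass
class Publication:
    """Hypothetical record describing one online publication."""
    hostname: str
    published_partly_in_uk: bool
    accessible_only_outside_uk: bool


def in_territorial_scope(pub: Publication) -> bool:
    """Apply tests (i) to (iii) as summarised in the list above."""
    # (iii) is an exclusion, so it is checked first.
    if pub.accessible_only_outside_uk:
        return False
    # (i) issued from a .uk or other UK geographic top-level domain
    host = pub.hostname.lower().rstrip(".")
    on_uk_domain = host.endswith(UK_GEOGRAPHIC_TLDS)
    # (ii) part of the publishing process takes place in the UK
    return on_uk_domain or pub.published_partly_in_uk
```

On this reading, a site on a `.uk` domain is in scope by default, a site on another domain is in scope only if part of its publishing process takes place in the UK, and either kind drops out if it is only accessible to audiences outside the UK.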

What will the Library be able to do with it?

Deposited material may not be used for at least seven days after it is deposited or harvested.

After that, deposit libraries may:

(i) transfer, lend, copy and share deposited material with each other;
(ii) use deposited material for their own research;
(iii) copy deposited material, including in different formats, for preservation.

What will users be able to do with it?

Users may only access deposited material while on “library premises controlled by a deposit library”.

Users may only print one copy of a restricted amount of any deposited material, for non-commercial research or other defined ‘fair dealing’ purposes such as court proceedings, statutory enquiry, criticism and review or journalism.

No more than one user in each deposit library may access the same material at the same time.

Users may not make any digital copies, except by specific and explicit licence of the publisher.
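Two of the access rules above, the premises requirement and the limit of one user per item in each deposit library, could be modelled roughly as follows. This is a minimal sketch under stated assumptions: the `ReadingRoomAccess` class, its method names and the idea of tracking access per (library, item) pair are all hypothetical, and nothing here describes how any deposit library actually enforces the rules.

```python
class ReadingRoomAccess:
    """Hypothetical tracker for who has which deposited item open."""

    def __init__(self):
        # Maps (library, item_id) to the user currently viewing the item.
        self._in_use = {}

    def open_item(self, library, item_id, user, on_premises):
        """Return True if the user may view the item now, else False."""
        # Access is only allowed on premises controlled by a deposit library.
        if not on_premises:
            return False
        key = (library, item_id)
        current = self._in_use.get(key)
        # One user per item per library at any one time.
        if current is not None and current != user:
            return False
        self._in_use[key] = user
        return True

    def close_item(self, library, item_id, user):
        """Release the item if this user is the one holding it."""
        key = (library, item_id)
        if self._in_use.get(key) == user:
            del self._in_use[key]
```

Because the limit applies within each deposit library, the sketch keys on the library as well as the item, so the same item can be open in two different libraries at once.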

What restrictions may publishers request?

The publisher or other rights holders may request at any time an embargo of up to 3 years, and may renew such a request as many times as necessary. The requested embargo must be granted if the deposit library is satisfied on reasonable grounds that providing access would conflict with the publisher’s or rights holders’ normal exploitation of the work and unreasonably prejudice the legitimate interests of the publisher.

These conditions remain in force forever, including after all intellectual property rights in the deposited material have expired [“perpetual copyright”].

08 November 2012

Web archiving at LIKE39: what, why and how

[A guest post from Marja Kingma, curator of Dutch language collections at the British Library, and one of the leading lights of LIKE, the London Information and Knowledge Exchange.]

With a captivated audience of information professionals before him, Peter Webster (British Library) kicked off the new season of LIKE events at LIKE39. Peter had just moved to the British Library to take up his new role as Web Archiving Engagement & Liaison Officer and LIKE had just moved its meeting place to a new venue: The Castle, sister pub of The Crown Tavern. The upstairs room has state-of-the-art technical facilities and, more importantly, its own bar! And it is even closer to Farringdon Station than the Crown.

Peter is a contemporary historian and has worked with digital information in previous jobs. He now works with the UK Web Archive to raise people’s awareness of it and to encourage them to submit sites to the Archive. LIKE39 provided an excellent platform for this, because attendees knew about the general issues around the ‘Digital Black Hole’ and the ephemeral nature of the Web, but were not all familiar with the UK Web Archive.

Archiving the web, i.e. harvesting websites based in the UK on a regular basis, is just an extension of what the BL, TNA and other participants do with print material. In the past much of what has been printed has been lost, and now the same is threatening to happen with electronic material and websites. Websites either disappear completely or are abandoned, leaving no trace of a contact; Peter called these abandoned sites ‘orphans’. An example is the City Information Group, where the idea for LIKE was born just as CIG folded. Its site is still on the web, but the ‘Contact’ page is no longer available. In this case, there is a good chance that a contact can be found, but this is much more problematic in other cases.

Professionals started to see a Digital Black Hole appearing and something had to be done. In 2003 the Legal Deposit Libraries Act was passed, establishing a legal deposit requirement for publishers of electronic material, but this Act still has to be implemented. Fortunately the BL, TNA, and the Wellcome Trust didn’t wait for that to happen and started the UK Web Archive on a selective, permission-based footing. After ten years of setting up and establishing partnerships, it is now ‘business as usual’, just in time for the implementation of the LDLA, which now seems to be going forward in earnest next year. This should make redundant the current practice of asking web owners' permission to capture their site, although permission would still be necessary to make the archived copy publicly available. It should also speed up ingestion of content into the Archive, by systematic crawling of the UK domain. Alongside this method, curators will continue to create thematic collections by actively bringing together websites from within the larger dataset.

It is important that all professionals dealing with websites in one way or another prepare for the preservation of their site(s), as part of the life-cycle for records management. Archiving your website preserves it for future access; researchers as well as the general public will always be able to see what your site looked like in the past. Anyone can nominate sites for inclusion in the UK Web Archive, using the simple online form. LIKE's own website is being processed for the UK Web Archive and that is a good thing, because we like to think of LIKE as the first networking group for information professionals founded on and managed using social media tools. It would be a shame if future historians were not able to track its development from the start!