2015 UK Domain Crawl has started
We are proud to announce that the 2015 UK Domain Crawl has started!
Over the next few weeks, our web crawler will visit every website in the UK, download it, and keep it safe on the British Library archive servers.
The first ever UK Domain Crawl was run in 2013. It resulted in:
- 3.8 million seeds (starting URLs)
- 31TB of data
- 1.9 billion web pages and other assets
The 2014 crawl built on that experience and yielded:
- 20 million seeds
- GeoIP check of UK-hosted websites (2.5 million seeds)
- 56TB of data
- 2.5 billion web pages and other assets
- including 4.7GB of viruses and 3.2TB of screenshots
What will the 2015 crawl be like? Will we find more URLs? Surely the web grows every day, but by how much? Will there be more data? Will we have more virus content?
Tweet your suggestions and thoughts about the UK Domain Crawl to @UKWebArchive or use the hashtag #UKWebCrawl2015.
Homepage Crawl Log Flypast © Andy Jackson