THE BRITISH LIBRARY

Science blog

03 February 2017

HPC & Big Data

Matt and Philip attended the HPC & Big Data conference on Wednesday 1st February. This is an annual one-day conference on the uses of high-performance computing and especially on big data. “Big data” is used widely to mean very large collections of data in science, social science, and business.

There were some very interesting presentations over the day. Anthony Lee from our friends the Turing Institute discussed the Institute’s plans for the future and the potential of big data in general. The increasing amounts of data being created in “big science” experiments and in the world at large mean that the hard part of research has shifted from collecting data to processing it, as processing capabilities are overwhelmed by the sheer volume.

A presentation from the Earlham Institute and Verne Global suggested that Iceland could become a centre for high-performance computing in the future, thanks to its combination of cheap, green electricity from hydroelectric and geothermal power, high-bandwidth data links to other continents, and a cool climate which reduces the need for active cooling of equipment. According to the presentation, HPC worldwide now consumes more energy than the entire airline industry, and more than whole countries of the size and development level of Italy and Spain.

Dave Underwood of the Met Office described the Met Office’s acquisition of the largest HPC system in Europe. He also pointed out the extremely male-biased demographic of the event, something that both Matt and Philip had noticed (although we admit, one of our female team members could have gone instead of Philip).

Luciano Floridi of Oxford University discussed the ethical issues of Big Data and pointed out that as intangibles become a greater portion of companies’ value, so scandal becomes more damaging to them. Current controversies involving behaviour on the internet suggest that moral principles of security, privacy, and freedom of speech may be increasingly conflicting with one another, leading to difficult questions of how to balance them.

JISC gave a presentation on their actual and planned shared HPC data centres, and invited representatives from our friends and neighbours at the Crick Institute and the Wellcome Trust’s Sanger Institute to speak on their IT plans. Alison Davis from Crick pointed out that an under-rated problem for academic IT departments is individual researchers’ desire to carry huge quantities of digital data with them when they move institutions, causing extra demand on storage and raising difficult issues of ownership.

Finally, Richard Self of the University of Derby gave an illuminating presentation on the potential pitfalls of “big data” in social science and business. These include the fact that the size of a sample does not guarantee that it is representative of the whole population, the likelihood of finding apparent correlations in a large sample that arise by chance rather than from causation, and the lack of guaranteed veracity. (For example, in one investigation 14% of geographical locations from mobile phone data were 65 km or more out of place.)
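
The point about chance correlations can be illustrated with a small simulation. This is our own minimal sketch, not something shown in the presentation: the function names, the number of variables, the number of observations, and the 0.5 threshold are all assumptions chosen for illustration. It generates columns of purely random, unrelated numbers and counts how many pairs of columns nevertheless look strongly correlated.

```python
import itertools
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def count_chance_correlations(n_variables=40, n_observations=20,
                              threshold=0.5, seed=0):
    """Count pairs of independent random variables with |r| above threshold.

    All parameter values are illustrative assumptions, not figures
    from the talk.
    """
    rng = random.Random(seed)
    # Every column is drawn independently, so no pair has any real link.
    columns = [
        [rng.gauss(0, 1) for _ in range(n_observations)]
        for _ in range(n_variables)
    ]
    pairs = list(itertools.combinations(range(n_variables), 2))
    strong = sum(
        1 for i, j in pairs
        if abs(pearson(columns[i], columns[j])) > threshold
    )
    return strong, len(pairs)


if __name__ == "__main__":
    strong, total = count_chance_correlations()
    print(f"{strong} of {total} pairs of unrelated variables have |r| > 0.5")
```

With 40 variables there are 780 pairs to compare, so even a small per-pair chance of a misleading result adds up to a number of "findings" that mean nothing. The more variables a dataset contains, the more such pairs there are to stumble upon.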

Philip Eagle, Content Expert - STM