Science blog

3 posts categorized "Digital scholarship"

03 February 2017

HPC & Big Data


Matt and Philip attended the HPC & Big Data conference on Wednesday 1st February. This is an annual one-day conference on the uses of high-performance computing and especially on big data. “Big data” is used widely to mean very large collections of data in science, social science, and business.

There were some very interesting presentations over the day. Anthony Lee from our friends at the Turing Institute discussed the Institute’s plans for the future and the potential of big data in general. The increasing amounts of data being created in “big science” experiments and the world at large mean that the hard part of research has shifted from collecting data to processing it, as processing capabilities are overwhelmed by the sheer volume.

A presentation from the Earlham Institute and Verne Global revealed that Iceland could become a centre for high-performance computing in the future, thanks to its combination of cheap, green electricity from hydroelectric and geothermal power, high-bandwidth data links to other continents, and a cool climate which reduces the need for active cooling of equipment. HPC worldwide now consumes more energy than the entire airline industry, and more than whole countries of the size and development level of Italy and Spain.

Dave Underwood of the Met Office described the Met Office’s acquisition of the largest HPC computer in Europe. He also pointed out the extremely male-biased demographic of the event, something that both Matt and Philip had noticed (although, we admit, one of our female team members could have gone instead of Philip).

Luciano Floridi of Oxford University discussed the ethical issues of Big Data and pointed out that as intangibles become a greater portion of companies’ value, so scandal becomes more damaging to them. Current controversies involving behaviour on the internet suggest that moral principles of security, privacy, and freedom of speech may be increasingly conflicting with one another, leading to difficult questions of how to balance them.

JISC gave a presentation on their actual and planned shared HPC data centres, and invited representatives from our friends and neighbours at the Crick Institute and the Wellcome Trust’s Sanger Institute to speak on their IT plans. Alison Davis from Crick pointed out that an underrated problem for academic IT departments is individual researchers’ desire to carry huge quantities of digital data with them when they move institutions, causing extra demand on storage and raising difficult issues of ownership.

Finally, Richard Self of the University of Derby gave an illuminating presentation on the potential pitfalls of “big data” in social science and business, such as the fact that the size of a sample does not guarantee that it is representative of the whole population, the probability of finding apparent correlations in a large sample that are created by chance and not causation, and the lack of guaranteed veracity. (For example, in one investigation 14% of geographical locations from mobile phone data were 65km or more out of place.)
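The point about chance correlations can be illustrated with a small simulation. This is only a sketch under our own assumptions, not something shown at the conference: the function names, the 200 variables, the 20 observations and the seed are all made up for illustration. It compares one column of pure noise against many other unrelated noise columns and reports the strongest correlation found.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def strongest_chance_correlation(n_variables=200, n_observations=20, seed=0):
    """Largest |r| between one noise column and many unrelated noise columns.

    All parameter values are hypothetical, chosen only for illustration.
    """
    rng = random.Random(seed)
    target = [rng.gauss(0, 1) for _ in range(n_observations)]
    best = 0.0
    for _ in range(n_variables):
        # Each candidate is generated independently of the target,
        # so any correlation found is due to chance alone.
        candidate = [rng.gauss(0, 1) for _ in range(n_observations)]
        best = max(best, abs(pearson(target, candidate)))
    return best


strongest = strongest_chance_correlation()
print(f"Strongest correlation found by chance: {strongest:.2f}")
```

None of the columns has any causal link to another, yet searching through enough of them will almost always turn up one that looks related. The more variables a dataset offers, the easier it is to find such a pattern.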

Philip Eagle, Content Expert - STM

05 September 2016

Social Media Data: What’s the use?


Team ScienceBL is pleased to bring you #TheDataDebates -  an exciting new partnership with the AHRC, the ESRC and the Alan Turing Institute. In our first event on 21st September we’re discussing social media. Join us!

Every day people around the world post a staggering 400 million tweets, upload 350 million photos to Facebook and view 4 billion videos on YouTube. Analysing this mass of data can help us understand how people think and act but there are also many potential problems.  Ahead of the event, we looked into a few interesting applications of social media data.

Politically correct? 

During the 2015 General Election, experts used a technique called sentiment analysis to examine Twitter users’ reactions to the televised leadership debates1. But is this type of analysis actually useful? Some think that tweets are spontaneous and might not represent the more calculated political decisions of voters.

On the other side of the pond, Obama’s election strategy in 2012 made use of social media data on an unprecedented scale2. A huge data analytics team looked at social media data for patterns in past voter characteristics and used this information to inform their marketing strategy - e.g. broadcasting TV adverts in specific slots targeted at swing voters and virtually scouring the social media networks of Obama supporters on the hunt for friends who could be persuaded to join the campaign as well. 


In this year's US election, both Hillary Clinton and Donald Trump are making the most of social media's huge reach to rally support. The Trump campaign has recently released the America First app which collects personal data and awards points for recruiting friends3. Meanwhile Democrat nominee Clinton is building on the work of Barack Obama's social media team and exploring platforms such as Pinterest and YouTube4. Only time will tell who the eventual winner will be.

Playing the market

You know how Amazon suggests items you might like based on the items you’ve browsed on their site? This is a common marketing technique that allows companies to re-advertise products to users who have shown some interest in the brand but might not have bought anything. Linking browsing history to social media comments has the potential to make this targeted marketing even more sophisticated4.

Credit where credit’s due?

Many ‘new generation’ loan companies don’t use traditional credit checks but instead gather other information on an individual, including social media data, and then decide whether to grant the loan5. Opinion is divided as to whether this new model is a good thing. On the one hand it allows people who might have been rejected by traditional checks to get credit. But critics say that people are being judged on data that they assume is private. And could this be a slippery slope to allowing other industries (e.g. insurance) to gather information in this way? Could this lead to discrimination?


What's the problem?

Despite all these applications there’s lots of discussion about the best way to analyse social media data. How can we control for biases and how do we make sure our samples are representative? There are also concerns about privacy and consent. Some social media data (like Twitter) is public and can be seen and used by anyone (subject to terms and conditions). But most Facebook data is only visible to people specified by the user. The problem is: do users always know what they are signing up for?


Lots of big data companies are using anonymised data (where obvious identifiers like name and date of birth are removed) which can be distributed without the users’ consent. But there may still be the potential for individuals to be re-identified, especially if multiple datasets are combined, and this is a major problem for many concerned with privacy.

If you are an avid social media user, a big data specialist, a privacy advocate or are simply interested in finding out more, join us on 21st September to discuss further. Tickets are available here.

Katie Howe

15 March 2016

Tunny and Colossus: Donald Michie and Bletchley Park


In honour of British Science Week Jonathan Pledge explores the work of Donald Michie, a code-breaker at Bletchley Park from 1942 to 1945. The Donald Michie papers are held at the British Library.

Donald Michie (1923-2007) was a scientist who made key contributions in the fields of cryptography, mammalian genetics and artificial intelligence (AI).

Copy of a photograph of Donald Michie taken while he was at Bletchley Park (Add MS 89072/1/5). Copyright the estate of Donald Michie/Crown Copyright.

In 1942, Michie began working at Bletchley Park in Buckinghamshire as a code-breaker under Max H. A. Newman. His role was to decrypt the German Lorenz teleprinter cypher - codenamed ‘Tunny’.

The Tunny machine was attached to a teleprinter and encoded messages via a system of two sets of five rotating wheels, named ‘psi’ and ‘chi’ by the code-breakers. The starting position of the wheels, known as a wheel pattern, was decided by a predetermined code before the operator entered the message. The encryption worked by adding the letter generated by the psi and chi wheels to each letter of the unencrypted message entered by the operator, producing an enciphered letter. The addition worked by using a simple rule represented here as dots and crosses:

• + • = •

x + x = •

• + x = x

x + • = x

Therefore using these rules, M in the teleprinter alphabet, represented as:  • • x x x, added to N: • • x x •, gives • • • • x, the letter T.
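The addition rule above can be sketched in a few lines of Python. This is an illustration only, not a model of the Lorenz machine itself: the function name `add_units` is our own, and the only letters included are the three codes quoted in the example above.

```python
# The three teleprinter letters quoted in the text, written as
# five units of dots and crosses.
M = "••xxx"
N = "••xx•"
T = "••••x"


def add_units(a: str, b: str) -> str:
    """Add two letters unit by unit.

    Two like units (dot + dot, cross + cross) give a dot;
    two unlike units give a cross.
    """
    if len(a) != len(b):
        raise ValueError("letters must have the same number of units")
    return "".join("•" if x == y else "x" for x, y in zip(a, b))


enciphered = add_units(M, N)
print(enciphered == T)  # True: M + N gives T, as in the text

# Adding the same wheel letter a second time cancels it out,
# because every unit added to itself gives a dot.
recovered = add_units(enciphered, N)
print(recovered == M)  # True: T + N gives back M
```

The second step shows why the same rule serves for both enciphering and deciphering: a receiver whose wheels generate the same letters can add them to the enciphered text to recover the original message.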

Detail of the Lorenz machine showing the encoding wheels. Creative Commons Licence.

In order for messages to be decrypted it was necessary to know the position of the encoding wheels before the message was sent. These were initially established by the use of ‘depths’. A depth occurred when the Tunny operator mistakenly repeated the same message with subtle textual differences without first resetting the encoding wheels.

A depth was first intercepted on 30 August 1941 and the encoding text was deciphered by John Tiltman. From this the working details of Tunny were established by the mathematician William Tutte without his ever having seen the machine itself, an astonishing feat. Using Tutte’s deduction the mathematician Alan Turing came up with a system for working out the wheel patterns, known as ‘Turingery’.

Turing, known today for his role in breaking the German navy’s ‘Enigma’ code, was at the time best known for his 1936 paper ‘On Computable Numbers’ in which he had theorised about a ‘Universal Turing Machine’ which today we would recognise as a computer. Turing’s ideas on ‘intelligent machines’, along with his friendship, were to have a lasting effect on Michie and his future career in AI and robotics.

Between July and October 1942, all German Tunny messages were decrypted by hand. However, changes to the way the cypher was generated meant that finding the wheel settings by hand was no longer feasible. It was again William Tutte who came up with a statistical method for finding the wheel settings and it was the mathematician Max Newman who suggested using a machine for processing the data.

Colossus computer [c 1944]. By the end of the War there were ten such machines at Bletchley. Crown Copyright.

Initially an electronic counter dubbed ‘Heath Robinson’ was used for data processing. However, it was not until the engineer Thomas Flowers designed and built Colossus, the world’s first large-scale electronic computer, that wheel patterns and therefore the messages could be decrypted at speed. Michie too, along with Jack Good, played a part, discovering a way of using Colossus to dramatically reduce the processing time for ciphered texts.

The decrypting of Tunny messages was critical in providing the Allies with information on high-level German military planning, in particular for the Battle of Kursk in 1943 and around preparations for the D-Day invasion of 1944.

One of the great ironies is that much of this pioneering and critical work remained a state secret until 1996. It was only through Donald Michie’s tireless campaigning that the General Report on Tunny, written in 1945 by Michie, Jack Good and Geoffrey Timmins, was finally declassified by the British Government, providing proof of the code-breakers’ collective achievements during the War.

Pages from Donald Michie’s copy of the General Report on Tunny. (Add MS 89072/1/6). Crown Copyright.

Donald Michie at the British Library

The Donald Michie Papers at the British Library comprise three separate tranches of material gifted to the Library in 2004 and 2007. They consist of correspondence, notes, notebooks, offprints and photographs and are available to researchers through the British Library’s Explore Archives and Manuscripts catalogue at Add MS 88958, Add MS 88975 and Add MS 89072.


Jonathan Pledge: Curator of Contemporary Archives and Manuscripts, Public and Political Life

Read more about ciphers in the British Library's collections on Untold Lives