04 October 2013
Collecting new data
This week Elizabeth Newbold and Lee-Ann Coleman reflect on a week of data-related meetings in Washington DC.
You can’t go to Washington DC and not go to the Air and Space Museum – at least not if you’re interested in science and have a free day. We saw a great exhibition about the Wright brothers and their experiments into flight, which highlighted the value of collecting new data. The brothers relied upon existing ‘tables of coefficients’ to factor into their equations for calculating lift and drag upon different wing shapes. But their experiments showed that the coefficient for the density of air – in use since the 18th Century – did not appear to be right. They determined a new average which, using modern techniques, was shown to be very close to the correct value. What a great demonstration of the value of re-use of data and being open to evaluating it in the light of new evidence.
Most of the week of 16-20 September was not spent at great museums but at the Research Data Alliance second plenary and the DataCite Summer Meeting. Both were held (mostly) in the beautiful National Academy of Sciences building, where a statue of Einstein looks benignly over the gardens.
This was our first time at an RDA meeting – not surprising, since it is a relatively new venture brought about by the US National Science Foundation, the Australian Government and the European Commission – so we weren’t too sure what to expect. On the first day, an array of impressive speakers highlighted the value of access to research data. The second day was reserved for meetings of the working groups and on the third day, representatives of these reported back to the whole assembly. These meetings are not typical academic conferences but are intended to be working meetings and will be held twice a year, meaning that being involved requires significant commitment.
The speakers, including Tom Kalil (Deputy Director for Technology and Innovation, White House Office of Science and Technology Policy) and John Wilbanks (Chief Commons Officer, Sage Bionetworks), highlighted the need to speed up scientific discovery and its applications. For this to happen, the implementation of frameworks, legal as well as technical, is required. President Obama signed the open data executive order in May this year – but implementation is the next step.
The RDA aims to create a community of practice and a pipeline of impact, and it is doing this through both working groups and interest groups. Interest groups cover broad, on-going topics with loosely defined goals; as clarity emerges about a problem or issue to address, they may develop into working groups. Working groups produce case statements of what they will do, meet virtually every 4 to 6 weeks and are expected to last around 12 – 18 months. Some of these groups are very focussed, addressing a particular, often technical, issue but others seem less well defined. Given that the RDA is only just becoming established, great progress has been made, but with a ‘ground-up’ approach it is difficult to know whether all the issues are being addressed. The RDA is currently seeking an executive director, who will hopefully provide a clearer sense of direction, an aerial view of the landscape and a better articulated strategy for achieving its aims.
And then it was on to particle physics – for the start of the DataCite Summer Meeting. Salvatore Mele, Head of Open Access at CERN, told us that the dataset providing evidence of the Higgs boson had been cited in a paper with a DataCite DOI.
Some other highlights from the meeting included the presentation from Michael Witt of Purdue University about the Purdue University Research Repository – aka PURR. It has a lot of nice features to encourage researchers to upload and store data and to produce data management plans. It issues DOIs, emails users monthly with metrics and offers secure and reliable preservation for 10 years.
There were presentations from a range of organisations with interests in citation and identification. Thomson Reuters discussed their Data Citation Index, launched last year, with 3 million records. While it is not aiming to be the most comprehensive, it is aiming to link to important, relevant scientific data. CrossRef highlighted the new service they are offering called FundRef, which aims to enable tracking of funding sources to publications and other outputs. A pilot is underway involving several US funding agencies and the Wellcome Trust, and a registry of over 4000 funding body names has been created. Ultimately, funder IDs, grant numbers and DOIs could be linked. The presentation from ORCID – an identifier service for individuals – demonstrated that it’s not just for the John Smiths! Over 280,000 identifiers have been issued since October 2012. Grant submission systems are starting to ask for ORCID IDs during the submission process and HEIs are also getting on board. Some publishers are also requesting them.
So a lot of interesting data for thought – but considering the Wright brothers again provides a reminder that the reason for all of this activity is to enable research and to support the people generating the data in the first place, so that they can make better use of it and, as a result, enhance science.