Science blog

Exploring science at the British Library

4 posts from October 2013

25 October 2013

How open is it?

Elizabeth Newbold explains why not everything is as straightforward as it may appear for a science librarian dealing with Open Access content

It is Open Access Week and, as I sit in an office with Anna Kinsey, the Engagement Manager for Europe PubMed Central, Open Access has obviously been a topic of conversation. Anna has written a blog post outlining some of her frustrations with Open Access. I’m not going to repeat them (I recommend reading her post), but talking with her made me think about some of my own frustrations, as a librarian, dealing with free-to-access content.

A timely incident highlighted one of the frustrations we face as librarians in providing access to content. This week I had a question from a colleague, who asked “why can I access this article?” Possibly a slightly unusual question, as we are much more likely to be asked “why can’t I access …”, and a surprising one, as it took longer to answer than we thought it would.

In the past, the simple response would have been “because we have a subscription to the journal”, but that is no longer always true or sufficient. Alternative answers, such as “because it’s open access” or “because it’s freely available”, need to be considered. And in this particular case we actually needed to know why we had access, since the mode of access would determine how the reader could use the article.

We started with the first question: do we have a subscription? In this case, the answer was no, so we needed to see why we had access, which is where it started to get complicated. The obvious next question was: is it an OA journal or an OA article? In some cases this is easy to ascertain, but not always! So my frustration is the lack of clear identification and consistent explanations regarding OA material. Words such as “free”, “freely available”, “free access”, “open research”, “open” and “open access” are used seemingly interchangeably, all adding to the confusion. This is especially so when we move outside the realms of STM journals and look at the ever increasing and varied amount of other freely available content. Information about the access and restrictions which apply to the material is often hidden away on web pages that are difficult to find. This week I finally found the explanation I needed on the ‘information for librarians’ page. Luckily, I’m a librarian so I looked there, but would a non-librarian have found it?

This example may seem like a small issue, but it highlights increasing fragmentation at a time when we need to be able to identify, easily and consistently, why we can access something; otherwise we don’t know how it can be used.

The OA symbol originally designed by the Public Library of Science has been used to good effect, but wouldn’t it be nice if there were a universally accepted and adopted set of symbols to help identify Open Access material and navigate through the maze?

18 October 2013

Testing times

In our blog this week Katie Howe reviews our most recent TalkScience@BL event.

Last week we welcomed scientists, policy makers and members of the public to the British Library for the 22nd event in our TalkScience series - “Genetic testing in assisted reproduction: Selecting, not perfecting?”

Our panel of experts included Dr Joyce Harper (UCL), Nick Meade (Genetic Alliance UK) and Professor Rosamund Scott (King’s College London), who each presented their viewpoint and expertise around the controversial field of pre-implantation genetic diagnosis (PGD). Dr Alan Thornhill (Guy’s Hospital Assisted Conception Unit) stepped in at the last minute to chair as unfortunately Dr Tom Shakespeare (University of East Anglia) was unable to join us on the night due to illness.

Dr Alan Thornhill (far right) introduces the discussion. (Photo: Tony Grant; British Library)

Alan kicked off the evening by introducing the topic of PGD. He explained that PGD is a technique used by families with a history of a serious genetic disorder to select embryos which are unaffected by that condition. He also noted that the process is very emotionally, physically and financially demanding for families.

Nick Meade then presented two case studies of patients who had undergone PGD - one for a rare heart condition and one for a recessive muscle-wasting disease. He used these human stories to illustrate a central point: that informed choice is a key component of reproductive decision making.

Next up was Rosamund Scott, who guided the audience through the current state of the law surrounding PGD and reminded us that because PGD patients also have to undergo IVF, any ‘selection’ will always be restricted by the number of embryos that can be produced.

The final speaker was Joyce Harper, who introduced some of the new technologies that may come to prominence in the future, such as next generation sequencing, and addressed some of the ethical issues that these developments might raise.

Audience members discussing genetic testing during the TalkScience interval (Photo: Tony Grant; British Library)

After the break the discussion was opened up to the floor and generated some great debate both in the room and on Twitter.

Currently in the UK, only ‘serious’ genetic conditions can be licensed for testing, and this sparked much debate over what the word serious means and the difficulty of considering the ‘seriousness’ of diseases objectively. The discussion also turned to the use of the phrase ‘designer babies’, which has become synonymous with the field of PGD. The panel agreed that the term was overused and could trivialise some of the issues surrounding PGD. Alan Thornhill reminded us that PGD is primarily negative selection against embryos with a faulty gene, rather than positive enhancement.

Questions from the floor at TalkScience@BL (Photo: Tony Grant; British Library)

Alan summed up the evening aptly by saying, “There’s still a lot to talk about and we should probably do this again soon!”

If you were not able to join us on the night then check this webpage for a podcast of the discussion, which will be available soon. We’re now thinking ahead to the annual TalkScience Christmas Quiz ‘Let’s get Quizzical!’ on 4th December for which booking is now open. We look forward to seeing you again then!

Katie Howe

11 October 2013

What’s in a name?

This week Stephen Andrews discusses a problem he has identifying authors' names.

So here’s the problem: it’s difficult, if not impossible, to find everything that a particular author produces – and sometimes even to know whether the author is the same person.  This is because the articles, books etc. usually have unique identifiers – Digital Object Identifiers in the case of articles and datasets and International Standard Book Numbers in the case of books – but their authors do not.  As more and more information becomes available through internet search engines, such as Google, I often find myself staring at a list of names that all look very similar and then seeking out personal web pages to try to determine which one relates to the creator of the output I am interested in.  A somewhat time-consuming process.

It appears I am not alone.  Jackie Knowles of the Repository Support Project provided a good example of the sort of issue I find I am facing.  Each of the variations below is a valid entry for one particular individual:

Collis, G P
Paddy Collis
G P Collis
Gerard Paddy Collis
Collis, Gerard P
Gerard P Collis
Collis, Gerard Paddy
Collis, Paddy

Are these the same person?  How do I know which variation I should be searching on?  In my experience, the more common the name, the more difficult it is to identify an individual.
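
To make the difficulty concrete, here is a minimal Python sketch. It is an illustration only, not the method used by any of the projects discussed in this post. It reduces each variation to a surname plus the initials of the given names, on the assumption that a comma marks the "Surname, Given names" form and that otherwise the last word is the surname. The function names `name_key` and `group_variants` are made up for this example.

```python
from collections import defaultdict

# The eight variations listed above.
VARIANTS = [
    "Collis, G P",
    "Paddy Collis",
    "G P Collis",
    "Gerard Paddy Collis",
    "Collis, Gerard P",
    "Gerard P Collis",
    "Collis, Gerard Paddy",
    "Collis, Paddy",
]


def name_key(name):
    """Reduce a name string to a (surname, initials) pair, both lower case."""
    if "," in name:
        # "Surname, Given names" form.
        surname, given = name.split(",", 1)
    else:
        # "Given names Surname" form: ASSUMPTION that the last word is the surname.
        given, _, surname = name.rpartition(" ")
    initials = "".join(word[0] for word in given.split())
    return surname.strip().lower(), initials.lower()


def group_variants(names):
    """Group name strings that share the same (surname, initials) key."""
    groups = defaultdict(list)
    for name in names:
        groups[name_key(name)].append(name)
    return dict(groups)


groups = group_variants(VARIANTS)
for key, members in groups.items():
    print(key, members)
```

Even this simple normalisation leaves the eight variations in two groups, one keyed on the initials "gp" and one on "p", and nothing in the strings alone says whether those two groups are the same person. Real names are messier still, since the assumption that the last word is the surname fails for many of them.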

Libraries have been at the forefront in terms of identifying authors of physical items, such as books, but this doesn’t really scale when it comes to the plethora of heterogeneous outputs that are posted in the digital realm.  What is needed is a means by which authors/creators of these outputs can be disambiguated and assigned a unique identifier so that I can use it to retrieve all the known outputs that might emanate from a particular individual.  There have been attempts at this, e.g. the Dutch Digital Author Identifier, but up until now it has been a fragmented approach.

More recently we have seen two major international initiatives aimed at assigning identifiers to individuals – the International Standard Name Identifier (ISNI) and Open Researcher and Contributor ID (ORCID).  The British Library has been involved in both from the outset and is on the board of the ISNI consortium.  Through its involvement in projects such as ODIN (ORCID and DataCite Interoperability Network) and Names, the Library is exploring how such services might be built upon to include links to the outputs of research.  Both projects are looking at how published outputs can be linked to their creators, which is ultimately what I’m after.  In the case of the EU-funded ODIN project, the aim is to use ORCIDs to connect individuals with the datasets referenced in DataCite.  In the JISC-funded Names project, the Library partnered with Mimas, a data centre based at the University of Manchester, to develop a pilot system for a name authority service for UK repositories that would uniquely and persistently identify individuals active in research.  The project developed an algorithm to automatically disambiguate names using data from a variety of sources.  Once disambiguated, the individual is assigned an ISNI (see the collated record below).


The fact that the project aimed to encompass grant information, learning materials, presentations and data, as well as papers, in both institutional and subject-based repositories suggests to me that if it were ever to be implemented as a live service the Names system could become a valuable resource.  At present around 50,000 individuals have been identified by the system.

So things are looking up.  The question now is how these initiatives will interoperate.  I don’t want to find myself back to square one having to search multiple sources because an individual’s identifiers haven’t been integrated into the information resources I regularly use.  But at least I can see a glimmer of light at the end of the tunnel.



04 October 2013

Collecting new data

This week Elizabeth Newbold and Lee-Ann Coleman reflect on a week of data-related meetings in Washington DC

You can’t go to Washington DC and not go to the Air and Space Museum – at least not if you’re interested in science and have a free day. We saw a great exhibition about the Wright brothers and their experiments into flight, which highlighted the value of collecting new data. The brothers relied upon existing ‘tables of coefficients’ to factor into their equations for calculating lift and drag on different wing shapes. But their experiments showed that the coefficient for the density of air – in use since the 18th Century – did not appear to be right. They determined a new average which, using modern techniques, was shown to be very close to the correct value. What a great demonstration of the value of re-use of data and being open to evaluating it in the light of new evidence.

Most of the week of 16-20 September was not spent at great museums but at the Research Data Alliance (RDA) second plenary and the DataCite Summer Meeting. Both were held (mostly) in the beautiful National Academy of Sciences building, where a statue of Einstein looks benignly over the gardens.


This was our first time at an RDA meeting – not surprising, since it is a relatively new venture brought about by the US National Science Foundation, the Australian Government and the European Commission – so we weren’t too sure what to expect. On the first day, an array of impressive speakers highlighted the value of access to research data. The second day was reserved for meetings of the working groups and on the third day, representatives of these reported back to the whole assembly. These meetings are not typical academic conferences but are intended to be working meetings and will be held twice a year, meaning that being involved requires significant commitment.

The speakers, including Tom Kalil (Deputy Director for Technology and Innovation, White House Office of Science and Technology Policy) and John Wilbanks (Chief Commons Officer, Sage Bionetworks), highlighted the need to speed up scientific discovery and its applications. For this to happen, the implementation of frameworks, legal as well as technical, is required. President Obama signed the open data executive order in May this year - but implementation is the next step.

The RDA aims to create a community of practice and a pipeline of impact, and it is doing this through both working groups and interest groups. Interest groups cover broad, ongoing topics with loosely defined goals, but as clarity emerges about a problem or issue to address they may develop into working groups. Working groups produce case statements of what they will do, meet virtually every 4 to 6 weeks and are expected to last around 12 to 18 months. Some of these groups are very focussed, addressing a particular, often technical, issue, but others seem less well defined. Given that the RDA is just becoming established, great progress has been made, but with a ‘ground-up’ approach it is difficult to know if all the issues are being addressed. They are currently seeking an executive director, who will hopefully provide a clearer sense of direction, an aerial view of the landscape and a better articulated strategy for achieving the aims.

And then it was on to particle physics for the start of the DataCite summer meeting. Salvatore Mele, Head of Open Access at CERN, told us that the dataset providing evidence of the Higgs boson had been cited in a paper using a DataCite DOI.

Some other highlights from the meeting included the presentation from Michael Witt of Purdue University about the Purdue University Research Repository – aka PURR. It has a lot of nice features to encourage researchers to upload and store data and to produce data management plans. It issues DOIs, emails users monthly with metrics and offers secure and reliable preservation for 10 years.

There were presentations from a range of organisations with interests in citation and identification. Thomson Reuters discussed their data citation index, launched last year, with 3 million records. While it is not aiming to be the most comprehensive, it is aiming to link to important, relevant scientific data. CrossRef highlighted the new service they are offering called FundRef, which aims to enable tracking of funding sources to publications and other outputs. A pilot is underway involving several US funding agencies and the Wellcome Trust, and a registry of over 4000 funding body names has been created. Ultimately, funder IDs, grant numbers and DOIs could be linked. The presentation from ORCID, an identifier service for individuals, demonstrated that it’s not just for the John Smiths! Over 280,000 identifiers have been issued since October 2012. Grant submission systems are starting to ask for ORCID IDs during the submission process and HEIs are also getting on board. Some publishers are also requesting them.

So a lot of interesting data for thought – but considering the Wright brothers again provides a reminder that the reason for all of this activity is to enable research and support those people generating the data in the first place, to make better use of it and, as a result, enhance science.