Science blog

10 posts categorized "Open data"

29 August 2017

I4OC: The British Library and open data

In August the British Library joined the Initiative for Open Citations as a stakeholder. The I4OC’s aim of promoting the availability of structured, separable, open citation data fits perfectly with the Library's established strategy for open metadata, which has just marked its seventh anniversary.

In August 2010, responding to UK Government calls for increased access to public data to promote transparency, economic growth and research, the British Library launched the strategy by offering over 16m CC0 licensed records from its catalogue and national bibliography datasets. This initiative aimed to remove constraints created by restrictive licensing and library specific standards to enable wider community re-use. In doing so the Library aimed to unlock the value of the data while improving access to information and culture in line with its wider strategic objectives.
The initial release was followed in 2011 by the launch of the Library’s first Linked Open Data (LOD) bibliographic service. The Library believed Linked Open Data to be a logical evolutionary step for the established principle of freedom of access to information, offering trusted knowledge organisations a central role in the new information landscape. The development proved influential among the library community in moving the Linked Data debate from theory to practice.

Over 1,700 organisations in 123 countries now use the Library’s open metadata services, with many more taking single files. The value of the Library’s open data work was recognised by the British National Bibliography linked dataset receiving a 5 star rating on the UK Government site and certification from the Open Data Institute (ODI). In 2016 the Library launched the platform to make copies of a range of its datasets available for research and creative purposes. In addition, the BL Labs initiative continues to explore new opportunities for public use of the Library’s digital collections and data in exciting and innovative ways. The British Library therefore remains committed to an open approach to enable the widest possible re-use of its rich metadata and generate the best return on the investment in its creation.

I4OC users by country


As the example of the British Library’s open data work shows, opening up metadata facilitates access to information, creates efficiencies and allows others to enhance existing and develop new services. This is particularly important for researchers and others who do not work for organisations with subscriptions to commercial citation databases. The British Library believes that opening up metadata on research facilitates both improved research information management and original research, and therefore benefits all.

The I4OC’s recent call to arms for its stakeholders is therefore very much in tune with the British Library’s open data work in promoting the many benefits of freely accessible citation data for scholars, publishers and wider communities. Such benefits proved compelling enough to enable the I4OC to secure publisher agreement for nearly half of indexed scholarly data to be made openly accessible. This data is now being used in a range of new projects and services including OpenCitations and Wikidata. It's encouraging to see I4OC spreading the open data ideal so successfully and it is to be hoped that it will also succeed in ensuring open citations become the default in future.

Correction: Image shows users of BL open data services by country, not I4OC

05 September 2016

Social Media Data: What’s the use?

Team ScienceBL is pleased to bring you #TheDataDebates -  an exciting new partnership with the AHRC, the ESRC and the Alan Turing Institute. In our first event on 21st September we’re discussing social media. Join us!

Every day people around the world post a staggering 400 million tweets, upload 350 million photos to Facebook and view 4 billion videos on YouTube. Analysing this mass of data can help us understand how people think and act but there are also many potential problems.  Ahead of the event, we looked into a few interesting applications of social media data.

Politically correct? 

During the 2015 General Election, experts used a technique called sentiment analysis to examine Twitter users’ reactions to the televised leadership debates1. But is this type of analysis actually useful? Some think that tweets are spontaneous and might not represent the more calculated political decision of voters.
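To give a feel for what sentiment analysis involves, here is a deliberately minimal sketch in Python. It is not the method used in the election study: the word lists and the `sentiment_label` function are made up for illustration, and real analyses typically rely on much larger lexicons or trained models.

```python
import re

# Hypothetical, tiny word lists chosen only for illustration.
POSITIVE_WORDS = {"good", "great", "strong", "clear", "win"}
NEGATIVE_WORDS = {"bad", "weak", "poor", "evasive", "lose"}


def sentiment_score(text):
    """Count positive words minus negative words in a piece of text."""
    words = re.findall(r"[a-z']+", text.lower())
    positives = sum(word in POSITIVE_WORDS for word in words)
    negatives = sum(word in NEGATIVE_WORDS for word in words)
    return positives - negatives


def sentiment_label(text):
    """Turn the numeric score into a coarse label."""
    score = sentiment_score(text)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


for tweet in ["A strong, clear answer", "Weak and evasive", "The debate starts at eight"]:
    print(tweet, "->", sentiment_label(tweet))
```

A word-counting approach like this cannot cope with sarcasm or context, which is one reason the usefulness of tweet-based analysis is open to question.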

On the other side of the pond, Obama’s election strategy in 2012 made use of social media data on an unprecedented scale2. A huge data analytics team looked at social media data for patterns in past voter characteristics and used this information to inform their marketing strategy - e.g. broadcasting TV adverts in specific slots targeted at swing voters and virtually scouring the social media networks of Obama supporters on the hunt for friends who could be persuaded to join the campaign as well. 

Image from Flickr

In this year's US election, both Hillary Clinton and Donald Trump are making the most of social media's huge reach to rally support. The Trump campaign has recently released the America First app which collects personal data and awards points for recruiting friends3. Meanwhile Democrat nominee Clinton is building on the work of Barack Obama's social media team and exploring platforms such as Pinterest and YouTube4. Only time will tell who the eventual winner will be.

Playing the market

You know how Amazon suggests items you might like based on the items you’ve browsed on their site? This is a common marketing technique that allows companies to re-advertise products to users who have shown some interest in the brand but might not have bought anything. Linking browsing history to social media comments has the potential to make this targeted marketing even more sophisticated4.

Credit where credit’s due?

Many ‘new generation’ loan companies don’t use traditional credit checks but instead gather other information on an individual – including social media data – and then decide whether to grant the loan5. Opinion is divided as to whether this new model is a good thing. On the one hand it allows people who might have been rejected by traditional checks to get credit. But critics say that people are being judged on data that they assume is private. And could this be a slippery slope to allowing other industries (e.g. insurance) to gather information in this way? Could this lead to discrimination?

Image from Flickr

What's the problem?

Despite all these applications there’s lots of discussion about the best way to analyse social media data. How can we control for biases and how do we make sure our samples are representative? There are also concerns about privacy and consent. Some social media data (like Twitter) is public and can be seen and used by anyone (subject to terms and conditions). But most Facebook data is only visible to people specified by the user. The problem is: do users always know what they are signing up for?

Image from Pixabay

Lots of big data companies are using anonymised data (where obvious identifiers like name and date of birth are removed) which can be distributed without the users' consent. But there may still be the potential for individuals to be re-identified - especially if multiple datasets are combined - and this is a major problem for many concerned with privacy.
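The re-identification risk can be illustrated with a small Python sketch. Everything in it is invented for the purpose: the two datasets, the field names (`postcode_area`, `age_band`, `sex`) and the `reidentify` function are hypothetical, and real linkage attacks are considerably more sophisticated. The idea is simply that when a combination of remaining attributes is unique in both an anonymised dataset and a named one, the two records can be joined.

```python
from collections import Counter

# Attributes left in the "anonymised" data that also appear in a named dataset.
QUASI_IDENTIFIERS = ("postcode_area", "age_band", "sex")


def key_of(row, keys=QUASI_IDENTIFIERS):
    """Build the combination of attribute values used for matching."""
    return tuple(row[k] for k in keys)


def reidentify(anonymised_rows, named_rows, keys=QUASI_IDENTIFIERS):
    """Return (name, anonymised_row) pairs where the match is unambiguous."""
    anon_counts = Counter(key_of(r, keys) for r in anonymised_rows)
    named_counts = Counter(key_of(r, keys) for r in named_rows)
    names_by_key = {key_of(r, keys): r["name"] for r in named_rows}
    matches = []
    for row in anonymised_rows:
        k = key_of(row, keys)
        # Only link when the combination is unique in both datasets.
        if anon_counts[k] == 1 and named_counts[k] == 1:
            matches.append((names_by_key[k], row))
    return matches


# Made-up example data: no names in the first list, names in the second.
anonymised = [
    {"postcode_area": "N1", "age_band": "30-39", "sex": "F", "purchases": "running shoes"},
    {"postcode_area": "N1", "age_band": "30-39", "sex": "M", "purchases": "garden tools"},
    {"postcode_area": "E2", "age_band": "20-29", "sex": "F", "purchases": "textbooks"},
    {"postcode_area": "E2", "age_band": "20-29", "sex": "F", "purchases": "headphones"},
]
public_profiles = [
    {"name": "Person A", "postcode_area": "N1", "age_band": "30-39", "sex": "F"},
    {"name": "Person B", "postcode_area": "E2", "age_band": "20-29", "sex": "F"},
]

matches = reidentify(anonymised, public_profiles)
for name, row in matches:
    print(name, "is probably the person who bought", row["purchases"])
```

In this toy data only Person A can be linked, because two anonymised records share Person B's attributes. The more datasets are combined, the more likely a combination becomes unique.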

If you are an avid social media user, a big data specialist, a privacy advocate or are simply interested in finding out more, join us on 21st September to discuss further. Tickets are available here.

Katie Howe

05 October 2015

New opportunities for collaborative PhD research exploring the British Library’s science collections

Applications for collaborative PhD research around the British Library’s science collections are now open to UK universities and other HEIs

The British Library is looking for university partners to co-supervise collaborative PhD research projects that will open up unexplored aspects of its science collections. Funding is available from the Arts & Humanities Research Council (AHRC) Collaborative Doctoral Partnerships programme, through which the Library works with UK universities or other eligible Higher Education Institutes around strategic research themes.

Our current CDP opportunities include a project to examine the culture and evolution of scientific research, drawing on scientists’ personal archives, and another project to develop digital tools for the investigation of scientific knowledge in the 17th and 18th centuries:

The Working Life of Scientists: Exploring the Culture of Scientific Research through Personal Archives

This project will involve a detailed mapping of the key personal relationships of 20th century British scientists to shed light on the nature, communication and reception of scientific research. It will draw on the Library’s Contemporary Archives and Manuscripts collections, which include personal archives and correspondence from the fields of computer science and programming, cybernetics and artificial intelligence, as well as evolutionary, developmental and molecular biology. As well as being situated within social and cultural history, particularly the history of science and the history of ideas, this cross-disciplinary project is applicable to research in areas such as social anthropology, sociology and social network analysis. It will open up a nuanced understanding of the BL’s collection of the personal archives of twentieth century British scientists. It will enable us to better exploit these valuable collections to research audiences across a number of disciplines.

Hans Sloane’s Books: Evaluating an Enlightenment Library

This Digital Humanities project will evaluate the library of Hans Sloane (1660-1753): physician, collector and posthumous ‘founding father’ of the British Museum. For over sixty years, Hans Sloane was a dominant figure on London’s intellectual and social landscape. At the heart of his vast collections stood a library of 45,000 books, which – alongside his voluminous correspondence and thousands of prints, drawings, specimens and artefacts – bears witness to his central position in a globalised network of scientific discovery. The CDP project will apply digital techniques to exploit the raw data on over 32,000 items in the Sloane Printed Books Catalogue, and will break new ground by developing digital tools to cross reference, contextualise and analyse the data. This will forge fresh insights into how medical and scientific knowledge was gathered and disseminated in the pre-Linnaean period, with relevance to the history of science, medicine and collecting.


Moving beyond our science collections, there is also a third CDP opportunity for a project on ‘Digital Publishing and the Reader’. This will investigate the changing nature of publishing in digital environments to consider how new communication technologies should be recorded or collected as part of a national collection of British written culture.

Applications are invited from academics to develop any of these research themes with a view to co-supervising a PhD project with the British Library from October 2016. Our HEI partners receive and administer the funds for a full PhD studentship from the AHRC and, in collaboration with the Library, oversee the research and training of the student. We provide the student with staff-level access to our collections, expertise and facilities, as well as financial support for research-related costs of up to £1,000 a year.

View further details and application guidelines.

To apply, send the application form to by 27 November 2015.


06 February 2015

DataCite Case Study: at the University of Leeds

In June last year, we held a DataCite workshop hosted by the University of Glasgow. We've now turned our speaker's use of Digital Object Identifiers (DOIs) for rainforest data into a video and printed case study.

You can still find a short summary of that event here. Our thanks go to Gabriela Lopez-Gonzalez for taking the time to come and film with us.


We hope that this case study will help institutions promote the idea of data citation and use of DOIs for data to their researchers, and that this in turn will encourage more submission of data to institutional repositories.


A DataCite DOI is not just for data

During January we had also been trying to spread the word that DOIs from DataCite aren't necessarily just for data. We've been working with the British Library's EThOS service to look at how UK institutions might give DOIs to their electronic theses and dissertations.

There was an initial workshop to divine the issues in November 2014, and on 16th January we held a bigger workshop, bringing more institutions together to look at how we might start to establish a common way of identifying e-theses in the UK.

The technical step of assigning a DOI to a thesis is relatively straightforward. Once an institution is working with DataCite (or CrossRef) they can use their established systems to assign a DOI to a thesis. But the policies surrounding the issue and management of this process are more complex. We're hoping that these workshops have helped everyone to pull in the same direction and collaborate on answers to common questions.

This work has given rise to a proposal to look at how to improve the connection between a thesis and the data it is built on. By triggering the consideration of sharing the data supporting a thesis, maybe we can "get 'em young" and introduce good data sharing practice as early in the research career as possible. Connecting the thesis and its data also increases the visibility of both, helping early career researchers to reap the benefits of their hard work sooner.

Watch this space to see what happens next!


12 February 2014

Is Necessity The Mother of Invention?

Scientific discovery and invention. What drives them? What connects them? Allan Sudlow and Katie Howe delve into the Library’s collections to uncover some answers.

Scientists have long used patents to protect their inventions and allow them opportunities to commercialise their work. Recent controversies in cancer and stem cell research have highlighted the social and ethical, as well as the economic implications of biomedical patents. We will be exploring these issues in our forthcoming TalkScience event on 4 March: Patently Obvious?

In the meantime, we have been taking a look back at what distinguishes a scientific discovery from an invention – and asking – is necessity really the mother of invention?

The Oxford English Dictionary attributes the first printed usage of the proverb ‘Necessity is the mother of invention’ to Richard Franck in his tome Northern Memoirs, first published in 1694:

“Art imitates nature, and necessity is the mother of invention; science also invites to study and practicks, but theory gives the prospect, and operation finishes the project.”

Frontispiece from Northern Memoirs, Calculated for the Meridian of Scotland, Richard Franck. (1694)

At the turn of the last century, the mathematician and philosopher Alfred North Whitehead took a different view on the origins of invention, and its relationship to scientific discovery, noting in The Aims of Education:

“…inventive genius requires pleasurable mental activity as a condition for its vigorous exercise. ‘Necessity is the mother of invention’ is a silly proverb. ‘Necessity is the mother of futile dodges’ is much nearer the truth. The basis of the growth of modern invention is science, and science is almost wholly the outgrowth of pleasurable intellectual curiosity.”

This insight from the past provides a rallying call to those that support the idea of ‘blue skies’ research and feel that scientific discovery and invention should be driven by curiosity rather than a strategy or a set of pre-defined rules. In contrast, O.T. Mason describes, very precisely, what he believes underpins the nature of invention in an article The Evolution of Invention from 1895, published in the first volume of the journal Science:

  1. Of the thing or process, commonly called inventions.
  2. Of the apparatus and methods used.
  3. Of the rewards to the inventor.
  4. Of the intellectual activities involved.
  5. Of society

Fast-forward to the present, and the European Patent Convention defines – or rather doesn’t define - invention in terms of:

 “…a non-exhaustive list of things which are not regarded as inventions. It will be noted that the items on this list are all either abstract (e.g. discoveries or scientific theories) and/or non-technical (e.g. aesthetic creations or presentations of information). In contrast to this, an "invention" … must be of both a concrete and a technical character”

So we see some distinction between discovery and invention: the abstract vs the concrete. But what – I hear you cry – about necessity?

The Human Genome Project (HGP), the world’s largest biological project to date, is a great example of necessity being a spur for collaborative discovery. The HGP’s aim was to determine the sequence of the three billion chemical building blocks that make up human DNA – the entire human genetic code. Many of the scientists involved saw the HGP as a race between public and commercial research interests. In particular: Craig Venter, an American genomic researcher and entrepreneur; and John Sulston, an English Nobel Prize winning scientist and campaigner against the patenting of human genetic information.


Sir John Sulston, who oversaw the UK's contribution to the Human Genome Project.
© Wellcome Images, made available under CC BY-NC-ND 2.0

In his book The Common Thread, Sulston describes the moment when he realised that the parallel work of Venter’s company (Celera Genomics) to sequence the human genome with greater speed than academic efforts “…had made everyone realise the absolute necessity of the publicly funded teams working together”. Thus, necessity drove greater international effort, and on 26 June 2000, the HGP consortium announced that it had assembled a working draft of the sequence of the human genome.

Competing public and commercial interests persist in scientific discovery and invention, especially in relation to genetic information. Recent attempts to patent human gene sequences have raised questions over whether a sequence of DNA is an invention or a discovery and have highlighted some of the challenges in assessing the patentability of biomedical developments. Witness the recent legal battle involving diagnostics company Myriad Genetics in the US over predictive genetic testing for susceptibility to breast cancer. The US Supreme Court judged that human DNA was a ‘product of nature’, a basic tool of scientific and technological work, thereby placing it beyond the domain of patent protection. Amongst other caveats, this judgment declared that certain forms of DNA (cDNA) were patentable.  

Will there always be a necessity to patent in this area of bioscience? Undoubtedly, but a balance needs to be struck. Necessity may drive invention but when it comes to Mother Nature, who decides? Come to TalkScience on 4 March to voice your opinion.


20 January 2014

Beautiful Science Preview

Johanna Kieniewicz spills a few beans on the upcoming British Library exhibition

We are now just a month out from the British Library’s first science exhibition: Beautiful Science: Picturing Data, Inspiring Insight. Life in our team right now is a whirlwind of writing captions, finalising commissions, testing interactives and liaising with our press office. But all for a good reason. Opening February 20th, Beautiful Science will highlight the very best in graphical communication in science, linking classic diagrams from the Library’s collections to the work of contemporary scientists. The exhibition will cover the subject areas of public health, weather and climate and the tree of life, telling stories of advances in science as well as looking at the way in which we communicate and visualise scientific data.


Picturing Data

Data is coming out of our ears. From data collected by our mobile phones and movements about the city to the data acquired by scientists when sequencing genomes or smashing subatomic particles together, the quantities are vast. While a simple table of numbers is a form of data visualisation in itself, our human ability to scan, analyse and identify patterns and trends is limited.

William Farr, 1852, Report on the Mortality of Cholera in England 1848-1849

Whilst today we see a proliferation of data visualisation, it is hardly a new phenomenon, and might even be considered a rediscovery of the ‘Golden Age’ of statistical graphics of the late 19th century. As today, the Victorian period saw a confluence of new techniques for data collection, developments in statistics and advances in technology, which created an environment in which data graphics flourished. In Beautiful Science, we highlight a number of graphics from this period – some of which are well known, others of which may prove to be more of a surprise, such as this piece on cholera mortality by epidemiologist and statistician William Farr.


Inspiring Insight

The very best visualisations of scientific data do not merely present it, but also inspire insight and reveal meaning. Data visualisation is both a tool through which we can analyse and interpret data and a method by which we communicate its meaning. It is most powerful when it does both.

Circles of Life, Martin Krzywinski, 2013

In curating Beautiful Science, we were keen to highlight the ways in which the visualisation of data is integral to the scientific process, as well as the way cutting edge science is communicated. The Circos diagrams used to display genomic data do this very well. In Beautiful Science, you can examine a comparison of the human genome with both closely and distantly related animals. Here, you see that we are quite closely related to the chimpanzee (though we presume you knew that already). But what about a chicken or a platypus? You’ll have to come to the exhibition and see for yourself.



Beautiful Science

Should we impose an aesthetic upon the presentation of scientific information? Or is beauty indeed in the eye of the beholder? We take a rather agnostic position in this debate, and rather seek to inspire the exhibition visitor with both intriguing images and inspiring ideas. What is clear, however, is that scientists should take care and be thoughtful when producing their graphics. In a world where research impact is ever more important, producing images that compellingly communicate discoveries is of increasing importance.

NASA/Goddard Space Flight Center Scientific Visualization Studio

Compelling imagery is something at which the NASA Scientific Visualisation Studio excels. Something like a model of ocean currents might potentially be quite dry and dull. Since it was originally developed for a scientific purpose, would colour-coded vectors increasing and decreasing in size not do the job? With a leap of insight, they developed a visualisation that is both informative and inspiring. We hope you will watch it with awe in the entry to the exhibition, tracking the Gulf Stream as it moves water northwards towards the British Isles, bringing us our temperate climate.


Even More Beautiful Science

A fantastic programme of events will also accompany the exhibition. From serious debate to science comedy shows, competitions, workshops and family activities, we’ve developed a programme that’s designed to make you think. Please join us!


Beautiful Science runs from 20 February to 26 May, 2014, is sponsored by Winton Capital Management, and is free to the public.

06 December 2013

Visualising Research

This week we are excited to announce the launch of a data visualisation competition (and workshop), sponsored by the AHRC and BBSRC

We talk quite a lot about data on the Science Blog and have previously highlighted the role we are playing in helping researchers to discover, access or cite scientific data. But working at the British Library means we have the fantastic opportunity to bring our collections and contemporary research to the wider public through our exhibitions. Earlier in the year we gave you a taster of Beautiful Science - an exhibition launching in February 2014 that will explore scientific data visualisation from past to present. Some famous historical names, such as Florence Nightingale, knew the power of displaying data – her iconic diagram (pictured) not only enabled any viewer to quickly grasp the meaning but led to changes in the way those injured in war were treated.


As part of our celebration of all things data and our exhibition, we have been working with the Arts & Humanities Research Council and the Biotechnology & Biological Sciences Research Council on a competition that challenges entrants to bring UK Research Council data to life. An added bonus - we hope - is that the competition aims to encourage people from different disciplines to work together, since presenting complex data requires not only mathematical, computing or scientific skills but also strong expertise in art and design. A key criterion for the judges will be whether the entries convey the meaning to a wide audience, so they will be looking for that combination of valid data that tells a compelling story.

Around £3 billion of Government funding is apportioned annually between the seven UK Research Councils, which are responsible for different discipline areas. The Research Councils then distribute that funding to their various communities on the basis of applications made by researchers, which are subject to independent, expert peer review. Applications are judged by considering a combination of factors, including their scientific excellence, timeliness and promise, strategic relevance, economic and social impacts, industrial and stakeholder relevance, value for money and staff training potential. Until recently it wasn’t easy to combine funding data from different Research Councils or to explore how it was distributed across the country. And the finer grained detail, while it may have been available from an individual Council, was difficult to tease out or integrate. Behind the scenes, Research Councils worked together to make details of the research they fund available from one place. The culmination of that commitment is Gateway to Research - a database that anyone can use. The data is available programmatically and under an open government licence which means that anyone is free to interrogate it – you can extract it all, download it to your own systems, apply your own analysis tools and generally think of things to do with it that no one else has done before.

The challenge of the competition is to use the Gateway to Research data to tell a compelling story that anyone will be able to understand. While designers, graphic artists, software developers and programmers may have a particular interest, anyone and everyone is invited to produce a visualisation (on a website) that will show how this public funding contributes to research in the UK. Details of the competition are here. Entry forms will be available from 27 January 2014 and the closing date is 21 March 2014. Our judges include Jackie Hunter, Chief Executive, BBSRC, Katy Borner, Victor H. Yngve Professor of Information Science, Indiana University and Guardian Digital Agency.

On 24 January 2014, we are holding a workshop at the British Library for anyone who wants to find out more. Please register if you want some inspiration, information about the Gateway to Research database and to meet potential collaborators. Representatives from the AHRC and BBSRC will be there on the day, as well as data visualisation evangelists (Guardian Digital Agency) and developers (Cottage Labs) who have worked with the data. We will also have Andrew Steele from Scienceogram who is using public data to make the case for science in the UK.

Lee-Ann Coleman

08 November 2013

Why not cite data?

Rachael Kotarski, our Content Expert for scientific datasets, explains why citing data as well as the article is the way forward.

In a previous post, Lee-Ann Coleman looked at citations in science, asking what should be cited, and what a citation means. The answers to these questions are not necessarily simple, but one response we have been hearing (and that we support), is that data needs to be cited.

Citing data not only gives credit to those who created or gathered it, but can also give some kudos to the repository that looks after it. Despite the fact that data is also key to verifying and validating research, it is not yet standard practice to cite it when writing a paper. And even if it is cited, it is rarely done in a way that allows you to identify and access that data.

Citation should connect the literature to its data foundations. Image source: Shutterstock.

As part of the Opportunities for Data Exchange (ODE) project, we investigated data citation and the ways in which data centres, publishers, libraries and researchers can encourage better data citation.

What does ‘better data citation’ look like and how do we encourage it to happen? We examined three aspects of current practice in order to answer this question:

  • How is data cited?
  • What data is cited?
  • Where is data cited within the article?

How to cite
A data citation needs to contain enough information to find and verify the data that was used, as well as give credit to those who spent considerable time/money/effort generating or collecting the data. The DataCite recommended data citation is just one example of how to include details that support these aims (and it’s pretty simple!):

Creator (publication year): Title. Publisher. Identifier.
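As a rough illustration, the recommended pattern can be assembled from its parts with a few lines of Python. This is only a sketch: the function name `format_data_citation` is an assumption made for this example rather than part of any DataCite tool, and the record shown is a made-up placeholder, not a real dataset or a registered DOI.

```python
def format_data_citation(creators, year, title, publisher, identifier):
    """Build 'Creator (publication year): Title. Publisher. Identifier'."""
    creator_text = "; ".join(creators)
    # Avoid doubled full stops if the inputs already end with one.
    clean_title = title.rstrip(". ")
    clean_publisher = publisher.rstrip(". ")
    return f"{creator_text} ({year}): {clean_title}. {clean_publisher}. {identifier}"


# Hypothetical placeholder record, used only to show the output shape.
example = format_data_citation(
    creators=["Researcher, A.", "Researcher, B."],
    year=2013,
    title="Example rainfall measurements",
    publisher="Example Data Centre",
    identifier="doi:10.5072/example",
)
print(example)
```

The closing full stop is left off here so that it is not mistaken for part of the identifier.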

What to cite
Data are not necessarily fixed, stable or homogenous objects, so citing them can be considerably more complicated than for articles. It is important for testing reproducibility that regardless of subsequent changes to the data or subsets of it, they are cited as used. Aspects such as the version used or date downloaded should also be encapsulated in the citation, where necessary. Linking users via an identifier (such as a DOI as used by DataCite) to the location of that exact version or subset of the data is also important. An example of citing a specific wave of data from GESIS demonstrates this:

Förster, Peter; Brähler, Elmar; Stöbel-Richter, Yve; Berth, Hendrik (2012): Saxonian longitudinal study – wave 24, 2010. GESIS Data Archive, Cologne. ZA6242 Data file version 1.0.0, doi: 10.4232/1.11322
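The version-specific part of a citation like this can be sketched in Python. The helper names `version_details` and `doi_link` are assumptions made for this example, as is the wording "accessed" for the download date. The only values taken from the post are the version number and DOI of the GESIS example, and the link uses the general doi.org resolver.

```python
from datetime import date
from typing import Optional


def version_details(doi: str, version: Optional[str] = None,
                    accessed: Optional[date] = None) -> str:
    """Build the tail of a citation that pins down exactly which data was used."""
    parts = []
    if version:
        parts.append(f"Data file version {version}")
    parts.append(f"doi: {doi}")
    if accessed:
        # Recording the download date helps when the data changes without a new version.
        parts.append(f"accessed {accessed.isoformat()}")
    return ", ".join(parts)


def doi_link(doi: str) -> str:
    """Turn a DOI into a link that resolves to the data's landing page."""
    return f"https://doi.org/{doi}"


print(version_details("10.4232/1.11322", version="1.0.0"))
print(doi_link("10.4232/1.11322"))
```

Keeping the version and date as separate optional fields means they are included only where necessary, as suggested above.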

Where to cite in the article
Where you cite data in the article may depend on the form of the data being cited. For example, data obtained via colleagues but not widely available may be best mentioned in acknowledgements, and data identified by accession numbers could be cited inline in the body of the article. But the interviewees who participated in the ODE study largely advocated citation of datasets in the full reference list, to promote tracking and credit. In order to do this, data needs a full, stable citation, which also depends on reliable, long-term storage and management of the data. Of course publisher requirements play an important role. But that’s a post for another day!

These are the three ‘simple’ steps to better citation of data, but there are still cultural and behavioural barriers to sharing data. In the ODE report we concluded that the whole community - researchers, publishers, libraries and data centres - all have a role in promoting and encouraging data citation.


The recent Out of Cite, Out of Mind report has since updated and greatly extended the ODE work, with an excellent set of first principles for data citation:

CODATA-ICSTI Task Group on Data Citation Standards and Practices (2013) Out of Cite, Out of Mind: The Current State of Practice, Policy, and Technology for the Citation of Data. Data Science Journal vol. 12 p. CIDCR1-CIDCR75 doi: 10.2481/dsj.OSOM13-043

I recommend it – and encourage anyone thinking about citing their data (or anyone else’s) to stop thinking and start doing it.


02 August 2013

Show me more data


Expanding on last week’s post on open data, today we look at our role in DataCite and how we are supporting the UK research data community.

The British Library is one of the founding members of DataCite, an international organisation bringing together the research data community to work collaboratively on the challenges of making research data visible, accessible and citable. DataCite is a registration agency for Digital Object Identifiers (DOIs), and the British Library is an allocating agent on behalf of DataCite. We provide an infrastructure that supports simple and effective methods of discovery and access. We work with data centres and other organisations to enable them to assign DOIs to data.

Since 2011, the Library’s Science team has been developing DataCite services in the UK. In practical terms, this has involved working with a range of organisations that create, manage or archive data, setting them up on the system so that they can assign DOIs (a process known as minting - we even have mints, pictured, to prove it!), working on the DataCite metadata schema and ensuring our community’s needs are represented within the global DataCite membership. To support this work, we have organised a series of workshops, exploring the various aspects of data citation, as well as the requirements for working with DataCite and DOIs.

We’ve covered a lot of topics in the last year. These have ranged from the basics, such as what minting a DOI actually means and how to do it (you can find out how in our YouTube video) and what to put a DOI on, to more complex subjects such as how to deal with sensitive data or different versions. We’ve had lively discussions at all of the workshops, supported by excellent presentations from colleagues who are working with research data. You can see the full list of topics covered and presentations from the workshops on our webpages.


In addition to running workshops, we’ve been out and about talking to colleagues in universities - discussing how they can use the service as well as hearing about the challenges they face in managing research data. These meetings and workshops have provided opportunities to explore how we can work together across a range of institutions and disciplines. What is certain and, I think, reassuring for everyone is that no one has all the answers. Processes and practices are evolving, but it is encouraging that we can work on solutions together. If you’d like to talk to us or arrange a workshop for your organisation, then do get in touch.

We’ll be coming back to issues in research data management and data citation in future posts but for now we’re looking forward to a week of discussion and debate at the Research Data Alliance meeting and DataCite Summer meeting in September.

Elizabeth Newbold

26 July 2013

Show me the data


Libraries just worry about books, right? Wrong! We also worry about data. If you want to provide a useful service to the research community (and that community includes anyone who wants to do research), you need to think about all the information, including research data sets, that people may need. But we recognise that isn’t always easy to do.

The Royal Society’s 2012 report on science as an open enterprise focused on the value of research data and, at a recent meeting, Professor Geoffrey Boulton, who led the study, noted that ‘open science’ approaches are not new. Henry Oldenburg, the 17th-century German natural philosopher and first Secretary of the Royal Society, ensured all his scientific correspondence was written in the vernacular (and not Latin, as was the norm), and that all his observations were supported by supplementary evidence (and not just assertions).

Thus Boulton reflected that while the value of supporting reproducibility and providing an evidence base had been recognised very early on, many journals no longer published the results in tandem with the underlying data. Fortunately the technology is now allowing many publishers and others to provide better access to the data.

In some areas of science there has been a culture of data sharing. If researchers are sequencing DNA from any species they are asked to submit it to GenBank: a database established to ensure that scientists have access to the most up-to-date and comprehensive DNA sequence information. Most publishers require the researchers to provide evidence that they have added their data to GenBank before publication. So, if you work on sequencing DNA, getting access to other people’s data is relatively easy – but that is not necessarily the case for many other areas of science.


The reasons are complex. In many areas of research, there are no established or permanent stores for the many types of data that are produced. For researchers, the data they collect or generate is the primary output of the research and therefore comprises their intellectual capital. Many researchers are concerned about receiving appropriate credit for their efforts and that may not happen if they share their data with all and sundry. But that objection could be tackled if researchers could cite data – and thereby be recognised for their contribution.


The British Library is a founding member of an organisation called DataCite which, as the name suggests, was established to enable data to be cited. We have been working with a range of organisations responsible for managing, storing and preserving data from a variety of areas – everything from archaeology to atmospheric science – to enable them to attach a ‘digital tag’ to data that allows it to be referenced. This tag is ‘persistent’, so that even if the data is no longer available, it will be possible to find out what has happened to that resource. We hope when someone says – ‘show me the data’ – we will have played a role in making that possible.

Lee-Ann Coleman and Allan Sudlow