Digital scholarship blog

18 December 2017

Workshop report: Identifiers for UK theses

Along with the Universities of Southampton and London South Bank, EThOS and DataCite UK have been investigating whether having persistent identifiers (PIDs) for both a thesis and its data would help to liberate data from the appendices of the PDF document. With some funding from Jisc in 2014, we ran a survey and some case studies looking at the state of linking to research data underlying theses, to see where improvements could be made. Since then, there has been slow but steady progress towards realising the recommendations of that work. Identifiers are now visible in EThOS itself, and a small number of UK institutions are now assigning Digital Object Identifiers (DOIs) to their theses on a regular basis. Many more are implementing ORCID iDs for their postgraduate students. We wanted to reignite the conversation around unlocking thesis data and see what was needed to progress it further.


On 4th December 2017, we ran a workshop to hear what progress is being made and what the remaining barriers are to applying persistent identifiers to theses and thesis data. We heard from the University of Cambridge and the London School of Hygiene and Tropical Medicine, both of which are assigning DOIs to published theses on a regular basis. They outlined how they got to this point, including the case made within each university to ensure DOIs were available for theses.

As institutions start to identify their theses with DOIs, we need to ensure that these identifiers are picked up and usable in EThOS. Heather Rosie (EThOS Metadata Manager) explained how the lack of any consistent identifier for theses up to this point hinders disambiguation: due to errors in titles and different representations of author names, we simply do not know how many theses have been published in the UK. But Heather also highlighted what institutions can do to help ensure any available identifiers make their way into EThOS: making sure they are available for harvest, especially via OAI-PMH.
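To make the harvesting point concrete, here is a minimal sketch of how a harvester might pull DOIs and ORCID iDs out of an OAI-PMH response in the standard oai_dc format. It is an illustration only, not a description of how EThOS actually harvests. The sample record, the repository address, the DOI prefix and the function name `extract_pids` are all hypothetical, and it assumes the repository exposes its identifiers in `dc:identifier` fields, which will not be true of every system.

```python
import re
import xml.etree.ElementTree as ET

# Standard namespaces for OAI-PMH 2.0 responses and unqualified Dublin Core.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Loose patterns for spotting identifiers inside free-text fields.
DOI_PATTERN = re.compile(r"10\.\d{4,9}/\S+")
ORCID_PATTERN = re.compile(r"\d{4}-\d{4}-\d{4}-\d{3}[\dX]")

# Hypothetical response: a made-up repository, thesis and DOI prefix.
SAMPLE_RESPONSE = """
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:repository.example.ac.uk:1234</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>An example thesis</dc:title>
          <dc:creator>Researcher, A.</dc:creator>
          <dc:identifier>https://repository.example.ac.uk/id/eprint/1234</dc:identifier>
          <dc:identifier>https://doi.org/10.99999/example.thesis.1234</dc:identifier>
          <dc:identifier>https://orcid.org/0000-0002-1825-0097</dc:identifier>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>
"""


def extract_pids(oai_xml):
    """Return one dict per record listing any DOIs and ORCID iDs found."""
    root = ET.fromstring(oai_xml)
    results = []
    for record in root.iterfind(".//oai:record", NS):
        oai_id = record.findtext(
            "oai:header/oai:identifier", default="", namespaces=NS
        )
        dois, orcids = [], []
        # Assumption: identifiers are exposed in dc:identifier elements.
        for element in record.iterfind(".//dc:identifier", NS):
            text = (element.text or "").strip()
            dois.extend(DOI_PATTERN.findall(text))
            orcids.extend(ORCID_PATTERN.findall(text))
        results.append({"oai_id": oai_id, "dois": dois, "orcids": orcids})
    return results


if __name__ == "__main__":
    for entry in extract_pids(SAMPLE_RESPONSE):
        print(entry)
```

A record that comes back with empty `dois` and `orcids` lists is one where the identifiers either do not exist yet or are not being exposed for harvest, which is the gap institutions were being asked to close.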

Following the morning’s presentations, there was broad discussion of the issues that institutions still face in applying DOIs or ORCID iDs to their published theses. These included barriers such as:

  • Low priority due to lack of buy-in or interest from both researchers and institutional decision-makers. Interest could be increased by improving understanding of what PIDs are and what they can do, particularly the tangible benefits they provide
  • A single institution may use multiple systems to manage different pieces of information about its researchers and their outputs. This creates internally competing systems that overlap, unevenly distributed resources, and a lack of clarity about what details go where
  • Further technical barriers include having to rely on the suppliers of non-open-source systems to make the appropriate changes. Even where plug-ins for open-source systems are developed at one institution, the associated workflow might not be appropriate for all other users. Finally, technical support teams tend to be removed from Library staff
  • Sustainability of using the identifiers, especially in terms of cost.

The second half of the workshop looked towards both the future and the past: whether the British Library digitising its large collection of legacy theses on microfilm might be a way not only to make them available to users, but also to ensure they are digitally preserved and assigned persistent identifiers. Paul Joseph from the University of British Columbia (UBC) gave us a great example to consider here: they have digitised 32,000 theses (at both doctoral and masters level) and made them openly available through their repository, assigning DOIs as they did so. A major concern for UK universities undertaking a similar endeavour is the inability to confirm that third-party rights have been cleared in the thesis. But it was interesting to hear that, under its clear take-down policy, UBC receives only 2-3 take-down notices per year.

The final discussions of the day covered community needs for the future. These included two topics carried over from the morning’s session: how we make the case to researchers and senior management for wider application of identifiers to theses, and what can be done to make technical integration and workflow changes possible or easier. We also dug down into the other persistent identifiers related to theses that would support the needs of the UK community (such as organisation identifiers and funding identifiers), the potential for the Library to mass-digitise theses and assign DOIs to them, and the other steps that can be taken to break data out of the thesis.

Through these discussions we got a strong steer as to what we at the British Library need to do to help to support the community in using persistent identifiers as a way of encouraging greater availability of doctoral research. These include providing:

  • more advocacy for PIDs – for example to students & research managers. We heard that a message from BL goes a long way – ‘we have to ask you to claim an ORCID iD because the British Library says so’, or ‘DOIs are needed because national thesis policy says so’
  • metadata guidance for libraries. What we already provide is great but we could do more of it, e.g. best practice examples, a support desk, and engagement with system suppliers on behalf of institutions
  • preservation of digital theses. This is urgently needed
  • a big piece of IPR work to give institutions the confidence to make legacy theses open access without express permission, including a press campaign to drive interest & support.

But it is not only the Library that attendees thought may influence developments. There was also a clear appetite for stronger mandates from funders to support the deposit of open theses and reduction of embargo periods. There was also interest in national-level activities such as a national strategy for UK theses or a Scholarly Communication Licence for theses.

It’s clear there’s still a lot to be done before we’re at a stage where we can rely on persistent identifiers to help us jail-break research data out of thesis appendices. But we’ll continue to work with the community on this through EThOS and DataCite UK. We hope to hold a webinar in 2018 to talk more about the outcomes of this workshop, but in the meantime you can direct any questions on this work to [email protected].

This post is by Rachael Kotarski, the British Library's Data Services Lead, on Twitter as @RachPK.