Digital scholarship blog


26 January 2022

Which Came First: The Author or the Text? Wikidata and the New Media Writing Prize

Congratulations to the 2021 New Media Writing Prize (NMWP) winners, who were announced at a recent Bournemouth University online event: Joannes Truyens and collaborators (Main Prize), Melody MOU (Student Award) and Daria Donina (FIPP Journalism Award 2021). The main prize winner, ‘Neurocracy’, is an experimental dystopian narrative that takes place over 10 episodes, through Omnipedia, an imagined future version of Wikipedia in 2049. This seemed like a very apt jumping-off point for today’s blog post, which discusses a recent project where we added NMWP data to Wikidata.

Omnipedia, an imagined futuristic version of Wikipedia from Neurocracy by Joannes Truyens

Note: If you wish to read ‘Neurocracy’ and are prompted for a username and password, use the username NewMediaWritingPrize1 and the password N3wMediaWritingPrize!. You can learn more about the work in this article and listen to an interview with the author in this podcast episode.

Working With Wikidata

Dr Martin Poulter describes learning how to work with Wikidata as being like learning a language. When I first heard this description, I didn’t understand: how could something so reliant on raw data be anything like the intricacies of language learning?

It turns out, Martin was completely correct.

Imagine a stack of data as slips of paper. Each slip has an individual piece of data on it: an author’s name, a publication date, a format, a title. How do you start to string this data together so that it makes sense?

One of the beautiful things about Wikidata is that it is both machine and human readable. In order for it to work this way, and for us to upload it effectively, thinking about the relationships between these slips of paper is essential.

In 2021, I had an opportunity to see what Martin meant when he spoke about language, as I was asked to work with a set of data about NMWP shortlisted and winning works, which the British Library has collected in the UK Web Archive. You can read more about this special collection here and here.

Image of blank post-it notes and a hand with a marker pen preparing to write on one.

About the New Media Writing Prize

The New Media Writing Prize was founded in 2010 to showcase exciting and inventive stories and poetry that integrate a variety of digital formats, platforms, and media. One of the driving forces in setting up and establishing the prize was Chris Meade, director of if:book uk, a ‘think and do tank’ for exploring digital and collaborative possibilities for writers and readers. He was the lead sponsor of the if:book UK New Media Writing Prize, and the Dot Award, which he created in honour of his mother, Dorothy, and he chaired every NMWP awards evening since 2010. Very sadly Chris passed away on 13th January 2022 and the recent 2021 awards event was dedicated to Chris and his family.

Recognising the significance of the NMWP, in recent years the British Library created the New Media Writing Prize Special Collection as part of its emerging formats work. With 11 years of metadata about a born digital collection, this was an ideal data set for me to work with in order to establish a methodology for working with Wikidata uploads in the Library.

Last year I was fortunate to collaborate with Tegan Pyke, a PhD placement student in the Contemporary British Publications Collections team, supervised by Giulia Carla Rossi, Curator for Digital Publications. Tegan's project examined the digital preservation challenges of complex digital objects, developing and testing a quality assurance process for examining works in the NMWP collection. If you want to read more about this project, a report is available here. For the Wikidata work, Tegan and Giulia provided two spreadsheets of data (or slips of paper!), and my aim was to upload linked data that covered the authors, their works, and the award itself - who had been shortlisted, who had won, and when.

Simple, right?

Getting Started

I thought so - until I began to structure my uploads. There were some key questions that needed to be answered about how these relationships would be built, and I needed to start somewhere. Should I upload the authors or the texts first? Should I go through the prize year by year, or be led by other information? And what about texts with multiple authors?

Suddenly it all felt a bit more intimidating!

I was fortunate to attend some Wikidata training run by Wikimedia UK late last year. Martin was our trainer, and one piece of advice he gave us was indispensable: if you’re not sure where to start, literally write it out with pencil and paper. What is the relationship you’re trying to show, in its simplest form? This is where language framing comes in especially useful: thinking about the basic sentence structures I’d learned in high school German became vital.

Image shows four simple sentences: Christine Wilks won NMWP in 2010. Christine Wilks wrote Underbelly. Underbelly won NMWP in 2010. NMWP was won by Christine Wilks in 2010. Christine is circled in green, NMWP in purple, and Underbelly in yellow. QIDs are listed: Q108810306, highlighted in green; Q108459688, highlighted in purple; Q109237591, highlighted in yellow. Properties are listed: P166, highlighted in blue; P800, highlighted in turquoise; P585, highlighted in orange.
Image by the author, notes own.

The Numbers Bit

You can see from this image how the framework develops: specific items, like nouns, are given identification numbers when they become a Wikidata item. This is their QID. The relationships between QIDs, sort of like the adjectives and verbs, are defined as properties and have P numbers. So Christine Wilks is now Q108810306, and her relationship to her work, Underbelly, or Q109237591, is defined with P800 which means ‘notable work’.

Q108810306 - P800 - Q109237591

You can upload this relationship using the visual editor on Wikidata, by clicking fields and entering data. If you have a large amount of information (remember those slips of paper!) tools like QuickStatements become very useful. Dominic Kane blogged about his experience of this system during his British Library student placement project in 2021.
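To make the batch route a little more concrete, here is a minimal Python sketch that turns 'slips of paper' into lines for QuickStatements. It assumes the tab-separated version 1 command format (item, property, value) and uses only the QIDs and property quoted above; the Statement class and the to_quickstatements function are hypothetical names chosen for illustration, not part of any tool.

```python
from typing import Iterable, NamedTuple


class Statement(NamedTuple):
    """One 'slip of paper': a subject item, a property, and a value item."""
    subject: str  # QID of the item being described, e.g. an author
    prop: str     # P number of the property, e.g. P800 ('notable work')
    value: str    # QID of the value, e.g. a work


def to_quickstatements(statements: Iterable[Statement]) -> str:
    """Render statements as tab-separated lines, one command per line."""
    return "\n".join("\t".join(s) for s in statements)


# Christine Wilks (Q108810306) - notable work (P800) - Underbelly (Q109237591)
batch = [Statement("Q108810306", "P800", "Q109237591")]
print(to_quickstatements(batch))
```

The printed text could then be pasted into the QuickStatements batch box. It is worth checking a small batch by hand before running a large one.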

The intricacies of language are also very important on Wikidata, because specific terms carry nuance and inference. The concept of ‘winning’ an award became a subject of semantic debate: the taxonomy of Wikidata advises that we use ‘award received’ in the case of a literary prize, as it’s less of an active sporting competition than something like a marathon or an athletic event.
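As an illustration of how 'Christine Wilks won NMWP in 2010' might be phrased as 'award received', the sketch below builds a single QuickStatements line using P166 ('award received') with P585 ('point in time') as a qualifier, the properties listed in the image above. The function name is hypothetical, and the year-precision date format (ending in /9) is an assumption based on the version 1 QuickStatements syntax, so treat this as a sketch and not as the exact upload used in the project.

```python
def award_received_line(recipient_qid: str, award_qid: str, year: int) -> str:
    """Build one tab-separated line: recipient, P166, award, P585, year."""
    # '/9' marks the date as precise to the year only (assumed V1 syntax).
    point_in_time = f"+{year:04d}-01-01T00:00:00Z/9"
    return "\t".join([recipient_qid, "P166", award_qid, "P585", point_in_time])


# Christine Wilks (Q108810306) received the NMWP (Q108459688) in 2010.
print(award_received_line("Q108810306", "Q108459688", 2010))
```

The same function could be called with a work's QID as the recipient, for example Underbelly (Q109237591), to record that the work itself received the award.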

Asking Questions of the Data

Ultimately we upload information to Wikidata so that it can be queried. Querying uses SPARQL, a language which allows users to draw information and patterns from vast swathes of data. Querying can be complex: to go back to the language analogy, you have to phrase the query in precisely the right way to get the information you want.

One of the lessons I learned during the NMWP uploads was the importance of a unifying property. Users will likely query this data with a view to surveying results and finding patterns. Each author and work, therefore, needed to be linked to the prize and the collection itself (pictured above). By giving each author a P6379 (‘has works in the collection’) statement with the collection's QID as its value, we create a web of data that links every shortlisted author over the 11-year time period.
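To show why a unifying property makes querying easier, here is a minimal Python sketch that builds a SPARQL query for every item linked to a given target by a given property, plus a URL for the public Wikidata Query Service. The function names are hypothetical, and the example uses P166 with the prize's QID from the image above because the collection's own QID is not quoted in this post; for the collection, you would pass P6379 and that QID instead.

```python
from urllib.parse import urlencode

# Public SPARQL endpoint of the Wikidata Query Service.
ENDPOINT = "https://query.wikidata.org/sparql"


def linked_items_query(property_id: str, target_qid: str) -> str:
    """Return a SPARQL query for all items with property_id -> target_qid."""
    return (
        "SELECT ?item ?itemLabel WHERE {\n"
        f"  ?item wdt:{property_id} wd:{target_qid} .\n"
        '  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }\n'
        "}"
    )


def query_url(sparql: str) -> str:
    """Return a GET URL that asks the endpoint for JSON results."""
    return ENDPOINT + "?" + urlencode({"query": sparql, "format": "json"})


# Everything with 'award received' (P166) pointing at the NMWP (Q108459688).
query = linked_items_query("P166", "Q108459688")
print(query)
print(query_url(query))
```

The printed query can also be pasted straight into the Wikidata Query Service editor, which is a gentler way to experiment than calling the endpoint from code.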

Getting Started

To have a look at some of the NMWP data, here are some queries I prepared earlier. Please note that data from the 2021 competition has not yet been uploaded!

Authors who won NMWP

Works that won NMWP

Authors nominated for NMWP

Works nominated for NMWP

If you fancy trying some queries but don’t know where to start, I recommend these tutorials:

Tutorials

Resources About SPARQL

This post is by Wikimedian in Residence Dr Lucy Hinnie (@BL_Wikimedian).