THE BRITISH LIBRARY

Digital scholarship blog

21 July 2017

Russian Language Books Research Project by Nadya Miryanova

Finding digitised books in the Russian language in a collection of 65,000 books

Posted by Nadya Miryanova BL Labs School Work Placement Student, currently studying at Lady Eleanor Holles, working with Mahendra Mahey, Manager of BL Labs.

Background

Although there are 200 million items in the British Library, contrary to popular belief, only 1-2% of these items are digitised. The ‘Microsoft’ books are 65,000 volumes (about 22.5 million pages) published between 1789 and 1914 and digitised in partnership with Microsoft. They cover a wide range of subject areas, including philosophy, poetry and history, and they include text produced by Optical Character Recognition (OCR) from the millions of pages.

In discussion with Mahendra Mahey, Project Manager of BL Labs, we explored making a ‘sub collection’ from this larger set which will hopefully be of use to the library in the future. At first, I simply brainstormed possible ideas and looked at different possibilities for this project, and I thought that since 2017 celebrates a century since the Russian Revolution, I would do some research into the concept of ‘revolution’.

Revolution

Definition - A forcible overthrow of a government or social order, in favour of a new system.

Etymology - Late Latin ‘revolvere’, meaning to roll back, which turned into the Old French or Late Latin ‘revolutio’, from which came our contemporary English word ‘revolution’.

Revolutions date back to as early as 2730 BC, when the Set rebellion took place during the reign of the pharaoh Seth-Peribsen of the Second Dynasty of Egypt. The most recent revolution actually happened only last year in 2016, when there was a Turkish coup d'état attempt.

About the Russian Revolution

The British Library has recently opened an exhibition that perfectly captures not only the events that took place in this particularly intense period in history, but also the atmosphere that was omnipresent at the time. On my very first day here at the British Library, I got the chance to explore and study this fascinating exhibition in great depth.

The Russian Revolution was initiated by Lenin and the Bolsheviks, who hoped to create a socialist government, and in 1917, they successfully dismantled Tsarist autocracy in the hope of making society less stratified. The revolution resulted in the rise of the USSR and in the words of Karl Liebknecht, “The Russian revolution was to an unprecedented degree the cause of the proletariat of the whole world becoming more revolutionary”. However, this revolution also led to months of social and political turmoil and provoked the tragedy of the Russian Civil War on an unforeseeable scale, in which 10 million lives were lost. The revolution also produced myths that entered the artistic and intellectual fabric of the modern world, which the exhibition uncovers and investigates. Learn more about the Russian Revolution by booking your tickets for the Russian Revolution Exhibition at the British Library on the website http://goo.gl/FL9FFt.

Russian Revolution Poster
Russian Revolution Exhibition Poster at the British Library

As part of my research project, I also wanted to incorporate some of the other subjects that I had studied at GCSE, and so I thought this would be a brilliant opportunity to compare the Russian Revolution to the French Revolution, both French and Russian being subjects that I wish to study at A-level. The French Revolution was a period of far-reaching social and political upheaval in France that lasted from 1789 until 1799, and was partially carried forward by Napoleon during the later expansion of the French Empire.

Below is a mind-map I made detailing the differences and similarities between the French and the Russian Revolution.

Russian and French Revolution Research
French and Russian Revolution Comparison

Although my initial focus for the project was revolution, we soon established that it was too specific a topic and it would be more beneficial to focus on something broader, that would be useful to a larger group of researchers.

I soon discovered that the Russian titles within the digitised collection had never previously been separated and categorised, and being a native Russian speaker, I thought that this would be a better avenue to explore. This would be a project in commemoration of the 100th anniversary of the Russian Revolution, which would hopefully help researchers looking at books in the Russian language in the future.

Facts about the Russian Language

  • Largest European native language.
  • 7th most spoken language in the world.
  • There are only 200,000 words in the Russian language in comparison to 1,000,000 in English.
  • The stress pattern in a word can drastically change its meaning, e.g.:
    • я плачу (emphasis on second syllable) - I pay.
    • я плáчу (emphasis on first syllable) - I cry.

Approach

My first task was to examine a huge spreadsheet containing information about the 65,000 books in the collection.

  • In order to make this task a little less daunting, I first used the ‘Filter’ function in the language column of my Excel spreadsheet, and selected the Russian language. As a result, I found 583 books in total that were written in the Russian language.
  • I now had to think of a way to organise these books. The possibilities seemed endless. Should I sort them into history books? Science books? Books about Russia?
  • In the end, I decided to establish two broad categories as a starting point, fiction vs non-fiction, as this seemed like a logical place to start.
  • In order to access the Russian keyboard, I went onto the site translit.net, which turns normal Latin letters into Cyrillic.
  • I typed in a Russian word, using the English keyboard, that related to one of my two categories, e.g. for non-fiction, I wanted to find history related books, so used the simple word ‘history’, which translates as история.
  • I then copied this word, and pasted it into my spreadsheet.
  • I used the filter function on the 'Titles' section, and this would hopefully produce a number of books that included the word history in their title.
Spreadsheet Screenshot
Screenshot of my spreadsheet.
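
The same two filtering steps could also be scripted. The sketch below is only an illustration and not the method I used in Excel: the column names `language` and `title`, the sample rows and the helper `filter_titles` are all assumptions, since the real spreadsheet layout may differ.

```python
import csv
import io

# Hypothetical sample rows; the real spreadsheet's columns may be named differently.
SAMPLE_CSV = """title,language
История России,Russian
A History of England,English
Стихотворения,Russian
"""

def filter_titles(rows, language, keyword=""):
    """Return titles in `language` whose title contains `keyword` (case-insensitive)."""
    keyword = keyword.casefold()
    return [
        row["title"]
        for row in rows
        if row["language"] == language and keyword in row["title"].casefold()
    ]

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))

# Step 1: keep only the Russian-language books.
russian_titles = filter_titles(rows, "Russian")

# Step 2: within those, look for a keyword in the title.
history_titles = filter_titles(rows, "Russian", "история")

print(len(russian_titles), history_titles)
```

An exact substring search like this has the same limits as the spreadsheet filter: it only finds titles that contain the keyword in precisely the spelling that was typed.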


Challenges

In this project, I found that I had to overcome a number of difficulties.

  • In Russian, nouns can have up to 12 inflections and adjectives can have as many as 16. This meant that I had to look up several different forms of the same word.
  • As I said earlier, I first experimented with simple words, such as history. You would think that there would definitely be books relating to history lurking somewhere in a collection of nearly 600 Russian titles. However, when I conducted my search, the spreadsheet returned no results. Confused, I tried another simple word, and once again had no definitive results.

Scanning more closely through the list of books, I soon noticed that there were certain spellings and letters that I did not recognise. I decided to research this matter more closely, looking at the history of the Russian language, and found out that the Russian of the 19th century does not directly resemble the Russian language used today. Why? Because of the Russian Revolution, of course.

1918 Spelling Reform Research
Bolshevik Spelling Reform of 1918 Research, detailing the causes for the reform and the changes made to the Russian language

Suddenly, everything made a lot more sense.

This discovery meant that I had to change my approach a little bit. Rather than typing in Russian words in the modern spelling that I knew, I had to hunt through the spreadsheet for words in the titles that several books had in common. In a way, this made the process of my project even more interesting, despite the fact that it took longer.
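
An alternative to hunting by hand would be to convert the old spellings to approximate modern ones before searching. The sketch below is not what I did in this project, and it is deliberately simplified: it only replaces the four letters removed by the 1918 reform and drops the hard sign at the end of words, and it ignores other changes such as the old adjective endings (-аго becoming -ого). The function names `modernise` and `matches_stem` are made up for illustration. Searching by the start of a word (a stem) rather than the whole word also helps with the different inflections mentioned above.

```python
import re

# Letters removed by the 1918 reform, mapped to their modern replacements.
# Unicode escapes are used so the old letters cannot be confused with Latin look-alikes.
OLD_TO_NEW = str.maketrans({
    "\u0463": "е", "\u0462": "Е",  # yat (ѣ) -> е
    "\u0456": "и", "\u0406": "И",  # decimal i (і) -> и
    "\u0473": "ф", "\u0472": "Ф",  # fita (ѳ) -> ф
    "\u0475": "и", "\u0474": "И",  # izhitsa (ѵ) -> и
})

def modernise(title):
    """Approximate the modern spelling of a pre-1918 title (simplified rules only)."""
    text = title.translate(OLD_TO_NEW)
    # Drop the hard sign at the end of a word; it is kept inside words.
    return re.sub(r"[ъЪ]\b", "", text)

def matches_stem(title, stem):
    """True if any word in the modernised title starts with `stem`."""
    words = re.findall(r"\w+", modernise(title).casefold())
    return any(word.startswith(stem.casefold()) for word in words)

print(modernise("По Сѣверо-Западу Россіи"))
print(matches_stem("Исторія Россіи", "истори"))
```

With a helper like this, a modern stem such as «истори» could be matched against old-spelling titles without knowing in advance how each title was originally spelt.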

As I mentioned in my previous blog, the majority of the Russian language books were actually non-fiction. As a result, I decided to create sub-categories for the non-fiction set, which can be seen in the speech-bubble I created below.

Non-fiction categories
Speech bubble containing non-fiction categories

To help me in this task, I decided to create a colour-coding system for classification, so that I could keep track of my progress.

  • Yellow = classified
  • Purple = латиница (Latin letters) - quite often I found titles which were written in Russian but using Latin letters. Purple was also used for titles written in another language
  • Blue = unknown classification
  • Orange = near classification
Colour coding system
Screenshot of my spreadsheet showing the colour coding system that I used.

Evaluation

In conclusion, I managed to categorise the Russian language books into two broad categories, fiction and non-fiction, and I created 25 sub-collections within the non-fiction category. This project has been extremely enjoyable to work on, and although there were many challenges involved in the process, I have learnt lots during my research journey. To improve this project, more work needs to be done on splitting up the 'history' sub-collection of my non-fiction category, since it is very broad and covers political accounts as well as books about Russian history. Additionally, I think that this project would also benefit considerably from a thorough check with curators, in order to help classify some of the books I have not organised into separate collections yet.

Picture from Russian Book
An illustration from one of the Russian books, По Сѣверо-Западу Россіи, available in the digitised collections. Image can be accessed on British Library Flickr Commons.