How to turn 47,000 pages of old newspapers into meaningful information?
For a research group at the University of Bristol, the answer is: big computers and historical context.
Led by Nello Cristianini, Professor of Artificial Intelligence, the group digitised 47,000 pages of two Italian-language local newspapers from the city of Gorizia, using the facilities of FindMyPast, based at the British Library in Boston Spa. They then used optical character recognition (OCR) software to extract digital text, and finally compared it with the digital text of three Slovenian newspapers from the same place and time, to provide context.
Gorizia lies at the crossroads of the Latin, Germanic and Slavic-speaking worlds, and its population reflects this. Until 1918, it was known as Görz, and was part of the Habsburg Empire, though latterly coveted by the young Kingdom of Italy. These last years before World War One were particularly notable, as the political and ethnic tensions within the empire and over its borders played out in the city itself. The two main linguistic communities, Italian and Slovenian, published their own newspapers, and the latter have been digitised by the Slovenian Digital Library. But until the Bristol University group started work, the Italian ones were preserved on microform alone in the Biblioteca Statale Isontina, which first collected the paper versions.
The Corso Giuseppe Verdi in Gorizia, early 20th-century postcard, reproduced in Srečko Gombač, Brata Edvard in Josip Rusjan iz Gorice: začetki motornega letenja med Slovenci (Ljubljana, 2004) YF.2007.a.13615
The team, including computer scientists and a historian, carried out statistical analysis on the newspapers, looking at the frequency of different words or phrases. This process revealed the individual stories of thousands of people, but also the collective trends of a population in the years leading up to the War and the final days of Empire. As the city lies in a quiet corner of central Europe, now divided between Italy and Slovenia, many of these stories and trends had been forgotten until now.
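As a rough illustration of what counting word frequencies over time can look like, here is a minimal Python sketch. It is not the team's actual pipeline: the function name `yearly_relative_frequency`, the tokenisation rule and the tiny Italian sample texts are all assumptions made up for this example. It counts how often a term appears per 1,000 words in each year, which is the kind of figure a timeline can be drawn from.

```python
import re
from collections import Counter


def yearly_relative_frequency(pages, term):
    """Return {year: occurrences of `term` per 1,000 words}.

    `pages` is an iterable of (year, text) pairs, e.g. OCR output
    tagged with its publication year (a hypothetical input format).
    """
    term = term.lower()
    hits = Counter()    # occurrences of the term, per year
    totals = Counter()  # total word count, per year
    for year, text in pages:
        # Crude tokenisation: runs of letters/digits, lower-cased.
        words = re.findall(r"\w+", text.lower())
        totals[year] += len(words)
        hits[year] += words.count(term)
    return {
        year: 1000 * hits[year] / totals[year]
        for year in sorted(totals)
        if totals[year] > 0
    }


# Invented sample text, for illustration only (not from the newspapers).
sample_pages = [
    (1909, "la cometa di Halley si avvicina"),
    (1910, "la cometa la cometa passa sopra la città"),
    (1910, "il treno arriva alla stazione"),
]

timeline = yearly_relative_frequency(sample_pages, "cometa")
for year, rate in timeline.items():
    print(year, round(rate, 1))
```

Dividing by the total number of words per year matters because the volume of surviving pages varies from year to year; raw counts alone would mostly reflect how much text was scanned.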
Professor Cristianini says: “In the space of a few decades, the town embraced new ways to communicate, such as the cinema and the telephone, along with new modes of transportation, like the car, the airplane, the bicycle and the train. Far from being a backwater in a decaying empire, this was a city with an eye on the future and an interest in new ideas – including political ones. It was, however, also a time in which new tensions emerged along ethnic lines and a time of rapid change, with problems and anxieties that sound very familiar to the modern ear. It is incredibly fortunate that the collection of newspapers in the Biblioteca Isontina library survived so many threats. We get a glimpse of the last years of a world heading towards a new chapter in its history during a period that transformed it beyond recognition. We see new technologies, new ideas, new economic opportunities, new cultural challenges and problems.”
Among the patterns the team extracted are timelines that pinpoint such significant events as the arrival of Halley’s Comet, the visits of the Emperor Franz Joseph, or the devastating 1895 earthquake in Ljubljana (then Laibach, capital of the Habsburg duchy of Carniola). Fascinatingly, they found that the earthquake was more noted in the Slovenian-speaking community than the Italian, since Ljubljana was already predominantly Slovenian-speaking itself and had less significance to the Empire’s Italians as a regional centre.
Other ground-breaking events in the city at the time included the construction of the new Transalpina/Bohinj railway, which carried tourists from Vienna to Lake Bled and further, but was also to be used for more prosaic reasons. Then, most glamorous of all, two local brothers named Edvard and Josip Rusjan were among the first aviators in the Austro-Hungarian Empire.
The team’s findings also highlight how the war transformed the city and its surrounding county into something entirely different. During the war the front lines crossed through Gorizia itself and the urban population was largely relocated. In 1918, Italy annexed it, and twenty years of fascism and then another war followed. After 1947, the border between Italy and Yugoslavia ran right through the former county, partly separating the city centre from some of its neighbourhoods. Until Slovenia joined Schengen in 2007, this border had real impact, leading to the growth of a “replacement” city, Nova Gorica, on the Yugoslav/Slovenian side, while historic Gorizia became something of a backwater, isolated from its hinterland and feeling neglected by Rome.
Above: View of the Castle in Gorizia in 1917, showing First World War bomb damage, from Enrico Galante, Gorizia e i campi di battaglia dell'Isonzo et del Carso (Gorizia, ) 9084.aaa.10. Below: Gorizia Castle today (Photograph Janet Ashton)
The project, from scanning and indexing to in-depth analysis, combined methodologies from both library science and historical research, as well as employing mathematical expertise, and illustrates how digital humanities is bridging the traditional boundaries between disciplines. A full study of the project’s methods and its findings, “Large scale content analysis of historical newspapers in the town of Gorizia, 1873-1914”, by N. Cristianini et al., has recently been published in the journal Historical Methods.
Janet Ashton, WEL Cataloguing Team Manager