THE BRITISH LIBRARY

The Newsroom blog

6 posts categorized "Metadata"

24 September 2015

Mining the FT

We're pleased to announce a partnership with the Financial Times to open up its archives to new kinds of research. The daily business newspaper has been running since 1888, and has a wealth of information on national and international economic news and, in recent years, reporting on general news, the arts and society. Its digital archive is available in the standard search-and-browse manner to institutional subscribers via Cengage Gale, but the newspaper is interested in exploring different ways to make its archives available, with an emphasis on what can be done with its data.

The full digital archive runs 1888-2010 and comprises 903,029 pages from 37,464 print editions. However, the collaboration is starting off with a relatively small amount of content, which may expand later. The FT has agreed a licence which permits use of the data for academic research purposes, either onsite at the British Library or via controlled remote access. 

Four complete sample years of FT page images (as JPEGs) and data (XML) are being made available to research teams: 1888, 1939, 1966 and 1991. The licence runs to the end of 2015, when we will review what has been learned and see how access and use may be extended thereafter. The sample years would therefore be ideal for researchers developing data-driven projects who need some test content to scope future plans, or to test tools or applications that they may be developing.

Anyone who is interested should get in touch with Luke McKernan, Lead Curator News & Moving Image at the British Library, who can provide further details. Research teams may also be interested in taking part in the Library's first news hackathon, scheduled for November 16th, which will include FT data alongside data derived from the Library's own news collection. More news on this will be published soon.

The collaboration with the Financial Times is one part of emerging plans for British Library news data. The structure of news content offers numerous opportunities for analysing, interrogating, visualising and rethinking what news archives are today, as well as creating new kinds of newspaper and other news media history. We held a news data workshop on September 7th, where we brought together researchers, developers and content owners to look at ways we might develop plans for news data that would best benefit researchers. There's a report on the workshop on our Digital Scholarship blog.

We hope to issue news in the near future on further news archive datasets that we can make available for research.

 

29 August 2014

St Pancras Intelligencer no. 33

Your humble blogger is taking a rest from Newsroom duties for a couple of weeks while he heads off on vacation, so there will be no St Pancras Intelligencer next Friday, nor the next. So make the most of this week's select gathering of news about news, and look out for plenty more from the Newsroom blog on our return. 

GDELT comparison of 'conflict events' in Germany 7/8/2009 – 9/6/2009 (green left of black line) and 9/6/2009 – 11/5/2009 (green right of black line) compared with Egypt (red) - see http://blog.gdeltproject.org/towards-psychohistory-uncovering-the-patterns-of-world-history-with-google-bigquery/

Can computers replace historians?: Rory Cellan-Jones at BBC News notes the work of the GDELT project ('a global database of society'), which has collected media reports of events from sources in more than 100 languages covering a period of 35 years. It is using the data to draw out the pattern of world events with the sort of analysis that would have taken historians years to compile in the traditional manner. News looks like it is the first draft of history after all.

'Daily Mail' solves Internet paradox: Michael Wolff at USA Today looks admiringly on how the Daily Mail created the separate beast of Mail Online and created the world's 'most-trafficked' English-language newspaper website.

Open journalism also means opening up your data, so others can use and improve it: Gigaom's Mathew Ingram (never a week goes by without our recommending his writings) calls for journalists to free up their data - because it's good for journalism.

How the news upstarts covered ISIS: DigiDay examines how news' new kids on the block, including Vice, BuzzFeed, Mashable, International Business Times and Vocativ, have been beating newspapers at their traditional game when it comes to coverage of the rise of ISIS.

https://bellingcat.com/resources/case-studies/2014/08/22/gun-safety-self-defense-and-road-marches-finding-an-isis-training-camp/

Gun Safety, Self Defense, and Road Marches – Finding an ISIS Training Camp: Talking of which, news coup of the week was undoubtedly Eliot Higgins' Kickstarter-funded citizen journalism site, Bellingcat, which showed how to identify the location of an ISIS training camp using Google Earth and Bing Maps.

Can the UK’s broadcast news providers keep doing more for less?: Former ITN chief turned journalism academic Stewart Purvis looks at the struggles broadcasters have, caught between the demands of innovation and tradition:

At the opposite ends of the scale are the traditional TV news audience, predominantly over 55 years of age, and the 16-34 audience which is converting to or adopting online news use at a startling rate, especially since the arrival of smart phones and tablets ... whereas daily average TV viewing is currently three times higher among adults aged 55-plus than among adults age 16-34, the ratio is more like five or six to one when it comes to news. In the middle is the 35-54 audience which currently has a foot in both camps but whose future allegiance to TV news cannot be taken for granted.

Vice News sparks debate on engaging younger viewers: On the same theme, The Guardian looks at how traditional broadcasters such as the BBC and Channel 4 News are aiming to attract a generation at home on YouTube and social media. 

Is local TV vanity over sanity?: Media Week looks at how the plans are going for the launch of local television stations across the UK, and doesn't think that things are going too well.

New Orleans newspaper page, from www.noladna.com

Old newspapers, new value: Printmaker J.S. Makkos writes a beautifully-illustrated piece for The Atlantic about making new products out of old New Orleans newspapers, and reminds us of old controversies about the disposal of surplus newspaper archives and the dangers of keeping only the grey images of microfilm. For more, see the New Orleans Digital Newspaper Archive.

The Times' newsroom set to ring with the sounds of typewriters once more: What fun - a speaker has been introduced into The Times newsroom at London Bridge, which relays the sounds of typewriters, recalling the newsroom of old. The intention is apparently to boost energy levels and encourage journalists to meet deadlines as the sound of the typewriters rises to a crescendo. Ian Burrell at The Independent looks on, with not a little bemusement.

11 April 2014

St Pancras Intelligencer no. 13

Welcome to the latest edition of the St Pancras Intelligencer, our weekly round-up of news about news - stories about news production, publications, apps, digitised resources, events and what is happening with the newspaper collection (and other news collections) at the British Library.  

The Newsroom

Opening day: So of course the British Library tops the week's news about news with the opening on April 7th of the Newsroom, its new reading room for news. Newspapers, television news, radio news and web news can now all be found in the one physical space - though for newspapers that means microfilm and digital for now, until the print papers become available again in the autumn. It all looks very beautiful - and has a lot more people in it than in this photo taken just before it opened.

Shift 2014: It's all been happening here this week, with Newsworks, the marketing body for UK national newspapers, holding its Shift 2014 conference at the British Library. The live blog of the event includes reactions to star turns such as the editors of The Guardian (Alan Rusbridger), The Independent (Amol Rajan) and The Telegraph (Jason Seiken) and Sir Martin Sorrell, chief executive of WPP. Jason Seiken's speech is here.

Here & Then: And there's more. The British Newspaper Archive, which provides digitised copies of British Library newspapers online, has issued a free iPhone app, Here & Then, with articles, images and adverts from the collection. Oh, and 135,000 pages were added to the BNA site in March.

What will yesterday’s news look like tomorrow?: Article of the week, by a mile. Adrienne LaFrance at Medium looks at the future of news archives, with a focus on how they are catalogued and their data mapped for rediscovery in the future. "News organizations need to design archives that better mirror the experience of consuming news in real time, and reflect the idea that the fundamental nature of a story is ongoing".

The Press Freedom Issue: Contributoria, the community funded, collaborative journalism site, published a special issue on press freedom this month. Among the great articles available are Crowdfunding critical thought: How alternative finance builds alternative journalism, Court and council reporting - still a bedrock of local news?, Pirate journalism and The printing press created journalism. The Internet will destroy it. Read and learn.

News is still a man's world: A City University study reveals that male experts still outnumber female experts by a ratio of four to one on flagship radio and TV news programmes.

Has Thompson at the NYT given newspapers a new way to pull in extra cash and readers?: Mark Thompson, former BBC DG and now heading the New York Times, may have had a big idea - New York Times Premier, an added subscription to the online version of the newspaper, with additional content, offers (two free ebooks a month), even special crosswords. The Drum speculates.

Upvoting the news: Long, engrossing article by Alex Leavitt for Medium on how news spreads across social media channels, with particular emphasis on Reddit.

The state of Egypt's news media: Al Jazeera's excellent news analysis programme The Listening Post looks at the "sorry state of journalism in Egypt".

A sample 'card' from Vox.com

Three good things about Ezra Klein’s new site Vox, plus three challenges that it faces: The much-hyped Vox.com site, with celebrity news blogger Ezra Klein, launched on April 6th. Mathew Ingram at Gigaom says what he likes (especially the user-friendly 'cards' with background information to stories) then wonders how it will thrive.

Bristol Post editor baffled by fact that front page gay kiss costs thousands of sales: Press Gazette reports on what happened when Bristol Post editor Mike Norton decided to put same-sex marriage on his paper's front page.

'Video-checking' the Clegg and Farage debate: Fact-checking videos - where videos of speeches are analysed to see whether or not the statements made stand up - have been popularised by The Washington Post's Truth Teller. Now the fact-checking organisation Full Fact have done the same for LBC's Nick Clegg v Nigel Farage debate.

Peaches Geldof – was the coverage by newspapers, and TV, over the top?: Roy Greenslade ponders what would have been proportionate news coverage for the sad death of Peaches Geldof.

More UGC, fewer photographers – and no paywalls: Editors set out visions of future: Hold the Front Page reports on the Society of Editors Regional Conference, where likely changes to the regional newspaper world were set out: user-generated content, smaller offices, cover price rises, no staff photographers, and no paywalls.

One easy, transparent way of making accuracy visible: open sourcing: George Brock argues that the way for news providers to build up trust is through links to source material - footnotes, sort of, though he prefers the term open sourcing. 

How some journalists are using anonymous secret-sharing apps: Using apps like Whisper and Secret to turn rumour into news.

We need to talk: Raju Narisetti, senior vice president of strategy at News Corp, poses 26 questions to ask news organisations about the move to digital. Fascinating insight into a business in transition.

 

04 April 2014

St Pancras Intelligencer no. 12

Welcome to the latest edition of the St Pancras Intelligencer, our weekly round-up of news about news - stories about news production, publications, apps, digitised resources, events and what is happening with the newspaper collection (and other news collections) at the British Library.  

From The Poke via @jameshoggarth

45 local news stories that rocked the world: It started with Patrick Smith at Buzzfeed - now headlines from UK regional newspapers are fast becoming an Internet cult. The Poke collect 45 that show just why we love local newspapers so.

Against beautiful journalism: Thought-provoking article from Felix Salmon at the Reuters blog, who argues against the over-designed nature of some (mostly American) news sites. "Today, when you read a story at the New Republic, or Medium, or any of a thousand other sites, it looks great; every story looks great. Even something as simple as a competition announcement comes with a full-page header and whiz-bang scrollkit graphics. The result is a cognitive disconnect..."

How 3 publishers are innovating with online video: Journalism.co.uk looks at how Huffington Post, the Washington Post and BuzzFeed are taking different approaches to using video, as discussed at the FT Digital Media conference.

Harry Chapman Pincher: Perhaps the best-named journalist ever, certainly one of the most famous living British journalists, Chapman Pincher has turned 100 years old and is still writing. Nick Higham at BBC News profiles the man who became legendary for his espionage scoops.

Safeguarding the “first rough draft of history”: How pleasant to have a history of newspapers (with thank yous to the British Library for its newspaper preservation work) from Sylvia Morris at the excellent Shakespeare Blog.

In praise of the almost-journalists: A fine piece by Dan Gillmor at Slate on the distinctive contribution to online news made by advocacy organisations such as Human Rights Watch and Cato Institute.

News Corp boss brands Washington Post journalists 'high priests': Not such good times for journalists of the old school. The Guardian reports how News Corp's Chief Executive Robert Thomson feels that the Washington Post's journalists have failed to embrace the transition to digital.

Apple Adds Talk Radio And News To iTunes Radio Starting With NPR: iTunes Radio gets its first non-music offering with this team-up with NPR (National Public Radio), TechCrunch reports.

Journalists increasingly under fire from hackers, Google researchers show: Ars Technica reports that news organisations are increasingly being targeted by state-sponsored hackers.

The Evolution of Automated Breaking News Stories: Is this the future of news? Technology Review reports on how a Google engineer has developed an algorithm, Wikipedia Live Monitor, that spots breaking news stories on the Web and illustrates them with pictures. Now it is tweeting them.

Debugging the backlash to data journalism: Data journalism has been all the rage, so inevitably there has been a backlash. Alexander Howard at Tow Center provides a good overview of the phenomenon, its strengths and its limitations.

Taming the news beast: The Newsroom blog goes to an International Society for Knowledge Organization event on news archives and news metadata, and comes back thoughtful.

London Live – capital's first dedicated TV channel – takes to the air: The Evening Standard-backed TV channel went live on March 31st. Meanwhile, Jim Waterson at BuzzFeed provides an entertaining history of the last time someone tried to launch a TV station called London Live.

The Guardian crowned newspaper of the year at Press Awards for government surveillance reports: Press Gazette names all the winners at the Press Awards. Meanwhile, former Guardian columnist Glenn Greenwald has won the University of Georgia's McGill Medal for Journalistic Courage.

German officials ban journalist from naming his son #Wikileaks. No comment.

02 April 2014

Taming the news beast

Taming the News Beast was the striking title of a seminar held on April 1st by ISKO UK, the British branch of the International Society for Knowledge Organization. Subtitled "finding context and value in text and data", its aim was to explore the ways in which we can control the explosion of news information data and derive value from it. Much has been written about this explosion from the points of view of its producers and consumers, but less well known are the huge challenges it presents for those whose job it is to manage such data by working effectively with those who generate it. Few environments depend more on effective information management - while creating any number of problems for those trying to apply the rules - than the news industry today. Hence the seminar, which aimed "to share knowledge from the intersections of technology, semantics and product development".

Looking at the large lecture theatre at University College London filled to the brim with an enthusiastic audience of data developers, information scientists, journalism students and archivists, your blogger was moved to think that things were very different to when he spent his time at library college, many years ago now. Library and information studies, as they called it then, excited no one. Now, in the era of big data, it is where the big ideas are happening. Librarians (let's continue to give them their traditional name) are masters of the digital universe, or might aspire to be. Metadata is cool; ontologies are where it's at; semantics really means something.

The epitome of this excitement about information management - particularly news information - is the work coming out of BBC development projects such as BBC News Labs, which was introduced in a presentation by its Innovation Manager, Matt Shearer. News Labs has a small team of people looking at better ways in which to manage news information, both within and outside the BBC. Its work includes the Juicer API (for semantic prototyping), the #newsHACK days for testing of product development ideas, entity extraction (extracting key terms from a mass of unstructured text), linked data (the important principle of working with data based on terms produced for DBpedia which other institutions can share in to create linked-up knowledge) and the Storyline ontology. There is particular excitement in trying to extract searchable terms for audiovisual media, through such technologies as speech, image and music recognition. If there is a pattern, the machines can be trained to recognise it.
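
For readers wondering what entity extraction looks like in practice, here is a deliberately crude sketch. It is not the Juicer API, nor anything shown at the seminar: it simply assumes that runs of capitalised words are candidate names and counts them. The function name, the stopword list and the sample text are all invented for illustration, and real systems rely on trained models and linked-data lookups rather than a regular expression.

```python
import re
from collections import Counter

# Toy heuristic (an assumption, not a production method): a run of
# capitalised words is treated as one candidate entity.
CANDIDATE = re.compile(r"\b[A-Z][A-Za-z]+(?:\s+[A-Z][A-Za-z]+)*\b")

# Capitalised words that often open a sentence but are not entities.
STOPWORDS = {"The", "An", "He", "She", "It", "They", "In", "On", "At"}


def extract_candidates(text: str) -> Counter:
    """Count candidate entity phrases found in unstructured text."""
    counts: Counter = Counter()
    for match in CANDIDATE.finditer(text):
        words = match.group(0).split()
        # Drop leading stopwords such as "The" picked up at a sentence start.
        while words and words[0] in STOPWORDS:
            words = words[1:]
        if words:
            counts[" ".join(words)] += 1
    return counts


# Invented sample text, loosely based on the Snowden example in this post.
sample = (
    "Edward Snowden remained at a Moscow airport. "
    "The BBC reported that Edward Snowden later found a job in Russia."
)
entities = extract_candidates(sample)
for phrase, count in entities.most_common():
    print(phrase, count)
```

Even on two sentences the weakness is plain: the heuristic cannot tell a person from a place, which is exactly the gap that linked data sources such as DBpedia are meant to fill.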

Shearer's enthusiastic and sometimes mind-spinning presentation was matched by his colleague Jeremy Tarling, data architect with News Labs, who introduced Storyline - an open data model for news. Storyline is a way of structuring news stories around themes, based on a linked data model. The linked data bit is the way of ensuring consistency and shareability (they are working with other news organisations on the project). The theme element is about a new way of presenting news online which joins up stories in a less linear, more intuitive fashion. If you type 'Edward Snowden' into a search engine you will get hundreds of stories - how to sort these out or to tell what the overarching narrative is that connects them all? If you can bundle the Snowden stories that your news organisation has produced around stories that go to make up the Edward Snowden theme - for example, Snowden at Moscow airport, Snowden finds job in Russia - you start to impose more of a pattern, and to draw out more of a story - the storyline, that is.
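
The bundling idea can be sketched in a few lines of code. What follows is not the Storyline ontology itself, which is a linked data model with its own vocabulary; it is a minimal stand-in, assuming only that each story carries a headline and a publication date and that a storyline is a named collection of stories. The class names, fields and dates are hypothetical placeholders.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Story:
    headline: str
    published: str  # ISO date string; the values used below are placeholders


@dataclass
class Storyline:
    theme: str
    stories: List[Story] = field(default_factory=list)

    def add(self, story: Story) -> None:
        """Attach a story to this theme."""
        self.stories.append(story)

    def timeline(self) -> List[Story]:
        """Return the stories in date order, oldest first."""
        return sorted(self.stories, key=lambda s: s.published)


snowden = Storyline(theme="Edward Snowden")
# Headlines come from the example in the post; dates are illustrative only.
snowden.add(Story("Snowden finds job in Russia", "2013-11-01"))
snowden.add(Story("Snowden at Moscow airport", "2013-07-01"))

for story in snowden.timeline():
    print(story.published, story.headline)
```

The point of the structure is that the theme, not the individual article, becomes the unit a reader navigates by.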

The nuts and bolts of this are interesting, because they require journalists to tag their stories correctly, and listening between the lines one could tell that some journalists were more willing and able to do so than others. But this sort of data innovation is happening, and it will have a dramatic impact on how news sources such as the BBC News website look in the future.

The energy, resources and ingenuity put into such work by the BBC can leave the rest of us overwhelmed, not to say humbled, but the remaining speakers had equally interesting things to say. Rob Corrao, Chief Operating Officer of LAC Group, gave a dry, droll account of how his consultancy company had been brought in to enable ABC News in New York to get on top of the "endless torrent" of news information coming in every day. This was a different approach to the problem, more of an exercise in logistics than simple data management policies. They managed the people and the work-processes first, then everything else fell into place. A content strategy was essential to understanding how best to manage the news process, including such simple ideas as prioritising the digitisation of footage of people likely to feature before long in obituary pieces. The more you know what the news will be in advance, the easier it is to manage it.

Ian Roberts of the University of Sheffield introduced AnnoMarket, a European-funded project which will process your text documents for you, or conduct analyses of news and social media sources. As automated metadata extraction tools start to make more of an impact (that is, tools which extract useful information from digital sources), so businesses are popping up which will do the hard work for you. Send them a large bunch of documents in digital form, and they will analyse them for you. Essentially it's like handing them a book and they give you back an index.
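
The book-and-index analogy can be made concrete with a tiny inverted index: documents go in, and a list of terms pointing back to the documents that contain them comes out. This is only a sketch of the general idea and makes no claim about how AnnoMarket works; the function name, the minimum word length and the sample documents are all assumptions made for the example.

```python
import re
from collections import defaultdict
from typing import Dict, List


def build_index(documents: Dict[str, str], min_length: int = 4) -> Dict[str, List[str]]:
    """Map each term of at least min_length letters to the documents containing it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in re.findall(r"[a-z]+", text.lower()):
            if len(word) >= min_length:
                index[word].add(doc_id)
    # Sort terms and document ids so the result reads like a printed index.
    return {term: sorted(ids) for term, ids in sorted(index.items())}


# Hypothetical documents, invented for illustration.
docs = {
    "doc-1": "News archives need metadata.",
    "doc-2": "Metadata helps researchers find news.",
}
index = build_index(docs)
for term, ids in index.items():
    print(term, ids)
```

A real service would add entity recognition, stemming and much else on top, but the shape of what you get back is the same: terms, and where to find them.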

Finally Pete Sowerbutts of the Press Association talked about how the news agency is applying semantic data management tools to its news archives, so that with a bit of basic information about a subject (e.g. name, age, occupation), place or organisation and some properly applied tagging, a linked-up catalogue starts to emerge. People, places and organisations are the subjects that all of the projects like to tackle, because they are easily defined. Themes - i.e. what news stories are actually about - are harder to pin down, semantically speaking.

Beneath all the jargon, much of this was about tackling age-old problems of how best to catalogue the world around us. Librarians in the room of a particular vintage looked like they had seen all of this before, and indeed they had. Librarians' role in life is to try to impose order on an impossibly chaotic world. Previously they came up with classification schemes and controlled vocabularies and tried to make real-life objects match these. Now we have automated systems which try to apply similar rules with reduced human intervention because of the sheer vastness of the data we are trying to manage, and because it is digital and digital lets you do this sort of thing. Yet real life continues to elude all of our attempts to describe it precisely. Sometimes the only way you are going to find out what a news publication is actually about is to pick it up and read it. But you still have to find it in the first place.

An unanswered question for me was whether what applies to news applies to news archives. News changes once it has been produced. It turns into a body of information about the past, where the stories that mattered when they were news may no longer matter, because researchers will approach the body of information with their own ideas in mind, looking across stories as much as they may look directly for them. Our finding tools for news archives must be practical, but they must not be too prescriptive. ABC News may hope to guess what the news will be in the future, but the news archivist can never be so presumptuous. It is you, the users, who will provide the storylines.

 

 

14 March 2014

St Pancras Intelligencer no. 9

Welcome to the latest edition of the St Pancras Intelligencer, our weekly round-up of news about news - stories about news production, publications, apps, digitised resources, events and what is happening with the newspaper collection (and other news collections) at the British Library. 

The Newsroom: Well of course we have to start with our own big news, which is that the Newsroom - the British Library's new reading room for news - opens at St Pancras on Monday 7 April. Is this the first library space ever to be named after a blog...?

Named Entity Recognition for newspapers: Not the most exciting title for a blog post, but something worth reading closely by anyone interested in the future of digitised newspaper research. Europeana Newspapers explains how key terms can be extracted from newspaper text to enhance search and improve linkage of data.

News Archive Connected Studio: Build Studio: Keep an eye on what Peter Rippon and his team at the BBC are doing in planning how to open up their news archives. Much audience testing is coming first.

Why Twitter will never be a news organization: An interesting interview in Time with Twitter's Head of News, Vivian Schiller. "The Twitter news team is never going to pick and choose news stories, pick and choose winners. That’s not our job at all. But what we need to do is ... to make it easier for news organizations but also for our consumers to find what they’re looking for."

Why Twitter can't keep crashing: Mat Honan at Wired says that Twitter has become too important to how the world gains its news to have the crashes that it not infrequently does have. "It is the definition of breaking news. Twitter is increasingly the key place where information is born – stuff that maybe starts with one person but is important to the whole world."

Strictly algorithm: Really interesting article by Stuart Dredge at The Guardian on how the news we want finds us - through algorithms - and what this means for news, journalism and democracy.

Thomas Jewell Bennett: an early supporter of Indian Home Rule: Pat Farrington writes for the British Library's Untold Lives blog on her great-uncle, editor of the Times of India, some of whose letters are held here.

Russia’s information warriors are on the march – we must respond: Anne Applebaum at the Telegraph sets out to sort out the truth from lies in the Russian media's reporting of the crisis in Ukraine.

Ah, sweet irony: For aficionados of errors in TV subtitles, much joy was brought about by this misinterpretation of Matt Frei talking about Russian Foreign Minister Sergey Lavrov on Channel 4 News.

BBC values: The BBC Academy interviews James Harding, director of BBC News, about values and maintaining audience trust.

Endangered species: At British Journalism Review Kim Fletcher argues that traditional newspaper editors are on their way out; content officers are on their way in.

Fleet Street editors of the past were little different from those of today: Talking of which, Roy Greenslade reviews Dennis Griffiths' Blum & Taff: A tale of two editors, on R.D. Blumenfeld and H.A. Gwynne, Fleet Street greats from another age.

Why venture capitalists are suddenly investing in news: Adrienne LaFrance at Quartz looks at why the investment money is pouring into the new kids on the news block: Buzzfeed, Upworthy, Vice etc. As one interviewee puts it: "They are all technology companies first ... They understand how people utilize technology and how to present and create content."

Journalism startups aren't a revolution if they're filled with all these white men: Emily Bell looks at the somewhat familiar make-up of some supposedly cutting edge news start-ups.

Robot reporters and the age of drone journalism: And finally, look out for Emily Bell's lecture on how new technologies are driving the future of journalism, at the British Library on 25 April.