30 October 2013
Guess the journal!
Over recent months I’ve been working on-and-off with a collection of metadata relating to articles published since 1995 in journals the library have categorised under the ‘History’ subject heading. That is 382497 rows of data (under CC0) about publication habits in the historical profession, which lend themselves to some interesting analysis and visualisation.
To recap from previous posts on this blog and on another, I started this work by extracting words which frequently occurred within journal article titles. Having filtered out words whose meaning was fuzzy (‘new’, ‘early’, ‘late’, ‘age’) or whose presence was not helpful (‘David’), I was left with this list of topwords (I’ve avoided ‘keywords’, I just don’t like the word at the moment):
africa america archaeology art britain british china chinese cultural culture development empire england europe france historical history identity life making medieval national policy political politics power revolution social society state study women world
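For anyone wanting to try something similar, the word-counting step can be sketched in a few lines of Python. This is a minimal illustration, not the script used for this post: the sample titles, the `excluded` set (which mixes the filtered words mentioned above with a few common function words) and the `count_title_words` helper are all assumptions made for the example.

```python
import re
from collections import Counter

# Hypothetical sample titles; the real data set is far larger.
titles = [
    "A New History of Medieval England",
    "Women and Power in Early Modern France",
    "David Hume and the Politics of History",
]

# Assumed filter list: the fuzzy or unhelpful words named in the post,
# plus a handful of function words added for this example.
excluded = {"new", "early", "late", "age", "david", "a", "of", "and", "in", "the"}


def count_title_words(titles, excluded):
    """Count lower-cased words across all titles, skipping excluded words."""
    counts = Counter()
    for title in titles:
        for word in re.findall(r"[a-z]+", title.lower()):
            if word not in excluded:
                counts[word] += 1
    return counts


counts = count_title_words(titles, excluded)
# Print the most frequent remaining words with their counts.
print(counts.most_common(5))
```

A simple pattern like `[a-z]+` drops accented characters, so non-English titles would need a more careful tokeniser.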
Next I created a .csv where each row represented an occurrence of one of these 33 topwords in an article title. This totalled 209210 rows. Although this is fewer than the total number of rows, many titles contained more than one of these words, so some articles were represented more than once.
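To make the shape of that file concrete, here is a rough sketch of how such rows could be produced. The column names, the three sample articles, the shortened `TOPWORDS` set and the `occurrence_rows` helper are hypothetical, and the sketch assumes a topword is counted once per title even if it appears twice.

```python
import csv
import io
import re

# A small subset of the topwords, for illustration only.
TOPWORDS = {"history", "medieval", "england", "women", "power", "france"}

# Hypothetical article metadata; column names are assumptions.
articles = [
    {"journal": "Journal A", "year": 1999, "title": "A New History of Medieval England"},
    {"journal": "Journal B", "year": 2004, "title": "Women and Power in France"},
    {"journal": "Journal B", "year": 2005, "title": "Notes and Queries"},
]


def occurrence_rows(articles, topwords):
    """Return one row per (article, topword) pair found in the title."""
    rows = []
    for article in articles:
        words = set(re.findall(r"[a-z]+", article["title"].lower()))
        # Sorted so the output order is stable from run to run.
        for word in sorted(words & topwords):
            rows.append({**article, "topword": word})
    return rows


rows = occurrence_rows(articles, TOPWORDS)

# Write to an in-memory buffer; swap in open("occurrences.csv", "w", newline="")
# to write a real file.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["journal", "year", "title", "topword"])
writer.writeheader()
writer.writerows(rows)
csv_text = buffer.getvalue()
print(csv_text)
```

In this toy example the first two articles each produce three rows and the third produces none, which is how an occurrence file can hold fewer rows than the source data while still repeating some articles.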
Before we get to the fun bit, there are a number of problems with the data that need pointing out:
- There are some odd gaps and declines in article volume for some journals around 2005. This isn’t due to actual publication trends, so we are working out why the data isn’t accurate – huge thanks to the Metadata Services team (especially Corine Deliot) for their hard work.
- The volume of English language titles smothers the titles in other languages, such as Italian and – notably thanks to Zeitschrift für Geschichtswissenschaft – German, leaving us with very Anglophonic data. I’d like to do some translating, but for now I’ll restrict myself to trends in English language articles.
- The data isn’t smoothed by articles per journal issue (or articles per journal per year), thus ‘power’ journals are created on sheer volume of output alone (and, as we should all know, and should hope will be the mantra of future academic publication, less can be more…).
- The data includes reviews, though this isn’t necessarily a bad thing as it adds book titles to the list of titles mined (hence why ‘David’ turned up among the topwords before filtering).
- Some words have multiple meanings (china) or are ill-suited to simple text mining (art), but then corpus linguists have known this for years.
- Some journals in the data are not really history journals, but rather politics and current affairs publications with a sprinkling of historical content. Archaeology is similarly problematic, but I’ve left these journals in for now out of a sense of GLAM solidarity.
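The smoothing problem noted above could be tackled by dividing each journal's topword counts by the number of articles that journal published. The sketch below shows one possible approach under invented numbers; the journal names, the counts and the `occurrences_per_article` helper are all hypothetical, and nothing like this has been applied to the data behind this post.

```python
# Hypothetical totals: articles published per journal over the period.
article_totals = {"Journal A": 2000, "Journal B": 200}

# Hypothetical raw counts of a topword in each journal's titles.
topword_counts = {
    ("Journal A", "empire"): 100,
    ("Journal B", "empire"): 40,
}


def occurrences_per_article(topword_counts, article_totals):
    """Divide each raw topword count by that journal's article total."""
    return {
        (journal, word): count / article_totals[journal]
        for (journal, word), count in topword_counts.items()
    }


rates = occurrences_per_article(topword_counts, article_totals)
for (journal, word), rate in sorted(rates.items()):
    print(f"{journal} [{word}]: {rate:.3f} occurrences per article")
```

On raw counts Journal A looks more interested in ‘empire’ than Journal B, but per article the smaller journal uses the word four times as often.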
Despite all of this, I’d like you to play a game of guess the journal from a network graph: one representing data for the 30 highest-ranking English language History journals by volume of articles published between 1995 and 2013. On one hand, your doing this will help me validate that my data – and the particular way I’ve chosen to represent it (a force-directed ‘Force Atlas’ graph generated using Gephi) – has some value; Adam Crymble has a nice example of how this can be useful. On the other, it should be a bit of fun.
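For readers curious about how occurrence data can become a graph of this kind, here is a hedged sketch of building a weighted edge list. It assumes Gephi's spreadsheet import, which accepts a CSV with `Source`, `Target` and `Weight` columns; the journal numbers, the sample occurrences and the `build_edges` helper are invented for the example and are not the data behind the graph.

```python
import csv
import io
from collections import Counter

# Hypothetical (journal number, topword) occurrences.
occurrences = [
    (1, "history"),
    (1, "history"),
    (1, "america"),
    (2, "africa"),
]


def build_edges(occurrences):
    """Collapse repeated (journal, topword) pairs into weighted edges."""
    weights = Counter(occurrences)
    return [
        {"Source": journal, "Target": f"[{word}]", "Weight": weight}
        for (journal, word), weight in sorted(weights.items())
    ]


edges = build_edges(occurrences)

# Write the edge list to an in-memory buffer; use a real file for import.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["Source", "Target", "Weight"])
writer.writeheader()
writer.writerows(edges)
edge_csv = buffer.getvalue()
print(edge_csv)
```

The weight of each edge is simply the number of times a topword occurs in a journal's titles. Layout and modularity grouping would then be done inside Gephi rather than in code.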
Given
- that each number on the network represents a journal name,
- that each word within square brackets is a topword from an article title,
- that the thickness of the line between the word and the number represents the number of occurrences of that topword in the numbered journal,
- and that the colouring represents the group (or modularity class) the numbered journal has been assigned to based on the structure of the network;
can you guess which number each of the following journals is represented by? (Or is this whole thing meaningless?)
- English Historical Review
- International Journal of African Historical Studies
- International Journal of Maritime History
- Journal of American History
- Journal of Asian Studies
- Journal of Social History
To start you off, I’ll gift you that American Historical Review is number 34 – right at the heart of the network, which is not surprising given its volume of output. I’ll also give you a little derived data to help you make up your mind.
Answers in the comments please!