Digital scholarship blog

4 posts from April 2013

30 April 2013

Biohumanities Symposium at University of Maryland



An excellent and exciting symposium took place just over a fortnight ago at the Maryland Institute for Technology in the Humanities (MITH): Shared Horizons: Data, Biomedicine and the Digital Humanities, 10-12 April 2013. The Twitter hashtag #DHbio succinctly signifies its outlook.

It felt like a special moment on the timeline of interdisciplinary research and, as its title hints, it brought together active researchers from digital humanities and art technology centres, biochemistry laboratories, bioinformatics computational units, complex science and visualisation institutes, and notably medical and national libraries.






The conference was strengthened by the presence of senior representatives from research councils in medicine, arts and humanities and biology in both the UK and the USA. A sense of occasion was further heightened by the reception at the Deputy’s Residence of the UK Embassy in Washington DC. The British Deputy Head of Mission, Mr Philip Barton, welcomed us by noting that one of the two key priorities for him and his staff based in the USA is the promotion of collaboration between US and UK researchers. The reception was supported by Research Councils UK.



The University of Maryland was certainly a suitable venue, for not only does MITH bring technology and humanities together with a ground-breaking verve unsurpassed anywhere, but the university is also the base of Kari Kraus, who has written one of the definitive papers on the interface between biology and humanities, adopting the term ‘biohumanities’ in her writings: Conjectural Criticism: Computing Past and Future Texts. It is one of those papers that rewards the reader who returns to it from time to time.



One way to characterise the symposium is simply to note the first and the last of the talks. The first was by a historian, Tom Ewing, who has been employing segmentation and network analyses to understand the relationship between newspaper reporting on influenza and the spread of the virus itself. The last was by a mathematical physicist, Simon DeDeo, who highlighted coarse-graining, renormalisation and semantic analysis in identifying the key junctures in the history of verdicts at the Old Bailey (the Central Criminal Court of England and Wales) over the centuries, a possible route to assessing judicial procedures.


Diachronic Informatics


The phrase ‘interdisciplinary research’ points to an aspiration which is often expressed but truly attained less commonly. It was not used as much as you might expect at the symposium, probably because the significance of the conjunction of biology and humanities was already well understood by the people attending. Instead of bemusement and hesitation there was a beaming sense of anticipation.

Sometimes when technology and humanities are brought together in the same sentence it reflects a desire to apply scientific technologies and computer techniques for the benefit of humanities scholarship. This is an important and longstanding goal. John Unsworth of Brandeis University in his opening address reminded us just how long humanities researchers and historians have been using computers: since the 1940s with the Jesuit scholar Roberto Busa (see for example a paper by Willard McCarty).

But what made this meeting different from my perspective was the opportunity it presented to explore the intellectual interface of science and humanities. The use of technology in a new context often produces novel and overlapping understanding in the course of the research, but some research goes further and actively aims to find shared and parallel concepts: shared horizons, indeed.

One person whose research has certainly done so was the keynote speaker: David B. Searls of the University of Pennsylvania.

The title of his talk was “With a Wild Surmise: Intimations of Computational Biology in Keats, Carroll, and Joyce”.

Wild Surmise

It began with reference to the poet John Keats, making a comparison between, on the one hand, Romanticism and its reaction to the rationalism of the Enlightenment, and, on the other hand, some of the changes occurring in science with the emergence of very large datasets, systems biology and computational modelling. It noted an increasing interest in understanding whole systems and complexity, combined with a broad exploration of information with less emphasis on specific hypotheses. Of course, scientists have always been drawn to feasible observations and available data. In the words of the late British immunologist Sir Peter Medawar, Science is the Art of the Soluble; but it is clear that some careful statistical thought needs to be given to the way that big datasets are being compiled and widely used. (A brief presentation by Sabina Leonelli of the University of Exeter contemplated a similar philosophical theme.)

A riveting portion of the talk mapped the extraordinary demonstrations of styles of writing and esoteric sublanguage by the author James Joyce to the challenges found in modern natural language processing and text mining. 

The part of the talk that had the greatest resonance in drawing comparisons with biology was that which addressed Lewis Carroll. It is closest to the bioinformatic research that Professor Searls has conducted over many years. A very enjoyable and informative paper is entitled: “From Jabberwocky to Genome: Lewis Carroll and Computational Biology”.



Alice Manuscript

Images of Alice from British Library collection items: a Shower of Cards and
a manuscript "Alice's Adventures Under Ground"


This paper looks at the work of the mathematician Charles Lutwidge Dodgson otherwise known as Lewis Carroll, the writer of Alice’s Adventures in Wonderland (1865) and Through the Looking Glass, and What Alice Found There (1871). In reflecting on Carroll’s delight in wordplay and combinatorial nonsense, Searls playfully shows some parallels with bioinformatic and evolutionary concepts such as mutation, recombination, the occurrence of indels (insertions and deletions), segmentation and even sequence alignment.

Examples include the derivation of the word SLITHY in the poem Jabberwocky from a recombination of the words SLIMY and LITHE; likewise CHORTLE, another of Carroll’s neologisms, is obtained by recombining CHUCKLE and SNORT. By contrast the word BRILLIG (said by Humpty Dumpty to mean four o’clock in the afternoon, when broiling things for dinner begins) might be seen as a mutational product of BRoIL[L]InG. Searls observes that Carroll’s puzzles known as Syzygies seem to have “anticipated key elements of multiple alignment, minimum distance alignment, and local alignment that are now central to biological sequence analysis”.
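
For readers who would like to see the 'minimum distance' idea in concrete form, here is a minimal sketch in Python of the standard Levenshtein edit distance, which counts the fewest single-letter substitutions, insertions and deletions (the 'mutations' and 'indels' of the analogy) needed to turn one word into another. It is offered purely as an illustration: it is not taken from Searls's paper, it does not implement the rules of Carroll's Syzygies, and the function name edit_distance is simply a label chosen for this sketch.

```python
def edit_distance(source: str, target: str) -> int:
    """Return the Levenshtein distance between two words (illustrative sketch)."""
    # At the start of row i, previous[j] holds the distance between the
    # first i-1 letters of source and the first j letters of target.
    previous = list(range(len(target) + 1))
    for i, s_char in enumerate(source, start=1):
        current = [i]  # cost of deleting the first i letters of source
        for j, t_char in enumerate(target, start=1):
            substitution = previous[j - 1] + (s_char != t_char)  # "mutation"
            deletion = previous[j] + 1                           # "indel"
            insertion = current[j - 1] + 1                       # "indel"
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1]


if __name__ == "__main__":
    # Carroll's BRILLIG compared with the word it may have "mutated" from.
    print(edit_distance("BRILLIG", "BROILING"))
```

On this measure BRILLIG sits three single-letter edits away from BROILING. Biological sequence alignment typically builds on the same dynamic-programming idea, with scoring schemes tuned to nucleotides or amino acids rather than a flat cost of one per edit.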



These two diagrams are from the article published by David B. Searls in the Journal of Computational Biology: From Jabberwocky to Genome: Lewis Carroll and Computational Biology

As with nearly all play there is a serious side to these explorations that is made plain in other papers by Searls, with titles such as “The linguistics of DNA”, “The language of genes” and “A primer in macromolecular linguistics”.

Searls and colleagues note the many linguistic metaphors that have become part of molecular biology: genetic code, gene expression, reading frames, transcription of DNA into RNA, and translation of RNA into proteins, with some enzymes editing RNA. Beyond this, they have argued for many years that a number of bioinformatic techniques are akin to those in linguistics, and that understanding of the genome and biological sequences can be enhanced through careful examination of linguistic formalisms and methods.

The approach is proving useful in understanding structural aspects of nucleic acids (DNA and RNA) as well as the proteins that are produced; and has been extended to the identification and understanding of genes themselves, as informational entities. The Chomsky hierarchy of language has played a prominent role in the approaches outlined, whereby the syntactic nature of a sentence may be depicted as a branching structure (actually, a kind of tree visualisation; see earlier blog posts on 3D trees and phylogenetic visualisation). (Noam Chomsky is also known for the theory of a universal grammar which is the subject of debate; but no matter where one stands on that topic, the Chomsky hierarchy is a beautifully effective way of elucidating and visualising aspects of language, as illustrated in "The linguistics of DNA", American Scientist 80(6): 579-591, 1992, available from JSTOR.)
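
To give a flavour of how a grammar can yield a branching structure for a nucleic acid, the following Python sketch parses a simple RNA hairpin (stem-loop) with a toy context-free rule, S -> x S y | LOOP, where x and y are complementary bases and LOOP is a run of unpaired bases. Nested pairings of this kind between distant positions are what push such structures beyond regular grammars in the Chomsky hierarchy. Everything here is an assumption made for illustration: the grammar, the function name parse_hairpin, the minimum loop length of three and the restriction to Watson-Crick pairs are not drawn from Searls's papers.

```python
# Watson-Crick pairs for RNA; wobble pairs such as G-U are deliberately ignored.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}


def parse_hairpin(seq, min_loop=3):
    """Parse seq with the toy grammar  S -> x S y | LOOP.

    Returns a nested tuple (x, subtree, y) whose innermost element is the
    loop string, or None if not even one base pair can be formed.
    """
    def derive(s):
        # Pair the outermost bases only if a long-enough loop can remain inside.
        if len(s) - 2 >= min_loop and (s[0], s[-1]) in PAIRS:
            return (s[0], derive(s[1:-1]), s[-1])
        return s  # whatever is left is treated as the unpaired loop

    tree = derive(seq)
    return None if isinstance(tree, str) else tree


if __name__ == "__main__":
    # Three G-C pairs forming a stem around the loop AAAU.
    print(parse_hairpin("GGGAAAUCCC"))
```

The nested tuples returned are a small parse tree: each level of nesting corresponds to one base pair in the stem, much as each level of a sentence diagram corresponds to one grammatical constituent.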

A new avenue was brought to the bioinformatic way of thinking by a presentation at Shared Horizons on the study of metre in Urdu poetry through sequential analysis, a collaboration between an Urdu expert (Sean Pue) and a researcher of microbial ecology (Tracy Teal) and their genomics colleague (Titus Brown).



The UK was well represented at the symposium. Not only was I able to give an invited talk on behalf of the Department of Digital Scholarship at the British Library, but delegates included academics from the University of Cambridge, the University of Exeter, the University of Manchester and Imperial College London.

Andrew Prescott, Head of Digital Humanities at King’s College London (and former Curator of Manuscripts at the British Library), did some quick and proactive thinking, gathering together three prominent speakers from the UK for an impromptu, brief but richly useful session, including a presentation by Christopher Howe of the University of Cambridge on the use of phylogenetic methods to study manuscripts.

This blog post has barely touched on the symposium and its outcomes. I hope to write more another time. In the meantime some of the presentations can be found on the Shared Horizons website.

Many thanks to Professor Neil Fraistat, Director of MITH, and his colleagues, especially Jennifer Guiliano and Trevor Muñoz.


 Jeremy Leighton John, @emsscurator



19 April 2013

This is not a post about MOOCs this is a post about learning

Back in Summer 2011 when I first began to notice the disruptive word 'digital' preceding the comfortable (though perhaps under-theorised) word 'humanities', the two together leading to capitalisation and the acronym DH, I was uncertain quite where to seek knowledge and understanding of this phenomenon. The traditional silos of knowledge were little help. Key entries into 'the canon' as we now know it would not arrive until November (Ramsay's Reading Machines) and the following January (Gold's Debates in the Digital Humanities), and my search term prowess had yet to lead me to Moretti's superb (though at the time unhelpfully titled) Graphs, Maps, Trees: Abstract Models for a Literary History (2005). So, to learn more about DH I started reading examples of what I am currently writing - blog posts - and by following the hyperlinks within I came across many authors whose names (and amended blog posts) would eventually surface in Gold's DDH: Mark Sample, William Pannapacker, Bethany Nowviskie, Dan Cohen, Trevor Owens, Matthew Kirschenbaum, Kathleen Fitzpatrick, Alan Liu.

'Debates in the Digital Humanities is out': photograph courtesy of Flickr user BryanAlexander / Creative Commons Licensed

So far, so standard DH trajectory. But one name not included in that list (and ergo not included in DDH) is Brian Croxall or, more specifically, the students who took his English 389 module at Emory University during the autumn of 2011. English 389, or #389dh as it became known to me, was the code for Emory's 'Introduction to Digital Humanities' course, and as a consequence of some serendipity I ended up 'taking' the course. I write 'taking' for I wasn't flying to Atlanta on a weekly basis, neither was I tapped into the seminar room via Skype, nor was I contributing in any meaningful way; rather, I was following the course via the excellent class blog, reading what the students were reading, and thinking through the assignments they were doing (with the benefit of hindsight, I should have got more involved: posted comments or used Twitter as a platform for engaging with the student community). What the experience created in me - apart from an enthusiasm for using blogging in teaching - was a desire to exploit the openness of so many DH modules and module conveners as a means of learning more about DH. This learning encompassed a range of activities: reading documentation, comparing reading lists, installing software, understanding assignments and following staff/students (and their conversations) on Twitter. Most recently this has led me to Matthew Kirschenbaum's Introduction to Digital Humanities module and two excellent reflections (here and here) on knowledge in a digital age from a student working with Ryan Cordell, all repositories of knowledge as valuable as (if not more valuable than) traditional academic outputs.

And so with all this talk on digital learning the word MOOC, or Massive Open Online Course, screams into view. Depending on your opinion the MOOC is the future of higher education, the end of higher education, a positive (and necessary) force for creativity within higher education, or a flash in the pan best ignored by higher education (for two excellent recent reflections on MOOCs see posts by Matthew Yglesias and Clay Shirky). I shan't wade into this war of words and pedagogies here, but what is unquestionable is that MOOCs are out there, are being used and are being taken seriously by some institutions: the University of Edinburgh being the most recent to join Stanford University's Coursera platform.

A popular meme. Say "MOOC"... image courtesy of Flickr user audreywatters / Creative Commons Licensed

This week I decided to start Coursera's Computer Science 101 module on the recommendation of an attendee at THATCamp London, an event on humanities and technology held at the British Library on 14 April. The course offers a grounding in CS aimed at those, like me, who were never offered the opportunity to learn the basics at school or university, and have found themselves occasionally a bit ahead of themselves when trying to manipulate data in platforms such as OpenRefine. I am pleased to report that I am finding the course very useful, with the videos, documentation and tasks well attuned to the type of knowledge being communicated and learning required. It is clear, however, that not all learning can take place on MOOCs. Quite apart from the now standard claims that deep learning and nuanced subject areas require a level of interpersonal discussion not possible with MOOCs (and ergo only possible at an HE institution), it is clear that the sort of learning I undertook to understand DH requires a broader definition of learning than either the MOOC or the HE institution can at present offer.

THE MOOC! the movie image courtesy of Flickr user giulia.forsythe / Creative Commons Licensed

As Digital Curators at the British Library we aim to communicate our knowledge of digital scholarship to both internal and external audiences. Part of this message is that the structured learning delivered by courses and institutions can only go so far and that learning often requires the learner to embed themselves within digital communities in order to learn. In some senses then the digital scholarship community is in itself the best, the most supportive and the most interactive MOOC on digital scholarship out there. What might this mean for the future of the MOOC?

James Baker, Digital Curator, @j_w_baker

09 April 2013

The day after the #DayofDH

Yesterday was Day of DH 2013, a day-long jamboree of posts from digital humanities sorts with the aim of promoting discussion and providing a snapshot of one day in DH history. Last year the standout post came from Dan Cohen, then (as now) Director of the Roy Rosenzweig Center for History and New Media at George Mason University and soon to be Executive Director of the hugely exciting Digital Public Library of America. Cohen's post is short and sweet:

What Is Day of DH? Charitable and Uncharitable Views

Uncharitable: 24 hours of navel-gazing and obsessive self-recording by members of a relatively young, slightly insecure field that already spends too much time defining itself or arguing over the definition of digital humanities, even though they basically agree.

Charitable: A group version of Reddit’s IAmA, which gives people unexpected insights into what day-to-day work looks like in a field, and which could be usefully extended to other fields so that outsiders or those interested in joining can understand better what disciplines actually entail.

Undeterred, Day of DH ploughed on as before, adding to the mix some shiny swag.


Answers on a postcard?... What is the identity of a Digital Humanist courtesy of Flickr user Craig Bellamy / Creative Commons Licensed

But what was 'as before'? Well, the most curious element of Day of DH - pointed out by Ernesto Priego in his excellent rebel Day of DH post - is the single blog format. As opposed to DHers globally blogging and tweeting as normal, to participate in Day of DH 'proper' (ergo not like Priego) one must sign up and blog on the official Day of DH platform, in a new space detached from one's usual networked spaces. The advantages of this from a data collection perspective are obvious, and yet - as Priego deftly emphasises - it renders the whole experience a little forced.

In spite of all this, I still took part: not least because the thrust of Cohen's 'charitable' perspective - Day of DH is a window into how people go about doing DH - remains compelling, and if you want to gaze into the windows of others it is courteous to draw back your own curtains.

My blog, entitled 'A Day of Digital Curation', can be found here. I hope the posts demonstrate what it can mean to be in and around DH today: to work on the fringes of libraries and academia, to enable digital research, to be part of a community, to promote openness of both research and data, to collaborate without borders. Idealistic this may seem, but good DH has to be in order to thrive.

05 April 2013

Archive Fever at British Library

The Library is all abuzz today as preparations are underway to celebrate a historic milestone in our archiving tradition... with a big old-fashioned party! As of the stroke of midnight the British Library, the National Library of Scotland, the National Library of Wales, Bodleian Libraries, Cambridge University Library and Trinity College Dublin will be imbued with the powers to archive the entire UK web, along with e-journals, e-books and other e-content published in the UK.


To illustrate the value of this legislation, curatorial folks across all of the institutions recently nominated websites which encapsulate the unique bits of our digital culture that could be forever lost without the remit to collect these ephemeral online materials. The resulting list of 100 select websites is delightfully eccentric and wide-ranging, both surprising and yet obvious in parts, diverse in many ways yet homogeneous in others. It represents and celebrates interesting flashes of our digital lives online, but if taken as a whole and definitive list, it is clearly an incomplete record which would leave future generations with more questions than answers.

The archivization produces as much as it records the event. - Jacques Derrida, Archive Fever, p. 17

The real beauty of Legal Deposit Legislation is precisely that it ensures that alongside our essential expert interpretation and curatorial selection processes, we are also collecting and preserving the fullest possible record of life and society in the UK today so that future researchers may make their own informed assertions.

So celebrate with us today over at #digitaluniverse as the UK Web Archive starts archiving the full UK Web Domain and visit the British Library website for more details behind the legislation.

-Nora McGregor, Digital Curator, @ndalyrose