I love Makefiles more than I ought to. If you haven’t come across GNU Make before, then Mike Bostock’s “Why Use Make” is a good introduction, as is the section on Make in Kieran Healy’s “Plain Person’s Guide to Plain Text Social Science.” I like Make for several reasons. It lets you specify how your final products (like a website or a PDF document) are related to their inputs, and that discipline is invaluable for producing reproducible research and for structuring your project sensibly. For lots of tasks it provides free parallelization and rebuilds only what is absolutely necessary. Since my projects fit into several different genres, once I have created a Makefile for a genre, it is trivial to adapt it to different projects. Whether it is an article or a book manuscript, a data analysis project, a website, my CV, or some kind of file conversion process, all I need to remember is to type make to build the project, make deploy to put it on the web, and make clean to start over.

I often get asked how to do certain tasks related to digital humanities. Several of these queries came all at once recently, so it made sense to create some general purpose Makefiles that solve certain classes of problems. Below I point you to Makefiles for writing projects, for data analysis notebooks and other websites using R Markdown, for OCRing PDFs, and for converting shapefiles.

If you look over all these Makefiles, you’ll see that there are probably only five or six elements that are repeated over and over. It doesn’t take many lines in a Makefile to get powerful results, yet I run the command make literally dozens of times per day in widely varying projects. GNU Make is a little peculiar, but picking it up has probably had the best return on my time of any technology I’ve learned.


This semester I am teaching a graduate course on Data and Visualization in Digital History. The aim of this course is to teach students how to do the kind of data analysis and visualization that they are likely to do for a dissertation chapter or a journal article. In my way of working, that means the first part of the semester is an introduction to scripting in R, focusing on the grammar of graphics with ggplot2 and the grammar of data manipulation with dplyr and tidyr. Then the second part of the course is aimed at introducing specific kinds of analysis in the context of historical work. My aim is that this course will be the first in a two-course sequence, where the second course (colloquially known as Clio 3) will have more programming in R (as opposed to scripting), will have more *nix-craft, will tackle a more advanced historical problem, will possibly cover more machine learning, and will end up creating interactive analyses in Shiny.
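As a rough illustration of what that combination looks like in practice, here is a minimal sketch that pipes a dplyr summary into a ggplot2 plot. The membership figures below are made up for the example.

library(dplyr)
library(ggplot2)

# Made-up membership counts, standing in for a real historical dataset
membership <- data.frame(
  year         = rep(c(1850, 1860, 1870), each = 2),
  denomination = rep(c("Methodist", "Baptist"), times = 3),
  members      = c(100, 80, 120, 95, 150, 110)
)

# Grammar of data manipulation: group and summarize
membership %>%
  group_by(denomination) %>%
  summarize(total_members = sum(members))

# Grammar of graphics: map variables to aesthetics and add a geom
ggplot(membership, aes(x = year, y = members, color = denomination)) +
  geom_line() +
  labs(x = "Year", y = "Members (illustrative)")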

There are a few things about the Data and Visualization course that I think are worth mentioning.

First, I’ve been creating worksheets for historical data analysis each week. These worksheets tend to demonstrate some technique, then ask students to build up an analysis step by step. The questions within each worksheet range in difficulty from the rote and mechanical to the very difficult. While for now these worksheets are aimed at this class in particular, I intend over time to write worksheets like these for any topic in R I end up teaching. I’m rather pleased with these worksheets as a method of teaching data analysis by example. If I’m judging my students’ initial reactions correctly, they are also finding them helpful, if rather difficult at times.


Kellen Funk and I are working on detecting how a New York legal code of civil procedure spread to most other jurisdictions in the United States. That Field Code and the other codes derived from it are the basis of modern American legal practice, so tracking the network and content of the borrowings reveals the structure of a significant part of American legal history.

Figure 1: States which adopted a version of the Field Code.

In response to an invitation from the Digital Humanities Working Group at George Mason, we wrote a working paper that describes the current state of our research. In the paper we explain the historical problem to show why it is worth tracking how the Field Code spread. Then we give an overview of how we detected which civil procedure codes were similar to one another, after which we offer a few sample visualizations to show what we learned from those similarities. And finally we wrap up with a summary of what we think our project tells us about the history of nineteenth-century American law. We are working on an article, which will be structured rather differently, with a fuller statement of our argument and many more visualizations, but in the meantime the working paper gives a fairly succinct overview of the project and its argument. It may also be of interest because it goes into more detail about how a historical data analysis project proceeds from problem to interpretation than we will be able to do in the article. We also have a notebook with more details about the project.
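To make the idea of detecting similarities a little more concrete, here is a minimal sketch of one way to score the resemblance of two code sections: the Jaccard similarity of their five-word shingles. This is only an illustration, not the method used in the paper, and the two snippets below are stand-ins for real code sections.

# Split a passage into overlapping five-word shingles
shingle <- function(text, n = 5) {
  words <- unlist(strsplit(tolower(text), "\\s+"))
  if (length(words) < n) return(character(0))
  vapply(seq_len(length(words) - n + 1),
         function(i) paste(words[i:(i + n - 1)], collapse = " "),
         character(1))
}

# Jaccard similarity: shared shingles over all distinct shingles
jaccard <- function(a, b) length(intersect(a, b)) / length(union(a, b))

ny <- "Every action must be prosecuted in the name of the real party in interest"
ca <- "Every action must be prosecuted in the name of the real party in interest except as otherwise provided"

jaccard(shingle(ny), shingle(ca))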



Kate Bowler writes about the prosperity gospel and her cancer:

The prosperity gospel popularized a Christian explanation for why some people make it and some do not. They revolutionized prayer as an instrument for getting God always to say “yes.” It offers people a guarantee: Follow these rules, and God will reward you, heal you, restore you. It’s also distressingly similar to the popular cartoon emojis for the iPhone, the ones that show you images of yourself in various poses. One of the standard cartoons shows me holding a #blessed sign. My world is conspiring to make me believe that I am special, that I am the exception whose character will save me from the grisly predictions and the CT scans in my inbox. I am blessed.

The prosperity gospel holds to this illusion of control until the very end. If a believer gets sick and dies, shame compounds the grief. Those who are loved and lost are just that — those who have lost the test of faith. In my work, I have heard countless stories of refusing to acknowledge that the end had finally come. An emaciated man was pushed about a megachurch in a wheelchair as churchgoers declared that he was already healed. A woman danced around her sister’s deathbed shouting to horrified family members that the body can yet live. There is no graceful death, no ars moriendi, in the prosperity gospel. There are only jarring disappointments after fevered attempts to deny its inevitability.

The prosperity gospel has taken a religion based on the contemplation of a dying man and stripped it of its call to surrender all. Perhaps worse, it has replaced Christian faith with the most painful forms of certainty. The movement has perfected a rarefied form of America’s addiction to self-rule, which denies much of our humanity: our fragile bodies, our finitude, our need to stare down our deaths (at least once in a while) and be filled with dread and wonder. At some point, we must say to ourselves, I’m going to need to let go.

Kate Bowler - Death, the Prosperity Gospel, and Me →


In my last post I explained that historians of U.S. religion have barely begun to scratch the surface of the data (meaning, sources that are amenable to computation) that are available to them. To demonstrate this I gave the example of a single source, the Minutes of the Annual Conferences of the Methodist Episcopal Church.

In this post I want to attempt a very preliminary taxonomy of the kinds of sources that are available to religious historians who wish to use mapping or quantitative analysis of some kind or another. Let’s call this a taxonomy instead of a catalog, because I’m going to list the kinds of sources that I’ve come across rather than try to give a bibliography of all of the sources themselves. I’d love to be able to list all the sources, but I haven’t done all that work yet. And let’s say this is very preliminary, because I hope this post is an example of the so-called Cunningham’s Law: “the best way to get the right answer on the Internet is not to ask a question; it’s to post the wrong answer.” That is to say, if you know of a source or category of source that I don’t know about, I hope you’ll correct me in the comments. Finally, I should mention that I’m teaching a course this semester on “Data and Visualization in Digital History” where we are working on nineteenth-century U.S. religious statistics. I’m indebted to the excellent students in that course, who have already turned up many sources that I didn’t know about.

Enough throat clearing.

All U.S. religious statistics are divided into two parts, those from the Census, and those not from the Census.


While everyone else is live-tweeting, I’m live-blogging the AHA’s digital projects lightning round. While of course the projects are widely varied in terms of content, they all have something in common. With very few exceptions, all of the digital projects are expressed in terms of a historical argument or interpretation. This is rather different from many DH presentations, which tend to focus on methods or tools or technologies. Why the difference? It can probably be attributed to the way that the context of the AHA meeting pushes everyone to frame their project in disciplinary terms. That is a very good thing. And maybe everyone here was in the room for Cameron Blevins’s talk at the AHA last year, and took his admonishment to heart.

AHA16 Digital Projects Lightning Round Lineup →

Here is a summary of what I said in the DH pedagogy lightning sessions at the AHA.

Simple idea 1: Installing software takes a lot of time, and it can often be harder and require more technological skill than actually using the software.

Simple idea 2: You must scaffold your digital history courses, so that one assignment leads into the next, and so that students build the methodological and technical skills that they need as they go through the course.

The problem is that students need to install the software before they can use it. The most technologically difficult and least pedagogically or historically interesting task therefore happens at the beginning of the course. This presents a tremendous barrier to student involvement. It wastes course time early in the semester, when building momentum is crucial.

Not-so-simple solution: My solution to this problem is to take the burden of installing software on myself so as not to waste students’ time. For my “Data and DH” course next semester, as well as for previous courses, I have relied on an RRCHNM installation of RStudio Server. This lets students access a full development environment through their browser: no installing or configuring software.

You might object: there is no way my institution will give me a server of my own, and installing RStudio Server might be too difficult for me. For RStudio, at least, the analogsea package can help you get a server up and running on Digital Ocean. Assuming you already have a Digital Ocean account, it can be as simple as these few lines of code (though you will also have to add users and configure the memory).

library(analogsea)

# Assumes a Digital Ocean account with an API token available to analogsea
# (for example, in the DO_PAT environment variable).
docklet_create() %>%   # create a droplet with Docker installed
  docklet_rstudio()    # launch RStudio Server on it and open it in the browser
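The parenthetical above about configuring memory can be handled when the droplet is created. As a rough sketch, and assuming docklet_create() accepts a size argument with one of Digital Ocean’s current size slugs (check their documentation for valid names):

# Request a larger droplet so RStudio Server has more memory to work with;
# the "4gb" slug is an assumption and may differ from current size names.
docklet_create(size = "4gb") %>%
  docklet_rstudio()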

The point is not that you should use RStudio Server (though it’s great); there are other options, like Anaconda for Python. The point is to find a way to reduce or eliminate the waste of student time and attention that comes from installing software. Find a way to scaffold your courses so that you can get straight into the digital history.


Cameron Blevins and I recently published an article in Digital Humanities Quarterly titled “Jane, John … Leslie? A Historical Method for Algorithmic Gender Prediction.” The article has two related goals. First, we explain the historical method behind the gender package for R, showing how it takes into account changes in the associations between names and genders. This method can be used by historians and other scholars to guess genders from first names as reliably as possible. Then, to show how the method can actually be used to make an argument, we apply it to show that, while the number of history dissertations written by men and women is nearly equal, there continues to be a gap between the number of books on history written by men and by women that are reviewed in the American Historical Review.

Here is the abstract:

This article describes a new method for inferring the gender of personal names using large historical datasets. In contrast to existing methods of gender prediction that treat names as if they are timelessly associated with one gender, this method uses a historical approach that takes into account how naming practices change over time. It uses historical data to measure the likelihood that a name was associated with a particular gender based on the time or place under study. This approach generates more accurate results for sources that encompass changing periods of time, providing digital humanities scholars with a tool to estimate the gender of names across large textual collections. The article first describes the methodology as implemented in the gender package for the R programming language. It goes on to apply the method to a case study in which we examine gender and gatekeeping in the American historical profession over the past half-century. The gender package illustrates the importance of incorporating historical approaches into computer science and related fields.
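As a hedged illustration of the package’s interface (the years below are arbitrary, and the “ssa” method assumes the companion genderdata package is installed):

library(gender)

# The same name can yield different predictions for different periods
gender("leslie", years = c(1930, 1940), method = "ssa")
gender("leslie", years = c(1980, 1990), method = "ssa")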

Jane, John … Leslie? A Historical Method for Algorithmic Gender Prediction →