Isn’t it obvious?

A common response to digital history research is that it has failed to deliver an argumentative or interpretative payoff commensurate with the amount of effort that has been put into it. Broadly speaking, I’m sympathetic to that claim. But there is a particular form that this claim sometimes takes which I think is mistaken: the idea that even when interpretations or arguments from digital history work are presented, they do not tell us anything new. Scott Weingart has written perceptively and more generally about the problem that “digital history can never be new.” I want to add only a small piece to that discussion.

When I present the results of my work as a visualization, audiences sometimes react by saying that they can immediately explain what the visualization shows and that it merely reflects what they already knew. Matthew Lincoln has written about the “confabulation” or “just-so stories” that readers of visualizations can come up with in order to explain them. And it’s not just the audiences for visualizations who do this; I do it myself whenever I create visualizations for my own consumption. The sense that a visualization is immediately explainable is the result, I think, of the ability of visualizations to rapidly and persuasively communicate large amounts of information.


Preprint for “The Spine of American Law”

Kellen Funk and I have co-authored an article titled “The Spine of American Law: Digital Text Analysis and U.S. Legal Practice.” The article has recently been accepted for publication in the American Historical Review. It is currently scheduled for the February 2018 issue. Here is our abstract.

In the second half of the nineteenth century, the majority of U.S. states adopted a novel code of legal practice for their civil courts. Legal scholars have long recognized the influence of the New York lawyer David Dudley Field on American legal codification, but tracing the influence of Field’s code of civil procedure with precision across some 30,000 pages of statutes is a daunting task. By adapting methods of digital text analysis to observe text reuse in legal sources, this article provides a methodological guide to show how the evolution of law can be studied at a macro level—across many codes and jurisdictions—and at a micro level—regulation by regulation. Applying these techniques to the Field Code and its emulators, we show that by a combination of creditors’ remedies the code exchanged the rhythms of agriculture for those of merchant capitalism. Archival research confirmed that the spread of the Field Code united the American South and American West in one Greater Reconstruction. Instead of just a national political development centered in Washington, we show that Reconstruction was also a state-level legal development centered on a procedure code from the Empire State of finance capitalism.

The authors’ original manuscript (or preprint) is available at SSRN. This is the version that we submitted for peer review in July 2016. The final version will be different, in part because of our revisions in response to the helpful peer reviews, and in part because we have expanded our original corpus by some 40% and plan to expand it further before publication. While we think these revisions greatly strengthen the essay, we don’t think that they invalidate this earlier version. So we are making the authors’ original manuscript available now following Oxford University Press’s policy.


USAboundaries v0.3.0 released

I’ve recently published version 0.3.0 of my USAboundaries R package to CRAN. USAboundaries provides access to spatial data for U.S. counties, states, cities, congressional districts, and zip codes. Of course you can easily get contemporary boundaries from lots of places, but this package lets you specify dates and get the locations for historical county and state boundaries as well as city locations.

This version of the package has a number of new features. It has jumped on the Simple Features bandwagon, so now all boundary data is returned as sf objects. This version also includes updated shapefiles from the U.S. Census for contemporary data, as well as new centroids for ZIP Code Tabulation Areas and historical city populations courtesy of Erik Steiner’s project from CESTA.
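As a minimal sketch of what requesting boundaries by date looks like, assuming USAboundaries (0.3.0 or later) and sf are installed. The date below is arbitrary and chosen only for illustration; see the package documentation for the range of dates it supports.

```r
library(USAboundaries)
library(sf)

# Contemporary state boundaries (the default when no date is given)
states_now <- us_states()

# State and territory boundaries as of an arbitrary, illustrative date
states_1850 <- us_states(map_date = "1850-07-04")

# Both results are sf objects, so the usual sf tools apply
class(states_1850)
plot(st_geometry(states_1850))
```

Because the result is an ordinary sf object, it can be filtered, joined, or reprojected like any other spatial data frame.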

I’m especially glad that the package has added a new author: Jordan Bratt, a PhD student at George Mason and a collaborator on Mapping Early American Elections. Jordan added functionality to the package that lets users get projections from the State Plane Coordinate System, so that they can make locally accurate maps at the level of the state or below.

The package has a new website thanks to pkgdown. You can read the full release notes at the package website.

Announcing Current Research in Digital History

Today Stephen Robertson and I are announcing a new conference and peer-reviewed proceedings titled Current Research in Digital History, hosted (and funded) by RRCHNM and George Mason University’s Department of History and Art History. You can read the announcement at the RRCHNM website, and here is our brief description from the conference website:

Hosted by the Roy Rosenzweig Center for History and New Media, Current Research in Digital History is an annual one-day conference that publishes online, peer-reviewed proceedings. Its primary aim is to encourage and publish scholarship in digital history that offers discipline-specific arguments and interpretations. A format of short presentations provides an opportunity to make an argument on the basis of ongoing research in a larger project.

As a number of people have pointed out, most notably Cameron Blevins, digital history has a problem in that it rarely makes arguments or interpretations that advance conversations in historical fields. We intend for this conference and proceedings to be one part of an effort to encourage those kinds of arguments.

CRDH is also intended to be a publication venue for what we might call preliminary results. Let me give you a specific example. Kellen Funk and I have been working on tracking the migration of law in nineteenth-century U.S. codes of civil procedure for some time. While we are getting close to a final publication about those results and methods, we have had the basic argument down for quite a while. A venue like CRDH would let us not just present but also publish a mix of preliminary conclusions and method on the way to our larger argument. While that’s not the only kind of paper we anticipate digital historians might want to bring to CRDH, we do think preliminary results is one significant category that would be served by this kind of short-form publication.

We’ve gathered a program committee that I am very excited to be working with: Kalani Craig, Jessica Marie Johnson, Michelle Moravec, and Scott Weingart. We’re grateful to these four scholars for lending their time and expertise.

We’ve tried to think through very carefully what this conference and publication should look like, soliciting advice from a number of different people in the field. We’ve written up a fuller explanation of CRDH (PDF here). We hope you’ll take a look and then send us a paper for consideration.

A confirmation of Andrew Goldstone on “Teaching Quantitative Methods”

At his blog, Andrew Goldstone has posted a pre-print of his essay on “Teaching Quantitative Methods: What Makes It Hard (in Literary Studies)” for the forthcoming Debates in DH 2018. It’s a “lessons learned” essay from one of his courses that is well worth reading if you’re teaching or taking that kind of a course in a humanities discipline. This semester I’m teaching my fourth course that fits into that category (fifth, if you count DHSI), and I can co-sign nearly everything that Goldstone writes, having committed many of the same mistakes and learned some of the same lessons. (Except over time I’ve relaxed my *nix-based fundamentalism and repealed my ban on Windows.) Here is a response to Goldstone’s main points.


Pick the title for my digital history textbook

In my first semester teaching one of my department’s graduate methods courses in digital history, I realized that there was not a lot of good material for teaching computer programming and data analysis in R for historians. So I started writing up a series of tutorials for my students, which they said were helpful. It seemed like those materials could be the nucleus of a textbook, so I started writing one with the title Digital History Methods in R.

It was too soon to start writing, though. Besides needing to spend my time on more pressing projects, I didn’t really have a clear conception of how to teach the material. And in the past few years, the landscape for teaching computational history has been transformed. There are many more books available, some specifically aimed at humanists, such as Graham, Milligan, and Weingart’s Exploring Big Historical Data and Arnold and Tilton’s Humanities Data in R, and others aimed at teaching a modern version of R, such as Hadley Wickham’s Advanced R and R for Data Science. The “tidyverse” of R packages has made a consistent approach to data analysis possible, and the set of packages for text analysis in R is now much better. R Markdown and bookdown have made writing a technical book about R much easier, and Shiny has made it much easier to demonstrate concepts interactively.

After teaching these courses a few times, I have a clearer conception of what the textbook needs to accomplish and how I want it to look.


New article: “A Servile Copy”

Kellen Funk and I have just published an article titled “A Servile Copy: Text Reuse and Medium Data in American Civil Procedure” (PDF). The article is a brief invited contribution to a forum in Rechtsgeschichte [Legal History] on legal history and digital history. Kellen and I give an overview of our project to discover how nineteenth-century codes of civil procedure in the United States borrowed from one another. (We will have more soon about this project in a longer research article.)

If you are interested in digital legal history, you might also look at some of the articles which have been posted in advance of the next issue of Law and History Review, which will be focused on digital legal history.

Syllabus for “Text Analysis for Historians”

This semester I am teaching an independent study for graduate students on “Text Analysis for Historians.” You can see the syllabus here. It’s an unashamedly disciplinary course. While of course the readings are heavily dependent on the work that is being done in digital humanities or digital literary studies, the organizing principle is whether a method is likely to be useful for historical questions. And the syllabus is organized around a corpus of Anglo-American legal treatises, with readings to frame our work in the context of U.S. legal history.

As the syllabus itself notes, this class draws on syllabi by Ted Underwood, Andrew Goldstone, and Ben Schmidt, and Kellen Funk offered suggestions for the readings.


New package tokenizers joins rOpenSci

This post originally appeared at the rOpenSci blog.

The R package ecosystem for natural language processing has been flourishing in recent days. R packages for text analysis have usually been based on the classes provided by the NLP or tm packages. Many of them depend on Java. But recently there have been a number of new packages for text analysis in R, most notably text2vec, quanteda, and tidytext. These packages are built on top of Rcpp instead of rJava, which makes them much more reliable and portable. And instead of the classes based on NLP, which I have never thought to be particularly idiomatic for R, they use standard R data structures. The text2vec and quanteda packages both rely on the sparse matrices provided by the rock-solid Matrix package. The tidytext package is idiosyncratic (in the best possible way!) for doing all of its work in data frames rather than matrices, but a data frame is about as standard as you can get. For a long time when I would recommend R to people, I had to add the caveat that they should use Python if they were primarily interested in text analysis. But now I no longer feel the need to hedge.

Still, there is a lot of duplicated effort among these packages on the one hand and a lot of incompatibilities among them on the other. The R ecosystem for text analysis is not exactly coherent or consistent at the moment.

My small contribution to the new text analysis ecosystem is the tokenizers package, which was recently accepted into rOpenSci after a careful peer review by Kevin Ushey. A new version of the package is on CRAN. (Also check out Jeroen Ooms’s hunspell package, which is a part of rOpenSci.)
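To give a flavor of the package, here is a minimal sketch, assuming tokenizers is installed from CRAN. The sample sentence is made up for illustration. The functions take a character vector and return a plain list with one element per input document, in keeping with the standard R data structures mentioned above.

```r
library(tokenizers)

# A made-up sample document, purely for illustration
text <- "The R package ecosystem for natural language processing has been flourishing."

# Word tokens: lowercased, with punctuation stripped, by default
words <- tokenize_words(text)

# Word trigrams (shingles of three consecutive words)
trigrams <- tokenize_ngrams(text, n = 3)

# Each result is a list; the first element holds the tokens for the first document
words[[1]]
trigrams[[1]]
```

Since the output is just a list of character vectors, it can be passed on to whatever data structure a downstream package expects.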


Introducing America’s Public Bible (Beta)

It’s the start of August, and I don’t want to presume on the good graces of this blog’s readers. So in the spirit of late summer, I’m finally getting around to briefly describing one of my summer projects in the hope that you find it fun, leaving a fuller accounting of the why and wherefore of the project for another time.

America’s Public Bible is a website which looks for all of the biblical quotations in Chronicling America. Chronicling America is a collection of digitized newspapers from the Library of Congress as part of the NEH’s National Digital Newspaper Program. ChronAm currently has some eleven million newspaper pages, spanning the years 1836 to 1922. Using the text that ChronAm provides, I have looked for which Bible verses (just from the KJV for now) are quoted or alluded to on every page. If you want an explanation of why I think this is an interesting scholarly question, there is an introductory essay at the site.
