Mapping the spread of American slavery

[A revised version of this post was published at Smithsonian.com.]

In September of 1861, the U.S. Coast Survey published a large map, just under three feet square, titled “Map showing the distribution of the slave population of the southern states of the United States.” Based on the population statistics gathered in the 1860 census, and certified by the superintendent of the Census Office, the map depicted the percentage of the population enslaved in each county.

U.S. Coast Survey, Map showing the distribution of the slave population of the southern states of the United States (Washington, DC: Henry S. Graham, 1861). Image from the Library of Congress.

The map showed at a glance the large-scale patterns of slavery in the American South: the concentrations of slavery in eastern Virginia, in South Carolina, and most of all along the Mississippi. It also repaid closer examination, since each county was labeled with the exact percentage enslaved. The map of slavery was one of many thematic maps produced in the nineteenth-century United States. As Susan Schulten has shown, this particular map was used by the federal government during the Civil War, and it was a favorite of Abraham Lincoln’s.1

A detail from the U.S. Coast Survey map of slavery, showing the Mississippi River and delta.

Though such thematic maps, in particular maps of slavery, have their origins in the nineteenth century, the technique remains useful for historians today. As I see it, one of the main problems for the historians’ method today is the problem of scale. How can we understand the past at different chronological and geographical scales? How can we move intelligibly between looking at individuals and looking at the Atlantic World, between studying a moment and studying several centuries?2 Maps can help, especially interactive web maps that make it possible to zoom in and out, to represent more than one subject of interest, and to set representations of the past in motion in order to show change over time.


A better map of slavery in 1860

TL;DR I made a bad map of slavery, and there is a better map at the end of the post.

When I finished working the other night I tweeted the current state of the map of slavery that I had been making. Anthea Butler retweeted it, and then a lot of people saw it. (Not that many, but certainly more than will ever read the dissertation chapter the map is a part of.) I’m glad that people found the map interesting. But though there was nothing erroneous about the map, it certainly was not the best map of slavery possible. Here is the draft map.

Number of slaves by county in 1860 (quartile breaks)

It’s easy to spot the biggest problem in that map: the values mapped to the colors are less than ideal. I suspect that most people who saw the map didn’t pay any attention to the legend at the bottom. And why should they have? Until I changed the numbers to a humanist-readable format the legend was almost incomprehensible. What the legend means is that the lightest yellow represents counties where there were 450 or fewer slaves living; the dark red represents counties where there were more than 5,380 slaves and fewer than 37,300 slaves.

Those numbers should give a reader pause: why should a county with 5,380 slaves be classified the same as a county with almost seven times as many slaves? The breaks in the first map are not arbitrary, but divide the counties into quartiles. That is, I ran a function which divided the counties into four even groups. This was a rough and ready way to classify the counties.

The trouble is that quartiles are not a particularly meaningful way to classify the counties. You might even argue—and this certainly wasn’t my intent—that it is a sensationalist way to classify the counties. By definition, using quartiles means that one-quarter of the counties on the map would be colored bright red. If this were a map of smokers, then one-fourth of the counties would be bright red; if it were a map of lung cancer, one-fourth of the counties would be bright red. That’s because when using quartiles, the breaks are determined by the count of the observations (i.e., the number of counties) rather than the value of the observations (i.e., the number of slaves in each county). Below is a histogram of the distribution of counties: you can see that a few counties had very large numbers of slaves, while most counties had relatively smaller numbers.

Histogram of number of slaves per county
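
To make the point about counts versus values concrete, here is a minimal sketch in Python. The county figures are invented for illustration (they are not the census data), the function names are hypothetical, and nothing here is meant to reproduce the code behind the map. It only shows that quartile breaks always put the same number of counties in each class, however the values are spread.

```python
from statistics import quantiles

# Hypothetical slave counts for twelve counties (illustrative only, not census data).
counts = [40, 120, 300, 450, 900, 1500, 2200, 3100, 5400, 9000, 18000, 37000]


def quartile_breaks(values):
    """Return the three cut points that split the values into four equal-sized groups."""
    return quantiles(values, n=4, method="inclusive")


def classify(value, breaks):
    """Return the 0-based class of a value: how many breaks lie strictly below it."""
    return sum(value > b for b in breaks)


breaks = quartile_breaks(counts)
classes = [classify(v, breaks) for v in counts]

# Count how many counties fall into each of the four classes.
group_sizes = [classes.count(k) for k in range(4)]
print(breaks)
print(group_sizes)  # [3, 3, 3, 3]
```

Every class holds three counties, so the top class lumps together a county with 9,000 slaves and one with 37,000. Multiplying every value by ten would move the breaks but leave the group sizes unchanged.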

But the question of how to categorize the counties is as much a historical question as it is a question for the techniques of data analysis. Though histories of slavery have often been written about large plantations where many slaves lived, historians have long known that many enslaved African Americans did not live on plantations, because most slaveholders owned only a few slaves. This is an important point, because the possibilities for slave culture and religion are very different on a farm with one or two enslaved African Americans than on a plantation with a hundred slaves. Below is a chart of the number of slaveholders by the number of slaves that they owned.1 (Notice also how widespread slave ownership was: 395,216 slaveholders according to the 1860 census.)

Slaveholders by number of slaves owned in 1860

What the charts of counties and slaveholders demonstrate is that dividing the counties into quartiles does not make for an accurate map. Fortunately there are better methods, in particular George Jenks’s algorithm for finding breaks in the data set. The Jenks method tries to make groups whose individual members are as close to each other as possible, but where each group in the aggregate is as much unlike the other groups as possible. Using that algorithm, we can divide the counties into more meaningful groups, as the chart below shows.

Histogram of slaves per county with breaks compared
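
A rough sketch of the idea behind Jenks breaks, in Python with invented numbers. It assumes the common formulation of the method, which splits the sorted values into contiguous groups so that the total squared deviation of each value from its own group mean is as small as possible, and it finds that split with a simple dynamic program. The function names are hypothetical, and this is not the code that produced the map.

```python
def sum_squared_dev(prefix, prefix_sq, i, j):
    """Squared deviation from the mean for the sorted values at positions i..j-1."""
    n = j - i
    total = prefix[j] - prefix[i]
    total_sq = prefix_sq[j] - prefix_sq[i]
    return total_sq - total * total / n


def jenks_groups(values, k):
    """Split values into k contiguous groups with minimal within-group squared deviation."""
    data = sorted(values)
    n = len(data)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of values")

    # Running sums let us score any candidate group in constant time.
    prefix, prefix_sq = [0.0], [0.0]
    for v in data:
        prefix.append(prefix[-1] + v)
        prefix_sq.append(prefix_sq[-1] + v * v)

    inf = float("inf")
    # cost[c][j]: best score for splitting the first j values into c groups.
    cost = [[inf] * (n + 1) for _ in range(k + 1)]
    # split[c][j]: where the last of those c groups starts.
    split = [[0] * (n + 1) for _ in range(k + 1)]
    cost[0][0] = 0.0

    for c in range(1, k + 1):
        for j in range(c, n + 1):
            for i in range(c - 1, j):
                candidate = cost[c - 1][i] + sum_squared_dev(prefix, prefix_sq, i, j)
                if candidate < cost[c][j]:
                    cost[c][j] = candidate
                    split[c][j] = i

    # Walk backwards through the recorded split points to recover the groups.
    groups, j = [], n
    for c in range(k, 0, -1):
        i = split[c][j]
        groups.append(data[i:j])
        j = i
    groups.reverse()
    return groups


# Hypothetical county counts (illustrative only, not census data).
example = [50, 120, 200, 2900, 3100, 3300, 9800, 10200, 37000]
print(jenks_groups(example, 4))
# [[50, 120, 200], [2900, 3100, 3300], [9800, 10200], [37000]]
```

Unlike quartiles, the groups here are unequal in size: the single very large county gets a class of its own instead of being lumped with counties a quarter of its size.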

Using the Jenks breaks, we get a much better map of where slaves lived in 1860. We can see all of the detail that was in the earlier map: the South’s fertile crescent through Virginia, North Carolina, South Carolina, Georgia, Alabama, and Mississippi; the Mississippi, Missouri, and Tennessee river valleys; South Carolina as the state with the highest concentration of slaves. But this revised map has a higher resolution (if you will). We can now see cities like Washington, Charleston, Nashville, Mobile, and New Orleans, which matters because slavery must be understood in terms of slave markets, commodities markets, and capitalism.2 The hinterland of slavery is also more clearly defined, which matters because the expansion of slavery was the issue in the sectional crisis.3

Number of slaves by county in 1860 (Jenks breaks)

The lesson here is not that you should only make finished work public. But I hope that this look at the decisions that go into working with data demonstrates how a historian’s knowledge is more important than technological skill in making a historical map.


  1. In this case the categories come directly from the Census tables. As some people wrote to say, the proportion of African Americans in the total population is another way to measure this, but that’s the subject of another map.
  2. See Walter Johnson, Soul by Soul: Life Inside the Antebellum Slave Market (Cambridge, MA: Harvard University Press, 1999); Walter Johnson, River of Dark Dreams: Slavery and Empire in the Cotton Kingdom (Cambridge, MA: Belknap Press of Harvard University Press, 2013).
  3. The code and data for the map are on GitHub.

Analyzing historical history dissertations: page counts

This series on “Analyzing Historical History Dissertations” is a work in progress and I’ve re-done some of these visualizations. If you would like to cite or link to this work in progress, please consider using the landing page, which will always have the most up-to-date information and a list of all the posts.

The first question anyone writing a dissertation probably asks is, How long should this thing be? When Michael Beck looked at data from the University of Minnesota, he found that history dissertations were the longest. Ben Schmidt found that the average length of history dissertations at Princeton varied quite a bit, from a peak of about 425 pages on average around 1995 to a low of slightly more than 250 pages on average around 2006 or 2007. Ben also concluded that “300 pages is the normal length.”

Using the ProQuest data, we can see how history dissertations varied in length over time:

The more useful view is to look at just dissertations since 1945:

We can make a few observations. First, the average length of dissertations is remarkably stable. From 1880 to 1930, history dissertations get quite a bit longer. But from the 1950s to the present, the average length of dissertations has fluctuated within a relatively narrow band. That band is relatively narrow, that is, in relation to the huge overall variation in the length of history dissertations, which have a normal range between 150 and 600 pages. The acceptable range can even go a little lower than 150 pages, and it can go much, much higher than 600 pages.

We can be more precise about the typical length of a history dissertation by plotting the mean and median. (If you prefer, you can see that data in tabular form at the end of the post.)

The mean length is longer by 27 pages on average than the median length, as you would expect since the permissible maximum length for a dissertation is much more flexible than the permissible minimum length. But the two measures fluctuate more or less in tandem. From a peak in 1958 to a trough in 1972, dissertations got shorter by about 45 pages. Then from 1972 dissertations gradually got longer until they reached a peak in 1988, about 55 pages longer. Since 1988 dissertations have been getting shorter, with 2012 being a low with a mean of 331 and a median of 306.
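
Why the mean sits above the median can be seen in a small Python sketch. The page counts below are made up for illustration and are not drawn from the ProQuest data; the point is only that one very long dissertation pulls the mean upward while the median stays put.

```python
from statistics import mean, median

# Hypothetical page counts: most cluster near 300, one runs very long.
pages = [180, 240, 270, 290, 300, 310, 330, 360, 420, 900]

avg = mean(pages)    # pulled upward by the 900-page outlier
mid = median(pages)  # middle of the sorted list, blind to how long the longest one is
print(avg, mid)      # 360 305.0

# Make the longest dissertation even longer: only the mean moves.
longer = pages[:-1] + [1500]
print(mean(longer), median(longer))  # 420 305.0
```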

I don’t have a good explanation for these fluctuations. Could dissertations have gotten shorter from 1958 to 1972 because of a shift from narrative or political history to social history? Then could they have gotten longer from 1972 to 1988 because of the rise of cultural history? I suppose, though the dates feel vaguely off. What explains why dissertations got shorter through the 1990s and 2000s? I think matching this data up to time-to-degree data and job market data might prove fruitful.

It’s not enough to look at the mean or median dissertation length, given that there is such an enormous variation in the permissible length of dissertations. Another helpful way to look at the data is to see the distribution of the quartiles. (This chart cuts off many outliers above 800 pages long.)

The boxes in this chart show the middle 50 percent of dissertations for each half decade. We might interpret this as the typical range for most dissertations. Even typical dissertations fluctuate in length, so that the low end of typical can be 70 pages shorter than the median, and the high end of typical can be 50 or 60 pages more than the median. But many dissertations come in shorter, and there is a very high upper bound to the maximum length of dissertations.
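
For readers who want to see what goes into each box, here is a minimal Python sketch that groups records by half decade and computes the first quartile, median, and third quartile. The records are invented, the function names are hypothetical, and the quartile convention (Python's "inclusive" method) is an assumption; a plotting library may use a slightly different rule.

```python
from collections import defaultdict
from statistics import quantiles

# Hypothetical (year, pages) records; the real data set is not reproduced here.
records = [
    (1950, 250), (1951, 300), (1952, 320), (1953, 350), (1954, 480),
    (1955, 260), (1956, 310), (1957, 340), (1958, 370), (1959, 520),
]


def half_decade(year):
    """Map a year to the first year of its half decade, e.g. 1953 -> 1950."""
    return year - year % 5


def box_by_half_decade(rows):
    """Return {half decade: (first quartile, median, third quartile)} of page counts."""
    grouped = defaultdict(list)
    for year, pages in rows:
        grouped[half_decade(year)].append(pages)
    boxes = {}
    for start, lengths in sorted(grouped.items()):
        q1, q2, q3 = quantiles(lengths, n=4, method="inclusive")
        boxes[start] = (q1, q2, q3)
    return boxes


boxes = box_by_half_decade(records)
for start, (q1, q2, q3) in boxes.items():
    print(start, q1, q2, q3)
```

The box for each half decade runs from the first to the third quartile, so the 480- and 520-page dissertations in this toy data fall outside it without stretching it.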

Next up, I’ll compare the typical length of dissertations for the academy as a whole to the length of dissertations at specific universities.

In summary, what does this data about page lengths say about history dissertations? It says that your adviser was right when she said that the dissertation will be done when you’ve written what you need to write.


Some caveats: There are definitely errors in the data, for example, a six-page dissertation from Princeton advised by Robert Darnton. (Sweet deal, if you can get it.) But there are only 215 dissertations with fewer than 100 pages, and only 53 dissertations with more than 1500 pages, so I don’t think these errors skew the data that much. Though it is scarcely believable, the dissertations above 1500 are probably not all errors, either. Another problem is that we’re dealing with number of pages rather than word counts, and the number of words per page presumably changes with different writing technologies. (The definition of a word, on the other hand, is stable and timeless, even eternal.) Fortunately the timebound and hideous formatting requirements that universities impose on dissertations probably keep this variation in check.

Mean and Median Length of History Dissertations, 1945–2013

year mean median
1945 324 319
1946 301 296
1947 400 329
1948 366 314
1949 358 311
1950 306 282
1951 375 348
1952 370 364
1953 372 335
1954 361 338
1955 362 338
1956 362 340
1957 371 348
1958 384 369
1959 369 338
1960 372 343
1961 360 332
1962 350 326
1963 350 330
1964 357 331
1965 347 319
1966 351 324
1967 349 328
1968 348 327
1969 344 322
1970 353 326
1971 351 323
1972 344 318
1973 352 326
1974 360 331
1975 361 334
1976 364 338
1977 367 341
1978 362 328
1979 369 342
1980 373 344
1981 383 350
1982 388 356
1983 383 353
1984 385 358
1985 393 354
1986 386 348
1987 386 356
1988 389 353
1989 386 353
1990 384 350
1991 380 347
1992 377 347
1993 381 346
1994 372 339
1995 354 327
1996 350 322
1997 353 326
1998 354 327
1999 351 325
2000 354 327
2001 350 324
2002 350 325
2003 343 318
2004 340 317
2005 343 316
2006 339 311
2007 346 316
2008 337 313
2009 332 308
2010 334 311
2011 330 310
2012 331 306
2013 333 311

Digital humanities is a spectrum; or, we’re all digital humanists now

Digital humanities is a spectrum. To put it another way, all humanities scholars use digital practices and concepts to one degree or another, even those who do not identify as digital humanists. Working as a digital humanist is not one side of a binary, the other side of which is working as a traditional scholar.

Consider a few examples: one historian keeps notes and transcribed documents in MS Word documents so that they can be searched. A literary scholar uses a print-on-demand machine to get a physical copy of a book or newspaper scanned by Google. A medievalist uses a library or archive website to read a document that would otherwise require a trip to Europe. A professor making assignments for a class posts readings to Blackboard. A graduate student in a hurry uses Amazon’s “Look Inside” feature to verify a footnote. A history department circulates papers for a workshop via e-mail.
