RRCHNM is a shop that works more and more on computational history and historical data visualization. But we are also first and foremost a web shop: ever since Roy Rosenzweig saw the potential of the internet and left CD-ROMs behind, we’ve been committed to delivering history via people’s web browsers. Those two commitments are becoming increasingly compatible. For example, Ben Schmidt has written persuasively about the next decade of data programming happening in the browser via JavaScript. But combining data analysis and the web takes work. In this blog post, I want to explain how we are solving one aspect of that challenge with our custom data API.

We have a lot of datasets in play for RRCHNM’s projects. Some of the spatial datasets, such as Natural Earth and the Atlas of Historic County Boundaries, we use over and over across projects. AHCB is a critical part of both Mapping Early American Elections and American Religious Ecologies. Some of the datasets are small and intended for display. Others are large text corpora, such as Chronicling America, Gale’s Making of Modern Law, or all of the full text collections from the Library of Congress gathered as part of Computing Cultural Heritage in the Cloud, from which we compute derivative datasets of biblical quotations, legal citations, or the like. Even those derivative datasets can be fairly large and unwieldy. And other datasets are ones that we are transcribing ourselves using our DataScribe tool. These include the data about religious congregations from the 1926 Census of Religious Bodies and about London’s bills of mortality.

The version of record for these datasets is typically a PostgreSQL database. We use a relational database for—well—all the reasons everyone else uses relational databases. In particular, we value the strong guarantees a database provides about the data being strongly typed and well structured. We find it useful to be able to access the exact same data via, say, R for data analysis and a web application for display. And of course, there is the ability to query and index the data, combine datasets through joins, provide shared access, and so forth. PostgreSQL is not an exciting choice; it may very well be the least exciting choice imaginable. But rock solid and boring is a great place to be for critical infrastructure. 

An example of what some of the data looks like from the American Religious Ecologies project. It might not look like much, but we had to reverse engineer an entire federal census in order to create it.

That still leaves the problem of getting the data out of the database and into the user’s browser. We needed a solution that could provide some key features:

  • The data should be delivered in a format easily usable for web visualization, which means JSON or GeoJSON.
  • The data should be reshaped as necessary. Frequently the form that data is stored in, typically some kind of normalized form, is not the way that the data should be structured for display.
  • Large datasets must be queryable. Although browsers can handle more and more data, that does not mean they should be made to do so; ideally, only the minimum amount of data necessary should be sent to the browser.
  • It should be easily extensible as we add new projects, and it should not require us to reinvent the wheel every time we start a new project. Rather, it should let us use existing data and functionality (such as the AHCB dataset I mentioned) across projects.
  • And, if the need arises, it should allow the browser to write back to the database.
JSON from the data API. It’s not exciting, but if it’s what you need, it’s very useful.

Our solution was to create a custom data API for RRCHNM projects, which we call Apiary. (Yes, we know other projects use that name, but this is just our internal codename.) The API is written in Go, a simple but powerful language well suited to our needs here. The API is containerized using Docker for ease of deployment. The API essentially consists of a thin, fairly minimal application that provides the necessary boilerplate to set up a database connection, a web server, and so forth. Individual endpoints that provide specific datasets are then added as handlers. Adding a new dataset or view on a dataset is thus as straightforward as writing a new function in Go. But since those handlers fall into a few different types, in most instances the main work of adding a new endpoint is writing a new SQL query.
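
To make that pattern concrete, here is a minimal sketch of what a handler-per-endpoint API like this might look like. This is not Apiary’s actual code: the route, table, query, and port are all hypothetical, and it assumes the standard library plus the lib/pq PostgreSQL driver.

```go
package main

import (
	"database/sql"
	"encoding/json"
	"log"
	"net/http"
	"os"

	_ "github.com/lib/pq" // PostgreSQL driver (assumed for this sketch)
)

var db *sql.DB

// countiesHandler returns county names for a given year as JSON.
// The route, table, and query here are hypothetical illustrations.
func countiesHandler(w http.ResponseWriter, r *http.Request) {
	year := r.URL.Query().Get("year")
	rows, err := db.Query("SELECT name FROM counties WHERE year = $1", year)
	if err != nil {
		http.Error(w, err.Error(), http.StatusInternalServerError)
		return
	}
	defer rows.Close()

	var names []string
	for rows.Next() {
		var n string
		if err := rows.Scan(&n); err != nil {
			http.Error(w, err.Error(), http.StatusInternalServerError)
			return
		}
		names = append(names, n)
	}

	w.Header().Set("Content-Type", "application/json")
	json.NewEncoder(w).Encode(names)
}

func main() {
	var err error
	// Connection settings come from an environment variable, not hard-coded values.
	db, err = sql.Open("postgres", os.Getenv("DATABASE_URL"))
	if err != nil {
		log.Fatal(err)
	}
	http.HandleFunc("/ahcb/counties", countiesHandler)
	log.Fatal(http.ListenAndServe(":8090", nil))
}
```

Adding another dataset means adding another handler function like this one; most of the work is in the SQL query.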

Our data API is available under an open-source license on GitHub. (You can also take a look at the API’s endpoints.) To be clear, this project is a highly custom application, not a library or a general-purpose application. Nearly all of the handlers would be of no use to non-RRCHNM projects, and you would have to create your own database, queries, endpoints, and so forth. But as we look around at the landscape of digital history and digital humanities projects, we see other projects that have a similar need to store, structure, query, and display data in the browser. Perhaps the general idea of a data API could prove useful to other institutions and scholars.


I recently had to set up a new Mac for work. Generally speaking, this happens so infrequently that it is worth setting up the new machine from scratch rather than using Migration Assistant. I like to avoid carrying over the cruft that comes from several years of a constantly updated development environment, and all my work files are in iCloud Drive or GitHub anyway. But that still leaves a fair bit of setup to get things working correctly.

For a long time I’ve kept my dotfiles in a GitHub repository. This sets up configuration for ZSH, Neovim, Go, R, Homebrew, LaTeX, Git, and the like. While a lot of it is Mac specific, the shell and text editor configuration work fine on Linux machines, so I can easily bring settings to servers. This makes customizing my development environment fairly painless. (And Visual Studio Code now has great settings sync, so that takes care of itself.)

Of course not everything can go in a public GitHub repository. Recently, I’ve taken to having a single file (~/.env.zsh), which contains project and service credentials, as well as machine-specific settings, stored as environment variables. For example, all the projects that I create pull their database connection settings from environment variables. And setting the number of cores available on a particular machine makes scaling up parallel processing easier. This file, like SSH keys, is easy to move over to a new machine.
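
A hypothetical sketch of what such a file might contain (the variable names and values here are illustrative, not my actual settings):

```zsh
# ~/.env.zsh -- credentials and machine-specific settings (illustrative values)
export DATABASE_URL="postgres://user@localhost:5432/research"  # projects read connection settings from here
export NUMBER_OF_CORES=8  # machine-specific: used to scale parallel processing
```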

Some machine-specific settings from my environment file.

What was new to me this time was using Homebrew bundles for installing software and dependencies. While I’ve used Homebrew for a long time, I recently learned from a blog post by Casey Liss that Homebrew has for a while now supported creating a list of software to install. In addition to installing CLIs and other packages from Homebrew proper, and GUI applications as Homebrew Casks, it even supports (though not particularly reliably) installing applications from the Mac App Store.

So I set up a Brewfile for my work machine. This worked great for setting up the new machine, and it is nice to have an explicit record of the software that I need to have installed.
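
A Brewfile along these lines (the entries and the App Store id are illustrative) can rebuild a machine with a single `brew bundle` command:

```ruby
# Brewfile -- a hypothetical list of software to install
brew "git"                    # CLI tools and packages from Homebrew proper
brew "neovim"
brew "go"
cask "visual-studio-code"     # GUI applications as Homebrew Casks
mas "Keynote", id: 409183694  # Mac App Store apps via the mas CLI (id illustrative)
```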


As promised, here is the first installment of an occasional series on my tech stack. If you want to jump straight to the history, see below for two book recommendations on labor history and religious history.


Last week I mentioned that I would write about the technology stack that I use to do digital history work. For this week, let me briefly introduce the concept.

Talking about “a stack” of technologies is not the same thing as talking about digital tools. The whole discussion of “tools” in digital humanities, or the broader cultural contexts of life hacking, productivity pr0n, and their ilk, is not something I want to get into right now.

Here is how a technology stack and “tools” are different. A tool is a specific means to a specific end. You want to make a map, therefore you use X. You need to clean data, therefore you use Y. You need to make a network, therefore you use Z. There’s nothing wrong with that, of course. But too often that approach leads to a shallow understanding of the method being used. Networks and maps are different, of course, but they can also be worked on using the same technologies. The short-term convenience of using a highly specialized tool buys a lot of long-term pain, because the data and outputs are rarely reusable or interoperable.

So what I want to talk about for a few newsletter issues are the components of an overall approach to the kind of digital history I do (computational, spatial, data visualization, and so forth), and how they work together. These are parts of a system for doing research that carry over across projects. The goal is to build fluency with these parts of the system, so that future projects can build on the data and systems I have developed along the way.

For example, almost all of my projects store their data in PostgreSQL, whether the data is spatial or textual, and whether it comes from an API or a bulk download. If I need to analyze that data, I do it in R. If it needs to go on the web, I use some combination of Go and JavaScript. Those different parts of the stack all pull from the same database.

Here are a few of the pieces I’m sure I want to talk about. Perhaps there will be others too.

  • Data store: PostgreSQL, PostGIS
  • Web and general-purpose programming: Go
  • Data analysis and visualization: R and JavaScript
  • Websites: Hugo
  • Web servers and hosting

More in the coming weeks.


It was just Labor Day in the U.S., so here are two book recommendations.

The first is Roy Rosenzweig’s Eight Hours for What We Will: Workers and Leisure in an Industrial City, 1870–1920. I now work at the Roy Rosenzweig Center for History and New Media, though I never met Roy Rosenzweig himself. (I was once assigned to hand out educational materials at the Organization of American Historians on behalf of RRCHNM. I was mostly unsuccessful at that task, but at least four people came up to me to tell me, at great length, how much Rosenzweig meant to them. It made an impression.) But I knew Eight Hours for What We Will long before I came to RRCHNM, and not just because it still belongs on the exam list for every PhD in American history. One of my textbooks as an undergrad had the forbidding title Historiography: Ancient, Medieval, and Modern. In reviewing labor history, it mentioned Rosenzweig’s history of Worcester, Massachusetts. Since Worcester is the closest city to the town where I grew up, I looked the book up in the university library and loved it. The book still sparkles with historical imagination and appreciation for its subjects, and it is very much worth reading.

The second is Heath Carter’s Union Made: Working People and the Rise of Social Christianity in Chicago. You may detect that I love history books which are carefully anchored in a particular place. Carter’s book is very much a history of Chicago, but it also unpacks the class divisions in specific congregations. The book begins with labor union representatives preaching Labor Day sermons in church pulpits in 1910. I think you will find that practice a striking contrast with practically any American church a century later.


Updates

Working: Containerizing and adapting my prediction model for America’s Public Bible so that I can use it for Computing Cultural Heritage in the Cloud.

Listening: Tennessee Ernie Ford, Nearer the Cross (1958): found an LP for 99¢ at the used bookstore.

Reading: Just finished Margaret O’Mara’s The Code: Silicon Valley and the Remaking of America, which I thought was both really good and a really good read.

Hiking: Sky Meadows state park with my family.


Hi folks. It has been an embarrassingly long time since I wrote an issue of this newsletter. A few things happened. The sheer exhaustion of the pandemic caught up with me, as I am sure it did with you. But even more, I took on a major non-work responsibility—the details don’t matter for our purposes—and I have tried to discharge my duty faithfully. But a new academic year is upon us, and I hope to get back to writing this newsletter. Below is a scattershot of updates to get started again.


I read in the news that in an address on the crisis in Afghanistan, President Biden quoted Isaiah 6:8 (“Here am I; send me”), referring to American service members. I think it is fair to say that it is a jarring and not at all typical use of the text. Certainly I had never encountered a use of that text in that context before. So I had to wonder: is there a history of using that verse to refer to the military? But before I could work on it myself, my feed reader turned up Chris Gehrz’s post for The Anxious Bench: “‘Here I am, send me’ in American Military History.” Chris uses my America’s Public Bible to turn up a number of earlier examples of similar uses. You should take a look.


Speaking of America’s Public Bible, I am working as quickly as I can to complete the updated version to send to the press. Here’s Isaiah 6:8 in the long-running prototype version, which you can continue to access.

APB prototype

And here is the far more reliable and (I hope) more useful version that is forthcoming but still in development.

APB development

It’s not just a visual refresh: I’ve also extended the chronological range, found a lot more quotations, and added an interpretative layer to the project.


Speaking of layers, one of the most useful essays I’ve read on the form of digital scholarship is Robert Darnton’s 1999 essay, “The New Age of the Book.” I was not precocious enough to be reading the New York Review of Books at the dawn of the new millennium, but I had the good fortune to hear Darnton speak on a similar subject at the Brandeis University library while I was in graduate school and subsequently discovered the essay. I’ve found his idea of an e-book as a pyramid of scholarly materials—from a broad base of sources ascending to an interpretative point—to be a persuasive goal for digital scholarship.

I’ve tried to structure the new version of the project along those lines. Here’s a draft of the “how to use this site”:

The elements of this site form an interpretative pyramid, something like the e-books that Robert Darnton envisioned.

  • At the base are quotations in the newspaper. You can browse the gallery of quotations to see examples, or see the datasets for a complete list.
  • Those quotations are aggregated into trend lines, which are accompanied by tables of quotations. You can start by browsing the featured verses.
  • Verse histories take the information from the trend line and the quotations and offer brief interpretative essays on their history.
  • Longer essays and other explorations introduce the site and its methods, and address topical questions in the history of the Bible in the United States.

Speaking of e-books, America’s Public Bible will be a digital monograph, and it will be published more like a book than like a website. (It will even have the obligatory colon and subtitle: A Commentary.) But I hope to continue adding things as occasion arises, and one of its primary purposes is to be an ongoing platform for other people’s scholarship, too.

And so I’ve recently become a part of Computing Cultural Heritage in the Cloud at the LC Labs. Part of my aim there is to extend APB by finding biblical quotations across all the Library of Congress’s digital collections. But my not-so-secret other aim is to hang out with the cool folks at LC Labs and my fellow researchers, Andromeda Yelton and Lauren Tilton. (Mission accomplished.) Here’s a post from the Library about the project, and here’s a story in the Wall Street Journal.


Returning to Darnton’s ideas for e-books, there is a kind of homology between his concept of a layered pyramid of scholarship, and what programmers would call “the tech stack.” The stack is the set of technologies which enable some kind of software product. For example, you might have heard of the LAMP stack, which undergirds popular software like WordPress: Linux (the operating system), Apache (the web server), MySQL (the database), and PHP/Perl/Python (the programming language). Well, I don’t use any of those. But I thought I might start writing an occasional series on the technology stack that I do use for my digital research. Why? Because I love PostgreSQL and I think you should too. More on that next time.


Updates

Listening: Unearthed.

Working: Collaborating with colleagues on a map of city-level data from the Censuses of Religious Bodies.

Playing: MLB The Show.

Reading: Ted Gioia, Healing Songs.

Watching: Mythic Quest. The series as a whole is dumb yet charming, but the standalone episode “A Dark Quiet Death” was truly moving.


Hi folks. It has been a hectic couple of months since last I wrote. By a curious confluence of events, in the past week or two a number of lines of work have come to fruition, while others have just gotten started. (Almost all of this work is done in collaboration with my colleagues at RRCHNM. Credit where credit is due, but you’ll have to click through to see all the contributors to these projects.)

With only brief commentary, here is a gallery of work just released or just started.

Collecting These Times

Tomorrow we will release Collecting These Times: American Jewish Experiences of the Pandemic. This site points American Jews and Jewish communities to institutions that are collecting their memories of the pandemic: 72 such collecting efforts and counting.

American Jewish Life

At the same time, we are releasing a visual refresh of American Jewish Life, RRCHNM’s own collection project for the American Jewish community, part of our broader Pandemic Religion collecting project.

Web Monetization

It’s already available on GitHub, but later this week we will announce a Web Monetization module for Omeka S. Web Monetization lets users support websites by streaming small payments. It is still very early days for this technology, but we hope it will let cultural heritage institutions be supported directly by their users.

Citations

Kellen Funk and I are doing some visualizations about the advent of legal modernity in nineteenth- and early twentieth-century Anglo-American law. Here is a first cut of a visualization in support of a presentation Kellen gave last week.

RelEc sketch

Also, I’ve been sketching some visualizations for our American Religious Ecologies project. Sorry no direct link for this map; still too new.

DataScribe

We’ve been developing DataScribe for the transcription of structured data from historical sources. Now we’ve got it integrated with our collection of schedules from the 1926 Census of Religious Bodies and have transcribed nearly ten thousand schedules with more to go.

APB

The visualizations are coming together nicely for the much expanded version of my interactive scholarly work, America’s Public Bible: A Commentary. (Worn out and now outdated prototype.)

Essay

Speaking of which, my essay “The Making of America’s Public Bible: Computational Text Analysis for Religious History” was published this month in Digital Humanities and Research Methods in Religious Studies. You can find a preprint at Humanities Commons.

JSH

We need to send the final version back to the journal this week, but Stephen Robertson and I will have an article on patterns of argument in digital history coming out in the Journal of Social History. It is the introduction to a special section in that journal that we edited, and it will be accompanied by a website that annotates about a dozen articles in digital history to show how they can be models for future work.


We’re hiring

RRCHNM is hiring a full-stack web developer. Here is the job ad. We are doing exciting new things, especially with data and computational history, but building on our existing strengths in public history. We are a great place to work, and we hope that whoever we hire will be a developer-scholar and a real partner in our work. I’m chairing the search, and if this position at all interests you, feel free to reach out. I’d also be grateful if you could pass this along to anyone you know who might be interested.


Updates

Listening: Spotify playlists for country music by decade.

Working: See above.

Playing: Learning the chords for “The Man in Black.”

Reading: Walter Isaacson, Steve Jobs. Do any of you have favorite examples of the biographer’s art that I should read? Feel free to reply and let me know.

Subscribing: My friend Jason Heppler is starting a newsletter on “digital humanities, cities, data, design, libraries, climate, data visualization, art, the environment.” You should subscribe too.


Jimmie Rodgers was the first major star of country music, known for adding his trademark “blue yodel” to vaudeville and hillbilly songs. His persona was as important as the music, and Rodgers cultivated a larger-than-life image as someone who had lived out all his songs about trains, whiskey, and women.

For most of his life, though, Rodgers was wracked by tuberculosis. In 1933, the year he died, the Census Bureau reported 67,422 deaths from respiratory tuberculosis in the continental United States (a rate of 53.6 per 100,000 people). Tuberculosis remained the most common infectious cause of death, even if cancers and heart disease edged it out as the overall leading causes of death. Tuberculosis spreads through the air, propelled by coughs and sneezes. While the disease can affect the brain, bones, and other organs, in most people it infects the lungs, leading to uncontrolled coughing, fever, and wasting away.

In 1931, Rodgers recorded the autobiographical song “T.B. Blues” for Victor.

I’ve been fightin’ like a lion,
Looks like I’m going to lose.
I’m fightin’ like a lion,
Looks like I’m going to lose.
‘Cause there ain’t nobody
Ever whipped the T.B. blues.
I’ve got the T.B. blues.

Gee but the graveyard
Is a lonesome place.
Lord, that graveyard
Is a lonesome place.
They put you on your back,
Throw that mud down in your face.
I’ve got the T.B. blues.

Of his cough, Rodgers sang, “My body rattles / Like a train on that old S.P.” At the age of thirty-five, Rodgers coughed to death in a New York hotel room shortly after a recording session. His body was shipped south to Mississippi in the baggage car of a train.

On August 21, 2003, Johnny Cash laid down his second-to-last track, a recording of the last song he wrote, originally titled “Asthma Like the 309.” Cash was 71 years old, mostly confined to a wheelchair, taking over thirty medications, suffering from diabetes and from neurological and respiratory diseases, and mourning the recent death of June Carter Cash. He died three weeks after that last recording session.

The song would not be released until 2006 on the posthumous album American V: A Hundred Highways. Of the many songs about death on the album, “Like the 309” is humorous, even upbeat. (Compare it to the unbearably sad “On the Evening Train.”) It pokes fun at Cash’s asthma: “It should be a while before I see doctor Death / So, it would sure be nice if I could get my breath.” At one point in the recording Cash—who could barely finish a line without running out of breath—exhales into the microphone. Set to an almost jaunty tune, the central image is of a casket being loaded onto a train.

Take me to the depot, put me to bed,
Blow an electric fan on my gnarly ol’ head.
Everybody take a look, see, I’m doin’ fine,
Then load my box on the 309.


Not much happening this week, work-wise. But last week the American Religious Ecologies team released the first 40K+ schedules we have digitized from the 1926 Census of Religious Bodies. You can browse the schedules here and there is an explanation here.


Updates

Listening: American V: A Hundred Highways.

Working: Not working.

Playing: Attempting a piano/guitar duet with my daughter.

Playing: Should I restart Breath of the Wild and try to play it through?

Reading: Bill C. Malone, Country Music USA.


In 1927 the Presbyterian Church in the U.S.A. published a statistical review of their past century, tabulating membership data and other figures from 1826 to 1926. The “laborious task” took the compiler, Herman C. Weber, about three years to accomplish. Its aim was both to make permanent the research he had done on behalf of the denomination as a whole and to inform the ongoing ministry of individual congregations.

The volume undertook not just to tabulate but also to visualize the statistics of the church. Weber described his visualizations in a tone that’s not so different from how historians talk about visualizations today. He claimed that “the circle of those who can understand visualizations is very large,” presumably including the businessmen he had mentioned. But even then, the transformation of figures into “lines whose ups and downs and relationships can be seen at a glance” required careful interpretation, which he provided by annotating the charts.

Here are a few pages from Presbyterian Statistics through One Hundred Years, 1826–1926: Tabulated, Visualized, and Interpreted, which you can see in full at the HathiTrust.

Images from Weber's Presbyterian Statistics

Weber’s visualizations were part and parcel of a broader uptick in statistical enumeration and visualizations in the early twentieth century. Two examples. W.E.B. DuBois included statistics of black churches in the visualizations he created for the 1900 Paris Exposition. (Examples here, and recently republished.) In 1906, as part of broader professionalization and expansion of the Census Bureau, the Census of Religious Bodies was added to the population and manufacturing censuses.

Broadly speaking, statistics of religion served at least three purposes. The first was that numbers were a useful rhetorical device. For instance, when Robert Baird mentioned numbers in Religion in America or when Lyman Beecher brought them up in A Plea for the West, those figures served the purpose of either celebrating the rise of evangelical Protestantism or stirring up fears of Roman Catholic dominance. Shari Rabin has written a fascinating article about how American Jews used numbers rhetorically. (In my book on conversion, I tried to end each chapter by discussing such rhetorical counting up of converts.)

A second purpose was to conduct the business of a church or denomination. Of course, baptismal and other sacramental records had been kept for that purpose for many centuries. But in the United States, membership data was also used to determine representation to denominational governing bodies, not unlike the federal population census, whose purpose was to determine Congressional apportionment.

Weber’s book and efforts like it at the start of the twentieth century served a third purpose: a guide to the present and future. Weber claimed that his statistical research “has been especially welcomed by business men in the eldership and has led to a new emphasis on laymen’s responsibilities.” And—though somewhat hesitantly—he claimed “that they approximate a record of successes and failures in the past” and so were “the best signposts we have, though they may be a trifle hard to read.”


Around the internet

Consolation Prize cover

Check out Consolation Prize, a new podcast from RRCHNM. It tells the always surprising stories of diplomatic consuls, and through them the history of the United States in the world. Readers of this newsletter might especially enjoy episode 4, which features historian of American religion Leigh Eric Schmidt among other guests. The subject is Alexander Russell Webb, a convert to Islam who became a Muslim missionary to the United States. I’m not involved (though I know the principal on the project) but I’m told that the next episode will be about consuls in Jerusalem and the American obsession with the Holy Land.


Updates

Working: Some initial exploration of citations in historical legal cases.

Reading: Sean Wilentz, Bob Dylan in America.

Playing: I can play C, D, G, E minor, and A minor chords. I can even play them one after another, if you don’t mind waiting ten seconds in between.

Otherwise, I’m doing everything I can to get this semester in the can as quickly as possible, even though it went better than expected.


Greetings. Sorry it’s been a while, but welcome back to “Working on It.”


While I’ve been working a lot recently, what I have been working on is grants—writing and in a couple of instances receiving. Like any kind of work, grant writing has its pleasures. There’s the pleasure of craft: knowing what to do and doing it well. Even more there is the pleasure of collaboration. For this most recent round of grant writing, I was fortunate to be working with a team of colleagues where there is a lot of trust. Even the financial and administrative wrangling can, on rare days, produce the satisfaction of overcoming obstacles. (Waiting months for results, though, is never fun.) Still, the work of grant writing is decidedly only a means to an end and dull in and of itself, even if it weren’t imprudent to talk about work that might never come into being.

But one of the ends that grants can bring about is enabling other people’s work. We’ve been going through a strategic planning process at RRCHNM, and I’ve come to realize that the word that describes us best is enablers. Which brings me to DataScribe.

DataScribe logo

For about a year, a team at RRCHNM has been working on an NEH-funded piece of software for transcribing historical sources into datasets. We have launched our public beta, as described here. If you want to be a part, you can find out how to do so on our website.

DataScribe will enable the work of other scholars in two related ways. First, for those who know what they are doing but who could use a tool which offers a better workflow than a mess of spreadsheets, DataScribe will be great for team transcriptions.

Our American Religious Ecologies project is certainly going to benefit. But the other thing DataScribe will do is help scholars who have historical sources that could become datasets but who don’t know how to go about creating them. Forgive my snobbery, but many more historians want to do this kind of work than actually know how to do it. DataScribe will provide an opinionated way of transcribing datasets, alongside educational materials and case studies. We hope that we have figured out how to do something well, and that we can enable others to do the same.


While we are on the topic of things RRCHNM has recently released, we’ve also published the third issue of our journal, Current Research in Digital History. There are a number of good journals for digital scholarship out there these days; I often point people to the Journal of Cultural Analytics. But CRDH fills a useful niche. We encourage and publish scholarship in digital history that offers discipline-specific arguments and interpretations, rather than simply showcasing digital projects. And by featuring short essays that can embed whatever digital content you want, we also seek to provide an opportunity to make arguments on the basis of ongoing research in larger projects.

We’ve recently changed our publication model (details here), so now we are accepting, peer reviewing, and publishing articles on a rolling basis. If you are an early career scholar or a graduate student, or if you’ve got work in progress and want to get some initial results out into the world, CRDH is a great place to publish. Plus we are fast; publishing with us is measured in months rather than years. All you have to do is offer a meaningful historical interpretation.

Screenshot of a CRDH issue

Brief book note

I’ve probably recommended Tara Isabella Burton’s Strange Rites: New Religions for a Godless World to more people than any book I’ve read in a long time. Partly that is because she compellingly describes contemporary religious phenomena—living theater, Harry Potter fandom, witchcraft, wellness movements like SoulCycle, sexual utopianism—that are almost entirely off my radar as a scholar. (Some of you might add, rather unkindly, that they are also remote from my own unfashionable experiences.) Unless you are a twenty-something Brooklynite or denizen of San Francisco who regularly attends spin class, writes fan fiction, and has a rewards card to Goop, I am willing to bet you are going to learn a lot from this book.

I’m less certain about her argument that these phenomena are new religions. More precisely, I don’t really care whether they deserve the label religion or not. The “weak” form of her argument—that these practices are replacing the community formation and meaning making typically provided by more conventionally defined religions—is claim enough for me. Scholars have been a bit obsessed with the rise of the “nones” ever since the Pew report by that name came out. Burton’s book does more to explain and give color to that trend than anything else I’ve read.

But to be honest, what really captivated me about this book was what I took to be the subtext: that all of these spiritual practices and communities are not just about the self; they are outright selfish. Maybe I am wrong that Burton is making that critique; I certainly don’t think that criticism is explicit. But I do think that it is correct. Let me put it a different way. There is an awful lot that is wrong about American Christianity, and some of it is downright despicable. But at least the message that I hear out of American Christianity, however inconsistently, is that you are supposed to be living for God and for others, and not for yourself. These new religious movements … not so much.


Random screenshot

Not so random, but here are a few screenshots from DataScribe.

DataScribe screenshots

Updates

Working: Made some substantial progress on the interactive visualizations for America’s Public Bible. Also, did I mention that I’ve been writing a lot of grants?

Reading: Zev Eleff’s Authentically Orthodox: A Tradition-Bound Faith in American Life. Whether you come to this book because you care about the history of American Judaism or because you want to understand how religious “traditions” work in America, this is a heck of a book.

Listening: All Johnny Cash, all the time.

Playing: I bought a guitar and I’m starting to learn. See previous item.

Watching: I started Ted Lasso grudgingly and then loved it.


Greetings. And welcome back to “Working on It.”


Everyone who writes is familiar with the process of revision. Less familiar is the process of revising a visualization. The goal is to make the visualization display as much meaning as the data will support, and no more. One also has a responsibility to do the best one can to avoid having the visualization misinterpreted. There is a way of talking about visualizations—a mistake, in my view—that focuses primarily on their distortions and deceptions, as if people never lie with words. Nevertheless, most people are less sophisticated at reading visualizations than prose, and so authors bear more of a burden to do right by the reader.

As an example of trying to make an honest visualization, here is a series of screenshots showing revisions I made to America’s Public Bible. These visualizations are part of the expanded version of the site and aren’t available yet; only the prototype is up. The goal of this particular visualization is to let the reader pick a Bible verse and then show the trend in the rate of quotations over time in American newspapers.

Making a visualization like this entails innumerable small choices along the way: here are a few of the salient ones. Let’s start with what the dataset looks like.

Table of predictions

The prediction model I trained starts with newspaper pages. After identifying potential quotations, it makes a prediction: is this a quotation from the Bible or not? The result is a table like the one above. Each row indicates that a particular verse appears on a particular newspaper page, with the probability that it is actually a quotation and not just a false positive. That kind of machine learning, called supervised classification, has some well-established procedures to ensure that the predictions are honest and not the result of self-deception. Important as they are, those rules need not detain us now.

This dataset raises a few specific considerations for analyzing it honestly. One consideration is the correct threshold for determining whether something is a genuine quotation or not. Obviously it has to be more than 50%, but how much more? In the prototype I set that threshold at 90% because, frankly, I didn’t want to be embarrassed by people noticing entries that weren’t quotations. But that’s not quite honest either: it leaves out many, many quotations which are genuine but just have a lower probability. (The tradeoff here is similar to that of a medical diagnostic test, albeit with rather different stakes.) The revised version of the site will include quotations above 58%. Another consideration is the fact that the OCR for Chronicling America can be quite dreadful in places; obviously it is not possible to find a quotation in a bunch of gibberish. In calculating the rate of quotations, the number of quotations is the numerator, but I have also had to figure out how to exclude junk OCR in order to get the denominator right.

Once we’ve got a dataset of quotations, it is straightforward to aggregate them into the rate of quotations per word per year. Visualizing the rate, instead of the number, of quotations is a key decision. In this case it is an obvious choice. The number of quotations almost always goes up over time, simply because many more newspapers were digitized from the end of the nineteenth century than from the beginning. The trend is what we really care about.
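
As a sketch of that aggregation, a query along these lines would compute the rate. The table and column names here are hypothetical, and the per-year word counts are assumed to already exclude junk OCR:

```sql
-- Rate of quotations per word per year, counting only predictions
-- above the probability threshold. Names are illustrative, not the real schema.
SELECT q.year,
       COUNT(*)::float / w.words AS quotations_per_word
FROM   quotations q
JOIN   word_counts w ON w.year = q.year
WHERE  q.probability >= 0.58
GROUP  BY q.year, w.words
ORDER  BY q.year;
```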

Comparison of trend lines

Drawing a trend line is not as simple as it might seem. The contrived screenshot above shows two different options for how to visualize the trend. The gray line is the simplest option, but not a very good one; the red line is a better option. The gray line is very spiky because it shows each individual year, and because each pair of points is joined by a straight line segment. Year to year the data is noisy, and we can only expect to find meaning in the long-term trends. I have seen people try to divine meaning from every little spike, when the spikes are just noise. So the red line is a better, if not entirely satisfactory, approach. For that line, each data point is shown as a five-year rolling average (e.g., the data point for 1860 is the average of the years 1858–1862). And the data points are joined with a smooth curve. The red trend line is the best I can come up with to represent the nature of the actual trend: in most instances quite gradual, with occasional genuine spikes.
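
The centered rolling average itself is simple to compute. Here is a minimal sketch in Go (the function name is mine, and it assumes one rate value per consecutive year):

```go
package trend

// rollingAverage returns a centered moving average of rates, assuming one
// value per consecutive year and an odd window size (e.g., 5).
func rollingAverage(rates []float64, window int) []float64 {
	half := window / 2
	out := make([]float64, 0, len(rates))
	// Points near the ends lack a full window, so they are dropped.
	for i := half; i < len(rates)-half; i++ {
		sum := 0.0
		for j := i - half; j <= i+half; j++ {
			sum += rates[j]
		}
		out = append(out, sum/float64(window))
	}
	return out
}
```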

The interface around the visualization is unfinished, but there are a few other decisions that had to be made. One is which verses to include in the list the user can select. This is somewhat subjective, but in many instances there are verses which have a very high rate of false positives. For example, the phrase “went into the city” frequently appears in both the Bible and (who would have guessed?) in newspapers. Then too, the Bible frequently repeats itself: the synoptic Gospels and quotations of the Old Testament in the New Testament are just the most obvious examples. For instance, the verse “Suffer little children to come unto me” was one of the most popular verses in the nineteenth century, but there is no reliable way to computationally distinguish between a quotation of Matthew 19:14, Mark 10:14, and Luke 18:16. In such instances, I collapse all the quotations into a single reference which stands for all three.

Cutoff date

The next consideration is which cutoff dates to pick for the visualization. The screenshot above shows the trend to the maximum chronological extent of the Chronicling America corpus in the 1960s, whereas the earlier visualizations extend only to 1922. There appears to be a huge increase in the rate of quotations in the mid twentieth century, but displaying that trend would be misleading. There are next to no newspapers in Chronicling America after 1922. (Unreasonably long copyright terms rear their ugly heads again.) Thus we face what we might call the batting average problem: if you only have a few plate appearances and get lucky, you could end up with an abnormally good batting average. So it’s more honest to represent the trend line only when there is a substantial underlying corpus. For Chronicling America, the correct minimum and maximum years are easily determined.

Completed visualization

This last screenshot is what the mostly complete visualization looks like. This version adds another feature to help readers accurately interpret the trends. In addition to showing the trend line for Chronicling America, it also shows the trend line for a separate corpus, Gale’s Nineteenth-century Newspapers. Since the corpora don’t completely overlap, one would not expect the trend lines to be identical. But one would also expect them to be close, as they are here.

Many difficult decisions go into making a visualization as honest as possible, and then there is more work to be done interpreting it honestly. More on the problem of interpretation some other time, but I will leave you with my parlor trick for giving talks about this project.


Brief book note

Two well-written, recently published works take up the themes of masculinity and femininity within American evangelicalism or megachurch Christianity.

Kristen Kobes Du Mez’s Jesus and John Wayne: How White Evangelicals Corrupted a Faith and Fractured a Nation chronicles the stream of masculinity that has pervaded American evangelicalism. No mere monograph, this history is quite sweeping in its interpretation of the movement in terms of its advocacy for (or perhaps, obsession with?) a particular configuration of family and gender roles. The next time that I teach twentieth-century evangelicalism, I will likely start with Du Mez’s book to provide the framework. (Don’t miss this tongue-in-cheek review.)

Kate Bowler’s The Preacher’s Wife: The Precarious Power of Evangelical Women Celebrities is a fascinating study of celebrity Christian women, often the wives of megachurch pastors. The main dynamic she explores is that evangelical women operate under much stricter theological constraints governing gender roles than their liberal or mainline counterparts, but they have much greater access to the marketplace by which celebrity power is created. It’s a fitting sequel to Bowler’s Blessed, on the history of the prosperity gospel.


Random screen shot

Too many screenshots above, so here is a photo of my tomato plants. No fig tree, but I’m thankful that I can sit in peace and safety under my own vines.

Tomato plants

Updates

Reading: Already finished it, but next time I’ll write briefly about Tara Isabella Burton’s Strange Rites. Currently reading Diana Butler Bass’s Standing Against the Whirlwind: Evangelical Episcopalians in Nineteenth-Century America.

Working: Getting ready to teach a minor-field readings course for PhD students in American religion, the DH Practicum for incoming students at RRCHNM, and a course on “Capitalism and American Religion.” Here’s the syllabus for that last one.

Watching: The Expanse. I promised my colleagues at RRCHNM that I would finish it before we reopen, but I’m not going to meet my goal.

Playing: Some PGA Tour game on the Xbox, which is infinitely more stressful than any first-person shooter.