In this methods course you will be introduced to data manipulation and visualization for historians. You will learn to find, gather, manipulate, analyze, visualize, and argue from historical data, with special attention to geospatial, textual, and network data. These methods will be taught primarily through scripting in the R programming language, using other command-line tools as appropriate. While historical methods can be applied to many topics and time periods, they cannot be understood separately from how the discipline forms meaningful questions and interpretations, nor divorced from the particularities of the sources and histories of a specific topic. Therefore, in this course we will examine the historiographical tradition to see how historians have used data and visualization to understand the past. And we will work together to apply these methods to a linked series of datasets, some of which we will create ourselves, in the history of nineteenth-century American religion.
After taking this course, you will be able to:
- perform exploratory data analysis; clean, tidy, and manipulate data; gather historical data from print and manuscript sources; use existing historical datasets provided by governments or research groups; create common visualizations; work with geospatial, textual, and network data; understand the basics of some machine learning techniques.
- write scripts in the R programming language, drawing on its extensive set of packages, and use command-line programs.
- understand the place of data analysis and visualization within humanities computing, digital history, and the discipline of history.
- conceive of and execute a historical research project suitable for treatment in a dissertation chapter or journal article.
- take the course “Programming in History/New Media,” a.k.a. Clio 3, should you choose.
You will need to secure copies of these books. All other readings are available on the schedule.
- Shawn Graham, Ian Milligan, Scott Weingart, Exploring Big Historical Data: The Historian’s Macroscope (Imperial College Press, 2015).
- Daniel Kaplan, Data Computing: An Introduction to Wrangling and Visualization with R (Project Mosaic, 2015).
- Isabel Meirelles, Design for Information: An Introduction to the Histories, Theories, and Best Practices Behind Effective Information Visualizations (Rockport Publishers, 2013).
- Susan Schulten, Mapping the Nation: History and Cartography in Nineteenth-Century America (University of Chicago Press, 2012).
I may also post additional material to my book in progress on the subject of the course.
We will use this software during the course. Plan on bringing a computer to each class meeting.
- An RStudio Server instance (kindly provided by RRCHNM) will give you access to R in your browser. You will be given a user name and password. You should, however, consider installing R and RStudio Desktop on your own computer.
- That same server also provides a Linux (CentOS, to be precise) command line, which you can reach through SSH. You will receive instruction on how to do this in class. Mac and Linux computers come with SSH installed; on Windows you will need to install PuTTY.
- Much of your work for the course will go on GitHub. Sign up for an account.
- Slack will replace e-mail and Blackboard for our course. You will receive an invitation to the Mason Data and DH team. You may wish to install one of the apps. Here is an introduction to Slack from one of Kris Shaffer’s courses.
- Get a real text editor and CSV reader. Popular choices seem to be Sublime Text or Atom on Mac, and Sublime Text or Notepad++ on Windows. The one true text editor is Vim, but it has a steep learning curve. Download LibreOffice as well, because it is better than Excel at leaving CSV files unaltered.