Clio 2: Computational History (Spring 2019)
In this course you will learn to use computational methods to create historical arguments. You will work with historical data, including finding, gathering, manipulating, analyzing, visualizing, and arguing from data, with special attention to geospatial, textual, and network data. These methods will be taught primarily through scripting in the R programming language. While these methods can be applied to many topics and time periods, they cannot be understood apart from how the discipline forms meaningful questions and interpretations, nor divorced from the particularities of the sources and histories of some specific topic. You will therefore work through a series of example problems using datasets from the history of the nineteenth-century United States, and you will apply these methods to a dataset in your own field of research.
Additional emphasis will be placed on publishing your scholarship on the web. While this is not a course in web development, you will learn the basics of publishing documents on the web, including familiarity with command line programs, getting files onto servers, basic web technologies such as HTML and CSS, Git and GitHub, and packages such as RMarkdown for publishing data analysis.
In other words, this class will teach you how to have something historically meaningful to say from data, and how to publish what you want to say on the web.
After taking this course, you will be able to
- perform exploratory data analysis; clean, tidy, and manipulate data; gather historical data from print and manuscript sources; use existing historical data sets; create common visualizations; work with geospatial, textual, and network data.
- write scripts using the R programming language and its extensive set of packages.
- understand the place of data analysis and visualization within humanities computing, digital history, and the discipline of history.
- conceive of and execute a short research project in computational history.
- publish your scholarship on the web.
This is a graduate methods course in a field that moves reasonably quickly. The syllabus is likely to change over the course of the semester. In particular, I am likely to send you additional projects or visualizations to look at before class.
You are always welcome to talk with me during office hours. While you can drop in, I strongly encourage you to book an appointment. If the scheduled times don’t work, email me and suggest a few alternatives.
All communication for this course will happen in our Slack group, which is also your primary place to ask for help. If you are new to Slack, read this getting started guide.
Bring a computer to each class meeting. See the list of software that you will need to install under the heading for the first week. This class assumes that you have a computer with some kind of Unix-like operating system available; macOS or a Linux distribution will be easiest. If you use Windows, you can install the Windows Subsystem for Linux, though after that you are mostly on your own in figuring out the peculiarities of Windows.
You will need a basic web server and your own domain. If you do not already have a domain and web hosting, the personal shared hosting from Reclaim Hosting will be more than adequate.
One textbook is required in print (though it is partially available online).
- Kieran Healy, Data Visualization: A Practical Introduction (Princeton University Press, 2018). ISBN: 9780691181622.
All the other required readings are available online or through the GMU libraries, though they can also be purchased—sometimes in more complete editions—in print or e-books. These are the books we will use most frequently.
- Shawn Graham, Ian Milligan, and Scott Weingart, Exploring Big Historical Data: The Historian’s Macroscope (Imperial College Press, 2015).
- Hadley Wickham and Garrett Grolemund, R for Data Science: Import, Tidy, Transform, Visualize, and Model Data (O’Reilly, 2017).
In general I have provided datasets and questions for you to work on for all the assignments except the final paper. But for any assignment, you may substitute a dataset from your own historical interests after checking with me. The forward-thinking graduate student will find such datasets early in the semester and use the intermediate assignments as preparation for the final assignment. If you can peer even farther into the future, you can treat the final assignment as a test run of work you might want to do in one of your own research projects, such as an article or a dissertation.
Assignments should be submitted via this form unless otherwise instructed. For each assignment you will be expected to turn in two things. The first will be a web page with a public-facing presentation of your work. The other will be a GitHub repository with the source code that creates that web page.
Preparation and participation are expected as a matter of course in a graduate class. Complete all readings and submit all assignments before class. If the readings include sample code or questions at the end, work through them as part of doing the readings. Final grades will be calculated using the typical percentage-based grading scale (A = 93–100, A- = 90–92, B+ = 88–89, B = 83–87, B- = 80–82, … F = 0–59).
Worksheets and weekly assignments (20%). Many classes will have an assignment due before class begins. Some will require you to do library research; others will be practice data analysis worksheets. Some of the questions on the worksheets will be easy; most will be difficult; some you may find nearly impossible. The aim is to practice. We will go over the worksheets in class each week. If you attempt a problem and can’t solve it, you should still turn in whatever work you did on it. Students who complete all the easy and moderately difficult questions, attempt the very difficult questions, and ask for help as needed will do just fine. These assignments will be graded on completion.
Analysis assignments (3 × 15% = 45%). You will do three analysis assignments, each demonstrating a specific skill in data analysis. For these assignments you will be given a historical dataset and asked some interpretative questions. You will prepare an RMarkdown document containing prose, code, and tables or visualizations to answer the historical questions and, as necessary, explain your methods. You will be given a starter GitHub repository that you can fork with the data and questions.
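To make the expected deliverable concrete, here is a minimal sketch of an RMarkdown document of the kind these assignments call for. The title, output format, and chunk contents are placeholders, not requirements.

````markdown
---
title: "Analysis assignment 1"
author: "Your Name"
output: html_document
---

Prose answering the historical questions goes here, interleaved with code.

```{r setup, message=FALSE}
# Load the packages used throughout the analysis
library(tidyverse)
```

```{r}
# Code producing the tables or visualizations that support your answer
```
````

Knitting a file like this in RStudio produces the web page you will present; the `.Rmd` source belongs in your GitHub repository.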
Research paper (35%). You will write a research paper suitable for a presentation at a disciplinary or digital humanities conference. This paper should advance a historical argument on the basis of computational historical methods, though you can and should use more traditional historical methods as necessary. The body of the paper should be about 2,000 words in length. It should include notes in Chicago format like any other work of history. The paper should include embedded visualizations or tables as appropriate. Each table and figure must have a caption written in complete sentences. The paper should be attractively presented on your website using the Radix RMarkdown format. Explain your methods as needed, but write in a way that would be understandable and compelling to any historian working in your field. The paper should be accompanied by a GitHub repository containing your data and code in a reproducible analysis. Ideally this paper could be presented at a conference, and it could serve as a trial for computational work you might do in a larger research project. As a model, see the most recent CFP for Current Research in Digital History. Due Monday, May 13 at 5pm.
Week 1 (January 28): The web
Do your level best to get these set up before the first day of class:
- Join the class Slack group.
- Get your own domain and web hosting, at Reclaim Hosting if you don’t have it already.
- Install R, a programming language for data analysis.
- Install RStudio, an environment for using R.
- Install Visual Studio Code, a general-purpose text editor for developers.
- Install Cyberduck, an FTP client.
- Get a GitHub account and post it to the Slack group. (E.g., I am lmullen and this is my GitHub profile.)
- Install Git (more detailed guide).
- Install Homebrew (only if you use macOS).
- Install Windows Subsystem for Linux (only if you use Windows).
- Alan Jacobs, “Tending the Digital Commons: A Small Ethics toward the Future,” Hedgehog Review 20, no. 1 (2018).
- Arguing with Digital History working group, “Digital History and Argument,” white paper, Roy Rosenzweig Center for History and New Media (November 13, 2017).
Week 2 (February 4): Data from history and historians
- Find at least three primary source data tables, datasets, or corpora from your field of historical research. Post full citations and URLs in the Slack group, along with a sentence or two explaining what you’ve found. Examine the links that other people post before class.
- Shari Rabin, “‘Let us Endeavor to Count Them Up’: The Nineteenth-Century Origins of American Jewish Demography,” American Jewish History 101, no. 4 (2017): 419–440, https://doi.org/10.1353/ajh.2017.0060.
- Laurie F. Maffly-Kipp, “If It’s South Dakota You Must Be Episcopalian: Lies, Truth-Telling, and the Mapping of U.S. Religion,” Church History 71, no. 1 (2002): 132–42.
- Roger Finke and Rodney Stark, The Churching of America, 1776–2005: Winners and Losers in Our Religious Economy (Rutgers University Press, 2005), ch. 1.
- Jessica Marie Johnson, “Markup Bodies: Black [Life] Studies and Slavery [Death] Studies at the Digital Crossroads,” Social Text 36, no. 4 (2018): 57–79, https://doi.org/10.1215/01642472-7145658.
- Frederick W. Gibbs, “New Forms of History: Critiquing Data and Its Representations,” The American Historian (February 2016).
- Taylor Arnold and Lauren Tilton, “New Data: The Role of Statistics in DH,” in Debates in the Digital Humanities 2019, ed. Matthew K. Gold and Lauren F. Klein (University of Minnesota Press, forthcoming 2019).
- Fletcher W. Hewes and Henry Gannett, Scribner’s Statistical Atlas of the United States Showing by Graphic Methods Their Present Condition and Their Political, Social and Industrial Development (New York: Charles Scribner’s Sons, 1883), plates 58 to 61.
- Herman Carl Weber, Presbyterian Statistics through One Hundred Years, 1826–1926 (Philadelphia: Presbyterian Church in the U.S.A., 1927), part II.
Week 3 (February 11): Basics of R
- If you found a primary source dataset last week that is worth transcribing, then you can transcribe it. Otherwise, transcribe some of the Minutes of the Methodist Episcopal Church from after 1851. Whichever source you use, transcribe at least 25 rows of the data into a spreadsheet. Be prepared to describe in class how you decided on the structure of your data, and how you identified what the variables were. (Hint: read the Broman and Woo article before you start this assignment.)
- Karl W. Broman and Kara H. Woo, “Data Organization in Spreadsheets,” American Statistician 72, no. 1 (2018): 2–10, https://doi.org/10.1080/00031305.2017.1375989.
- Wickham and Grolemund, R for Data Science, ch. 1, 4, 6, 8.
- Read “Introduction” and “Getting Started” from Computational Historical Thinking.
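As a first taste of the kind of R we will begin with, here is a sketch of reading a transcribed spreadsheet into R. The file and column names are hypothetical stand-ins for whatever structure you chose for your transcription.

```r
# Read the spreadsheet you transcribed (exported as CSV) into a data frame
library(tidyverse)

minutes <- read_csv("methodist-minutes.csv")

glimpse(minutes)      # check that each column has a sensible type
count(minutes, year)  # how many rows did you transcribe per year?
```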
Week 4 (February 18): Data manipulation
- Wickham and Grolemund, R for Data Science, ch. 5, 12.
- Graham, Milligan, Weingart, Macroscope, ch. 1–2.
- William G. Thomas III, “Computing and the Historical Imagination,” in A Companion to Digital Humanities, ed. Susan Schreibman, Ray Siemens, John Unsworth (Blackwell, 2004).
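A sketch of the dplyr verbs (from Wickham and Grolemund, ch. 5) applied to a toy table of denominational statistics. The data frame and its columns are invented for illustration.

```r
library(tidyverse)

# A toy table standing in for transcribed church statistics
members <- tribble(
  ~conference, ~year, ~members,
  "Baltimore",  1852,     1200,
  "Baltimore",  1853,     1350,
  "Ohio",       1852,      800,
  "Ohio",       1853,      950
)

# Filter, group, and summarize: total members per conference
members %>%
  filter(year >= 1852) %>%
  group_by(conference) %>%
  summarize(total = sum(members)) %>%
  arrange(desc(total))
```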
Week 5 (February 25): Data visualization
- Wickham and Grolemund, R for Data Science, ch. 3.
- Healy, Data Visualization, ch. 1, 3.
- Graham, Milligan, Weingart, Macroscope, ch 5.
- Kieran Healy and James Moody, “Data Visualization in Sociology,” Annual Review of Sociology 40 (2014): 105–128.
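A minimal ggplot2 sketch of the grammar-of-graphics approach Healy introduces: map variables to aesthetics, then add geometries layer by layer. The data frame and variables are invented.

```r
library(tidyverse)

growth <- tibble(
  year         = rep(1850:1854, 2),
  members      = c(100, 120, 150, 170, 200, 80, 85, 95, 110, 130),
  denomination = rep(c("Methodist", "Baptist"), each = 5)
)

# Aesthetic mappings plus layered geoms and labels
ggplot(growth, aes(x = year, y = members, color = denomination)) +
  geom_line() +
  geom_point() +
  labs(title = "A toy membership series", x = NULL, y = "Members")
```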
Week 6 (March 4): More data manipulation and visualizations
- Healy, Data Visualization, ch 4–5, 8.
- Wickham and Grolemund, R for Data Science, ch. 13.
- Lauren F. Klein, “The Image of Absence: Archival Silence, Data Visualization, and James Hemings,” American Literature 85, no. 4 (December 1, 2013): 661–88, https://doi.org/10.1215/00029831-2367310.
- John Theibault, “Visualizations and Historical Arguments,” in Writing History in the Digital Age, ed. Kristen Nawrotzki and Jack Dougherty (University of Michigan Press, 2013), https://doi.org/10.3998/dh.12230987.0001.001.
Spring break (March 11)
Week 7 (March 18): Exploratory data analysis
- Wickham and Grolemund, R for Data Science, ch. 7, 19–21.
- Roger Peng, Exploratory Data Analysis with R (Leanpub, 2016), chs. 1, 4–6.
- Skim: Wickham and Grolemund, R for Data Science, ch. 27, 29.3-29.4, 30.
Week 8 (March 25): Mapping
- Healy, Data Visualization, ch 7.
- Richard White, “What Is Spatial History?”
- Cameron Blevins, “Space, Nation, and the Triumph of Region: A View of the World from Houston,” Journal of American History 101, no. 1 (2014): 122–47, https://doi.org/10.1093/jahist/jau184.
- Documentation for the leaflet package.
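A minimal sketch of the leaflet package’s pipeline: create a map widget, add a base layer of tiles, then add markers. The coordinates here (roughly GMU’s Fairfax campus) are for illustration only.

```r
library(leaflet)

leaflet() %>%
  addTiles() %>%  # OpenStreetMap base layer
  addMarkers(lng = -77.307, lat = 38.831,
             popup = "George Mason University")
```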
Week 9 (April 1): Mapping
- Todd Presner and David Shepard, “Mapping the Geospatial Turn,” in A New Companion to Digital Humanities, ed. Susan Schreibman, Ray Siemens, and John Unsworth (Wiley, 2016), 201–212. GMU library
- Bret E. Carroll, “Spatial Approaches to American Religious Studies,” Oxford Research Encyclopedia of Religion (Oxford University Press, 2015), https://doi.org/10.1093/acrefore/9780199340378.013.13.
- Stephen Robertson, “Putting Harlem on the Map,” in Writing History in the Digital Age, edited by Jack Dougherty and Kristen Nawrotzki (University of Michigan Press, 2013).
Week 10 (April 8): Text analysis
- Kasper Welbers, Wouter van Atteveldt, and Kenneth Benoit, “Text Analysis in R,” Communication Methods and Measures 11, no. 4 (2017): 245–265, https://doi.org/10.1080/19312458.2017.1387238.
- Taylor Arnold, Nicolas Ballier, Paula Lissón, and Lauren Tilton, “Beyond Lexical Frequencies: Using R for Text Analysis in the Digital Humanities,” preprint.
- Ben Schmidt, “Vector Space Models for the Digital Humanities” (October 25, 2015).
- Ben Schmidt, “Rejecting the Gender Binary: A Vector-Space Operation” (October 30, 2015).
- Graham, Milligan, Weingart, Macroscope, chs. 3–4.
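A sketch of the tidytext tokenize-then-count workflow that underlies much of the text analysis we will do. The sample sentences are invented.

```r
library(dplyr)
library(tidytext)

docs <- tibble(
  doc  = c(1, 2),
  text = c("The trial began at the Old Bailey.",
           "The jury returned to the Old Bailey.")
)

# One row per token, then word frequencies across the corpus
docs %>%
  unnest_tokens(word, text) %>%
  count(word, sort = TRUE)
```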
Week 11 (April 15): Text analysis
- Tim Hitchcock and William J. Turkel, “The Old Bailey Proceedings, 1674–1913: Text Mining for Evidence of Court Behavior,” Law and History Review 34, no. 4 (2016): 929–955, https://doi.org/10.1017/S0738248016000304.
- Matthew K. Gold et al., “Forum: Text Analysis at Scale,” in Debates in the Digital Humanities 2016 (University of Minnesota Press, 2016), 525–568.
- Ryan Cordell, “Reprinting, Circulation, and the Network Author in Antebellum Newspapers,” American Literary History 27, no. 3 (2015): 417–445, https://doi.org/10.1093/alh/ajv028.
- David A. Smith, Ryan Cordell, and Abby Mullen, “Computational Methods for Uncovering Reprinted Texts in Antebellum Newspapers,” American Literary History 27, no. 3 (2015): E1–E15, https://doi.org/10.1093/alh/ajv029.
- Robert K. Nelson, Mining the Dispatch.
Week 12 (April 22): Network analysis
- Graham, Milligan, Weingart, Macroscope, ch. 6–7.
- Matthew Lincoln, “Social Network Centralization Dynamics in Print Production in the Low Countries, 1550–1750,” International Journal for Digital Art History 2 (2016): 134–157, https://doi.org/10.11588/dah.2016.2.25337.
- “AHR Forum: Mapping the Republic of Letters,” American Historical Review 122, no. 2 (2017): 399–463.
- Rebecca Sutton Koeser, “Trusting Others to ‘Do the Math,’” Interdisciplinary Science Reviews 40, no. 4 (2015): 376–392, https://doi.org/10.1080/03080188.2016.1165454.
- Benjamin Schmidt, “Do Digital Humanists Need to Understand Algorithms?” in Debates in the Digital Humanities 2016, ed. Matthew K. Gold and Lauren F. Klein (University of Minnesota Press, 2016).
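A sketch of building a network from an edge list with the igraph package. The correspondents are invented.

```r
library(igraph)

# An edge list: who wrote to whom
letters <- data.frame(
  from = c("Adams", "Adams", "Jefferson"),
  to   = c("Jefferson", "Rush", "Rush")
)

g <- graph_from_data_frame(letters, directed = TRUE)
degree(g)   # number of connections per correspondent
# plot(g)   # quick visual check
```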
Week 13 (April 29): Supervised classification
- Gareth James, et al., An Introduction to Statistical Learning: With Applications in R (Springer, 2013), ch. 1, 2, 4. GMU library
- Matthew L. Jockers and Ted Underwood, “Text-Mining the Humanities” in A New Companion to Digital Humanities, ed. Susan Schreibman, Ray Siemens, and John Unsworth (Wiley, 2016), 291–306. GMU library
Week 14 (May 6): TBD
Topic and readings to be determined by the needs of student research papers.
- Ben Marwick, Carl Boettiger, and Lincoln Mullen, “Packaging Data Analytical Work Reproducibly Using R (and Friends),” American Statistician 72, no. 1 (2018): 80–88, https://doi.org/10.1080/00031305.2017.1375986.
This syllabus may be updated online as necessary. The online version of this syllabus is the only authoritative version.
Students must satisfactorily complete all assignments (including participation assignments) in order to pass this course. Your attendance is expected at every meeting. If you must be absent, I request that you notify me in advance of the class meeting. I am sometimes willing to grant extensions on assignments for cause, but you must request an extension before the assignment’s due date. For every day or part of a day that an assignment is late without an extension, I may reduce your grade. No work (other than final projects) will be accepted after the last day that the class meets. I will discuss grades only in person during office hours.
See the George Mason University catalog for general policies, as well as the university statement on diversity. You are expected to know and follow George Mason’s policies on academic integrity and the honor code. If you are a student with a disability and you need academic accommodations, please see me and contact the Office of Disability Services at 703-993-2474 or through their website. You are responsible for verifying your enrollment status. All academic accommodations must be arranged through that office. Please note the dates for dropping and adding courses from the GMU academic calendar.