Clio 2: Computational History (Spring 2018)
In this course you will learn to apply computational methods to create historical arguments. You will learn to work with historical data, including finding, gathering, manipulating, analyzing, visualizing, and arguing from data, with special attention to geospatial, textual, and network data. These methods will be taught primarily through scripting in the R programming language. While these methods can be applied to many topics and time periods, they cannot be understood separately from how the discipline forms meaningful questions and interpretations, nor divorced from the particularities of the sources and histories of some specific topic. You will therefore work through a series of example problems using datasets from the history of nineteenth-century U.S. religion, and you will apply these methods to a dataset in your own field of research.
After taking this course, you will be able to:
- perform exploratory data analysis; clean, tidy, and manipulate data; gather historical data from print and manuscript sources; use existing historical data sets; create common visualizations; work with geospatial, textual, and network data.
- write scripts using the R programming language and its extensive set of packages.
- understand the place of data analysis and visualization within humanities computing, digital history, and the discipline of history.
- conceive of and execute a research project in computational history suitable for treatment in a dissertation chapter or journal article.
- take the course “Programming in History/New Media,” a.k.a. Clio 3, should you choose.
You are always welcome to book an appointment during my office hours. If the times that are available do not work for you, feel free to contact me. All communication for this course will happen in our Slack group. Read this getting started guide if you need help.
Bring a computer to each class meeting. We will use R and RStudio. Install them on your own computer. You will also have access to an RStudio Server instance which will let you use R in your browser. Much of your work for the course will go on GitHub, so sign up for an account.
All required readings are available online for free or through the GMU libraries, though they can also be purchased (sometimes in more complete editions) in print or e-books. These are the books we will use most frequently.
- Shawn Graham, Ian Milligan, and Scott Weingart, Exploring Big Historical Data: The Historian’s Macroscope (Imperial College Press, 2015).
- Kieran Healy, Data Visualization: A Practical Introduction (Princeton University Press, forthcoming 2018).
- Roger Peng, Exploratory Data Analysis with R (Leanpub, 2016).
- Julia Silge and David Robinson, Text Mining with R: A Tidy Approach (O’Reilly, 2017).
- Hadley Wickham and Garrett Grolemund, R for Data Science: Import, Tidy, Transform, Visualize, and Model Data (O’Reilly, 2017).
Be prepared. Preparation and participation are expected as a matter of course in a graduate class. Complete all readings and assignments before class. If the readings include sample code or questions at the end, work through them as part of doing the readings.
Worksheets and weekly assignments (20%). Many classes will have an assignment due before class begins. Some will require you to do library research; others will be practice data analysis worksheets. Some of the questions on each worksheet will be easy; most will be difficult; some you may find nearly impossible. The aim is to practice. We will go over the worksheets in class each week. If you attempt a problem and can’t solve it, you should still turn in whatever work you did on it. Students who complete all the easy and moderately difficult questions, attempt the very difficult questions, and ask for help as needed will do just fine. These assignments will be graded on completion, with three levels: “incomplete,” “acceptable,” and “excellent.” Unless otherwise specified, these assignments should be submitted as a PDF or a standalone HTML file, one file per assignment. Name them like this:
Mullen-worksheet-week02.pdf. Submit them to this Dropbox folder.
Analysis assignments (3 @ 15% each). You will do three analysis assignments, each demonstrating a specific skill in data analysis. For these assignments you will be given a historical dataset and asked some interpretative questions. You will prepare an R Markdown document containing prose, code, and tables or visualizations to answer the historical questions and, as necessary, explain your methods. You will be given a starter GitHub repository with the data and questions. Submit your final analysis as an HTML file along with your R Markdown file to this Dropbox folder.
You will also be evaluated on the code in your GitHub repository, which I must be able to run on my computer.
R package tutorial (10%). At our second meeting, you will pick from a list of R packages not covered in this class. You will be assigned a week (beginning in week 7) during which you will teach the class for 15 minutes about the package you selected. As part of that teaching, you will prepare a PDF handout. That handout should include these parts: (1) a one- to two-paragraph summary of what the package does and why it is useful; (2) a brief section of example code and results; (3) a bulleted list of examples (historical if possible) where the package was used. A draft of that handout is due to me one week before you are scheduled to teach. I will offer feedback, and you will give the class a revised version in Slack on the Friday before you teach.
Research paper (25%). You will write one research paper suitable for presentation at a disciplinary or digital humanities conference (see, for example, the CFP for Current Research in Digital History, or the CFP for the major conference in your field). This paper must advance a historical argument using data analysis of a set of sources that you choose from your research interests. Submit this paper as a PDF or self-contained HTML file (if it includes interactive visualizations) to this Dropbox folder. Further instructions will be given throughout the semester. Due May 10.
Week 1 (Jan. 22): Introduction
- Arguing with Digital History working group, “Digital History and Argument,” white paper, Roy Rosenzweig Center for History and New Media (November 13, 2017).
- Read “Introduction” and “Getting Started” from Computational Historical Thinking.
Week 2 (Jan. 29): Data from history and historians
- Getting familiar with R worksheet.
- Find primary source data tables, datasets, or corpora from your field of historical research. At least one of these must be a source which can be transcribed into a tabular dataset in a later week. Post full citations and URLs in the Slack group, along with a sentence or two explaining what you’ve found. Examine the links that other people post before class.
- Wickham and Grolemund, R for Data Science, ch. 1, 4, 6, 8.
- Shari Rabin, “‘Let us Endeavor to Count Them Up’: The Nineteenth-Century Origins of American Jewish Demography,” American Jewish History 101, no. 4 (2017): 419–440, https://doi.org/10.1353/ajh.2017.0060.
- Roger Finke and Rodney Stark, The Churching of America, 1776–2005: Winners and Losers in Our Religious Economy (Rutgers University Press, 2005), ch. 1, 5.
- Laurie F. Maffly-Kipp, “If It’s South Dakota You Must Be Episcopalian: Lies, Truth-Telling, and the Mapping of U.S. Religion,” Church History 71, no. 1 (2002): 132–42.
- Fletcher W. Hewes and Henry Garnett, Scribner’s Statistical Atlas of the United States Showing by Graphic Methods Their Present Condition and Their Political, Social and Industrial Development (New York: Charles Scribner’s Sons, 1883), plates 58 to 61.
- Herman Carl Weber, Presbyterian Statistics through One Hundred Years, 1826–1926 (Philadelphia: Presbyterian Church in the U.S.A., 1927), part II.
Week 3 (Feb. 5): Data manipulation
- Data structures worksheet.
- Transcribe at least 50 rows of the historical data you found last week. Be prepared to describe in class how you decided on the structure of your data.
- Wickham and Grolemund, R for Data Science, ch. 5, 12.
- Karl W. Broman and Kara H. Woo, “Data Organization in Spreadsheets,” PeerJ Preprints 5:e3183v1 (2017): https://doi.org/10.7287/peerj.preprints.3183v1.
- Frederick W. Gibbs, “New Forms of History: Critiquing Data and Its Representations,” The American Historian (February 2016).
- Graham, Milligan, Weingart, Macroscope, ch. 1–2.
- William G. Thomas III, “Computing and the Historical Imagination,” in A Companion to Digital Humanities, ed. Susan Schreibman, Ray Siemens, and John Unsworth (Blackwell, 2004).
- Susan Hockey, “The History of Humanities Computing,” in A Companion to Digital Humanities, ed. Susan Schreibman, Ray Siemens, and John Unsworth (Blackwell, 2004).
Week 4 (Feb. 12): Data visualization
- Wickham and Grolemund, R for Data Science, ch. 3.
- Healy, Data Visualization, ch. 1, 3.
- Graham, Milligan, Weingart, Macroscope, ch. 5.
- Kieran Healy and James Moody, “Data Visualization in Sociology,” Annual Review of Sociology 40 (2014): 105–128.
Week 5 (Feb. 19): Data manipulation and visualizations
- Wickham and Grolemund, R for Data Science, ch. 12–13.
- Healy, Data Visualization, ch. 4–5, 8.
- John Theibault, “Visualizations and Historical Arguments,” in Writing History in the Digital Age, ed. Kristen Nawrotzki and Jack Dougherty (University of Michigan Press, 2013), https://doi.org/10.3998/dh.12230987.0001.001.
Week 6 (Feb. 26): Exploratory data analysis
- Wickham and Grolemund, R for Data Science, ch. 7, 19, 21.
- Peng, Exploratory Data Analysis, ch. 1, 4–6.
- Jenny Bryan et al., Happy Git and GitHub for the useR.
- Skim: Wickham and Grolemund, R for Data Science, ch. 27, 29.3–29.4, 30.
Week 7 (Mar. 5): Mapping
- Healy, Data Visualization, ch. 7.
- Documentation for the leaflet package.
- Richard White, “What Is Spatial History?”
- Cameron Blevins, “Space, Nation, and the Triumph of Region: A View of the World from Houston,” Journal of American History 101, no. 1 (2014): 122–47, https://doi.org/10.1093/jahist/jau184.
Week 8 (Mar. 19): Mapping
- Todd Presner and David Shepard, “Mapping the Geospatial Turn,” in A New Companion to Digital Humanities, ed. Susan Schreibman, Ray Siemens, and John Unsworth (Wiley, 2016), 201–212. Available through the GMU library.
- Bret E. Carroll, “Spatial Approaches to American Religious Studies,” Oxford Research Encyclopedia of Religion (Oxford University Press, 2015), https://doi.org/10.1093/acrefore/9780199340378.013.13.
- Stephen Robertson, “Putting Harlem on the Map,” in Writing History in the Digital Age, ed. Jack Dougherty and Kristen Nawrotzki (University of Michigan Press, 2013).
Week 9 (Mar. 26): Text analysis
- Mapping assignment due (see GitHub repository for data and instructions).
- Silge and Robinson, Text Mining with R, ch. 1–2, 4–7.
- Wickham and Grolemund, R for Data Science, ch. 14.
- Graham, Milligan, Weingart, Macroscope, ch. 3–4.
- Tim Hitchcock and William J. Turkel, “The Old Bailey Proceedings, 1674–1913: Text Mining for Evidence of Court Behavior,” Law and History Review 34, no. 4 (2016): 929–955, https://doi.org/10.1017/S0738248016000304.
- Caitlin: forcats
- Greta: lubridate
Week 10 (Apr. 2): Text analysis
- Matthew L. Jockers and Ted Underwood, “Text-Mining the Humanities,” in A New Companion to Digital Humanities, ed. Susan Schreibman, Ray Siemens, and John Unsworth (Wiley, 2016), 291–306. Available through the GMU library.
- Matthew K. Gold et al., “Forum: Text Analysis at Scale,” in Debates in the Digital Humanities 2016 (University of Minnesota Press, 2016), 525–568.
- Ryan Cordell, “Reprinting, Circulation, and the Network Author in Antebellum Newspapers,” American Literary History 27, no. 3 (2015): 417–445, https://doi.org/10.1093/alh/ajv028.
- David A. Smith, Ryan Cordell, and Abby Mullen, “Computational Methods for Uncovering Reprinted Texts in Antebellum Newspapers,” American Literary History 27, no. 3 (2015): E1–E15, https://doi.org/10.1093/alh/ajv029.
- Andrew: DT
- Chris: stringr
Week 11 (Apr. 9): Network analysis
- Text analysis assignment due (see GitHub repository for data and instructions).
- Graham, Milligan, Weingart, Macroscope, ch. 6–7.
- Rebecca Sutton Koeser, “Trusting Others to ‘Do the Math,’” Interdisciplinary Science Reviews 40, no. 4 (2015): 376–392, https://doi.org/10.1080/03080188.2016.1165454.
- Benjamin Schmidt, “Do Digital Humanists Need to Understand Algorithms?” in Debates in the Digital Humanities 2016, ed. Matthew K. Gold and Lauren F. Klein (University of Minnesota Press, 2016).
- Clarke: rvest
- Brian: ggrepel
Week 12 (Apr. 16): Network analysis
- Matthew Lincoln, “Social Network Centralization Dynamics in Print Production in the Low Countries, 1550–1750,” International Journal for Digital Art History 2 (2016): 134–157, https://doi.org/10.11588/dah.2016.2.25337.
- “AHR Forum: Mapping the Republic of Letters,” American Historical Review 122, no. 2 (2017): 399–463.
- Kenny: magick
- Jay: iheatmapr
Week 13 (Apr. 23): TBD
Topic and readings to be determined by the needs of student research papers.
- Alan: dygraphs
- John: Bookdown
- Spencer: Shiny
Week 14 (Apr. 30): TBD
Topic and readings to be determined by the needs of student research papers.
This syllabus may be updated online as necessary. The online version of this syllabus is the only authoritative version.
Students must satisfactorily complete all assignments (including participation assignments) in order to pass this course. Your attendance is expected at every meeting. If you must be absent, I request that you notify me in advance of the class meeting. I am sometimes willing to grant extensions for cause, but you must request an extension before the assignment’s due date. For every day or part of a day that an assignment is late without an extension, I may reduce your grade. No work (other than final exams and final projects) will be accepted later than the last day that the class meets. I will discuss grades only in person during office hours.
See the George Mason University catalog for general policies, as well as the university statement on diversity. You are expected to know and follow George Mason’s policies on academic integrity and the honor code. If you are a student with a disability and you need academic accommodations, please see me and contact the Office of Disability Services at 703-993-2474 or through their website. All academic accommodations must be arranged through that office. You are responsible for verifying your enrollment status. Please note these dates from the academic calendar:
- Last day to add a class or drop a class without penalty: January 29, 2018.
- Last day to drop a class without special permission: February 23, 2018.