This syllabus comes from https://lincolnmullen.com/courses/data.2020/.

Only the online version of this syllabus is authoritative, and it may be updated as necessary.

Computational History (Spring 2020)

Course: HIST 697-001. Spring 2020. Department of History and Art History, George Mason University. 3 credits. Meets Mondays, 7:20–10:00pm in Music Theater Building 1008.

Instructor: Lincoln Mullen <lmullen@gmu.edu>. Office: Research Hall 484. Office hours: By appointment. Book an appointment.

Important updates

This syllabus has been modified for the move to online classes for the remainder of the semester. This online syllabus will be kept up to date and is your best source of guidance for the requirements of this course.

The key changes are these:

  1. Class will meet at 7:20 p.m. on Mondays via Webex. Class sessions will be recorded for anyone who cannot make it at that time or who might experience technical difficulties.

  2. I will provide written tutorials of the techniques we are learning, to the extent possible. The tutorials will be disseminated via Slack. These will be in addition to the in-class explanations and sample code customarily provided.

  3. The class calendar, as well as some readings, has changed due to the change in the university calendar. See the schedule below.

  4. While the final project will retain the same emphasis on independent data analysis that produces historical insight for your field, the details of the assignment have changed. See the assignments section below.

  5. I will continue to be available to you in office hours, but now via Webex. In fact, the variety of times of day when I will be available will be much greater. Here’s how to meet with me individually.

What won’t change is that I am committed to you and your success in the course. Please let me know whenever you need help.

Course description

In this course you will learn to use computational methods to create historical interpretations. You will work with historical data, which includes finding, gathering, manipulating, analyzing, visualizing, and arguing from datasets, with special attention to geospatial, textual, and network data. These methods will be taught primarily using the R programming language. While data analysis methods can be applied to many topics and time periods, they cannot be understood separate from how the discipline forms meaningful questions and interpretations, nor divorced from the particularities of the sources and histories of some specific topic. You will therefore work through a series of example problems using datasets from the history of the nineteenth-century United States, and then apply the methods to write a research paper using a dataset from your own historical field.

Learning goals

After taking this course, you will be able to

  • gather historical data from print and manuscript sources; use existing historical data sets; clean, tidy, and manipulate data; perform exploratory data analysis; create common visualizations; work with geospatial, textual, and network data.
  • write scripts using the R programming language and its extensive set of packages.
  • understand the place of data analysis and visualization within the field of digital history and the discipline of history.
  • conceive of and execute a short research project in computational history.

Essential information

Most required readings are available online or through the GMU libraries. These are the main books that we will be using.

This is a graduate methods course in a field that moves reasonably quickly. The syllabus is likely to change over the course of the semester. In particular, I am likely to send you additional projects or visualizations to look at before class, which should be treated the same as other assigned readings.

All communication for this course will happen in our Slack group. Read this getting started guide if you need help. The Slack group is your primary place to ask for help. Please ask for help in the public channels rather than private messages. You are almost certainly not the only person to have your question, and asking and answering questions publicly benefits everyone. When you ask a question, help me help you by including the code that you are asking about and any error messages that are relevant.

You are always welcome to talk with me during office hours via Webex. My office hours page has instructions on how to book an appointment and connect to a Webex session. If the scheduled times don’t work for you, please contact me and suggest a few other times that would work for you.

Bring a computer to each class meeting. For the most part, we will be using an RStudio Server instance hosted by RRCHNM, which you can log in to using a web browser. But you should also install some key software on your computer. See the list under the heading for the first week. I will assume that you have a computer with some kind of Unix-like operating system available. The easiest will be macOS or a Linux distribution. But if you use Windows, good news: R has very good support for Windows.

In general I have provided datasets and questions for you to work on for all the assignments except the final paper. But for any assignment, you may substitute a dataset from your own historical field after checking with me. The forward-thinking graduate student will try to find such datasets early on in the semester so that you can use the intermediate assignments as preparation for your final assignment. If you can peer even farther into the future, you could try to use the final assignment as a test run for work you might want to do in one of your own research projects, such as a conference presentation, article, or dissertation.

Assignments

For each assignment, you should send me the completed HTML file knit from your RMarkdown document. Please submit the assignments via the Blackboard page for this class. Send the assignments before the start of class on the day on which they are due.

Preparation and participation are expected as a matter of course in a graduate class. Complete all readings and submit all assignments before class. If the readings include sample code or questions at the end, work through them as part of doing the readings, though you do not need to submit them and I will not check them. Final grades will be calculated using the typical percentage-based grading scale (A = 93–100, A- = 90–92, B+ = 88–89, B = 83–87, B- = 80–82, … F = 0–59).

Worksheets and weekly assignments (25%). Many classes will have an assignment due before class begins. Some will require you to do library research; others will be practice data analysis worksheets. Some of the questions on the worksheets will be easy; most will be difficult; some you may find nearly impossible. The aim is to practice. We will go over the worksheets in class each week. If you attempt a problem and can’t solve it, you should still turn in whatever work you did on it. Students who complete all the easy and moderately difficult questions, attempt the very difficult questions, and ask for help as needed will do just fine. These assignments will be graded by completion.

Analysis assignments (4 × 10% = 40%). You will do four analysis assignments, each demonstrating a specific skill in data analysis. For these assignments you will be given a historical dataset and asked some interpretative questions. You will prepare an RMarkdown document containing prose, code, and tables or visualizations to answer the historical questions and, as necessary, explain your methods. For these assignments I will provide a dataset that you can work with (but see below).

Final project (35%). You will designate one of the analysis assignments as a stepping stone to your final project. For that analysis assignment, you will use the same dataset that you will use for the final project. You will try out one of the methods we are learning on that dataset. In addition to the normal feedback that I will provide on an assignment, I will also give you guidance about how to refine and expand your analysis, visualizations, and interpretations. Then, you will expand and revise the work you did in the analysis assignment for the final project. This expanded version should include more prose and citations, not to exceed 1,500 words. The visualizations and data analysis should be expanded if necessary and refined in each case to the level of quality that would be expected in a published article. Each table and figure must have a caption written in complete sentences. Explain your methods as needed, but write in a way which would be understandable and compelling to any historian working in your field. The final assignment will be evaluated according to two primary criteria: (1) Did the visualizations significantly improve in refinement and quality? (2) Does the combination of prose and visualizations convey a meaningful historical argument? Due Monday, May 18 at 5pm.

Schedule

Week 1 (January 27): Introduction to computational history

Assignment:

  • Find one example of a digital history project that uses visualization or data analysis. Be prepared with the URL and a three-minute answer to these questions: What is interesting or insightful about this project? What did this project do that you would like to learn how to do for your own research?

Readings:

Do your level best to get these set up before the first day of class:

  • Join the class Slack group.
  • Get a GitHub account and post it to the Slack group (e.g., I am lmullen and this is my GitHub profile).
  • Install R, a programming language for data analysis.
  • Install RStudio, an environment for using R.
  • Install Homebrew (only if you use macOS).

These are mostly optional, but it would be helpful to have them:

Week 2 (February 3): Data from history and historians

Assignment:

  • Find at least three primary source data tables, datasets, or corpora from your field of historical research. These could include sources that are in print or manuscript, as well as datasets that have already been created. Post full citations and URLs in the Slack group, along with a sentence or two explaining what you’ve found. Examine the links that other people post before class.

Readings:

Browse:

Week 3 (February 10): Basics of R

Assignment:

  • Getting familiar with R worksheet.
  • Either use a primary source dataset that you found last week or, as a backup, the Minutes of the Methodist Episcopal Church from after 1851. Create a well-structured spreadsheet and transcribe at least 25 rows of the data. Upload a CSV file to Slack before class. Be prepared to describe in class how you decided on the structure of your data, and how you identified what the variables were. Use the Broman and Woo article as a guide.

Readings:

Week 4 (February 17): Data manipulation

Assignment:

Readings:

  • Wickham and Grolemund, R for Data Science, ch. 5, 12, 13.
  • Documentation for the tidyverse.
  • Documentation for databases in R.

Week 5 (February 24): Data visualization

Assignment:

Readings:

  • Healy, Data Visualization, ch. 1, 3, 4.
  • Wickham and Grolemund, R for Data Science, ch. 3, 28.
  • Kieran Healy and James Moody, “Data Visualization in Sociology,” Annual Review of Sociology 40 (2014): 105–128.
  • Lauren F. Klein, “The Image of Absence: Archival Silence, Data Visualization, and James Hemings,” American Literature 85, no. 4 (December 1, 2013): 661–88, https://doi.org/10.1215/00029831-2367310.
  • John Theibault, “Visualizations and Historical Arguments,” in Writing History in the Digital Age, ed. Kristen Nawrotzki and Jack Dougherty (University of Michigan Press, 2013), https://doi.org/10.3998/dh.12230987.0001.001.

Week 6 (March 2): Exploratory data analysis

Assignment:

Readings:

Spring break (March 9)

Extended spring break (March 16)

Week 7 (March 23): Maps

Assignment:

Readings:

For reference:

Week 8 (March 30): Networks

Assignment:

Readings:

  • Mark E. J. Newman, Networks: An Introduction (Oxford University Press, 2010), ch. 1, 3, 4. Skim chs. 6, 7.
  • Matthew Lincoln, “Social Network Centralization Dynamics in Print Production in the Low Countries, 1550–1750,” International Journal for Digital Art History 2 (2016): 134–157, https://doi.org/10.11588/dah.2016.2.25337.

Browse:

For reference:

Week 9 (April 6): Texts

Assignment:

Readings:

  • Kasper Welbers, Wouter van Atteveldt, and Kenneth Benoit, “Text Analysis in R,” Communication Methods and Measures 11, no. 4 (2017): 245–265, https://doi.org/10.1080/19312458.2017.1387238.
  • Taylor Arnold, Nicolas Ballier, Paula Lissón, and Lauren Tilton, “Beyond Lexical Frequencies: Using R for Text Analysis in the Digital Humanities,” Language Resources and Evaluation 53, no. 4 (2019): 707–733, https://doi.org/10.1007/s10579-019-09456-6.
  • Tim Hitchcock and William J. Turkel, “The Old Bailey Proceedings, 1674–1913: Text Mining for Evidence of Court Behavior,” Law and History Review 34, no. 4 (2016): 929–955, https://doi.org/10.1017/S0738248016000304.
  • Joshua Catalano, “Digitally Analyzing the Uneven Ground: Language Borrowing Among Indian Treaties,” Current Research in Digital History 1 (2018): https://doi.org/10.31835/crdh.2018.02.
  • Ryan Cordell, “Reprinting, Circulation, and the Network Author in Antebellum Newspapers,” American Literary History 27, no. 3 (2015): 417–445, https://doi.org/10.1093/alh/ajv028.

For reference:

Week 10 (April 13): Word embeddings

Readings:

Week 11 (April 20): Clustering (unsupervised classification)

Assignment:

Readings:

  • Roger Peng, Exploratory Data Analysis with R (Leanpub, 2016), ch. 12.
  • Robert K. Nelson, Mining the Dispatch (Digital Scholarship Lab, University of Richmond).
  • Benjamin Schmidt, “Stable Random Projection: Lightweight, General-Purpose Dimensionality Reduction for Digitized Libraries,” Journal of Cultural Analytics (2018): https://doi.org/10.22148/16.025.
  • Skim Gareth James, et al., An Introduction to Statistical Learning: With Applications in R (Springer, 2013), ch. 10. GMU library

Week 12 (April 27): Prediction (supervised classification)

Readings:

  • Wickham and Grolemund, R for Data Science, ch. 23–24.
  • Matthew L. Jockers and Ted Underwood, “Text-Mining the Humanities” in A New Companion to Digital Humanities, ed. Susan Schreibman, Ray Siemens, and John Unsworth (Wiley, 2016), 291–306. GMU library
  • Skim Gareth James, et al., An Introduction to Statistical Learning: With Applications in R (Springer, 2013), ch. 1, 2, 4. GMU library

Week 13 (May 4): Next steps with computational history

Readings:

Week 14 (May 11): Final project workshop

Assignment:

  • Circulate a draft of your final project in Slack by Friday, May 8. Be prepared to present your work in class for approximately ten minutes. Read each person’s draft and come prepared to offer helpful comments on their work.

Fine print

This syllabus may be updated online as necessary. The online version of this syllabus is the only authoritative version.

Students must satisfactorily complete all assignments in order to pass this course. I am sometimes willing to grant extensions on assignments for cause, but you must request an extension before the assignment’s due date. For every day or part of a day that an assignment is late without an extension, I may reduce your grade. No work (other than final projects) will be accepted after the last day that the class meets. I will discuss grades only in person during office hours.

See the George Mason University catalog for general policies, as well as the university statement on diversity. You are expected to know and follow George Mason’s policies on academic integrity and the honor code. If you are a student with a disability and you need academic accommodations, please see me and contact the Office of Disability Services at 703-993-2474 or through their website. All academic accommodations must be arranged through that office. You are responsible for verifying your enrollment status. Please note the dates for dropping and adding courses from the GMU academic calendar.

This syllabus draws ideas and assignments from many people and syllabi, including Taylor Arnold, Andrew Goldstone, Jason Heppler, Ben Schmidt, and Lauren Tilton.