This syllabus comes from https://lincolnmullen.com/courses/clio2.2018/.

Only the online version of this syllabus is authoritative, and it may be updated as necessary.

Clio 2: Computational History (Spring 2018)

Course: HIST 697-001. Spring 2018. Department of History and Art History, George Mason University. 3 credits. Meets Mondays, 7:20–10:00pm in RRCHNM conference room, Research Hall 402.

Instructor: Lincoln Mullen <lmullen@gmu.edu>. Office: Research Hall 457. Office hours: by appointment. Book an appointment.

Course description

In this course you will learn to apply computational methods to create historical arguments. You will learn to work with historical data, including finding, gathering, manipulating, analyzing, visualizing, and arguing from data, with special attention to geospatial, textual, and network data. These methods will be taught primarily through scripting in the R programming language. While historical methods can be applied to many topics and time periods, they cannot be understood apart from how the discipline forms meaningful questions and interpretations, nor divorced from the particularities of the sources and histories of some specific topic. You will therefore work through a series of example problems using datasets from the history of nineteenth-century U.S. religion, and you will apply these methods to a dataset in your own field of research.

Learning goals

After taking this course, you will be able to

  • perform exploratory data analysis; clean, tidy, and manipulate data; gather historical data from print and manuscript sources; use existing historical data sets; create common visualizations; work with geospatial, textual, and network data.
  • write scripts using the R programming language and its extensive set of packages.
  • understand the place of data analysis and visualization within humanities computing, digital history, and the discipline of history.
  • conceive of and execute a research project in computational history suitable for treatment in a dissertation chapter or journal article.
  • take the course “Programming in History/New Media,” a.k.a. Clio 3, should you choose.

Essential information

You are always welcome to book an appointment during my office hours. If the times that are available do not work for you, feel free to contact me. All communication for this course will happen in our Slack group. Read this getting started guide if you need help.

Bring a computer to each class meeting. We will use R and RStudio. Install them on your own computer. You will also have access to an RStudio Server instance which will let you use R in your browser. Much of your work for the course will go on GitHub, so sign up for an account.

Main texts

All required readings are available online for free or through the GMU libraries, though they can also be purchased (sometimes in more complete editions) in print or e-books. These are the books we will use most frequently.

Assignments

Be prepared. Preparation and participation are expected as a matter of course in a graduate class. Complete all readings and assignments before class. If the readings include sample code or questions at the end, work through them as part of doing the readings.

Worksheets and weekly assignments (20%). Many classes will have an assignment due before class begins. Some will require you to do library research; others will be practice data analysis worksheets. Some of the questions on the worksheet will be easy; most will be difficult; some you may find nearly impossible. The aim is to practice. We will go over the worksheets in class each week. If you attempt a problem and can’t solve it, you should still turn in whatever work you did on it. Students who complete all the easy and moderately difficult questions, attempt the very difficult questions, and ask for help as needed will do just fine. These assignments will be graded by completion on three levels: “incomplete,” “acceptable,” and “excellent.” Unless otherwise specified, submit these assignments as a PDF or a standalone HTML file, one file per assignment. Name them like this: Mullen-worksheet-week02.pdf. Submit them to this Dropbox folder.

Analysis assignments (3 @ 15% each). You will do three analysis assignments, each demonstrating a specific skill in data analysis. For these assignments you will be given a historical dataset and asked some interpretive questions. You will prepare an R Markdown document containing prose, code, and tables or visualizations to answer the historical questions and, as necessary, explain your methods. You will be given a starter GitHub repository with the data and questions. Submit your final analysis as an HTML file along with your R Markdown file to this Dropbox folder. You will also be evaluated on the code in your GitHub repository, which I must be able to run on my computer.

R package tutorial (10%). At our second meeting, you will pick from a list of R packages not covered in this class. You will be assigned a week (beginning at week 7) during which you will teach the class for 15 minutes about the package you selected. As part of that teaching, you will prepare a PDF handout. That handout should include these parts: (1) a one- to two-paragraph summary of what the package does and why it is useful; (2) a brief section of example code and results; (3) a bulleted list of examples (historical if possible) where the package was used. A draft of that handout is due to me one week before you are scheduled to teach. I will offer feedback, and you will post a revised version for the class in Slack on the Friday before you teach.

Research paper (25%). You will write one research paper suitable for a presentation at a disciplinary or digital humanities conference (see for example the CFP for Current Research in Digital History, or the CFP for the major conference in your field). This paper must advance a historical argument using data analysis of a set of sources that you choose from your research interests. Submit this paper as a PDF or self-contained HTML file (if it includes interactive visualizations) to this Dropbox folder. Further instructions will be given throughout the semester. Due May 10.

Schedule

Week 1 (Jan. 22): Introduction

Assignment:

Read:

Week 2 (Jan. 29): Data from history and historians

Assignment:

  • Getting familiar with R worksheet.
  • Find primary source data tables, datasets, or corpora from your field of historical research. At least one of these must be a source which can be transcribed into a tabular dataset in a later week. Post full citations and URLs in the Slack group, along with a sentence or two explaining what you’ve found. Examine the links that other people post before class.

Read:

Browse:

Week 3 (Feb. 5): Data manipulation

Assignment:

  • Data structures worksheet.
  • Transcribe at least 50 rows of the historical data you found last week. Be prepared to describe in class how you decided on the structure of your data.

Read:

Week 4 (Feb. 12): Data visualization

Assignment:

Read:

  • Wickham and Grolemund, R for Data Science, ch. 3.
  • Healy, Data Visualization, ch. 1, 3.
  • Graham, Milligan, Weingart, Macroscope, ch. 5.
  • Kieran Healy and James Moody, “Data Visualization in Sociology,” Annual Review of Sociology 40:105–128.

Week 5 (Feb. 19): Data manipulation and visualizations

Assignment:

Read:

  • Wickham and Grolemund, R for Data Science, ch. 12–13.
  • Healy, Data Visualization, ch. 4–5, 8.
  • John Theibault, “Visualizations and Historical Arguments,” in Writing History in the Digital Age, ed. Kristen Nawrotzki and Jack Dougherty (University of Michigan Press, 2013), https://doi.org/10.3998/dh.12230987.0001.001.

Week 6 (Feb. 26): Exploratory data analysis

Assignment:

Read:

  • Wickham and Grolemund, R for Data Science, ch. 7, 19, 21.
  • Peng, Exploratory Data Analysis, ch. 1, 4–6.
  • Jenny Bryan et al., Happy Git and GitHub for the useR.
  • Skim: Wickham and Grolemund, R for Data Science, ch. 27, 29.3–29.4, 30.

Week 7 (Mar. 5): Mapping

Assignment:

Read:

Week 8 (Mar. 19): Mapping

Read:

  • Todd Presner and David Shepard, “Mapping the Geospatial Turn,” in A New Companion to Digital Humanities, ed. Susan Schreibman, Ray Siemens, and John Unsworth (Wiley, 2016), 201–212. GMU library
  • Bret E. Carroll, “Spatial Approaches to American Religious Studies,” Oxford Research Encyclopedia of Religion (Oxford University Press, 2015), https://doi.org/10.1093/acrefore/9780199340378.013.13.
  • Stephen Robertson, “Putting Harlem on the Map,” in Writing History in the Digital Age, ed. Jack Dougherty and Kristen Nawrotzki (University of Michigan Press, 2013).

Week 9 (Mar. 26): Text analysis

Assignment:

  • Mapping assignment due (see GitHub repository for data and instructions).

Read:

  • Silge and Robinson, Tidy Text Mining with R, ch. 1–2, 4–7.
  • Wickham and Grolemund, R for Data Science, ch. 14.
  • Graham, Milligan, Weingart, Macroscope, chs. 3–4.
  • Tim Hitchcock and William J. Turkel, “The Old Bailey Proceedings, 1674–1913: Text Mining for Evidence of Court Behavior,” Law and History Review 34, no. 4 (2016): 929–955, https://doi.org/10.1017/S0738248016000304.

Tutorials:

  • Caitlin: forcats
  • Greta: lubridate

Week 10 (Apr. 2): Text analysis

Read:

  • Matthew L. Jockers and Ted Underwood, “Text-Mining the Humanities,” in A New Companion to Digital Humanities, ed. Susan Schreibman, Ray Siemens, and John Unsworth (Wiley, 2016), 291–306. GMU library
  • Matthew K. Gold et al., “Forum: Text Analysis at Scale,” in Debates in the Digital Humanities 2016 (University of Minnesota Press, 2016), 525–568.
  • Ryan Cordell, “Reprinting, Circulation, and the Network Author in Antebellum Newspapers,” American Literary History 27, no. 3 (2015): 417–445, https://doi.org/10.1093/alh/ajv028.
  • David A. Smith, Ryan Cordell, and Abby Mullen, “Computational Methods for Uncovering Reprinted Texts in Antebellum Newspapers,” American Literary History 27, no. 3 (2015): E1–E15, https://doi.org/10.1093/alh/ajv029.

Tutorials:

  • Andrew: DT
  • Chris: stringr

Week 11 (Apr. 9): Network analysis

Assignment:

  • Text analysis assignment due (see GitHub repository for data and instructions).

Read:

Tutorials:

  • Clarke: rvest
  • Brian: ggrepel

Week 12 (Apr. 16): Network analysis

Read:

Tutorials:

  • Kenny: magick
  • Jay: iheatmapr

Week 13 (Apr. 23): TBD

Topic and readings to be determined by the needs of student research papers.

Tutorials:

  • Alan: dygraphs
  • John: bookdown
  • Spencer: Shiny

Week 14 (Apr. 30): TBD

Topic and readings to be determined by the needs of student research papers.

Fine print

This syllabus may be updated online as necessary. The online version of this syllabus is the only authoritative version.

Students must satisfactorily complete all assignments (including participation assignments) in order to pass this course. Your attendance is expected at every meeting. If you must be absent, I request that you notify me in advance of the class meeting. I am sometimes willing to grant extensions for cause, but you must request an extension before the assignment’s due date. For every day or part of a day that an assignment is late without an extension, I may reduce your grade. No work (other than final exams and final projects) will be accepted later than the last day that the class meets. I will discuss grades only in person during office hours.

See the George Mason University catalog for general policies, as well as the university statement on diversity. You are expected to know and follow George Mason’s policies on academic integrity and the honor code. If you are a student with a disability and you need academic accommodations, please see me and contact the Office of Disability Services at 703-993-2474 or through their website. You are responsible for verifying your enrollment status. All academic accommodations must be arranged through that office. Please note these dates from the academic calendar.

  • Last day to add a class or drop a class without penalty: January 29, 2018.
  • Last day to drop a class without special permission: February 23, 2018.