Aug. 27: Unix as a Way of Life

How to interact with your computer and run programs through the command-line interface. You will also learn a philosophy for writing programs.

Reading

  • Mike Gancarz, Linux and the Unix Philosophy, chs. 1–8, focusing on the ten tenets.
  • William E. Shotts Jr., The Linux Command Line: A Complete Introduction. Most of this book is a reference source, but familiarize yourself at a minimum with chapters 2 (navigation), 4 (file manipulation), 5 (commands), 6 (redirection), 10 (processes), 11 (environment). Nearly all of what Shotts writes about Linux will apply to the Unix terminal in Mac OS X.

Exercises

Try out all the Unix-style commands in your terminal.

Before class, do your best to get the following installed:

If you are on a Mac, you should install Homebrew and any necessary dependencies as you go along. If you are on some kind of Linux machine, then probably everything you need is in your package manager. If you are on a Windows PC, you should install Ubuntu 14.04 LTS inside VirtualBox using Vagrant. Follow this tutorial on Vagrant, substituting ubuntu/trusty64 for hashicorp/precise32.

Sept. 3: Version Control and Reproducible Research

Version control lets you contribute to projects and distribute your code. GNU Make helps automate and reproduce your results.

Reading

Exercises

  • At least one day before class, submit a pull request to the repository for this syllabus. The pull request should modify the list of participants (source/participants.md) to add your name with a link to your personal website, as well as your GitHub user name and a link to your GitHub user profile. Feel free to include your Twitter user name and link if you like. (A guide to Markdown if you need it.)
  • Create a minimal Makefile. This Makefile should take a text file (provided by you) and find and replace words of your choosing to a new text file. (Hint: sed 's/foo/bar/g' input-file.txt > output-file.txt replaces all instances of foo with bar and redirects standard out to a file.) The Makefile should also put the time stamp for when the output file was generated at the bottom of the file. (Hint: in your shell the >> operator appends to a file; there is also a command to get the current time.) Can you rewrite the Makefile so that it uses rules? So that it uses special targets? So that it works on several text files at once? On an arbitrary number of files? So that it uses a default rule? Post your Makefile and input text files to GitHub.

Sept. 10: Flow Control and Functions in JavaScript

The basics of JavaScript, as well as some foundational principles like loops, conditionals, functions, closure, and recursion.
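
As a rough orientation, the sketch below touches each of the ideas named above: a loop with a conditional, a closure, and a recursive function. The function names (countEvens, makeCounter, factorial) are illustrative choices, not taken from the assigned reading.

```javascript
// Loop and conditional: count the even numbers in an array.
function countEvens(numbers) {
  let count = 0;
  for (const n of numbers) {
    if (n % 2 === 0) {
      count += 1;
    }
  }
  return count;
}

// Closure: the returned function keeps access to `count`
// even after makeCounter() itself has finished running.
function makeCounter() {
  let count = 0;
  return function () {
    count += 1;
    return count;
  };
}

// Recursion: factorial is defined in terms of itself,
// with n <= 1 as the base case that stops the calls.
function factorial(n) {
  if (n <= 1) {
    return 1;
  }
  return n * factorial(n - 1);
}

const next = makeCounter();
next();                                 // first call returns 1
console.log(countEvens([1, 2, 3, 4])); // 2
console.log(next());                   // 2
console.log(factorial(5));             // 120
```

Each call to makeCounter() creates a separate counter, because each call creates its own `count` variable.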

Reading

Exercises

  • Create separate .js files with the solutions for each exercise in these chapters, and post them to GitHub.

Sept. 17: Data Structures in JavaScript

An introduction to how data is structured, and an introduction to the object-oriented style of programming for modeling data.

Reading

Exercises

  • Create separate .js files with the solutions for each exercise in these chapters, and post them to GitHub.
  • Think of a historical source, event, or life that could be modeled as data, and create a data model for it in JavaScript. Can you write a constructor function to create new objects? Can you create at least five objects? How can you store an arbitrary number of objects? Now that you’ve stored those objects, what can you do with them that is interesting? How can you display, represent, filter, and link them? Post your code to GitHub.
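
One possible starting point for the data-modeling exercise, offered as a sketch rather than a model answer: a constructor function for letters in a correspondence, an array to hold any number of them, and two small operations (filtering and grouping). The Letter model, its fields, and the placeholder names and years are all invented for illustration.

```javascript
// Constructor function: a hypothetical model of one letter.
function Letter(sender, recipient, year) {
  this.sender = sender;
  this.recipient = recipient;
  this.year = year;
}

// A method shared by every Letter through the prototype.
Letter.prototype.describe = function () {
  return this.sender + " to " + this.recipient + " (" + this.year + ")";
};

// An array can store an arbitrary number of objects.
// These five entries are placeholders, not real historical data.
const letters = [
  new Letter("Person A", "Person B", 1850),
  new Letter("Person B", "Person A", 1851),
  new Letter("Person A", "Person C", 1853),
  new Letter("Person C", "Person A", 1857),
  new Letter("Person B", "Person C", 1860)
];

// Filter: keep only the letters written before a given year.
function writtenBefore(allLetters, year) {
  return allLetters.filter(function (letter) {
    return letter.year < year;
  });
}

// Link: group letters by sender, so each person points to their letters.
function groupBySender(allLetters) {
  const index = {};
  allLetters.forEach(function (letter) {
    if (!index[letter.sender]) {
      index[letter.sender] = [];
    }
    index[letter.sender].push(letter);
  });
  return index;
}

console.log(letters[0].describe());              // Person A to Person B (1850)
console.log(writtenBefore(letters, 1855).length); // 3
console.log(Object.keys(groupBySender(letters))); // the three senders
```

Grouping by sender is one simple way to "link" records; the same pattern would work for recipients or years.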

Sept. 24: Introduction to R / Grammar of Graphics in R

We learn our second programming language and begin to make real visualizations.

Reading

Exercises

  • Experiment with ggplot2 in RStudio as you read the assigned books.
  • Find a historical data set and make as many different kinds of charts with it as you can. (Some of them should be bad charts or unhelpful charts.) Annotate the charts in R Markdown and knitr (guide here). Post the code to GitHub and the document to RPubs.

Oct. 1: Manipulating Data in R

Data seldom comes in the format we need: this is how to munge it into a useful form.

Reading

Exercises

  1. In the data-raw directory of the historydata package, there are several raw data files stored in untidy formats as CSV files. I have transformed these into tidy data in the actual package. In other words, loading sarna.csv gives you different results than loading the package and accessing the sarna dataset. Try turning those untidy datasets into tidy datasets that match the versions actually in the package, using dplyr and tidyr. (Start with sarna.csv.) You can see how I have done this using the corresponding R files in data-raw, for example, sarna.R.
  2. In the data-raw directory, there is a file, nhgis0011_ts_state.csv, which has counts of the state populations. Can you use this data with the summarize() function to create counts of the national population for each census year? (In other words, can you sum() up the state populations for each year?)
  3. Now take some dataset of your own. Can you turn it into a tidy dataset? Can you clean the data as necessary? Can you use all seven data manipulation verbs on your data? The verbs are filter(), arrange(), mutate(), select(), summarize(), gather(), and spread(). (There is also a family of verbs that fall under the category of joins: in dplyr this includes left_join(); the base R function is merge(). We’ll deal with these later.) What new visualizations can you make?
  4. The Biographical Directory of Federal Judges, 1789-present is an immensely interesting dataset, but also messy and untidy. Can you make it better?

Share your code on GitHub, and publish your results to RPubs.
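
The ideas behind gathering and summarizing do not depend on R. As a language-neutral illustration, here is a sketch in JavaScript (the language from the earlier weeks) that reshapes "wide" rows into tidy "long" rows and then sums by year. The function names gatherYears and sumByYear are hypothetical, the state names and numbers are placeholders rather than real census counts, and this is not how dplyr or tidyr are implemented.

```javascript
// Untidy ("wide") data: one row per state, one column per census year.
// Placeholder values only; these are not real population counts.
const wide = [
  { state: "State A", 1790: 100, 1800: 150 },
  { state: "State B", 1790: 200, 1800: 250 }
];

// Gather: turn each year column into its own row, so that every
// row holds exactly one observation (state, year, population).
function gatherYears(rows, years) {
  const long = [];
  rows.forEach(function (row) {
    years.forEach(function (year) {
      long.push({ state: row.state, year: year, population: row[year] });
    });
  });
  return long;
}

// Summarize: group the tidy rows by year and add up the populations.
function sumByYear(longRows) {
  const totals = {};
  longRows.forEach(function (row) {
    totals[row.year] = (totals[row.year] || 0) + row.population;
  });
  return totals;
}

const tidy = gatherYears(wide, [1790, 1800]);
const national = sumByYear(tidy);

console.log(tidy.length);    // 4
console.log(national[1790]); // 300
console.log(national[1800]); // 400
```

Once the data are in the long form, the per-year sum is a single pass over the rows, which is the main practical argument for tidying first.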

Tutorial

Mandy: Dat

Oct. 8: Spatial Analysis in R

How to make maps and perform other kinds of spatial analysis.

Reading

Exercises

Select from the spatial data sets available to you, or find your own. Make maps. Publish your code to GitHub and your results to RPubs. Hint: If you use ggplot2 with a projected shapefile (i.e., a shapefile whose coordinates are stored in some coordinate reference system other than latitude and longitude) it will probably blow up. First convert the shapefile to EPSG 4326/WGS 84.

Tutorial

Sara: OpenRefine

Oct. 15: More about R

This week we’ll learn whatever we haven’t covered about R that would be most helpful for your projects.

Tutorial

George: Web scraping

Oct. 22: Text Mining in R

How to do “distant reading,” document similarity, and other kinds of textual analysis.

Reading

Exercises

Choose one (or both) of the following, in either case posting your code to GitHub and your results to RPubs:

  • In the nineteenth-century United States, there was a fierce debate over whether to codify laws. New York created several codes of civil procedure, which other states then borrowed. You will be given a handful of codes. Which codes borrowed from one another? What did they borrow? How can you visualize this? How can you browse the borrowings? What interpretations do you draw from this? You can clone this repository: the OCRed codes are in the text/ directory. The RMarkdown files in the directory will provide some hints about how to proceed.
  • You will be given a cleaned-up set of texts from the Oxford Movement’s Tracts for the Times (zipfile here). What do text mining and topic modeling of these texts tell you? You may substitute another corpus if you wish.

Tutorial

Peter: Image processing

Oct. 29: Network Analysis in R

How to measure and visualize networks of people, events, ideas, sources, you name it.

Reading

Exercises

You will be provided with some historical data suitable for network analysis, or you may bring your own. Do some network analysis with visualizations and interpretations.

Tutorial

Janelle: PHP

Nov. 5: D3.js Concepts

The basics of a powerful visualization library for the web.

Reading

Exercises

  • Using some suitable data set(s), create as many different kinds of D3 visualizations as you can manage. (These need not be complicated visualizations.) Can you add interactivity to them? What does interactivity add to the graphics? What does it take away?

Tutorial

Anne: D3

Nov. 12: D3.js Applications

From D3 basics to D3 for history.

Reading

No assigned reading, but you may find Elijah Meeks, D3.js in Action (Manning, 2014) useful for advanced D3.

Exercises

Over the course of the semester we have written programs to do many kinds of analysis. Take one of the kinds of analysis that seems most promising for your work, and translate it to the web using D3. Create the most sophisticated (not flashy) visualization that you can, and embed it in an interpretation or narrative. Use the principles of reproducible research as appropriate.

Tutorial

Allison: MARC

Nov. 19: Workshop day / TBD

This week we will work collaboratively on the projects for the course. We may also cover additional topics such as web applications and frameworks (Ruby on Rails, Sinatra, Node.js); programming practices such as debugging, refactoring, and testing; other programming languages (Python, Ruby, PHP); basic statistics of use to historians; or other topics relevant to your research.

Nov. 26: No class

Thanksgiving break.

Dec. 3: Project Presentations

You will present your final projects, with an emphasis on both their code and historical interpretations. Final projects are due by 6 p.m. on December 10.