Please note that this book is a work in progress, the equivalent of a pre-alpha release. All the code should work, because if it didn’t the site could not be built. But there is still a lot of work to do to explain the historical methods under discussion. Feel free to leave feedback as issues on the GitHub repository, or to e-mail me.

How to set up R.

Installing R

On Ubuntu

R can be installed on Ubuntu using the package manager. The version in the Ubuntu repositories is almost certainly out of date, so the best thing to do is to add the CRAN repository using the RStudio mirror. (You will have to run most of these commands as root, prefixing them with the command sudo.)

add-apt-repository "deb http://cran.rstudio.com/bin/linux/ubuntu trusty/"
apt-key adv --keyserver keyserver.ubuntu.com --recv-keys 51716619E084DAB9

You should then be able to update the list of Ubuntu packages and install the necessary R packages:

apt-get install r-base r-base-core r-base-dev

You can then test whether R is properly installed by running the following command in your terminal:

R --version

For some tasks in R, like package development, you’ll need many more external dependencies. The following will take a long time to download and install, but it will provide almost all of what you need.

apt-get install libcurl4-openssl-dev libxml2-dev qpdf texlive-full build-essential gdal-bin libgdal-dev libproj-dev libxml2-dev git

Installing R Studio

Packages and Libraries

A package is a collection of R files bundled together to accomplish a particular purpose. Usually these packages provide a set of functions to provide a particular kind of analysis. The dplyr package, for example, provides a set of functions for manipulating data, and the rgdal package provides a set of functions for working with geospatial files. Because R is especially good at data analysis, it is also common to provide data packages which wrap generally useful datasets. R has a built in datasets package which provides a set of sample datasets, and the babynames package provides data about births from the Social Security Administration.1

A library is a collection of packages stored on your computer. R has a built-in library which stores the packages which come bundled with base R. You will also create a personal package library which stores the packages that you have installed. You needn’t worry about where this library is on your computer: R will prompt you for permission to create it the first time that you install packages. (You can find where it is by running the function .libPaths(). When you work on specific analysis projects, it is desirable to keep a separate library of packages uses just for that project in the project directory. This will ensure that you can later re-run your analysis. This can be managed using the Packrat package, which we’ll cover later. (See the chapter on reproducible research.)

Installing packages

You won’t get very far in doing historical work with R without installing a few packages. This section will guide you in installing a few basic packages which you can’t live without. Then it will suggest some packages categorized by the type of analysis that you wish to do.

There are two main places from which you can install packages. CRAN, the Comprehensive R Archive Network, is a central repository of R packages.2 To be published on CRAN, packages have to go through a rigorous testing process. You can be fairly confident that a CRAN package is, technically speaking at least, reliable. R users also have a culture oriented around academic research, so you can be more confident with CRAN packages than with the repositories for most languages that the functions provided are academically rigorous, and in some cases at least, peer reviewed. CRAN provides a wide array of packages, for statistics, business, science, social science, geospatial analysis, network analysis, text mining, natural language processing, visualization, and other domains. Most of the packages are not targeted specifically at historians (you can help change this!) but many can be used by historians. You can browse the list of packages available at CRAN by name, or can browse the task views which provide guides to packages relevant to a particular analytical technique.

R includes a function that lets you install packages from CRAN. Running the function below will install the package devtools, which provides many functions useful for writing R code. If this is the first time that you have installed an R package, you may be prompted to create a personal package library; just accept whatever the default location is for your operating system.

install.packages("devtools")

There are two common operations that you might want to do with install.packages(). One is to install more than one package at a time. You can combine the names of the packages that you want to install into a character vector. In this example, we’ll install the httr package, which lets you make requests from R over the internet via HTTP, and the jsonlite package, which lets you read the JSON data format which is commonly used to exchange data through web applications: c("httr", "jsonlite"). You might also want to install all the dependencies that a package needs. By default when using install.packages() to install package A, you will also install the packages B and C without which A cannot work. But packages will also have other dependencies which add additional functionality. Specifying dependencies = TRUE will install all the packages which A can possibly use: B and C, but perhaps also D and E. Putting those two arguments together, we can install httr and jsonlite with all their dependencies:

install.packages(c("httr", "jsonlite"), dependencies = TRUE)

The second main place to install packages is from GitHub, a website for sharing open-source code. Many R packages are developed on GitHub, so you can install more recent versions of them than are available on CRAN. Other packages are distributed only informally via GitHub rather than formally via CRAN. Such packages can be installed using the install_github() function provided by the devtools package we installed earlier.

Let’s install the historydata package which provides sample datasets of interest to historians. This package will give us something to work with later. You can find this package at GitHub. Notice that the README provides instructions on how to install the package. But even if it didn’t, we could find out how to install the package from its URL. Notice the URL: https://github.com/lmullen/historydata. After GitHub’s domain name, there are two parts to the URL. The username of the person or organization who controls this package is first: lmullen. Next is the name of the repository (which probably should be the same as the name of the package, but which might not be): historydata. We can use that information to install the file with devtools.3

devtools::install_github("ropensci/historydata")

Now using install.packages() and devtools::install_github() you’ll be able to install the vast majority of the packages that are useful to you.

There is one more wrinkle to consider. R is a high-level language which uses libraries provided by low-level languages. For example, the RCurl package (which many R packages need to interact with the internet) requires the libcurl library which is external to R. The rgdal package depends on the GDAL/OGR Geospatial Data Abstraction Library. When looking at the documentation for package, note if it requires any external libraries. Those external libraries will have to be installed on your operating system. On Mac OS X, this is most easily done with Homebrew; on Ubuntu and other Linux distributions you can use the package manager; on Windows you will have to follow the instructions from the developers of the external library. In the list of suggested R packages below, I have mentioned the any external libraries which are required.

Suggested packages

The following are some suggested packages for historical analysis. We’ll use most of these throughout this book.

General R functionality:

Data:

Web:

Visualization:

Data manipulation:

Geospatial and mapping:

Updating Packages

Citing packages

.Rprofile


  1. For a fuller description of packages, see the introduction and chapter 1 of Hadley Wickham, R Packages (O’Reilly, forthcoming).

  2. There are other CRAN-like repositories, such as Bioconductor which provide discipline specific packages, but these are all for scientists.

  3. The double colon operator :: lets us specify which package a function comes from. In this case we are using it as a shortcut so we don’t have to load devtools into the global environment.