Visualization with the Grammar of Graphics

Lincoln Mullen

http://lincolnmullen.com

Rules versus Grammar

Rule: The axis of a bar chart must start at zero

Rule: Don’t use pie charts

Rule: Use meaningful data

An analogy to English grammar

Error: “John and me went to the store.”

Rule: Use “John and I.”

Error: “He gave the bag to John and I.”

Grammar: I is the subjective case; me is the objective case.

Grammar of Graphics

Historical visualizations

A simple example of grammar of graphics

The grammar of graphics maps variables in the data to visual properties.

The mapping from our example

  • confessions (continuous numeric) → position along x-axis
  • converts (continuous numeric) → position along y-axis
  • duration (discrete/binned numeric) → color (binned)
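
The mapping above can be sketched as data. This is a minimal illustration, not the code behind the example plot: the `Mapping` class, the `apply_mappings` helper, and the values in `example_row` are all hypothetical. Only the three variable names and their visual properties come from the slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mapping:
    variable: str   # column in the data
    kind: str       # "continuous" or "binned"
    aesthetic: str  # visual property that the variable drives


# The three mappings listed on the slide.
MAPPINGS = [
    Mapping("confessions", "continuous", "x"),
    Mapping("converts", "continuous", "y"),
    Mapping("duration", "binned", "color"),
]


def apply_mappings(row, mappings):
    """Translate one data row into the visual properties of one mark."""
    return {m.aesthetic: row[m.variable] for m in mappings}


# Placeholder row: these values are made up for illustration only.
example_row = {"confessions": 10, "converts": 4, "duration": "bin A"}
mark = apply_mappings(example_row, MAPPINGS)
print(mark)
```

Each row of data becomes one mark, and the mapping decides which variable controls which visual property of that mark.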

Exercise

What kind of meaningful marks can you make in a visualization?

Bonus: What kind of data are they good for?

Hint: We’ve already seen these:

  • x-axis position (good for continuous numeric values)
  • y-axis position (good for continuous numeric values)
  • color (good for categorical values)

Marks = the vocabulary
Relationships between marks = the syntax

Gapminder data

##         country continent year lifeExp      pop  gdpPercap
## 1     Mauritius    Africa 1957  58.089   609816  2034.0380
## 2         Niger    Africa 1957  38.598  3692184   835.5234
## 3       Belgium    Europe 1967  70.940  9556500 13149.0412
## 4         Sudan    Africa 1967  42.858 12716129  1687.9976
## 5  Sierra Leone    Africa 1972  35.400  2879013  1353.7598
## 6       Croatia    Europe 1977  70.640  4318673 11305.3852
## 7         Nepal      Asia 1977  46.748 13933198   694.1124
## 8    Bangladesh      Asia 1982  50.009 93074406   676.9819
## 9       Bolivia  Americas 1987  57.251  6156369  2753.6915
## 10      Germany    Europe 1987  74.847 77718298 24639.1857
## 11       Gambia    Africa 1987  49.265   848406   611.6589
## 12      Myanmar      Asia 1997  60.328 43247867   415.0000

Position along the axis

The x-axis shows the independent variable; the y-axis shows the dependent variable

Which plot of life expectancy over time makes sense?

Color (binned)

Color (continuous)

Size

Text

Position (discrete)

Two-variable bar plot

Position (discrete)

One-variable bar plot (histogram)
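
A one-variable bar plot only needs counts of a single variable. As a rough sketch, assuming we count only the twelve sample rows printed in the gapminder table earlier (not the full dataset), the bars can be tallied and drawn as text:

```python
from collections import Counter

# Continent column of the twelve sample rows shown in the gapminder table,
# in the same order as the table.
continents = [
    "Africa", "Africa", "Europe", "Africa", "Africa", "Europe",
    "Asia", "Asia", "Americas", "Europe", "Africa", "Asia",
]

# One variable in, one count per category out: the height of each bar.
counts = Counter(continents)

# Draw each bar as a run of '#' characters, longest bar first.
for continent, n in counts.most_common():
    print(f"{continent:<9} {'#' * n} {n}")
```

The discrete variable supplies the position of each bar; the count supplies its length.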

Lines connecting data points

Lines describing models of data
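
A line that describes a model is computed from the data rather than drawn through every point. The sketch below fits an ordinary least-squares line of life expectancy against year, using only the twelve sample rows from the gapminder table. Those rows mix different countries, so the fit is purely illustrative; the variable names are assumptions made for this example.

```python
# (year, lifeExp) pairs copied from the twelve sample rows of the table.
points = [
    (1957, 58.089), (1957, 38.598), (1967, 70.940), (1967, 42.858),
    (1972, 35.400), (1977, 70.640), (1977, 46.748), (1982, 50.009),
    (1987, 57.251), (1987, 74.847), (1987, 49.265), (1997, 60.328),
]

n = len(points)
mean_x = sum(x for x, _ in points) / n
mean_y = sum(y for _, y in points) / n

# Ordinary least squares for a single predictor:
# slope = covariance(x, y) / variance(x)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
sxx = sum((x - mean_x) ** 2 for x, _ in points)
slope = sxy / sxx
intercept = mean_y - slope * mean_x


def predict(year):
    """Life expectancy on the fitted line for a given year."""
    return intercept + slope * year


# Residuals: the vertical distance from each point to the fitted line.
residuals = [y - predict(x) for x, y in points]

print(f"slope: {slope:.3f} years of life expectancy per calendar year")
```

Plotting `predict(year)` as a line over the scatter of points shows the model alongside the data it summarizes.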

Transformations: Normalizing data / log scales
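
To see why a log scale helps, here is a small sketch using the gdpPercap column of the twelve sample rows in the gapminder table. The choice of base 10 is an assumption for this example; any base compresses the range in the same way.

```python
import math

# gdpPercap values copied from the twelve sample rows of the table.
gdp_per_cap = [
    2034.0380, 835.5234, 13149.0412, 1687.9976, 1353.7598, 11305.3852,
    694.1124, 676.9819, 2753.6915, 24639.1857, 611.6589, 415.0000,
]

# On the raw scale the largest value dwarfs the smallest, so most
# points would be squeezed into one corner of the plot.
raw_ratio = max(gdp_per_cap) / min(gdp_per_cap)

# On a log10 scale, equal distances mean equal ratios (each unit is 10x).
log_gdp = [math.log10(value) for value in gdp_per_cap]
log_span = max(log_gdp) - min(log_gdp)

print(f"raw: largest value is {raw_ratio:.0f} times the smallest")
print(f"log10: values span {log_span:.2f} units")
```

After the transformation the poorer countries are spread out along the axis instead of being bunched together near zero.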

Which marks/relationships are best?

Basics of visual perception for visualizations

We are good at seeing

  • motion
  • contrast
  • relative distances
  • properly chosen colors for categories

We are bad at seeing

  • angles
  • slopes of lines (which are a kind of angle)
  • area
  • continuous color ranges

Rule of thumb: you can’t beat a scatter plot, line plot, or bar plot

Aesthetic principles

Should the humanities visualize differently than the sciences?

Application

Create visualizations of gapminder data

  • What are the variables in the data?
  • How many different kinds of marks/relationships can you use?
  • Which marks are best suited to each visualization?
  • Which visualizations convey knowledge?
  • What additional steps might you take to make a visualization clearer?

Feel free to use other datasets, or your own data.