Working with Spatial Data
Much of the time you spend creating maps will go to preparing data. A rule of thumb is that 80% of your time will be spent preparing data and 20% making the map. If anything, that rule understates the time spent preparing data. Understanding the basics of spatial data manipulation is essential if you want to be able to create maps for your own research and teaching.
Kinds of spatial data
There are several ways of classifying spatial data. We can classify it by the way the data is encoded, the coordinate reference system (or map projection) that it uses, the shapes of the data, the format, and so on. After an overview of the most common formats that spatial data comes in, we will work on manipulating spatial data. As a sample data set, we will use the Natural Earth quick start kit.
Vector vs Raster
Some geospatial data is represented as vector files, while other data is represented as raster data. Raster data is like a digital photograph, in that it is a grid of cells where each cell contains some kind of information. Just like a digital photo, a raster image can only be enlarged to a certain point before it becomes blurry. Vector data, on the other hand, is a mathematical representation of lines or shapes. It can be zoomed without any loss of quality.
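The difference can be sketched in a few lines of Python. This is only an illustration, not how any GIS program stores its data: the grid values, the triangle, and the function names enlarge_raster and scale_vector are all invented for the example. Enlarging the raster repeats existing cells without adding detail, while scaling the vector shape simply recomputes exact coordinates.

```python
# A tiny raster: a grid of cells, each holding one value (made-up numbers).
raster = [
    [1, 2],
    [3, 4],
]


def enlarge_raster(grid, factor):
    """Enlarge a raster by repeating each cell `factor` times in both directions."""
    enlarged = []
    for row in grid:
        wide_row = [cell for cell in row for _ in range(factor)]
        # Repeat the widened row; copy it so the rows are independent lists.
        enlarged.extend(list(wide_row) for _ in range(factor))
    return enlarged


# A tiny vector shape: a triangle described only by its vertices.
triangle = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]


def scale_vector(vertices, factor):
    """Scale a vector shape by multiplying every coordinate by `factor`."""
    return [(x * factor, y * factor) for x, y in vertices]


big_raster = enlarge_raster(raster, 2)     # more cells, but no new information
big_triangle = scale_vector(triangle, 2)   # same three vertices, still exact
```

The enlarged grid has four times as many cells but still only the four original values, which is why an enlarged raster looks blocky or blurry. The scaled triangle is described just as precisely as before.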
You are most likely to encounter raster data in the form of map images, such as .tiff files, or possibly in the form of terrain data.[1] Map images may not be associated with spatial data. But through a process called georectification, it is possible to embed spatial information within a file. The resulting files can then be loaded into a GIS program. Sometimes the resulting files have a different file extension, such as .geotiff. See inspecting spatial data to learn how to determine if an image has spatial information embedded. In the Natural Earth quick start kit, the file NE1_50M_SR_W/NE1_50M_SR_W.tif contains terrain data for the earth.
You are more likely to work with vector data. For example, other than the raster file mentioned above, all of the data in the Natural Earth quick start kit is vector data. See common spatial data formats below for more information.
Points, lines, polygons, and quantitative and qualitative information
Vector formats typically include the following kinds of information.
Point data describes a set of points. The geospatial information is likely to be encoded in terms of latitude and longitude, but it may be encoded in a different coordinate reference system. Additional information can be associated with point data, whether that data is quantitative or qualitative. For instance, below is the North American Catholic dioceses dataset (available on the resources page) in a spreadsheet. Notice that the columns for latitude and longitude are associated with temporal and qualitative information.
Here is a simple map of point data, in this case, Catholic dioceses in North America in 1850.
The easiest (and best) format to keep your own point data in is a CSV file. Point data can also be contained in shapefiles or GeoJSON files. See the file ne_110m_populated_places.shp for a shapefile containing points.
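As a minimal sketch of how point data kept in a CSV file relates to GeoJSON, the Python below reads rows with latitude and longitude columns and wraps each one as a GeoJSON Point feature. The column names (name, year, lat, lon), the two sample rows, and the function csv_points_to_geojson are assumptions made up for illustration; they are not taken from the dioceses dataset. Note that GeoJSON stores coordinates in longitude, latitude order.

```python
import csv
import io
import json

# Hypothetical CSV contents: column names and rows are invented for illustration.
SAMPLE_CSV = """name,year,lat,lon
Place A,1850,40.0,-75.0
Place B,1850,30.0,-90.0
"""


def csv_points_to_geojson(csv_text, lat_col="lat", lon_col="lon"):
    """Convert CSV text with latitude/longitude columns to a GeoJSON FeatureCollection."""
    features = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Pull the coordinates out of the row; what remains becomes the attributes.
        lat = float(row.pop(lat_col))
        lon = float(row.pop(lon_col))
        features.append({
            "type": "Feature",
            # GeoJSON positions are [longitude, latitude], not [latitude, longitude].
            "geometry": {"type": "Point", "coordinates": [lon, lat]},
            "properties": dict(row),
        })
    return {"type": "FeatureCollection", "features": features}


collection = csv_points_to_geojson(SAMPLE_CSV)
print(json.dumps(collection["features"][0], indent=2))
```

The qualitative and temporal columns travel along as the properties of each feature, while the spatial columns become the geometry. Values read by the csv module are strings, so a column such as year would need converting before it is treated as a number.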
Another common form of data is line data. A line is a set of vertices (points) that are connected in sequence. This kind of data can record motion or connections. For instance, the railroad data from Railroads and the Making of Modern America is described as lines in shapefiles. Below is a map of the railroads in the United States in 1850.
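To make the idea of a line as an ordered sequence of vertices concrete, here is a small Python sketch that stores a line as a list of (latitude, longitude) pairs and adds up the distance between consecutive vertices. The haversine formula and the mean Earth radius of 6371 km are choices made for this example, not something the railroad data prescribes, and the three vertices are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an approximation for this sketch


def haversine_km(a, b):
    """Great-circle distance in km between two (latitude, longitude) points in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    h = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def line_length_km(vertices):
    """Sum the distances between each vertex and the next one in sequence."""
    return sum(haversine_km(p, q) for p, q in zip(vertices, vertices[1:]))


# A hypothetical line with three vertices, given as (latitude, longitude).
line = [(0.0, 0.0), (0.0, 1.0), (1.0, 1.0)]

length = line_length_km(line)
reordered_length = line_length_km([line[0], line[2], line[1]])
```

Because the vertices are connected in sequence, their order is part of the data: visiting the same three points in a different order traces a different line with a different length.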