Modeling Historical Events and Lives in YAML


For my dissertation, I am researching the lives of converts from the nineteenth century. Some people who converted left behind an enormous source base. Orestes Brownson converted from Congregationalism to Presbyterianism to Universalism to Unitarianism to Transcendentalism to Catholicism, publishing voluminously all along the way. For other converts, I can find only the barest of mentions in a newspaper or collection of papers. The dissertation needs to get at both the experience of well-known, articulate converts like Brownson and that of lesser-known or unknown converts. To retrieve that second kind of experience, I want to try analyzing all the conversions as data.

As I compile my research, I want to use it for two purposes. First, I need regular research notes to use when writing the dissertation. Second, I’d like to use the research as data, which I’ll analyze with some as-yet-unknown tool (maybe Ruby). I have an idea of some of the questions that I’ll ask: How many people converted from X to Y? How likely were converts who were clergy in one religion to become clergy in another? How were conversions distributed over time? Over space? But I won’t know which questions can be investigated programmatically, or what the data to answer them will look like, until I’ve done substantially more research.

The idea: use YAML to model lives and events

With that research problem in mind, I’ve drawn up a list of specifications for what my data model should look like.

  1. The data must be human-readable and -writable as research notes.
  2. The data model must be able to grow organically as I do the research.
  3. The data model must be able to hold large amounts of undigested text as notes.
  4. The data must be portable to other formats, possibly JSON or XML/TEI.

My idea is to use YAML as the format for the data. YAML is a “human friendly data serialization standard for all programming languages.” YAML’s two top priorities are “YAML is easily readable by humans” and “YAML data is portable between programming languages,” which match my own priorities. I’m familiar with YAML from using Jekyll for this blog and another web project. YAML also fits well into the principles I learned from Linux and the Unix Philosophy, especially “store data in flat text files.”

Example YAML model and Ruby script

I’ve created a working example with two YAML files and a Ruby script to output some of the data. I’ve shared the example as a Gist on GitHub.

The YAML file for Orestes Brownson is below, and there is another sample file for Charles Wharton in the Gist. At the outermost level of indentation are keys and values for basic biographical information, such as born: 1803-09-16. The most important part of the model is the list of conversions, which is a YAML array, as signaled by the - character and indentation. The markup for the notes field (notes: >) starts a folded block, which lets that field contain as many paragraphs as necessary. Finally, the source array includes a value (@carey_orestes_2004) that is the key to an entry in my BibTeX database, which I added with Vim’s autocomplete function.

    # A model of a convert's life
    ---
    name-last       : Brownson
    name-first      : Orestes Augustus
    born            : 1803-09-16
    died            : 1876-04-17
    birth-religion  : Congregationalism

    conversions :

      - origin-religion      : Congregationalism
        destination-religion : Presbyterianism
        date                 : 1822
        ritual               : church membership
        citation             : ANB
        notes                : >
          Brownson's change to congregationalism was more denominational
          switching than a change in conscience.

      - origin-religion      : Presbyterianism
        destination-religion : Universalism
        date                 : 1826
        ritual               : ordination
        location             : "Jaffrey, New Hampshire"
        citation             : ANB
        notes                : >
          "He would later refer to his years in this fold as 'the most
          anti-Christian period of my life'" (ANB).

          Brownson was editor of _The Gospel Advocate and Impartial
          Investigator_, a Universalist publication.

      - origin-religion      : Universalism
        destination-religion : Unitarianism
        ritual               : further research
        location             : "Walpole, New Hampshire"
        citation             : ANB
        notes                : >
          Brownson spent some time at Brook Farm, which prepared him for
          Transcendentalism

      - origin-religion      : Unitarianism and Transcendentalism
        destination-religion : Catholicism
        date                 : 1844-10-19
        ritual               : baptism
        citation             : ANB
        notes                : >
          Brownson studied after his conversion with a Sulpician priest.

    source :
      # quoted because a plain YAML scalar cannot begin with "@"
      - "@carey_orestes_2004"
      - American National Biography

    comments : >
      This is a minimal example of what a model of a convert might look
      like. The historical data is hastily gathered, so only the model is
      of interest here.

      N.B. I would like to replace the citations with BibTeX keys.
    ...
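To make the description of the model concrete, here is a minimal sketch of how those YAML constructs arrive in Ruby. It is not part of the Gist: the record is a trimmed-down, hypothetical stand-in for the Brownson file, the notes text is placeholder wording, and the use of YAML.safe_load with Date permitted assumes a reasonably recent Ruby (2.6 or later).

```ruby
require 'yaml'
require 'date'

# A trimmed-down, hypothetical record (not the full Brownson file).
SAMPLE = <<~YAML
  name-last : Brownson
  born      : 1803-09-16
  conversions :
    - origin-religion      : Presbyterianism
      destination-religion : Universalism
      date                 : 1826
      notes                : >
        First paragraph, wrapped
        across two lines.

        Second paragraph.
  source :
    - "@carey_orestes_2004"
YAML

# Date has to be permitted explicitly; safe_load rejects it otherwise.
record = YAML.safe_load(SAMPLE, permitted_classes: [Date])
conversion = record['conversions'].first

puts record.class              # Hash    (the top-level keys and values)
puts record['born'].class      # Date    (a full ISO date)
puts conversion['date'].class  # Integer (a bare year)
puts record['source'].class    # Array   (the "-" list)

# The folded block joins wrapped lines with a space and keeps the
# paragraph break as a newline.
puts conversion['notes']
```

One thing the sketch brings out is that a bare year and a full date come back as different Ruby types, which any later analysis of dates would probably need to account for.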

I had to prove to myself that I could get at the data programmatically, so I wrote the Ruby script below. It’s just a proof-of-concept, and it’s the first Ruby script I’ve written, so there are ugly parts. The script creates a class Converts, which loads an array of YAML files into a hash. The class has a few methods to display the names of the converts and a list of all the conversions. Doubtless there are more interesting things that can be done.

    #!/usr/bin/env ruby
    # A proof-of-concept script that outputs some simple data from YAML
    # files modeling conversions
    #
    # Author:: Lincoln Mullen

    require 'yaml'
    require 'date' # full dates such as 1844-10-19 load as Date objects

    # This class loads data from YAML files, and outputs some values
    class Converts

      attr_accessor :files, :data

      def initialize(files = nil, data = nil)
        @files = files
        @data = Hash.new

        if @files.nil?
          puts "You didn't pass me any files."
        elsif @files.respond_to?("each")
          # walk through the array of files, creating a hash with the
          # file name as the key and the file data as the value
          @files.each do |file|
            # permit Date explicitly: newer Rubies refuse to load dates
            # from YAML unless told they are allowed
            @data[file] = YAML.safe_load(File.read(file), permitted_classes: [Date])
          end
        end
      end

      # output the hash so we can see what we're working with
      def display_raw
        puts "\nThis is the raw data we have loaded:"
        p(@data)
      end

      # walk through the hash, outputting the names of each person
      def display_names
        puts "\nThese people converted:"
        @data.each_key do |key|
          puts " - #{@data[key]["name-first"]} #{@data[key]["name-last"]}"
        end
      end

      # walk through the hash, outputting the names and conversions of
      # each person
      def display_conversions
        puts "\nWe know about these conversions:"
        @data.each_key do |key|
          puts " - #{@data[key]["name-first"]} #{@data[key]["name-last"]}:"
          # each person has an array of conversions (even if there is
          # only one conversion)
          @data[key]["conversions"].each { |conversion|
            puts "    + From #{conversion["origin-religion"]} to #{conversion["destination-religion"]} by #{conversion["ritual"]} in #{conversion["date"]}."
          }
        end
      end

    end

    # get sample data by loading every YAML file in the directory
    puts "Let's load all the YAML files in this directory:"
    puts Dir.glob('*.yml').join(', ')
    c = Converts.new(Dir.glob('*.yml'))

    # call the methods to display the names and conversions
    c.display_names
    c.display_conversions

If you run the script on the sample YAML files, you get the output below. (Yes, the script does output in Markdown. I only know one trick.)

    Let's load all the YAML files in this directory:
    brownson-orestes.yml, wharton-charles.yml

    These people converted:
     - Charles Wharton
     - Orestes Augustus Brownson

    We know about these conversions:
     - Charles Wharton:
        + From Catholicism to Church of England by conformity in .
     - Orestes Augustus Brownson:
        + From Congregationalism to Presbyterianism by church membership in 1822.
        + From Presbyterianism to Universalism by ordination in 1826.
        + From Universalism to Unitarianism by further research in .
        + From Unitarianism and Transcendentalism to Catholicism by baptism in 1844-10-19.
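The questions raised at the start (how many people converted from X to Y, how conversions were distributed over time) could be approached along the following lines. This is only a sketch and not part of the Gist: the method names are hypothetical, the records are typed in by hand from the sample output rather than loaded from files, and Enumerable#tally assumes Ruby 2.7 or later.

```ruby
require 'date'

# Hand-typed records mirroring the sample output; in practice these
# would come from the YAML files.
CONVERTS = [
  { 'name-last' => 'Wharton',
    'conversions' => [
      { 'origin-religion' => 'Catholicism',
        'destination-religion' => 'Church of England' }
    ] },
  { 'name-last' => 'Brownson',
    'conversions' => [
      { 'origin-religion' => 'Congregationalism',
        'destination-religion' => 'Presbyterianism', 'date' => 1822 },
      { 'origin-religion' => 'Presbyterianism',
        'destination-religion' => 'Universalism', 'date' => 1826 },
      { 'origin-religion' => 'Universalism',
        'destination-religion' => 'Unitarianism' },
      { 'origin-religion' => 'Unitarianism and Transcendentalism',
        'destination-religion' => 'Catholicism',
        'date' => Date.new(1844, 10, 19) }
    ] }
].freeze

# Every conversion from every person, as one flat list.
def all_conversions(converts)
  converts.flat_map { |person| person.fetch('conversions', []) }
end

# How many conversions went from X to Y?
def count_by_route(converts)
  all_conversions(converts)
    .map { |c| [c['origin-religion'], c['destination-religion']] }
    .tally
end

# A date may be an Integer (bare year), a Date, or missing entirely,
# so reduce all three cases to a year or nil.
def year_of(date)
  case date
  when Integer then date
  when Date then date.year
  end
end

# How were conversions distributed over time? Count per decade,
# with undated conversions collected under nil.
def count_by_decade(converts)
  all_conversions(converts)
    .map { |c| year_of(c['date']) }
    .map { |year| year && (year / 10) * 10 }
    .tally
end

puts count_by_route(CONVERTS)[['Presbyterianism', 'Universalism']]
p count_by_decade(CONVERTS)
```

Keeping undated conversions under nil, rather than dropping them, makes it visible how much of the data still needs further research.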

What’s next?

If this model works for modeling conversions, it should also work for modeling other kinds of historical events. For example, suppose a labor historian is researching strikes and keeps a YAML file for each strike …

    id: Pullman strike
    location: Pullman, Illinois
    date: 1894-05-11
    corporations:
      - Pullman Palace Car Company
    unions:
      - American Railway Union
    accounts:

      - name: John A. Doe
        source: Chicago Tribune
        description: >
          "Lorem ipsum dolor sit amet, consectetur adipiscing elit. Mauris
          malesuada, purus vel posuere aliquam, enim orci tempor quam, ac
          rutrum arcu arcu nec leo."

      - name: Jane B. Doe
        source: New York Times
        description: >
          "Maecenas in velit nulla, pretium vestibulum lacus. Morbi dui purus,
          imperdiet ac aliquam sodales, gravida ut diam. Vestibulum nec erat a
          ligula tincidunt dignissim in et diam. Quisque tincidunt
          pellentesque lorem, a scelerisque quam lacinia vitae."

and another for each union …

    union: American Railway Union
    leaders:
      - name: Eugene V. Debs
        start: 1893-06-20
        end: ~
    founded:
      date: 1893-06-20
      place: Chicago, Illinois
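Because the strike file refers to the union by name, the two kinds of files could be joined on that name. The sketch below is one hypothetical way to do it, not a settled design: the leaders_on helper and the inline, abbreviated copies of the two files are assumptions made for illustration, and a nil end date is read here as "still in office."

```ruby
require 'yaml'
require 'date'

# Abbreviated inline copies of the two example files.
STRIKE = <<~YAML
  id: Pullman strike
  date: 1894-05-11
  unions:
    - American Railway Union
YAML

UNION = <<~YAML
  union: American Railway Union
  leaders:
    - name: Eugene V. Debs
      start: 1893-06-20
      end: ~
YAML

strike = YAML.safe_load(STRIKE, permitted_classes: [Date])
unions = [YAML.safe_load(UNION, permitted_classes: [Date])]

# Index the union records by name so a strike's list can be resolved.
unions_by_name = unions.to_h { |union| [union['union'], union] }

# Leaders in office on a given day; a nil end date counts as ongoing.
def leaders_on(union, day)
  union['leaders'].select do |leader|
    leader['start'] <= day && (leader['end'].nil? || leader['end'] >= day)
  end
end

strike['unions'].each do |name|
  union = unions_by_name.fetch(name)
  leaders_on(union, strike['date']).each do |leader|
    puts "#{name}: #{leader['name']}"  # American Railway Union: Eugene V. Debs
  end
end
```

Joining on a free-text name is fragile if spellings vary between files, so a real project might prefer an explicit id field in each file.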
I asked about this idea at Digital Humanities Questions & Answers and on Twitter. Chad Black, Ben Brumfield, Ethan Gruber, Caleb McDaniel, and Conal Tuohy offered valuable advice about how to think about this problem and what tools might be helpful later in the project. The TEI markup for an event and person (recommended by Conal) seems promising because it can accommodate types of data that I know I’ll need, such as uncertain dates and name changes.

For now, though, I’m going to work with YAML, since I can get started on it right away and since I’m completely sure it will work as research notes and reasonably sure it can be munged into another format later.
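As a rough check on that last point, here is a sketch of turning a record into JSON with Ruby’s standard library. The inline record is an abbreviated, hypothetical version of the Brownson file, and the jsonable helper is an invented name; the only real work is converting Date values to strings, since JSON has no date type.

```ruby
require 'yaml'
require 'json'
require 'date'

# Abbreviated, hypothetical version of a convert's file.
SAMPLE_YAML = <<~YAML
  name-last  : Brownson
  name-first : Orestes Augustus
  born       : 1803-09-16
  conversions :
    - origin-religion      : Unitarianism and Transcendentalism
      destination-religion : Catholicism
      date                 : 1844-10-19
      ritual               : baptism
YAML

# Walk the nested hashes and arrays, replacing each Date with an
# ISO 8601 string and leaving every other value untouched.
def jsonable(value)
  case value
  when Hash  then value.transform_values { |v| jsonable(v) }
  when Array then value.map { |v| jsonable(v) }
  when Date  then value.iso8601
  else value
  end
end

record = YAML.safe_load(SAMPLE_YAML, permitted_classes: [Date])
json = JSON.pretty_generate(jsonable(record))
puts json
```

Getting to XML/TEI would take more thought, since the YAML keys would have to be mapped onto TEI elements rather than copied across.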

I’ll be glad for any advice about how to improve the data model or script and about what considerations I should think about to make sure the data is useful. If you have any ideas about what to do with the data once I’ve gathered it, I’ll be glad for those too.