*This series on “Analyzing Historical History Dissertations” is a work in progress and I’ve re-done some of these visualizations. If you would like to cite or link to this work in progress, please consider using the landing page, which will always have the most up-to-date information and a list of all the posts.*
After my post yesterday about historical history dissertations, Yoni Appelbaum sent me some useful questions about the completeness and accuracy of the data. I’ve had my own questions about exactly what data I am working with and what its limitations are. The difficulty with a reasonably large data set like this one is that it is easier to know something about the data in aggregate than about any of the particulars.
The main question boils down to this: why does the data from ProQuest have 84,428 dissertations about history (based on the filters I described earlier) when the AHA’s Directory of History Dissertations contains “29,421 dissertations that were completed or are currently in progress at 194 academic departments in Canada and the United States”?
The answer is that the AHA data is based on dissertations completed within history departments. Those dissertations also have to be reported to the AHA, perhaps by the authors themselves, but usually by member history departments. The ProQuest data, as far as I can tell, comes from reports by universities to ProQuest, as well as retrospective data entry from sources like Dissertation Abstracts International and UMI. It contains data from many departments, with subject fields that describe the content of the dissertation. I’m filtering the data set to get only dissertations that match these subjects. So while I’m not selecting, for example, American Studies dissertations as a whole, I am getting American Studies dissertations that someone judged to be history. This certainly leads to some fuzziness, but I think it also gives a fuller picture about the writing of history. In other words, the AHA data is probably better for answering questions about professional academic historians in history departments; the ProQuest data is probably better for answering questions about writing about history (or the past) across the academy.
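To make the difference between the two selection rules concrete, here is a minimal sketch in Python. The records, field names, and subject labels are all invented for illustration; they are not the actual ProQuest schema, nor the filters I described earlier.

```python
# Hypothetical records: titles, field names, and subject labels are made up
# for illustration and do not reflect the real ProQuest data.
records = [
    {"title": "A", "department": "History",
     "subjects": ["History, United States"]},
    {"title": "B", "department": "American Studies",
     "subjects": ["American Studies", "History, United States"]},
    {"title": "C", "department": "American Studies",
     "subjects": ["American Studies", "Literature, American"]},
]

# Assumed set of subject labels that count as "history".
HISTORY_SUBJECTS = {"History, United States"}


def about_history(record):
    """True if any of the record's subject fields is a history subject."""
    return any(subject in HISTORY_SUBJECTS for subject in record["subjects"])


# Subject-based selection (the approach taken with the ProQuest data).
by_subject = [r["title"] for r in records if about_history(r)]

# Department-based selection (roughly how the AHA directory is scoped).
by_department = [r["title"] for r in records if r["department"] == "History"]

print("By subject:   ", by_subject)
print("By department:", by_department)
```

In this toy example the subject filter picks up the American Studies dissertation tagged as history, while the department filter does not, which is the kind of difference that could account for the larger ProQuest count.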
But there is another question: is the ProQuest data tolerably complete? I suspect, given the very low numbers of dissertations before the 1950s, that the data is patchy and unreliable before that time. (For example, I didn’t find W. E. B. Du Bois’s 1896 dissertation in either data set.) Other than that, I’m not sure how to answer the question definitively. But I was curious to see whether the two data sets contained the dissertations by faculty affiliated with my own department. This kind of spot check is only suggestive, of course. Leaving out two faculty with dissertations after 2012, and four faculty with PhDs from British universities, I came up with the following table.
| Faculty | In ProQuest | In AHA | Year |
|---|---|---|---|
| Fischer, David Hackett | no | no | |
| Hulliung, Mark L. | no | no | |
| James, Heyward Parker | yes | no | 2001 |
| Kapelle, William E. | no | no | |
| Sarna, Jonathan D. | yes | yes | 1979 |
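The spot check can be tallied with a few lines of Python. This is only a sketch: it uses the rows shown in the table above, and the tuple layout and the `tally` helper are my own illustrative choices rather than part of either data set.

```python
# Rows copied from the table above:
# (faculty, in ProQuest, in AHA, year or None when no year is listed).
rows = [
    ("Fischer, David Hackett", False, False, None),
    ("Hulliung, Mark L.", False, False, None),
    ("James, Heyward Parker", True, False, 2001),
    ("Kapelle, William E.", False, False, None),
    ("Sarna, Jonathan D.", True, True, 1979),
]


def tally(rows):
    """Count how many of the listed dissertations appear in each data set."""
    return {
        "total": len(rows),
        "proquest": sum(1 for _, pq, _, _ in rows if pq),
        "aha": sum(1 for _, _, aha, _ in rows if aha),
        "both": sum(1 for _, pq, aha, _ in rows if pq and aha),
        "neither": sum(1 for _, pq, aha, _ in rows if not pq and not aha),
    }


summary = tally(rows)
for key, count in summary.items():
    print(f"{key}: {count}")
```

With so few rows the counts are only suggestive, but the same tally would work unchanged on a longer list of faculty.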
In sum, I think the ProQuest data is sketchy before the 1950s, but at least as reliable as the AHA data after the 1950s, taking into account a much broader definition of what counts as a history dissertation.