This is probably useful only to me, but I’ve made a small utility to help get the Chronicling America OCR files. The batch files from the Chronicling America bulk data downloads are .tar.bz2 files with both plain text and XML versions of the OCR text of the newspaper pages. The files are slow to decompress, and extracting them dumps tens of thousands of files onto your disk, at least half of which you don’t need. So the utility processes the batches without extracting them and creates a CSV file with the text and the IDs used elsewhere in Chronicling America. You can get the utility at GitHub.
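
To give a sense of the approach, here is a minimal Python sketch of reading a batch as a stream and writing a CSV. It is not the utility's actual code. The function name `batch_to_csv`, the column names, and the assumption that each plain-text page is stored as `ocr.txt` inside a directory path that doubles as the page ID are all mine for illustration.

```python
import csv
import tarfile


def batch_to_csv(batch_path, csv_path):
    """Stream a .tar.bz2 batch and write one CSV row per plain-text OCR page.

    Hypothetical helper: assumes each page's text is a member named
    <page id path>/ocr.txt, and uses the directory part as the ID.
    Returns the number of pages written.
    """
    count = 0
    # "r|bz2" reads the archive sequentially as a stream, so nothing is
    # extracted to disk and the XML members are simply skipped.
    with tarfile.open(batch_path, mode="r|bz2") as archive, \
            open(csv_path, "w", newline="", encoding="utf-8") as out:
        writer = csv.writer(out)
        writer.writerow(["page_id", "text"])
        for member in archive:
            if not member.isfile() or not member.name.endswith("ocr.txt"):
                continue
            # In stream mode the member must be read before moving on.
            handle = archive.extractfile(member)
            text = handle.read().decode("utf-8", errors="replace")
            # Assumed layout: drop the trailing file name to get the ID.
            page_id = member.name.rsplit("/", 1)[0]
            writer.writerow([page_id, text])
            count += 1
    return count
```

Calling `batch_to_csv("batch.tar.bz2", "batch.csv")` with a downloaded batch would produce a single CSV in place of the tens of thousands of extracted files.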