Turn a batch of OCR files from Chronicling America into a CSV that can be imported into a database

lmullen/chronam-ocr-debatcher

Chronicling America OCR debatcher

This program takes paths to one or more .tar.bz2 batches of OCR files from the Chronicling America bulk data downloads. It converts each batch into a CSV file, which you can load into a database or process however you like. Multiple batches are processed concurrently.

Usage:

./chronam-ocr-debatcher [--processes=8] <path/to/a/batch.tar.bz2 ...>
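
To illustrate the general idea behind the command above, here is a minimal Python sketch of the same kind of conversion. It is not this program's implementation. It assumes each batch holds plain-text OCR files whose names end in `.txt`, and the output columns `path` and `text`, the function names `debatch_one` and `debatch_all`, and the choice to write the CSV next to the batch are all hypothetical; the real CSV layout may differ.

```python
import csv
import tarfile
from concurrent.futures import ProcessPoolExecutor


def debatch_one(batch_path):
    """Convert one .tar.bz2 batch into a CSV next to it; return the CSV path."""
    csv_path = batch_path.removesuffix(".tar.bz2") + ".csv"
    with tarfile.open(batch_path, mode="r:bz2") as archive, \
            open(csv_path, "w", newline="", encoding="utf-8") as out:
        writer = csv.writer(out)
        writer.writerow(["path", "text"])  # hypothetical column names
        for member in archive:
            # Assumption: only plain-text OCR files are wanted.
            if not (member.isfile() and member.name.endswith(".txt")):
                continue
            data = archive.extractfile(member).read()
            writer.writerow([member.name, data.decode("utf-8", errors="replace")])
    return csv_path


def debatch_all(batch_paths, processes=8):
    """Convert several batches at once, one worker process per batch."""
    with ProcessPoolExecutor(max_workers=processes) as pool:
        return list(pool.map(debatch_one, batch_paths))
```

Each batch is independent of the others, so handing one batch to each worker needs no coordination between workers. The `processes` parameter here only mirrors the spirit of the `--processes` option.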

You can download binaries from the releases page.
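
Once you have a CSV, one way to get it into a database is a short loader script. The sketch below uses Python's standard `sqlite3` module. It assumes the CSV begins with a header row and builds the table from whatever column names it finds there; the function name `load_csv` and the default table name `pages` are hypothetical.

```python
import csv
import sqlite3

# OCR text for a full newspaper page can exceed the csv module's
# default field size limit, so raise it.
csv.field_size_limit(2**31 - 1)


def load_csv(csv_path, db_path, table="pages"):
    """Load a CSV with a header row into an SQLite table; return rows inserted."""
    with open(csv_path, newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        header = next(reader)  # assumption: first row holds column names
        columns = ", ".join(f'"{name}" TEXT' for name in header)
        placeholders = ", ".join("?" for _ in header)
        conn = sqlite3.connect(db_path)
        try:
            with conn:  # commits on success, rolls back on error
                conn.execute(f'CREATE TABLE IF NOT EXISTS "{table}" ({columns})')
                cur = conn.executemany(
                    f'INSERT INTO "{table}" VALUES ({placeholders})', reader
                )
                return cur.rowcount
        finally:
            conn.close()
```

Every column is stored as `TEXT`, which keeps the sketch independent of the actual CSV layout. For a production database you would likely declare proper types and indexes once you know the columns.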
