This Rust program processes a large CSV file of disaster records, grouping them by disaster number and creating separate CSV files for each disaster.
- On Windows, download the Rust installer from rustup.rs
- Run the downloaded rustup-init.exe
- Follow the on-screen instructions
- Open a new command prompt to ensure the PATH is updated
- On macOS or Linux, open Terminal
- Run the following command:
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
- Follow the on-screen instructions
- Either restart your terminal or run:
source $HOME/.cargo/env
- Clone this repository:
git clone https://github.com/erg/fema-csv
cd fema-csv
- Place your input CSV file at the path the program expects:
/Users/erg/factor/IndividualsAndHouseholdsProgramValidRegistrations.csv
Alternatively, edit the path in src/main.rs to point to your CSV file.
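If editing the source for every new location is inconvenient, one option is to read the path from the command line and fall back to the default. The following is only a sketch under that assumption: `resolve_input_path` and `DEFAULT_INPUT` are hypothetical names, and the actual src/main.rs may define its path differently.

```rust
use std::path::PathBuf;

/// Default location taken from this README; the real definition in
/// src/main.rs may be named and written differently (assumption).
const DEFAULT_INPUT: &str =
    "/Users/erg/factor/IndividualsAndHouseholdsProgramValidRegistrations.csv";

/// Hypothetical helper: use the first command-line argument as the input
/// path when one is given, otherwise fall back to the default.
fn resolve_input_path(args: &[String]) -> PathBuf {
    match args.get(1) {
        Some(path) => PathBuf::from(path),
        None => PathBuf::from(DEFAULT_INPUT),
    }
}
```

From `main`, this could be called with `std::env::args().collect::<Vec<String>>()`, so that `cargo run --release -- path/to/file.csv` would override the default.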
- Build and run the project:
cargo run --release
The program will:
- Create a csvs directory in the project folder
- Process the input CSV file using parallel processing
- Create separate CSV files for each disaster number in the csvs directory
- Show progress every million records processed
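The steps above can be sketched roughly as follows. This is not the code in src/main.rs: to stay self-contained it uses only the standard library, splits fields on plain commas (no quoted-field support, unlike the csv crate), and runs on a single thread rather than in parallel. The function name `split_by_column` and its parameters are hypothetical.

```rust
use std::collections::HashMap;
use std::fs::{self, File};
use std::io::{self, BufRead, BufWriter, Write};
use std::path::Path;

/// Split `input` into one file per distinct value of `key_column`,
/// writing `<out_dir>/<value>.csv`. Returns the number of records read.
/// Assumption: fields never contain quoted commas or embedded newlines.
fn split_by_column<R: BufRead>(
    input: R,
    key_column: &str,
    out_dir: &Path,
    progress_every: usize,
) -> io::Result<usize> {
    fs::create_dir_all(out_dir)?;

    let mut lines = input.lines();
    let header = match lines.next() {
        Some(line) => line?,
        None => return Ok(0), // empty input: nothing to split
    };
    // Locate the grouping column by name in the header row.
    let key_index = header
        .split(',')
        .position(|name| name.trim() == key_column)
        .ok_or_else(|| io::Error::new(io::ErrorKind::InvalidData, "key column not found"))?;

    // One buffered writer per key, created the first time the key is seen.
    let mut writers: HashMap<String, BufWriter<File>> = HashMap::new();
    let mut count = 0;

    for line in lines {
        let line = line?;
        if line.is_empty() {
            continue;
        }
        let key = line.split(',').nth(key_index).unwrap_or("").trim().to_string();
        if !writers.contains_key(&key) {
            let file = File::create(out_dir.join(format!("{}.csv", key)))?;
            let mut writer = BufWriter::new(file);
            writeln!(writer, "{}", header)?; // every output file starts with the header
            writers.insert(key.clone(), writer);
        }
        let writer = writers.get_mut(&key).expect("writer inserted above");
        writeln!(writer, "{}", line)?;

        count += 1;
        if progress_every > 0 && count % progress_every == 0 {
            println!("Processed {} records", count);
        }
    }
    for writer in writers.values_mut() {
        writer.flush()?;
    }
    Ok(count)
}
```

A call such as `split_by_column(reader, "disasterNumber", Path::new("csvs"), 1_000_000)` would mirror the behaviour described here, assuming the grouping column is named `disasterNumber`. Keeping one open file per key is simple, but it can run into the operating system's open-file limit when there are many distinct disaster numbers.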
The program will create files in the following format:
csvs/[disaster_number].csv
Each output file will contain:
- The original CSV headers
- All records corresponding to that disaster number
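As a quick sanity check on that format, the header of an output file can be compared against the header of the input. The helper names below (`first_line`, `has_same_header`) are hypothetical and not part of the project; the sketch reads only the first line of each file, so it stays cheap even for a very large input.

```rust
use std::fs::File;
use std::io::{self, BufRead, BufReader};
use std::path::Path;

/// Read the first line of a file, without its trailing line ending.
fn first_line(path: &Path) -> io::Result<String> {
    let mut line = String::new();
    BufReader::new(File::open(path)?).read_line(&mut line)?;
    Ok(line
        .trim_end_matches(|c: char| c == '\r' || c == '\n')
        .to_string())
}

/// Hypothetical check: does `output` start with the same header as `input`?
fn has_same_header(input: &Path, output: &Path) -> io::Result<bool> {
    Ok(first_line(input)? == first_line(output)?)
}
```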
- Rust 1.54 or later
- Sufficient disk space for the output files
- The input CSV file should be UTF-8 encoded
- csv = "1.3.1": for CSV file processing
These dependencies will be automatically downloaded when you run cargo build or cargo run.
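In Cargo.toml, a dependency like this is normally declared in the `[dependencies]` table. The fragment below shows only the crate named in this README; the repository's actual manifest may list others.

```toml
[dependencies]
csv = "1.3.1"
```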
The output is a series of CSV files in the csvs directory, one for each disaster number. Together they take up about as much disk space as the input file (8.8 GB).
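To confirm how much space the output actually takes on a given machine, the file sizes in the output directory can be summed. This is a small standard-library sketch; `dir_size` is a hypothetical helper, and it counts only regular files directly inside the directory, without descending into subdirectories.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Sum the sizes, in bytes, of the regular files directly inside `dir`.
fn dir_size(dir: &Path) -> io::Result<u64> {
    let mut total = 0;
    for entry in fs::read_dir(dir)? {
        let metadata = entry?.metadata()?;
        if metadata.is_file() {
            total += metadata.len();
        }
    }
    Ok(total)
}
```

Calling `dir_size(Path::new("csvs"))` after a run gives a byte count to compare with the size of the input file.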