It's like lodash for CSV files!
withCSV is a Node.js library to consume and produce CSV files with clean and readable code, without sacrificing performance. It features:
📜 A fluent API similar to lodash chainable methods, treating your CSV like the array of objects it really is
🏋️ Based on a robust parsing library using Node streams. It is stupid fast and memory-efficient by default, for you to go crazy on large files 🪐
🖋 Equipped with a streaming CSV stringifying library fit for writing large volumes of data to disk
⚙ Barely 300 lines of TypeScript
⏳ Support for asynchronous callbacks
withCSV can be installed using your package manager of choice:
npm install with-csv
# or
yarn add with-csv

Given the following CSV file:
id,name,phone,flag,category
1,Joe,0612345678,true,6
2,Jack,0698765421,false,12
3,Mark,0645631256,true,54
4,Valerie,0645631256,true,12
import { withCSV } from 'with-csv'
const result = await withCSV('my.csv')
.columns(['name', 'phone', 'flag'])
// row (below) is automatically typed as {name: string, phone: string, flag: string}
.filter(row => row.flag === 'true')
.map(row => `${row.name}: ${row.phone}`)
// value (below) has been typed as the output of .map , which is a string
.filter(value => value.startsWith('J'))
// At this point the CSV file hasn't yet been read
// It will be read by the terminator method `rows` (below)
.rows()
console.log(result)
// [
// "Joe: 0612345678",
// "Jack: 0698765421"
// ]
// You can also use withCSV to produce CSV files after treatment
await withCSV('my.csv')
.columns(['name', 'phone', 'flag'])
// row (below) is automatically typed as {name: string, phone: string, flag: string}
.filter(row => row.flag === 'true')
.toCSV('your.csv')

withCSV(csvFile, options): Returns an instance of withCSV configured with the provided CSV file and options. At this stage the CSV file is not opened yet.
- csvFile: The path to the CSV file
- options (optional): A csv-parse options object
The withCSV instance exposes the method columns, which takes an array of column names as input. This allows withCSV to infer the type of the rows.
Once you have selected your columns, you can start manipulating your CSV data with the Querying API. It consists of two categories of methods:
⛓️ chainable methods, which are stacked into a pipeline through which every row is processed one by one
🚧 terminator methods, which trigger the reading of the file and the processing of each row through the pipeline
Only one terminator method can be present. It will return a promise which resolves to the output of your pipeline.
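To make the chainable/terminator distinction concrete, here is a minimal, hypothetical sketch of the pattern — an illustration only, not the library's actual implementation. Chainable calls merely queue steps; nothing is read until a terminator runs each row through the queued pipeline.

```typescript
// Illustration only: a minimal lazy pipeline, NOT the actual withCSV
// implementation. Chainable calls queue steps; the terminator runs them.
const SKIP = Symbol('skip');

class Pipeline<T> {
  private steps: Array<(value: any) => any> = [];

  constructor(private source: () => Iterable<any>) {}

  map<U>(fn: (value: T) => U): Pipeline<U> {
    this.steps.push(fn);
    return this as unknown as Pipeline<U>;
  }

  filter(fn: (value: T) => boolean): Pipeline<T> {
    // A rejected row is marked and skipped by the remaining steps.
    this.steps.push((v) => (fn(v) ? v : SKIP));
    return this;
  }

  // Terminator: only now is the source consumed, one row at a time.
  // (withCSV's real terminators also await async callbacks and return
  // a promise; this sketch stays synchronous for brevity.)
  rows(): T[] {
    const out: T[] = [];
    for (const row of this.source()) {
      let value: any = row;
      for (const step of this.steps) {
        value = step(value);
        if (value === SKIP) break;
      }
      if (value !== SKIP) out.push(value);
    }
    return out;
  }
}

// Mirrors the earlier example: filter on flag, then map to a string.
const result = new Pipeline<{ name: string; phone: string; flag: string }>(
  () => [
    { name: 'Joe', phone: '0612345678', flag: 'true' },
    { name: 'Jack', phone: '0698765421', flag: 'false' },
  ],
)
  .filter((row) => row.flag === 'true')
  .map((row) => `${row.name}: ${row.phone}`)
  .rows();
// result: ['Joe: 0612345678']
```

Because each row flows through the whole pipeline before the next row is read, memory usage stays flat no matter how large the file is.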
Unless otherwise specified, the method signatures are the same as those of the corresponding methods on the JavaScript Array prototype.
The only major differences are :
- All methods in the querying API accept asynchronous callbacks
- Array methods such as filter, map, etc. will not receive the whole array as their last argument. This is by design, as the CSV file is never held fully in memory.
map(callback): maps each record to a new shape. The output will be typed accordingly.
pick(keys): picks a subset of properties from the records. keys is described in the lodash.pick documentation.
filter(callback): filters out records.
forEach(callback): executes the callback on each record without altering the data of the rows.
uniq(iterator): deduplicates records from your CSV file. It accepts as argument:
- A column name to deduplicate on that column
- An array of column names to deduplicate on the combination of those columns
- A callback returning a string to deduplicate on the value of that string
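All three argument shapes can be reduced to a single key-derivation function. Here is a hypothetical sketch of that normalization (the library's internals may differ):

```typescript
// Hypothetical sketch of how uniq's three argument shapes can be
// normalized into one key function. Not the library's actual code.
type Row = Record<string, string>;
type UniqArg = string | string[] | ((row: Row) => string);

function toKeyFn(arg: UniqArg): (row: Row) => string {
  if (typeof arg === 'function') return arg;               // callback form
  if (Array.isArray(arg)) {
    // Combination of several columns, joined on a separator
    // that should not occur in the data.
    return (row) => arg.map((col) => row[col]).join('\u0000');
  }
  return (row) => row[arg];                                // single column
}

function uniq(rows: Row[], arg: UniqArg): Row[] {
  const keyOf = toKeyFn(arg);
  const seen = new Set<string>();
  return rows.filter((row) => {
    const key = keyOf(row);
    if (seen.has(key)) return false;
    seen.add(key);
    return true;
  });
}

// Rows 3 and 4 of the sample file share the same phone number:
const sample: Row[] = [
  { name: 'Mark', phone: '0645631256', category: '54' },
  { name: 'Valerie', phone: '0645631256', category: '12' },
];
const byPhone = uniq(sample, 'phone');                   // only Mark kept
const byPhoneAndCat = uniq(sample, ['phone', 'category']); // both kept
```

Note that a Set of seen keys grows with the number of distinct keys, so uniq on a high-cardinality column uses memory proportional to that cardinality.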
The following methods only ever consume one row at a time, so they are safe to use on very large files.
process: executes the pipeline on all the rows without outputting any data. This is useful, for example, when your pipeline is based on forEach and you want to discard the final output data.
find(callback): returns the first matching record
findIndex(callback): returns the index of the first matching record
every(callback): returns true if all records match
some(callback): returns true if at least one record matches
includes(value): returns true if the final output contains the value passed. This uses lodash.isEqual so the value can be a primitive, object, array, etc...
count(): returns the number of rows at the end of the pipeline
first(limit): returns the first elements of the result up to a maximum of limit
toCSV(csvTarget, options): writes the result to a file or a stream
- if csvTarget is a string, a file will be created at this path
- if csvTarget is a WriteStream, the data will be piped directly to it
- options are documented in the csv-stringify documentation page
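Because includes uses lodash.isEqual, the comparison is deep rather than referential. Node's built-in util.isDeepStrictEqual behaves comparably and can illustrate the difference from the native Array.prototype.includes:

```typescript
import { isDeepStrictEqual } from 'node:util';

const rows = [
  { name: 'Joe', phone: '0612345678' },
  { name: 'Jack', phone: '0698765421' },
];

// Native Array.prototype.includes uses reference equality,
// so a freshly created object with the same content does NOT match:
const naive = rows.includes({ name: 'Joe', phone: '0612345678' }); // false

// A deep-equality includes, as described above, matches on content:
const deep = rows.some((row) =>
  isDeepStrictEqual(row, { name: 'Joe', phone: '0612345678' }),
); // true
```

This is why includes works with objects and arrays read from the CSV, not just primitive values.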
The following methods consume the entirety of your CSV file and the resulting output will be stored in memory. Very large files should be adequately filtered beforehand or you may max out your machine's memory.
last(limit): returns the last elements of the result up to a maximum of limit
skip(offset): returns the whole result but omits the offset first items
key(property, filterUndefined): returns an array of the values of that property for each row. If filterUndefined is true, then only defined values will be returned.
toJSON(replacer, spaces): returns the final result of the query pipeline, as a JSON string. replacer and spaces are documented in the JSON.stringify signature.
rows(): returns all the rows of the result, as an array of objects
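The replacer and spaces arguments of toJSON are forwarded to JSON.stringify, so they behave exactly as in the standard API. A quick refresher on those two parameters:

```typescript
const rows = [{ name: 'Joe', phone: '0612345678', flag: 'true' }];

// `spaces` controls indentation (pretty-printing):
const pretty = JSON.stringify(rows, null, 2);

// An array `replacer` whitelists object keys in the output:
const onlyNames = JSON.stringify(rows, ['name']);

console.log(onlyNames); // [{"name":"Joe"}]
```

A function replacer is also accepted and is called for every key/value pair, as documented for JSON.stringify.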
Feel free to write to us about any bug you may find, or submit a PR if you have fresh ideas for the library!
This project does its testing with a ghetto Jest clone written in under 60 lines, which also includes basic benchmarking. You can launch the test suite and benchmarks with npm test or yarn test (it should take around one minute).
Here are the benchmark results on a decently powerful machine. The small sample contains 100 rows, the medium 100,000 and the large 2,000,000.
Benchmark Import and export : small: 12.515ms
Benchmark Import and export : medium: 390.564ms
Benchmark Import and export : large: 8.820s
1 => Import and export
Benchmark 1 map : small: 3.122s
Benchmark 1 map : medium: 576.674ms
Benchmark 1 map : large: 9.265s
2 => 1 map
Benchmark 4 chained map : small: 2.094ms
Benchmark 4 chained map : medium: 509.942ms
Benchmark 4 chained map : large: 11.925s
3 => 4 chained map
Benchmark uniq : small: 2.786ms
Benchmark uniq : medium: 667.513ms
Benchmark uniq : large: 15.352s
4 => uniq
Made with 💖 @ Ambler HQ