Quilt is a data package manager, inspired by the likes of pip and npm. Just as software package managers provide versioned, reusable building blocks for execution, Quilt provides versioned, reusable building blocks for analysis.
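To make the analogy concrete, here is a minimal sketch of the basic workflow, assuming the Quilt 2.x Python API; the package name `uciml/iris` and the node path `iris.tables.iris` are illustrative:

```python
import quilt

# Download and cache a data package once, much like installing a library.
quilt.install("uciml/iris")

# Import the packaged data like a Python module; the node call below is
# assumed to return a pandas DataFrame, ready for analysis.
from quilt.data.uciml import iris

df = iris.tables.iris()
print(df.head())
```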
- **Reproducibility** - Imagine source code without versions. Ouch. Why live with un-versioned data? Versioned data makes analysis reproducible by creating unambiguous references to potentially complex data dependencies (see the pinning sketch after this list).
- **Less data cleaning** - Finding, cleaning, and organizing data consumes 79% of the average data scientist's time. If data is cleaned once and packaged for posterity, it frees up time for analysis. Quilt further makes it possible to import data and start working immediately, skipping the usual scripts for downloading, cleaning, and parsing data (as in the workflow sketch above).
- **De-duplication** - Data fragments are hashed with SHA-256. Duplicate data fragments are written to disk only once per user, so large, repeated data fragments consume less disk space and network bandwidth (see the content-addressing sketch after this list).
- **Faster analysis** - Serialized data loads 5 to 20 times faster than parsing the underlying raw files. Moreover, specialized columnar storage formats like Apache Parquet minimize I/O bottlenecks so that tools like Presto DB and Hive run faster (see the Parquet comparison after this list).
- **Collaboration and transparency** - Data likes to be shared. Quilt offers a centralized data warehouse for finding and sharing data sets.
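A hedged sketch of what an unambiguous data dependency looks like in practice, assuming `quilt.install()` accepts a `hash=` argument to pin a specific package snapshot; the keyword name, package name, and hash value are placeholders:

```python
import quilt

# Placeholder package name and (truncated) SHA-256 package hash.
PKG = "uciml/iris"
PKG_HASH = "fb47ba..."  # pin the exact snapshot the analysis was run against

# Assumed keyword argument: installing by hash means every collaborator
# re-running the analysis gets byte-identical input data.
quilt.install(PKG, hash=PKG_HASH)
```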
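The de-duplication idea can be illustrated independently of Quilt's internals: a content-addressed store names each fragment by its SHA-256 digest, so an identical fragment is only ever written once. A minimal sketch (not Quilt's actual storage code):

```python
import hashlib
from pathlib import Path

STORE = Path("object_store")
STORE.mkdir(exist_ok=True)

def put_fragment(data: bytes) -> str:
    """Write a fragment under its SHA-256 digest; reuse it if it already exists."""
    digest = hashlib.sha256(data).hexdigest()
    path = STORE / digest
    if not path.exists():          # duplicate fragments cost no extra disk
        path.write_bytes(data)
    return digest                  # the digest doubles as an unambiguous reference

a = put_fragment(b"2015,1.7\n2016,2.1\n")
b = put_fragment(b"2015,1.7\n2016,2.1\n")  # same bytes -> same object, written once
assert a == b
```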
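The speed claim comes from skipping parsing: a typed, columnar format like Parquet is read straight into memory instead of being re-parsed from text on every load. A small pandas comparison (requires pyarrow or fastparquet; the 5-20x figure will vary with data and hardware):

```python
import time
import numpy as np
import pandas as pd

# Write the same million-row frame as CSV and as Parquet.
df = pd.DataFrame({"x": np.random.rand(1_000_000), "y": np.random.rand(1_000_000)})
df.to_csv("data.csv", index=False)
df.to_parquet("data.parquet")

t0 = time.perf_counter()
pd.read_csv("data.csv")            # re-parses text on every load
t_csv = time.perf_counter() - t0

t0 = time.perf_counter()
pd.read_parquet("data.parquet")    # reads typed columns directly
t_parquet = time.perf_counter() - t0

print(f"CSV read: {t_csv:.2f}s, Parquet read: {t_parquet:.2f}s")
```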