bruin-data/ingestr


Copy data from any source to any destination without any code


ingestr is a command-line app that allows you to ingest data from any source into any destination using simple command-line flags, no code necessary.

  • ✨ copy data from your database into any destination
  • ➕ incremental loading: append, merge or delete+insert
  • 🐍 single-command installation
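
The three incremental strategies named in the list above can be illustrated on plain Python lists. This is only a sketch of how such strategies are commonly understood, not ingestr's implementation; the function names, the `id` primary key, and the `day` incremental key are assumptions made for illustration.

```python
def append(dest, incoming):
    """Append: keep every existing row and add all incoming rows."""
    return dest + incoming


def merge(dest, incoming, primary_key):
    """Merge (upsert): incoming rows replace existing rows with the same key."""
    by_key = {row[primary_key]: row for row in dest}
    for row in incoming:
        by_key[row[primary_key]] = row
    return list(by_key.values())


def delete_insert(dest, incoming, incremental_key):
    """Delete+insert: drop existing rows whose incremental key value appears
    in the incoming data, then insert all incoming rows."""
    incoming_values = {row[incremental_key] for row in incoming}
    kept = [row for row in dest if row[incremental_key] not in incoming_values]
    return kept + incoming


# Hypothetical destination table and one incoming batch.
dest = [
    {"id": 1, "day": "2024-01-01", "total": 10},
    {"id": 2, "day": "2024-01-02", "total": 20},
]
incoming = [{"id": 2, "day": "2024-01-02", "total": 25}]

appended = append(dest, incoming)                # id 2 now appears twice
merged = merge(dest, incoming, "id")             # id 2 is updated in place
replaced = delete_insert(dest, incoming, "day")  # 2024-01-02 is reloaded
```

Append is the cheapest option but can duplicate rows, merge needs a stable primary key, and delete+insert suits data that is reloaded one time window at a time.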

ingestr takes away the complexity of managing a backend or writing code to ingest data. Simply run the command and watch the data land in its destination.

MongoDB to Postgres benchmark

Installation

You can install ingestr using the install script:

curl -LsSf https://getbruin.com/install/ingestr | sh

Alternatively, you can install it with pip:

pip install ingestr

The pip package can also be used from Python. Install the SDK extra for Python data ingestion:

pip install 'ingestr[sdk]'

Python rows, generators, and DataFrames are sent to the bundled ingestr binary as Arrow IPC streams by default:

import ingestr

ingestr.ingest(
    [{"id": 1, "name": "Ada"}, {"id": 2, "name": "Grace"}],
    dest_uri="duckdb:///tmp/warehouse.duckdb",
    dest_table="main.people",
)

DataFrames and yielded data use the same Arrow stream transport:

ingestr.ingest(df, dest_uri="duckdb:///tmp/warehouse.duckdb", dest_table="main.events")

def events():
    yield [{"id": 1, "event": "signup"}]
    yield [{"id": 2, "event": "purchase"}]

ingestr.ingest(events, dest_uri="postgresql://...", dest_table="public.events")
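
When rows come from a long or lazy iterable, a generator like `events` can yield them in fixed-size batches instead of one hand-written list per `yield`. The sketch below uses only the standard library; `batches` is a hypothetical helper, the batch size of 2 is arbitrary, and it assumes that batches of any size are acceptable to the receiving side.

```python
from itertools import islice


def batches(rows, size):
    """Yield lists of at most `size` rows from any iterable of rows."""
    iterator = iter(rows)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


def events():
    # Stand-in for a real source: a lazily produced stream of row dicts.
    source = ({"id": i, "event": "signup"} for i in range(1, 6))
    yield from batches(source, size=2)


batch_sizes = [len(batch) for batch in events()]
```

Because each batch is built only when requested, the full dataset never has to sit in memory at once.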

For push-style code, omit the data argument and use ingest as a context manager. The context value accepts the same shapes as ingestr.ingest(data, ...):

with ingestr.ingest(dest_uri="postgresql://...", dest_table="public.events") as ingest:
    for response in client.list_events():
        ingest(response["items"])
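
The `client` in the snippet above is not defined in this README. As a rough sketch, the loop can be exercised without a real API or destination by using a stand-in paginated client and any callable that accepts a list of rows. `FakeClient` and `push_all` are hypothetical names introduced here for illustration only.

```python
class FakeClient:
    """Hypothetical paginated API client; each response carries an "items" list."""

    def __init__(self, rows, page_size):
        self._rows = rows
        self._page_size = page_size

    def list_events(self):
        # Yield one response dict per page of rows.
        for start in range(0, len(self._rows), self._page_size):
            yield {"items": self._rows[start:start + self._page_size]}


def push_all(client, sink):
    """Push every non-empty page into `sink`, a callable taking a list of rows."""
    pushed = 0
    for response in client.list_events():
        items = response["items"]
        if items:  # skip empty pages
            sink(items)
            pushed += len(items)
    return pushed


client = FakeClient([{"id": i} for i in range(5)], page_size=2)
received = []
total = push_all(client, received.extend)  # a plain list stands in for the sink
```

Inside the `with` block shown above, the `ingest` context value would presumably take the place of `received.extend` as the sink.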

For very large, already-materialized data, use the mmap Arrow IPC file transport instead:

ingestr.ingest(df, dest_uri="duckdb:///tmp/warehouse.duckdb", dest_table="main.events", transport="mmap")

For full CLI pass-through, use ingestr.run(["ingest", "--source-uri", "...", "--dest-uri", "...", "--source-table", "..."]), or ingestr.run_cli(...) for keyword arguments that map to CLI flags.
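
To illustrate how keyword arguments could map onto CLI flags, here is a minimal sketch that builds the argument list by hand. `build_ingest_args` is a hypothetical helper, not part of ingestr, and it is not a description of how `ingestr.run_cli` works internally. It uses only flags that appear in this README.

```python
def build_ingest_args(**options):
    """Turn keyword arguments into a CLI argument list for the ingest command.

    Underscores become dashes, so source_uri maps to --source-uri.
    Options set to None are skipped.
    """
    args = ["ingest"]
    for name, value in options.items():
        if value is None:
            continue
        args.extend(["--" + name.replace("_", "-"), str(value)])
    return args


args = build_ingest_args(
    source_uri="postgresql://admin:admin@localhost:8837/web?sslmode=disable",
    source_table="public.some_data",
    dest_uri="duckdb:///tmp/warehouse.duckdb",
    dest_table="main.some_data",
)
# The resulting list has the shape that ingestr.run([...]) is shown taking above.
```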

Quickstart

ingestr ingest \
    --source-uri 'postgresql://admin:admin@localhost:8837/web?sslmode=disable' \
    --source-table 'public.some_data' \
    --dest-uri 'bigquery://<your-project-name>?credentials_path=/path/to/service/account.json' \
    --dest-table 'ingestr.some_data'

That's it.

This command:

  • gets the table public.some_data from the Postgres instance.
  • uploads this data to your BigQuery warehouse under the schema ingestr and table some_data.
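
The source URI in the command above carries the credentials inline. Under standard URI rules, a password containing characters such as `@` or `/` needs percent-encoding before it is embedded. The sketch below assembles such a URI with the standard library. `postgres_uri` is a hypothetical helper, and it assumes ingestr parses URIs in the conventional way.

```python
from urllib.parse import quote, urlencode


def postgres_uri(user, password, host, port, database, **params):
    """Assemble a postgresql:// URI, percent-encoding the credentials."""
    credentials = f"{quote(user, safe='')}:{quote(password, safe='')}"
    uri = f"postgresql://{credentials}@{host}:{port}/{quote(database, safe='')}"
    if params:
        # Extra keyword arguments become query parameters such as sslmode.
        uri += "?" + urlencode(params)
    return uri


# Same URI as in the quickstart command.
plain = postgres_uri("admin", "admin", "localhost", 8837, "web", sslmode="disable")

# A password with reserved characters is escaped instead of breaking the URI.
tricky = postgres_uri("admin", "p@ss/word", "localhost", 8837, "web")
```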

Documentation

You can see the full documentation here.

Community

Join our Slack community here.

Contributing

Pull requests are welcome. However, please open an issue first to discuss what you would like to change. We may be able to offer you help and feedback regarding any changes you would like to make.

Note

After cloning ingestr, make sure to run make setup to install the Git hooks.

Supported sources & destinations

Name Source Destination
Databases
AWS Athena ✅ ✅
AWS Redshift ✅ ✅
Cassandra ✅ ✅
ClickHouse ✅ ✅
Couchbase ✅ -
CrateDB ✅ ✅
Databricks ✅ ✅
DuckDB ✅ ✅
DynamoDB ✅ ✅
Elasticsearch ✅ ✅
Google BigQuery ✅ ✅
GCP Spanner ✅ -
IBM Db2 ✅ -
InfluxDB ✅ -
Kafka ✅ -
Local CSV file ✅ ✅
MaxCompute ✅ ✅
Microsoft Fabric ✅ ✅
Microsoft OneLake - ✅
Microsoft SQL Server ✅ ✅
MongoDB ✅ ✅
MotherDuck ✅ ✅
MySQL ✅ ✅
Oracle ✅ -
Postgres ✅ ✅
RabbitMQ ✅ -
SAP Hana ✅ -
Snowflake ✅ ✅
Socrata ✅ -
SQLite ✅ ✅
Synapse - ✅
Trino ✅ ✅
Platforms
Adjust ✅ -
Airtable ✅ -
Allium ✅ -
Amazon Kinesis ✅ -
Anthropic ✅ -
AppsFlyer ✅ -
Apple Ads ✅ -
Apple App Store ✅ -
Applovin ✅ -
Applovin Max ✅ -
Asana ✅ -
Attio ✅ -
Azure Data Lake Storage Gen2 ✅ ✅
Bruin ✅ -
Chess.com ✅ -
ClickUp ✅ -
Cursor ✅ -
Docebo ✅ -
Dune ✅ -
Facebook Ads ✅ -
Fireflies ✅ -
Fluxx ✅ -
Frankfurter ✅ -
Freshdesk ✅ -
FundraiseUp ✅ -
G2 ✅ -
GitHub ✅ -
Google Ads ✅ -
Google Analytics ✅ -
Google Cloud Storage (GCS) ✅ ✅
Google Sheets ✅ -
Gorgias ✅ -
Granola ✅ -
Hostaway ✅ -
HubSpot ✅ -
Indeed ✅ -
Intercom ✅ -
Internet Society Pulse ✅ -
Jira ✅ -
JobTread ✅ -
Klaviyo ✅ -
Linear ✅ -
LinkedIn Ads ✅ -
Mailchimp ✅ -
Mixpanel ✅ -
Monday ✅ -
Notion ✅ -
Paddle ✅ -
Personio ✅ -
PhantomBuster ✅ -
Pinterest ✅ -
Pipedrive ✅ -
Plus Vibe AI ✅ -
PostHog ✅ -
Primer ✅ -
QuickBooks ✅ -
Reddit Ads ✅ -
RevenueCat ✅ -
S3 ✅ ✅
Salesforce ✅ -
SFTP ✅ -
Shopify ✅ -
Slack ✅ -
Smartsheet ✅ -
Snapchat Ads ✅ -
Solidgate ✅ -
Stripe ✅ -
SurveyMonkey ✅ -
TikTok Ads ✅ -
Trustpilot ✅ -
Wise ✅ -
Zendesk ✅ -
Zoom ✅ -

Feel free to create an issue if you'd like to see support for another source or destination.

License

ingestr is source-available under the Functional Source License 1.1, with Apache 2.0 as the future license. You can use ingestr freely for internal production use, development, testing, education, research, and professional services. You cannot use ingestr to offer a competing commercial ingestion, ELT, connector, or managed data pipeline product/service.

Each version becomes Apache 2.0 two years after release.
