
DBT macros and tests for advanced statistical analysis, ML refactors from sk-learn and other libraries, and scalable big data workflows


albertovpd/dbt_spice


DBT SPICE


Introduction

This repository aims to provide both tools for heavy data crunching (in-depth statistical analyses, Machine Learning methods refactored into DBT Jinja SQL, etc.) and Big Data best practices with DBT (cleaners that can be triggered to keep your datasets unpolluted, metadata crawlers for BigQuery, etc.).

Currently focused on GCP work with BigQuery. Support for the mission and PRs are welcome.

Maybe one day this can be turned into an installable DBT package.

Development

Methodology

Data processing macros will be developed using dummy CSVs as DBT seeds. They will then be run against massive columns, and the number of rows processed and the computing times will be added to the documentation.

Currently working with Python 3.11.9. The DBT/SQL libraries are listed in requirements.txt.

  • Macros for data processing will be tested using CSVs as seeds to create the input and expected output
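
One way the seed-based testing described above could look is a DBT singular test that compares the macro's output with the expected seed. This is only a sketch under assumptions: the relation names `my_macro_output` and `my_macro_expected` and the columns `id` and `result` are hypothetical placeholders, not names taken from this repository.

```sql
-- tests/assert_my_macro_matches_expected.sql (hypothetical file name)
-- A DBT singular test passes when this query returns zero rows.

with actual as (
    -- Hypothetical model that applies the macro to the input seed
    select id, result
    from {{ ref('my_macro_output') }}
),

expected as (
    -- Hypothetical seed CSV holding the expected output
    select id, result
    from {{ ref('my_macro_expected') }}
)

select
    coalesce(a.id, e.id) as id,
    a.result as actual_result,
    e.result as expected_result
from actual as a
full outer join expected as e
    on a.id = e.id
where a.id is null                                   -- row missing from the macro output
   or e.id is null                                   -- row not present in the expected seed
   or (a.result is null) != (e.result is null)       -- NULL on one side only
   or abs(a.result - e.result) > 1e-9                -- numeric mismatch beyond a small tolerance
```

With the seeds loaded through `dbt seed`, running `dbt test` would report any rows where the macro output and the expected CSV disagree. The tolerance of `1e-9` is an arbitrary choice for floating-point results and would need adjusting per macro.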

Features

Utils

BigQuery autocleaner

Description:

Helps keep the BigQuery environment clean and organized. Automatically removes redundant objects in BigQuery (tables that are no longer needed, old versions of tables that were left behind after a rename, etc.).

Path:

macros/utils/bq_cleaner.sql
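
To give a feel for the idea, the sketch below shows one possible way to find and drop tables that the DBT project no longer manages. It is not the content of `macros/utils/bq_cleaner.sql`: the macro name `drop_orphan_relations` and its arguments `dataset` and `dry_run` are hypothetical, and it assumes that every model, seed and snapshot in the target dataset belongs to the current project.

```sql
{% macro drop_orphan_relations(dataset, dry_run=true) %}
  {# Hypothetical sketch, not the repository's bq_cleaner macro #}
  {% if execute %}

    {# Collect the relation names the current DBT project manages in this dataset #}
    {% set managed = [] %}
    {% for node in graph.nodes.values() %}
      {% if node.resource_type in ['model', 'seed', 'snapshot'] and node.schema == dataset %}
        {% do managed.append(node.alias | lower) %}
      {% endif %}
    {% endfor %}

    {# List the tables and views that actually exist in BigQuery #}
    {% set query %}
      select table_name, table_type
      from `{{ target.project }}.{{ dataset }}.INFORMATION_SCHEMA.TABLES`
      where table_type in ('BASE TABLE', 'VIEW')
    {% endset %}
    {% set results = run_query(query) %}

    {% for row in results.rows %}
      {% set name = row['table_name'] | lower %}
      {% if name not in managed %}
        {% set kind = 'view' if row['table_type'] == 'VIEW' else 'table' %}
        {% set ddl = 'drop ' ~ kind ~ ' if exists `' ~ target.project ~ '.' ~ dataset ~ '.' ~ row['table_name'] ~ '`' %}
        {% if dry_run %}
          {# Default: only report what would be removed #}
          {{ log('Would run: ' ~ ddl, info=true) }}
        {% else %}
          {% do run_query(ddl) %}
          {{ log('Ran: ' ~ ddl, info=true) }}
        {% endif %}
      {% endif %}
    {% endfor %}

  {% endif %}
{% endmacro %}
```

A macro like this could be invoked with `dbt run-operation drop_orphan_relations --args '{dataset: my_dataset}'`, where `my_dataset` is a placeholder. Defaulting to `dry_run=true` is a deliberate choice in the sketch, so that nothing is dropped until the logged statements have been reviewed.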

Numerical processing

...

String processing

...



Work in progress and backlog

⚒️ In progress

String Occurrence Count

📋 TODO
If there is specific functionality that you would like to cover with DBT, contact me.
Support and PRs are also welcome.

TF-IDF

Max-min Scaler

Z-score Scaler



Contact
