Starred repositories
🤗 Transformers: the model-definition framework for state-of-the-art machine learning models in text, vision, audio, and multimodal models, for both inference and training.
RAGFlow is a leading open-source Retrieval-Augmented Generation (RAG) engine that fuses cutting-edge RAG with Agent capabilities to create a superior context layer for LLMs
为GPT/GLM等LLM大语言模型提供实用化交互接口,特别优化论文阅读/润色/写作体验,模块化设计,支持自定义快捷按钮&函数插件,支持Python和C++等项目剖析&自译解功能,PDF/LaTex论文翻译&总结功能,支持并行问询多种LLM模型,支持chatglm3等本地模型。接入通义千问, deepseekcoder, 讯飞星火, 文心一言, llama2, rwkv, claude2, m…
Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)
🌟 The Multi-Agent Framework: First AI Software Company, Towards Natural Language Programming
LlamaIndex is the leading framework for building LLM-powered agents over your data.
Langchain-Chatchat(原Langchain-ChatGLM)基于 Langchain 与 ChatGLM, Qwen 与 Llama 等语言模型的 RAG 与 Agent 应用 | Langchain-Chatchat (formerly langchain-ChatGLM), local knowledge based LLM (like ChatGLM, Qwen and…
Make Your Company Data Driven. Connect to any data source, easily visualize, dashboard and share your data.
Prefect is a workflow orchestration framework for building resilient data pipelines in Python.
🤗 PEFT: State-of-the-art Parameter-Efficient Fine-Tuning.
The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Both self-hosted and Cloud-hosted.
Luigi is a Python module that helps you build complex pipelines of batch jobs. It handles dependency resolution, workflow management, visualization etc. It also comes with Hadoop support built in.
An orchestration platform for the development, production, and observation of data assets.
dbt enables data analysts and engineers to transform their data using the same practices that software engineers use to build applications.
Always know what to expect from your data.
🧙 Build, run, and manage data pipelines for integrating and transforming data.
StackStorm (aka "IFTTT for Ops") is event-driven automation for auto-remediation, incident responses, troubleshooting, deployments, and more for DevOps and SREs. Includes rules engine, workflow, 16…
CKAN is an open-source DMS (data management system) for powering data hubs and data portals. CKAN makes it easy to publish, share and use data. It powers catalog.data.gov, open.canada.ca/data, data…
Amundsen is a metadata driven application for improving the productivity of data analysts, data scientists and engineers when interacting with data.
Compare tables within or across databases
A Unified Library for Parameter-Efficient and Modular Transfer Learning
Meltano: the declarative code-first data integration engine that powers your wildest data and ML-powered product ideas. Say goodbye to writing, maintaining, and scaling your own API integrations.
Open-source IoT Gateway - integrates devices connected to legacy and third-party systems with ThingsBoard IoT Platform using Modbus, CAN bus, BACnet, BLE, OPC-UA, MQTT, ODBC and REST protocols
MetricFlow allows you to define, build, and maintain metrics in code.
Data Pipeline Framework using the singer.io spec
Open-source metadata collector based on ODD Specification