Stars
FastAPI framework, high performance, easy to learn, fast to code, ready for production
Apache Paimon is a lake format that enables building a Realtime Lakehouse Architecture with Flink and Spark for both streaming and batch operations.
Apache Amoro(incubating) is a Lakehouse management system built on open data lake formats.
Vision infrastructure to turn complex documents into RAG/LLM-ready data
🔥 The Web Data API for AI - Turn entire websites into LLM-ready markdown or structured data
🐫 CAMEL: The first and the best multi-agent framework. Finding the Scaling Law of Agents. https://www.camel-ai.org
🦉 OWL: Optimized Workforce Learning for General Multi-Agent Assistance in Real-World Task Automation
An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and Chatbot Arena.
A Flexible Framework for Experiencing Heterogeneous LLM Inference/Fine-tune Optimizations
Fine-tuning & Reinforcement Learning for LLMs. 🦥 Train OpenAI gpt-oss, DeepSeek-R1, Qwen3, Gemma 3, TTS 2x faster with 70% less VRAM.
Netty project - an event-driven asynchronous network application framework
ONNX Runtime: cross-platform, high performance ML inferencing and training accelerator
Supports agile DataOps based on Flink, DataX, Flink-CDC, and Chunjun, with a Web UI
🚀🤖 Crawl4AI: Open-source LLM Friendly Web Crawler & Scraper. Don't be shy, join here: https://discord.gg/jP8KfhDhyN
Open-Source Web UI for Apache Kafka Management
World's most powerful open data catalog for building a high-performance, geo-distributed and federated metadata lake.
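This project integrates several excellent open-source products to build a full-featured data development platform. The platform provides data integration, data development, data query, data services, data quality management, workflow scheduling, and metadata management. #dinky #dolphinscheduler #datavines #flinkcdc #openmetadata #flink #datadevelopment #dataplatform #datadevelopmentplatform #bigdata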
Free and Open Source, Distributed, RESTful Search Engine
#1 PDF Application on GitHub that lets you edit PDFs on any device anywhere
A modular graph-based Retrieval-Augmented Generation (RAG) system
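DataX with an integrated visual web UI: select a data source to generate a data synchronization task in one click. Supports RDBMS, Hive, HBase, ClickHouse, MongoDB, and other data sources; batch creation of RDBMS synchronization tasks; integration with an open-source scheduling system; distributed execution; incremental data synchronization; real-time viewing of run logs; monitoring of executor resources; killing running processes; and encryption of data source information.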
The official gpt4free repository | a varied collection of powerful language models | o4, o3 and deepseek r1, gpt-4.1, gemini 2.5
A damn simple library for building production-ready RESTful web services.
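A visual no-code/code-free web crawler/spider. EasySpider (易采集): a visual tool for browser automation testing, data collection, and web crawling that lets you design and run crawler tasks graphically without writing code. Also known as ServiceWrapper, an intelligent service-wrapping system for web applications.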