-
cosmos-curate Public
Forked from nvidia-cosmos/cosmos-curate
Cosmos-Curate is a powerful video curation system that processes, analyzes, and organizes video content using advanced AI models and distributed computing.
Python · Apache License 2.0 · Updated Dec 14, 2025
-
czkawka Public
Forked from qarmin/czkawka
Multi-functional app to find duplicates, empty folders, similar images, etc.
Rust · Other · Updated Nov 17, 2025
-
cosmos-xenna Public
Forked from nvidia-cosmos/cosmos-xenna
Python library for building and running distributed data pipelines using Ray
Python · Apache License 2.0 · Updated Nov 17, 2025
-
whisper.cpp Public
Forked from ggml-org/whisper.cpp
Port of OpenAI's Whisper model in C/C++
C++ · MIT License · Updated Oct 28, 2025
-
data-prep-kit Public
Forked from data-prep-kit/data-prep-kit
Open source project for data preparation of LLM application builders
HTML · Apache License 2.0 · Updated Oct 23, 2025
-
Curator Public
Forked from NVIDIA-NeMo/Curator
Scalable data preprocessing and curation toolkit for LLMs
Python · Apache License 2.0 · Updated Oct 19, 2025
-
data-juicer Public
Forked from datajuicer/data-juicer
A one-stop data processing system to make data higher-quality, juicier, and more digestible for (multimodal) LLMs! 🍎 🍋 🌽 ➡️ ➡️ 🍸 🍹 🍷
Python · Apache License 2.0 · Updated Oct 10, 2025
-
Awesome-LLMs-for-Video-Understanding Public
Forked from yunlong10/Awesome-LLMs-for-Video-Understanding
🔥🔥🔥 [IEEE TCSVT] Latest Papers, Codes and Datasets on Vid-LLMs.
Updated Oct 7, 2025
-
ray Public
Forked from ray-project/ray
Ray is a unified framework for scaling AI and Python applications. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
Python · Apache License 2.0 · Updated Sep 5, 2025
-
Run Public
Forked from NVIDIA-NeMo/Run
A tool to configure, launch and manage your machine learning experiments.
Python · Apache License 2.0 · Updated Sep 3, 2025
-
Daft Public
Forked from Eventual-Inc/Daft
Distributed data engine for Python/SQL designed for the cloud, powered by Rust
Rust · Apache License 2.0 · Updated Sep 3, 2025
-
auron Public
Forked from apache/auron
The Auron accelerator for distributed computing frameworks (e.g., Spark) leverages native vectorized execution to accelerate query processing
Rust · Apache License 2.0 · Updated Sep 2, 2025
-
lancedb Public
Forked from lancedb/lancedb
Developer-friendly, embedded retrieval engine for multimodal AI. Search More; Manage Less.
Python · Apache License 2.0 · Updated Aug 28, 2025
-
marimo Public
Forked from marimo-team/marimo
Transform data, train models, and run SQL with marimo — feels like a next-gen reactive notebook, stored as Git-friendly Python. Deploy as scripts, pipelines, endpoints, and apps. All from an AI-nat…
Python · Apache License 2.0 · Updated Aug 27, 2025
-
lilac Public
Forked from databricks/lilac
Curate better data for LLMs
Python · Apache License 2.0 · Updated Aug 27, 2025
-
fiftyone Public
Forked from voxel51/fiftyone
Refine high-quality datasets and visual AI models
Python · Apache License 2.0 · Updated Aug 26, 2025
-
flyte Public
Forked from flyteorg/flyte
Scalable and flexible workflow orchestration platform that seamlessly unifies data, ML and analytics stacks.
Go · Apache License 2.0 · Updated Aug 19, 2025
-
bmf Public
Forked from BabitMF/bmf
Cross-platform, customizable multimedia/video processing framework. With strong GPU acceleration, heterogeneous design, multi-language support, easy to use, multi-framework compatible and high perf…
C++ · Apache License 2.0 · Updated Aug 18, 2025
-
lance Public
Forked from lance-format/lance
Modern columnar data format for ML and LLMs implemented in Rust. Convert from parquet in 2 lines of code for 100x faster random access, vector index, and data versioning. Compatible with Pandas, Du…
Rust · Apache License 2.0 · Updated Aug 12, 2025
-
juicefs Public
Forked from juicedata/juicefs
JuiceFS is a distributed POSIX file system built on top of Redis and S3.
Go · Apache License 2.0 · Updated Aug 7, 2025
-
llm-datasets Public
Forked from mlabonne/llm-datasets
Curated list of datasets and tools for post-training.
Updated Jul 27, 2025
-
Stirling-PDF Public
Forked from Stirling-Tools/Stirling-PDF
#1 Locally hosted web application that allows you to perform various operations on PDF files
Java · Other · Updated Jul 24, 2025
-
sail Public
Forked from lakehq/sail
LakeSail's computation framework with a mission to unify batch processing, stream processing, and compute-intensive AI workloads.
Rust · Apache License 2.0 · Updated Jul 12, 2025
-
verl Public
Forked from volcengine/verl
verl: Volcano Engine Reinforcement Learning for LLMs
Python · Apache License 2.0 · Updated Jul 12, 2025
-
lance-ray Public
Forked from lance-format/lance-ray
Lance Format and Ray
Python · Apache License 2.0 · Updated Jul 4, 2025
-
distilabel Public
Forked from argilla-io/distilabel
Distilabel is a framework for synthetic data and AI feedback for engineers who need fast, reliable and scalable pipelines based on verified research papers.
Python · Apache License 2.0 · Updated Jun 9, 2025