Stars
Kernel + OpenAGI template for building computer use agents
Fully Open Framework for Democratized Multimodal Training
Multimodal computer agent data collection program
Bring projects, wikis, and teams together with AI. AppFlowy is the AI collaborative workspace where you achieve more without losing control of your data. The leading open source Notion alternative.
vCard is a fully responsive personal portfolio website that works on all devices.
This repo contains the code for a 1D tokenizer and generator
AG2 (formerly AutoGen): The Open-Source AgentOS. Join us at: https://discord.gg/sNGSwQME3x
Janus-Series: Unified Multimodal Understanding and Generation Models
Create Open XML PowerPoint documents in Python
A PyTorch native platform for training generative AI models
VILA is a family of state-of-the-art vision language models (VLMs) for diverse multimodal AI tasks across the edge, data center, and cloud.
verl: Volcano Engine Reinforcement Learning for LLMs
This repository contains the code for "A Lip Sync Expert Is All You Need for Speech to Lip Generation In the Wild", published at ACM Multimedia 2020. For the HD commercial model, try Sync Labs.
Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)
The code for creating the iGSM datasets in papers "Physics of Language Models Part 2.1, Grade-School Math and the Hidden Reasoning Process" (arxiv 2407.20311) and "Physics of Language Models Part 2…
[ICLR 2025] SOTA discrete acoustic codec models with 40/75 tokens per second for audio language modeling
20+ high-performance LLMs with recipes to pretrain, finetune and deploy at scale.
Meta Lingua: a lean, efficient, and easy-to-hack codebase to research LLMs.
Transformer Explained Visually: Learn How LLM Transformer Models Work with Interactive Visualization
Instant voice cloning by MIT and MyShell. Audio foundation model.
Custom nodes and workflows for SDXL in ComfyUI