Lists (17)
3d boxes
3d detectors on 2d image
code_llms
llms for code
Courses
CV useful
datasets
small image-net
exersies
faceswap
highlight_detection
intresting stuff
learning
linux_hardware_utils
llm_assist_code
llm_stuff
useful stuff for LLM tasks
software_dev stuff
Useful
Video_related
video_to_audio
video -> audio retrieval
Stars
[ICLR2025 Spotlight] SVDQuant: Absorbing Outliers by Low-Rank Components for 4-Bit Diffusion Models
Scalable and memory-optimized training of diffusion models
Lumina-mGPT 2.0: Stand-Alone AutoRegressive Image Modeling
[ICCV 2025] Official implementations for paper: VACE: All-in-One Video Creation and Editing
[ICLR 2026] RF-DETR is a real-time object detection and segmentation model architecture developed by Roboflow, SOTA on COCO, designed for fine-tuning.
A playbook for systematically maximizing the performance of deep learning models.
A Conversational Speech Generation Model
Fair-code workflow automation platform with native AI capabilities. Combine visual building with custom code, self-host or cloud, 400+ integrations.
an open source, extensible AI agent that goes beyond code suggestions - install, execute, edit, and test with any LLM
Browser extension to add git graph to GitHub website.
Simple frontend for LLMs built in react-native.
LLM2CLIP significantly improves already state-of-the-art CLIP models.
Erasing Concepts from Diffusion Models
Fast Open-Source Search & Clustering engine × for Vectors & Arbitrary Objects × in C++, C, Python, JavaScript, Rust, Java, Objective-C, Swift, C#, GoLang, and Wolfram
Pocket-Sized Multimodal AI for content understanding and generation across multilingual texts, images, and video, up to 5x faster than OpenAI CLIP and LLaVA
[ICCV 2025 Highlight] OminiControl: Minimal and Universal Control for Diffusion Transformer
aider is AI pair programming in your terminal
Development repository for the Triton language and compiler
Moshi is a speech-text foundation model and full-duplex spoken dialogue framework. It uses Mimi, a state-of-the-art streaming neural audio codec.
Silero VAD: pre-trained enterprise-grade Voice Activity Detector
Build and share delightful machine learning apps, all in Python. Star to support our work!
R2-Tuning: Efficient Image-to-Video Transfer Learning for Video Temporal Grounding (ECCV 2024)
A personal knowledge management and sharing system for VSCode
ControlNet++: All-in-one ControlNet for image generations and editing!
Omnivore is a complete, open source read-it-later solution for people who like reading.
Official pytorch repository for CG-DETR "Correlation-guided Query-Dependency Calibration in Video Representation Learning for Temporal Grounding"