Stars
VILA is a family of state-of-the-art vision language models (VLMs) for diverse multimodal AI tasks across the edge, data center, and cloud.
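"A Guide to Consuming Open-Source Large Models" (《开源大模型食用指南》): a tutorial tailored for Chinese beginners on quickly fine-tuning (full-parameter/LoRA) and deploying Chinese and international open-source large language models (LLMs) and multimodal large language models (MLLMs) in a Linux environment.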
Chinese version of CLIP which achieves Chinese cross-modal retrieval and representation generation.
A Flexible Framework for Experiencing Heterogeneous LLM Inference/Fine-tune Optimizations
[ECCV2024] Video Foundation Models & Data for Multimodal Understanding
Python tool for converting files and office documents to Markdown.
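Perplexity-style AI search engine clone built with Gemini 2.0 Flash and Grounding
TAD Sim (Tencent Autonomous Driving Simulation), standalone edition, is Tencent's autonomous driving simulation system: a cross-platform distributed system purpose-built for developing and validating autonomous driving systems, with the goal of providing safer and more efficient autonomous driving test tools.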
Ingest, parse, and optimize any data format ➡️ from documents to multimedia ➡️ for enhanced compatibility with GenAI frameworks
Label Studio is a multi-type data labeling and annotation tool with standardized output format
[CVPR 2024] Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data. Foundation Model for Monocular Depth Estimation
PyTorch Implementation of EmerNeRF: Emergent Spatial-Temporal Scene Decomposition via Self-Supervision
[ICCV 2023] DetZero: Rethinking Offboard 3D Object Detection with Long-term Sequential Point Clouds
The repository provides code for running inference with the Segment Anything Model (SAM), links for downloading the trained model checkpoints, and example notebooks that show how to use the model.
CV-CUDA™ is an open-source, GPU-accelerated library for cloud-scale image processing and computer vision.
🧑‍🏫 60+ Implementations/tutorials of deep learning papers with side-by-side notes 📝; including transformers (original, xl, switch, feedback, vit, ...), optimizers (adam, adabelief, sophia, ...), ga…
Composable transformations of Python+NumPy programs: differentiate, vectorize, JIT to GPU/TPU, and more
Crack LeetCode, not only how, but also why.
A technical report on convolution arithmetic in the context of deep learning
Python audio and music signal processing library
The Unity Machine Learning Agents Toolkit (ML-Agents) is an open-source project that enables games and simulations to serve as environments for training intelligent agents using deep reinforcement …
TRACTA: Multi-Target Multi-Camera Tracking by Tracklet-to-Target Assignment
Exhaustive-search block matching algorithm that estimates the motion between two image frames.
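
For readers unfamiliar with the technique named in the last entry, here is a minimal sketch of exhaustive-search block matching in pure Python. It is not taken from the listed repository. The function names (`sad`, `block_match`), the sum-of-absolute-differences cost, the block size, and the search radius are all assumptions chosen for illustration. Frames are assumed to be equal-sized 2D lists of grayscale values.

```python
def sad(cur, ref, by, bx, dy, dx, n):
    """Sum of absolute differences between the n x n block of `cur` at
    (by, bx) and the n x n block of `ref` at (by + dy, bx + dx)."""
    total = 0
    for i in range(n):
        for j in range(n):
            total += abs(cur[by + i][bx + j] - ref[by + dy + i][bx + dx + j])
    return total


def block_match(ref, cur, block=4, radius=2):
    """Exhaustive-search block matching (illustrative sketch).

    For every non-overlapping block in `cur`, try each displacement
    (dy, dx) within +/- `radius` pixels and keep the one whose block in
    `ref` has the lowest SAD. Returns a dict mapping the block's top-left
    corner (by, bx) to its best (dy, dx). Ties go to the first candidate
    found while scanning dy, then dx, in increasing order.
    """
    h, w = len(cur), len(cur[0])
    vectors = {}
    for by in range(0, h - block + 1, block):
        for bx in range(0, w - block + 1, block):
            best = None  # (cost, dy, dx)
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    y, x = by + dy, bx + dx
                    # Skip candidates that fall outside the reference frame.
                    if y < 0 or x < 0 or y + block > h or x + block > w:
                        continue
                    cost = sad(cur, ref, by, bx, dy, dx, block)
                    if best is None or cost < best[0]:
                        best = (cost, dy, dx)
            vectors[(by, bx)] = (best[1], best[2])
    return vectors
```

Under this convention, a vector `(dy, dx)` means the block at `(by, bx)` in the current frame best matches the reference frame at `(by + dy, bx + dx)`. The search costs on the order of `(2 * radius + 1) ** 2` SAD evaluations per block, which is why practical encoders often replace the exhaustive scan with faster search patterns.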