Stars
OpenMMLab Detection Toolbox and Benchmark
Image augmentation for machine learning experiments.
A paper list of object detection using deep learning.
YOLOv3 in PyTorch > ONNX > CoreML > TFLite
Moshi is a speech-text foundation model and full-duplex spoken dialogue framework. It uses Mimi, a state-of-the-art streaming neural audio codec.
Real-Time and Accurate Full-Body Multi-Person Pose Estimation & Tracking System
Faster R-CNN (Python implementation) -- see https://github.com/ShaoqingRen/faster_rcnn for the official MATLAB version
PyTorch implementation of MAE https://arxiv.org/abs/2111.06377
Object detection, 3D detection, and pose estimation using center point detection.
SOTA Re-identification Methods and Toolbox
Open-source multimodal large language model that can hear and talk while thinking, featuring real-time end-to-end speech input and streaming audio output.
Simultaneous object detection and tracking using center points.
[ICLR 2020] Contrastive Representation Distillation (CRD), and benchmark of recent knowledge distillation methods
detrex is a research platform for DETR-based object detection, segmentation, pose estimation and other visual recognition tasks.
PyTorch implementation of RetinaNet object detection.
A PyTorch impl of EfficientDet faithful to the original Google impl w/ ported weights
Measuring Massive Multitask Language Understanding | ICLR 2021
Official repository for the "Big Transfer (BiT): General Visual Representation Learning" paper.
Monocular, One-stage, Regression of Multiple 3D People and their 3D positions & trajectories in camera & global coordinates. ROMP [ICCV 2021], BEV [CVPR 2022], TRACE [CVPR 2023]
Boosting your Web Services of Deep Learning Applications.
A PyTorch-native inference engine with cache acceleration, parallelism and quantization for DiTs.
R-FCN with joint training and Python support
AugMix: A Simple Data Processing Method to Improve Robustness and Uncertainty
[CVPR 2018] Cascaded Pyramid Network for Multi-Person Pose Estimation
The official PyTorch implementation of the paper "BBN: Bilateral-Branch Network with Cumulative Learning for Long-Tailed Visual Recognition"