- Serna.ai
- India (UTC +05:30)
- in/rishi-swethan
- https://medium.com/@rishiswethan.c.r
Stars
[NeurIPS'22] VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training
A CNN that diagnoses breast cancer from eosin-stained images. Trained on 400 images, it reaches 80% accuracy.
Python 3 library for downloading YouTube videos.
[ECCV2024] Video Foundation Models & Data for Multimodal Understanding
[EMNLP 2023 Demo] Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding
[CVPR 2024] Real-Time Open-Vocabulary Object Detection
OpenMMLab YOLO series toolbox and benchmark. Implements RTMDet, RTMDet-Rotated, YOLOv5, YOLOv6, YOLOv7, YOLOv8, YOLOX, PPYOLOE, etc.
A high-throughput and memory-efficient inference and serving engine for LLMs
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
[ECCV 2024] Official implementation of the paper "Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection"
The official implementation of "Divergence of Features and Mean: A BatchNorm-based Abnormality Criterion for Weakly Supervised Video Anomaly Detection"
Images to inference with no labeling (use foundation models to train supervised models).
A state-of-the-art open visual language model (multimodal pre-trained model).
Classification of Fundus Images into 5 stages of Diabetic Retinopathy, and segmentation of blood vessels in fundus images
Refine high-quality datasets and visual AI models
Zero-shot crack detection with SAM and Grounding DINO.
PLVS is a real-time SLAM system with points, lines, volumetric mapping and 3D unsupervised incremental segmentation.
Efficient vision foundation models for high-resolution generation and perception.
Easily train or fine-tune SOTA computer vision models with one open source training library. The home of YOLO-NAS.
Contains an audio emotion detection model, a facial emotion detection model, and a combined model that predicts emotions from video.
Package for imputing the arterial blood pressure (ABP) waveform from non-invasive physiological waveforms (PPG & ECG) using a deep neural network
Track-Anything is a flexible and interactive tool for video object tracking and segmentation, based on Segment Anything, XMem, and E2FGVI.
Code repository for the paper "On the Benefits of 3D Pose and Tracking for Human Action Recognition", (CVPR 2023)
Pretrained ConvNets for pytorch: NASNet, ResNeXt, ResNet, InceptionV4, InceptionResnetV2, Xception, DPN, etc.
ImageBind: One Embedding Space to Bind Them All
Feature rich WhatsApp Client for Desktop Linux
The repository provides code for running inference with the Segment Anything Model (SAM), links for downloading the trained model checkpoints, and example notebooks that show how to use the model.
OpenMMLab Semantic Segmentation Toolbox and Benchmark.