Stars
verl/HybridFlow: A Flexible and Efficient RL Post-Training Framework
An open source implementation of CLIP.
Implementation / replication of DALL-E, OpenAI's Text to Image Transformer, in PyTorch
The repository provides code for running inference with the Meta Segment Anything Model 2 (SAM 2), links for downloading the trained model checkpoints, and example notebooks that show how to use th…
The repository provides code for running inference with the Segment Anything Model (SAM), links for downloading the trained model checkpoints, and example notebooks that show how to use the model.
A method to increase the speed and lower the memory footprint of existing vision transformers.
Personalize Segment Anything Model (SAM) with 1 shot in 10 seconds
Segment Anything in Medical Images
Code for loralib, an implementation of "LoRA: Low-Rank Adaptation of Large Language Models"
Fast and flexible image augmentation library. Paper about the library: https://www.mdpi.com/2078-2489/11/2/125
[ICLR'24 & IJCV'25] Matcher: Segment Anything with One Shot Using All-Purpose Feature Matching
This is the repository for "Synergistic Fine-tune and Inference Efficiency for Vision Transformer".
Implementation of a neural network model that can generate natural language captions for images. Three different architectures are proposed and compared: first one uses vanilla recurrent neural netwo…
A Point Transformer based Auto-Encoder for Robot Grasping and Grasping Candidate Quality Inference.
Implementation of an end-to-end object pose estimator, based on PoseCNN, which consists of two stages - feature extraction with a backbone network and pose estimation represented by instance segmen…
A two-stage object detector, based on Faster R-CNN, which consists of two modules - Region Proposal Networks (RPN) and Fast R-CNN. Trained to detect a set of object classes and evaluate the detecti…
Improving PointNet through the use of Self-Attention Layers to combine global with fine-grained features.