University of Science and Technology of China
Beijing, China
https://zhendongwang6.github.io/
https://scholar.google.com.hk/citations?user=Ya5VDjQAAAAJ&hl=zh-CN
Lists (27)
chatgpt
clip
controlnet
dataset
diffusion model
face-anti-spoofing
face-forgery-detection
flow
gan
img2img
interview
knowledge distillation
large language models
large vision model
ocr
pretrain
r1
sam series
score metrics
segmentation
subject driven generation
survey
tools
vae
video generation
vision_language
visual text generation
Stars
The repository provides code for running inference with the Segment Anything Model (SAM), links for downloading the trained model checkpoints, and example notebooks that show how to use the model.
CLIP (Contrastive Language-Image Pretraining): predict the most relevant text snippet given an image.
This project reimplements the original MXNet code from the book Dive into Deep Learning (《动手学深度学习》) in PyTorch.
The repository provides code for running inference with the Meta Segment Anything Model 2 (SAM 2), links for downloading the trained model checkpoints, and example notebooks that show how to use the model.
stable diffusion webui colab
This repository contains the source code for the paper First Order Motion Model for Image Animation
High-Resolution Image Synthesis with Latent Diffusion Models
LAVIS - A One-stop Library for Language-Vision Intelligence
Code release for NeRF (Neural Radiance Fields)
Public facing notes page
🦙 LaMa Image Inpainting, Resolution-robust Large Mask Inpainting with Fourier Convolutions, WACV 2022
[NeurIPS 2024 Best Paper Award] [GPT beats diffusion🔥] [scaling laws in visual generation📈] Official implementation of "Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction".
Image restoration with neural networks but without learning.
Using Low-rank adaptation to quickly fine-tune diffusion models.
The image prompt adapter is designed to enable a pretrained text-to-image diffusion model to generate images with image prompt.
Taming Transformers for High-Resolution Image Synthesis
Chinese version of CLIP which achieves Chinese cross-modal retrieval and representation generation.
[ICCV 2019] Monocular depth estimation from a single image
SimCLRv2 - Big Self-Supervised Models are Strong Semi-Supervised Learners
Qwen2.5-Omni is an end-to-end multimodal model by Qwen team at Alibaba Cloud, capable of understanding text, audio, vision, video, and performing real-time speech generation.
Open-source and strong foundation image recognition models.
Official Implementation for "Encoding in Style: a StyleGAN Encoder for Image-to-Image Translation" (CVPR 2021) presenting the pixel2style2pixel (pSp) framework
The Python Code Tutorials
[CVPR 2021] Official PyTorch implementation for Transformer Interpretability Beyond Attention Visualization, a novel method to visualize classifications by Transformer based networks.
[ICML 2024] Mastering Text-to-Image Diffusion: Recaptioning, Planning, and Generating with Multimodal LLMs (RPG)