- Buffalo, USA
- tianyunjie96@gmail.com
- https://sunsmarterjie.github.io
Stars
[NeurIPS 2025] YOLOv12: Attention-Centric Real-Time Object Detectors
[CVPR 2025] Adaptive Keyframe Sampling for Long Video Understanding
[NeurIPS 2025 Spotlight] ReasonFlux Series - ReasonFlux, ReasonFlux-PRM and ReasonFlux-Coder
Matplotlib-based software for generating chart datasets that contain chart images, data, and visual-attribute JSON files. Originated from chart editing projects.
vHeat: Building Vision Models upon Heat Conduction
[ECCV 2024] Champ: Controllable and Consistent Human Image Animation with 3D Parametric Guidance
[Neurocomputing] The official code for "H-vmunet: High-order Vision Mamba UNet for Medical Image Segmentation".
A simple and efficient Mamba implementation in pure PyTorch and MLX.
VMamba: Visual State Space Models; code is based on Mamba
[AAAI 2025] ChatterBox: Multi-round Multimodal Referring and Grounding; multimodal, multi-round dialogues
[CVPR 2024] Official repository for "MagicAnimate: Temporally Consistent Human Image Animation using Diffusion Model"
VideoCrafter2: Overcoming Data Limitations for High-Quality Video Diffusion Models
✨ Hotshot-XL: State-of-the-art AI text-to-GIF model trained to work alongside Stable Diffusion XL
[CSUR] A Survey on Video Diffusion Models
Generative Models by Stability AI
Official implementation of AnimateDiff.
[ICML 2024] MagicPose (also known as MagicDance): Realistic Human Poses and Facial Expressions Retargeting with Identity-aware Diffusion
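Provides a practical interactive interface for large language models such as GPT/GLM, specially optimized for paper reading, polishing, and writing. Modular design with support for custom shortcut buttons and function plugins; project analysis and self-explanation for Python, C++, and other projects; PDF/LaTeX paper translation and summarization; parallel queries to multiple LLMs; and support for local models such as chatglm3. Integrates Tongyi Qianwen, deepseekcoder, iFlytek Spark, ERNIE Bot, llama2, rwkv, claude2, m…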
This repository is "SSL for Image Representation", one of the OpenLabs of PseudoLab.
A toolbox for object skeleton detection that can also be used for edge detection, building extraction, and road extraction. TIP (2021)
(CVPR 2023/TPAMI 2024) Integrally Pre-Trained Transformer Pyramid Networks: A Hierarchical Vision Transformer for Masked Image Modeling
CLIP (Contrastive Language-Image Pretraining): predict the most relevant text snippet given an image