View Ray-ui's full-sized avatar
From Images to High-Fidelity 3D Assets with Production-Ready PBR Material

Python 3,602 527 Updated Oct 17, 2025

HY-World 2.0: A Multi-Modal World Model for Reconstructing, Generating, and Simulating 3D Worlds

Python 2,273 192 Updated May 27, 2026

High-Resolution 3D Assets Generation with Large Scale Hunyuan3D Diffusion Models.

Python 14,017 1,442 Updated Oct 28, 2025

A survey on MM-LLMs for long video understanding: From Seconds to Hours: Reviewing MultiModal Large Language Models on Comprehensive Long Video Understanding

22 1 Updated Sep 12, 2025

A modern static resume template and theme. Powered by Jekyll and GitHub pages.

HTML 2,294 1,556 Updated Jun 15, 2024

ATS and Human-friendly Resume Writer in Markdown.

TypeScript 612 133 Updated Jun 16, 2026

📄 Awesome CV is a LaTeX template for your outstanding job application

TeX 27,782 5,285 Updated Mar 13, 2026

An elegant LaTeX résumé template. Mainland China mirror: https://gods.coding.net/p/resume/git

TeX 11,201 2,855 Updated Mar 15, 2024

Masked Depth Modeling for Spatial Perception

Python 1,227 95 Updated Jun 18, 2026

The repository provides code for running inference and finetuning with the Meta Segment Anything Model 3 (SAM 3), links for downloading the trained model checkpoints, and example notebooks that sho…

Python 10,655 1,608 Updated Jun 15, 2026

[CVPR'25 Oral] MoGe: Unlocking Accurate Monocular Geometry Estimation for Open-Domain Images with Optimal Training Supervision

Python 2,569 194 Updated Nov 2, 2025

[ICCV 2019] Monocular depth estimation from a single image

Jupyter Notebook 4,494 988 Updated Aug 10, 2024

TAG: A Simple Yet Effective Temporal-Aware Approach for Zero-Shot Video Temporal Grounding

Python 24 5 Updated Nov 18, 2025

Papers related to wireless large AI models and wireless foundation models.

26 Updated May 16, 2025

MaxInfo: A Training-Free Key-Frame Selection Method Using Maximum Volume for Enhanced Video Understanding

Jupyter Notebook 2 1 Updated Nov 17, 2025

Python 4,057 677 Updated Aug 24, 2025

Open-source and strong foundation image recognition models.

Jupyter Notebook 3,675 323 Updated Feb 18, 2025

Python 56 3 Updated Sep 13, 2024

[CVPR24] Official Implementation of GEM (Grounding Everything Module)

Python 140 7 Updated Apr 10, 2025

Official code for the VideoGEM paper

Python 6 Updated Sep 18, 2025

CLIP (Contrastive Language-Image Pretraining), Predict the most relevant text snippet given an image

Jupyter Notebook 33,836 4,020 Updated Mar 25, 2026

Official Repository of paper VideoGPT+: Integrating Image and Video Encoders for Enhanced Video Understanding

Python 292 19 Updated Aug 5, 2025

[ACL 2024 🔥] Video-ChatGPT is a video conversation model capable of generating meaningful conversation about videos. It combines the capabilities of LLMs with a pretrained visual encoder adapted fo…

Python 1,504 130 Updated Aug 5, 2025

This repository is intended to host tools and demos for ActivityNet

Jupyter Notebook 975 323 Updated Mar 21, 2024

[ICLR 2026] FastVGGT: Fast Visual Geometry Transformer

Python 795 45 Updated Jan 28, 2026

This repo contains the code for "VLM2Vec: Training Vision-Language Models for Massive Multimodal Embedding Tasks" [ICLR 2025]

Python 658 60 Updated Jun 23, 2026

TALL: Temporal Activity Localization via Language Query

Python 220 50 Updated Mar 15, 2018

Temporally-aligned Audio CaptiOnS for Language-Audio Pretraining

Python 16 1 Updated Oct 12, 2025

PyTorch code and models for VJEPA2 self-supervised learning from video.

Python 4,220 516 Updated Mar 23, 2026

[ICCV 2023] UniVTG: Towards Unified Video-Language Temporal Grounding

Python 379 34 Updated May 8, 2024