Tongji University / visiting student @ UC Berkeley
China
https://mao1207.github.io/
Stars
KoeYe / SWE-agent-UE
Forked from SWE-agent/SWE-agent. SWE-agent that can solve Unreal Engine coding problems
SimWorld: An Open-ended Realistic Simulator for Autonomous Agents in Physical and Social Worlds
OpenCE (Open Context Engineering): A community toolkit to implement, evaluate, and combine LLM context strategies (RAG, ACE, Compression). Evolved from the `ACE-open` reproduction.
Visualizing the attention of vision-language models
MiniCPM-V 4.5: A GPT-4o Level MLLM for Single Image, Multi Image and High-FPS Video Understanding on Your Phone
The code and data for the paper "Evaluating Model Perception of Color Illusions in Photorealistic Scenes"
Biomedical Visual Instruction Tuning with Clinician Preference Alignment
Optimus: the first large-scale pre-trained VAE language model
Utilities for decoding deep representations (like sentence embeddings) back to text
Access a database of word frequencies in various natural languages.
A Next-Generation Training Engine Built for Ultra-Large MoE Models
[ECCV 2024] Official implementation of the paper "Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection"
This repo contains evaluation code for the paper "BLINK: Multimodal Large Language Models Can See but Not Perceive". https://arxiv.org/abs/2404.12390 [ECCV 2024]
A curated list of reinforcement learning with human feedback resources (continually updated)
🔥🔥🔥 [IEEE TCSVT] Latest Papers, Codes and Datasets on Vid-LLMs.
Reproduce the fast matrix multiplication method based on "Multiplying Matrices Without Multiplying" and "Bolt: Accelerated Data Mining with Fast Vector Compression", while doing the speedup of the und…
[Arxiv] Aligning Modalities in Vision Large Language Models via Preference Fine-tuning
This repository contains the data and code of the paper titled "IllusionVQA: A Challenging Optical Illusion Dataset for Vision Language Models"
Code for our CVPR'2024 paper "GauHuman: Articulated Gaussian Splatting from Monocular Human Videos"
Algorithm for converting a heterogeneous graph to a homogeneous graph
Code for "The One Where They Reconstructed 3D Humans and Environments in TV shows" appearing in ECCV 2022.