- The Hong Kong Polytechnic University
- Hong Kong
- huofushuo.github.io
Stars
A Stealthy Inference Cost Attack in Edge Serving of LVLMs
[CVPR 2026 Highlight] OmniVGGT: Omni-Modality Driven Visual Geometry Grounded Transformer
[NeurIPS 2025] Hybrid Latent Reasoning via Reinforcement Learning
Codura is an intelligent code assistant designed to supercharge your IDE with context-aware code completion, inline explanations, test case generation, and refactoring suggestions. Powered by large…
🚀 Lightweight Redis GUI 🌐 Full platform support 🔥 A labor of love 💞
An LLM-based agent that proactively predicts its tasks.
This repository contains the official implementation of "FastVLM: Efficient Vision Encoding for Vision Language Models" - CVPR 2025
[USENIX Security 2025] PoisonedRAG: Knowledge Corruption Attacks to Retrieval-Augmented Generation of Large Language Models
Explore the Multimodal “Aha Moment” on 2B Model
Witness the aha moment of VLM with less than $3.
Official repository of 'Visual-RFT: Visual Reinforcement Fine-Tuning' & 'Visual-ARFT: Visual Agentic Reinforcement Fine-Tuning'
[NAACL 2025 Oral] From redundancy to relevance: Enhancing explainability in multimodal large language models
[NeurIPS 2024] Uncertainty of Thoughts: Uncertainty-Aware Planning Enhances Information Seeking in Large Language Models
[ECCV 2024] ControlCap: Controllable Region-level Captioning
[EMNLP'24] EHRAgent: Code Empowers Large Language Models for Complex Tabular Reasoning on Electronic Health Records
Repo for Rho-1: Token-level Data Selection & Selective Pretraining of LLMs.
A curated list of Machine Learning Security & Privacy papers published in the top-4 security conferences (IEEE S&P, ACM CCS, USENIX Security and NDSS).
[ICLR'24] Mitigating Hallucination in Large Multi-Modal Models via Robust Instruction Tuning
Code for ICLR 2025 Paper: Visual Description Grounding Reduces Hallucinations and Boosts Reasoning in LVLMs
[CVPR 2024 Highlight] Mitigating Object Hallucinations in Large Vision-Language Models through Visual Contrastive Decoding
[EMNLP'24] MedAdapter: Efficient Test-Time Adaptation of Large Language Models Towards Medical Reasoning
[ECCV 2024] FairDomain: Achieving Fairness in Cross-Domain Medical Image Segmentation and Classification
Aligning pretrained language models with instruction data generated by themselves.
[ECCV 2024] The official code for "AdaShield: Safeguarding Multimodal Large Language Models from Structure-based Attack via Adaptive Shield Prompting."
[NeurIPS 2024] Repo for the paper "ControlMLLM: Training-Free Visual Prompt Learning for Multimodal Large Language Models"
A very powerful and easy-to-use number precision calculation and formatting library.