View yueliu1999's full-sized avatar

[NeurIPS 2025] SECA: Semantically Equivalent and Coherent Attacks for Eliciting LLM Hallucinations

Python 66 Updated Dec 10, 2025

AI Robustness Evaluation System

Python 33 17 Updated Dec 19, 2025

MAPO: Mixed Advantage Policy Optimization

Python 38 Updated Sep 24, 2025

MCPMark is a comprehensive, stress-testing MCP benchmark designed to evaluate model and agent capabilities in real-world MCP use.

Python 353 25 Updated Dec 13, 2025

A benchmark for LLMs on complicated tasks in the terminal

Python 1,237 439 Updated Dec 20, 2025

[ICCV 2025] Official PyTorch Implementation of "Curve-Aware Gaussian Splatting for 3D Parametric Curve Reconstruction"

Python 50 1 Updated Sep 5, 2025

[ICCV 2025] Official PyTorch Implementation of "Learning Self-supervised Part-aware 3D Hybrid Representations of 2D Gaussians and Superquadrics"

Python 60 2 Updated Aug 21, 2025
Python 33 1 Updated Jun 24, 2025

[EMNLP2025] From Automation to Autonomy: A Survey on Large Language Models in Scientific Discovery

282 36 Updated Nov 5, 2025

[arXiv 25] Aesthetics is Cheap, Show me the Text: An Empirical Evaluation of State-of-the-Art Generative Models for OCR

243 3 Updated Aug 28, 2025

open-source coding LLM for software engineering tasks

Python 1,072 128 Updated Sep 30, 2025

SynthRL: Scaling Visual Reasoning with Verifiable Data Synthesis

Python 68 Updated Jul 24, 2025

Reinforcing General Reasoning without Verifiers

Python 92 6 Updated Jun 24, 2025

MLR-Bench: Evaluating AI Agents on Open-Ended Machine Learning Research

Python 19 Updated Sep 23, 2025

🤖️ A collection of papers, blogs and projects of research agents.

6 Updated Oct 14, 2025

AudioTrust: Benchmarking the Multi-faceted Trustworthiness of Audio Large Language Models

Shell 207 22 Updated Nov 15, 2025

A collection of resources and papers on AI Scientist / Robot Scientist

117 4 Updated Sep 30, 2025

Optimizing Anytime Reasoning via Budget Relative Policy Optimization

Python 50 3 Updated Jul 15, 2025

The official implementation of the work "Can Indirect Prompt Injection Attacks Be Detected and Removed?"

Python 5 1 Updated Jul 31, 2025

The official implementation of the work "Defense Against Prompt Injection Attack by Leveraging Attack Techniques"

Python 7 2 Updated Jul 22, 2025

[NeurIPS 2025] Official source code for the paper "GuardReasoner-VL: Safeguarding VLMs via Reinforced Reasoning".

Python 114 7 Updated Sep 19, 2025

Official code of paper "Beyond 'Aha!': Toward Systematic Meta-Abilities Alignment in Large Reasoning Models"

Python 83 7 Updated May 27, 2025

Official implementation of MAS-GPT: Training LLMs to Build LLM-based Multi-Agent Systems

Python 70 3 Updated Jun 26, 2025

Official repository for "Safety in Large Reasoning Models: A Survey" - Exploring safety risks, attacks, and defenses for Large Reasoning Models to enhance their security and reliability.

83 3 Updated Aug 25, 2025
Python 142 7 Updated May 6, 2025

Awesome-Efficient-Inference-for-LRMs is a collection of state-of-the-art, novel, exciting, token-efficient methods for Large Reasoning Models (LRMs). It contains papers, codes, datasets, evaluation…

233 15 Updated Jun 14, 2025