yueliu1999
Python 101 Updated Jan 30, 2026

[NeurIPS 2025] SECA: Semantically Equivalent and Coherent Attacks for Eliciting LLM Hallucinations

Python 67 Updated Dec 10, 2025

AI Robustness Evaluation System

Python 34 19 Updated Feb 4, 2026

MAPO: Mixed Advantage Policy Optimization

Python 38 Updated Sep 24, 2025

MCPMark is a comprehensive, stress-testing MCP benchmark designed to evaluate model and agent capabilities in real-world MCP use.

Python 383 29 Updated Jan 27, 2026

A benchmark for LLMs on complicated tasks in the terminal

Python 1,474 466 Updated Jan 22, 2026

[ICCV 2025] Official PyTorch Implementation of "Curve-Aware Gaussian Splatting for 3D Parametric Curve Reconstruction"

Python 51 1 Updated Sep 5, 2025

[ICCV 2025] Official PyTorch Implementation of "Learning Self-supervised Part-aware 3D Hybrid Representations of 2D Gaussians and Superquadrics"

Python 61 2 Updated Dec 22, 2025
Python 33 1 Updated Jun 24, 2025

[EMNLP 2025] From Automation to Autonomy: A Survey on Large Language Models in Scientific Discovery

296 36 Updated Nov 5, 2025

[arXiv 25] Aesthetics is Cheap, Show me the Text: An Empirical Evaluation of State-of-the-Art Generative Models for OCR

247 3 Updated Aug 28, 2025

Open-source coding LLM for software engineering tasks

Python 1,122 139 Updated Sep 30, 2025

SynthRL: Scaling Visual Reasoning with Verifiable Data Synthesis

Python 68 Updated Jul 24, 2025

Reinforcing General Reasoning without Verifiers

Python 96 6 Updated Jun 24, 2025

MLR-Bench: Evaluating AI Agents on Open-Ended Machine Learning Research

Python 22 Updated Sep 23, 2025

🤖️ A collection of papers, blogs, and projects on research agents.

6 Updated Feb 2, 2026

AudioTrust: Benchmarking the Multi-faceted Trustworthiness of Audio Large Language Models

Shell 210 22 Updated Jan 28, 2026

A collection of resources and papers on AI Scientist / Robot Scientist

124 4 Updated Sep 30, 2025

Optimizing Anytime Reasoning via Budget Relative Policy Optimization

Python 51 3 Updated Jul 15, 2025

The official implementation of the work "Can Indirect Prompt Injection Attacks Be Detected and Removed?"

Python 5 1 Updated Dec 25, 2025

The official implementation of the work "Defense Against Prompt Injection Attack by Leveraging Attack Techniques"

Python 8 3 Updated Jul 22, 2025

[NeurIPS 2025] Official source code for the paper "GuardReasoner-VL: Safeguarding VLMs via Reinforced Reasoning".

Python 115 9 Updated Sep 19, 2025

Official code for the paper "Beyond 'Aha!': Toward Systematic Meta-Abilities Alignment in Large Reasoning Models"

Python 86 6 Updated May 27, 2025

Official implementation of MAS-GPT: Training LLMs to Build LLM-based Multi-Agent Systems

Python 73 3 Updated Jun 26, 2025

Official repository for "Safety in Large Reasoning Models: A Survey" - Exploring safety risks, attacks, and defenses for Large Reasoning Models to enhance their security and reliability.

87 3 Updated Aug 25, 2025
Python 144 7 Updated May 6, 2025