- Peking University
- Beijing
- https://weizeming.github.io
- @weizeming25
- https://scholar.google.com/citations?user=Kyn1zdQAAAAJ
Stars
The official code implementation of <Nexus: Same Pretraining Loss, Better Downstream Generalization via Common Minima>
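OSIM (Open Security Information Model) is an open-source, AI-oriented project for standardizing security data. By defining a unified, standardized schema semantic layer for security data, it tackles the industry's data fragmentation problem so that security teams, tools, and AI systems can reason and analyze consistently across different data sources. It aims to enable seamless cross-vendor, cross-product integration of security data, laying the core data foundation for intelligent security upgrades and collaborative defense.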
A cross-platform desktop All-in-One assistant for Claude Code, Codex, OpenCode, OpenClaw, Gemini CLI & Hermes Agent. Only official website: ccswitch.io
Your own personal AI assistant. Any OS. Any Platform. The lobster way. 🦞
Qwen3Guard is a multilingual guardrail model series developed by the Qwen team at Alibaba Cloud.
TrinityGuard: A Unified Framework for Safeguarding Multi-Agent Systems
Python SDK, Proxy Server (AI Gateway) to call 100+ LLM APIs in OpenAI (or native) format, with cost tracking, guardrails, load balancing, and logging. [Bedrock, Azure, OpenAI, VertexAI, Cohere, Anthr…
Code for paper "MAGIC: A Co-Evolving Attacker-Defender Adversarial Game for Robust LLM safety"
Open-source red teaming framework for MLLMs with 42+ attack methods
The loss landscape of Large Language Models resembles a basin!
Repo for the research paper "SecAlign: Defending Against Prompt Injection with Preference Optimization"
The Strata-Sword is a hierarchical Chinese-English jailbreak safety benchmark based on quantified reasoning complexity, developed in-house by Alibaba-AAIG | Strata-Sword 是 Alibaba-AAIG自研的中英文分层越狱攻击安…
Official codebase for "STAIR: Improving Safety Alignment with Introspective Reasoning"
Official repository for "Safety in Large Reasoning Models: A Survey" - Exploring safety risks, attacks, and defenses for Large Reasoning Models to enhance their security and reliability.
Integrate the DeepSeek API into popular software
[ArXiv 2025] Imperceptible Jailbreaking against Large Language Models
Test your prompts, agents, and RAGs. Red teaming/pentesting/vulnerability scanning for AI. Compare performance of GPT, Claude, Gemini, DeepSeek, and more. Simple declarative configs with command li…
The official repository for paper: Automating Safety Enhancement for LLM-based Agents with Synthetic Risk Scenarios
The official code repository for the paper "False Sense of Security: Why Probing-based Malicious Input Detection Fails to Generalize".
Code and dataset for the paper: "Can Editing LLMs Inject Harm?" [AAAI'26]
This repository provides the official implementation of POISONCRAFT: Practical Poisoning of Retrieval-Augmented Generation for Large Language Models.
HijackRAG: Hijacking Attacks against Retrieval-Augmented Large Language Models