ktjkc

Stars

JailbreakBench / jailbreakbench

JailbreakBench: An Open Robustness Benchmark for Jailbreaking Language Models [NeurIPS 2024 Datasets and Benchmarks Track]

Python · 598 stars · 71 forks · Updated Apr 4, 2025

ktjkc / reflextrust

🧠 LLMs don’t just process text — they read the room. Meaning emerges through context — shaped by tone, trust & trajectory. Most benchmarks flatten that. This one maps it.

4 stars · Updated Sep 10, 2025

AI-secure / DecodingTrust

A Comprehensive Assessment of Trustworthiness in GPT Models

Python · 313 stars · 61 forks · Updated Sep 16, 2024

null-blue / semantic-load-theory

Exploring how tone, framing, and connotation shape LLM behavior.

1 star · Updated May 25, 2025
