Tags: LLAMATOR-Core/llamator
Release v3.4.0 (#176)

* Refactor test preset functions to improve clarity.
* Add CoP attack.
* Add DoS Repetition Token Attack.
* Improve saving of the attacker's and client's answers, including an empty tested-client answer in case of error.
* Rename `get_tested_client_prompts` to `get_attack_prompts`.

Co-authored-by: Timur Nizamov <abc@nizamovtimur.ru>
Co-authored-by: Nikita Ivanov <nikita.ivanov.778@gmail.com>

Release v3.3.0 (#157)

1. **Redesigned the output of testing parameter presets.** Added the following presets: `all`, `owasp:llm01`, `owasp:llm07`, `owasp:llm09`, `llm`, `vlm`, `eng`, `rus`.
2. **Added a new Linguistic Sandwich attack.** An adversarial prompt in a low-resource language is sandwiched between benign prompts in other languages.
3. **In the System Prompt Leakage attack, the heuristic evaluation has been replaced with LLM-as-a-judge.** This checks the similarity between the system's output and the intended prompt based on the system description.
4. **The static Past Tense attack has become the dynamic Time Machine attack.** The attacking model now alters the temporal context of the adversarial prompt.
5. **Added a new tag, `model`: `llm` / `vlm`.**
6. **README update.** Enterprise Version announcement.
7. **Other minor fixes and improvements.**

Co-authored-by: Timur Nizamov <abc@nizamovtimur.ru>
Co-authored-by: Nikita Ivanov <nikita.ivanov.778@gmail.com>

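The Linguistic Sandwich idea from the v3.3.0 notes can be pictured with a small sketch. This is only an illustration of the prompt layout described above, not LLAMATOR's implementation: the function name `build_sandwich`, its parameters, and the placeholder strings are all hypothetical.

```python
def build_sandwich(benign_before, adversarial, benign_after):
    """Place one prompt between two groups of benign prompts.

    Hypothetical helper for illustration only; it is not part of LLAMATOR.
    """
    parts = list(benign_before) + [adversarial] + list(benign_after)
    # Number the questions so the tested model treats them as one batch.
    return "\n".join(f"{i}. {p}" for i, p in enumerate(parts, start=1))


prompt = build_sandwich(
    ["What is the capital of France?", "¿Cuántos días tiene una semana?"],
    "<adversarial prompt in a low-resource language>",  # placeholder text
    ["Wie viele Kontinente gibt es?"],
)
print(prompt)
```

The middle entry is deliberately a placeholder; in a real test run the attack's own dataset would presumably supply it.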
Release v3.2.0 (#144)

* Added Deceptive Delight
* Added Dialogue Injection Continuation
* Added VLM Lowres PDFs Attack
* Added VLM M-Attack
* Added VLM Text Hallucination Attack
* Introduced support for Vision Language Model (VLM) attacks, expanding the framework’s multimodal testing capabilities
* Added Dialogue Injection Developer Mode (formerly "Dialog Injection")
* Renamed Harmful Behavior Multistage to PAIR
* Added scoring to PAIR attack via the Judge Model
* Revised and translated Harmbench dataset into Russian
* Added `language` column to datasets and enabled filtering attacks by language
* Updated `start_testing` to return a dictionary object with test results
* Removed Complimentary Transition
* Removed Typoglycemia Attack
* Removed legacy `RU_*` attacks (now handled via language-based dataset filtering)

Co-authored-by: Timur Nizamov <abc@nizamovtimur.ru>
Co-authored-by: Nikita Ivanov <nikita.ivanov.778@gmail.com>
Co-authored-by: 3ndetz <jayrawrr3@gmail.com>
Co-authored-by: ti3c2 <ti3c2@yandex.com>
Co-authored-by: svyatocheck <svyatwork2@gmail.com>
Co-authored-by: Egorov, Michil <michil.egorov@x5.ru>

Release v3.1.0 (#126)

* Enhance documentation and add judge model validation checks
* Add chat badge to project overview and README for community engagement
* Add Autodan Turbo
* Add Dialogue Injection Attack
* Switch parquet engine from fastparquet to pyarrow

Co-authored-by: Timur Nizamov <abc@nizamovtimur.ru>
Co-authored-by: Nikita Ivanov <nikita.ivanov.778@gmail.com>
Co-authored-by: Artyom Semenov <129667548+wearetyomsmnv@users.noreply.github.com>
Co-authored-by: 3ndetz <jayrawrr3@gmail.com>

Release v3.0.0 (#120)

* Update LangChain versions;
* Improve console output and progress bars;
* Change how parameters are set for the test start function;
* Attack class now includes dictionaries with descriptions of various aspects of an attack;
* Add verification for attack parameters;
* Add a function for displaying templates with written attack presets;
* Add a new config for the judge model, allowing it to be specified as a separate model;
* Update examples in Jupyter notebooks;
* Update the logging order of attack steps;
* Add handling for emergency attack stoppages;
* Add Shuffle Inconsistency attack (Original Paper: https://arxiv.org/html/2501.04931);
* Add a custom parameter to dataset-based attacks for specifying another dataset;
* Refactor judge model interaction for Ethical Compliance, Logical Inconsistencies, Sycophancy tests;

Co-authored-by: Timur Nizamov <abc@nizamovtimur.ru>
Co-authored-by: Nikita Ivanov <nikita.ivanov.778@gmail.com>

Release v2.2.0 (#86)

* Add HarmBench Prompts
* Add Suffix Attack
* Remake Harmful Behavior Attack

Co-authored-by: Shine-afk <belyaevskij.nikita@gmail.com>
Co-authored-by: Timur Nizamov <abc@nizamovtimur.ru>
Co-authored-by: Nikita Ivanov <nikita.ivanov.778@gmail.com>

Release v2.1.0 (#80)

* Add Crescendo attack
* Add BON attack
* Add Docker example with Jupyter Notebook and installed LLAMATOR
* Improve attack system prompt for Prompt Leakage
* Other minor improvements and bug fixes

Co-authored-by: Timur Nizamov <abc@nizamovtimur.ru>
Co-authored-by: Nikita Ivanov <nikita.ivanov.778@gmail.com>

Release v2.0.0 (#64)

What's New:

New Features & Enhancements

- Introduced Multistage Attack: We've added a new `multistage_depth` parameter to the `start_testing()` function, allowing users to specify the depth of a dialogue during testing, enabling more sophisticated and targeted LLM red teaming strategies.
- Refactored Sycophancy Attack: The `sycophancy_test` has been renamed to `sycophancy` and turned into a multistage attack for increased effectiveness in uncovering model vulnerabilities.
- Enhanced Logical Inconsistencies Attack: The `logical_inconsistencies_test` has been renamed to `logical_inconsistencies` and restructured as a multistage attack to better detect and exploit logical weaknesses within language models.
- New Multistage Harmful Behavior Attack: Introducing `harmful_behaviour_multistage`, a more nuanced version of the original harmful behavior attack, designed for deeper penetration testing.
- New System Prompt Leakage Attack: We've developed a new multistage attack, `system_prompt_leakage`, which uses jailbreak examples from a dataset to extract the model's system prompt.

Improvements & Refinements

- Conducted extensive refactoring for improved code efficiency and maintainability across the framework.
- Made numerous small improvements and optimizations to enhance overall performance and user experience.

Co-authored-by: Timur Nizamov <abc@nizamovtimur.ru>
Co-authored-by: Nikita Ivanov <nikita.ivanov.778@gmail.com>