Tags: LLAMATOR-Core/llamator

v3.4.0

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature.
Release v3.4.0 (#176)

* Refactor test preset functions to improve clarity.
* Add CoP attack.
* Add DoS Repetition Token Attack.
* Improve saving of attacker and client answers, including saving an empty answer for the tested client when an error occurs.
* Rename `get_tested_client_prompts` to `get_attack_prompts`.
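
The release notes do not say whether the old name remains available after the rename. For calling code that has to run against both older and newer versions, a caller-side lookup is one option. The sketch below is a minimal illustration, not LLAMATOR code: `resolve_prompt_getter` is a hypothetical helper, and `owner` stands for whatever object or module exposes the function in your installation.

```python
def resolve_prompt_getter(owner):
    """Return the prompt getter from `owner`, preferring the new name.

    Hypothetical helper: `owner` is any object or module that exposes
    either `get_attack_prompts` (v3.4.0 name) or
    `get_tested_client_prompts` (earlier name).
    """
    for name in ("get_attack_prompts", "get_tested_client_prompts"):
        getter = getattr(owner, name, None)
        if callable(getter):
            return getter
    raise AttributeError(
        "neither get_attack_prompts nor get_tested_client_prompts was found"
    )
```

Preferring the new name first means the fallback only runs on versions that predate the rename.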

---------

Co-authored-by: Timur Nizamov <abc@nizamovtimur.ru>
Co-authored-by: Nikita Ivanov <nikita.ivanov.778@gmail.com>

v3.3.0

Release v3.3.0 (#157)

1. **Redesigned the output of testing parameter presets.** Added the following presets: `all`, `owasp:llm01`, `owasp:llm07`, `owasp:llm09`, `llm`, `vlm`, `eng`, `rus`.
2. **Added a new Linguistic Sandwich attack.** An adversarial prompt in a low-resource language is sandwiched between benign prompts in other languages.
3. **In the System Prompt Leakage attack, the heuristic evaluation has been replaced with LLM-as-a-judge.** This checks the similarity between the system's output and the intended prompt based on the system description.
4. **The static Past Tense attack has become the dynamic Time Machine attack.** The attacking model now alters the temporal context of the adversarial prompt.
5. **Added a new `model` tag** with the values `llm` and `vlm`.
6. **README update:** Enterprise Version announcement.
7. **Other minor fixes and improvements.**
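
To illustrate how tag-style presets such as `owasp:llm01`, `vlm`, or `rus` can select a subset of attacks, here is a minimal sketch. It is an assumption about the general idea, not LLAMATOR's implementation: the registry, the attack names, and the tag assignments are all placeholders, and `select_attacks` is a hypothetical function.

```python
# Hypothetical registry: attack names and their tags are placeholders,
# not the real LLAMATOR attack list.
ATTACK_TAGS = {
    "example_text_attack": {"owasp:llm01", "llm", "eng"},
    "example_image_attack": {"owasp:llm09", "vlm", "eng"},
    "example_russian_attack": {"owasp:llm01", "llm", "rus"},
}


def select_attacks(preset, registry=ATTACK_TAGS):
    """Return the sorted names of attacks matching a preset tag.

    The special preset "all" selects every registered attack.
    """
    if preset == "all":
        return sorted(registry)
    return sorted(name for name, tags in registry.items() if preset in tags)
```

With this placeholder registry, `select_attacks("vlm")` returns only the image attack, while `select_attacks("owasp:llm01")` returns both text-based entries.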

---------

Co-authored-by: Timur Nizamov <abc@nizamovtimur.ru>
Co-authored-by: Nikita Ivanov <nikita.ivanov.778@gmail.com>

v3.2.0

Release v3.2.0 (#144)

* Added Deceptive Delight
* Added Dialogue Injection Continuation
* Added VLM Lowres PDFs Attack
* Added VLM M-Attack
* Added VLM Text Hallucination Attack
* Introduced support for Vision Language Model (VLM) attacks, expanding the framework’s multimodal testing capabilities
* Added Dialogue Injection Developer Mode (formerly "Dialog Injection")
* Renamed Harmful Behavior Multistage to PAIR
* Added scoring to the PAIR attack via the Judge Model
* Revised and translated the HarmBench dataset into Russian
* Added `language` column to datasets and enabled filtering attacks by language
* Updated `start_testing` to return a dictionary object with test results
* Removed Complimentary Transition
* Removed Typoglycemia Attack
* Removed legacy `RU_*` attacks (now handled via language-based dataset filtering)
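
The language-based filtering that replaces the `RU_*` attacks can be pictured as a simple row filter on the new `language` column. The following is a minimal sketch under stated assumptions: the rows are plain dictionaries rather than the project's actual dataset format, the language values `"eng"` and `"rus"` are placeholders, and `filter_by_language` with its `"any"` default is a hypothetical helper.

```python
def filter_by_language(rows, language="any"):
    """Keep only rows whose `language` field matches.

    `rows` is an iterable of dicts; "any" (a hypothetical sentinel)
    disables filtering and returns every row.
    """
    if language == "any":
        return list(rows)
    return [row for row in rows if row.get("language") == language]


# Placeholder data: prompts and language codes are illustrative only.
rows = [
    {"prompt": "example prompt A", "language": "eng"},
    {"prompt": "example prompt B", "language": "rus"},
    {"prompt": "example prompt C", "language": "eng"},
]
english_rows = filter_by_language(rows, "eng")
```

One dataset with a language column avoids maintaining a parallel copy of each attack per language.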

---------

Co-authored-by: Timur Nizamov <abc@nizamovtimur.ru>
Co-authored-by: Nikita Ivanov <nikita.ivanov.778@gmail.com>
Co-authored-by: 3ndetz <jayrawrr3@gmail.com>
Co-authored-by: ti3c2 <ti3c2@yandex.com>
Co-authored-by: svyatocheck <svyatwork2@gmail.com>
Co-authored-by: Egorov, Michil <michil.egorov@x5.ru>

v3.1.0

Release v3.1.0 (#126)

* Enhance documentation and add judge model validation checks

* Add chat badge to project overview and README for community engagement

* Add AutoDAN-Turbo

* Add Dialogue Injection Attack

* Switch parquet engine from fastparquet to pyarrow

---------

Co-authored-by: Timur Nizamov <abc@nizamovtimur.ru>
Co-authored-by: Nikita Ivanov <nikita.ivanov.778@gmail.com>
Co-authored-by: Artyom Semenov <129667548+wearetyomsmnv@users.noreply.github.com>
Co-authored-by: 3ndetz <jayrawrr3@gmail.com>

v3.0.0

Release v3.0.0 (#120)

* Update LangChain versions
* Improve console output and progress bars
* Change how parameters are set for the test start function
* Add dictionaries to the attack class that describe various aspects of an attack
* Add verification for attack parameters
* Add a function for displaying templates with written attack presets
* Add a new config for the judge model, allowing it to be specified as a separate model
* Update examples in Jupyter notebooks
* Update the logging order of attack steps
* Add handling for emergency attack stoppages
* Add Shuffle Inconsistency attack (original paper: https://arxiv.org/html/2501.04931)
* Add a custom parameter to dataset-based attacks for specifying another dataset
* Refactor judge model interaction for the Ethical Compliance, Logical Inconsistencies, and Sycophancy tests
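
As an illustration of what verifying attack parameters can involve, the sketch below checks user-supplied parameters against a small schema and fills in defaults. Everything here is an assumption made for the example: the schema layout, the parameter names other than the general idea of per-attack parameters, and the `validate_params` helper are hypothetical and do not reflect LLAMATOR's internal checks.

```python
# Hypothetical schema: parameter name -> (expected type, default value).
EXAMPLE_SCHEMA = {
    "num_attempts": (int, 3),
    "multistage_depth": (int, 5),
    "language": (str, "any"),
}


def validate_params(params, schema=EXAMPLE_SCHEMA):
    """Reject unknown or mistyped parameters and fill in defaults."""
    unknown = sorted(set(params) - set(schema))
    if unknown:
        raise ValueError(f"unknown attack parameters: {unknown}")
    for name, value in params.items():
        expected_type = schema[name][0]
        if not isinstance(value, expected_type):
            raise TypeError(
                f"{name} must be {expected_type.__name__}, "
                f"got {type(value).__name__}"
            )
    # Start from the defaults, then overlay the validated user values.
    merged = {name: default for name, (_, default) in schema.items()}
    merged.update(params)
    return merged
```

Failing early on a misspelled parameter name is cheaper than discovering it after a long test run has already started.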

---------

Co-authored-by: Timur Nizamov <abc@nizamovtimur.ru>
Co-authored-by: Nikita Ivanov <nikita.ivanov.778@gmail.com>

v2.3.1

Release v2.3.1 (#95)

* Add video guides about Red Teaming and LLAMATOR

* Update Documentation: copyright, guides section

* Fix null checking for multistage attacks

* Enhance the Sycophancy attack

---------

Co-authored-by: Timur Nizamov <abc@nizamovtimur.ru>

v2.2.0

Release v2.2.0 (#86)

* Add HarmBench Prompts

* Add Suffix Attack

* Remake Harmful Behavior Attack

---------

Co-authored-by: Shine-afk <belyaevskij.nikita@gmail.com>
Co-authored-by: Timur Nizamov <abc@nizamovtimur.ru>
Co-authored-by: Nikita Ivanov <nikita.ivanov.778@gmail.com>

v2.1.0

Release v2.1.0 (#80)

* Add Crescendo attack

* Add BON attack

* Add Docker example with Jupyter Notebook and installed LLAMATOR

* Improve attack system prompt for Prompt Leakage

* Other minor improvements and bug fixes

---------

Co-authored-by: Timur Nizamov <abc@nizamovtimur.ru>
Co-authored-by: Nikita Ivanov <nikita.ivanov.778@gmail.com>

v2.0.1

Release v2.0.1 (#67)

* Small fix for attacks; add a `strip` parameter for `ChatSession`

---------

Co-authored-by: Низамов Тимур Дамирович <abc@nizamovtimur.ru>

v2.0.0

Release v2.0.0 (#64)

What's New:

New Features & Enhancements
- Introduced Multistage Attack: We've added a new `multistage_depth` parameter to the `start_testing()` function, allowing users to specify the depth of a dialogue during testing, which enables more sophisticated and targeted LLM red teaming strategies.
- Refactored Sycophancy Attack: The `sycophancy_test` has been renamed to `sycophancy`, transforming it into a multistage attack for increased effectiveness in uncovering model vulnerabilities.
- Enhanced Logical Inconsistencies Attack: The `logical_inconsistencies_test` has been renamed to `logical_inconsistencies` and restructured as a multistage attack to better detect and exploit logical weaknesses within language models.
- New Multistage Harmful Behavior Attack: Introducing `harmful_behaviour_multistage`, a more nuanced version of the original harmful behavior attack, designed for deeper penetration testing.
- Innovative System Prompt Leakage Attack: We've developed a new multistage attack, `system_prompt_leakage`, leveraging jailbreak examples from a dataset to target and exploit model internals.
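
The idea of a dialogue bounded by `multistage_depth` can be sketched as a loop that alternates between the tested model and an attacking model until the attack succeeds or the depth is exhausted. This is a minimal sketch of the general pattern, not LLAMATOR's implementation: `run_multistage` and its `attacker`, `target`, and `is_broken` callables are hypothetical stand-ins.

```python
def run_multistage(attacker, target, is_broken, first_prompt, multistage_depth=3):
    """Run a bounded multi-turn attack dialogue.

    attacker(history) -> next prompt, given the (prompt, answer) pairs so far
    target(prompt)    -> answer from the tested model
    is_broken(answer) -> True if the answer counts as a successful attack
    All three callables are hypothetical stand-ins for real model clients.
    """
    history = []
    prompt = first_prompt
    for _ in range(multistage_depth):
        answer = target(prompt)
        history.append((prompt, answer))
        if is_broken(answer):
            return True, history
        # Let the attacking side refine its next prompt from the dialogue so far.
        prompt = attacker(history)
    return False, history
```

A larger depth gives the attacker more turns to adapt, at the cost of more calls to both models.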

Improvements & Refinements
- Conducted extensive refactoring for improved code efficiency and maintainability across the framework.
- Made numerous small improvements and optimizations to enhance overall performance and user experience.

---------

Co-authored-by: Timur Nizamov <abc@nizamovtimur.ru>
Co-authored-by: Nikita Ivanov <nikita.ivanov.778@gmail.com>