Verbalized Sampling: How to Mitigate Mode Collapse and Unlock LLM Diversity

Verbalized Sampling (VS) is a simple prompting strategy that improves the diversity of LLM outputs, by 2-3x on creative tasks. Instead of requesting a single answer, it asks the model to generate several responses together with their probabilities, then samples from this verbalized distribution. VS is training-free (it works with any LLM via prompting), model-agnostic (GPT, Claude, Gemini, Llama, etc.), orthogonal to temperature, and effective across tasks such as creative writing, social simulation, synthetic data generation, and open-ended QA.
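The final sampling step can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the package's implementation: it assumes the model's reply has already been converted into (text, probability) pairs, and the helper name `sample_verbalized` is hypothetical. Verbalized probabilities are self-reported and need not sum to 1, so the sketch treats them as relative weights.

```python
import random
from typing import Optional, Sequence, Tuple


def sample_verbalized(
    responses: Sequence[Tuple[str, float]], seed: Optional[int] = None
) -> str:
    """Draw one response text, weighted by its verbalized probability.

    Hypothetical helper for illustration; not part of the verbalized_sampling API.
    """
    if not responses:
        raise ValueError("need at least one response")
    texts = [text for text, _ in responses]
    # Clamp negative values; random.choices treats weights as relative,
    # so they do not have to sum to 1.
    weights = [max(prob, 0.0) for _, prob in responses]
    if sum(weights) == 0:
        # Fall back to uniform sampling if every verbalized probability is zero.
        weights = [1.0] * len(texts)
    rng = random.Random(seed)  # a fixed seed makes the draw reproducible
    return rng.choices(texts, weights=weights, k=1)[0]


candidates = [
    ("A bear learns to fish.", 0.40),
    ("A bear opens a bakery.", 0.35),
    ("A bear runs for mayor.", 0.25),
]
print(sample_verbalized(candidates, seed=42))
```

Seeding a private `random.Random` instance keeps the draw reproducible without touching Python's global random state.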

Quickstart

To try Verbalized Sampling, just copy and paste this into any chatbot (ChatGPT, Claude, Gemini, etc.):

<instructions>
Generate 5 responses to the user query, each within a separate <response> tag. Each <response> must include a <text> and a numeric <probability>. Randomly sample responses from the full distribution.
</instructions>

Tell me a short story about a bear.
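Because replies to this prompt use a fixed tag format, they can be parsed programmatically. The sketch below assumes the model follows the format exactly, with `<text>` appearing before `<probability>` inside each `<response>`; real outputs can deviate, so malformed entries are simply skipped. The function name `parse_responses` is hypothetical.

```python
import re
from typing import List, Tuple

# Matches one <response> block and captures its <text> and <probability> contents.
# The tempered group (?:(?!</text>).)* stops the text capture at the first </text>,
# so a malformed block cannot swallow the block that follows it.
_RESPONSE_RE = re.compile(
    r"<response>\s*<text>((?:(?!</text>).)*)</text>\s*"
    r"<probability>\s*(\d*\.?\d+)\s*</probability>\s*</response>",
    re.DOTALL,
)


def parse_responses(reply: str) -> List[Tuple[str, float]]:
    """Extract (text, probability) pairs from a model reply (hypothetical helper)."""
    return [(text.strip(), float(prob)) for text, prob in _RESPONSE_RE.findall(reply)]


example_reply = """
<response><text>A bear befriends a lost hiker.</text><probability>0.08</probability></response>
<response>
  <text>A bear wakes up early from hibernation.</text>
  <probability>0.05</probability>
</response>
"""
print(parse_responses(example_reply))
```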

If you want more stories, just reply with "Tell me 5 more stories" in the same conversation. For even better results, paste the following into a system prompt instead:

You are a helpful assistant. For each query, please generate a set of five possible responses, each within a separate <response> tag. Responses should each include a <text> and a numeric <probability>. Please sample at random from the tails of the distribution, such that the probability of each response is less than 0.10.
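Verbalized probabilities are self-reported, so a model will not necessarily respect the "less than 0.10" instruction. A minimal sketch for checking this, assuming the responses are already available as (text, probability) pairs; `split_by_threshold` is a hypothetical name, and its 0.10 default simply mirrors the system prompt above.

```python
from typing import List, Sequence, Tuple

Pair = Tuple[str, float]


def split_by_threshold(
    responses: Sequence[Pair], tau: float = 0.10
) -> Tuple[List[Pair], List[Pair]]:
    """Split responses into tail (p < tau) and head (p >= tau) groups."""
    tail = [(t, p) for t, p in responses if p < tau]
    head = [(t, p) for t, p in responses if p >= tau]
    return tail, head


responses = [
    ("A bear who collects lost mittens.", 0.06),
    ("A bear learns to fish.", 0.30),  # violates the < 0.10 instruction
    ("A bear who is afraid of honey.", 0.04),
]
tail, head = split_by_threshold(responses)
if head:
    print(f"{len(head)} response(s) ignored the tail constraint: {head}")
```

Responses in `head` could then be discarded or the query re-sent; which option works better is a judgment call for your application.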

Installation and Usage

Our Python package wraps all of the above in a single function call, can automatically sample from the verbalized responses, and integrates with LangChain:

pip install verbalized-sampling

# Requires OPENAI_API_KEY or OPENROUTER_API_KEY to be set in your shell environment
from verbalized_sampling import verbalize

# Generate distribution of responses
dist = verbalize("Tell me a joke", k=5, tau=0.10, temperature=0.9)

# Sample from the distribution
joke = dist.sample(seed=42)
print(joke.text)

Colab Notebooks

The following notebooks show how to use verbalized sampling to generate more diverse stories, ideas, and images, and how to use our package:

Notebook	Description	Code
Direct vs. Verbalized Sampling	Head-to-head comparison demonstrating VS effectiveness: 2-3x diversity improvement in creative tasks while maintaining quality	View on GitHub
Image Generation with VS	Visual comparison of Direct Prompting vs. Verbalized Sampling for text-to-image generation, showcasing creative diversity in artistic styles	View on GitHub
Complete Framework Tutorial	Step-by-step guide to using verbalized sampling: API basics, transforms, selection methods, recipes, and advanced features	View on GitHub

Reproducing Paper Results

Our library includes everything you need to reproduce the results from our paper. For example:

# Run creative writing experiments
python scripts/tasks/run_poem.py --model gpt-4.1 --methods direct vs_standard --num-responses 50

# Evaluate bias mitigation on geographic data
python scripts/tasks/run_state_name.py --model anthropic/claude-sonnet-4 --methods direct vs_standard

# Compare diversity metrics across methods
python scripts/tasks/run_story.py --model gpt-4.1 --methods direct vs_standard vs_cot --metrics diversity ngram

For complete experiment instructions with exact commands, parameter settings, and expected outputs, see EXPERIMENTS.md, which provides a one-to-one mapping between paper sections and experiment scripts.
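For intuition about what an n-gram diversity metric captures, here is one common formulation, distinct-n: the fraction of all n-grams across a set of outputs that are unique. This is an illustrative sketch, not necessarily how the repository computes its `ngram` metric; the name `distinct_n`, whitespace tokenization, and lowercasing are all assumptions.

```python
from typing import Iterable


def distinct_n(texts: Iterable[str], n: int = 2) -> float:
    """Ratio of unique n-grams to total n-grams across all texts.

    Whitespace tokenization and lowercasing are simplifying assumptions.
    """
    total = 0
    unique = set()
    for text in texts:
        tokens = text.lower().split()
        # Slide a window of size n over the tokens of each text.
        for i in range(len(tokens) - n + 1):
            unique.add(tuple(tokens[i : i + n]))
            total += 1
    return len(unique) / total if total else 0.0


repetitive = ["the bear ate honey", "the bear ate honey", "the bear ate fish"]
varied = ["the bear ate honey", "a fox stole bread", "an owl read poems"]
print(distinct_n(repetitive), distinct_n(varied))
```

The repetitive set scores lower because its outputs share most of their bigrams, which is the kind of mode collapse VS aims to reduce.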

Citation

If you use Verbalized Sampling in your research, please cite our paper:

@misc{zhang2025verbalizedsamplingmitigatemode,
  title={Verbalized Sampling: How to Mitigate Mode Collapse and Unlock LLM Diversity},
  author={Jiayi Zhang and Simon Yu and Derek Chong and Anthony Sicilia and Michael R. Tomz and Christopher D. Manning and Weiyan Shi},
  year={2025},
  eprint={2510.01171},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2510.01171}
}

License

This project is licensed under the Apache License 2.0 - see the LICENSE file for details.
