YecanLee/Mink

Min-k Sampling: Decoupling Truncation from Temperature Scaling via Relative Logit Dynamics

[Project Page] | Run Analysis Baseline

📖 Table of Contents [Back to Top]

🌠 Datasets [Back to Top]

The datasets used in this paper are located in the project's datasets folder, including AQuA, GPQA-main, GSM8K, MATH500, and Alpaca Creative Writing.

datasets/
-aqua.parquet
-gpqa-main.csv
-gsm8k.parquet
-math500.jsonl
-alpaca_eval.json
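The files above come in four formats (Parquet, CSV, JSON Lines, JSON). As a minimal sketch, assuming you want to inspect them outside the provided scripts, a small loader could dispatch on the file extension. The function name load_records is hypothetical and not part of this repository; the Parquet branch additionally assumes pandas with a Parquet engine such as pyarrow is installed.

```python
import csv
import json
from pathlib import Path


def load_records(path):
    """Load a dataset file into Python objects based on its extension.

    Hypothetical helper for inspection only; the repository's own
    scripts may read these files differently.
    """
    path = Path(path)
    suffix = path.suffix.lower()
    if suffix == ".jsonl":
        # One JSON object per non-empty line.
        with path.open(encoding="utf-8") as f:
            return [json.loads(line) for line in f if line.strip()]
    if suffix == ".json":
        with path.open(encoding="utf-8") as f:
            return json.load(f)
    if suffix == ".csv":
        # Each row becomes a dict keyed by the header line.
        with path.open(newline="", encoding="utf-8") as f:
            return list(csv.DictReader(f))
    if suffix == ".parquet":
        # Imported lazily so the other formats work without pandas.
        import pandas as pd
        return pd.read_parquet(path).to_dict(orient="records")
    raise ValueError(f"Unsupported dataset format: {suffix}")
```

For example, load_records("datasets/math500.jsonl") would return a list with one dict per problem, assuming the file follows the usual one-object-per-line layout.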

🛸 Dependency Installation [Back to Top]

To install all the dependencies for our paper, run the following command:

pip install -r requirements.txt

We recommend creating a fresh conda environment for this repository:

conda create -n mink python=3.11
conda activate mink
pip install -r requirements.txt

🚀 Run Paper Inference Experiments [Back to Top]

You can run the inference experiments for our proposed method in one of the following ways:

Run with the Hugging Face transformers library

To run the inference experiments for our proposed method with the Hugging Face transformers library, run the following command:

conda activate mink

python llm_mink.py \
--τ 3.0 \
--temperature 1.0 \
--model_name Qwen3-4B-Instruct

When switching datasets, replace the import line from AQuA import * with the import for the target dataset (GPQA, GSM8K, or MATH500), and update the dataset-specific function calls accordingly.

🚀 Run Benchmark Inference Reasoning Experiments [Back to Top]

In our paper, we compare our proposed method against four decoding methods: Top-k Sampling, Top-p Sampling, Min-p Sampling, and Top-nσ Sampling. We use the following hyperparameters:

  • Top-k Sampling: k=20
  • Top-p Sampling: p=0.9
  • Min-p Sampling: p=0.1
  • Top-nσ Sampling: n=1.0
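To make the four baselines concrete, here is a minimal pure-Python sketch of the truncation rule each one applies to a single next-token distribution, following the commonly published definitions. It is not the code in the llm_*.py scripts: all function names are hypothetical, and the real scripts presumably operate on batched tensors.

```python
import math


def softmax(logits, temperature=1.0):
    """Convert logits to probabilities after temperature scaling."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_keep(logits, k=20):
    """Keep the indices of the k largest logits."""
    order = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)
    return set(order[:k])


def top_p_keep(probs, p=0.9):
    """Keep the smallest high-probability prefix whose mass reaches p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = set(), 0.0
    for i in order:
        kept.add(i)
        cumulative += probs[i]
        if cumulative >= p:
            break
    return kept


def min_p_keep(probs, p=0.1):
    """Keep tokens whose probability is at least p times the top probability."""
    threshold = p * max(probs)
    return {i for i, q in enumerate(probs) if q >= threshold}


def top_n_sigma_keep(logits, n=1.0):
    """Keep tokens whose logit lies within n standard deviations of the max logit."""
    mean = sum(logits) / len(logits)
    sigma = math.sqrt(sum((x - mean) ** 2 for x in logits) / len(logits))
    threshold = max(logits) - n * sigma
    return {i for i, x in enumerate(logits) if x >= threshold}
```

After truncation, the surviving probabilities are renormalized and a token is sampled from them. Note that top-p and min-p act on probabilities (and so depend on the temperature used in softmax), whereas top-k and top-nσ act directly on logits.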

We run the decoding methods on the following 4 models:

We then benchmark the decoding quality of those decoding methods.

The experiments use the same datasets as the model comparison in our paper.

To run the LLM inference experiments for the top-k sampling decoding method, run the following command:

python llm_topk.py \
--k 20 \
--temperature 1.0 \
--model_name Qwen3-4B-Instruct

To run the LLM inference experiments for the top-p sampling decoding method, run the following command:

python llm_top-p.py \
--p 0.9 \
--temperature 1.0 \
--model_name Qwen3-4B-Instruct

To run the LLM inference experiments for the min-p sampling decoding method, run the following command:

python llm_minp.py \
--p 0.1 \
--temperature 1.0 \
--model_name Qwen3-4B-Instruct

To run the LLM inference experiments for the top-nσ sampling decoding method, run the following command:

python llm_top-nσ.py \
--n 1.0 \
--temperature 1.0 \
--model_name Qwen3-4B-Instruct

When switching datasets, replace the import line from AQuA import * with the import for the target dataset (GPQA, GSM8K, or MATH500), and update the dataset-specific function calls accordingly.

🚀 Run Benchmark Creative Writing Experiments [Back to Top]

In our paper, we compare our proposed method against six decoding methods: Top-k Sampling, Top-p Sampling, Mirostat, η-Sampling, Min-p Sampling, and Top-nσ Sampling. We use the following hyperparameters:

  • Top-k Sampling: k=20
  • Top-p Sampling: p=0.9
  • Mirostat: τ=5.0
  • η-Sampling: η=9×10^-4
  • Min-p Sampling: p=0.1
  • Top-nσ Sampling: n=1.0
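Of the two methods added for creative writing, η-sampling has the simpler truncation rule, so a sketch may help. It follows the commonly used formulation in which the cutoff is the smaller of η and sqrt(η)·exp(−entropy). The function name eta_keep is hypothetical, and this is not taken from the repository's scripts. Mirostat is omitted because it adapts its cutoff across steps with a feedback loop rather than applying a single per-step rule.

```python
import math


def eta_keep(probs, eta=9e-4):
    """Return indices kept by an entropy-dependent probability cutoff.

    Sketch of eta-sampling: tokens with probability below
    min(eta, sqrt(eta) * exp(-entropy)) are dropped.
    """
    # Shannon entropy in nats; zero-probability tokens contribute nothing.
    entropy = -sum(q * math.log(q) for q in probs if q > 0.0)
    cutoff = min(eta, math.sqrt(eta) * math.exp(-entropy))
    kept = {i for i, q in enumerate(probs) if q >= cutoff}
    # Always keep the most likely token so sampling remains possible.
    kept.add(max(range(len(probs)), key=lambda i: probs[i]))
    return kept
```

With a peaked (low-entropy) distribution the cutoff is η itself; as the distribution flattens, the exponential term shrinks the cutoff so that more low-probability tokens survive.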

We run the decoding methods on the following 2 models:

We use DeepSeek V3.2-Exp as the LLM judge (LLM-as-a-judge evaluation).

To run the creative writing experiments, run the following command:

python "creative writing.py" \
--model_name Qwen3-4B-Instruct \
--num_prompt 500

🧪 Benchmark Decoding Methods [Back to Top]

To benchmark the decoding methods, please make sure you have all the dependencies installed.

💪 Enhancements [Back to Top]

Generation can likely be sped up by:

  • using torch.compile (PyTorch 2.0 or later). The generation scripts call it in max-autotune mode; you may need to adapt the torch.compile calls to your setup.

TF32 Note (important for users of Ampere, Hopper, and other recent NVIDIA GPUs).
When we ran the above generation scripts, TF32 matmuls were disabled per PyTorch's defaults.
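If you are willing to trade some matmul precision for speed, TF32 can be switched on through PyTorch's documented backend flags. The sketch below is an assumption about how you might do this in your own copy of a generation script; it is not taken from the repository, and enabling TF32 means your numbers may differ slightly from runs made with it disabled. The helper compile_for_generation is a hypothetical name.

```python
import torch

# Allow TF32 for CUDA matrix multiplications and cuDNN convolutions.
# These flags only change behavior on GPUs with TF32 support.
torch.backends.cuda.matmul.allow_tf32 = True
torch.backends.cudnn.allow_tf32 = True


def compile_for_generation(model: torch.nn.Module) -> torch.nn.Module:
    """Wrap a model with torch.compile using the max-autotune mode.

    Compilation is lazy: the actual tuning happens on the first call.
    """
    return torch.compile(model, mode="max-autotune")
```

Set both flags back to False to return to full FP32 matmuls when reproducing reference results.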

About

[ACL 2026 Main] Official PyTorch Implementation of "Min-k Sampling: Decoupling Truncation from Temperature Scaling via Relative Logit Dynamics"
