📖 Table of Contents
- Datasets
- Dependency Installation
- Run Paper Inference Experiments
- Run Benchmark Inference Reasoning Experiments
- Run Benchmark Creative Writing Experiments
- Benchmark Decoding Methods
- Enhancements
🌠 Datasets
The datasets used in this paper are located in the project's datasets folder, including AQuA, GPQA-main, GSM8K, MATH500, and Alpaca Creative Writing.
datasets/
-aqua.parquet
-gpqa-main.csv
-gsm8k.parquet
-math500.jsonl
-alpaca_eval.json
🛸 Dependency Installation
To install all the dependencies for our paper, run the following command:
pip install -r requirements.txt
We recommend creating a new conda environment for this repository:
conda create -n mink python=3.11
conda activate mink
pip install -r requirements.txt
🚀 Run Paper Inference Experiments
You can run the inference experiments for our proposed method with the Hugging Face Transformers library using the following command:
conda activate mink
python llm_mink.py \
--τ 3.0 \
--temperature 1.0 \
--model_name Qwen3-4B-Instruct
To switch to a different dataset, replace the line from AQuA import * with the import for the corresponding dataset (GPQA, GSM8K, or MATH500) and update the associated function calls accordingly.
🚀 Run Benchmark Inference Reasoning Experiments
In our paper, we compare our proposed method against four decoding methods: Top-k Sampling, Top-p Sampling, Min-p Sampling, and Top-nσ Sampling. We use the following hyperparameters for these baselines:
- Top-k Sampling: k=20
- Top-p Sampling: p=0.9
- Min-p Sampling: p=0.1
- Top-nσ Sampling: n=1.0
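For readers unfamiliar with these baselines, the truncation rule behind each one can be sketched in a few lines of plain Python. This is a minimal illustration built on the commonly cited definitions, not the code in this repository's scripts. The function names are hypothetical, the use of a population standard deviation for Top-nσ is an assumption, and where temperature is applied relative to truncation may differ between implementations.

```python
import math

def softmax(logits, temperature=1.0):
    """Convert logits to probabilities after temperature scaling."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]  # subtract max for stability
    total = sum(exps)
    return [e / total for e in exps]

def top_k_keep(logits, k=20):
    """Keep the indices of the k largest logits."""
    order = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)
    return set(order[:k])

def top_p_keep(logits, p=0.9):
    """Keep the smallest set of top tokens whose cumulative probability reaches p."""
    probs = softmax(logits)
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep, cumulative = set(), 0.0
    for i in order:
        keep.add(i)
        cumulative += probs[i]
        if cumulative >= p:
            break
    return keep

def min_p_keep(logits, p=0.1):
    """Keep tokens whose probability is at least p times the top probability."""
    probs = softmax(logits)
    threshold = p * max(probs)
    return {i for i, q in enumerate(probs) if q >= threshold}

def top_nsigma_keep(logits, n=1.0):
    """Keep tokens whose logit lies within n standard deviations of the max logit.

    Assumption: population standard deviation over the full vocabulary.
    """
    mean = sum(logits) / len(logits)
    std = math.sqrt(sum((x - mean) ** 2 for x in logits) / len(logits))
    threshold = max(logits) - n * std
    return {i for i, x in enumerate(logits) if x >= threshold}

def renormalize(logits, keep, temperature=1.0):
    """Probabilities over the kept tokens only; all other tokens get zero."""
    probs = softmax(logits, temperature)
    mass = sum(probs[i] for i in keep)
    return [probs[i] / mass if i in keep else 0.0 for i in range(len(probs))]

toy_logits = [4.0, 3.0, 1.0, 0.0, -2.0]
kept = top_nsigma_keep(toy_logits, n=1.0)
filtered = renormalize(toy_logits, kept, temperature=1.0)
```

On this toy five-token vocabulary, all four rules with the hyperparameters listed above (k reduced to 2 to fit the small vocabulary) keep the same two leading tokens. The methods diverge on flatter or longer-tailed distributions.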
We run the decoding methods on the following 4 models:
We then benchmark the decoding quality of those decoding methods.
We used the dataset for model comparison in our paper to run the experiments.
To run the LLM inference experiments for the top-k sampling decoding method, run the following command:
python llm_topk.py \
--k 20 \
--temperature 1.0 \
--model_name Qwen3-4B-Instruct
To run the LLM inference experiments for the top-p sampling decoding method, run the following command:
python llm_top-p.py \
--p 0.9 \
--temperature 1.0 \
--model_name Qwen3-4B-Instruct
To run the LLM inference experiments for the min-p sampling decoding method, run the following command:
python llm_minp.py \
--p 0.1 \
--temperature 1.0 \
--model_name Qwen3-4B-Instruct
To run the LLM inference experiments for the top-nσ sampling decoding method, run the following command:
python llm_top-nσ.py \
--n 1.0 \
--temperature 1.0 \
--model_name Qwen3-4B-Instruct
To switch to a different dataset, replace the line from AQuA import * with the import for the corresponding dataset (GPQA, GSM8K, or MATH500) and update the associated function calls accordingly.
🚀 Run Benchmark Creative Writing Experiments
In our paper, we compare our proposed method against six decoding methods: Top-k Sampling, Top-p Sampling, Mirostat, η-Sampling, Min-p Sampling, and Top-nσ Sampling. We use the following hyperparameters for these baselines:
- Top-k Sampling: k=20
- Top-p Sampling: p=0.9
- Mirostat: τ=5.0
- η-Sampling: η=9×10^-4
- Min-p Sampling: p=0.1
- Top-nσ Sampling: n=1.0
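Of the baselines listed here, η-Sampling is the one whose cutoff depends on the entropy of the next-token distribution, so a short sketch may help. The version below follows the commonly cited formulation, in which the cutoff is min(ε, sqrt(ε)·exp(−H)). It assumes the value 9×10^-4 above plays the role of ε; the function name is hypothetical and this is not the repository's implementation.

```python
import math

def eta_keep(probs, epsilon=9e-4):
    """Return indices kept by an entropy-dependent probability cutoff.

    Assumption: cutoff = min(epsilon, sqrt(epsilon) * exp(-entropy)),
    with epsilon taken to be the hyperparameter listed as eta in the text.
    """
    entropy = -sum(p * math.log(p) for p in probs if p > 0.0)
    cutoff = min(epsilon, math.sqrt(epsilon) * math.exp(-entropy))
    keep = {i for i, p in enumerate(probs) if p > cutoff}
    if not keep:
        # Always keep at least the most likely token.
        keep = {max(range(len(probs)), key=lambda i: probs[i])}
    return keep

# Peaked distribution: the two low-probability tail tokens are dropped.
peaked = [0.9, 0.0995, 0.0004, 0.0001]
kept_peaked = eta_keep(peaked)

# Flat distribution: high entropy lowers the cutoff, so every token survives.
flat = [1.0 / 10000] * 10000
kept_flat = eta_keep(flat)
```

The design point is that a fixed probability floor would discard every token in the flat case, whereas the entropy term relaxes the cutoff when the model is genuinely uncertain.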
We run the decoding methods on the following 2 models:
We use DeepSeek V3.2-Exp as the LLM-as-a-judge.
To run the creative writing experiments, run the following command:
python "creative writing.py" \
--model_name Qwen3-4B-Instruct \
--num_prompt 500
🧪 Benchmark Decoding Methods
To benchmark the decoding methods, please make sure you have all the dependencies installed.
💪 Enhancements
Generation could likely be sped up by:
- using torch.compile in PyTorch 2.0. We implemented this with the max_autotune mode in the generation scripts; you may need to modify the torch.compile code to fit your needs.
TF32 Note (important for users of Ampere, Hopper, and other recent NVIDIA GPUs).
When we ran the above generation scripts, TF32 matmuls were disabled per PyTorch's defaults.
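If you want to trade some float32 precision for speed on these GPUs, TF32 can be switched on through PyTorch's backend flags. The helper below is a minimal sketch; the function name is hypothetical and it is not part of the generation scripts. Outputs produced with TF32 enabled may differ numerically from runs made with the default setting described above.

```python
def enable_tf32():
    """Enable TF32 for CUDA matmuls and cuDNN convolutions.

    Returns True if the flags were set, False if PyTorch is not installed.
    """
    try:
        import torch
    except ImportError:
        return False
    # Both flags are standard PyTorch backend switches.
    torch.backends.cuda.matmul.allow_tf32 = True
    torch.backends.cudnn.allow_tf32 = True
    return True

tf32_enabled = enable_tf32()
```

Call the helper once, before loading the model, so that all subsequent matmuls use the same setting.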