Benchmark suite for comparing ColQwen3 embedding models (BASE vs AWQ quantized).
| Size | BASE Model | AWQ Model |
|---|---|---|
| 4B | TomoroAI/tomoro-colqwen3-embed-4b | shubhamg2208/tomoro-ai-colqwen3-embed-4b-w4a16-autoawq-seqlen-1024 |
| 8B | TomoroAI/tomoro-colqwen3-embed-8b | shubhamg2208/tomoro-ai-colqwen3-embed-8b-w4a16-autoawq-seqlen-1024 |
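To pre-fetch any of these checkpoints into the local Hugging Face cache before a benchmark run, a plain `huggingface_hub` download works. This is a convenience sketch, not part of the benchmark scripts:

```python
from huggingface_hub import snapshot_download

# Pre-download the 4B BASE and AWQ checkpoints (repo IDs from the table above).
for repo_id in (
    "TomoroAI/tomoro-colqwen3-embed-4b",
    "shubhamg2208/tomoro-ai-colqwen3-embed-4b-w4a16-autoawq-seqlen-1024",
):
    local_path = snapshot_download(repo_id=repo_id)
    print(f"{repo_id} -> {local_path}")
```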
Requirements:

- Python 3.11+
- CUDA-capable GPU
- uv package manager
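To confirm the GPU requirement is satisfied before starting a long run, a quick PyTorch check is enough (a standalone snippet, not part of the suite):

```python
import torch

# Fail fast if no CUDA device is visible; the benchmarks require a GPU.
assert torch.cuda.is_available(), "CUDA-capable GPU required"
props = torch.cuda.get_device_properties(0)
print(f"{props.name}: {props.total_memory / 1024**3:.1f} GiB VRAM")
```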
Install dependencies:

```bash
uv sync
```

4B model:

```bash
# Run both BASE and AWQ
uv run python benchmark.py

# Run BASE model only
uv run python benchmark.py --only_base --output_json base_results.json

# Run AWQ model only
uv run python benchmark.py --only_awq --output_json awq_results.json
```

8B model:

```bash
# Run both BASE and AWQ
uv run python benchmark_8b.py

# Run BASE model only
uv run python benchmark_8b.py --only_base --output_json base_8b_results.json

# Run AWQ model only
uv run python benchmark_8b.py --only_awq --output_json awq_8b_results.json
```

Options:

| Option | Default | Description |
|---|---|---|
| `--text_samples` | 64 | Number of text samples |
| `--text_batch_size` | 8 | Batch size for text |
| `--image_samples` | 16 | Number of image samples |
| `--image_batch_size` | 4 | Batch size for images |
| `--image_size` | 512 | Image dimensions (pixels) |
| `--warmup_steps` | 3 | Warmup iterations |
| `--measure_steps` | 10 | Measurement iterations |
| `--only_base` | - | Benchmark BASE model only |
| `--only_awq` | - | Benchmark AWQ model only |
| `--text_only` | - | Skip image benchmarks (text tower only) |
| `--sweep_batch_sizes` | - | Comma-separated batch sizes to sweep |
| `--high_batch_sweep` | - | High batch sizes for the quantized model only (e.g. `512,1024,1536,2048`) |
| `--output_json` | - | Save results to a JSON file |
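For reference, the options above correspond to an argparse interface along these lines. This is a hedged reconstruction of the documented CLI surface, not the actual parser inside benchmark.py:

```python
import argparse

# Hedged reconstruction of the documented CLI; benchmark.py's real parser may differ.
parser = argparse.ArgumentParser(description="ColQwen3 BASE vs AWQ benchmark")
parser.add_argument("--text_samples", type=int, default=64, help="Number of text samples")
parser.add_argument("--text_batch_size", type=int, default=8, help="Batch size for text")
parser.add_argument("--image_samples", type=int, default=16, help="Number of image samples")
parser.add_argument("--image_batch_size", type=int, default=4, help="Batch size for images")
parser.add_argument("--image_size", type=int, default=512, help="Image dimensions (pixels)")
parser.add_argument("--warmup_steps", type=int, default=3, help="Warmup iterations")
parser.add_argument("--measure_steps", type=int, default=10, help="Measurement iterations")
parser.add_argument("--only_base", action="store_true", help="Benchmark BASE model only")
parser.add_argument("--only_awq", action="store_true", help="Benchmark AWQ model only")
parser.add_argument("--text_only", action="store_true", help="Skip image benchmarks")
parser.add_argument("--sweep_batch_sizes", type=str, help="Comma-separated batch sizes")
parser.add_argument("--high_batch_sweep", type=str, help="High batch sizes, quantized only")
parser.add_argument("--output_json", type=str, help="Save results to a JSON file")
args = parser.parse_args()
```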
The sweep mode tests multiple batch sizes to find the optimal throughput for each model under memory constraints. This demonstrates how quantization enables higher throughput by allowing larger batch sizes.
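Conceptually, such a sweep times each batch size in turn and stops at the first CUDA out-of-memory error, keeping the throughput observed for every size that fit. The sketch below illustrates the idea only; it is not benchmark.py's actual loop, and `encode_batch`/`make_batch` are hypothetical stand-ins for the model call and data preparation:

```python
import time

import torch

def sweep_throughput(encode_batch, make_batch, batch_sizes, steps=10):
    """Measure samples/sec for each batch size, stopping at the first CUDA OOM."""
    results = {}
    for bs in sorted(batch_sizes):
        try:
            batch = make_batch(bs)
            torch.cuda.synchronize()
            start = time.perf_counter()
            for _ in range(steps):
                encode_batch(batch)
            torch.cuda.synchronize()
            results[bs] = bs * steps / (time.perf_counter() - start)
        except torch.cuda.OutOfMemoryError:
            torch.cuda.empty_cache()
            break  # larger batch sizes will also OOM
    return results
```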
Example sweep runs:

```bash
# Sweep batch sizes for BASE model
uv run python benchmark.py --only_base \
  --sweep_batch_sizes "8,16,32,64,128,256" \
  --output_json sweep_base.json

# Sweep batch sizes for Quantized model (can use larger batches)
uv run python benchmark.py --only_awq \
  --sweep_batch_sizes "8,16,32,64,128,256,512" \
  --output_json sweep_awq.json

# High batch sweep for Quantized model only (demonstrates max batch advantage)
uv run python benchmark.py --only_awq \
  --high_batch_sweep "512,1024,1536,2048,2560,3072" \
  --output_json sweep_awq_high.json
```

Custom configuration example:

```bash
uv run python benchmark.py \
  --text_samples 32 \
  --text_batch_size 8 \
  --image_samples 16 \
  --image_batch_size 4 \
  --warmup_steps 5 \
  --measure_steps 20 \
  --output_json results.json
```

Benchmark reports:

- REPORT.md - 4B model benchmark results
- REPORT_8B.md - 8B model benchmark results
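When the BASE and AWQ runs are each saved with `--output_json`, the two files can be compared programmatically. The snippet below assumes only that each file is valid JSON; `throughput` is a hypothetical field name, so inspect the real files for the actual schema:

```python
import json

# Load the result files produced by --output_json (paths from the examples above).
with open("base_results.json") as f:
    base = json.load(f)
with open("awq_results.json") as f:
    awq = json.load(f)

# "throughput" is a hypothetical key; check the files for the real field names.
if "throughput" in base and "throughput" in awq:
    print(f"AWQ speedup: {awq['throughput'] / base['throughput']:.2f}x")
else:
    print("Top-level keys:", sorted(base), "vs", sorted(awq))
```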
Notes:

- AutoRound quantization is applied only to the text tower; the vision tower remains in FP16/BF16
- Run models separately (`--only_base`/`--only_awq`) to avoid GPU memory conflicts
- Results are saved to JSON for further analysis when using `--output_json`
- Quantized models use ~60% less memory, enabling larger batch sizes and higher throughput on memory-constrained GPUs
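To verify the memory saving on your own hardware, PyTorch's peak-allocation counter can bracket a single benchmark pass. A minimal sketch, where the embedding call in the middle is a placeholder:

```python
import torch

# Reset the counter, run one encoding pass, then read the high-water mark.
torch.cuda.reset_peak_memory_stats()
# ... run one warm benchmark pass here (model forward on a batch) ...
peak_gib = torch.cuda.max_memory_allocated() / 1024**3
print(f"Peak GPU memory: {peak_gib:.2f} GiB")
```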