gsm8k

An evaluation of prompting techniques (Zero-Shot CoT, Few-Shot, Self-Consistency) on the Mistral-7B model for mathematical reasoning. This project systematically benchmarks 7 distinct methods on the GSM8K dataset.

Updated Nov 2, 2025
Python
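The Self-Consistency method named above can be sketched in a few lines. This is a generic illustration, not code from the repository. It assumes the sampled chain-of-thought completions have already been generated (for example by Mistral-7B). It reads each final answer from a GSM8K-style `#### <answer>` line, falls back to the last number in the text (an assumption), and returns the majority answer. `extract_answer` and `self_consistency` are hypothetical helper names.

```python
import re
from collections import Counter
from typing import Iterable, Optional

# GSM8K reference solutions end with a line of the form "#### <answer>".
_FINAL = re.compile(r"####\s*(-?\d[\d,]*\.?\d*)")
# Fallback (assumption): treat the last number in the text as the answer.
_NUMBER = re.compile(r"-?\d[\d,]*\.?\d*")


def _normalize(raw: str) -> str:
    """Drop thousands separators and a trailing period; print integers without '.0'."""
    value = float(raw.replace(",", "").rstrip("."))
    return str(int(value)) if value.is_integer() else str(value)


def extract_answer(completion: str) -> Optional[str]:
    """Return the normalized final numeric answer in a completion, or None."""
    match = _FINAL.search(completion)
    if match:
        return _normalize(match.group(1))
    numbers = _NUMBER.findall(completion)
    return _normalize(numbers[-1]) if numbers else None


def self_consistency(completions: Iterable[str]) -> Optional[str]:
    """Majority vote over the final answers of several sampled reasoning paths."""
    answers = [a for a in map(extract_answer, completions) if a is not None]
    if not answers:
        return None
    # most_common orders equal counts by first occurrence, so ties go to the earliest answer.
    return Counter(answers).most_common(1)[0][0]


samples = [
    "She sells 9 eggs at $2 each, so she earns 18 dollars. #### 18",
    "9 * 2 = 18, so the answer is 18.",
    "She keeps 20 eggs, so the answer is 20",
]
print(self_consistency(samples))  # prints 18
```

Ties go to the answer seen first, because `Counter.most_common` orders equal counts by first occurrence.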

jpordoy / -Dynamic-Multi-Chain-Multi-Path-Reasoning-with-Consensus

Multi-path reasoning with dynamic chains and consensus scoring for improved GSM8K benchmark performance.

consensus claude llm gsm8k multi-path-reasoning

Updated Feb 11, 2026
Jupyter Notebook

DeveloperZeeshu / Nano_R1-model

Nano R1 Model is an AI-driven reasoning model built using reinforcement learning techniques. It focuses on decision-making and adaptability in dynamic environments, utilizing state-of-the-art machine learning methods to improve over time. Developed with Python and hosted on Hugging Face.

python neural-network pytorch transformer gsm8k unsloth qwen2-5 grpo

Updated Jun 16, 2025

Mantissagithub / alphazero_llm_trainer

AlphaZero-style RL training for LLMs using MCTS on mathematical reasoning tasks (GSM8K). Student model explores reasoning paths guided by teacher ensembles and reward signals.

reinforcement-learning pytorch mcts alphazero gsm8k reward-model llm-train mathematical-reas

Updated Dec 2, 2025
Python

strongSoda / prompt-sculpting

You Don't Need Prompt Engineering Anymore: The Prompting Inversion

ai llm prompt-engineering gsm8k gpt5 gpt4o

Updated Oct 28, 2025
Python

DURGESH716 / Creating-Hard-Reasoning-Benchmark

A hard reasoning benchmark, filtered using disagreement scores.

benchmarks mathematical-modelling model-evaluation gsm8k arc-agi model-reliability

Updated Feb 14, 2026
Python
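The listing does not say how the disagreement scores are computed. As a minimal sketch, assume each question is answered by several models and that its score is the fraction of answers that differ from the majority answer. Under that assumption, hard items could be selected with a threshold. `disagreement_score`, `filter_hard`, and the 0.5 cutoff are hypothetical, not taken from the repository.

```python
from collections import Counter
from typing import Dict, List


def disagreement_score(answers: List[str]) -> float:
    """1 - (share of answers equal to the most common answer); 0.0 means full agreement.

    Assumed definition for illustration only.
    """
    if not answers:
        raise ValueError("need at least one answer")
    top_count = Counter(answers).most_common(1)[0][1]
    return 1.0 - top_count / len(answers)


def filter_hard(answers_by_question: Dict[str, List[str]], threshold: float = 0.5) -> List[str]:
    """Keep question IDs whose disagreement score is at least `threshold` (hypothetical cutoff)."""
    return [
        qid
        for qid, answers in answers_by_question.items()
        if disagreement_score(answers) >= threshold
    ]


model_answers = {
    "q1": ["18", "18", "18", "18"],  # all models agree -> 0.0
    "q2": ["18", "18", "20", "21"],  # majority holds half -> 0.5
    "q3": ["1", "2", "3", "4"],      # no agreement -> 0.75
}
print(filter_hard(model_answers))  # prints ['q2', 'q3']
```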

goblinasaddy / nanoJEPA

A minimal JEPA-based language model demonstrating latent-space reasoning on GSM8K using a single decoder-only Transformer.

deep-learning pytorch transformer research-project representation-learning language-model latent-space gsm8k jepa math-reasoning

Updated Feb 28, 2026
Python

HyperKuvid-Labs / alphazero_llm_trainer

AlphaZero-style RL training for LLMs using MCTS on mathematical reasoning tasks (GSM8K). Student model explores reasoning paths guided by teacher ensembles and reward signals.

reinforcement-learning pytorch mcts alphazero gsm8k reward-model mathematical-reasonin

Updated Dec 20, 2025
Python

Shuichi346 / llm-benchmark-script

A tool to evaluate and compare local LLMs running on Ollama or LM Studio under identical conditions using deepeval's public benchmarks (MMLU, TruthfulQA, GSM8K).

python macos benchmark quantization model-evaluation apple-silicon llm gsm8k local-llm mmlu ollama lmstudio truthfulqa deepeval

Updated Mar 14, 2026
Python

manncodes / rlvr-gsm8k-benchmark

Comprehensive benchmarking framework for RLVR/RLHF libraries on the GSM8K mathematical reasoning dataset

benchmark machine-learning reinforcement-learning deep-learning evaluation transformers llm rlhf gsm8k rlvr

Updated Oct 7, 2025
Python

imadi124 / EfficientMath-AI

End-to-end fine-tuning of Llama-3.1-8B on GSM8K using Unsloth + LoRA. Includes quantization to GGUF, Gradio deployment on HF Spaces. Built on a free T4 GPU.

ai math inference text-generation lora huggingface-datasets huggingface-spaces generative-ai llamacpp gsm8k peft-fine-tuning-llm unsloth llama3

Updated Mar 29, 2026
Jupyter Notebook

Here are 30 public repositories matching this topic...

akjindal53244 / Arithmo

TianHongZXY / CoRe

om-ai-lab / open-agent-leaderboard

THU-KEG / DICE

declare-lab / LLM-ReasoningTest

ambideXtrous9 / SFT-GRPO-GSPO-Finetune-on-Qwen3-Unsloth-Reasoning-and-Non-Reasoning-Dataset

HAD653 / gsm8k-cot-120b

SuperBruceJia / GSM8K-Consistency

ZZZ150751 / cs336_spring2025_assignment5

msmrexe / llm-math-reasoning-analysis