Small and Efficient Mathematical Reasoning LLMs
Updated Jan 27, 2024 - Python
[ACL 2023] Solving Math Word Problems via Cooperative Reasoning induced Language Models (LLMs + MCTS + Self-Improvement)
Reproducible Language Agent Research
DICE: Detecting In-distribution Data Contamination with LLM's Internal State
GRPO and SFT fine-tuning of Qwen3 using Unsloth: reasoning and non-reasoning datasets
Short-CoT distilled GSM8K dataset generated with OpenAI gpt-oss-120b.
GSM8K-Consistency is a benchmark database for analyzing the consistency of Arithmetic Reasoning on GSM8K.
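CS336 Assignment 5: LLM alignment and reinforcement learning for reasoning, based on the Qwen2.5 model. Fully implements supervised fine-tuning (SFT) and Group Relative Policy Optimization (GRPO), and compares zero-shot, on-policy, and off-policy training and evaluation on the GSM8K dataset.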
An evaluation of prompting techniques (Zero-Shot CoT, Few-Shot, Self-Consistency) on the Mistral-7B model for mathematical reasoning. This project systematically benchmarks 7 distinct methods on the GSM8K dataset.
Multi-path reasoning with dynamic chains and consensus scoring for improved GSM8K benchmark performance.
Nano R1 Model is an AI-driven reasoning model built using reinforcement learning techniques. It focuses on decision-making and adaptability in dynamic environments, utilizing state-of-the-art machine learning methods to improve over time. Developed with Python and hosted on Hugging Face.
AlphaZero-style RL training for LLMs using MCTS on mathematical reasoning tasks (GSM8K). Student model explores reasoning paths guided by teacher ensembles and reward signals.
Hard Reasoning Benchmark filtered with disagreement scores
A minimal JEPA-based language model demonstrating latent-space reasoning on GSM8K using a single decoder-only Transformer.
A tool to evaluate and compare local LLMs running on Ollama or LM Studio under identical conditions using deepeval's public benchmarks (MMLU, TruthfulQA, GSM8K).
Comprehensive benchmarking framework for RLVR/RLHF libraries on GSM8K mathematical reasoning dataset
End-to-end fine-tuning of Llama-3.1-8B on GSM8K using Unsloth + LoRA. Includes quantization to GGUF, Gradio deployment on HF Spaces. Built on a free T4 GPU.