LLM-Paper-To-Code

A collection of Jupyter notebooks implementing advanced LLM prompting techniques to solve Logic Grid Puzzles (Zebra Puzzles) using the ZebraLogicBench dataset.

Overview

This repository demonstrates practical implementations of research papers on Large Language Model (LLM) prompting strategies. The notebooks use GitHub Models (free tier) to solve constraint satisfaction problems in the form of Logic Grid Puzzles.

Logic Grid Puzzles, also known as Zebra Puzzles, are constraint satisfaction problems where you must deduce a unique correct assignment of values to houses based on given clues. These puzzles are commonly used to test logical reasoning abilities in exams such as the Law School Admission Test (LSAT).

Notebooks

1. Self_Consistency_Code.ipynb

Open In Colab

Implements the Self-Consistency prompting technique, which improves reasoning accuracy by:

  • Sampling multiple reasoning paths from the LLM
  • Extracting final answers from each sample
  • Selecting the most consistent (most common) answer

This technique is based on the paper: "Self-Consistency Improves Chain of Thought Reasoning in Language Models"

Key Features:

  • Uses Chain-of-Thought (CoT) prompting
  • Generates multiple samples with temperature-based sampling
  • Implements majority voting for answer selection (see the sketch after this list)
  • Compares results against ground truth
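
A minimal sketch of the voting step, assuming hypothetical `generate` and `extract_answer` helpers (a temperature-sampled LLM call and a parser that pulls the final answer out of a completion, respectively):

    import json
    from collections import Counter

    def self_consistency(prompt, n_samples=5, temperature=0.8):
        """Sample several reasoning paths and majority-vote over the extracted answers."""
        votes = Counter()
        parsed = {}
        for _ in range(n_samples):
            completion = generate(prompt, temperature=temperature)  # hypothetical LLM call
            answer = extract_answer(completion)                     # hypothetical answer parser
            if answer is None:
                continue
            key = json.dumps(answer, sort_keys=True)  # canonical form so dict answers are countable
            votes[key] += 1
            parsed[key] = answer
        best_key, _count = votes.most_common(1)[0]
        return parsed[best_key]

The more the samples agree, the more reliable the selected answer tends to be; the notebook additionally compares the winning answer against the dataset's ground truth.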

2. DynamicCheatSheet.ipynb

Open In Colab

Implements a dynamic few-shot learning approach with iteratively generated "cheat sheets":

  • Dynamically generates example-based guidance
  • Uses previously solved examples to improve performance
  • Demonstrates adaptive prompting strategies

This technique is based on the paper: "Dynamic Cheatsheet: Test-Time Learning with Adaptive Memory" by Suzgun et al.

Prerequisites

  • Python 3.7+
  • Jupyter Notebook or Google Colab
  • GitHub account (for GitHub Models free tier)

Setup

  1. Install dependencies:

    pip install azure-ai-inference datasets
  2. Get a GitHub token:

    • GitHub Models provides free access to various LLMs
    • You'll need a GitHub personal access token (used in the client setup sketch after these steps)
    • Check rate limits: GitHub Models Documentation
  3. Set up Hugging Face access (for dataset):

    • You'll need access to the ZebraLogicBench dataset
    • Login to Hugging Face in the notebook
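
A minimal client setup sketch for calling GitHub Models through azure-ai-inference; the endpoint URL and model identifier are assumptions based on the GitHub Models documentation, so verify them against the current docs and the notebooks:

    import os

    from azure.ai.inference import ChatCompletionsClient
    from azure.ai.inference.models import SystemMessage, UserMessage
    from azure.core.credentials import AzureKeyCredential

    # GITHUB_TOKEN is a personal access token; the endpoint below is the
    # GitHub Models inference endpoint at the time of writing (assumption).
    client = ChatCompletionsClient(
        endpoint="https://models.inference.ai.azure.com",
        credential=AzureKeyCredential(os.environ["GITHUB_TOKEN"]),
    )

    response = client.complete(
        model="gpt-4o",  # adjust to match GitHub Models naming conventions
        temperature=0.8,
        messages=[
            SystemMessage(content="You solve Logic Grid Puzzles step by step."),
            UserMessage(content="<puzzle text here>"),
        ],
    )
    print(response.choices[0].message.content)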

Usage

Running in Google Colab (Recommended)

  1. Click on the "Open In Colab" badge above each notebook
  2. Follow the setup instructions in the notebook
  3. Add your GitHub token when prompted
  4. Run all cells

Running Locally

  1. Clone this repository:

    git clone https://github.com/pacozaa/LLM-Paper-To-Code.git
    cd LLM-Paper-To-Code
  2. Start Jupyter Notebook:

    jupyter notebook
  3. Open the desired notebook and follow the instructions

Dataset

This project uses the ZebraLogicBench dataset, a benchmark of Logic Grid Puzzles hosted on Hugging Face, whose ground-truth solutions are used for evaluation.
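
Loading it with the datasets library looks roughly like this; the repository id, config name, and field names are assumptions, so check the notebooks for the exact identifiers:

    from datasets import load_dataset

    # Requires a Hugging Face login with access to the dataset.
    # Repository id, config, and split are assumptions; see the notebooks for exact values.
    dataset = load_dataset("allenai/ZebraLogicBench", "grid_mode", split="test")

    example = dataset[0]
    print(example["puzzle"])    # the clues presented to the model (field name assumed)
    print(example["solution"])  # the ground-truth assignment (field name assumed)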

Prompt Templates

The repository includes YAML prompt templates for few-shot learning:

  • zebra-logic-1.prompt.yml - Basic prompt template
  • zebra-logic-2-longer.prompt.yml - Extended prompt template with more examples

These templates demonstrate:

  • System prompts for puzzle-solving
  • Few-shot examples with reasoning steps
  • Structured JSON output format (see the loading sketch below)
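
A template can be loaded into a notebook with PyYAML, for example; the field names below ("messages", "role", "content") are assumptions about the .prompt.yml layout, so inspect the template files for the actual structure:

    import yaml  # pip install pyyaml

    with open("zebra-logic-1.prompt.yml") as f:
        template = yaml.safe_load(f)

    # Convert the template's message list into the role/content pairs expected by the client
    messages = [
        {"role": m["role"], "content": m["content"]}
        for m in template["messages"]
    ]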

Techniques Implemented

  1. Chain-of-Thought (CoT) Prompting: Encouraging step-by-step reasoning
  2. Self-Consistency: Sampling multiple reasoning paths and majority voting
  3. Few-Shot Learning: Using example problems to guide the model
  4. Dynamic Examples: Iteratively building better prompts from solved examples

Models Tested

The notebooks work with various models available through GitHub Models:

  • GPT-4 variants: openai/gpt-4o, openai/gpt-4.1
  • Microsoft Phi: phi-4 (may also be referenced as microsoft/phi-4)

Note: Model identifiers should match the GitHub Models API naming conventions. Some models may accept shorthand names.

References

  • Self-Consistency Paper: Wang et al., "Self-Consistency Improves Chain of Thought Reasoning in Language Models", arXiv:2203.11171
  • Dynamic Cheatsheet Paper: Suzgun, Yuksekgonul, Bianchi, Jurafsky, and Zou, "Dynamic Cheatsheet: Test-Time Learning with Adaptive Memory", arXiv:2504.07952
  • ZebraLogicBench: A benchmark for evaluating logical reasoning in language models
  • GitHub Models: Prototyping with AI Models

Contributing

Contributions are welcome! Feel free to:

  • Add new prompting techniques
  • Improve existing implementations
  • Add more comprehensive examples
  • Enhance documentation

License

This project is open source and available for educational purposes.
