houdhoudGH/QuizGen


✨ QuizGen: LLM-Powered MCQ Generator

Turn any document into ready-to-use multiple-choice quizzes with Llama 3.1, LangChain, and a two-stage generate-and-review pipeline.



Upload a PDF or TXT file, pick a subject and difficulty, and get a clean CSV of quiz questions back,
each one generated and reviewed by Llama 3.1 before it reaches you.


[Screenshot: QuizGen interface]

🌟 What Makes This Project Different

LangChain MCQ generators are everywhere. This one earns its place through four engineering choices:

  • 🔗 Two-stage SequentialChain: generation and review are separate LLM passes; the second pass critiques the first and rewrites unsuitable questions before the user ever sees them
  • 📐 Structured output enforcement: a strict JSON schema injected into the prompt (RESPONSE_JSON) keeps the LLM output reliably parseable, with no regex cleanup needed
  • 📦 Production packaging: a proper src/mcqgenerator/ Python package with setup.py, a logging module, and separate utilities, not a notebook dump
  • 🎨 Polished UI: a custom Streamlit theme with a gradient hero, card-based MCQ display, and color-coded answer reveals, not the default look

๐Ÿ—บ๏ธ The Two-Stage Pipeline

┌──────────────────────────────────────────────┐
│       User uploads PDF / TXT + settings      │
│   (subject, difficulty, number of MCQs)      │
└─────────────────────┬────────────────────────┘
                      │
                      ▼
        ┌──────────────────────────┐
        │   Stage 1: Quiz Chain    │  Llama 3.1 generates N MCQs
        │   PromptTemplate +       │  with strict JSON schema
        │   LLMChain               │  Output → quiz (JSON string)
        └─────────────┬────────────┘
                      │
                      ▼
        ┌──────────────────────────┐
        │  Stage 2: Review Chain   │  Llama 3.1 critiques the quiz
        │  Same LLM, new prompt    │  for difficulty + suitability
        │  Rewrites unsuitable Qs  │  Output → review + corrections
        └─────────────┬────────────┘
                      │
                      ▼
        ┌──────────────────────────┐
        │   JSON parsing → table   │  Structured to: MCQ | Choices | Correct
        └─────────────┬────────────┘
                      │
                      ▼
        ┌──────────────────────────┐
        │   Streamlit UI + CSV     │  Preview + one-click download
        └──────────────────────────┘

The SequentialChain orchestrates both stages: Stage 1's output feeds Stage 2 as input, all in one call from the Streamlit app.


💬 QuizGen in Action

🎨 Clean, intuitive interface

Upload a document, configure subject/difficulty, hit generate. That's it.

[Screenshot: main interface]

📋 Beautiful card-based quiz display

Each MCQ renders as its own card with a numbered badge, its choices, and a color-coded reveal of the correct answer.

[Screenshot: generated quiz results]

🔄 Multiple questions, consistent quality

The two-stage pipeline ensures every question gets reviewed before display.

[Screenshot: more quiz questions]

๐Ÿ” AI Quality Review + CSV Export

After generation, the AI's own quality review is shown alongside a one-click CSV export:

AI review and export

📊 Example Output

Input: sample_ml_fundamentals.txt (a 6 KB article on machine learning basics)
Settings: Subject = Machine Learning, Difficulty = Medium, Number = 6

A few of the generated questions:

| # | Question | Correct |
| --- | --- | --- |
| 1 | What type of machine learning uses labeled data to learn from examples? | c (Supervised learning) |
| 3 | What is the term for a model that learns the training data too well and performs poorly on new data? | b (Overfitting) |
| 6 | What is the term for the phenomenon where a model's performance degrades over time as the world changes? | a (Concept drift) |

Plus an AI quality review: "Moderate difficulty. No changes needed."


🔬 How It Works

Stage 1: Quiz Generation Prompt

The LLM is given the source text plus a strict instruction to return only valid JSON matching a schema:

TEMPLATE_QUIZ = """
{system_msg}
Context: {text}
Your task is to write exactly {number} multiple-choice questions based on the
above content. The questions should be appropriate for {subject} students and
written in a {difficulty} difficulty.
Return ONLY a JSON object matching the format shown in RESPONSE_JSON below.
Do not include any extra explanation.

### RESPONSE_JSON
{response_json}
"""

The RESPONSE_JSON (loaded from Response.json) gives the model a concrete schema to mimic:

{
  "1": {
    "mcq": "multiple choice question",
    "options": {"a": "choice", "b": "choice", "c": "choice", "d": "choice"},
    "correct": "correct answer"
  }
}

Showing the model the exact output shape is generally more reliable than describing the format in prose.
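As a minimal sketch of how the schema might be injected, the snippet below fills the Stage 1 template with plain `str.format`. The helper name `build_quiz_prompt` and the default `system_msg` wording are assumptions made for illustration, and the schema is inlined rather than read from Response.json so the example runs on its own.

```python
import json

# Template text copied from the section above.
TEMPLATE_QUIZ = """
{system_msg}
Context: {text}
Your task is to write exactly {number} multiple-choice questions based on the
above content. The questions should be appropriate for {subject} students and
written in a {difficulty} difficulty.
Return ONLY a JSON object matching the format shown in RESPONSE_JSON below.
Do not include any extra explanation.

### RESPONSE_JSON
{response_json}
"""

# Same shape as Response.json, inlined to keep the sketch self-contained.
RESPONSE_JSON = {
    "1": {
        "mcq": "multiple choice question",
        "options": {"a": "choice", "b": "choice", "c": "choice", "d": "choice"},
        "correct": "correct answer",
    }
}


def build_quiz_prompt(text, number, subject, difficulty,
                      system_msg="You are an expert MCQ writer."):
    """Hypothetical helper: fill the Stage 1 template with user settings.

    The system_msg default is an assumed wording, not the project's own.
    """
    return TEMPLATE_QUIZ.format(
        system_msg=system_msg,
        text=text,
        number=number,
        subject=subject,
        difficulty=difficulty,
        # The schema is serialized so the model sees literal JSON to mimic.
        response_json=json.dumps(RESPONSE_JSON, indent=2),
    )


prompt = build_quiz_prompt(
    "Supervised learning uses labeled data.", 2, "Machine Learning", "Medium"
)
```

Because the schema is passed as a format value rather than written into the template, its curly braces need no escaping.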

Stage 2: Review Chain

The generated quiz is fed back into the LLM with a new prompt:

TEMPLATE_REVIEW = """
{system_msg}
Below is a quiz for {subject} students. Review its difficulty in no more
than 50 words. If any question is not suitable, rewrite only the problem
parts in a suitable difficulty.

Quiz:
{quiz}
"""

This catches questions that are too easy, too obscure, or off-topic, which makes it an important safety net for educational content.

SequentialChain Orchestration

Both chains are wired together so a single call produces both outputs:

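from langchain.chains import SequentialChain

# quiz_chain and review_chain are the two LLMChain objects built from
# TEMPLATE_QUIZ and TEMPLATE_REVIEW; their output keys are assumed to be
# "quiz" and "review" so they match output_variables below.
combined_chain = SequentialChain(
    chains=[quiz_chain, review_chain],
    # Inputs supplied by the caller; "quiz" is produced by stage 1.
    input_variables=["system_msg", "text", "number", "subject",
                     "difficulty", "response_json"],
    output_variables=["quiz", "review"],
    verbose=True  # log each stage while the chain runs
)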
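To make the data flow concrete without needing LangChain or an API token, the sketch below mimics the same generate-then-review hand-off with plain functions. Everything in it (`fake_llm`, `quiz_step`, `review_step`, `run_pipeline`) is a hypothetical stand-in, not LangChain API and not the project's code.

```python
import json


def fake_llm(prompt):
    """Stand-in for the Llama 3.1 call: returns canned text per stage."""
    if prompt.startswith("REVIEW"):
        return "Moderate difficulty. No changes needed."
    quiz = {
        "1": {
            "mcq": "Which type of learning uses labeled data?",
            "options": {"a": "Clustering", "b": "Supervised learning",
                        "c": "PCA", "d": "Reinforcement learning"},
            "correct": "b",
        }
    }
    return json.dumps(quiz)


def quiz_step(state):
    """Stage 1: reads the user inputs and writes the 'quiz' key."""
    prompt = f"QUIZ {state['number']} questions for {state['subject']}: {state['text']}"
    return {"quiz": fake_llm(prompt)}


def review_step(state):
    """Stage 2: reads 'quiz' (written by stage 1) and writes 'review'."""
    prompt = f"REVIEW for {state['subject']} students:\n{state['quiz']}"
    return {"review": fake_llm(prompt)}


def run_pipeline(inputs, steps, output_keys):
    """Run the steps in order, merging each step's outputs into shared state."""
    state = dict(inputs)
    for step in steps:
        state.update(step(state))
    return {key: state[key] for key in output_keys}


result = run_pipeline(
    {"text": "Supervised learning uses labeled data.",
     "number": 1, "subject": "Machine Learning"},
    steps=[quiz_step, review_step],
    output_keys=["quiz", "review"],
)
```

The shared `state` dictionary is what lets the second step see the first step's output by name, which is the same role the `quiz` variable plays between the two chains.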

๐Ÿ“ Repository Structure

quizgen/
โ”œโ”€โ”€ src/
โ”‚   โ””โ”€โ”€ mcqgenerator/
โ”‚       โ”œโ”€โ”€ __init__.py
โ”‚       โ”œโ”€โ”€ MCQgenerator.py    # SequentialChain definition
โ”‚       โ”œโ”€โ”€ utils.py           # read_file, get_table_data
โ”‚       โ””โ”€โ”€ logger.py          # Logging setup
โ”œโ”€โ”€ docs/
โ”‚   โ””โ”€โ”€ images/                # Screenshots for this README
โ”œโ”€โ”€ app.py                     # Streamlit web interface
โ”œโ”€โ”€ mcq.ipynb                  # Pipeline development notebook
โ”œโ”€โ”€ test.py                    # Logging test
โ”œโ”€โ”€ data.txt                   # Sample input
โ”œโ”€โ”€ Response.json              # JSON schema template
โ”œโ”€โ”€ quiz.csv                   # Sample output
โ”œโ”€โ”€ requirements.txt
โ”œโ”€โ”€ setup.py
โ””โ”€โ”€ .env                       # HUGGING_FACE_API_KEY (git-ignored)
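The tree lists a `get_table_data` helper in utils.py, but this README does not show its body. The following is a guess at what such a helper could look like, assuming the quiz arrives as a JSON string in the Response.json shape and is flattened into the MCQ, Choices, and Correct columns named in the pipeline diagram. The real implementation may differ.

```python
import json


def get_table_data(quiz_str):
    """Hypothetical sketch: turn the quiz JSON string into a list of row dicts."""
    quiz = json.loads(quiz_str)
    rows = []
    for item in quiz.values():
        # Join the lettered options into one readable cell.
        choices = " | ".join(
            f"{label}: {text}" for label, text in item["options"].items()
        )
        rows.append({
            "MCQ": item["mcq"],
            "Choices": choices,
            "Correct": item["correct"],
        })
    return rows


sample = json.dumps({
    "1": {
        "mcq": "Which type of learning uses labeled data?",
        "options": {"a": "Clustering", "b": "Supervised learning",
                    "c": "PCA", "d": "Reinforcement learning"},
        "correct": "b",
    }
})
rows = get_table_data(sample)
```

`json.loads` raises `json.JSONDecodeError` on malformed model output, so a caller would probably want to catch that and show a retry message instead of a traceback.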

โš™๏ธ Tech Stack

Layer Technology
LLM Llama 3.1 8B Instruct (via HuggingFace Inference API, auto-provider routing)
Orchestration LangChain LLMChain + SequentialChain + PromptTemplate
Web UI Streamlit (custom CSS theme)
PDF Parsing PyPDF2
Data Handling Pandas
Env Management python-dotenv
Packaging setuptools

🚀 Getting Started

1. Clone the repo

git clone https://github.com/houdhoudGH/quizgen.git
cd quizgen

2. Create a virtual environment and install dependencies

python -m venv .venv
source .venv/bin/activate      # Windows: .venv\Scripts\activate
pip install -r requirements.txt

3. Set up your API key

Create a .env file:

HUGGING_FACE_API_KEY=your-hf-token

Get a free token at huggingface.co/settings/tokens.
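A small sketch of how the key might be read at startup follows. The function name `get_hf_token` is hypothetical; `load_dotenv` is python-dotenv's loader, and the import is wrapped so the snippet still runs where the package is not installed.

```python
import os

ENV_VAR = "HUGGING_FACE_API_KEY"  # variable name from the .env example above


def get_hf_token(env=None):
    """Return the token from a mapping (os.environ by default) or fail loudly."""
    env = os.environ if env is None else env
    token = (env.get(ENV_VAR) or "").strip()
    if not token:
        raise RuntimeError(f"{ENV_VAR} is not set; add it to .env or export it.")
    return token


try:
    # python-dotenv copies KEY=value pairs from .env into os.environ.
    from dotenv import load_dotenv
    load_dotenv()
except ImportError:
    pass  # fall back to variables already exported in the shell
```

Failing early with a clear message tends to be friendlier than letting the first API call return an authentication error.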

4. Run the app

streamlit run app.py

Open http://localhost:8501 🚀

5. Use it

  1. Upload a PDF or TXT file
  2. Pick subject, difficulty (Simple/Medium/Hard), and number of MCQs
  3. Click Generate MCQs
  4. Review the generated quiz on screen
  5. Download the result as CSV
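For the final download step, here is one possible way to serialize quiz rows to CSV text. It uses the standard-library `csv` module as a stand-in, whereas the project lists Pandas for data handling, so treat `rows_to_csv` and the column names as illustrative assumptions.

```python
import csv
import io

COLUMNS = ["MCQ", "Choices", "Correct"]  # column names assumed from the pipeline diagram


def rows_to_csv(rows):
    """Serialize a list of row dicts to CSV text with a header line."""
    buffer = io.StringIO(newline="")
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


csv_text = rows_to_csv([
    {"MCQ": "Which type of learning uses labeled data?",
     "Choices": "a: Clustering | b: Supervised learning",
     "Correct": "b"},
])
```

In a Streamlit app, text like this could be handed to `st.download_button` as its `data` argument, with `file_name` and `mime="text/csv"` set to suit.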

🔮 Roadmap

The app ships as a working two-stage LLM pipeline. Six directions for hardening and extending it:

  • Containerization: Dockerfile + docker-compose.yml for one-command deployment
  • CI/CD: GitHub Actions workflow for automated testing and linting
  • Question variety: extend beyond MCQs to true/false, fill-in-the-blank, and short-answer
  • Interactive mode: let users actually take the quiz in-app, with scoring
  • Multilingual: generate quizzes in Arabic, French, and English from the same source
  • Export formats: Anki deck, Kahoot CSV, Google Forms import

📄 License

MIT. See LICENSE for details.


🎓 About This Project

QuizGen explores multi-stage LLM orchestration: it uses a generate-then-critique pipeline to produce educational content that is more reliable than a single-shot prompt could deliver.


Made with 💜 by Gheffari Nour El Houda

Master 2 Data Science & NLP · AI Engineer


LangChain · Llama 3.1 · HuggingFace · Streamlit


If you found this useful, consider giving the repo a ⭐
