Distilling Knowledge from Large
Language Models
A Comparative Analysis of Three
Research Papers
Introduction: Distillation
• Large Language Models (LLMs) are powerful tools with remarkable
capabilities.
• Distillation aims to transfer knowledge from large LLMs to smaller, more
manageable models.
• Motivations for distillation:
- Efficiency: Smaller models are faster and cheaper to run.
- Accessibility: Enables the use of open-source models.
- Customization: Facilitates tailoring models for specific tasks.
• This presentation analyzes three papers that explore different approaches
to LLM distillation.
• We will also discuss the potential of these techniques for application in
LawGPT.
Paper 1: Personalized Distillation
• Title: Personalized Distillation: Empowering Open-Sourced LLMs with Adaptive
Learning for Code Generation
• Key Idea:
- Standard Distillation: LLMs generate data, and smaller models learn from it.
- Personalized Distillation: Adapts to the student model's learning progress.
• Process:
1. Student model attempts a task.
2. Evaluation and feedback are provided.
3. Teacher model refines the attempt if needed.
• Benefit: More efficient learning with less data.
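The three-step process above can be sketched as a single refinement round. This is an illustrative sketch, not the paper's implementation: `student_generate`, `evaluate`, and `teacher_refine` are hypothetical stand-ins for the student model, an automatic checker (e.g. unit tests for generated code), and the teacher LLM.

```python
# Sketch of one personalized-distillation round (hypothetical interfaces).
def personalized_distillation_round(task, student_generate, evaluate, teacher_refine):
    attempt = student_generate(task)             # 1. student attempts the task
    passed, feedback = evaluate(task, attempt)   # 2. evaluation + feedback
    if passed:
        return attempt                           # correct attempts need no refinement
    # 3. teacher refines the failed attempt, conditioned on the feedback,
    #    producing a training target tailored to this student's mistake
    return teacher_refine(task, attempt, feedback)
```

Because refinement only happens on failures, the teacher's effort (and the resulting training data) concentrates on what the student cannot yet do.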
Paper 1: Potential Application to LawGPT
• Relate personalized distillation to the LawGPT project.
• How can it improve LawGPT's training?
- LawGPT attempts to answer legal questions.
- A powerful LLM provides feedback and refines the answers.
- Focus on mistakes allows for efficient learning.
• LawGPT Application:
1. A base LawGPT model attempts to answer legal queries.
2. A more powerful LLM (e.g., GPT-4) reviews the answer, providing
corrections and detailed feedback.
3. The base LawGPT model learns from its mistakes, iteratively improving its
legal reasoning and accuracy.
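Steps 1-3 could be used to harvest fine-tuning data that focuses on LawGPT's mistakes. This is a hedged sketch under assumed interfaces: `lawgpt_answer` and `reviewer_feedback` are hypothetical stand-ins for the base model and the stronger reviewing LLM (e.g. GPT-4), not real APIs.

```python
# Collect (question, revised answer) pairs for fine-tuning (hypothetical stand-ins).
def collect_training_pairs(questions, lawgpt_answer, reviewer_feedback):
    pairs = []
    for q in questions:
        draft = lawgpt_answer(q)                        # 1. base LawGPT answers
        correct, revised = reviewer_feedback(q, draft)  # 2. stronger LLM reviews
        # 3. keep the reviewer's revision only when the draft was wrong,
        #    so fine-tuning concentrates on the model's mistakes
        if not correct:
            pairs.append((q, revised))
    return pairs
```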
Paper 2: Divide-and-Conquer Distillation
• Title: Divide-or-Conquer? Which Part Should You Distill Your LLM?
• Key Idea: Break down complex reasoning into two phases:
- Decomposition: Breaking down problems into smaller parts.
- Solving: Executing solutions to the sub-problems.
• Hypothesis (as framed by the authors):
- Decomposition is easier to distill (general problem-solving skills).
- Solving is harder to distill (requires more domain knowledge).
• Two decomposition approaches:
– Static Approach: The LLM first decomposes the entire problem into sub-problems, then
solves each one.
– Dynamic Approach: The LLM decomposes part of the problem, solves it, and uses the
solution to guide further decomposition.
• The authors chose the static approach for its clearer separation of stages, easier
implementation, and potential for future integration into dynamic processes.
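The static/dynamic contrast above can be made concrete with a minimal sketch, assuming hypothetical `decompose`, `decompose_next`, and `solve` callables (none are from the paper):

```python
# Static: decompose the whole problem upfront, then solve each part.
def static_divide_and_conquer(problem, decompose, solve):
    sub_problems = decompose(problem)            # one upfront decomposition pass
    return [solve(sp) for sp in sub_problems]    # solve each part independently

# Dynamic: each decomposition step can see the solutions produced so far.
def dynamic_divide_and_conquer(problem, decompose_next, solve):
    solutions = []
    while True:
        sp = decompose_next(problem, solutions)  # next sub-problem, given progress
        if sp is None:                           # decomposer signals completion
            return solutions
        solutions.append(solve(sp))
```

In the static variant the decomposer and solver are fully separable (and hence separately distillable); in the dynamic variant they are interleaved.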
Paper 2: Potential Application to LawGPT
• Apply the divide-and-conquer strategy to LawGPT.
- Decomposition Model: A smaller model breaks down legal questions.
- Solving Model: A larger model answers the sub-questions.
• Benefit: More efficient use of computational resources.
• LawGPT Application:
1. A smaller LawGPT model could be trained to decompose complex legal
questions into simpler sub-questions.
2. A larger, more knowledgeable LawGPT model then answers these sub-
questions.
3. This enables a modular approach, where different models handle different
aspects of legal reasoning.
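The modular pipeline in steps 1-3 might look like the following sketch. `small_decomposer`, `large_solver`, and `combine` are assumed names for the two models and an answer-aggregation step; nothing here is from an actual LawGPT codebase.

```python
# Hypothetical decomposer/solver pipeline for a legal query.
def answer_legal_query(query, small_decomposer, large_solver, combine):
    sub_questions = small_decomposer(query)                   # 1. small model decomposes
    sub_answers = [large_solver(sq) for sq in sub_questions]  # 2. large model solves each
    return combine(query, sub_answers)                        # 3. merge into one answer
```

Because each stage is a separate callable, either model can be swapped or distilled independently.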
Paper 3: Distillation with Explanations
• Title: Distillation with Explanations from Large Language Models
• Key Idea: LLM-generated explanations can aid distillation even when the LLM's
answers are incorrect.
• Observation: LLM explanations are often consistent with their (incorrect) answers.
• Method:
– Combine ground truth labels with LLM-generated explanations to train a smaller model.
– LLM explanations have value even if the answer is wrong because they show the model's
reasoning.
– Smaller models can learn valuable reasoning steps from these explanations.
– By combining these explanations with correct labels, we can train models to be both accurate
and capable of reasoning.
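The method above amounts to building training targets that keep the LLM's reasoning but substitute the verified label. A minimal sketch, with illustrative field names and target format that are not from the paper:

```python
# Pair the question with the gold label plus the LLM's explanation,
# discarding the LLM's own (possibly wrong) answer.
def build_distillation_example(question, gold_label, llm_explanation):
    return {
        "input": question,
        # target combines the LLM's reasoning with the *correct* label,
        # even when the LLM's original answer disagreed with gold_label
        "target": f"{llm_explanation}\nAnswer: {gold_label}",
    }
```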
• Challenge: LLMs can be incorrect:
- LLMs have shown impressive capabilities in language tasks.
- However, they can generate incorrect or inaccurate answers.
- Noisy data from incorrect answers can negatively affect model training.
Paper 3: Potential Application to LawGPT
• Use LLMs to generate explanations for legal concepts, even with occasional errors.
• Train LawGPT using a combination of:
- LLM-generated explanations.
- Ground truth labels.
• Benefit: Leverage LLM reasoning while maintaining accuracy.
• LawGPT Application:
1. When training LawGPT, use a larger LLM to generate explanations for its
answers.
2. Combine these explanations with a dataset of legal questions and verified
correct answers.
3. This helps LawGPT learn to provide both accurate answers and sound legal
reasoning.
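Steps 1-3 could be realized as a dataset-augmentation pass over a verified legal QA corpus. This is a sketch under assumptions: `explain` stands in for the larger LLM's explanation call, and the record fields are illustrative.

```python
# Augment verified legal QA pairs with LLM-generated explanations (hypothetical interface).
def augment_legal_dataset(qa_pairs, explain):
    augmented = []
    for question, verified_answer in qa_pairs:
        explanation = explain(question)   # 1. larger LLM explains its reasoning
        augmented.append({                # 2. pair explanation with the verified answer
            "question": question,
            "answer": verified_answer,
            "explanation": explanation,
        })
    return augmented                      # 3. fine-tune LawGPT on answers + reasoning
```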