GLM-4 Fine-tuning for CMCC-34 Intent Classification

This repository contains a complete pipeline for fine-tuning GLM-4-9B on the CMCC-34 dataset for intent classification using QLoRA (Quantized Low-Rank Adaptation). The project includes data preparation, LLM-baseddata augmentation, model training, evaluation, and inference capabilities.

🚀 Quick Start

1. Environment Setup

# Install dependencies
pip install torch transformers peft bitsandbytes accelerate
pip install scikit-learn matplotlib seaborn tqdm

# For CUDA support
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118

2. Data Preparation & Augmentation

# Convert raw CSV data to GLM-4 format with system prompts
cd data/cmcc-34
python convert_data.py

# Run LLM-based data augmentation to address class imbalance
cd ../aug
python run_augmentation.py --dry-run    # Test configuration
python run_augmentation.py              # Run full augmentation

### 3. Model Training

```bash
# Train the model using QLoRA
cd finetune
python train_cmcc34_system_prompt.py

4. Model Evaluation

# Quick evaluation (100 samples)
cd evaluation
python evaluate.py --q

# Full evaluation
python evaluate.py --model-path ../finetune/output/cmcc34_qlora_system_prompt/checkpoint-5000

5. Model Inference

# Interactive CLI
cd inference
python trans_cli_finetuned_demo.py

# Web interface
python trans_web_finetuned_demo.py

## 🆕 Data Augmentation Pipeline

### Problem Statement

The CMCC-34 dataset suffers from severe class imbalance:
- **Most frequent class**: 1,961 samples (Class 0: 咨询业务规定)
- **Least frequent classes**: 1 sample each (Class 32: 办理销户/重开, Class 33: 咨询电商货品信息)
- **Imbalance ratio**: 1961:1
- **Gini coefficient**: 0.651 (high inequality)


### Usage
```bash
# Start GLM-4 API server
cd inference
python glm4v_server.py

# Run augmentation
cd ../aug
python run_augmentation.py --dry-run    # Test configuration
python run_augmentation.py              # Run full augmentation

## 🔧 Configuration

### Training Configuration

The training uses QLoRA with the following key parameters:

- **Base Model**: GLM-4-9B-0414
- **Quantization**: 4-bit (QLoRA)
- **LoRA Rank**: 16
- **LoRA Alpha**: 64
- **Learning Rate**: 2e-4
- **Batch Size**: 4
- **Max Steps**: 5000

### Evaluation Configuration

- **Test Dataset**: CMCC-34 test set
- **Metrics**: Accuracy, F1-Macro, F1-Weighted
- **Output**: Failed predictions, confusion matrix, detailed analysis

## 📊 Model Performance

### Available Checkpoints

Multiple checkpoints are available in `finetune/output/cmcc34_qlora_system_prompt/`:

- `checkpoint-500/` - Early training
- `checkpoint-1000/` - 1000 steps
- `checkpoint-2000/` - 2000 steps
- `checkpoint-3000/` - 3000 steps
- `checkpoint-4000/` - 4000 steps
- `checkpoint-5000/` - Latest (recommended)

### Performance Metrics

The model achieves:
- **Accuracy**: ~85-90% (depending on checkpoint)
- **F1-Macro**: ~0.85-0.90
- **F1-Weighted**: ~0.85-0.90

## 🎯 Intent Classification

The model classifies user intents into 34 categories:

| Intent ID | Intent Name |
|-----------|-------------|
| 0 | 咨询（含查询）业务规定 |
| 1 | 办理取消 |
| 2 | 咨询（含查询）业务资费 |
| 3 | 咨询（含查询）营销活动信息 |
| 4 | 咨询（含查询）办理方式 |
| 5 | 投诉（含抱怨）业务使用问题 |
| 6 | 咨询（含查询）账户信息 |
| 7 | 办理开通 |
| 8 | 咨询（含查询）业务订购信息查询 |
| 9 | 投诉（含抱怨）不知情定制问题 |
| 10 | 咨询（含查询）产品/业务功能 |
| ... | ... |

## 🛠️ Usage Examples

### Command Line Evaluation

```bash
# Quick evaluation with 50 samples
python evaluate.py --quick --samples 50

# Full evaluation with custom model path
python evaluate.py --model-path ../finetune/output/cmcc34_qlora_system_prompt/checkpoint-5000

# Custom batch size and output directory
python evaluate.py --batch-size 25 --output-dir my_evaluation_results

Programmatic Usage

from evaluation.evaluate import SystemPromptEvaluator

# Initialize evaluator
evaluator = SystemPromptEvaluator(
    base_model_path="THUDM/GLM-4-9B-0414",
    finetuned_path="finetune/output/cmcc34_qlora_system_prompt/checkpoint-5000",
    test_file="finetune/data/cmcc-34/test.jsonl",
    output_dir="evaluation_results"
)

# Load model and evaluate
evaluator.load_model()
test_data = evaluator.load_test_data()
results = evaluator.evaluate_batch(test_data)

# Print results
evaluator.print_results(results)
evaluator.save_final_results(results)

Interactive Inference

# Start interactive CLI
cd inference
python trans_cli_finetuned_demo.py

# Example conversation:
# User: 我想查询我的余额
# Assistant: 意图：6:咨询（含查询）账户信息

# GLM-4V Vision model
python trans_cli_vision_demo.py

# Batch inference
python trans_batch_demo.py

📈 Evaluation Results

The evaluation script generates comprehensive results:

Output Files

failed_predictions.json - Failed prediction details
error_predictions.json - Wrong predictions analysis
confusion_matrix.png - Visual confusion matrix
detailed_analysis.json - Per-class performance metrics
system_prompt_evaluation_results.json - Complete results

Analysis Features

Failed Prediction Analysis: Detailed error categorization
Confusion Matrix: Visual representation of classification errors
Per-Class Performance: Precision, recall, F1-score for each intent
Error Distribution: Most confused intent pairs
Retry Logic: Automatic handling of long inputs and memory errors

🔍 Troubleshooting

Common Issues

CUDA Out of Memory

# Reduce batch size
python evaluate.py --batch-size 10

# Use smaller checkpoint
python evaluate.py --model-path checkpoint-2000

Model Loading Errors

# Check model path
ls finetune/output/cmcc34_qlora_system_prompt/

# Verify base model
python -c "from transformers import AutoTokenizer; AutoTokenizer.from_pretrained('THUDM/GLM-4-9B-0414')"

Data Loading Issues

# Regenerate dataset
cd finetune/data/cmcc-34
python regenerate_dataset.py

Memory Requirements

Training: ~24GB GPU memory (with QLoRA)
Evaluation: ~10GB GPU memory
Inference: ~10GB GPU memory
Data Augmentation: ~15GB GPU memory

🚀 Advanced Usage

Custom Training

# Modify training config
vim finetune/configs/cmcc34_qlora_system_prompt.yaml

# Train with custom parameters
python train_cmcc34_system_prompt.py --config custom_config.yaml

Custom Evaluation

# Evaluate on custom test set
python evaluate.py --test-file custom_test.jsonl

# Compare multiple checkpoints
for checkpoint in 1000 2000 3000 4000 5000; do
    python evaluate.py --model-path checkpoint-$checkpoint --output-dir eval_$checkpoint
done

Production Deployment

# Load model for production
from inference.model_loader import load_finetuned_model

model, tokenizer = load_finetuned_model(
    base_model_path="THUDM/GLM-4-9B-0414",
    finetuned_path="finetune/output/cmcc34_qlora_system_prompt/checkpoint-5000"
)

# Batch inference
def classify_intents(texts):
    results = []
    for text in texts:
        intent = model.generate_intent(text)
        results.append(intent)
    return results

📚 References

🤝 Contributing

Fork the repository
Create a feature branch
Make your changes
Add tests if applicable
Submit a pull request

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🙏 Acknowledgments

THUDM for the GLM-4 model
Microsoft for the QLoRA technique
The open-source community for various tools and libraries

Name		Name	Last commit message	Last commit date
Latest commit History 251 Commits
.github		.github
aug		aug
chip		chip
data		data
demo		demo
eval		eval
finetune		finetune
inference		inference
prompt_engineer		prompt_engineer
rag		rag
.gitignore		.gitignore
.pre-commit-config.yaml		.pre-commit-config.yaml
=0.0.10		=0.0.10
LICENSE		LICENSE
README.md		README.md
README_ZH.md		README_ZH.md
pyproject.toml		pyproject.toml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

GLM-4 Fine-tuning for CMCC-34 Intent Classification

🚀 Quick Start

1. Environment Setup

2. Data Preparation & Augmentation

4. Model Evaluation

5. Model Inference

Programmatic Usage

Interactive Inference

📈 Evaluation Results

Output Files

Analysis Features

🔍 Troubleshooting

Common Issues

Memory Requirements

🚀 Advanced Usage

Custom Training

Custom Evaluation

Production Deployment

📚 References

🤝 Contributing

📄 License

🙏 Acknowledgments

About

Uh oh!

Releases

Packages

Languages

License

Trevoewu/GLM-4

Folders and files

Latest commit

History

Repository files navigation

GLM-4 Fine-tuning for CMCC-34 Intent Classification

🚀 Quick Start

1. Environment Setup

2. Data Preparation & Augmentation

4. Model Evaluation

5. Model Inference

Programmatic Usage

Interactive Inference

📈 Evaluation Results

Output Files

Analysis Features

🔍 Troubleshooting

Common Issues

Memory Requirements

🚀 Advanced Usage

Custom Training

Custom Evaluation

Production Deployment

📚 References

🤝 Contributing

📄 License

🙏 Acknowledgments

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages