A plug-and-play AI visual inspection system that uses transfer learning (ResNet-18) to detect manufacturing defects in real time. Built for industrial quality control applications.
- Transfer Learning: ResNet-18 pre-trained on ImageNet, fine-tuned for defect detection
- Visual Explanations: Grad-CAM heatmaps show which regions influenced predictions
- Real-time Inference: Process images from a camera feed or uploaded files
- REST API: FastAPI backend for easy integration into existing systems
- Interactive Dashboard: Streamlit web interface for monitoring and analysis
- ONNX Export: Deploy models in production environments
- 6 Defect Classes: Crazing, Inclusion, Patches, Pitted Surface, Rolled-in Scale, Scratches
| Metric | Score |
|---|---|
| Overall Accuracy | 99.72% |
| Macro Precision | 99.73% |
| Macro Recall | 99.72% |
| Macro F1-Score | 99.72% |
| Defect Class | Accuracy | Samples |
|---|---|---|
| Crazing | 100.00% | 60/60 |
| Inclusion | 98.33% | 59/60 |
| Patches | 100.00% | 60/60 |
| Pitted Surface | 100.00% | 60/60 |
| Rolled-in Scale | 100.00% | 60/60 |
| Scratches | 100.00% | 60/60 |
- Deep Learning: PyTorch 2.0+
- Computer Vision: OpenCV, torchvision
- Backend API: FastAPI, Uvicorn
- Frontend Dashboard: Streamlit
- Explainability: Grad-CAM
- Deployment: ONNX, Docker (optional)
- Dataset: NEU-DET Surface Defect Database
```
AutoVision/
├── data/
│   └── NEU-DET/                  # Dataset
│       ├── train/
│       │   ├── images/           # Training images (6 classes)
│       │   └── annotations/      # XML annotations
│       └── validation/
│           ├── images/           # Validation images
│           └── annotations/      # XML annotations
├── models/
│   ├── resnet18_anomaly.pth      # PyTorch model weights
│   └── resnet18_anomaly.onnx     # ONNX format (for deployment)
├── src/
│   ├── model.py                  # Model architecture
│   ├── train.py                  # Training script
│   ├── preprocess.py             # Image preprocessing utilities
│   ├── gradcam.py                # Grad-CAM implementation
│   ├── app.py                    # FastAPI backend
│   └── dashboard.py              # Streamlit dashboard
├── results/                      # Inference results & visualizations
├── test_model.py                 # Full model evaluation
├── inference_with_bbox.py        # Batch inference with bounding boxes
├── single_inference.py           # Single image detailed analysis
├── requirements.txt              # Python dependencies
├── Dockerfile                    # Docker configuration
└── README.md                     # This file
```
```bash
git clone https://github.com/Swarno-Coder/AutoVision.git
cd AutoVision

# Install dependencies
pip install -r requirements.txt

# Train the model
python src/train.py
```

Training Results:

- Duration: ~10-15 minutes (on GPU)
- Final Accuracy: 99.72%
- Models saved to `models/`
```bash
# Full evaluation on validation set
python test_model.py

# Visualize predictions with bounding boxes
python inference_with_bbox.py

# Detailed single image analysis
python single_inference.py
```

Start the API server:

```bash
python src/app.py
```

Access Points:

- API: http://localhost:8000
- Documentation: http://localhost:8000/docs
- ReDoc: http://localhost:8000/redoc

Launch the dashboard:

```bash
streamlit run src/dashboard.py
```

The dashboard opens at http://localhost:8501.
Single image prediction:

```python
import requests

# Upload an image for classification
url = "http://localhost:8000/predict"
files = {"file": open("defect_image.jpg", "rb")}
response = requests.post(url, files=files)

result = response.json()
print(f"Defect: {result['prediction']}")
print(f"Confidence: {result['confidence']:.1%}")
```

Response:

```json
{
    "success": true,
    "prediction": "crazing",
    "confidence": 0.9234,
    "class_id": 0,
    "all_probabilities": {
        "crazing": 0.9234,
        "inclusion": 0.0421,
        ...
    },
    "top_3_predictions": [
        {"class": "crazing", "probability": 0.9234},
        {"class": "inclusion", "probability": 0.0421},
        {"class": "patches", "probability": 0.0198}
    ]
}
```

Grad-CAM visual explanation:

```python
import base64
from io import BytesIO

import requests
from PIL import Image

# Upload image and get visual explanation
url = "http://localhost:8000/predict/gradcam"
files = {"file": open("defect_image.jpg", "rb")}
response = requests.post(url, files=files)
result = response.json()

# Decode and display heatmap overlay
img_data = base64.b64decode(result["gradcam_image"])
img = Image.open(BytesIO(img_data))
img.show()

print(f"Explanation: {result['explanation']}")
```

Batch prediction:

```python
import requests

url = "http://localhost:8000/batch/predict"
files = [
    ("files", open("image1.jpg", "rb")),
    ("files", open("image2.jpg", "rb")),
    ("files", open("image3.jpg", "rb")),
]
response = requests.post(url, files=files)

results = response.json()
for item in results["results"]:
    print(f"{item['filename']}: {item['prediction']} ({item['confidence']:.1%})")
```

Dashboard features:

- Drag & drop or browse to upload images
- Instant defect classification
- Grad-CAM visual explanations
- Probability distribution charts
- Confidence threshold settings
- Live webcam feed processing
- Real-time defect detection
- Heatmap overlay on video
- FPS monitoring
- Continuous inference
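For reference, the live-feed behavior can be approximated client-side by streaming webcam frames to the `/predict` endpoint; a hypothetical sketch (the dashboard's own implementation in `src/dashboard.py` may differ):

```python
import time

import cv2
import requests

# Hypothetical client loop: grab frames, POST them to the running API,
# and draw the prediction plus FPS on the live view
cap = cv2.VideoCapture(0)  # default webcam
url = "http://localhost:8000/predict"

while True:
    ok, frame = cap.read()
    if not ok:
        break
    start = time.time()
    _, buf = cv2.imencode(".jpg", frame)  # JPEG-encode the frame in memory
    result = requests.post(
        url, files={"file": ("frame.jpg", buf.tobytes(), "image/jpeg")}
    ).json()
    fps = 1.0 / (time.time() - start)
    label = f"{result['prediction']} ({result['confidence']:.0%})  {fps:.1f} FPS"
    cv2.putText(frame, label, (10, 30),
                cv2.FONT_HERSHEY_SIMPLEX, 0.8, (0, 255, 0), 2)
    cv2.imshow("AutoVision live", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break

cap.release()
cv2.destroyAllWindows()
```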
- Interactive API documentation
- Live endpoint testing
- Code examples (Python, cURL, JavaScript)
- Request/response visualization
Grad-CAM (Gradient-weighted Class Activation Mapping) provides visual explanations:
- Red regions: High influence on prediction
- Blue regions: Low influence on prediction
- Helps understand what the model "sees"
- Useful for debugging and building trust
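Under the hood, Grad-CAM weights the activations of the last convolutional block by the gradients of the target class score, pooled over space. A condensed, self-contained sketch of that computation (illustrative only; the repo's `src/gradcam.py` may be organized differently):

```python
import torch
import torch.nn.functional as F

def grad_cam(model, x, target_layer, class_idx=None):
    """Heatmap = ReLU(sum_k alpha_k * A_k), where alpha_k are
    the spatially pooled gradients of the class score."""
    store = {}
    fh = target_layer.register_forward_hook(
        lambda m, inp, out: store.update(acts=out))
    bh = target_layer.register_full_backward_hook(
        lambda m, gin, gout: store.update(grads=gout[0]))

    logits = model(x)
    if class_idx is None:
        class_idx = int(logits.argmax(dim=1))
    model.zero_grad()
    logits[0, class_idx].backward()
    fh.remove()
    bh.remove()

    # Channel weights: global-average-pooled gradients
    weights = store["grads"].mean(dim=(2, 3), keepdim=True)
    cam = F.relu((weights * store["acts"]).sum(dim=1, keepdim=True))
    # Upsample to the input resolution and normalize to [0, 1]
    cam = F.interpolate(cam, size=x.shape[2:], mode="bilinear",
                        align_corners=False)
    return (cam / (cam.max() + 1e-8)).squeeze().detach()
```

For ResNet-18, the last convolutional block `model.layer4` is the usual choice of target layer.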
Example:

```python
import numpy as np
from PIL import Image

from src.gradcam import GradCAM, overlay_heatmap_on_image
from src.model import get_transforms

# Initialize the Grad-CAM helper with trained weights
gradcam = GradCAM('./models/resnet18_anomaly.pth')
transform = get_transforms()

# Load and preprocess the image
image = Image.open('defect.jpg')
input_tensor = transform(image).unsqueeze(0)

# Generate heatmap
heatmap = gradcam.generate(input_tensor)

# Overlay the heatmap on the resized image
image_np = np.array(image.resize((224, 224)))
result = overlay_heatmap_on_image(image_np, heatmap)
```

NEU-DET (Northeastern University Surface Defect Database)
- Total Images: 1,800 (300 per class)
- Image Size: 200x200 pixels
- Format: Grayscale converted to RGB
- Split: 80% train (1,440), 20% validation (360)
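The pipeline implied above (grayscale images expanded to 3 channels, resized to the network's 224x224 input) presumably matches the standard torchvision recipe; a minimal sketch, assuming ImageNet normalization statistics since the backbone is ImageNet-pretrained (the repo's `get_transforms` may differ):

```python
from PIL import Image
from torchvision import transforms

def get_transforms():
    # Resize to the model's 224x224 input, force 3 channels for the
    # RGB-pretrained backbone, and normalize with ImageNet statistics
    return transforms.Compose([
        transforms.Resize((224, 224)),
        transforms.Grayscale(num_output_channels=3),
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.485, 0.456, 0.406],
                             std=[0.229, 0.224, 0.225]),
    ])

# Usage: tensor = get_transforms()(Image.open("defect.jpg")).unsqueeze(0)
```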
- Crazing: Network of fine cracks
- Inclusion: Embedded impurities
- Patches: Irregular surface patches
- Pitted Surface: Corrosion pitting
- Rolled-in Scale: Oxide scale defects
- Scratches: Linear surface scratches
```bash
python test_model.py
```

Generates:
- Confusion matrix
- Per-class metrics
- Accuracy charts
- Classification report
```bash
python inference_with_bbox.py
```

Generates:
- Side-by-side ground truth vs prediction
- Bounding box overlays
- Confidence scores
- 12 sample visualizations
```bash
python single_inference.py
```

Generates:
- Detailed visualization
- Probability distribution
- Grad-CAM overlay
- Comprehensive metadata
```bash
# Build image
docker build -t autovision .

# Run container
docker run -p 8000:8000 -p 8501:8501 autovision
```

Model configuration:

- Architecture: ResNet-18
- Input Size: 224x224
- Classes: 6
- Pretrained: ImageNet weights
Training configuration:

- Batch Size: 32
- Epochs: 10
- Learning Rate: 0.001
- Optimizer: Adam
- Loss: CrossEntropyLoss
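A minimal sketch of how the configuration above might map to code, assuming `src/model.py` follows the standard torchvision fine-tuning pattern (illustrative, not the repo's exact code):

```python
import torch
import torch.nn as nn
from torchvision import models

def build_model(num_classes: int = 6) -> nn.Module:
    # ResNet-18 with ImageNet weights; swap the 1000-way head for a
    # 6-way classifier covering the NEU-DET defect classes
    model = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)
    model.fc = nn.Linear(model.fc.in_features, num_classes)
    return model

model = build_model()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)  # LR 0.001
criterion = nn.CrossEntropyLoss()
```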
API server defaults:

- Host: 0.0.0.0
- Port: 8000
- CORS: Enabled
- Device: Auto-detect (CUDA/CPU)
Performance tips:

- Use a GPU if available (10x faster)
- Increase the batch size for faster epochs (memory permitting)
- Apply data augmentation for better generalization
- Use the ONNX model for production inference (faster; see the export sketch after this list)
- Batch predictions for multiple images
- GPU acceleration for real-time processing
- Deploy with Gunicorn/Uvicorn workers
- Use caching for repeated predictions
- Load balancing for high traffic
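For the ONNX path, a hypothetical export/inference round-trip; the input/output names and the checkpoint layout are assumptions to adapt to the repo's actual export:

```python
import numpy as np
import torch
import torch.nn as nn
import onnxruntime as ort
from torchvision import models

# Rebuild the architecture and load the trained weights
# (assumes the checkpoint is a plain state_dict)
model = models.resnet18(weights=None)
model.fc = nn.Linear(model.fc.in_features, 6)
model.load_state_dict(torch.load("models/resnet18_anomaly.pth",
                                 map_location="cpu"))
model.eval()

# Export to ONNX ("input"/"logits" names are illustrative)
dummy = torch.randn(1, 3, 224, 224)
torch.onnx.export(model, dummy, "models/resnet18_anomaly.onnx",
                  input_names=["input"], output_names=["logits"],
                  dynamic_axes={"input": {0: "batch"}})

# Run the exported graph with ONNX Runtime
sess = ort.InferenceSession("models/resnet18_anomaly.onnx")
logits = sess.run(None, {"input": dummy.numpy()})[0]
print("Predicted class id:", int(np.argmax(logits, axis=1)[0]))
```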
Contributions welcome! Please follow these steps:
- Fork the repository
- Create a feature branch (`git checkout -b feature/amazing-feature`)
- Commit your changes (`git commit -m 'Add amazing feature'`)
- Push to the branch (`git push origin feature/amazing-feature`)
- Open a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.
- Dataset: NEU Surface Defect Database
- Framework: PyTorch team
- Inspiration: MVTec AD Dataset
- Community: FastAPI & Streamlit developers
- Developer: Swarno-Coder
- GitHub: @Swarno-Coder
- Project: AutoVision
Roadmap:

- Add more defect classes
- Multi-object detection (YOLOv8 integration)
- Anomaly detection mode
- Mobile app integration
- Cloud deployment guide
- Continuous learning pipeline
- Production monitoring dashboard
- Alert system for critical defects
If you use this project in your research, please cite:
```bibtex
@software{autovision2025,
  author = {Swarno-Coder},
  title  = {AutoVision: Intelligent Visual Defect Detection System},
  year   = {2025},
  url    = {https://github.com/Swarno-Coder/AutoVision}
}
```

*Making defect detection intelligent and accessible*