A complete end-to-end pipeline for extracting structured data from chart and graph images, based on the PlotQA paper ("PlotQA: Reasoning over Scientific Plots" by Nitesh Methani et al., 2020).
# Install dependencies
pip install -r requirements.txt
pip install --extra-index-url https://miropsota.github.io/torch_packages_builder detectron2==0.6+18f6958pt2.8.0cu128
# Download model weights (see Installation section)
# Then run:
python calculate_visual_values.py --image your_chart.png --model models/ved/model_final.pkl --confidence 0.05 --use-caffe2 --debug

This pipeline takes chart images (bar charts, line graphs, scatter plots) as input and outputs structured JSON data containing:
- Chart type and title
- Axis labels and tick values
- Data series with extracted values
- Legend information
- Visual element values (heights, lengths, positions)
Key Features:
- Caffe2 Compatibility: Full compatibility with original PlotQA Caffe2 models through specialized detectors
- Dual Architecture Support: Both Caffe2-compatible and exact Caffe2 architecture replication
- Enhanced Output: Provides both CSV (original format) and JSON (structured) outputs
- Visual Value Calculation: Calculates numerical values for visual elements (bars, lines, dots)
- Debug Mode: Comprehensive debugging and visualization tools
The pipeline consists of four main stages:
- VED (Visual Element Detection): Uses specialized Caffe2-compatible detectors for original PlotQA models
- OCR (Optical Character Recognition): Extracts text from detected regions using Tesseract
- SIE (Structural Information Extraction): Builds structured data from detections and OCR results
- Visual Value Calculation: Computes numerical values for visual elements using coordinate-based scaling
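The value calculation in the final stage is essentially a linear interpolation between OCR'd y-axis ticks and their pixel positions. Below is a minimal sketch of that idea; the function name and inputs are illustrative, not the pipeline's actual `find_visual_values` API.

```python
# Illustrative only: maps a bar's top pixel coordinate to a data value by linear
# interpolation between two OCR'd y-axis ticks. The real pipeline also handles
# horizontal bars, multiple ticks, lines and dots.
def pixel_to_value(pixel_y, tick_a, tick_b):
    """tick_a/tick_b are (pixel_y, numeric_value) pairs read from yticklabels."""
    (py_a, val_a), (py_b, val_b) = tick_a, tick_b
    # Image y-coordinates grow downward, so the slope is typically negative.
    scale = (val_b - val_a) / (py_b - py_a)
    return val_a + (pixel_y - py_a) * scale

# Example: tick "0" at y=400px and tick "20" at y=200px; a bar whose top edge
# sits at y=248px corresponds to roughly 15.2 on the data axis.
value = pixel_to_value(248, (400, 0.0), (200, 20.0))
print(round(value, 1))  # 15.2
```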
Refer to the Installation section for full setup instructions.
# Basic chart processing
python process_chart.py --image your_chart.png --model models/ved/model_final.pkl --use-caffe2
# Visual values calculation with debug
python calculate_visual_values.py --image your_chart.png --model models/ved/model_final.pkl --confidence 0.05 --use-caffe2 --debug
# With KMP fix for Windows
$env:KMP_DUPLICATE_LIB_OK="TRUE"; python calculate_visual_values.py --image your_chart.png --model models/ved/model_final.pkl --confidence 0.05 --use-caffe2 --debug

# Test with PlotQA dataset (if downloaded)
python process_chart.py --image data/plotqa/VAL/png/18458.png --model models/ved/model_final.pkl --use-caffe2
# Test visual values calculation
python calculate_visual_values.py --image data/plotqa/VAL/png/18458.png --model models/ved/model_final.pkl --confidence 0.05 --use-caffe2 --debug

Common Arguments:
- `--image`: Path to input chart image (required)
- `--model`: Path to trained VED model weights (required)
- `--output`: Output directory (default: "results")
- `--confidence`: Detection confidence threshold (default: 0.05)
- `--debug`: Enable debug mode for detailed analysis
- `--verbose`: Enable verbose logging
Detector Selection:
The pipeline supports three detector types with the following selection logic:
- If `use_caffe2=True` AND `use_exact_caffe2=True` → ExactCaffe2Detector (this is used by default)
- If `use_caffe2=True` AND `use_exact_caffe2=False` → Caffe2CompatibleDetector
- If `use_caffe2=False` → PlotQADetector (Detectron2)

Default Behavior (when no flags are specified):
- `calculate_visual_values.py`: Uses ExactCaffe2Detector by default (`--use-caffe2` and `--use-exact-caffe2` are both `True` by default)
- `process_chart.py`: Uses ExactCaffe2Detector by default (`--use-caffe2` and `--use-exact-caffe2` are both `True` by default)
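A condensed sketch of this selection logic, using the detector classes and constructor arguments shown in the API usage examples later in this README; the import location of PlotQADetector is an assumption.

```python
# Sketch only: maps the use_caffe2 / use_exact_caffe2 flags to a detector class.
from caffe2_compatible_detector import Caffe2CompatibleDetector
from exact_caffe2_detector import ExactCaffe2Detector

def select_detector(model_path, confidence=0.05, use_caffe2=True, use_exact_caffe2=True):
    if use_caffe2 and use_exact_caffe2:
        # Default path: exact Caffe2 architecture replication
        return ExactCaffe2Detector(model_path, confidence_threshold=confidence)
    if use_caffe2:
        return Caffe2CompatibleDetector(model_path, confidence_threshold=confidence)
    # No Caffe2 flags (or --use-detectron2): Detectron2-based detector;
    # the module providing PlotQADetector is assumed here.
    from generate_detections import PlotQADetector
    return PlotQADetector(model_path, confidence_threshold=confidence)
```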
Detector Options:
- `--use-caffe2`: Use Caffe2-compatible detector (recommended for original models)
- `--use-exact-caffe2`: Use exact Caffe2 architecture replication (most compatible)
- `--use-detectron2`: Use Detectron2 detector (modern PyTorch-based, overrides Caffe2 options)
Standard Output:
- `{image_name}.csv` - Original PlotQA format
- `{image_name}.json` - Structured JSON format
- `{image_name}_metadata.json` - Processing metadata
Visual Values Output (calculate_visual_values.py):
- `{image_name}_visual_values.json` - Visual element values with coordinates and scaling
Debug Mode Output:
- `temp/detection_summary.json` - Detailed detection information
- `temp/detections_visualized.png` - Image with bounding boxes
- `temp/ocr_debug.json` - OCR results with extended bounding boxes
- `temp/ocr_with_extended_bboxes.png` - OCR visualization
- `temp/debug_crops/` - Individual cropped regions
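For downstream use, the JSON outputs can be consumed directly. The snippet below is a small example that assumes the default `results` output directory, an illustrative image stem, and the JSON structure documented in the Output Format section later in this README.

```python
import json
from pathlib import Path

results_dir = Path("results")   # default --output directory
stem = "your_chart"             # {image_name} without extension (illustrative)

# Structured chart data ({image_name}.json)
chart = json.loads((results_dir / f"{stem}.json").read_text())
print(chart.get("chart_type"), "-", chart.get("title"))
for series in chart.get("data_series", []):
    print(series["name"], [point["y"] for point in series["data"]])

# Visual element values ({image_name}_visual_values.json), if produced
vv_path = results_dir / f"{stem}_visual_values.json"
if vv_path.exists():
    vv = json.loads(vv_path.read_text())
    bars = [e for e in vv["visual_elements_with_values"] if e["pred_class"] == "bar"]
    print("bar values:", [b.get("y_value") for b in bars])
```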
The pipeline supports multiple detector architectures for maximum compatibility:
Caffe2CompatibleDetector:
- Purpose: Provides compatibility with original PlotQA Caffe2 models
- Features:
  - Loads original `.pkl` model files
  - Maintains the original detection format
  - Optimized for PlotQA dataset models
- Usage: `--use-caffe2` flag
ExactCaffe2Detector:
- Purpose: Exact replication of the original Caffe2 architecture
- Features:
  - Pixel-perfect compatibility with original models
  - Handles edge cases in the original implementation
  - Best for production use with original PlotQA models
- Usage: `--use-exact-caffe2` flag (recommended)
Detectron2 detector (PlotQADetector):
- Purpose: Modern PyTorch-based detection
- Features:
  - Faster inference
  - Better GPU utilization
  - Modern PyTorch ecosystem integration
- Usage: `--use-detectron2` flag, or omit the Caffe2 flags
The original PlotQA models were trained using Caffe2, and while Detectron2 provides modern PyTorch-based detection, there can be subtle differences in:
- Model loading and initialization
- Preprocessing pipelines
- Post-processing steps
- Numerical precision
Our Caffe2-compatible detectors ensure pixel-perfect compatibility with original PlotQA models.
| Feature | Caffe2-Compatible | Exact Caffe2 | Detectron2 |
|---|---|---|---|
| Model Loading | Original .pkl files | Original .pkl files | Converted .pth files |
| Preprocessing | Original pipeline | Exact original pipeline | Modern pipeline |
| Post-processing | Original NMS | Exact original NMS | Modern NMS |
| Compatibility | High | Perfect | Good |
| Performance | Good | Good | Best |
| Maintenance | Medium | High | Low |
class Caffe2CompatibleDetector:
def __init__(self, model_path, confidence_threshold=0.1):
# Load original Caffe2 model
self.model = self._load_caffe2_model(model_path)
# Configure preprocessing to match original
self.preprocessor = self._setup_preprocessing()
# Configure post-processing
self.postprocessor = self._setup_postprocessing()
def detect_single_image(self, image_path):
# Resize to 650x650 (original PlotQA size)
resized_image = self._resize_image(image_path, (650, 650))
# Run inference
detections = self._run_inference(resized_image)
# Apply confidence filtering
filtered_detections = self._filter_detections(detections)
        return filtered_detections, resized_image, original_dimensions

class ExactCaffe2Detector:
def __init__(self, model_path, confidence_threshold=0.1):
# Load with exact Caffe2 architecture
self.model = self._load_exact_caffe2_model(model_path)
# Replicate exact preprocessing steps
self.preprocessor = self._replicate_caffe2_preprocessing()
# Replicate exact post-processing
self.postprocessor = self._replicate_caffe2_postprocessing()
def detect_single_image(self, image_path):
# Exact same preprocessing as original Caffe2
processed_image = self._exact_caffe2_preprocess(image_path)
# Run with exact same inference pipeline
detections = self._exact_caffe2_inference(processed_image)
# Apply exact same post-processing
final_detections = self._exact_caffe2_postprocess(detections)
        return final_detections, processed_image, original_dimensions

The pipeline supports both original Caffe2 model formats:
- `.pkl` files: Original Caffe2 pickle files (recommended)
- `.pth` files: PyTorch state dict files (for Detectron2)
For best compatibility with original PlotQA models, use .pkl files with Caffe2-compatible detectors.
Common Arguments:
- `--image`: Path to input chart image (required)
- `--model`: Path to trained VED model weights (required)
- `--output`: Output directory (default: "results")
- `--confidence`: Detection confidence threshold (default: 0.05)
- `--debug`: Enable debug mode for detailed analysis
- `--verbose`: Enable verbose logging
Detector Selection:
- `--use-caffe2`: Use Caffe2-compatible detector (recommended for original models)
- `--use-exact-caffe2`: Use exact Caffe2 architecture replication (most compatible)
- `--use-detectron2`: Use Detectron2 detector (modern PyTorch-based)
When using --debug flag, the pipeline generates additional files in a temp/ directory:
- `detection_summary.json` - Detailed detection information with coordinates
- `detections_visualized.png` - Image with bounding boxes overlaid
- `ocr_debug.json` - OCR results with extended bounding boxes
- `ocr_with_extended_bboxes.png` - Visualization of OCR processing
- `debug_crops/` - Individual cropped regions for each detected element
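For a quick look at what the detector found, the debug summary can be inspected directly. A small sketch, assuming `temp/detection_summary.json` contains a list of detections; its exact schema is not documented here, so the field name below follows the visual-values JSON shown later in this README.

```python
import json
from collections import Counter

with open("temp/detection_summary.json") as f:
    summary = json.load(f)

if isinstance(summary, list):
    # Count detections per predicted class, assuming entries carry a "pred_class" field.
    print(Counter(d.get("pred_class", "?") for d in summary))
else:
    # Otherwise just show the top-level structure.
    print("top-level keys:", list(summary))
```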
For advanced users who need more control, individual components can be used separately:
# 1. Generate detections only
python generate_detections.py \
--model_path ./models/ved/model_final.pkl \
--image_path chart.png \
--output detections.txt
# 2. OCR and structural extraction
python ocr_and_sie.py \
--image_dir ./images/ \
--detections_dir ./detections/ \
--output_dir ./extracted/

The pipeline outputs structured JSON with the following format:
{
"chart_type": "bar",
"title": "Sales by Quarter",
"x_axis": {
"label": "Quarter",
"type": "categorical"
},
"y_axis": {
"label": "Sales ($M)",
"type": "numeric"
},
"data_series": [
{
"name": "Sales",
"type": "bar",
"data": [
{"x": "Q1", "y": 15.2},
{"x": "Q2", "y": 23.1},
{"x": "Q3", "y": 18.7},
{"x": "Q4", "y": 28.3}
]
}
],
"metadata": {
"source_image": "chart.png",
"extraction_method": "PlotQA Pipeline",
"total_series": 1
}
}

When using calculate_visual_values.py, additional visual value data is included:
{
"chart_orientation": {
"isHbar": false,
"isSinglePlot": true
},
"visual_elements_with_values": [
{
"pred_class": "bar",
"bbox": [100, 200, 150, 400],
"confidence": 0.95,
"x_value": "Q1",
"y_value": 15.2,
"ocr_text": null
},
{
"pred_class": "xticklabel",
"bbox": [120, 450, 140, 470],
"confidence": 0.88,
"ocr_text": "Q1"
}
],
"calculation_method": "find_visual_values_algorithm"
}

The pipeline also generates CSV files in the original PlotQA format for compatibility:
title,xlabel,ylabel,Sales
Sales by Quarter,Quarter,Sales ($M),15.2
Sales by Quarter,Quarter,Sales ($M),23.1
Sales by Quarter,Quarter,Sales ($M),18.7
Sales by Quarter,Quarter,Sales ($M),28.3
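Since the CSV repeats the title and axis labels on every row, a little post-processing is needed to recover the series. A minimal sketch using pandas (not a pipeline dependency) on the example above; the file path follows the `{image_name}.csv` convention and is illustrative.

```python
import pandas as pd

df = pd.read_csv("results/chart.csv")  # illustrative path

# Every row carries the same title/xlabel/ylabel; the remaining column holds the series values.
title, xlabel, ylabel = df.loc[0, ["title", "xlabel", "ylabel"]]
series_values = df["Sales"].tolist()   # column name matches the series in the example above

print(title, xlabel, ylabel, series_values)
# Sales by Quarter Quarter Sales ($M) [15.2, 23.1, 18.7, 28.3]
```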
Supported chart types:
- Bar Charts: Vertical and horizontal bars
- Line Graphs: Single and multi-series lines
- Scatter Plots: Point-based data visualization
- Mixed Charts: Combinations of the above
The VED model detects the following chart elements (matching original PlotQA categories):
| Category | Description |
|---|---|
| `bar` | Bar chart bars |
| `dot_line` | Dotted line elements |
| `legend_label` | Legend item labels |
| `line` | Line chart lines |
| `preview` | Legend preview boxes |
| `title` | Chart title |
| `xlabel` | X-axis label |
| `xticklabel` | X-axis tick labels |
| `ylabel` | Y-axis label |
| `yticklabel` | Y-axis tick labels |
Detection (VED) configuration:
- Confidence Threshold: Minimum detection confidence (default: 0.5)
- NMS Threshold: Non-maximum suppression threshold
- Input Size: Image resizing for detection (default: 800x1333)
OCR configuration:
- Tesseract Config: Character whitelist and recognition mode
- Text Padding: Padding around text bounding boxes (default: 5px)
- Language: OCR language model (default: English)
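As an illustration of how these OCR settings typically map onto Tesseract, the snippet below calls pytesseract directly; the function name, padding handling, and config strings are assumptions, and the pipeline's actual configuration may differ.

```python
import pytesseract
from PIL import Image

def ocr_region(image_path, bbox, padding=5):
    """Crop a detected text region with a few pixels of padding, then OCR it."""
    x1, y1, x2, y2 = bbox
    img = Image.open(image_path)
    crop = img.crop((max(x1 - padding, 0), max(y1 - padding, 0), x2 + padding, y2 + padding))
    config = "--psm 7"  # treat the crop as a single text line
    # For numeric tick labels, a character whitelist can reduce confusions:
    # config = "--psm 7 -c tessedit_char_whitelist=0123456789.-"
    return pytesseract.image_to_string(crop, lang="eng", config=config).strip()
```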
Performance tips:
- GPU Usage: Use a CUDA-enabled GPU for faster training and inference
- Batch Size: Adjust based on GPU memory (4-8 for most GPUs)
- Image Quality: Higher resolution images generally produce better results
- Preprocessing: Ensure charts are clearly visible with good contrast
"No module named 'detectron2'"
- Install Detectron2 with CUDA support:
pip install --extra-index-url https://miropsota.github.io/torch_packages_builder detectron2==0.6+18f6958pt2.8.0cu128
"Tesseract not found"
- Install Tesseract OCR and add to PATH
- Ubuntu: `sudo apt-get install tesseract-ocr`
- Windows: Download from https://github.com/UB-Mannheim/tesseract/wiki
"CUDA out of memory"
- Reduce batch size in training
- Use smaller input image sizes
- Enable gradient checkpointing
Poor extraction accuracy
- Check detection confidence threshold
- Verify image quality and resolution
- Retrain model with more similar data
Enable verbose logging and detailed analysis:
# Basic debug mode
python process_chart.py --image chart.png --model models/ved/model_final.pkl --use-caffe2 --debug
# Visual values with debug
python calculate_visual_values.py --image chart.png --model models/ved/model_final.pkl --use-caffe2 --debug --verbose

charto/ # Main project directory
├── calculate_visual_values.py # Main Visual values calculation script
├── process_chart.py # Chart processing script
├── caffe2_compatible_detector.py # Caffe2-compatible detector
├── exact_caffe2_detector.py # Exact Caffe2 architecture replication
├── generate_detections.py # Detectron2-based detection
├── ocr_and_sie.py # OCR and structural extraction
├── utils.py # Utility functions
├── bbox_conversion.py # Bounding box conversion utilities
├── upscale_boxes.py # Box upscaling functionality
├── requirements.txt # Complete dependencies list (latest versions)
├── README.md # This file
├── calculate_visual_values_readme.md # Detailed description of the working of the Main Script
├── models/ # Trained model weights
│ └── ved/
│ ├── model_final.pkl # Caffe2 model file
│ ├── model_iter19999.pkl # Training checkpoint
│ ├── net.pbtxt # Network architecture
│ └── param_init_net.pbtxt # Parameter initialization
├── data/ # PlotQA dataset
│ └── plotqa/
│ ├── TRAIN/ # Training images
│ ├── VAL/ # Validation images
│ ├── TEST/ # Test images
│ └── annotations/ # COCO-style annotations
└── sample_results/ # Example results
Main pipeline class for chart processing:
from process_chart import PlotQAProcessor
# Initialize with Caffe2-compatible detector
processor = PlotQAProcessor(
model_path="./models/ved/model_final.pkl",
confidence_threshold=0.05,
use_caffe2=True,
use_exact_caffe2=True,
debug=False
)
# Process a single image
results = processor.process_image("chart.png", "output_dir")

Extended processor with visual value calculations:
from calculate_visual_values import VisualValuesCalculator
# Initialize calculator
calculator = VisualValuesCalculator(
model_path="./models/ved/model_final.pkl",
confidence_threshold=0.05,
use_caffe2=True,
use_exact_caffe2=True,
debug=True # Enable for full visual value calculation
)
# Process with visual values
results = calculator.process_with_values("chart.png", "output_dir")

Direct access to Caffe2-compatible detection:
from caffe2_compatible_detector import Caffe2CompatibleDetector
detector = Caffe2CompatibleDetector(
model_path="./models/ved/model_final.pkl",
confidence_threshold=0.05
)
detections, resized_image, original_dimensions = detector.detect_single_image("chart.png")

Exact Caffe2 architecture replication:
from exact_caffe2_detector import ExactCaffe2Detector
detector = ExactCaffe2Detector(
model_path="./models/ved/model_final.pkl",
confidence_threshold=0.05
)
detections, resized_image, original_dimensions = detector.detect_single_image("chart.png")

Contributing:
- Fork the repository
- Create a feature branch
- Make your changes
- Add tests if applicable
- Submit a pull request
This project is licensed under the Apache License 2.0 - see the LICENSE file for details.
If you use this pipeline in your research, please cite the original PlotQA paper:
@inproceedings{methani2020plotqa,
title={PlotQA: Reasoning over Scientific Plots},
author={Methani, Nitesh and Ganguly, Pritha and Khapra, Mitesh M and Kumar, Pratyush},
booktitle={Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision},
pages={1527--1536},
year={2020}
}

This implementation maintains compatibility with the original PlotQA pipeline while modernizing the underlying technology:
What is preserved:
- Dataset Structure: Uses the exact same directory structure and annotation files (train_50k_annotations.json, etc.)
- Detection Format: Outputs detections in the original format `CLASS_LABEL CLASS_CONFIDENCE XMIN YMIN XMAX YMAX` (see the parsing sketch after this list)
- Category Labels: Uses the same visual element categories (axis, tick, tick_label, etc.)
- CSV Output: Maintains original semi-structured CSV format for compatibility
- Pipeline Stages: Follows same VED → OCR → SIE workflow
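A small helper for reading detections written in that format (for example, the detections.txt produced by generate_detections.py); a sketch, assuming one detection per line as shown above.

```python
# Parse lines of the form: CLASS_LABEL CLASS_CONFIDENCE XMIN YMIN XMAX YMAX
def read_detections(path):
    detections = []
    with open(path) as f:
        for line in f:
            parts = line.split()
            if len(parts) != 6:
                continue  # skip blank or malformed lines
            label = parts[0]
            confidence = float(parts[1])
            xmin, ymin, xmax, ymax = map(float, parts[2:])
            detections.append({"label": label, "confidence": confidence,
                               "bbox": [xmin, ymin, xmax, ymax]})
    return detections
```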
What is modernized:
- Detectron2: Uses PyTorch-based Detectron2 instead of Caffe2-based Detectron
- Python 3: Updated from Python 2 to Python 3.8+
- Enhanced OCR: Improved OCR processing with better error handling
- JSON Output: Added structured JSON output alongside original CSV
- Better Documentation: Comprehensive usage examples and API documentation
If you have an existing PlotQA setup, you can:
- Use the same dataset directory structure
- Use the same `.pkl` model files with Caffe2-compatible detectors
- Replace detection generation with `process_chart.py` or `calculate_visual_values.py`
- Get both original CSV and new JSON outputs
- Access visual element values with coordinate-based calculations
- "No module named 'detectron2'"
  - Install Detectron2:
    `pip install --extra-index-url https://miropsota.github.io/torch_packages_builder detectron2==0.6+18f6958pt2.8.0cu128`
- "Model file not found" or "Could not load model"
  - Ensure `model_final.pkl` is in the `models/ved/` directory
  - Check the file size: `ls -lh models/ved/model_final.pkl` (should be ~100MB+)
  - Re-download from: https://drive.google.com/drive/folders/1P00jD-WFg_RBissIPmuWEWct3xoM3mgU?usp=sharing
- "Tesseract not found"
  - Install Tesseract OCR and add it to PATH
  - Windows: Download from https://github.com/UB-Mannheim/tesseract/wiki
- "CUDA out of memory"
  - Use the CPU version: install PyTorch without CUDA
  - Reduce batch size in processing
- KMP_DUPLICATE_LIB_OK error (Windows)
  - Set the environment variable before running: `$env:KMP_DUPLICATE_LIB_OK="TRUE"; python process_chart.py --image chart.png --model models/ved/model_final.pkl --use-caffe2`
- "No visual elements detected"
  - Check image quality and resolution
  - Try different confidence thresholds: `--confidence 0.1` or `--confidence 0.05`
  - Ensure the image is a chart/graph (not a photo or other image type)
Test the pipeline with sample images:
# Test basic processing
python process_chart.py --image test_chart.png --model models/ved/model_final.pkl --use-caffe2
# Test visual values calculation
python calculate_visual_values.py --image test_chart.png --model models/ved/model_final.pkl --use-caffe2 --debug
# Test different detector architectures
python process_chart.py --image test_chart.png --model models/ved/model_final.pkl --use-exact-caffe2
python process_chart.py --image test_chart.png --model models/ved/model_final.pkl --use-detectron2
# If you hit the KMP duplicate-library error on Windows
$env:KMP_DUPLICATE_LIB_OK="TRUE"; python calculate_visual_values.py --image test_chart.png --model models/ved/model_final.pkl --confidence 0.05 --use-caffe2 --debug --verbose
$env:KMP_DUPLICATE_LIB_OK="TRUE"; python process_chart.py --image test_chart.png --model models/ved/model_final.pkl --confidence 0.05 --use-caffe2 --debug --verbose
# Install dependencies
pip install -r requirements.txt
pip install --extra-index-url https://miropsota.github.io/torch_packages_builder detectron2==0.6+18f6958pt2.8.0cu128
# Basic processing
python process_chart.py --image chart.png --model models/ved/model_final.pkl --use-caffe2
# Visual values with debug
python calculate_visual_values.py --image chart.png --model models/ved/model_final.pkl --use-caffe2 --debug
# Windows KMP fix
$env:KMP_DUPLICATE_LIB_OK="TRUE"; python process_chart.py --image chart.png --model models/ved/model_final.pkl --use-caffe2

- Model Weights: https://drive.google.com/drive/folders/1P00jD-WFg_RBissIPmuWEWct3xoM3mgU?usp=sharing
- Dataset: https://drive.google.com/drive/folders/15bWhzXxAN4WsXn4p37t_GYABb1F52nQw?usp=sharing
- Tesseract (Windows): https://github.com/UB-Mannheim/tesseract/wiki
- Facebook AI Research for Detectron2
- The PlotQA dataset authors (Methani et al., 2020)
- Original PlotQA pipeline contributors
- Tesseract OCR team
- PyTorch community