Multimodal AI System for Early Parkinson's Disease Detection
NeuroVive analyzes spiral drawings and voice recordings to screen for Parkinson's Disease, combining computer vision and audio deep learning into a single production-ready inference API.
NeuroVive runs two independent classification models under a single FastAPI application. Each model accepts a different input modality and returns a diagnosis label with a confidence score.
| Modality | Model | Input | Test Accuracy |
|---|---|---|---|
| Spiral Drawing | NeuroSpiral | Image (PNG / JPG) | 87.80% |
| Voice Recording | NeuroVox | Audio (WAV) | 97.80% |
Output labels:
PD— Parkinson's DiseaseHC— Healthy Control
User Upload
│
▼
FastAPI Application
│
├──────────────────────────────────┐
│ │
POST /predict/image POST /predict/voice
│ │
NeuroSpiral NeuroVox
EfficientNet-B0 + HOG + LBP ResNet-18 + Mel-Spectrogram
│ │
└──────────────┬───────────────────┘
│
▼
{ "label": "PD"|"HC",
"probability": 0.0–1.0 }
Runtime details:
- Both ONNX models are loaded at startup via a
lifespancontext manager. - CPU-bound inference runs in a
ThreadPoolExecutor(4 workers) to avoid blocking the async event loop. - Uploaded audio files are saved to
tmp/and deleted automatically viaBackgroundTasksafter each request. - GPU acceleration is supported via
CUDAExecutionProviderwith automatic CPU fallback.
Classifies hand-drawn spiral or wave images, capturing the fine motor tremors characteristic of Parkinson's Disease.
NeuroSpiral uses a hybrid approach: a deep CNN backbone extracts high-level visual features, while handcrafted descriptors capture local texture and edge patterns that CNNs may overlook.
| Feature | What It Captures |
|---|---|
| EfficientNet-B0 | High-level visual representations |
| HOG (Histogram of Oriented Gradients) | Edge and shape patterns |
| LBP (Local Binary Pattern) | Texture and tremor-related micro-patterns |
Input Image (224×224)
│
├───────────────────────┐
│ │
EfficientNet-B0 HOG + LBP
(1280-dim features) (512 → 128-dim)
│ │
└──────────┬────────────┘
│ Concatenate (1280 + 128)
▼
Fusion MLP Head
1408 → 512 → 256 → 1
│
Sigmoid
│
PD / HC + Score
| Parameter | Value |
|---|---|
| Backbone | EfficientNet-B0 |
| Optimizer | AdamW |
| Learning Rate | 1e-4 |
| Weight Decay | 1e-5 |
| Epochs | 15 (best checkpoint at epoch 13) |
| Batch Size | 16 |
| LR Scheduler | Linear warmup (5 epochs) → CosineAnnealing |
| Loss Function | BCEWithLogitsLoss |
| Augmentation | 15× per image — rotation, affine, noise, flip |
Classifies voice recordings of patients sustaining vowels or reading sentences, detecting vocal biomarkers of Parkinson's Disease.
WAV File
│
▼
Resample → 22,050 Hz, mono
│
▼
Pad / Truncate → 6 seconds
│
▼
Mel-Spectrogram (n_mels=40, n_fft=1024)
│
▼
Convert to dB scale
│
▼
Tensor shape: (1, 1, 40, T) → Model input
Input Mel-Spectrogram (1, 40, T)
│
ResNet-18 Backbone
(first conv modified for 1-channel input)
│
Classifier Head
512 → 256 → BatchNorm → SiLU → Dropout → 1
│
Sigmoid
│
PD / HC + Score
| Parameter | Value |
|---|---|
| Backbone | ResNet-18 (ImageNet pretrained) |
| Optimizer | AdamW |
| Learning Rate | 1e-4 |
| Weight Decay | 1e-2 |
| Epochs | 10 (best checkpoint at epoch 6) |
| Batch Size | 64 |
| LR Scheduler | CosineAnnealingWarmRestarts |
| Loss Function | BCEWithLogitsLoss |
| Data Split | 70% train / 20% test / 10% val |
- Time-stretching (1.1× rate)
- Pitch-shifting
- Gaussian noise injection
- Overlapping 6-second chunks (10% overlap)
Base URL: http://localhost:8000
Classify a spiral or wave drawing.
Request
Content-Type: multipart/form-data
Body field: image (file) — JPEG or PNG
Response
{
"label": "PD",
"probability": 0.8342
}Error Codes
| Code | Reason |
|---|---|
400 |
Image is corrupt or unreadable |
415 |
File is not a supported image format |
500 |
Internal inference error |
Classify a voice recording.
Request
Content-Type: multipart/form-data
Body field: audio (file) — WAV only
Response
{
"label": "HC",
"probability": 0.1253
}Error Codes
| Code | Reason |
|---|---|
415 |
File is not a .wav file |
500 |
Processing or inference error |
| Field | Type | Description |
|---|---|---|
label |
string |
"PD" or "HC" |
probability |
float |
Sigmoid output, rounded to 4 decimal places (0.0–1.0) |
Label threshold note:
- NeuroSpiral:
PDif probability < 0.5,HCotherwise- NeuroVox:
PDif probability > 0.5,HCotherwise
| Epoch | Train Loss | Train Acc | Val Loss | Val Acc | Val F1 |
|---|---|---|---|---|---|
| 5 | 0.2479 | 89.41% | 0.2482 | 87.36% | 88.38% |
| 9 | 0.0779 | 97.03% | 0.2181 | 91.00% | 91.41% |
| 13 ✅ | 0.0197 | 99.42% | 0.1993 | 93.49% | 93.58% |
| 15 | 0.0157 | 99.57% | 0.2296 | 92.53% | 92.84% |
Test Set Performance
Accuracy : 87.80%
F1 Score : 87.80%
| Epoch | Train Loss | Train Acc | Val Loss | Val Acc | Val F1 |
|---|---|---|---|---|---|
| 2 | 0.2731 | 92.54% | 0.2971 | 90.11% | 88.31% |
| 5 | 0.0637 | 99.52% | 0.1335 | 96.70% | 96.10% |
| 6 ✅ | 0.0513 | 99.84% | 0.0865 | 97.80% | 97.37% |
| 10 | 0.0293 | 100.00% | 0.0809 | 96.70% | 96.10% |
Dataset Summary
Total valid audio files : 1,274
PD samples : 663
HC samples : 611
Rejected / failed : 404
AI-Solution/
├── README.md ← Overview of the whole system
├── LICENSE
├── .gitignore
│
├── NeuroVox/ ← Voice model service
│ ├── README.md
│ ├── requirements.txt
│ ├── docs/
│ └── src/
│
├── NeuroSpiral/ ← Image model service
│ ├── README.md
│ ├── requirements.txt
│ ├── docs/
│ └── src/
│
└── NeuroProduction/ ← Unified inference API
├── README.md
├── requirements.txt
├── checkpoint/
├── docs/
└── src/
pip install fastapi uvicorn onnxruntime opencv-python \
scikit-image librosa numpy./NeuroProduction/checkpoint/
├── spiral_best_model.onnx
└── voice_best_model.onnx
python -m ./NeuroProduction/src/api/main.pyThe API will be available at http://0.0.0.0:8000.
# Spiral drawing
curl -X POST "http://localhost:8000/predict/image" \
-F "image=@spiral_drawing.png"
# Voice recording
curl -X POST "http://localhost:8000/predict/voice" \
-F "audio=@patient_voice.wav"# Train NeuroSpiral
python ./NeuroSpiral/src/pipeline/main.py
# Train NeuroVox
python ./NeuroVox/src/pipeline/main.py| Category | Library |
|---|---|
| API Framework | FastAPI, Uvicorn |
| Deep Learning | PyTorch, timm |
| Inference Runtime | ONNX Runtime |
| Image Processing | OpenCV, scikit-image |
| Audio Processing | Librosa |
| Data | Polars, NumPy |
| Metrics | torchmetrics, scikit-learn |
| Augmentation | Albumentations |
Built for clinical research and early-stage Parkinson's Disease screening support.