What about an ECG foundation model?
Cardiovascular diseases are the leading cause of death worldwide, accounting for an estimated 17.9 million deaths annually, which is about 32% of all global deaths. Electrocardiograms (ECGs) play a crucial role in diagnosing these conditions, with over 300 million ECGs performed each year globally.
Despite the widespread use of ECGs, there is a lack of publicly available general-purpose models that can effectively interpret ECG data across diverse populations and conditions. Our work presents D-BETA, an approach that learns general knowledge from ECG signals and their accompanying textual reports simultaneously, without needing exact manual labels during pre-training. D-BETA not only captures subtle details in each modality but also learns how the two connect, which makes it a stronger foundation model and leads to more accurate decisions in downstream tasks.
Across comprehensive evaluations, D-BETA consistently outperforms strong baselines on multiple cardiac conditions, offering a scalable, self-supervised path toward accurate, label-efficient heart health AI worldwide.
This repository shows how to perform inference with the model and includes a quick example in a zero-shot setting on the CODE-15 test dataset. The structure is as follows:
.
├── configs
│ ├── config.json
├── data
│ ├── pretrain
│ ├── downstream
│ │ ├── code-test
│ │ │ └── data
│ │ ├── annotations
│ │ ├── ecg_tracings.hdf5
├── models
│ ├── modules
│ └── dbeta.py
├── infer.ipynb
└── README.md
First, we need to clone the project and prepare the environment as follows:
git clone https://github.com/manhph2211/D-BETA.git && cd D-BETA
conda create -n dbeta python=3.9
conda activate dbeta
pip install -r requirements.txt
Next, please download the CODE-test data from here and put it into the data/downstream/code-test directory.
Then, we need to download the pre-trained model from here and put it into the checkpoints directory.
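Before running anything, it can help to confirm that the downloads ended up where the code expects them. The snippet below is a minimal sketch, not part of the repository: `find_missing` and `EXPECTED_PATHS` are hypothetical names, and the paths are simply the ones mentioned in this README (the checkpoint file name `sample.pt` is taken from the feature-extraction example and may differ from the file you actually downloaded).

```python
from pathlib import Path

# Paths as they appear in this README; adjust them if your layout differs.
# "checkpoints/sample.pt" is an assumption based on the usage example.
EXPECTED_PATHS = [
    "configs/config.json",
    "checkpoints/sample.pt",
    "data/downstream/code-test",
]


def find_missing(root=".", expected=EXPECTED_PATHS):
    """Return the expected paths that do not exist under `root`."""
    root = Path(root)
    return [p for p in expected if not (root / p).exists()]


if __name__ == "__main__":
    missing = find_missing()
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All expected files and directories are in place.")
```

Run it from the repository root. Any path it reports as missing points to a download or directory step above that still needs to be completed.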
Finally, to run the code, we can just use the example.ipynb notebook. You can also run the following Python snippet to use only the encoder for feature extraction:
import torch
from models.processor import get_model, get_ecg_feats
model = get_model(config_path='configs/config.json', checkpoint_path='checkpoints/sample.pt')
ecgs = torch.randn(2, 12, 5000) # [batch, leads, length], 5000 = 10s x 500Hz
ecg_features = get_ecg_feats(model, ecgs)
print(ecg_features.shape) # (2, 768)
This research was supported by the Google South Asia & Southeast Asia research award.
We are also thankful for the valuable work provided by this nice repo and repo.
If you find this work useful 😄, please consider citing our paper:
@inproceedings{hungboosting,
title={Boosting Masked ECG-Text Auto-Encoders as Discriminative Learners},
author={Hung, Manh Pham and Saeed, Aaqib and Ma, Dong},
booktitle={Forty-second International Conference on Machine Learning}
}