Eva (Encoding of visual atlas) is a foundation model for tissue imaging data that learns complex spatial representations of tissues at the molecular, cellular, and patient levels. Eva uses a novel vision transformer architecture and is pre-trained on masked image reconstruction of spatial proteomics and matched histopathology.
git clone https://github.com/YAndrewL/Eva.git
cd Eva
conda env create -f env.yaml
conda activate Eva
pip install -e .Eva model weights are open-sourced on HuggingFace Hub.
from Eva.utils import load_from_hf, extract_features, create_model
from omegaconf import OmegaConf
import torch
# Load configuration
conf = OmegaConf.load("config.yaml")
# Load model from HuggingFace Hub
device = "cuda" if torch.cuda.is_available() else "cpu"
model = load_from_hf(
repo_id="yandrewl/Eva",
conf=conf,
device=device
)Download the GenePT marker embeddings from Zenodo record. Use the file GenePT_gene_protein_embedding_model_3_text.pickle and store it as marker_embeddings/marker_embedding.pkl locally.
# Extract embeddings
patch = torch.randn(1, 224, 224, 6) # Shape: [B, H, W, C]
biomarkers = ["DAPI", "CD3e", "CD20", "CD4", "CD8", "PanCK"] # biomarkers
features = extract_features(
patch=patch,
bms=[biomarkers], # List of biomarker lists (one per batch item)
model=model,
device=device,
cls=False, # Use CLS token (True) or average patches (False)
channel_mode="full" # Options: "full", "HE", "MIF"
)
# or use model method
features = model.extract_features(
patch=patch,
bms=[biomarkers],
device=device,
cls=False,
channel_mode="full" # Options: "full", "HE", "MIF"
)When data includes H&E (Hematoxylin and Eosin) channels, H&E should be added as the last three channels:
mif_patch = torch.randn(1, 224, 224, 6)
he_patch = torch.randn(1, 224, 224, 3)
patch = torch.cat([mif_patch, he_patch], dim=-1)
biomarkers = ["DAPI", "CD3e", "CD20", "CD4", "CD8", "PanCK", "HECHA1", "HECHA2", "HECHA3"] # Last 3 are HE channels
# Extract features using different modality
features = extract_features(
patch=patch,
bms=[biomarkers],
model=model,
device=device,
cls=False,
channel_mode="MIF", # Set to "HE" to use HE channels only, or "full" to use all channels
)The model requires a configuration file (YAML format) that specifies:
- Dataset parameters (patch_size, token_size, marker_dim, etc.)
- Channel mixer parameters (dim, n_layers, n_heads, etc.)
- Patch mixer parameters (dim, n_layers, n_heads, etc.)
- Decoder parameters (dim, n_layers, n_heads, etc.)
See config.yaml for an example configuration.