Eydcao/VICON

VICON: Vision In-Context Operator Networks for Multi-Physics Fluid Dynamics

Updates

  • [Mar 2026] Pre-trained checkpoints released on HuggingFace!
  • [Jan 2026] VICON has been accepted by TMLR! The accepted paper is available on OpenReview.

Installation

# Clone repository
git clone https://github.com/Eydcao/VICON.git
cd VICON

# Create conda environment (installs Python + Poetry)
conda env create -f environment.yml
conda activate vicon

# Install dependencies via Poetry
poetry install --no-root

Pre-trained Checkpoints

Model    Dataset                  Download
VICON    Combined (All 3 PDEs)    Hugging Face

Dataset

VICON was evaluated on three fluid dynamics datasets:

  • PDEArena-Incomp (incompressible Navier-Stokes)
  • PDEBench-Comp-HighVis (compressible Navier-Stokes)
  • PDEBench-Comp-LowVis (compressible Navier-Stokes with numerically zero viscosity)

Refer to dataset_prepare/README.md for details.

Usage

We use Hydra for configuration management, allowing flexible parameter modifications via command line or config files.

Training

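# Train the model with specific configuration overrides
# Assumes GPUs 0 and 1 are available and enables wandb logging;
# set $COMPRESSIBLE2D_DIR, $EULER2D_DIR, and $NS2D_DIR to the prepared dataset folders first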
CUDA_VISIBLE_DEVICES="0,1" python src/train.py \
    plot=0 board=1 amp=0 \
    dataset_workers=2 multi_gpu=1 \
    datasets.train_batch_size=30 \
    loss.min_ex=5 \
    model.transformer.num_layers=10 \
    model.use_patch_pos_encoding=True \
    model.use_func_pos_encoding=True \
    datasets.types.COMPRESSIBLE2D.folder=$COMPRESSIBLE2D_DIR \
    datasets.types.EULER2D.folder=$EULER2D_DIR \
    datasets.types.NS2D.folder=$NS2D_DIR

Evaluation

# Run rollout evaluation
python src/rollout.py \
    rollout.ckpt_dir=/path/to/checkpoint/dir \
    rollout.ckpt_stamp=checkpoint_name \
    board=0

Refer to the configs/ folder for detailed configuration options.
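
The dotted key=value overrides used in the commands above can also be assembled programmatically, which is convenient when sweeping over several settings. The following is a minimal sketch, not part of the repository: the helper names `flatten_overrides` and `build_command` are hypothetical, the keys are copied from the rollout command above, and the values are the same placeholders.

```python
import shlex


def flatten_overrides(config, prefix=""):
    """Flatten a nested dict into Hydra-style dotted `key=value` overrides."""
    overrides = []
    for key, value in config.items():
        dotted = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            overrides.extend(flatten_overrides(value, dotted))
        else:
            overrides.append(f"{dotted}={value}")
    return overrides


def build_command(script, config):
    """Return an argv list that could be passed to subprocess.run."""
    return ["python", script, *flatten_overrides(config)]


# Keys come from the rollout command in this README; values are placeholders.
rollout_config = {
    "rollout": {
        "ckpt_dir": "/path/to/checkpoint/dir",
        "ckpt_stamp": "checkpoint_name",
    },
    "board": 0,
}

command = build_command("src/rollout.py", rollout_config)
print(shlex.join(command))
```

Keeping the overrides in a dictionary makes it easy to change one value per run while leaving the rest of the command untouched.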

Motivations

Current approaches to operator learning of PDEs face challenges limiting practical applications:

  1. Single Operator Learning

    • Requires complete retraining when equation type or parameters change
    • Impractical for real-world deployment where system conditions vary
  2. Pretrain-Finetune Approach

    • Can handle multiple PDE types during pretraining
    • Still requires substantial data collection (hundreds of frames/trajectories) for finetuning
    • Challenging in downstream applications with limited data availability, e.g., online environments

The ICON Innovation and Its Limitations

ICON (Yang et al., 2024) introduced a new perspective inspired by in-context learning in LLMs:

  • Defines the physical fields before and after a given timestep as query/answer pairs (also called COND/QoI, for condition and quantity of interest)
  • Extracts dynamics directly from a few pairs without requiring finetuning

However, ICON faces an architectural limitation:

  • Represents each query or answer as an entire discretized physical field, with one token per sample point
  • Results in extremely long transformer sequences
  • Becomes computationally infeasible for real-scale, high-dimensional data

VICON's Solution

Figure 1: Schematic overview of the VICON architecture.

Inspired by Vision Transformers (ViT), which efficiently handle large images by processing them in patches, VICON overcomes these limitations while maintaining the benefits of in-context learning. Our contributions include:

  1. First implementation of in-context learning for 2D PDEs without requiring explicit PDE information
  2. State-of-the-art empirical results compared to existing methods
  3. Flexible rollout capabilities through learning to extract dynamics from pairs with varying timestep sizes

Method

VICON combines the in-context operator learning framework with the vision transformer architecture through several key components:

1. Patch-wise Processing

  • Divides input physical fields into manageable patches
  • Significantly reduces sequence length compared to the token-per-point approach of the original ICON
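
To make the sequence-length reduction concrete, here is a minimal pure-Python sketch of patch-wise tokenization. It is an illustration rather than the repository's implementation: the function name `patchify`, the 128 x 128 field size, and the 16 x 16 patch size are all assumptions chosen for the example.

```python
def patchify(field, patch):
    """Split a 2D field (list of rows) into non-overlapping patch x patch tiles.

    Returns the tiles in row-major order; each tile is flattened into one
    list of values, i.e. the raw data behind a single token.
    """
    height, width = len(field), len(field[0])
    if height % patch or width % patch:
        raise ValueError("field size must be divisible by the patch size")
    patches = []
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            patches.append(
                [field[top + i][left + j] for i in range(patch) for j in range(patch)]
            )
    return patches


# Hypothetical sizes: a 128 x 128 field cut into 16 x 16 patches.
size, patch = 128, 16
field = [[float(r * size + c) for c in range(size)] for r in range(size)]
tokens = patchify(field, patch)

points_per_field = size * size    # token-per-point: one token per grid point
patches_per_field = len(tokens)   # patch-wise: one token per tile
print(points_per_field, patches_per_field, points_per_field // patches_per_field)
# prints: 16384 64 256
```

With these assumed sizes each field contributes 64 tokens instead of 16384. Because standard self-attention scales quadratically with sequence length, a 256-fold shorter sequence reduces the number of pairwise interactions far more than 256-fold.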

2. Dual Positional Encoding System

Our system uses two types of positional encodings to convey the spatial and functional relationships between tokens:

a) Patch Position Encoding

  • Encodes relative spatial relationships between patches
  • Maintains awareness of physical space structure

b) Function Position Encoding

  • Indicates the role of each patch in the sequence:
    • Which pair in the sequence it belongs to
    • Whether it's part of the input (query) or output (answer) in that pair
  • Crucial for maintaining the in-context learning structure
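
The bookkeeping behind the two encodings can be sketched as index assignment: every token carries a patch index (where it sits in its field) and a function index (which pair it belongs to, and whether it is input or output). The names `TokenIndex`, `build_token_indices`, and `function_id` are hypothetical, and the mapping `2 * pair + role` is just one simple way to enumerate roles; how the actual model turns these indices into embeddings is not shown here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TokenIndex:
    pair: int   # which query/answer pair the patch belongs to
    role: int   # 0 = input (query), 1 = output (answer)
    patch: int  # row-major position of the patch within its field


def build_token_indices(num_pairs, patches_per_field):
    """Enumerate tokens in sequence order: pair by pair, input before output."""
    indices = []
    for pair in range(num_pairs):
        for role in (0, 1):
            for patch in range(patches_per_field):
                indices.append(TokenIndex(pair, role, patch))
    return indices


def function_id(index):
    """Assumed enumeration of roles: one distinct ID per (pair, role)."""
    return 2 * index.pair + index.role


indices = build_token_indices(num_pairs=3, patches_per_field=4)
print(len(indices))              # 3 pairs * 2 roles * 4 patches = 24 tokens
print(indices[4])                # first output patch of the first pair
print(function_id(indices[-1]))  # last token: pair 2, output role
```

Under this scheme, tokens that share a `patch` index occupy the same spatial location in different fields, while tokens that share a `function_id` belong to the same field.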

3. Flexible Rollout Strategies

VICON's unique training approach enables versatile rollout schemes:

  • Forms pairs with varying timestep sizes during training
  • Allows a single trained model to:
    • Extract dynamics at different time scales
    • Perform rollouts with various timestep strides
    • Potentially reduce the number of rollout steps when appropriate, minimizing error accumulation in long-term predictions
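
Forming pairs with varying timestep sizes can be sketched as follows. This is an illustrative sampler, not the repository's data loader: the function name `sample_pairs` and its parameters are hypothetical, and the sketch assumes that all pairs within one sequence share the same stride so that they describe the same operator.

```python
import random


def sample_pairs(num_frames, num_pairs, max_stride, rng):
    """Pick one stride, then sample (input_index, output_index) frame pairs.

    Every pair in the returned list uses the same stride, so the pairs are
    consistent examples of a single "advance by `stride` frames" operator.
    """
    if num_frames <= max_stride:
        raise ValueError("trajectory must be longer than the largest stride")
    stride = rng.randint(1, max_stride)
    starts = [rng.randrange(num_frames - stride) for _ in range(num_pairs)]
    return stride, [(start, start + stride) for start in starts]


rng = random.Random(0)  # fixed seed so the example is reproducible
stride, pairs = sample_pairs(num_frames=20, num_pairs=5, max_stride=5, rng=rng)
print(stride, pairs)
```

Drawing a new stride for each training sequence exposes the model to several time scales, which is what allows a single trained model to roll out with different strides later.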

Figure 2: VICON's flexible rollout strategy: Starting with timestep dt=1, the model progressively accumulates flow fields needed for larger stride predictions up to dt=5, enabling efficient long-term predictions with larger timestep strides.
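
One plausible reading of this strategy is a greedy schedule in which each target frame is predicted from the frame that lies the largest allowed stride behind it. The sketch below only plans which frame feeds which prediction; it does not call the model, it ignores how many example pairs are available in context for each stride, and the names `rollout_schedule` and `chain_length` are hypothetical.

```python
def rollout_schedule(horizon, max_stride):
    """For each target frame 1..horizon, return (source_frame, stride).

    Greedy rule (an assumption): use the largest stride that does not reach
    back before frame 0, capped at max_stride.
    """
    schedule = {}
    for target in range(1, horizon + 1):
        stride = min(target, max_stride)
        schedule[target] = (target - stride, stride)
    return schedule


def chain_length(target, schedule):
    """Count the model applications needed to reach `target` from frame 0."""
    steps = 0
    while target > 0:
        target, _ = schedule[target]
        steps += 1
    return steps


schedule = rollout_schedule(horizon=20, max_stride=5)
print(schedule[3])   # (0, 3): early frames use the small strides that fit
print(schedule[12])  # (7, 5): later frames use the maximum stride
print(chain_length(20, schedule), chain_length(20, rollout_schedule(20, 1)))
# prints: 4 20
```

In this sketch frame 20 is reached after 4 chained predictions with a maximum stride of 5, compared with 20 predictions at stride 1, which is the mechanism by which larger strides can limit error accumulation.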

Results

Performance Improvements over MPP

  • Reduction in scaled L2 error for long-term predictions:
    • 40% for PDEBench-Comp-LowVis
    • 61.6% for PDEBench-Comp-HighVis
  • 67% reduction in turbulence kinetic energy prediction error for PDEBench-Comp-LowVis
  • 3x faster inference
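
For readers unfamiliar with the turbulence kinetic energy (TKE) statistic, the standard 2D definition is half the sum of the variances of the velocity fluctuations. The sketch below is a generic illustration with made-up numbers, not the evaluation code behind the figures above: it assumes fluctuations are measured against the spatial mean of a single snapshot, and the exact averaging used in the paper may differ.

```python
def turbulence_kinetic_energy(u, v):
    """TKE of a 2D velocity snapshot: 0.5 * (mean(u'^2) + mean(v'^2)).

    u, v: flat lists of the two velocity components at the grid points.
    Fluctuations u', v' are deviations from the spatial mean (an assumption).
    """
    n = len(u)
    mean_u = sum(u) / n
    mean_v = sum(v) / n
    var_u = sum((x - mean_u) ** 2 for x in u) / n
    var_v = sum((y - mean_v) ** 2 for y in v) / n
    return 0.5 * (var_u + var_v)


def relative_error(predicted, reference):
    """Relative error of a scalar statistic."""
    return abs(predicted - reference) / abs(reference)


# Made-up four-point snapshot, for illustration only.
u_ref, v_ref = [1.0, 3.0, 1.0, 3.0], [0.0, 0.0, 2.0, 2.0]
u_pred, v_pred = [1.5, 2.5, 1.5, 2.5], [0.0, 0.0, 2.0, 2.0]

tke_ref = turbulence_kinetic_energy(u_ref, v_ref)
tke_pred = turbulence_kinetic_energy(u_pred, v_pred)
print(tke_ref, tke_pred, relative_error(tke_pred, tke_ref))
# prints: 1.0 0.625 0.375
```

A prediction that smooths out velocity fluctuations underestimates TKE, so this statistic penalizes overly diffusive rollouts that a pointwise error alone may not single out.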

Figure 3: Comparison of turbulence kinetic energy predictions: VICON is more accurate than MPP in both RMSE and physical statistics such as turbulence kinetic energy.

Timestep Stride Generalization and Flexible Rollout Strategies

VICON adapts well to varying time strides:

  • Maintains prediction accuracy with previously unseen larger strides (smax = 6 and 7), despite training only on smax = 1 to 5
  • Particularly effective for the PDEArena-Incomp dataset, where multi-step rollout (smax = 5) substantially outperforms single-step predictions
  • Enables direct application to experimental settings with hardware-constrained sampling rates, eliminating the need for interpolation or retraining the model

Figure 4: Comparison with state-of-the-art performance. VICON is additionally evaluated with different timestep strides, including previously unseen ones. For PDEArena-Incomp dataset, maximum stride size (smax=5) achieves optimal rollout performance.

Citation

@article{cao2026vicon,
 author = {Cao, Y. and Liu, Y. and Yang, L. and Yu, R. and Schaeffer, H. and Osher, S.},
 title = {{VICON}: Vision In-Context Operator Networks for Multi-Physics Fluid Dynamics Prediction},
 journal = {Transactions on Machine Learning Research},
 year = {2026},
 issn = {2835-8856},
 url = {https://openreview.net/forum?id=6V3YmHULQ3}
}

Poster

Acknowledgements

This work was supported by the US Army Research Office (Army-ECASE W911NF-23-1-0231), US Department of Energy, IARPA HAYSTAC Program, CDC, DARPA AIE FoundSci, DARPA YFA, NSF grants (#2205093, #2100237, #2146343, #2134274), AFOSR MURI (FA9550-21-1-0084), NSF DMS 2427558, NUS Presidential Young Professorship, STROBE NSF STC887 DMR 1548924, and ONR N00014-20-1-2787.
