AutoEncoders

DC-AE

What's Changed?

From the original EfficientViT DC-AE code:

  • Removed internal dependencies on anything that's not pure dc-ae inference code
  • Refactored to work as standalone module with relative imports
  • Simplified dependencies
  • Replaced model loader
  • Added a single class for anything related to dc-ae
  • Unified device/dtype handling

Usage

import torch
from dcae import DCAE

ae = DCAE(model='dc-ae-f32c32-mix-1.0', device=torch.device('cuda'), dtype=torch.bfloat16, cache_dir='~/.cache/huggingface')
encoded = ae.encode(tensor)  # tensor: input image batch as an NCHW torch tensor
decoded = ae.decode(encoded)
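
A fuller round-trip might look like the sketch below. The preprocessing (resize, scale to [-1, 1], NCHW layout) is an assumed convention for autoencoders, not something documented by this repo, and the file names are illustrative:

import numpy as np
import torch
from PIL import Image
from dcae import DCAE

device = torch.device('cuda')
dtype = torch.bfloat16
ae = DCAE(model='dc-ae-f32c32-mix-1.0', device=device, dtype=dtype, cache_dir='~/.cache/huggingface')

# load an image and convert to a normalized NCHW tensor (assumed convention: values in [-1, 1])
image = Image.open('input.png').convert('RGB').resize((1024, 1024))
array = np.array(image, dtype=np.float32) / 127.5 - 1.0         # HWC float in [-1, 1]
tensor = torch.from_numpy(array).permute(2, 0, 1).unsqueeze(0)  # 1x3x1024x1024
tensor = tensor.to(device=device, dtype=dtype)

encoded = ae.encode(tensor)   # latent, e.g. 1x32x32x32 for the f32c32 flavor
decoded = ae.decode(encoded)  # back to 1x3x1024x1024

# convert back to an 8-bit image (assumes decoded values are in [-1, 1])
out = ((decoded[0].float().clamp(-1, 1) + 1) * 127.5).permute(1, 2, 0).round().byte().cpu().numpy()
Image.fromarray(out).save('output.png')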

Credit

Notes

  • 2 variants: in and mix
    They have different scaling factors: for example, f32c32-mix-1.0 has a scaling factor of 0.4552 while f32c32-in-1.0 has 0.3189
  • each variant comes in 3 flavors: f32c32, f64c128, f128c512
    with an increasing number of internal stages
    the resulting latent sizes are the same, since each extra stage halves the spatial resolution (quartering the area) while quadrupling the channel count (see the sketch after this list)
    example for a 1024x1024 input:
    • f32c32: 1.26GB, 5 stages (2**5=32), encodes to 32x32x32
    • f64c128: 2.64GB, 6 stages (2**6=64), encodes to 128x16x16
    • f128c512: 4.37GB, 7 stages (2**7=128), encodes to 512x8x8
  • the FID/PSNR notes in the paper don't make sense
  • despite its large size for an autoencoder, it's fast and has relatively low resource requirements
    a typical encode/decode takes ~0.1s for a 1K image on an RTX 4090
  • without any tiling it can do native 4K encode/decode in ~0.5s using 20GB of VRAM
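
A quick arithmetic sketch (plain Python, names are illustrative) showing why the three flavors end up with the same latent size for a 1024x1024 input:

# spatial downsample factor and channel count per flavor (from the list above)
flavors = {'f32c32': (32, 32), 'f64c128': (64, 128), 'f128c512': (128, 512)}

h = w = 1024
for name, (factor, channels) in flavors.items():
    lh, lw = h // factor, w // factor
    print(f'{name}: latent {channels}x{lh}x{lw} = {channels * lh * lw} elements')

# all three print 32768 elements: each extra stage halves each spatial dimension
# (quartering the area) and quadruples the channel count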

Comparing AutoEncoders

python compare.py <image>

Runs encode/decode on the given image using all available DC-AE models and other known autoencoders, then produces an image grid showing (a rough metrics sketch follows the list):

  • image after encode/decode
  • memory usage and time taken for each
  • diff image from original
  • diff, MSE, SSIM and FID scores
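
For reference, a minimal sketch of how a diff image, MSE and SSIM can be computed with numpy and scikit-image; this is not the compare.py implementation, and FID is omitted since it needs a pretrained feature extractor:

import numpy as np
from skimage.metrics import structural_similarity

def compare_metrics(original: np.ndarray, roundtrip: np.ndarray):
    # original/roundtrip: HWC uint8 images of identical size
    a = original.astype(np.float32)
    b = roundtrip.astype(np.float32)
    diff = np.abs(a - b)                # per-pixel absolute difference, usable as a diff image
    mse = float(np.mean((a - b) ** 2))  # mean squared error
    ssim = structural_similarity(a, b, channel_axis=2, data_range=255.0)
    return diff.astype(np.uint8), mse, ssim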

Supported models:

  • dc-ae-f32c32-in-1.0, dc-ae-f64c128-in-1.0, dc-ae-f128c512-in-1.0
  • dc-ae-f32c32-mix-1.0, dc-ae-f64c128-mix-1.0, dc-ae-f128c512-mix-1.0
  • madebyollin/taesd, madebyollin/taesdxl, madebyollin/sdxl-vae-fp16-fix
  • ostris/vae-kl-f8-d16, cross-attention/asymmetric-autoencoder-kl-x-1-5
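
A rough sketch of looping over the DC-AE models and timing a round trip, in the spirit of compare.py but not its actual code; the input tensor is assumed to be prepared as in the usage sketch above:

import time
import torch
from dcae import DCAE

dcae_models = [
    'dc-ae-f32c32-in-1.0', 'dc-ae-f64c128-in-1.0', 'dc-ae-f128c512-in-1.0',
    'dc-ae-f32c32-mix-1.0', 'dc-ae-f64c128-mix-1.0', 'dc-ae-f128c512-mix-1.0',
]

def time_roundtrips(tensor: torch.Tensor):
    for name in dcae_models:
        ae = DCAE(model=name, device=torch.device('cuda'), dtype=torch.bfloat16, cache_dir='~/.cache/huggingface')
        torch.cuda.reset_peak_memory_stats()
        torch.cuda.synchronize()
        t0 = time.time()
        decoded = ae.decode(ae.encode(tensor))
        torch.cuda.synchronize()
        vram = torch.cuda.max_memory_allocated() / 1e9
        print(f'{name}: {time.time() - t0:.3f}s peak-vram={vram:.2f}GB shape={tuple(decoded.shape)}')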

Examples

(example comparison grids: helmet, afghan, alla, asian, cara, paint, smoke, robot, mdd)

Originals

Link
