VL-VAE

VL-VAE is a transformer-based VAE architecture that supports progressive decoding through variable-length latent embeddings.

Examples

Progressive decoding examples from CelebA-HQ-256x256.

  • out2.mp4
  • out5.mp4
  • out6.mp4
  • out12.mp4

Architecture

VL-VAE uses a simple architecture consisting of two headless transformers, which implement the encoder and decoder networks respectively. Unlike conventional autoencoders, the architecture does not necessarily include downsampling layers. Instead, compression is enforced by randomly truncating the encoder's output (i.e. the latent embeddings) during training, with truncation lengths sampled from an exponential distribution.
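The truncation step described above can be sketched as follows. This is a minimal illustration, not the repository's actual code: the helper names (`sample_truncation_length`, `truncate_latents`) and the rate value are assumptions, and the latent shape `(batch, num_tokens, dim)` is one plausible layout for a transformer encoder's output.

```python
import torch

def sample_truncation_length(max_len: int, rate: float = 0.05) -> int:
    # Draw a truncation length from an exponential distribution
    # (the rate here is a hypothetical value), clamped to [1, max_len].
    length = int(torch.distributions.Exponential(rate).sample().item()) + 1
    return min(length, max_len)

def truncate_latents(z: torch.Tensor, rate: float = 0.05) -> torch.Tensor:
    # z: (batch, num_tokens, dim) latent embeddings from the encoder.
    # Keep only the first k tokens; the decoder must then reconstruct
    # the image from this shorter latent sequence, which is what makes
    # progressive decoding possible at inference time.
    k = sample_truncation_length(z.shape[1], rate)
    return z[:, :k, :]

z = torch.randn(4, 256, 64)   # e.g. 256 latent tokens of dimension 64
z_trunc = truncate_latents(z)
print(z_trunc.shape)          # (4, k, 64) with 1 <= k <= 256
```

Because shorter truncations are exponentially more likely, the decoder is trained to produce a coarse reconstruction from the first few tokens and to refine it as more tokens become available.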

TODO

  • Experiment with alternative attention mechanisms (NAT, axial, etc.).
  • Experiment with alternative positional embedding methods.
  • Experiment with alternative patch embeddings.
  • Scale up to 1024x1024 resolution.

About

An implementation of VL-VAE in PyTorch.
