LTXV Model

LTXV: the first generation of the LTX video model

LTXV introduced core creative control and conditioning capabilities. LTX-2.3 is the current model, built for higher fidelity and production workflows.

Introducing LTXV

LTXV is the foundational model in the LTX video generation family, combining fast inference, strong temporal consistency, and high visual fidelity for text-to-video and image-to-video workflows.

Real-time performance with cinematic quality

Efficient spatiotemporal modeling enables fast generation without sacrificing motion coherence or detail.

Creative control & customization

Precise control over motion and depth, with support for LoRA and IC-LoRA to apply custom styles, brand identity, and motion behavior.

Open-source and scalable by design

A highly optimized, open architecture built to run efficiently on both high-end and consumer GPUs.

Model Evolution

The LTX video models build on each other, offering increasingly advanced capabilities for creative exploration and production-ready workflows.

LTXV

First-generation LTX video model built for speed, realism, and creative control
Creative exploration and flexible workflows
Video-only generation (no integrated audio)
Creative control through LoRA and editing tools
Legacy open-source model, available in LTX Studio

LTX-2

Advanced LTX model delivering synchronized audio–video generation and high-fidelity output
Production-ready video generation with extended capabilities
Native, synchronized audio and video generation
LoRA support with enhanced style features and camera control
Available as open source, via API, in the Playground, and in LTX Studio

LTXV vs LTX-2: model comparison

Compare the foundational creative capabilities of LTXV with the advanced, production-ready features of LTX-2, the model available and supported today.

Modalities
LTXV: Video only
LTX-2: Video + audio (jointly generated)

Primary Task
LTXV: Text-to-video, image-to-video
LTX-2: Text-to-audio+video (T2AV)

Architecture
LTXV: Single-stream diffusion transformer integrated with a Video-VAE
LTX-2: Asymmetric dual-stream diffusion transformer (video stream + audio stream) with separate VAEs for audio and video

Transformer Streams
LTXV: One unified stream
LTX-2: Two streams (video and audio)

Dev Model

8-step Distilled model

8-bit quantization

FP8

NVFP4

Latent Space Design
LTXV: Deeply compressed spatiotemporal latent space
LTX-2: Decoupled latent spaces (separate VAEs for video and audio); deeply compressed video latent space

Video VAE
LTXV: Custom Video-VAE with integrated patchifying; the decoder performs both latent-to-pixel conversion and final denoising in pixel space
LTX-2: Spatiotemporal causal Video-VAE

Audio VAE
LTXV: Not applicable (video only)
LTX-2: Causal Audio-VAE operating on mel spectrograms

Compression / Tokenization
LTXV: 1:192 compression, 32×32×8 pixels per token
LTX-2: Audio tokens of ≈1/25 s each, 128-dim latent vectors (video compression not restated)

Denoising Strategy
LTXV: Transformer denoises in latent space; VAE decoder performs final denoising in pixel space
LTX-2: Dual-stream DiT jointly denoises audio and video latents

Cross-Modal Interaction
LTXV: Not applicable
LTX-2: Bidirectional audio-video cross-attention throughout the model

Positional Encoding
LTXV: 3D RoPE for video
LTX-2: 3D RoPE for video, 1D temporal RoPE for audio; temporal RoPE used for cross-modal attention

Text Conditioning
LTXV: T5-XXL text encoder (standard)
LTX-2: Multilingual text encoder (Gemma 3-12B) with multi-layer feature extraction and thinking tokens

Classifier-Free Guidance (CFG)
LTXV: Standard
LTX-2: Modality-aware CFG with separate text and cross-modal guidance scales

Supported Outputs
LTXV: Silent video
LTX-2: Video with synchronized speech, ambient audio, and foley

Inference Capabilities
LTXV: Faster-than-real-time video generation; supports text-to-video and image-to-video (trained simultaneously)
LTX-2: Multi-scale, multi-tile inference up to 4K audiovisual output

Maximum Duration
LTXV: 5 seconds base generation (up to 60 seconds with Temporal Expansion)
LTX-2: Up to 20 seconds of synchronized audiovisual content

Open Source

GitHub
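The compression figures in the table can be checked with a few lines of arithmetic. This is a sketch under three assumptions that the table does not state: RGB input (3 channels), a 128-dimensional latent per LTXV token, and a causal VAE that encodes the first frame on its own, so that F frames become (F - 1)/8 + 1 latent frames. Together they reproduce the stated 1:192 ratio. The function names are hypothetical, and the LTX-2 audio count only applies the ≈1/25 s per token figure.

```python
# Latent token arithmetic implied by the comparison table (illustrative sketch).
# Assumptions (not stated in the table): RGB input, a 128-channel latent per
# token, and a causal VAE where F frames map to (F - 1) / 8 + 1 latent frames.

PATCH_H, PATCH_W, PATCH_T = 32, 32, 8  # pixels per token, from the table
RGB_CHANNELS = 3                       # assumption
LATENT_CHANNELS = 128                  # assumption


def ltxv_compression_ratio() -> int:
    """Pixel values covered by one token divided by latent values per token."""
    pixel_values = PATCH_H * PATCH_W * PATCH_T * RGB_CHANNELS  # 24,576
    return pixel_values // LATENT_CHANNELS


def ltxv_video_tokens(width: int, height: int, num_frames: int) -> int:
    """Latent token count for one clip, under the assumptions above."""
    if width % PATCH_W or height % PATCH_H:
        raise ValueError("width and height must be multiples of 32")
    if (num_frames - 1) % PATCH_T:
        raise ValueError("num_frames must be of the form 8k + 1")
    latent_frames = (num_frames - 1) // PATCH_T + 1
    return (width // PATCH_W) * (height // PATCH_H) * latent_frames


def ltx2_audio_tokens(seconds: float, tokens_per_second: int = 25) -> int:
    """Audio tokens at roughly 1/25 s per token, i.e. about 25 tokens per second."""
    return round(seconds * tokens_per_second)


print(ltxv_compression_ratio())          # 192
print(ltxv_video_tokens(768, 512, 121))  # 24 * 16 * 16 = 6144
print(ltx2_audio_tokens(20))             # 500
```

At 768×512 and 121 frames this gives 6,144 video tokens. Deep compression matters because full self-attention cost grows with the square of the token count, so every halving of tokens roughly quarters the attention work.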

Generation flows in LTX-2

Two flows, optimized for different production needs

Fast

Built for speed and tight feedback loops. Choose Fast Flow when rapid iteration matters more than maximum visual detail.

Technical characteristics:

Resolutions: 1080p, 1440p, 4K
Duration: up to 20 seconds
Lower compute load and faster render times

Pro

High-fidelity generation for stable, detailed results. Choose Pro Flow when visual quality and consistency are more important than render speed.

Technical characteristics:

Resolutions: 1080p, 1440p, 4K
FPS: 25 / 50
Duration: up to 20 seconds
Enhanced detail and stability across extended sequences

FAQs

What is LTXV?

LTXV is an open source AI video generation model developed by Lightricks as part of the LTX Models family. It supports text-to-video and image-to-video workflows and is commonly used in creator and research environments.

Is LTXV open source and where can I download it?

LTXV is available as an open source AI video generation model on GitHub and Hugging Face:

GitHub: https://github.com/Lightricks/LTX-Video
Hugging Face: https://huggingface.co/Lightricks/LTX-Video

These repositories provide access to the model code, files, and documentation for local use and experimentation.

Does LTXV support text-to-video?

Yes. LTXV supports text-to-video generation, allowing users to create videos from written prompts using open source workflows.
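For local text-to-video outside ComfyUI, LTXV can also be run through Hugging Face diffusers. The sketch below assumes a recent diffusers release that includes `LTXPipeline`, PyTorch, a CUDA GPU, and the `Lightricks/LTX-Video` checkpoint; the prompt, negative prompt, resolution, and step count are illustrative values, not official recommendations. The shape check follows the diffusers LTX documentation's guidance that width and height be divisible by 32 and the frame count be of the form 8k + 1.

```python
# Text-to-video sketch for LTXV via Hugging Face diffusers.
# Assumptions: diffusers with LTXPipeline, torch, a CUDA GPU, and access to
# the "Lightricks/LTX-Video" checkpoint on Hugging Face.


def is_valid_ltxv_shape(width: int, height: int, num_frames: int) -> bool:
    """True if width/height are multiples of 32 and num_frames is 8k + 1."""
    return width % 32 == 0 and height % 32 == 0 and num_frames % 8 == 1


def generate(prompt: str, out_path: str = "output.mp4",
             width: int = 704, height: int = 480, num_frames: int = 161) -> str:
    """Generate a clip from a text prompt and save it as an MP4 file."""
    if not is_valid_ltxv_shape(width, height, num_frames):
        raise ValueError("use width/height divisible by 32 and num_frames = 8k + 1")

    # Heavy imports are deferred so the shape check works without a GPU stack.
    import torch
    from diffusers import LTXPipeline
    from diffusers.utils import export_to_video

    pipe = LTXPipeline.from_pretrained("Lightricks/LTX-Video", torch_dtype=torch.bfloat16)
    pipe.to("cuda")
    video = pipe(
        prompt=prompt,
        # Illustrative negative prompt; tune for your own content.
        negative_prompt="worst quality, inconsistent motion, blurry, jittery, distorted",
        width=width,
        height=height,
        num_frames=num_frames,
        num_inference_steps=50,  # illustrative step count
    ).frames[0]
    export_to_video(video, out_path, fps=24)
    return out_path
```

Calling `generate("A slow dolly shot through a misty pine forest at dawn")` downloads the weights on first use and writes `output.mp4`.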

How do I use LTXV with ComfyUI?

LTXV is supported in ComfyUI Core and through the LTX-Video custom nodes and workflows. The custom nodes can be installed from our GitHub repository or through ComfyUI's node manager.
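Once an LTXV workflow runs in the ComfyUI interface, it can also be queued from a script. This is a sketch under assumptions: a ComfyUI server on its default local address, and a workflow file exported in ComfyUI's API format (not the regular UI save format). The `/prompt` endpoint and its `prompt`/`client_id` body fields come from ComfyUI's built-in HTTP server; the helper names and file path are hypothetical.

```python
import json
import urllib.request
import uuid
from typing import Optional

# Assumption: ComfyUI is running locally on its default port.
COMFY_URL = "http://127.0.0.1:8188"


def build_prompt_request(workflow: dict, client_id: Optional[str] = None) -> dict:
    """Wrap an API-format workflow in the JSON body that POST /prompt expects."""
    return {"prompt": workflow, "client_id": client_id or str(uuid.uuid4())}


def queue_workflow(path: str, server: str = COMFY_URL) -> dict:
    """Load an API-format workflow file (e.g. an LTXV workflow) and queue it."""
    with open(path, encoding="utf-8") as f:
        workflow = json.load(f)
    body = json.dumps(build_prompt_request(workflow)).encode("utf-8")
    request = urllib.request.Request(
        f"{server}/prompt",
        data=body,  # supplying data makes urllib send a POST
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        # The server replies with JSON that normally includes a "prompt_id".
        return json.load(response)
```

For example, `queue_workflow("ltxv_t2v_api.json")` (hypothetical file name) queues the job; outputs are written by the workflow's save nodes as usual.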

Did LTXV get an upgrade or an "LTXV 2.0"?

LTX-2.3 is the next-generation model following LTXV. LTXV is the first video generation model in the LTX family, while LTX-2.3 is built on a different architecture and designed for higher-fidelity, production-grade video generation.

What are the different LTXV models, and which one should I use?

LTXV is available in multiple model variants designed to balance quality, speed, and VRAM usage depending on your workflow and hardware.

In general:

13B models offer the highest visual quality, but require more VRAM.
2B models are smaller and faster, making them suitable for lower-end hardware or rapid iteration.

Key model types include:

Development (dev) models, which provide full feature support, are trainable, and produce the highest-quality outputs.
Distilled models, which are much faster and still geared toward high quality, though slightly less prompt-adherent.
Mixed workflows, which combine dev and distilled models for a balance of speed and quality.
FP8 (quantized) models, which further reduce VRAM usage while maintaining similar behavior.

For most users:

Choose 13B dev for maximum quality and flexibility.
Choose 13B distilled for faster iteration with good quality.
Choose 2B distilled for lightweight, fast generation on limited hardware.

Each model variant is supported by specific inference configurations and recommended ComfyUI workflows, allowing users to select the setup that best fits their needs.
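The "For most users" guidance can be written down as a small lookup, which can help when scripting setup across several machines. This is a hypothetical helper that only encodes the three recommendations above; it returns descriptive labels rather than checkpoint file names, and it does not model FP8 variants or mixed workflows.

```python
from enum import Enum


class Priority(Enum):
    QUALITY = "quality"
    SPEED = "speed"


def suggest_ltxv_variant(priority: Priority, limited_hardware: bool) -> str:
    """Map the 'For most users' guidance to a variant label.

    Hypothetical helper: labels are descriptive, not checkpoint file names.
    """
    if limited_hardware:
        # Lightweight, fast generation on limited hardware.
        return "2B distilled"
    if priority is Priority.QUALITY:
        # Maximum quality and flexibility.
        return "13B dev"
    # Faster iteration with good quality.
    return "13B distilled"
```

Keeping the mapping in one function makes it easy to extend later, for example with an FP8 branch for GPUs that are short on VRAM.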

Does LTXV have an API?

No. LTXV does not currently have a public API. The model is available as an open source AI video generation model and is intended to be run locally or integrated through open source tools and workflows, such as ComfyUI.

For API-based video generation, Lightricks provides access through other models, such as LTX-2.3, which is designed for stable, production-ready API usage.

Does LTXV support LoRA training and LoRA-based customization?

Yes. LTXV supports LoRA-based workflows, allowing users to apply style, behavior, or identity adaptations through LoRA models as part of their video generation pipeline. LoRAs can be used with supported LTXV model variants and integrated through open source tools such as ComfyUI.

LoRA support enables faster experimentation and customization without retraining the full model, making it useful for creative and research-focused workflows.
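A LoRA avoids retraining the full model because it learns a small low-rank update next to each frozen weight matrix. The sketch below shows the generic LoRA formulation, W' = W + (alpha / r) · B · A, in plain Python, together with the trainable-parameter count. It is not LTXV's training code, and the alpha / r scaling is the common convention rather than something this page specifies.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product: rows of a times columns of b."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def apply_lora(w: Matrix, a: Matrix, b: Matrix, alpha: float) -> Matrix:
    """Return W + (alpha / r) * B @ A without modifying W.

    w is the frozen d x k weight, b is d x r, a is r x k; r = len(a) is the rank.
    """
    r = len(a)
    scale = alpha / r
    delta = matmul(b, a)
    return [[wij + scale * dij for wij, dij in zip(wrow, drow)]
            for wrow, drow in zip(w, delta)]


def lora_param_count(d: int, k: int, r: int) -> int:
    """Trainable parameters in the low-rank pair (B is d x r, A is r x k)."""
    return r * (d + k)
```

For a single 4096 × 4096 projection at rank 16, the LoRA pair has 16 × (4096 + 4096) = 131,072 trainable parameters versus 16,777,216 in the full matrix, a 128× reduction, which is why LoRA experiments are fast and the resulting files are small.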