LipSync — Real-Time 3D Facial Animation from Audio

Project Status: Active development. The TypeScript rewrite is feature-complete with 204 passing tests. The original AS3 source is preserved under /reference/as3/ for historical reference.

LipSync generates real-time 3D facial animations from speech audio. It performs LPC (Linear Predictive Coding) analysis on 44.1 kHz audio, classifies the resulting feature vectors through a neural network into viseme categories, and drives morph targets on a 3D avatar.

LipSync was originally written in 2011 as an ActionScript 3 / Away3D / Flex application. This repository is the modern TypeScript rewrite, built on Three.js and Vite.

Prerequisites

Node.js >= 20.0.0
npm (included with Node.js)
A modern browser (see Browser Support)

How It Works

Audio (44.1 kHz PCM)
  └─► 18 ms windows (794 samples), one every 20 ms
        └─► LPC analysis → 9 reflection coefficients
              └─► Neural Network (50→50→6) → 6-bit binary vector
                    └─► Decode → Phoneme (v1–v9, silence)
                          └─► 10 viseme morph targets → 3D avatar
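The classification stage in the diagram can be illustrated with a small sketch of a sigmoid feed-forward pass followed by thresholding into a binary vector. This is not the code in src/core/nn.ts: the `DenseLayer` shape, the function names, and the 0.5 cut-off are assumptions made for illustration, and the demo weights are made up.

```typescript
// One fully connected layer: weights[j][i] connects input i to neuron j.
// This layout is an assumption, not the format of ground-truth/nn-weights.json.
interface DenseLayer {
  weights: number[][];
  biases: number[];
}

const sigmoid = (x: number): number => 1 / (1 + Math.exp(-x));

// Forward pass: every layer computes sigmoid(W * x + b) on the previous output.
function forward(layers: DenseLayer[], input: number[]): number[] {
  let activations = input;
  for (const layer of layers) {
    const prev = activations;
    activations = layer.weights.map((row, j) => {
      let sum = layer.biases[j];
      for (let i = 0; i < row.length; i++) sum += row[i] * prev[i];
      return sigmoid(sum);
    });
  }
  return activations;
}

// Turn the output activations into bits (the 0.5 threshold is an assumption).
function toBits(outputs: number[], threshold = 0.5): number[] {
  return outputs.map((v) => (v >= threshold ? 1 : 0));
}

// Toy network with made-up weights: 2 inputs, 2 outputs, a single layer.
const demoNet: DenseLayer[] = [{ weights: [[10, 0], [0, -10]], biases: [0, 0] }];
const demoBits = toBits(forward(demoNet, [1, 1])); // [1, 0]
```

The real network would stack two 50-neuron hidden layers and a 6-neuron output layer in the same way, so `toBits` would return six bits per window.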

Audio Pipeline

Component	Detail
Sample rate	44.1 kHz
Window size	18 ms (794 samples)
Step interval	20 ms (882 samples, `STEP_SAMPLES`)
Decimation	Stride-7 downsampling (keep every 7th sample)
LPC order	9 reflection coefficients (PARCOR)
VAD threshold	Energy ≥ 0.025 activates recognition
NN architecture	2 hidden layers × 50 neurons, 6 output bits
Viseme classes	10 (v1–v9 + silence)
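To make the numbers in the table concrete, here is a hedged sketch of the per-window processing: framing, the VAD check, stride-7 decimation, and reflection coefficients from a Levinson-Durbin recursion. Apart from `STEP_SAMPLES`, every name is hypothetical. The RMS definition of energy, the order of the VAD and decimation steps, and the sign convention of the coefficients are all assumptions, and the actual code in src/core/lpc.ts and src/player/audio.ts may differ.

```typescript
// Values taken from the table above (the source audio is 44.1 kHz).
const WINDOW_SAMPLES = 794; // about 18 ms
const STEP_SAMPLES = 882; // 20 ms
const DECIMATION_STRIDE = 7; // 44100 / 7 = 6300 Hz effective rate
const LPC_ORDER = 9;
const VAD_THRESHOLD = 0.025;

// Start offsets of every full analysis window in a buffer of `length` samples.
function frameStarts(length: number): number[] {
  const starts: number[] = [];
  for (let s = 0; s + WINDOW_SAMPLES <= length; s += STEP_SAMPLES) starts.push(s);
  return starts;
}

// Keep every `stride`-th sample, starting with the first one.
function decimate(samples: number[], stride: number): number[] {
  return samples.filter((_, i) => i % stride === 0);
}

// Frame energy, assumed here to be the root mean square of the samples.
function rmsEnergy(samples: number[]): number {
  if (samples.length === 0) return 0;
  let sum = 0;
  for (const v of samples) sum += v * v;
  return Math.sqrt(sum / samples.length);
}

// Autocorrelation r[0..maxLag] of one frame.
function autocorrelate(x: number[], maxLag: number): number[] {
  const r: number[] = [];
  for (let lag = 0; lag <= maxLag; lag++) {
    let sum = 0;
    for (let n = lag; n < x.length; n++) sum += x[n] * x[n - lag];
    r.push(sum);
  }
  return r;
}

// Levinson-Durbin recursion returning the reflection (PARCOR) coefficients
// for the predictor polynomial A(z) = 1 + a1*z^-1 + ... + ap*z^-p.
function reflectionCoefficients(r: number[], order: number): number[] {
  const k: number[] = new Array<number>(order).fill(0);
  let err = r[0];
  if (!(err > 0)) return k; // silent frame: nothing to predict
  let a: number[] = [1];
  for (let m = 1; m <= order; m++) {
    let acc = r[m];
    for (let i = 1; i < m; i++) acc += a[i] * r[m - i];
    const km = -acc / err;
    k[m - 1] = km;
    const next = [...a, km];
    for (let i = 1; i < m; i++) next[i] = a[i] + km * a[m - i];
    a = next;
    err *= 1 - km * km; // remaining prediction error
    if (err <= 0) break; // perfectly predictable signal: stop early
  }
  return k;
}

// Features for one window, or null when the frame fails the VAD check.
function analyzeFrame(frame: number[]): number[] | null {
  if (rmsEnergy(frame) < VAD_THRESHOLD) return null;
  const reduced = decimate(frame, DECIMATION_STRIDE);
  return reflectionCoefficients(autocorrelate(reduced, LPC_ORDER), LPC_ORDER);
}
```

With these constants, one second of audio (44,100 samples) yields 50 windows, and each 794-sample window shrinks to 114 samples after decimation.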

3D Rendering

Three.js (r184) with WebGL renderer
GLTF/GLB model loading via GLTFLoader + KTX2 texture support
Primary model: Ready Player Me brunette.glb (72 morph targets)
Fallback model: facecap.glb (52 ARKit-style morph targets)
Morph targets support both RPM naming (viseme_aa, viseme_PP, ...) and ARKit naming (jawOpen, mouthSmile_L, ...)
2 camera presets with smooth lerp transitions
Eye bone rotation + blink morph animation
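As a rough illustration of how viseme weights could drive morph targets with smooth transitions, the sketch below interpolates each influence toward a target weight. `MorphMesh` is a hypothetical minimal type that mirrors the `morphTargetDictionary` and `morphTargetInfluences` fields of a Three.js mesh; `applyVisemeWeights` and the blend factor are assumptions and not the project's API.

```typescript
// Structural stand-in for the two Three.js Mesh fields used for morphing.
interface MorphMesh {
  morphTargetDictionary: Record<string, number>; // target name -> index
  morphTargetInfluences: number[]; // weight per index, usually in [0, 1]
}

// Linear interpolation from a to b by factor t in [0, 1].
const lerp = (a: number, b: number, t: number): number => a + (b - a) * t;

// Move every influence toward its target weight; unlisted targets ease toward 0.
function applyVisemeWeights(
  mesh: MorphMesh,
  targets: Record<string, number>,
  t: number,
): void {
  for (const [name, index] of Object.entries(mesh.morphTargetDictionary)) {
    const goal = targets[name] ?? 0;
    mesh.morphTargetInfluences[index] = lerp(mesh.morphTargetInfluences[index], goal, t);
  }
}

// Example: blend halfway toward an open "aa" mouth shape.
const demoMesh: MorphMesh = {
  morphTargetDictionary: { viseme_aa: 0, viseme_PP: 1 },
  morphTargetInfluences: [0, 1],
};
applyVisemeWeights(demoMesh, { viseme_aa: 1 }, 0.5);
// demoMesh.morphTargetInfluences is now [0.5, 0.5]
```

A real render loop would probably restrict the loop to the viseme targets, so that blink and expression morphs controlled elsewhere are not pulled back to zero.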

Quick Start

# Install dependencies
npm install

# Start development server
npm run dev

# Build for production
npm run build

# Run all tests
npm test

# TypeScript type check
npm run build:tsc

# Preview production build
npx serve dist -l 30925

Project Structure

LipSync/
├── src/                          # TypeScript source
│   ├── main.ts                   # Entry point — UI, scene setup, mic pipeline
│   ├── core/
│   │   ├── lpc.ts                # LPC analysis (Durbin-Levinson recursion)
│   │   ├── nn.ts                 # Neural network (forward pass, sigmoid activation)
│   │   └── phoneme.ts            # Phoneme data model, encoding/decoding
│   ├── player/
│   │   ├── player.ts             # LipsyncPlayer — pipeline orchestrator
│   │   └── audio.ts              # Audio extraction, decimation, VAD
│   ├── avatar3d/
│   │   ├── avatar.ts             # Avatar3D — scene, camera, render loop
│   │   ├── modelLoader.ts        # GLTF/GLB loader, morph target binding
│   │   └── expression.ts         # Expression data model (14-parameter system)
│   └── __tests__/                # Vitest unit & integration tests
│       ├── lpc.test.ts           # LPC analysis (19 tests)
│       ├── nn.test.ts            # NN forward pass (12 tests)
│       ├── phoneme.test.ts       # Phoneme encoding/decoding (39 tests)
│       ├── audio.test.ts         # Audio pipeline (38 tests)
│       ├── player.test.ts        # Player integration (27 tests)
│       ├── expression.test.ts    # Expression system (50 tests)
│       └── avatar3d-avatar.test.ts # 3D avatar tests (6 tests)
├── public/                       # Static assets
│   ├── models/
│   │   ├── brunette.glb          # Ready Player Me avatar (72 morph targets)
│   │   └── facecap.glb           # Face Cap model (52 morph targets)
│   ├── samples/                  # Test speech audio files
│   ├── basis/                    # KTX2 transcoder (WASM)
│   └── audio-processor.js        # AudioWorklet processor (mic capture)
├── ground-truth/                 # JSON test vectors (verified at 1e-9 tolerance)
│   ├── audio-pipeline.json
│   ├── lpc-test-vectors.json
│   ├── nn-weights.json
│   └── phoneme-model.json
├── reference/
│   ├── as3/                      # Original AS3 source (Away3D, Flex)
│   │   ├── src/
│   │   ├── lib/
│   │   └── obj/
│   └── docs/                     # Historical documentation
│       ├── as3-architecture.md
│       ├── RESEARCH.md
│       ├── NN_FORWARD_PASS_AUDIT.md
│       └── REALTIME_ANALYSIS.md
├── package.json                  # Dependencies & scripts
├── tsconfig.json                 # TypeScript config
├── vite.config.ts                # Vite build config
├── vitest.config.ts              # Vitest test config
├── index.html                    # App shell
└── LICENSE                       # Unlicense (Public Domain)

Tests

All 204 tests pass across 7 test files, run with Vitest 3.

LPC analysis verified against 4 ground-truth JSON files
Neural network forward pass validated at 1e-9 floating-point tolerance
Phoneme encoding/decoding covers all 19 phoneme symbols
Audio pipeline tests cover VAD, decimation, energy computation
Player integration tests cover temporal smoothing, pre-buffering, event dispatch

npm test          # Run all tests
npm run test:watch  # Watch mode
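The 1e-9 tolerance mentioned above could be checked with a helper along these lines. This is only a sketch: `maxAbsDiff` and `withinTolerance` are hypothetical names, and the layout of the ground-truth JSON files is not assumed here.

```typescript
// Largest element-wise absolute difference between two equally long vectors.
function maxAbsDiff(actual: number[], expected: number[]): number {
  if (actual.length !== expected.length) {
    throw new Error(`length mismatch: ${actual.length} vs ${expected.length}`);
  }
  let worst = 0;
  for (let i = 0; i < actual.length; i++) {
    worst = Math.max(worst, Math.abs(actual[i] - expected[i]));
  }
  return worst;
}

// True when every element agrees with the reference within `tolerance`.
function withinTolerance(actual: number[], expected: number[], tolerance = 1e-9): boolean {
  return maxAbsDiff(actual, expected) <= tolerance;
}
```

In a Vitest test, the result could then be asserted with `expect(maxAbsDiff(actual, expected)).toBeLessThanOrEqual(1e-9)`.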

Deployment

The app is automatically deployed to GitHub Pages on every push to master via GitHub Actions:

https://s-soltys.github.io/LipSync/

To reproduce the production build and preview it locally:

npm run build
npx serve dist -l 30925

The Vite config sets base to /LipSync/ for the production build and copies ground-truth/ into the build output via a custom plugin.
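For readers unfamiliar with Vite plugins, a configuration of this kind might look roughly like the sketch below. It is not the project's vite.config.ts: the plugin name `copy-ground-truth` and the choice of the `closeBundle` hook are assumptions.

```typescript
import { cpSync } from "node:fs";
import { resolve } from "node:path";
import { defineConfig, type Plugin } from "vite";

// Hypothetical plugin: copies ground-truth/ into the output folder after a build.
function copyGroundTruth(): Plugin {
  let outDir = "dist";
  return {
    name: "copy-ground-truth",
    apply: "build", // skip during `npm run dev`
    configResolved(config) {
      outDir = resolve(config.root, config.build.outDir);
    },
    closeBundle() {
      cpSync(resolve("ground-truth"), resolve(outDir, "ground-truth"), {
        recursive: true,
      });
    },
  };
}

export default defineConfig(({ command }) => ({
  // Assets are served from /LipSync/ on GitHub Pages, from / in development.
  base: command === "build" ? "/LipSync/" : "/",
  plugins: [copyGroundTruth()],
}));
```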

Browser Support

LipSync targets modern browsers that support ES2022. The application uses:

AudioWorklet API for microphone capture (Chrome 66+, Firefox 76+, Safari 14.1+)
WebGL 2.0 for 3D rendering
ES modules (no legacy bundling)
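A page could test for these capabilities before starting the pipeline. The following sketch is an assumption about how such a check might be written, not something the app is known to do; `detectSupport` and `SupportReport` are hypothetical names. The function takes the global scope as a parameter so it can also run outside a browser.

```typescript
interface SupportReport {
  audioWorklet: boolean;
  webgl2: boolean;
}

// Minimal shape of the parts of `document` this check relies on.
type DocumentLike = {
  createElement?: (tag: string) => { getContext?: (id: string) => unknown };
};

// Inspect a global scope (the real one by default) for the required APIs.
function detectSupport(
  scope: Record<string, unknown> = globalThis as unknown as Record<string, unknown>,
): SupportReport {
  const audioWorklet = typeof scope["AudioWorkletNode"] === "function";

  let webgl2 = false;
  const doc = scope["document"] as DocumentLike | undefined;
  if (doc?.createElement) {
    // getContext("webgl2") returns null when WebGL 2.0 is unavailable.
    const canvas = doc.createElement("canvas");
    webgl2 = canvas.getContext?.("webgl2") != null;
  }
  return { audioWorklet, webgl2 };
}
```

In a browser, calling `detectSupport()` with no arguments inspects the real globals. The app could then show a fallback message when either flag is false.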

Reference

/reference/docs/ — Architecture documents, research notes, audit reports
/reference/as3/ — Original ActionScript 3 source code (2011, Away3D + Flex)

Key reference documents:

as3-architecture.md — Original architecture walkthrough
NN_FORWARD_PASS_AUDIT.md — Neural network pass audit
REALTIME_ANALYSIS.md — Real-time pipeline analysis
RESEARCH.md — Background research notes

License

This project is public domain under the Unlicense.

This is free and unencumbered software released into the public domain.

Anyone is free to copy, modify, publish, use, compile, sell, or
distribute this software, either in source code form or as a compiled
binary, for any purpose, commercial or non-commercial, and by any means.

See the LICENSE file for the full text.
