English | 日本語
⚠️ This is a reference snapshot, not a maintained project. The purpose of this repository is to serve as a codebase that you can show to an AI assistant (Claude, ChatGPT, Codex, etc.) to help you build a similar system. It is not intended to be cloned and run as-is.
| Idle | Voice Input | Thinking | Speaking |
|---|---|---|---|
Character art: WhiteCUL立ち絵素材 by moiky — not included in this repository. You must supply your own images.
An iOS app that turns an iPhone (mounted on an Insta360 Flow Pro or any DockKit-compatible motorized stand) into a physical AI avatar with:
- 🎭 29 expressions — pre-composed character images swapped in real-time
- 🗣️ Text-to-speech with lip sync — audio streamed from a server, mouth animation synced to playback
- 💬 Speech bubbles — assistant and user speech displayed on screen
- 🎤 Voice input — on-device wake word detection + speech-to-text (iOS 26+ SpeechAnalyzer)
- 🤖 DockKit motor control — nod, shake, look around, custom velocity/rotation sequences
- 🌙 Idle scenes — sleeping, snacking, singing, thinking, bored (with burn-in protection)
- 📡 WebSocket control — all features controllable via JSON commands over LAN
┌─────────────┐ WebSocket (8765) ┌──────────────┐
│ AI Backend │ ◄──────────────────────► │ iPhone App │
│ (any host) │ JSON commands + audio │ (this code) │
└─────────────┘ └──────┬───────┘
│ DockKit API
┌──────▼───────┐
│ Motorized │
│ Phone Stand │
└──────────────┘
The app is the display + motor controller. Your AI backend synthesizes speech (e.g., VOICEVOX, ElevenLabs), encodes it as base64 WAV, and sends it along with expression/motion commands via WebSocket.
| File | Role |
|---|---|
ContentView.swift |
Main UI — composites expression images, speech bubbles, status bar; routes WebSocket commands |
WebSocketServer.swift |
LAN WebSocket server (port 8765) using Network.framework |
ExpressionManager.swift |
Expression enum (29 types), auto-blink, JSON command parser |
DockKitManager.swift |
DockKit accessory control — gestures, orientation, velocity, motion presets |
AudioPlayerManager.swift |
Base64 WAV playback + lip-sync timer |
SpeechQueueManager.swift |
FIFO queue for sequential speech items with expression/motion per item |
VoiceInputManager.swift |
iOS 26 SpeechAnalyzer — wake word → capture → send to backend |
IdleSceneManager.swift |
Time/activity-based idle animations with screen dimming |
SpeechBubbleView.swift |
Styled speech bubble (assistant/user/previous/sending) |
StatusBarView.swift |
Connection status pills (WebSocket, DockKit, mic, audio) |
CameraManager.swift |
AVCaptureSession for camera preview (DockKit requires active camera) |
See spec/ for detailed documentation:
architecture.md— System diagram and data flowwebsocket-protocol.md— Full WebSocket API referencehost-scripts.md— Backend control scriptsvoice-input.md— On-device voice input (iOS 26+)dockkit-integration.md— Motorized stand controlidle-scenes.md— Idle animation system
The host-scripts/ directory contains the backend tools that control the app:
| Script | Purpose |
|---|---|
yuki_cmd.sh |
Main command — TTS synthesis (VOICEVOX) + expression + motion, sent as one WebSocket message |
ws_send.py |
Raw WebSocket client for sending arbitrary JSON commands |
# Quick start
export YUKI_IPHONE_IP=<your-iphone-ip>
./host-scripts/yuki_cmd.sh "こんにちは" --expression happy --motion smallNodSee spec/host-scripts.md for full documentation.
- Character images — The original uses WhiteCUL (copyrighted). You must supply your own PNG images named to match the
Expressionenum cases inExpressionManager.swift. - TTS backend — The app plays audio sent to it; you need your own speech synthesis pipeline.
- AI orchestration — The WebSocket protocol is documented in
spec/websocket-protocol.md. Build your own controller.
- Show this code to your AI assistant and describe what you want to build
- The AI can understand the architecture from the code + specs and help you adapt it
- Replace the expression images with your own character art
- Build your own backend that speaks the WebSocket protocol
- Optionally integrate DockKit for physical movement
- iOS 17.0+ (iOS 26+ for voice input features)
- Xcode 26+
- DockKit-compatible stand (optional, for motor control)
- A TTS engine on your network (VOICEVOX, ElevenLabs, etc.)
- Character art in screenshots: WhiteCUL立ち絵素材 by moiky — used under the original distribution terms. The character image files are not included in this repository.
- WhiteCUL: A character from the VOICEVOX project
MIT — The code is free to use. Character art is NOT included and must be sourced separately. Screenshots contain WhiteCUL character art by moiky, used for demonstration purposes only.