Why Sokuji?

Live speech translation powered by OpenAI, Google Gemini, Palabra.ai, and Kizuna AI


Sokuji is a cross-platform desktop application that provides live speech translation using the OpenAI, Google Gemini, Palabra.ai, and Kizuna AI APIs. Available for Windows, macOS, and Linux, it bridges language barriers in live conversations by capturing audio input, processing it through advanced AI models, and delivering translated output in real time. It also supports OpenAI-compatible API endpoints for flexibility.


Browser Extension Available!

Prefer not to install a desktop application? Try our browser extension for Chrome, Edge, and other Chromium-based browsers. It offers the same powerful live speech translation features directly in your browser, with special integration for Google Meet and Microsoft Teams.

Installing Browser Extension in Developer Mode

To install the latest version of the browser extension manually:

Download the latest sokuji-extension.zip from the releases page
Extract the zip file to a folder
Open Chrome/Chromium and go to chrome://extensions/
Enable "Developer mode" in the top right corner
Click "Load unpacked" and select the extracted folder
The Sokuji extension will be installed and ready to use

More than just translation

Sokuji goes beyond basic translation by offering a complete audio routing solution with virtual device management (Linux only), allowing for seamless integration with other applications. It provides a modern, intuitive interface with real-time audio visualization and comprehensive logging.

Features

Real-time speech translation using OpenAI, Google Gemini, Palabra.ai, and Kizuna AI APIs
Simple Mode Interface: Streamlined 6-section configuration for non-technical users:
- Interface language selection
- Translation language pairs (source/target)
- API key management with validation
- Microphone selection with "Off" option
- Speaker selection with "Off" option
- Real-time session duration display
Multi-Provider Support: Seamlessly switch between OpenAI, Google Gemini, Palabra.ai, and Kizuna AI.
Supported Models:
- OpenAI: gpt-4o-realtime-preview, gpt-4o-mini-realtime-preview, gpt-realtime, gpt-realtime-2025-08-28
- Google Gemini: gemini-2.0-flash-live-001, gemini-2.5-flash-preview-native-audio-dialog
- Palabra.ai: Real-time speech-to-speech translation via WebRTC
- Kizuna AI: OpenAI-compatible models with backend-managed authentication
- OpenAI Compatible: Support for custom OpenAI-compatible API endpoints (Electron only)
Automatic turn detection with multiple modes (Normal, Semantic, Disabled) for OpenAI
Audio visualization with waveform display
Advanced Virtual Microphone (Linux only) with dual-queue audio mixing system:
- Regular audio tracks: Queued and played sequentially
- Immediate audio tracks: Separate queue for real-time audio mixing
- Simultaneous playback: Mix both track types for enhanced audio experience
- Chunked audio support: Efficient handling of large audio streams
Real-time Voice Passthrough: Live audio monitoring during recording sessions
Virtual audio device creation and management on Linux (using PulseAudio/PipeWire)
Automatic audio routing between virtual devices (Linux only)
Automatic device switching and configuration persistence
Audio input and output device selection
Comprehensive logs for tracking API interactions
Customizable model settings (temperature, max tokens)
User transcript model selection (for OpenAI: gpt-4o-mini-transcribe, gpt-4o-transcribe, whisper-1)
Noise reduction options (for OpenAI: None, Near field, Far field)
API key validation with real-time feedback
Configuration persistence in user's home directory
Optimized AI Client Performance: Enhanced conversation management with consistent ID generation
Enhanced Tooltips: Interactive help tooltips powered by @floating-ui for better user guidance
Multi-language Support: Complete internationalization with 35+ languages and English fallback
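The OpenAI-specific options above (turn detection mode, noise reduction, and transcript model) ultimately become fields of a Realtime API session configuration. The sketch below shows one plausible mapping. The `OpenAIUiSettings` shape and `toSessionFields` function are hypothetical, and mapping "Normal" to `server_vad` is an assumption rather than Sokuji's confirmed behavior. The output field names follow the beta-style Realtime API session object; newer API versions may nest these fields differently, so check the current reference.

```typescript
// Hypothetical UI-level options, mirroring the feature list above.
type TurnDetectionMode = "Normal" | "Semantic" | "Disabled";
type NoiseReductionMode = "None" | "Near field" | "Far field";
type TranscriptModel = "gpt-4o-mini-transcribe" | "gpt-4o-transcribe" | "whisper-1";

interface OpenAIUiSettings {
  turnDetection: TurnDetectionMode;
  noiseReduction: NoiseReductionMode;
  transcriptModel: TranscriptModel;
}

// Subset of a Realtime API session configuration (beta-style top-level fields).
interface RealtimeSessionFields {
  turn_detection: { type: "server_vad" | "semantic_vad" } | null;
  input_audio_noise_reduction: { type: "near_field" | "far_field" } | null;
  input_audio_transcription: { model: TranscriptModel };
}

function toSessionFields(ui: OpenAIUiSettings): RealtimeSessionFields {
  // Assumption: "Normal" means server-side VAD; "Disabled" sends null (manual turns).
  const turnDetection: RealtimeSessionFields["turn_detection"] =
    ui.turnDetection === "Normal" ? { type: "server_vad" }
    : ui.turnDetection === "Semantic" ? { type: "semantic_vad" }
    : null;

  // "None" disables noise reduction by sending null.
  const noiseReduction: RealtimeSessionFields["input_audio_noise_reduction"] =
    ui.noiseReduction === "Near field" ? { type: "near_field" }
    : ui.noiseReduction === "Far field" ? { type: "far_field" }
    : null;

  return {
    turn_detection: turnDetection,
    input_audio_noise_reduction: noiseReduction,
    input_audio_transcription: { model: ui.transcriptModel },
  };
}

// Example: semantic turn detection with far-field noise reduction.
const exampleSession = toSessionFields({
  turnDetection: "Semantic",
  noiseReduction: "Far field",
  transcriptModel: "gpt-4o-mini-transcribe",
});
```

Keeping the UI vocabulary separate from the wire format means the settings panel does not need to change if the provider renames or nests these fields.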

Audio Architecture

Sokuji uses a modern audio processing pipeline built on Web Audio API, with additional virtual device capabilities on Linux:

ModernAudioRecorder: Captures input with advanced echo cancellation
ModernAudioPlayer: Handles playback with queue-based audio management
Real-time Processing: Low-latency audio streaming with chunked playback
Virtual Device Support: On Linux, creates virtual audio devices for application integration
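The dual-queue mixing listed under the Advanced Virtual Microphone feature can be pictured as two FIFO queues feeding one output buffer: regular tracks play one after another, while immediate tracks are mixed on top of whatever is playing. The sketch below is a simplified, hypothetical model (the class and method names are invented, and samples are assumed to be float32 in the range [-1, 1]); it is not Sokuji's implementation, which feeds a PulseAudio/PipeWire virtual device.

```typescript
// Hypothetical dual-queue mixer: each render() call produces one frame of samples.
class DualQueueMixer {
  private regular: Float32Array[] = [];   // played strictly one after another
  private immediate: Float32Array[] = []; // mixed on top of the regular queue
  private regularPos = 0;   // read offset into regular[0]
  private immediatePos = 0; // read offset into immediate[0]

  enqueueRegular(track: Float32Array): void { this.regular.push(track); }
  enqueueImmediate(track: Float32Array): void { this.immediate.push(track); }

  // Sum the heads of both queues into one frame, then clamp to full scale.
  render(frameSize: number): Float32Array {
    const out = new Float32Array(frameSize);
    this.regularPos = this.drain(this.regular, this.regularPos, out);
    this.immediatePos = this.drain(this.immediate, this.immediatePos, out);
    for (let i = 0; i < out.length; i++) {
      out[i] = Math.max(-1, Math.min(1, out[i]));
    }
    return out;
  }

  // Add samples from `queue` into `out`, moving to the next track when one finishes.
  // Returns the new read offset into the (possibly new) head track.
  private drain(queue: Float32Array[], pos: number, out: Float32Array): number {
    let i = 0;
    while (i < out.length && queue.length > 0) {
      const track = queue[0];
      const n = Math.min(out.length - i, track.length - pos);
      for (let k = 0; k < n; k++) out[i + k] += track[pos + k];
      i += n;
      pos += n;
      if (pos >= track.length) {
        queue.shift();
        pos = 0;
      }
    }
    return pos;
  }
}

// Usage: a speech chunk plays while a shorter sound is mixed over its first half.
const virtualMicMixer = new DualQueueMixer();
virtualMicMixer.enqueueRegular(new Float32Array(480).fill(0.2));
virtualMicMixer.enqueueImmediate(new Float32Array(240).fill(0.1));
const firstFrame = virtualMicMixer.render(480); // first 240 samples contain both tracks
```

Clamping after summing is the simplest way to avoid wrap-around distortion when both queues are loud; a production mixer might use a limiter instead.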

Audio Flow

The audio flow in Sokuji:

Input Capture: Microphone audio is captured with echo cancellation enabled
AI Processing: Audio is sent to the selected AI provider for translation
Playback: Translated audio is played through the selected monitor device
Virtual Device Output (Linux only): Audio is also routed to virtual microphone for other applications
Optional Passthrough: Original voice can be monitored in real-time
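Before the AI Processing step, captured audio typically has to be converted from the Web Audio API's float samples into the 16-bit PCM that realtime speech APIs accept, then streamed in small chunks. A minimal sketch, assuming 24 kHz mono input and 100 ms chunks (both values are assumptions; the expected sample rate differs between providers):

```typescript
// Convert Web Audio float samples (range [-1, 1]) to signed 16-bit PCM.
function floatTo16BitPCM(input: Float32Array): Int16Array {
  const out = new Int16Array(input.length);
  for (let i = 0; i < input.length; i++) {
    const s = Math.max(-1, Math.min(1, input[i])); // clamp out-of-range samples
    out[i] = s < 0 ? s * 0x8000 : s * 0x7fff;     // int16 range is asymmetric
  }
  return out;
}

// Split PCM into fixed-duration chunks for streaming; the last chunk may be shorter.
function chunkPCM(pcm: Int16Array, sampleRate: number, chunkMs: number): Int16Array[] {
  const samplesPerChunk = Math.max(1, Math.round((sampleRate * chunkMs) / 1000));
  const chunks: Int16Array[] = [];
  for (let start = 0; start < pcm.length; start += samplesPerChunk) {
    chunks.push(pcm.subarray(start, start + samplesPerChunk)); // views, no copy
  }
  return chunks;
}

// Example: one second of silence at 24 kHz becomes ten 100 ms chunks.
const exampleChunks = chunkPCM(floatTo16BitPCM(new Float32Array(24000)), 24000, 100);
```

Each chunk would then be encoded (for example as base64) or sent as binary, depending on the provider's protocol. Smaller chunks lower latency at the cost of more messages.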

This architecture provides:

Better echo cancellation using modern browser APIs
Lower latency through optimized audio pipelines
Virtual device integration on Linux for seamless app-to-app audio routing
Cross-platform compatibility with graceful degradation
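The routing in the Audio Flow, with its Linux-only virtual microphone and graceful degradation elsewhere, reduces to a small decision about where translated audio should go. The sketch below is hypothetical: the `RoutingOptions` fields are invented for illustration, and only the sink name `Sokuji_Virtual_Mic` comes from the How to Use section.

```typescript
type DesktopPlatform = "linux" | "darwin" | "win32";

// Hypothetical routing inputs.
interface RoutingOptions {
  platform: DesktopPlatform;
  monitorEnabled: boolean;      // play the translation through the selected monitor device
  virtualMicAvailable: boolean; // a PulseAudio/PipeWire virtual device was created
}

type TranslatedAudioSink = "monitor" | "Sokuji_Virtual_Mic";

// Decide where translated audio goes. Non-Linux platforms simply skip the
// virtual microphone instead of failing (graceful degradation).
function translatedAudioSinks(opts: RoutingOptions): TranslatedAudioSink[] {
  const sinks: TranslatedAudioSink[] = [];
  if (opts.monitorEnabled) sinks.push("monitor");
  if (opts.platform === "linux" && opts.virtualMicAvailable) {
    sinks.push("Sokuji_Virtual_Mic");
  }
  return sinks;
}
```

Returning a list rather than a single target lets the same translated chunk be written to the monitor and to the virtual microphone at once.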

Developer Notes

Architecture Improvements

Modern Audio Service Architecture:

ModernAudioRecorder: Web Audio API-based recording with echo cancellation
ModernAudioPlayer: Queue-based playback with event-driven processing
Unified audio service for both Electron and browser extension platforms

Optimized Client Management:

GeminiClient: Improved conversation item management with consistent instance IDs
Fewer redundant ID-generation calls, improving performance
Better memory management for long-running sessions
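One way to read "consistent instance IDs" is that the client generates a random prefix once per instance and derives every conversation item ID from it with a counter, instead of generating a fresh random ID on every call. The sketch below illustrates that idea only; the `ConversationItemIds` class and the ID format are hypothetical, not taken from GeminiClient.

```typescript
// Hypothetical per-client ID generator.
class ConversationItemIds {
  private readonly instanceId: string; // generated once per client instance
  private counter = 0;

  constructor(prefix = "gemini") {
    // Math.random is adequate for uniqueness within a session, not for security.
    this.instanceId = `${prefix}_${Math.random().toString(36).slice(2, 10)}`;
  }

  // Derive item IDs from the fixed instance ID plus a counter: no new randomness per item.
  next(kind: "user" | "assistant"): string {
    this.counter += 1;
    return `${this.instanceId}_${kind}_${this.counter}`;
  }

  get id(): string {
    return this.instanceId;
  }
}
```

Because every item shares the instance prefix, all items from one session are easy to group in logs.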

Audio Processing Implementation:

Queue-based audio chunk management for smooth playback
Real-time passthrough with configurable volume control
Event-driven playback to reduce CPU usage
Automatic device switching and reconnection
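Queue-based, event-driven playback can be modeled without timers: the next chunk starts only when the previous one signals completion, so nothing polls while the queue is idle. In a browser the completion signal would typically be an `AudioBufferSourceNode`'s `ended` event; here a hypothetical `PlaybackSink` interface stands in for it so the logic runs anywhere. The class is a sketch, not Sokuji's ModernAudioPlayer.

```typescript
// Hypothetical output abstraction. A browser version could wrap an
// AudioBufferSourceNode and call onEnded from its "ended" event.
interface PlaybackSink {
  play(chunk: Int16Array, onEnded: () => void): void;
}

class QueuedPlayer {
  private readonly sink: PlaybackSink;
  private readonly queue: Int16Array[] = [];
  private playing = false;

  constructor(sink: PlaybackSink) {
    this.sink = sink;
  }

  // Add a chunk; playback starts immediately only if the player is idle.
  enqueue(chunk: Int16Array): void {
    this.queue.push(chunk);
    if (!this.playing) this.playNext();
  }

  get pending(): number {
    return this.queue.length;
  }

  get isPlaying(): boolean {
    return this.playing;
  }

  // Event-driven: each chunk is started from the previous chunk's completion callback.
  private playNext(): void {
    const chunk = this.queue.shift();
    if (chunk === undefined) {
      this.playing = false; // idle until the next enqueue()
      return;
    }
    this.playing = true;
    this.sink.play(chunk, () => this.playNext());
  }
}
```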

Preparation

(required) Credentials for at least one provider: an OpenAI or Google Gemini API key, a Palabra.ai Client ID and Client Secret, or a Kizuna AI account. With Kizuna AI, signing in gives you access to backend-managed API keys automatically. For OpenAI-compatible endpoints, configure your custom API endpoint URL in the settings (Electron only).
(optional) Linux with PulseAudio or PipeWire for virtual audio device features (desktop app only)
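Because each provider needs different credentials, the requirements above map naturally onto a TypeScript discriminated union. The type, the `credentialProblems` function, and the URL-prefix check are hypothetical and only mirror the requirements listed above; this is a local completeness check, not a substitute for the app's "Validate" step, and Sokuji's real settings schema may differ.

```typescript
// Hypothetical credential shapes, one per provider.
type ProviderCredentials =
  | { provider: "openai"; apiKey: string }
  | { provider: "gemini"; apiKey: string }
  | { provider: "palabra"; clientId: string; clientSecret: string }
  | { provider: "kizuna"; signedIn: boolean } // keys are managed by the backend
  | { provider: "openai-compatible"; apiKey: string; endpointUrl: string; platform: "electron" | "extension" };

// Returns a list of problems; an empty list means the configuration looks complete.
function credentialProblems(c: ProviderCredentials): string[] {
  switch (c.provider) {
    case "openai":
    case "gemini":
      return c.apiKey.trim() ? [] : ["API key is required"];
    case "palabra": {
      const problems: string[] = [];
      if (!c.clientId.trim()) problems.push("Client ID is required");
      if (!c.clientSecret.trim()) problems.push("Client Secret is required");
      return problems;
    }
    case "kizuna":
      return c.signedIn ? [] : ["Sign in to your Kizuna AI account"];
    case "openai-compatible": {
      const problems: string[] = [];
      if (c.platform !== "electron") problems.push("Custom endpoints are available in the Electron app only");
      if (!c.apiKey.trim()) problems.push("API key is required");
      // Hypothetical sanity check on the endpoint URL.
      if (!/^https?:\/\//.test(c.endpointUrl)) problems.push("Endpoint URL must start with http:// or https://");
      return problems;
    }
  }
}
```

The `switch` on `provider` lets the compiler check that every provider case is handled and that each branch only reads fields its provider actually has.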

Installation

From Source

Prerequisites

Node.js (latest LTS version recommended)
npm
Audio support works on all platforms (Windows, macOS, Linux)
Virtual audio devices require Linux with PulseAudio or PipeWire (desktop app only)

Steps

Clone the repository

```
git clone https://github.com/kizuna-ai-lab/sokuji.git
cd sokuji
```

Install dependencies
```
npm install
```
Launch the application in development mode
```
npm run electron:dev
```
Build the application for production
```
npm run electron:build
```

From Packages

Download the appropriate package for your platform from the releases page:

Windows

Download and run the .exe installer:

Sokuji Setup 0.9.18.exe

macOS

Download and install the .dmg package:

Sokuji-0.9.18.dmg

Linux (Debian/Ubuntu)

Download and install the .deb package:

```
sudo dpkg -i sokuji_0.9.18_amd64.deb
```

For other Linux distributions, you can also download the portable .zip package and extract it to your preferred location.

How to Use

Setup your API key:
- Click the Settings button in the top-right corner
- Select your desired provider (OpenAI, Gemini, Palabra, Kizuna AI, or an OpenAI-compatible endpoint)
- For providers that use your own credentials: Enter your API key and click "Validate". Palabra requires a Client ID and Client Secret instead of an API key. For OpenAI-compatible endpoints (Electron only), enter both the API key and the custom endpoint URL.
- For Kizuna AI: Sign in to your account to automatically access backend-managed API keys.
- Click "Save" to store your configuration securely.
Configure audio devices:
- Click the Audio button to open the Audio panel
- Select your input device (microphone)
- Select your output device (speakers/headphones)
Start a session:
- Click "Start Session" to begin
- Speak into your microphone
- View real-time transcription and translation
Monitor and control audio:
- Toggle the monitor device to hear the translated output
- Enable real-time voice passthrough to hear your own voice while speaking
- Adjust the passthrough volume as needed
Use with other applications (Linux only):
- Select "Sokuji_Virtual_Mic" as the microphone input in your target application
- Translated audio will be sent to that application with advanced mixing support

Recent Improvements

Simple Mode Interface (v0.10.x)

Redesigned user interface for improved accessibility:

Streamlined Configuration: 6-section unified layout replacing complex tabbed interface
Enhanced Tooltips: Interactive help using @floating-ui library for better user guidance
Session Duration Display: Real-time tracking of conversation length
Unified Styling: Consistent UI design with improved visual hierarchy
Multi-language Support: Complete i18n with 35+ languages and English fallback
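English fallback means that any key missing from the active language's translations resolves to the English string. i18next provides this through its `fallbackLng` option; the dependency-free sketch below only illustrates the lookup order, and the keys and strings in `sampleCatalog` are made up for the example.

```typescript
// language code -> translation key -> text
type Catalog = Record<string, Record<string, string>>;

// Resolve a key in the active language, then in English, then fall back to the key itself.
function translateKey(catalog: Catalog, lang: string, key: string): string {
  return catalog[lang]?.[key] ?? catalog["en"]?.[key] ?? key;
}

// Hypothetical keys and strings for illustration.
const sampleCatalog: Catalog = {
  en: { "session.start": "Start Session", "session.duration": "Session duration" },
  ja: { "session.start": "セッション開始" }, // "session.duration" not translated yet
};
```

Returning the raw key as a last resort makes missing translations visible in the UI instead of rendering an empty label.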

Modern Audio Processing (v0.9.x)

The audio system now features improved echo cancellation and processing:

Echo Cancellation: Advanced echo suppression using modern Web Audio APIs
Queue-Based Playback: Smooth audio streaming with intelligent buffering
Real-time Passthrough: Monitor your voice with adjustable volume control
Event-Driven Architecture: Reduced CPU usage through efficient event handling
Cross-Platform Support: Unified audio handling across all platforms

AI Client Optimization (v0.8.x)

Enhanced Google Gemini client performance:

Consistent ID Generation: Optimized conversation item management with fixed instance IDs
Improved Memory Usage: Reduced redundant ID generation calls
Better Performance: Streamlined conversation handling for faster response times

Real-time Voice Passthrough

Live audio monitoring capabilities:

Real-time Feedback: Hear your voice while recording for better user experience
Volume Control: Adjustable passthrough volume for optimal monitoring
Low Latency: Immediate audio feedback using optimized audio processing
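Passthrough volume amounts to scaling each input sample by a gain before sending it to the monitor output. In a browser this is usually a `GainNode` between the microphone source and the destination, and `gain.linearRampToValueAtTime` can smooth volume changes. The plain-array sketch below shows the same arithmetic, including a per-block linear ramp so that volume changes do not click; the [0, 1] volume range is an assumption.

```typescript
// Apply passthrough volume to one block of samples, ramping linearly from the
// previous gain to the new one across the block to avoid audible clicks.
function applyPassthroughGain(input: Float32Array, fromGain: number, toGain: number): Float32Array {
  const clamp = (g: number): number => Math.max(0, Math.min(1, g)); // assumed UI range [0, 1]
  const start = clamp(fromGain);
  const end = clamp(toGain);
  const n = input.length;
  const out = new Float32Array(n);
  for (let i = 0; i < n; i++) {
    const t = n > 1 ? i / (n - 1) : 1; // 0 at the first sample, 1 at the last
    out[i] = input[i] * (start + (end - start) * t);
  }
  return out;
}
```

Passing the same value for `fromGain` and `toGain` gives a constant volume, which is the common case between slider moves.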

Architecture

Sokuji features a simplified architecture focused on core functionality:

Backend (Cloudflare Workers)

Simplified User System: Only users and usage_logs tables
Real-time Usage Tracking: The relay server writes usage data directly to the database
Better Auth: Handles all user authentication and session management
Streamlined API: Only essential endpoints maintained (/quota, /check, /reset)

Frontend (React + TypeScript)

Service Factory Pattern: Platform-specific implementations (Electron/Browser Extension)
Modern Audio Processing: AudioWorklet with ScriptProcessor fallback
Unified Components: SimpleConfigPanel and SimpleMainPanel for streamlined UX
Context-Based State: React Context API without external state management
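With the Service Factory Pattern, UI code asks a factory for a service interface and never checks the platform itself. A minimal sketch; the `SettingsService` interface, both implementation classes, and the `RuntimeHints` detection inputs are hypothetical, not Sokuji's actual service classes.

```typescript
// Hypothetical platform-neutral service interface used by the UI.
interface SettingsService {
  readonly platform: "electron" | "extension";
  load(): Promise<Record<string, unknown>>;
}

// Hypothetical Electron implementation (would talk to the main process via IPC).
class ElectronSettingsService implements SettingsService {
  readonly platform = "electron" as const;
  async load(): Promise<Record<string, unknown>> {
    return { source: "electron-config" };
  }
}

// Hypothetical extension implementation (would read extension storage).
class ExtensionSettingsService implements SettingsService {
  readonly platform = "extension" as const;
  async load(): Promise<Record<string, unknown>> {
    return { source: "extension-storage" };
  }
}

// Detection inputs are passed in explicitly so the factory is easy to test.
interface RuntimeHints {
  hasElectronBridge: boolean;
  hasChromeRuntime: boolean;
}

function createSettingsService(hints: RuntimeHints): SettingsService {
  if (hints.hasElectronBridge) return new ElectronSettingsService();
  if (hints.hasChromeRuntime) return new ExtensionSettingsService();
  throw new Error("Unsupported runtime: expected Electron or a browser extension");
}
```

Components such as SimpleConfigPanel would then depend only on `SettingsService`, so the same React code can ship in both the desktop app and the extension.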

Database Schema

```
-- Core user table
users (id, email, name, subscription, token_quota)

-- Simplified usage tracking (written by relay)
usage_logs (id, user_id, session_id, model, total_tokens, input_tokens, output_tokens, created_at)
```
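With only `users` and `usage_logs`, a quota check can be computed by summing a user's logged tokens and comparing the sum with `token_quota`. The sketch below mirrors the listed columns as TypeScript types and performs that check in memory. The column types are assumptions, and so is treating `total_tokens` as the value that counts against the quota; the schema suggests this but does not confirm it. A Worker would more likely compute the sum with a single `SUM(total_tokens)` query.

```typescript
// Row shapes mirroring the schema above; column types are assumptions.
interface UserRow {
  id: string;
  email: string;
  name: string;
  subscription: string;
  token_quota: number;
}

interface UsageLogRow {
  id: string;
  user_id: string;
  session_id: string;
  model: string;
  total_tokens: number;
  input_tokens: number;
  output_tokens: number;
  created_at: string; // assumed ISO-8601 timestamp
}

interface QuotaStatus {
  used: number;
  remaining: number;
  exhausted: boolean;
}

// Assumption: total_tokens is what counts against token_quota.
function quotaStatus(user: UserRow, logs: UsageLogRow[]): QuotaStatus {
  const used = logs
    .filter((log) => log.user_id === user.id)
    .reduce((sum, log) => sum + log.total_tokens, 0);
  return {
    used,
    remaining: Math.max(0, user.token_quota - used),
    exhausted: used >= user.token_quota,
  };
}
```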

Technologies Used

Runtime: Electron 34+ (Windows, macOS, Linux) / Chrome Extension Manifest V3
Frontend: React 18 + TypeScript
Backend: Cloudflare Workers + Hono + D1 Database
Authentication: Better Auth
AI Providers: OpenAI, Google Gemini, Palabra.ai, Kizuna AI, and OpenAI-compatible endpoints
Advanced Audio Processing:
- Web Audio API for real-time audio processing
- MediaRecorder API for reliable audio capture
- ScriptProcessor for real-time audio analysis
- Queue-based playback system for smooth streaming
UI Libraries:
- @floating-ui/react for advanced tooltip positioning
- SASS for styling
- Lucide React for icons
Internationalization:
- i18next for multi-language support
- 35+ language translations

License

AGPL-3.0
