Support Audio #31

sxy-trans-n · 2025-06-19T08:55:56Z

Integrate WhisperKit and Enhance Swama CLI and Server

Summary

This PR integrates WhisperKit for local speech recognition, adds a transcribe command to the CLI, and extends the server API with an audio transcription endpoint. The new code paths include error handling, and models are cached via ModelPool.

Key Changes

WhisperKit Integration

Added WhisperKit dependency to Package.swift
Implemented WhisperKitRunner, which handles audio validation and transcription
Added model caching and concurrency control via ModelPool
Added support for multiple response formats (simple, JSON, verbose JSON)

CLI Enhancements

New transcribe command with flexible options (model, language, format, temperature, prompt)
Enhanced pull command to support WhisperKit models
Updated model alias system with WhisperKit model mappings

Server API Extension

Added /v1/audio/transcriptions endpoint with OpenAI-compatible interface
Multipart form data parsing for audio file uploads
Error handling with appropriate HTTP status codes
Support for multiple output formats

Model Management

Unified model downloading for WhisperKit models
Added WhisperKit-specific model validation and storage
Enhanced model metadata generation

Files Changed

Core Package Files

swama/Package.swift - Added WhisperKit dependency
swama/Package.resolved - Updated dependencies

CLI Commands

swama/Sources/Swama/CLI/Command.swift - Registered Transcribe subcommand
swama/Sources/Swama/CLI/Pull.swift - Enhanced to support WhisperKit models
swama/Sources/Swama/CLI/Transcribe.swift - New transcription CLI command

Model Management

swama/Sources/SwamaKit/Model/ModelAliases.swift - Added WhisperKit aliases
swama/Sources/SwamaKit/Model/ModelDownloader.swift - WhisperKit download support
swama/Sources/SwamaKit/Model/ModelPool.swift - WhisperKit caching and concurrency
swama/Sources/SwamaKit/Model/WhisperKitRunner.swift - New WhisperKit runner

Server Components

swama/Sources/SwamaKit/Server/HTTPHandler.swift - Added transcription endpoint
swama/Sources/SwamaKit/Server/TranscriptionsHandler.swift - New transcription API handler

Testing

All existing unit tests pass
CLI help and basic functionality verified
Code formatted with SwiftFormat
Successful build in release mode

Usage Examples

CLI Transcription

# Basic transcription
swama transcribe audio.wav

# With specific model and language
swama transcribe audio.wav -m whisper-base -l en

# Verbose output with timestamps
swama transcribe audio.wav --verbose

# JSON output
swama transcribe audio.wav -f json
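
For a directory of recordings, the commands above can be wrapped in a small loop. This is a minimal sketch, not part of the PR: transcribe_dir is a hypothetical helper name, and it assumes that swama transcribe writes its result to standard output and that -m and -f can be combined in a single invocation.

```shell
# Hypothetical helper (not part of this PR): transcribe every .wav file in a
# directory and store each result next to its source file.
# Assumes `swama transcribe` prints the transcription to stdout.
transcribe_dir() {
  dir="$1"
  model="${2:-whisper-base}"   # model alias taken from the examples above
  failed=0
  for f in "$dir"/*.wav; do
    [ -e "$f" ] || continue    # skip the literal pattern when nothing matches
    if ! swama transcribe "$f" -m "$model" -f json > "${f%.wav}.json"; then
      echo "transcription failed: $f" >&2
      failed=1
    fi
  done
  return "$failed"
}
```

Calling transcribe_dir ./recordings whisper-small would then leave one .json file beside each .wav file, and the function returns a non-zero status if any file failed.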

Model Download

# Download WhisperKit models
swama pull whisper-base
swama pull whisper-small
swama pull whisper-large

Server API

# Transcribe audio via API
curl -X POST http://localhost:28100/v1/audio/transcriptions \
  -F "file=@audio.wav" \
  -F "model=whisper-base" \
  -F "response_format=json"
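
Because the endpoint reports failures through HTTP status codes, a script calling it may want to check the status before using the response body. The following is a minimal sketch under stated assumptions: transcribe_via_api and the SWAMA_URL environment variable are hypothetical names introduced here for illustration, and the server address is the one used in the request above.

```shell
# Hypothetical wrapper (not part of this PR) around the request above.
# SWAMA_URL is an assumed environment variable; the default matches the
# address used in the example.
transcribe_via_api() {
  file="$1"
  model="${2:-whisper-base}"
  base_url="${SWAMA_URL:-http://localhost:28100}"
  body="$(mktemp)"

  # -sS hides the progress meter but keeps error messages, -o stores the
  # response body in a file, and -w prints only the HTTP status code to stdout.
  http_status="$(curl -sS -o "$body" -w '%{http_code}' \
    -X POST "$base_url/v1/audio/transcriptions" \
    -F "file=@$file" \
    -F "model=$model" \
    -F "response_format=json")" || { rm -f "$body"; return 1; }

  if [ "$http_status" -ge 200 ] && [ "$http_status" -lt 300 ]; then
    cat "$body"                # success: pass the response through on stdout
    rc=0
  else
    echo "transcription request failed: HTTP $http_status" >&2
    cat "$body" >&2            # show the server's error payload, if any
    rc=1
  fi
  rm -f "$body"
  return "$rc"
}
```

With this in place, transcribe_via_api audio.wav whisper-base prints the response on success and returns a non-zero exit status otherwise, so it can be used directly in an if statement or a pipeline.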

Backward Compatibility

This PR maintains full backward compatibility with existing CLI commands and server endpoints. All existing functionality continues to work as expected.

syh-trans-n

👏

sxy-trans-n added 2 commits June 19, 2025 17:43

add audio support

d49631e

update readme

0804c56

sxy-trans-n requested review from djx-trans-n and syh-trans-n June 19, 2025 08:56

syh-trans-n approved these changes Jun 19, 2025

sxy-trans-n merged commit da109de into main Jun 19, 2025
2 checks passed

sxy-trans-n deleted the audio branch June 19, 2025 09:02
