A simple and lightweight proxy for seamless integration with multiple STT providers including Whisper.cpp and Cloudflare AI.
Proxy Collection: This package is part of a plug-and-play proxy collection designed for easy integration into your projects:
- @derogab/llm-proxy - LLM provider proxy
- @derogab/stt-proxy - Speech-to-Text provider proxy (this package)
- Multi-provider support: Switch between STT providers with environment variables.
- TypeScript support: Full TypeScript definitions included.
- Simple API: Single function interface for all providers.
- Automatic provider detection: Automatically selects the best available provider based on environment variables.
npm install @derogab/stt-proxy
import { transcribe } from '@derogab/stt-proxy';
const result = await transcribe('/path/to/audio.wav');
console.log(result.text);
The package automatically detects which STT provider to use based on your environment variables. Configure one or more providers:
STT_PROVIDER=cloudflare # Optional, force a specific provider (whisper.cpp, cloudflare)
When STT_PROVIDER is set, the specified provider will be used and an error is thrown if its credentials are not configured. When not set, providers are selected automatically based on priority.
Note: PROVIDER is supported as a fallback for backward compatibility when STT_PROVIDER is not set.
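The precedence between the two variables can be sketched as follows. This is a minimal illustration, not the package's actual code: `resolveForcedProvider` is a hypothetical helper, and both the handling of empty values and the error on unknown names are assumptions.

```typescript
// Hypothetical helper: the package's real internals may differ.
type ProviderName = 'whisper.cpp' | 'cloudflare';

const KNOWN_PROVIDERS: readonly ProviderName[] = ['whisper.cpp', 'cloudflare'];

function resolveForcedProvider(
  env: Record<string, string | undefined>,
): ProviderName | undefined {
  // STT_PROVIDER wins; PROVIDER is consulted only when STT_PROVIDER is unset.
  // Assumption: an empty string counts as "not set".
  const raw = env['STT_PROVIDER'] || env['PROVIDER'];
  if (!raw) return undefined; // nothing forced, so automatic detection applies

  const name = raw.trim();
  // Assumption: an unrecognized name is rejected rather than ignored.
  if (!KNOWN_PROVIDERS.includes(name as ProviderName)) {
    throw new Error(`Unknown STT provider: ${raw}`);
  }
  return name as ProviderName;
}
```

Passing the environment in as an argument (for example `resolveForcedProvider(process.env)`) keeps the helper easy to exercise without mutating global state.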
WHISPER_CPP_MODEL_PATH=/path/to/ggml-base.bin # Required, path to your GGML model file
Download models from HuggingFace:
curl -L -o ggml-base.bin https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-base.bin
CLOUDFLARE_ACCOUNT_ID=your-account-id # Required
CLOUDFLARE_AUTH_KEY=your-api-token # Required
Uses the @cf/openai/whisper-large-v3-turbo model.
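To show where the two Cloudflare variables end up, here is a rough sketch of how a request target for the Workers AI REST API could be assembled. `buildCloudflareRequest` is a hypothetical name, and whether this package builds its requests this way is an assumption; only the variable names and the model identifier come from this README.

```typescript
// Sketch only: the general shape of a Workers AI REST call target.
// How @derogab/stt-proxy talks to Cloudflare internally is not documented here.
const MODEL = '@cf/openai/whisper-large-v3-turbo';

function buildCloudflareRequest(env: Record<string, string | undefined>): {
  url: string;
  headers: Record<string, string>;
} {
  const accountId = env['CLOUDFLARE_ACCOUNT_ID'];
  const authKey = env['CLOUDFLARE_AUTH_KEY'];
  // Both values are marked "Required", so fail early if either is missing.
  if (!accountId || !authKey) {
    throw new Error('CLOUDFLARE_ACCOUNT_ID and CLOUDFLARE_AUTH_KEY are required');
  }
  return {
    url: `https://api.cloudflare.com/client/v4/accounts/${accountId}/ai/run/${MODEL}`,
    headers: { Authorization: `Bearer ${authKey}` },
  };
}
```

The sketch stops short of sending audio, since the request body format depends on the model and is not described in this document.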
Transcribes audio to text using the configured STT provider. The package automatically manages provider initialization and cleanup.
Parameters:
- audio: Path to audio file (string) or audio Buffer
- options (optional): Transcription options
Returns:
- Promise that resolves to an object with a text property
Options Format:
type TranscribeOptions = {
language?: string; // Language code (e.g., 'en', 'es', 'fr')
translate?: boolean; // Translate to English
};
Output Format:
type TranscribeOutput = {
text: string;
};
Example:
import fs from 'node:fs'; // needed for the Buffer example below
// Transcribe from file path
const result1 = await transcribe('/path/to/audio.wav');
console.log(result1.text);
// Transcribe from Buffer
const audioBuffer = fs.readFileSync('/path/to/audio.wav');
const result2 = await transcribe(audioBuffer);
console.log(result2.text);
// With options
const result3 = await transcribe('/path/to/audio.wav', {
language: 'en',
translate: false
});
console.log(result3.text);
When the STT_PROVIDER environment variable is set, that provider is used directly.
Otherwise, the package selects providers in the following order:
- Whisper.cpp (if WHISPER_CPP_MODEL_PATH is set and file exists)
- Cloudflare AI (if CLOUDFLARE_ACCOUNT_ID and CLOUDFLARE_AUTH_KEY are set)
If no providers are configured, the function throws an error.
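The selection rules above can be approximated in a few lines. This is a sketch under stated assumptions rather than the package's implementation: `selectProvider` and the injectable `fileExists` parameter are hypothetical, and a forced Whisper.cpp provider whose model file is missing is assumed to count as "not configured".

```typescript
import { existsSync } from 'node:fs';

type ProviderName = 'whisper.cpp' | 'cloudflare';
type Env = Record<string, string | undefined>;

// Hypothetical helper mirroring the documented priority order.
function selectProvider(
  env: Env,
  fileExists: (path: string) => boolean = existsSync,
): ProviderName {
  const modelPath = env['WHISPER_CPP_MODEL_PATH'];
  const whisperReady = !!modelPath && fileExists(modelPath);
  const cloudflareReady =
    !!env['CLOUDFLARE_ACCOUNT_ID'] && !!env['CLOUDFLARE_AUTH_KEY'];

  // A forced provider is used directly and must be configured.
  const forced = env['STT_PROVIDER'];
  if (forced) {
    const ready =
      forced === 'whisper.cpp' ? whisperReady
      : forced === 'cloudflare' ? cloudflareReady
      : false;
    if (!ready) throw new Error(`Provider "${forced}" is not configured`);
    return forced as ProviderName;
  }

  // Otherwise fall through the priority order: Whisper.cpp, then Cloudflare AI.
  if (whisperReady) return 'whisper.cpp';
  if (cloudflareReady) return 'cloudflare';
  throw new Error('No STT provider configured');
}
```

Because the final branch throws, callers of `transcribe` in an unconfigured environment would typically wrap the call in `try`/`catch` and surface a configuration hint to the user.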
- FFmpeg: Required for audio conversion (Whisper.cpp only).
# macOS
brew install ffmpeg
# Ubuntu/Debian
sudo apt install ffmpeg
# Windows (with Chocolatey)
choco install ffmpeg
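Since FFmpeg is an external dependency, it can be useful to verify that it is on the PATH before transcribing with Whisper.cpp. The following is a minimal sketch; `isCommandAvailable` is a hypothetical helper and is not part of this package.

```typescript
import { spawnSync } from 'node:child_process';

// Hypothetical helper: returns true when the command can be spawned
// and exits successfully with the given arguments.
function isCommandAvailable(command: string, args: string[] = ['-version']): boolean {
  const result = spawnSync(command, args, { stdio: 'ignore' });
  // result.error is set when the process could not be started (e.g. ENOENT).
  return result.error === undefined && result.status === 0;
}

if (!isCommandAvailable('ffmpeg')) {
  console.warn('FFmpeg was not found on PATH; install it before using Whisper.cpp.');
}
```

Running the check once at startup gives a clearer message than a conversion failure in the middle of a transcription.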
# Install dependencies
npm install
# Build the package
npm run build
# Run tests
npm test
STT Proxy is made with ♥ by derogab and it's released under the MIT license.
If you like this project or directly benefit from it, please consider buying me a coffee:
🔗 bc1qd0qatgz8h62uvnr74utwncc6j5ckfz2v2g4lef
⚡️ derogab@sats.mobi
💶 Sponsor on GitHub