List of supported models along with the time to first byte (TTFB) for each. If you want us to add more models, email us at founders@tensorfuse.io.
| Model Name | Type | TTFB | GPU Used |
|---|---|---|---|
| Orpheus (3B) | TTS | 180 ms | 1xH100 |
| Whisper (coming soon) | STT | ---- | ---- |
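TTFB here is roughly the time from sending a request to receiving the first audio chunk. If you want to measure it against your own deployment, here's a rough probe (our own sketch, assuming the `/v1/audio` SSE endpoint used by the streaming script later in this README); it includes network and SSE framing overhead, so expect slightly higher numbers than the table:

```python
import time

import requests

API_URL = "http://localhost:7004/v1/audio"  # same endpoint as the streaming script below

start = time.perf_counter()
with requests.post(API_URL, params={"text": "Hello!", "model": "orpheus"}, stream=True) as r:
    r.raise_for_status()
    for line in r.iter_lines():
        if line:  # first non-empty SSE line ~ first audio bytes on the wire
            print(f"TTFB: {(time.perf_counter() - start) * 1000:.0f} ms")
            break
```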
- Run with Docker (recommended) on Linux: SSH into a remote GPU server and run the following Docker command in the terminal. It will start the streaming server on port 7004:
- Note: Orpheus is a gated model on Hugging Face, so make sure your Hugging Face account has been granted access to it.
```bash
docker run --gpus=all \
  -p 8000:8000 -p 8001:8001 -p 8002:8002 -p 8003:8003 -p 7004:7004 \
  --shm-size=1g \
  -e HUGGING_FACE_HUB_TOKEN=hf_XXXX \
  -v "${HOME}/.cache/huggingface":/cache/huggingface \
  tensorfuse/stts:latest --mode tts --model orpheus
```
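The container can take a while to become ready on first run while it downloads the model weights. If you want to wait programmatically, here is a minimal sketch (our own assumption, not part of this repo) that polls until port 7004 accepts TCP connections:

```python
import socket
import time

# Hypothetical readiness check: retry until the streaming server
# accepts TCP connections on port 7004.
while True:
    try:
        with socket.create_connection(("localhost", 7004), timeout=2):
            print("Server is up")
            break
    except OSError:
        time.sleep(2)
```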
You can start streaming audio from the server using the Python script below:

```python
import requests
import sseclient  # Make sure this is from sseclient-py (pip install sseclient-py)
import logging
import base64
import wave
import json
import os
import sys

logging.basicConfig(level=logging.INFO, stream=sys.stdout)
logger = logging.getLogger("sse-debug")

API_URL = "http://localhost:7004/v1/audio"
TEXT = "<giggle> The quick brown fox jumps over the lazy dog"
MODEL = "orpheus"
OUT = "debug_audio.wav"


def debug_sse():
    params = {"text": TEXT, "model": MODEL}
    logger.info(f"Requesting: {API_URL} with {params}")
    try:
        response = requests.post(API_URL, params=params, stream=True)
        logger.info(f"HTTP {response.status_code}")
        logger.info(f"Headers: {dict(response.headers)}")
        logger.info(f"Content-Type: {response.headers.get('content-type', '')}")
        if 'text/event-stream' not in response.headers.get('content-type', ''):
            logger.warning("Server did NOT send the expected text/event-stream Content-Type!")

        client = sseclient.SSEClient(response)  # Do not read from response first!
        print("Connected to SSE, waiting for events:")

        all_bytes = bytearray()
        sample_rate = None
        for event in client.events():
            logger.info(f"Received SSE event: {event.data[:70]}{'...' if len(event.data) > 70 else ''}")
            if event.data == '[DONE]':
                logger.info("Stream finished, got [DONE].")
                break
            try:
                data = json.loads(event.data)
                if 'audio' in data:
                    # Each event carries a base64-encoded chunk of PCM audio.
                    chunk = base64.b64decode(data['audio'])
                    all_bytes.extend(chunk)
                    if not sample_rate:
                        sample_rate = data.get('rate')
            except Exception as e:
                logger.error(f"Failed to parse SSE data: {event.data} ({e})")

        if all_bytes and sample_rate:
            # Write the accumulated chunks as a 16-bit mono WAV file.
            with wave.open(OUT, 'wb') as f:
                f.setnchannels(1)
                f.setsampwidth(2)
                f.setframerate(sample_rate)
                f.writeframes(all_bytes)
            logger.info(f"Saved audio to {os.path.abspath(OUT)}")
        else:
            logger.warning("No audio chunks or sample rate detected.")
    except Exception as e:
        logger.exception(f"Error during SSE client run: {e}")


if __name__ == "__main__":
    debug_sse()
```
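Because the audio arrives as a stream, you don't have to wait for the full file: you can start playback as soon as the first chunk lands, which is what the TTFB numbers above are about. Below is a sketch of incremental playback; `sounddevice` is our choice of audio library (an assumption, not part of this repo), and we assume 16-bit mono PCM with a `rate` field on the first audio event, as in the script above:

```python
import base64
import json

import requests
import sseclient  # sseclient-py, as above
import sounddevice as sd  # pip install sounddevice (our assumption, not part of this repo)

response = requests.post(
    "http://localhost:7004/v1/audio",
    params={"text": "Streaming playback test", "model": "orpheus"},
    stream=True,
)
client = sseclient.SSEClient(response)

stream = None
for event in client.events():
    if event.data == '[DONE]':
        break
    data = json.loads(event.data)
    if 'audio' not in data:
        continue
    if stream is None:
        # Open the output stream once the first chunk tells us the sample rate.
        stream = sd.RawOutputStream(samplerate=data['rate'], channels=1, dtype='int16')
        stream.start()
    stream.write(base64.b64decode(data['audio']))  # play each chunk as it arrives

if stream is not None:
    stream.stop()
    stream.close()
```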
This repo is built and maintained by Tensorfuse. We're building a serverless GPU runtime that lets you self-host AI models of any modality in your own AWS account. If you're building AI agents, voice agents, chatbots, etc., and want to customize and deploy models, check out Tensorfuse.
Below is a list of resources to help you get started:
| Type | Links |
|---|---|
| Get Started | Start here |
| 📚 Documentation | Read Our Docs |
| Follow us on X | |
| 🔮 Model Library | SOTA models hosted on AWS using Tensorfuse |
| ✍️ Blog | Read our Blogs |
For any enquiries, email us at founders@tensorfuse.io.