List of supported models along with the time to first byte (TTFB) for each. If you want us to add more models, email us at founders@tensorfuse.io.
| Model Name | Type | TTFB | GPU Used |
|---|---|---|---|
| Orpheus (3B) | TTS | 180 ms | 1xH100 |
| Whisper (coming soon) | STT | ---- | ---- |
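TTFB here is roughly the time from sending a request to receiving the first audio chunk. If you want to measure it against your own deployment, here's a rough probe (our own sketch, assuming the `/v1/audio` SSE endpoint used by the streaming script later in this README); it includes network and SSE framing overhead, so expect slightly higher numbers than the table:

```python
import time

import requests

API_URL = "http://localhost:7004/v1/audio"  # same endpoint as the streaming script below

start = time.perf_counter()
with requests.post(API_URL, params={"text": "Hello!", "model": "orpheus"}, stream=True) as r:
    r.raise_for_status()
    for line in r.iter_lines():
        if line:  # first non-empty SSE line ~ first audio bytes on the wire
            print(f"TTFB: {(time.perf_counter() - start) * 1000:.0f} ms")
            break
```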
- Run with Docker (recommended) on Linux: SSH into a remote GPU server and run the following Docker command in the terminal. It will start the streaming server on port 7004:
- Note: Orpheus is a gated model on Hugging Face, so make sure your Hugging Face account has been granted access to it.
```bash
docker run --gpus=all \
  -p 8000:8000 -p 8001:8001 -p 8002:8002 -p 8003:8003 -p 7004:7004 \
  --shm-size=1g \
  -e HUGGING_FACE_HUB_TOKEN=hf_XXXX \
  -v "${HOME}/.cache/huggingface":/cache/huggingface \
  tensorfuse/stts:latest --mode tts --model orpheus
```
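The container can take a while to become ready on first run while it downloads the model weights. If you want to wait programmatically, here is a minimal sketch (our own assumption, not part of this repo) that polls until port 7004 accepts TCP connections:

```python
import socket
import time

# Hypothetical readiness check: retry until the streaming server
# accepts TCP connections on port 7004.
while True:
    try:
        with socket.create_connection(("localhost", 7004), timeout=2):
            print("Server is up")
            break
    except OSError:
        time.sleep(2)
```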
You can start streaming audio from the server using the Python script below:

```python
import requests
import sseclient  # Make sure this is from sseclient-py (pip install sseclient-py)
import logging
import base64
import wave
import json
import os
import sys

logging.basicConfig(level=logging.INFO, stream=sys.stdout)
logger = logging.getLogger("sse-debug")

API_URL = "http://localhost:7004/v1/audio"
TEXT = "<giggle> The quick brown fox jumps over the lazy dog"
MODEL = "orpheus"
OUT = "debug_audio.wav"


def debug_sse():
    params = {"text": TEXT, "model": MODEL}
    logger.info(f"Requesting: {API_URL} with {params}")
    try:
        response = requests.post(API_URL, params=params, stream=True)
        logger.info(f"HTTP {response.status_code}")
        logger.info(f"Headers: {dict(response.headers)}")
        logger.info(f"Content-Type: {response.headers.get('content-type', '')}")
        if 'text/event-stream' not in response.headers.get('content-type', ''):
            logger.warning("Server did NOT send the expected text/event-stream Content-Type!")

        client = sseclient.SSEClient(response)  # Do not read from response first!
        print("Connected to SSE, waiting for events:")

        all_bytes = bytearray()
        sample_rate = None
        for event in client.events():
            logger.info(f"Received SSE event: {event.data[:70]}{'...' if len(event.data) > 70 else ''}")
            if event.data == '[DONE]':
                logger.info("Stream finished, got [DONE].")
                break
            try:
                data = json.loads(event.data)
                if 'audio' in data:
                    # Each event carries a base64-encoded chunk of PCM audio.
                    chunk = base64.b64decode(data['audio'])
                    all_bytes.extend(chunk)
                    if not sample_rate:
                        sample_rate = data.get('rate')
            except Exception as e:
                logger.error(f"Failed to parse SSE data: {event.data} ({e})")

        if all_bytes and sample_rate:
            # Write the accumulated chunks as a 16-bit mono WAV file.
            with wave.open(OUT, 'wb') as f:
                f.setnchannels(1)
                f.setsampwidth(2)
                f.setframerate(sample_rate)
                f.writeframes(all_bytes)
            logger.info(f"Saved audio to {os.path.abspath(OUT)}")
        else:
            logger.warning("No audio chunks or sample rate detected.")
    except Exception as e:
        logger.exception(f"Error during SSE client run: {e}")


if __name__ == "__main__":
    debug_sse()
```
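Because the audio arrives as a stream, you don't have to wait for the full file: you can start playback as soon as the first chunk lands, which is what the TTFB numbers above are about. Below is a sketch of incremental playback; `sounddevice` is our choice of audio library (an assumption, not part of this repo), and we assume 16-bit mono PCM with a `rate` field on the first audio event, as in the script above:

```python
import base64
import json

import requests
import sseclient  # sseclient-py, as above
import sounddevice as sd  # pip install sounddevice (our assumption, not part of this repo)

response = requests.post(
    "http://localhost:7004/v1/audio",
    params={"text": "Streaming playback test", "model": "orpheus"},
    stream=True,
)
client = sseclient.SSEClient(response)

stream = None
for event in client.events():
    if event.data == '[DONE]':
        break
    data = json.loads(event.data)
    if 'audio' not in data:
        continue
    if stream is None:
        # Open the output stream once the first chunk tells us the sample rate.
        stream = sd.RawOutputStream(samplerate=data['rate'], channels=1, dtype='int16')
        stream.start()
    stream.write(base64.b64decode(data['audio']))  # play each chunk as it arrives

if stream is not None:
    stream.stop()
    stream.close()
```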
This repo is built and maintained by Tensorfuse. We're building a serverless GPU runtime that lets you self-host AI models of any modality in your own AWS account. If you're building AI agents, voice agents, chatbots, etc., and want to customize and deploy models, check out Tensorfuse.
Below is a list of resources to help you get started:
| Type | Links |
|---|---|
| Get Started | Start here |
| 📚 Documentation | Read Our Docs |
| Follow us on X | |
| 🔮 Model Library | SOTA models hosted on AWS using Tensorfuse |
| ✍️ Blog | Read our Blogs |
For any enquiries, email us at founders@tensorfuse.io.