Long time interval when synthesizing Chinese text-to-speech #930
I found https://github.com/livekit/agents/blob/main/livekit-agents/livekit/agents/tokenize/_basic_sent.py. Maybe I need a Chinese version of it.
Hey yes, @zhanghx0905, are you interested in helping to make this better for Chinese? I think we'd need support for the Chinese period (。).
I will make some attempts and see what I can do for this issue. |
TEN-Agent was founded by a Chinese team. I checked their implementation, and it is not complicated: it calls TTS whenever a special sentence-ending symbol is matched, using `self.sentence_expr = re.compile(r".+?[,,.。!!??::]", re.DOTALL)`. I implemented similar logic for the livekit agent:

```python
import functools
import re
from dataclasses import dataclass
from typing import List, Tuple

from livekit.agents.tokenize import token_stream, tokenizer

# Non-greedy run of characters ending in a Chinese or Latin delimiter.
_sentence_pattern = re.compile(r".+?[,,.。!!??::]", re.DOTALL)


@dataclass
class _TokenizerOptions:
    language: str
    min_sentence_len: int
    stream_context_len: int


class ChineseSentenceTokenizer(tokenizer.SentenceTokenizer):
    def __init__(
        self,
        *,
        language: str = "chinese",
        min_sentence_len: int = 10,
        stream_context_len: int = 10,
    ) -> None:
        self._config = _TokenizerOptions(
            language=language,
            min_sentence_len=min_sentence_len,
            stream_context_len=stream_context_len,
        )

    def tokenize(self, text: str, *, language: str | None = None) -> List[str]:
        sentences = self.chinese_sentence_segmentation(text)
        return [sentence[0] for sentence in sentences]

    def stream(self, *, language: str | None = None) -> tokenizer.SentenceStream:
        return token_stream.BufferedSentenceStream(
            tokenizer=functools.partial(self.chinese_sentence_segmentation),
            min_token_len=self._config.min_sentence_len,
            min_ctx_len=self._config.stream_context_len,
        )

    def chinese_sentence_segmentation(self, text: str) -> List[Tuple[str, int, int]]:
        result = []
        start_pos = 0
        for match in _sentence_pattern.finditer(text):
            sentence = match.group(0).strip()
            end_pos = match.end()
            if sentence:
                result.append((sentence, start_pos, end_pos))
            start_pos = end_pos
        # Keep any trailing text that has no terminating punctuation.
        if start_pos < len(text):
            sentence = text[start_pos:].strip()
            if sentence:
                result.append((sentence, start_pos, len(text)))
        return result
```

You can use this class as follows:

```python
agent = VoicePipelineAgent(
    # ...
    transcription=AgentTranscriptionOptions(
        sentence_tokenizer=ChineseSentenceTokenizer(),
    ),
)
```
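To see what this regex-based segmentation actually produces on mixed Chinese/English text, here is a self-contained sketch of the same splitting logic (no livekit imports needed; `segment` is just a standalone copy of `chinese_sentence_segmentation` for illustration):

```python
import re

# Same pattern as in the tokenizer above.
_sentence_pattern = re.compile(r".+?[,,.。!!??::]", re.DOTALL)


def segment(text: str) -> list[tuple[str, int, int]]:
    """Split text into (sentence, start, end) tuples."""
    result = []
    start_pos = 0
    for match in _sentence_pattern.finditer(text):
        sentence = match.group(0).strip()
        end_pos = match.end()
        if sentence:
            result.append((sentence, start_pos, end_pos))
        start_pos = end_pos
    if start_pos < len(text):  # trailing text without a delimiter
        tail = text[start_pos:].strip()
        if tail:
            result.append((tail, start_pos, len(text)))
    return result


print(segment("你好!今天天气不错。Let's go"))
# [('你好!', 0, 3), ('今天天气不错。', 3, 10), ("Let's go", 10, 18)]
```

Note that the trailing English fragment is kept even though it has no sentence-ending punctuation.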
I have more questions.
I think I've finally figured out what most of the parameters really mean, and here's my final solution:

```python
from livekit.agents import tts as _tts

tts = _tts.StreamAdapter(
    tts=openai.TTS(base_url=OPENAI_BASEURL),
    sentence_tokenizer=ChineseSentenceTokenizer(min_sentence_len=10),
)

agent = VoicePipelineAgent(
    # ...
    tts=tts,
)
```
I have encountered an issue with the voice assistant when synthesizing Chinese text. The time interval between LLM and synthesized speech outputs is noticeably longer when the output is in Chinese compared to English. This issue does not occur when the output is in English, where the speech synthesis proceeds without any delay.
I noticed that TTS speech synthesis almost always starts only after the LLM output has fully completed. I suspect something is wrong with the tokenizer.
Here's my code: