I am a Ph.D. graduate of South China University of Technology, with research interests in OCR, text image processing, and document understanding.
- SCUT
- Guangzhou
- (UTC +08:00)
- https://scholar.google.com/citations?user=dW7AgfgAAAAJ&hl=zh-CN
Stars: 6 stars written in C++
DeepSpeech is an open-source, embedded (offline, on-device) speech-to-text engine that can run in real time on devices ranging from a Raspberry Pi 4 to high-power GPU servers.
FlashMLA: Efficient Multi-head Latent Attention Kernels
Speech-to-text, text-to-speech, speaker diarization, speech enhancement, source separation, and VAD using next-gen Kaldi with onnxruntime without Internet connection. Support embedded systems, Andr…
A TensorFlow implementation of the EAST text detector
Geometric Augmentation for Text Image