End-to-end ASR with Wav2Vec2 + CTC — 4.15% WER on LibriSpeech test set, beam search decoding, warmup + cosine LR scheduling
Updated Apr 9, 2026 - Python
ASR-LE is an advanced ASR evaluation + observability toolkit that goes beyond WER: it shows where errors happen in time, estimates streaming p95 first-word latency, generates “worst moments” automatically, and produces reusable artifacts (report.json, timeline bins, moments, etc.)
TensorFlowASR: Almost State-of-the-art Automatic Speech Recognition in TensorFlow 2. Supports languages that can be written with characters or subwords
Convert handwriting images into text (OCR)
The goal of this project is to accurately transcribe handwritten English word images into text using a deep learning model. This is achieved through the use of: the IAM Handwriting Dataset (a labeled image corpus); a CNN + BLSTM + CTC architecture; and an end-to-end training and inference pipeline
Neural network trained to recognize short speech commands
CTF writeups and proof files from TCS HackQuest Season 10 cybersecurity challenges.
OCR system with deep learning (CRNN + CTC) trained on 100k synthetic images; includes a web app built with Gradio
Small CRNN that solves the Ikariam pirate fortress captcha at 97.3% accuracy. ONNX-ready.
TensorFlow-based CNN+LSTM trained with CTC loss for OCR
Korean jamo-level STT for dysarthria assessment support with DeepSpeech2, Simple-Attention, and Transformer model comparison.
A small model for recognition of digits in audio clips
Implementation of Conformer CTC ASR model
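Most of the projects listed here share one step: turning per-frame CTC predictions into a label sequence. As a rough illustration only, and not taken from any of the repositories above, greedy CTC decoding can be sketched as follows. The function name `ctc_greedy_decode`, the toy vocabulary, and the choice of index 0 as the blank symbol are all assumptions made for this example.

```python
from itertools import groupby


def ctc_greedy_decode(frame_ids, blank=0):
    """Greedy CTC decoding over per-frame argmax label ids.

    Step 1: merge runs of identical consecutive ids.
    Step 2: drop the blank id (assumed to be 0 here).
    """
    collapsed = [label for label, _ in groupby(frame_ids)]
    return [label for label in collapsed if label != blank]


# Hypothetical vocabulary: index 0 stands for the CTC blank.
vocab = ["", "c", "a", "t"]

# Hypothetical per-frame argmax output of an acoustic or OCR model.
frames = [0, 1, 1, 0, 2, 2, 2, 0, 0, 3, 3]

decoded = ctc_greedy_decode(frames)
text = "".join(vocab[i] for i in decoded)
print(text)
```

A blank between two identical labels keeps them separate, which is how CTC represents doubled characters: `[1, 0, 1]` decodes to `[1, 1]`. Beam search decoding, mentioned in the first entry, replaces the per-frame argmax with a search over several candidate prefixes.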