No careless whispers, only words colored with confidence.
A lightweight, Whisper-based transcription pipeline for processing audio and/or video files in batch mode, generating transcriptions with color-coded confidence scores that indicate the reliability of each transcribed segment. For a live demo of Whisper itself, visit replicate.com/openai/whisper.
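To make the color-coding idea concrete, here is a minimal, illustrative sketch. It assumes Whisper's standard segment output (each segment carries an `avg_logprob` field); the thresholds and colors below are invented for the example and are not the pipeline's actual values:

```python
import html
import math

# Illustrative thresholds only -- not the pipeline's actual cutoffs.
def confidence_color(avg_logprob: float) -> str:
    """Map a segment's average log-probability to a display color."""
    p = math.exp(avg_logprob)  # geometric-mean token probability
    if p >= 0.8:
        return "green"   # high confidence
    if p >= 0.5:
        return "orange"  # medium confidence
    return "red"         # low confidence

def segments_to_html(segments: list[dict]) -> str:
    """Render Whisper segments as color-coded <span> elements."""
    spans = (
        f'<span style="color:{confidence_color(s["avg_logprob"])}">'
        f'{html.escape(s["text"])}</span>'
        for s in segments
    )
    return "<p>" + "".join(spans) + "</p>"
```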
- Install the Python dependencies listed in `requirements.txt`:

  ```bash
  pip install -r requirements.txt
  ```

- Make sure the required system-level dependencies are installed. In particular, ffmpeg is needed for media conversion; on Debian/Ubuntu-based distributions, install it via:

  ```bash
  sudo apt-get update
  sudo apt-get install ffmpeg
  ```
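Optionally, sanity-check the environment before the first run. This snippet assumes `requirements.txt` pulls in the `openai-whisper` package (import name `whisper`):

```python
import shutil

# ffmpeg must be on PATH for audio/video conversion.
assert shutil.which("ffmpeg") is not None, "ffmpeg not found; install it first"

import whisper  # provided by the openai-whisper package

# List the model checkpoints the installed whisper build knows about.
print(whisper.available_models())
```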
```text
$ python pipe.py -h
usage: pipe.py [-h] [--output-html-dir OUTPUT_HTML_DIR] [--output-json-dir OUTPUT_JSON_DIR] [--no-json]
               [--logs-dir LOGS_DIR] [--recursive] [--overwrite | --no-overwrite] [--no-timestamp-to-name]
               [--model {tiny,base,small,medium,large,turbo}] [--language LANGUAGE] [--word-timestamps]
               [--audio-format AUDIO_FORMAT] [--keep-intermediate-audio] [--log-level {DEBUG,INFO,WARNING,ERROR,CRITICAL}]
               [--log-file LOG_FILE] path

Whisper-based transcription pipeline (single file or batch directory).

options:
  -h, --help            show this help message and exit

I/O:
  path                  Path to a media file (single run) OR a directory (batch run).
  --output-html-dir OUTPUT_HTML_DIR
                        Directory to write highlighted HTML outputs. (default: artifacts/html)
  --output-json-dir OUTPUT_JSON_DIR
                        Directory to dump raw Whisper JSON (use --no-json to disable). (default: artifacts/json)
  --no-json             Disable writing raw JSON transcription outputs. (default: False)
  --logs-dir LOGS_DIR   Directory for pipeline logs. (default: artifacts/logs)

Run mode:
  --recursive           When PATH is a directory, search recursively for media files. (default: False)
  --overwrite, --no-overwrite
                        Allow overwriting existing HTML outputs. (default: True)
  --no-timestamp-to-name
                        Disable appending (YYMMDD_HHMM) to output filename stem. (default: False)

Model & transcription:
  --model {tiny,base,small,medium,large,turbo}
                        Which Whisper model to use. (default: base)
  --language LANGUAGE   Force a language (name like 'English' or code like 'en'). Default: auto. (default: None)
  --word-timestamps     Enable word-level timestamps (experimental). (default: False)

Media conversion:
  --audio-format AUDIO_FORMAT
                        Temp audio format used when converting video to audio. (default: wav)
  --keep-intermediate-audio
                        Persist the converted temp audio file instead of deleting it. (default: False)

Logging:
  --log-level {DEBUG,INFO,WARNING,ERROR,CRITICAL}
                        Console and file log level. (default: INFO)
  --log-file LOG_FILE   Log filename inside --logs-dir. (default: whisper.log)

Supported models: tiny, base, small, medium, large, turbo.
Supported languages:
  ==> Default: auto-detect
  ==> You may pass a language *name* (e.g., 'English', 'French') or a code (e.g., 'en', 'fr')
  ==> Known names (e.g., en): see https://github.com/openai/whisper/blob/main/whisper/tokenizer.py for details
```
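A batch run can also be scripted by shelling out to the CLI. The sketch below uses only the flags documented above; the `media/` directory and the model choice are arbitrary examples:

```python
import subprocess

# Recursively transcribe every media file under media/ with the
# "small" model, forcing English. Options precede the positional path.
subprocess.run(
    [
        "python", "pipe.py",
        "--model", "small",
        "--recursive",
        "--language", "en",
        "media/",
    ],
    check=True,
)
```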
For demonstration purposes, you can try the following workflow in a fresh Jupyter notebook on Google Colab:

```python
# Clone the repository
!git clone https://github.com/williamqwu/careful-whisper.git careful_whisper
%cd careful_whisper
# Verify the working directory
!pwd
!ls
# Install Python dependencies
!pip install -r requirements.txt
# Download a sample audio file
!wget -O test.wav https://cdn.openai.com/whisper/draft-20220913a/younha.wav
# Run the pipeline (note: place options before positional arguments)
!python pipe.py --model tiny test.wav
```

By default, the highlighted HTML and raw Whisper JSON land in `artifacts/html` and `artifacts/json`, respectively (see the I/O options above).

Roadmap:

- Implement smarter paragraph splitting.
- Add configurable input, output, and model preferences via argument parsing.
- Add a demonstration test program.
- Speaker diarization: differentiate between the voices of different individuals.
- Word-level confidence scoring: provide per-word reliability estimates (see the sketch below).
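For reference, recent `openai-whisper` releases already attach a per-word `probability` when word-level timestamps are enabled, which is one plausible starting point for this feature; a minimal sketch:

```python
import whisper

# Assumes an openai-whisper version that supports word_timestamps=True.
model = whisper.load_model("tiny")
result = model.transcribe("test.wav", word_timestamps=True)

# Each segment then carries a "words" list with per-word probabilities.
for segment in result["segments"]:
    for word in segment.get("words", []):
        print(f'{word["word"]!r}: confidence {word["probability"]:.2f}')
```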