Releases: amsehili/auditok

v0.4.2

10 May 06:39

Improvements:

  • Allow max_trailing_silence to exceed max_silence: after the event
    boundary is decided (at max_silence), additional silent frames are
    collected past the boundary up to max_trailing_silence. Collection stops
    as soon as a new valid frame is detected, so separate events are not merged.
    This lets you use a small max_silence for tight event boundaries while
    keeping enough trailing audio to preserve natural fadeouts. The same option
    is exposed on the CLI as -g/--max-trailing-silence.
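
The behavior described above can be illustrated with a toy model. This is only a sketch, not auditok's implementation: the function name `detect_events` is hypothetical, frames are plain booleans (True for a valid frame, False for a silent one), and both limits are counted in frames rather than seconds.

```python
def detect_events(frames, max_silence, max_trailing_silence):
    """Toy event detector; returns (start, end) frame indices, end exclusive.

    Hypothetical helper for illustration only. Limits are in frames.
    """
    events = []
    i, n = 0, len(frames)
    while i < n:
        if not frames[i]:
            i += 1  # skip silence between events
            continue
        start = last_valid = i
        silent_run = 0
        i += 1
        # Extend the event until a silent run exceeds max_silence:
        # this is where the event boundary is decided.
        while i < n:
            if frames[i]:
                last_valid = i
                silent_run = 0
            else:
                silent_run += 1
                if silent_run > max_silence:
                    break
            i += 1
        # Keep up to max_trailing_silence silent frames after the last
        # valid frame, possibly reaching past the boundary. Stop at the
        # first valid frame so the next event is not merged into this one.
        end = last_valid + 1
        kept = 0
        while end < n and not frames[end] and kept < max_trailing_silence:
            end += 1
            kept += 1
        events.append((start, end))
        i = max(i, end)
    return events


frames = [bool(x) for x in [1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0]]
tight = detect_events(frames, max_silence=1, max_trailing_silence=0)
padded = detect_events(frames, max_silence=1, max_trailing_silence=4)
```

In this model the boundary of the first event is decided after two silent frames in both calls, but the second call keeps four trailing silent frames instead of none, and the two events stay separate.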

Bug fixes:

  • Fix 1-frame gap in leading silence between consecutive events. The silent
    frame that triggered the transition into SILENCE was previously dropped
    instead of being seeded into the leading buffer for the next token, causing
    an unavoidable 1-frame gap regardless of max_leading_silence. Trailing
    frames trimmed by max_trailing_silence are now also seeded into the
    leading buffer of the next token.

v0.4.1

09 Apr 19:51

Bug fixes:

  • Fix CLI forcing default -r/-w/-c on file input via FFmpegAudioSource
  • Fix Ctrl+C crash and audio glitches during playback with -E
  • Quit correctly when -B is specified for audio playback with progress bar

Improvements:

  • Refactor command line code and tests
  • Show an error message when a command line argument appears more than once
  • Limit CLI help text to 80 columns

v0.4.0

31 Mar 21:18

Breaking changes:

  • Drop Python 3.7 support (EOL since June 2023)
  • Drop pydub dependency; ffmpeg is now required directly for non-WAV/raw formats
  • Replace pyaudio with sounddevice for microphone input
  • Remove deprecated AudioRegion.meta accessor (use .start / .end)
  • Remove deprecated AudioRegion.samples property (use .numpy())
  • Make split(), trim(), split_and_plot() keyword-only after the first positional argument
  • Remove dataset module; split core.py into audio.py and core.py
  • Remove setup.py; migrate to pyproject.toml
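
For callers, the keyword-only change means that every option after the first argument must be passed by name. The sketch below uses a stand-in function, not auditok's actual `split()`: the first parameter name `audio`, the default values, and the helper `call_positionally` are assumptions made for illustration.

```python
def split(audio, *, min_dur=0.2, max_dur=5.0, max_silence=0.3):
    """Stand-in with a keyword-only signature; it just echoes its settings."""
    return {
        "audio": audio,
        "min_dur": min_dur,
        "max_dur": max_dur,
        "max_silence": max_silence,
    }


def call_positionally():
    """Show what happens when options are passed positionally."""
    try:
        split("audio.wav", 0.2, 5.0, 0.3)
    except TypeError:
        return "TypeError"
    return "ok"


# Passing options by name works with the keyword-only signature.
settings = split("audio.wav", max_dur=10.0, max_silence=0.1)
outcome = call_positionally()
```

Code written against an older release that passes these options positionally needs to be updated to name them.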

New features:

  • Add FFmpegAudioSource: streams audio from an ffmpeg subprocess pipe, ~2x faster than pydub's temp-file approach
  • Add trim() to remove leading and trailing silence from audio
  • Add fix_pauses() / remove_pauses() to normalize pauses between audio events
  • Add max_leading_silence parameter to split(), trim(), and StreamTokenizer to preserve natural sound onsets
  • Add max_trailing_silence parameter to control trailing silence independently of max_silence; deprecate drop_trailing_silence
  • split() accepts max_dur=None (or float("inf")) for unlimited event length
  • FFmpegAudioSource accepts sampling_rate, sample_width, channels for on-the-fly conversion
  • from_file() forwards sr/sw/ch to FFmpegAudioSource
  • Use ffmpeg for audio export; save() accepts audio_codec, audio_bitrate, audio_quality, ffmpeg_extra_args
  • AudioRegion._repr_html_() renders inline HTML5 audio player in Jupyter
  • Add interactive Jupyter widget: split_and_plot(interactive=True) with Canvas waveform, clickable regions, playback controls, and time ruler
  • Restructure CLI with subcommands: auditok split (default), auditok trim, auditok fix-pauses
  • Add --max-leading-silence and --max-trailing-silence CLI options
  • Add recording indicator with elapsed time for mic-based trim and fix-pauses
  • Deprecate --drop-trailing-silence in the CLI

Packaging and metadata:

  • Migrate from setup.py to pyproject.toml
  • Make matplotlib, sounddevice, and tqdm optional (pip install auditok[all])
  • Update development status from Alpha to Production/Stable
  • Add VAD, silence detection, and audio segmentation keywords for PyPI
  • Add Python 3.14 support
  • Add type annotations to public API with py.typed marker (PEP 561)
  • Add mypy to pre-commit hooks

Bug fixes:

  • Fix split() using analysis_window instead of actual frame duration after int truncation
  • Validate hop_size and block_size in _OverlapAudioReader
  • Fix matplotlib plot layout: wide figure default, proper legend placement
  • Fix deprecated AudioRegion.meta.start/.end usage in split_and_plot()
  • Suppress C-level ALSA/JACK/OSS warnings from PortAudio during initialization
  • Fix resource leak in split() when generator is not fully consumed

v0.3.0

01 Nov 10:29

  • Use numpy instead of the audioop module (deprecated in Python 3.11, removed in Python 3.13) for signal processing operations
  • Use pytest instead of genty for tests
  • Accept input of type Path for split
  • Remove deprecated ADSFactory
  • Implement AudioRegion.join, make_silence and split_and_join_with_silence
  • Use GitHub Actions instead of Travis for CI
  • Use Codecov for test coverage
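
As a rough idea of what an audioop-free energy computation can look like, here is a minimal numpy sketch. It assumes 16-bit mono PCM in native byte order and a log-energy definition of 10·log10(mean of squared samples); the function name `log_energy` and the `eps` floor are hypothetical and not necessarily what auditok uses.

```python
import numpy as np


def log_energy(data: bytes, eps: float = 1e-10) -> float:
    """Log energy in dB of raw 16-bit PCM bytes (illustrative only)."""
    samples = np.frombuffer(data, dtype=np.int16).astype(np.float64)
    # Mean of squared samples; treat an empty buffer as silence.
    mean_square = float(np.mean(samples ** 2)) if samples.size else 0.0
    # Floor the value so that digital silence does not produce log10(0).
    return float(10.0 * np.log10(max(mean_square, eps)))


# A constant signal of amplitude 1000 and a block of digital silence.
loud = np.full(160, 1000, dtype=np.int16).tobytes()
quiet = np.zeros(160, dtype=np.int16).tobytes()
loud_db = log_energy(loud)
quiet_db = log_energy(quiet)
```

With this definition, a frame would count as valid when its log energy is at or above a chosen threshold.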

v0.2.0

02 Mar 21:01

  • Implement split function as a high-level API for tokenization
  • Implement AudioRegion class for simple manipulation of audio objects
  • Use a much faster energy computation method (based on standard audioop module)
  • Make ADSFactory deprecated
  • Choose which channel(s) to use for tokenization
  • Save multi-channel audio data
  • Refactor code in all modules
  • Use genty for tests
  • Improve documentation
  • Use ArgumentParser instead of OptionParser in command-line script
  • Clean up command-line script and move functions and workers to dedicated modules
  • Add "timestamp" placeholder to main script
  • Play audio with a progress bar
  • StreamSaverWorker: avoid caching data for a long time in memory, save data regularly to disk
  • Use numpy style for documentation and update theme
  • Ensure pep8 compliance (flake8) and formatting with black
  • Add pre-commit hooks
  • Change license to MIT
  • Add project logo

v0.1.8

01 Nov 06:10

  • Add command line argument to select audio device used by pyaudio (#17)
  • Add command line argument to select buffer size for pyaudio (#17)

v0.1.7

01 Nov 06:13

Choose a tag to compare

  • Add Python 3.5/3.6 to test scheme
  • Make source pep8 compliant
  • Add shortcut names for AudioSource object properties
  • Fix Python 3 bug with reading binary data from STDIN (#16)