Releases: amsehili/auditok
v0.4.2
Improvements:
- Allow `max_trailing_silence` to exceed `max_silence`: after the event
  boundary is decided (at `max_silence`), additional silent frames are
  collected past the boundary up to `max_trailing_silence`. Collection stops
  as soon as a new valid frame is detected, so separate events are not merged.
  This lets you use a small `max_silence` for tight event boundaries while
  keeping enough trailing audio to preserve natural fadeouts. The same option
  is exposed on the CLI as `-g`/`--max-trailing-silence`.
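
The interaction between the two limits can be illustrated with a toy model. This is a minimal sketch and not auditok's tokenizer: `toy_split` is a hypothetical helper, frames are given as a string of `V` (valid) and `S` (silent) characters, and both limits are expressed in frames rather than seconds.

```python
def toy_split(frames, max_silence, max_trailing_silence):
    """Toy model of event detection on a string of 'V'/'S' frames.

    Returns a list of (start, end) frame indices, end exclusive.
    Hypothetical helper for illustration only; limits are in frames.
    """
    events = []
    n = len(frames)
    i = 0
    while i < n:
        if frames[i] != "V":
            i += 1
            continue
        start = i
        end = i + 1  # one past the last valid frame seen so far
        while True:
            j = end
            while j < n and frames[j] != "V":
                j += 1
            run = j - end  # silent frames between `end` and the next valid frame
            if j < n and run <= max_silence:
                end = j + 1  # short pause: still the same event
            else:
                break  # boundary decided: pause too long, or stream ended
        # Keep trailing silence up to the limit, never past the next valid frame.
        kept = min(run, max_trailing_silence)
        events.append((start, end + kept))
        i = j
    return events


frames = "VVSSSSSSVV"
print(toy_split(frames, max_silence=2, max_trailing_silence=2))   # [(0, 4), (8, 10)]
print(toy_split(frames, max_silence=2, max_trailing_silence=4))   # [(0, 6), (8, 10)]
print(toy_split(frames, max_silence=2, max_trailing_silence=10))  # [(0, 8), (8, 10)]
```

In the last call the trailing limit is larger than the whole pause, yet the first event stops at frame 8, where the next valid frame begins, so the two events stay separate.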
Bug fixes:
- Fix 1-frame gap in leading silence between consecutive events. The silent
  frame that triggered the transition into `SILENCE` was previously dropped
  instead of being seeded into the leading buffer for the next token, causing
  an unavoidable 1-frame gap regardless of `max_leading_silence`. Trailing
  frames trimmed by `max_trailing_silence` are now also seeded into the
  leading buffer of the next token.
v0.4.1
Bug fixes:
- Fix CLI forcing default `-r`/`-w`/`-c` on file input via `FFmpegAudioSource`
- Fix Ctrl+C crash and audio glitches during playback with `-E`
- Quit correctly when `-B` is specified for audio playback with progress bar
Improvements:
- Refactor command line code and tests
- Show an error message when a command line argument appears more than once
- Make CLI help text 80-column max
v0.4.0
Breaking changes:
- Drop Python 3.7 support (EOL since June 2023)
- Drop pydub dependency; ffmpeg is now required directly for non-WAV/raw formats
- Replace pyaudio with sounddevice for microphone input
- Remove deprecated `AudioRegion.meta` accessor (use `.start`/`.end`)
- Remove deprecated `AudioRegion.samples` property (use `.numpy()`)
- Make `split()`, `trim()`, `split_and_plot()` keyword-only after the first positional argument
- Remove `dataset` module; split `core.py` into `audio.py` and `core.py`
- Remove `setup.py`; migrate to `pyproject.toml`
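
The keyword-only change means that calls passing options positionally now fail. A minimal sketch of the effect, using a hypothetical stand-in function (`split_like` is not part of auditok, and its parameter list and defaults are assumptions chosen for illustration):

```python
def split_like(input, *, min_dur=0.2, max_dur=5.0, max_silence=0.3):
    """Hypothetical stand-in with a keyword-only signature.

    Everything after the bare `*` must be passed by name.
    """
    return (input, min_dur, max_dur, max_silence)


def call_positionally():
    """Show what happens to an old-style positional call."""
    try:
        split_like("audio.wav", 0.5, 10.0)  # options passed by position
    except TypeError:
        return "TypeError"
    return "ok"


print(call_positionally())                    # TypeError
print(split_like("audio.wav", min_dur=0.5))   # ('audio.wav', 0.5, 5.0, 0.3)
```

Code that already passes every option by name should need no changes for this particular item.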
New features:
- Add `FFmpegAudioSource`: streams audio from an ffmpeg subprocess pipe, ~2x faster than pydub's temp-file approach
- Add `trim()` to remove leading and trailing silence from audio
- Add `fix_pauses()`/`remove_pauses()` to normalize pauses between audio events
- Add `max_leading_silence` parameter to `split()`, `trim()`, and `StreamTokenizer` to preserve natural sound onsets
- Add `max_trailing_silence` parameter to control trailing silence independently of `max_silence`; deprecate `drop_trailing_silence`
- `split()` accepts `max_dur=None` (or `float("inf")`) for unlimited event length
- `FFmpegAudioSource` accepts `sampling_rate`, `sample_width`, `channels` for on-the-fly conversion
- `from_file()` forwards `sr`/`sw`/`ch` to `FFmpegAudioSource`
- Use ffmpeg for audio export; `save()` accepts `audio_codec`, `audio_bitrate`, `audio_quality`, `ffmpeg_extra_args`
- `AudioRegion._repr_html_()` renders inline HTML5 audio player in Jupyter
- Add interactive Jupyter widget: `split_and_plot(interactive=True)` with Canvas waveform, clickable regions, playback controls, and time ruler
- Restructure CLI with subcommands: `auditok split` (default), `auditok trim`, `auditok fix-pauses`
- Add `--max-leading-silence` and `--max-trailing-silence` CLI options
- Add recording indicator with elapsed time for mic-based `trim` and `fix-pauses`
- Make `--drop-trailing-silence` deprecated in CLI
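
For readers unfamiliar with the subprocess-pipe approach, the sketch below shows the general idea of decoding a file to raw PCM through ffmpeg's standard output, with optional on-the-fly conversion. It is an illustration under assumptions, not the `FFmpegAudioSource` implementation: `build_ffmpeg_decode_cmd` and `stream_pcm` are hypothetical helpers, and the exact ffmpeg arguments auditok uses may differ.

```python
import subprocess

# Raw PCM container format and codec per sample width in bytes (ffmpeg names).
_PCM_FORMATS = {
    1: ("u8", "pcm_u8"),
    2: ("s16le", "pcm_s16le"),
    4: ("s32le", "pcm_s32le"),
}


def build_ffmpeg_decode_cmd(path, sampling_rate=None, sample_width=2, channels=None):
    """Build an ffmpeg command that writes raw PCM to stdout (hypothetical helper)."""
    fmt, codec = _PCM_FORMATS[sample_width]
    cmd = ["ffmpeg", "-v", "error", "-i", str(path), "-vn"]
    if sampling_rate is not None:
        cmd += ["-ar", str(sampling_rate)]  # resample on the fly
    if channels is not None:
        cmd += ["-ac", str(channels)]  # change channel count on the fly
    cmd += ["-f", fmt, "-acodec", codec, "pipe:1"]
    return cmd


def stream_pcm(cmd, chunk_size=4096):
    """Yield raw PCM chunks from the ffmpeg pipe without a temporary file."""
    with subprocess.Popen(cmd, stdout=subprocess.PIPE) as proc:
        while True:
            chunk = proc.stdout.read(chunk_size)
            if not chunk:
                break
            yield chunk


print(build_ffmpeg_decode_cmd("in.mp3", sampling_rate=16000, channels=1))
```

Reading from the pipe lets processing start before the whole file has been decoded, which is the usual motivation for avoiding an intermediate temporary file.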
Packaging and metadata:
- Migrate from `setup.py` to `pyproject.toml`
- Make `matplotlib`, `sounddevice`, and `tqdm` optional (`pip install auditok[all]`)
- Update development status from Alpha to Production/Stable
- Add VAD, silence detection, and audio segmentation keywords for PyPI
- Add Python 3.14 support
- Add type annotations to public API with `py.typed` marker (PEP 561)
- Add mypy to pre-commit hooks
Bug fixes:
- Fix `split()` using `analysis_window` instead of actual frame duration after int truncation
- Validate `hop_size` and `block_size` in `_OverlapAudioReader`
- Fix matplotlib plot layout: wide figure default, proper legend placement
- Fix deprecated `AudioRegion.meta.start`/`.end` usage in `split_and_plot()`
- Suppress C-level ALSA/JACK/OSS warnings from PortAudio during initialization
- Fix resource leak in `split()` when generator is not fully consumed
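
The frame-duration fix is easiest to see with numbers. The sketch below is plain arithmetic, not auditok code (`frame_size_and_duration` is a hypothetical helper): when the requested window does not correspond to a whole number of samples, truncating to an integer yields a slightly shorter frame, and timing computed from the nominal window drifts.

```python
def frame_size_and_duration(analysis_window, sampling_rate):
    """Return (samples per frame, actual frame duration in seconds).

    Hypothetical helper: the sample count is truncated to an integer,
    so the actual duration can be shorter than the requested window.
    """
    frame_samples = int(analysis_window * sampling_rate)
    actual_duration = frame_samples / sampling_rate
    return frame_samples, actual_duration


nominal = 0.01  # requested analysis window, in seconds
samples, actual = frame_size_and_duration(nominal, 22050)
print(samples)           # 220 (220.5 truncated)
print(actual < nominal)  # True

# Timing error accumulated over 1000 frames if the nominal window is used.
drift = 1000 * (nominal - actual)
print(round(drift, 4))   # 0.0227
```

At 22050 Hz with a 10 ms window this amounts to roughly 23 ms of drift per 1000 frames, which is why durations are better derived from the actual frame length.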
v0.3.0
- Use `numpy` instead of `audioop` (deprecated, then removed in Python 3.13) for signal processing operations
- Use `pytest` instead of `genty` for tests
- Accept input of type `Path` for `split`
- Remove deprecated `ADSFactory`
- Implement `AudioRegion.join`, `make_silence` and `split_and_join_with_silence`
- Use GitHub Actions instead of Travis for CI
- Use Codecov for test coverage
v0.2.0
- Implement `split` function as a high-level API for tokenization
- Implement `AudioRegion` class for simple manipulation of audio objects
- Use a much faster energy computation method (based on the standard `audioop` module)
- Make `ADSFactory` deprecated
- Choose which channel(s) to use for tokenization
- Save multi-channel audio data
- Refactor code in all modules
- Use `genty` for tests
- Improve documentation
- Use `ArgumentParser` instead of `OptionParser` in command-line script
- Clean up command-line script and move functions and workers to dedicated modules
- Add "timestamp" placeholder to main script
- Play audio with a progress bar
- `StreamSaverWorker`: avoid caching data for a long time in memory, save data regularly to disk
- Use `numpy` style for documentation and update theme
- Ensure pep8 compliance (`flake8`) and formatting with `black`
- Add `pre-commit` hooks
- Change license to MIT
- Add project logo