Tags · pretyflaco/millet

v0.12.8

v0.12.8: correct no-headphones mic crosstalk in dual-diarize path

Jun 15, 2026
2f1c0c9

v0.12.7

v0.12.7: single-source path keeps diarized in-room speakers

v0.12.6 routed in-room recordings to the mono path (correctly diarizing the
in-room speakers), but the mono path then remapped diarized speakers onto
YOU/REMOTE by channel energy. On dual-mono audio every speaker is equally
mic-dominant, so that remap collapsed the genuine speakers back into one.

Skip the channel-energy YOU/REMOTE relabeling (and channel correction) when
the recording was detected single-source, keeping the pyannote diarization
result so voiceprint naming can label each in-room speaker.

Jun 7, 2026
08d373b

v0.12.6

v0.12.6: fix in-room multi-speaker collapse in dual-diarize path

The default dual-diarize path assumes the mic (left) channel carries a single
local speaker (labeled YOU) and only diarizes the system (right) channel. For
an in-room recording -- several people sharing one mic, system channel silent
or a duplicate of the mic -- every mic speaker collapsed into one.

Now detect single-source stereo and fall back to the mono path (mix down +
diarize the combined signal), which splits the in-room speakers. Genuine
remote calls (active, distinct system channel) keep using dual-diarize.

- _is_single_source_stereo: True when the system channel's active-sample RMS
  is below system_inactive_rms_ratio (0.10) of the mic's, OR the channels'
  Pearson correlation is >= channel_duplicate_corr (0.98). Conservative on
  analysis failure (keeps dual-diarize).
- _load_stereo_int16: ffmpeg-based stereo decode (wav/ogg).
- CLI --single-source-fallback/--no-single-source-fallback (default on);
  TranscriptionConfig.single_source_fallback + the two thresholds.
- Tests: silent/duplicate/decorrelated detection + dispatch fallback + the
  no-regression guard for real remote calls.
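
The detection rule above can be sketched in plain Python. This is a minimal illustration, not millet's actual _is_single_source_stereo: the function name is_single_source_stereo, the active_floor amplitude gate used to pick "active" samples, and the pure-Python RMS/Pearson helpers are all assumptions made for the example. Only the two thresholds (0.10 and 0.98) come from the notes.

```python
import math


def rms(samples):
    """Root-mean-square of a sequence of samples (0.0 for empty input)."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = min(len(a), len(b))
    if n == 0:
        return 0.0
    mean_a = sum(a[:n]) / n
    mean_b = sum(b[:n]) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a[:n])
    var_b = sum((y - mean_b) ** 2 for y in b[:n])
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def is_single_source_stereo(mic, system,
                            system_inactive_rms_ratio=0.10,
                            channel_duplicate_corr=0.98,
                            active_floor=200):
    """Return True when the system channel is silent or a copy of the mic.

    active_floor is a hypothetical amplitude gate: samples quieter than it
    are ignored when computing the "active-sample" RMS.
    """
    mic_active = [s for s in mic if abs(s) >= active_floor]
    sys_active = [s for s in system if abs(s) >= active_floor]
    mic_rms = rms(mic_active)
    if mic_rms == 0.0:
        # Nothing usable to analyse: stay conservative and keep dual-diarize.
        return False
    if rms(sys_active) < system_inactive_rms_ratio * mic_rms:
        return True  # system channel is effectively silent
    return pearson(mic, system) >= channel_duplicate_corr  # duplicate of the mic
```

A caller would route to the mono path when this returns True and keep dual-diarize otherwise, so a genuine remote call with a distinct system channel is unaffected.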

Jun 7, 2026
b7ed363

v0.12.5

v0.12.5: title-aware schedule matching + sync collision guard

detect_meeting_type now considers the session title: a titled session only
auto-matches a schedule whose name/folder slug equals the title slug,
otherwise it returns None so the caller files it under its own folder. This
stops an ad-hoc meeting recorded inside a schedule window (e.g. a "post-scrum"
at 09:03 inside the 06:30-09:30 standup window) from being misfiled as the
scheduled meeting. Untitled sessions keep the prior pure time-window behavior.
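
A rough sketch of the title-aware rule follows. The names Schedule, slugify and match_schedule are hypothetical stand-ins (the real detect_meeting_type signature is not shown in these notes), and the sketch assumes a titled session must still fall inside the schedule's time window.

```python
import re
from dataclasses import dataclass
from datetime import time
from typing import List, Optional


@dataclass
class Schedule:
    name: str
    folder: str
    start: time
    end: time


def slugify(text: str) -> str:
    """Lowercase and collapse every non-alphanumeric run to a single hyphen."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def match_schedule(schedules: List[Schedule], when: time,
                   title: Optional[str] = None) -> Optional[Schedule]:
    """Pick the schedule a session belongs to, or None to file it on its own."""
    in_window = [s for s in schedules if s.start <= when <= s.end]
    if not title:
        # Untitled sessions keep the pure time-window behavior.
        return in_window[0] if in_window else None
    title_slug = slugify(title)
    for schedule in in_window:
        if title_slug in (slugify(schedule.name), slugify(schedule.folder)):
            return schedule
    return None  # titled but no slug match: caller files it under its own folder
```

With a standup schedule covering 06:30-09:30, a session titled "post-scrum" at 09:03 returns None here, while an untitled session at the same time still matches the standup.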

sync_session writes a local-only .session-id marker into each synced folder
and disambiguates (<folder>-<sessionid-suffix>) instead of overwriting when an
existing folder belongs to a different session. The marker is registered in
the clone's .git/info/exclude so it is never committed/pushed and never trips
the uncommitted-changes guard.
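
The collision guard could look roughly like the sketch below. The function names, the 8-character suffix length, and the choice to treat a folder with no marker as belonging to the current session are assumptions for illustration; only the .session-id marker, the <folder>-<sessionid-suffix> naming and the .git/info/exclude registration come from the notes.

```python
from pathlib import Path

MARKER = ".session-id"


def resolve_sync_folder(root: Path, folder: str, session_id: str,
                        suffix_len: int = 8) -> Path:
    """Return the folder to sync into, disambiguating instead of overwriting."""
    target = root / folder
    marker = target / MARKER
    if marker.exists() and marker.read_text().strip() != session_id:
        # Folder is owned by a different session: use a suffixed sibling.
        target = root / f"{folder}-{session_id[-suffix_len:]}"
    target.mkdir(parents=True, exist_ok=True)
    (target / MARKER).write_text(session_id + "\n")
    return target


def register_local_exclude(clone: Path) -> None:
    """Add the marker to .git/info/exclude so it is never committed or pushed."""
    exclude = clone / ".git" / "info" / "exclude"
    exclude.parent.mkdir(parents=True, exist_ok=True)
    lines = exclude.read_text().splitlines() if exclude.exists() else []
    if MARKER not in lines:
        lines.append(MARKER)
        exclude.write_text("\n".join(lines) + "\n")
```

Because .git/info/exclude is local to the clone, the marker stays invisible to git status as well as to commits, which is what keeps it from tripping an uncommitted-changes check.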

Pairs with vezir v0.7.16 (title injection + sync-as override).

Jun 3, 2026
fc440f1

v0.12.4

v0.12.4: robust language detection + sync exit-code

Language: whisperx detects from only the first ~30s of each channel, so a
misleading opener (e.g. an opening 'Gracias') mislabeled an English meeting
as Spanish even after the dominant-channel fix.
- Multi-window detection: sample N windows across each channel via
  faster-whisper's detect_language(language_detection_segments=N) instead of
  the first-30s guess (whisperx backend; --language-detection-segments,
  default 6).
- Soft default-language bias: --default-language <lang> keeps the team default
  unless a channel confidently detects another language
  (>= default_language_override_confidence, default 0.70). Fed into the
  dominant-channel selection.
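
The soft default-language bias reduces to a small decision rule. The sketch below is illustrative only: choose_language is a hypothetical helper, and it assumes each channel yields one (language, confidence) pair after multi-window detection. The 0.70 threshold is the documented default.

```python
from typing import Optional


def choose_language(detected: str, confidence: float,
                    default_language: Optional[str] = None,
                    override_confidence: float = 0.70) -> str:
    """Keep the team default unless the channel confidently detects another language."""
    if default_language is None or detected == default_language:
        return detected
    if confidence >= override_confidence:
        return detected  # confident enough to override the default
    return default_language
```

With --default-language en, a channel detected as Spanish at 0.55 stays English, while a detection at 0.90 switches to Spanish.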

Sync: cli/sync.py now raises SystemExit(1) when any session fails (e.g. git
push rejected) instead of exiting 0 — so callers no longer rely on scraping
the log to notice a failed sync.
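
The exit-code behavior can be sketched as follows. sync_all and its sync_one callback are hypothetical names, and continuing past a failed session before exiting is an assumption; the notes only state that any failed session now produces SystemExit(1).

```python
import sys
from typing import Callable, Iterable


def sync_all(sessions: Iterable[str], sync_one: Callable[[str], None]) -> int:
    """Sync every session; exit non-zero if any of them failed."""
    synced = 0
    failures = []
    for session in sessions:
        try:
            sync_one(session)
            synced += 1
        except Exception as exc:  # e.g. a rejected git push
            failures.append((session, exc))
            print(f"sync failed for {session}: {exc}", file=sys.stderr)
    if failures:
        raise SystemExit(1)
    return synced
```

A calling worker can then check the process return code instead of parsing log output.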

Tests: +default-language bias, +CLI sync exit-code. Full suite 295 pass,
7 pre-existing env-only failures.

Jun 2, 2026
2bfd947

v0.12.3

v0.12.3: summary language from dominant channel + per-language summaries

In the dual-channel paths the transcript/summary language was taken from
the mic channel only. A local speaker's minority-language asides (e.g. a
few Portuguese phrases) made the whole summary that language even when the
meeting was mostly English on the system channel.

- Summary/transcript language now follows the channel with the most speech
  (_dominant_channel_language); mic wins exact ties.
- Each channel is word-aligned with its OWN detected language
  (_align_channel) instead of sharing the mic's language model.
- apply_labels gains summary_language: regenerate the summary in a chosen
  language and save it as an ADDITIONAL <base>.summary.<lang>.md (with
  suffixed meta/frontmatter sidecars), preserving the primary auto-detected
  summary. MeetingSummary.save gains lang_suffix.
- sync: <base>.summary.<lang>.md syncs as a distinct summary.<lang>.md;
  .frontmatter.json is excluded (also fixes a latent collision where the
  frontmatter sidecar could be pushed as transcript.json).
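
A compact sketch of the two naming and selection rules above. Both helpers are hypothetical: the sketch assumes "most speech" is measured in seconds of speech per channel and that the primary summary is saved as <base>.summary.md, neither of which the notes confirm.

```python
from pathlib import Path
from typing import Optional, Tuple


def dominant_channel_language(mic: Tuple[str, float],
                              system: Tuple[str, float]) -> str:
    """Each argument is (language, speech_seconds); mic wins exact ties."""
    mic_lang, mic_seconds = mic
    sys_lang, sys_seconds = system
    return sys_lang if sys_seconds > mic_seconds else mic_lang


def summary_path(base: str, lang_suffix: Optional[str] = None) -> Path:
    """Primary summary, or an additional per-language one when a suffix is given."""
    if lang_suffix is None:
        return Path(f"{base}.summary.md")
    return Path(f"{base}.summary.{lang_suffix}.md")
```

For a meeting with 40 seconds of Portuguese on the mic and 600 seconds of English on the system channel, the first helper returns "en", and a Portuguese summary requested afterwards would be written alongside the primary one rather than replacing it.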

Tests: +8 (dominant-language selection, additional-language save/override).
Full suite 285 pass, 7 pre-existing env-only failures.

Jun 2, 2026
173b552

v0.12.2

v0.12.2: suppress phantom remote speakers in dual-diarize

pyannote can over-segment a single remote stream into multiple clusters
(e.g. peeling short backchannel "yeah/cool/awesome" off the main speaker
into a phantom), which voiceprint matching then mis-names from a weak,
barely-over-threshold match.

- Voiceprint auto-apply gate: a match at/above MATCH_THRESHOLD is applied
  only if it has enough embeddable speech AND is unambiguous (strong
  absolute confidence OR a clear margin over the runner-up profile).
  SpeakerMatch gains evidence_seconds + margin; identify_speakers computes
  the per-cluster margin. Weak/ambiguous matches stay raw and route to
  needs_labeling instead of confidently mislabeling (e.g. the observed
  0.69/0.13-margin false positive). Sidecar records only applied matches.
- Remote-cluster consolidation (dual-diarize): merge same-speaker clusters
  (voiceprint cosine >= cluster_merge_similarity) and absorb thin clusters
  (< cluster_min_speech_seconds embeddable) into the dominant remote; attach
  trivial unassigned segments to the nearest remote so a 0.4s one-liner no
  longer surfaces as a generic REMOTE. Behind --no-consolidate-remote-clusters.
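
The auto-apply gate can be pictured as the predicate below. Every numeric value is a hypothetical placeholder (the notes do not give MATCH_THRESHOLD or the evidence, confidence and margin cutoffs), and this SpeakerMatch is a trimmed stand-in for the real class. The placeholders were chosen so that the observed 0.69 confidence / 0.13 margin case is rejected.

```python
from dataclasses import dataclass

MATCH_THRESHOLD = 0.65       # hypothetical value
MIN_EVIDENCE_SECONDS = 3.0   # hypothetical value
STRONG_CONFIDENCE = 0.80     # hypothetical value
MIN_MARGIN = 0.20            # hypothetical value


@dataclass
class SpeakerMatch:
    name: str
    confidence: float        # cosine similarity to the best profile
    evidence_seconds: float  # embeddable speech backing the match
    margin: float            # best score minus runner-up score


def should_auto_apply(match: SpeakerMatch) -> bool:
    """Apply a name only when the match has enough evidence and is unambiguous."""
    if match.confidence < MATCH_THRESHOLD:
        return False
    if match.evidence_seconds < MIN_EVIDENCE_SECONDS:
        return False
    return match.confidence >= STRONG_CONFIDENCE or match.margin >= MIN_MARGIN
```

Matches that fail the gate would keep their raw speaker id and be routed to needs_labeling, which trades a little manual labeling for the absence of confidently wrong names.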

Validated on a real 2-speaker session (4 speakers -> 2 + 1 raw, no false
name) and a 13-speaker session (no legit speaker suppressed).

Tests: +18 (consolidation merge/absorb/no-over-merge/orphan/config + gate
policy). Full suite 277 pass, 7 pre-existing env-only failures.

Jun 2, 2026
3a7e238

v0.12.1

v0.12.1: fix label --auto discarding matches in non-interactive runs

label --auto auto-applied confident voiceprint matches, then prompted
interactively for unrecognized speakers.  In the vezir worker (no TTY)
click.prompt hit EOF -> Abort, discarding ALL matches before they were
written.  Meetings with fully-recognizable speakers were left stuck in
needs_labeling with raw SPEAKER_N ids.

Now: when stdin is not a TTY, skip prompting -- apply auto-matches, leave
unmatched speakers as raw ids.
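
The non-interactive behavior amounts to a TTY check before prompting. The sketch below is an assumption-laden illustration: resolve_labels and its prompt callback are hypothetical, standing in for the real label --auto flow and its click.prompt call.

```python
import sys
from typing import Callable, Dict, List, Optional


def resolve_labels(speakers: List[str],
                   auto_matches: Dict[str, str],
                   prompt: Callable[[str], str],
                   interactive: Optional[bool] = None) -> Dict[str, str]:
    """Apply auto-matches; prompt for the rest only when a TTY is attached."""
    if interactive is None:
        interactive = sys.stdin.isatty()
    labels = {}
    for speaker in speakers:
        if speaker in auto_matches:
            labels[speaker] = auto_matches[speaker]
        elif interactive:
            labels[speaker] = prompt(speaker)
        else:
            labels[speaker] = speaker  # no TTY: leave the raw id in place
    return labels
```

In a worker without a TTY, recognized speakers are still written out and only the unmatched ones remain as SPEAKER_N ids.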

Also adds a *.autoid.json sidecar (name + confidence per speaker, keyed by
final transcript id) so vezir's labeling screen can pre-fill recognized
names and show confidence.  Excluded from sync + transcript resolution.
3 new tests.

Jun 2, 2026
a78b65d

v0.12.0

v0.12.0: dual-diarize default -- per-channel ASR + remote speaker diarization

New default mixdown for stereo: dual-diarize. It transcribes the mic and
system channels separately (the local speaker, Kemal, becomes one continuous
YOU track from the mic, immune to overlap), then runs pyannote diarization on
the system channel only to split distinct remote speakers
(Openoms/Jonas/Max/...). Overlapping segments are preserved.

Eliminates the overlap-fragmentation bug where mono+diarization flickered
words between speakers during talk-over ('This year' -> Openoms, 'they' ->
Kemal, 'rented the' -> Openoms, 'whole island' -> Kemal — Kemal said the
entire sentence).

Also includes:
- Channel-energy correction (mono path, --channel-correct): per-segment/word
  RMS reassignment for turn-boundary leaks; on by default for --mixdown mono.
  --channel-correct-margin (default 0.30) for tuning.
- DNS-retry hardening for millet sync git operations (clone/pull/push):
  transient DNS failures auto-retry 5x with backoff.
- 11 new channel-correction tests; default-mixdown test updated.
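
The DNS-retry hardening could be approximated as below. This is a sketch under assumptions: run_git_with_retry is a hypothetical helper, "5x" is read as five attempts in total, the backoff is assumed exponential, and transient DNS failures are recognized by substrings that git and ssh commonly print. The notes do not say how millet detects them.

```python
import subprocess
import time

# Substrings treated as transient DNS failures (an assumption for this sketch).
DNS_ERROR_MARKERS = (
    "Could not resolve host",
    "Temporary failure in name resolution",
)


def run_git_with_retry(args, attempts=5, base_delay=1.0,
                       run=subprocess.run, sleep=time.sleep):
    """Run a git command, retrying with exponential backoff on DNS errors only."""
    for attempt in range(1, attempts + 1):
        result = run(["git", *args], capture_output=True, text=True)
        if result.returncode == 0:
            return result
        transient = any(marker in result.stderr for marker in DNS_ERROR_MARKERS)
        if not transient or attempt == attempts:
            raise RuntimeError(f"git {' '.join(args)} failed: {result.stderr.strip()}")
        sleep(base_delay * 2 ** (attempt - 1))  # 1s, 2s, 4s, 8s
```

Non-DNS failures such as a rejected push are raised immediately, since retrying them would only delay the error.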

Validated on DEVSTANDUP (5spk), LUKAS_2 (2spk), AB_BOARD (4spk .ogg):
overlap-fragmentation eliminated, all distinct remote speakers preserved.

Jun 1, 2026
5b53386

v0.11.0

v0.11.0: opt-in Parakeet ASR backend (onnx-asr, English, CUDA)

Add a third ASR backend alongside whisperx and mlx: NVIDIA Parakeet TDT
via onnx-asr (ONNX Runtime, pure-Python — no extra torch/transformers).
Opt-in via --asr-backend parakeet; auto selection unchanged.

- millet/parakeet.py: backend + Silero VAD chunking for long audio
  (Parakeet's ~20-30s per-utterance limit), WhisperX-shaped output
  contract, cuDNN/cuBLAS ctypes preload so onnxruntime-gpu finds the
  torch-bundled CUDA libs, HF-cache completeness check.
- transcribe.py: parakeet backend validation, _transcribe_asr dispatch,
  config B (native timestamps, default) / C (--parakeet-keep-alignment)
  alignment toggle.
- cli: --asr-backend parakeet, --parakeet-model, --parakeet-keep-alignment;
  millet download parakeet (explicit, lazy model fetch).
- [parakeet] optional extra (onnx-asr[hub]); scripts/bench_asr.py harness
  + benchmark results doc.
- tests/test_parakeet.py: 12 tests (contract, B/C wiring, validation,
  dispatch, availability guard).
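
VAD-based chunking for a model with a short per-utterance limit can be sketched as a greedy packer over speech segments. This is not the code in millet/parakeet.py: pack_segments is a hypothetical helper, the input is assumed to be (start, end) speech spans in seconds from a VAD such as Silero, and the 20-second cap is an assumed value inside the ~20-30s range mentioned above.

```python
from typing import List, Tuple

Span = Tuple[float, float]


def pack_segments(segments: List[Span], max_chunk_seconds: float = 20.0) -> List[Span]:
    """Greedily merge consecutive speech spans into chunks under the cap.

    A single span longer than the cap is passed through unchanged; splitting
    it further is out of scope for this sketch.
    """
    chunks: List[Span] = []
    chunk_start = chunk_end = None
    for start, end in segments:
        if chunk_start is None:
            chunk_start, chunk_end = start, end
        elif end - chunk_start <= max_chunk_seconds:
            chunk_end = end  # span still fits: extend the current chunk
        else:
            chunks.append((chunk_start, chunk_end))
            chunk_start, chunk_end = start, end
    if chunk_start is not None:
        chunks.append((chunk_start, chunk_end))
    return chunks
```

Cutting only at VAD boundaries keeps each chunk ending in silence, so words are not split across two recognition calls.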

Benchmark note: on a 3090, whisperx is faster than Parakeet; Parakeet's
value is finer segmentation, not speed. Stays opt-in pending further
validation.

May 30, 2026
13d217e
