Tags: pretyflaco/millet

v0.12.8

v0.12.8: correct no-headphones mic crosstalk in dual-diarize path

v0.12.7

v0.12.7: single-source path keeps diarized in-room speakers

v0.12.6 routed in-room recordings to the mono path (correctly diarizing the
in-room speakers), but the mono path then remapped diarized speakers onto
YOU/REMOTE by channel energy. On dual-mono audio every speaker is equally
mic-dominant, so that remap collapsed the genuine speakers back into one.

Skip the channel-energy YOU/REMOTE relabeling (and channel correction) when
the recording was detected single-source, keeping the pyannote diarization
result so voiceprint naming can label each in-room speaker.

v0.12.6

v0.12.6: fix in-room multi-speaker collapse in dual-diarize path

The default dual-diarize path assumes the mic (left) channel carries a single
local speaker (labeled YOU) and only diarizes the system (right) channel. For
an in-room recording -- several people sharing one mic, system channel silent
or a duplicate of the mic -- every mic speaker collapsed into one.

Now detect single-source stereo and fall back to the mono path (mix down +
diarize the combined signal), which splits the in-room speakers. Genuine
remote calls (active, distinct system channel) keep using dual-diarize.

- _is_single_source_stereo: True when the system channel's active-sample RMS
  is below system_inactive_rms_ratio (0.10) of the mic's, OR the channels'
  Pearson correlation is >= channel_duplicate_corr (0.98). Conservative on
  analysis failure (keeps dual-diarize).
- _load_stereo_int16: ffmpeg-based stereo decode (wav/ogg).
- CLI --single-source-fallback/--no-single-source-fallback (default on);
  TranscriptionConfig.single_source_fallback + the two thresholds.
- Tests: silent/duplicate/decorrelated detection + dispatch fallback + the
  no-regression guard for real remote calls.
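
A minimal sketch of the single-source check described above, in pure Python. The two thresholds come from the note; the `ACTIVE_FLOOR` cutoff for "active" samples, the helper names, and the plain-list sample representation are assumptions for illustration, not the project's actual `_is_single_source_stereo`.

```python
import math
from typing import Sequence

SYSTEM_INACTIVE_RMS_RATIO = 0.10  # from the release note
CHANNEL_DUPLICATE_CORR = 0.98     # from the release note
ACTIVE_FLOOR = 50                 # hypothetical int16 magnitude floor for "active" samples


def active_rms(samples: Sequence[int], floor: int = ACTIVE_FLOOR) -> float:
    """RMS over samples whose magnitude exceeds the floor (0.0 if none do)."""
    active = [s for s in samples if abs(s) > floor]
    if not active:
        return 0.0
    return math.sqrt(sum(s * s for s in active) / len(active))


def pearson(a: Sequence[int], b: Sequence[int]) -> float:
    """Pearson correlation over the common prefix; 0.0 if either side is constant."""
    n = min(len(a), len(b))
    if n < 2:
        raise ValueError("need at least two samples per channel")
    mean_a = sum(a[:n]) / n
    mean_b = sum(b[:n]) / n
    cov = sum((a[i] - mean_a) * (b[i] - mean_b) for i in range(n))
    var_a = sum((a[i] - mean_a) ** 2 for i in range(n))
    var_b = sum((b[i] - mean_b) ** 2 for i in range(n))
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def is_single_source_stereo(mic: Sequence[int], system: Sequence[int]) -> bool:
    """True when the system channel is near-silent or a duplicate of the mic."""
    try:
        mic_rms = active_rms(mic)
        if mic_rms == 0.0:
            return False  # nothing to compare against: keep dual-diarize
        if active_rms(system) < SYSTEM_INACTIVE_RMS_RATIO * mic_rms:
            return True
        return pearson(mic, system) >= CHANNEL_DUPLICATE_CORR
    except Exception:
        return False  # conservative on analysis failure
```

Returning `False` on any failure mirrors the "conservative" behavior in the note: an unreadable file keeps the existing dual-diarize path rather than silently switching modes.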

v0.12.5

v0.12.5: title-aware schedule matching + sync collision guard

detect_meeting_type now considers the session title: a titled session only
auto-matches a schedule whose name/folder slug equals the title slug,
otherwise it returns None so the caller files it under its own folder. This
stops an ad-hoc meeting recorded inside a schedule window (e.g. a "post-scrum"
at 09:03 inside the 06:30-09:30 standup window) from being misfiled as the
scheduled meeting. Untitled sessions keep the prior pure time-window behavior.
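
The matching rule can be sketched as follows. The `Schedule` dataclass, `slugify`, and the function name are hypothetical stand-ins, and the sketch assumes a titled session must still fall inside the schedule's time window; the real `detect_meeting_type` may differ on both points.

```python
import re
from dataclasses import dataclass
from datetime import time
from typing import Optional, Sequence


@dataclass(frozen=True)
class Schedule:  # hypothetical shape, for illustration only
    name: str
    folder: str
    start: time
    end: time


def slugify(text: str) -> str:
    """Lowercase and collapse runs of non-alphanumerics into single hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def match_schedule(
    started_at: time,
    schedules: Sequence[Schedule],
    title: Optional[str] = None,
) -> Optional[Schedule]:
    in_window = [s for s in schedules if s.start <= started_at <= s.end]
    if not title:
        # Untitled sessions: pure time-window behavior.
        return in_window[0] if in_window else None
    title_slug = slugify(title)
    for schedule in in_window:
        if title_slug in (slugify(schedule.name), slugify(schedule.folder)):
            return schedule
    return None  # titled but no slug match: caller files it under its own folder
```

With a standup window of 06:30-09:30, a session titled "post-scrum" starting at 09:03 returns `None`, while an untitled session at the same time still matches the standup.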

sync_session writes a local-only .session-id marker into each synced folder
and disambiguates (<folder>-<sessionid-suffix>) instead of overwriting when an
existing folder belongs to a different session. The marker is registered in
the clone's .git/info/exclude so it is never committed/pushed and never trips
the uncommitted-changes guard.
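
A rough sketch of the collision guard, under stated assumptions: the suffix length, the treatment of a folder with no marker (reused as-is), and both function names are illustrative and not taken from `sync_session`.

```python
from pathlib import Path

MARKER = ".session-id"  # marker file name from the release note


def resolve_sync_folder(root: Path, folder: str, session_id: str,
                        suffix_len: int = 8) -> Path:
    """Pick the folder for a session, disambiguating instead of overwriting."""
    target = root / folder
    marker = target / MARKER
    if marker.exists():
        owner = marker.read_text().strip()
        if owner != session_id:
            # Folder belongs to a different session: use <folder>-<suffix>.
            target = root / f"{folder}-{session_id[-suffix_len:]}"
    target.mkdir(parents=True, exist_ok=True)
    (target / MARKER).write_text(session_id + "\n")
    return target


def exclude_marker(clone: Path) -> None:
    """Register the marker in .git/info/exclude so git never tracks it."""
    exclude = clone / ".git" / "info" / "exclude"
    exclude.parent.mkdir(parents=True, exist_ok=True)
    text = exclude.read_text() if exclude.exists() else ""
    if MARKER in text.splitlines():
        return  # already registered
    if text and not text.endswith("\n"):
        text += "\n"
    exclude.write_text(text + MARKER + "\n")
```

Using `.git/info/exclude` rather than `.gitignore` keeps the rule local to the clone, so the exclusion itself never shows up as an uncommitted change.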

Pairs with vezir v0.7.16 (title injection + sync-as override).

v0.12.4

v0.12.4: robust language detection + sync exit-code

Language: whisperx detects from only the first ~30s of each channel, so a
misleading opener (e.g. an opening 'Gracias') mislabeled an English meeting
as Spanish even after the dominant-channel fix.
- Multi-window detection: sample N windows across each channel via
  faster-whisper's detect_language(language_detection_segments=N) instead of
  the first-30s guess (whisperx backend; --language-detection-segments,
  default 6).
- Soft default-language bias: --default-language <lang> keeps the team default
  unless a channel confidently detects another language
  (>= default_language_override_confidence, default 0.70). Fed into the
  dominant-channel selection.
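
The soft bias reduces to a small decision rule. The sketch below assumes the detector yields a `(language, confidence)` pair per channel; the function name is hypothetical, and only the 0.70 default comes from the note.

```python
from typing import Optional

DEFAULT_LANGUAGE_OVERRIDE_CONFIDENCE = 0.70  # default from the release note


def apply_default_language_bias(
    detected: str,
    confidence: float,
    default_language: Optional[str],
    threshold: float = DEFAULT_LANGUAGE_OVERRIDE_CONFIDENCE,
) -> str:
    """Keep the team default unless another language is detected confidently."""
    if default_language is None or detected == default_language:
        return detected
    return detected if confidence >= threshold else default_language
```

For example, a channel detected as `es` at 0.55 with `--default-language en` stays `en`, while the same detection at 0.92 overrides the default.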

Sync: cli/sync.py now raises SystemExit(1) when any session fails (e.g. git
push rejected) instead of exiting 0, so callers no longer have to scrape the
log to notice a failed sync.
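
One way to get this behavior is to collect failures across the whole run and exit non-zero at the end. This is a sketch only: `sync_all` and the `sync_one` callback are hypothetical names, not the structure of cli/sync.py.

```python
import sys
from typing import Callable, Iterable, List


def sync_all(session_ids: Iterable[str], sync_one: Callable[[str], None]) -> None:
    """Sync every session, then exit with status 1 if any of them failed."""
    failed: List[str] = []
    for session_id in session_ids:
        try:
            sync_one(session_id)
        except Exception as exc:  # e.g. a rejected git push
            print(f"sync failed for {session_id}: {exc}", file=sys.stderr)
            failed.append(session_id)
    if failed:
        raise SystemExit(1)
```

Deferring the exit until the loop finishes means one bad session does not block the rest from syncing, while the process status still reports the failure.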

Tests: +default-language bias, +CLI sync exit-code. Full suite 295 pass,
7 pre-existing env-only failures.

v0.12.3

v0.12.3: summary language from dominant channel + per-language summaries

In the dual-channel paths the transcript/summary language was taken from
the mic channel only. A local speaker's minority-language asides (e.g. a
few Portuguese phrases) made the whole summary that language even when the
meeting was mostly English on the system channel.

- Summary/transcript language now follows the channel with the most speech
  (_dominant_channel_language); mic wins exact ties.
- Each channel is word-aligned with its OWN detected language
  (_align_channel) instead of sharing the mic's language model.
- apply_labels gains summary_language: regenerate the summary in a chosen
  language and save it as an ADDITIONAL <base>.summary.<lang>.md (with
  suffixed meta/frontmatter sidecars), preserving the primary auto-detected
  summary. MeetingSummary.save gains lang_suffix.
- sync: <base>.summary.<lang>.md syncs as a distinct summary.<lang>.md;
  .frontmatter.json is excluded (also fixes a latent collision where the
  frontmatter sidecar could be pushed as transcript.json).
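
The dominant-channel rule in the first bullet can be sketched like this. It assumes "most speech" means total segment duration and that segments are `(start, end)` pairs in seconds; the actual `_dominant_channel_language` may measure speech differently.

```python
from typing import Sequence, Tuple

Segment = Tuple[float, float]  # (start_seconds, end_seconds), an assumed shape


def speech_seconds(segments: Sequence[Segment]) -> float:
    """Total duration of the segments, ignoring malformed negative spans."""
    return sum(max(0.0, end - start) for start, end in segments)


def dominant_channel_language(
    mic_language: str,
    mic_segments: Sequence[Segment],
    system_language: str,
    system_segments: Sequence[Segment],
) -> str:
    """Language of the channel with more speech; the mic wins exact ties."""
    if speech_seconds(system_segments) > speech_seconds(mic_segments):
        return system_language
    return mic_language
```

A few Portuguese asides on the mic (say 6 s of speech) no longer outweigh 40 s of English on the system channel.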

Tests: +8 (dominant-language selection, additional-language save/override).
Full suite 285 pass, 7 pre-existing env-only failures.

v0.12.2

v0.12.2: suppress phantom remote speakers in dual-diarize

pyannote can over-segment a single remote stream into multiple clusters
(e.g. peeling short backchannel "yeah/cool/awesome" off the main speaker
into a phantom), which voiceprint matching then mis-names from a weak,
barely-over-threshold match.

- Voiceprint auto-apply gate: a match at/above MATCH_THRESHOLD is applied
  only if it has enough embeddable speech AND is unambiguous (strong
  absolute confidence OR a clear margin over the runner-up profile).
  SpeakerMatch gains evidence_seconds + margin; identify_speakers computes
  the per-cluster margin. Weak/ambiguous matches stay raw and route to
  needs_labeling instead of confidently mislabeling (e.g. the observed
  0.69/0.13-margin false positive). Sidecar records only applied matches.
- Remote-cluster consolidation (dual-diarize): merge same-speaker clusters
  (voiceprint cosine >= cluster_merge_similarity) and absorb thin clusters
  (< cluster_min_speech_seconds embeddable) into the dominant remote; attach
  trivial unassigned segments to the nearest remote so a 0.4s one-liner no
  longer surfaces as a generic REMOTE. Behind --no-consolidate-remote-clusters.
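
The auto-apply gate from the first bullet can be sketched as a pure predicate. All four numeric thresholds and the `Match` dataclass are hypothetical placeholders (the note does not state the real values); only the shape of the rule, enough evidence AND (strong confidence OR clear margin), comes from the text.

```python
from dataclasses import dataclass
from typing import Dict

MATCH_THRESHOLD = 0.65       # hypothetical value
MIN_EVIDENCE_SECONDS = 3.0   # hypothetical value
STRONG_CONFIDENCE = 0.80     # hypothetical value
MIN_MARGIN = 0.20            # hypothetical value


@dataclass
class Match:  # stand-in for SpeakerMatch with the two new fields
    name: str
    confidence: float
    evidence_seconds: float
    margin: float


def margin_over_runner_up(scores: Dict[str, float]) -> float:
    """Best profile score minus the second best (the best score if alone)."""
    ranked = sorted(scores.values(), reverse=True)
    if not ranked:
        return 0.0
    return ranked[0] - (ranked[1] if len(ranked) > 1 else 0.0)


def should_auto_apply(match: Match) -> bool:
    """Apply a name only when the match is well-evidenced and unambiguous."""
    if match.confidence < MATCH_THRESHOLD:
        return False
    if match.evidence_seconds < MIN_EVIDENCE_SECONDS:
        return False
    return match.confidence >= STRONG_CONFIDENCE or match.margin >= MIN_MARGIN
```

Under these placeholder thresholds, a 0.69 match with a 0.13 margin stays raw and routes to needs_labeling, whereas the same confidence with a 0.30 margin would be applied.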

Validated on a real 2-speaker session (4 speakers -> 2 + 1 raw, no false
name) and a 13-speaker session (no legit speaker suppressed).

Tests: +18 (consolidation merge/absorb/no-over-merge/orphan/config + gate
policy). Full suite 277 pass, 7 pre-existing env-only failures.

v0.12.1

v0.12.1: fix label --auto discarding matches in non-interactive runs

label --auto auto-applied confident voiceprint matches, then prompted
interactively for unrecognized speakers.  In the vezir worker (no TTY)
click.prompt hit EOF -> Abort, discarding ALL matches before they were
written.  Meetings with fully-recognizable speakers were left stuck in
needs_labeling with raw SPEAKER_N ids.

Now: when stdin is not a TTY, skip prompting -- apply auto-matches, leave
unmatched speakers as raw ids.
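
The fix amounts to checking `sys.stdin.isatty()` before prompting. A minimal sketch, with hypothetical names (`resolve_labels`, the `prompt` callback) standing in for the real `label --auto` code path:

```python
import sys
from typing import Callable, Dict, Iterable, Optional


def resolve_labels(
    speaker_ids: Iterable[str],
    auto_matches: Dict[str, str],
    prompt: Callable[[str], str],
    interactive: Optional[bool] = None,
) -> Dict[str, str]:
    """Apply auto-matches; prompt for the rest only when a TTY is attached."""
    if interactive is None:
        interactive = sys.stdin.isatty()
    labels: Dict[str, str] = {}
    for speaker_id in speaker_ids:
        if speaker_id in auto_matches:
            labels[speaker_id] = auto_matches[speaker_id]
        elif interactive:
            labels[speaker_id] = prompt(speaker_id) or speaker_id
        else:
            labels[speaker_id] = speaker_id  # no TTY: keep the raw id
    return labels
```

Because the prompt is never reached without a TTY, an EOF can no longer abort the run and discard matches that were already found.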

Also adds a *.autoid.json sidecar (name + confidence per speaker, keyed by
final transcript id) so vezir's labeling screen can pre-fill recognized
names and show confidence.  Excluded from sync + transcript resolution.
3 new tests.

v0.12.0

v0.12.0: dual-diarize default, per-channel ASR + remote speaker diarization

New default mixdown for stereo: dual-diarize. Transcribes the mic and system
channels separately (the mic speaker, e.g. Kemal, is one continuous YOU track,
immune to overlap), then runs pyannote diarization on the system channel only
to split distinct remote speakers (Openoms/Jonas/Max/...). Overlapping
segments are preserved.

Eliminates the overlap-fragmentation bug where mono+diarization flickered
words between speakers during talk-over ('This year' -> Openoms, 'they' ->
Kemal, 'rented the' -> Openoms, 'whole island' -> Kemal — Kemal said the
entire sentence).

Also includes:
- Channel-energy correction (mono path, --channel-correct): per-segment/word
  RMS reassignment for turn-boundary leaks; on by default for --mixdown mono.
  --channel-correct-margin (default 0.30) for tuning.
- DNS-retry hardening for millet sync git operations (clone/pull/push):
  transient DNS failures auto-retry 5x with backoff.
- 11 new channel-correction tests; default-mixdown test updated.
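
The DNS-retry hardening can be sketched as a small wrapper with exponential backoff. The error-message hints, the reading of "5x" as five total attempts, the backoff schedule, and the use of `RuntimeError` to carry git's message are all assumptions for illustration.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")

# Hypothetical substrings that mark a git failure as a transient DNS problem.
DNS_ERROR_HINTS = ("could not resolve host", "temporary failure in name resolution")


def is_transient_dns_error(message: str) -> bool:
    lowered = message.lower()
    return any(hint in lowered for hint in DNS_ERROR_HINTS)


def retry_on_dns_failure(
    operation: Callable[[], T],
    attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run operation, retrying DNS-looking failures with exponential backoff."""
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except RuntimeError as exc:
            if attempt == attempts or not is_transient_dns_error(str(exc)):
                raise  # out of attempts, or not a DNS problem
            sleep(base_delay * 2 ** (attempt - 1))  # waits 1s, 2s, 4s, ...
    raise AssertionError("unreachable")
```

Only DNS-looking errors are retried; a rejected push or an authentication failure propagates immediately instead of being retried.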

Validated on DEVSTANDUP (5spk), LUKAS_2 (2spk), AB_BOARD (4spk .ogg):
overlap-fragmentation eliminated, all distinct remote speakers preserved.

v0.11.0

v0.11.0: opt-in Parakeet ASR backend (onnx-asr, English, CUDA)

Add a third ASR backend alongside whisperx and mlx: NVIDIA Parakeet TDT
via onnx-asr (ONNX Runtime, pure-Python — no extra torch/transformers).
Opt-in via --asr-backend parakeet; auto selection unchanged.

- millet/parakeet.py: backend + Silero VAD chunking for long audio
  (Parakeet's ~20-30s per-utterance limit), WhisperX-shaped output
  contract, cuDNN/cuBLAS ctypes preload so onnxruntime-gpu finds the
  torch-bundled CUDA libs, HF-cache completeness check.
- transcribe.py: parakeet backend validation, _transcribe_asr dispatch,
  config B (native timestamps, default) / C (--parakeet-keep-alignment)
  alignment toggle.
- cli: --asr-backend parakeet, --parakeet-model, --parakeet-keep-alignment;
  millet download parakeet (explicit, lazy model fetch).
- [parakeet] optional extra (onnx-asr[hub]); scripts/bench_asr.py harness
  + benchmark results doc.
- tests/test_parakeet.py: 12 tests (contract, B/C wiring, validation,
  dispatch, availability guard).
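
To illustrate why VAD chunking is needed for a model with a short per-utterance limit, here is a sketch that greedily packs VAD speech spans into chunks no longer than a cap. The function name, the 20 s cap, and the hard-split fallback for over-long spans are assumptions; the chunking in millet/parakeet.py may work differently.

```python
from typing import List, Sequence, Tuple

Span = Tuple[float, float]  # (start_seconds, end_seconds) from a VAD


def pack_speech_spans(spans: Sequence[Span], max_chunk_seconds: float = 20.0) -> List[Span]:
    """Merge adjacent speech spans into chunks of at most max_chunk_seconds."""
    chunks: List[Span] = []
    current: List[float] = []  # [start, end] of the chunk being built, or empty
    for start, end in sorted(spans):
        # A single span longer than the cap is split at fixed intervals.
        while end - start > max_chunk_seconds:
            if current:
                chunks.append((current[0], current[1]))
                current = []
            chunks.append((start, start + max_chunk_seconds))
            start += max_chunk_seconds
        if not current:
            current = [start, end]
        elif end - current[0] <= max_chunk_seconds:
            current[1] = end  # still fits: extend the current chunk
        else:
            chunks.append((current[0], current[1]))
            current = [start, end]
    if current:
        chunks.append((current[0], current[1]))
    return chunks
```

Cutting at VAD boundaries where possible keeps chunk edges in silence, so words are less likely to be split between two ASR calls.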

Benchmark note: on a 3090, whisperx is faster than Parakeet; Parakeet's
value is finer segmentation, not speed. Stays opt-in pending further
validation.