eellak/gsoc2026-opencouncil-stt

OpenCouncil Greek ASR Fine-Tuning Notes

Working vault for a GSoC 2026 project to fine-tune Whisper on Greek municipal council speech, with LLM post-correction. Holds dataset exploration notes, decisions, specs, and the local review-UI prototype.

No training yet — dataset exploration comes first.

Start here:

Vault Rules

  • CURRENT.md is the first file to read and should stay short.
  • docs/decisions/ — accepted decisions and open questions, split by theme. See decisions index.
  • docs/progress.md — where we are against the GSoC plan. The plan itself is in the proposal.
  • docs/roadmap.md — phases and current direction.
  • docs/meetings/ — normalized meeting notes.
  • docs/specs/ — product and implementation specs.
  • docs/logs/ — weekly digests only. See logs index for cadence rules.
  • docs/reference/ — stable technical references.
  • archive/ — superseded material, local only (gitignored).
  • Data outputs live under data/; scripts live under scripts/.
  • CLAUDE.md is the single source of truth for assistant instructions; AGENTS.md is a symlink to it for tools that read that filename.

Current Dataset Outputs

Regenerate the cleaned data:

rtk python3 scripts/preprocess_corrections.py

Local UI Prototype

The SvelteKit correction-review prototype lives in ui/. It ingests the May 12 CSV into a local SQLite database and supports review labels, timestamp adjustments, summary stats, and export of the rows marked as included. The next gap to close is matching rows against the meeting JSON (utterance IDs, speaker context, surrounding transcript).
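The review flow described above (CSV into SQLite, per-row labels, timestamp edits, stats, export of included rows) can be sketched as follows. This is a minimal illustration in Python using only the standard library `csv` and `sqlite3` modules, not the prototype's SvelteKit code. The column names (`start_s`, `end_s`, `asr_text`, `corrected_text`), the label values, and all function names are hypothetical assumptions; the real May 12 CSV schema is not documented here.

```python
import csv
import io
import sqlite3

# Hypothetical schema: the real CSV columns and label set may differ.
SCHEMA = """
CREATE TABLE IF NOT EXISTS corrections (
    id INTEGER PRIMARY KEY,
    start_s REAL NOT NULL,
    end_s REAL NOT NULL,
    asr_text TEXT NOT NULL,
    corrected_text TEXT NOT NULL,
    label TEXT NOT NULL DEFAULT 'unreviewed'
        CHECK (label IN ('unreviewed', 'include', 'exclude'))
)
"""


def ingest_csv(conn, csv_file):
    """Load every CSV row into the corrections table; return the row count."""
    conn.execute(SCHEMA)
    reader = csv.DictReader(csv_file)
    rows = [
        (float(r["start_s"]), float(r["end_s"]), r["asr_text"], r["corrected_text"])
        for r in reader
    ]
    conn.executemany(
        "INSERT INTO corrections (start_s, end_s, asr_text, corrected_text) "
        "VALUES (?, ?, ?, ?)",
        rows,
    )
    conn.commit()
    return len(rows)


def set_label(conn, row_id, label):
    """Record a review decision; the CHECK constraint rejects unknown labels."""
    conn.execute("UPDATE corrections SET label = ? WHERE id = ?", (label, row_id))
    conn.commit()


def adjust_timestamps(conn, row_id, start_s, end_s):
    """Overwrite a row's segment boundaries after validating them."""
    if not 0 <= start_s < end_s:
        raise ValueError("expected 0 <= start_s < end_s")
    conn.execute(
        "UPDATE corrections SET start_s = ?, end_s = ? WHERE id = ?",
        (start_s, end_s, row_id),
    )
    conn.commit()


def label_stats(conn):
    """Count rows per review label, e.g. {'include': 3, 'unreviewed': 10}."""
    return dict(conn.execute("SELECT label, COUNT(*) FROM corrections GROUP BY label"))


def export_included(conn, out_file):
    """Write only rows labelled 'include' as CSV; return how many were written."""
    writer = csv.writer(out_file)
    writer.writerow(["start_s", "end_s", "asr_text", "corrected_text"])
    cur = conn.execute(
        "SELECT start_s, end_s, asr_text, corrected_text FROM corrections "
        "WHERE label = 'include' ORDER BY id"
    )
    count = 0
    for row in cur:
        writer.writerow(row)
        count += 1
    return count


if __name__ == "__main__":
    # Tiny in-memory demo with made-up Greek rows.
    sample = io.StringIO(
        "start_s,end_s,asr_text,corrected_text\n"
        "0.0,2.5,καλησπερα σας,Καλησπέρα σας\n"
        "2.5,5.0,ξεκιναμε τη συνεδριαση,Ξεκινάμε τη συνεδρίαση\n"
    )
    conn = sqlite3.connect(":memory:")
    ingest_csv(conn, sample)
    set_label(conn, 1, "include")
    set_label(conn, 2, "exclude")
    adjust_timestamps(conn, 1, 0.2, 2.4)
    print(label_stats(conn))
    out = io.StringIO()
    export_included(conn, out)
    print(out.getvalue())
```

Keeping the label as a constrained column (rather than deleting rejected rows) means a review pass is reversible and the stats view is a single `GROUP BY`.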
