Pipeline for adding Backchannel= and Coconstruct= annotations to SST speaker-view CoNLL-U files.

- `src/sst/` - source SST CoNLL-U files (`train`, `dev`, `test`, `merged`)
- `lexicon/` - lexical resources (e.g., `lexicon/sl_backchannels.txt`)
- `scripts/` - numbered workflow scripts (`01` to `07`) - see `scripts/README.md` for script-level details
- `docs/` - process documentation
  - `docs/BACKCHANNELS_EXTRACTION.md`
  - `docs/COCONSTRUCTIONS_EXTRACTION.md`
- `output/sst/` - extracted candidates and generated annotated corpora
  - final package: `output/sst/final_bc_coco/`

- Merge corpus: `python3 scripts/01_merge_sst.py`
- Extract backchannel candidates: `python3 scripts/02_extract_backchannel_candidates.py`
- Apply backchannels (uses filtered rows from the candidate table): `python3 scripts/03_apply_backchannel_annotations.py`
- Extract coconstruction candidates: `python3 scripts/04_extract_coconstruction_candidates.py`
- Manual coconstruction annotation (outside script)
  - fill `is_coconstruction`, `coconstruct_deprel`, `governor_token_id` for YES cases
- Apply coconstructions: `python3 scripts/05_apply_coconstruction_annotations.py`
- Split final merged file back to train/dev/test: `python3 scripts/06_split_final_corpus.py`
- Run strict diff checks: `python3 scripts/07_diffcheck_final_vs_src.py`

- Backchannels workflow: `docs/BACKCHANNELS_EXTRACTION.md`
- Coconstructions workflow + manual annotation: `docs/COCONSTRUCTIONS_EXTRACTION.md`
- Docs index: `docs/README.md`
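
The scripted steps of the workflow could be chained in two phases around the manual coconstruction annotation, with a quick completeness check on the hand-filled table before step 05. The sketch below is only an illustration: the script paths and the three column names come from this README, while the function names, the split into two phases, and the assumption that the candidate table can be read as rows of column-name/value pairs are hypothetical. See `docs/COCONSTRUCTIONS_EXTRACTION.md` for the actual table format.

```python
import subprocess
import sys

# Script paths are taken from the workflow list in this README.
# The phase split and all function names are assumptions for illustration.
BEFORE_MANUAL = [
    "scripts/01_merge_sst.py",
    "scripts/02_extract_backchannel_candidates.py",
    "scripts/03_apply_backchannel_annotations.py",
    "scripts/04_extract_coconstruction_candidates.py",
]
AFTER_MANUAL = [
    "scripts/05_apply_coconstruction_annotations.py",
    "scripts/06_split_final_corpus.py",
    "scripts/07_diffcheck_final_vs_src.py",
]

# Columns that must be filled for rows marked YES.
REQUIRED_FOR_YES = ("coconstruct_deprel", "governor_token_id")


def incomplete_yes_rows(rows):
    """Return 1-based positions of YES rows with an empty required field."""
    bad = []
    for pos, row in enumerate(rows, start=1):
        # Assumed convention: the marker is the literal string YES (case-insensitive here).
        if (row.get("is_coconstruction") or "").strip().upper() != "YES":
            continue
        if any(not (row.get(col) or "").strip() for col in REQUIRED_FOR_YES):
            bad.append(pos)
    return bad


def run_phase(scripts, runner=subprocess.run):
    """Run each script with the current interpreter; stop at the first failure."""
    for script in scripts:
        runner([sys.executable, script], check=True)
```

A possible usage pattern would be to call `run_phase(BEFORE_MANUAL)`, annotate the candidate table by hand, pass its rows (for example from `csv.DictReader`, if the table turns out to be delimited text) to `incomplete_yes_rows`, and call `run_phase(AFTER_MANUAL)` only when that returns an empty list.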

- `output/sst/final_bc_coco/conllu/sl_sst-ud-merged.conllu`
- `output/sst/final_bc_coco/conllu/sl_sst-ud-train.conllu`
- `output/sst/final_bc_coco/conllu/sl_sst-ud-dev.conllu`
- `output/sst/final_bc_coco/conllu/sl_sst-ud-test.conllu`
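
To give an idea of what a strict diff check between a final file and its source might verify, here is a minimal sketch. It is not the logic of `scripts/07_diffcheck_final_vs_src.py`. It assumes that the `Backchannel=` and `Coconstruct=` annotations are added as attributes in the MISC column (the tenth CoNLL-U column) and that nothing else should change. The function names and the attribute values used in any test data are hypothetical.

```python
# Attribute names come from this README; placing them in MISC is an assumption.
ADDED_KEYS = ("Backchannel", "Coconstruct")


def strip_added(misc):
    """Drop the added attributes from a MISC value; return '_' if nothing is left."""
    if misc == "_":
        return misc
    kept = [kv for kv in misc.split("|") if kv.split("=", 1)[0] not in ADDED_KEYS]
    return "|".join(kept) if kept else "_"


def diff_lines(src_lines, final_lines):
    """Return 1-based line numbers where final differs from src beyond the added attributes."""
    src_lines, final_lines = list(src_lines), list(final_lines)
    if len(src_lines) != len(final_lines):
        raise ValueError("line counts differ")
    bad = []
    for n, (src, fin) in enumerate(zip(src_lines, final_lines), start=1):
        s_cols = src.rstrip("\n").split("\t")
        f_cols = fin.rstrip("\n").split("\t")
        # Token lines have ten tab-separated columns; comments and blank
        # lines do not and are therefore compared verbatim.
        if len(s_cols) == 10 and len(f_cols) == 10:
            f_cols[9] = strip_added(f_cols[9])
        if s_cols != f_cols:
            bad.append(n)
    return bad
```

Under these assumptions, an empty result from `diff_lines` for a source file and the corresponding final file would mean the pipeline touched nothing but the two added attributes.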