@theferrit32 theferrit32 commented Nov 12, 2025

Close #577
Close #587

@theferrit32 theferrit32 self-assigned this Nov 12, 2025
@jsstevenson jsstevenson requested review from jsstevenson and removed request for jsstevenson November 12, 2025 19:25
@jsstevenson jsstevenson self-requested a review December 4, 2025 02:53
@theferrit32

Only thing left is addressing this comment
#589 (comment)

@korikuzma korikuzma left a comment

Submitting review for VCF code now. Going to be looking at the normalize fix next.

@korikuzma korikuzma self-requested a review December 4, 2025 18:21

@korikuzma korikuzma left a comment

Tests LGTM

@theferrit32

VCF changes after fixing the max sequence limit logic and raising the limit to 50 bp, to match the rle_seq_limit default used elsewhere:

def _normalize_allele(input_allele, data_proxy, rle_seq_limit=50):
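
For illustration only, here is a rough sketch of the limit behavior described above: a state sequence is written to INFO when it is at or under the limit and left out when it is over. The helper name `state_for_info`, the inclusive comparison, and the empty-string placeholder are assumptions for this sketch, not the code in this PR.

```python
from typing import Optional

# Assumed default, chosen to match the rle_seq_limit default shown above.
RLE_SEQ_LIMIT = 50


def state_for_info(sequence: Optional[str], seq_limit: Optional[int] = RLE_SEQ_LIMIT) -> str:
    """Return the VRS_States value for one allele (hypothetical helper).

    A sequence longer than seq_limit is left out of INFO; here that is
    modeled as an empty string, which is an assumption of this sketch.
    A seq_limit of None is treated as "no limit".
    """
    if sequence is None:
        return ""
    if seq_limit is not None and len(sequence) > seq_limit:
        return ""
    return sequence


# The 16 bp state from the test VCF is under the limit, so it is kept.
print(state_for_info("CACGCCTGTAATCCCA"))
# A 51 bp sequence is over the limit, so nothing is written for it.
print(repr(state_for_info("A" * 51)))
```

With a limit of 50, the 16 bp state in the test data below is expected to appear in VRS_States, which is what the diff shows.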

git diff --word-diff-regex='[^;]+' -- **/*.vcf

tests/extras/data/test_vcf_expected_altsonly_output.vcf
chr19 289464 . T TCACGCCTGTAATCC 50 PASS platforms=4;platformnames=Illumina,PacBio,CG,10X;datasets=4;datasetnames=HiSeqPE300x,CCS15kb_20kb,CGnormal,10XChromiumLR;callsets=6;callsetnames=HiSeqPE300xGATK,CCS15kb_20kbGATK4,CGnormal,HiSeqPE300xfreebayes,CCS15kb_20kbDV,10XLRGATK;datasetsmissingcall=IonExome,SolidSE75bp;callable=CS_HiSeqPE300xGATK_callable,CS_10XLRGATK_callable,CS_CCS15kb_20kbGATK4_callable,CS_CGnormal_callable,CS_HiSeqPE300xfreebayes_callable;filt=CS_CCS15kb_20kbDV_filt,CS_CCS15kb_20kbGATK4_filt;VRS_Allele_IDs=ga4gh:VA.ySvDptXfHB_9WEfu78v32DzBXJfwGgO7;VRS_Starts=289464;VRS_Ends=289466;[-VRS_States-]{+VRS_States=CACGCCTGTAATCCCA+};VRS_Lengths=.;VRS_RepeatSubunitLengths=. GT:PS:DP:ADALL:AD:GQ 0/1:.:518:94,98:116,137:785

tests/extras/data/test_vcf_expected_output.vcf
chr19 289464 . T TCACGCCTGTAATCC 50 PASS platforms=4;platformnames=Illumina,PacBio,CG,10X;datasets=4;datasetnames=HiSeqPE300x,CCS15kb_20kb,CGnormal,10XChromiumLR;callsets=6;callsetnames=HiSeqPE300xGATK,CCS15kb_20kbGATK4,CGnormal,HiSeqPE300xfreebayes,CCS15kb_20kbDV,10XLRGATK;datasetsmissingcall=IonExome,SolidSE75bp;callable=CS_HiSeqPE300xGATK_callable,CS_10XLRGATK_callable,CS_CCS15kb_20kbGATK4_callable,CS_CGnormal_callable,CS_HiSeqPE300xfreebayes_callable;filt=CS_CCS15kb_20kbDV_filt,CS_CCS15kb_20kbGATK4_filt;VRS_Allele_IDs=ga4gh:VA.nqqTUy-a2gssemOmJb4CJv-HNuFAmGrO,ga4gh:VA.ySvDptXfHB_9WEfu78v32DzBXJfwGgO7;VRS_Starts=289463,289464;VRS_Ends=289464,289466;[-VRS_States=T,-]{+VRS_States=T,CACGCCTGTAATCCCA+};VRS_Lengths=1,.;VRS_RepeatSubunitLengths=1,. GT:PS:DP:ADALL:AD:GQ 0/1:.:518:94,98:116,137:785
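
To check the per-allele values in output like the above, the VRS_* INFO entries can be split into one record per allele. This is a minimal sketch, assuming every VRS_* entry is a comma-separated list in the same order as VRS_Allele_IDs; `parse_vrs_info` is a hypothetical helper and not part of this PR. The INFO string is trimmed to the VRS_* entries from the record above.

```python
from typing import Dict, List


def parse_vrs_info(info: str) -> List[Dict[str, str]]:
    """Split the VRS_* entries of a VCF INFO string into one dict per allele.

    Hypothetical helper for inspecting annotated output; not code from this PR.
    """
    fields: Dict[str, List[str]] = {}
    for item in info.split(";"):
        # An entry without "=" yields an empty value.
        key, _, value = item.partition("=")
        if key.startswith("VRS_"):
            fields[key] = value.split(",")

    n_alleles = len(fields.get("VRS_Allele_IDs", []))
    records = []
    for i in range(n_alleles):
        # Pad with "" if an entry has fewer values than there are alleles.
        records.append(
            {key: (values[i] if i < len(values) else "") for key, values in fields.items()}
        )
    return records


info = (
    "VRS_Allele_IDs=ga4gh:VA.nqqTUy-a2gssemOmJb4CJv-HNuFAmGrO,"
    "ga4gh:VA.ySvDptXfHB_9WEfu78v32DzBXJfwGgO7;"
    "VRS_Starts=289463,289464;VRS_Ends=289464,289466;"
    "VRS_States=T,CACGCCTGTAATCCCA;VRS_Lengths=1,.;VRS_RepeatSubunitLengths=1,."
)

for record in parse_vrs_info(info):
    print(record["VRS_Allele_IDs"], record["VRS_States"], record["VRS_Lengths"])
```

Splitting on `;` and then `,` is enough here because none of the VRS_* values in this record contain those characters; a full VCF parser would be the safer choice for arbitrary files.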

theferrit32 and others added 5 commits December 12, 2025 16:38
…T specific to RLE that defaults to 50, and excludes sequence vals from INFO if over that
Co-authored-by: Kori Kuzma <korikuzma@gmail.com>
Co-authored-by: Kori Kuzma <korikuzma@gmail.com>

@korikuzma korikuzma left a comment

Fantastic work @theferrit32 🚀

@theferrit32 theferrit32 merged commit ded576f into main Dec 15, 2025
16 checks passed
@theferrit32 theferrit32 deleted the issue-577-VCF-RLE branch December 15, 2025 16:07

Development

Successfully merging this pull request may close these issues.

Same-as-reference Alleles should have a ReferenceLengthExpression state
Add RLE params to VRS-annotated VCFs

5 participants