Hi Ali,
Thanks again for this great tool. I have a question about how the FASTA reference files are read, to help me make sense of the results I'm getting.
When a FASTA file used as a reference (in the index) is composed of many short reads, how does krepp build the index from it?
As a simple example, when I query a file against its own index, I get matches only for the first couple of reads:
#software: krepp
#version: v0.4.2
#invocation :krepp dist -i /scratch/abernardi/cfDNA/run/HD_krepp/PCOH_PN005/krepp/indexes/common -q /scratch/abernardi/cfDNA/Data/WGS/References/fasta/PCOH_PN005_PBMC.fasta --num-threads 1 -o /scratch/abernardi/cfDNA/run/HD_krepp/PCOH_PN005/krepp/distances/PCOH_PN005_PBMC.tsv --no-filter
SEQ_ID REFERENCE_NAME DIST
PCOH_PN005_PBMC:0/A/1 0 1.26254e-05
PCOH_PN005_PBMC:0/A/2 0 1.26254e-05
PCOH_PN005_PBMC:1/A/1 0 1.26254e-05
PCOH_PN005_PBMC:1/A/2 NaN NaN
PCOH_PN005_PBMC:2/B/2 0 1.26254e-05
PCOH_PN005_PBMC:2/B/1 0 1.26254e-05
PCOH_PN005_PBMC:3/B/2 NaN NaN
PCOH_PN005_PBMC:3/B/1 NaN NaN
PCOH_PN005_PBMC:4/B/2 NaN NaN
PCOH_PN005_PBMC:4/B/1 NaN NaN
PCOH_PN005_PBMC:5/B/2 NaN NaN
PCOH_PN005_PBMC:5/B/1 NaN NaN
PCOH_PN005_PBMC:6/B/2 NaN NaN
PCOH_PN005_PBMC:6/B/1 NaN NaN
...only NaN after
The reads are 50-300 bp long.
Another example shows a similar pattern, with the frequency of matches dropping off quickly along the file. Is only a certain number of reads used in the index?
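To put numbers on that drop-off, here is a minimal sketch of how the output could be summarized. It assumes the TSV has the layout shown above (`#` metadata lines, a `SEQ_ID` header row, whitespace-separated columns with `DIST` last, and `NaN` for unmatched reads). The function name and `block_size` parameter are my own and not part of krepp.

```python
import math


def matched_fraction_by_block(lines, block_size=1000):
    """Fraction of rows with a non-NaN DIST in each consecutive block of rows.

    Assumes the layout of the krepp dist output pasted above: '#' metadata
    lines, a header row starting with SEQ_ID, and DIST as the last column.
    """
    fractions = []
    matched = 0
    total = 0
    for line in lines:
        line = line.strip()
        # Skip blank lines, '#' metadata lines, and the column header.
        if not line or line.startswith("#") or line.startswith("SEQ_ID"):
            continue
        dist = float(line.split()[-1])  # float("NaN") parses to nan
        if not math.isnan(dist):
            matched += 1
        total += 1
        if total == block_size:
            fractions.append(matched / total)
            matched = 0
            total = 0
    if total:
        # Trailing partial block.
        fractions.append(matched / total)
    return fractions
```

Passing an open file handle for the `.tsv` written by `-o` gives one fraction per block of rows in file order, so a sharp fall toward 0 after the first block would confirm the pattern.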
On the other hand, the results look as expected when I use classic long-read T2T files split by chromosome.
Any help on that?
Much appreciated,
Armand