Hello,
I have a question about the difference between running RFMix on VCF files separated by chromosome versus using a single whole-genome VCF file.
The manual states:
"It is recommended that the entire genome (all chromosomes) be contained in one VCF/BCF file for the query, and the reference rather than separate by chromosome."
Could you please clarify:
- If I combine all per-chromosome VCF files into a single whole-genome VCF, do I still need to specify `--chromosome=` and run RFMix one chromosome at a time? Or does RFMix process all chromosomes at once (i.e., output an aggregated result rather than per-chromosome results)?
- If it's still necessary to run one chromosome at a time with the whole-genome VCF, would there be any difference between using a single genome-wide VCF file and per-chromosome files?
I’ve successfully run RFMix using per-chromosome VCFs on selected chromosomes, but would like to better understand the benefits or implications of combining them before moving on to the next step.
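For concreteness, this is roughly how I picture the two setups. It's only a sketch: the file names, `sample_map.tsv`, `genetic_map.tsv`, and the chromosome labels 20 to 22 are placeholders (the contigs in a real VCF may be named `chr20` rather than `20`). The second loop is my assumption of how the merged file would be used, which is exactly what I'm asking about. By default the script only prints the commands; setting `RUN=""` would execute them.

```shell
#!/usr/bin/env bash
# Sketch comparing per-chromosome vs. merged whole-genome input for RFMix.
# All file names and chromosome labels below are placeholders.
set -euo pipefail

# Default is a dry run: each command is printed via echo.
# Use RUN="" to actually execute (note: ${RUN-echo}, not ${RUN:-echo},
# so an explicitly empty RUN is respected).
RUN="${RUN-echo}"

CHROMS=(20 21 22)

# Setup A: one query/reference VCF per chromosome, one RFMix run each.
query_files=()
ref_files=()
for chr in "${CHROMS[@]}"; do
  query_files+=("query.chr${chr}.vcf.gz")
  ref_files+=("reference.chr${chr}.vcf.gz")
  $RUN rfmix -f "query.chr${chr}.vcf.gz" -r "reference.chr${chr}.vcf.gz" \
    -m sample_map.tsv -g genetic_map.tsv \
    -o "perchr.chr${chr}" --chromosome="${chr}"
done

# Setup B: concatenate the per-chromosome files into whole-genome VCFs.
# bcftools concat expects the same samples, in the same order, in every input.
$RUN bcftools concat -Oz -o query.all.vcf.gz "${query_files[@]}"
$RUN bcftools concat -Oz -o reference.all.vcf.gz "${ref_files[@]}"
$RUN bcftools index query.all.vcf.gz
$RUN bcftools index reference.all.vcf.gz

# Assumed usage of the merged files: still one run per chromosome.
for chr in "${CHROMS[@]}"; do
  $RUN rfmix -f query.all.vcf.gz -r reference.all.vcf.gz \
    -m sample_map.tsv -g genetic_map.tsv \
    -o "merged.chr${chr}" --chromosome="${chr}"
done
```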
Thank you!