This repository contains the analysis scripts accompanying the manuscript "The history of chromosomal instability in genome doubled tumors". The repository for GRITIC, the tool developed in this work, can be found here.
These scripts are used to run GRITIC on the PCAWG and Hartwig cohorts. They are also used to build the simulated samples used to evaluate GRITIC, with PCAWG and Hartwig samples as the initial framework. Access to the original PCAWG and Hartwig data is required to run these scripts.

- `HartwigDataLoader.py`: A data loader for the Hartwig cohort.
- `hartwig_handler.py`: Runs GRITIC on the Hartwig cohort.
- `PCAWGDataLoader.py`: A data loader for the PCAWG cohort.
- `pcawg_handler.py`: Runs GRITIC on the PCAWG cohort.
- `SampleSimulator.py`: Produces simulated tumor samples from a template.
- `state_validation_run.py`: Simulates samples and runs them through GRITIC.
- `state_validation_run_titrate.py`: Simulates samples and runs them through GRITIC.

These scripts produce the figures for the manuscript.

- `arm_pre_post.py`: A precursor script that generates a table for figures 4A,B and supplementary figures 37-41. Requires `pre_post_non_gain.py` and `pre_post_non_loss.py` to be run first.
- `cancer_type_parsimony.py`: Produces figure 2D and supplementary figures 15D,E.
- `cn_wgd_figures.py`: Produces figures 1A-C and supplementary figure 1. Requires access to PCAWG and Hartwig copy number and requires `hartwig_pcawg_copy_number_export.R` to be run first.
- `combinesamplegainplots.py`: Produces supplementary figures 19 and 26.
- `DataTools.py`: Helper functions for data loading.
- `evaluate_parsimony_penalty.py`: Evaluates the effect of the non-parsimony penalty on parsimony inference, a precursor to supplementary figures 17 and 18.
- `event_proportions_relative_to_wgd_probabilistic.py`: Produces figures 4C,D and supplementary figures 42-49.
- `fraction_of_gains_job.py`: Produces figure 3D and supplementary figures 28A-B. Requires `pre_post_permute_job.py` to be run first.
- `gain_distributions_across_samples.py`: Produces figure 3C.
- `GainClassificationAnalysis.R`: Produces supplementary figure 36. Requires access to PCAWG structural variant data.
- `GainClassifier.R`: Helper script for `GainClassificationAnalysis.R`.
- `generic_gain_histograms.py`: Produces supplementary figure 27.
- `hartwig_multiregion.py`: Produces supplementary figure 12. Requires raw data to run.
- `hartwig_pcawg_copy_number_export.R`: A precursor script for `cn_wgd_figures.py`. Requires access to PCAWG and Hartwig copy number.
- `major_3_4_analysis.py`: Produces supplementary figure 20.
- `major_cn_timing_job.py`: Produces figures 2E,F and supplementary figure 22.
- `make_sample_timing_plots.py`: Plots sample gain timing posteriors, a precursor to supplementary figures 19 and 26.
- `min_dist_nearest_timing.R`: Helper function for `permutation_final_prior_true.R` and `Run_WDnearest_prior_true.R`.
- `pan_gain_probabilistic.py`: Produces figure 3B and supplementary figures 24 and 25.
- `parsimony_plots_with_bootstrapping.py`: Produces figure 3D and supplementary figures 15A,B and 16.
- `permutation_final_prior_true.R`: Precursor script for `Sync_Gain_04092024.R`.
- `plot_parsimony_prior_evaluations.py`: Produces supplementary figures 17 and 18.
- `plot_wgd_sampling.py`: A helper script to produce figure 3B and supplementary figures 24 and 25.
- `plot_wgd_timing_event_proportions.py`: Produces figures 3F,G and supplementary figures 30-33.
- `plot_passage_cn_changes.R`: Produces figure 2B. Requires `process_cn_passage_data.py` to be run first.
- `PlotTiming_posteriors.R`: Helper script for `Sync_Gain_04092024.R`.
- `PlotTiming.R`: Helper script for `Sync_Gain_04092024.R`.
- `pre_post_correlations_corrected.R`: Produces figures 4A,B and supplementary figures 37-41. Requires `arm_pre_post.py` to be run first.
- `pre_post_major_cn.py`: Produces supplementary figure 21.
- `pre_post_non_gain.py`: Precursor script to produce aggregated gain timing files. Required for `arm_pre_post.py` and `event_proportions_relative_to_wgd_probabilistic.py`.
- `pre_post_non_loss.py`: Precursor script to produce aggregated loss timing files. Required for `arm_pre_post.py` and `event_proportions_relative_to_wgd_probabilistic.py`.
- `pre_post_permute_job.py`: Precursor script to produce aggregated loss timing files.
- `process_cn_passage_data.py`: Precursor script to produce figure 2B. Requires access to tetraploid passage CN data.
- `route_calibration_plot.py`: Produces supplementary figures 8-11.
- `route_difference_analysis.py`: Produces supplementary figures 13-14.
- `Run_WDnearest_prior_true.R`: Precursor script for `Sync_Gain_04092024.R`.
- `Sync_Gain_04092024.R`: Produces figures 3H,I and supplementary figures 34D-F.
- `timing_matchup_sim.py`: Produces figure 1H and supplementary figures 3-7.
- `usarc_analysis_medicc.ipynb`: Produces figure 3A.
- `wgd_calling_analysis_script.py`: Produces supplementary figure 51.
- `wgd_constraint_analysis.py`: Produces supplementary figure 2.
- `wgd_dist_permutation_job.py`: Produces figure 3E and supplementary figures 28C,D and 29.
- `WGD_Chromothripsis_Fisher_Exact.R`: Produces supplementary figure 35. Requires access to PCAWG chromothripsis data.
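
Several of the figure scripts above state that other scripts must be run first. As a rough aid, the sketch below computes a valid run order for a chosen script from those stated prerequisites. It is not part of the repository: the `REQUIRES` table and the `run_order` and `prerequisites` helpers are hypothetical names, the table is transcribed by hand from the descriptions above (treating "precursor script for X" as "run before X"), and the sketch only prints an order. It does not invoke any script, since command-line arguments and input paths are not documented here.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical table: script -> scripts that must be run before it.
# Transcribed from the descriptions in this README; extend as needed.
REQUIRES = {
    "arm_pre_post.py": {"pre_post_non_gain.py", "pre_post_non_loss.py"},
    "event_proportions_relative_to_wgd_probabilistic.py": {
        "pre_post_non_gain.py",
        "pre_post_non_loss.py",
    },
    "pre_post_correlations_corrected.R": {"arm_pre_post.py"},
    "cn_wgd_figures.py": {"hartwig_pcawg_copy_number_export.R"},
    "fraction_of_gains_job.py": {"pre_post_permute_job.py"},
    "plot_passage_cn_changes.R": {"process_cn_passage_data.py"},
    "Sync_Gain_04092024.R": {
        "permutation_final_prior_true.R",
        "Run_WDnearest_prior_true.R",
    },
}


def run_order(requires):
    """Return every script in the table, prerequisites before dependents."""
    return list(TopologicalSorter(requires).static_order())


def prerequisites(target, requires=REQUIRES):
    """Return the target's transitive prerequisites, then the target itself."""
    needed = set()
    stack = [target]
    while stack:
        script = stack.pop()
        if script in needed:
            continue
        needed.add(script)
        stack.extend(requires.get(script, ()))
    # Restrict the graph to the scripts that the target actually needs.
    subgraph = {script: requires.get(script, set()) for script in needed}
    return list(TopologicalSorter(subgraph).static_order())


if __name__ == "__main__":
    for script in prerequisites("pre_post_correlations_corrected.R"):
        print(script)
```

Scripts that only need external data (for example PCAWG or Hartwig copy number) have no entry in the table; data access still has to be arranged separately before running them.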