
Reads Filter-1

The document provides a detailed overview of the fastp tool, including its version and various command-line options for processing FASTQ files. It includes information on input/output file specifications, quality filtering, trimming options, and statistics on read quality before and after filtering. The filtering results indicate the number of reads that passed and failed due to various quality metrics.

Uploaded by

Nicolas Guerra

reads_filter

May 24, 2025

fastp version: 0.24.1; fastqc version: 0.11.0; multiqc version: 1.29


[3]: fastp -h

option needs value: --html


usage: fastp [options] …
options:
-i, --in1 read1 input file name (string [=])
-o, --out1 read1 output file name (string [=])
-I, --in2 read2 input file name (string [=])
-O, --out2 read2 output file name (string [=])
--unpaired1 for PE input, if read1 passed QC but
read2 not, it will be written to unpaired1. Default is to discard it. (string
[=])
--unpaired2 for PE input, if read2 passed QC but
read1 not, it will be written to unpaired2. If --unpaired2 is same as
--unpaired1 (default mode), both unpaired reads will be written to this same
file. (string [=])
--overlapped_out for each read pair, output the overlapped
region if it has no any mismatched base. (string [=])
--failed_out specify the file to store reads that
cannot pass the filters. (string [=])
-m, --merge for paired-end input, merge each pair of
reads into a single read if they are overlapped. The merged reads will be
written to the file given by --merged_out, the unmerged reads will be written to
the files specified by --out1 and --out2. The merging mode is disabled by
default.
--merged_out in the merging mode, specify the file
name to store merged output, or specify --stdout to stream the merged output
(string [=])
--include_unmerged in the merging mode, write the unmerged
or unpaired reads to the file specified by --merge. Disabled by default.
-6, --phred64 indicate the input is using phred64
scoring (it'll be converted to phred33, so the output will still be phred33)
-z, --compression compression level for gzip output (1 ~
9). 1 is fastest, 9 is smallest, default is 4. (int [=4])
--stdin input from STDIN. If the STDIN is
interleaved paired-end FASTQ, please also add --interleaved_in.
--stdout stream passing-filters reads to STDOUT.
This option will result in interleaved FASTQ output for paired-end output.
Disabled by default.
--interleaved_in indicate that <in1> is an interleaved
FASTQ which contains both read1 and read2. Disabled by default.
--reads_to_process specify how many reads/pairs to be
processed. Default 0 means process all reads. (int [=0])
--dont_overwrite don't overwrite existing files.
Overwritting is allowed by default.
--fix_mgi_id the MGI FASTQ ID format is not compatible
with many BAM operation tools, enable this option to fix it.
-V, --verbose output verbose log information (i.e. when
every 1M reads are processed).
-A, --disable_adapter_trimming adapter trimming is enabled by default.
If this option is specified, adapter trimming is disabled
-a, --adapter_sequence the adapter for read1. For SE data, if
not specified, the adapter will be auto-detected. For PE data, this is used if
R1/R2 are found not overlapped. (string [=auto])
--adapter_sequence_r2 the adapter for read2 (PE data only).
This is used if R1/R2 are found not overlapped. If not specified, it will be the
same as <adapter_sequence> (string [=auto])
--adapter_fasta specify a FASTA file to trim both read1
and read2 (if PE) by all the sequences in this FASTA file (string [=])
--detect_adapter_for_pe by default, the auto-detection for
adapter is for SE data input only, turn on this option to enable it for PE data.
-f, --trim_front1 trimming how many bases in front for
read1, default is 0 (int [=0])
-t, --trim_tail1 trimming how many bases in tail for
read1, default is 0 (int [=0])
-b, --max_len1 if read1 is longer than max_len1, then
trim read1 at its tail to make it as long as max_len1. Default 0 means no
limitation (int [=0])
-F, --trim_front2 trimming how many bases in front for
read2. If it's not specified, it will follow read1's settings (int [=0])
-T, --trim_tail2 trimming how many bases in tail for
read2. If it's not specified, it will follow read1's settings (int [=0])
-B, --max_len2 if read2 is longer than max_len2, then
trim read2 at its tail to make it as long as max_len2. Default 0 means no
limitation. If it's not specified, it will follow read1's settings (int [=0])
-D, --dedup enable deduplication to drop the
duplicated reads/pairs
--dup_calc_accuracy accuracy level to calculate duplication
(1~6), higher level uses more memory (1G, 2G, 4G, 8G, 16G, 24G). Default 1 for
no-dedup mode, and 3 for dedup mode. (int [=0])
--dont_eval_duplication don't evaluate duplication rate to save
time and use less memory.
-g, --trim_poly_g force polyG tail trimming, by default
trimming is automatically enabled for Illumina NextSeq/NovaSeq data
--poly_g_min_len the minimum length to detect polyG in the
read tail. 10 by default. (int [=10])
-G, --disable_trim_poly_g disable polyG tail trimming, by default
trimming is automatically enabled for Illumina NextSeq/NovaSeq data
-x, --trim_poly_x enable polyX trimming in 3' ends.
--poly_x_min_len the minimum length to detect polyX in the
read tail. 10 by default. (int [=10])
-5, --cut_front move a sliding window from front (5') to
tail, drop the bases in the window if its mean quality < threshold, stop
otherwise.
-3, --cut_tail move a sliding window from tail (3') to
front, drop the bases in the window if its mean quality < threshold, stop
otherwise.
-r, --cut_right move a sliding window from front to tail,
if meet one window with mean quality < threshold, drop the bases in the window
and the right part, and then stop.
-W, --cut_window_size the window size option shared by
cut_front, cut_tail or cut_sliding. Range: 1~1000, default: 4 (int [=4])
-M, --cut_mean_quality the mean quality requirement option
shared by cut_front, cut_tail or cut_sliding. Range: 1~36 default: 20 (Q20) (int
[=20])
--cut_front_window_size the window size option of cut_front,
default to cut_window_size if not specified (int [=4])
--cut_front_mean_quality the mean quality requirement option for
cut_front, default to cut_mean_quality if not specified (int [=20])
--cut_tail_window_size the window size option of cut_tail,
default to cut_window_size if not specified (int [=4])
--cut_tail_mean_quality the mean quality requirement option for
cut_tail, default to cut_mean_quality if not specified (int [=20])
--cut_right_window_size the window size option of cut_right,
default to cut_window_size if not specified (int [=4])
--cut_right_mean_quality the mean quality requirement option for
cut_right, default to cut_mean_quality if not specified (int [=20])
-Q, --disable_quality_filtering quality filtering is enabled by default.
If this option is specified, quality filtering is disabled
-q, --qualified_quality_phred the quality value that a base is
qualified. Default 15 means phred quality >=Q15 is qualified. (int [=15])
-u, --unqualified_percent_limit how many percents of bases are allowed to
be unqualified (0~100). Default 40 means 40% (int [=40])
-n, --n_base_limit if one read's number of N base is
>n_base_limit, then this read/pair is discarded. Default is 5 (int [=5])
-e, --average_qual if one read's average quality score
<avg_qual, then this read/pair is discarded. Default 0 means no requirement (int
[=0])
-L, --disable_length_filtering length filtering is enabled by default.
If this option is specified, length filtering is disabled
-l, --length_required reads shorter than length_required will
be discarded, default is 15. (int [=15])
--length_limit reads longer than length_limit will be
discarded, default 0 means no limitation. (int [=0])
-y, --low_complexity_filter enable low complexity filter. The
complexity is defined as the percentage of base that is different from its next
base (base[i] != base[i+1]).
-Y, --complexity_threshold the threshold for low complexity filter
(0~100). Default is 30, which means 30% complexity is required. (int [=30])
--filter_by_index1 specify a file contains a list of
barcodes of index1 to be filtered out, one barcode per line (string [=])
--filter_by_index2 specify a file contains a list of
barcodes of index2 to be filtered out, one barcode per line (string [=])
--filter_by_index_threshold the allowed difference of index barcode
for index filtering, default 0 means completely identical. (int [=0])
-c, --correction enable base correction in overlapped
regions (only for PE data), default is disabled
--overlap_len_require the minimum length to detect overlapped
region of PE reads. This will affect overlap analysis based PE merge, adapter
trimming and correction. 30 by default. (int [=30])
--overlap_diff_limit the maximum number of mismatched bases to
detect overlapped region of PE reads. This will affect overlap analysis based PE
merge, adapter trimming and correction. 5 by default. (int [=5])
--overlap_diff_percent_limit the maximum percentage of mismatched
bases to detect overlapped region of PE reads. This will affect overlap analysis
based PE merge, adapter trimming and correction. Default 20 means 20%. (int
[=20])
-U, --umi enable unique molecular identifier (UMI)
preprocessing
--umi_loc specify the location of UMI, can be
(index1/index2/read1/read2/per_index/per_read, default is none (string [=])
--umi_len if the UMI is in read1/read2, its length
should be provided (int [=0])
--umi_prefix if specified, an underline will be used
to connect prefix and UMI (i.e. prefix=UMI, UMI=AATTCG, final=UMI_AATTCG). No
prefix by default (string [=])
--umi_skip if the UMI is in read1/read2, fastp can
skip several bases following UMI, default is 0 (int [=0])
--umi_delim delimiter to use between the read name
and the UMI, default is : (string [=:])
-p, --overrepresentation_analysis enable overrepresented sequence analysis.
-P, --overrepresentation_sampling one in (--overrepresentation_sampling)
reads will be computed for overrepresentation analysis (1~10000), smaller is
slower, default is 20. (int [=20])
-j, --json the json format report file name (string
[=fastp.json])
-h, --html the html format report file name (string
[=fastp.html])
-R, --report_title should be quoted with ' or ", default is
"fastp report" (string [=fastp report])
-w, --thread worker thread number, default is 3 (int
[=3])
-s, --split split output by limiting total split file
number with this option (2~999), a sequential number prefix will be added to
output name ( 0001.out.fq, 0002.out.fq…), disabled by default (int [=0])
-S, --split_by_lines split output by limiting lines of each
file with this option(>=1000), a sequential number prefix will be added to
output name ( 0001.out.fq, 0002.out.fq…), disabled by default (long [=0])
-d, --split_prefix_digits the digits for the sequential number
padding (1~10), default is 4, so the filename will be padded as 0001.xxx, 0 to
disable padding (int [=4])
--cut_by_quality5 DEPRECATED, use --cut_front instead.
--cut_by_quality3 DEPRECATED, use --cut_tail instead.
--cut_by_quality_aggressive DEPRECATED, use --cut_right instead.
--discard_unmerged DEPRECATED, no effect now, see the
introduction for merging.
-?, --help print this message
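
The fastp runs later in this notebook pass no filtering flags, so the defaults listed in the help text apply: -q 15, -u 40, -n 5 and -l 15. The sketch below is a minimal illustration of how those four thresholds could classify a single read. It is not fastp's code: the function name `classify_read`, the return labels, and the order in which the checks run are assumptions made for the example.

```python
def classify_read(seq, qual, qualified_quality_phred=15,
                  unqualified_percent_limit=40, n_base_limit=5,
                  length_required=15):
    """Classify one read with the default thresholds from the help text.

    Illustrative only: the order of the checks is an assumption.
    """
    # Phred+33 encoding: quality score = ASCII code - 33
    scores = [ord(c) - 33 for c in qual]
    unqualified = sum(1 for s in scores if s < qualified_quality_phred)
    # -u: more than this percentage of unqualified bases fails the read
    if scores and 100.0 * unqualified / len(scores) > unqualified_percent_limit:
        return "low_quality"
    # -n: more than this many N bases fails the read
    if seq.count("N") > n_base_limit:
        return "too_many_N"
    # -l: reads shorter than this are discarded
    if len(seq) < length_required:
        return "too_short"
    return "pass"


# 20 bases, half of them at Q2 ('#'): 50% unqualified exceeds the 40% limit
print(classify_read("ACGT" * 5, "#" * 10 + "I" * 10))
```

The three failure labels correspond to the "low quality", "too many N" and "too short" lines in the "Filtering result" blocks printed by each run.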

[1]: ls

fastqc_results VA1832-2021
VA2041-2021 VA2095-2021 VA542-2022
fastqc_trimming VA1833-2021
VA2042-2021 VA3232-2021 VA543-2022
skesa_assemblies.ipynb VA1842-2021 VA2043-2021
VA4087-2021
VA1638-2021 VA1887-2021
VA2092-2021 VA4090-2021
VA1701-2021 VA2040-2021
VA2093-2021 VA541-2022

[2]: cd VA1638-2021

[3]: ls

VA1638-2021_S3_L001_R1_001.fastq.gz
VA1638-2021_S3_L001_R2_001.fastq.gz

[4]: fastp -i VA1638-2021_S3_L001_R1_001.fastq.gz -o VA1638-2021_trimmed_R1_001.fastq.gz \
     -I VA1638-2021_S3_L001_R2_001.fastq.gz -O VA1638-2021_trimmed_R2_001.fastq.gz \
     -t 3 -T 3 -f 21 -F 21 -w 16

Read1 before filtering:


total reads: 1612419
total bases: 385216491
Q20 bases: 355237753(92.2177%)
Q30 bases: 338487695(87.8695%)

Read2 before filtering:


total reads: 1612419
total bases: 388315698
Q20 bases: 265429810(68.3541%)
Q30 bases: 217240910(55.9444%)

Read1 after filtering:


total reads: 1301600
total bases: 277000395
Q20 bases: 261080908(94.2529%)
Q30 bases: 251441811(90.7731%)

Read2 after filtering:


total reads: 1301600
total bases: 277376191
Q20 bases: 207727855(74.8903%)
Q30 bases: 175905196(63.4176%)

Filtering result:
reads passed filter: 2603200
reads failed due to low quality: 620634
reads failed due to too many N: 0
reads failed due to too short: 1004
reads with adapter trimmed: 542752
bases trimmed due to adapters: 2754504

Duplication rate: 0.123355%

Insert size peak (evaluated by paired-end reads): 353

JSON report: fastp.json


HTML report: fastp.html

fastp -i VA1638-2021_S3_L001_R1_001.fastq.gz -o
VA1638-2021_trimmed_R1_001.fastq.gz -I VA1638-2021_S3_L001_R2_001.fastq.gz -O
VA1638-2021_trimmed_R2_001.fastq.gz -t 3 -T 3 -f 21 -F 21 -w 16
fastp v0.24.1, time used: 15 seconds
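
The "Filtering result" counts above can be cross-checked against the input: the four categories should add up to the reads in both input files (2 × 1612419). The following sketch parses the lines copied from the log and computes the fraction of reads that passed. The helper name `parse_counts` is hypothetical, and the parsing assumes the "label: integer" layout shown in this log.

```python
import re

# Lines copied from the VA1638-2021 log above
FASTP_SUMMARY = """\
Filtering result:
reads passed filter: 2603200
reads failed due to low quality: 620634
reads failed due to too many N: 0
reads failed due to too short: 1004
"""


def parse_counts(text):
    """Return {label: int} for every 'label: integer' line in the text."""
    counts = {}
    for line in text.splitlines():
        match = re.fullmatch(r"\s*(.+?):\s*(\d+)\s*", line)
        if match:
            counts[match.group(1)] = int(match.group(2))
    return counts


counts = parse_counts(FASTP_SUMMARY)
total = sum(counts.values())
passed_fraction = counts["reads passed filter"] / total

print(total == 2 * 1612419)
print(f"{100 * passed_fraction:.2f}% of reads passed")
```

For this sample roughly four out of five reads pass, and nearly all of the rejected reads fall in the low-quality category.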

[5]: ls

fastp.html
VA1638-2021_S3_L001_R2_001.fastq.gz
fastp.json
VA1638-2021_trimmed_R1_001.fastq.gz
VA1638-2021_S3_L001_R1_001.fastq.gz
VA1638-2021_trimmed_R2_001.fastq.gz
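
fastp reported the same number of reads for Read1 and Read2 after filtering (1301600 each). A quick independent check on the two trimmed files listed above is to count their records directly. This is a minimal sketch that assumes the usual four-line FASTQ record layout; the function names are hypothetical.

```python
import gzip


def count_fastq_reads(path):
    """Count records in a gzip-compressed FASTQ file, assuming four lines
    per record (header, sequence, '+', qualities)."""
    with gzip.open(path, "rt") as handle:
        n_lines = sum(1 for _ in handle)
    if n_lines % 4 != 0:
        raise ValueError(f"{path}: {n_lines} lines is not a multiple of 4")
    return n_lines // 4


def pairs_in_sync(r1_path, r2_path):
    """True when both files hold the same number of records."""
    return count_fastq_reads(r1_path) == count_fastq_reads(r2_path)
```

In the sample directory this could be called as `pairs_in_sync("VA1638-2021_trimmed_R1_001.fastq.gz", "VA1638-2021_trimmed_R2_001.fastq.gz")`.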

[6]: cd ../VA1701-2021

[10]: ls

VA1701-2021_S4_L001_R1_001.fastq.gz
VA1701-2021_S4_L001_R2_001.fastq.gz

[11]: fastp -i VA1701-2021_S4_L001_R1_001.fastq.gz -o VA1701-2021_trimmed_R1_001.fastq.gz \
      -I VA1701-2021_S4_L001_R2_001.fastq.gz -O VA1701-2021_trimmed_R2_001.fastq.gz \
      -t 4 -T 4 -f 21 -F 21 -w 16

Read1 before filtering:


total reads: 1649770
total bases: 389185731
Q20 bases: 359553679(92.3861%)
Q30 bases: 342907893(88.1091%)

Read2 before filtering:


total reads: 1649770
total bases: 391679759
Q20 bases: 291890251(74.5227%)
Q30 bases: 248156493(63.357%)

Read1 after filtering:


total reads: 1429765
total bases: 299156897
Q20 bases: 281179984(93.9908%)
Q30 bases: 270299630(90.3538%)

Read2 after filtering:


total reads: 1429765
total bases: 299498792
Q20 bases: 238364662(79.5879%)
Q30 bases: 207475253(69.2742%)

Filtering result:
reads passed filter: 2859530
reads failed due to low quality: 438240
reads failed due to too many N: 0
reads failed due to too short: 1770
reads with adapter trimmed: 617114
bases trimmed due to adapters: 2440871

Duplication rate: 0.158628%

Insert size peak (evaluated by paired-end reads): 390

JSON report: fastp.json

HTML report: fastp.html

fastp -i VA1701-2021_S4_L001_R1_001.fastq.gz -o
VA1701-2021_trimmed_R1_001.fastq.gz -I VA1701-2021_S4_L001_R2_001.fastq.gz -O
VA1701-2021_trimmed_R2_001.fastq.gz -t 4 -T 4 -f 21 -F 21 -w 16
fastp v0.24.1, time used: 16 seconds
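
The -f/-F and -t/-T options in the command above remove a fixed number of bases from the front and the tail of every read, regardless of base quality. With -f 21 and -t 4, each read 1 loses 25 bases to this step alone. A minimal sketch of the effect on one read follows; the function name `global_trim` and the toy read are made up for illustration.

```python
def global_trim(seq, qual, trim_front=0, trim_tail=0):
    """Drop trim_front bases from the 5' end and trim_tail bases from the
    3' end of a read, keeping sequence and qualities aligned."""
    if len(seq) != len(qual):
        raise ValueError("sequence and quality strings differ in length")
    end = len(seq) - trim_tail
    if end <= trim_front:
        return "", ""  # nothing left after trimming
    return seq[trim_front:end], qual[trim_front:end]


# Settings used for VA1701-2021 read 1: -f 21 -t 4
seq = "A" * 21 + "CGTACGTA" + "T" * 4
qual = "I" * len(seq)
trimmed_seq, trimmed_qual = global_trim(seq, qual, trim_front=21, trim_tail=4)
print(trimmed_seq, len(seq) - len(trimmed_seq))
```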

[12]: ls

fastp.html
VA1701-2021_S4_L001_R2_001.fastq.gz
fastp.json
VA1701-2021_trimmed_R1_001.fastq.gz
VA1701-2021_S4_L001_R1_001.fastq.gz
VA1701-2021_trimmed_R2_001.fastq.gz

[13]: cd ../VA1832-2021

[16]: ls

VA1832-2021_S5_L001_R1_001.fastq.gz
VA1832-2021_S5_L001_R2_001.fastq.gz

[17]: fastp -i VA1832-2021_S5_L001_R1_001.fastq.gz -o VA1832-2021_trimmed_R1_001.fastq.gz \
      -I VA1832-2021_S5_L001_R2_001.fastq.gz -O VA1832-2021_trimmed_R2_001.fastq.gz \
      -t 4 -T 4 -f 21 -F 21 -w 16

Read1 before filtering:


total reads: 2700870
total bases: 635812669
Q20 bases: 589029622(92.642%)
Q30 bases: 562684825(88.4985%)

Read2 before filtering:


total reads: 2700870
total bases: 640160850
Q20 bases: 511233652(79.8602%)
Q30 bases: 448432799(70.05%)

Read1 after filtering:


total reads: 2454906
total bases: 513899596
Q20 bases: 482158793(93.8235%)
Q30 bases: 463083394(90.1116%)

Read2 after filtering:


total reads: 2454906
total bases: 514650854
Q20 bases: 429595266(83.4731%)
Q30 bases: 382325264(74.2883%)

Filtering result:
reads passed filter: 4909812
reads failed due to low quality: 489712
reads failed due to too many N: 0
reads failed due to too short: 2216
reads with adapter trimmed: 1332290
bases trimmed due to adapters: 4226519

Duplication rate: 0.351516%

Insert size peak (evaluated by paired-end reads): 330

JSON report: fastp.json


HTML report: fastp.html

fastp -i VA1832-2021_S5_L001_R1_001.fastq.gz -o
VA1832-2021_trimmed_R1_001.fastq.gz -I VA1832-2021_S5_L001_R2_001.fastq.gz -O
VA1832-2021_trimmed_R2_001.fastq.gz -t 4 -T 4 -f 21 -F 21 -w 16
fastp v0.24.1, time used: 27 seconds
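
The Q20 and Q30 lines in these logs count bases whose Phred quality score is at least 20 or 30. By the Phred definition, Q = -10 · log10(p), where p is the estimated probability that the base call is wrong. The sketch below converts the two thresholds to error probabilities and recomputes one of the percentages printed above from its raw counts; the function names are hypothetical.

```python
def phred_to_error_probability(q):
    """Phred definition: Q = -10 * log10(p), so p = 10 ** (-Q / 10)."""
    return 10 ** (-q / 10)


def percent(part, whole):
    """Share of `part` in `whole`, as a percentage."""
    return 100.0 * part / whole


print(phred_to_error_probability(20))  # error probability at Q20
print(phred_to_error_probability(30))  # error probability at Q30

# Read1 before filtering for VA1832-2021, counts copied from the log above
print(round(percent(589029622, 635812669), 4))
```

A Q20 base has an estimated 1 in 100 chance of being wrong and a Q30 base 1 in 1000, which is why the Q30 percentage is always the lower of the two.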

[4]: cd ../VA1833-2021

[5]: ls

VA1833-2021_S6_L001_R1_001.fastq.gz
VA1833-2021_S6_L001_R2_001.fastq.gz

[6]: fastp -i VA1833-2021_S6_L001_R1_001.fastq.gz -o VA1833-2021_trimmed_R1_001.fastq.gz \
     -I VA1833-2021_S6_L001_R2_001.fastq.gz -O VA1833-2021_trimmed_R2_001.fastq.gz \
     -t 2 -T 2 -f 20 -F 20 -w 16

Read1 before filtering:


total reads: 2393206
total bases: 553598254
Q20 bases: 512921245(92.6523%)
Q30 bases: 490158775(88.5405%)

Read2 before filtering:


total reads: 2393206
total bases: 561963857
Q20 bases: 368023573(65.4888%)
Q30 bases: 296716775(52.8%)

Read1 after filtering:


total reads: 1880503
total bases: 386657672
Q20 bases: 366630140(94.8203%)
Q30 bases: 354444014(91.6687%)

Read2 after filtering:


total reads: 1880503
total bases: 387569951
Q20 bases: 282351663(72.8518%)
Q30 bases: 237290815(61.2253%)

Filtering result:
reads passed filter: 3761006
reads failed due to low quality: 1023640
reads failed due to too many N: 0
reads failed due to too short: 1766
reads with adapter trimmed: 1187252
bases trimmed due to adapters: 6733581

Duplication rate: 0.155189%

Insert size peak (evaluated by paired-end reads): 242

JSON report: fastp.json


HTML report: fastp.html

fastp -i VA1833-2021_S6_L001_R1_001.fastq.gz -o
VA1833-2021_trimmed_R1_001.fastq.gz -I VA1833-2021_S6_L001_R2_001.fastq.gz -O
VA1833-2021_trimmed_R2_001.fastq.gz -t 2 -T 2 -f 20 -F 20 -w 16
fastp v0.24.1, time used: 21 seconds

[7]: cd ../VA1842-2021

[8]: ls

VA1842-2021_S7_L001_R1_001.fastq.gz
VA1842-2021_S7_L001_R2_001.fastq.gz

[12]: fastp -i VA1842-2021_S7_L001_R1_001.fastq.gz -o VA1842-2021_trimmed_R1_001.fastq.gz \
      -I VA1842-2021_S7_L001_R2_001.fastq.gz -O VA1842-2021_trimmed_R2_001.fastq.gz \
      -t 2 -T 2 -f 20 -F 20 -w 16

Read1 before filtering:


total reads: 1180436
total bases: 271645038
Q20 bases: 250923780(92.3719%)
Q30 bases: 239435537(88.1428%)

Read2 before filtering:

total reads: 1180436
total bases: 273271603
Q20 bases: 220349001(80.6337%)
Q30 bases: 194221055(71.0725%)

Read1 after filtering:


total reads: 1073571
total bases: 221933964
Q20 bases: 207582421(93.5334%)
Q30 bases: 199046885(89.6874%)

Read2 after filtering:


total reads: 1073571
total bases: 222118352
Q20 bases: 186786705(84.0933%)
Q30 bases: 166769866(75.0815%)

Filtering result:
reads passed filter: 2147142
reads failed due to low quality: 212462
reads failed due to too many N: 0
reads failed due to too short: 1268
reads with adapter trimmed: 541744
bases trimmed due to adapters: 1799333

Duplication rate: 0.247875%

Insert size peak (evaluated by paired-end reads): 198

JSON report: fastp.json


HTML report: fastp.html

fastp -i VA1842-2021_S7_L001_R1_001.fastq.gz -o
VA1842-2021_trimmed_R1_001.fastq.gz -I VA1842-2021_S7_L001_R2_001.fastq.gz -O
VA1842-2021_trimmed_R2_001.fastq.gz -t 2 -T 2 -f 20 -F 20 -w 16
fastp v0.24.1, time used: 12 seconds

[13]: cd ../VA1887-2021

[14]: ls

VA1887-2021_S8_L001_R1_001.fastq.gz
VA1887-2021_S8_L001_R2_001.fastq.gz

[15]: fastp -i VA1887-2021_S8_L001_R1_001.fastq.gz -o VA1887-2021_trimmed_R1_001.fastq.gz \
      -I VA1887-2021_S8_L001_R2_001.fastq.gz -O VA1887-2021_trimmed_R2_001.fastq.gz \
      -t 2 -T 2 -f 20 -F 20 -w 16

Read1 before filtering:
total reads: 2624018
total bases: 613320531
Q20 bases: 566930291(92.4362%)
Q30 bases: 540969342(88.2034%)

Read2 before filtering:


total reads: 2624018
total bases: 616990567
Q20 bases: 509966753(82.6539%)
Q30 bases: 453926494(73.5711%)

Read1 after filtering:


total reads: 2421082
total bases: 510125170
Q20 bases: 476515554(93.4115%)
Q30 bases: 456444734(89.477%)

Read2 after filtering:


total reads: 2421082
total bases: 510614539
Q20 bases: 437175630(85.6175%)
Q30 bases: 393252543(77.0155%)

Filtering result:
reads passed filter: 4842164
reads failed due to low quality: 404348
reads failed due to too many N: 0
reads failed due to too short: 1524
reads with adapter trimmed: 1352518
bases trimmed due to adapters: 4118103

Duplication rate: 0.38826%

Insert size peak (evaluated by paired-end reads): 302

JSON report: fastp.json


HTML report: fastp.html

fastp -i VA1887-2021_S8_L001_R1_001.fastq.gz -o
VA1887-2021_trimmed_R1_001.fastq.gz -I VA1887-2021_S8_L001_R2_001.fastq.gz -O
VA1887-2021_trimmed_R2_001.fastq.gz -t 2 -T 2 -f 20 -F 20 -w 16
fastp v0.24.1, time used: 25 seconds

[16]: cd ../VA2040-2021

[17]: ls

VA2040-2021_S9_L001_R1_001.fastq.gz
VA2040-2021_S9_L001_R2_001.fastq.gz

[18]: fastp -i VA2040-2021_S9_L001_R1_001.fastq.gz -o VA2040-2021_S9_trimmed_001.fastq.gz \
      -I VA2040-2021_S9_L001_R2_001.fastq.gz -O VA2040-2021_trimmed_R2_001.fastq.gz \
      -f 20 -F 20 -t 2 -T 2 -w 16

Read1 before filtering:


total reads: 1563938
total bases: 365218631
Q20 bases: 337653666(92.4525%)
Q30 bases: 322345942(88.2611%)

Read2 before filtering:


total reads: 1563938
total bases: 367683729
Q20 bases: 288788727(78.5427%)
Q30 bases: 251463502(68.3913%)

Read1 after filtering:


total reads: 1405667
total bases: 295382844
Q20 bases: 276777783(93.7014%)
Q30 bases: 265690260(89.9478%)

Read2 after filtering:


total reads: 1405667
total bases: 295658774
Q20 bases: 243500821(82.3587%)
Q30 bases: 215293598(72.8183%)

Filtering result:
reads passed filter: 2811334
reads failed due to low quality: 315550
reads failed due to too many N: 0
reads failed due to too short: 992
reads with adapter trimmed: 708540
bases trimmed due to adapters: 2559243

Duplication rate: 0.18511%

Insert size peak (evaluated by paired-end reads): 338

JSON report: fastp.json


HTML report: fastp.html

fastp -i VA2040-2021_S9_L001_R1_001.fastq.gz -o
VA2040-2021_S9_trimmed_001.fastq.gz -I VA2040-2021_S9_L001_R2_001.fastq.gz -O
VA2040-2021_trimmed_R2_001.fastq.gz -f 20 -F 20 -t 2 -T 2 -w 16
fastp v0.24.1, time used: 16 seconds

[19]: cd ../VA2041-2021

[20]: ls

VA2041-2021_S10_L001_R1_001.fastq.gz
VA2041-2021_S10_L001_R2_001.fastq.gz

[21]: fastp -i VA2041-2021_S10_L001_R1_001.fastq.gz -o VA2041-2021_trimmed_R1_001.fastq.gz \
      -I VA2041-2021_S10_L001_R2_001.fastq.gz -O VA2041-2021_trimmed_R2_001.fastq.gz \
      -f 20 -F 20 -t 2 -T 2 -w 16

Read1 before filtering:


total reads: 2101307
total bases: 494930621
Q20 bases: 457430335(92.4231%)
Q30 bases: 436285000(88.1507%)

Read2 before filtering:


total reads: 2101307
total bases: 499276413
Q20 bases: 361846304(72.4741%)
Q30 bases: 303782694(60.8446%)

Read1 after filtering:


total reads: 1779039
total bases: 376344731
Q20 bases: 354430884(94.1772%)
Q30 bases: 340947286(90.5944%)

Read2 after filtering:


total reads: 1779039
total bases: 376808179
Q20 bases: 293480424(77.8859%)
Q30 bases: 252703553(67.0642%)

Filtering result:
reads passed filter: 3558078
reads failed due to low quality: 643304
reads failed due to too many N: 0
reads failed due to too short: 1232
reads with adapter trimmed: 904258
bases trimmed due to adapters: 4066578

Duplication rate: 0.130871%

Insert size peak (evaluated by paired-end reads): 330

JSON report: fastp.json


HTML report: fastp.html

fastp -i VA2041-2021_S10_L001_R1_001.fastq.gz -o
VA2041-2021_trimmed_R1_001.fastq.gz -I VA2041-2021_S10_L001_R2_001.fastq.gz -O
VA2041-2021_trimmed_R2_001.fastq.gz -f 20 -F 20 -t 2 -T 2 -w 16
fastp v0.24.1, time used: 20 seconds

[22]: cd ../VA2042-2021

[23]: ls

VA2042-2021_S1_L001_R1_001.fastq.gz
VA2042-2021_S1_L001_R2_001.fastq.gz

[24]: fastp -i VA2042-2021_S1_L001_R1_001.fastq.gz -o VA2042-2021_trimmed_R1_001.fastq.gz \
      -I VA2042-2021_S1_L001_R2_001.fastq.gz -O VA2042-2021_trimmed_R2_001.fastq.gz \
      -f 20 -F 20 -t 2 -T 2 -w 16

Read1 before filtering:


total reads: 2505223
total bases: 593891171
Q20 bases: 548903083(92.4249%)
Q30 bases: 523454552(88.1398%)

Read2 before filtering:


total reads: 2505223
total bases: 597373758
Q20 bases: 475732063(79.6373%)
Q30 bases: 416417051(69.708%)

Read1 after filtering:


total reads: 2261313
total bases: 483819789
Q20 bases: 452981659(93.6261%)
Q30 bases: 434227302(89.7498%)

Read2 after filtering:


total reads: 2261313
total bases: 484264198
Q20 bases: 402865127(83.1912%)
Q30 bases: 357360291(73.7945%)

Filtering result:
reads passed filter: 4522626
reads failed due to low quality: 486762
reads failed due to too many N: 0
reads failed due to too short: 1058
reads with adapter trimmed: 1083528
bases trimmed due to adapters: 3696114

Duplication rate: 0.21559%

Insert size peak (evaluated by paired-end reads): 355

JSON report: fastp.json


HTML report: fastp.html

fastp -i VA2042-2021_S1_L001_R1_001.fastq.gz -o
VA2042-2021_trimmed_R1_001.fastq.gz -I VA2042-2021_S1_L001_R2_001.fastq.gz -O
VA2042-2021_trimmed_R2_001.fastq.gz -f 20 -F 20 -t 2 -T 2 -w 16
fastp v0.24.1, time used: 25 seconds

[2]: cd VA2043-2021/

[3]: ls

VA2043-2021_S2_L001_R1_001.fastq.gz
VA2043-2021_S2_L001_R2_001.fastq.gz

[4]: fastp -i VA2043-2021_S2_L001_R1_001.fastq.gz -o VA2043-2021_trimmed_R1_001.fastq.gz \
     -I VA2043-2021_S2_L001_R2_001.fastq.gz -O VA2043-2021_trimmed_R2_001.fastq.gz \
     -f 20 -F 20 -t 3 -T 2 -w 16

Read1 before filtering:


total reads: 1696591
total bases: 399749309
Q20 bases: 370112274(92.5861%)
Q30 bases: 353614632(88.4591%)

Read2 before filtering:


total reads: 1696591
total bases: 403031145
Q20 bases: 286219215(71.0166%)
Q30 bases: 238921838(59.2812%)

Read1 after filtering:


total reads: 1424642
total bases: 299860257
Q20 bases: 282956525(94.3628%)
Q30 bases: 272856911(90.9947%)

Read2 after filtering:


total reads: 1424642
total bases: 301630509
Q20 bases: 231189257(76.6465%)
Q30 bases: 198357439(65.7617%)

Filtering result:
reads passed filter: 2849284
reads failed due to low quality: 542520
reads failed due to too many N: 0
reads failed due to too short: 1378
reads with adapter trimmed: 672104
bases trimmed due to adapters: 3100420

Duplication rate: 0.152305%

Insert size peak (evaluated by paired-end reads): 346

JSON report: fastp.json


HTML report: fastp.html

fastp -i VA2043-2021_S2_L001_R1_001.fastq.gz -o
VA2043-2021_trimmed_R1_001.fastq.gz -I VA2043-2021_S2_L001_R2_001.fastq.gz -O
VA2043-2021_trimmed_R2_001.fastq.gz -f 20 -F 20 -t 3 -T 2 -w 16
fastp v0.24.1, time used: 16 seconds

[5]: cd ../VA2092-2021

[6]: ls

VA2092-2021-2_S2_L001_R1_001.fastq.gz
VA2092-2021_S1_L001_R1_001.fastq.gz
VA2092-2021-2_S2_L001_R2_001.fastq.gz
VA2092-2021_S1_L001_R2_001.fastq.gz

In this case, since there are several R1 and R2 read files, we will do the quality trimming with fastp and then use "cat" to join all the R1 files and all the R2 files into their respective files.

[8]: fastp -i VA2092-2021-2_S2_L001_R1_001.fastq.gz -o VA2092-2021-2_S2_trimmed_R1_001.fastq.gz \
     -I VA2092-2021-2_S2_L001_R2_001.fastq.gz -O VA2092-2021-2_S2_trimmed_R2_001.fastq.gz \
     -f 20 -F 20 -T 1 -t 1 -w 16

Read1 before filtering:


total reads: 1222381
total bases: 306817631
Q20 bases: 281849161(91.8621%)
Q30 bases: 270886390(88.2891%)

Read2 before filtering:

total reads: 1222381
total bases: 306817631
Q20 bases: 262289307(85.487%)
Q30 bases: 243888111(79.4896%)

Read1 after filtering:


total reads: 1166264
total bases: 243998145
Q20 bases: 230570567(94.4969%)
Q30 bases: 223736861(91.6961%)

Read2 after filtering:


total reads: 1166264
total bases: 243998145
Q20 bases: 218648435(89.6107%)
Q30 bases: 206352067(84.5712%)

Filtering result:
reads passed filter: 2332528
reads failed due to low quality: 112234
reads failed due to too many N: 0
reads failed due to too short: 0
reads with adapter trimmed: 670326
bases trimmed due to adapters: 48590492

Duplication rate: 1.28315%

Insert size peak (evaluated by paired-end reads): 326

JSON report: fastp.json


HTML report: fastp.html

fastp -i VA2092-2021-2_S2_L001_R1_001.fastq.gz -o
VA2092-2021-2_S2_trimmed_R1_001.fastq.gz -I
VA2092-2021-2_S2_L001_R2_001.fastq.gz -O
VA2092-2021-2_S2_trimmed_R2_001.fastq.gz -f 20 -F 20 -T 1 -t 1 -w 16
fastp v0.24.1, time used: 13 seconds

[9]: ls

fastp.html
VA2092-2021-2_S2_trimmed_R1_001.fastq.gz
fastp.json
VA2092-2021-2_S2_trimmed_R2_001.fastq.gz
VA2092-2021-2_S2_L001_R1_001.fastq.gz
VA2092-2021_S1_L001_R1_001.fastq.gz
VA2092-2021-2_S2_L001_R2_001.fastq.gz
VA2092-2021_S1_L001_R2_001.fastq.gz

[10]: fastp -i VA2092-2021_S1_L001_R1_001.fastq.gz -o VA2092-2021_S1_trimmed_R1_001.fastq.gz \
      -I VA2092-2021_S1_L001_R2_001.fastq.gz -O VA2092-2021_S1_trimmed_R2_001.fastq.gz \
      -f 29 -F 21 -t 1 -T 1 -w 16

Read1 before filtering:


total reads: 1314248
total bases: 329876248
Q20 bases: 306158195(92.81%)
Q30 bases: 295110500(89.461%)

Read2 before filtering:


total reads: 1314248
total bases: 329876248
Q20 bases: 285377612(86.5105%)
Q30 bases: 266861556(80.8975%)

Read1 after filtering:


total reads: 1255506
total bases: 256798012
Q20 bases: 243122683(94.6747%)
Q30 bases: 235867507(91.8494%)

Read2 after filtering:


total reads: 1255506
total bases: 266842060
Q20 bases: 240440573(90.1059%)
Q30 bases: 227519177(85.2636%)

Filtering result:
reads passed filter: 2511012
reads failed due to low quality: 117484
reads failed due to too many N: 0
reads failed due to too short: 0
reads with adapter trimmed: 695548
bases trimmed due to adapters: 41471124

Duplication rate: 1.25296%

Insert size peak (evaluated by paired-end reads): 271

JSON report: fastp.json


HTML report: fastp.html

fastp -i VA2092-2021_S1_L001_R1_001.fastq.gz -o
VA2092-2021_S1_trimmed_R1_001.fastq.gz -I VA2092-2021_S1_L001_R2_001.fastq.gz -O
VA2092-2021_S1_trimmed_R2_001.fastq.gz -f 29 -F 21 -t 1 -T 1 -w 16
fastp v0.24.1, time used: 14 seconds

[11]: ls

fastp.html
fastp.json
VA2092-2021-2_S2_L001_R1_001.fastq.gz
VA2092-2021-2_S2_L001_R2_001.fastq.gz
VA2092-2021-2_S2_trimmed_R1_001.fastq.gz
VA2092-2021-2_S2_trimmed_R2_001.fastq.gz
VA2092-2021_S1_L001_R1_001.fastq.gz
VA2092-2021_S1_L001_R2_001.fastq.gz
VA2092-2021_S1_trimmed_R1_001.fastq.gz
VA2092-2021_S1_trimmed_R2_001.fastq.gz
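
The note earlier in this sample's section plans to join the trimmed R1 files, and separately the trimmed R2 files, with `cat`. That works on `.fastq.gz` files without decompressing them, because a gzip stream may consist of several compressed members placed back to back. The sketch below demonstrates the property with Python's standard library on two tiny made-up records; the file names and the helper `concatenate_files` are hypothetical.

```python
import gzip
import os
import shutil
import tempfile


def concatenate_files(inputs, output):
    """Byte-wise concatenation, equivalent to `cat inputs... > output`."""
    with open(output, "wb") as out:
        for path in inputs:
            with open(path, "rb") as src:
                shutil.copyfileobj(src, out)


workdir = tempfile.mkdtemp()
run1 = os.path.join(workdir, "run1_R1.fastq.gz")
run2 = os.path.join(workdir, "run2_R1.fastq.gz")
merged = os.path.join(workdir, "merged_R1.fastq.gz")

# Two made-up single-record FASTQ files, compressed separately
with gzip.open(run1, "wt") as fh:
    fh.write("@read_a\nACGT\n+\nIIII\n")
with gzip.open(run2, "wt") as fh:
    fh.write("@read_b\nTTGA\n+\nIIII\n")

concatenate_files([run1, run2], merged)

# gzip.open reads every member of the concatenated stream, in order
with gzip.open(merged, "rt") as fh:
    merged_text = fh.read()
print(merged_text.count("\n") // 4, "records")
```

When doing this for real data, the R1 files and the R2 files need to be concatenated in the same run order so that read pairs stay aligned between the two merged files.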

[12]: cd ../VA2093-2021

[13]: ls

VA2093-2021-2_S4_L001_R1_001.fastq.gz
VA2093-2021_S3_L001_R1_001.fastq.gz
VA2093-2021-2_S4_L001_R2_001.fastq.gz
VA2093-2021_S3_L001_R2_001.fastq.gz

[14]: fastp -i VA2093-2021-2_S4_L001_R1_001.fastq.gz -o VA2093-2021-2_S4_trimmed_R1_001.fastq.gz \
      -I VA2093-2021-2_S4_L001_R2_001.fastq.gz -O VA2093-2021-2_S4_trimmed_R2_001.fastq.gz \
      -f 21 -F 20 -t 1 -T 1 -w 16

Read1 before filtering:


total reads: 1289794
total bases: 323738294
Q20 bases: 298751919(92.2819%)
Q30 bases: 287465473(88.7956%)

Read2 before filtering:


total reads: 1289794
total bases: 323738294
Q20 bases: 269694857(83.3064%)
Q30 bases: 247299978(76.3889%)

Read1 after filtering:


total reads: 1215673
total bases: 256077445
Q20 bases: 242640299(94.7527%)
Q30 bases: 235574975(91.9936%)

Read2 after filtering:


total reads: 1215673
total bases: 257293118
Q20 bases: 224750643(87.352%)
Q30 bases: 209136476(81.2834%)

Filtering result:
reads passed filter: 2431346
reads failed due to low quality: 148242
reads failed due to too many N: 0
reads failed due to too short: 0
reads with adapter trimmed: 658818
bases trimmed due to adapters: 44733442

Duplication rate: 1.24446%

Insert size peak (evaluated by paired-end reads): 270

JSON report: fastp.json


HTML report: fastp.html

fastp -i VA2093-2021-2_S4_L001_R1_001.fastq.gz -o
VA2093-2021-2_S4_trimmed_R1_001.fastq.gz -I
VA2093-2021-2_S4_L001_R2_001.fastq.gz -O
VA2093-2021-2_S4_trimmed_R2_001.fastq.gz -f 21 -F 20 -t 1 -T 1 -w 16
fastp v0.24.1, time used: 14 seconds
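
The `reads ...` counters of the `Filtering result` block account for every input read, so the pass rate can be recomputed from the log. A minimal sketch, assuming the log text keeps the layout shown above; the function name `parse_filtering_result` is hypothetical and not part of fastp:

```python
import re

def parse_filtering_result(log_text):
    """Collect the 'reads ...' counters of a fastp 'Filtering result' block."""
    counts = {}
    for line in log_text.splitlines():
        match = re.match(r"\s*(reads [^:]+):\s*(\d+)\s*$", line)
        if match:
            counts[match.group(1)] = int(match.group(2))
    return counts

# Counters copied from the VA2093-2021-2_S4 run above.
log_text = """Filtering result:
reads passed filter: 2431346
reads failed due to low quality: 148242
reads failed due to too many N: 0
reads failed due to too short: 0
reads with adapter trimmed: 658818
bases trimmed due to adapters: 44733442
"""

counts = parse_filtering_result(log_text)
# Only the 'reads failed ...' counters are rejected reads; the adapter
# counter describes reads that were trimmed, not discarded.
failed = sum(n for key, n in counts.items() if key.startswith("reads failed"))
total = counts["reads passed filter"] + failed
pass_rate = counts["reads passed filter"] / total
print(f"{total} reads in, {pass_rate:.2%} passed")
```

The recomputed total equals the 1289794 reads of Read1 plus the 1289794 reads of Read2 reported before filtering.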

[17]: cd ../VA2095-2021

[18]: ls

VA2095-2021-2_S6_L001_R1_001.fastq.gz
VA2095-2021_S5_L001_R1_001.fastq.gz
VA2095-2021-2_S6_L001_R2_001.fastq.gz
VA2095-2021_S5_L001_R2_001.fastq.gz

[19]: fastp -i VA2095-2021-2_S6_L001_R1_001.fastq.gz -o VA2095-2021-2_S6_trimmed_R1_001.fastq.gz -I VA2095-2021-2_S6_L001_R2_001.fastq.gz -O VA2095-2021-2_S6_trimmed_R2_001.fastq.gz -f 21 -F 21 -t 2 -T 2 -w 16

Read1 before filtering:


total reads: 1471735
total bases: 369405485
Q20 bases: 339391104(91.8749%)
Q30 bases: 325724504(88.1753%)

Read2 before filtering:


total reads: 1471735
total bases: 369405485
Q20 bases: 323240660(87.5029%)
Q30 bases: 303304491(82.1061%)

Read1 after filtering:


total reads: 1410741
total bases: 296128606
Q20 bases: 279249472(94.3001%)
Q30 bases: 270298793(91.2775%)

Read2 after filtering:


total reads: 1410741
total bases: 296128606
Q20 bases: 270478658(91.3382%)
Q30 bases: 257217242(86.86%)

Filtering result:
reads passed filter: 2821482
reads failed due to low quality: 121988
reads failed due to too many N: 0
reads failed due to too short: 0
reads with adapter trimmed: 744196
bases trimmed due to adapters: 51151414

Duplication rate: 1.25722%

Insert size peak (evaluated by paired-end reads): 334

JSON report: fastp.json


HTML report: fastp.html

fastp -i VA2095-2021-2_S6_L001_R1_001.fastq.gz -o
VA2095-2021-2_S6_trimmed_R1_001.fastq.gz -I
VA2095-2021-2_S6_L001_R2_001.fastq.gz -O
VA2095-2021-2_S6_trimmed_R2_001.fastq.gz -f 21 -F 21 -t 2 -T 2 -w 16
fastp v0.24.1, time used: 16 seconds

[20]: fastp -i VA2095-2021_S5_L001_R1_001.fastq.gz -o VA2095-2021_S5_trimmed_R1_001.fastq.gz -I VA2095-2021_S5_L001_R2_001.fastq.gz -O VA2095-2021_S5_trimmed_R2_001.fastq.gz -f 21 -F 21 -t 2 -T 2 -w 16

Read1 before filtering:


total reads: 1544771
total bases: 387737521
Q20 bases: 359179805(92.6348%)
Q30 bases: 346280566(89.308%)

Read2 before filtering:


total reads: 1544771
total bases: 387737521
Q20 bases: 292103599(75.3354%)
Q30 bases: 256014729(66.0278%)

Read1 after filtering:


total reads: 1380094
total bases: 289232821
Q20 bases: 275690668(95.3179%)
Q30 bases: 268789723(92.932%)

Read2 after filtering:


total reads: 1380094
total bases: 289232821
Q20 bases: 234025182(80.9124%)
Q30 bases: 210350491(72.727%)

Filtering result:
reads passed filter: 2760188
reads failed due to low quality: 329354
reads failed due to too many N: 0
reads failed due to too short: 0
reads with adapter trimmed: 731530
bases trimmed due to adapters: 51053894

Duplication rate: 0.642037%

Insert size peak (evaluated by paired-end reads): 353

JSON report: fastp.json


HTML report: fastp.html

fastp -i VA2095-2021_S5_L001_R1_001.fastq.gz -o
VA2095-2021_S5_trimmed_R1_001.fastq.gz -I VA2095-2021_S5_L001_R2_001.fastq.gz -O
VA2095-2021_S5_trimmed_R2_001.fastq.gz -f 21 -F 21 -t 2 -T 2 -w 16
fastp v0.24.1, time used: 16 seconds
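
Both runs in this directory write their reports to the default `fastp.json` and `fastp.html`, so the second run replaces the reports of the first. fastp's `-j/--json` and `-h/--html` options set the report file names. A sketch of building the command with per-sample report names; the helper `fastp_command` and the `<sample>.fastp.*` naming scheme are assumptions for illustration, not something the notebook does:

```python
import shlex

def fastp_command(sample, front=21, tail=2, threads=16):
    """Build a paired-end fastp call with per-sample report files."""
    return [
        "fastp",
        "-i", f"{sample}_L001_R1_001.fastq.gz",
        "-o", f"{sample}_trimmed_R1_001.fastq.gz",
        "-I", f"{sample}_L001_R2_001.fastq.gz",
        "-O", f"{sample}_trimmed_R2_001.fastq.gz",
        "-f", str(front), "-F", str(front),  # bases trimmed from the front of read1 / read2
        "-t", str(tail), "-T", str(tail),    # bases trimmed from the tail of read1 / read2
        "-w", str(threads),                  # worker threads
        "-j", f"{sample}.fastp.json",        # hypothetical per-sample report names
        "-h", f"{sample}.fastp.html",
    ]

for sample in ["VA2095-2021-2_S6", "VA2095-2021_S5"]:
    # Printed only; to execute, pass the list to subprocess.run(..., check=True).
    print(shlex.join(fastp_command(sample)))
```

Keeping the arguments as a list avoids shell quoting problems if a file name ever contains spaces.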

[21]: cd ../VA3232-2021

[22]: ls

VA3232-2021_S5_L001_R1_001.fastq.gz
VA3232-2021_S5_L001_R2_001.fastq.gz

[24]: fastp -i VA3232-2021_S5_L001_R1_001.fastq.gz -o VA3232-2021_trimmed_R1_001.fastq.gz -I VA3232-2021_S5_L001_R2_001.fastq.gz -O VA3232-2021_trimmed_R2_001.fastq.gz -f 21 -F 21 -t 3 -T 4 -w 16

Read1 before filtering:


total reads: 1880186
total bases: 301026002
Q20 bases: 284821156(94.6168%)
Q30 bases: 277725348(92.2596%)

Read2 before filtering:


total reads: 1880186
total bases: 314195238
Q20 bases: 259970409(82.7417%)
Q30 bases: 240175665(76.4415%)

Read1 after filtering:


total reads: 1788215
total bases: 237816467
Q20 bases: 227675784(95.7359%)
Q30 bases: 223031017(93.7828%)

Read2 after filtering:


total reads: 1788215
total bases: 237584067
Q20 bases: 206358878(86.8572%)
Q30 bases: 193611502(81.4918%)

Filtering result:
reads passed filter: 3576430
reads failed due to low quality: 177416
reads failed due to too many N: 0
reads failed due to too short: 6526
reads with adapter trimmed: 3021114
bases trimmed due to adapters: 13575259

Duplication rate: 7.55468%

Insert size peak (evaluated by paired-end reads): 127

JSON report: fastp.json


HTML report: fastp.html

fastp -i VA3232-2021_S5_L001_R1_001.fastq.gz -o
VA3232-2021_trimmed_R1_001.fastq.gz -I VA3232-2021_S5_L001_R2_001.fastq.gz -O
VA3232-2021_trimmed_R2_001.fastq.gz -f 21 -F 21 -t 3 -T 4 -w 16
fastp v0.24.1, time used: 16 seconds
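
Dividing `total bases` by `total reads` gives the mean read length, which shows roughly how much sequence the fixed trims (`-f`, `-t`) and adapter removal take off each read. A small sketch using the Read1 figures of this run; the helper name `mean_length` is an assumption, and the two means are taken over different read sets (all reads versus surviving reads), so the comparison is only approximate:

```python
def mean_length(total_bases, total_reads):
    """Mean read length in bases."""
    return total_bases / total_reads

# Read1 figures copied from the VA3232-2021_S5 run above.
before = mean_length(301026002, 1880186)
after = mean_length(237816467, 1788215)
fixed_trim = 21 + 3  # -f 21 and -t 3 are applied to read1

print(f"mean length before filtering: {before:.1f}")
print(f"mean length after filtering:  {after:.1f}")
print(f"loss beyond the fixed trim:   {before - after - fixed_trim:.1f}")
```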

[25]: cd ../VA4087-2021

[26]: ls

VA4087-21c3_S1_L001_R1_001.fastq.gz
VA4087-21c4_S2_L001_R1_001.fastq.gz
VA4087-21c3_S1_L001_R2_001.fastq.gz
VA4087-21c4_S2_L001_R2_001.fastq.gz

[27]: fastp -i VA4087-21c3_S1_L001_R1_001.fastq.gz -o VA4087-21c3_S1_trimmed_R1_001.fastq.gz -I VA4087-21c3_S1_L001_R2_001.fastq.gz -O VA4087-21c3_S1_trimmed_R2_001.fastq.gz -f 21 -F 21 -t 3 -T 3 -w 16

Read1 before filtering:


total reads: 2261638
total bases: 286392493
Q20 bases: 280827161(98.0567%)
Q30 bases: 278487190(97.2397%)

Read2 before filtering:


total reads: 2261638
total bases: 286905898
Q20 bases: 276687353(96.4384%)
Q30 bases: 272780943(95.0768%)

Read1 after filtering:


total reads: 2228047
total bases: 230244245
Q20 bases: 226253407(98.2667%)
Q30 bases: 224441797(97.4799%)

Read2 after filtering:


total reads: 2228047
total bases: 230421403
Q20 bases: 223169238(96.8527%)
Q30 bases: 220357381(95.6323%)

Filtering result:
reads passed filter: 4456094
reads failed due to low quality: 50746
reads failed due to too many N: 0
reads failed due to too short: 16436
reads with adapter trimmed: 2291498
bases trimmed due to adapters: 460571

Duplication rate: 2.53812%

Insert size peak (evaluated by paired-end reads): 115

JSON report: fastp.json


HTML report: fastp.html

fastp -i VA4087-21c3_S1_L001_R1_001.fastq.gz -o
VA4087-21c3_S1_trimmed_R1_001.fastq.gz -I VA4087-21c3_S1_L001_R2_001.fastq.gz -O
VA4087-21c3_S1_trimmed_R2_001.fastq.gz -f 21 -F 21 -t 3 -T 3 -w 16
fastp v0.24.1, time used: 13 seconds

[28]: fastp -i VA4087-21c4_S2_L001_R1_001.fastq.gz -o VA4087-21c4_S2_trimmed_R1_001.fastq.gz -I VA4087-21c4_S2_L001_R2_001.fastq.gz -O VA4087-21c4_S2_trimmed_R2_001.fastq.gz -f 21 -F 21 -t 3 -T 3 -w 16

Read1 before filtering:


total reads: 1979131
total bases: 248493695
Q20 bases: 243787127(98.106%)
Q30 bases: 241827300(97.3173%)

Read2 before filtering:


total reads: 1979131
total bases: 249050374
Q20 bases: 238383799(95.7171%)
Q30 bases: 234514215(94.1634%)

Read1 after filtering:


total reads: 1948060
total bases: 199254019
Q20 bases: 195929953(98.3317%)
Q30 bases: 194429641(97.5788%)

Read2 after filtering:


total reads: 1948060
total bases: 199437762
Q20 bases: 191870868(96.2059%)
Q30 bases: 189123836(94.8285%)

Filtering result:
reads passed filter: 3896120
reads failed due to low quality: 47300
reads failed due to too many N: 0
reads failed due to too short: 14842
reads with adapter trimmed: 2056876
bases trimmed due to adapters: 472512

Duplication rate: 2.89117%

Insert size peak (evaluated by paired-end reads): 115

JSON report: fastp.json


HTML report: fastp.html

fastp -i VA4087-21c4_S2_L001_R1_001.fastq.gz -o
VA4087-21c4_S2_trimmed_R1_001.fastq.gz -I VA4087-21c4_S2_L001_R2_001.fastq.gz -O
VA4087-21c4_S2_trimmed_R2_001.fastq.gz -f 21 -F 21 -t 3 -T 3 -w 16
fastp v0.24.1, time used: 11 seconds

[29]: cd ../VA4090-2021

[30]: ls

VA4090-21c1_S1_L001_R1_001.fastq.gz
VA4090-21c2_S2_L001_R1_001.fastq.gz
VA4090-21c1_S1_L001_R2_001.fastq.gz
VA4090-21c2_S2_L001_R2_001.fastq.gz

[31]: fastp -i VA4090-21c1_S1_L001_R1_001.fastq.gz -o VA4090-21c1_S1_trimmed_R1_001.fastq.gz -I VA4090-21c1_S1_L001_R2_001.fastq.gz -O VA4090-21c1_S1_trimmed_R2_001.fastq.gz -f 21 -F 21 -t 3 -T 3 -w 16

Read1 before filtering:


total reads: 2231889
total bases: 311650786
Q20 bases: 302287180(96.9955%)
Q30 bases: 298534283(95.7913%)

Read2 before filtering:


total reads: 2231889
total bases: 311884694
Q20 bases: 297520216(95.3943%)
Q30 bases: 291690015(93.525%)

Read1 after filtering:


total reads: 2204585
total bases: 255347625
Q20 bases: 247946213(97.1014%)
Q30 bases: 244990735(95.944%)

Read2 after filtering:


total reads: 2204585
total bases: 255423507
Q20 bases: 244624601(95.7722%)
Q30 bases: 240212066(94.0446%)

Filtering result:
reads passed filter: 4409170
reads failed due to low quality: 49526
reads failed due to too many N: 0
reads failed due to too short: 5082
reads with adapter trimmed: 1662504
bases trimmed due to adapters: 376459

Duplication rate: 2.41226%

Insert size peak (evaluated by paired-end reads): 168

JSON report: fastp.json


HTML report: fastp.html

fastp -i VA4090-21c1_S1_L001_R1_001.fastq.gz -o
VA4090-21c1_S1_trimmed_R1_001.fastq.gz -I VA4090-21c1_S1_L001_R2_001.fastq.gz -O
VA4090-21c1_S1_trimmed_R2_001.fastq.gz -f 21 -F 21 -t 3 -T 3 -w 16
fastp v0.24.1, time used: 15 seconds

[32]: fastp -i VA4090-21c2_S2_L001_R1_001.fastq.gz -o VA4090-21c2_S2_trimmed_R1_001.fastq.gz -I VA4090-21c2_S2_L001_R2_001.fastq.gz -O VA4090-21c2_S2_trimmed_R2_001.fastq.gz -f 21 -F 21 -t 3 -T 3 -w 16

Read1 before filtering:


total reads: 2497124
total bases: 346642849
Q20 bases: 336827567(97.1685%)
Q30 bases: 332894957(96.034%)

Read2 before filtering:


total reads: 2497124
total bases: 346901294
Q20 bases: 334093828(96.308%)
Q30 bases: 328617894(94.7295%)

Read1 after filtering:


total reads: 2469381
total bases: 283939640
Q20 bases: 276170510(97.2638%)
Q30 bases: 273052207(96.1656%)

Read2 after filtering:


total reads: 2469381
total bases: 284034063
Q20 bases: 274525716(96.6524%)
Q30 bases: 270386338(95.195%)

Filtering result:
reads passed filter: 4938762
reads failed due to low quality: 50150
reads failed due to too many N: 0
reads failed due to too short: 5336
reads with adapter trimmed: 2009198
bases trimmed due to adapters: 416879

Duplication rate: 2.22232%

Insert size peak (evaluated by paired-end reads): 168

JSON report: fastp.json


HTML report: fastp.html

fastp -i VA4090-21c2_S2_L001_R1_001.fastq.gz -o
VA4090-21c2_S2_trimmed_R1_001.fastq.gz -I VA4090-21c2_S2_L001_R2_001.fastq.gz -O
VA4090-21c2_S2_trimmed_R2_001.fastq.gz -f 21 -F 21 -t 3 -T 3 -w 16
fastp v0.24.1, time used: 16 seconds

[34]: cd ../VA541-2022

[35]: ls

VA541-2022-2_R1.fastq.gz VA541-2022-2_R2.fastq.gz
VA541-2022_R1.fastq.gz
VA541-2022.2_R1.fastq.gz VA541-2022.2_R2.fastq.gz
VA541-2022_R2.fastq.gz

[37]: fastp -i VA541-2022-2_R1.fastq.gz -o VA541-2022-2_R1_trimmed.fastq.gz -I VA541-2022-2_R2.fastq.gz -O VA541-2022-2_R2_trimmed.fastq.gz -f 21 -F 21 -t 1 -T 1 -w 16

Read1 before filtering:


total reads: 3670412
total bases: 554232212
Q20 bases: 513916024(92.7258%)
Q30 bases: 481789788(86.9292%)

Read2 before filtering:


total reads: 3670412
total bases: 554232212
Q20 bases: 512294073(92.4331%)
Q30 bases: 470404283(84.8749%)

Read1 after filtering:


total reads: 3424451
total bases: 406365581
Q20 bases: 384954859(94.7312%)
Q30 bases: 361592780(88.9821%)

Read2 after filtering:


total reads: 3424451
total bases: 406696672
Q20 bases: 384679079(94.5862%)
Q30 bases: 353248639(86.858%)

Filtering result:
reads passed filter: 6848902
reads failed due to low quality: 481038
reads failed due to too many N: 3316
reads failed due to too short: 7568
reads with adapter trimmed: 3307802
bases trimmed due to adapters: 69689585

Duplication rate: 2.91678%

Insert size peak (evaluated by paired-end reads): 157

JSON report: fastp.json


HTML report: fastp.html

fastp -i VA541-2022-2_R1.fastq.gz -o VA541-2022-2_R1_trimmed.fastq.gz -I VA541-2022-2_R2.fastq.gz -O VA541-2022-2_R2_trimmed.fastq.gz -f 21 -F 21 -t 1 -T 1 -w 16
fastp v0.24.1, time used: 21 seconds

[38]: fastp -i VA541-2022.2_R1.fastq.gz -o VA541-2022.2_R1_trimmed.fastq.gz -I VA541-2022.2_R2.fastq.gz -O VA541-2022.2_R2_trimmed.fastq.gz -f 21 -F 21 -t 1 -T 1 -w 16

Read1 before filtering:


total reads: 11351881
total bases: 1714134031
Q20 bases: 1584881393(92.4596%)
Q30 bases: 1484395303(86.5974%)

Read2 before filtering:


total reads: 11351881
total bases: 1714134031
Q20 bases: 1585249947(92.4811%)
Q30 bases: 1454377083(84.8462%)

Read1 after filtering:


total reads: 10596086
total bases: 1256476359
Q20 bases: 1187929529(94.5445%)
Q30 bases: 1114449443(88.6964%)

Read2 after filtering:


total reads: 10596086
total bases: 1257447270
Q20 bases: 1189363428(94.5856%)
Q30 bases: 1091333906(86.7896%)

Filtering result:
reads passed filter: 21192172
reads failed due to low quality: 1478618
reads failed due to too many N: 10298
reads failed due to too short: 22674
reads with adapter trimmed: 10329036
bases trimmed due to adapters: 217609677

Duplication rate: 35.2916%

Insert size peak (evaluated by paired-end reads): 157

JSON report: fastp.json


HTML report: fastp.html

fastp -i VA541-2022.2_R1.fastq.gz -o VA541-2022.2_R1_trimmed.fastq.gz -I VA541-2022.2_R2.fastq.gz -O VA541-2022.2_R2_trimmed.fastq.gz -f 21 -F 21 -t 1 -T 1 -w 16
fastp v0.24.1, time used: 59 seconds

[40]: fastp -i VA541-2022_R1.fastq.gz -o VA541-2022_R1_trimmed.fastq.gz -I VA541-2022_R2.fastq.gz -O VA541-2022_R2_trimmed.fastq.gz -f 21 -F 21 -t 1 -T 1 -w 16

Read1 before filtering:


total reads: 7681469
total bases: 1159901819
Q20 bases: 1070965369(92.3324%)
Q30 bases: 1002605515(86.4388%)

Read2 before filtering:


total reads: 7681469
total bases: 1159901819
Q20 bases: 1072955874(92.504%)
Q30 bases: 983972800(84.8324%)

Read1 after filtering:


total reads: 7171635
total bases: 850110778
Q20 bases: 802974670(94.4553%)
Q30 bases: 752856663(88.5598%)

Read2 after filtering:


total reads: 7171635
total bases: 850750598
Q20 bases: 804684349(94.5852%)
Q30 bases: 738085267(86.7569%)

Filtering result:
reads passed filter: 14343270
reads failed due to low quality: 997580
reads failed due to too many N: 6982
reads failed due to too short: 15106
reads with adapter trimmed: 7021234
bases trimmed due to adapters: 147920092

Duplication rate: 4.37222%

Insert size peak (evaluated by paired-end reads): 157

JSON report: fastp.json


HTML report: fastp.html

fastp -i VA541-2022_R1.fastq.gz -o VA541-2022_R1_trimmed.fastq.gz -I VA541-2022_R2.fastq.gz -O VA541-2022_R2_trimmed.fastq.gz -f 21 -F 21 -t 1 -T 1 -w 16
fastp v0.24.1, time used: 40 seconds

[41]: cd ../VA542-2022

[42]: ls

VA542-2022-2_R1.fastq.gz VA542-2022-2_R2.fastq.gz
VA542-2022_R1.fastq.gz
VA542-2022.2_R1.fastq.gz VA542-2022.2_R2.fastq.gz
VA542-2022_R2.fastq.gz

[43]: fastp -i VA542-2022-2_R1.fastq.gz -o VA542-2022-2_R1_trimmed.fastq.gz -I VA542-2022-2_R2.fastq.gz -O VA542-2022-2_R2_trimmed.fastq.gz -f 25 -F 25 -t 1 -T 1 -w 16

Read1 before filtering:


total reads: 2981362
total bases: 450185662
Q20 bases: 414623211(92.1005%)
Q30 bases: 388724397(86.3476%)

Read2 before filtering:


total reads: 2981362
total bases: 450185662
Q20 bases: 415759649(92.3529%)
Q30 bases: 381122722(84.659%)

Read1 after filtering:


total reads: 2772126
total bases: 322117785
Q20 bases: 302971986(94.0563%)
Q30 bases: 283713637(88.0776%)

Read2 after filtering:


total reads: 2772126
total bases: 322295178
Q20 bases: 304074310(94.3465%)
Q30 bases: 278004232(86.2576%)

Filtering result:
reads passed filter: 5544252
reads failed due to low quality: 410706
reads failed due to too many N: 2822
reads failed due to too short: 4944
reads with adapter trimmed: 2683728
bases trimmed due to adapters: 48406330

Duplication rate: 1.75584%

Insert size peak (evaluated by paired-end reads): 157

JSON report: fastp.json


HTML report: fastp.html

fastp -i VA542-2022-2_R1.fastq.gz -o VA542-2022-2_R1_trimmed.fastq.gz -I VA542-2022-2_R2.fastq.gz -O VA542-2022-2_R2_trimmed.fastq.gz -f 25 -F 25 -t 1 -T 1 -w 16
fastp v0.24.1, time used: 16 seconds

[44]: fastp -i VA542-2022.2_R1.fastq.gz -o VA542-2022.2_R1_trimmed.fastq.gz -I VA542-2022.2_R2.fastq.gz -O VA542-2022.2_R2_trimmed.fastq.gz -f 25 -F 25 -t 1 -T 1 -w 16

Read1 before filtering:


total reads: 9881653
total bases: 1492129603
Q20 bases: 1376242667(92.2335%)
Q30 bases: 1289788130(86.4394%)

Read2 before filtering:


total reads: 9881653
total bases: 1492129603
Q20 bases: 1378553822(92.3883%)
Q30 bases: 1264530081(84.7467%)

Read1 after filtering:


total reads: 9191765
total bases: 1067991438
Q20 bases: 1005506002(94.1493%)
Q30 bases: 941410401(88.1477%)

Read2 after filtering:


total reads: 9191765
total bases: 1068635924
Q20 bases: 1008551668(94.3775%)
Q30 bases: 922626587(86.3368%)

Filtering result:
reads passed filter: 18383530
reads failed due to low quality: 1352900
reads failed due to too many N: 9410
reads failed due to too short: 17466
reads with adapter trimmed: 8898500
bases trimmed due to adapters: 160485780

Duplication rate: 32.3088%

Insert size peak (evaluated by paired-end reads): 148

JSON report: fastp.json


HTML report: fastp.html

fastp -i VA542-2022.2_R1.fastq.gz -o VA542-2022.2_R1_trimmed.fastq.gz -I VA542-2022.2_R2.fastq.gz -O VA542-2022.2_R2_trimmed.fastq.gz -f 25 -F 25 -t 1 -T 1 -w 16
fastp v0.24.1, time used: 51 seconds

[45]: fastp -i VA542-2022_R1.fastq.gz -o VA542-2022_R1_trimmed.fastq.gz -I VA542-2022_R2.fastq.gz -O VA542-2022_R2_trimmed.fastq.gz -f 25 -F 25 -t 1 -T 1 -w 16

Read1 before filtering:


total reads: 6900291
total bases: 1041943941
Q20 bases: 961619456(92.2909%)
Q30 bases: 901063733(86.4791%)

Read2 before filtering:


total reads: 6900291
total bases: 1041943941
Q20 bases: 962794173(92.4036%)
Q30 bases: 883407359(84.7845%)

Read1 after filtering:


total reads: 6419639
total bases: 745873653
Q20 bases: 702534016(94.1894%)
Q30 bases: 657696764(88.178%)

Read2 after filtering:


total reads: 6419639
total bases: 746340746
Q20 bases: 704477358(94.3908%)
Q30 bases: 644622355(86.3711%)

Filtering result:
reads passed filter: 12839278
reads failed due to low quality: 942194
reads failed due to too many N: 6588
reads failed due to too short: 12522
reads with adapter trimmed: 6214772
bases trimmed due to adapters: 112079450

Duplication rate: 3.06194%

Insert size peak (evaluated by paired-end reads): 157

JSON report: fastp.json


HTML report: fastp.html

fastp -i VA542-2022_R1.fastq.gz -o VA542-2022_R1_trimmed.fastq.gz -I VA542-2022_R2.fastq.gz -O VA542-2022_R2_trimmed.fastq.gz -f 25 -F 25 -t 1 -T 1 -w 16
fastp v0.24.1, time used: 36 seconds

[47]: cd ../VA543-2022

[48]: ls

VA543-2022-2_R1.fastq.gz VA543-2022-2_R2.fastq.gz
VA543-2022_R1.fastq.gz
VA543-2022.2_R1.fastq.gz VA543-2022.2_R2.fastq.gz
VA543-2022_R2.fastq.gz

[49]: fastp -i VA543-2022-2_R1.fastq.gz -o VA543-2022-2_R1_trimmed.fastq.gz -I VA543-2022-2_R2.fastq.gz -O VA543-2022-2_R2_trimmed.fastq.gz -f 25 -F 25 -t 1 -T 1 -w 16

Read1 before filtering:


total reads: 2382669
total bases: 359783019
Q20 bases: 331106860(92.0296%)
Q30 bases: 309448585(86.0098%)

Read2 before filtering:
total reads: 2382669
total bases: 359783019
Q20 bases: 330784846(91.9401%)
Q30 bases: 302508577(84.0808%)

Read1 after filtering:


total reads: 2208851
total bases: 256565098
Q20 bases: 241234432(94.0246%)
Q30 bases: 225268615(87.8017%)

Read2 after filtering:


total reads: 2208851
total bases: 256708057
Q20 bases: 241352015(94.0181%)
Q30 bases: 220137871(85.7542%)

Filtering result:
reads passed filter: 4417702
reads failed due to low quality: 340998
reads failed due to too many N: 2198
reads failed due to too short: 4440
reads with adapter trimmed: 2135090
bases trimmed due to adapters: 39061341

Duplication rate: 1.63233%

Insert size peak (evaluated by paired-end reads): 147

JSON report: fastp.json


HTML report: fastp.html

fastp -i VA543-2022-2_R1.fastq.gz -o VA543-2022-2_R1_trimmed.fastq.gz -I VA543-2022-2_R2.fastq.gz -O VA543-2022-2_R2_trimmed.fastq.gz -f 25 -F 25 -t 1 -T 1 -w 16
fastp v0.24.1, time used: 13 seconds

[50]: fastp -i VA543-2022.2_R1.fastq.gz -o VA543-2022.2_R1_trimmed.fastq.gz -I VA543-2022.2_R2.fastq.gz -O VA543-2022.2_R2_trimmed.fastq.gz -f 25 -F 25 -t 1 -T 1 -w 16

Read1 before filtering:


total reads: 9056436
total bases: 1367521836
Q20 bases: 1259604172(92.1085%)
Q30 bases: 1178450017(86.1741%)

Read2 before filtering:
total reads: 9056436
total bases: 1367521836
Q20 bases: 1259863586(92.1275%)
Q30 bases: 1153549476(84.3533%)

Read1 after filtering:


total reads: 8407353
total bases: 977152795
Q20 bases: 918959320(94.0446%)
Q30 bases: 858867348(87.8949%)

Read2 after filtering:


total reads: 8407353
total bases: 977732860
Q20 bases: 920977216(94.1952%)
Q30 bases: 840875161(86.0025%)

Filtering result:
reads passed filter: 16814706
reads failed due to low quality: 1273186
reads failed due to too many N: 8256
reads failed due to too short: 16724
reads with adapter trimmed: 8087750
bases trimmed due to adapters: 146753539

Duplication rate: 28.6675%

Insert size peak (evaluated by paired-end reads): 147

JSON report: fastp.json


HTML report: fastp.html

fastp -i VA543-2022.2_R1.fastq.gz -o VA543-2022.2_R1_trimmed.fastq.gz -I VA543-2022.2_R2.fastq.gz -O VA543-2022.2_R2_trimmed.fastq.gz -f 25 -F 25 -t 1 -T 1 -w 16
fastp v0.24.1, time used: 47 seconds

[52]: fastp -i VA543-2022_R1.fastq.gz -o VA543-2022_R1_trimmed.fastq.gz -I VA543-2022_R2.fastq.gz -O VA543-2022_R2_trimmed.fastq.gz -f 25 -F 25 -t 1 -T 1 -w 16

Read1 before filtering:


total reads: 6673767
total bases: 1007738817
Q20 bases: 928497312(92.1367%)
Q30 bases: 869001432(86.2328%)

Read2 before filtering:
total reads: 6673767
total bases: 1007738817
Q20 bases: 929078740(92.1944%)
Q30 bases: 851040899(84.4505%)

Read1 after filtering:


total reads: 6198502
total bases: 720587697
Q20 bases: 677724888(94.0517%)
Q30 bases: 633598733(87.9281%)

Read2 after filtering:


total reads: 6198502
total bases: 721024803
Q20 bases: 679625201(94.2582%)
Q30 bases: 620737290(86.091%)

Filtering result:
reads passed filter: 12397004
reads failed due to low quality: 932188
reads failed due to too many N: 6058
reads failed due to too short: 12284
reads with adapter trimmed: 5952660
bases trimmed due to adapters: 107692198

Duplication rate: 3.20032%

Insert size peak (evaluated by paired-end reads): 157

JSON report: fastp.json


HTML report: fastp.html

fastp -i VA543-2022_R1.fastq.gz -o VA543-2022_R1_trimmed.fastq.gz -I VA543-2022_R2.fastq.gz -O VA543-2022_R2_trimmed.fastq.gz -f 25 -F 25 -t 1 -T 1 -w 16
fastp v0.24.1, time used: 35 seconds
Multiple read sets for a single isolate will not be concatenated, because Skesa accepts multiple read files.
Done.
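
The `total reads` values in the fastp logs are consistent with each `.2` file already holding the reads of the other two files for the same isolate: the counts add up exactly. This is an observation from the logs, not something the notebook states. A quick check, with the Read1 counts copied from the runs above:

```python
# Read1 'total reads' before filtering, copied from the fastp logs.
# Each tuple is (<isolate>-2 file, <isolate> file, <isolate>.2 file).
read_counts = {
    "VA541-2022": (3670412, 7681469, 11351881),
    "VA542-2022": (2981362, 6900291, 9881653),
    "VA543-2022": (2382669, 6673767, 9056436),
}

for isolate, (run_2, run_1, dot_2) in read_counts.items():
    status = "matches" if run_2 + run_1 == dot_2 else "differs from"
    print(f"{isolate}: {run_2} + {run_1} {status} {dot_2}")
```

The `.2` runs also report much higher duplication rates (35.2916%, 32.3088%, 28.6675%) than the other two files of each isolate. If they are indeed merged files, it is worth checking which files go to the assembler so the same reads are not supplied twice.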
