Data Preprocessing and Normalization
Data preprocessing is a critical stage in ensuring that heterogeneous MRI
and fMRI images collected from multiple scanners, patients, and modalities
are aligned to a uniform representation. Without proper normalization, deep
learning models may misinterpret intensity variations as pathological
signals, leading to unreliable segmentation and classification. In this section, we formalize the normalization and preprocessing strategies used in the proposed framework.
The first technique employed is Z-score normalization:
\[
X' = \frac{X - \mu}{\sigma} \tag{1}
\]
where X is the raw voxel intensity, μ is the mean voxel intensity within the
brain mask, and σ is the standard deviation. This operation standardizes the
data so that all inputs have zero mean and unit variance. The advantage of
this transformation is that it minimizes inter-subject intensity differences,
allowing the network to focus on structural features rather than scanner-
dependent artifacts.
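In implementation terms, Eq. (1) reduces to a few lines of NumPy. The sketch below is illustrative rather than the framework's exact code; the function name and the small epsilon added for numerical safety are our own conventions:

```python
import numpy as np

def zscore_normalize(volume, brain_mask):
    """Z-score normalization (Eq. 1) using statistics from the brain mask."""
    voxels = volume[brain_mask > 0]           # restrict statistics to brain voxels
    mu, sigma = voxels.mean(), voxels.std()
    return (volume - mu) / (sigma + 1e-8)     # epsilon guards against sigma == 0
```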
Another widely used technique is min-max scaling:
\[
X' = \frac{X - X_{\min}}{X_{\max} - X_{\min}} \tag{2}
\]
where $X_{\min}$ and $X_{\max}$ are the minimum and maximum intensity values of the image. Unlike Z-score normalization, which produces unbounded standardized values, min-max scaling compresses all voxel values into the bounded interval $[0, 1]$. This normalization is particularly effective for
convolutional layers that perform better when inputs lie within a fixed
range, ensuring consistent gradient propagation and preventing exploding
activations.
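Eq. (2) admits an equally compact sketch (again with an illustrative epsilon to avoid division by zero on constant images):

```python
import numpy as np

def minmax_scale(volume):
    """Min-max scaling (Eq. 2): map voxel intensities into [0, 1]."""
    x_min, x_max = volume.min(), volume.max()
    return (volume - x_min) / (x_max - x_min + 1e-8)  # epsilon avoids 0/0
```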
To further reduce scanner-induced variability, histogram matching is
performed:
\[
H_{\text{new}}(i) = H_{\text{ref}}^{-1}\!\left(H_{\text{src}}(i)\right) \tag{3}
\]
Here, $H_{\text{src}}(i)$ is the cumulative histogram of the source image and $H_{\text{ref}}(i)$ is that of a reference template. By aligning the intensity distribution of each
subject to a common reference, histogram matching harmonizes multi-
institutional data. This process prevents biases that occur when the same
tumour tissue appears with systematically higher or lower intensity due to
differences in acquisition protocols.
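One common empirical realization of Eq. (3) builds both cumulative histograms and inverts the reference CDF by linear interpolation; the sketch below mirrors the approach behind standard routines such as skimage.exposure.match_histograms, though the framework's exact implementation may differ:

```python
import numpy as np

def histogram_match(source, reference):
    """Histogram matching (Eq. 3): push source intensities through the
    source CDF, then through the inverse reference CDF."""
    src_values, src_idx, src_counts = np.unique(
        source.ravel(), return_inverse=True, return_counts=True)
    ref_values, ref_counts = np.unique(reference.ravel(), return_counts=True)
    src_cdf = np.cumsum(src_counts) / source.size      # H_src
    ref_cdf = np.cumsum(ref_counts) / reference.size   # H_ref
    # Interpolation realizes H_ref^{-1}(H_src(i))
    matched = np.interp(src_cdf, ref_cdf, ref_values)
    return matched[src_idx].reshape(source.shape)
```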
MRI images also suffer from low-frequency intensity variations caused by
magnetic field non-uniformities, commonly known as the bias field. To correct this, we employ bias field correction:
\[
I_{\text{corrected}}(x) = \frac{I_{\text{observed}}(x)}{B(x)} \tag{4}
\]
where $I_{\text{observed}}(x)$ represents the measured voxel intensity at position $x$, and $B(x)$ is the estimated bias field. By dividing out the estimated bias, the
corrected image reflects true tissue properties, leading to more accurate
tumour segmentation boundaries.
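The estimator for $B(x)$ is not specified here; the sketch below crudely approximates it with a heavy Gaussian blur purely for illustration, whereas production pipelines typically use a dedicated method such as N4ITK:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def correct_bias_field(observed, sigma=20.0):
    """Bias field correction (Eq. 4): divide out a smooth multiplicative field.
    B(x) is approximated by a heavy Gaussian blur for illustration only."""
    bias = gaussian_filter(observed, sigma=sigma)
    bias = np.maximum(bias, 1e-8)   # avoid division by zero in air/background
    return observed / bias
```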
Normalization is further stabilized by rescaling voxel dimensions to isotropic
resolution:
\[
I_{\text{resampled}}(x, y, z) = I\!\left(\frac{x}{s_x}, \frac{y}{s_y}, \frac{z}{s_z}\right) \tag{5}
\]
where $s_x, s_y, s_z$ are the resampling factors along each axis. This ensures that
tumour volumes are not distorted due to different acquisition voxel sizes,
which is critical for 3D convolutional kernels that assume spatial
consistency.
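Eq. (5) corresponds to a per-axis zoom. A minimal sketch using scipy.ndimage.zoom, assuming the original voxel spacing is given in millimetres, is:

```python
from scipy.ndimage import zoom

def resample_isotropic(volume, spacing, target=1.0):
    """Resample to isotropic voxels (Eq. 5). `spacing` holds the original
    voxel sizes (sx, sy, sz) in mm; `target` is the desired isotropic size."""
    factors = [s / target for s in spacing]   # per-axis zoom factor
    return zoom(volume, zoom=factors, order=1)  # order=1: trilinear interpolation
```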
Another important step is skull stripping, which removes non-brain tissues
such as the skull, scalp, and fat. This is mathematically expressed as:
\[
I_{\text{brain}}(x) = I(x) \cdot M_{\text{brain}}(x) \tag{6}
\]
where $M_{\text{brain}}(x)$ is a binary brain mask equal to 1 inside brain regions and 0
elsewhere. This step eliminates irrelevant signals that could otherwise
mislead the tumour detection algorithm, improving both training stability
and inference accuracy.
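Given a precomputed mask, Eq. (6) is a single element-wise product; the mask itself would come from a brain-extraction tool such as FSL BET (an assumption, as no specific tool is named here):

```python
import numpy as np

def skull_strip(volume, brain_mask):
    """Skull stripping (Eq. 6): zero out non-brain voxels via a binary mask."""
    return volume * (brain_mask > 0)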
We also apply intensity clipping to mitigate outliers:
\[
I_{\text{clipped}}(x) = \min\!\left(\max\!\left(I(x),\, Q_1 - 1.5 \cdot \mathrm{IQR}\right),\, Q_3 + 1.5 \cdot \mathrm{IQR}\right) \tag{7}
\]
where $Q_1$ and $Q_3$ are the first and third quartiles of the intensity distribution and $\mathrm{IQR} = Q_3 - Q_1$. This method reduces the influence of abnormally bright spots, often caused by imaging noise, without losing meaningful tumour intensities.
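Eq. (7) maps directly onto np.percentile and np.clip, as in this minimal sketch:

```python
import numpy as np

def clip_outliers(volume):
    """Intensity clipping (Eq. 7): clamp to [Q1 - 1.5*IQR, Q3 + 1.5*IQR]."""
    q1, q3 = np.percentile(volume, [25, 75])
    iqr = q3 - q1
    return np.clip(volume, q1 - 1.5 * iqr, q3 + 1.5 * iqr)
```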
Following normalization, data augmentation enhances robustness:
\[
I_{\text{aug}}(x) = T(I(x), \theta) \tag{8}
\]
where $T(\cdot)$ is a transformation function parameterized by a random variable $\theta$
(including rotation, flipping, elastic deformation, and noise injection).
Augmentation introduces variability, preventing overfitting and ensuring
generalization to unseen patient scans.
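A hedged sketch of Eq. (8) is given below; it draws $\theta$ per call and covers flips, in-plane rotations, and noise injection, while elastic deformation (which requires a displacement field) is omitted for brevity:

```python
import numpy as np

def augment(volume, rng=None):
    """Random augmentation (Eq. 8): one draw of theta selects the transforms."""
    rng = rng if rng is not None else np.random.default_rng()
    if rng.random() < 0.5:
        volume = np.flip(volume, axis=int(rng.integers(0, 3)))  # random-axis flip
    volume = np.rot90(volume, k=int(rng.integers(0, 4)), axes=(0, 1))  # rotation
    return volume + rng.normal(0.0, 0.01, volume.shape)  # Gaussian noise injection
```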
To handle multiple modalities, we apply modality-wise standardization:
\[
I'_m(x) = \frac{I_m(x) - \mu_m}{\sigma_m} \quad \forall\, m \in \{\mathrm{T1},\, \mathrm{T1c},\, \mathrm{T2},\, \mathrm{FLAIR}\} \tag{9}
\]
This guarantees that each modality contributes equally when concatenated
as input channels. For example, FLAIR intensities (which highlight edema)
are standardized independently of T1c intensities (which highlight
enhancing tumours).
Finally, normalized patches are extracted for efficient training:
\[
P_{ijk} = I'(x + i,\, y + j,\, z + k), \quad (i, j, k) \in \Omega \tag{10}
\]
where $\Omega$ defines the patch window. This allows training on smaller sub-volumes rather than entire 3D scans, reducing memory cost and enabling the network to learn fine-grained local features.
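Eq. (10) is plain array slicing; the sketch below (with an illustrative corner/size API of our own choosing) extracts one such patch:

```python
import numpy as np

def extract_patch(volume, corner, size):
    """Patch extraction (Eq. 10): slice a sub-volume of shape `size`
    whose origin (x, y, z) is `corner`."""
    x, y, z = corner
    dx, dy, dz = size
    return volume[x:x + dx, y:y + dy, z:z + dz]

# Example: one 64^3 patch from a normalized 128^3 scan
vol = np.random.rand(128, 128, 128)
patch = extract_patch(vol, corner=(16, 16, 16), size=(64, 64, 64))  # (64, 64, 64)
```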