1 Introduction
Sleep accounts for roughly one-third of a human life and is directly related to one's physical and mental well-being. As a fundamental technique for disease monitoring [13], management, and intervention, sleep stage classification has remarkable practical significance in healthcare [6]. The two principal standards governing sleep stage classification are the Rechtschaffen & Kales (R&K) criteria [49] and the American Academy of Sleep Medicine (AASM) criteria [4]. Based on these widely accepted international standards, sleep monitoring is indispensable in many healthcare areas. Notably, brain disorders such as aphasia, epilepsy, and Parkinson's disease exhibit intricate and close associations with sleep disorders, prompting extensive research into the application of sleep monitoring in the intervention of brain disorders [43]. Christensen et al. [11] employed electroencephalography (EEG) monitoring equipment and data-driven analytical methods to reveal sleep characteristics in patients with insomnia. Coelli et al. [12] conducted benchmark research on sleep monitoring in epileptic patients, using a multiscale functional clustering approach to survey epileptic networks in various sleep stages. In Parkinson's disease, sleep disorders represent the most frequent non-motor symptoms, and monitoring sleep quality offers an effective way to anticipate Parkinson's disease onset and track disease progression [27].
The conventional method of sleep stage classification requires professional medical experts to manually analyze the Polysomnography (PSG) signals of subjects [51]. This approach is time-consuming, inefficient, and labour intensive. Moreover, its results are subjective and easily influenced by the expertise and experience of the analysts [26]. The development of artificial intelligence has led to the emergence of automatic sleep stage classification approaches that significantly improve accuracy and efficiency [26]. Typically, these methods extract time-frequency transformation features from the raw PSG signal and employ machine learning methods such as Random Forest [38], Support Vector Machine (SVM) [1], and K-Nearest Neighbor [52] to build the final classification model. However, these methods require significant prior knowledge for feature extraction and processing. The emergence of deep learning has brought many advancements in the accuracy and efficiency of sleep stage classification. Deep learning-based sleep stage classification methods employ end-to-end neural networks for feature extraction and model construction. Convolutional neural networks (CNN) have been employed to extract spatial sleep features from the PSG signal [40, 48]. Goshtasbi et al. proposed a fully convolutional neural network called SleepFCN [18], which utilizes residual dilated causal convolutions to capture temporal context information and thus enhances the accuracy and speed of recognition. Recurrent neural networks (RNN) have also been used to extract temporal features related to sleep from the PSG signal [8, 41, 54]. Furthermore, Long Short-Term Memory (LSTM) networks [15, 40] have been utilized to address the issue of forgetting over long time-series signals. Zhao et al. proposed SleepContextNet [57], which utilizes a CNN-LSTM model structure combined with data augmentation techniques, significantly improving classification accuracy. Wang et al. [45] proposed a novel multi-scale attention mechanism incorporating channel and spatial attention, resulting in exceptional classification accuracy. Phan et al. proposed SeqSleepNet [33] to address sleep stage classification as a sequence-to-sequence classification problem. To achieve interpretability at the epoch and sequence level and improve classification accuracy, they further developed SleepTransformer [34], the first transformer-based sleep stage classification model, which achieved state-of-the-art performance. To address the issue of heterogeneity among physiological signals, Zhu et al. proposed MaskSleepNet [59]. This model learns the joint distribution of masked and non-masked modalities by leveraging partially masked signals, and uses multi-scale convolution and multi-head attention to extract features and make predictions at sub-scales, respectively. In addition, researchers have utilized sparse autoencoders to categorize pre-extracted time-frequency features [44], and generative adversarial network models have been used for EEG and electrocardiography (ECG) signal generation to improve related classification tasks [17].
However, the abovementioned models are better suited to extracting features from grid or image data. They do not exploit the functional connectivity relationships of brain structures contained in the PSG signal. Furthermore, the brain's cerebral cortex forms a non-Euclidean space, making a graph structure well suited for representing the feature distribution of brain space. Correspondingly, graph convolutional networks (GCN) have been widely employed and work well on graph-structured data [58]. Although existing studies have achieved acceptable sleep stage classification accuracy [21, 23, 28], these approaches have not addressed a core challenge of PSG-based sleep stage classification: it depends on the combination of multiple physiological signals, including EEG, ECG, electrooculography (EOG), and electromyography (EMG) signals, which vary significantly across different subjects [10]. For instance, the EEG signal can be affected by electrode drift and subjects' hair, while the EMG signal can be affected by muscle fatigue, skin resistance, and muscle strength [56]. The challenge of subject dependence limits the adaptability of sleep stage classification models, as models trained on certain subjects cannot be applied to new subjects. However, most existing methods only modify the feature extractor based on graph models without focusing on improving subject independence. Furthermore, obtaining and labeling sleep stage classification data is complex and requires professional medical expertise [39], making it impractical to train a new model for each new subject with their own data.
Fortunately, the development of transfer learning has provided hope for achieving subject-independent sleep stage classification [30, 60]. Researchers have begun to focus on improving model generalization. Jia et al. proposed the MSTGCN model [22], which integrates domain generalization [5, 46] and a spatio-temporal GCN, using the domain adversarial (DA) method to improve the model's robustness across subjects. Tang et al. [42] employed the Maximum Mean Discrepancy (MMD) [19] method to reduce the distribution difference between the training and testing data of the ECG signal. Most other transfer learning-based sleep stage classification methods follow the pre-training and fine-tuning paradigm to enhance prediction accuracy [2]. However, this paradigm has many limitations because it requires target data. Moreover, these methods ignore the structural characteristics of the sleep stage classification problem, resulting in limited improvement. To tackle the aforementioned challenges, we fuse the sleep stage classification problem with domain generalization [31], culminating in the proposal of a Structure Incentive Domain Adversarial learning (SIDA) method to augment the subject generalization of sleep stage classification models. As shown in Figure 1, the inspiration for the SIDA method comes from the structure of the sleep cycle. During an entire sleep episode, there are typically five complete sleep cycles [16], each consisting of five stages from the Wakefulness (Wake) stage to the Rapid Eye Movement (REM) stage and back [7]. The sleep stage categories themselves are limited to five distinct stages, and each stage may exhibit unique subject dependencies. We generalize the problems caused by this structure as the Subject Dependency Differences of different sleep Categories (SDDC) concept. More specifically, in contrast to traditional domain generalization models, SIDA establishes a distinct domain (i.e., subject) discriminator for every sleep stage to dissociate the subject dependence differences amongst the various sleep stages. This strategy helps the model precisely learn subject- or domain-invariant features. Moreover, we bridge the sleep stage classifier and the domain discriminators in SIDA with direct connections, which positively influences the training process. To our knowledge, this study marks the first effort to define the SDDC notion precisely. Leveraging the category structure of PSG-based sleep stage classification, we introduce the SIDA method to attain optimal cross-subject sleep stage classification. Notably, we use the leave-one-subject-out cross-validation method to rigorously validate our approach: we train the classification model on data from existing seen subjects and test the trained model on data from an unseen subject, and we validate and choose the final model on separate validation data randomly selected from the training data. We evaluate the effectiveness of the proposed SIDA method on three benchmark sleep stage classification datasets (i.e., ISRUC-S1 [24], ISRUC-S3 [24], and the Sleep Heart Health Study Visit 1 (SHHS1) [36, 53]). The experimental results indicate that our proposed SIDA method outperforms the compared methods and delivers the best cross-subject sleep stage classification results. In conclusion, the primary contributions of this study can be summarized as follows:
— We clearly define the SDDC concept and open up the idea of handling the category-level subject dependence challenge from the perspective of transfer learning.
— We propose the SIDA method, a domain generalization method that realizes category-by-category subject dependency alignment and achieves direct soft weighting between the classifier and the discriminators.
— Our proposed SIDA method is plug-and-play and can easily be combined with existing methods. Extensive experiments on three public sleep stage classification datasets demonstrate that existing sleep stage classification methods are improved by combining them with our SIDA method.
3 Preliminaries and Motivation
3.1 Sleep Stage Classification Problem
PSG is often employed to record various human body electrical signals during sleep. It contains multi-channel EEG, ECG, EOG, and EMG signals. For sleep stage classification, the PSG signal can be segmented into multi-channel segments of 30-second epochs. According to the AASM standard, sleep is divided into five stages: Wake, REM, N1, N2, and N3, corresponding to the five categories in sleep stage classification.
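To make the epoch segmentation concrete, the following minimal Python sketch splits a continuous multi-channel recording into 30-second epochs; the array layout, sampling rate, and function name are illustrative assumptions rather than part of any dataset's actual preprocessing pipeline.

```python
# Minimal sketch of 30-second epoch segmentation (NumPy only); the array shape,
# sampling rate, and variable names are illustrative assumptions.
import numpy as np

def segment_into_epochs(psg: np.ndarray, fs: int, epoch_sec: int = 30) -> np.ndarray:
    """Split a continuous multi-channel recording (channels, samples)
    into consecutive epochs of `epoch_sec` seconds each."""
    samples_per_epoch = fs * epoch_sec
    n_epochs = psg.shape[1] // samples_per_epoch          # drop the trailing partial epoch
    trimmed = psg[:, : n_epochs * samples_per_epoch]
    # -> (n_epochs, channels, samples_per_epoch)
    return trimmed.reshape(psg.shape[0], n_epochs, samples_per_epoch).transpose(1, 0, 2)

# Example: 10 channels sampled at 200 Hz for one hour of sleep.
psg = np.random.randn(10, 200 * 3600)
epochs = segment_into_epochs(psg, fs=200)
print(epochs.shape)  # (120, 10, 6000): 120 thirty-second epochs
```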
Sleep stage classification aims to make the model learn the mapping relationship between the input signal and the sleep stage category. The sleep stage classification problem is defined as \(\hat{y}_i = G_y(G_f(x_i))\), i.e., building a sleep stage classification model for the input sample \(x_i\), where \(G_f\) is the feature extractor and \(G_y\) is the label classifier. Given the input signal sequence \(\mathcal{S} = (S_{i-d},\ \ldots,\ S_i,\ \ldots,\ S_{i+d}) \in \mathbb{R}^{N \times T_n \times T_s}\), where \(N\) denotes the number of channels, \(T_s\) denotes the time-series length of each epoch, and \(T_n = 2d+1\) denotes the number of neighbouring epochs, \(\mathcal{S}\) represents the temporal context of \(S_i\). The classification model jointly predicts the stage of the \(i\)th epoch according to the transition characteristics of sleep stage rules [9]. Features of each sleep epoch are pre-extracted by the dual-channel FeatureNet [22], and the \(N\)-channel feature matrix of the \(i\)th epoch is defined as \(X_i = (x^1_i,\ x^2_i,\ \ldots,\ x^N_i)^T \in \mathbb{R}^{N \times F}\), where \(x_i^n \in \mathbb{R}^F,\ n \in \lbrace 1, 2, \ldots, N\rbrace\), denotes the features pre-extracted from channel \(n\) at epoch \(i\). Features are sometimes preprocessed by bandpass filters according to the frequency distribution of the different signals; however, current sleep stage classification methods generally use the full unfiltered features.
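The following minimal PyTorch sketch illustrates the \(\hat{y}_i = G_y(G_f(\cdot))\) decomposition operating on a batch of pre-extracted \(N \times F\) feature matrices; the layer sizes and module definitions are illustrative assumptions and do not reproduce FeatureNet or the architecture used in this work.

```python
# A minimal PyTorch sketch of the y_hat = G_y(G_f(x)) formulation for one epoch,
# operating on the pre-extracted N x F feature matrix X_i; sizes are assumptions.
import torch
import torch.nn as nn

N_CHANNELS, FEAT_DIM, N_STAGES = 10, 256, 5     # N, F, and the five sleep stages

class FeatureExtractor(nn.Module):              # G_f
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Flatten(),                        # (B, N, F) -> (B, N*F)
            nn.Linear(N_CHANNELS * FEAT_DIM, 128),
            nn.ReLU(),
        )
    def forward(self, x):
        return self.net(x)

class LabelClassifier(nn.Module):                # G_y
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(128, N_STAGES)
    def forward(self, h):
        return self.fc(h)                        # stage logits

G_f, G_y = FeatureExtractor(), LabelClassifier()
X_i = torch.randn(32, N_CHANNELS, FEAT_DIM)      # a batch of feature matrices X_i
logits = G_y(G_f(X_i))
print(logits.shape)                              # torch.Size([32, 5])
```

In the actual pipeline the temporal context \(\mathcal{S}\) of \(T_n\) neighbouring epochs is fed jointly, but the single-epoch case above suffices to fix the tensor shapes.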
3.2 Domain Generalization
Suppose we have \(M\) subjects. We randomly divide the \(M\) subjects into \(M^{\prime}\) groups, where \(M^{\prime} = \lfloor \frac{M}{num} \rfloor\) and \(num\) is the number of subjects in each group. Group \(m^{\prime} = \lbrace m_1^{\prime}, \ldots, m_{num}^{\prime}\rbrace\), where \(\lbrace m_1^{\prime}, \ldots, m_{num}^{\prime}\rbrace\) is sampled at random without replacement from the set \(\lbrace 1, \ldots, M\rbrace\). The data of the \(M^{\prime}\) groups constitute \(M^{\prime}\) domains (i.e., \(\mathcal{D}_{m^{\prime}} = \lbrace (x_{m^{\prime},k}, y_{m^{\prime},k}) \mid k \in \lbrace 1, \ldots, K\rbrace \rbrace\), where \(K\) denotes the number of samples of \(\mathcal{D}_{m^{\prime}}\)), and the joint distributions of each pair of domains are different (i.e., \(P^{j_1}_{XY} \ne P^{j_2}_{XY},\ 1 \le j_1 \ne j_2 \le M^{\prime}\)). Cross-subject classification is then the following process. Suppose the samples of the first \(M^{\prime}-1\) domains constitute \(\mathcal{D}_{train} = \lbrace \mathcal{D}_1, \ldots, \mathcal{D}_{M^{\prime}-1} \rbrace = \lbrace (x_j, y_j, d_j) \mid j \in \lbrace 1, \ldots, J\rbrace \rbrace\), where \(x_j\) denotes the training sample (i.e., the pre-extracted feature), \(y_j\) denotes the sleep stage label, \(d_j \in \lbrace 1, \ldots, M^{\prime}-1\rbrace\) denotes the subject domain label, and \(J\) is the total number of samples across the \(M^{\prime}-1\) domains. The samples of the \(M^{\prime}\)th domain constitute \(\mathcal{D}_{test} = \mathcal{D}_{M^{\prime}} = \lbrace (x_j^{te}, y_j^{te}, d_j^{te})\rbrace\), where \(x_j^{te}\) denotes the test sample (i.e., the pre-extracted feature), \(y_j^{te}\) denotes the sleep stage label, and \(d_j^{te}\) denotes the subject domain label.
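The following sketch illustrates, under simplified assumptions, how subjects can be grouped into domains and how the held-out domain forms the test set; the `records` structure, group size, and function names are hypothetical and only mirror the notation above.

```python
# A minimal sketch of the subject-to-domain grouping and the cross-subject split
# described above; data structures and names are illustrative assumptions.
import random

def make_domains(subject_ids, num, seed=0):
    """Randomly partition subjects into groups of size `num`; each group is one domain."""
    ids = list(subject_ids)
    random.Random(seed).shuffle(ids)
    n_groups = len(ids) // num                          # M' = floor(M / num)
    return [ids[g * num:(g + 1) * num] for g in range(n_groups)]

def cross_subject_split(records, domains, test_domain):
    """records: dict subject_id -> list of (x, y) samples.
    Returns train triples (x, y, d) from the seen domains and test triples
    from the single held-out domain."""
    train, test = [], []
    for d, group in enumerate(domains):
        for sid in group:
            for x, y in records[sid]:
                (test if d == test_domain else train).append((x, y, d))
    return train, test

domains = make_domains(range(10), num=2)                # M = 10 subjects, 5 domains
records = {sid: [(f"x_{sid}_{k}", k % 5) for k in range(3)] for sid in range(10)}
train_set, test_set = cross_subject_split(records, domains, test_domain=4)
print(len(train_set), len(test_set))                    # 24 6
```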
3.3 Motivation
We aim to enhance our model's cross-subject sleep stage classification robustness through domain generalization. Domain generalization eliminates differences between domains (i.e., subjects) through domain alignment. The alignment process aims to align all data of each domain without distinction. However, the biggest challenge in classification tasks is always the category difference, as different sleep stage categories have different subject dependencies. As illustrated in Figure 2, different shapes represent different sleep stage categories, and different colors represent different subject domains. When aligning subject data, if data of the same category are correctly aligned (i.e., the green box in the figure, a positive transfer), the alignment enhances the model's cross-subject generalization and improves its classification accuracy. However, if data from different categories are incorrectly aligned (i.e., the red box in the figure, a negative transfer), the alignment severely degrades the model's classification accuracy. Inspired by the subject dependency differences across sleep stage categories, we aim to align subjects in a fine-grained, category-by-category way. Fortunately, the sleep stage classification problem for subject generalization has a category-complete and recurrent structure with domain supervision information. This structure motivates us to propose the category-specific domain adversarial method SIDA.
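As a conceptual illustration of category-specific domain adversarial training, the sketch below attaches one subject-domain discriminator per sleep stage behind a gradient reversal layer. All module sizes, the routing of samples by ground-truth labels, and the loss weighting are simplifying assumptions for exposition; they should not be read as the actual SIDA implementation, which is detailed in the following sections.

```python
# Conceptual PyTorch sketch of category-specific domain adversarial training
# (one subject-domain discriminator per sleep stage); sizes and routing are
# illustrative assumptions, not the authors' implementation.
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, lamb):
        ctx.lamb = lamb
        return x.view_as(x)
    @staticmethod
    def backward(ctx, grad):
        return -ctx.lamb * grad, None        # reverse gradients toward the extractor

N_STAGES, N_DOMAINS, HID = 5, 9, 128

feature_extractor = nn.Sequential(nn.Linear(256, HID), nn.ReLU())   # G_f
stage_classifier = nn.Linear(HID, N_STAGES)                          # G_y
# One domain discriminator per sleep stage category.
domain_discriminators = nn.ModuleList(
    [nn.Linear(HID, N_DOMAINS) for _ in range(N_STAGES)]
)

def losses(x, y, d, lamb=0.1):
    h = feature_extractor(x)
    stage_logits = stage_classifier(h)
    cls_loss = nn.functional.cross_entropy(stage_logits, y)
    # Adversarial loss: each stage's samples go to that stage's own discriminator.
    adv_loss = h.new_zeros(())
    h_rev = GradReverse.apply(h, lamb)
    for c in range(N_STAGES):
        mask = y == c
        if mask.any():
            dom_logits = domain_discriminators[c](h_rev[mask])
            adv_loss = adv_loss + nn.functional.cross_entropy(dom_logits, d[mask])
    return cls_loss + adv_loss

x = torch.randn(16, 256)
y = torch.randint(0, N_STAGES, (16,))
d = torch.randint(0, N_DOMAINS, (16,))
print(losses(x, y, d).item())
```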