4.1 sEMG Datasets
We test our proposed training scheme on four online datasets containing EMG recordings of different hand gestures. The first three were acquired with the Myo armband, a device developed by Thalmic Labs equipped with eight sEMG sensors arranged circularly, while the last one was acquired using 10 active double-differential Otto Bock MyoBock 13E200 sEMG electrodes.
7Myo-sEMG. The first dataset, detailed in Reference [21], contains EMG signals characterizing 7 hand gestures corresponding to the primary movements of the hand. There are four mobility gestures (i.e., wrist flexion, wrist extension, ulnar deviation, and radial deviation) and two gestures used for grasping and releasing objects (i.e., spread fingers and close fist). The 7th gesture characterizes the neutral position, corresponding to the relaxation of the muscles.
13Myo-sEMG. The second dataset includes 13 gestures: the same 7 gestures described above plus 6 additional classes. It contains gestures from 50 different subjects, with two sets of trials per user. All 13 gestures are depicted in Figure 2. More details about the dataset can be found in Reference [2].
NinaPro DB5.C. The third dataset is a subset of the NinaPro DB5 dataset, detailed in Reference [47]. The dataset was acquired using two Myo armbands, one positioned just below the elbow and the other one closer to the arm. For our experiments, we considered subset C, which contains sEMG data associated with 24 gestures.
NinaPro DB1. The fourth dataset was introduced in Reference [5] and encompasses physiological data acquired from 27 able-bodied subjects performing a total of 53 different gestures. The sEMG data are recorded using 10 electrodes, positioned as follows: the first eight electrodes are evenly distributed around the forearm using an elastic band, at a consistent distance from the radio-humeral joint located directly below the elbow, and two more electrodes are placed on the major flexor and extensor muscles of the forearm.
We also validate our models in a real-context scenario. For these real-life predictions, we recorded the EMG activity associated with each gesture at forearm level using the Myo armband. The signal collected from each channel is transmitted to a computer via the Bluetooth protocol, where it is processed to extract the relevant time-domain features that are then used by the classifier to determine which gesture has been performed.
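As an illustration of this feature-extraction step, the following minimal Python sketch computes four time-domain features that are classical in sEMG-based gesture recognition (mean absolute value, root mean square, zero crossings, and waveform length) over one window of a single Myo channel. The window length, threshold, and function names are our own illustrative choices, not a specification of the deployed pipeline.

```python
import numpy as np

def time_domain_features(window: np.ndarray, eps: float = 1e-2) -> np.ndarray:
    """Classical time-domain features from one sEMG window.

    window: 1-D array of samples from a single channel.
    eps: amplitude threshold rejecting noise-induced zero crossings.
    """
    mav = np.mean(np.abs(window))                      # mean absolute value
    rms = np.sqrt(np.mean(window ** 2))                # root mean square
    # Zero crossings: sign changes whose amplitude jump exceeds eps.
    zc = np.sum((window[:-1] * window[1:] < 0)
                & (np.abs(window[:-1] - window[1:]) > eps))
    wl = np.sum(np.abs(np.diff(window)))               # waveform length
    return np.array([mav, rms, zc, wl])

# Example: a 200-sample window, i.e., 1 s at the Myo's 200 Hz sEMG rate.
features = time_domain_features(np.random.randn(200))
```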
4.3 Performance Analysis in Terms of Accuracy and Robustness
Our best AGR system trained conventionally achieves state-of-the-art performance [2, 30, 41]: over 99% accuracy for the first two datasets, around 86% for the 24-gesture dataset [23, 46], and around 88.5% for the 53-gesture dataset [55]. A more detailed comparison with other recent sEMG-based AGR systems is presented in Table 1, where we report results of recent works proposing neural-network solutions on the same datasets.
Since, in this case, the weights are not guaranteed to be positive, the lower bound introduced in Proposition 2.2 does not constitute a valid Lipschitz constant. Computing the exact Lipschitz constant \(\theta _m\) of the system is a very difficult task [18], but we can easily bound \(\theta _m\) between the estimate given by (6) and the spectral norm of the product of all the weight matrices of the network. We found that the resulting upper bound on \(\theta _m\) exceeds \(10^{12}\) for all our baseline models. In addition, while training our models, we faced the problem of overfitting, a challenging issue in the classification of physiological signals.
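For illustration, the numpy sketch below computes a classical pair of estimates bracketing the Lipschitz constant of a feedforward network with 1-Lipschitz activations (e.g., ReLU): the spectral norm of the product of the weight matrices (the gain the network would have if the activations acted as the identity) and the product of the individual spectral norms (a guaranteed upper bound). The tighter estimates used in the paper, such as (6), refine this bracketing; the sketch only conveys the orders of magnitude involved.

```python
import numpy as np

def lipschitz_bracket(weights: list[np.ndarray]) -> tuple[float, float]:
    """Bracket the Lipschitz constant of a feedforward network with
    1-Lipschitz activations. weights = [W_1, ..., W_m], applied as x -> W_i x."""
    # Optimistic estimate: spectral norm of the product of all weights.
    product = weights[0]
    for W in weights[1:]:
        product = W @ product
    lower = float(np.linalg.norm(product, ord=2))
    # Guaranteed upper bound: product of the individual spectral norms.
    upper = float(np.prod([np.linalg.norm(W, ord=2) for W in weights]))
    return lower, upper

rng = np.random.default_rng(0)
Ws = [rng.standard_normal((64, 8)), rng.standard_normal((64, 64)),
      rng.standard_normal((7, 64))]
lo, hi = lipschitz_bracket(Ws)
```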
This suggests that, despite the high performance of the classifiers, their robustness is poorly controlled, leaving the systems vulnerable to adversarial perturbations. A first step towards controlling the Lipschitz constant of the classification algorithm, and implicitly its robustness, is to impose the nonnegativity condition associated with constraint \(\mathcal {D}\). Training under such a nonnegativity constraint has been shown to improve the interpretability of the network operation [13] and acts as a regularizer, reducing overfitting. On the other hand, it can affect the approximation capability of the network and potentially lead to a performance decay. To further study the effect of other regularization techniques from a dual performance-robustness perspective, we trained several models for 1000 iterations using common regularization methods, such as Dropout, \(\ell _1/\ell _2\) regularization, and Batch Normalization. Similar comparisons were also featured in other works, such as Reference [28]. The results for the 7-gesture dataset are summarized in Table 2.
As expected, employing regularization techniques during the training phase improves the overall performance of the baseline classifiers. While the positive impact of regularization on neural network performance through the mitigation of overfitting has been extensively studied and validated, its influence on system robustness remains understudied. It can be observed that Batch Normalization is the most efficient technique from the accuracy viewpoint, but it comes with an increase in the overall Lipschitz constant of the classifier. Training the proposed system subject to the nonnegativity constraint \(\mathcal {D}\) results in an overall accuracy of 96.92%, 95.87%, 84.75%, and 85.65% for the 7-, 13-, 24-, and 53-class cases, respectively. The performance decay is balanced by an increase in robustness, since the Lipschitz constant, computed as indicated in Proposition 2.2, equals \(\theta _m = 9.69\times 10^{10}\) for 7 classes, \(\theta _m = 9.73 \times 10^{10}\) for 13 classes, \(\theta _m = 1.03 \times 10^{11}\) for 24 classes, and \(\theta _m = 8.4 \times 10^{10}\) for 53 classes. We observed that the accuracy reduction can be overcome by adding layers to the architecture. Indeed, we were able to obtain an accuracy similar to the baseline by adding an extra layer to the existing architecture and retraining both systems subject to \(\mathcal {D}\), i.e., 98.68%, 97.21%, 85.12%, and 87.03% for the 7-gesture, 13-gesture, 24-gesture, and 53-gesture datasets, respectively. Furthermore, we managed to maintain a high performance while improving the robustness with respect to unconstrained training, i.e., \(\theta _m = 1.02 \times 10^{11}\) for the 7-class dataset, \(\theta _m = 9.96 \times 10^{10}\) for the 13-class dataset, \(\theta _m = 4.24 \times 10^{11}\) for the 24-class dataset, and \(\theta _m = 3.15 \times 10^{11}\) for the 53-class dataset. We can nevertheless conclude from these tests that imposing the nonnegativity of the weights is not sufficient to reach a satisfactory robustness.
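For concreteness, projecting the weights onto the nonnegative orthant after each update amounts to a simple clamping step. The PyTorch-style sketch below illustrates this projected-gradient pattern; the architecture, optimizer, and loop structure are illustrative assumptions, not the exact procedure of Algorithm 1.

```python
import torch

def project_nonnegative(model: torch.nn.Module) -> None:
    """Project every weight matrix onto the nonnegative orthant (constraint D)."""
    with torch.no_grad():
        for name, param in model.named_parameters():
            if "weight" in name:        # biases are left unconstrained
                param.clamp_(min=0.0)

# Illustrative projected-gradient loop (model/optimizer are assumptions).
model = torch.nn.Sequential(torch.nn.Linear(32, 64), torch.nn.ReLU(),
                            torch.nn.Linear(64, 7))
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)
for x, y in [(torch.randn(16, 32), torch.randint(0, 7, (16,)))]:
    optimizer.zero_grad()
    loss = torch.nn.functional.cross_entropy(model(x), y)
    loss.backward()
    optimizer.step()
    project_nonnegative(model)  # projection step after each update
```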
To further control the robustness of the systems, we have to manage the Lipschitz constant of the networks by training them under additional spectral norm constraints, as described by Equation (11). Searching for the optimal accuracy-robustness tradeoff, we trained several models considering each of the aforementioned constraints, namely \((\mathcal {C}_{i,n})_{1\le i \le m,n\in \mathbb {N}}\) in Equation (12), \((\widetilde{\mathcal {C}}_{i})_{1\le i \le m}\) in Equation (20), and \(({\check{\mathcal {C}}}_{i,n})_{1\le i \le m,n\in \mathbb {N}}\) in Equation (10).
By adjusting the upper bound \(\overline{\vartheta }\), we were able to assess the effect of a robustness constraint on the overall performance of the neural network-based classifiers and, finally, to achieve the optimal tradeoff. All our models were trained using Algorithm 1 as the optimizer.
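A standard building block for such spectral norm constraints is the exact projection onto a spectral-norm ball, obtained by clipping the singular values of the weight matrix (this is the Frobenius-optimal projection). The numpy sketch below illustrates this construction; the full projections onto the intersected constraint sets handled by Algorithm 1 are more involved.

```python
import numpy as np

def project_spectral_ball(W: np.ndarray, theta: float) -> np.ndarray:
    """Exact (Frobenius-optimal) projection of W onto {X : ||X||_2 <= theta},
    obtained by clipping the singular values of W at theta."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    return U @ np.diag(np.minimum(s, theta)) @ Vt

W = np.random.randn(64, 32)
W_proj = project_spectral_ball(W, theta=0.95)
assert np.linalg.norm(W_proj, ord=2) <= 0.95 + 1e-9
```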
The obtained results are summarized in Table 3. As expected, obtaining a good robustness-accuracy tradeoff requires paying attention to the design of the constrained networks. In all cases, we show that using tight constraints during the training phase to approximate the Lipschitz bound improves the overall performance of the classifier, confirming the generalization properties of our solution.
For comparison, for each of the proposed constraints, we also evaluated the use of an inexact projection, denoted by \(\widetilde{\mathsf {P}}\) (see Section 3.3). It can be observed that using an exact projection yields significantly better results. By combining tight constraints and exact projection techniques, the robustness of the network can be properly ensured while keeping a good accuracy. Indeed, we succeeded in ensuring a Lipschitz constant around 1 for a 95% accuracy on the first two datasets. The observed loss in accuracy with respect to standard training is consistent with the “no free lunch” theorem [54].
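For intuition on why an inexact projection can be cheaper yet coarser, a common surrogate simply rescales the whole matrix whenever its spectral norm exceeds the bound, shrinking all singular values uniformly instead of clipping only the offending ones. We stress that the sketch below is an assumed stand-in for illustration only; the operator \(\widetilde{\mathsf {P}}\) actually used is the one defined in Section 3.3.

```python
import numpy as np

def inexact_projection(W: np.ndarray, theta: float) -> np.ndarray:
    """Cheap surrogate for the spectral-ball projection: rescale the whole
    matrix when ||W||_2 > theta. Unlike exact singular-value clipping, ALL
    singular values are shrunk, perturbing W more than necessary."""
    sigma_max = np.linalg.norm(W, ord=2)
    return W if sigma_max <= theta else W * (theta / sigma_max)
```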
Training neural networks subject to tight spectral norm constraints can be challenging, and the price of a good performance is the training time. We used a learning rate scheduler during training, reducing the learning rate by a factor of 2 if the performance does not improve for 1000 epochs. Figure 4 shows the training curves on both the validation and training sets for the unconstrained baseline model (yellow and green lines) and for a constrained version (red and blue lines) trained using the optimal projection \(\mathsf {P}_{\mathcal {C}_{i,n} \cap \mathcal {D}_i}\), with \(\overline{\vartheta }_m = 0.95\). Even though it requires more iterations, the constrained model is capable of reaching an accuracy comparable to the baseline, while providing a robustness certificate.
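In PyTorch terms, such a schedule corresponds to a plateau-based scheduler; a minimal sketch follows, where the placeholder model, the monitored metric, and the remaining hyperparameters are assumptions made for illustration.

```python
import torch

model = torch.nn.Linear(32, 7)                      # placeholder model
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
# Halve the learning rate once the monitored accuracy has not improved
# for 1000 consecutive epochs.
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode="max", factor=0.5, patience=1000)

for epoch in range(3000):
    val_accuracy = 0.9                              # stand-in for a real evaluation
    scheduler.step(val_accuracy)                    # updates the plateau logic
```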
Since the training curves may show slight variations, we measured the accuracy variations in two ways: by computing the classical standard deviation (\(\operatorname{std}\)) and by employing the median absolute deviation (\(\operatorname{mad}\)). For a vector \((x_i)_{1\le i \le I}\), the latter is expressed as \(\operatorname{MAD} = \operatorname{median}\big((| x_i - \zeta (x)|)_{1\le i \le I}\big)\), where \(\zeta (x)\) denotes the median of the vector components. From this quantity, an empirical estimate of the standard deviation is obtained by multiplying \(\operatorname{MAD}\) by a factor of 1.4826. For Gaussian distributed data, the latter estimate is known to be more robust to outliers, especially in the case of small populations. The results are summarized in Table 4. It can be observed that the empirical standard deviation is below 1.6% and its robust estimate is below 1.1% for all four datasets. These deviation values are normal considering the size of the datasets and show that the presented results are relevant and consistent.
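A small numpy illustration of the two dispersion measures (the scale factor 1.4826, approximately \(1/\Phi ^{-1}(3/4)\), makes the MAD estimate consistent with the standard deviation under Gaussianity); the accuracy values are made up for the example.

```python
import numpy as np

def robust_std(x: np.ndarray) -> float:
    """MAD-based estimate of the standard deviation (Gaussian-consistent)."""
    mad = np.median(np.abs(x - np.median(x)))
    return 1.4826 * mad

accuracies = np.array([96.9, 97.1, 96.5, 97.0, 80.0])  # one outlier run
classical = np.std(accuracies)        # inflated by the outlier
robust = robust_std(accuracies)       # barely affected by it
```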
Next, we also evaluated how the positivity constraint impacts the overall accuracy of our system. We trained a robust network by allowing the weights to have arbitrary signs. For this purpose, we individually constrain the Lipschitz constant of each layer \(i \in \lbrace 1, \dots , m\rbrace\) to be less than a given value \(\overline{\vartheta }^{1/m}\). The exact projection onto \(\widetilde{\mathcal {C}}_i\), \(\mathsf {P}_{\widetilde{\mathcal {C}}_i}\), as well as the approximate one, \(\widetilde{\mathsf {P}}_{\widetilde{\mathcal {C}}_i}\), were computed as described previously. In this case, \(\overline{\vartheta }\) represents an upper bound on the Lipschitz constant of the whole system. Table 5 summarizes the results obtained for different values of \(\overline{\vartheta }\) on two datasets. We compare our method for handling Lipschitz constraints with the approach proposed in Reference [51]. That approach, implemented in the deel-lip library, allows the user to train robust networks in a convenient manner, offering a robustness certificate by performing a spectral normalization of each layer. It can be observed on these datasets that our method yields similar results when using the approximate projection, and better ones when using the exact projection. These results underline once more the importance of carefully managing the projections and their effect on the accuracy of the system.
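To make the per-layer budget concrete: if each of the \(m\) layers is constrained to a spectral norm of at most \(\overline{\vartheta }^{1/m}\), the product of the layer norms, and hence the Lipschitz bound of a network with 1-Lipschitz activations, cannot exceed \(\overline{\vartheta }\). The numpy sketch below combines this budget with the exact singular-value clipping shown earlier; the layer sizes are arbitrary choices for the example.

```python
import numpy as np

def project_spectral_ball(W: np.ndarray, theta: float) -> np.ndarray:
    """Exact projection onto {X : ||X||_2 <= theta} by singular-value clipping."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    return U @ np.diag(np.minimum(s, theta)) @ Vt

def constrain_network(weights: list[np.ndarray], theta_bar: float) -> list[np.ndarray]:
    """Give each of the m layers an equal share theta_bar**(1/m) of the budget."""
    per_layer = theta_bar ** (1.0 / len(weights))
    return [project_spectral_ball(W, per_layer) for W in weights]

rng = np.random.default_rng(1)
Ws = [rng.standard_normal((64, 8)), rng.standard_normal((64, 64)),
      rng.standard_normal((7, 64))]
Ws_c = constrain_network(Ws, theta_bar=1.0)
# The product of the per-layer norms certifies the global bound.
assert np.prod([np.linalg.norm(W, ord=2) for W in Ws_c]) <= 1.0 + 1e-9
```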