4.1 sEMG Datasets
We test our proposed training scheme on four online datasets containing EMG recordings of different hand gestures. The first three were acquired with the Myo armband, a device developed by Thalmic Labs equipped with eight sEMG sensors arranged circularly, while the last one was acquired using 10 active double-differential Otto Bock MyoBock 13E200 sEMG electrodes.
7Myo-sEMG. The first dataset, detailed in Reference [21], contains EMG signals characterizing 7 hand gestures corresponding to the primary movements of the hand. There are four mobility gestures (i.e., wrist flexion, wrist extension, ulnar deviation, and radial deviation) and two gestures used for grasping and releasing objects (i.e., spread fingers and close fist). The 7th gesture characterizes the neutral position, corresponding to the relaxation of the muscles.
13Myo-sEMG. The second dataset includes 13 gestures: the same 7 gestures described above plus 6 additional classes. It contains gestures from 50 different subjects, with two sets of trials per user. All 13 gestures are depicted in Figure 2. More details about the dataset can be found in Reference [2].
NinaPro DB5.C. The third dataset is a subset of the NinaPro DB5 dataset, detailed in Reference [47]. The dataset was acquired using two Myo armbands, one positioned just below the elbow and the other one closer to the arm. For our experiments, we considered subset C, which contains sEMG data associated with 24 gestures.
NinaPro DB1. The fourth dataset was introduced in Reference [5] and encompasses physiological data acquired from 27 able-bodied subjects performing a total of 53 different gestures. The sEMG data are recorded using 10 electrodes, positioned as follows: the first eight electrodes are evenly distributed around the forearm using an elastic band, at a consistent distance from the radio-humeral joint located directly below the elbow, and two more electrodes are placed on the major flexor and extensor muscles of the forearm.
We also validate our models in a real-context scenario. For these real-life predictions, we recorded the EMG activity associated with each gesture at forearm level using the Myo armband. The signal collected from each channel is transmitted to a computer via the Bluetooth protocol, where it is processed to extract the relevant time-domain features that are then used by the classifier to determine which gesture has been performed.
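As an illustration of this feature-extraction step, the following minimal Python sketch computes four time-domain features that are classical in sEMG-based gesture recognition (mean absolute value, root mean square, zero crossings, and waveform length) over one window of a single Myo channel. The window length, threshold, and function names are our own illustrative choices, not a specification of the deployed pipeline.

```python
import numpy as np

def time_domain_features(window: np.ndarray, eps: float = 1e-2) -> np.ndarray:
    """Classical time-domain features from one sEMG window.

    window: 1-D array of samples from a single channel.
    eps: amplitude threshold rejecting noise-induced zero crossings.
    """
    mav = np.mean(np.abs(window))                      # mean absolute value
    rms = np.sqrt(np.mean(window ** 2))                # root mean square
    # Zero crossings: sign changes whose amplitude jump exceeds eps.
    zc = np.sum((window[:-1] * window[1:] < 0)
                & (np.abs(window[:-1] - window[1:]) > eps))
    wl = np.sum(np.abs(np.diff(window)))               # waveform length
    return np.array([mav, rms, zc, wl])

# Example: a 200-sample window, i.e., 1 s at the Myo's 200 Hz sEMG rate.
features = time_domain_features(np.random.randn(200))
```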
4.3 Performance Analysis in Terms of Accuracy and Robustness
Our best AGR system trained conventionally achieves state-of-the-art performance [2, 30, 41]: over 99% accuracy for the first two datasets, around 86% for the 24-gesture dataset [23, 46], and around 88.5% for the 53-gesture dataset [55]. A more detailed comparison with other recent sEMG-based AGR systems is presented in Table 1, where we report results of recent works proposing neural-network solutions on the same datasets.
Since, in this case, the weights are not guaranteed to be positive, the lower bound introduced in Proposition 2.2 does not constitute a valid Lipschitz constant. Computing the exact Lipschitz constant \(\theta _m\) of the system is a very difficult task [18], but we can easily bound \(\theta _m\) between the estimate given by (6) and the spectral norm of the product of all the weight matrices of the network. We found that the resulting upper bound on \(\theta _m\) exceeds \(10^{12}\) for all our baseline models. In addition, while training our models, we faced the problem of overfitting, a challenging issue in the classification of physiological signals.
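For illustration, the numpy sketch below computes a classical pair of estimates bracketing the Lipschitz constant of a feedforward network with 1-Lipschitz activations (e.g., ReLU): the spectral norm of the product of the weight matrices (the gain the network would have if the activations acted as the identity) and the product of the individual spectral norms (a guaranteed upper bound). The tighter estimates used in the paper, such as (6), refine this bracketing; the sketch only conveys the orders of magnitude involved.

```python
import numpy as np

def lipschitz_bracket(weights: list[np.ndarray]) -> tuple[float, float]:
    """Bracket the Lipschitz constant of a feedforward network with
    1-Lipschitz activations. weights = [W_1, ..., W_m], applied as x -> W_i x."""
    # Optimistic estimate: spectral norm of the product of all weights.
    product = weights[0]
    for W in weights[1:]:
        product = W @ product
    lower = float(np.linalg.norm(product, ord=2))
    # Guaranteed upper bound: product of the individual spectral norms.
    upper = float(np.prod([np.linalg.norm(W, ord=2) for W in weights]))
    return lower, upper

rng = np.random.default_rng(0)
Ws = [rng.standard_normal((64, 8)), rng.standard_normal((64, 64)),
      rng.standard_normal((7, 64))]
lo, hi = lipschitz_bracket(Ws)
```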
This suggests that, despite the high performance of the classifiers, their robustness is poorly controlled, leaving the systems vulnerable to adversarial perturbations. A first step towards controlling the Lipschitz constant of the classification algorithm, and implicitly its robustness, is to impose the nonnegativity condition associated with constraint \(\mathcal {D}\). Training under such a nonnegativity constraint has been shown to improve the interpretability of the network operation [13] and acts as a regularizer, reducing overfitting. On the other hand, it can affect the approximation capability of the network and potentially lead to a performance decay. To further study the effect of other regularization techniques from a dual performance-robustness perspective, we trained several models for 1000 iterations using common regularization methods, such as Dropout, \(\ell _1/\ell _2\) regularization, and Batch Normalization. Similar comparisons were also featured in other works, such as Reference [28]. The results for the 7-gesture dataset are summarized in Table 2.
As expected, employing regularization techniques during the training phase improves the overall performance of the baseline classifiers. While the positive impact of regularization on neural network performance through the mitigation of overfitting has been extensively studied and validated, its influence on system robustness remains understudied. It can be observed that Batch Normalization is the most efficient technique from the accuracy viewpoint, but it comes with an increase in the overall Lipschitz constant of the classifier. Training the proposed system subject to the nonnegativity constraint \(\mathcal {D}\) results in an overall accuracy of 96.92%, 95.87%, 84.75%, and 85.65% for the 7-, 13-, 24-, and 53-class cases, respectively. The performance decay is balanced by an increase in robustness, since the Lipschitz constant, computed as indicated in Proposition 2.2, equals \(\theta _m = 9.69\times 10^{10}\) for 7 classes, \(\theta _m = 9.73 \times 10^{10}\) for 13 classes, \(\theta _m = 1.03 \times 10^{11}\) for 24 classes, and \(\theta _m = 8.4 \times 10^{10}\) for 53 classes. We observed that the accuracy reduction can be overcome by adding layers to the architecture. Indeed, we were able to obtain an accuracy similar to the baseline by adding an extra layer to the existing architecture and retraining both systems subject to \(\mathcal {D}\), i.e., 98.68%, 97.21%, 85.12%, and 87.03% for the 7-gesture, 13-gesture, 24-gesture, and 53-gesture datasets, respectively. Furthermore, we managed to maintain a high performance while improving the robustness with respect to unconstrained training, i.e., \(\theta _m = 1.02 \times 10^{11}\) for the 7-class dataset, \(\theta _m = 9.96 \times 10^{10}\) for the 13-class dataset, \(\theta _m = 4.24 \times 10^{11}\) for the 24-class dataset, and \(\theta _m = 3.15 \times 10^{11}\) for the 53-class dataset. We can nevertheless conclude from these tests that imposing the nonnegativity of the weights is not sufficient to reach a satisfactory robustness.
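For concreteness, projecting the weights onto the nonnegative orthant after each update amounts to a simple clamping step. The PyTorch-style sketch below illustrates this projected-gradient pattern; the architecture, optimizer, and loop structure are illustrative assumptions, not the exact procedure of Algorithm 1.

```python
import torch

def project_nonnegative(model: torch.nn.Module) -> None:
    """Project every weight matrix onto the nonnegative orthant (constraint D)."""
    with torch.no_grad():
        for name, param in model.named_parameters():
            if "weight" in name:        # biases are left unconstrained
                param.clamp_(min=0.0)

# Illustrative projected-gradient loop (model/optimizer are assumptions).
model = torch.nn.Sequential(torch.nn.Linear(32, 64), torch.nn.ReLU(),
                            torch.nn.Linear(64, 7))
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)
for x, y in [(torch.randn(16, 32), torch.randint(0, 7, (16,)))]:
    optimizer.zero_grad()
    loss = torch.nn.functional.cross_entropy(model(x), y)
    loss.backward()
    optimizer.step()
    project_nonnegative(model)  # projection step after each update
```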
To further control the robustness of the systems, we have to manage the Lipschitz constant of the networks by training them under additional spectral norm constraints, as described by Equation (11). Searching for the optimal accuracy-robustness tradeoff, we trained several models considering each of the aforementioned constraints, namely \((\mathcal {C}_{i,n})_{1\le i \le m,n\in \mathbb {N}}\) in Equation (12), \((\widetilde{\mathcal {C}}_{i})_{1\le i \le m}\) in Equation (20), and \(({\check{\mathcal {C}}}_{i,n})_{1\le i \le m,n\in \mathbb {N}}\) in Equation (10).
By adjusting the upper bound \(\overline{\vartheta }\), we were able to assess the effect of a robustness constraint on the overall performance of the neural network-based classifiers and, finally, to achieve the optimal tradeoff. All our models were trained using Algorithm 1 as the optimizer.
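A standard building block for such spectral norm constraints is the exact projection onto a spectral-norm ball, obtained by clipping the singular values of the weight matrix (this is the Frobenius-optimal projection). The numpy sketch below illustrates this construction; the full projections onto the intersected constraint sets handled by Algorithm 1 are more involved.

```python
import numpy as np

def project_spectral_ball(W: np.ndarray, theta: float) -> np.ndarray:
    """Exact (Frobenius-optimal) projection of W onto {X : ||X||_2 <= theta},
    obtained by clipping the singular values of W at theta."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    return U @ np.diag(np.minimum(s, theta)) @ Vt

W = np.random.randn(64, 32)
W_proj = project_spectral_ball(W, theta=0.95)
assert np.linalg.norm(W_proj, ord=2) <= 0.95 + 1e-9
```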
The obtained results are summarized in Table 3. As expected, obtaining a good robustness-accuracy tradeoff requires paying attention to the design of the constrained networks. In all cases, we show that using tight constraints during the training phase to approximate the Lipschitz bound improves the overall performance of the classifier, confirming the generalization properties of our solution.
For comparison, for each of the proposed constraints, we also evaluated the use of an inexact projection, denoted by \(\widetilde{\mathsf {P}}\) (see Section 3.3). It can be observed that using an exact projection yields significantly better results. By combining tight constraints and exact projection techniques, the robustness of the network can be properly ensured while keeping a good accuracy. Indeed, we succeeded in ensuring a Lipschitz constant around 1 for a 95% accuracy on the first two datasets. The observed loss in accuracy with respect to standard training is consistent with the “no free lunch” theorem [54].
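For intuition on why an inexact projection can be cheaper yet coarser, a common surrogate simply rescales the whole matrix whenever its spectral norm exceeds the bound, shrinking all singular values uniformly instead of clipping only the offending ones. We stress that the sketch below is an assumed stand-in for illustration only; the operator \(\widetilde{\mathsf {P}}\) actually used is the one defined in Section 3.3.

```python
import numpy as np

def inexact_projection(W: np.ndarray, theta: float) -> np.ndarray:
    """Cheap surrogate for the spectral-ball projection: rescale the whole
    matrix when ||W||_2 > theta. Unlike exact singular-value clipping, ALL
    singular values are shrunk, perturbing W more than necessary."""
    sigma_max = np.linalg.norm(W, ord=2)
    return W if sigma_max <= theta else W * (theta / sigma_max)
```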
Training neural networks subject to tight spectral norm constraints can be challenging, and the price of a good performance is the training time. We used a learning rate scheduler during training, reducing the learning rate by a factor of 2 if the performance does not improve for 1000 epochs. Figure 4 shows the training curves on both the validation and training sets for the unconstrained baseline model (yellow and green lines) and for a constrained version (red and blue lines) trained using the optimal projection \(\mathsf {P}_{\mathcal {C}_{i,n} \cap \mathcal {D}_i}\), with \(\overline{\vartheta }_m = 0.95\). Even though it requires more iterations, the constrained model is capable of reaching an accuracy comparable to the baseline, while providing a robustness certificate.
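In PyTorch terms, such a schedule corresponds to a plateau-based scheduler; a minimal sketch follows, where the placeholder model, the monitored metric, and the remaining hyperparameters are assumptions made for illustration.

```python
import torch

model = torch.nn.Linear(32, 7)                      # placeholder model
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
# Halve the learning rate once the monitored accuracy has not improved
# for 1000 consecutive epochs.
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode="max", factor=0.5, patience=1000)

for epoch in range(3000):
    val_accuracy = 0.9                              # stand-in for a real evaluation
    scheduler.step(val_accuracy)                    # updates the plateau logic
```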
Since the training curves may show slight variations, we measured the accuracy variations in two ways: by computing the classical standard deviation (\(\operatorname{std}\)) and by employing the median absolute deviation (\(\operatorname{mad}\)). For a vector \((x_i)_{1\le i \le I}\), the latter is expressed as \(\operatorname{MAD} = \operatorname{median}\big((| x_i - \zeta (x)|)_{1\le i \le I}\big)\), where \(\zeta (x)\) denotes the median of the vector components. From this quantity, an empirical estimate of the standard deviation is obtained by multiplying \(\operatorname{MAD}\) by a factor of 1.4826. For Gaussian distributed data, the latter estimate is known to be more robust to outliers, especially in the case of small populations. The results are summarized in Table 4. It can be observed that the empirical standard deviation is below 1.6% and its robust estimate is below 1.1% for all four datasets. These deviation values are normal considering the size of the datasets and show that the presented results are relevant and consistent.
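A small numpy illustration of the two dispersion measures (the scale factor 1.4826, approximately \(1/\Phi ^{-1}(3/4)\), makes the MAD estimate consistent with the standard deviation under Gaussianity); the accuracy values are made up for the example.

```python
import numpy as np

def robust_std(x: np.ndarray) -> float:
    """MAD-based estimate of the standard deviation (Gaussian-consistent)."""
    mad = np.median(np.abs(x - np.median(x)))
    return 1.4826 * mad

accuracies = np.array([96.9, 97.1, 96.5, 97.0, 80.0])  # one outlier run
classical = np.std(accuracies)        # inflated by the outlier
robust = robust_std(accuracies)       # barely affected by it
```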
Next, we also evaluated how the positivity constraint impacts the overall accuracy of our system. We trained a robust network by allowing the weights to have arbitrary signs. For this purpose, we individually constrain the Lipschitz constant of each layer \(i \in \lbrace 1, \dots , m\rbrace\) to be less than a given value \(\overline{\vartheta }^{1/m}\). The exact projection onto \(\widetilde{\mathcal {C}}_i\), \(\mathsf {P}_{\widetilde{\mathcal {C}}_i}\), as well as the approximate one, \(\widetilde{\mathsf {P}}_{\widetilde{\mathcal {C}}_i}\), were computed as described previously. In this case, \(\overline{\vartheta }\) represents an upper bound on the Lipschitz constant of the whole system. Table 5 summarizes the results obtained for different values of \(\overline{\vartheta }\) on two datasets. We compare our method for handling Lipschitz constraints with the approach proposed in Reference [51]. That approach, implemented in the deel-lip library, allows the user to train robust networks in a convenient manner, offering a robustness certificate by performing a spectral normalization of each layer. It can be observed on these datasets that our method yields similar results when using the approximate projection, and better ones when using the exact projection. These results underline once more the importance of carefully managing the projections and their effect on the accuracy of the system.
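To make the per-layer budget concrete: if each of the \(m\) layers is constrained to a spectral norm of at most \(\overline{\vartheta }^{1/m}\), the product of the layer norms, and hence the Lipschitz bound of a network with 1-Lipschitz activations, cannot exceed \(\overline{\vartheta }\). The numpy sketch below combines this budget with the exact singular-value clipping shown earlier; the layer sizes are arbitrary choices for the example.

```python
import numpy as np

def project_spectral_ball(W: np.ndarray, theta: float) -> np.ndarray:
    """Exact projection onto {X : ||X||_2 <= theta} by singular-value clipping."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    return U @ np.diag(np.minimum(s, theta)) @ Vt

def constrain_network(weights: list[np.ndarray], theta_bar: float) -> list[np.ndarray]:
    """Give each of the m layers an equal share theta_bar**(1/m) of the budget."""
    per_layer = theta_bar ** (1.0 / len(weights))
    return [project_spectral_ball(W, per_layer) for W in weights]

rng = np.random.default_rng(1)
Ws = [rng.standard_normal((64, 8)), rng.standard_normal((64, 64)),
      rng.standard_normal((7, 64))]
Ws_c = constrain_network(Ws, theta_bar=1.0)
# The product of the per-layer norms certifies the global bound.
assert np.prod([np.linalg.norm(W, ord=2) for W in Ws_c]) <= 1.0 + 1e-9
```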