
EMG-Based Automatic Gesture Recognition Using Lipschitz-Regularized Neural Networks

Published: 22 February 2024

Abstract

This article introduces a novel approach for building a robust Automatic Gesture Recognition system based on Surface Electromyographic (sEMG) signals, acquired at the forearm level. Our main contribution is to propose new constrained learning strategies that ensure robustness against adversarial perturbations by controlling the Lipschitz constant of the classifier. We focus on nonnegative neural networks, for which accurate Lipschitz bounds can be derived, and we propose different spectral norm constraints offering robustness guarantees from a theoretical viewpoint. Experimental results on four publicly available datasets highlight that a good tradeoff between accuracy and robustness is achieved. We then demonstrate the robustness of our models, compared with conventionally trained classifiers, in four scenarios, considering both white-box and black-box attacks.

1 Introduction

In recent years, the concept of human–computer interaction (HCI) has been at the core of many scientific and sociological developments. Combined with the power of machine learning algorithms, it has led to some of the most outstanding achievements in today's technology, which are used successfully in an ever-increasing number of areas impacting our lives, e.g., medicine [29], autonomous driving [38], natural language processing [57], and so on. Researchers all around the world focus on providing new intuitive and accurate ways of interacting with surrounding devices, based on gesture, voice, or vision analysis [36]. Gestures constitute a universal and intuitive way of communication, with the potential of bringing the Internet of Things (IoT) experience to a different, more organic level [48]. Automatic gesture recognition (AGR) algorithms can be successfully used in various applications, from sign language recognition (SLR) [12] to VR games [56].
Various solutions for AGR based on image or video stream analysis, leveraging computer vision algorithms, have been proposed; see for example [22, 25, 34]. A multi-stream solution for dynamic hand-gesture recognition is described in Reference [58]. Multi-modal approaches for gesture classification have also been studied [42]. A novel method based on a fully neuromorphic implementation [9] achieves good results (96% accuracy while reducing the inference time by 30%). Although a good performance is achieved on synthetic data, in real-life scenarios these systems may be sensitive to environmental conditions, e.g., lighting conditions, background, and so on. Additionally, these systems are often computationally demanding and consequently not always suited for real-time applications. Accelerometers and electromyography (EMG) sensors provide an alternative low-cost technology for gesture sensing [30]. sEMG stands for surface EMG and represents the electrical manifestation of the neuromuscular activation related to the contraction of the muscles [4]. In Reference [46], the authors propose a method combining feature selection with ensemble learning, achieving around 78% classification accuracy for 53 gestures. The applications of sEMG-based classification systems focus on, but are not limited to, assistive devices and rehabilitation or postural control therapy for physically impaired persons [31]. With the continuous development of more versatile signal processing techniques, the applications of EMG signal classification have expanded to a wide range of domains including augmented reality, the gaming industry, military applications, and so on [32, 43].
Two critical issues need to be addressed when developing AGR algorithms: inference fast enough to provide a real-time feel for the end-user, and classification accurate and robust enough to guarantee that the gesture is correctly identified regardless of the environmental conditions.
However, deep neural networks (DNNs), which are probably the most powerful methods, may appear as black boxes whose robustness is not always well controlled. For real-life applications, it is mandatory to guarantee the reliability of such techniques. Nowadays, the main difficulty to overcome consists in developing high-performance systems that are also trustworthy and safe. An additional challenge is to avoid an excessively heavy implementation of the learning phase.
In Reference [52], the authors showed that slightly altering data inputs that were correctly classified by the network can lead to a wrong classification [7, 35, 53]. This finding was at the origin of the concept of adversarial inputs, which constitute malicious input data that can deceive machine learning models. For example, Reference [7] shows how voice interfaces can be fooled by creating carefully crafted artificial audio inputs of unintelligible voice that are misclassified as specific vocal commands by the system. Also, Reference [27] introduces several methods for generating adversarial examples on ImageNet that are so close to the original data that the differences are indistinguishable to the human eye.
It must be emphasized that adversarial inputs are not necessarily artificially created with the intention to sabotage the system. Like other physiological signals, e.g., EEG or EKG, EMG signals have low-frequency components (usually between 10 and 150 Hz) and low amplitudes (\(\le \! 10\) mV peak-to-peak). This makes them very sensitive to noise and outside perturbations that can occur innately, in the form of noise stemming from acquisition devices, imperfect sensor contact, and so on. These can seriously degrade the performance of real-life applications based on pre-trained models [40]. An empirical way of training more robust AGR systems is detailed in Reference [23], where a strategy of training using noisy labels is proposed.
As highlighted in Reference [27], the Lipschitz behaviour of the network is tightly correlated with its robustness against adversarial attacks. The Lipschitz constant makes it possible to upper bound the output perturbation given the magnitude of the input one, for a given metric [50]. Controlling this constant thus represents a feasible solution to limit the effect of adversarial attacks. Computing the exact Lipschitz constant of a neural network is, however, a very complex problem, so the main challenge is to find clever ways to approximate this constant effectively.
Recently, several techniques to ensure the Lipschitz stability of neural networks have been explored. For example, Reference [53] proposes a novel weight spectral normalization technique applied to stabilize the training of the discriminator in Generative Adversarial Networks (GANs). The Lipschitz constant of the network is viewed as a hyper-parameter that can be tuned in the training process of the image generation task. Doing so leads to a model with improved generalization capabilities.
In Reference [3], norm-constrained GroupSort-based architectures are proposed and shown to be universal Lipschitz function approximators. The authors apply gradient norm preservation to create Lipschitzian networks that offer adversarial robustness guarantees. In Reference [14], the authors introduce Parseval networks, another approach for designing networks that are intrinsically robust to adversarial noise, by imposing the Lipschitz constant of each layer of the system to be less than 1. In Reference [24], a convex optimization framework is introduced to compute tight upper bounds for the Lipschitz constant of DNNs. The authors make use of the observation that commonly used activation operators are gradients of convex functions. Semi-definite programming approaches to ensure robustness are also explored in Reference [45]. The main contributions of this article are:
To propose a robust real-time AGR system based on sEMG signals. The robustness is ensured by using a novel learning algorithm for training feedforward neural networks.
To show that a good accuracy-robustness balance can be reached. To do so, we train the system under carefully crafted spectral norm constraints, allowing us to finely control its Lipschitz constant. A tight Lipschitz constant is efficiently estimated by focusing on neural networks with positive weights as in Reference [13].
To demonstrate the performance of the final architecture in real-life experiments where we show that the proposed robust model outperforms those trained conventionally.
To analyze how our system behaves when the input is affected by different noise levels, simulating perturbations that may occur in real scenarios.
To show the validity of our solution by experimenting on four distinct publicly available sEMG gestures datasets.
The rest of the article is structured as follows. The theoretical background of our work is detailed in Section 2. In Section 3, we present the proposed optimization algorithm and we investigate the way of dealing with the constraints. The application and the results are discussed in Section 4, while Section 5 deals with how our model behaves when facing adversarial data. Finally, Section 6 contains some concluding remarks.

2 Robustness Solutions in the Context of Nonnegative Neural Networks

2.1 Problem Formulation

Any feedforward neural network is obtained by cascading m layers associated with operators \((T_i)_{1\le i \le m}\). The neural network can thus be expressed as the following composition of operators:
\begin{equation} T=T_m\circ \ldots \circ T_1. \end{equation}
(1)
Each layer \(i\in \lbrace 1,\ldots ,m\rbrace\) has a real-valued vector input \(x_i\) of dimension \(N_{i-1}\) which is mapped to
\begin{equation} T_i(x_i)=R_i(W_i x_i+b_i), \end{equation}
(2)
where \(W_i\in \mathbb {R}^{N_i\times N_{i-1}}\), \(b_i\in \mathbb {R}^{N_i}\) are the weight matrix and bias parameter, respectively. \(R_i:\mathbb {R}^{N_i} \rightarrow \mathbb {R}^{N_i}\) constitutes a non-linear activation operator which is applied component-wise (e.g., ReLU or Sigmoid) or globally (e.g., Softmax). Figure 1 shows a graphical representation of this concept.
Fig. 1.
Fig. 1. Representation of a NN as a composition of operators.
Even though the choice of the activation \(R_i\) may differ depending on the task at hand, it has been shown in References [16, 18] that most of them are actually \(\alpha _i\)-averaged operators with \(\alpha _i \in \,]0,1]\). Recall that \(R_i\) is an \(\alpha _i\)-averaged operator if, for every pair \((x_i,y_i)\in (\mathbb {R}^{N_i})^2\), the following inequality holds:
\begin{equation} \Vert R_i(x_i)-R_i(y_i)-(1-\alpha _i)(x_i-y_i)\Vert \le \alpha _i \Vert x_i-y_i\Vert . \end{equation}
(3)
When \(\alpha _i = 1/2\), \(R_i\) is said to be firmly nonexpansive. For standard choices of activation operators, \(R_i\) is firmly nonexpansive since it is the proximity operator of a proper, lower-semicontinuous convex function (see Reference [16] for more details). Note that, in Reference [24], it is assumed that \(R_i\) operates component-wise and is slope-bounded. The authors emphasize that the most common case corresponds to lower and upper slope values equal to 0 and 1, respectively. It follows from [15, Proposition 2.4] that a function satisfies this property if and only if it is the proximity operator of some proper lower-semicontinuous convex function, so that similar assumptions to those made in Reference [16] are recovered.
As explained in Reference [18], examples of activation operators \(R_i\) which are \(\alpha _i\)-averaged with \(\alpha _i \gt 1/2\) can be encountered. They basically correspond to over-relaxations of firmly nonexpansive operators. An example of such operators is the Swish activation function [49]. Another famous example is the group-sort operator:
\begin{equation} \left(\forall x_i = \begin{bmatrix}x_{i,1}\\ \vdots \\ x_{i,M} \end{bmatrix}\in \mathbb {R}^{N_i}\right)\qquad R_i(x_i)= \begin{bmatrix}x_{i,1}^\uparrow \\ \vdots \\ x_{i,M}^\uparrow \end{bmatrix}, \end{equation}
(4)
where the vector \(x_i\) has been decomposed into M subvectors \(x_{i,j}\) with \(j\in \lbrace 1,\ldots ,M\rbrace\), of dimension B (\(N_i= B M\)), and \(x_{i,j}^\uparrow\) designates the vector of components of \(x_{i,j}\) sorted in ascending order. \(R_i\) is then purely nonexpansive, i.e., \(\alpha _i=1\). Note that max-pooling can be achieved by composing this group-sort operation with a linear operator. Indeed, if \(i\lt m\), \(M=N_{i+1}\), and \(W_{i+1}\) is the matrix extracted from the \(N_i\times N_i\) identity matrix \(\text{Id}\,_{N_i}\) by selecting the matrix rows with indices multiple of B, then \(W_{i+1}\circ R_i\) corresponds to a max-pooling.
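As a small illustration, the following NumPy sketch (ours, not part of the original article) implements the group-sort operator of Equation (4) and the max-pooling obtained by composing it with a row-selection matrix; the array `x`, the block size `B = 3`, and the function name are illustrative choices.

```python
import numpy as np

def group_sort(x, block_size):
    # GroupSort activation (Eq. (4)): split x into M blocks of size B and
    # sort each block in ascending order; the resulting operator is 1-Lipschitz.
    return np.sort(x.reshape(-1, block_size), axis=1).reshape(-1)

# Max-pooling as a linear map composed with GroupSort: after sorting, the
# last entry of every block is its maximum, so keeping the rows of the
# identity whose (1-based) indices are multiples of B recovers max-pooling.
x = np.array([3.0, -1.0, 2.0, 5.0, 0.0, 4.0])   # N_i = 6, B = 3, M = 2
B = 3
max_pool = group_sort(x, B)[B - 1::B]           # -> array([3., 5.])
```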

2.2 Lipschitz Robustness Certificate

Consider a neural network T as described in Figure 1. Let \(x\in \mathbb {R}^{N_0}\) be the input of the network and let \(T(x)\in \mathbb {R}^{N_m}\) be its associated output. By adding some small perturbation \(z \in \mathbb {R}^{N_0}\) to the input, the perturbed input is \(\tilde{x} = x + z\). The effect of the perturbation on the output of the system can be quantified by the following inequality:
\begin{equation} \Vert T(\tilde{x})-T(x)\Vert \le \theta _m \Vert z\Vert , \end{equation}
(5)
where \(\theta _m \ge 0\) denotes a Lipschitz constant of the network. \(\theta _m\) thus represents an important parameter that allows us to assess and control the sensitivity of a neural network to various perturbations. It needs, however, to be accurately estimated to provide valuable information. A standard approximation to the Lipschitz constant [27] is given by
\begin{equation} \theta _m = \prod _{i=1}^m \Vert W_i\Vert _{\rm S}, \end{equation}
(6)
where \(\Vert \cdot \Vert _{\rm S}\) denotes the spectral norm of a matrix. Although simple to compute, this approximate bound is overly pessimistic. Different methods for obtaining tighter estimates of the Lipschitz constant have been presented in the recent literature; see for example [11, 18, 24, 37, 50]. Local estimates of the Lipschitz constant can also be performed, which may appear more relevant. But they are more complex to compute and, as we will see, controlling the global Lipschitz constant is usually sufficient to get a good performance. Estimating the global Lipschitz constant of the network is an NP (non-deterministic polynomial-time)-hard problem [50]. Although there exist efficient approaches to approximate an accurate bound [11, 24, 37], computing these estimates may be expensive for wide or deep networks. In addition, using these bounds within a training procedure is a difficult task [45]. In this work, we will make the following assumption.
Assumption 2.1.
Let a neural network be given by (1) where the \(i^{th}\) layer with \(i\in \lbrace 1,\ldots ,m\rbrace\) is given by (2). We assume that
(i)
all the activation layers, except possibly the last one, consist of separable averaged operators, that is, for every \(i\in \lbrace 1,\ldots ,m-1\rbrace\), there exist averaged functions \((\rho _{i,k})_{1\le k \le N_i}\) from \(\mathbb {R}\) to \(\mathbb {R}\) such that \(R_i:(\xi _{i,k})_{1\le k \le N_i} \mapsto \big (\rho _{i,k}(\xi _{i,k})\big)_{1\le k \le N_i}\);
(ii)
at the last activation layer, \(R_m\) is an averaged operator.
Our approach will be grounded on the following result.
Proposition 2.2 ([18]).
Suppose that Assumption 2.1 holds. For every \(i\in \lbrace 1,\ldots ,m\rbrace\), let \(A_i\) be the matrix whose elements are the absolute values of those of \(W_i\). Then,
\begin{equation} \vartheta _m = \Vert A_m \times \cdots \times A_1\Vert _{\rm S} \end{equation}
(7)
is a Lipschitz constant of T. In addition
\begin{equation} \Vert W_m \times \cdots \times W_1\Vert _{\rm S} \le \vartheta _m. \end{equation}
(8)
In particular if, for every \(i\in \lbrace 1,\ldots ,m\rbrace\), \(W_i\in [0,+\infty [^{N_i\times N_{i-1}}\), \(\vartheta _m\) is equal to the lower bound in (8).
Based on this proposition, the best estimate for the Lipschitz constant of a given feedforward neural network having nonnegative weights simplifies to the spectral norm of the product of all the weight matrices composing the network. More precisely, the obtained Lipschitz constant
\begin{equation*} {\vartheta _m = \Vert W_m \times \cdots \times W_1\Vert _{\rm S}} \end{equation*}
is the Lipschitz constant of a purely linear network, where all the non-linear activation operators have been replaced with the identity operator.
The above result is guaranteed to be valid only in the case when all the weights are nonnegative. In the general case of networks with weights having arbitrary signs, it can be proved that \(\Vert W_m \times \cdots \times W_1\Vert _{\rm S}\) represents only a lower bound on the Lipschitz constant established in Reference [18]. It is also worth mentioning that these results hold for any algebraic structure of the weight matrices \((W_i)_{1\le i \le m}\). Using the above-defined bound, in the following, we will propose an algorithm for training models with theoretical robustness guarantees, and validate it in the context of gesture classification. By focusing on gesture recognition, we aim to showcase the effectiveness of our methodology in a challenging domain having multiple applications and for which real experiments can be made. Moreover, gesture recognition tasks often involve complex and dynamic data, making them suitable testbeds for evaluating the robustness and adaptability of our proposed approach.
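For concreteness, here is a small NumPy sketch (ours, not from the article) comparing the classical bound (6) with the bound \(\vartheta _m\) of Proposition 2.2; for nonnegative weights the latter reduces to the spectral norm of the product of the weight matrices.

```python
import numpy as np

def spectral_norm(M):
    return np.linalg.norm(M, ord=2)   # largest singular value

def lipschitz_bounds(weights):
    # weights = [W_1, ..., W_m], with W_i of shape N_i x N_{i-1}.
    classical = np.prod([spectral_norm(W) for W in weights])        # Eq. (6)
    prod_abs = np.abs(weights[0])
    for W in weights[1:]:
        prod_abs = np.abs(W) @ prod_abs                             # |W_m| x ... x |W_1|
    proposition = spectral_norm(prod_abs)                           # Eq. (7)
    return classical, proposition

# For matrices with nonnegative entries, the second value also equals
# ||W_m x ... x W_1||_S, i.e., the lower bound in Eq. (8) is attained.
rng = np.random.default_rng(0)
weights = [np.abs(rng.standard_normal((8, 16))), np.abs(rng.standard_normal((4, 8)))]
print(lipschitz_bounds(weights))
```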

3 Optimization Methods for Training Robust Feedforward Networks

3.1 Stochastic Gradient Descent—Projected Variant

Standard training in neural networks consists in the minimization of a nonconvex cost function with respect to the model parameters by means of an iterative strategy. Let \(\mathcal {L}\) be the cost function defined as follows:
\begin{equation} \mathcal {L}(\eta) = \sum _{k=1}^K \ell (z_k,\eta), \end{equation}
(9)
where \(\eta = (\eta _i)_{1\le i \le m}\) is a vector encompassing all the model parameters. For each layer \(i\in \lbrace 1,\ldots ,m\rbrace\), \(\eta _i\) denotes a vector of dimension \(N_i(N_{i-1}+1)\) that contains the scalar variables associated with the weight matrices \(W_i\) and the corresponding bias components \(b_i\). The data information is represented by \((z_k)_{1\le k \le K}\). For every \(k\in \lbrace 1,\ldots ,K\rbrace\), \(z_k\) is a pair consisting of an input of the system and the associated desired output (ground truth). Also, \(\ell\) represents the loss function assumed to be differentiable (almost everywhere) with respect to \(\eta\).
To ensure robustness, we shall impose spectral norm constraints on the weight matrices. In other words, the vector of parameters \(\eta\) is constrained to belong to a closed set \(\mathcal {S}\) that will be described in the next section. We propose to use an extension of a standard optimization technique for training neural networks [19]. More specifically, we will implement a projected stochastic gradient algorithm. A momentum parameter is introduced in this algorithm to accelerate the convergence process.
Algorithm 1 describes the iterations performed at each epoch \(n\gt 0\). We see that there are two nested loops: the outer loop operates on the batch index q and the inner one on the layer index i. In this algorithm, \(\gamma _n\in \,]0,+\infty [\) is the learning rate, while \(\zeta _n\in [0,+\infty [\) denotes the inertia parameter for momentum. The algorithm is very similar to block-iterative techniques used in convex optimization [19]. The parameters of each layer are indeed updated successively by performing a gradient step on the data in the current mini-batch (which can be epoch-dependent). \(\nabla _i\) represents the gradient, computed by the standard backpropagation mechanism, with respect to \(\eta _i\) for each \(i\in \lbrace 1,\ldots ,m\rbrace\). This stochastic gradient step is followed by a projection \(\mathsf {P}_{\mathcal {S}_{i,n}}\) onto the constraint set \(\mathcal {S}_{i,n}\). The definition of this set as well as the way of handling this projection are detailed in the following.
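The following minimal sketch (ours, with hypothetical names, not the article's code) illustrates the structure of one epoch of this projected stochastic gradient scheme; `grad_fn` and `project_fn` stand in for the backpropagation step and the projection onto \(\mathcal {S}_{i,n}\), and the exact momentum form and update order of Algorithm 1 may differ.

```python
def projected_sgd_epoch(params, velocities, batches, grad_fn, project_fn,
                        gamma=1e-3, zeta=0.02):
    """One epoch of a projected stochastic gradient scheme with momentum.

    params[i] gathers the parameters (W_i, b_i) of layer i,
    grad_fn(i, params, batch) returns the gradient with respect to those
    parameters (backpropagation), and project_fn(i, candidate, params)
    projects onto S_{i,n}, which depends on the current weights of the other
    layers (Eqs. (12)-(13)). All three are placeholders.
    """
    for batch in batches:                              # outer loop: mini-batch index q
        for i in range(len(params)):                   # inner loop: layer index i
            g = grad_fn(i, params, batch)
            velocities[i] = zeta * velocities[i] - gamma * g   # momentum step (assumed form)
            params[i] = project_fn(i, params[i] + velocities[i], params)
    return params, velocities
```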

3.2 Constraint Sets

As mentioned before, this work revolves around feed-forward networks with positive weights. Thus, the first condition that we impose is nonnegativity for each layer \(i\in \lbrace 1,\ldots ,m\rbrace\), which is modeled by the constraint set
\begin{equation} \mathcal {D}_i = \lbrace W_i \in \mathbb {R}^{N_i \times N_{i - 1}} \mid W_i \ge 0 \rbrace . \end{equation}
(10)
Moreover, based on our standing assumptions and Proposition 2.2, we must impose a spectral norm constraint on the weight matrices to control the robustness of the system. This translates mathematically as the following upper bound constraint:
\begin{equation} \Vert W_m \times \cdots \times W_1 \Vert _{\rm S} \le \overline{\vartheta }, \end{equation}
(11)
where \(\overline{\vartheta }\) represents the target maximum Lipschitz constant of the network. This bound constitutes a direct measure of the system's level of robustness against adversarial inputs. We need to handle these two constraints simultaneously during the training process. Imposing nonnegativity is fairly easy since (10) defines a simple convex constraint. By contrast, constraint (11) does not satisfy the convexity property. Since (11) corresponds to a closed set in the underlying space of weight matrices and this set has a nonempty intersection with \(\mathcal {D}=\mathcal {D}_1\times \cdots \times \mathcal {D}_m\), the projection onto the intersection of the two sets can be defined, but it is not guaranteed to be unique. To circumvent this difficulty, it can be noticed that (11) actually defines a multi-convex constraint in the sense that if, for every \(i\in \lbrace 1,\ldots ,m\rbrace\), \((W_j)_{1\le j \le m,j\ne i}\) are given, then (11) imposes a convex constraint on \(W_i\). This suggests introducing the following closed and convex set:
\begin{equation} \mathcal {C}_{i,n} =\lbrace W_{i} \in \mathbb {R}^{N_i\times N_{i-1}} \mid \Vert A_{i,n} W_i B_{i,n}\Vert _{\rm S} \le \overline{\vartheta }\rbrace , \end{equation}
(12)
in order to control the Lipschitz constant. Hereabove, the matrices \(A_{i,n}\) and \(B_{i,n}\) represent the products of the weight matrices of the subsequent and the previous layers, respectively. By adopting the convention that \(A_{i,n} = \text{Id}\,\) if \(i=m\) and \(B_{i,n} = \text{Id}\,\) if \(i=1\), we define these matrix products as
\begin{align} A_{i,n}=W_{m,n} \times \cdots \times W_{i+1,n},\qquad B_{i,n}=W_{i-1,n+1} \times \cdots \times W_{1,n+1}, \end{align}
(13)
where \((W_{j,n})_{1\le j\le m}\) denote the estimates of the weight matrices at iteration n, as they appear in Algorithm 1.
Thus, our objective will be to perform the projection onto the set \(\mathcal {S}_{i,n} = \mathcal {D}_i \cap \mathcal {C}_{i,n}\), for each layer \(i \in \lbrace 1, \dots , m\rbrace\) and at each iteration n. Several algorithms can be envisaged to solve this convex optimization problem.
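As a sanity check, the spectral norm appearing in (12) can be evaluated numerically as in the sketch below (an illustration of ours; the distinction between already-updated and not-yet-updated layers in (13) is glossed over here).

```python
import numpy as np
from functools import reduce

def matprod(mats, size):
    # Product of the matrices in the given (left-to-right) order; identity if empty.
    return reduce(np.matmul, mats) if mats else np.eye(size)

def constraint_norm(weights, idx):
    # Spectral norm of A_{i,n} W_i B_{i,n} (Eq. (12)) for layer idx (0-based),
    # with A_{i,n} and B_{i,n} built from the current weights (Eq. (13)).
    A = matprod(weights[idx + 1:][::-1], size=weights[idx].shape[0])
    B = matprod(weights[:idx][::-1], size=weights[idx].shape[1])
    return np.linalg.norm(A @ weights[idx] @ B, ord=2)

# The constraint C_{i,n} then reads: constraint_norm(weights, idx) <= theta_bar.
```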
Before describing our proposed algorithmic solution, let us recall the expressions of the required elementary projections. For every \(W \in \mathbb {R}^{S\times T}\), the projection of W onto \([0,+\infty [^{S\times T}\) is
\begin{equation} \mathsf {P}_{[0,+\infty [^{S\times T}}(W) = (\widetilde{W}_{s,t})_{1\le s \le S,1\le t \le T}, \end{equation}
(14)
where, for every \(s \in \lbrace 1,\ldots ,S\rbrace\) and \(t\in \lbrace 1,\ldots ,T\rbrace\),
\begin{equation} \widetilde{W}_{s,t} = {\left\lbrace \begin{array}{ll} W_{s,t} & \mbox{if $W_{s,t} \ge 0$}\\ 0 & \mbox{otherwise.} \end{array}\right.} \end{equation}
(15)
Let \(\mathcal {B}(0,\overline{\vartheta })\) be the closed spectral ball of center 0 and radius \(\overline{\vartheta }\) defined as
\begin{equation} \mathcal {B}(0,\overline{\vartheta }) = \lbrace W \in \mathbb {R}^{S\times T} \mid \Vert W\Vert _{\rm S} \le \overline{\vartheta }\rbrace . \end{equation}
(16)
For every \(W = (W_{s,t})_{1\le s \le S,1\le t \le T} \in \mathbb {R}^{S\times T}\), let \(U \Lambda V^\top\) be the singular value decomposition of W, where \(U \in \mathbb {R}^{S\times R}\) and \(V \in \mathbb {R}^{T\times R}\) are matrices such that \(U^\top U = \text{Id}\,\) and \(V^\top V = \text{Id}\,\), \(R=\min \lbrace S,T\rbrace\), and \(\Lambda = \operatorname{Diag}(\lambda _{1},\ldots ,\lambda _{R})\), \((\lambda _r)_{1\le r \le R}\in [0,+\infty [^R\) being the singular values of W. Then the projection of W onto \(\mathcal {B}(0,\overline{\vartheta })\) is expressed as
\begin{equation} \mathsf {P}_{\mathcal {B}(0,\overline{\vartheta })}(W) = U \widetilde{\Lambda } V^\top , \end{equation}
(17)
where \(\widetilde{\Lambda } = \operatorname{Diag}(\widetilde{\lambda }_{1},\ldots ,\widetilde{\lambda }_{R})\) and
\begin{equation} (\forall r \in \lbrace 1,\ldots ,R\rbrace)\quad \widetilde{\lambda }_{r} = {\left\lbrace \begin{array}{ll} \lambda _{r} & \mbox{if $\lambda _{r}\le \overline{\vartheta }$}\\ \overline{\vartheta } & \mbox{otherwise.} \end{array}\right.} \end{equation}
(18)
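These two elementary projections can be written in a few lines; the sketch below (ours) uses a full SVD for the spectral ball, which is exact but can be costly for large matrices.

```python
import numpy as np

def project_nonnegative(W):
    # Projection onto [0, +inf[^{S x T}, Eqs. (14)-(15).
    return np.maximum(W, 0.0)

def project_spectral_ball(W, radius):
    # Exact projection onto B(0, radius), Eqs. (17)-(18):
    # clip the singular values of W at the prescribed radius.
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    return U @ np.diag(np.minimum(s, radius)) @ Vt
```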
To compute the projection onto \(\mathcal {S}_{i,n}\) of a matrix \(\overline{W}_{i}\in \mathbb {R}^{N_i\times N_{i-1}}\), we propose to employ the FISTA (Fast Iterative Shrinkage-Thresholding Algorithm) version of a dual forward-backward method in Algorithm 2. This algorithm is based on a dual proximal approach [33] and constitutes an extension of the optimization method originally proposed in Reference [17]. The rationale for this algorithm is given in the appendix.

3.3 Handling Looser Constraints

The Lipschitz constant of the network can be controlled in multiple ways. Besides the solution formulated in Section 3.2, a more standard approach to control it [52] consists in imposing
\begin{equation} \prod _{i=1}^m \Vert W_i\Vert _{\rm S} \le \overline{\vartheta }. \end{equation}
(19)
Two strategies have been implemented to enforce this constraint.
(i)
The first one consists in imposing a uniform bound on the spectral norm of each weight matrix \((W_i)_{1\le i \le m}\), which leads to the following convex constraint sets:
\begin{equation} (\forall i \in \lbrace 1,\ldots ,m\rbrace) \quad \widetilde{\mathcal {C}}_i =\lbrace W_{i} \in \mathbb {R}^{N_i\times N_{i-1}} \mid \Vert W_i \Vert _{\rm S} \le \overline{\vartheta }^{1/m}\rbrace . \end{equation}
(20)
(ii)
The second strategy aims at introducing more flexible bounds on the spectral norms of the layers. It is based on the following choice for the individual convex constraint sets:
\begin{equation*} (\forall n \in \mathbb {N}\setminus \lbrace 0\rbrace)(\forall i \in \lbrace 1,\ldots ,m\rbrace)\quad {\check{\mathcal {C}}}_{i,n} =\Big \lbrace W_{i} \in \mathbb {R}^{N_i\times N_{i-1}} \mid \Vert W_i\Vert _{\rm S}\le \Vert W_{i,n}\Vert _{\rm S}\Big (\frac{\overline{\vartheta }}{\prod _{j=1}^{m} \Vert W_{j,n}\Vert _{\rm S}}\Big)^{1/m}\Big \rbrace . \end{equation*}
For every \(i\in \lbrace 1,\ldots ,m\rbrace\), projecting onto \(\widetilde{\mathcal {C}}_i\) or \({\check{\mathcal {C}}}_{i,n}\) is performed by truncating a singular value decomposition, similarly to the technique described at the end of Section 3.2. The projections onto \(\widetilde{\mathcal {C}}_{i} \cap \mathcal {D}_i\) and \(\check{{\mathcal {C}}}_{i,n} \cap \mathcal {D}_i\) can then be computed by using the same iterative method as in Algorithm 2 with \(A_{i,n}=B_{i,n}=\text{Id}\,\).
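The per-layer spectral radii used by these two strategies can be computed as in the following sketch (an illustration of ours, with hypothetical function and parameter names).

```python
import numpy as np

def per_layer_radii(weights, theta_bar, flexible=True):
    # Spectral-norm budgets for the looser constraints of this section.
    # flexible=False: uniform radius theta_bar**(1/m) of Eq. (20);
    # flexible=True: the current norms are rescaled so that their product
    # equals theta_bar, as in the sets denoted above by C-check.
    m = len(weights)
    if not flexible:
        return [theta_bar ** (1.0 / m)] * m
    norms = [np.linalg.norm(W, ord=2) for W in weights]
    scale = (theta_bar / np.prod(norms)) ** (1.0 / m)
    return [nrm * scale for nrm in norms]
```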
In all the proposed constrained optimization methods, the projection \(\mathsf {P}_{\mathcal {B}(0,\widetilde{\vartheta })}\) onto a spectral ball with radius \(\widetilde{\vartheta } \gt 0\) plays a prominent role. The ball radius depends on the constraint being handled, namely (12), (20), or the sets \({\check{\mathcal {C}}}_{i,n}\) defined above. A complex operation such as a singular value decomposition may be very demanding in terms of computational resources when dealing with large-size matrices. In that case, we propose to use an approximate projection [53] defined as
\begin{equation} (\forall W \in \mathbb {R}^{S\times T})\quad \mathsf {P}_{\mathcal {B}(0,\widetilde{\vartheta })}(W) \simeq {\left\lbrace \begin{array}{ll} W & \mbox{if $\Vert W\Vert _{\rm S} \le \widetilde{\vartheta }$}\\ \displaystyle \frac{\widetilde{\vartheta }}{\Vert W\Vert _{\rm S}} W & \mbox{otherwise.} \end{array}\right.} \end{equation}
(21)
Using this approximation in Algorithm 2 yields approximate projections \((\widetilde{\mathsf {P}}_{\mathcal {C}_{i,n} \cap \mathcal {D}_i})_{1\le i \le m,n\gt 0}\). Note, however, that we then lose the theoretical convergence guarantees of Algorithm 2, even though no convergence issue was observed in our implementation.
An additional advantage of Formula (21) is that it allows the nonnegativity of the elements of the input matrix to be kept. This allows us to derive cheap approximate versions of the projection onto \(\widetilde{\mathcal {C}}_i \cap \mathcal {D}_i\) with \(i\in \lbrace 1,\ldots ,m\rbrace\) by first projecting onto \(\mathcal {D}_i\) and then applying the approximate projection onto \(\widetilde{\mathcal {C}}_i\). The resulting approximate projection is denoted by \((\widetilde{\mathsf {P}}_{\widetilde{\mathcal {C}}_{i} \cap \mathcal {D}_i})_{1\le i \le m}\). A similar procedure can be followed to compute approximate projections \((\widetilde{\mathsf {P}}_{\check{\mathcal {C}}_{i,n} \cap \mathcal {D}_i})_{1\le i \le m,n\gt 0}\) onto \((\check{\mathcal {C}}_{i,n} \cap \mathcal {D}_i)_{1\le i \le m, n\gt 0}\).
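A sketch of this approximate projection (21), and of its combination with the nonnegativity constraint as described above, is given below (an illustration of ours; in practice the spectral norm itself would typically be estimated by a few power iterations rather than a full SVD).

```python
import numpy as np

def approx_project_spectral_ball(W, radius):
    # Approximate projection of Eq. (21): rescale W whenever its spectral
    # norm exceeds the radius; this keeps the sign pattern of W.
    s = np.linalg.norm(W, ord=2)
    return W if s <= radius else (radius / s) * W

def approx_project_nonneg_ball(W, radius):
    # Cheap approximate projection onto C~_i \cap D_i: first project onto the
    # nonnegative orthant, then apply the approximate spectral projection.
    return approx_project_spectral_ball(np.maximum(W, 0.0), radius)
```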

4 AGR Experimental Setup

4.1 sEMG Datasets

We test our proposed training scheme on four online datasets containing EMG information for different hand gestures. The first three were acquired using the Myo armband, a device developed by Thalmic Labs, equipped with eight sEMG sensors arranged circularly, while the last one was acquired using 10 active double-differential Otto Bock MyoBock 13E200 sEMG electrodes.
Myo-sEMG. The first dataset, detailed in Reference [21], contains EMG signals characterizing 7 hand gestures correlated to the primary movements of the hand. There are four mobility gestures (i.e., wrist flexion and extension, and ulnar and radial deviation) and two gestures used for grasping and releasing objects (i.e., spread fingers and close fist). The 7th gesture characterizes the neutral position, corresponding to the relaxation of the muscles.
13Myo-sEMG. The second dataset includes 13 gestures: the same 7 gestures described above, plus 6 additional classes. It contains gestures from 50 different subjects and two sets of trials per user. All 13 gestures are depicted in Figure 2. More details about the dataset can be found in Reference [2].
Fig. 2.
Fig. 2. 13-gestures dataset [2].
NinaPro DB5.C. The third dataset is a subset of the NinaPro DB5 dataset, detailed in Reference [47]. The dataset is acquired using two Myo armbands, one positioned just below the elbow and the other one closer to the arm. For our experiments, we considered the subset C, which contains sEMG data associated with 24 gestures.
NinaPro DB1. The fourth dataset was introduced in Reference [5], and encompasses physiological data acquired from 27 able-bodied subjects, performing a total of 53 different gestures. The sEMG data is recorded using 10 electrodes, positioned as follows. The first eight electrodes are evenly distributed around the forearm using an elastic band, maintaining a consistent distance from the radio-humeral joint located directly below the elbow. Two more electrodes are strategically positioned on the major flexor and extensor muscles of the forearm.
We also validate our models in a real-context scenario. For the real-life predictions, we recorded the EMG activity associated with each gesture at forearm level using the Myo armband. The information collected from each channel is transmitted to a computer via the Bluetooth protocol, where it is processed to extract relevant time-domain features that are used by the classifier to determine which gesture has been performed.

4.2 Proposed Architecture

The raw 8- or 10-channel EMG signal is split using a 250 ms sliding window, with \(50\%\) overlap. A 250 ms window is long enough to cover the most common gesture durations, ensuring that the essential temporal aspects of each gesture are captured within this window. Overlap ensures that important signal characteristics, such as abrupt changes or transient patterns, are not missed due to window boundaries. By using overlapping windows, the feature extraction process also becomes more robust, as multiple windows contribute to representing the same temporal information from the EMG signal. From each window of each channel, a series of 8 time descriptors is extracted. The information from all the channels is then concatenated, forming a 64-dimensional (80-dimensional for the fourth dataset) vector. The 7-gestures dataset contains around 200k vector samples, the 13-gestures dataset has around 59k vector samples, the 24-gestures dataset has around 20k vector samples, while the 53-gestures dataset has 250k vector samples. These are split into training, validation, and test sets at user level according to the ratios 70%, 20%, and 10%. These vectors are fed to the network in mini-batches of size 2048. For our experiments, we used the categorical cross-entropy as the loss function \(\ell\), with a learning rate \(\gamma =10^{-3}\) and momentum parameter \(\zeta =0.02\). The considered architecture consists of a fully connected neural network with 6 hidden layers (\(m=6\)), with different parameters depending on the considered dataset, but the same core structure, as displayed in Figure 3. Let \(x = (x_k)_{0 \le k \le K-1}\) be the vector of EMG samples acquired on a window from one channel. For this work, we considered some of the most relevant features used to describe sEMG data, as follows (a consolidated code sketch of these descriptors is given after the list).
Fig. 3.
Fig. 3. Proposed neural network architecture for AGR. All the layers except the last one use ReLU activation functions; the last layer uses Softmax. The number of neurons considered for each layer is: 128, 128, 128, 64, 32, 16 in the case of the 7-gestures dataset; 256, 256, 256, 128, 64, 32 in the case of the 13-gestures dataset; and 512, 512, 256, 128, 64 in the case of the 24-gestures and 53-gestures datasets. The last layer has 7, 13, 24, or 53 neurons, corresponding to the number of gestures being recognized. Each EMG box represents a column vector containing 8 time-descriptors.
(i)
Mean Absolute Value (MAV)—represents the average muscle activation level within a specific time window. As different gestures involve varying degrees of muscle activation, MAV can capture the overall muscle activity pattern, helping to distinguish between gestures with low and high muscle involvement.
\begin{equation} \mathrm{MAV}(x) = \frac{1}{K}\ \sum _{k=0}^{K-1}{|x_k|}. \end{equation}
(22)
(ii)
Zero Crossing Rate (ZCR)—indicates how frequently the EMG signal crosses zero within a time window. Rapid changes in muscle activation lead to higher ZCR values, making it relevant for identifying gestures involving quick and repetitive movements. A threshold \(\alpha \ge 0\) is used in order to lessen the noise effect. This feature can be computed in an incremental manner and it is defined as
\begin{equation} \mathrm{ZCR}(x) = \Big | \big \lbrace k\in \lbrace 1,\ldots ,K-1\rbrace \mid |x_k -\ x_{k-1}| \ge \alpha \text{ and } x_kx_{k-1}\lt 0 \big \rbrace \Big |. \end{equation}
(23)
(iii)
Waveform Length (WL)—quantifies the amplitude variations within a time window. Longer WL values may correspond to gestures involving sustained muscle activity or complex patterns. It corresponds to the following total variation seminorm:
\begin{equation} \mathrm{WL}(x) = \sum _{k=1}^{K-1}{|x_k -x_{k-1}|}. \end{equation}
(24)
(iv)
Slope Sign Changes (SSC)—counts the number of times the slope of the EMG signal changes its sign within a window. It is effective in detecting abrupt changes in muscle activation, which is crucial for recognizing gestures with distinct start and stop points. It amounts to checking a condition on three consecutive samples \(x_{k-1}, x_k, x_{k+1}\) with \(k\in \lbrace 2,\ldots ,K-2\rbrace\):
\begin{equation} \mathrm{SSC}(x) = \Big |\lbrace k\in \lbrace 2,\ldots ,K-2\rbrace \mid (x_k -\ x_{k-1}) (x_k\ -\ x_{k+1}) \ge \alpha \rbrace \Big |, \end{equation}
(25)
where the threshold \(\alpha \gt 0\) is employed to reduce the influence of the noise.
(v)
Root Mean Square (RMS)—provides information about the overall energy of the EMG signal within a time window. High energy levels may correspond to forceful gestures, while lower energy levels may indicate more subtle movements. RMS helps in recognizing gestures with varying intensity levels and it is given by
\begin{equation} \mathrm{RMS}(x) =\sqrt {\frac{1}{K}\sum _{k=0}^{K-1}x_{k}^2}\,. \end{equation}
(26)
(vi)
Hjorth parameters—are a set of three features originally developed for characterizing electroencephalography signals and then successfully applied to sEMG signal recognition. The most relevant one, the Hjorth activity parameter, can be thought of as the integrated power spectrum and basically corresponds to the variance of the signal, calculated as follows:
\begin{equation} \ \sigma ^2(x)\ =\ \frac{1}{K}\sum _{k=0}^{K-1}{(x_{k}-\ \mu (x))}^2, \end{equation}
(27)
where \(\mu (x)\) represents the mean value of the signal. The standard deviation and \(\mathrm{RMS}(x)\) are equal when the mean of the signal is zero.
(vii)
Skewness—measures the asymmetry of the EMG signal amplitude distribution within a time window. Positive skewness indicates a longer tail on the right side, while negative skewness indicates a longer tail on the left side; this can be useful in identifying gestures with asymmetric muscle activations.
\begin{equation} \mathrm{Skew}(x)=\frac{1}{K}\sum _{k=0}^{K-1}{\left(\frac{x_{k}-\ \mu (x)}{\sigma (x)}\right)}^3. \end{equation}
(28)
(viii)
Integrated Square-root EMG (ISEMG)—provides a measure of the total muscular activity and is particularly useful for capturing the overall muscle involvement over time. ISEMG is commonly used to quantify muscle fatigue and effort during movements. In the context of gesture recognition, ISEMG can help differentiate between gestures with varying levels of sustained muscle activation and can be indicative of the gesture intensity and duration.
\begin{equation} \mathrm{ISEMG}(x) = \sum _{k=0}^{K-1} \sqrt {\mid x_{k} \mid }. \end{equation}
(29)
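The sketch below (our illustration; the threshold \(\alpha\) and the exact indexing conventions are left as assumptions) consolidates the descriptors (22)-(29) for one window of one channel; concatenating the outputs of all channels yields the 64- or 80-dimensional input vector described above.

```python
import numpy as np

def emg_features(x, alpha=0.0):
    # Time-domain descriptors (22)-(29) for one window x of one channel.
    diffs = np.diff(x)
    mav = np.mean(np.abs(x))                                        # (22) MAV
    zcr = np.sum((np.abs(diffs) >= alpha) & (x[1:] * x[:-1] < 0))   # (23) ZCR
    wl = np.sum(np.abs(diffs))                                      # (24) WL
    ssc = np.sum((x[1:-1] - x[:-2]) * (x[1:-1] - x[2:]) >= alpha)   # (25) SSC
    rms = np.sqrt(np.mean(x ** 2))                                  # (26) RMS
    act = np.var(x)                                                 # (27) Hjorth activity
    skew = np.mean(((x - np.mean(x)) / np.std(x)) ** 3)             # (28) Skewness
    isemg = np.sum(np.sqrt(np.abs(x)))                              # (29) ISEMG
    return np.array([mav, zcr, wl, ssc, rms, act, skew, isemg])

def feature_vector(window, alpha=0.0):
    # window: (samples x channels) array for one 250 ms frame; the descriptors
    # of all channels are concatenated (8 channels -> 64 features, 10 -> 80).
    return np.concatenate([emg_features(window[:, c], alpha)
                           for c in range(window.shape[1])])
```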

4.3 Performance Analysis in Terms of Accuracy and Robustness

Our best AGR system trained conventionally achieves state-of-the-art performance [2, 30, 41]: over 99% accuracy for the first two datasets, around 86% in the case of the 24-gestures dataset [23, 46], and around 88.5% in the case of the 53-gestures dataset [55]. A more detailed comparison with other recent sEMG-based AGR systems proposing neural-network solutions on the same datasets is presented in Table 1.
Table 1.
Dataset | Method | Acc. [%] | Method | Acc. [%]
Myo-sEMG | 7-DNN (ours) | 99.67 | DL-TL [21] | 98.12
13Myo-sEMG | 13-DNN (ours) | 99.31 | EMG-CNN [1] | 99.28
NinaPro DB5 Ex. C | 24-DNN (ours) | 86.20 | EELM [46] | 83.60
NinaPro DB1 | 53-DNN (ours) | 88.50 | IRDC-Net [55] | 89.82
Table 1. Comparison to Other sEMG-Based AGR Systems
Since, in this case, the weights are not guaranteed to be positive, the lower bound introduced in Proposition 2.2 does not constitute a valid Lipschitz constant. Computing the exact Lipschitz constant \(\theta _m\) of the system is a very difficult task [18], but we can easily bound \(\theta _m\) between the estimate given by (6) and the spectral norm of the product of all the weight matrices from the network. We found that the Lipschitz constant upper bound \(\theta _m\) is greater than \(10^{12}\) for all our baseline models. Also, while training our model, we faced the problem of overfitting, which is a challenging issue in classification of physiological signals.
This suggests that despite the high performance of the classifiers, their robustness is poorly controlled, leaving the systems vulnerable to adversarial perturbations. A first step towards controlling the Lipschitz constant of the classification algorithm, and implicitly its robustness, is to impose the nonnegativity condition associated with constraint \(\mathcal {D}\). Training under such a nonnegativity constraint is shown to improve the interpretability of the network operation [13] and acts as a regularization, reducing overfitting. On the other hand, it can affect its approximation capability and potentially lead to a performance decay. To further study the effect of other regularization techniques from a dual performance-robustness perspective, we trained several models for 1000 iterations using common regularization methods, such as Dropout, \(\ell _1/\ell _2\) regularization, and Batch Normalization. Such comparisons were also featured in other works such as Reference [28]. The results for the 7-gestures dataset are summarized in Table 2. As expected, employing regularization techniques during the training phase improves the overall performance of the baseline classifiers. While the positive impact of regularization techniques on enhancing neural network model performance by mitigating overfitting has been extensively researched and validated, the exploration of their influence on system robustness remains an understudied area. It can be observed that Batch Normalization is the most efficient technique from the accuracy viewpoint, but it comes with an increase in the overall Lipschitz constant of the classifier. Training the proposed system subject to the nonnegativity constraint (\(\mathcal {D}\)) results in an overall accuracy of 96.92%, \(95.87\%\), 84.75%, and 85.65% for the case of 7, 13, 24, and 53 classes, respectively. The performance decay was balanced by an increase in the robustness, since the Lipschitz constant, computed as indicated in Proposition 2.2, equals \(\theta _m = 9.69\times 10^{10}\) for 7 classes, \(\theta _m = 9.73 \times 10^{10}\) for 13 classes, \(\theta _m = 1.03 \times 10^{11}\) for 24 classes, and \(\theta _m = 8.4 \times 10^{10}\) for 53 classes. We observed that the accuracy reduction can be overcome by adding additional layers to the architecture. Indeed, we were able to obtain a similar accuracy to the baseline by adding an extra layer to the existing architecture and retraining both systems subject to \(\mathcal {D}\), i.e., 98.68%, 97.21%, 85.12%, and 87.03% for the 7-gestures, 13-gestures, 24-gestures, and 53-gestures datasets, respectively. Furthermore, we managed to maintain a high performance while improving the robustness with respect to unconstrained training, i.e., \(\theta _m = 1.02 \times 10^{11}\) for the 7-classes dataset, \(\theta _m = 9.96 \times 10^{10}\) for the 13-classes dataset, \(\theta _m = 4.24 \times 10^{11}\) for the 24-classes dataset, and \(\theta _m = 3.15 \times 10^{11}\) for the 53-classes dataset. We can, however, conclude from these tests that imposing the nonnegativity of the weight coefficients is not sufficient to reach satisfactory robustness.
Table 2.
Regularization method | Param. | Train acc. [%] | Validation acc. [%] | Test acc. [%] | Lipschitz constant | Training time [ms]
None | | 99.98 | 79.63 | 80.35 | 5.3\(\times 10^{12}\) | 150
Dropout | rate = 0.1 | 98.31 | 97.76 | 97.66 | 5.1\(\times 10^{11}\) | 160
Dropout | rate = 0.15 | 98.00 | 97.44 | 97.34 | 5.6\(\times 10^{11}\) | 160
Dropout | rate = 0.2 | 97.55 | 97.03 | 96.98 | 6.1\(\times 10^{12}\) | 162
Batch Norm. | | 99.96 | 99.65 | 99.78 | 6.3\(\times 10^{12}\) | 160
\(\ell _1\) regularization | reg. factor = \(10^{-4}\) | 99.28 | 97.35 | 97.59 | 7.2\(\times 10^{9}\) | 135
\(\ell _1\) regularization | reg. factor = \(10^{-3}\) | 95.87 | 95.53 | 95.48 | 9.2\(\times 10^{9}\) | 162
\(\ell _1\) regularization | reg. factor = \(10^{-2}\) | 84.35 | 84.24 | 83.34 | 5.8\(\times 10^{10}\) | 160
\(\ell _2\) regularization | reg. factor = \(10^{-4}\) | 99.71 | 98.36 | 98.02 | 5.7\(\times 10^{11}\) | 160
\(\ell _2\) regularization | reg. factor = \(10^{-3}\) | 98.66 | 97.99 | 97.25 | 3.2\(\times 10^{10}\) | 160
\(\ell _2\) regularization | reg. factor = \(10^{-2}\) | 91.97 | 91.86 | 91.78 | 5.5\(\times 10^{8}\) | 160
Non-negativity | | 97.23 | 96.82 | 96.92 | 9.69\(\times 10^{10}\) | 162
Table 2. Performance and Robustness Results for 7-Gestures Dataset Baseline Models
The training time is computed for one epoch, with a batch size of 2048. All models were trained for 1000 iterations. All experiments were performed using 2 \(\times\) A100 40 GB Nvidia GPUs.
To further control the robustness of the systems, we have to manage the Lipschitz constant of the networks by training them under additional spectral norm constraints, as described by Equation (11). Searching for the optimal accuracy-robustness tradeoff, we trained several models considering each of the aforementioned spectral norm constraints, namely \((\mathcal {C}_{i,n})_{1\le i \le m,n\in \mathbb {N}}\) in (12), \((\widetilde{\mathcal {C}}_{i})_{1\le i \le m}\) in (20), and \(({\check{\mathcal {C}}}_{i,n})_{1\le i \le m,n\in \mathbb {N}}\) defined in Section 3.3, each combined with the nonnegativity constraints \((\mathcal {D}_i)_{1\le i \le m}\).
By adjusting the upper bound \(\overline{\vartheta }\), we were able to assess the effect of a robustness constraint on the overall performance of the neural network-based classifiers, and finally to achieve the optimal trade-off. All our models were trained using Algorithm 1 as the optimizer.
The obtained results are summarized in Table 3. As expected, obtaining a good robustness-accuracy tradeoff requires paying attention to the way we design our constrained networks. In all the cases, we show that using tight constraints during the training phase to approximate the Lipschitz bound improves the overall performance of the classifier, proving the generalization properties of our solution.
Table 3.
Lipschitz constant, 7-gestures (Myo-sEMG)
Constraint | Projection | Acc. 75% | 80% | 85% | 90% | 95%
\(\widetilde{\mathcal {C}}_{i} \cap \mathcal {D}_i\) | \(\widetilde{\mathsf {P}}_{\widetilde{\mathcal {C}}_{i} \cap \mathcal {D}_i}\) | 19.5 | 37.5 | 68.3 | \(3.5 \times 10^{4}\) | \(3.5 \times 10^{8}\)
\(\widetilde{\mathcal {C}}_{i} \cap \mathcal {D}_i\) | \(\mathsf {P}_{\widetilde{\mathcal {C}}_{i} \cap \mathcal {D}_i}\) | 0.66 | 13.47 | 74.16 | \(1.04 \times 10^3\) | \(1.39 \times 10^5\)
\(\check{\mathcal {C}}_{i,n} \cap \mathcal {D}_i\) | \({\widetilde{\mathsf {P}}}_{\check{\mathcal {C}}_{i,n} \cap \mathcal {D}_i}\) | 0.71 | 1.84 | 3.42 | 6.87 | 11.60
\(\check{\mathcal {C}}_{i,n} \cap \mathcal {D}_i\) | \(\mathsf {P}_{\check{\mathcal {C}}_{i,n} \cap \mathcal {D}_i}\) | 0.70 | 1.35 | 3.41 | 6.79 | 11.20
\(\mathcal {C}_{i,n} \cap \mathcal {D}_i\) | \(\widetilde{\mathsf {P}}_{\mathcal {C}_{i,n} \cap \mathcal {D}_i}\) | 0.44 | 1.79 | 2.93 | 4.85 | 5.68
\(\mathcal {C}_{i,n} \cap \mathcal {D}_i\) | \(\mathsf {P}_{\mathcal {C}_{i,n} \cap \mathcal {D}_i}\) | 0.35 | 0.46 | 0.65 | 0.82 | 0.95
Lipschitz constant, 13-gestures (13Myo-sEMG)
Constraint | Projection | Acc. 75% | 80% | 85% | 90% | 95%
\(\widetilde{\mathcal {C}}_{i} \cap \mathcal {D}_i\) | \(\widetilde{\mathsf {P}}_{\widetilde{\mathcal {C}}_{i} \cap \mathcal {D}_i}\) | 20.2 | 41.8 | 145.2 | \(2.2 \times 10^5\) | \(1.21 \times 10^{11}\)
\(\widetilde{\mathcal {C}}_{i} \cap \mathcal {D}_i\) | \(\mathsf {P}_{\widetilde{\mathcal {C}}_{i} \cap \mathcal {D}_i}\) | 0.85 | 20.47 | 112.3 | \(1.62 \times 10^{4}\) | \(2.31 \times 10^{8}\)
\(\check{\mathcal {C}}_{i,n} \cap \mathcal {D}_i\) | \({\widetilde{\mathsf {P}}}_{\check{\mathcal {C}}_{i,n} \cap \mathcal {D}_i}\) | 0.84 | 2.08 | 4.23 | 7.54 | 12.02
\(\check{\mathcal {C}}_{i,n} \cap \mathcal {D}_i\) | \(\mathsf {P}_{\check{\mathcal {C}}_{i,n} \cap \mathcal {D}_i}\) | 0.81 | 2.01 | 4.12 | 7.50 | 11.92
\(\mathcal {C}_{i,n} \cap \mathcal {D}_i\) | \(\widetilde{\mathsf {P}}_{\mathcal {C}_{i,n} \cap \mathcal {D}_i}\) | 0.54 | 1.87 | 3.38 | 4.20 | 5.78
\(\mathcal {C}_{i,n} \cap \mathcal {D}_i\) | \(\mathsf {P}_{\mathcal {C}_{i,n} \cap \mathcal {D}_i}\) | 0.49 | 0.53 | 0.75 | 0.92 | 1.25
Lipschitz constant, 24-gestures (NinaPro DB5 Ex. C)
Constraint | Projection | Acc. 65% | 70% | 75% | 80% | 85%
\(\widetilde{\mathcal {C}}_{i} \cap \mathcal {D}_i\) | \(\widetilde{\mathsf {P}}_{\widetilde{\mathcal {C}}_{i} \cap \mathcal {D}_i}\) | 25.13 | 57.16 | 188.26 | \(2.5 \times 10^6\) | \(2.14 \times 10^{11}\)
\(\widetilde{\mathcal {C}}_{i} \cap \mathcal {D}_i\) | \(\mathsf {P}_{\widetilde{\mathcal {C}}_{i} \cap \mathcal {D}_i}\) | 1.85 | 31.12 | 112.3 | \(1.82 \times 10^{4}\) | \(4.63 \times 10^{8}\)
\(\check{\mathcal {C}}_{i,n} \cap \mathcal {D}_i\) | \({\widetilde{\mathsf {P}}}_{\check{\mathcal {C}}_{i,n} \cap \mathcal {D}_i}\) | 1.74 | 2.41 | 6.02 | 10.17 | 20.14
\(\check{\mathcal {C}}_{i,n} \cap \mathcal {D}_i\) | \(\mathsf {P}_{\check{\mathcal {C}}_{i,n} \cap \mathcal {D}_i}\) | 1.57 | 2.18 | 5.94 | 10.58 | 19.69
\(\mathcal {C}_{i,n} \cap \mathcal {D}_i\) | \(\widetilde{\mathsf {P}}_{\mathcal {C}_{i,n} \cap \mathcal {D}_i}\) | 0.88 | 2.05 | 4.28 | 5.74 | 6.84
\(\mathcal {C}_{i,n} \cap \mathcal {D}_i\) | \(\mathsf {P}_{\mathcal {C}_{i,n} \cap \mathcal {D}_i}\) | 0.77 | 0.96 | 1.27 | 1.44 | 1.96
Lipschitz constant, 53-gestures (NinaPro DB1)
Constraint | Projection | Acc. 65% | 70% | 75% | 80% | 85%
\(\widetilde{\mathcal {C}}_{i} \cap \mathcal {D}_i\) | \(\widetilde{\mathsf {P}}_{\widetilde{\mathcal {C}}_{i} \cap \mathcal {D}_i}\) | 26.26 | 86.17 | 200.45 | \(4.10 \times 10^6\) | \(4.32 \times 10^{11}\)
\(\widetilde{\mathcal {C}}_{i} \cap \mathcal {D}_i\) | \(\mathsf {P}_{\widetilde{\mathcal {C}}_{i} \cap \mathcal {D}_i}\) | 2.60 | 50.12 | 163.14 | \(2.8 \times 10^{4}\) | \(2.9 \times 10^{9}\)
\(\check{\mathcal {C}}_{i,n} \cap \mathcal {D}_i\) | \({\widetilde{\mathsf {P}}}_{\check{\mathcal {C}}_{i,n} \cap \mathcal {D}_i}\) | 2.94 | 4.43 | 6.88 | 14.25 | 22.16
\(\check{\mathcal {C}}_{i,n} \cap \mathcal {D}_i\) | \(\mathsf {P}_{\check{\mathcal {C}}_{i,n} \cap \mathcal {D}_i}\) | 2.83 | 2.18 | 5.56 | 16.48 | 20.16
\(\mathcal {C}_{i,n} \cap \mathcal {D}_i\) | \(\widetilde{\mathsf {P}}_{\mathcal {C}_{i,n} \cap \mathcal {D}_i}\) | 1.22 | 1.80 | 6.83 | 7.40 | 8.23
\(\mathcal {C}_{i,n} \cap \mathcal {D}_i\) | \(\mathsf {P}_{\mathcal {C}_{i,n} \cap \mathcal {D}_i}\) | 1.56 | 2.08 | 2.53 | 2.74 | 3.88
Table 3. Lipschitz Constant Obtained with Various Constrained Optimization Strategies for Different Accuracies
For comparison, for each of the proposed constraints, we also evaluated the use of an inexact projection, designated by \(\widetilde{\mathsf {P}}\) (see Section 3.3). It can be observed that using an exact projection yields significantly better results. By combining tight constraints and exact projection techniques, we observe that the robustness of the network can be properly ensured while keeping a good accuracy. Indeed, we succeeded in ensuring a Lipschitz constant around 1 for a 95% accuracy for the first two datasets. The observed loss in accuracy with respect to a standard training is consistent with the “no free lunch theorem” [54].
Training neural networks subject to tight spectral norm constraints can be challenging, and the cost of obtaining a good performance is the training time. We used a learning rate scheduler strategy during training, reducing the learning rate by a factor of 2 if the performance does not improve for 1000 epochs. Figure 4 shows the training curves for both the validation and training sets in the context of the unconstrained baseline model (yellow and green lines), and in the case of training a constrained version (red and blue lines) using the optimal projection \(\mathsf {P}_{\mathcal {C}_{i,n} \cap \mathcal {D}_i}\), with \(\overline{\vartheta } = 0.95\). Even though it requires more iterations, the constrained model is capable of reaching an accuracy comparable with the baseline, while providing a robustness certificate.
Fig. 4.
Fig. 4. Accuracy vs. Iterations—constrained and unconstrained models in the context of 7-gesture dataset. The training and validation curves are displayed in green and yellow, respectively, for the unconstrained model. The training and validation curves are displayed in blue and red, respectively, in the case of constrained training, with the bound \(\overline{\vartheta } = 0.95\).
Since the training curves may show some slight variations, we measured the accuracy variations in two ways: by computing the classical standard deviation (\(\operatorname{std}\)), and by employing the median absolute deviation (\(\operatorname{MAD}\)). For a vector \((x_i)_{1\le i \le I}\), it is expressed as \(\operatorname{MAD} = \operatorname{median}((| x_i - \zeta (x)|)_{1\le i \le I})\), where \(\zeta (x)\) represents the median of the vector components. From this quantity, we can derive an empirical estimate of the standard deviation by multiplying \(\operatorname{MAD}\) by a factor equal to 1.4826. The latter estimate is known to be more robust to outliers for Gaussian distributed data, especially in the case of small populations. The results are summarized in Table 4. It can be observed that the empirical standard deviation is below 1.6% and its robust estimate is below 1.1% for all four datasets. These deviation values are reasonable considering the size of the datasets and show that the presented results are relevant and consistent.
Table 4.
Dataset | Estimate | Acc. 75% | 80% | 85% | 90% | 95%
7-gestures (Myo-sEMG) | empirical std | 0.65 | 1.22 | 0.56 | 1.35 | 1.10
7-gestures (Myo-sEMG) | robust std | 1.02 | 0.94 | 0.53 | 0.87 | 1.07
13-gestures (13Myo-sEMG) | empirical std | 0.65 | 1.05 | 0.75 | 0.75 | 0.72
13-gestures (13Myo-sEMG) | robust std | 0.77 | 0.81 | 0.72 | 0.97 | 0.59
Dataset | Estimate | Acc. 65% | 70% | 75% | 80% | 85%
24-gestures (NinaPro DB5.C) | empirical std | 0.68 | 0.95 | 0.87 | 0.77 | 0.76
24-gestures (NinaPro DB5.C) | robust std | 0.89 | 0.74 | 0.79 | 0.89 | 0.64
53-gestures (NinaPro DB1) | empirical std | 0.72 | 0.93 | 0.95 | 0.87 | 0.78
53-gestures (NinaPro DB1) | robust std | 0.94 | 0.77 | 0.78 | 0.92 | 0.84
Table 4. Standard Deviation of Accuracy Computed on 15 Epochs, After Convergence, on the Test Set for Constrained Models
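The robust deviation estimate used above can be reproduced with the short sketch below (ours).

```python
import numpy as np

def robust_std(values):
    # Median absolute deviation rescaled by 1.4826 so that it matches the
    # standard deviation for Gaussian distributed data.
    v = np.asarray(values, dtype=float)
    return 1.4826 * np.median(np.abs(v - np.median(v)))
```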
Next, we also evaluated how the positivity constraint impacts the overall accuracy of our system. We trained a robust network by allowing the weights to have arbitrary signs. For this purpose, we individually constrain the Lipschitz constant of each layer \(i \in \lbrace 1, \dots , m\rbrace\) to be less than a given value \(\overline{\vartheta }^{1/m}\). The exact projection onto \(\widetilde{\mathcal {C}}_i\), \(\mathsf {P}_{\widetilde{\mathcal {C}}_i}\), as well as the approximate one, \(\widetilde{\mathsf {P}}_{\widetilde{\mathcal {C}}_i}\), were computed as described previously. In this case, \(\overline{\vartheta }\) represents an upper bound on the Lipschitz constant of the system. Table 5 summarizes the results for different values of \(\overline{\vartheta }\), for two datasets. We compare our method for dealing with Lipschitz constraints with the approach proposed in Reference [51]. This approach, which is implemented in the deel-lip library, allows the user to train robust networks in a convenient manner, offering a robustness certificate by performing a spectral normalization of each layer. It can be observed on these datasets that our method yields similar results when using the approximate projection, but better ones when using the exact projection. These results underline again the importance of carefully managing the projections and the effect they have on the accuracy of the system.
Table 5.
Lipschitz constant, 7-gestures dataset (Myo-sEMG)
Method | Acc. 75% | 80% | 85% | 90% | 95%
\(\widetilde{\mathsf {P}}_{\widetilde{\mathcal {C}}_{i}}\) | 72.03 | 127.5 | 1296 | \(8.75 \times 10^4\) | \(5.43 \times 10^9\)
\(\mathsf {P}_{\widetilde{\mathcal {C}}_i}\) | 52.06 | 102.49 | 905.45 | \(7.23 \times 10^4\) | \(8.14 \times 10^8\)
Deel-lip [51] | 75.81 | 126.9 | 1283.6 | \(8.70 \times 10^4\) | \(5.43 \times 10^9\)
Lipschitz constant, 13-gestures dataset (13Myo-sEMG)
Method | Acc. 75% | 80% | 85% | 90% | 95%
\(\widetilde{\mathsf {P}}_{\widetilde{\mathcal {C}}_{i}}\) | 76.59 | 125.20 | 1016 | \(2.03 \times 10^4\) | \(4.3 \times 10^8\)
\(\mathsf {P}_{\widetilde{\mathcal {C}}_i}\) | 61.22 | 99.74 | 740 | \(1.26 \times 10^4\) | \(6.7 \times 10^7\)
Deel-lip [51] | 77.21 | 125.63 | 1120 | \(2.04 \times 10^4\) | \(4.5 \times 10^8\)
Table 5. Lipschitz Constant for Networks Trained with Arbitrary Signs—7-gestures/13-gestures Datasets

5 Robustness Validation

In this section, we investigate to what extent the theoretical concepts described in the previous sections help in improving the robustness of the classifier in different settings. To this end, we consider the following three scenarios. In the first one, we examine the impact of adversarial attacks on the performance of the classifier. The second scenario takes into account the effect of noise in the acquisition process. In the case of sEMG signals, this noise may come from imperfect skin-sensor contact caused by hairs or drops of sweat. In the last scenario, we perform a real-life experiment involving 10 able-bodied volunteers.

5.1 Sensitivity to Adversarial Attacks

We evaluate our robust model on purposely designed perturbations, by studying their influence on the overall performance of the system. We run attacks on our best robust model in terms of accuracy and robustness, which achieves 92.95% accuracy and a Lipschitz constant \(\overline{\vartheta } = 0.87\) for the 7-gestures dataset. We compare the results with two conventionally trained models: the best one in terms of performance, which achieves 99.78% prediction accuracy on non-adversarial data, and another one trained to have a performance similar to our robust model, reaching 92.99% accuracy on the original test set.
To create the adversarial samples, we used some of the most popular attackers, namely:
Fast gradient sign method (FGSM) [27]—generates adversarial data based on the sign of the gradient of the cost function with respect to the input data (a minimal sketch is given after this list);
Jacobian Saliency Map Attacker (JSMA) [44]—computes a perturbation based on an \(\ell _2\) distance metric by iteratively selecting the input features that most increase the chance of misclassification;
Projected gradient descent (PGD) [39]—uses local first-order information about the network to create adversarial examples;
Carlini and Wagner (C&W) [8]—uses \(\ell _{2}\) distance to compute the optimal adversarial perturbation.
Gradient Matching (GM) [26]—this is a data-poisoning black-box attack. In this case, the attacker does not have access to the victim model parameters, but instead tries to match the gradient direction of adversarial examples.
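As an example of the simplest of these attacks, a minimal PyTorch-style FGSM sketch is given below (our illustration; `model`, `loss_fn`, and `epsilon` are placeholders, and the other attacks listed above require iterative or optimization-based procedures).

```python
import torch

def fgsm_attack(model, x, y, loss_fn, epsilon):
    # One-step FGSM: perturb each input coordinate by epsilon in the
    # direction of the sign of the loss gradient with respect to the input.
    x_adv = x.clone().detach().requires_grad_(True)
    loss = loss_fn(model(x_adv), y)
    loss.backward()
    return (x_adv + epsilon * x_adv.grad.sign()).detach()
```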
We also show a comparison with another popular technique for ensuring the robustness of neural network-based models, namely adversarial training. This involves training on an extended version of the dataset, containing the original training data together with a perturbed version of the samples, in an effort to increase the system stability against adversarial inputs. Note that this method is purely empirical and gives no theoretical robustness certificate. We implemented the adversarial training strategy detailed in Reference [39], training the system using an augmented version of the dataset which was updated every 25 epochs. The adversarial samples were created using the PGD attack, and the model was then validated using data containing perturbations computed with various attacks.
The results summarised in Table 6 show the performance obtained on the 7-gestures test set. Note that the performance of the robust model is barely affected by the adversarial perturbations, whereas the baseline models show a huge drop in accuracy. It can be observed that adversarial training helps to increase robustness, but our method of controlling the Lipschitz constant of the network provides better results when facing data perturbed with attackers other than PGD. As expected, the poisoning attack is less effective than the white-box ones against the baseline models, but our robust model still showcases better performance. This shows that our method is more versatile, since its performance remains stable whatever the attacker.
Table 6.
Accuracy [%]

Attack                  Robust model   Best baseline   Baseline (similar accuracy)   Adversarially trained (PGD)
None (clean test set)   92.95          99.78           92.99                         97.25
FGSM [27]               91.75          76.48           71.21                         80.43
C&W \(\ell _{2}\) [8]   90.09          48.03           45.85                         60.17
PGD [39]                91.92          59.36           56.38                         97.25
JSMA [44]               91.10          89.37           81.27                         83.31
GM [26]                 92.13          98.25           89.04                         95.38
Table 6. Adversarial Attack Results
The first four attacks are white-box, whereas the last one (GM) is a black-box attack. We consider our best constrained model, having a Lipschitz constant \(\theta =0.97\), and two conventionally trained models: the best baseline and another one having performance similar to the constrained model. The last column corresponds to an adversarially trained model using PGD-generated perturbations.

5.2 Noisy Input Behaviour

To simulate the effect of underlying noise generated during the acquisition process, we added synthetic noise directly to the raw sEMG data, prior to the feature extraction step. The noise is chosen independent and identically distributed according to a Gaussian mixture law \((1-p) \mathcal {N}(0,\sigma _0^2)+p \mathcal {N}(0,\sigma _1^2)\). The mixture comprises a background component, corresponding to the intrinsic electronic noise in the armband, such as thermal or quantization noise, and an impulsive component accounting for outliers. Those may be related to imperfect wiring that can generate impulse-like artifacts. In our experiments, we consider background and impulse noises with standard deviations \(\sigma _0 = \alpha\) and \(\sigma _1 = 10\alpha\) with \(\alpha \in [0,+\infty [\). We generate different levels of noise, by varying the parameter \(\alpha\). The probability of peaks \(p\in [0,1]\) is also adjusted to simulate more or less severe scenarios in terms of outliers.
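As an illustration, the noise model above can be simulated with the short NumPy sketch below; the window size and the values of \(\alpha\) and p are arbitrary placeholders.

```python
import numpy as np

def mixture_noise(shape, alpha, p, rng):
    """i.i.d. samples from (1-p) N(0, alpha^2) + p N(0, (10*alpha)^2):
    a Gaussian background component plus sparse, larger-variance impulses."""
    impulsive = rng.random(shape) < p
    sigma = np.where(impulsive, 10.0 * alpha, alpha)
    return sigma * rng.standard_normal(shape)

# Example: corrupt a (placeholder) window of 8-channel raw sEMG samples
# before feature extraction.
rng = np.random.default_rng(0)
emg_window = np.zeros((400, 8))          # stand-in for a real recording
noisy_window = emg_window + mixture_noise(emg_window.shape, alpha=0.5, p=0.3, rng=rng)
```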
From the resulting noisy signals, we extract the features described in Section 4 and pass them to the classifier, using our robust models trained on non-altered data, which reach an accuracy of 92.95% (\(\overline{\vartheta } = 0.87\)) on the 7-gestures dataset and 93.05% (\(\overline{\vartheta } = 0.98\)) on the 13-gestures dataset. We compared the results achieved with our robust training with those obtained with (i) classical training and (ii) adversarial training. In this case, the adversarial training was performed by generating an extended dataset containing the original data together with versions corrupted by additive noise following the Gaussian mixture law described above, where the parameters p and \(\alpha\) were drawn uniformly at random from \([0.15, 0.45]\) and \([0, 2]\), respectively. In the absence of noise, similar accuracies were obtained for the baseline and the adversarially trained models: 92.99% and 92.97% on the 7-gestures dataset, and 93.03% and 92.98% on the 13-gestures dataset, respectively.
The experimental results obtained on two datasets are depicted in Figure 5. The red, blue, and green lines correspond to the unconstrained, constrained, and adversarial models, respectively. We observe that the constrained model is significantly less affected by the presence of noise in the inputs than the one trained without robustness guarantees. It is also worth noting that training with adversarial inputs also leads to satisfactory results, although usually slightly less accurate. The Lipschitz lower and upper bounds computed for the networks trained in an adversarial manner are indeed much lower than those with standard training, but they remain quite large ((1845.23, 79534.2) for 7-gestures dataset and (1754.74, 64595.8) for 13-gestures dataset).
Fig. 5.
Fig. 5. Accuracy vs. \(\alpha\) for noisy inputs. First row: 7-gestures dataset; second row: 13-gestures dataset. Red line: robust model; blue line: baseline model; green line: adversarial trained model.
This experiment emphasizes that controlling the Lipschitz constant of a network improves its robustness not only against targeted adversarial attacks, as shown previously, but also in the case of black-box attacks, where no prior information about the model is used.
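For completeness, the sketch below illustrates two generic ways of bracketing the Lipschitz constant of a feed-forward network with 1-Lipschitz activations, in the spirit of the lower and upper bounds quoted above: a sampled lower bound and the loose upper bound given by the product of the layers' spectral norms. This is only an illustrative NumPy sketch; it does not reproduce the tighter certified bounds derived earlier in the article.

```python
import numpy as np

def product_upper_bound(weights):
    """Loose Lipschitz upper bound for a feed-forward network with
    nonexpansive (1-Lipschitz) activations: product of the spectral norms."""
    return float(np.prod([np.linalg.norm(W, 2) for W in weights]))

def sampled_lower_bound(f, samples, eps=1e-3, rng=np.random.default_rng(0)):
    """Empirical lower bound: largest observed ratio
    ||f(x + d) - f(x)|| / ||d|| over small random perturbations d.
    This is a heuristic estimate, not a certificate."""
    best = 0.0
    for x in samples:
        d = eps * rng.standard_normal(x.shape)
        best = max(best, np.linalg.norm(f(x + d) - f(x)) / np.linalg.norm(d))
    return best
```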

5.3 Real-Life Scenario Validation

To illustrate the practical applicability of our findings, we validate our model in a real-life context. For this purpose, we designed an experiment comparing a conventionally trained model with the constrained one. We integrated both models into a real-time application that controls a 3D hand on a screen, as well as a game that can be controlled by gestures, to give the user tangible feedback. We used the Unity4 platform to design and control the 3D hand and encapsulated our models in an application performing real-time inference, so that the hand moved on the screen in accordance with the predicted gesture. We asked 10 volunteers (males and females) to test both models by performing each gesture 20 times. We emphasize that the users had no prior knowledge about which model was running, since it was randomly selected at the beginning of each new trial. Pictures of the experimental setup are provided in Figure 6. Table 7 details, at the user level, how many of the 20 trials were erroneously classified; U and C denote the Unconstrained and the Constrained models, respectively. Note that, despite obtaining very good results on the test set, the unconstrained model loses a lot in terms of performance (up to 15%) when facing real-life data. We can observe that training positive neural networks subject to Lipschitz constraints improves the overall robustness of the classifier against adversarial perturbations, not only from a theoretical viewpoint but also in practice, leading to more reliable systems with greater generalization power.
Table 7.
Misclassified trials (out of 20 per gesture); C = constrained model, U = unconstrained model

Movement        User #1  #2     #3     #4     #5     #6     #7     #8     #9     #10
                C/U      C/U    C/U    C/U    C/U    C/U    C/U    C/U    C/U    C/U
up              2/2      1/3    0/0    0/1    0/0    0/2    0/2    1/2    0/2    1/3
down            1/1      0/2    2/3    0/0    2/4    1/0    2/3    1/1    0/1    0/1
right           0/4      0/0    0/1    0/1    1/1    0/2    0/0    0/0    0/1    1/2
left            3/5      1/4    0/1    0/1    2/5    0/0    0/1    2/3    1/2    0/1
fist            0/2      2/4    0/0    1/0    0/3    0/1    1/1    0/2    1/1    1/3
spread          0/3      2/5    3/4    2/4    1/0    0/0    1/2    1/0    0/1    0/3
Sum             6/17     6/18   5/9    3/7    6/13   1/5    4/9    6/7    2/8    3/13
Error rate (%)  5/14     5/15   4.1/7.5  2.5/5.7  5/10.7  0.7/4.1  3.3/7.5  5/5.7  1.6/6.6  2.5/10.7
Table 7. Real-Scenario Experiment Results
Fig. 6.
Fig. 6. Real-life experimental setup.
As for the other application, we asked the volunteers to play two rounds of a gesture-controlled game, one with each model. The game was inspired by the famous Temple Run,5 and consists of a moving cube that the user controls via gestures. The player can move his/her hand left or right to move the character to either side of the screen in order to avoid obstacles, move the hand up to jump, or spread the fingers to shoot and clear the obstacles ahead. The game is over when the player fails to take a turn or to jump over/clear an obstacle. We observed that 70% of the users obtained higher scores when they used the constrained model, showing again that our solution is more stable when it comes to real-life applications.

5.4 Limitations

Increased training time is one of the main limitations of our proposed approach. Indeed, to compute the true projection, the proposed method uses an iterative algorithm that performs a singular value decomposition at each iteration, a resource-consuming operation, especially when applied to large matrices. We propose several lower-complexity solutions, which have proved to offer a good tradeoff between training time, robustness, and performance. Table 8 shows the training time for all the proposed constraint algorithms. The time is measured per step, where a step consists of a batch of 2048 examples. Nevertheless, it is worth noting that the additional time overhead applies only during the training phase. The inference time is the same for all the models, around 7 ms per step.
Table 8.
Constraint                                                                          time/step [ms]
None                                                                                7
\(\widetilde{\mathsf {P}}_{\widetilde{\mathcal {C}}_{i} \cap \mathcal {D}_i}\)      9
\(\mathsf {P}_{\widetilde{\mathcal {C}}_{i} \cap \mathcal {D}_i}\)                  11
\({\widetilde{\mathsf {P}}}_{\check{\mathcal {C}}_{i,n} \cap \mathcal {D}_i}\)      9
\(\mathsf {P}_{\check{\mathcal {C}}_{i} \cap \mathcal {D}_i}\)                      11
\(\widetilde{\mathsf {P}}_{\mathcal {C}_{i,n} \cap \mathcal {D}_i}\)                18
\(\mathsf {P}_{\mathcal {C}_{i,n} \cap \mathcal {D}_i}\)                            28
Table 8. Training Time for Different Constraints in the Case of an \(m=6\)-layer Network
The training time is computed on a batch of 2048 examples.
Another limitation is related to the fact that our method for controlling the Lipschitz constant of the system is currently applicable in the context of nonnegative-weighted fully connected feed-forward neural networks. Although the performance remains good for the considered AGR systems, the nonnegativity constraint might lead to a loss of expressivity of the neural networks in other inference tasks. In a future work, we plan to extend our method towards more general neural network architectures, including convolutional layers, skip connections, and so on.

6 Conclusion

This work has shown the usefulness of designing robust feed-forward neural networks for AGR based on sEMG physiological signals. More precisely, we proposed to finely control the Lipschitz constant of these nonlinear systems by considering positively weighted neural architectures. To offer robustness certificates, we also developed new optimization techniques for training classifiers subject to spectral norm constraints on the weights. We studied various constrained formulations and showed that robustness can be secured without sacrificing accuracy when using a combination of tight constraints and exact projections. We also provide several lower-complexity solutions, which reduce the training time significantly.
Experiments on four distinct datasets illustrated the good performance of our approach. We further demonstrated the effectiveness of our robust classifier, compared with classically trained ones, when facing white-box and black-box attacks.
We also want to highlight that one of the key advantages of our research was the ability to conduct real-life experiments, made possible by our access to a specialized acquisition module tailored for capturing gesture data. This module allowed us to gather real-world gesture data in a controlled setting that closely mimics practical scenarios. By conducting experiments with real users and their gestures, we could thoroughly evaluate the performance and accuracy of the proposed methodology. This real-life experimentation not only provided invaluable insights into the effectiveness of our approach, but also demonstrated its feasibility and potential for implementation in real-world applications.
In future works, it would be interesting to apply such a robust training procedure to other applications in pattern recognition involving data acquired in real-time.

Footnotes

1. To simplify our notation, \(\mathcal {B}(0,\overline{\vartheta })\) will designate any spectral ball of this kind whatever the dimensions of the involved matrices.
3. A code in TensorFlow will be made available upon the acceptance of the article.

A Accelerated DFB Algorithm

Let \(n\in \mathbb {N}\setminus \lbrace 0\rbrace\) and \(i\in \lbrace 1,\ldots ,m\rbrace\). Computing the projection of a matrix \(\overline{W}_i \in \mathbb {R}^{N_i\times N_{i-1}}\) onto \(\mathcal {D}_i \cap \mathcal {C}_{i,n}\) is equivalent to solving the following matrix optimization problem:
\begin{equation} \underset{{{\scriptstyle \begin{matrix}{W_i\in \mathbb {R}^{N_i \times N_{i-1}}}\end{matrix}}}}{\text{minimize}}\;\;\iota _{\mathcal {D}_i}(W_i)+\iota _{\mathcal {B}(0,\overline{\vartheta })}(A_{i,n}W_iB_{i,n})+ \frac{1}{2} \Vert W_i-\overline{W}_i\Vert ^2_{\rm F} \end{equation}
(30)
where \(\Vert \cdot \Vert _{\rm F}\) is the Frobenius norm and \(\iota _{\mathcal {S}}\) denotes the indicator function of a set \(\mathcal {S}\) (this function is equal to 0 on this set and \(+\infty\) otherwise). The dual optimization problem associated with this strongly convex minimization problem reads
\begin{equation} \underset{{{\scriptstyle \begin{matrix}{Y \in \mathbb {R}^{N_m\times N_0}}\end{matrix}}}}{\text{minimize}}\;\;f^*(-A_{i,n}^\top Y B_{i,n}^\top)+ \iota ^*_{\mathcal {B}(0,\overline{\vartheta })}(Y) , \end{equation}
(31)
where for a given function g, \(g^*\) denotes its Fenchel-Legendre conjugate. In our case \(f=\iota _{\mathcal {D}_i}+\frac{1}{2} \Vert \cdot -\overline{W}_i\Vert ^2_{\rm F}\). From standard conjugation rules [33], \(f^*\) is equal to
\begin{equation} (\forall W_i\in \mathbb {R}^{N_i\times N_{i-1}})\quad f^*(W_i) = \widetilde{\iota _{\mathcal {D}_i}}(W_i+\overline{W}_i), \end{equation}
(32)
where \(\widetilde{\iota _{\mathcal {D}_i}}\) is the Moreau envelope of \(\iota ^*_{\mathcal {D}_i}\) given by
\begin{equation} \widetilde{\iota _{\mathcal {D}_i}}(W_i)= \inf _{W^{\prime }_i\in \mathbb {R}^{N_i\times N_{i-1}}} \iota ^*_{\mathcal {D}_i}(W^{\prime }_i)+\frac{1}{2} \Vert W^{\prime }_i-W_i\Vert ^2_{\rm F}. \end{equation}
(33)
The Moreau envelope of a proper lower-semicontinuous convex function is differentiable. Thus \(f^*\) is differentiable and its gradient is [6, Example 17.33]
\begin{equation} \nabla f^*(W_i) = \mathsf {P}_{\mathcal {D}_i}(W_i+\overline{W}_i). \end{equation}
(34)
We deduce that the gradient of \(Y \mapsto f^*(-A_{i,n}^\top Y B_{i,n}^\top)\) is
\begin{equation*} -A_{i,n} \mathsf {P}_{\mathcal {D}_i}(\overline{W}_i-A_{i,n}^\top Y B_{i,n}^\top) B_{i,n}. \end{equation*}
Since \(\mathsf {P}_{\mathcal {D}_i}\) is a nonexpansive operator, the latter function has a Lipschitz gradient with constant \(\beta = \Vert A_{i,n}\Vert ^2_{\rm S} \Vert B_{i,n}\Vert ^2_{\rm S}\). The dual problem (31) thus corresponds to the minimization of the sum of a smooth convex function and a proper lower-semicontinuous convex function. Consequently, it can be minimized by a proximal algorithm. Such a strategy requires calculating the proximity operator of \(\gamma \iota ^*_{\mathcal {B}(0,\overline{\vartheta })}\) for some scaling parameter \(\gamma \in \, ]0,+\infty [\). By using Moreau’s formula [6], this proximity operator is expressed as
\begin{equation} (\forall Y\in \mathbb {R}^{N_m\times N_0})\;\; \operatorname{prox}_{\gamma \iota ^*_{\mathcal {B}(0,\overline{\vartheta })}}(Y) = Y - \gamma \mathsf {P}_{\mathcal {B}(0,\overline{\vartheta })}(\gamma ^{-1} Y). \end{equation}
(35)
A classical solution for solving the dual problem consists in using the standard forward-backward algorithm [15, 20]. This leads to Algorithm 1 [17]. Another solution consists in using the FISTA-like algorithm in [10], which leads to the accelerated version in Algorithm 2. The sequence \((Y_\ell)_{\ell \in \mathbb {N}}\) generated by either of these two algorithms is guaranteed to converge to a solution \(\widehat{Y}\) of the dual problem. In addition, from the Kuhn–Tucker conditions, the solution to the primal problem \(\widehat{W}_i=\mathsf {P}_{\mathcal {S}_{i,n}}(\overline{W}_i)\) is equal to \(\nabla f^*(-A_{i,n}^\top \widehat{Y} B_{i,n}^\top)\). It follows from (34) and the continuity of \(\mathsf {P}_{\mathcal {D}_i}\) that the sequence \((V_\ell)_{\ell \in \mathbb {N}}\) converges to \(\widehat{W}_i\).
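A compact NumPy sketch of this accelerated dual scheme is given below. For illustration only, we assume that \(\mathsf {P}_{\mathcal {D}_i}\) reduces to the projection onto the nonnegative orthant (the precise definition of \(\mathcal {D}_i\) is given earlier in the article); the step size and the number of iterations are also illustrative choices.

```python
import numpy as np

def proj_spectral_ball(Y, radius):
    """Projection onto the spectral ball B(0, radius): clip singular values."""
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    return U @ np.diag(np.minimum(s, radius)) @ Vt

def proj_D(W):
    """Assumed form of P_{D_i}: entrywise projection onto the nonnegative orthant."""
    return np.maximum(W, 0.0)

def accelerated_dfb(W_bar, A, B, radius, n_iter=200):
    """FISTA-like dual forward-backward iteration for the projection problem (30),
    using the gradient expression below (34) and the prox formula (35)."""
    beta = np.linalg.norm(A, 2) ** 2 * np.linalg.norm(B, 2) ** 2
    gamma = 1.0 / beta                          # step size for the accelerated scheme
    Y = np.zeros((A.shape[0], B.shape[1]))      # dual variable
    Z, t = Y.copy(), 1.0
    for _ in range(n_iter):
        V = proj_D(W_bar - A.T @ Z @ B.T)       # primal iterate V_l, cf. (34)
        G = Z + gamma * (A @ V @ B)             # forward (gradient) step
        Y_new = G - gamma * proj_spectral_ball(G / gamma, radius)   # prox step, (35)
        t_new = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0            # FISTA momentum
        Z = Y_new + (t - 1.0) / t_new * (Y_new - Y)
        Y, t = Y_new, t_new
    return proj_D(W_bar - A.T @ Y @ B.T)        # primal solution P_S(W_bar)
```

In the notation of (30), W_bar plays the role of \(\overline{W}_i\), while A and B stand for \(A_{i,n}\) and \(B_{i,n}\).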

References

[1]
Cristina Andronache, Marian Negru, Ioana Bădiţoiu, George Cioroiu, Ana Neacsu, and Corneliu Burileanu. 2022. Automatic gesture recognition framework based on forearm EMG activity. In Proc. IEEE Int. Conf. Telecomm. Signal Process., 284–288.
[2]
Cristina Andronache, Marian Negru, Ana Neacşu, George Cioroiu, Anamaria Rădoi, and Corneliu Burileanu. 2020. Towards extending real-time EMG-based gesture recognition system. In Proc. IEEE Int. Conf. Telecomm. Signal Process., Milan, 301–304.
[3]
Cem Anil, James Lucas, and Roger Grosse. 2019. Sorting out Lipschitz function approximation. In Proc. Int. Conf. Mach. Learn.Long Beach, California, 291–301.
[4]
Manfredo Atzori, Arjan Gijsberts, Claudio Castellini, Barbara Caputo, Anne-Gabrielle Mittaz Hager, Simone Elsig, Giorgio Giatsidis, Franco Bassetto, and Henning Müller. 2014. Electromyography data for non-invasive naturally-controlled robotic hand prostheses. Sci. Data 1, 1 (2014), 1–13.
[5]
Manfredo Atzori, Arjan Gijsberts, Ilja Kuzborskij, Simone Elsig, Anne-Gabrielle Mittaz Hager, Olivier Deriaz, Claudio Castellini, Henning Müller, and Barbara Caputo. 2014. Characterization of a benchmark database for myoelectric movement classification. IEEE Trans. on Neural Syst. and Rehab. Eng. 23, 1 (2014), 73–83.
[6]
Heinz H. Bauschke and Patrick L. Combettes. 2019. Convex Analysis and Monotone Operator Theory in Hilbert Spaces. 2nd ed., Corrected Printing. New York: Springer 1 (2019), 1–619.
[7]
Nicholas Carlini, Pratyush Mishra, Tavish Vaidya, Yuankai Zhang, Micah Sherr, Clay Shields, David Wagner, and Wenchao Zhou. 2016. Hidden voice commands. In USENIX Security Symp.Austin, TX, 513–530.
[8]
Nicholas Carlini and David Wagner. 2017. Towards evaluating the robustness of neural networks. In IEEE Symp. Security Privacy. San Jose, CA, 39–57.
[9]
Enea Ceolini, Charlotte Frenkel, Sumit Bam Shrestha, Gemma Taverni, Lyes Khacef, Melika Payvand, and Elisa Donati. 2020. Hand-gesture recognition based on EMG and event-based camera sensor fusion: A benchmark in neuromorphic computing. Front. in Neurosci. 14, 1 (2020), 637–652.
[10]
Antonin Chambolle and Charles Dossal. 2015. On the convergence of the iterates of the fast iterative shrinkage/thresholding algorithm. J. Optim. Theory Appl. 166, 3 (2015), 968–982.
[11]
Tong Chen, Jean-Bernard Lasserre, Victor Magron, and Edouard Pauwels. 2020. Semialgebraic optimization for Lipschitz constants of ReLU networks. In Proc. Int. Conf. Neural Info. Process. Syst. Curran Associates Inc., 12.
[12]
Ming Jin Cheok, Zaid Omar, and Mohamed Hisham Jaward. 2019. A review of hand gesture and sign language recognition techniques. Int. J. Mach. Learn. Cyber. 10, 1 (2019), 131–153.
[13]
Jan Chorowski and Jacek M. Zurada. 2015. Learning understandable neural networks with nonnegative weight constraints. IEEE Trans. Neural Netw. Learn. Syst. 26, 1 (2015), 62–69.
[14]
Moustapha Cisse, Piotr Bojanowski, Edouard Grave, Yann Dauphin, and Nicolas Usunier. 2017. Parseval networks: Improving robustness to adversarial examples. In Proc. Int. Conf. Mach. Learn.Sydney, NSW, 854–863.
[15]
Patrick Combettes and Jean-Christophe Pesquet. 2008. Proximal thresholding algorithm for minimization over orthonormal bases. SIAM J. on Optim. 18, 4 (2008), 1351–1376.
[16]
Patrick Combettes and Jean-Christophe Pesquet. 2020. Deep neural network structures solving variational inequalities. Set-Valued Var. Anal. 28, 3 (2020), 491–518.
[17]
Patrick L. Combettes, Dinh Dũng, and Bang Công Vũ. 2010. Dualization of signal recovery problems. Set-Valued Var. Anal. 18, 3–4 (2010), 373–404.
[18]
Patrick L. Combettes and Jean-Christophe Pesquet. 2020. Lipschitz certificates for layered network structures driven by averaged activation operators. SIAM J. on Math. Data Sci. 2, 4 (2020), 529–557.
[19]
Patrick L. Combettes and Jean-Christophe Pesquet. 2021. Fixed point strategies in data science. IEEE Trans. Sig. Proc. 69, 1 (2021), 3878–3905.
[20]
Patrick L. Combettes and Valérie R. Wajs. 2005. Signal recovery by proximal forward-backward splitting. Multiscale Model. Simul. 4, 4 (2005), 1168–1200.
[21]
Ulysse Côté-Allard, Cheikh Latyr Fall, Alexandre Drouin, Alexandre Campeau-Lecours, Clément Gosselin, Kyrre Glette, François Laviolette, and Benoit Gosselin. 2019. Deep learning for electromyographic hand gesture signal classification using transfer learning. IEEE Trans. Neural Syst. Rehabilitation Eng. 27, 4 (2019), 760–771.
[22]
Trevor J. Darrell, Irfan A. Essa, and Alex P. Pentland. 1996. Task-specific gesture analysis in real-time using interpolated views. IEEE Trans. Pattern Anal. Mach. Intell. 18, 12 (1996), 1236–1242.
[23]
Akram Fatayer, Wenpeng Gao, and Yili Fu. 2022. sEMG-based gesture recognition using deep learning from noisy labels. IEEE J. of Biomed. and Health Info. 26, 9 (2022), 4462–4473.
[24]
Mahyar Fazlyab, Alexander Robey, Hamed Hassani, Manfred Morari, and George Pappas. 2019. Efficient and accurate estimation of lipschitz constants for deep neural networks. In Adv. Neural Info. Process. Syst.Vancouver, Canada, 11423–11434.
[25]
Xiaojie Gao, Yueming Jin, Qi Dou, and Pheng-Ann Heng. 2020. Automatic gesture recognition in robot-assisted surgery with reinforcement learning and tree search. In Proc. IEEE Int. Conf. Robot. Autom.Paris, France, 8440–8446.
[26]
Jonas Geiping, Liam H. Fowl, W. Ronny Huang, Wojciech Czaja, Gavin Taylor, Michael Moeller, and Tom Goldstein. 2020. Witches’ Brew: Industrial scale data poisoning via gradient matching. In Int. Conf. Learn. Represent.
[27]
Ian J. Goodfellow, Jonathon Shlens, and Christian Szegedy. 2015. Explaining and harnessing adversarial examples. In Int. Conf. Learn. Represent.San Diego, CA.
[28]
Kavya Gupta, Beatrice Pesquet-Popescu, Fateh Kaakai, and Jean-Christophe Pesquet. 2021. A quantitative analysis of the robustness of neural networks for tabular data. In Proc. IEEE Int. Conf. Acoust. Speech Signal Process.8057–8061.
[29]
Jianxing He, Sally L. Baxter, Jie Xu, Jiming Xu, Xingtao Zhou, and Kang Zhang. 2019. The practical implementation of artificial intelligence technologies in medicine. Nat. Med. 25, 1 (2019), 30–36.
[30]
Yujian Jiang, Lin Song, Junming Zhang, Yang Song, and Ming Yan. 2022. Multi-category gesture recognition modeling based on sEMG and IMU signals. Sensors 22, 15 (2022), 1–25.
[31]
Rami N. Khushaba and Sarath Kodagoda. 2012. Electromyogram (EMG) feature reduction using mutual components analysis for multifunction prosthetic fingers control. In Proc. Int. Conf. Control Autom. Robotics & Vision. Guangzhou, China, 1534–1539.
[32]
Jonghwa Kim, Stephan Mastnik, and Elisabeth André. 2008. EMG-based hand gesture recognition for realtime biosignal interfacing. In Proc. Int. Conf. Intell. User Interfac.Canaria, Spain, 30–39.
[33]
Nikos Komodakis and Jean-Christophe Pesquet. 2015. Playing with duality: An overview of recent primal-dual approaches for solving large-scale optimization problems. IEEE Signal Process. Mag. 32, 6 (2015), 31–54.
[34]
Okan Kopuklu, Neslihan Kose, and Gerhard Rigoll. 2018. Motion fused frames: Data level fusion strategy for hand gesture recognition. In Proc. IEEE Conf. Comput. Vis. Pattern Recogn.Utah, 2103–2111.
[35]
Alexey Kurakin, Ian Goodfellow, and Samy Bengio. 2017. Adversarial machine learning at scale. In Int. Conf. on Learn. Represent.Toulon, France.
[36]
Alexey Kurakin, Zhengyou Zhang, and Zicheng Liu. 2012. A real time system for dynamic hand gesture recognition with a depth sensor. In Proc. IEEE European Signal Processing Conf.Bucharest, Romania, 1975–1979.
[37]
Fabian Latorre, Paul Rolland, and Volkan Cevher. 2020. Lipschitz constant estimation of neural networks via sparse polynomial optimization. In Int. Conf. on Learning Representations.
[38]
Peiliang Li, Xiaozhi Chen, and Shaojie Shen. 2019. Stereo R-CNN based 3D object detection for autonomous driving. In Proc. IEEE Conf. Comput. Vis. Pattern Recogn.Long Beach, CA, 7644–7652.
[39]
Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, and Adrian Vladu. 2018. Towards deep learning models resistant to adversarial attacks. In Proc. Int. Conf. Learn. Represent.Vancouver, BC, Canada.
[40]
Ana Neacşu, Jean-Christophe Pesquet, and Corneliu Burileanu. 2020. Accuracy-robustness trade-off for positively weighted neural networks. In Proc. IEEE Int. Conf. Acoust. Speech Signal Process.Barcelona, Spain, 8389–8393.
[41]
Ana Antonia Neacsu, George Cioroiu, Anamaria Rădoi, and Corneliu Burileanu. 2019. Automatic EMG-based hand gesture recognition system using time-domain descriptors and fully-connected neural networks. In Proc. Int. Conf. Telecommunications Signal Process.Budapest, Hungary, 232–235.
[42]
Natalia Neverova, Christian Wolf, Graham Taylor, and Florian Nebout. 2015. Moddrop: Adaptive multi-modal gesture recognition. IEEE Trans. Pattern Anal. Mach. Intell. 38, 8 (2015), 1692–1706.
[43]
Alvaro David Orjuela-Cañón, Andrés F. Ruíz-Olaya, and Leonardo Forero. 2017. Deep neural network for EMG signal classification of wrist position: Preliminary results. In IEEE Latin American Conf. Comput. Intell.Arequipa, Peru, 1–5.
[44]
Nicolas Papernot, Patrick McDaniel, Somesh Jha, Matt Fredrikson, Z. Berkay Celik, and Ananthram Swami. 2016. The limitations of deep learning in adversarial settings. In IEEE Symp. Security Privacy. Saarbrücken, Germany.
[45]
Patricia Pauli, Anne Koch, Julian Berberich, Paul Kohler, and Frank Allgower. 2021. Training robust neural networks using Lipschitz bounds. IEEE Control Syst. Lett. 6, 1 (2021), 121–126.
[46]
Fulai Peng, Cai Chen, Danyang Lv, Ningling Zhang, Xingwei Wang, Xikun Zhang, and Zhiyong Wang. 2022. Gesture recognition by ensemble extreme learning machine based on surface electromyography signals. Front. in Human Neurosci. 16 (2022), 1–14.
[47]
Stefano Pizzolato, Luca Tagliapietra, Matteo Cognolato, Monica Reggiani, Henning Müller, and Manfredo Atzori. 2017. Comparison of six electromyography acquisition setups on hand movement classification tasks. PLOS ONE 12, 10 (Oct.2017), 1–17.
[48]
Jinxian Qi, Guozhang Jiang, Gongfa Li, Ying Sun, and Bo Tao. 2019. Intelligent human-computer interaction based on surface EMG gesture recognition. IEEE Access 7, 1 (2019), 61378–61387.
[49]
Prajit Ramachandran, Barret Zoph, and Quoc V Le. 2018. Searching for activation functions. Proc. Int. Conf. Learn. Represent. (30 Apr.–03 May2018), 1–13.
[50]
Kevin Scaman and Aladin Virmaux. 2018. Lipschitz regularity of deep neural networks: Analysis and efficient estimation. In Proc. Ann. Conf. Neur. Inform. Proc. Syst.Montreal, Canada, 3839–3848.
[51]
Mathieu Serrurier, Franck Mamalet, Alberto González-Sanz, Thibaut Boissin, Jean-Michel Loubes, and Eustasio del Barrio. 2021. Achieving robustness in classification using optimal transport with hinge regularization. In Proc. IEEE Conf. Comput. Vis. Pattern Recogn.Nashville, 505–514.
[52]
Christian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan, Ian Goodfellow, and Rob Fergus. 2014. Intriguing properties of neural networks. In Int. Conf. Learn. Represent.Banff, Canada.
[53]
Miyato Takeru, Kataoka Toshiki, Koyama Masanori, and Yoshida Yuichi. 2018. Spectral normalization for generative adversarial networks. In Int. Conf. Learn. Represent.Vancouver, Canada.
[54]
Dimitris Tsipras, Shibani Santurkar, Logan Engstrom, Alexander Turner, and Aleksander Madry. 2019. Robustness may be at odds with accuracy. Int. Conf. Learn. Represent. (2019), 1–20.
[55]
Xiangrui Wang, Lu Tang, Qibin Zheng, Xilin Yang, and Zhiyuan Lu. 2023. IRDC-Net: An inception network with a residual module and dilated convolution for sign language recognition based on surface electromyography. Sensors 23, 13 (2023), 1–18.
[56]
Feng Wen, Zhongda Sun, Tianyiyi He, Qiongfeng Shi, Minglu Zhu, Zixuan Zhang, Lianhui Li, Ting Zhang, and Chengkuo Lee. 2020. Machine learning glove using self-powered conductive superhydrophobic triboelectric textile for gesture recognition in VR/AR applications. Adv. Sci. 7, 14 (2020), 2000261.
[57]
Nelly Indriani Widiastuti. 2019. Convolution neural network for text mining and natural language processing. In IOP Conf. Mater. Sci. Eng., Vol. 662. Kazimierz Dolny, Poland.
[58]
Zhiwen Yang, Du Jiang, Ying Sun, Bo Tao, Xiliang Tong, Guozhang Jiang, Manman Xu, Juntong Yun, Ying Liu, Baojia Chen, and Jianyi Kong. 2021. Dynamic gesture recognition using surface EMG signals based on multi-stream residual network. Front. in Bioengin. and Biotech. 9, 1 (2021), 1–13.
