
Hybrid Classifier for Fault Detection and Isolation in Wind Turbine based on Data-Driven

FADILI Yassine
Laboratory SSI, Department of Physics
Faculty of Sciences Dhar El Mehraz
University Sidi Mohammed Ben Abdellah
Fez, Morocco
fadili.m2si@gmail.com

BOUMHIDI Ismail
Laboratory SSI, Department of Physics
Faculty of Sciences Dhar El Mehraz
University Sidi Mohammed Ben Abdellah
Fez, Morocco
ismail.boumhidi@usmba.ac.ma

Abstract — In this paper, the problem of fault diagnosis in wind turbines is addressed using a data-driven approach. The idea is to classify the data into two classes, so that we can decide whether the state of the system is faulty or not. The proposed hybrid classifier combines three approaches: the Bayes statistical algorithm, back-propagation neural networks and decision trees. During training, the learning rate is gradually decreased in discrete steps so that the classifier converges to a minimum-error classification. Different parts of the process were investigated, including actuator, sensor and process faults. The process faults mainly concern friction in the wind turbine, which might damage it. Our results are compared with each of the three approaches that form the proposed strategy, taken separately, to indicate the value of our strategy. The proposed classifier was built with the Waikato Environment for Knowledge Analysis (WEKA) library and evaluated by observing the classification rate.

Keywords — Fault Detection and Isolation (FDI); wind turbine; data-driven; classification; learning; Bayes statistical algorithm; back-propagation neural networks; decision tree.

I. INTRODUCTION

Technological systems are vulnerable to defects. Actuator defects reduce the performance of control systems and may even cause a complete system failure. Incorrect sensor readings are the cause of several malfunctions, since these errors can reduce the quality and efficiency of a production line. In addition, the failure of a single component in a complex system can cause a malfunction of the entire system; the operation of the system must then be stopped to avoid damage to humans and/or machinery. Due to the economic, ecological and safety requirements to be satisfied, high reliability of technological systems has become a dominant objective in industry. This is why, over the last decades, methods for the detection and isolation of defects have attracted considerable research interest; these methods can be classified into two categories, model-based [1] or data-based [2].

It is known that in recent years the conversion of the kinetic energy of the wind into electrical energy has received considerable attention in the energy market, due to the growing demand and the need for clean energy sources. However, with the increasing production of wind energy, the complexity of wind turbines is also increasing. Moreover, in order to ensure accessibility and reduce downtime and maintenance costs, faster fault detection and isolation should be introduced in wind systems.

Benchmarking is based on the principle that results must be validatable, reproducible and comparable. This requires an exact specification of the reference problem, ranging from the description of the process, the design of the experiment and the test data to the evaluation criteria for the results [3]. Accordingly, reference models for wind turbines of varying degrees of complexity have recently received much attention, and have been successfully proposed and applied in simulation studies to compare approaches to fault detection and isolation (FDI). The most popular of these modern FDI techniques is the model-based approach [1], where a priori mathematical information is used to model the normal system. Based on the system model, state or parameter estimation techniques are then used to compare the actual process with the modeled process in order to accomplish fault detection. This comparison can also be used to generate a residual vector from which the cause of the defect can be identified by a statistical or geometric method based on knowledge.

For the wind turbine system, based on a limited description of noise and modeling errors, the interval observer approach was applied to analyze the fault signatures observed online and to match them with the theoretical results obtained using a structural analysis and a reasoning scheme [4]-[5]. We also find the use of parity equations to check the coherence between the measurements and the model by looking for a parameter in the set of feasible parameters (approximated by a zonotope) [6]. The fuzzy model, in the form of Takagi-Sugeno prototypes, represents the residual generators used for the detection and isolation of defects [7]-[8].

In addition, the analytical model requires the simultaneous consideration of an exact model of the diagnosed system, involving the laws of thermodynamics, the dynamics of compressible fluids, electricity and magnetism, as well as Newtonian mechanics. To overcome the limitations of the model-based approach, some researchers have instead used large amounts of data from available sensor measurements, event logs and records, together with diagnostic techniques based on sample data [1] [9]-[10].
978-1-5090-4062-9 / 17 / $ 31.00 © 2017 IEEE


In the context of data-based diagnosis, classifiers for control design and FDI methods appeared in the 1990s, and model classification techniques have been widely used for fault diagnosis [11]. The classification-based diagnostic procedure [21]-[22] takes place in two stages. First, an empirical classifier is drawn from previous knowledge and/or histories [24]; this is considered the training process [23]. Then, using the classifier obtained, the real-time data are classified into certain classes [25]-[26] which correspond to health states (the normal state or various fault states). Thus, detection and isolation of defects can be obtained.

It has also been found that classification performance can be improved by combining methods of signal analysis and/or feature extraction. The neural network approach has the ability to detect known and unknown attacks, and can be trained with supervised or unsupervised algorithms. According to [12]-[30], the Back-Propagation Neural Network (BPN) has a good detection rate compared to other neural network techniques, and can therefore be used for specific attack classification so that preventive action can be taken. BPN uses a supervised learning approach for training. In other works, methods for classifying defects have been based on neural networks, fuzzy logic or neuro-fuzzy logic [15]-[16]. In [27], a diagnostic framework based on the Bayesian network (BN) is proposed to incorporate fault frequencies. These approaches are based on chronological data of various signals in static environments. The decision tree technique is a subset of logic-based classification methods; it performs classification by sorting the inputs based on the inherent feature values. The nodes in a decision tree represent the feature values [13]. This method has improved the classification accuracy and interpretability of loan-granting decisions [14]-[30].

One of the main limitations of classical classification methods is that classifiers trained from pre-existing data can only recognize known classes in the diagnostic task of separating faulty and clean data. This has become a well-known classification issue.

Our study is motivated by the aforementioned data-driven approach, which is a simple method without complex algorithms and which remains of great interest from the application point of view, as it reduces design and engineering effort. It is an intuitive and simple data management method that ensures the security and reliability of batch processes and an easy process monitoring task. For this purpose, a diagnostic scheme for a wind turbine driven by sensor data is proposed in this paper. Without the availability of a process model, the parameters of the process monitoring system, which can be realized by the parity space and the diagnostic observer, are identified directly from the test data. The idea is to classify the data prepared from the sensors into two classes with the proposed hybrid classifier, so that we can decide whether the state of the system is defective or not. The proposed classifier combines three approaches: the naive Bayesian algorithm, the neural network approach and the decision tree approach. This contribution is obtained as follows. First, the training measurement vectors are constructed as combinations of the measurements captured by the sensors. Secondly, we combine the naive Bayesian algorithm, back-propagation neural networks and decision trees to build a hybrid classifier with greater performance, robustness, reproducibility and reliability for processing and classifying the data. Third, the proposed classifier is trained on part of the data, and the classifier parameters are adjusted to improve the quality of the classification as much as possible. Finally, the results obtained are validated by running the classification on all the measurement data. Comparing our results with the naive Bayesian classifier [27], the BPN classifier [30] and the decision tree classifier [31] indicates the value of our classifier. To validate our results we have used the WEKA machine learning library.

The remainder of the paper is organized as follows. In Section 2, each component of the wind turbine system is described and modeled. The preparation of the data is presented in Section 3. The classification strategy and the three approaches, the naive Bayesian algorithm, neural network learning and decision tree learning, are presented in Section 4. In Section 5, some important simulation results are presented and compared with each of the three approaches that form the proposed strategy, taken separately, to show the effectiveness of our approach. Finally, Section 6 gives a conclusion and perspectives.

II. WIND TURBINE DESCRIPTION

The basic configuration of the wind turbine is as follows. The nacelle contains the main components, including the generator and gearbox, and is located on top of the tower structure. The blades are connected to the rotor, which in turn is attached to the generator through the gearbox. The nacelle contains a motor that allows the blades to face the wind. Large-scale utility generators are generally variable speed and have mechanisms that alter the pitch angle of the blades and control the lift produced by the wind. Wind turbines have additional actuators and sensors that can be used for control. The generator torque load may be used to determine the amount of electrical power absorbed by the mechanical system and acts like a braking system to control the acceleration of the rotor; a braking system can bring the rotor to a complete stop. The main measurements are the rotor and generator speeds and the pitch angles of the blades. One of the principal premises is that the system has to remain at low cost, so high-cost and redundant sensors are generally avoided, resulting in additional challenges for the closed-loop and FDI problems.

The turbine operates in four distinct regions, Fig. 1: (1) the cut-in region, where winds are insufficient; (2) the interim region, i.e. the region between the cut-in and rated wind speeds; (3) the maximum power capture region, which starts at the minimum rated wind speed; (4) the cut-out region, where wind speeds are too high. The general control objective is to maximize power absorption while operating in region (2) and to minimize structural loads while operating in region (3); regions (1) and (4) are not considered.
Fig. 1 The reference power curve for the wind turbine depending on the wind speed [4].

The data set was generated from the simulation results of a benchmark wind turbine model for the development and analysis of FDI systems [17]. The components of the model and the relationships between them are represented in Fig. 2. This model is a simplification of a three-blade, utility-scale wind turbine with a rated power of 4.8 MW. Each blade has a free pitch motion; hence the control system provides three separate pitch command signals β_{i,r}, where i = 1, 2, 3, and the actual pitch position is determined by sensors that provide the signals β_{i,m}. Although the model has physical redundancy for the pitch sensors, the goal of this paper is to study the performance of an analytical FDI system, so redundant measurements are eliminated.

Fig. 2 Wind turbine model: system interconnection [18].

The torques transmitted to the generator and the rotor are denoted by τ_g and τ_r respectively, whereas the angular velocities of the generator and the rotor are denoted by w_g (rad/s) and w_r (rad/s). The complete set of measurements also includes the transmitted torques, the angular velocities and the wind speed ν_w. The captured power is approximately given by equation (1):

    P_r = τ_r w_r = (1/2) ρ A ν_w³ C_p(λ, β)    (1)

where τ_r (N·m) is the aerodynamic torque, w_r is the rotor speed, ρ (kg/m³) is the air density, A (m²) is the area swept by the rotor, and ν_w (m/s) is the wind speed. C_p is a function of the blade pitch angle β (deg) which represents how much of the power available in the wind is captured, and λ = w_r R / ν_w is the tip speed ratio, where R (m) is the rotor radius. The following transfer function represents the pitch actuators of the system:

    β_{k,m}(s) / β_k^d(s) = w_n² / (s² + 2 ζ w_n s + w_n²)    (2)

where β_{k,m}(s) and β_k^d(s) are respectively the measured position and the desired pitch angle of the blades, with k = 1, 2, 3; w_n is the angular frequency and ζ is the damping coefficient. The pressure-drop fault is obtained by changing the model parameters to the values w_{n2}, ζ_2.

The dynamics of the generator are modeled by a first-order transfer function:

    τ_{g,m}(s) / τ_{g,r}(s) = α_gc / (s + α_gc)    (3)

where τ_{g,m}(s) and τ_{g,r}(s) are respectively the real and the desired torque.

A two-mass model describes the drive train and the generator:

    | ẇ_r(t)  |   | -(B_dt + B_r)/J_r       B_dt/(N_g J_r)                -K_dt/J_r           | | w_r(t)  |   | 1/J_r    0     | | τ_r(t) |
    | ẇ_g(t)  | = | η_dt B_dt/(N_g J_g)    -(η_dt B_dt/N_g² + B_g)/J_g    η_dt K_dt/(N_g J_g) | | w_g(t)  | + | 0      -1/J_g  | | τ_g(t) |    (4)
    | θ̇_Δ(t)  |   | 1                      -1/N_g                          0                  | | θ_Δ(t)  |   | 0        0     |

where θ_Δ is the drive train torsion, J_r and J_g are the rotor and generator inertias, B_r and B_g are the viscous damping coefficients of the rotor and generator, B_dt and K_dt are the damping and stiffness coefficients of the drive train flexibility, N_g is the gearbox ratio, and η_dt is the efficiency of the drive train. If the turbine operates in region 3, a discrete proportional-integral controller generates the pitch command references to maintain the nominal rotor speed. The torque of the generator is set to:

    τ_{g,ref} = P_rated / w_g    (5)

In this paper, the different faults proposed in the FDI benchmark [18] are considered, as summarized in Table I.

III. THE PROPOSED STRATEGY

A. Preparing Data for Classification

The key step in learning a new model for fault detection by a classifier is the definition of the vector x to be used for classification. This vector should contain the most pertinent information on the behavior of the system. It should not be limited to the measured outputs: it can include the inputs, the set-points, combinations of these, or variations of the outputs over time. To build a useful vector, one should carefully observe the process outputs for each fault and propose a combination that ensures a sufficiently high impact of the considered faults on x.

Different vectors were suggested for the various kinds of faults. For the six sensors measuring the pitch positions (β_{k,mi}; k = 1, 2, 3; i = 1, 2), the following vector was used in the case where the fault is a fixed value or a gain factor (faults F1, F2 and F3):
TABLE I. THE DIFFERENT FAULTS PROPOSED IN THIS PAPER

Fault | Fault Description | Type | Value | Period (s)
F1 | Change in pitch 1 measurement | Fixed value | β1,m1 = 5° | 2000-2100
F2 | Change in pitch 2 measurement | Gain factor | β2,m2 = 1.2 · β2,m2 | 2300-2400
F3 | Change in pitch 3 measurement | Fixed value | β3,m1 = 10° | 2600-2700
F4 | Change in rotor speed sensor | Fixed value | wr,m1 = 1.4 rad/s | 1500-1600
F5 | Change in rotor and generator speed measurement | Gain factor | wr,m2 = 1.1 · wr,m2; wg,m2 = 0.9 · wg,m2 | 1000-1100
F6 | Abrupt parameter change in pitch 2 | Changed dynamics | wn2 = 11.11 to 5.73; ζ2 = 0.6 to 0.45 | 2900-3000
F7 | Slow parameter change in pitch 3 | Changed dynamics | wn3 = 11.11 to 3.42; ζ3 = 0.6 to 0.9 | 3400-3500
F8 | Offset in converter system | Offset | τg + 2000 N·m | 3800-3900
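As an illustration, the sensor faults of Table I can be reproduced on simulated signals with a small helper. This is a hedged sketch: only the fault types and their time windows come from Table I; the sampling instants, clean signal values and the dictionary-based fault description are illustrative choices.

```python
def inject_fault(signal, t, fault):
    """Apply one benchmark-style fault to a sampled signal.

    signal : list of clean measurements
    t      : list of matching time stamps (s)
    fault  : dict with 'type' ('fixed' | 'gain' | 'offset'),
             'value', 'start', 'end'  (hypothetical encoding of Table I rows)
    """
    out = []
    for ti, s in zip(t, signal):
        if fault["start"] <= ti <= fault["end"]:
            if fault["type"] == "fixed":
                out.append(fault["value"])        # e.g. F1: beta1,m1 = 5 deg
            elif fault["type"] == "gain":
                out.append(fault["value"] * s)    # e.g. F2: 1.2 * beta2,m2
            else:  # "offset"
                out.append(s + fault["value"])    # e.g. F8: tau_g + 2000 N.m
        else:
            out.append(s)                         # outside the fault window
    return out
```

For instance, injecting F1 replaces every pitch-1 sample between 2000 s and 2100 s with 5°, while the samples outside the window are untouched.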

    x = [ |β_{k,m1}(t_j) - β_{k,m2}(t_j)| ;  |β_{k,m1}(t_j) - β_{k,m1}(t_{j-1})| ;  |β_{k,m2}(t_j) - β_{k,m2}(t_{j-1})| ]    (6)

where t_j and t_{j-1} are the time instants j and j-1 respectively. The first component of x at time t_j detects differences between the two sensors at the same location, and the second and third components show the variation over time of the two sensor measurements. Note that absolute values are used in x in all cases. When |β_{k,mi}(t_j) - β_{k,mi}(t_{j-1})| = 0, this term is replaced by a large constant value (5000) to enhance the distinguishability between the fixed-value fault and the normal case, in which these values oscillate between 1×10⁻² and 2. The measured values β_{k,mi} were filtered using a first-order filter with a time constant τ = 0.06 s to reduce the sensitivity to process disturbances and measurement noise.

For sensor faults of the generator and rotor speeds (w_{g,mi}, w_{r,mi}, i = 1, 2), the following vector was used for learning in the case where the fault is a fixed value or a gain factor (faults F4 and F5):

    x = [ |w_{p,m1}(t_j) - w_{p,m2}(t_j)| ;  |w_{p,m1}(t_j) - w_{p,m1}(t_{j-1})| ;  |w_{p,m2}(t_j) - w_{p,m2}(t_{j-1})| ],  p = g, r    (7)

The measurements w_g were filtered with τ = 0.02 s and w_r with τ = 0.06 s before use in equation (7). Note, however, that very high variance values might lead to false alarms. For fault F8, the following vector was used:

    x = [ |w_{g,m1}(t_j) - w_{g,m2}(t_j)| ;  |τ_g^d(t_j) - τ_g^m(t_j)| ;  |λ_2 w_g^d(t_j) - (w_{g,m1}(t_j) + w_{g,m2}(t_j))/2| ]    (8)

where w_g^d is the desired generator speed, calculated from the desired generator torque τ_g^d obtained by the controller (P_r / τ_g^d, with P_r the desired power). The factor λ_2 = 10⁻¹⁰ × ν_wind in the third component of x was used to take the wind speed into account and for normalization. The values β_{k,mi} used are the filtered ones (as in Eq. 6). Note that τ_g^d was also filtered using a first-order filter with a time constant τ = 0.02 s; the objective of this filter was to take into account the dynamics of the control system (the time necessary for τ_g^m to reach τ_g^d, see (5)), not to reject measurement noise or disturbances.

For the detection of faults F6 and F7, the following vector was used:

    x = [ |w_{g,m1}(t_j) - w_{g,m2}(t_j)| ;  |β_{k,m1}(t_j) - β_{k,m2}(t_j)| ;  |β_{k,m1}(t_j) - β_{k,m1}(t_{j-1})| ;  |β_{k,m2}(t_j) - β_{k,m2}(t_{j-1})| ]    (9)

Once the learning vectors are defined for each fault, different fault scenarios are simulated and assigned the label y = +1/-1 (with or without fault). The classifier uses the vectors x and the corresponding y values to learn a model according to the principle of each classifier.

B. Classification

Data classification is a two-step process:

• In the first step, the classification process [29] can be viewed as the learning of a mapping or function, y = f(X), which can predict the associated class label y of a given tuple X. A classification algorithm builds the classifier by analyzing a training set made up of database tuples and their associated class labels.

• In the second step, the model is used for classification [28] and the predictive accuracy of the classifier is estimated. If we used the training set to measure the accuracy of the classifier, this estimate would be optimistic, because the classifier tends to overfit the data (which means that during learning it may incorporate particular anomalies of the training data that are not present in the general data set).

Recent data mining research has contributed to the development of scalable algorithms for classification and prediction. The selected models are discussed in the following parts.

1) Naive Bayesian Classifier

Bayesian classifiers are statistical classifiers [27]. They can predict class membership probabilities, such as the probability that a given sample belongs to a particular class. The Bayesian classifier is based on Bayes' theorem. Naive Bayesian classifiers assume that the effect of an attribute value of a given class is independent of the values of the other attributes. This assumption, called class conditional independence, is made to simplify the computation involved and, in this sense, is considered naive.

a) Bayes' Theorem

Let X = {x1, x2, ..., xn} be a sample whose components represent values measured on a set of n attributes. In Bayesian terms [20], X is considered evidence. Let H be some hypothesis, such as that the data X belongs to a specific class C. For classification problems, our goal is to determine P(H|X), the probability that the hypothesis H holds given the evidence, i.e. the observed data sample X. In other words, we are looking for the probability that sample X belongs to class C, given that we know the attribute description of X.

According to Bayes' theorem, the probability we want to compute, P(H|X), can be expressed in terms of the probabilities P(H), P(X|H) and P(X) as:

    P(H|X) = P(X|H) P(H) / P(X)    (10)

where P(H|X) is the a posteriori probability of H conditioned on X, P(X|H) is the probability of X conditioned on H, and P(H) and P(X) are the a priori probabilities of H and X respectively. These probabilities may be estimated from the given data.

The naive Bayesian classifier works as follows:

1. Let T be a training set of samples, each with its class label. There are k classes, C1, C2, ..., Ck. Each sample is represented by an n-dimensional vector X = {x1, x2, ..., xn}, depicting n measured values of the n attributes A1, A2, ..., An respectively.

2. Given a sample X, the classifier predicts that X belongs to the class having the highest a posteriori probability conditioned on X; that is, X is predicted to belong to the class Ci if and only if

    P(Ci|X) > P(Cj|X)  for 1 ≤ j ≤ k, j ≠ i.    (11)

Thus we find the class that maximizes P(Ci|X). The class Ci for which P(Ci|X) is maximized is called the maximum posterior hypothesis. By Bayes' theorem:

    P(Ci|X) = P(X|Ci) P(Ci) / P(X)    (12)

3. As P(X) is the same for all classes, only P(X|Ci) P(Ci) needs to be maximized. If the class a priori probabilities P(Ci) are not known, it is commonly assumed that the classes are equally likely, that is, P(C1) = P(C2) = ... = P(Ck), and we would therefore maximize P(X|Ci); otherwise, we maximize P(X|Ci) P(Ci). Note that the prior class probabilities may be estimated by P(Ci) = freq(Ci, T) / |T|.

4. Given data sets with many attributes, it would be computationally expensive to compute P(X|Ci). To reduce the computation involved in evaluating P(X|Ci) P(Ci), the naive assumption of class conditional independence is made. This presumes that the values of the attributes are conditionally independent of one another, given the class label of the sample. Mathematically this means that

    P(X|Ci) ≈ ∏_{k=1..n} P(xk|Ci)    (13)

The probabilities P(x1|Ci), P(x2|Ci), ..., P(xn|Ci) can easily be estimated from the training set. Recall that here xk refers to the value of attribute Ak for sample X.

• If Ak is categorical, then P(xk|Ci) is the number of samples of class Ci in T having the value xk for attribute Ak, divided by freq(Ci, T), the number of samples of class Ci in T.

• If Ak is continuous-valued, then we typically assume that the values have a Gaussian distribution with mean μ and standard deviation σ, defined by

    g(x, μ, σ) = (1 / (√(2π) σ)) exp( -(x - μ)² / (2σ²) )    (14)

so that P(xk|Ci) = g(xk, μ_Ci, σ_Ci). We need to compute μ_Ci and σ_Ci, which are the mean and standard deviation of the values of attribute Ak for the training samples of class Ci.

5. To predict the class label of X, P(X|Ci) P(Ci) is evaluated for each class Ci. The classifier predicts that the class label of X is Ci if and only if it is the class that maximizes P(X|Ci) P(Ci).
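The per-class Gaussian estimation of step 4 and the maximization of step 5 can be sketched as follows. This is a minimal illustration with a hypothetical two-attribute, two-class training set; in the paper, the feature vectors of Section III-A would play the role of X.

```python
import math

def fit_gaussian_nb(T):
    """Estimate P(Ci) and per-attribute (mu, sigma) for each class (Eqs. 12-14)."""
    by_class, n_total = {}, len(T)
    for x, label in T:
        by_class.setdefault(label, []).append(x)
    params = {}
    for label, samples in by_class.items():
        n = len(samples)
        prior = n / n_total                      # P(Ci) = freq(Ci, T) / |T|
        stats = []
        for k in range(len(samples[0])):
            vals = [s[k] for s in samples]
            mu = sum(vals) / n
            var = sum((v - mu) ** 2 for v in vals) / n
            stats.append((mu, math.sqrt(var) or 1e-9))  # avoid sigma = 0
        params[label] = (prior, stats)
    return params

def gaussian(x, mu, sigma):                      # Eq. (14)
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma)

def predict(params, x):
    """Return the class maximizing P(X|Ci) P(Ci), with Eq. (13) as the likelihood."""
    best, best_p = None, -1.0
    for label, (prior, stats) in params.items():
        p = prior
        for xk, (mu, sigma) in zip(x, stats):
            p *= gaussian(xk, mu, sigma)
        if p > best_p:
            best, best_p = label, p
    return best
```

With labels y = +1 (faulty) and y = -1 (normal), `predict` implements exactly the maximum posterior decision of Eq. (11).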
2) Classification by Back-Propagation Neural Networks

The Back-Propagation Neural Network (BPNN), see Fig. 4, was developed by Rumelhart et al. [19] as a solution to the problem of training multi-layer perceptrons. The fundamental advantages of the BPNN include its high tolerance of noisy data as well as its ability to classify patterns on which it has not been trained. This is due to the inclusion of a differentiable transfer function at each node of the network and the use of error back-propagation to change the internal network weights after each training epoch.

Fig. 4. A multilayer feed-forward neural network [2].

Back-propagation learns by iteratively processing a data set of training tuples [30], comparing the network's prediction for each tuple with the known target value. The target value may be the known class label of the training tuple or a continuous value. For each training tuple, the weights are modified so as to minimize the mean squared error between the network prediction and the actual target value. These modifications are made in the "backward" direction, that is, from the output layer through each hidden layer down to the first hidden layer (hence the name back-propagation). Although it is not guaranteed, in general the weights eventually converge and the learning process stops. The steps involved are expressed in terms of inputs, outputs and errors, and each step is inherently simple; they are summarized in the algorithm below.

Algorithm: Neural network learning for classification, using the back-propagation algorithm [30].

Input:
• D, a data set consisting of the training tuples and their associated target values;
• l, the learning rate;
• network, a multilayer feed-forward network.
Output: A trained neural network.
Method:
(1) Initialize all weights and biases in network;
(2) while terminating condition is not satisfied {
(3)   for each training tuple X in D {
(4)     // Propagate the inputs forward:
(5)     for each input layer unit j {
(6)       Oj = Ij; } // the output of an input unit is its actual input value
(7)     for each hidden or output layer unit j {
(8)       Ij = Σi wij Oi + θj; // compute the net input of unit j with respect to the previous layer, i
(9)       Oj = 1 / (1 + e^(-Ij)); } // compute the output of each unit j
(10)    // Back-propagate the errors:
(11)    for each unit j in the output layer
(12)      Errj = Oj (1 - Oj)(Tj - Oj); // compute the error
(13)    for each unit j in the hidden layers, from the last to the first hidden layer
(14)      Errj = Oj (1 - Oj) Σk Errk wjk; // compute the error with respect to the next higher layer, k
(15)    for each weight wij in network {
(16)      Δwij = (l) Errj Oi; // weight increment
(17)      wij = wij + Δwij; } // weight update
(18)    for each bias θj in network {
(19)      Δθj = (l) Errj; // bias increment
(20)      θj = θj + Δθj; } // bias update
(21) }}
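Steps (1)-(21) can be sketched for a network with one hidden layer. The network shape, the OR-style toy data and the learning rate below are illustrative choices, not values taken from the paper.

```python
import math, random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def train_bpnn(D, n_in, n_hidden, l=0.5, epochs=2000, seed=0):
    """One-hidden-layer back-propagation (steps 1-21); returns a predict function."""
    rng = random.Random(seed)
    # (1) initialize all weights and biases
    w_h = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
    b_h = [0.0] * n_hidden
    w_o = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
    b_o = 0.0

    def forward(x):
        # (5)-(9) propagate the inputs forward through sigmoid units
        h = [sigmoid(sum(w * xi for w, xi in zip(ws, x)) + b)
             for ws, b in zip(w_h, b_h)]
        o = sigmoid(sum(w * hj for w, hj in zip(w_o, h)) + b_o)
        return h, o

    for _ in range(epochs):                       # (2) fixed-epoch termination
        for x, t in D:                            # (3) for each training tuple
            h, o = forward(x)
            err_o = o * (1 - o) * (t - o)         # (12) output-layer error
            err_h = [hj * (1 - hj) * err_o * w_o[j]   # (14) hidden-layer error
                     for j, hj in enumerate(h)]
            for j in range(n_hidden):             # (15)-(20) weight/bias updates
                w_o[j] += l * err_o * h[j]
                for i in range(n_in):
                    w_h[j][i] += l * err_h[j] * x[i]
                b_h[j] += l * err_h[j]
            b_o += l * err_o

    return lambda x: forward(x)[1]
```

Training it on the logical OR function, for example, drives the output above 0.5 for the positive tuples and below 0.5 for (0, 0).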
Input: strategy is as follows.
 D, a data set consisting of the training tuples and their
associated target values; Algorithm: Generate a decision tree from the training
 l, the learning rate; tuples of data partition D [31] .
 network, a multilayer feed-forward network.
Output: A trained neural network. Input:
Method:  Data partition D, which is a set of training tuples
and their associated class labels;
(1) Initialize all weights and biases in network;  attribute list, the set of candidate attributes;
(2) while terminating condition is not satisfied {  Attribute selection method, a procedure to
(3) for each training tuple X in D determine the splitting criterion that “best”
(4) // Propagate the inputs forward: partitions the data tuples into individual classes.
(5) for each input layer unit j {
(6)   Oj = Ij; // the output of an input unit is its actual input value

This criterion consists of a splitting attribute and, possibly, either a split point or a splitting subset.
Output: A decision tree.
Method:
(1) create a node N;
(2) if the tuples in D are all of the same class C, then
(3)   return N as a leaf node labeled with the class C;
(4) if attribute_list is empty then
(5)   return N as a leaf node labeled with the majority class in D; // majority voting
(6) apply Attribute_selection_method(D, attribute_list) to find the "best" splitting_criterion;
(7) label node N with splitting_criterion;
(8) if splitting_attribute is discrete-valued and multiway splits are allowed then // not restricted to binary trees
(9)   attribute_list = attribute_list - splitting_attribute; // remove splitting_attribute
(10) for each outcome j of splitting_criterion
     // partition the tuples and grow subtrees for each partition
(11)   let Dj be the set of data tuples in D satisfying outcome j; // a partition
(12)   if Dj is empty then
(13)     attach a leaf labeled with the majority class in D to node N;
(14)   else attach the node returned by Generate_decision_tree(Dj, attribute_list) to node N;
     end for
(15) return N;

measurement for the different data vectors proposed for the different kinds of faults, to the training of these proposed classifiers. We then compare the performance of each classifier. It should be noted from the following table that a good classifier requires a great deal of sequential computation and a large number of reference vectors; the comparison of reference vectors shows that the Naive Bayes statistical classifier [27] performs better than classification based on a Back-Propagation Neural Network (BPNN) [12] or on decision tree induction [13]. The proposed classifier combines the Naive Bayes approach, the Back-Propagation Neural Network, and the Decision Tree; this hybrid classifier brings the three approaches together so as to exploit the advantages of each and build a classifier with higher performance. Firstly, the Naive Bayes classifier can be extended to exploit the conditional independence of features. Secondly, the BPNN classifier offers a high tolerance to noisy data as well as the ability to classify patterns on which it has not been trained. Thirdly, the Decision Tree algorithm is efficient on relatively small data sets. Our results are compared with the Naive Bayesian classifier [27], the BPNN classifier [30], and the Decision Tree classifier [31] to indicate the value of our method. To validate our results we have used the WEKA machine learning library.
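As an illustration, the induction procedure above can be sketched in plain Python. This is a minimal sketch, not the WEKA implementation; information gain is assumed as the Attribute_selection_method, and all names are illustrative.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def best_attribute(rows, labels, attributes):
    """Pick the attribute whose split maximizes information gain."""
    base = entropy(labels)
    def gain(a):
        remainder = 0.0
        for v in set(r[a] for r in rows):
            sub = [l for r, l in zip(rows, labels) if r[a] == v]
            remainder += len(sub) / len(labels) * entropy(sub)
        return base - remainder
    return max(attributes, key=gain)

def generate_decision_tree(rows, labels, attributes):
    """Recursive induction following the pseudocode: leaves are class
    labels, internal nodes are (attribute, {value: subtree}) pairs.
    Outcomes are taken from the values present in D, so the empty-
    partition case (lines 12-13) does not arise in this sketch."""
    if len(set(labels)) == 1:            # (2)-(3): all tuples in one class
        return labels[0]
    if not attributes:                   # (4)-(5): majority voting
        return Counter(labels).most_common(1)[0][0]
    a = best_attribute(rows, labels, attributes)   # (6)-(7)
    remaining = [x for x in attributes if x != a]  # (8)-(9): discrete-valued
    branches = {}
    for v in set(r[a] for r in rows):    # (10)-(14): one subtree per outcome
        part = [(r, l) for r, l in zip(rows, labels) if r[a] == v]
        branches[v] = generate_decision_tree([r for r, _ in part],
                                             [l for _, l in part], remaining)
    return (a, branches)

def classify(tree, row):
    """Walk the tree down to a leaf label."""
    while isinstance(tree, tuple):
        a, branches = tree
        tree = branches[row[a]]
    return tree
```

For example, training on tuples such as `{'residual': 'high', 'sensor': 'on'}` labeled `'faulty'`/`'healthy'` yields a tree that `classify` walks down to a fault decision.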
4) The Proposed Strategy:
In this paper, the problem of FDI in the wind turbine system is treated. The idea is to build a classifier that sorts the data into two classes, so that we can decide whether the state of the system is faulty or not. In this problem we have injected noise, kept below a minimum threshold, into the sensors; and since the faults cause perturbations in the data captured from the noisy sensors, both factors make the FDI problem very difficult to treat. To solve the problem under these conditions, we used three complementary approaches to build a hybrid classifier able to classify the data. The first is the Bayesian statistical algorithm, which exploits the conditional independence of features but sends the inference in a false direction if the prior information is wrong. To correct this problem we added the second, the Back-Propagation Neural Networks approach, which however has a long learning time. The last is Decision Tree learning, which is simple to use; adding it allowed us to acquire the measurements.

C. The Validation of the Proposed Strategy:
In view of this, we take some samples of sensor data.

TABLE III. PARAMETERS OF THE WIND TURBINE BENCHMARK MODEL

Parameter          Value
R (m)              57.5
ρ (kg/m³)          1.225
ζ                  0.6
ωn (rad/s)         11.11
ζ2                 0.45
ωn2 (rad/s)        5.73
ζ3                 0.9
ωn3 (rad/s)        3.42
αgc (s⁻¹)          50
ηgc                0.98
Bdt (N·m·s/rad)    775.49
Br (N·m·s/rad)     7.11
Bg (N·m·s/rad)     45.6
Ng                 95
Kdt (N·m/rad)      2.7 × 10⁹
ηdt                0.97
ηdt2               0.92
Jg (kg·m²)         390
Jr (kg·m²)         55 × 10⁶
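The combination strategy described above can be sketched as a simple majority-vote ensemble. This is an illustrative sketch, not the authors' exact implementation: the `ThresholdClassifier` base learner is a hypothetical stand-in for the three real classifiers, chosen only so the voting logic is runnable.

```python
from collections import Counter

class ThresholdClassifier:
    """Hypothetical stand-in base learner: flags a sample as 'faulty'
    when one feature exceeds a threshold learned as the midpoint of
    the per-class feature means."""
    def __init__(self, feature):
        self.feature = feature

    def fit(self, X, y):
        faulty = [x[self.feature] for x, l in zip(X, y) if l == 'faulty']
        healthy = [x[self.feature] for x, l in zip(X, y) if l == 'healthy']
        self.threshold = (sum(faulty) / len(faulty) +
                          sum(healthy) / len(healthy)) / 2
        return self

    def predict(self, x):
        return 'faulty' if x[self.feature] > self.threshold else 'healthy'

class MajorityVoteHybrid:
    """Each base classifier votes 'faulty' or 'healthy' and the majority
    label wins, so a single classifier misled (e.g. by a wrong Bayesian
    prior) can be outvoted by the other two."""
    def __init__(self, classifiers):
        self.classifiers = classifiers

    def fit(self, X, y):
        for clf in self.classifiers:
            clf.fit(X, y)
        return self

    def predict(self, x):
        votes = [clf.predict(x) for clf in self.classifiers]
        return Counter(votes).most_common(1)[0][0]
```

In the paper's setting the three voters would be the Naive Bayes, BPNN, and Decision Tree classifiers; the voting wrapper itself is unchanged.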
TABLE II. DATA LEARNING CLASSIFICATION RESULTS

Fault                                    F1      F2      F3      F4      F5      F6      F7      F8      F9
Naive Bayesian Classifier [27]     TB    0.03s   0s      0.06s   0s      0s      0s      0s      0.01s   0s
                                   RC    100%    97.93%  100%    100%    100%    100%    18.74%  9.13%   99.95%
Back-propagation Neural Networks   TB    1.53s   1.40s   1.39s   1.39s   1.39s   1.39s   1.99s   1.95s   1.9s
Classifier [30]                    RC    100%    97.81%  100%    100%    100%    100%    97.72%  97.72%  99.95%
Decision Tree Classifier [31]      TB    0.03s   0.01s   0s      0s      0s      0s      0.03s   0s      0.01s
                                   RC    100%    97.81%  100%    100%    100%    100%    97.72%  97.72%  99.95%
Proposed Hybrid Classifier         TB    1.41s   1.42s   1.42s   1.41s   1.40s   1.40s   1.94s   1.96s   1.92s
                                   RC    100%    99.98%  100%    100%    100%    100%    99.98%  99.98%  99.99%
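The Naive Bayes entries above rest on the conditional independence of features. As an illustration, a minimal Gaussian Naive Bayes scorer for the two-class faulty/healthy case can be sketched as follows (an assumption-laden sketch, not the classifier of [27] or the WEKA implementation):

```python
import math

def fit_gaussian_nb(X, y):
    """Estimate per-class priors and per-feature Gaussian parameters
    under the naive (conditional independence) assumption."""
    params = {}
    for label in set(y):
        rows = [x for x, l in zip(X, y) if l == label]
        prior = len(rows) / len(X)
        stats = []
        for j in range(len(X[0])):
            col = [r[j] for r in rows]
            mu = sum(col) / len(col)
            # small variance floor avoids division by zero on constant features
            var = sum((v - mu) ** 2 for v in col) / len(col) + 1e-9
            stats.append((mu, var))
        params[label] = (prior, stats)
    return params

def predict_gaussian_nb(params, x):
    """Pick the class maximizing
    log P(class) + sum_j log N(x_j; mu_j, var_j)."""
    def log_posterior(label):
        prior, stats = params[label]
        lp = math.log(prior)
        for v, (mu, var) in zip(x, stats):
            lp += -0.5 * math.log(2 * math.pi * var) - (v - mu) ** 2 / (2 * var)
        return lp
    return max(params, key=log_posterior)
```

Working in log-space keeps the per-feature likelihood product numerically stable when many sensor channels are multiplied together.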
The results obtained are summarized in Table II, where TB is the time taken to build the model in seconds (s) and RC is the recognition accuracy (the percentage of correctly classified data). The parameter values of the reference model of the wind turbine are given in Table III.

IV. CONCLUSION

In this paper, the problem of fault detection and isolation in the wind turbine was treated based on a data-driven approach. The proposed hybrid classifier combines three approaches: the Bayes statistical algorithm, the Back-Propagation Neural Network, and Decision Trees, harnessing the benefits of these three methods to build a classifier with higher performance. The effectiveness of this strategy is demonstrated by comparing the results obtained with those of a Naive Bayesian classifier, a Back-Propagation Neural Networks classifier, and a decision tree classifier.

References
[1] Zhang, J., Swain, A. K., & Nguang, S. K., "Robust Observer-Based Fault Diagnosis for Nonlinear Systems Using MATLAB®," Springer, 2016.
[2] Ding, S. X., "Data-driven design of fault diagnosis and fault-tolerant control systems," Springer, 2014.
[3] Kroll, A., & Schulte, H., "Benchmark problems for nonlinear system identification and control using soft computing methods: need and overview," Applied Soft Computing, vol. 25, pp. 496-513, 2014.
[4] Blesa, J., Rotondo, D., Puig, V., & Nejjari, F., "FDI and FTC of wind turbines using the interval observer approach and virtual actuators/sensors," Control Engineering Practice, vol. 24, pp. 138-155, 2014.
[5] Sanchez, H., Escobet, T., Puig, V., & Odgaard, P. F., "Fault diagnosis of an advanced wind turbine benchmark using interval-based ARRs and observers," IEEE Transactions on Industrial Electronics, vol. 62, no. 6, pp. 3783-3793, 2015.
[6] Blesa, J., Puig, V., Romera, J., & Saludes, J., "Fault diagnosis of wind turbines using a set-membership approach," IFAC Proceedings Volumes, vol. 44, no. 1, pp. 8316-8321, 2011.
[7] Simani, S., Farsoni, S., & Castaldi, P., "Fault diagnosis of a wind turbine benchmark via identified fuzzy models," IEEE Transactions on Industrial Electronics, vol. 62, no. 6, pp. 3775-3782, 2015.
[8] Schulte, H., & Pöschke, F., "Estimation of Multiple Faults in Hydrostatic Wind Turbines using Takagi-Sugeno Sliding Mode Observer with Weighted Switching Action," IFAC-PapersOnLine, vol. 49, no. 5, pp. 194-199, 2016.
[9] Dong, J., & Verhaegen, M., "Data driven fault detection and isolation of a wind turbine benchmark," IFAC Proceedings Volumes, vol. 44, no. 1, pp. 7086-7091, 2011.
[10] Yin, S., Ding, S. X., Abandan Sari, A. H., & Hao, H., "Data-driven monitoring for stochastic systems and its application on batch process," International Journal of Systems Science, vol. 44, no. 7, pp. 1366-1376, 2013.
[11] Vong, C. M., Wong, P. K., & Ip, W. F., "A new framework of simultaneous-fault diagnosis using pairwise probabilistic multi-label classification for time-dependent patterns," IEEE Transactions on Industrial Electronics, vol. 60, no. 8, pp. 3372-3385, 2013.
[12] Shah, B., & Trivedi, B. H., "Artificial neural network based intrusion detection system: A survey," International Journal of Computer Applications, vol. 39, no. 6, pp. 13-18, 2012.
[13] Kotsiantis, S., "Supervised machine learning: A review of classification techniques," Informatica, vol. 31, pp. 249-268, 2007.
[14] Zurada, J., "Could decision trees improve the classification accuracy and interpretability of loan granting decisions," in Proceedings of the 43rd Hawaii International Conference on System Sciences (HICSS), pp. 1-9, 2010.
[15] de Miguel, L. J., & Blázquez, L. F., "Fuzzy logic-based decision-making for fault diagnosis in a DC motor," Engineering Applications of Artificial Intelligence, vol. 18, no. 4, pp. 423-450, 2005.
[16] Razavi-Far, R., Davilu, H., Palade, V., & Lucas, C., "Model-based fault detection and isolation of a steam generator using neuro-fuzzy networks," Neurocomputing, vol. 72, no. 13, pp. 2939-2951, 2009.
[17] Odgaard, P. F., Stoustrup, J., & Kinnaert, M., "Fault tolerant control of wind turbines – a benchmark model," IFAC Proceedings Volumes, vol. 42, no. 8, pp. 155-160, 2009.
[18] Fernandez-Canti, R. M., Blesa, J., Tornil-Sin, S., & Puig, V., "Fault detection and isolation for a wind turbine benchmark using a mixed Bayesian/Set-membership approach," Annual Reviews in Control, vol. 40, pp. 59-69, 2015.
[19] Rumelhart, D. E., Hinton, G. E., & Williams, R. J., "Learning representations by back-propagating errors," Cognitive Modeling, vol. 5, no. 3, p. 1, 1988.
[20] Han, J., Pei, J., & Kamber, M., "Data Mining: Concepts and Techniques," Elsevier, 2011.
[21] Costamagna, P., De Giorgi, A., Magistri, L., Moser, G., Pellaco, L., & Trucco, A., "A Classification Approach for Model-Based Fault Diagnosis in Power Generation Systems Based on Solid Oxide Fuel Cells," IEEE Transactions on Energy Conversion, vol. 31, no. 2, pp. 676-687, 2016.
[22] Sundermann, B., Burgmer, M., Pogatzki-Zahn, E., Gaubitz, M., Stüber, C., Wessolleck, E., & Pfleiderer, B., "Diagnostic classification based on functional connectivity in chronic pain: model optimization in fibromyalgia and rheumatoid arthritis," Academic Radiology, vol. 21, no. 3, pp. 369-377, 2014.
[23] Chhogyal, K., & Nayak, A., "An Empirical Study of a Simple Naive Bayes Classifier Based on Ranking Functions," in Australasian Joint Conference on Artificial Intelligence, Springer International Publishing, pp. 324-331, 2016.
[24] Devroye, L., Györfi, L., & Lugosi, G., "A Probabilistic Theory of Pattern Recognition," Springer Science & Business Media, vol. 31, 2013.
[25] Girouard, A., Solovey, E. T., & Jacob, R. J., "Designing a passive brain computer interface using real time classification of functional near-infrared spectroscopy," International Journal of Autonomous and Adaptive Communications Systems, vol. 6, no. 1, pp. 26-44, 2013.
[26] Kyranou, I., Krasoulis, A., Erden, M. S., Nazarpour, K., & Vijayakumar, S., "Real-time classification of multi-modal sensory data for prosthetic hand control," in 2016 6th IEEE International Conference on Biomedical Robotics and Biomechatronics (BioRob), pp. 536-541, 2016.
[27] Askarian, M., Zarghami, R., Jalali-Farahani, F., & Mostoufi, N., "Fault diagnosis of chemical processes considering fault frequency via Bayesian network," The Canadian Journal of Chemical Engineering, vol. 94, no. 12, pp. 2315-2325, 2016.
[28] Hassouna, H., Melgani, F., & Mokhtari, Z., "Spatial contextual Gaussian process learning for remote-sensing image classification," Remote Sensing Letters, vol. 6, no. 7, pp. 519-528, 2015.
[29] Bashir, L. Z., "Reinforcement-Based Learning for Process Classification Task," World Scientific News, vol. 36, pp. 12-26, 2016.
[30] Lam, H. K., Ekong, U., Liu, H., Xiao, B., Araujo, H., Ling, S. H., & Chan, K. Y., "A study of neural-network-based classifiers for material classification," Neurocomputing, vol. 144, pp. 367-377, 2014.
[31] Zhang, Y. X., & Zhao, Y. H., "Comparison of decision tree methods for finding active objects," Chinese Journal of Astronomy and Astrophysics, vol. 7, no. 2, p. 289, 2007.