Abstract— In this paper, the problem of fault diagnosis in wind turbines is addressed using a data-driven approach. The idea is to classify data into two classes with a classifier, so that we can decide whether the state of the system is defective or not. The proposed hybrid classifier combines three approaches: the Bayes statistical algorithm, back-propagation neural networks, and decision trees. During training, the learning rate is gradually decreased in discrete steps until the classifier converges to a minimum-error classification solution. Different parts of the process were investigated, including actuator, sensor, and process faults. The process faults mainly concern friction in the wind turbine, which might damage it. Our results are compared with those of the three approaches that form the proposed strategy, each taken separately, to indicate the value of our strategy. The proposed classifier was built with the "Waikato Environment for Knowledge Analysis (WEKA)" library and evaluated by observing the classification rate.

Keywords— Fault Detection and Isolation (FDI); wind turbine; Data-Driven; Classification; Learning; Bayes Statistical Algorithm; Back-propagation Neural Networks; Decision Tree.

I. INTRODUCTION

Technological systems are vulnerable to defects. Actuator defects reduce the performance of control systems and may even cause a complete system failure. Incorrect sensor readings are the cause of several malfunctions, since these errors can reduce the quality and efficiency of a production line. In addition, the failure of a single component in a complex system can cause a malfunction of the entire system. The operation of the system must then be stopped to avoid damage to humans and/or machinery. Due to the economic, ecological, and safety requirements to be satisfied, high reliability of technological systems has become a dominant objective in industry. This is why methods for the detection and isolation of defects have aroused considerable research interest over the last decades; these methods can be classified into two categories, model-based [1] or data-based [2].

It is known that in recent years the conversion of the kinetic energy of the wind into electrical energy has received considerable attention in the energy market, due to growing demand and the need for clean energy sources. However, with the increasing production of wind energy, the complexity of wind turbines is also increasing. Moreover, in order to ensure accessibility and reduce downtime and maintenance costs, faster detection and isolation of defects should be introduced in wind systems.

Benchmarking is based on the principles of being verifiable, reproducible, and comparable. This requires an exact specification of the reference problem, ranging from the description of the process, the design of the experiment, and the test data to the evaluation criteria for the results [3]. Accordingly, reference models for wind turbines of varying degrees of complexity have recently received much attention and have been successfully proposed and applied in simulation studies to compare approaches to fault detection and isolation (FDI). The most popular of these modern FDI techniques is the model-based approach [1], where a priori mathematical information is used to model the normal system. Based on the system model, state or parameter estimation techniques are then used to compare the actual process with the modeled process and accomplish fault detection. This comparison can also be used to generate a residual vector from which the cause of the defect can be identified by a statistical or geometric method based on knowledge.

As for the wind turbine system, based on a bounded description of noise and modeling errors, the interval observer approach was applied to analyze the fault signatures observed online and to match them with the theoretical results obtained using a structural analysis and an online reasoning scheme [4]-[5]. We also find the use of parity equations to check the consistency between the measurements and the model by looking for a parameter in the set of feasible parameters (approximated by a zonotope) [6]. The fuzzy model, in the form of Takagi-Sugeno prototypes, represents the residual generators used for the detection and isolation of defects [7]-[8].

In addition, the analytical model requires simultaneous consideration of the exact model of the diagnosed system, involving the laws of thermodynamics, the dynamics of compressible fluids, electrical and magnetic phenomena, as well as Newtonian mechanics.

To overcome the limitations of the model-based approach, some researchers have also used large amounts of data from available sensor measurements, event logs, and records, together with diagnostic techniques based on sample data [1], [9]-[10].
The data set was generated from the simulation results of a benchmark wind turbine model for the development and analysis of FDI systems [17]. The components of the model used and the relationships between them are represented in Fig. 2. This model is a simplification of a three-bladed, utility-scale wind turbine.

A two-mass model describes the drive train and generator:

\[
\begin{bmatrix} \dot{\omega}_r(t) \\ \dot{\omega}_g(t) \\ \dot{\theta}_\Delta(t) \end{bmatrix}
=
\begin{bmatrix}
\dfrac{-(B_{dt}+B_r)}{J_r} & \dfrac{B_{dt}}{N_g J_r} & \dfrac{-K_{dt}}{J_r} \\
\dfrac{\eta_{dt} B_{dt}}{N_g J_g} & \dfrac{-\left(\eta_{dt} B_{dt}/N_g^{2}+B_g\right)}{J_g} & \dfrac{\eta_{dt} K_{dt}}{N_g J_g} \\
1 & \dfrac{-1}{N_g} & 0
\end{bmatrix}
\begin{bmatrix} \omega_r(t) \\ \omega_g(t) \\ \theta_\Delta(t) \end{bmatrix}
+
\begin{bmatrix} \dfrac{1}{J_r} \\ 0 \\ 0 \end{bmatrix} \tau_r(t)
+
\begin{bmatrix} 0 \\ \dfrac{-1}{J_g} \\ 0 \end{bmatrix} \tau_g(t)
\]

where \(\omega_r\) is the rotor speed, \(\omega_g\) the generator speed, \(\theta_\Delta\) the torsion angle of the drive train, \(\tau_r\) the rotor torque, and \(\tau_g\) the generator torque (parameters as in Table III).
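As a sanity check on these dynamics, the following minimal Java sketch integrates the two-mass model with a forward-Euler step, using the drive-train parameter values of Table III; the input torques, initial state, and step size are illustrative assumptions, not values from the benchmark.

public class DriveTrainSim {
    // Drive-train parameters, values from Table III
    static final double Bdt = 775.49;   // torsion damping (N.m.s/rad)
    static final double Br  = 7.11;     // rotor external damping (N.m.s/rad)
    static final double Bg  = 45.6;     // generator external damping (N.m.s/rad)
    static final double Ng  = 95.0;     // gear ratio
    static final double Kdt = 2.7e9;    // torsion stiffness (N.m/rad)
    static final double etaDt = 0.97;   // drive-train efficiency
    static final double Jg  = 390.0;    // generator inertia (kg.m^2)
    static final double Jr  = 55e6;     // rotor inertia (kg.m^2)

    public static void main(String[] args) {
        // State: rotor speed wr, generator speed wg, torsion angle theta.
        double wr = 1.5, wg = Ng * 1.5, theta = 0.0; // illustrative initial state
        double tauR = 1.0e6;  // aerodynamic rotor torque (N.m), placeholder
        double tauG = 9.0e3;  // generator load torque (N.m), placeholder
        double h = 1e-4;      // Euler step (s); kept small because Kdt makes the system stiff

        for (int k = 0; k < 100_000; k++) { // simulate 10 s
            double dwr = (-(Bdt + Br) * wr + (Bdt / Ng) * wg - Kdt * theta + tauR) / Jr;
            double dwg = ((etaDt * Bdt / Ng) * wr
                          - (etaDt * Bdt / (Ng * Ng) + Bg) * wg
                          + (etaDt * Kdt / Ng) * theta - tauG) / Jg;
            double dth = wr - wg / Ng;
            wr += h * dwr;
            wg += h * dwg;
            theta += h * dth;
        }
        System.out.printf("wr = %.4f rad/s, wg = %.2f rad/s, theta = %.3e rad%n", wr, wg, theta);
    }
}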
Here we are looking for the probability that sample X belongs to class C, given that we know the attribute description of X. According to Bayes' theorem, the probability that we want to compute, P(H|X), can be expressed in terms of the probabilities P(H), P(X|H), and P(X) as:

P(H|X) = P(X|H) P(H) / P(X)        (10)

where P(H|X) is the a posteriori probability of H conditioned on X, P(X|H) is the a posteriori probability of X conditioned on H, and P(H) and P(X) are the a priori probabilities of H and X, respectively. These probabilities may be estimated from the given data.

The naive Bayesian classifier works as follows:

1. Let T be a training set of samples, each with its class label. There are k classes, C1, C2, ..., Ck. Each sample is represented by an n-dimensional vector, X = {x1, x2, ..., xn}, depicting n measured values of the n attributes A1, A2, ..., An, respectively.

2. Given a sample X, the classifier will predict that X belongs to the class having the highest a posteriori probability, conditioned on X. That is, X is predicted to belong to the class Ci if and only if

P(Ci|X) > P(Cj|X)  for 1 ≤ j ≤ k, j ≠ i.        (11)

Under the naive assumption of class-conditional independence among the attributes,

\[ P(X \mid C_i) \approx \prod_{k=1}^{n} P(x_k \mid C_i) \qquad (13) \]

The probabilities P(x1|Ci), P(x2|Ci), ..., P(xn|Ci) can easily be estimated from the training set. Recall that here xk refers to the value of attribute Ak for sample X.

• If Ak is categorical, then P(xk|Ci) is the number of samples of class Ci in T having the value xk for attribute Ak, divided by freq(Ci, T), the number of samples of class Ci in T.

• If Ak is continuous-valued, then we typically assume that the values have a Gaussian distribution with mean μ and standard deviation σ, defined by

\[ g(x, \mu, \sigma) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right) \qquad (14) \]

so that \( p(x_k \mid C_i) = g(x_k, \mu_{C_i}, \sigma_{C_i}) \). We need to compute μ_Ci and σ_Ci, which are the mean and standard deviation of the values of attribute Ak for the training samples of class Ci.

5. To predict the class label of X, P(X|Ci) P(Ci) is evaluated for each class Ci. The classifier predicts that the class label of X is Ci if and only if it is the class that maximizes P(X|Ci) P(Ci).
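As an illustration of these steps, the following self-contained Java sketch classifies a two-attribute sample with the Gaussian model of Eq. (14); the two classes ("healthy"/"faulty") and the toy training vectors are placeholders, not data from the benchmark.

public class NaiveBayesSketch {

    // Gaussian density g(x, mu, sigma) of Eq. (14)
    static double gaussian(double x, double mu, double sigma) {
        double d = x - mu;
        return Math.exp(-d * d / (2 * sigma * sigma)) / (Math.sqrt(2 * Math.PI) * sigma);
    }

    public static void main(String[] args) {
        // train[c][s][k]: value of attribute Ak for sample s of class Cc
        double[][][] train = {
            { {0.9, 1.1}, {1.0, 0.9}, {1.1, 1.0} },  // class 0: healthy
            { {2.0, 2.2}, {2.1, 1.9}, {1.9, 2.1} }   // class 1: faulty
        };
        int total = train[0].length + train[1].length;
        double[] x = {1.8, 2.0};                      // sample X to classify

        int best = -1;
        double bestScore = -1.0;
        for (int c = 0; c < train.length; c++) {
            double score = (double) train[c].length / total;  // prior P(Ci)
            for (int k = 0; k < x.length; k++) {
                // mean and standard deviation of attribute Ak for class Ci
                double mu = 0.0, var = 0.0;
                for (double[] s : train[c]) mu += s[k];
                mu /= train[c].length;
                for (double[] s : train[c]) var += (s[k] - mu) * (s[k] - mu);
                double sigma = Math.sqrt(var / train[c].length);
                score *= gaussian(x[k], mu, sigma);   // P(xk|Ci), Eqs. (13)-(14)
            }
            if (score > bestScore) { bestScore = score; best = c; }  // step 5
        }
        System.out.println("Predicted class of X: C" + best);
    }
}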
2) Classification by Back-Propagation Neural Networks

The Back-Propagation Neural Network (BPNN), see Fig. 4, was developed by Rumelhart et al. [19] as a solution to the issue of training multi-layer perceptrons. The fundamental advances represented by the BPNN include a high tolerance of noisy data as well as the ability to classify patterns on which the network has not been trained. This is due to the inclusion of a differentiable transfer function at each node of the network and the use of error back-propagation to change the internal network weights after each training epoch.
Fig. 4. A multilayer feed-forward neural network [2].

Back-propagation learns by iteratively processing a data set of training tuples [30], comparing the network's prediction for each tuple with the known real target value. The target value may be the known class label of the training tuple or a continuous value. For each training tuple, the weights are modified so as to minimize the mean squared error between the network's prediction and the actual target value. These changes are made in the "backward" direction, that is, from the output layer, through each hidden layer, down to the first hidden layer (hence the name back-propagation). Although it is not guaranteed, in general the weights will eventually converge and the learning process stops. The steps involved are expressed in terms of inputs, outputs, and errors, and may seem awkward if this is your first look at neural network learning. However, once you become familiar with the process, you will see that each step is inherently simple. These steps are described and summarized in the algorithm below.
Algorithm: Neural network learning for classification, using the back-propagation algorithm [30].

Input:
D, a data set consisting of the training tuples and their associated target values;
l, the learning rate;
network, a multilayer feed-forward network.
Output: A trained neural network.
Method:
(1) Initialize all weights and biases in network;
(2) while terminating condition is not satisfied {
(3)   for each training tuple X in D {
(4)     // Propagate the inputs forward:
(5)     for each input layer unit j {
(6)       Oj = Ij ; } // the output of an input unit is its actual input value
(7)     for each hidden or output layer unit j {
(8)       Ij = Σi wij Oi + θj ; // compute the net input of unit j with respect to the previous layer, i
(9)       Oj = 1 / (1 + e^(−Ij)) ; } // compute the output of each unit j
(10)    // Back-propagate the errors:
(11)    for each unit j in the output layer
(12)      Errj = Oj (1 − Oj)(Tj − Oj) ; // compute the error
(13)    for each unit j in the hidden layers, from the last to the first hidden layer
(14)      Errj = Oj (1 − Oj) Σk Errk wjk ; // compute the error with respect to the next higher layer, k
(15)    for each weight wij in network {
(16)      Δwij = (l) Errj Oi ; // weight increment
(17)      wij = wij + Δwij ; } // weight update
(18)    for each bias θj in network {
(19)      Δθj = (l) Errj ; // bias increment
(20)      θj = θj + Δθj ; } // bias update
(21) }}
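The following Java sketch is a direct transcription of this pseudocode for a network with one hidden layer; the layer sizes, learning rate, number of epochs, and XOR-style toy data are illustrative choices only, and, as noted above, convergence is not guaranteed.

import java.util.Random;

public class BackPropSketch {
    public static void main(String[] args) {
        double[][] X = { {0, 0}, {0, 1}, {1, 0}, {1, 1} };
        double[]   T = { 0, 1, 1, 0 };      // known target values
        int nIn = 2, nHid = 3;
        double l = 0.5;                     // learning rate
        Random rnd = new Random(1);

        double[][] wIH = new double[nIn][nHid]; // input -> hidden weights
        double[]   wHO = new double[nHid];      // hidden -> output weights
        double[]   bH  = new double[nHid];      // hidden biases
        double     bO  = 0.0;                   // output bias
        // (1) initialize all weights with small random values
        for (int i = 0; i < nIn; i++)
            for (int j = 0; j < nHid; j++) wIH[i][j] = rnd.nextDouble() - 0.5;
        for (int j = 0; j < nHid; j++) wHO[j] = rnd.nextDouble() - 0.5;

        for (int epoch = 0; epoch < 20000; epoch++) {    // (2) fixed-epoch stop
            for (int t = 0; t < X.length; t++) {         // (3) each tuple in D
                // (4)-(9): propagate the inputs forward
                double[] oH = new double[nHid];
                for (int j = 0; j < nHid; j++) {
                    double Ij = bH[j];                   // (8) net input of unit j
                    for (int i = 0; i < nIn; i++) Ij += wIH[i][j] * X[t][i];
                    oH[j] = 1.0 / (1.0 + Math.exp(-Ij)); // (9) logistic output
                }
                double Io = bO;
                for (int j = 0; j < nHid; j++) Io += wHO[j] * oH[j];
                double oO = 1.0 / (1.0 + Math.exp(-Io));
                // (10)-(14): back-propagate the errors
                double errO = oO * (1 - oO) * (T[t] - oO);           // (12)
                double[] errH = new double[nHid];
                for (int j = 0; j < nHid; j++)
                    errH[j] = oH[j] * (1 - oH[j]) * errO * wHO[j];   // (14)
                // (15)-(20): update weights and biases
                for (int j = 0; j < nHid; j++) {
                    wHO[j] += l * errO * oH[j];                      // (16)-(17)
                    for (int i = 0; i < nIn; i++)
                        wIH[i][j] += l * errH[j] * X[t][i];
                    bH[j] += l * errH[j];                            // (19)-(20)
                }
                bO += l * errO;
            }
        }
        // report the trained network's outputs
        for (int t = 0; t < X.length; t++) {
            double Io = bO;
            for (int j = 0; j < nHid; j++) {
                double Ij = bH[j];
                for (int i = 0; i < nIn; i++) Ij += wIH[i][j] * X[t][i];
                Io += wHO[j] * (1.0 / (1.0 + Math.exp(-Ij)));
            }
            System.out.printf("(%d,%d) -> %.3f%n", (int) X[t][0], (int) X[t][1],
                              1.0 / (1.0 + Math.exp(-Io)));
        }
    }
}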
3) Classification by Decision Tree Induction

Decision tree induction is the learning of decision trees from class-labeled training tuples [31]. A decision tree is a flowchart-like tree structure, where each internal node (non-leaf node) denotes a test on an attribute, each branch represents an outcome of the test, and each leaf node (or terminal node) holds a class label. The topmost node in a tree is the root node. Given a tuple X for which the associated class label is unknown, the attribute values of the tuple are tested against the decision tree. A path is traced from the root to a leaf node, which holds the class prediction for that tuple. Decision trees can easily be converted to classification rules.

The classifier adopts a greedy approach in which decision trees [13]-[14] are constructed for classification in a top-down, recursive, divide-and-conquer manner. This top-down approach starts with a training set of tuples and their associated class labels. The training set is recursively partitioned into smaller subsets as the tree is being built. A basic decision tree algorithm is summarized below. At first glance, the algorithm may appear long, but it is quite straightforward. The strategy is as follows.

Algorithm: Generate a decision tree from the training tuples of data partition D [31].

Input:
Data partition D, which is a set of training tuples and their associated class labels;
attribute_list, the set of candidate attributes;
Attribute_selection_method, a procedure to determine the splitting criterion that "best" partitions the data tuples into individual classes. This criterion consists of a splitting attribute and, possibly, either a split point or a splitting subset.
Output: A decision tree.
Method:
(1) create a node N;
(2) if tuples in D are all of the same class, C, then
(3)   return N as a leaf node labeled with the class C;
(4) if attribute_list is empty then
(5)   return N as a leaf node labeled with the majority class in D; // majority voting
(6) apply Attribute_selection_method(D, attribute_list) to find the "best" splitting_criterion;
(7) label node N with splitting_criterion;
(8) if splitting_attribute is discrete-valued and multiway splits allowed then // not restricted to binary trees
(9)   attribute_list = attribute_list − splitting_attribute; // remove splitting_attribute
(10) for each outcome j of splitting_criterion // partition the tuples and grow subtrees for each partition
(11)   let Dj be the set of data tuples in D satisfying outcome j; // a partition
(12)   if Dj is empty then
(13)     attach a leaf labeled with the majority class in D to node N;
(14)   else attach the node returned by Generate_decision_tree(Dj, attribute_list) to node N;
end for
(15) return N;
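In WEKA, the library used in this work, this induction procedure corresponds to the J48 learner (a C4.5-style decision tree). The sketch below, in which the ARFF file name is a placeholder, builds such a tree and estimates its classification rate by ten-fold cross-validation.

import java.util.Random;
import weka.classifiers.Evaluation;
import weka.classifiers.trees.J48;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class J48Example {
    public static void main(String[] args) throws Exception {
        // "faults.arff" is a placeholder; any two-class data set with the
        // class as the last attribute fits the FDI setting described above.
        Instances data = new DataSource("faults.arff").getDataSet();
        data.setClassIndex(data.numAttributes() - 1);

        J48 tree = new J48();              // WEKA's C4.5-style tree inducer
        Evaluation eval = new Evaluation(data);
        eval.crossValidateModel(tree, data, 10, new Random(1));
        System.out.println(eval.toSummaryString());
        System.out.printf("Classification rate: %.2f %%%n", eval.pctCorrect());
    }
}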
4) The Proposed Strategy:

In this paper, the problem of FDI in a wind turbine system is treated. The idea is to build a classifier to classify data into two classes, so that we can decide whether the state of the system is defective or not. In this problem, we have injected noises bounded below a threshold into the sensors, and since the faults cause perturbations in the data captured from the noisy sensors, both make the FDI problem very difficult to treat. To solve the problem under these conditions, we used three complementary approaches to build a hybrid classifier able to classify the data. The first is the Bayesian statistical algorithm, which exploits the conditional independence of features but sends the inference in a false direction if the prior information is wrong. To correct this problem, we added the second approach, back-propagation neural networks, which however require a long learning time. The last is decision tree learning, which is simple to use; adding it has allowed us to acquire the measurements.

C. The Validation of the Proposed Strategy:

In view of this, we take some samples of sensor measurements for different data vectors, proposed for the different kinds of faults, for the training of these proposed classifiers. We demonstrate a comparison of the performance of each classifier. It is to be noted from the following table that a good classifier requires a great deal of sequential computation and a large number of reference vectors. We show, regarding the comparison of reference vectors, that the statistical Naive Bayes classifier [27] performs better than classification based on the Back-Propagation Neural Network (BPNN) [12] or decision tree induction [13]. The proposed classifier is based on combining the Naive Bayes approach, the back-propagation neural network, and the decision tree; this proposed hybrid classifier is a set of those three approaches, intended to exploit the advantages of each approach separately in order to build a classifier with higher performance. Firstly, the Naive Bayes classifier can be extended to exploit the conditional independence of features. Secondly, the BPNN classifier offers a high tolerance of noisy data as well as the ability to classify patterns on which it has not been trained. Thirdly, the decision tree algorithm is efficient for relatively small data sets. Our results are compared with the Naive Bayesian classifier [27], the BPNN classifier [30], and the decision tree classifier [31] to indicate the value of our method. To validate our results we have used the WEKA machine learning library.

TABLE III. PARAMETERS OF THE WIND TURBINE BENCHMARK MODEL.

Parameter            Value
R (m)                57.5
ρ (kg/m³)            1.225
ζ                    0.6
ωn (rad/s)           11.11
ζ2                   0.45
ωn2 (rad/s)          5.73
ζ3                   0.9
ωn3 (rad/s)          3.42
αgc (s⁻¹)            50
ηgc                  0.98
Bdt (N·m·s/rad)      775.49
Br (N·m·s/rad)       7.11
Bg (N·m·s/rad)       45.6
Ng                   95
Kdt (N·m/rad)        2.7 × 10⁹
ηdt                  0.97
ηdt2                 0.92
Jg (kg·m²)           390
Jr (kg·m²)           55 × 10⁶
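The paper does not spell out the exact rule used to combine the three base classifiers; one plausible WEKA realization of the hybrid strategy is the Vote meta-classifier, which aggregates the class-probability estimates of NaiveBayes, MultilayerPerceptron (a back-propagation network), and J48 (a decision tree). The ARFF file name is again a placeholder, and the classification rate is estimated by ten-fold cross-validation, the evaluation criterion named in the abstract.

import java.util.Random;
import weka.classifiers.Classifier;
import weka.classifiers.Evaluation;
import weka.classifiers.bayes.NaiveBayes;
import weka.classifiers.functions.MultilayerPerceptron;
import weka.classifiers.meta.Vote;
import weka.classifiers.trees.J48;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class HybridClassifierSketch {
    public static void main(String[] args) throws Exception {
        Instances data = new DataSource("faults.arff").getDataSet(); // placeholder name
        data.setClassIndex(data.numAttributes() - 1);

        // Combine the three approaches of the proposed strategy; by default
        // Vote averages the members' class-probability estimates.
        Vote hybrid = new Vote();
        hybrid.setClassifiers(new Classifier[] {
            new NaiveBayes(),            // Bayes statistical algorithm
            new MultilayerPerceptron(),  // back-propagation neural network
            new J48()                    // decision tree induction
        });

        Evaluation eval = new Evaluation(data);
        eval.crossValidateModel(hybrid, data, 10, new Random(1));
        System.out.printf("Hybrid classification rate: %.2f %%%n", eval.pctCorrect());
    }
}

Averaging the members' probability estimates lets each approach compensate for the weaknesses of the others, which is the stated motivation for the hybrid strategy.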