1C31-1 The 4th IEEE Conference on Power Engineering and Renewable Energy
ICPERE 2018
Online Dissolved Gas Analysis of Power Transformers Based on Decision Tree Model

Arief Basuki
School of Electrical Engineering and Informatics
Institut Teknologi Bandung
Bandung, Indonesia
arief.scada@gmail.com

Suwarno
School of Electrical Engineering and Informatics
Institut Teknologi Bandung
Bandung, Indonesia
suwarno@stei.itb.ac.id

Abstract—This paper presents the possibility of using a machine learning model, a decision tree built with the C4.5 algorithm, for gas interpretation in online condition monitoring and diagnostic applications for power transformers. The decision tree was selected because it gave the best learning outcome in machine learning software (WEKA and Orange) when compared with naïve Bayes, neural network, nearest neighbour and support vector machine models. The decision tree was built from 715 records with 7 gas attributes and 9 fault types, reduced to 471 records after cleaning with the interquartile range method. The rate of correct predictions is 95.54% on the training data, 88.32% with cross-validation and 87.23% on a 10% random split of the training data. The decision tree rules were implemented in an online condition monitoring and diagnostic system for power transformers that is integrated into a SCADA system. In the cases tested, the implementation predicted transformer faults from online gas values better than conventional DGA methods.

Keywords—Dissolved Gas Analysis, Decision Tree, Machine Learning, C4.5 Algorithm, SCADA

This work was supported by PT. PLN (Persero), Indonesian State Electricity Company.

I. INTRODUCTION

As one of the most expensive assets in a power network, power transformers are expected to operate with the highest possible reliability and availability. A power transformer failure can cause an outage in the electrical system, so the transformer condition must be monitored continuously to maintain the reliability of the power system.

A continuous online condition monitoring function is more useful if the system can also diagnose (interpret) faults that occur in the equipment. The IEC and IEEE standards describe various methods for interpreting or detecting faults that have occurred or are developing in the transformer. These are usually called conventional methods. One very popular fault detection method for power transformers is Dissolved Gas Analysis (DGA), which detects transformer faults from key gas concentrations (ppm) or the ratios between them.

The limitation of the conventional methods is that, in some cases, the measured gas concentrations or ratios do not fit within the predefined criteria, so faults inside the transformer cannot be identified. Another problem is that, in practice, the same gas data can yield different interpretations when processed with different DGA methods.

This paper provides a solution for fault interpretation of power transformers in cases that conventional DGA methods sometimes cannot handle. The solution can also be implemented in continuous online condition monitoring of power transformers integrated into a SCADA system, so that the transformer condition is monitored continuously and faults are detected earlier. The interpretation method is a machine learning model based on a decision tree (C4.5 algorithm).

Several studies have addressed DGA interpretation based on decision trees. In [18], a decision tree algorithm uses 5 gas attributes to diagnose 3 types of fault (discharge, partial discharge and thermal). In [19], a decision tree algorithm uses 7 gas attributes to diagnose 5 types of fault (low-temperature superheat, high-temperature superheat, low energy discharge, high energy discharge and arc discharge with overheat). Both were applied offline.

II. DGA (DISSOLVED GAS ANALYSIS)

Faults in the transformer cause an increase in oil temperature and the occurrence of certain oxidation products such as dissolved gases. These gases include hydrogen (H2), methane (CH4), ethane (C2H6), ethylene (C2H4), acetylene (C2H2), carbon monoxide (CO) and carbon dioxide (CO2). The presence of these gases in the transformer oil indicates a disturbance in the transformer, and their amount and composition can be used to detect the type of disturbance or fault. The technique for diagnosing transformer faults based on these gases is called Dissolved Gas Analysis (DGA).

There are several DGA methods according to IEEE Std. C57.104™-2008 and IEC 60599-2007 [6], [7]:

A. TDCG Method

This method uses four criteria (conditions) to define the condition of the transformer. The four criteria are based on the content of individual dissolved gases or on the total dissolved combustible gas (TDCG) in the transformer oil.

When Condition 4 is reached, continued operation could result in failure of the transformer. For Conditions 1, 2 and 3, any individual combustible gas that exceeds its specified level should prompt additional investigation.
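As a rough illustration of the TDCG criterion in Section II-A, the sketch below sums the combustible gases and assigns a condition number. The paper does not state the limits; the three values used here (720, 1920 and 4630 ppm) are assumed from Table 1 of IEEE Std C57.104-2008 and should be checked against the standard. The sample values are invented, and the per-gas limits that the standard also defines are not covered.

```python
# Assumed upper TDCG limits (ppm) for Conditions 1-3, recalled from Table 1 of
# IEEE Std C57.104-2008; they are not given in this paper, so verify before use.
TDCG_LIMITS_PPM = (720, 1920, 4630)

# Combustible gases that enter the total; CO2 is not combustible and is excluded.
COMBUSTIBLE_GASES = ("H2", "CH4", "C2H6", "C2H4", "C2H2", "CO")


def tdcg(sample):
    """Return the total dissolved combustible gas (ppm) of one oil sample."""
    return sum(sample.get(gas, 0.0) for gas in COMBUSTIBLE_GASES)


def tdcg_condition(sample):
    """Return the condition number, 1 (lowest total) to 4 (highest total)."""
    total = tdcg(sample)
    for condition, limit in enumerate(TDCG_LIMITS_PPM, start=1):
        if total <= limit:
            return condition
    return 4


# Hypothetical sample, invented for illustration only (values in ppm).
sample = {"H2": 150, "CH4": 200, "C2H6": 80, "C2H4": 120,
          "C2H2": 5, "CO": 400, "CO2": 3000}
print(tdcg(sample), tdcg_condition(sample))
```

A sample in Condition 1 to 3 would still need each individual gas checked against its own limit, as the text above notes.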
B. Key Gas Method

This method identifies faults from the key gases present when a fault occurs. Based on changes in the key gases, four fault types can be detected. The interpretation uses the relative proportion of each key gas, in percent (%), compared with the other key gases. The interpretations are [7]:

a. Thermal – Oil (Overheated Oil)
   The principal gas in this fault is ethylene (C2H4).
b. Thermal – Cellulose (Overheated Cellulose)
   The principal gas in this fault is carbon monoxide (CO).
c. Electrical – Partial Discharge (Partial Discharge in Oil)
   The principal gas in this fault is hydrogen (H2).
d. Electrical – Arcing (Arcing in Oil)
   The principal gas in this fault is acetylene (C2H2).

C. Ratio Method

Five ratios of key combustible gases are used to diagnose transformers:

• Ratio 1 (R1) = CH4/H2
• Ratio 2 (R2) = C2H2/C2H4
• Ratio 3 (R3) = C2H2/CH4
• Ratio 4 (R4) = C2H6/C2H2
• Ratio 5 (R5) = C2H4/C2H6

The Doernenburg ratio method uses R1, R2, R3 and R4 to define the fault type. Both the Rogers ratio and IEC ratio methods use R1, R2 and R5.

D. CO2/CO Ratio

This ratio is a general indication of thermal decomposition of cellulose. The ratio for a normal transformer is more than seven. The CO2/CO ratio is valid only when CO2 exceeds 5000 ppm and CO exceeds 500 ppm.

E. Duval Triangle

This diagnosis method was developed by M. Duval [8] and uses three gases: CH4, C2H4 and C2H2. The gas values are plotted graphically on a triangle to interpret the fault.

[Figure: triangle with axes %CH4, %C2H4 and %C2H2 and fault zones PD, T1, T2, T3, D1, D2 and DT]
Fig. 1. Duval triangle

III. ONLINE CONDITION MONITORING SYSTEM

Condition monitoring (CM) can be defined as the process of monitoring equipment parameters over a certain period to identify changes that may lead to disruption of the equipment. There are two important terms in condition monitoring: online condition monitoring and continuous online monitoring. Both terms refer to methods for collecting or measuring data while the equipment is electrically energized and in service. The difference is that continuous online monitoring uses a permanently installed intelligent electronic device (IED) [2].

According to [2], the goals of continuous online monitoring are:

• To generate early warnings in case of incipient faults, to reduce the risk of unexpected failure
• To follow up the development of the diagnostic values on suspect or faulty units which cannot be taken out of service immediately
• To reduce costs for periodic diagnostic testing and assign workers to other tasks
• To archive measured and computed data in a database for future analyses

A condition monitoring system can be implemented in several ways [11]:

• A stand-alone system using a PC or microcontroller to collect and analyze data
• A system using data acquisition units at the transformer to collect data and transmit it for analysis to a centralized PC in the local substation
• A system using the substation control system (SCADA) to collect and store the data for analysis at a remote location

Fig. 2 shows the configuration for implementing a condition monitoring system using substation automation in the SCADA system. The implementation modifies the substation automation system by adding a monitoring IED at the bay level, additional sensors at the process level and additional display menus for transformer monitoring.

[Figure: station level (workstation, gateway to control center), station bus, bay level (control, protection and monitoring IEDs), process bus, process level (switchyard)]
Fig. 2. Substation automation configuration combined with online condition monitoring of power transformer

IV. MACHINE LEARNING

Machine learning is an application of artificial intelligence (AI) that makes computers act and think like humans, using algorithms that learn from data. Machine learning uses a variety of algorithms that iteratively learn from data to improve, describe data and predict outcomes. A machine learning model is the output generated when a machine learning algorithm has been trained with data [4].

The accuracy of a machine learning model depends on the data. The more data that is added to the algorithm, the more sophisticated the algorithm becomes. There are two kinds of machine learning models, online and offline. Online models continuously learn from new data, whereas offline models do not change once deployed [4]. Widely used machine learning algorithms include naïve Bayes, decision tree, nearest neighbour, neural network and linear regression.

The steps in the machine learning cycle are as follows [4]:

A. Identify the data:
The first step is identifying the relevant data sources. The correct data source will produce a good machine learning model.

B. Prepare data:
Clean the data of noise and irrelevant records, so that the data is clean, secured and governed.

C. Select the machine learning algorithm:
Choose the best algorithm to use.

D. Train:
Train the algorithm to create the machine learning model.

E. Evaluate:
Evaluate the model to improve the result.

F. Deploy:
Deploy the model to the application.

G. Predict:
Make predictions based on new data.

H. Assess predictions:
Check the validity of the model predictions.

V. C4.5 ALGORITHM

C4.5 is a machine learning algorithm that builds a decision tree from data. The resulting model resembles an upside-down tree, with the root at the top and the leaves at the bottom. The decision tree can be used to classify a case by tracing it from the root of the tree and moving through the tree until a leaf is reached [10].

The main process in forming a decision tree in C4.5 is determining the split attribute. Methods for selecting split attributes include Gini index, information gain, gain ratio and average gain.

The equations for calculating information gain and entropy are as follows:

$$\mathrm{Gain}(S, A) = \mathrm{Entropy}(S) - \sum_{i} \frac{|S_i|}{|S|}\,\mathrm{Entropy}(S_i) \quad (1)$$

$$\mathrm{Entropy}(S) = -\sum_{i} p_i \log_2 p_i \quad (2)$$

Where,
$\mathrm{Gain}(S, A)$ = information gain of set S (training data) on an attribute A
$A$ = attribute
$S_i$ = subset of S for which attribute A has value i
$|S_i|$ = number of elements in $S_i$
$|S|$ = number of elements in $S$
$p_i$ = proportion of S belonging to class i

VI. MACHINE LEARNING IMPLEMENTATION

Following the steps of the machine learning cycle (Section IV), the steps in this research are:

A. Identify the data:

The data used in this research come from PLN (National Electric Company) in Indonesia. The data consist of 7 gas types and the actual faults. The faults are classified into 9 types [9], as shown in Table I.

TABLE I. TYPE OF TRANSFORMERS FAULT
No | Fault | Machine Learning Result | Code
1 | Partial Discharge | Partial Discharge | PD
2 | Thermal Fault t < 300°C | Low temperature overheat | LT
3 | Thermal Fault 300°C < t < 700°C | Middle temperature overheat | MT
4 | Thermal Fault t > 700°C | High temperature overheat | HT
5 | Discharge of Low Energy | Low energy discharge | LD
6 | Discharge of High Energy | High energy discharge | HD
7 | Discharge of Low Energy and Overheat | Low energy discharge and overheat | LDNO
8 | Discharge of Low Energy and Thermal Fault 300°C < t < 700°C | Middle temperature and low energy discharge | MTNLD
9 | Discharge of Low Energy and Thermal Fault t > 700°C | High temperature and low energy discharge | HTNLD

B. Prepare data:

In this step, outliers and extreme values are removed from the data using WEKA software [14]. The method used is the interquartile range. The interquartile range conditions used in WEKA are:

Outlier detection:
Q3 + OF*IQR < x <= Q3 + EVF*IQR (3)
Q1 - EVF*IQR <= x < Q1 - OF*IQR (4)

Extreme value detection:

x > Q3 + EVF*IQR (5)
x < Q1 - EVF*IQR (6)

Where:
Q1 = 25% quartile
Q3 = 75% quartile
IQR = interquartile range, the difference between Q3 and Q1
OF = Outlier Factor [16] = 1.5
EVF = Extreme Value Factor = 3

This step yields 471 records without outliers or extreme values, which are used to train the machine learning models.

C. Select the machine learning algorithm:

WEKA and Orange [15] software were used to choose the best machine learning algorithm for this research. Decision tree (C4.5 algorithm), naïve Bayes, neural network, nearest neighbor and support vector machine (SVM) were compared.

TABLE II. COMPARISON MODEL USING 471 DATA TRAINING
No | Comparison Model | Correct Prediction Percentage (WEKA) | Classification Accuracy (ORANGE)
1 | Decision Tree | 88.32% | 87.9%
2 | Naive Bayes | 68.79% | 2.1%
3 | Neural Network | 86.41% | 84.1%
4 | Nearest Neighbor | 84.29% | 72.2%
5 | Support Vector Machine | 79.40% | 83.4%

According to Table II, the decision tree is the best model for this case.

D. Train:

This step produces the decision tree model from the 471 training records. The decision tree obtained from the J48 module (C4.5 algorithm) of WEKA is shown in Fig. 3.

Fig. 3. Decision tree model with 26 leaves

E. Evaluate:

Three test methods are used to evaluate the model:
• Prediction on the training data
• Prediction with cross-validation
• Prediction on part of the training data (10% random data from the training data)

Table III shows the evaluation results of the decision tree model for the three methods.

TABLE III. EVALUATION RESULT (NO OUTLIER AND EXTREME VALUE, 471 DATA)
No | Method | Correct Prediction | MAE | RMSE
1 | Use Training Set | 95.54% | 0.0188 | 0.0968
2 | Cross-Validation | 88.32% | 0.0334 | 0.1639
3 | Percentage Split (90-10) | 87.23% | 0.0354 | 0.1765

F. Deploy:

This step implements the decision tree in the online condition monitoring system. The system was built with SCADA software in the Substation Automation System (SAS). The decision tree model is converted into rules (if-then-else) so that it can be programmed in the SCADA software. The algorithm is then debugged by testing it in the SCADA software (Fig. 4) and comparing it with the other DGA methods.

Fig. 4. Decision tree testing model
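The conversion of a tree into if-then-else rules described in the Deploy step can be sketched as follows. This is a minimal sketch under stated assumptions, not the SCADA implementation: the tree below is a toy with invented gases and thresholds and is not the 26-leaf tree of Fig. 3, and the names `TOY_TREE`, `classify` and `to_rules` are hypothetical. Only the leaf labels are taken from the fault codes of Table I.

```python
# Toy tree for illustration only: NOT the tree of Fig. 3, thresholds are invented.
# An internal node is (gas, threshold_ppm, subtree_if_le, subtree_if_gt);
# a leaf is a fault code from Table I.
TOY_TREE = ("C2H2", 5.0,
            ("C2H4", 60.0, "LT", "HT"),
            ("H2", 300.0, "LD", "HD"))


def classify(tree, sample):
    """Walk from the root to a leaf, as the if-then-else chain would."""
    while not isinstance(tree, str):
        gas, threshold, low, high = tree
        tree = low if sample[gas] <= threshold else high
    return tree


def to_rules(tree, conditions=()):
    """Flatten the tree into one 'IF ... THEN fault' line per leaf."""
    if isinstance(tree, str):
        return ["IF " + " AND ".join(conditions) + " THEN " + tree]
    gas, threshold, low, high = tree
    return (to_rules(low, conditions + (f"{gas} <= {threshold}",))
            + to_rules(high, conditions + (f"{gas} > {threshold}",)))


for rule in to_rules(TOY_TREE):
    print(rule)

# Hypothetical gas reading in ppm.
print(classify(TOY_TREE, {"C2H2": 12.0, "C2H4": 40.0, "H2": 150.0}))
```

Each printed rule corresponds to one root-to-leaf path, which is the form that can be transcribed into a rule-based scripting environment.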
G. Predict:

The next step adds the decision tree module, built in SCADA software in the previous (deploy) step, to the online condition monitoring system. The system then predicts the gas interpretation result online and compares it with the other methods.

Fig. 5. Implementation of the decision tree model in online condition monitoring and diagnostics of power transformer

H. Assess predictions:

To accommodate all fault definitions, both from the conventional methods and from the decision tree, the fault definitions from [5] are used in this paper, as shown in Table IV.

TABLE IV. FAULT TYPES
Method | F1 Thermal Fault (Cellulose) | F2 Thermal Fault (Oil) | F3 Electrical Fault (Corona) | F4 Electrical Fault (Arcing)
Roger | Low temperature thermal | Thermal fault <700°C; Thermal fault >700°C | Low energy electrical discharge | High energy electrical discharge
IEC | Thermal fault <150°C; Thermal fault 150–300°C | Thermal fault 300–700°C; Thermal fault >700°C | Low energy electrical discharge | High energy electrical discharge
Doernenburg | Thermal decomposition | Thermal decomposition | Low energy electrical discharge | High energy electrical discharge
Duval | Thermal fault <300°C | Thermal fault 300–700°C; Thermal fault >700°C | Low energy electrical discharge | High energy electrical discharge
Key gas | Overheated cellulose | Overheated oil | Low energy electrical discharge | High energy electrical discharge
Decision Tree | Low temperature overheat | Middle temperature overheat; High temperature overheat | Low energy discharge | High energy discharge

Prediction results for data from PLN (National Electric Company) in Indonesia (different from the training data), 4 cases from [3] and one case from [5], obtained with the decision tree and compared with the other methods, are shown in Table V, based on the fault types in Table IV. Table V shows that the decision tree can interpret all fault types, whereas the other methods cannot interpret several faults. The decision tree also provides more accurate fault type interpretation than conventional DGA methods.

TABLE V. COMPARISON RESULT
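The fault grouping of Table IV, on which the comparison is based, can be written as a simple lookup. The sketch below follows one reading of the Decision Tree row of the flattened table, so the mapping should be treated as an assumption; PD and the combined codes (LDNO, MTNLD, HTNLD) do not appear in that row and are left unmapped. The function names are hypothetical.

```python
# Table IV classes assigned to the decision-tree fault codes of Table I.
# Assumption: this follows one reading of the Decision Tree row of Table IV;
# PD, LDNO, MTNLD and HTNLD are not listed there and stay unmapped.
CODE_TO_CLASS = {"LT": "F1", "MT": "F2", "HT": "F2", "LD": "F3", "HD": "F4"}


def fault_class(code):
    """Return the Table IV class for a fault code, or None if it is not listed."""
    return CODE_TO_CLASS.get(code)


def agrees(tree_code, other_class):
    """True when the decision-tree result falls in the class another method reports."""
    mapped = fault_class(tree_code)
    return mapped is not None and mapped == other_class


print(fault_class("HT"), fault_class("PD"), agrees("LD", "F3"))
```

Grouping results by class rather than by method-specific label is what allows the outputs of different DGA methods to be compared case by case.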
VII. CONCLUSION

1. A decision tree (C4.5 algorithm), which is a machine learning model, can be used for dissolved gas analysis (DGA) of power transformers.

2. This model also provides more accurate interpretation results than conventional DGA methods.

3. A machine learning model can be implemented in online condition monitoring and diagnostics of power transformers integrated with SCADA (substation automation).

REFERENCES

[1] Kanika Shrivastava, Ashish Choubey, "Data Mining Approach With IEC Based Dissolved Gas Analysis For Fault Diagnosis Of Power Transformer", International Journal of Engineering Research & Technology (IJERT), March 2013.
[2] CIGRE WG A2.34: Technical Brochure 445, "Guide for Transformer Maintenance", February 2011.
[3] Di Giorgio, J. B., "Dissolved Gas Analysis of Mineral Oil Insulating Fluids," DGA Expert System: A Leader in Quality, Value and Experience 1, Northern Technology and Testing, pp. 1-17, 2005.
[4] Judith Hurwitz and Daniel Kirsch, "Machine Learning For Dummies®, IBM Limited Edition", John Wiley & Sons, Inc., 2018.
[5] A. Abu-Siada, S. Hmood and S. Islam, "A new fuzzy logic approach for consistent interpretation of dissolved gas-in-oil analysis", IEEE Transactions on Dielectrics and Electrical Insulation, vol. 20, no. 6, pp. 2343-2349, December 2013.
[6] International Electrotechnical Commission, "IEC 60599 - Mineral oil impregnated electrical equipment in service - Guide to the interpretation of dissolved and free gases analysis," Tech. Rep., 2007.
[7] "IEEE Guide for the Interpretation of Gases Generated in Oil-Immersed Transformers - Redline," IEEE Std C57.104-2008 (Revision of IEEE Std C57.104-1991) - Redline, pp. 1-45, 2009.
[8] M. Duval, "Dissolved Gas Analysis: It Can Save Your Transformer," IEEE Electrical Insulation Magazine, vol. 5, no. 6, pp. 22-27, 1989.
[9] C. Subroto, Suwarno, Trianto and G. J. Zhang, "Artificial intelligence for DGA interpretation methods using weighting factor," 2017 1st International Conference on Electrical Materials and Power Equipment (ICEMPE), Xi'an, 2017, pp. 85-88.
[10] John Ross Quinlan, "C4.5: Programs for Machine Learning", Morgan Kaufmann Publishers, 1993.
[11] CIGRE WG A2.27: Technical Brochure 343, "Recommendations for Condition Monitoring And Condition Assessment Facilities For Transformers", April 2008.
[12] Li Ming, "Switched Ethernet Based Online Monitoring System for Power Transformer", Proceedings of the 30th Chinese Control Conference, July 22-24, 2011, Yantai, China.
[13] S. Tenbohlen, F. Figel, "On-line Condition Monitoring of Power Transformers", IEEE Power Engineering Society Winter Meeting, 2000.
[14] https://www.cs.waikato.ac.nz/~ml/weka/
[15] https://orange.biolab.si/
[16] F. M. Dekking, C. Kraaikamp, H. P. Lopuhaa, L. E. Meester, A Modern Introduction to Probability and Statistics: Understanding Why and How, Springer-Verlag London Limited, 2005, p. 236.
[17] Feng Li, Hong bin Wang and Kun Feng, "Transformer Defect Data Analysis Based on Data Mining Technology", International Conference on Test, Measurement and Computational Method (TMCM 2015).
[18] Chih-Hsuan Liu, Tai-Li Chen, Leeh-Ter Yao and Shun-Yuan Wang, "Using data mining to dissolved gas analysis for power transformer fault diagnosis," 2012 International Conference on Machine Learning and Cybernetics, Xian, 2012, pp. 1952-1957.
[19] Yuanyuan Han, Dongming Zhao, Hui Hou, "Oil-immersed Transformer Internal Thermoelectric Potential Fault Diagnosis Based on Decision-tree of KNIME Platform", Procedia Computer Science, vol. 83, pp. 1321-1326, 2016.