
Feature Selection Based on Mutual Information for Machine Learning Prediction of Petroleum Reservoir Properties

Muhammad Aliyu Sulaiman (1), Jane Labadin (2)

Faculty of Computer Science and Information Technology, Universiti Malaysia Sarawak,
94300 Kota Samarahan, Sarawak, Malaysia
(1) muhalisu@gmail.com, (2) ljane@pps.unimas.my

Abstract— The application of machine learning models such as support vector machines (SVM) and artificial neural networks (ANN) to the prediction of reservoir properties has been effective in recent years when compared with traditional empirical methods. Nevertheless, machine learning models suffer in the face of uncertain data, which is a common characteristic of well log datasets. Sources of uncertainty in well log data include missing scale, data interpretation and measurement error problems. Feature selection aims at selecting the feature subset that is relevant to the predicted property. In this paper a feature selection method based on the mutual information criterion is proposed; the strong point of the method lies in the choice of a threshold, based on a statistically sound criterion, for the typical greedy feedforward feature selection procedure. Experimental results indicate that the proposed method is capable of improving the performance of machine learning models in terms of prediction accuracy and reduction in training time.

Keywords—Machine Learning; Mutual Information; Feature Selection.
I. INTRODUCTION

Well logging is at the heart of oil and gas exploration; it provides a continuous record of a rock formation's properties. Reservoir variables are known to be used as input data to a reservoir study. These variables are commonly derived through a number of processes and are not measured directly by well logging tools. Among all the reservoir properties, reservoir porosity and permeability, collectively referred to as core logs, are of great importance, because accurate prediction of these properties is essential in determining where to drill and, if hydrocarbons are found, how much oil and gas can be recovered [2].

However, the existence of uncertainty in well log datasets affects the optimal performance of the machine learning models used to predict these properties. To address this problem we introduce a feature selection algorithm based on the mutual information criterion. Mutual information is chosen because of its ability to select features that retain relevant information about the predicted parameter. Moreover, to measure the effectiveness of the proposed method, we implemented machine learning models based on back-propagation neural networks and used the trained classifiers to test the proposed method in terms of prediction accuracy and training time, comparing the performance of the selected feature subsets with that of the full feature set.

The rest of the paper is organized as follows: Section II gives the background of the study and presents an overview of well log data and feature selection methods. In Section III we present the mutual information hypothesis and the formulation of its estimation from the dataset. Section IV presents the proposed feature selection for well log datasets based on a greedy feedforward procedure. Experimental studies are detailed in Section V, which includes the experimental setup, results and discussion.
II. BACKGROUND OF THE STUDY

The literature review provides an overview of the sources of uncertainty in well log data and of how this uncertainty affects the optimal performance of machine learning applications in oil and gas prediction. Finally, feature selection algorithms in related studies are reviewed.

A. Overview of uncertainty in well log datasets and how it affects the accuracy of machine learning predictors

Uncertainty information in data is useful information that can be utilized to improve the quality of the underlying result. As such, a feature with greater uncertainty may not be as important as one with a lower amount of uncertainty [5].

As mentioned in the introduction, reservoir variables such as porosity, permeability, water saturation and minerals are known to be used as input data to a reservoir study. These variables are commonly derived through a number of processes, which include acquisition, processing, interpretation and calibration, and they are not measured directly by well logging tools. Each of these processes carries uncertainty, and consequently the resulting petrophysical data or well logs will equally have uncertainty and limitations [1, 3]. Moreover, it is commonly acknowledged that uncertainty exists at all stages of petroleum exploration [3, 5, 6], and it propagates through each stage, since each stage is built on the results of the previous stages.

A hybrid model for predicting pressure-volume-temperature (PVT) properties of crude oil is presented in [7]; the model is based on the fusion of a type-2 fuzzy logic system (type-2 FLS) and the sensitivity-based linear learning method (SBLLM). The authors explicitly recognized the presence of uncertainty in well log datasets and the limited ability of SBLLM to generalize when there is uncertainty in the dataset; since type-2 FLS is known for modeling uncertainty, it is used to improve the prediction ability of SBLLM in that setting. A genetic-neuro-fuzzy inference system is proposed in [8] to estimate PVT properties of crude oil systems. That study noted that ANN correlations are limited and less accurate in terms of global accuracy, which led the authors to propose the hybrid genetic-neuro-fuzzy system. Although they could not ascertain why ANN correlations are less accurate in terms of global accuracy, their emphasis was on a fuzzy clustering optimization criterion and ranking as the motivation for their method. From this we can infer that data uncertainty could be the reason why ANN gives suboptimal results when used alone, as also claimed in [7] and [10]. Again, two hybrid intelligence systems for predicting petroleum reservoir properties were proposed in [9], functional networks-support vector machines (FN-SVM) and functional networks-type-2 fuzzy logic (FN-T2FL), to improve the performance of standalone SVM and T2FL respectively. In both hybrid systems the functional network component uses a least squares fitting algorithm to extract relevant features from the input data, and this was the core reason for the improvement of these models over the individual standalone models.
B. Overview of feature selection methods

More often than not, real-life classification or prediction applications involve collecting a large number of attributes/features for reasons other than mining the data, and the resulting datasets therefore contain replicated or irrelevant features [14]. There are three broad categories of feature selection algorithms [4, 11, 12, 13]. The first category is the filter models, which utilize statistical and probabilistic distributions of the dataset attributes in order to select a feature subset from the input dataset. Hence they select the feature subset independently of any particular learning machine, and this independent selection enables subsequent prediction with any learning machine. Feature selection based on mutual information (MI) is an example of an (unsupervised) filter model. The second category is the wrapper models; these are optimization search algorithms used together with a particular learning machine to find the best feature subset based on the prediction accuracy of that learning machine. Genetic algorithms (GA) and particle swarm optimization (PSO) are examples of wrapper models. This dependence on a particular learning machine to search for the best feature subset gives wrappers better prediction accuracy at a relatively high computational overhead; for example, a feature subset generated with GA and ANN may perform poorly when used to make predictions with SVM. The third category is the hybrid models, which combine the strengths of both filters and wrappers to select feature subsets.

Unler et al. [4] introduce a hybrid filter-wrapper feature subset selection algorithm based on PSO with SVM. Mutual information, defined in terms of the probability density functions of the variables, serves as the filter algorithm, while PSO, a population-based optimization search technique, is used as the wrapper to find the best strongly relevant feature subset among those identified by MI; finally, SVM is used for classification. Furthermore, filter (feature selection) algorithms for managing uncertain data in data mining and machine learning applications were proposed in [10, 15, 16], based respectively on the mutual information criterion, a combination of ranking and expectation-maximization (EM), and the Hilbert-Schmidt independence criterion (HSIC). An MI criterion based on the probability density function (pdf) of each data value was proposed in [10], and experimental results on 8 UCI machine learning repository datasets proved the effectiveness of the algorithm: MI was first evaluated between each feature of the training set and the output vector, and the resulting MI scores were then used to rank the features. Two different aging databases were used in [15] for experiments: FG-NET, containing 1002 face images of 82 persons with ages ranging from 0 to 69, and the Yamaha face database, containing 800 males, 800 females and 8000 images with ages ranging from 0 to 93; the ranking model was built on the kernel trick and a bilinear regression strategy, and the parameter learning technique was based on EM. In [16], datasets taken from the UCI repository, the StatLib repository and the LibSVM website were used for testing and comparison; the Hilbert-Schmidt independence criterion, which is based on the covariance between variables mapped to reproducing kernel Hilbert spaces, was employed for feature selection together with a greedy backward elimination algorithm. Experimental results showed that HSIC performed comparably to other state-of-the-art feature selectors such as SVM recursive feature elimination (RFE), RELIEF, L0-norm SVM (L0) and R2W2.

III. MUTUAL INFORMATION (MI) HYPOTHESIS AND FORMULATION OF ITS ESTIMATION FROM THE DATASET

The mutual information (MI) of two random variables is a quantitative measure of the amount of dependence (information) between the two random variables. Unlike the correlation coefficient, which can only capture linear dependence, MI is able to detect both linear and non-linear relationships between variables, a property that has made it a popular choice for feature selection [10, 17, 18, 19, 20].
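To make this contrast concrete, the short sketch below (an illustration added for this discussion, not part of the original experiments) estimates MI with a simple two-dimensional histogram and compares it with the correlation coefficient on a purely non-linear relationship; the bin count and sample sizes are arbitrary choices.

```python
import numpy as np

def hist_mi(x, y, bins=20):
    """Rough histogram-based estimate of MI(X;Y) in nats (illustration only)."""
    pxy, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = pxy / pxy.sum()                       # joint probabilities
    px = pxy.sum(axis=1, keepdims=True)         # marginal of X
    py = pxy.sum(axis=0, keepdims=True)         # marginal of Y
    nz = pxy > 0                                # avoid log(0)
    return float(np.sum(pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])))

rng = np.random.default_rng(0)
x = rng.normal(size=5000)
y_indep = rng.normal(size=5000)                 # independent of x
y_nonlin = x**2 + 0.1 * rng.normal(size=5000)   # non-linear dependence on x

print(np.corrcoef(x, y_nonlin)[0, 1])  # close to 0: correlation misses y = x^2
print(hist_mi(x, y_indep))             # close to 0: independence gives MI near zero
print(hist_mi(x, y_nonlin))            # clearly positive: MI detects the dependence
```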
Formally, the MI of a pair of random variables X and Y is defined by the probability density functions (pdf) of X, Y and of the joint variable (X, Y). If we denote the pdfs of X, Y and of the joint (X, Y) as f_X, f_Y and f_{X,Y} respectively, then

MI(X;Y) = \iint f_{X,Y}(x,y) \log \frac{f_{X,Y}(x,y)}{f_X(x)\, f_Y(y)} \, dx \, dy    (1)

Note that if the variables X and Y are completely independent, then the joint pdf is equal to the product of the pdfs of X and Y, that is f_{X,Y} = f_X \cdot f_Y, and MI becomes equal to zero:

MI(X;Y) = 0    (2)

MI can also be expressed in terms of entropy, another information-theoretic quantity that measures the uncertainty in a random variable. The entropy of X is expressed as:

h(X) = -\int f_X(x) \log f_X(x) \, dx    (3)

and the mutual information is expressed in terms of entropy as:

MI(X;Y) = h(Y) - h(Y|X)    (4)

where h(Y|X) is the uncertainty about Y when X is known. Again, if X and Y are independent, then h(Y|X) = h(Y) and MI(X;Y) = 0.

With these definitions, let us now assume that we are given a dataset X containing N samples and d attributes/features, and that our goal is to predict the class of these samples based on previously observed input/output pairs. That is, we estimate MI(X;Y) between the continuous variables X and the discrete variable Y (Y being the class of the sample under consideration). More precisely, our objective is to evaluate MI(X_j;Y) for each of the j = 1, 2, ..., d attributes. Assume that Y takes k different discrete values y_1, ..., y_k and that each y_i is represented by n_i samples (where \sum_i n_i = N). The probability that Y = y_i is then estimated by \hat{p}(y_i) = n_i / N, and the entropy of Y is:

\hat{h}(Y) = -\sum_{i=1}^{k} \hat{p}(y_i) \log \hat{p}(y_i)    (5)

Equation 5 can be omitted when we compare the estimated MI of each feature with the output Y, since the entropy of Y is the same for all features. The conditional entropy is estimated as:

\hat{h}(Y|X_j) = -\int_{x_j} \hat{f}_{X_j}(x_j) \sum_i \hat{f}_{Y_i|X_j}(y_i|x_j) \log \hat{f}_{Y_i|X_j}(y_i|x_j) \, dx_j    (6)

Equation 6 is the entropy of the classes of Y over each of the X attributes. By ignoring equation 5, equation 4 (which is the actual expression of the MI estimate of two random variables) reduces to equation 6. From the expression of MI in equation 6 we can see that MI depends on \hat{f}(x), so we only need to estimate the pdf of X from the dataset. Common methods for estimating a pdf are histogram and kernel-based estimators; in this study the Parzen window density estimator is used, as it appears in [21].

Consider x_1, ..., x_N, independent and identically distributed (i.i.d.) samples drawn from the distribution f. Then:

\hat{f}(x) = \frac{1}{Nb} \sum_{i=1}^{N} k\!\left(\frac{x - x_i}{b}\right)    (7)

where k is the kernel and b is called the bandwidth. The Gaussian kernel with zero mean and unit variance is the most popular choice for k, i.e.

k(x) = \frac{1}{\sqrt{2\pi}} e^{-0.5 x^2}    (8)

The parameter b (the bandwidth) is a smoothing parameter whose value is critical for the quality of the estimate. According to the well-known Silverman rule [23], b can be determined for each dimension of the data as:

b_j = 1.06\, \sigma_j\, N^{-1/5}    (9)

where \sigma_j is the standard deviation along the jth dimension of the dataset.

Also, \hat{f}(y_i|x_j) is the estimated pdf of the ith class conditional on the jth feature; based on Bayes' theorem, \hat{f}_{Y_i|X_j}(y_i|x_j) from equation 6 can be rewritten as:

\hat{f}_{Y_i|X_j}(y_i|x_j) = \frac{\hat{f}_{X_j|Y_i}(x|y_i)\, \hat{p}(y_i)}{\hat{f}_{X_j}(x)}    (10)

Then equation 6 becomes:

\hat{h}(Y|X_j) = -\int_{x_j} \hat{f}_{X_j}(x) \sum_i \frac{\hat{f}_{X_j|Y_i}(x|y_i)\, \hat{p}(y_i)}{\hat{f}_{X_j}(x)} \log \frac{\hat{f}_{X_j|Y_i}(x|y_i)\, \hat{p}(y_i)}{\hat{f}_{X_j}(x)} \, dx    (11)

This shows how the MI can be entirely determined by the pdfs of the variables X_j, possibly restricted to the points carrying a particular class label y_i.

Recall that X_j = [X_{j1}, ..., X_{jN}]; we model the uncertainty in the data with Gaussian pdfs, i.e. X_{j1} \sim N(\mu_{j1}, \sigma_{j1}), ..., X_{jN} \sim N(\mu_{jN}, \sigma_{jN}), where \mu_{ji} is the observed value of the jth dimension of the ith sample and \sigma_{ji} is the corresponding standard deviation. Replacing equation 7 with equation 12 below is the natural way to take the expected value of the kernel k into account:

\hat{f}(x) = \frac{1}{Nb} \sum_{i=1}^{N} E\!\left[ k\!\left(\frac{x - x_i}{b}\right) \right]    (12)

The expected value of the kernel function k is calculated using the law of the unconscious statistician (LOTUS) as:

\hat{f}(x) = \frac{1}{Nb} \sum_{i=1}^{N} \int_{x_{ji}} k\!\left(\frac{x_{ji} - x}{b}\right) \frac{1}{\sqrt{2\pi}\,\sigma_{ji}} e^{-0.5\left(\frac{x_{ji} - \mu_{ji}}{\sigma_{ji}}\right)^2} dx_{ji}
= \frac{1}{Nb} \sum_{i=1}^{N} \int_{x_{ji}} \frac{1}{\sqrt{2\pi}} e^{-0.5\left(\frac{x_{ji} - x}{b}\right)^2} \frac{1}{\sqrt{2\pi}\,\sigma_{ji}} e^{-0.5\left(\frac{x_{ji} - \mu_{ji}}{\sigma_{ji}}\right)^2} dx_{ji}
= \frac{1}{N} \sum_{i=1}^{N} \int_{x_{ji}} \frac{1}{\sqrt{2\pi}\, b} e^{-0.5\left(\frac{x_{ji} - x}{b}\right)^2} \frac{1}{\sqrt{2\pi}\,\sigma_{ji}} e^{-0.5\left(\frac{\mu_{ji} - x_{ji}}{\sigma_{ji}}\right)^2} dx_{ji}    (13)

The convolution of two Gaussian distributions is itself a Gaussian distribution: if f \sim N(\mu_f, \sigma_f) and g \sim N(\mu_g, \sigma_g), then f * g \sim N\!\left(\mu_f + \mu_g, \sqrt{\sigma_f^2 + \sigma_g^2}\right). This can be expressed as:

(f * g)(t) = \int_{\tau} \frac{1}{\sqrt{2\pi}\,\sigma_f} e^{-0.5\left(\frac{\tau - \mu_f}{\sigma_f}\right)^2} \frac{1}{\sqrt{2\pi}\,\sigma_g} e^{-0.5\left(\frac{t - \tau - \mu_g}{\sigma_g}\right)^2} d\tau
= \frac{1}{\sqrt{2\pi (\sigma_f^2 + \sigma_g^2)}} e^{-0.5 \frac{\left(t - (\mu_f + \mu_g)\right)^2}{\sigma_f^2 + \sigma_g^2}}    (14)

Comparing the first part of equation 14 with the last part of equation 13, we arrive at the setting \tau = x_{ji}, \sigma_f = b, \sigma_g = \sigma_{ji}, t = \mu_{ji}, \mu_f = x and \mu_g = 0. Equation 13 can then be expressed as:

\hat{f}(x) = \frac{1}{N} \sum_{i=1}^{N} \frac{1}{\sqrt{2\pi (\sigma_{ji}^2 + b^2)}} e^{-0.5 \frac{(x - \mu_{ji})^2}{\sigma_{ji}^2 + b^2}}    (15)

It is worth noting that \hat{f}(x) and \hat{f}_{X_j|Y_i}(x|y_i) are evaluated in the same way, except that in the latter only the samples with output y_i are considered.
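For readers who want to experiment with this estimator, the sketch below is a minimal numerical transcription of equations (4)-(11) and (15): the pdf of each feature and its class-conditional pdfs are estimated with the Gaussian-smoothed Parzen window of equation 15 (a per-sample standard deviation of zero recovers the plain estimate of equation 7), the Silverman bandwidth of equation 9 is used, and the integral of equation 11 is approximated on a regular grid. The function and variable names are ours and the grid size is an arbitrary choice; this is an illustration, not the authors' implementation.

```python
import numpy as np

def smoothed_pdf(grid, mu, b, sigma=0.0):
    """Eq. 15: Parzen estimate with per-sample Gaussian uncertainty (sigma=0 gives eq. 7)."""
    s = np.sqrt(b**2 + np.asarray(sigma, dtype=float)**2)   # combined width sqrt(sigma_ji^2 + b^2)
    z = (grid[:, None] - mu[None, :]) / s
    return np.mean(np.exp(-0.5 * z**2) / (np.sqrt(2.0 * np.pi) * s), axis=1)

def mi_feature_class(xj, y, sigma=0.0, n_grid=200):
    """Estimate MI(Xj;Y) = h(Y) - h(Y|Xj) for one continuous feature and discrete labels."""
    xj, y = np.asarray(xj, dtype=float), np.asarray(y)
    N = len(xj)
    b = 1.06 * xj.std() * N ** (-0.2)                       # Silverman bandwidth, eq. 9
    grid = np.linspace(xj.min() - 3 * b, xj.max() + 3 * b, n_grid)
    dx = grid[1] - grid[0]
    f_x = smoothed_pdf(grid, xj, b, sigma)                  # f(x) over the whole feature
    classes, counts = np.unique(y, return_counts=True)
    p_y = counts / N                                        # p(y_i)
    h_y = -np.sum(p_y * np.log(p_y))                        # entropy of Y, eq. 5
    h_y_given_x = 0.0
    for yi, pi in zip(classes, p_y):
        f_x_given_y = smoothed_pdf(grid, xj[y == yi], b)    # f(x | y_i), same estimator
        post = np.clip(f_x_given_y * pi / np.maximum(f_x, 1e-12), 1e-12, 1.0)  # Bayes, eq. 10
        h_y_given_x -= np.sum(f_x * post * np.log(post)) * dx                  # eq. 6 / eq. 11
    return h_y - h_y_given_x                                # eq. 4
```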
A. Artificial Neural Networks (ANN) Classification

ANN was developed by simulating human neurons, or the network of neurons in the brain. Neurons are cells with a cell body and dendrites, the input wires connected to the cell body, which receive signals from the body's receptors. In addition to these two, a neuron has an output wire known as the axon, which sends signals or informative messages to other neurons. Thus, at a primitive level, a neuron is a computational unit that receives a number of inputs (electric pulses) through its input wires (dendrites), performs some computation in the cell body, and sends the output through its axon to other neurons in the brain.

Fig. 1. Neuron model: (a) without bias, (b) with bias input x0 and weight θ0 explicitly shown.

Figure 1 depicts the ANN implementation of a simple model of a neuron: the empty circle plays a role analogous to the body of the neuron, and each arrow linking an input x_i to the neuron carries a parameter, or weight, \theta_i. The diagram represents the computation of a sigmoid fed with the dot product of x and \theta, in the form:

h_\theta(x) = \frac{1}{1 + e^{-\theta^T x}}    (16)

where x and \theta are the input and parameter vectors:

x = [x_0, x_1, \ldots, x_m]^T, \quad \theta = [\theta_0, \theta_1, \ldots, \theta_m]^T

With this brief description and illustration, we define an ANN as a collection of artificial neurons working together to form a predictor or classifier.
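As a quick illustration (added here, with arbitrary example numbers), equation 16 can be transcribed directly:

```python
import numpy as np

def sigmoid_neuron(x, theta):
    """Single neuron of Fig. 1 / eq. 16: sigmoid of the dot product theta^T x."""
    return 1.0 / (1.0 + np.exp(-np.dot(theta, x)))

x = np.array([1.0, 0.42, -1.3])       # x0 = 1 acts as the bias input of Fig. 1(b)
theta = np.array([0.5, -1.2, 0.8])    # theta0 is the corresponding bias weight
print(sigmoid_neuron(x, theta))       # activation in (0, 1)
```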
IV. FEATURE SELECTION PROCEDURE BASED ON MUTUAL INFORMATION CRITERION

The proposed feature selection is a greedy feedforward search procedure in which a feature is added to an initially empty subset based on a set criterion. The proposed method is given in Table I.

S and f are initialized to the empty set and the candidate feature set respectively. f is the set of features of X; for each feature we compute the amount of information it contains as I(f), based on the MI estimation above. After the estimated mutual information for each feature of X is computed, we set a threshold value ϴ as the criterion for feature selection; only features whose I(f) fulfil the threshold criterion are selected, otherwise they are rejected. We use threshold values to distinguish ranges of values in which different suboptimal subsets vary in some important way.

The proposed procedure is suitable for feature sets of small dimension, because the method depends on estimating pdfs from the dataset through an exhaustive sequential forward selection, and the choice of ϴ is critical to obtaining a good suboptimal feature subset S.

TABLE I. GREEDY FORWARD SELECTION ALGORITHM

Input: Training dataset D = (X, y);
Output: Best suboptimal feature subset S;
1. Initialize S = {} (selected feature subset) and f = the set of features of X;
2. for each feature f_j in f
3.    Compute I(f_j) = MI(X_j, y);
4. Set the threshold value ϴ based on the computed I values;
5. for each feature f_j in f
6.    S = S ∪ {f_j} if I(f_j) ≥ ϴ;
7. Return S as the selected feature subset;
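A compact transcription of Table I might look as follows; this is our own sketch, in which mi_fn is any per-feature MI estimator (for instance the mi_feature_class sketch given after equation 15) and theta is the chosen threshold:

```python
import numpy as np

def greedy_forward_selection(X, y, theta, mi_fn):
    """Table I: keep every feature whose estimated MI with the target meets the threshold."""
    d = X.shape[1]
    scores = np.array([mi_fn(X[:, j], y) for j in range(d)])   # steps 2-3: I(f_j) = MI(X_j, y)
    selected = [j for j in range(d) if scores[j] >= theta]      # steps 5-6: threshold test
    return selected, scores                                     # step 7: suboptimal subset S
```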
V. EXPERIMENTAL STUDIES

The well log data variables used for this study are DEPTH (depth), MSFL (micro spherically focused log), DT (sonic travel time), NPHI (neutron porosity), PHIT (total porosity), RHOB (bulk density), SWT (water saturation), CALI (caliper log), CT (electric conductivity), DRHO (density), GR (gamma ray log) and RT (deep resistivity), with permeability as the output reservoir attribute. The data are private data collected from wells in a Middle Eastern region. The dataset consists of 12 well logs (the feature set) and permeability as the target core log; it contains 880 data points, which we divided into a training set of 60% (528 data points), a validation set of 20% (176 data points) and a test set of 20% (176 data points).

A. Experimental Setup

Starting from the data analysis, we explored the nature of the dataset using statistical analysis tools such as the five-number summary and boxplots. This helped us to visualize the entire dataset and to see the distribution of each individual log (feature). Finally, we realized the need to normalize the dataset to roughly the same scale in order to avoid bias/variance problems.
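A possible preprocessing sketch is shown below; the paper does not state which scaling was used, so min-max scaling and a random shuffle are our assumptions, while the 60/20/20 proportions follow the text:

```python
import numpy as np

def normalize_and_split(X, y, seed=0):
    """Scale each log to [0, 1] (an assumed normalization) and split 60/20/20."""
    X = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))
    idx = np.random.default_rng(seed).permutation(len(y))
    n_tr, n_va = int(0.6 * len(y)), int(0.2 * len(y))          # 528 and 176 points for N = 880
    tr, va, te = idx[:n_tr], idx[n_tr:n_tr + n_va], idx[n_tr + n_va:]
    return (X[tr], y[tr]), (X[va], y[va]), (X[te], y[te])
```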
Discretization is the process of re-encoding each continuous attribute into a discrete attribute using a set of intervals. It is an essential data preprocessing task because data transformed into a set of intervals are more relevant for human interpretation [22]. Since our proposed feature selection method is formulated on the assumption that the output variable takes discrete values, we employed domain expert knowledge based on [22] to discretize the target output, as presented in Table II. Domain-expert-based discretization is the preferred discretization method for machine learning classification modeling.

TABLE II. PERMEABILITY RANGE AND LABEL

Permeability range (in mD)    Label / Class
0.0001 - 0.001                Extremely Tight
0.001 - 0.01                  Very Tight
0.01 - 0.1                    Tight
0.1 - 1.0                     Low
1.0 - 10.0                    Moderate
10.0 - 100.0                  High
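For example, the Table II mapping can be applied with a simple binning routine (our own sketch; values outside the listed range are clamped to the nearest class):

```python
import numpy as np

EDGES = [0.0001, 0.001, 0.01, 0.1, 1.0, 10.0, 100.0]   # interval bounds from Table II (mD)
LABELS = ["Extremely Tight", "Very Tight", "Tight", "Low", "Moderate", "High"]

def discretize_permeability(k_md):
    """Map continuous permeability values (in mD) to the Table II class labels."""
    idx = np.digitize(np.asarray(k_md, dtype=float), EDGES[1:-1])   # interior edges only
    return np.array(LABELS)[np.clip(idx, 0, len(LABELS) - 1)]

print(discretize_permeability([0.0005, 0.05, 3.0, 42.0]))
# ['Extremely Tight' 'Tight' 'Moderate' 'High']
```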
Choosing a threshold value is critical in obtaining the best suboptimal feature subset. In this experiment we chose our threshold values empirically by taking the 25th percentile, the mean and the 75th percentile of the estimated MI values computed for all the input features.
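Under that rule, the three candidate thresholds can be derived from the per-feature MI scores as in the sketch below (the numeric scores here are made-up placeholders, not the values obtained in the experiments):

```python
import numpy as np

def threshold_candidates(mi_scores):
    """Empirical thresholds used in the experiments: 25th percentile, mean, 75th percentile."""
    mi_scores = np.asarray(mi_scores, dtype=float)
    return {"25th percentile": np.percentile(mi_scores, 25),
            "mean": mi_scores.mean(),
            "75th percentile": np.percentile(mi_scores, 75)}

scores = np.array([0.02, 0.15, 0.40, 0.05, 0.33, 0.21, 0.08, 0.27, 0.12, 0.36, 0.18, 0.30])
for name, t in threshold_candidates(scores).items():
    print(f"{name}: theta = {t:.3f}, {np.count_nonzero(scores >= t)} of 12 logs kept")
```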
Finally, we implemented a neural network with one hidden layer and a multi-label output, trained with back-propagation, to investigate the performance of the feature selection method for each of the threshold values against the full set of features.
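A classifier of that general shape could be set up as in the sketch below; this is not the authors' code, and the hidden-layer size, iteration budget and the stand-in data (matching only the 880 x 12 shape and the six permeability classes of this study) are assumptions made purely for illustration:

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(880, 12))            # stand-in for the 12 normalized well logs
y = rng.integers(0, 6, size=880)          # stand-in for the six Table II classes

clf = MLPClassifier(hidden_layer_sizes=(16,), max_iter=500, random_state=0)  # one hidden layer
clf.fit(X[:528], y[:528])                                  # 60% training split
print("train accuracy:", clf.score(X[:528], y[:528]))
print("test accuracy:", clf.score(X[704:], y[704:]))       # last 176 points as the test set
```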
B. Results and Discussion

We conducted several experiments to evaluate the performance of the proposed method. The first experiment evaluates the prediction accuracy of each of the selected feature subsets, including the full feature set, and compares them. The second experiment compares the training time for each selected feature subset, including the full feature set.

Three feature subsets, with 3, 6 and 9 elements, were generated by the proposed method using the threshold values given by the 25th percentile, the mean and the 75th percentile of the computed MI estimates. Using cross-validation we selected a fixed neural network architecture for the generated feature subsets and the full feature set. We trained each classifier and used both the training and the test datasets for prediction. The prediction accuracies for the training and test datasets are presented in Figure 2. On both the training and the test data, the selected subset containing 3 features performed very poorly. The full feature set (with 12 features) scores over 80% prediction accuracy on the training data but performs badly on the unseen (test) data; this is a case of overfitting and usually occurs in datasets with many features but few data points. This result illustrates one of the benefits of feature selection: it curtails overfitting by removing irrelevant features that could overwhelm the performance of a classifier.

Fig. 2. Prediction Accuracy vs Number of Selected Features.

However, in both cases the selected feature subset with 9 features appears to be the best suboptimal subset: it performed reasonably well on both the training and test datasets, followed by the selected subset with 6 features. Although the 6-feature subset performed worse than the 9-feature subset on both the training and test data, it has the smallest gap in prediction accuracy between the training and test datasets, an indication that it is the second-best suboptimal feature subset.

In the same manner, we recorded the time taken to train each of the selected feature subsets and the full feature set. The results are presented in Figure 3. The full feature set takes more time to train than any of the selected feature subsets. While the subsets with 3 and 6 features train in the least time, the subset with 9 features takes relatively more time to train than the 3- and 6-feature subsets; compared with the full feature set, however, it still trains in less time.

Fig. 3. Running Time vs Number of Selected Features.
CONCLUSION

In this study we proposed a feature selection method based on the greedy feedforward algorithm for selecting a feature subset from the full feature set. What is unique in this method is the introduction of a threshold value based on a statistically sound idea, where threshold values are used to distinguish ranges of values in which different suboptimal subsets vary in some important way.

We tested the performance of the proposed method with the modeled classifiers. The best suboptimal feature subset selected by the proposed method performed better, in terms of overall prediction accuracy and training time, than the full feature set.

Considering that the intention of this study is to propose an effective feature selection method for well log datasets used in machine learning prediction of petroleum reservoir properties, the prediction accuracies obtained are by no means the best results achievable when the focus is on classification; in this study we fixed the parameters for the purpose of a fair comparison of the feature subsets. In future work we intend to test this method with different classifiers, and we will also investigate how to extend it to regression problems.
REFERENCES

[1] W. R. Moore, Y. Z. Ma, J. Urdea, and T. Bratton, "Uncertainty Analysis in Well-log and Petrophysical Interpretations," Schlumberger, Greenwood Village, Colorado, U.S.A.
[2] Lone Star Securities, Inc., "Understanding and Investing in Oil and Natural Gas Drilling and Production Projects," http://www.lonestarsecurities.com, 2013.
[3] E. Gringarten, "Integrated uncertainty assessment - from seismic and well-logs to flow simulation," PARADIGM, SEG Las Vegas 2012 Annual Meeting.
[4] A. Unler, A. Murat, and R. B. Chinnam, "mr2PSO: A maximum relevance minimum redundancy feature selection method based on swarm intelligence for support vector machine classification," Information Sciences 181 (2011) 4625-4641.
[5] C. C. Aggarwal, "Managing and Mining Uncertain Data," IBM T. J. Watson Research Center, Hawthorne, NY 10532; Kluwer Academic Publishers, Boston/Dordrecht/London.
[6] J. Akamine and J. Caers, "A workflow to account for uncertainty in well-log data in 3D geostatistical reservoir modeling," Stanford Center for Reservoir Forecasting, May 2007.
[7] A. Selamat, S. O. Olatunji, and A. Abdulraheem, "A Hybrid Model through the Fusion of Type-2 Fuzzy Logic Systems and Sensitivity-Based Linear Learning Method for Modeling PVT Properties of Crude Oil Systems," Advances in Fuzzy Systems, Hindawi Publishing Corporation, Volume 2012, Article ID 359429, 19 pages.
[8] L. Ghouti and S. Al-Bukhitan, "Hybrid Soft Computing for PVT Properties Prediction," in Proceedings of the European Symposium on Artificial Neural Networks - Computational Intelligence and Machine Learning, Bruges (Belgium), 28-30 April 2010, d-side publi., ISBN 2-930307-10-2.
[9] F. A. Anifowose, J. Labadin, and A. Abdulraheem, "Prediction of Petroleum Reservoir Properties Using Different Versions of Adaptive Neuro-Fuzzy Inference System Hybrid Models," International Journal of Computer Information Systems and Industrial Management Applications, ISSN 2150-7988, Volume 5 (2013), pp. 413-426.
[10] G. Doquire and M. Verleysen, "Feature Selection with Mutual Information for Uncertain Data," Data Warehousing and Knowledge Discovery, Lecture Notes in Computer Science, Volume 6862, Springer, 2011, pp. 330-341.
[11] L. Yu and H. Liu, "Feature Selection for High-Dimensional Data: A Fast Correlation-Based Filter Solution," Proceedings of the Twentieth International Conference on Machine Learning (ICML-2003), Washington DC, 2003.
[12] I. Guyon and A. Elisseeff, "An Introduction to Variable and Feature Selection," Journal of Machine Learning Research 3 (2003) 1157-1182.
[13] H. Liu and L. Yu, "Toward Integrating Feature Selection Algorithms for Classification and Clustering," IEEE Transactions on Knowledge and Data Engineering, Volume 17, No. 4, April 2005.
[14] U. M. Fayyad, G. Piatetsky-Shapiro, and P. Smyth, "From data mining to knowledge discovery: an overview," Advances in Knowledge Discovery and Data Mining (1996) 1-34.
[15] S. Yan, H. Wang, T. S. Huang, Q. Yang, and X. Tang, "Ranking with Uncertain Labels," in Proceedings of the IEEE International Conference on Multimedia and Expo, 2007.
[16] L. Song, A. Smola, A. Gretton, J. Bedo, and K. Borgwardt, "Feature Selection via Dependence Maximization," Journal of Machine Learning Research 13 (2012) 1393-1433.
[17] D. François, F. Rossi, V. Wertz, and M. Verleysen, "Resampling methods for parameter-free and robust feature selection with mutual information," Neurocomputing 70, 7-9 (2007) 1276-1288.
[18] R. Battiti, "Using Mutual Information for Selecting Features in Supervised Neural Net Learning," IEEE Transactions on Neural Networks, Vol. 5, No. 4, July 1994.
[19] H. Peng, F. Long, and C. Ding, "Feature Selection Based on Mutual Information: Criteria of Max-Dependency, Max-Relevance, and Min-Redundancy," IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 27, No. 8, August 2005.
[20] F. Rossi, A. Lendasse, D. François, V. Wertz, and M. Verleysen, "Mutual information for the selection of relevant variables in spectrometric nonlinear modelling," Chemometrics and Intelligent Laboratory Systems 80, 2 (2006) 215-226.
[21] E. Parzen, "On Estimation of a Probability Density Function and Mode," Annals of Mathematical Statistics, Volume 33 (1962) 1065-1076.
[22] CSUR, "Understanding Tight Oil," Canadian Society for Unconventional Resources, information booklet.
[23] B. W. Silverman, "Density Estimation for Statistics and Data Analysis," Monographs on Statistics and Applied Probability, London: Chapman and Hall, 1986.
[24] Writer's Handbook, Mill Valley, CA: University Science, 1989.
