Abstract— The application of machine learning models such as support vector machines (SVM) and artificial neural networks (ANN) to predicting reservoir properties has proved effective in recent years compared with the traditional empirical methods. However, machine learning models suffer in the face of uncertain data, which is a common characteristic of well log datasets. Reasons for uncertainty in well log data include missing scale, data interpretation and measurement error problems. Feature selection aims at selecting the feature subset that is relevant to the predicted property. In this paper a feature selection method based on the mutual information criterion is proposed; the strong point of this method lies in the choice of a threshold based on a statistically sound criterion for the typical greedy feedforward feature selection procedure. Experimental results indicate that the proposed method is capable of improving the performance of machine learning models in terms of prediction accuracy and reduction in training time.

Keywords—Machine Learning; Mutual Information; Feature Selection.

I. INTRODUCTION

Well logging is at the heart of oil and gas exploration; it provides a continuous record of a rock formation's properties. Reservoir variables are known to be used as input data to a reservoir study. These variables are commonly derived through a number of processes and are not measured directly by well logging tools. Of all the reservoir properties, the reservoir porosity and permeability, collectively referred to as core logs, are of the greatest importance, because accurate prediction of these properties is essential in determining where to drill and, if found, how much oil and gas can be recovered [2].

However, the existence of uncertainty in well log datasets affects the optimal performance of machine learning models in predicting these properties. To address the problem that this uncertainty poses for the performance of machine learning models, we introduce a feature selection algorithm based on the mutual information criterion. Mutual information is chosen because of its ability to select features that retain relevant information about the predicted parameter. Moreover, to measure the effectiveness of the proposed method we implemented machine learning models based on back-propagation neural networks, and we used the trained classifiers to test our proposed method in terms of prediction accuracy and training time by comparing the performance of the selected feature subsets with the performance of the full feature set.

The rest of the paper is organized as follows. Section II gives the background of the study and discusses the overview of well log data and feature selection methods. In Section III we present the mutual information hypothesis and the formulation of its estimation from the dataset. Section IV presents the proposed feature selection for well log datasets based on a greedy feedforward procedure. Experimental studies are detailed in Section V, which includes the experimental setup, results of the experiments and discussion.

II. BACKGROUND OF THE STUDY

This literature review provides an overview of the sources of uncertainty in well log data and of how that uncertainty affects the optimal performance of machine learning applications in oil and gas prediction. Finally, feature selection algorithms in related studies are reviewed.

A. Overview of uncertainty in well log datasets and how it affects the accuracy of machine learning predictors

Uncertainty information in data is useful information that can be utilized to improve the quality of the underlying result. As such, a feature with greater uncertainty may not be as important as one with a lower amount of uncertainty [5].

As mentioned in the introduction, reservoir variables such as porosity, permeability, water saturation and minerals are used as input data to a reservoir study. These variables are commonly derived through a number of processes, which include acquisition, processing, interpretation and calibration; they are not measured directly by well logging tools. Each of these processes has uncertainty, and as such the resulting petrophysical data or well logs will equally have uncertainty and limitations [1, 3]. Moreover, it is commonly acknowledged that uncertainty exists at all stages of petroleum exploration [3, 5, 6], and that it propagates through the stages, since each stage is built on the results of the previous ones.

A hybrid model for predicting pressure-volume-temperature (PVT) properties of crude oil is presented in [7]; the model is based on the fusion of a type-2 fuzzy logic system (type-2 FLS) and the sensitivity-based linear learning method (SBLLM). The authors categorically recognized the presence of uncertainty in well log datasets and the limited ability of SBLLM to generalize when there is uncertainty in the data. Since type-2 FLS is known for modeling uncertainty, it is used to improve the prediction ability of SBLLM in the presence of uncertain data.
A genetic-neuro-fuzzy inference system is proposed in [8] to estimate pressure-volume-temperature (PVT) properties of crude oil systems. That study cited ANN correlations as limited and less accurate in terms of global accuracy, which led the authors to propose the hybrid genetic-neuro-fuzzy system. Although they could not ascertain why ANN correlations are less accurate in terms of global accuracy, their emphasis was on fuzzy clustering optimization criteria and ranking as the motivation for their method. From this we can infer that data uncertainty could be the reason why ANN gives suboptimal results when used alone, just as asserted in both [7, 10]. Again, two hybrid intelligence systems for predicting petroleum reservoir properties were proposed in [9]: functional networks-support vector machines (FN-SVM) and functional networks-type-2 fuzzy logic (FN-T2FL), which improve the performance of standalone SVM and T2FL respectively. In both hybrid systems the functional network component uses a least squares fitting algorithm to extract relevant features from the input data, and this was the core reason for the improvement of these models over the individual standalone models.

B. Overview of feature selection methods

More often than not, real-life classification or prediction applications involve collecting a large number of attributes/features for reasons other than mining the data, so the data contain replicated or irrelevant features [14].

Feature selection algorithms fall into three broad categories [4, 11, 12, 13]. The first category is filter models, which utilize the statistical and probabilistic distributions of the dataset attributes in order to select a feature subset from the input dataset. They therefore select the feature subset independently of any particular learning machine, and this independent selection enables subsequent prediction by any learning machine. Feature selection based on mutual information (MI) is an example of an (unsupervised) filter model. The second category is wrapper models: optimization search algorithms that are used together with a particular learning machine to find the best feature subset based on the prediction accuracy of that learning machine. The genetic algorithm (GA) and particle swarm optimization (PSO) are examples of wrapper models. This dependence on a particular learning machine to search for the best feature subset gives wrappers better prediction accuracy at a relatively high computational overhead.

For example, feature selection methods for data mining and machine learning applications were proposed in [10, 15, 16], based on the mutual information (MI) criterion, a combination of ranking and expectation-maximization (EM), and the Hilbert-Schmidt independence criterion (HSIC), respectively. An MI criterion based on the probability density function (pdf) of each data value was proposed in [10], and experimental results on 8 datasets from the UCI machine learning repository demonstrated the effectiveness of the algorithm: MI was first evaluated between each feature of the training set and the output vector, and the resulting MI scores were then used to rank the features. Two different aging databases were used for the experiments in [15]: FG-NET, containing 1002 face images of 82 persons with ages ranging from 0 to 69, and the Yamaha face database, containing 8000 images of 800 males and 800 females with ages ranging from 0 to 93. The ranking model was built on the kernel trick and a bilinear regression strategy, and the parameter learning technique was based on EM. In [16], datasets taken from the UCI repository, the StatLib repository and the LibSVM website were used for testing and comparison. The Hilbert-Schmidt independence criterion, which is based on the covariance between variables mapped into reproducing kernel Hilbert spaces, was employed for feature selection together with a greedy backward elimination algorithm. Experimental results showed that HSIC performed comparably to other state-of-the-art feature selectors such as SVM Recursive Feature Elimination (RFE), RELIEF, L0-norm SVM (L0) and R2W2.
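To make the MI-based filter ranking described for [10] concrete, here is a minimal sketch (our illustration, not the authors' implementation). It assumes scikit-learn's mutual_info_classif as the MI estimator, which uses a nearest-neighbour estimate rather than the pdf-based formulation developed in Section III; the helper name and the synthetic data are hypothetical.

```python
# Illustrative MI-ranking filter: score each feature by MI(X_j; Y),
# then keep the highest-scoring subset.
import numpy as np
from sklearn.feature_selection import mutual_info_classif

def rank_features_by_mi(X, y, n_keep):
    """Return indices of the n_keep features with the highest MI scores."""
    scores = mutual_info_classif(X, y, random_state=0)  # one score per column
    order = np.argsort(scores)[::-1]                    # sort by descending MI
    return order[:n_keep], scores

# Synthetic demo: feature 0 carries class information, the rest are noise.
rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=200)
X = rng.normal(size=(200, 5))
X[:, 0] += 2.0 * y
selected, scores = rank_features_by_mi(X, y, n_keep=2)
print(selected, np.round(scores, 3))  # feature 0 should rank first
```

In such a scheme the crux is how many features to keep; the contribution of this paper (Section IV) is a statistically grounded choice of that threshold rather than a fixed cut-off.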
III. MUTUAL INFORMATION (MI) HYPOTHESIS AND FORMULATION OF ITS ESTIMATION FROM THE DATASET

The mutual information (MI) of two random variables is a quantitative measurement of the amount of dependence (information) between the two random variables. Unlike the correlation coefficient, which can only capture linear dependence, MI is able to detect both linear and non-linear relationships between variables, a property that has made it a popular choice for feature selection [10, 17, 18, 19, 20].

Formally, the MI of a pair of random variables $X$ and $Y$ is defined by the probability density function (pdf) of $X$, of $Y$ and of the joint variable $(X, Y)$. If we denote the pdfs of $X$, $Y$ and $(X, Y)$ as $f_X$, $f_Y$ and $f_{X,Y}$ respectively, then

$$MI(X;Y) = \iint f_{X,Y}(x,y)\,\log\frac{f_{X,Y}(x,y)}{f_X(x)\,f_Y(y)}\;dx\,dy \qquad (1)$$
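As a quick sanity check on Eq. (1), the following sketch (our illustration; the histogram is our choice of crude pdf estimate) evaluates the double integral as a discrete sum over a 2-D histogram. Dependent samples should give a clearly positive value, independent ones a value near zero.

```python
# Plug-in estimate of Eq. (1): replace the pdfs with normalized 2-D
# histogram counts and the double integral with a sum over the bins.
import numpy as np

def mi_histogram(x, y, bins=16):
    """Rough MI estimate (in nats) between two continuous samples."""
    pxy, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = pxy / pxy.sum()                    # joint probabilities f_XY
    px = pxy.sum(axis=1, keepdims=True)      # marginal f_X
    py = pxy.sum(axis=0, keepdims=True)      # marginal f_Y
    nz = pxy > 0                             # skip empty bins (log 0)
    return float(np.sum(pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])))

rng = np.random.default_rng(1)
x = rng.normal(size=5000)
print(mi_histogram(x, x + 0.5 * rng.normal(size=5000)))  # dependent: positive
print(mi_histogram(x, rng.normal(size=5000)))            # independent: ~0
```

A histogram plug-in is the simplest such estimator but is sensitive to the bin count; the paper instead works with explicit pdf estimates, as formulated below.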
$h(Y|X)$ is the uncertainty about $Y$ when $X$ is known. Again, if $X$ and $Y$ are independent, then $h(Y|X) = h(Y)$ and $MI(X;Y) = 0$.

With these definitions, let us now assume that we are given a dataset $X$ containing $N$ samples and $d$ attributes/features, and that our goal is to predict the class of these samples based on previously observed input/output pairs; that is, to estimate the MI, $MI(X;Y)$, between the continuous variables $X$ and the discrete variable $Y$ ($Y$ being the class of the samples under consideration). More precisely, our objective here is to evaluate $MI(X_j;Y)$ for the attributes $j = 1, 2, \ldots, d$. Assume that $Y$ takes $k$ different discrete values $y_1, \ldots, y_k$, and that each $y_i$ is represented by $n_i$ samples (where $\sum_i n_i = N$). Thus the probability that $Y = y_i$ is estimated by $\hat{p}(y_i) = n_i/N$, and the entropy of $Y$ is

$$\hat{h}(Y) = -\sum_{i=1}^{k} \hat{p}(y_i)\,\log \hat{p}(y_i)$$
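As a quick numerical illustration (our example, not taken from the paper): with $N = 100$ samples split into $k = 2$ classes of $n_1 = 30$ and $n_2 = 70$ samples, $\hat{p}(y_1) = 0.3$ and $\hat{p}(y_2) = 0.7$, so $\hat{h}(Y) = -0.3\ln 0.3 - 0.7\ln 0.7 \approx 0.61$ nats.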
By Bayes' rule, the estimated conditional probability of class $y_i$ given $x$ can be rewritten as:

$$\hat{f}_{Y_i|X_j}(y_i \mid x) = \frac{\hat{f}_{X_j|Y_i}(x \mid y_i)\;\hat{p}(y_i)}{\hat{f}_{X_j}(x)} \qquad (10)$$

Then equation 6 becomes:

$$\hat{h}(Y \mid X_j) = -\int_x \hat{f}_{X_j}(x) \sum_i \frac{\hat{f}_{X_j|Y_i}(x \mid y_i)\;\hat{p}(y_i)}{\hat{f}_{X_j}(x)}\,\log\frac{\hat{f}_{X_j|Y_i}(x \mid y_i)\;\hat{p}(y_i)}{\hat{f}_{X_j}(x)}\;dx \qquad (11)$$

This explains how the MI can be entirely determined by the pdfs of the variables $X$, possibly limited to the points with a particular class label $y_i$.
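Putting Eqs. (10) and (11) together, $MI(X_j;Y)$ can be computed as $\hat{h}(Y) - \hat{h}(Y|X_j)$ once the class-conditional pdfs have been estimated. The sketch below is one possible realization of this scheme (our choices, not prescribed by the paper): Gaussian kernel density estimates stand in for $\hat{f}_{X_j|Y_i}$, and the integral in Eq. (11) is approximated on a grid.

```python
# Sketch of Eqs. (10)-(11): MI(X_j; Y) = h(Y) - h(Y|X_j) for one continuous
# feature x and discrete labels y. KDE and grid integration are assumptions.
import numpy as np
from scipy.stats import gaussian_kde
from scipy.integrate import trapezoid

def mi_feature_label(x, y, grid_size=512):
    classes, counts = np.unique(y, return_counts=True)
    priors = counts / counts.sum()                   # p_hat(y_i) = n_i / N
    h_y = -np.sum(priors * np.log(priors))           # entropy of Y

    grid = np.linspace(x.min(), x.max(), grid_size)
    # class-conditional pdf estimates f_hat(x | y_i), one KDE per class
    cond = np.array([gaussian_kde(x[y == c])(grid) for c in classes])
    fx = priors @ cond                               # marginal f_hat(x) as a mixture
    post = cond * priors[:, None] / np.maximum(fx, 1e-300)   # Bayes' rule, Eq. (10)
    # conditional entropy h(Y|X_j), Eq. (11), integrated on the grid
    integrand = -np.sum(post * np.log(np.maximum(post, 1e-300)), axis=0)
    return h_y - trapezoid(fx * integrand, grid)

rng = np.random.default_rng(2)
y = rng.integers(0, 2, size=400)
x = rng.normal(loc=1.5 * y.astype(float))            # feature shifted by class
print(mi_feature_label(x, y))                        # clearly positive MI
```

In a greedy feedforward procedure such as the one proposed in Section IV, this score would be evaluated for each candidate attribute $X_j$ in turn.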