Abstract— The application of machine learning models such as support vector machines (SVM) and artificial neural networks (ANN) to predicting reservoir properties has proved effective in recent years compared with the traditional empirical methods. However, machine learning models suffer in the face of uncertain data, which is a common characteristic of well log datasets. Reasons for uncertainty in well log data include missing scale, data interpretation and measurement error problems. Feature selection aims at selecting the feature subset that is relevant to the predicted property. In this paper a feature selection method based on the mutual information criterion is proposed; the strong point of this method lies in the choice of a threshold based on a statistically sound criterion for the typical greedy feedforward feature selection procedure. Experimental results indicate that the proposed method is capable of improving the performance of machine learning models in terms of prediction accuracy and reduction in training time.

Keywords—Machine Learning; Mutual Information; Feature Selection.

I. INTRODUCTION

Well logging is at the heart of oil and gas exploration; it provides a continuous record of a rock formation's properties. Reservoir variables are known to be used as input data to a reservoir study. These variables are commonly derived through a number of processes and are not measured directly by well logging tools. Of all the reservoir properties, the reservoir porosity and permeability, collectively referred to as core logs, are of the greatest importance, because accurate prediction of these properties is essential in determining where to drill and, if found, how much oil and gas can be recovered [2].

However, the existence of uncertainty in well log datasets affects the optimal performance of machine learning models in predicting these properties. To address the problem that this uncertainty poses for the performance of machine learning models, we introduce a feature selection algorithm based on the mutual information criterion. Mutual information is chosen because of its ability to select features that retain relevant information about the predicted parameter. Moreover, to measure the effectiveness of the proposed method we implemented machine learning models based on back-propagation neural networks, and we used the trained classifiers to test our proposed method in terms of prediction accuracy and training time by comparing the performance of the selected feature subsets with the performance of the full feature set.

The rest of the paper is organized as follows. Section II gives the background of the study and discusses the overview of well log data and feature selection methods. In Section III we present the mutual information hypothesis and the formulation of its estimation from the dataset. Section IV presents the proposed feature selection for well log datasets based on a greedy feedforward procedure. Experimental studies are detailed in Section V, which includes the experimental setup, results of the experiments and discussion.

II. BACKGROUND OF THE STUDY

This literature review provides an overview of the sources of uncertainty in well log data and of how that uncertainty affects the optimal performance of machine learning applications in oil and gas prediction. Finally, feature selection algorithms in related studies are reviewed.

A. Overview of uncertainty in well log datasets and how it affects the accuracy of machine learning predictors

Uncertainty information in data is useful information that can be utilized to improve the quality of the underlying result. As such, a feature with greater uncertainty may not be as important as one with a lower amount of uncertainty [5].

As mentioned in the introduction, reservoir variables such as porosity, permeability, water saturation and minerals are used as input data to a reservoir study. These variables are commonly derived through a number of processes, which include acquisition, processing, interpretation and calibration; they are not measured directly by well logging tools. Each of these processes has uncertainty, and as such the resulting petrophysical data or well logs will equally have uncertainty and limitations [1, 3]. Moreover, it is commonly acknowledged that uncertainty exists at all stages of petroleum exploration [3, 5, 6], and that it propagates through the stages, since each stage is built on the results of the previous ones.

A hybrid model for predicting pressure-volume-temperature (PVT) properties of crude oil is presented in [7]; the model is based on the fusion of a type-2 fuzzy logic system (type-2 FLS) and the sensitivity-based linear learning method (SBLLM). The authors categorically recognized the presence of uncertainty in well log datasets and the limited ability of SBLLM to generalize when there is uncertainty in the data. Since type-2 FLS is known for modeling uncertainty, it is used to improve the prediction ability of SBLLM in the presence of uncertain data.
A genetic-neuro-fuzzy inference system is proposed in [8] to estimate pressure-volume-temperature (PVT) properties of crude oil systems. That study cited ANN correlations as limited and less accurate in terms of global accuracy, which led the authors to propose the hybrid genetic-neuro-fuzzy system. Although they could not ascertain why ANN correlations are less accurate in terms of global accuracy, their emphasis was on fuzzy clustering optimization criteria and ranking as the motivation for their method. From this we can infer that data uncertainty could be the reason why ANN gives suboptimal results when used alone, just as asserted in both [7, 10]. Again, two hybrid intelligence systems for predicting petroleum reservoir properties were proposed in [9]: functional networks-support vector machines (FN-SVM) and functional networks-type-2 fuzzy logic (FN-T2FL), which improve the performance of standalone SVM and T2FL respectively. In both hybrid systems the functional network component uses a least squares fitting algorithm to extract relevant features from the input data, and this was the core reason for the improvement of these models over the individual standalone models.

B. Overview of feature selection methods

More often than not, real-life classification or prediction applications involve collecting a large number of attributes/features for reasons other than mining the data, so the data contain replicated or irrelevant features [14].

Feature selection algorithms fall into three broad categories [4, 11, 12, 13]. The first category is filter models, which utilize the statistical and probabilistic distributions of the dataset attributes in order to select a feature subset from the input dataset. They therefore select the feature subset independently of any particular learning machine, and this independent selection enables subsequent prediction by any learning machine. Feature selection based on mutual information (MI) is an example of an (unsupervised) filter model. The second category is wrapper models: optimization search algorithms that are used together with a particular learning machine to find the best feature subset based on the prediction accuracy of that learning machine. The genetic algorithm (GA) and particle swarm optimization (PSO) are examples of wrapper models. This dependence on a particular learning machine to search for the best feature subset gives wrappers better prediction accuracy at a relatively high computational overhead.

For example, feature selection methods for data mining and machine learning applications were proposed in [10, 15, 16], based on the mutual information (MI) criterion, a combination of ranking and expectation-maximization (EM), and the Hilbert-Schmidt independence criterion (HSIC), respectively. An MI criterion based on the probability density function (pdf) of each data value was proposed in [10], and experimental results on 8 datasets from the UCI machine learning repository demonstrated the effectiveness of the algorithm: MI was first evaluated between each feature of the training set and the output vector, and the resulting MI scores were then used to rank the features. Two different aging databases were used for the experiments in [15]: FG-NET, containing 1002 face images of 82 persons with ages ranging from 0 to 69, and the Yamaha face database, containing 8000 images of 800 males and 800 females with ages ranging from 0 to 93. The ranking model was built on the kernel trick and a bilinear regression strategy, and the parameter learning technique was based on EM. In [16], datasets taken from the UCI repository, the StatLib repository and the LibSVM website were used for testing and comparison. The Hilbert-Schmidt independence criterion, which is based on the covariance between variables mapped into reproducing kernel Hilbert spaces, was employed for feature selection together with a greedy backward elimination algorithm. Experimental results showed that HSIC performed comparably to other state-of-the-art feature selectors such as SVM Recursive Feature Elimination (RFE), RELIEF, L0-norm SVM (L0) and R2W2.
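To make the MI-based filter ranking described for [10] concrete, here is a minimal sketch (our illustration, not the authors' implementation). It assumes scikit-learn's mutual_info_classif as the MI estimator, which uses a nearest-neighbour estimate rather than the pdf-based formulation developed in Section III; the helper name and the synthetic data are hypothetical.

```python
# Illustrative MI-ranking filter: score each feature by MI(X_j; Y),
# then keep the highest-scoring subset.
import numpy as np
from sklearn.feature_selection import mutual_info_classif

def rank_features_by_mi(X, y, n_keep):
    """Return indices of the n_keep features with the highest MI scores."""
    scores = mutual_info_classif(X, y, random_state=0)  # one score per column
    order = np.argsort(scores)[::-1]                    # sort by descending MI
    return order[:n_keep], scores

# Synthetic demo: feature 0 carries class information, the rest are noise.
rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=200)
X = rng.normal(size=(200, 5))
X[:, 0] += 2.0 * y
selected, scores = rank_features_by_mi(X, y, n_keep=2)
print(selected, np.round(scores, 3))  # feature 0 should rank first
```

In such a scheme the crux is how many features to keep; the contribution of this paper (Section IV) is a statistically grounded choice of that threshold rather than a fixed cut-off.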
III. MUTUAL INFORMATION (MI) HYPOTHESIS AND FORMULATION OF ITS ESTIMATION FROM THE DATASET

The mutual information (MI) of two random variables is a quantitative measurement of the amount of dependence (information) between the two random variables. Unlike the correlation coefficient, which can only capture linear dependence, MI is able to detect both linear and non-linear relationships between variables, a property that has made it a popular choice for feature selection [10, 17, 18, 19, 20].

Formally, the MI of a pair of random variables $X$ and $Y$ is defined by the probability density function (pdf) of $X$, of $Y$ and of the joint variable $(X, Y)$. If we denote the pdfs of $X$, $Y$ and $(X, Y)$ as $f_X$, $f_Y$ and $f_{X,Y}$ respectively, then

$$MI(X;Y) = \iint f_{X,Y}(x,y)\,\log\frac{f_{X,Y}(x,y)}{f_X(x)\,f_Y(y)}\;dx\,dy \qquad (1)$$
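As a quick sanity check on Eq. (1), the following sketch (our illustration; the histogram is our choice of crude pdf estimate) evaluates the double integral as a discrete sum over a 2-D histogram. Dependent samples should give a clearly positive value, independent ones a value near zero.

```python
# Plug-in estimate of Eq. (1): replace the pdfs with normalized 2-D
# histogram counts and the double integral with a sum over the bins.
import numpy as np

def mi_histogram(x, y, bins=16):
    """Rough MI estimate (in nats) between two continuous samples."""
    pxy, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = pxy / pxy.sum()                    # joint probabilities f_XY
    px = pxy.sum(axis=1, keepdims=True)      # marginal f_X
    py = pxy.sum(axis=0, keepdims=True)      # marginal f_Y
    nz = pxy > 0                             # skip empty bins (log 0)
    return float(np.sum(pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])))

rng = np.random.default_rng(1)
x = rng.normal(size=5000)
print(mi_histogram(x, x + 0.5 * rng.normal(size=5000)))  # dependent: positive
print(mi_histogram(x, rng.normal(size=5000)))            # independent: ~0
```

A histogram plug-in is the simplest such estimator but is sensitive to the bin count; the paper instead works with explicit pdf estimates, as formulated below.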
$h(Y|X)$ is the uncertainty about $Y$ when $X$ is known. Again, if $X$ and $Y$ are independent, then $h(Y|X) = h(Y)$ and $MI(X;Y) = 0$.

With these definitions, let us now assume that we are given a dataset $X$ containing $N$ samples and $d$ attributes/features, and that our goal is to predict the class of these samples based on previously observed input/output pairs; that is, to estimate the MI, $MI(X;Y)$, between the continuous variables $X$ and the discrete variable $Y$ ($Y$ being the class of the samples under consideration). More precisely, our objective here is to evaluate $MI(X_j;Y)$ for the attributes $j = 1, 2, \ldots, d$. Assume that $Y$ takes $k$ different discrete values $y_1, \ldots, y_k$, and that each $y_i$ is represented by $n_i$ samples (where $\sum_i n_i = N$). Thus the probability that $Y = y_i$ is estimated by $\hat{p}(y_i) = n_i/N$, and the entropy of $Y$ is

$$\hat{h}(Y) = -\sum_{i=1}^{k} \hat{p}(y_i)\,\log \hat{p}(y_i)$$
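As a quick numerical illustration (our example, not taken from the paper): with $N = 100$ samples split into $k = 2$ classes of $n_1 = 30$ and $n_2 = 70$ samples, $\hat{p}(y_1) = 0.3$ and $\hat{p}(y_2) = 0.7$, so $\hat{h}(Y) = -0.3\ln 0.3 - 0.7\ln 0.7 \approx 0.61$ nats.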
By Bayes' rule, the estimated conditional probability of class $y_i$ given $x$ can be rewritten as:

$$\hat{f}_{Y_i|X_j}(y_i \mid x) = \frac{\hat{f}_{X_j|Y_i}(x \mid y_i)\;\hat{p}(y_i)}{\hat{f}_{X_j}(x)} \qquad (10)$$

Then equation 6 becomes:

$$\hat{h}(Y \mid X_j) = -\int_x \hat{f}_{X_j}(x) \sum_i \frac{\hat{f}_{X_j|Y_i}(x \mid y_i)\;\hat{p}(y_i)}{\hat{f}_{X_j}(x)}\,\log\frac{\hat{f}_{X_j|Y_i}(x \mid y_i)\;\hat{p}(y_i)}{\hat{f}_{X_j}(x)}\;dx \qquad (11)$$

This explains how the MI can be entirely determined by the pdfs of the variables $X$, possibly limited to the points with a particular class label $y_i$.
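Putting Eqs. (10) and (11) together, $MI(X_j;Y)$ can be computed as $\hat{h}(Y) - \hat{h}(Y|X_j)$ once the class-conditional pdfs have been estimated. The sketch below is one possible realization of this scheme (our choices, not prescribed by the paper): Gaussian kernel density estimates stand in for $\hat{f}_{X_j|Y_i}$, and the integral in Eq. (11) is approximated on a grid.

```python
# Sketch of Eqs. (10)-(11): MI(X_j; Y) = h(Y) - h(Y|X_j) for one continuous
# feature x and discrete labels y. KDE and grid integration are assumptions.
import numpy as np
from scipy.stats import gaussian_kde
from scipy.integrate import trapezoid

def mi_feature_label(x, y, grid_size=512):
    classes, counts = np.unique(y, return_counts=True)
    priors = counts / counts.sum()                   # p_hat(y_i) = n_i / N
    h_y = -np.sum(priors * np.log(priors))           # entropy of Y

    grid = np.linspace(x.min(), x.max(), grid_size)
    # class-conditional pdf estimates f_hat(x | y_i), one KDE per class
    cond = np.array([gaussian_kde(x[y == c])(grid) for c in classes])
    fx = priors @ cond                               # marginal f_hat(x) as a mixture
    post = cond * priors[:, None] / np.maximum(fx, 1e-300)   # Bayes' rule, Eq. (10)
    # conditional entropy h(Y|X_j), Eq. (11), integrated on the grid
    integrand = -np.sum(post * np.log(np.maximum(post, 1e-300)), axis=0)
    return h_y - trapezoid(fx * integrand, grid)

rng = np.random.default_rng(2)
y = rng.integers(0, 2, size=400)
x = rng.normal(loc=1.5 * y.astype(float))            # feature shifted by class
print(mi_feature_label(x, y))                        # clearly positive MI
```

In a greedy feedforward procedure such as the one proposed in Section IV, this score would be evaluated for each candidate attribute $X_j$ in turn.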