
Petroleum Science 20 (2023) 295–308

Contents lists available at ScienceDirect

Petroleum Science
journal homepage: www.keaipublishing.com/en/journals/petroleum-science

Original Paper

A systematic machine learning method for reservoir identification and production prediction

Wei Liu a, *, Zhangxin Chen a, Yuan Hu b, Liuyang Xu c

a Department of Chemical and Petroleum Engineering, University of Calgary, AB, Canada
b Rockeast Energy Ltd., Calgary, AB, Canada
c Jilin Oilfield, CNPC, Changchun, Jilin, 130000, China

Article info

Article history:
Received 15 February 2022
Received in revised form 1 September 2022
Accepted 5 September 2022
Available online 9 September 2022

Edited by Yan-Hua Sun

Keywords:
Reservoir identification
Production prediction
Machine learning
Ensemble method

Abstract

Reservoir identification and production prediction are two of the most important tasks in petroleum exploration and development. Machine learning (ML) methods are used for petroleum-related studies, but have not been applied to reservoir identification combined with production prediction based on the identification results. Production forecasting studies are typically based on overall reservoir thickness and lack accuracy when reservoirs contain a water or dry layer without oil production. In this paper, a systematic ML method was developed using classification models for reservoir identification and regression models for production prediction; the production models are based on the reservoir identification results. To realize the reservoir identification, seven optimized ML methods were used: four typical single ML methods and three ensemble ML methods. These methods classify the reservoir into five types of layers: water, dry and three levels of oil (I oil layer, II oil layer, III oil layer). The validation and test results of these seven optimized ML methods suggest that the three ensemble methods perform better than the four single ML methods in reservoir identification. XGBoost produced the model with the highest accuracy, up to 99%. The effective thickness of the I and II oil layers determined during reservoir identification was fed into the models for predicting production. Effective thickness accounts for the distribution of the water and the oil, resulting in a more reasonable production prediction compared to predictions based on the overall reservoir thickness. To validate the superiority of the ML methods, reference models using overall reservoir thickness were built for comparison. The models based on effective thickness outperformed the reference models in every evaluation metric, and their prediction accuracy was 10% higher than that of the reference model. Without the personal error or data distortion existing in traditional methods, this novel system realizes rapid analysis of data while reducing the time required to resolve reservoir classification and production prediction challenges. The ML models using the effective thickness obtained from reservoir identification were more accurate in predicting oil production than previous studies that use overall reservoir thickness.

© 2022 The Authors. Publishing services by Elsevier B.V. on behalf of KeAi Communications Co. Ltd. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).

* Corresponding author. E-mail address: wei.liu2@ucalgary.ca (W. Liu).
https://doi.org/10.1016/j.petsci.2022.09.002

Nomenclature

AC      Acoustic log
AI      Artificial intelligence
ANN     Artificial neural network
CART    Classification and regression tree
DL      Dry layer
DT      Decision tree
FN      False negative
FP      False positive
GR      Gamma ray
GBDT    Gradient boosting decision trees
KNN     k-nearest neighbors
LLD     Deep laterolog
LR      Logistic regression
MAE     Mean absolute error
ML      Machine learning
PER     Permeability
POR     Porosity
R2      Correlation coefficient
RF      Random forest
RMSE    Root mean squared error
SP      Spontaneous potential
Sw      Water saturation
TN      True negative
TP      True positive
WL      Water layer
XGB     XGBoost
IO      I oil layer
IIO     II oil layer
IIIO    III oil layer
Day     The number of production days
F_k(x)  Predicted value in GBDT
f_k     Tree structure
H_h     Hidden layers
h_θ(x)  Prediction function of LR
k_avg   The mean of permeability values
L_do    A dynamic oil level
L_p     Distance between samples
l       Loss function
n       Number of classifications
R^n     Feature space
S_rw    The residual water saturation
T       Thickness of the predicted IO and IIO
T_r     Real thickness of IO and IIO
T_o     Overall reservoir thickness
T'      Number of leaf nodes
V_k     A permeability variation coefficient
W_r     The water content ratio in the first month
w       Score on a leaf node
w_hj    Weight matrix between a hidden layer and the output layer
w_ih    Weight matrix between the input layer and a hidden layer
x       Original sample parameter
x'      New parameter
X_min   Minimum value of a sample
X_max   Maximum value of a sample
Y_j     Output layer
ŷ_i     Prediction
y_i     Observation
y_k     True value in GBDT
y_i^obs  Observed data
y_i^pred Predicted data
φ_avg   Mean porosity value for each well
θ_h     Threshold matrix associated with a hidden layer
θ_j     Threshold matrix associated with the output layer
Ω       Regularization term
γ       Parameter to control the regularization
λ       Parameter to control the regularization

1. Introduction

Big data of reservoir and oil production is exponentially expanding. The traditional methods employed to identify reservoirs and predict their production cannot efficiently use historical information and new data. Geologists conduct reservoir identification based on large amounts of geophysical data, and numerical simulators cannot take full advantage of all the reservoir information due to model scaling (Rodríguez et al., 2014; Siddiqi and Andrew, 2002). Traditional reservoir simulators require a fixed set of parameters, some of which are very difficult to obtain (e.g., skin factor, compression index and capillary force), while other useful parameters cannot be incorporated into these simulators (e.g., dynamic oil level). The combination of inaccurate data and missing information results in miscalculations being used for prediction. In the era of big data, it is increasingly necessary to develop an effective and reliable technique that will maximize the benefits of the data explosion while making full use of massive reservoir data, which will help resolve reservoir identification and production prediction validity challenges.

Machine learning (ML), as a subdivision of artificial intelligence

(AI), has been applied in various fields with a positive impact for many years. Practical applications of ML techniques have been widely investigated in petroleum engineering, including reservoir characterization (Anifowose et al., 2017; Chaki et al., 2018), prediction of reservoir properties (Helmy et al., 2013; Anifowose et al., 2015; Priezzhev and Stanisalav, 2018) and production prediction (Chakra et al., 2013; You et al., 2019). Some studies have applied ML techniques to petroleum geology (Raeesi et al., 2012; Merembayev et al., 2018), where several input parameters are selected related to the geological characteristics of a reservoir and its operating conditions. To predict lithofacies or reservoir properties (e.g., porosity and permeability), related well log data have been used to train a predictive model. After learning the underlying relationship between input variables and an output target, this data-driven model is finally applied to forecast specific lithofacies or reservoir properties.

In previous research, several ML techniques have been introduced to solve classification and regression problems in petroleum engineering and geology. Among them, the artificial neural network (ANN) and random forest (RF) are commonly used due to their remarkable performance. The ANN technique has been applied to lithology identification and recognition with well log data (Ren et al., 2019; Kamenski et al., 2020), using the back propagation neural network (BPNN) to find patterns that identify the lithology. This technique has also been used to predict the oil production of existing wells (Awoleke and Lane, 2011; Van and Chon, 2018). To forecast oil production, well log data related to reservoir geological characteristics and dynamic operation data have been used to build a representative prediction model. The RF technique has been applied to geological and geochemical data for lithology identification by previous researchers; it outperforms other ML algorithms in this area (Harris and Grunsky, 2015; Cracknell and Reading, 2012). In addition to its classification performance, this technique provides reliable predictions for geological mapping applications (Radford et al., 2018).

Reservoir identification is the fundamental work necessary for production forecasting. Most previous ML studies about reservoirs focused on lithofacies classification or reservoir properties; very few mention reservoir identification, let alone the combination of reservoir identification and production prediction. Historically, the prediction of oil production leaned heavily on the overall thickness of a reservoir as a key data input (Guo et al., 2021). This technique oversimplifies the reservoir, which contains not only good oil layers but also other layers (water, dry and poor oil) that do not or cannot produce oil. There is no proven direct relationship between the overall thickness of a reservoir and the volume of oil production. For example, other conditions being equal, a 20-m-thick reservoir A is theoretically predicted to produce more oil than a 10-m-thick reservoir B. In fact, if reservoir A is mainly composed of water and dry layers and reservoir B is formed by almost all good oil layers, the production of reservoir B will be better than that of reservoir A.
The heavily used method of predicting reservoir production based on the overall thickness of a reservoir is therefore not accurate. To solve this problem, before predicting production in this study, the reservoir was first classified by the five types of layers found within a reservoir: water, dry and three different levels of oil layers. Among these, the I oil layer (IO) and II oil layer (IIO) were defined as effective reservoirs producing industrial oil flows that are perforated during production. In the subsequent production prediction, the thickness of IO and IIO (the effective thickness) was used as an important variable instead of the overall reservoir thickness commonly used by predecessors.

In this study, several ML methods were used and compared to identify effective reservoirs in oilfields. To further predict their production, this study implemented the prediction of cumulative production for new wells and existing wells. This is different from most previous studies on oil production, which focused on subsequent production and production decline rates of existing wells. The prediction of effective reservoirs and other production variables were used to train a predictive model for production. In this way, an integrated ML system was developed that covers the whole industrial process from reservoir identification to prediction of oil production, increasing the accuracy and efficiency of production prediction by making full use of the results obtained from the reservoir identification process. Moreover, two reference models were built to compare with and prove that the prediction results from the reservoir identification process were reliable enough to be used in the production models. In reference model I (RM I), real data for effective thickness (thickness of IO and IIO) were fed into the ML production models to compare with the production models based on the reservoir identification models. Reference model II (RM II) used the overall reservoir thickness to compare its prediction result with the production models using effective thickness.

A ML model was used to predict and classify potential reservoirs into several known reservoir types according to the selected input features. Then, the predictive results from the reservoir identification were fed into an ML model to predict oil production. The full use of reservoir information increased the prediction accuracy of the production models. This systematic ML method for reservoir identification and prediction of oil production reduces the required human resources and thus the volume of human errors.

2. Methodology

2.1. The systematic ML method

Fig. 1 is the flowchart illustrating the procedures of the study. Firstly, in the reservoir classification process, seven ML models (logistic regression (LR), k-nearest neighbors (KNN), decision tree (DT), ANN, RF, gradient boosting decision trees (GBDT) and XGBoost (XGB)) were compared to determine the best method for reservoir classification. Then the predicted thickness of the effective reservoir combined with other production features was fed into the production prediction process. In this process, after comparing all the candidates, two ML models (ANN and XGBoost) were selected to predict the production. After training and testing, the prediction results were compared to the two reference models.

2.2. Pre-processing

In the ML workflow, the various parameters (inputs) fed into the models had different dimensional units that affected the data analysis results. To eliminate the interference between dimensional units, data preprocessing using normalization (Eq. (1)) was required to rescale the different parameters (Raschka, 2015):

x' = (x − X_min) / (X_max − X_min)   (1)

where x is an original sample parameter; x' is the new parameter; X_min is the minimum value of the sample; and X_max is the maximum value of the sample.
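As a minimal sketch (not the authors' code), Eq. (1) corresponds to column-wise min-max scaling, for example with scikit-learn's MinMaxScaler; the feature values below are illustrative only:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Hypothetical well-log samples: columns are POR (%), PER (mD), Sw (%)
X = np.array([
    [10.3, 14.7, 76.2],
    [0.1, 0.01, 0.1],
    [23.3, 1976.9, 291.6],
])

# MinMaxScaler implements Eq. (1): x' = (x - X_min) / (X_max - X_min)
scaler = MinMaxScaler()
X_scaled = scaler.fit_transform(X)  # every column now lies in [0, 1]
print(X_scaled)
```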
2.3. Approach

The objective of this study was to provide an advanced alternative approach that accurately identifies a reservoir and reliably predicts oil production. These two prediction tasks are supervised learning problems, as the samples have input features and corresponding outputs. According to the type of predicted result, reservoir identification is a supervised classification problem, since its output value, the reservoir type, is discrete. Production prediction is a supervised regression problem, because its output, production, is a continuous value. For the classification problem (reservoir identification), seven classifiers were selected: LR, KNN, DT, ANN, RF, GBDT, and XGB. For the regression problem (production prediction), ANN and XGB were selected to show their predictive results due to their better performance compared to the other ML methods. These classifiers are briefly reviewed below.

2.3.1. Logistic regression

LR (Cox, 1958) is a regression analysis method in which a dependent variable is categorical; it is used for binary classification and can be generalized to multiclass problems. As shown in Eq. (2), LR uses a nonlinear sigmoid function for classification prediction:

g(z) = 1 / (1 + e^(−z))   (2)

Assume that the eigenvector influencing the prediction result is x = (1, x_1, x_2, …, x_n) and the regression coefficient is θ = (θ_0, θ_1, θ_2, …, θ_n); then

θ_0 + θ_1 x_1 + θ_2 x_2 + … + θ_n x_n = Σ_{i=0}^{n} θ_i x_i = θᵀx   (3)

We construct the prediction function as:

h_θ(x) = g(θᵀx) = 1 / (1 + e^(−θᵀx))   (4)

If θ is known, h_θ(x) can be calculated for an eigenvector x. If the result is greater than 0.5, the sample is classified as 1; otherwise, it is classified as 0.
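Eqs. (2)–(4) in a short illustrative sketch (coefficient values are hypothetical):

```python
import numpy as np

def sigmoid(z):
    # Eq. (2): g(z) = 1 / (1 + exp(-z))
    return 1.0 / (1.0 + np.exp(-z))

def predict(theta, x):
    # Eqs. (3)-(4): h_theta(x) = g(theta^T x); x_0 = 1 absorbs the intercept
    return sigmoid(theta @ x)

theta = np.array([-1.0, 0.8, 0.5])  # hypothetical coefficients (theta_0 is the intercept)
x = np.array([1.0, 2.0, -0.5])      # eigenvector with leading 1
label = 1 if predict(theta, x) > 0.5 else 0
print(predict(theta, x), label)
```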
2.3.2. k-nearest neighbors

The classification rule of KNN is that the label of an unclassified sample point is determined by the label of the majority class among its nearest k neighboring points (Liu et al., 2022). In this paper, the Minkowski distance L_p was used as the metric for measuring the distance between two sample points. Suppose that the feature space is R^n, with x_i, x_j ∈ R^n, x_i = (x_i^1, x_i^2, …, x_i^n)ᵀ and x_j = (x_j^1, x_j^2, …, x_j^n)ᵀ. The distance between x_i and x_j in L_p is defined as follows:

L_p(x_i, x_j) = ( Σ_{l=1}^{n} |x_i^l − x_j^l|^p )^(1/p)   (5)

where p is a parameter that determines the distance type; x_j is a training sample point; and x_i is a point whose output class is to be predicted. Therefore, p and k are two significant hyper-parameters when building a KNN model.
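Eq. (5) as a small sketch under the definitions above (not the paper's code):

```python
import numpy as np

def minkowski(xi, xj, p):
    # Eq. (5): L_p(x_i, x_j) = (sum_l |x_i^l - x_j^l|^p)^(1/p)
    return np.sum(np.abs(xi - xj) ** p) ** (1.0 / p)

a, b = np.array([0.2, 0.7, 0.1]), np.array([0.5, 0.4, 0.9])
print(minkowski(a, b, p=2))  # p = 2 recovers the Euclidean distance
```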
Fig. 1. Flowchart of the research progress.

2.3.3. Decision tree

The classification and regression tree (CART) is the most popular algorithm used to build a DT. CART (Breiman et al., 1984) uses a Gini index to choose a feature for splitting a tree. The Gini index reflects the probability that two samples randomly selected from a data set have different labels (classes). A lower Gini index indicates greater purity of the data set.

If the sample set D is split into D1 and D2 using the discrete feature A, the Gini index calculated after splitting is defined as:

Gain_Gini(D, A) = (|D1| / |D|) · Gini(D1) + (|D2| / |D|) · Gini(D2)   (6)

Therefore, Gain_Gini(D, A) is the uncertainty after splitting. A smaller Gain_Gini(D, A) value is preferred because it provides greater purity for a data set.
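A toy sketch of Eq. (6) with hypothetical layer labels:

```python
import numpy as np

def gini(labels):
    # Gini index of one subset: 1 - sum_k p_k^2
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def gain_gini(d1, d2):
    # Eq. (6): size-weighted Gini index after splitting D into D1 and D2
    n = len(d1) + len(d2)
    return len(d1) / n * gini(d1) + len(d2) / n * gini(d2)

# Hypothetical layer labels on the two sides of a candidate split
d1 = ["IO", "IO", "IIO", "IO"]
d2 = ["WL", "DL", "WL"]
print(gain_gini(d1, d2))  # smaller values indicate a purer, preferable split
```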

2.3.4. Artificial neural network

ANN is a computing system that imitates the working of the human brain in learning patterns from experience and processing data to solve classification and regression problems. An ANN is comprised of an input layer, hidden layers H_h and an output layer Y_j. The hidden layers need to be set manually according to the actual situation so that a model can achieve the best prediction outcome.

For the classification problem in this study, the activation function used in a hidden layer is a sigmoid function (Eq. (7)), while the activation function used in the output layer is a softmax function (Eq. (8)) to solve a multi-class problem:

H_h = f( Σ_i w_ih · X_i − θ_h ) = 1 / (1 + e^(θ_h − Σ_i w_ih · X_i))   (7)

Y_j = f( Σ_h w_hj · H_h − θ_j ) = e^(Σ_h w_hj · H_h − θ_j) / Σ_{i=1}^{n} e^(Σ_h w_hi · H_h − θ_i)   (8)

where w_ih is the weight matrix of the node connections between the input layer and a hidden layer; w_hj is the weight matrix of the node connections between a hidden layer and the output layer; θ_h is the threshold matrix associated with a hidden layer; θ_j is the threshold matrix associated with the output layer; and n is the number of classifications.

In Eq. (9), a ReLU activation function was applied in this study to solve a regression problem:

ReLU(x) = x if x > 0; 0 if x ≤ 0   (9)

2.3.5. Random forest

RF is an ensemble learning method. It establishes an advanced model based on a bagging technique (Breiman, 1996) and a random feature selection technique (Ho, 1998). Bagging is a method of random sampling with replacement that helps generate several new single trees to reduce variance. For example, to solve a reservoir identification problem with RF, each tree provides a prediction of the possible reservoir type, and the final decision depends on majority voting among the single trees. The random feature selection technique is useful to make all the trees uncorrelated and further reduce the variance of the prediction.
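For reference, the three activations in Eqs. (7)–(9) as a minimal sketch:

```python
import numpy as np

def sigmoid(z):
    # Eq. (7) inner form: 1 / (1 + exp(-z))
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    # Eq. (8): exponentials normalized over the n output classes
    e = np.exp(z - z.max())  # shift for numerical stability
    return e / e.sum()

def relu(z):
    # Eq. (9): max(x, 0), used for the regression model
    return np.maximum(z, 0.0)

z = np.array([1.5, -0.3, 0.8])
print(sigmoid(z), softmax(z), relu(z))
```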
2.3.6. Gradient boosting decision trees

Different from RF, GBDT is an ensemble ML tool using a boosting technique, which sequentially generates base models and improves the predictive power of the ensemble through incremental minimization of the residual errors in each iteration of construction of a new base model (Brown and Mues, 2012). While building a classification model in GBDT, samples that are misclassified by a previous base model are more likely to be assigned an increased weight in the next step, so the new model has improved prediction accuracy compared to the previous model. A loss function is used to measure the difference between the predicted F_k(x) and true values y_k to indicate how well a model fits the data.

In a GBDT algorithm for a multi-class problem, the loss function is (Friedman, 2001):

L({y_k, F_k(x)}_1^K) = − Σ_{k=1}^{K} y_k log p_k(x)   (10)

where y_k = 1(class = k) ∈ {0, 1}, p_k(x) = P(y_k = 1 | x), and

p_k(x) = exp(F_k(x)) / Σ_{l=1}^{K} exp(F_l(x))   (11)

2.3.7. XGBoost

XGB is one of the most popular methods in the ensemble machine learning category today; it performs very well in programming competitions such as Kaggle (Chen and Guestrin, 2016). XGB is a machine learning technique for classification and regression problems. It produces a prediction model in the form of an ensemble of weak prediction models, typically decision trees. Based on the concept of GBDT, XGB uses a regularized model formalization to control over-fitting, which leads to better performance.

Distinct from GBDT, the objective function of XGB consists of a loss function and a regularization term (Chen and Guestrin, 2016):

L = Σ_i l(ŷ_i, y_i) + Σ_k Ω(f_k)   (12)

Ω(f) = γT' + (1/2) λ ‖w‖²   (13)

where l is a loss function, as in GBDT, that measures the difference between a prediction ŷ_i and an observation y_i; the regularization term Ω is added to the objective function to control over-fitting and contribute to better performance and flexible complexity; f_k represents a specific tree structure; T' and w denote the number of leaf nodes and the score on each leaf node, respectively; and γ and λ are parameters to control the regularization.
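As a hedged sketch of how γ and λ in Eq. (13) surface in practice, the xgboost package exposes them as the gamma and reg_lambda parameters (a generic example with stand-in data, not the paper's configuration; the tree settings echo Table 3):

```python
import numpy as np
import xgboost as xgb

# Hypothetical toy data: 100 layer samples, 8 features, 5 reservoir classes
rng = np.random.default_rng(0)
X, y = rng.normal(size=(100, 8)), rng.integers(0, 5, size=100)

clf = xgb.XGBClassifier(
    n_estimators=500,
    max_depth=10,
    learning_rate=0.1,
    gamma=0.0,        # gamma in Eq. (13): penalty per leaf node T'
    reg_lambda=1.0,   # lambda in Eq. (13): L2 penalty on leaf scores w
    objective="multi:softprob",
)
clf.fit(X, y)
print(clf.predict(X[:5]))
```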

2.4. Evaluation metrics

Evaluation metrics are essential to measure the quality of a ML model. It is worth noting that different types of evaluation metrics are applicable to different tasks and emphasize different aspects of a model's performance (Andika and Chandima Ratnayake, 2019). For example, accuracy is often selected for its easy-to-use scoring and flexibility for multiclass problems (Hossin and Sulaiman, 2015). The root mean squared error (RMSE), mean absolute error (MAE), and correlation coefficient (R2) measures are popular for their effectiveness and common use in solving regression problems in many petroleum applications (Abdulraheem et al., 2007; Chakra et al., 2013). In this study, to deal with a reservoir identification (classification) problem and a production prediction (regression) problem separately, two sets of metrics were selected, introduced below.

2.4.1. Classification problems

Before discussing the metrics for a classification problem, it is necessary to introduce four elements of a binary problem: true positive (TP), false negative (FN), false positive (FP) and true negative (TN). 'True' and 'false', respectively, mean whether a prediction is correct or not compared to real data. 'Positive' and 'negative' represent whether a predicted class is the same as a specific class or not. To assess the predictive performance of the above classifiers and select the corresponding optimal models, the accuracy, precision, recall and f1-score are evaluated. Accuracy is the ratio of correctly predicted samples to the total samples. Precision is defined as the ratio of correct positive predictions to all positive predictions. Recall is the ratio of actual positive results correctly predicted. The f1-score is the harmonic mean of precision and recall; the higher the f1-score, the better the precision and recall.

accuracy = (TP + TN) / (TP + TN + FP + FN)   (14)

precision = TP / (TP + FP)   (15)

recall = TP / (TP + FN)   (16)

f1 = 2 / (1/precision + 1/recall) = 2 · precision · recall / (precision + recall)   (17)
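Eqs. (14)–(17) from a binary confusion count, sketched with hypothetical counts:

```python
def classification_metrics(tp, fp, fn, tn):
    # Eqs. (14)-(17)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1

# Hypothetical counts for one reservoir class treated as "positive"
print(classification_metrics(tp=66, fp=5, fn=6, tn=223))
```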

2.4.2. Regression problems

In this study, the following measurements were applied to substantiate the statistical accuracy of the performance of ANN and XGB for production prediction: RMSE, MAE, and R2. The RMSE is a measure of the spread of the actual values around the predicted values. It computes the average of the squared differences between each predicted value and its corresponding actual value. It is expressed as:

RMSE = √[ (1/n) Σ_{i=1}^{n} (y_i^obs − y_i^pred)² ]   (18)

where y_i^obs is the observed data; y_i^pred is the predicted data; and n is the number of data points. The MAE is a statistical measure of dispersion. It is computed by taking the average of the absolute errors of the predicted values relative to the actual values. It is given by:

MAE = (1/n) Σ_{i=1}^{n} |y_i^obs − y_i^pred|   (19)

R2 assesses the quality of a model prediction by observing the difference between predicted data and actual data (Nash and Sutcliffe, 1970). It is expressed as:

R² = 1 − [ Σ_{i=1}^{n} (y_i^obs − y_i^pred)² ] / [ Σ_{i=1}^{n} (y_i^obs − ȳ^obs)² ]   (20)

where ȳ^obs is the average value of the observed data.
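As a sketch, scikit-learn provides equivalents of Eqs. (18)–(20); the sample values below are taken from Appendix A for illustration:

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

y_obs = np.array([777.0, 2173.0, 2046.0, 150.0])    # observed production, ton
y_pred = np.array([818.0, 2227.4, 1579.3, 200.0])   # model predictions, ton

rmse = np.sqrt(mean_squared_error(y_obs, y_pred))  # Eq. (18)
mae = mean_absolute_error(y_obs, y_pred)           # Eq. (19)
r2 = r2_score(y_obs, y_pred)                       # Eq. (20)
print(rmse, mae, r2)
```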
3. Case studies

3.1. Reservoir identification

The data was acquired from a public domain of China National Petroleum Corp. (CNPC). It comprises logging data and layer thickness (eight features) from 124 wells; a total of 2800 samples (single layers) made up the dataset. Seven ML models (LR, KNN, DT, ANN, RF, GBDT and XGB) were constructed, involving one output variable, the reservoir classification of each single layer, and a total of eight input features (see Table 1 below): LLD, GR, AC, SP, POR, PER, Sw and the thickness of each layer. Table 1 shows details and statistical descriptions of the input features used for reservoir identification. The output reservoir classification includes five classes: 1) dry layer (DL), with no oil, gas or formation water; 2) water layer (WL), containing only formation water; and 3) three levels of oil layer: IO, IIO and the III oil layer (IIIO). Among these three types of oil layers, IO and IIO were defined as effective reservoirs for their high and medium industrial value, while IIIO is defined as a worthless reservoir due to its poor industrial value, as assessed by CNPC.

After data pre-processing, the samples were randomly split into a training set of 2500 samples and a testing set of 300 samples. The training set is used to develop a model that classifies a single layer, and the testing set is applied to the trained model to estimate how well the model has been trained. A hyper-parameter tuning process and 10-fold validation were used in the seven classifiers to choose the best combination of hyper-parameter values for each model. In this study, Grid Search is used to find the optimal hyper-parameters of a model that result in the most accurate predictions. Grid Search is a function in Scikit-learn's model_selection package. Firstly, the values of the hyper-parameters are passed to the Grid Search function by defining a dictionary containing each hyper-parameter along with the values it can take. Grid Search then tries all the combinations of the values passed in the dictionary and evaluates the model for each combination using the 10-fold validation method. After using this function, the optimal combination of hyper-parameters with the highest prediction accuracy can be selected.
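The tuning loop just described, sketched with scikit-learn's GridSearchCV (the grid and stand-in data here are illustrative, not the study's full grid):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Hypothetical stand-in data: 300 layer samples, 8 log features, 5 classes
rng = np.random.default_rng(1)
X_train, y_train = rng.normal(size=(300, 8)), rng.integers(0, 5, size=300)

param_grid = {
    "n_estimators": [100, 200, 500],  # candidate values per hyper-parameter
    "max_depth": [5, 7, 10],
}
search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid,
    cv=10,                # evaluate each combination with 10-fold validation
    scoring="accuracy",   # accuracy as the tuning metric, as in the study
)
search.fit(X_train, y_train)
print(search.best_params_, search.best_score_)
```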

Table 1. Description of input features in reservoir identification.

Feature   | Nomenclature            | Unit | Min   | Mean  | Max
LLD       | Deep laterolog          | Ωm   | 8.2   | 58.5  | 24,430
GR        | Gamma ray               | API  | 23.2  | 68.1  | 127.6
AC        | Acoustic log            | μs/m | 169.1 | 236.3 | 411.5
SP        | Spontaneous potential   | mV   | −157  | −15.9 | 28.2
POR       | Porosity                | %    | 0.1   | 10.3  | 23.3
PER       | Permeability            | mD   | 0.01  | 14.7  | 1976.9
Sw        | Water saturation        | %    | 0.1   | 76.2  | 291.6
Thickness | Thickness of each layer | m    | 0.7   | 3.2   | 40.8

Table 2. Description of input features in production prediction.

Feature | Unit | Min  | Mean   | Max
T       | m    | 3.4  | 39.9   | 107.5
φavg    | %    | 5.4  | 12.1   | 17.5
kavg    | mD   | 0.7  | 30.2   | 533.4
Vk      | /    | 58.2 | 0.1    | 936.6
Srw     | %    | 30.3 | 48.7   | 66.3
Wr      | %    | 0.5  | 23.9   | 100
Ldo     | m    | 70.1 | 1518.3 | 2243.5
Day     | day  | 6    | 109    | 145

3.2. Prediction of production

In this case study, five different ML methods (DT, ANN, RF, GBDT and XGB) were compared for the prediction of production. After comparison, ANN and XGB demonstrated the best performance amongst the methods. These two methods were employed to forecast the oil recovery performance for a series of producing wells. ANN and XGB models were constructed with one output variable, a single well's first five months of cumulative oil production. Eight input variables were used: 1) T, the total thickness of IO and IIO (the effective thickness) obtained from the results of reservoir identification; 2) φavg, the mean porosity value for each well; 3) kavg, the mean of the permeability values; 4) Vk, a permeability variation coefficient (Eq. (21)); 5) Srw, the residual water saturation; 6) Wr, the water content ratio in the first month; 7) Ldo, a dynamic oil level; and 8) Day, the number of production days. To be as representative as possible, φavg and kavg are weighted averages over T. Table 2 shows the units and statistical descriptions of the input features in the prediction of oil production.

Vk = √[ Σ_{i=1}^{n} (K_i − K̄)² / n ] / K̄   (21)

where K_i is the permeability of a single layer; K̄ is the average permeability of all layers; and n is the number of layers in the well.

All 124 well records were subjected to the pre-processing stage mentioned above. They were all used in the training and testing phases of ANN and XGB. The best constructions of the ANN and XGB models were determined by a grid search. Two thirds of the original data were used as the training data set; the remaining one third was employed as the testing data set. Finally, to verify whether the prediction result in Section 3.1 was reliable enough to be used for production prediction and to show its superiority compared to previous studies, the final prediction results were compared to the two reference models. In RM I, Tr is the thickness of the real IO and IIO. In RM II, To is the overall reservoir thickness.
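Eq. (21) as a short sketch (the permeability values are hypothetical):

```python
import numpy as np

def permeability_variation(K):
    # Eq. (21): Vk = sqrt(mean((Ki - Kbar)^2)) / Kbar
    K = np.asarray(K, dtype=float)
    return np.sqrt(np.mean((K - K.mean()) ** 2)) / K.mean()

print(permeability_variation([12.0, 30.5, 7.8, 55.1]))  # layer permeabilities, mD
```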

4. Results and analysis

4.1. Comparative analysis of classification models for reservoir identification

4.1.1. Results of training and validation

In the hyper-parameter tuning process, accuracy is used as the metric to measure model performance. Table 3 shows the optimal hyper-parameter combination with which each classification model builds its best predictive model.

Table 3. Tuned optimal hyper-parameter values of the seven classification methods.

Classification method | Tuned hyper-parameter | Optimal hyper-parameter setting
LR   | Penalty parameter determining the strength of regularization (penalty) | L2
KNN  | The number of neighbors (k) | 7
     | The number used to calculate distance (p) | 6
DT   | The maximum number for tree depth (max depth) | 10
     | The minimum number of samples required to split an internal node (min samples split) | 5
ANN  | Learning rate | 0.01
     | Maximum number of learning iterations (max iter) | 500
     | Solver for weight optimization (solver) | Adam
RF   | The number of trees in the forest (n estimators) | 200
     | The maximum depth of the individual estimators (max depth) | 7
GBDT | Learning rate | 0.3
     | The number of estimators in a model (n estimators) | 500
     | The maximum depth of individual estimators (max depth) | 15
XGB  | Learning rate | 0.1
     | The number of estimators in a model (n estimators) | 500
     | The maximum depth of individual estimators (max depth) | 10

Accuracy, precision, recall and f1-score were used to evaluate the performance of the seven ML methods using the 10-fold cross-validation. In Fig. 2, a box-and-whisker plot was utilized to assess the statistical dispersion of each classifier's accuracy on all folds, based on their optimal hyper-parameters. The single classifiers showed reasonable performance in terms of average accuracy: ANN (82.61%), KNN (82.58%), DT (80.87%) and LR (73.89%). XGB, RF and GBDT had the top three average accuracies (91.74%, 89.36%, and 89.25%, respectively). This is interpreted as validation that the ensemble classification techniques (RF, GBDT and XGB) generally produce better results than the single classifiers (LR, KNN, DT and ANN). Despite some outlier points, the accuracy values of GBDT and XGB had the least variance and the highest stability, and XGB had the highest average accuracy.

Fig. 2. Box plots for accuracy of seven ML methods from 10-fold cross validation.

Table 4 shows the precision, recall and f1-score of each reservoir class for the seven ML methods. In the identification of the five reservoir classes, DL and IO had the highest prediction accuracy. The IIIO class is likely to be misclassified as another kind of reservoir because there were very few original samples labeled as class IIIO. Table 5 shows the number of samples used for each type of reservoir during the training and testing process; the IIIO class makes up only about 10% of the samples compared to the other types of reservoirs. The lack of samples leads to poor learning performance for every ML method. The overall performance of the three ensemble ML methods was better than the other four single methods for identification of the five reservoir classes. After the ensemble methods, ANN was the best classifier as a single method, but its precision on IIIO was only 0.58 and its prediction performance on the other reservoir types was much weaker than the ensemble classifiers. As the effective reservoirs, the identification of IO and IIO is more important than that of the other reservoir classes. XGB had the best performance, with all metric values ≥ 0.92 in IO class identification and up to 0.86 in IIO class identification, followed by GBDT. The results of Fig. 2 and Table 4 indicate that the ensemble methods achieved better classification results compared to the four single methods. Among the ensemble methods, XGB and GBDT are preferred for their higher accuracy, precision, recall and f1-scores.

Table 5. Number of samples in each type of layer.

             | DL  | WL  | IO  | IIO | IIIO
Training set | 739 | 506 | 628 | 554 | 73
Testing set  | 79  | 74  | 72  | 66  | 9
Table 4. Precision, recall and f1-scores from 10-fold cross validation for the seven classifiers.

Method | Output reservoir | Precision | Recall | f1-score
LR     | DL   | 0.82 | 0.87 | 0.84
       | WL   | 0.76 | 0.57 | 0.65
       | IIIO | 0.40 | 0.21 | 0.28
       | IIO  | 0.69 | 0.72 | 0.70
       | IO   | 0.82 | 0.83 | 0.82
KNN    | DL   | 0.87 | 0.90 | 0.89
       | WL   | 0.78 | 0.68 | 0.73
       | IIIO | 0.45 | 0.60 | 0.51
       | IIO  | 0.77 | 0.78 | 0.77
       | IO   | 0.88 | 0.85 | 0.85
DT     | DL   | 0.85 | 0.81 | 0.83
       | WL   | 0.61 | 0.68 | 0.64
       | IIIO | 0.38 | 0.25 | 0.30
       | IIO  | 0.74 | 0.76 | 0.75
       | IO   | 0.86 | 0.82 | 0.84
ANN    | DL   | 0.88 | 0.90 | 0.89
       | WL   | 0.82 | 0.74 | 0.78
       | IIIO | 0.58 | 0.64 | 0.61
       | IIO  | 0.77 | 0.81 | 0.79
       | IO   | 0.89 | 0.88 | 0.88
RF     | DL   | 0.88 | 0.94 | 0.91
       | WL   | 0.83 | 0.70 | 0.80
       | IIIO | 0.73 | 0.68 | 0.70
       | IIO  | 0.80 | 0.84 | 0.82
       | IO   | 0.92 | 0.88 | 0.90
GBDT   | DL   | 0.89 | 0.93 | 0.91
       | WL   | 0.81 | 0.77 | 0.79
       | IIIO | 0.80 | 0.76 | 0.78
       | IIO  | 0.82 | 0.85 | 0.83
       | IO   | 0.92 | 0.91 | 0.91
XGB    | DL   | 0.90 | 0.94 | 0.91
       | WL   | 0.83 | 0.84 | 0.83
       | IIIO | 0.82 | 0.81 | 0.82
       | IIO  | 0.84 | 0.86 | 0.85
       | IO   | 0.93 | 0.92 | 0.92

Fig. 3. Confusion matrix plots of single methods on the test dataset: (a) LR model; (b) KNN model; (c) DT model; (d) ANN model.

4.1.2. Results of testing

Figs. 3 and 4 present the reservoir classes that were correctly classified or misclassified in the test dataset for the single models and the ensemble models. The testing results showed that LR, KNN and DT predict DL and IO with an accuracy greater than 0.8. ANN provided the best identification performance among the single methods, but its prediction accuracy for each reservoir was still lower than the ensemble methods, consistent with the results from the 10-fold cross validation. In the confusion matrices below, XGB is the optimal ML method in overall performance: it identified IO, IIO and DL with ≥ 0.9 accuracy, and even when trained with very few samples of the IIIO class, XGB still predicted this reservoir with an accuracy up to 0.76. The prediction of XGB was therefore selected as the final reservoir identification result from all the ML methods.

The confusion matrix of XGB (Fig. 4c) shows that the probability of a real IO reservoir being predicted as IO was 92%, and the probability of it being predicted as IO or IIO was up to 99%; that is, the probability of it being predicted as another type of reservoir (not an IO or IIO reservoir) was only 1%. The probability of a real IIO reservoir being predicted as IIO was 90%, and the probability of it being predicted as IO or IIO was 95%, which means the probability of it being predicted as another type of reservoir (any reservoir except IO or IIO) was only 5%. Although the prediction accuracy of the XGB model for each type of reservoir was 92%, the prediction success rate for the effective reservoirs (IO and IIO) was very high, up to 99%. Using the feature importance method, Fig. 5 shows the importance score ranking of the different input features calculated by XGB. Sw is the most valuable feature for reservoir identification; POR and PER were the second and third most relevant features contributing to accurate prediction. The thickness of the reservoir is the least important feature in this case.

To train and test the predictive model for reservoir identification, the ANN model needs several minutes to finish the work, while the other six ML models require several seconds. By contrast, traditional methods used to identify a reservoir can be much more time consuming and require additional manpower. An experienced geologist requires days or even weeks to complete the identification of thousands of reservoirs, and the accuracy may not reach 99% due to human error. Other traditional methods based on logging data need many calculations, which are time consuming and often not as accurate as the identification determined by a geologist. The ML methods' speed of seconds or minutes for a highly accurate reservoir identification completely supersedes traditional methods.
Fig. 4. Confusion matrix plots of ensemble methods on the test dataset: (a) RF model; (b) GBDT model; (c) XGB model.

Fig. 5. Importance score rank of input features in reservoir identification.

4.2. Analysis of regression models for production forecasting

4.2.1. ANN model

The ANN model configuration was first set to have two or three hidden layers, each with six nodes per layer. In Fig. 6, the loss shown is the value of MAE plotted per number of epochs for each ANN model configuration in the training and validation phases. The loss stabilized after 800 epochs, and there was little further decrease in the mismatch between the ANN prediction and the real target values. To increase the accuracy of the ANN prediction, increasing the number of hidden layers was necessary to learn more about the relationship between the input variables and the output target.

A new configuration for ANN was built with four hidden layers, each with six nodes, running 1000 epochs to train the model and estimate the target. In Fig. 7, the loss curve on the training data declines continuously, while the loss on the validation data begins to increase after 100 epochs. This challenge is called overfitting: the model learns a great degree of error or random noise within the training data, and its predictive power is then reduced. Finally, the construction with four hidden layers, each with six nodes, running 100 epochs was selected for the ANN model to forecast the oil production.
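A hedged sketch of such a configuration in Keras (the paper does not specify its deep-learning framework; layer sizes and loss follow the description above, the data arrays are stand-ins):

```python
import numpy as np
from tensorflow import keras

# Hypothetical arrays: 8 production features -> cumulative production (ton)
rng = np.random.default_rng(2)
X, y = rng.normal(size=(83, 8)), rng.normal(size=(83, 1))

# Four hidden layers of six nodes each, ReLU activations, MAE loss (Eq. (19))
model = keras.Sequential(
    [keras.Input(shape=(8,))]
    + [keras.layers.Dense(6, activation="relu") for _ in range(4)]
    + [keras.layers.Dense(1)]
)
model.compile(optimizer="adam", loss="mae")
# 100 epochs, stopping before the validation loss starts rising (overfitting)
model.fit(X, y, epochs=100, validation_split=0.2, verbose=0)
```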
Fig. 6. Loss vs. epoch curve in the training data and the validation data: (a) Two hidden layered ANN model; (b) Three hidden layered ANN model.

Fig. 7. Loss vs. epoch curve of four hidden layered ANN model in the training data and the validation data.

Table 6 illustrates the model performance of ANN and the two reference models (RM I and RM II) in the prediction of production, and Fig. 8 compares the R2 of the three models in the training and testing processes. For the ANN model, the configuration with 100 epochs provided a very reliable performance for estimating oil production using the predicted reservoir information in the training process, where R2 was 0.879, MAE was 174.183 ton and RMSE was 246.347 ton. The testing result was satisfactory, with R2 being 0.795, MAE being 258.414 ton and RMSE being 321.711 ton. To verify that the predicted effective reservoir thickness could replace the true data, the training and testing performance of RM I (using the real effective reservoir thickness) was compared to the ANN model. In the training and testing sets, the performance of RM I was only slightly better than the ANN model in the three metrics. The negligible difference between these two models proved the practicability of the predicted effective reservoir. By contrast, RM II, using the overall reservoir thickness instead of the effective reservoir thickness, had a weaker prediction performance than the ANN model in the training and testing processes. In the RM II testing set, R2 was 0.704, MAE was 356.315 ton and RMSE was 399.703 ton, which means that RM II accounted for 70.4% of the production variance in the research area and that on average there was more than 350 tons of uncertainty in the prediction of the first 5 months' cumulative oil production for each well. Therefore, the ANN model using the predicted effective reservoir thickness was applicable for its similar performance compared to RM I and was much better than RM II, with higher prediction accuracy. Using R2 as the metric of accuracy for the testing process, the prediction accuracy of the ANN model with effective reservoir thickness was 13% higher than that of RM II.

Table 6. Model results of ANN and two reference models.

Metrics   | ANN (train) | RM I (train) | RM II (train) | ANN (test) | RM I (test) | RM II (test)
RMSE, ton | 246.347 | 231.916 | 298.038 | 321.711 | 315.228 | 399.703
MAE, ton  | 174.183 | 166.374 | 235.917 | 258.414 | 250.341 | 356.315
R2        | 0.879   | 0.895   | 0.822   | 0.795   | 0.801   | 0.704

Fig. 8. Comparison of prediction performance (R2) of ANN and two reference models.

4.2.2. XGB model

To determine the best combination of all the hyper-parameter values in an XGB model, a grid search was used in this study. A limited number of values for each hyper-parameter were selected because it is not feasible to try the entire range of possible values. In the XGB model, three hyper-parameters were examined: (1) the learning rate, (2) the number of estimators in the model, and (3) the maximum depth of the individual regression estimators. For the learning rate, a value range of [0.1, 1], with 0.1 as the distance between two adjacent values, was assessed. For the number of estimators, the values [10, 50, 100, 250, 500] were evaluated, while a range of [1, 10], with 1 as the spacing between values, was examined for the maximum depth. After going through all the possible combinations of the three hyper-parameters, the XGB model was built with a learning rate of 0.1, 50 estimators and a maximum depth of 4.
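The search space just described, sketched as a GridSearchCV over xgboost's regressor (parameter names follow the xgboost API; the data is a stand-in):

```python
import numpy as np
import xgboost as xgb
from sklearn.model_selection import GridSearchCV

rng = np.random.default_rng(3)
X, y = rng.normal(size=(83, 8)), rng.normal(size=83)  # hypothetical stand-in data

param_grid = {
    "learning_rate": [round(0.1 * i, 1) for i in range(1, 11)],  # [0.1, 1], step 0.1
    "n_estimators": [10, 50, 100, 250, 500],
    "max_depth": list(range(1, 11)),                             # [1, 10], step 1
}
search = GridSearchCV(xgb.XGBRegressor(), param_grid, cv=10, scoring="r2")
search.fit(X, y)  # exhaustive: 10 x 5 x 10 combinations, each cross-validated
print(search.best_params_)  # the study arrived at 0.1, 50 estimators, depth 4
```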
Table 7. Model results of XGB and two reference models.

Metrics   | XGB (train) | RM I (train) | RM II (train) | XGB (test) | RM I (test) | RM II (test)
RMSE, ton | 26.04  | 25.881 | 69.475 | 237.753 | 234.792 | 298.924
MAE, ton  | 18.807 | 18.094 | 57.153 | 189.962 | 187.028 | 257.936
R2        | 0.999  | 0.999  | 0.913  | 0.857   | 0.861   | 0.782

Table 7 and Fig. 9 show the prediction performance of the XGB model and its two reference models in three different metrics. XGB had a very reliable performance estimating oil production with the predicted reservoir data in the training and testing processes, with R2 being 0.999 for training and 0.857 for testing. Like the ANN model, the prediction results from training and testing of the XGB model using the predicted effective reservoir thickness were very similar to the prediction results of RM I with the real effective reservoir thickness. RM II with the overall reservoir thickness had a lower prediction accuracy compared to the XGB model in the training and testing sets. The XGB model using the predicted effective reservoir thickness was considered reliable because of its similar performance compared to RM I, and it gave a more accurate prediction than RM II. In the test datasets, the prediction accuracy (R2) of the XGB model with effective reservoir thickness was about 10% higher than that of RM II.

Fig. 9. Comparison of prediction performance (R2) of XGB and two reference models.
4.2.3. Comparative analysis of ANN and XGB

Table 8 expresses the comparative performance of ANN and XGB. XGB is preferred because it outperforms ANN in every evaluation metric on both the training and validation data sets. Both methods perform better on the training dataset than on the validation dataset. Different from ANN, the performance of XGB on the training data was significantly better than on the validation data. This is because in ANN a larger number of epochs can increase the learning and training accuracy, but it also leads to an overfitting problem where the model learns a great degree of error or random noise within the training data and its predictive power is then reduced; to avoid overfitting and find the best validation accuracy, the learning accuracy must be limited. Fig. 10 shows the cross plots of real oil production against predictions using the ANN model and the XGB model. In Fig. 10a and b, the prediction of the ANN model performs well on the training data and testing data, with R2 being 0.8790 and 0.7950, respectively. Fig. 10c and d provides the prediction performance of the XGB model. The higher values of R2 on the training set (0.9986) and testing set (0.8575) prove the superiority of XGB in the prediction of production compared to ANN.

Table 8. Comparative performance of ANN and XGB.

Metrics   | ANN (train) | XGB (train) | ANN (test) | XGB (test)
RMSE, ton | 246.347 | 26.04  | 321.711 | 237.753
MAE, ton  | 174.183 | 18.807 | 258.414 | 189.962
R2        | 0.879   | 0.999  | 0.795   | 0.857

Since XGB performed better than ANN for the prediction of oil production, the importance scores of the features in this case were calculated based on the prediction of XGB. Fig. 11 shows the importance score of the different features using the feature importance method for the XGB, RM I and RM II models. All three models had the same top three important features. Ldo was the most valuable feature for the prediction of cumulative oil production; this feature cannot be used in traditional simulations. Day and Wr were the second and third most important features contributing to the prediction. In the XGB model and RM I, T and Tr were the fourth most important features and showed very little difference. To in RM II was the least important feature: To contributed less to the prediction of cumulative oil production compared to T and Tr. This result is consistent with the prediction performance of the XGB model and the reference models.

From the algorithm perspective, it makes sense that XGB performed better than ANN in these case studies. The two methods are important and widely used in data science research and by industry, but different machine learning methods perform differently on different types of tasks. ANN captures image, voice, text and other high-dimensional data by modeling a spatiotemporal location. The tree-based XGB handles tabular data well and has some features that ANN does not have, such as interpretability of the model, easier hyper-parameter tuning and a faster calculating speed. In this study, it became obvious that, compared to XGB, it is tough work to find the best construction of ANN without overfitting. The calculating speed of XGB was nearly 100 times faster than ANN in the process of production forecasting: XGB needed only 2 or 3 s, while ANN required several minutes. The difference in computational speed is attributed to: (1) using a backpropagation process, the convergence rate of ANN is particularly slow and easily falls into a local minimum (Ren et al., 2020); (2) compared to ANN, XGB has a lower number of hyper-parameters to be tuned; (3) the sparsity-aware split finding of XGB lets it find the optimal direction while visiting only non-missing observations; and (4) cache-aware access and blocks for out-of-core computation make XGB fast. Although the computing speed of ANN is slower than XGB, both XGB and ANN models have a much faster prediction speed compared to traditional methods.

In the case of production forecasting, the ANN and XGB models show similar performance compared to their RM I. The results of reservoir identification from the classification models are therefore reliable and can be used in regression models for the prediction of production. Compared with RM II, the established ANN and XGB models provide higher accuracy for predicting production. The combination of reservoir identification and production forecasting in this study is meaningful and valuable because the production was correlated with the thickness of effective reservoirs rather than with the overall reservoir thickness.
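The feature-importance ranking discussed above maps to the feature_importances_ attribute in xgboost's scikit-learn API (a sketch with hypothetical data; only the feature names come from the paper):

```python
import numpy as np
import xgboost as xgb

features = ["T", "phi_avg", "k_avg", "Vk", "Srw", "Wr", "Ldo", "Day"]
rng = np.random.default_rng(4)
X, y = rng.normal(size=(83, 8)), rng.normal(size=83)  # hypothetical stand-in data

model = xgb.XGBRegressor(learning_rate=0.1, n_estimators=50, max_depth=4)
model.fit(X, y)

# Rank input features by importance score, as in Fig. 11
for name, score in sorted(zip(features, model.feature_importances_),
                          key=lambda t: -t[1]):
    print(f"{name}: {score:.3f}")
```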
Fig. 10. Cross plots of real field cumulative oil production results vs. forecasts of the established ANN model and XGB model: (a) Training set results of ANN model; (b) Testing set results of ANN model; (c) Training set results of XGB model; (d) Testing set results of XGB model.

Fig. 11. Importance score rank of input features in prediction of cumulative oil production.

5. Conclusions

This paper developed an integrated ML system formed by two interconnected predictive models. It makes full use of historical data and solves reservoir identification and production forecasting problems, making the models faster and less labor-intensive than traditional methods.

The results of reservoir identification revealed that the ensemble techniques (RF, GBDT and XGB) perform better than the single classifiers (LR, KNN, DT and ANN). The reservoir identification results of XGB were selected because of the outperformance of XGB in all evaluation metrics in the 10-fold cross-validation and test process when compared to the other methods. The prediction accuracy for the effective reservoirs (IO and IIO) was up to 99%.

Based on the prediction results of IO and IIO obtained from the reservoir identification, the effective thickness (thickness of IO and IIO) was an important input used in the production prediction process to predict the cumulative oil production of single wells. The very little difference (0.01 in R2) between the prediction results of the established ANN/XGB models (based on predicted effective thickness) and the corresponding RM I (based on real effective thickness) proved that the prediction of reservoir identification was sufficiently accurate and could reliably be used in production forecasting. The ANN and XGB models with effective thickness provided higher prediction accuracy than the corresponding RM II (based on overall reservoir thickness) in the training and testing data sets. In the testing process, the R2 of the XGB and ANN models using effective thickness was 10% and 13% higher, respectively, than that of RM II. The MAE and RMSE of the effective thickness-based models were much lower than those of RM II, demonstrating the superiority of the effective thickness-based ML models on all metrics. XGB was better than ANN, with a higher prediction accuracy and faster computing speed.

In this study, the research data was mainly based on an oil production data set from CNPC. The integrated ML system has proven successful in the predictive test of CNPC's subordinate blocks. In the future, introducing diverse data from different regions may improve the ML models and perhaps make them applicable on a global scale.

Acknowledgments

The research is partly supported by the NSERC/Energi Simulation and Alberta Innovates Chair at the University of Calgary.

Appendix A

Real data and predictions of the XGB/ANN models from the training process in the case of production prediction.
No. | Real production, ton | Prediction of ANN, ton | Prediction of XGB, ton
1  | 777  | 818.033 | 737.999
2  | 2173 | 2227.41 | 2178.62
3  | 2046 | 1579.29 | 2007.16
4  | 150  | 200.014 | 164.171
5  | 1963 | 1871.6  | 1952.42
6  | 2012 | 2044.98 | 2000.79
7  | 1845 | 1415.15 | 1829.8
8  | 852  | 1064.91 | 847.855
9  | 286  | 229.866 | 289.256
10 | 1933 | 1464.19 | 1906.82
11 | 104  | 262.904 | 107.727
12 | 704  | 653.465 | 703.937
13 | 1236 | 865.083 | 1233.67
14 | 712  | 581.236 | 668.307
15 | 19   | 3.78477 | 23.0181
16 | 482  | 319.131 | 497.908
17 | 1205 | 1454.17 | 1243.27
18 | 313  | 329.998 | 326.537
19 | 1607 | 1616.4  | 1587.17
20 | 480  | 475.354 | 483.264
21 | 5    | 1.02    | 23.8237
22 | 169  | 455.467 | 191.984
23 | 1716 | 1484.36 | 1709.92
24 | 1761 | 1250.75 | 1738.47
25 | 1115 | 794.688 | 1087.56
26 | 2085 | 1885.16 | 2040.46
27 | 26   | 54.2055 | 31.5974
28 | 376  | 380.011 | 372.154
29 | 188  | 235.632 | 178.401
30 | 519  | 453.608 | 504.225
31 | 146  | 411.423 | 180.163
32 | 2319 | 1777.83 | 2268.21
33 | 1497 | 1354.75 | 1529.09
34 | 1903 | 2210.43 | 1924.94
35 | 2141 | 2017.69 | 2138.54
36 | 566  | 841.086 | 606.211
37 | 18   | 19.7361 | 16.5283
38 | 496  | 868.637 | 499.08
39 | 1541 | 1731.59 | 1600.39
40 | 857  | 1070.55 | 871.86
41 | 396  | 319.434 | 369.877
42 | 260  | 411.351 | 257.906
43 | 1520 | 1579.48 | 1528.51
44 | 1033 | 738.646 | 1022.66
45 | 894  | 873.995 | 876.966
46 | 589  | 955.536 | 617.514
47 | 91   | 267.938 | 104.344
48 | 78   | 3.78477 | 58.3632
49 | 144  | 447.756 | 161.496
50 | 402  | 346.137 | 407.909
51 | 803  | 1241.18 | 807.688
52 | 1588 | 1582.66 | 1590.29
53 | 6    | 5.45389 | 32
54 | 664  | 571.344 | 677.822
55 | 1965 | 1484.95 | 1934.8
56 | 349  | 453.542 | 355.716
57 | 225  | 182.117 | 222.646
58 | 1461 | 1755.09 | 1478.51
59 | 868  | 946.422 | 839.416
60 | 2390 | 2416.86 | 2351.98
61 | 275  | 338.072 | 294.482
62 | 464  | 456.224 | 486.921
63 | 1183 | 2228.44 | 1254.79
64 | 20   | 93.9628 | 35.771
65 | 137  | 3.78477 | 141.637
66 | 1798 | 1817.35 | 1816.96
67 | 100  | 231.809 | 129.682
68 | 397  | 569     | 402.12
69 | 232  | 408.25  | 250.134
70 | 1355 | 1386.1  | 1355.86
71 | 1843 | 1491.49 | 1831.53
72 | 445  | 600.457 | 441.883
73 | 731  | 587.383 | 730.651
74 | 436  | 607.271 | 430.751
75 | 455  | 516.642 | 443.858
76 | 1521 | 1421.59 | 1539.18
77 | 1175 | 820.689 | 1072.94
78 | 232  | 543.801 | 286.53
79 | 153  | 13.6937 | 151.762
80 | 211  | 156     | 254
81 | 1257 | 1007    | 1229
82 | 928  | 865     | 918
83 | 825  | 720     | 785.178

Appendix B

Real data and predictions of the XGB/ANN models from the testing process in the case of production prediction.

No. | Real production, ton | Prediction of ANN, ton | Prediction of XGB, ton
1  | 1786 | 1079.75 | 1511.6
2  | 688  | 915.475 | 739.036
3  | 446  | 234.412 | 313.871
4  | 816  | 696.202 | 492.459
5  | 2403 | 1739.3  | 2000.69
6  | 1497 | 1382.06 | 1565.9
7  | 2037 | 1675.9  | 2322.53
8  | 996  | 1252.54 | 1195.31
9  | 1045 | 948.279 | 762.268
10 | 340  | 483.319 | 565.643
11 | 630  | 856.035 | 668.591
12 | 164  | 182.789 | 175.969
13 | 1369 | 1534.89 | 1682.47
14 | 631  | 501.439 | 359.441
15 | 2228 | 2010.97 | 1992.25
16 | 115  | 421.96  | 428.314
17 | 495  | 376.964 | 404.13
18 | 430  | 631.916 | 559.978
19 | 309  | 161.169 | 201.158
20 | 312  | 298.446 | 275.513
21 | 911  | 297.94  | 400.452
22 | 1933 | 1483.01 | 1703.87
23 | 336  | 802.416 | 567.334
24 | 831  | 893.789 | 588.766
25 | 2256 | 1519.4  | 1804.72
26 | 2334 | 2320.35 | 1961.03
27 | 212  | 545.962 | 405.253
28 | 1876 | 1508.64 | 1761.49
29 | 61   | 52.1522 | 121.8566
30 | 1363 | 857.524 | 800.274
31 | 1211 | 1720.96 | 1642.75
32 | 95   | 250.913 | 452.584
33 | 1680 | 1849.23 | 1790.27
34 | 385  | 505.18  | 660.58
35 | 413  | 212.257 | 259.373
36 | 1283 | 1236.68 | 1365.72
37 | 1369 | 1094.31 | 1115.13
38 | 1700 | 1988.77 | 1263.2
39 | 1505 | 1279    | 1693
40 | 220  | 543.427 | 322.05
41 | 903  | 662.845 | 667.077
References

Abdulraheem, A., Sabakhi, E., Ahmed, M., Vantala, A., Raharja, I., Korvin, G., 2007. Estimation of permeability from wireline logs in a Middle Eastern carbonate reservoir using fuzzy logic. In: 15th SPE Middle East Oil and Gas Show and Conference. https://doi.org/10.2118/105350-MS.
Andika, R., Chandima Ratnayake, R.M., 2019. Machine learning approach for risk-based inspection screening assessment. Reliab. Eng. Syst. Saf. 185, 518–532. https://doi.org/10.1016/j.ress.2019.02.008.
Anifowose, F.A., Labadin, J., Abdulraheem, A., 2015. Ensemble model of non-linear feature selection-based Extreme Learning Machine for improved natural gas reservoir characterization. Spec. Issue J. Nat. Gas. Sci. Eng. 25, 1561–1572. https://doi.org/10.1016/j.jngse.2015.02.012.
Anifowose, F.A., Labadin, J., Abdulraheem, A., 2017. Ensemble machine learning: an untapped modeling paradigm for petroleum reservoir characterization. J. Petrol. Sci. Eng. 151, 480–487. https://doi.org/10.1016/j.petrol.2017.01.024.
Awoleke, O., Lane, R., 2011. Analysis of data from the Barnett Shale using conventional statistical and virtual intelligence techniques. SPE Reservoir Eval. Eng. 14 (5), 544–556. https://doi.org/10.2118/127919-PA.
Breiman, L., Friedman, J.H., Olshen, R.A., Stone, C.G., 1984. Classification and regression trees. In: Hoecker, A. (Ed.), TMVA – Toolkit for Multivariate Data Analysis. Wadsworth International Group, Belmont, California, USA. arXiv preprint physics/0703039.
Breiman, L., 1996. Bagging predictors. Mach. Learn. 26 (2), 123–140. https://doi.org/10.1007/BF00058655.
Brown, I., Mues, C., 2012. An experimental comparison of classification algorithms for imbalanced credit scoring data sets. Expert Syst. Appl. 39, 3446–3453. https://doi.org/10.1016/j.eswa.2011.09.033.
Chaki, S., Routray, A., Mohanty, W.K., 2018. Well-log and seismic data integration for reservoir characterization: a signal processing and machine-learning perspective. IEEE Signal Process. Mag. 35 (2), 72–81. https://doi.org/10.1109/MSP.2017.2776602.
Chakra, N.C., Song, K.-Y., Gupta, M.M., Saraf, D.N., 2013. An innovative neural forecast of cumulative oil production from a petroleum reservoir employing higher-order neural networks (HONNs). J. Petrol. Sci. Eng. 106, 18–33. https://doi.org/10.1016/j.petrol.2013.03.004.
Chen, T., Guestrin, C., 2016. XGBoost: a scalable tree boosting system. In: Proc. 22nd ACM SIGKDD Int. Conf. Knowl. Discov. Data Min., pp. 785–794. https://doi.org/10.1145/2939672.2939785.
Cox, D.R., 1958. The regression analysis of binary sequences (with discussion). J. Roy. Stat. Soc. B 20, 215–242. https://doi.org/10.1111/j.2517-6161.1958.tb00292.x.
Cracknell, M., Reading, A., 2012. Machine Learning for Lithology Classification and Uncertainty Mapping. AGU Fall Meeting Abstracts, p. 1511.
Friedman, J.H., 2001. Greedy function approximation: a gradient boosting machine. Ann. Stat. 29 (5), 1189–1232. http://www.jstor.org/stable/2699986.
Guo, Z., Wang, H., Kong, X., Shen, L., Jia, Y., 2021. Machine learning-based production prediction model and its application in Duvernay formation. Energies 14 (17), 5509. https://doi.org/10.3390/en14175509.
Harris, J.R., Grunsky, E.C., 2015. Predictive lithological mapping of Canada's North using random forest classification applied to geophysical and geochemical data. Comput. Geosci. 80, 9–25. https://doi.org/10.1016/j.cageo.2015.03.013.
Helmy, T., Rahman, S.M., Hossain, M.I., Abdelraheem, A., 2013. Non-linear heterogeneous ensemble model for permeability prediction of oil reservoirs. Arabian J. Sci. Eng. 38, 1379–1395. https://doi.org/10.1007/s13369-013-0588-z.
Ho, T.K., 1998. The random subspace method for constructing decision forests. IEEE Trans. Pattern Anal. Mach. Intell. 20 (8), 832–844. https://doi.org/10.1109/34.709601.
Hossin, M., Sulaiman, M.N., 2015. A review on evaluation metrics for data classification evaluations. International Journal of Data Mining & Knowledge Management Process (IJDKP) 5 (2).
Kamenski, A., Cvetković, M., Kolenković Mocilac, I., et al., 2020. Lithology prediction in the subsurface by artificial neural networks on well and 3D seismic data in clastic sediments: a stochastic approach to a deterministic method. Int. J. Geom. 11, 8. https://doi.org/10.1007/s13137-020-0145-3.
Liu, W., Chen, Z., Hu, Y., 2022. XGBoost algorithm-based prediction of safety assessment for pipelines. Int. J. Pres. Ves. Pip. 197, 104655. https://doi.org/10.1016/j.ijpvp.2022.104655.
Merembayev, T., Yunussov, R., Yedilkhan, A., 2018. Machine learning algorithms for classification geology data from well logging. In: 2018 14th International Conference on Electronics Computer and Computation (ICECCO), pp. 206–212. https://doi.org/10.1109/ICECCO.2018.8634775.
Nash, J.E., Sutcliffe, J.V., 1970. River flow forecasting through conceptual models, part I, A discussion of principles. J. Hydrol. 10, 282–290. https://doi.org/10.1016/0022-1694(70)90255-6.
Priezzhev, I., Stanisalav, E., 2018. Application of Machine Learning Algorithms Using Seismic Data and Well Logs to Predict Reservoir Properties, vol. 1. European Association of Geoscientists & Engineers, pp. 1–5. https://doi.org/10.3997/2214-4609.201800920.
Radford, D.D.G., Cracknell, M.J., Roach, M.J., Cumming, G.V., 2018. Geological mapping in western Tasmania using radar and random forests. IEEE J. Sel. Top. Appl. Earth Obs. Rem. Sens. 11 (9), 3075–3087. https://doi.org/10.1109/JSTARS.2018.2855207.
Raeesi, M., Moradzadeh, A., Ardejani, F.D., Rahimi, M., 2012. Classification and identification of hydrocarbon reservoir lithofacies and their heterogeneity using seismic attributes, logs data and artificial neural networks. J. Petrol. Sci. Eng. 82, 151–165. https://doi.org/10.1016/j.petrol.2012.01.012.
Raschka, S., 2015. Python Machine Learning. Packt Publishing Ltd.
Ren, X., Hou, J., Song, S., Liu, Y., Chen, D., Wang, X., Dou, L., 2019. Lithology identification using well logs: a method by integrating artificial neural networks and sedimentary patterns. J. Petrol. Sci. Eng. 182, 106336. https://doi.org/10.1016/j.petrol.2019.106336.
Ren, Y., Mao, J., Zhao, H., Zhou, C., Gong, X., Rao, Z., Wang, Q., Zhang, Y., 2020. Prediction of aerosol particle size distribution based on neural network. Adv. Meteorol. https://doi.org/10.1155/2020/5074192.
Rodríguez, H.M., Escobar, E., Embid, S., Morillas, N.R., Hegazy, M., Larry, W.L., 2014. New approach to identify analogous reservoirs. SPE Econ & Mgmt 6, 173–184. https://doi.org/10.2118/166449-PA.
Siddiqi, S.S., Andrew, K.W., 2002. A study of water coning control in oil wells by injected or natural flow barriers using scaled physical model and numerical simulator. In: SPE Annual Technical Conference and Exhibition. https://doi.org/10.2118/77415-MS.
Van, S.L., Chon, B.H., 2018. Effective prediction and management of a CO2 flooding process for enhancing oil recovery using artificial neural networks. J. Energy Resour. Technol. 140 (3), 032906. https://doi.org/10.1115/1.4038054.
You, J., Ampomah, W., Kutsienyo, E.J., Sun, Q., Balch, R.S., Aggrey, W.N., Cather, M., 2019. Assessment of enhanced oil recovery and CO2 storage capacity using machine learning and optimization framework. In: SPE Europec featured at 81st EAGE Conference and Exhibition. https://doi.org/10.2118/195490-MS.