Production Prediction ML
                                                                  Petroleum Science
                             journal homepage: www.keaipublishing.com/en/journals/petroleum-science
Original Paper
Article info

Article history:
Received 15 February 2022
Received in revised form 1 September 2022
Accepted 5 September 2022
Available online 9 September 2022

Edited by Yan-Hua Sun

Keywords:
Reservoir identification
Production prediction
Machine learning
Ensemble method

Abstract

Reservoir identification and production prediction are two of the most important tasks in petroleum exploration and development. Machine learning (ML) methods are used for petroleum-related studies, but have not been applied to reservoir identification and production prediction based on reservoir identification. Production forecasting studies are typically based on overall reservoir thickness and lack accuracy when reservoirs contain a water or dry layer without oil production. In this paper, a systematic ML method was developed using classification models for reservoir identification, and regression models for production prediction. The production models are based on the reservoir identification results. To realize the reservoir identification, seven optimized ML methods were used: four typical single ML methods and three ensemble ML methods. These methods classify the reservoir into five types of layers: water, dry and three levels of oil (I oil layer, II oil layer, III oil layer). The validation and test results of these seven optimized ML methods suggest that the three ensemble methods perform better than the four single ML methods in reservoir identification. XGBoost produced the model with the highest accuracy, up to 99%. The effective thickness of I and II oil layers determined during the reservoir identification was fed into the models for predicting production. Effective thickness considers the distribution of the water and the oil, resulting in a more reasonable production prediction compared to predictions based on the overall reservoir thickness. To validate the superiority of the ML methods, reference models using overall reservoir thickness were built for comparison. The models based on effective thickness outperformed the reference models in every evaluation metric. The prediction accuracy of the ML models using effective thickness was 10% higher than that of the reference model. Without the personal error or data distortion existing in traditional methods, this novel system realizes rapid analysis of data while reducing the time required to resolve reservoir classification and production prediction challenges. The ML models using the effective thickness obtained from reservoir identification were more accurate when predicting oil production compared to previous studies which use overall reservoir thickness.
                                                       © 2022 The Authors. Publishing services by Elsevier B.V. on behalf of KeAi Communications Co. Ltd. This
                                                               is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
https://doi.org/10.1016/j.petsci.2022.09.002
1995-8226/© 2022 The Authors. Publishing services by Elsevier B.V. on behalf of KeAi Communications Co. Ltd. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
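The distinction the abstract draws between overall and effective thickness can be illustrated with a minimal sketch. The `Layer` type, the helper functions and the two wells below are hypothetical and are not taken from the paper's data or code; the only assumption carried over from the text is that the effective thickness is the summed thickness of layers identified as I or II oil layers (written here as `IO` and `IIO`).

```python
from dataclasses import dataclass

# Layers counted as "effective": I and II oil layers (label names are illustrative).
EFFECTIVE_CLASSES = {"IO", "IIO"}


@dataclass
class Layer:
    thickness_m: float
    predicted_class: str  # e.g. "WL" (water), "DL" (dry), "IO", "IIO", "IIIO"


def overall_thickness(layers):
    """Sum of all layer thicknesses, regardless of layer type."""
    return sum(layer.thickness_m for layer in layers)


def effective_thickness(layers):
    """Sum of thicknesses of layers classified as I or II oil layers."""
    return sum(
        layer.thickness_m
        for layer in layers
        if layer.predicted_class in EFFECTIVE_CLASSES
    )


# Hypothetical wells: A is thicker overall but mostly water and dry layers.
well_a = [Layer(12.0, "WL"), Layer(6.0, "DL"), Layer(2.0, "IO")]
well_b = [Layer(6.0, "IO"), Layer(3.0, "IIO"), Layer(1.0, "DL")]

for name, well in (("A", well_a), ("B", well_b)):
    print(name, overall_thickness(well), effective_thickness(well))
```

In this sketch well A has twice the overall thickness of well B but far less effective thickness, which is the situation in which a model fed only the overall thickness would be misled.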
W. Liu, Z. Chen, Y. Hu et al.                                                                                         Petroleum Science 20 (2023) 295–308
(AI) has been applied in various fields with a positive impact for many years. Practical applications of ML techniques have been widely investigated in petroleum engineering, including reservoir characterization (Anifowose et al., 2017; Chaki et al., 2018), prediction of reservoir properties (Helmy et al., 2013; Anifowose et al., 2015; Priezzhev and Stanisalav, 2018) and production prediction (Chakra et al., 2013; You et al., 2019). Some studies have applied ML techniques to petroleum geology (Raeesi et al., 2012; Merembayev et al., 2018) where several input parameters are selected related to geological characteristics of a reservoir and its operating conditions. To predict lithofacies or reservoir properties (e.g., porosity and permeability), related well log data has been used to train a predictive model. After learning the underlying relationship between input variables and an output target, this data-driven model is finally applied to forecast specific lithofacies or reservoir properties.
    In the previous research, several ML techniques have been introduced to solve classification and regression problems in petroleum engineering and geology. Among them, artificial neural network (ANN) and random forest (RF) are commonly used due to their remarkable performance. The ANN technique has been applied to lithology identification and recognition with well log data (Ren et al., 2019; Kamenski et al., 2020) using the back propagation neural network (BPNN) to find patterns that identify the lithology. This technique has been used to predict oil production of existing wells (Awoleke and Lane, 2011; Van and Chon, 2018). To forecast the oil production, well log data related to reservoir geological characteristics and dynamic operation data have been used to build a representative prediction model. The RF technique has been applied to geological and geochemical data for lithology identification by previous researchers. It outperforms other ML algorithms in this area (Harris and Grunsky, 2015; Cracknell and Reading, 2012). In addition to the classification performance, this technique provides reliable predictions for geological mapping applications (Radford et al., 2018).
    Reservoir identification is the fundamental work necessary for production forecasting. Most previous ML studies about reservoirs were focused on lithofacies classification or reservoir properties and very few mention reservoir identifications, let alone the combination of reservoir identification and production prediction. Historically, the prediction of oil production leaned heavily on the overall thickness of reservoir as a key data input (Guo et al., 2021). This technique oversimplified the reservoir which contains not only the good oil layer, but other layers (water, dry and poor oil) that do not or cannot produce oil. There is no proven direct relationship between the overall thickness of reservoir and the volume of oil production. For example, other conditions being equal, a 20-m-thick reservoir A is theoretically predicted to produce more oil than 10-m-thick reservoir B. In fact, if reservoir A is mainly composed of water and dry layers and reservoir B is formed by almost all good oil
2.3.3. Decision tree
    Classification and regression tree (CART) is the most popular algorithm used to build a DT. CART (Breiman et al., 1984) develops a Gini index to choose a feature for splitting a tree. The Gini index reflects the probability that two samples are randomly selected from a data set and their labels (classes) are different. A lower Gini index indicates greater purity of the data set.
    If the sample set $D$ is split into $D_1$ and $D_2$ using the discrete feature $A$, the Gini index calculated after splitting is defined as:

$$\mathrm{Gain\_Gini}(D, A) = \frac{|D_1|}{|D|}\,\mathrm{Gini}(D_1) + \frac{|D_2|}{|D|}\,\mathrm{Gini}(D_2) \tag{6}$$

    Therefore, $\mathrm{Gain\_Gini}(D, A)$ is the uncertainty after splitting. A smaller $\mathrm{Gain\_Gini}(D, A)$ value is preferred because this provides greater purity for a data set.

$$Y_j = f\left(\sum_h w_{hj} \cdot H_h - \theta_j\right) = \frac{e^{\sum_h w_{hj} \cdot H_h - \theta_j}}{\sum_{i=1}^{n} e^{\sum_h w_{hj} \cdot H_h - \theta_i}} \tag{8}$$

where $w_{ih}$ is the weight matrix of the node connections between the input layer and a hidden layer; $w_{hj}$ is the weight matrix of the node connections between a hidden layer and the output layer; $\theta_h$ is the threshold matrix associated with a hidden layer; $\theta_j$ is the threshold matrix associated with the output layer; and $n$ is the number of classifications.
    In Eq. (9), a ReLU activation function was applied in this study to solve a regression problem:

$$\mathrm{ReLU}(x) = \begin{cases} x & \text{if } x > 0 \\ 0 & \text{if } x \le 0 \end{cases} \tag{9}$$
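The split criterion in Eq. (6) of Section 2.3.3 can be sketched in a few lines. This is an illustrative example rather than the authors' implementation: the per-set Gini index is assumed to follow the standard CART definition, one minus the sum of squared class proportions, and the labels are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini index of one sample set: 1 - sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def gain_gini(d1, d2):
    """Size-weighted Gini index after splitting D into D1 and D2, as in Eq. (6)."""
    n = len(d1) + len(d2)
    return len(d1) / n * gini(d1) + len(d2) / n * gini(d2)


# Hypothetical labels for four samples, split in two different ways.
clean_split = gain_gini(["oil", "oil"], ["water", "water"])
mixed_split = gain_gini(["oil", "water"], ["oil", "water"])

print(clean_split, mixed_split)
```

The split that separates the two classes completely yields the smaller value, so it is the one CART would prefer.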
2.3.6. Gradient boosting decision trees
    Different from RF, GBDT is an ensemble ML tool using a boosting technique, which sequentially generates base models and improves the predictive power of the ensemble through incremental minimization of residual errors in each iteration of construction of a new base model (Brown and Mues, 2012). While building a classification model in GBDT, samples that are misclassified in a previous base model are more likely to be assigned an increased weight in the next step. The new model has improved prediction accuracy compared to a previous model. A loss function is used to measure the difference between the predicted values $F_k(x)$ and true values $y_k$ to indicate how well a model fits the data.
    In a GBDT algorithm for a multi-class problem, the loss function is (Friedman, 2001):

$$L\left(\{y_k, F_k(x)\}_1^K\right) = -\sum_{k=1}^{K} y_k \log p_k(x) \tag{10}$$

where $y_k = 1(\text{class} = k) \in \{0, 1\}$, $p_k(x) = P(y_k = 1 \mid x)$, and

$$p_k(x) = \exp(F_k(x)) \Big/ \sum_{l=1}^{K} \exp(F_l(x)) \tag{11}$$

2.3.7. XGBoost
    XGB is one of the most popular methods in the ensemble machine learning category today. It performs very well in multiple programming competitions like Kaggle (Chen and Guestrin, 2016). XGB is a machine learning technique for classification and regression problems. It produces a prediction model in the form of an ensemble of weak prediction models, typically decision trees. Based on the concept of GBDT, XGB uses a regularized model formalization to control over-fitting, which leads to better performance.
    Distinct from GBDT, the objective function of XGB consists of a loss function and a regularization term (Chen et al., 2016):

$$L = \sum_i l(\hat{y}_i, y_i) + \sum_k \Omega(f_k) \tag{12}$$

$$\Omega(f) = \gamma T' + \frac{1}{2}\lambda \lVert w \rVert^2 \tag{13}$$

where $l$ is a loss function as in GBDT and measures the difference between the prediction $\hat{y}_i$ and the observation $y_i$; the regularization term $\Omega$ is added to the objective function to control over-fitting and contribute to better performance and flexible complexity; $f_k$ represents a specific tree structure; $T'$ and $w$ denote the number of leaf nodes and the score on each node, respectively; and $\gamma$ and $\lambda$ are parameters that control the regularization.

et al., 2013). In this study, to deal with a reservoir identification–classification problem and a production prediction–regression problem separately, two sets of metrics were selected and are introduced below.

2.4.1. Classification problems
    Before discussing the metrics for a classification problem, it is necessary to introduce four elements in a binary problem: true positive (TP), false negative (FN), false positive (FP) and true negative (TN). The 'true' and 'false', respectively, mean whether a prediction is correct or not compared to real data. The 'positive' and 'negative' represent whether a prediction class is the same as a specific class or not. To assess the predictive performance of the above classifiers and select the corresponding optimal models, a set of metrics (accuracy, precision, recall and f1 score) is evaluated. Accuracy is the ratio of correctly predicted samples to the total samples. Precision is the ratio of correct positive predictions to all positive predictions. Recall is the ratio of actual positive samples that are correctly predicted. The f1 score is the harmonic mean of the precision and recall. The higher the f1-score value, the better the precision and recall.

$$\text{accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{14}$$

$$\text{precision} = \frac{TP}{TP + FP} \tag{15}$$

$$\text{recall} = \frac{TP}{TP + FN} \tag{16}$$

$$f1 = 2 \cdot \frac{1}{\dfrac{1}{\text{recall}} + \dfrac{1}{\text{precision}}} = 2 \cdot \frac{\text{precision} \cdot \text{recall}}{\text{precision} + \text{recall}} \tag{17}$$

2.4.2. Regression problems
    In this study, the following measurements were applied to substantiate the statistical accuracy of the performance of ANN and XGB for production prediction: RMSE, MAE, and R2. The RMSE measures the spread of the actual values around the predicted values. It is the square root of the average of the squared differences between each predicted value and its corresponding actual value. It is expressed as:

$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left(y_i^{\mathrm{obs}} - y_i^{\mathrm{pred}}\right)^2} \tag{18}$$
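The evaluation metrics in Eqs. (14)–(18) translate directly into code. The sketch below is a plain-Python illustration with hypothetical counts and values; the function names are chosen here for readability and are not from the paper, and no guard is included for the degenerate case of a zero denominator.

```python
import math


def binary_metrics(tp, fn, fp, tn):
    """Accuracy, precision, recall and f1 score from Eqs. (14)-(17)."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1


def rmse(y_obs, y_pred):
    """Root mean squared error from Eq. (18)."""
    n = len(y_obs)
    squared_errors = ((obs - pred) ** 2 for obs, pred in zip(y_obs, y_pred))
    return math.sqrt(sum(squared_errors) / n)


# Hypothetical confusion counts for one class treated as "positive".
accuracy, precision, recall, f1 = binary_metrics(tp=8, fn=2, fp=2, tn=8)

# Hypothetical observed and predicted production values (same unit).
error = rmse([10.0, 20.0, 30.0, 40.0], [12.0, 18.0, 32.0, 38.0])

print(accuracy, precision, recall, f1, error)
```

For a multi-class task such as the five layer types, these binary metrics would typically be computed once per class, treating that class as positive and all others as negative.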
Table 1
Description of input features in reservoir identification.
Feature   Nomenclature   Unit   Min   Mean   Max

Table 2
Description of input features in production prediction.
Feature   Unit   Min   Mean   Max
Table 3
Tuned optimal hyper-parameter values of seven classification methods.

Table 4
Precision, recall and f1-scores for 10-fold cross validation for seven classifiers.
Method   Output reservoir   Precision   Recall   f1-score

Table 5
Number of samples in each type of layer.
DL   WL   IO   IIO   IIIO
Fig. 3. Confusion matrix plots of single methods on the test dataset: (a) LR model; (b) KNN model; (c) DT model; (d) ANN model.
achieved better classification results compared to the other four single methods. Among the ensemble methods, XGB and GBDT are preferred for their higher accuracy, precision, recall and f1-scores.

4.1.2. Results of testing
    Figs. 3 and 4 present the reservoir classes that were correctly classified or misclassified in the test dataset for single models and ensemble models. The testing result showed that LR, KNN and DT predict DL and IO with an accuracy greater than 0.8. ANN provided the best identification performance for a single method, but the prediction accuracy for each reservoir was still lower than the ensemble methods. This was consistent with the results from a 10-fold cross validation. In the confusion matrices below, XGB is the optimal ML method for overall performance. It identified IO, IIO and DL with 0.9 accuracy. Even when trained with very few samples in the IIIO class, XGB still predicted this reservoir with an accuracy up to 0.76. The prediction of XGB was selected as the final reservoir identification result from all the ML methods.
    The confusion matrix result of XGB (Fig. 4c) shows the probability of a real IO reservoir being predicted as the IO reservoir was 92% and the probability of being predicted as the IO or IIO reservoir was up to 99%. This was interpreted to mean the probability of being predicted as another type of reservoir (not an IO or IIO reservoir) was only 1%. The probability of the real IIO reservoir being predicted as the IIO reservoir was 90% and the probability of being predicted as the IO or IIO reservoir was 95%, which means the probability of being predicted as another type of reservoir (any reservoir except for IO or IIO) was only 5%. Although the prediction accuracy of the XGB model for each type of reservoir was 92%, the prediction success rate for effective reservoirs (IO and IIO) was very high, up to 99%. Using the feature importance method, Fig. 5 shows the importance score ranking of different input features calculated by XGB. Sw is the most valuable feature for reservoir identification. POR and PER were the second and third most relevant features contributing to accurate prediction. Thickness of reservoir is the least important feature in this case.
    To train and test the predictive model for reservoir identification, the ANN model needs several minutes to finish the work, while the other six ML models require several seconds. By contrast, traditional methods used to identify a reservoir can be much more time-consuming and require additional manpower. An experienced geologist requires days or even weeks to complete the
Fig. 4. Confusion matrix plots of ensemble methods on the test dataset: (a) RF model; (b) GBDT model; (c) XGB model.
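The way per-class probabilities are read off a confusion matrix in Section 4.1.2 (for example, a real IO layer being predicted as IO, or as either IO or IIO) can be sketched as follows. The counts are hypothetical and are not the values plotted in Fig. 4; the helper name `rate_predicted_as` is likewise introduced only for this illustration.

```python
CLASSES = ["DL", "WL", "IO", "IIO", "IIIO"]

# Hypothetical test-set counts: rows are true classes, columns are predictions.
COUNTS = [
    [9, 1, 0, 0, 0],  # true DL
    [1, 8, 0, 0, 1],  # true WL
    [0, 1, 8, 1, 0],  # true IO
    [0, 0, 1, 8, 1],  # true IIO
    [1, 1, 0, 1, 7],  # true IIIO
]


def rate_predicted_as(counts, true_class, predicted_classes):
    """Share of samples of true_class that were predicted as any of predicted_classes."""
    row = counts[CLASSES.index(true_class)]
    hits = sum(row[CLASSES.index(name)] for name in predicted_classes)
    return hits / sum(row)


io_as_io = rate_predicted_as(COUNTS, "IO", ["IO"])
io_as_effective = rate_predicted_as(COUNTS, "IO", ["IO", "IIO"])
io_as_other = rate_predicted_as(COUNTS, "IO", ["DL", "WL", "IIIO"])

print(io_as_io, io_as_effective, io_as_other)
```

Grouping IO and IIO in this way gives the success rate for effective reservoirs, which is the quantity that matters when the classification result is used to compute effective thickness.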
Fig. 6. Loss vs. epoch curve in the training data and the validation data: (a) Two hidden layered ANN model; (b) Three hidden layered ANN model.
Table 7
Model results of XGB and two reference models.
  Metrics      Training                       Testing
               XGB      RM I     RM II        XGB       RM I      RM II
  RMSE, ton    26.04    25.881   69.475       237.753   234.792   298.924
  MAE, ton     18.807   18.094   57.153       189.962   187.028   257.936
  R2           0.999    0.999    0.913        0.857     0.861     0.782

Fig. 9. Comparison of prediction performance (R2) of XGB and two reference models.

model and its two reference models in three different metrics. XGB had a very reliable performance estimating oil production with the predicted reservoir data in the training and testing processes, with R2 being 0.999 for training and 0.857 for testing. Like the ANN model, the prediction results from training and testing of the XGB model using the predicted effective reservoir thickness were very similar to the prediction results of RM I with the real effective reservoir thickness. RM II with overall reservoir thickness had a lower prediction accuracy compared to the XGB model in the training and testing sets. The XGB model using predicted effective reservoir thickness was considered reliable because it performed similarly to RM I and gave a more accurate prediction than RM II. In test datasets, the prediction accuracy (R2) of the XGB model with effective reservoir thickness was about 10% higher than that of RM II.

4.2.3. Comparative analysis of ANN and XGB
    Table 8 expresses the comparative performance of ANN and XGB. XGB is preferred because it outperforms ANN in every evaluation metric for the training data and validation data sets. The two methods perform better in the training dataset when compared to the validation dataset. Different from ANN, the performance of the training data in XGB was significantly better than that of the validation data. This is because in ANN, a larger number of epochs can increase the learning and training accuracy, but at the same time it leads to an overfitting problem where the model learns a great degree of error or random noise within the training data and then its predictive power is reduced. To avoid overfitting and to find the best validation accuracy, the learning accuracy must be limited. Fig. 10 shows the cross plots of real oil production against predictions using the ANN model and XGB model. In Fig. 10a and b, the prediction of the ANN model performs well with training data and testing data, with R2 being 0.8790 and 0.7950, respectively. Fig. 10c and d provides the prediction performance of the XGB model. Higher values of R2 in the training set (0.9986) and testing set (0.8575) indicate the superiority of XGB in the prediction of production compared to ANN.

Table 8
Comparative performance of ANN and XGB.
  Metrics      Training           Testing
               ANN      XGB       ANN      XGB

    Since XGB performed better than ANN for prediction of oil production, the importance score of features in this case was calculated based on the prediction of XGB. Fig. 11 shows the importance score of different features using the feature importance method for the XGB, RM I and RM II models. All three models had the same top 3 important features. Ldo was the most valuable feature for the prediction of cumulative oil production; this cannot be used in traditional simulations. Day and Wr were the second and third most important features contributing to the prediction. In the XGB model and RM I, T and Tr were the fourth most important features and had very little difference. The feature To in RM II was the least important; it contributed less to the prediction of cumulative oil production compared to T and Tr. The result was consistent with the prediction performance of the XGB model and reference models.
    From the algorithm perspective, it makes sense that XGB performed better than ANN in these case studies. The two methods are important and widely used in data science research and by industry. These different machine learning methods perform differently for different types of tasks. ANN captures image, voice, text and other high-dimensional data by modeling a spatiotemporal location. The tree-based XGB handles tabular data well and has some features that ANN does not have, such as interpretability of a model, easier hyper-parameter tuning and a faster calculating speed. In this study, it became obvious that, compared to XGB, it is tough work to find the best construction of ANN without overfitting. The calculating speed of XGB was nearly 100 times faster than ANN in the process of production forecasting. XGB needed only 2 or 3 s, while ANN required several minutes. The difference in computational speed is attributed to: (1) using a backpropagation process, the convergence rate of ANN is particularly slow and it easily falls into a local minimum (Ren et al., 2020); (2) compared to ANN, XGB has a lower number of hyperparameters to be tuned; (3) sparsity-aware split finding of XGB makes it find the optimal direction, and only non-missing observations are visited; (4) cache-aware access and blocks for out-of-core computation make XGB fast. Although the computing speed of ANN is slower than XGB, both XGB and ANN models have much faster prediction speed when compared to traditional methods.
    In the case of production forecasting, ANN and XGB models show similar performance compared to their RM I. The results of reservoir identification from classification models are reliable and can be used in regression models for prediction of production. Compared with RM II, the established ANN and XGB models provide higher accuracy for predicting production. The combination of reservoir identification and production forecasting in this study was meaningful and valuable because the production was correlated with the thickness of effective reservoirs rather than with the overall reservoir thickness.

5. Conclusions

    This paper developed an integrated ML system, formed by two interconnected predictive models. It makes full use of historical data and solves reservoir identification and production forecasting problems, making the models faster and less labor-intensive than traditional methods.
    The results of reservoir identification revealed that ensemble techniques (RF, GBDT and XGB) perform better than single classifiers (LR, KNN, DT and ANN). The reservoir identification results of
  RMSE, ton             246.347           26.04          321.711         237.753           XGB were selected because of the outperformance of XGB in all
  MAE, ton              174.183           18.807         258.414         189.962           evaluation metrics in the 10-fold-cross-validation and test process
  R2                    0.879             0.999          0.795           0.857
                                                                                           when compared to the other methods. The prediction accuracy for
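The three metrics reported in Table 8 can be illustrated with a minimal sketch. It assumes the standard definitions of RMSE, MAE and R² (with R² computed as 1 - SS_res/SS_tot); the function names and the sample values are hypothetical and are not taken from the study's dataset.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error, in the unit of the target (here: ton)."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def mae(y_true, y_pred):
    """Mean absolute error, in the unit of the target (here: ton)."""
    n = len(y_true)
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n


def r2(y_true, y_pred):
    """Coefficient of determination, assumed here to be 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical cumulative oil production values in ton (illustrative only).
real = [100.0, 200.0, 300.0, 400.0]
pred = [110.0, 190.0, 310.0, 390.0]

print(rmse(real, pred), mae(real, pred), r2(real, pred))
```

Because RMSE and MAE carry the unit of the target, they are only comparable between models evaluated on the same wells, whereas R² is dimensionless.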
W. Liu, Z. Chen, Y. Hu et al.    Petroleum Science 20 (2023) 295–308
Fig. 10. Cross plots of real field cumulative oil production results vs. forecasts of established ANN model and XGB model: (a) Training set results of ANN model; (b) Testing set
results of ANN model; (c) Training set results of XGB model; (d) Testing set results of XGB model.
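The difference between training-set and testing-set fits in panels (a) to (d) relates to the overfitting discussion in Section 4.2.3, where the number of ANN epochs is limited to protect validation accuracy. One common way to do this is early stopping with a patience window. The sketch below is an assumption for illustration, not the procedure reported by the authors; the function name, the patience value and the validation RMSE curve are hypothetical.

```python
def early_stopping_epoch(val_losses, patience=3):
    """Return (best_epoch, stop_epoch), both 0-based.

    Training stops once the validation loss has not improved for
    `patience` consecutive epochs; the model from `best_epoch` is kept.
    Assumes `val_losses` is a non-empty sequence of per-epoch losses.
    """
    best_loss = float("inf")
    best_epoch = -1
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            # New best validation loss: remember where it occurred.
            best_loss, best_epoch = loss, epoch
        elif epoch - best_epoch >= patience:
            # No improvement for `patience` epochs: stop training here.
            return best_epoch, epoch
    # The curve ended before the patience window was exhausted.
    return best_epoch, len(val_losses) - 1


# Hypothetical validation RMSE per epoch (ton): improves, then degrades
# as the network starts to fit noise in the training data.
val_rmse = [400.0, 350.0, 330.0, 325.0, 328.0, 335.0, 345.0, 360.0]

best, stopped = early_stopping_epoch(val_rmse, patience=3)
print(best, stopped)
```

A larger patience tolerates noisier validation curves at the cost of extra training epochs.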
    reservoir characterization. Spec. Issue J. Nat. Gas. Sci. Eng. 25, 1561e1572.               Kamenski, A., Cvetkovi   c, M., Kolenkovic Mocilac, I., et al., 2020. Lithology prediction
    https://doi.org/10.1016/j.jngse.2015.02.012.                                                    in the subsurface by artificial neural networks on well and 3D seismic data in
Anifowose, F.A., Labadin, J., Abdulraheem, A., 2017. Ensemble machine learning: an                  clastic sediments: a stochastic approach to a deterministic method. Int. J. Geom.
    untapped modeling paradigm for petroleum reservoir characterization. J. Petrol.                 11, 8. https://doi.org/10.1007/s13137-020-0145-3.
    Sci. Eng. 151, 480e487. https://doi.org/10.1016/j.petrol.2017.01.024.                       Liu, W., Chen, Z., Hu, Y., 2022. XGBoost algorithm-based prediction of safety
Awoleke, O., Lane, R., 2011. Analysis of data from the Barnett Shale using conven-                  assessment for pipelines. Int. J. Pres. Ves. Pip. 197, 104655. https://doi.org/
    tional statistical and virtual intelligence techniques. SPE Reservoir Eval. Eng. 14             10.1016/j.ijpvp.2022.104655.
    (5), 544e556. https://doi.org/10.2118/127919-PA.                                            Merembayev, T., Yunussov, R., Yedilkhan, A., 2018. Machine learning algorithms for
Breiman, L., Friedman, J.H., Olshen, R.A., Stone, C.G., 1984. Classification and                     classification geology data from well logging. 2018 14th International Confer-
    regression trees. In: Hoecker, A. (Ed.), TMVAeToolkit for Multivariate Data                     ence on Electronics Computer and Computation (ICECCO), pp. 206e212. https://
    Analysis. Wadsworth International Group, Belmont, California, USA arXiv pre-                    doi.org/10.1109/ICECCO.2018.8634775.
    print physics/0703039.                                                                      Nash, J.E., Sutcliffe, J.V., 1970. River flow forecasting through conceptual models,
Breiman, L., 1996. Bagging predictors. Mach. Learn. 26 (2), 123e140. https://doi.org/               part I, A discussion of principles. J. Hydrol. 10, 282e290. https://doi.org/10.1016/
    10.1007/BF00058655.                                                                             0022-1694(70)90255-6.
Brown, I., Mues, C., 2012. An experimental comparison of classification algorithms               Priezzhev, I., Stanisalav, E., 2018. Application of Machine Learning Algorithms Using
    for imbalanced credit scoring data sets. Expert Syst. Appl. 39, 3446e3453.                      Seismic Data and Well Logs to Predict Reservoir Properties, vol. 1. European
    https://doi.org/10.1016/j.eswa.2011.09.033.                                                     Association of Geoscientists & Engineers, pp. 1e5. https://doi.org/10.3997/2214-
Chaki, S., Routray, A., Mohanty, W.K., 2018. Well-log and seismic data integration for              4609.201800920.
    reservoir characterization: a signal processing and machine-learning perspec-               Radford, D.D.G., Cracknell, M.J., Roach, M.J., Cumming, G.V., 2018. Geological map-
    tive. IEEE Signal Process. Mag. 35 (2), 72e81. https://doi.org/10.1109/                         ping in western Tasmania using radar and random forests. IEEE J. Sel. Top. Appl.
    MSP.2017.2776602.                                                                               Earth Obs. Rem. Sens. 11 (9), 3075e3087, September. https://doi.org/10.1109/
Chen, T., Guestrin, C., 2016. Xgboost: a scalable tree boosting system. Proc. 22nd                  JSTARS.2018.2855207.
    ACM SIGKDD Int. Conf. Knowl. Discov. Data Min 785e794. https://doi.org/                     Raeesi, M., Moradzadeh, A., Ardejani, F.D., Rahimi, M., 2012. Classification and
    10.1145/2939672.299785.                                                                         identification of hydrocarbon reservoir lithofacies and their heterogeneity using
Chakra, N.C., Song, K.-Y., Gupta, M.M., Saraf, D.N., 2013. An innovative neural fore-               seismic attributes, logs data and artificial neural networks. J. Petrol. Sci. Eng. 82,
    cast of cumulative oil production from a petroleum reservoir employing higher-                  151e165. https://doi.org/10.1016/j.petrol.2012.01.012.
    order neural networks (HONNs). J. Petrol. Sci. Eng. 106, 18e33. https://doi.org/            Raschka, S., 2015. Python Machine Learning. Packt Publishing Ltd.
    10.1016/j.petrol.2013.03.004.                                                               Ren, X., Hou, J., Song, S., Liu, Y., Chen, D., Wang, X., Dou, L., 2019. Lithology identi-
Cox, D.R., 1958. The regression analysis of binary sequences (with discussion). J. Roy.             fication using well logs: a method by integrating artificial neural networks and
    Stat. Soc. B 20, 215e242. https://doi.org/10.1111/j.2517-6161.1958.tb00292.x.                   sedimentary patterns. J. Petrol. Sci. Eng. 182, 106336. https://doi.org/10.1016/
Cracknell, M., Reading, A., 2012. Machine Learning for Lithology Classification and                  j.petrol.2019.106336.
    Uncertainty Mapping. AGU Fall Meeting Abstracts, p. 1511.                                   Ren, Y., Mao, J., Zhao, H., Zhou, C., Gong, X., Rao, Z., Wang, Q., Zhang, Y., 2020.
Friedman, J.H., 2001. Greedy function approximation: a gradient boosting machine.                   Prediction of aerosol particle size distribution based on neural network. Adv.
    Ann. Stat. 29 (5), 1189e1232. http://www.jstor.org/stable/2699986.                              Meteorol. https://doi.org/10.1155/2020/5074192.
Guo, Z., Wang, H., Kong, X., Shen, L., Jia, Y., 2021. Machine learning-based production         Rodríguez, H.M., Escobar, E., Embid, S., Morillas, N.R., Hegazy, M., Larry, W.L., 2014.
    prediction model and its application in Duvernay formation. Energies 14 (17),                   New approach to identify analogous reservoirs. SPE Econ & Mgmt 6, 173e184.
    5509. https://doi.org/10.3390/en14175509.                                                       https://doi.org/10.2118/166449-PA.
Harris, J.R., Grunsky, E.C., 2015. Predictive lithological mapping of Canada's North            Siddiqi, S.S., Andrew, K.W., 2002. A study of water coning control in oil wells by
    using random forest classification applied to geophysical and geochemical data.                  injected or natural flow barriers using scaled physical model and numerical
    Comput. Geosci. 80, 9e25. https://doi.org/10.1016/j.cageo.2015.03.013.                          simulator. In: SPE Annual Technical Conference and Exhibition. https://doi.org/
Helmy, T., Rahman, S.M., Hossain, M.I., Abdelraheem, A., 2013. Non-linear hetero-                   10.2118/77415-MS.
    geneous ensemble model for permeability prediction of oil reservoirs. Arabian J.            Van, S.L., Chon, B.H., 2018. Effective prediction and management of a CO2 flooding
    Sci. Eng. 38, 1379e1395. https://doi.org/10.1007/s13369-013-0588-z.                             process for enhancing oil recovery using artificial neural networks. J. Energy
Hossin, M., Sulaiman, M.N., 2015. A review on evaluation Metrics for data classifi-                  Resour. Technol. 140 (3), 032906. https://doi.org/10.1115/1.4038054.
    cation evaluations. International Journal of Data Mining & Knowledge Man-                   You, J., Ampomah, W., Kutsienyo, E.J., Sun, Q., Balch, R.S., Aggrey, W.N., Cather, M.,
    agement Process (IJDKP) 5 (2).                                                                  2019. Assessment of enhanced oil recovery and CO2 storage capacity using
Ho, T.K., 1998. The random subspace method for constructing decision forests. IEEE                  machine learning and optimization framework. SPE Europec featured at 81st
    Trans. Pattern Anal. Mach. Intell. 20 (8), 832e844. https://doi.org/10.1109/                    EAGE Conference and Exhibition. https://doi.org/10.2118/195490-MS.
    34.709601.
308