SPE-214881-MS
A Comparative Study of Deep Learning Models and Traditional Methods in
Forecasting Oil Production in the Volve Field
Downloaded from http://onepetro.org/SPEATCE/proceedings-pdf/23ATCE/3-23ATCE/D031S032R003/3301078/spe-214881-ms.pdf/1 by Petrobras user on 28 June 2024
Z. H. Alali and R. N. Horne, Stanford University
Copyright 2023, Society of Petroleum Engineers DOI 10.2118/214881-MS
This paper was prepared for presentation at the 2023 SPE Annual Technical Conference and Exhibition held in San Antonio, Texas, USA, 16 - 18 October 2023.
This paper was selected for presentation by an SPE program committee following review of information contained in an abstract submitted by the author(s). Contents
of the paper have not been reviewed by the Society of Petroleum Engineers and are subject to correction by the author(s). The material does not necessarily reflect
any position of the Society of Petroleum Engineers, its officers, or members. Electronic reproduction, distribution, or storage of any part of this paper without the written
consent of the Society of Petroleum Engineers is prohibited. Permission to reproduce in print is restricted to an abstract of not more than 300 words; illustrations may
not be copied. The abstract must contain conspicuous acknowledgment of SPE copyright.
Abstract
Accurate well rate forecasting is essential for successful field development in the oil industry. Recurrent-
based deep learning models have traditionally been used for production forecasting. However, recent
advancements in the field have led to the use of transformer and transfer learning to overcome the need
for large amounts of data. This paper introduces a novel approach to using state-of-the-art deep learning
algorithms for oil production forecasting.
To enhance the accuracy of oil rate predictions in the Norwegian Volve field, a combination of statistical
models and cutting-edge deep learning models was utilized. These models included Autoregressive
Integrated Moving Average (ARIMA), Block Recurrent Neural Network (BlockRNN), Temporal Fusion
Transformer (TFT), and the Neural Basis Expansion Analysis for Interpretable Time Series Forecasting (N-
BEATS) using meta learning. The models used multivariate real-time historical data, such as bottomhole
pressure, wellhead pressure, wellhead temperature, and choke size, as input features to predict the oil rate of
two wells. The models were trained on 85% of the data and tested on the remaining 15%, with the advanced
models TFT and N-BEATS being compared to the conventional models in terms of prediction performance.
The complex production data used in this forecasting problem showed no clear trends or seasonality.
The state-of-the-art deep learning models, the Temporal Fusion Transformer (TFT) and the Neural Basis
Expansion Analysis for Interpretable Time Series (N-BEATS), outperformed other models in accuracy of
forecast. The TFT model was able to significantly minimize the testing Mean Squared Error (MSE) for
wells F-11H and F-12H. Additionally, the model predicted a range of uncertainty between the 10th and 90th
quantiles to consider the variability in the blind test intervals. The N-BEATS meta learning model was better
at capturing dynamic time series features from the M4 dataset and applying that knowledge to predict oil
rates in the Volve field, without any input variables like reservoir pressure. The N-BEATS approach was
superior to all other models in terms of the difference between the forecast and actual rate, resulting in a
mean squared error of 0.02 for well F-12 and 0.05 for well F-11.
Our work, to the best of our knowledge, presents a novel implementation of a new model and evaluates
the efficiency of deep learning models in forecasting oil production compared to conventional methods.
Previously, machine learning and deep learning techniques in the petroleum sector mainly utilized historical
field data for their predictions. However, our study highlights the potential of meta learning and the N-
BEATS model in greenfield or newly developed areas where historical data are scarce. Additionally, the
TFT probabilistic deep learning model showed outstanding results, outperforming traditional models, and
providing a range of forecast uncertainty, which is very useful in making well-informed decisions in field
development.
Overview of Volve Field
The Volve field is an offshore oil field located in the North Sea on the Norwegian continental shelf,
200 kilometers west of Stavanger. To maintain pressure, the primary method of recovery employed
water injection wells, located at the flanks. Additional injection and production wells were drilled during
2012-2013, resulting in an improved recovery rate and extended field life.
The field had a total of 7 active wells, 3 injectors and 4 producers during its operation period from
February 2008 to September 2016. After 8.5 years of operation, with a cumulative production of 63 million
barrels of oil and a recovery rate of 54%, Equinor and the Volve license partners made all subsurface and
production datasets from the field publicly available. These datasets incorporate a wide range of time series
data, including oil and water rates, well trajectories, well logs, bottom hole pressure, wellhead pressure,
wellhead temperature, choke size opening, water cut, total oil rate, and total water rate. These datasets cover
the entire operational period of the field, from 2008 to 2016, and can be utilized to support further research
and facilitate learning (Samo, 2020; Sen & Ganguli, 2019).
Methodology
In this study, we used a statistical model, Autoregressive Integrated Moving Average (ARIMA), and deep
learning models known for their effectiveness in handling temporal dependencies within time series data.
Specifically, we trained the Block Recurrent Neural Network (BlockRNN), Temporal Fusion Transformer
(TFT), and Neural Basis Expansion Analysis for Interpretable Time Series Forecasting (N-BEATS)
algorithms to predict the oil production rates of two wells, namely 15/9 F-12 and 15/9 F-11, located in the
Volve field.
Data Preprocessing
The research utilized data collected from the Volve field, which consisted of daily measurements from
2008 to 2016. The production periods for the F-12 and F-11 wells varied. The F-12 well data covered the
entire period from 2008 to 2016, while the F-11 production data were available from 2013 to 2016 only. To
evaluate the models' ability to predict production rates from noisy data, we used the raw data without
filtering or smoothing. The raw data did contain missing values, which were filled using forward linear
interpolation. This method estimates each missing value along the straight line joining the nearest known
data points on either side of the gap, which keeps the time steps consistent. The F-11 and F-12 wells had
fewer missing values than the other wells (Fig. 1).
Figure 1—Heatmap showing the missing values in Volve Field dataset (all wells).
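The gap-filling step described above can be illustrated with a minimal sketch. It assumes missing measurements are stored as None and that only gaps with a known value on both sides are filled; the function name interpolate_linear and the sample rates are hypothetical and are not taken from the Volve dataset.

```python
from typing import List, Optional


def interpolate_linear(values: List[Optional[float]]) -> List[Optional[float]]:
    """Fill interior gaps (None) by linear interpolation between known neighbours.

    Leading and trailing gaps are left as None because they lack a known
    value on one side.
    """
    filled = list(values)
    known = [i for i, v in enumerate(values) if v is not None]
    # Walk over each pair of consecutive known points and fill what lies between.
    for left, right in zip(known, known[1:]):
        gap = right - left
        for k in range(1, gap):
            weight = k / gap
            filled[left + k] = values[left] + weight * (values[right] - values[left])
    return filled


# Hypothetical daily rates with two gaps (illustrative values only).
rates = [100.0, None, None, 130.0, 128.0, None, 120.0]
print(interpolate_linear(rates))
```

Each filled value lies on the straight line between its two known neighbours, so a gap of several days is filled with evenly spaced values rather than a single repeated average.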
Data Exploration
Before proceeding with the modeling phase, it was important to investigate the associations between the
input features and the target variable. This inspection enabled the recognition of correlations among different
variables, thus providing a comprehensive understanding of the dataset's abundance and coverage. To
visually represent these associations, we produced heat maps using Pearson correlation coefficients for
every pair of features and feature-target combinations.
Pearson's correlation quantifies the linear relationship between two variables. High coefficient values
usually indicate a strong positive or negative correlation, while lower coefficients suggest a weaker
correlation. As shown in Fig. 2, the correlation between F-12 WHP and F-12 BHP is not significant, which
is likely attributable to frozen data in F-12 BHP. Conversely, the map reveals strong correlations between
F-12 WHP and the oil rate, as well as between F-12 choke size and the F-12 water rate.
Figure 2—Heat map of Pearson's correlation coefficients
For the second well, F-11 BHP demonstrated a high correlation with F-11 WHP, unlike F-12. Moreover,
F-11 WHT had a robust correlation with the F-11 oil rate. When two features exhibit strong correlation, it is
crucial to exclude one of them during the modeling stage. This is due to the fact that the presence of highly
correlated features may result in multicollinearity, thereby increasing the model's variance and reducing its
interpretability. Furthermore, such features can negatively impact the model's performance and complicate
the task of determining the authentic relationship between the features and the target variable.
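As a rough illustration of the correlation screening described above, the sketch below computes Pearson's coefficient for a pair of series. The function name pearson and the sample values are hypothetical. Returning NaN for a constant series is an assumption made here to show why a frozen sensor reading cannot yield a meaningful coefficient.

```python
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equally long series."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("series must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if spread_x == 0 or spread_y == 0:
        # A constant ("frozen") series has zero variance, so the
        # coefficient is undefined.
        return float("nan")
    return cov / (spread_x * spread_y)


# Illustrative values only, not Volve measurements.
whp = [30.0, 29.0, 27.5, 26.0]
oil_rate = [900.0, 870.0, 820.0, 790.0]
frozen_bhp = [250.0, 250.0, 250.0, 250.0]
print(pearson(whp, oil_rate), pearson(whp, frozen_bhp))
```

Applying such a function to every pair of columns gives the matrix that a correlation heat map displays.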
To thoroughly analyze the statistical distribution of our dataset, we utilized boxplots for the features
associated with both wellbores. As seen in Fig. 3, the data show a considerable amount of skewness and
variation contingent upon the specific wellbore. To reduce the effect of this skewness and variation, we
fitted standard scaling on the training data, which rescales each feature to zero mean and unit variance
(the rescaling is linear, so it does not by itself change the shape of the distribution). We then applied
the same fitted scaling to the testing dataset to maintain uniformity in the data distribution.
Figure 3—Box plots of wells F-12 and F-11 showing the statistical distribution.
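The scaling procedure described above (fit on the training data, then reuse the same parameters on the test data) can be sketched as follows. The helper names fit_scaler and apply_scaler are hypothetical, and the use of the population standard deviation is an assumption.

```python
import math
from typing import List, Sequence, Tuple


def fit_scaler(train: Sequence[float]) -> Tuple[float, float]:
    """Return the mean and (population) standard deviation of the training data."""
    mean = sum(train) / len(train)
    variance = sum((v - mean) ** 2 for v in train) / len(train)
    std = math.sqrt(variance)
    if std == 0:
        raise ValueError("cannot scale a constant feature")
    return mean, std


def apply_scaler(values: Sequence[float], mean: float, std: float) -> List[float]:
    """Scale values with parameters that were fitted on the training data only."""
    return [(v - mean) / std for v in values]


# Illustrative values only.
train = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
test = [7.0, 3.0]
mean, std = fit_scaler(train)
print(apply_scaler(test, mean, std))
```

Fitting on the training portion alone avoids leaking information from the test interval into the model inputs.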
Data Split and Model Evaluation
In our research, we partitioned the dataset into two sections, one for training, comprising 85% of the data,
and one for testing, encompassing the remaining 15%. Our model's training process used historical oil rate,
bottom hole pressure, wellhead pressure, wellhead temperature, and water rate data to predict the upcoming
time step for oil rate. The best model was achieved when wellhead pressure was applied as both past and
future covariates.
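A minimal sketch of the 85/15 partition is given below, assuming the split is chronological (the earliest 85% of the time steps for training, the latest 15% for testing), which is the usual choice for time series. The function name chronological_split is hypothetical.

```python
from typing import List, Sequence, Tuple


def chronological_split(
    series: Sequence[float], train_fraction: float = 0.85
) -> Tuple[List[float], List[float]]:
    """Split a time-ordered series into an early training part and a late test part."""
    if not 0 < train_fraction < 1:
        raise ValueError("train_fraction must be between 0 and 1")
    cut = round(len(series) * train_fraction)
    # No shuffling: the test set is strictly later in time than the training set.
    return list(series[:cut]), list(series[cut:])


train, test = chronological_split(list(range(20)))
print(len(train), len(test))
```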
For the prediction process, we used the sliding window technique, characterized by two parameters, the
window size and the forecast horizon. The window size refers to the count of preceding time steps that the
model will take into consideration when making a prediction, while the forecast horizon specifies the time
step to be predicted.
Let us consider an example displayed in Fig. 4. Suppose we have a window size of three and a forecast
horizon of one. The model will then take into account the prior three time steps to predict the subsequent
fourth step. After making this prediction, it will progress one step further, consider the next three time steps
and predict the fifth one. This process continues iteratively until all time series data have been segmented
and assessed (Al-Ali & Horne, 2023a; Hota et al., 2017; Lesti & Spiegel, 2017).
Figure 4—Sliding window technique.
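The sliding window segmentation described above can be sketched in a few lines. The function name sliding_windows is hypothetical; the example uses a window size of three and a forecast horizon of one, as in the text.

```python
from typing import List, Sequence, Tuple


def sliding_windows(
    series: Sequence[float], window_size: int, horizon: int = 1
) -> List[Tuple[List[float], float]]:
    """Return (window, target) pairs; the target lies `horizon` steps after the window."""
    pairs = []
    last_start = len(series) - window_size - horizon
    for start in range(last_start + 1):
        window = list(series[start:start + window_size])
        target = series[start + window_size + horizon - 1]
        pairs.append((window, target))
    return pairs


# Window size 3, horizon 1: steps 1-3 predict step 4, steps 2-4 predict step 5, ...
for window, target in sliding_windows([1, 2, 3, 4, 5, 6], window_size=3):
    print(window, "->", target)
```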
Forecasting Method
In our research, we employed two primary techniques for time series forecasting: Historical Forecasting
and Regression Prediction. Although these methods are both used for time series forecasting, they operate
on different principles. In Historical Forecasting, we take advantage of actual data points to make our
predictions. For instance, when utilizing the sliding window technique with a window size of six, the
seventh point is predicted using the actual values of the preceding six points as the window progresses.
In the Regression Prediction method, we use previously forecasted values for further predictions. In this
scenario, the model's predicted value at the sixth point serves as the basis for predicting the seventh point.
For this study, we applied the Historical Forecasting method in the Neural Basis Expansion Analysis
Time Series (N-BEATS) model. On the other hand, we applied the Regression Prediction method in the
Autoregressive Integrated Moving Average (ARIMA), Block Recurrent Neural Network (BlockRNN), and
Temporal Fusion Transformer (TFT) models.
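The difference between the two forecasting modes can be shown with a toy example. The sketch below uses a stand-in model that simply averages its input window; mean_model, historical_forecast, and recursive_forecast are hypothetical names, and none of this reproduces the models used in the study.

```python
from typing import Callable, List, Sequence

Model = Callable[[Sequence[float]], float]


def mean_model(window: Sequence[float]) -> float:
    """Stand-in model (assumption): predicts the mean of its input window."""
    return sum(window) / len(window)


def historical_forecast(
    series: Sequence[float], window_size: int, start: int, model: Model
) -> List[float]:
    """Predict each point from the ACTUAL values that precede it."""
    return [model(series[t - window_size:t]) for t in range(start, len(series))]


def recursive_forecast(
    history: Sequence[float], window_size: int, steps: int, model: Model
) -> List[float]:
    """Predict several steps ahead, feeding each prediction back as an input."""
    buffer = list(history[-window_size:])
    predictions = []
    for _ in range(steps):
        next_value = model(buffer)
        predictions.append(next_value)
        buffer = buffer[1:] + [next_value]  # the forecast replaces the oldest value
    return predictions


series = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
print(historical_forecast(series, window_size=2, start=4, model=mean_model))
print(recursive_forecast(series[:4], window_size=2, steps=2, model=mean_model))
```

In the recursive mode, errors in early predictions propagate into later ones, which is why the two modes can give different results for the same model.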
Statistical Modeling Using Autoregressive Integrated Moving Average
(ARIMA)
The Autoregressive Integrated Moving Average (ARIMA) is one of the most popular statistical methods for time
series forecasting. ARIMA combines two simpler components, as per Eq. 1: the AutoRegressive (AR)
model, which predicts the future using past information, and the Moving Average (MA) model, which
forecasts based on previous forecasting errors.
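The general formula for ARIMA models, written for the series x(t) after it has been differenced d times, is as follows:

$$x(t) = \sum_{i=1}^{p} a_i \, x(t-i) + \sum_{i=1}^{q} \beta_i \, \varepsilon(t-i) + \varepsilon(t) \tag{1}$$
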
This equation comprises two main parts. The first part represents the autoregressive model, and the second
part represents the moving average model. Table 1 summarizes the definition of each parameter in the
ARIMA model.
Table 1—Parameters and Definitions for ARMA Model Components
Parameters Definitions
p The order of the AR part, i.e., the number of past values used in the prediction.
a_i The autoregressive coefficients, which determine how much each past term influences the current term.
x(t−i) The observed value i time steps before the current one.
q The order of the MA part, i.e., the number of past forecast errors used.
β_i The MA coefficients that the model learns. They determine how much each past forecast error influences the current term.
ε(t−i) The past forecast errors.
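To make the roles of the parameters in Table 1 concrete, the sketch below evaluates a one-step prediction as the sum of an AR part and an MA part, after first-order differencing (d = 1). The coefficients, errors, and rates are made-up illustrative numbers, and the function names are hypothetical; in practice the coefficients are estimated by a statistical library rather than set by hand.

```python
from typing import List, Sequence


def difference(series: Sequence[float]) -> List[float]:
    """First-order differencing (the 'I' part with d = 1)."""
    return [b - a for a, b in zip(series, series[1:])]


def arma_one_step(
    recent_values: Sequence[float],
    recent_errors: Sequence[float],
    ar_coeffs: Sequence[float],
    ma_coeffs: Sequence[float],
) -> float:
    """One-step prediction: sum of a_i * x(t-i) plus sum of beta_i * eps(t-i).

    Both input sequences are ordered most-recent first, so index 0 is t-1.
    """
    if len(recent_values) < len(ar_coeffs) or len(recent_errors) < len(ma_coeffs):
        raise ValueError("not enough history for the requested orders p and q")
    ar_part = sum(a * x for a, x in zip(ar_coeffs, recent_values))
    ma_part = sum(b * e for b, e in zip(ma_coeffs, recent_errors))
    return ar_part + ma_part


# Illustrative numbers only (p = 2, d = 1, q = 1).
rates = [10.0, 9.0, 8.5, 8.0]
diffs = difference(rates)                      # work on the differenced series
recent_first = list(reversed(diffs))           # most recent difference first
predicted_diff = arma_one_step(recent_first, [0.1], [0.5, 0.2], [0.3])
next_rate = rates[-1] + predicted_diff         # undo the differencing
print(next_rate)
```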
In this study, the hyperparameters, such as the autoregressive parameter (p), integrated parameter (d),
and moving averages (q), were determined by performing a grid search. The search was aimed at finding
the optimal parameters that minimized the Mean Absolute Percentage Error (MAPE). For each well, we
identified different optimal parameters. For instance, for well F-11, the optimal parameters were p=5, d=1,
q=1, while for well F-12, they were p=10, d=1, q=1. The data did not show any clear seasonality or
trend; seasonality can be checked with an autocorrelation plot, which shows how strongly the series
correlates with lagged copies of itself. In our implementation, ARIMA uses features such as pressure,
temperature, and choke size as future covariates to predict the oil rate. For well F-11, for example,
the five previous oil rate time steps are used to predict the sixth time step, with the window shifting
continuously until the end of the prediction period (Ojedapo et al., 2022; Olominu & Sulaimon, 2014).
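The grid search over (p, d, q) can be sketched generically as below. The scoring function toy_score is a deliberately artificial stand-in that does not fit any model; in a real workflow it would fit an ARIMA model with the given orders and return the MAPE on held-out data. The names grid_search and toy_score and the candidate values are assumptions for illustration.

```python
from itertools import product
from typing import Callable, Dict, Sequence, Tuple


def grid_search(
    param_grid: Dict[str, Sequence[int]],
    score_fn: Callable[..., float],
) -> Tuple[Dict[str, int], float]:
    """Evaluate every parameter combination and keep the one with the lowest score."""
    names = list(param_grid)
    best_params: Dict[str, int] = {}
    best_score = float("inf")
    for combo in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, combo))
        score = score_fn(**params)
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_score(p: int, d: int, q: int) -> float:
    """Artificial stand-in for 'fit ARIMA(p, d, q) and return the MAPE'."""
    return abs(p - 5) + abs(d - 1) + abs(q - 1)


grid = {"p": [1, 5, 10], "d": [0, 1, 2], "q": [0, 1, 2]}
print(grid_search(grid, toy_score))
```

Because every combination is evaluated, the cost grows with the product of the candidate list lengths, which is one reason the search ranges are usually kept small.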
However, ARIMA did not perform well in predicting oil rates. As a result, we made several attempts
to improve the ARIMA performance. The first attempt involved transforming the data logarithmically to
promote stationarity, which can be checked with a statistical test. The second attempt focused on excluding
the last 500 timesteps. This approach made the data easier to predict, because removing the last 500 points
left a consistent decreasing trend. The following sections benchmark the performance of the ARIMA model
when deployed on the full dataset without transformation, the logarithmically transformed dataset and the
truncated dataset.
ARIMA Performance on the Full Dataset
85% of the data was allocated for training, while the remaining 15% was used for testing. Fig. 5 and Fig.
6 illustrate the prediction results for well F-12 and F-11 using the complete dataset with varied inputs
such as bottomhole pressure, wellhead pressure, temperature, water rate, and choke size. It was observed
that employing wellhead pressure and water rate as future covariates yielded the most accurate predictions
compared to other features. The model could not accurately capture the decreasing trend in the oil rate,
and it produced unrealistic results because of the frozen bottomhole pressure data.
Figure 5—F-12, ARIMA oil rate prediction results using future covariates (full dataset). The forecasted
oil rate is represented in black, the original oil rate in green, and the future covariates in red.
Figure 6—F-11, ARIMA oil rate prediction results using future covariates (full dataset). The forecasted
oil rate is represented in black, the original oil rate in green, and the future covariates in red.
Various error metrics were utilized to assess the model's performance. Mean absolute error proved to be
the best error metric for noisy time series data. ARIMA performed best with short-term predictions and
required stationary data. The initial attempt to use the data and the optimal parameters resulting from the
grid search did not yield satisfactory results. Consequently, a second approach was employed involving
the use of a logarithmic transformation to promote stationarity. A third attempt involved utilizing a truncated
dataset, whereby the last significant change in trend was omitted. The data were tested over 2000 timesteps
where the trend in training and testing was consistent, thus making predictions easier.
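For reference, three of the error metrics reported in this paper can be computed as in the sketch below. The SMAPE shown here uses one common convention (a percentage between 0 and 200); whether the reported SMAPE values follow exactly this convention is an assumption. The sample values are illustrative.

```python
import math
from typing import Sequence


def mae(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean squared error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def smape(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Symmetric mean absolute percentage error, in percent (0 to 200)."""
    total = 0.0
    for a, p in zip(actual, predicted):
        denominator = abs(a) + abs(p)
        if denominator > 0:  # a pair of zeros contributes no error
            total += abs(a - p) / denominator
    return 200.0 * total / len(actual)


actual = [1.0, 2.0, 3.0]
predicted = [1.0, 2.0, 5.0]
print(mae(actual, predicted), rmse(actual, predicted), smape(actual, predicted))
```

RMSE squares the errors before averaging, so a few large misses in a noisy series weigh more heavily on it than on MAE.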
Logarithmically Transformed Dataset Results
A logarithmic transformation was applied to the same dataset to enforce stationarity. The results obtained
from this transformed dataset were better than those from the untransformed dataset. However, the model
still did not fully capture the actual oil rate data. Notably, when using the wellhead pressure as a future
covariate, the prediction performed better than with the untransformed dataset. Fig. 7 and Fig. 8 show the
results for the oil rate forecasting using the logarithmically transformed dataset.
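A logarithmic transformation and its inverse can be sketched as follows. The offset of 1.0 is an assumption introduced here so that zero rates (for example, shut-in days) remain defined; the paper does not state how zeros were handled. The function names are hypothetical.

```python
import math
from typing import List, Sequence


def log_transform(series: Sequence[float], offset: float = 1.0) -> List[float]:
    """Apply log(x + offset); the offset (an assumption) keeps zero rates defined."""
    return [math.log(value + offset) for value in series]


def inverse_log_transform(series: Sequence[float], offset: float = 1.0) -> List[float]:
    """Map model outputs in log space back to the original units."""
    return [math.exp(value) - offset for value in series]


# Illustrative values only.
rates = [0.0, 9.0, 99.0]
logged = log_transform(rates)
print(logged)
print(inverse_log_transform(logged))
```

The transformation compresses large values more than small ones, which tends to stabilize the variance of a declining rate series before a model is fitted.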
Figure 7—ARIMA oil rate prediction results for F-12 using future covariates (logarithmic transformed dataset).
The forecasted oil rate is represented in black, the original oil rate in green, and the future covariates in red.
Figure 8—ARIMA oil rate prediction results for F-11 using future covariates (logarithmic transformed dataset).
The forecasted oil rate is represented in black, the original oil rate in green, and the future covariates in red.
Table 2 and Table 3 display the corresponding error metrics.
Table 2—The error metrics for well F-12 using the ARIMA model.
Feature                        MAE    SMAPE   Coefficient of Variation   MARRE   RMSE
F-12 BHP                       0.18   21.83   201.51                     20.96   1.60
F-12 WHP                       0.05   22.37   10.10                      6.05    0.08
F-12 WHT                       0.25   24.94   86.14                      29.76   0.68
F-12 Water Rate                0.07   24.91   9.59                       8.16    0.08
F-12 Chz (choke size)          0.12   28.15   17.98                      14.06   0.14
Total water                    0.15   27.73   32.79                      18.02   0.26
Table 3—The error metrics for well F-11 using the ARIMA model.
Feature                        MAE    SMAPE   Coefficient of Variation   MARRE   RMSE
F-11 BHP                       0.06   8.13    14.35                      6.28    0.13
F-11 WHP                       0.05   7.69    14.86                      5.93    0.14
F-11 WHT                       0.08   10.21   15.72                      8.49    0.14
F-11 Water Rate                0.04   7.40    4.53                       4.05    0.04
F-11 Chz (choke size)          0.04   8.68    7.30                       4.50    0.07
Total Oil                      0.03   5.74    13.47                      3.05    0.12
Total water                    0.05   7.40    13.07                      5.43    0.12
Truncated Dataset Results
The dataset shows a trend in the last 500 timesteps of the prediction period that is different from the earlier
data (in the training set). Ideally, using future covariates, we would expect the model to account for changes
in the rate due to factors such as decreasing pressure, changes in the opening size, or decrease in water rate.
However, ARIMA was not able to accurately incorporate these changes in the prediction phase. Instead,
it largely followed the future feature while predicting the oil rate, failing to capture the changes affecting
oil production. As an attempt to improve this performance, the last 500 data points were removed, and
the model was retrained and tested using the remaining data points. As shown in Fig. 9 and Fig. 10, this
approach produced a forecast that followed the actual oil rate more closely, thus demonstrating improved
performance. Nevertheless, the model occasionally produced negative oil rate forecasts.
ARIMA can produce negative forecasts for a variety of reasons. Being designed specifically for stationary
data, ARIMA may behave unpredictably and yield negative results if the data are not properly transformed.
In addition, incorrect selection of optimal parameters could lead the model to produce biased forecasts.
A comprehensive grid search was conducted, and logarithmic transformation was applied to the data.
However, we still encountered negative predictions in our forecast. This indicates that ARIMA might not
be the most suitable model for this specific dataset, or there may be other underlying factors or complexities
within the data that ARIMA is not capable of adequately capturing. These could include nonlinear trends
or dependencies that are beyond the model's capacity.
Figure 9—ARIMA F-12 oil rate prediction results using the truncated dataset with future covariates. The
forecasted oil rate is represented in black, the original oil rate in green, and the future covariates are shown in red.
Figure 10—ARIMA F-11 oil rate prediction results using the truncated dataset with future covariates. The
forecasted oil rate is represented in black, the original oil rate in green, and the future covariates are shown in red.
Table 4—The error metrics for well F-12 using the ARIMA model.
Feature                        MAE    SMAPE   Coefficient of Variation   MARRE   RMSE
F-12 BHP                       0.02   37.49   10.25                      21.01   0.04
F-12 WHP                       0.02   35.85   12.35                      22.44   0.04
F-12 WHT                       0.04   36.78   23.02                      33.51   0.08
F-12 Water Rate                0.04   57.23   14.05                      39.23   0.05
F-12 Chz (choke size)          0.07   70.92   28.88                      69.84   0.10
Total Water (F-11 and F-12)    0.04   57.42   14.20                      39.63   0.05
Table 5—The error metrics for well F-11 using the ARIMA model.
Feature                        MAE    SMAPE   Coefficient of Variation   MARRE   RMSE
F-11 BHP                       0.11   14.85   30.52                      15.32   0.22
F-11 WHP                       0.09   12.62   29.17                      12.23   0.21
F-11 WHT                       0.11   14.49   31.18                      14.90   0.22
F-11 Water Rate                0.06   9.31    12.39                      8.48    0.09
F-11 Chz (choke size)          0.06   8.36    12.30                      7.75    0.09
Total water                    0.07   10.12   13.75                      9.37    0.10
Deep Learning Modeling
In our research, we tried a range of deep learning models specifically designed for handling time series data,
with the goal of predicting oil rates for the two wells. These models, characterized by their ability to manage
sequence dependencies, include the Block Recurrent Neural Network (BlockRNN), the Temporal Fusion
Transformer (TFT), and the Neural Basis Expansion Analysis Time Series (N-BEATS). Each model offers a
unique approach to handling sequential data. The BlockRNN is a traditional recurrent-based neural network
that uses memory to understand and predict new data. In contrast, the TFT, a transformer-based model, relies
on multihead self-attention within both the encoder and decoder. This mechanism allows it to selectively
focus on specific parts of the input sequence, thereby effectively managing long-term dependencies in
the data. The N-BEATS model, on the other hand, employs fully connected multilayer blocks to process
sequential data. Each block is designed to capture different patterns within the sequence, and the final
prediction is made based on the outputs of these blocks (Al-Ali & Horne, 2023a; Cheng et al., 2016; Vaswani
et al., 2017).
In our initial approach, we trained the dataset using the BlockRNN model to predict the oil rates for
both wells, F-11 and F-12. We faced considerable challenges due to the data's complexity, which exhibited
no clear trend or seasonality and was highly noisy. As a result, fitting the BlockRNN model proved
challenging, necessitating numerous iterations and meticulous tuning of hyperparameters. Despite these
extensive efforts, the model's performance on the test dataset fell short of expectations.
In our second approach, we tried the Temporal Fusion Transformer (TFT) model. The TFT model
amalgamates the mechanisms used in BlockRNN layers, but adds the attention heads of transformers, and
the Gated Residual Network (GRN) to learn relationships along the time axis. The TFT model outperformed
the BlockRNN model, requiring less hyperparameter tuning and demonstrating an ability to handle noisy
data more efficiently. TFT also offered the added benefit of providing a probabilistic forecast.
In our third approach, we tried the N-BEATS model using a transfer learning technique. This model
had been pretrained on the M4 dataset, which is composed of 100,000 diverse time series collected
from various real-world sectors, such as finance, economics, and industry, as shown in Fig. 11. These time
series differ in their frequencies and lengths, which makes the dataset an ideal source of diversity. This
extensive dataset facilitated the model's ability to make predictions without dependency on any specific
features. Instead, the model learned to identify and interpret various trends, frequencies, and noise patterns
inherent in the M4 data. For our purpose, the pretrained model employed only the oil rate data to derive
predictions (Al-Ali & Horne, 2023a, 2023b).
Figure 11—M4 dataset used for training the N-BEATS model.
Block Recurrent Neural Network (BlockRNN)
The BlockRNN model was built from stacked LSTM layers followed by fully connected layers. To
determine the optimal set of hyperparameters for the model, we utilized a grid search technique. This
strategy examined various parameter combinations, such as the number of layers, the number of units per
layer, batch size, and learning rate to minimize the mean squared error loss. As demonstrated in Table 6
and Fig. 12, the fine-tuned model featured two LSTM layers with 1000 units each, followed by three
fully connected layers of sizes 512, 512, and 1024. We employed a batch size of 96, and utilized the
Adam optimizer with a learning rate of 10^-4 to minimize the Mean Squared Error (MSE) loss function. We
trained the model for 300 epochs, and integrated an early stopping technique with a patience parameter set
to 50 epochs. Throughout the hyperparameter tuning process, we found that larger model architectures,
smaller learning rates, and larger batch sizes were necessary to minimize the loss function. The best past
covariates feature for well F-12 was determined to be the wellhead pressure. However, for well F-11, a
satisfactory match could not be achieved due to the limited data available for training and testing. This
was primarily attributed to significant variation in the oil rate at later time steps, making it challenging to
capture accurately.
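The early stopping rule mentioned above can be sketched independently of any deep learning framework. The sketch assumes that a loss value is monitored once per epoch and that training stops once it has not improved for a number of epochs equal to the patience. The function name early_stopping_epoch and the loss values are hypothetical.

```python
from typing import Sequence, Tuple


def early_stopping_epoch(losses: Sequence[float], patience: int) -> Tuple[int, int]:
    """Return (epoch at which training stops, epoch with the best loss).

    Epochs are counted from zero. If the patience is never exhausted,
    training runs to the last epoch.
    """
    best_loss = float("inf")
    best_epoch = -1
    for epoch, loss in enumerate(losses):
        if loss < best_loss:
            best_loss, best_epoch = loss, epoch
        elif epoch - best_epoch >= patience:
            return epoch, best_epoch  # no improvement for `patience` epochs
    return len(losses) - 1, best_epoch


# Illustrative loss curve: the best value occurs at epoch 2.
monitored = [5.0, 4.0, 3.0, 3.5, 3.2, 3.1, 3.4]
print(early_stopping_epoch(monitored, patience=3))
```

With a rule of this kind, the number of epochs actually run can be much smaller than the configured maximum.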
Figure 12—BlockRNN optimized architecture.
Table 6—Key parameters of BlockRNN model
Parameters Value
Number of LSTM layers 2
Number of units in LSTM 1000
Learning rate 0.00001
Number of Fully connected layers (FC) 3
Number of neurons in FC layers 512, 512, 1024
Epochs 85
Batch size 96
Optimizer Adam
Loss function MSE
Temporal Fusion Transformer (TFT)
The Temporal Fusion Transformer (TFT) represents a cutting-edge attention-based deep learning approach
to time series forecasting. The primary building block of TFT is a Gated Residual Network (GRN), which includes
two dense layers coupled with two activation functions, the Exponential Linear Unit (ELU) and the Gated
Linear Unit (GLU) (Lim et al., 2020). This combination facilitates both skip connections and gating,
promoting efficient information flow throughout the network. TFT also contains a Variable Selection
Network (VSN) for selecting the most relevant features at each time step, as shown in Fig. 13. Time
dependent processing is based on an LSTM encoder-decoder for local processing and a self-attention layer
for learning long range dependencies across different time steps. In our model, the optimized parameters
were identified using a grid search technique. We found that fewer attention heads contributed to
minimizing the loss function. Table 7 displays the parameter range for our grid search, while Table 8 presents
the most effective parameters. The optimal model consisted of a single LSTM layer, 64 hidden neurons
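within each dense layer of the GRN, and a single self-attention layer. The TFT is trained by minimizing the
quantile loss function (Eq. 3) summed across three quantiles, q ∈ {0.1, 0.5, 0.9}, where y is the observed
value and ŷ is the forecast for quantile q:

$$QL(y, \hat{y}, q) = q \, \max(0,\, y - \hat{y}) + (1 - q) \, \max(0,\, \hat{y} - y) \tag{3}$$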
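The quantile (pinball) loss used for probabilistic forecasts can be written in a few lines. The sketch below assumes the standard form of this loss and sums the per-quantile averages over the three quantiles named in the text; the function names and sample numbers are hypothetical.

```python
from typing import Dict, Sequence


def quantile_loss(actual: float, predicted: float, q: float) -> float:
    """Pinball loss for a single observation at quantile q (0 < q < 1)."""
    if not 0 < q < 1:
        raise ValueError("q must lie strictly between 0 and 1")
    error = actual - predicted
    # Under-prediction is weighted by q, over-prediction by (1 - q).
    return max(q * error, (q - 1) * error)


def summed_quantile_loss(
    actuals: Sequence[float], forecasts: Dict[float, Sequence[float]]
) -> float:
    """Sum over quantiles of the mean pinball loss; `forecasts` maps q to predictions."""
    total = 0.0
    for q, predictions in forecasts.items():
        losses = [quantile_loss(a, p, q) for a, p in zip(actuals, predictions)]
        total += sum(losses) / len(losses)
    return total


# Illustrative numbers: one observation and three quantile forecasts.
actuals = [10.0]
forecasts = {0.1: [8.0], 0.5: [10.0], 0.9: [12.0]}
print(summed_quantile_loss(actuals, forecasts))
```

For q = 0.9 the loss penalizes forecasts that fall below the observation nine times more than forecasts that exceed it, which pushes that output toward the upper end of the distribution; q = 0.1 does the opposite.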
Figure 13—The Transformer model architecture (Vaswani et al., 2017)
Table 7—Grid search parameters for TFT model
Parameters Value range
Input sequence length [10, 20, 30]
Number of LSTM layers [1, 2, 3]
Number of neurons in the hidden layer [32, 64, 128]
Learning rate [10^-2, 10^-3, 10^-4]
Number of attention heads [1, 2, 3, 4]
Epochs 500
Batch size [64, 96, 128]
Dropout rate [0, 0.01, 0.1]
Optimizer Adam
Loss function Quantile (0.5)
Table 8—Optimal hyperparameters for the best TFT model
Parameter Value
Input sequence length 30
Number of LSTM layers 1
Number of neurons in the hidden layer 128
Learning rate 0.0001
Number of attention heads 1
Epochs 135
Batch size 96
Dropout rate 0.01
Optimizer Adam
Loss function Quantile (0.5)
For the training process, we used a batch size of 96 and the Adam optimizer with a learning rate of 10^-4
to minimize the quantile loss function centered around 0.5. The training procedure aligned with that of the
earlier BlockRNN model, wherein we monitored the loss and implemented an early stopping technique to
prevent overfitting (Al-Ali & Horne, 2023b).
The Neural Basis Expansion Analysis Time Series Model (N-BEATS)
The Neural Basis Expansion Analysis Time Series Model (N-BEATS) is fundamentally composed of a
multilayered, fully connected network. Each building block within the model produces two expansion
coefficients: a forward projection (forecast) and a backward projection (backcast). These blocks are
systematically assembled into stacks following the doubly residual stacking principle. The individual
forecasts generated by each block are hierarchically aggregated (Oreshkin et al., 2020). To simplify, consider
N-BEATS as a brain that uses past information to predict future events, even when some details are absent.
Each small component (block) within this brain uses the data available to it (past and current) to project
future values (forward) and fill in missing past values (backward) based on discernible patterns (expansion
coefficients). The brain's final prediction is an amalgamation of all individual block predictions (Al-Ali
& Horne, 2023a).
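The doubly residual stacking principle described above can be illustrated with a toy sketch. In the actual N-BEATS model each block is a fully connected network with learned expansion coefficients; here two fixed, hand-written functions (a mean block and a linear trend block) stand in for those networks purely for illustration. All names and values are hypothetical.

```python
def mean_block(x, horizon):
    """Toy block: explain the input by its mean level."""
    level = sum(x) / len(x)
    return [level] * len(x), [level] * horizon


def trend_block(x, horizon):
    """Toy block: explain the input by a least-squares straight line."""
    n = len(x)
    t_mean = (n - 1) / 2
    x_mean = sum(x) / n
    denom = sum((t - t_mean) ** 2 for t in range(n))
    slope = sum((t - t_mean) * (v - x_mean) for t, v in enumerate(x)) / denom
    intercept = x_mean - slope * t_mean
    backcast = [intercept + slope * t for t in range(n)]
    forecast = [intercept + slope * (n + h) for h in range(horizon)]
    return backcast, forecast


def doubly_residual_forecast(x, blocks, horizon):
    """Run blocks in sequence: subtract each backcast, sum each forecast."""
    residual = list(x)
    total = [0.0] * horizon
    for block in blocks:
        backcast, forecast = block(residual, horizon)
        # First residual branch: pass on what this block failed to explain.
        residual = [r - b for r, b in zip(residual, backcast)]
        # Second residual branch: accumulate the partial forecasts.
        total = [f + g for f, g in zip(total, forecast)]
    return total, residual


history = [10.0, 12.0, 14.0, 16.0]
forecast, leftover = doubly_residual_forecast(
    history, [mean_block, trend_block], horizon=2
)
```

For this input the mean block contributes a level of 13, the trend block explains the remaining slope of 2 per step, and the aggregated forecast continues the line at 18 and 20 with nothing left in the residual.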
We pretrained the N-BEATS model on the M4 series, leveraging the hyperparameters outlined in Table
9. The model primarily consists of two layers and 20 stacks and was trained on a GPU machine. During the
training phase, we used the Symmetric Mean Absolute Percentage Error (SMAPE) as our chosen metric for
the loss function. This measure quantifies the average, symmetrical difference between the predicted and
actual values. A transfer learning approach was then used to predict the oil rate series of our test dataset:
the model parameters and weights learned from the M4 series were applied directly, so that the pretrained
model predicted the oil rate without any prior knowledge of the Volve field data.
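For reference, a minimal sketch of the SMAPE metric is given below. It assumes the form commonly used with the M4 series, the mean of 2|y − ŷ| / (|y| + |ŷ|) expressed as a percentage; other SMAPE variants exist, and the exact variant used in training is not restated here. The function name and sample values are illustrative.

```python
def smape(y_true, y_pred):
    """Symmetric mean absolute percentage error, in percent (0 to 200)."""
    terms = []
    for y, y_hat in zip(y_true, y_pred):
        denom = abs(y) + abs(y_hat)
        # Define the term as zero when both values are zero to avoid 0/0.
        terms.append(0.0 if denom == 0 else 2.0 * abs(y - y_hat) / denom)
    return 100.0 * sum(terms) / len(terms)


# Hypothetical actual and predicted oil rates (illustrative values only).
actual = [100.0, 200.0]
predicted = [110.0, 180.0]
error = smape(actual, predicted)
```

Because the denominator uses both the actual and the predicted value, swapping the two arguments leaves the result unchanged, which is the sense in which the measure is symmetric.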
Table 9—Key parameters of N-BEATS model
Parameters Value
Number of stacks 20
Number of blocks 1
Number of layers 2
Layer width 136
Expansion coefficient 11
Learning rate 0.0001
Optimizer Adam
Loss function SMAPE
Forecasting Results of Deep Learning Models
Fig. 14 and Fig. 15 demonstrate the prediction results for well F-12 using the BlockRNN model with varying
input features. Our observations indicated that incorporating wellhead pressure as a covariate led to more
accurate predictions compared to other features. However, we were unable to achieve a reasonable match
for well F-11.
Figure 14—Comparison of oil rate prediction in test data using BlockRNN
model with varying covariates: water rate and wellhead pressure.
Figure 15—Comparison of oil rate prediction in test data using BlockRNN model with varying
covariates: wellhead pressure and choke size opening, with wellhead pressure showing best results
Fig. 16 shows the prediction results using the TFT model with wellhead pressure as a covariate. The
TFT provides a probabilistic forecast, with lower and upper bounds around the median (0.5 quantile)
prediction. Compared to the BlockRNN model, the TFT model delivered more accurate predictions for both
wells. The TFT model performed better than the BlockRNN model in this study owing to its ability to handle
nonstationary and noisy data, which are common in well production data. The TFT model was able to filter
out the noise, whereas the BlockRNN model was not. In addition, the probabilistic forecast of the TFT model
allows for a better understanding of the uncertainties.
Figure 16—Oil rate prediction using TFT model and wellhead pressure as a covariate.
On the other hand, the Neural Basis Expansion Analysis Time Series (N-BEATS) model outperformed all
previously mentioned models (ARIMA, BlockRNN, and TFT) without the requirement of tuning
or additional feature inputs. The pretrained N-BEATS model based on the M4 dataset was used as is and
applied to both wells. The results of the N-BEATS model predictions are shown in Fig. 17 and Fig. 18.
Using transfer learning with the N-BEATS model yielded promising results, particularly for fields with
limited data (green fields). This method can be employed to overcome limitations related to data availability.
However, as mentioned earlier, we used the historical forecast method in the N-BEATS transfer learning.
Therefore, we conducted a sensitivity analysis to identify the horizon length that leads to high error. Fig.
19 and Fig. 20 show the sensitivity analysis we performed using different horizon lengths up to nine time
steps. As demonstrated in the figures, even with a forecast horizon of nine time steps, the N-BEATS transfer
learning model still outperformed the other models.
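The horizon sensitivity analysis can be sketched as a rolling-origin (historical forecast) evaluation: at each time step the model sees only the history up to that point, forecasts a fixed number of steps ahead, and the last forecast point is compared with the observed value. The sketch below is an assumption-laden illustration, not the procedure used in the study: a naive last-value forecaster stands in for the pretrained N-BEATS model, the series is a synthetic linear decline, and the mean absolute error is used as the metric.

```python
def naive_last_value(history, horizon):
    """Stand-in forecaster: repeat the last observed value."""
    return [history[-1]] * horizon


def historical_forecast_mae(series, model, start, horizon):
    """Mean absolute error of the horizon-th step over rolling origins."""
    errors = []
    for t in range(start, len(series) - horizon + 1):
        # Only data before time t is visible to the model.
        forecast = model(series[:t], horizon)
        errors.append(abs(series[t + horizon - 1] - forecast[-1]))
    return sum(errors) / len(errors)


# Synthetic, linearly declining rate (illustrative values only).
series = [float(100 - 2 * i) for i in range(30)]

# Evaluate horizons of one to nine time steps.
mae_by_horizon = {
    h: historical_forecast_mae(series, naive_last_value, start=10, horizon=h)
    for h in range(1, 10)
}
```

For this synthetic series the error of the naive forecaster grows linearly with the horizon (2 per step), which shows the kind of degradation that such an analysis is designed to expose.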
Figure 17—F-12 oil rate prediction results using N-BEATS transfer model.
The forecasted oil rate is represented in black, the original oil rate in green.
Figure 18—F-11 oil rate prediction results using N-BEATS transfer model.
The forecasted oil rate is represented in black, the original oil rate in green.
Figure 19—Oil rate prediction results for Well F-12 using the N-BEATS transfer
model with different horizon lengths, each represented by a different color.
Figure 20—Oil rate prediction results for Well F-11 using the N-BEATS transfer
model with different horizon lengths, each represented by a different color.
Conclusion
Our research provided key insights into the effectiveness of various forecasting techniques for time series
analysis. The conventional ARIMA model encountered difficulties due to its inherent assumptions and
limited ability to handle nonstationary and noisy data, as encountered in our real-time well production
dataset. The ARIMA model's dependence on data stationarity and its suitability mainly for short-term predictions
restricted its efficacy when dealing with complex datasets. However, deep learning models such as the
BlockRNN, TFT, and N-BEATS demonstrated their potential to manage sequential data with varied levels of
success. The BlockRNN model, while theoretically effective, found it challenging to navigate the complex
and noisy well production data. The TFT model outperformed the BlockRNN, showing superior adaptability
and performance. The TFT's capacity to efficiently process nonstationary and noisy data, along with the provision
of a probabilistic forecast, distinguished it as a formidable forecasting tool. However, the N-BEATS model
emerged as the top performer, surpassing all other models. This model, which needed no additional feature
inputs or tuning, underscored the benefits of using a pretrained model (trained on the M4 dataset) and the
effectiveness of transfer learning. This approach was particularly advantageous for fields with limited data
(green fields).
References
Al-Ali, Z. A.-A. H., & Horne, R. 2023a, March 13. Meta Learning Using Deep N-BEATS Model for Production
Forecasting with Limited History. Gas & Oil Technology Showcase and Conference. https://doi.org/10.2118/214214-MS
Al-Ali, Z. A.-A. H., & Horne, R. 2023b, March 13. Probabilistic Well Production Forecasting in Volve Field Using
Temporal Fusion Transformer Deep Learning Models. Gas & Oil Technology Showcase and Conference. https://doi.org/10.2118/214133-MS
Cheng, J., Dong, L., & Lapata, M. 2016. Long Short-Term Memory-Networks for Machine Reading (arXiv:1601.06733).
arXiv. http://arxiv.org/abs/1601.06733
Hota, H. S., Handa, R., & Shrivas, A. K. 2017. Time Series Data Prediction Using Sliding Window Based RBF Neural
Network. https://www.ripublication.com/ijcir17/ijcirv13n5_46
Lesti, G., & Spiegel, S. 2017. A Sliding Window Filter for Time Series Streams. https://ceur-ws.org/Vol-1958
Ojedapo, B., Ikiensikimama, S. S., & Wachikwu-Elechi, V. U. 2022. Petroleum Production Forecasting Using Machine
Learning Algorithms. Day 3 Wed, August 03, 2022, D031S018R005. https://doi.org/10.2118/212018-MS
Olominu, O., & Sulaimon, A. A. 2014. Application of Time Series Analysis to Predict Reservoir Production Performance.
All Days, SPE-172395-MS. https://doi.org/10.2118/172395-MS
Oreshkin, B. N., Carpov, D., Chapados, N., & Bengio, Y. 2020. Meta-learning framework with applications to zero-shot
time-series forecasting (arXiv:2002.02887). arXiv. https://doi.org/10.48550/arXiv.2002.02887
Samo, A. O. n.d. Reservoir Characterization of the Volve Field North Sea, Using Rock-physics Modeling [M.S., Texas
A&M University - Kingsville]. Retrieved January 21, 2023, from https://www.proquest.com/docview/2445578339/abstract/2325EDA1F4954B5FPQ/1
Sen, S., & Ganguli, S. S. 2019, April 8. Estimation of Pore Pressure and Fracture Gradient in Volve Field, Norwegian
North Sea. SPE Oil and Gas India Conference and Exhibition. https://doi.org/10.2118/194578-MS
Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, L., & Polosukhin, I. 2017. Attention
Is All You Need (arXiv:1706.03762). arXiv. http://arxiv.org/abs/1706.03762