EE 461 Final Post II: Load Forecasting for Tonga Power Limited
Technical Results
Data Visualizations
The data were cleaned of outliers and ambiguous entries.
Table 1: A subset of the 222x4 dataset.
| No. | GENERATED | SENT OUT | MONTH | Tmp2m (°C) |
|---|---|---|---|---|
| 1 | 4399713 | 4321268 | 7 | 23.5701 |
| 2 | 4503448 | 4422164 | 8 | 23.8409 |
| 3 | 4386093 | 4291216 | 9 | 24.3952 |
| 4 | 4559317 | 4438567 | 10 | 24.3477 |
| 5 | 4332707 | 4218030 | 11 | 25.8833 |
| 6 | 4473349 | 4378796 | 12 | 25.9076 |
| 7 | 4739349 | 4611365 | 1 | 27.3127 |
| 8 | 4558099 | 4435805 | 2 | 27.6036 |
| 9 | 4737354 | 4629863 | 3 | 27.8743 |
Table 1 shows the first 9 rows of our dataset, which contains 4 variables and more than 200 observations. These variables were chosen as the main features for load forecasting because they already correlate with each other. Our response variable is the Sent Out power, which is the power sent through the distribution lines after generation. We focus on it because, during the period covered by this dataset, customers in Tonga had been bypassing their meters, so the power billed by Tonga Power Limited (TPL) may be inaccurate or contain outliers.
Engineered Features
Table 2: The Engineered Features for our dataset.
| No. | SENT OUT LAG1 | SENT OUT ROLLING (12M) | EFFICIENCY | GROWTH RATE | MONTH SIN | MONTH COS |
|---|---|---|---|---|---|---|
| 1 | NaN | 4321268 | 0.9822 | NaN | 0.5 | 0.866 |
| 2 | 4321268 | 4371716 | 0.982 | 0.0233 | 0.5 | 0.866 |
| 3 | 4422164 | 4344900 | 0.9784 | -0.0296 | 0.5 | 0.866 |
| 4 | 4291216 | 4368300 | 0.9735 | 0.0343 | 0.5 | 0.866 |
| 5 | 4438567 | 4338249 | 0.9735 | -0.0497 | 0.5 | 0.866 |
| 6 | 4218030 | 4345000 | 0.9789 | 0.0381 | 0.5 | 0.866 |
| 7 | 4378796 | 4383058 | 0.973 | 0.0531 | 0.5 | 0.866 |
| 8 | 4611365 | 4389700 | 0.9732 | -0.0381 | 0.5 | 0.866 |
| 9 | 4435805 | 4416300 | 0.9773 | 0.0437 | 0.5 | 0.866 |
The engineered features aim to improve the load forecasting model's predictive capability by capturing temporal patterns, trends, and cyclical behavior in electricity demand. "Sent Out Lag1" is the sent-out load from the preceding time step; it gives the model short-term memory of recent load values. "Sent Out Rolling (12M)" is a 12-month rolling average of the sent-out load, which smooths short-term fluctuations and highlights longer-term seasonal or trend patterns. "Efficiency" is the ratio of sent-out to generated energy (for the first row, 4321268 / 4399713 ≈ 0.9822), which indicates how much of the generated energy reaches the distribution network. "Growth Rate" is the relative change in sent-out load from one period to the next, which tells the model how quickly demand is rising or falling. Finally, "Month Sin" and "Month Cos" are cyclical transformations of the month index using sine and cosine functions. They remove the artificial gap that a raw month index creates between December and January and let the model learn seasonal patterns on a continuous cycle. Together, these features describe seasonality, historical trends, and current load dynamics, all of which are essential for precise forecasting.
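The feature definitions above can be sketched in a few lines of Python. This is a minimal illustration, not the code used for the project: the function name `engineer_features` is hypothetical, the rolling mean is assumed to use whatever history is available for the first 11 months (which matches the first rows of Table 2), and the 2π·month/12 month encoding is an assumed common convention, since the post does not state the exact encoding used.

```python
import math

def engineer_features(generated, sent_out, months, window=12):
    """Return one dict of engineered features per monthly sample (illustrative sketch)."""
    rows = []
    for i, (gen, sent, month) in enumerate(zip(generated, sent_out, months)):
        prev = sent_out[i - 1] if i > 0 else None           # lag-1 value, missing for the first row
        recent = sent_out[max(0, i - window + 1): i + 1]     # up to `window` most recent values
        angle = 2 * math.pi * month / 12                     # assumed cyclical month encoding
        rows.append({
            "sent_out_lag1": prev,
            "sent_out_rolling": sum(recent) / len(recent),
            "efficiency": sent / gen,
            "growth_rate": (sent - prev) / prev if prev is not None else None,
            "month_sin": math.sin(angle),
            "month_cos": math.cos(angle),
        })
    return rows

# First three rows of Table 1.
generated = [4399713, 4503448, 4386093]
sent_out = [4321268, 4422164, 4291216]
months = [7, 8, 9]
features = engineer_features(generated, sent_out, months)
```

Run on the first three rows of Table 1, the lag, rolling, efficiency, and growth-rate values agree with the corresponding rows of Table 2 after rounding.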
Forecasting Models
Model Hyperparameters
Table 3: Classic statistical models.
| Hyper-parameter | ARIMA | SARIMA |
|---|---|---|
| p | 2 | 1 |
| d | 0 | 0 |
| q | 1 | 1 |
| P | - | 1 |
| D | - | 0 |
| Q | - | 7 |
| S | - | 9 |
ARIMA (Autoregressive Integrated Moving Average), a popular classical model in time series forecasting, has three primary hyperparameters: p, d, and q. The parameter p is the number of autoregressive (AR) terms, that is, how many past values are used to predict the current value. Here p = 2, so the model considers the last two observations. The parameter d is the order of differencing; d = 0 means that no differencing is applied, which assumes the data is already stationary. The parameter q is the number of moving average (MA) terms, which account for the impact of previous forecast errors on the prediction. Here q = 1, so the model includes one lagged forecast error.
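Written out, and assuming the usual textbook form with a constant c, AR coefficients φ, an MA coefficient θ, and a white-noise error ε_t (these symbols are not from the original post), an ARIMA(2, 0, 1) model is:

```latex
y_t = c + \phi_1 y_{t-1} + \phi_2 y_{t-2} + \theta_1 \varepsilon_{t-1} + \varepsilon_t
```

The two φ terms correspond to p = 2, the single θ term to q = 1, and no differencing operator appears because d = 0.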
SARIMA (Seasonal ARIMA) adds four hyperparameters to model seasonal patterns: P, D, Q, and S. The first three are the seasonal counterparts of p, d, and q. P is the number of seasonal autoregressive terms; here P = 1, so one seasonal lag is taken into account. D is the order of seasonal differencing; D = 0 means that no seasonal differencing is used. Q is the number of seasonal moving average terms; with Q = 7, the model incorporates seven seasonal lagged forecast errors. S is the length of the seasonal cycle; here S = 9, which means that seasonal effects recur every nine time steps. Together, the model is written SARIMA(1, 0, 1)(1, 0, 7) with S = 9, and it captures both seasonal and short-term patterns.
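To make the seasonal orders concrete, the short sketch below lists which lags the seasonal terms refer to. The helper `seasonal_lags` is a hypothetical name introduced only for illustration, and the sketch ignores the cross terms that arise when the seasonal and non-seasonal parts are multiplied together.

```python
def seasonal_lags(order, period):
    """Lags (in time steps) reached by `order` seasonal terms with the given period."""
    return [period * k for k in range(1, order + 1)]

# Seasonal orders from Table 3.
P, D, Q, S = 1, 0, 7, 9
seasonal_ar_lags = seasonal_lags(P, S)  # one seasonal AR term
seasonal_ma_lags = seasonal_lags(Q, S)  # seven seasonal MA terms
```

With Q = 7 and S = 9, the seasonal moving average terms reach back as far as 63 time steps.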
Table 4: Deep learning models.
| Hyper-parameter | LSTM | CNN | Hybrid |
|---|---|---|---|
| NumLags | 10 | - | - |
| HiddenUnit | 54 | 64 | 82 |
| DropOut | 0.48387 | 0.2 | 0.19794 |
| Epochs | 224 | 100 | - |
| Filter | - | 32 | 32 |
| BatchSize | - | 16 | - |
| Window | - | - | 12 |
The performance of the deep learning models (LSTM, CNN, and the hybrid LSTM-CNN architecture) is strongly influenced by their hyperparameters. For the LSTM model, NumLags is set to 10, meaning that each prediction is based on the last ten time steps of historical data. HiddenUnit, the number of hidden units in the LSTM layer, is set to 54, which gives the network capacity to capture temporal correlations in the input sequence. A comparatively high DropOut rate of 0.48387 randomly deactivates about 48% of the units during training to reduce overfitting. The model is trained for 224 epochs, that is, 224 passes over the training data.
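As a rough illustration of what NumLags = 10 means for the training data, the sketch below turns a series into input/target pairs where each target is predicted from the ten values before it. The function name `make_lagged_samples` and the placeholder series are assumptions for illustration, not the project's actual preprocessing.

```python
def make_lagged_samples(series, num_lags):
    """Build (inputs, targets): each target is preceded by its `num_lags` previous values."""
    inputs, targets = [], []
    for t in range(num_lags, len(series)):
        inputs.append(series[t - num_lags:t])  # the window of past values
        targets.append(series[t])              # the value to predict
    return inputs, targets

series = list(range(15))  # placeholder data, not the TPL load series
X, y = make_lagged_samples(series, num_lags=10)
```

A series of length N yields N - 10 samples, so the first ten observations are used only as inputs. The same windowing idea would apply with a window of 12.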
To balance regularization and model capacity, the CNN model uses 64 hidden units with a DropOut rate of 0.2. Its 32 convolutional filters let the model identify local temporal patterns in the input sequence. With a BatchSize of 16, the model updates its weights after every 16 samples, which helps speed up training and improve convergence. Training runs for 100 epochs to allow adequate learning of local features.
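The relationship between batch size, epochs, and the number of weight updates can be sketched as follows. The training-set size of 160 is a hypothetical number chosen for illustration, since the post does not state how the 222 samples were split.

```python
import math

def weight_updates(n_train, batch_size, epochs):
    """Return (updates per epoch, total updates) for mini-batch training."""
    per_epoch = math.ceil(n_train / batch_size)  # the last batch may be smaller
    return per_epoch, per_epoch * epochs

# n_train=160 is an assumed value; batch_size and epochs come from Table 4 (CNN).
per_epoch, total = weight_updates(n_train=160, batch_size=16, epochs=100)
```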
The hybrid LSTM-CNN model combines local pattern extraction (CNN) with temporal memory (LSTM) and uses 82 hidden units for increased learning capacity. To reduce overfitting, it uses a DropOut rate of 0.19794, close to the 0.2 used in the standalone CNN model. Like the CNN, the hybrid model uses 32 convolutional filters to extract features. It also uses a window size of 12, meaning that it accepts a sequence of 12 time steps as input, which aligns the data structure for both the convolutional and the recurrent layers. The number of training epochs for the hybrid model is not specified; it is usually determined by early stopping or validation performance.
Model Comparison
Table 5: Evaluation metrics for all 5 models.
| Model | Data Split | MAE | MAPE | MSE | RMSE | R² |
|---|---|---|---|---|---|---|
| ARIMAX | Test | 0.023 | 10.65% | 0.011 | 0.0331 | 0.9978 |
| ARIMAX | Training | 0.0091 | 3.17% | 0.0004 | 0.0201 | 0.9984 |
| SARIMAX | Test | 0.2081 | 79.34% | 0.0657 | 0.2562 | 0.8641 |
| SARIMAX | Training | 0.1897 | 94.11% | 0.0529 | 0.23 | 0.7946 |
| LSTM | Training | 0.2252 | 165.48% | 0.1163 | 0.3411 | 0.6364 |
| LSTM | Validation | 0.3132 | 66.18% | 0.2244 | 0.4737 | 0.2873 |
| LSTM | Test | 0.3179 | 38.17% | 0.2044 | 0.4521 | 0.4173 |
| CNN | Training | 0.1196 | 47.37% | 0.0233 | 0.1526 | 0.9801 |
| CNN | Validation | 0.1083 | 27.28% | 0.0177 | 0.133 | 0.9815 |
| CNN | Test | 0.1083 | 27.29% | 0.0177 | 0.1672 | 0.9636 |
| LSTM/CNN Hybrid | Training | 0.1852 | 79.59% | 0.0527 | 0.2296 | 0.9479 |
| LSTM/CNN Hybrid | Validation | 0.257 | 135.17% | 0.1031 | 0.3211 | 0.8761 |
| LSTM/CNN Hybrid | Test | 0.4017 | 173.75% | 0.2312 | 0.4808 | 0.775 |
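For reference, the five evaluation metrics can be computed as in the sketch below. It uses the standard definitions and is not taken from the project code; the function name `regression_metrics` and the toy inputs are illustrative assumptions.

```python
import math

def regression_metrics(actual, predicted):
    """Compute MAE, MAPE (%), MSE, RMSE and R² using their standard definitions."""
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    # MAPE divides by the actual value, so it is undefined when an actual value is 0.
    mape = 100 * sum(abs(e / a) for e, a in zip(errors, actual)) / n
    mean_actual = sum(actual) / n
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    r2 = 1 - ss_res / ss_tot
    return {"MAE": mae, "MAPE": mape, "MSE": mse, "RMSE": math.sqrt(mse), "R2": r2}

# Toy values for illustration only.
metrics = regression_metrics([1.0, 2.0, 4.0], [1.0, 2.0, 3.0])
```

Because MAPE divides each error by the actual value, it grows large when actual values are close to zero, as can happen with normalized data.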
Generation Scheduling
Data de-normalization
The ARIMAX model's load forecast uses normalized input and output data to enhance model performance. To obtain the corresponding raw value for generation scheduling, the normalized output value is subsequently de-normalized.
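The post does not say which normalization was used, so the sketch below assumes min-max scaling purely for illustration; a z-score scheme would be inverted the same way, using the mean and standard deviation instead. The function names are hypothetical, and the bounds are the smallest and largest SENT OUT values in the Table 1 subset, not those of the full dataset.

```python
def minmax_normalize(value, lo, hi):
    """Scale a raw value to [0, 1] given the minimum and maximum of the training data."""
    return (value - lo) / (hi - lo)

def minmax_denormalize(scaled, lo, hi):
    """Invert minmax_normalize to recover the raw value."""
    return scaled * (hi - lo) + lo

# Bounds taken from the nine SENT OUT values in Table 1 (illustration only).
lo, hi = 4218030, 4629863
scaled = minmax_normalize(4321268, lo, hi)
restored = minmax_denormalize(scaled, lo, hi)
```

The key point is that de-normalization must reuse exactly the statistics that were computed when the training data was normalized.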
Generation Sites, Capacities and Priority
Tonga has a hybrid generation system with solar PV, wind farms, battery energy storage systems (BESS), and diesel generators.
Table 6: Generation sites and their generator types.
| Name | Type | BaseCapacity (MW) | Priority |
|---|---|---|---|
| Solar Farm Maama Mai | solar | 1.412 | 1 |
| Solar Farm Mata o e Laa | solar | 1.3 | 1 |
| Solar Farm Singyes | solar | 2.13 | 1 |
| Solar Farm Sunergise 1 | solar | 2.3 | 1 |
| Solar Farm Sunergise 2 | solar | 2.3 | 1 |
| Solar Farm Sunergise 3 | solar | 2.3 | 1 |
| Wind Farm I o Manumataongo | wind | 1.375 | 2 |
| Diesel Powerplant Popua 1.11 | diesel | 2.765 | 3 |
| Diesel Powerplant Popua 1.12 | diesel | 2.765 | 3 |
| Diesel Powerplant Popua 1.21 | diesel | 1.4 | 3 |
| Diesel Powerplant Popua 1.22 | diesel | 1.4 | 3 |
| Diesel Powerplant Popua 1.23 | diesel | 1.4 | 3 |
| Diesel Powerplant Popua 1.24 | diesel | 1.4 | 3 |
| Diesel Powerplant Popua 1.25 | diesel | 1.4 | 3 |
| Diesel Powerplant Popua 1.26 | diesel | 1.4 | 3 |
The table above shows the generation site names, types, base capacities, and priorities. Priority is very important in generation dispatch because it tells the generation operator which power sources to use first when meeting the forecasted load demand. The result is a more efficient, cost-effective, and environmentally responsible operation.
Renewable energy sources are given a higher priority because they have lower operating costs and generate energy from free inputs such as solar irradiance and wind. This should lower tariffs and reserves the diesel generators, which have higher operating costs, for peak-hour demand.
With the world struggling with the effects of greenhouse gases, reducing the carbon footprint is vital, which is another reason to dispatch renewable sources first: they have zero fuel cost and low emissions. Because Pacific Island countries rely heavily on fuel imports for electricity generation, the price of electricity is also high in order to absorb oil price shocks.
Supplying stable electricity to customers is the main focus of every electric utility. The priority order therefore also ensures that enough spinning reserve is available to respond quickly to load changes and to sudden drops in renewable generation, for example when clouds pass over the solar PV farms or the wind speed suddenly drops. Most importantly, it helps planners schedule maintenance for each generation site.
Table 7: Generation Planning and Maintenance Scheduling.
| Generation Site | Month 1: Power Generated (MW) | Month 2: Power Generated (MW) | Month 3: Power Generated (MW) |
|---|---|---|---|
| Forecasted Load | 7.58 | 7.57 | 7.55 |
| Solar Farm Maama Mai | 0.38 | 0.36 | 0.35 |
| Solar Farm Mata ó e Laá | 0.35 | 0.34 | 0.33 |
| Solar Farm Singyes | 0.58 | 0.55 | 0.53 |
| Solar Farm Sunergise 1 | 0.62 | 0.59 | 0.57 |
| Solar Farm Sunergise 2 | 0.62 | 0.59 | 0.57 |
| Solar Farm Sunergise 3 | 0.62 | 0.59 | 0.57 |
| Wind Farm I o Manumataongo | 0.69 | 0.73 | 0.82 |
| Diesel Power Plant 1.11 | 2.21 | 2.21 | 2.21 |
| Diesel Power Plant 1.12 | 1.5 | 1.59 | 1.58 |
As the table above shows, the forecasted load demand is 7.58 MW, 7.57 MW, and 7.55 MW for the first three months respectively. Because solar PV and wind are given higher priority, they contribute significantly in every month. However, the table also shows that renewable output varies from month to month: solar output falls as irradiance decreases, while wind output rises. In January, which is summer, irradiance is higher but wind speed is lower; in March, as winter starts, solar irradiance decreases while wind speed starts to increase.
To limit fuel consumption and emissions, the diesel generators are run at no more than 80% of their base capacity (for example, 0.8 × 2.765 MW ≈ 2.21 MW for Diesel Power Plant 1.11). This also extends the operating lifetime of the generators and, most importantly, leaves enough spinning reserve in the system.
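The priority rule and the 80% diesel limit can be combined into a simple dispatch loop, sketched below. This is an illustration under stated assumptions, not the scheduling code behind Table 7: the function `dispatch` is hypothetical, the availability factors (0.27 for solar, 0.5 for wind) are assumed values chosen to roughly resemble month 1, and BESS is not modeled.

```python
def dispatch(load_mw, units, availability, diesel_limit=0.8):
    """Allocate load to units in priority order; return ({name: MW}, unmet load in MW)."""
    remaining = load_mw
    schedule = {}
    # sorted() is stable, so units with equal priority keep their listed order.
    for name, kind, capacity, priority in sorted(units, key=lambda u: u[3]):
        if kind == "diesel":
            available = diesel_limit * capacity          # diesel capped at 80% of base capacity
        else:
            available = availability[kind] * capacity    # assumed renewable availability factor
        output = min(available, max(remaining, 0.0))
        schedule[name] = output
        remaining -= output
    return schedule, remaining

# (name, type, base capacity in MW, priority), taken from Table 6.
units = [
    ("Solar Farm Maama Mai", "solar", 1.412, 1),
    ("Solar Farm Mata o e Laa", "solar", 1.3, 1),
    ("Solar Farm Singyes", "solar", 2.13, 1),
    ("Solar Farm Sunergise 1", "solar", 2.3, 1),
    ("Solar Farm Sunergise 2", "solar", 2.3, 1),
    ("Solar Farm Sunergise 3", "solar", 2.3, 1),
    ("Wind Farm I o Manumataongo", "wind", 1.375, 2),
    ("Diesel Powerplant Popua 1.11", "diesel", 2.765, 3),
    ("Diesel Powerplant Popua 1.12", "diesel", 2.765, 3),
]
availability = {"solar": 0.27, "wind": 0.5}  # hypothetical factors, not measured data
schedule, unmet = dispatch(7.58, units, availability)
```

With these assumptions, the first diesel unit runs at its 80% limit and the second covers only the remaining load, which is the same pattern seen in Table 7.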