EE 461 Final Post II: Load Forecasting for Tonga Power Limited

 

Technical Results

Data Visualizations

The data was cleaned of outliers and ambiguous entries before analysis.

Table 1: A subset of the 222×4 dataset.

| No. | GENERATED | SENT OUT | MONTH | Tmp2m (°C) |
|-----|-----------|----------|-------|------------|
| 1   | 4399713   | 4321268  | 7     | 23.5701    |
| 2   | 4503448   | 4422164  | 8     | 23.8409    |
| 3   | 4386093   | 4291216  | 9     | 24.3952    |
| 4   | 4559317   | 4438567  | 10    | 24.3477    |
| 5   | 4332707   | 4218030  | 11    | 25.8833    |
| 6   | 4473349   | 4378796  | 12    | 25.9076    |
| 7   | 4739349   | 4611365  | 1     | 27.3127    |
| 8   | 4558099   | 4435805  | 2     | 27.6036    |
| 9   | 4737354   | 4629863  | 3     | 27.8743    |

 

Table 1 shows the first 9 rows of our dataset, which contains 4 variables and over 200 monthly observations. These variables were chosen as the main features for load forecasting because they correlate with one another. The SENT OUT power, i.e. the power delivered to the distribution lines after generation, is the response variable. It was chosen over billed energy because, during the period covered by this dataset, some of the community in Tonga had been bypassing their meters, so the billed power recorded by Tonga Power Limited (TPL) may be inaccurate or contain outliers of its own.
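
As a minimal sketch of the cleaning step mentioned above (the file name, column names, and the 1.5×IQR rule are illustrative assumptions, not necessarily the exact procedure used), outliers in the response variable can be filtered with pandas like this:

```python
import pandas as pd

# Illustrative file and column names, matching Table 1.
df = pd.read_csv("tpl_monthly.csv")  # columns: GENERATED, SENT OUT, MONTH, Tmp2m

def remove_iqr_outliers(data: pd.DataFrame, column: str, k: float = 1.5) -> pd.DataFrame:
    """Drop rows whose value in `column` falls outside the k * IQR fences."""
    q1, q3 = data[column].quantile([0.25, 0.75])
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return data[(data[column] >= lower) & (data[column] <= upper)]

df = remove_iqr_outliers(df, "SENT OUT")
```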

Engineered Features

Table 2: The engineered features for our dataset.

| No. | SENT OUT LAG1 | SENT OUT ROLLING (12M) | EFFICIENCY | GROWTH RATE | MONTH SIN | MONTH COS |
|-----|---------------|------------------------|------------|-------------|-----------|-----------|
| 1   | NaN           | 4321268                | 0.9822     | NaN         | 0.5       | 0.866     |
| 2   | 4321268       | 4371716                | 0.982      | 0.0233      | 0.5       | 0.866     |
| 3   | 4422164       | 4344900                | 0.9784     | -0.0296     | 0.5       | 0.866     |
| 4   | 4291216       | 4368300                | 0.9735     | 0.0343      | 0.5       | 0.866     |
| 5   | 4438567       | 4338249                | 0.9735     | -0.0497     | 0.5       | 0.866     |
| 6   | 4218030       | 4345000                | 0.9789     | 0.0381      | 0.5       | 0.866     |
| 7   | 4378796       | 4383058                | 0.973      | 0.0531      | 0.5       | 0.866     |
| 8   | 4611365       | 4389700                | 0.9732     | -0.0381     | 0.5       | 0.866     |
| 9   | 4435805       | 4416300                | 0.9773     | 0.0437      | 0.5       | 0.866     |

 

The engineered features are intended to improve the forecasting model's predictive capability by capturing temporal patterns, trends, and cyclical behaviour in electricity demand. The "Sent Out Lag1" feature is the load from the preceding time step; it gives the model short-term memory of recent load values so it can capture instantaneous trends and variations. "Sent Out Rolling (12M)" is a 12-month rolling average of the sent-out load that smooths out short-term fluctuations and highlights longer-term seasonal or trend patterns. "Efficiency" is the ratio of sent-out energy to generated energy (about 0.98 in Table 2), which provides information about operational performance and system losses. "Growth Rate" captures the relative change in load from one period to the next, helping the model gauge how quickly demand is rising or falling. Finally, "Month Sin" and "Month Cos" are cyclical transformations of the month index using sine and cosine functions; they remove the artificial gap between adjacent months such as December and January and let the model learn seasonal patterns continuously. Together, these features give the model a view of seasonality, historical trends, and current load dynamics, all of which are essential for accurate forecasting.
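
A minimal pandas sketch of how these features can be derived from the cleaned data (`df` is assumed to be the cleaned DataFrame with the Table 1 columns; the column names are illustrative):

```python
import numpy as np

# df is assumed to hold the cleaned monthly data with the Table 1 columns.
df["SENT OUT LAG1"] = df["SENT OUT"].shift(1)                        # previous month's load
df["SENT OUT ROLLING (12M)"] = df["SENT OUT"].rolling(12, min_periods=1).mean()
df["EFFICIENCY"] = df["SENT OUT"] / df["GENERATED"]                  # share of generation actually sent out
df["GROWTH RATE"] = df["SENT OUT"].pct_change()                      # month-on-month relative change
df["MONTH SIN"] = np.sin(2 * np.pi * df["MONTH"] / 12)               # cyclical month encoding
df["MONTH COS"] = np.cos(2 * np.pi * df["MONTH"] / 12)
```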

Forecasting Models

Models Hyperparameters

Table 3: Classical statistical models.

| Hyper-parameter | ARIMA | SARIMA |
|-----------------|-------|--------|
| p               | 2     | 1      |
| d               | 0     | 0      |
| q               | 1     | 1      |
| P               | -     | 1      |
| D               | -     | 0      |
| Q               | -     | 7      |
| S               | -     | 9      |

 

The three primary hyperparameters of ARIMA (Autoregressive Integrated Moving Average), a popular classical model in time series forecasting, are p, d, and q. The parameter p is the number of autoregressive (AR) terms, i.e. how many past values are used to predict the current value; with p = 2, the model considers the last two observations. With d = 0, no differencing is needed because the data is already stationary. The parameter q is the number of moving average (MA) terms, which account for the impact of previous forecast errors on the prediction; here q = 1 means the model includes one lagged forecast error.

SARIMA (Seasonal ARIMA) adds further hyperparameters to model seasonal behaviour: P, D, Q, and S. The first three are the seasonal counterparts of p, d, and q. P is the number of seasonal autoregressive terms; here P = 1, so only one seasonal lag is considered. D is the order of seasonal differencing, and D = 0 means no seasonal differencing is applied. Q is the number of seasonal moving-average terms; with Q = 7, the model incorporates seven seasonal lagged forecast errors. S sets the length of the seasonal cycle, which is S = 9 here, so seasonal effects recur every nine time steps. Collectively, the fitted model is SARIMA(1, 0, 1)(1, 0, 7, 9), which captures both short-term and seasonal patterns for more precise forecasting. A code sketch of both models follows.
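
A minimal sketch of how these orders map onto statsmodels calls (variable and column names are illustrative; `df` is assumed to be the prepared dataset, with temperature used as the exogenous regressor for the ARIMAX/SARIMAX variants reported later):

```python
from statsmodels.tsa.arima.model import ARIMA
from statsmodels.tsa.statespace.sarimax import SARIMAX

y = df["SENT OUT"].astype(float)     # response variable
exog = df[["Tmp2m"]].astype(float)   # exogenous regressor (illustrative choice)

# ARIMA(2, 0, 1): two AR terms, no differencing, one MA term.
arima_fit = ARIMA(y, order=(2, 0, 1)).fit()

# SARIMA(1, 0, 1)(1, 0, 7, 9) with an exogenous regressor (i.e. SARIMAX).
sarimax_fit = SARIMAX(y, exog=exog, order=(1, 0, 1),
                      seasonal_order=(1, 0, 7, 9)).fit(disp=False)

# Three-step-ahead forecast; future exogenous values must be supplied
# (the last three observed temperatures are used here only as a placeholder).
forecast = sarimax_fit.forecast(steps=3, exog=exog.tail(3))
```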

Table 4: Deep learning models.

| Hyper-parameter | LSTM    | CNN | Hybrid  |
|-----------------|---------|-----|---------|
| NumLags         | 10      | -   | -       |
| HiddenUnits     | 54      | 64  | 82      |
| DropOut         | 0.48387 | 0.2 | 0.19794 |
| Epochs          | 224     | 100 | -       |
| Filters         | -       | 32  | 32      |
| BatchSize       | -       | 16  | -       |
| Window          | -       | -   | 12      |

 

The performance of deep learning models, including LSTM, CNN, and hybrid LSTM-CNN architectures, is strongly influenced by a number of hyperparameters. For the LSTM model, NumLags is set to 10, meaning each prediction is based on input features from the last ten time steps of historical data. The HiddenUnits parameter, the number of neurons in the LSTM layer, is set to 54, allowing the network to capture intricate temporal correlations in the input sequence. A comparatively high dropout rate of 0.48387 is used to avoid overfitting by randomly deactivating almost 48% of the neurons during training. The model is trained for 224 epochs to ensure sufficient learning over repeated passes through the dataset.

The CNN model uses 64 hidden units with a dropout rate of 0.2 to balance model capacity and regularization. Its 32 convolutional filters let the model identify local temporal patterns in the input sequence. With a batch size of 16, the model updates its weights every 16 samples, which speeds up training and improves convergence. Training runs for 100 epochs to allow adequate learning of local features.

The hybrid LSTM-CNN model combines the advantages of local pattern extraction (CNN) and temporal memory (LSTM), with 82 hidden units to provide increased learning capacity. A dropout rate of 0.19794 is used to reduce overfitting, similar to the standalone CNN model. In keeping with the CNN approach, the hybrid model also uses 32 convolutional filters for feature extraction, and it takes a window of 12 time steps as input, aligning the data structure for both the convolutional and recurrent layers. The number of training epochs for the hybrid model is not specified; in practice it would be determined by early stopping or validation performance.
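
The following is a minimal Keras sketch of a hybrid CNN-LSTM built from the Table 4 hyperparameters (the exact layer arrangement, kernel size, optimizer, feature count, and training settings are assumptions, not the architecture actually used):

```python
from tensorflow.keras import layers, models

WINDOW = 12       # input sequence length from Table 4
N_FEATURES = 7    # features per time step (illustrative)

model = models.Sequential([
    layers.Input(shape=(WINDOW, N_FEATURES)),
    # Convolutional front end extracts local temporal patterns.
    layers.Conv1D(filters=32, kernel_size=3, padding="causal", activation="relu"),
    # Recurrent layer captures longer-range temporal dependencies.
    layers.LSTM(82),
    layers.Dropout(0.19794),
    layers.Dense(1),  # one-step-ahead sent-out load
])

model.compile(optimizer="adam", loss="mse", metrics=["mae"])
# model.fit(X_train, y_train, validation_data=(X_val, y_val), batch_size=16, epochs=100)
```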

Model Comparison

Table 5: Evaluation metrics for all 5 models.

 

ARIMAX

| Evaluation Metrics | MAE    | MAPE   | MSE    | RMSE   | R²     |
|--------------------|--------|--------|--------|--------|--------|
| Test               | 0.023  | 10.65% | 0.011  | 0.0331 | 0.9978 |
| Training           | 0.0091 | 3.17%  | 0.0004 | 0.0201 | 0.9984 |

 

SARIMAX

| Evaluation Metrics | MAE    | MAPE   | MSE    | RMSE   | R²     |
|--------------------|--------|--------|--------|--------|--------|
| Test               | 0.2081 | 79.34% | 0.0657 | 0.2562 | 0.8641 |
| Training           | 0.1897 | 94.11% | 0.0529 | 0.23   | 0.7946 |

 

LSTM

| Evaluation Metrics | MAE    | MAPE    | MSE    | RMSE   | R²     |
|--------------------|--------|---------|--------|--------|--------|
| Training           | 0.2252 | 165.48% | 0.1163 | 0.3411 | 0.6364 |
| Validation         | 0.3132 | 66.18%  | 0.2244 | 0.4737 | 0.2873 |
| Test               | 0.3179 | 38.17%  | 0.2044 | 0.4521 | 0.4173 |

 

CNN

| Evaluation Metrics | MAE    | MAPE   | MSE    | RMSE   | R²     |
|--------------------|--------|--------|--------|--------|--------|
| Training           | 0.1196 | 47.37% | 0.0233 | 0.1526 | 0.9801 |
| Validation         | 0.1083 | 27.28% | 0.0177 | 0.133  | 0.9815 |
| Test               | 0.1083 | 27.29% | 0.0177 | 0.1672 | 0.9636 |

 

LSTM/CNN Hybrid

| Evaluation Metrics | MAE    | MAPE    | MSE    | RMSE   | R²     |
|--------------------|--------|---------|--------|--------|--------|
| Training           | 0.1852 | 79.59%  | 0.0527 | 0.2296 | 0.9479 |
| Validation         | 0.257  | 135.17% | 0.1031 | 0.3211 | 0.8761 |
| Test               | 0.4017 | 173.75% | 0.2312 | 0.4808 | 0.775  |
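
For reference, the error metrics above (the fifth column appears to be R²) can be computed on the normalized series with scikit-learn; this is a minimal sketch, assuming `y_true` and `y_pred` are the actual and predicted values for one model and data split:

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

def evaluate(y_true: np.ndarray, y_pred: np.ndarray) -> dict[str, float]:
    """Return MAE, MAPE (%), MSE, RMSE, and R² for a single model/split."""
    mse = mean_squared_error(y_true, y_pred)
    return {
        "MAE": mean_absolute_error(y_true, y_pred),
        "MAPE": float(np.mean(np.abs((y_true - y_pred) / y_true)) * 100),
        "MSE": mse,
        "RMSE": float(np.sqrt(mse)),
        "R2": r2_score(y_true, y_pred),
    }
```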

Generation Scheduling

Data de-normalization

The ARIMAX model's load forecast is produced on normalized input and output data, which improves model performance. The normalized forecast is then de-normalized to recover the raw load values used for generation scheduling.
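
Assuming min-max scaling was used for normalization (the post does not state the exact method, so this is an assumption), the inverse transform is a single expression:

```python
import numpy as np

def denormalize(y_norm: np.ndarray, y_min: float, y_max: float) -> np.ndarray:
    """Invert min-max scaling: map values in [0, 1] back to the original load range."""
    return y_norm * (y_max - y_min) + y_min

# Illustrative bounds and normalized forecasts; real values come from the training data.
raw_forecast = denormalize(np.array([0.71, 0.70, 0.69]), y_min=6.9, y_max=8.1)
```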

Generation Sites, Capacities and Priority

Tonga has a hybrid generation system comprising solar PV farms, a wind farm, a battery energy storage system (BESS), and diesel generators.

Table 6: Generation sites, generator types, base capacities, and dispatch priorities.

| Name                         | Type   | Base Capacity (MW) | Priority |
|------------------------------|--------|--------------------|----------|
| Solar Farm Maama Mai         | Solar  | 1.412              | 1        |
| Solar Farm Mata o e Laa      | Solar  | 1.3                | 1        |
| Solar Farm Singyes           | Solar  | 2.13               | 1        |
| Solar Farm Sunergise 1       | Solar  | 2.3                | 1        |
| Solar Farm Sunergise 2       | Solar  | 2.3                | 1        |
| Solar Farm Sunergise 3       | Solar  | 2.3                | 1        |
| Wind Farm I o Manumataongo   | Wind   | 1.375              | 2        |
| Diesel Powerplant Popua 1.11 | Diesel | 2.765              | 3        |
| Diesel Powerplant Popua 1.12 | Diesel | 2.765              | 3        |
| Diesel Powerplant Popua 1.21 | Diesel | 1.4                | 3        |
| Diesel Powerplant Popua 1.22 | Diesel | 1.4                | 3        |
| Diesel Powerplant Popua 1.23 | Diesel | 1.4                | 3        |
| Diesel Powerplant Popua 1.24 | Diesel | 1.4                | 3        |
| Diesel Powerplant Popua 1.25 | Diesel | 1.4                | 3        |
| Diesel Powerplant Popua 1.26 | Diesel | 1.4                | 3        |

 

 

Table 6 lists the generation site names, generator types, base capacities, and dispatch priorities. Priority is central to generation dispatch because it tells the generation operator which power sources to use first when meeting the forecasted load demand, resulting in more efficient, cost-effective, and environmentally responsible operation.

Renewable energy sources are given the highest priority because they have lower operating costs and generate energy from free inputs such as solar irradiance and wind. This lowers tariffs and reserves the diesel generators, which have higher operating costs, for peak-hour demand.

With the world struggling with the effects of greenhouse gases, carbon footprint also matters, which further justifies dispatching renewable sources first given their zero fuel cost and low emissions. Because Pacific Island utilities rely heavily on imported fuel for electricity generation, electricity prices are also high and exposed to oil price shocks.

Supplying stable electricity to customers is the main focus of every electricity utility. The priority order therefore ensures that enough spinning reserve is available to respond quickly to load changes and to sudden drops in renewable generation, for example when cloud passes over the solar PV farms or the wind speed falls, and it helps planners schedule maintenance windows for each generation site. A dispatch sketch following this priority order is shown below.
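
This is a minimal sketch of priority-based (merit-order) dispatch against a forecasted load, using the Table 6 data; the 80% loading cap on diesel units (discussed under Table 7) and the treatment of renewable availability are simplifying assumptions:

```python
from dataclasses import dataclass

@dataclass
class Unit:
    name: str
    kind: str           # 'solar', 'wind' or 'diesel'
    capacity_mw: float  # base capacity from Table 6
    priority: int       # 1 = dispatched first

def dispatch(units: list[Unit], demand_mw: float, diesel_cap: float = 0.8) -> dict[str, float]:
    """Allocate generation to units in priority order until the demand is met."""
    schedule, remaining = {}, demand_mw
    for unit in sorted(units, key=lambda u: u.priority):
        # In practice, renewable output would also be derated by forecast availability.
        limit = unit.capacity_mw * (diesel_cap if unit.kind == "diesel" else 1.0)
        output = min(limit, max(remaining, 0.0))
        schedule[unit.name] = round(output, 2)
        remaining -= output
    return schedule

# Example with two units from Table 6 and an illustrative 7.58 MW demand.
units = [
    Unit("Solar Farm Maama Mai", "solar", 1.412, 1),
    Unit("Diesel Powerplant Popua 1.11", "diesel", 2.765, 3),
]
print(dispatch(units, demand_mw=7.58))
```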

 

Table 7: Generation planning and maintenance scheduling.

| Generation Site            | Month 1 (MW) | Month 2 (MW) | Month 3 (MW) |
|----------------------------|--------------|--------------|--------------|
| Forecasted Load            | 7.58         | 7.57         | 7.55         |
| Solar Farm Maama Mai       | 0.38         | 0.36         | 0.35         |
| Solar Farm Mata ó e Laá    | 0.35         | 0.34         | 0.33         |
| Solar Farm Singyes         | 0.58         | 0.55         | 0.53         |
| Solar Farm Sunergise 1     | 0.62         | 0.59         | 0.57         |
| Solar Farm Sunergise 2     | 0.62         | 0.59         | 0.57         |
| Solar Farm Sunergise 3     | 0.62         | 0.59         | 0.57         |
| Wind Farm I o Manumataongo | 0.69         | 0.73         | 0.82         |
| Diesel Power Plant 1.11    | 2.21         | 2.21         | 2.21         |
| Diesel Power Plant 1.12    | 1.5          | 1.59         | 1.58         |

 

As the table shows, the forecasted load demand is 7.58 MW, 7.57 MW, and 7.55 MW for the first three months respectively. Since solar PV and wind are given higher priority, they contribute significantly in every month. However, as the table also indicates, the average renewable output varies with the available resource: in January, which is summer, solar irradiance is higher but wind speed is lower, while towards March, as winter approaches, solar irradiance decreases and wind speed starts to increase.

To limit fuel consumption and emissions, the diesel generators are consistently run at 80% of their rated capacity. This also extends the operating lifetime of the generators and, most importantly, keeps enough spinning reserve in the system.
