EE 461 Final Post II: Load Forecasting for Tonga Power Limited

 

Technical Results

Data Visualizations

The data was cleaned of outliers and ambiguous entries before analysis.

Table 1: A subset of the 222×4 dataset.

| No. | GENERATED | SENT OUT | MONTH | Tmp2m (°C) |
|-----|-----------|----------|-------|------------|
| 1   | 4399713   | 4321268  | 7     | 23.5701    |
| 2   | 4503448   | 4422164  | 8     | 23.8409    |
| 3   | 4386093   | 4291216  | 9     | 24.3952    |
| 4   | 4559317   | 4438567  | 10    | 24.3477    |
| 5   | 4332707   | 4218030  | 11    | 25.8833    |
| 6   | 4473349   | 4378796  | 12    | 25.9076    |
| 7   | 4739349   | 4611365  | 1     | 27.3127    |
| 8   | 4558099   | 4435805  | 2     | 27.6036    |
| 9   | 4737354   | 4629863  | 3     | 27.8743    |

 

Table 1 shows the first 9 rows of our dataset, which contains 4 variables and over 200 monthly observations. These variables were chosen as the main features for load forecasting because they correlate with one another. The SENT OUT power, i.e. the power delivered to the distribution lines after generation, is the response variable. It was chosen over billed energy because, during the period covered by this dataset, some of the community in Tonga had been bypassing their meters, so the billed power recorded by Tonga Power Limited (TPL) may be inaccurate or contain outliers of its own.
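
As a minimal sketch of the cleaning step mentioned above (the file name, column names, and the 1.5×IQR rule are illustrative assumptions, not necessarily the exact procedure used), outliers in the response variable can be filtered with pandas like this:

```python
import pandas as pd

# Illustrative file and column names, matching Table 1.
df = pd.read_csv("tpl_monthly.csv")  # columns: GENERATED, SENT OUT, MONTH, Tmp2m

def remove_iqr_outliers(data: pd.DataFrame, column: str, k: float = 1.5) -> pd.DataFrame:
    """Drop rows whose value in `column` falls outside the k * IQR fences."""
    q1, q3 = data[column].quantile([0.25, 0.75])
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return data[(data[column] >= lower) & (data[column] <= upper)]

df = remove_iqr_outliers(df, "SENT OUT")
```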

Engineered Features

Table 2: The engineered features for our dataset.

| No. | SENT OUT LAG1 | SENT OUT ROLLING (12M) | EFFICIENCY | GROWTH RATE | MONTH SIN | MONTH COS |
|-----|---------------|------------------------|------------|-------------|-----------|-----------|
| 1   | NaN           | 4321268                | 0.9822     | NaN         | 0.5       | 0.866     |
| 2   | 4321268       | 4371716                | 0.982      | 0.0233      | 0.5       | 0.866     |
| 3   | 4422164       | 4344900                | 0.9784     | -0.0296     | 0.5       | 0.866     |
| 4   | 4291216       | 4368300                | 0.9735     | 0.0343      | 0.5       | 0.866     |
| 5   | 4438567       | 4338249                | 0.9735     | -0.0497     | 0.5       | 0.866     |
| 6   | 4218030       | 4345000                | 0.9789     | 0.0381      | 0.5       | 0.866     |
| 7   | 4378796       | 4383058                | 0.973      | 0.0531      | 0.5       | 0.866     |
| 8   | 4611365       | 4389700                | 0.9732     | -0.0381     | 0.5       | 0.866     |
| 9   | 4435805       | 4416300                | 0.9773     | 0.0437      | 0.5       | 0.866     |

 

The engineered features are intended to improve the forecasting model's predictive capability by capturing temporal patterns, trends, and cyclical behaviour in electricity demand. The "Sent Out Lag1" feature is the load from the preceding time step; it gives the model short-term memory of recent load values so it can capture instantaneous trends and variations. "Sent Out Rolling (12M)" is a 12-month rolling average of the sent-out load that smooths out short-term fluctuations and highlights longer-term seasonal or trend patterns. "Efficiency" is the ratio of sent-out energy to generated energy (about 0.98 in Table 2), which provides information about operational performance and system losses. "Growth Rate" captures the relative change in load from one period to the next, helping the model gauge how quickly demand is rising or falling. Finally, "Month Sin" and "Month Cos" are cyclical transformations of the month index using sine and cosine functions; they remove the artificial gap between adjacent months such as December and January and let the model learn seasonal patterns continuously. Together, these features give the model a view of seasonality, historical trends, and current load dynamics, all of which are essential for accurate forecasting.
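
A minimal pandas sketch of how these features can be derived from the cleaned data (`df` is assumed to be the cleaned DataFrame with the Table 1 columns; the column names are illustrative):

```python
import numpy as np

# df is assumed to hold the cleaned monthly data with the Table 1 columns.
df["SENT OUT LAG1"] = df["SENT OUT"].shift(1)                        # previous month's load
df["SENT OUT ROLLING (12M)"] = df["SENT OUT"].rolling(12, min_periods=1).mean()
df["EFFICIENCY"] = df["SENT OUT"] / df["GENERATED"]                  # share of generation actually sent out
df["GROWTH RATE"] = df["SENT OUT"].pct_change()                      # month-on-month relative change
df["MONTH SIN"] = np.sin(2 * np.pi * df["MONTH"] / 12)               # cyclical month encoding
df["MONTH COS"] = np.cos(2 * np.pi * df["MONTH"] / 12)
```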

Forecasting Models

Models Hyperparameters

Table 3: Classical statistical models.

| Hyper-parameter | ARIMA | SARIMA |
|-----------------|-------|--------|
| p               | 2     | 1      |
| d               | 0     | 0      |
| q               | 1     | 1      |
| P               | -     | 1      |
| D               | -     | 0      |
| Q               | -     | 7      |
| S               | -     | 9      |

 

The three primary hyperparameters of ARIMA (Autoregressive Integrated Moving Average), a popular classical model in time series forecasting, are p, d, and q. The parameter p is the number of autoregressive (AR) terms, i.e. how many past values are used to predict the current value; with p = 2, the model considers the last two observations. With d = 0, no differencing is needed because the data is already stationary. The parameter q is the number of moving average (MA) terms, which account for the impact of previous forecast errors on the prediction; here q = 1 means the model includes one lagged forecast error.

SARIMA (Seasonal ARIMA) adds further hyperparameters to model seasonal behaviour: P, D, Q, and S. The first three are the seasonal counterparts of p, d, and q. P is the number of seasonal autoregressive terms; here P = 1, so only one seasonal lag is considered. D is the order of seasonal differencing, and D = 0 means no seasonal differencing is applied. Q is the number of seasonal moving-average terms; with Q = 7, the model incorporates seven seasonal lagged forecast errors. S sets the length of the seasonal cycle, which is S = 9 here, so seasonal effects recur every nine time steps. Collectively, the fitted model is SARIMA(1, 0, 1)(1, 0, 7, 9), which captures both short-term and seasonal patterns for more precise forecasting. A code sketch of both models follows.
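
A minimal sketch of how these orders map onto statsmodels calls (variable and column names are illustrative; `df` is assumed to be the prepared dataset, with temperature used as the exogenous regressor for the ARIMAX/SARIMAX variants reported later):

```python
from statsmodels.tsa.arima.model import ARIMA
from statsmodels.tsa.statespace.sarimax import SARIMAX

y = df["SENT OUT"].astype(float)     # response variable
exog = df[["Tmp2m"]].astype(float)   # exogenous regressor (illustrative choice)

# ARIMA(2, 0, 1): two AR terms, no differencing, one MA term.
arima_fit = ARIMA(y, order=(2, 0, 1)).fit()

# SARIMA(1, 0, 1)(1, 0, 7, 9) with an exogenous regressor (i.e. SARIMAX).
sarimax_fit = SARIMAX(y, exog=exog, order=(1, 0, 1),
                      seasonal_order=(1, 0, 7, 9)).fit(disp=False)

# Three-step-ahead forecast; future exogenous values must be supplied
# (the last three observed temperatures are used here only as a placeholder).
forecast = sarimax_fit.forecast(steps=3, exog=exog.tail(3))
```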

Table 4: Deep learning models.

| Hyper-parameter | LSTM    | CNN | Hybrid  |
|-----------------|---------|-----|---------|
| NumLags         | 10      | -   | -       |
| HiddenUnits     | 54      | 64  | 82      |
| DropOut         | 0.48387 | 0.2 | 0.19794 |
| Epochs          | 224     | 100 | -       |
| Filters         | -       | 32  | 32      |
| BatchSize       | -       | 16  | -       |
| Window          | -       | -   | 12      |

 

The performance of deep learning models, including LSTM, CNN, and hybrid LSTM-CNN architectures, is strongly influenced by a number of hyperparameters. For the LSTM model, NumLags is set to 10, meaning each prediction is based on input features from the last ten time steps of historical data. The HiddenUnits parameter, the number of neurons in the LSTM layer, is set to 54, allowing the network to capture intricate temporal correlations in the input sequence. A comparatively high dropout rate of 0.48387 is used to avoid overfitting by randomly deactivating almost 48% of the neurons during training. The model is trained for 224 epochs to ensure sufficient learning over repeated passes through the dataset.

The CNN model uses 64 hidden units with a dropout rate of 0.2 to balance model capacity and regularization. Its 32 convolutional filters let the model identify local temporal patterns in the input sequence. With a batch size of 16, the model updates its weights every 16 samples, which speeds up training and improves convergence. Training runs for 100 epochs to allow adequate learning of local features.

The hybrid LSTM-CNN model combines the advantages of local pattern extraction (CNN) and temporal memory (LSTM), with 82 hidden units to provide increased learning capacity. A dropout rate of 0.19794 is used to reduce overfitting, similar to the standalone CNN model. In keeping with the CNN approach, the hybrid model also uses 32 convolutional filters for feature extraction, and it takes a window of 12 time steps as input, aligning the data structure for both the convolutional and recurrent layers. The number of training epochs for the hybrid model is not specified; in practice it would be determined by early stopping or validation performance.
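
The following is a minimal Keras sketch of a hybrid CNN-LSTM built from the Table 4 hyperparameters (the exact layer arrangement, kernel size, optimizer, feature count, and training settings are assumptions, not the architecture actually used):

```python
from tensorflow.keras import layers, models

WINDOW = 12       # input sequence length from Table 4
N_FEATURES = 7    # features per time step (illustrative)

model = models.Sequential([
    layers.Input(shape=(WINDOW, N_FEATURES)),
    # Convolutional front end extracts local temporal patterns.
    layers.Conv1D(filters=32, kernel_size=3, padding="causal", activation="relu"),
    # Recurrent layer captures longer-range temporal dependencies.
    layers.LSTM(82),
    layers.Dropout(0.19794),
    layers.Dense(1),  # one-step-ahead sent-out load
])

model.compile(optimizer="adam", loss="mse", metrics=["mae"])
# model.fit(X_train, y_train, validation_data=(X_val, y_val), batch_size=16, epochs=100)
```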

Model Comparison

Table 5: Evaluation metrics for all 5 models.

 

ARIMAX

| Evaluation Metrics | MAE    | MAPE   | MSE    | RMSE   | R²     |
|--------------------|--------|--------|--------|--------|--------|
| Test               | 0.023  | 10.65% | 0.011  | 0.0331 | 0.9978 |
| Training           | 0.0091 | 3.17%  | 0.0004 | 0.0201 | 0.9984 |

 

SARIMAX

| Evaluation Metrics | MAE    | MAPE   | MSE    | RMSE   | R²     |
|--------------------|--------|--------|--------|--------|--------|
| Test               | 0.2081 | 79.34% | 0.0657 | 0.2562 | 0.8641 |
| Training           | 0.1897 | 94.11% | 0.0529 | 0.23   | 0.7946 |

 

LSTM

| Evaluation Metrics | MAE    | MAPE    | MSE    | RMSE   | R²     |
|--------------------|--------|---------|--------|--------|--------|
| Training           | 0.2252 | 165.48% | 0.1163 | 0.3411 | 0.6364 |
| Validation         | 0.3132 | 66.18%  | 0.2244 | 0.4737 | 0.2873 |
| Test               | 0.3179 | 38.17%  | 0.2044 | 0.4521 | 0.4173 |

 

CNN

| Evaluation Metrics | MAE    | MAPE   | MSE    | RMSE   | R²     |
|--------------------|--------|--------|--------|--------|--------|
| Training           | 0.1196 | 47.37% | 0.0233 | 0.1526 | 0.9801 |
| Validation         | 0.1083 | 27.28% | 0.0177 | 0.133  | 0.9815 |
| Test               | 0.1083 | 27.29% | 0.0177 | 0.1672 | 0.9636 |

 

LSTM/CNN Hybrid

| Evaluation Metrics | MAE    | MAPE    | MSE    | RMSE   | R²     |
|--------------------|--------|---------|--------|--------|--------|
| Training           | 0.1852 | 79.59%  | 0.0527 | 0.2296 | 0.9479 |
| Validation         | 0.257  | 135.17% | 0.1031 | 0.3211 | 0.8761 |
| Test               | 0.4017 | 173.75% | 0.2312 | 0.4808 | 0.775  |
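
For reference, the error metrics above (the fifth column appears to be R²) can be computed on the normalized series with scikit-learn; this is a minimal sketch, assuming `y_true` and `y_pred` are the actual and predicted values for one model and data split:

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

def evaluate(y_true: np.ndarray, y_pred: np.ndarray) -> dict[str, float]:
    """Return MAE, MAPE (%), MSE, RMSE, and R² for a single model/split."""
    mse = mean_squared_error(y_true, y_pred)
    return {
        "MAE": mean_absolute_error(y_true, y_pred),
        "MAPE": float(np.mean(np.abs((y_true - y_pred) / y_true)) * 100),
        "MSE": mse,
        "RMSE": float(np.sqrt(mse)),
        "R2": r2_score(y_true, y_pred),
    }
```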

Generation Scheduling

Data de-normalization

The ARIMAX model's load forecast is produced on normalized input and output data, which improves model performance. The normalized forecast is then de-normalized to recover the raw load values used for generation scheduling.
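
Assuming min-max scaling was used for normalization (the post does not state the exact method, so this is an assumption), the inverse transform is a single expression:

```python
import numpy as np

def denormalize(y_norm: np.ndarray, y_min: float, y_max: float) -> np.ndarray:
    """Invert min-max scaling: map values in [0, 1] back to the original load range."""
    return y_norm * (y_max - y_min) + y_min

# Illustrative bounds and normalized forecasts; real values come from the training data.
raw_forecast = denormalize(np.array([0.71, 0.70, 0.69]), y_min=6.9, y_max=8.1)
```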

Generation Sites, Capacities and Priority

Tonga has a hybrid generation system comprising solar PV farms, a wind farm, a battery energy storage system (BESS), and diesel generators.

Table 6: Generation sites, generator types, base capacities, and dispatch priorities.

| Name                         | Type   | Base Capacity (MW) | Priority |
|------------------------------|--------|--------------------|----------|
| Solar Farm Maama Mai         | Solar  | 1.412              | 1        |
| Solar Farm Mata o e Laa      | Solar  | 1.3                | 1        |
| Solar Farm Singyes           | Solar  | 2.13               | 1        |
| Solar Farm Sunergise 1       | Solar  | 2.3                | 1        |
| Solar Farm Sunergise 2       | Solar  | 2.3                | 1        |
| Solar Farm Sunergise 3       | Solar  | 2.3                | 1        |
| Wind Farm I o Manumataongo   | Wind   | 1.375              | 2        |
| Diesel Powerplant Popua 1.11 | Diesel | 2.765              | 3        |
| Diesel Powerplant Popua 1.12 | Diesel | 2.765              | 3        |
| Diesel Powerplant Popua 1.21 | Diesel | 1.4                | 3        |
| Diesel Powerplant Popua 1.22 | Diesel | 1.4                | 3        |
| Diesel Powerplant Popua 1.23 | Diesel | 1.4                | 3        |
| Diesel Powerplant Popua 1.24 | Diesel | 1.4                | 3        |
| Diesel Powerplant Popua 1.25 | Diesel | 1.4                | 3        |
| Diesel Powerplant Popua 1.26 | Diesel | 1.4                | 3        |

 

 

Table 6 lists the generation site names, generator types, base capacities, and dispatch priorities. Priority is central to generation dispatch because it tells the generation operator which power sources to use first when meeting the forecasted load demand, resulting in more efficient, cost-effective, and environmentally responsible operation.

Renewable energy sources are given the highest priority because they have lower operating costs and generate energy from free inputs such as solar irradiance and wind. This lowers tariffs and reserves the diesel generators, which have higher operating costs, for peak-hour demand.

With the world struggling with the effects of greenhouse gases, carbon footprint also matters, which further justifies dispatching renewable sources first given their zero fuel cost and low emissions. Because Pacific Island utilities rely heavily on imported fuel for electricity generation, electricity prices are also high and exposed to oil price shocks.

Supplying stable electricity to customers is the main focus of every electricity utility. The priority order therefore ensures that enough spinning reserve is available to respond quickly to load changes and to sudden drops in renewable generation, for example when cloud passes over the solar PV farms or the wind speed falls, and it helps planners schedule maintenance windows for each generation site. A dispatch sketch following this priority order is shown below.
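
This is a minimal sketch of priority-based (merit-order) dispatch against a forecasted load, using the Table 6 data; the 80% loading cap on diesel units (discussed under Table 7) and the treatment of renewable availability are simplifying assumptions:

```python
from dataclasses import dataclass

@dataclass
class Unit:
    name: str
    kind: str           # 'solar', 'wind' or 'diesel'
    capacity_mw: float  # base capacity from Table 6
    priority: int       # 1 = dispatched first

def dispatch(units: list[Unit], demand_mw: float, diesel_cap: float = 0.8) -> dict[str, float]:
    """Allocate generation to units in priority order until the demand is met."""
    schedule, remaining = {}, demand_mw
    for unit in sorted(units, key=lambda u: u.priority):
        # In practice, renewable output would also be derated by forecast availability.
        limit = unit.capacity_mw * (diesel_cap if unit.kind == "diesel" else 1.0)
        output = min(limit, max(remaining, 0.0))
        schedule[unit.name] = round(output, 2)
        remaining -= output
    return schedule

# Example with two units from Table 6 and an illustrative 7.58 MW demand.
units = [
    Unit("Solar Farm Maama Mai", "solar", 1.412, 1),
    Unit("Diesel Powerplant Popua 1.11", "diesel", 2.765, 3),
]
print(dispatch(units, demand_mw=7.58))
```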

 

Table 7: Generation planning and maintenance scheduling.

| Generation Site            | Month 1 (MW) | Month 2 (MW) | Month 3 (MW) |
|----------------------------|--------------|--------------|--------------|
| Forecasted Load            | 7.58         | 7.57         | 7.55         |
| Solar Farm Maama Mai       | 0.38         | 0.36         | 0.35         |
| Solar Farm Mata ó e Laá    | 0.35         | 0.34         | 0.33         |
| Solar Farm Singyes         | 0.58         | 0.55         | 0.53         |
| Solar Farm Sunergise 1     | 0.62         | 0.59         | 0.57         |
| Solar Farm Sunergise 2     | 0.62         | 0.59         | 0.57         |
| Solar Farm Sunergise 3     | 0.62         | 0.59         | 0.57         |
| Wind Farm I o Manumataongo | 0.69         | 0.73         | 0.82         |
| Diesel Power Plant 1.11    | 2.21         | 2.21         | 2.21         |
| Diesel Power Plant 1.12    | 1.5          | 1.59         | 1.58         |

 

As the table shows, the forecasted load demand is 7.58 MW, 7.57 MW, and 7.55 MW for the first three months respectively. Since solar PV and wind are given higher priority, they contribute significantly in every month. However, as the table also indicates, the average renewable output varies with the available resource: in January, which is summer, solar irradiance is higher but wind speed is lower, while towards March, as winter approaches, solar irradiance decreases and wind speed starts to increase.

To limit fuel consumption and emissions, the diesel generators are consistently run at 80% of their rated capacity. This also extends the operating lifetime of the generators and, most importantly, keeps enough spinning reserve in the system.
