EE 461 Post 1: Data Familiarization and Visualization

 

EE461-Special-Topics-AI-Project-Load-Forecasting-

This project focuses on applying advanced machine learning and dimensionality reduction techniques to forecast electricity load demand for Tonga Power Limited (TPL).

Accurate load forecasting is vital for:

  • Ensuring grid stability
  • Managing renewable integration (solar, wind, BESS)
  • Improving energy efficiency
  • Supporting future planning initiatives in Tonga

Weeks 2–6 Progress: Data Familiarization and Visualization

Objective

  • Understand the historical electrical load data from Tonga Power Limited (TPL)
  • Preprocess the data: handle missing values, normalize, and clean inconsistencies
  • Visualize datasets to observe patterns, trends, and anomalies
  • Begin dimensionality reduction (PCA)

Tasks Completed

  • Mapping generation sites 

Figure 1: Map of generation sites

Figure 1 pinpoints the locations of the 10 power stations. The map was made using the inbuilt "geoscatter" command with 'topographic' as the basemap. The code uses the latitude and longitude provided in the "GSL" file to place each station accurately.

  • Importing and preprocessing data.
    Table 1: Datasets loading

Table 1 lists the datasets that were collected from TPL. They were then preprocessed using basic EDA methods. First, the data were imported and checked for irregularities. The most common irregularity was the absence of data for some years; for instance, "RE_GEN", the power generated from renewable energy sources, was only recorded from August 2012 onward. Entries for the earlier period were empty and labeled 'NaN'. Using the skills learnt in the practicals, the commands "fillmissing" and "removevars" were used to tidy up the table.
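The cleanup step can be illustrated with a minimal sketch. It is written in plain Python rather than with the "fillmissing" and "removevars" commands used in the project. The "GENERATED" column name, all values, the 50% missing-data threshold, and the fill-with-previous-value rule are assumptions for illustration only, not the actual TPL data or the settings used.

```python
import math

# Hypothetical monthly table: column name -> values (NaN = not recorded).
table = {
    "GENERATED": [4.5e6, 4.6e6, math.nan, 4.7e6],
    "RE_GEN": [math.nan, math.nan, math.nan, 2.0e5],
}


def remove_sparse_columns(tbl, max_missing_fraction=0.5):
    """Drop columns whose share of NaN entries exceeds the threshold."""
    kept = {}
    for name, values in tbl.items():
        missing = sum(1 for v in values if math.isnan(v))
        if missing / len(values) <= max_missing_fraction:
            kept[name] = values
    return kept


def fill_previous(values):
    """Replace each NaN with the most recent valid value (leading NaNs stay NaN)."""
    filled, last = [], math.nan
    for v in values:
        if not math.isnan(v):
            last = v
        filled.append(last)
    return filled


# Drop mostly-empty columns first, then fill the remaining gaps.
cleaned = {
    name: fill_previous(values)
    for name, values in remove_sparse_columns(table).items()
}
```

In this toy table "RE_GEN" is dropped because three of its four entries are missing, while the single gap in "GENERATED" is filled from the preceding month. Whether a sparse variable should be dropped or kept and filled depends on how much of the record is usable.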

  • Visualizations:
    • Histograms of:
      • Generation.
Figure 2: Histogram of Generation Power

Using the "histogram" command, we generated a graphical representation of the power generated on a monthly basis. It lets us identify patterns, trends in production, and anomalies in the data. For example, the visualization shows that monthly generation fell within the range of 4.5 x10^6 kWh to 4.675 x10^6 kWh 26 times.
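To show how a bar count like the one quoted above arises, here is a small plain-Python sketch of fixed-width binning. It is not the "histogram" command itself; the sample values, the starting edge, and the number of bins are assumptions, and only the 175,000 kWh bin width is taken from the range mentioned in the text.

```python
def bin_counts(values, start, width, n_bins):
    """Count values per bin; bin i covers [start + i*width, start + (i+1)*width)."""
    counts = [0] * n_bins
    for v in values:
        i = int((v - start) // width)
        if 0 <= i < n_bins:  # values outside the binned range are ignored
            counts[i] += 1
    return counts


# Hypothetical monthly generation values in kWh (not the TPL data).
monthly_kwh = [4_510_000, 4_600_000, 4_674_999, 4_675_000, 4_800_000, 4_330_000]

# Width of 175,000 kWh matches the 4.5e6 to 4.675e6 range quoted above;
# the start edge and bin count are assumptions.
counts = bin_counts(monthly_kwh, start=4_325_000, width=175_000, n_bins=3)
```

Each entry of `counts` corresponds to the height of one bar, so the middle entry here plays the role of the 4.5 x10^6 to 4.675 x10^6 kWh bar in Figure 2.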

      • Parasitic losses.
Figure 3: Histogram of Parasitic losses

This histogram plays a crucial role in understanding the behavior of parasitic losses within a generation system. It provides clearer insight into how these losses are distributed across different time periods (on a monthly basis). It helps engineers and technicians assess system performance and identify key areas for improvement.

      • Line losses.
Figure 4: Line losses Histogram

The histogram above shows the line losses and how often each level of loss was recorded. Line losses fall predominantly in the range of 400,000 to 600,000 kWh, with most values around 500,000 kWh. The histogram also shows that line losses vary widely, with some recordings around 2,000,000 kWh, although these occur very rarely. This histogram helps users, especially engineers, grasp the range within which line losses occur.

    • Scatter plots of generation metrics (Generated, Sent Out and Billed).
Figure 5: Scatter Plot

The scatter plot above shows Generated (total electricity generated), Sent Out (total electricity sent out to the grid), and Billed (electricity billed to consumers) over time. All three variables show an increasing trend, indicating growing electricity generation, delivery, and consumption. It can also be seen that Generated power (blue) is greater than the other two variables due to losses.
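The gap between the three series can be made concrete with a short plain-Python sketch. It assumes that parasitic losses are the difference between Generated and Sent Out, and that line losses are the difference between Sent Out and Billed. These definitions and all numbers below are illustrative assumptions, not figures or formulas taken from the TPL datasets.

```python
# Hypothetical monthly energy figures in kWh (not the TPL data).
generated = [4_600_000, 4_700_000]
sent_out = [4_450_000, 4_540_000]
billed = [3_950_000, 4_050_000]

# Assumed definition: energy consumed within the stations before reaching the grid.
parasitic_losses = [g - s for g, s in zip(generated, sent_out)]

# Assumed definition: energy sent to the grid but not billed to consumers.
line_losses = [s - b for s, b in zip(sent_out, billed)]

# Total loss as a percentage of generation, per month.
loss_pct = [100 * (g - b) / g for g, b in zip(generated, billed)]
```

Under these assumptions, the ordering Generated > Sent Out > Billed seen in Figure 5 follows directly from both loss terms being positive.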

  • Principal Component Analysis (PCA) with Pareto chart and component variance analysis 
Figure 6: PC analysis

In the Pareto chart above, each bar represents one principal component, and the height of the bar shows how much of the variance that single component contributes. The red line gives the cumulative percentage of the components combined. Here, it can be seen that 80% is reached by the third component.
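The cumulative line of a Pareto chart can be reproduced with a few lines of plain Python. This is a sketch of the calculation only, not the project's PCA code: the component variances below are made-up values chosen so that the 80% mark falls on the third component, and the function name is hypothetical.

```python
def components_for_threshold(variances, threshold_pct=80.0):
    """Return cumulative percentages and how many leading components reach the threshold."""
    total = sum(variances)
    ordered = sorted(variances, reverse=True)  # largest component first
    cumulative, running = [], 0.0
    for v in ordered:
        running += v
        cumulative.append(100.0 * running / total)
    # First position where the cumulative share meets the threshold (1-based).
    needed = next(i + 1 for i, pct in enumerate(cumulative) if pct >= threshold_pct)
    return cumulative, needed


# Hypothetical component variances (eigenvalues), not the project's results.
variances = [5.0, 2.0, 1.5, 1.0, 0.5]
cumulative, needed = components_for_threshold(variances)
```

Here `cumulative` corresponds to the red line and `needed` to the number of components that would be kept at an 80% cutoff.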

Remarks

After data collection and preprocessing, which included examining, cleansing, and visualizing the data, normalization was carried out along with PCA. In the coming weeks, other linear techniques such as LDA and nonlinear techniques such as CCA will be explored before moving on to feature engineering.
