EE 461 Post 1: Data Familiarization and Visualization
This project applies machine learning and dimensionality reduction techniques to forecast electricity load demand for Tonga Power Limited (TPL). Accurate load forecasts support:
- Ensuring grid stability
- Managing renewable integration (solar, wind, BESS)
- Improving energy efficiency
- Supporting future planning initiatives in Tonga
Objectives of this post:
- Understand the historical electrical load data from Tonga Power Limited (TPL)
- Preprocess the data: handle missing values, normalize, and clean inconsistencies
- Visualize datasets to observe patterns, trends, and anomalies
- Begin dimensionality reduction (PCA)
- Mapping generation sites
This image shows the locations of the 10 power stations. It was produced with the built-in "geoscatter" command, with 'topographic' chosen as the basemap. The code uses the latitude and longitude values from the "GSL" file to place each station accurately on the map.
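The mapping step can be sketched as follows, assuming the "geoscatter" command named above is the MATLAB function of that name. The coordinates are hypothetical placeholders in the general area of Tongatapu, not the actual station positions from the "GSL" file, and only three points are shown.

```matlab
% Hypothetical coordinates (decimal degrees); placeholders only, NOT the
% real TPL station locations. In the post these came from the "GSL" file.
lat = [-21.14; -21.17; -21.21];
lon = [-175.20; -175.25; -175.12];

figure
gx = geoaxes;                                  % geographic (map) axes
geoscatter(gx, lat, lon, 60, 'r', 'filled')    % one filled marker per site
geobasemap(gx, 'topographic')                  % basemap tiles need internet access
title(gx, 'Generation sites (illustrative coordinates)')
```

Swapping the basemap name (for example 'satellite' or 'streets') changes only the background, so the same call can be reused to compare how readable the station markers are on each one.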
- Importing and preprocessing data.
Table 1: Datasets loading
The table above lists the datasets that were collected from TPL. They were then preprocessed using basic exploratory data analysis (EDA) methods. First, the data was imported and checked for irregularities. The most common irregularity was missing data for some years. For instance, "RE_GEN", the power generated from renewable energy sources, was only recorded from August 2012 onward, so the entries for earlier periods were empty and appeared as 'NaN'. Using the skills learnt in the practicals, the "fillmissing" and "removevars" commands were used to tidy up the table.
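A minimal sketch of this clean-up, assuming MATLAB tables. Apart from RE_GEN, the variable names and every value are made up for illustration, and the fill methods shown are one reasonable choice rather than the ones actually used in the project.

```matlab
% Hypothetical monthly table; only the name RE_GEN comes from the post.
Month     = (datetime(2012,5,1):calmonths(1):datetime(2012,10,1))';
Generated = [4.51e6; 4.60e6; NaN; 4.58e6; 4.66e6; 4.70e6];
RE_GEN    = [NaN; NaN; NaN; 1.2e5; 1.4e5; 1.3e5];   % recorded from Aug 2012
Notes     = strings(6,1);                            % empty, unused column
T = table(Month, Generated, RE_GEN, Notes);

% Count missing entries per variable before cleaning.
nMissing = sum(ismissing(T));

% Drop the column that carries no information.
T = removevars(T, 'Notes');

% Linearly interpolate the isolated gap in the measured series.
T.Generated = fillmissing(T.Generated, 'linear');

% Assumption: renewable generation was zero before it was first recorded,
% so the leading NaN values are replaced with the constant 0.
T.RE_GEN = fillmissing(T.RE_GEN, 'constant', 0);
```

Filling with a constant and interpolating answer different questions: a constant suits a quantity that did not exist yet, while interpolation suits a short gap inside an otherwise continuous record.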
- Visualizations:
- Histograms of:
- Generation.
Using the "histogram" command, we generated a graphical representation of the power generated on a monthly basis. This lets us identify patterns, trends in production, and anomalies in the data. For example, the visualization shows that monthly generation in the range of 4.5 x 10^6 kWh to 4.675 x 10^6 kWh occurred 26 times.
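The bin ranges and counts quoted above can be read directly from the histogram object rather than off the figure. The sketch below assumes MATLAB and uses randomly generated placeholder values, not the TPL generation data.

```matlab
rng(0)                                    % reproducible placeholder data
gen_kWh = 4.5e6 + 3.5e5*rand(120, 1);     % synthetic monthly values, NOT TPL data

figure
h = histogram(gen_kWh);                   % automatic bin selection
xlabel('Monthly generation (kWh)')
ylabel('Number of months')
title('Distribution of monthly generation (synthetic data)')

% Bar k spans BinEdges(k) to BinEdges(k+1) and has height Values(k).
[topCount, k] = max(h.Values);
fprintf('Most common range: %.4g to %.4g kWh (%d months)\n', ...
        h.BinEdges(k), h.BinEdges(k+1), topCount);
```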
- Parasitic losses.
This histogram plays a crucial role in understanding the behavior of parasitic losses within a generation system. It gives a clearer insight into how these losses are distributed across the monthly records, which helps engineers and technicians assess system performance and identify key areas for improvement.
- Line losses.
The histogram above shows the line losses and how often each level of loss was recorded. Line losses fall predominantly in the range of 400,000 to 600,000 kWh, with most values around 500,000 kWh. The losses also varied greatly, with some records around 2,000,000 kWh, although these occur very rarely. This histogram helps users, especially engineers, grasp the range within which line losses occur.
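Statements such as "predominantly in the range of 400,000 to 600,000 kWh" can be backed with numbers by counting per bin and flagging the rare large values. This is a sketch in MATLAB with a made-up loss series; the bin width and the use of isoutlier (median-based by default) are illustrative assumptions, not part of the original analysis.

```matlab
% Synthetic line-loss values in kWh; placeholders, NOT TPL data.
lineLoss = [4.2e5 4.8e5 5.1e5 5.3e5 4.9e5 5.6e5 4.6e5 6.5e5 5.0e5 2.0e6]';

edges  = 0:2e5:2.2e6;                     % bins 200,000 kWh wide
counts = histcounts(lineLoss, edges);     % number of months in each bin

% Share of months with losses from 400,000 up to (not including) 600,000 kWh.
shareMain = mean(lineLoss >= 4e5 & lineLoss < 6e5);

% Flag values more than three scaled median absolute deviations from the
% median (the default rule of isoutlier).
isHigh = isoutlier(lineLoss);
```

A flagged month is only a candidate anomaly; whether it reflects a metering error or a real event still has to be checked against the source records.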
- Scatter plots of generation metrics (Generated, Sent Out and Billed).
The above scatter plot shows Generated (total electricity generated), Sent Out (total electricity sent out to the grid), and Billed (electricity billed to consumers) against time. All three variables show an increasing trend, indicating growing electricity generation, delivery, and consumption over time. Generated power (blue) is also consistently greater than the other two variables because of losses.
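A plot of this kind, together with the gaps between the three series, can be sketched in MATLAB as below. The series are synthetic, and the 3% and 12% gaps are arbitrary assumptions chosen only to make the ordering Generated > Sent Out > Billed visible.

```matlab
% Synthetic monthly series in kWh; placeholders, NOT TPL data.
t         = (datetime(2015,1,1):calmonths(1):datetime(2015,12,1))';
Generated = linspace(4.5e6, 5.0e6, 12)';
SentOut   = 0.97*Generated;      % assumed 3% gap between generated and sent out
Billed    = 0.88*Generated;      % assumed 12% total gap between generated and billed

figure
scatter(t, Generated, 'filled'); hold on
scatter(t, SentOut, 'filled')
scatter(t, Billed, 'filled'); hold off
legend('Generated', 'Sent out', 'Billed', 'Location', 'northwest')
xlabel('Month'); ylabel('Energy (kWh)')

% Gaps between the series as a percentage of generation.
stationGapPct = 100*(Generated - SentOut)./Generated;
networkGapPct = 100*(SentOut - Billed)./Generated;
```

Plotting the two percentage gaps over time is a quick way to see whether losses grow in step with generation or drift independently of it.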
- Principal Component Analysis (PCA) with Pareto chart and component variance analysis
In the above Pareto chart, each bar represents one principal component, and the height of the bar shows how much that single component contributes to the total variance. The red line gives the cumulative percentage of the components combined. Here, the cumulative contribution reaches 80% by the third component.
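The chart and the 80% reading can be reproduced with a short MATLAB sketch. It assumes the Statistics and Machine Learning Toolbox for pca and uses a synthetic five-column matrix in place of the TPL variables, so the number of components it reports will not match the result above.

```matlab
rng(1)                                   % reproducible placeholder data
n    = 120;
base = randn(n, 2);                      % two hidden drivers
X = [base(:,1), base(:,1) + 0.3*randn(n,1), ...
     base(:,2), base(:,2) + 0.3*randn(n,1), randn(n,1)];   % synthetic, NOT TPL data

Z = normalize(X);                        % z-score each column before PCA
[coeff, score, ~, ~, explained] = pca(Z);    % explained: percent variance per component

cumExplained = cumsum(explained);
nKeep = find(cumExplained >= 80, 1);     % smallest component count reaching 80%

figure
pareto(explained)                        % bars per component plus cumulative line
xlabel('Principal component'); ylabel('Variance explained (%)')

Zreduced = score(:, 1:nKeep);            % data projected onto the kept components
```

The 80% threshold is a convention rather than a rule; raising it keeps more components and retains more of the variance at the cost of a larger feature set.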
After data collection and preprocessing, which included examining, cleansing, and visualizing the data, normalization was carried out along with PCA. In the coming weeks, other linear techniques such as LDA and non-linear techniques such as CCA will be explored before moving on to feature engineering.