EE 461 Post 2: Advanced Exploratory Data Analysis

 

Week 7 Progress: Advanced Exploratory Data Analysis

Objective For Week 7

  • Explore deeper relationships through PCA, CCA, and t-SNE
  • Analyze intrinsic dimensionality
  • Visualize clustering and transitions

Tasks Completed

  • Cleaned dataset.
Table 1: Cleaned Dataset

The dataset was cleaned by keeping only the important variables and filtering out the rest.

  • 3D scatter plots.
Figure 1: 3D scatter plot of power

This 3D scatter plot visually compares the three key operational variables: generated, sent-out, and billed power. Each point is colored by the corresponding line loss, helping to identify how efficiently power is being transmitted and where losses may be excessive. It provides a high-level overview of the relationship between production and consumption and highlights potential outliers or inefficiencies in the energy flow, forming a baseline understanding of system performance.

  • Grouped scatter plots.
Figure 2: 2D scatter plot of yearly power

This 2D scatter plot groups monthly billed and sent-out power data by year using distinct colors. It allows for the observation of annual trends, detecting any systemic shifts or performance variations over time. This visualization is particularly useful for spotting gradual improvements or deteriorations in the energy distribution system and understanding the temporal evolution of billing accuracy relative to transmission.

  • Principal Component Analysis (PCA) (2D biplots).
Figure 3: 2D biplot

The 2D biplot above shows the normalized data with respect to the principal components (PC1 and PC2), together with the variables Billed, Generated, Sent_out, and Re_Gen. Re_Gen represents the power generated from renewable energy, which in this case is solar energy, wind energy, and BESS. Re_Gen has a very high positive correlation with PC1, while Billed has the least influence on both PC1 and PC2. Because of this weak correlation, Billed may require feature engineering to improve forecasting accuracy. Meanwhile, there is very little separation between the Generated and Sent_out variables, because the only difference between them comes from line loss.
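
As a rough illustration of how the two ingredients of a biplot (sample scores and variable directions) can be computed, here is a minimal sketch using NumPy. The data are synthetic stand-ins generated only so the example runs; they are not the project's dataset, and the scaling factors and noise levels are assumptions. The column names simply mirror the variables named above.

```python
import numpy as np

# Synthetic stand-in data, NOT the project's dataset. The column names follow
# the post; the numbers are invented purely so the sketch runs.
rng = np.random.default_rng(0)
n = 60
generated = rng.uniform(4000.0, 6000.0, size=n)
sent_out = 0.97 * generated + rng.normal(0.0, 20.0, size=n)
billed = 0.90 * sent_out + rng.normal(0.0, 150.0, size=n)
re_gen = rng.uniform(100.0, 800.0, size=n)

names = ["Generated", "Sent_out", "Billed", "Re_Gen"]
X = np.column_stack([generated, sent_out, billed, re_gen])

# Normalize each variable to zero mean and unit variance.
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

# PCA via SVD of the normalized data: the rows of Vt are the principal axes.
U, S, Vt = np.linalg.svd(Z, full_matrices=False)
loadings = Vt.T        # one row per variable, one column per component
scores = Z @ loadings  # one row per sample, one column per component

# The first two columns are what a 2D biplot draws: scores as points,
# loadings as arrows from the origin.
for name, (pc1, pc2) in zip(names, loadings[:, :2]):
    print(f"{name:10s} PC1 = {pc1:+.3f}   PC2 = {pc2:+.3f}")
```

Note that the sign of each principal axis is arbitrary, so an arrow may point the opposite way in another run or library without changing the interpretation.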

  • PCA (3D biplots).
Figure 4: 3D biplot

The figure above is a 3D biplot of the normalized data with respect to the principal components and the variables Billed, Generated, Sent_out, and Re_Gen.

  • Correlation heatmap.
Figure 5: Correlation heatmap

The correlation heatmap above shows the pairwise relationships between the four variables: GENERATED, SENT_OUT, BILLED, and REGEN. Each cell of the matrix displays the Pearson correlation coefficient, which measures the strength and direction of the linear relationship between a pair of variables. Since every variable is perfectly correlated with itself, all of the diagonal values are 1. Darker blue hues indicate stronger positive correlations. According to the heatmap, all four variables are positively and strongly associated, and GENERATED, SENT_OUT, and BILLED are especially closely correlated. This suggests that energy flows and is tracked consistently throughout the generation, distribution, and billing operations. REGEN's somewhat lower correlation raises the possibility that it has a secondary function or is more variable.
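
For readers who want to see what each cell of such a heatmap contains, the sketch below computes a Pearson correlation matrix with NumPy and cross-checks one entry against the textbook formula. The data are synthetic and invented for illustration (they are not the project's measurements), and the `pearson` helper is a hypothetical name introduced here.

```python
import numpy as np

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length 1D arrays."""
    xc = x - x.mean()
    yc = y - y.mean()
    return float((xc * yc).sum() / np.sqrt((xc ** 2).sum() * (yc ** 2).sum()))

# Synthetic stand-in data, NOT the project's dataset.
rng = np.random.default_rng(1)
n = 60
generated = rng.uniform(4000.0, 6000.0, size=n)
sent_out = 0.97 * generated + rng.normal(0.0, 20.0, size=n)
billed = 0.90 * sent_out + rng.normal(0.0, 30.0, size=n)
regen = 0.10 * generated + rng.normal(0.0, 40.0, size=n)

names = ["GENERATED", "SENT_OUT", "BILLED", "REGEN"]
X = np.column_stack([generated, sent_out, billed, regen])

# rowvar=False tells NumPy that each column (not each row) is a variable.
corr = np.corrcoef(X, rowvar=False)

# Print the matrix with row labels; the diagonal is 1 by construction.
for name, row in zip(names, corr):
    print(f"{name:10s}", " ".join(f"{value:6.3f}" for value in row))
```

The resulting matrix is symmetric, so a heatmap of it carries the same information above and below the diagonal.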

  • Dy/Dx plot & Pareto.
Figure 6: PCA plots

The Pareto chart above shows the variance explained by each of the two principal components obtained from PCA dimensionality reduction. The first component, PC1, captures almost all of the variance, while the second component contributes very little, which puts the intrinsic dimensionality of the dataset at 1. This shows that the variables are highly redundant, which may be because the three variables Billed, Generated, and Sent_out are separated only by small losses. Meanwhile, the dy-dx plot shows that distances in the reduced-dimensional space remain very close to those in the original space.
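
The quantities behind a Pareto chart and a dy-dx plot can be sketched as follows. This is only an illustration on synthetic data (not the project's dataset): the 95% cumulative-variance threshold used to pick the number of components is an assumption made here, not necessarily the rule used for the figures, and `pairwise_distances` is a hypothetical helper.

```python
import numpy as np

# Synthetic stand-in data, NOT the project's dataset: three variables that
# differ only by small losses and noise, so they are highly redundant.
rng = np.random.default_rng(2)
n = 60
generated = rng.uniform(4000.0, 6000.0, size=n)
sent_out = 0.97 * generated + rng.normal(0.0, 20.0, size=n)
billed = 0.90 * sent_out + rng.normal(0.0, 30.0, size=n)
X = np.column_stack([generated, sent_out, billed])
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

# Explained variance ratio per component (the bars of a Pareto chart)
# and its running total (the cumulative line).
U, S, Vt = np.linalg.svd(Z, full_matrices=False)
explained_ratio = S ** 2 / np.sum(S ** 2)
cumulative = np.cumsum(explained_ratio)

# Assumed rule: smallest number of components reaching 95% of the variance.
THRESHOLD = 0.95
intrinsic_dim = int(np.searchsorted(cumulative, THRESHOLD) + 1)

def pairwise_distances(A):
    """Euclidean distances between all distinct pairs of rows of A."""
    diff = A[:, None, :] - A[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    upper = np.triu_indices(len(A), k=1)
    return dist[upper]

# dy-dx comparison: distances in the original space (dx) against distances
# after projecting onto the retained components (dy).
Y = Z @ Vt[:intrinsic_dim].T
dx = pairwise_distances(Z)
dy = pairwise_distances(Y)

print("explained variance ratio:", np.round(explained_ratio, 4))
print("components retained:", intrinsic_dim)
print("largest |dx - dy|:", float(np.max(np.abs(dx - dy))))
```

Plotting `dy` against `dx` gives the dy-dx plot; points lying close to the diagonal indicate that the projection preserves pairwise distances well.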

Remarks

With the PCA techniques complete, the next techniques to look at will be:

  • Curvilinear Component Analysis (CCA).
  • t-distributed Stochastic Neighbor Embedding (t-SNE).
  • Feature engineering.
