EE 461 Post 2: Advanced Exploratory Data Analysis
- Explore deeper relationships through PCA, CCA, and t-SNE
- Analyze intrinsic dimensionality
- Visualize clustering and transitions
- Cleaned Dataset
Table 1: Cleaned Dataset
- 3D Scatter plots.
This 3D scatter plot visually compares the three key operational variables: generated, sent-out, and billed power. Each point is colored by the corresponding line loss, helping to identify how efficiently power is being transmitted and where losses may be excessive. It provides a high-level overview of the relationship between production and consumption and highlights potential outliers or inefficiencies in the energy flow, forming a baseline understanding of system performance.
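A plot of this kind can be sketched with matplotlib as shown here. This is only an illustration: the monthly values are randomly generated stand-ins rather than the cleaned dataset, and the definition of line loss used here (the percentage of sent-out power that is not billed) is an assumption, since the post does not state the formula.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script also runs headless
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical monthly values in arbitrary energy units (NOT the post's dataset).
generated = rng.uniform(800, 1200, size=60)
sent_out = generated * rng.uniform(0.95, 0.99, size=60)
billed = sent_out * rng.uniform(0.88, 0.95, size=60)

# Assumed definition: line loss as the percentage of sent-out power not billed.
line_loss_pct = 100.0 * (sent_out - billed) / sent_out

fig = plt.figure(figsize=(7, 6))
ax = fig.add_subplot(projection="3d")

# One point per month; the color encodes the line loss of that month.
points = ax.scatter(generated, sent_out, billed, c=line_loss_pct, cmap="viridis")
ax.set_xlabel("Generated")
ax.set_ylabel("Sent out")
ax.set_zlabel("Billed")
fig.colorbar(points, ax=ax, label="Line loss (%)")
```

Calling `fig.savefig("scatter3d.png")` afterwards writes the figure to disk. Points whose color sits at the high end of the color bar are the months worth inspecting for excessive losses.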
- Grouped scatter plots.
This 2D scatter plot groups monthly billed and sent-out power data by year using distinct colors. It allows for the observation of annual trends, detecting any systemic shifts or performance variations over time. This visualization is particularly useful for spotting gradual improvements or deteriorations in the energy distribution system and understanding the temporal evolution of billing accuracy relative to transmission.
- Principal Component Analysis (PCA) (2D biplots).
- PCA (3D biplots)
The figure above is a 3D biplot of the normalized data, showing the principal components together with the variables Billed, Generated, Sent_out, and Re_Gen.
- Correlation heatmap.
The correlation heatmap above shows the pairwise relationships between the four variables: GENERATED, SENT_OUT, BILLED, and REGEN. Each cell of the matrix displays the Pearson correlation coefficient, which measures the strength and direction of the linear relationship between a pair of variables. Since every variable correlates perfectly with itself, all of the diagonal values are 1. Darker blue hues indicate stronger positive correlations. According to this heatmap, all four variables are strongly and positively associated, and GENERATED, SENT_OUT, and BILLED are especially closely correlated. This suggests that energy is tracked consistently through the generation, distribution, and billing operations. REGEN's somewhat lower correlation raises the possibility that it plays a secondary role or is more variable.
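The matrix behind such a heatmap can be computed with NumPy's `np.corrcoef`. The sketch below uses randomly generated placeholder series, not the post's data, and the way the REGEN series is simulated is purely an assumption made to give it a weaker correlation.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 60

# Hypothetical monthly series (NOT the post's dataset).
generated = rng.uniform(800, 1200, size=n)
sent_out = generated * rng.uniform(0.95, 0.99, size=n)
billed = sent_out * rng.uniform(0.88, 0.95, size=n)
regen = 0.3 * generated + rng.normal(0, 40, size=n)  # assumed noisier series

names = ["GENERATED", "SENT_OUT", "BILLED", "REGEN"]
data = np.vstack([generated, sent_out, billed, regen])  # one row per variable

# Pearson correlation matrix: entry (i, j) is the correlation of rows i and j.
corr = np.corrcoef(data)

# Print the matrix with row labels; the diagonal is 1 by construction.
for name, row in zip(names, corr):
    print(f"{name:>10}", " ".join(f"{r:6.3f}" for r in row))
```

The resulting 4 x 4 array is symmetric, so a heatmap only needs one triangle of it to convey every pairwise relationship.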
- dy-dx plot and Pareto chart.
- Figure 6: PCA plots
The Pareto chart above shows the variance explained by each of the two principal components obtained from the PCA dimensionality reduction. The first component, PC1, captures almost all of the variance, while the second contributes very little, which puts the intrinsic dimensionality of the dataset at 1. This indicates that the variables are highly redundant, most likely because Billed, Generated, and Sent out differ only by small losses. Meanwhile, the dy-dx plot shows that pairwise distances in the reduced-dimensional space remain very close to those in the original space.
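The quantities behind both plots can be sketched in NumPy. The example below is an illustration under stated assumptions: the data are synthetic placeholders, PCA is done through an SVD of the standardized data, and the 95% cumulative-variance cutoff used to read off the intrinsic dimensionality is an assumed rule of thumb, not one given in the post.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 60

# Hypothetical monthly series (NOT the post's dataset).
generated = rng.uniform(800, 1200, size=n)
sent_out = generated * rng.uniform(0.95, 0.99, size=n)
billed = sent_out * rng.uniform(0.88, 0.95, size=n)
X = np.column_stack([generated, sent_out, billed])

# Standardize each column to zero mean and unit variance.
Z = (X - X.mean(axis=0)) / X.std(axis=0)

# PCA via SVD: rows of Vt are the principal directions.
U, s, Vt = np.linalg.svd(Z, full_matrices=False)
explained = s**2 / np.sum(s**2)   # bar heights of the Pareto chart
cumulative = np.cumsum(explained)  # cumulative line of the Pareto chart

# Assumed rule: smallest number of components reaching 95% of the variance.
threshold = 0.95
intrinsic_dim = int(np.searchsorted(cumulative, threshold) + 1)

def pairwise_distances(A):
    """Euclidean distance for every unordered pair of rows in A."""
    diff = A[:, None, :] - A[None, :, :]
    dist = np.sqrt((diff**2).sum(axis=-1))
    upper = np.triu_indices(len(A), k=1)
    return dist[upper]

# dy-dx data: distances in the original space (dx) against distances
# after projecting onto the first k principal components (dy).
k = 2
Y = Z @ Vt[:k].T
dx = pairwise_distances(Z)
dy = pairwise_distances(Y)

for i, (e, c) in enumerate(zip(explained, cumulative), start=1):
    print(f"PC{i}: explained={e:.4f} cumulative={c:.4f}")
print("estimated intrinsic dimensionality:", intrinsic_dim)
print("largest distance lost by the projection:", float(np.max(dx - dy)))
```

Plotting `dy` against `dx` gives the dy-dx diagram: points lying close to the diagonal mean the projection preserves distances well. Because a PCA projection is orthogonal, `dy` can never exceed `dx`, so points only fall on or below that diagonal.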
With the PCA analysis complete, the next techniques to examine are:
- Curvilinear Component Analysis (CCA).
- t-distributed Stochastic Neighbor Embedding (t-SNE).
- Engineering Features.