Dataset statistics
| Statistic | Value |
|---|---|
| Number of variables | 4 |
| Number of observations | 34561 |
| Missing cells | 0 |
| Missing cells (%) | 0.0% |
| Duplicate rows | 0 |
| Duplicate rows (%) | 0.0% |
| Total size in memory | 1.1 MiB |
| Average record size in memory | 32.0 B |
Variable types
| Type | Count |
|---|---|
| Numeric | 4 |
Alerts
| Alert | Type |
|---|---|
| Coal is highly correlated with FirstWind and 2 other fields | High correlation |
| FirstWind is highly correlated with Coal and 1 other field | High correlation |
| SecondWind is highly correlated with Coal and 1 other field | High correlation |
| O2 is highly correlated with Coal | High correlation |
| Coal is highly correlated with FirstWind and 2 other fields | High correlation |
| FirstWind is highly correlated with Coal | High correlation |
| SecondWind is highly correlated with Coal | High correlation |
| O2 is highly correlated with Coal | High correlation |
| Coal is highly correlated with SecondWind | High correlation |
| SecondWind is highly correlated with Coal | High correlation |
| Coal is highly correlated with FirstWind and 2 other fields | High correlation |
| FirstWind is highly correlated with Coal and 2 other fields | High correlation |
| SecondWind is highly correlated with Coal and 2 other fields | High correlation |
| O2 is highly correlated with Coal and 2 other fields | High correlation |
Reproduction
| Item | Value |
|---|---|
| Analysis started | 2022-10-13 06:43:29.099081 |
| Analysis finished | 2022-10-13 06:43:32.989992 |
| Duration | 3.89 seconds |
| Software version | pandas-profiling v3.2.0 |
Coal
| Statistic | Value |
|---|---|
| Distinct | 1628 |
| Distinct (%) | 4.7% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 18.44059836 |
| Minimum | 12.17 |
| Maximum | 23.47 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 270.1 KiB |
Quantile statistics
| Statistic | Value |
|---|---|
| Minimum | 12.17 |
| 5-th percentile | 13.62 |
| Q1 | 16.26 |
| median | 18.38 |
| Q3 | 21.11 |
| 95-th percentile | 22.17 |
| Maximum | 23.47 |
| Range | 11.3 |
| Interquartile range (IQR) | 4.85 |
Descriptive statistics
| Statistic | Value |
|---|---|
| Standard deviation | 2.763903599 |
| Coefficient of variation (CV) | 0.1498814488 |
| Kurtosis | -1.006417663 |
| Mean | 18.44059836 |
| Median Absolute Deviation (MAD) | 2.57 |
| Skewness | -0.2160908912 |
| Sum | 637325.52 |
| Variance | 7.639163103 |
| Monotonicity | Not monotonic |
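Several entries in the quantile and descriptive tables above are simple functions of the others, which makes them easy to sanity-check. The sketch below recomputes the coefficient of variation, variance, IQR, range and sum from the reported values. The variable names are illustrative only, and small differences in the last digits are expected because the inputs are rounded as printed.

```python
# Values copied from the tables above; names are illustrative.
mean = 18.44059836
std = 2.763903599
q1, q3 = 16.26, 21.11
minimum, maximum = 12.17, 23.47
n_obs = 34561  # number of observations from the dataset statistics

cv = std / mean                  # coefficient of variation
variance = std ** 2              # variance is the squared standard deviation
iqr = q3 - q1                    # interquartile range
value_range = maximum - minimum  # range
total = mean * n_obs             # sum of all observations

print(f"CV={cv:.10f} variance={variance:.9f} IQR={iqr:.2f} "
      f"range={value_range:.1f} sum={total:.2f}")
```

The same relationships hold for every variable in the report, so the check can be repeated by substituting the corresponding table values.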
Histogram with fixed size bins (bins=50)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 18.39 | 240 | 0.7% |
| 18.23 | 217 | 0.6% |
| 18.46 | 206 | 0.6% |
| 21.48 | 202 | 0.6% |
| 18.61 | 200 | 0.6% |
| 21.78 | 192 | 0.6% |
| 18.68 | 184 | 0.5% |
| 18.53 | 179 | 0.5% |
| 21.41 | 173 | 0.5% |
| 18.32 | 170 | 0.5% |
| Other values (1618) | 32598 | 94.3% |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 12.17 | 1 | < 0.1% |
| 12.18 | 1 | < 0.1% |
| 12.23 | 1 | < 0.1% |
| 12.24 | 3 | < 0.1% |
| 12.25 | 4 | < 0.1% |
| 12.26 | 3 | < 0.1% |
| 12.26 | 1 | < 0.1% |
| 12.27 | 3 | < 0.1% |
| 12.31 | 1 | < 0.1% |
| 12.31 | 2 | < 0.1% |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 23.47 | 1 | < 0.1% |
| 23.46 | 1 | < 0.1% |
| 23.45 | 1 | < 0.1% |
| 23.45 | 1 | < 0.1% |
| 23.44 | 1 | < 0.1% |
| 23.43 | 1 | < 0.1% |
| 23.39 | 2 | < 0.1% |
| 23.38 | 6 | < 0.1% |
| 23.38 | 3 | < 0.1% |
| 23.37 | 3 | < 0.1% |
FirstWind
| Statistic | Value |
|---|---|
| Distinct | 33736 |
| Distinct (%) | 97.6% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 69799.93182 |
| Minimum | 61928.31 |
| Maximum | 78164.18 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 270.1 KiB |
Quantile statistics
| Statistic | Value |
|---|---|
| Minimum | 61928.31 |
| 5-th percentile | 66438.8 |
| Q1 | 67779.8 |
| median | 69752.46 |
| Q3 | 71316.38 |
| 95-th percentile | 74014.99 |
| Maximum | 78164.18 |
| Range | 16235.87 |
| Interquartile range (IQR) | 3536.58 |
Descriptive statistics
| Statistic | Value |
|---|---|
| Standard deviation | 2483.049716 |
| Coefficient of variation (CV) | 0.03557381291 |
| Kurtosis | -0.06943271878 |
| Mean | 69799.93182 |
| Median Absolute Deviation (MAD) | 1800.38 |
| Skewness | 0.3579589551 |
| Sum | 2412355444 |
| Variance | 6165535.89 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 69850.16 | 4 | < 0.1% |
| 67827.05 | 3 | < 0.1% |
| 70011.55 | 3 | < 0.1% |
| 67360.45 | 3 | < 0.1% |
| 68337.23 | 3 | < 0.1% |
| 67282.23 | 3 | < 0.1% |
| 73196.93 | 3 | < 0.1% |
| 69858.55 | 3 | < 0.1% |
| 67919.88 | 3 | < 0.1% |
| 67479.48 | 3 | < 0.1% |
| Other values (33726) | 34530 | 99.9% |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 61928.31 | 1 | < 0.1% |
| 62165.91 | 1 | < 0.1% |
| 62259.36 | 1 | < 0.1% |
| 62462.94 | 1 | < 0.1% |
| 62515.38 | 1 | < 0.1% |
| 62549.07 | 1 | < 0.1% |
| 62551.26 | 1 | < 0.1% |
| 62557.49 | 1 | < 0.1% |
| 62601.22 | 1 | < 0.1% |
| 62642.59 | 1 | < 0.1% |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 78164.18 | 1 | < 0.1% |
| 78118.8 | 1 | < 0.1% |
| 78015.99 | 1 | < 0.1% |
| 77945.58 | 1 | < 0.1% |
| 77878.36 | 1 | < 0.1% |
| 77877.56 | 1 | < 0.1% |
| 77863.23 | 1 | < 0.1% |
| 77834.47 | 1 | < 0.1% |
| 77828.4 | 1 | < 0.1% |
| 77823.64 | 1 | < 0.1% |
SecondWind
| Statistic | Value |
|---|---|
| Distinct | 34395 |
| Distinct (%) | 99.5% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 32385.37434 |
| Minimum | 3652.76 |
| Maximum | 60447.62 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 270.1 KiB |
Quantile statistics
| Statistic | Value |
|---|---|
| Minimum | 3652.76 |
| 5-th percentile | 9698.6 |
| Q1 | 20122.55 |
| median | 31360.71 |
| Q3 | 47594.16 |
| 95-th percentile | 52976.58 |
| Maximum | 60447.62 |
| Range | 56794.86 |
| Interquartile range (IQR) | 27471.61 |
Descriptive statistics
| Statistic | Value |
|---|---|
| Standard deviation | 14453.21809 |
| Coefficient of variation (CV) | 0.4462884369 |
| Kurtosis | -1.20030718 |
| Mean | 32385.37434 |
| Median Absolute Deviation (MAD) | 13063.41 |
| Skewness | -0.001038432413 |
| Sum | 1119270923 |
| Variance | 208895513.2 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 53010.23 | 3 | < 0.1% |
| 4166.01 | 2 | < 0.1% |
| 22546.5 | 2 | < 0.1% |
| 10242.21 | 2 | < 0.1% |
| 9878.78 | 2 | < 0.1% |
| 51060.04 | 2 | < 0.1% |
| 4039.01 | 2 | < 0.1% |
| 50153 | 2 | < 0.1% |
| 50499.85 | 2 | < 0.1% |
| 52434.67 | 2 | < 0.1% |
| Other values (34385) | 34540 | 99.9% |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 3652.76 | 1 | < 0.1% |
| 3685.57 | 1 | < 0.1% |
| 3716.47 | 1 | < 0.1% |
| 3752.6 | 1 | < 0.1% |
| 3785.02 | 1 | < 0.1% |
| 3821.24 | 1 | < 0.1% |
| 3821.52 | 1 | < 0.1% |
| 3821.91 | 1 | < 0.1% |
| 3822.63 | 1 | < 0.1% |
| 3824.11 | 1 | < 0.1% |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 60447.62 | 1 | < 0.1% |
| 60404.05 | 1 | < 0.1% |
| 60397.04 | 1 | < 0.1% |
| 60391.39 | 1 | < 0.1% |
| 60376.19 | 1 | < 0.1% |
| 60341.98 | 1 | < 0.1% |
| 60340.88 | 1 | < 0.1% |
| 60295.57 | 1 | < 0.1% |
| 60295.3 | 1 | < 0.1% |
| 60292.88 | 1 | < 0.1% |
O2
| Statistic | Value |
|---|---|
| Distinct | 960 |
| Distinct (%) | 2.8% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 2.664682301 |
| Minimum | 1.14 |
| Maximum | 4.775 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 270.1 KiB |
Quantile statistics
| Statistic | Value |
|---|---|
| Minimum | 1.14 |
| 5-th percentile | 1.925 |
| Q1 | 2.33 |
| median | 2.6 |
| Q3 | 2.93 |
| 95-th percentile | 3.745 |
| Maximum | 4.775 |
| Range | 3.635 |
| Interquartile range (IQR) | 0.6 |
Descriptive statistics
| Statistic | Value |
|---|---|
| Standard deviation | 0.5166159169 |
| Coefficient of variation (CV) | 0.1938752386 |
| Kurtosis | 0.7795749561 |
| Mean | 2.664682301 |
| Median Absolute Deviation (MAD) | 0.3 |
| Skewness | 0.7303941689 |
| Sum | 92094.085 |
| Variance | 0.2668920055 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 2.535 | 200 | 0.6% |
| 2.545 | 193 | 0.6% |
| 2.465 | 190 | 0.5% |
| 2.415 | 190 | 0.5% |
| 2.55 | 188 | 0.5% |
| 2.54 | 184 | 0.5% |
| 2.51 | 179 | 0.5% |
| 2.455 | 179 | 0.5% |
| 2.575 | 174 | 0.5% |
| 2.495 | 173 | 0.5% |
| Other values (950) | 32711 | 94.6% |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 1.14 | 1 | < 0.1% |
| 1.145 | 1 | < 0.1% |
| 1.165 | 1 | < 0.1% |
| 1.17 | 1 | < 0.1% |
| 1.19 | 3 | < 0.1% |
| 1.195 | 1 | < 0.1% |
| 1.2 | 1 | < 0.1% |
| 1.2 | 1 | < 0.1% |
| 1.205 | 1 | < 0.1% |
| 1.21 | 1 | < 0.1% |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 4.775 | 1 | < 0.1% |
| 4.76 | 1 | < 0.1% |
| 4.73 | 1 | < 0.1% |
| 4.725 | 1 | < 0.1% |
| 4.72 | 1 | < 0.1% |
| 4.7 | 2 | < 0.1% |
| 4.695 | 3 | < 0.1% |
| 4.69 | 1 | < 0.1% |
| 4.69 | 3 | < 0.1% |
| 4.68 | 4 | < 0.1% |
Spearman's ρ
Spearman's rank correlation coefficient (ρ) measures monotonic correlation between two variables, and is therefore better than Pearson's r at catching nonlinear monotonic relationships. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation. To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
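As an illustration of the definition above, here is a minimal pure-Python sketch that ranks both variables and then applies the covariance-over-standard-deviations formula to the ranks. The helper names (`average_ranks`, `pearson_r`, `spearman_rho`) and the sample data are assumptions made for this example; they are not taken from the report or from the library that produced it.

```python
from math import sqrt

def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks

def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)  # the 1/n factors cancel

def spearman_rho(x, y):
    """Spearman's rho: Pearson's r computed on the rank variables."""
    return pearson_r(average_ranks(x), average_ranks(y))

# Made-up data: y is a monotonic but nonlinear function of x.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [a ** 3 for a in x]

rho = spearman_rho(x, y)
r = pearson_r(x, y)
print(f"rho={rho:.4f}, r={r:.4f}")
```

On this cubic example the ranks of x and y agree exactly, so ρ reaches 1 while r stays below 1, which is the behaviour the paragraph describes.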
Pearson's r
Pearson's correlation coefficient (r) measures linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and (positive) scale of the two variables, so for an exactly linear relationship the steepness of the line does not affect r; only the sign of the slope does. To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
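The calculation described above can be sketched in a few lines of pure Python. The function name `pearson_r` and the sample values are assumptions for illustration and do not come from the profiled dataset. The second call shifts and rescales both inputs to show the invariance property.

```python
from math import sqrt

def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    # The 1/n (or 1/(n-1)) normalisation cancels between numerator and denominator.
    return cov / sqrt(var_x * var_y)

# Made-up sample values, roughly linear.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 8.1, 9.8]

r = pearson_r(x, y)
# Shift and rescale each variable separately (positive scale factors).
r_rescaled = pearson_r([10 * a + 3 for a in x], [0.5 * b - 7 for b in y])
# Flipping the sign of one variable flips the sign of r.
r_flipped = pearson_r(x, [-b for b in y])
print(f"r={r:.6f}, rescaled={r_rescaled:.6f}, flipped={r_flipped:.6f}")
```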
Kendall's τ
Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation. To calculate τ for two variables X and Y, one counts the concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
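The pair-counting formula above can be sketched directly in pure Python. This is the variant without any correction for ties (often called τ-a); statistical libraries commonly report a tie-adjusted variant (τ-b) instead, so results can differ when the data contain ties. The function name `kendall_tau_a` and the sample data are assumptions for illustration.

```python
from itertools import combinations

def kendall_tau_a(x, y):
    """(concordant - discordant) / total number of pairs, with no tie correction."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        product = (x1 - x2) * (y1 - y2)
        if product > 0:      # both variables order the pair the same way
            concordant += 1
        elif product < 0:    # the variables order the pair in opposite ways
            discordant += 1
        # product == 0 means a tie in x or y; it counts toward neither group
    n = len(x)
    total_pairs = n * (n - 1) // 2
    return (concordant - discordant) / total_pairs

# Made-up data without ties: 7 concordant and 3 discordant pairs out of 10.
x = [1, 2, 3, 4, 5]
y = [3, 1, 2, 5, 4]

tau = kendall_tau_a(x, y)
print(tau)
```

The quadratic loop over all pairs is fine for a small illustration but would be slow on the 34561 observations profiled here; production implementations use faster algorithms.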
Phik (φk)
Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available for the phik package.
A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion visually.
First rows
| | Coal | FirstWind | SecondWind | O2 |
|---|---|---|---|---|
| 0 | 16.80 | 64379.99 | 23792.59 | 1.960 |
| 1 | 16.81 | 64444.50 | 24001.90 | 1.960 |
| 2 | 16.88 | 65216.45 | 23890.99 | 1.980 |
| 3 | 16.87 | 64742.45 | 23584.70 | 2.035 |
| 4 | 16.87 | 64837.68 | 23489.90 | 2.050 |
| 5 | 16.93 | 64698.29 | 23497.79 | 2.035 |
| 6 | 16.86 | 64509.76 | 23706.09 | 2.035 |
| 7 | 16.70 | 64627.97 | 24168.01 | 2.035 |
| 8 | 16.78 | 64361.31 | 23854.38 | 2.060 |
| 9 | 16.86 | 64002.26 | 23956.83 | 2.040 |
Last rows
| | Coal | FirstWind | SecondWind | O2 |
|---|---|---|---|---|
| 34551 | 15.73 | 70768.01 | 15472.46 | 2.600 |
| 34552 | 15.67 | 70797.01 | 15634.67 | 2.645 |
| 34553 | 15.81 | 70489.61 | 15456.44 | 2.710 |
| 34554 | 15.88 | 70462.48 | 15535.99 | 2.720 |
| 34555 | 15.88 | 70842.63 | 15555.26 | 2.750 |
| 34556 | 15.89 | 70156.09 | 15631.97 | 2.780 |
| 34557 | 15.64 | 70230.87 | 15582.21 | 2.770 |
| 34558 | 15.63 | 70578.46 | 15436.95 | 2.790 |
| 34559 | 15.81 | 70433.12 | 15529.33 | 2.830 |
| 34560 | 15.79 | 70689.77 | 15509.43 | 2.855 |