Overview

Dataset statistics

Number of variables: 4
Number of observations: 34561
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 1.1 MiB
Average record size in memory: 32.0 B

Variable types

Numeric: 4

Alerts

Coal is highly correlated with FirstWind and 2 other fields (High correlation)
FirstWind is highly correlated with Coal and 1 other field (High correlation)
SecondWind is highly correlated with Coal and 1 other field (High correlation)
O2 is highly correlated with Coal (High correlation)
Coal is highly correlated with FirstWind and 2 other fields (High correlation)
FirstWind is highly correlated with Coal (High correlation)
SecondWind is highly correlated with Coal (High correlation)
O2 is highly correlated with Coal (High correlation)
Coal is highly correlated with SecondWind (High correlation)
SecondWind is highly correlated with Coal (High correlation)
Coal is highly correlated with FirstWind and 2 other fields (High correlation)
FirstWind is highly correlated with Coal and 2 other fields (High correlation)
SecondWind is highly correlated with Coal and 2 other fields (High correlation)
O2 is highly correlated with Coal and 2 other fields (High correlation)

Reproduction

Analysis started: 2022-10-13 06:43:29.099081
Analysis finished: 2022-10-13 06:43:32.989992
Duration: 3.89 seconds
Software version: pandas-profiling v3.2.0

Variables

Coal
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 1628
Distinct (%): 4.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 18.44059836
Minimum: 12.17
Maximum: 23.47
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 270.1 KiB

Quantile statistics

Minimum: 12.17
5th percentile: 13.62
Q1: 16.26
Median: 18.38
Q3: 21.11
95th percentile: 22.17
Maximum: 23.47
Range: 11.3
Interquartile range (IQR): 4.85

Descriptive statistics

Standard deviation: 2.763903599
Coefficient of variation (CV): 0.1498814488
Kurtosis: -1.006417663
Mean: 18.44059836
Median Absolute Deviation (MAD): 2.57
Skewness: -0.2160908912
Sum: 637325.52
Variance: 7.639163103
Monotonicity: Not monotonic
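Several of the statistics above are derived from others. As a quick consistency check, the short Python sketch below recomputes Coal's range, IQR, coefficient of variation, variance and distinct percentage from the reported figures; the values are copied by hand from this report and the variable names are illustrative.

```python
# Coal summary values copied by hand from this report.
n = 34561              # Number of observations
distinct = 1628        # Distinct
mean = 18.44059836     # Mean
std = 2.763903599      # Standard deviation
minimum, maximum = 12.17, 23.47
q1, q3 = 16.26, 21.11

value_range = maximum - minimum    # Range = Maximum - Minimum
iqr = q3 - q1                      # IQR = Q3 - Q1
cv = std / mean                    # CV = standard deviation / mean
variance = std ** 2                # Variance = standard deviation squared
distinct_pct = 100 * distinct / n  # Distinct (%) = distinct values / observations

print(f"Range: {value_range:.2f}")
print(f"IQR: {iqr:.2f}")
print(f"CV: {cv:.4f}")
print(f"Variance: {variance:.4f}")
print(f"Distinct (%): {distinct_pct:.1f}%")
```

It prints 11.30, 4.85, 0.1499, 7.6392 and 4.7%, which match the reported Range, IQR, CV, Variance and Distinct (%) after rounding.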
Histogram with fixed-size bins (bins=50)
Common values
Value                 Count   Frequency (%)
18.39                 240     0.7%
18.23                 217     0.6%
18.46                 206     0.6%
21.48                 202     0.6%
18.61                 200     0.6%
21.78                 192     0.6%
18.68                 184     0.5%
18.53                 179     0.5%
21.41                 173     0.5%
18.32                 170     0.5%
Other values (1618)   32598   94.3%
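Each Frequency (%) entry is the count divided by the number of observations (34561), and the "Other values" row collects every observation not covered by the ten most frequent values. A small sketch reproducing Coal's rows from the counts in the table above (the dictionary and variable names are illustrative):

```python
# Coal's ten most frequent values and their counts, copied from the common-values table above.
n = 34561          # Number of observations
distinct = 1628    # Distinct values of Coal
top_counts = {
    18.39: 240, 18.23: 217, 18.46: 206, 21.48: 202, 18.61: 200,
    21.78: 192, 18.68: 184, 18.53: 179, 21.41: 173, 18.32: 170,
}

for value, count in top_counts.items():
    # Frequency (%) = 100 * count / number of observations, shown to one decimal place
    print(f"{value:<8} {count:>6} {100 * count / n:.1f}%")

other_count = n - sum(top_counts.values())     # observations not covered by the top ten
other_distinct = distinct - len(top_counts)    # distinct values not covered by the top ten
print(f"Other values ({other_distinct}) {other_count} {100 * other_count / n:.1f}%")
```

Every printed percentage matches the table, and the last line prints "Other values (1618) 32598 94.3%".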
Minimum 10 values
Value                 Count   Frequency (%)
12.17                 1       < 0.1%
12.18                 1       < 0.1%
12.23                 1       < 0.1%
12.24                 3       < 0.1%
12.25                 4       < 0.1%
12.26                 3       < 0.1%
12.26                 1       < 0.1%
12.27                 3       < 0.1%
12.31                 1       < 0.1%
12.31                 2       < 0.1%
Maximum 10 values
Value                 Count   Frequency (%)
23.47                 1       < 0.1%
23.46                 1       < 0.1%
23.45                 1       < 0.1%
23.45                 1       < 0.1%
23.44                 1       < 0.1%
23.43                 1       < 0.1%
23.39                 2       < 0.1%
23.38                 6       < 0.1%
23.38                 3       < 0.1%
23.37                 3       < 0.1%

FirstWind
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 33736
Distinct (%): 97.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 69799.93182
Minimum: 61928.31
Maximum: 78164.18
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 270.1 KiB

Quantile statistics

Minimum: 61928.31
5th percentile: 66438.8
Q1: 67779.8
Median: 69752.46
Q3: 71316.38
95th percentile: 74014.99
Maximum: 78164.18
Range: 16235.87
Interquartile range (IQR): 3536.58

Descriptive statistics

Standard deviation: 2483.049716
Coefficient of variation (CV): 0.03557381291
Kurtosis: -0.06943271878
Mean: 69799.93182
Median Absolute Deviation (MAD): 1800.38
Skewness: 0.3579589551
Sum: 2412355444
Variance: 6165535.89
Monotonicity: Not monotonic
Histogram with fixed-size bins (bins=50)
Common values
Value                 Count   Frequency (%)
69850.16              4       < 0.1%
67827.05              3       < 0.1%
70011.55              3       < 0.1%
67360.45              3       < 0.1%
68337.23              3       < 0.1%
67282.23              3       < 0.1%
73196.93              3       < 0.1%
69858.55              3       < 0.1%
67919.88              3       < 0.1%
67479.48              3       < 0.1%
Other values (33726)  34530   99.9%
Minimum 10 values
Value                 Count   Frequency (%)
61928.31              1       < 0.1%
62165.91              1       < 0.1%
62259.36              1       < 0.1%
62462.94              1       < 0.1%
62515.38              1       < 0.1%
62549.07              1       < 0.1%
62551.26              1       < 0.1%
62557.49              1       < 0.1%
62601.22              1       < 0.1%
62642.59              1       < 0.1%
Maximum 10 values
Value                 Count   Frequency (%)
78164.18              1       < 0.1%
78118.8               1       < 0.1%
78015.99              1       < 0.1%
77945.58              1       < 0.1%
77878.36              1       < 0.1%
77877.56              1       < 0.1%
77863.23              1       < 0.1%
77834.47              1       < 0.1%
77828.4               1       < 0.1%
77823.64              1       < 0.1%

SecondWind
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 34395
Distinct (%): 99.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 32385.37434
Minimum: 3652.76
Maximum: 60447.62
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 270.1 KiB

Quantile statistics

Minimum: 3652.76
5th percentile: 9698.6
Q1: 20122.55
Median: 31360.71
Q3: 47594.16
95th percentile: 52976.58
Maximum: 60447.62
Range: 56794.86
Interquartile range (IQR): 27471.61

Descriptive statistics

Standard deviation: 14453.21809
Coefficient of variation (CV): 0.4462884369
Kurtosis: -1.20030718
Mean: 32385.37434
Median Absolute Deviation (MAD): 13063.41
Skewness: -0.001038432413
Sum: 1119270923
Variance: 208895513.2
Monotonicity: Not monotonic
Histogram with fixed-size bins (bins=50)
Common values
Value                 Count   Frequency (%)
53010.23              3       < 0.1%
4166.01               2       < 0.1%
22546.5               2       < 0.1%
10242.21              2       < 0.1%
9878.78               2       < 0.1%
51060.04              2       < 0.1%
4039.01               2       < 0.1%
50153                 2       < 0.1%
50499.85              2       < 0.1%
52434.67              2       < 0.1%
Other values (34385)  34540   99.9%
Minimum 10 values
Value                 Count   Frequency (%)
3652.76               1       < 0.1%
3685.57               1       < 0.1%
3716.47               1       < 0.1%
3752.6                1       < 0.1%
3785.02               1       < 0.1%
3821.24               1       < 0.1%
3821.52               1       < 0.1%
3821.91               1       < 0.1%
3822.63               1       < 0.1%
3824.11               1       < 0.1%
Maximum 10 values
Value                 Count   Frequency (%)
60447.62              1       < 0.1%
60404.05              1       < 0.1%
60397.04              1       < 0.1%
60391.39              1       < 0.1%
60376.19              1       < 0.1%
60341.98              1       < 0.1%
60340.88              1       < 0.1%
60295.57              1       < 0.1%
60295.3               1       < 0.1%
60292.88              1       < 0.1%

O2
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 960
Distinct (%): 2.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2.664682301
Minimum: 1.14
Maximum: 4.775
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 270.1 KiB

Quantile statistics

Minimum: 1.14
5th percentile: 1.925
Q1: 2.33
Median: 2.6
Q3: 2.93
95th percentile: 3.745
Maximum: 4.775
Range: 3.635
Interquartile range (IQR): 0.6

Descriptive statistics

Standard deviation: 0.5166159169
Coefficient of variation (CV): 0.1938752386
Kurtosis: 0.7795749561
Mean: 2.664682301
Median Absolute Deviation (MAD): 0.3
Skewness: 0.7303941689
Sum: 92094.085
Variance: 0.2668920055
Monotonicity: Not monotonic
Histogram with fixed-size bins (bins=50)
Common values
Value                 Count   Frequency (%)
2.535                 200     0.6%
2.545                 193     0.6%
2.465                 190     0.5%
2.415                 190     0.5%
2.55                  188     0.5%
2.54                  184     0.5%
2.51                  179     0.5%
2.455                 179     0.5%
2.575                 174     0.5%
2.495                 173     0.5%
Other values (950)    32711   94.6%
Minimum 10 values
Value                 Count   Frequency (%)
1.14                  1       < 0.1%
1.145                 1       < 0.1%
1.165                 1       < 0.1%
1.17                  1       < 0.1%
1.19                  3       < 0.1%
1.195                 1       < 0.1%
1.2                   1       < 0.1%
1.2                   1       < 0.1%
1.205                 1       < 0.1%
1.21                  1       < 0.1%
Maximum 10 values
Value                 Count   Frequency (%)
4.775                 1       < 0.1%
4.76                  1       < 0.1%
4.73                  1       < 0.1%
4.725                 1       < 0.1%
4.72                  1       < 0.1%
4.7                   2       < 0.1%
4.695                 3       < 0.1%
4.69                  1       < 0.1%
4.69                  3       < 0.1%
4.68                  4       < 0.1%

Interactions

Interaction plots for every combination of Coal, FirstWind, SecondWind and O2 (16 figures).

Correlations


Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better than Pearson's r at capturing nonlinear monotonic correlations. Its value lies between -1 and +1: -1 indicates a perfect negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates a perfect positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
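The definition can be sketched in a few lines of dependency-free Python. The sketch assumes the usual average-rank convention for ties (the default of pandas' `rank()` and the convention used by SciPy's `spearmanr`); the helper names `average_ranks`, `pearson` and `spearman` are illustrative and not part of pandas-profiling.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Advance `end` to the last position in this run of tied values.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1  # mean of positions start+1 .. end+1
        for k in range(start, end + 1):
            ranks[order[k]] = shared_rank
        start = end + 1
    return ranks


def pearson(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    # The 1/n normalisation cancels between numerator and denominator, so it is omitted.
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman's rho: Pearson's r computed on the rank variables."""
    return pearson(average_ranks(x), average_ranks(y))


x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]  # monotonic but nonlinear
print(spearman(x, y))    # ~1.0: the relationship is perfectly monotonic
print(pearson(x, y))     # below 1: the relationship is not linear
```

With y = x³ the ranks of x and y coincide, so ρ is 1 up to floating-point rounding, while Pearson's r on the raw values stays below 1.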

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1: -1 indicates a perfect negative linear correlation, 0 indicates no linear correlation, and +1 indicates a perfect positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables (a negative scale factor flips only its sign). Consequently, for an exact, non-horizontal linear relationship the steepness of the line does not affect r; only the direction of its slope does.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
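A minimal sketch of this formula in plain Python; the `pearson` helper and the sample data are illustrative. It also checks the invariance property described above: a positive rescaling and shift of x leaves r unchanged, a negative scale factor flips only its sign, and exact lines of very different steepness both give r = 1 up to rounding.

```python
from math import sqrt


def pearson(x, y):
    """Pearson's r: covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    # Whether covariance and standard deviations use 1/n or 1/(n-1), the factor cancels.
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 8.1, 9.8]  # roughly linear in x

r = pearson(x, y)
r_rescaled = pearson([3 * a + 7 for a in x], y)  # positive scale and shift: r unchanged
r_negated = pearson([-2 * a for a in x], y)      # negative scale factor: only the sign flips
r_steep = pearson(x, [10 * a for a in x])        # exact line with a steep slope
r_shallow = pearson(x, [0.1 * a for a in x])     # exact line with a shallow slope

print(r, r_rescaled, r_negated)
print(r_steep, r_shallow)  # both ~1.0: steepness does not matter
```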

Kendall's τ

Like Spearman's rank correlation coefficient, Kendall's rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates a perfect negative correlation, 0 indicates no correlation, and +1 indicates a perfect positive correlation.

To calculate τ for two variables X and Y, one counts the concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs, n(n-1)/2 for n observations.
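A minimal quadratic-time sketch of this counting in plain Python, in the basic τ-a form just described (pairs tied in X or Y count as neither concordant nor discordant). The `kendall_tau_a` helper is illustrative only: with 34561 observations it would examine roughly 6 × 10^8 pairs, and library implementations such as SciPy's `kendalltau` use a faster algorithm and default to the tie-adjusted τ-b variant, which matters for columns with many repeated values such as Coal and O2.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """Kendall's tau-a: (concordant - discordant) / total number of pairs."""
    n = len(x)
    concordant = discordant = 0
    for i, j in combinations(range(n), 2):
        sign = (x[i] - x[j]) * (y[i] - y[j])
        if sign > 0:
            concordant += 1   # the pair is ordered the same way in x and y
        elif sign < 0:
            discordant += 1   # the pair is ordered oppositely in x and y
        # sign == 0: tied in x or y, counted as neither (tau-a convention)
    total_pairs = n * (n - 1) // 2
    return (concordant - discordant) / total_pairs


x = [1, 2, 3, 4]
y = [1, 3, 2, 4]  # one swapped pair
print(kendall_tau_a(x, y))  # 5 concordant, 1 discordant out of 6 pairs
```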

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available for the phik package.

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that makes patterns in data completion quick to spot visually.

Sample

First rows

        Coal  FirstWind  SecondWind     O2
0      16.80   64379.99    23792.59  1.960
1      16.81   64444.50    24001.90  1.960
2      16.88   65216.45    23890.99  1.980
3      16.87   64742.45    23584.70  2.035
4      16.87   64837.68    23489.90  2.050
5      16.93   64698.29    23497.79  2.035
6      16.86   64509.76    23706.09  2.035
7      16.70   64627.97    24168.01  2.035
8      16.78   64361.31    23854.38  2.060
9      16.86   64002.26    23956.83  2.040

Last rows

        Coal  FirstWind  SecondWind     O2
34551  15.73   70768.01    15472.46  2.600
34552  15.67   70797.01    15634.67  2.645
34553  15.81   70489.61    15456.44  2.710
34554  15.88   70462.48    15535.99  2.720
34555  15.88   70842.63    15555.26  2.750
34556  15.89   70156.09    15631.97  2.780
34557  15.64   70230.87    15582.21  2.770
34558  15.63   70578.46    15436.95  2.790
34559  15.81   70433.12    15529.33  2.830
34560  15.79   70689.77    15509.43  2.855