Dataset statistics
| Number of variables | 5 |
|---|---|
| Number of observations | 10424 |
| Missing cells | 0 |
| Missing cells (%) | 0.0% |
| Duplicate rows | 0 |
| Duplicate rows (%) | 0.0% |
| Total size in memory | 407.3 KiB |
| Average record size in memory | 40.0 B |
Variable types
| Categorical | 2 |
|---|---|
| Numeric | 3 |
date has a high cardinality: 110 distinct values | High cardinality |
avg_co2 is highly correlated with avg_numvehicles | High correlation |
avg_numvehicles is highly correlated with avg_co2 | High correlation |
refjunction is highly correlated with avg_numvehicles | High correlation |
hour is highly correlated with avg_co2 and 1 other fields | High correlation |
avg_co2 is highly correlated with hour and 1 other fields | High correlation |
avg_numvehicles is highly correlated with refjunction and 2 other fields | High correlation |
refjunction is uniformly distributed | Uniform |
date is uniformly distributed | Uniform |
hour has 432 (4.1%) zeros | Zeros |
Reproduction
| Analysis started | 2022-06-14 12:29:23.156227 |
|---|---|
| Analysis finished | 2022-06-14 12:29:28.119509 |
| Duration | 4.96 seconds |
| Software version | pandas-profiling v3.2.0 |
| Download configuration | config.json |
refjunction
| Distinct | 4 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 81.6 KiB |

| urn:ngsi-ld:Junction:54201 | 2606 |
|---|---|
| urn:ngsi-ld:Junction:54202 | 2606 |
| urn:ngsi-ld:Junction:54204 | 2606 |
| urn:ngsi-ld:Junction:54206 | 2606 |
Length
| Max length | 26 |
|---|---|
| Median length | 26 |
| Mean length | 26 |
| Min length | 26 |
Characters and Unicode
| Total characters | 271024 |
|---|---|
| Distinct characters | 20 |
| Distinct categories | 5 |
| Distinct scripts | 2 |
| Distinct blocks | 1 |
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
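As an illustration of those character properties, the sketch below uses Python's standard `unicodedata` module to tally the general category of each character in one value from this section. The helper name `category_counts` is made up for this example, and this is not necessarily how pandas-profiling builds its tables internally.

```python
import unicodedata
from collections import Counter


def category_counts(text: str) -> Counter:
    """Count characters by Unicode general category code (e.g. 'Ll', 'Nd')."""
    return Counter(unicodedata.category(ch) for ch in text)


value = "urn:ngsi-ld:Junction:54201"
counts = category_counts(value)

# Ll = Lowercase Letter, Nd = Decimal Number, Po = Other Punctuation,
# Lu = Uppercase Letter, Pd = Dash Punctuation
for category, n in counts.most_common():
    print(category, n)
```

Every value of this variable has the same shape, so multiplying the per-value counts by the 10424 observations gives the totals in the category table, for instance 16 × 10424 = 166784 lowercase letters.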
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | urn:ngsi-ld:Junction:54201 |
|---|---|
| 2nd row | urn:ngsi-ld:Junction:54201 |
| 3rd row | urn:ngsi-ld:Junction:54201 |
| 4th row | urn:ngsi-ld:Junction:54201 |
| 5th row | urn:ngsi-ld:Junction:54201 |
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| urn:ngsi-ld:Junction:54201 | 2606 | 25.0% |
| urn:ngsi-ld:Junction:54202 | 2606 | 25.0% |
| urn:ngsi-ld:Junction:54204 | 2606 | 25.0% |
| urn:ngsi-ld:Junction:54206 | 2606 | 25.0% |
Length
Histogram of lengths of the category
Category Frequency Plot
| Value | Count | Frequency (%) |
| urn:ngsi-ld:junction:54201 | 2606 | |
| urn:ngsi-ld:junction:54202 | 2606 | |
| urn:ngsi-ld:junction:54204 | 2606 | |
| urn:ngsi-ld:junction:54206 | 2606 |
Most occurring characters
| Value | Count | Frequency (%) |
| n | 41696 | |
| : | 31272 | 11.5% |
| u | 20848 | 7.7% |
| i | 20848 | 7.7% |
| 2 | 13030 | 4.8% |
| 4 | 13030 | 4.8% |
| c | 10424 | 3.8% |
| 0 | 10424 | 3.8% |
| 5 | 10424 | 3.8% |
| o | 10424 | 3.8% |
| Other values (10) | 88604 |
Most occurring categories
| Value | Count | Frequency (%) |
| Lowercase Letter | 166784 | |
| Decimal Number | 52120 | 19.2% |
| Other Punctuation | 31272 | 11.5% |
| Uppercase Letter | 10424 | 3.8% |
| Dash Punctuation | 10424 | 3.8% |
Most frequent character per category
Lowercase Letter
| Value | Count | Frequency (%) |
| n | 41696 | |
| u | 20848 | |
| i | 20848 | |
| c | 10424 | 6.2% |
| o | 10424 | 6.2% |
| t | 10424 | 6.2% |
| r | 10424 | 6.2% |
| d | 10424 | 6.2% |
| l | 10424 | 6.2% |
| s | 10424 | 6.2% |
Decimal Number
| Value | Count | Frequency (%) |
| 2 | 13030 | |
| 4 | 13030 | |
| 0 | 10424 | |
| 5 | 10424 | |
| 1 | 2606 | 5.0% |
| 6 | 2606 | 5.0% |
Other Punctuation
| Value | Count | Frequency (%) |
| : | 31272 |
Uppercase Letter
| Value | Count | Frequency (%) |
| J | 10424 |
Dash Punctuation
| Value | Count | Frequency (%) |
| - | 10424 |
Most occurring scripts
| Value | Count | Frequency (%) |
| Latin | 177208 | |
| Common | 93816 |
Most frequent character per script
Latin
| Value | Count | Frequency (%) |
| n | 41696 | |
| u | 20848 | |
| i | 20848 | |
| c | 10424 | 5.9% |
| o | 10424 | 5.9% |
| t | 10424 | 5.9% |
| J | 10424 | 5.9% |
| r | 10424 | 5.9% |
| d | 10424 | 5.9% |
| l | 10424 | 5.9% |
| Other values (2) | 20848 |
Common
| Value | Count | Frequency (%) |
| : | 31272 | |
| 2 | 13030 | |
| 4 | 13030 | |
| 0 | 10424 | 11.1% |
| 5 | 10424 | 11.1% |
| - | 10424 | 11.1% |
| 1 | 2606 | 2.8% |
| 6 | 2606 | 2.8% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 271024 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| n | 41696 | |
| : | 31272 | 11.5% |
| u | 20848 | 7.7% |
| i | 20848 | 7.7% |
| 2 | 13030 | 4.8% |
| 4 | 13030 | 4.8% |
| c | 10424 | 3.8% |
| 0 | 10424 | 3.8% |
| 5 | 10424 | 3.8% |
| o | 10424 | 3.8% |
| Other values (10) | 88604 |
date
| Distinct | 110 |
|---|---|
| Distinct (%) | 1.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 81.6 KiB |

| 2022-04-21 | 96 |
|---|---|
| 2022-05-17 | 96 |
| 2022-05-15 | 96 |
| 2022-05-14 | 96 |
| 2022-05-13 | 96 |
| Other values (105) | 9944 |
Length
| Max length | 10 |
|---|---|
| Median length | 10 |
| Mean length | 10 |
| Min length | 10 |
Characters and Unicode
| Total characters | 104240 |
|---|---|
| Distinct characters | 11 |
| Distinct categories | 2 |
| Distinct scripts | 1 |
| Distinct blocks | 1 |
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | 2022-02-25 |
|---|---|
| 2nd row | 2022-02-25 |
| 3rd row | 2022-02-25 |
| 4th row | 2022-02-25 |
| 5th row | 2022-02-25 |
Common Values
| Value | Count | Frequency (%) |
| 2022-04-21 | 96 | 0.9% |
| 2022-05-17 | 96 | 0.9% |
| 2022-05-15 | 96 | 0.9% |
| 2022-05-14 | 96 | 0.9% |
| 2022-05-13 | 96 | 0.9% |
| 2022-05-12 | 96 | 0.9% |
| 2022-05-11 | 96 | 0.9% |
| 2022-05-10 | 96 | 0.9% |
| 2022-05-09 | 96 | 0.9% |
| 2022-05-08 | 96 | 0.9% |
| Other values (100) | 9464 |
Length
Histogram of lengths of the category
| Value | Count | Frequency (%) |
| 2022-04-21 | 96 | 0.9% |
| 2022-04-20 | 96 | 0.9% |
| 2022-03-02 | 96 | 0.9% |
| 2022-03-03 | 96 | 0.9% |
| 2022-03-04 | 96 | 0.9% |
| 2022-03-05 | 96 | 0.9% |
| 2022-03-06 | 96 | 0.9% |
| 2022-03-07 | 96 | 0.9% |
| 2022-03-08 | 96 | 0.9% |
| 2022-03-09 | 96 | 0.9% |
| Other values (100) | 9464 |
Most occurring characters
| Value | Count | Frequency (%) |
| 2 | 35784 | |
| 0 | 25248 | |
| - | 20848 | |
| 1 | 4568 | 4.4% |
| 3 | 4496 | 4.3% |
| 5 | 3960 | 3.8% |
| 4 | 3912 | 3.8% |
| 6 | 2364 | 2.3% |
| 7 | 1052 | 1.0% |
| 8 | 1048 | 1.0% |
Most occurring categories
| Value | Count | Frequency (%) |
| Decimal Number | 83392 | |
| Dash Punctuation | 20848 | 20.0% |
Most frequent character per category
Decimal Number
| Value | Count | Frequency (%) |
| 2 | 35784 | |
| 0 | 25248 | |
| 1 | 4568 | 5.5% |
| 3 | 4496 | 5.4% |
| 5 | 3960 | 4.7% |
| 4 | 3912 | 4.7% |
| 6 | 2364 | 2.8% |
| 7 | 1052 | 1.3% |
| 8 | 1048 | 1.3% |
| 9 | 960 | 1.2% |
Dash Punctuation
| Value | Count | Frequency (%) |
| - | 20848 |
Most occurring scripts
| Value | Count | Frequency (%) |
| Common | 104240 |
Most frequent character per script
Common
| Value | Count | Frequency (%) |
| 2 | 35784 | |
| 0 | 25248 | |
| - | 20848 | |
| 1 | 4568 | 4.4% |
| 3 | 4496 | 4.3% |
| 5 | 3960 | 3.8% |
| 4 | 3912 | 3.8% |
| 6 | 2364 | 2.3% |
| 7 | 1052 | 1.0% |
| 8 | 1048 | 1.0% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 104240 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| 2 | 35784 | |
| 0 | 25248 | |
| - | 20848 | |
| 1 | 4568 | 4.4% |
| 3 | 4496 | 4.3% |
| 5 | 3960 | 3.8% |
| 4 | 3912 | 3.8% |
| 6 | 2364 | 2.3% |
| 7 | 1052 | 1.0% |
| 8 | 1048 | 1.0% |
hour
| Distinct | 24 |
|---|---|
| Distinct (%) | 0.2% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 11.50460476 |
| Minimum | 0 |
| Maximum | 23 |
| Zeros | 432 |
| Zeros (%) | 4.1% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 81.6 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 1 |
| Q1 | 6 |
| median | 12 |
| Q3 | 17 |
| 95-th percentile | 22 |
| Maximum | 23 |
| Range | 23 |
| Interquartile range (IQR) | 11 |
Descriptive statistics
| Standard deviation | 6.916545735 |
|---|---|
| Coefficient of variation (CV) | 0.6011980316 |
| Kurtosis | -1.201710894 |
| Mean | 11.50460476 |
| Median Absolute Deviation (MAD) | 6 |
| Skewness | -0.001068449895 |
| Sum | 119924 |
| Variance | 47.83860491 |
| Monotonicity | Not monotonic |
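The quantile and descriptive statistics above are tied together by a few simple identities. The following sketch checks them against the rounded figures reported for this variable; the tolerance is an assumption chosen to absorb rounding in the printed values.

```python
import math

# Figures copied from the tables above for this variable.
n_observations = 10424
total = 119924
mean = 11.50460476
std = 6.916545735
variance = 47.83860491
cv = 0.6011980316
q1, q3, iqr = 6, 17, 11
minimum, maximum, value_range = 0, 23, 23

# Mean is the sum divided by the number of observations.
assert math.isclose(total / n_observations, mean, rel_tol=1e-7)
# Variance is the square of the standard deviation.
assert math.isclose(std ** 2, variance, rel_tol=1e-7)
# Coefficient of variation is the standard deviation divided by the mean.
assert math.isclose(std / mean, cv, rel_tol=1e-7)
# Interquartile range and range are simple differences.
assert q3 - q1 == iqr
assert maximum - minimum == value_range
print("all identities hold within rounding")
```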
Histogram with fixed size bins (bins=24)
| Value | Count | Frequency (%) |
| 17 | 436 | 4.2% |
| 8 | 436 | 4.2% |
| 16 | 436 | 4.2% |
| 21 | 436 | 4.2% |
| 22 | 436 | 4.2% |
| 15 | 436 | 4.2% |
| 14 | 436 | 4.2% |
| 13 | 436 | 4.2% |
| 2 | 436 | 4.2% |
| 3 | 436 | 4.2% |
| Other values (14) | 6064 |
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 432 | 4.1% |
| 1 | 432 | 4.1% |
| 2 | 436 | 4.2% |
| 3 | 436 | 4.2% |
| 4 | 432 | 4.1% |
| 5 | 432 | 4.1% |
| 6 | 432 | 4.1% |
| 7 | 432 | 4.1% |
| 8 | 436 | 4.2% |
| 9 | 436 | 4.2% |
| Value | Count | Frequency (%) |
|---|---|---|
| 23 | 432 | 4.1% |
| 22 | 436 | 4.2% |
| 21 | 436 | 4.2% |
| 20 | 432 | 4.1% |
| 19 | 432 | 4.1% |
| 18 | 432 | 4.1% |
| 17 | 436 | 4.2% |
| 16 | 436 | 4.2% |
| 15 | 436 | 4.2% |
| 14 | 436 | 4.2% |
avg_co2
| Distinct | 2986 |
|---|---|
| Distinct (%) | 28.6% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 14.46244189 |
| Minimum | 2.181935484 |
| Maximum | 38.33290323 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 81.6 KiB |
Quantile statistics
| Minimum | 2.181935484 |
|---|---|
| 5-th percentile | 2.692 |
| Q1 | 13.59666667 |
| median | 13.95485952 |
| Q3 | 14.25660714 |
| 95-th percentile | 32.87314815 |
| Maximum | 38.33290323 |
| Range | 36.15096774 |
| Interquartile range (IQR) | 0.6599404762 |
Descriptive statistics
| Standard deviation | 8.065269185 |
|---|---|
| Coefficient of variation (CV) | 0.5576699457 |
| Kurtosis | 1.083705487 |
| Mean | 14.46244189 |
| Median Absolute Deviation (MAD) | 0.3264848485 |
| Skewness | 0.9520260068 |
| Sum | 150756.4943 |
| Variance | 65.04856703 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 13.75 | 12 | 0.1% |
| 2.682596154 | 8 | 0.1% |
| 13.621875 | 8 | 0.1% |
| 2.705625 | 8 | 0.1% |
| 13.96653846 | 8 | 0.1% |
| 13.89333333 | 8 | 0.1% |
| 13.585 | 8 | 0.1% |
| 13.85 | 8 | 0.1% |
| 13.83586538 | 8 | 0.1% |
| 2.7015 | 8 | 0.1% |
| Other values (2976) | 10340 |
| Value | Count | Frequency (%) |
| 2.181935484 | 4 | |
| 2.278684211 | 4 | |
| 2.332307692 | 4 | |
| 2.393448276 | 4 | |
| 2.472916667 | 4 | |
| 2.534615385 | 1 | < 0.1% |
| 2.534615385 | 3 | |
| 2.544772727 | 4 | |
| 2.545916667 | 4 | |
| 2.5465625 | 4 |
| Value | Count | Frequency (%) |
| 38.33290323 | 4 | |
| 35.71913043 | 4 | |
| 35.59666667 | 4 | |
| 35.3275 | 4 | |
| 35.29521739 | 4 | |
| 35.16107143 | 4 | |
| 34.56696429 | 4 | |
| 34.41596154 | 4 | |
| 33.9403125 | 4 | |
| 33.8903125 | 4 |
avg_numvehicles
| Distinct | 8429 |
|---|---|
| Distinct (%) | 80.9% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 72.69160203 |
| Minimum | 6.4 |
| Maximum | 267.3333333 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 81.6 KiB |
Quantile statistics
| Minimum | 6.4 |
|---|---|
| 5-th percentile | 14.125 |
| Q1 | 44.66666667 |
| median | 75.93269231 |
| Q3 | 82.02088859 |
| 95-th percentile | 184.5109435 |
| Maximum | 267.3333333 |
| Range | 260.9333333 |
| Interquartile range (IQR) | 37.35422193 |
Descriptive statistics
| Standard deviation | 44.5546616 |
|---|---|
| Coefficient of variation (CV) | 0.6129272206 |
| Kurtosis | 1.651848485 |
| Mean | 72.69160203 |
| Median Absolute Deviation (MAD) | 9.387019231 |
| Skewness | 1.146292551 |
| Sum | 757737.2596 |
| Variance | 1985.11787 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 44.75 | 15 | 0.1% |
| 8.5 | 12 | 0.1% |
| 79 | 11 | 0.1% |
| 14.5 | 10 | 0.1% |
| 82.25 | 9 | 0.1% |
| 44.25 | 9 | 0.1% |
| 80 | 8 | 0.1% |
| 14.25 | 8 | 0.1% |
| 15.5 | 8 | 0.1% |
| 74.5 | 8 | 0.1% |
| Other values (8419) | 10326 |
| Value | Count | Frequency (%) |
| 6.4 | 1 | |
| 6.848484848 | 1 | |
| 7 | 1 | |
| 7.346153846 | 1 | |
| 7.541666667 | 2 | |
| 7.595744681 | 1 | |
| 7.625 | 1 | |
| 7.878787879 | 1 | |
| 7.9 | 1 | |
| 7.90625 | 1 |
| Value | Count | Frequency (%) |
| 267.3333333 | 1 | |
| 264.3928571 | 1 | |
| 261.75 | 1 | |
| 247.75 | 1 | |
| 244.173913 | 1 | |
| 243.5714286 | 1 | |
| 240.7666667 | 1 | |
| 226.8461538 | 1 | |
| 225.0833333 | 1 | |
| 223.85 | 1 |
Spearman's ρ
Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation. To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
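The calculation described above can be sketched in plain Python: rank both variables, then apply the covariance-over-standard-deviations formula to the ranks. The function names and the sample data are illustrative assumptions, not part of the report or of pandas-profiling's internals. Ties receive the average of their ranks, which is a common convention.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of tied values starting at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions are 0-based, ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Covariance divided by the product of standard deviations (1/n cancels)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman's rho: Pearson's r applied to the rank variables."""
    return pearson(average_ranks(x), average_ranks(y))


# Illustrative data (not from the report): y = x**3 is monotonic but nonlinear.
x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]
print(spearman(x, y))  # very close to 1: the relationship is perfectly monotonic
print(pearson(x, y))   # below 1: the relationship is not linear
```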
Pearson's r
Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, implying that for a linear function the angle to the x-axis does not affect r. To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
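A minimal sketch of that formula follows, together with a check of the invariance property. The function name `pearson_r` and the rescaling constants are illustrative assumptions. The five data pairs are copied from the first rows of the dataset, so the resulting value only illustrates the calculation and says nothing about the correlation over all 10424 observations.

```python
from math import sqrt


def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    std_x = sqrt(sum((a - mean_x) ** 2 for a in x) / n)
    std_y = sqrt(sum((b - mean_y) ** 2 for b in y) / n)
    return cov / (std_x * std_y)


# Five (avg_numvehicles, avg_co2) pairs from the first rows of the dataset.
vehicles = [29.0, 44.454545, 55.0, 103.0, 116.59375]
co2 = [15.519286, 14.489091, 15.221364, 35.295217, 27.086486]

r = pearson_r(vehicles, co2)
# Shifting and rescaling one variable (with a positive scale) leaves r unchanged.
r_rescaled = pearson_r([2.5 * v + 10 for v in vehicles], co2)
print(round(r, 3), round(r_rescaled, 3))
```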
Kendall's τ
Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation. To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
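The pair-counting definition above corresponds to the variant usually called tau-a, sketched below on made-up data. The function name is an assumption for this example. Library implementations commonly default to the tau-b variant, which adjusts the denominator for ties, so results can differ when the data contain tied values.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """(concordant - discordant) / total pairs; tied pairs count as neither."""
    concordant = discordant = 0
    pairs = list(combinations(range(len(x)), 2))
    for i, j in pairs:
        # A pair is concordant when x and y move in the same direction.
        product = (x[i] - x[j]) * (y[i] - y[j])
        if product > 0:
            concordant += 1
        elif product < 0:
            discordant += 1
    return (concordant - discordant) / len(pairs)


# Illustrative data (not from the report): 8 concordant and 2 discordant pairs.
x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]
print(kendall_tau_a(x, y))  # 0.6
```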
Phik (φk)
Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in case of a bivariate normal input distribution. Extensive documentation is available from the Phik project.
A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.
First rows
| refjunction | date | hour | avg_co2 | avg_numvehicles | |
|---|---|---|---|---|---|
| 0 | urn:ngsi-ld:Junction:54201 | 2022-02-25 | 18 | 15.519286 | 29.000000 |
| 1 | urn:ngsi-ld:Junction:54201 | 2022-02-25 | 19 | 14.489091 | 44.454545 |
| 2 | urn:ngsi-ld:Junction:54201 | 2022-02-25 | 20 | 15.221364 | 55.000000 |
| 3 | urn:ngsi-ld:Junction:54201 | 2022-02-25 | 21 | 35.295217 | 103.000000 |
| 4 | urn:ngsi-ld:Junction:54201 | 2022-02-25 | 22 | 27.086486 | 116.593750 |
| 5 | urn:ngsi-ld:Junction:54201 | 2022-02-25 | 23 | 15.413000 | 28.937500 |
| 6 | urn:ngsi-ld:Junction:54201 | 2022-02-26 | 0 | 18.458214 | 46.375000 |
| 7 | urn:ngsi-ld:Junction:54201 | 2022-02-26 | 1 | 13.054865 | 43.974359 |
| 8 | urn:ngsi-ld:Junction:54201 | 2022-02-26 | 2 | 14.035417 | 42.733333 |
| 9 | urn:ngsi-ld:Junction:54201 | 2022-02-26 | 3 | 14.191389 | 45.619048 |
Last rows
| refjunction | date | hour | avg_co2 | avg_numvehicles | |
|---|---|---|---|---|---|
| 10414 | urn:ngsi-ld:Junction:54206 | 2022-06-14 | 8 | 14.120333 | 83.491667 |
| 10415 | urn:ngsi-ld:Junction:54206 | 2022-06-14 | 9 | 13.771719 | 84.375000 |
| 10416 | urn:ngsi-ld:Junction:54206 | 2022-06-14 | 10 | 14.129741 | 82.025862 |
| 10417 | urn:ngsi-ld:Junction:54206 | 2022-06-14 | 11 | 32.860484 | 194.266129 |
| 10418 | urn:ngsi-ld:Junction:54206 | 2022-06-14 | 12 | 14.132917 | 82.983333 |
| 10419 | urn:ngsi-ld:Junction:54206 | 2022-06-14 | 13 | 13.779375 | 79.861111 |
| 10420 | urn:ngsi-ld:Junction:54206 | 2022-06-14 | 14 | 14.333636 | 82.102190 |
| 10421 | urn:ngsi-ld:Junction:54206 | 2022-06-14 | 15 | 14.016618 | 83.174242 |
| 10422 | urn:ngsi-ld:Junction:54206 | 2022-06-14 | 16 | 13.963437 | 84.109375 |
| 10423 | urn:ngsi-ld:Junction:54206 | 2022-06-14 | 17 | 13.676638 | 81.413793 |