Dataset statistics
Number of variables | 5 |
---|---|
Number of observations | 10424 |
Missing cells | 0 |
Missing cells (%) | 0.0% |
Duplicate rows | 0 |
Duplicate rows (%) | 0.0% |
Total size in memory | 407.3 KiB |
Average record size in memory | 40.0 B |
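As a quick sanity check on the summary above, the sketch below recomputes the average record size and the percentage figures from the reported totals. It assumes that 1 KiB is 1024 bytes and that the report rounds to one decimal place; the variable names are illustrative and not part of the report.

```python
# Figures copied from the Dataset statistics table.
n_variables = 5
n_observations = 10424
total_size_kib = 407.3
missing_cells = 0
duplicate_rows = 0

# Assumption: 1 KiB = 1024 bytes.
avg_record_bytes = total_size_kib * 1024 / n_observations

# Percentages are taken relative to all cells and all rows respectively.
missing_pct = 100 * missing_cells / (n_observations * n_variables)
duplicate_pct = 100 * duplicate_rows / n_observations

print(f"{avg_record_bytes:.1f} B, {missing_pct:.1f}%, {duplicate_pct:.1f}%")
```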
Variable types
Categorical | 2 |
---|---|
Numeric | 3 |
date has a high cardinality: 110 distinct values | High cardinality |
avg_co2 is highly correlated with avg_numvehicles | High correlation |
avg_numvehicles is highly correlated with avg_co2 | High correlation |
refjunction is highly correlated with avg_numvehicles | High correlation |
hour is highly correlated with avg_co2 and 1 other fields | High correlation |
avg_co2 is highly correlated with hour and 1 other fields | High correlation |
avg_numvehicles is highly correlated with refjunction and 2 other fields | High correlation |
refjunction is uniformly distributed | Uniform |
date is uniformly distributed | Uniform |
hour has 432 (4.1%) zeros | Zeros |
Reproduction
Analysis started | 2022-06-14 12:29:23.156227 |
---|---|
Analysis finished | 2022-06-14 12:29:28.119509 |
Duration | 4.96 seconds |
Software version | pandas-profiling v3.2.0 |
refjunction
Distinct | 4 |
---|---|
Distinct (%) | < 0.1% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 81.6 KiB |
urn:ngsi-ld:Junction:54201 | |
---|---|
urn:ngsi-ld:Junction:54202 | |
urn:ngsi-ld:Junction:54204 | |
urn:ngsi-ld:Junction:54206 |
Length
Max length | 26 |
---|---|
Median length | 26 |
Mean length | 26 |
Min length | 26 |
Characters and Unicode
Total characters | 271024 |
---|---|
Distinct characters | 20 |
Distinct categories | 5 |
Distinct scripts | 2 |
Distinct blocks | 1 |
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
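To illustrate how such character properties can be read off a value, here is a minimal sketch using Python's standard `unicodedata` module on one identifier from this column. It only covers the General Category property; the standard library does not expose script or block properties, so those are left out. This is an illustration, not the profiler's own implementation.

```python
import unicodedata
from collections import Counter

# One value of this variable, as it appears in the report.
value = "urn:ngsi-ld:Junction:54201"

# unicodedata.category returns the two-letter General Category code,
# e.g. "Ll" (Lowercase Letter), "Nd" (Decimal Number), "Po" (Other Punctuation).
category_counts = Counter(unicodedata.category(ch) for ch in value)

print(len(value))             # length of the string
print(len(set(value)))        # distinct characters in this single value
print(dict(category_counts))  # number of characters per General Category
```

Multiplying the per-value counts by the number of observations gives the totals in the category tables, since every value in this column has the same structure.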
Unique
Unique | 0 |
---|---|
Unique (%) | 0.0% |
Sample
1st row | urn:ngsi-ld:Junction:54201 |
---|---|
2nd row | urn:ngsi-ld:Junction:54201 |
3rd row | urn:ngsi-ld:Junction:54201 |
4th row | urn:ngsi-ld:Junction:54201 |
5th row | urn:ngsi-ld:Junction:54201 |
Common Values
Value | Count | Frequency (%) |
urn:ngsi-ld:Junction:54201 | 2606 | |
urn:ngsi-ld:Junction:54202 | 2606 | |
urn:ngsi-ld:Junction:54204 | 2606 | |
urn:ngsi-ld:Junction:54206 | 2606 |
Length
Histogram of lengths of the category
Category Frequency Plot
Value | Count | Frequency (%) |
urn:ngsi-ld:junction:54201 | 2606 | |
urn:ngsi-ld:junction:54202 | 2606 | |
urn:ngsi-ld:junction:54204 | 2606 | |
urn:ngsi-ld:junction:54206 | 2606 |
Most occurring characters
Value | Count | Frequency (%) |
n | 41696 | |
: | 31272 | 11.5% |
u | 20848 | 7.7% |
i | 20848 | 7.7% |
2 | 13030 | 4.8% |
4 | 13030 | 4.8% |
c | 10424 | 3.8% |
0 | 10424 | 3.8% |
5 | 10424 | 3.8% |
o | 10424 | 3.8% |
Other values (10) | 88604 |
Most occurring categories
Value | Count | Frequency (%) |
Lowercase Letter | 166784 | |
Decimal Number | 52120 | 19.2% |
Other Punctuation | 31272 | 11.5% |
Uppercase Letter | 10424 | 3.8% |
Dash Punctuation | 10424 | 3.8% |
Most frequent character per category
Lowercase Letter
Value | Count | Frequency (%) |
n | 41696 | |
u | 20848 | |
i | 20848 | |
c | 10424 | 6.2% |
o | 10424 | 6.2% |
t | 10424 | 6.2% |
r | 10424 | 6.2% |
d | 10424 | 6.2% |
l | 10424 | 6.2% |
s | 10424 | 6.2% |
Decimal Number
Value | Count | Frequency (%) |
2 | 13030 | |
4 | 13030 | |
0 | 10424 | |
5 | 10424 | |
1 | 2606 | 5.0% |
6 | 2606 | 5.0% |
Other Punctuation
Value | Count | Frequency (%) |
: | 31272 |
Uppercase Letter
Value | Count | Frequency (%) |
J | 10424 |
Dash Punctuation
Value | Count | Frequency (%) |
- | 10424 |
Most occurring scripts
Value | Count | Frequency (%) |
Latin | 177208 | |
Common | 93816 |
Most frequent character per script
Latin
Value | Count | Frequency (%) |
n | 41696 | |
u | 20848 | |
i | 20848 | |
c | 10424 | 5.9% |
o | 10424 | 5.9% |
t | 10424 | 5.9% |
J | 10424 | 5.9% |
r | 10424 | 5.9% |
d | 10424 | 5.9% |
l | 10424 | 5.9% |
Other values (2) | 20848 |
Common
Value | Count | Frequency (%) |
: | 31272 | |
2 | 13030 | |
4 | 13030 | |
0 | 10424 | 11.1% |
5 | 10424 | 11.1% |
- | 10424 | 11.1% |
1 | 2606 | 2.8% |
6 | 2606 | 2.8% |
Most occurring blocks
Value | Count | Frequency (%) |
ASCII | 271024 |
Most frequent character per block
ASCII
Value | Count | Frequency (%) |
n | 41696 | |
: | 31272 | 11.5% |
u | 20848 | 7.7% |
i | 20848 | 7.7% |
2 | 13030 | 4.8% |
4 | 13030 | 4.8% |
c | 10424 | 3.8% |
0 | 10424 | 3.8% |
5 | 10424 | 3.8% |
o | 10424 | 3.8% |
Other values (10) | 88604 |
date
Distinct | 110 |
---|---|
Distinct (%) | 1.1% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 81.6 KiB |
2022-04-21 | 96 |
---|---|
2022-05-17 | 96 |
2022-05-15 | 96 |
2022-05-14 | 96 |
2022-05-13 | 96 |
Other values (105) |
Length
Max length | 10 |
---|---|
Median length | 10 |
Mean length | 10 |
Min length | 10 |
Characters and Unicode
Total characters | 104240 |
---|---|
Distinct characters | 11 |
Distinct categories | 2 |
Distinct scripts | 1 |
Distinct blocks | 1 |
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
Unique
Unique | 0 |
---|---|
Unique (%) | 0.0% |
Sample
1st row | 2022-02-25 |
---|---|
2nd row | 2022-02-25 |
3rd row | 2022-02-25 |
4th row | 2022-02-25 |
5th row | 2022-02-25 |
Common Values
Value | Count | Frequency (%) |
2022-04-21 | 96 | 0.9% |
2022-05-17 | 96 | 0.9% |
2022-05-15 | 96 | 0.9% |
2022-05-14 | 96 | 0.9% |
2022-05-13 | 96 | 0.9% |
2022-05-12 | 96 | 0.9% |
2022-05-11 | 96 | 0.9% |
2022-05-10 | 96 | 0.9% |
2022-05-09 | 96 | 0.9% |
2022-05-08 | 96 | 0.9% |
Other values (100) | 9464 |
Length
Histogram of lengths of the category
Value | Count | Frequency (%) |
2022-04-21 | 96 | 0.9% |
2022-04-20 | 96 | 0.9% |
2022-03-02 | 96 | 0.9% |
2022-03-03 | 96 | 0.9% |
2022-03-04 | 96 | 0.9% |
2022-03-05 | 96 | 0.9% |
2022-03-06 | 96 | 0.9% |
2022-03-07 | 96 | 0.9% |
2022-03-08 | 96 | 0.9% |
2022-03-09 | 96 | 0.9% |
Other values (100) | 9464 |
Most occurring characters
Value | Count | Frequency (%) |
2 | 35784 | |
0 | 25248 | |
- | 20848 | |
1 | 4568 | 4.4% |
3 | 4496 | 4.3% |
5 | 3960 | 3.8% |
4 | 3912 | 3.8% |
6 | 2364 | 2.3% |
7 | 1052 | 1.0% |
8 | 1048 | 1.0% |
Most occurring categories
Value | Count | Frequency (%) |
Decimal Number | 83392 | |
Dash Punctuation | 20848 | 20.0% |
Most frequent character per category
Decimal Number
Value | Count | Frequency (%) |
2 | 35784 | |
0 | 25248 | |
1 | 4568 | 5.5% |
3 | 4496 | 5.4% |
5 | 3960 | 4.7% |
4 | 3912 | 4.7% |
6 | 2364 | 2.8% |
7 | 1052 | 1.3% |
8 | 1048 | 1.3% |
9 | 960 | 1.2% |
Dash Punctuation
Value | Count | Frequency (%) |
- | 20848 |
Most occurring scripts
Value | Count | Frequency (%) |
Common | 104240 |
Most frequent character per script
Common
Value | Count | Frequency (%) |
2 | 35784 | |
0 | 25248 | |
- | 20848 | |
1 | 4568 | 4.4% |
3 | 4496 | 4.3% |
5 | 3960 | 3.8% |
4 | 3912 | 3.8% |
6 | 2364 | 2.3% |
7 | 1052 | 1.0% |
8 | 1048 | 1.0% |
Most occurring blocks
Value | Count | Frequency (%) |
ASCII | 104240 |
Most frequent character per block
ASCII
Value | Count | Frequency (%) |
2 | 35784 | |
0 | 25248 | |
- | 20848 | |
1 | 4568 | 4.4% |
3 | 4496 | 4.3% |
5 | 3960 | 3.8% |
4 | 3912 | 3.8% |
6 | 2364 | 2.3% |
7 | 1052 | 1.0% |
8 | 1048 | 1.0% |
hour
Distinct | 24 |
---|---|
Distinct (%) | 0.2% |
Missing | 0 |
Missing (%) | 0.0% |
Infinite | 0 |
Infinite (%) | 0.0% |
Mean | 11.50460476 |
Minimum | 0 |
---|---|
Maximum | 23 |
Zeros | 432 |
Zeros (%) | 4.1% |
Negative | 0 |
Negative (%) | 0.0% |
Memory size | 81.6 KiB |
Quantile statistics
Minimum | 0 |
---|---|
5-th percentile | 1 |
Q1 | 6 |
median | 12 |
Q3 | 17 |
95-th percentile | 22 |
Maximum | 23 |
Range | 23 |
Interquartile range (IQR) | 11 |
Descriptive statistics
Standard deviation | 6.916545735 |
---|---|
Coefficient of variation (CV) | 0.6011980316 |
Kurtosis | -1.201710894 |
Mean | 11.50460476 |
Median Absolute Deviation (MAD) | 6 |
Skewness | -0.001068449895 |
Sum | 119924 |
Variance | 47.83860491 |
Monotonicity | Not monotonic |
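The derived figures in these tables follow from the basic ones by simple arithmetic. The sketch below recomputes them from the reported values (which, given the 0 to 23 range, appear to describe the hour column); the variable names are illustrative only.

```python
# Values copied from the tables above.
n = 10424
total = 119924
mean = 11.50460476
std = 6.916545735
q1, q3 = 6, 17
minimum, maximum = 0, 23

mean_from_sum = total / n        # mean = sum / number of observations
variance = std ** 2              # variance is the squared standard deviation
cv = std / mean                  # coefficient of variation
iqr = q3 - q1                    # interquartile range
value_range = maximum - minimum  # range

print(round(mean_from_sum, 6), round(variance, 4), round(cv, 6), iqr, value_range)
```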
Histogram with fixed size bins (bins=24)
Value | Count | Frequency (%) |
17 | 436 | 4.2% |
8 | 436 | 4.2% |
16 | 436 | 4.2% |
21 | 436 | 4.2% |
22 | 436 | 4.2% |
15 | 436 | 4.2% |
14 | 436 | 4.2% |
13 | 436 | 4.2% |
2 | 436 | 4.2% |
3 | 436 | 4.2% |
Other values (14) | 6064 |
Value | Count | Frequency (%) |
0 | 432 | |
1 | 432 | |
2 | 436 | |
3 | 436 | |
4 | 432 | |
5 | 432 | |
6 | 432 | |
7 | 432 | |
8 | 436 | |
9 | 436 |
Value | Count | Frequency (%) |
23 | 432 | |
22 | 436 | |
21 | 436 | |
20 | 432 | |
19 | 432 | |
18 | 432 | |
17 | 436 | |
16 | 436 | |
15 | 436 | |
14 | 436 |
avg_co2
Distinct | 2986 |
---|---|
Distinct (%) | 28.6% |
Missing | 0 |
Missing (%) | 0.0% |
Infinite | 0 |
Infinite (%) | 0.0% |
Mean | 14.46244189 |
Minimum | 2.181935484 |
---|---|
Maximum | 38.33290323 |
Zeros | 0 |
Zeros (%) | 0.0% |
Negative | 0 |
Negative (%) | 0.0% |
Memory size | 81.6 KiB |
Quantile statistics
Minimum | 2.181935484 |
---|---|
5-th percentile | 2.692 |
Q1 | 13.59666667 |
median | 13.95485952 |
Q3 | 14.25660714 |
95-th percentile | 32.87314815 |
Maximum | 38.33290323 |
Range | 36.15096774 |
Interquartile range (IQR) | 0.6599404762 |
Descriptive statistics
Standard deviation | 8.065269185 |
---|---|
Coefficient of variation (CV) | 0.5576699457 |
Kurtosis | 1.083705487 |
Mean | 14.46244189 |
Median Absolute Deviation (MAD) | 0.3264848485 |
Skewness | 0.9520260068 |
Sum | 150756.4943 |
Variance | 65.04856703 |
Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%) |
13.75 | 12 | 0.1% |
2.682596154 | 8 | 0.1% |
13.621875 | 8 | 0.1% |
2.705625 | 8 | 0.1% |
13.96653846 | 8 | 0.1% |
13.89333333 | 8 | 0.1% |
13.585 | 8 | 0.1% |
13.85 | 8 | 0.1% |
13.83586538 | 8 | 0.1% |
2.7015 | 8 | 0.1% |
Other values (2976) | 10340 |
Value | Count | Frequency (%) |
2.181935484 | 4 | |
2.278684211 | 4 | |
2.332307692 | 4 | |
2.393448276 | 4 | |
2.472916667 | 4 | |
2.534615385 | 1 | < 0.1% |
2.534615385 | 3 | |
2.544772727 | 4 | |
2.545916667 | 4 | |
2.5465625 | 4 |
Value | Count | Frequency (%) |
38.33290323 | 4 | |
35.71913043 | 4 | |
35.59666667 | 4 | |
35.3275 | 4 | |
35.29521739 | 4 | |
35.16107143 | 4 | |
34.56696429 | 4 | |
34.41596154 | 4 | |
33.9403125 | 4 | |
33.8903125 | 4 |
avg_numvehicles
Distinct | 8429 |
---|---|
Distinct (%) | 80.9% |
Missing | 0 |
Missing (%) | 0.0% |
Infinite | 0 |
Infinite (%) | 0.0% |
Mean | 72.69160203 |
Minimum | 6.4 |
---|---|
Maximum | 267.3333333 |
Zeros | 0 |
Zeros (%) | 0.0% |
Negative | 0 |
Negative (%) | 0.0% |
Memory size | 81.6 KiB |
Quantile statistics
Minimum | 6.4 |
---|---|
5-th percentile | 14.125 |
Q1 | 44.66666667 |
median | 75.93269231 |
Q3 | 82.02088859 |
95-th percentile | 184.5109435 |
Maximum | 267.3333333 |
Range | 260.9333333 |
Interquartile range (IQR) | 37.35422193 |
Descriptive statistics
Standard deviation | 44.5546616 |
---|---|
Coefficient of variation (CV) | 0.6129272206 |
Kurtosis | 1.651848485 |
Mean | 72.69160203 |
Median Absolute Deviation (MAD) | 9.387019231 |
Skewness | 1.146292551 |
Sum | 757737.2596 |
Variance | 1985.11787 |
Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%) |
44.75 | 15 | 0.1% |
8.5 | 12 | 0.1% |
79 | 11 | 0.1% |
14.5 | 10 | 0.1% |
82.25 | 9 | 0.1% |
44.25 | 9 | 0.1% |
80 | 8 | 0.1% |
14.25 | 8 | 0.1% |
15.5 | 8 | 0.1% |
74.5 | 8 | 0.1% |
Other values (8419) | 10326 |
Value | Count | Frequency (%) |
6.4 | 1 | |
6.848484848 | 1 | |
7 | 1 | |
7.346153846 | 1 | |
7.541666667 | 2 | |
7.595744681 | 1 | |
7.625 | 1 | |
7.878787879 | 1 | |
7.9 | 1 | |
7.90625 | 1 |
Value | Count | Frequency (%) |
267.3333333 | 1 | |
264.3928571 | 1 | |
261.75 | 1 | |
247.75 | 1 | |
244.173913 | 1 | |
243.5714286 | 1 | |
240.7666667 | 1 | |
226.8461538 | 1 | |
225.0833333 | 1 | |
223.85 | 1 |
Spearman's ρ
Spearman's rank correlation coefficient (ρ) measures monotonic correlation between two variables and is therefore better than Pearson's r at catching nonlinear monotonic correlations. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation. To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
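The calculation just described can be sketched in plain Python: rank both variables, then apply the covariance-over-standard-deviations formula to the ranks. The helper names are hypothetical, and the sample data are the avg_co2 and avg_numvehicles values of the first five rows listed under "First rows"; five rows are far too few to say anything about the full dataset.

```python
from statistics import mean

def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks

def pearson(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def spearman(x, y):
    """Spearman's rho: Pearson's r computed on the rank variables."""
    return pearson(average_ranks(x), average_ranks(y))

avg_co2 = [15.519286, 14.489091, 15.221364, 35.295217, 27.086486]
avg_numvehicles = [29.0, 44.454545, 55.0, 103.0, 116.59375]
print(round(spearman(avg_co2, avg_numvehicles), 3))  # 0.6
```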
Pearson's r
Pearson's correlation coefficient (r) measures linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, implying that for a linear function the angle to the x-axis does not affect r. To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
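A minimal sketch of this formula, together with a check of the invariance property, is given below. The function name and the toy data are made up for illustration, and the rescaling uses positive scale factors.

```python
from statistics import mean

def pearson(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

# Toy data, not taken from the dataset.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 1.0, 4.0, 3.0, 5.0]

r = pearson(x, y)

# Shift and rescale each variable separately (positive scale factors).
r_transformed = pearson([10 * a + 3 for a in x], [0.5 * b - 7 for b in y])

print(round(r, 3), round(r_transformed, 3))  # 0.8 0.8
```

The common 1/n (or 1/(n-1)) factor cancels between the covariance and the standard deviations, which is why the sketch works with plain sums.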
Kendall's τ
Like Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation. To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
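The pair-counting definition above corresponds to the variant usually called tau-a, sketched below in plain Python. Tied pairs count as neither concordant nor discordant here; library implementations often apply a tie correction instead, so results may differ when ties are present. The function name is hypothetical, and the sample data are the avg_co2 and avg_numvehicles values of the first five rows listed under "First rows".

```python
from itertools import combinations

def kendall_tau_a(x, y):
    """(concordant pairs - discordant pairs) / total number of pairs."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        direction = (x1 - x2) * (y1 - y2)
        if direction > 0:
            concordant += 1   # both variables move in the same direction
        elif direction < 0:
            discordant += 1   # the variables move in opposite directions
    n = len(x)
    total_pairs = n * (n - 1) / 2
    return (concordant - discordant) / total_pairs

avg_co2 = [15.519286, 14.489091, 15.221364, 35.295217, 27.086486]
avg_numvehicles = [29.0, 44.454545, 55.0, 103.0, 116.59375]
print(kendall_tau_a(avg_co2, avg_numvehicles))  # 0.4
```

This loop is quadratic in the number of observations, which is fine for a sketch but slow for the full 10424 rows.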
Phik (φk)
Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in case of a bivariate normal input distribution. Extensive documentation is available.
A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion by eye.
First rows
| | refjunction | date | hour | avg_co2 | avg_numvehicles |
---|---|---|---|---|---|
0 | urn:ngsi-ld:Junction:54201 | 2022-02-25 | 18 | 15.519286 | 29.000000 |
1 | urn:ngsi-ld:Junction:54201 | 2022-02-25 | 19 | 14.489091 | 44.454545 |
2 | urn:ngsi-ld:Junction:54201 | 2022-02-25 | 20 | 15.221364 | 55.000000 |
3 | urn:ngsi-ld:Junction:54201 | 2022-02-25 | 21 | 35.295217 | 103.000000 |
4 | urn:ngsi-ld:Junction:54201 | 2022-02-25 | 22 | 27.086486 | 116.593750 |
5 | urn:ngsi-ld:Junction:54201 | 2022-02-25 | 23 | 15.413000 | 28.937500 |
6 | urn:ngsi-ld:Junction:54201 | 2022-02-26 | 0 | 18.458214 | 46.375000 |
7 | urn:ngsi-ld:Junction:54201 | 2022-02-26 | 1 | 13.054865 | 43.974359 |
8 | urn:ngsi-ld:Junction:54201 | 2022-02-26 | 2 | 14.035417 | 42.733333 |
9 | urn:ngsi-ld:Junction:54201 | 2022-02-26 | 3 | 14.191389 | 45.619048 |
Last rows
| | refjunction | date | hour | avg_co2 | avg_numvehicles |
---|---|---|---|---|---|
10414 | urn:ngsi-ld:Junction:54206 | 2022-06-14 | 8 | 14.120333 | 83.491667 |
10415 | urn:ngsi-ld:Junction:54206 | 2022-06-14 | 9 | 13.771719 | 84.375000 |
10416 | urn:ngsi-ld:Junction:54206 | 2022-06-14 | 10 | 14.129741 | 82.025862 |
10417 | urn:ngsi-ld:Junction:54206 | 2022-06-14 | 11 | 32.860484 | 194.266129 |
10418 | urn:ngsi-ld:Junction:54206 | 2022-06-14 | 12 | 14.132917 | 82.983333 |
10419 | urn:ngsi-ld:Junction:54206 | 2022-06-14 | 13 | 13.779375 | 79.861111 |
10420 | urn:ngsi-ld:Junction:54206 | 2022-06-14 | 14 | 14.333636 | 82.102190 |
10421 | urn:ngsi-ld:Junction:54206 | 2022-06-14 | 15 | 14.016618 | 83.174242 |
10422 | urn:ngsi-ld:Junction:54206 | 2022-06-14 | 16 | 13.963437 | 84.109375 |
10423 | urn:ngsi-ld:Junction:54206 | 2022-06-14 | 17 | 13.676638 | 81.413793 |