Overview

Dataset statistics

Number of variables: 5
Number of observations: 10424
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 407.3 KiB
Average record size in memory: 40.0 B
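The memory figures are consistent with a shallow pandas memory count. As a rough check, the sketch below reproduces them under two assumptions that the report itself does not state. First, every cell takes 8 bytes: an int64/float64 value or a pointer to a Python string, without counting the string payload. Second, the default RangeIndex adds 128 bytes.

```python
# Sketch: reproduce the report's memory figures under stated assumptions.
N_ROWS = 10_424        # "Number of observations"
N_COLS = 5             # "Number of variables"
BYTES_PER_CELL = 8     # assumption: 8-byte values / object pointers (shallow count)
INDEX_BYTES = 128      # assumption: bytes attributed to a default RangeIndex

column_bytes = N_ROWS * BYTES_PER_CELL + INDEX_BYTES            # one column plus index
total_bytes = N_COLS * N_ROWS * BYTES_PER_CELL + INDEX_BYTES    # all columns plus index

column_kib = column_bytes / 1024
total_kib = total_bytes / 1024
avg_record_bytes = total_bytes / N_ROWS

print(f"per column: {column_kib:.1f} KiB")       # per column: 81.6 KiB
print(f"total:      {total_kib:.1f} KiB")        # total:      407.3 KiB
print(f"per record: {avg_record_bytes:.1f} B")   # per record: 40.0 B
```

Under these assumptions each single column comes to 81.6 KiB, which matches the Memory size entries reported for the individual variables.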

Variable types

Categorical: 2
Numeric: 3

Alerts

date has a high cardinality: 110 distinct values (High cardinality)
avg_co2 is highly correlated with avg_numvehicles (High correlation)
avg_numvehicles is highly correlated with avg_co2 (High correlation)
refjunction is highly correlated with avg_numvehicles (High correlation)
hour is highly correlated with avg_co2 and 1 other field (High correlation)
avg_co2 is highly correlated with hour and 1 other field (High correlation)
avg_numvehicles is highly correlated with refjunction and 2 other fields (High correlation)
refjunction is uniformly distributed (Uniform)
date is uniformly distributed (Uniform)
hour has 432 (4.1%) zeros (Zeros)

Reproduction

Analysis started: 2022-06-14 12:29:23.156227
Analysis finished: 2022-06-14 12:29:28.119509
Duration: 4.96 seconds
Software version: pandas-profiling v3.2.0
Configuration: config.json

Variables

refjunction
Categorical

HIGH CORRELATION
UNIFORM

Distinct: 4
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 81.6 KiB

urn:ngsi-ld:Junction:54201  2606
urn:ngsi-ld:Junction:54202  2606
urn:ngsi-ld:Junction:54204  2606
urn:ngsi-ld:Junction:54206  2606

Length

Max length: 26
Median length: 26
Mean length: 26
Min length: 26

Characters and Unicode

Total characters: 271024
Distinct characters: 20
Distinct categories: 5
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
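As an illustration of these character properties, Python's standard `unicodedata` module exposes the general category of each code point. It does not expose the script or block, so this sketch covers only the category breakdown, and it is not necessarily how pandas-profiling computes it. The `CATEGORY_NAMES` mapping is a hypothetical helper that covers only the five codes occurring in this column.

```python
import unicodedata
from collections import Counter

# Readable names for the general-category codes that occur in this column
# (illustrative and incomplete: other codes are passed through unchanged).
CATEGORY_NAMES = {
    "Ll": "Lowercase Letter",
    "Lu": "Uppercase Letter",
    "Nd": "Decimal Number",
    "Po": "Other Punctuation",
    "Pd": "Dash Punctuation",
}

def category_counts(text: str) -> Counter:
    """Count the characters of `text` by Unicode general category."""
    return Counter(
        CATEGORY_NAMES.get(code, code)
        for code in map(unicodedata.category, text)
    )

counts = category_counts("urn:ngsi-ld:Junction:54201")
print(counts)
```

A single value yields 16 lowercase letters, 5 decimal digits, 3 colons, 1 uppercase letter and 1 hyphen. Multiplied by 10424 rows, these give the category totals reported below (for example 16 × 10424 = 166784).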

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: urn:ngsi-ld:Junction:54201
2nd row: urn:ngsi-ld:Junction:54201
3rd row: urn:ngsi-ld:Junction:54201
4th row: urn:ngsi-ld:Junction:54201
5th row: urn:ngsi-ld:Junction:54201

Common Values

Value                       Count  Frequency (%)
urn:ngsi-ld:Junction:54201  2606   25.0%
urn:ngsi-ld:Junction:54202  2606   25.0%
urn:ngsi-ld:Junction:54204  2606   25.0%
urn:ngsi-ld:Junction:54206  2606   25.0%
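The Common Values table is a plain frequency count: each value's count and its share of all rows. Below is a minimal sketch with `collections.Counter`. The helper `frequency_table` is hypothetical, and the column is a toy rebuild with the same composition as refjunction (2606 rows per junction); it does not reproduce the real row order.

```python
from collections import Counter

def frequency_table(values, top=10):
    """Return (value, count, percent) rows, most frequent first."""
    counts = Counter(values)
    n = len(values)
    return [(value, count, 100.0 * count / n) for value, count in counts.most_common(top)]

# Toy column with refjunction's composition: 2606 rows for each of the four IDs.
junctions = [f"urn:ngsi-ld:Junction:{j}" for j in (54201, 54202, 54204, 54206)]
column = [jid for jid in junctions for _ in range(2606)]

for value, count, pct in frequency_table(column):
    print(f"{value}  {count}  {pct:.1f}%")
```

`Counter.most_common` keeps first-seen order among tied counts, so the four equally frequent IDs are listed in the order they first appear.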

Length

[Figure: Histogram of lengths of the category]

[Figure: Category Frequency Plot]
Value                       Count  Frequency (%)
urn:ngsi-ld:junction:54201  2606   25.0%
urn:ngsi-ld:junction:54202  2606   25.0%
urn:ngsi-ld:junction:54204  2606   25.0%
urn:ngsi-ld:junction:54206  2606   25.0%

Most occurring characters

Value              Count  Frequency (%)
n                  41696  15.4%
:                  31272  11.5%
u                  20848  7.7%
i                  20848  7.7%
2                  13030  4.8%
4                  13030  4.8%
c                  10424  3.8%
0                  10424  3.8%
5                  10424  3.8%
o                  10424  3.8%
Other values (10)  88604  32.7%
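Every refjunction value has the fixed form urn:ngsi-ld:Junction:5420X, so these counts equal the per-value counts times the number of rows. For example, 'n' occurs four times in each value, giving 4 × 10424 = 41696. The sketch rebuilds a toy column with the same composition (the real row order is not reproduced) and counts characters with `collections.Counter`.

```python
from collections import Counter

# Toy column with refjunction's composition: 2606 rows for each junction ID.
ids = [f"urn:ngsi-ld:Junction:{j}" for j in (54201, 54202, 54204, 54206)]
column = [jid for jid in ids for _ in range(2606)]

# Count every character across all values in the column.
char_counts = Counter(ch for value in column for ch in value)
total = sum(char_counts.values())

for ch, count in char_counts.most_common(4):
    print(f"{ch!r}  {count}  {100 * count / total:.1f}%")
```

Digits that appear in only some IDs ('1' in 54201, '6' in 54206) get 2606 each, while '2' and '4' appear once in every ID and once more in one ID, giving 10424 + 2606 = 13030.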

Most occurring categories

Value              Count   Frequency (%)
Lowercase Letter   166784  61.5%
Decimal Number     52120   19.2%
Other Punctuation  31272   11.5%
Uppercase Letter   10424   3.8%
Dash Punctuation   10424   3.8%

Most frequent character per category

Lowercase Letter
Value  Count  Frequency (%)
n      41696  25.0%
u      20848  12.5%
i      20848  12.5%
c      10424  6.2%
o      10424  6.2%
t      10424  6.2%
r      10424  6.2%
d      10424  6.2%
l      10424  6.2%
s      10424  6.2%

Decimal Number
Value  Count  Frequency (%)
2      13030  25.0%
4      13030  25.0%
0      10424  20.0%
5      10424  20.0%
1      2606   5.0%
6      2606   5.0%

Other Punctuation
Value  Count  Frequency (%)
:      31272  100.0%

Uppercase Letter
Value  Count  Frequency (%)
J      10424  100.0%

Dash Punctuation
Value  Count  Frequency (%)
-      10424  100.0%

Most occurring scripts

Value   Count   Frequency (%)
Latin   177208  65.4%
Common  93816   34.6%

Most frequent character per script

Latin
Value             Count  Frequency (%)
n                 41696  23.5%
u                 20848  11.8%
i                 20848  11.8%
c                 10424  5.9%
o                 10424  5.9%
t                 10424  5.9%
J                 10424  5.9%
r                 10424  5.9%
d                 10424  5.9%
l                 10424  5.9%
Other values (2)  20848  11.8%

Common
Value  Count  Frequency (%)
:      31272  33.3%
2      13030  13.9%
4      13030  13.9%
0      10424  11.1%
5      10424  11.1%
-      10424  11.1%
1      2606   2.8%
6      2606   2.8%

Most occurring blocks

Value  Count   Frequency (%)
ASCII  271024  100.0%

Most frequent character per block

ASCII
Value              Count  Frequency (%)
n                  41696  15.4%
:                  31272  11.5%
u                  20848  7.7%
i                  20848  7.7%
2                  13030  4.8%
4                  13030  4.8%
c                  10424  3.8%
0                  10424  3.8%
5                  10424  3.8%
o                  10424  3.8%
Other values (10)  88604  32.7%

date
Categorical

HIGH CARDINALITY
UNIFORM

Distinct: 110
Distinct (%): 1.1%
Missing: 0
Missing (%): 0.0%
Memory size: 81.6 KiB

2022-04-21          96
2022-05-17          96
2022-05-15          96
2022-05-14          96
2022-05-13          96
Other values (105)  9944

Length

Max length: 10
Median length: 10
Mean length: 10
Min length: 10

Characters and Unicode

Total characters: 104240
Distinct characters: 11
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 2022-02-25
2nd row: 2022-02-25
3rd row: 2022-02-25
4th row: 2022-02-25
5th row: 2022-02-25

Common Values

Value               Count  Frequency (%)
2022-04-21          96     0.9%
2022-05-17          96     0.9%
2022-05-15          96     0.9%
2022-05-14          96     0.9%
2022-05-13          96     0.9%
2022-05-12          96     0.9%
2022-05-11          96     0.9%
2022-05-10          96     0.9%
2022-05-09          96     0.9%
2022-05-08          96     0.9%
Other values (100)  9464   90.8%

Length

[Figure: Histogram of lengths of the category]
Value               Count  Frequency (%)
2022-04-21          96     0.9%
2022-04-20          96     0.9%
2022-03-02          96     0.9%
2022-03-03          96     0.9%
2022-03-04          96     0.9%
2022-03-05          96     0.9%
2022-03-06          96     0.9%
2022-03-07          96     0.9%
2022-03-08          96     0.9%
2022-03-09          96     0.9%
Other values (100)  9464   90.8%

Most occurring characters

Value  Count  Frequency (%)
2      35784  34.3%
0      25248  24.2%
-      20848  20.0%
1      4568   4.4%
3      4496   4.3%
5      3960   3.8%
4      3912   3.8%
6      2364   2.3%
7      1052   1.0%
8      1048   1.0%

Most occurring categories

Value             Count  Frequency (%)
Decimal Number    83392  80.0%
Dash Punctuation  20848  20.0%

Most frequent character per category

Decimal Number
Value  Count  Frequency (%)
2      35784  42.9%
0      25248  30.3%
1      4568   5.5%
3      4496   5.4%
5      3960   4.7%
4      3912   4.7%
6      2364   2.8%
7      1052   1.3%
8      1048   1.3%
9      960    1.2%

Dash Punctuation
Value  Count  Frequency (%)
-      20848  100.0%

Most occurring scripts

Value   Count   Frequency (%)
Common  104240  100.0%

Most frequent character per script

Common
Value  Count  Frequency (%)
2      35784  34.3%
0      25248  24.2%
-      20848  20.0%
1      4568   4.4%
3      4496   4.3%
5      3960   3.8%
4      3912   3.8%
6      2364   2.3%
7      1052   1.0%
8      1048   1.0%

Most occurring blocks

Value  Count   Frequency (%)
ASCII  104240  100.0%

Most frequent character per block

ASCII
Value  Count  Frequency (%)
2      35784  34.3%
0      25248  24.2%
-      20848  20.0%
1      4568   4.4%
3      4496   4.3%
5      3960   3.8%
4      3912   3.8%
6      2364   2.3%
7      1052   1.0%
8      1048   1.0%

hour
Real number (ℝ≥0)

HIGH CORRELATION
ZEROS

Distinct: 24
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 11.50460476
Minimum: 0
Maximum: 23
Zeros: 432
Zeros (%): 4.1%
Negative: 0
Negative (%): 0.0%
Memory size: 81.6 KiB

Quantile statistics

Minimum: 0
5th percentile: 1
Q1: 6
Median: 12
Q3: 17
95th percentile: 22
Maximum: 23
Range: 23
Interquartile range (IQR): 11
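The IQR is Q3 − Q1 and the range is maximum − minimum. For hour, that gives 17 − 6 = 11 and 23 − 0 = 23. The sketch below computes these quantities with Python's `statistics.quantiles` using `method="inclusive"`, which interpolates linearly between order statistics. This is assumed here to match pandas' default linear interpolation. The function `quantile_summary` is hypothetical. The input is a toy list with each hour exactly once, not the report's data, so its quartiles differ from the values above.

```python
import statistics

def quantile_summary(data):
    """Q1, median and Q3 by linear interpolation, plus range and IQR."""
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return {
        "Q1": q1,
        "median": median,
        "Q3": q3,
        "range": max(data) - min(data),
        "IQR": q3 - q1,
    }

# Toy input: each hour of the day exactly once (not the report's data).
print(quantile_summary(list(range(24))))
```

For the toy input the quartiles are 5.75, 11.5 and 17.25, the range is 23 and the IQR is 11.5.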

Descriptive statistics

Standard deviation: 6.916545735
Coefficient of variation (CV): 0.6011980316
Kurtosis: -1.201710894
Mean: 11.50460476
Median Absolute Deviation (MAD): 6
Skewness: -0.001068449895
Sum: 119924
Variance: 47.83860491
Monotonicity: Not monotonic
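The skewness (≈ 0) and kurtosis (≈ −1.2) of hour are what a flat distribution produces, because −1.2 is the excess kurtosis of a uniform distribution. The sketch computes the moment-based skewness g1 and excess kurtosis g2 for a toy list containing each hour once. The function name is hypothetical. pandas reports bias-corrected versions of these statistics, and that correction should be negligible at n = 10424.

```python
def skew_and_excess_kurtosis(data):
    """Moment-based skewness g1 and excess kurtosis g2 (no small-sample correction)."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n
    m3 = sum((x - mean) ** 3 for x in data) / n
    m4 = sum((x - mean) ** 4 for x in data) / n
    g1 = m3 / m2 ** 1.5
    g2 = m4 / m2 ** 2 - 3.0  # subtract 3 so that a normal distribution scores 0
    return g1, g2

# Toy input: each hour 0..23 exactly once (not the report's data).
g1, g2 = skew_and_excess_kurtosis(list(range(24)))
print(f"skewness {g1:.4f}, excess kurtosis {g2:.4f}")  # skewness 0.0000, excess kurtosis -1.2042
```

For a discrete uniform distribution on k points, the excess kurtosis is −6(k² + 1) / (5(k² − 1)), which for k = 24 is about −1.204. That is close to the reported −1.2017 for the slightly uneven real counts.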
[Figure: Histogram with fixed size bins (bins=24)]
Common values
Value              Count  Frequency (%)
17                 436    4.2%
8                  436    4.2%
16                 436    4.2%
21                 436    4.2%
22                 436    4.2%
15                 436    4.2%
14                 436    4.2%
13                 436    4.2%
2                  436    4.2%
3                  436    4.2%
Other values (14)  6064   58.2%
Minimum 10 values
Value  Count  Frequency (%)
0      432    4.1%
1      432    4.1%
2      436    4.2%
3      436    4.2%
4      432    4.1%
5      432    4.1%
6      432    4.1%
7      432    4.1%
8      436    4.2%
9      436    4.2%
Maximum 10 values
Value  Count  Frequency (%)
23     432    4.1%
22     436    4.2%
21     436    4.2%
20     432    4.1%
19     432    4.1%
18     432    4.1%
17     436    4.2%
16     436    4.2%
15     436    4.2%
14     436    4.2%

avg_co2
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 2986
Distinct (%): 28.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 14.46244189
Minimum: 2.181935484
Maximum: 38.33290323
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 81.6 KiB

Quantile statistics

Minimum: 2.181935484
5th percentile: 2.692
Q1: 13.59666667
Median: 13.95485952
Q3: 14.25660714
95th percentile: 32.87314815
Maximum: 38.33290323
Range: 36.15096774
Interquartile range (IQR): 0.6599404762

Descriptive statistics

Standard deviation: 8.065269185
Coefficient of variation (CV): 0.5576699457
Kurtosis: 1.083705487
Mean: 14.46244189
Median Absolute Deviation (MAD): 0.3264848485
Skewness: 0.9520260068
Sum: 150756.4943
Variance: 65.04856703
Monotonicity: Not monotonic
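For avg_co2, the MAD (0.326) and the IQR (0.660) are small compared with the standard deviation (8.07). The middle half of the values lies between 13.60 and 14.26. The 5th and 95th percentiles (2.69 and 32.87) show that a minority of rows sit far from that band, and those rows drive the standard deviation up. The report labels MAD as the median absolute deviation, i.e. the median of |x − median(x)|. This sketch on made-up values illustrates the contrast; the helper name is hypothetical.

```python
import statistics

def median_absolute_deviation(data):
    """Median of the absolute deviations from the median (no scaling constant)."""
    med = statistics.median(data)
    return statistics.median(abs(x - med) for x in data)

# Made-up toy values: a tight cluster near 14 plus two extreme values.
sample = [13.6, 13.8, 13.9, 14.0, 14.1, 14.2, 14.3, 2.5, 33.0]

print(statistics.median(sample))            # 14.0
print(median_absolute_deviation(sample))    # about 0.2: the spread of the cluster
print(statistics.stdev(sample))             # far larger, driven by the two extremes
```

The median-based measure ignores the two extremes almost entirely, while the standard deviation is dominated by them.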
[Figure: Histogram with fixed size bins (bins=50)]
Common values
Value                Count  Frequency (%)
13.75                12     0.1%
2.682596154          8      0.1%
13.621875            8      0.1%
2.705625             8      0.1%
13.96653846          8      0.1%
13.89333333          8      0.1%
13.585               8      0.1%
13.85                8      0.1%
13.83586538          8      0.1%
2.7015               8      0.1%
Other values (2976)  10340  99.2%
Minimum 10 values
Value        Count  Frequency (%)
2.181935484  4      < 0.1%
2.278684211  4      < 0.1%
2.332307692  4      < 0.1%
2.393448276  4      < 0.1%
2.472916667  4      < 0.1%
2.534615385  1      < 0.1%
2.534615385  3      < 0.1%
2.544772727  4      < 0.1%
2.545916667  4      < 0.1%
2.5465625    4      < 0.1%
Maximum 10 values
Value        Count  Frequency (%)
38.33290323  4      < 0.1%
35.71913043  4      < 0.1%
35.59666667  4      < 0.1%
35.3275      4      < 0.1%
35.29521739  4      < 0.1%
35.16107143  4      < 0.1%
34.56696429  4      < 0.1%
34.41596154  4      < 0.1%
33.9403125   4      < 0.1%
33.8903125   4      < 0.1%
< 0.1%

avg_numvehicles
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 8429
Distinct (%): 80.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 72.69160203
Minimum: 6.4
Maximum: 267.3333333
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 81.6 KiB

Quantile statistics

Minimum: 6.4
5th percentile: 14.125
Q1: 44.66666667
Median: 75.93269231
Q3: 82.02088859
95th percentile: 184.5109435
Maximum: 267.3333333
Range: 260.9333333
Interquartile range (IQR): 37.35422193

Descriptive statistics

Standard deviation: 44.5546616
Coefficient of variation (CV): 0.6129272206
Kurtosis: 1.651848485
Mean: 72.69160203
Median Absolute Deviation (MAD): 9.387019231
Skewness: 1.146292551
Sum: 757737.2596
Variance: 1985.11787
Monotonicity: Not monotonic
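Several of the descriptive statistics are tied together by simple identities: mean = sum / n, variance = (standard deviation)², and CV = standard deviation / mean. The sketch checks the figures reported for the three numeric variables against these identities. It uses n = 10424 and a relative tolerance of 1e-6 to allow for the rounding in the report. `REPORTED` and `check` are hypothetical names.

```python
import math

# Figures copied from this report: (n, sum, mean, standard deviation, variance, CV).
REPORTED = {
    "hour": (10424, 119924, 11.50460476, 6.916545735, 47.83860491, 0.6011980316),
    "avg_co2": (10424, 150756.4943, 14.46244189, 8.065269185, 65.04856703, 0.5576699457),
    "avg_numvehicles": (10424, 757737.2596, 72.69160203, 44.5546616, 1985.11787, 0.6129272206),
}

def check(n, total, mean, sd, var, cv, rel_tol=1e-6):
    """True if mean = sum/n, variance = sd**2 and CV = sd/mean within rel_tol."""
    return (
        math.isclose(total / n, mean, rel_tol=rel_tol)
        and math.isclose(sd ** 2, var, rel_tol=rel_tol)
        and math.isclose(sd / mean, cv, rel_tol=rel_tol)
    )

for name, figures in REPORTED.items():
    print(name, "consistent" if check(*figures) else "INCONSISTENT")
```

All three variables pass. The same style of check applies to Range (maximum − minimum) and IQR (Q3 − Q1).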
[Figure: Histogram with fixed size bins (bins=50)]
Common values
Value                Count  Frequency (%)
44.75                15     0.1%
8.5                  12     0.1%
79                   11     0.1%
14.5                 10     0.1%
82.25                9      0.1%
44.25                9      0.1%
80                   8      0.1%
14.25                8      0.1%
15.5                 8      0.1%
74.5                 8      0.1%
Other values (8419)  10326  99.1%
Minimum 10 values
Value        Count  Frequency (%)
6.4          1      < 0.1%
6.848484848  1      < 0.1%
7            1      < 0.1%
7.346153846  1      < 0.1%
7.541666667  2      < 0.1%
7.595744681  1      < 0.1%
7.625        1      < 0.1%
7.878787879  1      < 0.1%
7.9          1      < 0.1%
7.90625      1      < 0.1%
Maximum 10 values
Value        Count  Frequency (%)
267.3333333  1      < 0.1%
264.3928571  1      < 0.1%
261.75       1      < 0.1%
247.75       1      < 0.1%
244.173913   1      < 0.1%
243.5714286  1      < 0.1%
240.7666667  1      < 0.1%
226.8461538  1      < 0.1%
225.0833333  1      < 0.1%
223.85       1      < 0.1%

Interactions


Correlations


Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables. It is therefore better than Pearson's r at capturing nonlinear monotonic relationships. Its value lies between -1 and +1: -1 indicates a perfect negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates a perfect positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of the standard deviations of those rank variables.
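A minimal sketch of that definition in plain Python: values are ranked (ties receive the average of their positions), and Pearson's r is then computed on the ranks. The helper names are hypothetical; this is not pandas-profiling's own code.

```python
import math

def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the last position of the run of values tied with order[i].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = average
        i = j + 1
    return ranks

def pearson_r(x, y):
    """Covariance divided by the product of the standard deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)  # the 1/(n-1) factors cancel

def spearman_rho(x, y):
    """Spearman's rho: Pearson's r computed on the rank variables."""
    return pearson_r(average_ranks(x), average_ranks(y))

x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]          # nonlinear but strictly increasing
print(spearman_rho(x, y))        # 1.0 up to floating-point rounding
print(pearson_r(x, y))           # below 1, since the relation is not linear
```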

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1: -1 indicates a perfect negative linear correlation, 0 indicates no linear correlation, and +1 indicates a perfect positive linear correlation. Furthermore, r is invariant under separate changes in location and positive changes in scale of the two variables. For data lying exactly on a line, this means the steepness of the line does not affect r; only the sign of its slope does.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
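A minimal sketch of this formula, also illustrating the invariance described above: points on a steep line and on a shallow line both give r = 1, and a falling line gives r = −1. The function name and the sample values are hypothetical.

```python
import math

def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    # Sums of products/squares; the 1/(n-1) factors cancel in the ratio.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

x = [1.0, 2.0, 4.0, 7.0, 11.0]
steep = [3 * v + 5 for v in x]       # positive slope 3
shallow = [0.1 * v - 2 for v in x]   # positive slope 0.1
falling = [-2 * v for v in x]        # negative slope

for label, y in [("steep", steep), ("shallow", shallow), ("falling", falling)]:
    print(label, round(pearson_r(x, y), 12))
```

Rounding hides floating-point noise in the last digits: the loop prints 1.0, 1.0 and -1.0.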

Kendall's τ

Like Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates a perfect negative correlation, 0 indicates no correlation, and +1 indicates a perfect positive correlation.

To calculate τ for two variables X and Y, one counts the concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs, n(n-1)/2.
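A minimal O(n²) sketch of the pair-counting definition above; the function name is hypothetical. This is the τ-a form, in which tied pairs count as neither concordant nor discordant. Library implementations such as `scipy.stats.kendalltau` default to τ-b, which adjusts the denominator for ties, so results can differ when ties are present.

```python
from itertools import combinations

def kendall_tau_a(x, y):
    """(concordant - discordant) / total pairs; tied pairs count as neither."""
    concordant = discordant = 0
    for i, j in combinations(range(len(x)), 2):
        s = (x[i] - x[j]) * (y[i] - y[j])
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    n_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / n_pairs

# 6 pairs: only (3, 2) vs (2, 3) is out of order, so 5 concordant, 1 discordant.
print(kendall_tau_a([1, 2, 3, 4], [1, 3, 2, 4]))  # (5 - 1) / 6
```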

Phik (φk)

Phik (φk) is a recent correlation coefficient that works consistently across categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient for a bivariate normal input distribution. Extensive documentation is available for the phik package.

Missing values

[Figure: A simple visualization of nullity by column.]
[Figure: Nullity matrix, a data-dense display that lets you quickly pick out patterns in data completion visually.]

Sample

First rows

       refjunction                 date        hour  avg_co2    avg_numvehicles
0      urn:ngsi-ld:Junction:54201  2022-02-25  18    15.519286  29.000000
1      urn:ngsi-ld:Junction:54201  2022-02-25  19    14.489091  44.454545
2      urn:ngsi-ld:Junction:54201  2022-02-25  20    15.221364  55.000000
3      urn:ngsi-ld:Junction:54201  2022-02-25  21    35.295217  103.000000
4      urn:ngsi-ld:Junction:54201  2022-02-25  22    27.086486  116.593750
5      urn:ngsi-ld:Junction:54201  2022-02-25  23    15.413000  28.937500
6      urn:ngsi-ld:Junction:54201  2022-02-26  0     18.458214  46.375000
7      urn:ngsi-ld:Junction:54201  2022-02-26  1     13.054865  43.974359
8      urn:ngsi-ld:Junction:54201  2022-02-26  2     14.035417  42.733333
9      urn:ngsi-ld:Junction:54201  2022-02-26  3     14.191389  45.619048

Last rows

       refjunction                 date        hour  avg_co2    avg_numvehicles
10414  urn:ngsi-ld:Junction:54206  2022-06-14  8     14.120333  83.491667
10415  urn:ngsi-ld:Junction:54206  2022-06-14  9     13.771719  84.375000
10416  urn:ngsi-ld:Junction:54206  2022-06-14  10    14.129741  82.025862
10417  urn:ngsi-ld:Junction:54206  2022-06-14  11    32.860484  194.266129
10418  urn:ngsi-ld:Junction:54206  2022-06-14  12    14.132917  82.983333
10419  urn:ngsi-ld:Junction:54206  2022-06-14  13    13.779375  79.861111
10420  urn:ngsi-ld:Junction:54206  2022-06-14  14    14.333636  82.102190
10421  urn:ngsi-ld:Junction:54206  2022-06-14  15    14.016618  83.174242
10422  urn:ngsi-ld:Junction:54206  2022-06-14  16    13.963437  84.109375
10423  urn:ngsi-ld:Junction:54206  2022-06-14  17    13.676638  81.413793