Overview

Dataset statistics

Number of variables: 10
Number of observations: 20640
Missing cells: 207
Missing cells (%): 0.1%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 1.6 MiB
Average record size in memory: 80.0 B

Variable types

Numeric: 9
Categorical: 1

Warnings

longitude is highly correlated with latitude (High correlation)
latitude is highly correlated with longitude (High correlation)
total_rooms is highly correlated with total_bedrooms and 1 other field (High correlation)
total_bedrooms is highly correlated with total_rooms and 1 other field (High correlation)
population is highly correlated with households (High correlation)
households is highly correlated with total_rooms and 2 other fields (High correlation)
total_bedrooms has 207 (1.0%) missing values (Missing)
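
The dataset-level missing share (0.1%) and the column-level share for total_bedrooms (1.0%) come from the same 207 missing values divided by different denominators. A minimal Python sketch of that arithmetic, using only the counts reported above (the variable names are illustrative and not part of the report):

```python
# Counts taken from the dataset statistics and warnings above.
n_observations = 20640
n_variables = 10
missing_cells = 207  # the warnings attribute all 207 to total_bedrooms

# Dataset-level share: missing cells over every cell in the table.
missing_cells_pct = 100 * missing_cells / (n_observations * n_variables)

# Column-level share: missing values over the rows of a single column.
total_bedrooms_missing_pct = 100 * missing_cells / n_observations

print(f"Missing cells (%): {missing_cells_pct:.1f}%")
print(f"total_bedrooms missing (%): {total_bedrooms_missing_pct:.1f}%")
```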

Reproduction

Analysis started: 2021-03-16 06:53:41.379928
Analysis finished: 2021-03-16 06:53:57.967040
Duration: 16.59 seconds
Software version: pandas-profiling v2.11.0
Download configuration: config.yaml

Variables

longitude
Real number (ℝ)

HIGH CORRELATION

Distinct: 844
Distinct (%): 4.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: -119.5697045
Minimum: -124.35
Maximum: -114.31
Zeros: 0
Zeros (%): 0.0%
Memory size: 161.4 KiB

Quantile statistics

Minimum: -124.35
5-th percentile: -122.47
Q1: -121.8
Median: -118.49
Q3: -118.01
95-th percentile: -117.08
Maximum: -114.31
Range: 10.04
Interquartile range (IQR): 3.79

Descriptive statistics

Standard deviation: 2.003531724
Coefficient of variation (CV): -0.01675618195
Kurtosis: -1.330152366
Mean: -119.5697045
Median Absolute Deviation (MAD): 1.28
Skewness: -0.297801208
Sum: -2467918.7
Variance: 4.014139367
Monotonicity: Not monotonic
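
Several of the longitude statistics above can be derived from one another. The Python sketch below recomputes the range, IQR, coefficient of variation and variance from the reported values. It assumes the usual definitions (CV as standard deviation divided by mean, variance as the squared standard deviation), which are consistent with the reported figures but are not stated in the report itself.

```python
# Values copied from the longitude statistics above.
minimum, maximum = -124.35, -114.31
q1, q3 = -121.8, -118.01
mean = -119.5697045
std = 2.003531724

value_range = maximum - minimum  # report lists Range 10.04
iqr = q3 - q1                    # report lists IQR 3.79
cv = std / mean                  # report lists CV -0.01675618195
variance = std ** 2              # report lists Variance 4.014139367

print(round(value_range, 2), round(iqr, 2))
print(round(cv, 8), round(variance, 6))
```

The CV is negative here only because the mean longitude is negative; its magnitude is what describes the relative spread.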
Histogram with fixed size bins (bins=50)

Value | Count | Frequency (%)
-118.31 | 162 | 0.8%
-118.3 | 160 | 0.8%
-118.29 | 148 | 0.7%
-118.27 | 144 | 0.7%
-118.32 | 142 | 0.7%
-118.28 | 141 | 0.7%
-118.35 | 140 | 0.7%
-118.36 | 138 | 0.7%
-118.19 | 135 | 0.7%
-118.25 | 128 | 0.6%
Other values (834) | 19202 | 93.0%

Value | Count | Frequency (%)
-124.35 | 1 | < 0.1%
-124.3 | 2 | < 0.1%
-124.27 | 1 | < 0.1%
-124.26 | 1 | < 0.1%
-124.25 | 1 | < 0.1%

Value | Count | Frequency (%)
-114.31 | 1 | < 0.1%
-114.47 | 1 | < 0.1%
-114.49 | 1 | < 0.1%
-114.55 | 1 | < 0.1%
-114.56 | 1 | < 0.1%

latitude
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 862
Distinct (%): 4.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 35.63186143
Minimum: 32.54
Maximum: 41.95
Zeros: 0
Zeros (%): 0.0%
Memory size: 161.4 KiB

Quantile statistics

Minimum: 32.54
5-th percentile: 32.82
Q1: 33.93
Median: 34.26
Q3: 37.71
95-th percentile: 38.96
Maximum: 41.95
Range: 9.41
Interquartile range (IQR): 3.78

Descriptive statistics

Standard deviation: 2.135952397
Coefficient of variation (CV): 0.05994501302
Kurtosis: -1.117759781
Mean: 35.63186143
Median Absolute Deviation (MAD): 1.23
Skewness: 0.4659530037
Sum: 735441.62
Variance: 4.562292644
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Value | Count | Frequency (%)
34.06 | 244 | 1.2%
34.05 | 236 | 1.1%
34.08 | 234 | 1.1%
34.07 | 231 | 1.1%
34.04 | 221 | 1.1%
34.09 | 212 | 1.0%
34.02 | 208 | 1.0%
34.1 | 203 | 1.0%
34.03 | 193 | 0.9%
33.93 | 181 | 0.9%
Other values (852) | 18477 | 89.5%

Value | Count | Frequency (%)
32.54 | 1 | < 0.1%
32.55 | 3 | < 0.1%
32.56 | 10 | < 0.1%
32.57 | 18 | 0.1%
32.58 | 26 | 0.1%

Value | Count | Frequency (%)
41.95 | 2 | < 0.1%
41.92 | 1 | < 0.1%
41.88 | 1 | < 0.1%
41.86 | 3 | < 0.1%
41.84 | 1 | < 0.1%

housing_median_age
Real number (ℝ≥0)

Distinct: 52
Distinct (%): 0.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 28.63948643
Minimum: 1
Maximum: 52
Zeros: 0
Zeros (%): 0.0%
Memory size: 161.4 KiB

Quantile statistics

Minimum: 1
5-th percentile: 8
Q1: 18
Median: 29
Q3: 37
95-th percentile: 52
Maximum: 52
Range: 51
Interquartile range (IQR): 19

Descriptive statistics

Standard deviation: 12.58555761
Coefficient of variation (CV): 0.4394477408
Kurtosis: -0.8006288536
Mean: 28.63948643
Median Absolute Deviation (MAD): 10
Skewness: 0.0603306376
Sum: 591119
Variance: 158.3962604
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Value | Count | Frequency (%)
52 | 1273 | 6.2%
36 | 862 | 4.2%
35 | 824 | 4.0%
16 | 771 | 3.7%
17 | 698 | 3.4%
34 | 689 | 3.3%
26 | 619 | 3.0%
33 | 615 | 3.0%
18 | 570 | 2.8%
25 | 566 | 2.7%
Other values (42) | 13153 | 63.7%

Value | Count | Frequency (%)
1 | 4 | < 0.1%
2 | 58 | 0.3%
3 | 62 | 0.3%
4 | 191 | 0.9%
5 | 244 | 1.2%

Value | Count | Frequency (%)
52 | 1273 | 6.2%
51 | 48 | 0.2%
50 | 136 | 0.7%
49 | 134 | 0.6%
48 | 177 | 0.9%

total_rooms
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 5926
Distinct (%): 28.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2635.763081
Minimum: 2
Maximum: 39320
Zeros: 0
Zeros (%): 0.0%
Memory size: 161.4 KiB

Quantile statistics

Minimum: 2
5-th percentile: 620.95
Q1: 1447.75
Median: 2127
Q3: 3148
95-th percentile: 6213.2
Maximum: 39320
Range: 39318
Interquartile range (IQR): 1700.25

Descriptive statistics

Standard deviation: 2181.615252
Coefficient of variation (CV): 0.8276977802
Kurtosis: 32.630927
Mean: 2635.763081
Median Absolute Deviation (MAD): 797
Skewness: 4.147343451
Sum: 54402150
Variance: 4759445.106
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Value | Count | Frequency (%)
1527 | 18 | 0.1%
1613 | 17 | 0.1%
1582 | 17 | 0.1%
2127 | 16 | 0.1%
1703 | 15 | 0.1%
1471 | 15 | 0.1%
2053 | 15 | 0.1%
1722 | 15 | 0.1%
1717 | 15 | 0.1%
1607 | 15 | 0.1%
Other values (5916) | 20482 | 99.2%

Value | Count | Frequency (%)
2 | 1 | < 0.1%
6 | 1 | < 0.1%
8 | 1 | < 0.1%
11 | 1 | < 0.1%
12 | 1 | < 0.1%

Value | Count | Frequency (%)
39320 | 1 | < 0.1%
37937 | 1 | < 0.1%
32627 | 1 | < 0.1%
32054 | 1 | < 0.1%
30450 | 1 | < 0.1%

total_bedrooms
Real number (ℝ≥0)

HIGH CORRELATION
MISSING

Distinct: 1923
Distinct (%): 9.4%
Missing: 207
Missing (%): 1.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 537.8705525
Minimum: 1
Maximum: 6445
Zeros: 0
Zeros (%): 0.0%
Memory size: 161.4 KiB

Quantile statistics

Minimum: 1
5-th percentile: 137
Q1: 296
Median: 435
Q3: 647
95-th percentile: 1275.4
Maximum: 6445
Range: 6444
Interquartile range (IQR): 351

Descriptive statistics

Standard deviation: 421.3850701
Coefficient of variation (CV): 0.7834321252
Kurtosis: 21.98557506
Mean: 537.8705525
Median Absolute Deviation (MAD): 162
Skewness: 3.459546332
Sum: 10990309
Variance: 177565.3773
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Value | Count | Frequency (%)
280 | 55 | 0.3%
331 | 51 | 0.2%
345 | 50 | 0.2%
343 | 49 | 0.2%
393 | 49 | 0.2%
394 | 48 | 0.2%
348 | 48 | 0.2%
328 | 48 | 0.2%
272 | 47 | 0.2%
309 | 47 | 0.2%
Other values (1913) | 19941 | 96.6%
(Missing) | 207 | 1.0%

Value | Count | Frequency (%)
1 | 1 | < 0.1%
2 | 2 | < 0.1%
3 | 5 | < 0.1%
4 | 7 | < 0.1%
5 | 6 | < 0.1%

Value | Count | Frequency (%)
6445 | 1 | < 0.1%
6210 | 1 | < 0.1%
5471 | 1 | < 0.1%
5419 | 1 | < 0.1%
5290 | 1 | < 0.1%

population
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 3888
Distinct (%): 18.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1425.476744
Minimum: 3
Maximum: 35682
Zeros: 0
Zeros (%): 0.0%
Memory size: 161.4 KiB

Quantile statistics

Minimum: 3
5-th percentile: 348
Q1: 787
Median: 1166
Q3: 1725
95-th percentile: 3288
Maximum: 35682
Range: 35679
Interquartile range (IQR): 938

Descriptive statistics

Standard deviation: 1132.462122
Coefficient of variation (CV): 0.7944444737
Kurtosis: 73.55311639
Mean: 1425.476744
Median Absolute Deviation (MAD): 440
Skewness: 4.935858227
Sum: 29421840
Variance: 1282470.457
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Value | Count | Frequency (%)
891 | 25 | 0.1%
850 | 24 | 0.1%
1052 | 24 | 0.1%
1227 | 24 | 0.1%
761 | 24 | 0.1%
825 | 23 | 0.1%
999 | 22 | 0.1%
1005 | 22 | 0.1%
782 | 22 | 0.1%
753 | 21 | 0.1%
Other values (3878) | 20409 | 98.9%

Value | Count | Frequency (%)
3 | 1 | < 0.1%
5 | 1 | < 0.1%
6 | 1 | < 0.1%
8 | 4 | < 0.1%
9 | 2 | < 0.1%

Value | Count | Frequency (%)
35682 | 1 | < 0.1%
28566 | 1 | < 0.1%
16305 | 1 | < 0.1%
16122 | 1 | < 0.1%
15507 | 1 | < 0.1%

households
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 1815
Distinct (%): 8.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 499.5396802
Minimum: 1
Maximum: 6082
Zeros: 0
Zeros (%): 0.0%
Memory size: 161.4 KiB

Quantile statistics

Minimum: 1
5-th percentile: 125
Q1: 280
Median: 409
Q3: 605
95-th percentile: 1162
Maximum: 6082
Range: 6081
Interquartile range (IQR): 325

Descriptive statistics

Standard deviation: 382.3297528
Coefficient of variation (CV): 0.7653641301
Kurtosis: 22.05798806
Mean: 499.5396802
Median Absolute Deviation (MAD): 151
Skewness: 3.410437712
Sum: 10310499
Variance: 146176.0399
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Value | Count | Frequency (%)
306 | 57 | 0.3%
335 | 56 | 0.3%
386 | 56 | 0.3%
282 | 55 | 0.3%
429 | 54 | 0.3%
375 | 53 | 0.3%
297 | 51 | 0.2%
284 | 51 | 0.2%
380 | 50 | 0.2%
362 | 50 | 0.2%
Other values (1805) | 20107 | 97.4%

Value | Count | Frequency (%)
1 | 1 | < 0.1%
2 | 3 | < 0.1%
3 | 4 | < 0.1%
4 | 4 | < 0.1%
5 | 7 | < 0.1%

Value | Count | Frequency (%)
6082 | 1 | < 0.1%
5358 | 1 | < 0.1%
5189 | 1 | < 0.1%
5050 | 1 | < 0.1%
4930 | 1 | < 0.1%

median_income
Real number (ℝ≥0)

Distinct: 12928
Distinct (%): 62.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3.870671003
Minimum: 0.4999
Maximum: 15.0001
Zeros: 0
Zeros (%): 0.0%
Memory size: 161.4 KiB

Quantile statistics

Minimum: 0.4999
5-th percentile: 1.60057
Q1: 2.5634
Median: 3.5348
Q3: 4.74325
95-th percentile: 7.300305
Maximum: 15.0001
Range: 14.5002
Interquartile range (IQR): 2.17985

Descriptive statistics

Standard deviation: 1.899821718
Coefficient of variation (CV): 0.4908249026
Kurtosis: 4.952524102
Mean: 3.870671003
Median Absolute Deviation (MAD): 1.0642
Skewness: 1.646656702
Sum: 79890.6495
Variance: 3.60932256
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Value | Count | Frequency (%)
3.125 | 49 | 0.2%
15.0001 | 49 | 0.2%
2.875 | 46 | 0.2%
2.625 | 44 | 0.2%
4.125 | 44 | 0.2%
3.875 | 41 | 0.2%
3.375 | 38 | 0.2%
3 | 38 | 0.2%
4 | 37 | 0.2%
3.625 | 37 | 0.2%
Other values (12918) | 20217 | 98.0%

Value | Count | Frequency (%)
0.4999 | 12 | 0.1%
0.536 | 10 | < 0.1%
0.5495 | 1 | < 0.1%
0.6433 | 1 | < 0.1%
0.6775 | 1 | < 0.1%

Value | Count | Frequency (%)
15.0001 | 49 | 0.2%
15 | 2 | < 0.1%
14.9009 | 1 | < 0.1%
14.5833 | 1 | < 0.1%
14.4219 | 1 | < 0.1%

median_house_value
Real number (ℝ≥0)

Distinct: 3842
Distinct (%): 18.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 206855.8169
Minimum: 14999
Maximum: 500001
Zeros: 0
Zeros (%): 0.0%
Memory size: 161.4 KiB

Quantile statistics

Minimum: 14999
5-th percentile: 66200
Q1: 119600
Median: 179700
Q3: 264725
95-th percentile: 489810
Maximum: 500001
Range: 485002
Interquartile range (IQR): 145125

Descriptive statistics

Standard deviation: 115395.6159
Coefficient of variation (CV): 0.55785531
Kurtosis: 0.3278702429
Mean: 206855.8169
Median Absolute Deviation (MAD): 68400
Skewness: 0.9777632739
Sum: 4269504061
Variance: 1.331614816 × 10^10
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Value | Count | Frequency (%)
500001 | 965 | 4.7%
137500 | 122 | 0.6%
162500 | 117 | 0.6%
112500 | 103 | 0.5%
187500 | 93 | 0.5%
225000 | 92 | 0.4%
350000 | 79 | 0.4%
87500 | 78 | 0.4%
275000 | 65 | 0.3%
150000 | 64 | 0.3%
Other values (3832) | 18862 | 91.4%

Value | Count | Frequency (%)
14999 | 4 | < 0.1%
17500 | 1 | < 0.1%
22500 | 4 | < 0.1%
25000 | 1 | < 0.1%
26600 | 1 | < 0.1%

Value | Count | Frequency (%)
500001 | 965 | 4.7%
500000 | 27 | 0.1%
499100 | 1 | < 0.1%
499000 | 1 | < 0.1%
498800 | 1 | < 0.1%

ocean_proximity
Categorical

Distinct: 5
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 161.4 KiB

Value | Count
<1H OCEAN | 9136
INLAND | 6551
NEAR OCEAN | 2658
NEAR BAY | 2290
ISLAND | 5

Length

Max length: 10
Median length: 9
Mean length: 8.064922481
Min length: 6
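
The length statistics above follow directly from the five category labels and their counts. The Python sketch below recomputes them from the counts listed in this section; the dictionary and variable names are illustrative and not part of the report.

```python
# Category labels and counts from the ocean_proximity summary.
counts = {
    "<1H OCEAN": 9136,
    "INLAND": 6551,
    "NEAR OCEAN": 2658,
    "NEAR BAY": 2290,
    "ISLAND": 5,
}

n_rows = sum(counts.values())
# Every row contributes the length of its label to the character total.
total_characters = sum(len(label) * n for label, n in counts.items())
mean_length = total_characters / n_rows

lengths = [len(label) for label in counts]
print(n_rows, total_characters, round(mean_length, 9))
print(min(lengths), max(lengths))
```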

Characters and Unicode

Total characters: 166460
Distinct characters: 16
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
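
As a rough illustration of how such character properties can be read, the Python sketch below uses the standard-library `unicodedata.category` function to tally general categories over the five ocean_proximity labels, weighted by their counts. It is a sketch of the idea, not the profiler's own implementation, and the names are illustrative.

```python
import unicodedata
from collections import Counter

# Category labels and counts from the ocean_proximity summary.
counts = {
    "<1H OCEAN": 9136,
    "INLAND": 6551,
    "NEAR OCEAN": 2658,
    "NEAR BAY": 2290,
    "ISLAND": 5,
}

# Tally Unicode general categories, weighting each character by row count.
category_totals = Counter()
for label, n in counts.items():
    for ch in label:
        category_totals[unicodedata.category(ch)] += n

for code, total in category_totals.most_common():
    print(code, total)
```

The two-letter codes are the Unicode general category abbreviations: `Lu` is Uppercase Letter, `Zs` is Space Separator, `Sm` is Math Symbol and `Nd` is Decimal Number.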

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: NEAR BAY
2nd row: NEAR BAY
3rd row: NEAR BAY
4th row: NEAR BAY
5th row: NEAR BAY

Value | Count | Frequency (%)
<1H OCEAN | 9136 | 44.3%
INLAND | 6551 | 31.7%
NEAR OCEAN | 2658 | 12.9%
NEAR BAY | 2290 | 11.1%
ISLAND | 5 | < 0.1%

Histogram of lengths of the category

Value | Count | Frequency (%)
ocean | 11794 | 34.0%
1h | 9136 | 26.3%
inland | 6551 | 18.9%
near | 4948 | 14.2%
bay | 2290 | 6.6%
island | 5 | < 0.1%

Most occurring characters

Value | Count | Frequency (%)
N | 29849 | 17.9%
A | 25588 | 15.4%
E | 16742 | 10.1%
(space) | 14084 | 8.5%
O | 11794 | 7.1%
C | 11794 | 7.1%
< | 9136 | 5.5%
1 | 9136 | 5.5%
H | 9136 | 5.5%
I | 6556 | 3.9%
Other values (6) | 22645 | 13.6%

Most occurring categories

Value | Count | Frequency (%)
Uppercase Letter | 134104 | 80.6%
Space Separator | 14084 | 8.5%
Math Symbol | 9136 | 5.5%
Decimal Number | 9136 | 5.5%

Most frequent character per category

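Category: Uppercase Letter
Value | Count | Frequency (%)
N | 29849 | 22.3%
A | 25588 | 19.1%
E | 16742 | 12.5%
O | 11794 | 8.8%
C | 11794 | 8.8%
H | 9136 | 6.8%
I | 6556 | 4.9%
L | 6556 | 4.9%
D | 6556 | 4.9%
R | 4948 | 3.7%
Other values (3) | 4585 | 3.4%

Category: Space Separator
Value | Count | Frequency (%)
(space) | 14084 | 100.0%

Category: Math Symbol
Value | Count | Frequency (%)
< | 9136 | 100.0%

Category: Decimal Number
Value | Count | Frequency (%)
1 | 9136 | 100.0%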

Most occurring scripts

Value | Count | Frequency (%)
Latin | 134104 | 80.6%
Common | 32356 | 19.4%

Most frequent character per script

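Script: Latin
Value | Count | Frequency (%)
N | 29849 | 22.3%
A | 25588 | 19.1%
E | 16742 | 12.5%
O | 11794 | 8.8%
C | 11794 | 8.8%
H | 9136 | 6.8%
I | 6556 | 4.9%
L | 6556 | 4.9%
D | 6556 | 4.9%
R | 4948 | 3.7%
Other values (3) | 4585 | 3.4%

Script: Common
Value | Count | Frequency (%)
(space) | 14084 | 43.5%
< | 9136 | 28.2%
1 | 9136 | 28.2%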

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 166460 | 100.0%

Most frequent character per block

Block: ASCII
Value | Count | Frequency (%)
N | 29849 | 17.9%
A | 25588 | 15.4%
E | 16742 | 10.1%
(space) | 14084 | 8.5%
O | 11794 | 7.1%
C | 11794 | 7.1%
< | 9136 | 5.5%
1 | 9136 | 5.5%
H | 9136 | 5.5%
I | 6556 | 3.9%
Other values (6) | 22645 | 13.6%

Interactions

[72 interaction plots (Matplotlib v3.3.4); images not reproduced]

Correlations


Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation, and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, implying that for a linear function the angle to the x-axis does not affect r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
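
The calculation described above can be sketched in a few lines of plain Python. This is a minimal illustration rather than the profiler's implementation; the function name `pearson_r` and the sample data are made up for the example.

```python
import math


def pearson_r(xs, ys):
    """Covariance of xs and ys divided by the product of their standard deviations."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally long sequences with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    # The 1/n (or 1/(n-1)) factors cancel between numerator and denominator.
    # A constant input has zero variance, so this division raises ZeroDivisionError.
    return cov / math.sqrt(var_x * var_y)


x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 4.0, 6.0, 8.0]
r_linear = pearson_r(x, y)                           # perfectly linear, increasing
r_rescaled = pearson_r([10 * v + 3 for v in x], y)   # shift and positive rescale of x
r_negative = pearson_r(x, [-v for v in y])           # perfectly linear, decreasing
print(r_linear, r_rescaled, r_negative)
```

Shifting and positively rescaling x leaves r unchanged, which is the location and scale invariance mentioned above.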

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation, and +1 indicating total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
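
A minimal Python sketch of that recipe: rank each variable, then apply the covariance-over-standard-deviations formula to the ranks. Giving tied values the average of their positions is an assumption of this sketch, and the helper names are illustrative.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions (an assumption)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson_r(xs, ys):
    """Covariance divided by the product of the standard deviations."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def spearman_rho(xs, ys):
    """Pearson's r computed on the rank variables."""
    return pearson_r(average_ranks(xs), average_ranks(ys))


x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]   # monotonic but not linear
rho = spearman_rho(x, y)
r = pearson_r(x, y)
print(rho, r)
```

On this monotonic, nonlinear example ρ reaches its maximum while Pearson's r stays below 1, which is the difference the description above points to.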

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation, and +1 indicating total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
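
The pair-counting definition above can be sketched directly in Python. This is the plain form without tie correction (often called tau-a); library implementations commonly apply a tie correction, so their results can differ when the data contain ties. The function name is illustrative.

```python
from itertools import combinations


def kendall_tau_a(xs, ys):
    """(concordant - discordant) / total pairs, with no correction for ties."""
    pairs = list(combinations(zip(xs, ys), 2))
    if not pairs:
        raise ValueError("need at least 2 observations")
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in pairs:
        product = (x1 - x2) * (y1 - y2)
        if product > 0:
            concordant += 1   # both variables move in the same direction
        elif product < 0:
            discordant += 1   # the variables move in opposite directions
        # Tied pairs (product == 0) count toward the total only.
    return (concordant - discordant) / len(pairs)


tau_same = kendall_tau_a([1, 2, 3], [1, 2, 3])
tau_reversed = kendall_tau_a([1, 2, 3], [3, 2, 1])
tau_mixed = kendall_tau_a([1, 2, 3], [1, 3, 2])   # 2 concordant, 1 discordant
print(tau_same, tau_reversed, tau_mixed)
```

The double loop over pairs is quadratic in the number of observations, which is fine for illustration but slow for a dataset of 20640 rows.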

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available.

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.
The dendrogram allows you to more fully correlate variable completion, revealing trends deeper than the pairwise ones visible in the correlation heatmap.

Sample

First rows

index | longitude | latitude | housing_median_age | total_rooms | total_bedrooms | population | households | median_income | median_house_value | ocean_proximity
0 | -122.23 | 37.88 | 41.0 | 880.0 | 129.0 | 322.0 | 126.0 | 8.3252 | 452600.0 | NEAR BAY
1 | -122.22 | 37.86 | 21.0 | 7099.0 | 1106.0 | 2401.0 | 1138.0 | 8.3014 | 358500.0 | NEAR BAY
2 | -122.24 | 37.85 | 52.0 | 1467.0 | 190.0 | 496.0 | 177.0 | 7.2574 | 352100.0 | NEAR BAY
3 | -122.25 | 37.85 | 52.0 | 1274.0 | 235.0 | 558.0 | 219.0 | 5.6431 | 341300.0 | NEAR BAY
4 | -122.25 | 37.85 | 52.0 | 1627.0 | 280.0 | 565.0 | 259.0 | 3.8462 | 342200.0 | NEAR BAY
5 | -122.25 | 37.85 | 52.0 | 919.0 | 213.0 | 413.0 | 193.0 | 4.0368 | 269700.0 | NEAR BAY
6 | -122.25 | 37.84 | 52.0 | 2535.0 | 489.0 | 1094.0 | 514.0 | 3.6591 | 299200.0 | NEAR BAY
7 | -122.25 | 37.84 | 52.0 | 3104.0 | 687.0 | 1157.0 | 647.0 | 3.1200 | 241400.0 | NEAR BAY
8 | -122.26 | 37.84 | 42.0 | 2555.0 | 665.0 | 1206.0 | 595.0 | 2.0804 | 226700.0 | NEAR BAY
9 | -122.25 | 37.84 | 52.0 | 3549.0 | 707.0 | 1551.0 | 714.0 | 3.6912 | 261100.0 | NEAR BAY

Last rows

index | longitude | latitude | housing_median_age | total_rooms | total_bedrooms | population | households | median_income | median_house_value | ocean_proximity
20630 | -121.32 | 39.29 | 11.0 | 2640.0 | 505.0 | 1257.0 | 445.0 | 3.5673 | 112000.0 | INLAND
20631 | -121.40 | 39.33 | 15.0 | 2655.0 | 493.0 | 1200.0 | 432.0 | 3.5179 | 107200.0 | INLAND
20632 | -121.45 | 39.26 | 15.0 | 2319.0 | 416.0 | 1047.0 | 385.0 | 3.1250 | 115600.0 | INLAND
20633 | -121.53 | 39.19 | 27.0 | 2080.0 | 412.0 | 1082.0 | 382.0 | 2.5495 | 98300.0 | INLAND
20634 | -121.56 | 39.27 | 28.0 | 2332.0 | 395.0 | 1041.0 | 344.0 | 3.7125 | 116800.0 | INLAND
20635 | -121.09 | 39.48 | 25.0 | 1665.0 | 374.0 | 845.0 | 330.0 | 1.5603 | 78100.0 | INLAND
20636 | -121.21 | 39.49 | 18.0 | 697.0 | 150.0 | 356.0 | 114.0 | 2.5568 | 77100.0 | INLAND
20637 | -121.22 | 39.43 | 17.0 | 2254.0 | 485.0 | 1007.0 | 433.0 | 1.7000 | 92300.0 | INLAND
20638 | -121.32 | 39.43 | 18.0 | 1860.0 | 409.0 | 741.0 | 349.0 | 1.8672 | 84700.0 | INLAND
20639 | -121.24 | 39.37 | 16.0 | 2785.0 | 616.0 | 1387.0 | 530.0 | 2.3886 | 89400.0 | INLAND