Overview

Dataset statistics

Number of variables: 12
Number of observations: 891
Missing cells: 866
Missing cells (%): 8.1%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 83.7 KiB
Average record size in memory: 96.1 B
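
The missing-cell figures can be cross-checked against the per-variable sections of this report. The sketch below assumes the percentage is simply missing cells divided by total cells (variables × observations), and takes the per-column missing counts from the Age, Cabin and Embarked sections; the variable names are illustrative only.

```python
# Figures taken from this report; names are illustrative.
n_variables = 12
n_observations = 891

# Missing counts reported in the Age, Cabin and Embarked sections.
missing_per_column = {"Age": 177, "Cabin": 687, "Embarked": 2}

missing_cells = sum(missing_per_column.values())
total_cells = n_variables * n_observations
pct = 100 * missing_cells / total_cells

print(f"{missing_cells} of {total_cells} cells missing ({pct:.1f}%)")
# prints: 866 of 10692 cells missing (8.1%)
```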

Variable types

CAT: 6
NUM: 5
BOOL: 1

Warnings

Ticket has a high cardinality: 681 distinct values [High cardinality]
Cabin has a high cardinality: 147 distinct values [High cardinality]
Age has 177 (19.9%) missing values [Missing]
Cabin has 687 (77.1%) missing values [Missing]
Ticket is uniformly distributed [Uniform]
Cabin is uniformly distributed [Uniform]
PassengerId has unique values [Unique]
Name has unique values [Unique]
SibSp has 608 (68.2%) zeros [Zeros]
Parch has 678 (76.1%) zeros [Zeros]
Fare has 15 (1.7%) zeros [Zeros]

Reproduction

Analysis started: 2021-06-15 06:36:48.929254
Analysis finished: 2021-06-15 06:36:57.804957
Duration: 8.88 seconds
Software version: pandas-profiling v2.9.0
Download configuration: config.yaml

Variables

PassengerId
Real number (ℝ≥0)

UNIQUE

Distinct: 891
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 446
Minimum: 1
Maximum: 891
Zeros: 0
Zeros (%): 0.0%
Memory size: 7.0 KiB

Quantile statistics

Minimum: 1
5-th percentile: 45.5
Q1: 223.5
Median: 446
Q3: 668.5
95-th percentile: 846.5
Maximum: 891
Range: 890
Interquartile range (IQR): 445

Descriptive statistics

Standard deviation: 257.353842
Coefficient of variation (CV): 0.5770265516
Kurtosis: -1.2
Mean: 446
Median Absolute Deviation (MAD): 223
Skewness: 0
Sum: 397386
Variance: 66231
Monotonicity: Strictly increasing
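
These figures are consistent with PassengerId being the integers 1 through 891. The sketch below makes that assumption (it is inferred from the minimum, maximum, distinct count and sum, not stated by the report) and recomputes the sum, mean, sample variance, standard deviation and coefficient of variation (standard deviation divided by mean) with the standard library.

```python
import statistics

# Assumption: PassengerId is exactly 1, 2, ..., 891.
ids = list(range(1, 892))

total = sum(ids)                      # 397386
mean = statistics.mean(ids)           # 446
variance = statistics.variance(ids)   # sample variance (n - 1 denominator): 66231
std = statistics.stdev(ids)           # square root of the sample variance
cv = std / mean                       # coefficient of variation

print(total, mean, variance, round(std, 6), round(cv, 10))
```

The match with the reported variance suggests the report uses the sample (n - 1) form rather than the population form.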
Histogram with fixed size bins (bins=50)
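Value | Count | Frequency (%)
891 | 1 | 0.1%
293 | 1 | 0.1%
304 | 1 | 0.1%
303 | 1 | 0.1%
302 | 1 | 0.1%
301 | 1 | 0.1%
300 | 1 | 0.1%
299 | 1 | 0.1%
298 | 1 | 0.1%
297 | 1 | 0.1%
Other values (881) | 881 | 98.9%

Value | Count | Frequency (%)
1 | 1 | 0.1%
2 | 1 | 0.1%
3 | 1 | 0.1%
4 | 1 | 0.1%
5 | 1 | 0.1%

Value | Count | Frequency (%)
891 | 1 | 0.1%
890 | 1 | 0.1%
889 | 1 | 0.1%
888 | 1 | 0.1%
887 | 1 | 0.1%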

Survived
Boolean

Distinct: 2
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 7.0 KiB

Value | Count | Frequency (%)
0 | 549 | 61.6%
1 | 342 | 38.4%

Pclass
Categorical

Distinct: 3
Distinct (%): 0.3%
Missing: 0
Missing (%): 0.0%
Memory size: 7.0 KiB

Value | Count | Frequency (%)
3 | 491 | 55.1%
1 | 216 | 24.2%
2 | 184 | 20.7%
Frequencies of value counts

Unique

Unique: 0
Unique (%): 0.0%
Histogram of lengths of the category

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Name
Categorical

UNIQUE

Distinct: 891
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Memory size: 7.0 KiB

Value | Count | Frequency (%)
Rogers, Mr. William John | 1 | 0.1%
Goldenberg, Mr. Samuel L | 1 | 0.1%
Chronopoulos, Mr. Apostolos | 1 | 0.1%
Saad, Mr. Khalil | 1 | 0.1%
Robins, Mrs. Alexander A (Grace Charity Laury) | 1 | 0.1%
Sundman, Mr. Johan Julian | 1 | 0.1%
Taylor, Mr. Elmer Zebley | 1 | 0.1%
Robert, Mrs. Edward Scott (Elisabeth Walton McMillan) | 1 | 0.1%
Connolly, Miss. Kate | 1 | 0.1%
Sheerlinck, Mr. Jan Baptist | 1 | 0.1%
Other values (881) | 881 | 98.9%
Frequencies of value counts

Unique

Unique: 891
Unique (%): 100.0%
Histogram of lengths of the category

Length

Max length: 82
Median length: 25
Mean length: 26.96520763
Min length: 12

Sex
Categorical

Distinct: 2
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 7.0 KiB

Value | Count | Frequency (%)
male | 577 | 64.8%
female | 314 | 35.2%
Frequencies of value counts

Unique

Unique: 0
Unique (%): 0.0%
Histogram of lengths of the category

Length

Max length: 6
Median length: 4
Mean length: 4.704826038
Min length: 4

Age
Real number (ℝ≥0)

MISSING

Distinct: 88
Distinct (%): 12.3%
Missing: 177
Missing (%): 19.9%
Infinite: 0
Infinite (%): 0.0%
Mean: 29.69911765
Minimum: 0.42
Maximum: 80
Zeros: 0
Zeros (%): 0.0%
Memory size: 7.0 KiB

Quantile statistics

Minimum: 0.42
5-th percentile: 4
Q1: 20.125
Median: 28
Q3: 38
95-th percentile: 56
Maximum: 80
Range: 79.58
Interquartile range (IQR): 17.875

Descriptive statistics

Standard deviation: 14.52649733
Coefficient of variation (CV): 0.4891221855
Kurtosis: 0.1782741536
Mean: 29.69911765
Median Absolute Deviation (MAD): 9
Skewness: 0.3891077823
Sum: 21205.17
Variance: 211.0191247
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
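Value | Count | Frequency (%)
24 | 30 | 3.4%
22 | 27 | 3.0%
18 | 26 | 2.9%
28 | 25 | 2.8%
19 | 25 | 2.8%
30 | 25 | 2.8%
21 | 24 | 2.7%
25 | 23 | 2.6%
36 | 22 | 2.5%
29 | 20 | 2.2%
Other values (78) | 467 | 52.4%
(Missing) | 177 | 19.9%

Value | Count | Frequency (%)
0.42 | 1 | 0.1%
0.67 | 1 | 0.1%
0.75 | 2 | 0.2%
0.83 | 2 | 0.2%
0.92 | 1 | 0.1%

Value | Count | Frequency (%)
80 | 1 | 0.1%
74 | 1 | 0.1%
71 | 2 | 0.2%
70.5 | 1 | 0.1%
70 | 2 | 0.2%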

SibSp
Real number (ℝ≥0)

ZEROS

Distinct: 7
Distinct (%): 0.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.5230078563
Minimum: 0
Maximum: 8
Zeros: 608
Zeros (%): 68.2%
Memory size: 7.0 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
Median: 0
Q3: 1
95-th percentile: 3
Maximum: 8
Range: 8
Interquartile range (IQR): 1

Descriptive statistics

Standard deviation: 1.102743432
Coefficient of variation (CV): 2.108464374
Kurtosis: 17.88041973
Mean: 0.5230078563
Median Absolute Deviation (MAD): 0
Skewness: 3.695351727
Sum: 466
Variance: 1.216043077
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=7)
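Value | Count | Frequency (%)
0 | 608 | 68.2%
1 | 209 | 23.5%
2 | 28 | 3.1%
4 | 18 | 2.0%
3 | 16 | 1.8%
8 | 7 | 0.8%
5 | 5 | 0.6%

Value | Count | Frequency (%)
0 | 608 | 68.2%
1 | 209 | 23.5%
2 | 28 | 3.1%
3 | 16 | 1.8%
4 | 18 | 2.0%

Value | Count | Frequency (%)
8 | 7 | 0.8%
5 | 5 | 0.6%
4 | 18 | 2.0%
3 | 16 | 1.8%
2 | 28 | 3.1%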

Parch
Real number (ℝ≥0)

ZEROS

Distinct: 7
Distinct (%): 0.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.3815937149
Minimum: 0
Maximum: 6
Zeros: 678
Zeros (%): 76.1%
Memory size: 7.0 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
Median: 0
Q3: 0
95-th percentile: 2
Maximum: 6
Range: 6
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 0.8060572211
Coefficient of variation (CV): 2.112344071
Kurtosis: 9.778125179
Mean: 0.3815937149
Median Absolute Deviation (MAD): 0
Skewness: 2.749117047
Sum: 340
Variance: 0.6497282437
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=7)
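Value | Count | Frequency (%)
0 | 678 | 76.1%
1 | 118 | 13.2%
2 | 80 | 9.0%
5 | 5 | 0.6%
3 | 5 | 0.6%
4 | 4 | 0.4%
6 | 1 | 0.1%

Value | Count | Frequency (%)
0 | 678 | 76.1%
1 | 118 | 13.2%
2 | 80 | 9.0%
3 | 5 | 0.6%
4 | 4 | 0.4%

Value | Count | Frequency (%)
6 | 1 | 0.1%
5 | 5 | 0.6%
4 | 4 | 0.4%
3 | 5 | 0.6%
2 | 80 | 9.0%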

Ticket
Categorical

HIGH CARDINALITY
UNIFORM

Distinct: 681
Distinct (%): 76.4%
Missing: 0
Missing (%): 0.0%
Memory size: 7.0 KiB

Value | Count | Frequency (%)
347082 | 7 | 0.8%
CA. 2343 | 7 | 0.8%
1601 | 7 | 0.8%
CA 2144 | 6 | 0.7%
347088 | 6 | 0.7%
3101295 | 6 | 0.7%
S.O.C. 14879 | 5 | 0.6%
382652 | 5 | 0.6%
2666 | 4 | 0.4%
LINE | 4 | 0.4%
Other values (671) | 834 | 93.6%
Frequencies of value counts

Unique

Unique: 547
Unique (%): 61.4%
Histogram of lengths of the category

Length

Max length: 18
Median length: 6
Mean length: 6.750841751
Min length: 3

Fare
Real number (ℝ≥0)

ZEROS

Distinct: 248
Distinct (%): 27.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 32.20420797
Minimum: 0
Maximum: 512.3292
Zeros: 15
Zeros (%): 1.7%
Memory size: 7.0 KiB

Quantile statistics

Minimum: 0
5-th percentile: 7.225
Q1: 7.9104
Median: 14.4542
Q3: 31
95-th percentile: 112.07915
Maximum: 512.3292
Range: 512.3292
Interquartile range (IQR): 23.0896

Descriptive statistics

Standard deviation: 49.6934286
Coefficient of variation (CV): 1.543072528
Kurtosis: 33.39814088
Mean: 32.20420797
Median Absolute Deviation (MAD): 6.9042
Skewness: 4.78731652
Sum: 28693.9493
Variance: 2469.436846
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%)
8.05 | 43 | 4.8%
13 | 42 | 4.7%
7.8958 | 38 | 4.3%
7.75 | 34 | 3.8%
26 | 31 | 3.5%
10.5 | 24 | 2.7%
7.925 | 18 | 2.0%
7.775 | 16 | 1.8%
26.55 | 15 | 1.7%
0 | 15 | 1.7%
Other values (238) | 615 | 69.0%

Value | Count | Frequency (%)
0 | 15 | 1.7%
4.0125 | 1 | 0.1%
5 | 1 | 0.1%
6.2375 | 1 | 0.1%
6.4375 | 1 | 0.1%

Value | Count | Frequency (%)
512.3292 | 3 | 0.3%
263 | 4 | 0.4%
262.375 | 2 | 0.2%
247.5208 | 2 | 0.2%
227.525 | 4 | 0.4%

Cabin
Categorical

HIGH CARDINALITY
MISSING
UNIFORM

Distinct: 147
Distinct (%): 72.1%
Missing: 687
Missing (%): 77.1%
Memory size: 7.0 KiB

Value | Count | Frequency (%)
G6 | 4 | 0.4%
B96 B98 | 4 | 0.4%
C23 C25 C27 | 4 | 0.4%
E101 | 3 | 0.3%
F33 | 3 | 0.3%
C22 C26 | 3 | 0.3%
F2 | 3 | 0.3%
D | 3 | 0.3%
B35 | 2 | 0.2%
F4 | 2 | 0.2%
Other values (137) | 173 | 19.4%
(Missing) | 687 | 77.1%
Frequencies of value counts

Unique

Unique: 101
Unique (%): 49.5%
Histogram of lengths of the category

Length

Max length: 15
Median length: 3
Mean length: 3.134680135
Min length: 1

Embarked
Categorical

Distinct: 3
Distinct (%): 0.3%
Missing: 2
Missing (%): 0.2%
Memory size: 7.0 KiB

Value | Count | Frequency (%)
S | 644 | 72.3%
C | 168 | 18.9%
Q | 77 | 8.6%
(Missing) | 2 | 0.2%
Frequencies of value counts

Unique

Unique: 0
Unique (%): 0.0%
Histogram of lengths of the category

Length

Max length: 3
Median length: 1
Mean length: 1.004489338
Min length: 1

Interactions

[25 interaction plots; images not reproduced here]

Correlations


Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, implying that for a linear function the angle to the x-axis does not affect r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
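
As a minimal sketch of that calculation in plain Python (the function name `pearson_r` is illustrative, and this is not necessarily how the report itself computes the coefficient):

```python
from math import sqrt

def pearson_r(xs, ys):
    """Covariance of xs and ys divided by the product of their standard deviations."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # The 1/(n-1) factors in the covariance and the standard deviations
    # cancel, so raw sums of products and squares are enough.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("r is undefined when a variable is constant")
    return cov / (spread_x * spread_y)

xs = [1, 2, 3, 4, 5]
print(pearson_r(xs, [2 * x + 7 for x in xs]))   # increasing linear relation, close to +1
print(pearson_r(xs, [-3 * x for x in xs]))      # decreasing linear relation, close to -1
```

The two calls illustrate the invariance noted above: changing the slope or intercept of a linear relationship changes only the sign of r, not its magnitude.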

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation and +1 indicating total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
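
A minimal sketch of that procedure, assuming tied values receive the average of the ranks they span (a common convention, not confirmed by this report); the helper names are illustrative:

```python
from math import sqrt

def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of values tied with order[i].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def spearman_rho(xs, ys):
    """Pearson correlation applied to the rank variables of xs and ys."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    if sx == 0 or sy == 0:
        raise ValueError("rho is undefined when a variable is constant")
    return cov / (sx * sy)

# A monotonic but nonlinear relationship still gives rho close to 1.
print(spearman_rho([1, 2, 3, 4, 5], [1, 8, 27, 64, 125]))
```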

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation and +1 indicating total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the discordant pairs divided by the total number of pairs.
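
The pair-counting description above corresponds to the variant usually called tau-a. The sketch below implements that variant directly; library implementations often compute tau-b instead, which adjusts the denominator for ties, so values may differ on tied data. The function name is illustrative.

```python
from itertools import combinations

def kendall_tau_a(xs, ys):
    """(concordant pairs - discordant pairs) / total number of pairs."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        direction = (x1 - x2) * (y1 - y2)
        if direction > 0:
            concordant += 1    # both variables move in the same direction
        elif direction < 0:
            discordant += 1    # the variables move in opposite directions
        # direction == 0: a tie in x or y, counted as neither
    n = len(xs)
    return (concordant - discordant) / (n * (n - 1) / 2)

# Two concordant pairs and one discordant pair out of three: tau = 1/3.
print(kendall_tau_a([1, 2, 3], [1, 3, 2]))
```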

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available.

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been proved to be biased, even for large samples. We use a bias-corrected measure proposed by Bergsma in 2013.
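
A minimal sketch of a bias-corrected Cramér's V, assuming the correction takes the form given in Bergsma's paper: the squared phi coefficient is reduced by (k-1)(r-1)/(n-1) and the row and column counts are shrunk accordingly. The function name and the list-of-rows input are hypothetical, the example tables are made-up counts, and the report's own implementation may differ (for example in whether a continuity correction is applied to the chi-squared statistic).

```python
from math import sqrt

def cramers_v_corrected(table):
    """Bias-corrected Cramér's V for a contingency table given as rows of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    r, k = len(row_totals), len(col_totals)
    if n < 2 or r < 2 or k < 2:
        return 0.0

    # Pearson chi-squared statistic, without continuity correction.
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            if expected > 0:
                chi2 += (observed - expected) ** 2 / expected

    phi2 = chi2 / n
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    denom = min(r_corr - 1, k_corr - 1)
    if denom <= 0:
        return 0.0
    return sqrt(phi2_corr / denom)

print(cramers_v_corrected([[50, 0], [0, 50]]))    # perfect association, close to 1
print(cramers_v_corrected([[25, 25], [25, 25]]))  # independence, 0.0
```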

Missing values

[4 missing-value plots; images not reproduced here]

Sample

First rows

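Index | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked
0 | 1 | 0 | 3 | Braund, Mr. Owen Harris | male | 22.0 | 1 | 0 | A/5 21171 | 7.2500 | NaN | S
1 | 2 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Thayer) | female | 38.0 | 1 | 0 | PC 17599 | 71.2833 | C85 | C
2 | 3 | 1 | 3 | Heikkinen, Miss. Laina | female | 26.0 | 0 | 0 | STON/O2. 3101282 | 7.9250 | NaN | S
3 | 4 | 1 | 1 | Futrelle, Mrs. Jacques Heath (Lily May Peel) | female | 35.0 | 1 | 0 | 113803 | 53.1000 | C123 | S
4 | 5 | 0 | 3 | Allen, Mr. William Henry | male | 35.0 | 0 | 0 | 373450 | 8.0500 | NaN | S
5 | 6 | 0 | 3 | Moran, Mr. James | male | NaN | 0 | 0 | 330877 | 8.4583 | NaN | Q
6 | 7 | 0 | 1 | McCarthy, Mr. Timothy J | male | 54.0 | 0 | 0 | 17463 | 51.8625 | E46 | S
7 | 8 | 0 | 3 | Palsson, Master. Gosta Leonard | male | 2.0 | 3 | 1 | 349909 | 21.0750 | NaN | S
8 | 9 | 1 | 3 | Johnson, Mrs. Oscar W (Elisabeth Vilhelmina Berg) | female | 27.0 | 0 | 2 | 347742 | 11.1333 | NaN | S
9 | 10 | 1 | 2 | Nasser, Mrs. Nicholas (Adele Achem) | female | 14.0 | 1 | 0 | 237736 | 30.0708 | NaN | C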

Last rows

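Index | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked
881 | 882 | 0 | 3 | Markun, Mr. Johann | male | 33.0 | 0 | 0 | 349257 | 7.8958 | NaN | S
882 | 883 | 0 | 3 | Dahlberg, Miss. Gerda Ulrika | female | 22.0 | 0 | 0 | 7552 | 10.5167 | NaN | S
883 | 884 | 0 | 2 | Banfield, Mr. Frederick James | male | 28.0 | 0 | 0 | C.A./SOTON 34068 | 10.5000 | NaN | S
884 | 885 | 0 | 3 | Sutehall, Mr. Henry Jr | male | 25.0 | 0 | 0 | SOTON/OQ 392076 | 7.0500 | NaN | S
885 | 886 | 0 | 3 | Rice, Mrs. William (Margaret Norton) | female | 39.0 | 0 | 5 | 382652 | 29.1250 | NaN | Q
886 | 887 | 0 | 2 | Montvila, Rev. Juozas | male | 27.0 | 0 | 0 | 211536 | 13.0000 | NaN | S
887 | 888 | 1 | 1 | Graham, Miss. Margaret Edith | female | 19.0 | 0 | 0 | 112053 | 30.0000 | B42 | S
888 | 889 | 0 | 3 | Johnston, Miss. Catherine Helen "Carrie" | female | NaN | 1 | 2 | W./C. 6607 | 23.4500 | NaN | S
889 | 890 | 1 | 1 | Behr, Mr. Karl Howell | male | 26.0 | 0 | 0 | 111369 | 30.0000 | C148 | C
890 | 891 | 0 | 3 | Dooley, Mr. Patrick | male | 32.0 | 0 | 0 | 370376 | 7.7500 | NaN | Q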