"Exploratory data analysis, EDA is an approach to analyzing data sets to summarize their main characteristics, often with visual methods. A statistical model can be used or not, but primarily exploratory data analysis is for seeing what the data can tell us beyond the formal modelling or hypothesis testing task."
If EDA is done poorly, the problems it should have caught, such as missing values or text-valued categorical columns that a model cannot use directly, resurface in the later steps of building a machine learning model. Done well, it makes every step that follows more efficient.
import pandas as pd
import seaborn as sns
# Titanic training set (train.csv)
url = 'https://raw.githubusercontent.com/Data-Science-East-AFrica/Exploratory-Data-Analysis-Using-Python/main/train.csv'
train = pd.read_csv(url)
sns.heatmap(train.isnull(), yticklabels=False, cbar=False, cmap='viridis')  # missing cells (True) show up in yellow
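The heatmap shows where values are missing; the same information as numbers is often easier to act on. A minimal sketch, assuming a small hypothetical frame that mimics a few Titanic columns (the values are made up, not taken from train.csv), counts and ranks the missing entries per column:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for `train`; not the real Titanic values
toy = pd.DataFrame({
    "Age": [22.0, np.nan, 26.0, 35.0],
    "Cabin": [np.nan, "C85", np.nan, np.nan],
    "Embarked": ["S", "C", "S", np.nan],
})

# Number and fraction of missing values per column, most-missing first
missing = pd.DataFrame({
    "n_missing": toy.isnull().sum(),
    "frac_missing": toy.isnull().mean(),
}).sort_values("n_missing", ascending=False)
print(missing)
```

On the real `train`, the same summary puts `Cabin` at the top as the most incomplete column.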
train.drop('Cabin', axis=1, inplace=True)  # Cabin is missing for most passengers, so drop the whole column
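Dropping `Cabin` by name works for this dataset. If you would rather have the decision follow from the data, one option is to drop every column whose missing fraction exceeds a cutoff. A sketch, where the helper name `drop_sparse_columns` and the 0.5 cutoff are illustrative assumptions, not part of the original analysis:

```python
import numpy as np
import pandas as pd

def drop_sparse_columns(df: pd.DataFrame, max_missing: float = 0.5) -> pd.DataFrame:
    """Hypothetical helper: drop columns whose fraction of missing values exceeds max_missing."""
    frac_missing = df.isnull().mean()
    to_drop = frac_missing[frac_missing > max_missing].index
    # Return a new frame instead of modifying df in place
    return df.drop(columns=to_drop)

# Hypothetical toy data: Cabin is missing in 3 of 4 rows, Age in 1 of 4
toy = pd.DataFrame({
    "Age": [22.0, np.nan, 26.0, 35.0],
    "Cabin": [np.nan, "C85", np.nan, np.nan],
})
cleaned = drop_sparse_columns(toy)
print(list(cleaned.columns))  # ['Age']
```

The cutoff is a judgment call; a column that is only partly missing, like `Age`, is usually worth keeping and filling instead.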
sns.catplot(x='Survived', col='Sex', kind='count', data=train)  # one panel of survival counts per sex
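The count plot draws one bar per `Survived` value in a separate panel for each sex. The numbers behind it form a contingency table, which `pd.crosstab` builds directly; `normalize='index'` turns the counts into survival rates per sex. The passengers below are hypothetical toy rows, not real records:

```python
import pandas as pd

# Hypothetical toy passengers (0 = did not survive, 1 = survived)
toy = pd.DataFrame({
    "Sex": ["male", "male", "male", "female", "female"],
    "Survived": [0, 0, 1, 1, 1],
})

# Counts behind the count plot: rows are sexes, columns are Survived values
counts = pd.crosstab(toy["Sex"], toy["Survived"])
# Same table with each row divided by its total, i.e. survival rate per sex
rates = pd.crosstab(toy["Sex"], toy["Survived"], normalize="index")
print(counts)
print(rates)
```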
sns.countplot(x='Survived', hue='Pclass', data=train)  # survival counts split by passenger class
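Because `Survived` is coded 0/1, its mean within a group is that group's survival rate, which condenses the bars of the class plot into one number per class. A sketch on hypothetical toy passengers:

```python
import pandas as pd

# Hypothetical toy passengers; Pclass 1 is first class, 3 is third class
toy = pd.DataFrame({
    "Pclass": [1, 1, 2, 3, 3, 3],
    "Survived": [1, 1, 0, 0, 1, 0],
})

# Mean of a 0/1 column = fraction of ones = survival rate per class
survival_by_class = toy.groupby("Pclass")["Survived"].mean()
print(survival_by_class)
```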
sns.countplot(x='SibSp', data=train)  # passengers by number of siblings/spouses aboard
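A count plot of a single discrete column is a bar chart of its `value_counts()`. Sorting by the index keeps the counts in numeric order of `SibSp` instead of frequency order. The values here are hypothetical:

```python
import pandas as pd

# Hypothetical toy values: number of siblings/spouses aboard per passenger
sibsp = pd.Series([0, 1, 0, 0, 2, 1, 0, 4], name="SibSp")

# Counts per distinct value, ordered 0, 1, 2, ... rather than by frequency
counts = sibsp.value_counts().sort_index()
print(counts)
```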
sns.boxplot(x='Pclass',y='Age',data=train,palette='winter')
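The box plot compares the spread of `Age` across passenger classes. One common follow-up, offered here as an assumption rather than a step from the original analysis, is to fill missing ages with the median age of the passenger's class. A sketch on hypothetical toy rows:

```python
import numpy as np
import pandas as pd

# Hypothetical toy passengers with some missing ages
toy = pd.DataFrame({
    "Pclass": [1, 1, 1, 3, 3, 3],
    "Age": [40.0, 38.0, np.nan, 22.0, np.nan, 26.0],
})

# Median age within each class (the line inside each box of the box plot); NaN is skipped
class_median = toy.groupby("Pclass")["Age"].median()
print(class_median)

# transform("median") repeats each class median on every row of that class,
# so fillna can align it row by row with the missing ages
toy["Age"] = toy["Age"].fillna(toy.groupby("Pclass")["Age"].transform("median"))
print(toy)
```

Using the class median rather than the overall median keeps the filled values consistent with the age differences the box plot shows between classes.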
# One-hot encode the categorical columns; drop_first=True drops one redundant level each
embark = pd.get_dummies(train['Embarked'], drop_first=True)
sex = pd.get_dummies(train['Sex'], drop_first=True)
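`get_dummies` creates one indicator column per category, and `drop_first=True` removes the first category in sorted order, because it is implied whenever all the remaining indicators are 0. On hypothetical toy values, `Sex` collapses to a single `male` column and `Embarked` (C, Q, S) to `Q` and `S`. The `dtype=int` argument is added here only so the indicators print as 0/1, since recent pandas versions return booleans by default:

```python
import pandas as pd

# Hypothetical toy values for the two categorical columns
toy = pd.DataFrame({
    "Sex": ["male", "female", "female", "male"],
    "Embarked": ["S", "C", "Q", "S"],
})

# drop_first=True removes the first (alphabetical) level: 'female' and 'C'
sex = pd.get_dummies(toy["Sex"], drop_first=True, dtype=int)
embark = pd.get_dummies(toy["Embarked"], drop_first=True, dtype=int)
print(sex)
print(embark)
```

A passenger who embarked at C is then encoded as `Q = 0, S = 0`.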
train.drop(['Sex','Embarked','Name','Ticket'],axis=1,inplace=True)
train=pd.concat([train,sex,embark],axis=1)
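After the concat, every remaining column should be numeric, which is what the model-building steps that follow EDA generally need. A quick check, sketched on a hypothetical toy frame with a subset of the Titanic columns and made-up values, runs the same encode, drop, and concat steps and lists any column that is still non-numeric:

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype

# Hypothetical toy frame with a subset of the Titanic columns (made-up values)
toy = pd.DataFrame({
    "Survived": [0, 1, 1],
    "Pclass": [3, 1, 3],
    "Name": ["A", "B", "C"],
    "Sex": ["male", "female", "female"],
    "Age": [22.0, 38.0, 26.0],
    "Ticket": ["T1", "T2", "T3"],
    "Embarked": ["S", "C", "Q"],
})

# Encode, drop the text columns, then concatenate the indicators back on
sex = pd.get_dummies(toy["Sex"], drop_first=True)
embark = pd.get_dummies(toy["Embarked"], drop_first=True)
toy = toy.drop(["Sex", "Embarked", "Name", "Ticket"], axis=1)
toy = pd.concat([toy, sex, embark], axis=1)

# Every column should now be numeric (bool indicators count as numeric here)
non_numeric = [c for c in toy.columns if not is_numeric_dtype(toy[c])]
print(list(toy.columns))  # ['Survived', 'Pclass', 'Age', 'male', 'Q', 'S']
print(non_numeric)        # []
```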