
Short checklist for a small data science project
1. Check for null values
2. Check for duplicates
3. Check for outliers: In every data science project, it's important to deal with outliers. Do this before filling in missing values. If you impute missing values with the mean and the column has outliers, the mean is pulled toward them and the imputed values will be distorted.
4. Exploratory data analysis: This step explores each feature with plots, using seaborn and pandas.
5. Split the data: Always split the data before standardization and dimensionality reduction. If you split later, information from the test set leaks into the fitted transformations, so the measured accuracy can look better than it really is and hide overfitting.
6. Standardization: If the features are on very different scales, it's important to standardize the data. Fit the scaler on the training set only, then apply it to the test set.
7. Dimensionality reduction: Fit the reduction on the training set only, then apply it to the test set.
8. Train the model
9. Make predictions
10. Evaluate the model
11. Tune the model to improve accuracy: Change the hyperparameters to improve performance.
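
The checklist above can be sketched end to end roughly as follows. This is a minimal illustration under stated assumptions, not a definitive workflow: the dataset (`load_wine`, bundled with scikit-learn), the 1.5 * IQR outlier rule, PCA as the reduction method, logistic regression as the model, and the grid values are all choices made for the example and do not come from the checklist itself. Plotting is left out to keep the sketch short. The data is split before scaling and dimensionality reduction, and both are wrapped in a `Pipeline` so they are fitted on training data only.

```python
import pandas as pd
from sklearn.datasets import load_wine
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Example data only (assumption): a small dataset shipped with scikit-learn.
df = load_wine(as_frame=True).frame  # feature columns plus a "target" column

# Null values and duplicates.
null_counts = df.isnull().sum()
df = df.drop_duplicates()

# Outliers: count values outside 1.5 * IQR per feature (one common rule of thumb).
features = df.drop(columns="target")
q1, q3 = features.quantile(0.25), features.quantile(0.75)
iqr = q3 - q1
outlier_mask = (features < q1 - 1.5 * iqr) | (features > q3 + 1.5 * iqr)
outlier_counts = outlier_mask.sum()

# Quick numeric summary as a stand-in for the plotting step.
summary = df.describe()

# Split before any scaling or dimensionality reduction.
X_train, X_test, y_train, y_test = train_test_split(
    features, df["target"], test_size=0.25, random_state=0, stratify=df["target"]
)

# The pipeline fits the scaler and PCA on whatever data it is trained on,
# so the test set never influences them.
pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("pca", PCA(n_components=5)),
    ("model", LogisticRegression(max_iter=1000)),
])

# Train, predict, evaluate.
pipeline.fit(X_train, y_train)
predictions = pipeline.predict(X_test)
baseline_accuracy = accuracy_score(y_test, predictions)

# Tune hyperparameters with cross-validation on the training set only.
param_grid = {"pca__n_components": [2, 5, 8], "model__C": [0.1, 1.0, 10.0]}
search = GridSearchCV(pipeline, param_grid, cv=5)
search.fit(X_train, y_train)
tuned_accuracy = accuracy_score(y_test, search.predict(X_test))

print("Outliers per feature:", outlier_counts.to_dict())
print("Baseline accuracy:", baseline_accuracy)
print("Best parameters:", search.best_params_)
print("Tuned accuracy:", tuned_accuracy)
```

Because the scaler and PCA live inside the pipeline, `GridSearchCV` refits them within each cross-validation fold as well, which keeps the tuning step free of the same leakage the checklist warns about.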