Short checklist for a small data science project

1. Checking for null values
2. Checking for duplicates
3. Checking for outliers: In every data science project, it's important to deal with outliers. Do this before filling in missing values. The mean is sensitive to extreme values, so if you impute with the mean while the column still contains outliers, the imputed values will be skewed.
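
The three checks above can be sketched with pandas as follows. This is only an illustration: the toy DataFrame and its column names are made up, and flagging outliers with the 1.5 × IQR rule and then dropping them is one common choice, not necessarily the right one for every dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: one duplicated row, one missing age, one extreme age.
df = pd.DataFrame({
    "age": [25, 27, 27, 30, np.nan, 29, 26, 28, 31, 24, 120],
    "city": ["A", "B", "B", "A", "C", "C", "A", "B", "C", "A", "B"],
})

# 1. Count null values per column.
null_counts = df.isna().sum()

# 2. Count fully duplicated rows, then drop them.
n_duplicates = int(df.duplicated().sum())
df = df.drop_duplicates()

# 3. Flag outliers with the 1.5 * IQR rule (an assumed rule for this sketch).
q1, q3 = df["age"].quantile([0.25, 0.75])
iqr = q3 - q1
is_outlier = (df["age"] < q1 - 1.5 * iqr) | (df["age"] > q3 + 1.5 * iqr)

# Compare the mean before and after removing the outlier.
mean_with_outlier = df["age"].mean()
df_clean = df[~is_outlier].copy()
mean_without_outlier = df_clean["age"].mean()

# Only now fill the missing value, using the mean that the outlier no longer distorts.
df_clean["age"] = df_clean["age"].fillna(mean_without_outlier)

print(null_counts)
print("duplicates:", n_duplicates)
print("mean with outlier:", round(mean_with_outlier, 2))
print("mean without outlier:", round(mean_without_outlier, 2))
```

Comparing the two means shows how far a single extreme value can pull the number that would otherwise be used for imputation.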

4. Exploratory data analysis: This step helps you explore each feature with plots, using seaborn and pandas.

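5. Standardization: If the features are on very different scales, it's important to standardize the data (rescale each feature to zero mean and unit variance) so that no single feature dominates scale-sensitive models.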
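
As a rough illustration of standardization, the sketch below uses scikit-learn's `StandardScaler` on a small made-up array. The feature meanings (age and income) are assumptions for the example only.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical features on very different scales: age in years, income in dollars.
X = np.array(
    [[25, 40_000], [32, 52_000], [47, 95_000], [51, 61_000]],
    dtype=float,
)

# Rescale each column to zero mean and unit variance.
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

col_means = X_scaled.mean(axis=0)
col_stds = X_scaled.std(axis=0)

print(X_scaled)
print("means:", col_means)
print("stds:", col_stds)
```

In a full workflow the scaler would normally be fitted on the training split only and then applied to the test split, so that the test data does not influence the scaling.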

6. Split the data

7. Dimensionality reduction: Always split the data before dimensionality reduction. If you fit the reduction on the full dataset and split afterwards, information from the test set leaks into training, and the accuracy you measure can look better than it really is, which hides overfitting.

8. Train the model

9. Make predictions

10. Evaluate the model

11. Tune the model: Change the hyperparameters to improve performance.
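
The modelling steps at the end of the checklist (splitting, dimensionality reduction, training, predicting, evaluating and tuning) can be sketched end to end with scikit-learn. Everything specific here is an assumption made for illustration: the synthetic dataset, PCA as the reduction technique, logistic regression as the model, and the small hyperparameter grid. Wrapping the steps in a `Pipeline` means the scaler and PCA are fitted on training data only, which is the point of splitting first.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for a real dataset (assumption for this sketch).
X, y = make_classification(
    n_samples=300, n_features=10, n_informative=4, random_state=0
)

# Split first, so the test set never influences scaling or PCA.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Scaling, reduction and model in one pipeline; each step is fitted on training data only.
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("pca", PCA(n_components=5)),
    ("clf", LogisticRegression(max_iter=1000)),
])

# Train, predict, evaluate.
pipe.fit(X_train, y_train)
y_pred = pipe.predict(X_test)
baseline_acc = accuracy_score(y_test, y_pred)

# Tune hyperparameters with cross-validation on the training set.
param_grid = {
    "pca__n_components": [2, 5, 8],
    "clf__C": [0.1, 1.0, 10.0],
}
search = GridSearchCV(pipe, param_grid, cv=5, scoring="accuracy")
search.fit(X_train, y_train)
tuned_acc = accuracy_score(y_test, search.predict(X_test))

print("baseline accuracy:", baseline_acc)
print("best params:", search.best_params_)
print("tuned accuracy:", tuned_acc)
```

Tuning is not guaranteed to raise the test accuracy. The grid search only picks the combination that scored best in cross-validation on the training data.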


Data Scientist | Machine Learning Engineer

Tanvi Punjani
