This project focuses on Exploratory Data Analysis (EDA) using two independent data sets: one of SAT scores by state, and one of drug use by age group. EDA is an essential part of the data science pipeline. Skipping EDA before modeling is likely to lead to poor models and faulty conclusions. For each data set, we explore relationships and correlations between features, and consider what hypotheses or inferences the exploration supports. This project also demonstrates:
Use of Pandas DataFrames
Descriptive Statistics
Data Cleaning / Preprocessing
Feature Extraction
Use of common Python plotting libraries (Matplotlib, Seaborn, and Pandas' built-in plotting functions)
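
As a rough illustration of how a few of these steps fit together, here is a minimal sketch in Pandas. The data frame, its column names (`participation_rate`, `verbal`, `math`), and all values are hypothetical and invented for this example; they are not taken from the project's data sets, and the project's own code may be organized quite differently.

```python
import pandas as pd

# Hypothetical toy data: the values are made up for illustration
# and are not taken from the project's SAT data set.
raw = pd.DataFrame({
    "state": ["AA", "BB", "CC", "DD", "DD"],
    "participation_rate": [80.0, 10.0, 50.0, None, None],
    "verbal": [500, 580, 530, 520, 520],
    "math": [510, 590, 525, 515, 515],
})

# Data cleaning: drop exact duplicate rows, then rows with missing values.
clean = raw.drop_duplicates().dropna().reset_index(drop=True)

# Feature extraction: derive a new column from existing ones.
clean["total"] = clean["verbal"] + clean["math"]

# Descriptive statistics (count, mean, std, quartiles) for numeric columns.
summary = clean.describe()

# Pairwise Pearson correlations between the numeric features.
corr = clean[["participation_rate", "verbal", "math", "total"]].corr()

print(summary)
print(corr.round(2))
```

With real data, the correlation table is a natural input to a heatmap or a scatter-plot matrix, which is where the plotting libraries listed above come in.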
Check out the full project on GitHub!
Tags: EDA, Statistics, Python