Welcome to an analysis of student performance
The dataset comes from the following Kaggle page: https://www.kaggle.com/datasets/spscientist/students-performance-in-exams
In [13]:
# Importing necessary libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
In [3]:
df = pd.read_csv("StudentsPerformance.csv")
Data exploration phase
In [4]:
df.head()
Out[4]:
 | gender | race/ethnicity | parental level of education | lunch | test preparation course | math score | reading score | writing score |
---|---|---|---|---|---|---|---|---|
0 | female | group B | bachelor's degree | standard | none | 72 | 72 | 74 |
1 | female | group C | some college | standard | completed | 69 | 90 | 88 |
2 | female | group B | master's degree | standard | none | 90 | 95 | 93 |
3 | male | group A | associate's degree | free/reduced | none | 47 | 57 | 44 |
4 | male | group C | some college | standard | none | 76 | 78 | 75 |
In [5]:
df.describe()
Out[5]:
 | math score | reading score | writing score |
---|---|---|---|
count | 1000.00000 | 1000.000000 | 1000.000000 |
mean | 66.08900 | 69.169000 | 68.054000 |
std | 15.16308 | 14.600192 | 15.195657 |
min | 0.00000 | 17.000000 | 10.000000 |
25% | 57.00000 | 59.000000 | 57.750000 |
50% | 66.00000 | 70.000000 | 69.000000 |
75% | 77.00000 | 79.000000 | 79.000000 |
max | 100.00000 | 100.000000 | 100.000000 |
In [17]:
# Draw a bar chart of the value counts for every column
for col in df.columns:
    df[col].value_counts().plot(kind='bar')
    plt.title(col)
    plt.show()
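The loop above draws a value-count bar chart for every column, including the three score columns, where nearly every distinct score gets its own bar. One possible refinement, shown here only as a minimal sketch, is to separate categorical from numeric columns first and reserve bar charts for the categorical ones. The helper name `split_columns` and the `sample` frame are hypothetical: `sample` just copies the five rows from the `df.head()` output so the snippet runs without the CSV file.

```python
import pandas as pd

# Hypothetical stand-in for df: the five rows shown in the df.head() output.
# In the notebook itself, the full DataFrame from StudentsPerformance.csv
# would be used instead.
sample = pd.DataFrame({
    "gender": ["female", "female", "female", "male", "male"],
    "race/ethnicity": ["group B", "group C", "group B", "group A", "group C"],
    "parental level of education": [
        "bachelor's degree", "some college", "master's degree",
        "associate's degree", "some college",
    ],
    "lunch": ["standard", "standard", "standard", "free/reduced", "standard"],
    "test preparation course": ["none", "completed", "none", "none", "none"],
    "math score": [72, 69, 90, 47, 76],
    "reading score": [72, 90, 95, 57, 78],
    "writing score": [74, 88, 93, 44, 75],
})


def split_columns(frame):
    """Return (categorical, numeric) lists of column names.

    Hypothetical helper, not part of the original notebook.
    """
    numeric = frame.select_dtypes(include="number").columns.tolist()
    categorical = frame.select_dtypes(exclude="number").columns.tolist()
    return categorical, numeric


categorical_cols, numeric_cols = split_columns(sample)

# Value counts are most readable for the categorical columns.
gender_counts = sample["gender"].value_counts()
print(categorical_cols)
print(numeric_cols)
print(gender_counts)
```

With the column lists in hand, the bar-chart loop could iterate over `categorical_cols` only, leaving the score columns to the histograms.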
In [20]:
df.plot(subplots=True, kind='hist')
plt.tight_layout()
plt.show()