
[Data Analysis and Visualization] Use Python to perform visual analysis on student performance (with source code)

Posted on 2023-06-06 09:36



The following is a visual analysis of students' grades and classroom performance data.

1: Import modules

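    import warnings

    import pandas as pd
    import numpy as np
    import seaborn as sns
    import matplotlib.pyplot as plt

    # Use the SimHei font so that Chinese column names and labels render correctly
    plt.rcParams['font.sans-serif'] = ['SimHei']
    plt.rcParams['font.serif'] = ['SimHei']
    # Hide library warnings (e.g. deprecation notices) to keep the output readable
    warnings.filterwarnings('ignore')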

2: Load the data and display the first four rows

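    from matplotlib.font_manager import FontProperties

    # Load SimHei from the Windows font directory (adjust the path on other systems)
    myfont = FontProperties(fname=r'C:\Windows\Fonts\SimHei.ttf', size=12)
    # sns.set() replaces the font settings made above with seaborn's defaults,
    # so the Chinese font has to be passed again here
    sns.set(font=myfont.get_name())
    # Raw string so that the backslashes in the Windows path are not read as escape sequences
    df = pd.read_csv(r'.\data\StudentPerformance.csv')
    df.head(4)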

The columns have the following meanings:

gender: student's gender

NationalITy: nationality

PlaceofBirth: place of birth

StageID: school level (lowerlevel, MiddleSchool, HighSchool)

GradeID: grade (year)

SectionID: class

Topic: subject

Semester: semester

Relation: the parent responsible for the student (guardian)

raisedhands: number of times the student raised a hand in class

VisITedResources: number of times the student viewed the online courseware

AnnouncementsView: number of times the student viewed announcements

Discussion: number of times the student took part in class discussions

ParentAnsweringSurvey: whether the parents filled out the school's questionnaire

ParentschoolSatisfaction: parents' satisfaction with the school

StudentAbsenceDays: number of days the student was absent

Class: grade level (L, M, H)
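To see which of these columns are numeric (the activity counts that the bar, box, and correlation plots work with) and which are categorical codes, `select_dtypes` is one option. A minimal sketch, assuming a small hypothetical two-row sample built with some of the original column names; the values are made up, not taken from the dataset:

```python
import pandas as pd

# Hypothetical two-row sample using some of the original column names (values are made up)
sample = pd.DataFrame({
    'gender': ['M', 'F'],
    'Topic': ['IT', 'Math'],
    'raisedhands': [15, 70],
    'VisITedResources': [16, 80],
    'AnnouncementsView': [2, 30],
    'Discussion': [20, 40],
    'StudentAbsenceDays': ['Under-7', 'Above-7'],
    'Class': ['M', 'H'],
})

# Numeric columns: the activity counts that can be averaged and correlated
numeric_cols = sample.select_dtypes(include='number').columns.tolist()
# Everything else is a categorical code used for grouping or counting
categorical_cols = sample.select_dtypes(exclude='number').columns.tolist()

print(numeric_cols)      # ['raisedhands', 'VisITedResources', 'AnnouncementsView', 'Discussion']
print(categorical_cols)  # ['gender', 'Topic', 'StudentAbsenceDays', 'Class']
```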

3: Data visualization analysis

Next, rename the columns to Chinese

    # Map each English column name to its Chinese name
    df.rename(columns={'gender':'性别', 'NationalITy':'国籍', 'PlaceofBirth':'出生地',
                       'StageID':'学段', 'GradeID':'年级', 'SectionID':'班级', 'Topic':'科目',
                       'Semester':'学期', 'Relation':'监管人', 'raisedhands':'举手次数',
                       'VisITedResources':'浏览课件次数', 'AnnouncementsView':'浏览公告次数',
                       'Discussion':'讨论次数', 'ParentAnsweringSurvey':'父母问卷',
                       'ParentschoolSatisfaction':'家长满意度', 'StudentAbsenceDays':'缺勤次数',
                       'Class':'成绩'}, inplace=True)
    df.columns
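Note that `DataFrame.rename` silently ignores keys that are not existing columns, so a misspelled key (easy to do with names like 'NationalITy') would simply leave that column in English. A small sketch on a hypothetical two-column frame showing how `errors='raise'` catches this; the typo 'Clas' is deliberate:

```python
import pandas as pd

# Hypothetical frame with two of the original column names
df = pd.DataFrame({'gender': ['M', 'F'], 'Class': ['L', 'H']})
mapping_with_typo = {'gender': '性别', 'Clas': '成绩'}  # 'Clas' is a deliberate typo

# Default behaviour: the unknown key is ignored and 'Class' stays in English
renamed = df.rename(columns=mapping_with_typo)
print(renamed.columns.tolist())  # ['性别', 'Class']

# errors='raise' turns the unknown key into a KeyError instead
raised_on_typo = False
try:
    df.rename(columns=mapping_with_typo, errors='raise')
except KeyError as exc:
    raised_on_typo = True
    print('rename failed:', exc)

# With correct keys the strict version succeeds
df = df.rename(columns={'gender': '性别', 'Class': '成绩'}, errors='raise')
print(df.columns.tolist())  # ['性别', '成绩']
```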

Then convert the values of the school-stage (学段), gender (性别), and semester (学期) columns to Chinese

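    # School-stage codes do not clash with other columns, so replace them across the whole DataFrame
    df.replace({'lowerlevel':'小学', 'MiddleSchool':'中学', 'HighSchool':'高中'}, inplace=True)
    # 'F' means female in 性别 but fall in 学期, so these are replaced column by column.
    # Assign the result back: inplace=True on a single column is chained assignment,
    # which does not update df under pandas copy-on-write
    df['性别'] = df['性别'].replace({'M':'男', 'F':'女'})
    df['学期'] = df['学期'].replace({'S':'春季', 'F':'秋季'})
    df.head(4)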
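The values are replaced column by column because the same code means different things in different columns. As an alternative sketch (not the author's code), `DataFrame.replace` also accepts a nested `{column: {old: new}}` dict that scopes each mapping to one column; the rows below are hypothetical:

```python
import pandas as pd

# Hypothetical rows: 'F' means female in 性别 but fall in 学期
raw = pd.DataFrame({
    '性别': ['M', 'F'],
    '学期': ['F', 'S'],
    '学段': ['lowerlevel', 'HighSchool'],
})

# A flat mapping is applied to every column, so the fall semester also becomes 女
wrong = raw.replace({'M': '男', 'F': '女'})
print(wrong['学期'].tolist())  # ['女', 'S']

# A nested {column: {old: new}} mapping keeps each translation inside its own column
df = raw.replace({
    '性别': {'M': '男', 'F': '女'},
    '学期': {'S': '春季', 'F': '秋季'},
    '学段': {'lowerlevel': '小学', 'MiddleSchool': '中学', 'HighSchool': '高中'},
})
print(df)
```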

 Check for missing values

df.isnull().sum()
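`isnull()` marks each missing cell as True, and `sum()` then counts the Trues per column. A small sketch on a hypothetical frame with two deliberately missing cells:

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one missing cell in two of the three columns
df = pd.DataFrame({
    '举手次数': [15, np.nan, 70],
    '讨论次数': [20, 25, 40],
    '家长满意度': ['Good', 'Bad', None],
})

# isnull() gives a True/False frame; sum() counts the Trues column by column
missing_per_column = df.isnull().sum()
print(missing_per_column)

# One overall number is a quick completeness check
total_missing = int(missing_per_column.sum())
print(total_missing)  # 2
```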

View data statistics
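The code for this step is not shown in the post. One likely choice is `df.describe()`; the sketch below assumes that, runs it on a hypothetical four-row sample, and also summarizes the text columns with `exclude=['number']`:

```python
import pandas as pd

# Hypothetical four-row sample: two activity counts and the grade label
df = pd.DataFrame({
    '举手次数': [10, 20, 30, 40],
    '讨论次数': [5, 15, 25, 35],
    '成绩': ['L', 'M', 'M', 'H'],
})

# Numeric columns: count, mean, std, min, 25%/50%/75% quantiles, max
numeric_summary = df.describe()
print(numeric_summary)

# Non-numeric columns: count, number of distinct values, most frequent value and its count
categorical_summary = df.describe(exclude=['number'])
print(categorical_summary)
```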

 

 Then draw a bar chart of the number of students at each grade level

sns.countplot(x = '成绩', order = ['L', 'M', 'H'], data = df, linewidth=2,edgecolor=sns.color_palette("dark",4))
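Each countplot bar is simply the number of rows in that category. The same numbers can be obtained with `value_counts`, reindexed into the plot's L, M, H order; a sketch with hypothetical grade labels:

```python
import pandas as pd

# Hypothetical grade labels
df = pd.DataFrame({'成绩': ['M', 'L', 'H', 'M', 'M', 'L']})

# Rows per category in the same L, M, H order the plot uses; fill_value covers absent levels
counts = df['成绩'].value_counts().reindex(['L', 'M', 'H'], fill_value=0)
print(counts)
```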

 Then draw a bar chart of counts by gender

sns.countplot(x = '性别', order = ['女', '男'],data = df)

 Draw a bar chart of counts by subject

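    # sns.set() resets the style to its default, so pass style='whitegrid' here
    # instead of calling sns.set_style() first (that call would be overridden)
    sns.set(style='whitegrid', rc={'figure.figsize':(16,8)}, font=myfont.get_name(), font_scale=1.5)
    sns.countplot(x='科目', data=df)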

 Draw a bar chart of the counts of each grade level by subject
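The code for this chart is not shown. Presumably it mirrors the gender-by-grade chart, i.e. `sns.countplot(x='科目', hue='成绩', data=df, hue_order=['L','M','H'])`, though that call is an assumption. The counts behind such a chart can be tabulated with `pd.crosstab`; a sketch on hypothetical rows:

```python
import pandas as pd

# Hypothetical subject/grade pairs
df = pd.DataFrame({
    '科目': ['IT', 'IT', 'Math', 'Math', 'Math', 'Arabic'],
    '成绩': ['L', 'M', 'H', 'M', 'H', 'L'],
})

# One row per subject, one column per grade level: each cell is the height of one bar
# in a countplot with x='科目' and hue='成绩'
table = pd.crosstab(df['科目'], df['成绩']).reindex(columns=['L', 'M', 'H'], fill_value=0)
print(table)
```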

Plot a bar chart of counts by gender and grade level

sns.countplot(x = '性别', hue = '成绩',data = df, order = ['女', '男'], hue_order = ['L', 'M', 'H'])
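Raw counts are hard to compare when the groups differ in size. A hedged sketch (not part of the original analysis) that turns gender-by-grade counts into within-gender shares with `pd.crosstab(..., normalize='index')`, on hypothetical rows:

```python
import pandas as pd

# Hypothetical rows: three female and four male students
df = pd.DataFrame({
    '性别': ['女', '女', '女', '男', '男', '男', '男'],
    '成绩': ['H', 'H', 'M', 'L', 'L', 'M', 'H'],
})

# normalize='index' divides each row by its total, so each gender's shares sum to 1
shares = pd.crosstab(df['性别'], df['成绩'], normalize='index').reindex(columns=['L', 'M', 'H'])
print(shares.round(2))
```

Swapping `df['班级']` in for `df['性别']` gives the per-class shares that the class-by-grade chart compares visually.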

View the grade distribution by class

    sns.countplot(x='班级', hue='成绩', data=df, hue_order=['L','M','H'])
    # Although each class has few students, no class stands out with a notably higher
    # share of high (H) grades, so this feature can be dropped

 Analyze how the four activity measures relate to grades

    # How the four in-class and after-class activity measures relate to grades
    fig, axes = plt.subplots(2, 2, figsize=(14, 10))
    sns.barplot(x='成绩', y='浏览课件次数', data=df, order=['L','M','H'], ax=axes[0,0])
    sns.barplot(x='成绩', y='浏览公告次数', data=df, order=['L','M','H'], ax=axes[0,1])
    sns.barplot(x='成绩', y='举手次数', data=df, order=['L','M','H'], ax=axes[1,0])
    sns.barplot(x='成绩', y='讨论次数', data=df, order=['L','M','H'], ax=axes[1,1])
    # By default sns.barplot shows the mean of each group
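Because `sns.barplot` plots the group mean by default, each bar height equals a `groupby` mean. A sketch on hypothetical values; note that barplot also draws a bootstrapped 95% confidence interval as the error bar, which this sketch does not reproduce:

```python
import pandas as pd

# Hypothetical activity counts for six students
df = pd.DataFrame({
    '成绩': ['L', 'L', 'M', 'M', 'H', 'H'],
    '举手次数': [5, 15, 40, 50, 80, 90],
    '讨论次数': [10, 20, 30, 50, 60, 80],
})

# Mean per grade level, reordered to L, M, H: these are the bar heights barplot draws
means = df.groupby('成绩')[['举手次数', '讨论次数']].mean().reindex(['L', 'M', 'H'])
print(means)
```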

 Analyze the discussion counts of students at each grade level

    # Distribution of discussion counts at each grade level
    sns.set(rc={'figure.figsize':(8,6)}, font=myfont.get_name(), font_scale=1.5)
    sns.boxplot(x='成绩', y='讨论次数', data=df, order=['L','M','H'])
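A box plot summarizes each group by its quartiles: the box spans the 25th to 75th percentiles, the line inside is the median, and (with matplotlib's default whis=1.5) points more than 1.5 times the IQR beyond the box are drawn individually as outliers. A sketch computing those numbers with pandas on hypothetical discussion counts, checking only the upper fence:

```python
import pandas as pd

# Hypothetical discussion counts for two grade levels
df = pd.DataFrame({
    '成绩': ['L'] * 5 + ['H'] * 5,
    '讨论次数': [10, 12, 15, 20, 90, 40, 50, 60, 70, 80],
})

# Quartiles per group: the box edges (25%, 75%) and the median line (50%)
q = df.groupby('成绩')['讨论次数'].quantile([0.25, 0.5, 0.75]).unstack()
q['IQR'] = q[0.75] - q[0.25]
# Upper fence: values above Q3 + 1.5 * IQR are drawn as individual points
q['upper_fence'] = q[0.75] + 1.5 * q['IQR']
print(q)

# Rows beyond their own group's upper fence
outliers = df[df['讨论次数'] > df['成绩'].map(q['upper_fence'])]
print(outliers)
```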

 Analyze the relationship between the number of times students raised their hands and the number of discussions they took part in

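    # Relationship between hands raised and discussions, fitted with a 4th-order polynomial
    # fig, axes = plt.subplots(2, 1, figsize=(10, 10))
    sns.regplot(x='举手次数', y='讨论次数', order=4, data=df)  # add ax=axes[0] when using the subplots above
    # sns.regplot(x='浏览公告次数', y='浏览课件次数', order=4, data=df, ax=axes[1])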
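With `order=4`, `regplot` fits a degree-4 polynomial by least squares instead of a straight line; seaborn does this with `numpy.polyfit`. A sketch of the same kind of fit on hypothetical data generated from a known quartic, so the recovered coefficients can be checked:

```python
import numpy as np

# Hypothetical data generated from a known quartic so the fit can be checked
x = np.linspace(0, 100, 21)
true_coeffs = [-1e-7, 1e-4, -0.01, 0.5, 2.0]   # highest power first, as np.polyval expects
y = np.polyval(true_coeffs, x)

# Least-squares fit of a degree-4 polynomial, the kind of curve regplot(order=4) draws
coeffs = np.polyfit(x, y, deg=4)
fitted = np.polyval(coeffs, x)
print(np.round(coeffs, 8))
```

On real, noisy data a high-order fit can bend sharply near the ends of the range, so the curve is best read as a rough trend.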

 Compute the correlations among courseware views, hands raised, announcement views, and discussions

    # Correlation matrix (Pearson by default)
    corr = df[['浏览课件次数','举手次数','浏览公告次数','讨论次数']].corr()
    corr
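`DataFrame.corr()` computes Pearson's r by default: the covariance of two columns divided by the product of their standard deviations. A sketch that checks one entry by hand; `pearson` is a hypothetical helper and the data are made up:

```python
import numpy as np
import pandas as pd

# Hypothetical activity counts for five students
df = pd.DataFrame({
    '浏览课件次数': [10, 40, 60, 80, 90],
    '举手次数': [5, 30, 50, 70, 95],
    '浏览公告次数': [2, 20, 15, 40, 60],
    '讨论次数': [30, 10, 50, 20, 40],
})

def pearson(a, b):
    """Hypothetical helper: Pearson r = covariance / (std of a * std of b)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    da, db = a - a.mean(), b - b.mean()
    return (da * db).sum() / np.sqrt((da ** 2).sum() * (db ** 2).sum())

corr = df.corr()  # method='pearson' is the default
manual = pearson(df['浏览课件次数'], df['举手次数'])
print(round(manual, 4), round(corr.loc['浏览课件次数', '举手次数'], 4))
```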

 Finally, visualize the correlation matrix with a heat map

    # Visualize the correlation matrix
    sns.heatmap(corr, xticklabels=corr.columns, yticklabels=corr.columns)
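Because the matrix is symmetric and its diagonal is always 1, only the strict upper triangle carries information. A sketch (hypothetical data, not the author's code) that lists each pair once, sorted from the strongest positive to the most negative correlation:

```python
import numpy as np
import pandas as pd

# Hypothetical activity counts with the same four columns as the correlation matrix
df = pd.DataFrame({
    '浏览课件次数': [10, 40, 60, 80, 90],
    '举手次数': [5, 30, 50, 70, 95],
    '浏览公告次数': [2, 20, 15, 40, 60],
    '讨论次数': [30, 10, 50, 20, 40],
})
corr = df.corr()

# Strict upper triangle (k=1 excludes the diagonal of 1s): each pair appears once
upper = np.triu(np.ones(corr.shape, dtype=bool), k=1)
# where() blanks the other cells; stack() + dropna() turns the rest into a (row, col) Series
pairs = corr.where(upper).stack().dropna().sort_values(ascending=False)
print(pairs.round(3))
```

The same idea applies to the heatmap itself: passing `mask=np.triu(np.ones_like(corr, dtype=bool))` to `sns.heatmap` hides the redundant upper half and the diagonal.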





Author:Disheartened

link:http://www.pythonblackhole.com/blog/article/79525/0abb96e7708654d11e86/

source:python black hole net

Please credit the source when reprinting in any form. Any infringement that is discovered will be pursued legally.
