
Data Analysis Case Study: Movie Data Visualization Analysis

posted on 2023-06-06 11:09


Data introduction

The dataset contains movie data from 2011 to 2021.

Visual analysis

First, import the packages and data needed for this project.

    import warnings

    import matplotlib.pyplot as plt
    import numpy as np
    import pandas as pd
    import seaborn as sns
    from pyecharts import options as opts
    from pyecharts.charts import Pie
    from pyecharts.globals import ThemeType

    sns.set_style('ticks')
    warnings.filterwarnings('ignore')  # suppress warnings
    plt.rcParams['font.sans-serif'] = ['SimHei']  # render Chinese characters in plots
    plt.rcParams['axes.unicode_minus'] = False  # render the minus sign correctly
    data = pd.read_excel('data.xlsx')
    data.head()

Data preprocessing

    # Drop rows with missing values, then drop duplicate movie titles
    data.dropna(inplace=True)
    data.reset_index(drop=True, inplace=True)
    data.drop_duplicates(['电影名称'], inplace=True)
    # Take the release year as the text before the first '-' in the release date
    data['年份'] = data['上映时间'].apply(lambda x: x.split('-')[0])
    # Drop rows whose first-week box office is the placeholder '--'
    data.drop(index=data[data['首周票房'] == '--'].index, inplace=True)
    data.reset_index(drop=True, inplace=True)
    # Convert first-week box office from 亿 to 万 (1亿 = 10,000万), keeping only the number
    data['首周票房'] = data['首周票房'].apply(lambda x: float(x[:-1]) * 10000 if x[-1] == '亿' else float(x[:-1]))
    # Convert cumulative box office from 亿 to 万 in the same way
    data['累计票房'] = data['累计票房'].apply(lambda x: float(x[:-1]) * 10000 if x[-1] == '亿' else float(x[:-1]))
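The unit conversion can also be written as a small named function, which is easier to test than an inline lambda. This is only a sketch: the name `to_wan` is hypothetical, and it assumes every value is a number followed by exactly one unit character (亿 or 万). Since 1 亿 = 10,000 万, values ending in 亿 are multiplied by 10,000.

```python
def to_wan(value: str) -> float:
    """Convert a box-office string such as '1.5亿' or '3200万' to units of 万."""
    number, unit = float(value[:-1]), value[-1]
    # 1 亿 (100 million) equals 10,000 万 (ten thousand)
    return number * 10000 if unit == '亿' else number


print(to_wan('1.5亿'))
print(to_wan('3200万'))
```

With this helper, each column could be converted with `data['累计票房'].apply(to_wan)`.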

Visualization

    # Total box office by year
    # Select the column before summing so text columns are not aggregated
    df1 = data.groupby('年份')['累计票房'].sum()
    plt.figure(figsize=(10, 8))
    plt.title('Total box office by year', fontsize=14)
    plt.xlabel('Year', fontsize=14)
    plt.ylabel('Total box office (10,000 CNY)', fontsize=14)
    plt.bar(x=df1.index, height=df1.values)
    plt.show()

    # Proportion of movies by year, drawn as a rose pie chart
    # Note: df1 is the yearly box-office total from the previous step,
    # so the slices reflect box office, not movie counts
    result_list = [(i, j) for i, j in zip(df1.index.to_list(), df1.values.tolist())]
    a = Pie(init_opts=opts.InitOpts(theme=ThemeType.DARK))
    a.add(
        series_name='Year',
        data_pair=result_list,
        rosetype='radius',
        radius='70%',
    )
    a.set_global_opts(title_opts=opts.TitleOpts(title='Proportion of movies by year', pos_top=50))
    a.set_series_opts(tooltip_opts=opts.TooltipOpts(trigger='item', formatter='{a} <br/>{b}:{c} ({d}%)'))
    a.render_notebook()
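In the tooltip formatter, `{b}` is the slice name, `{c}` its value, and `{d}` its percentage of the total, which the chart computes itself. To check the chart, the same percentages can be worked out by hand. A minimal sketch follows; the function name `shares` and the yearly totals are made up for illustration and are not taken from the dataset.

```python
def shares(pairs):
    """Return (name, percentage) pairs, each value as a percentage of the total."""
    total = sum(value for _, value in pairs)
    return [(name, round(value / total * 100, 2)) for name, value in pairs]


# Hypothetical yearly totals, standing in for the (year, value) pairs fed to the pie
sample = [('2019', 600.0), ('2020', 150.0), ('2021', 250.0)]
print(shares(sample))
```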

 

    # Average ticket price by year
    # Select the column before averaging so text columns are not aggregated
    df2 = data.groupby('年份')['平均票价'].mean()
    plt.figure(figsize=(10, 8))
    plt.title('Average ticket price by year', fontsize=14)
    plt.xlabel('Year', fontsize=14)
    plt.ylabel('Average ticket price (CNY)', fontsize=14)
    plt.plot(df2.index, df2.values)
    plt.show()

 

    # Distribution of movie runtime
    sns.displot(data['片长'], bins=30, kde=True)

    # Distribution of average ticket price
    sns.displot(data['平均票价'], kde=True)

    # Distribution of director popularity
    sns.displot(data['导演喜爱度'], kde=True)

 

    # Proportion of movies by region (top five), using the first region listed for each movie
    df3 = data['来源'].apply(lambda x: x.split(',')[0]).value_counts().head()
    a1 = Pie(init_opts=opts.InitOpts(theme=ThemeType.CHALK))
    a1.add(
        series_name='Region',
        data_pair=[list(z) for z in zip(df3.index.to_list(), df3.values.tolist())],
        rosetype='radius',
        radius='60%',
    )
    a1.set_global_opts(title_opts=opts.TitleOpts(title='Proportion of movies by region', pos_left='center', pos_top=30))
    a1.set_series_opts(tooltip_opts=opts.TooltipOpts(trigger='item', formatter='{a} <br/>{b}:{c} ({d}%)'))
    a1.render_notebook()

    # Top five distribution companies by number of movies
    df4 = data['发行公司'].value_counts().head()  # keep the counts, not the plot object
    df4.plot(kind='barh')
    plt.show()

 

    # Relationship between runtime and rating
    plt.figure(figsize=(10, 8))
    plt.scatter(data['片长'], data['评分'])
    plt.title('Runtime vs. rating', fontsize=15)
    plt.xlabel('Runtime', fontsize=15)
    plt.ylabel('Rating', fontsize=15)
    plt.show()

 

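    # Correlation coefficients between features
    fig = plt.figure(figsize=(18, 18))
    # numeric_only=True skips text columns, which newer pandas versions refuse to correlate
    sns.heatmap(data.corr(numeric_only=True), vmax=1, annot=True, linewidths=0.5, cbar=False, cmap='YlGnBu', annot_kws={'fontsize': 25})
    plt.xticks(fontsize=20)
    plt.yticks(fontsize=20)
    plt.title('Correlation coefficients between features', fontsize=20)
    plt.show()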
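`DataFrame.corr()` uses the Pearson correlation coefficient by default. For reference, here is a sketch of that formula in plain Python. The function name `pearson` and the sample numbers are made up for illustration; they are not values from the dataset.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    # Sum of products of deviations from the means
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical runtimes (minutes) and ratings, for illustration only
runtimes = [90, 100, 110, 120]
ratings = [6.0, 6.5, 7.0, 7.5]
print(pearson(runtimes, ratings))
```

A value near 1 or -1 indicates a strong linear relationship, and a value near 0 indicates little linear relationship.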

    # Trend of total annual box office
    df1 = data.groupby('年份')['累计票房'].sum()
    plt.figure(figsize=(10, 8))
    plt.title('Trend of total annual box office', fontsize=14)
    plt.xlabel('Year', fontsize=14)
    plt.ylabel('Total box office (10,000 CNY)', fontsize=14)
    plt.plot(df1.index, df1.values)
    plt.show()

    # Which production format is most popular
    import collections
    from pyecharts.charts import WordCloud

    result_list = []
    for i in data['制片制式'].values:
        # A movie can list several formats separated by '/'
        word_list = str(i).split('/')
        for j in word_list:
            result_list.append(j)
    word_counts = collections.Counter(result_list)
    word_counts_top = word_counts.most_common(50)
    print(word_counts_top)
    wc = WordCloud()
    wc.add('', word_counts_top)
    wc.render_notebook()
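The nested loop above splits each multi-valued cell and counts the pieces. The same idea in isolation, as a short sketch on a few made-up strings (the sample values are hypothetical, not rows from the dataset):

```python
from collections import Counter

# Hypothetical cells in the style of the '制片制式' column
cells = ['2D/3D', '2D', '2D/IMAX', '3D/IMAX']

# Split every cell on '/' and count each piece
counts = Counter(piece for cell in cells for piece in cell.split('/'))
print(counts.most_common(2))
```

`most_common(n)` returns `(value, count)` pairs, which is the shape that both the word cloud and the pie chart take as `data_pair`.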

 

    # Proportion of each production format
    a2 = Pie(init_opts=opts.InitOpts(theme=ThemeType.CHALK))
    a2.add(
        series_name='Type',
        data_pair=word_counts_top,
        radius='60%',
    )
    a2.set_global_opts(title_opts=opts.TitleOpts(title='Proportion of each production format', pos_top=50))
    a2.set_series_opts(tooltip_opts=opts.TooltipOpts(trigger='item', formatter='{a} <br/>{b}:{c} ({d}%)'))
    a2.render_notebook()

 

    # How often each movie genre appears
    import collections
    from pyecharts.charts import WordCloud

    result_list = []
    for i in data['电影类型'].values:
        # A movie can list several genres separated by ' / '
        word_list = str(i).split(' / ')
        for j in word_list:
            result_list.append(j)
    word_counts = collections.Counter(result_list)
    # Word frequency: take the 100 most frequent genres
    word_counts_top = word_counts.most_common(100)
    print(word_counts_top)
    wc = WordCloud()
    wc.add('', word_counts_top)
    wc.render_notebook()

 

    # Proportion of each movie genre (ten most frequent)
    word_counts_top = word_counts.most_common(10)
    a3 = Pie(init_opts=opts.InitOpts(theme=ThemeType.MACARONS))
    a3.add(
        series_name='Type',
        data_pair=word_counts_top,
        rosetype='radius',
        radius='60%',
    )
    a3.set_global_opts(title_opts=opts.TitleOpts(title='Proportion of each movie genre', pos_left='center', pos_top=50))
    a3.set_series_opts(tooltip_opts=opts.TooltipOpts(trigger='item', formatter='{a} <br/>{b}:{c} ({d}%)'))
    a3.render_notebook()

Reflections on the project:

Through this hands-on Python project I learned a lot. It was a good opportunity to put the theory from textbooks into practice. While studying, I used to complain that the material was too hard to follow. Looking back, much of it is not difficult; the key is to understand it.

The project also exercised my other abilities and improved my overall skills. Above all, it strengthened my ability to carry out a project, to think independently, and to do the work myself. Along the way I reviewed what I had learned before and picked up some techniques for applying it.

The project also taught me the following attitudes toward work and study:

1) Keep studying and keep improving my theoretical grounding. In the information age, learning is what drives us to absorb new information and advance in a career. As a young student, I should treat learning as an important way to stay enthusiastic about work. After starting a job, I will actively respond to my organization's call, tie my study to actual work, and keep learning theory, professional knowledge, and general knowledge: advanced theory to equip my thinking, solid professional knowledge to raise my potential, and broad general knowledge to widen my horizons.

2) Practice hard and consciously make the transition to a new role. Only by putting theory into practice can its value be realized and the theory itself be tested. Likewise, a person's value is realized through practical activity, and only through practice are one's abilities tempered and one's will shown.

3) Improve enthusiasm and initiative at work. An internship is both a beginning and an end. What lies ahead is fertile ground for me to work on, and I clearly feel the weight of responsibility. In my future work and life I will keep studying, practice in depth, keep improving myself, strive for results, and keep creating more value.

This hands-on Python project not only taught me new knowledge but also enriched my experience and helped narrow the gap between practice and theory. In future work I will keep applying the theory and practical experience I have gained, and work hard to realize my goals.





Author:Soledad

link:http://www.pythonblackhole.com/blog/article/80177/ea6edfc02cc0aedf64de/

source:python black hole net

Please indicate the source for any form of reprinting. If any infringement is discovered, it will be held legally responsible.
