Posted on 2023-05-21 17:27
(1) Modeling scheme
[The Eleventh Teddy Cup Data Mining Challenge in 2023] Question B: Data Analysis and Demand Forecasting Modeling of Product Orders, with Detailed Python Code Explanation: Question 1
[The Eleventh Teddy Cup Data Mining Challenge in 2023] Question B: Data Analysis and Demand Forecasting Modeling of Product Orders, with Detailed Python Code Explanation: Question 2
(2) Papers on relevant competition topics
1. Problem background
In recent years, the external environment in which enterprises operate has become increasingly uncertain, and this complex, changeable environment has made their supply chains harder to manage.
Demand forecasting is the first line of defense in an enterprise's supply chain. However, demand is affected by many factors, so forecasting accuracy is generally low and better algorithms are needed. A demand forecast is a theoretically grounded conclusion drawn from historical data and expectations about the future. First, it gives the company's management a reference for decisions on future sales and operation plans, targets, and capital budgets. Second, it helps with procurement planning and production scheduling, which reduces the impact of business fluctuations. Without a demand forecast, or with an inaccurate one, many internal decisions about sales, procurement, and financial budgets can only be based on experience. The result is poor anticipation of the market, which leads to backlogs or shortages of inventory and capital and raises inventory costs.
2. Data description
The training data (order_train1.csv) in the attachment provides the shipment data of a large domestic manufacturing company to its dealers from September 1, 2015 to December 20, 2018 (see Table 1 for the format). It reflects the price and demand of the company's products in different sales regions and includes: order_date (order date), sales_region_code (sales region code), item_code (product code), first_cate_code (major product category code), second_cate_code (product subcategory code), sales_chan_name (sales channel name), item_price (product price), and ord_qty (order demand quantity).
Table 1: Format of the training data (historical data)
Here, "order date" is the date on which a demand occurred; one "major product category code" corresponds to multiple "product subcategory codes"; and "sales channel name" is either online or offline, where "online" refers to e-commerce platforms such as Taobao and JD.com and "offline" refers to physical dealers.
The forecast data (predict_sku1.csv) in the attachment provides the sales region code, product code, major product category, and product subcategory of the products to be forecast (see Table 2 for the format).
Table 2: Sample data for products that require forecasting
3. Problems to be solved
Question 1. Analyze the following aspects of the training data:
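(1) The impact of different product prices on demand;
(2) The impact of the sales region on demand, and the characteristics of product demand in different regions;
(3) The characteristics of product demand under different sales channels (online and offline);
(4) The differences and similarities in product demand between different categories;
(5) The characteristics of product demand in different periods (for example, the beginning, middle, and end of the month);
(6) The impact of holidays on product demand;
(7) The impact of promotions (such as 618 and Double 11) on product demand;
(8) The impact of seasonal factors on product demand.
(1) The impact of different product prices on demand
First, read the data and extract the item_price and ord_qty columns. Then group by item_price and compute the average demand at each price level. Finally, plot the average demand against price.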
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
# Jupyter/IPython only: render figures inline
%matplotlib inline
# Read the data (the training file named in the data description)
df = pd.read_csv('order_train1.csv')
# Group by product price and compute the mean order quantity
grouped = df.groupby('item_price')['ord_qty'].mean().reset_index()
# Plot with Matplotlib
plt.figure(figsize=(10, 6))
plt.plot(grouped['item_price'], grouped['ord_qty'], 'o-')
plt.xlabel('Product Price')
plt.ylabel('Average Order Quantity')
plt.title('Relationship between Product Price and Order Quantity')
# The img/ directory must already exist
plt.savefig('img/1.png',dpi=300)
# Plot with Seaborn
sns.set_style('darkgrid')
plt.figure(figsize=(10, 6))
sns.lineplot(x='item_price', y='ord_qty', data=grouped)
plt.xlabel('Product Price')
plt.ylabel('Average Order Quantity')
plt.title('Relationship between Product Price and Order Quantity')
plt.savefig('img/2.png',dpi=300)
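The chart shows a U-shaped relationship between product price and average order quantity: demand is higher when the price is low or high, and lower when the price is in the middle range. A possible reason is that a very low price may make consumers doubt the product's quality, while a very high price may make them feel the product is not worth buying. A reasonable pricing strategy can therefore raise sales to some extent.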
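Grouping on the exact item_price value creates one group per distinct price, which can be noisy when a price appears in only a few orders. A minimal sketch of averaging over price ranges instead, assuming quantile bins from pd.qcut. The number of bins, the helper name mean_qty_by_price_bin, and the synthetic data are illustrative choices, not part of the original analysis.

```python
import numpy as np
import pandas as pd

def mean_qty_by_price_bin(df: pd.DataFrame, n_bins: int = 5) -> pd.DataFrame:
    """Average ord_qty within quantile-based price bins (n_bins is an assumed choice)."""
    binned = df.assign(
        price_bin=pd.qcut(df["item_price"], q=n_bins, duplicates="drop")
    )
    return (
        binned.groupby("price_bin", observed=True)["ord_qty"]
        .agg(["mean", "count"])
        .reset_index()
    )

# Synthetic stand-in for order_train1.csv (values are made up for illustration)
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "item_price": rng.uniform(100, 2000, size=500),
    "ord_qty": rng.integers(1, 200, size=500),
})
summary = mean_qty_by_price_bin(demo, n_bins=5)
print(summary)
```

The count column shows how many orders back each average, which helps when deciding whether a high or low bin mean is trustworthy.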
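A regression model (for example, linear or polynomial regression) can also be used to model and predict the relationship between product price and demand, and so quantify the effect of price on demand.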
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
# Read the data
df = pd.read_csv('order_train1.csv')
# Draw a scatter plot
plt.figure()
sns.scatterplot(x='item_price', y='ord_qty', data=df)
# Draw a box plot on its own figure so it does not overwrite the scatter plot
plt.figure()
sns.boxplot(x='item_price', y='ord_qty', data=df)
plt.show()
# Fit a linear regression model
x = df[['item_price']]
y = df[['ord_qty']]
model = LinearRegression()
model.fit(x, y)
# Print the model coefficient and intercept
print('Coefficients:', model.coef_)
print('Intercept:', model.intercept_)
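A straight line cannot represent a U-shaped relationship, so a polynomial fit may describe the price effect better. A minimal sketch, assuming a quadratic fitted with np.polyfit on synthetic data. The prices are in arbitrary units, and fit_quadratic is a hypothetical helper, not code from the original solution.

```python
import numpy as np

def fit_quadratic(price, qty):
    """Least-squares fit of qty = a*price**2 + b*price + c; returns (a, b, c)."""
    a, b, c = np.polyfit(
        np.asarray(price, dtype=float), np.asarray(qty, dtype=float), deg=2
    )
    return a, b, c

# Synthetic U-shaped data: demand is lowest near price = 5 (made-up values)
rng = np.random.default_rng(1)
price = np.linspace(1.0, 10.0, 200)
qty = 3.0 * (price - 5.0) ** 2 + 20.0 + rng.normal(0.0, 1.0, size=price.size)

a, b, c = fit_quadratic(price, qty)
# For a > 0 the fitted curve opens upward and has its minimum at -b / (2a)
turning_point = -b / (2 * a)
print(f"a={a:.3f}, b={b:.3f}, c={c:.3f}, turning point={turning_point:.2f}")
```

A positive a is consistent with a U shape, and the turning point estimates the price at which fitted demand is lowest.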
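(2) The impact of the sales region on demand, and the characteristics of product demand in different regions
The demand in different regions can be analyzed visually, for example with histograms and box plots, to see how it is distributed. Methods such as one-way analysis of variance (ANOVA) can then test whether demand differs significantly between regions, which establishes whether the sales region affects demand.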
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
from scipy.stats import f_oneway
# Read the data
df = pd.read_csv('order_train1.csv')
# Draw a histogram
sns.histplot(x='ord_qty', hue='sales_region_code', data=df, kde=True)
# Draw a box plot
sns.boxplot(x='sales_region_code', y='ord_qty', data=df)
# Run the ANOVA
grouped_data = df.groupby('sales_region_code')['ord_qty'].apply(list)
# ... omitted in the original; please download the full code
# (f_value and p_value are computed in the omitted part)
print('F-value:', f_value)
print('P-value:', p_value)
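The F statistic and p-value printed above come from a one-way ANOVA. A minimal sketch of such a test, assuming scipy.stats.f_oneway applied to one sample of ord_qty per region. The region codes and values are synthetic, and anova_by_group is a hypothetical helper rather than the omitted code.

```python
import numpy as np
import pandas as pd
from scipy.stats import f_oneway

def anova_by_group(df: pd.DataFrame, group_col: str, value_col: str):
    """One-way ANOVA of value_col across the levels of group_col."""
    samples = [g[value_col].to_numpy() for _, g in df.groupby(group_col)]
    return f_oneway(*samples)

# Synthetic data: three made-up region codes with different mean demand
rng = np.random.default_rng(2)
demo = pd.DataFrame({
    "sales_region_code": np.repeat([101, 102, 103], 100),
    "ord_qty": np.concatenate([
        rng.normal(50, 10, 100),
        rng.normal(55, 10, 100),
        rng.normal(80, 10, 100),
    ]),
})
f_value, p_value = anova_by_group(demo, "sales_region_code", "ord_qty")
print("F-value:", f_value)
print("P-value:", p_value)
```

A small p-value indicates that at least one region's mean demand differs from the others; it does not say which one. ANOVA also assumes roughly normal groups with similar variance, which skewed order quantities may violate.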
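(3) The characteristics of product demand under different sales channels (online and offline)
Histograms and box plots of demand for each sales channel show how demand is distributed and how the channels differ. Methods such as the t-test can determine whether the difference in demand between channels is significant.
We can then split the data into online and offline groups by sales channel name (sales_chan_name) and compute basic statistics of the order quantity (ord_qty) for each group, including the mean, median, maximum, minimum, and standard deviation, to understand how the two distributions differ.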
import pandas as pd
# Read the data
data = pd.read_csv('order_train1.csv')
# Inspect the data
print(data.head())
# Split the data into online and offline by sales channel name
online_data = data[data['sales_chan_name'] == 'online']
offline_data = data[data['sales_chan_name'] == 'offline']
# Compute basic statistics of online and offline order quantity
print('Basic statistics of online order quantity:')
print(online_data['ord_qty'].describe())
print('Basic statistics of offline order quantity:')
print(offline_data['ord_qty'].describe())
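To check whether the difference between the two channels is statistically significant, a two-sample test can be applied. A minimal sketch, assuming Welch's t-test (scipy.stats.ttest_ind with equal_var=False) and, because order quantities are often skewed, a Mann-Whitney U test as a rank-based cross-check. The synthetic data and the name compare_channels are illustrative.

```python
import numpy as np
import pandas as pd
from scipy.stats import mannwhitneyu, ttest_ind

def compare_channels(df: pd.DataFrame) -> dict:
    """Compare ord_qty between the online and offline channels."""
    online = df.loc[df["sales_chan_name"] == "online", "ord_qty"]
    offline = df.loc[df["sales_chan_name"] == "offline", "ord_qty"]
    # Welch's t-test does not assume equal variances in the two groups
    t_stat, t_p = ttest_ind(online, offline, equal_var=False)
    # Rank-based alternative that does not assume normality
    u_stat, u_p = mannwhitneyu(online, offline, alternative="two-sided")
    return {"t_stat": t_stat, "t_p": t_p, "u_stat": u_stat, "u_p": u_p}

# Synthetic data: made-up counts with a lower online mean
rng = np.random.default_rng(3)
demo = pd.DataFrame({
    "sales_chan_name": ["online"] * 200 + ["offline"] * 300,
    "ord_qty": np.concatenate([rng.poisson(20, 200), rng.poisson(30, 300)]),
})
result = compare_channels(demo)
print(result)
```

A negative t statistic here means the online mean is below the offline mean, because online is passed as the first sample.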
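Besides computing basic statistics of order quantity, we can use visualization to see the characteristics of demand under each sales channel more directly. In Python, the Matplotlib or Seaborn libraries can be used for this.
import seaborn as sns
import matplotlib.pyplot as plt
# Set the plot style
sns.set(style="ticks", palette="pastel")
# Draw a box plot of online and offline order quantity
sns.boxplot(x="sales_chan_name", y="ord_qty", data=data)
# Remove the top and right spines, then display the figure
sns.despine(trim=True)
plt.show()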
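Running the code above produces a box plot of the online and offline order quantities. Comparing the position, size, and shape of the boxes shows how demand differs between the two channels. For example, if the median online order quantity is clearly higher than the offline median, a typical online order is larger than a typical offline one.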
import matplotlib.pyplot as plt
# Extract the online and offline order quantities
online_ord_qty = data[data["sales_chan_name"] == "online"]["ord_qty"]
offline_ord_qty = data[data["sales_chan_name"] == "offline"]["ord_qty"]
# Plot the online and offline order quantities
# ... omitted in the original; please download the full code
# (X is defined in the omitted part)
labels = ['Online', 'Offline']
plt.bar(labels, X)
plt.title('Distribution of Sales Channels')
plt.xlabel('Sales Channels')
plt.ylabel('Sales Volume')
plt.show()
A kernel density plot shows the distribution of the data more directly. It smooths the data into a continuous curve that reflects the probability density.
import seaborn as sns
import matplotlib.pyplot as plt
# Extract the online and offline order quantities
online_ord_qty = data[data["sales_chan_name"] == "online"]["ord_qty"]
offline_ord_qty = data[data["sales_chan_name"] == "offline"]["ord_qty"]
# Draw kernel density plots of online and offline order quantity
# (fill=True replaces the deprecated shade=True)
sns.kdeplot(online_ord_qty, fill=True, label="Online")
sns.kdeplot(offline_ord_qty, fill=True, label="Offline")
plt.legend(loc="upper right")
plt.title("Distribution of Order Quantity by Sales Channel")
plt.xlabel("Order Quantity")
plt.ylabel("Density")
plt.show()
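The kernel density plot shows that offline demand is more concentrated than online demand and has a distinct peak, while online demand is smoother and has no obvious peak. Offline demand is also higher overall, and online demand lower.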
# Draw a scatter plot (data is the DataFrame loaded earlier)
sns.scatterplot(data=data, x="item_price", y="ord_qty", hue="sales_chan_name")
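The scatter plot suggests that the relationship between price and demand is tighter offline than online, and that the offline channel has some outliers with both high price and high demand. Note, however, that price and demand are both discrete values in this data, so points in the scatter plot overlap.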
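Because overlapping points make the scatter plot hard to read, a per-channel correlation coefficient gives a numeric check of how tight the relationship is. A minimal sketch, assuming Pearson correlation (which measures linear association only) computed within each channel. The data and the name price_qty_corr_by_channel are illustrative.

```python
import pandas as pd

def price_qty_corr_by_channel(df: pd.DataFrame) -> pd.Series:
    """Pearson correlation between item_price and ord_qty within each channel."""
    return (
        df.groupby("sales_chan_name")[["item_price", "ord_qty"]]
        .apply(lambda g: g["item_price"].corr(g["ord_qty"]))
    )

# Made-up data: offline demand falls exactly linearly with price, online is noisy
demo = pd.DataFrame({
    "sales_chan_name": ["offline"] * 5 + ["online"] * 5,
    "item_price": [1, 2, 3, 4, 5, 1, 2, 3, 4, 5],
    "ord_qty": [10, 8, 6, 4, 2, 3, 1, 4, 1, 5],
})
corr = price_qty_corr_by_channel(demo)
print(corr)
```

A coefficient closer to -1 or 1 indicates a tighter linear relationship; values near 0 indicate little linear association.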
(4) The differences and similarities in product demand between different categories
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
# Read the data
data = pd.read_csv('order_train1.csv')
# Group by category and compute the mean, median, and standard deviation of order quantity
category_demand = data.groupby('second_cate_code')['ord_qty'].agg(['mean', 'median', 'std'])
# ... omitted in the original; please download the full code
# Draw a histogram of order quantity for each category
category_list = data['second_cate_code'].unique().tolist()
for category in category_list:
    demand = data.loc[data['second_cate_code'] == category, 'ord_qty']
    plt.hist(demand, bins=30)
    plt.title(f'Cate:{category}')
    plt.xlabel('Demand')
    plt.ylabel('Frequency')
    plt.show()
# Compare demand across categories to find their differences and similarities
# Statistical methods such as the t-test and analysis of variance can be used
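(5) The characteristics of product demand in different periods (for example, the beginning, middle, and end of the month)
To study demand in different periods, we first split the order date and extract the demand for the beginning, middle, and end of the month. The dt accessor in pandas gives the year, month, day, hour, and other parts of a datetime. Here we use the pandas cut function to bin the day of the month, and then total the order quantity in each bin.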
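import pandas as pd
# Read the data
data = pd.read_csv('order_train1.csv')
# Convert the order date to datetime; the format is inferred because
# '%y' expects a two-digit year and does not match years such as 2015
data['order_date'] = pd.to_datetime(data['order_date'])
# Sort the data by order date
data = data.sort_values(by='order_date')
# Group order quantity into beginning, middle, and end of the month
# ... omitted in the original; please download the full code
# (time_labels is defined in the omitted part)
time_bins = [0, 10, 20, 31]
data['order_date_category'] = pd.cut(data['order_date'].dt.day, bins=time_bins, labels=time_labels)
# Total the order quantity for each period
demand_by_time = data.groupby('order_date_category')['ord_qty'].sum()
# Draw a bar chart of order quantity by period
demand_by_time.plot(kind='bar')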
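The binning step relies on pd.cut, which by default builds right-closed intervals, so the edges [0, 10, 20, 31] place days 1 to 10, 11 to 20, and 21 to 31 in separate bins. A minimal sketch on a toy frame, assuming three hypothetical label names (the original labels are not shown) and ISO-formatted demo dates.

```python
import pandas as pd

TIME_BINS = [0, 10, 20, 31]              # day-of-month edges from the snippet above
TIME_LABELS = ["early", "mid", "late"]   # hypothetical label names

def demand_by_month_part(df: pd.DataFrame) -> pd.Series:
    """Total ord_qty for days 1-10, 11-20 and 21-31 of the month."""
    day = pd.to_datetime(df["order_date"]).dt.day
    part = pd.cut(day, bins=TIME_BINS, labels=TIME_LABELS)
    return df["ord_qty"].groupby(part, observed=False).sum()

# Toy data with dates on both sides of each bin edge
demo = pd.DataFrame({
    "order_date": ["2016-01-01", "2016-01-10", "2016-01-11",
                   "2016-01-20", "2016-01-21", "2016-01-31"],
    "ord_qty": [5, 7, 11, 13, 17, 19],
})
totals = demand_by_month_part(demo)
print(totals)
```

Because months have different lengths, the last bin covers 8 to 11 days, so comparing daily averages rather than totals may be fairer.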
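(6) The impact of holidays on product demand:
Holidays usually affect consumers' purchasing behavior, so they also affect product demand. For this question we can take China's statutory holidays and compare holidays with non-holidays.
To analyze the impact, first process the data to find all holidays and their dates. In this dataset, holiday dates such as the Spring Festival and National Day can be identified from the order date (order_date) column. Then compute the average demand on holidays and compare it with the demand on ordinary days.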
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from datetime import datetime
import holidays
# Load the dataset and preprocess it
df = pd.read_csv('order_train1.csv')
df['order_date'] = pd.to_datetime(df['order_date'])
# Flag holidays with the calendar's membership test (date in calendar)
cn_holidays = holidays.China(years=[2015,2016,2017,2018])
df['is_holiday'] = df['order_date'].apply(lambda d: d in cn_holidays)
df['is_holiday'] = df['is_holiday'].astype(int)
# Split the dataset into holiday data and non-holiday data
# ... omitted in the original; please download the full code
# (holiday_df and non_holiday_df are defined in the omitted part)
# Compute the average demand per day
holiday_demand = holiday_df.groupby(['order_date'])['ord_qty'].mean()
non_holiday_demand = non_holiday_df.groupby(['order_date'])['ord_qty'].mean()
# Compare the average demand on holidays and non-holidays in a plot
plt.figure(figsize=(10,6))
plt.plot(holiday_demand.index, holiday_demand.values, label='Holiday')
plt.plot(non_holiday_demand.index, non_holiday_demand.values, label='Non-Holiday')
plt.title('Average demand on holiday vs non-holiday')
plt.xlabel('Date')
plt.ylabel('Average demand')
plt.legend()
plt.show()
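The plot compares the mean quantity per order on each day, which ignores how many orders a day receives. A minimal sketch of an alternative that first totals demand per day and then averages the daily totals for holidays and ordinary days. The holiday dates here are passed in by hand for illustration, and holiday_vs_regular is a hypothetical helper.

```python
import pandas as pd

def holiday_vs_regular(df: pd.DataFrame, holiday_dates) -> dict:
    """Mean daily total of ord_qty on holidays and on all other days."""
    daily = df.groupby("order_date")["ord_qty"].sum()
    is_holiday = daily.index.isin(pd.to_datetime(list(holiday_dates)))
    return {
        "holiday": float(daily[is_holiday].mean()),
        "regular": float(daily[~is_holiday].mean()),
    }

# Toy data: two ordinary days followed by two days treated as holidays
demo = pd.DataFrame({
    "order_date": pd.to_datetime([
        "2016-09-29", "2016-09-29", "2016-09-30",
        "2016-10-01", "2016-10-02", "2016-10-02",
    ]),
    "ord_qty": [10, 20, 30, 4, 5, 7],
})
means = holiday_vs_regular(demo, ["2016-10-01", "2016-10-02"])
print(means)
```

With real data, holiday_dates could come from a calendar source such as the keys of the holidays.China object used above.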
(7) The impact of promotions on product demand:
Promotional activities usually increase sales and therefore affect product demand. For this question we can select some promotions and compare the promotional periods with the non-promotional periods.
import pandas as pd
import matplotlib.pyplot as plt
# Load the dataset; parse_dates already converts order_date to datetime
df = pd.read_csv('order_train1.csv', parse_dates=['order_date'])
# Split the dataset by promotion date
promo_dates = [pd.to_datetime('2016-06-18'), pd.to_datetime('2016-11-11')]
df_promo = df[df['order_date'].isin(promo_dates)]
df_nonpromo = df[~df['order_date'].isin(promo_dates)]
# Compute the average daily demand in promotion and non-promotion periods
# ... omitted in the original; please download the full code
# (promo_mean_qty and nonpromo_mean_qty are defined in the omitted part)
# Plot the result
fig, ax = plt.subplots(figsize=(10, 6))
ax.plot(promo_mean_qty.index, promo_mean_qty.values, label='Promo')
ax.plot(nonpromo_mean_qty.index, nonpromo_mean_qty.values, label='Non-Promo')
ax.set_xlabel('Date')
ax.set_ylabel('Average Demand')
ax.set_title('Impact of Promotions on Product Demand')
ax.legend()
plt.show()
Compare the average order demand during the promotion period and the non-promotion period to analyze the impact of the promotion on the product demand.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
# 1. Define the promotion dates (converted to datetime so they match order_date)
promotions = pd.to_datetime(['2015/6/18', '2015/11/11', '2016/6/18', '2016/11/11', '2017/6/18', '2017/11/11', '2018/6/18'])
# 2. Load and preprocess the data
df = pd.read_csv('order_train1.csv', parse_dates=['order_date'], dtype={'sales_region_code': 'str'})
df['is_promotion'] = df['order_date'].isin(promotions).astype(int)
df_agg = df.groupby(['order_date'])['ord_qty'].sum().reset_index()
# 3. Compute the order quantity in promotion and non-promotion periods
df_promo = df_agg[df_agg['order_date'].isin(promotions)]
df_nonpromo = df_agg[~df_agg['order_date'].isin(promotions)]
promo_mean = df_promo['ord_qty'].mean()
nonpromo_mean = df_nonpromo['ord_qty'].mean()
# 4. Compare the order quantity in promotion and non-promotion periods in a plot
# ... omitted in the original; please download the full code
# (ax is created in the omitted part)
ax.bar(['Promotion', 'Non-promotion'], [promo_mean, nonpromo_mean])
ax.set_xlabel('Period')
ax.set_ylabel('Average order quantity')
ax.set_title('Effect of promotions on order quantity')
plt.show()
The bar chart shows that the average daily order quantity on promotion days is higher than on non-promotion days, which suggests that promotional activities have a positive effect on product demand.
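The comparison treats only the exact promotion dates as the promotion period. If the effect is expected to extend to neighboring days (an assumption, not something the data description states), the dates can be flagged with a window around each promotion day. A minimal sketch; the window widths and the name promo_window_flags are hypothetical.

```python
import pandas as pd

def promo_window_flags(dates: pd.Series, promo_days,
                       days_before: int = 3, days_after: int = 3) -> pd.Series:
    """True for dates inside [promo - days_before, promo + days_after]."""
    dates = pd.to_datetime(dates)
    flags = pd.Series(False, index=dates.index)
    for day in pd.to_datetime(list(promo_days)):
        start = day - pd.Timedelta(days=days_before)
        end = day + pd.Timedelta(days=days_after)
        # between() is inclusive on both ends by default
        flags = flags | dates.between(start, end)
    return flags

# Toy dates around the 618 promotion, with a two-day window on each side
dates = pd.Series(pd.to_datetime([
    "2016-06-10", "2016-06-16", "2016-06-18", "2016-06-20", "2016-06-25",
]))
flags = promo_window_flags(dates, ["2016-06-18"], days_before=2, days_after=2)
print(flags.tolist())
```

The resulting boolean Series can be used in place of the isin(...) mask when splitting the data into promotion and non-promotion periods.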
(8) The impact of seasonal factors on product demand
import pandas as pd
import matplotlib.pyplot as plt
# Read the data
df = pd.read_csv('order_train1.csv')
# Convert the order date (a 'YYYY/M/D' string) to a season
def date_to_season(date):
    year, month, day = map(int, date.split('/'))
    if month in (3, 4, 5):
        return 'Spring'
    elif month in (6, 7, 8):
        return 'Summer'
    elif month in (9, 10, 11):
        return 'Autumn'
    else:
        return 'Winter'
df['Season'] = df['order_date'].apply(date_to_season)
# Aggregate order quantity by season
# ... omitted in the original; please download the full code
# Draw a histogram and a kernel density plot
for season in ['Spring', 'Summer', 'Autumn', 'Winter']:
    plt.figure(figsize=(8,6))
    plt.hist(df[df['Season'] == season]['ord_qty'], bins=20, alpha=0.5, color='blue')
    df[df['Season'] == season]['ord_qty'].plot(kind='density', secondary_y=True)
    plt.title('Demand Distribution in ' + season)
    plt.xlabel('Order Demand')
    plt.ylabel('Frequency / Density')
    plt.show()
# Draw a scatter plot
for season in ['Spring', 'Summer', 'Autumn', 'Winter']:
    plt.figure(figsize=(8,6))
    plt.scatter(df[df['Season'] == season]['item_price'], df[df['Season'] == season]['ord_qty'], alpha=0.5)
    plt.title('Demand vs. Price in ' + season)
    plt.xlabel('Item Price')
    plt.ylabel('Order Demand')
    plt.show()
The results show that the distribution of order demand differs by season: for example, demand is generally higher in winter and lower in summer. The relationship between order demand and product price also differs by season. In spring and autumn there is some positive correlation between demand and price, while in summer and winter there is no obvious correlation.
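The seasonal aggregation step is omitted in the code above. A minimal sketch of one way to total and average ord_qty by season, assuming order_date is already a datetime column and using the same month-to-season mapping as date_to_season. The name demand_by_season and the toy data are illustrative.

```python
import pandas as pd

# Same month-to-season mapping as date_to_season
SEASON_BY_MONTH = {
    3: "Spring", 4: "Spring", 5: "Spring",
    6: "Summer", 7: "Summer", 8: "Summer",
    9: "Autumn", 10: "Autumn", 11: "Autumn",
    12: "Winter", 1: "Winter", 2: "Winter",
}

def demand_by_season(df: pd.DataFrame) -> pd.DataFrame:
    """Total and mean ord_qty for each season."""
    season = df["order_date"].dt.month.map(SEASON_BY_MONTH).rename("season")
    return df.groupby(season)["ord_qty"].agg(["sum", "mean"])

# Toy data with one or two orders per season
demo = pd.DataFrame({
    "order_date": pd.to_datetime([
        "2016-01-15", "2016-04-01", "2016-04-20", "2016-07-04", "2016-10-10",
    ]),
    "ord_qty": [10, 20, 40, 5, 8],
})
by_season = demand_by_season(demo)
print(by_season)
```

Because the training data starts in September 2015 and ends in December 2018, the seasons are not covered equally often, so the mean column is usually easier to compare than the sum.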
Open in a desktop browser: betterbench.top/#/49/detail
Author:kimi
link:http://www.pythonblackhole.com/blog/article/25305/5cbee5c4d03d28cbe4c4/
source:python black hole net
Please indicate the source for any form of reprinting. If any infringement is discovered, it will be held legally responsible.