
[Mathematical Modeling] Multiple Linear Regression (Python&Matlab code implementation)

Posted on 2023-06-06 17:53


Table of contents

1 Overview

2 Calculation example 1

2.1 Calculation example

2.2 Python code implementation 

2.3 Results

3 Calculation example 2 

3.1 Calculation example

3.2 Python code

3.3 Results

4 Calculation example 3

4.1 Calculation example

4.2 Python code

4.3 Results

5 Calculation example 4 - Matlab code implementation

5.1 Calculation example

5.2 Matlab code implementation

5.3 Results 

6 Closing remarks


1 Overview

The simple (unary) linear regression model studies the quantitative linear relationship between one dependent variable and one independent variable. In practical problems, we often encounter the quantitative relationship between one dependent variable and multiple independent variables, which requires us to establish a multiple linear regression model.

This can be described with a formula:

               f(x_1, x_2, \ldots, x_n) = \alpha_1 x_1 + \alpha_2 x_2 + \ldots + \alpha_n x_n + \beta

① Here x1, x2, ..., xn denote independent variable 1, independent variable 2, ..., independent variable n.

② f(x1, x2, ..., xn) denotes the dependent variable, a linear combination of these independent variables.

③ α1, α2, ..., αn denote the coefficients to be fitted.

④ β denotes the constant (intercept) to be fitted.

In practice we often need to model a linear relationship among multiple variables, and this is where multiple linear regression comes in. It is the extension of simple linear regression to several predictors and is widely used in applications. This article shows how to use Python code and Matlab code to solve practical problems with multiple linear regression.
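Before diving into code, it helps to note how the coefficients are actually estimated. Stacking the constant and the coefficients into one parameter vector θ, ordinary least squares (the standard method behind both sklearn's LinearRegression and statsmodels' OLS used below) minimizes the residual sum of squares and has the well-known closed-form solution:

               \hat{\theta} = \left(X^{\mathrm{T}} X\right)^{-1} X^{\mathrm{T}} y

where X is the design matrix whose first column is all ones (for β) and whose remaining columns hold the observed x1, ..., xn, and y is the vector of observed responses.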

2 Calculation example 1

2.1 Calculation example

This calculation example comes from the 2022 Ningxia Cup problem on the employment analysis of college students.

The employment of college students has always been a focus of social attention. According to an earlier press conference of the Ministry of Education, the number of college graduates in the class of 2022 reached 10.76 million, breaking 10 million for the first time; both the scale and the increment hit record highs. At the same time, affected by factors such as the market environment and the epidemic, employment pressure is relatively high. What are the characteristics and trends of college-student employment? Among the many employed students, what factors determine that some students land jobs with different salaries? These factors may include university grades, personal skills, the proximity of the university to industrial centers, the degree of specialization, the market conditions of specific industries, and so on. It is reported that India has 6,214 engineering and technical colleges with about 2.9 million students; an average of 1.5 million students earn engineering degrees each year, yet fewer than 20 percent of them find work in their core fields because they lack the skills needed for technical jobs. The attachment (https://www.datafountain.cn/datasets/4955) gives the salary level and various employment factors of Indian engineering graduates. Based on the attached data, combined with other data research:

Analyze the main factors affecting the salary of engineering graduates in colleges and universities.

2.2 Python code implementation 

from sklearn import linear_model
import pandas as pd

# Features can be dropped beforehand based on correlation analysis
shuju = pd.read_csv('数据.csv')
x = shuju[["10percentage","12percentage","CollegeTier","MechanicalEngg","ElectricalEngg","TelecomEngg","CivilEngg","collegeGPA","Logical","GraduationYear"]]
y = shuju["Salary"]
regr = linear_model.LinearRegression()  # build the model
regr.fit(x, y)                          # train
# Get the regression coefficients
print(regr.coef_)
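If the intercept and the goodness of fit are also of interest, sklearn exposes them directly on the fitted model; a minimal addition to the script above (intercept_ and score are standard LinearRegression members):

print(regr.intercept_)   # the fitted constant term (the beta in the formula above)
print(regr.score(x, y))  # R^2 of the fit on the training data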

2.3 Results

Running the script prints the fitted regression coefficient for each independent variable, i.e. the linear relationship between Salary and the predictors. (The coefficient printout is omitted here.)
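Since regr.coef_ is printed as a bare array, it can be hard to tell which coefficient belongs to which variable; a small sketch that pairs each column name with its coefficient:

# Pair each predictor name with its fitted coefficient
for name, coef in zip(x.columns, regr.coef_):
    print(f"{name}: {coef:.4f}")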

3 Calculation example 2 

3.1 Calculation example

This calculation example uses the data of calculation example 1, but we solve it with another, more informative approach: statsmodels' OLS, which reports full regression diagnostics.

3.2 Python code

# ==== Import the relevant libraries ====
import numpy as np
import pandas as pd
import statsmodels.api as sm  # multiple linear regression

# ==== Data preprocessing ====
file = r'shuju.csv'
data = pd.read_csv(file)
data.columns = ["Salary","10percentage","12percentage","CollegeTier","MechanicalEngg","ElectricalEngg","TelecomEngg","CivilEngg","collegeGPA","Logical","GraduationYear"]

# ==== Build the multiple linear model ====
x = sm.add_constant(data.iloc[:, 1:])  # independent variables x1~x10 (note: Python indexes from 0, unlike Matlab, which indexes from 1)
y = data["Salary"]       # dependent variable
model = sm.OLS(y, x)     # build the model
result = model.fit()     # fit the model
print(result.summary())  # model summary

Result:

OLS Regression Results
==============================================================================
Dep. Variable: Salary R-squared: 0.084
Model: OLS Adj. R-squared: 0.080
Method: Least Squares F-statistic: 27.21
Date: Wed, 21 Sep 2022 Prob (F-statistic): 2.90e-50
Time: 20:17:27 Log-Likelihood: -40896.
No. Observations: 2998 AIC: 8.181e+04
Df Residuals: 2987 BIC: 8.188e+04
Df Model: 10
Covariance Type: nonrobust
==================================================================================
coef std err t P>|t| [0.025 0.975]
----------------------------------------------------------------------------------
const 8.2e+04 2.12e+05 0.387 0.699 -3.34e+05 4.98e+05
10percentage 1341.3623 503.139 2.666 0.008 354.828 2327.897
12percentage 1410.6973 446.494 3.160 0.002 535.231 2286.164
CollegeTier -1.055e+05 1.44e+04 -7.309 0.000 -1.34e+05 -7.72e+04
MechanicalEngg 37.9658 37.902 1.002 0.317 -36.350 112.282
ElectricalEngg -134.0305 43.477 -3.083 0.002 -219.278 -48.783
TelecomEngg -76.8638 36.165 -2.125 0.034 -147.775 -5.952
CivilEngg 199.5630 116.302 1.716 0.086 -28.478 427.604
collegeGPA 1502.1779 494.583 3.037 0.002 532.420 2471.936
Logical 294.6543 45.724 6.444 0.000 205.000 384.309
GraduationYear -17.0930 101.498 -0.168 0.866 -216.107 181.921
==============================================================================
Omnibus: 4088.757 Durbin-Watson: 2.029
Prob(Omnibus): 0.000 Jarque-Bera (JB): 1454349.913
Skew: 7.577 Prob(JB): 0.00
Kurtosis: 109.831 Cond. No. 1.18e+05
==============================================================================
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The condition number is large, 1.18e+05. This might indicate that there are
strong multicollinearity or other numerical problems.

In this output we mainly look at three columns: "coef", "t", and "P>|t|". coef is the regression coefficient discussed above, and the const row gives the regression constant, so the regression model we obtain is:

y = 8.2e+04 + 1341.3623x1 + 1410.6973x2 - 1.055e+05x3 + 37.9658x4 - 134.0305x5 - 76.8638x6 + 199.5630x7 + 1502.1779x8 + 294.6543x9 - 17.0930x10

The "t" and "P>|t|" columns are equivalent; pick either one when working. They are mainly used to judge whether each individual independent variable has a significant linear relationship with y, which we discuss below. The output also shows Prob (F-statistic) = 2.90e-50; this is the usual P value of the overall regression, and since it is close to zero, the multiple linear equation as a whole is significant, i.e. y has a significant linear relationship with x1, x2, ..., x10 taken together. Note, however, that R-squared is only 0.084, so this equation explains only a small share of the variance in salary.

In theory, with the multiple linear equation now obtained, we could already use it for prediction, but it is worth digging a little deeper.
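As a brief aside, once the model is fitted, prediction is a single call; predict is a standard method of the fitted statsmodels result (shown here on the training design matrix, but any matrix with the same columns works):

y_pred = result.predict(x)  # fitted salaries for the design matrix x
print(y_pred.head())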

As noted above, y has a significant linear relationship with x1, x2, ..., x10, but note that the 10 variables are treated as a whole here. A significant relationship between y and the whole set does not mean y has a significant linear relationship with each individual variable. We therefore want to find the variables whose linear relationship with y is not significant, remove them, and keep only the significant ones; this is the t-test mentioned earlier.

The principle of the t-test is somewhat involved; interested readers can look it up, and we will not repeat it here. We judge by the "P>|t|" column of the summary table. Pick a threshold; common statistical choices are 0.05, 0.02, or 0.01, and here we use 0.05. Any independent variable whose value in the P>|t| column exceeds 0.05 does not have a significant linear relationship with y and should be removed (this applies to the independent variables only, not to the const row). One rule applies: remove only one variable at a time, usually the one with the largest P value. For instance, in the table above the largest P value belongs to GraduationYear (0.866), so we remove it first, rebuild the model with the remaining variables, again find and remove the variable with the largest P value, and repeat until every P value is at most 0.05. The remaining variables all have a significant linear relationship with y, and these are the ones we use to build the model.

Here is the Python code:

# ==== Import the relevant libraries ====
import numpy as np
import pandas as pd
import statsmodels.api as sm  # multiple linear regression

# ==== Data preprocessing ====
file = r'shuju.csv'
data = pd.read_csv(file)
data.columns = ["Salary","10percentage","12percentage","CollegeTier","MechanicalEngg","ElectricalEngg","TelecomEngg","CivilEngg","collegeGPA","Logical","GraduationYear"]

# ==== Build the multiple linear model ====
x = sm.add_constant(data.iloc[:, 1:])  # independent variables (Python indexes from 0, unlike Matlab)
y = data["Salary"]       # dependent variable
model = sm.OLS(y, x)     # build the model
result = model.fit()     # fit the model
print(result.summary())  # model summary
'''
y has a significant linear relationship with the predictors taken as a whole,
but that does not mean y has a significant linear relationship with each one.
The function below finds the predictors whose relationship with y is not
significant and removes them one at a time.
'''
# ==== Find which independent variables are significantly related to y ====
def looper(limit):  # limit is usually 0.05
    cols = ["10percentage","12percentage","CollegeTier","MechanicalEngg","ElectricalEngg","TelecomEngg","CivilEngg","collegeGPA","Logical","GraduationYear"]
    for i in range(len(cols)):
        data1 = data[cols]
        x = sm.add_constant(data1)           # independent variables
        y = data['Salary']                   # dependent variable
        model = sm.OLS(y, x)                 # build the model
        result = model.fit()                 # fit the model
        pvalues = result.pvalues             # all P values in the result
        pvalues.drop('const', inplace=True)  # drop the const row
        pmax = max(pvalues)                  # largest P value
        if pmax > limit:
            ind = pvalues.idxmax()           # name of the variable with the largest P value
            cols.remove(ind)                 # remove that variable from cols
        else:
            return result

result = looper(0.05)
print(result.summary())

3.3 Results

Covariance Type: nonrobust
==================================================================================
coef std err t P>|t| [0.025 0.975]
----------------------------------------------------------------------------------
const 5.174e+04 5.31e+04 0.975 0.330 -5.23e+04 1.56e+05
10percentage 1400.6369 502.351 2.788 0.005 415.648 2385.626
12percentage 1419.5952 446.442 3.180 0.001 544.230 2294.960
CollegeTier -1.06e+05 1.44e+04 -7.347 0.000 -1.34e+05 -7.77e+04
ElectricalEngg -137.5982 43.436 -3.168 0.002 -222.766 -52.431
TelecomEngg -81.7501 36.052 -2.268 0.023 -152.440 -11.061
collegeGPA 1433.1099 493.383 2.905 0.004 465.706 2400.514
Logical 290.7453 45.668 6.366 0.000 201.201 380.289
==============================================================================
Omnibus: 4083.414 Durbin-Watson: 2.028
Prob(Omnibus): 0.000 Jarque-Bera (JB): 1441070.840
Skew: 7.561 Prob(JB): 0.00
Kurtosis: 109.337 Cond. No. 7.62e+03
==============================================================================
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The condition number is large, 7.62e+03. This might indicate that there are
strong multicollinearity or other numerical problems.

Now the question arises: between the earlier model containing all independent variables and this reduced model, which should we choose? After all, the first model's overall fit is also significant. In the author's experience, it depends on the requirements of the specific project. The problems we meet in real projects are real-world cases, not pure math exercises; the dropped variables may still genuinely influence y, and removing them blindly may harm the final result, so the decision should follow the actual needs.
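One way to inform that choice with numbers rather than intuition is to compare standard model-selection metrics, which statsmodels already computes on every fitted result (rsquared_adj and aic are standard attributes); a minimal sketch:

# Compare the full model with the reduced model returned by looper()
full = sm.OLS(y, sm.add_constant(data.iloc[:, 1:])).fit()
reduced = looper(0.05)
# Higher adjusted R^2 and lower AIC favor a model
print(full.rsquared_adj, full.aic)
print(reduced.rsquared_adj, reduced.aic)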

4 Calculation example 3

4.1 Calculation example

The data here come from the 2013 China Statistical Yearbook. The dependent variable y is residents' consumption expenditure, and there are nine independent variables: x1 is spending on food, x2 on clothing, x3 on housing, x4 on health care, x5 on culture, education, and entertainment; x6 is the average wage of employees, x7 is the region's per-capita GDP, x8 is the region's consumer price index, and x9 is the region's unemployment rate. Among all these variables, x1 to x7 and y are in yuan, x9 is a percentage, and x8 is unitless because it is an index. The data set is 31 rows by 10 columns, one row per region.

4.2 Python code

# ==== Import the relevant libraries ====
import numpy as np
import pandas as pd
import statsmodels.api as sm  # multiple linear regression

# ==== Data preprocessing ====
file = r'data.xlsx'
data = pd.read_excel(file)
data.columns = ['y', 'x1', 'x2', 'x3', 'x4', 'x5', 'x6', 'x7', 'x8', 'x9']

# ==== Build the multiple linear model ====
x = sm.add_constant(data.iloc[:, 1:])  # independent variables x1~x9
y = data['y']            # dependent variable
model = sm.OLS(y, x)     # build the model
result = model.fit()     # fit the model
print(result.summary())  # model summary

Result:

OLS Regression Results
==============================================================================
coef std err t P>|t| [0.025 0.975]
------------------------------------------------------------------------------
const 320.6409 3951.557 0.081 0.936 -7897.071 8538.353
x1 1.3166 0.106 12.400 0.000 1.096 1.537
x2 1.6499 0.301 5.484 0.000 1.024 2.275
x3 2.1787 0.520 4.190 0.000 1.097 3.260
x4 -0.0056 0.477 -0.012 0.991 -0.997 0.985
x5 1.6843 0.214 7.864 0.000 1.239 2.130
x6 0.0103 0.013 0.769 0.451 -0.018 0.038
x7 0.0037 0.011 0.342 0.736 -0.019 0.026
x8 -19.1306 31.970 -0.598 0.556 -85.617 47.355
x9 50.5156 150.212 0.336 0.740 -261.868 362.899
==============================================================================
Omnibus: 4.552 Durbin-Watson: 2.334
Prob(Omnibus): 0.103 Jarque-Bera (JB): 3.059
Skew: -0.717 Prob(JB): 0.217
Kurtosis: 3.559 Cond. No. 3.76e+06
==============================================================================
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The condition number is large, 3.76e+06. This might indicate that there are
strong multicollinearity or other numerical problems.

In this output we mainly look at three columns: "coef", "t", and "P>|t|". coef is the regression coefficient discussed above, and the const row gives the regression constant, so the regression model we obtain is:

y = 320.640948 + 1.316588x1 + 1.649859x2 + 2.17866x3 - 0.005609x4 + 1.684283x5 + 0.01032x6 + 0.003655x7 - 19.130576x8 + 50.515575x9

The "t" and "P>|t|" columns are equivalent; pick either one when working. They are mainly used to judge whether each individual independent variable has a significant linear relationship with y, which we discuss below. The output also shows Prob (F-statistic) = 4.21e-20; this is the usual P value, and since it is close to zero, the multiple linear equation as a whole is significant, i.e. y has a significant linear relationship with x1, x2, ..., x9 taken together. R-squared is 0.992, which also indicates a strong linear fit.

In theory, with this multiple linear equation obtained and fitting quite well, we could already use it for prediction, but let us dig a little deeper.

As noted above, y has a significant linear relationship with x1, x2, ..., x9, but note that the 9 variables are treated as a whole here. A significant relationship between y and the whole set does not mean y has a significant linear relationship with each individual variable. We therefore want to find the variables whose linear relationship with y is not significant, remove them, and keep only the significant ones; this is the t-test mentioned earlier.

The principle of the t-test is somewhat involved; interested readers can look it up, and we will not repeat it here. We judge by the "P>|t|" column of the summary table above. Pick a threshold; common choices are 0.05, 0.02, or 0.01, and here we use 0.05. Any independent variable whose value in the P>|t| column exceeds 0.05 does not have a significant linear relationship with y and should be removed (this applies to x1 through x9 only, not to the const row). One rule applies: remove only one variable at a time, usually the one with the largest P value. In the table above the largest P value belongs to x4 (0.991), so we remove it first, then repeat the modeling with the remaining x1, x2, x3, x5, x6, x7, x8, x9, again find and remove the variable with the largest P value, and repeat until every P value is at most 0.05. The remaining variables all have a significant linear relationship with y, and these are the ones we use for modeling.

Here is the Python code:

# ==== Import the relevant libraries ====
import numpy as np
import pandas as pd
import statsmodels.api as sm  # multiple linear regression

# ==== Data preprocessing ====
file = r'data.xlsx'
data = pd.read_excel(file)
data.columns = ['y', 'x1', 'x2', 'x3', 'x4', 'x5', 'x6', 'x7', 'x8', 'x9']

# ==== Build the multiple linear model ====
x = sm.add_constant(data.iloc[:, 1:])  # independent variables x1~x9
y = data['y']            # dependent variable
model = sm.OLS(y, x)     # build the model
result = model.fit()     # fit the model
print(result.summary())  # model summary
'''
y has a significant linear relationship with x1, ..., x9 taken as a whole,
but that does not mean y has a significant linear relationship with each one.
The function below finds the predictors whose relationship with y is not
significant and removes them one at a time.
'''
# ==== Find which independent variables are significantly related to y ====
def looper(limit):  # limit is usually 0.05
    cols = ['x1', 'x2', 'x3', 'x4', 'x5', 'x6', 'x7', 'x8', 'x9']  # start from all predictors; the loop removes x4 and the rest automatically
    for i in range(len(cols)):
        data1 = data[cols]
        x = sm.add_constant(data1)           # independent variables
        y = data['y']                        # dependent variable
        model = sm.OLS(y, x)                 # build the model
        result = model.fit()                 # fit the model
        pvalues = result.pvalues             # all P values in the result
        pvalues.drop('const', inplace=True)  # drop the const row
        pmax = max(pvalues)                  # largest P value
        if pmax > limit:
            ind = pvalues.idxmax()           # name of the variable with the largest P value
            cols.remove(ind)                 # remove that variable from cols
        else:
            return result

result = looper(0.05)
print(result.summary())

4.3 Results

==============================================================================
coef std err t P>|t| [0.025 0.975]
------------------------------------------------------------------------------
const -1694.6269 562.977 -3.010 0.006 -2851.843 -537.411
x1 1.3642 0.086 15.844 0.000 1.187 1.541
x2 1.7679 0.201 8.796 0.000 1.355 2.181
x3 2.2894 0.349 6.569 0.000 1.573 3.006
x5 1.7424 0.191 9.111 0.000 1.349 2.136
==============================================================================
Omnibus: 3.769 Durbin-Watson: 2.401
Prob(Omnibus): 0.152 Jarque-Bera (JB): 2.493
Skew: -0.668 Prob(JB): 0.287
Kurtosis: 3.379 Cond. No. 5.74e+04
==============================================================================

From this output, the remaining effective variables are x1, x2, x3, and x5, and the multiple linear model we obtain is y = -1694.6269 + 1.3642x1 + 1.7679x2 + 2.2894x3 + 1.7424x5. This is the effective multiple linear model we will ultimately use.
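Rather than copying the numbers by hand, the final equation can be read directly off the fitted result; params is a standard attribute of statsmodels results (a Series indexed by variable name, including 'const'), so a small sketch:

final = looper(0.05)
terms = " + ".join(f"{c:.4f}*{name}" for name, c in final.params.items() if name != 'const')
print(f"y = {final.params['const']:.4f} + {terms}")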

Now the question arises again: between the model containing all independent variables and this reduced model, which should we choose? After all, the first model's overall fit is also quite significant. In the author's experience, it depends on the requirements of the specific project. The problems we meet in real projects are real-world cases, not pure math exercises. In this example, x8 (the consumer price index) and x9 (the regional unemployment rate) certainly have some influence on y; removing them blindly may harm the final result, so the decision should follow the actual needs.

5 Calculation example 4 - Matlab code implementation

5.1 Calculation example

The "overall score" is the score the PE teacher gave last year to each of 15 students, based on weight, lung capacity, 50 m sprint, one-minute sit-ups, long jump, 1000 m time, one-minute rope skipping, pull-ups, seated forward bend, and other data.

This year, because the school's equipment is limited, the PE teacher measured only three indicators for these 15 students: long jump, 1000 m time, and one-minute rope skipping.

The teacher now wants to know: can these three indicators be linearly combined into this year's overall score?

Sample      Long jump (cm)   1000 m time (s)   Rope skips (count/min)   Overall score (out of 100)
Sample 1    180              280               153                      60
Sample 2    201              240               170                      75
Sample 3    205              226               162                      70
Sample 4    208              224               160                      70
Sample 5    213              220               162                      75
Sample 6    217              217               165                      75
Sample 7    218              225               170                      85
Sample 8    222              221               168                      80
Sample 9    226              211               169                      80
Sample 10   230              213               179                      85
Sample 11   233              199               172                      90
Sample 12   238              198               172                      90
Sample 13   240              195               175                      90
Sample 14   242              186               181                      95
Sample 15   253              183               176                      96

5.2 Matlab code implementation

Syntax:

[b,bint,r,rint,stats] = regress(y,x,0.05);     % 95% confidence intervals

Notes:
① In regress(), the third argument α is the significance level (0.05 by default if omitted).

② b and bint are the estimated regression coefficients and their confidence intervals.

③ r and rint are the residuals (a vector) and their confidence intervals.

④ stats holds four statistics used to check the regression model: the goodness of fit R², the F statistic of the overall significance test of the equation, its p value, and an estimate s² of the error variance.
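For readers following along in Python, a rough analogue of what regress returns can be sketched with numpy alone; this is a sketch under the same convention (X already contains a leading column of ones), not a replacement for regress's full diagnostic output:

import numpy as np

def regress_py(y, X):
    # X must already contain a leading column of ones, as with Matlab's regress
    b, _, _, _ = np.linalg.lstsq(X, y, rcond=None)       # least-squares coefficients
    r = y - X @ b                                        # residuals
    R2 = 1 - np.sum(r**2) / np.sum((y - np.mean(y))**2)  # goodness of fit
    return b, r, R2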

clc;
clear;
close all;
%% Read the data
shuju=xlsread('case4.xlsx');
%shuju=xlsread('case5.xlsx');
% For a fit with 3 independent variables, x should be a 15-by-4 matrix:
% the first column is all ones, the second is x1, the third is x2, and the
% fourth is x3; 15 is the number of samples.
x1=shuju(:,1); % long jump score
x2=shuju(:,2); % 1000 m time
x3=shuju(:,3); % rope-skip count
y=shuju(:,4);  % overall score
len = length(y);
pelta = ones(len,1);
%% Multiple linear fit
x = [pelta, x1, x2, x3];
[b,bint,r,rint,stats]=regress(y,x,0.05); % 95% confidence intervals
%% Fitted function
Y_NiHe = b(1) + b(2) .* x1 + b(3) .* x2 + b(4) .* x3;
%% Visualization
figure(1);
hold on;
plot(x1,'m*-');
plot(x2,'y<-');
plot(x3,'ro-');
plot(y,'bh-');
plot(Y_NiHe,'gx-','LineWidth',1);
legend('Long jump (cm)','1000 m time (s)','Rope-skip count','Last year''s overall score (out of 100)','Multiple linear regression fit')
R_2 = 1 - sum( (Y_NiHe - y).^2 )./ sum( (y - mean(y)).^2 );
str = num2str(R_2);
disp(['Goodness of fit: ',str])
figure(2)
rcoplot(r,rint) % residual plot
title('Residual plot')
xlabel('Sample'); ylabel('Residual');

 

5.3 Results 

The final expression obtained is:

y = f(x_1, x_2, x_3) = -112.619 + 0.4461 x_1 + 0.068 x_2 + 0.473 x_3

Residual plot: the figure produced by rcoplot is omitted here.

6 Closing remarks

Multiple linear regression (Python&Matlab code implementation)




Author: python98k

Link: http://www.pythonblackhole.com/blog/article/83232/fddb1f54c57bf84d75c6/

Source: python black hole net

Please indicate the source for any form of reprinting. Any infringement discovered will be pursued legally.
