Python machine learning bug: ValueError: Expected 2D array, got 1D array instead

posted on 2023-05-03 19:54

0 Preface

When learning machine learning, one-dimensional arrays are often used for testing because they are easy to understand and inspect. Beginners will almost inevitably fall into this pit. The bug is simple to fix: convert the one-dimensional array into a two-dimensional one.

Related environment:

Windows 64-bit
Python 3.9
scikit-learn 1.0.2
pandas 1.4.2

1 Reproducing the error

Let's reproduce the problem with a simple example and then see how to deal with it.
While training a linear regression, reading the data and plotting went smoothly, but the model training step raised an error. Read literally, the message says that the model expects a two-dimensional array but received a one-dimensional one. Judging from the array printed in the message, the problem lies with X.
The relevant code is as follows:

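# Test code
import pandas as pd
# Import the linear model from sklearn
from sklearn.linear_model import LinearRegression

data = pd.DataFrame({'x':[1,2,3,4,5,6],'y':[3,4,5,6,7,8]})
X = data.loc[:,'x']
y = data.loc[:,'y']

# Instantiate the linear model
lr_model = LinearRegression()

# Train the model (X is a one-dimensional Series here, so fit() raises the ValueError)
lr_model.fit(X,y)
# Predict on the training inputs X
y_predict = lr_model.predict(X)
# Predict the result for one specific value
y_p = lr_model.predict([[3.5]])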

Error message:

ValueError: Expected 2D array, got 1D array instead: array=[1 2 3 4 5 6].
Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample.
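
The two reshape calls named in the message can be tried on their own. The following is a minimal sketch using NumPy only; the variable names arr, as_column, and as_row are illustrative and do not come from the original code. The -1 tells NumPy to infer that dimension from the array's length.

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5, 6])   # shape (6,): one-dimensional

# Single feature: 6 samples, each with 1 feature (a column)
as_column = arr.reshape(-1, 1)

# Single sample: 1 sample with 6 features (a row)
as_row = arr.reshape(1, -1)

print(arr.shape)        # (6,)
print(as_column.shape)  # (6, 1)
print(as_row.shape)     # (1, 6)
```

In the example above, x holds six observations of a single feature, so the column form, reshape(-1, 1), is the one that applies.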

2 Solutions

The error message itself suggests the fix: add one step that converts the offending array, array=[1 2 3 4 5 6], into a two-dimensional array with array.reshape(-1, 1). The code is as follows; lines 12~13 of the listing (the comment and the reshape call just before training) are new and convert the Series into a two-dimensional array.

import pandas as pd
# Import the linear model from sklearn
from sklearn.linear_model import LinearRegression

data = pd.DataFrame({'x':[1,2,3,4,5,6],'y':[3,4,5,6,7,8]})
X = data.loc[:,'x']
y = data.loc[:,'y']

# Instantiate the linear model
lr_model = LinearRegression()

# Convert the Series into a two-dimensional array
X = X.values.reshape(-1, 1)
# Train the model
lr_model.fit(X,y)
# Predict on the training inputs X
y_predict = lr_model.predict(X)
# Predict the result for one specific value
y_p = lr_model.predict([[3.5]])
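
To see what the added step changes, the shapes can be inspected before and after the conversion. This is a small sketch using the same data; the names X_series and X_2d are introduced here only for illustration.

```python
import pandas as pd

data = pd.DataFrame({'x': [1, 2, 3, 4, 5, 6], 'y': [3, 4, 5, 6, 7, 8]})

# Selecting one column with .loc gives a one-dimensional pandas Series
X_series = data.loc[:, 'x']

# .values gives the underlying NumPy array; reshape(-1, 1) makes it a column
X_2d = X_series.values.reshape(-1, 1)

print(X_series.shape, X_series.ndim)  # (6,) 1
print(X_2d.shape, X_2d.ndim)          # (6, 1) 2
```

Checking ndim on the input before calling fit is a quick way to catch this error early.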

Besides array.reshape(-1, 1), you can also take a two-dimensional selection from data right at the start, when assigning the variable X, and use that directly: X = data[['x']]. The code is as follows:

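import pandas as pd
# Import the linear model from sklearn
from sklearn.linear_model import LinearRegression

data = pd.DataFrame({'x':[1,2,3,4,5,6],'y':[3,4,5,6,7,8]})
# X = data.loc[:,'x']
# .values converts the DataFrame to an array; skipping it does not change the final result,
# but predicting on a plain list then triggers a warning: X does not have valid feature names
X = data[['x']].values
y = data.loc[:,'y']

# Instantiate the linear model
lr_model = LinearRegression()

# Train the model
lr_model.fit(X,y)
# Predict on the training inputs X (the original wrote lowercase x, which is undefined)
y_predict = lr_model.predict(X)
# Predict the result for one specific value
y_p = lr_model.predict([[3.5]])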
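
If you prefer to keep X as a DataFrame rather than calling .values, one way to avoid the feature-name warning is to pass a DataFrame with the same column name to predict as well. The sketch below assumes scikit-learn 1.0 or later; new_points is a hypothetical name introduced for illustration.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

data = pd.DataFrame({'x': [1, 2, 3, 4, 5, 6], 'y': [3, 4, 5, 6, 7, 8]})

X = data[['x']]   # double brackets: DataFrame with shape (6, 1)
y = data['y']     # single brackets: Series with shape (6,)

lr_model = LinearRegression()
lr_model.fit(X, y)   # the model records the feature name 'x'

# Hypothetical input: a DataFrame whose column name matches the one used in fit
new_points = pd.DataFrame({'x': [3.5]})
y_p = lr_model.predict(new_points)

print(X.shape, y.shape)
print(y_p)
```

Since the sample data follows y = x + 2 exactly, the prediction for 3.5 should come out very close to 5.5.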