
Python Data Analysis Case 24: Lithium Battery Life Prediction Based on Deep Learning

posted on 2023-06-06


The case in this issue is more hard-core and is better suited to science and engineering students; humanities and social science students may prefer the earlier cases.


Case Background

The original article was published last year and has appeared in print. I am now sharing some of its code as a case study for students who need it.

Original article (available on CNKI; published in a Chinese core journal):

A Lithium Battery Life Prediction Method Based on Mode Decomposition and Machine Learning

Remaining useful life (RUL) is an important indicator in lithium-ion battery health management. Using battery capacity as the health indicator, the paper combines modal decomposition with machine learning algorithms to propose a CEEMDAN-RF-SED-LSTM method for predicting the RUL of lithium batteries.

First, CEEMDAN is used to decompose the battery capacity data. To keep the noise in the fluctuation components from hurting the model's predictive ability, without completely discarding the information those components contain, the paper uses the random forest (RF) algorithm to rank and quantify the importance of each component. These importance values serve as weights measuring each component's explanatory power for the original data. Each component is then predicted by its own neural network model, the predictions are weighted by the RF importances and summed, and the reconstructed result is the RUL prediction for the lithium-ion battery.

The article compares the prediction accuracy of single models and combined models; adding the RF weighting further improves all five neural networks. The method is benchmarked on the NASA dataset, and the experiments show that the CEEMDAN-RF-SED-LSTM model predicts battery RUL well, with lower errors than any single model.

That was the abstract. I will not go over the theory; it is all in the paper. This blog mainly shares the process of building time-series forecasts with these neural networks. It is only part of the code, not the full code of the paper.

The workflow is: use modal decomposition to decompose the battery capacity degradation curve, use random forest regression to obtain the weight coefficient of each modal component, and finally predict each component with a neural network and sum the weighted predictions. The encoder-decoder structure in the later part of the paper is not covered in this blog.
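To make the reconstruction step concrete before the full implementation: once each modal component has its own predicted series, the final capacity prediction is the importance-weighted sum of those series. A toy sketch with made-up numbers (the names here are placeholders, not the paper's code):

    import numpy as np

    # three fake component predictions (toy numbers, not battery data)
    component_preds = [np.array([1.00, 0.95, 0.90, 0.85, 0.80]),    # trend-like component
                       np.array([0.10, 0.08, 0.00, -0.05, -0.08]),  # fluctuation component
                       np.array([0.02, -0.01, 0.03, 0.00, -0.02])]  # noise-like component
    weights = [1.0, 0.2, 0.05]   # e.g. RF importances, with the dominant component set to 1
    capacity_pred = sum(w * p for w, p in zip(weights, component_preds))
    print(capacity_pred)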

Data Sources

NASA's battery dataset is quite old, and NASA seems to have taken it down last year, but copies are still easy to find online. The original data is stored in MATLAB files, so it must be processed and cleaned before the capacity series can be extracted and used.

The paper tests all four batteries; this blog uses the data of one battery, B0006, as the demonstration.
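If you only have the original .mat files, the sketch below shows one way to pull the discharge capacity series out of them. It assumes the commonly circulated layout of the NASA files (a B0006 struct containing a cycle array whose discharge entries carry a Capacity field); adjust the field names if your copy differs.

    from scipy.io import loadmat
    import pandas as pd

    mat = loadmat('B0006.mat')
    capacities = []
    for cycle in mat['B0006'][0, 0]['cycle'][0]:
        if cycle['type'][0] == 'discharge':   # keep discharge cycles only
            capacities.append(float(cycle['data'][0, 0]['Capacity'][0, 0]))
    pd.DataFrame({'B0006': capacities}).to_csv('NASA电容量.csv', index=False)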

Deep Learning Framework 

I use the Keras framework on top of TensorFlow, which is easy to get started with. PyTorch is very popular in academia, but its object-oriented style is genuinely harder for programming beginners to follow.


Code Implementation Preparation

Since this is code from a fairly systematic article, my code here has a clear division of labor, an engineering flavor, and a high degree of encapsulation. To make it easy to reuse, there are many imports and custom functions, so it takes some basic programming maturity to follow; it is not as simple as the earlier cases.

Import required packages

    import os
    import math
    import datetime
    import random as rn
    import numpy as np
    import pandas as pd
    import matplotlib.pyplot as plt
    %matplotlib inline
    plt.rcParams['font.sans-serif'] = 'SimHei'   # display Chinese characters in plots
    plt.rcParams['axes.unicode_minus'] = False   # display minus signs correctly
    from PyEMD import EMD, CEEMDAN, Visualisation
    from sklearn.model_selection import train_test_split
    from sklearn.preprocessing import MinMaxScaler
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.metrics import mean_absolute_error
    from sklearn.metrics import mean_squared_error
    import tensorflow as tf
    import keras
    from keras.models import Model, Sequential
    from keras.layers import GRU, Dense, Conv1D, MaxPooling1D, GlobalMaxPooling1D, Embedding, Dropout, Flatten, SimpleRNN, LSTM
    #from keras.callbacks import EarlyStopping
    #from tensorflow.keras import regularizers
    #from keras.utils.np_utils import to_categorical
    from tensorflow.keras import optimizers

Read the data, perform CEEMDAN modal decomposition, and plot the decomposition result:

    data0 = pd.read_csv('NASA电容量.csv', usecols=['B0006'])   # capacity series of battery B0006
    S1 = data0.values
    S = S1[:, 0]
    t = np.arange(0, len(S), 1)
    ceemdan = CEEMDAN()
    ceemdan.ceemdan(S)
    imfs, res = ceemdan.get_imfs_and_residue()   # modal components and residue
    print(len(imfs))
    vis = Visualisation()
    vis.plot_imfs(imfs=imfs, residue=res, t=t, include_residue=False)

The printed 4 is the number of modal components produced by the decomposition.
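As a quick sanity check (my addition, not in the original code), you can verify that the components plus the residue reconstruct the original signal, since PyEMD defines the residue as the signal minus the sum of the IMFs:

    print(np.allclose(imfs.sum(axis=0) + res, S))   # expected: True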

(figure: CEEMDAN decomposition of the B0006 capacity series)

Run a random forest regression of capacity on the 4 modal components:

    df = pd.DataFrame(imfs.T, columns=['imf'+str(i+1) for i in range(len(imfs))])
    df['capacity'] = data0.values
    X_train = df.iloc[:, :-1]
    y_train = df.iloc[:, -1]
    model = RandomForestRegressor(n_estimators=5000, max_features=2, random_state=0)
    model.fit(X_train, y_train)
    model.score(X_train, y_train)

The goodness of fit (R²) is 99.9%.

Plot the variable importance:

    model.feature_importances_
    sorted_index = model.feature_importances_.argsort()
    plt.barh(range(X_train.shape[1]), model.feature_importances_[sorted_index])
    plt.yticks(np.arange(X_train.shape[1]), X_train.columns[sorted_index])
    plt.xlabel('Feature Importance')
    plt.ylabel('Feature')
    plt.title('Random Forest')
    plt.tight_layout()

(figure: random forest feature importance of the modal components)

Record the component names and their importance weights:

    imf_names = X_train.columns[sorted_index][::-1]              # component names, most important first
    imf_weight = model.feature_importances_[sorted_index][::-1]  # matching importance values
    imf_weight[0] = 1                                            # give the most important component full weight
    #imf_names, imf_weight
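You can print the ranking to see what the weights look like (a small check I added, not in the paper's code):

    for name, w in zip(imf_names, imf_weight):
        print(name, round(float(w), 4))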

Define a random-seed function and the error metric functions:

    def set_my_seed():
        os.environ['PYTHONHASHSEED'] = '0'
        np.random.seed(1)
        rn.seed(12345)
        tf.random.set_seed(123)

    def evaluation(y_test, y_predict):
        mae = mean_absolute_error(y_test, y_predict)
        mse = mean_squared_error(y_test, y_predict)
        rmse = math.sqrt(mean_squared_error(y_test, y_predict))
        mape = (abs(y_predict - y_test) / y_test).mean()
        return mae, rmse, mape

    def relative_error(y_test, y_predict, threshold):
        # find the first cycle at which each curve drops to the failure
        # threshold, then compare those cycle indices
        true_re, pred_re = len(y_test), 0
        for i in range(len(y_test) - 1):
            if y_test[i] <= threshold >= y_test[i+1]:
                true_re = i - 1
                break
        for i in range(len(y_predict) - 1):
            if y_predict[i] <= threshold:
                pred_re = i - 1
                break
        return abs(true_re - pred_re) / true_re
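A toy check of these metrics (made-up numbers, not battery data) to show how they are called:

    y_true = np.array([2.0, 1.9, 1.5, 1.38, 1.3])
    y_hat = np.array([2.0, 1.88, 1.52, 1.40, 1.31])
    print(evaluation(y_true, y_hat))                          # (mae, rmse, mape)
    print(relative_error(y_true, y_hat, threshold=2 * 0.7))   # RUL error at the 1.4 Ah threshold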

Define the function that builds sliding-window sequences, and the function that derives the explanatory variables and response variable of the training and test sets from the series:

    def build_sequences(text, window_size=4):
        # text: list of capacity values
        x, y = [], []
        for i in range(len(text) - window_size):
            sequence = text[i:i+window_size]
            target = text[i+window_size]
            x.append(sequence)
            y.append(target)
        return np.array(x), np.array(y)

    def get_traintest(data, train_size=len(data0), window_size=4):
        train = data[:train_size]
        test = data[train_size - window_size:]
        X_train, y_train = build_sequences(train, window_size=window_size)
        X_test, y_test = build_sequences(test, window_size=window_size)
        return X_train, y_train, X_test, y_test
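A quick illustration of the sliding window on a toy sequence:

    xs, ys = build_sequences(np.arange(10), window_size=4)
    print(xs.shape, ys.shape)   # (6, 4) (6,)
    print(xs[0], ys[0])         # [0 1 2 3] 4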

Define the function that builds the models, the function that plots the training loss, and the function that evaluates and compares the fit:

    def build_model(X_train, mode='LSTM', hidden_dim=[32, 16]):
        set_my_seed()
        model = Sequential()
        if mode == 'RNN':
            # simple RNN
            model.add(SimpleRNN(hidden_dim[0], return_sequences=True, input_shape=(X_train.shape[-2], X_train.shape[-1])))
            model.add(SimpleRNN(hidden_dim[1]))
        elif mode == 'MLP':
            model.add(Dense(hidden_dim[0], activation='relu', input_shape=(X_train.shape[-1],)))
            model.add(Dense(hidden_dim[1], activation='relu'))
        elif mode == 'LSTM':
            # LSTM
            model.add(LSTM(hidden_dim[0], return_sequences=True, input_shape=(X_train.shape[-2], X_train.shape[-1])))
            model.add(LSTM(hidden_dim[1]))
        elif mode == 'GRU':
            # GRU
            model.add(GRU(hidden_dim[0], return_sequences=True, input_shape=(X_train.shape[-2], X_train.shape[-1])))
            model.add(GRU(hidden_dim[1]))
        elif mode == 'CNN':
            # 1D convolution
            model.add(Conv1D(hidden_dim[0], 3, activation='relu', input_shape=(X_train.shape[-2], X_train.shape[-1])))
            model.add(GlobalMaxPooling1D())
        model.add(Dense(1))
        model.compile(optimizer='Adam', loss='mse', metrics=[tf.keras.metrics.RootMeanSquaredError(), "mape", "mae"])
        return model

    def plot_loss(hist, imfname):
        plt.subplots(1, 4, figsize=(16, 2))
        for i, key in enumerate(hist.history.keys()):
            n = int(str('14') + str(i + 1))   # subplot position 141, 142, ...
            plt.subplot(n)
            plt.plot(hist.history[key], 'k', label=f'Training {key}')
            plt.title(f'{imfname} Training {key}')
            plt.xlabel('Epochs')
            plt.ylabel(key)
            plt.legend()
        plt.tight_layout()
        plt.show()

    def evaluation_all(df_RFW_eval_all, df_eval_all, mode, Rated_Capacity=2, show_fit=True):
        df_RFW_eval_all['all_pred'] = df_RFW_eval_all.iloc[:, 1:].sum(axis=1)
        df_eval_all['all_pred'] = df_eval_all.iloc[:, 1:].sum(axis=1)
        MAE1, RMSE1, MAPE1 = evaluation(df_RFW_eval_all['capacity'], df_RFW_eval_all['all_pred'])
        RE1 = relative_error(df_RFW_eval_all['capacity'], df_RFW_eval_all['all_pred'], threshold=Rated_Capacity * 0.7)
        MAE2, RMSE2, MAPE2 = evaluation(df_eval_all['capacity'], df_eval_all['all_pred'])
        RE2 = relative_error(df_eval_all['capacity'], df_eval_all['all_pred'], threshold=Rated_Capacity * 0.7)
        df_RFW_eval_all.rename(columns={'all_pred': 'predict', 'capacity': 'actual'}, inplace=True)
        if show_fit:
            df_RFW_eval_all.loc[:, ['predict', 'actual']].plot(figsize=(10, 4), title=f'Fit of CEEMDAN+RF+{mode}')
        print(f'CEEMDAN+RF+{mode}: mae:{MAE1}, rmse:{RMSE1}, mape:{MAPE1}, re:{RE1}')
        print(f'CEEMDAN+{mode}: mae:{MAE2}, rmse:{RMSE2}, mape:{MAPE2}, re:{RE2}')
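As a quick check that the builder works (my addition, not from the paper), build one model on dummy data and print its structure:

    X_demo = np.zeros((10, 8, 1))   # (samples, window, features)
    m = build_model(X_demo, mode='LSTM', hidden_dim=[32, 16])
    m.summary()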

Define the training function

    def train_fuc(mode='LSTM', window_size=8, batch_size=32, epochs=100, hidden_dim=[32, 16],
                  Rated_Capacity=2, show_imf=False, show_loss=True, show_fit=True):
        df_RFW_eval_all = pd.DataFrame(df['capacity'])
        df_eval_all = pd.DataFrame(df['capacity'])
        for i, imfname in enumerate(imf_names):
            print(f'Processing component signal: {imfname}')
            data = df[imfname]
            X_train, y_train, X_test, y_test = get_traintest(data.values, window_size=window_size, train_size=len(data))
            if mode != 'MLP':
                X_train = X_train.reshape((X_train.shape[0], X_train.shape[1], 1))
            #print(X_train.shape, y_train.shape)
            start = datetime.datetime.now()
            set_my_seed()
            model = build_model(X_train=X_train, mode=mode, hidden_dim=hidden_dim)
            hist = model.fit(X_train, y_train, batch_size=batch_size, epochs=epochs, verbose=0)
            if show_loss:
                plot_loss(hist, imfname)
            # recursive prediction: seed with the first window_size real values,
            # then feed each prediction back in to forecast the next point
            point_list = list(data[:window_size].values.copy())
            y_pred = []
            while len(point_list) < len(data.values):
                x = np.reshape(np.array(point_list[-window_size:]), (-1, window_size)).astype(np.float32)
                if mode != 'MLP':
                    x = x.reshape(-1, window_size, 1)   # recurrent/CNN models expect 3D input
                pred = model.predict(x, verbose=0)
                next_point = pred[0, 0]
                point_list.append(next_point)    # append to the sequence to predict the next point
            y_pred.append(point_list)            # save all predicted values for this component
            y_pred = np.array(y_pred).T.ravel()  # flatten to a 1D series
            #print(y_pred.shape)
            end = datetime.datetime.now()
            if show_imf:
                df_eval = pd.DataFrame()
                df_eval['actual'] = data.values
                df_eval['pred'] = y_pred
                mae, rmse, mape = evaluation(y_test=data.values, y_predict=y_pred)
                print(f'Metrics for component {imfname}: mae:{mae}, rmse:{rmse}, mape:{mape}')
            df_eval_all[imfname + '_w_pred'] = y_pred
            df_RFW_eval_all[imfname + '_w_pred'] = y_pred * imf_weight[i]
            print('=' * 100)
        evaluation_all(df_RFW_eval_all, df_eval_all, mode=mode, Rated_Capacity=Rated_Capacity, show_fit=show_fit)
        print(f'running time is {end - start}')

The training function calls every custom function defined above, so it is worth understanding what each of them does before running it. Note that prediction is recursive: the model is seeded with the first window_size real values, and every later point is generated from the model's own previous predictions.

The initialization values below are the default values of the hyperparameters.

    window_size = 8
    batch_size = 16
    epochs = 100
    hidden_dim = [32, 16]
    Rated_Capacity = 2
    show_fit = True
    show_loss = True
    mode = 'LSTM'   # or RNN, GRU, CNN, MLP

window_size is the size of the sliding sequence window.

batch_size is the training batch size.

epochs is the number of training epochs.

hidden_dim is the number of neurons in each hidden layer of the neural network.

Rated_Capacity is the rated (initial) capacity of the battery, which is 2 Ah for the NASA cells; the failure threshold used by the RE metric is 70% of this value.

show_fit controls whether the fit plot is displayed.

show_loss controls whether the training-loss plot is displayed.

mode is the neural network model type.
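For example, to run with these defaults but also print the per-component metrics (show_imf) and skip the loss plots:

    train_fuc(mode=mode, window_size=window_size, batch_size=batch_size, epochs=epochs,
              hidden_dim=hidden_dim, Rated_Capacity=Rated_Capacity,
              show_imf=True, show_loss=False)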


Model Training and Evaluation

The code above encapsulates the whole process; training and evaluation from here on only require changing parameters.


LSTM prediction

    mode = 'LSTM'
    set_my_seed()
    train_fuc(mode=mode, window_size=window_size, batch_size=batch_size, epochs=epochs, hidden_dim=hidden_dim, Rated_Capacity=Rated_Capacity)

The output prints the training-loss curves of each component, the point-estimate metrics, and finally the overall evaluation metrics with and without the random forest weighting.

(My Anaconda has since been reinstalled and the environment changed, so I can no longer reproduce the exact values in the paper, but the difference is small: for example, the MAE here is 0.039368 versus 0.039161 in the paper, and the other metrics are similarly close.)


To change other parameters, just pass them to the training function. For example, to use a sliding window of 16:

    train_fuc(window_size=16)

Run it to get the result; the output figure is long, so it is not shown here.

The default model in my training function is LSTM (because it works best)

If you want to change the number of neurons in the hidden layer, you can write it like this:

    train_fuc(hidden_dim=[64,32])

Very concise and convenient.


RNN prediction

Just modify the mode parameter:

    mode = 'RNN'
    set_my_seed()
    train_fuc(mode=mode, window_size=window_size, batch_size=32, epochs=epochs, hidden_dim=hidden_dim, Rated_Capacity=Rated_Capacity)

The figure is too long, so I only show the final evaluation metrics. (As before, since the running environment was reinstalled, the current results differ slightly from those in the paper.)

(Screenshot of the paper)



GRU prediction

    mode = 'GRU'
    set_my_seed()
    train_fuc(mode=mode, window_size=window_size, batch_size=batch_size, epochs=epochs, hidden_dim=hidden_dim, Rated_Capacity=Rated_Capacity)



1D CNN prediction

    mode = 'CNN'
    set_my_seed()
    train_fuc(mode=mode, window_size=window_size, batch_size=batch_size, epochs=epochs, hidden_dim=hidden_dim, Rated_Capacity=Rated_Capacity)



MLP prediction

    mode = 'MLP'
    set_my_seed()
    train_fuc(mode=mode, window_size=window_size, batch_size=batch_size, epochs=90, hidden_dim=hidden_dim, Rated_Capacity=Rated_Capacity)


I did not spend much time tuning the other hyperparameters, because each neural network run takes a long time. Interested students can try more hyperparameter settings and may get better prediction results.
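If you want to try that systematically, a simple (and slow) grid sketch could look like this; the combinations shown are just examples, not tuned values:

    for ws in [4, 8, 16]:
        for hd in [[32, 16], [64, 32]]:
            print(f'window_size={ws}, hidden_dim={hd}')
            set_my_seed()
            train_fuc(mode='LSTM', window_size=ws, hidden_dim=hd, show_loss=False, show_fit=False)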

Figures from the paper:

 




Author: Soledad

Link: http://www.pythonblackhole.com/blog/article/79962/d825607f1817c61f3e37/

Source: python black hole net

Please credit the source when reprinting in any form. Infringement will be pursued legally.
