[LSTM regression prediction] TPA-LSTM: temporal pattern attention with a long short-term memory neural network for regression prediction in MATLAB (multiple inputs, single output) [includes Matlab source code, issue 1984]
2022-07-18 11:56:00 【Poseidon light】
One. Temporal pattern attention BiLSTM prediction
1 BiLSTM principle and structure
LSTM was proposed in 1997 to handle long-time-series problems. A typical LSTM structure is shown in Figure 2.
In Figure 2, x_t denotes the current input of the time series; C_t is the cell state of the current LSTM unit, which usually flows only inside the LSTM and serves as its internal memory; h_t denotes the current encoded hidden-state vector; f_t denotes the degree to which past information is forgotten; i_t denotes the degree to which input information is retained; C̃_t denotes the candidate information of the current state; o_t denotes the degree to which output information is retained; and tanh denotes the hyperbolic tangent function. The subscript t-1 denotes the corresponding state of the LSTM unit at the previous time step.
An LSTM unit has three gates: the forget gate, the input gate, and the output gate. The forget gate discards a certain proportion of past information; the input gate writes part of the current input information into the cell state; and the output gate selectively encodes the hidden-state vector and the cell state as the input of the LSTM unit at the next time step.
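A minimal numerical sketch of one LSTM step, following the gate definitions above (the weights W and biases b here are hypothetical, randomly initialized placeholders, not the parameters used in the source code below):

% One LSTM cell step. Assumed sizes: d-dimensional input x_t, m-dimensional hidden state h_t.
d = 4; m = 8;
xt = randn(d,1); ht_1 = randn(m,1); Ct_1 = randn(m,1);
Wf = randn(m,m+d); bf = zeros(m,1); % forget gate
Wi = randn(m,m+d); bi = zeros(m,1); % input gate
Wc = randn(m,m+d); bc = zeros(m,1); % candidate cell state
Wo = randn(m,m+d); bo = zeros(m,1); % output gate
z = [ht_1; xt];                     % previous hidden state concatenated with current input
sigmoid = @(a) 1./(1+exp(-a));
ft = sigmoid(Wf*z + bf);            % how much past cell state to forget
it = sigmoid(Wi*z + bi);            % how much new information to keep
Ct_cand = tanh(Wc*z + bc);          % candidate cell state
Ct = ft.*Ct_1 + it.*Ct_cand;        % updated cell state
ot = sigmoid(Wo*z + bo);            % how much state to expose
ht = ot.*tanh(Ct);                  % new hidden state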
The output at the current moment may be related not only to past information but also to future information, yet an LSTM cannot encode information from back to front. A BiLSTM reverses the time series and combines a forward and a backward LSTM, so it better captures the influence of the sequence in both directions. The BiLSTM output can be written as

h_t = concat(h_t^f, h_t^b)

where h_t denotes the hidden-state vector of the BiLSTM; concat denotes concatenation along the output dimension; and h_t^f and h_t^b denote the hidden-state vectors of the forward and backward LSTMs.
The BiLSTM structure is shown in Figure 3. By feeding the reversed forward sequence into the backward LSTM, the two networks can be trained simultaneously: the forward LSTM uses past information to predict the future, the backward LSTM uses future information to predict the past, and the final output is determined jointly by the outputs of the two networks. BiLSTM therefore predicts better for time series that depend on both past and future information, so this paper uses a BiLSTM neural network for bidirectional prediction of wind power.
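For illustration, such a network can be sketched with MATLAB's built-in bilstmLayer (an assumed generic setup, separate from the custom dlarray model in the partial source code below; numFeatures and numHiddenUnits are illustrative values):

% Generic BiLSTM regression network sketch (Deep Learning Toolbox)
numFeatures = 4; numHiddenUnits = 50;
layers = [ ...
    sequenceInputLayer(numFeatures)
    bilstmLayer(numHiddenUnits,'OutputMode','last') % forward and backward hidden states are concatenated
    fullyConnectedLayer(1)                          % single regression output
    regressionLayer];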
2 TPA Mechanism
The attention mechanism imitates the human brain: it pays more attention to important information and ignores relatively useless information. It has been widely used in natural language processing, image recognition, and speech recognition, and in recent years also in various prediction problems. The traditional attention mechanism assigns weights to different time points and works well when each time step contains only one variable. For predicting the power of multiple wind turbines in a region, however, each time step contains multiple variables, there may be complex nonlinear relations among them, and each variable sequence has its own characteristics and period, so it is hard to single out one time step as the focus of attention. TPA instead uses multiple one-dimensional CNN filters to extract features from the row vectors of the BiLSTM hidden states, so the model can learn the interdependence of multiple variables across different time steps. The TPA structure is shown in Figure 4.
Figure 2: A typical LSTM structure
Figure 3: BiLSTM structure sketch
The original time series is processed by the BiLSTM, yielding h_{t-w}, ..., h_t, the hidden-state vectors corresponding to the inputs at different times, where w is the length of the time series. Define the hidden-state matrix H = (h_{t-w}, h_{t-w+1}, ..., h_{t-1}). Each column of H collects, for one time step, the variables formed by the parameters of the BiLSTM's internal gate neurons, while each row represents the state of a single variable across all time steps.
In Figure 4, the boxes drawn on the hidden-state matrix H represent different one-dimensional convolution kernels. One-dimensional convolutions are applied along the m rows of H to extract the temporal pattern matrix H^C of the variable signals:

H^C_{i,j} = H_i * C_j

where C_j denotes the j-th filter of length T; T denotes the maximum length to attend to, usually taken as w; and * denotes convolution. There are k one-dimensional filter kernels, and each kernel is convolved along a row vector of the hidden-state matrix. The temporal pattern matrix captures the complex internal and temporal relations among the different sequences; it is a high-dimensional embodiment of their nonlinear relationships.
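A minimal sketch of this step, assuming T = w and small illustrative sizes (H and the filters in C are random placeholders here):

% Temporal pattern matrix H^C. With filter length T = w, a 'valid' 1-D
% convolution along a row of H reduces to one inner product per (row, filter) pair.
m = 8; w = 24; k = 5;
H = randn(m,w);  % row i: variable i across all w time steps
C = randn(k,w);  % row j: the j-th one-dimensional filter
HC = zeros(m,k); % temporal pattern matrix, m-by-k
for i = 1:m
    for j = 1:k
        HC(i,j) = H(i,:) * C(j,:)'; % valid convolution of two length-w vectors
    end
end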
The following attention scoring function is defined to compute the relevance:

α_i = σ(H^C_i W_a^T h_t)

where H^C_i is the i-th row vector of H^C; W_a is an m×k weight matrix; α_i is the attention weight; and σ denotes the sigmoid function. The attention weights α_i are used to form a weighted sum of the rows of H^C, giving the attention vector v_t:

v_t = Σ_{i=1}^{n} α_i H^C_i

where n denotes the number of features of the input variable x.
Finally, v_t and h_t are linearly mapped and added to obtain the final predicted value:

h'_t = W_h h_t + W_v v_t
y_{t-1+Δ} = W_{h'} h'_t

where y_{t-1+Δ} denotes the final predicted value; h'_t is the intermediate variable used to generate it; Δ denotes the prediction horizon of the given prediction task; and W_{h'}, W_h, and W_v are the weight matrices of the corresponding variables.
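A standalone sketch of the scoring and output step under the same assumed sizes (Wa, Wh, Wv, and Whp are hypothetical weight matrices; Whp plays the role of W_h'):

% TPA attention scoring and final output
m = 8; k = 5;
HC = randn(m,k); % temporal pattern matrix from the previous step
ht = randn(m,1); % current BiLSTM hidden-state vector
Wa = randn(m,k); % m-by-k attention weight matrix
sigmoid = @(a) 1./(1+exp(-a));
alpha = zeros(m,1);
for i = 1:m
    alpha(i) = sigmoid(HC(i,:) * Wa.' * ht); % relevance score of row i
end
vt = HC.' * alpha;           % attention vector: weighted sum of the rows of HC (k-by-1)
Wh = randn(m,m); Wv = randn(m,k);
hp = Wh*ht + Wv*vt;          % intermediate variable h'
Whp = randn(1,m);
y = Whp*hp;                  % final prediction y_{t-1+Delta}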
A traditional attention mechanism applies CNN feature extraction directly to the raw time series, so it can only extract the temporal features of a single sequence and cannot account for the correlations between different sequences. The variables of the BiLSTM hidden-state matrix, in contrast, contain the complex relationships among different sequences at different time steps; applying CNN feature extraction to the row vectors of the hidden-state matrix extracts the temporal relations and the cross-variable relations at the same time. Moreover, the attention vector v_t is a weighted sum of the rows of the temporal pattern matrix, which carry time information, so the model can select relevant information from different time steps. For problems such as ultra-short-term power prediction of multiple wind turbines, where the dependencies across time steps and across sequences are complex and nonlinear, TPA shows unique advantages.
Two. Partial source code
% Dataset: columns are features, rows are samples
%% Clear the command window and workspace variables
clc
clear
close all
%% Path settings
addpath('./')
%% Data import and processing
load('./Train.mat')
Train.weekend = dummyvar(Train.weekend);
Train.month = dummyvar(Train.month);
Train = movevars(Train,{'weekend','month'},'After','demandLag');
Train.ts = [];
% Train.hour = dummyvar(Train.hour);
% Inspect the variable formats in the workspace and adapt the preceding steps to your data
Train(1,:) =[];
y = Train.demand;
x = Train{:,2:5};
[xnorm,xopt] = mapminmax(x',0,1);
[ynorm,yopt] = mapminmax(y',0,1);
%
% xnorm = [xnorm;Train.weekend';Train.month'];
%%
% x = x';
xnorm = xnorm(:,1:1000);
ynorm = ynorm(1:1000);
k = 24; % lag (window) length
% build lagged sliding-window samples
for i = 1:length(ynorm)-k
Train_xNorm(:,i,:) = xnorm(:,i:i+k-1);
Train_yNorm(i) = ynorm(i+k-1);
Train_y(i) = y(i+k-1);
end
Train_yNorm= Train_yNorm';
ytest = Train.demand(1001:1170);
xtest = Train{1001:1170,2:5};
[xtestnorm] = mapminmax('apply', xtest',xopt);
[ytestnorm] = mapminmax('apply',ytest',yopt);
% xtestnorm = [xtestnorm; Train.weekend(1001:1170,:)'; Train.month(1001:1170,:)'];
xtest = xtest';
for i = 1:length(ytestnorm)-k
Test_xNorm(:,i,:) = xtestnorm(:,i:i+k-1);
Test_yNorm(i) = ytestnorm(i+k-1);
Test_y(i) = ytest(i+k-1);
end
Test_yNorm = Test_yNorm';
clear k i x y
%
Train_xNorm = dlarray(Train_xNorm,'CBT');
Train_yNorm = dlarray(Train_yNorm,'BC');
Test_xNorm = dlarray(Test_xNorm,'CBT');
Test_yNorm = dlarray(Test_yNorm,'BC');
%% Split into training and validation sets
TrainSampleLength = length(Train_yNorm);
validatasize = floor(TrainSampleLength * 0.1);
Validata_xNorm = Train_xNorm(:,end-validatasize+1:end,:);
Validata_yNorm = Train_yNorm(:,TrainSampleLength-validatasize+1:end);
Validata_y = Train_y(TrainSampleLength-validatasize+1:end); % +1 avoids overlapping the last training sample
Train_xNorm = Train_xNorm(:,1:end-validatasize,:);
Train_yNorm = Train_yNorm(:,1:end-validatasize);
Train_y = Train_y(1:end-validatasize);
%%
% Parameter settings
inputSize = size(Train_xNorm,1); % feature dimension of the input x
outputSize = 1; % dimension of the output y
numhidden_units1=50;
[params,~] = paramsInit(numhidden_units1,inputSize,outputSize); % initialize model parameters
[~,validatastate] = paramsInit(numhidden_units1,inputSize,outputSize); % initialize validation state
[~,TestState] = paramsInit(numhidden_units1,inputSize,outputSize); % initialize test state
% Training-related parameters (the TrainOptions script presumably defines
% numEpochs, minibatchsize, and validationFrequency used below)
TrainOptions;
numIterationsPerEpoch = floor((TrainSampleLength-validatasize)/minibatchsize);
LearnRate = 0.01;
%% Loop over epochs.
figure
start = tic;
lineLossTrain = animatedline('color','r');
validationLoss = animatedline('color',[0 0 0]./255,'Marker','o','MarkerFaceColor',[150 150 150]./255);
xlabel('Iteration')
ylabel('Loss')
% per-epoch updates
iteration = 0;
averageGrad = []; averageSqGrad = []; % Adam moment estimates; adamupdate expects them initialized (empty on first call)
for epoch = 1 : numEpochs
[~,state] = paramsInit(numhidden_units1,inputSize,outputSize); % re-initialize state at the start of each epoch
disp(['Epoch: ', int2str(epoch)])
% per-minibatch updates
for i = 1 : numIterationsPerEpoch
iteration = iteration + 1;
disp(['Iteration: ', int2str(iteration)])
idx = (i-1)*minibatchsize+1:i*minibatchsize;
dlX = gpuArray(Train_xNorm(:,idx,:));
dlY = gpuArray(Train_yNorm(idx));
[gradients,loss,state] = dlfeval(@TPAModel,dlX,dlY,params,state);
% L2 Regularization
% L2regulationFactor = 0.000011;
% gradients = dlupdate( @(g,parameters) L2Regulation(g,parameters,L2regulationFactor),gradients,params);
% gradients = dlupdate(@(g) thresholdL2Norm(g, gradientThreshold),gradients);
[params,averageGrad,averageSqGrad] = adamupdate(params,gradients,averageGrad,averageSqGrad,iteration,LearnRate);
% Evaluate on the validation set
if iteration == 1 || mod(iteration,validationFrequency) == 0
output_Ynorm = TPAModelPredict(gpuArray(Validata_xNorm),params,validatastate);
lossValidation = mse(output_Ynorm, gpuArray(Validata_yNorm));
end
Three. Running results
Four. Matlab version and references
1 Matlab version
2014a
2 Reference
[1] Wang Yuhong, Shi Yunxiang. Ultra-short-term power prediction of multiple wind turbines based on temporal-pattern-attention BiLSTM [J]. High Voltage Engineering, 2022, 48(05).
3 Remarks
This introduction is taken from online sources and is for reference only; in case of infringement, please contact us for deletion.