EE 380L

Artificial Neural Systems

Spring 1999
 
 

Time series prediction by neural networks
 
 

Presentation

Frank Breitling
Alexander Zemlianov
Taras Kirichenko
 
 
 
 
 
 
 
 
 

I. Time series analysis


The data

Example plot of the time series (N1).

x1, y1 (detector 1)

x2, y2 (detector 2)

A little Matlab program adjusts the outliers. Every red dot shows the adjusted samples.

Then the data is normalized to zero mean and standard deviation 1 for further processing.


 
 

Power Spectrum

The first thing one can think of is looking at the power spectrum. This is a very successful method for linear systems. Apparently on the laser data it fails (plot for data set N1, x1). The peak close to zero is probably not an actual signal but cause by noise and the limited data size. Anyway this peak would represent a very low frequency around 0.1Hz (since fmax = the Nyquist frequency which is 10Hz at the sample rate of 20Hz) that would be automatically corrected by a simple feed back control.


 
 
 
 

Autocorrelation function

Next the the auto correlation function C(t ) of the data was examined. C(t ) is defined by:
 
 

It tuns out, that there is at a certain correlation between the points what can be infered from the periodic oscillation. The smoothness and the absence of noise are remarkable.

The following zoomed graphs give a better insight.

It seems that the x and y components have slightly different correlation.

Embedologie and Dimensional estimation


 
 

Since this the autocorrelation function gives hints towards a high dimensional chaotic system, since the first minimum can be found at about 100 we were going to determined the dimension of the data by the method by Grassberger and Procaccia [Physica D 9, 189 (1983)]. Their calculation is based on a box counting algorithm. By looking at the increase of the number of neighbors by increasing the distance r one can determined the dimension (D). Since the number of neighbor points is proportional to the growth of volume (V) in the D dimensional hyper-space and V is proportional to rD for small values of r, where edge boundary effects and the limitation of the data set can be neglected, the Number of neighbors given by the correlation integral C(r) is a direct proportional to rD. So by plotting C(r) vs. r in a double logarithm plot one can obtain D as the slope of the graphs for small values of r. The slope will saturate at the dimension of the attractor.

The dimension of the test vectors are given by the time delay (t ).

The embedding theorems say that it is possible to reconstruct any attractor by just one phase space variable and its time delayed coordinates instead of the original set of phase space variables. Even though the attrators geometry changes the dimension stays the same. So this technique is appropriate to estimate the dimension, i. e. the degrees of freedom, i. e. the necessary size of the time delayed vector to describe and finally predict the system.

At first we tried to reconstruct the attractor from laser data.

What was obtained was a cloud of spot. This is typical for systems with a certain noise level. This doesn’t mean that there is no attractor, but it means that it is not possible to resolve it with the intrinsic noise level. The same happens to the Lorenz attractor if it is exposed to noise.

Here are the plots of a generated Lorenz attractor as a reference attractor and the laser data.

Then the plots for the correlation integral follow.

The first plot shows the Dimensional estimation for the Lorenz data. The second is a comparison with noise. Anyway this is rather an estimate and is supposed to give an idea of the how noise might look like. According to the theory one needs at least 2^D samples for a good estimation. In the case for t =30 it means a data set of 2^30 ~10^9 samples. At the same time the computational effort grows with N^2. That means it is not exhaustively feasible. Anyway the curves show a kind of the expected tendency.

The 3rd graph shows the laser data. The result is somewhere between. It doesn’t look like complete random. It looks rather like a superposition of a noisy background and a very low dimensional (D~2) system. The blow up reveals some more details. This will be one of the major investigations for the future.
 
 


II. Neural Networks

Regardless of the encountering difficulties neural network show that there is a basic reduction of the standard deviation possible that excels conventional feed back systems by about 20%. Many test runs for a lot of different training parameters show that this 20% improvement is typical for most models tested on the data. E. g. same results are yielded by a simple ARMA model. The results seem also very insensitive to network parameters. E. g. the number of hidden units and not more than a five dimensional input vector seem completely sufficient. This means for realizing neural control systems that the demand on the computational effort will be pretty low and could possibly be done in real time by old and slow computers.

Also this would fortify the theory that the dynamic of the beam movement can best be described by a rather simple and low dimensional law that is superposed by a not neglectable amount of noise.

However, even if it is not yet finally clear how to describe and characterize the time series best the neural networks don’t care either and the training results speak for themselves. So from that point of view the realizing of a neural network driven control system should be considered.

It follow a few neural network results with different training and network parameters. Elman networks yielded the best results. Better than simple back propagation or other variations. The reason are probably the recurrent weights after the first layer that make it to a specialist network for time series prediction.
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Elman network (5-30-1) on data set N33

=============================

std deviations:

------------------

1. original data: 1.000

2. FB corrected data: 0.534

3. neural network corrected data: 0.428

4. ratio 2./1.: 0.534

5. ratio 3./1.: 0.428

6. ratio 3./2.: 0.801

running for 0.953 hours
 
 
 
 

Elman network (5-10-1) on data set N33

==============================

standard deviations:

--------------------

1. original data: 1.000

2. FB corrected data: 0.534

3. neural network corrected data: 0.429

4. ratio 2./1.: 0.534

5. ratio 3./1.: 0.429

6. ratio 3./2.: 0.803

running for 0.014 hours
 
 
 
 

Elman network (5-10-1) on data set N1

============================

standard deviations:

--------------------

1. original data: 1.000

2. FB corrected data: 1.031

3. neural network corrected data: 0.824

4. ratio 2./1.: 1.031

5. ratio 3./1.: 0.824

6. ratio 3./2.: 0.799

running for 0.007 hours

Some remarks on the dimensional computation.

Because the algorithm has a lot of operation there was the need of speeding up the program. Since Matlab is rather slow with for loops but pretty fast on Matrix operation one task was converting the for loops to matrix operations. Even though this gained a lot of speed and I convinced myself for small vectors with calculation by hand that the program gives the right results I still want to do the code once again and compare it with a more primitive one to make sure, that the results are ok.

The code is attached.

% Dimensional estimation by box counting

% method after P. Grassberger and I. Procaccia,

% Physica D 9, 189 (1983)

%==============================================

% Program by Frank Breitling

warning off

[vec,minp,maxp]=premnmx(ve');vec=vec'; %provide inputvector in D(:,1)=[...]'

rstep=.05;log10r=-0.5:rstep:.1;r=10.^log10r; %logarithm step size of r (!smaller step size gives preciser slope!)

taustep=5;taumax=6*taustep;

show=0;S=[];TD=[];

for td=0:taumax %creating time delayed matrix

TD=[TD(2:end,:) vec(1:end-td)];

end

%TD=TD(end-8000:end,:); %will cut TD to same length, so same values areobtained even for different taumax

m=length(TD);

N=round(m/8);N2=N*(N-1);

TD=TD(1:N,:);

DIFFTD=[];DISTD2=[];

DSUM=zeros(length(r),taumax/taustep); %initialization saves much time

for t1=1:length(TD) %outer sum

TDT1=ones(N,1)*TD(t1,:);

DIFFTD=(TDT1-TD).^2; %activate to use eucledean norm

%DIFFTD=(TDT1-TD); %activate to use maximum norm

%inner sum (actualy reduced to matrix operations)

DISTD2(:,1)=DIFFTD(:,1); %distance matrix: column contain (distance)^2 of X(t1) to all other X according to tau

for tau=1:taumax %DISTD2(:,2) contains time delay 1, DISTD2(:,3) contains time delay 2, etc.

DISTD2(:,tau+1)=DISTD2(:,tau)+DIFFTD(:,tau+1); %activate to use eucledean norm

%DISTD2(:,tau+1)=abs(max(DISTD2(:,tau),DIFFTD(:,tau+1))); %activate to use maximum norm

end

for tau=taustep:taustep:taumax

DSUB=DISTD2(:,1+tau)*ones(1,length(r))-ones(N,1)*r.^2; %activate to use eucledian norm %distance matrix - radius matrix

%DSUB=DISTD2(:,1+tau)*ones(1,length(r))-ones(N,1)*r; %activate to use maximum norm %distance matrix - radius matrix

DSUM(:,tau/taustep)=DSUM(:,tau/taustep)+sum(hardlim(-DSUB))'-ones(1,length(r))';

%counting the negatives, but not X(t1)-X(t1) that is always =0

S(:,tau/taustep)=diff(log10(DSUM(:,tau/taustep)))/rstep; %slope matrix

end

%if t1>show

%show=show+10;

clc;%DSUM/N2

S

t1 %outer sum counter

%end

end

figure(4);clf;zoom on;

subplot(2,1,1)

%plot(log10r,log10(DSUM(:,:)/N2));grid on;zoom on;hold on;

plot(log10r,log10(DSUM(:,1)/N2));grid on;hold on;

plot(log10r,log10(DSUM(:,2)/N2),'-o');

plot(log10r,log10(DSUM(:,3)/N2),'-x');

plot(log10r,log10(DSUM(:,4)/N2),'-+');

plot(log10r,log10(DSUM(:,5)/N2),'-*');

plot(log10r,log10(DSUM(:,6)/N2),'-s');

ti='Noise';title(ti)

xlabel('log10(r)');ylabel('log10(C(r))')

leg=' ';

for i1=1:taumax/taustep

%slopos=find(hardlim(S(:,i1)));

%slope(i1)=(S(slopos(2),i1)+S(slopos(3),i1))/2;

%legstr=['tau=' num2str(i1*taustep) ',sl~' num2str(slope(i1))];

legstr=['tau=' num2str(i1*taustep) ' '];

leg(1+i1*20:i1*20+length(legstr))=legstr;

end

precision=5;

legend(leg(21:21+precision),leg(41:41+precision),leg(61:61+precision),...

leg(81:81+precision),leg(101:101+precision),leg(121:121+precision),-1)

subplot(2,1,2)

plot(log10r(2:end)-rstep/2,S(:,1));grid on;hold on;

plot(log10r(2:end)-rstep/2,S(:,2),'-o');

plot(log10r(2:end)-rstep/2,S(:,3),'-x');

plot(log10r(2:end)-rstep/2,S(:,4),'-+');

plot(log10r(2:end)-rstep/2,S(:,5),'-*');

plot(log10r(2:end)-rstep/2,S(:,6),'-s');

xlabel('log10(r)');ylabel('slope(log10(C(r)))')

legend(leg(21:21+precision),leg(41:41+precision),leg(61:61+precision),...

leg(81:81+precision),leg(101:101+precision),leg(121:121+precision),-1)

filename=strcat(ti(1:3),'_',num2str(t1));

save(filename,'log10r','DSUM','S','N2','taumax','rstep','taustep','t1','ti','ve')