Pages

Wednesday, May 6, 2015

Introduction To The Frequency Domain With R

The use of frequency domain concepts is extremely useful when it comes to the analysis of systems. A good portion of communication and control system theory is done almost exclusively within the frequency domain. The advantage of the frequency domain is that it can more easily incorporate uncertainty than time domain analysis. Since the conversion of a signal to a frequency domain representation is a mathematical operator, this is a very mathematical topic. But I believe that many principles can be understood by just working with data and the Fast Fourier Transform (FFT). This article explains some of the fundamentals of the FFT using the open source R programming language. Later articles will then cover more advanced concepts.

This article is somewhat experimental on my part. If you are unfamiliar with frequency domain concepts, my feeling is the best idea would be to experiment with the R commands that I used to generate these screenshots. Understanding why this matters may need to wait until I write follow up articles explaining how frequency domain concepts influence time series analysis.

What Is The Frequency Domain?


The basic concept of a frequency domain representation of a signal now appears to be well known as a result of advances in science and electronics. The difficulty probably lies in the mathematics around the concept. But it is possible to develop an intuitive understanding of the frequency domain, without recourse to the underlying mathematical formulae.

The figure above gives the high level view of how the frequency domain is related to the the usual time domain representation of a series.
  • A Fourier Transform converts a time domain series into a frequency domain representation.
  • An Inverse Fourier Transform converts back a frequency domain representation into a time series.
Please note that there are a number of Fourier transforms, for both continuous time and discrete time, as well as related operators (Laplace and z-transform). Within this article, I will use the Fast Fourier Transform, the "FFT", which is heavily used in practice.

The frequency domain representation of a time series is generated by breaking the series up into a set of underlying sinusoidal series of a number of different frequencies. (A sinusoid is a time series that is generated by a function that is a combination of sine and/or cosine functions.)

As long as we do not ask awkward questions as to how the FFT works, we can treat it as a black box and just examine the output empirically. Since I believe that frequency domain concepts are mainly useful for understanding the fundamental limitations of what analysis can be done on economic time series, and less useful for computational purposes, I do not see the need to delve too deep into the mathematical formulae.

Using The FFT In R



The first thing to do is to generate a sine wave in the time domain. In the R code above, I create a time vector that runs from t=0 to t=63, which is 64 points. The preferred indexing to work with is to have the first index as being 0, and not 1. But the R language defines the first point in a vector as having an index of 1, which makes things messier.

The line:
s <- sin(2*pi*t/16)   # Frequency = 1/16
generates the entire time series (denoted s) in one line of code; this is because the R language is vectorised. In some languages, it would be necessary to run a loop to generate all the values of the time series. Also note that R uses the "s <-" to assign a value to the variable s. As the plot shows, the time series s repeats every 16 time steps. (Note that I have truncated the x-axis by the use of a xlim option.)

Please note that the frequency of this time series is 1/16 (0.0625), which is expressed relative to the time steps in discrete time. If the time steps were one second apart, this would correspond to a frequency of 0.0625 Hertz (1 Hertz = 1/second). However, since we do not know how much time each time step represents, the frequencies I quote here have no dimension. To put this into context for economics, an annual frequency when we sample data quarterly corresponds to a (dimensionless) frequency of 1/4 (= 0.25), but if we sample data monthly, an annual frequency is 1/12 (=.0833). In other words, the same real world frequency maps to different dimensionless frequencies depending upon the rate at which you sample data.

This time series s will be used repeatedly throughout this article.


The screenshot above shows the calculation of the frequency domain representation of s, which is assigned to f_s. When I print out the first 6 entries of f_s, we get a small surprise: the values of the vector are now complex numbers. That is, they have an imaginary component, denoted with i, where i is the square root of -1. (In electrical engineering texts, j is used to represent this number, as i was already reserved to refer to current in a circuit.)

Since it is hard to visualise imaginary numbers, I plotted the absolute value of each of the vector components. (The absolute value is more formally the modulus of the complex number.) We can see the magnitude of each entry, but we lose the information about the breakdown of the real and imaginary components.

As the chart shows, the magnitude of most entries is nearly zero (with non-significant calculation errors), except for the 5th and 61st entry of the vector, which are -32i and +32i respectively. 

Why those entries? If we go to a zero-based indexing, those are entries 4 and 60 (out of 64). The entry 4 is equal to the size of the sample (64) divided by the frequency (16). The other entry is a complement of that value. Since I did not want to delve into the mathematics, all I can say is that this complement has to be there. I will show below what happens if that higher frequency component is deleted.

Also note that there is an asymmetry. The first entry (index 0) corresponds to a constant signal (or direct current (DC) in electrical engineering). If we take the FFT of time series that is 1 everywhere, the FFT is zero everywhere except the first entry.

> x = rep(1,64)
> x
 [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
[37] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
> fft(x)
 [1] 64+0i  0+0i  0+0i  0+0i  0+0i  0+0i  0+0i  0+0i  0+0i  0+0i  0+0i  0+0i
[13]  0+0i  0+0i  0+0i  0+0i  0+0i  0+0i  0+0i  0+0i  0+0i  0+0i  0+0i  0+0i
[25]  0+0i  0+0i  0+0i  0+0i  0+0i  0+0i  0+0i  0+0i  0+0i  0+0i  0+0i  0+0i
[37]  0+0i  0+0i  0+0i  0+0i  0+0i  0+0i  0+0i  0+0i  0+0i  0+0i  0+0i  0+0i
[49]  0+0i  0+0i  0+0i  0+0i  0+0i  0+0i  0+0i  0+0i  0+0i  0+0i  0+0i  0+0i
[61]  0+0i  0+0i  0+0i  0+0i

In summary, the frequency domain representation will have a first entry that corresponds to a constant offset, and then the remaining entries correspond to time-varying sinusoids. 

Phase Difference



In the code above, I create another time series, called c, which is generated by the cosine (cos) function. As can be seen in the plot, the cosine is just like the sine wave, except that it is leading by 4 periods. This corresponds to a result from trigonometry which most people have probably forgotten (including the author, who had to validate the result):
cos(x) = sin(x + π/2).
When we compare the frequency domain representations, they have the same magnitude at all points. (If we subtract the absolute values of the frequency domain vectors, the difference is zero.) 

The difference between the two frequency domain representations is the breakdown of the real/imaginary parts. For the cosine, the 5th entry is 32, compared to the sine which is -32i

In general:
  • the magnitude (modulus) of a frequency domain entry gives the amplitude of the sine wave;
  • the relative sizes of the real/imaginary parts of the frequency domain entry determines the phase shift (lead/lag) of the sine wave.

Filtering A Noisy Signal



We can now look what happens when we make the time series more complicated. This was done by adding "noise" which is just another sinusoid of higher frequency.

The code above creates a noise signal n which now repeats every 8 periods (frequency = 1/8, which is larger than 1/16). The series "signal" is the sum of the noise and the original sine wave  (signal = s + n; note that we can use "=" for assignment as well in R). The resulting time domain signal is shown above.



As one might expect, the FFT of "signal" now has two peaks, at (zero-based) indices 4 and 8; which correspond to 64/16, and 64/8 respectively.



We can then apply an ideal low-pass filter , which just eliminates high frequency components. I just zero out the middle of the frequency domain representation of the signal.

We can then convert this filtered signal back to the time domain, using the fft(f_filtered,inverse=TRUE)  operation.  Given a quirk in the definition of the FFT, in order to bring back the true time domain representation, we have to divide by the length of the sample period (in this case 64). This means that the "inverse FFT" is not the "true" Inverse Fourier Transform. The reason for this unusual scaling is that the standard definition of the FFT is computationally efficient. (It is called the Fast Fourier Transform for a reason; it was a great leap forward for the efficiency of digital signal processing.)

Therefore, this ideal low-pass filter does what you intuitively expect: it wipes out the high frequency noise in the signal.

What If We Break Complementarity?


The screenshot above shows what happens if we break the complementarity of the frequency domain representation. In it, I remove all of the vector beyond the 6th index. This eliminates the noise, as well as the top end of the original lower frequency (frequency = 1/16) signal.

The time domain series after the inverse FFT now has imaginary parts. If we plot the real part, we see that it is a sine wave, but with half of the amplitude. The modulus is is constant, and is 0.5, which is half of the original amplitude.

What this means is that the original signal is the sum of two of complex sine waves, with the imaginary parts cancelling out.

More technically, a FFT on a signal with N time points is converted into a frequency domain representation sampled at N point. The sample (dimensionless) frequencies are given by: 0, 1/N, 2/N,... (N-1)/N. (This converges towards a frequency of 1, which is a signal that repeats every time sample.) For a discrete time signal, what happens is that a (dimensionless) frequency that is 1 or higher becomes indistinguishable from a frequency that is in the interval [0,1) (numbers that are greater than or equal to 0, but strictly less than 1). For example, if a sinusoid has frequency of 1, it will repeat the same value each period, and so it is indistinguishable from a constant (frequency = 0). (Please note that this type of aliasing does not happen for continuous time signals, which is why this concept is not intuitive, since it is not normally encountered.) Please note that the usual convention is to denote this interval as being [0,2π), but I have embedded the 2π within my definition of a frequency.

It takes a little bit of trigonometry, but we will discover that only the frequencies in the range [0,1/2) can be represented in the time domain; the frequency domain components in the range (1/2,1) are just the complements of the lower frequency signal.

In summary: when we look at the FFT of a time domain signal, only the first half of the vector actually contains information.The second half of the vector just ensures that the inverse operation returns a real-valued time signal.

Appendix: R Code

If you copy this code into a file, the "readline()" command will stop execution until you hit "return". The R programming language is available here (free).

#######################
# First plot
# Create 64 period time vector, sine wave
t <- 0:63
# Operation is vectorised, can do in one line
s <- sin(2*pi*t/16)   # Frequency = 1/16
print(s[1:17])
plot(t,s,xlim=c(0,36)) 
title(main="Sine Wave, frequency=1/16")
abline(h=0,v=c(0,16,32)) # Add marker lines
readline()
#######################
# Second plot
idx <- 0:63
f_s <- fft(s)
# Display elements 1-6: imaginary numbers!
print(f_s[1:6],digits=3)
# Since imaginary, plot the absolute value
plot(idx,abs(f_s))
title("FFT of sine wave")
# Second peak at element #61 (=60 zero-based)
print(f_s[60:62],digits=3)
readline()
#######################
# Third plot
# Create c(t) = wave created by cosine
c <- cos(2*pi*t/16) #cosine,Frequency = 1/16
# Plot the two
matplot(t,cbind(s,c),type="b",pch=1:2,col=1:2)
legend(x='topright',pch=1:2,legend=c("sine","cosine"),col=1:2)
f_c <- fft(c)
# Display element 5 of both
mat <- cbind(f_s,f_c)
print(mat[5,],digits=3)
# Note that f_s has 5th element = -32i,
# f_c has 5th element = 32
# However, magnitudes are (almost) identical
print(max(abs(f_c)-abs(f_s)))
readline()
#####################################
# Plot 4: signal & noise
# add a signal with frequncy of 1/8 to the sine
n = sin(2*pi*t/8)  # freq=1/8
signal = s + n
plot(t,signal,type="b")
title("Time series of noisy signal")
readline()
######################################
# Plot 5: FFT of signal & noise:
f_signal = fft(signal)
plot(idx,abs(f_signal))
title("FFT Of Signal & Noise")
readline()
#####################################
# Plot 6:  
# Ideal low pass filter remove everything beyond
# the 6th element. Note there is an index asymmetry.
f_filtered <- f_signal
f_filtered[1+6:(64-6)] = 0
# Need to normalise by 1/length
filtered <- fft(f_filtered,inverse=TRUE)/64
# Noted: there are tiny (1e-16) imaginary parts
# Use Re() to remove them
plot(t,Re(filtered),type="b")
title("Time series after low-pass filter")
readline()
#####################################
# Plot 7: 
# Broken ideal "low-pass" filter
f_filtered <- f_signal
f_filtered[(1+6):64] = 0 # Blow out all of high end
# Need to normalise by 1/length
filtered <- fft(f_filtered,inverse=TRUE)/64
# Imaginary real-time data!
print(filtered[1:5])
matplot(t,cbind(Re(filtered),abs(filtered)),type="b",pch=1:2,col=1:2)
legend(x='right',pch=1:2,legend=c("Real Part","Modulus"),col=1:2)
title("Real Part of Time series After Broken Filtering")

(c) Brian Romanchuk 2015

18 comments:

  1. This comment has been removed by a blog administrator.

    ReplyDelete
  2. This comment has been removed by a blog administrator.

    ReplyDelete
  3. This comment has been removed by a blog administrator.

    ReplyDelete
  4. This comment has been removed by a blog administrator.

    ReplyDelete
  5. This comment has been removed by a blog administrator.

    ReplyDelete
  6. This comment has been removed by a blog administrator.

    ReplyDelete
  7. This comment has been removed by a blog administrator.

    ReplyDelete
  8. This comment has been removed by a blog administrator.

    ReplyDelete
  9. This comment has been removed by a blog administrator.

    ReplyDelete
  10. This comment has been removed by a blog administrator.

    ReplyDelete
  11. This comment has been removed by a blog administrator.

    ReplyDelete
  12. This comment has been removed by a blog administrator.

    ReplyDelete
  13. This comment has been removed by a blog administrator.

    ReplyDelete
  14. This comment has been removed by a blog administrator.

    ReplyDelete
  15. This comment has been removed by a blog administrator.

    ReplyDelete
  16. This comment has been removed by a blog administrator.

    ReplyDelete
  17. This comment has been removed by a blog administrator.

    ReplyDelete
  18. This comment has been removed by a blog administrator.

    ReplyDelete

Note: Posts are manually moderated, with a varying delay. Some disappear.

The comment section here is largely dead. My Substack or Twitter are better places to have a conversation.

Given that this is largely a backup way to reach me, I am going to reject posts that annoy me. Please post lengthy essays elsewhere.