
Sunday, December 16, 2018

Primer: Understanding Principal Component Analysis

From the perspective of analysts with an interest in fixed income markets and macroeconomics, principal component analysis (PCA) is mainly of interest in two areas. The first is straightforward: decomposing yield curve movements so that we can get a better handle on the effects of directionality (which is used for risk analysis). The second application is in the development of aggregate economic indicators. This article offers an introductory overview of the principles behind principal component analysis, without attempting to dive into the underlying mathematics. The justification for this approach is straightforward: without an intuition of what we are trying to accomplish, it is very easy to get lost in a mathematical exposition.
(This is a very preliminary draft of a section that would make its way into my planned text on business cycle analysis. Although the text is meant to be more advanced, it is still unclear whether it is worthwhile diving into the details of the mathematics. It is easy to find a mathematical exposition of PCA analysis; the problem is finding one that explains why we are interested in the subject in the first place.)

Change of Coordinates

We will first step away from PCA analysis itself, and instead offer a basic explanation of the concept of changing coordinate systems. This is a standard part of any course on linear algebra, but the risk is that the reader has long forgotten the contents of that course even if the subject was studied in university.

We assume that we are looking at data with a number of underlying variables that we want to treat as a single object. We put the values of the variables into a vector, and then work with that vector as a whole. In order to better understand these vectors, we want to express them in a coordinate system that is intuitively useful.

For example, we can look at a simplified yield curve, with just the 2-year and 10-year tenors for government bond yields. On any given date, we have the vector of the 2-year yield and 10-year yield, which would be written as a vector ({2-year yield}, {10-year yield}).* For fixed income portfolio purposes, we are often more interested in the changes in yield, so we would instead have a vector of the one period changes in those yields.
Figure: Change of Coordinates

The figure above offers an abstract explanation of changing coordinates. We assume that we have a vector x which is expressed in some coordinate system. There exists a pair of vectors a and b that have some intuitive meaning for us. We want to express x in terms of those vectors. If a and b are properly chosen (see explanation below), we will find that we can express the vector x in terms of those vectors. In the example, it turns out that x = a + 2b. This means that we can say that the vector x = (1, 2) in the new coordinate system, since it equals 1 times a, and 2 times b. One bit of mathematical jargon that comes up is that we refer to a and b as basis vectors -- which has almost no technical connection to the various ways that "basis" is used in market jargon.

Example. Let us assume that the 1-month change in the 2- and 10-year yields equals +0.6%, +0.4% respectively. (That is, the 2-year rose by 60 basis points, and the 10-year rose by 40 basis points.) We want to express this in a fashion that is more meaningful to fixed income analysis.
  • We define a level shift in the curve as the vector (1, 1). That is, if there is a level shock with magnitude y, the 2-year and 10-year yields both rise by y.
  • We define a steepening shift in the curve by the vector (-0.5, 0.5). That is, if there is a steepening shock with magnitude z, the 2-year yield falls by z/2, and the 10-year yield rises by z/2 (which implies that the slope changes by +z).
(Note that the definition of these vectors was somewhat arbitrary. For example, the "slope" vector could have been (1, -1) or (-1, 1); the only restriction is that it cannot be of the form (c, c), since that would be just a scaling of the first vector, and we would lose the ability to decompose arbitrary yield curve changes. Principal component analysis gives a systematic method to define such vectors.)


Using these basis vectors, the yield curve change of (+0.6, +0.4) can be decomposed as:

(+0.6, +0.4) = 0.5(1, 1) - 0.20(-0.5, 0.5).

That is, the yield curve change consists of a +0.5% level shift, and a -0.2% "steepening" shift (a negative value indicating a flattening).
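The decomposition above amounts to solving a 2-by-2 linear system whose columns are the basis vectors. A minimal sketch in Python with NumPy (the choice of language and library is mine, not part of the original analysis):

```python
import numpy as np

# Basis vectors from the text: level = (1, 1), steepening = (-0.5, 0.5).
level = np.array([1.0, 1.0])
steepening = np.array([-0.5, 0.5])

# Columns of B are the basis vectors, so B @ (y, z) = y*level + z*steepening.
B = np.column_stack([level, steepening])

# 1-month changes in the (2-year, 10-year) yields, in percent.
change = np.array([0.6, 0.4])

# Solve B @ coords = change for the (level, steepening) coordinates.
coords = np.linalg.solve(B, change)
print("level shift:", coords[0], "steepening shift:", coords[1])
```

The solve only works because the two basis vectors are linearly independent; with a second vector of the form (c, c), B would be singular and np.linalg.solve would raise a LinAlgError.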

From a fixed income portfolio perspective, this is a more useful way to look at yield changes. If we had a DV01-weighted flattening trade on (or sadly, a steepening trade), we would have hedged out the risk of level shifts, and our profit (loss) would depend only on the change in the slope, which is the second coordinate. (Carry would matter, but that is locked in for the period of analysis.) Conversely, if we had a straight directional trade on, and were long (or short) both maturities with the same DV01 -- so that we were indifferent to slope changes -- only the first coordinate (level shift) matters.
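To see why the level coordinate drops out, here is a first-order sketch of the P&L of a DV01-weighted flattener (long the 10-year, short the 2-year, with equal DV01 on each leg). The DV01 figure and the helper names are hypothetical, and carry and convexity are ignored.

```python
# Hypothetical DV01 on each leg: P&L per 1 basis point move in yield.
DV01 = 10_000.0


def flattener_pnl(change_2y_pct, change_10y_pct, dv01=DV01):
    """First-order P&L of a DV01-weighted flattener: long 10-year, short 2-year.

    A long position loses dv01 per basis point rise in its yield; a short
    position gains dv01. Carry and convexity are ignored.
    """
    bp_2y = 100.0 * change_2y_pct    # percent -> basis points
    bp_10y = 100.0 * change_10y_pct
    return -dv01 * bp_10y + dv01 * bp_2y


def from_coordinates(level, steepening):
    """Map (level, steepening) coordinates back to (2-year, 10-year) changes."""
    return level - 0.5 * steepening, level + 0.5 * steepening


# The example from the text: a +0.5% level shift and a -0.2% steepening shift.
base = flattener_pnl(*from_coordinates(0.5, -0.2))

# Change only the level shift: the P&L is the same, because it equals
# -dv01 * (100 * steepening), whatever the level coordinate is.
other = flattener_pnl(*from_coordinates(-1.0, -0.2))
print(base, other)
```

In this sketch the 20 basis point flattening earns 20 times the DV01, regardless of whether the whole curve rose or fell.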

Although using this technique to generate exact decompositions can be useful, we are often interested in vectors that have a great many component variables. In that case, we usually cannot give an exact decomposition of the original vector in terms of a small number of basis vectors, only an approximation.
Figure: Best approximation of a vector.
The figure above is an attempt to illustrate this concept in 3-dimensional space. We have a point to be approximated, which is an arbitrary vector in three dimensions. We want to best approximate it in a coordinate system that only has two basis vectors -- which implies that the vectors can (at most) cover a 2-dimensional space (a plane). It is unlikely that the point to be approximated lies exactly on the plane if its position is in any sense random. This is because a plane has a volume of zero in three dimensions, so the probability of landing on that plane is essentially zero if the probability distribution is smooth (has a density).

What we want to do is find the point in the plane that is "closest" to the target point. We can define distances in multi-dimensional spaces in many different ways, but the cleanest is to use the good old fashioned 2-norm (Euclidean norm). (If it was good enough for Euclid, it is good enough for us.) For those of us whose mathematical knowledge has faded, the Euclidean norm is the square root of the sum of the squares of the components of a vector. This is closely related to the root-mean-square, which divides the sum of squares by the number of components before taking the square root; this is why root-mean-square errors (RMSE) pop up a lot in time series analysis.

If we use the Euclidean norm, the closest point within a plane to an arbitrary point in three dimensions is unique, and the vector that is the difference between the target point and the closest point has a useful property: it is at a 90 degree angle to the plane (orthogonal to the plane).
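The closest point in a plane can be computed with ordinary least squares, and the orthogonality property can be checked directly. A small sketch, assuming NumPy; the basis vectors and the target point are arbitrary illustrative choices:

```python
import numpy as np

# Two arbitrary basis vectors spanning a plane in 3-dimensional space.
a = np.array([1.0, 0.0, 1.0])
b = np.array([0.0, 1.0, 1.0])
A = np.column_stack([a, b])

# An arbitrary target point that does not lie in the plane.
target = np.array([1.0, 2.0, 0.0])

# Least squares picks the coefficients that minimise the Euclidean norm of
# target - A @ coeffs, i.e. it finds the closest point in the plane.
coeffs, *_ = np.linalg.lstsq(A, target, rcond=None)
closest = A @ coeffs
residual = target - closest

# The residual is orthogonal to the plane: its dot product with each
# basis vector is zero (up to rounding error).
print("coefficients:", coeffs)
print("dot products with a and b:", residual @ a, residual @ b)
```

For these particular numbers the closest point is exactly b (coefficients 0 and 1), and the residual is (1, 1, -1), which is perpendicular to both a and b.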

If we return to our level/slope example, we would be in this situation if we have more than two tenors. For example, if we add the 5-year yield change to the vector, and extend the level and slope vectors (one reasonable choice is (1, 1, 1) and (-0.5, 0, 0.5)), we can then attempt to approximate yield changes across the three tenors. We cannot guarantee that we can represent all changes exactly. For example, a change in just the 5-year yield cannot be expressed exactly as a combination of those two vectors (we would need to add a butterfly shock, where the 5-year moves in the opposite direction of the 2- and 10-year yields).
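Running the 5-year-only shock through the same least-squares projection shows what is left over. A sketch assuming NumPy, using the extended level and slope vectors suggested above:

```python
import numpy as np

# Extended basis for (2-year, 5-year, 10-year) changes, as suggested in the text.
level = np.array([1.0, 1.0, 1.0])
slope = np.array([-0.5, 0.0, 0.5])
B = np.column_stack([level, slope])

# A shock where only the 5-year yield moves, by 1%.
shock = np.array([0.0, 1.0, 0.0])

# Best (least-squares) approximation using level and slope only.
coeffs, *_ = np.linalg.lstsq(B, shock, rcond=None)
residual = shock - B @ coeffs

# Mathematically, coeffs = (1/3, 0) and residual = (-1/3, 2/3, -1/3):
# the part that cannot be explained has exactly the butterfly shape.
print("level, slope:", coeffs)
print("unexplained residual:", residual)
```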

Principal Components

The principal components of a vector of time series are an automated way to best approximate the movements of the time series. In particular, the first principal component is the vector that explains as much of the total variance of the component series as possible. The second principal component then explains as much as possible of the remaining variance, and so on.
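One way to make "explains the most variance" concrete: for a unit vector w, the variance of the series projected onto w is w'Σw, where Σ is the covariance matrix. The first principal component is the direction that maximises this. The sketch below (NumPy; the covariance matrix is made up for illustration, not estimated from data) scans directions in two dimensions and compares the winner with the top eigenvector of Σ.

```python
import numpy as np

# Hypothetical covariance matrix of monthly changes in two yields
# (illustrative numbers only: std devs 0.25% and 0.20%, correlation 0.8).
cov = np.array([[0.0625, 0.0400],
                [0.0400, 0.0400]])

# Scan unit vectors w = (cos t, sin t); the variance of the projected
# series is w' cov w.
angles = np.linspace(0.0, np.pi, 18001)
W = np.column_stack([np.cos(angles), np.sin(angles)])
proj_var = np.einsum("ij,jk,ik->i", W, cov, W)
best = W[np.argmax(proj_var)]

# Top eigenvector of the covariance matrix (eigh sorts eigenvalues ascending).
eigvals, eigvecs = np.linalg.eigh(cov)
first_pc = eigvecs[:, -1]

# Same direction up to sign, so |cosine| is (almost exactly) 1.
print("|cos angle|:", abs(best @ first_pc))
print("share of total variance:", proj_var.max() / np.trace(cov))
```

The share printed on the last line, the variance along the first factor divided by the sum of the individual variances, is the usual definition of the share of variance explained by a factor.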
Chart: First US Treasury PCA Factor

I have little doubt that the previous statements were clear as mud; it makes more sense once we apply it to yield curve analysis. When we run PCA analysis on yield curve changes, the first principal component explains the largest share of the variance of the changes. Typically, this ends up resembling a level shift of the curve (as demonstrated in an earlier article). For monthly changes in the U.S. Treasury curve during 1995-2000, the first PCA component is shown above.** It resembles a level shift; however, the 30-year tenor moves less than the 5-year tenor. This captures the empirical reality that the typical modes for yield shifts are bear flattening and bull steepening: the long end moves less than the short end.

Unlike the arbitrarily chosen level/slope factors used in my earlier example, PCA factors have useful statistical properties. For the period of analysis chosen, the first PCA factor explains 96.8% of the variance, and the first two together explain 99.6% of the variance.

From a mathematical perspective, the PCA factors are the eigenvectors of the variance (covariance) matrix of the variables. For someone without training in mathematics (or who has forgotten everything...), that is probably not a helpful characterisation. However, readers who remember the mathematics courses they took in university probably do not need an eigenvector primer. I believe that the only mathematical point that matters in this context is the following: although we can generally decompose a space into eigenvectors of a matrix, the procedure is notoriously non-robust numerically. That is, small changes to the matrix can have a very large effect on the calculated eigenvectors, particularly when two eigenvalues are close together. In my academic field of control engineering, one of the key issues with applying control theory was the robustness of numerical techniques. Even though eigenvector decomposition looks extremely useful from a theoretical perspective (and shows up in undergraduate courses), more practical analysis generally attempts to avoid using such decompositions.
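In code, the eigenvector characterisation is short. A sketch assuming NumPy, run on simulated yield changes (a hypothetical level factor whose loadings decline toward the long end, a smaller slope factor, and noise; none of this is market data):

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated (hypothetical) monthly changes for 2-, 5- and 10-year yields:
# a dominant level factor, a smaller slope factor, and a little noise.
n = 240
level = rng.normal(0.0, 0.25, n)
slope = rng.normal(0.0, 0.05, n)
noise = rng.normal(0.0, 0.01, (n, 3))
changes = (np.outer(level, [1.0, 0.95, 0.85])
           + np.outer(slope, [-0.5, 0.0, 0.5])
           + noise)

# PCA: eigendecomposition of the covariance (variance) matrix.
cov = np.cov(changes, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)

# eigh returns ascending eigenvalues; reorder so the first factor comes first.
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

explained = eigvals / eigvals.sum()
print("first factor loadings:", eigvecs[:, 0])
print("share of variance explained:", explained)
```

Two practical details: the eigenvalues come back in ascending order and are reversed here, and the sign of each eigenvector is arbitrary, so a factor and its negative describe the same movement.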

It is at this point I will explain my reservations with the usual descriptions of PCA analysis. Although I was never a brilliant mathematician, I did spend my graduate school days hanging around with pure mathematicians. Mathematicians value elegant analysis, and are not fans of tedious analysis, which would include ploughing through long algebraic formulae. And in a move that will offend any statistician readers, I am in the camp of mathematicians that does not find statistics elegant. In particular, most discussions of PCA analysis that I have seen were written by statisticians, and fall under the "tedious" end of the mathematical spectrum.

A normal human being would ask: why do we care about elegant mathematics? The reason is straightforward: elegant proofs provide an intuition about what we are doing. In the case of PCA analysis, it is very easy to get bogged down in details, and miss the bigger picture.

Statisticians may have an interest in all of the PCA factors, and so they need to worry about all the details. However, for the applications of interest for fixed income/macro analysis, we only care about the first PCA factor. So we can largely ignore most of the argle-bargle, and just focus on that factor.
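If only the first factor matters, one option is power iteration, which repeatedly multiplies a vector by the covariance matrix and rescales it, and never computes the other factors. This is a standard textbook method, offered here as a sketch rather than as the procedure behind the charts in this article; it assumes NumPy and a hypothetical covariance matrix.

```python
import numpy as np


def first_factor(cov, tol=1e-12, max_iter=10_000):
    """Approximate the eigenvector of cov with the largest eigenvalue.

    Convergence speed depends on the ratio of the second-largest to the
    largest eigenvalue; if they are (nearly) equal, the result is unreliable.
    """
    # Starting guess (the method fails if this is orthogonal to the first factor).
    v = np.ones(cov.shape[0]) / np.sqrt(cov.shape[0])
    for _ in range(max_iter):
        w = cov @ v
        w /= np.linalg.norm(w)
        if np.linalg.norm(w - v) < tol:
            break
        v = w
    return w, w @ cov @ w  # unit vector and its Rayleigh quotient (eigenvalue)


# Hypothetical covariance of monthly 2-year and 10-year yield changes.
cov = np.array([[0.0625, 0.0400],
                [0.0400, 0.0400]])
factor, variance = first_factor(cov)

# Cross-check against a full eigendecomposition (ascending eigenvalues).
eigvals, eigvecs = np.linalg.eigh(cov)
print(factor, variance)
print(abs(factor @ eigvecs[:, -1]), eigvals[-1])
```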

From a mathematical perspective (which I may or may not stick into the book), we can just look at the procedure to find the first factor, and ignore the rest of the decomposition. We then run into two practical problems:
  1. Does this vector explain a significant portion of the variance of the variables? If not, then the procedure offers little value.
  2. How sensitive to the data is the vector determined by this procedure? It is easy to imagine a situation where the procedure breaks down: if the variance is spread equally across a two-dimensional space, any vector in that space is an equally good candidate for the numerical procedure. If the procedure is designed to pick one, the algorithm output will be essentially arbitrary. Furthermore, small changes to the data set will result in wildly different outputs.
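The breakdown case in the second question is easy to reproduce. In the sketch below (NumPy; the matrices are toy examples), two covariance matrices that differ by one part in a million have first factors at right angles to each other, because their two eigenvalues are almost equal. With well-separated eigenvalues, a perturbation of the same size barely moves the first factor.

```python
import numpy as np


def top_eigenvector(cov):
    """Eigenvector for the largest eigenvalue (eigh sorts them ascending)."""
    _, eigvecs = np.linalg.eigh(cov)
    return eigvecs[:, -1]


eps = 1e-6

# Nearly equal eigenvalues: a tiny change flips the first factor by 90 degrees.
cov_a = np.diag([1.0 + eps, 1.0])
cov_b = np.diag([1.0, 1.0 + eps])
print("near-tie, |cos angle|:",
      abs(top_eigenvector(cov_a) @ top_eigenvector(cov_b)))

# Well-separated eigenvalues: the same-sized change hardly matters.
sep = np.diag([2.0, 1.0])
sep_perturbed = sep + np.array([[0.0, eps], [eps, 0.0]])
print("separated, |cos angle|:",
      abs(top_eigenvector(sep) @ top_eigenvector(sep_perturbed)))
```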
One could spend a great deal of time worrying about these issues. However, the simplest solution is to calculate the first PCA factor, and see what happens. For yield curve applications, I have found that the results are robust: yield movements tend to look like parallel shifts (outside of ZIRP environments), with the long end less volatile. The result is that the first factor is robust to data changes, and it explains most of the variance.
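A crude version of "calculate the first factor and see what happens" is to estimate the factor on each half of the sample and compare the two. The sketch below (NumPy, hypothetical simulated data, helper names of my own) reports the absolute cosine between the two estimates; values near 1 suggest a stable factor. The absolute value is needed because the sign of an eigenvector is arbitrary.

```python
import numpy as np


def first_factor(x):
    """First PCA factor (unit vector) of the columns of x."""
    _, eigvecs = np.linalg.eigh(np.cov(x, rowvar=False))
    return eigvecs[:, -1]


def split_sample_similarity(x):
    """|cosine| between first factors estimated on each half of the sample."""
    half = len(x) // 2
    return abs(first_factor(x[:half]) @ first_factor(x[half:]))


# Hypothetical simulated yield changes (not market data): a dominant level
# factor with loadings that decline toward the long end, plus noise.
rng = np.random.default_rng(1)
level = rng.normal(0.0, 0.25, 240)
x = np.outer(level, [1.0, 0.95, 0.85]) + rng.normal(0.0, 0.02, (240, 3))
print("split-sample |cos angle|:", split_sample_similarity(x))
```

A low value from a check like this is a warning sign; a high value over one sample does not guarantee stability in a different regime.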

The real challenge is creating composite indicators using PCA analysis. However, the practical implications are straightforward -- if the composite indicator generated by the procedure stinks, it will have lousy statistical properties, and is probably not going to be numerically robust.

For a sensible user of the technique, this means we really do not have to worry too much about the details of PCA analysis. We pick a bunch of variables, transform them so that they can be fed into the PCA analysis, and then examine the output.
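As a hedged illustration of that workflow: one common transformation is to standardise each series to zero mean and unit variance, so that series measured in large units do not dominate, and then use the first factor's loadings to weight the standardised series. The data below are simulated and the function name is hypothetical; real indicators usually need more careful transformations (growth rates, differencing, lags).

```python
import numpy as np


def composite_indicator(data):
    """First-principal-component composite of the columns of `data`.

    Each series is standardised first; this transformation is an assumption,
    not the only reasonable choice.
    """
    z = (data - data.mean(axis=0)) / data.std(axis=0, ddof=1)
    eigvals, eigvecs = np.linalg.eigh(np.cov(z, rowvar=False))
    loadings = eigvecs[:, -1]
    if loadings.sum() < 0:  # eigenvector sign is arbitrary; fix a convention
        loadings = -loadings
    share = eigvals[-1] / eigvals.sum()
    return z @ loadings, loadings, share


# Hypothetical example: four series in very different units, each driven
# partly by one common (simulated) cycle and partly by its own noise.
rng = np.random.default_rng(2)
cycle = rng.normal(size=300)
scales = np.array([1.0, 50.0, 0.1, 1000.0])
data = np.outer(cycle, scales) + rng.normal(size=(300, 4)) * scales * 0.5

index, loadings, share = composite_indicator(data)
print("loadings:", np.round(loadings, 3))
print("share of variance explained by first factor:", round(share, 3))
```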

The challenge is for users who want to treat econometrics like it is a settled science. They are following cookbook recipes to use PCA analysis in some optimal fashion, and then the output is "the solution." The advantage of following recipes is that the final result appears to be highly impartial, like physics. This is useful if one works in an environment like a central bank, where economics is supposed to be a science, not an art. However, if one is in industry, one should be more aware that analytical judgement matters.

In what sense does judgement matter? We can always pick some economic variables and calculate a first principal component for a historical time period. The mathematics will tell us all about the statistical properties of the variations of the variables during that time period. However, there is no reason to believe that the first principal component will be stable over time. Furthermore, we have only limited guidance as to what variables should be included. For example, we can imagine a mathematical economic model where two groups of variables will each move together, and hence there will be a single summary factor for each. If researchers lump them all together into a single vector, and the two factors are correlated, it will look like a single factor explains all of them. This will break down as soon as the two true underlying factors stop being correlated.
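The two-group scenario can be checked without any data, using the covariance matrix implied by a simple hypothetical model: two latent factors with correlation rho, each driving its own group of three variables, plus independent noise. The sketch (NumPy; the model and parameter values are illustrative assumptions) computes the share of variance explained by the first principal component as rho changes.

```python
import numpy as np


def first_factor_share(rho, noise_var=0.1, group_size=3):
    """Share of total variance explained by the first principal component
    in a hypothetical model: two latent factors with correlation rho, each
    driving its own group of variables (unit loadings) plus independent noise.
    """
    loadings = np.zeros((2 * group_size, 2))
    loadings[:group_size, 0] = 1.0
    loadings[group_size:, 1] = 1.0
    factor_cov = np.array([[1.0, rho], [rho, 1.0]])
    cov = loadings @ factor_cov @ loadings.T + noise_var * np.eye(2 * group_size)
    eigvals = np.linalg.eigvalsh(cov)  # ascending order
    return eigvals[-1] / eigvals.sum()


# For these parameters the shares are 5.95/6.6, 4.6/6.6 and 3.1/6.6,
# i.e. about 0.90, 0.70 and 0.47.
for rho in (0.95, 0.5, 0.0):
    print(rho, round(first_factor_share(rho), 3))
```

With highly correlated factors, a single factor appears to explain about 90% of the variance; once the correlation disappears, that same single factor explains less than half.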

Concluding Remarks

This article provided an introduction to the intuition behind PCA analysis. I expect that I will continue this series with a discussion of the techniques behind the creation of aggregate indicators, and possibly an analysis of the effects of the historical regime on calculated PCA factors for yield curves.




Footnotes:

* By convention, vectors are stacked vertically, so there should be a "transpose" operator on the vector in the text. This was ignored as we would need to bring in mathematical typesetting tools to do the job properly, and that would be akin to smashing a walnut with a sledgehammer.

** The period 1995-2000 was not arbitrary. The PCA factor shown is typical for periods when we are far away from zero rates. Conversely, in a ZIRP environment (such as in Japan, and post-Crisis USD curves), the front end is pinned near zero, and the long end moves up and down. In such an environment, the first PCA factor resembles a slope factor, since pure parallel movements of the curve are unusual.

(c) Brian Romanchuk 2018
