Wednesday, May 22, 2019

Avoiding Problems When Drawing Recession Bars With ggplot

Figure: Canadian Real GDP Growth

I discovered an interesting "feature" of ggplot (the R language plotting package) when drawing recession bars, as in the figure above. You need to be very careful about how you start building the chart object, as I explain below.

(Note: As a result of hunting down these formatting issues, I will only be publishing this programming hint rather than a regular blog article today. I should be back on the weekend with macro content. Although I would prefer to get back to the economic content, getting the recession bars formatted correctly is somewhat important for a book on recessions!)

I am not going to post the code that generates the above plot; I just want to tell you what to look out for. (The code is found in the "legacy/R" directory of my platform package, but it is too convoluted to post here, and I want to focus on what causes the issues.)

The wrong way to start off a recession bar plot is to supply a data frame (in this case, series.df) to the initial ggplot call:
pp <- ggplot(series.df) + geom_line(aes(x=...))
The right way is to start off with a bare ggplot call:
pp <- ggplot() + geom_line(data=series.df, ...)
The reason is that you need to give the plot object (pp) a second data frame that holds the recession dates. This apparently causes conflicts when the ggplot() call has already been initialised with another data frame. (This may be documented somewhere, but based on a comment I ran into on Stack Exchange, on a page I have since lost, this was not the desired behaviour.)
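As a minimal sketch of the bare ggplot() pattern (this is not the code from my platform package): the recession.df data frame, its start/end columns, the date/value columns of series.df, and all of the numbers are hypothetical, made up for illustration. Using geom_rect() for the bars is also an assumption; it is just one common way to draw them.

```r
library(ggplot2)

# Hypothetical data: a quarterly growth series and one recession interval.
series.df <- data.frame(
  date = seq(as.Date("2007-01-01"), by = "quarter", length.out = 16),
  value = c(2.1, 2.4, 1.8, 0.9, -0.5, -3.2, -4.0, -1.1,
            1.5, 2.8, 3.1, 2.9, 3.0, 2.6, 2.2, 2.5)
)
recession.df <- data.frame(
  start = as.Date("2008-10-01"),
  end = as.Date("2009-07-01")
)

# Bare ggplot() call: no plot-level data frame, so each layer
# names its own data frame and its own aesthetics.
pp <- ggplot() +
  geom_rect(data = recession.df,
            aes(xmin = start, xmax = end, ymin = -Inf, ymax = Inf),
            fill = "grey70", alpha = 0.5) +
  geom_line(data = series.df, aes(x = date, y = value))
```

Since nothing is set at the plot level, neither layer has a plot-wide data frame or aesthetic mapping to inherit.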

Layering

The other thing to do is to build the ggplot in a sensible layer order. Layers are drawn in the order they are added, so later layers sit on top of earlier ones.
  1. Watermark first.
  2. Recession bars next.
  3. Last, the time series lines.
(This means that you do not want to invoke geom_line() right after starting the plot, as in my code snippet above. Instead, it will come near the end of the process.)
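The layer order above can be sketched as follows. Again, this is an illustration under assumptions rather than my actual plotting routine: the data are hypothetical, and drawing the watermark with annotate("text", ...) is just one plausible way to do it.

```r
library(ggplot2)

# Hypothetical data, for illustration only.
series.df <- data.frame(
  date = seq(as.Date("2007-01-01"), by = "quarter", length.out = 8),
  value = c(2.1, 1.8, -0.5, -3.2, -4.0, -1.1, 1.5, 2.8)
)
recession.df <- data.frame(
  start = as.Date("2007-10-01"),
  end = as.Date("2008-04-01")
)

# Start with a bare ggplot() call, then add layers bottom to top.
pp <- ggplot()

# 1. Watermark first, so everything else is drawn over it.
pp <- pp + annotate("text",
                    x = mean(range(series.df$date)),
                    y = mean(range(series.df$value)),
                    label = "example watermark",
                    colour = "grey85", size = 10)

# 2. Recession bars next.
pp <- pp + geom_rect(data = recession.df,
                     aes(xmin = start, xmax = end, ymin = -Inf, ymax = Inf),
                     fill = "grey70", alpha = 0.5)

# 3. Time series lines last, so they sit on top of the bars.
pp <- pp + geom_line(data = series.df, aes(x = date, y = value))
```

Adding the layers one statement at a time makes the order easy to read and to rearrange.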

My previous code did not do that, mainly because the instability created by not starting with a bare ggplot() call meant that inexplicable errors were thrown whenever I tried adjusting the recession bar plotting routine.

Platform Update

My plan was to write about the spatial issues with recessions: some regions or sectors can be in recession while the aggregate economy avoids one. I wanted to dig into Canadian provincial data, as regional divergences are marked. However, this meant that I needed to get a handle on the metadata describing some of the large data sets, so I spent the past couple of days getting the platform to handle the Statistics Canada metadata cleanly. That work is now largely done, so the platform has pretty good support for detailed analysis of Canadian data.

My hope is that this puts me in a better position to focus once again on writing my book, as I now have the data I need to beef up sections that were short on supporting data.

Postscript: An Excel Tip!

As part of my work, I discovered an amazing "feature" in Excel: if you give a tab-delimited file the "CSV" extension, Excel can mangle the data when you do the "text-to-columns" operation.

So, to avoid that problem, I highly recommend installing Open Office.

(c) Brian Romanchuk 2019
