I've made a push to put my platform into a state where it could conceivably be used by other people. It is a long way from feature-complete, but it has now reached a state where it is clear how the code works. I've worked on adding the storage of series meta-data, and I used the interest rate data sets from the Reserve Bank of Australia (RBA) as an example.
I have downloaded Excel spreadsheets (both current and historical) from the RBA statistics site: https://www.rba.gov.au/statistics/tables/. (Some data are available at DB.nomics, but they are missing the back-history. I left DB.nomics a note on their forum.)
There is a wide variety of data available at the RBA, but I stuck to interest rate related ones. The recent RBA format largely matches the format used by the Australian Bureau of statistics, and my import routines might work on the other spreadsheets without changes. However, this was not true for historical data: I had to add fixes for data format changes in the old spread sheets.
I tested:
- Table F1: Money markets.
- Table F2: Government bond markets.
- Table F16: Indicative mid yields on government securities.
(The chart above is an index-linked bond yield from Table F16: the ACGB 4% of 08/2020. One may note the lack of mean reversion in real yields...)
Since I do not want test code to bombard the RBA with a massive number of download requests, I download the spreadsheets manually.
I downloaded a selection of current and historical spreadsheets into the "RBA_XLS" import directory. (The exact location is user-configurable.) I may have had to rename one of the spreadsheets from the default used by the RBA since it looked like two different files had the same name.
The screenshot above shows part of the F16 bond level spreadsheet ("f16.xls"). From the perspective of a user of my platform, the key piece of data is the Series ID: this is the unique identifier used by the "RBA Excel" data provider to pin down specific series.
F16 Spreadsheet from RBA |
Requesting any of the series triggers a search of the directory of spreadsheets, and the whole shebang is imported into a SQLite database. (This is a SQL database that runs locally on the computer. The SQLite interface is installed with modern Python versions.)
The particular series searched for is returned, but the database is stocked with all the successfully parsed series. Later requests for series will be served directly from the SQLite database.
It is now possible to examine the series information in the SQLite database. The above screenshot shows a view of series meta-data from within the (free) DB Browser software for SQLite.
Next Steps
I have been in contact with a couple of people who are interested in collaborating; I welcome more inquiries. The code has now reached a point where it is robust, and clear where my design is going. The framework is extremely flexible, and so it easy to make incremental improvements without worrying with integration with the evolving code base.
I am now able to return to my writing flow. I needed to integrate foreign data sources into my framework for my recession book, and I now have a clean way to do this. (Even the Canadian statistical agency overhauled their data tables, so I have to look up data mappings for Canadian data again.)
I expect that I will return to my book manuscript, and see where I can incorporate foreign data sections. As such, my blog articles will tend to jump around over the coming weeks...
Once again: the code is available at https://github.com/brianr747/platform
(c) Brian Romanchuk 2019
This comment has been removed by a blog administrator.
ReplyDeleteThis comment has been removed by a blog administrator.
ReplyDeleteThis comment has been removed by a blog administrator.
ReplyDelete