At the end of this article, an R language usage example is given.
"Platform Project"
Platform to download, manipulate and plot data. Mainly Python, with some R. Built around a MySQL database (eventually). (Note: if this is ever to be a serious project, it needs a better package name. Since all my code is in a single PyCharm project, I can refactor the name easily, but users need to know that the package name may shift.)
Python 3 only. I work on Python 3.7, and have no idea about backward compatibility issues.
This repository is mainly for my own use for now; I am transferring my platform to a new computer, and decided to do a clean up at the same time. In particular, the MySQL interface is not implemented.
Features
Please note that all features listed are minimal implementations. However, they offer an idea of where this project is going.
Providers
"Single series" interface only; there will be table query support (for pandasdmx).
- "User Series": Series that are calculated by Python code dynamically on request.
- DB.nomics
- FRED (St. Louis Fed)
- CANSIM: parsing of manually downloaded tables (CSV).
- Quandl. (Update: done in 20 minutes!)
- TEXT: Save each series as a text file in a local directory. This is good enough for a casual user, and is useful for debugging and unit testing.
Dynamic Loading
Series are loaded with a single fetch() command that sits between user code and the underlying APIs. This means that there is only a single command for most end users to worry about.
The fetch() command is dynamic; the ticker is a combination of the Provider and the Provider-specific ticker. (Once SQL database support is added, users can configure friendly local tickers that are mapped automatically to the clunkier provider tickers.)
- If the requested series does not exist, the system goes to the Provider and fetches it. (This is either an API call, or parsing a downloaded table, as in the CANSIM_CSV interface.)
- If the series exists in the database, that series is returned (at a minimum).
- If the series is a USER series, the appropriate Python code module (registered by the user) will calculate the series in the same way as an external provider API.
- TODO: If the series exists, an "update protocol" will be implemented. (Right now, if you want to update a series, you delete the text file...)
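The lookup order described in this list can be sketched roughly as follows. This is a minimal illustration, not the package's actual code: the names `split_ticker`, `fetch_sketch`, `PROVIDERS` and `USER_SERIES`, the "U" code for user series, and the one-file-per-series CSV layout are all assumptions made for the example. Plain dictionaries stand in for the pandas Series that the real fetch() returns. The `@` separator between the provider code and the provider ticker follows the ticker that appears in the log output of the R example.

```python
import csv
import tempfile
from pathlib import Path

# Hypothetical registries; the real package's structures may differ.
PROVIDERS = {}     # provider code -> function(provider_ticker) -> {date: value}
USER_SERIES = {}   # user ticker -> function() -> {date: value}


def split_ticker(full_ticker):
    """Split 'CODE@provider_ticker' into (code, provider_ticker)."""
    code, sep, provider_ticker = full_ticker.partition("@")
    if not sep or not code or not provider_ticker:
        raise ValueError(f"Expected 'PROVIDER@TICKER', got {full_ticker!r}")
    return code, provider_ticker


def _file_for(directory, full_ticker):
    # Replace characters that are awkward in file names.
    safe = "".join(c if c.isalnum() or c in "._-" else "_" for c in full_ticker)
    return Path(directory) / f"{safe}.txt"


def fetch_sketch(full_ticker, directory):
    """Return a {date: value} dict, loading from disk when possible."""
    code, provider_ticker = split_ticker(full_ticker)
    # 1. User series are calculated on request by registered Python code.
    #    (The "U" code is an assumption for this sketch.)
    if code == "U":
        return USER_SERIES[provider_ticker]()
    path = _file_for(directory, full_ticker)
    # 2. If the series is already stored locally, return the stored copy.
    if path.exists():
        with path.open(newline="") as f:
            return {row[0]: float(row[1]) for row in csv.reader(f)}
    # 3. Otherwise go to the provider, then store the result for next time.
    data = PROVIDERS[code](provider_ticker)
    with path.open("w", newline="") as f:
        csv.writer(f).writerows(data.items())
    return data


# Usage: a fake provider that records how often it is called.
calls = []


def fake_provider(ticker):
    calls.append(ticker)
    return {"2019-01-01": 1.5, "2019-04-01": 2.0}


PROVIDERS["X"] = fake_provider
USER_SERIES["demo"] = lambda: {"2019-01-01": 3.0}

with tempfile.TemporaryDirectory() as tmp:
    first = fetch_sketch("X@demo/series", tmp)    # goes to the provider
    second = fetch_sketch("X@demo/series", tmp)   # read back from the text file
    user = fetch_sketch("U@demo", tmp)            # calculated on request
```

In this sketch the provider is only called once; the second request is served from the text file, which is also why deleting the file forces a refresh.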
Other Languages
- R can directly trigger the Python dynamic fetching code, using reticulate. An example is given [see below]. (I have terrible legacy code in a side folder.)
Programming Support
- Very clean interface to the logging module; you just call start_log() and a log file based on the module name is created in a (configurable) log directory.
- Highly configurable. Although hard-coded behaviour exists, it is possible to use text configuration files to modify behaviour. Built with the configparser module. The user is encouraged to create a config file outside the repository so that it does not collide with Git.
- Automatic extension support. Other than minimal features that should always work, providers/databases are loaded as extensions. If the user is missing the appropriate API modules, the extension load will fail - but the rest of the platform is functional. (For example, you need an API key for the St. Louis FRED interface for Python.)
- Users can use the extension interface to monkey-patch the platform, so that any parts of the code that are horrifying can be replaced.
- Unfortunately, very little in the way of unit tests (although some exist). The most important code here is the API calls and the database interface, which are painful to unit test. If I'm smart, I will retrofit end-to-end tests over the code base, but my highest priority was to get something functional for my use quickly.
- Since fetch() returns a pandas Series object, users can leverage the features of pandas. (Disclaimer: I am not too familiar with pandas, and so my pandas code may stink.) Currently, the main advantage for analysts is fetch() and the ease of configuration/logging.
- quick_plot(): One-line plotting of series. (Although there is a Series.plot() method, I still need to call matplotlib.pyplot.show() to see it.) Eventually, look-and-feel will be configurable with config files.
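The extension mechanism described in the list above could work along the following lines. This is only a sketch under assumptions: `load_extensions_sketch` and the module names passed to it are hypothetical, and the real package may register extensions differently. The point is that a failed import is recorded rather than allowed to stop the rest of the platform.

```python
import importlib
import logging

logger = logging.getLogger(__name__)


def load_extensions_sketch(module_names):
    """Try to import each extension; return (loaded, failed) name lists."""
    loaded, failed = [], []
    for name in module_names:
        try:
            importlib.import_module(name)
        except ImportError as exc:
            # A missing API module disables this one extension only.
            logger.warning("Extension %s not loaded: %s", name, exc)
            failed.append(name)
        else:
            loaded.append(name)
    return loaded, failed


# 'json' stands in for an extension whose dependencies are installed;
# the second name is deliberately one that does not exist.
loaded, failed = load_extensions_sketch(["json", "no_such_provider_module_xyz"])
```

Returning the two lists lets the caller log which providers are available, so a user can see at start-up what was skipped.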
On the analytics side, I might try to interface with my simple_pricers module, or if I am ambitious, quantlib. Needless to say, my focus would be on fixed income calculations (e.g., calculating total returns from par coupon data, forward rate approximations, etc.)
Comments
With a fair amount of effort, users could replicate my examples. However, there will be example wrappers for various web interfaces which may make things slightly simpler to work with. I may write a book on research platform construction, at which point this repository would be beefed up.
My plotting code is in R. I am not that great an R programmer, and my code is a mess. I expect that I will migrate to a pure Python platform at some point. However, the R code is an example of how we can use a SQL database as the interface point between different languages.
There may very well be a similar package out there; I did not even bother looking. If there is a similar project, I guess I could switch over to working with it. However, since most of what I am doing is small wrappers on top of fairly standard libraries, I do not see that as worthwhile. I am not too familiar with pandas, and it looks like most of what I would be doing is within pandas already. Instead, the effort is getting everything wrapped into a high productivity environment.
More Information
Plans.txt has some thoughts on my plans for this project. EarlyAdopters.txt has some comments aimed at anyone who wants to use this package.
R Example
The chart above was created in R. (Note: the plot formatting is messed up; apparently, there have been some changes to the R plotting library I use that broke my chart format.) It was done with minimal code. Note that R is calling the Python package, and so if the underlying series was not already downloaded, it would have been automatically fetched. (In the future, the loading code would decide whether to dynamically update the series as well.) The R wrapper function I use invokes logging, so you can see what is happening (whether the series is fetched from the "database" or through a provider API call, etc.).
The code:
The log file contents:
2019-05-05 10:01:45 DEBUG Successful Extension Initialisation
2019-05-05 10:01:45 DEBUG CANSIM (CSV)
2019-05-05 10:01:45 DEBUG DB.nomics
2019-05-05 10:01:45 DEBUG FRED (St. Louis Fed)
2019-05-05 10:01:45 DEBUG Fetching D@Eurostat/namq_10_gdp/Q.CP_MEUR.SCA.B1GQ.EL from TEXT
Note that since the series was already downloaded, the series was fetched from the TEXT database. Also, the CANSIM, DB.nomics, and FRED modules are coded as extensions which were imported successfully. (This way you can see even from within R whether something went wrong during package initialisation.)
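For readers curious how a log in this format might be produced, here is a minimal sketch using the standard logging and configparser modules. It is not the package's start_log() implementation: the function name `start_log_sketch`, the `[Logging]` section and its `directory` option are assumptions made for illustration. It creates a log file named after the module in a configurable directory, with a timestamp, level and message on each line.

```python
import configparser
import logging
import tempfile
from pathlib import Path


def start_log_sketch(module_name, config_text=""):
    """Create <log_dir>/<module_name>.log and return (logger, path)."""
    config = configparser.ConfigParser()
    config.read_string(config_text)
    # Hypothetical [Logging] section; falls back to the current directory.
    log_dir = Path(config.get("Logging", "directory", fallback="."))
    log_dir.mkdir(parents=True, exist_ok=True)
    log_path = log_dir / f"{module_name}.log"

    handler = logging.FileHandler(log_path)
    handler.setFormatter(logging.Formatter(
        "%(asctime)s %(levelname)s %(message)s", datefmt="%Y-%m-%d %H:%M:%S"))
    logger = logging.getLogger(module_name)
    logger.setLevel(logging.DEBUG)
    logger.addHandler(handler)
    return logger, log_path


# Usage: log one message into a temporary directory and read it back.
with tempfile.TemporaryDirectory() as tmp:
    log, path = start_log_sketch("demo_module", f"[Logging]\ndirectory = {tmp}\n")
    log.debug("Fetching X@demo/series from TEXT")
    for h in list(log.handlers):
        h.close()              # release the file before the directory is removed
        log.removeHandler(h)
    contents = path.read_text()
```

Keeping the directory in a config file means the log location can be changed without touching the code.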
(c) Brian Romanchuk 2019