Testing Indicator Soundness for Automated Trade Systems, Part 2
We're gonna take a look at mutual information. There are words like "stationarity", "permutation", and "discretization". There is even some code, broh.
In the last post, we discussed a few simple statistics to kick-start our exploration into the mathematical validation of financial market indicators. We touched on the Interquartile Range (IQR), the Range/IQR ratio, and relative entropy. If you haven't already read that post, you can check it out below:
Today, we will cover the concept of stationarity, why it matters for indicators and automated trade systems, and how to calculate the mutual information between a predictor (indicator) and the target variable. There will be some math (isn't math great?) and some Python coding. At the end, we will have a working Python implementation for calculating mutual information with permutation tests that uses JIT (just-in-time) compilation and parallelization to make the operations fast and buttery.
Like the last article, this article pulls heavily from "Statistically Sound Indicators for Financial Market Prediction" by Timothy Masters. I want to reiterate that I am not a statistician/mathematician. I am just a dude attempting to find tools to help improve trading strategy testing and research. Many of these concepts are brand new to me, and like everything else I do, I intend to go backward and break them down and learn about them as necessary.
Disclaimer: the following post is an organized representation of my research and project notes. It doesn’t represent any type of advice, financial or otherwise. Its purpose is to be informative and educational. Backtest results are based on historical data, not real-time data. There is no guarantee that these hypothetical results will continue in the future. Day trading is extremely risky, and I do not suggest running any of these strategies live.
Stationarity
In time-series analysis, a time series (such as a financial indicator) is said to be stationary (or to have stationarity) if its statistical properties, and the relationships between its observations, remain unchanged over time. If the historical series behind an indicator isn't stationary, then you can't confidently infer its future behavior from its past behavior. To be considered stationary, the series' mean, variance, and autocorrelation must remain constant over time. For trading purposes, this means a flat-looking series with no trend, a constant variance, and no seasonality (periodic or otherwise predictable fluctuations).
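As a rough illustration of what "constant mean and variance" looks like in practice, here is a minimal sketch that splits a series into consecutive windows and compares the per-window mean and variance. This is an informal eyeball check and not a formal stationarity test. The function names (`window_stats`, `spread`), the window count, and the synthetic data are all my own assumptions for the example, and it uses only the Python standard library to keep it self-contained.

```python
import random
import statistics


def window_stats(series, n_windows=4):
    """Split a series into consecutive equal-size windows and
    return a (mean, variance) pair for each window."""
    size = len(series) // n_windows
    stats = []
    for w in range(n_windows):
        chunk = series[w * size:(w + 1) * size]
        stats.append((statistics.fmean(chunk), statistics.pvariance(chunk)))
    return stats


def spread(values):
    """Largest minus smallest value: how far a statistic moved across windows."""
    return max(values) - min(values)


# Synthetic data (hypothetical, for illustration only).
rng = random.Random(42)
noise = [rng.gauss(0.0, 1.0) for _ in range(2000)]

flat = noise  # stationary-looking: noise around a fixed mean
trending = [x + 0.005 * t for t, x in enumerate(noise)]  # same noise plus a steady trend

flat_mean_spread = spread([m for m, _ in window_stats(flat)])
trend_mean_spread = spread([m for m, _ in window_stats(trending)])

print(f"mean spread across windows, flat series:     {flat_mean_spread:.3f}")
print(f"mean spread across windows, trending series: {trend_mean_spread:.3f}")
```

The window means of the flat series should stay close together, while the trending series shows means that climb from window to window. The same comparison can be made on the variance column to spot a series whose volatility changes over time.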
Understanding Mutual Information with Time
Mutual Information (MI) is a statistic used in time-series analysis to measure the mutual dependence between two variables. Unlike linear correlation, it picks up non-linear relationships as well as linear ones. This characteristic makes it invaluable for financial markets, where variables often interact in non-linear, complex ways. MI can reveal hidden patterns that may not be apparent through traditional linear analysis.
Math
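The mutual information between two discrete (or discretized) variables $X$ and $Y$ is calculated using the formula:
$$I(X;Y) = \sum_{x \in X} \sum_{y \in Y} p(x,y) \, \log \frac{p(x,y)}{p(x)\,p(y)}$$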
where $p(x,y)$ is the joint probability distribution function of $X$ and $Y$, and $p(x)$ and $p(y)$ are the marginal probability distributions of $X$ and $Y$, respectively. This measure is derived from information theory, particularly the work of Claude Shannon in his seminal 1948 paper, which discusses the mutual dependence of variables and their information content.
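To make the definition concrete, here is a minimal sketch of an MI calculation on discretized data, written in plain standard-library Python for readability (a NumPy or Numba version would be much faster). The equal-frequency binning, the bin count, the function names, and the synthetic data are my own assumptions for the example. The probabilities $p(x,y)$, $p(x)$, and $p(y)$ are estimated from bin counts, and the result is in nats because it uses the natural log.

```python
import math
import random
from collections import Counter


def discretize(values, n_bins=5):
    """Assign each value to an equal-frequency bin (0..n_bins-1) by its rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, idx in enumerate(order):
        bins[idx] = rank * n_bins // len(values)
    return bins


def mutual_information(x_bins, y_bins):
    """Estimate I(X;Y) in nats from two equal-length sequences of bin labels."""
    n = len(x_bins)
    joint = Counter(zip(x_bins, y_bins))  # counts for p(x, y)
    px = Counter(x_bins)                  # counts for p(x)
    py = Counter(y_bins)                  # counts for p(y)
    mi = 0.0
    for (xb, yb), count in joint.items():
        p_xy = count / n
        # p(x,y) / (p(x) p(y)) written in terms of raw counts
        ratio = count * n / (px[xb] * py[yb])
        mi += p_xy * math.log(ratio)
    return mi


# Synthetic data (hypothetical, for illustration only).
rng = random.Random(7)
x = [rng.gauss(0.0, 1.0) for _ in range(2000)]
y = [xi * xi + 0.1 * rng.gauss(0.0, 1.0) for xi in x]  # non-linear function of x
z = [rng.gauss(0.0, 1.0) for _ in range(2000)]          # unrelated to x

mi_dependent = mutual_information(discretize(x), discretize(y))
mi_independent = mutual_information(discretize(x), discretize(z))

print(f"MI(x, y = x^2 + noise): {mi_dependent:.3f}")
print(f"MI(x, unrelated z):     {mi_independent:.3f}")
```

The quadratic relationship has close to zero linear correlation, yet its MI should come out well above the MI of the unrelated pair. Note that the unrelated pair will not score exactly zero: finite samples and binning give the estimate a small positive bias.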
Applying MI to Detect Nonstationarity
In financial markets, assessing the stationarity of time-series data is crucial. MI can be particularly telling as a test for nonstationarity by revealing how dependencies between variables change over time. Significant variations in MI at different time intervals could suggest potential nonstationarity in the data, indicating that the time series does not have constant statistical properties over time.
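One simple way to look for this is to compute MI separately on consecutive time windows and see whether the scores hold steady. The sketch below does that on synthetic data where the relationship exists in the first half and disappears in the second. Everything here (the helper names, window count, bin count, and data) is an assumption made for illustration, and the two MI helpers are included so the snippet runs on its own.

```python
import math
import random
from collections import Counter


def discretize(values, n_bins=4):
    """Assign each value to an equal-frequency bin (0..n_bins-1) by its rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, idx in enumerate(order):
        bins[idx] = rank * n_bins // len(values)
    return bins


def mutual_information(x_bins, y_bins):
    """Estimate I(X;Y) in nats from two equal-length sequences of bin labels."""
    n = len(x_bins)
    joint, px, py = Counter(zip(x_bins, y_bins)), Counter(x_bins), Counter(y_bins)
    return sum(
        (c / n) * math.log(c * n / (px[xb] * py[yb]))
        for (xb, yb), c in joint.items()
    )


def windowed_mi(x, y, n_windows=4, n_bins=4):
    """Compute MI separately on consecutive, non-overlapping time windows."""
    size = len(x) // n_windows
    scores = []
    for w in range(n_windows):
        window = slice(w * size, (w + 1) * size)
        scores.append(
            mutual_information(discretize(x[window], n_bins),
                               discretize(y[window], n_bins))
        )
    return scores


# Synthetic data (hypothetical): y follows x for the first 2000 bars,
# then turns into pure noise for the last 2000.
rng = random.Random(1)
x = [rng.gauss(0.0, 1.0) for _ in range(4000)]
y = [
    x[t] + 0.5 * rng.gauss(0.0, 1.0) if t < 2000 else rng.gauss(0.0, 1.0)
    for t in range(4000)
]

mi_by_window = windowed_mi(x, y)
for w, score in enumerate(mi_by_window):
    print(f"window {w}: MI = {score:.3f}")
```

A sharp drop in MI between windows, like the one this data is built to produce, is the kind of variation that hints at a relationship that is not stable over time. On real data the changes are rarely this clean, so the window size matters: too small and the scores get noisy, too large and the shifts get averaged away.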
Statistical Significance and Permutation Testing
Permutation tests should be used to validate the significance of MI calculations. This statistical technique randomly shuffles one of the series many times, which destroys any real relationship between the two, and recomputes MI after each shuffle. The result is a distribution of MI scores under the null hypothesis of no relationship between the series. Comparing the observed MI score against that distribution helps determine whether it reflects an actual dependency or is likely due to chance.
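Here is a minimal sketch of that procedure for a single predictor/target pair, in plain standard-library Python. It is not the Numba-powered version and does not reproduce the report discussed in this post; the function names, bin count, permutation count, and synthetic data are my own assumptions. The p-value is the fraction of shuffles whose MI is at least as large as the observed MI, with the usual "+1" correction so it can never be exactly zero.

```python
import math
import random
from collections import Counter


def discretize(values, n_bins=4):
    """Assign each value to an equal-frequency bin (0..n_bins-1) by its rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, idx in enumerate(order):
        bins[idx] = rank * n_bins // len(values)
    return bins


def mutual_information(x_bins, y_bins):
    """Estimate I(X;Y) in nats from two equal-length sequences of bin labels."""
    n = len(x_bins)
    joint, px, py = Counter(zip(x_bins, y_bins)), Counter(x_bins), Counter(y_bins)
    return sum(
        (c / n) * math.log(c * n / (px[xb] * py[yb]))
        for (xb, yb), c in joint.items()
    )


def permutation_p_value(x, y, n_perm=200, n_bins=4, seed=0):
    """Return (observed MI, permutation p-value) for predictor x and target y."""
    rng = random.Random(seed)
    x_bins = discretize(x, n_bins)
    y_bins = discretize(y, n_bins)
    observed = mutual_information(x_bins, y_bins)

    # Shuffling the target's bin labels is equivalent to shuffling the target
    # itself, because rank-based binning does not depend on the order in time.
    shuffled = list(y_bins)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if mutual_information(x_bins, shuffled) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


# Synthetic data (hypothetical, for illustration only).
rng = random.Random(3)
x = [rng.gauss(0.0, 1.0) for _ in range(500)]
y = [xi + rng.gauss(0.0, 1.0) for xi in x]     # target related to x
z = [rng.gauss(0.0, 1.0) for _ in range(500)]  # target unrelated to x

mi_dep, p_dep = permutation_p_value(x, y)
mi_ind, p_ind = permutation_p_value(x, z)

print(f"related target:   MI = {mi_dep:.3f}, p = {p_dep:.3f}")
print(f"unrelated target: MI = {mi_ind:.3f}, p = {p_ind:.3f}")
```

A small p-value for the related pair says that almost no shuffle produced an MI as large as the real one. Keep in mind that a plain shuffle also destroys any serial correlation in the shuffled series, so this sketch assumes the shuffled variable has little of it.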
Code
There will be two blocks of code in this section. The first will be a Numba (JIT) powered MI calculation, and the second will be a function that creates a report (DataFrame) with the MI calculation and the solo/unbiased p-values.
The rest of this post is for paid subscribers. Paid subscribers will have access to the private GitHub where all the Hunt Gather Code and strategies live. It is continuously being worked on and will expand as we explore the world of automated trade systems.