Testing Indicator Soundness for Automated Trade Systems, Part 3
The final chapter of our series might mark the end of this quest, but I've got to wonder... was this all just the tutorial level?
The last installment of our indicator testing series is here. Each piece of this series has been slightly more complicated than its predecessor, and this one is no different. We don't have any math equations to cover this time, but the code was more difficult to reverse engineer and understand. While the concepts in this post may be a bit more advanced, you don't need a STEM degree to learn them, though I have to note that it certainly wouldn't hurt. These challenges make the hunt more enjoyable.
If you haven't read the previous posts, please check them out below. The post on the U-statistic is directly associated with this post, as we use the function created there to perform the test we are discussing today.
Before we move on to the subject at hand, I want to thank everyone who subscribes (free and paid) to my newsletter. I know that I have been off lately and I want to thank y'all for sticking around and continuing to subscribe. I wanted to let everyone know that I am working hard to try and bring valuable information and insights to the community.
Right now, I am in a building phase. I am trying to develop a workflow that allows me to conduct research and write about it along the way. That being said, I have decided that I am going to document the outcomes of my research regardless of their performance in testing. That means that win or lose, each month I am going to post the entire process of building and researching a trade strategy. Along the way, I will make improvements and additions to our process as I learn about them.
The code base in the GitHub repo is growing. My goal is to make sure that each thing I create can work as a standalone function in Python by relying on NumPy arrays and Numba's JIT compiler for computational speed. That means that the code is platform agnostic and can be adapted to use in your own workflow/research process. The goal is to create the functionality that we want without having to rely on external libraries as much.
Lastly, I want to give a broad overview of what I am thinking and where I think this is heading. If the last few posts haven't given it away, we are starting to look at strategy research and development the way a quantitative researcher would (loosely stated, no offense to the real quants out there). We are starting to see how prepping our data and measuring statistics can help give us an edge when conducting strategy research. These are the beginning steps we need to take if we want to start exploring the application of machine learning or predictive models in our strategies. There is much more to discuss before we get there, but that is where I see this heading in the future.
The price for paid subscriptions has been lowered to $19 a month. If you want access to the paid articles and the GitHub repo, and if you want to help support a SAHD on a mission to make financial market analysis and trade strategy development techniques more accessible, sign up today.
Disclaimer: the following post is an organized representation of my research and project notes. It doesn’t represent any type of advice, financial or otherwise. Its purpose is to be informative and educational. Backtest results are based on historical data, not real-time data. There is no guarantee that these hypothetical results will continue in the future. Day trading is extremely risky, and I do not suggest running any of these strategies live.
Nonstationary Mean
The purpose of a mean break test is to check whether a series of data (an indicator or market data, in this case) has a break in its mean, or has a nonstationary mean, over a certain period of time. There are a few different types of breaks in the mean that can occur in a series of data:
A sudden break in the mean. These types of breaks can be seen during historical events such as 1987's "Black Monday", the 2008 financial crisis, and any other period where there is a sharp change in the underlying mean of a market instrument.
Slow, unidirectional changes or monotonic shifts. The mean slowly drifts in one direction over time.
A wandering mean. The mean may drift in either direction over time.
The last type of change is the most difficult to detect. An inherent problem with testing time series data is the varying time window the data is tested in. It is entirely possible that an indicator is stationary over a long period of time but is reflecting a slowly varying property of the market (such as price data). This makes it difficult to determine if a drifting property is truly nonstationary and is a good reason to perform a visual inspection of the series.
The test we are going to discuss below is designed to test for the first two types of mean break.
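To make the break types concrete, here is a quick NumPy sketch that simulates one synthetic series for each. The seed, lengths, and magnitudes are arbitrary illustrations, not values from the repo:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 500

# 1) Sudden break: the mean jumps at a single point, like a crisis event.
sudden = rng.normal(0.0, 1.0, n)
sudden[n // 2:] += 3.0

# 2) Slow monotonic shift: the mean drifts steadily in one direction.
monotonic = rng.normal(0.0, 1.0, n) + np.linspace(0.0, 3.0, n)

# 3) Wandering mean: a random walk on top of the noise, drifting either way.
wandering = rng.normal(0.0, 1.0, n) + np.cumsum(rng.normal(0.0, 0.1, n))

# The halves of the sudden-break series have clearly different means.
print(sudden[: n // 2].mean(), sudden[n // 2:].mean())
```

Plotting these three series side by side is a good way to build intuition for why the wandering case is so hard to distinguish from a stationary indicator over a short window.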
Note: There are two variations to this test. We are only going to discuss the variation that accounts for serial-correlated data, which we will need for testing indicators.
A Serial-Correlation Mean-Break Test
This test utilizes a variation of the Mann-Whitney U-test across different segments of the time series, modified to account for serial correlation. All concepts were taken from reference 1 (below).
The series is divided into two segments at each potential break point (n_recent). This division is performed cyclically with varying offsets to account for serial correlation. The min_recent and max_recent parameters define the range within which potential breaks in the mean are analyzed. min_recent sets the minimum number of observations required on each side of a break point to ensure reliable statistical testing, preventing overly sensitive results due to small sample sizes. max_recent establishes the maximum boundary for investigating potential breaks, allowing the test to capture significant shifts without diluting the effects of localized changes by including too much stable data. When setting these parameters, consider the length of your data set, the typical behavior of the indicators being analyzed, and the desired balance between detecting rapid changes and maintaining statistical robustness. Adjusting these values appropriately ensures that the test is sensitive enough to detect real changes while minimizing the risk of false positives.
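As a rough sketch of how these boundaries play out (the helper name and layout are my own, not the repo's): with a series of length n, each candidate n_recent splits it into an "older" and a "recent" segment, and min_recent/max_recent bound how many candidates get tried:

```python
import numpy as np

def trial_boundaries(n, min_recent, max_recent):
    # Candidate sizes for the "recent" segment: keep at least min_recent
    # points on each side of the split so both segments stay testable.
    hi = min(max_recent, n - min_recent)
    return range(min_recent, hi + 1)

x = np.arange(100, dtype=float)
for n_recent in trial_boundaries(len(x), min_recent=20, max_recent=40):
    older, recent = x[: len(x) - n_recent], x[len(x) - n_recent:]
    # each (older, recent) pair would be fed to the U-test here

print(list(trial_boundaries(100, 20, 40))[0],
      list(trial_boundaries(100, 20, 40))[-1])  # 20 40
```

Note how a larger min_recent shrinks the candidate range from both ends, which is exactly the trade-off between sensitivity and robustness described above.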
The lag (correlation lag) used in this test is necessary when testing serially correlated data. It is defined as the interval over which serial correlation is significant. If the lag is set to zero, the test is performed without considering any serial dependencies, which we will almost certainly have when dealing with financial market/time series data. For a financial indicator calculated using a moving window, this lag typically corresponds to the window size. Each segment comparison only includes data points that are separated by at least this lag, ensuring that the comparisons are between statistically independent observations.
To robustly account for dependencies, the test is performed for multiple offsets from 0 up to lag - 1. This approach ensures that every potential independent comparison is considered, enhancing the test's ability to detect mean breaks accurately.
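One way to picture the offset scheme (a hypothetical helper, not the repo's code): starting a subsample at each offset from 0 to lag - 1 and stepping by lag keeps neighboring points at least lag apart, while the union of the subsets still covers every observation:

```python
import numpy as np

def lag_subsets(x, lag):
    # For each offset 0..lag-1, take every lag-th point so that
    # consecutive points within a subset are separated by the full
    # correlation lag (e.g. the indicator's moving-window length).
    step = max(lag, 1)
    return [(offset, x[offset::step]) for offset in range(step)]

x = np.arange(20)
for offset, sub in lag_subsets(x, 5):
    print(offset, sub)
# offset 0 gives [0 5 10 15], offset 1 gives [1 6 11 16], and so on;
# together the five subsets cover every point exactly once.
```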
For most situations involving time-series data, cyclic permutation is used instead of totally random permutation. See Batch Cyclic Permutation for further information. However, when we conduct the mean break test, we want to use a true randomization method instead. If we used cyclic permutation, we would just be shifting the location of the mean break, rendering the generated null-hypothesis distribution worthless for testing purposes.
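A minimal sketch of this randomization step (the function name and toy statistic are illustrative): a full shuffle destroys any mean break, whereas a cyclic shift would only relocate it, so the null distribution here is built from shuffles:

```python
import numpy as np

def permutation_null(x, stat_fn, n_perms=200, seed=0):
    # Null distribution of stat_fn built by fully shuffling the series.
    # A cyclic shift would merely move a mean break elsewhere; a true
    # shuffle removes it, which is what the null hypothesis requires.
    rng = np.random.default_rng(seed)
    work = x.copy()
    null = np.empty(n_perms)
    for i in range(n_perms):
        rng.shuffle(work)
        null[i] = stat_fn(work)
    return null

# Toy statistic: absolute mean difference between the two halves.
half_diff = lambda w: abs(w[: len(w) // 2].mean() - w[len(w) // 2:].mean())

x = np.r_[np.zeros(50), np.ones(50)]     # obvious break in the mean
null = permutation_null(x, half_diff)
print(half_diff(x), null.mean())         # observed break stat vs null level
```

The observed statistic of 1.0 sits far above the shuffled values, which is exactly the separation the permutation test relies on.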
For each offset, the U-statistic is calculated for all trial boundaries within the specified range (min_recent to max_recent). The maximum U-statistic across all trials and offsets is used as the final test statistic.
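Putting the pieces together, a simplified stand-in for the series' u_test (a normalized Mann-Whitney U) shows how the final statistic is the maximum over every offset and trial boundary. The helper names and the exact normalization are my own sketch, not the repo's implementation:

```python
import numpy as np

def u_stat(a, b):
    # Normalized Mann-Whitney U: |U - mean| / std under the null.
    # Large values mean the two samples are centered differently.
    u = sum(float((av > b).sum()) + 0.5 * float((av == b).sum()) for av in a)
    n1, n2 = len(a), len(b)
    mu = n1 * n2 / 2.0
    sigma = np.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    return abs(u - mu) / sigma

def max_break_stat(x, min_recent, max_recent, lag):
    # Final test statistic: the largest U over every lag offset and
    # every trial boundary in [min_recent, max_recent].
    step, best = max(lag, 1), 0.0
    for offset in range(step):
        sub = x[offset::step]
        m = len(sub)
        for n_recent in range(min_recent, min(max_recent, m - min_recent) + 1):
            best = max(best, u_stat(sub[: m - n_recent], sub[m - n_recent:]))
    return best

rng = np.random.default_rng(1)
x = rng.normal(0.0, 1.0, 200)
x[150:] += 2.0                            # inject a sudden mean break
print(max_break_stat(x, min_recent=5, max_recent=25, lag=5))
```

The real code compiles this kind of nested loop with Numba for speed, but the structure, an outer loop over offsets and an inner loop over trial boundaries, is the same.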
The solo p-value requires calculating how often the observed maximum criterion (or a more extreme value) occurs under a null hypothesis simulated via permutation tests.
To compute the unbiased p-value, we must consider the distribution of the maximum criterion across all variables (indicators) for each permutation. The unbiased p-value indicates how often the maximum observed statistic for any variable under the null hypothesis (from permutations) exceeds the maximum observed statistic from the original data across all variables.
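The two p-values can be sketched as follows, assuming we already have the observed maximum criterion per indicator and a matrix of permutation maxima (names and shapes are illustrative, not the repo's API):

```python
import numpy as np

def mean_break_p_values(obs, perm_max):
    # obs:      (n_vars,)          observed max criterion per indicator
    # perm_max: (n_perms, n_vars)  max criterion per permutation & indicator
    n_perms = perm_max.shape[0]
    # Solo: how often an indicator's own permuted maximum reaches its
    # observed value (+1 counts the original arrangement itself).
    solo = ((perm_max >= obs).sum(axis=0) + 1.0) / (n_perms + 1)
    # Unbiased: compare against the best statistic across ALL indicators
    # in each permutation, correcting for having selected the best one.
    best = perm_max.max(axis=1, keepdims=True)
    unbiased = ((best >= obs).sum(axis=0) + 1.0) / (n_perms + 1)
    return solo, unbiased

obs = np.array([3.0, 1.0])
perm = np.array([[1.0, 0.5],
                 [2.0, 2.5],
                 [0.5, 1.5]])
solo, unbiased = mean_break_p_values(obs, perm)
print(solo)      # solo: 0.25 and 0.75
print(unbiased)  # unbiased: 0.25 and 1.0
```

Notice that the unbiased p-value can only be greater than or equal to the solo p-value for a given indicator, since the per-permutation maximum across all indicators dominates any single indicator's value.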
Code
The first block of code is the core of the serial-correlated mean break test. It uses the u_test function to check a serial-correlated data series for a break in its mean. It is based on the methods found in Timothy Masters' "Statistically Sound Indicators for Financial Market Prediction".
The rest of this post is for paid subscribers. Paid subscribers will have access to the private GitHub where all the Hunt Gather Code and strategies live. It is continuously being worked on and will expand as we explore the world of automated trade systems.