Testing Indicator Soundness for Automated Trade Systems, Part 1
The first part in a series about testing our indicators' capacity to carry information.
**Announcement: I am lowering the price of HGT subscriptions and adding a discount for yearly subscriptions. I am also opening up the chat to all subscribers. More information will be posted to the chat soon.**
I have often wondered whether or not the information an indicator gives me is reliable. Seems like a legitimate pondering, but how the hell are you supposed to test an indicator to see if it holds value? There is always the eyeball test, but is there any other way to see if an indicator has some chops? Should we just keep tweaking a trade system around an indicator to see if it produces better backtest results?
These questions have haunted me since I started this project. Since the beginning, I have mentioned building a base from which to develop strategies. My idea was to have a suite of tools that could be used to test strategy theory and validity in a manner that gives us confident results.
But how do we develop confidence in our indicators and our strategies?
Math.
That's right, math. Turns out, I'm not the only person who has wondered this, and, luckily, some of the people who did were mathematicians, engineers, statisticians, physicists, or some other flavor of academic that I am not. This is good news for me. While I may not be an academic, I do know how to read, and the general concepts of math aren't lost on me. I'm also well aware that the manual methods a day trader uses are different from the methods deployed by quants and quant firms.
Good job, Larry. You discovered something everyone else already knows. On top of that, you are really oversimplifying the "math" concept.
You're right, other Larry. I am late to the game, and it's possible that I am oversimplifying the type of math that is used to quantify indicator and strategy value. How simple it is depends on your personal relationship with math. As a paramedic, proficiency in simple math is necessary to be efficient. No one wants to hear their medic having to look up dosages for paralytics or sedatives right before they are about to be intubated. This may not be much of a concern if you are using ketamine (amnesic qualities), but it's a bad look. Being able to calculate weight-based (and sometimes time-based) medication dosages on the fly is a must for any high-performing medic.
Unfortunately, the math involved in quantitative analysis/trading isn't that simple. It is unlikely that someone can look at an indicator's values and calculate their relative entropy on the fly. This doesn't mean that it's hard, though. Like everything else, math is something we can learn and apply without having to get a PhD. Leave the heavy lifting to the academics; what we are concerned with is learning how to apply and use the concepts they come up with.
In the first installment of our series, we are going to discuss a few of the more basic statistics that we want to investigate when looking at indicator candidates. These calculations will give us a general idea of which indicators require further investigation and testing. After discussing the calculations, we will create a Python script that takes in a set of features (indicators) and a target (price) and calculates a handful of basic statistics for us.
Disclaimer: the following post is an organized representation of my research and project notes. It doesn’t represent any type of advice, financial or otherwise. Its purpose is to be informative and educational. Backtest results are based on historical data, not real-time data. There is no guarantee that these hypothetical results will continue in the future. Day trading is extremely risky, and I do not suggest running any of these strategies live.
Simple Statistics
Before we jump into the heavy lifting with Python and start crunching numbers, let's take a moment to demystify some of the statistical jargon we're about to encounter. Since I'm not a mathematician/statistician/credentialed anything (sans BAMF), I'm going to keep the math as painless as possible. We're going to focus on a few key concepts that are essential for dissecting our trading indicators: IQR (Interquartile Range), Range/IQR, and Relative Entropy. These terms won't impress anyone at a party, and if you were at a party where nerd-talk could impress someone, these terms would probably still land you squarely in boot (read green af) territory. So, grab a cup of coffee, and let's break down these concepts into something a bit more digestible. Feel free to make it Irish.
IQR
Think of the IQR as the middle spread of your data. It's like the cozy middle ground of your dataset, where the extremes (top 25% and bottom 25%) don't get to hang out. The IQR tells you about the variability of the central portion of your data. It's calculated by subtracting the first quartile (25th percentile) from the third quartile (75th percentile). Why do we care about this? Because it gives you a good sense of where the bulk of your data points lie, minus the outliers throwing tantrums on the ends. The equation for the IQR looks like this:
$$\mathrm{IQR} = Q_3 - Q_1$$
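In code, the IQR is just the third quartile minus the first. Here is a minimal sketch using only Python's standard library. The function name `iqr` and the sample readings are made up for illustration, and the `"inclusive"` quantile method is my assumption; other quartile conventions give slightly different numbers on small samples.

```python
from statistics import quantiles


def iqr(values):
    """Interquartile range: third quartile minus first quartile."""
    # n=4 splits the data into quartiles and returns the three cut points.
    # method="inclusive" interpolates linearly between neighboring data points.
    q1, _median, q3 = quantiles(values, n=4, method="inclusive")
    return q3 - q1


# Hypothetical indicator readings: one calm set, one with a wild outlier.
calm = [1, 2, 3, 4, 5, 6, 7, 8, 9]
with_outlier = [1, 2, 3, 4, 5, 6, 7, 8, 100]

print(iqr(calm))
print(iqr(with_outlier))
```

Both calls print 4.0. Swapping the 9 for a 100 doesn't move the IQR at all, which is exactly the "outliers don't get to hang out" behavior described above.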
Range/IQR
Now, taking the concept of IQR further, we blend it with the total range of our data to get a relative sense of dispersion. The range, the difference between the maximum and minimum values, gives us the entire playground of our data points. Divide the range by the IQR, and you get a unitless measure that tells you how spread out your data is relative to its middle chunk. This can help you understand if your data is throwing a block party or a tight-knit gathering. A large ratio means the extremes sit far away from the middle 50% of the data, which usually points to outliers or heavy tails; a small ratio means the data is packed fairly evenly around its center.
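The ratio can be sketched the same way. Again, `range_iqr_ratio` and the sample data are hypothetical names and numbers, and returning NaN when the IQR is zero is just one way to handle that edge case.

```python
from statistics import quantiles


def range_iqr_ratio(values):
    """Total range divided by the interquartile range (unitless)."""
    q1, _median, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    if iqr == 0:
        # The ratio is undefined when the middle 50% has no spread.
        return float("nan")
    return (max(values) - min(values)) / iqr


# Hypothetical indicator readings.
tight_knit = [1, 2, 3, 4, 5, 6, 7, 8, 9]
block_party = [1, 2, 3, 4, 5, 6, 7, 8, 100]

print(range_iqr_ratio(tight_knit))
print(range_iqr_ratio(block_party))
```

The first set gives 2.0 and the second gives 24.75. One extreme value was enough to blow the ratio up by an order of magnitude, so a big number here is a hint to go look at the tails of the indicator before trusting it.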
Entropy, Raw and Relative
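First up is Raw Entropy, a concept that might remind you of the thrill of reading a (good) mystery novel for the first time. Each page turn (or outcome, in our case) carries a certain unpredictability, which is what entropy measures. Mathematically, it's represented as:
$$H(X) = -\sum_{i=1}^{n} p(x_i) \log p(x_i)$$
where $p(x_i)$ is the probability of outcome $x_i$ and the sum runs over all $n$ possible outcomes.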
Imagine you have a bag of colored balls, and each color represents a different outcome. The probability of drawing each color is known. Raw entropy measures the average "surprise," or uncertainty, of a draw from the bag. The higher the entropy, the more unpredictable the outcome. It's like saying that the more diverse the colors (outcomes) and the more evenly distributed they are, the more exciting and unpredictable the game becomes.
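The bag-of-balls picture translates directly into a few lines of Python. This is a sketch, not a finished tool: the function names `entropy` and `distribution` and the two bags are invented for the example, and I am assuming log base 2 (so the answer comes out in bits) since the choice of base only changes the units.

```python
import math
from collections import Counter


def distribution(draws):
    """Estimate the probability of each outcome from a list of observed draws."""
    counts = Counter(draws)
    total = len(draws)
    return [count / total for count in counts.values()]


def entropy(probabilities, base=2):
    """Shannon entropy of a discrete distribution (bits when base=2)."""
    # Outcomes with zero probability are skipped: they contribute nothing.
    return -sum(p * math.log(p, base) for p in probabilities if p > 0)


# Hypothetical bags: four equally likely colors vs. a bag that is almost all red.
even_bag = ["red", "blue", "green", "yellow"] * 25
lopsided_bag = ["red"] * 97 + ["blue", "green", "yellow"]

print(entropy(distribution(even_bag)))      # about 2 bits, the maximum for four colors
print(entropy(distribution(lopsided_bag)))  # far lower: you can nearly always guess "red"
```

For an indicator, the "colors" would be bins of indicator values. How many bins to use is a judgment call that this sketch does not address.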
Now, let's say you have two bags of colored balls, and you're curious about how similar they are in terms of color distribution. This is where Relative Entropy comes into play. It compares the distribution of colors (outcomes) between the two bags, giving us a measure of their divergence. In essence, it tells us how much one distribution (A) deviates from another (B), allowing us to gauge the uniqueness and predictability of the information presented by each distribution.
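The two-bag comparison described above is what statisticians call the Kullback-Leibler divergence, $D(A \| B) = \sum_i a_i \log(a_i / b_i)$. The sketch below assumes that reading of "relative entropy", assumes log base 2, and uses made-up names (`relative_entropy`, `bag_a`, `bag_b`) and probabilities.

```python
import math


def relative_entropy(p, q, base=2):
    """Kullback-Leibler divergence D(P || Q) for two discrete distributions.

    p and q list the probabilities of the same outcomes in the same order.
    """
    if len(p) != len(q):
        raise ValueError("p and q must cover the same outcomes")
    total = 0.0
    for p_i, q_i in zip(p, q):
        if p_i == 0:
            continue  # outcomes that never occur under P contribute nothing
        if q_i == 0:
            return math.inf  # P allows an outcome that Q says is impossible
        total += p_i * math.log(p_i / q_i, base)
    return total


# Hypothetical color probabilities for two bags (red, blue, green, yellow).
bag_a = [0.70, 0.10, 0.10, 0.10]
bag_b = [0.25, 0.25, 0.25, 0.25]

print(relative_entropy(bag_a, bag_b))  # how far A strays from B
print(relative_entropy(bag_b, bag_a))  # not the same number: order matters
print(relative_entropy(bag_a, bag_a))  # 0.0, a bag never diverges from itself
```

Two things are worth noticing. The measure is not symmetric, so "A versus B" and "B versus A" give different answers. And when bag B is perfectly even, as it is here, the divergence equals the maximum possible entropy, $\log n$, minus the raw entropy of bag A, which ties this back to the previous calculation.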
Code
The rest of this post is for paid subscribers. Paid subscribers will have access to the private GitHub where all the Hunt Gather Code and strategies live. It is continuously being worked on and will expand as we explore the world of automated trade systems.