Research Notes: The Mann-Whitney U Test
Research notes and Python code for calculating a one-sided statistic. Lots of technical jargon in this one, but it's good information, so don't skip it.
Today's post is all about the Mann-Whitney U Test, but before we get into the notes, I want to touch on the research/posting methodology I am currently experimenting with. I am messing around with a new way to structure my notes and blog posts. The current working idea is this:
Orient each month's research around the strategy you want to try this month.
Structure research around each aspect of the strategy. This includes researching and developing the indicators we want to test and use and any statistical calculations we need during testing.
Publish articles and develop code for each unique part of the strategy, updating the strategy project incrementally as you progress.
The finished strategy project will be the culmination of the techniques and research conducted throughout the month.
For instance, this month, I'm finalizing the feature tests for our indicators. Subsequently, I'll delve into researching and developing three popular indicators that I plan to incorporate into our strategy by the end of the month. Each indicator will have its own publication and will be tailored specifically for automated trade systems, potentially calculating something different from what you're accustomed to seeing in your favorite charting software. Once the indicators are ready, I'll publish a research article, outlining the methods I'm using to determine the most effective way to use our indicators in a strategy. Finally, after completing the research, I'll develop the strategy, test it with the Lean backtesting engine, and share the results.
The essence of this new workflow is to establish a routine for strategy testing and then progressively build on it month by month as we venture into more complex research and strategy development. There's a wealth of information to explore in this field, and my aim is to introduce new methods and techniques as we go. Rather than spending excessive time perfecting a research methodology, I believe in the power of practical application for strategy development. My aspiration is that each month, we'll gain more insights, optimize our process for strategy research, and apply the methods we learn along the way.
Now that that is out of the way, let's look at some statistical math and Python code.
Disclaimer: the following post is an organized representation of my research and project notes. It doesn’t represent any type of advice, financial or otherwise. Its purpose is to be informative and educational. Backtest results are based on historical data, not real-time data. There is no guarantee that these hypothetical results will continue in the future. Day trading is extremely risky, and I do not suggest running any of these strategies live.
Mann-Whitney U Test
The Mann-Whitney U-test is a nonparametric statistical test used to compare two independent samples without assuming they follow a normal distribution. Developed by Mann and Whitney in 1947, this test can be used in place of the standard t-test when the samples don't meet the normality assumption, a crucial assumption for the t-test, or when dealing with ordinal data. Instead of comparing the means of the two samples (like the t-test), the U-test ranks the combined samples and then performs a calculation on the rank sums to test the hypothesis.
Assumptions
Independence: The samples from each group must be independent of each other.
Ordinal or Continuous Data: The data should be ordinal or continuous, not nominal.
Shape of the Distributions: The test assumes that the distributions of both groups are similarly shaped, allowing for a shift in location.
Hypotheses
Null Hypothesis (H0): The distributions of both groups are equal.
Alternative Hypothesis (H1): The distributions of the two groups differ by a location shift (one distribution is consistently shifted to the left or right of the other).
Rankings and Tied Rankings
Each data point from the combined datasets of both groups is ranked. This ranking process involves sorting the data from the lowest to the highest value and assigning each data point a rank based on its position in the sorted list. When two or more data points have the same value (ties), they receive the average rank of their positions. This prevents artificially inflating or deflating the sum of ranks for any group and is crucial for the accuracy of the test.
ranks = np.empty(n1 + n2, dtype=np.float64)
tie_correction = 0.0
i = 0
while i < n1 + n2:
    start = i
    while i < n1 + n2 - 1 and sorted_combined[i] == sorted_combined[i + 1]:
        i += 1
    i += 1
    ntied = i - start
    tie_correction += ntied ** 3 - ntied
    for j in range(start, i):
        ranks[j] = start + (i - start - 1) / 2.0 + 1
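To see the tie handling on concrete numbers, here is a minimal, self-contained sketch of the same ranking loop. The function name `average_ranks` and the sample values are my own for illustration and are not part of the project code. Unlike the fragment above, it sorts the input itself and maps the ranks back to the original order.

```python
import numpy as np

def average_ranks(values):
    """Return 1-based ranks (ties share their average rank) and the tie-correction term."""
    values = np.asarray(values, dtype=np.float64)
    order = np.argsort(values, kind="mergesort")  # stable sort of the input
    sorted_vals = values[order]
    n = len(values)
    sorted_ranks = np.empty(n, dtype=np.float64)
    tie_correction = 0.0
    i = 0
    while i < n:
        start = i
        # Advance i to the last element of the current run of equal values.
        while i < n - 1 and sorted_vals[i] == sorted_vals[i + 1]:
            i += 1
        i += 1
        ntied = i - start
        tie_correction += ntied ** 3 - ntied
        # Positions start..i-1 hold ranks start+1..i; their average is (start + i + 1) / 2.
        sorted_ranks[start:i] = (start + i + 1) / 2.0
    ranks = np.empty(n, dtype=np.float64)
    ranks[order] = sorted_ranks  # map ranks back to the original positions
    return ranks, tie_correction

# Hypothetical data: the two 1.0 values tie for ranks 1 and 2, so each gets 1.5.
ranks, tie_correction = average_ranks([3.0, 1.0, 4.0, 1.0, 5.0])
print(ranks.tolist())   # [3.0, 1.5, 4.0, 1.5, 5.0]
print(tie_correction)   # 6.0
```

The tie-correction term is 6.0 because the single tied pair contributes 2³ - 2 and every untied value contributes 1³ - 1 = 0.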
Compute the U Statistic
In this test, U values are calculated using the rank sums of the respective groups. In a two-sided test, this would be done for both groups, and the smaller of the two values would be used as the test statistic. Here we only want a one-sided test, so we only need to perform this calculation on the target set.
The equation for calculating U looks like:
$$U = n_1 n_2 + \frac{n_1 (n_1 + 1)}{2} - R_1$$
Where $n_1$ and $n_2$ are the sizes of the two samples and $R_1$ is the sum of the ranks in the first sample.
Since this is a one-tailed test, we are only summing the ranks for group 1 (the group for which we are calculating the U statistic). We calculate the U statistic in Python with the following code:
R = np.sum(ranks[sorted_group == 1])
U = n1 * n2 + 0.5 * (n1 * (n1 + 1.0)) - R
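As a sanity check on the rank-sum formula, here is a small worked example. The data and the helper `midrank` are hypothetical and only for illustration. It computes U from the rank sum of set 1 and cross-checks it by directly counting the pairs in which the set 2 value exceeds the set 1 value, with ties counted as one half. The two numbers should agree, which also shows why U is small when set 1 tends to hold the larger values.

```python
import numpy as np

# Hypothetical sample data, chosen only for illustration.
set1 = np.array([7.0, 9.0, 4.0, 9.0])
set2 = np.array([3.0, 5.0, 9.0])
n1, n2 = len(set1), len(set2)

combined = np.concatenate([set1, set2])

def midrank(v):
    # Rank of v in the combined sample: number of smaller values,
    # plus the midpoint of v's own tie group.
    return float(np.sum(combined < v) + (np.sum(combined == v) + 1) / 2.0)

# Rank sum for set 1, then U from the formula above.
R1 = sum(midrank(v) for v in set1)
U = n1 * n2 + 0.5 * n1 * (n1 + 1.0) - R1

# Cross-check: count (set1, set2) pairs where the set2 value is larger; ties count 0.5.
pairs = sum(float(b > a) + 0.5 * float(b == a) for a in set1 for b in set2)

print(R1, U, pairs)  # 18.0 4.0 4.0
```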
Statistical Significance and Z-Distribution
The U statistic is computed based on the ranks of the data from one of the groups. To determine the statistical significance of the U statistic, it is compared against a z-distribution:
Z-Score Calculation: The z-score is derived from the U statistic and adjusted for ties and the total number of comparisons. The z-score indicates how many standard deviations the observed statistic is from the mean under the null hypothesis.
dn = float(n1 + n2)
term1 = n1 * n2 / (dn * (dn - 1.0))
term2 = (dn**3 - dn - tie_correction) / 12.0
z = (0.5 * n1 * n2 - U_adjusted) / np.sqrt(term1 * term2)
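The z-score step can be wrapped in a small function to try it on numbers. This is a sketch under assumptions: the name `u_to_z` is mine, and it plugs in U directly where the snippet above uses `U_adjusted`, since the adjustment is not shown in this section. With no ties, the variance term reduces to the familiar n1 * n2 * (n1 + n2 + 1) / 12.

```python
import math

def u_to_z(U, n1, n2, tie_correction=0.0):
    """Normal approximation for U; tie_correction is the sum of (t**3 - t) over tie groups."""
    dn = float(n1 + n2)
    term1 = n1 * n2 / (dn * (dn - 1.0))
    term2 = (dn ** 3 - dn - tie_correction) / 12.0
    # Under the null hypothesis U has mean n1 * n2 / 2 and variance term1 * term2.
    # The sign is chosen so z is positive when U is below its mean.
    return (0.5 * n1 * n2 - U) / math.sqrt(term1 * term2)

# Hypothetical inputs: U = 4 with n1 = 4, n2 = 3, and one group of three tied values
# (3**3 - 3 = 24).
z = u_to_z(4.0, 4, 3, tie_correction=24.0)
print(round(z, 3))  # 0.734
```

With samples this small the normal approximation is rough, so treat the number as an illustration of the arithmetic rather than a reliable test result.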
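Interpretation: A large absolute value of z suggests rejecting the null hypothesis, indicating a statistically significant difference between the group distributions. For a two-sided test at a 95% confidence level, the cutoff is z greater than 1.96 or less than -1.96. For a one-sided test like ours, the cutoff at the same level is 1.645 in the direction being tested.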
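If you would rather work with a p-value than a z cutoff, the standard normal tail probability can be computed with nothing but the standard library. This is a minimal sketch. The function names are hypothetical, and it assumes z is signed so that positive values point in the direction being tested.

```python
import math

def one_sided_p(z):
    # Upper-tail probability of a standard normal: P(Z >= z).
    return 0.5 * math.erfc(z / math.sqrt(2.0))

def two_sided_p(z):
    # Probability of a value at least as extreme as z in either tail.
    return math.erfc(abs(z) / math.sqrt(2.0))

print(round(one_sided_p(1.645), 3))  # 0.05
print(round(two_sided_p(1.96), 3))   # 0.05
```

The same z therefore gives a one-sided p-value half the size of the two-sided one, which is why the one-sided cutoff is lower.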
Complete Python Code
This is the completed Mann-Whitney U-test in Python. It returns a U statistic based on set 1 relative to set 2, making it a one-tailed test. The value will be small when the mean of set 1 is greater than the mean of set 2. It uses Numba's JIT compiler for faster calculations.
The rest of this post is for paid subscribers. Paid subscribers will have access to the private GitHub where all the Hunt Gather Code and strategies live. It is continuously being worked on and will expand as we explore the world of automated trade systems.