Getting familiar with the LEAN Engine and QuantConnect Research
What are we talking about? Practice? We talking about practice, man! - Allen Iverson
I'll be honest. I didn't know that Iverson was the originator of the practice rant until recently. I only knew it from one of the many great scenes in "Ted Lasso." I find it funny that in the show, Ted uses the practice rant to shame a player for missing practice, but the original rant is slanted towards the game being more important. So, which is it? Practice or the game?
I recently listened to the Huberman Lab podcast with guest Dr. Cal Newport. Among the many things discussed in this episode (distraction, focus, and cognitive improvement, to name a few), one of the recurring themes was the difference between the states of mind required for performance and practice. Dr. Newport believes that to get better at something, you must practice doing it in a manner that makes you uncomfortable. This means pushing yourself beyond your current comfort levels during practice. Practice, according to Dr. Newport, is not fun.
This makes sense. When I was young, it was precisely this that separated the good from those with real potential. I saw it in swimming, wrestling, karate, and eventually skateboarding. The people who were truly good pushed themselves when they practiced. The skateboarders who flourished risked eating concrete on a regular basis to hone their craft. I had a friend who practiced for hours a day and even had a staircase built in his backyard to practice gap tricks and handrails.
I never had that drive when I was younger. I was cursed with the ability to learn new skills rapidly, but I was also inherently lazy. Once I fell behind in something, I tended to quit. For example, I won a wrestling tournament once. The next match I wrestled in was many months later, and the very first match was against the same wrestler I had beaten in the last tournament. I got fucking destroyed. Pinned in the second round. There may have even been some crying.
Anyway, I didn't know how to practice to get better. I relied on talent and luck. It would take a good psychologist and many paid hours to unpack why I missed this lesson when I was younger, but that doesn't matter now. Now, I know that practice is everything and that there is almost nothing worth doing that doesn’t require practice.
Coding is no exception.
I believe that practice is just as important as the game. They don't exist without each other. In the world of coding, the game is creating code that functions and provides value, which is what today's post is about...
Practice.
In the last post, I said I would make an OS change and then jump into the LEAN/QuantConnect (QC) documentation to learn to use their system. Since then, I have switched from Windows to Linux, specifically to Pop!_OS (Ubuntu-based). I don't regret it. I even discovered that MotiveWave has a native Linux application, probably because it is written in Java. I'll experiment with that as a trade execution platform for my manual trading.
After a successful OS swap, I turned my focus to learning how to use QC and the LEAN engine (is it supposed to be in all caps?). I am not a professional developer, so I threw myself at this the only way I know how. I learn best when I am taking action, and I know that I am going to need to get some reps in.
There is a lot to go over regarding the QuantConnect cloud platform. The platform is geared towards people with coding knowledge and provides many different cloud-based services for researchers and trade system developers. I could probably create an entire series just covering the cloud platform. Luckily, another Substack author and trade researcher has already done that at the B/O Trading Blog. The series covers everything from setting up an account to learning your way around the code. If you want to get started using the QC platform, it is a great series to read. The first article of the series is linked below.
The first thing I did was install the LEAN CLI tool and get set up to start developing locally. One of the things that most excited me was using the Jupyter research notebooks to learn how to use Python (and certain Python libraries) more efficiently. Unfortunately, it seems that it is impossible to connect a local notebook to the remote kernel. This means you have to use local data to play with the local research notebook. Bummer.
I am going to briefly cover how I set up my environment to work with Python code, then go over a few examples from my practice efforts to discuss some of what I have learned in the process: fetching historical data, resampling data, charting candlesticks and indicators (overlay and subplot), and even a little teaser of code that sparked an idea for the next few articles I will be publishing.
Disclaimer: the following post is an organized representation of my research and project notes. It doesn’t represent any type of advice, financial or otherwise. Its purpose is to be informative and educational. Backtest results are based on historical data, not real-time data. There is no guarantee that these hypothetical results will continue in the future. Day trading is extremely risky, and I do not suggest running any of these strategies live.
Python Environment
There are a shit ton of tutorials on the web about setting up a local Python development environment. In the end, most of these setups come down to developer preference. For example, I like to do everything from inside my terminal. This means I use a text editor like Neovim, and I use other CLI (command line interface) tools to make the process more efficient/intuitive for myself. Just because I like this workflow doesn't mean you will.
That being said, I will just list the specific tools I have installed and how to install them or point to documentation. I have installed all of these for use on the command line, but many of them have GUI (graphical user interface) options. Also, if you don't care about working in a local environment, you can code your QC projects entirely in the browser.
Install Docker
Follow the instructions at the Docker Website. Post install, you can use the instructions at Linux Post Install to remove the need to use `sudo` every time you want to run a docker command. If you prefer using your mouse, you can install the Docker Desktop application.
Install Miniconda/Anaconda
Follow the instructions at the Anaconda Site or run the following commands from your terminal. Anaconda is the GUI application if you prefer to go that route.
mkdir -p ~/miniconda3
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -O ~/miniconda3/miniconda.sh
bash ~/miniconda3/miniconda.sh -b -u -p ~/miniconda3
rm -rf ~/miniconda3/miniconda.sh
Afterwards, initialize the newly installed Miniconda for bash or zsh, whichever you use.
~/miniconda3/bin/conda init bash
~/miniconda3/bin/conda init zsh
Conda Quick Start Commands
conda create --name lean-env python=3.12
conda activate lean-env
Install LEAN CLI
Before installing the LEAN CLI, make sure you activate the `lean-env` environment from above. This keeps all of the libraries/dependencies you need for working with QC and LEAN isolated in their own environment.
pip install lean
Login to QuantConnect with LEAN CLI for Cloud Services
You will need to request your User ID and API key from QuantConnect before logging in with the LEAN CLI. Once received, run `lean login` and paste in the information when prompted.
Create a new organization workspace. This command needs to be run in an empty directory, which will act as the root folder for the LEAN workspace. When initializing, you can set the default language for projects within that workspace.
lean init --language python
Create new QC project
lean project-create "My Python Project"
Set default language
lean config set default-language python
Practice
Once I learned that I didn't have access to the QC cloud kernel for a local research notebook, I switched to working with the coding environment QC has in the browser. This gave me cloud access to all sorts of data to play with. Here are some of the things I have learned so far.
Requesting Historical Data
We can't do anything without data. There are several different ways to request historical data in QC, each returning data in a different shape or amount. As a futures trader, I practiced getting data for the instrument that I trade: ES.
The first code block subscribes to an instrument and gets historical data for our charts. It then drops the first two levels of the multi-index so that the datetime level becomes the index, which is what the charting libraries expect.
qb = QuantBook()

start_date = datetime(2021, 1, 1)
end_date = datetime(2024, 2, 29)

future = qb.AddFuture(Futures.Indices.SP500EMini, Resolution.Daily,
                      dataNormalizationMode=DataNormalizationMode.BackwardsRatio,
                      dataMappingMode=DataMappingMode.LastTradingDay,
                      contractDepthOffset=0)

history = qb.History(future.Symbol, start_date, end_date,
                     Resolution.Daily, extendedMarketHours=True)

history.index = history.index.droplevel([0, 1])
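For reference, `History()` also accepts a bar count in place of a date range, which is handy when you just want the N most recent bars. A minimal sketch (the 500-bar count is my own arbitrary choice, not from the original notebook):

# Request the most recent 500 daily bars instead of an explicit date range
history_by_count = qb.History(future.Symbol, 500, Resolution.Daily)

# The same droplevel trick applies before handing the frame to a charting library
history_by_count.index = history_by_count.index.droplevel([0, 1])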
This next code block is an experiment in creating a standardized approach to setting up a research environment: an `instruments` object stores all the data frames and metadata needed for each instrument. The goal is to be able to add or remove assets manually and still have easy access to their information for research.
qb = QuantBook()
instruments = {
    'ES': {
        'symbol': Futures.Indices.SP500EMini,
        'charts': {
            '15min': {
                'pandasTimeFrame': '15T',
                'timespan': timedelta(days=30),
                'dataframe': None,
                'indicators': {}
            },
            '1H': {
                'pandasTimeFrame': '60T',
                'timespan': timedelta(days=90),
                'dataframe': None,
                'indicators': {
                    'FisherTransform': {
                        'data': None,
                        'overlay': False
                    }
                }
            },
            '4H': {
                'pandasTimeFrame': '240T',
                'timespan': timedelta(days=365),
                'dataframe': None,
                'indicators': {}
            },
            'D': {
                'pandasTimeFrame': 'D',
                'timespan': timedelta(days=730),
                'dataframe': None,
                'indicators': {
                    'EMA100': {
                        'data': None,
                        'overlay': True
                    }
                }
            },
        },
        'contract': None  # Placeholder for the subscription object
    }
}
def aggregate_to_timeframe(df, timeframe):
    # Resample minute bars into OHLCV bars at the requested timeframe
    aggregated_df = df.resample(timeframe).agg({
        'open': 'first',
        'high': 'max',
        'low': 'min',
        'close': 'last',
        'volume': 'sum'
    })
    return aggregated_df
for instrument_name, instrument_details in instruments.items():
    symbol = instrument_details['symbol']
    contract = qb.AddFuture(symbol, Resolution.Minute,
                            dataNormalizationMode=DataNormalizationMode.BackwardsRatio,
                            dataMappingMode=DataMappingMode.LastTradingDay,
                            contractDepthOffset=0,
                            extendedMarketHours=True)
    instrument_details['contract'] = contract

    # Request enough minute history to cover the longest chart timespan
    period = max(detail['timespan'] for detail in instrument_details['charts'].values())
    history = qb.History(TradeBar, contract.Symbol, period, extendedMarketHours=True)
    history.index = history.index.droplevel([0, 1])

    for chart, details in instrument_details['charts'].items():
        start_date = datetime.now() - details['timespan']
        filtered_history = history[history.index.get_level_values('time') >= start_date]
        if not filtered_history.empty:
            timeframe = details['pandasTimeFrame']
            aggregated_df = aggregate_to_timeframe(filtered_history, timeframe)
            instruments[instrument_name]['charts'][chart]['dataframe'] = aggregated_df
        else:
            print(f"No data for {instrument_name} at {chart}")
# Example usage
es_15min_head = instruments['ES']['charts']['15min']['dataframe'].head()
print(f"15 Minute DataFrame Head:\n{es_15min_head}")
Creating Indicators
The first code block is a simple example of creating some indicators using the `Indicator()` helper method. I left the imports in the code as a reference, but you don't actually have to import them when using `qb`. Note that the returned data frames name their value column after the indicator in lowercase (e.g., 'fishertransform'), which matters when we chart them later.
from QuantConnect.Indicators import FisherTransform
from QuantConnect.Indicators import ExponentialMovingAverage

historyBars = 500  # bar count for the indicator history request (500 is an arbitrary choice)

ftPeriod = 10
fisherTransform = FisherTransform(ftPeriod)
ftData = qb.Indicator(fisherTransform, future.Symbol, historyBars, Resolution.Daily)

emaPeriod = 100
ema100 = ExponentialMovingAverage(emaPeriod)
emaData = qb.Indicator(ema100, future.Symbol, historyBars, Resolution.Daily)

print(emaData)
This next code extends the second example in the historical data section. It handles adding any indicators we need to the charts. We create a function for each indicator we want and ensure it returns a correctly formatted data frame for use in the `add_indicators()` function. Since we are requesting minute data and building our bars from that data, we need to use the `get_equivalant_bars()` function to calculate the number of minute bars in our historical data frame. Since most assets don't trade 24/7, this returns an integer larger than we need for our indicator data frame. We use the `align_indicator_with_chart()` function to drop the rows we don't need from our indicator history and match them to the historical data.

The `add_indicators()` function takes in the `instruments` dictionary and iterates through each instrument and chart to determine whether we need to create an indicator. It checks this by inspecting the charts/indicators portion of the instrument object. To add an indicator, add the name you want to the appropriate charts/indicators list, create a function that builds the indicator, and add an `if` statement in `add_indicators()` that checks for the presence of that indicator name in the instrument object.
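The `get_equivalant_bars()` helper itself isn't shown here (the code below inlines the same calculation inside `get_indicators()`), but a minimal sketch of it might look like this:

def get_equivalant_bars(period):
    # Convert a timedelta into an equivalent count of one-minute bars.
    # This over-counts for assets that don't trade 24/7, which is why the
    # indicator history gets trimmed later by align_indicator_with_chart().
    return int(period / timedelta(minutes=1))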
from QuantConnect.Indicators import FisherTransform, ExponentialMovingAverage

def calculate_fisher_transform(symbol, timespan, period=10, resolution=Resolution.Minute):
    ft = FisherTransform(period)
    print('Calculate fisher 1')
    ft_data = qb.Indicator(ft, symbol, timespan, resolution)
    print('Calculate fisher 2')
    if not isinstance(ft_data.index, pd.DatetimeIndex):
        ft_data.index = pd.to_datetime(ft_data.index)
    return ft_data

def calculate_ema(symbol, timespan, period=100, resolution=Resolution.Minute):
    ema = ExponentialMovingAverage(period)
    print('Calculate ema 1')
    ema_data = qb.Indicator(ema, symbol, timespan, resolution)
    print('Calculate ema 2')
    if not isinstance(ema_data.index, pd.DatetimeIndex):
        ema_data.index = pd.to_datetime(ema_data.index)
    return ema_data
def get_indicators(instrument_details):
    data = {}
    unique_indicators = set()

    # Size the indicator request to the longest chart timespan, in one-minute bars
    period = max(detail['timespan'] for detail in instrument_details['charts'].values())
    timespan = int(period / timedelta(minutes=1))

    for chart_name, chart_details in instrument_details['charts'].items():
        unique_indicators.update(chart_details['indicators'].keys())

    if not unique_indicators:
        return data

    print('Right before calculate indicator methods')
    for indicator in unique_indicators:
        if indicator == 'FisherTransform':
            data['FisherTransform'] = calculate_fisher_transform(instrument_details['contract'].Symbol, timespan)
        elif indicator == 'EMA100':
            data['EMA100'] = calculate_ema(instrument_details['contract'].Symbol, timespan)

    for indicator_df in data.values():
        if not isinstance(indicator_df.index, pd.DatetimeIndex):
            indicator_df.index = pd.to_datetime(indicator_df.index)

    return data
def align_indicator_with_chart(indicator_df, chart_df):
    # Trim the indicator history to the chart's date range
    start_date = chart_df.index.min()
    end_date = chart_df.index.max()
    aligned_indicator_df = indicator_df.loc[start_date:end_date]
    return aligned_indicator_df
def add_indicators(instruments):
    for instrument_name, instrument_details in instruments.items():
        data = get_indicators(instrument_details)
        for chart_name, chart_details in instrument_details['charts'].items():
            pandas_freq = chart_details['pandasTimeFrame']
            dataframe = chart_details['dataframe']
            print("Inside chart loop")
            if 'FisherTransform' in chart_details['indicators']:
                ftData = data['FisherTransform']
                # Resample the minute-level indicator down to the chart's timeframe
                ft_resampled = ftData.resample(pandas_freq).last().dropna()
                alignedData = align_indicator_with_chart(ft_resampled, dataframe)
                chart_details['indicators']['FisherTransform']['data'] = alignedData
            if 'EMA100' in chart_details['indicators']:
                emaData = data['EMA100']
                ema_resampled = emaData.resample(pandas_freq).last().dropna()
                alignedData = align_indicator_with_chart(ema_resampled, dataframe)
                chart_details['indicators']['EMA100']['data'] = alignedData
add_indicators(instruments)
print(instruments['ES']['charts']['1H']['indicators']['FisherTransform']['data'].tail())
print(instruments['ES']['charts']['D']['indicators']['EMA100']['data'].head())
I am happy with the code above, but it takes a long time (40 seconds or more) to run. This is because the `Indicator()` helper function makes an API call to get historical data, calculates the indicator, and returns a data frame with the results. In the repository, you will find (in the multiChart notebook) that I take this further and experiment with creating the indicators manually. This is much faster (since it uses the data we already have), but I haven't gotten completely comfortable with this method yet.
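To give a flavor of the manual approach, here is a minimal sketch that computes the 100 EMA directly from the daily dataframe we already built, using pandas' `ewm()`. This is my own illustration, not the multiChart notebook code, and `ewm()`'s warm-up behavior differs slightly from LEAN's `ExponentialMovingAverage`:

# Compute a 100-period EMA from the daily chart we already have in memory,
# instead of calling qb.Indicator() and waiting on another history request
daily_df = instruments['ES']['charts']['D']['dataframe']
daily_df['ema100'] = daily_df['close'].ewm(span=100, adjust=False).mean()
print(daily_df[['close', 'ema100']].tail())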
Creating Charts
I experimented with creating charts with a few different libraries. The following two examples build from the first examples in the previous sections. Both of them create a candlestick chart with an overlay indicator and a subplot indicator.
First up, Plotly. Plotly uses subplots to attach the Fisher Transform below the candlestick chart.
import plotly.graph_objects as go
from plotly.subplots import make_subplots

# Define a figure with subplots: 2 rows, 1 column, and shared x-axes
fig = make_subplots(rows=2, cols=1, shared_xaxes=True,
                    vertical_spacing=0.02,
                    subplot_titles=('OHLC Candlestick Chart', 'Fisher Transform'),
                    row_heights=[0.7, 0.3])

# Add the candlestick chart to the first row
fig.add_trace(go.Candlestick(x=history.index,
                             open=history['open'],
                             high=history['high'],
                             low=history['low'],
                             close=history['close'],
                             name='Candlestick'),
              row=1, col=1)

# Add the Fisher Transform line plot to the second row
fig.add_trace(go.Scatter(x=ftData.index, y=ftData['fishertransform'],
                         mode='lines', name='Fisher Transform'),
              row=2, col=1)

# Update layout options as needed (e.g., titles, axis labels)
fig.update_layout(title='OHLC and Fisher Transform Analysis',
                  xaxis_title='Date',
                  yaxis_title='Price',
                  xaxis2_title='Date',
                  yaxis2_title='Fisher Transform',
                  xaxis_rangeslider_visible=False)

# Update x-axis options for better visualization
fig.update_xaxes(row=1, col=1, rangeslider_visible=False)  # Hide the range slider for the candlestick chart

# Show the figure
fig.show()
Next, Bokeh. Bokeh doesn't use subplots. Instead, you build each plot separately and can size them however you wish.
from bokeh.plotting import figure, show
from bokeh.io import output_notebook
from bokeh.layouts import gridplot

output_notebook()

data = history[['open', 'high', 'low', 'close', 'volume']]
up_days = data[data['close'] > data['open']]
down_days = data[data['open'] > data['close']]

plot_height_main = 600
plot_height_ft = int(plot_height_main / 3)

plot = figure(title=f"{future.Symbol.Value} OHLC",
              x_axis_label='Date',
              y_axis_label='Price',
              y_axis_location="right",
              x_axis_type='datetime',
              width=1100,
              height=plot_height_main)

# Wicks: vertical segments from high to low
plot.segment(data.index, data['high'], data.index, data['low'], color="black")

# Candle bodies: half a day wide (in milliseconds), green up, red down
width = 12 * 60 * 60 * 1000
plot.vbar(up_days.index, width, up_days['open'], up_days['close'], fill_color="green", line_color="green")
plot.vbar(down_days.index, width, down_days['open'], down_days['close'], fill_color="red", line_color="red")

# Overlay the 100 EMA on the candlestick chart
plot.line(emaData.index, emaData['exponentialmovingaverage'], line_width=1, color="navy", legend_label="100 EMA")

# Customize the legend
plot.legend.location = "top_left"
plot.legend.click_policy = "hide"

# Fisher Transform plot; the shared x_range keeps panning and zooming synchronized
ft_plot = figure(title="Fisher Transform", x_axis_label='Date', y_axis_label='Fisher Value',
                 x_axis_type='datetime', width=1100, height=plot_height_ft,
                 x_range=plot.x_range)
ft_plot.line(ftData.index, ftData['fishertransform'], line_width=1, color="blue")

# Combine the plots
p = gridplot([[plot], [ft_plot]])
show(p)
Testing Indicators for Statistical Soundness
I know what you are thinking. Larry, you just went from basic data manipulation and visualization to talking about statistical soundness. Are you purposefully trying to make this difficult? Yes, I am. We are talking about practice, remember?
After I got the hang of getting data and displaying it, I turned towards trying something different. I have been reading "Statistically Sound Indicators For Financial Market Prediction: Algorithms in C++" by Dr. Timothy Masters. This book discusses the different techniques you can use to test the overall soundness of an indicator. I thought it would be cool (and valuable) to try and recreate these tests in a QC/LEAN project. These tests could provide a starting point for developing and testing trade systems.
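As a taste of the kind of test the book motivates, here is a minimal sketch of a Monte Carlo permutation test: shuffle the indicator relative to forward returns many times and ask how often the shuffled correlation beats the real one. This is my own illustration of the general idea, not one of Dr. Masters' algorithms verbatim, and a naive shuffle like this ignores the serial correlation his methods are careful about:

import numpy as np

def permutation_pvalue(indicator, forward_returns, n_permutations=1000, seed=42):
    # H0: the indicator has no relationship with forward returns.
    # Estimate how often a shuffled indicator correlates at least as strongly.
    rng = np.random.default_rng(seed)
    observed = abs(np.corrcoef(indicator, forward_returns)[0, 1])
    count = 0
    for _ in range(n_permutations):
        shuffled = rng.permutation(indicator)
        if abs(np.corrcoef(shuffled, forward_returns)[0, 1]) >= observed:
            count += 1
    # The +1s keep the estimated p-value away from exactly zero
    return (count + 1) / (n_permutations + 1)

# Hypothetical usage: 1H Fisher Transform vs. next-bar returns
ft = instruments['ES']['charts']['1H']['indicators']['FisherTransform']['data']['fishertransform']
fwd = instruments['ES']['charts']['1H']['dataframe']['close'].pct_change().shift(-1)
common = ft.index.intersection(fwd.dropna().index)
print(permutation_pvalue(ft.loc[common].values, fwd.loc[common].values))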
The rest of this post is for paid subscribers. Paid subscribers will have access to the private GitHub where all the Hunt Gather Code and strategies live. It is continuously being worked on and will expand as we explore the world of automated trade systems.