How Market Data Reaches You
Part two of the market data for quants series. Just more information you should know.
In Part One of this series, we talked about the types of market data (trade and depth) you get from exchanges, how it travels from exchanges through vendors and brokers to end users, and the distinction between historical and streaming feeds. It was a simple breakdown that only covered the basic data a quant might deal with. Realistically, that's what most retail or hobbyist traders will work with. There are many other types of data that can inform trading decisions, but exchange data is the foundation for the quant curious.
In this post I want to zoom in on the practical side: how does this data actually get to your terminal, platform, or research environment? The answer lies in the delivery model. Whether you are downloading static data files (CSV), making requests to REST APIs, or connecting to a streaming protocol, the delivery method shapes what you can do with the data and which strategies you can deploy.
File Exports
This is the OG method for getting historical data for testing. It is still widely used and, as we know from previous posts, one I am not a big fan of. Brokers, platforms, and data vendors will sometimes provide data as flat files (CSV being the most popular). It's a simple, code-free way to get data to test: portable, human-readable, and it works with spreadsheet applications (Excel) and programming languages (Python via pandas). The trade-off is that it is not real-time and is often limited in granularity.
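If you do go the CSV route, loading it in Python is about as painless as it gets. Here's a minimal sketch using pandas; the prices are made-up sample data, and the column names vary by vendor, so check yours before parsing.

```python
import io

import pandas as pd

# A tiny OHLCV sample in the flat-file format most vendors export.
# These numbers are made up; real vendor files may also differ in
# column names, date formats, and delimiters.
csv_text = """Date,Open,High,Low,Close,Volume
2024-01-02,187.15,188.44,183.89,185.64,82488700
2024-01-03,184.22,185.88,183.43,184.25,58414500
2024-01-04,182.15,183.09,180.88,181.91,71983600
"""

# In practice you'd pass a file path instead of io.StringIO.
df = pd.read_csv(io.StringIO(csv_text), parse_dates=["Date"], index_col="Date")

# Once it's a DataFrame, derived columns are one-liners.
df["Return"] = df["Close"].pct_change()
print(df[["Close", "Return"]])
```

From here, resampling, indicators, and basic stats are all a few lines of pandas away, which is exactly why CSV plus Python is the default starting point for backtests.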
There are plenty of data vendors that provide CSV downloads or exports. Personally, I used Norgate Data for daily data during my RealTest experiments. The data was reliable and great for daily strategies, especially equities. But it required using their software to access the data, either via CSV export or a connection to your testing/trading platform. Not the worst setup, but not the best either.
A quick Google search will turn up plenty of providers that let you download aggregated OHLCV data in this format. This is the best way to get data if you are trying to no-code test an idea. In a soon-to-be-featured article, I will show how to use simple CSV data to test a strategy in a spreadsheet platform. Spoiler: I don't fuck with Excel or Google Sheets, but you can do whatever you want.
Web APIs
Now we are getting closer to the kind of data I like. I admit, I am a nerd. Being a nerd requires that newer technology gets you off a bit, but I have to admit, web APIs aren't new. The finance world just hasn't completely caught up with the times. An API is really just a defined interface that allows remote, sometimes authenticated, access to some form of data. The real nerdy part is how the data is stored, but I am getting ahead of myself.
REST APIs
REST stands for Representational State Transfer. Yeah, it doesn't really mean dick if you aren't a web developer. It's just an architectural style that defines how you design an API. Oh yeah, API stands for "application programming interface". In case you didn't know. Probably should have mentioned that sooner. If you aren't a developer, it doesn't matter. If you are a developer, you already know. I hope.
In the last post, I gave a quick example of how to get data from the Alpaca API. It's pretty simple, especially if you are just backfilling data for a research pipeline. There are other API styles out there, but REST is the standard on the inter-webs.
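To make the REST pattern concrete, here's a sketch of the request/response cycle using only the standard library. The endpoint URL and JSON field names below are assumptions for illustration, not any particular vendor's real schema; Alpaca's own docs are the source of truth for theirs.

```python
import json
from urllib.parse import urlencode

# Hypothetical bars endpoint -- the URL and parameter names are placeholders,
# not a real vendor's API. Real APIs also want an auth key in the headers.
base_url = "https://data.example.com/v2/stocks/AAPL/bars"
params = {"timeframe": "1Day", "start": "2024-01-02", "limit": 1000}
request_url = f"{base_url}?{urlencode(params)}"

# A response shaped the way many bars endpoints shape theirs: a list of
# OHLCV bars plus a pagination token you'd pass back for the next page.
sample_response = """{
  "bars": [
    {"t": "2024-01-02T05:00:00Z", "o": 187.15, "h": 188.44,
     "l": 183.89, "c": 185.64, "v": 82488700}
  ],
  "next_page_token": null
}"""

data = json.loads(sample_response)
closes = [bar["c"] for bar in data["bars"]]
```

The pagination token is the part people forget: for a long backfill you loop, passing `next_page_token` back as a query parameter until it comes back null.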
WebSockets
WebSockets are responsible for our live data streams. A WebSocket is a persistent connection that pushes updates to you in real time. Alpaca (and many others) provides a WebSocket API. You can subscribe to bars, trades, or quotes for a given symbol and timeframe. What you receive also depends on the type of data you ask for. Remember, many services apply their own magic to the data before it gets to you, so you have to know how they handle their data. It matters.
This is how you get real-time data to feed live order books or trades to an algo or dashboard. It’s pretty flexible, easy to integrate, and supported by most modern platforms. However, it isn’t standardized. Each platform has its own endpoints, formats, and quirks.
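Here's a minimal sketch of the consuming side: a handler that picks trade messages out of a stream. The message schema (short keys like "T", "S", "p") is an assumption mimicking the compact formats many vendors use, not any real feed's spec. In a live setup you'd wrap this in a WebSocket client library and call it once per incoming frame.

```python
import json
from typing import Optional


def handle_message(raw: str) -> Optional[dict]:
    """Route one raw stream message; return a normalized trade or None."""
    msg = json.loads(raw)
    if msg.get("T") != "t":  # "t" = trade in this hypothetical schema
        return None
    return {"symbol": msg["S"], "price": msg["p"], "size": msg["s"]}


# Feed it a couple of sample messages, as a client library would per frame.
trade = handle_message('{"T": "t", "S": "AAPL", "p": 185.64, "s": 100}')
quote = handle_message('{"T": "q", "S": "AAPL", "bp": 185.63, "ap": 185.65}')
```

The design point: keep the parsing/normalizing logic separate from the connection handling, so when a vendor's quirks change (and they will), you only rewrite one small function.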
Protocols
This is where it gets… fun. For professional software, APIs aren’t always enough. That’s where protocols come in. They are standardized (or semi-standardized) messaging systems designed specifically for trading and market data. Cool, right?
FIX
FIX stands for "Financial Information eXchange". I am learning from my hastiness, see? This is the granddaddy of trading protocols. It is used worldwide for order routing and sometimes for market data. It is essentially the industry standard and widely supported. Unfortunately, it sucks. Just kidding. Kind of. It leaves a lot to be desired: it is verbose, complex, and often overkill if all you want is some historical data.
It really is the industry standard, though. It dates back to the 90s (before some of your birthdays, I bet) and is still the backbone of institutional routing. This makes it hard to ignore.
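To show what "verbose" means in practice, here's a toy FIX tag=value message and a parser for it. The tags used (8, 35, 55, 54, 38) are real FIX tags, but the message is trimmed way down: a real one carries many more required fields plus a checksum.

```python
SOH = "\x01"  # FIX fields are separated by the ASCII SOH character

# A stripped-down FIX new-order message. Incomplete on purpose --
# real messages include sender/target IDs, sequence numbers, a body
# length, and a trailing checksum (tag 10).
raw = SOH.join([
    "8=FIX.4.4",  # BeginString
    "35=D",       # MsgType: NewOrderSingle
    "55=AAPL",    # Symbol
    "54=1",       # Side: Buy
    "38=100",     # OrderQty
]) + SOH


def parse_fix(message: str) -> dict[str, str]:
    """Split a tag=value FIX message into a dict keyed by tag number."""
    return dict(
        field.split("=", 1)
        for field in message.rstrip(SOH).split(SOH)
    )


fields = parse_fix(raw)
```

Every field is a numeric tag you have to look up in the spec, which is a big part of why working with raw FIX feels like archaeology.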
DTC
DTC stands for Data and Trading Communications. This is Sierra Chart’s love letter to modern traders. It is an open protocol that attempts to fix (heh) some of the issues with FIX. Unfortunately, I don’t know of any other platform/data service that uses it besides Sierra Chart and their Denali Data feed.
Does that matter?
Not really. It’s open and other people can use it, and I am sure users have. But, it requires some developer knowledge. Like FIX, it is a standardized messaging protocol. It has options for binary or compact JSON and is optimized to be fast and friendly. It’s pretty damn niche though. I kind of like it.
To be clear, this protocol is open and free to adapt. Free as in spe… er, beer. Beer is the safer option here these days, Larry. Adoption is limited, but it will be one I am exploring for my needs.
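For flavor, here's a sketch of the JSON side of DTC, where each message is a JSON object carrying a numeric "Type" field and frames are null-terminated on the wire. The Type value below is a placeholder, not a real DTC message ID; the spec on dtcprotocol.org has the actual message numbers and fields.

```python
import json


def encode_dtc_json(msg_type: int, fields: dict) -> bytes:
    """Frame a DTC-style JSON message: Type field plus null terminator."""
    payload = {"Type": msg_type, **fields}
    return json.dumps(payload).encode("utf-8") + b"\x00"


def decode_dtc_json(frame: bytes) -> dict:
    """Strip the null terminator and decode the JSON body."""
    return json.loads(frame.rstrip(b"\x00").decode("utf-8"))


# 9999 is a placeholder type ID, not taken from the DTC spec.
frame = encode_dtc_json(9999, {"Symbol": "ESM4", "Exchange": "CME"})
msg = decode_dtc_json(frame)
```

Compared to FIX's numeric tags, the fields here are named, which is a decent chunk of what "friendly" means in this context.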
Proprietary Protocols
Unlike Sierra Chart, some vendors decided to fuck FIX and open protocols. Instead, they created their own shit that is tightly integrated within their own ecosystems. Very Apple of them. Instead of listing them, I just wanted to make you aware that proprietary options exist. Since I am too cool to fuck with them (read: broke), there isn't much else to say here.
Putting it All Together
As a SQR (serious quant researcher, thanks SLA), you will often mix these models. You can use CSV for backtests, make requests to REST APIs, and keep a WebSocket or protocol running to stream live ticks. Understanding the difference is key to designing a reliable research and trading pipeline.
What’s next?
Now that I have covered the basics and showed you a couple of ways to get data and convert it to CSV, I am going to get a little more detailed. I am going to show you how to use Sierra Chart (hint: this is the platform I am playing with for trading) to get data via DTC and aggregate tick data into OHLCV bar data for testing. After that, I am going to build out a simple testing suite in a spreadsheet to show you how you can go "code free" to test out simple ideas, get stats, and maybe even do some simple statistical robustness tests on the data.
Happy hunting, folks!
This post doesn’t represent any type of advice, financial or otherwise. Its purpose is to be informative and educational. Backtest results are based on historical data, not real-time data. There is no guarantee that these hypothetical results will continue in the future. Day trading is extremely risky, and I do not suggest running any of these strategies live.