How I might approach it. Interested in feedback from people closer to the space.
First, split the load into simple asset-specific data streams with a front-end FPGA for raw speed. Resist the temptation to actually execute here, as the friction is too high for iteration, people, supply chain, etc. Input might be a FIX stream or similar; output is a series of asset-specific binary event streams fanned out along low-latency buses to asset-specific segments of a scalable cluster of low-end MCUs. Second, drop the general-purpose operating system assumption on the asset-specific MCU-based execution platform to enable faster turnaround, using low-level code you can actually find people to write on hardware you can actually purchase. Third, profit? In such a setup you'd need to monitor the overall state with a governor running on a general-purpose OS, which could pause or change strategies by reprogramming the individual elements as required.
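To make the MCU half of that concrete, here is a minimal bare-metal sketch of what one asset-specific execution element might look like: a packed event record DMA'd in by the FPGA, and a polling loop that reacts to it with no OS in the way. Every struct field, address, and threshold here is invented for illustration; the real wire format would be dictated by the FPGA side, and production code would need proper memory barriers and completion flags.

    #include <cstddef>
    #include <cstdint>

    // Hypothetical wire format for one asset-specific event from the FPGA.
    struct __attribute__((packed)) MarketEvent {
        uint64_t exchange_ts_ns;   // timestamp stamped by the FPGA
        uint32_t instrument_id;    // pre-filtered per segment, so usually constant
        int32_t  best_bid_ticks;
        int32_t  best_ask_ticks;
        uint32_t seq_no;           // written last by the FPGA to mark the slot complete
    };

    // Hypothetical addresses: an inbound DMA ring and an outbound order-entry FIFO.
    constexpr uintptr_t RING_BASE  = 0x20000000;
    constexpr uintptr_t TX_FIFO    = 0x20010000;
    constexpr size_t    RING_SLOTS = 1024;

    volatile MarketEvent* const ring = reinterpret_cast<volatile MarketEvent*>(RING_BASE);

    // Push a (price, qty) pair into the outbound FIFO register.
    inline void send_order(int32_t price_ticks, uint32_t qty) {
        *reinterpret_cast<volatile uint64_t*>(TX_FIFO) =
            (static_cast<uint64_t>(static_cast<uint32_t>(price_ticks)) << 32) | qty;
    }

    int main() {
        size_t   head = 0;
        uint32_t expected_seq = 1;            // FPGA starts numbering at 1 in this sketch
        for (;;) {                            // no OS, no scheduler: just spin on the ring
            if (ring[head].seq_no != expected_seq) continue;   // slot not written yet
            // Trivial illustrative "strategy": join the bid one tick back.
            send_order(ring[head].best_bid_ticks - 1, 100);
            head = (head + 1) % RING_SLOTS;
            ++expected_seq;
        }
    }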
Just how low are the latencies involved? At a certain point you're better off paying to get the hardware closer to the core than bothering with further engineering, right? I'd guess that depends heavily on the rules and the available DCs / link infrastructure offered by the exchanges or pools in question. I'd also guess a number of profitable operations don't disclose which pools they connect to and make a business of front-running, regulations or terms of service be damned. In such cases, the relative network latency between two points of execution matters more than the absolute latency to either one.
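For a rough sense of scale on the "pay to get closer" point: signals in fibre propagate at roughly 200,000 km/s, i.e. about 5 microseconds per kilometre one way, so distance converts to latency almost directly. A throwaway sketch of that arithmetic (the distances are arbitrary examples, not any venue's real figures):

    #include <cstdio>

    int main() {
        // Rule of thumb: ~200,000 km/s in fibre, i.e. ~5 us per km, one way.
        const double us_per_km = 1e6 / 200000.0;
        const double distances_km[] = {0.1, 1.0, 10.0, 50.0, 300.0};   // arbitrary examples
        for (double d : distances_km)
            std::printf("%7.1f km  ->  %8.1f us one-way in fibre\n", d, d * us_per_km);
        return 0;
    }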
The work I do is all in the single-to-low-double-digit microsecond range to give you an idea of timing constraints. I'm peripheral to HFT as a field though.
> First, split the load into simple asset-specific data streams with a front-end FPGA for raw speed. Resist the temptation to actually execute here, as the friction is too high for iteration, people, supply chain, etc.
This is largely incorrect, or more generously out-of-date, and it influences everything downstream of your explanation. Think of FPGAs as far more flexible GPUs and you're in the right arena. Input parsing and filtering are the obvious applications, but this is by no means the end state.
A wide variety of sanity checks and monitoring features are pushed to the FPGAs, along with fixed calculation tasks and output generation. It is possible for the entire stack for some (or most, or all) transactions to be implemented at the FPGA layer; for such transactions the timescales are mid-to-high hundreds of nanoseconds. The stacks I've seen with my own two eyeballs talked to supervision algorithms over PCIe (which themselves must be not-slow, but not in the same realm as the <10us work), but otherwise nothing crazy fancy. This is well covered in the older academic work on the subject [1], which is why I'm fairly certain it's long out of date by now.
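As a rough illustration of that host-side supervision (not any particular firm's design), the governor can be as plain as a process that memory-maps a PCIe BAR exposed by the FPGA, watches a heartbeat and a position counter, and flips a kill-switch register when something looks wrong. The device node, register offsets, and limits below are all invented:

    #include <cstddef>
    #include <cstdint>
    #include <cstdio>
    #include <fcntl.h>
    #include <sys/mman.h>
    #include <unistd.h>

    // Hypothetical register map exposed by the FPGA over a PCIe BAR (via UIO here).
    constexpr size_t BAR_SIZE        = 4096;
    constexpr size_t REG_HEARTBEAT   = 0x00 / 8;   // incremented continuously by the FPGA
    constexpr size_t REG_POSITION    = 0x08 / 8;   // signed net position, in lots
    constexpr size_t REG_KILL_SWITCH = 0x10 / 8;   // host writes 1 to halt order generation

    int main() {
        int fd = open("/dev/uio0", O_RDWR | O_SYNC);           // device node is an invention
        if (fd < 0) { perror("open"); return 1; }
        void* map = mmap(nullptr, BAR_SIZE, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
        if (map == MAP_FAILED) { perror("mmap"); return 1; }
        auto* regs = static_cast<volatile uint64_t*>(map);

        uint64_t last_hb = regs[REG_HEARTBEAT];
        for (;;) {
            usleep(1000);   // governor runs at ms granularity; the fast path never waits on it
            uint64_t hb  = regs[REG_HEARTBEAT];
            int64_t  pos = static_cast<int64_t>(regs[REG_POSITION]);
            // Halt the strategy if the FPGA stalls or breaches a made-up position limit.
            if (hb == last_hb || pos > 500 || pos < -500) {
                regs[REG_KILL_SWITCH] = 1;
                std::fprintf(stderr, "kill switch set (hb=%llu pos=%lld)\n",
                             (unsigned long long)hb, (long long)pos);
            }
            last_hb = hb;
        }
    }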
HRT has some public information on the pipeline they use for testing and verifying trading components implemented in HDL.[2] With modern tooling, namely Verilator, development isn't significantly different from modern software development. If anything, SystemVerilog components are much easier to unit test than typical C++ code.
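For flavour, a generic Verilator-style harness (not HRT's pipeline): Verilator compiles the SystemVerilog module into a C++ class, and the testbench is then ordinary C++ that slots into any unit-test framework. The module and port names below are made up:

    // Built with something like: verilator --cc decoder.sv --exe tb_decoder.cpp
    #include <cstdio>
    #include <memory>
    #include <verilated.h>
    #include "Vdecoder.h"   // class generated by Verilator from a hypothetical decoder.sv

    // Drive one clock cycle.
    static void tick(Vdecoder& dut) {
        dut.clk = 1; dut.eval();
        dut.clk = 0; dut.eval();
    }

    int main(int argc, char** argv) {
        Verilated::commandArgs(argc, argv);
        auto dut = std::make_unique<Vdecoder>();

        // Reset (port names are illustrative).
        dut->rst_n = 0;
        tick(*dut);
        dut->rst_n = 1;

        // Feed one input word and check the decoded output, exactly like a unit test.
        dut->in_valid = 1;
        dut->in_data  = 0x0001002A;   // e.g. instrument 1, quantity 42
        tick(*dut);
        dut->in_valid = 0;

        bool ok = dut->out_valid && dut->out_qty == 42;
        std::printf(ok ? "PASS\n" : "FAIL\n");

        dut->final();
        return ok ? 0 : 1;
    }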
Beyond that it gets far too firm-specific to say much, and I'm certainly not the one to comment anyway. There are maybe three dozen HFT firms in the entire United States? It's not a huge field with widely acknowledged industry norms.