Boosted Trees for Alpha: XGBoost & LightGBM in a Market Regime World
Gradient boosting dominates tabular ML. Learn how XGBoost and LightGBM deliver strong performance on financial-market problems, with regularization built in.
1. From Forests to Boosting
Boosting builds an ensemble sequentially, with each new tree correcting the errors of the trees before it. Intuition: a "committee of specialists" in which each tree focuses on the remaining mistakes.
Unlike a random forest, which averages independently grown trees, a boosted ensemble adds trees one at a time, each fit to the residuals (more precisely, the negative gradient of the loss) of the ensemble built so far.
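To make the residual-fitting loop concrete, here is a minimal sketch of boosting under squared-error loss, where the negative gradient is exactly the residual. The synthetic data, tree depth, and learning rate are illustrative assumptions, not values from any particular library:

```python
# Minimal boosting loop: each new shallow tree is fit to the residuals
# of the ensemble built so far. Data is synthetic for illustration.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))
y = X[:, 0] - 0.5 * X[:, 1] ** 2 + rng.normal(scale=0.1, size=500)

learning_rate = 0.1            # shrinkage: damps each tree's contribution
prediction = np.zeros_like(y)
trees = []

for _ in range(100):
    residuals = y - prediction          # what the ensemble still gets wrong
    tree = DecisionTreeRegressor(max_depth=3)
    tree.fit(X, residuals)              # new "specialist" targets the mistakes
    prediction += learning_rate * tree.predict(X)
    trees.append(tree)

print("train MSE:", np.mean((y - prediction) ** 2))
```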
2. Why Boosted Trees Dominate Tabular ML
- Strong performance with modest tuning.
- Handle sparse, heterogeneous features.
- Built-in regularization (shrinkage, tree depth, subsampling); see the sketch after this list.
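The sketch below shows where each regularization knob lives, using XGBoost's scikit-learn wrapper. The specific values are illustrative assumptions, not recommendations:

```python
# Illustrative regularization settings via XGBoost's scikit-learn wrapper.
from xgboost import XGBRegressor

model = XGBRegressor(
    n_estimators=500,      # many small steps...
    learning_rate=0.05,    # ...damped by shrinkage
    max_depth=4,           # shallow trees cap interaction complexity
    subsample=0.8,         # row subsampling per boosting round
    colsample_bytree=0.8,  # feature subsampling per tree
    reg_lambda=1.0,        # L2 penalty on leaf weights
)
# model.fit(X_train, y_train)  # X_train / y_train are placeholders
```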
3. XGBoost vs LightGBM
XGBoost:
- Mature, flexible, highly configurable.
- Excellent documentation and community.
- Widely used in production systems.
LightGBM:
- Typically faster on large feature sets, thanks to histogram-based splits and leaf-wise tree growth.
- Lower memory footprint.
In volarixs, both libraries appear with predefined templates (e.g., "Fast", "Balanced", "Deep").
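Since both libraries ship scikit-learn-compatible wrappers, fitting them side by side takes one line per model. The dataset and hyperparameters below are synthetic and illustrative:

```python
# Side-by-side fit of XGBoost and LightGBM on the same synthetic data.
import numpy as np
from lightgbm import LGBMRegressor
from xgboost import XGBRegressor

rng = np.random.default_rng(1)
X = rng.normal(size=(2000, 50))
y = X @ rng.normal(size=50) + rng.normal(size=2000)

for name, model in [
    ("XGBoost", XGBRegressor(n_estimators=200, max_depth=4, tree_method="hist")),
    ("LightGBM", LGBMRegressor(n_estimators=200, num_leaves=31)),
]:
    model.fit(X, y)
    mse = np.mean((model.predict(X) - y) ** 2)
    print(f"{name}: train MSE {mse:.3f}")
```

On a dataset this small the two behave similarly; LightGBM's speed advantage tends to show up as row and feature counts grow.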
4. Use Cases
- Cross-sectional prediction of 5-day and 21-day forward returns.
- Realized volatility forecasting.
- Regime-aware models: train a separate booster per regime, or include regime labels as features (both variants are sketched below).
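Here is a minimal sketch of both regime-aware variants. The regime label below is a toy volatility threshold on synthetic data; in practice it might come from an HMM, clustering, or a realized-vol cutoff, and all names are illustrative:

```python
# Two regime-aware setups: regime label as a feature, and one booster per regime.
import numpy as np
from lightgbm import LGBMRegressor

rng = np.random.default_rng(42)
n = 3000
features = rng.normal(size=(n, 10))

# Toy regime label: 1 = high-vol, 0 = low-vol.
vol_proxy = np.abs(rng.normal(size=n))
regime = (vol_proxy > np.median(vol_proxy)).astype(int)

# The signal flips sign across regimes, so the regime column is informative.
y = features[:, 0] * np.where(regime == 1, -1.0, 1.0) + rng.normal(scale=0.5, size=n)

# Variant 1: regime label appended as an extra feature in a single booster.
X = np.column_stack([features, regime])
single = LGBMRegressor(n_estimators=300, num_leaves=31).fit(X, y)

# Variant 2: one booster per regime, trained only on that regime's rows.
per_regime = {
    r: LGBMRegressor(n_estimators=300, num_leaves=31).fit(
        features[regime == r], y[regime == r]
    )
    for r in (0, 1)
}
```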