Volatility Forecasting Benchmarks: GARCH, HAR, and ML on Equity Indices
Forecasting volatility is a core task in options pricing, risk management, position sizing, and portfolio optimisation.
1. Target Definition: Realised Volatility
Before you can compare GARCH, HAR, and ML on a level playing field, you have to agree on what they are predicting. Volatility is never observed directly; it has to be estimated from returns, so the first decision in any benchmark is the target. For an index with daily returns r_t, define the realised volatility target as:
σ_{t+1}^{real} = √(Σ_{i=0}^{h-1} r_{t+1+i}²)
for horizon h (often 1 or 5 days). Alternatively, use high-frequency-based realised measures where available.
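As a minimal sketch of the target definition, the function below computes the square root of the sum of squared daily returns over the h days following day t. The function name, the 0-indexed list layout, and the sample returns are illustrative assumptions, not part of any particular library.

```python
import math

def realised_vol(returns, t, h=1):
    """Realised volatility over the h days following day t.

    Implements sqrt(sum_{i=0}^{h-1} r_{t+1+i}^2) with `returns` as a
    0-indexed list of daily returns, so the window is returns[t+1 : t+1+h].
    """
    window = returns[t + 1 : t + 1 + h]
    if len(window) < h:
        raise ValueError("not enough future returns for this horizon")
    return math.sqrt(sum(r * r for r in window))

# Hypothetical daily returns, for illustration only.
daily_returns = [0.01, -0.02, 0.015, -0.005, 0.0]
rv_1d = realised_vol(daily_returns, t=0, h=1)  # uses returns[1] only
rv_2d = realised_vol(daily_returns, t=0, h=2)  # uses returns[1:3]
```

Note that the target for day t only uses returns dated after t, which keeps the label strictly in the future relative to any features measured at t.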
2. Model Classes
2.1 GARCH Models
Standard GARCH(1,1):
r_t = σ_t ε_t, ε_t ~ N(0,1)
σ_t² = ω + α r_{t-1}² + β σ_{t-1}²
Variants: EGARCH, GJR-GARCH (asymmetry), long-memory variants, different error distributions.
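The variance recursion can be sketched directly, taking the parameters as given. In a real benchmark ω, α, and β would be estimated (typically by maximum likelihood) on each training window; the parameter values below are hypothetical, and starting the recursion at the unconditional variance ω / (1 − α − β) is one common choice assumed here.

```python
def garch11_variance(returns, omega, alpha, beta):
    """Filter conditional variances with
    sigma2_t = omega + alpha * r_{t-1}^2 + beta * sigma2_{t-1}.

    Returns a list one element longer than `returns`; the last entry is the
    one-step-ahead variance forecast.
    """
    if not (omega > 0 and alpha >= 0 and beta >= 0 and alpha + beta < 1):
        raise ValueError("need omega > 0, alpha >= 0, beta >= 0, alpha + beta < 1")
    # Assumption: initialise at the unconditional variance.
    sigma2 = [omega / (1.0 - alpha - beta)]
    for r in returns:
        sigma2.append(omega + alpha * r * r + beta * sigma2[-1])
    return sigma2

# Hypothetical parameter values for illustration, not estimates.
returns = [0.01, -0.02, 0.005]
path = garch11_variance(returns, omega=1e-6, alpha=0.08, beta=0.90)
next_day_vol = path[-1] ** 0.5  # forecast volatility for the next day
```

The final element of `path` is the quantity a benchmark would compare against the next-day realised volatility target.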
2.2 HAR Model
The HAR-RV model represents next-day realised volatility as:
RV_{t+1} = β₀ + β₁ RV_t^{(d)} + β₂ RV_t^{(w)} + β₃ RV_t^{(m)} + ε_{t+1}
where RV_t^{(d)} is the daily value, RV_t^{(w)} is the weekly average (typically over the past 5 trading days), and RV_t^{(m)} is the monthly average (typically over the past 22 trading days). This captures multi-horizon volatility dynamics in a simple linear framework.
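A minimal sketch of how the three HAR regressors can be built from a daily RV series follows. The window lengths of 5 and 22 trading days are a common convention assumed here, and the function name and sample series are illustrative. The resulting rows and targets can be passed to any ordinary least squares routine to estimate β₀ to β₃.

```python
def har_features(rv, week=5, month=22):
    """Return (rows, targets) for the HAR-RV regression.

    rows[k] = [1.0, RV_t^(d), RV_t^(w), RV_t^(m)] and targets[k] = RV_{t+1},
    for every t that has a full `month`-day history and a next-day value.
    """
    rows, targets = [], []
    for t in range(month - 1, len(rv) - 1):
        daily = rv[t]
        weekly = sum(rv[t - week + 1 : t + 1]) / week      # mean of last 5 days
        monthly = sum(rv[t - month + 1 : t + 1]) / month   # mean of last 22 days
        rows.append([1.0, daily, weekly, monthly])
        targets.append(rv[t + 1])
    return rows, targets

# Hypothetical RV series 0.01, 0.02, ..., 0.30, for illustration only.
rv_series = [0.01 * (i + 1) for i in range(30)]
X, y = har_features(rv_series)
```

Every regressor in a row is computed from data up to and including day t, while the target is dated t+1, so the design matrix contains no look-ahead.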
2.3 ML Models
Input features may include:
- lagged realised vol (daily, weekly, and monthly RV components)
- lagged returns
- volatility-of-volatility measures
- macro proxies, VIX, term structure of implied vol (if available)
Models: Ridge / ElasticNet, Random Forest, Gradient Boosted Trees (XGBoost/LightGBM), MLP / small LSTM on volatility features.
Targets: next-day or next-5-day realised volatility (or log-vol to stabilise).
3. Benchmark Design
To avoid typical pitfalls:
- Strict time-based splits
- Rolling or expanding re-estimation (walk-forward)
- Non-overlapping evaluation windows where feasible
- Multi-criteria evaluation: RMSE / MAE on volatility, R² on log-vol, accuracy of volatility buckets, performance of strategies that use the forecasts
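The split logic and the two simplest error metrics from the list above can be sketched as follows. This is an illustrative layout rather than a prescribed one: the function names, the `min_train` and `test_size` parameters, and the choice to advance by exactly one test block per step (which keeps test blocks non-overlapping) are all assumptions.

```python
import math

def walk_forward_splits(n_obs, min_train, test_size, expanding=True):
    """Yield (train_indices, test_indices) in time order.

    Test blocks are consecutive and non-overlapping; each training set ends
    immediately before its test block, so no future data leaks into fitting.
    With expanding=False the training window keeps a fixed length (rolling).
    """
    start = min_train
    while start + test_size <= n_obs:
        train_start = 0 if expanding else start - min_train
        yield range(train_start, start), range(start, start + test_size)
        start += test_size

def rmse(actual, forecast):
    """Root mean squared error between two equal-length sequences."""
    return math.sqrt(sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual))

def mae(actual, forecast):
    """Mean absolute error between two equal-length sequences."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)

splits = list(walk_forward_splits(n_obs=10, min_train=4, test_size=2))
```

Each model in the benchmark would be re-estimated on every training range and scored only on the matching test range, so all contenders see identical information at each step.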
4. From Vol Forecast to Trading Performance
One useful research question: "Does a better volatility forecast translate into better risk-adjusted returns when used for position sizing?"
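Example: a vol-targeting strategy sets the position weight w_t = τ / σ̂_t, where τ is the target risk level and σ̂_t is the forecast volatility, both expressed in the same units (for example, daily or annualised).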
We can compare performance using GARCH-based vol, HAR vol, and ML vol. This closes the loop from forecasting metrics to trading metrics.
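A minimal sketch of the vol-targeting rule is shown below. The leverage cap is an added assumption that is not part of the formula in the text (it simply stops the weight exploding when the forecast is very low), and the function names and sample numbers are hypothetical.

```python
def vol_target_weights(forecast_vols, target_vol, max_leverage=2.0):
    """Position weights w_t = target_vol / forecast_vol_t, capped at max_leverage.

    The cap is an added assumption, not part of the basic rule.
    target_vol and forecast_vols must be in the same units.
    """
    weights = []
    for sigma_hat in forecast_vols:
        if sigma_hat <= 0:
            raise ValueError("forecast volatility must be positive")
        weights.append(min(target_vol / sigma_hat, max_leverage))
    return weights

def strategy_returns(weights, next_returns):
    """Scale each period's return by the weight that was set before it was realised."""
    return [w * r for w, r in zip(weights, next_returns)]

# Hypothetical daily vol forecasts and next-day index returns.
sigma_hat = [0.010, 0.020, 0.004]
next_r = [0.005, -0.010, 0.002]
w = vol_target_weights(sigma_hat, target_vol=0.010)
pnl = strategy_returns(w, next_r)
```

Running the same function with forecasts from each model class gives one return series per model, which can then be compared on realised volatility versus target and on risk-adjusted return.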
5. How This Maps to volarixs
A benchmark like this is really just a set of experiments run on a common footing, and that is the shape of an experiment in volarixs. You define the target as a realised volatility series, pick datasets and a feature set, choose the model class, and set a target horizon and time window:
- the contenders map onto selectable model classes — Volatility for GARCH-type models, Time Series for HAR-style lag structures, and Regression / Tree & Boosted / Neural Networks for the ML side
- each model is fit walk-forward over rolling windows, so the time-based splits this article argues for are how training runs by default rather than something you bolt on
- every run is stored with its model, datasets, targets, status, and train/test R² — a durable record of each contender on the same target
That shared, per-run record is the foundation a benchmark is built on: with each model held to the same target, horizon, and window, the forecasts become genuinely comparable. The wider evaluation this article describes — error metrics beyond R², calibration, and closing the loop into vol-targeting strategy performance — is the lens volarixs is built around, layered on top of that stored history rather than a separate one-click report.