volarixs - applied AI & ML to finance

Evaluation
February 16, 2026
15 min read

Beyond Sharpe: A Research Framework for Evaluating ML Trading Strategies

Sharpe ratio is the default metric in systematic finance. For ML-driven strategies, it's also dangerously incomplete.

1. The Limits of Sharpe

Sharpe is a complete summary of performance only when returns are i.i.d. and Gaussian (or close enough), so that variance is a good proxy for risk.

ML strategies often have:

  • highly skewed return distributions
  • clustered losses
  • exposure to hidden factors
  • complex dependence on market regimes

Two strategies with identical Sharpe can have radically different risk and robustness.
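
To make this concrete, here is a minimal sketch in plain Python (standard library only). Both return series are synthetic and purely illustrative, a zero risk-free rate is assumed, and `match_moments` is a hypothetical helper that rescales one series to the mean and volatility of the other so that the two Sharpe ratios coincide by construction.

```python
import math
import statistics

def sharpe(returns, periods_per_year=252):
    """Annualised Sharpe ratio, assuming a zero risk-free rate."""
    return statistics.fmean(returns) / statistics.stdev(returns) * math.sqrt(periods_per_year)

def skewness(returns):
    """Population skewness (third standardised moment)."""
    mu = statistics.fmean(returns)
    sigma = statistics.pstdev(returns)
    return statistics.fmean(((r - mu) / sigma) ** 3 for r in returns)

def match_moments(returns, target_mean, target_std):
    """Hypothetical helper: affinely rescale a series to a target mean and sample std."""
    mu = statistics.fmean(returns)
    sigma = statistics.stdev(returns)
    return [target_mean + (r - mu) / sigma * target_std for r in returns]

# Strategy A: symmetric daily returns around a small positive mean.
strategy_a = [0.0005 + (0.01 if i % 2 == 0 else -0.01) for i in range(500)]

# Strategy B: steady small gains with a large loss every 50th day,
# rescaled so that its mean and volatility match strategy A.
raw_b = [-0.05 if i % 50 == 49 else 0.002 for i in range(500)]
strategy_b = match_moments(
    raw_b, statistics.fmean(strategy_a), statistics.stdev(strategy_a)
)

print("Sharpe   A, B:", sharpe(strategy_a), sharpe(strategy_b))      # equal up to rounding
print("Skewness A, B:", skewness(strategy_a), skewness(strategy_b))  # B is strongly negative
print("Worst day A, B:", min(strategy_a), min(strategy_b))           # B's worst day is far worse
```

Sharpe cannot tell the two apart, while skewness and the worst single day separate them immediately.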

2. A Multi-Dimensional Metric Set

We propose evaluating ML strategies along at least these dimensions:

  1. Risk-adjusted return (Sharpe, Sortino, Information ratio)
  2. Drawdown profile (max DD, average DD, recovery time)
  3. Tail behaviour (CVaR, tail Sharpe, skew, kurtosis)
  4. Turnover and capacity (turnover, market impact proxies)
  5. Stability and robustness (across time, regimes, cross-validation folds)
  6. Implementation risk (signal noise, fill ratios, slippage sensitivity)
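
As an illustration of the first two dimensions, the sketch below computes a Sortino ratio and a simple drawdown profile in plain Python. The function names, the zero target return, and the choice to measure recovery as the longest run of periods below the previous equity peak are assumptions made for this example, not a fixed standard.

```python
import math
import statistics

def sortino(returns, periods_per_year=252, target=0.0):
    """Annualised Sortino ratio: mean excess return over downside deviation."""
    excess = [r - target for r in returns]
    downside = math.sqrt(statistics.fmean(min(e, 0.0) ** 2 for e in excess))
    if downside == 0.0:
        return float("inf")  # no returns below target
    return statistics.fmean(excess) / downside * math.sqrt(periods_per_year)

def drawdown_profile(returns):
    """Return (max drawdown, average drawdown, longest time under water).

    Drawdowns are positive fractions of the running equity peak. The average
    is taken over all periods, including those at a new peak (drawdown 0).
    """
    equity, peak = 1.0, 1.0
    drawdowns = []
    under_water, longest = 0, 0
    for r in returns:
        equity *= 1.0 + r
        peak = max(peak, equity)
        dd = 1.0 - equity / peak
        drawdowns.append(dd)
        under_water = under_water + 1 if dd > 0 else 0
        longest = max(longest, under_water)
    return max(drawdowns), statistics.fmean(drawdowns), longest

# Illustrative per-period returns.
returns = [0.01, -0.02, -0.01, 0.015, 0.02, -0.005, 0.01]
max_dd, avg_dd, longest = drawdown_profile(returns)

print("Sortino:", sortino(returns))
print("Max drawdown:", max_dd)            # about 0.0298 (1 - 0.98 * 0.99)
print("Average drawdown:", avg_dd)
print("Longest time under water:", longest)  # 3 periods
```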

3. Tail Metrics for ML Strategies

Given a return series r_t, define:

  • CVaR_α: the average loss in the worst α% of periods
  • Tail Sharpe: Sharpe computed only on the left tail of the return distribution (e.g. the worst 20% of returns)

ML strategies tuned to minimise an average loss can accidentally produce very fat left tails. CVaR and tail Sharpe expose this directly.
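
Both tail metrics can be sketched in a few lines of plain Python. The empirical (historical) estimator, the rounding rule for the tail size, and the reading of tail Sharpe as mean over standard deviation of the tail observations are assumptions for this example. Other conventions exist.

```python
import statistics

def left_tail(returns, alpha):
    """Worst alpha fraction of returns, with at least one observation."""
    k = max(1, round(alpha * len(returns)))
    return sorted(returns)[:k]

def cvar(returns, alpha=0.05):
    """Average loss in the worst alpha fraction of periods, as a positive number."""
    return -statistics.fmean(left_tail(returns, alpha))

def tail_sharpe(returns, alpha=0.20):
    """Mean over sample std of the worst alpha fraction of returns (not annualised).

    Needs at least two observations in the tail.
    """
    tail = left_tail(returns, alpha)
    return statistics.fmean(tail) / statistics.stdev(tail)

# Illustrative per-period returns.
returns = [-0.05, -0.03, -0.02, -0.01, 0.0, 0.005, 0.01, 0.01, 0.02, 0.03]

print("CVaR 20%:", cvar(returns, alpha=0.20))   # mean of the two worst returns, about 0.04
print("Tail Sharpe 20%:", tail_sharpe(returns, alpha=0.20))
```

Under this reading, tail Sharpe is usually negative and is most useful as a relative measure across strategies computed with the same α.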

4. Stability Measures

4.1 Time-Based Stability

Compute Sharpe in rolling windows, then analyze the mean, median, dispersion, and worst decile of the resulting series. A strategy with a full-sample Sharpe of 1.0 but frequent windows with Sharpe < -1.0 is not robust.
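
A minimal sketch of this rolling analysis, in plain Python, is shown below. The 63-period window, the zero risk-free rate, and the simulated Gaussian returns are assumptions chosen only to make the example runnable. Real strategy returns would replace them.

```python
import math
import random
import statistics

def rolling_sharpe(returns, window, periods_per_year=252):
    """Annualised Sharpe for every full window of consecutive returns."""
    out = []
    for end in range(window, len(returns) + 1):
        chunk = returns[end - window:end]
        sigma = statistics.stdev(chunk)
        if sigma > 0:  # skip degenerate windows with no variation
            out.append(statistics.fmean(chunk) / sigma * math.sqrt(periods_per_year))
    return out

def stability_summary(sharpes):
    """Mean, median, dispersion and worst decile of a rolling Sharpe series."""
    deciles = statistics.quantiles(sharpes, n=10)  # nine cut points
    return {
        "mean": statistics.fmean(sharpes),
        "median": statistics.median(sharpes),
        "dispersion": statistics.stdev(sharpes),
        "worst_decile": deciles[0],
        "share_below_minus_one": sum(s < -1.0 for s in sharpes) / len(sharpes),
    }

# Simulated daily returns with a fixed seed, for illustration only.
rng = random.Random(7)
returns = [rng.gauss(0.0005, 0.01) for _ in range(750)]

sharpes = rolling_sharpe(returns, window=63)
summary = stability_summary(sharpes)
for name, value in summary.items():
    print(f"{name}: {value:.3f}")
```

The `share_below_minus_one` entry is the fraction of windows with Sharpe below -1.0, which is the robustness check described above.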

4.2 Regime-Based Stability

Combine these measures with regime conditioning: compute Sharpe, CVaR, and hit ratio by regime. Then define a Stability Index that penalizes dispersion of performance across regimes.
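
The post does not fix a formula for the Stability Index, so the sketch below uses one hypothetical choice: the average regime Sharpe minus a penalty times the standard deviation of Sharpe across regimes. The regime labels, the `penalty` parameter, and the sample data are all assumptions for illustration.

```python
import math
import statistics
from collections import defaultdict

def metrics_by_regime(returns, regimes, alpha=0.05, periods_per_year=252):
    """Sharpe, CVaR and hit ratio computed separately for each regime label."""
    grouped = defaultdict(list)
    for r, label in zip(returns, regimes):
        grouped[label].append(r)
    out = {}
    for label, rs in grouped.items():
        k = max(1, round(alpha * len(rs)))  # size of the left tail
        out[label] = {
            "sharpe": statistics.fmean(rs) / statistics.stdev(rs) * math.sqrt(periods_per_year),
            "cvar": -statistics.fmean(sorted(rs)[:k]),
            "hit_ratio": sum(r > 0 for r in rs) / len(rs),
        }
    return out

def stability_index(regime_metrics, penalty=1.0):
    """Hypothetical index: mean regime Sharpe minus penalty * dispersion across regimes."""
    sharpes = [m["sharpe"] for m in regime_metrics.values()]
    return statistics.fmean(sharpes) - penalty * statistics.pstdev(sharpes)

# Illustrative returns with a regime label per period.
returns = [0.004, 0.006, -0.002, 0.005, 0.003, -0.03, 0.02, -0.025, 0.01, -0.015]
regimes = ["low_vol"] * 5 + ["high_vol"] * 5

by_regime = metrics_by_regime(returns, regimes)
for label, m in by_regime.items():
    print(label, m)
print("Stability index:", stability_index(by_regime))
```

With any positive penalty, a strategy that earns its Sharpe in one regime and loses it in another scores below a strategy with the same average but even performance.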

5. Turnover, Capacity, and Market Impact

ML strategies often trade too frequently. Key quantities:

  • Turnover: sum of absolute position changes
  • Estimated market impact: using square-root models or simplified cost functions

Compute Net Sharpe after costs and compare it with Gross Sharpe. Also track the cost per unit of alpha, for example total trading cost divided by gross return.
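
The sketch below puts these quantities together in plain Python. The cost function is a simplified assumption: a fixed half-spread plus a square-root impact term scaled by daily volatility and trade size relative to average daily volume. The parameter values (`half_spread`, `impact_coeff`, `daily_vol`, `adv`) are hypothetical and would need to be calibrated to real execution data.

```python
import math
import statistics

def sharpe(returns, periods_per_year=252):
    """Annualised Sharpe ratio, assuming a zero risk-free rate."""
    return statistics.fmean(returns) / statistics.stdev(returns) * math.sqrt(periods_per_year)

def turnover(positions):
    """Sum of absolute position changes, starting from a flat book."""
    prev, total = 0.0, 0.0
    for pos in positions:
        total += abs(pos - prev)
        prev = pos
    return total

def trade_cost(trade, half_spread=0.0005, impact_coeff=0.1, daily_vol=0.01, adv=100.0):
    """Simplified cost: half-spread plus square-root impact, in return units."""
    size = abs(trade)
    return size * (half_spread + impact_coeff * daily_vol * math.sqrt(size / adv))

def gross_and_net(positions, asset_returns):
    """Per-period gross and net strategy returns; trades happen at the period start."""
    gross, net = [], []
    prev = 0.0
    for pos, ret in zip(positions, asset_returns):
        pnl = pos * ret
        gross.append(pnl)
        net.append(pnl - trade_cost(pos - prev))
        prev = pos
    return gross, net

# Illustrative positions (fraction of capital) and asset returns.
positions = [1.0, 1.0, -1.0, -1.0, 0.5, 1.0, 1.0, -0.5]
asset_returns = [0.01, -0.004, -0.012, 0.003, 0.008, 0.006, -0.002, -0.009]

gross, net = gross_and_net(positions, asset_returns)
total_cost = sum(gross) - sum(net)

print("Turnover:", turnover(positions))  # 6.5
print("Gross Sharpe:", sharpe(gross))
print("Net Sharpe:", sharpe(net))
print("Cost per unit of alpha:", total_cost / sum(gross))
```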

6. ML-Specific Diagnostics

For ML-based strategies, we also want:

  • Performance by prediction confidence bucket
  • Performance by prediction sign agreement between models
  • Calibration plots: predicted vs realised returns/volatility
  • Feature importance stability over time

These diagnostics answer: "Do larger predicted returns actually correspond to larger realised returns?", "Is the model overconfident?", "Are signals coming from a stable feature set or from shifting noise?"
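
The first of these diagnostics can be sketched as follows. The equal-count bucketing, the field names, and the sample data are assumptions for illustration. Realised returns are signed in the direction of the prediction, so a positive value means the prediction paid off.

```python
import statistics

def bucket_report(confidences, predicted, realised, n_buckets=3):
    """Per confidence bucket (lowest first): hit ratio, mean |predicted| return,
    and mean realised return in the direction of the prediction."""
    order = sorted(range(len(confidences)), key=lambda i: confidences[i])
    size = len(order) // n_buckets
    report = []
    for b in range(n_buckets):
        start = b * size
        # The last bucket absorbs any remainder.
        stop = (b + 1) * size if b < n_buckets - 1 else len(order)
        idx = order[start:stop]
        hits = sum((predicted[i] > 0) == (realised[i] > 0) for i in idx)
        report.append({
            "bucket": b,
            "hit_ratio": hits / len(idx),
            "mean_predicted": statistics.fmean(abs(predicted[i]) for i in idx),
            "mean_realised": statistics.fmean(
                realised[i] * (1 if predicted[i] > 0 else -1) for i in idx
            ),
        })
    return report

# Illustrative signals: confidence, predicted return, realised return.
confidences = [0.9, 0.2, 0.6, 0.8, 0.3, 0.5, 0.95, 0.1, 0.55]
predicted = [0.02, 0.002, -0.01, -0.015, -0.003, 0.008, 0.03, 0.001, 0.009]
realised = [0.012, -0.004, -0.006, -0.01, 0.002, 0.005, 0.015, 0.003, -0.002]

report = bucket_report(confidences, predicted, realised)
for row in report:
    print(row)
```

In this toy data the hit ratio rises with confidence, which is the pattern a useful confidence score should show. The top bucket's mean predicted return is well above its realised return, which is what overconfidence in magnitude looks like.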

7. How This Maps to volarixs

A framework like this is only as good as the data underneath it, and the up-front work volarixs does is to keep that data. Each experiment records the raw material these metrics are computed from:

  • the model's prediction history, with direction and confidence, across horizons (1d / 5d / 21d / 63d and beyond)
  • the regime context each run was generated under — inflation, policy and liquidity state
  • factor exposures for the run: betas, R², alpha and residual volatility
  • run results: the model, datasets, targets and train/test R²

That history is what the metrics above are built on. Confidence buckets are recorded with every signal, so the diagnostics in section 6 — does a higher predicted return actually pay off, is the model overconfident — can be measured rather than assumed. Regime labels travel with every run, so stability can be read by regime instead of as a single blended number.

The metric set in this post is the lens volarixs is built around, not a one-click dashboard you toggle on today. The aim is to move the question from “Is my model good?” to “Is it good where it matters, robust when conditions change, and still good after costs?” — a research-level evaluation standard, with the prediction and regime history needed to answer it kept run by run.

Evaluation
Metrics
Sharpe
CVaR
Stability
