Research

volarixs - applied AI & ML to finance

Explore our latest posts on machine learning, market dynamics, strategy architecture and design

Feature Engineering
Feb 10, 2026

Shrinking the Feature Space: PCA & Autoencoders

Many features are redundant or noisy. High dimensionality = harder to generalize.

PCA
Autoencoders
Features
9 min read
Strategy
Feb 1, 2026

How Asset Managers Can Implement AI & Machine Learning

Part 2: Infrastructure, Governance & Roadmap. What it takes to implement AI in asset management.

AI Implementation
Governance
Roadmap
18 min read
Deep Learning
Jan 28, 2026

Neural Networks for Market Data: MLPs, CNNs & LSTMs

We are selective with deep learning. Expensive to train, easy to overfit, harder to debug.

Neural Networks
MLP
LSTM
12 min read
Research
Jan 22, 2026

Signal Half-Life and Decay: How Long Do ML Edges Really Last?

If you discover a signal today, how long will it work?

Signal Decay
Half-Life
Edge Persistence
13 min read
Strategy
Jan 15, 2026

How Asset Managers Can Use AI & Machine Learning in Investment Decisions

Part 1: Use Cases & Value. Real-world use cases: idea generation, regime analysis, risk management.

Asset Management
AI & ML
Use Cases
15 min read
Volatility
Jan 5, 2026

Modeling Market Turbulence: GARCH, EGARCH & HAR

Volatility ≠ returns: heavy tails, clustering, mean reversion. Dedicated volatility models are essential.

GARCH
EGARCH
HAR
10 min read
Time Series
Dec 18, 2025

ARIMA, SARIMAX & VAR: When Classical Time-Series Still Win

Explicitly model temporal dependence with transparent structure.

ARIMA
SARIMAX
VAR
9 min read
Benchmarks
Dec 9, 2025

Volatility Forecasting Benchmarks: GARCH, HAR, and ML

Compare GARCH, HAR, and ML models for volatility forecasting.

Volatility
GARCH
HAR
11 min read
Machine Learning
Dec 2, 2025

How Market Regimes Break ML Models

Financial machine learning rarely fails because the model is 'bad'. It fails because the market regime changed.

Regimes
ML
Backtesting
8 min read
Models
Nov 25, 2025

Boosted Trees for Alpha: XGBoost & LightGBM

Gradient boosting dominates tabular ML. Learn how XGBoost and LightGBM deliver strong performance.

XGBoost
LightGBM
Boosting
11 min read
Features
Nov 18, 2025

The 19 Most Important Features for Equity Return Forecasting

Most ML performance in finance doesn't come from the model — it comes from the features.

Features
Alpha
Equities
12 min read
Methodology
Nov 7, 2025

Rolling Windows for Financial ML: A Complete Guide

If you use financial data and your model does not use a rolling window, the backtest is wrong.

Rolling Windows
Time Series
Backtesting
10 min read
Evaluation
Oct 27, 2025

Beyond Sharpe: A Research Framework for Evaluating ML Trading Strategies

Sharpe ratio is dangerously incomplete for ML strategies.

Evaluation
Metrics
Sharpe
15 min read
Models
Oct 8, 2025

Random Forests in Finance: Nonlinear Signals Without the Drama

Tree-based ensembles capture nonlinearities and interactions in market data.

Random Forest
Extra Trees
Trees
10 min read
Models
Sep 15, 2025

From Linear Regression to Lasso: Fast, Interpretable Baselines

Linear and regularized regressions still do serious work in finance.

Linear Regression
Ridge
Lasso
12 min read
Regimes
Aug 22, 2025

Market Regimes, Clusters & HMMs: Teaching Models to Respect the Environment

Episodes where statistical properties are stable enough: high vol vs low vol, risk-on vs risk-off.

K-Means
GMM
HMM
11 min read
Architecture
Aug 3, 2025

Building a Universe-Wide Prediction Grid

An alpha factory needs predictions for every asset at multiple horizons from multiple models.

Prediction Grid
Scaling
Alpha Factory
14 min read
Evaluation
Jun 18, 2025

Regime-Conditioned Performance: Measuring ML Robustness

Most backtests report a single Sharpe. But ML models fail by regime.

Regimes
Robustness
Performance
12 min read
Evaluation
June 18, 2025
12 min read

Regime-Conditioned Performance: Measuring ML Robustness Across Market States

Most backtests still report a single Sharpe, a single drawdown, a single equity curve. But ML models in finance almost never fail "on average" – they fail by regime.

This post outlines a research framework for regime-conditioned performance analysis that we are building directly into volarixs.

1. Why Average Metrics Are Misleading

Let a strategy produce daily returns r_t over T days. Standard practice is to compute a single annualised Sharpe ratio (risk-free rate taken as zero):

Sharpe = √252 × E[r_t] / √Var(r_t)

Suppose we segment the sample into regimes k ∈ {1, ..., K}, e.g.:

  • Regime 1: Low-vol bull
  • Regime 2: High-vol sideways
  • Regime 3: Crisis

Two models can have the same overall Sharpe but very different regime profiles, for example when the low-vol bull regime makes up most of the sample:

  • Model A: Sharpe₁ = 3.0, Sharpe₂ = -1.5, Sharpe₃ = -2.0
  • Model B: Sharpe₁ = 1.2, Sharpe₂ = 0.8, Sharpe₃ = 0.3

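If you're allocating real capital, Model B is far more attractive: it stays positive in every regime, while Model A loses money as soon as conditions leave the low-vol bull regime. The usual backtest won't tell you this.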

2. Defining Market Regimes

We need objective regime labels that:

  • are computed without peeking into the future
  • can be reproduced across experiments
  • can be applied across asset classes

2.1 Volatility-Based Regimes

Compute rolling realised volatility σ_t over a window w, then bucket by quantiles:

  • 0–20%: Low-vol
  • 20–60%: Normal
  • 60–85%: Elevated
  • 85–100%: Crisis
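
The bucketing above can be sketched in a few lines of numpy. This is a minimal illustration rather than the volarixs implementation: the function names, the 20-day window, the `min_history` warm-up and the synthetic returns are all assumptions. The quantile thresholds on day t are taken from volatility observed up to day t only, so the labels respect the no-peeking requirement.

```python
import numpy as np

def rolling_vol(returns, window):
    """Trailing realised volatility; entry t uses returns[t-window+1 .. t]."""
    r = np.asarray(returns, dtype=float)
    vol = np.full(r.shape, np.nan)
    windows = np.lib.stride_tricks.sliding_window_view(r, window)
    vol[window - 1:] = windows.std(axis=1, ddof=1)
    return vol

def vol_regimes(vol, cuts=(0.20, 0.60, 0.85), min_history=60):
    """Label each day 0..3 (Low-vol, Normal, Elevated, Crisis).

    Thresholds on day t are quantiles of the volatility seen up to and
    including day t. Days with too little history are labelled -1.
    """
    vol = np.asarray(vol, dtype=float)
    labels = np.full(vol.shape, -1, dtype=int)
    for t in range(len(vol)):
        hist = vol[: t + 1]
        hist = hist[~np.isnan(hist)]
        if np.isnan(vol[t]) or hist.size < min_history:
            continue
        thresholds = np.quantile(hist, cuts)
        # Number of thresholds at or below today's vol = bucket index
        labels[t] = int(np.searchsorted(thresholds, vol[t], side="right"))
    return labels

# Synthetic example (assumed data): 400 calm days, then 100 turbulent days
rng = np.random.default_rng(0)
true_sigma = np.where(np.arange(500) < 400, 0.01, 0.03)
returns = rng.normal(0.0, true_sigma)
vol = rolling_vol(returns, window=20)
labels = vol_regimes(vol)
```

Because the thresholds expand with the sample, labelling a truncated history gives exactly the same labels as the prefix of the full run, which is a cheap check for look-ahead.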

[Interactive figure: Regime Segmentation Playground. Sample readout: Normal 332 days, Low-Vol 96 days, Crisis 72 days.]

Changing the volatility thresholds or the rolling window changes the regime labels, and with them every regime-conditioned evaluation of a model.

2.2 Hidden Markov Model (HMM) Regimes

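Fit a 2–3 state Gaussian HMM and infer the most likely state sequence. This is more flexible than fixed quantile buckets but heavier to compute. To respect the no-peeking requirement, label each day from filtered state probabilities (data up to that day) rather than from a full-sample decode.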
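
As a rough sketch of HMM-based labelling, the forward filter below computes P(state_t | returns up to t) for a two-state Gaussian HMM. It does not fit the model: the transition matrix, means and volatilities are assumed values chosen for illustration, and in practice they would be estimated on a training window with a dedicated library. All names here are hypothetical.

```python
import numpy as np

def gaussian_pdf(x, mean, std):
    return np.exp(-0.5 * ((x - mean) / std) ** 2) / (std * np.sqrt(2.0 * np.pi))

def forward_filter(obs, trans, means, stds, init):
    """Filtered state probabilities P(state_t | obs_1..t) for a Gaussian HMM.

    trans[i, j] is P(state_{t+1} = j | state_t = i).
    """
    obs = np.asarray(obs, dtype=float)
    probs = np.zeros((len(obs), len(means)))
    prior = np.asarray(init, dtype=float)
    for t, x in enumerate(obs):
        posterior = prior * gaussian_pdf(x, means, stds)
        posterior /= posterior.sum()
        probs[t] = posterior
        prior = posterior @ trans  # one-step-ahead state prediction
    return probs

# Assumed parameters: state 0 = calm, state 1 = turbulent
trans = np.array([[0.98, 0.02],
                  [0.05, 0.95]])
means = np.array([0.0005, -0.001])
stds = np.array([0.007, 0.025])

# Synthetic returns (assumed data): 300 calm days, then 100 turbulent days
rng = np.random.default_rng(1)
returns = np.concatenate([rng.normal(0.0005, 0.007, 300),
                          rng.normal(-0.001, 0.025, 100)])

probs = forward_filter(returns, trans, means, stds, init=[0.5, 0.5])
labels = probs.argmax(axis=1)
```

The filtered probabilities in `probs` can be used directly as the "current regime probability" instead of hard labels.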

2.3 Cluster-Based Regimes

Define a feature vector (vol, correlation, dispersion, etc.) and cluster using k-means or Gaussian mixtures.
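
A minimal sketch of the clustering route, using a hand-rolled k-means (Lloyd's algorithm) in numpy so the example has no extra dependencies; a library implementation would normally be used instead. The two features (volatility and average correlation), their synthetic values and the function names are assumptions. Centroids and scaling statistics come from a training window, and later days are assigned to the nearest centroid, which keeps the labels free of look-ahead.

```python
import numpy as np

def assign(x, centers):
    """Index of the nearest centroid for each row of x."""
    dists = np.linalg.norm(x[:, None, :] - centers[None, :, :], axis=2)
    return dists.argmin(axis=1)

def kmeans(x, k, n_iter=100, seed=0):
    """Plain Lloyd's algorithm; returns the fitted centroids."""
    rng = np.random.default_rng(seed)
    centers = x[rng.choice(len(x), size=k, replace=False)]
    for _ in range(n_iter):
        labels = assign(x, centers)
        new_centers = np.array([
            x[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers

# Synthetic training features (assumed data): columns = [vol, avg correlation]
rng = np.random.default_rng(0)
calm = np.column_stack([rng.normal(0.10, 0.01, 200), rng.normal(0.20, 0.03, 200)])
stress = np.column_stack([rng.normal(0.35, 0.03, 200), rng.normal(0.70, 0.05, 200)])
train = np.vstack([calm, stress])

# Standardise with training statistics only, then fit and label
mu, sd = train.mean(axis=0), train.std(axis=0)
centers = kmeans((train - mu) / sd, k=2)
train_labels = assign((train - mu) / sd, centers)

# A later day is labelled with the frozen centroids and scaling
new_day = np.array([[0.33, 0.65]])
new_label = assign((new_day - mu) / sd, centers)
```

Cluster indices are arbitrary, so in practice each cluster is named after the fact, for example by its average volatility.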

3. Regime-Conditioned Metrics

Once we have regime labels, we compute for each model:

  • Regime Sharpe: Sharpe_k for each regime
  • Regime Max Drawdown
  • Regime Hit Ratio (share of days in regime k with r_t > 0)
  • Regime Turnover
  • Regime PnL contribution
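
The metrics listed above can be sketched as follows. The definitions are assumptions made for illustration: drawdown is measured on the additive PnL curve of the regime's days stitched together (regime days need not be contiguous), turnover is the mean absolute daily change in position, and PnL contribution is the sum of returns earned in the regime. Function and key names are hypothetical.

```python
import numpy as np

def sharpe(r, periods=252):
    """Annualised Sharpe of daily returns (risk-free rate assumed zero)."""
    sd = r.std(ddof=1) if r.size > 1 else 0.0
    return float(r.mean() / sd * np.sqrt(periods)) if sd > 0 else float("nan")

def max_drawdown(r):
    """Largest peak-to-trough loss of the cumulative (additive) PnL curve."""
    equity = np.concatenate([[0.0], np.cumsum(r)])
    return float(np.max(np.maximum.accumulate(equity) - equity))

def regime_metrics(returns, positions, regimes):
    """Per-regime metrics as a dict keyed by regime label."""
    returns = np.asarray(returns, dtype=float)
    positions = np.asarray(positions, dtype=float)
    regimes = np.asarray(regimes)
    # Absolute position change on each day (zero on the first day)
    traded = np.abs(np.diff(positions, prepend=positions[0]))
    out = {}
    for k in np.unique(regimes):
        mask = regimes == k
        r = returns[mask]
        out[int(k)] = {
            "sharpe": sharpe(r),
            "max_drawdown": max_drawdown(r),
            "hit_ratio": float((r > 0).mean()),
            "turnover": float(traded[mask].mean()),
            "pnl": float(r.sum()),
        }
    return out

# Tiny hand-made example (assumed data)
returns = [0.01, -0.02, 0.03, 0.01, -0.01, 0.02]
positions = [1, 1, -1, -1, 1, 1]
regimes = [0, 0, 0, 1, 1, 1]
metrics = regime_metrics(returns, positions, regimes)
```

Summing the `pnl` entries across regimes recovers the total PnL, which makes the table easy to reconcile with the headline backtest.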

We can then define robustness scores that penalize dispersion of performance across regimes.
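
One possible robustness score, offered as an assumption rather than the definition used in volarixs, is the mean regime Sharpe minus a penalty on its dispersion across regimes. Applied to the regime Sharpes of Model A and Model B from section 1:

```python
import numpy as np

def robustness_score(regime_sharpes, penalty=1.0):
    """Mean regime Sharpe minus penalty times its cross-regime std (hypothetical score)."""
    s = np.asarray(regime_sharpes, dtype=float)
    return float(s.mean() - penalty * s.std())

score_a = robustness_score([3.0, -1.5, -2.0])  # roughly -2.42
score_b = robustness_score([1.2, 0.8, 0.3])    # roughly 0.40
```

The `penalty` parameter controls how strongly dispersion is punished. A stricter alternative is the worst regime Sharpe, `min(regime_sharpes)`, which ignores good regimes entirely.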

4. Research Use Cases

4.1 Model Comparisons

Compare LSTM vs Ridge vs Random Forest by regime, not just overall. Analyze which models collapse during crisis states.

4.2 Portfolio Construction

Build ensembles where constituent models dominate in different regimes. Allocate capital dynamically based on current regime probability.
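
A minimal sketch of regime-probability-weighted allocation: each regime has a preferred set of model weights, and the live allocation is their average weighted by the current regime probabilities. The model names and all numbers below are hypothetical.

```python
import numpy as np

def blend_weights(regime_probs, weights_by_regime):
    """Probability-weighted average of per-regime model allocations."""
    p = np.asarray(regime_probs, dtype=float)
    p = p / p.sum()  # guard against probabilities that do not sum to 1
    return p @ np.asarray(weights_by_regime, dtype=float)

# Rows: regimes 1..3. Columns: LSTM, Ridge, Random Forest (assumed weights)
weights_by_regime = [
    [0.7, 0.2, 0.1],  # low-vol bull
    [0.2, 0.5, 0.3],  # high-vol sideways
    [0.0, 0.3, 0.7],  # crisis
]
regime_probs = [0.6, 0.3, 0.1]  # assumed current regime probabilities

allocation = blend_weights(regime_probs, weights_by_regime)
# allocation is approximately [0.48, 0.30, 0.22]
```

Because each row sums to one and the probabilities are normalised, the blended allocation also sums to one.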

4.3 Production Monitoring

If a live model suddenly deteriorates in only one regime, that calls for diagnosis, not panic. Rolling regime-conditioned Sharpe becomes part of your health dashboard.
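
A rolling regime-conditioned Sharpe can be sketched as below: on each day, take the trailing window and keep only the days that fell in the regime of interest. The window length, the minimum observation count and the synthetic data are assumptions for illustration.

```python
import numpy as np

def rolling_regime_sharpe(returns, regimes, regime, window=126, min_obs=20, periods=252):
    """Trailing annualised Sharpe using only the days labelled `regime`.

    Entries are NaN until the window holds at least `min_obs` such days.
    """
    returns = np.asarray(returns, dtype=float)
    regimes = np.asarray(regimes)
    out = np.full(len(returns), np.nan)
    for t in range(len(returns)):
        lo = max(0, t - window + 1)
        r = returns[lo : t + 1][regimes[lo : t + 1] == regime]
        if r.size >= min_obs and r.std(ddof=1) > 0:
            out[t] = r.mean() / r.std(ddof=1) * np.sqrt(periods)
    return out

# Synthetic example (assumed data): regimes alternate in blocks of 50 days;
# the strategy earns on average in regime 0 and loses in regime 1
days = np.arange(200)
regimes = (days // 50) % 2
pattern = np.where(days % 2 == 0, 0.02, -0.01)
returns = np.where(regimes == 0, pattern, -pattern)

sharpe_r0 = rolling_regime_sharpe(returns, regimes, regime=0, window=100)
sharpe_r1 = rolling_regime_sharpe(returns, regimes, regime=1, window=100)
```

Plotting one such series per regime shows at a glance whether a deterioration is confined to a single market state.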

5. How volarixs Implements This

In volarixs, every experiment is linked to:

  • time series of features
  • time series of predictions
  • time series of realised returns
  • time series of regime labels (one or more schemes)

Regime-conditioned metrics are:

  • computed automatically for each run
  • stored in the experiment metadata
  • visualised as "Performance by Regime" plots and tables
  • available for alpha factory ranking and diagnostics

This shifts the conversation from "What's the Sharpe?" to "How does this thing behave when the world changes?" That's where robustness lives.

Regimes
Robustness
Performance
Evaluation
