Building a Universe-Wide Prediction Grid: Scaling ML From Single Tickers to Thousands of Assets
Most ML workflows start with "Let's test this model on AAPL." But an alpha factory needs predictions for every asset in the universe at multiple horizons from multiple models.
1. What Is a Prediction Grid?
Conceptually, a prediction grid is a large four-dimensional tensor r̂_{u,m,h,t}, where:
- u = asset (ticker, FX pair, crypto, index, etc.)
- m = model (Ridge, XGBoost, LSTM, etc.)
- h = horizon (1d, 5d, 21d, etc.)
- t = forecast date
Each entry stores a predicted return (or volatility, or probability) along with its associated metadata.
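The sketch below shows one minimal way to hold this tensor in memory, under our own assumptions. It uses a sparse mapping keyed by the four indices (u, m, h, t), so only cells that were actually computed take up space. The `GridKey`/`GridCell` classes, the `put` helper and the model names are hypothetical, and the prediction values are placeholders rather than real model output.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass(frozen=True)
class GridKey:
    """Index of one grid cell r_hat[u, m, h, t] (field names are illustrative)."""
    asset: str           # u: ticker, FX pair, crypto, index, ...
    model: str           # m: e.g. "ridge", "xgboost", "lstm"
    horizon: str         # h: e.g. "1d", "5d", "21d"
    forecast_date: date  # t


@dataclass
class GridCell:
    prediction: float                             # predicted return, volatility, or probability
    metadata: dict = field(default_factory=dict)  # e.g. model version, feature_set_version


# Sparse storage: only cells that were actually computed are present.
grid: dict[GridKey, GridCell] = {}


def put(asset: str, model: str, horizon: str, forecast_date: date,
        prediction: float, **metadata) -> None:
    grid[GridKey(asset, model, horizon, forecast_date)] = GridCell(prediction, metadata)


t = date(2024, 1, 2)
# Placeholder values for illustration only.
put("NVDA", "ridge", "1d", t, 0.012, feature_set_version="v3")
put("NVDA", "xgboost", "1d", t, 0.008, feature_set_version="v3")
put("AAPL", "ridge", "21d", t, -0.004, feature_set_version="v3")

# Slice: every model that has a 1d prediction for NVDA on date t.
nvda_1d_models = [k.model for k in grid
                  if k.asset == "NVDA" and k.horizon == "1d" and k.forecast_date == t]
print(nvda_1d_models)  # ['ridge', 'xgboost']
```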
2. From Single Ticker to Universe
For U tickers, M models, and H horizons, you'd naively run U × M × H independent experiments. This doesn't scale.
We need shared infrastructure: centralized feature stores, hierarchical configuration, and efficient execution models.
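The arithmetic below makes the scaling problem concrete. It compares the naive experiment count with the number of tasks you get when work is grouped by (model, horizon, sector chunk), which is the grouping used in the execution model in section 4. The universe size, the model list and the use of the 11 GICS sectors as chunks are assumptions chosen for illustration.

```python
from itertools import product

# Illustrative grid dimensions (assumptions, not production figures).
n_assets = 3000
models = ["ridge", "xgboost", "lstm"]
horizons = ["1d", "5d", "21d"]
n_sector_chunks = 11  # e.g. one chunk per GICS sector

# Naive: one independent experiment per (asset, model, horizon).
naive_runs = n_assets * len(models) * len(horizons)

# Grouped: one task per (model, horizon, sector chunk); each task covers many assets.
grouped_tasks = list(product(models, horizons, range(n_sector_chunks)))

print(naive_runs, len(grouped_tasks))  # 27000 99
```

Each grouped task still produces a prediction for every asset in its chunk. What shrinks is the number of independently scheduled units, and each of those units can load shared features and a fitted model once instead of once per asset.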
3. Data Design for a Prediction Grid
3.1 Centralized Feature Store
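Rather than recomputing features in every run, maintain a feature store keyed by (asset, date, feature_set_version). Features are computed once per asset per day, and the transformations that produce them are versioned. Changing a transformation therefore creates a new feature_set_version instead of overwriting existing values.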
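At its core, a feature store behaves like a cache keyed by (asset, date, feature_set_version), and the sketch below shows that contract. The in-memory `FeatureStore` class and the `features_v1` transformation are hypothetical stand-ins. A production store would persist to something like Parquet rather than a Python dict.

```python
from datetime import date
from typing import Callable

FeatureKey = tuple[str, date, str]  # (asset, date, feature_set_version)
Transform = Callable[[str, date], dict[str, float]]


class FeatureStore:
    """In-memory stand-in for a versioned feature store (hypothetical)."""

    def __init__(self, transforms: dict[str, Transform]):
        # feature_set_version -> transformation that produces that feature set
        self._transforms = transforms
        self._cache: dict[FeatureKey, dict[str, float]] = {}
        self.compute_calls = 0

    def get(self, asset: str, day: date, version: str) -> dict[str, float]:
        key = (asset, day, version)
        if key not in self._cache:
            # Computed once per (asset, date, version); later runs reuse it.
            self.compute_calls += 1
            self._cache[key] = self._transforms[version](asset, day)
        return self._cache[key]


def features_v1(asset: str, day: date) -> dict[str, float]:
    # Placeholder: a real transformation would read prices and compute features.
    return {"mom_21d": 0.0, "vol_21d": 0.0}


store = FeatureStore({"v1": features_v1})
d = date(2024, 1, 2)
store.get("AAPL", d, "v1")  # computed
store.get("AAPL", d, "v1")  # served from the cache
print(store.compute_calls)  # 1
```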
3.2 Hierarchical Configuration
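Separate configuration into three layers: universe-level config (which assets are covered), model templates (model types and their settings), and execution config (how and when runs are scheduled). The prediction factory builds runs from Cartesian products of these layers where needed, but it avoids combinatorial explosion by generating only the combinations that are actually required.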
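The layering can be sketched as three plain dictionaries and a builder. The builder takes their Cartesian product and prunes unsupported combinations before any run is scheduled. All configuration keys, universe names and the asset-class pruning rule are hypothetical.

```python
from itertools import product

# Hypothetical configuration layers.
universe_cfg = {
    "equities_us": {"asset_class": "equity", "assets": ["AAPL", "MSFT", "NVDA"]},
    "fx_majors": {"asset_class": "fx", "assets": ["EURUSD", "USDJPY"]},
}
model_templates = {
    "ridge": {"horizons": ["1d", "5d", "21d"], "asset_classes": ["equity", "fx"]},
    "xgboost": {"horizons": ["5d", "21d"], "asset_classes": ["equity"]},
}
execution_cfg = {"schedule": "business_days", "chunk_size": 500}


def build_runs(universes: dict, templates: dict, execution: dict) -> list[dict]:
    """Cartesian product of universes x models x horizons, keeping only
    the combinations each model template declares it supports."""
    runs = []
    for (u_name, u), (m_name, m) in product(universes.items(), templates.items()):
        if u["asset_class"] not in m["asset_classes"]:
            continue  # prune: model not intended for this asset class
        for horizon in m["horizons"]:
            runs.append({"universe": u_name, "model": m_name,
                         "horizon": horizon, **execution})
    return runs


runs = build_runs(universe_cfg, model_templates, execution_cfg)
print(len(runs))  # 8
```

Here the builder yields 8 runs instead of the 2 × 2 × 3 = 12 that a full product over universes, models and horizons would give. The reduction comes from two restrictions: `xgboost` declares only two horizons, and it has no FX support.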
4. Execution Model
A practical design:
- Daily schedule: for each business day, load the required features and models
- Task graph (Prefect/Celery): one task per (model, horizon, sector chunk)
- Output: a prediction table for day t with columns (asset_id, model_id, horizon, forecast_date, prediction, metadata_ref)
This is exactly what volarixs' prediction factory is designed to orchestrate.
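The daily loop can be sketched without a scheduler: plan one task per (model, horizon, sector chunk), run each task, and collect rows that carry the output columns listed above. In production the tasks would be submitted to Prefect or Celery workers, while this sketch runs them sequentially. `plan_tasks`, `run_task`, the `predict` callable and the `metadata_ref` path format are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from itertools import product
from typing import Callable


@dataclass(frozen=True)
class Task:
    model_id: str
    horizon: str
    sector: str
    assets: tuple[str, ...]


@dataclass(frozen=True)
class PredictionRow:
    asset_id: str
    model_id: str
    horizon: str
    forecast_date: date
    prediction: float
    metadata_ref: str  # pointer to run metadata (hypothetical path format)


def plan_tasks(models: list[str], horizons: list[str],
               sectors: dict[str, list[str]]) -> list[Task]:
    """One task per (model, horizon, sector chunk)."""
    return [Task(m, h, s, tuple(assets))
            for m, h, (s, assets) in product(models, horizons, sectors.items())]


def run_task(task: Task, forecast_date: date,
             predict: Callable[[Task, str], float]) -> list[PredictionRow]:
    """Run one task; `predict` stands in for loading features and a fitted model."""
    ref = f"runs/{forecast_date.isoformat()}/{task.model_id}/{task.horizon}/{task.sector}"
    return [PredictionRow(a, task.model_id, task.horizon, forecast_date,
                          predict(task, a), ref)
            for a in task.assets]


sectors = {"tech": ["AAPL", "MSFT", "NVDA"], "comm": ["GOOGL", "META", "NFLX"]}
tasks = plan_tasks(["ridge", "xgboost"], ["1d", "21d"], sectors)
t = date(2024, 1, 2)
# Dummy predictor returning 0.0; a real one would score features with a trained model.
table = [row for task in tasks for row in run_task(task, t, lambda _task, _asset: 0.0)]
print(len(tasks), len(table))  # 8 24
```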
Prediction Grid Browser (example slice: predicted returns by ticker and horizon)
| Ticker | 1d | 5d | 21d |
|---|---|---|---|
| NVDA | 4.74% | -1.80% | -1.52% |
| GOOGL | 2.20% | -4.86% | -4.73% |
| INTC | 0.98% | -3.08% | 4.40% |
| AMD | 0.87% | -2.08% | 1.61% |
| META | 0.39% | -0.86% | -0.21% |
| TSLA | -0.37% | 1.72% | -2.53% |
| AAPL | -0.87% | -4.86% | -1.48% |
| AMZN | -1.71% | 1.77% | -2.65% |
| MSFT | -2.79% | -2.35% | -2.96% |
| NFLX | -3.16% | 3.07% | -2.30% |
This is a simplified view of a universe-wide prediction grid. In production, volarixs maintains predictions for thousands of assets across multiple models and horizons, enabling cross-sectional analysis and alpha factory workflows.
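A basic cross-sectional operation on a slice like this is to standardize each horizon across assets before ranking. Doing so puts predictions from different horizons or models on a comparable scale. The sketch uses the values from the table above (in percent). `cross_sectional_z` is a hypothetical helper, and z-scoring is only one common choice; plain ranks or winsorized scores are alternatives.

```python
from statistics import mean, pstdev

# Predicted returns in percent, copied from the grid browser table above.
table_slice = {
    "NVDA": {"1d": 4.74, "5d": -1.80, "21d": -1.52},
    "GOOGL": {"1d": 2.20, "5d": -4.86, "21d": -4.73},
    "INTC": {"1d": 0.98, "5d": -3.08, "21d": 4.40},
    "AMD": {"1d": 0.87, "5d": -2.08, "21d": 1.61},
    "META": {"1d": 0.39, "5d": -0.86, "21d": -0.21},
    "TSLA": {"1d": -0.37, "5d": 1.72, "21d": -2.53},
    "AAPL": {"1d": -0.87, "5d": -4.86, "21d": -1.48},
    "AMZN": {"1d": -1.71, "5d": 1.77, "21d": -2.65},
    "MSFT": {"1d": -2.79, "5d": -2.35, "21d": -2.96},
    "NFLX": {"1d": -3.16, "5d": 3.07, "21d": -2.30},
}


def cross_sectional_z(slice_: dict[str, dict[str, float]], horizon: str) -> dict[str, float]:
    """Standardize one horizon across assets: z = (x - mean) / population std."""
    values = {asset: row[horizon] for asset, row in slice_.items()}
    xs = list(values.values())
    mu, sigma = mean(xs), pstdev(xs)
    return {asset: (x - mu) / sigma for asset, x in values.items()}


z_21d = cross_sectional_z(table_slice, "21d")
ranked = sorted(z_21d, key=z_21d.get, reverse=True)
print(ranked[:3], ranked[-3:])  # ['INTC', 'AMD', 'META'] ['AMZN', 'MSFT', 'GOOGL']
```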
5. Research Questions Enabled by a Prediction Grid
Once the grid is populated, you can run research that is simply not possible in single-ticker workflows:
- Cross-sectional comparison of model performance by sector, size, liquidity
- Stability of model rankings across time and regimes
- Horizon-consistency: does a model that's good at 1d also work at 21d?
- Factor extraction: PCA/ICA on prediction matrices to identify latent "model factors"
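One standard measure for the second question, the stability of model rankings, is Spearman's rank correlation between model scores in two periods. Values near 1 mean the ranking held, and values near 0 or below mean it reshuffled. The scoring metric (for example, mean daily information coefficient), the model list and every number below are hypothetical.

```python
def average_ranks(xs: list[float]) -> list[float]:
    """Ranks 1..n; tied values share the average of the ranks they span."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(xs: list[float], ys: list[float]) -> float:
    """Spearman rank correlation = Pearson correlation of the (average) ranks."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5


# Hypothetical per-model scores in two periods (same model order in both lists).
models = ["ridge", "xgboost", "lstm", "elasticnet"]
score_period_a = [0.021, 0.034, 0.015, 0.019]
score_period_b = [0.018, 0.030, 0.022, 0.012]
print(round(spearman(score_period_a, score_period_b), 3))  # 0.4
```

Running the same comparison over rolling windows, or separately per regime, turns this single number into a time series of ranking stability.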
6. How volarixs Implements the Grid
volarixs separates:
- Manual experiments: user-triggered, single or small sets of assets
- Factory runs: system-triggered, universe-wide predictions
Factory runs:
- use a shared feature store in S3/Parquet
- dispatch jobs via Prefect workers
- store predictions and metadata in Postgres + S3
- expose grid slices via APIs: by date, by asset, by model
This makes research-level analysis of prediction behaviour a first-class capability, not an afterthought.
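In a Postgres-backed store, each of these slices amounts to a `WHERE` clause on the prediction table. The in-memory sketch below illustrates the same filtering idea. `Row` and `grid_slice` are hypothetical names and are not volarixs' actual API.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Row:
    asset_id: str
    model_id: str
    horizon: str
    forecast_date: date
    prediction: float


def grid_slice(rows: list[Row], *, asset_id=None, model_id=None,
               horizon=None, forecast_date=None) -> list[Row]:
    """Return rows matching every filter that is not None (hypothetical helper)."""
    filters = {"asset_id": asset_id, "model_id": model_id,
               "horizon": horizon, "forecast_date": forecast_date}
    active = {k: v for k, v in filters.items() if v is not None}
    return [r for r in rows if all(getattr(r, k) == v for k, v in active.items())]


d1, d2 = date(2024, 1, 2), date(2024, 1, 3)
rows = [  # placeholder predictions
    Row("AAPL", "ridge", "1d", d1, 0.001),
    Row("AAPL", "xgboost", "1d", d1, -0.002),
    Row("MSFT", "ridge", "1d", d1, 0.003),
    Row("AAPL", "ridge", "1d", d2, 0.000),
]
by_date = grid_slice(rows, forecast_date=d1)                 # one day's cross-section
by_asset = grid_slice(rows, asset_id="AAPL")                 # one asset, all models/dates
by_model = grid_slice(rows, model_id="ridge", horizon="1d")  # one model's history
print(len(by_date), len(by_asset), len(by_model))  # 3 3 3
```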
7. From Prediction Grid to Trade Signals
The grid itself is not yet a trading strategy. You still need filtering, position sizing logic, and portfolio construction rules. But if the prediction grid is well designed and reproducible, the step from "research" to "production" is much shorter and far less fragile.
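The three missing steps can be sketched for a single cross-section. First, filter out weak predictions. Second, size positions in proportion to demeaned predictions. Third, scale the result to a dollar-neutral book with unit gross exposure. This is one simple, commonly used construction and is not presented as volarixs' method. The 0.5% threshold is an arbitrary assumption, and the inputs are the 1d values from the grid browser table.

```python
def long_short_weights(preds: dict[str, float], min_abs_pred: float) -> dict[str, float]:
    """Map one cross-section of predictions to dollar-neutral weights.

    1. Filter: drop assets whose |prediction| is below min_abs_pred.
    2. Size: demean surviving predictions so longs and shorts offset.
    3. Construct: scale so gross exposure (sum of |w|) equals 1.
    """
    kept = {a: p for a, p in preds.items() if abs(p) >= min_abs_pred}
    if len(kept) < 2:
        return {}  # not enough names to build a long-short book
    mu = sum(kept.values()) / len(kept)
    centred = {a: p - mu for a, p in kept.items()}
    gross = sum(abs(x) for x in centred.values())
    if gross == 0:
        return {}  # all surviving predictions identical: no relative signal
    return {a: x / gross for a, x in centred.items()}


# 1d predicted returns (%) from the grid browser table; the threshold is an assumption.
preds_1d = {"NVDA": 4.74, "GOOGL": 2.20, "INTC": 0.98, "AMD": 0.87, "META": 0.39,
            "TSLA": -0.37, "AAPL": -0.87, "AMZN": -1.71, "MSFT": -2.79, "NFLX": -3.16}
weights = long_short_weights(preds_1d, min_abs_pred=0.5)
for asset, w in sorted(weights.items(), key=lambda kv: -kv[1]):
    print(f"{asset:>6} {w:+.3f}")
```

A production pipeline would typically add position caps, liquidity filters, transaction-cost handling and risk-model constraints on top of this.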