STORM — Project Page


STORM: A Spatio-Temporal Factor Model Based on Dual Vector Quantized Variational Autoencoders for Financial Trading

Yilei Zhao^*, Wentao Zhang^*, Tingran Yang, Yong Jiang, Fei Huang, Wei Yang Bryan Lim^†

Accepted by WSDM 2026

STORM is a Spatio-Temporal factor Model based on dual vector quantized variational autoencoders that extracts stock features from both temporal and spatial perspectives.

The model fuses and aligns these features at a fine-grained semantic level, representing factors as multi-dimensional embeddings rather than single values. By utilizing discrete codebooks to cluster similar factor embeddings, STORM ensures orthogonality and diversity, effectively distinguishing between different factors for financial trading selection.

Extensive experiments in portfolio management and individual trading tasks demonstrate the model's superior performance and flexibility in adapting to downstream financial applications.


Figure 1: Overall architecture of STORM.

Introduction

Recently, we have witnessed the rise of latent factor models, which connect the factor model with a generative model, the variational autoencoder (VAE). These VAE-based models map high-dimensional data (prices) to low-dimensional representations (factors) and learn factors self-adaptively. Although latent factor models have demonstrated substantial success in financial trading tasks, they still face several significant issues:

CH1: Limited Reflection of Market Complexity

Latent factor models that represent factors as single values are inherently constrained by their insufficient capacity to capture the complexity and nonlinearity of financial data. This leaves them vulnerable to noise and non-stationarity, which compromises their predictive accuracy and stability.

CH2: Factor Inefficiency

VAE-learned factors suffer from three inefficiencies: i) focusing mainly on cross-sectional factors while neglecting temporal information, ii) allowing noise in continuous latent spaces to overshadow meaningful signals, and iii) lacking independence among factors, which leads to multicollinearity and weak adaptability to varying market conditions.

CH3: Lack of Factor Selection

Existing latent factor models primarily focus on generating factors without adequately differentiating between them. Furthermore, they neglect the crucial process of factor selection, which is essential for identifying impactful factors, thereby limiting the model's overall effectiveness and precision.

To address these challenges, we propose a Spatio-Temporal factOR Model based on dual vector quantized variational autoencoders (VQ-VAE), named STORM. Unlike traditional scalar-valued financial factors with clear economic interpretations, STORM learns high-dimensional latent vectors that, while less interpretable, capture the complexity and nonlinearity inherent in financial markets (CH1). Additionally, we develop a dual VQ-VAE architecture to capture cross-sectional and time-series features, considering both spatial and temporal perspectives. By integrating these features at both fine-grained and semantic levels, the model constructs more effective factors. Through diversity and orthogonality loss constraints, we ensure independence among the learned representations (CH2). Furthermore, codebook embeddings act as cluster centers, serving as class tokens to categorize factor embeddings. This strategy brings clarity and transparency to the differentiation and selection of factors (CH3).

Methodology

Time-series and Cross-sectional Modules

Patching and Encoding: In the TS module, the observed data is divided along the stock dimension; in the CS module, it is divided along the time axis. We then use Transformers as encoders and decoders to capture complex patterns.
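The two patching directions can be illustrated with a minimal sketch. It assumes the observed data is a days-by-stocks panel stored as nested lists. The function names and the `patch_size` parameter are hypothetical and not taken from the STORM code base, and the feature dimension and Transformer encoders are omitted.

```python
def patch_along_stocks(panel, patch_size):
    """TS-style patching (assumed form): split a T x N panel into groups of stock columns."""
    n_stocks = len(panel[0])
    return [[row[s:s + patch_size] for row in panel]
            for s in range(0, n_stocks, patch_size)]


def patch_along_time(panel, patch_size):
    """CS-style patching (assumed form): split a T x N panel into blocks of consecutive days."""
    return [panel[t:t + patch_size] for t in range(0, len(panel), patch_size)]


# Toy panel: 4 days x 3 stocks, value = 10 * day + stock index.
panel = [[10 * t + s for s in range(3)] for t in range(4)]

ts_patches = patch_along_stocks(panel, patch_size=1)  # one patch per stock, full history
cs_patches = patch_along_time(panel, patch_size=2)    # one patch per 2-day window, all stocks
```

In this toy setting each TS patch keeps the full history of a single stock, while each CS patch keeps every stock over a short window of days.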

Codebook Construction and Optimization:

The diversity loss $ \mathcal{L}_{\text{div}}=\frac{1}{GK}\sum_{g=1}^{G}\sum_{k=1}^{K}\bar{p}_{g,k} \log{\bar{p}_{g,k}} $ is used to enhance the representational capacity of the codebook.

The orthogonality loss $ \mathcal{L}_{\text{ortho}}= \frac{1}{K^2}\left|\left| \ell_2(\mathbf{e})^\top \ell_2(\mathbf{e}) - I_K \right|\right|_F^2 $ enforces orthogonality among factors.
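Both codebook losses can be evaluated directly from their formulas. The sketch below uses plain Python and makes two layout assumptions that the page does not state: `p_bar[g][k]` holds the averaged usage probability of code k in group g, and `codebook` is a list of K embedding vectors, each L2-normalized before the Gram matrix is formed. Nonzero embeddings are also assumed.

```python
import math


def diversity_loss(p_bar):
    """(1 / GK) * sum_g sum_k p_bar[g][k] * log(p_bar[g][k]); zero entries contribute 0."""
    G, K = len(p_bar), len(p_bar[0])
    total = sum(p * math.log(p) for row in p_bar for p in row if p > 0)
    return total / (G * K)


def orthogonality_loss(codebook):
    """(1 / K^2) * || Gram(l2-normalized codebook) - I_K ||_F^2."""
    K = len(codebook)
    unit = []
    for e in codebook:
        norm = math.sqrt(sum(v * v for v in e))  # assumes no zero-norm embedding
        unit.append([v / norm for v in e])
    loss = 0.0
    for i in range(K):
        for j in range(K):
            gram_ij = sum(a * b for a, b in zip(unit[i], unit[j]))
            target = 1.0 if i == j else 0.0
            loss += (gram_ij - target) ** 2
    return loss / K ** 2
```

As written, the diversity term is a scaled negative entropy, so it is lowest when the codes in each group are used uniformly. The orthogonality term is zero exactly when the normalized embeddings are mutually orthogonal.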

Decoding and Reconstruction: $$ \begin{aligned} \mathcal{L}_1 = & \lambda_{\text{ortho}}\mathcal{L}_{\text{ortho}} + \lambda_{\text{div}}\mathcal{L}_{\text{div}} \\ & + \left\|\mathbf{x}-\mathbf{x}'_{ts}\right\|^2_2 + \left\| \mathbf{x}-\mathbf{x}'_{cs}\right\|^2_2 \\ & + \left\|sg[\mathbf{z}_e^{ts}(\mathbf{x})]-\mathbf{z}_q^{ts}(\mathbf{x})\right\|^2_2 + \left\|sg[\mathbf{z}_q^{ts}(\mathbf{x})]-\mathbf{z}_e^{ts}(\mathbf{x})\right\|^2_2 \\ & + \left\|sg[\mathbf{z}_e^{cs}(\mathbf{x})]-\mathbf{z}_q^{cs}(\mathbf{x})\right\|^2_2 + \left\|sg[\mathbf{z}_q^{cs}(\mathbf{x})]-\mathbf{z}_e^{cs}(\mathbf{x})\right\|^2_2 \end{aligned} $$
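The stop-gradient terms in $ \mathcal{L}_1 $ can be read through a small forward-pass sketch. It assumes, as in a standard VQ-VAE, that $ \mathbf{z}_q $ is the codebook embedding nearest to the encoder output $ \mathbf{z}_e $; the page does not spell out the lookup rule, and the function names are hypothetical.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(z_e, codebook):
    """Return (index, z_q): the codebook embedding nearest to the encoder output z_e."""
    index = min(range(len(codebook)),
                key=lambda k: squared_distance(z_e, codebook[k]))
    return index, codebook[index]


def vq_terms(z_e, codebook):
    """Forward values of the two stop-gradient terms for one latent vector.

    sg[.] only blocks gradients, so both terms share the same forward value;
    they differ in which side (codebook or encoder) would receive the gradient.
    """
    _, z_q = quantize(z_e, codebook)
    codebook_term = squared_distance(z_e, z_q)    # ||sg[z_e] - z_q||^2, updates the codebook
    commitment_term = squared_distance(z_q, z_e)  # ||sg[z_q] - z_e||^2, updates the encoder
    return codebook_term, commitment_term
```

In an automatic-differentiation framework the two terms would be built with a stop-gradient or detach operation. This sketch reproduces only their numerical values.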

Factor Module

Feature Fusion and Alignment: We use a multiscale encoder and a contrastive learning layer to fuse and align TS and CS features at the fine-grained and semantic levels.

Prior-Posterior Learning: The two latent features are concatenated and used to predict future returns. The resulting factors are then used in the downstream portfolio management and trading tasks.

Method overview

STORM adopts a dual VQ-VAE structure consisting of a time-series module, a cross-sectional module, and a factor module that aligns and fuses the learned factors.

Figure 2: STORM's architecture.

Downstream Tasks

Portfolio Management: We use the factor decoder network to predict future stock returns $ \mathbf{\hat{y}}$, and then backtest the factor model with the TopK-Drop strategy, which constructs a daily portfolio by selecting the top $ k$ stocks based on predicted returns.
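One rebalancing step of a TopK-Drop style strategy might look like the sketch below. The page only states that the top $ k$ stocks by predicted return are selected each day. The turnover cap `n_drop` and the rule for choosing which holdings to replace are assumptions made for illustration, not the authors' exact backtest logic.

```python
def topk_drop(pred, holdings, k, n_drop):
    """One daily rebalance (illustrative; n_drop is an assumed turnover cap).

    pred: dict mapping stock -> predicted return for the next day.
    holdings: list of currently held stocks.
    Returns the new list of held stocks, with at most n_drop names replaced.
    """
    ranked = sorted(pred, key=pred.get, reverse=True)
    top_k = ranked[:k]
    if not holdings:
        return top_k  # first day: buy the top k outright

    # Held names that fell out of the top k, worst prediction first.
    stale = sorted((s for s in holdings if s not in top_k),
                   key=lambda s: pred.get(s, float("-inf")))
    to_sell = stale[:n_drop]
    kept = [s for s in holdings if s not in to_sell]

    # Fill the freed slots with the best-ranked names not already held.
    candidates = [s for s in top_k if s not in kept]
    free_slots = max(0, k - len(kept))
    return kept + candidates[:free_slots]
```

Capping the number of replaced names per day limits turnover, and therefore transaction costs, while still moving the portfolio toward the current top-$ k$ set.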

Algorithmic Trading: The latent factor embeddings $ \mathbf{Z} $ are integrated into the observation set $ \mathcal{O}=\{\mathbf{Z}, \mathcal{R}\} $, where $ \mathcal{R} $ is the reward function used to guide the agent's learning and decision-making in the environment. We use the Proximal Policy Optimization (PPO) algorithm to optimize the policy.

Experiment Results

Portfolio Management Task

SP500 Dataset

Strategy	Profit		Risk-Adj. Profit		Risk
Strategy	APY ↑	CW ↑	CR ↑	ASR ↑	MDD ↓	AVO ↓
Market Index	0.058	1.184	0.228	0.142	0.254	0.410
LightGBM	0.059	1.201	0.304	0.332	0.238	0.176
LSTM	0.069	1.221	0.278	0.371	0.248	0.186
Transformer	0.076	1.244	0.389	0.433	0.198	0.174
CAFactor	0.075	1.241	0.342	0.428	0.223	0.174
FactorVAE	0.079	1.256	0.404	0.460	0.200	0.173
HireVAE	0.077	1.249	0.361	0.448	0.216	0.172
STORM	0.188	1.683	1.189	1.052	0.166	0.171
	± 0.055	± 0.226	± 0.661	± 0.329	± 0.050	± 0.020

Table 1: Portfolio management task results of all models across six metrics (mean ± range, computed across 10 runs) on SP500 dataset.

DJ30 Dataset

Strategy	Profit		Risk-Adj. Profit		Risk
Strategy	APY ↑	CW ↑	CR ↑	ASR ↑	MDD ↓	AVO ↓
Market Index	0.063	1.201	0.147	0.429	0.219	0.288
LightGBM	0.069	1.221	0.288	0.430	0.244	0.160
LSTM	0.060	1.192	0.243	0.370	0.248	0.163
Transformer	0.056	1.179	0.227	0.367	0.250	0.154
CAFactor	0.059	1.186	0.233	0.382	0.252	0.153
FactorVAE	0.076	1.246	0.352	0.480	0.225	0.159
HireVAE	0.072	1.233	0.298	0.445	0.247	0.163
STORM	0.148	1.517	1.396	1.052	0.108	0.140
	± 0.046	± 0.188	± 0.679	± 0.297	± 0.026	± 0.014

Table 2: Portfolio management task results of all models across six metrics on DJ30 dataset.

Algorithmic Trading Task

Models	AAPL			JPM			IBM			INTC			MSFT
Models	APY ↑	CW ↑	CR ↑	APY ↑	CW ↑	CR ↑	APY ↑	CW ↑	CR ↑	APY ↑	CW ↑	CR ↑	APY ↑	CW ↑	CR ↑
Buy&Hold	0.120	1.404	0.383	0.096	1.316	0.236	0.145	1.499	0.727	-0.117	0.690	-0.184	0.214	1.784	0.569
LightGBM	0.135	1.390	0.487	0.116	1.335	0.333	0.227	1.654	1.091	-0.042	0.880	0.038	0.267	2.068	0.637
LSTM	0.053	1.152	0.283	0.079	1.290	0.266	0.134	1.386	0.754	0.060	1.262	0.381	0.178	1.513	0.893
Transformer	0.083	1.240	0.512	0.133	1.384	0.614	0.131	1.377	0.782	0.079	1.290	0.458	0.138	1.397	0.726
DQN	0.135	1.374	0.510	0.105	1.305	0.607	0.139	1.400	0.802	0.061	1.185	0.442	0.166	1.475	0.534
SAC	0.147	1.509	0.528	0.131	1.383	0.400	0.207	1.598	1.170	0.056	1.165	0.353	0.229	1.656	0.929
PPO	0.137	1.379	0.496	0.128	1.372	0.356	0.146	1.422	0.779	-0.019	0.954	0.040	0.216	1.620	0.569
STORM	0.229	1.857	0.750	0.174	1.621	0.559	0.236	1.893	1.470	0.173	1.625	0.773	0.290	2.154	1.216
	± 0.033	± 0.154	± 0.066	± 0.032	± 0.133	± 0.081	± 0.039	± 0.184	± 0.445	± 0.067	± 0.284	± 0.293	± 0.052	± 0.262	± 0.597
Improvement (%)	55.782	23.062	45.076	30.827	17.124	2.117	3.965	14.450	20.408	118.987	26.969	66.594	8.614	4.159	30.893

Table 3: Algorithmic trading task results on all models.
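The Improvement (%) row can be read as the relative gain of STORM over the strongest baseline for a given metric. This reading is an assumption, since the page does not define the row; it reproduces, for example, the AAPL APY and IBM CW entries. The helper name is hypothetical.

```python
def improvement_pct(storm_value, baseline_values):
    """Relative gain (in %) of STORM over the best baseline, for higher-is-better metrics."""
    best = max(baseline_values)
    return (storm_value - best) / abs(best) * 100


# AAPL APY: STORM vs. the seven baselines listed in Table 3.
aapl_apy = improvement_pct(0.229, [0.120, 0.135, 0.053, 0.083, 0.135, 0.147, 0.137])

# IBM CW: STORM vs. the seven baselines listed in Table 3.
ibm_cw = improvement_pct(1.893, [1.499, 1.654, 1.386, 1.377, 1.400, 1.598, 1.422])
```

Rounded to three decimals, these evaluate to 55.782 and 14.450 respectively.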

Factor Quality Evaluation Task

Model	SP500		DJ30
Model	RankIC ↑	RankICIR ↑	RankIC ↑	RankICIR ↑
LightGBM	0.027 ± 0.006	0.274 ± 0.084	0.031 ± 0.005	0.272 ± 0.049
LSTM	0.034 ± 0.006	0.333 ± 0.042	0.031 ± 0.004	0.329 ± 0.056
Transformer	0.035 ± 0.007	0.340 ± 0.078	0.033 ± 0.005	0.343 ± 0.045
CAFactor	0.037 ± 0.005	0.356 ± 0.084	0.040 ± 0.003	0.380 ± 0.043
FactorVAE	0.052 ± 0.010	0.543 ± 0.122	0.056 ± 0.012	0.520 ± 0.081
HireVAE	0.057 ± 0.006	0.558 ± 0.058	0.058 ± 0.006	0.563 ± 0.053
STORM	0.062 ± 0.018	0.673 ± 0.155	0.065 ± 0.038	0.668 ± 0.287

Table 4: Factor quality evaluation task results on RankIC and RankICIR (mean ± range, computed across 10 runs).
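For readers unfamiliar with these metrics, the sketch below uses the common definitions. RankIC is the Spearman rank correlation between predicted and realized returns on a given day, and RankICIR is the mean of the daily RankIC values divided by their standard deviation. The page does not restate these definitions, so treat them as assumptions, including the choice of sample standard deviation.

```python
import statistics


def ranks(values):
    """Average ranks (1-based), so tied values share the same rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for idx in order[i:j + 1]:
            out[idx] = avg_rank
        i = j + 1
    return out


def pearson(a, b):
    """Pearson correlation of two equal-length sequences with nonzero variance."""
    ma, mb = statistics.fmean(a), statistics.fmean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5


def rank_ic(predicted, realized):
    """Spearman correlation: Pearson correlation of the rank-transformed inputs."""
    return pearson(ranks(predicted), ranks(realized))


def rank_icir(daily_rank_ics):
    """Mean of daily RankIC divided by its sample standard deviation (assumed convention)."""
    return statistics.fmean(daily_rank_ics) / statistics.stdev(daily_rank_ics)
```

A higher RankIC means the predicted ordering of stocks tracks the realized ordering more closely; a higher RankICIR means that this agreement is more stable from day to day.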

Citation

@article{zhao2024storm,
title={STORM: A Spatio-Temporal Factor Model Based on Dual Vector Quantized Variational Autoencoders for Financial Trading},
author={Zhao, Yilei and Zhang, Wentao and Yang, Tingran and Jiang, Yong and Huang, Fei and Lim, Wei Yang Bryan},
journal={arXiv preprint arXiv:2412.09468},
year={2024}
}