Latent Spatial Memory for Video World Models

Weijie Wang¹*, Haoyu Zhao¹*, Yifan Yang², Feng Chen³, Zeyu Zhang¹, Yefei He¹, Zicheng Duan³, Donny Y. Chen⁴, Yuqing Yang², Bohan Zhuang¹

¹ Zhejiang University · ² Microsoft Research · ³ Adelaide University · ⁴ Monash University · * Equal contribution

10.57× faster generation · 55× lower 3D cache memory · 70.36 WorldScore average

Latent Spatial Memory

Mirage stores static scene content as 3D latent tokens, then reads and updates that cache directly during generation.

Mirage Architecture

Mirage initializes, reads, and updates a persistent latent spatial memory.
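The initialize, read, and update cycle can be illustrated with a toy sketch. This is not the Mirage implementation: the class name `LatentSpatialMemory`, the voxel-grid keying, the `cell_size` and `momentum` parameters, and the blending rule are all assumptions made for illustration, and real latent tokens would be tensors produced by a learned encoder rather than short Python lists.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Key = Tuple[int, ...]


class LatentSpatialMemory:
    """Toy persistent cache mapping quantized 3D positions to latent tokens.

    Hypothetical illustration only; the names and the update rule are assumed.
    """

    def __init__(self, cell_size: float = 0.5, momentum: float = 0.5) -> None:
        self.cell_size = cell_size  # edge length of one grid cell (assumed)
        self.momentum = momentum    # weight kept from the old token on update
        self.cells: Dict[Key, List[float]] = {}

    def _key(self, position: Vec3) -> Key:
        # Quantize a continuous 3D position to an integer grid cell.
        return tuple(math.floor(c / self.cell_size) for c in position)

    def initialize(self, positions: Sequence[Vec3],
                   tokens: Sequence[Sequence[float]]) -> None:
        # Seed the memory with tokens for the initial static scene content.
        for position, token in zip(positions, tokens):
            self.cells[self._key(position)] = list(token)

    def read(self, position: Vec3) -> Optional[List[float]]:
        # Return the cached token for this cell, or None if it is unseen.
        return self.cells.get(self._key(position))

    def update(self, position: Vec3, token: Sequence[float]) -> None:
        # Blend a new token into an existing cell, or create the cell.
        key = self._key(position)
        old = self.cells.get(key)
        if old is None:
            self.cells[key] = list(token)
        else:
            m = self.momentum
            self.cells[key] = [m * o + (1.0 - m) * n for o, n in zip(old, token)]


memory = LatentSpatialMemory(cell_size=1.0, momentum=0.5)
memory.initialize([(0.2, 0.3, 0.1)], [[1.0, 0.0]])
print(memory.read((0.9, 0.9, 0.9)))   # same cell as the initialized point
memory.update((0.5, 0.5, 0.5), [0.0, 1.0])
print(memory.read((0.1, 0.1, 0.1)))   # blended token after the update
print(memory.read((1.5, 0.0, 0.0)))   # a cell that was never written
```

The point of the sketch is only that the cache persists across generation steps and is queried by 3D location, so content seen earlier can be read back without being rebuilt.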

Efficiency

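Mirage reduces repeated 3D cache rendering while preserving world modeling quality. The reported figures are 10.57× faster generation, 55× lower 3D cache memory, and a 70.36 WorldScore average.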

Qualitative Evaluation

Each row compares the same trajectory and conditioning across Mirage and four baselines.

BibTeX

@article{wang2026mirage,
  title   = {Latent Spatial Memory for Video World Models},
  author  = {Wang, Weijie and Zhao, Haoyu and Yang, Yifan and Chen, Feng and Zhang, Zeyu and He, Yefei and Duan, Zicheng and Chen, Donny Y. and Yang, Yuqing and Zhuang, Bohan},
  journal = {arXiv preprint arXiv:2606.09828},
  year    = {2026}
}