TRELLIS.2
NATIVE AND COMPACT STRUCTURED LATENTS FOR 3D GENERATION
An open-source 4B-parameter image-to-3D model producing up to 1536³ PBR textured assets, built on native 3D VAEs with 16× spatial compression, delivering efficient, scalable, high-fidelity asset generation.
Key Features
High Quality, Resolution & Efficiency

Our 4B-parameter model, built on vanilla DiTs, generates high-resolution, fully textured assets with exceptional fidelity and efficiency.

512³ resolution: 3s (2s + 1s)
1024³ resolution: 17s (10s + 7s)
1536³ resolution: 60s (35s + 25s)
* Total generation time (shape + material).   ** Tested on an NVIDIA H100 GPU.

At its core are the native and compact structured latents, which push the frontiers of fidelity and compactness simultaneously.

Fidelity Comparison: reconstruction accuracy vs. latent compactness

Arbitrary Topology Handling

Our method robustly handles complex structures, including open surfaces, non-manifold geometry, and enclosed interior structures, overcoming the constraints of iso-surface fields.

✔ Open Surfaces
✔ Non-manifold
✔ Internal Structures
* Meshes are cut open to show internal structures.
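To make "open surface" and "non-manifold" concrete, the standard edge-incidence test below counts how many triangles share each undirected edge: exactly one means a boundary edge (the surface is open there), more than two means a non-manifold edge. A signed inside/outside field generally cannot produce either case, since its level set must separate inside from outside. This is a generic mesh-analysis sketch, not part of TRELLIS.2; `classify_edges` is a hypothetical helper.

```python
from collections import Counter
from typing import Dict, List, Tuple

Edge = Tuple[int, int]
Face = Tuple[int, int, int]


def classify_edges(faces: List[Face]) -> Dict[str, List[Edge]]:
    """Group the undirected edges of a triangle mesh by how many faces share them.

    1 face   -> boundary edge (open surface)
    2 faces  -> ordinary manifold edge
    3+ faces -> non-manifold edge
    """
    counts: Counter = Counter()
    for a, b, c in faces:
        for u, v in ((a, b), (b, c), (c, a)):
            counts[(min(u, v), max(u, v))] += 1  # undirected edge key

    groups: Dict[str, List[Edge]] = {"boundary": [], "manifold": [], "non_manifold": []}
    for edge, n in sorted(counts.items()):
        if n == 1:
            groups["boundary"].append(edge)
        elif n == 2:
            groups["manifold"].append(edge)
        else:
            groups["non_manifold"].append(edge)
    return groups


# Three triangles hinged on edge (0, 1): an open, non-manifold "fin".
fin = [(0, 1, 2), (1, 0, 3), (0, 1, 4)]
print(classify_edges(fin)["non_manifold"])  # [(0, 1)]
```

A closed tetrahedron, by contrast, has all six edges shared by exactly two faces, which is the only case a watertight iso-surface yields.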
Rich Texture Modeling

Our method can model arbitrary surface attributes such as Base Color, Roughness, Metallic, and Opacity (i.e., Transparency or Alpha channel), enabling Physically Based Rendering (PBR) and photorealistic relighting.

Relighting
Minimalist 3D Asset Pre- and Post-processing

Data processing for both training and inference is simple, enabling instant conversions that are fully rendering-free and optimization-free.

Textured Mesh → O-Voxel: < 10s on a single CPU
O-Voxel → Textured Mesh: < 100ms with CUDA acceleration
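For intuition about why mesh-to-voxel conversion needs no rendering or optimization, the sketch below voxelizes a mesh by direct geometry processing: it samples points on each triangle, quantizes them to a sparse grid, and averages a per-vertex attribute per occupied voxel. It is a simplified stand-in, not the O-Voxel conversion: it uses random surface sampling instead of exact triangle-voxel intersection, carries one generic attribute (e.g., base color), and assumes the mesh is already normalized into [0, 1)³. The function and its parameters (`mesh_to_sparse_voxels`, `samples_per_face`) are hypothetical.

```python
import numpy as np


def mesh_to_sparse_voxels(vertices, faces, vertex_attrs, resolution, samples_per_face=256, seed=0):
    """Simplified mesh -> sparse voxel conversion (NOT the O-Voxel algorithm).

    vertices: (V, 3) floats, assumed already normalized into [0, 1)^3
    faces: (F, 3) vertex indices; vertex_attrs: (V, C) per-vertex attribute
    Returns (coords, attrs): (N, 3) occupied voxel indices, (N, C) averaged attributes.
    """
    rng = np.random.default_rng(seed)
    tri = vertices[faces]           # (F, 3, 3) corner positions
    tri_attr = vertex_attrs[faces]  # (F, 3, C) corner attributes

    # Uniform barycentric samples; (u, v) pairs outside the triangle are reflected back in.
    u = rng.random((len(faces), samples_per_face, 1))
    v = rng.random((len(faces), samples_per_face, 1))
    outside = (u + v) > 1.0
    u, v = np.where(outside, 1.0 - u, u), np.where(outside, 1.0 - v, v)
    w = 1.0 - u - v

    # Interpolate positions and attributes at every sample point.
    pts = w * tri[:, None, 0] + u * tri[:, None, 1] + v * tri[:, None, 2]
    vals = w * tri_attr[:, None, 0] + u * tri_attr[:, None, 1] + v * tri_attr[:, None, 2]
    pts = pts.reshape(-1, 3)
    vals = vals.reshape(-1, vals.shape[-1])

    # Quantize to integer voxel indices and keep only occupied voxels (sparse).
    idx = np.clip(np.floor(pts * resolution), 0, resolution - 1).astype(np.int64)
    coords, inverse = np.unique(idx, axis=0, return_inverse=True)
    inverse = inverse.reshape(-1)  # guard against NumPy-version shape differences

    # Average the attribute of all samples that landed in the same voxel.
    sums = np.zeros((len(coords), vals.shape[1]))
    np.add.at(sums, inverse, vals)
    counts = np.bincount(inverse, minlength=len(coords))[:, None]
    return coords, sums / counts


V = np.array([[0.1, 0.1, 0.6], [0.9, 0.1, 0.6], [0.1, 0.9, 0.6]])
F = np.array([[0, 1, 2]])
red = np.tile([1.0, 0.0, 0.0], (3, 1))
coords, colors = mesh_to_sparse_voxels(V, F, red, resolution=8)
```

Only occupied voxels are stored, so memory grows with surface area rather than with resolution³.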
Tech Innovations
Overview

TRELLIS.2's pipeline begins with an Instant Bidirectional Conversion that transforms meshes into our new representation termed O-Voxel. A Sparse Compression VAE then encodes these voxels into a compact Structured Latent space.

O-Voxel: Omni-Voxel Representation
O-Voxel is a novel "field-free" sparse voxel structure designed to encode both precise geometry and complex appearance simultaneously.
Geometry (f_shape): Utilizes a Flexible Dual Grids representation to handle arbitrary topologies while preserving sharp edges.
Appearance (f_mat): Supports full PBR attributes (Base Color, Metallic, Roughness, Alpha) to accurately model rich surface materials.
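A minimal container for per-voxel geometry and material features might look like the sketch below. The layout is an assumption for illustration only: the class name `OVoxelSketch` and its fields are hypothetical, f_shape is reduced to a single dual-vertex offset per voxel (as in generic dual-grid schemes), and the actual f_shape layout of Flexible Dual Grids is defined in the tech report, not here. f_mat simply concatenates the four PBR attributes listed above into six channels.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class OVoxelSketch:
    """Hypothetical sparse voxel container with geometry + PBR features."""

    resolution: int
    coords: np.ndarray       # (N, 3) integer indices of active voxels
    dual_vertex: np.ndarray  # (N, 3) dual-vertex offset inside each voxel, in [0, 1)
    base_color: np.ndarray   # (N, 3) RGB
    metallic: np.ndarray     # (N, 1)
    roughness: np.ndarray    # (N, 1)
    alpha: np.ndarray        # (N, 1) opacity

    def __post_init__(self) -> None:
        n = len(self.coords)
        if self.coords.shape != (n, 3):
            raise ValueError("coords must have shape (N, 3)")
        if np.any(self.coords < 0) or np.any(self.coords >= self.resolution):
            raise ValueError("coords must lie inside the grid")
        expected = {"dual_vertex": 3, "base_color": 3, "metallic": 1, "roughness": 1, "alpha": 1}
        for name, channels in expected.items():
            if getattr(self, name).shape != (n, channels):
                raise ValueError(f"{name} must have shape ({n}, {channels})")
        if np.any(self.dual_vertex < 0) or np.any(self.dual_vertex >= 1):
            raise ValueError("dual_vertex offsets must lie in [0, 1)")

    def shape_features(self) -> np.ndarray:
        """Stand-in for f_shape: here only the per-voxel dual-vertex offset."""
        return self.dual_vertex

    def mat_features(self) -> np.ndarray:
        """Stand-in for f_mat: base color, metallic, roughness, alpha -> (N, 6)."""
        return np.concatenate([self.base_color, self.metallic, self.roughness, self.alpha], axis=1)

    def dual_vertices_normalized(self) -> np.ndarray:
        """Dual-vertex positions in [0, 1)^3 object space."""
        return (self.coords + self.dual_vertex) / self.resolution


ov = OVoxelSketch(
    resolution=4,
    coords=np.array([[0, 0, 0], [3, 3, 3]]),
    dual_vertex=np.full((2, 3), 0.5),
    base_color=np.ones((2, 3)),
    metallic=np.zeros((2, 1)),
    roughness=np.full((2, 1), 0.5),
    alpha=np.ones((2, 1)),
)
print(ov.mat_features().shape)  # (2, 6)
```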
SC-VAE: Sparse Compression VAE
We introduce a Sparse Compression 3D VAE, employing a Sparse Residual Autoencoding scheme to directly compress voxel data.
16× downsampling
~9.6K latent tokens at 1024³ resolution
It encodes a fully textured 3D asset into a highly compact representation with negligible perceptual degradation, enabling efficient large-scale generative modeling.
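The two figures above can be related with simple arithmetic. Assuming the 16× factor applies per axis, a 1024³ grid maps to a 64³ latent grid with 262,144 cells, so ~9.6K tokens means only about 3.7% of latent cells are occupied: sparsity, not downsampling alone, keeps the sequence short. The sketch below shows only this sparse coordinate bookkeeping, assuming one token per 16³ block that contains at least one occupied voxel; it does not model the learned encoder or its Sparse Residual Autoencoding. `sparse_downsample` and `sphere_shell_voxels` are hypothetical helpers, and the sphere is a synthetic stand-in for a real asset.

```python
import numpy as np


def sparse_downsample(coords: np.ndarray, factor: int):
    """Map occupied fine voxels (N, 3) to occupied coarse cells.

    Returns (coarse, parent): coarse is (M, 3) unique cell indices (lexicographically
    sorted), and parent[i] is the row of `coarse` that fine voxel i falls into.
    """
    coarse, parent = np.unique(coords // factor, axis=0, return_inverse=True)
    return coarse, parent.reshape(-1)  # reshape guards against NumPy-version differences


def sphere_shell_voxels(resolution: int, radius: float = 0.4, n_points: int = 400_000) -> np.ndarray:
    """Synthetic asset: voxels hit by a Fibonacci-sampled sphere centred in [0, 1)^3."""
    i = np.arange(n_points) + 0.5
    phi = np.arccos(1.0 - 2.0 * i / n_points)
    theta = np.pi * (1.0 + 5.0 ** 0.5) * i
    pts = 0.5 + radius * np.stack(
        [np.cos(theta) * np.sin(phi), np.sin(theta) * np.sin(phi), np.cos(phi)], axis=1
    )
    return np.unique(np.floor(pts * resolution).astype(np.int64), axis=0)


resolution, factor = 1024, 16
latent_side = resolution // factor  # 64
fine = sphere_shell_voxels(resolution)
coarse, parent = sparse_downsample(fine, factor)
print(f"occupied fine voxels (sampled): {len(fine)}")
print(f"occupied {latent_side}^3 latent cells (tokens): {len(coarse)}")
print(f"occupancy of the dense latent grid: {len(coarse) / latent_side**3:.2%}")
```

Because each token only exists where the surface passes through, the token count scales roughly with surface area at the latent resolution rather than with the 64³ volume.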
Authors
Jianfeng Xiang
Tsinghua University, Microsoft Research
Xiaoxue Chen
Tsinghua University
Sicheng Xu
Microsoft Research
Ruicheng Wang
USTC, Microsoft Research
Zelong Lv
USTC, Microsoft Research
Yu Deng
Microsoft Research
Hongyuan Zhu
Microsoft AI
Yue Dong
Microsoft Research
Hao Zhao
Tsinghua University
Nicholas Jing Yuan
Microsoft AI
Jiaolong Yang
Microsoft Research
Citation
@article{xiang2025trellis2,
  title={Native and Compact Structured Latents for 3D Generation},
  author={Xiang, Jianfeng and Chen, Xiaoxue and Xu, Sicheng and Wang, Ruicheng and Lv, Zelong and Deng, Yu and Zhu, Hongyuan and Dong, Yue and Zhao, Hao and Yuan, Nicholas Jing and Yang, Jiaolong},
  journal={Tech report},
  year={2025}
}
Responsible AI Considerations

TRELLIS.2 is purely a research project. Responsible AI considerations were factored into all stages. The datasets used in this paper are public and have been reviewed to ensure there is no personally identifiable information or harmful content. However, as these datasets are sourced from the Internet, potential bias may still be present.

Material Disclaimer

The materials made available on this page are provided solely for academic and research purposes in connection with the exploration of 3D generation technologies, as described in our tech report. These materials are not intended for commercial exploitation or use. If you believe that any content on this page infringes upon your intellectual property rights, including but not limited to copyright, please notify us by submitting a takedown request via email to jiaoyan (at) microsoft.com.

© 2025 Microsoft