Photorealistic 3D Generation via Adversarial Distillation

Ziyu Wan1,2     Despoina Paschalidou2     Ian Huang2     Hongyu Liu3     Bokui Shen2     Xiaoyu Xiang    
Jing Liao1     Leonidas Guibas2    
City University of Hong Kong1           Stanford University2           HKUST3          


The increased demand for 3D data in AR/VR, robotics, and gaming applications has given rise to powerful generative pipelines capable of synthesizing high-quality 3D objects. Most of these models rely on the Score Distillation Sampling (SDS) algorithm to optimize a 3D representation such that the rendered image maintains a high likelihood as evaluated by a pre-trained diffusion model. However, finding a correct mode in the high-dimensional distribution produced by the diffusion model is challenging and often leads to issues such as over-saturation, over-smoothing, and Janus-like artifacts. In this paper, we propose a novel learning paradigm for 3D synthesis that utilizes pre-trained diffusion models. Instead of focusing on mode-seeking, our method directly models the distribution discrepancy between multi-view renderings and diffusion priors in an adversarial manner, which unlocks the generation of high-fidelity and photorealistic 3D content, conditioned on a single image and prompt. Moreover, by harnessing the latent space of GANs and expressive diffusion model priors, our method facilitates a wide variety of 3D applications, including single-view reconstruction, high-diversity generation, and continuous 3D interpolation in the open domain. Experiments demonstrate the superiority of our pipeline over previous works in terms of generation quality and diversity.
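For readers unfamiliar with the SDS procedure the abstract contrasts against, the following is a minimal NumPy sketch of one SDS-style update: noise the rendering to a timestep, ask a diffusion prior to predict that noise, and use the weighted residual as a gradient on the rendering. The `toy_denoiser` is a hypothetical stand-in for a real pre-trained diffusion model, and the "3D representation" is collapsed to a 2D array for illustration; a real pipeline would differentiate through a renderer.

```python
import numpy as np

rng = np.random.default_rng(0)

def toy_denoiser(x_t, t):
    # Hypothetical stand-in for a pre-trained diffusion model's noise
    # predictor (a real pipeline would query e.g. a text-to-image model).
    # This toy prior simply predicts noise proportional to the input,
    # which pulls samples toward zero; `t` is unused here.
    return 0.1 * x_t

def sds_gradient(render, t, alpha_bar):
    """One SDS-style gradient estimate for a rendered image.

    Noise the rendering to timestep t, predict the noise with the prior,
    and return w(t) * (eps_hat - eps). In a real implementation this
    gradient is back-propagated through the renderer to the 3D parameters.
    """
    eps = rng.standard_normal(render.shape)
    x_t = np.sqrt(alpha_bar) * render + np.sqrt(1.0 - alpha_bar) * eps
    eps_hat = toy_denoiser(x_t, t)
    w_t = 1.0 - alpha_bar  # a common weighting choice
    return w_t * (eps_hat - eps)

# Toy mode-seeking loop: treat the rendering itself as the optimized state.
render = rng.standard_normal((8, 8))
for step in range(100):
    render -= 0.05 * sds_gradient(render, t=500, alpha_bar=0.5)
```

The loop illustrates the mode-seeking nature of SDS that the paper moves away from: each step follows a single noisy score estimate, whereas the adversarial formulation instead matches the distribution of multi-view renderings against the diffusion prior.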




CAD generates high-quality, photorealistic 3D content from a single image only.

3D Interpolation

Our method supports continuous 3D interpolation. Left: fixed viewpoint. Right: 360° viewpoint.

Diverse generation

By modeling the entire distribution rather than a single mode, our method can generate diverse 3D assets in one second without re-optimization.


We appreciate helpful discussions with Guandao Yang, Boxiao Pan, Zifan Shi, Chao Xu, Xuan Wang, Ivan Skorokhodov, Jingbo Zhang, Xingguang Yan and Connor Zhizhen Lin. The work described in this paper was substantially supported by a GRF grant from the Research Grants Council (RGC) of the Hong Kong Special Administrative Region, China [Project No. CityU11208123]. Despoina Paschalidou is supported by the Swiss National Science Foundation under grant number P500PT 206946.

The website template was borrowed from Michaël Gharbi.