Neural 3D Mesh Renderer

Hiroharu Kato    Yoshitaka Ushiku    Tatsuya Harada†‡
The University of Tokyo    RIKEN

3D Mesh Reconstruction

2D-to-3D Style Transfer

3D DeepDream

These applications are realized by redefining the “backward pass” of a 3D mesh renderer and incorporating it into neural networks.

Short introduction

We propose Neural Renderer, a 3D mesh renderer that can be integrated into neural networks.

We applied this renderer to (a) 3D mesh reconstruction from a single image and (b) 2D-to-3D image style transfer and 3D DeepDream.


For modeling the 3D world behind 2D images, which 3D representation is most appropriate? A polygon mesh is a promising candidate for its compactness and geometric properties. However, it is not straightforward to model a polygon mesh from 2D images using neural networks because the conversion from a mesh to an image, or rendering, involves a discrete operation called rasterization, which prevents back-propagation. Therefore, in this work, we propose an approximate gradient for rasterization that enables the integration of rendering into neural networks. Using this renderer, we perform single-image 3D mesh reconstruction with silhouette image supervision and our system outperforms the existing voxel-based approach. Additionally, we perform gradient-based 3D mesh editing operations, such as 2D-to-3D style transfer and 3D DeepDream, with 2D supervision for the first time. These applications demonstrate the potential of the integration of a mesh renderer into neural networks and the effectiveness of our proposed renderer.


Full paper is available at


Single-image 3D reconstruction

A 3D mesh can be correctly reconstructed from a single image using our method.

Comparison with voxel-based method [1]

Mesh reconstruction does not suffer from the low resolution and cubic artifacts that affect voxel reconstruction.

Our approach outperforms the voxel-based approach [1] in 10 out of 13 categories on the voxel IoU metric.

Method               Per-category voxel IoU (13 categories)                                                     Mean
Retrieval-based [1]  .5564  .4875  .5713  .6519  .3512  .3958  .2905  .4600  .5133  .5314  .3097  .6696  .4078  .4766
Voxel-based [1]      .5556  .4924  .6823  .7123  .4494  .5395  .4223  .5868  .5987  .6221  .4938  .7504  .5507  .5736
Mesh-based (ours)    .6172  .4998  .7143  .7095  .4990  .5831  .4126  .6536  .6322  .6735  .4829  .7777  .5645  .6016
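For reference, voxel IoU compares two occupancy grids by dividing the number of voxels occupied in both grids by the number occupied in either. The sketch below is a minimal illustration on flattened toy grids. The function name `voxel_iou` and the example data are assumptions made for illustration; this is not the evaluation code behind the table.

```python
def voxel_iou(pred, target):
    """Intersection over union of two occupancy grids.

    Both grids are flat sequences of 0/1 values of equal length
    (a hypothetical layout chosen to keep the example short).
    """
    if len(pred) != len(target):
        raise ValueError("grids must contain the same number of voxels")
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    union = sum(1 for p, t in zip(pred, target) if p or t)
    # Two empty grids are treated as a perfect match.
    return intersection / union if union else 1.0


# Toy 2x2x2 grids, flattened: 2 voxels overlap, 4 are occupied in total.
pred = [1, 1, 0, 0, 1, 0, 0, 0]
target = [1, 0, 0, 0, 1, 1, 0, 0]
iou = voxel_iou(pred, target)
print(iou)
```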

2D-to-3D style transfer

The styles of the paintings are accurately transferred to the textures and shapes by our method. Note in particular the outline of the bunny and the lid of the teapot.

The style images are Thomson No. 5 (Yellow Sunset) (D. Coupland, 2011), The Tower of Babel (P. Bruegel the Elder, 1563), The Scream (E. Munch, 1910), and Portrait of Pablo Picasso (J. Gris, 1912).

3D DeepDream

This is a 3D version of DeepDream.

Technical overview

Understanding the 3D world from 2D images is one of the fundamental problems in computer vision. Rendering (3D-to-2D conversion) lies on the boundary between the 3D world and 2D images, and a polygon mesh is an efficient, rich, and intuitive 3D representation. Therefore, the “backward pass” of a 3D mesh renderer is worth pursuing.

Rendering cannot be integrated into neural networks without modification because the renderer blocks back-propagation. In this work, we propose an approximate gradient for rendering, which enables end-to-end training of neural networks that include rendering. Please read the paper for the details of our renderer.
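To see why a standard renderer blocks back-propagation, consider a toy 1D rasterizer in which a pixel is lit when its center lies to the left of an edge. The pixel value is a step function of the edge position, so its derivative is zero almost everywhere. The sketch below contrasts that zero derivative with one simple surrogate: the slope of the line joining the current pixel value to the value it would take once the edge crosses the pixel center. All function names and the 1D setup are assumptions made for illustration. It is only in the spirit of an approximate gradient and is not the formulation used in the paper.

```python
def rasterize_1d(edge, pixel_centers):
    """Hard rasterization: a pixel is 1.0 if its center lies left of the edge."""
    return [1.0 if c < edge else 0.0 for c in pixel_centers]


def finite_difference(edge, pixel_centers, index, eps=1e-4):
    """Numerical derivative of one pixel value with respect to the edge position."""
    hi = rasterize_1d(edge + eps, pixel_centers)[index]
    lo = rasterize_1d(edge - eps, pixel_centers)[index]
    return (hi - lo) / (2 * eps)


def approx_gradient(edge, pixel_centers, index):
    """Illustrative surrogate gradient (an assumption, not the paper's definition).

    Uses the slope between the current pixel value and the value the pixel
    would take if the edge moved far enough to cross the pixel center.
    """
    center = pixel_centers[index]
    current = rasterize_1d(edge, pixel_centers)[index]
    flipped = 1.0 - current
    distance = center - edge  # signed move of the edge that reaches the pixel center
    if distance == 0.0:
        return 0.0  # edge sits exactly on the pixel center; slope is undefined
    return (flipped - current) / distance


centers = [0.5, 1.5, 2.5, 3.5]
edge = 2.3
true_grad = finite_difference(edge, centers, 2)   # step function: no signal
surrogate = approx_gradient(edge, centers, 2)     # non-zero, points toward the pixel
print(true_grad, surrogate)
```

In this toy setup the surrogate is positive for every pixel, which matches the fact that moving the edge to the right can only turn pixels on.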

The applications demonstrated above were performed using this renderer. The figure below shows the pipelines.

The 3D mesh generator was trained with silhouette images. During training, the generator is optimized to minimize the difference between the silhouettes of the reconstructed 3D shapes and the true silhouettes.
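This page does not say how the silhouette difference is measured. As one possible choice, the sketch below uses the sum of squared per-pixel differences between a rendered silhouette and a ground-truth silhouette. The function name `silhouette_loss` and the nested-list image layout are assumptions made for illustration; see the paper for the loss actually used.

```python
def silhouette_loss(pred, target):
    """Sum of squared per-pixel differences between two silhouettes.

    Each silhouette is a list of rows holding values in [0, 1]
    (a hypothetical layout chosen to keep the example short).
    """
    return sum(
        (p - t) ** 2
        for pred_row, target_row in zip(pred, target)
        for p, t in zip(pred_row, target_row)
    )


rendered = [[0.0, 1.0],
            [1.0, 1.0]]
ground_truth = [[0.0, 1.0],
                [0.0, 1.0]]
loss = silhouette_loss(rendered, ground_truth)  # one pixel disagrees
print(loss)
```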

2D-to-3D style transfer was performed by optimizing the shape and texture of a mesh to minimize a style loss defined on the images. 3D DeepDream was performed in a similar way.
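The style loss is not defined on this page. A common formulation in image style transfer compares Gram matrices of feature maps, and the sketch below assumes that formulation. The feature values, the function names, and the normalization by the number of positions are illustrative assumptions, not the paper's definition.

```python
def gram_matrix(features):
    """Gram matrix of a feature map given as a list of channels.

    Each channel is a flat list of activations; entry (i, j) is the mean
    product of channels i and j over all spatial positions.
    """
    n = len(features[0])
    return [
        [sum(a * b for a, b in zip(fi, fj)) / n for fj in features]
        for fi in features
    ]


def style_loss(feat_rendered, feat_style):
    """Sum of squared differences between the two Gram matrices."""
    g_rendered = gram_matrix(feat_rendered)
    g_style = gram_matrix(feat_style)
    return sum(
        (x - y) ** 2
        for row_r, row_s in zip(g_rendered, g_style)
        for x, y in zip(row_r, row_s)
    )


# Toy feature maps: 2 channels, 2 spatial positions each.
feat_rendered = [[1.0, 0.0], [0.0, 1.0]]
feat_style = [[1.0, 1.0], [0.0, 0.0]]
loss_value = style_loss(feat_rendered, feat_style)
print(loss_value)
```

In a full pipeline the features would come from a pretrained image network applied to the rendered image, and the loss would be back-propagated through the renderer to the mesh vertices and textures.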

Both applications work by propagating information from 2D image space into 3D space through our renderer.

More details can be found in the paper.



  title={Neural 3D Mesh Renderer},
  author={Kato, Hiroharu and Ushiku, Yoshitaka and Harada, Tatsuya},


  1. X. Yan et al. “Perspective Transformer Nets: Learning Single-view 3D Object Reconstruction without 3D Supervision.” Advances in Neural Information Processing Systems (NIPS). 2016.