Hiroharu Kato¹, Yoshitaka Ushiku¹, Tatsuya Harada¹,²
¹The University of Tokyo, ²RIKEN
CVPR 2018 (spotlight)
3D Mesh Reconstruction
2D-to-3D Style Transfer
These applications are realized by redefining the “backward pass” of a 3D mesh renderer and incorporating it into neural networks.
We propose Neural Renderer, a 3D mesh renderer that can be integrated into neural networks.
We applied this renderer to (a) 3D mesh reconstruction from a single image and (b) 2D-to-3D image style transfer and 3D DeepDream.
For modeling the 3D world behind 2D images, which 3D representation is most appropriate? A polygon mesh is a promising candidate for its compactness and geometric properties. However, it is not straightforward to model a polygon mesh from 2D images using neural networks because the conversion from a mesh to an image, or rendering, involves a discrete operation called rasterization, which prevents back-propagation. Therefore, in this work, we propose an approximate gradient for rasterization that enables the integration of rendering into neural networks. Using this renderer, we perform single-image 3D mesh reconstruction with silhouette image supervision and our system outperforms the existing voxel-based approach. Additionally, we perform gradient-based 3D mesh editing operations, such as 2D-to-3D style transfer and 3D DeepDream, with 2D supervision for the first time. These applications demonstrate the potential of the integration of a mesh renderer into neural networks and the effectiveness of our proposed renderer.
Understanding the 3D world from 2D images is one of the fundamental problems in computer vision. Rendering (3D-to-2D conversion) lies on the boundary between the 3D world and 2D images, and a polygon mesh is an efficient, rich and intuitive 3D representation. Therefore, the “backward pass” of a 3D mesh renderer is worth pursuing.
A standard renderer cannot be integrated into neural networks without modification, because rasterization is a discrete operation and blocks back-propagation. In this work, we propose an approximate gradient for rendering, which enables end-to-end training of neural networks that include a rendering step. Please read the paper for the details of our renderer.
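To make the problem concrete, the following is a minimal one-pixel sketch, not the renderer described in the paper. It assumes a toy setup in which a single pixel at `pixel_x` is lit when a face edge at `edge_x` has passed it. The names `hard_pixel`, `finite_diff` and `ramp_gradient` are hypothetical. The hard rule has zero slope almost everywhere, so no learning signal reaches `edge_x`; the surrogate instead uses the slope of a straight line between the current state and the state where the pixel would flip.

```python
def hard_pixel(edge_x, pixel_x=0.5):
    """Toy rasterization of one pixel: lit (1.0) once the edge reaches the pixel centre."""
    return 1.0 if edge_x >= pixel_x else 0.0


def finite_diff(f, x, eps=1e-4):
    """Central finite difference, standing in for the true derivative of f at x."""
    return (f(x + eps) - f(x - eps)) / (2 * eps)


def ramp_gradient(edge_x, pixel_x=0.5):
    """Illustrative surrogate gradient (an assumption, see the paper for the real one).

    Uses the slope of the line joining the current intensity and the intensity
    the pixel would have once the edge crosses the pixel centre.
    """
    if edge_x == pixel_x:
        return 0.0  # the flip point itself is left undefined in this toy
    current = hard_pixel(edge_x, pixel_x)
    flipped = 1.0 - current
    return (flipped - current) / (pixel_x - edge_x)


true_slope = finite_diff(hard_pixel, 0.2)   # zero: the hard rule is flat here
surrogate_slope = ramp_gradient(0.2)        # positive: moving the edge right lights the pixel
```

With the surrogate, an optimizer receives a nonzero signal telling it in which direction to move the edge, and that signal grows as the edge approaches the pixel.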
The applications demonstrated above were performed using this renderer. The figure below shows the pipelines.
The 3D mesh generator was trained with silhouette images. During training, the generator minimizes the difference between the silhouettes of the reconstructed 3D shape and the ground-truth silhouettes.
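This page does not state how the difference between silhouettes is measured. As one plausible choice, the sketch below assumes a soft intersection-over-union (IoU) loss on silhouettes stored as nested lists of values in [0, 1]. The function name `silhouette_iou_loss` is hypothetical.

```python
def silhouette_iou_loss(pred, target):
    """Return 1 - soft IoU between two silhouettes of equal size.

    pred and target are nested lists (rows of pixels) with values in [0, 1].
    The IoU measure is an assumption made for this sketch.
    """
    intersection = 0.0
    union = 0.0
    for row_pred, row_target in zip(pred, target):
        for p, t in zip(row_pred, row_target):
            intersection += p * t
            union += p + t - p * t
    if union == 0.0:
        return 0.0  # both silhouettes are empty, so they agree
    return 1.0 - intersection / union


true_silhouette = [[0.0, 1.0],
                   [1.0, 1.0]]
rendered_silhouette = [[1.0, 1.0],
                       [0.0, 0.0]]
loss = silhouette_iou_loss(rendered_silhouette, true_silhouette)
```

The loss is 0 when the rendered silhouette matches the true one and approaches 1 as their overlap shrinks. In a training loop the rendered silhouette would come from the differentiable renderer, so the loss can be back-propagated to the mesh generator.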
2D-to-3D style transfer was performed by optimizing the shape and texture of a mesh to minimize a style loss defined on the images. 3D DeepDream was performed in a similar way.
Both applications were realized by propagating information from 2D image space into 3D space through our renderer.
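The exact form of the style loss is not given on this page. A common formulation in image style transfer compares Gram matrices of feature maps, and the sketch below assumes that form. The names `gram_matrix` and `style_loss` are hypothetical, and the small hand-written lists stand in for features that would normally come from a pretrained network applied to the rendered image and the style image.

```python
def gram_matrix(features):
    """Gram matrix of a feature map.

    features is a list of C channels, each a flat list of N activations.
    Returns a C x C nested list of channel correlations, normalized by N.
    """
    n = len(features[0])
    return [
        [sum(a * b for a, b in zip(ch_i, ch_j)) / n for ch_j in features]
        for ch_i in features
    ]


def style_loss(rendered_feats, style_feats):
    """Mean squared difference between the two Gram matrices (assumed form)."""
    gram_rendered = gram_matrix(rendered_feats)
    gram_style = gram_matrix(style_feats)
    c = len(gram_rendered)
    total = sum(
        (gram_rendered[i][j] - gram_style[i][j]) ** 2
        for i in range(c)
        for j in range(c)
    )
    return total / (c * c)


# Placeholder features: 2 channels with 2 activations each.
rendered_feats = [[1.0, 0.0], [0.0, 1.0]]
style_feats = [[1.0, 1.0], [1.0, 1.0]]
loss = style_loss(rendered_feats, style_feats)
```

Because the rendered image depends on the mesh through the renderer, a loss of this kind can in principle be minimized with respect to the vertices and texture rather than the pixels.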