UV-based reconstruction of 3D garments from a single RGB image

Research project carried out at the Human Pose Recovery and Behavior Analysis (HuPBA) group of the Computer Vision Center (CVC) in Barcelona. The paper was accepted to the IEEE International Conference on Automatic Face and Gesture Recognition (FG 2021), and the poster received a Google Best Poster Award.

Introduction

Inferring 3D shape from a single viewpoint is an essential capability of human vision that remains extremely difficult for computer vision systems. Despite the advances in 3D human reconstruction, most research has concentrated on unclothed bodies and faces, while modelling and recovering garments remains notoriously difficult. We favour UV maps over other 3D surface representations, such as meshes, point clouds or voxels, which are the ones commonly used in 3D deep learning models. In this paper we adapt the LGGAN [1] architecture to predict the garment UV map from the input image. We also introduce 3D loss functions to improve the surface quality.

UV Maps

A novelty of the model is its use of UV maps to represent 3D data. Garments are registered on top of the SMPL [2] mesh so that the topology is homogeneous at both training and inference time. Because UV coordinates are discrete points, UV maps contain empty gaps between vertices; we fill these gaps with image inpainting techniques. We use displacement UV maps, which store each garment vertex as an offset from the corresponding estimated SMPL body vertex.
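
To make the representation concrete, here is a minimal sketch of how garment vertices can be recovered from a displacement UV map: sample the (inpainted) map at each vertex's fixed UV coordinate and add the offset to the corresponding SMPL body vertex. The function and variable names are illustrative, not taken from the released code.

```python
import numpy as np

def uv_map_to_vertices(disp_uv_map, uv_coords, smpl_vertices):
    """Recover garment vertices from a displacement UV map.

    disp_uv_map   : (H, W, 3) array of per-texel XYZ offsets (inpainted).
    uv_coords     : (V, 2) UV coordinates in [0, 1] for each garment vertex,
                    fixed because garments are registered to the SMPL topology.
    smpl_vertices : (V, 3) estimated SMPL body vertices.
    """
    h, w, _ = disp_uv_map.shape
    # UV coordinates are continuous; round to the nearest texel
    # (bilinear sampling would be a smoother alternative).
    cols = np.clip(np.round(uv_coords[:, 0] * (w - 1)).astype(int), 0, w - 1)
    rows = np.clip(np.round((1.0 - uv_coords[:, 1]) * (h - 1)).astype(int), 0, h - 1)
    offsets = disp_uv_map[rows, cols]   # (V, 3) per-vertex displacements
    return smpl_vertices + offsets      # garment vertices in body space
```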

RGB to UV Map Translation - LGGAN Architecture

The model is based on the LGGAN architecture.
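
A rough sketch of this design follows, with illustrative layer sizes and module names rather than the paper's exact architecture: a global, whole-image branch and a class-specific local branch (the `LocalClassGenerator` sketched in the next section) are fused by a predicted pixel-wise weight map.

```python
import torch
import torch.nn as nn

class GlobalLocalGenerator(nn.Module):
    """Minimal LGGAN-style generator sketch: a shared encoder feeds a global
    branch and a class-specific local branch, and a predicted pixel-wise
    weight map fuses their two UV-map predictions."""

    def __init__(self, num_classes, feat_ch=64):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, feat_ch, 3, padding=1), nn.ReLU(inplace=True))
        self.global_branch = nn.Conv2d(feat_ch, 3, 3, padding=1)
        self.local_branch = LocalClassGenerator(num_classes, feat_ch)
        self.fusion_head = nn.Conv2d(feat_ch, 2, 1)  # 2 fusion weights per pixel

    def forward(self, rgb, semantic_masks):
        feats = self.encoder(rgb)
        uv_global = self.global_branch(feats)                # (B, 3, H, W)
        uv_local = self.local_branch(feats, semantic_masks)  # (B, 3, H, W)
        w = torch.softmax(self.fusion_head(feats), dim=1)    # (B, 2, H, W)
        return w[:, :1] * uv_global + w[:, 1:] * uv_local
```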

Class-Specific Local Generation Network

The model also has a novel local class-specific generation network that constructs a separate generator for each semantic class.
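
A minimal sketch of this idea, under the assumption that each class generator runs on the shared features masked by that class's segmentation map (layer sizes are placeholders):

```python
import torch
import torch.nn as nn

class LocalClassGenerator(nn.Module):
    """Sketch of the class-specific local branch: one small generator per
    semantic class, each run on the shared features masked by that class's
    segmentation map, with the per-class outputs summed."""

    def __init__(self, num_classes, feat_ch=64):
        super().__init__()
        self.generators = nn.ModuleList(
            nn.Conv2d(feat_ch, 3, 3, padding=1) for _ in range(num_classes))

    def forward(self, feats, semantic_masks):
        # semantic_masks: (B, num_classes, H, W), one-hot per pixel
        out = torch.zeros_like(feats[:, :3])
        for c, gen in enumerate(self.generators):
            mask = semantic_masks[:, c:c + 1]     # (B, 1, H, W)
            out = out + gen(feats * mask) * mask  # class-filtered generation
        return out
```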

Generator Loss
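
The exact objective is given in the paper and is not reproduced here. As a hedged sketch of the ingredients mentioned in the introduction (an adversarial image-to-image translation objective on the UV map, plus 3D losses on the reconstructed surface to improve its quality), a generator loss might combine the following terms; all names and weights are illustrative placeholders.

```python
import torch
import torch.nn.functional as F

def generator_loss(uv_pred, uv_gt, d_fake_logits, verts_pred, verts_gt,
                   lambda_l1=10.0, lambda_3d=1.0):
    """Illustrative composite generator loss (weights are placeholders).

    - adversarial term: non-saturating GAN loss on the discriminator logits
    - L1 term: pixel-wise error on the predicted UV map
    - 3D term: per-vertex error on the surface reconstructed from the UV map
    """
    adv = F.binary_cross_entropy_with_logits(
        d_fake_logits, torch.ones_like(d_fake_logits))
    l1 = F.l1_loss(uv_pred, uv_gt)
    loss_3d = (verts_pred - verts_gt).norm(dim=-1).mean()
    return adv + lambda_l1 * l1 + lambda_3d * loss_3d
```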

CLOTH3D++ Dataset [3]

Evaluation Metric
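
The metric is not spelled out in this summary. Since predictions and ground truth share the registered SMPL topology, a natural choice (assumed here for illustration; not necessarily the paper's exact metric) is the mean per-vertex Euclidean distance:

```python
import numpy as np

def mean_vertex_error(verts_pred, verts_gt):
    """Mean per-vertex Euclidean distance between predicted and ground-truth
    garment surfaces, valid because both share the registered topology."""
    return np.linalg.norm(verts_pred - verts_gt, axis=-1).mean()
```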

Results


Comparison with SMPLicit [4]


References

[1] H. Tang, D. Xu, Y. Yan, P. H. S. Torr, and N. Sebe. Local class-specific and global image-level generative adversarial networks for semantic-guided scene generation. In CVPR, 2020.

[2] M. Loper, N. Mahmood, J. Romero, G. Pons-Moll, and M. J. Black. SMPL: A skinned multi-person linear model. ACM Trans. Graphics (Proc. SIGGRAPH Asia), 34(6):248:1–248:16, Oct. 2015.

[3] M. Madadi, H. Bertiche, W. Bouzouita, I. Guyon, and S. Escalera. Learning cloth dynamics: 3d + texture garment reconstruction benchmark. In Proceedings of the NeurIPS 2020 Competition and Demonstration Track, PMLR, volume 133, pages 57–76, 2021.

[4] E. Corona, A. Pumarola, G. Alenyà, G. Pons-Moll, and F. Moreno-Noguer. SMPLicit: Topology-aware generative model for clothed people. In CVPR, 2021.