TGAvatar: Reconstructing 3D Gaussian Avatars with Transformer-based Tri-plane

Ruigang Hu Xuekuan Wang Yichao Yan Cairong Zhao

Visual and Intelligent Learning Lab, Tongji University

Abstract

We introduce TGAvatar, a novel framework for 3D head animation and reconstruction built on 3D Gaussian Splatting (3DGS). TGAvatar improves rendering quality by exploiting the properties of 3DGS to produce detailed and realistic representations of human head geometry and texture. We apply linear blending within 3DGS to mimic 3D Morphable Model (3DMM) coefficients, enabling precise and dynamic modeling of facial features and expressions. In addition, a transformer-based tri-plane module is incorporated to infer spherical harmonics and alpha parameters. This integration is pivotal for the method, as it allows us to efficiently and precisely represent the visual characteristics of the Gaussians, tailored to the intricate details of the head's components. Our extensive evaluations show that TGAvatar not only improves the fidelity and realism of 3D head reconstructions but also surpasses existing methods in rendering quality and computational efficiency.

Pipeline

The TGAvatar pipeline begins with the random initialization of a set of Gaussians with pose, rotation, and scale bases (\(P, Q, S\)) and bias terms (\(p_0, q_0, s_0\)). In addition, a transformer-based tri-plane module is employed to ensure high-fidelity novel view synthesis. Specifically, we first use a transformer-based tri-plane decoder to predict tri-plane features. Subsequently, a tri-plane module extracts hybrid features based on the pose of each Gaussian. Finally, these hybrid features are fed into an MLP network to infer the opacity (\(\alpha\)) and spherical harmonics (SH) coefficients of each Gaussian.
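The two steps above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: it assumes that "pose" means the 3D position of each Gaussian, that the bases are combined with the expression coefficients by a plain weighted sum added to the bias terms, and that the hybrid feature is the concatenation of bilinearly sampled features from three axis-aligned planes. The function names (`blend_gaussians`, `bilinear_sample`, `sample_triplane`) and all tensor shapes are hypothetical.

```python
import numpy as np


def blend_gaussians(coeffs, P, Q, S, p0, q0, s0):
    """Linearly blend per-Gaussian attributes (assumed form: bias + sum_k c_k * basis_k).

    coeffs: (K,) expression coefficients
    P, S:   (K, N, 3) position and scale bases; Q: (K, N, 4) rotation basis
    p0, s0: (N, 3) bias terms; q0: (N, 4) bias quaternion
    """
    pos = p0 + np.tensordot(coeffs, P, axes=1)    # (N, 3)
    rot = q0 + np.tensordot(coeffs, Q, axes=1)    # (N, 4)
    # Assumption: re-normalize so each rotation stays a unit quaternion.
    rot = rot / np.linalg.norm(rot, axis=-1, keepdims=True)
    scale = s0 + np.tensordot(coeffs, S, axes=1)  # (N, 3)
    return pos, rot, scale


def bilinear_sample(plane, uv):
    """Bilinearly sample a (C, H, W) feature plane at (N, 2) coords in [-1, 1]."""
    _, H, W = plane.shape
    uv = np.clip(uv, -1.0, 1.0)
    x = (uv[:, 0] + 1.0) * 0.5 * (W - 1)
    y = (uv[:, 1] + 1.0) * 0.5 * (H - 1)
    x0 = np.clip(np.floor(x).astype(int), 0, W - 2)
    y0 = np.clip(np.floor(y).astype(int), 0, H - 2)
    wx, wy = x - x0, y - y0                        # interpolation weights in [0, 1]
    f00, f01 = plane[:, y0, x0], plane[:, y0, x0 + 1]          # each (C, N)
    f10, f11 = plane[:, y0 + 1, x0], plane[:, y0 + 1, x0 + 1]
    out = (f00 * (1 - wx) * (1 - wy) + f01 * wx * (1 - wy)
           + f10 * (1 - wx) * wy + f11 * wx * wy)
    return out.T                                   # (N, C)


def sample_triplane(planes, pos):
    """Project positions onto the XY, XZ and YZ planes and concatenate the features.

    planes: (3, C, H, W); pos: (N, 3) in [-1, 1]. Returns (N, 3 * C).
    Concatenation is an assumption; the source does not say how features are fused.
    """
    xy = bilinear_sample(planes[0], pos[:, [0, 1]])
    xz = bilinear_sample(planes[1], pos[:, [0, 2]])
    yz = bilinear_sample(planes[2], pos[:, [1, 2]])
    return np.concatenate([xy, xz, yz], axis=-1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    K, N, C = 4, 5, 2                              # toy sizes, chosen arbitrarily
    P = 0.01 * rng.normal(size=(K, N, 3))
    Q = 0.01 * rng.normal(size=(K, N, 4))
    S = 0.01 * rng.normal(size=(K, N, 3))
    p0 = rng.uniform(-0.5, 0.5, size=(N, 3))
    q0 = np.tile([1.0, 0.0, 0.0, 0.0], (N, 1))
    s0 = np.ones((N, 3))
    pos, rot, scale = blend_gaussians(rng.normal(size=K), P, Q, S, p0, q0, s0)
    feats = sample_triplane(rng.normal(size=(3, C, 8, 8)), pos)
    print(pos.shape, rot.shape, scale.shape, feats.shape)
```

In the pipeline described above, the tri-plane features would come from the transformer-based decoder rather than random noise, and the sampled features would then be passed to the MLP that predicts \(\alpha\) and the SH coefficients.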

Comparisons

Qualitative comparisons between our TGAvatar and INSTA, FlashAvatar, and GaussianBlendshapes. All methods are run under the configurations specified in their respective works. For the INSTA dataset, INSTA and GaussianBlendshapes provide pretrained models, so their results are produced with those pretrained models. TGAvatar achieves better results, particularly in capturing details such as teeth, eyes, wrinkles, and reflections.

TGAvatar reconstructs a 3D facial avatar from a monocular portrait video of a person. By combining 3D Gaussian Splatting with 3DMM feature blending and a transformer-based tri-plane module, TGAvatar can generate lifelike novel views and expressions of the digital avatar.
