CHA: Physics-Based Dynamics Modeling for Clothing-Editable Human Avatars

IEEE Transactions on Multimedia (TMM) 2026

Yong Deng, Baoxing Li, Xu Zhao

{dengyong2016, lbx11sjtu, zhaoxu}@sjtu.edu.cn

[Figure: graphical abstract]

Abstract

This paper proposes a novel framework for generating human avatars from monocular videos. Existing methods struggle to balance dynamic effects and modeling efficiency due to the need for per-subject retraining. To address this, we introduce CHA, a physics-based approach for Clothing-editable Human Avatars, which decouples reconstruction and driving, thereby eliminating the need for per-subject fitting, and utilizes physics-based cloth dynamics modeling to generate realistic clothing dynamics. Our framework comprises three core components: First, an avatar reconstruction module employs an attention mechanism to fuse multi-frame normal information, ensuring complete human surface recovery. Second, a cloth recovery module exploits the static equilibrium state to restore standard clothing templates from observed garments, facilitating downstream dynamics modeling. Third, we propose a new physics-based cloth dynamics module to generate realistic cloth-aware avatar animations. The module generalizes well across different clothing sizes and styles by employing dual-layer graph networks with local predictions and a dynamic KNN-based graph construction that is not limited by mesh topology. Furthermore, we employ a multi-frame joint semantic segmentation algorithm to separate clothing templates, making our method applicable to clothing editing applications, such as virtual try-on. Extensive quantitative and qualitative experiments demonstrate that our avatars exhibit enhanced physical accuracy in clothing dynamics, well-preserved individual body characteristics, and distinct clothing layers.
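The dynamic KNN-based graph construction mentioned above can be pictured with a small sketch. It is not the authors' implementation: the function name `knn_edges`, the choice of `k`, and the toy vertex positions are all assumptions made for illustration. It only shows why a neighbour graph built from current vertex positions does not depend on mesh topology, and why it changes when vertices move.

```python
import math

def knn_edges(points, k):
    """Return directed edges (i, j) linking each point to its k nearest neighbours.

    Hypothetical helper: edges come purely from positions, not from mesh faces.
    """
    n = len(points)
    if not 0 < k < n:
        raise ValueError("k must satisfy 0 < k < number of points")
    edges = []
    for i, p in enumerate(points):
        # Rank all other vertices by Euclidean distance; ties break on index.
        others = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: (math.dist(p, points[j]), j),
        )
        edges.extend((i, j) for j in others[:k])
    return edges

# Toy data (assumed): four garment vertices on a line.
rest = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (5.0, 0.0, 0.0)]
# Vertex 3 moves close to vertex 0, e.g. a sleeve folding onto the torso.
moved = rest[:3] + [(0.5, 0.0, 0.0)]

edges_rest = knn_edges(rest, k=1)
edges_moved = knn_edges(moved, k=1)
```

Rebuilding the edges at every step lets vertices that come close during motion exchange information even if they are far apart on the mesh surface. A practical version would likely use a spatial index rather than this quadratic loop.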

Overview

[Figure: pipeline overview]

CHA creates drivable, clothing-editable, and dynamically realistic 3D human avatars from monocular video. The overall pipeline consists of four main modules:

Avatar Reconstruction Module: Reconstructs the human avatar in canonical space by sampling complementary frames from monocular video, estimating normal maps and SMPL-X parameters, and fusing multi-frame normal features with an attention mechanism.
Cloth Separation Module: Uses semantic segmentation to obtain garment regions and separate clothing templates from the reconstructed human model, enabling independent cloth editing and dynamics modeling.
Cloth Recovery Module: Optimizes garment templates according to physical properties in a static equilibrium state, producing standardized templates for subsequent driving.
Cloth Dynamics Modeling Module: Predicts dynamic garment deformations with a physics-based dual-layer graph network. The coarse-level graph propagates global garment deformations, while the fine-level graph captures local high-frequency details.
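As a rough illustration of the multi-frame fusion step in the Avatar Reconstruction Module, the sketch below applies generic scaled dot-product attention to per-frame feature vectors. The paper's actual attention design is not described on this page, so everything here is an assumption: the names `fuse_frames` and `softmax`, the use of one query vector per surface point, and the reuse of frame features as both keys and values.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def fuse_frames(query, frame_features):
    """Fuse per-frame feature vectors with scaled dot-product attention.

    Hypothetical helper: frame features act as both keys and values.
    """
    d = len(query)
    # One similarity score per frame, scaled by sqrt(d).
    scores = [
        sum(q * f for q, f in zip(query, feat)) / math.sqrt(d)
        for feat in frame_features
    ]
    weights = softmax(scores)
    # Weighted sum of the frame features, channel by channel.
    fused = [
        sum(w * feat[c] for w, feat in zip(weights, frame_features))
        for c in range(d)
    ]
    return fused, weights

# Toy data (assumed): frames 0 and 2 agree with the query, frame 1 does not.
query = [1.0, 0.0]
frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
fused, weights = fuse_frames(query, frames)
```

Frames whose features match the query receive larger weights, so a surface region that is visible in some frames and occluded in others can draw mostly on the frames that observe it.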

Results

Qualitative results

CHA generates realistic human avatars for a variety of subjects and, across different poses, produces detailed and realistic clothing dynamics.

[Figure: qualitative results]

Qualitative comparison

CHA separates the clothing template for dynamics modeling, which yields more realistic clothing dynamics and a clearer clothing hierarchy. In addition, compared with ClothWild, a generative-model-based approach, our method produces avatars that better preserve individual features such as hair.

[Figures 7 and 8: qualitative comparison]

Quantitative Evaluation

Table I presents the quantitative evaluation of dynamics modeling. Table II presents the quantitative evaluation of avatar reconstruction. Table III presents the efficiency comparison between our framework and other methods.

[Table I: quantitative evaluation of dynamics modeling]

[Table II: quantitative evaluation of avatar reconstruction; Table III: efficiency comparison]

Citation

@article{11554422,
  author={Deng, Yong and Li, Baoxing and Zhao, Xu},
  journal={IEEE Transactions on Multimedia},
  title={CHA: Physics-Based Dynamics Modeling for Clothing-Editable Human Avatars},
  year={2026},
  pages={1-11},
  doi={10.1109/TMM.2026.3701539}
}