IEEE Transactions on Multimedia (TMM) 2026
This paper proposes a novel framework for generating human avatars from monocular videos. Existing methods struggle to balance dynamic effects and modeling efficiency due to the need for per-subject retraining. To address this, we introduce CHA, a physics-based approach for Clothing-editable Human Avatars, which decouples reconstruction and driving, thereby eliminating the need for per-subject fitting, and utilizes physics-based cloth dynamics modeling to generate realistic clothing dynamics. Our framework comprises three core components: First, an avatar reconstruction module employs an attention mechanism to fuse multi-frame normal information, ensuring complete human surface recovery. Second, a cloth recovery module exploits the static equilibrium state to restore standard clothing templates from observed garments, facilitating downstream dynamics modeling. Third, we propose a new physics-based cloth dynamics module to generate realistic cloth-aware avatar animations. The module generalizes well across different clothing sizes and styles by employing dual-layer graph networks with local predictions and a dynamic KNN-based graph construction that is not limited by mesh topology. Furthermore, we employ a multi-frame joint semantic segmentation algorithm to separate clothing templates, making our method applicable to clothing editing applications, such as virtual try-on. Extensive quantitative and qualitative experiments demonstrate that our avatars exhibit enhanced physical accuracy in clothing dynamics, well-preserved individual body characteristics, and distinct clothing layers.
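The dynamic KNN-based graph construction mentioned above can be illustrated with a minimal sketch. The abstract gives no implementation details, so everything here is an assumption made for illustration: the brute-force neighbour search, the function name `knn_edges`, the parameter `k`, and the toy coordinates are hypothetical and are not taken from the paper. The sketch only shows why a graph built from vertex positions does not depend on mesh topology, and why it has to be rebuilt when the vertices move.

```python
import math


def knn_edges(points, k):
    """Return directed edges (i, j) linking each point to its k nearest neighbours.

    Brute-force O(n^2) search on 3D positions only; no mesh connectivity is used.
    This is an illustrative assumption, not the construction used in the paper.
    """
    n = len(points)
    if not 0 < k < n:
        raise ValueError("k must satisfy 0 < k < number of points")
    edges = []
    for i, p in enumerate(points):
        # Rank all other vertices by Euclidean distance to p (ties broken by index).
        others = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: (math.dist(p, points[j]), j),
        )
        edges.extend((i, j) for j in others[:k])
    return edges


# Hypothetical toy data: four vertices, the last one far from the others.
verts_t0 = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (5.0, 5.0, 0.0)]
edges_t0 = knn_edges(verts_t0, k=2)

# "Dynamic": once the last vertex moves closer, the graph is rebuilt from the
# new positions and vertex 1 picks it up as a neighbour instead of vertex 2.
verts_t1 = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (1.0, 1.0, 0.0)]
edges_t1 = knn_edges(verts_t1, k=2)
```

Because the edges come from spatial proximity alone, the same routine applies to garments with different vertex counts or connectivity. A practical system would likely replace the brute-force search with a spatial index.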
CHA creates drivable, clothing-editable, and dynamically realistic 3D human avatars from monocular video. The overall pipeline consists of four main modules.
CHA generates realistic human avatars for a variety of subjects and, across different poses, produces detailed and realistic clothing dynamics.
CHA separates the clothing template for dynamics modeling, achieving more realistic clothing dynamics and a clearer clothing hierarchy. In addition, compared with the generative-model-based approach ClothWild, our method better preserves individual features such as hair.
Table I presents the quantitative evaluation of dynamics modeling. Table II presents the quantitative evaluation of avatar reconstruction. Table III presents the efficiency comparison between our framework and other methods.
@article{11554422,
author={Deng, Yong and Li, Baoxing and Zhao, Xu},
journal={IEEE Transactions on Multimedia},
title={CHA: Physics-Based Dynamics Modeling for Clothing-Editable Human Avatars},
year={2026},
pages={1-11},
doi={10.1109/TMM.2026.3701539}
}