ECCV 2026
Controlling continuous, physically valid 3D surface evolution is a fundamental challenge in computer vision and graphics. Current neural representations typically rely on unstructured latent spaces or discrete iterations, lacking explicit geometric interpretability and struggling to produce smooth deformations. While classical scale-space theory offers clear physical interpretability, its numerical irregularity isolates it from modern deep learning. In this paper, we bridge this gap by establishing a learnable scale-space foundation. Our framework comprises three synergistic components: (1) the Variational Neighborhood Curvature (VNC) operator, an efficient, parallelizable, and scale-stable geometric metric; (2) the spacetime-balanced VNC-Flow algorithm, which translates irregular physical smoothing into structured deformation trajectories; and (3) a Controllable Variational Autoencoder (C-VAE) that learns these trajectories conditioned on normalized evolution time. Departing from traditional discrete iterations or unstructured latent interpolations, our continuous neural surrogate induces a structured radial latent organization. This enables precise control over continuous shape abstraction and establishes a structural prior for downstream applications.
Pipeline of our learnable 3D surface evolution framework. Left: The VNC operator computes fast, differentiable, and scale-stable multi-scale curvatures. Middle: Guided by VNC, the spacetime-balanced VNC-Flow explicitly samples stable, singularity-free deformation trajectories toward a spherical base topology. Right: Conditioned on these physical trajectories, the C-VAE inherently organizes a structured radial latent space, enabling continuous predictive evolution and establishing a robust geometric prior for downstream tasks.
The VNC operator is a fully differentiable, multi-scale discrete mean curvature metric, with two synergistic variants: the explicitly bounded VNC-hard (theoretical anchor) and the probabilistically bounded VNC-soft (practical neural enabler). VNC-soft reformulates local spatial queries into global Gaussian-weighted matrix multiplications, turning irregular discrete geometric intersections into highly parallelizable tensor operations. By normalizing the curvature integration, VNC yields a dimensionless, scale-stable shape index.
VNC definition and visualization. (a) VNC-hard explicitly computes curvature via line-ball intersections. (b) As the perception scale increases, perceived curvature naturally transitions from high-frequency details to low-frequency structures.
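To make concrete the idea of replacing local ball queries with dense Gaussian-weighted matrix products, here is a minimal NumPy sketch. It is not the VNC-soft definition. The proxy used here (the normal-direction offset of a Gaussian-weighted neighborhood centroid, divided by σ) and the helper names `soft_curvature_proxy` and `fibonacci_sphere` are assumptions made purely for illustration.

```python
import numpy as np

def soft_curvature_proxy(points, normals, sigma):
    """Gaussian-weighted, dimensionless curvature proxy (illustrative, not VNC itself)."""
    # Pairwise squared distances from one dense matrix product.
    sq = np.sum(points**2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * points @ points.T, 0.0)
    # Soft neighborhood: Gaussian weights stand in for a hard ball query.
    w = np.exp(-d2 / (2.0 * sigma**2))
    w /= w.sum(axis=1, keepdims=True)
    # Weighted neighborhood centroid, again a single matmul.
    centroid = w @ points
    # Signed offset along the normal, divided by sigma to remove length units.
    return -np.sum((centroid - points) * normals, axis=1) / sigma

def fibonacci_sphere(n, radius=1.0):
    """Roughly uniform samples on a sphere, with outward unit normals."""
    i = np.arange(n) + 0.5
    phi = np.arccos(1.0 - 2.0 * i / n)
    theta = np.pi * (1.0 + 5.0**0.5) * i
    unit = np.stack([np.cos(theta) * np.sin(phi),
                     np.sin(theta) * np.sin(phi),
                     np.cos(phi)], axis=1)
    return radius * unit, unit

points, normals = fibonacci_sphere(400)
k_unit = soft_curvature_proxy(points, normals, sigma=0.3)
# Scale the shape and the perception scale together by the same factor.
k_scaled = soft_curvature_proxy(3.0 * points, normals, sigma=0.9)
```

In this toy setting, scaling the shape and σ by the same factor leaves the output unchanged, which is the sense in which a normalized quantity of this kind is dimensionless.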
To quantify geometric information across scales, we introduce the VNC-AUC metric, which measures local geometric complexity as the area between a vertex's VNC-Scale curve and the global mean curve. This drives an adaptive scale that perceives equal information per vertex.
Computation of VNC-AUC and adaptive scales. The VNC-AUC metric accumulates multi-scale curvature information, while the adaptive scale indicates where the predefined information threshold is reached.
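The accumulation and thresholding described above can be sketched as follows. This is a hedged illustration rather than the paper's exact formula. It assumes the area is the trapezoid-rule integral of the absolute gap between a vertex's curve and the global mean curve, and that the adaptive scale is the first sampled scale at which the accumulated area reaches a threshold. The names `cumulative_auc`, `adaptive_scale`, and `budget`, and the toy curves, are hypothetical.

```python
import numpy as np

def cumulative_auc(curves, scales):
    """Accumulated area between each vertex curve and the global mean curve.

    curves: (V, S) curvature values per vertex and scale; scales: (S,) increasing.
    """
    gap = np.abs(curves - curves.mean(axis=0, keepdims=True))
    steps = np.diff(scales)
    trapezoids = 0.5 * (gap[:, 1:] + gap[:, :-1]) * steps
    zeros = np.zeros((curves.shape[0], 1))
    return np.concatenate([zeros, np.cumsum(trapezoids, axis=1)], axis=1)

def adaptive_scale(curves, scales, budget):
    """First scale at which each vertex has accumulated `budget` area."""
    reached = cumulative_auc(curves, scales) >= budget
    # Vertices that never reach the budget fall back to the largest scale.
    first = np.where(reached.any(axis=1), reached.argmax(axis=1), len(scales) - 1)
    return scales[first]

scales = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
curves = np.array([
    [0.0, 0.0, 0.0, 0.0, 0.0],      # flat vertex, identical to the mean curve
    [3.0, 3.0, 3.0, 3.0, 3.0],      # strongly convex vertex
    [-3.0, -3.0, -3.0, -3.0, -3.0], # strongly concave vertex
])
per_vertex_scale = adaptive_scale(curves, scales, budget=5.0)
```

With these toy curves the flat vertex is assigned the largest scale and the two high-deviation vertices a smaller one, matching the qualitative behavior of smaller scales in complex regions.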
The VNC-Flow algorithm tames the numerically irregular curvature flow into structured deformation trajectories by enforcing three balances: (1) spatial balance via the adaptive scale, (2) temporal balance via the normalized evolution time t ∈ [0,1], and (3) topology regularization via SDF-based remeshing. This smoothly eliminates topological singularities while preserving the face count and surface area, thereby upholding the geometric prerequisites for VNC scale stability. The normalized evolution time t re-parameterizes the irregular physical time as the ratio of current to initial geometric complexity, serving as the temporal coordinate to condition the C-VAE.
Comparison and ablation of curvature-flow variants. (a) Input shape. (b) Adaptive-scale VNC-Flow approaches the spherical target. Fixed-scale variants exhibit (c) uneven degeneration or (d) incomplete abstraction, while mean-curvature-flow baselines (e) collapse or (f) contract toward a mean-curvature skeleton.
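The normalized evolution time, described above as the ratio of current to initial geometric complexity, can be sketched in a few lines. The complexity values below are made up for illustration. The running minimum that suppresses small rebounds is an assumption of this sketch, not something stated in the source, and `normalized_time` is a hypothetical helper name.

```python
def normalized_time(complexity):
    """Map per-step complexity values to t in [0, 1], with t = 1 at the input shape."""
    initial = complexity[0]
    times, running = [], 1.0
    for value in complexity:
        # Assumed: keep t non-increasing so small rebounds do not reorder states.
        running = min(running, value / initial)
        times.append(max(running, 0.0))
    return times

# Hypothetical mean-complexity readings along one flow, with one small rebound.
t_values = normalized_time([8.0, 6.0, 6.4, 2.0, 0.0])
```

Indexing intermediate shapes by `t` rather than by raw iteration count gives every trajectory the same [0, 1] coordinate, which is what allows a single scalar to condition the network.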
The C-VAE acts as the continuous neural surrogate, fusing the normalized evolution time t ∈ [0,1] via deep cross-attention. We design two conditioning pathways: C-VAE-I (Intermediate Feature Fusion) injects t into the encoder's intermediate features, tailored for dataset-level generalization; C-VAE-II (Latent Code Modulation) modulates the high-dimensional geometric latent codes using a frozen pre-trained encoder, enabling high-fidelity, instance-specific trajectory distillation. Together with VNC-Flow supervision, this deep temporal modulation encourages the latent space to organize into a radial latent structure, where base topologies collapse toward the origin (t → 0) and complex details anchor at the periphery (t → 1).
Controllable VAE (C-VAE) architectures. Building upon the VecSet-based geometric backbone, our C-VAE introduces a temporal condition branch. C-VAE-I modulates encoder features via cross-attention with the temporal embedding t. C-VAE-II modulates independent high-dimensional latent codes utilizing a frozen pre-trained encoder.
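As a rough illustration of conditioning feature tokens on a temporal embedding through cross-attention, consider the single-head NumPy sketch below. Everything specific in it is an assumption: the sinusoidal token layout in `time_tokens`, the dimensions, the random projection matrices, and the residual update. The source states only that t is fused via cross-attention, so this should not be read as the C-VAE architecture.

```python
import numpy as np

def time_tokens(t, n_tokens=4, dim=8):
    """Sinusoidal embedding of t in [0, 1]; one token per frequency (assumed layout)."""
    freqs = 2.0 ** np.arange(n_tokens)[:, None]        # (n_tokens, 1)
    phase = np.linspace(0.0, np.pi / 2, dim)[None, :]  # (1, dim)
    return np.sin(np.pi * freqs * t + phase)           # (n_tokens, dim)

def cross_attend(features, cond, wq, wk, wv):
    """Feature tokens query the condition tokens; returns updated features and weights."""
    q, k, v = features @ wq, cond @ wk, cond @ wv
    logits = q @ k.T / np.sqrt(q.shape[-1])
    logits = logits - logits.max(axis=-1, keepdims=True)  # numerically stable softmax
    attn = np.exp(logits)
    attn = attn / attn.sum(axis=-1, keepdims=True)
    return features + attn @ v, attn                      # residual update

rng = np.random.default_rng(0)
features = rng.normal(size=(5, 8))         # five stand-in geometry tokens
wq, wk, wv = rng.normal(size=(3, 8, 8))    # untrained random projections
early, attn_early = cross_attend(features, time_tokens(0.25), wq, wk, wv)
late, attn_late = cross_attend(features, time_tokens(0.75), wq, wk, wv)
```

The same geometry tokens produce different outputs for different values of t, which is the minimal property a temporal conditioning branch needs.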
Computational Efficiency. While DiffCurvature is exceptionally fast, it lacks multi-scale support. Conversely, the classical baseline provides multi-scale curvature estimation but becomes computationally prohibitive as mesh resolution increases. Our VNC variants bridge this gap by jointly offering differentiability, GPU parallelism, and explicit multi-scale capability. In particular, VNC-soft is roughly 2–3× faster than VNC-hard at every tested resolution, and VNC-hard is itself roughly 50–170× faster than the CPU baseline according to the timings in the table below, making VNC-soft well suited for intensive flow sampling.
| Operator | Multi-scale | Parallel | Differentiable | Runtime (V=10k) | Runtime (V=25k) | Runtime (V=50k) |
|---|---|---|---|---|---|---|
| baseline | ✓ | × | × | 25 s | 37 s | 256 s |
| DiffCurvature | × | ✓ | ✓ | 3 ms | 4 ms | 7 ms |
| VNC-hard (Ours) | ✓ | ✓ | ✓ | 149 ms | 761 ms | 3.0 s |
| VNC-soft (Ours) | ✓ | ✓ | ✓ | 57 ms | 358 ms | 1.4 s |
Comparison of curvature operators. VNC variants provide a practical balance of multi-scale capability, differentiability, and GPU-accelerated efficiency across representative mesh resolutions.
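The speedup factors can be recomputed directly from the timings in the table. The short script below does only that arithmetic. Because the table entries are themselves rounded, the resulting ratios are approximate.

```python
# Timings in seconds, copied from the operator comparison table (V = 10k, 25k, 50k).
baseline = [25.0, 37.0, 256.0]
vnc_hard = [0.149, 0.761, 3.0]
vnc_soft = [0.057, 0.358, 1.4]

def speedups(slow, fast):
    """Element-wise ratio slow / fast."""
    return [s / f for s, f in zip(slow, fast)]

hard_vs_baseline = [round(x) for x in speedups(baseline, vnc_hard)]
soft_vs_hard = [round(x, 1) for x in speedups(vnc_hard, vnc_soft)]
print(hard_vs_baseline, soft_vs_hard)
```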
Multi-scale Decoupling & Scale Stability. VNC acts as an effective geometric frequency filter, transitioning from high-frequency wrinkles at small scales to low-frequency structural topology at large scales. Crucially, the global mean of our normalized VNC remains stable across varying scales and diverse topological inputs. This makes it a dimensionless shape index that provides a reliable, non-drifting convergence target for the flow.
Curvature operators across scales. VNC transitions from high-frequency wrinkles at small scales to low-frequency abstractions at large scales. Setting r = 6σ effectively aligns VNC-soft with explicitly bounded VNC-hard.
Validation of dimensionless scale stability. (a) Unlike the baseline whose mean curvature diverges with scale, VNC operators maintain a stable global mean while variance strictly decreases. (b, c) This scale-stable property holds robustly across diverse topological inputs.
Spacetime Balance and Continuous Abstraction. VNC-Flow shows an overall reduction in mean VNC-AUC, with stage transitions defined by the information-decay budget; the small rebounds near some transitions arise from SDF-based remeshing and do not alter the global trend. Spatially, complex regions receive smaller scales while smoother regions receive larger scales. Across human scans with diverse poses and clothing, this ordered evolution consistently produces continuous abstraction trajectories toward compact base shapes, providing structured intermediate states for training the neural surrogate.
Spacetime balance in VNC-Flow. (a) Mean VNC-AUC decreases overall across the stage thresholds. Small rebounds arise at SDF-based remeshing boundaries. (b) Adaptive-scale maps assign smaller scales to geometrically complex regions and larger scales to smoother regions.
VNC-Flow evolution and C-VAE predictions. (a) VNC-Flow produces ordered abstraction trajectories for human scans with diverse poses and clothing. (b) On unseen scans, the trained C-VAE-I predicts continuous forward abstraction trajectories conditioned solely on the normalized time t.
High-Fidelity Subject-Specific Trajectory (C-VAE-II). To evaluate the representational capacity of our temporal mapping, we use C-VAE-II to intentionally overfit a highly complex sequence. Compared to existing paradigms (NMF, NIE, MeshSDF), our C-VAE closely follows the physical curvature flow, while existing methods suffer from fixed topologies, non-geometric artifacts, or unphysical smoothing orders. Quantitatively, at the target shape (t = 1) our temporally conditioned model attains the lowest reconstruction error on all three metrics (CD, EMD, and NCE) among the compared discrete-iteration and latent-interpolation baselines.
Comparison of trajectory-learning paradigms. Compared to existing paradigms, our C-VAE follows the physically grounded, frequency-ordered decay.
| Method | CD ×10⁻⁴ (↓) | EMD ×10⁻² (↓) | NCE (rad) (↓) |
|---|---|---|---|
| NMF (iter. 15k) | 1.92 | 5.60 | 0.518 |
| NIE (iter. 1k) | 4.77 | 3.17 | 0.405 |
| MeshSDF (α = 1.0) | 1.61 | 2.55 | 0.341 |
| C-VAE-II (Ours, t = 1.0) | 1.05 | 1.94 | 0.257 |
Quantitative comparison of surface evolution methods. Our C-VAE-II achieves superior reconstruction accuracy at the target shape (t = 1).
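For readers unfamiliar with the CD column, the sketch below shows one common form of the Chamfer distance between two point sets. Conventions differ across papers (squared or unsquared distances, sum or mean of the two directions), and the source does not state which one is used. The squared, mean-per-direction form here is therefore an assumption, and the points are toy values.

```python
import numpy as np

def chamfer_distance(a, b):
    """Symmetric Chamfer distance using squared Euclidean nearest-neighbor terms."""
    diff = a[:, None, :] - b[None, :, :]
    d2 = np.sum(diff**2, axis=-1)  # (len(a), len(b)) pairwise squared distances
    # Mean nearest-neighbor cost from a to b, plus the same from b to a.
    return d2.min(axis=1).mean() + d2.min(axis=0).mean()

a = np.array([[0.0, 0.0, 0.0]])
b = np.array([[1.0, 0.0, 0.0], [0.0, 2.0, 0.0]])
cd_ab = chamfer_distance(a, b)
cd_aa = chamfer_distance(b, b)
```

The dense pairwise matrix keeps the example short. For large sampled surfaces a spatial index such as a k-d tree is the usual choice.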
Anatomy of the Radial Latent Space. Max-pooling with UMAP reveals a radial organization: minimum-information base shapes (t → 0) cluster near the center, while increasing geometric detail extends toward the periphery. Mean-pooling with PCA further reveals a shared global direction associated with t. This organization is primarily enabled by VNC-Flow, which aligns subjects through the normalized evolution coordinate and a common base-shape endpoint.
Latent-space anatomy of C-VAE-I. (a) UMAP reveals a radial structure, anchoring base topologies at the center and complex features at the periphery. (b) PCA reveals a parallel structure, indicating a unified latent direction for the time condition.
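The kind of analysis described above (a dominant latent direction associated with t, and latent norm growing with t) can be illustrated on synthetic data. The latents below are generated by construction to have this structure. They are not the paper's latents, and the dimensions, noise level, and variable names are arbitrary assumptions. The sketch only shows how PCA via SVD and a norm-versus-t correlation would surface such structure if it were present.

```python
import numpy as np

def first_principal_direction(z):
    """Leading PCA direction of the rows of z, computed by SVD of the centered data."""
    centered = z - z.mean(axis=0, keepdims=True)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return vt[0]

rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 50)
direction = np.zeros(16)
direction[0] = 1.0  # planted "time" direction (synthetic)

# Synthetic pooled latents: move along `direction` with t, plus small noise.
latents = 5.0 * t[:, None] * direction[None, :] + 0.01 * rng.normal(size=(50, 16))

pc1 = first_principal_direction(latents)
alignment = abs(pc1 @ direction)                # |cosine| between PC1 and planted direction
radius = np.linalg.norm(latents, axis=1)
radial_corr = np.corrcoef(radius, t)[0, 1]      # does latent norm grow with t?
```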
@inproceedings{li2026vnc,
title = {VNC: A Scale-Space Foundation for Learnable 3D Surface Evolution},
author = {Li, Baoxing and Deng, Yong and Zhao, Xu},
booktitle = {European Conference on Computer Vision (ECCV)},
year = {2026}
}