Splat-SAP: Feed-Forward Gaussian Splatting for Human-Centered Scene with
Scale-Aware Point Map Reconstruction

AAAI 2026


Boyao Zhou2,1, Shunyuan Zheng2, Zhanfeng Liao1, Zihan Ma1*, Hanzhang Tu1, Boning Liu1, Yebin Liu1✉

1Tsinghua University        2Ant Group
*Work done during an internship at Tsinghua University.  ✉Corresponding author.


Abstract

We present Splat-SAP, a feed-forward approach for rendering novel views of human-centered scenes from a pair of widely separated binocular cameras. Gaussian Splatting has shown promising potential in rendering tasks, but it typically requires per-scene optimization with dense input views. Although some recent approaches achieve feed-forward Gaussian Splatting through geometry priors obtained by multi-view stereo, they still require largely overlapping input views to establish that prior. To bridge this gap, we represent geometry with pixel-wise point map reconstruction, which models each view independently and is therefore robust to large view sparsity. Specifically, we propose a two-stage learning strategy. In stage 1, we transform the point map into real space via an iterative affine learning process, which facilitates camera control in the subsequent stage. In stage 2, we project the point maps of the two input views onto the target view plane and refine this geometry via stereo matching. We then anchor Gaussian primitives on the refined plane to render high-quality images. Since the scale-aware point map of stage 1 is a metric representation, it is trained in a self-supervised manner without 3D supervision, while stage 2 is supervised with a photometric loss. We collect multi-view human-centered data and demonstrate that our method improves both the stability of point map reconstruction and the visual quality of free-viewpoint rendering.
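To make the stage-1 transform concrete, the sketch below applies a per-view affine transform (a scalar scale and a 3D shift) to a pixel-wise point map and refines it over a few iterations. This is a minimal illustration, not the paper's implementation: the function names (apply_affine, iterative_affine), the residual-update predictor interface, and the parameterization of the transform are all assumptions.

import torch

def apply_affine(point_map: torch.Tensor, scale: torch.Tensor, shift: torch.Tensor) -> torch.Tensor:
    # Map a normalized point map into real (metric) space.
    # point_map: (B, H, W, 3) pixel-wise 3D points predicted per view.
    # scale:     (B, 1) per-view scale factor.
    # shift:     (B, 3) per-view translation.
    return point_map * scale[:, None, None, :] + shift[:, None, None, :]

def iterative_affine(point_map, images, predictor, num_iters: int = 3):
    # Hypothetical iterative loop: a network ("predictor") produces residual
    # updates (d_scale: (B, 1), d_shift: (B, 3)) given the current transformed
    # points and the input images.
    B = point_map.shape[0]
    scale = point_map.new_ones(B, 1)
    shift = point_map.new_zeros(B, 3)
    for _ in range(num_iters):
        pts = apply_affine(point_map, scale, shift)
        d_scale, d_shift = predictor(pts, images)
        scale = scale * (1.0 + d_scale)
        shift = shift + d_shift
    return apply_affine(point_map, scale, shift), scale, shift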


Method


 

Overview: Our method consists of two stages. In the first stage, we take two coarse-resolution images as input and predict the corresponding point maps along with an affine transform. In the second stage, our refinement module takes the transformed points and fine-resolution images as input, and predicts the Gaussian plane of the target view for high-quality rendering. A code sketch of the stage-2 data flow follows below.
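The sketch below illustrates the stage-2 data flow: metric points from the input views are projected into the target camera, and pixel-aligned Gaussian primitives are anchored on the refined target-view plane. The camera convention, the attribute-prediction head, and the function names (project_to_target, anchor_gaussians) are assumptions for illustration rather than the actual refinement module.

import torch

def project_to_target(points_world: torch.Tensor, K: torch.Tensor, R: torch.Tensor, t: torch.Tensor):
    # Project world-space points (N, 3) into the target camera.
    # K: (3, 3) intrinsics; R: (3, 3), t: (3,) world-to-camera extrinsics.
    # Returns pixel coordinates (N, 2) and per-point depth (N,).
    cam = points_world @ R.T + t          # world -> camera coordinates
    uv = cam @ K.T                        # perspective projection
    depth = uv[:, 2].clamp(min=1e-6)
    return uv[:, :2] / depth[:, None], depth

def anchor_gaussians(target_plane_points: torch.Tensor, features: torch.Tensor, head):
    # Anchor pixel-aligned Gaussians on the refined target-view plane.
    # target_plane_points: (H, W, 3) refined 3D position per target pixel.
    # features:            (H, W, C) image/stereo features used to decode attributes.
    # head: hypothetical network predicting per-Gaussian offsets, scales,
    #       rotations, opacities and colors from the features.
    offset, scales, rotations, opacity, color = head(features)
    means = target_plane_points + offset  # small offsets around the anchor plane
    return means, scales, rotations, opacity, color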