ABSTRACT: This paper introduces a self-supervised, end-to-end architecture that learns part-level implicit shape and appearance models and optimizes motion parameters jointly, without requiring any 3D supervision, motion annotation, or semantic annotation. The training process is similar to the original NeRF, but the ray marching and volumetric rendering procedure is extended to compose the two fields.
[Arxiv] [Github] [Project Page]
Problem Statement
The problem of articulated object reconstruction addressed in this paper can be summarized as follows. Given observations of an object at a start state and at an end state, two sub-problems arise.
The first problem is to decouple the object into a static part and a movable part. The paper assumes that the object has exactly one static part and one movable part.
The second problem is to estimate the articulated motion of the movable part between the two states.
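Concretely, a 1-DoF articulated motion is commonly parameterized by joint type: a revolute joint by a rotation axis, a pivot point, and an angle; a prismatic joint by a translation axis and a sliding distance. The sketch below (illustrative names and parameterization, not the paper's exact interface) applies each motion to a 3D point:

```python
import numpy as np

def rotation_about_axis(axis, angle):
    """Rodrigues' formula: 3x3 rotation matrix about a unit axis."""
    axis = axis / np.linalg.norm(axis)
    K = np.array([[0.0, -axis[2], axis[1]],
                  [axis[2], 0.0, -axis[0]],
                  [-axis[1], axis[0], 0.0]])
    return np.eye(3) + np.sin(angle) * K + (1.0 - np.cos(angle)) * (K @ K)

def apply_revolute(x, axis, pivot, angle):
    """Rotate point x by `angle` about an axis through `pivot`."""
    R = rotation_about_axis(axis, angle)
    return R @ (x - pivot) + pivot

def apply_prismatic(x, axis, distance):
    """Slide point x by `distance` along the (normalized) `axis`."""
    axis = axis / np.linalg.norm(axis)
    return x + distance * axis
```

For example, rotating the point (1, 0, 0) by 90 degrees about the z-axis through the origin yields (0, 1, 0).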
Method
This paper divides the object into parts by registration between the input states.
Structure
The static and movable parts are learnt jointly during training. They are represented by two separate networks with the same structure, both built upon InstantNGP. Their relationship is modeled explicitly as a transformation function $\mathcal{T}$, a rigid transformation $$\mathcal{T}(x) = Rx + t,$$ where $R$ is a rotation and $t$ a translation determined by the joint parameters. $\mathcal{T}$ maps points of the movable part from one state into the frame of the other.
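In practice, this means a sample point observed at the target state is mapped back into the movable part's canonical (start-state) frame before the movable field is queried. A minimal numpy sketch, assuming a 4x4 homogeneous transform and a field that returns (density, color) (names and toy field are illustrative, not the paper's API):

```python
import numpy as np

def query_movable_field(field_fn, T, x):
    """Canonicalize a target-state point with the rigid transform T,
    then query the movable part's field at the canonical point."""
    x_h = np.append(x, 1.0)          # homogeneous coordinates
    x_canonical = (T @ x_h)[:3]
    return field_fn(x_canonical)

# Toy field: density falls off with squared distance from the origin.
toy_field = lambda p: (np.exp(-np.dot(p, p)), np.array([1.0, 0.0, 0.0]))

# A transform that translates points by (-0.5, 0, 0).
T = np.eye(4)
T[0, 3] = -0.5

sigma, rgb = query_movable_field(toy_field, T, np.array([0.5, 0.0, 0.0]))
```

Here the point (0.5, 0, 0) is canonicalized to the origin, so the toy field returns its peak density of 1.0.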
Training
The training pipeline is adapted from NeRF, with the ray marching and volumetric rendering procedure extended to compose the two fields.
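One standard way to composite two radiance fields along a ray is to add their densities per sample and mix their colors in proportion to density, then apply the usual alpha compositing. A numpy sketch (sample spacings and field values below are toy data, not from the paper):

```python
import numpy as np

def composite_render(sigma_s, rgb_s, sigma_m, rgb_m, deltas):
    """Volume-render one ray through a static and a movable field.
    Densities add; per-sample color is the density-weighted mixture."""
    sigma = sigma_s + sigma_m                               # (N,)
    eps = 1e-10
    rgb = (sigma_s[:, None] * rgb_s + sigma_m[:, None] * rgb_m) / (sigma[:, None] + eps)
    alpha = 1.0 - np.exp(-sigma * deltas)                   # per-sample opacity
    trans = np.cumprod(np.concatenate([[1.0], 1.0 - alpha[:-1]]))  # transmittance
    weights = trans * alpha                                 # (N,)
    return (weights[:, None] * rgb).sum(axis=0), weights

# Toy ray with 4 samples: the first two hit the static part (red),
# the last two hit the movable part (blue).
sigma_s = np.array([5.0, 5.0, 0.0, 0.0])
sigma_m = np.array([0.0, 0.0, 5.0, 5.0])
rgb_s = np.tile([1.0, 0.0, 0.0], (4, 1))
rgb_m = np.tile([0.0, 0.0, 1.0], (4, 1))
color, weights = composite_render(sigma_s, rgb_s, sigma_m, rgb_m, np.full(4, 0.1))
```

Because the static samples come first along the ray, the rendered color is dominated by red: later samples are attenuated by the transmittance accumulated through the earlier ones.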