This paper proposes a way to build articulation models of articulated objects by encoding point cloud features and finding correspondences between the static and mobile parts from visual observations taken before and after an interaction.
Workflow in Brief
Two-Stream Encoder
- Given point cloud observations $P_{\text{before}}$ and $P_{\text{after}}$, taken before and after the interaction:
Encode them with a PointNet++ encoder:
$f_{\text{before}} = \mathrm{Enc}(P_{\text{before}})$, $f_{\text{after}} = \mathrm{Enc}(P_{\text{after}})$, with $f_{\text{before}}, f_{\text{after}} \in \mathbb{R}^{N' \times d}$. $N'$ is the number of the sub-sampled points, and $d$ is the dimension of the sub-sampled point features. Fuse the features with an attention layer:
$f_{\text{fused}} = \mathrm{Attn}(f_{\text{before}}, f_{\text{after}})$, $f_{\text{fused}} \in \mathbb{R}^{N' \times d'}$. The fused feature is decoded by two PointNet++ decoders
$\mathrm{Dec}_{\text{geo}}$ (geometry) and $\mathrm{Dec}_{\text{art}}$ (articulation), and get $f_{\text{geo}}$, $f_{\text{art}}$. These are point features aligned with the input point cloud. Feature encoding is then based on ConvONet.
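The fusion step above can be illustrated with a minimal numpy sketch. This is a hypothetical, parameter-free version: single-head scaled dot-product cross-attention where each sub-sampled point of the "before" stream attends over all points of the "after" stream, with the attended features concatenated back onto the source features. The paper's actual attention layer uses learned projections; the function name, shapes, and concatenation choice here are illustrative assumptions.

```python
import numpy as np

def cross_attention_fuse(f_src, f_tgt):
    """Fuse per-point features from two observations.

    f_src, f_tgt: (N', d) sub-sampled point features of the two streams.
    Returns (N', 2d): each source feature concatenated with the feature
    it attends to in the target stream (parameter-free sketch, no
    learned Q/K/V projections).
    """
    d = f_src.shape[-1]
    # scaled dot-product attention scores between the two streams: (N', N')
    scores = f_src @ f_tgt.T / np.sqrt(d)
    # softmax over the target points (numerically stabilized)
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w = w / w.sum(axis=-1, keepdims=True)
    attended = w @ f_tgt                                # (N', d)
    return np.concatenate([f_src, attended], axis=-1)   # (N', 2d)

rng = np.random.default_rng(0)
N, d = 128, 32                       # N' sub-sampled points, feature dim d
f_before = rng.normal(size=(N, d))   # stand-in for encoded "before" cloud
f_after = rng.normal(size=(N, d))    # stand-in for encoded "after" cloud
fused = cross_attention_fuse(f_before, f_after)
print(fused.shape)                   # (128, 64)
```

A learned variant would replace the raw dot products with projected queries/keys/values (e.g. `torch.nn.MultiheadAttention`), but the data flow between the two streams is the same.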
Training
Revolute joint: