【Reading】LATITUDE: Robotic Global Localization with Truncated Dynamic Low-pass Filter in City-scale NeRF

  1. Abstract
  2. System Design
    2.1. Implementation
      2.1.1. Place Recognition
      2.1.2. Pose Optimization
  3. Explanations & References

This paper proposes a two-stage localization mechanism in city-scale NeRF.

Abstract

Neural Radiance Fields (NeRFs) have achieved great success in representing complex 3D scenes with high-resolution details and efficient memory. Nevertheless, current NeRF-based pose estimators have no initial pose prediction and are prone to local optima during optimization. In this paper, we present LATITUDE: Global Localization with Truncated Dynamic Low-pass Filter, which introduces a two-stage localization mechanism in city-scale NeRF.

  • In the place recognition stage, we train a regressor on images generated from trained NeRFs, which provides an initial value for global localization.

  • In the pose optimization stage, we minimize the residual between the observed image and the rendered image by directly optimizing the pose on the tangent plane.

  • To avoid falling into a local optimum, we introduce a Truncated Dynamic Low-pass Filter (TDLF) for coarse-to-fine pose registration.

We evaluate our method on both synthetic and real-world data and show its potential applications for high-precision navigation in large-scale city scenes.

System Design

  • Place Recognition
    1. Original poses, together with additional poses sampled around them, are collected.
    2. Each pose vector is passed through the trained and fixed Mega-NeRF with shuffled appearance embeddings to render images.
    3. Initial poses of the input images are predicted by a pose regressor network.
  • Pose Optimization
    1. The initial poses are passed through the positional encoding filter.
    2. The pose vector is passed through the trained and fixed Mega-NeRF to generate a rendered image.
    3. The photometric error between the rendered image and the observed image is computed and back-propagated, with the TDLF applied, to obtain a more accurate pose (see the frequency-weighting sketch after this list).
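
As referenced in step 3 above, the TDLF acts on the positional encoding so that only low-frequency components contribute early in the optimization. Below is a minimal sketch of this idea, using a BARF-style cosine ramp over frequency bands with the bands above the cutoff zeroed out; the exact window and schedule of LATITUDE's TDLF may differ, and all names here are illustrative.

```python
import torch

def positional_encoding(x, num_freqs):
    """Standard NeRF positional encoding: sin/cos at frequencies 2^k * pi, k = 0..L-1."""
    freqs = (2.0 ** torch.arange(num_freqs)) * torch.pi       # (L,)
    angles = x[..., None, :] * freqs[:, None]                 # (..., L, D)
    return torch.cat([angles.sin(), angles.cos()], dim=-1)    # (..., L, 2D)

def lowpass_weights(alpha, num_freqs):
    """Per-band weights: bands well below the cutoff `alpha` pass through, the band at
    the cutoff is smoothly ramped in, and bands above the cutoff are zeroed out."""
    k = torch.arange(num_freqs, dtype=torch.float32)
    ramp = torch.clamp(alpha - k, 0.0, 1.0)                   # 1 below cutoff, 0 above, linear at cutoff
    return 0.5 * (1.0 - torch.cos(ramp * torch.pi))           # cosine-smoothed ramp

# Coarse-to-fine: raise `alpha` from 0 toward num_freqs during pose optimization, so the
# photometric loss is smooth early on and high-frequency detail is added progressively.
pts = torch.rand(1024, 3)                                     # example batch of 3D sample points
enc = positional_encoding(pts, num_freqs=10)                  # (1024, 10, 6)
enc = enc * lowpass_weights(alpha=4.5, num_freqs=10)[:, None] # apply per-band weights
```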

Implementation

Place Recognition

  • Data Augmentation: a machine-learning technique that reduces overfitting by training the model on several slightly modified copies of existing data.

    First, uniformly sample several positions in a horizontal rectangular area around each original pose. Then add random perturbations on each axis, drawn uniformly from a symmetric interval whose half-width is the maximum perturbation amplitude, to form the sampled poses (see the sketch below). These poses are fed into Mega-NeRF to generate the rendered observations.

    To avoid memory explosion, we generate the poses using the method above and use Mega-NeRF to render images during specific epochs of pose regression training.

    Additionally, Mega-NeRF’s appearance embeddings are selected by randomly interpolating those of the training set, which can be considered a data augmentation technique that improves the robustness of the APR model under different lighting conditions.
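
The sketch referenced above is a minimal, illustrative version of the pose sampling. The function name, the extent/perturbation parameters, and the choice to leave rotations untouched are my own assumptions, not the paper's settings.

```python
import numpy as np

def sample_augmented_poses(pose, n_samples=8, xy_extent=5.0, max_perturb=1.0, seed=0):
    """Sample augmented camera poses around one original 4x4 camera-to-world pose.

    Positions are drawn uniformly from a horizontal rectangle around the original
    position, then every translation axis receives an extra uniform perturbation in
    [-max_perturb, max_perturb]. Rotations are kept fixed in this sketch."""
    rng = np.random.default_rng(seed)
    sampled = []
    for _ in range(n_samples):
        p = pose.copy()
        p[0, 3] += rng.uniform(-xy_extent, xy_extent)                # horizontal rectangle, x
        p[1, 3] += rng.uniform(-xy_extent, xy_extent)                # horizontal rectangle, y
        p[:3, 3] += rng.uniform(-max_perturb, max_perturb, size=3)   # per-axis perturbation
        sampled.append(p)
    return np.stack(sampled)   # (n_samples, 4, 4): rendered with Mega-NeRF during training
```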

    (Figure: network structure of VGG-16)

  • Pose Regressor: Absolute pose regressor (APR) networks are trained to estimate the pose of the camera given a captured image.

    • Architecture: built on top of VGG-16’s lightweight network structure, we use 4 fully connected layers to learn pose information from image sequences (a sketch follows this list).

      • Input: the observed image (resized to a fixed resolution) and the rendered observations.

      • Output: the corresponding estimated poses for the observed and rendered images.

      • Loss Function: a weighted sum of the pose errors on real observed images and on NeRF-rendered observations, with the real-image term weighted more heavily (in general, the model should trust real-world data more and learn more from it).
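
A rough sketch of such a VGG-16-based absolute pose regressor follows. The fully connected layer widths, the 7-DoF translation-plus-quaternion output, and the loss weights are my own guesses, not the paper's configuration.

```python
import torch
import torch.nn as nn
from torchvision.models import vgg16

class PoseRegressor(nn.Module):
    """VGG-16 convolutional trunk followed by 4 fully connected layers that regress
    a 3-DoF translation and a unit quaternion (layer widths are illustrative)."""
    def __init__(self):
        super().__init__()
        self.features = vgg16(weights="IMAGENET1K_V1").features   # pretrained conv trunk
        self.pool = nn.AdaptiveAvgPool2d((7, 7))
        self.head = nn.Sequential(                                 # 4 fully connected layers
            nn.Linear(512 * 7 * 7, 1024), nn.ReLU(),
            nn.Linear(1024, 256), nn.ReLU(),
            nn.Linear(256, 64), nn.ReLU(),
            nn.Linear(64, 7),                                      # [tx, ty, tz, qw, qx, qy, qz]
        )

    def forward(self, image):                                      # image: (B, 3, H, W)
        x = self.pool(self.features(image)).flatten(1)
        t, q = self.head(x).split([3, 4], dim=-1)
        return t, q / q.norm(dim=-1, keepdim=True)                 # normalized quaternion

def pose_loss(t_pred, q_pred, t_gt, q_gt):
    """Simple L1 translation + rotation loss for one batch (a common APR loss form)."""
    return (t_pred - t_gt).abs().mean() + (q_pred - q_gt).abs().mean()

# Training mixes real and NeRF-rendered observations; the real-image term is weighted
# more heavily, in the spirit of the note above (the weights below are placeholders):
# total = 1.0 * pose_loss(*real_batch) + 0.5 * pose_loss(*rendered_batch)
```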

Pose Optimization

  • MAP Estimation Problem[A] Formulation: the refined pose is the maximizer of the posterior over camera poses given the observed image; the place recognition stage provides the prior (initial estimate), and the trained Mega-NeRF defines the likelihood through rendering.

    We optimize the posterior by minimizing the photometric error between the observed image and the image rendered by the trained Mega-NeRF at the current pose estimate.

  • Optimization on Tangent Plane: We optimize the pose on the tangent plane of SE(3) to ensure smoother convergence. [1]

    TODO: I know nothing about the tangent space of SE(3) (Lie algebra) yet :(
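
Since the tangent-plane step is flagged as a TODO above, here is a minimal, hedged sketch of what it typically looks like: the optimization variable is a 6-vector increment in the tangent space se(3), the exponential map turns it into a rigid transform, and the photometric error between the observation and the NeRF rendering is back-propagated into that increment. `render_fn` stands in for differentiable Mega-NeRF rendering, the left-multiplication composition order is an assumption, and none of this is the paper's actual code.

```python
import torch

def se3_exp(xi):
    """Exponential map from xi = [rho, phi] in the tangent space se(3) to a 4x4
    rigid transform in SE(3), via Rodrigues' formula (standard Lie-group result)."""
    rho, phi = xi[:3], xi[3:]
    theta = phi.norm()
    zero = xi.new_zeros(())
    K = torch.stack([torch.stack([zero, -phi[2], phi[1]]),
                     torch.stack([phi[2], zero, -phi[0]]),
                     torch.stack([-phi[1], phi[0], zero])])        # skew-symmetric matrix of phi
    I = torch.eye(3)
    if theta < 1e-8:                                               # small-angle approximation
        R, V = I + K, I + 0.5 * K
    else:
        a = torch.sin(theta) / theta
        b = (1 - torch.cos(theta)) / theta ** 2
        c = (theta - torch.sin(theta)) / theta ** 3
        R = I + a * K + b * (K @ K)                                # rotation
        V = I + b * K + c * (K @ K)                                # left Jacobian, applied to rho
    top = torch.cat([R, (V @ rho).unsqueeze(-1)], dim=1)           # (3, 4)
    return torch.cat([top, torch.tensor([[0.0, 0.0, 0.0, 1.0]])], dim=0)

def refine_pose(render_fn, observed, T_init, iters=200, lr=1e-2):
    """Refine an initial pose by minimizing the photometric residual between the
    observation and the image rendered by the frozen NeRF, optimizing only the
    tangent-space increment xi (coarse-to-fine TDLF weighting would sit inside render_fn)."""
    T_init = torch.as_tensor(T_init, dtype=torch.float32)          # (4, 4) from place recognition
    xi = torch.zeros(6, requires_grad=True)                        # increment on the tangent plane
    optimizer = torch.optim.Adam([xi], lr=lr)
    for _ in range(iters):
        pose = se3_exp(xi) @ T_init                                # composition order is an assumption
        loss = ((render_fn(pose) - observed) ** 2).mean()          # photometric (L2) error
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    return (se3_exp(xi) @ T_init).detach()
```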

Explanations & References

[1] Adamkiewicz, M., Chen, T., Caccavale, A., Gardner, R., Culbertson, P., Bohg, J., & Schwager, M. (2022). Vision-only robot navigation in a neural radiance world. IEEE Robotics and Automation Letters, 7(2), 4606-4613. https://arxiv.org/pdf/2110.00168.pdf

Turki, H., Ramanan, D., & Satyanarayanan, M. (2022). Mega-NeRF: Scalable construction of large-scale NeRFs for virtual fly-throughs. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 12922-12931). https://arxiv.org/pdf/2112.10703.pdf

Yen-Chen, L., Florence, P., Barron, J. T., Rodriguez, A., Isola, P., & Lin, T. Y. (2021, September). iNeRF: Inverting neural radiance fields for pose estimation. In 2021 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) (pp. 1323-1330). IEEE. https://arxiv.org/pdf/2012.05877.pdf

[A] Maximum a posteriori (MAP) estimation: a method of statistical inference that uses Bayes' theorem to find the most likely estimate of a parameter given some observed data.