[Reading] LATITUDE: Robotic Global Localization with Truncated Dynamic Low-pass Filter in City-scale NeRF


This paper proposes a two-stage localization mechanism in city-scale NeRF.

Abstract

Neural Radiance Fields (NeRFs) have achieved great success in representing complex 3D scenes with high-resolution details and efficient memory use. Nevertheless, current NeRF-based pose estimators lack an initial pose prediction and are prone to local optima during optimization. In this paper, we present LATITUDE: Global Localization with Truncated Dynamic Low-pass Filter, which introduces a two-stage localization mechanism in city-scale NeRF.

  • In the place recognition stage, we train a regressor on images generated from trained NeRFs, which provides an initial value for global localization.

  • In the pose optimization stage, we minimize the residual between the observed image and the rendered image by directly optimizing the pose on the tangent plane.

  • To avoid falling into local optima, we introduce a Truncated Dynamic Low-pass Filter (TDLF) for coarse-to-fine pose registration.

We evaluate our method on both synthetic and real-world data and show its potential applications for high-precision navigation in large-scale city scenes.

System Design

  • Place Recognition
    1. The original poses are sampled, together with additional poses around them.
    2. Each pose vector is passed through the trained and frozen Mega-NeRF with shuffled appearance embeddings to render synthetic images.
    3. A pose regressor network predicts initial poses for the input images.
  • Pose Optimization
    1. The initial poses are passed through a positional encoding filter.
    2. Each pose vector is passed through the trained and frozen Mega-NeRF, which renders an image.
    3. The photometric error between the rendered image and the observed image is computed and back-propagated, refining the pose with the TDLF.
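These notes do not spell out how the TDLF works. As a hedged illustration of the general idea behind a low-pass filter on positional encoding for coarse-to-fine registration, the sketch below implements the smooth frequency mask popularized by BARF (Lin et al., 2021): band $k$ of the encoding is switched on gradually as a progress parameter `alpha` grows, so early iterations only see low frequencies and the loss landscape is smoother. The truncation and the dynamic schedule that distinguish LATITUDE's TDLF are not described here, so they are deliberately left out; `alpha`, `coarse_to_fine_weights` and `filtered_encoding` are hypothetical names.

```python
import math

def coarse_to_fine_weights(alpha, num_freqs):
    """BARF-style soft mask: weight of each positional-encoding band for progress alpha."""
    weights = []
    for k in range(num_freqs):
        d = alpha - k
        if d < 0:
            w = 0.0  # band not reached yet: fully suppressed
        elif d < 1:
            w = (1 - math.cos(d * math.pi)) / 2  # smooth ramp from 0 to 1
        else:
            w = 1.0  # band fully enabled
        weights.append(w)
    return weights

def filtered_encoding(p, alpha, num_freqs):
    """Encode scalar p as [p, sin/cos pairs], scaling band k by its coarse-to-fine weight."""
    out = [p]
    for k, w in enumerate(coarse_to_fine_weights(alpha, num_freqs)):
        freq = (2 ** k) * math.pi
        out += [w * math.sin(freq * p), w * math.cos(freq * p)]
    return out
```

In use, `alpha` would be raised from 0 to `num_freqs` over the course of pose optimization, so the high-frequency bands, which create many local optima, only contribute once the pose is already roughly aligned.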

Implementation

Place Recognition

  • Data Augmentation: a machine learning technique that reduces overfitting by training a model on several slightly modified copies of existing data.

    First, uniformly sample several positions in a horizontal $H\times W$ rectangle around each original pose $T_{real}(\mathbf x_{real},\mathbf q_{real})$. Then add random perturbations on each axis, drawn uniformly from $[-\theta,\theta]$, where $\theta$ is the maximum perturbation amplitude, to form the sampled poses $T_{syn}(\mathbf x_{syn},\mathbf q_{syn})$. These poses are fed into Mega-NeRF to render the synthetic observations $I_{syn}$.

    To avoid exhausting memory, we generate the poses with the method above and render the images with Mega-NeRF only during specific epochs of pose regression training, instead of rendering the whole synthetic set in advance.

    Additionally, Mega-NeRF's appearance embeddings are chosen by randomly interpolating those of the training set, which acts as a further data augmentation that makes the APR model more robust under different lighting conditions.

    [Figure: Network structure of VGG-16]
  • Pose Regressor: Absolute pose regressor (APR) networks are trained to estimate the pose of the camera given a captured image.

    • Architecture: built on top of VGG16's lightweight network structure, 4 fully connected layers learn pose information from the image sequences.

      • Input: observed images $I_{real}$ (resolution $480\times 240$) and rendered observations $I_{syn}$.

      • Output: the corresponding estimated poses $\hat T_{real}(\hat{\mathbf x}_{real},\hat{\mathbf q}_{real})$ and $\hat T_{syn}(\hat{\mathbf x}_{syn},\hat{\mathbf q}_{syn})$.

      • Loss Function: the synthetic term is weighted by $\beta$ because, in general, the model should trust real-world data more and learn more from it.

        $$
        \begin{aligned}
        L_{syn}&=\Vert\hat{\mathbf x}_{syn}-\mathbf x_{syn}\Vert_2+\gamma\left\Vert\hat{\mathbf q}_{syn}-\frac{\mathbf q_{syn}}{\Vert\hat{\mathbf q}_{syn}\Vert}\right\Vert_2\\
        L_{real}&=\Vert\hat{\mathbf x}_{real}-\mathbf x_{real}\Vert_2+\gamma\left\Vert\hat{\mathbf q}_{real}-\frac{\mathbf q_{real}}{\Vert\hat{\mathbf q}_{real}\Vert}\right\Vert_2\\
        L&=L_{real}+\beta L_{syn}
        \end{aligned}
        $$
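A minimal plain-Python sketch of the regression loss, with positions as 3-tuples and quaternions as 4-tuples. The helper names `pose_loss` and `total_loss` are hypothetical; a real training loop would compute the same quantities on batched tensors with autograd. The quaternion term follows the formula exactly as transcribed, dividing $\mathbf q$ by $\Vert\hat{\mathbf q}\Vert$.

```python
import math

def norm(v):
    """Euclidean (L2) norm of a sequence of numbers."""
    return math.sqrt(sum(c * c for c in v))

def pose_loss(x_hat, q_hat, x, q, gamma):
    """||x_hat - x||_2 + gamma * ||q_hat - q / ||q_hat|| ||_2, per the formula above."""
    pos_err = norm([a - b for a, b in zip(x_hat, x)])
    q_hat_norm = norm(q_hat)
    rot_err = norm([a - b / q_hat_norm for a, b in zip(q_hat, q)])
    return pos_err + gamma * rot_err

def total_loss(real, syn, gamma, beta):
    """L = L_real + beta * L_syn; real and syn are (x_hat, q_hat, x, q) tuples."""
    return pose_loss(*real, gamma) + beta * pose_loss(*syn, gamma)
```

For example, a real sample with a 3-4-0 position error and a perfect rotation contributes 5.0, and a synthetic sample with a 1-unit position error adds $\beta\cdot 1$ on top.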
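The pose sampling from the Data Augmentation step can be sketched as follows. This is a hypothetical helper, not the authors' code, and it fixes several details the notes leave open: the $H\times W$ rectangle is covered by a regular grid of $n_h\times n_w$ cell centers in the horizontal $(x, y)$ plane, the height is kept unchanged, quaternions are stored as $(w, x, y, z)$, and the perturbation in $[-\theta,\theta]$ is read as a rotation angle in radians about each of the three axes.

```python
import math
import random

def quat_mul(a, b):
    """Hamilton product of two quaternions stored as (w, x, y, z)."""
    aw, ax, ay, az = a
    bw, bx, by, bz = b
    return (
        aw * bw - ax * bx - ay * by - az * bz,
        aw * bx + ax * bw + ay * bz - az * by,
        aw * by - ax * bz + ay * bw + az * bx,
        aw * bz + ax * by - ay * bx + az * bw,
    )

def axis_angle_quat(axis, angle):
    """Unit quaternion for a rotation of `angle` radians about a unit `axis`."""
    s = math.sin(angle / 2)
    return (math.cos(angle / 2), axis[0] * s, axis[1] * s, axis[2] * s)

def sample_synthetic_poses(x_real, q_real, H, W, n_h, n_w, theta, rng=random):
    """Return (x_syn, q_syn) pairs around one real pose (hypothetical helper)."""
    poses = []
    for i in range(n_h):
        for j in range(n_w):
            # Grid cell centers spanning an H x W rectangle around the original position.
            dx = H * ((i + 0.5) / n_h - 0.5)
            dy = W * ((j + 0.5) / n_w - 0.5)
            x_syn = (x_real[0] + dx, x_real[1] + dy, x_real[2])
            # Rotate about each axis by an angle drawn uniformly from [-theta, theta].
            q_syn = q_real
            for axis in ((1, 0, 0), (0, 1, 0), (0, 0, 1)):
                q_syn = quat_mul(q_syn, axis_angle_quat(axis, rng.uniform(-theta, theta)))
            poses.append((x_syn, q_syn))
    return poses
```

Each returned pose would then be rendered by Mega-NeRF to obtain $I_{syn}$; passing a seeded `random.Random` makes the augmentation reproducible.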
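One plausible reading of randomly interpolating the training-set appearance embeddings is a linear blend of two randomly chosen embeddings with a uniform random weight, sketched below. The helper name and the two-embedding blend are assumptions; Mega-NeRF's appearance embeddings are learned vectors, represented here as plain lists.

```python
import random

def interpolate_appearance(embeddings, rng=random):
    """Blend two distinct, randomly chosen appearance embeddings (hypothetical helper)."""
    a, b = rng.sample(range(len(embeddings)), 2)  # two different training-set indices
    t = rng.random()  # interpolation weight in [0, 1)
    return [(1 - t) * ea + t * eb for ea, eb in zip(embeddings[a], embeddings[b])]
```

Because the result always lies on a segment between two real embeddings, the rendered lighting stays within the range seen during Mega-NeRF training while still varying from sample to sample.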

Pose Optimization

  • Formulation as a MAP Estimation Problem[A]:

    $$P(\hat T_k\mid F(\epsilon), I_k)\propto P(\hat I_k\mid\hat T_k, F(\epsilon))\,P(\hat T_k\mid I_k)$$

    Here $P(\hat T_k\mid I_k)$ is the prior provided by place recognition, and $F(\epsilon)$ denotes the trained Mega-NeRF.

    We optimize the posterior $P(\hat T_k\mid F(\epsilon), I_k)$ by minimizing the photometric error between $I_k$ and the image $\hat I_k$ rendered by $F(\epsilon)$.

  • Optimization on the Tangent Plane: we optimize the pose on the tangent plane to ensure smoother convergence [1].

    TODO: I know nothing about $SE(3)$ :(
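The step from the MAP formulation to photometric-error minimization can be made explicit by taking the negative logarithm, which turns the product into a sum. Two assumptions are made here for illustration, not as statements about the paper's exact objective: the pixel noise is i.i.d. Gaussian, so the likelihood term becomes a squared photometric error up to a positive scale and an additive constant; and the place-recognition prior is used only to initialize $\hat T_k$ rather than kept as an explicit penalty. $\hat I_k(\hat T_k)$ denotes the image rendered by $F(\epsilon)$ at pose $\hat T_k$.

$$
\begin{aligned}
\hat T_k^{*}&=\arg\max_{\hat T_k} P(\hat T_k\mid F(\epsilon), I_k)\\
&=\arg\min_{\hat T_k}\Big[-\log P(\hat I_k\mid\hat T_k, F(\epsilon))-\log P(\hat T_k\mid I_k)\Big]\\
&\approx\arg\min_{\hat T_k}\big\Vert I_k-\hat I_k(\hat T_k)\big\Vert_2^2
\end{aligned}
$$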
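As a hedged sketch of what optimizing on the tangent plane commonly means in NeRF pose estimation (iNeRF, for instance, parametrizes pose updates with exponential coordinates): the update is a twist $\xi=(\boldsymbol\rho,\boldsymbol\omega)\in\mathbb R^6$ in the Lie algebra $\mathfrak{se}(3)$, the gradient step is taken on $\xi$, and the exponential map sends it back onto $SE(3)$, so every iterate remains a valid rigid transform. The $(\boldsymbol\rho,\boldsymbol\omega)$ ordering, the left-multiplicative update and the function names are assumptions; `grad_xi` would come from back-propagating the photometric error through Mega-NeRF, which is not shown.

```python
import numpy as np

def hat(w):
    """Skew-symmetric matrix such that hat(w) @ v == np.cross(w, v)."""
    return np.array([[0.0, -w[2], w[1]],
                     [w[2], 0.0, -w[0]],
                     [-w[1], w[0], 0.0]])

def se3_exp(xi):
    """Map a twist xi = (rho, omega) in R^6 to a 4x4 rigid transform in SE(3)."""
    xi = np.asarray(xi, dtype=float)
    rho, omega = xi[:3], xi[3:]
    theta = np.linalg.norm(omega)
    W = hat(omega)
    if theta < 1e-8:
        # Taylor expansion near the identity avoids dividing by ~0.
        R = np.eye(3) + W + W @ W / 2.0
        V = np.eye(3) + W / 2.0 + W @ W / 6.0
    else:
        A = np.sin(theta) / theta
        B = (1 - np.cos(theta)) / theta**2
        C = (theta - np.sin(theta)) / theta**3
        R = np.eye(3) + A * W + B * (W @ W)  # Rodrigues' formula
        V = np.eye(3) + B * W + C * (W @ W)  # left Jacobian applied to rho
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = V @ rho
    return T

def tangent_step(T, grad_xi, lr):
    """Gradient step taken in the tangent space, mapped back and applied on the left."""
    return se3_exp(-lr * np.asarray(grad_xi, dtype=float)) @ T
```

Because the update lives in a flat 6-D space, ordinary optimizers such as gradient descent or Adam apply directly, while the exponential map keeps the rotation block orthonormal with determinant 1.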

Explanations & References

[1] Adamkiewicz, M., Chen, T., Caccavale, A., Gardner, R., Culbertson, P., Bohg, J., & Schwager, M. (2022). Vision-only robot navigation in a neural radiance world. IEEE Robotics and Automation Letters, 7(2), 4606-4613. https://arxiv.org/pdf/2110.00168.pdf

Turki, H., Ramanan, D., & Satyanarayanan, M. (2022). Mega-NeRF: Scalable construction of large-scale NeRFs for virtual fly-throughs. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 12922-12931). https://arxiv.org/pdf/2112.10703.pdf

Yen-Chen, L., Florence, P., Barron, J. T., Rodriguez, A., Isola, P., & Lin, T. Y. (2021, September). iNeRF: Inverting neural radiance fields for pose estimation. In 2021 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) (pp. 1323-1330). IEEE. https://arxiv.org/pdf/2012.05877.pdf

[A] Maximum a Posteriori (MAP) Estimation: a method of statistical inference that uses Bayes' theorem to find the most likely estimate of a parameter given some observed data.
