WayIL: Image-based Indoor Localization with Wayfinding Maps

1 NAVER LABS, 2 RLLAB, Seoul National University
*Work done during the first author's internship at NAVER LABS.

ICRA 2024


This paper tackles a localization problem in large-scale indoor environments with wayfinding maps. A wayfinding map abstractly portrays the environment, and humans can localize themselves based on the map. However, when it comes to using it for robot localization, large geometrical discrepancies between the wayfinding map and the real world make it hard to use conventional localization methods. Our objective is to estimate a robot pose within a wayfinding map, utilizing RGB images from perspective cameras.

We introduce two different imagination modules which are inspired by how humans can comprehend and interpret their surroundings for localization purposes. These modules jointly learn how to effectively observe the first-person-view (FPV) world to interpret bird-eye-view (BEV) maps. Providing explicit guidance to the two imagination modules significantly improves the precision of the localization system. We demonstrate the effectiveness of the proposed approach using real-world datasets, which are collected from various large-scale crowded indoor environments. The experimental results show that, in 85% of scenarios, the proposed localization system can estimate its pose within 3m in large indoor spaces.


Localization in large-scale indoor environments with wayfinding maps.

Wayfinding Map? : An illustrated map found in large public spaces, such as shopping malls and department stores.

Since the primary purpose of a wayfinding map is to convey information rather than to accurately represent the environment, it often highlights specific areas in detail and simplifies less significant regions. Furthermore, while extra information such as GPS or satellite imagery is available in outdoor environments, such information is difficult to obtain indoors. Nevertheless, humans can still locate themselves on a wayfinding map without expensive sensors or precise maps. This paper aims to develop a localization system inspired by these human capabilities.



The proposed localization system starts by building an informative bird-eye-view (BEV) representation from first-person-view (FPV) information. We calculate the 3D position of each pixel using estimated depth information, obtained with Omnidata [1]. Inspired by [2], we estimate the floor area using the estimated surface normals and adjust the depth scale so that the height of the floor matches the known camera height.

[1] Eftekhar, Ainaz, et al. "Omnidata: A scalable pipeline for making multi-task mid-level vision datasets from 3d scans." ICCV. 2021.

[2] Xue, Feng, et al. "Toward hierarchical self-supervised monocular absolute depth estimation for autonomous driving applications." IROS. IEEE, 2020.
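The BEV construction above can be sketched as follows. This is a minimal NumPy sketch, not WayIL's actual pipeline: the function name, grid resolution, binary-occupancy splatting, and the normal-sign convention (+y toward the floor) are illustrative assumptions — the real system projects learned features and uses a learned depth network.

```python
import numpy as np

def pixels_to_bev(depth, K, cam_height, normals, floor_thresh=0.9,
                  bev_size=64, cell_m=0.25):
    """Unproject pixels to 3D, rescale depth via the floor plane, and
    splat points into an egocentric BEV grid (illustrative sketch)."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    # Back-project each pixel through the pinhole intrinsics K.
    rays = np.stack([(u - K[0, 2]) / K[0, 0],
                     (v - K[1, 2]) / K[1, 1],
                     np.ones((h, w))], axis=-1)
    pts = rays * depth[..., None]            # camera-frame 3D points

    # Floor pixels: surface normal close to "down" in camera coordinates
    # (assuming +y points toward the floor; sign conventions vary).
    floor = normals[..., 1] > floor_thresh
    # Rescale so the median floor height matches the known camera height.
    scale = cam_height / np.median(pts[floor, 1]) if floor.any() else 1.0
    pts = pts * scale

    # Splat x (right) / z (forward) into a robot-centered BEV grid.
    bev = np.zeros((bev_size, bev_size))
    gx = (pts[..., 0] / cell_m + bev_size / 2).astype(int)
    gz = (pts[..., 2] / cell_m).astype(int)
    ok = (gx >= 0) & (gx < bev_size) & (gz >= 0) & (gz < bev_size)
    bev[gz[ok], gx[ok]] = 1.0
    return bev, scale
```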

Cross-correlation based Localization

The proposed localization system is based on a cross-correlation operation, inspired by [3], [4], [5]. The constructed egocentric BEV representation is compared against the wayfinding map through a cross-correlation (or convolution) operation while rotating the BEV representation over candidate orientations.

[3] Henriques, et al. "Mapnet: An allocentric spatial memory for mapping environments." CVPR. 2018.

[4] Sarlin, Paul-Edouard, et al. "Orienternet: Visual localization in 2d public maps with neural matching." CVPR. 2023.

[5] Kwon, Obin, et al. "Renderable neural radiance map for visual navigation." CVPR. 2023.
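A stripped-down version of this rotating cross-correlation search might look like the following. This is a sketch under assumptions: `localize`, the rotation count, and raw-intensity correlation are illustrative — the actual system correlates learned feature maps, not pixel intensities.

```python
import numpy as np
from scipy.ndimage import rotate
from scipy.signal import correlate2d

def localize(bev, map_img, n_rot=8):
    """Slide the egocentric BEV over the map at several rotations and
    return the (row, col, angle) with the highest correlation score."""
    best_score, best_pose = -np.inf, None
    for k in range(n_rot):
        angle = 360.0 * k / n_rot
        # Rotate the egocentric template to test this heading hypothesis.
        templ = rotate(bev, angle, reshape=False, order=1)
        # Cross-correlate the rotated template against the whole map.
        score = correlate2d(map_img, templ, mode="same")
        idx = np.unravel_index(np.argmax(score), score.shape)
        if score[idx] > best_score:
            best_score, best_pose = score[idx], (idx[0], idx[1], angle)
    return best_pose
```

Taking the argmax over all translations and rotations yields a single pose hypothesis; in practice the full score volume can also be kept as a distribution over poses.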

Imagination Modules

We hypothesized that this localization approach requires the robot to gradually learn where to look for localization cues and to understand how the observed area would be represented on the BEV map. Moreover, we believe this comprehension is a key factor enabling localization across the two different viewpoints.
First, we designed the FPV-Imagine module, which stimulates understanding from the FPV image. It imagines, in first-person view, how the wayfinding map would appear in the current RGB images, learning which parts of the FPV image would be drawn on the BEV map and which would not.
Second, we designed the BEV-Imagine module, which advances understanding of the BEV representation. It imagines, in bird-eye view, how the current surroundings would be drawn as a wayfinding map.
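The two modules can be pictured as lightweight prediction heads attached to the FPV and BEV feature maps. The PyTorch sketch below is illustrative only: the class names, layer sizes, channel counts, and output parameterizations are assumptions, not WayIL's actual architecture.

```python
import torch
import torch.nn as nn

class FPVImagine(nn.Module):
    """Predicts, per FPV pixel, whether that pixel would be drawn on the
    wayfinding map (hypothetical head; layer sizes are assumptions)."""
    def __init__(self, c_in=64):
        super().__init__()
        self.head = nn.Sequential(nn.Conv2d(c_in, 32, 3, padding=1),
                                  nn.ReLU(),
                                  nn.Conv2d(32, 1, 1))
    def forward(self, fpv_feat):
        # (B, 1, H, W) soft mask: which FPV pixels appear on the map.
        return torch.sigmoid(self.head(fpv_feat))

class BEVImagine(nn.Module):
    """Predicts a map-style rendering of the surroundings from the
    egocentric BEV features (hypothetical head)."""
    def __init__(self, c_in=64, n_classes=3):
        super().__init__()
        self.head = nn.Conv2d(c_in, n_classes, 1)
    def forward(self, bev_feat):
        # (B, n_classes, H, W) logits over map drawing categories.
        return self.head(bev_feat)
```

Both heads would be trained with explicit supervision (the "explicit guidance" mentioned above), so that the shared encoders learn what is map-relevant in each view.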

Particle Filter

This paper focuses on large-scale indoor environments, which frequently have repetitive patterns and numerous dynamic obstacles. Maps of such environments might not always be accurate, leading the localization system to occasionally select an incorrect location that resembles the true one. To address this issue, we integrated a particle filter with the WayIL system to fuse information over time and improve the estimate.
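One predict/update/resample step of such a filter, fusing per-pose localization scores over time, could be sketched as follows. This is an illustrative sketch: `score_fn` (standing in for the network's map-matching score at a pose), the noise levels, and the resampling threshold are all assumptions.

```python
import numpy as np

def particle_filter_step(particles, weights, motion, score_fn,
                         motion_noise=(0.1, 0.1, 0.05), rng=None):
    """One step of a particle filter over (x, y, theta) poses.
    score_fn(pose) -> nonnegative localization score at that pose."""
    rng = rng or np.random.default_rng()
    # Predict: apply odometry with additive Gaussian noise.
    particles = particles + motion + rng.normal(0, motion_noise,
                                                particles.shape)
    # Update: reweight each particle by the localization score at its pose.
    weights = weights * np.array([score_fn(p) for p in particles])
    weights = weights / weights.sum()
    # Resample when the effective sample size collapses.
    if 1.0 / np.sum(weights ** 2) < 0.5 * len(particles):
        idx = rng.choice(len(particles), len(particles), p=weights)
        particles = particles[idx]
        weights = np.full(len(particles), 1.0 / len(particles))
    return particles, weights
```

Repeating this step lets consistent evidence accumulate, so a transient wrong peak in the correlation score is outvoted by particles tracking the true pose.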


Experiments on Real Wayfinding Map