ConGeo:
Robust Cross-view Geo-localization across Ground View Variations

École Polytechnique Fédérale de Lausanne (EPFL), Wuhan University
*Indicates Equal Contribution



Motivation

Cross-view geo-localization aims to localize a ground-level query image by matching it to its corresponding geo-referenced aerial view. In real-world scenarios, the task must accommodate diverse ground images captured by users with varying orientations and reduced fields of view (FoVs), including the following settings:

  • North-aligned: Ground-view and aerial-view images are aligned to the North.
  • Unknown Orientation: Ground-view images are of arbitrary orientations (FoV=360°).
  • Limited Field of View: Ground-view images are of arbitrary orientations and limited fields of view (e.g., FoV=70°, 90°, or 180°).
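
The three settings above can be simulated from a single North-aligned panorama. The following is a minimal sketch, assuming the panorama is an equirectangular array of shape (height, width, channels) whose width spans 360°; the function names `shift_panorama` and `crop_fov`, and the sign convention of the shift, are illustrative choices and are not taken from the ConGeo code.

```python
import numpy as np


def shift_panorama(pano: np.ndarray, angle_deg: float) -> np.ndarray:
    """Cyclically shift a 360° panorama (H, W, C) along its width.

    Assumption: the full width covers 360°, so an angle maps linearly
    to a number of pixel columns.
    """
    width = pano.shape[1]
    shift_px = int(round(angle_deg / 360.0 * width)) % width
    return np.roll(pano, shift_px, axis=1)


def crop_fov(pano: np.ndarray, fov_deg: float, start_deg: float = 0.0) -> np.ndarray:
    """Keep a window of `fov_deg` degrees that starts at `start_deg`.

    The window wraps around the panorama seam when needed.
    """
    width = pano.shape[1]
    # Bring the column at `start_deg` to index 0, then cut the window.
    realigned = shift_panorama(pano, -start_deg)
    crop_px = int(round(fov_deg / 360.0 * width))
    return realigned[:, :crop_px]


# Tiny example: 8 columns, so each column covers 45°.
pano = np.arange(8).reshape(1, 8, 1)
unknown_orientation = shift_panorama(pano, 90)        # FoV = 360°, arbitrary heading
limited_fov = crop_fov(pano, 90, start_deg=315)       # FoV = 90°, arbitrary heading
```

In this sketch the North-aligned setting is simply the unmodified panorama, the unknown-orientation setting applies a random shift, and the limited-FoV setting applies a random start angle followed by a crop.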
An Overview of ConGeo


However, existing learning pipelines are orientation-specific or FoV-specific, demanding separate model training for different ground view variations. To tackle this challenge, we propose ConGeo, a single- and cross-modal Contrastive method for Geo-localization that improves the base model's robustness across various ground view variations.

Method

ConGeo enhances robustness and consistency in feature representations to improve a model's invariance to orientation and its resilience to FoV variations, by enforcing proximity between ground view variations of the same location.


ConGeo's learning pipeline. For feature representation in the left and right boxes, the North-aligned ground image, the transformed ground image, and the aerial view are sent to their respective encoders. Then in the feature space, the single- and cross-modal contrastive learning losses are applied to enforce the proximity of the paired images.
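
The idea of pulling paired embeddings together in feature space can be sketched with a generic InfoNCE-style contrastive loss. This is an illustrative sketch only: this page does not spell out the exact form of ConGeo's losses, so the symmetric InfoNCE formulation, the temperature of 0.07, the equal weighting, and the names `info_nce`, `symmetric_info_nce`, and `combined_objective` are all assumptions. The inputs stand for encoder outputs with one row per location.

```python
import numpy as np


def l2_normalize(x: np.ndarray) -> np.ndarray:
    """Scale each row to unit length (rows must be non-zero)."""
    return x / np.linalg.norm(x, axis=1, keepdims=True)


def info_nce(a: np.ndarray, b: np.ndarray, temperature: float = 0.07) -> float:
    """One-directional InfoNCE: row i of `a` should match row i of `b`."""
    logits = l2_normalize(a) @ l2_normalize(b).T / temperature
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_prob)))


def symmetric_info_nce(a: np.ndarray, b: np.ndarray, temperature: float = 0.07) -> float:
    """Average of the a-to-b and b-to-a directions."""
    return 0.5 * (info_nce(a, b, temperature) + info_nce(b, a, temperature))


def combined_objective(ground, ground_transformed, aerial, temperature=0.07) -> float:
    """Hypothetical combination of cross-modal and single-modal terms.

    Assumption: equal weights; the actual ConGeo weighting is not given here.
    """
    cross_modal = (
        symmetric_info_nce(ground, aerial, temperature)
        + symmetric_info_nce(ground_transformed, aerial, temperature)
    )
    single_modal = symmetric_info_nce(ground, ground_transformed, temperature)
    return cross_modal + single_modal


# Toy embeddings: 4 locations, perfectly aligned versus deliberately mismatched.
aligned = np.eye(4)
mismatched = np.roll(aligned, 1, axis=0)
loss_aligned = info_nce(aligned, aligned)
loss_mismatched = info_nce(aligned, mismatched)
```

With the toy embeddings, the aligned pairs give a loss close to zero while the mismatched pairs give a much larger one, which is the behavior a contrastive objective relies on.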

The proposed training objective combines cross-modal and single-modal contrastive learning through four losses.


Results


Experiments are performed on four cross-view geo-localization benchmarks to investigate ConGeo's robustness, adaptability, and generalization ability. Analyses of orientation invariance and activation maps further confirm ConGeo's robustness.

Robustness across different settings

Using a single model, ConGeo outperforms SOTA FoV-specific or orientation-specific models under ground-view variations and achieves leading performance under North alignment.



Comparison with SOTA FoV-specific methods under different settings on the CVUSA dataset.




Retrieval results of the baseline model and of ConGeo under the North-aligned setting and the limited FoV setting.


Superiority over data augmentations

ConGeo is more effective than a range of data augmentation methods.



Comparison between ConGeo and different data augmentation methods under the unknown orientation setting and the limited FoV setting on the CVUSA dataset. “Shift” denotes using shifted query images, “FoV” denotes using query images of limited FoVs, and “Rotate” denotes randomly rotating aerial images as data augmentation.


Generalization ability on unseen ground view variations

ConGeo generalizes better to unseen ground view variations than the baseline model, with or without data augmentation.



Comparison on unseen ground view variations (e.g., Random FoVs, Random Zooming, Gaussian Noise, and Motion Blur) between ConGeo and baselines on the CVUSA dataset. “DA” means data augmentation.


Analysis: How does ConGeo achieve robustness?

We analyze orientation invariance to expose the models’ vulnerability to orientation shifts, and visualize activation maps to investigate where the models focus.



ConGeo shows better orientation invariance. We cyclically shift the ground view by an angle (x-axis) before feeding it to the model and measure retrieval performance. Note that “N-A” denotes the North-aligned setting and “DA” means data augmentation.
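
The shift-and-retrieve protocol behind this kind of plot can be sketched as follows. This is a toy illustration under stated assumptions, not the evaluation code used for the figure: the two encoders (`encode_flat`, `encode_pooled`) are hypothetical stand-ins for trained models, the reference embeddings are computed from the unshifted panoramas instead of real aerial images, and top-1 recall is assumed as the retrieval metric.

```python
import numpy as np


def l2_normalize(x: np.ndarray) -> np.ndarray:
    return x / np.linalg.norm(x, axis=1, keepdims=True)


def recall_at_1(query_emb: np.ndarray, ref_emb: np.ndarray) -> float:
    """Fraction of queries whose most similar reference is the true match.

    Assumes query i and reference i belong to the same location.
    """
    sims = query_emb @ ref_emb.T
    return float(np.mean(sims.argmax(axis=1) == np.arange(len(query_emb))))


def orientation_sweep(panos, ref_emb, encode_ground, angles_deg):
    """Recall@1 for each cyclic shift angle; panos has shape (N, H, W)."""
    width = panos.shape[2]
    results = {}
    for angle in angles_deg:
        shift_px = int(round(angle / 360.0 * width)) % width
        shifted = np.roll(panos, shift_px, axis=2)
        results[angle] = recall_at_1(encode_ground(shifted), ref_emb)
    return results


def encode_flat(panos: np.ndarray) -> np.ndarray:
    """Hypothetical orientation-sensitive encoder: flattened pixels."""
    return l2_normalize(panos.reshape(len(panos), -1))


def encode_pooled(panos: np.ndarray) -> np.ndarray:
    """Hypothetical orientation-invariant encoder: average over the width."""
    return l2_normalize(panos.mean(axis=2))


rng = np.random.default_rng(0)
panos = rng.standard_normal((16, 4, 32))  # 16 synthetic single-channel panoramas
angles = [0, 90, 180, 270]
flat_curve = orientation_sweep(panos, encode_flat(panos), encode_flat, angles)
pooled_curve = orientation_sweep(panos, encode_pooled(panos), encode_pooled, angles)
```

In this toy setup the flattened-pixel encoder matches perfectly only at a shift of 0°, whereas the width-pooled encoder is unaffected by the shift; a flatter curve over the angle axis is what orientation invariance looks like under this protocol.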



ConGeo’s activation areas are more consistent across ground view shifts.



BibTeX

@article{mi2024congeo,
  title={ConGeo: Robust Cross-view Geo-localization across Ground View Variations},
  author={Mi, Li and Xu, Chang and Castillo-Navarro, Javiera and Montariol, Syrielle and Yang, Wen and Bosselut, Antoine and Tuia, Devis},
  journal={arXiv preprint arXiv:2403.13965},
  year={2024}
}