Make Some Noise: Unsupervised Remote Sensing Change Detection Using Latent Space Perturbations

University of Ljubljana, Faculty of Computer and Information Science

Unsupervised change detection via latent perturbations

TL;DR: We propose a novel end-to-end framework for unsupervised change detection which generates synthetic changes in latent space during training. It achieves better generalisation across change types and outperforms previous SOTA on average by 14.1 F1 points.

Abstract

Unsupervised change detection (UCD) in remote sensing aims to localise semantic changes between two images of the same region without relying on labelled data during training. Most recent approaches rely either on frozen foundation models in a training-free manner or on training with synthetic changes generated in pixel space. Both strategies inherently rely on predefined assumptions about change types, typically introduced through handcrafted rules, external datasets, or auxiliary generative models. Due to these assumptions, such methods fail to generalise beyond a few change types, limiting their real-world usage, especially in rare or complex scenarios. To address this, we propose MaSoN (Make Some Noise), an end-to-end UCD framework that synthesises diverse changes directly in the latent feature space during training. It generates changes that are dynamically estimated using feature statistics of target data, enabling diverse yet data-driven variation aligned with the target domain. It also easily extends to new modalities, such as SAR. MaSoN generalises strongly across diverse change types and achieves state-of-the-art performance on five benchmarks, improving the average F1 score by 14.4 percentage points.

Contributions

  • The first end-to-end latent space change generation and detection framework that can be trained in an unsupervised manner with our on-the-fly change synthesis strategy. Our framework overcomes the generalisation problems of related approaches, leading to improved detection performance across diverse change types.
  • A procedure for creating synthetic changes inside the latent space of an encoder. The synthetic changes are modelled as Gaussian noise, which is dynamically estimated using simple statistics derived from the latent features of the target data. Through experiments and theoretical analysis, we show that real-world changes can be decoupled into irrelevant and relevant categories and approximated accordingly. This strategy enables variability that is hard to capture in pixel space, and it naturally supports non-RGB modalities.

MaSoN

MaSoN architecture

The proposed method (MaSoN) consists of a Shared Weight Encoder, a Latent Space Change Generation Strategy, and a Mask Decoder. The features are first extracted using the Encoder. Then the Latent Space Change Generation Strategy is used (only during training) to generate synthetic changes on the feature level. Features from different time steps with generated synthetic changes are then fused (in our case via element-wise subtraction) and fed into the Mask Decoder, which outputs the predicted change mask.
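The data flow described above can be sketched in a few lines of NumPy. This is a toy illustration under stated assumptions, not the authors' implementation: `toy_encoder` (a fixed 1x1 linear projection) and `toy_decoder` (a threshold on the norm of the fused features) are hypothetical stand-ins for the learned Encoder and Mask Decoder, and the training-only change generation step is left out.

```python
import numpy as np


def toy_encoder(image, weights):
    """Project a (C_in, H, W) image to (C_feat, H, W) features with a 1x1 linear map.

    Hypothetical stand-in for the learned encoder. Both time steps are passed
    through it with the same `weights`, which mimics weight sharing.
    """
    return np.tensordot(weights, image, axes=([1], [0]))


def fuse(f1, f2):
    """Fuse the features of the two time steps by element-wise subtraction."""
    return f1 - f2


def toy_decoder(fused, threshold=0.5):
    """Hypothetical stand-in for the Mask Decoder: threshold the per-pixel L2 norm."""
    score = np.linalg.norm(fused, axis=0)
    return (score > threshold).astype(np.uint8)


def predict_change_mask(img_t1, img_t2, weights, threshold=0.5):
    """Encode both images with shared weights, fuse, and decode a binary mask."""
    f1 = toy_encoder(img_t1, weights)
    f2 = toy_encoder(img_t2, weights)
    return toy_decoder(fuse(f1, f2), threshold)


# Toy usage: two 3-channel 8x8 images that differ only in a 3x3 patch.
weights = np.eye(4, 3)              # assumed projection from 3 to 4 channels
img_t1 = np.zeros((3, 8, 8))
img_t2 = img_t1.copy()
img_t2[:, 2:5, 2:5] += 1.0          # simulated change at the second time step
mask = predict_change_mask(img_t1, img_t2, weights)
```

In this toy setting the predicted mask is non-zero only where the two inputs differ. In the actual method the decoder is learned, so it can suppress feature differences that do not correspond to relevant changes.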

Why Gaussian noise?

Feature analysis to motivate Gaussian noise.

Histogram plot of feature differences f1(l) − f2(l) averaged across all five datasets and all channels per layer. Unchanged regions are narrowly concentrated near zero, while changed regions exhibit broader variation, especially in deeper layers. Both distributions can be approximated by a zero-centred Gaussian, but each with a different variance parameter. This directly motivates our latent-space change generation strategy.
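A minimal sketch of this kind of analysis is shown below, on synthetic data. It assumes that a change mask is available for analysis purposes only; the helper `difference_stats`, the feature shapes, and the noise scales 0.1 and 1.0 are illustrative assumptions, not values from the paper.

```python
import numpy as np


def difference_stats(f1, f2, change_mask):
    """Return (mean, std) of f1 - f2 for unchanged and changed pixels.

    f1, f2: feature maps of shape (C, H, W) from one layer.
    change_mask: array of shape (H, W), non-zero where a change is present.
    """
    diff = f1 - f2
    changed = change_mask.astype(bool)
    stats = {}
    for name, region in (("unchanged", ~changed), ("changed", changed)):
        values = diff[:, region]    # shape (C, number of pixels in region)
        stats[name] = (float(values.mean()), float(values.std()))
    return stats


# Synthetic example: small differences everywhere, large ones inside the mask.
rng = np.random.default_rng(0)
f1 = rng.normal(size=(16, 32, 32))
change_mask = np.zeros((32, 32), dtype=np.uint8)
change_mask[8:24, 8:24] = 1
f2 = (
    f1
    + rng.normal(scale=0.1, size=f1.shape)                # assumed irrelevant variation
    + rng.normal(scale=1.0, size=f1.shape) * change_mask  # assumed relevant variation
)
stats = difference_stats(f1, f2, change_mask)
```

Both groups have a mean close to zero, while the standard deviation of the changed group is much larger, which is the pattern the histogram describes.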

Latent Space Change Generation Strategy

Change generation procedure

Illustration of the change generation process. Based on our feature analysis, we simulate two types of changes (relevant and irrelevant) using separate noise scales. Since image pairs may already contain (unknown) changes, we treat each image independently and generate synthetic changes by perturbing its own features. Irrelevant noise, with a smaller magnitude, is added to the entire feature map. Relevant noise, with a larger magnitude, is applied only to the regions delimited by a binary mask Mc. This binary mask also serves as the ground truth during training.
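The process can be sketched as follows. This is a hedged illustration rather than the authors' code: the rectangular mask shape, the use of the per-channel standard deviation as the feature statistic, and the parameters `irrelevant_scale` and `relevant_scale` are all assumptions made for the example.

```python
import numpy as np


def random_box_mask(height, width, rng):
    """Sample a binary mask with one random rectangle (assumed mask shape)."""
    h = int(rng.integers(height // 4, height // 2 + 1))
    w = int(rng.integers(width // 4, width // 2 + 1))
    top = int(rng.integers(0, height - h + 1))
    left = int(rng.integers(0, width - w + 1))
    mask = np.zeros((height, width), dtype=np.uint8)
    mask[top:top + h, left:left + w] = 1
    return mask


def make_synthetic_change(features, rng, irrelevant_scale=0.1, relevant_scale=1.0):
    """Perturb a (C, H, W) feature map and return (perturbed features, mask).

    Noise scales are tied to the per-channel standard deviation of the input
    features (an assumed choice of statistic). Small noise is added everywhere,
    larger noise only inside the mask.
    """
    _, height, width = features.shape
    channel_std = features.std(axis=(1, 2), keepdims=True)
    mask = random_box_mask(height, width, rng)
    irrelevant = rng.normal(size=features.shape) * channel_std * irrelevant_scale
    relevant = rng.normal(size=features.shape) * channel_std * relevant_scale * mask
    return features + irrelevant + relevant, mask


# Toy usage: perturb the features of a single image and pair them with the original.
rng = np.random.default_rng(0)
features = rng.normal(size=(16, 32, 32))
perturbed, target_mask = make_synthetic_change(features, rng)
fused = features - perturbed    # decoder input, with target_mask as the training label
```

Because the original features and their perturbed copy come from the same image, any unknown real change between the two acquisition dates cannot contradict the synthetic label.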

Results

Transforming VFMs into Zero-Shot Anomaly Detectors

False positives are marked in red and false negatives in blue.

BibTeX

@article{rolih2026mason,
  title={Make Some Noise: Unsupervised Remote Sensing Change Detection Using Latent Space Perturbations},
  author={Rolih, Blaž and Fučka, Matic and Wolf, Filip and Čehovin Zajc, Luka},
  journal={arXiv},
  year={2026},
  eprint={2602.19881},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2602.19881},
}