ChangeFlow - Latent Rectified Flow for Change Detection in Remote Sensing
Abstract
Remote sensing change detection (RSCD) aims to localise changes between two images of the same geographic region. In practice, change masks often follow region-level annotation conventions rather than purely local appearance differences, making them context-dependent and occasionally ambiguous. Most state-of-the-art methods use per-pixel discriminative classification, which produces a single prediction per input and fails to explicitly model the changed region as a coherent whole. A natural alternative is a generative formulation, which can model a distribution of plausible masks, enabling sampling to capture ambiguity and encourage global consistency. However, existing generative RSCD approaches typically lag behind strong discriminative baselines due to the high computational cost of pixel-space generation and the complexity of their conditioning mechanisms. To address the limitations of prior discriminative and generative methods, we propose ChangeFlow, a generative framework that reformulates change detection as the synthesis of a change mask in latent space via rectified flow. ChangeFlow is guided by a structured yet lightweight conditioning signal, and its stochastic design naturally supports sampling-based prediction ensembling. Specifically, aggregating multiple predicted change masks improves robustness, while sample agreement provides a practical confidence estimate that highlights ambiguous regions. Across four benchmarks, ChangeFlow achieves an average F1 of 80.4%, improving by 1.3 points on average over the previous best method, while maintaining inference speed comparable to recent strong baselines.
Mask Generation Examples
Contributions
- We reformulate RSCD as latent-space change mask generation and propose a rectified flow framework that produces globally coherent change masks.
- We introduce a conditioning strategy based on input feature differences that avoids auxiliary predictors and complex architectures.
- We leverage the sampling-based generation inherent to rectified flow models to obtain confidence estimates and effectively fuse predictions, offering a controllable speed–accuracy trade-off by adjusting the number of generation steps and repetitions.
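The fusion and confidence estimation described in the last contribution can be illustrated with a small sketch. The page does not specify the aggregation rule, so per-pixel majority voting and the agreement score below are assumptions, and the names `fuse_masks`, `Mask`, and `threshold` are hypothetical.

```python
from typing import List, Tuple

Mask = List[List[int]]  # binary mask as rows of 0/1 values


def fuse_masks(samples: List[Mask], threshold: float = 0.5) -> Tuple[Mask, List[List[float]]]:
    """Fuse sampled binary masks by per-pixel voting (an assumed rule).

    Returns the fused mask and a per-pixel agreement score in [0.5, 1.0],
    where 1.0 means every sample predicted the same label.
    """
    if not samples:
        raise ValueError("need at least one sampled mask")
    n = len(samples)
    height, width = len(samples[0]), len(samples[0][0])
    fused = [[0] * width for _ in range(height)]
    agreement = [[0.0] * width for _ in range(height)]
    for r in range(height):
        for c in range(width):
            # Fraction of samples that mark this pixel as changed.
            change_fraction = sum(mask[r][c] for mask in samples) / n
            fused[r][c] = 1 if change_fraction > threshold else 0
            # Agreement is the share of samples voting for the majority label.
            agreement[r][c] = max(change_fraction, 1.0 - change_fraction)
    return fused, agreement


# Three hypothetical samples for the same image pair.
samples = [
    [[1, 0], [1, 1]],
    [[1, 0], [0, 1]],
    [[1, 0], [0, 0]],
]
fused, agreement = fuse_masks(samples)
```

In this sketch, pixels with low agreement are the ones the samples disagree on, which is one plausible way to surface ambiguous regions. Each additional repetition costs a further full generation pass, which is where the speed side of the trade-off comes from.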
ChangeFlow
Given a pair of images, we first extract features from both with a shared-weight encoder and condition the Diffusion Transformer (DiT) rectified flow model on the absolute difference of the extracted features. Guided by this conditioning, the model iteratively generates a latent representation of the corresponding change mask, which the Variational Autoencoder (VAE) then decodes into a binary change mask.
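The iterative generation step can be sketched as Euler integration of a learned velocity field. This is a toy illustration and not the authors' implementation: the time convention (noise at t=0, mask latent at t=1), the Euler solver, and all names (`abs_difference`, `sample_latent`, `toy_velocity`) are assumptions, and `toy_velocity` is a closed-form stand-in for the trained DiT that simply treats the conditioning vector as the target latent.

```python
import random
from typing import Callable, List, Sequence

Vector = List[float]
VelocityFn = Callable[[Vector, float, Vector], Vector]


def abs_difference(feat_a: Sequence[float], feat_b: Sequence[float]) -> Vector:
    """Conditioning signal: element-wise absolute feature difference."""
    return [abs(a - b) for a, b in zip(feat_a, feat_b)]


def sample_latent(velocity: VelocityFn, cond: Vector, num_steps: int,
                  rng: random.Random) -> Vector:
    """Euler-integrate dx/dt = velocity(x, t, cond) from t=0 (noise) to t=1."""
    if num_steps < 1:
        raise ValueError("num_steps must be at least 1")
    # Start from Gaussian noise; a different seed gives a different sample.
    x = [rng.gauss(0.0, 1.0) for _ in cond]
    dt = 1.0 / num_steps
    for k in range(num_steps):
        t = k * dt
        v = velocity(x, t, cond)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


def toy_velocity(x: Vector, t: float, cond: Vector) -> Vector:
    """Stand-in for the trained model (assumption): the straight-line
    velocity that carries x to a target latent equal to `cond` at t=1."""
    return [(ci - xi) / (1.0 - t) for xi, ci in zip(x, cond)]


# Hypothetical encoder features for the two images.
cond = abs_difference([0.2, 1.0, -0.5], [0.7, 1.0, 0.5])
latent = sample_latent(toy_velocity, cond, num_steps=10, rng=random.Random(0))
```

In a real system the resulting latent would be passed to the VAE decoder and binarised. Here the toy field is exactly straight, so any number of steps lands on the target, whereas a learned field is only approximately straight and the step count matters.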
Results
False positives are marked in red and false negatives in blue.
Coherence Analysis
Coherence is measured as the hole count error and the connected component count error, averaged over 4 datasets. Lower is better. ChangeFlow yields low structural error, indicating the fewest spurious holes and few incorrectly fragmented components.
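One plausible way to compute such structural errors is sketched below. The exact definitions are not given on this page, so the following are assumptions: components use 4-connectivity, a hole is a background region that does not touch the image border, and each error is the absolute difference between predicted and ground-truth counts. All function names are hypothetical.

```python
from collections import deque
from typing import List, Tuple

Mask = List[List[int]]  # binary mask as rows of 0/1 values


def _components(mask: Mask, value: int) -> List[List[Tuple[int, int]]]:
    """Return the 4-connected components of pixels equal to `value`."""
    height, width = len(mask), len(mask[0])
    seen = [[False] * width for _ in range(height)]
    components = []
    for r in range(height):
        for c in range(width):
            if mask[r][c] != value or seen[r][c]:
                continue
            # Breadth-first flood fill from an unvisited seed pixel.
            queue = deque([(r, c)])
            seen[r][c] = True
            pixels = []
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < height and 0 <= nx < width
                            and not seen[ny][nx] and mask[ny][nx] == value):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            components.append(pixels)
    return components


def count_components(mask: Mask) -> int:
    """Number of connected changed regions."""
    return len(_components(mask, 1))


def count_holes(mask: Mask) -> int:
    """Number of background regions fully enclosed by changed pixels."""
    height, width = len(mask), len(mask[0])
    holes = 0
    for comp in _components(mask, 0):
        on_border = any(y in (0, height - 1) or x in (0, width - 1) for y, x in comp)
        if not on_border:
            holes += 1
    return holes


def structural_errors(pred: Mask, target: Mask) -> Tuple[int, int]:
    """Return (hole count error, component count error) as absolute differences."""
    hole_error = abs(count_holes(pred) - count_holes(target))
    component_error = abs(count_components(pred) - count_components(target))
    return hole_error, component_error


target = [[0, 0, 0, 0, 0],
          [0, 1, 1, 1, 0],
          [0, 1, 1, 1, 0],
          [0, 1, 1, 1, 0],
          [0, 0, 0, 0, 0]]
pred = [[0, 0, 0, 0, 0],
        [0, 1, 1, 1, 0],
        [0, 1, 0, 1, 0],   # spurious hole in the centre
        [0, 1, 1, 1, 0],
        [0, 0, 0, 0, 1]]   # spurious isolated fragment
errors = structural_errors(pred, target)
```

In this example the prediction has one spurious hole and one extra fragment compared with the ground truth, so both errors are 1.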
Number of steps and sampling-based inference
Impact of the number of sampling steps (at a fixed 5 repetitions) and of the number of inference repetitions (at a fixed 10 sampling steps). Change detection performance is reported on the left y-axis, measured as average F1 across 4 datasets, while inference speed is reported on the right y-axis in frames per second. Sampling-based ensembling (multiple repetitions), enabled by the generative formulation, provides a controllable speed–accuracy trade-off at inference time.
BibTeX
@article{rolih2026changeflow,
title={ChangeFlow - Latent Rectified Flow for Change Detection in Remote Sensing},
author={Rolih, Blaž and Fučka, Matic and Wolf, Filip and Čehovin Zajc, Luka},
journal={arXiv},
year={2026},
eprint={2605.15375},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2605.15375},
}