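State Key Laboratory of Information Engineering in Surveying, Mapping and Remote Sensing, Wuhan University; School of Remote Sensing and Information Engineering, Wuhan University; Hubei Luojia Laboratory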
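Deep learning has made remarkable progress in remote sensing, and large-scale, high-quality annotated training datasets are critical to that progress. Although the volume of remote sensing training samples keeps growing, diverse samples for remote sensing semantic segmentation remain scarce. To address this problem, this paper proposes RS-SegDif, a method that uses generative diffusion models to synthesize remote sensing imagery and thereby expand the diversity of semantic segmentation samples, changing the traditional data generation process. The method first uses a diffusion model, conditioned on text prompts describing remote sensing images, to generate diverse semantic labels that conform to the real-world data distribution. It then uses a diffusion model conditioned on those semantic segmentation labels to generate the corresponding remote sensing images. To further increase the diversity of the generated samples, RS-SegDif combines two data generation strategies: generating labels from text and then images from the labels, and generating images directly from text together with real labels. For the downstream task, several semantic segmentation models are compared. When the synthetic remote sensing data are used for training, model accuracy on downstream semantic segmentation improves by approximately +3.25 mIoU, which indicates that the synthetic data are of high quality and effectively expand the diversity of remote sensing samples.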
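The reported gain is expressed in mIoU (mean intersection over union), the standard accuracy metric for semantic segmentation. As a reading aid, the following is a minimal sketch of how mIoU is commonly computed over flattened label maps. It is not the authors' evaluation code: the function name `mean_iou`, the choice to skip classes absent from both prediction and ground truth, and the toy labels are assumptions made for illustration. The abstract also does not state whether +3.25 is in percentage points, so the sketch simply returns a value in [0, 1].

```python
def mean_iou(pred, target, num_classes):
    """Mean IoU over classes that appear in the prediction or the ground truth.

    pred and target are equal-length sequences of integer class ids
    (for example, flattened label maps). Hypothetical helper for illustration.
    """
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    ious = []
    for c in range(num_classes):
        # Pixels where both prediction and ground truth are class c.
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        # Pixels where either prediction or ground truth is class c.
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union > 0:  # skip classes absent from both maps (an assumption)
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0


# Toy example with three classes and six pixels (made-up labels).
pred = [0, 0, 1, 1, 2, 2]
target = [0, 1, 1, 1, 2, 0]
score = mean_iou(pred, target, num_classes=3)
print(f"mIoU = {score:.3f}")
```

In the toy example the per-class IoU values are 1/3, 2/3, and 1/2, so the mean is 0.5. Comparing this score for a model trained with and without synthetic samples is the kind of measurement the abstract summarizes.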
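Basic information:
DOI:
Chinese Library Classification (CLC) number: TP751
Citation:
[1] Gong Jianya, Liu Qingyu, Zhang Mi, et al. RS-SegDif: A Diffusion-Model-Based Sample Synthesis Method for Remote Sensing Semantic Segmentation[J]. Urban Geotechnical Investigation & Surveying (城市勘测), 2025, No.208(01):1-7.
Funding:
Hubei Province Major Research and Development Program (2023BAB173); National Natural Science Foundation of China (41901265); Hubei Luojia Laboratory Fund (220100028)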