{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,12,9]],"date-time":"2025-12-09T04:27:04Z","timestamp":1765254424994,"version":"3.41.2"},"reference-count":46,"publisher":"Wiley","issue":"7","license":[{"start":{"date-parts":[[2024,11,4]],"date-time":"2024-11-04T00:00:00Z","timestamp":1730678400000},"content-version":"vor","delay-in-days":34,"URL":"http:\/\/onlinelibrary.wiley.com\/termsAndConditions#vor"}],"content-domain":{"domain":["onlinelibrary.wiley.com"],"crossmark-restriction":true},"short-container-title":["Computer Graphics Forum"],"published-print":{"date-parts":[[2024,10]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>With the emergence of large\u2010scale Text\u2010to\u2010Image(T2I) models and implicit 3D representations like Neural Radiance Fields (NeRF), many text\u2010driven generative editing methods based on NeRF have appeared. However, the implicit encoding of geometric and textural information poses challenges in accurately locating and controlling objects during editing. Recently, significant advancements have been made in the editing methods of 3D Gaussian Splatting, a real\u2010time rendering technology that relies on explicit representation. However, these methods still suffer from issues including inaccurate localization and limited manipulation over editing. To tackle these challenges, we propose GSEditPro, a novel 3D scene editing framework which allows users to perform various creative and precise editing using text prompts only. Leveraging the explicit nature of the 3D Gaussian distribution, we introduce an attention\u2010based progressive localization module to add semantic labels to each Gaussian during rendering. This enables precise localization on editing areas by classifying Gaussians based on their relevance to the editing prompts derived from cross\u2010attention layers of the T2I model. Furthermore, we present an innovative editing optimization method based on 3D Gaussian Splatting, obtaining stable and refined editing results through the guidance of Score Distillation Sampling and pseudo ground truth. We prove the efficacy of our method through extensive experiments.<\/jats:p>","DOI":"10.1111\/cgf.15215","type":"journal-article","created":{"date-parts":[[2024,11,4]],"date-time":"2024-11-04T12:29:59Z","timestamp":1730723399000},"update-policy":"https:\/\/doi.org\/10.1002\/crossmark_policy","source":"Crossref","is-referenced-by-count":3,"title":["GSEditPro: 3D Gaussian Splatting Editing with Attention\u2010based Progressive Localization"],"prefix":"10.1111","volume":"43","author":[{"ORCID":"https:\/\/orcid.org\/0009-0008-1406-9099","authenticated-orcid":false,"given":"Y.","family":"Sun","sequence":"first","affiliation":[{"name":"State Key Laboratory for Novel Software Technology of Nanjing University China"}]},{"ORCID":"https:\/\/orcid.org\/0009-0000-8807-4802","authenticated-orcid":false,"given":"R.","family":"Tian","sequence":"additional","affiliation":[{"name":"State Key Laboratory for Novel Software Technology of Nanjing University China"}]},{"ORCID":"https:\/\/orcid.org\/0009-0006-6218-9170","authenticated-orcid":false,"given":"X.","family":"Han","sequence":"additional","affiliation":[{"name":"State Key Laboratory for Novel Software Technology of Nanjing University China"}]},{"ORCID":"https:\/\/orcid.org\/0009-0009-1836-4017","authenticated-orcid":false,"given":"X.","family":"Liu","sequence":"additional","affiliation":[{"name":"National University of Defense Technology China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-9621-7321","authenticated-orcid":false,"given":"Y.","family":"Zhang","sequence":"additional","affiliation":[{"name":"State Key Laboratory for Novel Software Technology of Nanjing University China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-9054-0216","authenticated-orcid":false,"given":"K.","family":"Xu","sequence":"additional","affiliation":[{"name":"National University of Defense Technology China"}]}],"member":"311","published-online":{"date-parts":[[2024,11,4]]},"reference":[{"key":"e_1_2_6_2_2","doi-asserted-by":"crossref","unstructured":"BrooksT. HolynskiA. EfrosA. A.: Instructpix2pix: Learning to follow image editing instructions. InProceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition(2023) pp.18392\u201318402. 3 6","DOI":"10.1109\/CVPR52729.2023.01764"},{"key":"e_1_2_6_3_2","doi-asserted-by":"crossref","unstructured":"BarronJ. T. MildenhallB. VerbinD. SrinivasanP. P. HedmanP.: Mip\u2010nerf 360: Unbounded anti\u2010aliased neural radiance fields. InProceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition(2022) pp.5470\u20135479. 6","DOI":"10.1109\/CVPR52688.2022.00539"},{"key":"e_1_2_6_4_2","doi-asserted-by":"crossref","unstructured":"ChenJ.\u2010K. Bul\u00f2S. R. M\u00fcllerN. PorziL. KontschiederP. WangY.\u2010X.: Consistdreamer: 3d\u2010consistent 2d diffusion for high\u2010fidelity scene editing. InProceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition(2024) pp.21071\u201321080. 3","DOI":"10.1109\/CVPR52733.2024.01991"},{"key":"e_1_2_6_5_2","unstructured":"ChenY. ChenZ. ZhangC. WangF. YangX. WangY. CaiZ. YangL. LiuH. LinG.: Gaussianeditor: Swift and controllable 3d editing with gaussian splatting.arXiv preprint arXiv:2311.14521(2023). 2 3 5 6 7 8 9"},{"key":"e_1_2_6_6_2","unstructured":"ChenM. LainaI. VedaldiA.:Dge: Direct gaussian 3d editing by consistent multi\u2010view editing. 3"},{"key":"e_1_2_6_7_2","first-page":"25971","article-title":"Segment anything in 3d with nerfs","volume":"36","author":"Cen J.","year":"2023","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_2_6_8_2","doi-asserted-by":"crossref","unstructured":"DiD. YangJ. LuoC. XueZ. ChenW. YangX. GaoY.: Hyper\u20103dg: Text\u2010to\u20103d gaussian generation via hypergraph.arXiv preprint arXiv:2403.09236(2024). 3","DOI":"10.21203\/rs.3.rs-4084374\/v1"},{"key":"e_1_2_6_9_2","article-title":"Density\u2010based spatial clustering of applications with noise","volume":"240","author":"Ester M.","year":"1996","journal-title":"Int. Conf. knowledge discovery and data mining"},{"key":"e_1_2_6_10_2","doi-asserted-by":"crossref","unstructured":"FangJ. WangJ. ZhangX. XieL. TianQ.: Gaussianeditor: Editing 3d gaussians delicately with text instructions.arXiv preprint arXiv:2311.16037(2023). 2 3","DOI":"10.1109\/CVPR52733.2024.01975"},{"key":"e_1_2_6_11_2","unstructured":"HertzA. MokadyR. TenenbaumJ. AbermanK. PritchY. Cohen\u2010OrD.: Prompt\u2010to\u2010prompt image editing with cross attention control.arXiv preprint arXiv:2208.01626(2022). 2 5"},{"issue":"47","key":"e_1_2_6_12_2","first-page":"1","article-title":"Cascaded diffusion models for high fidelity image generation","volume":"23","author":"Ho J.","year":"2022","journal-title":"Journal of Machine Learning Research"},{"key":"e_1_2_6_13_2","unstructured":"HaqueA. TancikM. EfrosA. A. HolynskiA. KanazawaA.: Instruct\u2010nerf2nerf: Editing 3d scenes with instructions. InProceedings of the IEEE\/CVF International Conference on Computer Vision(2023) pp.19740\u201319750. 2 3 6 7 8 9"},{"key":"e_1_2_6_14_2","unstructured":"JainA. MildenhallB. BarronJ. T. AbbeelP. PooleB.: Zero\u2010shot text\u2010guided object generation with dream fields. InProceedings of the IEEE\/CVF conference on computer vision and pattern recognition(2022) pp.867\u2013876. 3"},{"key":"e_1_2_6_15_2","doi-asserted-by":"publisher","DOI":"10.1145\/3592433"},{"key":"e_1_2_6_16_2","unstructured":"KirillovA. MintunE. RaviN. MaoH. RollandC. GustafsonL. XiaoT. WhiteheadS. BergA. C. LoW.\u2010Y. et al.: Segment anything. InProceedings of the IEEE\/CVF International Conference on Computer Vision(2023) pp.4015\u20134026. 5"},{"key":"e_1_2_6_17_2","first-page":"23311","article-title":"Decomposing nerf for editing via feature field distillation","volume":"35","author":"Kobayashi S.","year":"2022","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_2_6_18_2","doi-asserted-by":"crossref","unstructured":"LinC.\u2010H. GaoJ. TangL. TakikawaT. ZengX. HuangX. KreisK. FidlerS. LiuM.\u2010Y. LinT.\u2010Y.: Magic3d: High\u2010resolution text\u2010to\u20103d content creation.2023 IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR)(2022) 300\u2013309. 2 3","DOI":"10.1109\/CVPR52729.2023.00037"},{"key":"e_1_2_6_19_2","unstructured":"LiY. LinZ.\u2010H. ForsythD. HuangJ.\u2010B. WangS.: Climatenerf: Extreme weather synthesis in neural radiance field. InProceedings of the IEEE\/CVF International Conference on Computer Vision(2023) pp.3227\u20133238. 4"},{"key":"e_1_2_6_20_2","unstructured":"LiJ. LiD. SavareseS. HoiS.: BLIP\u20102: bootstrapping language\u2010image pre\u2010training with frozen image encoders and large language models. InICML(2023). 6"},{"key":"e_1_2_6_21_2","doi-asserted-by":"crossref","unstructured":"LiH. ShiH. ZhangW. WuW. LiaoY. WangL. LeeL.\u2010h. ZhouP.: Dreamscene: 3d gaussian\u2010based text\u2010to\u20103d scene generation via formation pattern sampling.arXiv preprint arXiv:2404.03575(2024). 3","DOI":"10.1007\/978-3-031-72904-1_13"},{"key":"e_1_2_6_22_2","unstructured":"LiuS. ZhangX. ZhangZ. ZhangR. ZhuJ.\u2010Y. RussellB.: Editing conditional radiance fields. InProceedings of the IEEE\/CVF international conference on computer vision(2021) pp.5773\u20135783. 3"},{"key":"e_1_2_6_23_2","doi-asserted-by":"crossref","unstructured":"MikaeiliA. PerelO. SafaeeM. Cohen\u2010OrD. Mahdavi\u2010AmiriA.: Sked: Sketch\u2010guided text\u2010based 3d editing. InProceedings of the IEEE\/CVF International Conference on Computer Vision(2023) pp.14607\u201314619. 2 3","DOI":"10.1109\/ICCV51070.2023.01343"},{"key":"e_1_2_6_24_2","doi-asserted-by":"crossref","unstructured":"MetzerG. RichardsonE. PatashnikO. GiryesR. Cohen\u2010OrD.: Latent\u2010nerf for shape\u2010guided generation of 3d shapes and textures. InProceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition(2023) pp.12663\u201312673. 2 3","DOI":"10.1109\/CVPR52729.2023.01218"},{"key":"e_1_2_6_25_2","doi-asserted-by":"publisher","DOI":"10.1145\/3503250"},{"key":"e_1_2_6_26_2","unstructured":"PooleB. JainA. BarronJ. T. MildenhallB.: Dreamfusion: Text\u2010to\u20103d using 2d diffusion.ArXiv abs\/2209.14988(2022). 2 3 5"},{"key":"e_1_2_6_27_2","unstructured":"RombachR. BlattmannA. LorenzD. EsserP. OmmerB.: High\u2010resolution image synthesis with latent diffusion models. InProceedings of the IEEE\/CVF conference on computer vision and pattern recognition(2022) pp.10684\u201310695. 2 5"},{"issue":"2","key":"e_1_2_6_28_2","first-page":"3","article-title":"Hierarchical text\u2010conditional image generation with clip latents","volume":"1","author":"Ramesh A.","year":"2022","journal-title":"arXiv preprint arXiv:2204.06125"},{"key":"e_1_2_6_29_2","first-page":"8748","volume-title":"International conference on machine learning","author":"Radford A.","year":"2021"},{"key":"e_1_2_6_30_2","doi-asserted-by":"crossref","unstructured":"RajA. KazaS. PooleB. NiemeyerM. RuizN. MildenhallB. ZadaS. AbermanK. RubinsteinM. BarronJ. et al.: Dreambooth3d: Subject\u2010driven text\u2010to\u20103d generation. InProceedings of the IEEE\/CVF International Conference on Computer Vision(2023) pp.2349\u20132359. 3","DOI":"10.1109\/ICCV51070.2023.00223"},{"key":"e_1_2_6_31_2","doi-asserted-by":"crossref","unstructured":"RuizN. LiY. JampaniV. PritchY. RubinsteinM. AbermanK.: Dreambooth: Fine tuning text\u2010to\u2010image diffusion models for subject\u2010driven generation. InProceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition(2023) pp.22500\u201322510. 2 3 5 6 8","DOI":"10.1109\/CVPR52729.2023.02155"},{"key":"e_1_2_6_32_2","first-page":"36479","article-title":"Photorealistic text\u2010to\u2010image diffusion models with deep language understanding","volume":"35","author":"Saharia C.","year":"2022","journal-title":"Advances in neural information processing systems"},{"key":"e_1_2_6_33_2","unstructured":"SchonbergerJ. L. FrahmJ.\u2010M.: Structure\u2010from\u2010motion revisited. InProceedings of the IEEE conference on computer vision and pattern recognition(2016) pp.4104\u20134113. 5 6"},{"key":"e_1_2_6_34_2","doi-asserted-by":"crossref","unstructured":"SellaE. FiebelmanG. HedmanP. Averbuch\u2010ElorH.: Vox\u2010e: Text\u2010guided voxel editing of 3d objects. InProceedings of the IEEE\/CVF International Conference on Computer Vision(2023) pp.430\u2013440. 3","DOI":"10.1109\/ICCV51070.2023.00046"},{"key":"e_1_2_6_35_2","unstructured":"WangC. ChaiM. HeM. ChenD. LiaoJ.: Clip\u2010nerf: Text\u2010and\u2010image driven manipulation of neural radiance fields. InProceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition(2022) pp.3835\u20133844. 2 3"},{"key":"e_1_2_6_36_2","doi-asserted-by":"crossref","unstructured":"WangX. DarrellT. RambhatlaS. S. GirdharR. MisraI.: Instancediffusion: Instance\u2010level control for image generation.arXiv preprint arXiv:2402.03290(2024). 2","DOI":"10.1109\/CVPR52733.2024.00596"},{"key":"e_1_2_6_37_2","article-title":"Prolificdreamer: High\u2010fidelity and diverse text\u2010to\u20103d generation with variational score distillation","volume":"36","author":"Wang Z.","year":"2024","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_2_6_38_2","unstructured":"WangY. YiX. WuZ. ZhaoN. ChenL. ZhangH.: View\u2010consistent 3d editing with gaussian splatting.arXiv preprint arXiv:2403.11868(2024). 3"},{"key":"e_1_2_6_39_2","unstructured":"XuT. ChenJ. ChenP. ZhangY. YuJ. YangW.: Tiger: Text\u2010instructed 3d gaussian retrieval and coherent editing.arXiv preprint arXiv:2405.14455(2024). 3"},{"key":"e_1_2_6_40_2","first-page":"159","volume-title":"European Conference on Computer Vision","author":"Xu T.","year":"2022"},{"key":"e_1_2_6_41_2","doi-asserted-by":"crossref","unstructured":"YeM. DanelljanM. YuF. KeL.: Gaussian grouping: Segment and edit anything in 3d scenes.arXiv preprint arXiv:2312.00732(2023). 2 3","DOI":"10.1007\/978-3-031-73397-0_10"},{"key":"e_1_2_6_42_2","doi-asserted-by":"crossref","unstructured":"YuanY.\u2010J. SunY.\u2010T. LaiY.\u2010K. MaY. JiaR. GaoL.: Nerf\u2010editing: geometry editing of neural radiance fields. InProceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition(2022) pp.18353\u201318364. 4","DOI":"10.1109\/CVPR52688.2022.01781"},{"issue":"3","key":"e_1_2_6_43_2","first-page":"5","article-title":"Scaling autoregressive models for content\u2010rich text\u2010to\u2010image generation","volume":"2","author":"Yu J.","year":"2022","journal-title":"arXiv preprint arXiv:2206.10789"},{"key":"e_1_2_6_44_2","doi-asserted-by":"crossref","unstructured":"ZhuangJ. KangD. CaoY.\u2010P. LiG. LinL. ShanY.: Tip\u2010editor: An accurate 3d editor following both text\u2010prompts and image\u2010prompts.arXiv preprint arXiv:2401.14828(2024). 2","DOI":"10.1145\/3658205"},{"key":"e_1_2_6_45_2","unstructured":"ZhangL. RaoA. AgrawalaM.: Adding conditional control to text\u2010to\u2010image diffusion models. InProceedings of the IEEE\/CVF International Conference on Computer Vision(2023) pp.3836\u20133847. 2"},{"key":"e_1_2_6_46_2","unstructured":"ZhouX. RanX. XiongY. HeJ. LinZ. WangY. SunD. YangM.\u2010H.: Gala3d: Towards text\u2010to\u20103d complex scene generation via layout\u2010guided generative gaussian splatting.arXiv preprint arXiv:2402.07207(2024). 3"},{"key":"e_1_2_6_47_2","doi-asserted-by":"crossref","unstructured":"ZhuangJ. WangC. LinL. LiuL. LiG.: Dreameditor: Text\u2010driven 3d scene editing with neural fields. InSIGGRAPH Asia 2023 Conference Papers(2023) pp.1\u201310. 3 6 8 9","DOI":"10.1145\/3610548.3618190"}],"container-title":["Computer Graphics Forum"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/onlinelibrary.wiley.com\/doi\/pdf\/10.1111\/cgf.15215","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,11,30]],"date-time":"2024-11-30T19:09:05Z","timestamp":1732993745000},"score":1,"resource":{"primary":{"URL":"https:\/\/onlinelibrary.wiley.com\/doi\/10.1111\/cgf.15215"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,10]]},"references-count":46,"journal-issue":{"issue":"7","published-print":{"date-parts":[[2024,10]]}},"alternative-id":["10.1111\/cgf.15215"],"URL":"https:\/\/doi.org\/10.1111\/cgf.15215","archive":["Portico"],"relation":{},"ISSN":["0167-7055","1467-8659"],"issn-type":[{"type":"print","value":"0167-7055"},{"type":"electronic","value":"1467-8659"}],"subject":[],"published":{"date-parts":[[2024,10]]},"assertion":[{"value":"2024-11-04","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}],"article-number":"e15215"}}