arxiv:2604.23774

Prox-E: Fine-Grained 3D Shape Editing via Primitive-Based Abstractions

Published on Apr 29 · Submitted by Etai Sella on May 4

Abstract

AI-generated summary: A training-free framework for fine-grained 3D editing that uses geometric primitives and vision-language models to preserve identity while enabling localized structural changes.

Text-based 2D image editing models have recently reached an impressive level of maturity, motivating a growing body of work that heavily depends on these models to drive 3D edits. While effective for appearance-based modifications, such 2D-centric 3D editing pipelines often struggle with fine-grained 3D editing, where localized structural changes must be applied while strictly preserving an object's overall identity. To address this limitation, we propose Prox-E, a training-free framework that enables fine-grained 3D control through an explicit, primitive-based geometric abstraction. Our framework first abstracts an input 3D shape into a compact set of geometric primitives. A pretrained vision-language model (VLM) then edits this abstraction to specify primitive-level changes. These structural edits are subsequently used to guide a 3D generative model, enabling fine-grained, localized modifications while preserving unchanged regions of the original shape. Through extensive experiments, we demonstrate that our method consistently balances identity preservation, shape quality, and instruction fidelity more effectively than various existing approaches, including 2D-based 3D editors and training-based methods.
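The abstract-then-edit step described above can be pictured with a small sketch. This is only an illustration under stated assumptions, not the authors' implementation: the `Primitive` class, its `center` and `scale` fields, the part names, and the helpers `apply_edit` and `unchanged_parts` are all hypothetical, and the edit dictionary stands in for whatever a VLM would actually return. The 3D generative model that consumes the edited abstraction is not modeled here.

```python
from dataclasses import dataclass, replace
from typing import Dict, Set, Tuple

Vec3 = Tuple[float, float, float]


@dataclass(frozen=True)
class Primitive:
    """Hypothetical primitive: one part of the shape, with a center and a size."""
    center: Vec3
    scale: Vec3


def apply_edit(shape: Dict[str, Primitive],
               edits: Dict[str, Dict[str, Vec3]]) -> Dict[str, Primitive]:
    """Return a new abstraction with per-primitive field overrides applied."""
    unknown = set(edits) - set(shape)
    if unknown:
        raise KeyError(f"edit refers to unknown primitives: {sorted(unknown)}")
    # Primitives without an entry in `edits` are copied over untouched.
    return {name: replace(prim, **edits.get(name, {}))
            for name, prim in shape.items()}


def unchanged_parts(before: Dict[str, Primitive],
                    after: Dict[str, Primitive]) -> Set[str]:
    """Names of primitives the edit left alone (regions to preserve)."""
    return {name for name in before if before[name] == after[name]}


# A toy chair abstraction (assumed values, for illustration only).
chair = {
    "seat": Primitive(center=(0.0, 0.5, 0.0), scale=(0.5, 0.05, 0.5)),
    "back": Primitive(center=(0.0, 1.0, -0.45), scale=(0.5, 0.5, 0.05)),
    "leg": Primitive(center=(0.4, 0.25, 0.4), scale=(0.05, 0.25, 0.05)),
}

# Stand-in for a VLM response to "make the backrest taller".
edit = {"back": {"center": (0.0, 1.25, -0.45), "scale": (0.5, 0.75, 0.05)}}

edited = apply_edit(chair, edit)
preserved = unchanged_parts(chair, edited)
print(sorted(preserved))
```

Keeping the edit at the level of named primitives makes it straightforward to tell which parts changed and which must stay identical, which is the kind of localized control the abstract describes.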

Community

Paper submitter

Even today, when image editing models are more powerful than ever, fine-grained structural 3D editing remains difficult. In this work, we use primitive-based abstractions to leverage the reasoning power of VLMs for this challenging task.

the use of superquadric primitives as a compact proxy is clever, but the real test is how the edited proxy translates into the 3d diffusion steps without losing identity. the proxy-induced denoising path is where alignment and sampling quirks will show up, especially when edits are localized and other regions dominate the shape signal. an ablation varying the number of primitives or substituting a more expressive primitive family could reveal where identity preservation actually hinges. the arxivlens breakdown helped me parse the method details, btw, here's the link: https://arxivlens.com/PaperView/Details/prox-e-fine-grained-3d-shape-editing-via-primitive-based-abstractions-5048-d8c944ab. overall, i like the clean geometry-prior split, and it would be interesting to see how this scales to multi-object scenes with occlusion.
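For readers unfamiliar with the superquadrics mentioned in the comment above, here is a minimal sketch of the standard superquadric inside-outside function in its canonical frame. It is the textbook formulation, not necessarily the parameterization used in the paper (this page does not give it), and it ignores rotation and translation. The function name `superquadric_f` and the argument names are assumptions made for illustration.

```python
import math
from typing import Tuple


def superquadric_f(point: Tuple[float, float, float],
                   scale: Tuple[float, float, float],
                   eps: Tuple[float, float]) -> float:
    """Inside-outside function of an origin-centered, axis-aligned superquadric.

    Returns a value below 1 inside the shape, 1 on its surface, above 1 outside.
    `scale` holds the three semi-axis lengths; `eps` holds the two shape
    exponents (1, 1 gives an ellipsoid; values near 0 give a box-like shape).
    """
    x, y, z = point
    a1, a2, a3 = scale
    e1, e2 = eps
    xy_term = (abs(x / a1) ** (2.0 / e2) + abs(y / a2) ** (2.0 / e2)) ** (e2 / e1)
    return xy_term + abs(z / a3) ** (2.0 / e1)


corner = (0.9, 0.9, 0.9)
unit = (1.0, 1.0, 1.0)

# The same point lies outside a unit sphere but inside a box-like superquadric.
print(superquadric_f(corner, unit, (1.0, 1.0)) > 1.0)
print(superquadric_f(corner, unit, (0.1, 0.1)) < 1.0)
```

Two exponents and three scales per primitive already cover spheres, rounded boxes, and cylinders, which is part of why the family works as a compact proxy.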


Get this paper in your agent:

hf papers read 2604.23774
Don't have the latest CLI?
curl -LsSf https://hf.co/cli/install.sh | bash
