SceneCode: Executable World Programs for Editable Indoor Scenes with Articulated Objects

Puyi Wang*1, Yuhao Wang*2,3, Linjie Li4, Zhengyuan Yang4, Kevin Qinghong Lin5, Yangguang Li1, Yu Cheng1

1The Chinese University of Hong Kong 2Shanghai Jiao Tong University 3Shanghai AI Laboratory 4Microsoft 5University of Oxford

*Equal contribution.

SceneCode turns natural language prompts into code-driven indoor worlds: renderable rooms, persistent scene state, part-level object programs, and simulation-ready assets with physical and articulation metadata.

SceneCode compiles a natural language prompt into an executable indoor world with editable scene programs and simulation-ready assets.

Abstract

Indoor scene synthesis underpins embodied AI, robotic manipulation, and simulation-based policy evaluation, where a useful scene must specify not only what the environment looks like, but also how its objects are structured. Existing pipelines, however, typically represent generated content as static meshes and inherit articulation only from curated asset libraries, which limits object-level controllability and prevents new interactable assets from being produced on demand. We address this gap by formulating physically interactable indoor scene synthesis as programmatic world generation, and present SceneCode, a framework that compiles a natural language prompt into an executable, code-driven indoor world rather than a collection of opaque meshes. A room-level agentic backbone first turns the prompt into a structured house layout and emits per-object AssetRequests through a planner-designer-critic loop. Each request is then routed to one of five code-generation strategies and converted into synthesized part-wise Blender Python programs, which are validated through an execution-guided repair-and-refine loop. The resulting programs are compiled into simulation-ready assets and exported as SDF for physics simulation. A persistent scene-state registry links object requests, executable programs, rendered geometry, and simulation assets, turning scene assembly into a traceable and locally editable world-building process.

We evaluate SceneCode across scene-level synthesis, object-level asset quality, human judgment, and downstream robot interaction. Results show that executable world programs improve prompt-faithful indoor scene generation and produce assets with cleaner mesh structure and simulator-loadable articulation metadata.

Simulation-Ready Assets

SceneCode compiles generated articulated assets into URDF with visual meshes, collision geometry, joint types, axes, and limits for downstream simulation workflows.
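As a rough illustration of what such an export contains, the sketch below builds a toy URDF for a cabinet with one hinged door using only Python's standard library. It is not SceneCode's exporter: the link names, mesh paths, joint origin, and limit values are all made-up placeholders, and the collision geometry simply reuses the visual mesh for brevity.

```python
import xml.etree.ElementTree as ET


def build_cabinet_urdf() -> str:
    """Build a toy two-link URDF: a cabinet body with one hinged door."""
    robot = ET.Element("robot", name="cabinet")

    # Hypothetical link names and mesh paths, chosen only for illustration.
    for link_name, mesh in [("body", "meshes/body.obj"), ("door", "meshes/door.obj")]:
        link = ET.SubElement(robot, "link", name=link_name)
        # Visual mesh used for rendering.
        visual = ET.SubElement(link, "visual")
        ET.SubElement(ET.SubElement(visual, "geometry"), "mesh", filename=mesh)
        # Collision geometry; this sketch reuses the visual mesh.
        collision = ET.SubElement(link, "collision")
        ET.SubElement(ET.SubElement(collision, "geometry"), "mesh", filename=mesh)

    # Joint type: a revolute hinge connecting the door to the body.
    joint = ET.SubElement(robot, "joint", name="door_hinge", type="revolute")
    ET.SubElement(joint, "parent", link="body")
    ET.SubElement(joint, "child", link="door")
    ET.SubElement(joint, "origin", xyz="0.3 0.2 0.0", rpy="0 0 0")
    # Joint axis: the door swings about the vertical (z) axis.
    ET.SubElement(joint, "axis", xyz="0 0 1")
    # Joint limits in radians, plus the effort and velocity bounds URDF expects.
    ET.SubElement(joint, "limit", lower="0.0", upper="1.57", effort="10.0", velocity="1.0")

    return ET.tostring(robot, encoding="unicode")


urdf_text = build_cabinet_urdf()
```

Printing `urdf_text` or writing it to a `.urdf` file gives a document that lists the elements named above: visual meshes, collision geometry, a joint type, an axis, and limits.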

Method

SceneCode couples room-level planning with object-level program synthesis, execution-guided refinement, and persistent scene-state registration.
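The execution-guided part of this pipeline can be pictured as a run, inspect, patch loop. The sketch below is a minimal stand-in under stated assumptions: it executes plain Python text with `exec` rather than a Blender Python program, `toy_repair` is a hypothetical placeholder for a model-driven repair step, and the round budget is arbitrary. It covers only the "repair until it runs" half; any quality-driven refinement is left out.

```python
import traceback
from typing import Callable, Optional, Tuple


def run_program(source: str) -> Tuple[bool, str]:
    """Execute program text in a fresh namespace; return (ok, error_log)."""
    try:
        exec(compile(source, "<asset_program>", "exec"), {})
        return True, ""
    except Exception:
        return False, traceback.format_exc()


def repair_loop(
    source: str,
    repair: Callable[[str, str], str],
    max_rounds: int = 3,
) -> Tuple[Optional[str], int]:
    """Re-run and patch a program until it executes or the budget runs out.

    Returns (working_source, repairs_used), or (None, max_rounds) on failure.
    """
    for round_idx in range(max_rounds + 1):
        ok, log = run_program(source)
        if ok:
            return source, round_idx
        if round_idx < max_rounds:
            # Feed the execution error back to the repair step.
            source = repair(source, log)
    return None, max_rounds


def toy_repair(source: str, log: str) -> str:
    """Hypothetical stand-in for a model-driven repair call."""
    if "NameError" in log:
        # Define the missing variable that the error log complains about.
        return "width = 0.6\n" + source
    return source


broken = "door_area = width * 2.0\n"
fixed, rounds = repair_loop(broken, toy_repair)
```

Here the first run fails with a `NameError`, the toy repair prepends a definition, and the second run succeeds, so `rounds` is 1. A program that can never be repaired returns `None` once the budget is spent, which lets a caller fall back to another strategy.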

Overview of the SceneCode pipeline for generating editable, executable indoor worlds from natural language prompts.
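One way to picture persistent scene-state registration is a table keyed by object ID that ties each request to the artifacts derived from it. The sketch below is an assumption-laden illustration, not SceneCode's actual data model: the class names, field names, and file paths are all hypothetical. It shows why such a registry makes edits local: changing one object's request invalidates only that object's artifacts.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

ARTIFACT_FIELDS = ("program_path", "mesh_path", "sim_path")


@dataclass
class SceneEntry:
    """Hypothetical record linking one object request to its artifacts."""
    request: str                        # natural-language object request
    program_path: Optional[str] = None  # executable object program
    mesh_path: Optional[str] = None     # rendered geometry
    sim_path: Optional[str] = None      # simulation asset
    revision: int = 0


@dataclass
class SceneRegistry:
    entries: Dict[str, SceneEntry] = field(default_factory=dict)

    def register(self, object_id: str, request: str) -> SceneEntry:
        entry = SceneEntry(request=request)
        self.entries[object_id] = entry
        return entry

    def attach(self, object_id: str, **artifacts: str) -> None:
        """Record artifact paths produced for an object."""
        entry = self.entries[object_id]
        for name, path in artifacts.items():
            if name not in ARTIFACT_FIELDS:
                raise KeyError(f"unknown artifact field: {name}")
            setattr(entry, name, path)

    def edit_request(self, object_id: str, new_request: str) -> None:
        """Change one object's request and mark only its artifacts stale."""
        entry = self.entries[object_id]
        entry.request = new_request
        entry.program_path = entry.mesh_path = entry.sim_path = None
        entry.revision += 1


registry = SceneRegistry()
registry.register("cabinet_01", "a two-door oak cabinet")
registry.register("lamp_01", "a floor lamp")
registry.attach("cabinet_01", program_path="programs/cabinet_01.py",
                sim_path="sim/cabinet_01.urdf")
registry.attach("lamp_01", program_path="programs/lamp_01.py")
registry.edit_request("cabinet_01", "a three-door oak cabinet")
```

After the edit, the cabinet entry has its artifacts cleared and its revision bumped, while the lamp entry is untouched, so only the cabinet would need to be regenerated.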

BibTeX

Please cite SceneCode with the following BibTeX entry.

@misc{wang2026scenecodeexecutableworldprograms,
  title={SceneCode: Executable World Programs for Editable Indoor Scenes with Articulated Objects},
  author={Puyi Wang and Yuhao Wang and Linjie Li and Zhengyuan Yang and Kevin Qinghong Lin and Yangguang Li and Yu Cheng},
  year={2026},
  eprint={2605.19587},
  archivePrefix={arXiv},
  primaryClass={cs.AI},
  url={https://arxiv.org/abs/2605.19587},
}