💡LightSwitch💡: Multi-view Relighting with Material-guided Diffusion

Carnegie Mellon University
ICCV 2025
arXiv | Code

Abstract


Recent approaches for 3D relighting have shown promise in integrating 2D image relighting generative priors to alter the appearance of a 3D representation while preserving the underlying structure. However, generative priors for 2D relighting that relight directly from an input image neither exploit the intrinsic properties of the subject that can be inferred, nor can they consider multi-view data at scale, leading to subpar relighting. In this paper, we propose LightSwitch, a novel finetuned material-relighting diffusion framework that efficiently relights an arbitrary number of input images to a target lighting condition while incorporating cues from inferred intrinsic properties. By using multi-view and material information cues together with a scalable denoising scheme, our method consistently and efficiently relights dense multi-view data of objects with diverse material compositions. We show that our 2D relighting prediction quality exceeds previous state-of-the-art relighting priors that relight directly from images. We further demonstrate that LightSwitch matches or outperforms state-of-the-art diffusion inverse rendering methods in relighting synthetic and real objects in as little as 2 minutes. We will publicly release our model and code.

Relighting in the Wild


LightSwitch directly relights any number of input images to a target illumination. We can then produce a relit 3D representation by freezing the Gaussian splat positions and continuing to optimize the appearance with the relit input images, as sketched below.
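A minimal sketch (not the released code) of this appearance-only fine-tuning: the splat geometry (means, scales, rotations, opacities) is frozen and only the color features are re-optimized against the relit images. The `GaussianSplat` attributes and `render` call are hypothetical placeholders for a 3DGS implementation.

```python
import torch

def finetune_appearance(splat, relit_images, cameras, iters=2000, lr=2.5e-3):
    # Freeze geometry: positions, scales, rotations, opacities (hypothetical attributes).
    for p in (splat.means, splat.scales, splat.rotations, splat.opacities):
        p.requires_grad_(False)
    # Only the color / SH features stay trainable.
    optimizer = torch.optim.Adam([splat.sh_features], lr=lr)

    for it in range(iters):
        idx = it % len(relit_images)
        rendered = splat.render(cameras[idx])  # rasterize with frozen geometry
        loss = torch.nn.functional.l1_loss(rendered, relit_images[idx])
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()
    return splat
```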

Learning Multi-view Relighting



LightSwitch relights multi-view posed input images to a given target illumination. It infers and encodes multi-view consistent material image maps \((\mathbf{I}_\text{d}, \mathbf{I}_\text{orm})\) using a material diffusion model (StableMaterialMV) and concatenates them with the Plücker ray maps \((\mathbf{P})\), encoded input images \((\mathbf{x}_\text{src})\), and noisy latents \((\mathbf{z}_t)\) along the channel dimension. The multi-view relighting UNet denoises the noisy latents and cross-attends to the lighting latents concatenated with the latent lighting directions \((\mathbf{E}_\text{dir})\). The lighting latents are encoded from the processed target environment map images \((\mathbf{E}^H_\text{tgt}, \mathbf{E}^L_\text{tgt})\).
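A minimal sketch, under our own assumptions about tensor shapes and concatenation axes, of how the UNet input and cross-attention context described above could be assembled; the function and argument names are illustrative, not the released architecture.

```python
import torch

def build_unet_inputs(z_t, x_src, I_d, I_orm, plucker,
                      E_hi_latent, E_lo_latent, E_dir):
    # Channel-wise concatenation of noisy latents, encoded source images,
    # encoded material maps (diffuse + ORM), and Plücker ray maps.
    # Each tensor is assumed to be (V, C_i, H, W) for V views.
    unet_in = torch.cat([z_t, x_src, I_d, I_orm, plucker], dim=1)

    # Cross-attention context: lighting latents from the processed high/low
    # exposure target environment maps, concatenated with the latent lighting
    # directions. Token-axis concatenation (dim=1) is an assumption here;
    # each tensor is taken to be (V, N_tokens, D).
    context = torch.cat([E_hi_latent, E_lo_latent, E_dir], dim=1)
    return unet_in, context
```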

Relighting 3D Assets



At inference, we optimize a 3D Gaussian splat on the training images and render a novel view using the rasterizer. The relit test view is then inferred by inserting the novel view into the set of source views for consistent novel-view relighting. Given the quadratic complexity of all-pair multi-view attention, we divide the input latents \(\mathbf{z}_t\) into mini-batches \(\mathbf{z}_t^\text{(1)}, \dots, \mathbf{z}_t^\text{(b)}\) and let latents attend to each other only within their subset at each denoising iteration. Because the batches are shuffled after each denoising step, latents can attend to a different subset in the next iteration. By continuously shuffling the subsets across DDPM iterations at inference, we approximate full relighting diffusion.
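A sketch of this shuffled mini-batch denoising, under the assumption that one call to a `denoise_step` helper (hypothetical) performs a UNet prediction and scheduler update over a subset of view latents; the re-shuffling each step is what lets every view eventually attend to every other view.

```python
import torch

def shuffled_batch_denoise(z_T, denoise_step, num_steps, batch_size):
    V = z_T.shape[0]            # number of view latents
    z_t = z_T.clone()
    for t in reversed(range(num_steps)):
        perm = torch.randperm(V)  # new random partition at every DDPM step
        for start in range(0, V, batch_size):
            idx = perm[start:start + batch_size]
            # Multi-view attention runs only within this subset of views.
            z_t[idx] = denoise_step(z_t[idx], t)
    return z_t
```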

Results: Objects With Lighting Dataset


All objects were captured under a single fixed and unknown environment lighting as a set of images with corresponding camera poses. We show a comparison against other state-of-the-art diffusion-based relighting methods.

Objects: antman, apple, chest, gamepad, ping_pong_racket, porcelain_mug, tpiece, wood_bowl.

Results: Synthetic Objects


LightSwitch works well for synthetic objects too!

Citation


Acknowledgements

This work was supported in part by the NSF GRFP (Grant No. DGE2140739) and NSF Award IIS-2345610. The website template was borrowed from Michaël Gharbi and ReconFusion.