💡LightSwitch💡: Multi-view Relighting with Material-guided Diffusion

Carnegie Mellon University
ICCV 2025
arXiv | Code

Abstract


Recent approaches for 3D relighting have shown promise in integrating 2D image relighting generative priors to alter the appearance of a 3D representation while preserving the underlying structure. However, generative priors for 2D relighting that relight directly from an input image neither exploit the intrinsic properties of the subject that can be inferred, nor can they consider multi-view data at scale, leading to subpar relighting. In this paper, we propose LightSwitch, a novel finetuned material-relighting diffusion framework that efficiently relights an arbitrary number of input images to a target lighting condition while incorporating cues from inferred intrinsic properties. By using multi-view and material information cues together with a scalable denoising scheme, our method consistently and efficiently relights dense multi-view data of objects with diverse material compositions. We show that our 2D relighting prediction quality exceeds previous state-of-the-art relighting priors that relight directly from images. We further demonstrate that LightSwitch matches or outperforms state-of-the-art diffusion inverse rendering methods in relighting synthetic and real objects in as little as 2 minutes. We will publicly release our model and code.

Relighting in the Wild


LightSwitch directly relights any number of input images to a target illumination. We can then produce a relit 3D representation by freezing the Gaussian splat positions and continuing to optimize the appearance with the relit input images, as sketched below.
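A minimal sketch (not the released code) of this appearance-only fine-tuning: the splat geometry (means, scales, rotations, opacities) is frozen and only the color features are re-optimized against the relit images. The `GaussianSplat` attributes and `render` call are hypothetical placeholders for a 3DGS implementation.

```python
import torch

def finetune_appearance(splat, relit_images, cameras, iters=2000, lr=2.5e-3):
    # Freeze geometry: positions, scales, rotations, opacities (hypothetical attributes).
    for p in (splat.means, splat.scales, splat.rotations, splat.opacities):
        p.requires_grad_(False)
    # Only the color / SH features stay trainable.
    optimizer = torch.optim.Adam([splat.sh_features], lr=lr)

    for it in range(iters):
        idx = it % len(relit_images)
        rendered = splat.render(cameras[idx])  # rasterize with frozen geometry
        loss = torch.nn.functional.l1_loss(rendered, relit_images[idx])
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()
    return splat
```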

Learning Multi-view Relighting



LightSwitch relights multi-view posed input images to a given target illumination. It infers and encodes multi-view consistent material image maps \((\mathbf{I}_\text{d}, \mathbf{I}_\text{orm})\) using a material diffusion model (StableMaterialMV) and concatenates them with the Plücker ray maps \((\mathbf{P})\), encoded input images \((\mathbf{x}_\text{src})\), and noisy latents \((\mathbf{z}_t)\) along the channel dimension. The multi-view relighting UNet denoises the noisy latents and cross-attends to the lighting latents concatenated with the latent lighting directions \((\mathbf{E}_\text{dir})\). The lighting latents are encoded from the processed target environment map images \((\mathbf{E}^H_\text{tgt}, \mathbf{E}^L_\text{tgt})\).
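A minimal sketch, under our own assumptions about tensor shapes and concatenation axes, of how the UNet input and cross-attention context described above could be assembled; the function and argument names are illustrative, not the released architecture.

```python
import torch

def build_unet_inputs(z_t, x_src, I_d, I_orm, plucker,
                      E_hi_latent, E_lo_latent, E_dir):
    # Channel-wise concatenation of noisy latents, encoded source images,
    # encoded material maps (diffuse + ORM), and Plücker ray maps.
    # Each tensor is assumed to be (V, C_i, H, W) for V views.
    unet_in = torch.cat([z_t, x_src, I_d, I_orm, plucker], dim=1)

    # Cross-attention context: lighting latents from the processed high/low
    # exposure target environment maps, concatenated with the latent lighting
    # directions. Token-axis concatenation (dim=1) is an assumption here;
    # each tensor is taken to be (V, N_tokens, D).
    context = torch.cat([E_hi_latent, E_lo_latent, E_dir], dim=1)
    return unet_in, context
```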

Relighting 3D Assets



At inference, we optimize a 3D Gaussian splat on the training images and render a novel view using the rasterizer. The relit test view is then inferred by inserting the novel view into the set of source views for consistent novel-view relighting. Given the quadratic complexity of all-pair multi-view attention, we divide the input latents \(\mathbf{z}_t\) into mini-batches \(\mathbf{z}_t^\text{(1)}, \dots, \mathbf{z}_t^\text{(b)}\) and let latents attend to each other only within their subset at each denoising iteration. Because the batches are shuffled after each denoising step, latents can attend to a different subset in the next iteration. By continuously shuffling the subsets across DDPM iterations at inference, we approximate full relighting diffusion.
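A sketch of this shuffled mini-batch denoising, under the assumption that one call to a `denoise_step` helper (hypothetical) performs a UNet prediction and scheduler update over a subset of view latents; the re-shuffling each step is what lets every view eventually attend to every other view.

```python
import torch

def shuffled_batch_denoise(z_T, denoise_step, num_steps, batch_size):
    V = z_T.shape[0]            # number of view latents
    z_t = z_T.clone()
    for t in reversed(range(num_steps)):
        perm = torch.randperm(V)  # new random partition at every DDPM step
        for start in range(0, V, batch_size):
            idx = perm[start:start + batch_size]
            # Multi-view attention runs only within this subset of views.
            z_t[idx] = denoise_step(z_t[idx], t)
    return z_t
```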

Results: Objects With Lighting Dataset


All objects were captured under a single fixed and unknown environment lighting as a set of images with corresponding camera poses. We show a comparison against other state-of-the-art diffusion-based relighting methods.

Objects: antman, apple, chest, gamepad, ping_pong_racket, porcelain_mug, tpiece, wood_bowl.

Results: Synthetic Objects


LightSwitch works well for synthetic objects too!

Citation


Acknowledgements

This work was supported in part by the NSF GRFP (Grant No. DGE2140739) and NSF Award IIS-2345610. The website template was borrowed from Michaël Gharbi and ReconFusion.