Google describes an approach, built on an image generation model, that applies per-object edits of material parameters, such as color, shininess, or transparency, to any photograph. The resulting parametric model leverages the real-world knowledge captured by generative text-to-image models and fine-tunes them on a synthetic dataset.
Google’s study on text-to-image models
Earlier methods, namely intrinsic image decomposition (IID), separate an image into layers of intrinsic components such as base color, specularity, and lighting. These decomposed layers can then be edited individually and recombined to produce a photorealistic image.
Other recent techniques take a source image and a target object and apply edits to it; all of these recently developed methods rely on text-to-image (T2I) models, which are highly effective at photorealistic image manipulation. However, these approaches fall short when it comes to disentangling material from shape.
For instance, when attempting to change the color of a house from blue to yellow, the shape of the house will likely shift as well. StyleDrop suffers from a similar problem: while capable of producing different looks, it fails to preserve the object’s shape across styles.
In “Alchemist: Parametric Control of Material Properties with Diffusion Models”, published at CVPR 2024, Google presented a technique that gives users parametric control over a specific material property of an object in an image, exploiting the photorealistic prior learned by the T2I model.
Google said it employed conventional computer graphics techniques and physically based rendering, long used in film and television, to render a synthetic dataset in which it has full control over the material parameters. The experiment starts from a collection of one hundred 3D models of everyday household objects spanning a range of geometric shapes.
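The essence of such a dataset is to render the same object and scene repeatedly while sweeping exactly one material parameter. The sketch below is only an illustration of that idea, not Google’s actual pipeline; it uses Blender’s Python API to vary the roughness of a Principled BSDF material and render one image per value, with the object name and output paths as placeholders.

```python
# Illustrative Blender script: render one object at several roughness values,
# keeping geometry, camera, and lighting fixed so only the material changes.
import bpy

obj = bpy.data.objects["Teapot"]  # hypothetical object name
bsdf = obj.active_material.node_tree.nodes["Principled BSDF"]

for i, roughness in enumerate([0.0, 0.25, 0.5, 0.75, 1.0]):
    bsdf.inputs["Roughness"].default_value = roughness
    bpy.context.scene.render.filepath = f"//renders/teapot_roughness_{i}.png"
    bpy.ops.render.render(write_still=True)  # everything else stays constant
```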
Next, they modify the architecture of Stable Diffusion 1.5, a latent diffusion model for T2I generation, to take in an edit strength value, and fine-tune it on the synthetic dataset so it can control material parameters. The model learns to edit a material attribute when given a context image, an instruction, and a scalar value that specifies the desired relative change in that attribute.
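The post does not spell out exactly how the scalar is injected into the network. One plausible design, sketched below under that assumption, follows how diffusion UNets already consume timestep embeddings: embed the edit strength with a small MLP and add it to the timestep embedding, alongside the usual image and text conditioning. All names and dimensions here are illustrative.

```python
import torch
import torch.nn as nn

class EditStrengthConditioning(nn.Module):
    """Hypothetical module: maps a scalar edit strength (roughly in [-1, 1])
    to an embedding that can be added to the diffusion timestep embedding."""
    def __init__(self, embed_dim: int = 1280):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(1, embed_dim),
            nn.SiLU(),
            nn.Linear(embed_dim, embed_dim),
        )

    def forward(self, strength: torch.Tensor) -> torch.Tensor:
        # strength: shape (batch,), one relative change per example
        return self.mlp(strength.unsqueeze(-1))

# At each denoising step the UNet would then receive:
#   - the noisy latents concatenated with the encoded context image,
#   - the text instruction via cross-attention (InstructPix2Pix-style),
#   - timestep_embedding + strength_embedding (this sketch's addition).
```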
To edit the material properties of objects in real-world images, the user simply feeds a new real-world image to the trained model along with the desired edit strength. The model generalizes from the relatively small amount of synthetic data to real-world scenes, enabling material edits that would otherwise be prohibitively expensive to produce, while keeping all other attributes constant.
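Inference is then a single call to the fine-tuned model. The trained model described in the post has not been released, so the pipeline object and its call signature below are hypothetical; the sketch only shows the shape of the call implied by the description above.

```python
from PIL import Image

# `alchemist` stands in for the (unreleased) fine-tuned diffusion model;
# its call signature is assumed, mirroring the three inputs described above.
def edit_material(alchemist, photo_path: str, instruction: str,
                  strength: float) -> Image.Image:
    image = Image.open(photo_path).convert("RGB")
    result = alchemist(image=image, prompt=instruction, edit_strength=strength)
    return result.images[0]

# Example (hypothetical):
# edited = edit_material(model, "living_room.jpg",
#                        "make the vase more transparent", 0.8)
```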
What do they conclude from the studies?
The method is impressive in how effective it is. When the requested change is to make an object more metallic, the model convincingly transforms the object’s appearance while keeping its shape and the image’s illumination unchanged. When asked to make an object more transparent, it plausibly fills in the background behind the object, internal structures not previously visible, and the light reflected within the object.