Title: Semantic operations in CLIP latent space

Abstract:

Isolating the objects and semantics in an image can be useful for several processing tasks, such as compression.

However, this is usually done through complex retraining and disentanglement of learned image representations.

In this paper, we instead study the effect of simple operations, namely additions and subtractions, in the latent space of the powerful foundation model CLIP.

We show that these simple operations in the CLIP latent space make it possible to remove or add objects or concepts in complex images.
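As an illustration of what addition and subtraction in a latent space can look like, the following is a minimal sketch and not the method of the paper. It assumes that each image is represented by a single embedding vector, that the secondary embedding is scaled by a weight `alpha` before being added (negative values subtract it), and that the result is renormalized to unit length. The names `shift_embedding` and `alpha`, the embedding size of 512, and the random stand-in vectors are all assumptions made for the example; no real CLIP model is loaded.

```python
import numpy as np

def l2_normalize(v):
    """Scale a vector to unit length, as is common before cosine comparisons."""
    return v / np.linalg.norm(v)

def shift_embedding(z_input, z_secondary, alpha):
    """Add (alpha > 0) or subtract (alpha < 0) a secondary embedding, then renormalize."""
    return l2_normalize(z_input + alpha * z_secondary)

# Stand-in embeddings: random unit vectors, NOT real CLIP outputs (assumption).
rng = np.random.default_rng(0)
dim = 512  # assumed embedding size
z_input = l2_normalize(rng.standard_normal(dim))
z_secondary = l2_normalize(rng.standard_normal(dim))

# Sweep the weight to mimic gradual removal (negative) and addition (positive).
alphas = np.linspace(-1.0, 1.0, 5)
similarities = [
    float(shift_embedding(z_input, z_secondary, a) @ z_secondary) for a in alphas
]
```

In this sketch, the cosine similarity between the shifted embedding and the secondary embedding grows as `alpha` increases, which is one simple way to check that the secondary concept is being progressively removed or added.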

(Right) Input image and secondary image. (Left) Gradually removing the secondary image from the input (Left to right, top to bottom).

(Right) Input image and secondary image. (Left) Gradually adding the secondary image to the input (Left to right, top to bottom).

(Right) Input image and secondary image. (Left) Gradually adding the secondary image to the input (Left to right, top to bottom).