We have explored how to compress images at extremely low bitrates. As explained before, in this bitrate regime one must abandon the objective of faithfully recovering the input image. Instead of resembling the input pixel by pixel, the decoded image preserves the same high-level information (called semantics), as illustrated in the figure below. This paradigm is called semantic compression.

The formulation is a special case of the general Data Repurposing framework introduced here:
$$\min_{f,g} \; r(Z) + \lambda_\Phi \, d\big(\Phi(X), \Phi(U)\big) - \lambda_\psi \, \psi(U)$$
where
– f and g are respectively the encoding and decoding functions.
– X and U are respectively the original and reconstructed images.
– Φ(·) is a function that extracts the semantics of an image, and d is a distance function between image semantics.
– ψ(·) is a metric measuring the visual quality of an image.
In this formulation, Z describes the image semantics and g reconstructs an image from these semantics. The most suitable tool for this is a diffusion model, which can generate an image from a condition (a prompt, for example). We therefore build a general coding scheme structure (illustrated in the figure below).
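
To make the role of each term in the objective concrete, here is a minimal Python sketch that evaluates it for given values. It assumes that r(Z) is the coding cost of Z in bits and that the semantics Φ(X) and Φ(U) are already available as plain vectors. The function names, the squared Euclidean distance used for d, and the toy numbers are illustrative assumptions, not part of the original formulation.

```python
from typing import Callable, Sequence

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    """Hypothetical choice for d: squared Euclidean distance between semantic vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def semantic_objective(
    rate_bits: float,      # r(Z), assumed here to be the coding cost of Z in bits
    sem_x: Vector,         # Phi(X): semantics of the original image
    sem_u: Vector,         # Phi(U): semantics of the reconstructed image
    quality_u: float,      # psi(U): visual quality score of the reconstruction
    lambda_phi: float,     # weight of the semantic distortion term
    lambda_psi: float,     # weight of the visual quality term
    distance: Callable[[Vector, Vector], float] = squared_distance,
) -> float:
    """Evaluate r(Z) + lambda_phi * d(Phi(X), Phi(U)) - lambda_psi * psi(U)."""
    return rate_bits + lambda_phi * distance(sem_x, sem_u) - lambda_psi * quality_u


# Toy values chosen only to show how the three terms combine.
cost = semantic_objective(
    rate_bits=100.0,
    sem_x=[1.0, 0.0],
    sem_u=[0.0, 0.0],
    quality_u=0.5,
    lambda_phi=10.0,
    lambda_psi=4.0,
)
print(cost)
```

The quality term enters with a minus sign, so at a fixed rate and semantic distance a reconstruction with higher visual quality lowers the cost.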

In this scheme, the encoder builds two types of semantic information:
– the condition, which is the information on which the trained diffusion model is conditioned.
– side information, which is complementary information that can be used to supplement the condition.
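
The structure of this scheme can be sketched in Python as follows. This is only a schematic illustration: the names `SemanticCode`, `encode`, and `decode` are hypothetical, and the extractors and the generator are toy stand-ins. In an actual scheme the generator would be a conditional diffusion model and the extractors would produce, for example, a segmentation map or a color map.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class SemanticCode:
    """What the encoder transmits (hypothetical container)."""
    condition: Any   # information the diffusion model is conditioned on
    side_info: Any   # complementary information that supplements the condition


def encode(
    image: Any,
    extract_condition: Callable[[Any], Any],
    extract_side_info: Callable[[Any], Any],
) -> SemanticCode:
    """Build both types of semantic information from the input image."""
    return SemanticCode(extract_condition(image), extract_side_info(image))


def decode(code: SemanticCode, generate: Callable[[Any, Any], Any]) -> Any:
    """Generate an image from the condition, using the side information as well."""
    return generate(code.condition, code.side_info)


# Toy stand-ins: a fixed text condition and the mean pixel value as side information.
image = [[0, 2], [4, 6]]
code = encode(
    image,
    extract_condition=lambda img: "toy prompt",
    extract_side_info=lambda img: sum(map(sum, img)) / sum(len(row) for row in img),
)
# The toy "generator" just echoes its inputs instead of running a diffusion model.
decoded = decode(code, generate=lambda cond, side: {"prompt": cond, "mean": side})
print(decoded)
```

Keeping the condition and the side information as separate fields mirrors the two research questions that follow: what to condition on, and what extra information to send and how to use it for guidance.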
We have proposed six coding schemes that address the following research questions.
What condition?
- SGC (-PDF-): a compression scheme based on segmentation maps
- COCLI (-PDF-): a compression scheme based on CLIP
What side information? How to guide?
- COCLICO (-PDF-): a compression scheme based on CLIP and a tiny color map
- G-COCLICO (-PDF-): a variant of COCLICO in which the guidance equations are rewritten to match the color map side information more closely
- SEACOM (-PDF-): a compression scheme based on CLIP and attention maps
How to control the decoding faithfulness?
- SESECO (-PDF-): a compression scheme based on segmentation maps and a color map, where an adaptive precision can be assigned to each object in the image
