

{"id":332,"date":"2025-12-23T18:34:53","date_gmt":"2025-12-23T17:34:53","guid":{"rendered":"https:\/\/project.inria.fr\/dare\/?page_id=332"},"modified":"2025-12-23T19:29:29","modified_gmt":"2025-12-23T18:29:29","slug":"semantic-compression","status":"publish","type":"page","link":"https:\/\/project.inria.fr\/dare\/semantic-compression\/","title":{"rendered":"Semantic compression"},"content":{"rendered":"\n<p>We have explored how to compress an image at extremely low bitrate. As explained before, in such a bitrate regime, one must leave the objective of faithfully recovering the input image. In other words, the decoded image, instead of resembling the input image pixel by pixel, contains the same high-level information (called semantics), as illustrated in the figure below. This paradigm is called the<br>semantic compression.<\/p>\n\n\n\n<figure class=\"wp-block-image size-large is-resized\"><a href=\"https:\/\/project.inria.fr\/dare\/files\/2025\/12\/semantic_compression.png\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"467\" src=\"https:\/\/project.inria.fr\/dare\/files\/2025\/12\/semantic_compression-1024x467.png\" alt=\"\" class=\"wp-image-341\" style=\"width:706px;height:auto\" srcset=\"https:\/\/project.inria.fr\/dare\/files\/2025\/12\/semantic_compression-1024x467.png 1024w, https:\/\/project.inria.fr\/dare\/files\/2025\/12\/semantic_compression-300x137.png 300w, https:\/\/project.inria.fr\/dare\/files\/2025\/12\/semantic_compression-768x350.png 768w, https:\/\/project.inria.fr\/dare\/files\/2025\/12\/semantic_compression-150x68.png 150w, https:\/\/project.inria.fr\/dare\/files\/2025\/12\/semantic_compression.png 1060w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/a><\/figure>\n\n\n\n<p>The formulation is a particular derivation of the general Data Repurposing framework introduced <a href=\"https:\/\/project.inria.fr\/dare\/project-overview\/\">here<\/a>:<br>min<sub>f,g<\/sub> r(Z) + \u03bb<sub>\u03a6<\/sub> d(\u03a6(X), 
\u03a6(U)) &#8211; \u03bb<sub>\u03c8<\/sub> \u03c8(U)<br>where<br>&#8211; <em>f<\/em> and <em>g<\/em> are respectively the encoding and decoding functions.<br>&#8211; <em>X<\/em> and <em>U<\/em> are respectively the original and reconstructed images.<br>&#8211; \u03a6(.) is a function that extracts the semantics of an image, and <em>d<\/em> is a distance function between image semantics.<br>&#8211; \u03c8(.) is a metric measuring the visual quality of an image.<\/p>\n\n\n\n<p>In this formulation, <em>Z<\/em> describes the image semantics and <em>g<\/em> reconstructs an image from these semantics. The most suitable tool for this is a diffusion model, which can generate an image from a condition (a prompt, for example). We then build a general coding scheme structure<br>(illustrated in the figure below).<\/p>\n\n\n\n<figure class=\"wp-block-image is-resized\"><a href=\"https:\/\/project.inria.fr\/dare\/files\/2025\/12\/pipeline_website-1.png\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"414\" src=\"https:\/\/project.inria.fr\/dare\/files\/2025\/12\/pipeline_website-1-1024x414.png\" alt=\"\" class=\"wp-image-346\" style=\"width:763px;height:auto\" srcset=\"https:\/\/project.inria.fr\/dare\/files\/2025\/12\/pipeline_website-1-1024x414.png 1024w, https:\/\/project.inria.fr\/dare\/files\/2025\/12\/pipeline_website-1-300x121.png 300w, https:\/\/project.inria.fr\/dare\/files\/2025\/12\/pipeline_website-1-768x311.png 768w, https:\/\/project.inria.fr\/dare\/files\/2025\/12\/pipeline_website-1-1536x621.png 1536w, https:\/\/project.inria.fr\/dare\/files\/2025\/12\/pipeline_website-1-150x61.png 150w, https:\/\/project.inria.fr\/dare\/files\/2025\/12\/pipeline_website-1.png 2048w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/a><\/figure>\n\n\n\n<p>In this scheme, the encoder builds two types of semantic information:<br>&#8211; the condition, which is the information on which the trained diffusion model is conditioned.<br>&#8211; a side 
information, i.e., complementary information that can be used to refine the condition.<\/p>\n\n\n\n<p>We have proposed six coding schemes that tackle different research questions.<\/p>\n<p><strong>What condition?<\/strong><\/p>\n<ul>\n<li>SGC (<a href=\"https:\/\/hal.archives-ouvertes.fr\/hal-04231421\">-PDF-<\/a>): a compression scheme based on segmentation maps<\/li>\n<li>COCLI (<a href=\"https:\/\/hal.archives-ouvertes.fr\/hal-04600515\">-PDF-<\/a>): a compression scheme based on CLIP<\/li>\n<\/ul>\n<p><strong>What side information? How to guide?<\/strong><\/p>\n<ul>\n<li>COCLICO (<a href=\"https:\/\/hal.archives-ouvertes.fr\/hal-04478601\">-PDF-<\/a>): a compression scheme based on CLIP and a tiny color map<\/li>\n<li>G-COCLICO (<a href=\"https:\/\/hal.archives-ouvertes.fr\/hal-04882103\">-PDF-<\/a>): a derivation of COCLICO where the guidance equations are rewritten to precisely match the color-map side information<\/li>\n<li>SEACOM (-PDF-): a compression scheme based on CLIP and attention maps<\/li>\n<\/ul>\n<p><strong>How to control the decoding faithfulness?<\/strong><\/p>\n<ul>\n<li>SESECO (-PDF-): a compression scheme based on segmentation maps and a color map, where an adaptive precision can be assigned to each object of the image<\/li>\n<\/ul>\n","protected":false},"excerpt":{"rendered":"<p>We have explored how to compress an image at an extremely low bitrate. As explained before, in such a bitrate regime, one must abandon the objective of faithfully recovering the input image. 
In other words, the decoded image, instead of resembling the input image pixel by pixel, contains the same high-level\u2026<\/p>\n<p> <a class=\"continue-reading-link\" href=\"https:\/\/project.inria.fr\/dare\/semantic-compression\/\"><span>Continue reading<\/span><i class=\"crycon-right-dir\"><\/i><\/a> <\/p>\n","protected":false},"author":1433,"featured_media":0,"parent":0,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":{"footnotes":""},"class_list":["post-332","page","type-page","status-publish","hentry"],"_links":{"self":[{"href":"https:\/\/project.inria.fr\/dare\/wp-json\/wp\/v2\/pages\/332","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/project.inria.fr\/dare\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/project.inria.fr\/dare\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/project.inria.fr\/dare\/wp-json\/wp\/v2\/users\/1433"}],"replies":[{"embeddable":true,"href":"https:\/\/project.inria.fr\/dare\/wp-json\/wp\/v2\/comments?post=332"}],"version-history":[{"count":14,"href":"https:\/\/project.inria.fr\/dare\/wp-json\/wp\/v2\/pages\/332\/revisions"}],"predecessor-version":[{"id":364,"href":"https:\/\/project.inria.fr\/dare\/wp-json\/wp\/v2\/pages\/332\/revisions\/364"}],"wp:attachment":[{"href":"https:\/\/project.inria.fr\/dare\/wp-json\/wp\/v2\/media?parent=332"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}
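The objective min<sub>f,g</sub> r(Z) + λ<sub>Φ</sub> d(Φ(X), Φ(U)) − λ<sub>ψ</sub> ψ(U) can be illustrated numerically. Below is a minimal sketch in which `rate`, `phi`, and `psi` are toy stand-ins (hypothetical placeholders, not the project's actual models) for the rate term r(.), the semantic extractor Φ(.) (a real system would use e.g. CLIP or a segmentation network), and the quality metric ψ(.); the distance d is taken as squared L2.

```python
import numpy as np

def rate(z):
    # Toy rate term r(Z): bits needed to store Z at float16 precision.
    return z.size * 16

def phi(image):
    # Toy semantic extractor Phi(.): a coarse 2x2 block average standing in
    # for a real feature extractor such as CLIP.
    h, w = image.shape
    return image.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def psi(image):
    # Toy visual-quality metric psi(.): higher is better; here, negative
    # total variation, so smoother images score higher.
    return -(np.abs(np.diff(image, axis=0)).sum()
             + np.abs(np.diff(image, axis=1)).sum())

def objective(z, x, u, lam_phi=1.0, lam_psi=0.1):
    # r(Z) + lam_phi * d(Phi(X), Phi(U)) - lam_psi * psi(U),
    # with d the squared L2 distance between semantic features.
    d = np.sum((phi(x) - phi(u)) ** 2)
    return rate(z) + lam_phi * d - lam_psi * psi(u)

x = np.ones((4, 4))        # original image
u_good = np.ones((4, 4))   # reconstruction with matching semantics
u_bad = np.zeros((4, 4))   # reconstruction with different semantics
z = np.zeros(8)            # latent code describing the semantics

# A semantically faithful reconstruction yields a lower objective value.
print(objective(z, x, u_good) < objective(z, x, u_bad))
```

The point of the sketch is the trade-off structure: the rate term penalizes the size of the semantic code Z, the semantic distance keeps the reconstruction's high-level content close to the original, and the quality term (subtracted, since ψ rewards quality) lets the decoder favor visually pleasing outputs even when they differ pixel-wise from the input.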