Project Overview – Data Repurposing (DARE)

Context and objectives: Reinventing how compression is done is the main objective of the MADARE project. The proposed direction, called Data Repurposing, consists in replacing the initial data format with a more compact representation, which leads to drastic compression ratios. This new data format still describes the information contained in the initial database, but from another angle and in a more concise form, much as a music score describes a concert in a compact and reusable form. This allows the compression to discard a tremendous amount of useless, or at least non-essential, information while condensing the important information into a compact recycled signal. In a nutshell, in the Data Repurposing framework, the decoded signals target subjective exhaustiveness of the information description, rather than fidelity to the input data as in traditional compression algorithms. Such approaches will be designed according to the users' profiles, taking into account their feedback and interactive data labelling. This is a complete change of paradigm for image and video compression, which should enable gigantic compression gains.

Approach: Let us formulate the Data Repurposing framework and make its novelty explicit.
– An image data collection Xⁿ.
– Encoder f: Xⁿ → Z, where Z is the compressed description.
– The rate r is measured as the size of Z.
– Decoder g: Z → U, where U is the decoded image collection (it can contain fewer images than Xⁿ or the same number of images).
– The quality of the reconstruction is measured as q( U, Xⁿ ). The novelty is that q measures the exhaustiveness of the semantic information contained in U (with respect to Xⁿ ) instead of a pixel-by-pixel fidelity.

The objective is to target a rate r( Z ) that is ultra low (several orders of magnitude lower than that of regular image compression algorithms). For that purpose, one needs to abandon the objective of reconstructing the input faithfully. Only general exhaustiveness is targeted.
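
The formulation above can be illustrated with a toy sketch. It is not the project's method: it assumes, purely for illustration, that each image is already reduced to a set of semantic labels, and the names encode, rate, decode and quality are hypothetical stand-ins for f, r, g and q. The point is only to show a quality measure that rewards semantic coverage rather than pixel fidelity, and a decoded collection U that is smaller than the input collection.

```python
from typing import FrozenSet, List

# Assumption: an "image" is represented only by the set of semantic labels
# it depicts. A real system would need an actual semantic extractor.
Image = FrozenSet[str]


def encode(collection: List[Image]) -> bytes:
    """Toy encoder f: keep only the distinct labels of the whole collection.

    Labels are assumed not to contain commas.
    """
    labels = sorted(set().union(*collection))
    return ",".join(labels).encode("utf-8")


def rate(z: bytes) -> int:
    """Rate r: size of the compressed description Z, in bytes."""
    return len(z)


def decode(z: bytes) -> List[Image]:
    """Toy decoder g: return a single synthetic 'image' carrying all labels."""
    if not z:
        return []
    return [frozenset(z.decode("utf-8").split(","))]


def quality(u: List[Image], x: List[Image]) -> float:
    """Toy q(U, X^n): fraction of the labels of X^n that appear somewhere in U."""
    source = set().union(*x)
    if not source:
        return 1.0
    decoded = set().union(*u)
    return len(source & decoded) / len(source)


x = [
    frozenset({"beach", "dog"}),
    frozenset({"beach", "sunset"}),
    frozenset({"dog"}),
]
z = encode(x)
u = decode(z)

print(rate(z))        # 16 bytes for "beach,dog,sunset"
print(len(u))         # 1 decoded image instead of 3
print(quality(u, x))  # 1.0: every label of the input is covered
```

In this sketch the decoded collection shares no pixels with the input, yet q is maximal because all semantic labels are recovered. A partial reconstruction such as a single image labelled only "dog" would score 1/3.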

The research questions tackled are thus:

– Which quality metric should be used to measure exhaustiveness?
– Which compressed form should describe the semantics of the image collection?
– Which decoder should be used to recover the decoded signals U?

The MADARE project has explored approaches that rely on the following two pillars:

Semantic compression of individual images at ultra-low bitrate
Blau et al. (2018) showed that when the rate decreases too much, a trade-off arises between perception (how good the decoded image looks) and distortion (its fidelity to the original image). This is why a new coding paradigm is needed, which we call semantic compression. We have proposed different coding schemes to demonstrate the relevance and potential of such a framework.
Compression of image collections
We have also explored how to jointly compress multiple images in a data collection. Two approaches have been investigated: sampling and exploitation of the semantic redundancies.