Lossless Image Compression Through Super-Resolution (github.com/caoscott)
384 points by beagle3 on April 7, 2020 | 125 comments


This is utterly fascinating.

To be clear -- it stores a low-res version in the output file, uses neural networks to predict the full-res version, then encodes the difference between the predicted full-res version and the actual full-res version, and stores that difference as well. (Technically, multiple iterations of this.)

I've been wondering when image and video compression would start utilizing standard neural network "dictionaries" to achieve greater compression, at the (small) cost of requiring a local NN file that encodes all the standard image "elements".

This seems like a great step in that direction.


First author here. First of all thanks so much for the interest in SReC! It was a pleasant surprise seeing my research on top of Hacker News. Answering a few questions from reading the comments:

How is this lossless? The entropy coder is what makes this technique lossless. The neural network predicts a probability distribution over pixels, and the entropy coder can find a near optimal mapping of pixel values to bits based on those probabilities (near optimal according to Shannon’s entropy).
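As a rough illustration of that point (a toy sketch with invented pixel values and probabilities, not the actual SReC coder), Shannon's -log2(p) rule shows how a confident prediction turns into a small bit cost:

    import math

    # Toy example: a model's predicted distribution over the next pixel value.
    # (Values and probabilities are illustrative, not from SReC.)
    predicted = {0: 0.05, 127: 0.70, 128: 0.20, 255: 0.05}

    # An ideal entropy coder spends about -log2(p) bits on a symbol with
    # predicted probability p, so likely values become very cheap.
    for value, p in predicted.items():
        print(f"pixel {value:3d}: p={p:.2f} -> ~{-math.log2(p):.2f} bits")

    # The expected cost per pixel under this model is its Shannon entropy.
    entropy = -sum(p * math.log2(p) for p in predicted.values())
    print(f"expected bits per pixel: {entropy:.2f}")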

On practicality of the method, I don’t expect SReC to replace PNGs anytime soon. The current implementation is not efficient for client-side single-image decoding because of the cost of loading a neural network into memory. However, for decoding many high-quality images, this is efficient because the memory cost is amortized. Additionally, the model size can be reduced with a more efficient architecture and pruning/quantization. Finally, as neural networks become more popular in image-related applications, I think the hardware and software support to run neural nets client-side efficiently will get better. This project in its current form is just a proof of concept that we can get state-of-the-art compression rates using neural networks. The previous practical neural network-based approach (L3C) was not able to beat FLIF on Open Images.

For a detailed explanation of how SReC works and results, please refer to our paper: https://arxiv.org/abs/2004.02872.

Btw, SReC is pronounced “Shrek”, because both ogres and neural nets have layers ;).


Hi, and thanks for your interesting work.

Isn't the quality of the prediction heavily influenced by how common the encoded content is?


Yes. However, we only care about compressing natural images, and they occupy only a small subset of the space of all possible images. In practice, we find that neural networks are quite good at making predictions on pixel values, especially when we frame the problem in terms of super-resolution.


Even though the implementation details are far from trivial, the general idea is fairly typical. Most advanced compression algorithms work the same way.

- Using the previously decoded data, try to predict what's next, and the probability of being right

- Using an entropy coder, encode the difference between what is predicted and the actual data. The predicted probability will be used to determine how many bits to assign to each possible value. The higher the probability, the fewer bits will be used for a "right" answer and the more bits for a "wrong" answer.

Decoding works by "replaying" what the encoder did.
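A minimal sketch of that encode/decode symmetry (a generic toy codec, not this project's code): as long as both sides run the same predictor over the same already-decoded prefix, the decoder can replay the encoder exactly.

    def predict(prefix):
        # Toy predictor shared by encoder and decoder: guess that the next
        # byte repeats the previous one (real codecs use far better models).
        return prefix[-1] if prefix else 0

    def encode(data):
        # Keep only the residual against the prediction from the seen prefix;
        # a real codec would then entropy-code these residuals.
        return [(actual - predict(data[:i])) % 256 for i, actual in enumerate(data)]

    def decode(residuals):
        # "Replay" the encoder: same predictor, same prefix, so each residual
        # can be undone exactly.
        out = []
        for r in residuals:
            out.append((predict(out) + r) % 256)
        return out

    data = [10, 10, 11, 11, 11, 200]
    assert decode(encode(data)) == data  # lossless no matter how weak the predictor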

The most interesting part is the prediction. So much so that some people think of compression as a better test for AIs than the Turing test. You are basically asking the computer to solve one of those sequence-based IQ tests.

And of course neural networks are one of the first things we tend to think of when we want to implement an AI, and unsurprisingly they are not an uncommon approach to compression. For instance, the latest PAQ compressors use neural networks.

Of course, all of that is the general idea. How to do it in practice is where the real challenge is. Here, the clever part is to "grow" the image from low to high resolution, which kind of reminds me of wavelet compression.


> encode the difference between what is predicted and the actual data

Minor nitpick: In the idealized model there is no single prediction that you can take the difference with. There is just a probability distribution and you encode the actual data using this distribution.

Taking the difference between the most likely prediction and the actual data is just a very common implementation strategy.


How is "the most likely prediction" not a "single prediction that you can take the difference with"?


What if there are two (nearly) equally likely predictions? What if there are N, and they are not close together?
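For instance (a hedged toy example; the distribution and the "small differences are likely" model below are both invented): with two far-apart, equally likely values, coding straight from the distribution is cheap either way, while coding the difference from the single most likely prediction can be extremely expensive.

    import math

    # Invented bimodal distribution over a pixel value: two far-apart modes.
    p = {10: 0.5, 200: 0.5}

    # Coding straight from the distribution costs 1 bit whichever mode occurs.
    print(-math.log2(p[10]), -math.log2(p[200]))  # 1.0 1.0

    # Coding the difference from the single most likely prediction (say 10),
    # under a normalized two-sided geometric toy model where small differences
    # are assumed to be far more likely than large ones:
    def residual_bits(diff):
        return -math.log2((0.5 ** abs(diff)) / 3)

    print(residual_bits(0))    # ~1.6 bits when the true value really is 10
    print(residual_bits(190))  # ~191.6 bits when the true value is the other mode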

"A single prediction that you can take the difference with" makes some assumptions about the shape of your distribution, at least under any reasonable model of "code the difference" where, e.g., probability decreases as the difference gets larger. These are often very good assumptions, to be fair.


Anyone interested in this approach to lossless compression should visit (and try to win!) the Hutter Prize website [1][2]. The goal is to compress 1 GB of English Wikipedia.

[1] http://prize.hutter1.net/

[2] https://en.wikipedia.org/wiki/Hutter_Prize


I don't think anyone has yet applied neural net approaches to the Hutter Prize with success.

The trained neural network weights tend to be very large and hard to compress, and the Hutter Prize requires their size to be counted too.

To win the Hutter Prize, you'd probably need some kind of training-during-inference system.



I wonder if a larger initial dataset (say 5G or 10G) might lead to better overall % compression.


The actual difference here is the encoding, into a representation of a dataset annotated by thousands of people. Sounds like a basis of knowledge or even understanding.

I bet this scales way better than any other method on large datasets


Indeed very fascinating.

Reminds me of doing something similar, albeit a thousand times dumber, in ~2004 when I had to find a way to "compress" interior automotive audio data, indicator sounds, things like that. At some point, instead of using traditional compression, I synthesized a wave function and only stored its parameters and the delta from the actual wave, which achieved great compression ratios. It was expensive to compress but virtually free to decompress. And as a side effect my student mind was forever blown by the beauty of it.


It's a really cool idea, but I don't know if this would ever be a practical method for image compression. First of all, you could never change the neural network without breaking the compression, so you can't ever "update" it. Like: what if you figure out a better network? Too bad! I mean, I guess you could, but then you need to version the files and keep copies of all the networks you've ever used, and this gets messy quickly.

And speaking of storing the networks: I don't know that you would ever want to pay the memory hit that it would take to store the entire network in memory just to decompress images or video, nor the performance hit the decompression takes. The trade-off here is trading reduced drive space for massively increased RAM and CPU/GPU time. I don't know any case where you'd want to make that trade-off, at least not at this magnitude.

Again though: it's an awesome idea. I just don't know that's ever going to be anything other than a cool ML curiosity.


> First of all, you could never change the neural network without breaking the compression, so you can't ever "update" it. Like: what if you figure out a better network? Too bad!

Isn’t this just a special version of a problem any type of compression will always have? There’s all kinds of ways you can imagine improving on a format like JPEG, but the reason it’s useful is because it’s locked down and widely supported.


Usual compression standards are mostly adaptive: they estimate statistical models of the input from implicit prior distributions (e.g. the probability of A followed by B begins at p(A)p(B)), reasonable assumptions (e.g. scanlines in an image follow the same distribution), and small, fixed tables and rules (e.g. the PNG filters). That is not only a low volume of data, but data that can only change as part of a major change of algorithm.

A neural network that models upscaling is, on the other hand, not only inconveniently big, but also completely explicit (inviting all sorts of tweaking and replacement) and adapted to a specific data set (further demanding specialized replacements for performance reasons).

Among the applications that are able to store and process the neural network, which is no small feat, I don't think many would be able to amortize the cost of a tailored neural network over a large, fixed set of very homogeneous images.

The imagenet64 model is over 21 MB: saving 21 MB over PNG size, at 4.29 vs 5.74 bpp (table 2a in the article), requires a set of more than 83 MB of perfectly imagenet64-like PNG images, which is a lot. Compressing with a custom upscaling model the image datasets used for neural network experiments, which are large and stable, is the most likely good application (with the side benefit of producing useful and interesting downscaled images for free in addition to compressing the originals).
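For reference, a quick back-of-the-envelope version of that break-even calculation, using the ~21 MB model size and the bpp numbers from table 2a quoted above:

    model_mb = 21.0                   # imagenet64 model size mentioned above
    png_bpp, srec_bpp = 5.74, 4.29    # bits per pixel, table 2a of the paper

    savings_fraction = (png_bpp - srec_bpp) / png_bpp      # ~25% of PNG size saved
    break_even_mb = model_mb / savings_fraction
    print(f"break-even: ~{break_even_mb:.0f} MB of imagenet64-like PNGs")  # ~83 MB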


Even if it's not useful for general-purpose compression, it may still be useful in a more restricted domain. In text compression, Brotli can be found in Chrome with a dictionary that is tuned for HTTP traffic. And in audio compression, LPCnet is a research codec that used Wavenet (neural nets for speech synthesis) to compress speech to 1.6kb/s (prior discussion from 2019 at https://news.ycombinator.com/item?id=19520194).


For a standard network, you're right there would only be one version. So you just make sure it's very carefully put together. (If a massively better one comes along, then you just make it a new file format.)

And as for performance/resources -- great point. But what about video, where the space/bandwidth improvements become drastically more important?

Since h.264 and h.265 already have dedicated hardware, would it be reasonable to assume that a chip dedicated to this would handle it just fine?

And that if you've already got hardware for video, then of course you'd just re-use it for still images?


>(If a massively better one comes along, then you just make it a new file format.)

I guess you could have versioning of your file format, and some sort of organization that standardized it.


Then you get the layperson who doesn’t understand that and asks why their version 42 .imgnet won’t open in a program only supporting up to 10 (but they don’t know their image is v42 and the program only supports v10). It’s easier to understand different formats than different versions.


You'd think the program that only supports up to v10 should be able to emit an error saying that.


I think the idea is the network is completely trained and encoded along with the image and delta data. A new network would just require retraining and storing that new network along with the image data. It doesn't use a global network for all compressions.


I don't think this would work, the size of the network would likely dominate the size of the compressed image.


Wouldn't the network be part of the decoder?


Yes, and this is why you couldn't update the network. Still, much like how various compression algos have "levels," this standard could be more open in this regard, adding new networks (sort of what others above refer to as versions), and the image could just specify which network it uses. Maybe have a central repo from where the decoder could pull a network it doesn't have (i.e. I make a site and encode all 1k images on it using my own network, pull the network to your browser once so you can decode all 1k images). And even support a special mode where the image explicitly includes the network to be used for decoding it along with the image data (could make sense for very large images, as well as for specialized/demonstrational/test purposes).

All in all, a very interesting idea.


I wonder what the security implications of all this are; it sounds dangerous to just run any old network. I suppose maybe if it's sandboxed enough, with very strongly defined inputs and outputs, then the worst that could happen is you get garbled imagery?


they include the trained models under the "model weights" section. imagenet is ~20mb, openimages is ~17mb.

Now this might be prohibitive for images over the web, but it'd be interesting whether it might be applicable for images with huge resolutions for printing, where single images are hundreds of megabytes.


Why can't you update it?

There could be a release of a new model every 6 months or something (although even that is probably too often, the incremental improvement due to statistical changes in the distribution of images being compressed isn't likely to change much over time), and you just keep a copy of all the old models (or lazily download them like msft foundation c++ library versions when you install an application).

The models themselves aren't very large.


I don't know why this comment was downvoted - it's a legitimate question.

One scenario I can picture is the Netflix app on your TV. Firstly, they create a neural network trained on the video data in their library and ship it to all their clients while they are idle. They could then stream very high-quality video at lower bandwidth than they currently use and, assuming decoding can be done quickly enough, provide a great experience for their users. Any updates to the neural network could be rolled out gradually and in the background.


Google used to do something called SDCH (Shared Dictionary Compression for HTTP), where a delta compression dictionary was downloaded to Chrome.

The dictionary had to be updated from time to time to keep a good compression rate as the Google website changed over time. There was a whole protocol to handle verifying what dictionary the client had and such.


Not just that, but you could take a page out of the "compression" book and treat the NN as a sort of dictionary in that it is part of the compressed payload. Maybe not the whole NN, but perhaps deltas from a reference implementation, assuming the network structure remains the same and/or similar.


> I don't know that you would ever want to pay the memory hit that it would take to store the entire network in memory just to decompress images or video, nor the performance hit the decompression takes.

The big memory load wouldn't necessarily be a problem for the likes of YouTube and Netflix - they could just have dedicated machines which do nothing else but decoding. The performance penalty could be a killer though.


If you've got a big enough image you can include the model parameters with the image.


There is already a startup that makes a video compression codec based on ML - http://www.wave.one/video-compression - I am personally following their work because I think it's pretty darn cool.


There's also TVeon

https://tveon.com


It's an old idea really, or a collection of old ideas with a NN twist. Not really clear how much that latter bit brings to the table but interesting to think about.

The "dictionary" approach was roughly what vector quantization was all about. The idea of turning lossy encoders into lossless ones by also encoding the error is an old one too, but it was somewhat derailed by the focus on embeddable codecs with the ideal that each additional bit read will improve your estimate.

I think the potential novelty here is really in the unfortunately-named-but-too-late-now super-resolution aspects. You could do the same sort of thing ages ago with, say, IFS projection, or wavelet (and related) trees, or VQ dictionaries with a resolution bump, but they were limited by the training a bit (although this approach might have some overtraining issues that make it worse for particular applications).


Great explanation. If it is and stays lossless, it would make an awesome photo archiving and browsing tool.

Browse thumbnails, open original. Without any processes to generate / keep in sync these files.


The JPEG 2000 standard allows for multiscale resolutions by using wavelet transforms instead of the discrete cosine transform.


The majority of photos you already have most likely contain thumbnail and larger preview images embedded in the EXIF header.

Raw images typically contain an embedded, full-sized JPEG version of the image as well.

All of these are easily extracted with `exiftool -b -NameOfBinaryTag $file > thumb.jpg`.

I've found while making PhotoStructure that the quality of these embedded images is surprisingly inconsistent, though. Some makes and models do odd things, like handle rotation inconsistently, add black bars to the image (presumably to fit the camera display whose aspect ratio is different from the sensor), render the thumb with a color or gamma shift, or apply low-quality reduction algorithms (apparent due to nearest-neighbor jaggies).

I ended up having to add a setting that lets users ignore these previews or thumbnails (to choose between "fast" and "high quality").


The point is to have originals available at a good compression rate. Having a thumbnail in the original sucks, as I don’t want lossy compression on my originals.


Here's a great book on this: http://mattmahoney.net/dc/dce.html

The guy was using neural networks for compression a long time ago, before it was a thing again.

EDIT: Oh, somebody mentioned it already (but it's really good, free & totally worth reading)


My interpretation: Create and distribute a library of all possible images (except ones which look like random noise or are otherwise unlikely to ever be needed). When you want to send an image, find it in the library and send its index instead. Use advanced compression (NNs) to reduce the size of the library.


Of the papers at Mahoney's page [0], "Fast Text Compression with Neural Networks" dates to 2000; people have been applying these techniques for decades.

[0] http://mattmahoney.net/dc/


So you could say it precomputes a function (and its inverse) which allows computing a very space-efficient, information-dense difference between a large image and its thumbnail?


This technique has also been used in the ogg-opus audio codec.


Interesting. It sounds like the idea is fundamentally like factoring out knowledge of "real image" structure into a neural net. In a way, this is similar to the perceptual models used to discard data in lossy compression.


I wonder if there's a way to do this more like traditional compression; performance is a huge issue for compression, and taking inspiration from a neural network might be better than actually using one. Conceptually, this is like a learned dictionary that's captured by the neural net, it's just that this is fuzzier.


Training the model is extremely expensive computationally, but using it often isn't.

For example, StyleGAN takes months of compute-time on a cluster of high-end GPUs to train to get the photorealistic face model we've all seen. But generating new faces from the trained model only takes mere seconds on a low-end GPU or even a CPU.


This is really interesting but out of my league technically. I understand that super-resolution is the technique of inferring a higher-resolution truth from several lower-resolution captured photos, but I'm not sure how this is used to turn a high-resolution image into a lower-resolution one. Can someone explain this to an educated layman?


From peeking at the code, it seems like each lower-res image is a scaled-down version of the original plus a tensor that is used to upscale to the previous image. The resulting tensor is saved and the scaled image is used as the input to the next iteration.

The decode process takes the last image from the process above, and iteratively applies the upscalers until the original image has been reproduced.

Link to the code in question: https://github.com/caoscott/SReC/blob/master/src/l3c/bitcodi...
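As a hedged sketch of that multi-scale loop (toy average-pooling downscale and a nearest-neighbour "prediction" standing in for the learned super-resolution model; the real code uses a network, integer pixels, and an entropy coder for the residuals):

    import numpy as np

    def downscale(img):
        # Toy 2x downscale by average pooling (stand-in for SReC's actual scheme).
        h, w = img.shape
        return img.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

    def predict_upscale(small):
        # Stand-in for the learned super-resolution model: nearest-neighbour upsample.
        return np.repeat(np.repeat(small, 2, axis=0), 2, axis=1)

    def encode(img, levels=3):
        residuals = []
        for _ in range(levels):
            small = downscale(img)
            residuals.append(img - predict_upscale(small))  # entropy-coded in practice
            img = small
        return img, residuals  # lowest-resolution image plus per-level residuals

    def decode(base, residuals):
        img = base
        for r in reversed(residuals):
            img = predict_upscale(img) + r
        return img

    original = np.random.rand(64, 64)
    base, residuals = encode(original)
    assert np.allclose(decode(base, residuals), original)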


If we substitute "information" for "image", "low information" for "low resolution" and "high information" for "high resolution", perhaps compression could be obtained generically on any data (not just images) by taking a high information bitstream, using a CNN or CNNs (as per this paper) to convert it into a shorter, low information bitstream plus a tensor, and then an entropy (difference) series of bits.

To decompress then, reverse the CNN on the low information bitstream with the tensor.

You now have a high information bitstream which is almost like your original.

Then use the entropy series of bits to fix the difference. You're back to the original.

Losslessly.

So I wonder if this, or a similar process can be done on non-image data...

But that's not all...

If it works with non-image data, it would also say that mathematically, low information (lower) numbers could be converted into high information (higher) numbers with a tensor and entropy values...

We could view the CNN + tensor as mathematical function, and we can view the entropy as a difference...

In other words:

Someone who is a mathematician might be able to derive some identities, some new understandings in number theory from this...


Convolution only works on data that is spatially related, meaning data points that are close to each other are more related than data points that are far apart. It doesn't give meaningful results on data like spreadsheets where columns or rows can be rearranged without corrupting the underlying information.

If by non-image data you mean something like audio, then yes it could probably work.


I asked a question about a similar idea on Computer Science Stack Exchange in 2014. https://cs.stackexchange.com/questions/22317/does-there-exis...

They did not have any idea and they were dicks about it as usual.


This technology is super awesome... and it's been available for awhile.

A few years ago, I worked for #bigcorp on a product which, among other things, optimized and productized a super resolution model and made it available to customers.

For anyone looking for it - it should be available in several open source libraries (and closed source #bigcorp packages) as an already trained model which is ready to deploy


On the order of 10% smaller than WebP, substantially slower encode/decode.


The encode/decode is almost certainly not optimized: it's using PyTorch and is a research project. A 10x speedup with a tuned implementation is probably easily reachable, and I wouldn't be surprised if 100x were possible even without using a GPU.


Where did you get that from? PyTorch is already pretty optimised and relies on GPU acceleration.

The only parts that are slow in comparison are the bits written in Python and those are just the frontend application.

There's not much room for performance improvement.


PyTorch has optimized generic primitives; optimization generally means baking in safe assumptions specific to the problem you are restricting the solution to.

For example, YACC is highly optimized, but the parsers in GCC and LLVM are an order of magnitude faster because they are custom recursive-descent parsers optimized for the specific languages that those compilers actually support. GCC switched from YACC/Bison, which are each highly optimized, in version 4.0, and parsing was sped up dramatically.

Additionally, a lot of the glue code in any pytorch project is python, which is astonishingly slow compared to C++.

So I reiterate, a 10x speedup would be mundane when moving from a generic solution like pytorch and coding something specific for using this technique for image compression.

Finally, PyTorch is optimized primarily for model training on an Nvidia GPU. Applying models doesn't need a GPU for good performance, and in fact a GPU probably isn't a net win, due to the need to allocate and copy data. Consumer computers often have slow integrated GPUs that can't run Nvidia GPU code, and the compatible alternative (OpenCL, which basically isn't used in ML in a serious way yet) is effectively CPU-only on many systems, since integrated GPUs are still slower than the CPU even with OpenCL.


That could be an acceptable trade off for some applications. I could see this being useful for companies that host a lot of images. You only need to encode an image once but pay the bandwidth costs every time someone downloads it. Decoding speed probably isn't the limiting factor of someone browsing the web so that shouldn't negatively impact your customer's experience.


> Decoding speed probably isn't the limiting factor of someone browsing the web so that shouldn't negatively impact your customer's experience.

Unless it is with battery-powered devices. However, I would say that with general web browsing without ad-blocking it wouldn't count for much either in terms of bandwidth or processing milliwatts.


is webp lossless?


It's both lossless and lossy - https://en.wikipedia.org/wiki/WebP


Webp supports lossy and lossless.


Reminds me of this.

https://en.wikipedia.org/wiki/Jan_Sloot

Gave me a comical thought if such things can be permitted.

You split into rgb and b/w, turn the pictures into blurred vector graphics. Generate and use an incredibly large spectrum of compression formulas made up of separable approaches that each are sorted in such a way that one can dial into the most movie-like result.

3d models for the top million famous actors and 10 seconds of speech then deepfake to infinite resolution.

Speech to text with plot analysis since most movies are pretty much the same.

Sure, it won't be lossless, but replacing a few unknown actors with famous ones and having a few accidental happy endings seems entirely reasonable.


Related for an other domain, lossless text compression using LSTM: https://bellard.org/nncp/

(this is by Fabrice Bellard; one wonders how he manages to achieve so much)


This is a lot like "waifu2x".[1] That's super-resolution for anime images.

[1] https://github.com/nagadomi/waifu2x


Reminds me of RAISR (https://ai.googleblog.com/2016/11/enhance-raisr-sharp-images...).

I remember talking with the team and they had production apps using it and reducing bandwidth by 30%, while only adding a few hundred kb to the app binary.


And what's the size of the neural network you have to ship for this to work? Has anyone done the math on the break-even point compared to other compression tools?

Edit: actually, a better metric would be how much it compresses compared to doing the resolution increase with just Lanczos in place of the neural net and keeping the delta part intact.


Does anyone know how much better the compression ratio is compared to png? Which is also a lossless encoder.


I wonder how well this technique works when the depth of field is infinite?

Out of focus parts of an image should be pretty darned easy to compress using what is effectively a thumbnail.

That said, the idea of having an image format where 'preview' code barely has to do any work at all is pretty damned cool.


Would massive savings be achieved if an image sharing app like say, Instagram were to adopt it, considering a lot of user-uploaded travel photos of popular destinations look more or less the same?


My guess is that it would be much more expensive unless it's a frequently accessed image. CPU and GPU time is much more expensive than storage costs on any cloud provider.


Wouldn't it be cheaper if the image is infrequently accessed? I'm thinking in the extreme case where you have some 10-year-old photo that no one's looked at in 7 years. In that case the storage costs are everything because the marginal CPU cost is 0.


It depends if the decompression is done on the server or on the client. If the client is doing the decompressing it would be better to compress frequently accessed images because it would lower bandwidth costs. If the server does the decompressing it would be better for infrequently accessed images to save on CPU costs.


Where it might be very useful is for companies that distribute cellular IoT devices and pay for each byte uploaded. That could have a real impact on cost, with the tradeoff being more work on-device (which can be optimized).


Also could be great for using up spare CPU/GPU cycles or having an AWS spot instance that triggers when pricing is low to compress images.


That's sort of the conclusion I reached when I was looking at this stuff before. The economics of it don't quite work out yet.


I believe a big issue with this will be floating point differences. Due to the network being essentially recursive, tiny errors in the initial layers can grow to yield an unrecognizably different result in the final layers.

That's why most compression algorithms use fixed point mathematics.

There are ways to quantize neural networks to make them use integer coefficients, but that tends to lose quite a lot of performance.

Still, this is a very promising lead to explore. Thank you for sharing :)


Is this actually lossless - that is, the same pixels as the original are recovered, guaranteed? I'm surprised such guarantees can be made from a neural network.


The way many compressors work is that, based on recent data, they try to predict the immediately following data. The prediction doesn't have to be perfect; it just has to be good enough that only the difference between the prediction and the exact data needs to be encoded, and encoding that delta usually takes fewer bits than encoding the original data.

The compression scheme here is similar. Transmit a low res version of an image, use a neural network to guess what a 2x size image would look like, then send just the delta to fix where the prediction was wrong. Then do it again until the final resolution image is reached.

If the neural network is terrible, you'd still get a lossless image recovery, but the amount of data sent in deltas would be greater than just sending the image uncompressed.
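A toy illustration of that last point (invented predictors, not the SReC model): reconstruction is exact either way, but a worse predictor leaves a larger delta for the entropy coder to pay for.

    import numpy as np

    rng = np.random.default_rng(0)
    original = np.clip(np.arange(64).reshape(8, 8) + rng.integers(-2, 3, (8, 8)), 0, 255)

    def reconstruct(prediction, original):
        delta = original - prediction            # this is what gets entropy-coded
        return prediction + delta, np.abs(delta).sum()

    good_prediction = original + rng.integers(-1, 2, original.shape)  # close guess
    bad_prediction = np.zeros_like(original)                          # terrible guess

    for name, pred in [("good", good_prediction), ("bad", bad_prediction)]:
        recovered, delta_size = reconstruct(pred, original)
        assert np.array_equal(recovered, original)  # lossless either way
        print(name, "predictor -> total |delta| =", delta_size)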


Ah, I understand! I wasn't aware that was how they worked!


The neural net predicts the upscaled image, then they add on the delta between that prediction and the desired output. No matter what the neural net predicts, you can always generate some delta.


I was failing to understand the purpose of the delta. :)


Note that this is similar to how, for example, MPEG handles intermediate frames and motion vectors. First it encodes a full frame using basically regular JPEG; then, for the next frames, it does motion estimation by splitting the image into 8x8 blocks and, for each block, finding the position in the previous frame which best fits it. The difference in position is called the motion vector for that block.

It can then take all the "best fit" block from the previous frame and use it to generate a prediction of the next frame. It then computes the difference between the prediction and the actual frame, and stores this difference along with the set of motion vectors used to generate the prediction image.

If nothing much has changed, just the camera moving about a bit, the difference between the prediction and the actual frame data is very small and can be easily compressed. Also, the range of the motion vectors is typically limited to +/-16 pixels, and you only need one per block, so they take up very little space.
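A heavily simplified, hedged sketch of that block-matching step (a toy full search, not a real MPEG encoder): for each block of the current frame, search within +/-16 pixels of its position in the previous frame, keep the best offset as the motion vector, and code only the residual.

    import numpy as np

    def motion_estimate(prev, curr, block=8, search=16):
        """Toy full-search block matching; assumes grayscale frames whose
        dimensions are multiples of the block size."""
        prev, curr = prev.astype(int), curr.astype(int)
        h, w = curr.shape
        vectors, residuals = [], []
        for by in range(0, h, block):
            for bx in range(0, w, block):
                target = curr[by:by+block, bx:bx+block]
                best, best_err = (0, 0), np.inf
                # Search +/-`search` pixels around the block's own position.
                for dy in range(-search, search + 1):
                    for dx in range(-search, search + 1):
                        y, x = by + dy, bx + dx
                        if 0 <= y <= h - block and 0 <= x <= w - block:
                            err = np.abs(target - prev[y:y+block, x:x+block]).sum()
                            if err < best_err:
                                best_err, best = err, (dy, dx)
                dy, dx = best
                vectors.append(best)
                # The residual is what a real codec would transform and entropy-code.
                residuals.append(target - prev[by+dy:by+dy+block, bx+dx:bx+dx+block])
        return vectors, residuals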


I thought super-resolution uses multiple input files to "enhance". For example, extracting a high-res image from a video clip.


They reformulate the decompression problem in the shape of a super-resolution problem conforming to what you just wrote. Instead of getting variety through the images of a video clip, they use the generalization properties of a neural network.

"For lossless super-resolution, we predict the probability of a high-resolution image, conditioned on the low-resolution input"


This is interesting but I'm not sure if the economics of it will ever work out. It'll only be practical when the computation costs become lower than storage costs


> computation costs become lower than storage costs

Most applications are memory movement constrained rather than compute constrained.


Think youtube or netflix - it's compressed once and then delivered hundred million times to consumers.


But if it's something that's requested/viewed a lot, that's probably something you don't want to be compressing/decompressing all the time. Neural networks still take quite a lot of computational power and require GPUs.

If it's something you don't necessarily require all the time, it's still probably cheaper to just store it instead of running it through an ANN. You just need to look at the prices of a GPU server compared with storage costs on AWS and the estimated run time to see there is still a large difference.

I mean, I could be wrong (and I'd love to be, since I looked at a lot of SR stuff before), but that's sort of the conclusion I reached before, and I don't really see that anything has significantly changed since.


> but if its something that's requested / viewed a lot thats probably something don't want to be compressing/decompressing all the time.

You're compressing only once. Decompression is usually way less expensive and that happens on the client, so without additional cost to netflix/youtube.

Of course this does not mean youtube must use it on all the videos including the ones with 10 views.


Think of network bandwidth too!


How do ML based lossy codecs compare to state of the art lossy compression? Intuitively it sounds like something AI will do much better. But this is rather cool.


They perform better, from what I've read.


Depends entirely on your definition of "better".

In terms of quality vs bit rate, ML-based methods are superior.

In terms of computation and memory requirements, they're orders of magnitude worse. It's a trade-off; TINSTAAFL.


> memory requirements

Agreed, although this bit is unclear - the compressed representations of the ML-based methods take up much less space in memory than traditional methods, but yes - the decompression pipeline is memory-intensive due to intermediary feature maps.


Looks like FLIF has a slight edge on compression ratio according to the paper, but it beats out other common compression schemes which is impressive.


How does it work for data other than Open Images, if trained on Open Images? If it recognizes fur, it's going to be great on cat videos.


It seems like "lossless" isn't quite right; some of the information (as opposed to just the algo) seems to be in the NN?

Is a soft-link a lossless compression?

It's like the old joke about a pub where they optimise by numbering all the jokes... the joke number alone isn't really enough: it can be used to losslessly recover the joke, but only by using the community's storage to hold the data.


As long as you get back the exact same image you put in, it's lossless.


So this "https://news.ycombinator.com/reply?id=22804687&goto=threads%... is a lossless encoding of your comment because it returns the content? That doesn't seem right to me.


We're talking about lossless compression. A URL is a way to locate a resource; it is not compression. Compression is taking an existing resource and transforming it into something smaller. A URL isn't a transformation. If I delete my comment, the URL no longer refers to anything.


No, because that uses external data.


I guess this is the crux of it: a NN seems like a concentration of data, more data-y than algorithmic to me (not a computer scientist). Lossless compression carries an implication that the essential data is intrinsic to the compressed file, but with the arrangement of the OP it seems like some of that data is extrinsic: some of the essential data is in the NN. While you might make that claim for any regular algo, it seems to me this has a different complexion to it. Ultimately an algo could use all extant online images, then use reverse image search and 'just' provide an internal link as its output.

Another way of looking at it is that you could have a 3D model of a person, as used for CG in movies, and then have an error map + config data to return an image of the person that is exactly the same as a mask of a photo that was taken. The error map + config data wouldn't really be a "lossless compression". Much of the data would lie in the 3D model. Do you agree that this would not be "lossless compression"? So, there's a dividing line somewhere?!


In that sense, I can losslessly compress everything down to zero bits, and recover the original artifact perfectly with the right algorithm.


You're ignoring two things:

1) that the aggregate savings from compressing the images needs to outweigh the initial cost of distributing the decompressor.

2) to be lossless, decompression must be deterministic and unambiguous, so you can't compress _everything_ down to zero bits; you can compress only _one_ thing down to zero bits, because otherwise you wouldn't be able to unambiguously determine which thing is represented by your zero bits.


In each case I pick out an algorithm beforehand that will inflate my zero bits to whatever artifact I desire.


You have to convey which algorithm, which takes bits. And at the very least you need a pointer to a file, which also takes bits. You’d do well to look for archives of alt.comp.compression.

There was also a classic thread that surfaced recently on HN about a compression challenge whereby someone disingenuously attempted to compress a file of random data (incompressible by definition) by splitting it on a character and then deleting that character from each file. It was a simple algorithm that appeared to require fewer bits to encode. The problem is, all this person did was shift the bits into the filesystem’s metadata, which is not obvious from the command line. The final encoding ended up taking more bits once you take said metadata into account.


Then that becomes part of the payload that you decompress, and you no longer have a 0 byte payload.


"I will custom write a new program to (somehow) generate each image and then distribute that instead of my image" is not a compression algorithm. But I think you'd do well over at the halfbakery.


It works well if it's the only image!


Now you're chasing your own tail. You've gone from "I can losslessly compress everything" to "I can losslessly compress exactly one thing only".


I'm arguing it's the same as this image compression technique. They rely on a huge neural network which must exist wherever the image is to be decompressed.

If I'm allowed to bring along an unlimited amount of background data, then I can compress everything down to zero bits.

In contrast, an algorithm like LZ78 can be expressed in a 5-line Python script and perform decently on a wide variety of data types.


> If I'm allowed to bring along an unlimited amount of background data, then I can compress everything down to zero bits.

If by "background data" you mean the decompressor, this is patently false. No matter how much information is contained in the decompressor (The Algorithm + stable weights that don't change), you can only compress one thing down to any given new representation ( low resolution image + differential from rescale using stable weights ).

If by "background data" you mean new data that the decompressor doesn't already have, then you're ignoring the definition of compression. Your compressed data is all bits sent on the fly that aren't already possessed by the side doing the decompression regardless of obtuse naming scheme.

> I'm arguing it's the same as this image compression technique.

That's wrong, because this scheme doesn't claim to send a custom image generator instead of each image, which is what you're proposing.


The neural network is not an unlimited amount of data. It is fixed size, and not transferred.

Further, if you're not familiar with the pigeonhole principle, it might be worth a read on Wikipedia.


You are missing the point that you can use the same algorithm to compress N → infinity images. So the algorithm size amortizes.


Is that true?


Unless it's overfit on some particular inputs, but if so-- it's bad science.

Ideally they would have trained the network on a non-overlapping collection of images from their testing but if they did that I don't see it mentioned in the paper.

The model is only 15.72 MiB (after compression with xz), so it would amortize pretty quickly... even if it was trained on the input it looks like it still may be pretty competitive at a fairly modest collection size.


Yes you can. However, it doesn't mean it's good lossless compression.


This is true of all compression formats. The receiver has to know how to decode it. A good compression algorithm will attempt to send only the data that the receiver can't predict. If we both know you're sending English words, then I shouldn't have to tell you "q then u" - we should assume there's a "u" after "q" unless otherwise specified. This isn't new to this technique, it's a very common and old one (it's one of the first ones I learned watching a series of youtube lectures on compression or maybe it was just information theory in general) and it has been commonly called lossless compression none the less.
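As a hedged toy calculation of that "q then u" point (the probabilities are invented for illustration), the shared model makes the predictable letter nearly free:

    import math

    # Invented conditional probabilities for the letter that follows 'q'.
    p_u_given_q = 0.99        # shared model: 'u' is almost certain after 'q'
    p_uniform = 1 / 26        # naive model: every letter equally likely

    print(f"'u' after 'q', shared model:  ~{-math.log2(p_u_given_q):.3f} bits")  # ~0.014
    print(f"'u' after 'q', uniform model: ~{-math.log2(p_uniform):.2f} bits")    # ~4.70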


https://news.ycombinator.com/item?id=22807000

[Aside: I've heard they know nothing of qi in Qatar, ;oP]


You're just not sending the database - you're sending whether or not it's the database. If you can only send a binary "is the database or is not the database" then 0 and 1 is indeed fully losslessly compressed information. If that's really what you want, then that's how you would do it. Full, perfect, lossless compression reduces your data down to only that information not shared between the sender and the receiver. Sending either 1 or 0 is, in fact, exactly what you want to do if the receiver already knows the contents of the database. Compression asks the question "what is the smallest amount of data I have to receive for the person on the other side to reconstruct the original data?" If the answer is "1 bit" then that's a perfectly valid message - the only information the receiver is missing is a single bit.


When you consider the compression algorithm itself a form of information, your point about it not being quite lossless could be applied to any lossless compression method.


Yes, but there's a line somewhere, isn't there? A link to a database isn't "lossless compression", even if it returns an artefact unchanged.



