1) that the aggregate savings from compressing the images need to outweigh the initial cost of distributing the decompressor.
2) to be lossless, decompression must be deterministic and unambiguous, so you can't compress _everything_ down to zero bits; you can compress only _one_ thing down to zero bits, because otherwise you wouldn't be able to unambiguously determine which thing is represented by your zero bits (see the counting sketch below).
You have to convey which algorithm to use, which takes bits. And at the very least you need a pointer to a file, which also takes bits. You’d do well to look for archives of alt.comp.compression.
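To make that counting argument concrete, here's a toy sketch (my own illustration, nothing from the paper): for n-bit inputs there are 2^n possible originals but only 2^n - 1 strictly shorter bitstrings, and exactly one of those has zero bits, so a deterministic decompressor can map at most one original to zero bits.

```python
from itertools import product

# Toy counting check: 2**n originals of length n, but only 2**n - 1 strictly
# shorter bitstrings, of which exactly one is the empty (zero-bit) string.
# A lossless decompressor maps each compressed string to a single original,
# so at most one original can ever be "compressed" down to zero bits.
def shorter_strings(n):
    return [''.join(bits) for k in range(n) for bits in product('01', repeat=k)]

for n in range(1, 10):
    shorter = shorter_strings(n)
    assert len(shorter) == 2 ** n - 1               # one output short of covering every input
    assert sum(1 for s in shorter if s == '') == 1  # only one zero-bit string exists
```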
There was also a classic thread that surfaced recently on HN about a compression challenge in which someone disingenuously tried to compress a file of random data (incompressible by definition) by splitting it on a character and then deleting that character from each file. It was a simple algorithm that appeared to require fewer bits to encode. The problem is, all this person did was shift the bits into the filesystem’s metadata, which is not obvious from the command line. The final encoding ended up taking more bits once you take said metadata into account.
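The trick (and why it fails) is easy to reproduce. A rough sketch of the idea, reconstructed from memory rather than taken from that thread:

```python
import os

data = os.urandom(1 << 16)   # stand-in for the challenge file: random, so incompressible on average
delim = data[0:1]            # pick some byte value to split on

chunks = data.split(delim)   # the "trick": every occurrence of that byte is deleted
payload = sum(len(c) for c in chunks)
assert payload == len(data) - data.count(delim)   # the chunk bytes alone really are smaller...

# ...but rejoining requires every boundary back. Stored as separate files, each
# chunk's length lives in filesystem metadata; that is exactly the information
# that was "deleted", and recording it costs at least as many bits.
assert delim.join(chunks) == data
```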
"I will custom write a new program to (somehow) generate each image and then distribute that instead of my image" is not a compression algorithm. But I think you'd do well over at the halfbakery.
I'm arguing it's the same as this image compression technique: both rely on a huge neural network which must exist wherever the image is to be decompressed.
If I'm allowed to bring along an unlimited amount of background data, then I can compress everything down to zero bits.
In contrast, an algorithm like LZ78 can be expressed in a five-line Python script and perform decently on a wide variety of data types.
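For reference, here's a rough LZ78 sketch of my own (a bit longer than five lines once decompression and the trailing-phrase edge case are included), just to show how little machinery it needs:

```python
# Each token is (index of the longest previously seen phrase, next byte);
# the phrase dictionary is rebuilt identically on both sides.
def lz78_compress(data: bytes):
    dictionary = {b"": 0}                 # phrase -> index; index 0 is the empty phrase
    phrase, out = b"", []
    for byte in data:
        candidate = phrase + bytes([byte])
        if candidate in dictionary:
            phrase = candidate            # keep extending the current match
        else:
            out.append((dictionary[phrase], byte))
            dictionary[candidate] = len(dictionary)
            phrase = b""
    if phrase:                            # leftover phrase is already in the dictionary
        out.append((dictionary[phrase[:-1]], phrase[-1]))
    return out

def lz78_decompress(tokens):
    phrases, out = [b""], bytearray()
    for index, byte in tokens:
        phrase = phrases[index] + bytes([byte])
        phrases.append(phrase)
        out.extend(phrase)
    return bytes(out)

assert lz78_decompress(lz78_compress(b"abracadabra abracadabra")) == b"abracadabra abracadabra"
```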
> If I'm allowed to bring along an unlimited amount of background data, then I can compress everything down to zero bits.
If by "background data" you mean the decompressor, this is patently false. No matter how much information is contained in the decompressor (The Algorithm + stable weights that don't change), you can only compress one thing down to any given new representation ( low resolution image + differential from rescale using stable weights ).
If by "background data" you mean new data that the decompressor doesn't already have, then you're ignoring the definition of compression. Your compressed data is all bits sent on the fly that aren't already possessed by the side doing the decompression regardless of obtuse naming scheme.
> I'm arguing it's the same as this image compression technique.
That's wrong, because this scheme doesn't claim to send a custom image generator instead of each image, which is what you're proposing.
Unless it's overfit on some particular inputs, but if so, it's bad science.
Ideally they would have trained the network on a collection of images that doesn't overlap with their test set, but if they did that, I don't see it mentioned in the paper.
The model is only 15.72 MiB (after compression with xz), so it would amortize pretty quickly... even if it was trained on the input, it looks like it may still be pretty competitive at a fairly modest collection size.
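Back-of-the-envelope, with a made-up number (the per-image savings below is purely an assumption for illustration; only the 15.72 MiB figure comes from above): if the scheme saved around 50 KiB per image over a conventional codec, the one-time model cost would break even after a few hundred images.

```python
# Hypothetical break-even sketch; only the 15.72 MiB model size is from the
# thread, the per-image savings figure is an assumed number for illustration.
model_cost_bytes = 15.72 * 1024 * 1024    # one-time cost of shipping the decompressor
assumed_savings_per_image = 50 * 1024     # assumption: ~50 KiB saved per image vs. a normal codec

break_even = model_cost_bytes / assumed_savings_per_image
print(f"breaks even after ~{break_even:.0f} images")   # ~322 images
```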