To be clear -- it stores a low-res version in the output file, uses neural networks to predict the full-res version, then encodes the difference between the predicted full-res version and the actual full-res version, and stores that difference as well. (Technically, multiple iterations of this.)
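A rough sketch of the encode side under that reading, in Python. The first author's replies further down clarify that the network really predicts a per-pixel probability distribution that feeds an entropy coder, rather than a literal pixel difference, so that's the framing here; `downscale`, `store_raw`, `predict_distribution`, and `entropy_encode` are placeholder names, not SReC's actual API.

```python
# Placeholder names throughout; this is the shape of the pipeline, not SReC's code.
def encode(image, levels=3):
    pyramid = [image]
    for _ in range(levels):
        pyramid.append(downscale(pyramid[-1]))       # e.g. 2x average pooling

    payload = [store_raw(pyramid[-1])]               # coarsest level stored directly
    # Walk back up: at each step the network looks at the low-res level and
    # assigns probabilities to the pixels of the next level up; the entropy
    # coder spends few bits wherever the network was confident and right.
    for low, high in zip(reversed(pyramid[1:]), reversed(pyramid[:-1])):
        probs = predict_distribution(low)
        payload.append(entropy_encode(high, probs))
    return payload
```

Decoding runs the same network on each reconstructed level to regenerate identical probabilities, so the entropy decoder can recover the next level exactly.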
I've been wondering when image and video compression would start utilizing standard neural network "dictionaries" to achieve greater compression, at the (small) cost of requiring a local NN file that encodes all the standard image "elements". This seems like a great step in that direction.
First author here. First of all thanks so much for the interest in SReC! It was a pleasant surprise seeing my research on top of Hacker News. Answering a few questions from reading the comments:
How is this lossless? The entropy coder is what makes this technique lossless. The neural network predicts a probability distribution over pixels, and the entropy coder can find a near optimal mapping of pixel values to bits based on those probabilities (near optimal according to Shannon’s entropy).
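A toy illustration of that bound (my own numbers, not from the paper): the ideal cost of coding a symbol x under a predicted distribution p is -log2 p(x) bits, and an arithmetic coder gets within a small constant of that total.

```python
import math

# Ideal (Shannon) cost of coding each pixel under the model's predicted
# distribution; a real entropy coder approaches this total.
def ideal_bits(pixels, predicted_dists):
    # predicted_dists[i] is a dict mapping pixel value -> predicted probability
    return sum(-math.log2(dist[x]) for x, dist in zip(pixels, predicted_dists))

print(ideal_bits([200], [{200: 0.9, 201: 0.1}]))  # ~0.15 bits: confident and right
print(ideal_bits([201], [{200: 0.9, 201: 0.1}]))  # ~3.32 bits: confident and wrong
```

So the scheme is lossless no matter what the network predicts; bad predictions only cost extra bits.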
On practicality of the method, I don’t expect SReC to replace PNGs anytime soon. The current implementation is not efficient for client-side single-image decoding because of the cost of loading a neural network into memory. However, for decoding many high-quality images, this is efficient because the memory cost is amortized. Additionally, the model size can be reduced with a more efficient architecture and pruning/quantization. Finally, as neural networks become more popular in image-related applications, I think the hardware and software support to run neural nets client-side efficiently will get better. This project in its current form is just a proof of concept that we can get state-of-the-art compression rates using neural networks. The previous practical neural network-based approach (L3C) was not able to beat FLIF on Open Images.
Yes. However, we only care about compressing natural images and they are not a big subset of the space of all possible images. In practice, we find that neural networks are quite good at making predictions on pixel values, especially when we frame the problem in terms of super-resolution.
Even though the implementation details are far from trivial, the general idea is fairly typical. Most advanced compression algorithms work the same way.
- Using the previously decoded data, try to predict what's next, and the probability of being right
- Using an entropy coder, encode the difference between what is predicted and the actual data. The predicted probability is used to decide how many bits to assign to each possible value: the higher the probability, the fewer bits are used for a "right" answer and the more bits for a "wrong" answer.
Decoding works by "replaying" what the encoder did.
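A deliberately dumb sketch of that loop (my own toy, nothing like PAQ's or SReC's models): the predictor is just "same as the previous byte", and instead of a real entropy coder the residual is stored raw. The point is only the encoder/decoder symmetry, i.e. the decoder regenerates the same predictions and adds the residual back.

```python
def encode(data: bytes) -> bytes:
    prev, out = 0, bytearray()
    for b in data:
        out.append((b - prev) & 0xFF)   # residual: actual minus predicted
        prev = b                        # the decoder will know this too
    return bytes(out)

def decode(residuals: bytes) -> bytes:
    prev, out = 0, bytearray()
    for r in residuals:
        b = (prev + r) & 0xFF           # replay the prediction, add the residual back
        out.append(b)
        prev = b
    return bytes(out)

assert decode(encode(b"aaaabbbb")) == b"aaaabbbb"   # lossless round trip
```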
The most interesting part is the prediction. So much so that some people think of compression as a better test for AI than the Turing test. You are basically asking the computer to solve one of those sequence-based IQ tests.
And of course neural networks are one of the first things we tend to think of when we want to implement an AI, and unsurprisingly it is not an uncommon approach for compression. For instance, the latest PAQ compressors use neural networks.
Of course, all that is the general idea. How to do it in practice is where the real challenge is. Here, the clever part is to "grow" the image from low to high resolution, which kind of reminds me of wavelet compression.
> encode the difference between what is predicted and the actual data
Minor nitpick: In the idealized model there is no single prediction that you can take the difference with. There is just a probability distribution and you encode the actual data using this distribution.
Taking the difference between the most likely prediction and the actual data is just a very common implementation strategy.
What if there are two (nearly) equally likely predictions? What if there are N, and they are not close together?
"A single prediction that you can take the difference with" makes some assumptions about the shape of your distribution, at least under any reasonable model of "code the difference" where, e.g., probability decreases as the difference gets larger. These are often very good assumptions, to be fair.
Anyone interested in this approach to lossless compression should visit (and try to win!) the Hutter Prize website[1][2]. The goal is to compress 1 GB of English Wikipedia.
The actual difference here is the encoding, into a representation of a dataset annotated by thousands of people. Sounds like a basis of knowledge or even understanding.
I bet this scales way better than any other method on large datasets
Reminds me of doing something similar, albeit a thousand times dumber, in ~2004 when I had to find a way to "compress" interior automotive audio data, indicator sounds, things like that. At some point, instead of using traditional compression, I synthesized a wave function and only stored its parameters and the delta from the actual wave, which achieved great compression ratios. It was expensive to compress but virtually free to decompress.
And as a side effect my student mind was forever blown by the beauty of it.
It's a really cool idea, but I don't know if this would ever be a practical method for image compression. First of all, you could never change the neural network without breaking the compression, so you can't ever "update" it. Like: what if you figure out a better network? Too bad! I mean, I guess you could, but then you need to version the files and keep copies of all the networks you've ever used, and this gets messy quick.
And speaking of storing the networks: I don't know that you would ever want to pay the memory hit that it would take to store the entire network in memory just to decompress images or video, nor the performance hit the decompression takes. The trade-off here is trading reduced drive space for massively increased RAM and CPU/GPU time. I don't know any case where you'd want to make that trade-off, at least not at this magnitude.
Again though: it's an awesome idea. I just don't know that's ever going to be anything other than a cool ML curiosity.
> First of all, you could never change the neural network without breaking the compression, so you can't ever "update" it. Like: what if you figure out a better network? Too bad!
Isn’t this just a special version of a problem any type of compression will always have? There’s all kinds of ways you can imagine improving on a format like JPEG, but the reason it’s useful is because it’s locked down and widely supported.
Usual compression standards are mostly adaptive, estimating statistical models of the input from implicit prior distributions (e.g. the probability of A followed by B starts at p(A)p(B)), reasonable assumptions (e.g. scanlines in an image follow the same distribution), and small, fixed tables and rules (e.g. the PNG filters): not only a low volume of data, but data that can only change as part of a major change of algorithm.
A neural network that models upscaling is, on the other hand, not only inconveniently big, but also completely explicit (inviting all sorts of tweaking and replacement) and adapted to a specific data set (further demanding specialized replacements for performance reasons).
Among the applications that are able to store and process the neural network, which is no small feat, I don't think many would be able to amortize the cost of a tailored neural network over a large, fixed set of very homogeneous images.
The imagenet64 model is over 21 MB: saving 21 MB over PNG size, at 4.29 vs 5.74 bpp (table 2a in the article), requires a set of more than 83 MB of perfectly imagenet64-like PNG images, which is a lot. Compressing with a custom upscaling model the image datasets used for neural network experiments, which are large and stable, is the most likely good application (with the side benefit of producing useful and interesting downscaled images for free in addition to compressing the originals).
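Spelling out that break-even arithmetic with the cited numbers:

```python
# Break-even point for the numbers cited above (21 MB model,
# 4.29 bpp for SReC vs 5.74 bpp for PNG on imagenet64-like data).
model_mb = 21
srec_bpp, png_bpp = 4.29, 5.74

savings_fraction = 1 - srec_bpp / png_bpp         # ~0.25 of the PNG size saved
break_even_png_mb = model_mb / savings_fraction   # ~83 MB of PNGs to recoup the model
print(break_even_png_mb)
```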
Even if it's not useful for general-purpose compression, it may still be useful in a more restricted domain. In text compression, Brotli can be found in Chrome with a dictionary that is tuned for HTTP traffic. And in audio compression, LPCnet is a research codec that used Wavenet (neural nets for speech synthesis) to compress speech to 1.6kb/s (prior discussion from 2019 at https://news.ycombinator.com/item?id=19520194).
For a standard network, you're right there would only be one version. So you just make sure it's very carefully put together. (If a massively better one comes along, then you just make it a new file format.)
And as for performance/resources -- great point. But what about video, where the space/bandwidth improvements become drastically more important?
Since h.264 and h.265 already have dedicated hardware, would it be reasonable to assume that a chip dedicated to this would handle it just fine?
And that if you've already got hardware for video, then of course you'd just re-use it for still images?
Then you get the layperson who doesn’t understand that and asks why their version 42 .imgnet won’t open in a program only supporting up to 10 (but they don’t know their image is v42 and the program only supports v10). It’s easier to understand different formats than different versions.
I think the idea is the network is completely trained and encoded along with the image and delta data. A new network would just require retraining and storing that new network along with the image data. It doesn't use a global network for all compressions.
Yes, and this is why you couldn't update the network. Still, much like how various compression algos have "levels," this standard could be more open in this regard, adding new networks (sort of what others above refer to as versions), and the image could just specify which network it uses. Maybe have a central repo from which the decoder could pull a network it doesn't have (i.e. I make a site and encode all 1k images on it using my own network; you pull the network to your browser once so you can decode all 1k images). And even support a special mode where the image explicitly includes the network to be used for decoding it along with the image data (which could make sense for very large images, as well as for specialized/demonstration/test purposes).
I wonder what the security implications of all this are; it sounds dangerous to just run any old network. I suppose if it's sandboxed enough, with very strongly defined inputs and outputs, then the worst that could happen is you get garbled imagery?
They include the trained models under the "model weights" section. The ImageNet model is ~20 MB, the Open Images one ~17 MB.
Now this might be prohibitive for images over the web, but it'd be interesting whether it might be applicable for images with huge resolutions for printing, where single images are hundreds of megabytes.
There could be a release of a new model every 6 months or something (although even that is probably too often, the incremental improvement due to statistical changes in the distribution of images being compressed isn't likely to change much over time), and you just keep a copy of all the old models (or lazily download them like msft foundation c++ library versions when you install an application).
I don't know why this comment was downvoted - it's a legitimate question.
One scenario I can picture is the Netflix app on your TV. Firstly, they create a neural network trained on the video data in their library and ship it to all their clients while they are idle. They could then stream very high-quality video at lower bandwidth than they currently use and, assuming decoding can be done quickly enough, provide a great experience for their users. Any updates to the neural network could be rolled out gradually and in the background.
Google used to do something called SDCH (Shared Dictionary Compression for HTTP), where a delta compression dictionary was downloaded to Chrome.
The dictionary had to be updated from time to time to keep a good compression rate as the Google website changed over time. There was a whole protocol to handle verifying what dictionary the client had and such.
Not just that, but you could take a page out of the "compression" book and treat the NN as a sort of dictionary in that it is part of the compressed payload. Maybe not the whole NN, but perhaps deltas from a reference implementation, assuming the network structure remains the same and/or similar.
> I don't know that you would ever want to pay the memory hit that it would take to store the entire network in memory just to decompress images or video, nor the performance hit the decompression takes.
The big memory load wouldn't necessarily be a problem for the likes of YouTube and Netflix - they could just have dedicated machines which do nothing else but decoding. The performance penalty could be a killer though.
There is already a startup that makes a video compression codec based on ML - http://www.wave.one/video-compression - I am personally following their work because I think it's pretty darn cool.
It's an old idea really, or a collection of old ideas with a NN twist. Not really clear how much that latter bit brings to the table but interesting to think about.
The "dictionary" approach was roughly what vector quantization was all about. The idea of turning lossy encoders into lossless by also encoding the error is a old one too, but somewhat derailed by focus on embedable codecs with an ideal of each additional bit read will improve your estimate.
I think the potentially novelty here is really in the unfortunately-named-but-too-late-now super-resolution aspects. You could do the same sort of thing ages ago with say IFS projection, or wavelet (and related) trees, or VQ dictionaries with a resolution bump, but they were limited by the training a bit (although this approach might have some overtraining issues that make it worse for particular applications.
The majority of photos you already have most likely contain thumbnail and larger preview images embedded in the EXIF header.
Raw images typically contain an embedded, full-sized JPEG version of the image as well.
All of these are easily extracted with `exiftool -b -NameOfBinaryTag $file > thumb.jpg`.
I've found while making PhotoStructure that the quality of these embedded images is surprisingly inconsistent, though. Some makes and models do odd things, like handle rotation inconsistently, add black bars to the image (presumably to fit a camera display whose aspect ratio differs from the sensor's), render the thumb with a color or gamma shift, or apply low-quality reduction algorithms (apparent due to nearest-neighbor jaggies).
I ended up having to add a setting that lets users ignore these previews or thumbnails (to choose between "fast" and "high quality").
The point is to have originals available at a good compression rate.
Having a thumbnail in the original sucks, as I don’t want lossy compression on my originals.
My interpretation:
Create and distribute a library of all possible images (except ones which look like random noise or are otherwise unlikely to ever be needed). When you want to send an image, find it in the library and send its index instead. Use advanced compression (NNs) to reduce the size of the library.
Of the papers at Mahoney's page [0], "Fast Text Compression with Neural Networks" dates to 2000; people have been applying these techniques for decades.
So you could say it precomputes a function (and its inverse) which allows computing a very space-efficient, information-dense difference between a large image and its thumbnail?
Interesting. It sounds like the idea is fundamentally like factoring out knowledge of "real image" structure into a neural net. In a way, this is similar to the perceptual models used to discard data in lossy compression.
I wonder if there's a way to do this more like traditional compression; performance is a huge issue for compression, and taking inspiration from a neural network might be better than actually using one. Conceptually, this is like a learned dictionary that's captured by the neural net, it's just that this is fuzzier.
Training the model is extremely expensive computationally, but using it often isn't.
For example, StyleGAN takes months of compute-time on a cluster of high-end GPUs to train to get the photorealistic face model we've all seen. But generating new faces from the trained model only takes mere seconds on a low-end GPU or even a CPU.
This is really interesting but out of my league technically. I understand that super-resolution is the technique of inferring a higher-resolution truth from several lower-resolution captured photos, but I'm not sure how this is used to turn a high-resolution image into a lower-resolution one. Can someone explain this to an educated layman?
From peeking at the code, it seems like each lower-res image is a scaled-down version of the original, plus a tensor that is used to upscale to the previous image. The resulting tensor is saved and the scaled image is used as the input to the next iteration.
The decode process takes the last image from the process above, and iteratively applies the upscalers until the original image has been reproduced.
If we substitute "information" for "image", "low information" for "low resolution", and "high information" for "high resolution", perhaps compression could be obtained generically on any data (not just images): take a high-information bitstream, use a CNN or CNNs (as per this paper) to convert it into a shorter, low-information bitstream plus a tensor, and then an entropy-coded series of difference bits.
To decompress then, reverse the CNN on the low information bitstream with the tensor.
You now have a high information bitstream which is almost like your original.
Then use the entropy series of bits to fix the difference. You're back to the original.
Losslessly.
So I wonder if this, or a similar process can be done on non-image data...
But that's not all...
If it works with non-image data, it would also say that mathematically, low information (lower) numbers could be converted into high information (higher) numbers with a tensor and entropy values...
We could view the CNN + tensor as mathematical function, and we can view the entropy as a difference...
In other words:
Someone who is a mathematician might be able to derive some identities, some new understandings in number theory from this...
Convolution only works on data that is spatially related, meaning data points that are close to each other are more related than data points that are far apart. It doesn't give meaningful results on data like spreadsheets where columns or rows can be rearranged without corrupting the underlying information.
If by non-image data you mean something like audio, then yes it could probably work.
This technology is super awesome... and it's been available for awhile.
A few years ago, I worked for #bigcorp on a product which, among other things, optimized and productized a super resolution model and made it available to customers.
For anyone looking for it - it should be available in several open source libraries (and closed source #bigcorp packages) as an already trained model which is ready to deploy
The encode/decode is almost certainly not optimized, it's using Pytorch and is a research project, a 10x speedup with a tuned implementation is probably easily reachable, and I wouldn't be surprised if 100x were possible even without using a GPU.
PyTorch has optimized generic primitives; generally, optimization means baking in safe assumptions specific to the problem you are restricting the solution to.
For example, YACC is highly optimized, but the parsers in GCC and LLVM are an order of magnitude faster because they are custom recursive-descent parsers optimized for the specific languages that those compilers actually support. GCC switched from YACC/Bison, which are each highly optimized, in version 4.0, and parsing was sped up dramatically.
Additionally, a lot of the glue code in any pytorch project is python, which is astonishingly slow compared to C++.
So I reiterate, a 10x speedup would be mundane when moving from a generic solution like pytorch and coding something specific for using this technique for image compression.
Finally, PyTorch is optimized primarily for model training on an Nvidia GPU. Applying models doesn't need a GPU for good performance, and in fact a GPU probably isn't a net win, due to the need to allocate and copy data and the fact that consumer computers often have slow integrated GPUs that can't run Nvidia GPU code; the portable alternative (OpenCL, which basically isn't used in ML in a serious way yet) is on many systems supported only on the CPU, since integrated GPUs are still slower than the CPU even with OpenCL.
That could be an acceptable trade off for some applications. I could see this being useful for companies that host a lot of images. You only need to encode an image once but pay the bandwidth costs every time someone downloads it. Decoding speed probably isn't the limiting factor of someone browsing the web so that shouldn't negatively impact your customer's experience.
> Decoding speed probably isn't the limiting factor of someone browsing the web so that shouldn't negatively impact your customer's experience.
Unless it is, on battery-powered devices. However, I would say that with general web browsing without ad blocking it wouldn't count for much, either in terms of bandwidth or processing milliwatts.
Gave me a comical thought if such things can be permitted.
You split into rgb and b/w, turn the pictures into blurred vector graphics. Generate and use an incredibly large spectrum of compression formulas made up of separable approaches that each are sorted in such a way that one can dial into the most movie-like result.
3d models for the top million famous actors and 10 seconds of speech then deepfake to infinite resolution.
Speech to text with plot analysis since most movies are pretty much the same.
Sure, it wont be lossless but replacing a few unknown actors with famous ones and having a few accidental happy endings seems entirely reasonable.
I remember talking with the team and they had production apps using it and reducing bandwidth by 30%, while only adding a few hundred kb to the app binary.
and what's the size of the neural network you have to ship for this to work? has anyone done the math on the break even point compared to other compression tools?
e: actually, a better metric would be how much it compresses compared to doing the resolution increase with just Lanczos in place of the neural net and keeping the delta part intact
Would massive savings be achieved if an image sharing app like say, Instagram were to adopt it, considering a lot of user-uploaded travel photos of popular destinations look more or less the same?
My guess is that it would be much more expensive unless it's a frequently accessed image. CPU and GPU time is much more expensive than storage costs on any cloud provider.
Wouldn't it be cheaper if the image is infrequently accessed? I'm thinking in the extreme case where you have some 10-year-old photo that no one's looked at in 7 years. In that case the storage costs are everything because the marginal CPU cost is 0.
It depends if the decompression is done on the server or on the client. If the client is doing the decompressing it would be better to compress frequently accessed images because it would lower bandwidth costs. If the server does the decompressing it would be better for infrequently accessed images to save on CPU costs.
Where it might be very useful is for companies who distribute Cellular IOT devices where they pay for each byte uploaded. That could have a real impact on cost with the tradeoff being more work on-device (which can be optimized).
I believe a big issue with this will be floating point differences. Due to the network being essentially recursive, tiny errors in the initial layers can grow to yield an unrecognizably different result in the final layers.
That's why most compression algorithms use fixed point mathematics.
There are ways to quantize neural networks to make them use integer coefficients, but that tends to lose quite a lot of performance.
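For reference, a minimal sketch of generic symmetric int8 weight quantization (my own illustration, not the scheme any particular codec uses). The point is that once weights and activations are integers, inference can be made bit-exact across platforms, whereas float matmuls may round differently:

```python
import numpy as np

def quantize_int8(w: np.ndarray):
    """Map float weights to int8 plus a scale; a generic sketch, not SReC's approach."""
    max_abs = float(np.abs(w).max())
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale   # only used off the bit-exact path
```

The accuracy loss mentioned above comes from the rounding; the determinism win comes from doing the subsequent arithmetic in integers.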
Still, this is a very promising lead to explore. Thank you for sharing :)
Is this actually lossless - that is, the same pixels as the original are recovered, guaranteed? I'm surprised such guarantees can be made from a neural network.
The way many compressors work is based on recent data, they try to predict immediately following data. The prediction doesn't have to be perfect; it just has to be good enough that only the difference between the prediction and the exact data needs to be encoded, and encoding that delta usually takes fewer bits than encoding the original data.
The compression scheme here is similar. Transmit a low res version of an image, use a neural network to guess what a 2x size image would look like, then send just the delta to fix where the prediction was wrong. Then do it again until the final resolution image is reached.
If the neural network is terrible, you'd still get a lossless image recovery, but the amount of data sent in deltas would be greater than just sending the image uncompressed.
The neural net predicts the upscaled image, then they add on the delta between that prediction and the desired output. No matter what the neural net predicts, you can always generate some delta.
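A minimal round trip showing that point (toy NumPy, with a uselessly bad "predictor"). The real scheme codes against a predicted distribution rather than storing a raw delta, but the lossless-regardless-of-predictor property is the same:

```python
import numpy as np

rng = np.random.default_rng(0)
original = rng.integers(0, 256, size=(8, 8), dtype=np.int16)   # the "full-res" image

# Stand-in for the NN upscaler: any prediction at all, even a terrible one.
prediction = np.zeros_like(original)

delta = original - prediction              # stored alongside the low-res image
reconstructed = prediction + delta         # decoder: prediction plus delta
assert np.array_equal(reconstructed, original)   # exact, regardless of the predictor
```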
Note that this is similar to how for example MPEG does it with the intermediate frames and motion vectors. First it encodes a full frame using basically regular JPEG, then for the next frames it first does motion estimation by splitting the image into 8x8 blocks and then for each block it tries to find the position in the previous frame which best fits it. The difference in position is called the motion vector for that block.
It can then take all the "best fit" block from the previous frame and use it to generate a prediction of the next frame. It then computes the difference between the prediction and the actual frame, and stores this difference along with the set of motion vectors used to generate the prediction image.
If nothing much has changed, just camera moving a bit about, the difference between the prediction and the actual frame data is very small and can be easily compressed. Also, the range of the motion vectors is typically limited to +/-16 pixels, and you only need one per block, so they take up very little space.
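A toy exhaustive block-matching search for one block, with the 8x8 blocks and +/-16 range mentioned above (real encoders use much smarter search strategies and sub-pixel refinement; this is just the idea):

```python
import numpy as np

def motion_vector(prev_frame, cur_frame, bx, by, block=8, search=16):
    """Find the (dx, dy) offset in prev_frame that best matches the block at (bx, by)."""
    target = cur_frame[by:by + block, bx:bx + block].astype(np.int32)
    best, best_cost = (0, 0), None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = by + dy, bx + dx
            if y < 0 or x < 0 or y + block > prev_frame.shape[0] or x + block > prev_frame.shape[1]:
                continue
            candidate = prev_frame[y:y + block, x:x + block].astype(np.int32)
            cost = np.abs(candidate - target).sum()     # sum of absolute differences
            if best_cost is None or cost < best_cost:
                best, best_cost = (dx, dy), cost
    return best   # the block's motion vector; the residual is coded separately
```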
They reformulate the decompression problem in the shape of a super-resolution problem conforming to what you just wrote. Instead of getting variety through images of a video clip, they use the generalization properties of a neural network.
"For lossless super-resolution, we predict the probability of a high-resolution image, conditioned on the low-resolution input"
This is interesting, but I'm not sure the economics of it will ever work out. It'll only be practical when the computation costs become lower than the storage costs.
But if it's something that's requested/viewed a lot, that's probably something you don't want to be compressing/decompressing all the time. Neural networks still take quite a lot of computational power and require GPUs.
If it's something you don't necessarily require all the time, it's still probably cheaper to just store it instead of running it through an ANN. You just need to look at the price of a GPU server compared with storage costs on AWS, plus the estimated run time, to see there is still a large difference.
I mean, I could be wrong (and I'd love to be, since I looked at a lot of SR stuff before), but that's sort of the conclusion I reached, and I don't really see that anything has significantly changed since.
> But if it's something that's requested/viewed a lot, that's probably something you don't want to be compressing/decompressing all the time.
You're compressing only once. Decompression is usually way less expensive and that happens on the client, so without additional cost to netflix/youtube.
Of course this does not mean youtube must use it on all the videos including the ones with 10 views.
How do ML based lossy codecs compare to state of the art lossy compression? Intuitively it sounds like something AI will do much better. But this is rather cool.
Agreed, although this bit is unclear - the compressed representations of the ML-based methods take up much less space in memory than traditional methods, but yes - the decompression pipeline is memory-intensive due to intermediary feature maps.
It seems like "lossless" isn't quite right; some of the information (as opposed to just the algo) seems to be in the NN?
Is a soft-link a lossless compression?
It's like the old joke about a pub where they optimise by numbering all the jokes... the joke number on its own isn't really the joke, yet it can be used to losslessly recover it, but only because the community's shared storage holds the data.
We're talking about lossless compression. A URL is a way to locate a resource, it is not compression. Compression is taking an existing resource and transforming it into something smaller. A URL isn't a transformation. If I delete my comment the URL no longer refers to anything.
I guess this is the crux of it: a NN seems like a concentration of data, more data-y than algorithmic to me (not a computer scientist). Lossless compression carries an implication that the essential data is intrinsic to the compressed file, but with the arrangement of the OP it seems like some of that data is extrinsic - some of the essential data is in the NN. Whilst you might make that claim for any regular algo, it seems to me this has a different complexion to it. Ultimately an algo could use all extant online images, then use reverse image search and be 'just' providing an internal link in return.
Another way of looking at it is that you could have a 3D model of a person, as used for CG in movies, and then have an error map + config data to return an image of the person that is exactly the same as a mask of a photo that was taken. The error map + config data wouldn't really be a "lossless compression". Much of the data would lie in the 3D model. Do you agree that this would not be "lossless compression"? So, there's a dividing line somewhere?!
1) that the aggregate savings from compressing the images need to outweigh the initial cost of distributing the decompressor.
2) to be lossless, decompression must be deterministic and unambiguous, so you can't compress _everything_ down to zero bits; you can compress only _one_ thing down to zero bits, because otherwise you wouldn't be able to unambiguously determine which thing is represented by your zero bits (counting argument sketched below).
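The counting argument behind (2), for anyone who wants it in symbols:

```latex
% There are 2^n bitstrings of length n, but only 2^n - 1 of length < n,
% so no injective (lossless) code can shorten every input:
\#\{\text{codes of length} < n\} \;=\; \sum_{k=0}^{n-1} 2^k \;=\; 2^n - 1 \;<\; 2^n .
% In particular there is exactly one code of length 0, so at most one input
% can be compressed to zero bits.
```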
You have to convey which algorithm, which takes bits. And at the very least you need a pointer to a file, which also takes bits. You’d do well to look for archives of alt.comp.compression.
There was also a classic thread that surfaced recently on HN about a compression challenge, whereby someone disingenuously attempted to compress a file of random data (incompressible by definition) by splitting it on a character and then deleting that character from each file. It was a simple algorithm that appeared to require fewer bits to encode. The problem is, all this person did was shift the bits into the filesystem’s metadata, which is not obvious from the command line. The final encoding ended up taking more bits once you take said metadata into account.
"I will custom write a new program to (somehow) generate each image and then distribute that instead of my image" is not a compression algorithm. But I think you'd do well over at the halfbakery.
I'm arguing it's the same as this image compression technique. They rely on a huge neural network which must exist wherever the image is to be decompressed.
If I'm allowed to bring along an unlimited amount of background data, then I can compress everything down to zero bits.
In contrast, an algorithm like LZ78 can be expressed in a 5-line Python script and perform decently on a wide variety of data types.
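For example, a minimal LZ78 compressor (my own sketch; not literally 5 lines, but close in spirit). The output is a list of (dictionary index, next byte) pairs:

```python
def lz78_compress(data: bytes):
    dictionary, phrase, out = {b"": 0}, b"", []
    for b in data:
        candidate = phrase + bytes([b])
        if candidate in dictionary:
            phrase = candidate                      # keep extending the current match
        else:
            out.append((dictionary[phrase], b))     # emit (longest known prefix, new byte)
            dictionary[candidate] = len(dictionary)
            phrase = b""
    if phrase:                                      # flush a trailing match
        out.append((dictionary[phrase[:-1]], phrase[-1]))
    return out

print(lz78_compress(b"abababab"))  # [(0, 97), (0, 98), (1, 98), (3, 97), (0, 98)]
```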
> If I'm allowed to bring along an unlimited amount of background data, then I can compress everything down to zero bits.
If by "background data" you mean the decompressor, this is patently false. No matter how much information is contained in the decompressor (The Algorithm + stable weights that don't change), you can only compress one thing down to any given new representation ( low resolution image + differential from rescale using stable weights ).
If by "background data" you mean new data that the decompressor doesn't already have, then you're ignoring the definition of compression. Your compressed data is all bits sent on the fly that aren't already possessed by the side doing the decompression regardless of obtuse naming scheme.
> I'm arguing it's the same as this image compression technique.
That's wrong, because this scheme doesn't claim to send a custom image generator instead of each image, which is what you're proposing.
Unless it's overfit on some particular inputs, but if so-- it's bad science.
Ideally they would have trained the network on a non-overlapping collection of images from their testing but if they did that I don't see it mentioned in the paper.
The model is only 15.72 MiB (after compression with xz), so it would amortize pretty quickly... even if it was trained on the input it looks like it still may be pretty competitive at a fairly modest collection size.
This is true of all compression formats. The receiver has to know how to decode it. A good compression algorithm will attempt to send only the data that the receiver can't predict. If we both know you're sending English words, then I shouldn't have to tell you "q then u" - we should assume there's a "u" after "q" unless otherwise specified. This isn't new to this technique; it's a very common and old one (it's one of the first I learned from a series of YouTube lectures on compression, or maybe it was information theory in general), and it has been commonly called lossless compression nonetheless.
You're just not sending the database - you're sending whether or not it's the database. If you can only send a binary "is the database or is not the database" then 0 and 1 is indeed fully losslessly compressed information. If that's really what you want, then that's how you would do it. Full, perfect, lossless compression reduces your data down to only that information not shared between the sender and the receiver. Sending either 1 or 0 is, in fact, exactly what you want to do if the receiver already knows the contents of the database. Compression asks the question "what is the smallest amount of data I have to receive for the person on the other side to reconstruct the original data?" If the answer is "1 bit" then that's a perfectly valid message - the only information the receiver is missing is a single bit.
When you consider the compression algorithm itself a form of information, your point about it not being quite lossless could be applied to any lossless compression method.