Visualizing and Understanding JPEG Format (github.com/corkami)
215 points by yasoob on July 10, 2020 | 46 comments


Check out Unraveling the JPEG from Omar Shehata on Parametric Press[1] for an interactive essay on the JPEG format.

Very cool to be able to manipulate the bytes directly and see how it affects the image.

[1] https://parametric.press/issue-01/unraveling-the-jpeg/


Hexfiend (an open source hex editor for Mac) [0] and Synalyze It! [1] (a closed source hex editor for Mac) both support a similar feature (Hexfiend only in the newest GitHub release) that shows the structure of known file formats, similar to what Wireshark [2] does for network packets.

I've thought about writing a hex editor, or adding this functionality to other hex editors, based on Hexfiend's templates.

[0] http://ridiculousfish.com/hexfiend/

[1] https://www.synalysis.net/

[2] https://www.wireshark.org/


I love Hexfiend, so I was checking wireshark.org to see what you meant, since I don't use it nearly as much. First, Cloudflare is evaluating my browser to see if it's genuine access. Why? Second, scrolling the site uses 97% of my CPU. WTF is it doing? Some parallax effect on the shark that could easily be done with a few lines of CSS? Lost interest on the spot. What have we turned the web into.


Can't reproduce the problem on the site, but here:

> Wireshark is the world’s foremost and widely-used network protocol analyzer. It lets you see what’s happening on your network at a microscopic level and is the de facto (and often de jure) standard across many commercial and non-profit enterprises, government agencies, and educational institutions.

From the website. Btw, Wireshark is one of the most useful tools for system administrators (especially in combination with tcpdump and ssh), and giving up on a project because your browser bugs out is kind of bad form. You should at least take a look at it. Anyway, hope you change your mind and have fun ;) Some of my most interesting nights have been spent with Wireshark.


Thanks. I know how awesome Wireshark is from the brief interactions I've had in the past. But their website experience is terrible.

It's much more a critique of the state of the web than of the project itself.

BTW, both behaviors (insane CPU usage while scrolling and the Cloudflare check) are reproducible in Safari, Chrome and Firefox, so it's not the browser that's bugging out, but the website's poorly written scroll hijacking.


This repo from the author is full of gold. It shows the details and visualizations of various file structures (including multiple ZIP formats, image formats and executable files).

https://github.com/corkami/pics


Checked it out and the name Ange Albertini rang a bell. I fondly remember his Funky File Formats talk from 31C3. More gold from him here:

https://media.ccc.de/search?p=Ange+Albertini

The Preserve Arcade Games talk is also good; I only realized now that both talks were by the same person.


By the way, you can decode JPEG with fewer artifacts using these projects:

https://github.com/ilyakurdyukov/jpeg-quantsmooth (good for most mid-quality and high-quality images)

https://github.com/victorvde/jpeg2png (use for very low-quality images, slower)

jpeg-quantsmooth gives sharper output (use the "-q6" option).

jpeg2png output is a little blurry, but it does a better job of unblocking very low-quality images.


ffmpeg/mplayer have some more principled smoothing filters than these (especially spp); they're made for MPEG-4, but JPEG is pretty similar.

Another algorithm that should work, but that I haven't seen tried, is edge-aware chroma upsampling. libjpeg uses nearest-neighbor upscaling for the subsampled color planes, which is why JPEG looks so bad when there's a transition from, e.g., saturated red to black.
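
To illustrate (a rough numpy sketch, not libjpeg's actual code path): nearest-neighbor chroma upsampling just repeats each subsampled sample over a 2x2 block, so a hard edge in luma gets a 2-pixel-wide chroma smear. Even a dumb blur softens that; an edge-aware filter would instead pick its weights from the co-sited luma plane.

    import numpy as np

    def upsample_chroma_nn(chroma):
        # Nearest-neighbor 2x upsampling of a 4:2:0 chroma plane:
        # every sample is repeated over a 2x2 block.
        return np.repeat(np.repeat(chroma, 2, axis=0), 2, axis=1)

    def upsample_chroma_smoothed(chroma):
        # Slightly better: blur the NN result with a small cross-shaped
        # kernel so chroma edges at least get a ramp instead of a step.
        # (Still not edge-aware.)
        up = upsample_chroma_nn(chroma).astype(np.float32)
        p = np.pad(up, 1, mode="edge")
        return (p[:-2, 1:-1] + p[2:, 1:-1] +
                p[1:-1, :-2] + p[1:-1, 2:] + 4 * up) / 8.0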


Both of these projects have chroma upsampling.


All JPEG decoders have chroma upsampling; of these two, only jpeg2png seems to actually optimize it. It seems to work, but optimizing for "smoothness" could maybe be improved.

I'm suggesting something like eedi3 (http://avisynth.nl/index.php/Eedi3) but guided by the luma plane.

Oh, you can also jointly optimize the YUV-RGB conversion, because if a pixel ends up out of range in RGB space it is probably a compression artifact. I read a paper on this once but haven't seen it implemented.
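
If anyone wants to poke at that last idea, here's a minimal sketch (plain numpy, assuming full-range JFIF/BT.601 constants) that flags pixels whose decoded YCbCr values land outside [0, 255] after conversion to RGB:

    import numpy as np

    def out_of_gamut_mask(y, cb, cr):
        # Full-range YCbCr (JFIF/BT.601) to RGB, flagging pixels that fall
        # outside [0, 255] before clipping. Per the idea above, these are
        # candidates for being compression artifacts.
        y = y.astype(np.float32)
        cb = cb.astype(np.float32) - 128.0
        cr = cr.astype(np.float32) - 128.0
        r = y + 1.402 * cr
        g = y - 0.344136 * cb - 0.714136 * cr
        b = y + 1.772 * cb
        rgb = np.stack([r, g, b], axis=-1)
        return np.any((rgb < 0.0) | (rgb > 255.0), axis=-1)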


"All JPEG decoders have chroma upsampling."

Well, I meant better upsampling than bilinear interpolation, which blurs chroma.

"Oh, you can also jointly optimize the YUV-RGB conversion because if the pixel ends up out of range in RGB space it is probably a compression artifact."

Probably not. I suspect that the mozjpeg encoder may use this so that images with black-on-white text look better (the artifact noise goes out of range).


If you're looking for a fun project that can be completed in a weekend or less, one of the things I recommend is to write a (baseline) JPEG decoder. The official spec (linked from the article near the top) is quite readable as far as standards go, and even contains helpful flowcharts for basically the whole process. When I did it, it took less than 1kLoC of C, including some minor optimisations like array-based Huffman decoding (although the version in the spec is quite compact, it is slower --- but still not as slow as the "theoretical" bit-by-bit tree-traversal that a lot of other tutorials on Huffman show.)
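
For the curious, here's the gist of the table-based Huffman decoding (a minimal Python sketch rather than the C I used, with a hypothetical bit reader; real JPEG codes go up to 16 bits, so you'd use a bigger or two-level table):

    MAX_BITS = 8  # illustration only

    def build_lut(codes):
        # codes: dict mapping a bit string like "110" to its symbol.
        # Returns a table of 2**MAX_BITS entries of (symbol, code_length).
        lut = [None] * (1 << MAX_BITS)
        for code, symbol in codes.items():
            pad = MAX_BITS - len(code)
            base = int(code, 2) << pad
            for i in range(1 << pad):  # every pattern starting with `code`
                lut[base + i] = (symbol, len(code))
        return lut

    def decode_one(lut, bits):
        # bits.peek(n)/bits.skip(n) are a hypothetical bit reader that
        # zero-pads past the end of the stream.
        symbol, length = lut[bits.peek(MAX_BITS)]
        bits.skip(length)
        return symbol

Instead of walking a tree bit by bit, you peek MAX_BITS bits, do one table lookup, and consume only as many bits as the matched code actually used.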

H.261 (a video format) is roughly of the same if not slightly less complexity, and also makes a good weekend project.


Where do you get the spec for video? Is that really something that can be done in a weekend?



If you can write a JPEG decoder in a weekend, H.261 should be doable. It's much simpler in many ways --- hardcoded Huffman tables and dimensions. The spec is much shorter too.


I wish I had had this when I wrote my 250 LOC jpeg visualizer!

https://github.com/aguaviva/micro-jpeg-visualizer


This is great. I was literally banging my head against a wall last night trying to figure out how the length of the ECS is determined. What a lucky coincidence!

I'm trying to build a small C# library that reads various image types, to learn about them. PNG was quite understandable, but JPEG is a little trickier, given that I panic when I see maths.


Very nice read, thanks for posting it!!!

I also learned a lot from the various JPEG articles on ImpulseAdventure (no affiliation), e.g.:

https://www.impulseadventure.com/photo/jpeg-quantization.htm...

https://www.impulseadventure.com/photo/jpeg-compression.html

https://www.impulseadventure.com/photo/chroma-subsampling.ht...

They also have a lot of practical stuff (which chroma subsampling will Photoshop actually use when you save at 80% quality?) in addition to the theory.


This is great!

The model used by JPEG is not that unusual in image processing, so this can help us understand many types of processes.

This person has a whole bunch of this stuff! Gold!

https://github.com/corkami


They could also note how 3D MPO images work: they're just two JPEGs concatenated together with some special markers. It's been a while, but I've written MPO parsers in a couple of languages.
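
(If anyone wants to poke at one, here's a naive Python sketch along those lines; "stereo.mpo" is just a placeholder file name. A real parser should follow the offsets in the MPF index stored in the APP2 segment, since embedded EXIF thumbnails can trip up a raw SOI scan.)

    def split_mpo(data: bytes):
        # Naive split on SOI markers (0xFFD8 followed by another marker byte).
        soi = b"\xff\xd8\xff"
        offsets = []
        i = data.find(soi)
        while i != -1:
            offsets.append(i)
            i = data.find(soi, i + 1)
        offsets.append(len(data))
        return [data[a:b] for a, b in zip(offsets, offsets[1:])]

    with open("stereo.mpo", "rb") as f:
        frames = split_mpo(f.read())
    for n, frame in enumerate(frames):
        with open("frame_%d.jpg" % n, "wb") as out:
            out.write(frame)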


I have a question in case someone knows:

If I wanted to develop a custom JPEG decoder, what would be the easiest way to get image data from a jpeg file into Tensorflow (or numpy, Pytorch, etc.) tensors? I mean the raw DCT coefficients, and the coefficient table?
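
For the quantization-table half, here's as far as I've gotten: a rough sketch that walks the marker segments and pulls the DQT tables into numpy (values still in zig-zag order; it assumes no padding bytes between segments). The raw DCT coefficients themselves still need the entropy decoder, or a library that exposes them.

    import struct
    import numpy as np

    def read_quant_tables(path):
        # Returns {table_id: array of 64 values}, still in zig-zag order.
        # Stops when the scan data (SOS) starts.
        with open(path, "rb") as f:
            data = f.read()
        tables = {}
        i = 2  # skip SOI (0xFFD8)
        while i + 4 <= len(data) and data[i] == 0xFF:
            marker = data[i + 1]
            if marker in (0xD9, 0xDA):  # EOI or SOS
                break
            length = struct.unpack(">H", data[i + 2:i + 4])[0]
            if marker == 0xDB:  # DQT
                seg = data[i + 4:i + 2 + length]
                j = 0
                while j < len(seg):
                    precision, table_id = seg[j] >> 4, seg[j] & 0x0F
                    if precision == 0:  # 8-bit entries
                        q = np.frombuffer(seg[j + 1:j + 65], dtype=np.uint8)
                        j += 65
                    else:               # 16-bit entries
                        q = np.frombuffer(seg[j + 1:j + 129], dtype=">u2")
                        j += 129
                    tables[table_id] = q.astype(np.int32)
            i += 2 + length
        return tables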


> lossless storage:

> to make JPEG store data losslessly: use grayscale, 100% quality, then set either width or height to 1 pixel, or duplicate the padded data 8 times (JPEG images are stored in 8x8 blocks).

Interesting. I wonder if this is used internally in any applications.


Actually, JPEG is not lossless even at 100% quality; there is a little noise of +/-1 in pixel value, with some rare pixels off by 2. Grayscale is not needed: you can skip the RGB->YUV conversion and encode in RGB; libjpeg can do that.


1 - No.

2 - The JPEG algorithm allows lossless data without padding.

3 - For lossless data, rar/arc/ace are better than JPEG.


If I'm being honest, I'm more of a PNG fan.


PNG is a pretty poor file format, just because zlib (a general-purpose byte-stream compressor) is not a good compressor for image data. Any lossless video codec works much better.


Where’s all the Hard G “Jerraffic” folks in this thread fighting for the pronunciation [JAY-feg]?


When you know it's going to be Corkami before you even click... :D


Why do I still see JPEG in 2020?


It works incredibly well for a format of its age. It's simple, and widely supported.

Internally, JPEG has the key techniques required to compress photos well, and nothing else. Most newer formats are based on the same principles, except each bit is upgraded to the point of diminishing returns.

HEIF and AVIF compress twice as well as JPEG, but in terms of complexity and computational requirements they are hundreds of times more demanding.


Also, in addition to what you mentioned, we have jpegs because photographers are still shooting jpegs with their cameras.

The economics of the camera market right now will make adoption of new formats (and the additional/more expensive in-camera hardware to support them) slow, and JPEG is a very reasonable format for photographers to shoot when they don't want/need to shoot raw.


Incumbent format momentum, for a large part. Nothing else has offered enough benefit for lossy image compression for the mass market to make the effort to switch.

Also, some of the alternative formats have licensing or patent issues that would effectively block wide adoption.


HEIF isn't supported by any browsers.

BPG isn't supported by any browsers.

WEBP isn't supported by IE11 or Safari.


WebP is supported in Safari 14 preview.


The question you should be asking is 'why do I still see animated GIFs in 2020?'


Do you? I thought many ‘animated GIFs’ aren’t GIF files anymore. Twitter, for example, serves MP4 files (https://techcrunch.com/2014/06/19/gasp-twitter-gifs-arent-ac...)


As well as the compression, I think this also had a lot to do with native mobile app support, where MPEG4 was far easier to deal with than animated GIFs.


What's wrong with JPEG?


Nothing particularly.

Though there are other lossy encoding schemes developed since that either produce better results at the same data size or comparable results with better compression (or both). But the improvements have not been large enough to make the industry as a whole consider making the effort to support these other formats (and in some cases there would be licensing/patent issues making support legally/financially onerous on top of the coding/testing required).


I would add that the "reference implementation" of jpeg encoding and decoding is very easy to compile and to use, reasonably efficient, and completely free software. There has never been a corresponding jpeg2000 implementation, for example, leading to the fast demise of the arguably superior format.


JPEG2000 is not actually much better and I always found the artifacts to be kind of unpleasant. Wavelets are not very good psychovisually because they make the image blurry; they're also more complicated to decode and cost more memory.

It's not a good idea to enable too many file formats in a browser because of the new security issues, so a new format really needs to be a huge improvement. I also think WebP was a mistake for this reason; it's not good enough.


Actually, JPEG 2000 seems pretty badass. I believe Apple is the only major vendor that supports JPEG 2000 out of the box.

From "JPEG 2000: The Better Alternative to JPEG That Never Made it Big": https://petapixel.com/2015/09/12/jpeg-2000-the-better-altern...

> JPEG 2000 is a much better image solution than the original JPEG file format. Using a sophisticated encoding method, JPEG 2000 files can compress files with less loss of, what we might consider, visual performance. In addition, the file format is less likely to be affected by ‘bit errors’ and other file system errors due to its more efficient coding structure.

> Those who choose to save their files in the JPEG 2000 standard can also choose between utilizing compression or saving the file as lossless to retain original detail. A higher dynamic range is also supported by the format with no limit of an image’s bit depth. Together, these abilities created a much better alternative than the original JPEG solution.


> In addition, the file format is less likely to be affected by ‘bit errors’ and other file system errors due to its more efficient coding structure.

This is confusing. I think they mean that a file is less likely to be corrupt if it's smaller, which is debatable. But I wouldn't use a newer codec just to make smaller files, I'd make them higher quality at the same size. In that case you need redundancy, which is the opposite of compression efficiency.

> A higher dynamic range is also supported by the format with no limit of an image’s bit depth.

JPEG supports this, but most decoders don't because pixel depth is not something you can just abstract away. Do JPEG2K decoders actually support 10/12-bit? HEIF does.


A file format can be notably more or less resilient to bit errors. It can be the difference between getting slightly different output, garbled output, or an "oops, sorry".

BTW, compression efficiency is orthogonal to, not the opposite of, the structured redundancy you would want. As a thought experiment, imagine encrypting the data with a publicly known key as the last step of coding. The theoretical redundancy remains the same, but good luck¹ getting your data back if you get a bit error.

¹ Imagine a variable-length single-block cipher was used, multi-round CBC or something.


> JAY-peg

I think they mean JAY-pheg.



