
Kolmogorov complexity is absolutely the wrong metric. It doesn't account for big-O, timing, and many other production requirements.

You'll never have a perfect metric here, but the human-readable size of the code base is a well-justified one. Do you write minified javascript?



What I'm asking about is how much energy and time (computation) a human brain needs to emit or ingest each program. A human working 40-hour weeks from 18 to 65 has about 100,000 hours of working time; at a typing speed of 250 characters per minute and a reading rate of 1,500 characters per minute, that's a total career budget of roughly a billion characters emitted and ten billion characters ingested.
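
For reference, here is the arithmetic behind those budget figures as a quick Python sketch (the 50 working weeks per year is an added assumption; the rates are the ones above):

    # Back-of-envelope career character budget
    HOURS = 40 * 50 * (65 - 18)   # ~94,000 working hours
    MINUTES = HOURS * 60

    typed = MINUTES * 250         # ~1.4e9 characters emitted
    read = MINUTES * 1500         # ~8.5e9 characters ingested

    print(f"{typed:.2e} typed, {read:.2e} read")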

For emitting programs, if we assume the program is already fully formed in my brain and I'm just transcribing it, then we would like an accurate physical model of my hands moving over a QWERTY keyboard that can tell us how many joules and milliseconds each keystroke costs for the given sequence of symbols. Ideally we would also have a complete model or simulation of my brain, so we could measure how many neurons need to fire per keystroke. We don't have that, so we could instead measure my average milliseconds per character and multiply it by the total character count. But a language made up of one-symbol function names is probably harder to type than one made out of English words, and there are also all the typing mistakes I will make. This is what the simple compression algorithm (gzip) is an attempt to normalize for: a more verbose but more predictable language is about as fast to write and read as an overly terse one. Gzipping is a cheap imitation of that more complex model.
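
A minimal sketch of that normalization idea in Python (the two sample snippets are invented for illustration): gzip rewards predictability, so verbose-but-regular code scores much closer to terse code than raw character count would suggest.

    import gzip

    def perceived_size(source: str) -> int:
        # Crude proxy for emit/ingest effort: the gzipped byte count.
        # Predictable, repetitive text compresses away, so verbosity
        # alone is not penalized the way raw length would penalize it.
        return len(gzip.compress(source.encode("utf-8")))

    verbose = "function addNumbers(first, second) { return first + second; }\n" * 50
    terse = "const add=(a,b)=>a+b\n" * 50

    print(len(verbose), perceived_size(verbose))  # big raw size, tiny gzipped size
    print(len(terse), perceived_size(terse))      # small either way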



> Do you write minified javascript?

Yes [1] [2] [3]?

[1] https://js1024.fun/demos/2020/46/readme

[2] https://js1024.fun/demos/2022/18/readme

[3] https://github.com/lifthrasiir/roadroller/blob/442caa4/index...

Jokes aside, humans don't read each (uncompressed) byte anyway. The number of tokens would be a much better measure than the number of bytes, but even that is unclear, because a single token can contain multiple perceived words (e.g. someLongEnoughIdentifier) and multiple tokens can be perceived as a single word in some cases (e.g. C/C++ `#define` is technically two tokens long, but no human would perceive it as such). I would welcome a more realistic estimate than the gzipped size, but I'm confident that it won't be the number of uncompressed bytes.
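
A rough illustration in Python of how the three measures diverge (the regex lexer and the sample lines are stand-ins, not a real tokenizer):

    import gzip, re

    def rough_metrics(source: str) -> dict:
        # Naive lexer: identifiers, numbers, or single symbols. It does
        # not split camelCase words or merge `#` + `define`, which is
        # exactly the kind of perception gap discussed above.
        tokens = re.findall(r"[A-Za-z_]\w*|\d+|\S", source)
        return {
            "bytes": len(source.encode("utf-8")),
            "tokens": len(tokens),
            # gzip's fixed overhead dominates on lines this short, so
            # the gzipped figure only means much for whole files.
            "gzipped": len(gzip.compress(source.encode("utf-8"))),
        }

    print(rough_metrics("#define MAX_BUFFER_SIZE 4096"))
    print(rough_metrics("someLongEnoughIdentifier = anotherDescriptiveName + 1"))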



