If I recall correctly one of the most popular third party apps first released for the iPhone was a flashlight app that was a blank app with an icon and a name. Slightly more code than any of those under the hood, but I feel the idea is the same.
Straying further from empty-file-ness, but in a similar vein of low-effort apps in the early days of the iPhone, was the controversial I Am Rich, which still makes me giggle.
> Send Me To Heaven (officially stylized as S.M.T.H.) is an Android application developed by Carrot Pop which measures the vertical distance that a mobile phone is thrown. Players compete against each other by seeking to throw their phones higher than others, often at the risk of damaging their phones.
While this was clearly an experiment, the "I am rich" kind of showing off is still prevalent in most games these days via high-ticket in-game purchases: either the variant that gives your character or base or whatever an appearance trait, or pay-to-win items that put you high up on the leaderboards.
I find it ironic that Apple, now infamous for its $700 computer wheels, $999 display stand, etc., was the party that swiftly banned such an experiment.
It was probably an oversight in the rules, as the entry's readme alludes to. They knew that by submitting this, the rules would be amended to be much more strict. Whether that includes requiring a compiler of some sort, I'm not sure. I doubt it - I would imagine the file just has to be 1 or more bytes now.
And it's actually pretty clever from a performance point-of-view.
You always have to read the inode to determine whether the file is executable. The inode includes the locations of the first "direct" blocks of the file's contents, so if the file is empty you know that immediately.
If the file had even a single comment line, running it would mean a disk seek to read that block, effectively doubling the disk I/O needed to run "/bin/true".
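You can see that fast path in the metadata alone. A minimal sketch (standard library only) showing that an empty file's size and block count come straight from the inode, with no data blocks to read:

```python
import os
import tempfile

# Create an empty, executable file in a temporary directory.
with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "empty")
    open(path, "w").close()
    os.chmod(path, 0o755)

    st = os.stat(path)      # one inode read; no data blocks involved
    print(st.st_size)       # 0 -- the kernel knows this without a seek
    print(st.st_blocks)     # 0 data blocks allocated on typical filesystems
```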
IIRC GNU's /bin/true also has a massive performance advantage. I can't find it at the moment, but an old post tried to reverse engineer it, and it turns out it does caching and whatnot to get something like 12 GB/s of throughput.
And yet you might still be right: If an empty file gets parsed and executed by the shell[1], the shell's initialization code might make things much slower than a small C program.
[1] Which is a valid assumption, as it's often the default (not sure if it's POSIX[2]), e.g. this gets clearly executed by my shell despite not having a shebang or anything else:
The real reason things are fast in popular shells is that they implement most of coreutils as builtins. An empty /bin/true would also break when invoked via exec, for example, which is why we don't use empty files anymore.
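The builtin claim is easy to check by asking bash how it resolves the name (a sketch, assuming bash is installed):

```python
import subprocess

# 'type -a' lists every way the shell can resolve a name:
# the builtin comes first, any on-disk binaries after it.
out = subprocess.run(
    ["bash", "-c", "type -a true"],
    capture_output=True, text=True, check=True,
).stdout
print(out)
# The first line reads "true is a shell builtin"; later lines
# list the on-disk binaries such as /usr/bin/true.
```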
From a quick scan of the v7 exec() implementation, that seems to be the case: getxfile() would return ENOEXEC for files smaller than sizeof(u_exdata) (the a.out header as a C struct).
In case anyone's curious, in absence of a shebang (#!), Linux [1] returns an ENOEXEC to the execve syscall, after which the invoking program (the shell) handles the failure. Usually shells default to running the file as a shell script with itself as argv0.
This can lead to an interesting situation where a program will work if launched by a shell (or by another program that uses `execlp`), but will fail if it is launched with a different variant of `exec`. For example:
$ touch empty
$ chmod a+x empty
$ ./empty # Works
$ valgrind -q ./empty # Works
$ timeout 10s ./empty # Works
$ /usr/bin/time ./empty # Works
$ perl -e 'exec("./empty") or die' # Works
$ python -c 'import subprocess; subprocess.check_call("./empty", shell=True)' # Works
$ python -c 'import subprocess; subprocess.check_call("./empty")' # Fails
...
OSError: [Errno 8] Exec format error
$ ruby -e 'exec "./empty"' # Fails
-e:1:in `exec': Exec format error - ./empty (Errno::ENOEXEC)
from -e:1
$ strace ./empty # Fails
execve("./empty", ["./empty"], 0x7fff6639f3a0 /* 84 vars */) = -1 ENOEXEC (Exec format error)
strace: exec: Exec format error
+++ exited with 1 +++
It can be quite fun to track down why a program executes successfully and then later fails to execute, with no changes made to the program in between.
I knew this about the programming languages which wrap execution in a shell, but I never knew that execlp/execvp/execvpe handled ENOEXEC internally and wrapped the command in a shell too. TIL!
A fun consequence of this is that running a python script that doesn't have the shebang will mysteriously hang with a crosshair cursor until you click the mouse. When the shell encounters the first "import foo" line it tries to run ImageMagick's "import" utility which tries to take a screenshot of the selected window.
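You can watch the shell misparse Python for yourself; a sketch, noting that on a machine without ImageMagick you get "command not found" instead of the screenshot crosshair:

```python
import os
import subprocess
import tempfile

with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "script")
    with open(path, "w") as f:
        f.write("import sys\n")   # valid Python, but no shebang
    os.chmod(path, 0o755)

    # Run it through the shell, as the ENOEXEC fallback would: the shell
    # treats "import sys" as a command, resolving "import" to ImageMagick's
    # screenshot tool if installed, or failing outright otherwise.
    r = subprocess.run(["bash", path], capture_output=True, text=True)
    print(r.returncode)   # nonzero: this was never run as Python
```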
> If the execl() function fails due to an error equivalent to the [ENOEXEC] error defined in the System Interfaces volume of POSIX.1-2017, the shell shall execute a command equivalent to having a shell invoked with the pathname resulting from the search as its first operand, with any remaining arguments passed to the new shell, except that the value of "$0" in the new shell may be set to the command name. If the executable file is not a text file, the shell may bypass this command execution. In this case, it shall write an error message, and shall return an exit status of 126.
> A file that contains characters organized into zero or more lines. The lines do not contain NUL characters and none can exceed {LINE_MAX} bytes in length, including the <newline> character. Although POSIX.1-2017 does not distinguish between text files and binary files (see the ISO C standard), many utilities only produce predictable or meaningful output when operating on text files. The standard utilities that have such restrictions always specify "text files" in their STDIN or INPUT FILES sections.
The idea behind the definition is that it describes the set of files that will not trigger various bugs in traditional Unix text tool implementations (i.e. the various variants of not checking the fgets() return value).
I'd expect the same as any other failure from execv(2). Something like,
zsh: Exec format error: empty-file
(last command returned 127.)
(127 is the exit status for other failures, so I've used it here. The second line is intended to be part of my PS1, the first zsh's normal reporting. The string used here is what I get from perror() for that code.)
Status 129 or something else outside the usual range, plus a message that the file was not a valid ELF and also has no shebang, therefore cannot be executed.
I always thought that 128+ was for (shell-spawned) processes and 0-127 for the shell itself. But apparently POSIX requires this behavior for empty files - we'd need to amend the standard first (or fail to comply, muahahaha).
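Both cases are easy to observe in bash (a sketch; the exact messages vary by shell):

```python
import os
import subprocess
import tempfile

with tempfile.TemporaryDirectory() as d:
    empty = os.path.join(d, "empty")
    binary = os.path.join(d, "garbage")
    open(empty, "w").close()
    with open(binary, "wb") as f:
        f.write(b"\x00\x01\x02\x03")   # NUL bytes: not a "text file"
    for p in (empty, binary):
        os.chmod(p, 0o755)

    # Empty file: bash's ENOEXEC fallback runs it as an (empty) script.
    print(subprocess.run(["bash", "-c", empty]).returncode)    # 0
    # Binary junk: bash refuses ("cannot execute binary file"), status 126.
    print(subprocess.run(["bash", "-c", binary]).returncode)   # 126
```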
Repeated execution seems to be much slower with the empty executable than with /bin/true, measured using the multitime tool: https://tratt.net/laurie/src/multitime/
$ multitime -n 10 ./empty
===> multitime results
1: ./empty
Mean Std.Dev. Min Median Max
real 0.006 0.000 0.005 0.006 0.006
user 0.001 0.001 0.000 0.002 0.002
sys 0.000 0.001 0.000 0.000 0.002
$ multitime -n 10 /bin/true
===> multitime results
1: /bin/true
Mean Std.Dev. Min Median Max
real 0.002 0.000 0.001 0.002 0.003
user 0.001 0.000 0.001 0.001 0.001
sys 0.000 0.000 0.000 0.000 0.000
> What is also interesting is that while strace seems to report much less overhead in "./empty" compared to /bin/true
That's because it reports an error and does not actually execute the file.
> strace: exec: Exec format error [translated from the original French-locale output]
`strace` uses one of the exec()-style functions that don't automatically invoke a shell here. And if it did, it would invoke an entire shell (/bin/sh, typically), which might have more overhead.
----
You need to be very careful when measuring this, because what happens is that a shell executes the shebangless empty file.
If you try launching it from e.g. a C program using execlp(), it will have to first start that shell.
If you try launching it from e.g. bash, it might directly run that in-process or at least after a fork() or similar, skipping some of the shell setup.
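One way to see the difference without the launching shell's own fast paths interfering is to time both spawn routes from a single parent process (a sketch; the absolute numbers depend heavily on the machine):

```python
import os
import subprocess
import tempfile
import time

def bench(argv, n=100):
    """Average wall-clock time per spawn of argv."""
    t0 = time.perf_counter()
    for _ in range(n):
        subprocess.run(argv, check=True)
    return (time.perf_counter() - t0) / n

with tempfile.TemporaryDirectory() as d:
    empty = os.path.join(d, "empty")
    open(empty, "w").close()
    os.chmod(empty, 0o755)

    # The empty file needs the ENOEXEC fallback (an extra shell startup);
    # /bin/true is a real ELF binary that execve(2) loads directly.
    print("empty via shell:", bench(["bash", "-c", empty]))
    print("/bin/true      :", bench(["/bin/true"]))
```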
This is a great trick for a lot of things. For example, if a web server is getting lots of attacks against a particular URL, one could block that URL or otherwise make it inaccessible. That'll turn it into a 404 or 500 or whatever. But it turns out that a 200 is much faster, so replacing the target with a blank file is faster than denying it. This is particularly true of xmlrpc.php: instead of removing it, empty it. Problem solved.
204 is the fastest, since it is exactly zero bytes of content. For most webservers you don't even need to point it at a specific file, just say "return 204" and that's enough. I've done this numerous times to "whitehole" bad scanners or bots with great success :)
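In nginx, for instance, the whitehole can be a one-liner (a sketch; adjust the location to whatever path is being scanned):

```nginx
# Answer xmlrpc.php scanners with an empty 204: no file is opened,
# no body is sent, and the connection is serviced about as cheaply
# as the server can manage.
location = /xmlrpc.php {
    return 204;
}
```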
Imagine you have a site that displays some sort of custom content on a 404 error. Even if it's just a static page, the web server still has to follow that code path and open the file. Even if it caches the page internally, it still has to regularly check whether the file was updated on disk.
The amount of time spent by the web server is fairly small, but the parent comment mentioned it in the context of an attack, so that small per-request effort multiplied across many connections adds up to a substantial load.
> Yesterday's post about "pipefail" also involved some systems which treated an empty file as a valid file. This turns out to be surprisingly common.
I don't understand why this is surprising. Of course an empty file is a valid file, just like an empty piece of paper is a valid piece of paper.
I mean, yes, the analogy is imperfect. Paper is a physical object with volume and weight; a file is a sequence of bytes representing information. But step out of your programmer mindset for a moment: almost every user interface represents files as objects containing information, not as the information itself.
I think the implication isn't merely "valid file" but rather it's "valid file of a particular type". The example given is that an empty file is a valid executable. It might also be a valid CSV file but not a valid Excel file.
I know what you mean - I also did a double-take when I read that sentence. But when I read a bit further I decided it was just poor phrasing.
OK, but this is about an empty file being a valid program file (though the title doesn't make that clear). I mean, an empty sheet of paper is also a valid list of instructions, but that's at least slightly weirder than it being a valid sheet of paper.
Yes, I know of the value of zero (and not just because I studied theory/math heavy fields in uni).
Does not change that this does not always apply. Having no piece of paper is valid as "no PhD", but an empty piece of paper is not a valid PhD. In the same sense, an empty piece of paper is also not a "$0 banknote", it's not a banknote, period.
It is true that in many cases empty files are valid for their context (e.g. CSV, plain text files, shell scripts), but in many other cases they are not (e.g. any file format at all that requires a magic number in a header or footer, which is a great many formats). Try to open an empty file in those contexts and you would usually get an error, for example: "You told me this was supposed to be a JPEG, but this file does not start with ff d8 ff e0 [trivially true for an empty file! No, files are not equivalent to sets in the Zermelo-Fraenkel sense!] What is wrong with you?"
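A sketch of that kind of magic-number check, which an empty file can never pass (JPEG files begin with the SOI marker bytes FF D8 FF; the function name here is mine):

```python
def looks_like_jpeg(path):
    """Return True only if the file starts with the JPEG SOI marker."""
    with open(path, "rb") as f:
        header = f.read(3)
    # An empty file yields b"" here, which never matches the magic bytes.
    return header == b"\xff\xd8\xff"

import os
import tempfile

with tempfile.TemporaryDirectory() as d:
    empty = os.path.join(d, "empty.jpg")
    open(empty, "w").close()
    print(looks_like_jpeg(empty))   # False: a valid file, but not a valid JPEG
```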
I don't get the discussion here. This is all obvious, and it was obvious in the original sentence from context.
I mean, if we want to run with the analogies...
The empty set is a valid set. Perfectly reasonable, and completely necessary in many cases.
But the empty Group is not a valid Group. A Group requires an identity element, which means non-empty. The smallest valid Group, the Trivial Group, still requires an element.
I would argue a diploma or a banknote resemble a Group more than a Set. There are required elements, such as an issuing institution.
I think she meant "an empty file is a valid file to some programs that should ostensibly expect otherwise", but worded in a way that ends up being confusing.
An empty file was also the infinitely profitable program: http://peetm.com/blog/?p=55 On CP/M, it was useful to rerun the previous program that was still in memory.
> It's probably something funny involving how the interpreter is spun up - looking for #! and that kind of stuff.
Is this actually a result of exec(2) treating the file as text?
My intuition is something else: that this is instead a long-inherited compatibility shim for very, very old executable object formats (e.g. COM, a.out) on systems without virtual memory. In those formats you'd expect the instruction pointer to be able to "run off the end" of an executable, with the effect of cleanly quitting the program, rather than trapping on an undefined op, hitting a HLT, or getting anything like a protection fault. The OS would ensure control returned to it after such "run-out" execution by having its exec(2)-alike syscall write a RET op (or the like) just past the end of the memory the program was loaded into, before jumping to the beginning of said memory.
Mind you, jumps into “hyperspace” (beyond where anything bothered to write; perhaps beyond where any chips are mapped on the bus) won’t take you to a clean RET op written by the OS, but rather likely to a 00 (active-high logic) or FF (active-low logic) op. Which is precisely why CPU designers usually make the relevant one of those for their logic into the ISA’s HLT op!
(Fun fact: even bytecode abstract machines do this. Read "off the end" of an Ethereum contract and you get 00 bytes - and, no surprise, 00 is the EVM's STOP op.)
I ran into this lately when I hadn’t validated a successful file transfer. The script ran to completion against all expectations. Obviously you should check a hash before executing a download anyway, but this was just some automated build code.
It is indeed a shell behavior, specified in POSIX sh.
> If the execve() function fails due to an error equivalent to the [ENOEXEC] error defined in the System Interfaces volume of IEEE Std 1003.1-2001, the shell shall execute a command equivalent to having a shell invoked with the pathname resulting from the search as its first operand, with any remaining arguments passed to the new shell, except that the value of "$0" in the new shell may be set to the command name. If the executable file is not a text file, the shell may bypass this command execution. In this case, it shall write an error message, and shall return an exit status of 126.
Quote: "Awesome, right? This works just fine on Macs as well."
Well, doesn't work on Windows though. I get "Access is denied" when I try to run an empty .exe. Even when I do it with an elevated prompt I get the same. Yeah, so..¯\_(ツ)_/¯
The thing is that the executable is not run at all: the error comes directly from the OS beforehand, rather than after a fake-ish exit as the article describes for Linux/macOS.
I think this was neatly explained by Ray in one of his older entries. The OS tries to load the .exe's header, then sets up the heap and stack according to that header, and only then starts execution at the entry point the header specifies.
So, empty file -> no header -> OS complains -> no execution + above error.
> as far as the package database was concerned, it was
"rpm -qV" is a command that verifies the checksums and metadata of a package's installed files. I don't know if dpkg has something similar, but it would be cool if it did.
2002 according to the copyright. But it also has this note:
debsums is intended primarily as a way of determining what installed files have been locally modified by the administrator or damaged by media errors and is of limited use as a security tool.
I misread the title as "You can do a lot with an empty life" and clicked eagerly, hoping to learn how to deal with the feeling of emptiness in one's life.
There is a bit of a difference to me between the behavior described in the article (directly executing with ./foo) and "sh foo" or "python foo".
In the former case, I would have expected a shebang to be needed, so that execv(2) would know what to execute (as it needs some magic bytes to determine what loader to use, or whether to treat it as a script, and I didn't think it had a fallback). (And I'm right… in a sense. The actual behavior is more complex & more horrifying, and covered in other comments.)
I often feel the same way. I feel like I'm reading the blog of a tech newbie at the level of discovering symlinks for the very first time. The title oversells the content and seems clickbait-y, so I was left disappointed.