You can do a lot with an empty file (rachelbythebay.com)
332 points by picture on April 6, 2022 | 117 comments


You can even win an IOCCC and be remembered nearly thirty years later: https://github.com/c00kiemon5ter/ioccc-obfuscated-c-contest/...


If I recall correctly one of the most popular third party apps first released for the iPhone was a flashlight app that was a blank app with an icon and a name. Slightly more code than any of those under the hood, but I feel the idea is the same.


Straying further from empty-file-ness, but in a similar vein: one of the low-effort apps from the early days of the iPhone was the controversial I Am Rich, which still makes me giggle:

https://en.wikipedia.org/wiki/I_Am_Rich

>I am rich

>I deserv [sic] it

>I am good,

>healthy & successful


This one deserves a mention as well:

> Send Me To Heaven (officially stylized as S.M.T.H.) is an Android application developed by Carrot Pop which measures the vertical distance that a mobile phone is thrown. Players compete against each other by seeking to throw their phones higher than others, often at the risk of damaging their phones.

https://en.wikipedia.org/wiki/Send_Me_To_Heaven


This could be a direct ancestor of NFTs


An NFT way ahead of its time.


While this was clearly an experiment, "I am rich"-style showing off is still prevalent in most games these days through high-ticket in-game purchases: either the variant that gives your character or base or whatever an appearance trait, or pay-to-win items that put you high up on the leaderboards.


I find it ironic that Apple, which is now infamous for 700 dollar computer wheels, a 999 dollar display stand, etc., was the party to swiftly ban such an experiment.


https://github.com/c00kiemon5ter/ioccc-obfuscated-c-contest/...

The compilation command line for anyone curious. Clever.


So the participant was in fact allowed to change the compiler command so `gcc` isn't even used...? I had wondered how gcc would compile an empty file.


It was probably an oversight in the rules, as the entry's readme alludes to. They knew that by submitting this, the rules would be amended to be much more strict. Whether that includes requiring a compiler of some sort, I'm not sure. I doubt it - I would imagine the file just has to be 1 or more bytes now.


Gcc is perfectly happy to compile an empty file. But the linker does not find a "main" symbol in its output, so it complains.


> Does this mean the smallest /bin/true is actually an empty file?

That was exactly the implementation of /bin/true in Unix v7: <https://github.com/v7unix/v7unix/blob/master/v7/bin/true>


And it's actually pretty clever from a performance point-of-view.

You always have to read the inode to determine whether the file is executable. The inode included the locations of the first "direct" blocks of the file's contents. So if the file is empty you know that immediately.

If the file even had a single comment line, it would mean having to do a disk seek to read it. So it would effectively double the disk I/O needed to run "/bin/true"
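A rough way to look at this from user space, assuming a typical Linux or macOS filesystem: `os.stat` reads only the inode metadata, and for an empty file the size is zero. The helper name `size_and_blocks` is made up for illustration, and the allocated block count is usually also zero but that depends on the filesystem.

```python
import os
import tempfile


def size_and_blocks(path):
    """Return (size in bytes, allocated 512-byte blocks) from the inode metadata."""
    st = os.stat(path)  # metadata lookup only; the file contents are never read
    return st.st_size, st.st_blocks


with tempfile.TemporaryDirectory() as tmp:
    empty = os.path.join(tmp, "true")
    open(empty, "w").close()  # same effect as `touch true`
    size, blocks = size_and_blocks(empty)
    print(size, blocks)
```

This only shows that the "nothing to read" answer is available from the metadata; it does not measure the disk seeks themselves.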


Clever yes, but you'll get much bigger benefits if you use a filesystem that can store tiny files inside the inode.


It’s also the smallest quine, a program that outputs its own source code.


And probably the one with the single highest number of programming languages for which it is a valid quine[1].


You appear to be missing the link, I would quite like to read it however if you get the chance to edit that in!


He's not missing the link. The link is the empty string.


It doesn’t actually support what I said I think, but you might like it anyway: http://www.madore.org/~david/computers/quine.html


This is worth its own post on HN



This subject has been posted about a few times on HN, along with some looks at the evolution of the implementation of the command: https://hn.algolia.com/?dateRange=all&page=0&prefix=false&qu...

Here's the most recent, with the most comments: https://news.ycombinator.com/item?id=28257666


Meanwhile GNU coreutils version of /bin/true ... https://github.com/coreutils/coreutils/blob/master/src/true....


Fair, though most of that is the fact that it has --help output and also that the same source covers both true and false. But yeah.


IIRC GNU's /bin/true also has a massive performance advantage. I am not able to find it presently, but an old post tried to reverse engineer it, and it turns out it does caching and whatnot to get something like 12 GBps throughput.

Edit: I was wrong, it is actually GNU 'yes': https://www.teddit.net/r/unix/comments/6gxduc/how_is_gnu_yes...


And yet you might still be right: If an empty file gets parsed and executed by the shell[1], the shell's initialization code might make things much slower than a small C program.

[1] Which is a valid assumption, as it's often the default (not sure if it's POSIX[2]), e.g. this gets clearly executed by my shell despite not having a shebang or anything else:

    echo 'echo $SHELL' > /tmp/foo; chmod +x /tmp/foo; /tmp/foo
[2] Edit: It is POSIX, I know now because there's a whole subthread about it further down, that even quotes the standard, thanks gnarula.


The real reason why things are fast in popular shells is that they implement most of coreutils as builtins. /bin/true being an empty file would prevent it from working when called with exec, for example, so that's why we don't use empty files anymore.


I was going to see what strace does with an empty /tmp/foo, and it fails:

> write(2, "strace: exec: Exec format error\n", 32strace: exec: Exec format error

So the empty file is not exec()able. I think it can only be invoked from a shell.

subprocess.call("/tmp/foo", shell=xxxx) demonstrates it.

I wonder if that was the case for Unix v7 that had it zero length? (...mentioned in another thread.)


From a quick scan of the v7 exec() implementation, it seems that has to be the case, as getxfile() would return ENOEXEC for files smaller than sizeof(u_exdata) (which is the a.out header as a C struct).


A more interesting question: Is it copyrightable?

I'd say it isn't (and the same goes for false); but whether it's patentable is more debatable.



It also means that the shortest quine is an empty file.


In case anyone's curious: in the absence of a shebang (#!), Linux [1] returns ENOEXEC from the execve syscall, after which the invoking program (the shell) handles the failure. Usually shells default to running the file as a shell script with itself as argv0.

[1] https://github.com/torvalds/linux/blob/3e732ebf7316ac83e8562...
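A minimal sketch of that fallback, written in Python rather than taken from any real shell's source. The function name `run_like_a_shell` is hypothetical, `/bin/sh` is assumed to exist, and the temp directory is assumed to allow execution (a `noexec` mount would give a permission error instead). Real shells do more, such as checking whether the file looks like text.

```python
import errno
import os
import stat
import subprocess
import tempfile


def run_like_a_shell(path, sh="/bin/sh"):
    """Run path directly; on ENOEXEC, retry by handing it to a shell as a script."""
    try:
        return subprocess.run([path]).returncode
    except OSError as exc:
        if exc.errno != errno.ENOEXEC:
            raise  # some other failure (missing file, no permission, ...)
        # Fallback in the spirit of what shells do: interpret the file as a script.
        return subprocess.run([sh, path]).returncode


with tempfile.TemporaryDirectory() as tmp:
    empty = os.path.join(tmp, "empty")
    open(empty, "w").close()
    os.chmod(empty, stat.S_IRWXU)  # rwx for the owner
    status = run_like_a_shell(empty)
    print(status)
```

The direct attempt fails because the kernel finds neither a known binary format nor a shebang; the retry succeeds because a shell reading zero commands exits with status 0.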


This can lead to an interesting situation where a program will work if launched by a shell (or by another program that uses `execlp`), but will fail if it is launched with a different variant of `exec`. For example:

    $ touch empty
    $ chmod a+x empty

    $ ./empty  # Works

    $ valgrind -q ./empty  # Works

    $ timeout 10s ./empty  # Works

    $ /usr/bin/time ./empty  # Works

    $ perl -e 'exec("./empty") or die'  # Works

    $ python -c 'import subprocess; subprocess.check_call("./empty", shell=True)'  # Works

    $ python -c 'import subprocess; subprocess.check_call("./empty")'  # Fails
    ...
    OSError: [Errno 8] Exec format error

    $ ruby -e 'exec "./empty"'   # Fails
    -e:1:in `exec': Exec format error - ./empty (Errno::ENOEXEC)
            from -e:1

    $ strace ./empty  # Fails
    execve("./empty", ["./empty"], 0x7fff6639f3a0 /* 84 vars */) = -1 ENOEXEC (Exec format error)
    strace: exec: Exec format error
    +++ exited with 1 +++

It can be quite fun to track down why a program executes successfully and then later fails to execute, with no changes made to the program in between.


I knew this about the programming languages which wrap execution in a shell, but I never knew that execlp/execvp/execvpe handled ENOEXEC internally and wrapped the command in a shell too. TIL!


A fun consequence of this is that running a python script that doesn't have the shebang will mysteriously hang with a crosshair cursor until you click the mouse. When the shell encounters the first "import foo" line it tries to run ImageMagick's "import" utility which tries to take a screenshot of the selected window.


Was this behavior inherited from Unixes past? It seems like it'd be better if shells just returned the error back to the user.


From POSIX [1][2]:

> If the execl() function fails due to an error equivalent to the [ENOEXEC] error defined in the System Interfaces volume of POSIX.1-2017, the shell shall execute a command equivalent to having a shell invoked with the pathname resulting from the search as its first operand, with any remaining arguments passed to the new shell, except that the value of "$0" in the new shell may be set to the command name. If the executable file is not a text file, the shell may bypass this command execution. In this case, it shall write an error message, and shall return an exit status of 126.

[1] https://pubs.opengroup.org/onlinepubs/9699919799/utilities/V... [2] https://unix.stackexchange.com/a/373229


Thanks!

Does POSIX define what a "text file" is?


> A file that contains characters organized into zero or more lines. The lines do not contain NUL characters and none can exceed {LINE_MAX} bytes in length, including the <newline> character. Although POSIX.1-2017 does not distinguish between text files and binary files (see the ISO C standard), many utilities only produce predictable or meaningful output when operating on text files. The standard utilities that have such restrictions always specify "text files" in their STDIN or INPUT FILES sections.

https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1...
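To make the definition concrete, here is a small sketch of one strict reading of it. The function name `is_posix_text` is made up, the default of 2048 for the line limit is an assumption (the real value comes from `getconf LINE_MAX` on a given system), and treating a final line without a newline as "not text" is my interpretation, not something the quoted paragraph spells out.

```python
def is_posix_text(data: bytes, line_max: int = 2048) -> bool:
    """Check bytes against one strict reading of the POSIX 'text file' definition."""
    if b"\0" in data:
        return False  # lines must not contain NUL characters
    if data and not data.endswith(b"\n"):
        return False  # strict reading: every line ends with <newline>
    # Drop the empty piece after the final newline; what is left are the lines.
    lines = data.split(b"\n")[:-1]
    # The limit counts the terminating <newline>, hence the + 1.
    return all(len(line) + 1 <= line_max for line in lines)


print(is_posix_text(b""))            # zero lines
print(is_posix_text(b"echo hi\n"))   # one short line
print(is_posix_text(b"\x7fELF\x00\n"))  # contains NUL
```

Under this reading an empty file passes trivially: it has zero lines, so there is no line that could violate either rule.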


Does an empty file contain characters? That first sentence seems like it contradicts itself.


Yes. It contains zero characters.

An empty file actually contains zero of everything in the universe.


The idea behind the definition is that it defines the set of files that will not trigger various bugs in traditional Unix text tool implementations (i.e. various variants of not checking the fgets() return value).


I think the implication is that empty files are considered text files.


If it does, I never found it.


Shells return exit codes, not typed error values. What kind of output would you expect from the shell, exactly?


I'd expect the same as any other failure from execv(2). Something like,

  zsh: Exec format error: empty-file
  (last command returned 127.)
(127 is the exit status for other failures, so I've used it here. The second line is intended to be part of my PS1, the first zsh's normal reporting. The string used here is what I get from perror() for that code.)


Okay, yeah. That could be useful. Seems POSIX forces the behavior described in the blog post. Time to change the standard!


Status 129 or something else outside the usual range, plus a message that the file was not a valid ELF and also has no shebang, therefore cannot be executed.


I always thought that 128+ are for (shell-spawned) processes and 0-127 are for shell. But, apparently POSIX requires this behavior for empty files - we need to amend the standard, first (or fail to comply, muahahaha).


Yeah, it predates the addition of shebang support in BSD.


What is also interesting is that while strace seems to report much less overhead in "./empty" compared to /bin/true

   $ strace -c ./empty 
   strace: exec: Exec format error
   % time     seconds  usecs/call     calls    errors syscall
   ------ ----------- ----------- --------- --------- ----------------
     0,00    0,000000           0         1         1 execve
   ------ ----------- ----------- --------- --------- ----------------
   100,00    0,000000           0         1         1 total


   $ strace -c /bin/true
   % time     seconds  usecs/call     calls    errors syscall
   ------ ----------- ----------- --------- --------- ----------------
    54,05    0,000060          15         4           mprotect
    24,32    0,000027          27         1           set_tid_address
    10,81    0,000012          12         1           set_robust_list
     8,11    0,000009           9         1           munmap
     2,70    0,000003           3         1           prlimit64
     0,00    0,000000           0         1           read
     0,00    0,000000           0         2           close
     0,00    0,000000           0         8           mmap
     0,00    0,000000           0         1           brk
     0,00    0,000000           0         4           pread64
     0,00    0,000000           0         1         1 access
     0,00    0,000000           0         1           execve
     0,00    0,000000           0         2         1 arch_prctl
     0,00    0,000000           0         2           openat
     0,00    0,000000           0         2           newfstatat
   ------ ----------- ----------- --------- --------- ----------------
   100,00    0,000111           3        32         2 total

repeated execution seems to be much slower with the empty executable than with /bin/true, as measured with the multitime tool https://tratt.net/laurie/src/multitime/

   $ multitime -n 10 ./empty
   ===> multitime results
   1: ./empty
               Mean        Std.Dev.    Min         Median      Max
   real        0.006       0.000       0.005       0.006       0.006       
   user        0.001       0.001       0.000       0.002       0.002       
   sys         0.000       0.001       0.000       0.000       0.002       
   
   $ multitime -n 10 /bin/true
   ===> multitime results
   1: /bin/true
               Mean        Std.Dev.    Min         Median      Max
   real        0.002       0.000       0.001       0.002       0.003       
   user        0.001       0.000       0.001       0.001       0.001       
   sys         0.000       0.000       0.000       0.000       0.000


> What is also interesting is that while strace seems to report much less overhead in "./empty" compared to /bin/true

That's because it reports an error and does not actually execute the file.

>strace: exec: Exec format error

`strace` uses one of the exec()-style functions that don't automatically invoke a shell here. And if it did, it would invoke an entire shell (/bin/sh, typically), which might have more overhead.

----

You need to be very careful when measuring this, because what happens is that a shell executes the shebangless empty file.

If you try launching it from e.g. a C program using execlp(), it will have to first start that shell.

If you try launching it from e.g. bash, it might directly run that in-process or at least after a fork() or similar, skipping some of the shell setup.

So the overhead might depend on context.


Interesting!


This is a great trick for a lot of things. For example, if a web server is getting lots of attacks against a particular URL, one could block that URL or otherwise make it inaccessible. That'll turn it into a 404 or 500, or whatever. But it turns out that a 200 is much faster, and so replacing the target with a blank file is faster than denying it. This is particularly true with xmlrpc.php. Instead of just removing it, empty it. Problem solved.


204 is the fastest, since it is exactly zero bytes of content. For most webservers you don't even need to point it at a specific file, just say "return 204" and that's enough. I've done this numerous times to "whitehole" bad scanners or bots with great success :)
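For anyone who wants to see what such a response looks like on the wire, here is a toy sketch using Python's standard `http.server`. It is not a production setup and not the config of any particular web server; the handler and function names are invented for the example, and the request path is arbitrary.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class NoContentHandler(BaseHTTPRequestHandler):
    """Answer every GET with 204 No Content and an empty body."""

    def do_GET(self):
        self.send_response(204)
        self.end_headers()

    def log_message(self, format, *args):
        pass  # keep the example quiet


def probe():
    """Start the server on a free local port, send one GET, return (status, body)."""
    server = HTTPServer(("127.0.0.1", 0), NoContentHandler)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    conn = http.client.HTTPConnection("127.0.0.1", server.server_port, timeout=5)
    try:
        conn.request("GET", "/anything")
        resp = conn.getresponse()
        return resp.status, resp.read()
    finally:
        conn.close()
        server.shutdown()
        server.server_close()


print(probe())
```

The handler never touches the filesystem, which is the point of the trick: there is no file to open and no body to send.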


What does it mean for a 200 to be faster? I guess I can see some framework or CDN reasons that might be so but it doesn’t feel like an inherent thing


Imagine if you have a site that displays some sort of custom content when a client is hitting a 404 error. Even if it’s just a static page, the web browser still has to follow the code path and open it. Even if it caches the page internally, it still has to regularly check if the file updated on disk.

The amount of time spent by the web browser is fairly small, but the parent comment mentioned it in context of an attack, so that small per-page effort multiplied by many connections makes for more of a substantial load.


In the case of an attack, you don’t care about how much effort the web browser would have to make, only the web server.

In fact, instead of serving a malicious client a 200 empty file, you might want to serve up <script>while(1){}</script> .


But why can't you just return a 404 with no content instead of a 200 with no content?


You can, but many CMSs (WordPress specifically) do not. They serve a dynamic page.


This reminds me of the story about the "go.com" program someone sold for £5, which was just an empty file: http://peetm.com/blog/?p=55


It's probably the cleanest code you will ever see.


And it probably had at least one bug.


I’d say, that’s probably the only situation where there’s actually no bugs.


I really wish it used CR/LF instead of just LF ;P

wonder what unix2dos would do to it


Historically there was also NaDa:

https://web.archive.org/web/20140813120630/http://www.bernar...

but is bloated in comparison, being one byte in size.


It was very useful!


this is fantastic


> Yesterday's post about "pipefail" also involved some systems which treated an empty file as a valid file. This turns out to be surprisingly common.

I don't understand why this is surprising. Of course an empty file is a valid file, just like an empty piece of paper is a valid piece of paper.

I mean, yes, a piece of paper is also different. Paper is a physical object with volume and weight; a file is a sequence of bytes representing information. But, step out of your programmer mindset for a moment. Almost every user interface represents files as objects containing information, not information itself.


I think the implication isn't merely "valid file" but rather it's "valid file of a particular type". The example given is that an empty file is a valid executable. It might also be a valid CSV file but not a valid Excel file.


Thanks, yes, that makes much more sense!


I know what you mean, I also did a double-take when I read that sentence. But when I read a bit further I decided it was just poor phrasing initially.


OK, but this is about an empty file being a valid program file (though the title doesn't make that clear). I mean, an empty sheet of paper is also a valid list of instructions, but that's at least slightly weirder than it being a valid sheet of paper.


Yeah, and an empty sheet of paper is not a valid diploma for example, or a valid banknote. Pretty clear from context.


Zero is a valuable concept. It's a diploma for no degree from nowhere or a banknote worth no value.

You can spot a lot of mistakes by asking code to compute with a 0-by-0 matrix or to write zero rows of data.

For example, the matrix case was fixed in the GSL in 2017: https://git.savannah.gnu.org/cgit/gsl.git/tree/matrix/Change.... In $DAYJOB I have seen multiple generations of systems get the no rows edge case wrong in the API design.
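A toy illustration of the zero-rows trap, not taken from the GSL or any real system. Both function names are made up. The naive version works for any non-empty table but forgets how many columns there are once the table has no rows; the second version takes the width as an explicit argument so the empty case still has a shape.

```python
def naive_column_sums(rows):
    """Sum each column; silently loses the column count when there are no rows."""
    return [sum(col) for col in zip(*rows)]


def column_sums(rows, width):
    """Sum each column of a table with a known width, including the zero-row case."""
    totals = [0] * width
    for row in rows:
        if len(row) != width:
            raise ValueError("row width mismatch")
        for i, value in enumerate(row):
            totals[i] += value
    return totals


table = [[1, 2, 3], [4, 5, 6]]
print(naive_column_sums(table), column_sums(table, 3))
print(naive_column_sums([]), column_sums([], 3))
```

Feeding the empty table to both is exactly the kind of test that exposes the difference.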

Zero goes back fewer millennia than one might think: https://en.wikipedia.org/wiki/0


Yes, I know of the value of zero (and not just because I studied theory/math heavy fields in uni).

Does not change that this does not always apply. Having no piece of paper is valid as "no PhD", but an empty piece of paper is not a valid PhD. In the same sense, an empty piece of paper is also not a "$0 banknote", it's not a banknote, period.

It is true that in many cases, empty files are valid for their context (e.g. CSV, plain text files, shell scripts), in many other cases, they are not (e.g. any file format at all that requires a magic number in a header or footer, which is many many file formats). Try to open these in those contexts, and you would usually get an error, for example: "You told me this was supposed to be a JPEG, but this file does not start with ff d8 ff e0 [Trivially true for an empty file! No, files are not equivalent to sets in the Zermelo-Fraenkel sense!] What is wrong with you?"
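A sketch of the magic-number check, to show why the empty file can never pass it. The table and the `sniff` helper are invented for the example. I only compare the first three JPEG bytes (ff d8 ff), on the assumption that the fourth byte varies between JPEG flavors, and the PNG and ELF signatures are included just for contrast.

```python
# Leading bytes ("magic numbers") for a few well-known formats.
MAGIC = {
    "jpeg": b"\xff\xd8\xff",
    "png": b"\x89PNG\r\n\x1a\n",
    "elf": b"\x7fELF",
}


def sniff(data: bytes):
    """Return the name of the first format whose magic number prefixes data, else None."""
    for name, magic in MAGIC.items():
        if data.startswith(magic):
            return name
    return None


print(sniff(b"\xff\xd8\xff\xe0\x00\x10JFIF"))
print(sniff(b""))
```

Zero bytes cannot start with a non-empty prefix, so the empty input matches nothing, while a format with no required header (plain text, CSV, a shell script) has no such check to fail.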

I don't get the discussion here. This is all obvious, and it was obvious in the original sentence from context.


I mean, if we want to run with the math analogies...

The empty set is a valid set. Perfectly reasonable, and completely necessary in many cases.

But the empty Group is not a valid Group. A Group requires an identity element, which means non-empty. The smallest valid Group, the Trivial Group, still requires an element.

I would argue a diploma or a banknote resemble a Group more than a Set. There are required elements, such as an issuing institution.


I think she meant "an empty file is a valid file to some programs that should ostensibly expect otherwise", but worded in a way that ends up being confusing.


> Does this mean the smallest /bin/true is actually an empty file?

Yes. It used to be. Here's the history from Rob Pike https://twitter.com/rob_pike/status/966896123548872705


An empty file was also the infinitely profitable program: http://peetm.com/blog/?p=55 On CP/M, it was useful to rerun the previous program that was still in memory.


> It's probably something funny involving how the interpreter is spun up - looking for #! and that kind of stuff.

Is this actually a result of exec(2) treating the file as text?

My intuition is something else: that this is instead a long-inherited compatibility shim for very, very old executable object formats (e.g. COM, a.out) for systems without virtual memory — where in these executable formats, you’d expect to be able to have the instruction pointer “run off the end” of an executable, with the effect of that being to cleanly quit the program — rather than trapping on an undefined op, hitting a HLT, or getting anything like a protection fault. The OS would ensure control returns to it after program “run-out” execution, by the OS exec(2)-alike syscall ensuring it writes a RET op or the like just past the end of the memory it copied the loaded program into, before jumping to the beginning of said memory.

Mind you, jumps into “hyperspace” (beyond where anything bothered to write; perhaps beyond where any chips are mapped on the bus) won’t take you to a clean RET op written by the OS, but rather likely to a 00 (active-high logic) or FF (active-low logic) op. Which is precisely why CPU designers usually make the relevant one of those for their logic into the ISA’s HLT op!

(Fun fact: even bytecode abstract machines do this. Read “off the end” of an Ethereum contract and you get 00 bytes — and, no surprise, 00 is the EVM ISA HLT op.)


I ran into this lately when I hadn’t validated a successful file transfer. The script ran to completion against all expectations. Obviously you should check a hash before executing a download anyway, but this was just some automated build code.


This is hilarious, though interestingly if you try to see what's going on with gdb or strace it actually errors out properly.

    $ strace ./empty
    execve("./empty", ["./empty"], 0x7ffe0156c9d0 /* 61 vars */) = -1 ENOEXEC (Exec format error)
    strace: exec: Exec format error
    +++ exited with 1 +++
(I'm not very good with gdb so maybe this is wrong)

    gdb empty
    "0x7ffd34c7dc30s": not in executable format: file format not recognized
    (gdb) exec-file
    No executable file now
I think what's actually happening is that bash is `source`-ing the file, so that's what causes the exit code to be zero.


It is indeed a shell behavior, specified in POSIX sh.

> If the execve() function fails due to an error equivalent to the [ENOEXEC] error defined in the System Interfaces volume of IEEE Std 1003.1-2001, the shell shall execute a command equivalent to having a shell invoked with the pathname resulting from the search as its first operand, with any remaining arguments passed to the new shell, except that the value of "$0" in the new shell may be set to the command name. If the executable file is not a text file, the shell may bypass this command execution. In this case, it shall write an error message, and shall return an exit status of 126.

https://pubs.opengroup.org/onlinepubs/009604499/utilities/xc...


Which likely means the zero byte /bin/true is SLOWER than a compiled correct one because it has to first ENOEXEC and then reexecute via the shell.


Is a file that doesn't contain any LF a valid "text file", though?


Quote: "Awesome, right? This works just fine on Macs as well."

Well, doesn't work on Windows though. I get "Access is denied" when I try to run an empty .exe. Even when I do it with an elevated prompt I get the same. Yeah, so..¯\_(ツ)_/¯


Try EMPTY.BAT from CMD.EXE, or EMPTY.PS1 from PowerShell.

A zero-byte executable doesn’t work on macOS either. This article is about command shell behavior, not UI shell behavior.


Windows isn't a UNIX clone; it's very much a VMS-inspired OS.


I don't know how executables work on Windows but I'm guessing that the shell expects a different kind of 'success' response?


The thing is that the executable is not run at all. You get the error from the OS directly, before any execution, not after a fake-ish exit as explained in the article for Linux/macOS.

I think this was neatly explained by Ray in one of his older entries. The OS tries to load the .exe's header, then assigns heap and stack according to that header, and only then starts execution at the entry point given by the header.

So, empty file -> no header -> OS complains -> no execution + above error.


Maybe an empty com file will work?


Nope, same error message.


My favorite way to create an empty file and impress ladies at bars is:

"> emptyfilename"


> as far as the package database was concerned, it was

"rpm -qV" is a command that will audit the signatures of a package's contents. I don't know if dpkg has something similar but it would be cool if it did.



md5? Is 'jammy' a remarkably old platform? I hope so.

But regardless of the signature algorithm, thanks for sharing the mechanism.


2002 according to the copyright. But it also has this note:

debsums is intended primarily as a way of determining what installed files have been locally modified by the administrator or damaged by media errors and is of limited use as a security tool.


> md5? Is 'jammy' a remarkably old platform? I hope so.

Jammy comes out next month.

Though if you encounter an md5 second preimage attack that's pretty impressive.


Misread the title: "You can do a lot with an empty rifle". Whole 'nother topic.


Clearly following the "everything is a rifle" philosophy of OS development.

Though one notes that rifle-based storage incorporates the pipe and executable.


I use empty files extremely successfully as a task management system.

Total Commander allows you to assign custom icons to extensions and lots of other things that make this very handy.


I've misread the title as "You can do a lot with an empty life", and clicked eagerly, hoping to learn how to deal with the feeling of emptiness in your life.


The infinitely profitable program: http://www.peetm.com/blog/?p=55



AFAICS some other interpreters, perl, python, ruby, tcl are happy to return a 0 exit code if invoked on an empty file.
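A quick check for the Python case only (I have not scripted the others). It runs the current interpreter on a zero-byte file and captures the exit status and output; the file name is arbitrary.

```python
import os
import subprocess
import sys
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "empty.py")
    open(path, "w").close()  # zero-byte "program"
    # Run the same interpreter that is executing this script on the empty file.
    result = subprocess.run([sys.executable, path], capture_output=True)

print(result.returncode, result.stdout)
```

Since the interpreter is named explicitly, this does not depend on the file being executable or on any shell fallback.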


There is a bit of a difference to me between the behavior described in the article (directly executing with ./foo) and "sh foo" or "python foo".

In the former case, I would have expected a shebang to be needed, so that execv(2) would know what to execute (as it needs some magic bytes to determine what loader to use, or whether to treat it as a script, and I didn't think it had a fallback). (And I'm right… in a sense. The actual behavior is more complex & more horrifying, and covered in other comments.)


Pretty much. They ran all the code that was in the file without errors.


Empty files are common in Python: a __init__.py is required (or was) to mark a directory as a Python package, allowing its content to be imported.
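A small sketch of that, assuming a throwaway package name `emptypkg` that I made up. It creates a directory with a zero-byte `__init__.py`, imports it, and lists whatever public names the package ended up with. The "or was" matters: on current Python 3 a directory without `__init__.py` can still be imported as a namespace package, so the empty file marks a regular package rather than being strictly required.

```python
import importlib
import os
import sys
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    pkg_dir = os.path.join(tmp, "emptypkg")  # hypothetical package name
    os.mkdir(pkg_dir)
    open(os.path.join(pkg_dir, "__init__.py"), "w").close()  # zero-byte marker
    sys.path.insert(0, tmp)
    try:
        pkg = importlib.import_module("emptypkg")
        # Names not starting with "_" would be things the package itself defines.
        public_names = [n for n in dir(pkg) if not n.startswith("_")]
    finally:
        sys.path.remove(tmp)

print(pkg.__name__, public_names)
```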


it should be 'touch true'. shortest true ever.


none of the posts on that blog make sense


I often feel the same way. I feel like I'm reading the blog of a tech newbie at the level of discovering `symlinks` for the very first time. The title is overrated and seems clickbait-y and so I was left disappointed.


Who is this Rachel and why does she use HN as her personal blog outlet?


Rachel doesn’t. This was posted by ‘picture.


Yet every single post is posted here. And you didn't answer the question: who is Rachel, and why does she or it matter?



