Show HN: Checksum.sh – verify every install script (checksum.sh)
119 points by gavinuhma on Oct 28, 2022 | hide | past | favorite | 75 comments
The pattern of downloading and executing installation scripts without verifying them has bothered me for a while.

I started messing around with a way to verify the checksum of scripts before I execute them. I've found it a really useful tool for installing things like Rust or Deno.

It's written entirely as a shell script, and it's easy to read and understand what's happening.

I hope it may be useful to someone else!



There are two big problems with the use of `echo $s` in bash/POSIX sh:

1. Never use echo to output untrusted content as the first argument

Let's say `s='-e 1\n2'`, then `echo $s` will output:

> 1

> 2

Instead of:

> -e 1\n2

Always use printf if you want to start output with untrusted content, e.g., `printf %s\\n "$s"`.

2. Never use unquoted variable expansion when trying to exactly reproduce contents of the variable

Similarly, unquoted variable expansion re-tokenizes the contents and will not preserve spaces appropriately. Say `s='"a<space><space>b"'` (where each <space> is a literal ' ', HN seems to be collapsing 2 spaces down to 1), then `echo $s` will output:

> "a<space>b"

Instead of:

> "a<space><space>b"

You can get the latter with `echo "$s"` but use `printf %s\\n "$s"` to fix both issues.

PS: If you fail to use quoted expansion with printf, for example like so, `printf %s\\n $s`, then you'll notice the problem right away, as it will effectively turn that into `for i in $s ; do printf %s\\n "$i" ; done`. That's actually a very useful feature of printf if you know to use it.

Edit: These problems exist for bash/POSIX sh at least. Perhaps you're using a shell that works differently, like zsh, because otherwise issue 2 would probably have caused some checksum failures for you already.
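Both pitfalls are easy to reproduce in a couple of lines (behavior shown is bash/POSIX sh; `s` is just a throwaway variable):

```shell
# Pitfall 1: echo treats a leading "-e" as a flag (and may then expand escapes),
# while printf with %s reproduces the content byte-for-byte.
s='-e 1\n2'
printf '%s\n' "$s"        # prints the literal string: -e 1\n2

# Pitfall 2: unquoted expansion re-tokenizes, collapsing runs of whitespace.
s='a  b'                  # two spaces
echo $s                   # prints: a b   (one space)
printf '%s\n' "$s"        # prints: a  b  (both spaces preserved)
```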


If I may pile on with a general suggestion for people writing shell scripts: Use shellcheck. Always. It will catch these things automatically for you:)


Great post, you are wise in the ways of the shell. Minutiae like these are exactly why I stop writing shell scripts the moment I start, and reach for Python or some other sane language. But I can't help but respect it when I see masters of sh work their magic.


Honestly, 90% of problems with scripts are people forgetting to put double quotes around stuff. The other stuff doesn't come up that much, and once you write a few decent scripts, the other stuff is as easy as noticing someone wrote `open = True` in Python, not realizing they've redefined a builtin function, and the fix is just to use `is_open = True`.

So just put double quotes around all your variable expansions unless you know you shouldn't -- 90% of scripts would be "fixed" with just that. And don't bother putting curly braces into the variable expansion unless you know you need to. People tend to think `echo ${s}` is somehow better than `echo $s` when it's exactly the same -- the curly braces are just a way to allow you to, e.g., write `"${s}_"` as distinct from `"${s_}"`. AFAIK in fish `${s}` is identical to `"$s"`, but that's a different kettle of sh.
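The brace-disambiguation case is quick to demo (variable names here are made up):

```shell
s='foo'
s_='bar'
echo "${s}_"    # prints: foo_  (expand $s, then a literal underscore)
echo "${s_}"    # prints: bar   (expand the variable named s_)
echo "$s_"      # also bar -- without braces the underscore joins the name
```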


Try enabling shellcheck linting in your editor! It would immediately warn about unquoted variable expansions and the like. For the vast majority of "gotchas" shellcheck will prod you in the right direction.

That said, to write good shell scripts requires actually learning the shell paradigm, instead of just trying to write "Python with Shell syntax".


I think there's place for both. When I build software installers for our embedded systems, the "frontend" tools production uses are written in a "sane" language, but the actual installation of software and configuration of the systems is done in shell. Because for running a bunch of apt/tar/cp/ln/sed commands to configure a Linux machine, it is the sane language.


This is awesome. Thank you! I've been through so many iterations but it's been fun to improve



Missed the other `echo $s` piped into shasum. But I echo the sentiment of another commenter that I'd rather rely on `shasum --check` to give the OK or not.


Got it. Thanks.

Re --check, I suppose the way to do that would be to download the file to disk, which --check requires as far as I can tell. So I could download the file to disk, --check, and then remove it. I think most of these install scripts are trying not to leave any artifacts around from install, other than the resulting binary.


You only need to create a temp file for the checksum file, not the downloaded contents. In the below example, no file exists on disk with the contents of `$s`.

> $ s='1<space><space>2'

> $ printf %s\\n "$s" | shasum -a 256 > tmp.sum

> $ printf %s\\n "$s" | shasum --check tmp.sum

> -: OK

So you can just `printf '%s<space><space>-\n' "$c" > tmp.sum` and check with `printf %s\\n "$s" | shasum --check --status tmp.sum || { echo "checksum failed" >&2 ; exit 1 ; }`

Having to create temp files is a wrinkle (could probably avoid it by using process substitution if you want to give up on POSIX sh), but so is writing bash scripts in general.
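For the bash-only route, the process-substitution version looks roughly like this (a sketch, using `sha256sum` as a stand-in for `shasum -a 256`; `<(...)` is a bash/zsh/ksh feature, not POSIX sh):

```shell
#!/usr/bin/env bash
# No temp file at all: the one-line checksum list is fed to --check
# via process substitution instead of being written to disk.
s='hello world'                                  # stand-in for the fetched script
c=$(printf '%s\n' "$s" | sha256sum | awk '{print $1}')
if printf '%s\n' "$s" | sha256sum --check --status <(printf '%s  -\n' "$c"); then
  echo "checksum ok"
else
  echo "checksum failed" >&2
  exit 1
fi
```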


PR for that: https://github.com/gavinuhma/checksum.sh/pull/4

I ended up trying with process substitution so no tmp file.

It works. Trying to decide if it’s more difficult to read


Solid! I couldn't figure this out, which is why I stopped using `--check`. I'll take a look


For more caveats like this one I recommend reading: https://www.etalabs.net/sh_tricks.html


I think this is a worthy cause, but maybe a little misguided: the problem with "curl-piping" isn't so much the fact that you're throwing a random shell script into your shell, but the fact that you're downloading arbitrary code in a way that's disconnected from the normal integrity/authenticity guarantees of a package manager.

In other words: you can be confident in the bootstrapping script you've just downloaded because it passed its checksum, but that script is just going to download more binaries from the Internet.


> the fact that you're downloading arbitrary code in a way that's disconnected from the normal integrity/authenticity guarantees of a package manager.

I'm old enough to remember when apt packaging got burned because it used http instead of https even though apt packages get signed.

If you're downloading software from websites protected with HTTPS, and that's good enough for you, then downloading and executing a script from those same websites using HTTPS is also good enough.

Would it be better if those things were signed with a key for which there is a code signing certificate? Eh, maybe, yes, if the PKI for the code signing is sufficiently better than WebPKI, which... is not necessarily obvious. Meanwhile, access to that PKI is probably sufficiently harder to come by than WebPKI TLS server certificates that a lot of people don't bother, and rightly so.

Now suppose you say "I don't trust this, I'm just going to clone their github repo and build from source". Do you get more protection that way? Maybe, maybe not.

Now, if you get packages from Debian and the like, you get them signed, and maybe the person who contributed the package to their repository did a thorough code review and audit of the upstream they are packaging, or maybe not, who knows.

This is why containerizing this stuff helps. But it's not really accessible to people yet.

What might be nice is that any program that a user executes automatically gets some level of isolation corresponding to how it was delivered, authored by whom, etc. So programs from the OS get the least isolation, and programs written by the user less isolation, and programs of unknown provenance get the most isolation.


> that script is just going to download more binaries from the Internet

Not necessarily. A number of these scripts either configure a package manager or the shell script contains the binary itself which is unpacked when the script is run.


Sure, I suppose that's possible. Most of the ones I'm familiar with just download an architecture-compatible binary from a CDN somewhere.

Even if there's a shar-style[1] packed binary in the script, you have no idea what that binary does when you verify that the checksum is correct.


Well-written scripts I've seen also contain the hash of binaries that are downloaded. So as long as the hash function is good, checking the hash of the script should still ensure that the binary downloaded is what you want.

> you have no idea what that binary does when you verify that the checksum is correct.

This isn't any different from using a package manager. You're still downloading a binary that could do anything and you have to have some level of trust in the source.


This function is flawed, containing unquoted variable interpolations:

  s=$(curl -fsSL $1)
  ...
  c=$(echo $s | shasum | awk '{print $1}')
What it means is that the checksum is being calculated on a whitespace-mangled version of the data that is pulled down from the web.

It appears to work because the author calculated the checksums with the same script and is just validating that they are not changing.

In other words, it's possible to make whitespace changes such that the hash won't change.

Here are two scripts: a harmless one and a malicious one, which produce the same whitespace-ignorant SHA256:

  $ foo='# this is a comment
  > # rm -rf /'

  $ echo $foo | sha256sum 
  8b87547d4d214038b153ce57d929be4c835b7690c930c1e83a25fc1509390cf9  -

  $ foo='# this is a comment #
  > rm -rf /'
  $ echo $foo | sha256sum 
  8b87547d4d214038b153ce57d929be4c835b7690c930c1e83a25fc1509390cf9  -
The first foo contains two comments. The "rm -rf /" command is commented out. The second foo moves the hash mark of the second comment into the previous line, uncommenting the command.

(I know about GNU Coreutils' safeguard in rm against removing / recursively, by the way.)
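For contrast, with quoted expansion and printf the two variants hash differently, while unquoted `echo $foo` flattens the newline and makes them collide:

```shell
a='# this is a comment
# rm -rf /'
b='# this is a comment #
rm -rf /'

# Unquoted echo re-tokenizes, turning the newline into a space,
# so both scripts become the same one-line string:
[ "$(echo $a | sha256sum)" = "$(echo $b | sha256sum)" ] && echo "collision"

# Quoted printf preserves the newline, so the hashes differ:
[ "$(printf '%s\n' "$a" | sha256sum)" != "$(printf '%s\n' "$b" | sha256sum)" ] \
  && echo "no collision"
```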


Thanks, I believe this is fixed now in the /checksum.sh file but I forgot to update the function on the website


Ok should be fixed now. Appreciate you pointing it out. That whitespace trick is really interesting


I don't know; what's the threat model here?

If the script is deliberately malicious as originally published, then the publisher will provide a valid checksum; so it doesn't help.

If the script source is subverted by an attacker, then it only helps if the attacker doesn't also have the means to change the published checksum too.

If an attacker can modify the site which publishes the URL for the script and the checksum, they can modify both at the same time.


That’s right. The checksum shouldn’t be provided by the site. I’m producing the checksum myself after reviewing the install scripts manually. Once I produce the checksum I can keep relying on it. The install scripts don’t tend to change very often.


That makes some kind of sense. The original post makes you sound like you're one of those crazy people who thinks e.g. Flatpak is fine but curl | bash is horribly insecure.

However I'm still not sure it really makes sense. Do you also manually review the code of the binaries that the bash scripts download?


so you’re storing the checksums locally for each script then?

is that much different than just storing the verified copies of the scripts?


Storing them in the readme which others can use as well. I jump around to new machines a lot so I can reference checksum.sh if I want to install rust for example


Makes sense, congrats on shipping!


I wrote hundreds of those checks in scripts, makefiles, CI and whatever else. After I found Nix (and NixOS) it's ridiculous not to use it. Use it.


I hadn’t heard of NixOS. Super cool


>I've found it a really useful tool for installing things like Rust or Deno.

For Rust you can ignore sh.rustup.rs and just download and set up rustup manually.

    CARGO_HOME="${CARGO_HOME:-$HOME/.cargo}"
    mkdir -p "$CARGO_HOME/bin"
    curl -Lo "$CARGO_HOME/bin/rustup" 'https://static.rust-lang.org/rustup/dist/x86_64-unknown-linux-gnu/rustup-init'
    chmod +x "$CARGO_HOME/bin/rustup"
    hash -r
    rustup set auto-self-update disable
    rustup set profile minimal
    rustup default stable
    rustup update --force
    rustup self update # Create hardlinks under $CARGO_HOME/bin/


Awesome, that avoids downloading and running executable code from the Rust project!


rustup is still executable code you are downloading from the Rust Project. It then downloads Cargo and rustc, which are both executable code, downloaded via the Rust project.

The only difference here is that you’re running a few commands by hand instead of running them in a single invocation of a shell script.


Yeah, the only thing this achieves is not having to worry about the `curl | sh` step. The rest of the threat model is exactly the same.


Of course; I was being sarcastic.


Poe’s law strikes again, my bad!


Why not use the -c option? Especially if you're using Bash or Zsh, which have "here-strings":

    checksum() {
      hash="$1"
      file="$2"
      sha256sum -c <<< "${hash}  ${file}"
    }
Or if you need to use a POSIX-ish shell:

    checksum() {
      hash="$1"
      file="$2"
      printf '%s  %s' "$hash" "$file" | sha256sum -c
    }
Of course you can add a `--binary` option (uses '%s *%s' instead of '%s<space><space>%s'), options to use different hash functions, etc.

I also think it's weird to use `alias` inside a function, instead of just using a parameter to store the name of the program to execute.


Great point on alias, thanks. I think that was a relic of an older iteration.

I'll work through these suggestions. Appreciate it. Feel free to send a PR if you want.

For the here-string, I think that won't work because the file isn't being saved locally, it's just being piped (so $2 is a URL). I can't do the usual `shasum -c <<< "132e320edb0027470bfd836af8dadf174e4fee00  install.sh"`, which takes a local filename but not the file content. As far as I could tell, anyway. I'll try it some more


OP may want to take a look at "Shell script best practices" (https://news.ycombinator.com/item?id=33354286) submitted two days ago. :)


Nice! Thanks for sharing. I also learned about shellcheck thanks to this thread, which has been super useful


This just shifts the trust to the checksum. How do you know you downloaded the right checksum? Checksum the checksum?

Whatever you are doing to protect sending the checksum can also be used for protecting the script itself.


Agree. Although checksums are smaller and easier to copy/paste. Same with a url.

I download the script from A, and the checksum from B. And then I verify them locally. So A and B both need to be compromised. It all assumes the script was safe to begin with, and this just verifies that nothing has changed


Checksum the checksum*s*

Checksum.sh could keep track of checksums. Then an attacker has to alter the original script and checksum.sh.


Just remember that any script that fetches anything else remotely would still pass the checksum as only the initial script is checked.


It's the age-old root-of-trust problem. In practice, the good-enough answer is that if it passes SSL/TLS authentication on the official domain, we wouldn't be able to stop an injection attack either way. Validating against the source is no good if it is the source that is compromised.

That's also kind of the issue with a lot of these shell injection attacks. Sure someone could insert environment variables or other shenanigans to take over your machine, but if they have that much control over your shell there are countless other ways they could also do it. Guarding against this one particular case doesn't buy you much.


Definitely. Important to note. There is a long long supply chain


Yep. As an example, rustup happens to be in this category as the checksums for rustc, cargo, etc. aren't checked.


It's really interesting. There should be a massive ledger of checksums for software


It's called apt. Or dnf. Or most any package manager. Having a gigantic general list runs into the problem of how do you update it and how do you verify the updates?


You use GPG and trust the people publishing things, who sign the artifact that you actually download. Which is how every package manager I've seen works internally, anyway.


> You use GPG

“and now you have two problems.” —jwz

We haven’t been able to trust public pgp keyservers for a decade or more (possibly never, really).

So now we’re back at having to trust where-ever we get the proof from, whether that’s the file hash, or the public key.

(Which, as you say, is what package managers provide, and if you don’t trust your system’s apt/yum/pacman/whatever, then you have a bigger problem that trusting any random install shell script)


An idea might be to get the checksum from the URL, for example:

    checksum https://sh.rustup.rs/#8327fa6ce106d2a387fdd02cb95c3607193c2edb | sh
Otherwise I don't understand why your script is loaded as a function rather than run as a script.
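A rough sketch of that hypothetical interface: split the fragment off with parameter expansion and compare it against the SHA-1 of the fetched body (SHA-1 to match the `shasum` default used elsewhere in the thread; none of this is the actual checksum.sh code):

```shell
# checksum 'https://example.com/install.sh#<sha1>' prints the body on a
# match, or fails loudly on a mismatch.
checksum() {
  url=${1%%#*}     # everything before the first '#'
  want=${1##*#}    # everything after the last '#'
  body=$(curl -fsSL "$url") || return 1
  got=$(printf '%s\n' "$body" | shasum | awk '{print $1}')
  if [ "$got" = "$want" ]; then
    printf '%s\n' "$body"
  else
    echo "checksum mismatch for $url" >&2
    return 1
  fi
}
# Usage: checksum 'https://sh.rustup.rs/#8327fa6ce106d2a387fdd02cb95c3607193c2edb' | sh
```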


Awesome. I made something similar in https://github.com/mkmik/runck

But I didn't buy a fancy domain name :-)


Haha thanks! Honestly when I saw the domain was available it motivated me to finish the project and share it


Domain driven development


Haha gold


If we kept a mirrored or distributed decentralized network of just cryptographic hashes, that might solve a huge number of problems around distributing files securely.


and where do we get the checksum? HN has a hall monitor mentality issue


I generate the checksum myself after reviewing a given install script. Then I add it to the readme. And then anytime I go to install something I reuse the checksum


Serious question - What is the benefit of verifying a hash? Are we really worried about file integrity? Why don't people use GPG?

The hash only verifies file integrity, and that the content of the URL doesn't switch the script later. But keep in mind that in most scenarios, an attacker would just change the listed hash too (they're usually on the same website). This only mitigates one very specific attack.

Why don't we use GPG here? That way we can verify ownership and file integrity with at minimum TOFU, plus optional manual verification? If we're going through the work of adding a wrapper and all that, we may as well no?

This has the benefit that you only need to import the owner's cert once, all future changes have the same cert. Where hashes are obviously different every time, you have to trust the source of the hash every time it changes. With GPG at the very least you have TOFU with certs - and very best can have better assurance of the initial download too.

EDIT: Just want to clarify - I'm openly asking why the "developer community" is going the direction of hashes for script verification vs GPG signatures.

I don't mean to diminish your project, your project looks fun, and does make verifying hashes easier :)


Because for all of its problems, Web PKI is a working, practical, large scale system of verification and GPG isn't - you don't get much by trying to replicate what your web browser and CAs do for you but clunkier.


> would also just change the hash listed too

In my project I "host" the hash on a different medium, so in order to compromise the file download the attacker would have to compromise both the file hosting server and the hash hosting medium (which in my case is GitHub).

I also don't really display the hashes, as the download only happens when the script is updated, so your current version of the script will check the hash on GitHub vs the hash of the file download from the file hosting server.

EDIT: To be clear, this doesn't solve the problem with the initial install and it is also not related to the Checksum.sh script.


Interesting idea.

Does the script get the new version url&expected hash from the website alone? Or does it get the expected hash from the website, then calculate the URL from github?

Basically I'm wondering if that prevents just needing to attack the website - if the url to download the update and the expected hash are in the same place then it's still a single point of failure.


The latest file download URL is always the same /latest, hosted on my server.

The version number and latest file hash are also fixed URLs, stored on GitHub.

So for an update, the script checks GitHub for latest version number, if newer it downloads the latest version from my server, computes the hash and compares it to the hash stored on the fixed GitHub URL before proceeding.

I think there's no way to replace the file with a malicious one that will be distributed to the users unless you get access to both my server and the GitHub repository.


Yeah I think that should work.

It does have the downside still that changes to the website/github might break future updates in a way that isn't (easily) verifiable.

While this is a solution personally I still like the idea of GPG more since it'll work for any new files, works for your new projects automagically, etc.

But I think you did at least fix the future update problem with auto-updates, which is a lot more work than most people put into it, so thank you for addressing the issue!


I'm not terribly deep in this space. What is the conceptual difference of hash vs GPG sig?


Hash essentially proves that the file you downloaded is the same as the file that was uploaded. It tells you nothing about Who uploaded the file. An attacker could make you download their own file, but then the hash of the file won't match what's published (unless the attacker changes the published hash).

A GPG sig proves that the file was signed & uploaded by the author, which de facto doubles as proof that it's the same file. The idea here is that the author uploads their public key, signs the package with their private key, and now there's an association between the package and the author. An attacker would have to obtain the author's private key, or replace the public key with their own. Changing the public key, however, is a big red flag.


I don’t think that’s a real problem here though.

I couldn’t care less if the Chinese government hosts an install script, if there’s no possibility they could have changed a single byte of the script.

Assuming I have a trusted way of knowing the installer script hash (which is a big assumption), I don’t need authentication for the script download, I only need integrity checks.


A hash is the same when the contents are the same. But when you get a new (maliciously hacked) install script, chances are you won't have an old hash lying around to check whether the script changed. Any attacker who could swap the script could also swap the hash, unless it is published through a different channel.

With GPG the developer has a key pair (one private, one public). They can then sign all their scripts with their private key and publish the public one wherever. You can then take that public key and verify that the script has been indeed signed by the developers private key.


Admittedly, the complexity is likely the main reason GPG isn't more commonplace.

This is the overview:

Developer generates a private/public key they use for all of their projects.

You import their public key once - you can verify this from their github, twitter, etc but that's optional.

They can sign a file with their key. You can check this signature against their public key. This will guarantee the file was signed by using that key and is unmodified.

If someone hijacks the website after this point and signs the new downloads with their own key - then you will be able to see it's invalid.

If you manually verify the key then you'll know your initial download is valid - if you trust on first use then you at least know all future files signed from that developer with that cert are valid.

They also are effectively a hash for file integrity.

tl;dr - hashes tell you if a file is changed. Signatures tell you if the file is changed, and who the person that made the file is.
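The sign/verify round trip described above is short in practice. This sketch uses a throwaway keyring and a made-up identity (all names are placeholders), so it doesn't touch a real GPG setup:

```shell
# Throwaway keyring so the demo doesn't touch ~/.gnupg
export GNUPGHOME="$(mktemp -d)"
chmod 700 "$GNUPGHOME"

# Developer side: generate a key and produce a detached, ASCII-armored signature
gpg --batch --pinentry-mode loopback --passphrase '' \
    --quick-gen-key 'dev@example.com' default default never
echo 'echo hello from installer' > install.sh
gpg --batch --pinentry-mode loopback --passphrase '' \
    --detach-sign --armor install.sh          # writes install.sh.asc

# User side: after importing the developer's public key once,
# verification fails loudly if a single byte of install.sh changes.
gpg --verify install.sh.asc install.sh && echo "signature ok"
```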


>The pattern of downloading and executing installation scripts without verifying them has bothered me for a while.

Thanks for sharing this work OP! I didn't see a license mentioned -- did you intend this to go into the public domain? I like how you set up a cool domain name and did some sick graphics, but I'm not sure how I can legally use your code in the future.

That being said, I appreciate the work you put into this project.

I'm not going to list off specific examples, but MANY open source projects serve either PGP keys or hashes in the clear. Or they serve just hashes over HTTPS and now you have a trust issue.

Or, in one case, my favorite -- they had lovingly listed out the MD5 sum for the program... but they served both that checksum, and the code itself... over HTTPS.

Now, to be fair, HTTPS does provide an integrity check, so there's a benefit beyond privacy or whatever but... this is a RAMPANT problem in the open source community.

I ran into it mostly when trying to find esoteric security tools when I was attempting OSCP and interviewing around for penetration testing roles.

My sense shifted rapidly from "I was so scared of the CFAA I did an entire master's thesis on the design of censorship circumvention tools" to "Oh gee, I used to be such a narcissist, demanding a highfalutin salary when I couldn't even fire up Metasploit to wipe a server."

(The implication being that some folks abused their access when my powers were weak, and now, in time for spooky season, it's time to lean in to letting people take whatever drug they want if they feel scared -- reality scares me too some days.)


Good catch. Let me add a license


Thanks, it wasn't meant in a gotcha way.


I totally just forgot to add one. Added MIT just now. Appreciate it!


I feel like bash/sh should have this built in



