As is the case with Docker's own best practices for Dockerfiles in the official documentation, they leave out some really important details.
Specifically, they don't really express how Docker packaging is a process integrating the way you build, where you build, and how you build, not just the Dockerfile.
1. Caching is great... but it can also lead to insecure images because you don't get system package updates if you're only ever building off a cached image. Solution: rebuild once a week from scratch. (https://pythonspeed.com/articles/docker-cache-insecure-image...)
2. Multi-stage builds give you smaller images, but if you don't use them right they break caching completely, destroying the speed and size benefits you get from layer caching. Solution: you need to tag and push the build-stage images too, and then pull them before the build, if you want caching to work; a sketch follows below. (Long version, this is a bit tricky to get right: https://pythonspeed.com/articles/faster-multi-stage-builds/)
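As a rough sketch of that tag/push/pull dance for point 2, assuming a Dockerfile whose build stage is named with "FROM ... AS builder" and a made-up registry/image name:
# Pull previous images so their layers are available as cache
# (|| true so the very first build doesn't fail when nothing exists yet).
docker pull myregistry/myapp:builder || true
docker pull myregistry/myapp:latest || true
# Rebuild the build stage, reusing its previously pushed layers as cache.
docker build --target builder \
  --cache-from myregistry/myapp:builder \
  -t myregistry/myapp:builder .
# Build the final image, reusing both caches.
docker build \
  --cache-from myregistry/myapp:builder \
  --cache-from myregistry/myapp:latest \
  -t myregistry/myapp:latest .
# Push both so the next build (possibly on another machine) can reuse them.
docker push myregistry/myapp:builder
docker push myregistry/myapp:latest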
The documentation may say so, but many of the official images run as root and provide little to no documentation on how to change that; the Bitnami images, by comparison, are light years ahead.
A simple multi-stage example for Golang apps would be:
# Stage 0: build a static binary.
FROM golang:1.12.4
WORKDIR /opt/src/github.com/project1/myprog/
COPY . .
RUN go get -d -v ./...
RUN CGO_ENABLED=0 GOOS=linux go build -a -mod=vendor -o myprog .
# Stage 1: only used to create an unprivileged user we can copy out.
FROM ubuntu:latest
RUN useradd -u 5002 user1
# Stage 2: final image, just the binary plus /etc/passwd, running as non-root.
FROM scratch
COPY --from=0 /opt/src/github.com/project1/myprog/myprog .
COPY --from=1 /etc/passwd /etc/passwd
USER user1
ENTRYPOINT ["./myprog"]
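To build and run it (the tag name is arbitrary):
docker build -t myprog .
docker run --rm myprog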
I honestly don't understand this sentiment (other than shamelessly plugging your posts, which is ok).
The tradeoff in any packaging mechanism is always "pin everything to the byte" for maximum security and minimum updateability vs. "blindly update to latest without thinking". We normally develop with the second and deploy using the first, and Docker is no different.
This isn't really about "getting the latest version", it's about "getting the latest security patches for a stable version."
The presumption here is that you're running on a stable base image, Debian Stable or Ubuntu LTS or something. Updates to system packages are therefore basically focused on security fixes and severe bug fixes (data corruption in a database, say).
Even if you're pinning everything very specifically (and you probably should), you still want to at some point get these (severe-bugfix-only) updates on a regular basis for security and operational reasons.
Blind reliance on caching prevents this from happening.
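One way to get that periodic from-scratch rebuild, e.g. from a weekly cron job or CI schedule (the tag is just an example):
# --pull re-fetches the base image and --no-cache ignores every cached layer,
# so base-image updates and apt/apk upgrades in RUN steps actually happen.
docker build --pull --no-cache -t myapp:latest .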
> This isn't really about "getting the latest version", it's about "getting the latest security patches for a stable version."
Typically security patches trigger new releases with minor/patch version number bumps, which are then installed by getting the latest version of the package. That's pretty much the SOP of any Linux distribution.
I’ve noticed the official docker images don’t seem to do this. E.g. the official “java” images seem to be uploaded and then are never changed, the only way to get a newer underlying base system is to upgrade to a newer version tag release. Is this true of all the official images, I wonder?
Using tagged upstreams is a good idea as it puts you in control of forcing an upgrade.
Best combo is to pin to a specific tag, that you periodically update to the latest stable release, and also allow overriding via a build arg. Anyone who wants the bleeding edge, say for a CI server, can run a build with “latest” as the tag arg.
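A minimal sketch of that pattern (image and tag are just examples; ARG before FROM works in Docker 17.05+):
# Pin a default tag, but let CI override it at build time.
ARG BASE_TAG=18.04
FROM ubuntu:${BASE_TAG}
CMD ["bash"]
Then a CI job that wants the bleeding edge can do:
docker build --build-arg BASE_TAG=latest -t myapp:ci .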
The Python ones seem to be rebuilt much more frequently (last update was 10 days ago). This is perhaps because it depends on pip which has frequent releases.
We check once a day to see if the upstream repo has been updated and build our base images. I have used versions of this with clients. https://github.com/boxboat/auto-dockerfile
Node 6.1.0 was released on May 6 2016, it looks to me like the image was never changed after that? And if I run `ls -lah /var/lib/dpkg/info/*.list` inside the image, I get a modification time of May 3, 2016 on all the files... I tried the "node:10.0" image as well and I see similar behavior.
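For anyone who wants to repeat that check, roughly:
docker run --rm node:6.1.0 bash -c 'ls -lah /var/lib/dpkg/info/*.list | head'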
The manifest file only lists the newest version, node 8.16. In other words, if I had an image based off node 8.15, it isn't ever going to be updated.
So it's not just a matter of rebuilding regularly: if you aren't updating your Dockerfiles to use newer language versions, you also aren't going to get system updates.
Edit: I think I do see your point, which is that if you are completely up to date on language versions, clearing the build cache every once in a while may still pick up a system update if an upstream image changed in between releases of new language tags.
Yes, and it is mostly up to the maintainer of the image how to handle tags.
Typically minor patch releases are not kept around once the new patch is out.
May be worth filing an issue if this is problematic?
> Solution: you need to tag and push the build-stage images too
With BuildKit, the cache for all the intermediate stages is tracked. You can push the whole cache with buildx, or inline the intermediate cache metadata with buildx or v19.03. `--cache-from` will pull in matched layers automatically on build. You can also export the cache to a shared volume if that suits you better than a registry.
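For instance, the inline-cache variant on 19.03 looks roughly like this (the image name is made up; the inline cache is the simpler but less complete option, while buildx's --cache-to/--cache-from exporters can cover intermediate stages too):
# Build with BuildKit and embed cache metadata in the pushed image.
DOCKER_BUILDKIT=1 docker build \
  --build-arg BUILDKIT_INLINE_CACHE=1 \
  -t myregistry/myapp:latest .
docker push myregistry/myapp:latest
# On a fresh machine or CI runner, reuse it:
DOCKER_BUILDKIT=1 docker build \
  --cache-from myregistry/myapp:latest \
  -t myregistry/myapp:latest .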
I find it odd that docker pipelines don't cache all the layers by default. That is to say, that they don't push all cached layers to a central repository. I'm not even sure if it's possible.
Is there a reason why this would be bad? Clearly you'd have to clean out old cache layers regularly, but I'm more concerned with some layers taking a very long time to build - the caching seems required.
Though I suppose if you set up your CI system to store the caches locally then you get caching, and it's more efficient as you're not downloading layers. So maybe that's just the "right" way to do it, regardless. /shrug
Sometimes multi-stage builds are used to get build secrets into the build image. The final image just copies the resulting binaries over, and so you don't want to push the build image since you'd be leaking secrets. (I talk about alternatives to that here, since caching the build image is something you do want: https://pythonspeed.com/articles/docker-build-secrets/).
Eventually this will be unnecessary given (currently experimental) secret-sharing features Docker is adding. But for now pushing everything by default would be a security risk for some.
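For reference, the experimental secret mount looks roughly like this (the secret id and file name are made up):
# syntax=docker/dockerfile:experimental
FROM alpine:3.9
# The secret is mounted at /run/secrets/<id> only for this RUN step and
# never ends up in a layer; the cat is just to show where it lives.
RUN --mount=type=secret,id=npm_token cat /run/secrets/npm_token
Built with something like:
DOCKER_BUILDKIT=1 docker build --secret id=npm_token,src=./npm_token.txt .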
Just to add onto your second point for others who might not be aware, the experimental secret and SSH agent forwarding features have greatly simplified a lot of Dockerfiles I work on. SSH forwarding in particular has been really helpful for dealing with private dependencies.
There's a good summary here: https://medium.com/@tonistiigi/build-secrets-and-ssh-forward.... The tl;dr is you can now write lines like "RUN --mount=type=ssh git clone ssh://git@domain/project.git" (or any command that uses SSH) in a Dockerfile to use your host machine's SSH agent during "docker build". You do currently need to specify experimental syntax in the Dockerfile, and set the DOCKER_BUILDKIT environment variable to 1, and pass "--ssh default" to "docker build", but it's a great workflow improvement IMO.
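A minimal sketch of the SSH forwarding pattern, with a placeholder repo URL:
# syntax=docker/dockerfile:experimental
FROM alpine:3.9
RUN apk add --no-cache git openssh-client
# Trust the host key so the clone doesn't prompt.
RUN mkdir -p -m 0700 ~/.ssh && ssh-keyscan github.com >> ~/.ssh/known_hosts
# The host's SSH agent is only available during this RUN step.
RUN --mount=type=ssh git clone git@github.com:yourorg/private-repo.git /src
And built with:
DOCKER_BUILDKIT=1 docker build --ssh default .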
The flipside of 1. though is that cached images can't be hijacked and have vulnerabilities injected, or, from a non-security standpoint, have breaking changes introduced which mean the image no longer builds and/or runs how you're expecting.