Hi everyone, I'm one of the co-founders of Dagger (and before that, founder of Docker). This is a big day for us: it's our first time sharing something new since, you know... Docker.
If you have any questions, I'll be happy to answer them here!
The carbon footprint of the cloud has exceeded the footprint of air travel. A move away from monolithic statically compiled binaries to constellations of microservices (usually bloated docker containers) is a significant part of the problem.
Docker's explosive growth is partly due to the convenience of its abstraction: it wraps the entire Linux userspace, putting even OS-wide package managers and language-specific package managers inside another box. This usually breaks any caching / code sharing that the now-containerized packages had, resulting in the bloat. The Docker image is portable, yes, but the disk and RAM efficiency of the systems people are building is awful. It has become the norm for every little microservice to add a few GB of bloat to the overall software system. A dev writes "RUN pip install pytorch" and you have CICD servers pulling down 2GB of pytorch to build the container, every time the software is built, probably forever. Meanwhile species are going extinct and a lot of people are starting to wonder if it's ethical to work in technology at all.
What can your team do to reverse this tragedy of the commons? Can you come up with some equally ergonomic tool that can migrate the container ecosystem on to something that has a solid foundation with good caching?
> What can your team do to reverse this tragedy of the commons?
It's tragedies all the way down :-)
Making a successful tool in a competitive space is hard enough. Asking a creator to somehow factor in environmental impact isn't going to work. A creator that places additional constraints on themselves will more likely lose out to a competitor that doesn't.
This is the kind of thing a carbon tax is perfect for. Cost optimization is infused into how businesses and individuals operate. Tax the things you care about and watch all that machinery get to work!
Right now, people use Docker (and dynamic languages, bloated JS libraries, etc) because burning electricity is the cheapest route. (Sort of... the market's obviously not perfect.) Make electricity expensive enough and we'll get more efficient systems.
And of course taxes aren't the only solution. For example, Google Flights now shows the carbon cost of each flight option, which I think might actually move the needle. But that category of solution might only work in an industry with similar characteristics, e.g. infrequent but large purchases, mature industry that can define measurement standards.
> Asking a creator to somehow factor in environmental impact isn't going to work.
That attitude doesn't leave a great impression on me. I take your points about how difficult it is, but I think we can all do better than throw our hands up in the air. For instance, you could talk about how containerization of CI/CD makes it easy to move your pipeline to where its impact is lowest. Or that you can control your own impact rather than leave it up to the whim of someone like CircleCI.
There's no silver bullet with environmental impact, which is why we all have to collectively apply whatever wins we can, wherever we can.
I haven't fully formed this thought yet, but your request might result in a net loss of "good" people. The people who actually take your request to heart will be more likely to fail to launch their business because of those environmental constraints. If you take 10 people who run a business the regular way and compare them to 10 who run it in an environmentally conscious way (with environmentalism not being the business model in the first place), I'd expect equal or fewer of the environmentally conscious businesses to succeed, which ends up hurting your cause more than helping it.
I think the ideal path is to let a startup thrive any way it can, and once it's no longer struggling just to survive, begin making environmental changes.
> For instance, you could talk about how containerization of CI/CD makes it easy to move your pipeline to where its impact is lowest. Or that you can control your own impact rather than leave it up to the whim of someone like CircleCI.
Oh yeah -- I'm not at all saying Docker is all bad.
* Like you mentioned, increasing interoperability allows the market to be more efficient.
* Docker continued the path that VMs started towards making strong isolation even more efficient and accessible.
* Layers and caching are obviously good for resource consumption.
It's just that the original comment seemed to try shaming the Docker creators about what they've built. All they did was try and make something better. And if they didn't, someone else would have.
It was meant as genuine constructive criticism and a plea to do better. They have more traction than most people to morph Docker into something that comes close to its predecessors in efficiency: somehow replace the fragile layer-caching strategy based on diffs of the entire filesystem with something that understands the OS and language-specific packaging systems being used inside the containers, and can therefore at ~least cache the package downloads.

We desperately need a better, more efficient package manager, and Docker has been a huge setback; it has become normal for CICD to rebuild your image every time you touch the code, and pull down practically an entire linux distribution every time. I know you can do better with docker caching if you put enough consistent and dedicated engineering effort into how you structure things, set up local apt and pypi mirrors, etc... but that's the default behavior, and none of the (admittedly very small number of) organizations I've worked at have had the organizational capacity to really get past it.

I don't know if we need what they're building, but we absolutely need a much more efficient new version of Docker with an easy migration path. Indeed they may be the ~only people with enough traction to resolve this, since in the current climate, as soon as someone introduces a new package manager with more efficient dependency resolution or caching (e.g. nix, poetry, cargo...), people just stick it in their Docker container and break the caching anyway.
One of the big points Bill Gates makes in his recent book on climate change is that once the genie is out of the bottle in terms of lifestyle, there's no going back. It's naive to expect people to willingly reduce their energy use enough to make an impact, and immoral when you consider all the people in poor villages that don't even have electricity yet.
The solution is, in a nutshell, to electrify everything, and push to make electricity clean and plentiful. Anything else is doomed to fail because we can't beat climate change by reducing our carbon footprint; we have to eliminate it entirely.
From that perspective, cloud energy usage is not a problem, since it's already electrified by nature. Now we just need to stop emitting carbon in order to make electricity (among other things).
“A move away from monolithic statically compiled binaries to constellations of microservices (usually bloated docker containers) is a significant part of the problem.”
It won't be true in every shop, but I do this professionally and it's been my firsthand experience. A native statically compiled binary containing just the functions that actually get called will usually be... 10-100 MB. Ungroomed Docker images are ~10GB-20GB, the same as you'd have on the root partition if you sat down and brought up a linux workstation or server node manually, and this is not a coincidence. Sure, docker avoids duplicating the linux kernel, making it more efficient than an old-school VM, but these days all the ~other software bloat dominates the kernel in size. Most companies do not have a 100-person team of engineers dedicated to optimizing their image build and management workflow, and pruning what goes into their containers.
Hi! I've browsed the docs quickly, and I have a few questions.
It seems to assume that all CI/CD workflows follow a single-container-at-a-time pattern. What about testing where I need to spin up an associated database container for my e2e tests? Is it possible, and just omitted from the documentation?
I'm not familiar with cue, but can I import/define a common action that is used across multiple jobs? For example, on GitHub I have to duplicate the dependency installation/caching/build across various jobs. (Yes, I'm aware that you can now makeshift a composite action on GitHub for reuse.)
Can you do conditional execution of actions based on passed in input value/env variable?
> It seems to assume that all CI/CD workflows follow a single-container-at-a-time pattern.
Dagger runs your workflows as a DAG, where each node is an action running in its own container. The dependency graph is detected automatically, and all containers that can be parallelized (based on their dependencies) will be parallelized. If you specify 10 actions to run, and they don't depend on each other, they will all run in parallel.
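Conceptually, that scheduling can be sketched in a few lines. This is a toy illustration only, not Dagger's actual implementation; the `schedule_waves` helper and the action names are made up:

```python
def schedule_waves(deps):
    """Group actions into parallel "waves": every action in a wave has all of
    its dependencies satisfied by earlier waves, so a wave can run concurrently."""
    remaining = {name: set(d) for name, d in deps.items()}
    done = set()
    waves = []
    while remaining:
        # every action whose dependencies have all finished is ready to run
        ready = sorted(n for n, d in remaining.items() if d <= done)
        if not ready:
            raise ValueError("dependency cycle detected")
        waves.append(ready)
        done.update(ready)
        for n in ready:
            del remaining[n]
    return waves

# "lint" and "test" share no dependencies, so they land in the same wave
print(schedule_waves({
    "lint": set(),
    "test": set(),
    "build": {"lint", "test"},
    "deploy": {"build"},
}))
# → [['lint', 'test'], ['build'], ['deploy']]
```

The key point is that parallelism falls out of the dependency graph automatically; nothing has to be annotated as "parallel".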
> What about testing where I need to spin up an associated database container for my e2e tests? Is it possible, and just omitted from the documentation?
It is possible, but not yet convenient (you need to connect to an external docker engine, via a docker CLI wrapped in a container). We are working on a more pleasant API that will support long-running containers (like your test DB) and more advanced synchronization primitives (wait for an action; terminate; etc.)
> For example, on GitHub I have to duplicate the dependency installation/caching/build across various jobs. (Yes, I'm aware that you can now makeshift a composite action on GitHub for reuse.)
Yes, code reuse across projects is where Dagger really shines, thanks to CUE + the portable nature of the buildkit API.
Note: you won't need to configure caching though, because Dagger automatically caches all actions out of the box :)
> Can you do conditional execution of actions based on passed in input value/env variable?
Yes, that is supported.
> Any public roadmap of upcoming features?
For now we rely on raw GitHub issues, with some labels for crude prioritization. But we have started using the new GitHub Projects beta (which is a layer over issues), and plan to open that up to the community as well.
Generally, we develop Dagger in the open. Even as a team, we use public Discord channels (text and voice) by default, unless there is a specific reason not to (confidential information, etc.)
Thank you for the detailed response. I appreciate you taking the time. One last question/note.
> Note: you won't need to configure caching though, because Dagger automatically caches all actions out of the box :)
Is this strictly because it's using Docker underneath, so layers can be reused? If so, unless those intermediary layers are somehow pushed/pulled by the dagger GitHub action (or the equivalent in any associated CI/CD tool), the experience on a hosted CI server is going to be slow.
Sidenote: around 2013 I worked on a hacky custom container automation workflow within Jenkins for ~100 projects, and spent considerable effort setting up policies to prune intermediary images.
Thus, on certain types of workflows, without any pruning a local development machine can be polluted with hundreds of images unless the user is specifically made aware of stale ones. Does/will dagger keep track of the images it builds? I think a command like `git gc` could make sense.
> > Note: you won't need to configure caching though, because Dagger automatically caches all actions out of the box :)
> Is this strictly because it's using Docker underneath and layers can be reused?
Not exactly: we use Buildkit under the hood, not Docker. When you run a Dagger action, it is compiled to a DAG, and run by buildkit. Each node in the DAG has content-addressed inputs. If the same node has been executed with the same inputs, buildkit will cache it. This is the same mechanism that powers caching in "docker build", but generalized to any operation.
The buildkit cache does need to be persisted between runs for this to work. It supports a variety of storage backends, including a POSIX filesystem, a docker registry, or even proprietary key-value services like the GitHub storage API. If buildkit supports it, Dagger supports it.
Don't let the "docker registry" option confuse you: buildkit cache data isn't the same as docker images, so it doesn't carry the same garbage collection and tag pruning problems.
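To make the content-addressing idea concrete, here's a toy sketch. The `run_node` helper and the in-memory dict cache are illustrative assumptions; buildkit's real cache is persistent and far more sophisticated:

```python
import hashlib
import json

cache = {}  # content hash -> cached result (buildkit persists this between runs)

def run_node(op, inputs):
    """Execute one DAG node, memoized on a hash of (operation, inputs).
    If the same operation already ran with identical inputs, the cached
    result is returned instead of re-executing the node."""
    key = hashlib.sha256(
        json.dumps([op.__name__, inputs], sort_keys=True).encode()
    ).hexdigest()
    if key not in cache:
        cache[key] = op(inputs)  # cache miss: actually run the operation
    return cache[key]
```

Change any input and the key changes, so the node reruns; leave the inputs alone and it never runs twice, which is why persisting the cache between CI runs matters so much in practice.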
> Don't let the "docker registry" option confuse you: buildkit cache data isn't the same as docker images, so it doesn't carry the same garbage collection and tag pruning problems.
IIRC doesn't buildkit store its cache data as fake layer blobs + manifest?
I don't see how it can avoid the garbage collection and tag pruning problems since those are limitations of the registry implementation itself.
You still need to manage the size of your cache, since in theory it can grow infinitely. But it’s a different problem than managing regular Docker images, because there are no named references to worry about: just blobs that may or may not be reused in the future. The penalty for removing the “wrong” blob is a possible cache miss, not a broken image.
Dagger currently doesn’t help you remove blobs from your cache, but if/when it does, it will work the same way regardless of where the blobs are stored (except for the blob storage driver).
> In computing, memoization or memoisation is an optimization technique used primarily to speed up computer programs by storing the results of expensive function calls and returning the cached result when the same inputs occur again.
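In Python terms, that definition boils down to a standard-library one-liner (a generic example, not Dagger code):

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def fib(n):
    # each distinct n is computed once; repeat calls return the cached result
    return n if n < 2 else fib(n - 1) + fib(n - 2)

print(fib(30))  # → 832040; without memoization this takes millions of recursive calls
```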
Re: SBOM: Software Bill of Materials,
OSV (OSS-Fuzz), CycloneDX, LinkedData, ld-proofs, sigstore, and software supply chain security:
"Podman can transfer container images without a registry"
https://news.ycombinator.com/item?id=30681387
Can Dagger cache the (layer/task-merged) SBOM for all of the {CodeMeta, SEON OWL} schema.org/Thing s?
Are you guys aware of Nix, both the language and the build system? Nix at its core is a build system, but the community has pushed the boundary of what a "build" means so hard that Nix can now also be used as a single definition language for everything in a CI/CD pipeline (with a canonical collection of "building blocks" in nixpkgs): from (reproducibly) building artifacts, to running automated testing/integration tasks, to automatically delivering the artifacts to whatever the "infrastructure" is. After all, in a very general sense the whole CI/CD pipeline could be seen as just another build artifact, which I think resonates a lot with your idea.
How do you think your project and Nix would overlap and/or (or both) complement each other?
Thanks for answering Qs. Does this compete directly with Tekton ( https://tekton.dev/ ), or do you imagine a way the two could interoperate? Why choose Dagger over Tekton to power pipelines?
You can (and people do) run Dagger on top of Tekton, in the same way that you might run a Makefile or shell script on top of Tekton. The benefit is that you are less tied to a particular runtime environment. The same Dagger pipeline can run on Tekton, Jenkins, or your laptop. This makes local debugging and testing in particular much easier.
Yes :) You can write one last Jenkinsfile that runs Dagger, then do everything inside Dagger. Then you can run the exact same Dagger configuration on another CI system, or on your laptop. All you need is a Docker-compatible runtime (we run everything on buildkit under the hood).
> You can write one last Jenkinsfile that runs Dagger
I'm very confused by this sentiment. This approach loses the best of existing CI tooling does it not?
Jenkins users lose visibility into what's actually being executed, what stage they're in, and how long it took. It might provide convenience, and the (great) benefit of running locally the same as in your CI environment, but it seems to me this would make it difficult for devs to easily understand where/why their build failed, since it all becomes one megastep.
I assume if the jenkins-step failed, you'd click a link to the dagger UI to see which dagger-step failed. Alternatively, never open jenkins at all and instead keep a tab open with the dagger UI.
I guess I'm a little confused about where the line is here...
Where is the dagger UI, and how does it relate to your CI? I don't see it in the docs or CLI help. It sounds like the Dagger UI in this context (above) provides little value beyond logging if it's not doing workflow execution.
I ask not to talk down on the product, but because I'm actually quite interested. Local execution, plus containerized execution sounds awesome. Just trying to understand the vision.
It can be a “megastep” in Jenkins, but it doesn’t have to be. It could be one individual step that happens to run on Dagger. Both work equally well.
In the “megastep” approach, it boils down to which tool can provide the most useful information. Jenkins is more mature but Dagger has more information about the DAG. So in some cases developers might actually prefer using Jenkins as a “dumb” runner infrastructure. It depends on the situation.
You should be able to drastically simplify your Jenkinsfile(s) and have them just invoke Dagger. The issue you may run into is when you have different Jenkins nodes for different types of work. You could always invoke Dagger on each of these, depending on your setup and needs. Where there's a will, there's a way, with Jenkins :]
There will be an optional cloud service (not yet available). Its features will be based on requests from the community. Some problems just can't be solved with a command-line client. For example: visualization of your pipelines; a centralized audit log of all jobs run across all machines; centrally managed access control and policies; etc.
We will not rely on unusual licences to restrict competitors from running Dagger as a service. We don't need to, since `dagger` is a client tool. We do encourage cloud providers to run buildkit as a service, though :)
Generally, we take inspiration from Red Hat for their balancing of open-source community and business. They are very open with their code, and tightly control how their trademark is used. You can clone their IP and use it to compete with them - but you have to build your own brand, and you can't confuse and fragment the Dagger developer community itself. We think that is a fair model, and we are using it as a general guideline.
> We will not rely on unusual licences to restrict competitors from running Dagger as a service.
Your "Trademark Guidelines" appear to contradict you:
> Third-party products may not use the Marks to suggest compatibility or interoperability with our platform. For example, the claims “xxx is compatible with Dagger”, “xxx can run your Dagger configurations”, are not allowed.
> but you have to build your own brand, and you can't confuse and fragment the Dagger developer community itself
If I do an incognito Google search for "dagger", the first result is the Wikipedia page for the knife, and the second result is for Dagger, the dependency injection tool. By naming this "Dagger" you're confusing not just your own developer community but the pre-existing one as well.
> > We will not rely on unusual licences to restrict competitors from running Dagger as a service.
> Your "Trademark Guidelines" appear to contradict you:
They do not. Software licenses and trademark guidelines are two different things. Some commercial open-source vendors have changed their licenses to restrict use of the software in various ways - typically to limit competition from large cloud providers. We don't do that, and have no intention to. Our license is OSI-approved and we intend to keep it that way. That is what I am referring to.
> but you have to build your own brand, and you can't confuse and fragment the Dagger developer community itself
This is the intent behind the language in the trademark guideline which you quoted: you can redistribute and modify our code. But if you distribute a modified copy, call it something else.
> > Third-party products may not use the Marks to suggest compatibility or interoperability with our platform. For example, the claims “xxx is compatible with Dagger”, “xxx can run your Dagger configurations”, are not allowed.
> By naming this "Dagger" you're confusing not just your own developer community but the pre-existing one as well.
I disagree. Dagger has existed in private beta for over a year, thousands of engineers have been given access, and I can't remember a single instance of any of them being confused by the name. We have registered the trademark, and nobody has raised an issue.
> > > We will not rely on unusual licences to restrict competitors from running Dagger as a service.
> > Your "Trademark Guidelines" appear to contradict you:
> They do not. Software licenses and trademark guidelines are two different things. Some commercial open-source vendors have changed their licenses to restrict use of the software in various ways - typically to limit competition from large cloud providers. We don't do that, and have no intention to. Our license is OSI-approved and we intend to keep it that way. That is what I am referring to.
I'm glad the product is open source, but that provision isn't in the context of source code, it is a top-level item listed on that page. That's why I interpreted "unusual licences" to generally mean a sort of "legal acrobatics".
When you're threatening people with legal action you need to be clear, and right now the text on that page is not, according to what you're saying here. I doubt many people are going to be searching Hacker News comments for the true intent behind these guidelines.
> I disagree. Dagger has existed in private beta for over a year, thousands of engineers have been given access, and I can't remember a single instance of any of them being confused by the name. We have registered the trademark, and nobody has raised an issue.
I don't think that really addresses the point. Dagger (as started at Square) is nearly ten years old, and Google's 2.0 fork is from 2016. It's used by thousands of published Maven artifacts, countless applications, and tens of thousands of developers (at least). This is the first time I've heard of your project, but that's bound to happen in tech. Whether you registered the trademark without complaint doesn't much matter either; the issue is being raised here, now that you've publicly launched.
> When you're threatening people with legal action you need to be clear, and right now the text on that page is not, according to what you're saying here.
That's good feedback, thank you. We can try and make it clearer as long as it remains legally correct and enforceable. Do you have specific feedback on which parts you found unclear, and why?
"It is perfectly acceptable and within the bounds of the law to use another's trademark in advertising, provided certain standards are met. The advertisement must be truthful and the use of another's trademark must not give a false impression of connection, approval or sponsorship by the owner of the other mark."
Congratulations! I know exactly how this tool will benefit us DevOps engineers, just as I knew when you demoed Docker at PyCon 2013. Wishing you and your team the best!
How mature is this? We have a 20 person team and we're prototyping different pipelines for our next CI/CD pipeline (currently Heroku). Is this ready for production workloads?
We consider Dagger to be beta-quality. It definitely has bugs, and APIs are still moving, though we make a big effort to limit breaking changes, and help developers migrate when breaking changes do occur. We aim to be able to guarantee complete peace of mind and continuity for production pipelines, but can't make that guarantee yet.
That said, one nice aspect of Dagger is that you don't need to rip and replace your entire system: you can use it in a localized way to solve a specific problem. Then expand over time. It's similar to writing a quick shell script, except it's easier to reuse, refactor and compose over time.
Dagger is already a popular dependency injection framework, so why choose a name that will be confusing to people who will likely use both of these frameworks in their projects?
And the reason they're both called Dagger is as a play on using a DAG, directed acyclic graph, to model dependencies. Stealing that wittiness and pretending it's their own is pathetic.