
An example is easiest. I have my machine set up to provide who I am to machines I ssh to. Now, launch a container from which you want to pull data from a machine you have ssh access to.

First, you'll find that the user used to set up the container was not you. So you can't even just map in your .ssh dir.

So you'll try modifying the image to work with a specified user at startup. Only, again, it wasn't set up that way. You will start modifying the entire image to make it work, but will hit tons of assumptions about the user name. (You may get lucky here. I didn't.)

So then you think to just run as root so that the user in the container will have permission to your .ssh files. At first you forget to specify the user name on ssh commands, since the command now thinks you are root. Easy enough to fix, at least.

Only, you forgot you have proxy commands in your config, and other scripts that you now have to edit because they rely on your user name. So you fix that, too.

Now you can finally do what would have been trivial for an app installed on your machine.
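To make that concrete, the naive first attempt looks roughly like this; the image name, its default user, and the remote host are made up for illustration:

    # Naive attempt: map the host .ssh directory straight into the container.
    # "some-image" and its default user "appuser" are placeholders.
    docker run --rm \
        -v "$HOME/.ssh:/home/appuser/.ssh:ro" \
        some-image ssh remote-host hostname
    # If "appuser" has a different uid than you on the host, the private keys
    # are unreadable (or ssh rejects them over ownership/permissions), and the
    # ssh config still assumes your host user name.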



This is well stated.

I have been making end-user apps for myself and for folks at work that require such identity: in one case ~/.ssh, and in another ~/.gnupg.

My solution isn't particularly novel or clever, but it works well.

The docker image of the command-line app is the same for all users, and so has no user identity built in.

The hack is to drive invocation of the docker image with a shell script that makes a temporary directory, copies in the necessary identity files from ~, and does a docker run that maps those identity files into the container.

After the docker image exits, the bash invocation script cleans up.

It's a hack, but it works surprisingly well. In my tests, it adds about 100ms of invocation latency for a python program. That is, running the docker image containing a python program that copies some files in as described is about 100ms slower than just running the same python program directly.
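Roughly, the wrapper looks something like this; the image name and the path the app expects are placeholders, not the real script:

    #!/usr/bin/env bash
    set -euo pipefail

    # Stage the identity files in a throwaway directory.
    tmp=$(mktemp -d)
    trap 'rm -rf "$tmp"' EXIT
    cp -r "$HOME/.ssh" "$tmp/ssh"
    # (Ownership/permissions on the staged copies can be adjusted here if
    # the container runs as a different uid.)

    # Map the staged files to wherever the image expects them.
    docker run --rm \
        -v "$tmp/ssh:/home/appuser/.ssh:ro" \
        my-cli-app "$@"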

It would be nice to have a more elegant solution to this, but it's not too bad.


disclaimer: I am not a security expert. Reader beware!

If you're using ssh-agent, maybe you could bind-mount your host's $(dirname $SSH_AUTH_SOCK) into the container, and then set the SSH_AUTH_SOCK environment variable to point at it when you run the Docker container. That way you're not even sharing the private key with the container.

I imagine you could do the same with gpg-agent, too.
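Something along these lines, assuming the agent is already running on the host (socket path handling and the image name are illustrative):

    # Forward the host's agent socket instead of the key files themselves.
    docker run --rm \
        -v "$(dirname "$SSH_AUTH_SOCK"):/ssh-agent" \
        -e SSH_AUTH_SOCK="/ssh-agent/$(basename "$SSH_AUTH_SOCK")" \
        some-image ssh user@remote-host hostname
    # The container user still needs permission to read/write the socket,
    # which is owned by your host user.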


I didn't think about that, thanks!

I didn't mention it, but for one of the apps, I also needed ~/.gitconfig, which I don't think has an agent. :(


I think this is the common solution. It definitely feels like a hack, though.

I thought of the ssh-agent trick, but never worked out how to actually do it and moved on to other problems.

I would love to hear thoughts on better ways to fix this. Apologies for not responding earlier, as I really want this conversation to land somewhere. I just feel a bit ashamed that I don't know how to continue it.

I do want to throw out there that this should not keep folks from trying docker. Please try it. Even better, suggest ways to advance this use case.


You're not doing things the "Docker" way. If you've got data you want to pull out, use a mounted volume and get it out that way.


This doesn't really change anything. I want to put data in and then take derived data out. All while maintaining access protections to my user on the host system.


That's what mounts let you do. I'm not sure why you don't agree.


That is what they do nominally. But for existing data, it falls flat. Hard.

Again, an example is easiest. (Will also shed light on if I am actually doing something wrong. Entirely in the realm of plausible.)

So, you have a file share with your data, scoped to user "cheez". You can ssh to machine "remote" and see this data. You can copy this data over to your machine, call it "local" and it will still be scoped to your user, "cheez". This is great, as you don't have to worry about someone else accidentally getting access to it.

However, you decide to use "container A" to run some analysis on this data. The folks that set up "container A" did so using the username "container". You mount your data on a volume to "/my/data". You check, and the container can now see it, but user "container" does not have permission to do anything with it.
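To make it concrete (the image name, file name, and uids are made up):

    # Host: the data is owned by cheez (uid 1000 here, for illustration),
    # mode 600.
    ls -ln "$HOME/data"    # shows uid 1000 as the owner

    # Container A's default user has some other uid, so the mounted files
    # are visible but not readable:
    docker run --rm -v "$HOME/data:/my/data" container-a \
        cat /my/data/report.csv    # expect "Permission denied"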

You have plenty of options at this point. You can give public read permissions on the data in the directory. You can launch ssh from inside the container and copy the data in. (Note that this way the data in the container belongs to "container", which raises similar concerns on egress.)

You can also run the container as "root", which will clearly have permission. However, recognize the concerns with this approach, as it also gives you permission to see any other file in that directory that "cheez" wouldn't normally have permission to see.

You can also recreate the image such that "cheez" is the user created in the container. This sounds great, but redoing the entire setup of the image has a few problems. Not to mention the security implications, since you could also recreate the image with your boss's user and circumvent some other security measures on your machine.

Now, if your container doesn't set anything up in userspace (defined as "not root") until launch of the container, much of this concern goes away. Not all of it, I believe, but most. However, anything that you want to run "in the container" using your identity from "out of the container" seems to fall into this trap. It is subtle, but gets in the way of a lot of use cases.
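For images that genuinely defer all user setup to launch time, running the container as your own uid:gid is roughly what this looks like (the image name and command are hypothetical):

    # Run the container as yourself, so files it reads and writes on the
    # mount keep your ownership. This only works cleanly when nothing in
    # the image assumes a particular user name or home directory.
    docker run --rm \
        --user "$(id -u):$(id -g)" \
        -v "$HOME/data:/my/data" \
        some-analysis-image analyze /my/data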


Yep, this is a common problem. The solution is not perfect, but works correctly with respect to permissions:

https://medium.com/@ramangupta/why-docker-data-containers-ar...


I'm not seeing anything there that addresses this.

To be clear, as a service deployment mechanism, this is fine. As an end-user tool, this blows. Unless every container builds in support for LDAP or Kerberos, each container is effectively a new computer on the network without auth set up in a meaningful way.


What it's saying is that you have a container you use only to store the data, then share that container's volume with the target container. Finally, because it's on your hard drive, you can access it.

Basically, instead of host -> container, you do container -> container.

And yeah, the container is supposed to be effectively a new computer, but you can build images to create the auth environments you need and base everything off those.
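Roughly, that data-container pattern looks like this (image names are placeholders):

    # One container exists only to own the volume...
    docker create -v /data --name datastore busybox /bin/true

    # ...and the working container attaches to it.
    docker run --rm --volumes-from datastore some-analysis-image \
        analyze /data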


But that doesn't work unless you do everything with public permissions. Seriously, try it. Make a container that you share a volume with. Do something in that container on some data and see who owns it according to your computer.

The use case I have above is a real one. Try sharing your .ssh folder with a utility container sometime.

User namespaces help somewhat, but fall flat for the case of "still being you" in a container.

It can be comical even for the intended two-container setup. Unless both use the same uid:gid to write files, one container will not be able to read the other container's data.
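An easy way to see the ownership problem from the other side (the output path is arbitrary):

    # A container writing to a shared volume as root (the default) leaves
    # root-owned files behind on the host:
    mkdir -p out
    docker run --rm -v "$PWD/out:/out" busybox sh -c 'echo hi > /out/result.txt'
    ls -ln out/result.txt    # owned by uid 0 (root), not by your user, so
                             # you can't modify it in place without sudo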


So here is how I would think about your problem: am I doing this the way the tool wants me to do this? Does the tool even support this use case? If not, how should it be done? Is it impossible, or do I just not like the aesthetics of the solution?

If it's just the aesthetics of the solution, then you have a way forward. But if it is literally impossible to get data on and off a container, then you have a problem.


I see.

This is an interesting problem, and I would think one that a lot of people are experiencing.

Seems like an area of opportunity to make docker images a lot more flexible.


Apologies for the slow response on this post. I would love to think of ways to fix this. I think most uses of docker today involve deploying applications that you do want completely sandboxed. And for those, this works rather well. Hopefully this doesn't keep you from trying it. (Indeed, you may have a nice solution to this that escaped me.)


Unless I grossly misread your problem--which I may have!--it seems like you're trying to handle authentication and accounts on the docker instance.

What's wrong with using ye olde Kerberos and LDAP to accomplish this exactly?


That is certainly a path I considered. However, having to set up either of those is not exactly trivial, and not something I care to do for a user application.



