You're not doing things the "Docker" way. If you've got data you want to pull ou...

taeric · on Dec 1, 2016

This doesn't really change anything. I want to put data in and then take derived data out. All while maintaining access protections to my user on the host system.

cheez · on Dec 1, 2016

That's what mounts let you do. I'm not sure why you don't agree.

taeric · on Dec 1, 2016

That is what they do nominally. But for existing data, it falls flat. Hard.

Again, an example is easiest. (It will also shed light on whether I am actually doing something wrong, which is entirely plausible.)

So, you have a file share with your data, scoped to user "cheez". You can ssh to machine "remote" and see this data. You can copy this data over to your machine, call it "local" and it will still be scoped to your user, "cheez". This is great, as you don't have to worry about someone else accidentally getting access to it.

However, you decide to use "container A" to run some analysis on this data. The folks that set up "container A" did so using the username "container". You mount your data as a volume at "/my/data". You check, and the container can now see it, but user "container" does not have permission to do anything with it.
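The mismatch described above comes down to the kernel comparing numeric IDs, not names. Here is a minimal sketch of the classic owner/group/other read check, assuming no ACLs, capabilities, or user-namespace remapping are in play. The numeric IDs (1000 for "cheez", 999 for "container") are hypothetical, chosen only for illustration.

```python
import stat

def can_read(file_uid, file_gid, mode, proc_uid, proc_gids):
    """Approximate the classic Unix read check (ignores ACLs and capabilities)."""
    if proc_uid == 0:
        return True  # root bypasses the permission bits
    if proc_uid == file_uid:
        return bool(mode & stat.S_IRUSR)  # owner class
    if file_gid in proc_gids:
        return bool(mode & stat.S_IRGRP)  # group class
    return bool(mode & stat.S_IROTH)      # everyone else

# Hypothetical numeric IDs, not taken from the thread.
CHEEZ_UID, CHEEZ_GID = 1000, 1000          # "cheez" on the host
CONTAINER_UID, CONTAINER_GID = 999, 999    # "container" inside the image
PRIVATE = 0o600                            # owner read/write only

# The owner on the host can read the file.
print(can_read(CHEEZ_UID, CHEEZ_GID, PRIVATE, CHEEZ_UID, {CHEEZ_GID}))
# The image's user falls into the "other" class and is refused.
print(can_read(CHEEZ_UID, CHEEZ_GID, PRIVATE, CONTAINER_UID, {CONTAINER_GID}))
# Running as root works, at the cost of bypassing the check entirely.
print(can_read(CHEEZ_UID, CHEEZ_GID, PRIVATE, 0, {0}))
# Opening the file to everyone also works, which is the "public permissions" option.
print(can_read(CHEEZ_UID, CHEEZ_GID, 0o644, CONTAINER_UID, {CONTAINER_GID}))
```

Under these assumptions, only the three escape hatches listed in the comment (matching uid, root, or world-readable data) make the check pass.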

You have plenty of options at this point. You can give public permissions to see the data in the directory. You can launch ssh from in the container and copy the data in. (Note that this way the data in the container will belong to "container", which raises similar concerns on egress.)

You can also run the container as "root" which will clearly have permission. However, recognize the concerns with this approach, as this also gives you permission to see any other file in that directory that "cheez" wouldn't normally have permission to see.

You can also recreate the container such that "cheez" is the user that is created in the container. This sounds great, but redoing the entire setup of the image has a few problems. Not to mention the security implications this has, since you could also recreate the image with the user of your boss and circumvent some other security measures of your machine.

Now, if your container doesn't set anything up in userspace (defined as "not root") until launch of the container, much of this concern goes away. Not all of it, I believe, but most. However, anything that you want to run "in the container" using your identity from "out of the container" seems to fall into this trap. It is subtle, but gets in the way of a lot of use cases.

cheez · on Dec 1, 2016

Yep, this is a common problem. The solution is not perfect, but works correctly with respect to permissions:

https://medium.com/@ramangupta/why-docker-data-containers-ar...

taeric · on Dec 1, 2016

I'm not seeing anything there that addresses this.

To be clear, as a service deployment mechanism, this is fine. As an end user tool, this blows. Unless every container builds in support for LDAP or Kerberos, each container is effectively a new computer on the network without auth set up in a meaningful way.

cheez · on Dec 2, 2016

What it's saying is that you have a container you use only to store the data, then share that container's volume with the target container. Finally, because it's on your hard drive, you can access it.

Basically, instead of host -> container, you do container -> container.

And yeah, the container is supposed to be effectively a new computer, but you can build images to create the auth environments you need and base everything off those.

taeric · on Dec 2, 2016

But that doesn't work unless you do everything with public permissions. Seriously, try it. Make a container that you share a volume with. Do something in that container on some data and see who owns it according to your computer.
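One way to run that check from the host side is to ask the filesystem for the numeric owner of a file and compare it with your own uid. This is only a sketch: the helper name `ownership` is made up here, and the demo uses a temporary file created by the current process rather than a real shared volume, so point it at a file the container actually wrote to reproduce the experiment.

```python
import os
import tempfile

def ownership(path):
    """Report the numeric owner of path and whether it matches the current user."""
    st = os.stat(path)
    return {
        "uid": st.st_uid,
        "gid": st.st_gid,
        "mine": st.st_uid == os.geteuid(),
    }

# Demo on a throwaway file; substitute a path inside the shared volume.
with tempfile.NamedTemporaryFile() as tmp:
    print(ownership(tmp.name))
```

A file written inside the container by a different uid would be expected to show `"mine": False` when inspected this way on the host.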

The use case I have above is a real one. Somehow try to share your .ssh folder with a utility container sometime.

User namespaces help somewhat, but they fall flat for the case of "still being you" in a container.

It can be comical even for the intended two-container setup. Unless both use the same uid:gid to write files, one container will not be able to read the other container's data.

cheez · on Dec 2, 2016

So here is how I would think about your problem: am I doing this the way the tool wants me to do this? Does the tool even support this use case? If not, how should it be done? Is it impossible, or do I just not like the aesthetics of the solution?

If it's just the aesthetics of the solution, then you have a way forward. But if it is literally impossible to get data on and off a container, then you have a problem.