If you have the privilege to bind-mount on top of a particular process directory in /proc, you should also be able to mount something on top of /proc itself that presents a false /proc/mounts and conceals your trickery. I experimented a bit: I wasn't able to bind a single file on top of /proc/mounts, and I had trouble getting procfs to participate in an overlayfs, but if you were really dedicated you could construct a whole fake /proc by hand, or use FUSE to create a /proc that redacts anything you want, including mountpoints.
If you have root you can just reboot or kexec and embed yourself at kernel level.
The trick is to hide in plain sight. Call your process “tracker-miner-fs” (a GNOME process whose name already sounds suspicious). When the system administrator sees your process eating up a huge amount of CPU time and does a web search, they’ll come to the conclusion that it’s just GNOME being GNOME.
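For illustration, a rough shell sketch of that kind of disguise ("./payload" is just a placeholder for whatever you'd actually run):

bash -c 'exec -a tracker-miner-fs ./payload' &   # argv[0] becomes "tracker-miner-fs"
ps -fp $!                                        # the CMD column now shows the fake name

Note that /proc/PID/comm (what top shows by default) is still derived from the executable's file name, so a more thorough disguise would also call prctl(PR_SET_NAME) from inside the program.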
Creating a whole fake proc is simple. I mean, the code is already in the kernel as a module. Just copy it and put a few if/elses in to control what its readdir returns (in this case just PIDs: there's proc_readdir(), which calls proc_pid_readdir(); just modify that, very simply).
The mounts file is a bit harder (its calls go into code that lives more in the core kernel fs code), but one can just reimplement it oneself to filter out the mounts you don't want to show.
If you have root, of course, you can do anything. BUT -- could an eBPF module be loaded into the kernel to just filter the syscalls and hide some nefarious process? This simplifies the 'evil VM' attack. Does eBPF have anything that could plausibly help with that?
A variant of hiding in plain sight for a backdoor if you have a webserver: create a directory ". " (dot space) somewhere in the web-application and put your backdoor in that directory.
A sysadmin would just see ".", ".." and ". " as directory entries.
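To illustrate how little stands out (paths and file names here are just placeholders):

mkdir "/var/www/html/. "             # note the trailing space
cp backdoor.php "/var/www/html/. /"
ls -la /var/www/html                 # ".", ".." and ". " look nearly identical at a glance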
Nitpicking: mount reads from /proc/self/mounts, that is, the mounts as seen from that process. This is how tricks such as chroot jails work, as they present a different mount set and filesystem tree from the base system.
My understanding is that chroot presently does make use of the different mount namespaces, though I could be wrong here.
How did chroot used to work in this regard? Were mountpoints outside the chroot jail itself accessible? Because if so, that ... would violate some key points of using a chroot jail in the first place. Though if chroot applied strictly to the filesystem and not mount points that might still offer benefits, and I seem to recall running into ... unexpected behaviours ... using chroots in the past.
The classic chroot(2) system call just changes the directory entry that's used for the root in that process. It only affects pathname resolution (including resolution of the pathnames in symbolic links).
If an unconstrained uid zero process is inside the chroot but can access or create device nodes, it can mount devices within the chroot just fine.
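A rough sketch of what that looks like from inside the chroot (the 8,1 major/minor is the conventional sda1; adjust to your disk layout):

mknod /dev/sda1 b 8 1        # recreate the block device node inside the chroot
mount /dev/sda1 /mnt         # the host's filesystem is now visible under /mnt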
When you create a new mount namespace, the kernel creates a copy of every mount reachable from the filesystem (as visible to the process creating it). Then, it changes the current process's root directory and current directory to point to the same paths in the corresponding mounts. Changing a process's root via chroot is otherwise a totally separate operation.
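You can watch this happen with unshare(1); a quick sketch:

unshare --mount --propagation private bash   # new mount namespace with copies of all current mounts (needs root, or add --map-root-user)
mount --bind /tmp /mnt                       # inside the new namespace
grep ' /mnt ' /proc/self/mounts              # visible here, but not from a shell in the original namespace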
How? I don't think the kernel cares what the directory looks like - userspace will break in funny ways, but you can boot without procfs even mounted (well, I say boot, but more like "you can start a shell and some things will still work")
When your application calls open() or read() or stat() or whatever, it passes through libc and then makes a syscall to ask the kernel "hey, give me a file handle for /proc/foo", "hey, give me the first 500 bytes from this file handle", etc. Then in the kernel, that syscall triggers a function that figures out what filesystem you're talking to and passes the request to that filesystem driver. AFAIK, the procfs driver just reads (and writes) a bunch of data structures in kernel memory and passes the result through as a filesystem. So the actual underlying data structure does always exist and is a hard requirement for the system to work, but procfs is just a userspace view into that underlying data in the kernel. Incidentally sysfs would be the same - the stuff you see in /sys exists in the kernel regardless, but the kernel doesn't really care whether those data structures are exposed to userspace as a virtual filesystem.
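You can watch that round trip with strace, e.g.:

strace -e trace=openat,read cat /proc/loadavg
# shows the openat() on /proc/loadavg and the read() that the procfs driver
# answers by formatting its in-kernel data on the fly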
Thanks for this extra context! I wonder now how much latency the system experiences with the procfs driver loaded. I'm guessing virtually none, but still, for a driver that is essentially just decoration for userspace to look at, it seems like there would be some noticeable overhead in having it loaded.
I wouldn't think so, no. If userspace isn't using it, there's no latency overhead; the kernel accesses its own internal data structures the same regardless. If userspace is using it, the kernel still uses its own data structures just the same, but now there's a way for programs to read them. Using the procfs driver doesn't change anything important inside the kernel (obviously it exercises the procfs driver code, which has its own variables, and there are some data structures describing mounted filesystems); it just provides an interface to what's already there.
If you're relying on such superficial disguising, just put your process name in brackets directly, no mount privileges required.
As a general rule, if malware gets root privilege, no other process on the same system (container, at least) should be expected to be able to detect it.
This would help against basic ps commands but if your security tool reads the task's PF_ flags (from /proc/PID/stat or kernel memory), you'd still see that your [task] is not an actual kernel task... (the PF_KTHREAD flag is 0x200000):
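(a rough bash sketch; $pid is the PID under inspection)

flags=$(awk '{print $9}' /proc/$pid/stat)   # field 9 of stat is the kernel flags word
if (( flags & 0x200000 )); then             # PF_KTHREAD
    echo "$pid is a real kernel thread"
else
    echo "$pid only dresses like one"
fi

(Caveat: the comm field in stat is wrapped in parentheses and may itself contain spaces, so a robust tool should parse past the closing ')' rather than count whitespace-separated fields.)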
The original 'ps' got the data by reading kernel memory directly (/dev/kmem or /dev/mem), and I'm old enough to remember systems doing that. I don't think there were any Unix-derivatives that used system calls? Minix is closest, in that you could query the 'mm' task using RPCs which are sort of system calls.
kvm_getprocs is in the userspace library libkvm, which abstracts away how that is done. That library is primarily meant for accessing /dev/kmem and crash dumps, with this functionality bolted onto it (the FreeBSD manpage even mentions that as a bug); it works by examining the symbol table of the running kernel and just walking the kernel data structures.
On FreeBSD there seems to be some trick where opening "/dev/null" instead of "/dev/kmem" causes the process "to not access kernel memory directly". Looking at the code, it seems to me that libkvm will really open /dev/null and treat it as if it were /dev/kmem, which raises the question of exactly how that works.
[Edit: apparently this works because, when querying processes of the running kernel, the code in kvm_proc.c does not read from the file and instead calls sysctl().]
One problem with reading /proc is that it's impossible to get a consistent view of the whole system, because you are iterating over the processes one by one instead of getting a snapshot of all of them.
I guess my point/question was more - is whatever system/library call you'd use atomic? There's technically no reason the system couldn't have a syscall that returns a full data structure of process info for as many processes as are in scope and guarantee that it's a perfect atomic snapshot of exactly how the process table looked when that call returned, but I'd want to be very confident and have explicit documentation promising as much before I depended on it.
Trying to parse the output of several procfs files isn't as easy as it looks. The per-pid maps file for one (smaps is even worse). Much better if I could just get a struct with all this information.
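For example, even something as simple as "how much of this process is resident" means walking dozens of per-mapping blocks in smaps and summing one field (a rough sketch):

awk '/^Rss:/ { sum += $2 } END { print sum " kB" }' /proc/self/smaps

(Newer kernels at least add /proc/PID/smaps_rollup, which does that aggregation for you.)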
I've not looked at the kernel source, but when I had a task to read info from /proc, it looked like a lot of info was just pack()'d in there. So my parsing was just done with unpack(), mapping those values to things that could be found in existing userland tools.
I'm totally with you that processes should probably use glibc/platform API calls, but procfs is nothing short of brilliant and a great validation of the power of "everything is a file".
Though it does stop implementing obsolete syscalls on newer architectures. So the set of syscalls you can effectively use often rotates as the hardware treadmill turns.
And since they haven't been removed, just not implemented, you get the joy of finding out it doesn't work anymore when you try to use it, instead of when you build it!
What I want to hide is mounts. With containers, snap (squashfs), etc., the number of mounts on a modern Linux system is growing to ridiculous numbers. And why should they all be visible to normal users?
Coincidentally, I just used a veritable litany of bind mounts this week. I did not attempt to hide anything, though one of those bind mounts exists in order to fake something. I needed this to be able to properly run some x86 software under Rosetta 2 in an arm64 Ubuntu VM.
I wasn't happy with how handling packages for multiple architectures seems to work in Ubuntu/Debian: it looks like you basically have to choose which architecture you want for each package, because if you try to install a package for both architectures, they almost always conflict with each other (trying to occupy the same files).
So instead, I made an x86 chroot. I used "debootstrap" to populate the x86 chroot with an x86 Ubuntu base system, and schroot so that a regular non-root user could use that chroot. That way, I can install any x86 package I want without conflicting with the surrounding VM by just chroot'ing into it and regularly typing e.g. "apt install emacs".
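Roughly what that setup looks like (release name, mirror, paths and user name here are illustrative):

sudo debootstrap --arch=amd64 jammy /data/x86 http://archive.ubuntu.com/ubuntu
# /etc/schroot/chroot.d/x86 (so a regular user can enter it):
#   [x86]
#   type=directory
#   directory=/data/x86
#   users=myuser
schroot -c x86          # drop into the x86 environment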
But both because a lot of software needs it, and for interop with the surrounding VM (unix domain sockets etc.), I had to create a bunch of bind mounts to bring shared "state" directories into the x86 chroot:
mount --make-private --bind /proc /data/x86/proc
mount --make-private --bind /sys /data/x86/sys
...
I did this for at least /proc, /sys, /run, /dev, /dev/pts and /home. Depending on how much you want to bring the environments together, you could also add /tmp and others.
I also added bind mounts for individual files:
mount --make-private --bind /etc/passwd /data/x86/etc/passwd
mount --make-private --bind /etc/shadow /data/x86/etc/shadow
mount --make-private --bind /etc/group /data/x86/etc/group
so that my x86 environment has the same user/group database.
Finally, and I think this is the most interesting piece, some x86 software I tried to run did not work because it tried to parse /proc/cpuinfo for some x86 CPU features. Rosetta 2 implements those, but /proc/cpuinfo in the system describes the actual (virtualized) ARM CPUs, which is of course not what x86 software expects.
So, I crafted my own fake cpuinfo text file that looked roughly like it would look in a real x86 environment (I did not put much effort in it, just copied an output from a random real x86 machine and adjusted the number of CPUs), and bind mounted that into the x86 /proc overlay:
mount --make-private --bind /data/fake-x86-cpuinfo.txt /data/x86/proc/cpuinfo
Now, /proc/cpuinfo inside the chroot (so, /data/x86/proc/cpuinfo) would give the fake cpuinfo and make the x86 software that parses it happy. This is also where the --make-private that I applied to /data/x86/proc earlier becomes important: Without that, that last command would not just overlay /data/x86/proc/cpuinfo, but also /proc/cpuinfo itself, now in turn making arm64 software potentially unhappy. With --make-private on the /proc bind mount, it effectively becomes a separate filesystem in that regard (only).
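If you want to double-check which mode a given bind mount ended up in, findmnt can show the propagation flags, e.g.:

findmnt -o TARGET,PROPAGATION /data/x86/proc    # should report "private" after --make-private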
Finally, the biggest hurdle I had was getting systemd to properly mount all of these in the right order at startup. systemd parallelizes the mounts in fstab (and mounts generally). But if e.g. the /data/x86/proc mount does not happen after both /proc and /data/x86 have already been mounted, you effectively get an empty directory (you can work out for yourself why that's the outcome in each of the improper orderings). This was even more complicated because I use ZFS, so /data/x86 gets mounted by zfs-mount.service. After much fiddling, I gave up. No combination of "x-systemd.requires=<mountpoint>", "x-systemd.after=zfs-mount.service" and whatever else in the fstab options really fully did the right thing.
I resorted to having a shell script that just runs the bind mounts in the correct order.
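For reference, the script is nothing fancy, roughly this shape (the exact list depends on what you bind):

#!/bin/sh
# run after /proc, /sys, /dev, /run, /home and /data/x86 (ZFS) are all mounted
for d in proc sys dev dev/pts run home; do
    mount --make-private --bind "/$d" "/data/x86/$d"
done
mount --make-private --bind /data/fake-x86-cpuinfo.txt /data/x86/proc/cpuinfo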
In Linux you can rename your own process; many applications do that to name their threads or to describe what they are doing.
Isn't it just much easier to rename our process to look like a kernel thread or some other process?
We could easily escape the process tree by forking a child and exiting the parent (we would get adopted by init).
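A minimal sketch of the reparenting part ("./payload" is a placeholder):

( ./payload & )                     # the intermediate subshell exits at once, so payload is reparented to PID 1
setsid ./payload >/dev/null 2>&1 &  # alternative that also detaches from the session/terminal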
If it's taking up a lot of CPU and MEM, someone running htop will begin to wonder what that process is doing. By bind-mounting over it, you obscure it from htop altogether.
>> My guess is that nobody is going to notice this unless they are specifically looking for this technique.
But having two identical PIDs is a pretty weak cloak, even more so when reducing terminal clutter, e.g. running "ps | grep procname" ... anyone not completely asleep is bound to notice it.
Probably you read the article that was linked from this article, which does indeed make the process completely vanish, but leaves a suspicious empty /proc directory.
This article 'solves' the problem of an empty directory by simply bind-mounting another process instead - but that causes ps to output a duplicate line (including process ID) for the other process, in lieu of the process being hidden.