Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

Hmmm.... Wondering if this could be eventually used to emulate a PCIe card using another device, like a RaspberryPi or something more powerful... Thinking the idea of a card you could stick in a machine, anything from a 1x to 16x slot, that emulates a network card (you could run VPN or other stuff on the card and offload it from the host) or storage (running something with enough power to run ZFS and a few disks, and show to the host as a single disk, allowing ZFS on devices that would not support it). but this is probably not something easy...




Hi! Author here! You can technically offload the transactions the real driver on your host does to wherever you want really. PCI is very delay-tolerant and it usually negotiates with the device so I see not much of an issue doing that proven that you can efficiently and performantly manage the throughput throughout the architecture. The thing that kinda makes PCIem special is that you are pretty much free to do whatever you want with the accesses the driver does, you have total freedom. I have made a simple NVME controller (With a 1GB drive I basically malloc'd) which pops up on the local PCI bus (And the regular Linux's nvme block driver attaches to it just fine). You can format it, mount it, create files, folders... it's kinda neat. I also have a simple dumb rasteriser that I made inside QEMU that I wanted to write a driver for, but since it doesn't exist, I used PCIem to help me redirect the driver writes to the QEMU instance hosting the card (Thus was able to run software-rendered DOOM, OpenGL 1.X-based Quake and Half-Life ports).

Just to hijack this thread on how resilient PCIe is. PS4 Linux hackers ran PCIe over UART serial connection to reverse engineer the GPU. [0] [1]

[0] https://www.psdevwiki.com/ps4/PCIe

[1] https://fail0verflow.com/blog/2016/console-hacking-2016-post...


> PCI is very delay-tolerant

That fascinates me. Intel deserves a lot of credit for PCI. They built in future proofing for use cases that wouldn't emerge for years, when their bread and butter was PC processors and peripheral PC chips, and they could have done far less. The platform independence and general openness (PCI-SIG) are also notable for something that came from 1990 Intel.


Can one make a PCIe analyzer out of your code base by proxy all transactions thru a virtual PCIem driver to a real driver?

You can definitely proxy the transactions wherever you may see fit, but I'm not completely sure how that'd work.

As in, PCIem is going to populate the bus with virtually the same card (At least, in terms of capabilities, vendor/product id... and whanot) so I don't see how you'd then add another layer of indirection that somehow can transparently process the unfiltered transaction stream PCIem provides to it to an actual PCIe card on the bus. I feel like there's many colliding responsabilities in this.

I would instead suggest to have some sort of behavioural model (As in, have a predefined set of data to feed from/to) and have PCIem log all the accesses your real driver does. That way the driver would have enough infrastructure not to crash and at the same time you'd get the transport layer information.


Maybe: if PCIe device in on BDF 00AA:BB:00, create the proxy device on 00AA:BB:01 and the typical PCIe utils that talk to default 00AA:BB:00 will stead be config to talk to 00AA:BB:01 node. Some wireshark plugin will get the sniffed data (io, memory read/write, DMA read/write, etc) from the virtual device interface.

Ideally, the setup might be genetic enough to apply to all (most?) of the pcie device/driver....


Is it possible to put such a driver for nvme under igb_uio or another uio interface? I have an app that uses raw nvme devices and being able to tests strange edge cases would be a real boon!

Fantastic tool, thank you for making this it is one of those things that you never knew you needed until someone took the time to put it together.

This is really interesting. Could it be used to carve up a host GPU for use in a guest VM?

Depends on the use-case. For the standard hardware-accelerated guest GPU in virtualized environments, there's already QEMU's virtio-gpu device.[1]

For "carving up" there are technologies like SR-IOV (Single Root I/O Virtualization).[2]

For advanced usage, like prototyping new hardware (host driver), you could use PCIem to emulate a not-yet-existing SR-IOV-capable GPU. This would allow you to develop and test the host-side driver (the one that manages the VFs) in QEMU without needing the actual hardware.

Another advanced use-case could be a custom vGPU solution: Instead of SR-IOV, you could try to build a custom paravirtualized GPU from scratch. PCIem would let you design the low-level PCIe interface for this new device, while you write a corresponding driver on the guest. This would require significant effort but it'd provide you complete control.

[1] https://qemu.readthedocs.io/en/v8.2.10/system/devices/virtio...

[2] https://en.wikipedia.org/wiki/Single-root_input/output_virtu...


Really cool stuff thanks for creating it

As in, getting the PCIem shim to show up on a VM (Like, passthrough)? If that's what you're asking for, then; it's something being explored currently. Main challenges come from the subsystem that has to "unbind" the device from the host and do the reconfiguration (IOMMU, interrupt routing... and whatnot). But from my initial gatherings, it doesn't look like an impossible task.

> carve up

Passthru or time sharing? The latter is difficult because you need something to manage the timeslices and enforce process isolation. I'm no expert but I understand it to be somewhere between nontrivial and not realistic without GPU vendor cooperation.

Note that the GPU vendors all deliberately include this feature as part of their market segmentation.


It would need to implement a few dozen ioctls, correctly stub the kernel module in guests, do a probably memory-safe assignment of GPU memory to guest, and then ultimately map that info to BAR/MSI-X semantics of a real kernel module. You could get VFIO pretty fast for a full start by correctly masking LTR bits, but to truly make it free you'd need a user space io_uring broker that had survived hundreds of millions of adversarial fuzz runs because there's only so fast the firmware blob can run even if it's preloaded into initramfs.

Serious work, detail intense, but not so different in design to e.g. Carmack's Trinity engine. Doable.


I wonder if it's possible to create a wire shark plugin for analyzing PCIE?


This kind of stuff is stupid easy on an OS like Plan 9 where you speak a single protocol: 9P. Ethernet devices are abstracted and served by the kernel as a file system explained in ether(3). Since it's all 9P the system doesn't care where the server is running; could be a local in-kernel/user-space server or remote server over ANY 2-way link including TCP, IL, PCIe link, RS232 port, SPI, USB, etc. This means you can mount individual pieces of hardware or networking stacks like ip(3), any 9P server, from other machines to a processes local namespace. Per-process name spaces let you customize the processes view of the file system and hence all its children allowing you to customize each and every programs resource view.

There is interest in getting 9front running on the Octeon chips. This would allow one to run anything they want on an Octeon card (Plan 9 cross platform is first class) so one could boot the card using the hosts root file system, write and test a program on the host, change the objtype env variable to mips/arm, build the binary for the Octeon and then run it on the Octeon using rcpu (like running a command remotely via ssh.) All you need is a working kernel on the Octeon and a host kernel driver and the rest is out of the box.


This is also the case with Google Fuchsia, just replace 9P with FIDL. I'm really hoping Fuchsia doesn't end up just being vaporware since it has made some very interesting technical decisions (often borrowing from Plan 9, NixOS, and others.)

> emulate a PCIe card using another device

The other existing solution to this is FPGA cards: https://www.fpgadeveloper.com/list-of-fpga-dev-boards-for-pc... - note the wide spread in price. You then also have to deal with FPGA tooling. The benefit is much better timing.


Indeed, and even then, there's some sw-hw-codesign stuff that kinda helps you do what PCIem does but it's usually really pricey; so I kinda thought it'd be a good thing to have for free.

PCIe prototyping is usually not something super straightforward if you don't want to pay hefty sums IME.


The "DMA cards" used for video game cheating are generic PCIe cards and (at least the one I got) comes with open documentation (schematics, example projects etc).

What's this? Hardware specifically for game cheating? Got any links?

Direct Memory Access (DMA) via PCI-e bypasses anti-cheat in the OS because the OS doesn't see the call to read or write the memory. There's no process to spy on, weird drivers, system calls, etc. You can imagine that maybe the anticheat could detect writes that perform a cheat by this method, but it has zero chance of detecting a wallhack style cheat that just reads memory. This is getting to be less relevant with modern OSs, though. Window 11 has IOMMU which only allows DMA to a given memory region defined per device. I think it should be impossible to do this on win11.

Don't you still need a driver to control the DMA card to tell it what to do with the memory?

If you search “DMA card”, there’s a lot of DMA cards all over the internet.

some ARM chips can do PCIe endpoint mode, and the kernel has support for pretending to be an nvme ssd https://docs.kernel.org/nvme/nvme-pci-endpoint-target.html

Something like the stm32mp2 series of MCUs can run Linux and act as a PCIe endpoint you can control from a kernel module on the MCU. So you can program an arbitrary PCIe device that way (although it won’t be setting any speed records, and I think the PHY might be limited to PCIe 1x)

(Ha, nice to see Jon Corbet's name on the PCI Endpoint documentation...)

interesting... x1 would too slow for large amounts of storage, but as a test, a couple small SSDs could potentially be workable... sounds like im doing some digging...

There are many workloads that would not be able to saturate even an x1 link, it all depends on how much of the processing can be done internally to whatever lives on the other side of that link. Raw storage and layer-to-layer communications in AI applications are probably the worst cases but there are many more that are substantially better than that.

If there's any particular feature you feel you are missing on PCIem or anything, feel free to open an Issue and I'll look into it ;)

I ordered a pair of Orange PI 5+'s to work on playing with programming a PCIe device, but haven't made the time to get it working yet.

https://blog.reds.ch/?p=1759 and https://blog.reds.ch/?p=1813 is what inspired me to play with it.


ohh now thats cool! Thanks for the links!

… or pcie over ethernet ;)

That has a name: ExpEther[1], and likely more than one. pciem does mean you could do this with software.

[1] https://www.expether.org/products.html


Could add one or more (reprograble?) FPGA's for extra? processing power OR reconfiguration ease to such a card ......

I've often wondered why such a card (with FPGA) is not available for retro? computer emulation or simulation ??



This is what DPUs are for.

this is what dma cards do

I recently bought a DMA cheating card because it's secretly just an FPGA PCIe card. Haven't tried to play around with it yet.

Seems unlikely you'd emulate a real PCIe card in software because PCIe is pretty high-speed.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: