I disagree with the fundamental assertion that 2D is harder than 3D. I think a more accurate title would be "Why are 2D vector graphics so much harder than 3D when using a 3D-oriented raster graphics pipeline?"
If we remove the existing constraints and say you have to build these things in pure software, I think the equation would look a little different. I don't know of many developers who can accurately describe what the GPU does these days. Triangle rasterization is not an easy problem if you have to solve it yourself.
It's that you're usually rasterizing much more complicated shapes than triangles in 2D, like polygons and curves and fonts with hinting (which is often actually implemented as Turing-complete byte code, not stuff that's easy to run in parallel entirely in the GPU).
Writing a triangle rasterizer is not that hard. What APIs like OpenGL give you for free (other than a performance boost) is walking all the pixels that are covered by each triangle, and computing the barycentric coordinates for each pixel (and then using these to lerp the vertex data). So that's what you have to replace with a CPU program.
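For what it's worth, here is a rough sketch of that "walk the covered pixels, compute barycentrics, lerp" step in plain Python. The names (`edge`, `rasterize_triangle`) and the one-float-per-vertex attribute layout are made up for illustration. It assumes screen-space vertices, samples at pixel centers, and leaves out fill rules, clipping beyond the framebuffer bounds, and perspective correction, so treat it as the toy version and not something you'd ship.

```python
import math


def edge(ax, ay, bx, by, px, py):
    """Twice the signed area of triangle (a, b, p)."""
    return (bx - ax) * (py - ay) - (by - ay) * (px - ax)


def rasterize_triangle(v0, v1, v2, attrs, width, height):
    """Return {(x, y): value} for pixels whose centers the triangle covers.

    v0, v1, v2 are (x, y) tuples in pixel space; attrs holds one float per
    vertex (hypothetical layout, chosen only to keep the sketch short).
    """
    area = edge(*v0, *v1, *v2)
    if area == 0:
        return {}  # degenerate triangle covers nothing

    xs = (v0[0], v1[0], v2[0])
    ys = (v0[1], v1[1], v2[1])
    # Bounding box of the triangle, clamped to the framebuffer.
    min_x = max(math.floor(min(xs)), 0)
    max_x = min(math.floor(max(xs)), width - 1)
    min_y = max(math.floor(min(ys)), 0)
    max_y = min(math.floor(max(ys)), height - 1)

    out = {}
    for y in range(min_y, max_y + 1):
        for x in range(min_x, max_x + 1):
            px, py = x + 0.5, y + 0.5  # sample at the pixel center
            # Dividing by the signed area makes the weights positive inside
            # the triangle for either winding order; they sum to 1.
            w0 = edge(*v1, *v2, px, py) / area
            w1 = edge(*v2, *v0, px, py) / area
            w2 = edge(*v0, *v1, px, py) / area
            if w0 >= 0 and w1 >= 0 and w2 >= 0:
                # Lerp the per-vertex data with the barycentric weights.
                out[(x, y)] = w0 * attrs[0] + w1 * attrs[1] + w2 * attrs[2]
    return out
```

Calling `rasterize_triangle((0, 0), (4, 0), (0, 4), (1.0, 0.0, 0.0), 4, 4)` gives a dict keyed by pixel, with the value fading from the first vertex toward the opposite edge. Note that pixels exactly on a shared edge get drawn by both neighboring triangles here, since there is no top-left fill rule.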
I find the much harder part is how to set up the architecture in such a way that the data flows through your shader pipelines without an unbearable amount of boilerplate. 3D APIs don't help with that - if anything they make it harder.
Certainly. One can write a trivial version in maybe 30 lines of code. Writing a triangle rasterizer that you would want to use in a product that is consumed by another human is hard.
Also, it is my experience that none of these things can truly be built in isolation. Depth buffers and acceleration structures crosscut all aspects of a rendering engine.
I do agree regarding the 3D APIs though. Writing it yourself in software mode can be easier than learning someone else's mousetrap. This is the path I prefer, even if it is slower at first.
Is it even that? People tolerate a lot more artifacts in 3D than 2D. If you wrote a 2D graphics engine that used triangles as primitives, people probably wouldn't like it (and it would probably render text very slowly).