Smells like an article from someone who didn’t really USE the XML ecosystem.
First, there is modeling ambiguity: too many ways to represent the same data structure. That means you can’t parse into native structs; instead you get a heavy DOM object, and it sucks to interact with it.
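For illustration, here's roughly what that ambiguity looks like with just the Python stdlib; the record shape is invented, a sketch rather than anything definitive:

    # Same record, two common XML shapes; navigation differs for each.
    import xml.etree.ElementTree as ET

    as_attributes = '<user id="1" name="ada"/>'
    as_elements = '<user><id>1</id><name>ada</name></user>'

    print(ET.fromstring(as_attributes).get('name'))     # 'ada', attribute lookup
    print(ET.fromstring(as_elements).findtext('name'))  # 'ada', child element

(There's a third shape mixing both, which is where generic code gives up and reaches for a DOM.)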
Then, schemas sound great, until you run into DTD, XSD, and RelaxNG. RelaxNG only exists because XSD is pretty much incomprehensible.
Then let’s talk about entity escaping and CDATA. And how you break entire parsers because CDATA is a separate incantation on the DOM.
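To make the CDATA point concrete, a minimal stdlib sketch (the payload is made up): in the DOM, CDATA really is its own node type, while ElementTree folds it into .text and re-escapes on output, so a round trip quietly loses it.

    from xml.dom.minidom import parseString
    from xml.dom import Node

    doc = parseString('<msg><![CDATA[1 < 2 && done]]></msg>')
    node = doc.documentElement.firstChild
    print(node.nodeType == Node.CDATA_SECTION_NODE)  # True: a distinct node type
    print(node.data)                                 # '1 < 2 && done', unescaped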
And in practice, XML is always over-engineered. It’s the AbstractFactoryProxyBuilder of data formats. SOAP and WSDL are great examples of this, vs looking at a JSON response and simply understanding what it is.
I worked with XML and all the tooling around it for a long time. Zero interest in going back. It’s not the angle brackets or the serialization efficiency. It’s all of the above brain damage.
This is both painfully hilarious and hilariously painful. It might even be hilarious, but my JVM ran out of memory while trying to build the DOM model.
XML grew from SGML (like HTML did), and it brought from it a bunch of things that are useless outside a markup language. Attributes were a bad idea. Entities were a so-so idea, which became unapologetically terrible when URLs and file references were allowed. CDATA was an interesting idea but an error-prone one, and likely it just did not belong.
OTOH namespaces, XSD, XSLT were great, modulo the noisy tags. XSLT was the first purely functional language that enjoyed mass adoption in the industry. (It was also homoiconic, like Lisp, amenable to metaprogramming.) Namespaces were a lifesaver when multiple XML documents from different sources had to be combined. XPath was also quite nice for querying.
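For what it's worth, even Python's stdlib ElementTree speaks a small XPath subset; a toy sketch with an invented document:

    import xml.etree.ElementTree as ET

    root = ET.fromstring('<orders><order id="1"><total>9</total></order>'
                         '<order id="2"><total>12</total></order></orders>')
    print(root.find('.//order[@id="2"]/total').text)  # '12'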
XML is noisy because of the closing tags, but it also guarantees a level of integrity, and LZ-type compressors, even gzip, are excellent at compacting repeated strings.
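A quick, rough check of that claim with the stdlib (the numbers will vary with content and compression level):

    import gzip

    xml = ('<row><name>ada</name><age>36</age></row>' * 1000).encode()
    print(len(xml), len(gzip.compress(xml)))  # ~40000 bytes down to a few hundred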
Importantly, XML is a relatively human-friendly format. It has comments, requires no quoting, no commas between list items, etc.
Complexity killed XML. JSON was stupid simple, and thus contained far fewer footguns, which was a very welcome change. It was devised as a serialization format, a bit human-hostile, but mapped ideally to bag-of-named-values structures found in basically any modern language.
Now we see XML tools adapted to JSON: JSON Schema, JSONPath, etc. JSON5 (as used in e.g. VSCode) allows for comments, trailing commas, and other creature comforts. With tools like that, and dovetailing tools like Pydantic, XML lost any practical edge over JSON it might ever have had.
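The Pydantic point in one breath, assuming pydantic v2 and a made-up model:

    from pydantic import BaseModel

    class User(BaseModel):
        name: str
        age: int

    user = User.model_validate_json('{"name": "ada", "age": 36}')
    print(user.age)                  # 36, already typed as int
    print(User.model_json_schema())  # and it emits JSON Schema for free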
What's missing is a widespread replacement for XSLT. Could be a fun project.
> XML grew from SGML (like HTML did), and it brought from it a bunch of things that are useless outside a markup language. Attributes were a bad idea.
That's exactly what I wanted to say. The author talks as if XML was well designed to represent structured data, but it was not, it grew out of the idea of marking up text, which is a completely different problem. The hilarious part is that he doesn't recognize the problem when he gives his example of "or with attributes".
The other thing is that the JSON model doesn't just give you a free parser/serializer in JavaScript. It actually maps to the basic data model of the entire generation of dynamic languages that the Web grew on: Perl, Python, JS, PHP, and Ruby. Arrays and maps are the basic way to represent structured data in these languages, and JSON just serializes that. Which means that getting data in and out of your language is just a single line.
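Literally a single line each way, stdlib only:

    import json

    data = json.loads('{"users": [{"name": "ada"}]}')  # straight into dicts/lists
    print(data["users"][0]["name"])                    # 'ada'
    print(json.dumps(data))                            # and straight back out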
The author seems to think that XML maps a proper conceptual model and JSON doesn't, but the model of "nodes with attributes and content" is a worse match for structured data than JSON's model of "arrays and maps of values".
Other than that, it's really a question of how much tooling you want to use. Both JSON and XML grew entire ecosystems of it, and nowadays if you want to read your JSON according to a schema into typed objects, you can, and for any good-sized project, you probably should.
Also:
> There are cases where other formats are appropriate: small data transfers between cooperating services and scenarios where schema validation would be overkill.
That's actually most of the cases for your average web dev!
I don't know, the few times I have had to use XML, I went "This is not so bad, I don't know what all the fuss is about" until I hit namespaces. I don't know if I was just using an inferior library, but namespaces sucked. The minute namespaces came into the picture, all the joy left the project. And XSLT... I only ever did one thing with it, "use the browser to turn demarc XML records into a webpage", and that was pretty cool. But it also firmly convinced me that XML is very much the wrong form factor for a programming language.
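For anyone who hasn't felt this pain, a sketch of what namespaces do to stdlib ElementTree (the URI here is made up). You either spell out Clark notation or carry a prefix map to every call site:

    import xml.etree.ElementTree as ET

    doc = ET.fromstring(
        '<r xmlns:s="http://example.com/stock"><s:item>42</s:item></r>')
    print(doc.find('item'))                                 # None; bare names miss
    print(doc.find('{http://example.com/stock}item').text)  # '42', Clark notation
    print(doc.find('s:item', {'s': 'http://example.com/stock'}).text)  # '42'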
My personal thought is that CSS is not SGML-like as a sort of rebellion against the way XML was taking over the world. It feels like the author had written one too many XSLTs and said "Nope, it ends here, we are not doing that again." Because really, it is very weird that CSS does not use an XML syntax.
On the topic of the wrong form factor for a programming language: another good contender is Ansible, when you try to use its YAML looping constructs.
XSLT ended at 1.1 for me. Everything that was "designed-by-committee" later was subverted to serve the bottom line of Michael Kay enterprises, although I hesitate to attribute to malice the grueling incompetence of the working group at the time.
It still works well in the appropriate settings. LibreOffice (nee OpenOffice) uses ODF, an XML format, for its document files, and it has been working nicely enough for a long time.
If I recall correctly, it was typically a 10x memory load to open an XML file in a DOM parser. Which could get really ugly, really fast when you were dealing with many files.
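The usual way around that blow-up is streaming; a minimal sketch with stdlib iterparse, processing and freeing elements as their closing tags arrive:

    import io
    import xml.etree.ElementTree as ET

    big = io.BytesIO(b'<log><record>a</record><record>b</record></log>')
    for event, elem in ET.iterparse(big):  # fires on each closing tag by default
        if elem.tag == 'record':
            print(elem.text)               # handle the record...
            elem.clear()                   # ...then drop its subtree from memory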
I really like Clojure EDN. It's very simple, but adds just enough on top to make a difference: namespaces, a few more types, and a way to add custom stuff in a reasonably standard way.
And of course XML libraries haven't had any security issues (oh look, CVE-2025-49796) and certainly would not need to make random network requests for a DTD of "reasonable" complexity. I also dropped XML, and that's after having a website that used XML, XSLT rendering to different output forms, etc. There were discussions at the time (early to mid 2000s) of moving all the config files on unix over to XML. Various software probably has the scars of that era, and therefore an XML dependency, and is that an embiggened attack surface? Also namespaces are super annoying; pretty sure I documented the ughsauce necessary to deal with them somewhere. Thankfully, crickets serenade the faint cries of "Bueller".
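The network-request hazard in miniature, assuming the third-party defusedxml package; the entity here points at a local file, but a URL works the same way:

    import defusedxml.ElementTree as DET

    xxe = ('<?xml version="1.0"?>'
           '<!DOCTYPE r [<!ENTITY x SYSTEM "file:///etc/passwd">]>'
           '<r>&x;</r>')
    try:
        DET.fromstring(xxe)
    except Exception as exc:       # defusedxml rejects entity declarations by default
        print(type(exc).__name__)  # e.g. EntitiesForbidden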
The contrast with only JSON is far too simplistic; XML got dropped from places where JSON is uninvolved, like why use a relational database when you can have an XML database??? Or those config files on unix are for the most part still not-XML and not-JSON. Or there's various flavors of markdown which do not give you the semi-mythical semantic web but can be banged out easily enough in vi or whatever and don't require schemas and validation or libraries with far too many security problems and I wouldn't write my documentation (these days) using S-expressions anyhow.
This being said there probably are places where something that validates strictly is optimal, maybe financial transactions (EDIFACT and XML are different hells, I guess), at least until some cheeky git points out that data can be leaked by encoding with tabs and spaces between the elements. Hopefully your fancy and expensive XML security layer normalizes or removes that whitespace?
> JSON has no such mechanism built into the format. Yes, JSON Schema exists, but it is an afterthought, a third-party addition that never achieved universal adoption.
This really seems like it's written by someone who _did not_ use XML back in the day. XSD is no more built-in than JSON Schema is. XSD was first-party (it was promoted by the W3C), but it was never a "built-in" component of XML, and there were alternative schema formats. You can perfectly well write XML without XSD, and back in the heyday of XML in the 2000s, most XML documents did not have one.
Nowadays most of the remaining XML usages in production rely heavily on XSD, but that's a bit of a survivorship bias. The projects that used ad-hoc XML as configuration files, simple document files or as an interchange format either died out, converted to another format or eventually adopted XSD. Since almost no new projects are choosing XML nowadays, you don't get an influx of new projects that skip the schema part to ship faster, like you get with JSON. When new developers encounter XML, they are generally interacting with long-established systems that have XSD schemas.
This situation is purely incidental. If you want the same result with JSON, you can just use JSON Schema. But if we somehow magically convince everybody on the planet to ditch JSON and return to XML (please not), we'll get the same situation we have had with JSON, only worse. We'll just get back to where we were in the early 2000s, and no, that wasn't good.
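For the record, "just use JSON Schema" is about this much ceremony (third-party jsonschema package, toy schema):

    import jsonschema

    schema = {"type": "object",
              "properties": {"age": {"type": "integer"}},
              "required": ["age"]}

    jsonschema.validate({"age": 36}, schema)      # passes silently
    # jsonschema.validate({"age": "36"}, schema)  # would raise ValidationError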
I read the article and my first thought was that it entirely missed the complexity of XML. It started out relatively simple and easy to understand, and most people/programs wrote simple XML that looked a lot like HTML still does.
But it didn't take long before XML might well be a binary format for all it matters to us humans looking at it, parsing it, dealing with it.
JSON came along and its simplicity was baked in. Anyone can argue it's not a great format, but it forcefully maintains the simplicity that XML lost quite quickly.
> It started out relatively simple and easy to understand ....
When the specs for a data representation format evolve to the point of enabling XML bombs, it has gone too far in trying to please everyone. That is probably why JSON won in the long run: it's not perfect, but it's stable and simple, without crazy issues you have to worry about when parsing it. If XML had had a Torvalds-ish kind of dictator who could afford to say no, I doubt JSON would have won.
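The bomb in question is the billion-laughs pattern, truncated here; each layer expands tenfold, so a naive DTD-expanding parser materializes gigabytes from a document of about a kilobyte:

    bomb = '''<?xml version="1.0"?>
    <!DOCTYPE lolz [
      <!ENTITY lol "lol">
      <!ENTITY lol2 "&lol;&lol;&lol;&lol;&lol;&lol;&lol;&lol;&lol;&lol;">
      <!ENTITY lol3 "&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;">
    ]>
    <lolz>&lol3;</lolz>'''
    # Never feed this to a DTD-expanding parser; defusedxml and friends refuse it.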
We use Mulesoft where I work, and XML namespaces are a constant issue. We never managed to define an API spec in such a way that the RAML compiler and the APIKit validator would both accept the same payload. In the end we just had to turn off validations in APIkit.
Namespaces were fun! But mostly used for over-engineering formats, and interacted with by idiots who do not give a toss. Shout out to every service that would break as soon as elementtree got involved. And my idiot colleagues who work on EDI.
> First, there is modeling ambiguity: too many ways to represent the same data structure. That means you can’t parse into native structs; instead you get a heavy DOM object, and it sucks to interact with it.
I don’t get this argument. There are streaming APIs with convenient mapping. Yes, schemas with weird structure can exist, but in practice they are uncommon. I have seen a lot of integration formats in XML and never had the need to parse to a DOM first.
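A sketch of that in Python, going straight from a flat integration payload to a native struct; the Order shape is invented:

    import xml.etree.ElementTree as ET
    from dataclasses import dataclass

    @dataclass
    class Order:
        id: int
        total: float

    el = ET.fromstring('<order><id>7</id><total>9.50</total></order>')
    order = Order(id=int(el.findtext('id')), total=float(el.findtext('total')))
    print(order)  # Order(id=7, total=9.5)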
Hence why, in 2026, I still hang around programming stacks like Java and .NET, where the XML tooling is great, instead of having to fight with YAML format errors, the Norway problem, or JSON without basic stuff like comments.
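The Norway problem, concretely, assuming PyYAML (which implements YAML 1.1 booleans):

    import yaml

    print(yaml.safe_load('country: NO'))    # {'country': False}
    print(yaml.safe_load("country: 'NO'"))  # {'country': 'NO'} once you quote it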
This is why I hate (HATE. LET ME TELL YOU HOW MUCH I'VE COME TO HATE YAML SINCE I BEGAN TO WORK WITH K8S) working with Helm charts. As an example from the Helm docs...
> Note how we do the indentation above: indent 2 tells the template engine to indent every line in "myfile.txt" with two spaces. Note that we do not indent that template line. That's because if we did, the file content of the first line would be indented twice.
So you end up with YAML that looks weird, and heaven help you if you refactor and now have to adjust all the `indent N` functions to a new value of N.
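A sketch of the mechanic in Python terms, since the indent-sensitivity is the whole trap: the helper prefixes every line, so the call site itself must sit at column 0 or the first line gets indented twice:

    import textwrap

    body = "line1\nline2\n"
    # Roughly what Helm's `indent 2` does to the included file content:
    print("data:\n" + textwrap.indent(body, "  "))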
That said, Helm's approach of "YAML, but with Go templating" is the main source of my hatred - why they didn't take the "It's a tree, and this child node is designated to be replaced" approach is something that's always baffled me.
If only there were one good library. libxml2 is the leading one, and it has been beleaguered by problems internal and external. It has had ABI instability and been besieged by CVE reports.
I agree it shouldn’t be hard. On the evidence, though, it is. I suspect the root problem is lack of tools. Lex and yacc tools for Unicode are relatively scarce. At least that’s what’s set me back from rolling my own.
You managed to convey my thoughts exactly, and you only used the term "SOAP" once. Kudos!
SOAP was terrible everywhere, not just in Nigeria as OP insinuates. And while the idea of XML sounds good, the tools that developed on top of it were mostly atrocious. Good riddance.
> I worked with XML and all the tooling around it for a long time. Zero interest in going back. It’s not the angle brackets or the serialization efficiency. It’s all of the above brain damage.
I remember a decade ago seeing job ads that explicitly requested XML skills. The fact that being able to do something with XML was considered a full time job requiring a specialist says everything there is to be said about XML.
I had great experiences with XSD as a contract in systems integration scenarios, particularly with big systems integrators. It's pretty clear whose fault it is when somebody's XML doesn't validate.
The issue is that XSD came along much later, and its use never became binding in XML validation scenarios, hence the partial success, even when XSD-based validation tooling was available at the time.
XSD provides a clean abstraction for the technical validation that sits separately from the application / business / processing layers and dramatically increases the chances of a «clean» request reaching the aforementioned layers without having to roll multiple defensive checks in there.
Granted, an XSD can become complex very quickly, especially if indulged in too much, but it does not have to be.
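In practice that contract check is a few lines; a sketch assuming the third-party lxml package, with invoice.xsd and invoice.xml as hypothetical files:

    from lxml import etree

    schema = etree.XMLSchema(etree.parse('invoice.xsd'))
    doc = etree.parse('invoice.xml')
    if not schema.validate(doc):
        print(schema.error_log)  # pinpoints whose XML is at fault, and where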