Not being a frontend developer, I'm not sure why everyone in the comments (and t...

cnity · on May 26, 2023

I like to read and sometimes participate in these types of arguments, but do people actually hope to convince one another? We're all just flexing our viewpoint and knowledge, as far as I can tell.

If you told suckless, for example, that their page builder code[0] should not use string wrangling, they'd certainly laugh and ignore the advice.

[0] https://git.suckless.org/sites/file/build-page.c.html#l23

austin-cheney · on May 26, 2023

As somebody who has been doing this work for over 20 years, I agree with you. You have to understand that HTML is itself a string serialization, as are XML, markdown, YAML, and JSON. HTML is not the end state or the goal. The end state is a DOM object in memory, and the goal is something visual and/or auditory rendered onto a screen. That said, HTML becomes obsolete the moment a browser accepts any other string format that it can parse into a DOM instance.

Falkon1313 · on May 27, 2023

I don't really think that's quite correct.

To someone reading (or authoring) a document, the end state/goal is having that string of text visible and formatted well enough. They don't care about DOM objects.

The main reason we use HTML is not because people enjoy DOM-traversal or parsing or abstract syntax trees. It's because a little markup in your strings can make them format nicely and make it easy to link to and embed other stuff like images, video, and audio.

String templating/interpolation is the goal.

austin-cheney · on May 27, 2023

What some people claim to want is irrelevant to how the technology executes. We may want little strings, HTML, or whatever. The end state is a DOM object in memory irrespective of what some developers enjoy. If you want a different end state your choices at the moment are either fully abandon web technologies or use WASM.

ummonk · on May 26, 2023

It's not handwavy. If you use un-escaped user-provided strings in your HTML, it is almost certainly catastrophically insecure. One user can submit text that includes password-stealing code and when it's displayed to another user it can steal their passwords, private information, etc.

Hand-written HTML is static and thus doesn't use user-provided strings, while HTML generated by other means automatically escapes any strings given to it.
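To make the risk concrete, here is a minimal sketch of the un-escaped case in Python. The function name `render_unsafe` and the payload are made up for illustration; nothing here comes from a real codebase.

```python
def render_unsafe(username: str) -> str:
    # Hypothetical template: the user-provided string is pasted in verbatim.
    return f"<p>Hello, {username}!</p>"


# A user submits markup instead of a name.
payload = "<script>alert(1)</script>"
page = render_unsafe(payload)

# The script tag survives intact, so a browser would treat it as code.
print(page)
```

Anyone who later views that page gets the submitted script delivered as part of the document.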

lucideer · on May 27, 2023

Security is far from the only reason, but it's a good starting point for understanding the downsides of string templates, so let's look at it purely through that lens:

From a security perspective, when it comes to HTML output you're concerned about injection. HTML opening and closing tags are code: they're read by the browser and have technical meaning for the renderer. The text in between those tags is content: you generally want to render that as is. So these two parts of the string have different purposes; they're contextually different.

Injection happens when a malicious actor gets data into your content that the browser will treat (& interpret) as code. The best way to avoid this on the server side is to be aware of whether you're outputting content or code at any given point in your HTML document.

That's possible with string templates: it's called output escaping, and you just wrap any variable printing in a function that escapes special characters to avoid them being interpreted as HTML code. This is simple as there are only 5: gt, lt, ampersand, single- and double-quotes.
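As a rough sketch of what that wrapping looks like, Python's standard `html.escape` covers exactly those five characters when `quote=True`. The `render_comment` helper is a made-up name for illustration, not any particular template engine's API.

```python
import html


def render_comment(user_text: str) -> str:
    # html.escape with quote=True escapes &, <, >, " and '.
    return "<p>" + html.escape(user_text, quote=True) + "</p>"


malicious = "<script>steal()</script>"
safe_page = render_comment(malicious)

# The tag is now inert text rather than markup.
print(safe_page)
```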

Every modern string templating system does this by default and it's OK. But it's only a start, and is pretty limited in how far it can go.

There are actually three types of injection for HTML: injecting actual HTML is just one. There's also JS or CSS injection (CSP is starting to allow mitigation of some of this but no-one uses it, mainly as it's discouraged by Google & others) and attribute injection.

Output escaping only deals with the first type of HTML injection. It can be extended to try and deal with the other two, but it's very error-prone if it doesn't know whether it's printing a variable inside a script tag, inside an HTML attribute, or just in normal tag content. It's very difficult for string templates to get that context: structured generation gets all that context for free.
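A small sketch of the attribute case, assuming a template author who forgot to quote an attribute. The payload and variable names are invented for illustration, and Python's standard `xml.etree.ElementTree` stands in here for "structured generation" in general; it is not a full HTML builder.

```python
import html
import xml.etree.ElementTree as ET

# Contains none of the five special characters, so escaping leaves it alone.
payload = "x onfocus=evil()"

# String template with an unquoted attribute: the escaper has no idea it is
# inside an attribute, and the payload becomes a second attribute.
templated = "<input value=" + html.escape(payload) + ">"

# Structured generation: the serializer knows this string is an attribute
# value, so it quotes (and escapes) it as one.
element = ET.Element("input", {"value": payload})
structured = ET.tostring(element, encoding="unicode", method="html")

print(templated)
print(structured)
```

In the templated version a browser would read `onfocus=evil()` as its own attribute; in the structured version the whole payload stays inside the quoted `value`.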

Once you have all that extra context for security, it's also useful for a bunch of other stuff (debugging/analytics/dynamic server side formatting/etc.)