"Hey, they reported cross-site scripting! Let's blacklist angle brackets, that'll do the trick!"
In case this is not clear to anyone in 2016: blacklisting known-dangerous characters is not an adequate bug fix. It's a rabbit hole. You will burn hours trying to blacklist every character or character combination that can cause a vulnerability, only to have someone own you anyway.
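The usual alternative to blacklisting for XSS is to escape untrusted data for the context it is written into, at output time. A minimal sketch for HTML text and quoted attribute values using PHP's htmlspecialchars(); the helper name `e()` is hypothetical, and other contexts (inline JavaScript, URLs, CSS) need their own encoders.

```php
<?php
// Hypothetical helper: escape untrusted data for an HTML text node or a
// quoted attribute value. Escaping happens on output, not on input.
function e(string $untrusted): string
{
    // ENT_QUOTES escapes both " and '; ENT_SUBSTITUTE replaces invalid UTF-8
    // instead of returning an empty string.
    return htmlspecialchars($untrusted, ENT_QUOTES | ENT_SUBSTITUTE | ENT_HTML5, 'UTF-8');
}

$comment = '<script>alert("owned")</script>';
echo '<p title="' . e($comment) . '">' . e($comment) . "</p>\n";
```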
The proper fixes for common web application vulnerabilities are as follows:
Session Hijacking/Fixation/etc.: Use TLS.
SQL Injection: Prepared statements that AREN'T emulated; PHP's defaults are bad here.
EDIT: If you're writing in another language, make sure it gives you actual prepared statements rather than string escaping masquerading as prepared statements. (My earlier comment was too broad; some forms of emulated prepared statements might be OK, but PHP's is dangerous.)
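A minimal sketch of what "not emulated" means with PDO, assuming the MySQL driver is the one that needs changing (PDO's MySQL driver emulates prepares by default; other drivers differ). The `connect()` helper is hypothetical, and the demo runs against in-memory SQLite only so it is self-contained.

```php
<?php
// Hypothetical helper: open a PDO connection that uses real prepared statements.
function connect(string $dsn, ?string $user = null, ?string $pass = null): PDO
{
    $pdo = new PDO($dsn, $user, $pass, [
        PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION,
    ]);
    // The MySQL driver emulates prepares by default (escape + interpolate);
    // switch that off so the query text and the data are sent separately.
    if ($pdo->getAttribute(PDO::ATTR_DRIVER_NAME) === 'mysql') {
        $pdo->setAttribute(PDO::ATTR_EMULATE_PREPARES, false);
    }
    return $pdo;
}

// Demo against an in-memory SQLite database (requires pdo_sqlite).
$pdo = connect('sqlite::memory:');
$pdo->exec('CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)');

$insert = $pdo->prepare('INSERT INTO users (name) VALUES (?)');
$insert->execute(["Robert'); DROP TABLE users; --"]);

// The payload was stored as plain data; the table is still there.
$select = $pdo->prepare('SELECT name FROM users WHERE id = ?');
$select->execute([1]);
$name = $select->fetchColumn();
```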
Encryption, Digital Signatures, Authenticated Key Exchanges, etc.: Hire an expert, don't do it yourself based on the advice contained within HN comments.
File Inclusion / Directory Traversal: Don't write your applications in a dumb way that makes these vulnerabilities possible. But if you must, in PHP use something like realpath() with a sanity check that the result is still inside the expected parent directory.
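A sketch of the realpath() approach. `safe_path()` is a hypothetical helper name, and the check assumes the files being served actually exist, since realpath() returns false otherwise.

```php
<?php
// Hypothetical helper: resolve a user-supplied path under $baseDir, or return
// null if it does not exist or resolves to somewhere outside $baseDir.
function safe_path(string $baseDir, string $userPath): ?string
{
    $base = realpath($baseDir);
    if ($base === false) {
        return null;
    }
    // realpath() collapses "..", "." and symlinks; it returns false if the
    // target does not exist.
    $full = realpath($base . DIRECTORY_SEPARATOR . $userPath);
    if ($full === false) {
        return null;
    }
    // Sanity check: the resolved path must start with "$base/". The trailing
    // separator stops "/srv/files-evil" from matching "/srv/files".
    if (strpos($full, $base . DIRECTORY_SEPARATOR) !== 0) {
        return null;
    }
    return $full;
}

// Demo: a throwaway directory containing one file.
$base = sys_get_temp_dir() . DIRECTORY_SEPARATOR . 'safe_path_demo_' . bin2hex(random_bytes(4));
mkdir($base);
file_put_contents($base . DIRECTORY_SEPARATOR . 'report.txt', 'ok');

$ok = safe_path($base, 'report.txt');         // resolved path to report.txt
$bad = safe_path($base, '../../etc/passwd');  // null: escapes $base
```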
XML External Entities: Make sure you disable the entity loader:
libxml_disable_entity_loader(true);
PHP Object Injection in PHP 5: don't ever pass user input to unserialize(); use json_decode() instead.
PHP Object Injection in PHP 7: either disable object loading or whitelist the allowed classes, i.e. unserialize($var, ['allowed_classes' => false]); or unserialize($var, ['allowed_classes' => ['DateTime']]);
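A sketch of the options above. The payload is built locally for illustration; in the vulnerable case it would come from the request. Note that unserialize()'s second argument is an options array with an `allowed_classes` key.

```php
<?php
// Stand-in for attacker-controlled input: it can name any class, here DateTime.
$payload = serialize(new DateTime('2016-01-01 00:00:00', new DateTimeZone('UTC')));

// PHP 7+: refuse every class. Objects come back as __PHP_Incomplete_Class,
// so no __wakeup() or __destruct() of the named class ever runs.
$noObjects = unserialize($payload, ['allowed_classes' => false]);

// PHP 7+: whitelist only the classes you expect.
$onlyDateTime = unserialize($payload, ['allowed_classes' => ['DateTime']]);

// Any version: use JSON for untrusted input; it only yields arrays and scalars.
$data = json_decode('{"user":"alice","admin":false}', true);
```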
These are just some of the common problems I frequently find, of course. There are more basic ways to mess up an application ("not even checking that you're authenticated" being at the top of that list).
> Encryption, Digital Signatures, Authenticated Key Exchanges, etc.
If you just want to get data from A to B over the network, TLS 1.2 (but upgrade to 1.3 when it's ready). For an application where you control the code on both ends, add certificate pinning on top. Probably still worth hiring an expert to make sure you're doing it right but you have less chance of shooting yourself in the foot than if you try and roll your own.
Sometimes I think if cryptographers wrote libraries that the rest of us could use and "just work", security worldwide would improve. Bernstein's NaCl and the derived libsodium is a good starting point though.
> If you just want to get data from A to B over the network, TLS 1.2 (but upgrade to 1.3 when it's ready).
Right. If you're not using TLS for your network communications, then your communications are not secure.
Some people also have other requirements (e.g. "I need to store SSNs, how can I encrypt them and still be able to search by them in MySQL?") that call for separate app-layer crypto. In those situations, don't roll your own. :)
> Probably still worth hiring an expert to make sure you're doing it right but you have less chance of shooting yourself in the foot than if you try and roll your own.
Agreed.
> Sometimes I think if cryptographers wrote libraries that the rest of us could use and "just work", security worldwide would improve.
Ah yes, boring cryptography. :)
> Bernstein's NaCl and the derived libsodium is a good starting point though.
>PHP Object Injection in PHP 5: don't ever pass user input to unserialize(); use json_decode() instead.
>PHP Object Injection in PHP 7: either disable object loading or whitelist the allowed types; i.e. unserialize($var, false); or unserialize($var, ['DateTime']);
I'd stick to not unserializing user input in both cases; that's a can of worms you just don't want to open.
Also, RNG bugs are common and exploitable enough to be worth noting: never use mt_rand(); stick to openssl_random_pseudo_bytes().
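A minimal sketch of replacing mt_rand() for security-sensitive values. It swaps in random_bytes() and random_int(), the CSPRNG-backed functions PHP 7 added; on PHP 5, openssl_random_pseudo_bytes() as suggested above (checking its crypto_strong flag) plays the same role.

```php
<?php
// Reset tokens, session IDs, CSRF tokens: 32 bytes from the OS CSPRNG, hex-encoded.
$token = bin2hex(random_bytes(32));

// Unbiased integer in an inclusive range, e.g. a 6-digit one-time code.
// Unlike mt_rand(), the output is not predictable from earlier outputs.
$code = str_pad((string) random_int(0, 999999), 6, '0', STR_PAD_LEFT);
```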
Do prepared statements count as emulated if the DB doesn't support prepared statements, but the DB adapter is doing replacement during the encoding-to-typed-binary-wire-protocol step (i.e. replacement of typed tokens with other typed tokens) rather than by just concatenating strings?
By prepared statements, I mean your application actually sends the query string in a separate packet from the data, and thereby gives the data no opportunity to corrupt the query string.
What PHP does is silently perform string escaping for you instead of doing a prepared statement. This is stupid, but PHP Internals discussions are painful (so changing it is unlikely to happen any time soon) and the userland fix is easy:
That doesn't really address my question. There are real prepared statements like you're talking about; there's the crap PHP does; and then there's what you get if you use e.g. Erlang's Postgres library, which is that you pass it this:
Postgres's prepared statements aren't being used, but the distinction between "tainted" user-generated data and the "trusted" statement is maintained, because the 5 in the above is typed data sent over the wire in a length-prefixed binary encoding, rather than string data serialized and escaped into another string.
Which is to say, if you (or your users) tried to put a fragment of SQL in place of the 5 above, it'd just get treated as string-typed data, rather than SQL. You don't need packet-level separation to achieve that.
But is this approach still bad for "emulating" prepared statements, somehow? I don't see how.
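To make the length-prefix argument concrete, here is a toy sketch of sending parameters as typed, length-prefixed values instead of escaped strings. The format (one-byte type tag, 4-byte big-endian length, raw bytes) and the function names are hypothetical, not the actual wire format of Postgres or of the Erlang library; the point is only that the decoder reads exactly `length` bytes as data, so nothing inside a value is ever parsed as SQL.

```php
<?php
// Toy encoding (hypothetical): 1-byte type tag, 4-byte big-endian length, raw bytes.
// Nothing is quoted or escaped; the length alone says where the value ends.
function encode_param($value): string
{
    $tag = is_int($value) ? 'i' : 's';
    $bytes = (string) $value;
    return $tag . pack('N', strlen($bytes)) . $bytes;
}

function decode_params(string $buf): array
{
    $values = [];
    $pos = 0;
    while ($pos < strlen($buf)) {
        $tag = $buf[$pos];
        $len = unpack('N', substr($buf, $pos + 1, 4))[1];
        $bytes = (string) substr($buf, $pos + 5, $len);
        // The tag, not the contents, decides how the bytes are interpreted.
        $values[] = $tag === 'i' ? (int) $bytes : $bytes;
        $pos += 5 + $len;
    }
    return $values;
}

// The statement text and its parameters travel as separate fields.
$statement = 'SELECT * FROM users WHERE id = $1 AND name = $2';
$wire = encode_param(5) . encode_param("x' OR '1'='1");
```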
The answer to your question is: I don't know, that's a new solution to me.
It looks like it could be safe, but I'd have to dig into its internals to know for sure. My gut instinct is that it's probably safer than escape-and-concatenate.
If any Erlang experts want to chime in with their insight, please do.
EDIT:
> Which is to say, if you (or your users) tried to put a fragment of SQL in place of the 5 above, it'd just get treated as string-typed data, rather than SQL. You don't need packet-level separation to achieve that.
>
> But is this approach still bad for "emulating" prepared statements, somehow? I don't see how.
Above you said:
> the distinction between "tainted" user-generated data and the "trusted" statement is maintained
If this holds true, then you've still solved the data-instructions separation issue and what Erlang does is secure against SQL injection. So, yes, you don't need to send separate packets to ensure query string integrity in that instance.
The shit PHP does is what I meant to decry when I was talking about emulated prepared statements.
Thanks for broadening my horizons a bit. I've edited my earlier post. :)
Is it too early to be suggesting Argon2? I've not heard of it until now, but the Wikipedia entry[1] shows that the paper was just released late last year.
Most environments don't have an implementation for it yet, and the ones that do will probably only get it through libsodium for the first few years.
> I've not heard of it until now, but the Wikipedia entry[1] shows that the paper was just released late last year.
Argon2 was the winner of the Password Hashing Competition, a several-year cryptography competition to find a new password hashing algorithm that would be secure against an attacker armed with a large GPU cluster.
The judges included a lot of famous cryptographers and security experts. Of particular note: Colin Percival, the author of scrypt, and Jens Steube, the project lead for hashcat.
I've read the paper and I think Argon2 will stand the test of time, but I could (of course) be wrong.
> Most environments don't have an implementation for it yet
How slowly environments got implementations of earlier secure algorithms was half the problem with adopting them, but I think Argon2 has this nailed. The README now links bindings for Go, Haskell, JavaScript, JVM, Lua, OCaml, Python, Ruby and Rust.
I don't trust them. The various language bindings are maintained by random people who have gone through no particular vetting, and their code is not formally reviewed by anyone.
When I started looking through the node bindings, I found a number of minor bugs and a critical issue that left ~1% of passwords vulnerable.
I trust that the C developers do a good job, but phc-winner-argon2 does not appear to have ever made a formal release. Is master really always perfect?
My suggestion, if you really want to overkill and knock it out of the park: use both. Run it through bcrypt, then through Argon2. If something happens where one of them is deemed insecure/bad practice, you've still got the other one.
This falls into the category of "coming up with your own system". It sounds theoretically as strong as either one, but it could end up weaker overall.
Define X as the maximum time you can allow a hash to run on your server before it either starts to annoy users or becomes a DoS issue. Moving from "Argon2, such that it runs for X" to "both algorithms, with a total cost X" means both of them are running with a much reduced work strength.
In the case of Argon2, there is an "iterations" counter, but t=2 is already reasonable, and on low-end hardware you may see t=1. So, as per the spec, reducing the runtime to make the whole thing work is going to mean reducing m (the memory cost).
But bcrypt is not memory-hard to begin with, so you've just reduced the only memory constraint in your algorithm.
And it's entirely possible there are bigger issues I didn't come up with in two minutes of thinking about it.
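For reference, the t and m knobs discussed above map onto options of PHP's own password API in later releases (PHP 7.3+ built with Argon2 support): `memory_cost` is m in KiB, `time_cost` is t, and `threads` is the parallelism. A sketch with illustrative values rather than recommendations; `hash_password()` is a hypothetical wrapper, and in practice you would tune m and t until a single hash takes about your budget X.

```php
<?php
// Hypothetical wrapper around password_hash() using Argon2id.
function hash_password(string $password): string
{
    if (!defined('PASSWORD_ARGON2ID')) {
        throw new RuntimeException('This PHP build has no Argon2id support');
    }
    return password_hash($password, PASSWORD_ARGON2ID, [
        'memory_cost' => 65536, // m, in KiB (64 MiB); illustrative value
        'time_cost'   => 2,     // t, number of passes; illustrative value
        'threads'     => 1,     // parallelism
    ]);
}

$hash = hash_password('correct horse battery staple');
```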