> Why would you not recommend code that checks the hashed password against a local DB?
I would. In fact that's why I have created a product to do exactly that in an efficient way...
Doing the checks online has too many drawbacks:
- availability: what do you do when the service/API is down or you can't reach it?
- determinism: what works today might not tomorrow
- security/privacy/anonymity: ...
I am just uncomfortable with naive code that makes it barely practical:
- if the dataset isn't pre-processed properly, binary searching through it won't lead to the expected results (and that's not always obvious)
- distributing a 30GB+ file to all the DCs
- binary searching through the dataset at runtime means seeking through 30GB... each lookup is O(log n) disk seeks... in practice that means a very slow response time that gets much worse under load.
If you pre-process the dataset you might as well do it "properly" and make it usable :p
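To make the pre-processing point concrete, here is a minimal sketch of what "properly" could look like. It is not my product's code. It assumes the input is one hex SHA-1 per line (optionally followed by `:count`), and the names `build_index`, `contains` and `is_pwned` are made up for this example. It also sorts in memory, which is fine for a demo but would need an external sort for the full 30GB+ dataset.

```python
import hashlib
import io

RECORD_SIZE = 20  # length of a raw SHA-1 digest in bytes


def build_index(hex_lines, out):
    """Write sorted, de-duplicated, fixed-width binary digests to `out`.

    Assumption: each line is a hex SHA-1, optionally followed by ":count".
    Sorting happens in memory here; a real dataset needs an external sort.
    """
    digests = set()
    for line in hex_lines:
        hex_hash = line.strip().partition(":")[0]
        if not hex_hash:
            continue
        digest = bytes.fromhex(hex_hash)
        if len(digest) != RECORD_SIZE:
            raise ValueError("unexpected digest length: %r" % hex_hash)
        digests.add(digest)
    for digest in sorted(digests):
        out.write(digest)
    return len(digests)


def contains(f, count, digest):
    """Binary search over `count` fixed-width records in seekable file `f`."""
    lo, hi = 0, count
    while lo < hi:
        mid = (lo + hi) // 2
        f.seek(mid * RECORD_SIZE)
        record = f.read(RECORD_SIZE)
        if record == digest:
            return True
        if record < digest:
            lo = mid + 1
        else:
            hi = mid
    return False


def is_pwned(f, count, password):
    """Hash the password locally and look it up in the index."""
    digest = hashlib.sha1(password.encode("utf-8")).digest()
    return contains(f, count, digest)
```

The binary search only gives correct answers because the records are sorted as raw bytes and all have the same width. Run it against an unsorted or variable-width text file and it will quietly return wrong results, which is the "not always obvious" failure mode I mentioned. Each lookup still costs up to log2(n) seeks, so this shows the correctness requirement, not a fix for the performance problem.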
See also the “pwned-passwords-django” project here: https://www.b-list.org/weblog/2018/mar/06/two-new-projects