What baffles me is that everyone talks about software quality, but very few organizations actually measure it: not some made-up metrics, but how well the software actually meets its requirements.
And it's not that we don't know how. For any ML model we keep a validation data set, and it is imperative to measure how well the model performs on that set, not on the training data. We know that without validation, a machine will overfit its model to the data it has. Programmers... do the same thing. We're very good at passing tests, and unless we have a separate independent validation and verification process, we convince ourselves that green tests = quality. So our tests are always green, but our backlogs are always red. And nobody seems to notice the contradiction.
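The overfitting analogy can be made concrete with a toy sketch (all names and data here are made up for illustration): a "model" that simply memorizes its training examples scores perfectly on them and poorly on held-out data, which is exactly the green-tests-vs-red-backlog gap.

```python
import random

random.seed(0)

# Toy data: inputs with a simple true rule (label = x % 2).
data = [(x, x % 2) for x in random.sample(range(1000), 200)]
train, valid = data[:150], data[150:]

# A "model" that just memorizes the training set -- the ML analogue
# of code shaped to pass its own tests.
memo = {x: y for x, y in train}

def predict(x):
    return memo.get(x, 0)  # unseen input: blindly guess 0

def accuracy(split):
    return sum(predict(x) == y for x, y in split) / len(split)

print(accuracy(train))  # 1.0 -- "all tests green"
print(accuracy(valid))  # roughly 0.5 -- held-out data tells the real story
```

The only honest number is the second one, and it only exists because the validation split was kept out of the model's reach, just as IV&V keeps the verifiers out of the developers' reach.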
Sorry, I should have been clearer. Independent validation and verification is a thing: https://csrc.nist.gov/glossary/term/independent_verification.... We still use it in, for instance, NPP automation, avionics, and defense, where it not only makes sense but is usually required by law. Interestingly, outside these few domains we usually omit it as too costly, as if doing the first and only independent validation with our own users were not.
Useful to know, thanks. Trouble is, it's going to be damn expensive, and there's no shortage of bosses who get angry at having failures pointed out. It really takes money and the right mindset.
> we usually omit it as too costly as if doing the first and only independent validation with our own users is not
Users will accept absolute shite, and that is definitely a cost saving. Unfortunately. End users complain a lot, but in the end they just work around bugs, which is why I blame a lot of the deficiencies in current software development on end users.