The Epistemology of Software Quality

thephyber · on April 28, 2023

Agree with lots of the criticisms of the piece in other comments.

I do instinctively agree with the theme of the piece that “human factors” have more of an effect on software quality than “technical factors”. I just don’t think the physiological factors listed in the articles are the biggest levers.

Sure, getting enough sleep, having a reasonable diet and exercise are great for people long term, but I think these factors have a much larger impact:

  - company culture,
  - whether a developers feel like they can push back on requirements,
  - whether deadlines / expectations allow for sustainable pace of product development,
  - whether product managers / other stakeholders put objectively bad requirements on the project (eg. trading off security for user convenience)

ChrisMarshallNY · on April 28, 2023

There’s also the very old-fashioned “talent & passion” thing.

Some people have it; some don’t.

If someone has it, it’s generally attached to a persona with various drivers and flaws, and it takes a good manager to direct that talent and passion.

It can get even trickier, when we have teams of these folks.

That’s not a popular stance, these says. We’re supposed to come up with process and technology, that lets mediocre managers run teams of mediocre workers, yet produce excellent results.

Lemme know how that works out.

thephyber · on April 28, 2023

I would agree that there’s probably no match for talent+passion, but the fact is that there aren’t enough people with those attributes to pull an entire economy. Far better to live on the ground (not in the clouds).

That said, processes and technologies don’t have to treat each employee as a newbie. Example: I remember Facebook had a deployment process which gave each developer a reputation. Getting a reputation for deploying code that requires hot fixes to revert broken code meant penalties. Penalties required additional pre-merge / pre-deploy reviews and proof of testing. I remember that as a good combination of treating developers as adults but also providing targeted procedures when they are needed.

hn_user82179 · on April 28, 2023

Huh, how was the reputation determined? Was it like a quantified & automated thing, or more casual like, a senior developer decides you need to do more testing since you’ve broken too much stuff.

thephyber · on April 28, 2023

I learned about this from outside, but my understanding was that negative reputation was earned when someone from the deployment or development team pointed to your code as being the reason a patch had to be rolled back. Presumably because it broke during CI on the master/release branch. Not sure if there were other was to earn/lose reputation points.

giovannibonetti · on April 28, 2023

Given the thousands of developers there, I would assume it was automated. FB is known for having built great tooling for this kind of thing, like measuring how flaky each test is.

disgruntledphd2 · on April 28, 2023

I think that it was originally up to the release engineers, but they probably automated it after they moved to continuous deployments. (chuckr, where are you now?).

rqtwteye · on April 28, 2023

"If someone has it, it’s generally attached to a persona with various drivers and flaws, "

this stereotype needs to die. I have never found any correlation between being talented and passionate and being jerk. Quite the opposite. Most talented people are easy to work with but you should be able to keep up. Most jerks just think they are talented but aren't and have to be jerks to keep up appearances.

throwaway-3-2-1 · on April 28, 2023

I think there is merit to this stereotype and you should not use personal anecdotes to argue ("i have never met/seen/found") against it.

Having a talent for programming imo means being good at abstracting away subproblems into nice encapsulated code units and having a passion for such tasks or the technical environment around it is like being a hobby bureaucrat. Now, tell me, this is normal :)

I could also use personal anecdotes of me, a classic nerd, rubbing incompatible company culture the wrong way.

Previously on HN: Y Combinator - The Cult of Conformity in Silicon Valley https://www.youtube.com/watch?v=ia7IKW0yuG0

ChrisMarshallNY · on April 28, 2023

“various drivers and flaws” doesn’t mean “jerk,” although lots of folks like to assume that anyone not in a fairly narrow band of behavior, is one.

I get treated like a jerk, sometimes, and I’m not one. I just don’t match the “modern software developer” stereotype, and that often makes folks uncomfortable. Most of my employees, while I was a manager, also were quite “different.” We weren’t always sunflowers and unicorns, but we also weren’t prima donnas.

I will tell you one thing: If we treat people with hostility, they will usually return the favor, which can establish an immediate negative feedback loop. Since humans tend to have “other -> hostile” built into our operating system, we sometimes never give cooperation a chance; starting many relationships off on a bad foot, right out the door. We decide someone is a “jerk,” because they fit (or don’t fit) an internal stereotype, and the rest is history, as we make sure that the relationship is going nowhere, from the start.

It can be seen on this very forum. I know that it happens to many others, but I can only speak from my experience.

Fairly regularly, someone that has never had any interaction with me, of any kind, suddenly responds to one of my posts, with hostility; usually in the form of an insult.

I may come across as a bit "stuffy," but I sincerely never mean to offer offense, or throw punches. I do have some personal positions that go against the common grain, but I don't consider myself to be much of a "bomb-thrower." Basically, I feel like it's a privilege to participate here, and try to bring something good to the table.

It was my job, as a manager, to understand each of my employees, as an individual, and my team, as a whole, and keep a balance, while also ensuring that the company’s priorities were being met. Also, there are limits, as to how much individual focus we can give each team member, when it comes to things like employment law and corporate policy, so there’s always tradeoffs.

It’s also been my experience that a good manager (and I like to think I was one) can coax excellence out of almost anyone. We often have a rockstar in us, but each individual has different blockers and accelerators. If a corporation has a culture of mediocrity, then they can force racehorses to pull plows, and Clydesdales to run steeplechase.

Some of my employees were driven by basic avarice, but I can really only think of maybe one or two, during my 25 years as a manager. Most were focused on their families, excitement over their work, being included as peers in high-functioning teams, or the satisfaction of a job well done.

Not everyone in my team always got along with each other, but it’s difficult to have a bunch of self-sufficient high achievers together, without friction. Surprisingly, we managed to stay together, as a team, for decades, and deliver value for our company.

switchbak · on April 28, 2023

You're not particularly wrong, I've just found working with alpha-geek Prima Donnas gets old really fast. If they don't mature as they come along, their antisocial behaviour typically outweighs their contributions by a large measure.

Personally I'd much rather work with bright folks that are empathetic and work well in teams, and trust that they'll learn what they need to when we need them to. Seems to work out pretty well.

ChrisMarshallNY · on April 28, 2023

It's a shame that we immediately assume talented and passionate people are jerks. Not sure where that comes from. I suspect that it may be a US thing, as we are a ferociously competitive culture, and, these days, it seems that everyone on a team considers their teammates to be "the competition."

switchbak · on April 30, 2023

You said "it’s generally attached to a persona with various drivers and flaws", and I likened that to some common personality flaws I've seen in the wild. I don't think that's a wildly unfair interpretation, and I think it was reasonable in this context.

I don't suspect it's a US thing in my case, because I don't live in the US.

Not all talented and passionate people are jerks. Not all Prima Donnas are jerks either, but working with them can be very tiring. I find it to be a losing proposition in mature teams, but they have their place. I say this as a recovering Prima Donna myself.

staunton · on April 28, 2023

Trading off security for user convenience is not objectively bad. In fact, it is always a must. The most secure product is no product.

Of course, what you really mean is that this tradeoff must be made in such a way that the product is still secure enough, as defined by the use-case. I only think this nitpick is warranted because we are discussing "epistemology".

smokel · on April 27, 2023

This article is nice to read, but I believe that some of the arguments are quite flawed. I doubt that empirical evidence is helpful in this context.

Some references to studies are used to make bold statements. The ones on static vs dynamic typing for example, may look scientific, but scanning GitHub repositories or comparing student assignments does not convince me much about professional applications. It just seems nearly impossible to say something evidence based about this topic. Perhaps rewriting the entire Linux kernel along with all its drivers in Rust might be a nice approach. If it were done in parallel, starting in 1991.

Also, applying results from studies in one domain to another is a red flag to me: if productivity drops for construction crews working extra hours, I fail to see how one can conclude that the same holds for software developers. Of course the conclusion is likely, but I don't need a scientific paper for that. The study was from 1980, mind you.

Too bad, because the premise is really interesting and thought provoking. Focusing on the human factor in software development makes a lot of sense.

nunuvit · on April 27, 2023

The fundamentals of defect density have been known for a long time [1]. The factors that explain the vast majority of problems fit on a single slide. They're worded generically to apply across domains and it's very simple to evaluate whatever your process is against them.

[1] https://www.slideshare.net/AnnMarieNeufelder/the-top-ten-thi...

majormajor · on April 28, 2023

Somewhat orthoganal list here. That top 10 is almost all about "release frequently and test exhaustively and have well understood requirements." The linked article is all about "sleep and be healthy and work sustainably." You can be healthy happy and productive and ignore tests; or stressed and unproductive and write shit tests against shit requirements.

nunuvit · on April 28, 2023

The linked article is about productivity and code quality. This list is about defect density, which is one aspect of code quality. It does not address everything in the article, but it does remove the mystery and common misconceptions of that one aspect.

marcosdumay · on April 28, 2023

That's very interesting, I just loved how she couldn't discover the impact of Agile because nobody she looked was doing it right.

switchbak · on April 28, 2023

That tracks with my experience. Or certainly what I hear others say of their experiences. I've struggled and had what I believe to be a fair amount of success with Agile methods in many ways over the last 20 years. What shocks me is how few folks I talk to have similar stories.

Edit: clarification.

marcosdumay · on April 28, 2023

Just to be clear, she didn't test for it because no two people were doing the same thing when they said they did agile.

Her data fits my experience. Your comment seems to reflect a different thing, that may not even be incompatible with that, just not very related.

m3047 · on April 27, 2023

Code review works. Ok, yeah I'd say that is the one practice which I've seen consistently tied to high(er) quality work.

Article is pretty handwavy about epistemology and just things generally. But I'd say not all code reviews are equal, and that the best ones aren't about the code (in the way that baseball is not about bats and balls) and are instead about people learning to talk to each other, about code. Probably also about how that "campfire" provides a venue to discuss other institutional lore and hence culture.

majormajor · on April 28, 2023

There's no better example of the sort of problems "you get what you measure" causes than how even code review can go wrong. I've seen people use code review as an excuse not to test their own code, which is one of those things that made my eyes pop out the first couple of times I looked at a request with code that blatantly didn't compile or run.

That's not to say "don't do code review" but it's worth making sure the team understands the goals and their individual responsibilities vs just blindly putting a process out there. Like you say, "communication" is handwavy but critical.

kqr · on April 28, 2023

Code review really does work, yes. Boehm has it at 60 % defects discovered at 15 % additional effort.

Apparently "directed" or "scenario-based" code review works even better, uncovering an additional 20 or so percentage points of defects. But I have yet to find out what that means! Does it mean having a list of common problems and looking for one at a time? Does it mean taking a very concrete user story and mentally executing the code involved in satisfying it?

m3047 · on April 28, 2023

Best code reviews I've participated in were in a CMM 3 shop which was a division of a large corp. There was overarching method to the madness, but as a lowly engineer serving our manufacturing customer it looked more or less like:

1) pair up (from my POV) randomly (not everybody included in this is a coder)

2) (from my POV) flip a coin to decide who picks an item to review

3) pick an item you've touched recently or if there isn't one something you're going to touch in the next week (could be doc, build scripts, other things depending on what your job is; you might adjust your selection depending on your peer's duties and responsibilites, no sense having them review something they will never ever even look at).

4) flip a coin to decide who is going to explain what the code does

5) one person explains what the code does / doc means to the other person; the other person asks clarifying questions

Repeat the above steps weekly.

There was a sheet that was filled out that recorded defects found in categories such as logic, (failure of) abstraction, requirements, doc. We talked about style but there wasn't really an enforced style guide.

It can be uncomfortable at first to talk about somebody else's code or have them explain your code. You learn to listen and have a sense of humor and be patient. Perhaps the most poignant lessons learned had to do with doc in code (generally more is better but watch out for zombie comments) and sometimes the most clever way of doing something isn't worth the potential for misunderstandings. There was usually something which needed more doc! Improving tests was also a common outcome. We did find logic errors occasionally, which would inevitably lead to discussion around improving tests. Requirements issues and abstraction / architecture issues came up infrequently and typically spawned email or a meeting which came later.

kqr · on April 29, 2023

This sounds interesting! So instead of someone else asynchronously asking clarifying questions of the author, one person synchronously asks clarifying questions of someone who is not the author? I really like the idea that the artifact should be understandable to people who don't have immediate access to the author.

What was the process if the artifact was "rejected" in this type of review and neither half of the pair had enough understanding to fix it?

m3047 · on April 29, 2023

Good questions!

> instead of ... asynchronously

Yes.

> What was the process if the artifact was "rejected"

We only did it once a week and all code was not reviewed. Well, I should say not all code was formally reviewed. And we had tests, and some of them came from "real" Engineering (level 1). We didn't have 100% coverage, but "more tests" was a frequent outcome, and it was implicit that this wasn't limited to the specific artifacts which were reviewed. There might be a timing or integrity issue and so there might be some attempt to determine what the appropriate scope of testing was (the emails / meetings I mentioned).

I don't recall that anything was ever rejected in one of these. [Edit - added] This was mostly front-loaded, because we never wrote code until there was some consensus on what change was needed. So some of these were somebody bringing a "problem child" and either walking through it or enjoying the entertainment as some rando did it.

The process existed with the implicit goal of levelling everybody into some shared framework of discussing code. We would reach out to peers all the time for informal review and it was pretty frictionless (email somebody with a branch and particular range of LOC). About the only time you wouldn't reach out like that was if you had pretty close to 100% test coverage. 8-)

> neither ... of the pair had enough understanding to fix it

In another fine example of "moral hazard for good" that never happened! Less facetiously those ended up being the requirements / architecture issues. So as far as the process was concerned the pair reached a consensus.

> synchronously asks clarifying questions of someone who is not the author

Yes, about half the time the person explaining the code was the one who'd never seen it before that moment. LOL. Brutal. OTOH I don't know if it's more brutal than explaining code to someone whose job doesn't actually include coding... Just different.

Like I said, it improves people's ability to talk about code and creates a culture that makes it easier to seek help the rest of the time.

kqr · on April 29, 2023

Thank you for sharing! It's clear you've thought about this a bit. Do you happen to have a blog or similar where you publish more of your thoughts?

m3047 · on April 29, 2023

> Do you happen to have a blog

No I do not. I've got an unindexed web server where I post random stuff from time to time, and occasionally I write articles on LinkedIn.

I'm heartened you care. I care about quality, what gets me up in the morning is helping people up their game. But like security (which is largely a quality problem) quality is something you do not something you buy (baseball comment farther upthread) and it's been an even harder sell since the VCs were given zero interest money to throw around.

The disciplines where people (are supposed to) care about quality and measurement seem to be captured by vendors with checklists and pseudo managers with alphabet soup after their names. I'm in a niche market which largely gets papered over by those parties. Most of what I'd write has been or would be co-opted and therefore would sound the same as what they'd write and I'd just be lost in the noise.

I'm open to smart ideas for finding the people who care enough about quality to realize that it's actually something you have people do utilizing the expensive toys you buy.

flurgle · on April 27, 2023

There was no epistemology in this article.

whitemary · on April 27, 2023

Yeah, it gives a brief critique of existing ontologies but does not proceed into epistemology.

actionfromafar · on April 27, 2023

I guess we are it then?

_match · on April 27, 2023

> Static typing? One study, presented at FSE 2014, found no evidence that static typing is helpful—or harmful

And yet the abstract of the linked paper says:

> Most notably, it does appear that strong typing is modestly better than weak typing, and among functional languages, static typing is also somewhat better than dynamic typing.

luu · on April 27, 2023

If you click through the link in that sentence to https://danluu.com/empirical-pl/ or read the study itself, you'll see that the paper doesn't support the claims made in the abstract at all.

It used automatic classification that's obviously wrong. Table 1 gives a list of "top" projects for each language and many of them are simply misclassified.

> ... the "top three" TypeScript projects are bitcoin, litecoin, and qBittorrent). These are C++ projects. So the intermediate result appears to not be that TypeScript is reliable, but that projects mis-identified as TypeScript are reliable. Those projects are reliable because Qt translation files are identified as TypeScript and it turns out that, per line of code, giant dumps of config files from another project don't cause a lot of bugs. It's like saying that a project has few bugs per line of code because it has a giant README. This is the most blatant classification error, but it's far from the only one.

> For example, of what they call the "top three" perl projects, one is showdown, a javascript project, and one is rails-dev-box, a shell script and a vagrant file used to launch a Rails dev environment. Without knowing anything about the latter project, one might expect it's not a perl project from its name, rails-dev-box, which correctly indicates that it's a rails related project.

There are other major problems with the study, but that one is sufficient to make the results invalid.

_match · on April 27, 2023

Are you the author of the meta analysis? If so, thanks for your work on that.

But I was not commenting on the quality of the studies. I did not think the author of "The Epistemology of Software Quality" was either.

nathias · on April 28, 2023

I'm more on the technical solution team, mostly because the problems of the whole process (including your sleepless nights) will be determined by the particularities of your technologies. You can choose your technology well or you can choose poorly and go down the rabbit hole of new and new secondary problems and issues that only exist because of that poor decision. Practical wisdom is very effective.

That said of course life/work balance is an important aspect and we should strive to avoid work cultures that value time invested versus output, because they reward mostly the appearance of bussiness instead of useful work, so you get both negatives workers are tired from working hard on appearing to be busy and the production becomes secondary, lower quality and slower.

evolve2k · on April 27, 2023

Great article but I’m left with that it doesn’t actually address the title. Academically what is the epistemology of software quality? I sense it’s also related to practices of software craftsmanship, of which taking care of yourself and your sleep could be interpreted to fit within a broader goal of “improving my quality as a software developer”.

okaleniuk · on April 28, 2023

What baffles me is that everyone talks about software quality but very little organization actually measure their software quality. Not just some made up metrics but exactly how well the software meets its requirements.

And not that we don't know how. For any ML model, we have a validation data set and it is imperative that we measure how well a model performs on this data set and not the training data set. We know that without a validation, a machine will overfit its model into the data it has. Programmers... do the same thing. We're very good at passing tests and, unless we have a separate independent validation and verification process, we convince ourselves that green tests = quality. So our tests are always green but our backlogs are always red. And nobody seems to notice the contradiction.

flappyeagle · on April 28, 2023

What would you have us do? There is no such validation process.

okaleniuk · on April 28, 2023

Literally independent validation and verification. https://csrc.nist.gov/glossary/term/independent_verification...

In some domains, doing IV&V to see how well your software really works is required by regulations, in others... we just guess.

_a_a_a_ · on April 28, 2023

Agree. A positive suggestion would be welcome and very interesting to me.

okaleniuk · on April 28, 2023

Sorry, I should have been more clear. Independent validation and verification is a thing: https://csrc.nist.gov/glossary/term/independent_verification.... We still use it in, for instance, NPP automation, avionics, and defense. Which not only makes sense but usually required by law. Interestingly, outside these few domains, we usually omit it as too costly as if doing the first and only independent validation with our own users is not.

_a_a_a_ · on April 28, 2023

Useful to know, thanks. Trouble is, it's going to be damn expensive and I know no shortage of bosses who will become angry at having failures pointed out. It really takes money and a good mindset.

> we usually omit it as too costly as if doing the first and only independent validation with our own users is not

Users will accept absolute shite and that is definitely a cost saving. Unfortunately. End users complain a lot but in the end they'll just work around bugs and this is why I blame a lot of the deficiency in current software development on end users.

m3kw9 · on April 27, 2023

For sure like top tier athletes, they know it isn’t their skill that could hinder them the most, but sleep. Your brain needs to be sharp to beat other world class athletes to the punch. But it’s just a factor, a big factor. Skill matters too but that comes from years of practice

switchbak · on April 28, 2023

A huge part of keeping your brain sharp is getting lots of exercise too!

evolve2k · on April 27, 2023

I recently added standardb a very comprehensive linter to an old community project. https://github.com/demingfactor/calagator

The tools detected an absolute wall of code smells and security bad practices. Surprised that linters don’t have a reported quality impact. I personally was very glad for the issues it auto resolved and the issues it bought to my attention to address.

lifeisstillgood · on April 27, 2023

The things that make a team more effective are small things like psychological safety.

But at a level of 2 orders of magnitude above the team say (ie team(s) stop being teams at 100 people - so imagine you manage 10,000 people) the psychological safety factors stop mattering as much as just throw bodies at the wall till you break through.

not sure how to fix that

nathias · on April 28, 2023

Poorly titled, I expected an article about how we can know software quality more objectively or something like that, as quality is always somewhat subjective or culturally dependent.