If you have access to the ACM and IEEE digital libraries, you'll find that they publish most of this literature.
I know where to find research literature. I'm asking for specific citations. Is the claim an urban legend? If "studies show" X, one ought to be able to point to the studies.
How many lines is this?
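Concretely, take the same statement written three ways (an illustrative stand-in; f, a, b, c, d are made-up names):

    # illustrative; all three are the same statement
    total = f(a, b) + f(c, d)

    total = f(a, b) + \
            f(c, d)

    total = f(
        a, b) + f(
        c, d)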
All three of your examples have the same number of tokens, so to judge by them alone, token count is not just a good measurement of code size, it's a perfect one. My question is what's wrong with it.
I don't see how "logical lines", whatever that is, can possibly be simpler than counting tokens. In fact I don't see how anything can be simpler than counting tokens, since it's easy to know what it means, a tokenizer is always available, and everything irrelevant to the program is by definition dropped.
At least by the definition I'm accustomed to, a logical line is a statement or series of statements that directly and logically belong together, for example a function call or an arithmetic operation.
Logical lines are independent of physical lines: each logical line can be split over multiple physical lines (e.g. splitting up a long string I/O operation), or one physical line can contain multiple logical lines (though that is a bad idea in most cases).
Since the definition hinges a bit on what somebody considers "logically belonging together", the whole concept is a bit fuzzy. Consider this string formatting operation (Python):
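(A stand-in example; the names are invented, but the shape is what matters.)

    # illustrative stand-in; user, inbox, etc. are made-up names
    message = "Dear {title} {name}, you have {n} unread messages.".format(
        title=user.title,
        name=user.last_name,
        n=len(user.inbox),
    )

You could count that as a single logical line (it is, after all, one assignment), or as several (the .format() call, the two attribute lookups, the len() call), since each of those arguably "belongs together" on its own.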
Both are valid interpretations of logical lines, but they are visually and conceptually quite different, which makes it, imho, a bit problematic to use them as a measure of code quality.
You make sense, but the concept itself seems so fuzzy and hard to nail down that I marvel at how it ever arose, given that there already exists an unambiguous and ubiquitous way to distill the logical structure of code free of textual artifacts.
I think token count hasn't gained traction because of languages with heavy syntax and explicit types. "Surely all the boilerplate in defining a class doesn't make it more complex," goes the argument.
I prefer to turn the debate on its head. Rather than argue about complexity metrics (boring), I say I prefer languages without boilerplate because they make complexity harder to camouflage.
One can argue that boilerplate doesn't add to complexity (though I don't agree). But no one can argue that it doesn't add to code size. The studies cited in this thread show either linear or superlinear growth of bug count with code size. If those studies are correct, doesn't that rather settle the issue?
One way I could see physical lines being the more useful measurement is if there is any correlation between bugs and the percentage of the code you can see on-screen at any one time.
Not saying that there is such a correlation, just that there may be cases where it is useful to measure by line count.
> I don't see how "logical lines", whatever that is, can possibly be simpler than counting tokens.
I wasn't rejecting token counts per se. I think that it's a useful metric too.
What I was trying to convey is that "logical lines" is the term used in the literature. Logical lines can cover token counts if you define 1 token = 1 logical line. Or they might not. Either way, you have to settle on a definition.