I took his post as a fairly standard "worse is better" argument. Many newer or more fashionable languages enable some very elegant programming styles, but this comes with a concern about the elegance of one's code, which can easily result in the programmer spending more time thinking about elegance than about functionality. For example... should I use map or reduce here, or maybe an iterator, or... oh fuck it, I'll use a for loop. It turns out that the old-school for loop works just as well despite earning you precisely zero style points.
There's actually something quite liberating about languages that deny you any clever solutions. Just write code that works and don't worry about whether you could have used <fashionable programming concept X> instead. I found Go to be quite refreshing for the same reason - the standard library is pretty small and the language lacks anything particularly magical, even some stuff that Java has like generics. The end result, however, is code that is very easy to read and write for anyone who learned imperative programming in the last 30 years.
> Many newer or more fashionable languages enable some very elegant programming styles, but this comes with a concern about the elegance of one's code, which can easily result in the programmer spending more time thinking about elegance than about functionality.
That's a common anti-intellectual fallacy about why people code in higher level languages. High level languages are attractive because they let you write code faster and make your code easier to maintain, and not because they're "fashionable".
>There's actually something quite liberating about languages that deny you any clever solutions
To me, "deny[ing] clever solutions" translates to "this code is going to be a pain in the ass to write because I'm going to waste time fighting the language to get what I want." Programmers learn to avoid unnecessarily complicated code as they gain more experience. Mature programmers should have the freedom to exercise their judgement about what kind of code is appropriate.
> That's a common anti-intellectual fallacy about why people code in higher level languages. High level languages are attractive because they let you write code faster and make your code easier to maintain, and not because they're "fashionable".
Writing code faster is not my problem. The problem is building systems (that work) faster, and that sometimes requires using higher-level abstractions that maybe allow for more modularity, that maybe are more composable, or that maybe are safer to use (I'm thinking specifically of concurrency, parallelism, asynchronous I/O, etc.). Higher-level abstractions are my goal when learning a new language (that, or gaining access to another platform).
"Clever solutions" is indeed an unwarranted euphemism that people use. But we end up having such discussions precisely because we aren't defining well what we are talking about. Writing less code is a subjective problem. Not being able to build and/or work with a certain abstraction is an objective problem.
>High level languages are attractive because they let you write code faster and make your code easier to maintain, and not because they're "fashionable".
Maybe, maybe not. But at least 2 of the 3 points that you made are not true of JavaScript. Also, the whole "write code faster" thing has always been perplexing to me. The speed at which you write code is never different enough between any two mainstream languages to really matter in the end, especially since the lifetime of a piece of software is dominated by maintenance.
And I don't know what you mean by "High level language". Java is a high level language.
>To me, "deny[ing] clever solutions" translates to "this code is going to be a pain in the ass to write because I'm going to waste time fighting the language to get what I want."
"Also, the whole "write code faster" thing has always been perplexing to me. The speed at which you write code is never different enough between any two mainstream languages to really matter in the end, especially since the lifetime of a piece of software is dominated by maintenance."
One of the very few bits of relatively solid software engineering that we have is that line count for a given task does matter. Fewer lines written by the programmer to do the same thing strongly tends to yield higher productivity. (Note the "by the programmer" clause; lines autogenerated... well... correctly autogenerated tend not to count against the programmer, which is an important part of doing Java, or so I hear.)
And remember, if this were not true, we'd be programming entirely differently; why do anything but assembler if line count doesn't matter? You might be tempted to think that's some sort of rhetorical exaggeration, since it sort of sounds like one, but it's not; it's a very serious point. If line counts were actually irrelevant, we'd never have bothered with high-level languages, which until fairly recently had as their primary purpose doing a whole bunch of things that, in the end, reduce line count.
(Slowly but surely we're starting to see the mainstream introduction and use of languages that also focus on helping you maintain invariants, but historically that has been a sideline task served by niche products.)
> Note the "by the programmer" clause; lines autogenerated... well... correctly autogenerated tend not to count against the programmer, which is an important part of doing Java, or so I hear.
Sorry, but unless you have an architect dissect the problem to exhaustion and freeze the architecture afterward, no piece of software creates the correct autogenerated code, and those lines still have to be changed by the programmer. Several times.
And if you do have an architect dissect the problem to exhaustion and freeze the architecture afterward, that's already a bigger problem than dealing with all that autogenerated code. No win.
In theory, I have a citation. There have been actual studies done that show roughly equal productivity in several languages as measured by lines of those languages. However, I can't google them up through all the noise of people complaining about line counts being used for various things. And I phrased it as "one of the very few bits of relatively solid software engineering" on purpose... that phrase isn't really high praise. You can quibble all day about the precise details, not least of which is the age of the studies in question.
Still, I do stick by my original point... if you think lines of code are irrelevant, it becomes very difficult to understand the current landscape of language popularity. A language in which simply reading a line from a file is a multi-dozen-line travesty is harder to use than one in which it's two or three, and that extends through the rest of the language. I know that when I go from a language where certain patterns are easy into a higher-B&D language where the right thing is a lot more work, I have to fight the urge to skip the lot-more-work, and this higher-level "how expensive is it to use the correct pattern?" question is a far more important, if harder to describe, consideration across non-trivial code bases.
1) How do you count "expressions"? Is (b + sqrt(b * b - 4 * a * c)) / (2 * a) one expression or 14?
2) Assuming reasonable coding style and reasonable definition for what an "expression" is, the variance of the measurement "expressions per line" will be very small - thus, "number of expressions" and "number of lines" are statistically equivalent as far as descriptive power goes.
I don't have a citation, although I do remember this conclusion mentioned in PeopleWare - specifically, that "number of bugs per line" tends to be a low variance statistic per person, with the programming language playing a minor role. I might be wrong though.
But I can offer my personal related experience ("anecdata"?) - when you ask multiple people to estimate project complexity using a "time to complete" measure, you get widely varying results that are hard to reason about. However, when you ask them to estimate "lines of code", you get much more consistent results, and meaningful arguments when two people try to reach an agreement. YMMV.
I feel like you probably haven't coded I/O in a language like C# (in earlier versions anyway) or Java if you think I'm playing semantics.
Expressions are distinct from Compositions, and both influence LOC. I wouldn't suspect that Java software is of generally lower quality than Ruby code on average for example even though in Java you might see a Reader around a Buffer around a Stream instead of Ruby's `open`.
I guess what I'm getting at is what you might loosely call boilerplate. Java has a lot more boilerplate, which could easily result in 2x higher LOC. Having worked with more Ruby than the average bear, I feel very confident being skeptical of the assertion that Ruby libraries generally have higher quality/fewer bugs.
I think your last anecdote is more getting into Five Whys territory, and it's probably reasonable to expect a greater degree of consensus then.
Final note: Scala is typically less verbose than Ruby by a fair margin (at least if you leave off imports). Idiomatically usage is also Functional to a significant degree in a way that no Ruby library I've ever seen comes close to. So does that automatically mean that Scala is the superior language? (Well of course it is ;-D, but is that the reason?)
The question is simple, and it's about math and statistics.
How do you count lines? On Unix, "wc -l"; if you insist, sloccount, but "wc -l" is a good approximation.
How do you count expressions? The fact it will take you a few paragraphs to answer (you haven't, btw) indicates that it's a poor thing to measure and try to reason about.
I've done some I/O code in C# (mostly WCF, but not just), and I still think you are playing with semantics as far as statistics is concerned.
Figure out an objective, automatable way to count your "expressions" or "compositions" or "code points" or "functional points" or whatever you want to call it. Run it on a code base, and compute the Pearson r coefficient of correlation. It's likely to be >95%, which means one is an excellent approximation of the other.
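A minimal Python sketch of that experiment, purely as my own illustration - the thread never specifies a counter, so here "expression" is arbitrarily defined as an `ast.expr` node in the stdlib `ast` module:

```python
import ast

def count_expressions(source: str) -> int:
    """One automatable definition of "expression": ast.expr nodes."""
    return sum(isinstance(n, ast.expr) for n in ast.walk(ast.parse(source)))

def count_lines(source: str) -> int:
    """Roughly what `wc -l` would report."""
    return len(source.splitlines())

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

# Run both counters over the files of a code base, then correlate.
# These toy "files" stand in for a real source tree:
files = ["a = 1 + 1\n",
         "print(2 * 3)\nxs = [1, 2]\n",
         "def f(x):\n    return x + 1\n"]
r = pearson_r([count_lines(f) for f in files],
              [count_expressions(f) for f in files])
```

Whether r really exceeds 0.95 on a given code base is an empirical question; the point is only that the comparison is mechanical once you pick any objective counter.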
And I have no idea what you were trying to say about Scala. I wasn't saying "terser is automatically better". I was saying (and I'm quoting myself here) that "number of bugs per line" tends to be a low-variance statistic per person, with the programming language playing a minor role. Note "per person"?
So backwards first I guess. "per person". Ok. But given the range of programmers I guess that's not an incredible surprise. Yes the person is more important than the language. I'd buy that.
I guess "expression" seems semi-obvious to me since it's a standard rule in SBT. Variable assignments, return values and function bodies might get close.
val a = 1 + 1
That would be an expression. Instantiating a DTO with a dozen fields, using keyword arguments and newlines between for clarity would be a single expression to me.
An if/else with a simple switch for the return value would be an expression for example. A more complex else case might have nested expressions though.
It takes some charity I suppose; one of those "I know it when I see it" things. I don't do a lot of Math based programming though. It's all business rules, DTOs, serialization, etc. So maybe not something that could be formalized too easily.
I guess where I'd intuitively disagree (and would be interested in further reading) is that LOC as a measure just doesn't feel like it works for me.
Considering only LOC to implement a task, it's likely Java, Ruby and Scala in that order (from most to fewest). But in my personal experience, bugs are probably more like Ruby, Java, Scala, from most to fewest.
Hopefully that helps clarify and not just muddy what I'm trying to express further.
What confuses me is that you appear to be claiming that fewer LOC should correlate strongly with fewer bugs, but then go on to say that terser is not automatically better (in this context (sic?)). Maybe I'm reading more into it than you intend, but I'm left a bit confused.
Which is a confusing use of the term "expression", since it is very well defined when talking about languages - in fact, most formal grammars have a nonterminal called "expr" or "expression" when describing the language.
Your description, though, more closely correlates with what most languages consider a statement.
Regardless, it's just pure statistics - if you calculate it, you'll notice that you have e.g. 1.3 expressions per line, with a standard deviation of 1 expression per line - which means that over 1000 lines you'll have, with 95% confidence, roughly 1200-1400 expressions -- it wouldn't matter whether you measure LOC or "expressions".
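For concreteness, here's the arithmetic behind that claim, under the simplifying assumption (mine, not stated above) that per-line counts are independent - which actually makes the interval even narrower than the quoted 1200-1400:

```python
n = 1000               # lines measured
mean_per_line = 1.3    # expressions per line
sd_per_line = 1.0      # standard deviation per line

total_mean = n * mean_per_line        # expected total over all lines
total_sd = sd_per_line * n ** 0.5     # SDs of independent lines add in quadrature
lo = total_mean - 1.96 * total_sd     # approximate 95% normal interval
hi = total_mean + 1.96 * total_sd
print(f"95% interval: about {lo:.0f}-{hi:.0f} expressions in {n} lines")
```

With any positive correlation between lines the interval widens, but either way the expression count tracks the line count closely.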
> What confuses me is that you appear to be claiming that fewer LOC should correlate strongly with fewer bugs, but then go on to say that terser is not automatically better (in this context (sic?)). Maybe I'm reading more into it than you intend, but I'm left a bit confused.
What I'm claiming is that, when people actually measured this, they found out that a given programmer tends to have a nearly constant number of bugs per line, regardless of language - that is, person X tends to have (on average) one bug per 100 lines, whether those lines are C, Fortran or Basic - the variance per programmer is way larger than the variance of that programmer per language.
Now, PeopleWare which references those studies (where I read about that) was written 20 years ago or so - so the Java or C++ considered wasn't today's Java/C++, things like Scala and Ruby were not considered. However, I'd be surprised if they significantly change the results - because those studies DID include Lisp, which -- even 20 years ago -- had everything to offer that you can get from Scala today.
So, in a sense - yes, you should write terse programs, regardless of which language you do that in. If you wrote assembly code using Scala syntax, and compiled with a Scala compiler - Scala is not helping you one bit.
No, his point is that you don't have to fight the language. The course in the language is obvious, it's just long and kind of ugly. It's wading, not fighting.
My problem is that the solution in a slightly more expressive language is equally obvious, so I have to fight my own rage at java for its stuttering and clumsy excuse for closures. Fighting the language, on the other hand, is when I try to write a conditional or loop in sh.
> Many newer or more fashionable languages enable some very elegant programming styles, but this comes with a concern about the elegance of one's code, which can easily result in the programmer spending more time thinking about elegance than about functionality.
This isn't a bad tradeoff though. Elegance generally means gains in maintainability, possibly with lesser costs in the actual development. And thinking about elegance in code is the first step toward writing better, more maintainable code.
> For example... should I use map or reduce here, or maybe an iterator, or... oh fuck it, I'll use a for loop. It turns out that the old-school for loop works just as well despite earning you precisely zero style points.
I usually stick with for loops because they are clear. Remember:
elegance is about simplicity and clarity. If you sacrifice either of these, you are reducing the elegance of your code.
The "worse is better" argument is in the context of Unix and C and cannot be separated from that context, otherwise it is meaningless.
And a lot of thought went into Unix, as evidenced by its longevity and the long-lasting tradition of its philosophy. To date it's the oldest family of operating systems and, at the same time, the most popular. Anybody who thinks the "worse" in the "worse is better" argument is about not caring is in for a surprise: http://en.wikipedia.org/wiki/Unix_philosophy
Even in the original comparison to CLOS/Lisp Machines outlined by Richard Gabriel, he mentions this important difference (versus the MIT/Stanford style): it is slightly better to be simple than correct.
But again, simplicity is not about not caring about design or the implementation, and in fact the "worse is better" approach strongly emphasizes readable/understandable implementations. And simplicity is actually freaking hard to achieve, because simplicity doesn't mean "easy" - it's the opposite of entanglement/interweaving: http://www.infoq.com/presentations/Simple-Made-Easy
"Worse is better" can easily be separated from that context, though I would admit that most people do it incorrectly.
"Worse is better" is, ultimately, an argument against perfectionism. Many of the features of Unix could have been implemented in a "better" way, and these ways were known to people working at the time. But it turns out that those "better" options are much more difficult to implement, harder to get right and are ultimately counter-productive to the goal of delivering software that works. We can set up clear, logical arguments as to why doing things the Unix way is worse than doing things another way (e.g. how Lisp Machines would do it), but it turns out that the Unix approach is just more effective. Basically, although we can invent aesthetic or philosophical standards of correctness for programs, actually trying to follow these in the real world is dangerous (beyond a certain point, anyway).
I think that's pretty similar to the OP's argument that, whilst Haskell is clearly a superior language to Java in many respects, writing code properly in Haskell is much harder than doing so in Java because, probably for entirely cultural reasons, a programmer working with Haskell feels a greater need to write the "correct" program rather than the one that just works. Java gives the programmer an excuse to abandon perfectionism, producing code that is "worse" but an outcome that is "better".
I think I know what you're getting at, which is that a comparison between Unix and the monstrous IDE-generated Java bloatware described in the OP is insulting to Unix. On this you are correct. But for "worse is better" to be meaningful, there still has to be some recognition that, yes, Unix really is worse than the ideal. Unix isn't the best thing that could ever possibly exist, it's just the best thing that the people at the time could build, and nobody has ever come up with a better alternative.
I do not agree. "Worse is better" emphasizes simplicity - for example, the emphasis on separation of concerns by building components that do one thing and do it well. It's actually easier to design monolithic systems than it is to build independent components that are interconnected. Unix itself suffered where it compromised its philosophy - it's a good thing that Plan 9 exists, with some of its concepts ending up in Unix anyway (e.g. procfs comes from Plan 9). And again, simplicity is not the same thing as easiness.
> Haskell is clearly a superior language to Java in many respects, writing code properly in Haskell is much harder than doing so in Java
I do not agree on your assessment. Haskell is harder to write because ALL the concepts involved are extremely unfamiliar to everybody. Java is learned in school. Java is everywhere. Developers are exposed to Java or Java-like languages.
OOP and class-based design, including all the design patterns in the Gang of Four book, seem easy to you or to most people because we've been exposed to them ever since we started to learn programming.
Haskell is also great, but it is not clearly superior to Java. That's another point I disagree on, the jury is still out on that one - as language choice is important, but it's less important than everything else combined (libraries, tools, ecosystem and so on).
I think Worse is Better can be used by either side. You seem to be on the "Worse" side, ie. the UNIX/C/Java side, and claim the moral of WIB to be that perfect is the enemy of good. That's a perfectly fair argument.
However, on the "Better" side, ie. the LISP/Haskell side, the moral of WIB is that time-to-market is hugely important. It's not that the "Better" side was bogged-down in philosophical nuance and was chasing an unattainable perfectionism; it's that their solutions took a bit longer to implement. For example, according to Wikipedia C came out in '72 and Scheme came out in '75. Scheme is clearly influenced by philosophy and perfectionism, but it's also a solid language with clear goals.
The problem is that Scheme and C were both trying to solve the 'decent high-level language' problem, but since C came out first, fewer people cared about Scheme when it eventually came out. In the mean time they'd moved on to tackling the 'null pointer dereference in C problem', the 'buffer overflow in C' problem, the 'unterminated strings in C' problem, and so on. Even though Scheme doesn't have these problems, it also doesn't solve them "in C", so it was too difficult to switch to.
Of course, this is a massive simplification and there have been many other high level languages before and since, but it illustrates the other side of the argument: if your system solves a problem, people will work around far more crappiness than you might think.
More modern examples are Web apps (especially in the early days), Flash, Silverlight, etc. and possibly the Web itself.
> The problem is that Scheme and C were both trying to solve the 'decent high-level language' problem, but since C came out first, fewer people cared about Scheme when it eventually came out. In the mean time they'd moved on to tackling the 'null pointer dereference in C problem', the 'buffer overflow in C' problem, the 'unterminated strings in C' problem, and so on. Even though Scheme doesn't have these problems, it also doesn't solve them "in C", so it was too difficult to switch to.
C is quite odd in that the programmer is expected to pay dearly for their mistakes, rather than be protected from them. BTW it wouldn't be as much fun if they were protected.
Regarding Scheme, it has withstood the test of nearly forty years very well.
C is unique because it's really easy to mentally compile C code into assembler. Scheme is more "magical".
The more I learn about assembler, the more I appreciate how C deals with dirty work like calling conventions, register allocation, and computing struct member offsets, while still giving you control of the machine.
On the other hand, some processor primitives like carry bits are annoyingly absent from the C language.
> For example... should I use map or reduce here, or maybe an iterator, or... oh fuck it, I'll use a for loop.
Use the least powerful tool that solves the problem. A map is less powerful than a fold (reduce), so use a map if it suffices. A fold is in turn probably less powerful than a for loop, so prefer the fold if it suffices. Likewise, an iterator is probably less powerful than an indexed for loop, so use the iterator when all you need is streamlined access to the elements rather than a fine-grained, indexed loop.
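In Python terms (my own choice of language for illustration), that power ordering looks like this - each tool admits strictly more behaviors than the one before it:

```python
from functools import reduce

xs = [3, 1, 4, 1, 5]

# map: one output per input, no state carried between elements
doubled = [2 * x for x in xs]

# fold/reduce: threads an accumulator through a single fixed traversal
total = reduce(lambda acc, x: acc + x, xs, 0)

# for loop: arbitrary control flow - early exit, multiple accumulators, etc.
first_even = None
for x in xs:
    if x % 2 == 0:
        first_even = x
        break
```

A reader who sees the map knows no element can affect another; a reader who sees the fold knows there is exactly one pass; the for loop promises nothing, which is why it's the tool of last resort under this rule.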
But I know how to write a for loop in 15 different languages, and all of those other solutions work differently! If I have a problem that I can solve with a for loop, I should probably just use a for loop and move on to bigger problems, rather than remind myself exactly how iterators work in $LANGUAGE.
But I know how to write a GOTO in 15 different machine code dialects, all of those other solutions work differently! If I can solve with a GOTO, I should probably just use a GOTO and move on to bigger problems, rather than remind myself exactly how loops work in $LANGUAGE.
Goto has a problem: it breaks the structure of the code. For does not have this problem. Thus you should avoid goto, and there is no reason to avoid for. The fact that you know how to write both changes what?
Actually, goto doesn't have a problem; and it wouldn't matter in modern languages that deny access to that low-level instruction.
In BASIC you can use your labels and goto statements to create a hierarchy of dispatching routines. And if you're very disciplined about which global variables you use as flags to control the execution flow (i.e. which goto statement gets selected next) and which variables hold return values, you can write a decent program.
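That dispatch style can be sketched in Python (which has no goto) - purely illustrative, with all names my own:

```python
def countdown(n):
    """A toy program in goto/dispatch style: a label variable selects the
    next block to run, much like a BASIC goto driven by flag variables."""
    label = "start"
    result = []
    while label != "end":
        if label == "start":
            label = "loop"           # unconditional "goto loop"
        elif label == "loop":
            if n <= 0:
                label = "end"        # conditional "goto end"
            else:
                result.append(n)
                n -= 1
                label = "loop"       # "goto loop" again
    return result
```

The program works, but all of its control flow lives in one mutable variable - which is exactly the discipline burden the parent comment describes.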
The danger lies in that it allows the programmer too much range to produce low-quality code. The negative effects include too much attention to the implementation of abstractions instead of their usage. The existence of goto statements can undermine other abstractions offered by the programming language.
Goto doesn't break structure of the code, programmers do. (I guess that's the whole reason we stopped using it: reducing the risk of making crappy software)
Certainly. Use of goto to implement structured programming is structured programming, but if you're implementing control flow structures that are provided by your language anyway why are you bothering to use goto? The result will be slightly less readable and slightly less maintainable. There remain a few places where a commonly used language doesn't implement the control structure that we'd want to use and goto can be a reasonable choice - the most common example is using goto to implement (limited) exception handling in C.
The point Dijkstra was trying to make is that humans are inherently incapable of dealing with that kind of detailed complexity, and still reliably make useful programs. That's why he proposed that goto should be excluded from all higher-level programming languages.
In my comment structured programming refers to using structured syntax to generate goto statements, so you don't have to see them or implement them yourself. It should free the programmer of considering those alternative ways of controlling program flow. Presence of goto statement points to a flaw in the language design.
To answer your question:
Because the BASIC I'm referring to was on my TI-84 Plus calculator, and it only had an if-statement (no if-else!).
"The point Dijkstra was trying to make is that humans are inherently incapable of dealing with that kind of detailed complexity, and still reliably make useful programs. That's why he proposed that goto should be excluded from all higher-level programming languages."
There is a very simple isomorphism between each of the typical control structures (sequencing, choice, and iteration) and its implementation with gotos. It's an easy mechanical translation, in either direction. I don't think Dijkstra was making any claim that spelling these control structures with goto radically increased the difficulty of programming. The important thing was using reasonable control structures (and only reasonable control structures) in the design of your program. Obviously, having the language do it for you is preferred much like any other mechanical translation - but that's not the key point.
"It should free the programmer of considering those alternative ways of controlling program flow. Presence of goto statement points to a flaw in the language design."
I don't disagree with any of that.
"Because the BASIC I'm referring to was on my TI-84 Plus calculator, and it only had an if-statement (no if-else!)."
That's still an example of using goto to implement missing control structures, not using goto when the control structure you want is present.
The point is that different languages have different features and even those with similar features may place different emphasis on which to use. By writing code for the 'lowest common denominator' like this, you're missing the advantages of whichever language you're using.
The most obvious symptom would be with libraries; even though GOTOs, loops, recursion and morphisms are equivalent, if most libraries expose APIs in a different style than your code, you'll have to sprinkle translation code all over the place.
It also makes a difference for static analysis, eg. in IDEs and linters. For example, Iterators might give you more precise type hints or unhandled-exception warnings, which makes that style safer (all else being equal).
Of course, the other point is that there are no such 'lowest common denominators'. GOTO certainly isn't, since Python, Java, etc. don't support it. For loops aren't, since languages like Prolog, Haskell, Erlang and Maude don't support them. Recursion isn't supported in COBOL or OpenCL, and took several decades to appear in FORTRAN. Morphisms require first-class functions, which rules out FORTRAN and C, and until very recently C++ and Java. It may or may not be possible to build such features from the other ones, but even so that's clearly working against the language and causes massive incompatibility with everyone else.
Clinging to particular features like this will only blinker us to the possibilities which are out there. In this case, clinging to for loops implies avoidance of at least Erlang, Haskell and Prolog. These languages have a lot to offer, and are/look-to-become the go-to solutions* in the domains of concurrent, high-confidence and logical/deductive code respectively. Clinging to inappropriate concepts dooms us to 'reinventing the square wheel'.