Leveraging the type system to avoid mistakes (beyondthelines.net)
167 points by m1245 on May 12, 2018 | 116 comments


Something that I enjoy very much in Scala is sealed traits + case classes.

  sealed trait Food
  case class FishTaco(size: FishSize) extends Food
  case class Burger(numPatties: Int) extends Food
  case class Spaghetti(sauce: PastaSauce) extends Food

It’s like Enums + Data Classes on steroids. You can pass around and transform your data as usual, but if you forget to handle a specific case when doing pattern matching you’ll get a compile warning. For example in the following case:

  input match {
    case Burger(patties) => ...
    case Spaghetti(sauce) => ...
  }
The compiler will happily yell at you something like:

  warning: match may not be exhaustive.
  It would fail on the following input: FishTaco(_)

Little things such as this one help you reason about your program, since you can rely on the types to describe what is possible and what is not.


If you like that kinda thing you should check out F# as well. The IDE experience/speed is much better for F# IMHO, but I have a lot of respect for the concepts behind both. And of course they ride on completely separate massive ecosystems, so you can't exactly swap them out willy-nilly.

I wish I could figure out why my IDE experience with Scala and IntelliJ(IDEA) was so darn sluggish. Also hoping they jump on the language server protocol bandwagon in the future.


Yep, these are called "discriminated unions" in F#. I think most functional programming languages have something similar.

The underlying concept is "algebraic data types": You can compose types as products (tuples or records) and sums (discriminated unions) of simpler types. From there, you can actually use algebra on types, which is pretty mind-blowing. For example, we can define a generic sum type called Either:

    Either a b = Left a | Right b
You can think of Either as the sum of types "a" and "b": a + b. Tuples are product types, so you can use the distributive property of multiplication over addition to establish that:

    (a, Either b c) = Either (a, b) (a, c)
Or algebraically:

    a * (b + c) = (a * b) + (a * c)
This is one of those areas where category theory is actually useful for programmers.
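
To make that concrete, here is a minimal Scala sketch (the function names are mine) witnessing the isomorphism in both directions:

    // (A, Either[B, C]) <-> Either[(A, B), (A, C)], i.e. a * (b + c) = a*b + a*c
    def distribute[A, B, C](p: (A, Either[B, C])): Either[(A, B), (A, C)] =
      p match {
        case (a, Left(b))  => Left((a, b))
        case (a, Right(c)) => Right((a, c))
      }

    def factor[A, B, C](e: Either[(A, B), (A, C)]): (A, Either[B, C]) =
      e match {
        case Left((a, b))  => (a, Left(b))
        case Right((a, c)) => (a, Right(c))
      }

Since composing these in either order gives back the input, no information is lost, which is exactly what the algebraic identity claims.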


Depending on when you were using Scala, it could have to do with abuse of implicits; back in like 2012/2013ish there was a bit of a craze with using implicits everywhere, which would make any sort of IDE's job incredibly tough and compilation painfully slow. I write Scala in Emacs at the moment so I can't speak to IntelliJ, but Emacs certainly isn't the fastest editor around and I've yet to run into major issues.

The core team also finally realized how important tooling is and recently dedicated a team (a bunch of teams?) to working on language-server-based intellisense, which is nice.

I'd highly encourage anybody who last used Scala a while ago and didn't enjoy it to give it another shot at some point in the future. Lots of improvements over the past few years (and some really cool stuff coming up with Scala 3) make Scala IMO easily one of the best languages around.


It was within the past 6 months and I was up-to-date on IDEA and Scala. I was running it on OSX though, so that may have had something to do with it. It's hard to recall, but I may have run this by another team member and he didn't think it was unusual. Perhaps Microsoft has just spoiled me with C#, F#, and TypeScript tooling :|


Firstly, IDEA's default JVM settings are a bit conservative. The default 750MB maximum heap size is too low for any non-trivial Scala project. Here's someone who attempts to explain all settings: https://github.com/FoxxMD/intellij-jvm-options-explained

Secondly, IDEA's SBT plugin works pretty well; use it if you aren't already, and put a reasonable .jvmopts file in your project. SBT can still be slow and there's no magic cure for that (even though it's constantly improving). Some of my coworkers use bloop instead: https://scalacenter.github.io/bloop/
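
For the .jvmopts, here's a hypothetical starting point (these are standard HotSpot flags, but the numbers are invented; tune them to your machine and project size):

    # hypothetical .jvmopts -- adjust for your machine
    -Xms1G
    -Xmx4G
    -XX:ReservedCodeCacheSize=512M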

That said, Scala is indeed jumping on the LSP bandwagon, and it's already supported by Dotty out of the box.


I want desperately to start moving a legacy C# codebase to F#. The problem is that I'm currently the only one that understands the C# codebase, other developers on the project struggle to understand the JS front-end, and flail on contact with the back-end. I'm at severe risk of making myself an Expert Beginner indispensable piece, but I don't know how to move forward with what I have to work with.


Based on what you’re saying, adding greater understanding to the current codebase is problem #1, which you’ll need to solve before going to F#. Language migrations require two things, at minimum:

[1] A really firm understanding of the business logic the code is performing (which can really be simplified to, do you have a deep understanding of “the business”)

[2] An understanding of the edge cases.

#2 is often only captured in the code, and these regressions end up being the thing that cause lots of bumps along the way with business teams. Knowledge gets lost so easily, or silo’d in distant parts of the business, that the code really becomes the “living history” of the business.

I don’t have a lot of good solutions for you. Migrating languages is tough business, but I’d potentially think about starting a “lunch and learn” with some colleagues to do high level code walkthroughs. This could help to distribute knowledge, and build a better understanding of what you already have. The process of prepping can help you to also document what it might take to do the migration, so you can better make the case to any superiors you need to get buy in from.

Good luck!


My own migration of a 7+ year old C# codebase to F# (in 2013) began when I decided, belatedly, that I'm just not smart enough to pair a deep OO hierarchy with multicore parallelism without having my vacations spoiled with troubleshooting. The first thing I did was take a tiny part of the code, read the C# data into F# records and used F#'s Array.Parallel.map for the multicore crunching. Like magic, the intermittent threading bugs I had with that small part disappeared, so I kept going. I hadn't planned to rewrite everything in F#, but I had so much fun with it, and grew to love the language so much, that 18 months later, that's what had happened.

I'd recommend starting by nibbling around the edges: find small pain points where F#'s union types, immutability, pattern-matching, option types, active patterns, etc, make some small part of the software easier to understand and maintain. And then if you like how that goes, don't stop.


In the spirit of leveraging the type system, you could try doing some type-first development. If possible, carve a logical module or sub-system out of your existing code base and then attempt to represent the data and functions for that module in F# types. Don't implement any of the functions, just see if you can describe the data and operations using the type system. Tomas Petricek describes this approach really nicely in his post "Why type first development matters".

This has the following benefits:

  - It helps you review and clarify what the existing code does.
  - It will force you to name the data and operations explicitly.
  - You won't be distracted by actually implementing the operations.
  - You have a succinct description of the module that you can discuss with other developers.
  - It divides the migration into manageable chunks.
You could challenge yourself to educate team members through the type system.
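
As a minimal Scala sketch of the idea (the cart module and all the names here are hypothetical), you write down only the shapes of the data and operations:

    // Data, named explicitly.
    case class ProductId(value: String)
    case class CartId(value: String)

    sealed trait CartError
    case object CartNotFound extends CartError
    case object OutOfStock extends CartError

    // Operations as signatures only -- no implementations yet.
    trait CartOps {
      def addToCart(cart: CartId, product: ProductId): Either[CartError, Unit]
      def itemCount(cart: CartId): Either[CartError, Int]
    }

This compiles, documents the module, and can be reviewed with teammates before a single function body is written.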

You may also find some of the techniques from the book "Working Effectively with Legacy Code" useful.

Updated: Removed links as comment was not posting.


That looks similar to Rust's enums. The compiler enforces exhaustiveness in pattern matching.

  enum WebEvent {
    // An `enum` may either be `unit-like`,
    PageLoad,
    PageUnload,
    // like tuple structs,
    KeyPress(char),
    Paste(String),
    // or like structures.
    Click { x: i64, y: i64 },
  }


OCaml was a big influence on the design of Rust.


> It’s like Enums + Data Classes on steroids.

Also known as algebraic data types, which have been around for decades and can be found in ML/OCaml, Haskell, and Rust.


Types can definitely help with data well-formedness but personally I'm very suspicious of 'mini-types' like Name and CartID. If the data you're wrapping has no internal structure (that can be formally verified) and you're just using the type system as an argument checker then it's not clear to me that you're really adding much value. If CartID and ProductID are both just strings, what's to stop a user from doing something like this?

  CartID cartId = new CartID(productId.toString());
  addToCart(..., cartId, ...); // <-- but look, that's really a ProductID!
I'd suggest that while you may have reduced the likelihood of certain errors, you have not actually eliminated them.

The right approach here is to be very suspicious of any methods in your domain that just take a bunch of IDs. Either:

(a) Force the caller to resolve the IDs to actual entities before invoking the method:

  addToCart(Cart c, Product p, ...)
or (b) Represent the method parameters as a Parameters object or, preferably, an Event object.

  handle(ProductAddedEvent event);
The real problem here is not the type system but a domain that is not sufficiently protected by an Anti-Corruption Layer [1]. Unfortunately most developers are not familiar with this kind of strategic design and languages are not much help here. It would be interesting if there were a language that could flag things like weak ACLs.

[1] https://docs.microsoft.com/en-us/azure/architecture/patterns...


This line of code is very suspicious:

    CartID cartId = new CartID(productId.toString());
This line of code looks perfectly fine:

    addToCart(..., productId, ...);
There is tremendous benefit to making wrong code obviously wrong.

Not to mention that even if you do not have formally verifiable structure, you can almost always do sanity checks.
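
For example, a quick Scala sketch of such a sanity check (the prefix rule here is invented purely for illustration):

    final case class CartID(value: String) {
      // Fail fast at construction time instead of letting a bogus ID propagate.
      require(value.startsWith("C"), s"not a cart id: $value")
    }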


You can get very far by just containing unsafety. In this example, you shouldn't need to sprinkle `new CartID` calls throughout your whole application. You will only use them when deserializing values from a typeless source (e.g. AJAX response), and that part should be contained to some AjaxService.


I think that this is a smoking gun where the user has purposefully violated the typing that's been introduced to stop errors. It's like altering tests so your code passes - basically sabotage.


> new CartID(productID.toString());

With good data modeling and safety checks, this should fail: a product id and cart id should have some sort of distinguishing syntax (eg "P1234" vs "C1234") and so `new CartID("P1234")` will fail.

At some point, no sort of type system can stop incorrectly implemented/specified logic. An even simpler example would be just passing some constant, valid, but not semantically correct string to the CartID constructor.

Your (a) and (b) solutions don't solve this. I could just as easily create a meaningless Cart object, or a meaningless ProductAddedEvent the same way you constructed that meaningless CartID.


As I said, if your data actually has internal structure that can be verified then the Type is meaningful. (Though that sort of prefixing introduces its own problems and I think most IDs just end up being UUIDs or URIs or numbers.) Otherwise you're just wrapping strings. The point here is that a Type represents data and behavior or, more formally, Types represent constraints over a space of data and behavior. Types that don't actually do any constraining are useless.

> I could just as easily create a meaningless Cart object, or a meaningless ProductAddedEvent the same way you constructed that meaningless CartID.

The problem here is not mixing up IDs. The problem is actually guaranteeing that the pre-conditions of addToCart() have been satisfied. Here, for example, addToCart() probably wants more than just a Cart and Product object, but Cart and Product objects that have been persisted. Presumably the domain has been constrained so that you can't create meaningless objects. Otherwise what's the point of any object? If the only valid Cart and Product objects are those that can be retrieved from a persistent repository then, yes, this not only solves the trivial problem of mixing up IDs, but also the more valuable underlying problem that addToCart() is constrained to act on real, valid (according to the domain) Cart and Product objects.

In other words, this isn't really an issue of language design. And while there's a tendency to believe the language can solve every problem and that design patterns are a sign of language "weakness", the reality is that developers who really want to solve these problems will have to think carefully about their domain and apply the appropriate design patterns.


>Types that don't actually do any constraining are useless.

Exactly!

I can make CartId/ProductId/WhateverId types that are just wrappers around String. But that's not actually using the type system to enforce correct behavior. Nothing in the type system prevents me from violating the invariants of my application.

I would only actually be using the type system, if my types were actually constrained so that only valid CartIds, that refer to real Carts, could be created. Then I would actually have a useful invariant, which the type system guarantees cannot be violated. (inasmuch as the validation step when creating CartIds is correct)
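
In Scala one way to express that is a smart constructor: hide the real constructor and expose only a validating factory. A minimal sketch, with a made-up prefix check standing in for whatever validation or repository lookup the domain actually requires:

    final class CartId private (val value: String) {
      override def toString = s"CartId($value)"
    }

    object CartId {
      // The only way to obtain a CartId.
      def parse(raw: String): Option[CartId] =
        if (raw.startsWith("C")) Some(new CartId(raw)) else None
    }

Now `new CartId("P1234")` doesn't compile outside the companion object, and every CartId in the program has passed the check.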


But aren't you then constrained until the end of time to ensure that cart ids and product ids are entirely disjoint? That's way too coupled for my little brain.


In some languages, you can ensure that no `.toString()` equivalent is available. Then, your ID is truly opaque and can only be used in the prescribed ways.


> (a) Force the caller to resolve the IDs to actual entities before invoking the method:

This doesn't really fix the potential problem you pointed out. All it turns it into is this:

> Cart cart = lookupCart(productId);

With the popularity of auto increment, the above code will likely return a valid cart.


I came to say the same thing. I appreciated the blog post but the example was overly contrived. The addToCart method should have been passed the objects the ids were representing and should not have been responsible (even via delegation) for resolving the ids.


I would say the unsafe thing here is `new CartID`, which allows you to turn an arbitrary string into a `CartID`. That shouldn't be possible outside of a small, encapsulated module somewhere, which exposes only safe usages (e.g. wrappers around database calls).

One way to do this is via a module system, like in Racket, ML, Haskell, etc. Java et al. do allow private constructors, so we could use those and provide the safe wrappers as static methods.

An alternative/complementary approach is to make `CartID` abstract, e.g. an existential type in ML or an interface in Java. This way our code is forced to be generic, working for any possible implementation of `CartID`. We specialise it to some sort of `StringCartID` implementation once, at the top-level.
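
A rough Scala sketch of that second approach, hiding the representation behind an abstract type member (the helper names here are hypothetical):

    trait CartIds {
      type CartId // abstract: clients never learn the representation

      def fromDbRow(raw: String): CartId // the one safe way in
      def render(id: CartId): String     // the one safe way out
    }

    // Specialised exactly once, at the top level.
    object StringCartIds extends CartIds {
      type CartId = String
      def fromDbRow(raw: String): CartId = raw
      def render(id: CartId): String = id
    }

Code written against `CartIds` cannot assume CartId is a String, so the representation stays swappable and unforgeable.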


This. Doing actions against an id is a shortcut. Deserialise the object from the db so you can make changes and verify integrity before serialization.

Also allows you to leverage the type of object you're playing with.

No need for CartId types if you're passing around Cart types. Which you probably already have.


I'd go further than saying "doing actions against an id is a shortcut". I say it's an antipattern. Ids should only ever exist at the edge of your system: When you're taking serialized data in and turning it into objects, or you're taking objects and turning them into serialized data out. That serialization should be handled completely separately from your application logic; you shouldn't see (non-opaque) ids in your application logic.


Another interesting and, retrospectively, obvious idea I recently read about: Have a class 'CleartextPassword' around, which overrides all string-serialization as either a sequence of *s, or a hashcode of the password. Suddenly the type system prevents security issues.
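
A tiny Scala sketch of the idea (the class and method names are mine):

    final class CleartextPassword(private val value: String) {
      // Any accidental stringification -- logging, interpolation, toString --
      // shows stars instead of the secret.
      override def toString: String = "********"

      // Explicit, greppable escape hatch for the few call sites that need it.
      def unsafeCleartext: String = value
    }

Now `log.info(s"login attempt: $password")` prints stars, and the only way to leak the secret is a very visible `unsafeCleartext` call.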


Scott Wlaschin, of fsharp for fun and profit, has written/presented[1] quite a bit about utilizing the type system to prevent invalid states. Different language, but similar concepts. It was an epiphany to me and a concept I have at least tried to bring back to the ALGOL side where possible.

[1]https://fsharpforfunandprofit.com/series/designing-with-type...


This reminds me of an article by Foursquare about how they use Scala Phantom Types to ensure MongoDB queries are semantically well formed:

https://engineering.foursquare.com/going-rogue-part-2-phanto...


I've thought it would be nice if languages had a 'secure' type qualifier which prevents leakage when something goes out of scope, i.e. the compiler will emit code to zero out the data in memory once it's out of scope.

Extending that, passing a secure type to an insecure function should result in a warning at least.

   log_printf( password.cleartext()); // error

   log_printf( password.safehash()); // okay


Such real-world concerns can and should be modeled using types.

Your comment reminded me of taint mode in Perl. https://perldoc.perl.org/perlsec.html#Taint-mode


Not sure if types will force the compiler to do things like constant time comparisons and overwrite sensitive information as it goes out of scope.

I think in Java, C#, and C/C++ the compiler will decide that code clearing memory just before it goes out of scope has no side effects and optimize it away.


You can definitely rig up the erasure to happen in a destructor (C++) or Drop implementation (Rust). Convincing the compiler not to optimize it out is tricky, but many cryptographic libraries include that functionality. The C11 standard has a memset_s function with the right semantics, but there are other ways to implement it.

Re: constant time operations, I don't know of any language or system that does this currently. But a while back I was kicking around the idea with a friend and we came to a design that I'm pretty sure would work. Never got around to implementation though; so many projects, so little time.


We can absolutely use types to ensure that our algorithms are constant time, e.g.

https://www.usenix.org/conference/usenixsecurity16/technical...

We can use types to enforce the big-O runtime of our functions too, e.g. ensuring that a mergesort implementation runs in O(nlogn) steps: https://www.twanvl.nl/blog/agda/sorting

I have no idea about doing these things in Java/C/C++/C# though; anything we try to enforce can be trivially broken by `NULL` :(


Cool idea. However, it's usually the DB that leaks the password, not the stack*

Assuming we’re talking about Scala and not C.


The issue isn't the stack, but rather an errant printf() or log() call — presumably, something like this would help with the issues that Twitter had recently.

https://www.theverge.com/2018/5/3/17316684/twitter-password-...


I’m pretty sure the Twitter fiasco was some sort of trace logging that logged every request body, and some of those requests happened to contain passwords.

Introducing a separate type for passwords doesn’t solve this issue.


Whatever the type was, it contained a password, and therefore could have been moved into a type that prevented the bug.

Whether it's a password or an "unparsed client data" structure or whatever.


This is going to be even more interesting if the EU keeps pushing privacy laws. Are you even allowed to keep unrestricted traces of production HTTP calls anymore?

I find this exciting from a problem-solving point of view, because for one, this is potentially a hard, ground-shifting problem. And on the other hand, languages like Scala, Haskell or Rust have the tools to make this kind of requirement simple. There's an interesting time coming.


Exactly. Such a class doesn't secure the entire multi-container / multi-vm setup of a clustered web application, how could it? However, this is a simple layer of defense in depth, since it makes it harder for your application code to accidentally log out passwords.

And in a language with destructors, you can remove the secret from memory after it has been used. This in turn reduces the window of vulnerability against memory disclosure attacks, especially in shared virtualized environments.


The other great motivating example I’ve seen is an Amount type parametrised by currency as well as the numerical value, so that the type system can enforce that you don’t add two amounts of different currencies, or that to add, you have to also provide a correctly typed exchange rate value.
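
A rough Scala sketch of that, using a phantom type parameter (the currency markers and names are invented here):

    sealed trait Currency
    sealed trait USD extends Currency
    sealed trait EUR extends Currency

    final case class Amount[C <: Currency](value: BigDecimal) {
      // Only amounts of the same currency can be added.
      def +(other: Amount[C]): Amount[C] = Amount(value + other.value)
    }

    final case class Rate[From <: Currency, To <: Currency](value: BigDecimal) {
      def convert(a: Amount[From]): Amount[To] = Amount(a.value * value)
    }

    val usd = Amount[USD](BigDecimal(10))
    val eur = Amount[EUR](BigDecimal(5))
    // usd + eur                               // does not compile
    val usdToEur = Rate[USD, EUR](BigDecimal("0.9"))
    val total = eur + usdToEur.convert(usd)    // ok: both sides are EUR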


I wrote a blog post that goes into a lot more detail into some of these techniques, in case anyone wants to dive deeper

- http://www.lihaoyi.com/post/StrategicScalaStylePracticalType... - http://www.lihaoyi.com/post/StrategicScalaStyleDesigningData...


I suggested a similar trick in Gecko when we were seeing too many bugs in the rendering/async scrolling code because of mistakes when transforming between the different unit spaces:

https://dxr.mozilla.org/mozilla-central/rev/4303d49c53931385...

Transformations between units are also typed so you won't accidentally use the wrong scale factor when transforming.


Working on porting code with time units that are totally unspecified. Sometimes it's seconds, ms, 1/256th of a second or HH:MM:SS.

Just shoot me.


You need F#'s units of measure [1].

[1] https://fsharpforfunandprofit.com/posts/units-of-measure/


The problem with complex type systems is that they force you to spend a lot of your development time thinking about type structures and their relationships (in a universal way) instead of thinking about actual logic (in a more localized way).

For example, when you introduce third party modules into your code, sometimes it's impossible to cleanly reconcile the type structures exposed by the module with the type structures within your own system... You may end up with essentially identical concepts being classified as completely different types/classes/interfaces... It creates an unnecessary impedance mismatch between your logic and the logic exposed by third-party modules.

This type impedance mismatch may discourage developers from using third-party modules altogether. That might explain why dynamically typed languages like JavaScript and Ruby have such rich third-party module ecosystems compared to those of statically typed languages.

Unless there is a single standardized and universally accepted type system which is fully consistent across the minds of all developers of a specific language/platform (for all possible applications), a type system makes no sense at all; especially a complex one.


Yeah this is almost completely wrong IMHO, could you give some concrete examples of when this happened to you or someone you know?

I'm writing a relatively small haskell app, and though I've had large swaths of time where I had to think about the types I was writing exclusively, I almost always come out of the 30 minutes or so understanding my code, and what I'm trying to do so much better.

Forcing you to spend time thinking about type structures and their relationships is (dare I say) the essence of programming. Programming is in the end about transforming data for some useful end (and some side effects along the way), and no one wants to work with 0s and 1s directly.

Integrating with third party modules in haskell is no more painful than integrating with 3rd party modules in Java, in fact is way way way simpler, given Java's infamous verbosity and love of enterprise patterns (tm).


Not the OP, but I remembered writing something in Haskell, and being extremely frustrated when I chose to model something as a List.NonEmpty, because it wasn't straightforward to pass them around to functions that expected List.


Maybe it's the Stockholm syndrome talking, but this sounds like a feature to me -- The type that represents a non-empty list and one that can be empty are distinct. If a function is a mapping of inputs to outputs, then those two functions are (and should be treated as?) distinct, despite the fact that one result space is clearly the subset of another.

The problem is I don't know much about the List.NonEmpty type -- it seems like it would be almost trivial to implement it while maintaining List-compatibility, but clearly it's not (and was frustrating to use) so...

Your point is definitely true, some things like that are very frustrating to deal with in Haskell -- especially when you expect something like that to 'just work' and it doesn't.


Haskell does not have OOP style inheritance. If you define a new type, it is impossible to use it where a value of a concrete old type was expected.

The way around this is typeclasses, which are essentially interfaces. But this would require that the functions you are calling (which may be written by a third party) be written against a generic typeclass instead of the concrete List type. Haskell is improving in this regard, but there are still many places that are written expecting a List. Also, if the function needs anything beyond simple iteration, the typeclass it would use would have a scary name like Semigroup (even iteration uses Foldable, which might not be clear to beginners).

There is the IsList typeclass. But the intended use for that is to enable a list-like datatype to be represented with list literals in source code.

Even with all of these difficulties, the worst case scenario is to spam your code with NonEmpty.toList and NonEmpty.fromList (or its safe sister NonEmpty.nonEmpty, which returns an error value instead of throwing when given an empty list).


I would not call it a feature. The clean solution would be dependent types so we can model the proposition "that list is not empty" in the type system. It is very neat to have the standard list as your data-type, but accompanying it is a proof that the list has elements. However, dependent types are not mainstream yet and we need a lot more research before that can change.


Java gets away with it because it has a very large core library which defines pretty much any type that a developer could think of. For example, there is even a standard class type for URLs which all good Java developers must agree on. All these standardized types are often unnecessarily restrictive and demand too much attention and global awareness of the environment.

It's often possible to write good code without necessarily understanding everything that is happening above or below a particular unit of logic; so why force the developer to internalize all these rigid concepts?


The larger standard library of languages like Java is indeed a big benefit -- I think I see more where you're coming from now. Haskell does not have a large standard library like other languages, but people usually build on one or two libraries in a given problem space to avoid duplication (checking to see if someone has a lib you can use instead of writing one yourself is pretty natural). In my experience, it's been rare to find two libraries that need to interoperate that defined two different classes for URLs (for example) -- but you're absolutely right, it could be a problem.

The problem would be solved if Haskell were to adopt a large standard library, but I still don't see this as a problem with types, this is more a language/community issue.

Speaking more concretely, I'd consider understanding the types flowing in and out of a function/abstraction that you got from a library table stakes for writing good code. IMO it's rare that you can just import a library, write strictly glue code (without thinking about the concepts) that magically solves all your problems.

For example, let's say you needed to parse an ISO 8601 date string (and it isn't in your standard library) -- in that kind of case you can just pull down a library and run one function and be done. Usually though, you'll need to at least understand the function's inputs (unless the function is defined for every kind of input you could give it), and understand the output...


Those impedance mismatches are equally present in dynamically typed code. The only thing types do is help make it obvious when there are problems with composition.

I work in Haskell with many other Haskell developers and a wide variety of Haskell mindsets. It's a very complex type system and we use a lot of it, and so far the issues you mention are minor and well worth it.


With dynamically typed languages, the type of an object is not important, so you don't need to explicitly typecast external objects to fit into your logic. If two different objects expose the same functions then you can use them in the same way. The behavior of an object is all that counts.


If two different objects expose the same functions then you can call them in the same way. You have no way of knowing, from the function names and arity, whether their semantics are what you need. Once you start asking that question, you are asking the sort of questions that using types helps answer.


You seem to be talking about nominal types being a problem. Structural subtyping and row types are two ways of getting the benefits you're talking about while still being statically typed.


There is often an "I love {Scala|Rust|Erlang}" post showing a simple type issue and claiming that using a bunch of new syntax would somehow fix it.

<rant>

1. I want a minimum of typed syntax for safety. For example, remember Hungarian Notation? My linter barfs if I try "icCart = ipMyProduct" because of the semantic type issue.

2. I want a full set of primitives. For example, counts (c) are 0 or above, not infinite, and can do arithmetic, while ids (i) are positive, not infinite, cannot do arithmetic, and cannot be assigned a constant. I want optionals; I want clearer error handling; I want fully declared I/O instead of manual error handling; I want better.

3. I want tallies of how often simple types are confused before changing languages to fix. My type confusions are complicated nested collections or variant records.

</rant>

Typing is a means for correct programming, not an end in itself.


>Typing is a means for correct programming, not an end in itself.

That's not always the case.

https://aphyr.com/posts/342-typing-the-technical-interview


This got me thinking, and I'm curious what a language would look like if basic types weren't directly usable.

In other words, maybe type checking doesn't go far enough: checking that I'm getting an int is less valuable than checking that I'm getting the right sort of data for the domain.


This might be a good time to check out algebraic data types, esp. with a single constructor. For example, in Elm you might declare:

    type Length = Inches Int
Then for any function that accepts a Length, you can only pass in a Length. This effectively acts like tagged tuples verified at compile time. You can later extend this type to be:

    type Length = Inches Int | Meters Int
and any code that reads length would have to handle both cases (also checked at compile time).

I have found this to be a great way of ensuring all of your types are correct (and sometimes, a great frustration of wishing I didn't have to wrap/unwrap types so often).


F# has units of measure [1], which are very nice for such a case you're describing.

[1] https://fsharpforfunandprofit.com/posts/units-of-measure/


Yeah, that's one of my favorite takeaways from Elm.

I'm thinking of it in an OO style, where perhaps instead of wrapping/unwrapping you're specifying an analog to object properties.


Whenever I play with that idea in my head I get hung up on the question whether one could go even further down that road and drop variable names as a concept separate from types. If you already have a Surname instead of a String and a Birthday instead of a Date, why duplicate that into surname and bday? In a way, variable names are a bit like informal domain subtypes. Could we make them formal?

I'm really not sure if that could be in any way practical at all, but I like the idea.


In some languages you don't give field names, just types. You bind fields to names only when pattern matching. For example in Haskell:

    type Year = Int
    type Month = Int
    type Day = Int

    -- fields have no names, just types
    data Date = Date Year Month Day

    -- we bind fields to names only when necessary
    next (Date y 12 31) = Date (y + 1) 1 1
    next (Date y m d)
      | d == daysOf m = Date y (m + 1) 1
      | otherwise     = Date y m (d + 1)
      where daysOf = (!!) [0, 31, 28, 31, 30,
                              31, 30, 31, 31,
                              30, 31, 30, 31]
Is something like this what you mean?


Thank you, good to know.

Maybe those who don't know Haskell are doomed to reinvent isolated parts of it.


( yes )


That's where my mind was headed as well. You'd still want variables for passing around instances (in a general sense; not specifically an OO sense) but everything else could be defined on the type. Sort of a types-as-classes approach.


There is/was a programming language called Wake that used this idea. The project's domain is down right now, so this documentation source code is the best reference I can find in a few minutes: https://github.com/MichaelRFairhurst/wake-compiler-docs/blob...


If you did that, in what language would you write the validation code to check if you have a value in the correct domain, or convert between domains?


I think this post [0] points out how Haskell can implement this sort of thing, using DataKinds, and GeneralizedNewtypeDeriving with StandaloneDeriving.

[0] https://lexi-lambda.github.io/blog/2016/06/12/four-months-wi...


See Boost.Units: https://www.boost.org/doc/libs/1_65_0/doc/html/boost_units.h...

It enforces correct units/dimensions at compile time, so you won’t be able to mix up length with time or area, etc.


unsigned int?


I wrote about the same topic in

https://medium.com/sensorfu/using-static-typing-to-protect-a...

Type systems are really good at helping you avoid this problem!


Make illegal states unrepresentable


I have a system set up in C++ to make it trivial to add new unique kinds of IDs (wrappers around ints) as well as vectors that can be indexed only with that kind of ID. So you can have a ShoeVec containing Shoes that is indexable only by ShoeIDs and a ShirtVec containing Shirts that is indexable only by ShirtIDs. It's zero-overhead in optimized builds, of course. You'd be surprised how rarely you have to convert to or from the underlying type.
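
An analogous sketch in Scala (the real C++ version will look different; all names here are mine):

    // IDs are distinct zero-cost wrappers...
    final case class ShoeId(toInt: Int) extends AnyVal
    final case class ShirtId(toInt: Int) extends AnyVal

    // ...and a vector can only be indexed by its own ID type.
    final class IdVec[Id, A](data: Vector[A])(toInt: Id => Int) {
      def apply(id: Id): A = data(toInt(id))
    }

    val shoes = new IdVec[ShoeId, String](Vector("sneaker", "boot"))(_.toInt)
    shoes(ShoeId(1))     // ok
    // shoes(ShirtId(1)) // does not compile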


Fwiw the Haxe language calls this an "abstract type", and they're very useful.

https://haxe.org/manual/types-abstract.html


Generic types are useful, but still should be called as such, or for the new fad: dependent types. Abstract types would not allow instantiation.


Haxe abstract types are distinct from abstract classes, but I need to learn more about dependent types.


Congrats, you almost have Pascal there.


The suggested type system adds a lot of code and complexity. For this specific case, I suggest we can try a convention-based solution. Let's, for example, establish a convention that all product id variables are named with a `prod_` prefix and all customer id variables with a `cust_` prefix. Then a simple static analyzer filter can serve as the unit test to catch mistakes (including passing in variable names that don't conform to the convention).

We should recognize that the enforced prefix serves the same role as the type system, but at the preprocessing stage rather than the compiler stage. It does not require the programmer to add any extra code; rather, it helps with one of the difficult problems: naming things. Unlike the type system, it is easy to adopt a strong convention (the strongest being a set of fixed names) and then relax the convention as the program evolves. In contrast, the type system often takes the opposite path, from relaxed to strict, with steeply increasing cost.

Of course, unlike the type system, the enforcement of the convention does not come automatically -- the programmer has to write static analysis code to enforce it -- but this is fundamentally not different from writing unit tests. And the static analysis code is easy to write and less likely to have bugs (and its bugs have less severe consequences). I use a general purpose preprocessor, MyDef, and every line of my code goes through several stages of filtering anyway, so adding a set of static checks seems trivial. But even if you don't use a preprocessor, implementing a simple static convention checker (to be enforced at repository check-in) doesn't seem difficult.


> unlike the type system, the enforcement of the convention does not come automatically

This defeats the whole purpose.

> the programmer has to write static analysis code

It is legwork that should be relegated to the computer. And, in fact, has been for a long time. I don't only mean type systems; various linters and checkers work where the compiler proper does not (see valgrind or findbugs). Frankly, a lot of test cases for code can be written automatically; the prime example is Haskell's QuickCheck, but a number of other languages have similar tools, all the way down to fully dynamic ones like Python.


Creating and maintaining a type system is also legwork (and I was arguing it is a bigger burden of legwork). And a Turing-complete type system can have Turing-complete errors. Let's not pretend we have a silver bullet, and have a fair discussion.


So, you want to implement a second type system using naming conventions and an extra step in the compilation pipeline? In order to reduce complexity?


Types are not implementable as conventions. Use the language, not some fallible process. Otherwise we could all write in C using void* as the only type.


One thing that I personally would find useful is an F#-like Unit of Measure, a kind of type metadata that allows units of quantities to be explicitly mentioned and thereby helps prevent unit confusion in code, i.e. passing around values in the wrong kind of units. Degrees vs. radians, for example, can often be a difficult mistake to catch.
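
Scala has nothing like F#'s units of measure built in, but even plain wrapper types catch the degrees/radians mix-up at compile time. A minimal sketch (the names are mine):

    final case class Degrees(value: Double) extends AnyVal {
      def toRadians: Radians = Radians(value * math.Pi / 180)
    }
    final case class Radians(value: Double) extends AnyVal

    // Hypothetical API that wants radians.
    def rotate(angle: Radians): Unit = ()

    rotate(Degrees(90).toRadians) // ok
    // rotate(Degrees(90))        // does not compile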


If you like this, you're probably also going to like Haskell/OCaml/Ada/F#/Coq (ML family of languages?):

I'll leave links for Haskell since it's the only one of those I've tried and liked personally:

https://www.haskell.org/

http://learnyouahaskell.com/

IMO Haskell is the type system you wanted Java to have, with even more strictness. While it's possible to write unsafe haskell, just about everything you will read, and the language itself encourages you to push as many worries as you can to be solved by the type system.

If that doesn't get you excited, there's actually a saying that is mostly true in my experience writing Haskell -- "If it compiles, it works"(tm).

Also Haskell on the JVM if you want to dip your toes in:

- https://eta-lang.org/

- https://github.com/Frege/frege

[EDIT] - If you try to learn haskell you'll inevitably run into the M word which rhymes with Gonad. The best guide I've ever read is @ http://adit.io/posts/2013-04-17-functors,_applicatives,_and_...

BTW, be ye warned: when you start understanding/getting cozy with the concepts introduced by Haskell you'll wish they were everywhere, and they're not. Languages without proper union types will look dumb to you, easy function composition/points free style will be a daily desire, and monadic composition of a bunch of functions will be frustratingly absent/ugly in other languages, not doing errors-as-values will look ironically error-prone.

[EDIT 2] - More relevant to the actual post, how you would do this in Haskell is using the `type` keyword (type aliasing, e.g. `type Email = String`) or the `newtype` keyword, which creates a new type that is basically identical (what the article talks about). Here's some discussion on newtype:

- http://degoes.net/articles/newtypes-suck (this article is good, it describes the premise, promise, shortcomings, and workarounds, forgive the clickbaity title)

- https://www.reddit.com/r/haskell/comments/4jnwjg/why_use_new...


Although Haskell has a relatively powerful and useful type system, you are perhaps being a little kind to it here. Haskell also has partial functions, including quite a few dangerous examples in the standard library, various mechanisms to escape the type system entirely, and 27 different ways to handle errors, 28 of which look a bit like exceptions but need handling in 29 different ways. And it still can't handle basic record types very elegantly, nor dependent types such as fixed-size arrays/matrices. Haskell's type system does have a lot going for it and it also makes a useful laboratory for experiments with type-based techniques, but as far as safety goes it has never really lived up to the hype, and for all its advanced magic, it is surprisingly lacking in support for some basic features that are widely available in other languages.


You are absolutely right -- there are other ML-family languages that do a better job at avoiding some of the pitfalls (but surely with other tradeoffs). Most of the stuff you've mentioned is actively recognized by the Haskell community and has been somewhat improved:

> Haskell also has partial functions, including quite a few dangerous examples in the standard library,

Partial functions -> https://wiki.haskell.org/Avoiding_partial_functions (introduces the safe library @ https://hackage.haskell.org/package/safe)

> various mechanisms to escape the type system entirely

Type systems are leaky abstractions. While that's just a random assertion with no fact to back it up, I think it is a good idea design wise to account for that possibility, and give people the ability to completely bin the type system when they need to. Most of that stuff is marked with some wording that strongly suggests against use, but does not prohibit (e.g. unsafePerformIO)

> 27 different ways to handle errors

Could you elaborate? I understand this is hyperbole, but I feel like the number in actuality is <= 3... which might make your statement ~90% hyperbole, which is a bit much.

> still can't handle basic record types very elegantly

Compared to which language? Records in Haskell are indeed a PITA to get values out of when they're deeply nested, and you have to destructure or string together the accessor functions or something, but I still like it more than other languages which take a looser approach, often leading to runtime errors. If I do obj.something.somethingElse, that line might blow up at runtime, but in Haskell I have some guarantees/checking that it won't. C#'s `?.` operator is pretty good for accessors when dealing with nullable values as well.

The lens library helps with that, but though it's simple in function, the mathematics around it that are often introduced at the same time complicate things.

> nor dependent types

https://www.reddit.com/r/haskell/comments/8cp2zg/whats_the_s...

> laboratory for experiments with type-based techniques

Haskell is less of a laboratory these days, people are actually using it for production stuff now, and they can't change core pieces of the languages so quickly -- I'm sure people would love to go back and eliminate partial functions from the standard library where reasonable.

Languages like Ada might be better for experimental type stuff?

> never really lived up to the hype

Compared to what? Would you consider Haskell better for type safety than say Java/C#? Because I certainly do (I'd love to be proven wrong as well).

I assume you meant the features you noted above as missing (easy record types, coherent error handling, lack of partial functions) in Haskell but widely available in other languages -- could you note which ones you had in mind? Java is what I've seen used most in enterprise and every function is basically a huge walking partial function -- Haskell's type system (at the very least) seems like a huge step up there. Things got better with 1.8 and the introduction of Optional (and are probably even better these days) but back when I wrote a lot of Java, it didn't seem right to 'poison' all the code with Optional<>s everywhere... Now I know better, I wasn't poisoning the code, I was writing better code but couldn't recognize it at the time.

Contrast this with Java that waited until 1.8 for functions as a first class object.


A nitpick. At top level:

> Haskell/OCaml/Ada/F#/Coq (ML family of languages?)

Then:

> Languages like Ada might be better for experimental type stuff?

It seems to me there's a confusion between Ada [1] and Agda [2], where Agda is what you meant. Ada is also interesting, but in the Pascal/Modula family, and although strictly typed, its type system is nowhere near as sophisticated as the ML family languages (and here too there's a big difference between real ML and dependently typed languages like Coq/Gallina and Agda).

Among the things I liked in Ada:

- generic packages, a more disciplined way to have static genericity than C++ templates. One can declaratively constrain the input types to a generic package, with clean error codes. C++ concepts will close the gap and maybe be even more powerful, but Ada had this from day 1;

- protected types for mutual access. A higher level mutual exclusion scheme than mutex. Add in Ada 95 IIRC;

- integer subtypes, with ranges and automatic checks.

Still I believe the train has passed, and nowadays Rust (with some strong inspiration from the ML family) has more of a chance to make it as the next-gen system programming language, although it'll take time.

[1] http://www.adaic.org/ [2] http://wiki.portal.chalmers.se/agda/pmwiki.php


Ahhhh thanks I knew I was mixing them up somewhere -- I think they'd morphed into the same thing in my head.


>> 27 different ways to handle errors

> Could you elaborate? I understand this is hyperbole, but I feel like the number in actuality is <= 3

Not quite 27, but exceptions in Haskell are quite messy. You have:

1) Exceptions. Which can be thrown from pure code, but only caught in IO. And which get thrown when evaluated, which causes a mess with Haskell's lazy evaluation. So the following code is not safe:

    val <- catch (willThrowError) handler
    print val
Because willThrowError may return a thunk that will not throw an error until you attempt to resolve it to a value. This means that it will not actually throw an error until `print val`, at which point it is outside of the catch, so your entire function will error, and your handler will not be called. To be safe, you want to do something like:

    val <- catch (evaluate . force =<< willThrowError) handler
    print val
Which requires the return value of willThrowError to be a supported type. With the proper language extensions, it is not difficult to autogenerate the instance needed for a type to work with force, so long as you control the type definition and all the types it relies on support it. I avoid actual Exceptions like the plague, so I might be off on some details here.

2) Maybe/MaybeT/Either/EitherT

Arguably 4 methods, but I will concede that they are the same-ish.

3) ExceptT.

Maybe belongs in category 2?

4) MonadError

5) MonadFail

I might be missing some.


Thanks, that covers much of what I had in mind before.

Error handling in Haskell seems to suffer from two fundamental problems.

One is that there is a clash between non-strict evaluation and exceptions, as your first example illustrates. I understand the arguments about why throwing can be done in pure code but catching needs to be in IO in Haskell, but those reasons are consequence of other design decisions in Haskell. The asymmetry is awkward, and it feels like you can never quite trust pure functions to do what you expect. It's a little like the long-standing criticisms of exceptions in C++, except that at least in C++ idioms have evolved to compensate, whereas shoving everything behind one or another of the `catch` variants in Haskell feels very unidiomatic to me.

The other fundamental problem is that the different techniques for representing potential failure don't compose very elegantly. Given that flexible composition is arguably the biggest selling point of being lazy in the first place, that is unfortunate. The elegance gained from a language like Haskell in expressing tidy, pure code loses its shine a little if you then have to wrap a lot of real world code that calls out to other libraries to do things inside 17 levels of monad transformers, forced evaluations and catches.

Of course I'm not suggesting that these are new or insurmountable problems or that no-one can write reasonably safe code in Haskell or anything silly like that. But I do think some advocacy, as if the type system is borderline magical, can be rather idealistic, and sometimes overlooks the warts that also arise as a direct result of the decisions that made the language safer and more elegant in other respects.


Hey thanks for clarifying -- Maybe I've just been a victim too long, but those actually don't seem like distinct ways. As far as why you can throw an exception in pure code but only catch it in a monad, much ink has been spilled[0]. I won't claim to completely understand the reasoning in that SO post and the linked paper[1], but the reasons for this dichotomy seem to be clear.

2) -> 5) are not ways of handling errors though, they're ways of characterizing/combining monadic computations that CAN error, IMO. In the end, it's still the errors-as-values (`Either SomeException value`) concept, but combined with the fact that your computation is in a monadic context. Also, maybe I've just been brainwashed by myself to think that way after doing Haskell for this short amount of time.

In practice, how I throw an error from a monadic computation (I generally have never done it from pure code) has been pretty well-defined, and I haven't had to go down this particular rabbit hole much at all -- ExceptT is what I've used the most (I've done most of my work with Servant[2]). Honestly, what I've had the most problems with in my Haskell code has been dealing with the database or mail servers and TCP sockets and stuff -- stuff that failed at runtime because I put in some bad code Haskell couldn't check before runtime.

That said, monads & monad transformers are indeed a complex concept, and are very confusing to beginners (I'm still not 100% confident in my knowledge of them), so this is definitely a point against Haskell.

And of course your earlier point about unsafe functions in the std lib is as valid as ever.

[0]: https://stackoverflow.com/questions/12335245/why-is-catching...

[1]: https://www.microsoft.com/en-us/research/publication/tacklin...

[2]: https://haskell-servant.readthedocs.io/


My only real complaint about exception handling in Haskell is the interaction with laziness. 90% of my problems would go away if they just made the default catch functions include `evaluate . force` and save the lazy behavior for special "I know what I'm doing" functions. Without that, exceptions are just a loaded gun. Even with making exceptions "safe", I still much prefer the error-as-values approach. Otherwise, every pure function is really some type of ExceptT IO monad. I'm sure this escape hatch is useful to someone, but actually using it is a pretty big code smell.


We basically ban throwing exceptions in pure code at my shop; usually we use ExceptT or throwIO or something.

I used to think the multiplicity of error handling constructs in Haskell was confused and bad, but they all have their uses. The key insight is that your error handling should conform to the local structure of your code, not the other way around.

Propagating errors across multiple component boundaries or multiple layers can be cumbersome but I think that reveals fundamental architectural weaknesses in the program itself.


Yeah this is actually an oft-repeated problem. I haven't really run into problems with enforcing strictness but that just means I haven't written/profiled enough Haskell yet, probably.

Note though that there is the new Strict pragma:

https://ghc.haskell.org/trac/ghc/wiki/StrictPragma

Also, this isn't really a solution to the problem (at best I'm just being an apologist), but you could write rules with hlint or any other code-smell-checking tool to make sure no one ever tries to use the escape hatch(es) you don't like.


> Type systems are leaky abstractions. [...] I think it is a good idea design wise to account for that possibility, and give people the ability to completely bin the type system when they need to.

Agreed on both counts, if we're talking about having escape hatches like `unsafePerformIO`. Maybe even `undefined`.

But I grimace a little whenever I see someone criticising the prevalence of runtime problems like, say, null pointer exceptions in Java, and then in practically their next breath advocating a functional programming language with a better type system like Haskell... and giving a neat example that will quite happily compile and then crash at runtime when given an empty list as input because it used a partial function like `head` without checking. This sort of gaping hole in the protection of the type system is less welcome.

> Could you elaborate [on 27 different ways to handle errors]?

I think gizmo686 did a nice job of covering this in another reply to your comment, and I added some further thoughts of my own in a reply there.

In our current context, I note in passing that the issue of throwing from pure code when catching needs IO is another somewhat counter-intuitive result that we might instinctively expect the type system to protect against.

> Compared to which language [can Haskell not handle basic record types very elegantly]?

Perhaps C, C++, C#, Java, and friends as a starting point? At least the names of members of structs/classes there are used in context and don't clash with the same name used in a different context.

Haskell also has the same challenge as any functional language when it comes to constructing new values from existing ones where there is deep nesting. As you say, lenses can help there, but bring some additional complexity of their own.

> Compared to what? Would you consider Haskell better for type safety than say Java/C#?

Yes, certainly. I'd much rather write software that involves complicated logic and needs high reliability or longevity in a language like Haskell than a language like Java, other things being equal.

I'm just wary of putting Haskell up on too high a pedestal, partly to avoid setting expectations too high for those not familiar with it yet, and partly because I think a lot of mainstream languages could benefit from relatively simple improvements in their type systems without necessarily encouraging the "using types for non-trivial things requires five PhDs" stereotype.

> I assume you meant the features you noted above as missing (easy record types, coherent error handling, lack of partial functions) in Haskell but widely available in other languages -- could you note which ones you had in mind?

Yes, for example the struct/class member naming mentioned above. Almost every statically typed language I have ever used does records better than Haskell, because Haskell doesn't really support record types at all, it just has a bit of syntactic sugar that creates the illusion of that support if you don't look too closely. (This assumes the basic language without the various relatively recent GHC extensions/proposals, of course.)


> points free style will be a daily desire

Not...really.


That is why they call it pointless style sometimes.

Point free is good for short functions, like 1-3 compositions is fine (maybe little more), but if you overuse it, it's actually much less flexible and you have to rewrite more code if you want to change something.

"Point-free style can (clearly) lead to Obfuscation when used unwisely. As higher-order functions are chained together, it can become harder to mentally infer the types of expressions. The mental cues to an expression's type (explicit function arguments, and the number of arguments) go missing.

Point-free style often times leads to code which is difficult to modify. A function written in a pointfree style may have to be radically changed to make minor changes in functionality. This is because the function becomes more complicated than a composition of lambdas and other functions, and compositions must be changed to application for a pointful function.

Perhaps these are why pointfree style is sometimes (often?) referred to as pointless style. " https://wiki.haskell.org/Pointfree


I believe a lot of similar concepts are explained in Domain Driven Design.


What really put me on to typing JavaScript was this: the ability to avoid unit tests. And frankly, what unit test would ever even capture something like this, an implementation issue (unless you refactored a lib)?


> Don’t get me wrong I’m not saying to scrap all your test suites and try to encode all your constraints using the type system instead. Not at all I still firmly believe that tests are useful but in some cases you can leverage the type system to avoid mistakes.

You should still write unit tests. You should also write integration/functional tests that might have caught this error when you tested the code that calls the function in question. The only scenario I can see the argument swapping bug not being caught is when all of the ID arguments equal the same value.


I’m seeing a little bit of a waterbed effect here. The author has replaced the basic type with one that matches it in name. You will never accidentally send CustomerId to CartID because the compiler will fail... but now we’ll need to dig through the code to find out whether customer ID should be an int, int32, String.... what the heck is it anyway?

In Scala can I not just say foo(item1=item1) even if foo takes only 1 arg?


Usually you would lookup the definition of those types just as you do for the function that you are trying to call. At least it’s a good idea to know what a function accepts before passing around your customer IDs ;)

And to answer your last question, yes in Scala you can use named parameters like in your example.


> In scala can I not just say foo(item1=item1) even if foo takes only 1 arg?

Why would you want to do that? That's less succinct than foo(item1). What are you really asking?


The id type parameter is not called a phantom type, it's called a generic type.


Even though people love bashing Objective-C for its “weird” method syntax, it trivially circumvents these kind of problems.

Also related: Mike Ash’s post about using single-member C structs instead of unitless scalar types: https://www.mikeash.com/pyblog/friday-qa-2013-08-02-type-saf...


Leverage the type system? Type systems exist to allow you to catch errors at compile time. That is what they're there to do. It's like saying leveraging the compiler to compile or leveraging the cup to drink water.


The article's example boils down to this pseudocode:

    function foo(s, i) { s * i }
    s = "hello"
    i = 2
    foo(i, s) // BUG. The runtime crashes.
The article advocates adding a type system to detect the bug:

    function foo(String s, Integer i) { s * i }
    String s = "hello"
    Integer i = 2
    foo(i, s) // BUG. The compiler detects the error.
If you're interested in these kinds of errors, then you may also want to consider ideas such as named parameters:

    function foo(text: s, count: i) { s * i }


The problem with named parameters is that you can theoretically still pass the wrong value, and in practice not everyone is using named parameters on every function call.

By relying on the type system you can catch that issue at compile time and not have to worry about it.

We use this technique at work when there's lots of fields in our data types, as it's nice to be able to keep track of the types during all the transformations and to avoid subtle bugs. Especially as the codebase is constantly changing.


I agree with you.

My opinion is that type checking helps, and so do assertions, borrow checkers, pen testers, etc. The tradeoff IMHO is that these take some time and thought to do well.

Would you be willing to send me an example of your best-case for using a type system? For example some code where you feel the type system really helps you and your team?

I keep a log of these real world areas, for training my teams. Anonymized code is fine. My email is joel@joelparkerhenderson.com. Thank you!


Do you do this for every string, int, float etc.?

Any downsides you run into? I always thought it would be interesting to try it using a type even if it wraps something but never had tried going all in on the idea.

In Scala do you not use aliases ever? I guess they don't help much? Aliases of strings can be used interchangeably, I think?


I've found there are situations where primitives are fine and wrapping doesn't add much value, e.g., (using F#)

  let calcCorrelation (symbol1:string) (symbol2:string) = ...
Here, calling the function with misordered arguments won't change the result, so the wrapping/unwrapping friction doesn't have much ROI.

but here, where 'beta' is the slope of a least-squares fitted line:

  let calcBeta (stockSym:string) (indexSym:string) = ... 
that's a mistake waiting to happen, because if you accidentally flip the arguments when you call the function you get a different number. Better to have:

  type StockSymbol = StockSymbol of string
  type IndexSymbol = IndexSymbol of string 
  let calcBeta (IndexSymbol index) (StockSymbol stock) = ...  
then call the function like this:

  let beta = calcBeta (IndexSymbol "SPX") (StockSymbol "AMZN")
Of course you could still make a mistake:

  let beta = calcBeta (IndexSymbol "AMZN") (StockSymbol "SPX") 
but that would stick out like a sore thumb.


We don’t do it for every type, instead on a case by case basis. IDs are usually the obvious ones since we might have a Product class with fields like ProductId, BrandId, List[TagId], SellerId, etc... and it’d be nice to not mess that up.

Another example is if you have some pricing info you could group it in a case class like:

  case class PriceInfo(price: BigDecimal, discount: BigDecimal, currency: Currency)
And then put that into your product class:

  case class Product(id: ProductId, price: PriceInfo, ...)
The nice thing about this is that all these related fields can be dealt with in context, i.e. if a function accepts a PriceInfo then it now has access to the price itself and the currency. Not to mention that a price without currency in an e-commerce app can be quite ambiguous.



