Python 3 Types in the Wild (neverworkintheory.org)
68 points by zdw on March 18, 2022 | hide | past | favorite | 112 comments


> stripped annotations out of files and asked PyType to infer them, it failed to do so in 77% of cases [...] MyPy found that only 15% of the 2,678 repositories examined were type-correct; this may be a result of MyPy being very conservative and producing false positives

The paper referenced is a year and a half old, and type checkers have fixed/added a lot since then. Also, I'd guess that more projects use Mypy than Pytype and fix the errors it reports, so that may be another reason more pass Mypy even without annotations (though conservatism does play a role).


Does anybody swear by Pytype? I'm puzzled by it existing between MyPy (static) and unit tests (dynamic).


Wdym "between"? Pytype is another static analyzer.


> which allow developers to say that a function returns Dict[List[Set[FrozenSet[int]]], str]

this type is impossible in Python


This bugged me, too.

You can have a dictionary which maps to strings, you can have a list containing sets of frozensets of ints, but you can’t have a dictionary of unhashable types.


Yep, here's how it plays out:

  Python 3.8.5 (default, Sep  4 2020, 02:22:02) 
  [Clang 10.0.0 ] :: Anaconda, Inc. on darwin
  >>> d=dict()
  >>> l=[1,2,3]
  >>> d[l]=42
  Traceback (most recent call last):
    File "<stdin>", line 1, in <module>
  TypeError: unhashable type: 'list'


https://pypi.org/project/forbiddenfruit/

Apparently there are ways to monkey patch core types, so you could probably add a __hash__ method to list…


While it's impossible for this type to exist in Python, will a type checker complain if you write that type out? Does it have the notion that the first type parameter of `Dict` (or just `dict` in Python 3.9+, IIRC) must be `Hashable`?


Correct. Keys must be immutable and hashable.


Technically keys only have to be hashable, but an object that is hashable but not immutable is not a good idea and in practice the two will go together.
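To make the parent's point concrete, here's a minimal sketch (the class name is invented for this example) of an object that is hashable but mutable, and why that combination bites when it's used as a dict key:

```python
# Sketch of a hashable-but-mutable class (`Point` is made up for the example).
class Point:
    def __init__(self, x, y):
        self.x, self.y = x, y

    def __hash__(self):
        # Hash depends on mutable state -- the source of the trouble below.
        return hash((self.x, self.y))

    def __eq__(self, other):
        return isinstance(other, Point) and (self.x, self.y) == (other.x, other.y)

p = Point(1, 2)
d = {p: "start"}
p.x = 99          # mutate the key after insertion
print(p in d)     # False: the entry is stranded under the old hash
```

The dict stored the entry under the old hash value, so after mutation the key can no longer be found, even though it's "in there".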


With this type system, is it possible to subclass string and int, to enforce types for, let's say, firstName, postalCode, etc.?


yes, but the current way of defining it [1] is not very elegant:

    FirstName  = NewType('FirstName', str)
    PostalCode = NewType('PostalCode', str)

[1] https://docs.python.org/3/library/typing.html#newtype
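For what it's worth, here is roughly how that plays out in practice (the `greet` function is invented for the example); NewType is an identity function at runtime, so the distinction only exists for the checker:

```python
from typing import NewType

FirstName = NewType('FirstName', str)
PostalCode = NewType('PostalCode', str)

def greet(name: FirstName) -> str:
    return f"Hello, {name}!"

fn = FirstName("Ada")        # just the plain str "Ada" at runtime
print(greet(fn))             # fine for the checker and at runtime
# greet("Ada")               # mypy would reject: "str" is not "FirstName"
# greet(PostalCode("90210")) # likewise rejected, though both are strs
print(isinstance(fn, str))   # True: NewType adds no wrapper class
```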


Yes, simply

    FirstNameType = str
(But don't use ints for postal codes, even if the postal codes are numeric)


Huuum. If I annotate a function so it only accepts a FirstNameType, and I use the statement you propose, won't any string be acceptable as an input parameter, from the type system's point of view?


Yeah, that is the syntax for a type alias. You can use NewType to create a distinct type that the checker treats as incompatible with plain str (at runtime it's just an identity function, not an actual subclass). https://docs.python.org/3/library/typing.html#newtype


In that case, you need a subclass of string that does nothing but has its type as FirstNameType. Wherever you create or return strings that are first names, you'll need to instantiate FirstNameType instead.


> ... even if the postal codes are numeric

Maybe in the US. In Canada, they are 6 characters of letter-digit-letter-digit-letter-digit (in regex: /((?:[A-Z][0-9]){3})/)


And in the future, they could be anything. Finland expanded the allowed characters in person identification numbers to support people born after 2000. Usually bad idea to be too strict in checking fields where people manually enter text.


Does IntelliJ use any kind of static type analysis to help the user autocomplete properly?

[== do we have something like Java???]


Yes. Try pycharm.


Within reason.

I use type annotations solely as means of documenting code, with the added benefit of having autosuggest work all of the time. I am one of those guys who put a lot of comments in their code.


I strictly use type annotations in python as if python was a typed language. This is within reason as typed languages exist.


Not only impossible in python, but also seems conceptually useless (why would you have a set of frozen sets?).


Almost the entire purpose of frozensets was so that you could have sets of frozensets. It's the only way to have nested sets. Frozensets also work as dictionary keys.


This is not useless. Why would you have a list of lists? A list is just an iterable container class, which is what a set is.

If the need exists for a list of lists then it exists for a set of sets.

An example of this is finding all possible unordered groups (combinations) of letters given a collection of letters such as: abadfjiik.

A typical algorithm will produce dupes. But if your accumulator is a set of frozen sets you can produce a result with no dupes.
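A quick sketch of that accumulator pattern (the input string is from the example above; the exact range of group sizes is an assumption):

```python
from itertools import combinations

letters = "abadfjiik"

# Accumulator is a set of frozensets: frozensets are hashable, so the
# outer set silently drops duplicate groups -- both reorderings and the
# repeats caused by the duplicate 'a' and 'i' in the input.
groups = set()
for r in range(1, len(letters) + 1):
    for combo in combinations(letters, r):
        groups.add(frozenset(combo))

print(frozenset("ab") in groups)   # the group {a, b} is stored exactly once
```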


Ensure uniqueness, I guess? Though you'd lose that when you make the list of sets.

I could maybe see something like this as an intermediary result of parsing some horrible, nested obscure message format (except the dict, that baffles me unless the key-value types are reversed?)


I guarantee something like that is running in production somewhere in the world (maybe without the list part).


Python type annotations are more useful for a human as a form of documentation, than they ever will be to a machine. I think Python completely ignores them at runtime, and tools like pytype or mypy will never capture all the complicated hacks possible in the Python type system (as evidenced by only 15% of repos passing mypy)


Exactly. When I initially started using typed Python, I would write code as though I were using a statically-typed language. In order to appease mypy, I found myself increasingly needing to either rewrite my code in awkward ways or add explicit ignore comments.

What happened is I started losing out on the advantages that Python provides as a dynamically-typed language without truly gaining those of statically-typed ones. Then I realized, type hints are just that: hints. They're a form of documentation. In some cases, they're very helpful, but in other cases, it doesn't really make sense to use them. I don't need to appease mypy. The type hints are there for the benefit of myself and other developers.


There is definitely truth to this statement. However often when I write something that’s hard to type, it’s a code smell. Ex. A complex nested map should probably be represented by a few concrete classes/abstractions.

(Similarly, when something you wrote is hard to test, it's probably just poorly abstracted code.)


I agree, and I think a good thing about the type hints is they help uncover these kinds of code smells. But there are definitely cases where it's not worth the effort to fight mypy.

One example is that variable re-assignment can cause type errors in mypy but re-assigning a variable to a value of a different type is an extremely common and reasonable thing to do in Python. Another situation where strict typing can be borderline hopeless is when parsing e.g. very dynamic content like json. Sure you could encode every possible situation using types but this would basically defeat the purpose of using Python for such a task.
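The re-assignment case looks roughly like this (a sketch; the exact mypy message depends on version and settings):

```python
# Plain Python is happy with this; mypy's default settings are not.
x = 10        # mypy infers x: int from the first assignment
x = "ten"     # mypy: incompatible types in assignment (str vs int)

# Common workarounds: an explicit union annotation, or a fresh name.
from typing import Union
y: Union[int, str] = 10
y = "ten"     # fine for mypy too
print(x, y)
```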


> rewrite my code in awkward ways or add explicit ignore comments.

Probably just used to writing poor quality (immediately unintelligible to someone other than the author but functional) code then.


Using typing in Python is a code smell. Lack of understanding of what the language is for and how to use it.


> Python type annotations are more useful for a human as a form of documentation, than they ever will be to a machine

I mean, machines serve people (for now!). So in addition to type documentation, type annotations are useful for things like "jump to definition" and "find references" and automatic refactoring. Moreover, the utility of annotations for type documentation depends on the correctness of those type annotations--if you don't have something like Mypy, those annotations inevitably grow outdated over time, and incorrect type annotations are even worse than none at all.

> tools like pytype or mypy will never capture all the complicated hacks possible in the Python type system

Technically you can represent any complicated hack if only via `Any`, but that's just pedantry and we all know what you mean. Even still, it's not like "well, you can't represent everything, so there's no point in trying to appease mypy" (which may not be your intended implication--I can't tell). The benefits of appeasing mypy are proportional to the amount of your code that is properly annotated--this is the thesis of gradual typing.

> as evidenced by only 15% of repos passing mypy

This very likely indicates that 85% of repos aren't running mypy regularly, or they're in some transition state. Certainly more than 15% of real-world Python can be described via Python type annotations.


I've used Python typing heavily at work, and it was really useful for detecting bugs and helped with development.

> as evidenced by only 15% of repos passing mypy

That is no evidence. Of course, if you don't have mypy in CI (or pre-commit hooks or whatever), your repo will not pass mypy. But if you have it in CI, then you can rely on them being correct.

> tools like pytype or mypy will never capture all the complicated hacks possible

But they can either enforce everything being correctly typed (if you go to really strict settings), or yes, you will have some code untyped, so you will be careful around that part. It's not like all or nothing.


Python does ignore them at runtime, but as long as you're validating your inputs (e.g. using Pydantic for HTTP request validation) then you can trust that the static types are truthful


I want to emphasize for folks reading: if you want actual type enforcement and you want that type enforcement with minimum boilerplate, Pydantic is a fantastic option (I'm sure there are others as well). It's a great library that's totally use-agnostic (it's not HTTP specific, you can use it for whatever) and quite tight. Caveats are that it uses runtime validation to enforce types so there's a performance cost, but it's a performance hit that is mostly eclipsed by all the other huge performance hits that litter your typical Python implementation.

Pydantic is amazing!


IIRC, pydantic was quite big and I chose typeguard instead for a smaller project. Might be an option.


> trust that the static types are truthful

That's often a mistake even in typed languages like typescript. Satisfying a static type analyzer isn't enough. Real data is whatever it is going to be. If your types are meaningless at runtime, you can easily encounter type mismatches.

Static typing as implemented in typescript and python type hints is a tool for engineers instead of a tool for systems.


The types are usually ignored, but not entirely erased - Pydantic uses type hints to do validation at runtime, so if you construct a Pydantic object with invalid data it will throw an error, e.g.

    foo=PydanticFoo(**request.json())
will try to validate a JSON request against the implied schema provided by the type hints on PydanticFoo while constructing it, and will throw if it fails.
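A toy, stdlib-only version of the idea: read the type hints at runtime and check values against them. Pydantic does far more (coercion, nested models, rich errors, ...); `Foo` and `validate` are invented for this sketch and only handle plain, non-generic types.

```python
from dataclasses import dataclass
from typing import get_type_hints

def validate(obj):
    # Compare each attribute's runtime type against its annotation.
    for name, expected in get_type_hints(type(obj)).items():
        value = getattr(obj, name)
        if not isinstance(value, expected):
            raise TypeError(f"{name}: expected {expected.__name__}, "
                            f"got {type(value).__name__}")

@dataclass
class Foo:
    name: str
    count: int

validate(Foo(name="ok", count=3))         # passes silently
try:
    validate(Foo(name="ok", count="3"))   # a bare dataclass accepts this...
except TypeError as e:
    print(e)                              # ...the validator does not
```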


You missed my point. If you validate input, then the static types are representative of runtime types


Ack, so I did :P


This is only because the Python type hinting ecosystem is so bad. Tons of projects don't have type hints, there are multiple conflicting interpretations of the types, MyPy is very buggy (Pyright is much much better but few people use it).

Compare it to Typescript which is a similar sort of tacked on type hinting system - Typescript is basically always machine checked.


Type annotations are used at runtime in some cases, most notably when defining dataclasses.
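For example, `@dataclass` reads the class's `__annotations__` at runtime to decide which fields to generate (class name invented for the sketch):

```python
from dataclasses import dataclass, fields

# @dataclass inspects the annotations and generates
# __init__/__repr__/__eq__ from them.
@dataclass
class Coord:
    x: int
    y: int = 0

print([f.name for f in fields(Coord)])  # ['x', 'y']
print(Coord(3))                         # Coord(x=3, y=0)
```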


I find this a bit of a wart on the dataclass implementation. Type hints (to me) are more like a comment than a statement. So, from my perspective, dataclasses feel like they're pulling data out of nearby comments. Ick.

But it's a wart worth the conciseness, at least, until a better idea comes along.

Maybe if type hints could be optionally enforced as a language feature, I'd feel it was more integrated.


Most static typed compiled languages also ignore types at runtime :)


Not true. Java, C#, Scala, Kotlin, Swift, Dart, and others all keep around type information at runtime so that casts can throw exceptions at runtime when the value isn't of the right type. Otherwise, soundness is lost.

In practice, most modern statically typed languages maintain soundness through a combination of compile time and runtime checking.


> most modern statically typed languages maintain soundness through a combination of compile time and runtime checking.

Rust, C, C++ - all do not keep around this information.

You're right that the JVM languages (and thus, naturally, C#) do it differently. I think it is a bit silly to name out each of the JVM languages separately.


Haskell, Go, and I think OCaml also rely on some amount of runtime checking for soundness.

I think basically every managed statically typed language ends up carrying around runtime representation and uses some amount of runtime checks.


Haskell has full type erasure.


It still has to carry around enough information to detect non-exhaustive matches and throw at runtime, no?


Only in the same way that "if ... else if ... else" needs to know about types, which is to say mostly not.

AFAIK - and this might be either dated or just wrong - the two places GHC doesn't fully erase are 1) polymorphic recursion (where if we squint it's probably not unreasonable to treat the dictionaries as type tags), and 2) when the programmer explicitly asks for run-time type information with Typeable.


But what are you performing the "if else" on? Type level information stored at runtime, no?


Value level information. The compiler statically knows the type that's being matched on. It's true that the generated code depends on the type, but that's not any more true than when a C compiler generates an integer comparison vs a float comparison in the conditional of that "if". All that's looked at at runtime is the data.


Yes, but those aren't types, they're values.


I am unfamiliar with Haskell and Go. What you are saying is not true for OCaml


My understanding is that you can have non-exhaustive matches which the compiler won't always catch statically and will throw a Match_failure at runtime if no case matches.


Non-exhaustive matches are always detected statically in OCaml.

However, such matches are considered valid code: "non-exhaustive match" is a warning by default, even if most people advise to turn that warning into an error.

In other words, the `Match_failure` exception is part of OCaml non-typed semantics and there is no soundness issue involved here.


But how is Match_failure detected at runtime unless type information is passed around?


There is no type information at runtime at all in OCaml. Once the program has been typed, it is translated into an untyped lambda-calculus and all type information is thrown away. In particular, pattern matching is compiled into this untyped lambda-calculus using if/then/else (and switch statements, exceptions, exception handlers, ...). The type system is sufficient to detect before this compilation whether the pattern is total or whether a failure case needs to be added. In that case, the `Match_failure` is just a catch-all term in the conditional expression.

For instance

  let f x = match x with
  | [] -> ()
is translated into

    (is_empty/267 =
       (function x/269 : int
         (if x/269
           (raise (makeblock 0 (global Match_failure/18!) [0: "r.ml" 1 17]))
           1)))
where you can see the exception being built and raised in the `then` branch of the test. By contrast, the total function

  let is_empty x = match x with
  | [] -> true
  | _ :: _ -> false
becomes

  (let (is_empty/267 = (function x/269 : int (if x/269 0 1)))
And since the function doesn't need to handle the failure case, it doesn't have that `(raise ...)` case.

Another important point is that type system information is only needed to check the exhaustiveness of pattern matching in the presence of GADTs. Otherwise the exhaustiveness of pattern matching can be checked using syntactic criteria on the patterns and type definitions.


Yes, sorry, I meant in the context of GADTs. Is it really full type erasure if that GADT information is passed around?


No type information is ever passed to a compilation phase lower than the typechecker in the OCaml compiler pipeline.

In the GADTs case, the type system is only used to remove the failure branch. For instance,

    type 'a t = A: int t | B: float t
    let always_a (x:int t) = match x with
    | A -> ()
    | _ -> .
is translated to

  (let (always_a/270 = (function x/272[int] : int 0))
because the typechecker can prove that the `B` case is impossible. In other words, GADTs are yet another instance where the type system can be used to eliminate dead code in the untyped IR.


I see, so you can never really get a Match_failure due to a type-level mismatch with GADTs, because it would be caught by the compiler.

Makes sense to me.


You know, that's a solid question that I don't know the answer to.


C++ community is split by the RTTI compiler flags


Rust standard library includes RTTI support, though it's very rarely used.


If you make it part of the deploy / build process and force programers to only use the subset of features supported by a type checker then you can benefit.


The first time I tried typing some Python code was a little experimental HTML generator, pure functional style, that used the keyword args dict as a kind of miscellaneous grab-bag to pass functions and data to sub-functions. You set up a dict and called a top-level function with it and then it worked like a kind of template system, but again all through functions & applications of functions rather than "mail merge" of templates, etc.

It was simple, elegant, perfectly cromulent Python code. However, there was no way to specify the type of the dicts because they were so heterogeneous.

- - - -

I have to say, I was a Python fan for ages, used it professionally for over a decade, so this isn't an outsider's opinion: Python is kind of over. It won't die any time soon, but it has been "improved" out of its sweet spot of applicability, and no longer has a compelling story for adoption going forward. Every niche Python is known for serving well now has even better options:

- Rapid Application Development: Nim, Red

- Readability: Go, Nim, D, Zig

- Science: Julia

- Large multi-dev projects: Java, Go

- Systems programming (python glue): D, Rust, Zig

- Web app backends: Erlang/Elixir

And so on...

Really, as much as I still love Python (and I do) the only things I feel it's appropriate for these days are small scripts that need to remain flexible (in other words, that are likely to be modified often), things that are too complex to be a CLI command, but less complex or more dynamic than, say, an email client or RSS reader. If the type domain of the program doesn't change rapidly or unpredictably then you don't need Python's dynamic typing. If you need static typing I think you should use a language that supports that out-of-the-box, not tacked on later.


I think what is missing from all of those languages you suggest is them providing an intersection of those niches.

Yes I am quite sure for any specific niche there is a language that is better suited than Python (though I would argue with a lot of your choices there). But let's say I want a readable language that is good at data science, web backend, and provides a system glue. Python handles that much better than all the others you mentioned.

Also, the one choice I am going to explicitly call out is Rust as a good systems glue language. In fact, a very common practice I see now is that all the good Rust libraries are getting Python bindings and being made available on PyPI, so all that power of Rust for systems programming can be used with Python as the glue language.


You've got a point in re: Python being good enough for several niches. However, I think the set of people who want to do all those things but don't want to use multiple languages is pretty small, like it's a hobbyists' market, eh? A scientist typically wouldn't write web backend, a sysadmin doesn't do a lot of statistical stuff, etc.

A small startup might do well to make their MVP in Python, but as the code grows the implicit costs (of using Python) do too.

- - - -

In re: Rust, sorry I wasn't clear above. I don't mean that Rust is a glue language, I mean that people write e.g. grep replacements in it and things like that. Python does systems programming by being glue, Rust does it by being, well, Rust. It makes sense to me that Rust libs would get Python wrappers, but it also seems to me that that adds to my argument: Python is good for small glue, but crunchy things (like grep) should be written in e.g. Rust or Go or something.

(I feel I should mention in this connexion my favorite bit of obscure Python trivia: Python was originally meant to be the shell language of the Amoeba distributed OS! https://en.wikipedia.org/wiki/Amoeba_(operating_system) )

- - - -

One other thing about Python is that the packaging & distribution "story" is ridiculous now. The people in charge of that call themselves the Python Packaging Authority (which name, given what they're doing, reminds me of Brazil the movie) and they seem to me to be running amok, cargo-culting the crap out of what should be a pretty simple and straightforward problem. I could go on but I feel a rant brewing, so I'll cut it off there.

It's not just the PyPA folks that are having problems packaging and distributing Python. The Conda folks have shipped Tkinter in a broken state for five years now: https://github.com/ContinuumIO/anaconda-issues/issues/6833 That's the default GUI toolkit in the Python standard library.

Compare and contrast with Rust's Cargo, or Nim's Nimble, or Erlang's Rebar, etc.


Nim is a systems programming language [1] just with very decent ergonomics. Interfacing to C/C++ is trivial (just a declaration) and to others (e.g. Rust) is possible. E.g., there is code that interfaces to a custom Linux kernel module for fast file tree walking. [2] { That same github account has several other "systems programming in Nim projects" - low overhead syslogd, color ps/color top, ls, fast symspell/edit distances, a prototype search engine, an alternative to on-disk protobuffers with online random access, etc. There are also a lot over at https://github.com/treeform and surely many others. }

You can also export Nim impls to Python easily, though I very much agree Python's time in the sun is over (and I say this having been a user since ~1993). It now seems more a generator of problems/complexity than of simplifying solutions. Nim can also compile to Javascript letting you do both front- & back-ends in Nim.

Science is an almost unbounded fractal set of human investigations. Nim is usable for science (e.g. ask questions here [3]), but the ecosystem is undeniably much smaller than Julia or Python's. (Julia's is much smaller than Python's, and for some things R can be much bigger, actually.) So, a "depends what you need" caveat just has a much stronger "caveat force" here.

What "large, multi-dev" needs is honestly harder to say/more subjective/maybe manager-subjective. If it is "cheap, just out of school programmers with which to replace expensive, departing mess-makers", I think Java may have always had more plentiful options than Python. ;-) This is closer to a human-organization-complete kind of concern and almost intrinsically "very dependent upon circumstances/context". I just didn't want to ignore it. At least Stefan Salewski [4] thinks Nim is suitable for teaching people introductory programming, though.

[1] https://nim-lang.org/

[2] https://github.com/c-blake/cligen/blob/master/cligen/dents.n...

[3] https://app.element.io/#/room/#nim-science:envs.net

[4] https://ssalewski.de/nimprogramming.html


"No one goes there anymore, it's too crowded."

Silly, Python was always second-best at everything and promoted as a prototyping language from the beginning. But that's more powerful than you've given credit for. You get extreme productivity up front, and a growing number of exit strategies for the fraction of projects that outgrow it.


FWIW, I still use Python as a prototyping language, it's just so flexible and you can model things in it in so many neat ways.

> a growing number of exit strategies for the fraction of projects that outgrow it.

I like that formulation. Cheers!


Your post fits with a conversation I had with someone last week, so let me ask - why would you choose Erlang / Elixir over Python for web backends? I've got my reasons for both camps, but since you point out the above information I'd love to hear your thoughts.


The semantics of the BEAM VM and Erlang Run-Time System (ERTS) are much closer to the typical semantics of a large dynamic web app backend than those of Python. No surprise there, it was designed to support telephone exchanges. It provides facilities natively that have to be "bolted on" to Python.

I might still write a prototype or proof-of-concept web app in Python, but for anything more serious it seems to me that the advantages of Erlang loom large (and that's before you get to Phoenix LiveView: https://www.phoenixframework.org/ )


I wouldn't put golang under readability or large multi-dev projects; it fails spectacularly on both counts.


After many years of writing Python code, I just can't stand the distraction and messiness that types result in, and the appearance of clutter and unnecessary characters and symbols in otherwise neat code.


After many years of writing and maintaining Python code, I respectfully disagree with your statement. The marginal reading overhead that type annotations bring (which can be alleviated by specific color schemes) is dwarfed by the ability to have a reasonable expectation of the code base's trustworthiness.


Also, just the ability to understand a code base: exploring libraries in dynamic Python is super painful.


PyCharm is really good when exploring Python, but browsing Python code on GitHub is much more difficult without it. I've pulled down some projects more than once just to get PyCharm's type inference and documentation. With type annotations, just reading the file without an IDE is much nicer.


A Python code base's trustworthiness depends on its unit tests. If you are using type hints for that, you have seriously gone wrong.


I'll never understand the desire to bolt types onto Python.

Dynamic typing is not a deficiency, it's a design choice. SICP for example, is filled with small examples of things you can do with dynamic languages you cannot trivially do with static one. There are certain styles of solving problems that suit dynamic languages particularly well, and anyone programming in a dynamic language should understand and embrace these.

There are of course drawbacks to dynamic typing, as is typically the case you exchange predictability for flexibility.

If, for your given problem or preferred style of programming, you want types there are so many fantastic languages around today that have much better type systems than what was available when Python was first gaining popularity.

Programming well in a statically typed language is fundamentally different than programming well in a dynamically typed language. Typing in python will never be powerful enough to allow you to "think in types", and if it were then python really wouldn't be python anymore.


Python is strongly typed though, in addition to being dynamically typed like you said. I think that makes Python a great language for type hints. I know the language won't do an unexpected conversion the way javascript would, yet variables aren't confined to just one type since it's dynamic - and the type hints don't hinder that at all. It's basically just an extra feature in Python for helping lint tools and as documentation for the user.
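A small illustration of that combination (strong plus dynamic typing); the JavaScript comparison in the comment is the `"1" + 1 == "11"` coercion:

```python
# Strong typing: no implicit cross-type coercion, unlike JavaScript,
# where "1" + 1 evaluates to "11".
try:
    "1" + 1
except TypeError as e:
    print(e)        # e.g. "can only concatenate str (not "int") to str"

# Dynamic typing: the same name may be rebound to another type; a hint
# documents intent but is not enforced at runtime.
x: int = 1
x = "one"           # runs fine; mypy or a linter would flag it
```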


99+% of Python code is not intentionally using dynamic typing in order to take advantage of wacky SICP algorithms. It's good that Python allows that flexibility, but it's also good that in much more common cases you can indicate that a function argument should be a string rather than an int and have IDEs and linters be aware of that.


The usage of dynamic typing gets rid of a lot of boiler plate code and over complicated designs. Python code that doesn't use typing is generally better than Python code that does.


Would you please give an example of an algorithm or technique you think is particularly poorly suited (or impossible) in a statically typed language?

I’m curious to see how different an equivalent solution might look in a statically typed language and if any features like type classes or row polymorphism close that gap.


Yeah it’s just that Python dominates some ecosystems like machine learning. There isn’t much of a choice, so making these code based more maintainable is a valiant effort


I think the best way to get some perspective on unnecessary clutter versus valuable annotations is to wade into a project that you've never seen before, with many thousands of lines of python.

If you pick a line at random and try to understand what it does or why it's failing, you will have a hard time. Python (by design) is so mega ultra dynamic that each line could conceivably turn the world upside-down. Most code I've seen doesn't take advantage of that dynamism. But still, it's tricky just reasoning about things like "what kind of values could that call return?" or "what kind of operation does the index really do here?"


Funnily, I am about to accept a job to semantize some Python code where nobody knows exactly what all those floats, ints, strings and arrays in all those functions really correspond to.

[note: I am a Java developer :)]


I'm a python developer and within the last couple years I've encountered some interviews where they explicitly told me not to code in python.

It's because it's the language of choice for bad programmers who don't learn languages with any amount of depth. It's getting associated with mediocrity because it's so easy to pick up and low-quality coders turn to it.

Your job solidifies this point. Literally, they need to hire someone to type annotate their code base because they don't know how? How low can you go?


> the language of choice for bad programmers

Sure, and Java went through this phase too, and I'm sure so have plenty of other languages. It's the stage at which a language has reached mass adoption and it is inevitable, but it doesn't mean there's anything wrong with the language.


The problem they face is that they need to chain function calls, and they spend ages discovering that a given function returns speed in m/s, whereas the next one expects miles/hour as an input. That is just a small example of the issue they face (one that could be solved by other means than statically typing the Python code); they really want to make things more robust.


This sounds like the people working on that code before had no clear picture of making a consistent interface, even if only by convention. Basically one should always default to standard SI units, or let the type system guarantee it, or, lacking such a system, write comments explaining things such as units, and perhaps even give variables names indicating the unit. Another option would be using libraries which take care of calculating with units and checking the units given.

Hope you can fix it. However, be prepared for other kinds of issues, which arise when there was not much discipline in writing the code.


> who had no clear picture of making a consistent interface, even if only by convention.

There are few conventions that are consistent across domains. As a concrete example, I worked on a code base where functions from some libraries wanted angles as radians and others wanted them as degrees. Both were internally consistent and followed all the relevant conventions for their domain; however, where they intersected it was a pain.


Just to open up the discussion, I am looking at the FMI [1] standard as a way to describe the interface of different block codes. And then try to generate the FMI description from static analysis of the Python function definitions.

That’s my current plan of study. But as I said earlier, I have some background in static typing in Java, and I might find the current state of the art in Python not suited to my idea.

[1]: https://en.m.wikipedia.org/wiki/Functional_Mock-up_Interface


Something that seemed common in Erlang, but less so in Python, was to annotate numbers using tuples. If you used both radians and degrees in a single code base, they’d be passed around as named tuples.
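In Python the closest analogue is probably `typing.NamedTuple`, which gives you both a runtime tag and something a static checker can distinguish. A sketch with hypothetical names:

```python
import math
from typing import NamedTuple

# Tagged wrappers in the spirit of Erlang's {radians, X} tuples.
class Radians(NamedTuple):
    value: float

class Degrees(NamedTuple):
    value: float

def sine(angle: Radians) -> float:
    return math.sin(angle.value)

sine(Radians(math.pi / 2))  # fine
# sine(Degrees(90.0))  # a type checker rejects this
```

One caveat: named tuples still compare by plain tuple equality, so `Radians(1.0) == Degrees(1.0)` is True; if that matters, a frozen dataclass is safer.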


> How low can you go?

php?

Sorry, couldn't resist.

On the more serious side, what you're describing about python I feel happened to PHP long ago. I wonder if Ruby is next. Hope not. I'm doing mostly Ruby these days.


Good luck! Adding type hints to Python code that was written without them is extremely difficult.


After many years of python I welcome the change.

Neatness is in the eye of the beholder. Type safety is a quantitative metric of improvement. A program with type checking is safer than one without.

A quantitative improvement cannot be argued against. A qualitative one... Well, personally I find types to be neater than no types.


Yes, though I'd say subjective vs. objective rather than qualitative vs. quantitative. Type safety is a quality, not a quantity (unless you're counting the expressions that type-check!).


Type safety is a quantity measured in type errors.

How many type errors in a language that is not type checked vs. how many errors in a program that is?

The numerical difference is quantitative, so you can do data analysis on it. Though a type-safe program should in theory have ZERO type errors at runtime.


We can agree to disagree on that. :) By your metric, I can improve type safety simply by weakening the guarantees of my type system.

Putting it another way -- a type safe program, in the limit, should have an infinite number of type errors. :)


I'm the opposite. I can't stand looking at Python code that does not have type annotations. Type annotations have incredible utility, just visually.


After many years of writing Python code, I absolutely love python types. We've got type checking in our pre-commit and in our CI, and they help our IDEs with completions. I don't think code with type hints in it really looks any different, it's still code.


Python type hints are such a huge boon to the language.

I worked in what I would imagine was a pretty typical mature medium-size python codebase.

Typically, it was never too hard to figure out what a type was. But sometimes values were passed through several layers of functions under the same short variable name, and quite a few times I had to click around a lot to figure out what "f" was. Often I could infer it was some kind of file-ish thing, but whether it was a file object, a string, or some composite type that held one of the two as a member was difficult to determine, especially if the actual use of the parameter wasn't immediately obvious either.

When we updated to python 3, I started leaving type hints both in old code as I answered these questions for myself and new code as I created what otherwise would've been new questions for my future self. And I noticed two things happened:

* annoyance went down as I fixed the most well-trod of these problems

* I became less averse to using slightly more complicated types

I think python pushes you to keep things really simple and not use custom types unless really necessary. This is, overall, good? I do think Java developers, for instance (of which I consider myself one), generally reach to create datatypes that are basically just a simpler wrapper or a 2-tuple of collections or other primitives, and it can make the code a bit annoying to grok.

But the trouble is, in un-type-hinted python, I already start getting nervous about things like: [('2022-03-18', "something"), ('2022-03-19', "something else")]. And if your data content doesn't make it obvious (or at least somewhat guessable) what it is, it can make it hard to grok in a slightly worse way than the Java code would be.

In python 2 I'd usually make a namedtuple in these situations, but oftentimes I felt a bit weird there because I'd usually reach for it in lightweight situations when I feared they were becoming more complex.

But finally, in python 3, I feel like I'm generally happy with, in this order:

1. just use plain primitive types. no type hints. we all know what's in dates = ['2022-03-18'].

2. just use a type hint. I feel better about a Tuple[str, Dict[int, int]] if it's type hinted than not.

3. Use a namedtuple. This puts names onto the fields. so maybe my Tuple[str, Dict[int, int]] becomes a MyEntry(token: str, settings: Dict[int, int]) or something.

4. use a fully-fledged custom data type class.
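Steps 2 and 3 above might look something like this (`MyEntry` and its field names are the hypothetical ones from the comment):

```python
from typing import Dict, NamedTuple, Tuple

# Step 2: a bare type hint on an anonymous tuple.
entry: Tuple[str, Dict[int, int]] = ("abc123", {1: 2})

# Step 3: the same shape as a NamedTuple, so the fields have names.
class MyEntry(NamedTuple):
    token: str
    settings: Dict[int, int]

e = MyEntry(token="abc123", settings={1: 2})
e.token     # accessed by name instead of e[0]
e == entry  # True -- NamedTuples still compare equal to plain tuples
```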


I would recommend using `dataclasses.dataclass` over namedtuple. Namedtuples compare by plain tuple equality, so instances of unrelated namedtuple types (or even plain tuples) with the same field values compare equal, which is rarely what you want. With dataclasses you can type-annotate fields as well as generate various special methods via args to `@dataclass`, so you get better safety, immutable objects (when using `frozen=True`), `__eq__` generation, `asdict` to serialize complex objects (recursively!), and a bunch of other great stuff.

I come from a massive hard-realtime system codebase that's mostly in Python and there's a lot of moving parts and complex data types. Type hints, `typing.Protocol`, and `dataclass` are all godsends for having any sort of sane, human parseable structure to the code. Being able to navigate to type definitions with `gd`/ctrl + click is massively helpful.
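A minimal sketch of the `frozen=True` / `asdict` combination (the `Settings`/`Job` names are made up for illustration):

```python
from dataclasses import dataclass, asdict

@dataclass(frozen=True)
class Settings:
    retries: int
    timeout_s: float

@dataclass(frozen=True)
class Job:
    name: str
    settings: Settings

job = Job("sync", Settings(retries=3, timeout_s=1.5))

# frozen=True makes instances immutable (and hashable), __eq__ is
# generated from the fields, and asdict recurses into nested dataclasses.
assert job == Job("sync", Settings(3, 1.5))
assert asdict(job) == {"name": "sync",
                       "settings": {"retries": 3, "timeout_s": 1.5}}
```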


> On the other hand, MyPy found that only 15% of the 2,678 repositories examined were type-correct

Not surprising. In my experience, upgrading or downgrading Mypy always breaks type-checking on real code bases: older versions because they fail to infer types or have bugs, and newer versions because they are stricter or error on '# type: ignore' comments that are now unnecessary (but were needed to work around bugs that have since been fixed).

And there were 10 Mypy versions released over the last 12 months, so projects probably used different versions than the one the authors of the paper used.

While the article does not mention when they downloaded the repositories, I am guessing they started in August 2019, given section 3.1 (with no specified end date, besides the paper's publication in November 2020). And the single Mypy version they tried (0.770) was released in March 2020.


Python 3 types are very useful, especially when paired with something like PyLance. I started putting them everywhere and that's been a big boon to my productivity when writing Python.

Python is not Perl: it's very strongly typed, has operator overloading, and has close to no "special syntax" (i.e. Perl's sigils, etc.), so it's sometimes hard to tell from the code alone what a variable holds. `typing` helps a lot.


I learned Python before it had type hints. What books cover this topic well?


I don't know of any books, but I've found this Pycon talk to be useful: https://www.youtube.com/watch?v=pMgmKJyWKn8



It may be time for a book, but it has been changing so rapidly I'd not want a hardcopy yet.



