
Losing the GIL strictly makes the language more flexible. Previous GILectomies tanked performance to an unacceptable degree. In single-threaded code, this one is a moderate performance improvement in some benchmarks, and a small detriment in others -- which is about as close to perfect as one could expect from such a change. That's why people are excited about it.

At a higher level, Python is getting serious about performance. But this gives both flexibility and performance.



Call me optimistically skeptical. I share the original comment author's reservations about the GIL obsession, but if this is true:

> The overall effect of this change, and a number of others with it, actually boosts single-threaded performance slightly—by around 10%

Then it sounds like having your cake and eating it too (optimism). Although my experience keeps nagging at me with, "there is no such thing as a free lunch" (skepticism).


Perhaps better coverage on LWN:

https://lwn.net/Articles/872869/

The no-GIL version is actually about 8% slower on single-threaded performance than the GIL version, but the author bundled in some unrelated performance improvements that make the no-GIL version overall 10% faster than today's Python.


Right, the 20% boost is unrelated to the Gilectomy.

> though, as Guido van Rossum noted, the Python developers could always just take the performance improvements without the concurrency work and be even faster yet.

Why be 10% faster single threaded when you can be 20% faster single threaded!


This suggests that the unrelated patches improve perf by 20% (0.92 * 1.20 ~= 1.10)

I would love to be proven wrong but I am skeptical.


That is already explained by the author [1]:

> The resulting interpreter is about 9% faster than the no-GIL proof-of-concept (or ~19% faster than CPython 3.9.0a3). That 9% difference between the “nogil” interpreter and the stripped-down “nogil” interpreter can be thought of as the “cost” of the major GIL-removal changes.

[1] https://docs.google.com/document/d/18CXhDb1ygxg-YXNBJNzfzZsD...
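The percentages quoted in this thread are actually consistent with each other; a quick sanity check of the arithmetic (exact figures depend on which benchmarks you average, so these are the rounded numbers from the doc):

```python
# Figures from the author's design doc, relative to CPython 3.9.0a3:
nogil = 1.10          # no-GIL branch: ~10% faster overall
patches_only = 1.19   # same speedups with GIL removal stripped out: ~19% faster

# Implied single-threaded cost of the GIL-removal changes themselves:
cost = patches_only / nogil   # ~1.08, i.e. roughly 8-9% slower with no-GIL
print(f"no-GIL changes alone cost ~{(cost - 1) * 100:.0f}%")
```

Which is where LWN's "about 8% slower" figure and the doc's "9% faster" stripped-down comparison both come from.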


Thanks for this.

It’s interesting because I think many people (myself included) would be far more interested in the perf patches than the GILectomy.


Why? It's not like CPython is a speed demon. I'd think there are some low-hanging fruit, simply because performance is such a low priority for the maintainers. It doesn't even do TCO after all.


Subscriber link from Twitter if you (like me) ran into a paywall:

https://lwn.net/SubscriberLink/872869/0e62bba2db51ec7a/



> Although my experience keeps nagging at me with, "there is no such thing as a free lunch" (skepticism).

Well, yeah, someone had to make the changes. That's the cost that was paid.

You can get a mass-produced machete that is cheaper and higher-quality than a 7th-century sword. It's easy for one thing to be better than another thing across several dimensions simultaneously. That's why certain technologies go out of use -- they have negative value compared to other technologies. But that has nothing to do with the principle that there's no such thing as a free lunch.


I feel like you aren't well informed on why removing the GIL results in a single-threaded performance hit. And while I think it's always nice to keep in mind the developer effort required, it's not the only cost as GIL removal has been done before (several times, even as far back as Python 1.5 [1]).

The crux of the issue (as I understand it) is that the GIL absolves the Python interpreter of downstream memory access control. You can replace the GIL with memory access controls of various strategies, but the overhead of that access control is just that: overhead. In a multi-threaded program the concurrency gains should outweigh that overhead, but in a single-threaded one it's just extra work that wasn't being done before.
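A toy illustration of why the tradeoff exists: under the GIL, CPU-bound pure-Python threads can't execute bytecode in parallel, so threading buys nothing for this kind of work (a minimal sketch, not a benchmark):

```python
import threading

def sum_squares(lo, hi, out, idx):
    # CPU-bound pure-Python work; under the GIL only one thread runs
    # bytecode at a time, so two threads take as long as one.
    out[idx] = sum(i * i for i in range(lo, hi))

N = 200_000
results = [0, 0]
threads = [
    threading.Thread(target=sum_squares, args=(0, N // 2, results, 0)),
    threading.Thread(target=sum_squares, args=(N // 2, N, results, 1)),
]
for t in threads:
    t.start()
for t in threads:
    t.join()

assert sum(results) == sum(i * i for i in range(N))
```

Without the GIL these threads could genuinely run in parallel, but each object access then pays for finer-grained synchronization -- exactly the overhead/gain tradeoff described above.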

Which brings us back to no free lunch. It turns out that the claim "10%" faster without the GIL is actually a result of Gross (GIL removal author) doing a multitude of unrelated performance improvements. These performance improvements increase performance enough that the performance of single-threaded no GIL code (with overhead) is ~10% higher than today. But as Guido pointed out, the core developers could upstream the performance improvements without the GIL removal:

> To be clear, Sam’s basic approach is a bit slower for single-threaded code, and he admits that. But to sweeten the pot he has also applied a bunch of unrelated speedups that make it faster in general, so that overall it’s always a win. But presumably we could upstream the latter easily, separately from the GIL-freeing part. [2]

[1] https://docs.python.org/3/faq/library.html#can-t-we-get-rid-...

[2] https://lwn.net/ml/python-dev/CAP7+vJJ1hzXiyDwVs6-eXed+DtodH...


> he has also applied a bunch of unrelated speedups that make it faster in general

Tell me how "faster in general" doesn't make you suspicious about a free lunch.


To be explicit, I was skeptical because I believed that GIL removal requires adding overhead for managing memory access. Having dug a bit deeper into it that seems confirmed. The proposed GIL removal strategy _is slower for single-threaded code_ like other solutions before it. It turns out the reported performance increase was the result of orthogonal performance improvements overshadowing the overhead of GIL removal.

Put another way, if the performance improvements were upstreamed without removing the GIL the resulting performance increase would be ~20% instead of just ~10%. Which is what Guido was getting at in the quote I cited. Assuming the benchmarks to be true for the moment, this means that removing the GIL on this PoC branch is a 10% performance hit to single-threaded workloads.


> "there is not such thing as a free lunch" (skepticism).

When you carry a heavy suitcase filled with lead and you drop it, things get lighter for free. You paid for it by carrying the damn thing around with you for the whole time.


Removing the GIL will improve performance of multi-threaded code, but the issue with Python performance is single-threaded code and its rich object system.

Can't see Python getting there unless we go to Python 4, which, given the fiasco that was Python 2->3, is probably never gonna happen.

Might as well wait for Julia to improve its TTFP than hope for a Python 4.


Well, CPython probably won't ever get there. But Python as a language maybe could.

The GraalPython implementation of Python 3 is built on the JVM, which is a fully thread safe high performance runtime, and Graal/Truffle provide support for speculation on many things. For pure Python it provides a 5-7x speedup already and the implementation is not really mature. Although at the moment they're working on compatibility, in future it might be possible to speculatively remove GIL locks because you have support for things like forcing JITd code to a safepoint and discarding it, if you want to change the basic semantics of the language.


How does it relate to PyPy? I read that the latter uses a tracing JIT, while GraalPython builds on Truffle's AST-based one, which basically maps Python's structures onto the JVM's primitives and thus makes use of all the man-hours that went into the JVM's development.

But last time I checked, PyPy had much better performance than GraalPython, even though TruffleJS (a JavaScript interpreter built on the same model as GraalPython) has performance comparable to the V8 engine for long-running code. Though, let me add, the latter is the most actively developed Truffle language.


This is different from Jython, correct?


It's sort of taking Jython's implementation approach to a much greater extreme, and bypassing bytecode, so it isn't limited by the Java semantics anymore.

It resolves a few big problems Jython had:

- GraalPython is Python 3, not Python 2

- It can use native extensions that plug into the CPython interpreter like NumPy, SciPy etc. The C code is itself virtualized and compiled by the JVM!


Neat! Now that I've gotta check out!


Yah, that's definitely the future I'm hoping for. What I am worried about are the kind of transition issues I mentioned. Python 2 -> 3 strictly made the language more flexible too - but the Python ecosystem is about existing code almost more than the language and I worry that we could find similar problems here. Potential for plenty of growing pains while chasing relatively small gains.


In the company I'm working for, we had to spend more engineering time on GIL workarounds (dealing with the extra complexity caused by multiprocessing, e.g. patching C++ libraries to put all their state into shared memory) than we needed for the Python 2 -> 3 migration. And we've only managed to parallelize less than half of our workload so far.

Even if this will be a major breaking change to Python, it'll be worth it for us.
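For what it's worth, Python 3.8+ at least makes the shared-memory part of that workaround less painful via the stdlib. A minimal sketch (real code would layer a NumPy array or a C-struct view on top of the buffer):

```python
from multiprocessing import shared_memory

# Parent: allocate a named shared block and write state into it.
shm = shared_memory.SharedMemory(create=True, size=16)
shm.buf[:5] = b"hello"
name = shm.name  # pass this name to the worker process

# Worker (normally another process): attach by name and read,
# with no copying of the data between processes.
worker_view = shared_memory.SharedMemory(name=name)
data = bytes(worker_view.buf[:5])
assert data == b"hello"

# Cleanup: close() in every process, unlink() exactly once.
worker_view.close()
shm.close()
shm.unlink()
```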


Python needs to be compiled into machine language to ever have a chance of competing on speed. We can already get around the GIL with multiprocessing, but Python is still too slow even when not bound by copying memory between processes.


The phrase "competing on speed" raises the question "competing with what?" If the answer is "languages compiled to machine code", then yes, it's unlikely Python will ever match their speed without also being compiled to machine code. But there are plenty of other interpreted languages with better performance than Python (even ruling out things like Java, which technically isn't "compiled into machine language" in the way that phrase usually means). Lots of work has gone into JavaScript engines to improve performance, and I don't think that has cost the language much flexibility.


I use Python. I don't love it, but it has a good selection of libraries for what I do. It's not blazing fast, but not terribly slow either.

As for multiprocessing, I currently have 150 Python processes running on the work cluster, each doing its bit of a large task. The heavy lifting is in a Python library, but it's C code. It's actually not bad performance-wise, and frankly it wasn't too bad to code up. I think for my use case threads would make it harder.

Maybe I'm liking Python more over time...
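The pattern described above -- many independent processes each taking a slice of a large task -- might look something like this (a sketch; `process_chunk` is a hypothetical stand-in for the C-backed library call):

```python
def chunk(items, n_workers):
    """Split a task list into roughly equal slices, one per worker."""
    size, rem = divmod(len(items), n_workers)
    out, start = [], 0
    for i in range(n_workers):
        # The first `rem` workers each take one extra item.
        end = start + size + (1 if i < rem else 0)
        out.append(items[start:end])
        start = end
    return out

tasks = list(range(1000))
slices = chunk(tasks, 150)   # one slice per worker process

assert len(slices) == 150
assert sum(len(s) for s in slices) == 1000
# Each worker process would then run something like:
#   results = [process_chunk(t) for t in my_slice]   # hypothetical
```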


Java is technically compiled into machine language; it's just a matter of choosing a JDK that offers such options. Many people don't, but that's their problem, not a lack of options.

JavaScript interpreters that people actually use have a JIT built in.


> other interpreted languages

Both Java and JavaScript are ultimately compiled into machine code through a JIT. And this matters, because Python doesn't have a JIT.

Even Ruby, historically a much slower language, is gaining a JIT nowadays. Python has no excuse.


I don't think the goal is to "compete on speed", but I'm sure people wouldn't complain about their Python scripts running 15x faster on their 16 core CPU.

And it is also about flexibility. What I love about Python is its simplicity, and let's be honest, multiprocessing is anything but simple. Especially if you fall into one of its gotchas (unpicklable data, for example).
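The unpicklable-data gotcha is easy to hit: `multiprocessing` serializes arguments with `pickle`, and things like lambdas, open file handles, and locally defined functions don't survive that. A quick demonstration:

```python
import pickle

# Plain data pickles fine -- this is what crosses process boundaries.
assert pickle.loads(pickle.dumps({"n": 42})) == {"n": 42}

# A lambda does not, so passing one to e.g. multiprocessing.Pool.map
# fails with this same kind of error before any worker even runs.
try:
    pickle.dumps(lambda x: x + 1)
    raised = False
except (pickle.PicklingError, AttributeError, TypeError):
    raised = True
assert raised
```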


Multiprocessing is quite easy -- have you tried aio_multiprocess?



