This is an important moment. We now have verifiable evidence that these systems ...

_delirium · 2025-05-15T00:19:51 1747268391

Genetic programming systems have periodically made improvements to algorithms (dating back decades). Whether LLM-powered GP, which is effectively what this is, will be a step change or an evolution of that is still an open question, I think. I'm also a little wary of reading too much into the recursive self-improvement idea, because "the GP system can use GP to improve the GP system itself!" is a very old idea that has just never worked, although I realize that isn't proof that it won't eventually work.

Some related work from a different company: https://sakana.ai/ai-cuda-engineer/

And some academic papers kind of in this space: https://arxiv.org/abs/2206.08896, https://arxiv.org/abs/2302.12170, https://arxiv.org/abs/2401.07102

sebstefan · 2025-05-15T09:19:50 1747300790

It's always "revolutionizing our internal workflows" or "30% of code at Microsoft is AI now" but never improving a codebase you can actually see

Making a significant improvement to the state of the art of one particular algorithm is one thing, but I've seen new tools do that since the 80s

I'll be convinced when LLMs start making valuable pull requests that catch non-obvious corner cases or fix non-trivial bugs in mature FOSS projects

antihipocrat · 2025-05-14T23:52:55 1747266775

Is it new? I'm getting mixed messages from the posts here. On one side there is evidence that the 48- and 46-multiplication solutions were already known (and could have found their way into the model's training data).

On the other side I see excitement that the singularity is here.

If the latter were the case, surely we wouldn't be reading about it in a published paper; we would already know.

dymk · 2025-05-15T02:35:55 1747276555

Let's assume that the 46-multiplication algorithm was known prior to AlphaEvolve re-discovering it. AlphaEvolve has still made an improvement to a performance-critical area that has likely had thousands of engineer-hours put into it. None of those engineers apparently knew about the improved algorithm, or were able to implement it. This is empirical evidence of an LLM outperforming its (domain expert) human counterparts.

amelius · 2025-05-15T11:01:47 1747306907

Isn't this like comparing a human historian to Wikipedia though? Of course the knowledge in Wikipedia will in most cases beat the human. However, that's not the kind of thing we're looking for here.

dymk · 2025-05-15T15:35:01 1747323301

I don't think that's quite the comparison we're looking for though, because this wasn't just rote data retrieval. It was the recognition of patterns that humans could have noticed given enough time, but had not. It's more like a system that can make inferences as insightful as a thousand skilled human historians typing away at typewriters, armed with the collective knowledge of Wikipedia, in a short period of time.