> I'm curious - do you have ANY idea what it costs to have humans write 100,000 lines of code???
I'll bite - I can write you an unoptimised C compiler that emits assembly for $20k, and it won't be 100k lines of code (maybe 15k, the last time I did this?).
It won't take me a week, though.
I think this project is a good frame of reference and matches my experience - vibing with AI is sometimes more expensive than doing it myself, and always results in much more code than necessary.
Does it support x64, x86_64, arm64 and riscv? (sorry, just trolling - we don't know the quality of any backend other than x86_64, which is supposed to be able to build bootable Linux.)
> I can write you an unoptimised C compiler that emits assembly for $20k
You may be willing to sell your work at that price, but that’s not the market rate, to put it very mildly. Even 10 times that would be seriously lowballing in the realm of contract work, regardless of whether it’s “optimised” or not (most software isn’t).
> Deal. I'll pay you IF you can achieve the same level of performance. Heck, I'll double it.
> You must provide the entire git history with small commits.
> I won't be holding my breath.
Sure; I do this often (I operate as a company because I am a contractor) - money to be held in escrow, all the usual contracts, etc.
It's a big risk for you, though - the level of performance isn't stated in the linked article so a parser in Python is probably sufficient.
TCC, which has in the past compiled bootable Linux images, was only around 15k LoC in C!
For reference, given an engraved-in-stone spec for a command-line program (i.e. no tech stack other than a programming language with its standard library), a coder could reasonably produce 5000+ LoC per week.
Adding the necessary extensions to support booting isn't much either, because the 16-bit stuff can be done just the same as CC did it - shell out to GCC (thereby not needing many of the extensions).
Are you *really* sure that a simple C compiler will cost more than 4 weeks f/time to do? It takes 4 weeks or so in C, are you really sure it will take longer if I switch to (for example) Python?
> the level of performance isn't stated in the linked article so a parser in Python is probably sufficient.
No, you'll have to match the performance of the actual code, regardless of what happens to be written in the article. It is a C compiler written in Rust.
Obviously. Your games reveal your malign intent.
EDIT: And good LORD. Who writes a C compiler in python. Do you know any other languages?!?
> No, you'll have to match the performance of the actual code, regardless of what is in the article. It is a C compiler written in Rust.
Look, it's clear that you don't hire s/ware developers very much - your specs are vague and open to interpretation, and it's also clear that I do get hired often, because I pointed out that your spec isn't clear.
As far as "playing games" goes, I'm not allowing you to change your single-sentence spec which, very importantly, has "must match performance", which I shall interpret as "performance of emitted code" and not "performance of compiler".
> Your games reveal your intent.
It should be obvious to you by now that I've done this sort of thing before. The last C compiler I wrote was 95% compliant with the (at the time, new) C99 standard, and came to around 7,000-8,000 LoC of C89.
> EDIT: And good LORD. Who writes a C compiler in python. Do you know any other languages?!?
Many. The last language I implemented (in C99) took about two weeks after hours (so, maybe 40 hours total?), was interpreted, and was a dialect of Lisp. It's probably somewhere on Github still, and that was (IIRC) only around 2000LoC.
What you appear to not know (maybe you're new to C) is that C was specifically designed for ease of implementation.
1. It was designed to be quick and easy to implement.
2. The extensions in GCC to allow building bootable Linux images are minimal, TBH.
3. The actual 16-bit emission necessary for booting was not done by CC, but by shelling out to GCC.
4. The 100kLoC does not include the tests; it used the GCC tests.
I mean, this isn't arcane and obscure knowledge, you know. You can search the net right now and find 100s of undergrad CS projects where they implement enough of C to compile many compliant existing programs.
I'm wondering; what languages did you write an implementation for? Any that you designed and then implemented?
So you are not willing to put $20k in escrow for, as per your offer:
>>>> Deal. I'll pay you IF you can achieve the same level of performance. Heck, I'll double it.
I just noticed now that you actually offered double. I will do it. This is my real name, my contact details are not hard to find.
I will do it, with emitted binaries performing as well as or better than the binaries emitted by CC.
Put your $40k into a recognised South African escrow service (I've used a few in the past, but I'd rather you choose one so you don't accuse me of being some sort of African scammer).
Because I am engaged in a 6+ hours/day gig right now, I cannot do it f/time until my current gig is completed (and they are paying me directly, not via escrow, so I am not going to jeopardise that).
I can however do a few hours each day, and collect my payment of $40k only once the kernel image boots in about the same time that the CC kernel image boots.
> Yes, we all took the compilers class in college. Those of us who went to college, that is.
If you knew that, why on earth would you assume that implementing a C compiler is at all a complex task?
> Naw. I got him to reveal himself, which was the whole point.
Reveal myself as ... a contractor agreeing to your bid?
> It's amazing what you can get people to do.
There's a ton of money now floating around in pursuit of "proving" how cost-efficient LLM coding is.
I'm sure they can spare you the $40k to put into escrow?
After all, if I don't deliver, then the AI booster community gets a huge win - highly respected ex-FAANG staff engineer with 30 years of verified dev experience could not match the cost efficiency of Claude Code.
I am taking you up on your original offer: $40k for a C compiler that does exactly what the CCC program in the video does.
No, you're overestimating how complex it is to write an unoptimized C compiler. C is (in the grand scheme of things) a very simple language to implement a compiler for.
The rate probably goes up if you ask for more and more standards (C11, C17, C23...) but it's still a lot easier than compilers for almost any other popular language.
This is very much a John Henry claim that will, in the end, kill the OP. I'd rather have the OP using LLM-powered code review tools to add their experience to that AI-generated compiler.
That feels like a Silicon-Valley-centric point of view. Plus, who would really spend $20k building any C compiler in today's software landscape?
All that this is saying is that license laundering of a code-base is now $20k away through automated processes, at least if the original code base is fully available. Well, with current state-of-the-art you’ll actually end up with a code-base which is not as good as the original, but that’s it.
You wouldn’t pay a human to write 100k LOC. Or at least you shouldn’t. You’d pay a human to write a working useful compiler that isn’t riddled with copyright issues.
If you didn’t care about copying code, usefulness, or correctness you could probably get a human to whip you up a C compiler for a lot less than $20k.
In fact it is, and it can be useful. IF you have quality controls in place, so the code has a reasonable quality, the LOC will correlate with the amount of functionality and/or complexity. Is it a good metric? No. Can it be used as-is to compare arbitrary code bases? Absolutely not!
As a seasoned manager, I have an idea how long a feature should take, both in implementation effort and in length of code. I have to know it; it's my everyday work.
As an informal measure of the complexity of the code sure 100k lines are inherently more complex than 10k because there’s just more there to look at. And if you are assuming that 2 projects were made by competent teams, saying that one application is 10k LOC and one is 1 million might be useful as a heuristic for number of man hours spent.
But I can write a 100k LOC compiler where 90k lines are for making error messages look pixel perfect on 10 different operating systems. Or where 90k lines are useless layers upon layers of indirection. That doesn’t mean that someone is willing to pay more for it.
AI frequently does exactly that kind of thing.
So saying my AI made a 100k LOC program that does X, and then comparing the cost to a 100k LOC program written by a human is a nonsense comparison. The only thing that matters is to compare it to how much a company would pay a human to produce a program capable of the same output.
In this case the program is commercially useless. Literally of zero monetary value, so no company would pay any money for it. Therefore there’s nothing to compare it to.
That’s not to say it’s not an interesting and useful experiment. Or that things can’t be different in the future.
Without questioning the LOC metric itself, I'll propose a different problem: LOC for human and AI projects are not necessarily comparable for judging their complexity.
For a human, writing 100k LOC to do something that might only really need 15k would be a bit surprising and unexpected - a human would probably reconsider what they were doing well before they typed 100k LOC. Whereas an AI doesn't necessarily have that concern - it can just keep generating code and doesn't care how long it will take, so it doesn't have the same practical pressure to produce concise code.
The result is that while large enough human-written programs probably reach an average "density" in the ratio of LOC to the complexity of the original problem, AI-generated programs probably average out at an entirely different "density".
"I'm curious - do you have ANY idea what it costs to have humans write 100,000 lines of code???"
which any reasonable reading would take to mean "paid-by-line", which we all know doesn't happen. Otherwise, I could type out 30,000 lines of gibberish and take my fat paycheck.
Certainly tcc. Probably also rui314's chibicc as it's relatively popular. sdcc is likely in there as well. Among numerous others that are either proprietary or not as well known.
Well, if these humans can cheat by taking whatever degree of copycat liberty is needed to fit in the budget, I guess that a simple `git clone https://gcc.gnu.org/git/gcc.git SomeLocalDir` is as close to $0 as one can hope to get. And it would end up being far more functional and reliable. But I get that big-corp overlords and their wanna-match-KPI minions will prefer a "clean-room" code base.
Yep. Building a working C compiler that compiles Linux is an impossible task for all but the top 1% of developers. And the ones that could do it have better things to do, plus they’d want a lot more than 20K for the trouble.
What's so hard about it? Compiler construction is a well-researched topic and taught at universities. I made a toy language compiler as a student. Maybe I'm underestimating this task, but I think that I can build a simple C compiler which will output trivial assembly. Given my salary of $2500, that would probably take me around a year, so that's pretty close LoL.
Everybody talks as if Linux were the most difficult thing to compile in the world. The reality is that Linux is well written and was designed from the beginning with portability to crappy compilers in mind.
Also, the booting part, as stated several times, is debatable.
The reality is you can build Linux with gcc and clang. And that’s it. Years ago you could use Intel’s icc compiler, but that stopped being supported. Let’s stop pretending it’s an undergrad project.
It's a bit more nuanced. You can build a simple compiler without too many issues. But once you want it to do optimisations, control-flow protection, good and fast register allocation, inlining, autovectorisation, etc., that's going to take multiples of the original time.
Some of the hardest parts of the compiler are optimization and clear error handling/reporting. If you forego those - because you're testing against a codebase that is already free of things that break compilation and have no particular performance requirements for the generated code - it's a substantially simpler task.
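For a sense of what "no optimisation, no real error reporting" looks like in practice, here is a minimal sketch (my own illustration, not code from CCC, TCC or any compiler mentioned in this thread). It is a recursive-descent compiler for integer arithmetic expressions only (no unary minus, no variables) that emits naive stack-machine x86-64 assembly in AT&T syntax. It assumes x86-64 Linux and the GNU assembler; `tokenize`, `Compiler` and `compile_expr` are hypothetical names.

```python
import re

# Hypothetical toy: integer literals, + - * / and parentheses only.
TOKEN_RE = re.compile(r"\s*(\d+|[-+*/()])")


def tokenize(src):
    tokens, pos, src = [], 0, src.rstrip()
    while pos < len(src):
        m = TOKEN_RE.match(src, pos)
        if not m:
            raise SyntaxError(f"unexpected character at offset {pos}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens


class Compiler:
    def __init__(self, src):
        self.tokens = tokenize(src)
        self.i = 0
        self.out = []  # emitted assembly lines

    def peek(self):
        return self.tokens[self.i] if self.i < len(self.tokens) else None

    def take(self):
        tok = self.peek()
        if tok is None:
            raise SyntaxError("unexpected end of input")
        self.i += 1
        return tok

    def expr(self):  # expr := term (('+' | '-') term)*
        self.term()
        while self.peek() in ("+", "-"):
            op = self.take()
            self.term()
            self.binop(op)

    def term(self):  # term := factor (('*' | '/') factor)*
        self.factor()
        while self.peek() in ("*", "/"):
            op = self.take()
            self.factor()
            self.binop(op)

    def factor(self):  # factor := NUMBER | '(' expr ')'
        tok = self.take()
        if tok == "(":
            self.expr()
            if self.take() != ")":
                raise SyntaxError("expected ')'")
        elif tok.isdigit():
            self.out.append(f"    pushq ${tok}")  # toy: assumes 32-bit-sized literals
        else:
            raise SyntaxError(f"unexpected token {tok!r}")

    def binop(self, op):
        # No register allocation: both operands always go via the stack.
        self.out.append("    popq %rcx")  # right operand
        self.out.append("    popq %rax")  # left operand
        if op == "+":
            self.out.append("    addq %rcx, %rax")
        elif op == "-":
            self.out.append("    subq %rcx, %rax")
        elif op == "*":
            self.out.append("    imulq %rcx, %rax")
        else:
            self.out.append("    cqto")  # sign-extend %rax into %rdx:%rax
            self.out.append("    idivq %rcx")  # quotient lands in %rax
        self.out.append("    pushq %rax")


def compile_expr(src):
    c = Compiler(src)
    c.expr()
    if c.peek() is not None:
        raise SyntaxError(f"trailing token {c.peek()!r}")
    # The result is returned from main, i.e. becomes the process exit status.
    lines = [".globl main", "main:"] + c.out + ["    popq %rax", "    ret"]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(compile_expr("2*(3+4)"), end="")
```

Every operation pops its operands and pushes its result, which is exactly the kind of naive code generation an optimising compiler spends most of its effort improving. Assuming a GNU toolchain, saving the output as `out.s` and running `gcc out.s -o out && ./out; echo $?` should report the value as the exit status (truncated to 8 bits).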
Making a basic C compiler, without much error/warning detection and/or optimisations, is, as a matter of fact, not so difficult. At many universities it is a semester project for 2 to 3 students.
I’m not. I’ve been working with C on and off for 30 years. Linux requires GNU extensions beyond standard C. Once you get the basics done, there’s still a lot more work to do. Compiling a trivial program might work. But you’ll hit an edge case or 50 in the millions of lines in Linux.
I also should’ve qualified my message with “in 2 weeks”, or even “in 2 months.” Given more time it’s obviously possible for more people.
Interesting, why impossible? We studied compiler construction at uni. I might have to dig out a few books, but I’m confident I could write one. I can’t imagine anyone on my course of 120 nerds being unable to do this.
You are underestimating the complexity of the task, as do other people in this thread. It's not trivial to implement a working C compiler, and even less so to implement one that proves its worth by successfully compiling one of the largest open-source code repositories ever, which, by the way, is not even written in plain ISO C.
You thought your course mates would be able to write a C compiler that builds Linux?
Huh. Interesting. Like the other guy pointed out, compiler classes often get students to write toy C compilers. I think a lot of students don't understand the meaning of the word "toy". I think this thread is FULL of people like that.
I took a compilers course 30 years ago. I have near zero confidence anyone (including myself) could do it. The final project was some sort of toy language for programming robots with an API we were given. Lots of yacc, bison, etc.
If it helps, I did a PhD in computer science and went to plenty of seminars on languages, fuzz testing compilers, reviewed for conferences like PLDI. I’m not an expert but I think I know enough to say - this is conceptually within reach if a PITA.
Hey! I built a Lego technic car once 20 years ago. I am fully confident that I can build an actual road worthy electric vehicle. It's just a couple of edge cases and a bit bigger right? /s
That's really helpful, actually, as you may be able to give me some other ideas for projects.
So, things you don't think I or my coursemates could do include writing a C compiler that builds a Linux kernel.
What else do you think we couldn't do? I ask because there are various projects I'll probably get to at some point.
Things on that list include (a) writing an OS microkernel and some of the other components of an OS. Don't know how far I'll take it, but certainly a working microkernel for one machine, if I have time I'll build most of the stack up to a window manager. (b) implementing an LLM training and inference stack. I don't know how close to the metal I'd go, I've done some low level CUDA a long time ago when it was very new and low-level, depends on time. I'll probably start the LLM stuff pretty soon as I'm keen to learn.
Are these also impossible? What other things would you add to the impossible list?
Building a microkernel based OS feels feasible because it’s actually quite open ended. An “OS” could be anything from single user DOS to a full blown Unix implementation, with plenty in between.
Amiga OS is basically a microkernel and that was built 40 years ago. There are also many other examples, like Minix. Do I think most people could build a full microkernel based mini Unix? No. But they could get “something” working that would qualify as an OS.
On the other hand, there are not many C compilers that build Linux. There are many implementations of C compilers, however. The goal of “build Linux” is much more specific.
Have you ever seen the Tsoding YouTube channel? I'm sure Mr Zosin could very much do it in one week. And considering Russian salaries, it would be about an order of magnitude cheaper.
Do you think this was guided by a low quality Anthropic developer?
You can give a developer the GCC test suite and have them build the compiler backwards, which is how this was done. They literally brute forced it, most developers can brute force. It also literally uses GCC in the background... Maybe try reading the article.
> I'm curious - do you have ANY idea what it costs to have humans write 100,000 lines of code???
You should look it up. :)