Copilot (the existing GPT-3 one) definitely helps with writing unit tests. Yeah, sometimes it doesn't nail it, but one thing it can do reliably is repeat a pattern, and I don't know about you, but my unit tests tend to repeat the same pattern (with some tweaks to test this-or-that case). Quite often it infers the correct change from the name I gave the test method, but even if it doesn't, it'll write a 90%-correct case for me. I imagine the GPT-4 version will do more of the same with better results.
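To make the "repeat a pattern" point concrete, here's a minimal sketch of the kind of suite I mean (Python/pytest, with a made-up parse_price function standing in for whatever is actually under test): after the first couple of tests, Copilot will usually suggest the entire body of the next one from the method name alone.

```python
import pytest


# parse_price is a made-up stand-in for whatever function is under test.
def parse_price(text: str) -> float:
    return float(text.replace("$", "").replace(",", ""))


def test_parse_price_plain_number():
    assert parse_price("42") == 42.0


def test_parse_price_with_currency_symbol():
    assert parse_price("$42") == 42.0


def test_parse_price_with_thousands_separator():
    # After the first two tests, Copilot will typically suggest this whole
    # body from nothing more than the method name.
    assert parse_price("1,042") == 1042.0


def test_parse_price_rejects_garbage():
    with pytest.raises(ValueError):
        parse_price("not a price")
```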
It cannot replace reasoning, but it can augment it (by suggesting patterns and implementations from its latent space that I hadn't thought of), and worst case it can replace quite a bit of typing.
Long-term, it remains to be seen how far bigger/better/stronger LLMs can push the illusion of rationality. In many fields, they may be able to simply build their ability to pattern-match to beyond some threshold of usefulness. Perhaps (some subset of) programming will be one of them.
> one thing it can do reliably is repeat a pattern
Isn't this something we've built into every modern language (and arguably the entire point of languages)? If you have multiple pieces of code that share code with tweaks (to test this or that case, for example), shouldn't you parameterize the bulk of the code instead of getting autocomplete to do the parameterization for you and dump the result into your source file multiple times?
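For instance (sticking with the hypothetical parse_price from the sketch above, and assuming pytest), a parameterized test expresses the same cases without duplicating the test body:

```python
import pytest


# Same made-up parse_price as in the earlier sketch.
def parse_price(text: str) -> float:
    return float(text.replace("$", "").replace(",", ""))


# One test body, many cases: the repetition lives in a data table
# instead of being pasted into the file N times.
@pytest.mark.parametrize(
    "raw, expected",
    [
        ("42", 42.0),
        ("$42", 42.0),
        ("1,042", 1042.0),
    ],
)
def test_parse_price(raw, expected):
    assert parse_price(raw) == expected
```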
Testing best practices have the opposite philosophy, for the most part: avoid abstraction as much as possible, and do repeat yourself. A bug in a test is insidious, so you want to minimize that risk, and one of the best ways to do that is to explicitly avoid abstraction.
That just generated an empty test function in a convenient place for me. I'm not just talking about boilerplate; it's definitely a more... organic-feeling sort of pattern matching. In fact, one of the things I find most interesting about it is the sort of mistakes it makes, like generating wrong field names (as if it simply took a guess). This is the sort of thing I've grown to expect the deterministic tooling of IDEs to get right, so it always surprises me a bit.
By the same token, often it takes a stab at generating code based on something's name (plus whatever context it's looking at) and does a better job than the IDE could, because the IDE just sees datatypes and code structure. It really does feel like a complementary tool.