Language agnostic development may be quite viable within a year or so.
I doubt that very much. GPT4 (to my knowledge still the best LLM) is far from being there. Now that my initial hype has worn off, I have basically stopped using it, because I have to “help” it too much (and it got really worse over time…), so I spend more time getting usable results out of it than I would just writing the goddamn code myself. It would take a very large step forward for this to be anywhere near feasible (though maybe it already works for some “boilerplate” React UI code). Keep in mind that you still have to review all the code, which takes a good chunk of the time (especially if it’s full of issues, as LLM output tends to be). Often I go over it and think yes, this is OK, and then I check it in more detail and find a lot of issues that cost me more time than writing the code myself would have in the first place.
I have actually fed GPT4 a lot of natural language instructions to write code, and it was kind of a disaster. I have to try that again with more code in the instructions, as I think it’s better to just give an LLM the code directly. If it really gets smart enough, it will understand the intentions of the code without comments (as it has seen a lot of code).
Context size is also a big issue: the LLM just doesn’t have enough of an overview of the code and the relevant details. (I need to try out the 32k GPT4 model though and feed it more of the architecture’s code. This may help, but is obviously a lot of work…)
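To put a rough number on the context problem, here is a minimal Rust sketch that estimates how much of a codebase fits into a 32k window. Everything in it is an assumption for illustration: the function names are made up, the file contents are toy strings, and the "about 4 characters per token" heuristic is only a rule of thumb, not how any real tokenizer counts.

```rust
/// Very rough token estimate. "About 4 characters per token" is a rule of
/// thumb for English text and only an assumption for source code.
fn estimate_tokens(text: &str) -> usize {
    (text.chars().count() + 3) / 4
}

/// Walks the files in the order given and keeps every file that still fits
/// into the remaining budget. Returns the chosen names and the tokens used.
fn select_files<'a>(files: &[(&'a str, &'a str)], budget: usize) -> (Vec<&'a str>, usize) {
    let mut used = 0;
    let mut chosen = Vec::new();
    for (name, content) in files {
        let cost = estimate_tokens(content);
        if used + cost > budget {
            continue; // does not fit, but a later, smaller file still might
        }
        used += cost;
        chosen.push(*name);
    }
    (chosen, used)
}

fn main() {
    // Hypothetical files; in practice you would read them from disk.
    let files = [
        ("lib.rs", "pub mod parser;\npub mod eval;\n"),
        ("parser.rs", "pub fn parse(input: &str) -> Vec<&str> { input.split(' ').collect() }\n"),
    ];
    let (chosen, used) = select_files(&files, 32_000);
    println!("{chosen:?} use roughly {used} of 32000 tokens");
}
```

Ordering the input by relevance (architecture overview first, leaf modules last) is left to the caller, which is exactly the "lot of work" part.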
Same for humans: if your code is really too complex, you can likely simplify it so that humans can read it without comments. If not, it falls into the first category I’ve listed (complex math or similar), and then of course comments make sense for a complex piece of code that may need more context. Otherwise I would only add comments for edge cases and ideas (e.g. TODO). For the rest, a good API doc (javadoc, rustdoc etc.) is more than enough; if it’s clear what a function should do and the function is written in a modular way, it should be easy to read the code IMHO.
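As a small illustration of that split, here is a hedged Rust sketch: the rustdoc line states what the function does, the body is simple enough to need no narration, and the only inline comment is a TODO for an edge case. The function itself is a made-up example, not taken from any real codebase.

```rust
/// Returns the arithmetic mean of `values`, or `None` if the slice is empty.
fn mean(values: &[f64]) -> Option<f64> {
    if values.is_empty() {
        return None;
    }
    // TODO: very long slices may lose precision; consider compensated summation.
    Some(values.iter().sum::<f64>() / values.len() as f64)
}

fn main() {
    println!("{:?}", mean(&[1.0, 2.0, 3.0]));
    println!("{:?}", mean(&[]));
}
```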
Really, if you need comments, think about the code first: is this the simplest approach? Can it be made more readable? I feel like I wrote a lot of “unreadable” (or too complex) code in my junior years…
What else makes sense to me is a high-level description of the architecture.
How were you feeding it?
There’s a world of difference between using ChatGPT and something like Copilot within a mature codebase.
Once a few of the Copilot roadmap features are added, I suspect you’ll be seeing yet another leap forward.
Too many people commenting on this subject focus on where the tech is today, without appropriately considering the jump from where it was a year ago and what that means for next year or the year after.
I’m mostly using ChatGPT4, because I don’t use VS Code (I use Helix), and as far as I could tell from colleagues, the current Copilot(X) is not helpful at all…
I describe the problem (context etc.), maybe paste some code, and hope that it gets what I mean. When it doesn’t (which seems to be rather often), I try to help it with the context it missed, but it very often fails unless the code is rather simple (i.e. boilerplate-y). But even when I want GPT4 to generate a bunch of boilerplate, it inserts something like // repeat this 20 times in between the code that it should actually generate. Even if I tell it multiple times to generate the exact code, it fails pretty much every time, even with the increased context size via the API, where it should be able to do it in one go. The gpt4-0314 model (via the API) seems to be a bit better here.
I’m absolutely interested in where this leads, and I’m the first to monitor all the changes, but right now it slows me down rather than really helping me. Copilot may be interesting in the future, but right now it’s dumb as fu… I’m not writing boilerplate-y code, it’s rather complex stuff, and it fails catastrophically there. I don’t see that changing in the near future. GPT4 got dumber over the course of the last half year; it was certainly better at the beginning. I can remember being rather impressed by it, but now meh…
It’s good for natural language stuff, but not really for novel creative stuff in code (I’m doing most of my work in Rust btw.).
But GPT5 will be interesting. I doubt that I’ll really profit from it for code-related stuff (maybe GPT6 or so), but we’ll see… All the other developments in that space are also quite interesting. Once it’s actually viable to train or constrain your own LLM (e.g. something like the recent CodeLlama release) on your own bigger codebase, so that it really gets the details and gives actually helpful suggestions, this stuff may become more interesting for actual coding.
I’m not even letting it generate comments (e.g. above functions), because currently they are kind of like this (figuratively speaking; the real ones are fancier but wordy, and not really helpful):
    // this variable is of type int
    let a = 8;
I couldn’t disagree with your colleagues more, and I suspect they are reporting experiences from a fresh codebase or from early on in Copilot’s release.
With a mature codebase, it feeds a lot of that in as context, and so suggestions match your naming conventions, style, etc.
It could definitely use integration with a linter so it doesn’t generate subtle bugs where generated names don’t match actual methods/variables, but it’s become remarkably good, particularly in the past few weeks.
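To sketch what such a check could look like in the simplest possible form: the Rust snippet below scans generated code for things that look like calls and reports names that are not in a list of known symbols. This is purely an illustration under loud assumptions. It is a naive text scan, not how Copilot or any real linter works, and the function and symbol names are hypothetical. A real integration would ask the compiler or language server instead.

```rust
use std::collections::HashSet;

/// Collects names that look like calls (`name(`) in `code` but are missing
/// from `known`. Naive text scan: no parsing, no scoping, no method lookup.
fn unknown_calls(code: &str, known: &HashSet<&str>) -> Vec<String> {
    let mut unknown: Vec<String> = Vec::new();
    let mut current = String::new();
    for ch in code.chars() {
        if ch.is_alphanumeric() || ch == '_' {
            current.push(ch);
            continue;
        }
        // An identifier directly followed by `(` is treated as a call.
        if ch == '(' && !current.is_empty() && !known.contains(current.as_str()) {
            if !unknown.contains(&current) {
                unknown.push(current.clone());
            }
        }
        current.clear();
    }
    unknown
}

fn main() {
    // Hypothetical symbols that actually exist in the codebase.
    let known: HashSet<&str> = ["parse_config", "load_file"].iter().copied().collect();
    // Hypothetical model output with one misnamed function.
    let generated = "let cfg = parse_config(load_file(path)); let x = parse_cfg(cfg);";
    println!("{:?}", unknown_calls(generated, &known));
}
```

Constructors such as `Some(` would also be flagged unless they are added to the known set, which is one reason a text scan like this is only a starting point.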
BTW, if you want more mileage out of ChatGPT, I would actually encourage it to be extremely verbose with comments. You can always strip them out later, but the way generative models work, the things it generates along the way impact where it ends up. There’s a whole technique around having it work through problems in detailed thoughts, called “chain of thought prompting”, and you’ll probably get much better results by instructing it to work through what needs to be done in a comment before it writes the code than by just having it write the code.
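A minimal sketch of that workflow in Rust, under stated assumptions: the prompt wording is just one plausible phrasing (not a tested recipe), both function names are made up, and the stripping step only handles whole-line `//` comments. Sending the prompt to a model is left out on purpose.

```rust
/// Builds a prompt that asks the model to reason in comments before coding.
/// The instruction wording is an assumption, not a known-good recipe.
fn build_cot_prompt(task: &str, context_code: &str) -> String {
    format!(
        "Task: {task}\n\n\
         Relevant code:\n{context_code}\n\n\
         Before each piece of code, write a `//` comment that works through \
         what needs to be done and why. Then write the code itself."
    )
}

/// Drops lines that consist only of a `//` comment; `///` doc comments stay.
fn strip_line_comments(code: &str) -> String {
    code.lines()
        .filter(|line| {
            let trimmed = line.trim_start();
            !(trimmed.starts_with("//") && !trimmed.starts_with("///"))
        })
        .collect::<Vec<_>>()
        .join("\n")
}

fn main() {
    let prompt = build_cot_prompt(
        "parse a port number from a string",
        "fn parse_port(s: &str) -> Option<u16> { todo!() }",
    );
    println!("{prompt}\n");

    // Stand-in for what a model might return.
    let generated = "// First trim whitespace, then parse as u16.\nlet port = s.trim().parse::<u16>().ok();";
    println!("{}", strip_line_comments(generated));
}
```

Keeping `///` lines when stripping means the reasoning comments go away while the function docs written for humans survive.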
And yes, I’m particularly excited to see where the Llama models go, especially as edge hardware is increasingly tailored for AI workloads over the next few years.
Maybe I should try it again, though I doubt that it will really help me. I’m a fast typist, and I don’t like being interrupted all the time by something wrong (or not really useful) when I’m in a creative phase (a good LSP like rust-analyzer seems to be the sweet spot, I think). And something like Copilot just seems to confuse me all the time, either by showing plain wrong stuff, or it goes something like: what does it want? Ahh, makes sense -> but why this way, that way is better (and then I write it how I would’ve done it anyway). So I’ll just skip it, for more complex stuff at least.
But it would be interesting to see how it does with code that’s a little less exotic and not living on the edge of the language, like typical frontend or backend stuff.
In what context are you using it that it gives good results?
Yeah, I don’t know. I’m not writing code to feed it to an LLM; I like to write it for humans, with good function docs (for humans). I hope that an LLM will some day be smart enough to get the context. That may be soon enough, but until then I don’t see a real benefit of LLMs for code (other than as (imprecise) boilerplate generators).