I think the main barriers are context length (useful context, that is) and the data just not really existing. GPT-4o has “128k context”, but it’s mostly sensitive to the beginning and end of the context and blurry in the middle, and that’s consistent with other LLMs. How many large-scale, well-written, well-maintained projects are really out there? Orders of magnitude fewer than there are examples of “how to split a string in bash” or “how to set up validation in Spring Boot”. We might “get there”, but it’ll take a whole lot of well-written projects first, written by real humans, maybe with the help of AI here and there. Unless, that is, we build it with the ability to somehow learn and understand faster than humans.
I don’t know, some of these guys have access to a LOT of code, and there’s even more debate about what a good codebase actually entails.
I think the other issue is more relevant. Even 128K tokens is not enough for something really big, and the memory and processing costs at that length do skyrocket. People are trying to work around it with draft models and summarization models: they try to pick out the relevant parts of a codebase in one pass and then base the code generation just on that, and… I don’t think that’s going to work reliably at scale. The more chances you give a language model to lose its goddamn mind and start making crap up unsupervised, the more work it’s going to be to take what it spits out and shape it into something reasonable.
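To make the “pick out the relevant parts in one pass” idea concrete, here’s a rough toy sketch. It is not how any real product does it: `tokens` and `select_context` are names I made up, word overlap stands in for whatever draft or summarization model would do the ranking, and the “token” count is just a word count.

```python
import re

def tokens(text):
    """Crude stand-in for a tokenizer: lowercase word-like pieces."""
    return re.findall(r"[a-z_][a-z0-9_]*", text.lower())

def select_context(chunks, task, budget):
    """One-pass selection (hypothetical helper): rank chunks by word
    overlap with the task, then keep the best ones that still fit
    inside the token budget. Anything not picked is never seen by
    the model that generates the code."""
    task_words = set(tokens(task))
    scored = []
    for name, text in chunks.items():
        words = tokens(text)
        overlap = len(task_words & set(words))
        scored.append((overlap, name, len(words)))
    # Highest overlap first; ties broken by name to keep it deterministic.
    scored.sort(key=lambda s: (-s[0], s[1]))
    picked, used = [], 0
    for overlap, name, size in scored:
        if overlap == 0 or used + size > budget:
            continue  # irrelevant, or does not fit in what is left
        picked.append(name)
        used += size
    return picked

chunks = {
    "auth.py": "def login(user, password): check password hash",
    "session.py": "def refresh(token): validate login session token expiry",
    "billing.py": "def charge(card, amount): call payment gateway",
}
print(select_context(chunks, "fix the password check in login", budget=10))
```

With a tight budget, `session.py` gets dropped even though it touches login, and the generation step has no way to know it existed. That’s the failure mode I’m worried about, just at toy scale.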