Imagine a future in which artificial intelligence quietly shoulders the drudgery of software development: refactoring tangled code, migrating legacy systems, and hunting down race conditions, so that human engineers can devote themselves to architecture, design, and the genuinely novel problems still beyond a machine's reach. Recent advances appear to have brought that future tantalizingly close, but a new paper by researchers at MIT's Computer Science and Artificial Intelligence Laboratory (CSAIL) and several collaborating institutions argues that this potential future reality demands a hard look at present-day challenges.
Titled “Challenges and Paths Towards AI for Software Engineering,” the work maps the many software engineering tasks beyond code generation, identifies current bottlenecks, and highlights research directions to overcome them, with the aim of letting humans focus on high-level design while routine work is automated.
“Everyone is talking about how we don't need programmers anymore, and there's all this automation now available,” says Armando Solar-Lezama, MIT professor of electrical engineering and computer science, CSAIL principal investigator, and senior author of the study. “On the one hand, the field has made tremendous progress. We have tools that are way more powerful than any we've seen before. But there's also a long way to go toward really getting the full promise of automation that we would expect.”
Solar-Lezama argues that popular narratives often reduce software engineering to “the programming part: somebody gives you a spec for a little function and you implement it, or solving LeetCode-style programming interviews.” Real practice is far broader. It includes everyday refactors that polish a design, as well as sweeping migrations that move millions of lines from COBOL to Java and transform entire companies. It requires continuous testing and analysis, using property-based testing and other methods, to catch bugs and zero-day flaws. And it involves the maintenance grind: documenting decade-old code, summarizing change histories for new team members, and reviewing pull requests for style, efficiency, and security.
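To make one of those techniques concrete, here is a minimal sketch of property-based testing using Python's hypothesis library. The deduplication function and the properties it checks are invented for this illustration; they do not come from the paper.

```python
# A minimal sketch of property-based testing with Python's `hypothesis`
# library. The function under test and the properties are illustrative.
from hypothesis import given, strategies as st

def dedupe_keep_order(items):
    """Remove duplicates while preserving first-seen order."""
    seen = set()
    return [x for x in items if not (x in seen or seen.add(x))]

@given(st.lists(st.integers()))
def test_dedupe_properties(xs):
    out = dedupe_keep_order(xs)
    assert len(out) == len(set(out))              # no duplicates survive
    assert set(out) == set(xs)                    # nothing added or lost
    assert out == sorted(set(out), key=xs.index)  # first-seen order kept
```

Rather than pinning down one input-output pair, the test asserts properties that must hold for every generated input, which is what makes the approach effective at surfacing edge cases no one thought to write down.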
Industrial-scale code optimization, whether retuning GPU kernels or the relentless, multi-layered refinements behind Chrome's V8 engine, remains stubbornly hard to evaluate. Today's headline metrics were designed for short, self-contained problems, and although multiple-choice tests still dominate natural-language research, they were never the norm in AI-for-code. The field's de facto yardstick, SWE-Bench, simply asks a model to patch a GitHub issue: useful, but still akin to the “programming exercises” paradigm. It touches only a few hundred lines of code, risks data leakage from public repositories, and ignores other real-world contexts such as AI-assisted refactors, human-AI pair programming, or performance-critical rewrites that span millions of lines. Until benchmarks expand to capture these higher-stakes scenarios, measuring progress, and thereby accelerating it, will remain an open challenge.
If measurement is one obstacle, human-machine communication is another. First author Alex Gu, an MIT graduate student in electrical engineering and computer science, describes today's interaction as “a thin line of communication.” When he asks a system to generate code, he often receives a large, unstructured file and even a set of unit tests, yet those tests tend to be superficial. This gap extends to the AI's ability to effectively use the wider suite of software engineering tools, from debuggers to static analyzers, that humans rely on for precise control and deeper understanding. “I don't really have much control over what the model writes,” he says. “Without a channel for the AI to expose its own confidence, ‘this part's correct ... this part, maybe double-check,’ developers risk blindly trusting hallucinated logic that ends up in production. Another critical aspect is having the AI know when to defer to the user for clarification.”
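One can imagine what such a confidence channel might look like in practice. The sketch below is purely hypothetical, invented for this article rather than drawn from the paper: a generation result annotated with per-span self-estimates that a tool could use to flag code for human review. All names and fields are made up.

```python
# Purely hypothetical sketch, not an existing API: one way a coding
# assistant could expose per-span confidence instead of a flat code blob.
from dataclasses import dataclass

@dataclass
class AnnotatedSpan:
    code: str
    confidence: float  # model's self-estimate, 0.0 to 1.0 (invented field)
    note: str

suggestion = [
    AnnotatedSpan("def parse(ts): return ts.strip()", 0.93,
                  "simple, well-trodden pattern"),
    AnnotatedSpan("tz = infer_zone(ts)", 0.41,
                  "helper may not exist; double-check"),
]

# A client tool could surface low-confidence spans for human review
# instead of presenting everything with equal authority.
for span in suggestion:
    if span.confidence < 0.5:
        print(f"REVIEW: {span.code!r} ({span.note})")
```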
Scale compounds these difficulties. Current AI models struggle deeply with large codebases, which often span millions of lines. Foundation models learn from public GitHub, but “every company's codebase is kind of different and unique,” says Gu, which makes proprietary coding conventions and specification requirements fundamentally out of distribution. The result is AI-generated code that “hallucinates”: it looks plausible, yet calls nonexistent functions, violates internal style rules, fails continuous-integration pipelines, or clashes with a given company's helper functions and architectural patterns.
Retrieval is a related weak point: models often fetch the wrong code because they match on similar names and syntax rather than on the functionality and logic a task actually needs. “Standard retrieval techniques are very easily fooled by pieces of code that are doing the same thing but look different,” says Solar-Lezama.
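A toy sketch, invented for this article, makes the failure mode concrete: a naive token-overlap retriever ranks a snippet that merely looks like the query above a snippet that actually behaves like it.

```python
# Toy sketch of a naive token-overlap retriever confusing "looks similar"
# with "does the same thing". All snippets are invented for illustration.
import re

def tokens(code: str) -> set[str]:
    return set(re.findall(r"[A-Za-z_]\w*", code))

def jaccard(a: str, b: str) -> float:
    ta, tb = tokens(a), tokens(b)
    return len(ta & tb) / len(ta | tb)

query = "def mean(values): return sum(values) / len(values)"

# Same behavior as the query, different surface form.
semantic_match = """
def average(nums):
    total = 0.0
    for n in nums:
        total += n
    return total / len(nums)
"""

# Nearly identical surface form, wrong behavior (adds instead of divides).
syntactic_match = "def mean(values): return sum(values) + len(values)"

print(jaccard(query, semantic_match))   # low score despite same behavior
print(jaccard(query, syntactic_match))  # 1.0 despite wrong behavior
```

Production systems use learned embeddings rather than raw token overlap, but Solar-Lezama's quote suggests the underlying tension between surface form and semantics persists.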
Since there is no silver bullet for these problems, the authors call instead for community-scale efforts: richer, human-annotated data that captures the process by which developers write code (for example, which code developers keep versus throw away, and how code gets refactored over time); shared evaluation suites that measure progress on refactor quality, bug-fix longevity, and migration correctness; and transparent tooling that lets models expose their uncertainty and invites human steering rather than passive acceptance. Gu frames the agenda as a “call to action” for larger open collaborations that no single lab could muster alone. Solar-Lezama pictures incremental advances, “research results chipping away at each of these challenges separately,” that feed into commercial tools and gradually move AI from an autocomplete sidekick toward a genuine engineering partner.
Why does any of this matter? Software already underpins finance, transportation, health care, and the small details of everyday life, and the human effort required to build and maintain it is becoming a bottleneck. An AI that can take on the grunt work, and do so without introducing hidden failures, would free developers to focus on creativity, strategy, and ethics. But that future depends on recognizing that generating code is the easy part; the hard part is everything else, and the goal is not to replace programmers.
“With so many new works emerging in AI for coding, and the community often chasing the latest trends, it can be hard to step back and reflect on which problems are most important to tackle,” says Baptiste Rozière, an AI scientist at Mistral AI who was not involved in the paper. “I enjoyed reading this paper because it offers a clear overview of the key tasks and challenges in AI for software engineering. It also outlines promising directions for future research in the field.”
Gu and Solar-Lezama wrote the paper with University of California at Berkeley professor Koushik Sen and PhD students Naman Jain and Manish Shetty; Cornell University assistant professor Kevin Ellis and PhD student Wen-Ding Li; Stanford University assistant professor Diyi Yang and PhD student Yijia Shao; and Johns Hopkins University assistant professor Ziyang Li. Their work was supported, in part, by the National Science Foundation (NSF), SKY Lab sponsors and affiliates, Intel Corp. through an NSF grant, and the Office of Naval Research.
The researchers are presenting their work at the International Conference on Machine Learning (ICML).

















