Originally published in AI Advances.
Your AI coding assistant can talk you into approving a risky code fragment, and your IDE can run things the moment a folder is opened. Here is how to stay safe.
What if the most dangerous line you see today looked completely safe? That is the whole trick.
My story
A few weeks ago, a team shipped a small microservice after an AI assistant waved it through its security check. I watched the build notes march past on my screen and felt that familiar itch: the review was pleasant, but it had no edge, and it was not actually helpful. Replaying the same flow against a toy repository confirmed the hunch: the assistant could be coaxed into wrapping its request in a long glow of helpful-sounding context, and the risky fragment sat entirely outside the frame.
The same pattern soon turned up in another project from last year, where an IDE feature would load and run a task file every time the folder was opened. No malicious dark arts, just carelessness and misplaced trust. It was a quick lesson, but the moral was the same: the new era of AI does not retire old security problems; it just folds them under a friendlier surface.
Why it matters
Today's headline is blunt: researchers have demonstrated a so-called Lies-in-the-Loop (LITL) attack that gets an AI coding agent to present a harmful command as if it were safe, you press Enter in good faith, and the damage lands in your supply chain (Dark Reading, September 15, 2025).
In the underlying Checkmarx research, a long, carefully crafted context and an open GitHub issue can push the malicious command below the fold; the approval looks safe even when it is not (Checkmarx, September 15, 2025).
At the same time, another thread shows that your IDE can be part of the problem. According to Hacker News, Cursor, the AI-powered VS Code fork, ships with Workspace Trust disabled; any repository containing a .vscode/tasks.json can run tasks by default the moment the folder is opened, and that code runs under your account (September 12, 2025).
Put the two together and the effect is chilling: the agent can convince you, and the IDE can act on your behalf.
And yes, some will say this is just prompt injection by another name. The OWASP Top 10 for LLM Applications (2025) leads with prompt injection (LLM01) and puts improper output handling right behind it. The old rule still holds: untrusted input should never be allowed to drive a privileged action without strong controls (OWASP GenAI, 2025).
But here is the twist: the human is still in the loop, and the loop is exactly where the lie lands.
Optimists will say the tools are getting better and a human can always read the whole prompt. Note well: scrolling is not review, defaults are decisions, and supply chains are unforgiving of soft mistakes.
The fix, in steps
Below is a short playbook a team can work through this week. It reads like a routine rather than a feature list, because the fix is not a feature; it is a habit.
- Turn Workspace Trust on (and pin it). Enable Workspace Trust in any AI-enabled IDE, Cursor included, and configure unfamiliar folders to open in Restricted Mode. Even better: audit repositories before you open them (a sketch follows this list). Tip: .vscode/tasks.json looks like configuration, but it is executable code, and it should be treated as such.
- Make agents show their work, not their vibe. Human-in-the-loop (HITL) is not a control if the human cannot see the risky delta. Require the agent to present a short, fixed-format action plan with the exact command and its purpose in a monospace box, and refuse approval when the content overflows the box (a size-capped approval sketch follows this list). That blunts the trick of burying the payload below the fold.
- Separate render from run. Keep the agent in a render-only sandbox (plan, diff, test outline) and execute through a separate runner with a strict allowlist (sketch below). The agent proposes; the runner enforces. OWASP frames this as improper output handling plus supply-chain risk: constrain what the output can do, and secure what it pulls in.
- Pin your models and artifacts. Pulling an author or model by name alone from a public hub is a bad idea. Pin to an immutable SHA and consider running your own registry (a digest-check sketch follows this list). Palo Alto's work on model namespace reuse shows exactly why a name by itself cannot be trusted.
- Approve small diffs, or nothing at all. Before anything runs, show only the smallest possible diff or a single command. No story, no extra changes, no emoji; nothing more, nothing less. If the agent cannot show the diff, it does not run. Tip: short review windows reduce fatigue, and fatigue is what social engineering feeds on.
- Instrument the splash zone. Give agents a cheap, throwaway environment with scoped API keys, no project secrets, and a kill switch (sketch below). Log every outbound call and every file touched. If a run goes bad, nuke the sandbox, not your laptop.
- Run adversarial exercises. Purple-team your own workflow: plant injected instructions in tickets, READMEs, and issues, just as the LITL research did, and measure time-to-detect and time-to-kill. Reward slow, careful approval habits. The first thing our own drill uncovered would have cost us dearly if it had shipped.
- Treat approval as an interface, not a pile of checkboxes. Build your own approval prompt: high contrast, uniform, monospace. The most effective control is a more surgical consent form, not a conversation. Ask yourself what you would lose first, your logs or your access.
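
To make the first bullet concrete, here is a minimal Python audit sketch that walks the checkouts you name and flags any .vscode/tasks.json that auto-runs on folder open. The report wording and the decision to treat a non-strict JSON parse as a finding are my assumptions; the runOptions/runOn key is the VS Code mechanism the Cursor report describes.

```python
"""Audit sketch: flag repositories whose .vscode/tasks.json auto-runs on folder open."""
import json
import sys
from pathlib import Path


def audit_repo(repo_root: Path) -> list[str]:
    """Return findings for a single repository checkout."""
    findings: list[str] = []
    tasks_file = repo_root / ".vscode" / "tasks.json"
    if not tasks_file.exists():
        return findings
    findings.append(f"{tasks_file}: tasks.json present, review before opening in an IDE")
    try:
        tasks = json.loads(tasks_file.read_text())
    except json.JSONDecodeError:
        # VS Code tolerates comments and trailing commas; a strict parse failure
        # is itself worth a human look.
        findings.append(f"{tasks_file}: not strict JSON, inspect by hand")
        return findings
    for task in tasks.get("tasks", []):
        # "runOn": "folderOpen" is the setting that makes a task start automatically.
        if task.get("runOptions", {}).get("runOn") == "folderOpen":
            label = task.get("label", "<unnamed>")
            findings.append(f"{tasks_file}: task {label!r} auto-runs on folder open")
    return findings


if __name__ == "__main__":
    for repo in sys.argv[1:]:
        for finding in audit_repo(Path(repo)):
            print(finding)
```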
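For the fixed-size action plan (second and sixth bullets), a hedged sketch of the gate: if the proposed action does not fit a plain monospace box, the prompt refuses to render it and the action is rejected by default. The 20-line and 100-column limits are arbitrary assumptions; tune them to what a reviewer can genuinely read.

```python
"""Size-capped approval box: if the proposed action does not fit, it is not approvable."""
MAX_LINES = 20   # assumption: a reviewer can actually read 20 lines
MAX_COLS = 100   # assumption: no line hides content past the edge of the box


def approval_box(proposed_action: str) -> str:
    """Render the exact proposed action in a plain box, or refuse to render at all."""
    lines = proposed_action.splitlines() or [""]
    if len(lines) > MAX_LINES or any(len(line) > MAX_COLS for line in lines):
        raise ValueError("proposed action exceeds the approval box; split it or reject it")
    border = "-" * MAX_COLS
    return "\n".join([border, *lines, border])


if __name__ == "__main__":
    print(approval_box("pytest -q tests/test_auth.py  # verify the auth fix"))
```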
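The render-versus-run split can be as small as two functions with different privileges. A minimal sketch, assuming the allowlist contents are yours to define: the agent's side only prints, and the runner refuses anything whose first token is not explicitly allowed and never hands the string to a shell.

```python
"""Render/run split: the agent side can only print; the runner enforces an allowlist."""
import shlex
import subprocess

# The runner's policy, not the agent's. Contents are illustrative.
ALLOWED_COMMANDS = {"git", "pytest", "ruff"}


def render(plan: str, command: str) -> None:
    """Show the proposal verbatim; rendering never executes anything."""
    print("PLAN:   ", plan)
    print("COMMAND:", command)


def run(command: str) -> subprocess.CompletedProcess:
    """Execute only if the first token is allowlisted, and never through a shell."""
    argv = shlex.split(command)
    if not argv or argv[0] not in ALLOWED_COMMANDS:
        raise PermissionError(f"refusing to run {argv[:1]!r}: not on the allowlist")
    return subprocess.run(argv, check=False, capture_output=True, text=True)


if __name__ == "__main__":
    proposal = "pytest -q tests/test_auth.py"
    render("Run the auth tests to confirm the fix", proposal)
    # In practice, run() is only reached after an explicit human approval step.
    print(run(proposal).stdout)
```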
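Pinning by digest instead of by name is a one-function habit. In this sketch the artifact path and the expected digest are placeholders; the point is that the check fails closed when the bytes differ from what you reviewed.

```python
"""Digest pinning: trust the bytes you reviewed, not the name they were published under."""
import hashlib
from pathlib import Path

# Placeholder: replace with the digest you recorded when you reviewed the artifact.
EXPECTED_SHA256 = "0" * 64


def verify_artifact(path: Path, expected: str = EXPECTED_SHA256) -> None:
    """Fail closed if the artifact on disk does not match the pinned digest."""
    digest = hashlib.sha256(path.read_bytes()).hexdigest()
    if digest != expected:
        raise RuntimeError(f"{path}: digest {digest} does not match pinned {expected}")


if __name__ == "__main__":
    # Hypothetical artifact path; point this at whatever your pipeline downloads.
    verify_artifact(Path("models/codegen.safetensors"))
```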
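And for the splash zone, a throwaway container is often enough. This sketch assumes Docker and a placeholder image with no credentials baked in; the timeout doubles as the kill switch, and the handler makes sure the container dies with the run.

```python
"""Throwaway sandbox: no secrets, no network by default, and a kill switch that works."""
import subprocess
import uuid


def run_in_sandbox(command: list[str], timeout_s: int = 300) -> subprocess.CompletedProcess:
    """Run an agent-proposed command in a disposable container."""
    name = f"agent-sandbox-{uuid.uuid4().hex[:8]}"
    docker_argv = [
        "docker", "run", "--rm", "--name", name,
        "--network", "none",      # no outbound calls unless you explicitly allow them
        "--read-only",            # writes go to tmpfs, never the project tree
        "--tmpfs", "/tmp",
        "python:3.12-slim",       # placeholder image with no credentials baked in
        *command,
    ]
    try:
        return subprocess.run(docker_argv, timeout=timeout_s, capture_output=True, text=True)
    except subprocess.TimeoutExpired:
        # Kill switch: nuke the sandbox, not the laptop.
        subprocess.run(["docker", "kill", name], capture_output=True)
        raise


if __name__ == "__main__":
    result = run_in_sandbox(["python", "-c", "print('hello from the splash zone')"])
    print(result.stdout)
```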

Quick myths
“Humans are the safety net.” They are, right up until the payload is hidden behind friendly text below the fold (Dark Reading, September 15, 2025).
“Just sandbox it.” A sandbox only protects you if it holds no secrets or tokens, and only if its logs actually reach you.
“Trusted sources mean safe models.” Names can be hijacked; pin by hash and mirror what you depend on.
Checklist
Before you commit anything today, scan:
- Workspace Trust is on and pinned
- The agent's “action plan” renders as a clean diff
- Commands are short and allowlisted
- Models are pinned by SHA
- Only scoped, throwaway keys are exposed
- Logs are centralized and reviewed

What to do this week
Declare Tuesday your AI security day. Turn on trust modes in your IDEs, route approvals through a single plain panel, and mirror your top five external models with pinned hashes. Then run a 30-minute deception drill: hide an instruction inside an issue and see whether your team catches it. If approval feels rushed, slow the loop down on purpose.
Further reading
- Dark Reading (September 15, 2025) – the report on the AI “Lies-in-the-Loop” attack.
- Checkmarx (September 15, 2025) – the underlying research: a proof of concept for LITL that bypasses human-in-the-loop approvals.
- Hacker News (September 12, 2025) – Cursor IDE's default trust setting quietly lets tasks run when a folder is opened.
- OWASP GenAI Top 10 (2025) – the baseline reference for prompt injection and output-handling controls.
(For a deeper grounding in prompt-injection risk, see this Medium explainer from the last 48 hours: Medium, September 14, 2025.)

CTA
Comment your thoughts below. Subscribe for more.
Thanks a lot for reading! Spread the word to your friends so they can stay safe too. Subscribe and follow on Medium, X, LinkedIn, Reddit, Substack, and GitHub, tag AI Advances, to catch new AI security hacks, and share this with a friend or loved one before the next scare.
From a friendly security writer in the jungle, for now.

Published via AI