> it does that
It *probably* does that. There is a high chance it does that.
It cannot be proven that it will do that. It is not a computational guarantee, just an outcome with a high statistical probability.
Sometimes it may write a script that drops a database instead, or do anything else. Seldom, but sometimes.
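A toy sketch of why "seldom" is not "never". It assumes the model's choice can be treated as one draw from a categorical distribution over a few hypothetical outcomes, with made-up probabilities. Real models sample token by token, and decoding settings vary, but the principle is the same: anything with nonzero probability eventually gets sampled.

```python
import random
from collections import Counter

# Hypothetical outcome probabilities; the names and numbers are illustrative only.
OUTCOMES = {
    "write_intended_script": 0.989,
    "write_other_script": 0.010,
    "write_drop_database_script": 0.001,
}

def sample_outcome(rng: random.Random) -> str:
    """Draw one outcome in proportion to its probability (categorical sampling)."""
    names = list(OUTCOMES)
    weights = list(OUTCOMES.values())
    return rng.choices(names, weights=weights, k=1)[0]

def run_trials(n: int, seed: int = 0) -> Counter:
    """Repeat the draw n times and count how often each outcome appears."""
    rng = random.Random(seed)
    return Counter(sample_outcome(rng) for _ in range(n))

if __name__ == "__main__":
    counts = run_trials(100_000)
    for name, count in counts.most_common():
        print(f"{name}: {count}")
```

Over 100,000 runs the intended script dominates, yet the 0.1% "drop the database" outcome shows up on the order of a hundred times.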
This isn't a bug; it's exactly how LLMs are supposed to operate. Those who complain about it when it happens don't know what they are doing.