Agree with this framing. The gap is usually not “can code be generated” but “can behavior be verified.”
What has worked for us is treating prompts as statements of intent only, then enforcing guardrails in the process itself:
- an execution contract agreed before implementation starts
- validation commands as a required part of every output
- corrective feedback captured as reusable memory for later runs
That keeps velocity while reducing silent logic regressions.
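For anyone who wants something concrete, here is a minimal sketch of how those three guardrails could fit together in a Python harness. `ExecutionContract`, `FeedbackMemory`, and `validate` are hypothetical names made up for illustration, not any particular tool. The only real API used is `subprocess.run` from the standard library.

```python
import subprocess
import sys
from dataclasses import dataclass, field


@dataclass
class ExecutionContract:
    """Hypothetical contract agreed before any code is generated."""
    intent: str                            # the prompt, treated as intent only
    acceptance_criteria: list[str]         # expected behavior, stated up front
    validation_commands: list[list[str]]   # commands that must run and pass


@dataclass
class FeedbackMemory:
    """Hypothetical store of corrective notes fed into later prompts."""
    notes: list[str] = field(default_factory=list)

    def record(self, note: str) -> None:
        if note not in self.notes:  # keep each lesson once
            self.notes.append(note)

    def as_context(self) -> str:
        # Rendered as a bullet list to prepend to the next prompt.
        return "\n".join(f"- {n}" for n in self.notes)


def validate(contract: ExecutionContract, memory: FeedbackMemory) -> bool:
    """Run every required validation command; record failures as feedback."""
    if not contract.validation_commands:
        # No commands means the behavior cannot be verified, so reject.
        memory.record("Output rejected: no validation commands supplied.")
        return False
    ok = True
    for cmd in contract.validation_commands:
        result = subprocess.run(cmd, capture_output=True, text=True)
        if result.returncode != 0:
            ok = False
            memory.record(f"`{' '.join(cmd)}` failed (exit {result.returncode})")
    return ok


if __name__ == "__main__":
    memory = FeedbackMemory()
    contract = ExecutionContract(
        intent="Add retry logic to the HTTP client",
        acceptance_criteria=["retries at most 3 times on 5xx"],
        validation_commands=[[sys.executable, "-c", "assert 1 + 1 == 2"]],
    )
    print(validate(contract, memory))
```

The main design choice is that an empty command list counts as a failure, so "not verified" can never silently pass as "verified".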