This article discusses the responsible use of coding agents, highlighting the risks of blindly deploying agent-generated code and advocating for a shift in focus from code generation to ensuring safe deployments.
The Problem: False Confidence
- Coding agents can generate code quickly, but without careful judgment, they can introduce issues into production.
- Agent-generated code often appears flawless, with polished PR descriptions, passing static analysis, and reasonable test coverage, creating a false sense of security.
- Agents lack understanding of production environments, traffic patterns, failure modes, and infrastructure constraints.
- Passing CI is no longer sufficient proof of safety, as agents can persuade pipelines that changes are safe even if they degrade infrastructure.
Leveraging vs. Relying
- Relying on AI means blindly trusting agent-generated code if it passes tests, leading to hidden assumptions and difficult-to-review PRs.
- Leveraging AI means using agents to iterate quickly while maintaining complete ownership of the output, understanding the code’s behavior and associated risks.
- Engineers should be comfortable owning production incidents tied to their pull requests.
Guarding Production
- The solution isn’t to stop using agents, but to build a closed-loop system where agents can act autonomously within a safe environment.
- Key principles:
- Self-driving deployments: Incremental rollouts through gated pipelines with automatic rollback on degradation.
- Continuous validation: Ongoing load tests, chaos experiments, and disaster recovery exercises.
- Executable guardrails: Encoding operational knowledge as runnable tools instead of documentation.
Investments at Vercel
Vercel is actively building guardrails into its shared infrastructure, including:
- Stronger guardrails around shared infrastructure with runtime validation.
- Stricter static checks at PR time, especially around feature flags.
- Production-mirroring end-to-end testing in staging.
- Read-only agents for continuous verification of system invariants.
- Metrics to surface increasing risk across the platform.
Conclusion
- The focus should be on leveraging agents while maintaining ruthless judgment over what is shipped.
- Engineers should ask themselves:
- What does this code do?
- How does it behave once rolled out?
- How can it adversely impact production or customers?
- Am I comfortable owning an incident tied to this code?