Programming

Tech Lead Reveals Simple Documentation Fix for AI-Generated Code That Passes Tests but Breaks Architecture

2026-05-04 23:00:00

Breaking: AI Code Review Bottleneck Solved by Moving Team Memory Into Files

A tech lead has identified a critical gap in AI-generated pull requests: code that passes all tests but violates architectural rules because the AI lacks context that exists only in team memory. The fix is a pair of plain-text files stored in the repository.

Source: www.freecodecamp.org

"The AI wrote clean code. Tests passed. But it imported the wrong authentication middleware because the migration policy was in a six-month-old Slack thread, not in the diff," said the author, a tech lead at a fast-moving engineering team. "I caught it on the second read, but the third reviewer wouldn't have."

Background: The Slow-Burn Problem AI Accelerated

Code generation tools like Claude Code, Cursor, and GitHub Copilot have increased pull request throughput dramatically, but faster generation also means a longer review queue. The hardest reviews are the ones where everything looks right except for a single wrong import or missing line, because the rule it violates lives only in tribal knowledge.

The author's team spent a quarter migrating from a v1 authentication middleware (MongoDB) to v2 (MySQL). New endpoints were required to use v2. An AI agent used v1 for three new endpoints. Tests passed because user records still existed in both databases. The error was invisible to automated checks.

"Every new endpoint we shipped was reinforcing the legacy auth path we had just spent a quarter trying to retire," the author noted.

The Solution: Two Files That Changed Everything

The fix was not a new tool but a new structure: AGENTS.md and CLAUDE.md files stored at the repository root. These documents contain the rules, conventions, and migration policies that previously lived only in team members' heads or scattered across Slack threads and meeting notes.
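The source does not reproduce the files themselves, but an AGENTS.md entry encoding the auth migration policy described above might look like this (a hedged sketch; the headings and wording are illustrative, not quoted from the original guide):

```markdown
# AGENTS.md — rules for AI agents working in this repository

## Authentication
- New endpoints MUST use the v2 auth middleware (MySQL-backed).
- NEVER import the v1 auth middleware (MongoDB-backed); it is being retired.
- Migration context: v1 and v2 user records coexist, so tests pass either way.
  Passing tests do NOT prove the correct middleware was used.

## Conventions
- One rule per bullet, phrased as an actionable instruction the agent can check.
```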

"The realization was simple: move the team's memory into a place the AI could actually read," the author wrote. "The structure matters more than the tool."

How It Works

  1. Repository-level memory files: AGENTS.md for general AI agent instructions, CLAUDE.md for Claude-specific guidance. Each file contains actionable rules like "New endpoints must use v2 auth middleware."
  2. Per-service memory files: For microservice architectures, each service gets its own memory file documenting service-specific conventions and dependencies.
  3. Read-only guardrails: The AI PR reviewer is configured to read these files but not modify them, ensuring consistency.
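The read-only guardrail step can be sketched in code. The following is a minimal illustration, assuming a simple rule syntax of our own invention (`FORBID: <regex>  # reason`); the original guide does not specify a parser or checker, so every name here is hypothetical:

```python
# Hypothetical sketch: load forbidden-pattern rules from an AGENTS.md-style
# memory file and flag added diff lines that violate them. The FORBID syntax
# is an assumption for illustration, not part of the original guide.
import re


def load_rules(memory_text):
    """Parse lines like 'FORBID: <regex>  # reason' into (pattern, reason) pairs."""
    rules = []
    for line in memory_text.splitlines():
        if line.startswith("FORBID:"):
            body = line[len("FORBID:"):].strip()
            pattern, _, reason = body.partition("#")
            rules.append((re.compile(pattern.strip()), reason.strip()))
    return rules


def review_diff(diff_lines, rules):
    """Return (line_no, reason) for each added line matching a forbidden pattern."""
    findings = []
    for no, line in enumerate(diff_lines, start=1):
        if not line.startswith("+"):
            continue  # only check lines the PR adds; the memory file itself is never edited
        for pattern, reason in rules:
            if pattern.search(line):
                findings.append((no, reason))
    return findings


memory = r"FORBID: from auth\.v1 import  # New endpoints must use v2 auth middleware"
diff = [
    "+from auth.v2 import require_session",
    "+from auth.v1 import check_token",
]
findings = review_diff(diff, load_rules(memory))
```

Here the checker only reads the memory file and reports; it never modifies it, which mirrors the read-only guardrail described above.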

What This Means for Engineering Teams

This approach addresses the root cause of AI-generated architectural violations: missing context. Instead of trying to train a model on every team rule, teams inject that knowledge directly into the codebase in a format the AI can parse.

"The bottleneck wasn't the AI; it was that we had no mechanism to share unwritten rules with the AI," the author explained. "Now the AI catches those mistakes before a human is pulled in."

The method is tool-agnostic and works with Claude Code, Cursor, Cline, GitHub Copilot, and any combination thereof. Early results show a meaningful reduction in review cycles spent on context-related errors.

Implementation: Two-Week Setup Plan

Teams can start from zero on an existing project; the original guide walks through a two-week plan for drafting the initial memory files and refining them as review feedback accumulates.

"The files become a living contract between the team and the AI," the author said. "They also serve as onboarding documentation for new engineers."

What Still Needs Human Review

Not all review can be automated. Architectural decisions involving tradeoffs between performance, security, and maintainability still require human judgment. The AI reviewer catches mistakes in rules that are explicitly documented—it does not replace human reasoning about novel design choices.

"We still do human review for anything that isn't a cut-and-dried rule," the author clarified. "But we've eliminated the category of errors that were invisible because the rule lived somewhere no one could quote it."

Outcome: A Compounding Loop of Improved Reviews

The feedback loop works as follows: each time a human catches a violation during review, they add the rule to the memory files. Over time, the AI catches more errors, reducing cycle time and building a shared knowledge base. The author reports that after two months, the team's review queue shrank by an estimated 30%.
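The "human catches it once, the file remembers it forever" loop can be sketched as a small helper. This is an assumption-laden illustration (the source describes the process, not any tooling); `record_rule` and the `FORBID` entry format are invented names:

```python
# Hypothetical sketch of the feedback loop: when a human reviewer catches a
# violation, the rule is appended to the repo-level memory file so the AI
# reviewer enforces it on every subsequent PR. Names are illustrative.
from pathlib import Path


def record_rule(memory_path, rule_pattern, reason):
    """Append a newly caught rule to the memory file; skip exact duplicates."""
    path = Path(memory_path)
    entry = f"FORBID: {rule_pattern}  # {reason}"
    existing = path.read_text() if path.exists() else ""
    if entry in existing:
        return False  # already documented; nothing to add
    with path.open("a") as f:
        if existing and not existing.endswith("\n"):
            f.write("\n")
        f.write(entry + "\n")
    return True
```

Each call grows the shared rulebook by one line, which is the compounding mechanism the author describes: every documented rule removes a whole category of review back-and-forth.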

"The compounding effect is real. Every rule we add saves about 20 minutes of back-and-forth per PR," the author estimated.

Sources

Based on a firsthand account published by a tech lead at a mid-sized engineering organization. The original guide includes detailed implementation instructions for teams using Claude Code, Cursor, Cline, or GitHub Copilot.
