System Design with Codex

Your monolith has grown to the point where every deploy is a 45-minute adventure in “will this break something unrelated?” The team has decided to extract the payments module into a separate service, but nobody can agree on the API boundary, the data migration strategy, or how to handle the transition period when both systems run simultaneously. Architecture decisions like this are too important to rush and too complex to hold in one person’s head. Codex gives you an iterative design process: plan in the IDE with full codebase context, validate assumptions with cloud tasks, and explore alternatives in parallel worktrees.

  • A multi-surface architectural planning workflow using the IDE for design and cloud for validation
  • Prompts for decomposing monoliths, designing API boundaries, and planning data migrations
  • Techniques for evaluating architectural alternatives with parallel cloud tasks
  • An automation recipe for detecting architectural drift over time

Before designing a new architecture, understand what you have. Use the IDE extension with auto-context enabled: it can see your open files and responds quickly to analysis questions.
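To cross-check the analysis, it can help to build your own inventory of who imports the module you plan to extract. The sketch below is one way to do that for a Python codebase, using only the standard library. The package name `app.payments` and the file paths are hypothetical, and relative imports are deliberately skipped to keep the example short.

```python
import ast


def find_importers(sources, target_package):
    """Map each file path to the names it imports from target_package.

    `sources` maps file paths to Python source text. `target_package` is a
    dotted package name such as "app.payments" (a hypothetical name).
    Relative imports (from . import x) are ignored in this sketch.
    """
    prefix = target_package + "."
    importers = {}
    for path, code in sources.items():
        hits = set()
        for node in ast.walk(ast.parse(code, filename=path)):
            if isinstance(node, ast.Import):
                candidates = [alias.name for alias in node.names]
            elif isinstance(node, ast.ImportFrom) and node.level == 0:
                # "from app import payments" becomes "app.payments"
                candidates = [f"{node.module}.{alias.name}" for alias in node.names]
            else:
                continue
            hits.update(
                name
                for name in candidates
                if name == target_package or name.startswith(prefix)
            )
        if hits:
            importers[path] = sorted(hits)
    return importers
```

Feed it the contents of your source tree (for example, read with `pathlib.Path.rglob("*.py")`) and compare the result with the integration points Codex reports. Files that appear in one list but not the other are worth a closer look.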

With the analysis complete, design the new architecture. Codex offers an experimental $create-plan skill that structures a planning pass. It is not bundled by default (the only bundled skills are $skill-creator and $skill-installer), so install it once by sending “$skill-installer install the create-plan skill from the .experimental folder”, then invoke it inside a structured planning prompt in the Codex App.

Architecture plans need validation. Use cloud tasks to test specific assumptions about the design.

Validation 1: API boundary feasibility

Validation 2: Data migration strategy

For contentious design decisions where the team cannot agree, spin up parallel exploration threads:

Worktree 1: REST API Communication

Worktree 2: Event-Driven Communication

Compare the prototypes side by side. Concrete implementations make trade-offs much easier to evaluate than abstract whiteboard discussions do.

Once the architecture is decided, document it and set up monitoring for drift:

Create docs/ARCHITECTURE-PAYMENTS.md documenting the payments service extraction:
1. Target architecture diagram (text-based)
2. API boundary definition (endpoints, request/response schemas)
3. Data ownership map (which tables belong to which service)
4. Communication patterns (synchronous calls, events)
5. Migration plan with phases and rollback strategy
6. Decision log: key decisions made and the reasoning behind each one
Format for an audience of senior engineers who will implement the migration.

Set up a weekly automation to detect architectural drift:
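One check such an automation could run is a data ownership scan: flag any file outside the payments service that mentions a table the ownership map assigns to payments. The following is a minimal sketch under stated assumptions, not a Codex feature. The table names, the `services/payments/` prefix, and the plain text match are all hypothetical choices, and a text match will produce some false positives.

```python
import re


def find_table_drift(sources, owned_tables, owner_prefix):
    """Return sorted (path, table) pairs that violate the ownership map.

    A violation is a file whose path does not start with `owner_prefix`
    but whose text mentions one of `owned_tables` as a whole word.
    """
    patterns = {
        table: re.compile(rf"\b{re.escape(table)}\b") for table in owned_tables
    }
    findings = []
    for path, text in sources.items():
        if path.startswith(owner_prefix):
            continue  # the owning service may use its own tables freely
        for table, pattern in patterns.items():
            if pattern.search(text):
                findings.append((path, table))
    return sorted(findings)
```

An empty result means the code still matches the documented ownership map. Any pair it returns is either real drift or a reason to update the map, and both outcomes are worth recording in the decision log.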

Before committing to an architecture, get a design review. In the CLI, run /review to start a review of your working tree, then steer it with a follow-up message:

/review

Once the review starts, send the focus as your next message:

Focus on the architectural decisions in this branch. Are the API boundaries clean? Is the data migration strategy sound? Are there coupling points that will cause problems during the transition? Flag anything that will make rollback difficult.

Or request a focused design review on GitHub by mentioning Codex on the PR. The for <focus> suffix is the documented way to scope a GitHub review:

@codex review for architectural soundness. Focus on: separation of concerns between the monolith and the new payments service, data consistency during migration, and failure mode handling.

The analysis misses hidden coupling. Codex finds explicit imports and function calls, but may miss implicit coupling: shared configuration values, assumptions about database transaction boundaries, or runtime dependencies injected through middleware. After the analysis, manually verify the top three integration points Codex identified by tracing them in a debugger or with logging.
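Shared configuration is one form of implicit coupling you can hunt for mechanically. The sketch below lists environment variables read both inside and outside the module being extracted. It assumes a Python codebase that reads configuration through `os.environ` or `os.getenv` with literal string keys; the path prefix and variable names in any example are hypothetical, and settings loaded through a config framework would need a different pattern.

```python
import re

# Matches os.environ["KEY"], os.environ.get("KEY"), and os.getenv("KEY")
# when KEY is a string literal.
ENV_READ = re.compile(
    r"""os\.(?:environ\[|environ\.get\(|getenv\()\s*["']([A-Za-z_][A-Za-z0-9_]*)["']"""
)


def shared_env_keys(sources, module_prefix):
    """Return env var names read both inside and outside module_prefix.

    `sources` maps file paths to source text; `module_prefix` is the path
    prefix of the module being extracted.
    """
    inside, outside = set(), set()
    for path, code in sources.items():
        keys = set(ENV_READ.findall(code))
        if path.startswith(module_prefix):
            inside.update(keys)
        else:
            outside.update(keys)
    return sorted(inside & outside)
```

Every key in the result is a value both systems will need during the transition, so decide for each one whether it gets duplicated, split, or served from a shared source.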

The cloud validation uses a simplified test that does not reflect production complexity. A proof-of-concept with 1,000 writes does not reveal issues that appear at 10 million writes. Use the cloud validation to test correctness and identify obvious failures, then plan a more realistic load test for the staging environment.
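For the correctness half of that split, a small comparison helper is often enough: after the proof-of-concept migration runs, check that every source row arrived unchanged. This is a sketch that assumes rows are available as dictionaries with an `id` column, which is a hypothetical schema. It says nothing about performance at production volume.

```python
import hashlib
import json


def row_digest(row):
    """Stable SHA-256 digest of a row dict, independent of key order."""
    payload = json.dumps(row, sort_keys=True, default=str)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def compare_tables(source_rows, target_rows, key="id"):
    """Report keys that are missing, unexpected, or changed in the target."""
    source = {row[key]: row_digest(row) for row in source_rows}
    target = {row[key]: row_digest(row) for row in target_rows}
    return {
        "missing": sorted(source.keys() - target.keys()),
        "unexpected": sorted(target.keys() - source.keys()),
        "mismatched": sorted(
            k for k in source.keys() & target.keys() if source[k] != target[k]
        ),
    }
```

Three empty lists mean the sample migrated cleanly. Anything else is a correctness bug to fix before the staging load test is worth running.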

Parallel prototype threads make different assumptions. If the REST prototype assumes synchronous writes and the event-driven prototype assumes eventual consistency, the comparison is not apples to apples. Define the comparison criteria in advance: “Both prototypes must handle the same 10 test scenarios. Report latency, consistency guarantee, and failure behavior for each scenario.”
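A shared harness is one way to enforce those criteria, because both prototypes then receive identical inputs and are measured the same way. The sketch below assumes each prototype can be wrapped in a Python callable that takes one scenario payload; the labels and payload shape are hypothetical. It records latency and failure behavior only. The consistency guarantee is not something a timer can observe, so it would need scenario-specific assertions on top.

```python
import time


def run_scenarios(prototypes, scenarios):
    """Run every scenario against every prototype and collect comparable rows.

    `prototypes` maps a label to a callable that takes one scenario payload.
    `scenarios` maps a scenario name to that payload.
    """
    report = []
    for scenario_name, payload in scenarios.items():
        for label, handler in prototypes.items():
            start = time.perf_counter()
            try:
                result, failure = handler(payload), None
            except Exception as exc:
                # Failure behavior is part of the comparison, so record it
                # instead of aborting the whole run.
                result, failure = None, f"{type(exc).__name__}: {exc}"
            latency_ms = (time.perf_counter() - start) * 1000
            report.append(
                {
                    "scenario": scenario_name,
                    "prototype": label,
                    "latency_ms": latency_ms,
                    "result": result,
                    "failure": failure,
                }
            )
    return report
```

Because each scenario runs against every prototype, a gap in the report points to a prototype that could not handle a scenario rather than to a scenario nobody tried.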

Architecture documentation goes stale immediately. Suppose the migration takes three months: by month two, the actual implementation has already diverged from the documented plan. The weekly automation helps catch drift, but it only works if someone acts on the findings. Assign a specific person (or rotate the responsibility) to review the automation inbox and update the docs.