System Design with Codex
Your monolith has grown to the point where every deploy is a 45-minute adventure in “will this break something unrelated?” The team has decided to extract the payments module into a separate service, but nobody can agree on the API boundary, the data migration strategy, or how to handle the transition period when both systems run simultaneously. Architecture decisions like this are too important to rush and too complex to hold in one person’s head. Codex gives you an iterative design process: plan in the IDE with full codebase context, validate assumptions with cloud tasks, and explore alternatives in parallel worktrees.
What You’ll Walk Away With
- A multi-surface architectural planning workflow using the IDE for design and cloud for validation
- Prompts for decomposing monoliths, designing API boundaries, and planning data migrations
- Techniques for evaluating architectural alternatives with parallel cloud tasks
- An automation recipe for detecting architectural drift over time
The Workflow
Step 1: Analyze the Current Architecture
Before designing a new architecture, understand what you have. Use the IDE extension with auto-context enabled: it can see your open files and gives you fast responses for analysis questions.
Step 2: Design the Target Architecture
With the analysis complete, design the new architecture. Use the $plan skill if available, or a structured planning prompt in the Codex App:
Step 3: Validate with Cloud Tasks
Architecture plans need validation. Use cloud tasks to test specific assumptions about the design.
Validation 1: API boundary feasibility
codex cloud exec --env arch-test "Test the proposed API boundary for the payments service extraction.
Create a minimal Express server that exposes the 6 payment endpoints from the design plan:
- POST /payments/charge
- POST /payments/refund
- GET /payments/:id
- GET /payments/customer/:customerId
- POST /payments/webhooks/stripe
- POST /payments/webhooks/paypal
For each endpoint, verify that the request/response can be served without accessing any database tables outside of the payments schema. If an endpoint requires data from the users or orders table, report the dependency and suggest how to resolve it (pass the data as a parameter, or the caller enriches the request before forwarding).
Run the existing payment-related tests against this server to verify coverage."
Validation 2: Data migration strategy
codex cloud exec --env arch-test --attempts 2 "Test two data migration approaches for the payments extraction:
Approach A (dual-write): During migration, all writes go to both the monolith DB and the new service DB. Reads gradually shift to the new service.
Approach B (CDC): Use change data capture to replicate the payments tables to the new service's database. Switch reads to the new service when replication lag is acceptable.
For each approach:
1. Write a proof-of-concept implementation (enough to test the concept)
2. Simulate 1000 writes and verify data consistency between both databases
3. Measure the latency overhead of each approach
4. Identify failure modes (what happens if the new service DB is down during a write)
Report which approach is more reliable and has lower latency overhead."
Step 4: Explore Alternatives in Parallel
For contentious design decisions where the team cannot agree, spin up parallel exploration threads:
Worktree 1: REST API Communication
Prototype the payments service using REST for inter-service communication.
Implement:
- A payments service with the 6 endpoints from the design
- A proxy in the monolith that forwards payment requests to the new service
- Error handling: what happens when the payment service returns 500 or times out
- Circuit breaker pattern for the proxy
Measure: latency overhead of the proxy, and behavior when the payment service is down for 30 seconds.
Worktree 2: Event-Driven Communication
Prototype the payments service using events for inter-service communication.
Implement:
- A payments service that emits events (payment.created, payment.refunded, etc.)
- The monolith subscribes to events and updates its local state
- Commands are still synchronous (monolith calls the payment service API)
- Event store or message queue (use Redis Streams for simplicity)
Measure: eventual consistency delay, behavior when events are lost or duplicated.
Compare the prototypes side by side. The concrete implementations make it much easier to evaluate trade-offs than abstract whiteboard discussions.
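For reference when reading the REST prototype, the circuit breaker it asks for can be sketched in a few lines. This is a minimal illustration under assumptions, not the code Codex would generate: the class names, the default threshold, and the injectable `clock` parameter are all hypothetical choices made for the example.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitOpenError(RuntimeError):
    """Raised when the breaker refuses a call without contacting the service."""


class CircuitBreaker:
    """Hypothetical breaker for the monolith-side proxy (names are assumptions)."""

    def __init__(self, failure_threshold: int = 3, reset_timeout: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self._clock = clock  # injectable so tests do not need to sleep
        self._failures = 0
        self._opened_at: Optional[float] = None

    @property
    def state(self) -> str:
        if self._opened_at is None:
            return "closed"
        if self._clock() - self._opened_at >= self.reset_timeout:
            return "half-open"  # one trial call is allowed through
        return "open"

    def call(self, fn: Callable[[], T]) -> T:
        if self.state == "open":
            raise CircuitOpenError("payments service circuit is open")
        try:
            result = fn()
        except Exception:
            self._failures += 1
            # Trip at the threshold, or immediately if a half-open trial fails.
            if self._failures >= self.failure_threshold or self._opened_at is not None:
                self._opened_at = self._clock()
            raise
        # Any success closes the circuit and clears the failure count.
        self._failures = 0
        self._opened_at = None
        return result
```

With `reset_timeout=30.0`, the "down for 30 seconds" measurement maps onto one open period followed by a single half-open trial call.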
Step 5: Document and Monitor Architecture
Once the architecture is decided, document it and set up monitoring for drift:
Create docs/ARCHITECTURE-PAYMENTS.md documenting the payments service extraction:
1. Target architecture diagram (text-based)
2. API boundary definition (endpoints, request/response schemas)
3. Data ownership map (which tables belong to which service)
4. Communication patterns (synchronous calls, events)
5. Migration plan with phases and rollback strategy
6. Decision log: key decisions made and the reasoning behind each one
Format for an audience of senior engineers who will implement the migration.
Set up a weekly automation to detect architectural drift:
Using Codex for Design Reviews
Before committing to an architecture, get a design review. Use /review with a custom focus:
/review Focus on the architectural decisions in this branch. Are the API boundaries clean? Is the data migration strategy sound? Are there coupling points that will cause problems during the transition? Flag anything that will make rollback difficult.
Or request a design review on GitHub:
@codex review for architectural soundness. Focus on: separation of concerns between the monolith and the new payments service, data consistency during migration, and failure mode handling.
When This Breaks
The analysis misses hidden coupling. Codex finds explicit imports and function calls, but may miss implicit coupling: shared configuration values, assumptions about database transaction boundaries, or runtime dependencies injected through middleware. After the analysis, manually verify the top three integration points Codex identified by tracing them in a debugger or with logging.
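The logging approach mentioned above can be as light as a decorator on each integration point. The sketch below is an illustration only: `trace_boundary`, `call_counts`, and `charge_customer` are hypothetical names, not part of Codex or of any particular codebase.

```python
import functools
import logging
from collections import Counter

logger = logging.getLogger("boundary-trace")
call_counts: Counter = Counter()  # how often each integration point is actually hit


def trace_boundary(name: str):
    """Log and count every call that crosses the named integration point."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            call_counts[name] += 1
            # Real payment arguments may be sensitive; redact before logging in practice.
            logger.info("boundary=%s args=%r kwargs=%r", name, args, kwargs)
            return fn(*args, **kwargs)
        return wrapper
    return decorator


# Hypothetical integration point between the orders code and the payments module.
@trace_boundary("orders->payments.charge")
def charge_customer(customer_id: int, amount_cents: int) -> dict:
    return {"customer_id": customer_id, "amount_cents": amount_cents, "status": "charged"}
```

Comparing `call_counts` after a run of the test suite against the integration points Codex listed shows which ones are exercised at runtime and which never fire.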
The cloud validation uses a simplified test that does not reflect production complexity. A proof-of-concept with 1,000 writes does not reveal issues that appear at 10 million writes. Use the cloud validation to test correctness and identify obvious failures, then plan a more realistic load test for the staging environment.
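A correctness-focused check of this kind might look like the sketch below, which simulates dual writes against two in-memory SQLite databases and reports the rows the new service is missing after a simulated outage. The table layout, function names, and outage window are assumptions made for illustration; a real proof of concept would use the actual schema and databases.

```python
import sqlite3
from typing import List


def make_db() -> sqlite3.Connection:
    """Create an in-memory stand-in for one side of the migration."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE payments (id INTEGER PRIMARY KEY, amount_cents INTEGER NOT NULL)"
    )
    return conn


def dual_write(primary: sqlite3.Connection, secondary: sqlite3.Connection,
               payment_id: int, amount_cents: int, secondary_up: bool = True) -> bool:
    """Write to the monolith DB first, then the service DB.

    Returns False when the second write was skipped because the service DB is down.
    """
    primary.execute("INSERT INTO payments VALUES (?, ?)", (payment_id, amount_cents))
    primary.commit()
    if not secondary_up:
        return False
    secondary.execute("INSERT INTO payments VALUES (?, ?)", (payment_id, amount_cents))
    secondary.commit()
    return True


def missing_in_secondary(primary: sqlite3.Connection,
                         secondary: sqlite3.Connection) -> List[int]:
    """Return the payment ids present in the monolith DB but absent from the service DB."""
    primary_ids = {row[0] for row in primary.execute("SELECT id FROM payments")}
    secondary_ids = {row[0] for row in secondary.execute("SELECT id FROM payments")}
    return sorted(primary_ids - secondary_ids)


def simulate(writes: int = 1000, outage: range = range(400, 410)) -> List[int]:
    """Run the writes, treating the service DB as down for ids in `outage`."""
    monolith, service = make_db(), make_db()
    for payment_id in range(writes):
        dual_write(monolith, service, payment_id, 100 + payment_id,
                   secondary_up=payment_id not in outage)
    return missing_in_secondary(monolith, service)
```

This surfaces the "service DB is down during a write" failure mode directly, but it says nothing about throughput or lock contention, which is why the staging load test is still needed.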
Parallel prototype threads make different assumptions. If the REST prototype assumes synchronous writes and the event-driven prototype assumes eventual consistency, the comparison is not apples to apples. Define the comparison criteria in advance: “Both prototypes must handle the same 10 test scenarios. Report latency, consistency guarantee, and failure behavior for each scenario.”
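One way to hold both prototypes to the same scenarios is a small harness that runs every scenario against every prototype and records the outcome in one table. The sketch below assumes each prototype can be wrapped in a function that takes a request dict and returns a response dict; that adapter shape and all names here are hypothetical.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, List

Handler = Callable[[dict], dict]      # adapter around one prototype (assumed shape)
Scenario = Callable[[Handler], None]  # raises (e.g. AssertionError) on failure


@dataclass
class ScenarioResult:
    prototype: str
    scenario: str
    latency_ms: float
    passed: bool
    failure: str


def run_comparison(prototypes: Dict[str, Handler],
                   scenarios: Dict[str, Scenario]) -> List[ScenarioResult]:
    """Run every scenario against every prototype and collect comparable results."""
    results: List[ScenarioResult] = []
    for proto_name, handler in prototypes.items():
        for scenario_name, scenario in scenarios.items():
            start = time.perf_counter()
            try:
                scenario(handler)
                passed, failure = True, ""
            except Exception as exc:
                passed, failure = False, f"{type(exc).__name__}: {exc}"
            elapsed_ms = (time.perf_counter() - start) * 1000
            results.append(
                ScenarioResult(proto_name, scenario_name, elapsed_ms, passed, failure)
            )
    return results


def charge_is_immediately_confirmed(handler: Handler) -> None:
    """Example scenario: a charge must be confirmed in the same call."""
    response = handler({"op": "charge", "amount_cents": 500})
    assert response["status"] == "ok"


# Hypothetical stand-ins for the two prototypes.
def rest_stub(request: dict) -> dict:
    return {"status": "ok"}


def event_stub(request: dict) -> dict:
    return {"status": "pending"}  # confirmation arrives later as an event
```

Here the event-driven stand-in fails the scenario, which is the point: the mismatch in assumptions shows up as a recorded result instead of an argument about which prototype "feels" faster.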
Architecture documentation goes stale immediately. The migration takes three months. By month two, the actual implementation has diverged from the documented plan. The weekly automation helps catch drift, but it only works if someone acts on the findings. Assign a specific person (or rotate the responsibility) to review the automation inbox and update the docs.
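To give the reviewer something concrete to act on, a drift check can compare the documented data ownership map against what the monolith's code actually queries. The following is a rough sketch under stated assumptions: the table names are invented, the sources are passed in as a path-to-text mapping, and a regex over SQL keywords will miss queries built by an ORM.

```python
import re
from typing import Dict, Iterable, List, Tuple

# Hypothetical ownership map entry: tables documented as owned by the payments service.
PAYMENTS_OWNED_TABLES = {"payments", "refunds", "payment_methods"}


def find_drift(monolith_sources: Dict[str, str],
               owned_tables: Iterable[str] = PAYMENTS_OWNED_TABLES
               ) -> List[Tuple[str, int, str]]:
    """Return (path, line number, table) for each monolith query touching an owned table."""
    table_alternatives = "|".join(sorted(re.escape(t) for t in owned_tables))
    pattern = re.compile(
        r"\b(?:FROM|JOIN|INTO|UPDATE)\s+(" + table_alternatives + r")\b",
        re.IGNORECASE,
    )
    findings: List[Tuple[str, int, str]] = []
    for path, text in sorted(monolith_sources.items()):
        for line_no, line in enumerate(text.splitlines(), start=1):
            for match in pattern.finditer(line):
                findings.append((path, line_no, match.group(1).lower()))
    return findings
```

An empty result does not prove the boundary is clean, but a non-empty one is a specific file and line that either the code or the architecture document needs to change.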