Backup, Recovery, and Rollback
The AI agent just refactored your authentication module across 47 files. The tests pass. The types check. You merge the PR. Two hours later, session tokens are not being validated in the admin API — the agent removed a middleware registration that was not covered by tests. You need to roll back, but three other PRs have merged on top. This is the disaster recovery scenario that every team using AI tools eventually faces.
What You’ll Walk Away With
- Checkpoint and rollback strategies specific to AI-assisted development
- Recovery procedures for AI-introduced regressions at every stage of the pipeline
- Pre-flight safety checks that prevent disasters before they happen
- Incident response patterns for AI-related production issues
- Audit practices that make root cause analysis fast and reliable
Prevention: The Safety Net Architecture
The best disaster recovery is preventing disasters. Build safety nets at every stage.
Stage 1: Pre-Change Safety
Cursor’s checkpoint system provides automatic rollback points:
- Checkpoints are created automatically before each agent action
- Use the Timeline panel to view and restore any checkpoint
- Create manual checkpoints before high-risk operations: right-click in the timeline
Add explicit safety rules:
```
SAFETY REQUIREMENTS:
Before any multi-file refactoring:
1. List all files that will be modified
2. Verify the test suite passes BEFORE making changes
3. After changes, run the full test suite
4. If any test fails, revert ALL changes and report what went wrong

NEVER delete files without explicit user confirmation.
NEVER modify configuration files (*.config.*, .env*, Dockerfile) without showing the diff first.
```

Claude Code works with Git directly. Establish commit-based safety:
```
SAFETY PROTOCOL:
Before starting any multi-file modification:
1. Run: git stash (save any uncommitted work)
2. Create a safety branch: git checkout -b ai/[task-description]
3. Commit after each logical step with descriptive messages
4. Run tests after each commit
5. If tests fail, use git diff to identify the problem

After completing the task:
- Run the FULL test suite (npm test)
- Run type checking (npm run type-check)
- Run linting (npm run lint)
- Show the complete diff from main for review

NEVER force-push. NEVER modify the main branch directly.
```

Claude Code’s permission system provides an additional safety layer — file writes require explicit approval unless auto-approved in settings.
Codex cloud tasks run in isolated sandboxes with built-in safety:
```
SAFETY PROTOCOL:
- All changes happen in a new branch (never modify main)
- Cloud tasks cannot push directly to main
- Every task produces a PR for human review
- Worktrees provide isolation between parallel tasks

Before submitting a PR:
1. Run the full test suite
2. Run the linter
3. Generate a comprehensive PR description explaining all changes
4. Flag any files that were deleted or had configuration changes
```

Codex’s sandboxed environment means a runaway task cannot affect your local environment or other branches.
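Across all three tools, the Stage 1 protocols share one shape: run every safety check in order, and refuse to let the agent start (or submit) if any check fails. A minimal sketch of that pre-flight gate in shell; the `true` placeholders stand in for your real commands (`npm test`, `npm run type-check`, `npm run lint`):

```shell
#!/bin/sh
# Pre-flight gate sketch: run each safety check and abort before the
# agent touches anything if one fails. The `true` commands are
# placeholders for real checks in your project.
run_gate() {
  name=$1; shift
  if "$@"; then
    echo "PASS: $name"
  else
    echo "FAIL: $name (aborting before any AI changes)" >&2
    exit 1
  fi
}

run_gate "tests"      true   # e.g. npm test
run_gate "type-check" true   # e.g. npm run type-check
run_gate "lint"       true   # e.g. npm run lint
echo "pre-flight clean: safe to start the agent task"
```

Because the gate runs before any changes exist, a failure here costs seconds; the same failure discovered after a 47-file refactor costs a rollback.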
Stage 2: Review Safety
Stage 3: Deployment Safety
- Feature flags for AI-generated changes: Deploy AI-assisted changes behind feature flags. If something goes wrong, flip the flag instead of rolling back the deployment.
- Canary deployments: Route 5% of traffic to the new version. Monitor error rates, latency, and key business metrics for 30 minutes before expanding.
- Automated rollback triggers: Set up automatic rollback when the error rate exceeds 2x baseline or p99 latency exceeds 3x baseline.
- Post-deployment monitoring: Watch dashboards for 4 hours after deploying AI-generated changes. The failure modes of AI code are often subtle — edge cases and race conditions rather than crashes.
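The automated-rollback rule is simple enough to express directly. A sketch in shell, with metric values passed as plain integers; in a real setup these would be fetched from your metrics backend:

```shell
#!/bin/sh
# Rollback-trigger sketch: roll back when error rate exceeds 2x baseline
# or p99 latency exceeds 3x baseline. Values are plain integers here;
# in practice they come from your metrics API.
should_rollback() {
  err=$1; err_base=$2; p99=$3; p99_base=$4
  if [ "$err" -gt $((2 * err_base)) ] || [ "$p99" -gt $((3 * p99_base)) ]; then
    echo "rollback"
  else
    echo "hold"
  fi
}

should_rollback 9 4 120 50   # error rate 9 > 2*4: prints "rollback"
should_rollback 7 4 120 50   # both metrics within thresholds: prints "hold"
```

Run this on a schedule (or from your alerting pipeline) and wire "rollback" to your deployment tool; the exact wiring depends on your platform.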
Recovery Procedures
Scenario 1: AI Broke Tests (Pre-Merge)
This is the easiest recovery. The AI made changes that break the test suite.
Use Cursor’s checkpoint timeline to restore the last good state:
- Open the Timeline panel
- Find the checkpoint before the breaking change
- Click “Restore” to return to that state
- Alternatively, use `Cmd+Z` aggressively — Cursor tracks AI changes separately from manual edits
```sh
# If working on a branch (recommended):
git diff main       # See what changed
git stash           # Save current state
git checkout main   # Return to clean state
```

```sh
# If you committed incrementally (recommended):
git log --oneline -10     # Find the last good commit
git revert HEAD~3..HEAD   # Revert the bad commits
```

Codex PRs are the recovery boundary. If the PR breaks tests:
- Close the PR without merging
- Create a new task with more specific constraints
- Reference what went wrong: “The previous attempt broke auth middleware registration”
Scenario 2: AI-Generated Code Merged But Causes Production Issues
Scenario 3: AI Corrupted Data
The most dangerous scenario. AI-generated code introduced a data corruption bug.
1. Stop the bleeding: Deploy the rollback immediately. Do not try to fix forward when data integrity is at risk.
2. Assess the damage: Query the database for records modified during the incident window. Determine the scope of corruption.
3. Restore from backup: Use your point-in-time recovery to restore affected data to the state before the incident.
4. Root cause analysis: Identify which AI-generated code caused the corruption. Was it a missing validation? A wrong query? A race condition?
5. Prevent recurrence: Add specific test cases for the failure mode. Add database constraints that would catch the corruption at the data layer. Update AI rules to prevent similar patterns.
Building Resilience Into AI Workflows
The Incremental Commit Strategy
Never let an AI agent make 47 file changes in a single commit. Break large changes into small, reviewable, revertable commits.
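One way to enforce this is a commit-size gate in CI. A sketch, assuming a hypothetical limit of 10 files; in CI the changed-file list would come from `git diff --name-only HEAD~1 HEAD`, but here it is passed as arguments so the logic stands alone:

```shell
#!/bin/sh
# Commit-size gate sketch: reject commits touching more than MAX_FILES
# files. In CI, feed it the output of `git diff --name-only HEAD~1 HEAD`;
# here the changed files are passed as arguments.
MAX_FILES=10
check_commit_size() {
  count=$#
  if [ "$count" -gt "$MAX_FILES" ]; then
    echo "REJECT: $count files changed (limit $MAX_FILES); split the commit"
    return 1
  fi
  echo "OK: $count files changed"
}

check_commit_size src/auth.ts src/middleware.ts test/auth.test.ts
```

A gate like this forces the agent (and its operator) into the small-commit habit, which is exactly what makes `git revert` a viable recovery tool later.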
The Companion Test Strategy
Before any AI-generated change, create a test that captures the current behavior.
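One lightweight form of companion test is a golden-file characterization test, sketched here under the assumption that the behavior under change is observable from a command's output; `current_behavior` is a hypothetical stand-in for your real entry point (a CLI, an API call, a script):

```shell
#!/bin/sh
# Companion-test sketch: snapshot current behavior BEFORE the AI change,
# then diff against that snapshot after. `current_behavior` is a
# hypothetical stand-in for a real command or API call.
current_behavior() {
  echo "status=ok tokens=validated"
}

GOLDEN=$(mktemp)
current_behavior > "$GOLDEN"    # capture the pre-change baseline

# ... the AI makes its changes here ...

if current_behavior | diff -u "$GOLDEN" - >/dev/null; then
  echo "behavior unchanged"
else
  echo "REGRESSION: output differs from pre-change baseline" >&2
fi
rm -f "$GOLDEN"
```

The point is the ordering: the baseline is captured before the agent runs, so any behavioral drift shows up as a diff even when the new code compiles and its own tests pass.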
When This Breaks
“We merged AI code without proper review and now production is down.” Roll back immediately. Do not try to fix forward during an active incident. After rolling back, conduct a blameless post-mortem focused on what safety net was missing, not who approved the PR.
“We cannot roll back because other changes depend on the AI-generated code.” This is why incremental commits matter. If you can identify which specific commit introduced the issue, you can revert just that commit. If changes are tangled together, you may need to create a targeted hotfix rather than a full rollback.
“The AI deleted files we need and we did not notice until much later.” Git has your back. Use git log --diff-filter=D to find deleted files and git checkout <commit>^ -- <filepath> to restore them. Add a CI check that flags file deletions for extra scrutiny during code review.
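The CI check for deletions can be sketched without a repository by feeding it `git diff --name-status` output on stdin (the file names below are illustrative):

```shell
#!/bin/sh
# Deletion-flag sketch: scan `git diff --name-status` output and surface
# any deleted (D) files for explicit reviewer sign-off.
flag_deletions() {
  deleted=$(awk '$1 == "D" { print $2 }')
  if [ -n "$deleted" ]; then
    echo "DELETED FILES NEED REVIEW:"
    echo "$deleted"
    return 1
  fi
  echo "no deletions"
}

# Sample diff output: one modified file, one deleted file.
printf 'M\tsrc/auth.ts\nD\tsrc/middleware.ts\n' | flag_deletions || true
```

In a pipeline you would pipe `git diff --name-status origin/main...HEAD` into `flag_deletions` and let the nonzero exit status block the merge until a human acknowledges the deletions.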
“Our backup strategy does not cover AI-specific failure modes.” Standard backup strategies (database backups, code in Git) cover most AI failure modes. The unique risk with AI is subtle behavioral changes that pass all checks. Add behavioral regression tests for critical paths and deploy behind feature flags.