Compliance Automation with AI

The auditor wants evidence that every production deploy runs through code review, that no GPL-licensed dependency shipped, and that you can produce a data-flow diagram showing where EU personal data lives. You have a SOC 2 Type II window closing in three weeks and a codebase nobody documented for compliance. Screenshotting GitHub settings by hand is not going to scale.

This is exactly the kind of repetitive, evidence-heavy work an AI coding agent is good at: it can read the repo, draft the scripts that collect evidence, wire the CI gates that enforce a control, and turn code into the narrative an auditor expects. You stay the reviewer; the agent does the typing.

  • A prompt that audits a repo against a specific SOC 2 control (CC6.1, CC8.1) and returns a gap table with file:line evidence
  • A working GitHub Actions job that fails the build on disallowed dependency licenses
  • A pre-commit secret-scanning gate the agent wires for you
  • A GDPR data-flow document the agent drafts by tracing personal-data fields through the codebase
  • The same workflow shown in Cursor, Claude Code, and Codex, plus the MCP servers that make evidence collection one step instead of ten

Compliance automation with an agent is four moves: map the control to your code, generate the evidence collector, enforce the control in CI, then write the human-readable narrative. Treat the agent’s output as a first draft you review, never as the auditor’s word.

Start narrow. Pick one control and ask the agent to find where you do and don’t satisfy it. The trick is forcing file:line citations so you can verify every claim instead of trusting a confident summary.

Put an explicit “do not invent” clause in the prompt. Compliance audits are where models are most tempted to hallucinate a tidy, fully compliant answer, and demanding citations turns “trust me” into something you can spot-check in 30 seconds.

The mechanics of pointing the agent at your repo differ per tool:

In Cursor, open the repo and switch the Agent to a planning-grade model (Fable 5 or Opus 4.8 for thorough audits, Sonnet 4.6 for everyday passes). Paste the prompt in Agent mode and add @Codebase so it searches the whole project rather than just the open files. Cursor renders the gap table inline; click each file:line citation to jump straight to the evidence and confirm it.

Auditors want repeatable evidence, not a one-off chat. Have the agent write a script that pulls the proof on demand — branch-protection settings, the list of who can merge, deploy approvals — so you can re-run it the morning of the audit.

Review what it produces. Confirm it calls real gh api / Octokit endpoints (not invented ones), reads the token from the environment rather than hardcoding it, and that the control mapping in the header comment is honest. Then run it once and eyeball the JSON against what you see in the GitHub UI.
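As a reference point for that review, here is a minimal sketch of what a collector might look like. It assumes Node 18+ (for the built-in fetch), a token in GITHUB_TOKEN, and the GitHub REST endpoint GET /repos/{owner}/{repo}/branches/{branch}/protection. The BranchProtectionEvidence shape, the CC8.1 mapping, and the function names are illustrative assumptions, not a prescribed format.

```typescript
// Evidence collector sketch for a change-management control.
// The mapping to CC8.1 is an assumption; confirm it with your auditor.

// Subset of the branch-protection response this sketch reads.
interface BranchProtectionResponse {
  required_pull_request_reviews?: {
    required_approving_review_count?: number;
  };
  enforce_admins?: { enabled: boolean };
}

// Hypothetical evidence record; adapt the fields to what your auditor asks for.
interface BranchProtectionEvidence {
  control: string;
  repo: string;
  branch: string;
  collectedAt: string;
  reviewRequired: boolean;
  requiredApprovals: number;
  enforcedForAdmins: boolean;
}

// Pure function: turn the raw API payload into an evidence record.
function summarizeProtection(
  repo: string,
  branch: string,
  raw: BranchProtectionResponse,
  now: Date = new Date(),
): BranchProtectionEvidence {
  return {
    control: 'CC8.1',
    repo,
    branch,
    collectedAt: now.toISOString(),
    reviewRequired: raw.required_pull_request_reviews !== undefined,
    requiredApprovals: raw.required_pull_request_reviews?.required_approving_review_count ?? 0,
    enforcedForAdmins: raw.enforce_admins?.enabled ?? false,
  };
}

// Fetch branch protection; the token comes from the environment, never from source.
async function collectBranchProtection(
  repo: string, // "owner/name"
  branch: string,
): Promise<BranchProtectionEvidence> {
  const token = process.env.GITHUB_TOKEN;
  if (!token) throw new Error('GITHUB_TOKEN is not set');
  const url = `https://api.github.com/repos/${repo}/branches/${encodeURIComponent(branch)}/protection`;
  const res = await fetch(url, {
    headers: {
      Authorization: `Bearer ${token}`,
      Accept: 'application/vnd.github+json',
    },
  });
  // A 404 here typically means the branch has no protection rule at all.
  if (!res.ok) throw new Error(`GitHub API returned ${res.status} for ${repo}@${branch}`);
  return summarizeProtection(repo, branch, (await res.json()) as BranchProtectionResponse);
}
```

Keeping the summarizing step separate from the network call makes the mapping easy to review and test without a token, and the timestamp records when the evidence was pulled.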

Evidence proves a control existed in the past. A CI gate keeps it true going forward. Two of the highest-leverage gates: a dependency-license check and a secret scan. Have the agent write both.
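For the license gate, the core logic is small enough to review line by line. The sketch below assumes the input is the JSON produced by license-checker --json (a map of name@version to an object with a licenses field). The denylist, the Violation type, and the function names are illustrative assumptions, and the matching is a simple pattern check rather than a full SPDX expression parser.

```typescript
// License gate sketch: flag packages whose declared license matches a denylist.
// Input shape follows `license-checker --json`: { "name@version": { licenses: ... } }.

type LicenseReport = Record<string, { licenses?: string | string[] }>;

// Illustrative denylist of license families; your policy decides what belongs here.
// Entries must be plain letters because they are interpolated into a regex.
const DISALLOWED = ['GPL', 'AGPL'];

interface Violation {
  pkg: string;
  license: string;
}

// True when the family name appears without a letter directly before it,
// so "LGPL-3.0" does not trip the "GPL" rule.
function matchesFamily(license: string, family: string): boolean {
  return new RegExp(`(^|[^A-Za-z])${family}`, 'i').test(license);
}

function findViolations(report: LicenseReport, disallowed: string[] = DISALLOWED): Violation[] {
  const violations: Violation[] = [];
  for (const [pkg, info] of Object.entries(report)) {
    const declared = Array.isArray(info.licenses) ? info.licenses : [info.licenses ?? 'UNKNOWN'];
    for (const license of declared) {
      // Conservative: "MIT OR GPL-2.0" is flagged too, leaving the call to a human.
      if (disallowed.some((family) => matchesFamily(license, family))) {
        violations.push({ pkg, license });
      }
    }
  }
  return violations;
}

// Returns the exit code a CI step would use: non-zero fails the build.
function gate(report: LicenseReport): number {
  const violations = findViolations(report);
  for (const v of violations) {
    console.error(`Disallowed license ${v.license} in ${v.pkg}`);
  }
  return violations.length > 0 ? 1 : 0;
}
```

In a CI job you would write the license-checker output to a file, parse it with JSON.parse, and assign the result of gate to process.exitCode so the step fails whenever a violation is found.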

For secret scanning, prefer wiring an established tool over asking the agent to invent regexes — it should configure gitleaks, not write its own scanner:

In Cursor, ask the Agent: “Add a pre-commit hook using gitleaks that blocks commits containing secrets, plus a CI job that runs gitleaks detect on every PR.” Cursor edits .pre-commit-config.yaml and the workflow file. Review the diff in the Source Control panel, then stage a fake AWS_SECRET_ACCESS_KEY=... line locally and confirm the hook actually blocks the commit before you trust it.

The last mile of most audits is prose: a data-flow description an auditor or DPO can read. The agent can trace personal-data fields through the codebase far faster than you can grep for them — as long as you make it cite sources and flag uncertainty.

The Mermaid diagram renders directly in most docs tools and gives your DPO a picture instead of a wall of text. The “NEEDS REVIEW” tag is the safety valve — it surfaces the fields the agent couldn’t fully trace so a human closes the gap.
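To make the idea concrete, here is a sketch of how a list of traced flows could be rendered as Mermaid, with untraced fields tagged for a human. The DataFlow shape and the function names are hypothetical; what the agent actually produces depends on your prompt.

```typescript
// Render traced personal-data flows as a Mermaid flowchart.
// The DataFlow shape is hypothetical; it carries field names only, never sample values.

interface DataFlow {
  field: string;     // personal-data field name, e.g. "email"
  from: string;      // source component
  to: string;        // destination component
  evidence?: string; // file:line citation; absent when the trace is incomplete
}

// Mermaid node ids should be simple identifiers, so derive one from the label.
function nodeId(label: string): string {
  return label.replace(/[^A-Za-z0-9]/g, '_');
}

// One edge per flow; flows without a citation are tagged NEEDS REVIEW.
// Assumes labels contain no double quotes.
function toMermaid(flows: DataFlow[]): string {
  const lines = ['flowchart LR'];
  for (const f of flows) {
    const tag = f.evidence ? f.field : `${f.field} NEEDS REVIEW`;
    lines.push(`  ${nodeId(f.from)}["${f.from}"] -->|"${tag}"| ${nodeId(f.to)}["${f.to}"]`);
  }
  return lines.join('\n');
}
```

Because a missing citation is the only trigger for the tag, the diagram cannot silently present an unverified flow as settled.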

Evidence collection gets dramatically shorter when the agent can query your systems directly instead of shelling out to CLIs. The relevant connections here are all real, first-party MCP servers:

  • GitHub MCP — the agent reads branch protection, PR reviews, and Actions runs natively. Use the hosted server at https://api.githubcopilot.com/mcp/ (the legacy local server @modelcontextprotocol/server-github is deprecated but still installable). Add it to Claude Code with:

    claude mcp add --transport http github https://api.githubcopilot.com/mcp/
  • Sentry MCP (https://mcp.sentry.dev/mcp) — pull incident and error history as evidence for availability and incident-response controls.

  • Filesystem MCP — for scoping the agent to a specific evidence directory when generating reports.

If you genuinely need to script an MCP client (rather than letting the agent drive one), the SDK construction is specific — the package exports Client, not a root MCPClient, and it connects over a transport, never a bare server-name string:

// Connect a programmatic MCP client to the GitHub server
import { Client } from '@modelcontextprotocol/sdk/client/index.js';
import { StreamableHTTPClientTransport } from '@modelcontextprotocol/sdk/client/streamableHttp.js';

const client = new Client({ name: 'compliance-evidence', version: '1.0.0' });
await client.connect(
  new StreamableHTTPClientTransport(new URL('https://api.githubcopilot.com/mcp/'))
);

On the Skills side, a single-purpose code review skill from the open skills marketplace (browse skills.sh and install with npx skills add <owner/repo>) is a lighter alternative to a full MCP server when all you want is a consistent compliance-flavored review on each PR. Reach for a skill when you need repeatable behavior; reach for an MCP server when the agent needs a live connection to a system of record.

  • The agent marks a control “Met” with no real evidence. This is the failure mode that gets you a finding. Always require file:line citations, then spot-check three of them. If a citation points at a file that doesn’t contain the claimed control, discard the whole table and re-run with a stricter prompt.
  • It invents an MCP server or npm package. Do not paste regulatory-mcp-server, @compliance/*, or similar — none exist. Verify any suggested package with npm view <pkg> version before wiring it in, and stick to the GitHub/Sentry/Filesystem MCPs above.
  • Generated regexes miss real secrets. Don’t let the agent hand-roll a secret scanner. Use gitleaks or GitHub’s built-in secret scanning; their rule sets are maintained and tested. Use the agent to wire the tool, not replace it.
  • License data is wrong for transitive deps. license-checker reads declared licenses, which are sometimes mislabeled upstream. For anything you’re about to ship under audit, confirm flagged copyleft packages manually before failing or unblocking a build.
  • The data-flow doc leaks a real secret or PII sample. Tell the agent to redact values and reference field names only. Review the diff before committing anything to evidence/.
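The citation spot-check from the first item above can be partly mechanized. This is a minimal sketch, assuming citations use the path:line form and that you run it from the repo root. The function names and the optional keyword check are illustrative, and a passing check only shows the line exists, not that it satisfies the control.

```typescript
import { existsSync, readFileSync } from 'node:fs';

type CitationCheck = { citation: string; ok: boolean; reason: string };

// Parse "path/to/file.ts:42" into its parts; returns null when malformed.
function parseCitation(citation: string): { file: string; line: number } | null {
  const match = /^(.+):(\d+)$/.exec(citation.trim());
  if (!match) return null;
  return { file: match[1], line: Number(match[2]) };
}

// Confirm the cited line exists and, optionally, mentions an expected keyword.
function checkCitation(citation: string, keyword?: string): CitationCheck {
  const parsed = parseCitation(citation);
  if (!parsed) return { citation, ok: false, reason: 'not in file:line form' };
  if (!existsSync(parsed.file)) return { citation, ok: false, reason: 'file not found' };

  const lines = readFileSync(parsed.file, 'utf8').split(/\r?\n/);
  const text = lines[parsed.line - 1]; // citations are 1-indexed
  if (text === undefined) return { citation, ok: false, reason: 'line out of range' };
  if (keyword && !text.includes(keyword)) {
    return { citation, ok: false, reason: 'keyword not on cited line' };
  }
  return { citation, ok: true, reason: 'line exists' };
}
```

Running this over every citation in a gap table narrows the manual review to the lines that exist, which you still read yourself.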