Stop your agent from claiming "done" before it's proven. A verification gate that classifies each change by risk (payment, auth, database, user-facing), picks the tests that actually cover it, demands evidence, maps regression risk, and outputs an honest pass/fail report. Turns "looks good to me" into "here's what I ran, and here's what's still unverified."
code-review, qa-automation, devsecops, +2
Evidence Integrity Gate: Stop AI From Shipping Unsupported Claims in High-Stakes Content
Runs an ordered evidence-integrity gate over any AI draft — grade sources, ground claims, verify technical assertions, stress-test — then returns one PASS/REVISE/FAIL ship decision.
Audit your dbt project for the test and documentation gaps that let bad data ship. Flags models with no unique or not_null tests, sources missing freshness config or tests, likely keys without a not_null test, models missing descriptions, SELECT * in models, and raw table references that should use ref() or source(). Each finding comes with a suggested tests: YAML snippet to drop into schema.yml.
Scaffold a complete, production-ready transactional email system — verification, password reset, receipts, and notifications — with deliverability hardening, retries, idempotency, and a suppression list.
Catch documentation that drifted from your code. Flags functions and methods named in your docs that are gone from the source, CLI flags documented but missing from the arg parser, env vars the docs mention but the code never reads, example imports of modules that no longer exist, and npm scripts or Make targets your docs reference but the project does not define. Cross-references your README and docs against Python and JS/TS source.
Check ad copy for the editorial issues that get Google and Meta ads disapproved, before you submit. Flags all-caps words, gimmicky punctuation (!!, ??), prohibited and hype words, Meta's banned personal-attribute framing ("Are you over 40?"), absolute or unsubstantiated claims, ad-field length overflow, and trademark-term risk. The word lists and field limits are editable, so you can tune it to your accounts.
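A few of the checks described above can be sketched in a handful of lines. This is only an illustration, not the tool's actual implementation: the function name `lint_ad_copy`, the default hype-word list, and the rule that three or more consecutive capitals count as an all-caps word are all assumptions, and the field limit is passed in by the caller rather than hard-coded.

```python
import re


def lint_ad_copy(text, max_len, hype_words=("guaranteed", "miracle")):
    """Return a list of editorial findings for one ad field.

    Hypothetical sketch: the word list and thresholds are placeholders
    meant to be edited, not platform policy.
    """
    findings = []

    # Field length overflow against a caller-supplied limit.
    if len(text) > max_len:
        findings.append(f"length {len(text)} exceeds limit {max_len}")

    # All-caps words: 3+ consecutive capitals (this also flags acronyms).
    for word in re.findall(r"\b[A-Z]{3,}\b", text):
        findings.append(f"all-caps word: {word}")

    # Gimmicky punctuation such as "!!" or "??".
    for run in re.findall(r"[!?]{2,}", text):
        findings.append(f"repeated punctuation: {run}")

    # Hype words, matched case-insensitively on word boundaries.
    lowered = text.lower()
    for word in hype_words:
        if re.search(rf"\b{re.escape(word)}\b", lowered):
            findings.append(f"hype word: {word}")

    return findings


if __name__ == "__main__":
    for finding in lint_ad_copy("FREE shipping today!!", max_len=30):
        print(finding)
```

Because acronyms such as brand names would trip the all-caps rule, a real list of exceptions would likely be needed alongside the editable word lists.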
A repo visitor decides in about ten seconds whether your project solves their problem, and most READMEs spend those ten seconds on installation instructions instead of answering "what does this actually do?"
An iterative agent loop that optimizes any prompt, config, or artifact by making one change at a time, scoring it against a metric, and keeping only the winners.
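The loop described above (one change at a time, score it, keep only the winners) resembles a simple hill climb. A minimal sketch, assuming the caller supplies a hypothetical `mutate` function that proposes one change and a `score` function where higher is better; none of these names come from the listing itself.

```python
import random


def optimize(initial, mutate, score, rounds=50, seed=0):
    """Greedy one-change-at-a-time search (illustrative sketch only).

    mutate(current, rng) -> candidate   proposes a single change
    score(candidate) -> float           higher is better
    """
    rng = random.Random(seed)
    best, best_score = initial, score(initial)
    history = []  # (candidate, score, kept) for every round

    for _ in range(rounds):
        candidate = mutate(best, rng)
        candidate_score = score(candidate)
        kept = candidate_score > best_score  # keep strict improvements only
        if kept:
            best, best_score = candidate, candidate_score
        history.append((candidate, candidate_score, kept))

    return best, best_score, history


if __name__ == "__main__":
    # Toy problem: walk an integer toward 5.
    best, best_score, history = optimize(
        0,
        mutate=lambda x, rng: x + rng.choice([-1, 1]),
        score=lambda x: -abs(x - 5),
    )
    print(best, best_score, sum(kept for _, _, kept in history))
```

Keeping the full history, including rejected candidates, makes it possible to see which changes were tried and discarded rather than only the final winner.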
Audit your frontend build against a performance budget and catch size regressions before you ship. Flags total bundle over budget, initial bundle over budget, individual chunks over a threshold, oversized image assets, source maps shipped to production, and large unminified JavaScript. Reads a webpack or Vite-style stats.json plus a perf-budget.json you control.
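To make the budget idea concrete, here is a rough sketch of such a check. It assumes the build stats have already been reduced to a list of `{"name", "size"}` entries, and the budget keys (`total_bytes`, `chunk_bytes`, `image_bytes`) are hypothetical names chosen for this example, not the tool's actual perf-budget.json schema.

```python
IMAGE_EXTS = (".png", ".jpg", ".jpeg", ".gif", ".webp")


def check_budget(assets, budget):
    """Compare built assets against a size budget (illustrative sketch).

    assets: list of {"name": str, "size": int} with size in bytes
    budget: {"total_bytes": int, "chunk_bytes": int, "image_bytes": int}
            (hypothetical key names)
    """
    findings = []

    # Total size of everything emitted, source maps included.
    total = sum(asset["size"] for asset in assets)
    if total > budget["total_bytes"]:
        findings.append(f"total {total} B over budget {budget['total_bytes']} B")

    for asset in assets:
        name, size = asset["name"], asset["size"]
        if name.endswith(".map"):
            findings.append(f"source map shipped: {name}")
        elif name.endswith(".js") and size > budget["chunk_bytes"]:
            findings.append(f"chunk over threshold: {name} ({size} B)")
        elif name.lower().endswith(IMAGE_EXTS) and size > budget["image_bytes"]:
            findings.append(f"oversized image: {name} ({size} B)")

    return findings


if __name__ == "__main__":
    assets = [
        {"name": "main.js", "size": 300_000},
        {"name": "vendor.js", "size": 100_000},
        {"name": "main.js.map", "size": 500_000},
        {"name": "hero.png", "size": 250_000},
    ]
    budget = {"total_bytes": 1_000_000, "chunk_bytes": 200_000, "image_bytes": 200_000}
    for finding in check_budget(assets, budget):
        print(finding)
```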
An adversarial reviewer for AI-written code changes. It pressure-tests a pull request or diff for untested branches, silent behavior changes, missing edge cases, over-confident code that only looks right, and weak tests, then returns a PASS / REVISE / BLOCK verdict before the change merges.
Flag the hidden and look-alike characters lurking in a handle or brand string. Catches zero-width characters, mixed-script look-alikes (a Cyrillic "а" passing as a Latin "a"), right-to-left and bidi override characters, unexpected non-ASCII, and stacked combining marks. These are the spoofing tricks and display bugs you cannot catch by reading.
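Several of these checks can be approximated with Python's standard `unicodedata` module. The sketch below is an illustration under stated assumptions, not the tool's own logic: the character sets are a partial list, the script guess is a rough heuristic based on the Unicode character name, and "stacked" is taken to mean two or more combining marks in a row.

```python
import unicodedata

# Partial lists, chosen for illustration.
ZERO_WIDTH = {"\u200b", "\u200c", "\u200d", "\u2060", "\ufeff"}
BIDI_CONTROLS = {
    "\u200e", "\u200f",                                # LRM, RLM
    "\u202a", "\u202b", "\u202c", "\u202d", "\u202e",  # embeddings and overrides
    "\u2066", "\u2067", "\u2068", "\u2069",            # isolates
}


def script_of(ch):
    """Rough script guess from the Unicode name (a heuristic, not a full lookup)."""
    name = unicodedata.name(ch, "")
    for script in ("LATIN", "CYRILLIC", "GREEK"):
        if name.startswith(script):
            return script
    return None


def find_suspicious(text):
    """Return (index, kind) findings; index -1 marks a whole-string finding."""
    findings = []
    scripts = set()
    combining_run = 0

    for i, ch in enumerate(text):
        if ch in ZERO_WIDTH:
            findings.append((i, "zero-width"))
        elif ch in BIDI_CONTROLS:
            findings.append((i, "bidi-control"))

        # Count consecutive combining marks; report once when a run reaches two.
        if unicodedata.combining(ch):
            combining_run += 1
            if combining_run == 2:
                findings.append((i, "stacked-combining"))
        else:
            combining_run = 0

        script = script_of(ch)
        if script:
            scripts.add(script)

    if len(scripts) > 1:
        findings.append((-1, "mixed-script:" + "+".join(sorted(scripts))))
    return findings


if __name__ == "__main__":
    # The second letter is Cyrillic "а" (U+0430), not Latin "a".
    print(find_suspicious("p\u0430ypal"))
```

A production check would more likely rely on a proper script property table and a confusables list, since name prefixes cover only a few scripts.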
Audit your frontend for accessibility violations before release — flags WCAG failures, gives prioritized fixes, and blocks the broken patterns that get sites sued.
Synced shared screen for a Three.js multiplayer room: a SharedScreen client (YouTube on a canvas texture, clock-based playback tracking, 3D positional audio), a screenHandlers Colyseus pattern (setUrl, play, pause, seek, volume, stop, controlled-by tracking), an optional yt-dlp plus ffmpeg proxy, and an optional Puppeteer BrowserManager. Field-verified from nex-vr-room.