Multifamily Underwriting Bot

investorNot ready

23 ways your automation can silently break

Each one is a way it produces a wrong result, drops data, or fails without telling you. We don't just grade it, we get it to green.

23% of the Production Standard holds

What the grade means

Ways it breaks

of 30 checkpoints

Critical ways

Fix these first

Get to green23 fixes readyApply them yourself, or have us close every gap and hand it back certified.

Have us fix it See the fixes

Category breakdown

8 disciplines of the Standard

Data Integrity2/7

Source Truth and Document Handling0/2

Reliability and Failure Handling2/5

Observability and Monitoring0/4

Accuracy and Evals1/5

Control and Approval1/2

Security and Access1/3

Maintainability and Cost0/2

Scan history

1 scan

Graded against Standard v1.0.0

Jun 12, 3:30 PM · demo grade

7/30

The 30 checkpoints

Expand any point for the fix

Data Integrity2/7 pass

What this means

No rows quietly go missing. A 240-unit rent roll comes out as 240 units, not 228.

The standard

Rows in equals rows out, checked on every run. Any mismatch halts the job instead of quietly returning a short table.

What this costs you

No evidence this build handles row count reconciliation. Rows in equals rows out, checked on every run is not enforced, so it can fail silently in production.

CriticalFix effort: ~30 min in Claude

How to fix it

Rows in equals rows out, checked on every run. Any mismatch halts the job instead of quietly returning a short table.

Paste this to Claude

You are hardening a production AI automation. Implement the "Row Count Reconciliation" safeguard described below. Goal: Rows in equals rows out, checked on every run. Any mismatch halts the job instead of quietly returning a short table. Do this: 1. Find where the automation performs the operation this checkpoint covers. 2. Add the check so a bad, missing, or malformed value stops the run or routes to human review, instead of flowing through as a confident wrong result. 3. Handle the edge cases explicitly: empty input, partial data, and a source that changed shape since the last run. 4. Fail loud. Raise a clear error or flag the item with the offending input attached, and log it. Never coerce missing data to a default silently. Constraints: do not change unrelated behavior, keep the change minimal, and add a test that covers it. Verify: Feed an input that should trip Row Count Reconciliation and confirm it flags or halts, then confirm a clean input still passes.

Paste that into your Claude or Cursor and it applies the fix in your code. Or we get it to green for you.

Confirm it is fixed

Feed an input that should trip this check. It should flag or halt, not pass clean.

What this means

If a source renames or moves a column, the run stops instead of putting rent in the expense field.

The standard

Columns are read by header name and verified before parsing. A renamed or moved column stops the run, it never silently shifts data into the wrong field.

What this costs you

No evidence this build handles schema drift detection. Columns are read by header name and verified before parsing is not enforced, so it can fail silently in production.

HighFix effort: ~20 min in Claude

How to fix it

Columns are read by header name and verified before parsing. A renamed or moved column stops the run, it never silently shifts data into the wrong field.

Paste this to Claude

You are hardening a production AI automation. Implement the "Schema Drift Detection" safeguard described below. Goal: Columns are read by header name and verified before parsing. A renamed or moved column stops the run, it never silently shifts data into the wrong field. Do this: 1. Find where the automation performs the operation this checkpoint covers. 2. Add the check so a bad, missing, or malformed value stops the run or routes to human review, instead of flowing through as a confident wrong result. 3. Handle the edge cases explicitly: empty input, partial data, and a source that changed shape since the last run. 4. Fail loud. Raise a clear error or flag the item with the offending input attached, and log it. Never coerce missing data to a default silently. Constraints: do not change unrelated behavior, keep the change minimal, and add a test that covers it. Verify: Feed an input that should trip Schema Drift Detection and confirm it flags or halts, then confirm a clean input still passes.

Paste that into your Claude or Cursor and it applies the fix in your code. Or we get it to green for you.

Confirm it is fixed

Feed an input that should trip this check. It should flag or halt, not pass clean.

What this means

Monthly versus annual and dollars versus thousands never get mixed up, so NOI is not off by 12x.

The standard

Every figure carries an explicit unit and period, with a test proving the conversion. No naked numbers, no 12x NOI error.

Why this passes

This one holds. Unit and Currency Normalization is handled the way the standard requires.

Keep it that way

Keep it that way: Every figure carries an explicit unit and period, with a test proving the conversion. No naked numbers, no 12x NOI error.

What this means

A trailing-12 and a current rent roll are not blended across different dates into a wrong NOI.

The standard

Every financial source is tagged with an as-of date and period type, and incompatible periods are refused before they ever blend.

What this costs you

No evidence this build handles period and date alignment. Every financial source is tagged with an as-of date and period type, and incompatible periods are refused before they ever blend is not enforced, so it can fail silently in production.

HighFix effort: ~20 min in Claude

How to fix it

Every financial source is tagged with an as-of date and period type, and incompatible periods are refused before they ever blend.

Paste this to Claude

You are hardening a production AI automation. Implement the "Period and Date Alignment" safeguard described below. Goal: Every financial source is tagged with an as-of date and period type, and incompatible periods are refused before they ever blend. Do this: 1. Find where the automation performs the operation this checkpoint covers. 2. Add the check so a bad, missing, or malformed value stops the run or routes to human review, instead of flowing through as a confident wrong result. 3. Handle the edge cases explicitly: empty input, partial data, and a source that changed shape since the last run. 4. Fail loud. Raise a clear error or flag the item with the offending input attached, and log it. Never coerce missing data to a default silently. Constraints: do not change unrelated behavior, keep the change minimal, and add a test that covers it. Verify: Feed an input that should trip Period and Date Alignment and confirm it flags or halts, then confirm a clean input still passes.

Paste that into your Claude or Cursor and it applies the fix in your code. Or we get it to green for you.

Confirm it is fixed

Feed an input that should trip this check. It should flag or halt, not pass clean.

What this means

Blank or 'N/A' values are treated as missing, not silently turned into $0.

The standard

Missing stays missing. Blanks and "N/A" are never coerced to zero, and unknown values are excluded from averages, not counted as $0.

What this costs you

No evidence this build handles null and sentinel handling. Missing stays missing is not enforced, so it can fail silently in production.

CriticalFix effort: ~30 min in Claude

How to fix it

Missing stays missing. Blanks and "N/A" are never coerced to zero, and unknown values are excluded from averages, not counted as $0.

Paste this to Claude

You are hardening a production AI automation. Implement the "Null and Sentinel Handling" safeguard described below. Goal: Missing stays missing. Blanks and "N/A" are never coerced to zero, and unknown values are excluded from averages, not counted as $0. Do this: 1. Find where the automation performs the operation this checkpoint covers. 2. Add the check so a bad, missing, or malformed value stops the run or routes to human review, instead of flowing through as a confident wrong result. 3. Handle the edge cases explicitly: empty input, partial data, and a source that changed shape since the last run. 4. Fail loud. Raise a clear error or flag the item with the offending input attached, and log it. Never coerce missing data to a default silently. Constraints: do not change unrelated behavior, keep the change minimal, and add a test that covers it. Verify: Feed an input that should trip Null and Sentinel Handling and confirm it flags or halts, then confirm a clean input still passes.

Paste that into your Claude or Cursor and it applies the fix in your code. Or we get it to green for you.

Confirm it is fixed

Feed an input that should trip this check. It should flag or halt, not pass clean.

What this means

The same property or comp is not counted twice because the address was written two ways.

The standard

Records dedupe on a resolved identity key, never the raw text, so the same entity collapses to one record.

Why this passes

This one holds. Deduplication and Identity is handled the way the standard requires.

Keep it that way

Keep it that way: Records dedupe on a resolved identity key, never the raw text, so the same entity collapses to one record.

What this means

Every number can be traced back to the document and cell it came from when someone asks.

The standard

Every output value carries its source: file, tab, cell. Any number can be traced back to the document it came from on demand.

What this costs you

No evidence this build handles input provenance. Every output value carries its source: file, tab, cell is not enforced, so it can fail silently in production.

MediumFix effort: ~20 min in Claude

How to fix it

Every output value carries its source: file, tab, cell. Any number can be traced back to the document it came from on demand.

Paste this to Claude

You are hardening a production AI automation. Implement the "Input Provenance" safeguard described below. Goal: Every output value carries its source: file, tab, cell. Any number can be traced back to the document it came from on demand. Do this: 1. Find where the automation performs the operation this checkpoint covers. 2. Add the check so a bad, missing, or malformed value stops the run or routes to human review, instead of flowing through as a confident wrong result. 3. Handle the edge cases explicitly: empty input, partial data, and a source that changed shape since the last run. 4. Fail loud. Raise a clear error or flag the item with the offending input attached, and log it. Never coerce missing data to a default silently. Constraints: do not change unrelated behavior, keep the change minimal, and add a test that covers it. Verify: Feed an input that should trip Input Provenance and confirm it flags or halts, then confirm a clean input still passes.

Paste that into your Claude or Cursor and it applies the fix in your code. Or we get it to green for you.

Confirm it is fixed

Feed an input that should trip this check. It should flag or halt, not pass clean.

Source Truth and Document Handling0/2 pass

What this means

When the rent roll, T-12, and OM disagree, it flags the conflict instead of silently picking one.

The standard

A defined precedence rule decides which source wins, and any disagreement between sources is flagged, never silently resolved.

What this costs you

No evidence this build handles source-of-truth conflict resolution. A defined precedence rule decides which source wins, and any disagreement between sources is flagged, never silently resolved is not enforced, so it can fail silently in production.

HighFix effort: ~20 min in Claude

How to fix it

A defined precedence rule decides which source wins, and any disagreement between sources is flagged, never silently resolved.

Paste this to Claude

You are hardening a production AI automation. Implement the "Source-of-Truth Conflict Resolution" safeguard described below. Goal: A defined precedence rule decides which source wins, and any disagreement between sources is flagged, never silently resolved. Do this: 1. Find where the automation performs the operation this checkpoint covers. 2. Add the check so a bad, missing, or malformed value stops the run or routes to human review, instead of flowing through as a confident wrong result. 3. Handle the edge cases explicitly: empty input, partial data, and a source that changed shape since the last run. 4. Fail loud. Raise a clear error or flag the item with the offending input attached, and log it. Never coerce missing data to a default silently. Constraints: do not change unrelated behavior, keep the change minimal, and add a test that covers it. Verify: Feed an input that should trip Source-of-Truth Conflict Resolution and confirm it flags or halts, then confirm a clean input still passes.

Paste that into your Claude or Cursor and it applies the fix in your code. Or we get it to green for you.

Confirm it is fixed

Feed an input that should trip this check. It should flag or halt, not pass clean.

What this means

It handles messy real files (scans, phone photos, merged cells) or refuses them. It never guesses a number.

The standard

The parser handles degraded real-world documents or refuses them outright. It never returns a confident number from a file it could not actually read.

What this costs you

No evidence this build handles document-ingestion robustness. The parser handles degraded real-world documents or refuses them outright is not enforced, so it can fail silently in production.

MediumFix effort: ~20 min in Claude

How to fix it

The parser handles degraded real-world documents or refuses them outright. It never returns a confident number from a file it could not actually read.

Paste this to Claude

You are hardening a production AI automation. Implement the "Document-Ingestion Robustness" safeguard described below. Goal: The parser handles degraded real-world documents or refuses them outright. It never returns a confident number from a file it could not actually read. Do this: 1. Find where the automation performs the operation this checkpoint covers. 2. Add the check so a bad, missing, or malformed value stops the run or routes to human review, instead of flowing through as a confident wrong result. 3. Handle the edge cases explicitly: empty input, partial data, and a source that changed shape since the last run. 4. Fail loud. Raise a clear error or flag the item with the offending input attached, and log it. Never coerce missing data to a default silently. Constraints: do not change unrelated behavior, keep the change minimal, and add a test that covers it. Verify: Feed an input that should trip Document-Ingestion Robustness and confirm it flags or halts, then confirm a clean input still passes.

Paste that into your Claude or Cursor and it applies the fix in your code. Or we get it to green for you.

Confirm it is fixed

Feed an input that should trip this check. It should flag or halt, not pass clean.

Reliability and Failure Handling2/5 pass

What this means

Running the same job twice does not create duplicates or double-send anything.

The standard

Re-running the same input changes nothing and re-sends nothing. Resuming after a failure only backfills what did not finish.

What this costs you

No evidence this build handles idempotency and re-run safety. Re-running the same input changes nothing and re-sends nothing is not enforced, so it can fail silently in production.

HighFix effort: ~20 min in Claude

How to fix it

Re-running the same input changes nothing and re-sends nothing. Resuming after a failure only backfills what did not finish.

Paste this to Claude

You are hardening a production AI automation. Implement the "Idempotency and Re-Run Safety" safeguard described below. Goal: Re-running the same input changes nothing and re-sends nothing. Resuming after a failure only backfills what did not finish. Do this: 1. Find where the automation performs the operation this checkpoint covers. 2. Add the check so a bad, missing, or malformed value stops the run or routes to human review, instead of flowing through as a confident wrong result. 3. Handle the edge cases explicitly: empty input, partial data, and a source that changed shape since the last run. 4. Fail loud. Raise a clear error or flag the item with the offending input attached, and log it. Never coerce missing data to a default silently. Constraints: do not change unrelated behavior, keep the change minimal, and add a test that covers it. Verify: Feed an input that should trip Idempotency and Re-Run Safety and confirm it flags or halts, then confirm a clean input still passes.

Paste that into your Claude or Cursor and it applies the fix in your code. Or we get it to green for you.

Confirm it is fixed

Feed an input that should trip this check. It should flag or halt, not pass clean.

What this means

If LoopNet or a county site changes its page, you get alerted instead of pulling garbage comps.

The standard

Every scrape asserts the page is shaped the way it expects and alerts on any change, instead of returning whatever element it happened to grab.

What this costs you

No evidence this build handles scraper layout-change detection. Every scrape asserts the page is shaped the way it expects and alerts on any change, instead of returning whatever element it happened to grab is not enforced, so it can fail silently in production.

CriticalFix effort: ~30 min in Claude

How to fix it

Every scrape asserts the page is shaped the way it expects and alerts on any change, instead of returning whatever element it happened to grab.

Paste this to Claude

You are hardening a production AI automation. Implement the "Scraper Layout-Change Detection" safeguard described below. Goal: Every scrape asserts the page is shaped the way it expects and alerts on any change, instead of returning whatever element it happened to grab. Do this: 1. Find where the automation performs the operation this checkpoint covers. 2. Add the check so a bad, missing, or malformed value stops the run or routes to human review, instead of flowing through as a confident wrong result. 3. Handle the edge cases explicitly: empty input, partial data, and a source that changed shape since the last run. 4. Fail loud. Raise a clear error or flag the item with the offending input attached, and log it. Never coerce missing data to a default silently. Constraints: do not change unrelated behavior, keep the change minimal, and add a test that covers it. Verify: Feed an input that should trip Scraper Layout-Change Detection and confirm it flags or halts, then confirm a clean input still passes.

Paste that into your Claude or Cursor and it applies the fix in your code. Or we get it to green for you.

Confirm it is fixed

Feed an input that should trip this check. It should flag or halt, not pass clean.

What this means

One source going down does not crash the whole run or store a blank as if it were real data.

The standard

A single failed source degrades to skip-or-fallback with retries, and a failed fetch is stored as a distinct state from a genuine zero result.

What this costs you

No evidence this build handles graceful source degradation. A single failed source degrades to skip-or-fallback with retries, and a failed fetch is stored as a distinct state from a genuine zero result is not enforced, so it can fail silently in production.

HighFix effort: ~20 min in Claude

How to fix it

A single failed source degrades to skip-or-fallback with retries, and a failed fetch is stored as a distinct state from a genuine zero result.

Paste this to Claude

You are hardening a production AI automation. Implement the "Graceful Source Degradation" safeguard described below. Goal: A single failed source degrades to skip-or-fallback with retries, and a failed fetch is stored as a distinct state from a genuine zero result. Do this: 1. Find where the automation performs the operation this checkpoint covers. 2. Add the check so a bad, missing, or malformed value stops the run or routes to human review, instead of flowing through as a confident wrong result. 3. Handle the edge cases explicitly: empty input, partial data, and a source that changed shape since the last run. 4. Fail loud. Raise a clear error or flag the item with the offending input attached, and log it. Never coerce missing data to a default silently. Constraints: do not change unrelated behavior, keep the change minimal, and add a test that covers it. Verify: Feed an input that should trip Graceful Source Degradation and confirm it flags or halts, then confirm a clean input still passes.

Paste that into your Claude or Cursor and it applies the fix in your code. Or we get it to green for you.

Confirm it is fixed

Feed an input that should trip this check. It should flag or halt, not pass clean.

What this means

One bad file in a batch is set aside so the rest still finish.

The standard

One bad item in a batch is caught and quarantined while the rest finish, and quarantined items can be re-run on their own.

Why this passes

This one holds. Partial-Failure Isolation is handled the way the standard requires.

Keep it that way

Keep it that way: One bad item in a batch is caught and quarantined while the rest finish, and quarantined items can be re-run on their own.

What this means

A slow source cannot hang the whole job forever.

The standard

Every external call has an enforced timeout. No single slow dependency can hang the whole run.

Why this passes

This one holds. Timeout and Hang Protection is handled the way the standard requires.

Keep it that way

Keep it that way: Every external call has an enforced timeout. No single slow dependency can hang the whole run.

Observability and Monitoring0/4 pass

What this means

If the job stops running, you are told. Silence is not treated as everything-is-fine.

The standard

Every successful run emits a heartbeat, and a missed or failed run alerts a human within one cycle. Silence is never treated as success.

What this costs you

No evidence this build handles liveness alerting. Every successful run emits a heartbeat, and a missed or failed run alerts a human within one cycle is not enforced, so it can fail silently in production.

CriticalFix effort: ~30 min in Claude

How to fix it

Every successful run emits a heartbeat, and a missed or failed run alerts a human within one cycle. Silence is never treated as success.

Paste this to Claude

You are hardening a production AI automation. Implement the "Liveness Alerting" safeguard described below. Goal: Every successful run emits a heartbeat, and a missed or failed run alerts a human within one cycle. Silence is never treated as success. Do this: 1. Find where the automation performs the operation this checkpoint covers. 2. Add the check so a bad, missing, or malformed value stops the run or routes to human review, instead of flowing through as a confident wrong result. 3. Handle the edge cases explicitly: empty input, partial data, and a source that changed shape since the last run. 4. Fail loud. Raise a clear error or flag the item with the offending input attached, and log it. Never coerce missing data to a default silently. Constraints: do not change unrelated behavior, keep the change minimal, and add a test that covers it. Verify: Feed an input that should trip Liveness Alerting and confirm it flags or halts, then confirm a clean input still passes.

Paste that into your Claude or Cursor and it applies the fix in your code. Or we get it to green for you.

Confirm it is fixed

Feed an input that should trip this check. It should flag or halt, not pass clean.

What this means

If a run processes 4 deals instead of the usual 80, it flags it instead of passing green.

The standard

Each run's record count is compared to the recent normal, and a drop or spike past threshold flags the run instead of passing it green.

What this costs you

No evidence this build handles volume anomaly detection. Each run's record count is compared to the recent normal, and a drop or spike past threshold flags the run instead of passing it green is not enforced, so it can fail silently in production.

HighFix effort: ~20 min in Claude

How to fix it

Each run's record count is compared to the recent normal, and a drop or spike past threshold flags the run instead of passing it green.

Paste this to Claude

You are hardening a production AI automation. Implement the "Volume Anomaly Detection" safeguard described below. Goal: Each run's record count is compared to the recent normal, and a drop or spike past threshold flags the run instead of passing it green. Do this: 1. Find where the automation performs the operation this checkpoint covers. 2. Add the check so a bad, missing, or malformed value stops the run or routes to human review, instead of flowing through as a confident wrong result. 3. Handle the edge cases explicitly: empty input, partial data, and a source that changed shape since the last run. 4. Fail loud. Raise a clear error or flag the item with the offending input attached, and log it. Never coerce missing data to a default silently. Constraints: do not change unrelated behavior, keep the change minimal, and add a test that covers it. Verify: Feed an input that should trip Volume Anomaly Detection and confirm it flags or halts, then confirm a clean input still passes.

Paste that into your Claude or Cursor and it applies the fix in your code. Or we get it to green for you.

Confirm it is fixed

Feed an input that should trip this check. It should flag or halt, not pass clean.

What this means

Stale data and six-month-old comps are flagged before you underwrite on them.

The standard

Datasets carry a last-refreshed time and records a transaction date, and either past its limit is flagged before it is used.

What this costs you

No evidence this build handles data freshness and comp decay. Datasets carry a last-refreshed time and records a transaction date, and either past its limit is flagged before it is used is not enforced, so it can fail silently in production.

MediumFix effort: ~20 min in Claude

How to fix it

Datasets carry a last-refreshed time and records a transaction date, and either past its limit is flagged before it is used.

Paste this to Claude

You are hardening a production AI automation. Implement the "Data Freshness and Comp Decay" safeguard described below. Goal: Datasets carry a last-refreshed time and records a transaction date, and either past its limit is flagged before it is used. Do this: 1. Find where the automation performs the operation this checkpoint covers. 2. Add the check so a bad, missing, or malformed value stops the run or routes to human review, instead of flowing through as a confident wrong result. 3. Handle the edge cases explicitly: empty input, partial data, and a source that changed shape since the last run. 4. Fail loud. Raise a clear error or flag the item with the offending input attached, and log it. Never coerce missing data to a default silently. Constraints: do not change unrelated behavior, keep the change minimal, and add a test that covers it. Verify: Feed an input that should trip Data Freshness and Comp Decay and confirm it flags or halts, then confirm a clean input still passes.

Paste that into your Claude or Cursor and it applies the fix in your code. Or we get it to green for you.

Confirm it is fixed

Feed an input that should trip this check. It should flag or halt, not pass clean.

What this means

Important alerts reach a person by text or Slack, not a log nobody reads.

The standard

Critical alerts reach a channel a human actually watches, with severity and escalation, while low-priority noise is batched or muted.

What this costs you

No evidence this build handles actionable alert routing. Critical alerts reach a channel a human actually watches, with severity and escalation, while low-priority noise is batched or muted is not enforced, so it can fail silently in production.

HighFix effort: ~20 min in Claude

How to fix it

Critical alerts reach a channel a human actually watches, with severity and escalation, while low-priority noise is batched or muted.

Paste this to Claude

You are hardening a production AI automation. Implement the "Actionable Alert Routing" safeguard described below. Goal: Critical alerts reach a channel a human actually watches, with severity and escalation, while low-priority noise is batched or muted. Do this: 1. Find where the automation performs the operation this checkpoint covers. 2. Add the check so a bad, missing, or malformed value stops the run or routes to human review, instead of flowing through as a confident wrong result. 3. Handle the edge cases explicitly: empty input, partial data, and a source that changed shape since the last run. 4. Fail loud. Raise a clear error or flag the item with the offending input attached, and log it. Never coerce missing data to a default silently. Constraints: do not change unrelated behavior, keep the change minimal, and add a test that covers it. Verify: Feed an input that should trip Actionable Alert Routing and confirm it flags or halts, then confirm a clean input still passes.

Paste that into your Claude or Cursor and it applies the fix in your code. Or we get it to green for you.

Confirm it is fixed

Feed an input that should trip this check. It should flag or halt, not pass clean.

Accuracy and Evals1/5 pass

What this means

The model cannot invent a cap rate or NOI that no document supports.

The standard

Every number in the output traces to a source cell or a value the code computed. Anything the model free-types is blocked, not printed.

What this costs you

No evidence this build handles numeric source-tracing. Every number in the output traces to a source cell or a value the code computed is not enforced, so it can fail silently in production.

CriticalFix effort: ~30 min in Claude

How to fix it

Every number in the output traces to a source cell or a value the code computed. Anything the model free-types is blocked, not printed.

Paste this to Claude

You are hardening a production AI automation. Implement the "Numeric Source-Tracing" safeguard described below. Goal: Every number in the output traces to a source cell or a value the code computed. Anything the model free-types is blocked, not printed. Do this: 1. Find where the automation performs the operation this checkpoint covers. 2. Add the check so a bad, missing, or malformed value stops the run or routes to human review, instead of flowing through as a confident wrong result. 3. Handle the edge cases explicitly: empty input, partial data, and a source that changed shape since the last run. 4. Fail loud. Raise a clear error or flag the item with the offending input attached, and log it. Never coerce missing data to a default silently. Constraints: do not change unrelated behavior, keep the change minimal, and add a test that covers it. Verify: Feed an input that should trip Numeric Source-Tracing and confirm it flags or halts, then confirm a clean input still passes.

Paste that into your Claude or Cursor and it applies the fix in your code. Or we get it to green for you.

Confirm it is fixed

Feed an input that should trip this check. It should flag or halt, not pass clean.

What this means

The math runs in code, so the same deal always produces the same number.

The standard

All math runs in code, with the model only describing the result. The same input computes the same number every time.

Why this passes

This one holds. Arithmetic in Code, Not Prose is handled the way the standard requires.

Keep it that way

Keep it that way: All math runs in code, with the model only describing the result. The same input computes the same number every time.

What this means

Before any change ships, it is tested against known-good deals so accuracy does not quietly drop.

The standard

A fixed set of labeled real documents runs on every change, and field-level accuracy must hold above a floor before anything ships.

What this costs you

No evidence this build handles regression eval set. A fixed set of labeled real documents runs on every change, and field-level accuracy must hold above a floor before anything ships is not enforced, so it can fail silently in production.

MediumFix effort: ~20 min in Claude

How to fix it

A fixed set of labeled real documents runs on every change, and field-level accuracy must hold above a floor before anything ships.

Paste this to Claude

You are hardening a production AI automation. Implement the "Regression Eval Set" safeguard described below. Goal: A fixed set of labeled real documents runs on every change, and field-level accuracy must hold above a floor before anything ships. Do this: 1. Find where the automation performs the operation this checkpoint covers. 2. Add the check so a bad, missing, or malformed value stops the run or routes to human review, instead of flowing through as a confident wrong result. 3. Handle the edge cases explicitly: empty input, partial data, and a source that changed shape since the last run. 4. Fail loud. Raise a clear error or flag the item with the offending input attached, and log it. Never coerce missing data to a default silently. Constraints: do not change unrelated behavior, keep the change minimal, and add a test that covers it. Verify: Feed an input that should trip Regression Eval Set and confirm it flags or halts, then confirm a clean input still passes.

Paste that into your Claude or Cursor and it applies the fix in your code. Or we get it to green for you.

Confirm it is fixed

Feed an input that should trip this check. It should flag or halt, not pass clean.

What this means

On a blurry scan it says 'I cannot read this' instead of confidently guessing.

The standard

Confidence is scored from real signals like cross-field checks and OCR quality, and low-confidence rows route to human review instead of passing as fact.

What this costs you

No evidence this build handles abstention on unreadable input. Confidence is scored from real signals like cross-field checks and OCR quality, and low-confidence rows route to human review instead of passing as fact is not enforced, so it can fail silently in production.

HighFix effort: ~20 min in Claude

How to fix it

Confidence is scored from real signals like cross-field checks and OCR quality, and low-confidence rows route to human review instead of passing as fact.

Paste this to Claude

You are hardening a production AI automation. Implement the "Abstention on Unreadable Input" safeguard described below. Goal: Confidence is scored from real signals like cross-field checks and OCR quality, and low-confidence rows route to human review instead of passing as fact. Do this: 1. Find where the automation performs the operation this checkpoint covers. 2. Add the check so a bad, missing, or malformed value stops the run or routes to human review, instead of flowing through as a confident wrong result. 3. Handle the edge cases explicitly: empty input, partial data, and a source that changed shape since the last run. 4. Fail loud. Raise a clear error or flag the item with the offending input attached, and log it. Never coerce missing data to a default silently. Constraints: do not change unrelated behavior, keep the change minimal, and add a test that covers it. Verify: Feed an input that should trip Abstention on Unreadable Input and confirm it flags or halts, then confirm a clean input still passes.

Paste that into your Claude or Cursor and it applies the fix in your code. Or we get it to green for you.

Confirm it is fixed

Feed an input that should trip this check. It should flag or halt, not pass clean.

What this means

Impossible numbers (140% occupancy, $4/SF office rent) get flagged, not passed downstream.

The standard

Every output is checked against plausible ranges, and impossible values are flagged, never passed downstream.

What this costs you

No evidence this build handles output range and sanity bounds. Every output is checked against plausible ranges, and impossible values are flagged, never passed downstream is not enforced, so it can fail silently in production.

MediumFix effort: ~20 min in Claude

How to fix it

Every output is checked against plausible ranges, and impossible values are flagged, never passed downstream.

Paste this to Claude

You are hardening a production AI automation. Implement the "Output Range and Sanity Bounds" safeguard described below. Goal: Every output is checked against plausible ranges, and impossible values are flagged, never passed downstream. Do this: 1. Find where the automation performs the operation this checkpoint covers. 2. Add the check so a bad, missing, or malformed value stops the run or routes to human review, instead of flowing through as a confident wrong result. 3. Handle the edge cases explicitly: empty input, partial data, and a source that changed shape since the last run. 4. Fail loud. Raise a clear error or flag the item with the offending input attached, and log it. Never coerce missing data to a default silently. Constraints: do not change unrelated behavior, keep the change minimal, and add a test that covers it. Verify: Feed an input that should trip Output Range and Sanity Bounds and confirm it flags or halts, then confirm a clean input still passes.

Paste that into your Claude or Cursor and it applies the fix in your code. Or we get it to green for you.

Confirm it is fixed

Feed an input that should trip this check. It should flag or halt, not pass clean.

Control and Approval1/2 pass

What this means

Nothing irreversible (an LOI, an offer, a tenant message) goes out without a person approving it.

The standard

Nothing irreversible (an offer, a payment, an outbound message) goes out without a logged human approval.

Why this passes

This one holds. Human Approval Gates is handled the way the standard requires.

Keep it that way

Keep it that way: Nothing irreversible (an offer, a payment, an outbound message) goes out without a logged human approval.

What this means

The AI model is pinned, so an overnight provider change does not silently shift your outputs.

The standard

The model and key dependencies are pinned to exact versions and re-validated on any change, never floating on a "latest" alias.

What this costs you

No evidence this build handles model and dependency pinning. The model and key dependencies are pinned to exact versions and re-validated on any change, never floating on a "latest" alias is not enforced, so it can fail silently in production.

MediumFix effort: ~20 min in Claude

How to fix it

The model and key dependencies are pinned to exact versions and re-validated on any change, never floating on a "latest" alias.

Paste this to Claude

You are hardening a production AI automation. Implement the "Model and Dependency Pinning" safeguard described below. Goal: The model and key dependencies are pinned to exact versions and re-validated on any change, never floating on a "latest" alias. Do this: 1. Find where the automation performs the operation this checkpoint covers. 2. Add the check so a bad, missing, or malformed value stops the run or routes to human review, instead of flowing through as a confident wrong result. 3. Handle the edge cases explicitly: empty input, partial data, and a source that changed shape since the last run. 4. Fail loud. Raise a clear error or flag the item with the offending input attached, and log it. Never coerce missing data to a default silently. Constraints: do not change unrelated behavior, keep the change minimal, and add a test that covers it. Verify: Feed an input that should trip Model and Dependency Pinning and confirm it flags or halts, then confirm a clean input still passes.

Paste that into your Claude or Cursor and it applies the fix in your code. Or we get it to green for you.

Confirm it is fixed

Feed an input that should trip this check. It should flag or halt, not pass clean.

Security and Access1/3 pass

What this means

API keys and passwords live in a secure place, never hardcoded in the code or repo.

The standard

Secrets load from environment or a managed store, never the repo, and are rotated periodically.

Why this passes

This one holds. Secret Management is handled the way the standard requires.

Keep it that way

Keep it that way: Secrets load from environment or a managed store, never the repo, and are rotated periodically.

What this means

The tool only has the access it needs, read-only where it just reads, not admin keys to everything.

The standard

Each integration is scoped to the minimum access it needs, read-only where it only reads. No single admin token reused everywhere.

What this costs you

No evidence this build handles least-privilege access. Each integration is scoped to the minimum access it needs, read-only where it only reads is not enforced, so it can fail silently in production.

MediumFix effort: ~20 min in Claude

How to fix it

Each integration is scoped to the minimum access it needs, read-only where it only reads. No single admin token reused everywhere.

Paste this to Claude

You are hardening a production AI automation. Implement the "Least-Privilege Access" safeguard described below. Goal: Each integration is scoped to the minimum access it needs, read-only where it only reads. No single admin token reused everywhere. Do this: 1. Find where the automation performs the operation this checkpoint covers. 2. Add the check so a bad, missing, or malformed value stops the run or routes to human review, instead of flowing through as a confident wrong result. 3. Handle the edge cases explicitly: empty input, partial data, and a source that changed shape since the last run. 4. Fail loud. Raise a clear error or flag the item with the offending input attached, and log it. Never coerce missing data to a default silently. Constraints: do not change unrelated behavior, keep the change minimal, and add a test that covers it. Verify: Feed an input that should trip Least-Privilege Access and confirm it flags or halts, then confirm a clean input still passes.

Paste that into your Claude or Cursor and it applies the fix in your code. Or we get it to green for you.

Confirm it is fixed

Feed an input that should trip this check. It should flag or halt, not pass clean.

What this means

Tenant SSNs and bank details are masked, and a malicious document cannot hijack the tool.

The standard

Sensitive fields are masked in logs and outbound calls, and text inside documents is treated as data that can never override the agent's instructions.

What this costs you

No evidence this build handles pii and untrusted-input handling. Sensitive fields are masked in logs and outbound calls, and text inside documents is treated as data that can never override the agent's instructions is not enforced, so it can fail silently in production.

CriticalFix effort: ~30 min in Claude

How to fix it

Sensitive fields are masked in logs and outbound calls, and text inside documents is treated as data that can never override the agent's instructions.

Paste this to Claude

You are hardening a production AI automation. Implement the "PII and Untrusted-Input Handling" safeguard described below. Goal: Sensitive fields are masked in logs and outbound calls, and text inside documents is treated as data that can never override the agent's instructions. Do this: 1. Find where the automation performs the operation this checkpoint covers. 2. Add the check so a bad, missing, or malformed value stops the run or routes to human review, instead of flowing through as a confident wrong result. 3. Handle the edge cases explicitly: empty input, partial data, and a source that changed shape since the last run. 4. Fail loud. Raise a clear error or flag the item with the offending input attached, and log it. Never coerce missing data to a default silently. Constraints: do not change unrelated behavior, keep the change minimal, and add a test that covers it. Verify: Feed an input that should trip PII and Untrusted-Input Handling and confirm it flags or halts, then confirm a clean input still passes.

Paste that into your Claude or Cursor and it applies the fix in your code. Or we get it to green for you.

Confirm it is fixed

Feed an input that should trip this check. It should flag or halt, not pass clean.

Maintainability and Cost0/2 pass

What this means

Settings live in one place a non-engineer can change, with a doc so it is not stuck in one person's head.

The standard

Business parameters live in external config a non-engineer can change, with a named owner and a runbook another person can recover the system from.

What this costs you

No evidence this build handles runbook, ownership, and config. Business parameters live in external config a non-engineer can change, with a named owner and a runbook another person can recover the system from is not enforced, so it can fail silently in production.

MediumFix effort: ~20 min in Claude

How to fix it

Business parameters live in external config a non-engineer can change, with a named owner and a runbook another person can recover the system from.

Paste this to Claude

You are hardening a production AI automation. Implement the "Runbook, Ownership, and Config" safeguard described below. Goal: Business parameters live in external config a non-engineer can change, with a named owner and a runbook another person can recover the system from. Do this: 1. Find where the automation performs the operation this checkpoint covers. 2. Add the check so a bad, missing, or malformed value stops the run or routes to human review, instead of flowing through as a confident wrong result. 3. Handle the edge cases explicitly: empty input, partial data, and a source that changed shape since the last run. 4. Fail loud. Raise a clear error or flag the item with the offending input attached, and log it. Never coerce missing data to a default silently. Constraints: do not change unrelated behavior, keep the change minimal, and add a test that covers it. Verify: Feed an input that should trip Runbook, Ownership, and Config and confirm it flags or halts, then confirm a clean input still passes.

Paste that into your Claude or Cursor and it applies the fix in your code. Or we get it to green for you.

Confirm it is fixed

Feed an input that should trip this check. It should flag or halt, not pass clean.

What this means

It is tested at real volume and has spend caps, so a runaway loop cannot burn your budget overnight.

The standard

The system is tested at real peak volume with concurrency limits, and per-run and per-day spend caps hard-stop a runaway before it burns the budget.

What this costs you

No evidence this build handles load behavior and cost ceilings. The system is tested at real peak volume with concurrency limits, and per-run and per-day spend caps hard-stop a runaway before it burns the budget is not enforced, so it can fail silently in production.

HighFix effort: ~20 min in Claude

How to fix it

The system is tested at real peak volume with concurrency limits, and per-run and per-day spend caps hard-stop a runaway before it burns the budget.

Paste this to Claude

You are hardening a production AI automation. Implement the "Load Behavior and Cost Ceilings" safeguard described below. Goal: The system is tested at real peak volume with concurrency limits, and per-run and per-day spend caps hard-stop a runaway before it burns the budget. Do this: 1. Find where the automation performs the operation this checkpoint covers. 2. Add the check so a bad, missing, or malformed value stops the run or routes to human review, instead of flowing through as a confident wrong result. 3. Handle the edge cases explicitly: empty input, partial data, and a source that changed shape since the last run. 4. Fail loud. Raise a clear error or flag the item with the offending input attached, and log it. Never coerce missing data to a default silently. Constraints: do not change unrelated behavior, keep the change minimal, and add a test that covers it. Verify: Feed an input that should trip Load Behavior and Cost Ceilings and confirm it flags or halts, then confirm a clean input still passes.

Paste that into your Claude or Cursor and it applies the fix in your code. Or we get it to green for you.

Confirm it is fixed

Feed an input that should trip this check. It should flag or halt, not pass clean.

Scan another automation