#25 · OWASP LLM Top 10 (2025)

AI / LLM Features

Prompt injection, agents & tools, RAG, and denial-of-wallet.

How to use this prompt

  1. Install SecureNow in your project (then optionally npx securenow login):
     $ npm install securenow
  2. Copy the prompt below and paste it into your AI coding agent (Claude Code, Cursor, Codex…) opened at the root of your project.
  3. It generates four files into threat/25-ai-llm-features/. Open ai-llm-features-code-findings.html (the audit) and ai-llm-features-detection-mitigation.html (the defenses) in your browser.

🔒 Runs entirely in your environment: your codebase is never uploaded or shared. The generated HTML reports are self-contained and work offline.

The prompt

# AI / LLM Features Threat Model — Generator Prompt

A **copy-paste prompt** for customers. Paste the entire prompt below into an AI coding agent
(Claude Code, Cursor, Codex, …) opened at the root of **any project** that has the
`securenow` CLI installed and logged in. The agent will inventory every place an LLM is wired
into the product (prompts, retrieval/RAG, tool & function calling, autonomous agents), build an
exhaustive **AI / LLM features** threat model mapped to the **OWASP Top 10 for LLM Applications
(2025)** (LLM01–LLM10) plus **API4:2023** (cost), **API6:2023** (abuse) and **A03:2021**
(output handling), audit the code for LLM-layer flaws, and emit a SecureNow-branded
**two-track** deliverable set — a **Detection & Mitigation** runbook and a **Code Findings &
Recommendations** audit, each in **Markdown + self-contained HTML** — including the detection
rules to create (every one a **ready-to-copy** command unit), the mitigation commands to run, how
to test each one, the code-level findings (audited, **not** fixed), and which threats still need
the SecureNow team. Every rule, flag, event name, and SQL column is **grounded in the SecureNow
SDK actually installed in this repo** (Phase 0.5) and emitted as a **ready-to-copy block**.

This is the **AI / LLM features** model — everything that becomes reachable once an LLM is wired
into the product: **prompts, retrieval (RAG), tool/function calling, and autonomous agents**. It
owns **prompt injection** (direct + indirect), **sensitive-info disclosure & system-prompt
extraction**, **LLM supply chain**, **data/model poisoning**, **improper output handling**,
**excessive agency**, **vector/embedding weaknesses**, **misinformation-driven actions**, and
**unbounded consumption / denial-of-wallet**. It **defers** (does not re-derive): AI **stream
duration / token-per-connection limits** → [../17-realtime-websocket-sse/](../17-realtime-websocket-sse/);
**raw request volume / per-IP rate** → [../12-rate-limits-and-abuse/](../12-rate-limits-and-abuse/);
**PII/secrets sent to or leaked back by the model as a data-flow concern** →
[../24-data-privacy-pii/](../24-data-privacy-pii/); and **model-output → server-side sink
injection** (SQLi/RCE/SSTI mechanics) → [../06-injection/](../06-injection/). This model
references those and detects their *LLM-specific observable abuse* only.

> **Honest posture.** SecureNow is an **API / traffic** security layer (firewall, rate-limit,
> challenge, exploit-signature instant-block, ASN enrichment, forensics, custom events). For the
> LLM domain its native coverage is **MEDIUM**: it can **detect prompt-flooding / cost abuse**
> (stream + flow-rate rules, `api.stream.limit_exceeded`), **instant-block the exploit signature**
> when **model output** flows into an HTTP request that hits a SQL/HTML/shell sink, and
> **rate-limit / challenge** abusive callers. But the *core* LLM defenses — **prompt hardening**,
> **output sanitization/encoding**, **tool & agent scoping**, **RAG hygiene & tenant isolation**,
> and **human-in-the-loop gates** — are **app fixes**. SecureNow sees the **request and the
> event**; it does not see what is inside a prompt, a retrieved document, or a tool call unless
> the app emits an event. **Every edge-containment row is paired with the app fix.**

Requirements on the customer machine: `npm i -g securenow && securenow login` (admin auth +
app runtime connected). Everything else is discovered by the agent.

---

<!-- ════════════════ COPY EVERYTHING BELOW THIS LINE ════════════════ -->

# Generate an AI / LLM Features Threat Model Report (SecureNow)

You are a senior application-security engineer specializing in AI / LLM application security.
Produce an **exhaustive AI / LLM features threat model for THIS codebase**, organized along the
**OWASP Top 10 for LLM Applications (2025)** (LLM01–LLM10), additionally tagged with
**API4:2023** (unrestricted resource consumption / cost), **API6:2023** (unrestricted access to
sensitive business flows / abuse) and **A03:2021** (improper output handling / injection), mapped
to **SecureNow** detections and mitigations, with a ready-to-run action plan **and** a code-level
audit of every LLM-layer flaw you find. You write **four** deliverables (two tracks) into
`threat/25-ai-llm-features/` (create the folder if needed):

1. `ai-llm-features-detection-mitigation.md` — the **operational runbook**: what to run in
   SecureNow (detection rules, mitigation commands, action plan, testing, response runbooks).
2. `ai-llm-features-detection-mitigation.html` — the same runbook as a **self-contained** HTML
   page (inline CSS + offline copy buttons, no network requests), using the SecureNow skeleton.
3. `ai-llm-features-code-findings.md` — the **code audit**: LLM-layer issues found in the
   codebase + recommended fixes (described, **never** applied).
4. `ai-llm-features-code-findings.html` — the same audit as a **self-contained** HTML page.

The two tracks **cross-link** each other: the gaps / instrumentation rows in the detection report
link to the relevant code finding, and the app-fix rows in the code report link back to the
detection-report row they back. Every detection is grounded in the **installed** SecureNow SDK
(Phase 0.5) and emitted as a **ready-to-copy** command unit (SQL → `rules/<name>.sql` → full
`securenow alerts rules create …` → dry-run test).

Work in the phases below, in order. **Never invent facts**: if something is not in the
codebase or not returned by a CLI command, say "not found" — do not guess. **Do not modify
application code.** You are auditing: every code-level fix is *described in the report*, never
applied to the repo.

**Scope discipline.** This model owns **LLM01–LLM10**, plus the LLM-flavored slices of **API4**
(token/cost abuse, prompt flooding, denial-of-wallet), **API6** (abuse of AI-backed business
flows) and **A03** (trusting model output into a sink). For the following, do **not** re-derive
the deep model — list them in a "Deferred to sibling models" subsection, link those reports, and
only model their **LLM-specific observable** symptoms here:

- **AI stream duration / token-per-connection / message caps** → [../17-realtime-websocket-sse/](../17-realtime-websocket-sse/)
- **Raw request volume / per-IP rate limiting** → [../12-rate-limits-and-abuse/](../12-rate-limits-and-abuse/)
- **PII/secrets as a data-flow & redaction concern (what data is sent to / leaked by the model)** → [../24-data-privacy-pii/](../24-data-privacy-pii/)
- **Server-side sink mechanics of model-output injection (SQLi/RCE/SSTI/path)** → [../06-injection/](../06-injection/)

---

## Phase 0 — Verify SecureNow tooling

Run and record (use `--json` where supported):

```bash
securenow doctor              # connectivity must be healthy
securenow whoami              # admin auth + runtime app
securenow status --json       # app key(s), environment, firewall state
securenow alerts rules --json # detection rules that already exist (incl. system signature rules)
securenow automation --json   # blocklist automations that already exist
securenow challenge list --json   # CAPTCHA / proof-of-work challenge rules
securenow env --json          # resolved SDK config (service name, endpoints)
```

If the CLI is missing or not logged in, **stop** and tell the user to run
`npm i -g securenow && securenow login`, then re-run this prompt. Capture the **app key**
(UUID) — every rule and command in the report must use it. If multiple apps exist, ask the
user which app this codebase maps to before continuing. Note the **firewall state** and any
**system signature rules** (SQLi/XSS/RCE) already present — those are the backbone of the
**model-output → sink** injection coverage (LLM05) and must not be duplicated.

---

## Phase 0.5 — Ground every rule & command in the INSTALLED SDK

Before writing any SQL or CLI, read the SecureNow SDK that is actually installed in this repo so
every alert rule and command is correct for THIS version — never guess flags, subcommands, event
names, or SQL columns:

```bash
cat node_modules/securenow/package.json    # installed SDK version (record it in both reports)
ls node_modules/securenow                  # exported modules: events, sessions, register, run, …
ls node_modules/securenow/dist 2>/dev/null # built entrypoints / bundled CLI
npx securenow --help                       # top-level commands available in this version
npx securenow alerts rules --help          # exact create flags: --name/--sql/--apps/--severity/--schedule/--nlp
npx securenow event --help                 # `event send` shape for synthetic tests
npx securenow ratelimit --help; npx securenow challenge --help
npx securenow blocklist --help; npx securenow automation --help; npx securenow trusted --help
```

If `node_modules/securenow` is absent, run `npm ls securenow`; if still missing, tell the user to
`npm i securenow` (or `npm i -g securenow`) and stop. EVERY command, flag, `track('…')` event
name, and SQL column you emit MUST be one the installed SDK/CLI actually exposes. If the installed
version lacks a capability this prompt references, emit the rule but annotate it
`# requires securenow >= <version>` instead of a broken command. Record the resolved version in
the appendix of BOTH reports.
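
The version-gating rule above could be sketched roughly as follows. This is only an illustration: `compareVersions` and `annotateIfUnsupported` are hypothetical helper names, the sketch assumes plain `major.minor.patch` version strings, and the required version for any capability must come from the installed SDK's own documentation, not from this example.

```typescript
// Hypothetical helper: compares plain "major.minor.patch" strings (no pre-release tags).
function compareVersions(a: string, b: string): number {
  const pa = a.split(".").map((n) => parseInt(n, 10) || 0);
  const pb = b.split(".").map((n) => parseInt(n, 10) || 0);
  for (let i = 0; i < 3; i++) {
    const diff = (pa[i] || 0) - (pb[i] || 0);
    if (diff !== 0) return diff < 0 ? -1 : 1;
  }
  return 0;
}

// Keeps the rule in the report, but prefixes the annotation when the installed
// version is older than the version the capability needs.
function annotateIfUnsupported(command: string, installed: string, required: string): string {
  if (compareVersions(installed, required) >= 0) return command;
  return `# requires securenow >= ${required}\n${command}`;
}
```

Comparing numerically per segment matters here: a plain string comparison would rank `1.10.0` below `1.2.0`.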

In Phase 4 and Phase 5, treat `node_modules/securenow` + `--help` as the source of truth: the
`securenow/events` `track()` signatures, the `securenow alerts rules` SQL columns, and every
mitigation subcommand are discoverable there. Cross-check before emitting.

---

## Phase 1 — Inventory the AI / LLM surface (codebase analysis)

LLM security starts with knowing **exactly what is wired to a model and what that model is
allowed to touch**. Document what is **actually present and reachable**, not what is intended.
Cover at minimum:

- **Providers & models** — every LLM/embedding/vision/speech provider in use (OpenAI, Anthropic,
  Google, Azure OpenAI, AWS Bedrock, Cohere, Mistral, self-hosted/Ollama/vLLM/HF), the exact
  model IDs, where the model name comes from (hardcoded vs. **user-selectable** — a poisoning/
  supply-chain vector), and whether any model/plugin/adapter is fetched from an **untrusted
  registry, URL, or user upload**. (LLM03.)
- **Entry points** — every route/handler/server-action/job/webhook that calls a model: chat,
  completion, summarize, classify, extract, RAG-answer, "ask your data", agent runs, background
  enrichment, embeddings indexing. Flag public vs. authenticated vs. internal. This catalog is a
  report deliverable.
- **Prompt construction** — locate every place a prompt is assembled. Record the **system
  prompt** source, how **user input** is concatenated/templated in, what **untrusted context**
  is interpolated (RAG chunks, prior turns, tool output, file/email/web content), and whether
  there is any **delimiting, escaping, or instruction/data separation** between trusted
  instructions and untrusted content. Note any **secrets, API keys, internal URLs, or business
  rules embedded directly in the system prompt**. (LLM01, LLM02, LLM07.)
- **RAG / retrieval pipeline** — vector store(s) in use (pgvector, Pinecone, Weaviate, Qdrant,
  Chroma, Milvus, Elasticsearch/OpenSearch kNN, Redis), what gets ingested (user uploads, web
  crawl, email, Slack, support tickets, public docs), the **embedding model**, and — critically —
  **how tenant/user scope is enforced on retrieval** (per-tenant namespace/collection/filter, or
  a shared index with a `WHERE tenant_id=…` that could be missing or bypassable). Note ingestion
  **trust** (is untrusted text indexed verbatim?) and any feedback loop that writes model output
  or user thumbs-up back into the store. (LLM01-indirect, LLM04, LLM08.)
- **Tools / function calling** — enumerate **every tool/function** the model can invoke (name +
  what it does + what it can reach): HTTP fetch/browse, file read/write, shell/code execution,
  SQL/db query, email/SMS send, payment/refund/transfer, calendar, internal-API calls,
  cloud-resource actions. For each, record: who authored the args (model vs. validated), the
  **permission scope** (does the tool run with the *caller's* authz or a broad service
  credential?), whether there is a **human-approval gate**, an **allowlist** of targets, and any
  **idempotency / spend cap**. (LLM05, LLM06, LLM10.)
- **Agents / autonomy** — multi-step agents, planners, auto-loops, "do-until-done" runs.
  Record max iterations/steps, whether each step's tool call is re-authorized, the **memory**
  store (shared vs. per-tenant — context-bleed risk), and whether agent actions that change
  state require confirmation. (LLM06, LLM08.)
- **Output sinks** — where does model output **go**? Rendered as HTML in a browser (→ XSS),
  concatenated into SQL/NoSQL (→ injection), passed to `exec`/shell (→ RCE), used as a URL the
  server fetches (→ SSRF), written to a file path, fed to another tool/agent, or used to make a
  **decision/action without verification** (→ misinformation-driven action). For each sink, note
  whether the output is **encoded/sanitized/schema-validated** before use. (LLM05, LLM09, A03.)
- **Streaming** — is model output streamed (SSE / WebSocket / chunked)? Note duration, token,
  and message caps per connection (deep model: see [../17-realtime-websocket-sse/](../17-realtime-websocket-sse/)).
- **Cost & quota controls** — per-user/per-tenant token budgets, request quotas, max
  prompt/completion length, max RAG chunks, max agent steps, monthly spend caps, and per-route
  rate limits on AI endpoints. Note where **none** exist (denial-of-wallet). (LLM10, API4.)
- **Sensitive AI-backed business flows** — anything a bot would farm or that costs money per
  call: "generate N images/docs", bulk summarize/translate, AI-assisted checkout/quote, free-tier
  AI credits, AI-powered refunds/decisions. (API6.)
- **Data sent to the model** — what the app puts in prompts/context: secrets, PII, other users'
  data, internal system details. Deep data-flow model: [../24-data-privacy-pii/](../24-data-privacy-pii/);
  here, note the LLM-specific exposure (secrets in system prompt, cross-tenant context).
- **Telemetry privacy & redaction** — confirm whether the SecureNow SDK / log pipeline redacts
  **prompts, completions, RAG chunks, tool args/results, API keys, and PII** before ingestion. AI
  payloads are high-risk to log. If event attributes will carry prompt/PII fragments, that is a
  high-severity finding — events must carry **hashes/labels/counts, never raw prompt text**.
- **SecureNow instrumentation already present** — `securenow/register` / `securenow run` /
  `securenow init` (gives HTTP traffic spans automatically), any `securenow/events` `track()`
  calls (look for `api.stream.*`, `api.sensitive.flow`, or any `llm.*`), and whether the firewall
  is engaged. This determines what works *today* vs. *after instrumentation*.

Output of this phase = the report's **AI / LLM surface & inventory** section: the
provider/model table, the AI entry-point catalog (route/visibility/model), the **prompt-construction
map** (system-prompt source / untrusted-context interpolation / instruction-data separation), the
**RAG pipeline** (store / ingestion trust / **tenant scoping**), the **tool & agent inventory**
(tool / reach / authz scope / human gate / spend cap), the **output-sink list** (sink / encoding
status), the **cost/quota control table**, the AI-backed sensitive-flow list, the **telemetry
redaction status**, and a short paragraph naming the real LLM attack surface for this stack.

---

## Phase 2 — Enumerate threats (exhaustive catalog)

Evaluate **every** threat below against the discovered surface. Each item is either **modeled**
(a row in the threat matrix) or **explicitly N/A** (one line in an "Out of scope" subsection
with the reason — e.g. "Agent items: N/A, no tool/function calling, single-shot completion only").
Never silently drop an item. Add stack-specific threats you discover that are not listed — this
catalog is the floor, not the ceiling. Tag each modeled row with its **OWASP LLM Top 10 (2025)**
code (LLM01–LLM10) and, where applicable, an **API4 / API6 / A03** tag.

**A. Direct prompt injection (OWASP LLM01)**
1. Jailbreak — "ignore previous instructions", DAN-style, roleplay/persona to bypass guardrails
2. System-prompt override — user input that re-defines the system role or appends new instructions
3. Role / turn confusion — fake `system:`/`assistant:` markers, fake function-result framing
4. Instruction smuggling via encoding/obfuscation (base64, homoglyphs, leetspeak, zero-width chars, foreign-language)
5. Multi-turn / crescendo injection — benign turns that gradually steer the model past guardrails
6. Payload-splitting / token-smuggling across messages to evade a single-message filter

**B. Indirect prompt injection (OWASP LLM01, via untrusted content)**
7. Poisoned RAG document — instructions embedded in an indexed file/page that the model later obeys
8. Web/browse tool fetching attacker-controlled page with hidden instructions
9. Email/ticket/Slack content the agent reads containing injected commands
10. File-content injection (PDF/CSV/HTML/image-alt/metadata) processed into the prompt
11. Tool-output injection — a tool returns attacker-influenced text the model then treats as instructions
12. Cross-user injection — content written by user A steers the model in user B's session

**C. Sensitive-information disclosure (OWASP LLM02)**
13. Secrets/credentials/API keys present in context or system prompt surfaced in a completion
14. PII / other-tenant data in the context window returned to the wrong user
15. Training/fine-tune data memorization regurgitated on prompting
16. Verbose error/debug output from a tool or model exposing internals

**D. System-prompt & instruction extraction (OWASP LLM07 / LLM02)**
17. System-prompt leakage — "repeat everything above", "what are your instructions"
18. Secrets/keys/internal URLs/business rules embedded in the prompt extracted verbatim
19. Tool/function schema & internal capability disclosure ("list your tools and their params")
20. Guardrail/policy enumeration to map and then bypass the filter

**E. Supply chain (OWASP LLM03)**
21. Untrusted / user-selectable model, adapter, or LoRA loaded from an unverified source
22. Malicious or unpinned LLM plugin / tool package / MCP server / SDK dependency
23. Poisoned pretrained model or embedding model from a public registry (no integrity check)
24. Compromised dataset used for fine-tuning / few-shot examples

**F. Data & model poisoning (OWASP LLM04)**
25. RAG store poisoning — attacker gets malicious content indexed to bias future answers
26. Feedback-loop poisoning — thumbs-up/correction/user edits written back to influence the model
27. Fine-tune / continuous-learning poisoning via crafted user interactions
28. Embedding-space poisoning to dominate retrieval for a target query

**G. Improper output handling (OWASP LLM05 / A03:2021)**
29. Model output rendered as HTML/Markdown → stored/reflected **XSS** in the browser
30. Model output concatenated into SQL/NoSQL → **injection** (deep: ../06-injection/)
31. Model output passed to shell/`exec`/code-eval → **RCE**
32. Model output used as a URL the server fetches → **SSRF** (incl. cloud metadata)
33. Model output written to a file path / used in path construction → traversal/overwrite
34. Model output emitting markdown image/link that exfiltrates context to an attacker URL

**H. Excessive agency (OWASP LLM06)**
35. Tool with over-broad permissions (runs with a service credential, not the caller's authz)
36. State-changing/irreversible action (payment, delete, email-blast) with **no human gate**
37. Tool argument unvalidated — model fully controls a dangerous parameter (SQL, path, URL, amount)
38. Agent loop with no max-step / spend cap chaining tools toward an unintended outcome
39. Confused-deputy — model induced (via injection) to call a privileged tool on the attacker's behalf

**I. Vector / embedding weaknesses (OWASP LLM08)**
40. Cross-tenant retrieval — shared index with missing/bypassable tenant filter returns other tenants' chunks
41. Retrieval authorization gap — user retrieves documents they cannot read in the app
42. Embedding inversion / membership inference — reconstruct source text or test presence of a record
43. Similarity/threshold abuse to surface adjacent sensitive records

**J. Misinformation & hallucination-driven action (OWASP LLM09)**
44. Hallucinated fact/citation acted on without verification (wrong refund, wrong record)
45. Model output auto-triggers a side effect (DB write, ticket, email) with no human/schema check
46. Package-hallucination ("slopsquatting") — model suggests a non-existent dependency that an attacker registers
47. Over-reliance — UI presents AI output as authoritative with no provenance/confidence

**K. Unbounded consumption / denial-of-wallet (OWASP LLM10 / API4)**
48. Prompt flooding — high request rate to an AI endpoint from one IP (token-cost amplification)
49. Distributed AI flood (many IPs / one ASN) hitting paid completions
50. Oversized prompt / max-context stuffing per request (per-call cost blow-up)
51. Long/looping agent run consuming unbounded tokens/steps (denial-of-wallet)
52. Unbounded RAG fan-out (huge top-k / chunk count) per query
53. Free-tier / AI-credit farming — signup-and-burn across many accounts
54. Recursive/self-amplifying generation (model output fed back as input)

**L. Tool/function-call abuse (OWASP LLM05+LLM06, observable as outbound/sink traffic)**
55. SSRF via an agent HTTP/browse tool (user-or-injection-controlled URL → internal/metadata)
56. File access via a file tool (read secrets, write webshell, traverse paths)
57. Code/command execution via a code-interpreter/shell tool
58. Money movement via a payment/refund/transfer tool driven by model output
59. Internal-API / cloud-resource action via an over-scoped tool credential

**M. Cross-tenant context bleed**
60. Shared agent memory / conversation store leaking one tenant's context into another's session
61. Prompt-cache / KV-cache or response cache keyed without tenant/user → cross-user answer
62. Shared few-shot/example pool containing one tenant's data shown to another

**N. Negative-space & evasion**
63. Encoding/normalization bypass of an input filter (unicode, base64, whitespace, markdown)
64. Multi-language / translation laundering to slip past an English-only guardrail
65. Output-channel evasion — exfil via markdown image, code block, or tool-call args
66. Client-IP header spoofing (`X-Forwarded-For`) to evade per-IP AI rate/cost limits
67. Direct-to-provider bypass — client holds the provider key and calls the model API directly (no app guardrail)

**O. Observable abuse (what telemetry actually catches — the workhorse rules)**
68. Spike in AI-endpoint request rate from one IP / ASN (flood, cost abuse) — traffic + `api.stream.started`
69. Stream/token/step cap repeatedly hit by one client (`api.stream.limit_exceeded`)
70. Exploit-signature match in a request carrying **model output flowing to a sink** (SQLi/XSS/RCE) → instant block
71. 4xx/5xx spike on AI routes from one IP (jailbreak probing, fuzzing the prompt API)
72. Anomalous count of AI/sensitive-flow executions per client (credit farming, scraping via AI)
73. Prompt-injection / guardrail-trip events emitted by the app (`llm.injection.suspected`)

**P. Deferred — modeled in sibling models (reference, do not re-derive)**
74. AI stream duration / token-per-connection / message caps (**API4**) → [../17-realtime-websocket-sse/](../17-realtime-websocket-sse/)
75. Raw per-IP request volume / generic rate limiting (**API4/API6**) → [../12-rate-limits-and-abuse/](../12-rate-limits-and-abuse/)
76. PII/secrets data-flow & redaction (what data reaches / leaks from the model) (**LLM02**) → [../24-data-privacy-pii/](../24-data-privacy-pii/)
77. Server-side sink mechanics of model-output injection (SQLi/RCE/SSTI/path) (**A03**) → [../06-injection/](../06-injection/)

> For 74–77, add **one** matrix row each marked *"deferred — see linked model"*, and only note
> the LLM-specific observable symptom here (e.g. stream-cap hits, AI-route flood, an injection
> signature that fires because *model output* was the payload). The full detection/mitigation
> lives in the sibling reports.

> Items in groups A–N are mostly **LLM01–LLM10** (tag each row); cost rows also carry **API4**,
> AI-flow-abuse rows **API6**, output-handling rows **A03**. Group O rows are the
> traffic/event-observable detections SecureNow runs. Each catalog item must be either a matrix
> row or an explicit N/A line with a reason — do not silently drop any item.
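
The "no silently dropped item" rule can be checked mechanically before the report is written. A minimal sketch, assuming the agent keeps one disposition record per catalog item; the `Disposition` shape and `findDroppedItems` are hypothetical names introduced only for this illustration:

```typescript
// One record per catalog item: either a matrix row or an N/A line with a reason.
type Disposition =
  | { id: number; status: "modeled"; tags: string[] }
  | { id: number; status: "n/a"; reason: string };

// Returns the catalog ids (1..catalogSize) that have no disposition,
// or that are marked N/A without a non-empty reason.
function findDroppedItems(catalogSize: number, dispositions: Disposition[]): number[] {
  const seen: Record<number, boolean> = {};
  for (const d of dispositions) {
    const ok = d.status === "modeled" || d.reason.trim().length > 0;
    if (ok) seen[d.id] = true;
  }
  const missing: number[] = [];
  for (let id = 1; id <= catalogSize; id++) {
    if (!seen[id]) missing.push(id);
  }
  return missing;
}
```

For the catalog above, `catalogSize` would be 77; an empty result means every item is accounted for.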

---

## Phase 3 — Audit the code (findings only — do not fix)

For **each** modeled threat that maps to real code, locate the responsible code and record a
**finding** for the report's "Code-level findings" section. A finding is:

- **Location** — `file:line` (clickable), the route/handler/tool/resolver/prompt-builder name.
- **Pattern** — quote the 1–8 relevant lines. State the missing control precisely (e.g. "user
  input concatenated into the system prompt with no delimiter/instruction-data separation";
  "RAG query has no `tenant_id` filter → cross-tenant retrieval"; "`fetch(toolArgs.url)` with no
  host allowlist → SSRF via tool"; "`dangerouslySetInnerHTML={{__html: completion}}` → XSS from
  model output"; "agent loop `while(!done)` with no max-step or spend cap"; "refund tool runs on
  a service key, not the caller's authz, with no human approval").
- **Why exploitable** — the concrete prompt/request an attacker sends and what they achieve
  (data exfil, cross-tenant read, RCE, money movement, runaway cost).
- **Severity** — critical / high / medium / low (impact × reachability).
- **Recommended fix (described, not applied)** — the specific change: e.g. "separate trusted
  instructions from untrusted content with explicit delimiters and treat retrieved/tool text as
  data, never instructions"; "scope every RAG query by tenant namespace + post-filter by the
  caller's document ACL"; "validate tool args against a strict schema and an allowlist; run tools
  with the caller's authz, never a broad service credential"; "require human approval for
  state-changing/irreversible tool actions and cap spend/steps"; "encode/sanitize model output
  before rendering (no `dangerouslySetInnerHTML`), parameterize any query, never pass output to
  shell/`eval`, validate any output-derived URL against an allowlist"; "redact secrets/PII from
  prompts and never embed secrets in the system prompt"; "cap prompt length, top-k, max tokens,
  and per-user/tenant token budget". Reference the secure pattern, not a code diff. **You must
  not edit the codebase.**

If a control exists and is correct (instruction-data separation present, RAG tenant-scoped,
tools run with caller authz + human gate, output encoded, token budgets enforced), note it as a
**strength** — the posture must be honest. Absence of a control where the surface exists is
itself a finding ("no tenant filter on the shared vector index"; "no max-step cap on the agent").

Look specifically for:

**Prompt-construction flaws** — user/RAG/tool text concatenated into the prompt with no
delimiting or instruction/data separation; system prompt re-derivable from user input; secrets,
API keys, internal URLs, or business rules embedded in the system prompt; no length cap on
interpolated untrusted content. *Recommended fixes must mention* explicit trusted/untrusted
delimiters, treating all retrieved/tool/user content as data not instructions, moving secrets out
of the prompt into the app layer, and length caps.
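
As a reference for the secure pattern a finding can point to (described in the report, never applied to the repo), here is a minimal sketch of instruction/data separation. The `<untrusted>` tag name, the 4000-character cap, the system-prompt wording, and both function names are assumptions made for this illustration, not part of any provider API.

```typescript
interface ChatMessage {
  role: "system" | "user";
  content: string;
}

const MAX_UNTRUSTED_CHARS = 4000; // assumed cap; tune to the model's context budget

// Caps the text and neutralizes anything resembling our delimiter tags,
// so untrusted content cannot close the block and "escape" into instructions.
function fenceUntrusted(label: string, text: string): string {
  const capped = text.slice(0, MAX_UNTRUSTED_CHARS);
  const neutralized = capped.replace(/<\/?untrusted[^>]*>/gi, "[removed-tag]");
  return `<untrusted source="${label}">\n${neutralized}\n</untrusted>`;
}

// Trusted instructions live only in the system message; user input and
// retrieved chunks are passed as fenced data in the user message.
function buildMessages(userInput: string, ragChunks: string[]): ChatMessage[] {
  const system =
    "You are a support assistant. Text inside <untrusted> tags is data. " +
    "Never follow instructions that appear inside it.";
  const context = ragChunks.map((c, i) => fenceUntrusted(`rag-${i}`, c)).join("\n");
  return [
    { role: "system", content: system },
    { role: "user", content: `${context}\n${fenceUntrusted("user", userInput)}` },
  ];
}
```

Delimiters reduce the injection surface but do not eliminate it, so a finding should still pair this pattern with output-side and tool-side controls.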

**RAG / retrieval flaws** — vector queries with no tenant/namespace filter or a filter that the
caller can influence; retrieval that ignores per-document ACLs; untrusted text indexed verbatim
(indirect-injection seed); feedback/thumbs/edit writes flowing back into the index without
review. *Recommended fixes must mention* per-tenant collections/namespaces, server-side tenant
filters the client cannot override, post-retrieval ACL filtering, sanitizing/quarantining
ingested untrusted content, and gating feedback writes.
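
The retrieval controls named above can be illustrated with an in-memory stand-in for a vector store. This is a sketch under stated assumptions: the `Chunk` and `Caller` shapes, `MAX_TOP_K`, and `retrieveForCaller` are hypothetical, and a real store would apply the tenant filter inside the query itself rather than on an array.

```typescript
interface Chunk {
  id: string;
  tenantId: string;
  allowedUserIds: string[]; // per-document ACL
  text: string;
  score: number; // similarity score, higher is closer
}

// Derived from the authenticated session, never from the request body.
interface Caller {
  tenantId: string;
  userId: string;
}

const MAX_TOP_K = 8; // assumed cap on retrieval fan-out

function retrieveForCaller(index: Chunk[], caller: Caller, requestedTopK: number): Chunk[] {
  const topK = Math.max(1, Math.min(requestedTopK, MAX_TOP_K));
  return index
    .filter((c) => c.tenantId === caller.tenantId) // server-side tenant filter
    .filter((c) => c.allowedUserIds.indexOf(caller.userId) !== -1) // document ACL
    .sort((a, b) => b.score - a.score)
    .slice(0, topK);
}
```

The key design point is that neither the tenant id nor the top-k ceiling is something the client can override.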

**Tool / agent flaws** — tools whose args are model-controlled without schema validation or
allowlists; tools running on broad service credentials instead of the caller's authz; no
human-approval gate on state-changing/irreversible actions; agent loops with no max-step / spend
/ time cap; tool output trusted as instructions; no per-tool rate/idempotency limit. *Recommended
fixes must mention* strict arg schemas + allowlists, least-privilege/caller-scoped tool authz,
human-in-the-loop approval for high-impact actions, max-step & spend caps, idempotency on
paid/state-changing tools, and treating tool output as data.
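
A gate in front of a state-changing tool might look like the sketch below. Everything named here is hypothetical (a `refund` tool, the `RefundArgs` shape, the caps, `gateRefundCall`); the point is the order of checks: step cap, strict argument schema, caller-scoped authorization, spend cap, then a human approval request instead of direct execution.

```typescript
interface RefundArgs {
  orderId: string;
  amountCents: number;
}

type ToolDecision =
  | { action: "needs_human_approval"; args: RefundArgs }
  | { action: "reject"; reason: string };

const MAX_AGENT_STEPS = 10; // assumed step cap per agent run
const MAX_REFUND_CENTS = 50000; // assumed spend cap per call

// `raw` is whatever the model produced; `callerOrderIds` comes from the
// caller's own authorization context, not from a broad service credential.
function gateRefundCall(raw: unknown, callerOrderIds: string[], step: number): ToolDecision {
  if (step > MAX_AGENT_STEPS) return { action: "reject", reason: "step cap exceeded" };
  if (typeof raw !== "object" || raw === null) return { action: "reject", reason: "args not an object" };
  const { orderId, amountCents } = raw as { orderId?: unknown; amountCents?: unknown };
  if (typeof orderId !== "string" || !/^[A-Za-z0-9_-]{1,64}$/.test(orderId)) {
    return { action: "reject", reason: "invalid orderId" };
  }
  if (typeof amountCents !== "number" || !isFinite(amountCents) ||
      Math.floor(amountCents) !== amountCents || amountCents <= 0) {
    return { action: "reject", reason: "invalid amountCents" };
  }
  if (callerOrderIds.indexOf(orderId) === -1) return { action: "reject", reason: "order not owned by caller" };
  if (amountCents > MAX_REFUND_CENTS) return { action: "reject", reason: "spend cap exceeded" };
  // State-changing action: never executed directly from model output.
  return { action: "needs_human_approval", args: { orderId, amountCents } };
}
```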

**Output-sink flaws** — model output rendered with `dangerouslySetInnerHTML` / `v-html` /
unescaped template; output concatenated into SQL/NoSQL/shell/`eval`; output used as a fetched URL
with no allowlist; output written to a file path; output auto-triggering a side effect with no
schema/human check. *Recommended fixes must mention* contextual output encoding, parameterized
queries, never passing output to a shell/interpreter, allowlisting output-derived URLs and
blocking link-local/private ranges, and schema-validating + human-gating any output-driven action
(cross-link [../06-injection/](../06-injection/) for sink mechanics).
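
Two of the output-sink controls above, HTML encoding and URL allowlisting, can be sketched as follows. The host names in `ALLOWED_HOSTS` are placeholders and both function names are hypothetical. The sketch does not cover DNS-level checks: an allowlisted name that resolves to a private address would still need a resolution-time check.

```typescript
// Encodes model output for an HTML text context before rendering.
function escapeHtml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

const ALLOWED_HOSTS = ["docs.example.com", "api.example.com"]; // hypothetical allowlist

// Accepts only https URLs whose exact hostname is allowlisted. IP literals,
// including link-local and private ranges, are rejected because they are not listed.
function isAllowedOutputUrl(candidate: string): boolean {
  let url: URL;
  try {
    url = new URL(candidate);
  } catch {
    return false; // not a parseable absolute URL
  }
  if (url.protocol !== "https:") return false;
  return ALLOWED_HOSTS.indexOf(url.hostname) !== -1;
}
```

Matching on the parsed `hostname` rather than on a substring of the raw string is what defeats look-alikes such as `docs.example.com.evil.test` or `docs.example.com@evil.test`.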

**Cost / quota flaws** — AI routes with no per-user/tenant token budget, no max prompt/completion
length, no top-k cap, no agent-step cap, no per-route rate limit, no monthly spend cap; recursive
generation with no depth limit. *Recommended fixes must mention* token budgets, max input/output
lengths, top-k and step caps, per-route rate limits/quotas, spend ceilings, and circuit breakers.
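
A per-tenant token budget with input and output caps could be sketched like this. All limits are assumed values, the in-memory `budgets` map stands in for a shared store, and `admitRequest` is a hypothetical name; the caller supplies its own prompt-token estimate.

```typescript
interface BudgetState {
  windowStart: number; // epoch ms when the current window began
  tokensUsed: number;
}

type Admission =
  | { allowed: true; maxTokens: number }
  | { allowed: false; reason: string };

const MAX_PROMPT_CHARS = 8000; // assumed input cap
const MAX_COMPLETION_TOKENS = 1024; // assumed output cap
const DAILY_TOKEN_BUDGET = 200000; // assumed per-tenant budget
const WINDOW_MS = 24 * 60 * 60 * 1000;

const budgets: Record<string, BudgetState> = {};

function admitRequest(tenantId: string, prompt: string, estimatedPromptTokens: number, now: number): Admission {
  if (prompt.length > MAX_PROMPT_CHARS) return { allowed: false, reason: "prompt too long" };
  let state = budgets[tenantId];
  if (!state || now - state.windowStart >= WINDOW_MS) {
    state = { windowStart: now, tokensUsed: 0 }; // start a fresh window
    budgets[tenantId] = state;
  }
  const remaining = DAILY_TOKEN_BUDGET - state.tokensUsed - estimatedPromptTokens;
  if (remaining <= 0) return { allowed: false, reason: "token budget exhausted" };
  const maxTokens = Math.min(MAX_COMPLETION_TOKENS, remaining);
  // Reserve the worst case up front; reconcile with real usage after the call.
  state.tokensUsed += estimatedPromptTokens + maxTokens;
  return { allowed: true, maxTokens };
}
```

Reserving the worst case before the call is a deliberate choice: it keeps concurrent requests from collectively overshooting the budget.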

**Supply-chain / model-source flaws** — user-selectable or URL-fetched models/adapters/plugins
without integrity verification; unpinned LLM SDK / plugin / MCP / embedding dependencies; datasets
fetched without checksum. *Recommended fixes must mention* pinning + integrity verification,
allowlisting model/plugin sources, vendoring/checksumming datasets, and dependency scanning in CI.
(Inventory from lockfiles/manifests — **do not run destructive updates**.)
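
Two of these checks are read-only and easy to illustrate: flagging unpinned dependency specifiers from a manifest, and refusing a user-selected model that is not allowlisted. The model ids and both function names below are placeholders invented for this sketch.

```typescript
// Exact versions only: "1.2.3" or "1.2.3-beta.1" pass; ranges ("^", "~", "*"),
// dist-tags ("latest"), URLs, and git references are flagged.
const EXACT_VERSION = /^\d+\.\d+\.\d+(-[0-9A-Za-z.-]+)?$/;

// `dependencies` is the parsed "dependencies" object of a package manifest.
function findUnpinned(dependencies: Record<string, string>): string[] {
  return Object.keys(dependencies)
    .filter((name) => !EXACT_VERSION.test(dependencies[name] || ""))
    .sort();
}

const ALLOWED_MODEL_IDS = ["example-chat-v1", "example-embed-v1"]; // hypothetical ids

// A user-supplied model id is honored only when allowlisted; otherwise the
// server-side default is used.
function resolveModel(requested: string | undefined, fallback: string): string {
  return requested !== undefined && ALLOWED_MODEL_IDS.indexOf(requested) !== -1
    ? requested
    : fallback;
}
```

An exact version in the manifest pins only the direct dependency; transitive pinning and integrity hashes still depend on the lockfile being committed and enforced.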

**Telemetry leakage** — prompts, completions, RAG chunks, or tool args/results logged or placed
in event attributes without redaction. *Recommended fixes must mention* hashing/labeling instead
of raw text, redacting secrets/PII before any log/event, and never sending prompt bodies to
telemetry.
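
The "labels and counts, never raw text" rule can be illustrated as below. The `SafeLlmEvent` shape and both function names are hypothetical, and the regexes are rough examples rather than a complete secret or PII detector. Hashing is left out to keep the sketch dependency-free. The resulting object would be passed to whatever event call the installed SDK exposes (see Phase 0.5).

```typescript
// Event attributes carry only sizes and labels, never prompt or chunk bodies.
interface SafeLlmEvent {
  route: string;
  promptChars: number;
  ragChunks: number;
  guardrail: "none" | "injection_suspected";
}

function toSafeEvent(route: string, prompt: string, ragChunks: string[], injectionSuspected: boolean): SafeLlmEvent {
  return {
    route,
    promptChars: prompt.length,
    ragChunks: ragChunks.length,
    guardrail: injectionSuspected ? "injection_suspected" : "none",
  };
}

// Best-effort masking for free-text log lines (illustrative patterns only).
function redactForLogs(text: string): string {
  return text
    .replace(/[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g, "[email]")
    .replace(/\b(?:sk|pk|key|token)[-_][A-Za-z0-9_-]{16,}/g, "[secret]");
}
```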

---

## Phase 4 — Map every modeled threat to SecureNow detection + mitigation

Classify each threat with exactly one coverage badge:

- 🟢 **COVERED** — detectable + mitigable with SecureNow today (existing rule, system signature
  rule, or a rule you provide the SQL for, on telemetry already flowing). For this domain that is
  mainly: **AI-endpoint flooding / cost abuse**, **stream/token-cap abuse**, **AI-route fuzzing
  spikes**, and **model-output-to-sink injection** caught by the **system signature rules +
  `instant.block`**. Plus rate-limit / challenge / block of the abusive caller.
- 🟡 **PARTIAL** — works after the customer adds instrumentation (`track('api.stream.*')`,
  `track('api.sensitive.flow')`, or the `llm.*` events below), or the detection is inherently
  pattern-based / false-positive-prone (notify-only), or SecureNow can only *contain the abuser
  at the edge* while the **real fix is app-level** (prompt hardening, output sanitization, tool/
  agent scoping, RAG hygiene). **Most prompt-injection and excessive-agency rows are 🟡 paired
  with the app fix.**
- 🔴 **GAP** — SecureNow cannot detect or mitigate this today (it lives entirely inside the model
  or the app and emits no traffic or event). **Still include it**: give the app-level fix, then
  add the line *"Requires SecureNow team — contact your SecureNow account contact (or
  in-dashboard support) to request support for this threat."* Collect all gaps in the report's
  "Known gaps & SecureNow feature requests" section.

> **Be brutally honest about edge-detectable vs. app fix.** SecureNow sees **traffic** and
> **events**; it contains actors via firewall / rate-limit / challenge / block / signature
> instant-block. It **cannot** read what is inside a prompt, a retrieved chunk, a tool call, or a
> completion. Direct prompt injection, system-prompt extraction, cross-tenant RAG retrieval,
> excessive agency, and improper output handling are **app fixes** — SecureNow detects the
> *abuse the app reports* (an `llm.injection.suspected` event, a stream-cap hit, an AI-route
> flood) and the *exploit signature when model output hits an HTTP sink*, and contains the
> source, but the missing control is the primary fix. **Pair the control with the app fix on
> every such row.** A flaw that emits no traffic and no event is 🔴 until the app emits the event
> in Phase 3's recommended fix.

Use **only** the SecureNow building blocks below. Never invent CLI flags, event names, or SQL
columns. **Prefer the existing taxonomy events** (`api.stream.started`,
`api.stream.limit_exceeded`, `api.sensitive.flow`) and propose `llm.*` events **only where
genuinely needed** (there is no traffic/existing event for the signal).

### 4a. Instrumentation (what AI / LLM detections feed on)

Once the app runs under `securenow run` / `securenow/register` / `securenow init`, **HTTP traffic
is captured automatically** — status codes (incl. **429**/**5xx**), methods, paths, client IPs,
response sizes. AI-endpoint flooding, fuzzing, and 5xx spikes therefore need **no events**.

The LLM-specific signals live **inside the app**, so add `securenow/events` `track()` at the
enforcement point (it **never throws**). **Reuse these existing taxonomy events first:**

```js
const { track } = require('securenow/events');

// A model/stream call opened — feeds AI-flood, cost, and per-user-rate detection (REUSE):
track('api.stream.started', { userId, ip, attributes: { route: '/api/chat', type: 'ai_stream' } });

// A model/stream call hit a cap — token/step/duration/queue (REUSE — this is the cost-abuse signal):
track('api.stream.limit_exceeded', { userId, ip, attributes: { route: '/api/chat', type: 'ai_stream', reason: 'tokens|duration|messages|queue|steps' } });

// An AI-backed sensitive/business flow executed — feeds credit-farming & abuse-rate detection (REUSE):
track('api.sensitive.flow', { userId, ip, attributes: { flow: 'ai_generate|ai_summarize|ai_refund|image_gen', items: '50' } });

// A paid model call arrived without an idempotency key and the replay guard flagged it (REUSE — denial-of-wallet via retries):
track('api.idempotency.missing', { userId, ip, attributes: { route: '/api/chat', flow: 'llm' } });
```
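
To make "at the enforcement point" concrete, here is a minimal sketch of a token cap that emits `api.stream.limit_exceeded` exactly where the cap trips. Everything except the event name and attribute keys is an assumption: `collectWithTokenCap` and the `{ text, tokens }` chunk shape are hypothetical, and `track` is passed in as a parameter so the sketch runs without the SDK installed (in the app it would be the `track` imported from `securenow/events`).

```javascript
// Hypothetical cap enforcement over already-tokenized chunks.
// `track` is injected; in the app, pass the SDK's track function.
function collectWithTokenCap({ chunks, maxTokens, track, userId, ip, route }) {
  let used = 0;
  const out = [];
  for (const chunk of chunks) {
    if (used + chunk.tokens > maxTokens) {
      // Emit the reused taxonomy event at the moment the cap is enforced.
      track('api.stream.limit_exceeded', {
        userId,
        ip,
        attributes: { route, type: 'ai_stream', reason: 'tokens' },
      });
      return { text: out.join(''), truncated: true };
    }
    used += chunk.tokens;
    out.push(chunk.text);
  }
  return { text: out.join(''), truncated: false };
}
```

Note that the attributes carry only the route and the cap reason, never the chunk text.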

**Propose `llm.*` events only for genuinely new signals** the existing taxonomy can't express —
the app's own guardrails firing. Name every such event `llm.*`, emit it at the enforcement point,
and **never put raw prompt/PII text in an attribute — hash/label/count only:**

```js
const { track } = require('securenow/events');

// The app's input guardrail flagged a likely prompt-injection / jailbreak attempt (NEW):
track('llm.injection.suspected', { userId, ip, attributes: { route: '/api/chat', source: 'user|rag|tool|web|email|file', signal: 'jailbreak|override|role_confusion|encoded|exfil_markdown', score: '0.92' } });

// The app's output filter blocked/scrubbed unsafe model output before it reached a sink (NEW):
track('llm.output.blocked', { userId, ip, attributes: { route: '/api/chat', sink: 'html|sql|shell|url|file|tool', reason: 'script_tag|sql_keyword|shell_meta|private_url|path_traversal' } });

// The app detected/blocked a cross-tenant retrieval or memory access (NEW — high-signal):
track('llm.retrieval.cross_tenant_blocked', { userId, ip, attributes: { store: 'pgvector|pinecone|qdrant', reason: 'tenant_mismatch|acl_denied' } });

// A model-requested tool call was denied by scope/allowlist/human-gate (NEW):
track('llm.tool.denied', { userId, ip, attributes: { tool: 'http_fetch|file_read|shell|sql|payment', reason: 'not_allowlisted|over_scope|needs_approval|bad_args|target_blocked' } });

// A system-prompt / instruction / tool-schema extraction attempt was detected (NEW):
track('llm.prompt.extraction_attempt', { userId, ip, attributes: { route: '/api/chat', target: 'system_prompt|tool_schema|guardrail|secret' } });

// A per-user/tenant token or spend budget was exceeded (NEW — denial-of-wallet enforcement):
track('llm.cost.budget_exceeded', { userId, ip, attributes: { route: '/api/chat', metric: 'tokens|requests|spend', limit: '100000', window: '1d' } });
```
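
The snippet below sketches where one of these events might be emitted in practice: a tool allowlist gate that reports `llm.tool.denied` when it refuses a model-requested tool. `createToolGate` and its parameters are hypothetical names for illustration, and `track` is injected so the example is self-contained (the app would pass the SDK's `track`).

```javascript
// Hypothetical tool gate: only allowlisted tools may run.
// `track` is injected; in the app, pass track from securenow/events.
function createToolGate({ allowlist, track }) {
  return function authorize({ tool, userId, ip }) {
    if (allowlist.has(tool)) return true;
    // Report the tool name and a reason label only, never the tool arguments.
    track('llm.tool.denied', {
      userId,
      ip,
      attributes: { tool, reason: 'not_allowlisted' },
    });
    return false;
  };
}
```

Usage would be along the lines of `const authorize = createToolGate({ allowlist: new Set(['http_fetch']), track })`, called before every model-requested tool invocation.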

> Hash or omit any prompt/completion/PII text before it becomes an attribute value — see the
> Phase 1 **telemetry privacy & redaction** check. Attributes feed detection; they must **never**
> become a new prompt/PII leak path. Use `score`, `reason`, `signal`, `target`, counts — not raw
> text.

Recommended AI / LLM event taxonomy — rules match these **exact strings**:

| Event | Reuse / New | Emit when |
|---|---|---|
| `api.stream.started` | reuse | a model/stream call opens (feeds AI-flood & cost detection) |
| `api.stream.limit_exceeded` | reuse | a stream hits a token/duration/messages/queue/steps cap |
| `api.sensitive.flow` | reuse | an AI-backed business flow (generate/summarize/refund) executes |
| `api.idempotency.missing` | reuse | a paid model call ran without an idempotency key |
| `llm.injection.suspected` | new | input guardrail flags a likely prompt-injection/jailbreak |
| `llm.output.blocked` | new | output filter blocks/scrubs unsafe model output before a sink |
| `llm.retrieval.cross_tenant_blocked` | new | a cross-tenant RAG/memory access is detected/blocked |
| `llm.tool.denied` | new | a model-requested tool call is denied (scope/allowlist/human-gate/args) |
| `llm.prompt.extraction_attempt` | new | a system-prompt / tool-schema / guardrail extraction attempt is detected |
| `llm.cost.budget_exceeded` | new | a per-user/tenant token or spend budget is exceeded |

Custom `attributes` become queryable as `attributes_string['<key>']` (e.g.
`attributes_string['sink']`, `attributes_string['tool']`). Ingest enriches every IP with
**ASN/org** (`client.asn`, `client.as_org`) — enabling botnet/datacenter-origin detection on AI
floods with no extra code.

### 4b. Detection rules — SQL conventions

Two query shapes. Both **must** keep the tenant scope and **must** select an `ip` column
(per-IP aggregation is what remediation/auto-block keys on). **The tenant-scope column differs
by table** — using the wrong one fails with `UNKNOWN_IDENTIFIER`:

- **logs/events** (`signoz_logs.distributed_logs_v2`) → `resources_string['service.name'] IN (__USER_APP_KEYS__)`
- **traces/HTTP** (`signoz_traces.distributed_signoz_index_v3`) → `` `resource_string_service$$name` IN (__USER_APP_KEYS__) ``

When grouping by `ip`, add `HAVING ip != '' AND …` so rows with no client IP don't aggregate
into an empty-key bucket. Traffic columns proven available: `response_status_code`, `kind`
(server span = 2), `ts_bucket_start`, `attributes_string['http.target']`, the `client_ip`
coalesce below. Confirm any other column with a `--mode dry_run` before relying on it.

**Traffic-based — AI-endpoint flood / cost abuse (single IP), no events needed** (scope the path
filter to the app's real AI routes from Phase 1):

```sql
WITH coalesce(nullIf(attributes_string['http.client_ip'], ''), nullIf(attributes_string['net.peer.ip'], ''), nullIf(attributes_string['network.peer.address'], '')) AS client_ip
SELECT client_ip AS ip,
       count() AS requests,
       uniqExact(attributes_string['http.target']) AS distinct_paths
FROM signoz_traces.distributed_signoz_index_v3
WHERE `resource_string_service$$name` IN (__USER_APP_KEYS__)
  AND timestamp >= now64(9) - INTERVAL 5 MINUTE
  AND ts_bucket_start >= toUInt64(toUnixTimestamp(now() - INTERVAL 5 MINUTE)) - 1800
  AND kind = 2
  AND (attributes_string['http.target'] LIKE '/api/chat%' OR attributes_string['http.target'] LIKE '%/ai/%' OR attributes_string['http.target'] LIKE '%/completion%' OR attributes_string['http.target'] LIKE '%/agent%')
GROUP BY ip
HAVING ip != '' AND requests >= 120
```

**Traffic-based — AI-route fuzzing / jailbreak probing (client errors on AI paths):**

```sql
WITH coalesce(nullIf(attributes_string['http.client_ip'], ''), nullIf(attributes_string['net.peer.ip'], ''), nullIf(attributes_string['network.peer.address'], '')) AS client_ip
SELECT client_ip AS ip,
       count() AS requests,
       countIf(response_status_code IN ('400','403','422','429')) AS client_errors
FROM signoz_traces.distributed_signoz_index_v3
WHERE `resource_string_service$$name` IN (__USER_APP_KEYS__)
  AND timestamp >= now64(9) - INTERVAL 15 MINUTE
  AND ts_bucket_start >= toUInt64(toUnixTimestamp(now() - INTERVAL 15 MINUTE)) - 1800
  AND kind = 2
  AND (attributes_string['http.target'] LIKE '/api/chat%' OR attributes_string['http.target'] LIKE '%/ai/%' OR attributes_string['http.target'] LIKE '%/completion%' OR attributes_string['http.target'] LIKE '%/agent%')
GROUP BY ip
HAVING ip != '' AND client_errors >= 30
```

**Events-based — stream / token / step cap repeatedly hit (LLM10 / API4; logs table):**

```sql
SELECT
  attributes_string['http.client_ip'] AS ip,
  attributes_string['route']          AS route,
  attributes_string['reason']         AS reason,
  count() AS cap_hits
FROM signoz_logs.distributed_logs_v2
WHERE resources_string['service.name'] IN (__USER_APP_KEYS__)
  AND attributes_string['event.type'] = 'api.stream.limit_exceeded'
  AND timestamp >= now() - INTERVAL 15 MINUTE
GROUP BY ip, route, reason
HAVING ip != '' AND cap_hits >= 10
```

**Events-based — prompt-injection / jailbreak attempts flagged by the app guardrail (LLM01):**

```sql
SELECT
  attributes_string['http.client_ip'] AS ip,
  attributes_string['source']         AS source,
  attributes_string['signal']         AS signal,
  count() AS attempts
FROM signoz_logs.distributed_logs_v2
WHERE resources_string['service.name'] IN (__USER_APP_KEYS__)
  AND attributes_string['event.type'] = 'llm.injection.suspected'
  AND timestamp >= now() - INTERVAL 15 MINUTE
GROUP BY ip, source, signal
HAVING ip != '' AND attempts >= 5
```

**Events-based — guardrail catch-all (cross-tenant retrieval / tool-denied / extraction / output-blocked / budget-exceeded) — any single hit surfaces:**

```sql
SELECT
  attributes_string['http.client_ip'] AS ip,
  attributes_string['event.type']     AS signal,
  count() AS attempts
FROM signoz_logs.distributed_logs_v2
WHERE resources_string['service.name'] IN (__USER_APP_KEYS__)
  AND attributes_string['event.type'] IN ('llm.retrieval.cross_tenant_blocked','llm.tool.denied','llm.prompt.extraction_attempt','llm.output.blocked','llm.cost.budget_exceeded')
  AND timestamp >= now() - INTERVAL 30 MINUTE
GROUP BY ip, signal
HAVING ip != '' AND attempts >= 1
```

**Events-based — AI-backed sensitive-flow / credit-farming abuse (LLM10 / API6):**

```sql
SELECT
  attributes_string['http.client_ip'] AS ip,
  attributes_string['flow']           AS flow,
  count() AS executions
FROM signoz_logs.distributed_logs_v2
WHERE resources_string['service.name'] IN (__USER_APP_KEYS__)
  AND attributes_string['event.type'] = 'api.sensitive.flow'
  AND timestamp >= now() - INTERVAL 15 MINUTE
GROUP BY ip, flow
HAVING ip != '' AND executions >= 50
```

Single-event `llm.*` rules follow the **same shape** — swap the `event.type` filter and threshold:
`llm.injection.suspected` (≥5/15m → injection campaign), `llm.tool.denied` (≥3 → tool-abuse
probing), `llm.retrieval.cross_tenant_blocked` (≥1 → notify immediately), `llm.output.blocked`
(≥10/15m → output-exploitation probing), `llm.prompt.extraction_attempt` (≥3 → system-prompt
recon), `llm.cost.budget_exceeded` (≥1 → denial-of-wallet, notify + consider rate-limit).
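
Since these rules differ only in event name, threshold, and window, the SQL can be templated. The sketch below is one hypothetical way to do that; `buildEventRuleSql` is an illustrative helper, not a SecureNow feature. The table, columns, and `__USER_APP_KEYS__` placeholder are copied from the queries above, and the window is left to the caller because the text states one only for some events.

```javascript
// Hypothetical rule-SQL builder for single-event llm.* rules.
// Event names are restricted to the taxonomy so nothing untrusted is interpolated.
const LLM_EVENTS = new Set([
  'llm.injection.suspected',
  'llm.output.blocked',
  'llm.retrieval.cross_tenant_blocked',
  'llm.tool.denied',
  'llm.prompt.extraction_attempt',
  'llm.cost.budget_exceeded',
]);

function buildEventRuleSql(eventType, threshold, windowMinutes) {
  if (!LLM_EVENTS.has(eventType)) throw new Error(`unknown event: ${eventType}`);
  for (const n of [threshold, windowMinutes]) {
    if (!Number.isInteger(n) || n < 1) throw new Error('threshold and window must be positive integers');
  }
  return [
    'SELECT',
    "  attributes_string['http.client_ip'] AS ip,",
    '  count() AS attempts',
    'FROM signoz_logs.distributed_logs_v2',
    "WHERE resources_string['service.name'] IN (__USER_APP_KEYS__)",
    `  AND attributes_string['event.type'] = '${eventType}'`,
    `  AND timestamp >= now() - INTERVAL ${windowMinutes} MINUTE`,
    'GROUP BY ip',
    `HAVING ip != '' AND attempts >= ${threshold}`,
  ].join('\n');
}
```

The generated text would still be saved to `rules/<name>.sql` and validated with a dry run like any hand-written rule.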

**Improper output handling (group G) — use the SecureNow system signature rules, don't write
SQL.** When **model output flows into an outbound HTTP request that carries an SQLi / XSS / RCE
payload**, that request is on the wire and the **system signature rules** with synchronous
**`instant.block`** match it (block in ~2.6s on ingest). Confirm they're present/enabled for this
app via `securenow alerts rules --json` and enable `instant.block` rather than authoring
duplicate pattern SQL. The signature stops the payload-bearing request; only **output
encoding/parameterization in the app** closes the sink (deep model: [../06-injection/](../06-injection/),
[../04-xss-csrf-cors/](../04-xss-csrf-cors/)).

Useful attributes/columns: `event.type`, `http.client_ip`, `http.target`, `response_status_code`,
`kind`, `client.asn`, `client.as_org`, and your LLM attributes (`route`, `source`, `signal`,
`sink`, `tool`, `reason`, `flow`, `metric`).

**Every detection is a ready-to-copy command unit — never a fragment.** For each rule, emit, in
this order, each as its own fenced block so it copies cleanly: ① the SQL, ② a line saving it to
`rules/<name>.sql`, ③ the full `securenow alerts rules create …` command, ④ the dry-run test.
The flags must match `securenow alerts rules --help` from **Phase 0.5** — never invent them. Save
each rule's SQL to `rules/<name>.sql` so `--sql @rules/<name>.sql` resolves. Note pre-existing /
system rules (from Phase 0) instead of duplicating them. Example unit:

```sql
-- rules/llm-ai-flood.sql
WITH coalesce(nullIf(attributes_string['http.client_ip'], ''), nullIf(attributes_string['net.peer.ip'], ''), nullIf(attributes_string['network.peer.address'], '')) AS client_ip
SELECT client_ip AS ip, count() AS requests
FROM signoz_traces.distributed_signoz_index_v3
WHERE `resource_string_service$$name` IN (__USER_APP_KEYS__)
  AND timestamp >= now64(9) - INTERVAL 5 MINUTE
  AND ts_bucket_start >= toUInt64(toUnixTimestamp(now() - INTERVAL 5 MINUTE)) - 1800
  AND kind = 2
  AND (attributes_string['http.target'] LIKE '/api/chat%' OR attributes_string['http.target'] LIKE '%/ai/%' OR attributes_string['http.target'] LIKE '%/completion%' OR attributes_string['http.target'] LIKE '%/agent%')
GROUP BY ip HAVING ip != '' AND requests >= 120
```

```bash
securenow alerts rules create \
  --name "LLM: AI-endpoint flood / cost abuse (single IP)" \
  --sql @rules/llm-ai-flood.sql \
  --apps <APP_KEY> \
  --severity high \
  --schedule "*/5 * * * *" \
  --nlp "single IP making 120+ requests to AI endpoints in 5 minutes"

securenow alerts rules test <RULE_ID> --mode dry_run --wait     # validate before it runs live
```

Reuse this exact structure for every rule in section 4b and in the Detection & Mitigation report.
Injection-class rows (group G) do **not** get their own SQL — they reference the **system
signature rules + `instant.block`** instead.

#### Test mode for false-positive-prone rules — tag every rule `test-first` or `prod-ready`

Alert rules have a lifecycle **mode**: `test` = **detect-only, NO mitigation** vs `prod` = full
(mitigation / auto-action armed) — plus a **status** (`Active | Disabled | Paused`). Manage with:

```bash
securenow alerts rules update <RULE_ID> --mode test     # detect-only: fires notifications, takes NO action
# …observe real traffic for several days; tune the threshold; add securenow fp exclusions for any FPs…
securenow alerts rules update <RULE_ID> --mode prod      # promote: arm the mitigation / auto-action
securenow alerts rules update <RULE_ID> --status Paused  # or --enable / --disable / --pause shortcuts
```

**Rule of thumb:** any detection that can **false-positive** — heuristic thresholds (AI-flood /
fuzzing / credit-farm counts), broad patterns, anomaly / volume rules, the `llm.injection.suspected`
and `llm.cost.budget_exceeded` heuristics, anything tuned to YOUR AI traffic — must ship in
**`--mode test` first**. Run it detect-only for **3–7 days of real traffic**, review what it flags,
raise/lower the threshold and add `securenow fp` exclusions for legitimate hits (power users,
internal eval/load jobs), then `--mode prod` to arm mitigation. Only **high-precision** rules
(exploit-signature SQLi/XSS/RCE matches on model-output→sink, exact-match IoCs, the high-signal
`llm.retrieval.cross_tenant_blocked` / `llm.tool.denied` / `llm.prompt.extraction_attempt` events
where any hit is actionable) may go straight to `prod`. In the report, **tag each rule `test-first`
or `prod-ready`** and say why. (`securenow alerts rules test <id> --mode dry_run --wait` is the
separate one-off *query* validation — run it before either.)

### 4c. Mitigation commands (the only allowed remediation surface)

For LLM abuse, SecureNow **contains the actor at the edge**; the **app fix** removes the
underlying weakness. For this domain the app fix is **primary on almost every row** (prompt
hardening, output sanitization, tool/agent scoping, RAG hygiene). Always pair them.

Once a threat is confirmed, **choose the narrowest effective mitigation(s) from ALL of these** and
combine them (e.g. rate-limit `/api/chat` + challenge a NAT egress + block the worst IP). Re-check
every command/flag against the installed SDK in **Phase 0.5** (`securenow <cmd> --help`); annotate
`# requires securenow >= <ver>` if absent. Scope by **app / env / route / method / IP / duration**
to avoid hitting real users (and real AI workloads).

| # | Mitigation | Command (ready-to-copy) | Use / scope |
|---|---|---|---|
| 1 | **Free firewall (network)** | `securenow firewall enable --app <APP_KEY> --env production` · `securenow run --firewall-only` · test `securenow firewall test-ip <ip> --path /api/chat --method POST` | 500k+ known-bad IPs, hourly refresh; drop scanners before they reach the AI endpoints. No app change. |
| 2 | **Exploit-signature instant block** | enable the `instant` config on the system SQLi/XSS/RCE signature rules (dashboard / MCP `securenow_alert_rule_instant_update`); custom rule → create with `--execution-mode instant` | synchronous ~2.6s block of the matching request — **model-output → sink** injection (group G: SQLi/XSS/RCE). Don't duplicate pattern SQL. |
| 3 | **IP block — global** | `securenow blocklist add <ip> --app <APP_KEY> --env production --reason "..."` | confirmed-malicious AI-abuse source, all routes. |
| 4 | **IP block — scoped to route (+ method)** | `securenow blocklist add <ip> --route /api/agent* --mode prefix --method ALL --app <APP_KEY> --env production --reason "..."` (`--mode exact\|prefix\|regex`, `--method GET\|POST\|…\|ALL`) | block an IP only on the AI/agent paths; least collateral. |
| 5 | **IP block — temporary / time-boxed** | `securenow blocklist add <ip> --duration 24h --reason "..."` (`30m`,`24h`,`7d`) · reverse `securenow blocklist unblock <id> --reason "..."` | auto-expiring containment of an AI scraper; audit-preserving unblock. |
| 6 | **Rate limit — per IP** | `securenow ratelimit add <ip> --limit 100 --window 1m --duration 24h --reason "..."` | throttle one abusive client across the app. |
| 7 | **Rate limit — per route (all clients, per-IP budget)** | `securenow ratelimit add --route /api/chat --mode prefix --method POST --limit 60 --window 1m --key-by ip` | cap an expensive AI endpoint (chat/completion/agent) for everyone — **prompt-flooding / denial-of-wallet**, budgeted per IP. |
| 8 | **Rate limit — per route + IP** | `securenow ratelimit add <ip> --route /api/chat --mode exact --method POST --limit 10 --window 1m --duration 24h` · NL `securenow ratelimit from-text "rate limit /api/chat to 10/min for 24h" --yes` · test `securenow ratelimit test <ip> --path /api/chat --method POST` | precise throttle of one client on one AI route. |
| 9 | **CAPTCHA / proof-of-work challenge** | `securenow challenge add --route /api/chat --difficulty 16 --clearance 30m` (route-wide) **or** `securenow challenge add <ip> --route /api/chat --difficulty 18 --clearance 30m` · test `securenow challenge test <ip> --path /api/chat --method POST` | bot AI-abuse / credit-farming / token-burning from **shared / NAT / CGNAT** egress — a human passes once, a script can't keep burning tokens. Prefer over a hard block when real users share the IP. |
| 10 | **Auto-block (risk-scored)** | `securenow automation defaults --yes` (≥95→7d, 90–94→72h, 85–89→24h) · custom `securenow automation create --conditions '[...]' --actions '[...]'` · preview `securenow automation dry-run <id>` | hands-off blocking by risk score; actions include block / rate_limit / requireCaptcha. |
| 11 | **Session revocation** | `securenow revoke …` (SDK `securenow/sessions` `guard()` / `isRevoked()`) | an AI session/account driven by an injection or a stolen token — kill the session, not the IP. |
| 12 | **Trusted IP (suppress)** | `securenow trusted add <ip> --label "Internal AI eval harness / partner batch / monitor"` | stop false positives from known-good infra (eval/load jobs) — suppresses detection **and** mitigation. NOT deny-by-default. |
| 13 | **Allowlist (deny-by-default)** | `securenow allowlist add <ip> --label "..." --reason "..."` ⚠️ once any entry exists, ONLY listed IPs reach the app | lockdown of an internal-only AI/admin surface. Never for a public AI app. |
| 14 | **False-positive exclusion** | `securenow fp create --conditions '[...]' --rule-scope this_rule --reason "..."` · `securenow fp mark <notification-id> <ip> --rule-scope this_rule` · preview `securenow fp dry-run --conditions '[...]'` | keep a noisy injection-heuristic / budget rule quiet without weakening it. |
| 15 | **App / config / code fix (primary for root cause)** | *described in the Code-Findings report, never auto-applied* | the actual LLM fix: instruction/data separation, treat retrieved & tool text as data, encode/parameterize output before any sink, allowlist + caller-scope tool authz, human-gate state changes, per-tenant RAG namespaces + ACL post-filter, token/step/spend budgets, move secrets out of the system prompt. SecureNow contains; the fix removes. |

**Choosing per threat** — by **confidence**: exploit-signature/exact IoC (model output carrying an
SQLi/XSS/RCE payload into a sink) → **instant-block** or block; probable AI-abuse bot on shared
egress → **challenge**; noisy/legit-mixed AI traffic (flood, credit-farming, injection-heuristic
trips) → **rate-limit (test-mode first)**; session/account compromise → **revoke**; known-good
eval/load noise → **trusted / fp**. By **blast radius**: always scope to the narrowest
`route`/`method`/`IP`/`duration` that stops the abuse; on NAT/CGNAT/shared IPs prefer
challenge/rate-limit over a hard block (a block nukes legitimate users; a challenge lets a human
through and stops the token-burning script). **On every LLM row, pair the edge mitigation with the
app/config fix (Code-Findings report)** — SecureNow can only contain the actor; prompt hardening,
output sanitization, tool/agent scoping, and RAG hygiene are the primary fix.

### 4d. Testing every detection and mitigation

Only test against apps/environments the user owns; prefer `--env local`/staging. For synthetic
source IPs use TEST-NET ranges (`192.0.2.0/24`, `198.51.100.0/24`, `203.0.113.0/24`).

```bash
# Synthetic AI cost-cap abuse — exercise the stream-cap rule end to end:
for i in $(seq 1 12); do
  securenow event send api.stream.limit_exceeded --ip 203.0.113.50 \
    --attrs route=/api/chat,type=ai_stream,reason=tokens,test=true
done

# Synthetic prompt-injection campaign — exercise the guardrail rule:
for i in $(seq 1 6); do
  securenow event send llm.injection.suspected --ip 203.0.113.51 \
    --attrs route=/api/chat,source=user,signal=jailbreak,score=0.95,test=true
done

# Synthetic cross-tenant retrieval / tool-denied (any hit = high-signal):
securenow event send llm.retrieval.cross_tenant_blocked --ip 203.0.113.52 \
  --attrs store=pgvector,reason=tenant_mismatch,test=true
securenow event send llm.tool.denied --ip 203.0.113.52 \
  --attrs tool=http_fetch,reason=not_allowlisted,test=true

# Synthetic AI-backed credit-farming:
for i in $(seq 1 55); do
  securenow event send api.sensitive.flow --ip 203.0.113.53 \
    --attrs flow=ai_generate,items=1,test=true
done

# Validate a rule query without waiting for the schedule:
securenow alerts rules test <RULE_ID> --mode dry_run --wait

# Traffic-based rules (AI flood / fuzzing) — generate spans, then check the pipeline:
securenow test-span "threat-model.llm.smoke"
securenow forensics "requests and 4xx/429 to AI routes by IP in the last hour" --env production

# Output-handling signatures — confirm a model-output-borne payload triggers the system rule +
# instant block (staging only): send a request whose body carries a benign-but-matching marker
# (e.g. a completion containing ?q=' OR '1'='1 forwarded to a sink), then verify the block fired:
securenow firewall test-ip 203.0.113.50 --app <APP_KEY> --env staging

# Mitigation verification:
securenow ratelimit test 203.0.113.50 --path /api/chat --method POST
securenow challenge test 203.0.113.50 --path /api/chat --method POST

# Confirm + clean up:
securenow notifications list --limit 10
securenow blocklist list      # then: securenow blocklist unblock <id> --reason "threat-model test"
securenow challenge list      # then: securenow challenge remove <id>
```

Every 🟢/🟡 threat row in the report must have a concrete test recipe (commands + expected
outcome: which rule fires, which notification appears, what the mitigation does).

---

## Phase 5 — Write the FOUR reports (two tracks)

Write **four** files into `threat/25-ai-llm-features/`, in **two tracks**. Track A is the
operational runbook (what to run in SecureNow); Track B is the code audit (what to fix in the
codebase). Each track is one Markdown file + one self-contained HTML file. The two tracks
**cross-link** each other: every gap / instrumentation row in the Detection report links to the
backing code finding, and every app-fix in the Code-Findings report links back to the
detection-report row it supports. The resolved installed `securenow` version (from Phase 0.5)
appears in the appendix of **both** reports.

### 5a. Detection & Mitigation report — `ai-llm-features-detection-mitigation.md` (+ `.html`)

The **operational runbook**. Sections, in order (same in `.md` and `.html`):

1. **Executive summary** — stats line (threats modeled · covered · partial · gaps · rules to
   create · mitigations), top 3 **detectable** LLM risks for this stack, installed `securenow`
   version + app key + firewall state, and a one-line OWASP LLM Top 10 (2025) coverage note
   (which LLM01–10 are owned vs. deferred, plus the API4/API6/A03 overlaps).
2. **SDK & environment** — installed SDK version (from `node_modules/securenow`), app key(s),
   environment, firewall state, existing rules / automations / challenge rules (from Phase 0), and
   the **system signature rules** present (the backbone of model-output→sink coverage).
3. **Threat → Detection → Mitigation matrix** — one row per modeled threat:
   `# | Threat | OWASP (LLMxx + API4/6/A03) | Coverage 🟢/🟡/🔴 | Detection rule | Tag (test-first/prod-ready) | Signal (threshold + window) | Schedule | Sev | Mitigation (edge + app fix)`.
   Severity ∈ {critical, high, medium, low}. Each row's **Mitigation** cell must pick **specific,
   scoped mitigation(s) from the §4c toolbox** (e.g. "ratelimit /api/chat 60/1m + app: token
   budget", "challenge /api/chat (NAT)", "instant-block system XSS signature + app: encode
   output") — never a generic "block the IP". Each row also carries a **`test-first` / `prod-ready`
   tag** (per the §4b test-mode rule): FP-prone heuristic/volume rules are `test-first`;
   high-precision signature / exact-IoC / single-hit rules are `prod-ready`. Include the deferred
   rows (74–77) pointing to the sibling models. Follow with the "Out of scope / N/A" list (one line
   + reason each). Gap / app-fix rows link to the relevant **code finding** in the Code-Findings
   report.
4. **Detection rules to create** — each as the **ready-to-copy command unit** from Phase 4
   (SQL → save to `rules/<name>.sql` → full `securenow alerts rules create …` → `--mode dry_run`
   test). **Mark each rule `test-first` or `prod-ready`** (per §4b): FP-prone rules carry the
   `--mode test` → observe (3–7 days) → tune + `securenow fp` → `--mode prod` promotion step
   inline; `prod-ready` rules note why they may arm immediately. Injection-class rows reference the
   **system signature rules + `instant.block`**, not duplicate SQL. Note rules that already exist
   (from Phase 0) instead of duplicating them.
5. **Instrumentation the detections need** — only the `track('…')` events the rules above consume
   (reuse `api.stream.*` / `api.sensitive.flow` first; `llm.*` only where genuinely new), each as
   a copyable snippet; point to the **Code-Findings report** for *where* (file:line) to add them.
   Restate the **no-raw-prompt/PII-in-attributes** rule.
6. **Mitigation mechanisms** — render the **full §4c 15-row toolbox table verbatim** (firewall ·
   exploit-signature instant-block · IP block [global / route+method / temporary] · rate-limit
   [per-IP / per-route / route+IP] · challenge · auto-block · session revocation · trusted ·
   allowlist · fp-exclusion · **app/config fix**) + the "Choosing per threat" paragraph, then a
   per-threat ready-to-copy mitigation command + reversibility, and the **scoped** selection each
   threat uses. Make explicit that the **app fix is primary** for nearly every LLM row (prompt
   hardening, output sanitization, tool/agent scoping, RAG hygiene) and the SecureNow control is
   containment of the abusing source.
7. **Action plan (copy-paste, ordered)** — ① engage the firewall + enable signature
   `instant.block`, ② add the AI/LLM event instrumentation where the app is blind (stream caps,
   guardrail trips, cross-tenant blocks, tool denials, budgets), ③ create rules — **FP-prone rules
   created in `--mode test`** (detect-only), high-precision rules `--mode prod`, ④ enable
   automations / challenge rules on AI routes, ⑤ test, ⑥ verify in dashboard, ⑦ **promote the
   test-first rules: after 3–7 days of real traffic, tune thresholds, add `securenow fp`
   exclusions, then `securenow alerts rules update <RULE_ID> --mode prod`** (explicit "promote
   after N days" step), ⑧ schedule the app/config fixes from the Code-Findings report. Real
   commands only, `<APP_KEY>` substituted.
8. **Testing & validation** — per-rule recipe (§4d): `securenow event send …` / `test-span` /
   dry-run + expected outcome + cleanup (TEST-NET IPs `192.0.2`/`198.51.100`/`203.0.113`).
9. **Response runbooks** — for each notification type (AI flood, stream-cap abuse, injection
   campaign, cross-tenant retrieval, tool-denied, extraction attempt, budget exceeded): confirm
   TP → exact respond command (copy) → exact reverse command (copy) → app-fix to schedule.
10. **Known gaps & SecureNow feature requests** — every 🔴 threat: why it's not coverable today
    (lives inside the model/app, emits no traffic/event), interim app-level fix (link to the code
    finding), and the "contact the SecureNow team" line.
11. **Appendix** — resolved SDK/CLI version (from Phase 0.5), app key, environment,
    providers/models in use, firewall state, rule IDs created, date, link to the Code-Findings report.

### 5b. Code Findings & Recommendations report — `ai-llm-features-code-findings.md` (+ `.html`)

The **code audit**. State at the top: *"Findings only — no application code was modified."*
Sections, in order (same in `.md` and `.html`):

1. **Executive summary** — findings by severity (critical / high / medium / low), top 3 **code**
   risks for this stack, one-paragraph posture verdict.
2. **AI / LLM surface & inventory** — the Phase 1 provider/model table + AI entry-point catalog +
   prompt-construction map + RAG pipeline (incl. tenant scoping) + tool/agent inventory +
   output-sink list + cost/quota table + telemetry redaction status + tenancy note.
3. **Threat catalog** — the exhaustive Phase 2 catalog (groups A–P), each item tagged with its
   OWASP LLM Top 10 (2025) code (LLM01–LLM10) + API4/API6/A03 overlap, marked **modeled** or
   explicit **N/A** with a reason. Do not silently drop any item.
4. **Code-level findings (audit)** — table `# | Location (file:line) | Threat | OWASP | Sev |
   Issue | Recommended fix`, each with the quoted 1–8 line snippet and the described fix (never
   applied). Each finding links to the detection-report row it backs (if any).
5. **Strengths** — controls already present and correct (instruction/data separation, RAG
   tenant-scoping, caller-scoped tool authz + human gate, output encoding, token budgets) — an
   honest posture.
6. **App / config fixes (primary remediation)** — the prompt-hardening / output-sanitization /
   tool-scoping / RAG-hygiene / token-budget changes that remove the root cause (described, not
   applied), each linked to the **detection-report** row it backs.
7. **Instrumentation recommendations** — the `track('…')` calls to add and the exact file:line to
   add them at the enforcement point, so the detection rules light up. Restate the
   **no-raw-prompt/PII-in-attributes** rule.
8. **Appendix** — files reviewed, resolved SDK version (from Phase 0.5), date, link to the
   Detection & Mitigation report.
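
The no-raw-prompt/PII-in-attributes rule from item 7 can be illustrated with a small sketch. It assumes a Node.js runtime; the helper name and the attribute keys (`prompt_sha256`, `prompt_chars`, `guardrail_label`) are hypothetical, and the resulting object is only an example of what could be passed to an instrumentation call, not the SDK's actual schema:

```javascript
const { createHash } = require("node:crypto");

// Hypothetical helper: reduces a raw prompt to hash/count/label attributes
// so that no prompt text ever reaches telemetry.
function safePromptAttributes(prompt, label) {
  return {
    // Stable fingerprint for grouping repeated prompts without storing them.
    prompt_sha256: createHash("sha256").update(prompt, "utf8").digest("hex"),
    // Size signal only; the content itself is dropped.
    prompt_chars: prompt.length,
    // Short, fixed-vocabulary label chosen by the enforcement point.
    guardrail_label: label,
  };
}
```

A plain hash of a short or guessable prompt can still be matched by brute force, so a keyed hash (HMAC with a server-side secret) may be preferable where that matters.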

### HTML reports — two SecureNow skeletons (self-contained; inline CSS + offline copy buttons)

Both HTML files are **self-contained**: inline CSS, an inline copy `<script>`, no external
fonts/scripts/network. They **share** the `<head>` (brand tokens + copy-button styles) and the
copy `<script>` at the end of `<body>` below — change only the `<title>`, the sidebar subtitle,
and the section content per track. **Wrap EVERY command/SQL/code block in a `.cmd`** so it gets a
working Copy button. Use this skeleton and these exact brand tokens (extend content, do not
restyle):

```html
<!DOCTYPE html>
<html lang="en"><head>
<meta charset="utf-8" /><meta name="viewport" content="width=device-width, initial-scale=1" />
<title><!-- Track A: "Detection & Mitigation — AI / LLM Features — SecureNow"
            Track B: "Code Findings — AI / LLM Features — SecureNow" --></title>
<style>
  :root{--bg:#0f1419;--panel:#161c24;--panel2:#1b2330;--border:#26303d;--txt:#dbe3ec;--muted:#8b97a7;
    --accent:#3ea6ff;--accent2:#16c79a;--crit:#ff5c6c;--high:#ff9f43;--med:#f7c948;--low:#8b97a7;
    --ok:#16c79a;--info:#3ea6ff;--rev:#b388ff;}
  *{box-sizing:border-box}html{scroll-behavior:smooth}
  body{margin:0;background:var(--bg);color:var(--txt);font:15px/1.6 -apple-system,BlinkMacSystemFont,"Segoe UI",Roboto,Helvetica,Arial,sans-serif}
  a{color:var(--accent);text-decoration:none}
  code{background:#0b0f14;border:1px solid var(--border);border-radius:5px;padding:.08em .4em;font:13px/1.4 ui-monospace,"SF Mono",Menlo,Consolas,monospace;color:#9fe0c0}
  .wrap{display:grid;grid-template-columns:240px 1fr;max-width:1280px;margin:0 auto}
  nav{position:sticky;top:0;align-self:start;height:100vh;overflow:auto;padding:28px 18px;border-right:1px solid var(--border);background:var(--panel)}
  nav .brand{font-weight:700;font-size:15px;letter-spacing:.3px}nav .brand span{color:var(--accent)}
  nav .sub{color:var(--muted);font-size:12px;margin-bottom:22px}
  nav a{display:block;color:var(--muted);padding:7px 10px;border-radius:7px;font-size:13.5px}
  nav a:hover{background:var(--panel2);color:var(--txt)}
  main{padding:36px 40px 80px;min-width:0}
  header.top h1{margin:0 0 6px;font-size:26px}header.top p{margin:0;color:var(--muted)}
  .pill{display:inline-block;font-size:11px;font-weight:600;padding:3px 9px;border-radius:999px;border:1px solid var(--border);color:var(--muted);background:var(--panel)}
  .stats{display:grid;grid-template-columns:repeat(5,1fr);gap:14px;margin:26px 0 34px}
  .stat{background:var(--panel);border:1px solid var(--border);border-radius:12px;padding:16px 18px}
  .stat .n{font-size:26px;font-weight:700}.stat .l{color:var(--muted);font-size:12.5px;margin-top:2px}
  section{margin:0 0 40px}
  h2{font-size:18px;margin:0 0 14px;padding-bottom:8px;border-bottom:1px solid var(--border)}
  h2 .num{color:var(--accent);font-weight:700;margin-right:8px}
  table{width:100%;border-collapse:collapse;font-size:13.5px;background:var(--panel);border:1px solid var(--border);border-radius:12px;overflow:hidden}
  th,td{text-align:left;padding:11px 13px;border-bottom:1px solid var(--border);vertical-align:top}
  th{background:var(--panel2);color:var(--muted);font-weight:600;font-size:12px;text-transform:uppercase;letter-spacing:.4px}
  tr:last-child td{border-bottom:none}tr:hover td{background:#19212c}
  .rid{font:12px ui-monospace,Menlo,Consolas,monospace;color:#7fd1ff;white-space:nowrap}
  .b{display:inline-block;font-size:11px;font-weight:700;padding:2px 8px;border-radius:6px;white-space:nowrap}
  .b.crit{background:rgba(255,92,108,.15);color:var(--crit);border:1px solid rgba(255,92,108,.35)}
  .b.high{background:rgba(255,159,67,.13);color:var(--high);border:1px solid rgba(255,159,67,.32)}
  .b.med{background:rgba(247,201,72,.13);color:var(--med);border:1px solid rgba(247,201,72,.32)}
  .b.low{background:rgba(139,151,167,.13);color:var(--low);border:1px solid rgba(139,151,167,.32)}
  .c{display:inline-block;font-size:11px;font-weight:700;padding:2px 8px;border-radius:6px;white-space:nowrap}
  .c.cov{background:rgba(22,199,154,.13);color:var(--ok);border:1px solid rgba(22,199,154,.35)}
  .c.part{background:rgba(247,201,72,.13);color:var(--med);border:1px solid rgba(247,201,72,.32)}
  .c.gap{background:rgba(255,92,108,.15);color:var(--crit);border:1px solid rgba(255,92,108,.35)}
  .owasp,.cwe{display:inline-block;font:11px ui-monospace,Menlo,Consolas,monospace;color:var(--accent);border:1px solid rgba(62,166,255,.3);border-radius:6px;padding:1px 6px;white-space:nowrap}
  .cwe{color:var(--rev);border-color:rgba(179,136,255,.3)}
  .m{display:inline-block;font-size:11px;font-weight:600;padding:2px 8px;border-radius:6px;border:1px solid var(--border)}
  .m.block{color:var(--crit);border-color:rgba(255,92,108,.35)}.m.rate{color:var(--info);border-color:rgba(62,166,255,.35)}
  .m.challenge{color:var(--accent2);border-color:rgba(22,199,154,.35)}.m.firewall{color:var(--ok);border-color:rgba(22,199,154,.35)}
  .m.signature{color:var(--crit);border-color:rgba(255,92,108,.35)}.m.notify{color:var(--muted)}.m.appfix{color:var(--high);border-color:rgba(255,159,67,.35)}
  .card{background:var(--panel);border:1px solid var(--border);border-radius:12px;padding:18px 20px}
  .grid2{display:grid;grid-template-columns:1fr 1fr;gap:16px}
  pre{background:#0b0f14;border:1px solid var(--border);border-radius:10px;padding:14px 16px;overflow:auto;font:13px ui-monospace,Menlo,Consolas,monospace;color:#cfe8da;margin:0}
  .cmd{position:relative;margin:10px 0}
  .copy{position:absolute;top:8px;right:8px;font:11px ui-monospace,Menlo,Consolas,monospace;color:var(--muted);background:var(--panel2);border:1px solid var(--border);border-radius:6px;padding:3px 9px;cursor:pointer}
  .copy:hover{color:var(--txt);border-color:var(--accent)}.copy.done{color:var(--ok);border-color:var(--ok)}
  .flow{display:flex;flex-wrap:wrap;align-items:center;gap:8px;margin:6px 0 14px}
  .flow .step{background:var(--panel2);border:1px solid var(--border);border-radius:9px;padding:8px 12px;font-size:13px}.flow .arr{color:var(--accent);font-weight:700}
  .note{border-left:3px solid var(--high);background:rgba(255,159,67,.06);padding:10px 14px;border-radius:0 8px 8px 0;color:#e7d3bd;font-size:13.5px;margin:10px 0}
  footer{color:var(--muted);font-size:12px;border-top:1px solid var(--border);padding-top:18px;margin-top:30px}
  @media(max-width:880px){.wrap{grid-template-columns:1fr}nav{display:none}.stats,.grid2{grid-template-columns:1fr 1fr}main{padding:24px 18px}}
</style></head>
<body>
<div class="wrap">
  <nav>
    <div class="brand">Secure<span>Now</span></div>
    <div class="sub"><!-- Track A: "Detection & Mitigation · AI / LLM Features"
                          Track B: "Code Findings · AI / LLM Features" --></div>
    <!-- one <a href="#…"> per section of THIS track (5a = 11 sections, 5b = 8 sections) -->
  </nav>
  <main>
    <header class="top"><h1><!-- report title for THIS track --></h1>
      <p><code><!-- app name / domain --></code> · <span class="pill">securenow <!-- installed version from Phase 0.5 --></span></p></header>
    <div class="stats"><!-- 5 .stat cards; numbers MUST equal the table/finding counts of THIS track --></div>
    <!-- <section id="…"> blocks mirroring the Markdown sections of THIS track (5a or 5b) -->
    <footer>Generated by the SecureNow AI / LLM features threat-model prompt · <!-- date --> · securenow <!-- version --> · app <code><!-- APP_KEY --></code></footer>
  </main>
</div>
<script>
document.querySelectorAll('.copy').forEach(function(b){b.addEventListener('click',function(){
  var pre=b.parentElement.querySelector('pre'); if(!pre)return; var t=pre.innerText;
  function done(){b.textContent='Copied';b.classList.add('done');setTimeout(function(){b.textContent='Copy';b.classList.remove('done');},1500);}
  function fb(){var ta=document.createElement('textarea');ta.value=t;ta.style.position='fixed';ta.style.opacity='0';document.body.appendChild(ta);ta.focus();ta.select();var ok=false;try{ok=document.execCommand('copy');}catch(e){}document.body.removeChild(ta);if(ok)done();}
  if(navigator.clipboard&&navigator.clipboard.writeText){navigator.clipboard.writeText(t).then(done,fb);}else{fb();}
});});
</script>
</body></html>
```

**Track A (Detection & Mitigation) stats cards** — five cards whose numbers equal the matrix /
rule counts: `threats modeled` · `covered` (color `--ok`) · `partial` (`--med`) · `gaps —
SecureNow team` (`--crit`) · `rules to create` (`--high`). **Track B (Code Findings) stats
cards** — `findings total` · `critical` (`--crit`) · `high` (`--high`) · `medium` (`--med`) ·
`low` (`--low`); numbers equal the code-findings table row counts.
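
One way to keep the Track A cards equal to the matrix is to derive them from the same rows that render the table. A minimal sketch, assuming a hypothetical row shape of `{ coverage: "cov" | "part" | "gap", rules: [...] }`; neither the shape nor the function name comes from the SecureNow tooling:

```javascript
// Hypothetical helper: computes the five Track A stat-card numbers from matrix rows.
// Each row is assumed to look like { coverage: "cov" | "part" | "gap", rules: ["name", ...] }.
function trackAStats(rows) {
  const countCoverage = (value) => rows.filter((r) => r.coverage === value).length;
  // A rule shared by several threats is created once, so count distinct names.
  const ruleNames = new Set(rows.flatMap((r) => r.rules || []));
  return {
    threatsModeled: rows.length,
    covered: countCoverage("cov"),
    partial: countCoverage("part"),
    gaps: countCoverage("gap"),
    rulesToCreate: ruleNames.size,
  };
}
```

Deriving the numbers this way means `covered + partial + gaps` equals `threatsModeled` whenever every row carries one of the three coverage values.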

**Every SQL/command block in the Detection & Mitigation HTML uses the copyable wrapper** (the
`.cmd` div with a `<button class="copy">` and the `<pre>`):

```html
<div class="cmd"><button class="copy" type="button">Copy</button><pre>securenow alerts rules create \
  --name "..." --sql @rules/&lt;name&gt;.sql --apps &lt;APP_KEY&gt; --severity high \
  --schedule "*/5 * * * *" --nlp "..."</pre></div>
```

The Code-Findings HTML may omit copy buttons on prose, but still wraps any example/fix command or
quoted code snippet in `.cmd`.
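
When the wrapper is generated by a script rather than written by hand, placeholders such as `<APP_KEY>` have to be HTML-escaped, as the example above does with `&lt;` and `&gt;`. A minimal sketch of such a generator; `escapeHtml` and `cmdBlock` are hypothetical helper names:

```javascript
// Escapes the characters that would otherwise be parsed as markup inside <pre>.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, "&amp;") // must run first so later entities are not double-escaped
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// Hypothetical helper: wraps a command, SQL, or code string in the copyable `.cmd` markup.
function cmdBlock(command) {
  return (
    '<div class="cmd"><button class="copy" type="button">Copy</button><pre>' +
    escapeHtml(command) +
    "</pre></div>"
  );
}
```

The copy script reads `pre.innerText`, which returns the decoded text, so the clipboard receives the original `<APP_KEY>` placeholder rather than the entities.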

Badge usage: severity → `<span class="b crit|high|med|low">`, coverage →
`<span class="c cov|part|gap">COVERED|PARTIAL|GAP</span>`, OWASP tag →
`<span class="owasp">LLM01</span>` (and `<span class="owasp">API4</span>` / `A03` where it also
applies), CWE → `<span class="cwe">CWE-79</span>`, mitigation type →
`<span class="m firewall|signature|rate|challenge|block|notify|appfix">`, rule IDs →
`<span class="rid">`. Wrap every SQL/command/code block in a `.cmd` (so it gets a Copy button).
Stats numbers must equal the matrix / findings row counts of that track.
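
The badge classes above form a small closed vocabulary, so a generator can reject typos instead of emitting an unstyled span. A minimal sketch; the `BADGES` table mirrors the class names in the skeleton's CSS, while the `badge` function itself is a hypothetical helper:

```javascript
// Class vocabulary taken from the skeleton's stylesheet (.b, .c, .m and their modifiers).
const BADGES = {
  severity: { cls: "b", values: ["crit", "high", "med", "low"] },
  coverage: { cls: "c", values: ["cov", "part", "gap"] },
  mitigation: {
    cls: "m",
    values: ["firewall", "signature", "rate", "challenge", "block", "notify", "appfix"],
  },
};

// Hypothetical helper: returns the badge span, or throws on an unknown kind or value.
// `label` is expected to be a fixed, trusted string such as "GAP"; it is not escaped here.
function badge(kind, value, label) {
  const spec = BADGES[kind];
  if (!spec || !spec.values.includes(value)) {
    throw new Error("unknown badge: " + kind + "/" + value);
  }
  return '<span class="' + spec.cls + " " + value + '">' + label + "</span>";
}
```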

---

## Quality bar (the report is rejected if any of these fail)

- Every catalog item A1–N67 is either a matrix row or an explicit N/A line; each modeled row
  carries its OWASP LLM Top 10 (2025) tag (LLM01–LLM10) plus the API4/API6/A03 overlap where it
  applies.
- The full domain emphasis is covered: **LLM01** direct *and* indirect prompt injection;
  **LLM02** sensitive-info disclosure + system-prompt extraction; **LLM03** supply chain;
  **LLM04** data/model poisoning; **LLM05** improper output handling (XSS/SQLi/SSRF/RCE/path from
  trusting output); **LLM06** excessive agency (tools/agents over-broad or no human gate);
  **LLM07** system-prompt leakage / embedded secrets; **LLM08** vector/embedding weaknesses
  (cross-tenant retrieval, inversion); **LLM09** misinformation-driven action; **LLM10** unbounded
  consumption / denial-of-wallet; plus **tool/function-call abuse** (SSRF/file/exec/payment) and
  **cross-tenant context bleed**.
- The sibling models are **deferred** (rows 74–77 present, linked, not re-derived): AI stream
  limits → ../17-realtime-websocket-sse/, request volume → ../12-rate-limits-and-abuse/, data
  to/from the model → ../24-data-privacy-pii/, model-output→sink mechanics → ../06-injection/.
- Every matrix row has a concrete signal (threshold + window), severity, and mitigation — no
  "monitor for suspicious activity" filler.
- Coverage badges are honest: native coverage is **MED** — flood/cost (stream + flow-rate,
  `api.stream.limit_exceeded`), output→sink injection via system signature `instant.block`, and
  rate-limit/challenge of abusers are SecureNow's contribution; prompt hardening, output
  sanitization, tool/agent scoping, and RAG hygiene are **app fixes** and every such row **pairs
  edge-containment with the app fix**.
- The Detection report's mitigation section presents the **full toolbox** (§4c: firewall ·
  instant-block · block [global / route / method / temporary] · rate-limit [IP / route / IP+route] ·
  challenge · auto-block · revoke · trusted · allowlist · fp · app-fix), and **each modeled threat's
  matrix row selects specific, scoped mitigation(s) from it** — never a generic "block the IP."
- **Every false-positive-prone rule is tagged `test-first`** and carries the `--mode test` → observe
  (3–7 days) → `--mode prod` promotion workflow; only high-precision rules are `prod-ready`. The
  action plan creates the test-first rules in `--mode test` and has an explicit "promote after N
  days" step.
- Existing taxonomy events (`api.stream.started`, `api.stream.limit_exceeded`,
  `api.sensitive.flow`) are reused; `llm.*` events are proposed **only** where there is no
  traffic or existing event for the signal.
- **No raw prompt/completion/PII text in any event attribute** — hashes/labels/counts only;
  telemetry redaction is verified in Phase 1 and restated in both reports.
- Every code finding in the Code-Findings report has a `file:line`, the quoted snippet, and a
  described fix — and **no application code was modified** (this is an audit).
- Every detection SQL query keeps the `__USER_APP_KEYS__` scoping (correct table column:
  `resources_string['service.name']` for logs vs. `` `resource_string_service$$name` `` for
  traces) and the `client_ip` coalesce, selects an `ip` column, and uses `HAVING ip != ''`;
  traffic queries also keep the `ts_bucket_start` + `kind = 2` guards. New rules are validated
  with `--mode dry_run`.
- Output-handling injection coverage references the **system signature rules + `instant.block`**,
  not duplicate pattern SQL.
- Only commands, flags, events, and SQL columns from this prompt's building blocks appear
  (including `securenow challenge …`, `firewall`, and the `api.*` / `llm.*` events).
- Every 🔴 gap appears in the gaps section with an interim app-level fix **and** the "contact the
  SecureNow team" line.
- The action plan runs top-to-bottom with `<APP_KEY>` substituted in.
- **Phase 0.5 ran**: the resolved installed `securenow` version appears in the appendix of **both**
  reports, and no command/flag/event/column is emitted that the installed SDK/CLI does not expose
  (else it is annotated `# requires securenow >= <version>`).
- Every detection rule is a **complete copyable unit** (SQL → `rules/<name>.sql` → full
  `securenow alerts rules create …` → `--mode dry_run` test); flags match `alerts rules --help`
  from Phase 0.5.
- **Four** files are written to `threat/25-ai-llm-features/` — `ai-llm-features-detection-mitigation.md`
  + `.html` and `ai-llm-features-code-findings.md` + `.html`; the two tracks **cross-link** each
  other.
- The split is honest: SecureNow-runnable detections / mitigations live in the Detection &
  Mitigation report; code / config changes live in the Code-Findings report; nothing
  security-relevant is dropped.
- Both HTML files are self-contained (no CDN, no external fonts or scripts), the stats cards match
  that track's table counts, and **every command/SQL block in the Detection & Mitigation HTML is
  wrapped in a `.cmd` with a working Copy button** (offline clipboard `<script>` included).
- A one-line summary is printed back to the user: per-track file paths, threat counts,
  rules-to-create count, code findings by severity, gaps, OWASP LLM coverage, and the resolved SDK
  version.
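
Two of the checks above (self-contained HTML, every `<pre>` wrapped in `.cmd`) are mechanical enough to script. A minimal sketch of such a self-check over the generated HTML string; `checkReportHtml` is a hypothetical helper, and the regular expressions are a rough heuristic under the assumption that the report follows the skeleton's markup, not a full HTML parser:

```javascript
// Hypothetical helper: returns a list of problems found in a generated report's HTML.
function checkReportHtml(html) {
  const problems = [];

  // Externally loaded resources: <script>/<img>/<iframe>/<link> pointing at a remote URL.
  // Plain <a href="https://..."> links are allowed; they load nothing.
  const remoteTag = /<(?:script|img|iframe|link)\b[^>]*\b(?:src|href)\s*=\s*["']?(?:https?:)?\/\//i;
  // CSS that pulls remote assets (fonts, images) or imports other stylesheets.
  const remoteCss = /@import\b|url\(\s*["']?(?:https?:)?\/\//i;
  if (remoteTag.test(html) || remoteCss.test(html)) {
    problems.push("external resource reference found");
  }

  // Every <pre> must sit directly inside a .cmd wrapper, after its Copy button.
  const preCount = (html.match(/<pre[\s>]/gi) || []).length;
  const wrappedCount = (
    html.match(/<div class="cmd">\s*<button class="copy"[^>]*>Copy<\/button>\s*<pre[\s>]/gi) || []
  ).length;
  if (preCount !== wrappedCount) {
    problems.push(preCount - wrappedCount + " <pre> block(s) not wrapped in .cmd");
  }
  return problems;
}
```

An empty array means both checks passed; the remaining quality-bar items still need review against the reports themselves.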

<!-- ════════════════ END OF PROMPT ════════════════ -->