# Realtime — WebSocket / SSE / Streaming Threat Model — Generator Prompt

A **copy-paste prompt** for customers. Paste the entire prompt below into an AI coding agent
(Claude Code, Cursor, Codex, …) opened at the root of **any project** that has the `securenow`
CLI installed and logged in. The agent will inventory every long-lived / streaming connection
path (WebSocket, Socket.IO, SSE, long-polling, AI/export/download streams), build an exhaustive
**realtime / streaming** threat model mapped to the **OWASP API Security Top 10:2023**, audit
the code for socket-layer flaws, and emit **four** SecureNow-branded deliverables across **two
tracks** (a Detection & Mitigation runbook and a Code Findings audit, each as **Markdown +
self-contained HTML**) — including the detection rules to create as **ready-to-copy** command
units, the mitigation commands to run, how to test each one, the code-level findings (audited,
**not** fixed), and which threats still need the SecureNow team. Every rule, flag, event name,
and SQL column is **grounded in the SecureNow SDK actually installed in the repo** (Phase 0.5),
and the HTML reports ship with **offline copy buttons** on every command.

This model owns the part of the API surface that the request/response models can't see: the
**handshake/upgrade**, **per-message** authorization, **channel/room/topic** isolation on a
live socket, and **streaming resource & cost** limits. A single WebSocket upgrade is **one
span** — everything that happens inside the open socket (every message, subscribe, publish,
token streamed) is **invisible to per-request HTTP rules**. That is why this domain leans on
**event-based** detection (`api.stream.*` / `api.websocket.*`) far more than the API-security
model, and why most of its real fixes are **app-level** controls SecureNow can only *contain at
the edge*.

> SecureNow contributes natively: the `api.stream.started` / `api.stream.limit_exceeded` /
> `api.websocket.authz_failed` **events** and **connection-rate containment** at the edge
> (firewall / rate-limit / challenge / block on the upgrade request, which IS a normal HTTP
> request). **Per-message authorization, channel-ownership checks, origin validation on the
> handshake, and message/byte/token/duration/queue caps are app fixes** — SecureNow detects
> their *abuse* (the flood of failed-authz events, the stream that blew its cap) and contains
> the source, but the missing control is the primary fix.

This is a **sibling** model. It does **not** re-derive connection authentication, channel data
boundaries, or AI cost — those live in numbered sibling models you reference (not duplicate):
[../01-authentication/](../01-authentication/) (who the socket belongs to — connection auth),
[../03-tenant-isolation/](../03-tenant-isolation/) (the channel/room data boundary),
[../25-ai-llm-features/](../25-ai-llm-features/) (AI stream token/cost economics). Run those too.

Requirements on the customer machine: `npm i -g securenow && securenow login` (admin auth + app
runtime connected). Everything else is discovered by the agent.

---

<!-- ════════════════ COPY EVERYTHING BELOW THIS LINE ════════════════ -->

# Generate a Realtime (WebSocket / SSE / Streaming) Threat Model Report (SecureNow)

You are a senior application-security engineer specializing in realtime and streaming APIs.
Produce an **exhaustive threat model for every long-lived / streaming connection in THIS
codebase**, organized along the **OWASP API Security Top 10:2023** (this domain maps primarily
to **API1 BOLA**, **API4 Unrestricted Resource Consumption**, **API5 BFLA**, and **API8
Misconfiguration**), mapped to **SecureNow** detections and mitigations, with a ready-to-run
action plan **and** a code-level audit of every socket-layer flaw you find. You write **four**
deliverables across **two tracks** into `threat/17-realtime-websocket-sse/` (create the folder if
needed):

1. `realtime-websocket-sse-detection-mitigation.md` — the **operational runbook**: what to run in
   SecureNow (rules, mitigations, tests), every command a **ready-to-copy** unit.
2. `realtime-websocket-sse-detection-mitigation.html` — the same runbook as a **self-contained**
   HTML page (inline CSS + JS, no network requests) with a **Copy button** on every command.
3. `realtime-websocket-sse-code-findings.md` — the **code audit**: socket-layer issues found in
   the codebase + recommended fixes (described, **never** applied).
4. `realtime-websocket-sse-code-findings.html` — the same audit as a **self-contained** HTML page.

The two tracks **cross-link** each other: the gaps/instrumentation rows in the detection report
link to the relevant code finding, and the app-fix findings link back to the detection row they
back. Every rule, flag, event name, and SQL column is **grounded in the installed SDK** (Phase
0.5) and the resolved `securenow` version is recorded in **both** reports' appendix.

Work in the seven phases below, in order. **Never invent facts**: if something is not in the
codebase or not returned by a CLI command, say "not found" — do not guess. **Do not modify
application code.** You are auditing: every code-level fix is *described in the report*, never
applied to the repo.

**Scope discipline.** This model owns the **long-lived / streaming** surface: WebSocket,
Socket.IO, SSE, long-polling/HTTP-streaming, gRPC streaming, AI token streams, and
report/export/download streams — specifically their **handshake/origin**, **per-message authz**,
**channel/room/topic isolation**, and **streaming resource/cost limits**. It does **not**
re-derive:

- **Connection authentication** (is the token valid at connect time, session lifetime, token
  in cookie vs header) → defer to [../01-authentication/](../01-authentication/).
- **The channel/room data boundary as a tenancy model** (what data a tenant may ever see) →
  defer to [../03-tenant-isolation/](../03-tenant-isolation/). This model owns the *socket-side
  enforcement gap* (joining another tenant's room from client input); the tenancy data model
  itself is the sibling's.
- **AI stream token economics / prompt abuse** → defer to [../25-ai-llm-features/](../25-ai-llm-features/).
  This model owns the *uncapped stream as a resource/cost amplifier* (no duration/token/byte
  cap on the socket); the LLM-specific abuse is the sibling's.

For the deferred items, add **one** matrix row each marked *"deferred — see linked model"*, note
only the SecureNow traffic/event-observable symptom, and link the numbered sibling. Do not write
their detection/mitigation here.

---

## Phase 0 — Verify SecureNow tooling

Run and record (use `--json` where supported):

```bash
securenow doctor              # connectivity must be healthy
securenow whoami              # admin auth + runtime app
securenow status --json       # app key(s), environment, firewall state
securenow alerts rules --json # detection rules that already exist (incl. system signature rules)
securenow automation --json   # blocklist automations that already exist
securenow challenge list --json   # CAPTCHA / proof-of-work challenge rules
securenow env --json          # resolved SDK config (service name, endpoints)
```

If the CLI is missing or not logged in, **stop** and tell the user to run
`npm i -g securenow && securenow login`, then re-run this prompt. Capture the **app key** (UUID)
— every rule and command in the report must use it. If multiple apps exist, ask the user which
app this codebase maps to before continuing. Note the **firewall state** (the upgrade request is
an HTTP request the firewall can drop before it ever becomes a socket) and any **system
signature rules** already present (a payload-borne exploit inside a WebSocket *message* will only
be matched by a signature rule if your stack records the message body as a span attribute — most
do not; treat injection-over-socket as event-driven, see catalog C).

---

## Phase 0.5 — Ground every rule & command in the INSTALLED SDK

Before writing any SQL or CLI, read the SecureNow SDK that is actually installed in this repo so
every alert rule and command is correct for THIS version — never guess flags, subcommands, event
names, or SQL columns:

```bash
cat node_modules/securenow/package.json    # installed SDK version (record it in both reports)
ls node_modules/securenow                  # exported modules: events, sessions, register, run, …
ls node_modules/securenow/dist 2>/dev/null # built entrypoints / bundled CLI
npx securenow --help                       # top-level commands available in this version
npx securenow alerts rules --help          # exact create flags: --name/--sql/--apps/--severity/--schedule/--nlp
npx securenow event --help                 # `event send` shape for synthetic tests
npx securenow ratelimit --help; npx securenow challenge --help
npx securenow blocklist --help; npx securenow automation --help; npx securenow trusted --help
```

If `node_modules/securenow` is absent, run `npm ls securenow`; if still missing, tell the user to
`npm i securenow` (or `npm i -g securenow`) and stop. EVERY command, flag, `track('…')` event
name, and SQL column you emit MUST be one the installed SDK/CLI actually exposes. If the installed
version lacks a capability this prompt references, emit the rule but annotate it
`# requires securenow >= <version>` instead of a broken command. Record the resolved version in
the appendix of BOTH reports.

In Phase 4 and Phase 5, treat `node_modules/securenow` + `--help` as the source of truth: the
`securenow/events` `track()` signatures, the `securenow alerts rules` SQL columns, and every
mitigation subcommand are discoverable there. Cross-check before emitting.

---

## Phase 1 — Inventory the realtime / streaming surface (codebase analysis)

Realtime security starts with an inventory of **what stays open and what streams**. A normal
endpoint catalog is not enough — you must capture the *lifecycle* of each connection. Cover at
minimum:

- **Connection types & libraries** — native `ws`/`WebSocket`, `socket.io`/`engine.io`,
  `sockjs`, SSE (`text/event-stream`, `EventSource`, `res.write` loops), long-polling, HTTP
  chunked/streaming responses, `ReadableStream`/`Response` streaming (Next.js / edge),
  gRPC/gRPC-Web streaming, AI SDK token streams (OpenAI/Anthropic stream, `ai` package
  `streamText`), GraphQL subscriptions (`graphql-ws`, `subscriptions-transport-ws`), MQTT/STOMP
  over WebSocket, Phoenix Channels, ActionCable, SignalR, Pusher/Ably/Supabase Realtime
  client-or-self-hosted. List public vs internal/service-to-service.
- **Endpoint catalog** — enumerate every upgrade path and stream route (`/ws`, `/socket.io/`,
  `/api/stream`, `/api/chat`, `/events`, `/sse`, `/api/export`, GraphQL subscription endpoint),
  the handler/gateway/resolver name, and visibility (public / authenticated / admin). This
  catalog is a report deliverable.
- **Handshake / upgrade controls** — for each: is the **`Origin` header validated** against an
  allowlist on upgrade? Is auth enforced **during** the handshake (and how — cookie,
  `Sec-WebSocket-Protocol` token, `Authorization` header, **query-string token**)? Is there a
  **connection rate limit** on the upgrade (per IP / per user)? Is there a **max concurrent
  connections** cap (per IP / per user / global)? Note where **none** exist.
- **Per-message / per-action authorization** — after connect, is **every** inbound message,
  RPC, `subscribe`, `publish`, `join`, or action re-checked against the connection's identity
  and permission — or is auth checked **only once at connect time**? Identify the message
  router/dispatcher and whether it carries the authenticated principal into each handler.
- **Channel / room / topic model** — how does a client join a room/topic/channel? Is the
  room/topic name or an object id taken **from client input** (`socket.join(msg.room)`,
  `subscribe(req.params.topic)`) and joined **without** verifying the connection owns it? Map
  every room/topic naming scheme to its tenant/user boundary.
- **Resource & cost caps per connection** — message **count/rate** cap, message **byte size**
  cap, total **bytes** per connection, **duration** cap (max socket lifetime / idle timeout),
  **token** cap on AI streams, **queue/backpressure** limit (what happens when a consumer is
  slow — does the server buffer unboundedly?), and **broadcast fanout** size (one publish → how
  many deliveries; is fanout bounded?). Note every cap that is **missing**.
- **Streaming cost sinks** — AI/LLM token streams, server-side export/report generation streamed
  to the client, large file download streaming, anything where keeping the stream open forces
  the server to do paid or expensive work for its whole lifetime. (Defer the AI-specific
  economics to [../25-ai-llm-features/](../25-ai-llm-features/); model the *uncapped stream*
  here.)
- **Credential exposure in the URL** — does the client put a token/JWT/API key/session id in the
  **upgrade URL query string** (`wss://host/ws?token=…`, `/sse?apikey=…`)? Query strings land in
  access logs, proxy logs, browser history, and telemetry — a leak. Note every such occurrence.
- **Transport & proxy posture** — is the socket behind a reverse proxy / load balancer that
  terminates and re-originates the connection? Does the app derive client IP from
  `X-Forwarded-For` on the upgrade (spoofable)? Is `wss://` (TLS) enforced, or is `ws://`
  reachable? Are idle/read/write timeouts set at the proxy and the app?
- **Telemetry privacy & redaction** — confirm the SecureNow SDK / log pipeline redacts
  `Authorization`, `Cookie`, `Sec-WebSocket-Protocol` token values, query-string tokens, and any
  message body captured as an attribute. A token-in-URL handshake that is *also* logged
  unredacted is a high-severity finding.
- **SecureNow instrumentation already present** — `securenow/register` / `securenow run` /
  `securenow init` (the upgrade request gets a traffic span automatically — one per connection),
  any existing `securenow/events` `track('api.stream.*' | 'api.websocket.*')` calls, and whether
  the firewall is engaged. This determines what works *today* (edge containment on the upgrade)
  vs *after instrumentation* (everything inside the open socket).

Output of this phase = the report's **Realtime surface & inventory** section: the connection
catalog (path / library / type / visibility), a **per-connection controls table** (Origin check
/ Auth-at-connect / Per-message authz / Connection rate limit / Concurrency cap / Msg rate cap /
Byte cap / Duration cap / Token cap / Queue/backpressure / Fanout cap), the **channel/room ↔
tenant boundary** map, the **credential-in-URL** list, the **proxy/transport posture**, the
**telemetry redaction status**, and a short paragraph naming the real realtime attack surface for
this stack (e.g. "a Socket.IO gateway that authenticates once at connect and joins any room the
client names, with no per-message check and no duration cap").
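
As a rough starting point for the inventory above, the sketch below flags lines of source text that mention common realtime primitives. The pattern table and the `classifyRealtimeUsage` helper are illustrative assumptions, not part of SecureNow or any library, and a regex scan only yields candidates that still have to be confirmed by reading the handler code.

```javascript
// Hypothetical helper: flag realtime/streaming primitives in one file's source text.
// The patterns are a non-exhaustive assumption; extend them for the stack under audit.
const REALTIME_PATTERNS = [
  { kind: 'websocket', re: /new\s+WebSocket(Server)?\s*\(|require\(['"]ws['"]\)|from\s+['"]ws['"]/ },
  { kind: 'socket.io', re: /require\(['"]socket\.io['"]\)|from\s+['"]socket\.io['"]/ },
  { kind: 'sse', re: /text\/event-stream|new\s+EventSource\s*\(/ },
  { kind: 'ai_stream', re: /\bstreamText\s*\(/ },
  { kind: 'graphql_subscription', re: /graphql-ws|subscriptions-transport-ws/ },
];

// Returns one hit per (line, pattern) match, with a 1-based line number.
function classifyRealtimeUsage(sourceText) {
  const hits = [];
  sourceText.split('\n').forEach((line, index) => {
    for (const { kind, re } of REALTIME_PATTERNS) {
      if (re.test(line)) hits.push({ kind, line: index + 1, text: line.trim() });
    }
  });
  return hits;
}

// Example input standing in for a file read from the repo.
const sample = [
  "const { WebSocketServer } = require('ws');",
  "res.setHeader('Content-Type', 'text/event-stream');",
  'const total = a + b;',
].join('\n');

console.log(classifyRealtimeUsage(sample).map((h) => `${h.line}:${h.kind}`));
```

Each hit is only a pointer into the code: the connection catalog and the per-connection controls table still come from inspecting the handshake, dispatcher, and cap logic around it.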

---

## Phase 2 — Enumerate threats (exhaustive catalog)

Evaluate **every** threat below against the discovered surface. Each item is either **modeled**
(a row in the threat matrix) or **explicitly N/A** (one line in an "Out of scope" subsection with
the reason — e.g. "GraphQL subscription items: N/A, no GraphQL"). Never silently drop an item.
Add stack-specific threats you discover that are not listed — this catalog is the floor, not the
ceiling. Tag each modeled row with its **OWASP API Top 10:2023** code (**API1 / API4 / API5 /
API8**, or "—").

**A. Handshake / upgrade / origin (OWASP API8, API2-symptom)**

1. **Origin not validated on upgrade → Cross-Site WebSocket Hijacking (CSWSH)** — a malicious
   page opens a `wss://` to your host; the browser attaches the victim's cookies; the attacker
   reads/sends on the victim's authenticated socket. (No CSRF token on the handshake.)
2. **Auth missing or weak at connect** — the upgrade is accepted without verifying a
   session/token, or accepts an expired/forged one. *(Deep model deferred to
   ../01-authentication/; model the observable here.)*
3. **TLS not enforced** — `ws://` reachable; handshake + token sent in cleartext / downgradeable.
4. **`Sec-WebSocket-Protocol` / subprotocol abuse** — auth token smuggled as a subprotocol value
   and echoed/logged, or subprotocol negotiation bypassing a check.

**B. Per-message / per-action authorization (OWASP API5 BFLA, API1 BOLA)**

5. **Auth checked only at connect, not per message** — connection authenticated once; thereafter
   any message/action is trusted. A long-lived socket outlives a session revocation / role
   change.
6. **BFLA over a socket** — a privileged action (`admin:*`, `moderate`, `delete`) is reachable
   via a message type with no function-level check, even though the HTTP equivalent is guarded.
7. **BOLA over a socket** — a message references another user's/tenant's object id
   (`{type:'get', id:'<other-user-order>'}`) and the handler returns it with no object-level
   ownership check.
8. **Message-type confusion / verb tunneling** — an unlisted or internal message type reaches a
   privileged handler because the dispatcher routes on a client-supplied `type` without an
   allowlist.

**C. Channel / room / topic isolation (OWASP API1, API5; defer data model to ../03-tenant-isolation/)**

9. **Cross-tenant subscription** — client supplies the room/topic/channel name
   (`socket.join(msg.room)`, `subscribe('tenant-<id>')`) and the server joins it **without**
   verifying the connection owns that tenant → live cross-tenant data feed.
10. **Predictable/enumerable channel names** — sequential or guessable room ids let a client
    subscribe to others' channels by iterating.
11. **Broadcast leak across tenants** — a publish intended for one tenant's room is delivered to
    a broader audience because rooms are not tenant-scoped, or a wildcard subscription is allowed.
12. **Presence / typing / metadata leak** — presence, read-receipts, or member lists of a
    channel the client doesn't own are exposed.

**D. Connection-level resource consumption (OWASP API4)**

13. **Connection flood (single IP)** — one IP opens upgrades at high rate (no connection rate
    limit on the handshake).
14. **Distributed connection flood** — many IPs / one ASN open connections (botnet exhausting
    the connection table / file descriptors).
15. **Long-lived-connection exhaustion / slow-connect** — many idle or slowly-handshaking
    sockets held open to exhaust the connection pool (Slowloris-for-WebSocket).
16. **No max-concurrent-connections cap** (per IP / per user / global) → file-descriptor / memory
    exhaustion.
17. **Reconnect storm** — a client (or buggy app) reconnects in a tight loop with no backoff,
    amplifying handshake cost.

**E. Message / stream resource & cost amplification (OWASP API4)**

18. **No message-rate cap** — a connected client sends messages as fast as it can (per-connection
    flood inside an authenticated socket — invisible to HTTP rate limits).
19. **No message byte-size cap** — a single oversized frame / message exhausts memory.
20. **No total-bytes / no duration cap per connection** — a socket streams or receives unbounded
    data over an unbounded lifetime.
21. **Streaming cost amplification** — an AI/export/report/download stream is not capped
    server-side by duration, tokens, bytes, or messages; an attacker opens many such streams to
    burn compute and money. *(AI economics deferred to ../25-ai-llm-features/; the
    uncapped-stream resource abuse is modeled here.)*
22. **Unbounded broadcast fanout** — one cheap publish triggers N expensive deliveries
    (re-render, DB read per subscriber, push per member); attacker amplifies by inflating room
    membership.
23. **Backpressure absent → memory growth under slow consumers** — server buffers outbound
    messages without bound when a consumer reads slowly; a deliberately-slow client OOMs the
    server.
24. **Decompression abuse (`permessage-deflate`)** — compressed-frame bomb expands to exhaust
    memory/CPU (WebSocket compression DoS).
25. **Queue / mailbox flooding** — per-connection or per-room message queue grows unbounded from a
    flooding publisher.

**F. Credential / data exposure (OWASP API8; API2/API3 symptoms)**

26. **Token/credential in the upgrade URL** — token in `?token=`/`?apikey=` lands in access logs,
    proxy logs, browser history, Referer, and telemetry → credential leak.
27. **Sensitive data over-streamed** — the stream returns more than the client is entitled to
    (full record vs allowed projection) — traffic/event-observable; app fix primary.
28. **Message bodies captured by telemetry without redaction** — socket frames logged with PII /
    tokens / secrets in attributes.

**G. SSE / long-polling / Socket.IO / GraphQL-subscription equivalents** (model only the
transports present, else N/A — but map each present transport's equivalent of A–F)

29. **SSE: no per-event authz / no stream duration cap / token-in-URL** (`EventSource` can't send
    custom headers → tokens often pushed to the query string).
30. **SSE/long-poll: connection-holding exhaustion** — each open `text/event-stream` ties up a
    server connection; floods exhaust the pool (especially without HTTP/2).
31. **Socket.IO: namespace/room authz bypass / engine.io polling-transport fallback** evades a
    WebSocket-only control (the long-polling transport hits ordinary HTTP and may dodge a
    socket-scoped rule).
32. **GraphQL subscriptions: unauthorized subscribe / subscription-as-amplifier** (one
    subscription fans out high-frequency expensive resolutions). *(Deep GraphQL model in
    ../15-graphql-security/ if present.)*

**H. Negative-space & evasion**

33. **Transport-fallback evasion** — Socket.IO/SockJS falls back to XHR-polling; a rule scoped to
    the `Upgrade` request misses the polling transport (and vice-versa).
34. **Client-IP spoofing on the upgrade** (`X-Forwarded-For`) to evade per-IP connection limits.
35. **Direct-origin connect bypassing the CDN/WAF** — connecting to the origin host/IP, skipping
    edge containment.
36. **Reconnect-with-new-IP / NAT rotation** to evade per-IP connection caps while keeping the
    abuse going.

**I. Deferred — modeled in sibling models (reference, do not re-derive)**

37. **Connection authentication depth** (token validity, session lifetime, revocation, token
    storage, **API2**) → [../01-authentication/](../01-authentication/).
38. **Channel/room data boundary as the tenancy model** (what a tenant may ever see,
    **API1/API5**) → [../03-tenant-isolation/](../03-tenant-isolation/).
39. **AI stream token/cost economics & prompt abuse** (**API4**) → [../25-ai-llm-features/](../25-ai-llm-features/).

> For 37–39, add **one** matrix row each marked *"deferred — see linked model"*, and only note
> the SecureNow event-observable symptom (e.g. a spike of `api.websocket.authz_failed`, a spike
> of `api.stream.limit_exceeded`). The full detection/mitigation lives in the linked sibling.

**J. Observable abuse (what telemetry actually catches — the workhorse rules)**

40. **Upgrade-request flood from one IP/ASN** — the handshake is a normal HTTP request the
    traffic pipeline sees (one span per connection); spike = connection flood.
41. **Failed-upgrade / 4xx spike on the WS/SSE route** — rejected handshakes (auth/origin
    failures, probing).
42. **`api.websocket.authz_failed` spike from one IP/user** — per-message/subscribe authz checks
    failing repeatedly (cross-channel probing, BOLA/BFLA over socket).
43. **`api.stream.limit_exceeded` events** — streams hitting duration/bytes/messages/tokens/queue
    caps (cost abuse, runaway client). Any hit is high-signal.
44. **`api.stream.started` rate / concurrency anomaly** — a single user/IP opening an anomalous
    number of streams (cost amplification onset; long-lived sockets need event-based detection
    because one upgrade is only one span).

> Items 40–44 are the rules that actually fire on telemetry — pair each with its threshold +
> window in Phase 4. **Inside an open socket, HTTP traffic rules go blind** (one upgrade = one
> span, no per-message spans), so per-message/per-stream abuse is detected via the `api.stream.*`
> / `api.websocket.*` **events** the app emits — not via traffic SQL.

---

## Phase 3 — Audit the code (findings only — do not fix)

For **each** modeled threat that maps to real code, locate the responsible code and record a
**finding** for the report's "Code-level findings" section. A finding is:

- **Location** — `file:line` (clickable), the gateway/handler/resolver/middleware name.
- **Pattern** — quote the 1–8 relevant lines. State the missing control precisely.
- **Why exploitable** — the concrete message/handshake an attacker sends and what they achieve.
- **Severity** — critical / high / medium / low (impact × reachability).
- **Recommended fix (described, not applied)** — the specific change. Reference the secure
  pattern, not a code diff. **You must not edit the codebase.**

If a control exists and is correct (origin allowlist enforced on upgrade, per-message authz in
the dispatcher, duration/byte caps set, backpressure handled), note it as a **strength** — the
posture must be honest. Absence of a control where the surface exists is itself a finding ("no
per-message authz anywhere in the socket dispatcher").

Look specifically for:

**Handshake / origin flaws** — upgrade handler that never reads/validates `Origin`
(`wss.on('connection')` with no origin check, `io.use` missing an origin gate,
`verifyClient`/`allowRequest` absent or returning `true`); auth derived only from a cookie with
no CSRF-equivalent token on the handshake (the CSWSH precondition); `ws://` allowed; token read
from the URL query string (`url.parse(req.url, true).query.token`). *Recommended fixes must mention*
an explicit origin allowlist on upgrade, a handshake auth token that is **not** ambient cookie
(or a per-connection CSRF token), enforcing `wss://`, and moving the token out of the URL into
the `Sec-WebSocket-Protocol` header or first authenticated message.
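
The origin-allowlist fix named above can be sketched as a pure check that an upgrade hook calls before accepting the handshake. This is a minimal sketch rather than any library's API: `ALLOWED_ORIGINS`, the example hostnames, and `isAllowedOrigin` are hypothetical names introduced for illustration.

```javascript
// Assumption: the allowlist holds exact origins (scheme + host + optional port).
const ALLOWED_ORIGINS = new Set(['https://app.example.com', 'https://admin.example.com']);

// Returns true only when the Origin header parses and matches an allowed origin.
// A missing header is rejected here; non-browser clients would need a separate auth path.
function isAllowedOrigin(originHeader, allowlist = ALLOWED_ORIGINS) {
  if (typeof originHeader !== 'string' || originHeader.length === 0) return false;
  let parsed;
  try {
    parsed = new URL(originHeader);
  } catch {
    return false; // malformed Origin, including the literal string "null"
  }
  // URL#origin lowercases scheme/host and drops default ports, so compare the normalized form.
  return allowlist.has(parsed.origin);
}

// A look-alike host must not pass a suffix or substring match.
console.log(isAllowedOrigin('https://app.example.com'));
console.log(isAllowedOrigin('https://app.example.com.evil.test'));
```

Exact-match comparison is the point of the sketch: prefix, suffix, or regex matching on the raw header is the usual way an allowlist ends up accepting a look-alike origin. Rejecting a missing `Origin` suits browser-only sockets; whether that is right for a given endpoint depends on the clients found in Phase 1.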

**Per-message authorization flaws** — a message router that authenticates once in
`on('connection')` and then trusts every `on('message')`/event handler; handlers that read
`msg.id` / `msg.room` / `msg.userId` from the frame and act without re-checking the connection's
principal; admin/moderation message types with no function-level guard. *Recommended fixes must
mention* carrying the authenticated principal into every handler, re-checking object ownership
(BOLA) and function-level permission (BFLA) on **every** message, an allowlist of message types,
and revalidating against current session/role (so revocation/role-change takes effect mid-socket).
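
To make the per-message pattern concrete, here is a minimal sketch of a dispatcher gate that applies a message-type allowlist, reloads the principal on every message, and then checks function-level and object-level permission. The message types, permission strings, the `HANDLERS` table, and the injected `loadPrincipal` function are all hypothetical.

```javascript
// Hypothetical message types and permission names, for illustration only.
const HANDLERS = {
  'order.get': { permission: 'orders:read', ownsObject: (p, msg) => p.orderIds.includes(msg.id) },
  'room.moderate': { permission: 'rooms:moderate', ownsObject: () => true },
};

// loadPrincipal is injected so each message is checked against the CURRENT session/role,
// not the snapshot taken at connect time.
function authorizeMessage(connection, rawMessage, loadPrincipal) {
  let msg;
  try {
    msg = JSON.parse(rawMessage);
  } catch {
    return { ok: false, reason: 'malformed' };
  }
  if (msg === null || typeof msg !== 'object') return { ok: false, reason: 'malformed' };
  // Allowlist: only own keys of HANDLERS are routable (blocks "constructor", "__proto__", ...).
  if (typeof msg.type !== 'string' || !Object.prototype.hasOwnProperty.call(HANDLERS, msg.type)) {
    return { ok: false, reason: 'unknown_type' };
  }
  const rule = HANDLERS[msg.type];
  const principal = loadPrincipal(connection.sessionId);
  if (!principal) return { ok: false, reason: 'session_revoked' };
  // Function-level check (BFLA), then object-level ownership check (BOLA).
  if (!principal.permissions.includes(rule.permission)) return { ok: false, reason: 'forbidden_function' };
  if (!rule.ownsObject(principal, msg)) return { ok: false, reason: 'forbidden_object' };
  return { ok: true, principal, msg };
}
```

Every `ok: false` branch except `malformed` is a natural place for the app to emit its authorization-failure event, which is what turns a silent per-message flaw into something detectable.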

**Channel / room / tenant isolation flaws** — `socket.join(clientSuppliedName)` or
`subscribe(req.params.topic)` with no ownership check; room/topic names that embed a tenant/user
id taken from input and trusted; wildcard subscriptions; broadcast helpers that emit to a
broader scope than the tenant room. *Recommended fixes must mention* deriving the room/topic from
the **authenticated** principal (never client input), verifying tenant/user ownership before
join/subscribe, namespacing rooms by tenant, and rejecting wildcard/enumerable channel joins.
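
The "derive the room from the principal" fix can be illustrated with a small resolver. The naming scheme `tenant:<id>:<channel>`, the `JOINABLE_CHANNELS` set, and `resolveRoom` are assumptions made for this sketch; the codebase under audit will have its own scheme.

```javascript
// Hypothetical: the only channel kinds a client may ask for. No wildcards, no free-form names.
const JOINABLE_CHANNELS = new Set(['orders', 'notifications']);

// The client names a channel KIND; the tenant part always comes from the authenticated
// principal, never from the message. Returns the room name, or null when the join is refused.
function resolveRoom(principal, requestedChannel) {
  if (!principal || typeof principal.tenantId !== 'string') return null;
  if (!JOINABLE_CHANNELS.has(requestedChannel)) return null;
  return `tenant:${principal.tenantId}:${requestedChannel}`;
}

const principal = { tenantId: 't-42' };
console.log(resolveRoom(principal, 'orders'));
console.log(resolveRoom(principal, 'tenant:t-7:orders')); // a full room name from the client is refused
```

Because the client can never supply the tenant segment, guessing or enumerating another tenant's id no longer produces a joinable room name; a `null` result is the point at which an authorization-failure event would be emitted.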

**Connection-level resource flaws** — no connection rate limit on the upgrade; no max-concurrent
cap per IP/user/global; no idle/handshake timeout; unbounded accept loop; reconnect handling with
no server-side backoff. *Recommended fixes must mention* a per-IP and per-user connection rate
limit, a hard concurrency cap, idle + handshake timeouts, and rejecting reconnect storms.
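
A per-key connection rate limit combined with a concurrency cap could look like the sketch below. `ConnectionLimiter` is a hypothetical in-memory class, its default limits are placeholders rather than recommendations, and the clock is injectable only so the behaviour can be exercised deterministically.

```javascript
// Sketch of a per-key (IP or user id) limiter: sliding-window connect rate + concurrency cap.
class ConnectionLimiter {
  constructor({ maxPerWindow = 10, windowMs = 60_000, maxConcurrent = 5, now = Date.now } = {}) {
    this.maxPerWindow = maxPerWindow;
    this.windowMs = windowMs;
    this.maxConcurrent = maxConcurrent;
    this.now = now;
    this.accepted = new Map(); // key -> timestamps of recently accepted connects
    this.open = new Map(); // key -> number of currently open connections
  }

  // Call on every upgrade attempt; returns { ok: true } or { ok: false, reason }.
  tryAcquire(key) {
    const t = this.now();
    const recent = (this.accepted.get(key) || []).filter((ts) => t - ts < this.windowMs);
    this.accepted.set(key, recent);
    if (recent.length >= this.maxPerWindow) return { ok: false, reason: 'rate' };
    const openCount = this.open.get(key) || 0;
    if (openCount >= this.maxConcurrent) return { ok: false, reason: 'concurrency' };
    recent.push(t);
    this.open.set(key, openCount + 1);
    return { ok: true };
  }

  // Call when the socket closes so the concurrency slot is returned.
  release(key) {
    const openCount = this.open.get(key) || 0;
    if (openCount <= 1) this.open.delete(key);
    else this.open.set(key, openCount - 1);
  }
}
```

The sketch keeps state in one process and never evicts idle keys, so a real deployment would need shared storage across instances and periodic cleanup. It also keys on whatever the caller passes in, which means the client-IP derivation has to be trustworthy before a per-IP limit means anything.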

**Message / stream cost flaws** — `ws` server with no `maxPayload`; no per-connection message
rate cap; no total-byte or duration cap; AI/export/download streams (`for await … res.write`,
`streamText`, `pipe`) with no server-side duration/token/byte ceiling; `permessage-deflate`
enabled with no decompressed-size guard; outbound buffering with no `bufferedAmount`/backpressure
check; broadcast that iterates all subscribers with per-subscriber DB work. *Recommended fixes
must mention* `maxPayload`/message size caps, per-connection message-rate and total-byte caps, a
hard stream **duration** and **token/byte** cap enforced server-side, decompression-size limits,
backpressure (pause/drop when `bufferedAmount` exceeds a threshold), and bounded/queued fanout.
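
The server-side stream caps described above can be sketched as a small budget object that the stream loop consults on every chunk. `StreamBudget` and its option names are hypothetical; `emit` is an injected stand-in for the app's event call so the sketch runs without any dependency, and the payload shape it passes is a simplification.

```javascript
// Sketch of a per-stream budget: duration, bytes, messages, and tokens.
// A limit left undefined is treated as "no cap" for that dimension.
class StreamBudget {
  constructor({ maxMs, maxBytes, maxMessages, maxTokens }, { emit = () => {}, now = Date.now } = {}) {
    this.limits = { maxMs, maxBytes, maxMessages, maxTokens };
    this.emit = emit;
    this.now = now;
    this.startedAt = now();
    this.bytes = 0;
    this.messages = 0;
    this.tokens = 0;
    this.exceeded = null;
  }

  // Record one outbound chunk. Returns null while within budget, else the breached cap name.
  // The first breach is sticky and emits exactly one event.
  record(chunk, tokenCount = 0) {
    if (this.exceeded) return this.exceeded;
    this.bytes += Buffer.byteLength(chunk);
    this.messages += 1;
    this.tokens += tokenCount;
    const { maxMs, maxBytes, maxMessages, maxTokens } = this.limits;
    if (this.now() - this.startedAt > maxMs) this.exceeded = 'duration';
    else if (this.bytes > maxBytes) this.exceeded = 'bytes';
    else if (this.messages > maxMessages) this.exceeded = 'messages';
    else if (this.tokens > maxTokens) this.exceeded = 'tokens';
    if (this.exceeded) this.emit('api.stream.limit_exceeded', { attributes: { reason: this.exceeded } });
    return this.exceeded;
  }
}
```

A stream loop would call `record()` before each write and end the response as soon as it returns a non-null reason. The sketch only checks duration when a chunk is recorded, so an idle stream also needs a timer, and it does not cover backpressure, which depends on the transport's own buffered-amount signal.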

**Credential & telemetry flaws** — token/session/api-key in the upgrade URL query string; the
same handshake logged unredacted; message bodies captured into telemetry attributes with
PII/tokens. *Recommended fixes must mention* moving credentials out of the URL, redacting handshake
query strings and `Sec-WebSocket-Protocol` token values in the log pipeline, and hashing/omitting
message bodies before they become attributes.
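
Redacting handshake query strings before they reach a logger can be sketched as follows. The parameter names in `SENSITIVE_PARAMS` and the `redactUpgradeUrl` helper are assumptions for illustration; the real list should come from the credential-in-URL occurrences found in Phase 1.

```javascript
// Assumption: these query parameter names carry credentials in the audited codebase.
const SENSITIVE_PARAMS = new Set(['token', 'access_token', 'apikey', 'api_key', 'session', 'jwt']);

// Returns path + query with sensitive values replaced, suitable for passing to a logger.
function redactUpgradeUrl(requestUrl) {
  // req.url is origin-relative, so a placeholder base is supplied purely for parsing.
  const parsed = new URL(requestUrl, 'http://placeholder.invalid');
  for (const name of [...parsed.searchParams.keys()]) {
    if (SENSITIVE_PARAMS.has(name.toLowerCase())) parsed.searchParams.set(name, 'REDACTED');
  }
  return parsed.pathname + parsed.search;
}

console.log(redactUpgradeUrl('/ws?token=abc.def&room=lobby'));
```

Redaction limits the damage in logs the application controls, but proxies and browsers upstream still see the original URL, which is why moving the credential out of the query string remains the primary fix.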

**SSE / long-poll / Socket.IO / subscription specifics** — `EventSource` flows that push the
token to the query string (because the browser API can't set headers); SSE handlers with no
duration cap holding a connection indefinitely; Socket.IO controls scoped only to the WebSocket
transport while the polling fallback (plain HTTP) dodges them; GraphQL subscription resolvers
with no per-subscribe authz or fan-out bound. *Recommended fixes must mention* an SSE duration
cap + token-via-cookie/header (or a short-lived stream ticket), applying the same control to
**all** Socket.IO/engine.io transports (not just `websocket`), and per-subscribe authz + fan-out
limits for subscriptions.
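
One way to picture the "short-lived stream ticket" mentioned above is an HMAC-signed value issued by an authenticated HTTP endpoint and presented on the SSE connect in place of the long-lived token. This is a sketch under assumptions: `issueStreamTicket`, `verifyStreamTicket`, the ticket format, and the 30-second default lifetime are all hypothetical, and the secret is generated in-process only to keep the example self-contained.

```javascript
const crypto = require('node:crypto');

// Assumption: in a real service the secret comes from server-side configuration.
const TICKET_SECRET = crypto.randomBytes(32);

// Issue a ticket for an already-authenticated user. Format: base64url(payload).base64url(hmac).
function issueStreamTicket(userId, ttlMs = 30_000, now = Date.now()) {
  const payload = Buffer.from(JSON.stringify({ userId, exp: now + ttlMs })).toString('base64url');
  const sig = crypto.createHmac('sha256', TICKET_SECRET).update(payload).digest('base64url');
  return `${payload}.${sig}`;
}

// Verify on the stream connect. Returns the userId, or null if forged, malformed, or expired.
function verifyStreamTicket(ticket, now = Date.now()) {
  if (typeof ticket !== 'string') return null;
  const [payload, sig] = ticket.split('.');
  if (!payload || !sig) return null;
  const expected = crypto.createHmac('sha256', TICKET_SECRET).update(payload).digest();
  const given = Buffer.from(sig, 'base64url');
  // Length check first: timingSafeEqual throws on buffers of different length.
  if (given.length !== expected.length || !crypto.timingSafeEqual(given, expected)) return null;
  const { userId, exp } = JSON.parse(Buffer.from(payload, 'base64url').toString('utf8'));
  return now < exp ? userId : null;
}
```

The ticket may still travel in the query string, since `EventSource` cannot set headers, but a leaked value is only useful for its short lifetime. The sketch does not enforce single use; that would need a server-side record of redeemed tickets.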

---

## Phase 4 — Map every modeled threat to SecureNow detection + mitigation

Classify each threat with exactly one coverage badge:

- 🟢 **COVERED** — detectable + mitigable with SecureNow today. For this domain that is the
  **connection/upgrade layer** (the handshake is an HTTP request: flood it and traffic rules fire
  + firewall/rate-limit/challenge/block contains it) and the **native events**
  (`api.stream.started` / `api.stream.limit_exceeded` / `api.websocket.authz_failed`) once the
  app emits them — these directly detect per-message authz failures and stream-cap breaches.
- 🟡 **PARTIAL** — SecureNow can **contain the abuser at the edge** (rate-limit/challenge/block
  the upgrade, the only thing it can intercept on a socket) **while the real fix is an app
  control** (origin validation, per-message authz, channel-ownership check, duration/byte/token
  cap, backpressure). Also 🟡 when detection only works **after** the customer adds the
  `api.stream.*` / `api.websocket.*` instrumentation. **Most rows in this model are 🟡** — be
  honest: SecureNow sees the *handshake* and whatever *events* you emit, not the bytes inside an
  established socket.
- 🔴 **GAP** — SecureNow cannot detect or mitigate this today (e.g. inspecting per-message
  content inside an established socket, enforcing backpressure, blocking a single message
  mid-stream without dropping the whole connection). **Still include it**: give the app/config
  fix, then add the line *"Requires SecureNow team — contact your SecureNow account contact (or
  in-dashboard support) to request support for this threat."* Collect all gaps in the report's
  "Known gaps & SecureNow feature requests" section.

> **Be honest about the socket blind spot.** A WebSocket upgrade produces **one** traffic span;
> everything after it (every message, subscribe, token streamed) is **not** a span. SecureNow's
> two levers here are: (1) **the upgrade request** — drop/rate-limit/challenge/block it at the
> edge like any HTTP request (great for connection floods, CSWSH-source containment, scanner
> blocking); and (2) **the events the app emits from inside the socket** — `api.stream.*` /
> `api.websocket.*`, which is the *only* visibility into per-message/per-stream abuse. Pair
> edge-containment **with** the app fix on every 🟡 row. A per-message flaw that emits no event
> is a 🔴 until Phase 3's recommended fix adds the `track()` call.

Use **only** the SecureNow building blocks below. Never invent CLI flags, event names, or SQL
columns.

### 4a. Instrumentation (what realtime detections feed on)

The upgrade request is captured automatically once the app runs under `securenow run` /
`securenow/register` / `securenow init` — you get **one span per connection** (status code,
method, path, client IP, ASN). That covers the **connection layer**: handshake floods, failed
upgrades, deprecated WS paths. **Everything inside the socket needs events** — use
`securenow/events` `track()` (never throws) at the enforcement points. **Reuse these exact event
names** — rules match the literal strings:

```js
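const { track } = require('securenow/events');
// In the snippets below, 'a|b|c' lists the allowed values: emit exactly one of them per event.
// userId and ip come from the connection's authenticated context.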

// A stream/socket opened (feeds duration/volume/concurrency detection — one event per stream,
// unlike traffic which only sees the upgrade):
track('api.stream.started', { userId, ip, attributes: { route: '/api/chat', type: 'sse|websocket|ai_stream|export_stream' } });

// A stream hit a server-side cap (any hit is high-signal — cost abuse / runaway client):
track('api.stream.limit_exceeded', { userId, ip, attributes: { route: '/api/chat', type: 'sse|websocket|ai_stream|export_stream', reason: 'duration|bytes|messages|tokens|queue' } });

// A per-message / subscribe / publish / join authorization check failed (BOLA/BFLA/cross-tenant
// over the socket — the core realtime signal):
track('api.websocket.authz_failed', { userId, ip, attributes: { route: '/ws', channel: '<hash_or_name>', action: 'subscribe|publish|join|message' } });
```

These three are the **native realtime taxonomy**. Where a flaw is broader-API (a spoofed
forwarding header on the upgrade, a proxy-trust issue, a generic rate-limit rejection on the
connect), reuse the existing API events rather than inventing new ones:

```js
// The app's own connection rate limit rejected an upgrade:
track('api.ratelimit.exceeded', { userId, ip, attributes: { route: '/ws', limit: '10', window: '1m' } });

// A spoofed X-Forwarded-For / X-Forwarded-Host on the upgrade was rejected:
track('api.proxy.spoof_detected', { ip, attributes: { route: '/ws', header: 'x-forwarded-for', reason: 'untrusted_forwarded_header' } });
```

> Hash or omit any PII before it becomes an attribute value (room names that embed a user id,
> tokens, emails, message bodies) — see the Phase 1 **telemetry privacy & redaction** check.
> Attributes feed detection; they must not become a new leak path. In particular, **never** put a
> URL query string token into an attribute.
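
One way to follow this rule is sketched below, assuming Node's built-in `crypto` module and a secret key you supply yourself. `hashAttr` and `routeOnly` are hypothetical helper names, not part of the SecureNow SDK.

```js
const nodeCrypto = require('crypto');

// Hypothetical helper: keyed SHA-256 so the same room always maps to the same opaque
// value (rules can still GROUP BY it) without exposing the raw name.
// The key is an assumption: load it from your own secret store, not from source control.
function hashAttr(value, key) {
  return nodeCrypto
    .createHmac('sha256', key)
    .update(String(value))
    .digest('hex')
    .slice(0, 16); // 64 bits is enough to group by; it is not meant to be reversible
}

// Hypothetical helper: drop the query string so a ?token=... never reaches an attribute.
function routeOnly(url) {
  return String(url).split('?')[0];
}
```

For example, `attributes: { route: routeOnly(req.url), channel: hashAttr(roomName, key) }` keeps the `channel` attribute groupable while the room name itself never leaves the app.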

Recommended realtime event taxonomy — rules match these **exact strings**:

| Event | Emit when |
|---|---|
| `api.stream.started` | a WebSocket/SSE/AI/export stream opens (feeds duration / volume / concurrency detection) |
| `api.stream.limit_exceeded` | a stream hits a duration / bytes / messages / tokens / queue cap |
| `api.websocket.authz_failed` | a per-message / subscribe / publish / join authz check fails |
| `api.ratelimit.exceeded` | the app's own connection (or message) rate limit rejects an upgrade/message |
| `api.proxy.spoof_detected` | a spoofed forwarding/Host header on the upgrade was rejected |

Custom `attributes` become queryable as `attributes_string['<key>']` (e.g.
`attributes_string['reason']`, `attributes_string['type']`, `attributes_string['action']`).
Ingest enriches every IP with **ASN/org** (`client.asn`, `client.as_org`) — enabling
botnet/datacenter-origin detection on connection floods with no extra code.
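
The two stream events can be emitted from a single guard around the write path. The following is a sketch under assumptions: `createStreamGuard`, the `caps` object and the injected `now` clock are hypothetical names, and `track` is injected so the code runs without the SDK (in the app it would be the `track` exported by `securenow/events`).

```js
// Sketch only: createStreamGuard and caps are hypothetical names.
function createStreamGuard({ track, userId, ip, route, type, caps, now = Date.now }) {
  const startedAt = now();
  let bytes = 0;
  let messages = 0;
  let tripped = false;

  // One event per stream opened (traffic alone only shows the upgrade).
  track('api.stream.started', { userId, ip, attributes: { route, type } });

  // Call once per outgoing chunk; a false return means the app should close the stream.
  return function onChunk(chunkBytes) {
    if (tripped) return false;
    bytes += chunkBytes;
    messages += 1;

    let reason = null;
    if (now() - startedAt > caps.maxMs) reason = 'duration';
    else if (bytes > caps.maxBytes) reason = 'bytes';
    else if (messages > caps.maxMessages) reason = 'messages';
    if (!reason) return true;

    tripped = true; // emit once per stream, not once per rejected chunk
    track('api.stream.limit_exceeded', { userId, ip, attributes: { route, type, reason } });
    return false;
  };
}
```

Emitting the breach only once per stream keeps the "any hit is signal" rule meaningful: one runaway stream produces one event rather than a burst.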

### 4b. Detection rules — SQL conventions

Two query shapes. Both **must** keep the tenant scope and **must** select an `ip` column (per-IP
aggregation is what remediation/auto-block keys on). **The tenant-scope column differs by table**
— using the wrong one fails with `UNKNOWN_IDENTIFIER`:

- **logs/events** (`signoz_logs.distributed_logs_v2`) → `resources_string['service.name'] IN (__USER_APP_KEYS__)`
- **traces/HTTP** (`signoz_traces.distributed_signoz_index_v3`) → `` `resource_string_service$$name` IN (__USER_APP_KEYS__) ``

When grouping by `ip`, add `HAVING ip != '' AND …` so rows with no client IP don't aggregate into
an empty-key bucket. Traffic columns proven available: `response_status_code`, `kind` (server
span = 2), `ts_bucket_start`, `attributes_string['http.target']`, the `client_ip` coalesce below.
Confirm any other column with a `--mode dry_run` before relying on it.
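
These conventions are mechanical enough to check before a rule is created. The sketch below is a hypothetical pre-flight check (`lintRuleSql` is not part of the `securenow` CLI) that looks for the table-specific tenant scope, an `ip` column, and the empty-IP guard. It uses plain string and regex matching, so it complements the `--mode dry_run` validation and does not replace it.

```js
// Sketch only: lintRuleSql is a hypothetical helper, not a securenow command.
const SCOPES = {
  'signoz_logs.distributed_logs_v2': "resources_string['service.name'] IN (__USER_APP_KEYS__)",
  'signoz_traces.distributed_signoz_index_v3': '`resource_string_service$$name` IN (__USER_APP_KEYS__)',
};

function lintRuleSql(sql) {
  const problems = [];

  // The tenant-scope predicate must match the table being queried.
  const table = Object.keys(SCOPES).find((t) => sql.includes(t));
  if (!table) problems.push('unknown or missing table');
  else if (!sql.includes(SCOPES[table])) problems.push(`missing tenant scope for ${table}`);

  // Remediation keys on a column literally named ip.
  if (!/\bAS\s+ip\b/i.test(sql)) problems.push('no column selected AS ip');

  // Grouping by ip needs the empty-key guard.
  if (/GROUP\s+BY[^;]*\bip\b/i.test(sql) && !/HAVING\s+ip\s*!=\s*''/i.test(sql)) {
    problems.push("grouped by ip without HAVING ip != ''");
  }
  return problems;
}
```

An empty array means the three conventions are present; any other result lists what to fix before running the dry-run.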

> **Key realtime rule:** traffic SQL only sees the **upgrade** (one span per connection). Use it
> for **connection-layer** detection (handshake floods, failed upgrades, deprecated WS paths).
> For **per-message / per-stream** abuse, query the **logs/events** table for the `api.stream.*`
> / `api.websocket.*` events — a long-lived socket produces no further spans, so event-based
> detection is the only way to see inside it.

**Traffic-based — WebSocket/SSE upgrade flood (single IP) — the handshake IS an HTTP request:**

```sql
WITH coalesce(nullIf(attributes_string['http.client_ip'], ''), nullIf(attributes_string['net.peer.ip'], ''), nullIf(attributes_string['network.peer.address'], '')) AS client_ip
SELECT client_ip AS ip,
       count() AS upgrades
FROM signoz_traces.distributed_signoz_index_v3
WHERE `resource_string_service$$name` IN (__USER_APP_KEYS__)
  AND timestamp >= now64(9) - INTERVAL 5 MINUTE
  AND ts_bucket_start >= toUInt64(toUnixTimestamp(now() - INTERVAL 5 MINUTE)) - 1800
  AND kind = 2
  AND (attributes_string['http.target'] LIKE '/ws%' OR attributes_string['http.target'] LIKE '/socket.io%' OR attributes_string['http.target'] LIKE '%/sse%' OR attributes_string['http.target'] LIKE '%/stream%' OR attributes_string['http.target'] LIKE '%/events%')
GROUP BY ip
HAVING ip != '' AND upgrades >= 100
```

**Traffic-based — failed upgrade / handshake-rejection spike (auth/origin failures, probing):**

```sql
WITH coalesce(nullIf(attributes_string['http.client_ip'], ''), nullIf(attributes_string['net.peer.ip'], ''), nullIf(attributes_string['network.peer.address'], '')) AS client_ip
SELECT client_ip AS ip,
       countIf(response_status_code IN ('400','401','403','426','429')) AS failed_upgrades,
       count() AS total
FROM signoz_traces.distributed_signoz_index_v3
WHERE `resource_string_service$$name` IN (__USER_APP_KEYS__)
  AND timestamp >= now64(9) - INTERVAL 15 MINUTE
  AND ts_bucket_start >= toUInt64(toUnixTimestamp(now() - INTERVAL 15 MINUTE)) - 1800
  AND kind = 2
  AND (attributes_string['http.target'] LIKE '/ws%' OR attributes_string['http.target'] LIKE '/socket.io%' OR attributes_string['http.target'] LIKE '%/sse%' OR attributes_string['http.target'] LIKE '%/stream%')
GROUP BY ip
HAVING ip != '' AND failed_upgrades >= 30
```

**Events-based — per-message / subscribe authz failures (BOLA/BFLA/cross-tenant over socket):**

```sql
SELECT
  attributes_string['http.client_ip'] AS ip,
  attributes_string['action']         AS action,
  attributes_string['channel']        AS channel,
  count() AS authz_failures
FROM signoz_logs.distributed_logs_v2
WHERE resources_string['service.name'] IN (__USER_APP_KEYS__)
  AND attributes_string['event.type'] = 'api.websocket.authz_failed'
  AND timestamp >= now() - INTERVAL 15 MINUTE
GROUP BY ip, action, channel
HAVING ip != '' AND authz_failures >= 3
```

**Events-based — stream cap breaches (cost amplification / runaway client) — any hit is signal:**

```sql
SELECT
  attributes_string['http.client_ip'] AS ip,
  attributes_string['type']           AS stream_type,
  attributes_string['reason']         AS reason,
  count() AS breaches
FROM signoz_logs.distributed_logs_v2
WHERE resources_string['service.name'] IN (__USER_APP_KEYS__)
  AND attributes_string['event.type'] = 'api.stream.limit_exceeded'
  AND timestamp >= now() - INTERVAL 30 MINUTE
GROUP BY ip, stream_type, reason
HAVING ip != '' AND breaches >= 1
```

**Events-based — stream-open concurrency / rate anomaly (one user/IP opening many streams):**

```sql
SELECT
  attributes_string['http.client_ip'] AS ip,
  attributes_string['type']           AS stream_type,
  count() AS streams_opened
FROM signoz_logs.distributed_logs_v2
WHERE resources_string['service.name'] IN (__USER_APP_KEYS__)
  AND attributes_string['event.type'] = 'api.stream.started'
  AND timestamp >= now() - INTERVAL 5 MINUTE
GROUP BY ip, stream_type
HAVING ip != '' AND streams_opened >= 20
```

The other realtime events follow the **same logs-table shape**: swap the `event.type` filter and
the threshold. For `api.ratelimit.exceeded` on a WS route, ≥50 in 15 minutes means the app's
connection limit is being hammered. For `api.proxy.spoof_detected`, ≥1 means forwarded-header
spoofing on the upgrade (notify + investigate).
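
Because only the event name, window and threshold change, the shape can be expressed as a small template. This is a sketch, not a SecureNow tool: `eventRuleSql` is a hypothetical generator, the SQL text mirrors the events-based rules in this section, and grouping by the `route` attribute is an assumption you can swap for any attribute your events carry.

```js
// Sketch only: eventRuleSql is a hypothetical generator for the logs-table rule shape.
function eventRuleSql({ eventType, minutes, threshold }) {
  // Refuse anything that is not a plain dotted event name, so the value is safe to inline.
  if (!/^[a-z0-9_.]+$/.test(eventType)) throw new Error('unexpected event type');
  if (!Number.isInteger(minutes) || !Number.isInteger(threshold)) {
    throw new Error('minutes and threshold must be integers');
  }
  return [
    'SELECT',
    "  attributes_string['http.client_ip'] AS ip,",
    "  attributes_string['route']          AS route,",
    '  count() AS hits',
    'FROM signoz_logs.distributed_logs_v2',
    "WHERE resources_string['service.name'] IN (__USER_APP_KEYS__)",
    `  AND attributes_string['event.type'] = '${eventType}'`,
    `  AND timestamp >= now() - INTERVAL ${minutes} MINUTE`,
    'GROUP BY ip, route',
    `HAVING ip != '' AND hits >= ${threshold}`,
  ].join('\n');
}
```

Calling `eventRuleSql({ eventType: 'api.ratelimit.exceeded', minutes: 15, threshold: 50 })` yields the connection-limit rule; the output still needs the usual dry-run before it is relied on.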

**Injection inside a socket message (catalog item that maps to C/B):** SecureNow's system SQLi/
XSS/RCE **signature rules** match on **request attributes** (e.g. `http.target`, body where
captured). A payload carried inside a WebSocket **message** is normally **not** a span attribute,
so the signature rules will **not** see it — treat injection-over-socket as **event-driven**:
validate the message server-side and emit `api.websocket.authz_failed` (or `api.schema.rejected`
if you adopt it) on rejection, then alert on that event. Note this honestly in the matrix rather
than claiming signature coverage you don't have.

Useful attributes/columns: `event.type`, `http.client_ip`, `http.target`, `response_status_code`,
`kind`, `client.asn`, `client.as_org`, and your realtime attributes (`route`, `type`, `reason`,
`action`, `channel`).

**Ready-to-copy command unit (required for every rule).** Each detection must be emitted as a
**complete, copyable unit** — never a fragment. For each rule produce, in order: (1) the SQL, (2)
a line saving it to `rules/<name>.sql`, (3) the full `securenow alerts rules create …` command,
(4) the dry-run test. In Markdown each is its own fenced block so it copies cleanly; the exact
flags MUST match `securenow alerts rules --help` from Phase 0.5. Save each rule's SQL to
`rules/<name>.sql` so `--sql @rules/<name>.sql` resolves. Note pre-existing / system rules
(from Phase 0) instead of duplicating them. **Tag every rule `test-first` or `prod-ready`** (per
§4b-bis): FP-prone rules ship `--mode test` first and carry the `--mode test` → observe (3–7 days)
→ `--mode prod` promotion line; high-precision rules are `prod-ready`. Example unit:

```sql
-- rules/ws-upgrade-flood.sql
WITH coalesce(nullIf(attributes_string['http.client_ip'], ''), nullIf(attributes_string['net.peer.ip'], ''), nullIf(attributes_string['network.peer.address'], '')) AS client_ip
SELECT client_ip AS ip,
       count() AS upgrades
FROM signoz_traces.distributed_signoz_index_v3
WHERE `resource_string_service$$name` IN (__USER_APP_KEYS__)
  AND timestamp >= now64(9) - INTERVAL 5 MINUTE
  AND ts_bucket_start >= toUInt64(toUnixTimestamp(now() - INTERVAL 5 MINUTE)) - 1800
  AND kind = 2
  AND (attributes_string['http.target'] LIKE '/ws%' OR attributes_string['http.target'] LIKE '/socket.io%' OR attributes_string['http.target'] LIKE '%/sse%' OR attributes_string['http.target'] LIKE '%/stream%' OR attributes_string['http.target'] LIKE '%/events%')
GROUP BY ip
HAVING ip != '' AND upgrades >= 100
```

```bash
securenow alerts rules create \
  --name "Realtime: WebSocket upgrade flood (single IP)" \
  --sql @rules/ws-upgrade-flood.sql \
  --apps <APP_KEY> \
  --severity high \
  --schedule "*/5 * * * *" \
  --nlp "single IP opening 100+ websocket/sse upgrades in 5 minutes"

securenow alerts rules test <RULE_ID> --mode dry_run --wait     # validate before it runs live
# test-first (FP-prone connection-flood heuristic): create detect-only, observe, then promote
securenow alerts rules update <RULE_ID> --mode test            # detect-only for 3–7 days
securenow alerts rules update <RULE_ID> --mode prod            # promote once tuned (arms mitigation)
```
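
When a report contains many rules, the `create` command can be rendered from a small descriptor so no unit ends up as a fragment. The sketch below assumes a hypothetical `renderRuleUnit` helper; it uses only the flags that appear in the example unit above, and those still have to be confirmed against `securenow alerts rules --help` in Phase 0.5.

```js
// Sketch only: renderRuleUnit is a hypothetical helper, not part of the securenow CLI.
function renderRuleUnit({ name, title, appKey, severity, schedule, nlp }) {
  // The rule name doubles as the file name under rules/, so keep it to a safe slug.
  if (!/^[a-z0-9-]+$/.test(name)) throw new Error('rule name must be a lowercase slug');

  // Quote a value for a bash double-quoted string.
  const q = (s) => '"' + String(s).replace(/(["\\$`])/g, '\\$1') + '"';

  return [
    'securenow alerts rules create \\',
    `  --name ${q(title)} \\`,
    `  --sql @rules/${name}.sql \\`,
    `  --apps ${appKey} \\`,
    `  --severity ${severity} \\`,
    `  --schedule ${q(schedule)} \\`,
    `  --nlp ${q(nlp)}`,
  ].join('\n');
}
```

Tying the file path to the rule name keeps `--sql @rules/<name>.sql` in step with the file the SQL was saved to.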

### 4b-bis. Test mode for false-positive-prone rules

Alert rules have a lifecycle **mode** (`test` = **detect-only, NO mitigation**; `prod` = full, with
mitigation / auto-action armed) plus a **status** (`Active | Disabled | Paused`). Manage with:

```bash
securenow alerts rules update <RULE_ID> --mode test     # detect-only: fires notifications, takes NO action
# …observe real traffic for several days; tune the threshold; add securenow fp exclusions for any FPs…
securenow alerts rules update <RULE_ID> --mode prod      # promote: arm the mitigation / auto-action
securenow alerts rules update <RULE_ID> --status Paused  # or --enable / --disable / --pause shortcuts
```

**Rule of thumb:** any detection that can **false-positive** — heuristic thresholds (connection /
upgrade flood, failed-upgrade spike, stream-open concurrency/rate, message-flood counts), broad
patterns, anomaly / volume rules, anything tuned to YOUR realtime traffic (legit reconnecting
clients, internal stream consumers, a single power user opening many legitimate streams) — must
ship in **`--mode test` first**. Run it detect-only for **3–7 days of real traffic**, review what it
flags, raise/lower the threshold and add `securenow fp` exclusions for legitimate hits, then
`--mode prod` to arm mitigation. Only **high-precision** rules (exploit-signature SQLi/XSS/RCE
matches, exact-match IoCs, known-bad ASN hits, and any-hit-is-signal events like
`api.stream.limit_exceeded` / `api.proxy.spoof_detected`) may go straight to `prod`. In the report,
**tag each rule `test-first` or `prod-ready`** and say why. (`securenow alerts rules test <id>
--mode dry_run --wait` is the separate one-off *query* validation — run it before either.)
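
The tagging decision reduces to an allowlist of high-precision rule classes, with everything else defaulting to `test-first`. A minimal sketch follows; the class labels (`exploit_signature`, `exact_ioc`, `known_bad_asn`, `any_hit_event`) are hypothetical names for the groups described above.

```js
// Sketch only: the class labels are hypothetical names for the high-precision groups.
const PROD_READY = new Set(['exploit_signature', 'exact_ioc', 'known_bad_asn', 'any_hit_event']);

function ruleMode(ruleClass) {
  // Anything not explicitly high-precision ships detect-only first.
  return PROD_READY.has(ruleClass) ? 'prod-ready' : 'test-first';
}
```

Defaulting to `test-first` means a rule class nobody has classified yet cannot arm mitigation by accident.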

### 4c. Mitigation commands (the only allowed remediation surface)

For realtime abuse, SecureNow **contains the actor at the edge by intercepting the upgrade
request** (the only thing it can touch on a socket); the **app control** removes the underlying
weakness (origin check, per-message authz, channel ownership, duration/byte/token caps,
backpressure). Always pair them — the app fix is **primary** for every per-message/per-stream
row.

Use the **complete SecureNow mitigation toolbox** below. Once a threat is confirmed, **choose the
narrowest effective mitigation(s) from ALL of these** and combine them (e.g. rate-limit the upgrade
route + block the worst connectors + challenge a NAT egress opening sockets). Re-check every
command/flag against the installed SDK in Phase 0.5 (`securenow <cmd> --help`); annotate
`# requires securenow >= <ver>` if absent. Scope by **app / env / route / method / IP / duration**
to avoid hitting real users. For this domain, **connection-rate limit / block on the upgrade
request** is the workhorse edge lever (the handshake is the only thing SecureNow can intercept on a
socket) — but pair it with the app control on every per-message/per-stream row.

| # | Mitigation | Command (ready-to-copy) | Use / scope |
|---|---|---|---|
| 1 | **Free firewall (network)** | `securenow firewall enable --app <APP_KEY> --env production` · `securenow run --firewall-only` · test `securenow firewall test-ip <ip> --path /ws --method GET` | 500k+ known-bad IPs, hourly refresh; drop scanners & known-bad connectors before the upgrade reaches the app. No app change. |
| 2 | **Exploit-signature instant block** | enable the `instant` config on the system SQLi/XSS/RCE signature rules (dashboard / MCP `securenow_alert_rule_instant_update`); custom rule → create with `--execution-mode instant` | synchronous ~2.6s block of the matching request — only matches when the payload is in a captured **request attribute** (handshake URL/headers), **not** inside socket messages; note the limitation. Don't duplicate pattern SQL. |
| 3 | **IP block — global** | `securenow blocklist add <ip> --app <APP_KEY> --env production --reason "..."` | confirmed-malicious connector, all routes — drops its upgrades, killing the socket source. |
| 4 | **IP block — scoped to route (+ method)** | `securenow blocklist add <ip> --route /ws* --mode prefix --method ALL --app <APP_KEY> --env production --reason "..."` (`--mode exact\|prefix\|regex`, `--method GET\|POST\|…\|ALL`) | block an IP only on the WS/SSE upgrade paths; least collateral. |
| 5 | **IP block — temporary / time-boxed** | `securenow blocklist add <ip> --duration 24h --reason "..."` (`30m`,`24h`,`7d`) · reverse `securenow blocklist unblock <id> --reason "..."` | auto-expiring containment of an abusive connector; audit-preserving unblock. |
| 6 | **Rate limit — per IP** | `securenow ratelimit add <ip> --limit 100 --window 1m --duration 24h --reason "..."` | throttle one abusive client across the app. |
| 7 | **Rate limit — per route (all clients, per-IP budget)** | `securenow ratelimit add --route /ws --mode prefix --method GET --limit 60 --window 1m --key-by ip` | cap the upgrade/handshake route for everyone, budgeted per IP — connection floods, reconnect storms. |
| 8 | **Rate limit — per route + IP** | `securenow ratelimit add <ip> --route /ws --mode prefix --method GET --limit 10 --window 1m --duration 24h` · NL `securenow ratelimit from-text "rate limit /ws to 10/min for 24h" --yes` · test `securenow ratelimit test <ip> --path /ws --method GET` | precise throttle of one client on the upgrade route — reversible, auto-expires. |
| 9 | **CAPTCHA / proof-of-work challenge** | `securenow challenge add --route /ws --difficulty 18 --clearance 30m` (route-wide) **or** `securenow challenge add <ip> --route /ws --difficulty 18 --clearance 30m` · test `securenow challenge test <ip> --path /ws --method GET` | bot connection floods from **shared / NAT / CGNAT** egress — a human passes once, a script can't keep opening sockets. Prefer over a hard block when real users share the IP. |
| 10 | **Auto-block (risk-scored)** | `securenow automation defaults --yes` (≥95→7d, 90–94→72h, 85–89→24h) · custom `securenow automation create --conditions '[...]' --actions '[...]'` · preview `securenow automation dry-run <id>` | hands-off blocking by risk score on the upgrade source; actions include block / rate_limit / requireCaptcha. |
| 11 | **Session revocation** | `securenow revoke …` (SDK `securenow/sessions` `guard()` / `isRevoked()`) | a long-lived socket outliving a session revocation / role change — kill the stolen session, not the IP. |
| 12 | **Trusted IP (suppress)** | `securenow trusted add <ip> --label "Realtime worker / partner stream / monitor"` | stop false positives from internal stream consumers / partner integrations — suppresses detection **and** mitigation. NOT deny-by-default. |
| 13 | **Allowlist (deny-by-default)** | `securenow allowlist add <ip> --label "..." --reason "..."` ⚠️ once any entry exists, ONLY listed IPs reach the app | lockdown of an internal/admin-only socket surface. Never for a public app. |
| 14 | **False-positive exclusion** | `securenow fp create --conditions '[...]' --rule-scope this_rule --reason "..."` · `securenow fp mark <notification-id> <ip> --rule-scope this_rule` · preview `securenow fp dry-run --conditions '[...]'` | keep a noisy realtime rule quiet (legit reconnecting clients, internal stream consumers) without weakening it. |
| 15 | **App / config / code fix (primary for root cause)** | *described in the Code-Findings report, never auto-applied* | the actual fix: origin allowlist on upgrade, per-message authz, channel-ownership check, connection + concurrency caps, message-rate + byte + duration + token caps, backpressure, bounded fanout, token out of the URL. SecureNow contains the upgrade; the fix removes the weakness. |

**Choosing per threat** — by **confidence**: exploit-signature/exact IoC on the handshake →
instant-block or block; probable bot opening sockets on shared egress → **challenge**; noisy/
legit-mixed connection traffic (reconnects, internal consumers) → **rate-limit (test-mode first)**;
session compromise on a long-lived socket → **revoke**; known-good stream infra → **trusted / fp**.
By **blast radius**: always scope to the narrowest `route`/`method`/`IP`/`duration` that stops the
abuse; on NAT/CGNAT/shared IPs prefer challenge/rate-limit over a hard block. Always pair an edge
mitigation with the **app/config fix** (Code-Findings report) — for every per-message/per-stream
row the app control is **primary** and SecureNow is **upgrade-request containment** only.
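
The confidence mapping above can be read as a lookup table. The sketch below is one interpretation, not SecureNow's own logic: `chooseMitigation`, the signal labels and the return shape are hypothetical, the numbers refer to the toolbox rows, and treating `sharedEgress` as a downgrade from a hard block to a challenge is an assumption drawn from the blast-radius guidance.

```js
// Sketch only: signal labels and return shape are hypothetical; numbers are toolbox rows.
const BY_SIGNAL = new Map([
  ['exploit_signature', { toolbox: [2, 3, 4, 5], action: 'instant-block or block' }],
  ['probable_bot', { toolbox: [9], action: 'challenge' }],
  ['noisy_mixed_traffic', { toolbox: [7, 8], action: 'rate-limit (test-mode first)' }],
  ['session_compromise', { toolbox: [11], action: 'revoke' }],
  ['known_good_infra', { toolbox: [12, 14], action: 'trusted / fp' }],
]);

function chooseMitigation({ signal, sharedEgress = false }) {
  const choice = BY_SIGNAL.get(signal) || null; // null: no automatic choice, decide manually
  // On NAT/CGNAT/shared IPs, prefer a challenge over a hard block.
  if (choice && sharedEgress && choice.action.includes('block')) {
    return { toolbox: [9], action: 'challenge' };
  }
  return choice;
}
```

Whatever the lookup returns still has to be scoped to the narrowest route, method, IP and duration, and paired with the app fix (row 15).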

Tier guidance: **the firewall/rate-limit/challenge/block all act on the upgrade request** — they
can stop a source from *opening new sockets*, but they **cannot** reach inside a socket that is
already established. So edge containment is for **connection-level** abuse (floods, reconnect
storms, scanner sources); **per-message / per-stream** abuse must be stopped by the **app
control** (the server closing the offending socket and emitting the event), with SecureNow
**rate-limiting/blocking the source's future upgrades** as containment. Prefer **challenge** over
a hard block when the abusing IP may be shared/NAT/CGNAT carrying real users. Recommend
**notify-only** for false-positive-prone signals (legit high-frequency reconnects, internal load
tests, a single power user opening many legitimate streams) — say so explicitly with the runbook
command the human runs after confirming.

### 4d. Testing every detection and mitigation

Only test against apps/environments the user owns; prefer `--env local`/staging. For synthetic
source IPs use TEST-NET ranges (`192.0.2.0/24`, `198.51.100.0/24`, `203.0.113.0/24`).

```bash
# Synthetic per-message authz failures — exercise the cross-channel/BOLA-over-socket rule:
for i in $(seq 1 5); do
  securenow event send api.websocket.authz_failed --ip 203.0.113.60 \
    --attrs route=/ws,channel=tenant-other,action=subscribe,test=true
done

# Synthetic stream-cap breach — exercise the cost-amplification rule (any hit fires):
securenow event send api.stream.limit_exceeded --ip 203.0.113.61 \
  --attrs route=/api/chat,type=ai_stream,reason=tokens,test=true

# Synthetic stream-open burst — exercise the concurrency/rate anomaly rule:
for i in $(seq 1 25); do
  securenow event send api.stream.started --ip 203.0.113.62 \
    --attrs route=/api/chat,type=sse,test=true
done

# Synthetic connection-limit ceiling on the upgrade:
for i in $(seq 1 60); do
  securenow event send api.ratelimit.exceeded --ip 203.0.113.63 \
    --attrs route=/ws,limit=10,window=1m,test=true
done

# Validate a rule query without waiting for the schedule:
securenow alerts rules test <RULE_ID> --mode dry_run --wait

# Traffic-based upgrade-flood rule — generate spans, then check the pipeline:
securenow test-span "threat-model.realtime.smoke"
securenow forensics "websocket/sse upgrade requests and 4xx by IP in the last hour" --env production

# Mitigation verification (the upgrade route is what the edge controls):
securenow ratelimit test 203.0.113.60 --path /ws --method GET
securenow challenge test 203.0.113.60 --path /ws --method GET
securenow firewall test-ip 203.0.113.60 --app <APP_KEY> --env production

# Confirm + clean up:
securenow notifications list --limit 10
securenow blocklist list      # then: securenow blocklist unblock <id> --reason "threat-model test"
securenow challenge list      # then: securenow challenge remove <id>
```

Every 🟢/🟡 threat row in the report must have a concrete test recipe (commands + expected
outcome: which rule fires, which notification appears, what the mitigation does on the upgrade).
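
If the test recipes are scripted, a guard can refuse to send synthetic events for anything outside the TEST-NET ranges. A minimal sketch, assuming a hypothetical `isTestNet` helper; the three prefixes are the RFC 5737 documentation blocks listed at the top of this section.

```js
// Sketch only: isTestNet is a hypothetical guard for test scripts.
// The prefixes are the RFC 5737 /24 documentation ranges.
const TEST_NET_PREFIXES = ['192.0.2.', '198.51.100.', '203.0.113.'];

function isTestNet(ip) {
  const text = String(ip);
  const parts = text.split('.');
  // Require a well-formed dotted quad before checking the prefix.
  const validV4 = parts.length === 4 && parts.every((p) => /^\d{1,3}$/.test(p) && Number(p) <= 255);
  return validV4 && TEST_NET_PREFIXES.some((prefix) => text.startsWith(prefix));
}
```

Checking this before each `securenow event send … --ip <ip>` call keeps a typo from attributing synthetic abuse to a real address.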

---

## Phase 5 — Write the FOUR deliverables (two tracks)

Write **four** files into `threat/17-realtime-websocket-sse/` — two per track, Markdown + HTML.
The two tracks **cross-link** each other. `<slug>` = `realtime-websocket-sse`.

- `realtime-websocket-sse-detection-mitigation.md` / `.html` — Track A: the operational runbook.
- `realtime-websocket-sse-code-findings.md` / `.html` — Track B: the code audit (findings only).

### 5a. Track A — Detection & Mitigation report — sections (both .md and .html), in order

1. **Executive summary** — stats line (threats modeled · covered · partial · gaps · rules to
   create · mitigations), top 3 **detectable** realtime risks for this stack, the installed
   `securenow` version + app key + firewall state, and a one-line OWASP API Top 10:2023 coverage
   note (this domain owns the realtime slice of API1/API4/API5/API8; auth/tenancy/AI-cost
   deferred to siblings).
2. **SDK & environment** — installed SDK version (from `node_modules/securenow`, Phase 0.5), app
   key(s), environment, firewall state, existing rules/automations/challenge rules (from Phase 0),
   and any **system signature rules** present.
3. **Threat → Detection → Mitigation matrix** — one row per modeled threat:
   `# | Threat | OWASP/CWE | Coverage 🟢/🟡/🔴 | Detection rule | Signal (threshold+window) | Schedule | Sev | Mode (test-first/prod-ready) | Mitigation`.
   Severity ∈ {critical, high, medium, low}. The **Mode** cell tags each rule `test-first` or
   `prod-ready` per the §4b-bis rule of thumb (FP-prone heuristic/anomaly/volume rules →
   `test-first`; high-precision signature/exact-IoC/any-hit-is-signal events → `prod-ready`). The
   **Mitigation** cell must pick **specific, scoped** mitigation(s) from the §4c toolbox (by
   toolbox # / name, scoped to route/method/IP/duration as relevant) — never a generic "block the
   IP." Then the **"Out of scope"** N/A list and the **deferred-to-sibling** rows (auth/tenancy/
   AI-cost, linked to `../01-authentication/`, `../03-tenant-isolation/`, `../25-ai-llm-features/`).
   Cross-link gap/instrumentation rows to the relevant **code finding** in Track B.
4. **Detection rules to create** — each as the **ready-to-copy command unit** from Phase 4
   (SQL → save to `rules/<name>.sql` → full `securenow alerts rules create …` → dry-run test).
   **Mark each rule `test-first` or `prod-ready`**; for `test-first` rules include the
   `--mode test` → observe (3–7 days) → `--mode prod` promotion step (per §4b-bis).
   Injection-class rows reference the **system signature rules + `instant.block`** (with their
   in-socket limitation), not duplicate SQL. Note rules that already exist instead of duplicating.
5. **Instrumentation the detections need** — only the `track('api.stream.*' | 'api.websocket.*')`
   events the rules above consume, each as a copyable snippet; point to the **code-findings
   report** for *where* (file:line) to add them. Make the "one upgrade = one span; inside the
   socket needs events" point explicitly.
6. **Mitigation mechanisms** — render the **full §4c toolbox table** (all 15 rows: free firewall ·
   exploit-signature `instant.block` with its in-socket limit · block [global / route (+ method) /
   temporary] · rate-limit [IP / route / IP+route] · challenge · auto-block · revoke · trusted ·
   allowlist · fp · app-config fix) + the **"Choosing per threat"** guidance + per-threat
   ready-to-copy mitigation command + reversibility. Make explicit that the **app control is
   primary** for every per-message/per-stream row and SecureNow is **upgrade-request containment**
   only.
7. **Action plan (copy-paste, ordered)** — ① firewall + signature instant-block, ② add the
   connection rate limit + `api.stream.*` / `api.websocket.*` instrumentation, ③ create rules —
   **FP-prone rules in `--mode test` (detect-only)**, high-precision rules in `--mode prod`,
   ④ enable automations/challenge on the upgrade route, ⑤ test, ⑥ verify, ⑦ **promote the
   `test-first` rules to `--mode prod` after N days (3–7) of clean observation** (add `securenow fp`
   exclusions for any FPs first), ⑧ schedule the app/config fixes (origin check, per-message authz,
   channel ownership, caps, backpressure) from the code report. Real commands only, `<APP_KEY>`
   substituted.
8. **Testing & validation** — per-rule recipe: `securenow event send …` / `test-span` / dry-run +
   expected outcome + cleanup (TEST-NET IPs `192.0.2.0/24`, `198.51.100.0/24`, `203.0.113.0/24`).
9. **Response runbooks** — per notification type: confirm TP → respond command (copy) → reverse
   command (copy); and — for per-message/per-stream signals — the note that the app must close the
   offending socket (SecureNow blocks the *next* upgrade, not the open socket).
10. **Known gaps & SecureNow feature requests** — each 🔴: why not coverable (per-message content
    inspection inside an established socket, backpressure enforcement, mid-stream single-message
    block), interim fix (link to Track B), and the *"contact the SecureNow team"* line.
11. **Appendix** — resolved SDK/CLI version (Phase 0.5), app key, environment, rule IDs created,
    date, link to the code-findings report.

### 5b. Track B — Code Findings & Recommendations report — sections (both .md and .html), in order

State at top: *"Findings only — no application code was modified."*

1. **Executive summary** — findings by severity (critical/high/med/low), top 3 code risks for this
   stack, one-paragraph posture verdict.
2. **Surface & inventory** — the Phase 1 realtime/streaming inventory: connection catalog +
   per-connection controls table + channel/room↔tenant boundary map + credential-in-URL list +
   proxy/transport posture + telemetry redaction status.
3. **Threat catalog** — the exhaustive Phase 2 catalog (groups A–J), each tagged OWASP/CWE,
   modeled or explicit N/A.
4. **Code-level findings (audit)** — table
   `# | Location (file:line) | Threat | OWASP/CWE | Sev | Issue | Recommended fix`, each with the
   quoted 1–8 line snippet and the **described** fix (never applied).
5. **Strengths** — controls already present and correct (origin allowlist on upgrade, per-message
   authz, duration/byte caps, backpressure handled) — keep the posture honest.
6. **App / config fixes (primary remediation)** — the config/code changes that remove the root
   cause (origin allowlist, per-message authz, channel ownership, connection + concurrency caps,
   message/byte/duration/token caps, backpressure, bounded fanout, token out of the URL),
   described not applied, each linked to the Track-A detection row it backs.
7. **Instrumentation recommendations** — the `track('api.stream.*' | 'api.websocket.*')` calls to
   add and the exact file:line to add them, so the Track-A detection rules light up.
8. **Appendix** — files reviewed, resolved SDK version (Phase 0.5), date, link to the
   detection-mitigation report.

### 5c. HTML skeletons — two self-contained files (offline; inline CSS + copy JS; no network)

Both HTML files (Track A and Track B) share the same `<head>` (brand tokens + copy-button styles)
and the copy `<script>` at the end of `<body>`. Change only the `<title>`, the sidebar subtitle,
and the section content. Wrap **EVERY** command/SQL block as a `.cmd` (so it gets a Copy button).
Both files are fully self-contained: no CDN, no fonts, no external scripts.

**Shared skeleton (used by BOTH `…-detection-mitigation.html` and `…-code-findings.html`):**

```html
<!DOCTYPE html>
<html lang="en"><head>
<meta charset="utf-8" /><meta name="viewport" content="width=device-width, initial-scale=1" />
<title><!-- "Detection & Mitigation — Realtime / WebSocket / SSE — SecureNow" OR "Code Findings — Realtime / WebSocket / SSE — SecureNow" --></title>
<style>
  :root{--bg:#0f1419;--panel:#161c24;--panel2:#1b2330;--border:#26303d;--txt:#dbe3ec;--muted:#8b97a7;
    --accent:#3ea6ff;--accent2:#16c79a;--crit:#ff5c6c;--high:#ff9f43;--med:#f7c948;--low:#8b97a7;
    --ok:#16c79a;--info:#3ea6ff;--rev:#b388ff;}
  *{box-sizing:border-box}html{scroll-behavior:smooth}
  body{margin:0;background:var(--bg);color:var(--txt);font:15px/1.6 -apple-system,BlinkMacSystemFont,"Segoe UI",Roboto,Helvetica,Arial,sans-serif}
  a{color:var(--accent);text-decoration:none}
  code{background:#0b0f14;border:1px solid var(--border);border-radius:5px;padding:.08em .4em;font:13px/1.4 ui-monospace,"SF Mono",Menlo,Consolas,monospace;color:#9fe0c0}
  .wrap{display:grid;grid-template-columns:240px 1fr;max-width:1280px;margin:0 auto}
  nav{position:sticky;top:0;align-self:start;height:100vh;overflow:auto;padding:28px 18px;border-right:1px solid var(--border);background:var(--panel)}
  nav .brand{font-weight:700;font-size:15px;letter-spacing:.3px}nav .brand span{color:var(--accent)}
  nav .sub{color:var(--muted);font-size:12px;margin-bottom:22px}
  nav a{display:block;color:var(--muted);padding:7px 10px;border-radius:7px;font-size:13.5px}
  nav a:hover{background:var(--panel2);color:var(--txt)}
  main{padding:36px 40px 80px;min-width:0}
  header.top h1{margin:0 0 6px;font-size:26px}header.top p{margin:0;color:var(--muted)}
  .pill{display:inline-block;font-size:11px;font-weight:600;padding:3px 9px;border-radius:999px;border:1px solid var(--border);color:var(--muted);background:var(--panel)}
  .stats{display:grid;grid-template-columns:repeat(5,1fr);gap:14px;margin:26px 0 34px}
  .stat{background:var(--panel);border:1px solid var(--border);border-radius:12px;padding:16px 18px}
  .stat .n{font-size:26px;font-weight:700}.stat .l{color:var(--muted);font-size:12.5px;margin-top:2px}
  section{margin:0 0 40px}
  h2{font-size:18px;margin:0 0 14px;padding-bottom:8px;border-bottom:1px solid var(--border)}
  h2 .num{color:var(--accent);font-weight:700;margin-right:8px}
  table{width:100%;border-collapse:collapse;font-size:13.5px;background:var(--panel);border:1px solid var(--border);border-radius:12px;overflow:hidden}
  th,td{text-align:left;padding:11px 13px;border-bottom:1px solid var(--border);vertical-align:top}
  th{background:var(--panel2);color:var(--muted);font-weight:600;font-size:12px;text-transform:uppercase;letter-spacing:.4px}
  tr:last-child td{border-bottom:none}tr:hover td{background:#19212c}
  .rid{font:12px ui-monospace,Menlo,Consolas,monospace;color:#7fd1ff;white-space:nowrap}
  .b{display:inline-block;font-size:11px;font-weight:700;padding:2px 8px;border-radius:6px;white-space:nowrap}
  .b.crit{background:rgba(255,92,108,.15);color:var(--crit);border:1px solid rgba(255,92,108,.35)}
  .b.high{background:rgba(255,159,67,.13);color:var(--high);border:1px solid rgba(255,159,67,.32)}
  .b.med{background:rgba(247,201,72,.13);color:var(--med);border:1px solid rgba(247,201,72,.32)}
  .b.low{background:rgba(139,151,167,.13);color:var(--low);border:1px solid rgba(139,151,167,.32)}
  .c{display:inline-block;font-size:11px;font-weight:700;padding:2px 8px;border-radius:6px;white-space:nowrap}
  .c.cov{background:rgba(22,199,154,.13);color:var(--ok);border:1px solid rgba(22,199,154,.35)}
  .c.part{background:rgba(247,201,72,.13);color:var(--med);border:1px solid rgba(247,201,72,.32)}
  .c.gap{background:rgba(255,92,108,.15);color:var(--crit);border:1px solid rgba(255,92,108,.35)}
  .owasp,.cwe{display:inline-block;font:11px ui-monospace,Menlo,Consolas,monospace;color:var(--accent);border:1px solid rgba(62,166,255,.3);border-radius:6px;padding:1px 6px;white-space:nowrap}
  .cwe{color:var(--rev);border-color:rgba(179,136,255,.3)}
  .m{display:inline-block;font-size:11px;font-weight:600;padding:2px 8px;border-radius:6px;border:1px solid var(--border)}
  .m.block{color:var(--crit);border-color:rgba(255,92,108,.35)}.m.rate{color:var(--info);border-color:rgba(62,166,255,.35)}
  .m.challenge{color:var(--accent2);border-color:rgba(22,199,154,.35)}.m.firewall{color:var(--ok);border-color:rgba(22,199,154,.35)}
  .m.signature{color:var(--crit);border-color:rgba(255,92,108,.35)}.m.notify{color:var(--muted)}.m.appfix{color:var(--high);border-color:rgba(255,159,67,.35)}
  .card{background:var(--panel);border:1px solid var(--border);border-radius:12px;padding:18px 20px}
  .grid2{display:grid;grid-template-columns:1fr 1fr;gap:16px}
  pre{background:#0b0f14;border:1px solid var(--border);border-radius:10px;padding:14px 16px;overflow:auto;font:13px ui-monospace,Menlo,Consolas,monospace;color:#cfe8da;margin:0}
  .cmd{position:relative;margin:10px 0}
  .copy{position:absolute;top:8px;right:8px;font:11px ui-monospace,Menlo,Consolas,monospace;color:var(--muted);background:var(--panel2);border:1px solid var(--border);border-radius:6px;padding:3px 9px;cursor:pointer}
  .copy:hover{color:var(--txt);border-color:var(--accent)}.copy.done{color:var(--ok);border-color:var(--ok)}
  .flow{display:flex;flex-wrap:wrap;align-items:center;gap:8px;margin:6px 0 14px}
  .flow .step{background:var(--panel2);border:1px solid var(--border);border-radius:9px;padding:8px 12px;font-size:13px}.flow .arr{color:var(--accent);font-weight:700}
  .note{border-left:3px solid var(--high);background:rgba(255,159,67,.06);padding:10px 14px;border-radius:0 8px 8px 0;color:#e7d3bd;font-size:13.5px;margin:10px 0}
  footer{color:var(--muted);font-size:12px;border-top:1px solid var(--border);padding-top:18px;margin-top:30px}
  @media(max-width:880px){.wrap{grid-template-columns:1fr}nav{display:none}.stats,.grid2{grid-template-columns:1fr 1fr}main{padding:24px 18px}}
</style></head>
<body>
<div class="wrap">
  <nav>
    <div class="brand">Secure<span>Now</span></div>
    <div class="sub"><!-- "Detection & Mitigation · Realtime / WebSocket / SSE" OR "Code Findings · Realtime / WebSocket / SSE" --></div>
    <!-- one <a href="#…"> per section of THIS track (5a or 5b) -->
  </nav>
  <main>
    <header class="top"><h1><!-- report title for this track --></h1>
      <p><code><!-- app name / domain --></code> · <span class="pill">securenow <!-- installed version --></span></p></header>
    <div class="stats"><!-- 5 .stat cards; numbers MUST equal the table/finding counts of THIS track --></div>
    <!-- <section id="…"> blocks mirroring the Markdown sections of THIS track (5a or 5b) -->
    <footer>Generated by the SecureNow realtime / WebSocket / SSE threat-model prompt · <!-- date --> · securenow <!-- version --> · app <code><!-- APP_KEY --></code></footer>
  </main>
</div>
<script>
document.querySelectorAll('.copy').forEach(function(b){b.addEventListener('click',function(){
  var pre=b.parentElement.querySelector('pre'); if(!pre)return; var t=pre.innerText;
  function done(){b.textContent='Copied';b.classList.add('done');setTimeout(function(){b.textContent='Copy';b.classList.remove('done');},1500);}
  function fb(){var ta=document.createElement('textarea');ta.value=t;ta.style.position='fixed';ta.style.opacity='0';document.body.appendChild(ta);ta.focus();ta.select();var ok=false;try{ok=document.execCommand('copy');}catch(e){}document.body.removeChild(ta);if(ok)done();}
  if(navigator.clipboard&&navigator.clipboard.writeText){navigator.clipboard.writeText(t).then(done,fb);}else{fb();}
});});
</script>
</body></html>
```
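
One way to sanity-check the "no network" requirement on a generated report is a small standard-library scan. The sketch below is a heuristic and not part of SecureNow: `external_references` is a hypothetical helper name, and it deliberately errs on the strict side by flagging every `<link>`, every `src` that is not a `data:` URI, and any remote `url(...)` / `@import` inside inline CSS. Plain `<a href>` navigation links are allowed.

```python
from html.parser import HTMLParser
import re

# url(...) or @import that points at http(s):// or a protocol-relative //host.
_CSS_REMOTE = re.compile(r"""(?:@import\s+|url\(\s*)['"]?\s*(?:https?:)?//""", re.I)


class SelfContainedCheck(HTMLParser):
    """Collects markup that would make the report load something external."""

    def __init__(self):
        super().__init__()
        self.problems = []
        self._in_style = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "style":
            self._in_style = True
        if tag == "link":
            # Stylesheets, fonts and icons must be inlined instead.
            self.problems.append("<link> element")
        src = attrs.get("src")
        if src is not None and not src.startswith("data:"):
            self.problems.append(f"<{tag} src={src!r}>")
        if _CSS_REMOTE.search(attrs.get("style") or ""):
            self.problems.append(f"<{tag}> inline style loads a remote resource")

    def handle_endtag(self, tag):
        if tag == "style":
            self._in_style = False

    def handle_data(self, data):
        if self._in_style and _CSS_REMOTE.search(data):
            self.problems.append("<style> block loads a remote resource")


def external_references(html_text):
    """Return a list of findings; an empty list means nothing external was seen."""
    checker = SelfContainedCheck()
    checker.feed(html_text)
    checker.close()
    return checker.problems
```

Run against the skeleton above, this should return an empty list, since the skeleton has only an inline `<style>` and an inline `<script>`.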

**Track A** uses the **Detection & Mitigation** title/subtitle and section set 5a (Executive
summary · SDK & environment · matrix · rules to create · instrumentation · mitigation mechanisms ·
action plan · testing · response runbooks · known gaps · appendix). The 5 stat cards = threats
modeled · covered · partial · gaps · rules-to-create (or mitigations). **Track B** uses the
**Code Findings** title/subtitle and section set 5b (Executive summary · surface & inventory ·
threat catalog · code-level findings · strengths · app/config fixes · instrumentation
recommendations · appendix), and its 5 stat cards = findings critical · high · medium · low ·
strengths.
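
Because the stat cards must agree with the tables, it can help to derive them from the same rows rather than typing the numbers by hand. The following is a minimal sketch under stated assumptions: the row dictionaries, the `coverage` / `severity` keys, the card labels, and the `<div>` markup inside `.stat` are all illustrative (the skeleton's CSS only fixes the class names `.stat`, `.n`, and `.l`).

```python
from collections import Counter
import html


def track_a_stats(matrix_rows, rules_to_create):
    """Five Track A cards: modeled, covered, partial, gaps, rules to create."""
    by_cov = Counter(row["coverage"] for row in matrix_rows)
    return [
        ("Threats modeled", len(matrix_rows)),
        ("Covered", by_cov["cov"]),
        ("Partial", by_cov["part"]),
        ("Gaps", by_cov["gap"]),
        ("Rules to create", len(rules_to_create)),
    ]


def track_b_stats(findings, strengths):
    """Five Track B cards: findings by severity, then strengths."""
    by_sev = Counter(f["severity"] for f in findings)
    return [
        ("Critical", by_sev["crit"]),
        ("High", by_sev["high"]),
        ("Medium", by_sev["med"]),
        ("Low", by_sev["low"]),
        ("Strengths", len(strengths)),
    ]


def stat_card(label, number):
    """Render one card using the .stat / .n / .l classes from the skeleton."""
    return (
        f'<div class="stat"><div class="n">{number}</div>'
        f'<div class="l">{html.escape(label)}</div></div>'
    )
```

Joining `stat_card(label, n)` for each pair gives the content of the `<div class="stats">` placeholder, and the numbers cannot drift from the row counts they were computed from.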

**Every** SQL/command block in the **Detection & Mitigation** HTML uses the copyable wrapper:

```html
<div class="cmd"><button class="copy" type="button">Copy</button><pre>securenow alerts rules create \
  --name "..." --sql @rules/&lt;name&gt;.sql --apps &lt;APP_KEY&gt; --severity high \
  --schedule "*/5 * * * *" --nlp "..."</pre></div>
```
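
The wrapper above escapes `<name>` and `<APP_KEY>` as `&lt;…&gt;` because they sit inside HTML. A minimal sketch of producing that wrapper from a raw command string follows; `cmd_block` is a hypothetical helper name, not something the SecureNow CLI provides.

```python
import html


def cmd_block(command):
    """Wrap a raw command or SQL string in the copyable .cmd markup."""
    # quote=False: inside <pre> only &, < and > need escaping; quotes stay literal.
    escaped = html.escape(command, quote=False)
    return (
        '<div class="cmd"><button class="copy" type="button">Copy</button>'
        f"<pre>{escaped}</pre></div>"
    )
```

The copy script reads the `<pre>` through `innerText`, so the clipboard receives the unescaped command (with literal `<APP_KEY>`), not the entity-encoded markup.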

Badge usage: severity `<span class="b crit|high|med|low">`; coverage
`<span class="c cov|part|gap">COVERED|PARTIAL|GAP</span>`; OWASP `<span class="owasp">API4</span>`;
CWE `<span class="cwe">CWE-862</span>`; mitigation
`<span class="m firewall|signature|rate|challenge|block|notify|appfix">`; rule IDs
`<span class="rid">`. Stat-card numbers must equal the matrix/findings row counts. The
Code-Findings HTML is mostly prose and needs no copy buttons there, but it still wraps any
example/fix command in `.cmd`.
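
The badge vocabulary is a closed set of class names, so a tiny helper can reject a misspelled variant before it reaches the report unstyled. This is an illustrative sketch only: `badge` and `tag` are hypothetical names, and the visible label text is passed in by the caller because the skeleton does not prescribe it for every badge type.

```python
import html

# Base class and allowed variants, taken from the skeleton's CSS.
_BADGES = {
    "severity": ("b", {"crit", "high", "med", "low"}),
    "coverage": ("c", {"cov", "part", "gap"}),
    "mitigation": (
        "m",
        {"firewall", "signature", "rate", "challenge", "block", "notify", "appfix"},
    ),
}
_TAGS = {"owasp", "cwe", "rid"}


def badge(kind, variant, label):
    """Two-class badge, e.g. <span class="c gap">GAP</span>."""
    base, allowed = _BADGES[kind]
    if variant not in allowed:
        raise ValueError(f"unknown {kind} variant: {variant!r}")
    return f'<span class="{base} {variant}">{html.escape(label)}</span>'


def tag(kind, text):
    """Single-class tag, e.g. <span class="cwe">CWE-862</span>."""
    if kind not in _TAGS:
        raise ValueError(f"unknown tag kind: {kind!r}")
    return f'<span class="{kind}">{html.escape(text)}</span>'
```

For example, `badge("coverage", "part", "PARTIAL")` yields `<span class="c part">PARTIAL</span>`, while `badge("severity", "critical", "Critical")` raises instead of emitting a class the stylesheet does not define.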

---

## Quality bar (the report is rejected if any of these fail)

- Every catalog item A1–J44 is either a matrix row or an explicit N/A line; each modeled row
  carries its OWASP API Top 10:2023 tag (API1 / API4 / API5 / API8, or "—").
- The realtime emphasis is fully covered, each modeled or explicit N/A: **origin not validated on
  upgrade (CSWSH)**, **auth only at connect not per message**, **cross-tenant subscription from
  client input**, **BOLA over a socket**, **connection flood / long-lived exhaustion (no
  connection rate limit)**, **missing message/byte/token/duration/queue caps**, **unbounded
  broadcast fanout**, **streaming cost amplification**, **backpressure absent**, **token in the
  upgrade URL**, and the **SSE / long-polling / Socket.IO** equivalents — with the matching
  `api.stream.*` / `api.websocket.*` event where the detection needs instrumentation.
- Connection auth, the channel/tenant data model, and AI stream cost are **deferred** to the
  numbered siblings (rows present and linked to `../01-authentication/`, `../03-tenant-isolation/`,
  and `../25-ai-llm-features/`, not re-derived).
- Every matrix row has a concrete signal (threshold + window), severity, and mitigation — no
  "monitor for suspicious activity" filler.
- Every code finding in section 4 has a `file:line`, the quoted snippet, and a described fix —
  and **no application code was modified** (this is an audit).
- Every detection SQL keeps `__USER_APP_KEYS__` scoping (correct table column:
  `resources_string['service.name']` for events/logs vs `` `resource_string_service$$name` ``
  for traces), selects an `ip` column, includes the `client_ip` coalesce, and
  `HAVING ip != ''`; traffic queries keep the `ts_bucket_start` + `kind = 2` guards. The report
  states that long-lived sockets need **event-based** detection (one upgrade = one span).
- Coverage is honest: the connection/upgrade layer and the native events are 🟢/🟡; per-message
  authz, channel ownership, origin validation, and message/byte/token/duration caps are **app
  fixes** — every such row pairs **edge containment WITH the app fix**, and SecureNow's
  signature-rule limitation inside socket messages is stated, not glossed.
- Only commands, flags, events, and SQL columns from this prompt's building blocks appear
  (including `securenow challenge …`, `firewall`, and the `api.stream.*` / `api.websocket.*` /
  `api.ratelimit.exceeded` / `api.proxy.spoof_detected` events).
- Every 🔴 gap appears in the gaps section with an interim app/config fix **and** the "contact the
  SecureNow team" line.
- The action plan runs top-to-bottom with `<APP_KEY>` substituted in.
- **Phase 0.5 ran**: the resolved installed `securenow` version appears in **both** reports'
  appendix, and no command/flag/event/column is emitted that the installed SDK/CLI does not expose
  (else it is annotated `# requires securenow >= <version>`).
- Every detection rule is a **complete copyable unit** (SQL → `rules/<name>.sql` → full
  `securenow alerts rules create …` → dry-run test); flags match `alerts rules --help`.
- **Four** files are written to `threat/17-realtime-websocket-sse/` (detection-mitigation `.md` +
  `.html`, code-findings `.md` + `.html`); the two tracks **cross-link**; both HTML files are
  self-contained (inline CSS/JS, no CDN/fonts/network) and **every command block in the detection
  HTML has a working Copy button** (`.cmd` wrapper + the copy `<script>`).
- The split is honest: SecureNow-runnable detections/mitigations live in the **Detection** report;
  code/config changes live in the **Code-Findings** report; nothing security-relevant is dropped.
- The Detection report's mitigation section presents the **full toolbox** (§4c: firewall ·
  instant-block · block [global / route / method / temporary] · rate-limit [IP / route / IP+route] ·
  challenge · auto-block · revoke · trusted · allowlist · fp · app-fix), and **each modeled threat's
  matrix row selects specific, scoped mitigation(s) from it** — never a generic "block the IP."
- **Every false-positive-prone rule is tagged `test-first`** and carries the `--mode test` →
  observe (3–7 days) → `--mode prod` promotion workflow; only high-precision rules are `prod-ready`.
  The action plan creates the test-first rules in `--mode test` and has an explicit "promote after
  N days" step.
- A one-line summary is printed back: per-track file paths, threat counts, rules-to-create count,
  code findings by severity, gaps, and the resolved SDK version.
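
The SQL guards named in the checklist above can be pre-checked mechanically before a rule is created. The sketch below is a plain text heuristic, not a SQL parser and not a SecureNow feature: `missing_guards` is a hypothetical helper, the patterns only confirm that each token appears somewhere in the query, and the `ip` select column is not checked separately because its exact form is not assumed here.

```python
import re

# Guards every detection query is expected to keep.
_COMMON = {
    "__USER_APP_KEYS__ scoping": r"__USER_APP_KEYS__",
    "client_ip coalesce": r"\bclient_ip\b",
    "HAVING ip != ''": r"HAVING\s+ip\s*!=\s*''",
}
# Extra guards for traffic (trace) queries only.
_TRAFFIC = {
    "ts_bucket_start guard": r"\bts_bucket_start\b",
    "kind = 2 guard": r"\bkind\s*=\s*2\b",
}


def missing_guards(sql, traffic=False):
    """Return the names of expected guards that do not appear in the SQL text."""
    checks = dict(_COMMON)
    if traffic:
        checks.update(_TRAFFIC)
    return [
        name
        for name, pattern in checks.items()
        if not re.search(pattern, sql, re.IGNORECASE)
    ]
```

Reading each `rules/<name>.sql` file and calling `missing_guards(text, traffic=True)` for traffic queries gives a quick list of rules to revisit; an empty list only means the tokens are present, not that the query is correct.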

<!-- ════════════════ END OF PROMPT ════════════════ -->
