Elysium Safety Architecture · The Threshold Collective

Why Elysium needs an architecture, not a system prompt.

Most consumer-grade conversational AI products in mental health are an LLM plus a system prompt. That is not enough. A system prompt can be jailbroken, drift under load, or fail to fire on disclosure language the model was not pretrained on. For a clinical-grade product the safety logic must live above the LLM, in deterministic code that runs before and after every model call.

Elysium implements four such layers, each independent, each version-controlled, each tested. If any single layer fails, the others still hold.

The four layers.

Layer 1 · Input guardrails

Deterministic regex check before the LLM is called.

164 patterns active in production today, grouped in four severity tiers (CRITICAL, HIGH, MODERATE, METHOD-SEEK) plus a separate surveillance-refusal block.

Sources: NHS "Help for suicidal thoughts" pathway, Columbia Suicide Severity Rating Scale (C-SSRS), PHQ-9 item 9, Beck Suicide Intent Scale, Joiner's Interpersonal-Psychological Theory (perceived burdensomeness, thwarted belongingness, acquired capability), Samaritans listening-guidelines + lethal-means counselling, Mind UK warning signs, CALM (male-coded distress), Royal College of Psychiatrists adult risk assessment, Papyrus / HOPELINEUK (youth), WHO Preventing Suicide media guide, published Crisis Text Line trigram research, UK suicide-hotspot literature, modern internet-coded language.

If a pattern matches, the LLM is not called. The user receives a canned response surfacing UK crisis resources (Samaritans 116 123, Shout 85258, NHS 111 option 2, 999) within sub-25 milliseconds. This is the difference between an LLM that may or may not respond safely and a deterministic safety boundary.

Layer 2 · LLM with sourced system prompt + Modelfile

The model itself is configured for the brief.

Llama 3.2 3B running locally via Ollama. Each call carries a 240-word system prompt enforcing British English, validate-before-advise structure, single-question pacing, no-bluff honesty, and explicit refusal of toxic-positivity tropes. The Modelfile baked into the Ollama image is a compact fallback if the gateway prompt fails to load.

Crucially, the LLM is not the safety boundary. It is the fluency layer. Safety lives in Layers 1 and 3.

Layer 3 · Output post-processor

47 deterministic rules applied to every reply before it reaches the user.

Strips em-dashes, kills "I'm sorry to hear that" openings, removes parenthetical meta-commentary the small model emits as stage notes, catches AI-tells ("as an AI language model", "I cannot experience"), translates American to British spellings (organize → organise, color → colour), enforces sentence-length discipline, removes 24+ banned phrases including "you deserve", "everything happens for a reason", "stay strong", "look on the bright side".

This means brand voice and safety phrasing rules cannot be bypassed by a model that drifts. The post-processor is the deterministic floor.

Layer 4 · Session governance + telemetry

Redis-backed sessions, rate limits, deterministic summarisation, full audit trail.

Per-session conversation state stored in Redis with a 2-hour TTL, automatically truncated to 12 turns. Beyond 8 turns, older history is compressed deterministically into a 50-100 word recap (no extra LLM call) so prompt size stays bounded. Per-IP rate limit of 8 requests per minute and 40 per hour. Circuit breaker on the upstream model. Full Prometheus metrics including per-tier crisis hits, guardrail verdicts, retrieval rates, model latency, rate-limit triggers.

This means we can prove what happened. Auditable, time-stamped, exportable.

Cross-cutting protections.

PII scrubbing

Phone numbers (UK formats), NHS numbers, email addresses, UK postcodes are stripped from every response before it is returned to the user. The session payload itself is in-memory only and is never written to a long-term store.

Surveillance refusal

Patterns that target individual employees, scoring of staff wellbeing, covert monitoring of messages, and "flag at-risk staff" requests are refused on principle with a canned response explaining why. This is not configurable. Workplace deployments cannot disable it.

Knowledge base grounding

BM25 retrieval over five curated TTC documents (crisis protocols, lived-experience framework, principles, UK signposting, values). Triggers only above a tightened relevance threshold so retrieval-augmented generation only fires on factual questions, not casual chat. When triggered, the retrieved text is added to the system prompt with explicit "do not quote verbatim" instruction.

Today's measured numbers.

From the stress test run on the live production gateway, 2026-05-05:

Metric	Value	Notes
Crisis patterns active	164	Up from 48 at start of week
Crisis detection rate	100.0%	71/71 on a clinical battery spanning every category
False-positive rate	0.0%	12/12 work-context and benign prompts correctly NOT flagged
Pattern-level F1 accuracy	100.0%	83 cases combined
Voice-clean rate on LLM path	100.0%	12 corpus-seed prompts run through live gateway, audited against 8 deterministic voice rules
End-to-end crisis latency	4-25 ms	Sub-LLM, canned response path
LLM-path latency, median	34 s	Range 22-76s on a 4-CPU box with no quantisation past Q4
Output post-processor rules	50+	Deterministic voice + safety enforcement, v2 deployed 2026-05-05
Per-IP rate limit	8/min, 40/hour	Production guard against abuse
Session retention	2 hours TTL	Then auto-purged
PII scrubbing	Phone, NHS, postcode, email	Stripped from every response

Trade-off decision: faster model rejected on safety grounds.

This morning we benchmarked a 1.2B-parameter Llama variant alongside the production 3.2B model. The smaller model was 2.2× faster on median (29.7s vs 66.7s on a 5-prompt suite). It also produced a hallucination on the medication-boundary case, telling the user to "consult with me first" before changing their treatment plan, where the correct response is to consult their doctor. On a separate prompt disclosing nine years of depression, the smaller model dropped validation entirely and asked a clinical-survey question with no acknowledgement of the disclosure.

The probe data and verbatim responses are on file at /tmp/quant_probe_results.json. We chose not to ship the faster option. Latency is addressed by horizontal scale post-funding, not by trading model size against safety.

This is recorded here because investors should see how decisions of this shape are made.

What we explicitly do not claim.

Elysium is not a clinical service. It is a wellbeing tool with conversational mental-health support. Diagnosis and prescribing are out of scope and refused by Layer 2 + Layer 3.
Elysium is not a crisis line. Every detected crisis disclosure surfaces UK crisis resources and does not attempt to "talk a user down" in the way a trained Samaritans listener would.
Elysium is not a regulated medical device today. It operates within the wellbeing-tool category. Movement into structured intervention triggers UKCA Class IIa pre-submission, planned for the next six months once an institutional pilot is live.
The 100% number is for our own clinical battery. No safety system can claim universal coverage of all possible distress phrasings; the work continues, and every observed gap becomes a permanent regression test.

Regulatory pathway.

Today: wellbeing tool category. Layer 1 patterns and Layer 4 telemetry are at the bar an MHRA Class IIa pre-submission would require. The architecture is built once and graduates with the product, not retrofitted later.

Next 6 months: UKCA Class IIa pre-submission once the first institutional pilot is live. Reference site relationship is on offer to the partner.

Next 12 months: ISO 13485 alignment review, clinical evidence collection (baseline outcomes, acceptable-use telemetry), formal MHRA engagement.

How to verify any of this.

The public V1 is live at elysium.thethresholdcollective.co.uk. Anyone can put it through its paces.
Five real exchanges captured live this morning with full layer telemetry and timing: safety-demo.html.
The stress test JSON (83 cases, full results) is on file and shareable under NDA.
The gateway code is held in version control with timestamped backups for every meaningful production change.
The Companies House registration (16357419), registered office, and director details are public.

Document version 1.0 · 2026-05-05 · Authored by The Threshold Collective Ltd