OE-TR-2026-01 - v16.02.01 (Canonical) - 02 May 2026 - DOI: 10.5281/zenodo.19970815
The Right to Refuse
A first-order safety primitive for generative AI systems. When a system lacks
grounded data and continues generating output regardless, this is not a malfunction.
It is the system operating as designed: without a trip state. This paper establishes
the architectural and legal case for enforced refusal as the baseline safety requirement
for AI deployed in consequential contexts.
I
The Structural Failure Mode
In any safety-critical engineering discipline, a system that continues operation under
conditions of epistemic infeasibility - where no grounded output satisfies the imposed
constraints - is classified as defective by design. Aviation, offshore engineering,
pharmaceutical manufacturing, and nuclear operations all share this classification standard.
Generative AI does not.
Commercial AI architectures operate without a deterministic epistemic trip state.
When a model cannot retrieve grounded data, it continues: selecting statistically
dominant tokens that satisfy grammar and structure regardless of factual accuracy.
Empirical baselines confirm the exposure: in legal practice, contra-factual error
rates of 58-88%; in clinical medicine, failure rates exceeding 80% in early-stage
diagnostic reasoning. These are not edge cases. They are the foreseeable consequence
of an architecture with no refusal primitive.
This failure mode is not an accident of pre-training. Human-centric alignment
techniques - specifically Reinforcement Learning from Human Feedback - systematically
train models to produce responses that human evaluators rate highly. Human evaluators
consistently prefer confident, fluent answers over hedged or refused ones, even when
the confident answer is factually incorrect. The result is calibration collapse:
expressed confidence structurally decoupled from actual accuracy. The model does not
know that it does not know.
The Librarian Constraint. An information intermediary that invents
references is defective. Absence of statistically high-confidence grounding prohibits
assertion. Absence of verification prohibits authority. The same standard applied
to every other information profession must apply to generative AI.
II
The Mathematical Basis: Shannon Entropy
The entropy H(X) of a discrete random variable X is defined by Shannon as the
measure of uncertainty in its probability distribution. When a model queries its
latent space for factual grounding that does not exist, the token probability
distribution flattens: entropy spikes. In a structurally sound engineering
environment, a spike in uncertainty triggers a system halt.
Because black-box commercial APIs obscure logprob distributions, real-time
entropy calculation is unavailable in deployed systems. The Epistemic Control
Architecture provides a functional engineering approximation: causal isolation
of an Independent Protection Layer to audit the semantic consistency of the
output post-generation. This is an engineering workaround for a vendor limitation,
not a mathematical implementation of entropy measurement. The paper is explicit
on this distinction.
Shannon Entropy Boundary - Equation 1.0 - OE-TR-2026-01 Section 9
\[ H(X) = -\sum_{i=1}^{n} P(x_i) \log_b P(x_i) \]
Refusal Condition: Epistemic Trip Threshold
\[ \text{If } H(X) > \tau \implies \textbf{EXECUTE REFUSAL SEQUENCE} \]
X represents the discrete random variable of the probability distribution over the
model's vocabulary at the current generation step. When a system lacks training data
for a specific query, the factual signal drops to zero while structural noise remains
high. The refusal threshold τ is the engineering boundary at which the system must
halt. In the Phase 1 ECA, this threshold is enforced by the Independent Protection
Layer's propositional audit rather than computed from raw logprobs: a compensating
control documented as such in the paper.
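Where per-token logprobs are available (for example, from a locally hosted model rather than a black-box API), the entropy trip condition of Equation 1.0 can be computed directly. The following is a minimal sketch, not the paper's implementation; the threshold value and function names are illustrative assumptions.

```python
import math

def shannon_entropy(probs, base=2):
    """H(X) = -sum P(x_i) * log_b P(x_i) over the token distribution."""
    return -sum(p * math.log(p, base) for p in probs if p > 0)

def should_refuse(probs, tau):
    """Epistemic trip: halt generation when H(X) exceeds threshold tau."""
    return shannon_entropy(probs) > tau

# A peaked distribution (grounded answer) vs a flat one (epistemic void).
peaked = [0.97, 0.01, 0.01, 0.01]   # H ~ 0.24 bits
flat   = [0.25, 0.25, 0.25, 0.25]   # H = 2.0 bits

tau = 1.0  # illustrative threshold in bits, not a calibrated value
print(should_refuse(peaked, tau))   # low entropy: no trip
print(should_refuse(flat, tau))     # entropy spike: EXECUTE REFUSAL SEQUENCE
```

The flat distribution is exactly the "factual signal drops to zero while structural noise remains high" condition: every continuation is equally probable, so entropy is maximal and the trip fires.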
III
The Reasonable Alternative Design: Trinity Architecture
In tort law and product liability, establishing a design defect requires demonstrating
that a safer, economically feasible alternative exists. The Epistemic Control
Architecture (ECA) is that demonstration. It is a running physical system, not
a theoretical proposal.
The ECA operates as a three-node pipeline on an air-gapped isolated compute node
in Perth, Western Australia. The pipeline enforces complete quarantine: no operator
sees a single token until the IPL issues its verdict. Quarantine is enforced at
the rendering layer, not merely at the storage layer.
| Node | Port | Model | Function |
| --- | --- | --- | --- |
| ISN - Input Sanitisation Node | 8082 | Qwen2.5-3B-Instruct | Credential stripping, claim extraction, embedded premise detection, epistemic void detection. Carries known Class B latent failure mode (documented in OE-TR-2026-01 Section 6.2). |
| AG - Autoregressive Generator | 8080 | Llama-3.3-70B-Abliterated | Technical response generation against sanitised query only. No credentials. No user framing. Operates under explicit Correction Mandate for false premises. |
| IPL - Independent Protection Layer | 8081 | Qwen2.5-14B-Instruct | Propositional audit of AG response against extracted claims. Causally isolated from AG. Issues CLEAN, FLAGGED, or BLOCKED verdict before any operator output. |
The commercial disincentive. The absence of a refusal primitive
is not technical. It is economic. A system that refuses to answer appears less
capable than one that always produces a response. An industry that profits from
the appearance of knowledge has no financial motivation to build systems that
accurately signal the absence of it. The Independent Protection Layer is therefore
not merely a useful feature but an architectural necessity for any deployment
where the cost of a confident wrong answer exceeds the cost of no answer.
Policymakers should note that this is precisely the class of deployment -
medical, legal, financial, engineering, regulatory - currently expanding fastest.
IV
Interlock Codes: ECA Pipeline Trip States
The pipeline enforces deterministic trip states. These are not error messages.
They are non-repudiable audit records of exactly why the system halted. Each
code is written to the DABA 3.0 Section VII.2 Transparency Log regardless of
verdict. The LOG_INTEGRITY_HASH seals the complete record at the moment of
pipeline completion.
| Interlock Code | Stage | State | Trigger Condition |
| --- | --- | --- | --- |
| [ILK-INJ] | ISN - Pre-inference | BLOCKED | Prompt injection or instruction-override pattern detected. Pipeline halts. AG does not run. |
| [ILK-BLD] | ISN - Pre-inference | BLOCKED | ISN template bleed: node echoed its own system instructions instead of extracting claims. Structural self-referential failure. |
| [ILK-LOS] | ISN - Pre-inference | BLOCKED | Loss of semantic signal. Input contains no verifiable technical claims. Epistemic void state. Pipeline halts before inference. |
| [ILK-DEV] | IPL - Post-audit | BLOCKED | Parametric deviation. AG confirmed a dangerous false premise without correction. Response quarantined at rendering layer. Operator receives verdict and code only. IPL confidence at or above 70%. |
| [ILK-BYP] | IPL - Post-audit | FLAGGED | Mandate bypass. AG evaded correction mandate without explicitly endorsing the false claim. Response quarantined. Operator must perform explicit logged release action. Release timestamped in audit trail. IPL confidence 40-70%. |
| [ILK-OOR] | IPL - Post-audit | FLAGGED | Out of range. Response is uncertain, heavily hedged, or at knowledge envelope boundary. Quarantined pending explicit logged operator release action. |
| [ILK-ERR] | Infrastructure | PARSE_ERROR | Node timeout, crash, or unparseable output. Hardware or network fault: not an epistemic verdict. Does not produce a DABA VII.2 interlock verdict record. |
V
Regulatory Alignment
Under the EU AI Act, Article 12(1) requires record-keeping for High-Risk AI
systems. Article 50(3) mandates transparency in AI-generated content. Article 9
requires documented risk management systems that identify and address foreseeable
risks. The Collusive Hallucination failure mode - where both the generator and
auditor share identical knowledge gaps, producing a CLEAN verdict on a false
claim - constitutes a foreseeable risk under Article 9 and requires documented
mitigation measures for any High-Risk AI deployment.
Under ISO/IEC 42001 (AI Management Systems), operators are required to establish
controls commensurate with identified AI risks. An architecture with no independent
epistemic verification mechanism does not meet this standard in safety-relevant contexts.
The ECA's three-node Trinity architecture provides a concrete, implementable control
structure that satisfies this requirement at the architectural level.
VI
Build Integrity: Cryptographic Verification
DOCUMENT_ID: OE-TR-2026-01
VERSION: v16.02.01 (Canonical)
RELEASE_DATE: 2026-05-02
AUTHOR_ORCID: 0009-0003-7735-8000
DOI: 10.5281/zenodo.19970815
PDF_MASTER_HASH_SHA256: 30b523cd1f9cd9445713a358df79085385b736b759f2635aaa48512c165a7b53
DABA_3.0_PDF_HASH: 9f67acba7114ac457432a8d473a16651ef0ec38d12504f8cc3019d2090324fef
VIDEO_EVIDENCE_HASH_SHA256: 2dd2e8f1cd63e1bd99fe03fdc65b88f2bd0f36998d9aaf2f8ddaa40f34f54c01
CODE_PAYLOAD_HASH_SHA256: 9cae2c4de5abd0cf56218823271952e0ec466c9341c74c5ba29c7e71535363e5
AUDIT_LOG_HASH_SHA256: 56040040633ab7804822d28795a594657c0f3ce3aa88f4e244598a2f9fce5cc5
TIMESTAMP (UTC): 2026-05-03T00:26:58
PROMPT_HASH_SHA256: 8eb17932dc505fccb2192cfa313d80b3f207574c3fb72ae552e29bf27e94d60c
SANITISED_QUERY_HASH: 8eb17932dc505fccb2192cfa313d80b3f207574c3fb72ae552e29bf27e94d60c
CLAIMS_HASH: 9bd6ed2aa8e1090cc51e21af7b1ef77b30de688a66d00ad282975989c9dcb607
OUTPUT_HASH_SHA256: b24e20144ae0692ec66035c82ca6a85404a070c22eccdb7a7178dcdcadb997cb
VERDICT_TYPE: CLEAN
PIPELINE_LATENCY_MS: 53710.0
LOG_INTEGRITY_HASH: 639d6bf1c5c144d8652e78b6b3fac41ffd9ac8dcf0f0fa6fbb1a39346c7b3008
Verify PDF: sha256sum OE-TR-2026-01.pdf must return 30b523cd1f9cd9445713a358df79085385b736b759f2635aaa48512c165a7b53. Hash mismatch = Epistemic Drift event.
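The same check can be done without `sha256sum`, for example in Python. This is a generic streaming-hash sketch; the function name is an assumption, and the expected digest is the published PDF master hash above.

```python
import hashlib

def verify_artifact(path: str, expected_sha256: str) -> bool:
    """Stream the file and compare its SHA-256 to the published hash."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 16), b""):
            h.update(chunk)
    return h.hexdigest() == expected_sha256.lower()

# Hash mismatch = Epistemic Drift event:
# verify_artifact("OE-TR-2026-01.pdf",
#     "30b523cd1f9cd9445713a358df79085385b736b759f2635aaa48512c165a7b53")
```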
Pipeline output from Isolated Compute Node, Perth, WA - 2026-05-03T00:26:58 UTC. CLEAN verdict confirmed.
DABA 3.0 Section VII.2 - Transparency Log Page Requirement. Ontological Engineering Pty Ltd - ABN 77 691 088 963.
VII
Published Artifact
The Right to Refuse - Full Research Paper
OE-TR-2026-01 - v16.02.01 (Canonical) - 02 May 2026 - PDF - 8pp - SHA-256: 30b523cd1f9cd9445713a358df79085385b736b759f2635aaa48512c165a7b53
Download PDF