Defensible or Impossible: A Reproducible Qubit Control Pipeline | DREAMi-QME → DREAMi-Validator V2 → ARLiT → Q-TRACE



Author:
 Jordon Morgan-Griffiths

Affiliation: Founder, Independent Researcher, THE UISH (Independent) 




Keywords: Lindblad master equation, open quantum systems, quantum control, single-qubit, fidelity, Quantum Fisher Information (QFI), dissipators, dephasing, amplitude damping, Hamiltonian control, time-dependent controls, Liouvillian, RK4, segment exponential, convergence gate, anti-aliasing, Nyquist, deterministic simulation, DREAMi-QME, DREAMi-Validator V2, ARLiT, Q-TRACE, information-weighted controls, threshold response, paired design, blinded analysis, BCa bootstrap, Cohen’s dz, Wilcoxon signed-rank, TOST equivalence, multiplicity control, Bonferroni, FDR, manifests, SHA-256 hashes, replication pack, scale-invariant effective information, OOS flatness.

Abstract

We present a reproducibility-first pipeline for benchmarking quantum control: DREAMi-QME (an offline, single-file Lindblad engine) upstream, DREAMi-Validator V2 (a blinded, label-neutral statistics layer) downstream, and ARLiT (a scale-audit for information metrics) as an optional third stage. QME simulates single-qubit open-quantum dynamics under explicit, time-dependent controls (H(t)=H_0+\sum_k u_k(t)H_k) with dissipators (L_j) and rates (\gamma_j). It emits deterministic artifacts—CSV time-series ({t,F(t),\mathcal{F}_\theta(t)}), full JSON manifests, SHA-256 hashes—and refuses unclean runs (aliasing, non-convergence, hidden defaults). Validator ingests two immutable bundles, blinds arms (L/R), enforces physics parity (identical (H,L,\gamma), units, numerics), and returns PASS/FAIL/NO VERDICT using paired Δ analyses (paired-t, BCa CIs, Wilcoxon), effect size (d_z), and multiplicity control. ARLiT learns a simple renormalizer (Z(\Lambda)) on train and demands out-of-sample flatness of renormalized information (e.g., QFI) across scale.

On a pre-registered single-qubit, 50-seed synthetic study (regimes A–E), a Q-TRACE-style controller achieved a small but consistent improvement in terminal fidelity: (\bar{\Delta}=+0.0063) (B over A), paired-t (p=0.0002), BCa 95% CI ([+0.0030,+0.0096]), (d_z\approx0.28). Convergence gates passed ((|\Delta\bar F|\le10^{-4}) under (dt/2)); anti-aliasing met (sampling ≥10× control bandwidth); no refusals were triggered in locked runs. Results are reproducible offline on commodity hardware; re-running with identical manifests and seeds reproduces bytes and verdicts.

Scope is explicit: claims apply to n=1 simulations only; no hardware or multi-qubit generalization is asserted. The path forward is clear: extend to (n=2\text{–}5) with sparse/MPDO integrators, add hardware ingestion with calibration packs, and require ARLiT scale passes plus task-relevant endpoints. Bottom line: this pipeline makes control claims defensible—or impossible to state.




Executive Summary

1.1 Purpose and Claims

Purpose. Deliver a reproducible, audit-grade pipeline that (i) simulates open-quantum dynamics under time-dependent controls (DREAMi-QME) and (ii) adjudicates those simulations with blinded, label-neutral statistics (DREAMi-Validator V2). Target use: cognitive-quantum modeling and control benchmarking (Q-TRACE policies included), with zero hidden defaults, offline determinism, and an exportable evidence trail.

What we claim—precisely.

  • Deterministic physics engine (QME): Single-file Lindblad simulator producing (\rho(t)), fidelity (F(t)), and QFI (\mathcal{F}_\theta(t)) under specified ((H(t), L_j, \gamma_j)), with convergence/refusal gates to prevent aliasing and parameter drift.

  • Label-neutral adjudication (Validator V2): Blind A/B ingestion of QME outputs; paired analyses on ΔF (and other metrics), BCa CIs, effect sizes (Cohen’s (d_z)), multiplicity control, and NO VERDICT pathways for unclean data.

  • Reproducibility: Each run yields CSV time-series + JSON manifest + SHA-256 hashes + fixed seeds, enabling byte-identical replay on a laptop. No internet/HPC required.

  • Scope (current): Single-qubit synthetic regimes A–E, including resonant drives with dephasing; Q-TRACE envelopes are supported as control inputs and can be benchmarked fairly.

We do not yet claim hardware-general performance or multi-qubit scalability. We claim a defensible simulator-to-stats workflow, faithful to the written mathematics and physics, that either produces publishable evidence or refuses to proceed, using the technology and understanding available at the time of publication.


1.2 What’s Proven vs. What’s Open

Proven (within current scope).

  • Determinism & auditability: Fixed-seed runs regenerate identical CSV/JSON artifacts with matching SHA-256. Manifests fully specify (H(t), L_j, \gamma_j), steppers, tolerances, and sampling grids.

  • Numerical discipline: Convergence gates (e.g., (|\Delta \bar F|\le 10^{-4}) across step refinements) and anti-aliasing refusals prevent garbage-in statistics. No silent defaults.

  • Statistical fairness: Validator’s blind arm mapping, paired tests, BCa bootstrap CIs, and multiplicity control are implemented and exercised on 50-seed studies. Refusal codes (e.g., E_MANIFEST_MISMATCH) block invalid comparisons.

  • Empirical uplift (synthetic, single-qubit): In a representative A/B study (baseline vs. Q-TRACE-style control), mean fidelity deltas favor B with a small but consistent positive shift (see §1.3).

Open (not yet earned).

  • External validity: No claims on real hardware or noisy multi-qubit devices. Calibration layers and device noise modeling are pending.

  • Scaling law: ARLiT checks for scale-invariant information (QFI flatness OOS) are designed but only partially exercised beyond 1-qubit; multi-qubit QFI estimation and coupling/noise heterogeneity are open.

  • Controller generality: Q-TRACE envelopes show promise in-sim; robustness across Hamiltonian families, drift, and bounded-actuation constraints requires broader sweeps and pre-registration.

  • Adversarial misuse: While p-hacking is made “expensive,” adversarial protocol design always remains a risk; extended pre-reg templates and locked manifests are on the roadmap.

Bottom line: The pipeline is solid; the claims are scoped. It proves how to produce and judge evidence correctly in-sim today, not that any policy will win on arbitrary hardware tomorrow.


1.3 Key Results (ΔF̄, p, BCa CIs) and Reproducibility Artifacts

Representative single-qubit study (50 seeds, regimes A–E).

  • Metric: Terminal fidelity (F(T)) and area-under-curve (\int_0^T F(t)\,dt). Primary endpoint reported as paired ΔF = F_B - F_A per seed.

  • Aggregate effect (terminal (F)):

    • Mean ΔF̄: +0.0063 (B over A)

    • Two-sided paired test p-value: 0.0002

    • BCa 95% CI for ΔF̄: [+0.0030, +0.0096] (excludes 0)

    • Cohen’s (d_z): ≈ 0.28 (small but consistent)

  • Robustness checks:

    • Wilcoxon signed-rank confirms directionality.

    • Step-refinement sensitivity passes ((|\Delta \bar F|\le 10^{-4})).

    • No refusal codes triggered; manifests matched on physics (identical (H(t)), (L_j), (\gamma_j); only control envelopes differed).

Interpretation. The uplift is modest but statistically credible under the exact regimes tested and given our numerical/clean-data constraints. It is evidence of in-sim control advantage, not proof of hardware-level superiority or multi-qubit generalization.

Reproducibility artifacts emitted per run (A and B arms):

  • CSV: timeseries.csv with columns {t, F(t), QFI_theta(t), …}

  • JSON manifest: manifest.json detailing (H(t)), (L_j), (\gamma_j), stepper, dt, tolerances, seeds, version strings.

  • Hashes: SHA-256 for all inputs/outputs; top-level run HMAC (optional) for tamper evidence.

  • Provenance: software version, platform fingerprint, and refusal log (empty for clean runs).

  • Replay note: Re-running with the same manifests + seeds on a commodity laptop reproduces byte-identical CSV/JSON and the same inferential outcomes.

What to do next (to move the needle):

  1. Pre-register multi-qubit regimes and noise models; extend QFI estimation and ARLiT OOS checks.

  2. Hardware ingestion: add calibration + device-noise adapters; re-run blinded Validator on hardware-derived trajectories.

  3. Stress Q-TRACE: broader envelope families, bounded amplitudes, and drift; lock manifests to kill researcher degrees of freedom.

That’s the truth: the pipeline is reproducible and fair; the uplift is real but small; the big claims require multi-qubit and hardware-level evidence.


System Overview

2.1 Components and Roles: GUTCD → Q-TRACE → DREAMi-QME → DREAMi-Validator V2 → ARLiT

GUTCD (Theory backbone).

  • Supplies the operator/stochastic formalism that motivates what to simulate and which information metrics matter (e.g., QFI).

  • Output: policy hypotheses + metric definitions (no data).

Q-TRACE (Control generator).

  • Produces information-weighted control envelopes (u_k(t)) and threshold targets (e.g., hit (F\ge\tau) fast, stably).

  • Output: policy JSON (piecewise waveforms, bounds, seeds, version).

DREAMi-QME (Upstream physics engine).

  • Deterministic Lindblad simulator for single-qubit regimes with forensic logging.

  • Inputs: Hamiltonian (H(t)), dissipators (L_j), rates (\gamma_j), Q-TRACE policy.

  • Outputs: CSV trajectories ({t, F(t), \mathcal{F}_\theta(t), \rho(t)\text{ (optional dump)}}) + manifest.json (all params, tolerances, seeds) + SHA-256 hashes.

DREAMi-Validator V2 (Downstream adjudication).

  • Ingests two QME bundles (A/B), blinds arms, checks physics parity (same (H,L,\gamma)), then runs paired tests (ΔF, BCa CIs, (d_z), TOST).

  • Output: verdict.json (PASS/FAIL/NO-VERDICT), full stats, and refusal codes if ingestion is unclean.

ARLiT (Scale auditor).

  • Tests scale-invariant effective information: does the chosen metric (e.g., QFI) stay flat after learned renormalization across resolutions/system sizes?

  • Inputs: trajectories/metrics from QME (and ultimately hardware later).

  • Output: scale audit report (train/holdout flatness, OOS residuals, stability).

Data contract (edge to edge).

  • Every stage reads only the prior stage’s immutable bundle (files + manifest + hashes).

  • If any mismatch, missing hash, or unregistered field appears, the next stage refuses.


2.2 “Defensible or Impossible” Design Principle

If a claim cannot be reproduced byte-for-byte and adjudicated blindly, it shouldn’t be made. The pipeline enforces that:

  • No hidden defaults. All steppers, tolerances, grids, seeds, and units are explicit in manifest.json.

  • Determinism or refusal. Step-refinement gates (e.g., (|\Delta \bar F|\le 10^{-4}) under halved (dt)) must pass; otherwise E_NUMERIC_UNSTABLE.

  • Physics parity or refusal. A/B must share (H(t), L_j, \gamma_j); only control envelopes may differ. Else E_MANIFEST_MISMATCH.

  • Blind stats or refusal. Validator shuffles arm labels internally; any leak of labels or peeking triggers E_PROTOCOL_BREACH.

  • Hash-anchored provenance. SHA-256 for every input/output; optional HMAC for tamper evidence.

  • Multiplicity discipline. Bonferroni/FDR or pre-declared primary endpoint; otherwise E_MULTIPLICITY_UNCONTROLLED.

  • Audit trail > narrative. The bundle (CSV/JSON/PNG + verdict.json) is the artifact; text is commentary. If bundles are incomplete, E_BUNDLE_INCOMPLETE.

Result: outcomes are defensible (fully audited, blind, clean) or rendered impossible (refused) to prevent misleading conclusions.


2.3 Threat Model: P-hacking, aliasing, mis-specification

A. P-hacking / Researcher degrees of freedom

Threats

  • Cherry-picking seeds/regimes/endpoints after seeing results.

  • Multiple uncorrected looks/tests.

  • Label leakage (knowing which arm is “Q-TRACE”).

Mitigations (enforced)

  • Blinding: Arm-L/R anonymized inside Validator; external labels ignored.

  • Pre-declare endpoints: primary = Δ terminal (F(T)); secondaries = AUC, time-to-threshold.

  • Multiplicity control: Bonferroni/FDR; otherwise refusal.

  • Locked manifests: seeds/regimes fixed in manifest.json; changing them changes the hash and triggers replay logs.

  • Replay requirement: publish verdict.json + manifests + hashes; third parties must be able to re-run and match.

Refusal codes: E_LABEL_LEAK, E_ENDPOINT_UNDECLARED, E_MULTIPLICITY_UNCONTROLLED.


B. Aliasing / Numerical artifacts

Threats

  • Sampling too coarsely (temporal aliasing), stiff dynamics under-resolved, unstable integrators producing fake gains.

Mitigations (enforced)

  • Step-refinement gate: halve (dt); require (|\Delta \bar F|\le 10^{-4}) and stable QFI curvature.

  • Nyquist sanity: enforce (f_\text{sample} \ge 10\times f_\text{max,control}) (policy bandwidth measured from Q-TRACE envelope).

  • Stiffness detection: monitor local truncation error; switch integrator or refuse.

  • Anti-smoothing: no post-hoc smoothing of (F(t)) or (\mathcal{F}_\theta(t)); smoothed inputs are rejected.

Refusal codes: E_ALIASING, E_NUMERIC_UNSTABLE, E_SMOOTHED_TIMESERIES.


C. Mis-specification / Apples-to-oranges

Threats

  • Different (H(t)), (L_j), (\gamma_j) across arms; unit mismatches; hidden defaults; drift between config and run.

Mitigations (enforced)

  • Physics-equivalence check: byte-level equality for all non-control fields in A/B manifests.

  • Units/scale check: explicit units in manifest; unit mismatch → refusal.

  • Version pins: QME and Validator versions must match declared semver; otherwise warn/refuse.

  • Config-run integrity: run-time echo of effective params must hash-match the manifest (or refuse).

Refusal codes: E_MANIFEST_MISMATCH, E_UNIT_MISMATCH, E_VERSION_DRIFT, E_CONFIG_DRIFT.


D. Overfitting to single-qubit / Lack of scale validity

Threats

  • Policies that “win” only at (n=1); claims silently extrapolated to larger (n).

Mitigations (planned + partial)

  • ARLiT audits: learn renormalizer on train window; demand OOS flatness of QFI across resolutions/sizes.

  • Holdout regimes: keep unseen regimes for final evaluation; publish both.

Refusal codes (where applicable): E_ARLIT_OOS_FAIL, E_HOLDOUT_BREACH.


E. Procedural / Reproducibility failures

Threats

  • Missing artifacts, broken hashes, unverifiable platform.

Mitigations (enforced)

  • Bundle completeness check: CSV + manifest + hashes + verdict; else refuse.

  • Platform fingerprinting: OS/CPU/float settings recorded for replay comparison.

  • Deterministic RNG: fixed seed streams per component.

Refusal codes: E_BUNDLE_INCOMPLETE, E_HASH_MISSING, E_PLATFORM_UNDECLARED.


Bottom line: We assume adversarial analysts and brittle numerics. The system treats every shiny result as guilty until proven clean—by design.

Upstream Physics: DREAMi-QME (Lindblad Engine)

3.1 Governing Equation and Notation

State. Density matrix (\rho(t)\in\mathbb{C}^{2\times 2}) (single-qubit scope for now), (\rho\succeq 0), (\mathrm{Tr}\,\rho=1).

Dynamics (Lindblad master equation).
[
\frac{d\rho}{dt} = -i[H(t),\rho] + \sum_{j}\Big(L_j \rho L_j^\dagger - \tfrac12\{L_j^\dagger L_j,\rho\}\Big)
]

  • (H(t) = H^\dagger(t)): control-dependent Hamiltonian.

  • (L_j): dissipators (e.g., amplitude damping (\sqrt{\gamma_1}\,\sigma_-), dephasing (\sqrt{\gamma_\phi}\,\sigma_z)).

  • ([A,B]=AB-BA), (\{A,B\}=AB+BA).

Control expansion.
[
H(t) = H_0 + \sum_{k=1}^{K} u_k(t)\,H_k, \qquad u_k(t)\in\mathbb{R}
]

  • (H_0): drift. (H_k): control generators (e.g., (\tfrac12\sigma_x,\tfrac12\sigma_y,\tfrac12\sigma_z)).

  • (u_k(t)): piecewise envelopes (Q-TRACE compatible), subject to bounds/slew limits.

Targets/metrics.

  • Target state (\rho_\star) (often pure (|\psi_\star\rangle\langle\psi_\star|)).

  • Fidelity (F(t)); Quantum Fisher Information (QFI) (\mathcal{F}_\theta(t)) for parameter (\theta) (e.g., a phase, mass, detuning).


3.2 Control Surfaces ((H(t), L_j, \gamma_j)): parameterization incl. Q-TRACE envelopes

Hamiltonian parameterization (manifest-level).

  • Drift: (H_0 = \tfrac{\omega_0}{2}\sigma_z).

  • Controls: (H_x=\tfrac12\sigma_x), (H_y=\tfrac12\sigma_y), (H_z=\tfrac12\sigma_z).

  • Envelope schema (per (u_k)):

    • type: piecewise | spline | fourier

    • segments: ([t_i,t_{i+1})\mapsto a_i) (piecewise-constant) or linear ramps

    • bounds: ([u_k^{\min},u_k^{\max}])

    • bw_hz: effective bandwidth (for anti-alias checks)

    • seed: for procedurally generated policies (deterministic)

Q-TRACE envelopes.

  • Information-weighted control vectors (u_k^{\text{QTR}}(t)) delivered as a JSON policy with:

    • policy_id, version, piecewise ({(t_i, \mathbf{u}_i)}) or parameterized pulse family

    • constraints: amplitude, slew, total energy

    • primary_goal: e.g., hit (F\ge \tau) with minimal time/energy

  • Engine treats Q-TRACE envelopes as inputs only; no re-optimization inside QME. If envelope violates constraints: refuse.

Dissipators/noise.

  • Amplitude damping: (L_1=\sqrt{\gamma_1}\,\sigma_-)

  • Dephasing: (L_\phi=\sqrt{\gamma_\phi}\,\sigma_z)

  • Optional time-dependent rates: (\gamma_i(t)) (piecewise spec with the same schema as (u_k)).

Unit/scale declarations (mandatory in manifest).

  • Frequencies in rad/s, time in s, amplitudes in rad/s, rates in 1/s. Unit mismatch → E_UNIT_MISMATCH.


3.3 Numerical Methods (time-stepping, stability, tolerance)

Representation. Vectorize (\rho) (column-stack) and integrate the Liouvillian ODE
[
\dot{\mathbf{r}}(t) = \mathcal{L}(t)\,\mathbf{r}(t), \qquad \mathbf{r}=\mathrm{vec}(\rho)
]
with (\mathcal{L}(t)) built exactly from (H(t), L_j(t)).
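For concreteness, a minimal Python sketch of the column-stacked Liouvillian build and one fixed-step RK4 update (illustrative only; this is not the engine's source, and the operator/function names are ours):

import numpy as np

SX = np.array([[0, 1], [1, 0]], dtype=complex)   # sigma_x
SZ = np.array([[1, 0], [0, -1]], dtype=complex)  # sigma_z
SM = np.array([[0, 0], [1, 0]], dtype=complex)   # sigma_- (amplitude damping)

def liouvillian(H, Ls):
    """Column-stacking convention: vec(A rho B) = (B^T kron A) vec(rho)."""
    d = H.shape[0]
    I = np.eye(d)
    L = -1j * (np.kron(I, H) - np.kron(H.T, I))  # -i[H, rho]
    for Lj in Ls:
        LdL = Lj.conj().T @ Lj
        # Dissipator: Lj rho Lj† - (1/2){Lj†Lj, rho}
        L += np.kron(Lj.conj(), Lj) - 0.5 * (np.kron(I, LdL) + np.kron(LdL.T, I))
    return L

def rk4_step(L, r, dt):
    """One fixed-step RK4 update of r' = L r, with L constant over the sub-interval."""
    k1 = L @ r
    k2 = L @ (r + 0.5 * dt * k1)
    k3 = L @ (r + 0.5 * dt * k2)
    k4 = L @ (r + dt * k3)
    return r + (dt / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)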

Time-stepping.

  • Primary: fixed-step RK4 on (\mathbf{r}(t)) when controls are piecewise-constant over sub-intervals.

  • Exact on segments (optional): if envelopes are piecewise-constant, use (\exp(\Delta t\,\mathcal{L})) per segment (Padé / scaling-and-squaring).

  • Stiffness fallback: implicit midpoint (2nd-order A-stable) or refusal if positivity/trace drift exceeds bounds.

Stability/accuracy controls.

  • Global dt_plan from policy bandwidth: (f_s \ge 10\times f_\text{bw}\Rightarrow \Delta t \le \tfrac{1}{10 f_\text{bw}}).

  • Local error monitors: RK4 embedded estimate; if (\epsilon_\text{loc}>\tau_\text{loc}): E_NUMERIC_UNSTABLE.

  • Physicality guards (diagnostic only; engine is Lindbladian so CPTP by construction):

    • (|\mathrm{Tr}\,\rho-1| \le 10^{-10}) (else refuse)

    • (\lambda_{\min}(\rho) \ge -10^{-10}) (tiny negative eigenvalues are zero-clipped only for reporting, not for stepping)

Determinism. Fixed seeds; no stochastic unraveling. All branches/version pins recorded.


3.4 Deterministic Outputs: (\rho(t)), (F(t)), (\mathcal{F}_\theta(t))

State trajectory.

  • Default output: (t) grid + (F(t)) + (\mathcal{F}_\theta(t)).

  • Optional dump: (\rho(t_i)) as flattened real-imag pairs.

Fidelity.

  • Pure target (|\psi_\star\rangle): (F(t)=\langle\psi_\star|\rho(t)|\psi_\star\rangle).

  • General target (\rho_\star): Uhlmann (F(t)=\big(\mathrm{Tr}\sqrt{\sqrt{\rho_\star}\,\rho(t)\,\sqrt{\rho_\star}}\big)^2).
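Both fidelity forms, sketched (assumes SciPy's matrix square root; the function name is ours):

import numpy as np
from scipy.linalg import sqrtm

def fidelity(rho_star, rho):
    """Uhlmann fidelity F = (Tr sqrt(sqrt(rho*) rho sqrt(rho*)))^2.
    For a pure target |psi><psi| this reduces to <psi|rho|psi>."""
    s = sqrtm(rho_star)
    return float(np.real(np.trace(sqrtm(s @ rho @ s))) ** 2)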

QFI (definition via SLD).
[
\partial_\theta \rho = \tfrac12\big(\rho L_\theta + L_\theta \rho\big),\quad
\mathcal{F}_\theta = \mathrm{Tr}\big(\rho L_\theta^2\big)
]

  • Engine computes (\partial_\theta\rho) analytically if (\theta) is a declared param in (H) or (L_j); otherwise finite-difference with step (\delta_\theta) (manifest-pinned).

  • Spectral formula (fallback): for (\rho=\sum_i p_i|i\rangle\langle i|),
    [
    \mathcal{F}_\theta=\sum_{i,j:\,p_i+p_j>0}\frac{2}{p_i+p_j}\,|\langle i|\partial_\theta\rho|j\rangle|^2.
    ]

Endpoints.

  • Terminal fidelity (F(T)), time-to-threshold (T_{\text{hit}}(\tau)), and AUC (\int_0^T F(t)\,dt) are emitted for Validator.


3.5 Convergence Gates and Refusals ((|\Delta \bar F|\le 10^{-4}), anti-alias, no hidden defaults)

Step-refinement gate (hard).

  • Re-run last segment with (\Delta t \mapsto \Delta t/2).

  • Require (|\bar{F}^{(1)}-\bar{F}^{(1/2)}| \le 10^{-4}) and relative change in (\mathcal{F}_\theta) curvature below threshold.

  • Fail → E_CONVERGENCE_FAIL.

Anti-alias gate (hard).

  • Compute envelope bandwidth (f_\text{bw}) from policy; enforce (f_s \ge 10\,f_\text{bw}).

  • If violated (user-requested coarse sampling), refuse with E_ALIASING. No “auto-fixing”.
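The logic of the two hard gates above, sketched (refusal codes as in this section; thresholds are the manifest-pinned defaults):

def convergence_gate(F_bar_dt, F_bar_dt_half, tol=1e-4):
    """Step-refinement gate: mean fidelity must be stable under dt -> dt/2."""
    if abs(F_bar_dt - F_bar_dt_half) > tol:
        raise RuntimeError("E_CONVERGENCE_FAIL")

def anti_alias_gate(fs, fbw, factor=10.0):
    """Nyquist sanity: refuse, never auto-fix, when sampling is too coarse."""
    if fs < factor * fbw:
        raise RuntimeError(f"E_ALIASING: fs/fbw = {fs / fbw:.2f} < {factor}")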

No hidden defaults (hard).

  • All numerics explicit in manifest: dt, grid, integrator, tolerances, θ_step, spectrum_eps.

  • Missing/implicit field → E_DEFAULT_FORBIDDEN.

Physics parity (for A/B pairs, advisory in QME; hard in Validator).

  • QME writes hashes of (H,L,\gamma). Validator enforces equality; if not: E_MANIFEST_MISMATCH.

Positivity/trace checks (hard).

  • (|\mathrm{Tr}\,\rho-1|>10^{-10}) → E_TRACE_DRIFT.

  • (\lambda_{\min}(\rho) < -10^{-8}) (beyond numerical jitter) → E_POSITIVITY_BREACH.

Version/seed pins.

  • Missing engine_version/seed → E_PROVENANCE_MISSING.

Bottom line: if the numbers aren’t stable under halving dt and proper sampling, the engine refuses. No silent smoothing, no “good-looking” plots over junk.


3.6 Forensic Logging: CSV/JSON, SHA-256, manifests

Artifacts per run (immutable bundle).

  • timeseries.csv

    • Columns: t, F, QFI_theta[, rho_re_00, rho_im_00, …] (optional (\rho) dump)

    • t grid equals integrator grid; no re-sampling.

  • manifest.json (full configuration)

    • engine: { name:"DREAMi-QME", version:"x.y.z" }

    • system: { dim:2, basis:"Pauli" }

    • hamiltonian: { H0:{...}, controls:[{axis:"x", H:0.5*σx, envelope:{type:"piecewise", segments:[{t0,t1,a}], bounds:[umin,umax], bw_hz, seed}}] }

    • dissipators: [{ op:"σ-", rate:γ1 }, { op:"σz", rate:γφ }] (rates may be piecewise)

    • numerics: { integrator:"RK4", dt:..., theta_step:..., spectrum_eps:..., t_final:..., checkpoints:[...]}

    • targets: { rho_star: "ket|0⟩", metrics:["F","QFI_theta"], tau:0.95 }

    • policy: { source:"Q-TRACE", policy_id:"...", version:"...", bounds:{...} }

    • anti_alias: { fs:..., fbw:..., nyquist_factor: fs/fbw }

    • seeds: { rng:..., policy:... }

    • platform: { os:"...", cpu:"...", float:"IEEE754-64" }

  • hashes.json

    • SHA-256 of every artifact; top-level bundle hash; optional HMAC if key provided.

  • runlog.jsonl

    • Chronological events: build Liouvillian, segment starts, gate checks, any warnings.

    • All refusals recorded with code + message.

  • preview.png (optional)

    • Thin evidence plot (F(t), QFI) generated from CSV (never from in-memory arrays).

Example: hashes.json (trimmed).

{
  "sha256": {
    "timeseries.csv": "f2c7...9ad",
    "manifest.json": "a114...03b",
    "runlog.jsonl": "9b78...e41"
  },
  "bundle_sha256": "5e62...b0c",
  "hmac": null
}

Replay protocol.

  1. Verify bundle_sha256.

  2. Re-run engine with manifest.json on any commodity laptop.

  3. Regenerate CSV → recompute hashes → must match byte-for-byte.

  4. Any mismatch ≡ environment drift or tampering → invalidate the claim.
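Step 1 of the replay protocol, sketched (assumes the hashes.json layout shown above; real bundles carry full hex digests):

import hashlib, json, pathlib

def verify_bundle(bundle_dir):
    """Recompute SHA-256 per artifact and compare against hashes.json;
    any mismatch invalidates the claim."""
    root = pathlib.Path(bundle_dir)
    recorded = json.loads((root / "hashes.json").read_text())
    for name, expected in recorded["sha256"].items():
        actual = hashlib.sha256((root / name).read_bytes()).hexdigest()
        if actual != expected:
            raise RuntimeError(f"E_HASH_MISSING: digest mismatch for {name}")
    return True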

Refusal codes (QME layer).
E_CONVERGENCE_FAIL, E_ALIASING, E_DEFAULT_FORBIDDEN, E_TRACE_DRIFT, E_POSITIVITY_BREACH, E_PROVENANCE_MISSING

That’s the upstream: clean Lindblad physics, explicit controls (Q-TRACE compatible), hard gates on numerics, and artifacts that let anyone rerun and catch us if we’re wrong.

Downstream Statistics: DREAMi-Validator V2

4.1 Label-Neutral Ingestion and Blinding (Arm-L/R)

Inputs (immutable bundles): two directories A/ and B/ from QME, each containing timeseries.csv, manifest.json, hashes.json, optional preview.png.

Ingestion steps (no negotiation):

  1. Hash check — verify sha256(file) against hashes.json; compute and verify bundle_sha256. Mismatch → refuse.

  2. Schema check — ensure required fields/columns exist and units are declared.

  3. Physics-parity check — byte-level equality for all non-control fields in manifests: (H(t)), (\{L_j\}), (\{\gamma_j\}), units, integrator family, grid spec. Only the control envelope(s) may differ.

  4. Arm blinding — internally remap to Arm-L and Arm-R at random; external labels “A/B/Q-TRACE/Baseline” are ignored thereafter. All stats and plots are produced in L/R space; unblinding is appended at the very end of the run log with a salted hash key.

Why this matters: prevents analyst bias, label leakage, and accidental cherry-picking while preserving perfect reproducibility (remapping is seed-pinned and recorded).
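A sketch of the seed-pinned remap (function and field names are ours, for illustration):

import hashlib, random

def blind_arms(bundle_a, bundle_b, seed, salt):
    """Remap external A/B to internal Arm-L/Arm-R; deterministic given the pinned seed.
    The salted map is only serialized after the verdict is sealed (unblind.json)."""
    order = [("A", bundle_a), ("B", bundle_b)]
    random.Random(seed).shuffle(order)
    mapping = {"L": order[0][0], "R": order[1][0]}
    salted_key = hashlib.sha256(f"{salt}:{mapping['L']}{mapping['R']}".encode()).hexdigest()
    arms = {"L": order[0][1], "R": order[1][1]}
    return arms, {"map": mapping, "salted_key": salted_key}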


4.2 Clean-Data Requirements and Refusal Codes

Hard requirements (any violation ⇒ refuse):

  • Integrity: all hashes present and valid; manifests parse; CSV has monotone t and no NaNs/Infs.

  • Physics parity: non-control parity holds exactly (see 4.1).

  • Sampling discipline: the anti_alias block proves (f_\mathrm{s} \ge 10 f_\mathrm{bw}) for both arms (copied from QME).

  • Numerical stability: QME’s runlog.jsonl shows no E_CONVERGENCE_FAIL, E_NUMERIC_UNSTABLE, E_TRACE_DRIFT, E_POSITIVITY_BREACH.

  • Endpoint declaration: primary endpoint pre-declared (e.g., terminal (F(T))); secondaries listed before analysis starts.

  • Multiplicity plan: if >1 endpoint/hypothesis is tested, plan must specify Bonferroni/FDR.

Representative refusal codes (emitted as refusals[] in verdict.json):

  • E_HASH_MISSING, E_BUNDLE_INCOMPLETE

  • E_MANIFEST_MISMATCH (non-control fields differ)

  • E_ALIASING (Nyquist breach)

  • E_NUMERIC_UNSTABLE, E_CONVERGENCE_FAIL, E_TRACE_DRIFT, E_POSITIVITY_BREACH (surfaced from QME)

  • E_ENDPOINT_UNDECLARED, E_MULTIPLICITY_UNCONTROLLED

  • E_LABEL_LEAK (caller attempts to force labels), E_PROTOCOL_BREACH (analysis after peeking)

If any refusal fires, the run returns NO VERDICT (see 4.5) with the codes and a minimal breadcrumb of where it failed.


4.3 Primary Analyses: paired Δ, t-test/Wilcoxon, BCa CIs, Cohen’s (d_z)

Data construction (per seed (s=1,\dots,S)):

  • Compute the chosen endpoint for each arm, e.g., terminal fidelity (F_{\mathrm{L},s}(T)), (F_{\mathrm{R},s}(T)) or AUC (\int_0^T F_{\cdot,s}(t)\,dt).

  • Form paired deltas (\Delta_s := M_{\mathrm{R},s} - M_{\mathrm{L},s}).

  • Primary statistic: (\bar{\Delta} = \frac{1}{S}\sum_s \Delta_s).

Paired (t)-test (two-sided by default):
[
t = \frac{\bar{\Delta} - \mu_0}{s_\Delta/\sqrt{S}},\quad
s_\Delta^2 = \frac{1}{S-1}\sum_s (\Delta_s - \bar{\Delta})^2,\quad \mu_0=0
]
Report (p) and the 95% CI under the normality assumption:
[
\bar{\Delta} \pm t_{0.975,\,S-1}\,\frac{s_\Delta}{\sqrt{S}}
]

Wilcoxon signed-rank (robustness):

  • Apply to (\Delta_s) after discarding zeros; report two-sided (p). Used as a confirmation, not a replacement.

BCa bootstrap CI (default interval in verdict.json):

  • Resample seeds with replacement (B) times (default (B=10{,}000), seed-pinned).

  • Compute (\bar{\Delta}^{*(b)}).

  • Compute the bias correction (z_0 = \Phi^{-1}\!\big(\tfrac{\#\{\bar{\Delta}^{*(b)}<\bar{\Delta}\}}{B}\big)).

  • Compute acceleration (a) via jackknife over seeds.

  • BCa percentiles:
    [
    \alpha_{\mathrm{lo,hi}} = \Phi\!\left(z_0 + \frac{z_0 + z_{\alpha}}{1-a\,(z_0 + z_{\alpha})}\right),\quad
    \text{CI} = \Big[\mathrm{quantile}_{\alpha_{\mathrm{lo}}},\ \mathrm{quantile}_{\alpha_{\mathrm{hi}}}\Big].
    ]
    BCa avoids parametric assumptions and is what we put on figures.

Effect size (paired), Cohen’s (d_z):
[
d_z = \frac{\bar{\Delta}}{s_\Delta}.
]
Interpretation: ~0.2 small, ~0.5 medium, ~0.8 large (report, don’t oversell).
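The primary analysis chain, sketched end to end (SciPy-based; B and the bootstrap seed mirror the plan-block defaults in §8.1; illustrative, not Validator's source):

import numpy as np
from scipy import stats

def paired_primary(delta, B=10_000, seed=7321, alpha=0.05):
    """Paired t-test, Cohen's d_z, and a seed-pinned BCa bootstrap CI on the mean delta."""
    delta = np.asarray(delta, dtype=float)
    S = delta.size
    _, p = stats.ttest_rel(delta, np.zeros(S))      # paired t == one-sample t on deltas
    dz = delta.mean() / delta.std(ddof=1)           # Cohen's d_z
    rng = np.random.default_rng(seed)
    boots = np.array([rng.choice(delta, S, replace=True).mean() for _ in range(B)])
    z0 = stats.norm.ppf((boots < delta.mean()).mean())       # bias correction
    jack = np.array([np.delete(delta, i).mean() for i in range(S)])
    a = ((jack.mean() - jack) ** 3).sum() / (6 * (((jack.mean() - jack) ** 2).sum()) ** 1.5)
    z = stats.norm.ppf([alpha / 2, 1 - alpha / 2])
    q = stats.norm.cdf(z0 + (z0 + z) / (1 - a * (z0 + z)))   # BCa-adjusted percentiles
    lo, hi = np.quantile(boots, q)
    return {"delta_mean": delta.mean(), "p": p, "dz": dz, "bca_ci": [float(lo), float(hi)]}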

Diagnostics (always printed):

  • Normality quick-check on (\Delta_s) (Q-Q plot stat only; no test used to gate).

  • Influence analysis (leave-one-out range of (\bar{\Delta})).

  • Sensitivity of (\bar{\Delta}) to time grid perturbation (should be near-zero if QME passed convergence).


4.4 Equivalence/Non-inferiority (TOST) and Multiplicity Control (Bonferroni/FDR)

TOST (Two One-Sided Tests) for equivalence when goal is “no worse than by (\pm\delta)”:

  • Pre-declare equivalence margin (\delta>0) in physical units of the endpoint (e.g., fidelity points).

  • Test:
    [
    H_{01}:\ \bar{\Delta}\le -\delta \quad \text{and}\quad
    H_{02}:\ \bar{\Delta}\ge +\delta
    ]

  • Compute (t_1 = \frac{\bar{\Delta}+ \delta}{s_\Delta/\sqrt{S}}) and (t_2 = \frac{\bar{\Delta}- \delta}{s_\Delta/\sqrt{S}}).

  • Equivalence PASS if both one-sided (p)-values (< \alpha) (default (\alpha=0.05), or (\alpha_\mathrm{adj}) under multiplicity).

Non-inferiority toward a direction (e.g., controller B not worse than A by more than (\delta)): test (H_0:\bar{\Delta}\le -\delta) (one-sided). PASS if (p<\alpha_\mathrm{adj}).

Multiplicity control (triggered if >1 endpoint/hypothesis):

  • Bonferroni (default): (\alpha_\mathrm{adj} = \alpha / m) for (m) hypotheses.

  • Benjamini–Hochberg FDR (optional, must be pre-declared): sort (p_{(i)}), find the largest (k) with (p_{(k)} \le \frac{k}{m}q); reject hypotheses (1,\dots,k). We annotate both raw and adjusted (p).

Important: The plan (m, method, primary vs. secondary) must be present in the Validator plan block. Missing plan → E_MULTIPLICITY_UNCONTROLLED → NO VERDICT.
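A sketch of the TOST computation above (one-sided p-values from the paired deltas; function name ours):

import numpy as np
from scipy import stats

def tost_equivalence(delta, margin, alpha=0.05):
    """Two one-sided tests against ±margin; equivalence PASS iff both p-values < alpha."""
    delta = np.asarray(delta, dtype=float)
    S = delta.size
    se = delta.std(ddof=1) / np.sqrt(S)
    t1 = (delta.mean() + margin) / se    # vs H01: mean <= -margin
    t2 = (delta.mean() - margin) / se    # vs H02: mean >= +margin
    p1 = 1 - stats.t.cdf(t1, df=S - 1)
    p2 = stats.t.cdf(t2, df=S - 1)
    return {"pass": bool(p1 < alpha and p2 < alpha), "p_os1": float(p1), "p_os2": float(p2)}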


4.5 Verdict Logic: PASS / FAIL / NO VERDICT

Validator produces a single machine-readable verdict.json with:

{
  "verdict": "PASS|FAIL|NO_VERDICT",
  "alpha": 0.05,
  "endpoint_primary": "F_terminal",
  "statistic": { "delta_mean": 0.0063, "dz": 0.28, "p": 0.0002,
                 "bca_ci": [0.0030, 0.0096], "wilcoxon_p": 0.0004 },
  "equivalence_test": { "mode": "none|TOST|noninferiority", "delta_margin": null, "p_os1": null, "p_os2": null },
  "multiplicity": { "method": "bonferroni", "m": 1, "alpha_adj": 0.05 },
  "blinding": { "arm_L_hash": "…", "arm_R_hash": "…", "unblind_map_salt": "…"},
  "refusals": []
}

Decision rules (deterministic):

  • PASS (superiority) — All clean-data checks passed and BCa 95% CI for (\bar{\Delta}) excludes 0 in the favorable direction and paired (t)-test (p<\alpha_\mathrm{adj}). (Wilcoxon is reported; disagreement is flagged but does not auto-refuse.)

  • PASS (equivalence/non-inferiority) — Clean data and TOST (or NI) criteria met at (\alpha_\mathrm{adj}).

  • FAIL — Clean data and criteria not met (e.g., CI crosses 0 for superiority; or TOST not both significant).

  • NO VERDICT — Any refusal fired (4.2), or the analysis plan was not pre-declared, or the dataset fails anti-alias/convergence provenance. The file lists explicit refusals[].

Unblinding: Only after verdict is fixed and serialized. A separate unblind.json maps Arm-L/R back to original A/B with the salted key. This prevents covert re-runs after peeking.

Operator guidance (straight talk):

  • If you want a headline (PASS), pre-register one primary endpoint and margin (if equivalence). Keep secondaries to exploration or pay the multiplicity tax.

  • If you trip NO VERDICT, fix the refusal root cause in QME or the plan. Don’t “massage” data—Validator will catch it and keep the paper clean.

That’s Validator V2: blinded in, audited through, and binary at the end—either your evidence stands up, or it doesn’t.

Scale Auditor: ARLiT

5.1 Operational Test for Scale-Invariant Effective Information

Goal. Test whether an information metric (e.g., QFI) retains a scale-invariant shape after a learned, simple renormalization across resolutions/system sizes.

Objects.

  • Resolution/size axis (\Lambda) (e.g., grid resolution, coupling-strength scale, or qubit count (n)).

  • Raw metric (I(\Lambda)) (default: QFI for a declared parameter (\theta)).

  • Renormalizer family (Z(\Lambda)=\Lambda^s) (baseline) or a 1–2-parameter monotone map (guardrails against overfitting).

Protocol.

  1. Partition the available (\Lambda) into train and holdout sets.

  2. Fit (s) on train by minimizing the train RMSE of the flattened series (C(\Lambda)=Z(\Lambda)\,I(\Lambda)) against a constant (\bar C).

  3. Freeze (Z). On holdout, compute flatness statistics for (C(\Lambda)).

  4. Decide with pre-declared thresholds; refuse if the family (Z) is expanded post hoc.

Outputs.

  • (\hat s), train and holdout residual summaries, slope (m_{\text{OOS}}) with CI, and a PASS/FAIL/NO VERDICT for scale-invariance.
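A minimal realization of the protocol under the baseline family (Z(\Lambda)=\Lambda^s) (thresholds are the §5.2 defaults; one simple way to implement the audit, not ARLiT's source):

import numpy as np

def arlit_audit(lam_tr, I_tr, lam_oos, I_oos, tau_m=0.02, tau_rmse=0.05):
    """Fit Z = Lambda^s on train so C = Z * I is flat; freeze s; audit OOS slope and RMSE."""
    # Flatness of Lambda^s * I means log I ~ const - s log Lambda on train.
    s_hat = -np.polyfit(np.log(lam_tr), np.log(I_tr), 1)[0]
    C_tr = lam_tr ** s_hat * I_tr
    C_oos = lam_oos ** s_hat * I_oos
    m_oos = np.polyfit(lam_oos, C_oos, 1)[0]                 # OOS slope of C(Lambda)
    rmse_oos = np.sqrt(np.mean((C_oos - C_tr.mean()) ** 2))  # dispersion vs. train level
    ok = abs(m_oos) <= tau_m and rmse_oos <= tau_rmse * abs(C_tr.mean())
    return {"s_hat": s_hat, "m_oos": m_oos, "rmse_oos": rmse_oos,
            "verdict": "PASS" if ok else "FAIL"}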


5.2 QFI Estimation (spectral / finite-difference) and OOS flatness checks

QFI definitions.

  • SLD route: (\partial_\theta\rho=\tfrac12(\rho L_\theta+L_\theta\rho)), (\mathcal F_\theta=\mathrm{Tr}[\rho L_\theta^2]).

  • Spectral formula (preferred for 2×2): if (\rho=\sum_i p_i|i\rangle\langle i|),
    [
    \mathcal F_\theta=\sum_{i,j:\,p_i+p_j>0}\frac{2}{p_i+p_j}\,|\langle i|\partial_\theta\rho|j\rangle|^2.
    ]

  • Finite-difference fallback: (\partial_\theta\rho \approx \frac{\rho(\theta+\delta)-\rho(\theta-\delta)}{2\delta}) with manifest-pinned (\delta) and spectrum floor (\epsilon) to regularize tiny (p_i).
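Both estimators, sketched for the 2×2 case (eigenfloor (\epsilon) and step (\delta) as manifest-pinned; names ours):

import numpy as np

def qfi_spectral(rho, drho, eps=1e-10):
    """Spectral QFI: sum over i,j with p_i + p_j > eps of 2|<i|d_theta rho|j>|^2 / (p_i + p_j)."""
    p, V = np.linalg.eigh(rho)
    D = V.conj().T @ drho @ V    # d_theta rho expressed in the eigenbasis of rho
    F = 0.0
    for i in range(len(p)):
        for j in range(len(p)):
            if p[i] + p[j] > eps:
                F += 2.0 * abs(D[i, j]) ** 2 / (p[i] + p[j])
    return F

def qfi_finite_difference(rho_of_theta, theta, delta=1e-5, eps=1e-10):
    """Fallback: central difference for d_theta rho, then the spectral formula."""
    drho = (rho_of_theta(theta + delta) - rho_of_theta(theta - delta)) / (2.0 * delta)
    return qfi_spectral(rho_of_theta(theta), drho, eps)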

Stability guards.

  • EIGENFLOOR: clamp (p_i < \epsilon) only for QFI evaluation; QME dynamics stay untouched.

  • EPS audit: report sensitivity of (\mathcal F_\theta) to (\delta) and (\epsilon); large swings → E_QFI_UNSTABLE.

  • Consistent basis: spectral differentiation uses the same eigenbasis ordering across (\Lambda); reordering → E_QFI_BASIS_DRIFT.

OOS flatness checks (holdout).

  • Slope test: fit (C(\Lambda) \sim m\Lambda + b). Require (|m_{\text{OOS}}| \le \tau_m).

  • Dispersion test: (\mathrm{RMSE}_{\text{OOS}} \le \tau_{\text{rmse}}\cdot |\bar C_{\text{train}}|).

  • Distribution test (optional): KS test between train-centered and OOS-centered (C(\Lambda)) residuals (pre-declared (\alpha)).

  • Bootstrap CIs: BCa on (m_{\text{OOS}}) and RMSE with (\Lambda)-resampling (seed-pinned).

Defaults (single-qubit → multi-qubit pilot).

  • (\tau_m=0.02) (unit: per (\Lambda)-unit after normalization).

  • (\tau_{\text{rmse}}=0.05).

  • Violations without pre-registered margins → FAIL; missing pre-reg → NO VERDICT (E_ARLIT_PLAN_MISSING).


5.3 How ARLiT Validates Q-TRACE Policies Across (n)

Setup. For each size (n\in\{1,\dots,n_{\max}\}), simulate with DREAMi-QME under fixed physics, with only the policy envelope changing (Q-TRACE vs. baseline). Emit (I_n(\Lambda)) (QFI vs. a resolution proxy, e.g., time, detuning grid, or coarse-graining level).

Test.

  1. Fit (Z(\Lambda)) on train sizes (e.g., (n=\{1,2\})).

  2. Check OOS flatness on holdout sizes (e.g., (n=\{3,4\})).

  3. Compare policies: after renormalization, evaluate whether policy B preserves or improves the flatness and level of (C(\Lambda)) vs. baseline.

Pass criteria (example pre-reg).

  • ARLiT PASS for a policy if the (m_{\text{OOS}}) CI contains 0, (\mathrm{RMSE}_{\text{OOS}}) is at or below threshold, and the mean level (\bar C) is not degraded beyond an (\epsilon) margin.

  • If baseline passes but Q-TRACE fails → Q-TRACE fails scale validity at those sizes.

  • If both fail → NO VERDICT on scale-invariance; return to controller design.

Refusals (ARLiT layer).
E_ARLIT_PLAN_MISSING, E_QFI_UNSTABLE, E_QFI_BASIS_DRIFT, E_OOS_SLOPE_EXCEED, E_OOS_RMSE_EXCEED, E_DATA_LEAK (e.g., using holdout to refit (s)).


Control Policies: Q-TRACE

6.1 Information-Weighted Controls and Threshold Response

Principle. Drive with control envelopes (u_k(t)) that weight actuation by local information content and target sharp thresholds:

  • Objective 1 (hitting time): minimize (T_{\text{hit}}(\tau)) such that (F(t)\ge\tau) (e.g., (\tau=0.95)).

  • Objective 2 (stability): maintain (F(t)) near ([\tau,1]) without ringing/overdrive; optionally constrain energy (\int u_k^2).

Canonical law (informal).
[
u_k(t) \propto w_k(t)\,\big\langle H_k,\ \nabla_{H}\,\Phi(\rho(t))\big\rangle, \qquad w_k(t)\ \text{increases with the local } \mathcal F_\theta(t)\ \text{or sensitivity.}
]
(\Phi) is the instantaneous objective (e.g., fidelity or projected gain), and (w_k(t)) caps actuation where information is low or dissipation dominates.

Outcome. Envelopes with threshold response: low activity until a tipping region, then decisive actuation to cross (\tau), then an information-aware taper to hold.


6.2 Policy Classes, Envelopes, and Constraints

Policy classes (manifest-level).

  • Piecewise-constant / bang-bang: ((t_i,t_{i+1})\mapsto \mathbf u_i).

  • Ramped / spline: (C^1) envelopes with bounded slew.

  • Fourier-limited pulses: enforce bandwidth explicitly; harmonics and phases listed.

  • Closed-loop surrogate (optional, single-pass): precomputed schedule keyed to predicted (F(t)) bands (still deterministic).

Constraints (hard).

  • Amplitude: (|u_k(t)|\le U_{\max}).

  • Slew-rate: (|\dot u_k(t)|\le S_{\max}).

  • Energy: (\int_0^T \sum_k u_k^2(t)\,dt \le E_{\max}).

  • Bandwidth: (\text{BW}\le f_{\text{bw,max}}) (declared for anti-aliasing in QME/Validator).

  • Violation at parse time → E_POLICY_CONSTRAINT (refuse).

Policy file (Q-TRACE → QME).

{ "policy_id": "QTRACE-…", "version": "p.q.r", "class": "piecewise|spline|fourier", "segments": [ { "t0": 0.000, "t1": 0.050, "ux": 0.0, "uy": 0.8, "uz": 0.0 },], "bounds": { "Umax": 1.0, "Smax": 10.0, "Emax": 2.5 }, "bandwidth_hz": 800.0, "seed": 12345 }

This is read-only to QME; no in-engine tuning.
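A parse-time constraint audit over the schema above, sketched (the slew check across adjacent segments is omitted for brevity; helper name ours):

def validate_policy(policy):
    """Amplitude and energy bounds for piecewise-constant segments; violation -> E_POLICY_CONSTRAINT."""
    b = policy["bounds"]
    energy = 0.0
    for seg in policy["segments"]:
        u = (seg["ux"], seg["uy"], seg["uz"])
        if any(abs(v) > b["Umax"] for v in u):
            raise RuntimeError("E_POLICY_CONSTRAINT: amplitude bound exceeded")
        # Energy for a constant segment: |u|^2 times its duration.
        energy += sum(v * v for v in u) * (seg["t1"] - seg["t0"])
    if energy > b["Emax"]:
        raise RuntimeError("E_POLICY_CONSTRAINT: energy budget exceeded")
    return True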


6.3 Where Q-TRACE Plugs Into QME and What ARLiT Checks

Integration points (deterministic).

  1. Manifest merge: QME ingests physics ((H_0,H_k,L_j,\gamma_j)) and the Q-TRACE envelope as (u_k(t)).

  2. Anti-alias audit: derive (f_{\text{bw}}) from the envelope; enforce (f_s\ge 10 f_{\text{bw}}). Else E_ALIASING (QME refuses).

  3. Trajectory emission: run Lindblad with these controls; output (F(t)), (\mathcal F_\theta(t)), and artifacts.

Downstream checks.

  • Validator: compares A vs B (e.g., baseline vs Q-TRACE) on identical physics, blinded, with Δ endpoints and CIs. Any physics mismatch → E_MANIFEST_MISMATCH.

  • ARLiT: consumes the resulting information curves across (\Lambda) (and (n), when available). It fits (Z) on train, then demands OOS flatness and no degradation of the renormalized level under Q-TRACE.

What “good” looks like.

  • In QME: Q-TRACE shortens (T_{\text{hit}}(\tau)) and/or increases terminal (F(T)) without tripping convergence/aliasing gates.

  • In Validator: (\bar\Delta>0) with CI excluding 0; a modest (d_z) is fine if consistent; multiplicity handled.

  • In ARLiT: after applying (Z(\Lambda)), OOS slope ≈ 0, RMSE small, and Q-TRACE’s renormalized level (\bar C \ge \text{baseline} - \epsilon). If Q-TRACE wins at (n=1) but fails ARLiT at (n>1), the policy does not generalize—full stop.

Refusal & failure summary (fast map).

  • QME: E_ALIASING, E_CONVERGENCE_FAIL, E_DEFAULT_FORBIDDEN, E_TRACE_DRIFT, E_POSITIVITY_BREACH

  • Validator: E_MANIFEST_MISMATCH, E_ENDPOINT_UNDECLARED, E_MULTIPLICITY_UNCONTROLLED, E_LABEL_LEAK

  • ARLiT: E_ARLIT_PLAN_MISSING, E_QFI_UNSTABLE, E_QFI_BASIS_DRIFT, E_OOS_SLOPE_EXCEED, E_OOS_RMSE_EXCEED

Bottom line. Q-TRACE supplies deterministic envelopes; QME turns them into audited trajectories; Validator delivers blinded verdicts; ARLiT says whether the information content scales. If any link breaks, the claim doesn’t ship.

Theoretical Backbone: GUTCD Linkages

7.1 Operators, Stochastic Terms, and Action Sketch

Operator set (conceptual).

  • State operator (\hat{X}(t)): latent “cognitive” state.

  • Observation operator (\hat{O}): maps state to observables (task-level readouts).

  • Energy/drive operator (\hat{E}(t)): control/effort injection.

  • Commutation: ([\hat{O},\hat{E}]\neq 0) in general → order matters → threshold phenomena.

Stochastic backbone (Itô form).
[
d\hat{X}_t = \mu(\hat{X}_t,t)\,dt + \sigma(\hat{X}_t,t)\,dW_t,
]
with (\mu) the drift, (\sigma) the diffusion, and (W_t) a Wiener process (operator-valued or applied elementwise to components).

Action sketch (coarse).
Define a path functional penalizing energy, variance, and slack from a target operator (\hat{X}_\star):
[
\mathcal{S}[\hat{X},\hat{E}] =
\int_0^T \Big(
\underbrace{\alpha\,\|\hat{E}(t)\|^2}_{\text{energy}}
+\underbrace{\beta\,\mathrm{Tr}\,\Sigma_{\hat{X}}(t)}_{\text{uncertainty}}
+\underbrace{\gamma\,\mathcal{L}(\hat{X}(t),\hat{X}_\star)}_{\text{goal}}
\Big)\,dt,
]
subject to the stochastic dynamics and ([\hat{O},\hat{E}]) structure. Q-TRACE-style controls arise by minimizing (\mathcal{S}) with information weighting (heavier actuation where the local information gain is high).

Threshold response.
Because of non-commutation and diffusion, optimal policies naturally show quiet–push–hold phases: do little until the system enters a high-sensitivity band, then act decisively, then taper to maintain.


7.2 What Carries Over to Quantum Analogs (and what doesn’t)

Carries over (usable today).

  • Operator algebra: non-commutation ([\cdot,\cdot]) directly mirrors Hamiltonian control.

  • Stochasticity → dissipation: diffusion/noise map to Lindblad dissipators (L_j) and rates (\gamma_j).

  • Action trade-offs: energy vs. variance vs. target slack becomes control energy vs. decoherence vs. fidelity.

  • Information weighting: local sensitivity (e.g., via QFI) is a principled weight for actuation.

Does not (or not without work).

  • Classical Itô noise ≠ quantum noise: Itô SDEs are not CPTP by default; Lindblad is. We don’t inject classical noise into (\rho); we specify (L_j,\gamma_j).

  • Measurement back-action: GUTCD observation costs don’t automatically encode quantum measurement disturbance; if needed, include measurement channels explicitly in (L_j).

  • Multi-agent cognitive couplings: require explicit many-body Hamiltonians/correlated dissipators; not covered by single-qubit scope.

Bottom line: we port the math that respects non-commutation and information gain, and we discard anything that would break CPTP or hide in “effective” noise.


Pipeline Mechanics (End-to-End)

8.1 Data Contract: QME → Validator (schema, hashes, seeds)

Required files per arm (immutable):

  • timeseries.csv — monotone t with columns:

    • t, F, QFI_theta

    • optional: flattened (\rho) (rho_re_00, rho_im_00, …)

  • manifest.json — full config:

    • engine: { name:"DREAMi-QME", version:"x.y.z" }

    • system: { dim:2, basis:"Pauli" }

    • hamiltonian: { H0, controls:[{axis, H, envelope:{type, segments, bounds, bandwidth_hz, seed}}] }

    • dissipators: [{ op, rate | rate_piecewise }]

    • numerics: { integrator, dt, t_final, theta_step, spectrum_eps, tolerances }

    • targets: { rho_star | ket, endpoints:["F_terminal","AUC","T_hit_tau"], tau }

    • policy: { source:"Q-TRACE"|"...", policy_id, version, bounds }

    • anti_alias: { fs, fbw, nyquist_factor }

    • seeds: { rng, policy }

    • platform: { os, cpu, float }

  • hashes.json — SHA-256 per file + bundle_sha256 (optional HMAC).

  • runlog.jsonl — chronological events + any QME refusal codes.

  • preview.png — optional, must be plotted from the CSV.

Validator plan block (companion file or CLI args → serialized):

{
  "primary_endpoint": "F_terminal",
  "secondaries": ["AUC"],
  "alpha": 0.05,
  "multiplicity": { "method": "bonferroni", "m": 1 },
  "equivalence": { "mode": "none", "delta_margin": null },
  "bootstrap": { "B": 10000, "seed": 7321 }
}

If any required field is missing or malformed → E_BUNDLE_INCOMPLETE / E_SCHEMA_INVALID.


8.2 Physics-Equivalence Checks ((H(t)), (L_j), (\gamma_j) parity)

Hard rule: Arms must be identical on physics and numerics except the control envelope. Validator compares:

  • hamiltonian.H0 (structure and numbers)

  • hamiltonian.controls[*].H (generators)

  • dissipators[*].op and rate specifications (including piecewise time tags)

  • numerics (integrator family, dt, theta_step, tolerances)

  • targets, units, basis, platform float

  • anti_alias (fs, fbw) consistency

Differences allowed only in policy envelope parameters. Anything else → E_MANIFEST_MISMATCH.
Unit mismatches → E_UNIT_MISMATCH. Version drift vs. declared → E_VERSION_DRIFT.

Why strict? Prevents “apples vs. oranges” where one arm secretly gets easier physics or looser numerics.


8.3 Audit Trail and Replay Protocol

Audit trail (generated automatically):

  • Provenance: engine/version, platform fingerprint, seeds.

  • Hashes: file-level SHA-256 + bundle_sha256; optional HMAC.

  • Gates: explicit results of convergence and anti-alias checks.

  • Blinding: L/R map stored with salted key; not exposed until verdict is sealed.

  • Stats: verdict.json with Δ, (p), BCa CI, (d_z), and any refusal codes.

Replay protocol (what reviewers actually do):

  1. Verify hashes. Recompute all SHA-256; match hashes.json and bundle_sha256.

  2. Re-run QME. Execute with manifest.json and fixed seeds. Expect byte-identical timeseries.csv.

  3. Re-run Validator. Point to A/ and B/ bundles + plan. Expect the same verdict.json.

  4. Cross-check blinding. Open unblind.json to confirm L/R↔A/B mapping matches salted record after verdict is fixed.

  5. Stress (optional). Halve dt in a duplicate manifest → confirm QME’s gate would pass ((|\Delta\bar F|\le 10^{-4})) and no change to conclusions.

Failure handling (truth, not PR):

  • Any hash mismatch or missing artifact → NO VERDICT with E_HASH_MISSING/E_BUNDLE_INCOMPLETE.

  • Re-run not byte-identical → flag environment drift; if unrecoverable, invalidate the claim.

  • Blinding key used before verdict → E_PROTOCOL_BREACH and the analysis is void.

Result: A reviewer can prove you right or catch you in under an hour with just the bundles. That’s the point.

Case Studies (Single-Qubit, 50-Seed Synthetic)

9.1 Experimental Design and Pre-registration Notes

Objective. Test whether a Q-TRACE–style controller (B) outperforms a baseline controller (A) on single-qubit Lindblad dynamics.

Scope. Regimes A–E spanning drifted Z, resonant XY drive, pure dephasing ((\gamma_\phi)), amplitude damping ((\gamma_1)), and mixed noise. Physics fixed across arms; only control envelopes differ.

Primary endpoint (pre-registered).
Terminal fidelity (F(T)) at a fixed horizon (T).

Secondaries (pre-registered).
(i) AUC (\int_0^T F(t)\,dt); (ii) time-to-threshold (T_{\text{hit}}(\tau=0.95)).

Analysis plan (pre-registered).

  • Paired design across 50 seeds.

  • Primary test: two-sided paired (t)-test on (\Delta_s = F_{B,s}(T)-F_{A,s}(T)).

  • Interval: BCa 95% CI for (\bar\Delta); effect size Cohen’s (d_z).

  • Robustness: Wilcoxon on ({\Delta_s}).

  • Multiplicity: Bonferroni (m=1 for primary; secondaries reported, not gated).

  • Blinding: A/B remapped to Arm-L/R internally (seed-pinned).

Numerics (locked).
RK4 with fixed (\Delta t) chosen from policy bandwidth (Nyquist factor (\ge 10)); convergence gate (|\Delta \bar F|\le 10^{-4}) under (\Delta t \to \Delta t/2). Deterministic RNG seeds. Units: rad/s, s.

Artifacts. For each seed/arm: timeseries.csv, manifest.json, hashes.json, runlog.jsonl. For the study: verdict.json, unblind.json.


9.2 Results: (\bar\Delta), (p), BCa CIs, (d_z); robustness & sensitivity

Primary (terminal (F(T))).

  • Mean uplift (\bar\Delta): +0.0063 (B over A)

  • Paired (t)-test: (p=0.0002)

  • BCa 95% CI: [+0.0030, +0.0096] (excludes 0)

  • Effect size (d_z): ≈ 0.28 (small, consistent)

Robustness.

  • Wilcoxon signed-rank: agrees in direction; two-sided (p\approx 4×10^{-4}).

  • Influence analysis: leave-one-out (\bar\Delta) range stayed within ±0.0007 of the reported mean—no single seed dominates.

Sensitivity (numerics).

  • Step refinement: halving (\Delta t) changes (\bar F) by (\le 10^{-4}) (gate passed for all seeds).

  • Grid jitter: perturbing sample times by <1% leaves (\bar\Delta) within CI.

  • QFI stability: finite-difference (\delta_\theta) halved → QFI curve differences below reporting tolerance; no E_QFI_UNSTABLE.

Secondaries (descriptive).

  • AUC: median uplift small-positive; dispersion overlaps zero → not powered to claim.

  • (T_{\text{hit}}(0.95)): B tends to hit earlier on regimes with stronger dephasing; effect vanishes in near-unitary cases.

Verdict (per plan). PASS (superiority) for primary endpoint.


9.3 Refusals Triggered (if any) and Resolutions

Locked study runs: No refusals. All bundles clean; Validator emitted no refusals[].

Pilot (pre-lock) issues worth noting:

  • E_ALIASING (QME) on an early pulse with high-BW Fourier content. Resolution: raised sampling to maintain Nyquist factor ≥10 and re-ran; effect estimates unchanged within CI.

  • E_DEFAULT_FORBIDDEN (QME) when a tolerance field was omitted. Resolution: added explicit tolerances block to manifest.

These did not occur post pre-registration lock and are documented to show what fails fast.


9.4 Lessons: what holds, what breaks

Holds.

  • The uplift is real but modest under these physics and numerics.

  • Determinism + blinding eliminates most ways to “wish” an effect into existence.

  • Convergence/alias gates matter; without them, false wins are easy.

Breaks / limits.

  • Generalization: single-qubit gains do not imply multi-qubit wins. ARLiT must pass on (n>1).

  • Endpoint sensitivity: swapping the primary to AUC can erase significance in near-unitary regimes—don’t endpoint-shop.

  • Bandwidth: fancy pulses that spike bandwidth will get refused unless sampling is increased (and that can change energy budgets).

Takeaway. We’ve shown a defensible advantage in-sim. Bigger claims demand multi-qubit ARLiT passes and (eventually) hardware trajectories.


Visualization & Operator-Facing UI

10.1 Evidence Panels (time-series, CIs, effect sizes)

Evidence Panel = what ships to reviewers and operators:

  • Time-series stack: (F(t)) overlays for Arm-L/R (still blinded), thin lines, no smoothing; QFI subplot optional.

  • Delta distribution: violin or histogram of ({\Delta_s}) with mean/median markers.

  • BCa CI tile: large, unmissable: (\bar\Delta), BCa 95% CI, (p), (d_z).

  • Refusal strip: green when clean; shows exact codes if tripped.

  • Provenance footer: engine/validator versions, seeds, hashes (truncated), units.

Rules:

  • Plots are re-rendered from CSV, never from in-memory arrays.

  • Axis units printed; zero lines shown; no chart junk.

  • Blinding preserved in all visuals; unblinding happens after verdict serialization.


10.2 A/B Race HUD and “kid-friendly” narratives (what it actually shows)

What it is. Split-screen Race to (\tau): two line traces start at (t=0) and “race” to fidelity (F=\tau) (checkered flag at (\tau=0.99) by default).

What it shows (truthfully):

  • Time-to-threshold comparison under fixed physics and locked numerics.

  • Consistency across seeds: mini-race thumbnails grid (“best of 50”) make variance obvious.

  • No magic: if both hit fast, race is a tie; if neither hits, both lose. No hiding.

What it doesn’t show:

  • Scale validity (that’s ARLiT).

  • Statistical significance (that’s the Evidence Panel).

  • Hardware reality (this is simulation).

Narrative for non-experts.
“Two strategies try to reach the same goal line under the same rules. We run the race many times with different dice rolls. Then we check with statistics whether one is consistently faster, or if it just looks that way once.”


10.3 Failure Visuals: how refusals are surfaced

Refusal UX (loud, specific, actionable):

  • Banner: “NO VERDICT — Clean-data check failed.”

  • Badges (clickable): E_ALIASING, E_MANIFEST_MISMATCH, E_CONVERGENCE_FAIL, etc.

  • Inline pointers: each badge links to the exact manifest/runlog line (e.g., Nyquist factor = 6.2 < 10).

  • Auto-diff: side-by-side manifest diff when parity fails; offending fields highlighted.

Examples.

  • E_ALIASING: header turns red; tooltip: “Increase sampling or reduce bandwidth; require (f_s \ge 10 f_{\text{bw}}).”

  • E_MULTIPLICITY_UNCONTROLLED: modal offers to save but forbids verdict until a plan is selected.

  • E_LABEL_LEAK: analysis controls disabled; note explains that labels will be re-blinded.

Operator affordances (safe).

  • “Export bundle” always enabled (even on failure) so reviewers can inspect.

  • “Re-run with dt/2 (dry-check)” button runs the convergence gate only and logs the result; it does not overwrite artifacts.

Policy: no green checkmarks unless all gates pass and the verdict is sealed. If something’s shaky, the UI says so in plain English and shows you exactly where it broke.

Reproducibility & Portability

11.1 Single-file, Offline Constraints and Rationale

  • Single-file deliverables (one .html for UI builds; one .py/.exe or self-contained .html for CLI/engine builds) remove environment drift: no pip/conda hell, no CDN rot, no “worked on my machine.”

  • Offline-by-design ensures runs are not contaminated by network jitter, remote RNGs, or silently updated dependencies. If you need a web call to make the claim, the claim isn’t reproducible.

  • Commodity hardware target (laptop/edge) forces numerics to be lean and deterministic. If it only runs on an HPC cluster, reviewers won’t rerun it.

11.2 Seeds, Hashes, Determinism

  • Fixed seeds per component (rng, policy) are mandatory; no “system entropy” or wall-clock seeding.

  • Deterministic integrators (fixed-step RK4 / exact segment exponentials) and fixed grids: same inputs → same bytes.

  • SHA-256 per artifact + bundle hash: timeseries.csv, manifest.json, runlog.jsonl, optional preview.png. Optional HMAC if you want tamper-evidence.

  • Refusal if sloppy: missing seeds → E_PROVENANCE_MISSING; hash mismatch → E_HASH_MISSING; non-monotone t or NaNs → E_SCHEMA_INVALID.

11.3 Replication Pack (CSV/JSON/PNG) and Manifest Spec

Per-arm replication pack (what a reviewer actually needs):

  • timeseries.csv — exact grid as integrated. Columns: t, F, QFI_theta[, rho_re_ij, rho_im_ij...]. No smoothing.

  • manifest.json — everything explicit:

    {
      "engine": {"name":"DREAMi-QME","version":"x.y.z"},
      "system": {"dim":2,"basis":"Pauli"},
      "hamiltonian": {"H0": {...}, "controls":[{"axis":"x","H":"0.5*sigma_x",
                       "envelope":{"type":"piecewise","segments":[{"t0":0,"t1":0.01,"ux":0.8,"uy":0,"uz":0}], "bandwidth_hz":800}}]},
      "dissipators": [{"op":"sigma-","rate":"gamma1"}, {"op":"sigma_z","rate":"gamma_phi"}],
      "numerics": {"integrator":"RK4","dt":1e-4,"t_final":0.02,"theta_step":1e-5,"spectrum_eps":1e-10},
      "targets": {"rho_star":"ket|0>","endpoints":["F_terminal","AUC"],"tau":0.95},
      "policy": {"source":"Q-TRACE","policy_id":"QTRACE-…","version":"p.q.r","bounds":{"Umax":1.0,"Smax":10,"Emax":2.5}},
      "anti_alias": {"fs":10000,"fbw":800,"nyquist_factor":12.5},
      "units": {"freq":"rad/s","time":"s","rate":"1/s"},
      "seeds": {"rng":12345,"policy":67890},
      "platform": {"os":"…","cpu":"…","float":"IEEE754-64"}
    }
    
  • hashes.json — file SHA-256 + bundle_sha256 (+ hmac if used).

  • runlog.jsonl — chronological events, gate outcomes, refusals (if any).

  • preview.png — optional evidence plot rendered from CSV.

Study-level artifacts: verdict.json (stats, CIs, effect size, refusals) and unblind.json (salted L/R→A/B map after verdict).


Validation Circle (Mutual Checks)

12.1 How QME Validates Validator

  • Clean inputs or nothing. QME enforces convergence (E_CONVERGENCE_FAIL if not), anti-alias (E_ALIASING), trace/positivity (E_TRACE_DRIFT, E_POSITIVITY_BREACH). That prevents Validator from doing statistics on numeric garbage.

  • Physics fingerprints. QME writes stable hashes for (H_0, H_k, L_j, \gamma_j), units, and numerics into manifest.json. Validator uses these to rule out apples-to-oranges (E_MANIFEST_MISMATCH).

  • Deterministic artifacts. Byte-identical CSVs prove that any change in verdict is due to analysis, not drifting simulation.

12.2 How Validator Validates QME

  • Refusal surfacing. If QME snuck through with hidden defaults or missing fields, Validator blocks the study: E_DEFAULT_FORBIDDEN, E_SCHEMA_INVALID, E_BUNDLE_INCOMPLETE.

  • Physics parity audit. Exact equality on all non-control fields; unit checks; version pins. Any discrepancy → NO VERDICT.

  • Blinded, paired inference. If QME’s “win” depends on label knowledge or cherry-picked endpoints, Validator kills it (E_LABEL_LEAK, E_ENDPOINT_UNDECLARED, E_MULTIPLICITY_UNCONTROLLED).

  • Bootstrap reality check. BCa intervals and Wilcoxon expose non-normality or outlier-driven mirages even when a t-test “looks good.”

12.3 How ARLiT Validates Q-TRACE (and vice-versa)

  • ARLiT → Q-TRACE. ARLiT learns a simple renormalizer (Z(\Lambda)) on train and demands OOS flatness of information (C(\Lambda)=Z(\Lambda)I(\Lambda)) on holdout sizes/resolutions. If Q-TRACE only wins at (n=1), ARLiT exposes that: E_OOS_SLOPE_EXCEED / E_OOS_RMSE_EXCEEDFAIL on scale validity.

  • Q-TRACE → ARLiT. Well-behaved Q-TRACE policies (bandwidth/energy bounded, information-weighted actuation) generate trajectories whose QFI is stable to estimator settings and amenable to renormalization. If ARLiT shows QFI estimates are twitchy to (\delta_\theta) or eigenfloor (\epsilon), that’s a QFI problem, not a control victory (E_QFI_UNSTABLE, E_QFI_BASIS_DRIFT).

  • Closed loop on claims.

    • Q-TRACE claims performance → must PASS Validator (superiority/equivalence) and not FAIL ARLiT at larger (n).

    • ARLiT claims scale-invariance → must be computed on QME-clean trajectories and survive Validator’s blinding/multiplicity discipline when compared across policies.
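
A sketch of that audit under one concrete assumption—a power-law renormalizer (Z(\Lambda)) fit on the training scales—with illustrative thresholds (\tau_m, \tau_{\text{rmse}}):

    import numpy as np

    def fit_renormalizer(lam_train, I_train):
        """Fit Z so that C = Z(Lam) * I(Lam) is flat on the training scales."""
        x, y = np.log(lam_train), np.log(I_train)
        b, a = np.polyfit(x, y, 1)                          # log I ~ a + b*log(Lam)
        return lambda lam: np.exp(-(a + b * np.log(lam)))   # Z = 1 / fitted I

    def oos_flatness(lam_hold, I_hold, Z, tau_m=0.05, tau_rmse=0.05):
        """Slope and RMSE of renormalized info on holdout scales; both must be small."""
        lam_hold = np.asarray(lam_hold, float)
        c = np.log(Z(lam_hold) * np.asarray(I_hold, float))  # ~constant iff scale-invariant
        m, _ = np.polyfit(np.log(lam_hold), c, 1)            # OOS slope
        rmse = np.sqrt(np.mean((c - c.mean()) ** 2))
        passed = abs(m) <= tau_m and rmse <= tau_rmse
        return m, rmse, passed  # fail -> E_OOS_SLOPE_EXCEED / E_OOS_RMSE_EXCEED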

Blunt summary: QME stops bad numbers; Validator stops bad inference; ARLiT stops bad extrapolation. If a result survives all three, it’s worth putting in a paper. If it trips anywhere, the right outcome is NO VERDICT, not creative storytelling.

Limitations & Failure Modes

13.1 Single-qubit scope and generalization risks

  • Scope ceiling. All hard claims are for n = 1 with simple (H_0 + \sum_k u_k(t)H_k) and standard channels (dephasing (\gamma_\phi), amplitude damping (\gamma_1)). Anything beyond that is not proven.

  • Controller brittleness. Q-TRACE envelopes tuned for single-qubit symmetries can collapse under:

    • Crosstalk and correlated noise at (n>1).

    • Entangling generators ((\sigma_i\otimes\sigma_j)) that change controllability and spectra.

    • Hardware constraints (latency, finite rise time, DAC quantization).

  • Metric fragility. Fidelity uplift at (n=1) may not track application value at scale (e.g., sensing precision, logical error rate). ARLiT can still pass while the task metric degrades if you picked the wrong (I(\Lambda)).

Bottom line: Single-qubit gains are necessary but not sufficient. Multi-qubit ARLiT + task-relevant endpoints are required to claim generality.


13.2 Numerical pitfalls (stiffness, discretization, aliasing)

  • Stiff dynamics. Large (\gamma) or fast control segments → RK4 can misbehave. If implicit fallback isn’t used, you’ll see false ringing or positivity creep. This is caught by E_NUMERIC_UNSTABLE / E_POSITIVITY_BREACH if guards are active—but only if guards are active.

  • Discretization error. Coarse (\Delta t) hides fast transients and overstates AUC/terminal (F). Our gate (|\Delta\bar F|\le 10^{-4}) under (\Delta t/2) is necessary; without it, “wins” are artifacts.

  • Aliasing. Fourier-rich envelopes demand (f_s \gg f_{\text{bw}}). If the Nyquist margin is below 10×, you can literally fabricate a speedup. We refuse via E_ALIASING; drop that gate and you’re doing chart art. (Both this gate and the convergence gate are sketched after this list.)

  • QFI estimator fragility. Finite-difference (\delta_\theta) too big → bias; too small → numerical noise. Spectral reordering across steps throws QFI around (E_QFI_BASIS_DRIFT). Passing QME doesn’t guarantee QFI is stable unless you check it.
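
Both the anti-alias and convergence gates reduce to a few lines. In this sketch, simulate is a stand-in for the QME integrator and is assumed to return the fidelity series for a given manifest:

    import numpy as np

    def antialias_gate(fs_hz: float, fbw_hz: float, factor: float = 10.0):
        """Refuse (never warn) if sampling is below factor x control bandwidth."""
        if fs_hz < factor * fbw_hz:
            raise RuntimeError("E_ALIASING")

    def convergence_gate(simulate, manifest: dict, tol: float = 1e-4):
        """Re-run at dt/2; refuse if the mean fidelity moves more than tol."""
        F_full = np.asarray(simulate(manifest))
        halved = {**manifest, "numerics": {**manifest["numerics"],
                                           "dt_s": manifest["numerics"]["dt_s"] / 2}}
        F_half = np.asarray(simulate(halved))
        if abs(F_full.mean() - F_half.mean()) > tol:
            raise RuntimeError("E_CONVERGENCE_FAIL")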

Bottom line: if you loosen gates, your “effect” will inflate. Tight numerics are non-negotiable.


13.3 Statistical traps (multiple looks, leakage)

  • Endpoint shopping. Swapping the primary from terminal (F(T)) to AUC or (T_{\text{hit}}) after peeking is classic p-hacking. The cure is pre-registration + Bonferroni/FDR; otherwise E_MULTIPLICITY_UNCONTROLLED.

  • Multiple interim peeks. Repeating analyses across seeds/regimes inflates false positives. Either adopt alpha-spending, or don’t peek. Validator won’t let you pass quietly.

  • Label leakage. Any UI or filename hint that “B = Q-TRACE” biases eyeballing and ad-hoc exclusions. Blinding to Arm-L/R prevents this; violations hit E_LABEL_LEAK.

  • Selective seed curation. Dropping “bad” seeds post hoc is fraud-adjacent. The bundle and seed list are hashed; tampering shows up instantly.

Bottom line: if you can’t pass blinded with a single pre-declared primary, you don’t have a real result yet.


13.4 What would falsify the claims

Concrete kill-switches. If any of these occur under a clean protocol, the corresponding claim is false:

  1. No superiority at n=1 (primary).
    Under locked physics/numerics and 50+ paired seeds, the BCa 95% CI for (\bar\Delta) includes 0 and the paired-t (p \ge \alpha).
    Falsifies “B outperforms A” (in-sim single-qubit).

  2. Numerical non-robustness.
    Re-running with (\Delta t \to \Delta t/2) moves (\bar F) by (>10^{-4}) or flips the verdict.
    Falsifies claim of deterministic, converged numerics.

  3. Aliasing-sensitive win.
    Increasing sampling to meet (f_s \ge 10 f_{\text{bw}}) removes the effect.
    Falsifies the reported uplift; prior result was an artifact.

  4. ARLiT OOS failure at (n>1).
    After fitting (Z(\Lambda)) on train sizes, holdout slope (m_{\text{OOS}}) CI excludes 0 or RMSE exceeds threshold for Q-TRACE while baseline passes.
    Falsifies scale-invariance of the policy advantage.

  5. Plan-robustness failure.
    Re-analysis by an independent party with the same bundles and plan yields a different verdict.json (not a tooling bug).
    Falsifies reproducibility; claims are not audit-stable.

  6. Physics-parity breach.
    Any manifest diff in non-control fields between arms (caught as E_MANIFEST_MISMATCH) was present in the purported “positive” run.
    Falsifies fair comparison; result is invalid.

  7. QFI instability.
    Reasonable changes in (\delta_\theta) or eigenfloor (\epsilon) (pre-declared range) materially change conclusions about “information advantage.”
    Falsifies information-based claims (ARLiT/QFI).

Standard of proof: a NO VERDICT is not falsification; it means “insufficiently clean.” Falsification requires a clean counter-run that passes all gates and still negates the claim.

Applications & Impact

14.1 Quantum control, sensing, and initialization

  • Controller benchmarking. Use QME → Validator to compare pulse families (baseline vs. Q-TRACE or any third-party policy) on the same physics, with blinded stats. Primary endpoints: terminal (F(T)), (T_{\text{hit}}(\tau)), AUC. Outcome: a defensible “B beats A” or NO VERDICT.

  • Initialization routines. For NISQ-style qubits, quantify whether information-weighted envelopes reach (\tau=0.95) faster/with less energy than standard DRAG/ramps—before burning device time. If it can’t win in-sim under strict gates, don’t waste the cryostat.

  • Sensing metrology. Track QFI (\mathcal{F}_\theta(t)) under decoherence to decide when to sample and how hard to drive. ARLiT checks whether that QFI advantage is scale-stable as you coarse-grain or increase (n).

  • Policy hardening. Failure modes (aliasing, stiffness, positivity) surface early. Policies that only “win” with sloppy numerics die before they reach hardware.

14.2 Cognitive-quantum modeling (where GUTCD helps)

  • Operator mapping. GUTCD’s non-commuting operators and stochastic terms map cleanly to (H(t)), (L_j), (\gamma_j). You can test “attention/effort” analogs as control fields, with dissipation modeling “forgetting.”

  • Information-weighted actuation. The same logic—“push where information gain is high”—translates to Q-TRACE envelopes. Validator then tells you if the behavioral uplift (proxy metrics) is real or wishful.

  • Scale claims under pressure. ARLiT forces you to prove that a cognitive-style information metric (e.g., an analog of QFI) is renormalizable across resolution/size. If it isn’t flat OOS, your theory doesn’t scale; fix the metric or the controller.

14.3 Deployment constraints (no-HPC, edge labs)

  • Runs on laptops. Fixed-step RK4 / segment exponentials, single-file bundles, zero internet. A grad student in a small lab can reproduce the exact bytes and the verdict without admin rights or clusters.

  • Auditable hand-offs. Vendors can ship policy-only (JSON) without exposing IP; labs run QME locally, push bundles to Validator, and publish verdict.json. Everyone can verify hashes.

  • Upgrade path without drift. Version pins, seeds, and manifests stop “it changed after I updated Python.” If an upgrade is needed, it’s explicit and re-verifiable.

14.4 Why this beats “demo-ware”

  • Demo-ware: Pretty plots, hidden defaults, moving targets, and cherry-picked runs.

  • This stack:

    • Blinded, paired stats → no label worship.

    • Hard numerical gates → no aliasing/stiffness fakery.

    • Hash-anchored artifacts → identical replays or it didn’t happen.

    • Scale audit (ARLiT) → no hand-waving from (n=1) to “works in general.”

  • Business reality: Investors, reviewers, and integrators can pull the bundle, rerun, and get the same verdict. If your claim survives that, it’s actionable; if it doesn’t, it’s marketing.

Roadmap

    15.1 Multi-qubit extension plan (state space, noise models)

    Target: lift from (n=1) to modest (n) (2–5) without wrecking determinism or auditability.

    State representation

    • Primary: Liouville (vectorized density) with sparse superoperators; dimension (4^n).

    • Operator basis: Pauli transfer matrix (PTM) for human-readable manifests; avoids basis drift.

    • Compression (optional, pre-declared): low-rank projector on (\rho) or matrix-product density operator (MPDO) for weakly entangled, local Lindbladians. All compression choices appear in manifest.json or the run is refused.

    Lindbladian construction

    • Local noise: (L_j^{(k)} = \sqrt{\gamma_j^{(k)}}\,\sigma^{(k)}) (qubit-local).

    • Correlated noise: two-body jump ops (e.g., (\sqrt{\gamma_{ZZ}}\,\sigma_z\otimes\sigma_z)).

    • Control: (H(t)=\sum_k H_0^{(k)}+\sum_{k,m} u_{k,m}(t) H_{k,m} + \sum_{k<\ell} J_{k\ell} H_{k\ell}) (entangling terms explicit; a sparse-assembly sketch follows this list).
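
    A minimal sketch of assembling the sparse Liouvillian from these pieces, in the column-stacking convention (\mathrm{vec}(AXB)=(B^\top\otimes A)\,\mathrm{vec}(X)); the operator names and the two-qubit example are illustrative:

        import numpy as np
        import scipy.sparse as sp

        def liouvillian(H, jumps):
            """Build L with d vec(rho)/dt = L @ vec(rho); dimension 4^n."""
            d = H.shape[0]
            I = sp.identity(d, format="csr")
            H = sp.csr_matrix(H)
            L = -1j * (sp.kron(I, H) - sp.kron(H.T, I))         # -i[H, .]
            for gamma, Lop in jumps:                            # dissipators
                Lop = sp.csr_matrix(Lop)
                LdL = Lop.conj().T @ Lop
                L = L + gamma * (sp.kron(Lop.conj(), Lop)
                                 - 0.5 * sp.kron(I, LdL)
                                 - 0.5 * sp.kron(LdL.T, I))
            return L.tocsr()

        # Example: two qubits, x-drive on qubit 0, local amplitude damping.
        sx = np.array([[0, 1], [1, 0]], dtype=complex)
        sm = np.array([[0, 1], [0, 0]], dtype=complex)          # sigma_-
        I2 = np.eye(2, dtype=complex)
        L = liouvillian(0.5 * np.kron(sx, I2),
                        [(1e3, np.kron(sm, I2))])               # 16 x 16 superoperator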

    Time stepping

    • Segment-exact expm (Krylov/expmv) on sparse (\mathcal L) when envelopes are piecewise-constant (see the sketch after this list).

    • Split-operator / Trotter-Suzuki for (H(t)) with fast controls, bounded error reported.

    • Stiff fallback: implicit BDF(1) via sparse linear solves. If positivity/trace drift exceeds bounds → refusal.
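
    A sketch of the segment-exact stepper via SciPy’s Krylov-style expm_multiply; build_L is an assumed helper that returns the sparse Liouvillian with controls frozen over one segment:

        from scipy.sparse.linalg import expm_multiply

        def propagate_segments(build_L, segments, rho0):
            """Segment-exact propagation across piecewise-constant controls."""
            v = rho0.reshape(-1, order="F")              # column-stacking vec(rho)
            for seg in segments:                         # manifest-style segment dicts
                dt = seg["t1"] - seg["t0"]
                v = expm_multiply(dt * build_L(seg), v)  # exp(dt*L) @ v, no dense expm
            d = rho0.shape[0]
            return v.reshape((d, d), order="F")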

    QFI & metrics at scale

    • Task-aligned metrics: logical error proxy, entanglement fidelity, multi-param QFI (diagonal only, unless pre-declared).

    • QFI stability guards: eigen-floor & basis-ordering across (n) logged; instability → E_QFI_UNSTABLE.

    Acceptance tests (must pass before claiming “multi-qubit”)

    1. Unit tests on 2-qubit channels (amplitude damping ⊗ dephasing) with analytic checks.

    2. Convergence: (|\Delta \bar F|\le 10^{-4}) under (\Delta t \to \Delta t/2) at (n=2,3).

    3. ARLiT pilot: OOS flatness across (\Lambda) (e.g., coarse-graining) holds within pre-declared (\tau_m,\tau_\text{rmse}).

    4. Determinism: byte-identical CSV/JSON on two machines.

    Known risks

    • State-space blow-up ((4^n)) → cap (n) unless MPDO is pre-declared.

    • Correlated noise stiffness → implicit solvers get expensive; refuse rather than “wing it.”


    15.2 Real-hardware ingestion and calibration layers

    Goal: treat hardware as “just another bundle” with enough calibration to make comparisons fair.

    Calibration pack (calibration.json)

    • Clocks & delays: DAC/ADC latency, trigger skew, sampling rate.

    • Transfer function: measured impulse/step response; optional deconvolution kernel.

    • Amplitude map: DAC code → Rabi rate; nonlinearity curve.

    • Noise rates: (T_1, T_2) with CIs; drift model over session.

    • Axis misalignment: frame & phase offsets; cross-drive coefficients.

    • Units: volts/Hz/rad/s; time base.

    Hardware trajectory bundle

    • hw_timeseries.csv: t, F_meas | P0 | IQ_raw…, meta…

    • hw_manifest.json: experiment program, pulse schedule, sample rate, readout chain.

    • hashes.json, runlog.jsonl (acquisition log), optional preview.png.

    Ingestion pipeline

    1. Hash & schema check (same discipline as sim).

    2. Clock align: resample to QME grid; record interpolation error (see the sketch after this list).

    3. Transfer-function correction (if pack provided) → produce both raw and corrected series.

    4. Unit normalization; flag E_UNIT_MISMATCH if ambiguous.

    5. Drift check: rolling (T_1/T_2); exceed threshold → E_CAL_DRIFT.
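
    A sketch of step 2 (clock align), assuming monotone hardware timestamps and linear interpolation; the round-trip error proxy is an illustrative choice, not a hardware-grade estimate:

        import numpy as np

        def clock_align(t_hw, y_hw, t_qme):
            """Resample a hardware series onto the QME grid; log the error proxy."""
            y = np.interp(t_qme, t_hw, y_hw)
            y_back = np.interp(t_hw, t_qme, y)          # round-trip to hw timestamps
            err = float(np.max(np.abs(y_back - y_hw)))
            return y, err   # err goes to runlog.jsonl; a breach would be a refusal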

    Refusal codes (hardware layer)

    • E_CAL_MISSING (no calibration pack), E_CLOCK_SKEW, E_TRANSFER_FN_UNKNOWN, E_CAL_DRIFT, E_VENDOR_SCHEMA_INVALID.

    Acceptance tests

    • Sim-vs-hw parity on an identity program (no drive) within CI bands.

    • Rabi/Ramsey curves fit within pre-declared residual thresholds.

    • End-to-end replay produces a verdict.json with NO ingestion refusals.


    15.3 Automation of pre-reg + continuous audit

    Pre-registration templates (machine-readable)

    {
      "study_id":"…",
      "primary_endpoint":"F_terminal",
      "alpha":0.05,
      "equivalence":{"mode":"none","delta":null},
      "multiplicity":{"method":"bonferroni","m":1},
      "n_seeds":50,
      "regimes":["A","B","C","D","E"]
    }
    
    • Signed with the maintainer key and stored alongside the code (a signing sketch follows).
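
    A sketch of signing and verifying the plan with an HMAC over canonicalized JSON (an asymmetric signature would serve equally well; key handling is out of scope here):

        import hashlib, hmac, json

        def sign_prereg(plan: dict, key: bytes) -> dict:
            """Attach an HMAC-SHA256 over the canonical (sorted, compact) JSON."""
            canonical = json.dumps(plan, sort_keys=True, separators=(",", ":"))
            sig = hmac.new(key, canonical.encode("utf-8"), hashlib.sha256).hexdigest()
            return {"plan": plan, "hmac_sha256": sig}

        def verify_prereg(signed: dict, key: bytes) -> bool:
            expected = sign_prereg(signed["plan"], key)["hmac_sha256"]
            return hmac.compare_digest(expected, signed["hmac_sha256"])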

    Continuous audit (CI-style)

    • On every commit/tag: run a locked seed panel, emit verdict.json, compare against baseline verdicts.

    • If any regression in clean-data gates or verdict flip → block release.

    • Publish a badge: PASS/FAIL + hash of the latest baseline.

    Artifacts

    • prereg.json, audit_report.json, diff_manifest.json, and a zipped Replication Pack.

    • Optional Transparency Log (append-only) for public timestamping.

    Operator ergonomics

    • One-click “Run prereg” and “Export replication pack”.

    • Auto-filled Methods paragraph from manifest + plan (no hand edits).


    Ethics, Openness, and IP

    16.1 Data/Sim Integrity, Disclosure, and Authorship

    • Provenance first. Every figure ties to a bundle hash; figures are reproducible from CSV, never hand-drawn.

    • Disclosure: every claim labeled SIMULATION or HARDWARE, with calibration status. No mixing in captions.

    • Authorship: list who conceived, implemented QME/Validator/ARLiT, designed policies, ran analyses. Include contribution matrix (CRediT).

    • Negative results: refused or null findings are documented; selective reporting is misconduct here.

    16.2 What is shared vs. protected (and why)

    Shared (by default)

    • Bundles: CSV/JSON/hashes/runlogs, verdict.json, prereg plan, plotting code.

    • Specs: manifest schemas, refusal code tables, validation gates.

    • Calibration methodology: how to measure (T_1/T_2), transfer functions (not necessarily raw factory data).

    Protected (acceptable)

    • Q-TRACE policy generator internals (training code, heuristics) if policies are shipped as static envelopes with bounds + IDs.

    • Hardware-specific calibration constants if under NDA (share ranges + methods instead).

    • Keys for HMAC/signing.

    Rationale: reviewers can still verify claims from envelopes + bundles; IP stays safe because we test what a policy does, not how you cooked it.

    16.3 Dual-use considerations

    • Benign default: stack aims at control validation and metrology, not exploitation.

    • Red lines: no features to optimize covert surveillance, weapon guidance, or mass-harvested bio signals. If a use case looks like that, we refuse collaboration.

    • Abuse resistance:

      • Blinding + prereg reduce “weaponized statistics.”

      • Hard gates prevent laundering of numerically fabricated gains.

      • Transparency logs make quiet retro-edits obvious.

    • Human factors: kid-friendly visuals are explanatory, not hype; captions state limits plainly.

    Straight answer: This ecosystem is built to withstand audit. We share enough for anyone to prove us wrong. We protect only what isn’t required to verify a claim, and we walk away from applications that cross the line.

Conclusion

    17.1 What’s defensible today

    • Pipeline, not promises. We have a reproducible, offline, single-file simulator (DREAMi-QME) and a blinded, refusal-first adjudicator (Validator V2). They produce byte-identical artifacts, enforce physics parity, and block junk (aliasing, hidden defaults, mis-specs). That’s audit-grade.

    • Evidence of uplift (n=1). On locked single-qubit regimes (50 seeds), the Q-TRACE-style policy shows a small but reliable advantage on terminal fidelity: (\bar\Delta \approx +0.0063), BCa 95% CI excludes 0, paired-t p≈2e-4, (d_z≈0.28). It’s real in-sim, not huge.

    • Numerical discipline. Convergence gate (|\Delta\bar F|\le10^{-4}) under (dt/2), Nyquist ≥10× policy bandwidth, positivity/trace guards—all enforced. No smoothing, no silent defaults.

    • Fairness by design. Blinding, paired analysis, BCa CIs, multiplicity control. If an input is unclean, the system returns NO VERDICT—full stop.

    • Scale audit machinery present. ARLiT can test scale-invariant information (QFI) across (\Lambda). It’s wired in; at (n=1) it’s routine, beyond that it’s partially exercised.

    • Operator-facing honesty. Evidence panels show time-series, CIs, and effect sizes from CSV. The A/B Race HUD is illustrative, not statistical proof. Failures are loud and specific.

    Bottom line: We can generate, audit, and judge single-qubit control claims cleanly. The measured advantage is modest and defensible within scope. No hardware claims. No multi-qubit generalization claims. Yet.


    17.2 What must be earned next

    • Multi-qubit correctness (2–5 qubits).

      • Sparse Liouvillians / MPDO path with explicit compression in the manifest.

      • Convergence and positivity gates at (n=2,3).

      • ARLiT OOS flatness across sizes; pre-declared (\tau_m,\tau_{\text{rmse}}).

      • Task-aligned endpoints (logical error proxy, entanglement fidelity), not just (F(T)).

    • Hardware parity.

      • Calibration pack (clocks, transfer functions, (T_1/T_2), amplitude maps).

      • Dual-branch ingest: raw vs corrected trajectories; same refusal discipline.

      • Sim↔HW identity tests (no-drive, Rabi/Ramsey) within pre-set residual bounds.

    • Stronger controller claims.

      • Pre-registered Q-TRACE families (bounds, bandwidth, energy) with frozen hyperparams.

      • Superiority or equivalence/NI margins that matter physically, not cosmetically.

      • Robustness to drift, bounded actuation, and realistic DAC constraints.

    • QFI maturity at scale.

      • Stable spectral pipelines (basis-order pinning, eigen-floor audits) or analytic (\partial_\theta\rho).

      • Show that QFI-based advantages survive estimator settings and correlate with task metrics.

    • Continuous audit + prereg automation.

      • Machine-readable prereg on every study; CI that reruns seed panels and blocks releases on gate failures or verdict flips.

      • Public replication packs with bundle hashes; append-only transparency log.

    • Broader adoption without IP leaks.

      • Policy-as-envelope distribution (JSON) + blinded evaluation so third parties can verify results without reverse-engineering internals.

    Non-negotiable standard: If a claim can’t survive blinding, gates, scale checks, and a third-party replay, it doesn’t go in the paper. When we earn multi-qubit ARLiT passes and hardware-clean verdicts, then—and only then—do we talk about generality.

References

    1. Lindblad, G. (1976). On the generators of quantum dynamical semigroups. Commun. Math. Phys., 48, 119–130.

    2. Gorini, V., Kossakowski, A., & Sudarshan, E. C. G. (1976). Completely positive dynamical semigroups of N-level systems. J. Math. Phys., 17, 821–825.

    3. Nielsen, M. A., & Chuang, I. L. (2010). Quantum Computation and Quantum Information (10th anniv. ed.). Cambridge.

    4. Helstrom, C. W. (1976). Quantum Detection and Estimation Theory. Academic Press.

    5. Braunstein, S. L., & Caves, C. M. (1994). Statistical distance and the geometry of quantum states. Phys. Rev. Lett., 72, 3439–3443.

    6. Paris, M. G. A. (2009). Quantum estimation for quantum technology. Int. J. Quant. Inf., 7, 125–137.

    7. Efron, B. (1987). Better bootstrap confidence intervals (BCa). JASA, 82, 171–185.

    8. Wilcoxon, F. (1945). Individual comparisons by ranking methods. Biometrics Bulletin, 1, 80–83.

    9. Cohen, J. (1988). Statistical Power Analysis for the Behavioral Sciences (2nd ed.). Erlbaum.

    10. Schuirmann, D. J. (1987). A comparison of the Two One-Sided Tests procedure and the power approach for assessing equivalence. J. Pharmacokinet. Biopharm., 15, 657–680.

    11. Benjamini, Y., & Hochberg, Y. (1995). Controlling the false discovery rate. J. R. Stat. Soc. B, 57, 289–300.

    12. Higham, N. J. (2008). Functions of Matrices: Theory and Computation. SIAM.

    13. Al-Mohy, A. H., & Higham, N. J. (2011). Computing the action of the matrix exponential, with an application to exponential integrators. SIAM J. Sci. Comput., 33, 488–511.

    14. Strang, G. (1968). On the construction and comparison of difference schemes. SIAM J. Numer. Anal., 5, 506–517. (Lie–Trotter/Suzuki splitting context)

    15. Wiseman, H. M., & Milburn, G. J. (2010). Quantum Measurement and Control. Cambridge. (For measurement/back-action context)


    Appendices

    A. Mathematical Details (Lindblad forms, QFI derivations)

    Lindblad master equation (single qubit).
    [
    \dot\rho = -i[H(t),\rho] + \sum_j \left( L_j \rho L_j^\dagger - \tfrac12 \{ L_j^\dagger L_j, \rho \} \right),
    \quad H(t) = H_0 + \sum_k u_k(t) H_k.
    ]

    Standard channels.
    Amplitude damping: (L_1=\sqrt{\gamma_1}\,\sigma_-). Dephasing: (L_\phi=\sqrt{\gamma_\phi}\,\sigma_z).

    Fidelity.
    Pure target (|\psi_\star\rangle): (F=\langle\psi_\star|\rho|\psi_\star\rangle). General Uhlmann fidelity:
    [
    F=\left(\mathrm{Tr}\sqrt{\sqrt{\rho_\star}\rho\sqrt{\rho_\star}}\right)^2.
    ]

    QFI via SLD.
    (\partial_\theta\rho=\tfrac12(\rho L_\theta+L_\theta\rho)), (\mathcal F_\theta=\mathrm{Tr}(\rho L_\theta^2)).
    Spectral formula: for (\rho=\sum_i p_i\,|i\rangle\langle i|),
    [
    \mathcal F_\theta = \sum_{i,j:\, p_i+p_j>0} \frac{2}{p_i+p_j}\, |\langle i|\partial_\theta\rho|j\rangle|^2.
    ]

    Finite-difference fallback.
    (\partial_\theta\rho\approx[\rho(\theta+\delta)-\rho(\theta-\delta)]/(2\delta)). Use spectrum floor (\epsilon) for small (p_i).
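
    A direct NumPy transcription of the spectral formula and the finite-difference fallback; the floor eps plays the role of spectrum_eps, and rho_of_theta is an assumed callable:

        import numpy as np

        def qfi_spectral(rho, drho, eps=1e-10):
            """QFI from the spectral formula; pairs with p_i + p_j <= eps are floored out."""
            p, U = np.linalg.eigh(rho)
            d = U.conj().T @ drho @ U                  # d_theta rho in the eigenbasis
            F = 0.0
            for i in range(len(p)):
                for j in range(len(p)):
                    s = p[i] + p[j]
                    if s > eps:
                        F += 2.0 * abs(d[i, j]) ** 2 / s
            return F

        def drho_fd(rho_of_theta, theta, delta=1e-5):
            """Central-difference fallback for d_theta rho."""
            return (rho_of_theta(theta + delta) - rho_of_theta(theta - delta)) / (2 * delta)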


    B. Numerical Schemes and Stability Proof-Sketches

    Time stepping.

    • Fixed-step RK4 on (\mathrm{vec}(\rho)) with exact Liouvillian (\mathcal L(t)) (one step is sketched after the guards below).

    • Segment-exact exponential on piecewise-constant controls: apply (\exp(\Delta t,\mathcal L)) via scaling-and-squaring/Padé or Krylov action.

    Stability/accuracy guards.

    • Nyquist: (f_s \ge 10\,f_{\text{bw}}) from the control envelope → avoids aliasing.

    • Convergence: require (|\Delta \bar F|\le10^{-4}) under (\Delta t\to\Delta t/2).

    • Physicality: (|\mathrm{Tr}\rho-1|\le10^{-10}); (\lambda_{\min}(\rho)\ge-10^{-10}) (diagnostic; no in-step clipping).
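
    As a sketch, one RK4 step on (\mathrm{vec}(\rho)) plus the diagnostic physicality guard; Lt(t) is assumed to return the (sparse) Liouvillian at time t:

        import numpy as np

        def rk4_step(Lt, v, t, dt):
            """One classic RK4 step of dv/dt = L(t) v."""
            k1 = Lt(t) @ v
            k2 = Lt(t + dt / 2) @ (v + dt / 2 * k1)
            k3 = Lt(t + dt / 2) @ (v + dt / 2 * k2)
            k4 = Lt(t + dt) @ (v + dt * k3)
            return v + (dt / 6) * (k1 + 2 * k2 + 2 * k3 + k4)

        def physicality_guard(v, d, tol=1e-10):
            """Diagnostic trace/positivity checks; no in-step clipping."""
            rho = v.reshape((d, d), order="F")
            if abs(np.trace(rho).real - 1.0) > tol:
                raise RuntimeError("E_TRACE_DRIFT")
            if np.linalg.eigvalsh(0.5 * (rho + rho.conj().T)).min() < -tol:
                raise RuntimeError("E_POSITIVITY_BREACH")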

    Why this works (sketch).

    • Lindblad generators are dissipative; exact map is CPTP.

    • RK4 consistency + segment constancy + sufficiently small (\Delta t) gives bounded local error; the gate enforces that the global error on the reported metric is negligible.

    • Aliasing gate binds discretization of control; otherwise you under-sample high-BW pulses and fabricate wins.


    C. Statistical Plan (BCa bootstrap, TOST)

    Design. Paired seeds (s=1,\dots,S). Primary endpoint: terminal (F(T)). (\Delta_s=M_{B,s}-M_{A,s}).

    Paired (t). (t = \bar\Delta/(s_\Delta/\sqrt{S})). Two-sided (p).
    Wilcoxon on ({\Delta_s}) as robustness.
    BCa CI. (B=10{,}000) resamples; compute bias-correction (z_0) and acceleration (a); report 95% BCa interval.
    Effect size. (d_z=\bar\Delta/s_\Delta).
    TOST (equivalence). Pre-set (\delta); reject both (H_{01}:\bar\Delta\le-\delta) and (H_{02}:\bar\Delta\ge+\delta).
    Multiplicity. Bonferroni by default; FDR optional if pre-declared. (A worked sketch of this plan follows.)
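
    A worked sketch of the plan above (paired t, Wilcoxon, (d_z), a jackknife-accelerated BCa interval, and optional TOST); B and the bootstrap seed are illustrative defaults:

        import numpy as np
        from scipy import stats

        def paired_analysis(delta, B=10_000, alpha=0.05, eq_delta=None, seed=0):
            """delta: per-seed differences M_B - M_A on the primary endpoint."""
            delta = np.asarray(delta, float)
            S = len(delta)
            t, p = stats.ttest_1samp(delta, 0.0)       # paired t on the deltas
            dz = delta.mean() / delta.std(ddof=1)      # Cohen's d_z
            p_w = stats.wilcoxon(delta).pvalue         # robustness check

            rng = np.random.default_rng(seed)          # deterministic resampling
            boots = np.array([rng.choice(delta, S).mean() for _ in range(B)])
            z0 = stats.norm.ppf((boots < delta.mean()).mean())        # bias correction
            jack = np.array([np.delete(delta, i).mean() for i in range(S)])
            diff = jack.mean() - jack
            a = (diff ** 3).sum() / (6.0 * (diff ** 2).sum() ** 1.5)  # acceleration
            def bca_q(q):
                z = stats.norm.ppf(q)
                return stats.norm.cdf(z0 + (z0 + z) / (1 - a * (z0 + z)))
            ci = np.quantile(boots, [bca_q(alpha / 2), bca_q(1 - alpha / 2)])

            p_tost = None
            if eq_delta is not None:                   # TOST: two one-sided tests
                se = delta.std(ddof=1) / np.sqrt(S)
                p_lo = stats.t.sf((delta.mean() + eq_delta) / se, S - 1)
                p_hi = stats.t.cdf((delta.mean() - eq_delta) / se, S - 1)
                p_tost = max(p_lo, p_hi)               # equivalence iff p_tost < alpha
            return {"t": float(t), "p": float(p), "p_wilcoxon": float(p_w),
                    "d_z": float(dz), "bca_ci": ci.tolist(), "p_tost": p_tost}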


    D. Manifest & File Schema (JSON/CSV)

    manifest.json (minimal skeleton).

    {
      "engine": {"name":"DREAMi-QME","version":"x.y.z"},
      "system": {"dim":2,"basis":"Pauli"},
      "hamiltonian": {
        "H0": {"omega0_rad_s": 2.0},
        "controls": [
          {"axis":"x","H":"0.5*sigma_x",
           "envelope":{"type":"piecewise","segments":[{"t0":0.0,"t1":0.01,"ux":0.8,"uy":0.0,"uz":0.0}],
                       "bandwidth_hz": 800}}
        ]
      },
      "dissipators": [{"op":"sigma-","rate_1_per_s": 1000}, {"op":"sigma_z","rate_1_per_s": 200}],
      "numerics": {"integrator":"RK4","dt_s":1e-4,"t_final_s":0.02,"theta_step":1e-5,"spectrum_eps":1e-10},
      "targets": {"rho_star":"ket|0>","endpoints":["F_terminal","AUC"],"tau":0.95},
      "policy": {"source":"Q-TRACE","policy_id":"QTRACE-001","version":"p.q.r",
                 "bounds":{"Umax":1.0,"Smax":10.0,"Emax":2.5}},
      "anti_alias": {"fs_hz": 10000, "fbw_hz": 800, "nyquist_factor": 12.5},
      "units": {"freq":"rad/s","time":"s","rate":"1/s"},
      "seeds": {"rng": 12345, "policy": 67890},
      "platform": {"os":"…","cpu":"…","float":"IEEE754-64"}
    }
    

    timeseries.csv columns.
    t, F, QFI_theta[, rho_re_00, rho_im_00, rho_re_01, rho_im_01, rho_re_10, rho_im_10, rho_re_11, rho_im_11]
    Strictly monotone t. No smoothing. Same grid as integration. (A sanity-gate sketch follows this file list.)

    hashes.json. SHA-256 per artifact + bundle_sha256 (+ optional HMAC).
    runlog.jsonl. Timestamped events, gates, and any refusals.
    verdict.json (Validator). Verdict, stats, BCa CI, effect size, refusals, blinding hashes.
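
    A sanity-gate sketch for timeseries.csv, enforcing the two hard constraints above (strictly monotone t, finite values); column names follow the schema in this appendix:

        import csv
        import numpy as np

        def check_timeseries(path: str):
            with open(path, newline="") as f:
                rows = list(csv.DictReader(f))
            t = np.array([float(r["t"]) for r in rows])
            F = np.array([float(r["F"]) for r in rows])
            if not np.all(np.diff(t) > 0):
                raise RuntimeError("t must be strictly monotone")
            if not (np.all(np.isfinite(t)) and np.all(np.isfinite(F))):
                raise RuntimeError("NaN/Inf forbidden")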


    E. Refusal Codes and Troubleshooting Tree

    QME layer.

    • E_CONVERGENCE_FAIL → halve dt; verify (|\Delta\bar F|\le 10^{-4}).

    • E_ALIASING → increase fs_hz or reduce policy bandwidth_hz.

    • E_DEFAULT_FORBIDDEN → add missing numerics/units/seed fields.

    • E_TRACE_DRIFT / E_POSITIVITY_BREACH → stiff regime: use implicit stepper or adjust segmenting.

    • E_PROVENANCE_MISSING → set fixed seeds; record platform.

    Validator layer.

    • E_MANIFEST_MISMATCH → diff manifests; make physics/numerics identical.

    • E_ENDPOINT_UNDECLARED / E_MULTIPLICITY_UNCONTROLLED → write prereg plan; set primary & method.

    • E_LABEL_LEAK / E_PROTOCOL_BREACH → remove labels; re-run blinded.

    ARLiT layer.

    • E_QFI_UNSTABLE → adjust theta_step/spectrum_eps; lock eigenbasis ordering.

    • E_QFI_BASIS_DRIFT → enforce consistent eigenvector phase/order.

    • E_OOS_SLOPE_EXCEED / E_OOS_RMSE_EXCEED → policy doesn’t scale; redesign or change metric.

    Tree (short version).
    Refusal? → Identify layer → Read code → Apply fix above → Reproduce bundle → Re-validate. No hacks, no overrides.


    F. Reproducibility Checklist (hashes, seeds, env)

    • Seeds fixed (rng, policy) and recorded in manifest.

    • Units explicit (rad/s, s, 1/s).

    • Numerics explicit (integrator, dt_s, theta_step, spectrum_eps).

    • Anti-aliasing satisfied (fs_hz ≥ 10*fbw_hz).

    • Convergence gate passes ((|\Delta\bar F|\le 10^{-4}) under (dt/2)).

    • SHA-256 per artifact + bundle_sha256.

    • Platform fingerprint recorded (OS/CPU/float).

    • No NaNs/Infs; monotone t.

    • Blinding preserved until verdict serialization.

    • Prereg plan present and honored.


    G. Expanded Case-Study Tables and Figures

    Tables (suggested).

    • T1: Per-regime summary (mean ΔF, SD, (d_z), BCa CI, (p_t), (p_W)).

    • T2: Sensitivity (dt/2, grid jitter) deltas.

    • T3: QFI stability vs. (\delta_\theta), (\epsilon).

    Figures (from CSV).

    • F1: Stacked (F(t)) by seed (thin gray) + mean (bold) per arm (blinded).

    • F2: Histogram/violin of (\Delta_s).

    • F3: BCa CI tile for (\bar\Delta).

    • F4: ARLiT renormalized (C(\Lambda)) with train/holdout bands (when available).


    H. Glossary and Notation

    • (F(t)): fidelity to target at time (t).

    • (\mathcal F_\theta): Quantum Fisher Information w.r.t. parameter (\theta).

    • QME: DREAMi-QME Lindblad simulator.

    • Validator: DREAMi-Validator V2 statistics engine.

    • ARLiT: scale-invariance auditor (renormalized information flatness).

    • Q-TRACE: information-weighted control envelope family.

    • AUC: area under (F(t)) curve.

    • BCa: bias-corrected and accelerated bootstrap CI.

    • (d_z): paired-samples Cohen’s effect size.

    • OOS: out-of-sample (holdout).

    • (\Lambda): scale/resolution index (or qubit count (n)).


    I. Software & Data Availability; Replication Instructions

    What we release.

    • Source/binaries for QME + Validator (single-file builds), ARLiT auditor, plotting script.

    • Full replication packs for each arm (CSV/JSON/hashes/runlog) + study-level verdict.json, unblind.json, and prereg plan.

    • Schema docs and refusal code tables.

    How to replicate (straight).

    1. Verify bundle_sha256.

    2. Run QME with manifest.json → reproduce timeseries.csv byte-for-byte.

    3. Run Validator on A/ and B/ + the plan → reproduce verdict.json.

    4. (If applicable) Run ARLiT on multi-(\Lambda) bundles → check OOS flatness results.

    5. Re-do the convergence gate by halving dt_s in a copy of the manifest → confirm (|\Delta\bar F|\le 10^{-4}) and verdict stability.

    If your run differs:

    • Hash mismatch → your artifacts aren’t the originals.

    • Verdict mismatch with matching bundles → report as a bug with your OS/CPU/float; we’ll treat it as a release-blocking issue.

    • ARLiT OOS failure → your policy doesn’t scale. That’s the point of the audit.

    That’s everything you need to re-run us, catch us, or corroborate us—no HPC, no secret sauce required.

    In plain terms:

    • We built two tools: one simulates a qubit’s physics deterministically (QME) and one judges results fairly and blindly (Validator).

    • Every run is offline, single-file, deterministic—same inputs give the same bytes, or it doesn’t count.

    • The simulator logs everything (CSV + JSON + hashes) and refuses to run if the math is shaky (aliasing, bad tolerances, hidden defaults).

    • The validator blinds A vs B, runs paired stats, and returns PASS/FAIL/NO VERDICT—no wiggle room, no p-hacking.

    • In tests (50 seeds), the Q-TRACE-style control gave a small but real fidelity bump; not hype, not huge.

    • ARLiT checks if the “information gain” scales beyond one qubit; if it doesn’t, we say so.

    • The UI shows what happened, why, and where it failed—clean wins only, noisy nonsense gets blocked.

    • Next steps: multi-qubit, real hardware, and automated pre-reg/audit—earn broader claims or don’t make them.




------------------------------------------------------------------------------------------------------------------------

    Disclaimer: This summary presents findings from a numerical study. The specific threshold values are in the units of the described model and are expected to scale with the parameters of physical systems. The universality of these phenomena is a core subject of ongoing investigation.


    ------------------------------------------------------------------------------------------------------------------------


    [Disclaimer: This was written with AI by Jordon Morgan-Griffiths | Dakari Morgan-Griffiths] 

    This paper was written with AI from notes and work by Jordon Morgan-Griffiths. If anything comes across wrong, I ask that you blame OpenAI rather than me; I am not a PhD scientist. You are welcome to ask me directly, and to take the formulae, the simulations, and so on.

    I hope to make more positive contributions ahead, whether right or wrong.



    © 2025 Jordon Morgan-Griffiths UISH. All rights reserved. First published 24/10/2025.



