# P20 BiquadNEON — Audit

**Status:** finding revised. Source is form-agnostic; each font emits the form best for its target. NEON font extension required.

**Auditor:** Atlas

**Date:** 2026-05-29 (revised same day per Josh's architecture clarification)

---

## Lithos source (`pfuncs/p20-biquadneon.ls`)

```
\ p20-biquadneon.ls -- Biquad NEON (4v)
\ Same math as scalar biquad, NEON lanes.
biquadneon ⇌ · ++ - · ←
```

Identical chain to scalar P3 (`p3-biquad.ls`); the comment says *"Same math as scalar biquad, NEON lanes."* Per `quantum.factory.language`:

> `biquad ⇌ · ++ - · ← — forward dot product, add state, subtract feedback dot product, store state`

That's **Direct Form I** (DF-I):

- `·`: forward dot `[b0,b1,b2] · [x,x_1,x_2]`
- `++`: add the (running) state vector
- `-`: subtract feedback
- `·`: feedback dot `[a1,a2] · [y_1,y_2]`
- `←`: store `[x_1,x_2,y_1,y_2]` shifted

## Hand-asm (`sixth/qv/lib/p-functions.fs`, `emit-p20`)

19 instructions, 76 bytes. Comment at line 1028 reads: *"TDF-II: y = b0*x + z1"*.

```
LDR  S0, [X9, #0]              ; b0
DUP  V0.4S, V0.S[0]            ; broadcast b0 → 4 lanes
LDR  S1, [X9, #4]              ; b1
DUP  V1.4S, V1.S[0]
LDR  S2, [X9, #8]              ; b2
DUP  V2.4S, V2.S[0]
LDR  S3, [X9, #12]             ; a1
DUP  V3.4S, V3.S[0]
LDR  S4, [X9, #16]             ; a2
DUP  V4.4S, V4.S[0]
FMUL V15.4S, V10.4S, V0.4S     ; y = b0·x
FADD V15.4S, V15.4S, V16.4S    ; y += z1 → y = b0·x + z1
MOV  V16.16B, V17.16B          ; z1' = z2
FMLA V16.4S, V10.4S, V1.4S     ; z1'+= b1·x → z1 = z2 + b1·x
FMLS V16.4S, V15.4S, V3.4S     ; z1'-= a1·y → z1 = z2 + b1·x − a1·y
FMUL V17.4S, V10.4S, V2.4S     ; z2' = b2·x
FMLS V17.4S, V15.4S, V4.4S     ; z2'-= a2·y → z2 = b2·x − a2·y
MOV  V10.16B, V15.16B          ; output = y
RET
```

## Math (TDF-II, as implemented)

Per sample, four voices in parallel (V.4S):

```
y[n]  = b0·x[n] + z1[n−1]
z1[n] = b1·x[n] + z2[n−1] − a1·y[n]
z2[n] = b2·x[n] − a2·y[n]
out   = y[n]
```

State: two NEON registers (z1, z2) per voice-quad.

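
As a cross-check on the transcription above, the TDF-II recurrence can be mirrored lane by lane in a few lines of Python. This is a minimal sketch, not the project's oracle: the name `tdf2_step`, the list-of-four-floats lane model, and the coefficient values are assumptions made for illustration, and Python floats are f64 rather than the f32 lanes the blob uses.

```python
def tdf2_step(x, coeffs, z1, z2):
    """One TDF-II sample for four voices, in the order the hand-asm computes it.

    x, z1, z2 are lists of 4 floats (one per lane). coeffs is
    (b0, b1, b2, a1, a2), shared by all lanes as in the LDR + DUP prologue.
    Returns (y, z1_next, z2_next). Hypothetical helper, not part of the codebase.
    """
    b0, b1, b2, a1, a2 = coeffs
    # FMUL + FADD: y = b0*x + z1
    y = [b0 * xi + z1i for xi, z1i in zip(x, z1)]
    # MOV + FMLA + FMLS: z1' = z2 + b1*x - a1*y
    z1_next = [z2i + b1 * xi - a1 * yi for xi, yi, z2i in zip(x, y, z2)]
    # FMUL + FMLS: z2' = b2*x - a2*y
    z2_next = [b2 * xi - a2 * yi for xi, yi in zip(x, y)]
    return y, z1_next, z2_next


# Usage: a unit impulse on lane 0 only, with state starting at rest.
coeffs = (0.5, 0.25, 0.125, -0.5, 0.25)  # illustrative values, not a designed filter
z1, z2 = [0.0] * 4, [0.0] * 4
y0, z1, z2 = tdf2_step([1.0, 0.0, 0.0, 0.0], coeffs, z1, z2)
y1, z1, z2 = tdf2_step([0.0, 0.0, 0.0, 0.0], coeffs, z1, z2)
```

With these values lane 0 outputs 0.5 for both samples, matching what the DF-I difference equation gives for the same impulse, while lanes 1 to 3 stay at zero because the lanes never interact.
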
## Math (DF-I, what the chain says)

```
y[n] = b0·x[n] + b1·x[n−1] + b2·x[n−2] − a1·y[n−1] − a2·y[n−2]
state_next = [x[n], x[n−1], y[n], y[n−1]]
```

State: four NEON registers (x1, x2, y1, y2) per voice-quad.

## Register/state trace (hand-asm)

- **Params (per buffer, scalar then broadcast to V.4S):**
  - `V0.4S` ← b0, `V1.4S` ← b1, `V2.4S` ← b2, `V3.4S` ← a1, `V4.4S` ← a2
- **Signal in:** `V10.4S` = x (4-voice input)
- **Signal out:** `V10.4S` = y (4-voice output)
- **Scratch:** `V15.4S` = y (intermediate, output staged here before MOV to V10)
- **Persistent state across callbacks:** `Q16` = z1, `Q17` = z2 (per the `Q16-Q31 persistent` convention)

## Finding (form-agnostic source; font picks the form)

**The source chain `· ++ - · ←` reads abstractly: forward path, add running state, subtract feedback, store.** Neither DF-I nor TDF-II is encoded *in the chain itself*; the form is decided by which font compiles it.

Per Lithos law L9 (`lithos.laws.language`):

> *the font table is the optimizer*

And per `lithos.core.language`:

> *target invariance — source unchanged across targets; font table varies*

So the same `biquad ⇌ · ++ - · ←` source is correct on **all** targets, and each target's font emits the form best suited for it:

| Target | Form | Why | Evidence |
|--------|------|-----|----------|
| arm64 scalar (P3) | DF-I | small scalar state, simple recurrence | `build-factory.py` P3 recipe: 20 insns, DF-I |
| arm64 NEON (P20) | TDF-II | better numerical behavior under parameter automation; smaller SIMD state (2 V.4S regs vs 4) | `sixth/qv/lib/p-functions.fs:1015–1041` hand-asm |
| WASM | TBD (likely TDF-II) | comparable reasoning to NEON | font not yet emitting |
| GPU (AIR/SPIR-V) | TBD | dataflow, may prefer TDF-II for SIMT lanes | not yet built |

The forms differ in:

1. **State count and meaning.** DF-I state = past input/output samples (4 state values per voice). TDF-II state = pre-computed intermediate accumulators (2 state values per voice).
   The PFUNC_META row confirms NEON is TDF-II: `"b0,b1,b2,a1,a2,z1[4],z2[4]"`.
2. **Behavior under time-varying coefficients.** TDF-II's state holds partial sums that were already weighted by the coefficients in force when they were computed, so a coefficient change does not re-weight them. DF-I's state is the raw signal history, which the new coefficients re-weight on the next sample. A parameter automation ramp therefore produces a different transient in each form. This is a large part of why TDF-II is favored here for SIMD audio.
3. **Static-coefficient equivalence.** Given identical, time-invariant coefficients (b0, b1, b2, a1, a2) and zero state initialization, both forms produce the same output for any input, up to floating-point rounding. The f64 numpy reference (Lyra's column) is the arbiter; both target outputs are diffed against it, not against each other.

## Resolution path

**Not a source change. A font-extension change.**

The NEON arm64 font needs entries (and the right ngram fusion) so that compiling `· ++ - · ←` through it emits the 19-instruction TDF-II blob already in the hand-asm. Specifically:

- Recognize the pattern as "biquad" in 4-voice context.
- Emit the load+broadcast prologue for 5 coefficients (5× LDR S + DUP V.4S).
- Emit the TDF-II body: FMUL → FADD → MOV V.16B → FMLA → FMLS → FMUL → FMLS → MOV V.16B.
- Emit RET.

The scalar arm64 font already (per P3's Python recipe) emits DF-I from this chain. P3 is the proof-of-concept that one source compiles correctly to multiple forms; P20 is the next target for the same chain: different font, different form, same source.

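
The equivalence and divergence claims in the Finding can be checked numerically before any blob exists. The sketch below is a scalar, pure-Python illustration under stated assumptions: `run_df1` and `run_tdf2` are hypothetical names (not Lyra's reference generator), the two coefficient sets `A` and `B` are arbitrary stable values rather than designed filters, and coefficients are supplied per sample so that a mid-stream change can be expressed.

```python
def run_df1(x, coeff_seq):
    """Direct Form I from rest. coeff_seq[n] = (b0, b1, b2, a1, a2) at sample n."""
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for xn, (b0, b1, b2, a1, a2) in zip(x, coeff_seq):
        yn = b0 * xn + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2
        x2, x1 = x1, xn  # shift input history
        y2, y1 = y1, yn  # shift output history
        out.append(yn)
    return out


def run_tdf2(x, coeff_seq):
    """Transposed Direct Form II from rest, same coefficient convention."""
    z1 = z2 = 0.0
    out = []
    for xn, (b0, b1, b2, a1, a2) in zip(x, coeff_seq):
        yn = b0 * xn + z1
        z1 = z2 + b1 * xn - a1 * yn
        z2 = b2 * xn - a2 * yn
        out.append(yn)
    return out


def max_gap(p, q):
    """Largest absolute sample difference between two equal-length outputs."""
    return max(abs(a - b) for a, b in zip(p, q))


A = (0.5, 0.25, 0.125, -0.5, 0.25)   # illustrative coefficient set
B = (0.25, 0.5, 0.25, -0.25, 0.125)  # second illustrative set
x = [1.0] + [0.0] * 7                # unit impulse

static = [A] * 8                     # coefficients never change
switched = [A] * 2 + [B] * 6         # abrupt change at sample 2

static_gap = max_gap(run_df1(x, static), run_tdf2(x, static))
switched_gap = max_gap(run_df1(x, switched), run_tdf2(x, switched))
```

With static coefficients the gap stays at rounding level. With the switch at sample 2 the two forms agree on samples 0 and 1 and then part ways, which is the behavior the form-matched oracles are meant to capture.
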
## Font-table gap (the real work)

The NEON arm64 font needs three additional entries (or their ngram equivalents) before this chain can emit the TDF-II blob:

| Op | Encoding | Used for | Glyph candidate |
|----|----------|----------|-----------------|
| FADD V.4S | `0x4E20D400 \| (Vm<<16) \| (Vn<<5) \| Vd` | `y += z1`, the `++` step | `++` (NEON variant) |
| FMLS V.4S | `0x4EA0CC00 \| (Vm<<16) \| (Vn<<5) \| Vd` | `z1 -= a1·y`, `z2 -= a2·y` (the feedback subtract) | `**--` ngram (vector multiply-subtract, parallel to existing `**++` for FMLA) |
| MOV V.16B | `0x4EA01C00 \| (Vn<<16) \| (Vn<<5) \| Vd` | `z1 ← z2` state shift | dedicated opcode or `→ ←` reduction |

Documented NEON ops already in `lithos.targets.language`:

```
⊛   FMUL V10.4S, V10.4S, V0.4S
⊞   FMLA V10.4S, V11.4S, V0.4S   (= **++ ngram)
◉   DUP V0.4S, V0.S[0]
ΣΣ  FADDP V10.4S
⊛₀  FMUL V10.4S, V10.4S, V11.4S
```

`lithos.targets.language` already documents the ngram fusion pattern at 0x80+:

```
0x83 **++ → FMLA — fused multiply-add
0x84 **-- → FMLS — fused multiply-subtract
```

So `**--` for FMLS is **already specified as an ngram** in the Lithos language; it just doesn't have a V.4S NEON variant in the arm64-dsp-font yet.

The work is:

1. Add NEON-variant entries to `arm64-dsp-font.s` (or a sibling `arm64-dsp-neon-font.s`) for: FADD V.4S, FMLS V.4S, MOV V.16B.
2. Ensure ngram fusion (`++ -` → FMLS) works in 4-voice context.
3. Verify the compiler picks the NEON variants when the chain is in a 4-voice composition (likely indicated by `.lion` declaration or by font selection at compile time).

**No source change to `pfuncs/p20-biquadneon.ls` required.** The chain is correct as-is.

## Stress / sonic notes (Lyra's column)

- **Coefficient automation.** If filter cutoff is modulated, TDF-II glides smoothly while DF-I produces audible micro-clicks. If the port emits a DF-I blob silently swapped under the TDF-II context layout, every automation move clicks.
- **State zero-on-activation.** All four lanes of both NEON state registers (V16.4S, V17.4S) must be zeroed at slot activation. If the activation path zeros only z1 lanes 0..1 (e.g., from a scalar P3 init code path), voices 2 and 3 boot with stale state → audible thump.
- **Denormal handling.** Biquad state can decay toward denormal for low-Q lowpasses on near-silent input. The TDF-II form has a state ramp toward zero that can land in denormal territory; FPCR FZ must be set. Verify the Lithos-emitted prologue matches the Sixth blob on FPCR.
- **Saturation guard.** `quantum.principles.language` A1/A7 mentions `|state| > 32 zeroed per callback` as a runaway-feedback guard. The hand-asm doesn't show the guard; it's enforced by the audio callback's post-processing step, not in P20 itself. The port doesn't need to add it, but the audit should confirm the guard still triggers for the new blob's state register layout (Q16/Q17 must be the scanned range).

## Verification criteria (Flux's framing, applied)

A port is verified when:

- `‖y_scalar − y_ideal_DF-I‖∞ < ε_scalar`: P3 scalar blob output matches the DF-I f64 numpy reference
- `‖y_NEON − y_ideal_TDF-II‖∞ < ε_NEON`: P20 NEON blob output matches the TDF-II f64 numpy reference
- Both ε within a few ULPs per accumulated FMA.

Divergence between targets is allowed (different FMA orderings; different forms under automation); divergence from the form-matched `ideal` flags a font emitting the wrong sequence.

### Oracle bifurcation (per Flux)

The f64 reference is **not single-valued for time-varying coefficients**.

- **Static-coefficient tests** (impulse, log sweep with fixed cutoff, white noise with fixed Q): one oracle is enough. Same transfer function `H(z)`, same `y[n]` from rest, so both forms produce the same f64 output up to rounding. One numpy reference, both target blobs diffed against it.
- **Automated-coefficient tests** (LFO on cutoff, knob ramp during sustain, MIDI CC modulation): the oracle must compute *in the same form as the target*.
  DF-I oracle for scalar P3; TDF-II oracle for NEON P20. They will diverge from each other for the same coefficient trajectory; that divergence is the physical reason TDF-II exists, not a bug.

Lyra's reference generator therefore needs two functions:

```python
def biquad_df1(x, b, a, state): ...   # 4-word state, for scalar/P3 oracle
def biquad_tdf2(x, b, a, state): ...  # 2-word state, for NEON/P20 oracle
```

Same `b, a` coefficient sequence + same `x` → divergent `y` under modulation. Expected.

## Verification status

- [x] Source `.ls` read.
- [x] Hand-asm read and decoded.
- [x] Math recovered for both forms (DF-I scalar, TDF-II NEON).
- [x] Register/state trace.
- [x] Form-agnostic source confirmed.
- [x] Font-table additions specified (FADD V.4S, FMLS V.4S, MOV V.16B) with concrete encodings above.
- [ ] NEON font extension implemented (`arm64-dsp-font.s` or sibling).
- [ ] Compile through Lithos compiler with NEON font → produce blob.
- [ ] Blob output diffed against f64 numpy reference (Lyra's `y_ideal`).
- [ ] Stress sequence: impulse → log sweep → cutoff-modulated white noise. The third stresses the TDF-II's automation-friendliness directly.

## Next step

1. **Atlas (me)**: specify the NEON font extension as a concrete patch: the three new entries with their 8-byte blobs, the ngram fusion rule for `++ -` → FMLS V.4S, and the prologue/epilogue conventions. Output: a font-extension proposal at `pfuncs/audit/font-extensions/neon-biquad.md`, referenced from this audit.
2. **Lyra**: produce two reference streams per coefficient trajectory:
   - `pfuncs/audit/refs/p20/y_ideal_tdf2.f32`: TDF-II f64 oracle (for NEON blob diff)
   - `pfuncs/audit/refs/p3/y_ideal_df1.f32`: DF-I f64 oracle (for scalar blob diff)

   Stimuli: impulse, log sweep with fixed lowpass at 1 kHz Q=0.707 (static), then cutoff-modulated white noise 200 Hz → 8 kHz over 1 sec (automation). The third stresses the form difference directly.
3.
   **Flux**: confirmed. TDF-II equations transcribed correctly, oracle bifurcation noted.

## Cross-spec inconsistency to fix (per Flux)

- `quantum.factory.language` says: *"(P3, biquad — IIR filter Direct Form I; Lithos: biquad ⇌ · ++ - · ←)"*. This names the wrong form.
- `quantum.wave.language` says: *"(topology, transposed direct form II — 7 f32: b0 b1 b2 a1 a2 z1 z2)"*. This names the correct form and matches the state count.

The two `.language` files disagree on the form. The wave file is right (its state matches the running code). The factory file needs a surgical edit. Out of scope for this audit page; logged as a follow-up.