Decision Briefing — 2026-06-13
This document consolidates the open decisions currently on Zach's desk into a single, exhaustive decision-ready briefing. Each section gives the full background, every realistic option with pros/cons/effort/cost, a comparison table, a concrete recommendation with reasoning, and what deciding unblocks (with next steps and relevant file paths). Nothing has been trimmed — this is the full picture so you can decide with complete context. Where a section has many sub-decisions (e.g. the per-vendor BAA matrix), a recommended default is given for each row so you can either pick row-by-row or simply say "go" to take all recommended defaults. The interactions between these decisions are summarized at the very end.
At-a-glance recommendations
| Decision | My recommendation | One-line why |
|---|---|---|
| 1. cortextOS Monetization Pilot — Go / No-Go | Go — Option A: narrow no-PHI "Practice Intelligence" pilot, Buffalo = validation/waitlist, full fleet behind Gates 0-3 | Buys the one piece of evidence you lack (will an outside clinic pay + run it) for ~$2-3K instead of betting an MSP build on faith |
| 2. Athena BAA matrix + name-fix gate | Take all reconciled defaults: sign PLAUD BAA + clinic account, remove Drive/Supabase-names/Telegram-selection/TTS/iCloud from PHI path, Vertex-under-BAA for summaries, build the gem as a suggestion engine with a human gate | Ships the real win (correctly-named consult summary) while being structurally incapable of silently writing the wrong patient to a chart |
| 3. Notification routing | Option D: Hybrid — Apollo direct, six workers via Sage, dashboard carries routine, plus a fleet-wide lifecycle-ping rate limit | Only option that honors all three hard constraints (Apollo separate, approvals never buried, Telegram = decisions/alerts) and makes the 709-flood architecturally impossible |
| 5. Fork strategy | Option C: Selective cherry-pick with a standing weekly upstream-watch and a hard rule that security fixes are always pulled | Keeps the differentiated 360-commit daemon while closing the exact gap that caused Forge to re-invent PR #621 |
Decision 1: cortextOS Monetization Pilot (Go / No-Go)
The question on the table: Do you commit this week to selling a narrow, no-PHI "Practice Intelligence" pilot (~$2,500-$5,000 for 30 days, converting to ~$500-900/mo) to a tiny set of warm TTC owner-operators — using Buffalo (July 18-19) as a validation/waitlist event — before building any of the full managed "Practice Fleet" MSP machinery? Or do you go bigger, smaller, or differently-shaped?
This is a scope-and-sequencing decision, not a yes/no on whether cortextOS can ever be a product. Nobody in the three memos disputes that you have a real, rare wedge. The disagreement is entirely about how much to build, how much to charge, and how fast — before you have a single dollar of outside-clinic evidence.
Background (deep)
Why this even came up. You run a working, 24/7 autonomous multi-agent fleet that operates a seven-figure chiropractic practice. That is genuinely unusual. The instinct — well-founded — is that other cash-practice chiropractors would pay to have the same thing. Buffalo (Talsky Tonal Chiropractic Training Workshop, Buffalo WY, July 18-19, 13 CE) puts you physically in front of a room of your exact ICP who already trust you as a practitioner. The forcing function is real: a warm room, five weeks out.
The three documents and how they diverge.
- 1. The bullish memo (cortextos-monetization-memo.md). Recommends going straight to "The Practice Fleet": an $8,500 one-time install (founding $6,500), $1,750/mo managed retainer (6-mo min, $18K annual prepay), $2,500 non-refundable deposit to book a slot, client pays their own Claude Max + API. Delivery = a Mac mini M4 per clinic, joined to your Tailscale tailnet, code shipped from private GitHub Packages gated by Keygen license keys, with Stripe-webhook-driven revocation of the PAT + tailnet node on non-payment. Plus a parallel "Founders' Circle" ($25K/yr, 10-seat cap, cross-practice benchmarking). Projects ~$105K first-year cash at 5 clients, ~$380-400K at 15. Treats the gated repo as plumbing, not a product. Timeline: first $2,500 deposit by ~day 21, 3-5 deposits at Buffalo.
- 2. The Codex adversarial review (cortextos-monetization-codex-review.md). Verdict: pivot. Kill the full managed fleet as scoped. Three load-bearing objections:
- - WTP evidence is thin. True first-year cost is ~$28-32K (install + $21K retainer + Mac mini + Claude + API + staff time), not "$1,750/mo." The memo anchors on fractional-COO ($5-12K/mo) and AI-agency comps — consultant anchors. Chiropractors actually budget at ChiroHD (~$299/mo + $799 setup) and GoHighLevel (~$97/mo, HIPAA a separate $297/mo add-on) tiers. Industry economics: avg collections ~$450K, avg DC comp ~$142K, CA salary ~$41K, median DC pay $79K. A $21K/yr retainer = ~half a chiropractic-assistant salary. "One care plan pays the month" is revenue, not profit, and attribution is not automatic. Zero outside-practice case study exists with a baseline, intervention, and collected dollars.
- - The "no-PHI" story collapses the instant the fleet touches ops. HHS business-associate functions explicitly include claims processing, billing, data analysis, practice management, and administrative services involving PHI — almost exactly the proposed agent surface (front desk, stats, billing, insurance audit, reactivation, patient messaging). If the system creates/receives/maintains/transmits PHI on a clinic's behalf, you are a business associate (BAA, safeguards, breach response, subcontractor obligations). Anthropic's HIPAA/BAA path is sales-assisted Enterprise (20-50 seat minimums) — not "client pays $200/mo for Max." Telegram has no BAA. Wrong-patient output = presumed breach. Billing/care-plan agents = a second liability channel (payer disputes, recoupment, practicing-without-license).
- - The delivery stack is an MSP disguised as a product. Mac-mini-per-client = N snowflakes (Apple IDs, OS updates, disk failures, TCC, bot tokens, Node versions, EHR UI changes — the 6/08 brew/node26 incident that downed an agent 19h is the proof-class). Tailscale free tier is more limited than stated; one operator tailnet = cross-clinic blast radius. GitHub Packages npm auth is classic-PAT-only (a secret-management surface, not a moat). Keygen CE puts HA/backups/security on you, ships ~every 6 months. At 10-30 clients, support dominates: ~1 issue/clinic/week × 60-90 min = 30-45 hrs/week before sales/installs/roadmap/your own clinic. Deepest risk: you buy yourself a worse job — the "moat" is your labor.
- 3. The reconciled memo (cortextos-monetization-FINAL-reconciled.md). This is the synthesis and the recommended shape. It keeps the wedge (live unfakeable demo + warm TTC distribution) but pivots the scope and timeline. Sell a no-PHI, read-only, owner-only "Practice Intelligence" copilot: stats/KPI digests on aggregates the owner already exports, the proven VE content/marketing engine, and an owner research copilot. Explicitly excluded: billing, insurance audit, claims, coding, patient messaging, reactivation list handling, care-plan language, anything that touches a patient record or sends to a patient. Price to the de-risked scope: $2,500-$5,000 for a 30-day pilot with ONE written, measured ROI target, converting to $500-$900/mo if it hits. No retainer/install/deposit-on-unbuilt-MSP. Buffalo = validation + refundable $250-500 reservations + 1-2 design-partner pilots + objection harvesting, not a close. The full fleet is the earned destination behind four gates (Gate 0: paid no-PHI pilots prove pay+stay+low support → Gate 1: one outside clinic operationally proven in 30 days → Gate 2: compliance architecture built before ops scope → Gate 3: Keygen/GitHub/Tailscale MSP machinery only when license enforcement is an actual bottleneck).
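The Codex review's cost and support-load arithmetic can be reproduced with a short sketch. It uses only the figures quoted in the memos ($8,500 install, $1,750/mo retainer, ~1 issue per clinic per week at 60-90 minutes each). The function names are illustrative, and the Mac mini, Claude, API, and staff-time components are deliberately left out because the memos do not price them individually.

```python
# Figures below are the ones quoted in the memos; nothing else is priced in.
INSTALL_FEE = 8_500            # one-time install, bullish memo
RETAINER_PER_MONTH = 1_750     # managed retainer, bullish memo


def first_year_base(install: int = INSTALL_FEE,
                    monthly: int = RETAINER_PER_MONTH) -> int:
    """Install plus 12 months of retainer, before hardware, Claude, API, staff time."""
    return install + 12 * monthly


def support_hours_per_week(clinics: int,
                           issues_per_clinic_per_week: float,
                           minutes_per_issue: float) -> float:
    """Weekly support load in hours for a given fleet size."""
    return clinics * issues_per_clinic_per_week * minutes_per_issue / 60


if __name__ == "__main__":
    print("first-year base:", first_year_base())
    for clinics in (10, 30):
        low = support_hours_per_week(clinics, 1, 60)
        high = support_hours_per_week(clinics, 1, 90)
        print(f"{clinics} clinics: {low:.0f}-{high:.0f} hrs/week")
```

Under these inputs the install-plus-retainer base is $29,500, which is consistent with the review's ~$28-32K range once the founding discount or the unpriced extras are applied. The 30-45 hrs/week support figure corresponds to the 30-clinic end of the range; the same formula gives 10-15 hrs/week at 10 clinics.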
The single decisive fact: there is currently no outside-practice evidence at all — not on willingness-to-pay, not on install effort, not on support load, not on whether a non-Zach clinic can even run this without you babysitting it. Every dollar of MSP infrastructure spent before that evidence exists is spent on faith. The reconciled play is explicitly designed to buy that evidence cheaply before committing to the heavy machine.
What's already true in your favor (so we don't over-correct into "do nothing"):
- - The demo is unfakeable and the marginal cost to show it is zero.
- - Distribution is warm: TTC network, the Buffalo room, Dr. Saylor's DICCP referrals, Apollo as a real second install.
- - The no-PHI half (stats digest + content engine) already runs at VE and is already PHI-free — this is assembly and sales, not construction.
- - Founder-fit (chiropractor selling to chiropractors with a working system) is the entire thesis and is defensible.
Options
Option A — Narrow no-PHI "Practice Intelligence" pilot (the reconciled play)
How it works: Sell a read-only, owner-only, drafts-not-sends copilot — aggregate stats digest + the proven VE content/marketing engine + owner research assistant. $2,500-$5,000 for 30 days with one written, measured ROI target (content-hours-replaced or marketing-attributable bookings). Converts to $500-$900/mo on success. Buffalo = live demo + refundable $250-500 reservations + recruit 1-2 design partners + harvest objections. Build only Stripe Payment Links + an attorney-reviewed no-PHI contract now. The full fleet sits behind Gates 0-3. Legal floor: separate LLC, attorney-reviewed agreement with hard no-PHI clause + practicing-without-license line, E&O + cyber quotes.
| Aspect | Detail |
|---|---|
| Pros | Captures the wedge while avoiding the business-associate trap entirely (no PHI surface). Near-zero support — no per-clinic Mac mini, no EHR scrape, no claims pipeline to break at 2am. WTP gets tested for real ($2,500 cash, not a hand-raise on vapor). Fast to prove a clean dollar. Pre-Buffalo window goes to the Loom + pilots + proof, not revocation theater. Protects your time at the seven-figure clinic (the constraint that outranks revenue). Fails cheaply if it fails. |
| Cons | Lower ceiling than the full fleet ($500-900/mo vs $1,750/mo + install). Leaves the heavy revenue on the table for now. "Owner copilot" is a narrower story than "headcount replacement" — less dramatic on stage. Still requires legal + LLC + E&O spend (~$2-3K) before a non-Zach dollar. Conversion from pilot to subscription is unproven. |
| Effort | Low-moderate. Loom + ROI metric template + Stripe links + one attorney pass + 8-name outreach. No infra build. |
| Cost | ~$2-3K attorney review + E&O/cyber quotes. Stripe fees. Effectively no infra cost. |
Option B — Go straight to the full managed Practice Fleet (the bullish memo, as written)
How it works: Sell $6,500-$8,500 install + $1,750/mo retainer, Mac-mini-per-clinic, Tailscale tailnet, Keygen-gated private-repo distribution, Stripe-webhook revocation. Collect non-refundable deposits at Buffalo. Build the full MSP stack in the next five weeks. Run Founders' Circle in parallel.
| Aspect | Detail |
|---|---|
| Pros | Highest revenue ceiling (~$105K yr-1 at 5 clients, ~$380K at 15 if it works). One commitment, no staged hesitation. The deposit-collection-at-Buffalo motion is dramatic. |
| Cons | Walks straight into business-associate/HIPAA exposure the instant any ops-touching agent runs — the proposed surface (front desk, billing, insurance audit, reactivation, messaging) is almost verbatim the HHS BA function list. Claude Max does not clear HIPAA; Telegram has no BAA; wrong-patient output = presumed breach. Builds the entire MSP machine on zero outside-clinic evidence. Support load (30-45 hrs/wk at 10-30 clients) becomes a worse job than running VE. Five-week timeline to build legal + Stripe + Keygen + GitHub Packages + Tailscale + Loom + case study + outreach + close is a compression fantasy. Non-refundable deposits on an unbuilt, undelivered MSP are a liability, not a win. |
| Effort | Very high. Full MSP build + legal + sales + first install all compressed into 5 weeks while running VE. |
| Cost | High and front-loaded: legal (likely well above $3K once HIPAA/BAA enters), Keygen/infra ops, contractor time, plus the unpriced cost of your attention and liability. |
Option C — Do nothing now (defer monetization entirely)
How it works: Show the demo at Buffalo as pure credibility/brand, collect names informally, decide later. No SKU, no pilot, no contract.
| Aspect | Detail |
|---|---|
| Pros | Zero risk, zero spend, zero distraction from VE/TTC. Buffalo still builds your reputation. You keep optionality. |
| Cons | Wastes the single highest-signal, lowest-cost market-validation event you will ever get — a warm room of your exact ICP, five weeks out, that won't reconvene cheaply. You still won't have outside evidence afterward, so you'll be in the same place in August. A hand-raise without a price ask is near-worthless signal. |
| Effort | Minimal. |
| Cost | Opportunity cost only — but a real one. |
Option D — A different shape: "info-product / cohort first, software later"
How it works: Instead of selling the software, sell the knowledge — a paid workshop/cohort ("how I run my practice on an AI fleet") or a Founders'-Circle-style mastermind, decoupled from any managed install. Software access becomes a later upsell once you've validated who actually wants to operate it.
| Aspect | Detail |
|---|---|
| Pros | Even lower liability than Option A (you're selling teaching, not a tool that touches a clinic). Tests whether the interest converts to dollars without any delivery burden. Plays to your CE/teaching strength and the Buffalo CE context. Cross-practice benchmarking (the Founders' Circle asset) is genuinely unique. |
| Cons | A cohort still needs content built and CE accreditation is a 6-9 month drag if you want CE credit. Doesn't validate the software WTP, which is the actual question. Risks positioning you as a coach/consultant rather than a software founder. Higher ongoing facilitation burden than a near-passive copilot subscription. Doesn't use the live demo as a product — only as a teaching prop. |
| Effort | Moderate-high (curriculum + facilitation), recurring. |
| Cost | Low cash, high time. |
Comparison table
| Dimension | A — Narrow no-PHI pilot | B — Full managed fleet | C — Do nothing | D — Info-product/cohort |
|---|---|---|---|---|
| Revenue ceiling (yr 1) | Low-moderate ($500-900/mo/client) | High ($105-380K) | $0 | Moderate ($85-245K if cohort fills) |
| HIPAA/BA liability | None (no PHI surface) | High (BA trap) | None | Very low (teaching only) |
| Support burden | Near-zero | Severe (30-45 hr/wk at scale) | None | Moderate (facilitation) |
| Time-to-first-dollar | ~3-4 weeks (pilot) | Unrealistic in 5 wks | n/a | Weeks (deposits) |
| Validates software WTP? | Yes (the core question) | Assumes it | No | No |
| Threat to VE focus | Low | High | None | Moderate |
| Pre-Buffalo build needed | Stripe + 1 contract | Full MSP stack | None | Curriculum |
| Fails cheaply? | Yes | No | n/a | Yes |
| Upfront cost | ~$2-3K legal | High + liability | $0 | Low cash/high time |
Recommendation
Go with Option A — the narrow no-PHI "Practice Intelligence" pilot, with Buffalo framed as validation + waitlist, and the full fleet sequenced behind Gates 0-3. Clear go.
This is the correct pick for four reasons:
- 1. It answers the actual unknown. The one thing you do not have is evidence that an outside clinic will pay, install easily, and run this without you babysitting it. Option A is engineered to buy that evidence for ~$2-3K and a Loom. Option B spends tens of thousands and your attention assuming the answer.
- 2. It respects the constraint that outranks revenue: your time at VE. The deepest risk in the bullish memo is real — scoped broad enough to charge $1,750/mo, the fleet creates HIPAA liability and pager-duty support; scoped narrow enough to be safe, the price falls. Option A takes the narrow, safe, low-support half on purpose and prices honestly to it.
- 3. It keeps every dollar of upside alive without betting on it. The full Practice Fleet is the destination, not deleted — it's behind gates you only fund after proof. You lose nothing by sequencing; you only lose if you build the MSP on faith and it turns out chiropractors won't run it.
- 4. It uses Buffalo correctly. A non-refundable deposit on an undelivered MSP is a liability you'd be obligated to deliver against with zero proof you can. A refundable reservation + a counted show of hands at "$2,500 for a measured pilot" + 1-2 design partners + a written objection list is higher-quality signal at lower risk — and the objection list is literally your v2 requirements doc.
Avoid B (premature, liability-laden, timeline-impossible). Avoid C (squanders the cheapest validation you'll ever get). Hold D as a possible later layer — the Founders'-Circle benchmarking asset is genuinely unique and worth revisiting at Gate 1+, but it doesn't validate software WTP, which is the question now.
One caution even on A: do not let the no-PHI pilot quietly creep into ops ("can it just also peek at our reactivation list?"). The moment it touches a patient record, the no-PHI clause is void and you are in Gate-2 (compliance architecture) territory. Hold that line contractually and technically.
What deciding unblocks + concrete next steps
Deciding "Go on Option A" this week unblocks:
- - Sage to build all assets to the narrowed v1 frame (instead of waiting or building toward the wrong SKU).
- - The Buffalo slot to be positioned correctly (validation booth, not deposit booth) with five weeks of runway.
- - The legal clock to start (attorney review + E&O/cyber quotes take real lead time).
- - Scribe and Forge to get scoped, parallel delegations.
Concrete next steps (this week), in order:
- 1. Confirm the v1 frame. Reply "go on the narrow pilot" and Sage builds to it. (One sentence from you.)
- 2. Record the 12-min demo Loom today — your real Telegram thread with Sage + the VE-STATS dashboard + the content pipeline, framed around owner intelligence + content, not billing/front-desk. Delegate the screen-record edit to Scribe.
- 3. Define ONE pilot ROI metric + a baseline-capture template (content hours replaced or marketing-attributable bookings). This is the WTP test instrument. Sage drafts, you approve.
- 4. Draft the no-PHI pilot agreement + get an attorney quote (~$2-3K): scope limited to read-only / no-PHI / owner-only / drafts-not-sends; hallucination + accuracy disclaimers; hard no-PHI clause; practicing-without-license line; best-effort uptime. Get E&O + cyber quotes in parallel. Form/confirm a separate LLC for the software venture.
- 5. Stand up Stripe Payment Links only (pilot fee + refundable Buffalo reservation). Do NOT build Keygen / GitHub Packages / Tailscale gating — explicitly deferred to Gate 3.
- 6. Build the 8-name warm list and send the personal Loom with the narrowed ask: "30-day Practice Intelligence pilot, $2,500, one measured ROI target, no patient data touched. Taking 2 design partners before Buffalo." Target: 1-2 design-partner pilots committed.
- 7. Lock the Buffalo demo slot + pre-stage a refundable-reservation QR + waitlist form — validation and signal, not non-refundable deposits.
- 8. Write down every HIPAA / liability / staffing objection you hear in outreach and at Buffalo — that list decides whether v2 (ops scope) is ever worth building.
The gate that governs everything after: Gate 0 passes only if each pilot pays, each hits its ROI target, and support stays under ~1 hr/client/week. If Gate 0 fails, you stop — and you've saved yourself the MSP job for the price of a Loom and one contract. If it passes, Gate 1 (one outside clinic operationally proven in 30 days) is the non-negotiable bar before any ops/PHI scope, and Gate 2 (full compliance architecture) must be built before — never retrofitted under pressure after — you add billing/front-desk/insurance.
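The Gate 0 rule above is mechanical enough to write down as a check. The sketch below is one possible encoding, assuming a hypothetical `PilotResult` record per pilot; treating "no pilots yet" as a fail is also an assumption, chosen because absence of evidence should not open the gate.

```python
from dataclasses import dataclass

SUPPORT_CEILING_HOURS = 1.0  # "under ~1 hr/client/week" from the gate definition


@dataclass(frozen=True)
class PilotResult:
    clinic: str                    # hypothetical label for the pilot clinic
    paid: bool                     # pilot fee actually collected
    hit_roi_target: bool           # the ONE written ROI target was met
    support_hours_per_week: float  # measured support time for this clinic


def gate0_passes(pilots: list[PilotResult]) -> bool:
    """Gate 0 passes only if every pilot paid, hit its target, and stayed cheap to support."""
    if not pilots:
        return False  # ASSUMPTION: no pilots means no evidence, so the gate stays closed
    return all(
        p.paid
        and p.hit_roi_target
        and p.support_hours_per_week < SUPPORT_CEILING_HOURS
        for p in pilots
    )
```

A single pilot that misses its ROI target or needs more than about an hour of weekly support fails the gate for the whole cohort, which matches the "each pilot" wording.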
Relevant files (all absolute):
- - /Users/nightsage/cortextos/orgs/riles-fleet/agents/sage/local/notes/cortextos-monetization-FINAL-reconciled.md (the recommended shape)
- - /Users/nightsage/cortextos/orgs/riles-fleet/agents/sage/local/notes/cortextos-monetization-memo.md (the bullish full-fleet case)
- - /Users/nightsage/cortextos/orgs/riles-fleet/agents/sage/local/notes/cortextos-monetization-codex-review.md (the adversarial critique)
Decision 2: Athena Office-Integration BAA / Account Decisions + the PLAUD→ChiroHD Name-Fix Gate
This decision unlocks (or blocks) the entire Athena build. Athena is the staff-facing front door into Apollo's VE clinic brain. Before a single byte of new patient PHI flows, you have to make a set of per-vendor BAA/account calls and decide the design of the one risky gem (the PLAUD→ChiroHD patient-name fix). Codex flagged this gem as a clinical-record-integrity risk, not a normal automation bug. Everything below is decision-ready: pick a default per row, or say "go" to take all the recommended defaults.
Background (deep)
What Athena is. Athena = a friendly staff-facing interface (Telegram/voice bot + persona) that drops every captured item into Apollo's existing classify-and-route pipeline. Apollo = the clinic brain (presents to Dr. Saylor as "Apollo", identity locked 6/06). Sage = your orchestrator. Athena is NOT a separate brain in v1.
The proposed intake. Three streams, one brain:
- - Stream A — PLAUD recorder in Room 7. Captures every Day-1 new-patient consult and Day-2 report-of-findings. This is the high-value stream.
- - Stream B — iCloud / Voice Memos. Staff and doctor dictated notes via a shared iCloud folder + a dedicated clinic iPhone.
- - Stream C — Athena Telegram/voice bot. Ad-hoc staff requests.
The "buildable-now gem" (§3.4 of the blueprint). A PLAUD recording's start_time is matched against the day's ChiroHD new-patient calendar slot to attach the patient identity from the calendar (never from the hallucination-prone transcript, which turned a real 6/12 test consult into "Christina"). Then it files a VE-branded summary and emails you an eyes-on report. The original blueprint proposed auto-filing a HIGH-tier match (one candidate within 15 minutes).
What the adversarial Codex review tore apart (all of which I agree with and which reshapes the build):
- 1. This is a multi-vendor PHI workflow, not a "private in-office system." The same document that says "metadata only / PHI local" routes PHI or PHI-adjacent identifiers through PLAUD Cloud, iCloud, Google Drive, consumer Gmail/email, Telegram, Supabase, and a cloud LLM. The BAA/data-flow design must come BEFORE code, not after Phase 1. This is decisive.
- 2. The timestamp-matcher is a clinical-record-integrity risk. Auto-attaching a patient identity from appointment time, then filing/emailing it, means a wrong match contaminates the wrong chart and discloses PHI. Clinics run late, new patients overlap, recordings get left running, walk-ins happen, the scrape goes stale. A 5-day Zach-graded shadow week is not enough statistical evidence to license auto-filing PHI. This is the single most important finding. It does NOT kill the gem — it kills the auto-file default.
- 3. Supabase is an un-acknowledged PHI sink. Writing patient full name (and maybe DOB), appt type, provider, room into ve_appointments is PHI. The blueprint never lists Supabase in its sanctioned PHI zones and never mentions a Supabase BAA, retention, access control, or audit. Largest single compliance miss in the document.
- 4. PLAUD Cloud is a launch blocker, not "Phase 1 paperwork." New consults must not be captured to PLAUD Cloud until the BAA is signed AND the account is clinic-owned, not bound to your personal Gmail PLAUD account. The same fix retroactively covers all 413 legacy files already sitting in PLAUD Cloud.
- 5. iCloud cannot be waved through as "transient." Delete-after-copy does not erase provider-side retention, device backups, sync conflicts, Recently Deleted, Spotlight indexes, or staff-owned-account exposure across personal Apple IDs. No BAA ⇒ not a PHI intake path.
- 6. Drive / consumer Gmail / Telegram are used in the build plan while still unresolved. gws is currently authed to your personal account; consumer Gmail ([email protected]) has no BAA; "Patient A / Patient B + slot" buttons over Telegram can themselves be PHI in a small clinic. A privacy-disabled Telegram clinical group is reckless.
- 7. PLAUD "verified end-to-end" is overstated. The 6/12 test verified auth, list/detail reads, and regenerate on an already-summarized file. First-generation transcription on a brand-new file is explicitly unknown; the raw-audio presigned-S3 capture (Path B, the better HIPAA path) is not proven.
- 8. Genuinely missing pieces: no patient recording consent/notice workflow; no real HIPAA risk analysis; no human clinical review gate before an AI summary becomes a record; no wrong-file correction/retraction playbook; no clean personal-vs-clinic account boundary.
Where Codex over-reaches (my pushback, already reconciled): "No cloud LLM at all in v1" is too austere — the summary step can run on a BAA-covered cloud LLM (Vertex under GCP BAA) once the matrix is signed, with local Ollama as fallback. And "first 150 chars is obviously PHI" is true for dictated notes, but the fix is to send zero transcript bytes on the bus (path-only envelope), not to abandon the envelope design. The blueprint's structural safeguards Codex glosses over — forward-only watermark fencing the 413 legacy files, owner-only 0700 vault, Apollo-as-sole-executor, AMBIGUOUS-never-auto-files — are sound and worth keeping.
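The "path-only envelope" idea can be illustrated with a minimal sketch: the bus message carries a pointer into the vault and nothing derived from the transcript. The field names (`recording_id`, `vault_path`, `stream`) and the `build_envelope` helper are hypothetical and not taken from the blueprint.

```python
import json

# ASSUMED field set: an opaque id, a path inside the local vault, and the intake stream.
ALLOWED_KEYS = frozenset({"recording_id", "vault_path", "stream"})


def build_envelope(**fields: str) -> str:
    """Serialize a bus message, refusing anything beyond the path-only fields."""
    unexpected = set(fields) - ALLOWED_KEYS
    if unexpected:
        # e.g. "preview" or "transcript": transcript bytes never go on the bus
        raise ValueError(f"non-envelope fields rejected: {sorted(unexpected)}")
    missing = ALLOWED_KEYS - set(fields)
    if missing:
        raise ValueError(f"missing envelope fields: {sorted(missing)}")
    return json.dumps(fields, sort_keys=True)
```

Rejecting unknown keys rather than silently dropping them makes an accidental "first 150 chars" preview fail loudly during development instead of leaking later.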
Hard fact to remember: real PHI from the 6/12 test is already sitting untracked in agent dirs (sage/local/plaud-test-run/, argus/local/plaud-ui-map/) with no audit trail, and the 6/12 branded-summary test used a no-BAA Gemini AI Studio key on real PHI. Both need cleanup regardless of which way you decide.
Options
The decision splits into three parts: (A) the per-vendor BAA matrix (Gate 0, blocking), (B) the cross-cutting account and consent decisions, and (C) the name-fix human-confirm gate design.
Part A — The per-vendor BAA matrix
For EACH vendor in the PHI path, the answer is exactly one of three: Sign BAA / Remove from PHI path / Formally prohibit from PHI. Nothing in v1 captures, stores, or transmits new patient PHI until this is signed off.
| Vendor | Role in path | Option 1: Sign BAA | Option 2: Remove from PHI path | Option 3: Formally prohibit | Recommended | Blocking? |
|---|---|---|---|---|---|---|
| PLAUD Cloud | Holds audio/transcripts/summaries of consults | Sign PLAUD BAA + move device to clinic-owned PLAUD account (off personal Gmail). Closes exposure on all 413+ files. Effort: low (PLAUD is BAA-capable). Cost: likely free/included. | Use Path B (raw audio → local faster-whisper), PLAUD becomes a dumb capture device, no PHI to its cloud AI. Effort: medium (Path B unproven). | N/A — you need the recorder | Sign BAA + clinic account (and still pursue Path B as the production transcription path for defense-in-depth) | YES |
| Google Workspace / Drive | Proposed PHI summary destination | Upgrade clinic to Google Workspace (~$7/user/mo) + sign Workspace BAA, re-auth gws to clinic account. Effort: medium. Cost: ~$7-21/mo. | Skip Drive as a PHI destination for v1; deliver summaries via a BAA channel and have staff file into ChiroHD manually. Effort: zero. Cost: $0. | N/A | Remove from PHI path for v1 (revisit Workspace upgrade when Drive earns its place). Re-auth gws to clinic account before any Drive router ever goes live. | Only for the Drive leg |
| Supabase | Proposed to store ve_appointments incl. patient name | Sign Supabase BAA (Pro/Team plan required) + add access/retention/audit policy. Effort: medium-high. Cost: ~$25+/mo. | Store de-identified slot rows only (date, start, duration, appt_type, provider, room, opaque appt_id); fetch patient name live at confirmation time from read-only ChiroHD into the vault, never persisted to Supabase. Effort: low. Cost: $0. | N/A | Remove name from PHI path (de-identified slot rows). If you'd rather keep names in Supabase, that requires the BAA + policy first. | Only for the schedule-store leg |
| Telegram | Proposed for "Patient A/B + slot" selection buttons + Athena bot | No BAA available (Telegram Bot API). Cannot sign. | Do patient confirmation OFF Telegram — use a local/vault-backed confirm surface. Effort: low. Cost: $0. | Formally prohibit PHI bodies on Telegram; treat it as a non-BAA control plane only (initials + opaque id at most, never full name + clinical context, never a privacy-disabled group). Effort: zero. | Remove patient-selection from Telegram + formally prohibit PHI bodies on the channel. No privacy-disabled clinical group. | Only for the selection leg |
| Cloud LLM (summary) | Renders the VE-branded summary | Move to Vertex AI under a GCP BAA (smallest change from prompt v2). Effort: low-medium. Cost: pennies/summary. | Use local Ollama as the zero-egress fallback if quality holds. Effort: low. Cost: $0. | N/A | Vertex under GCP BAA, Ollama as zero-egress fallback. Log the 6/12 no-BAA Gemini touch as a one-time documented incident. | For the summary leg |
| OpenAI TTS | Proposed Athena voice output | Sign OpenAI BAA — overkill for v1. | Cut entirely from v1 (text-only Athena, deferred anyway). Effort: zero. Cost: $0 saved. | Formally prohibit PHI in any TTS payload if ever added. | Cut entirely from v1 (extra vendor, extra leak path, near-zero value). | Deferred regardless |
| iCloud / Voice Memos | Stream B capture buffer | No consumer BAA available. | Defer Stream B entirely from v1 (it's already deferred). When built, requires a clean clinic-owned device + written device policy. Effort: zero now. | Formally prohibit as a PHI intake path until a clinic device policy + risk acceptance is documented in writing. | Defer Stream B from v1; formally prohibit until clinic-owned device + written policy. | Deferred regardless |
| Consumer Gmail / email-of-record | Report delivery | No BAA on consumer Gmail. Use the clinic Workspace email once BAA'd. | Deliver only via a BAA-covered channel; you remain a recipient as a workforce member but the account-of-record must be clinic-owned. Effort: low. | Prohibit consumer Gmail as PHI delivery. | Clinic-owned account-of-record for delivery; you stay a recipient. | For the delivery leg |
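The Supabase row's "de-identified slot rows" option can be sketched as a record type that has no place to put a patient name. The column list follows the matrix (date, start, duration, appt_type, provider, room, opaque appt_id); the class name, the `duration_min` unit, and the `to_storable` helper are assumptions for illustration.

```python
import datetime as dt
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class SlotRow:
    appt_id: str        # opaque id; maps back to ChiroHD only at confirmation time
    date: dt.date
    start: dt.time
    duration_min: int   # ASSUMED unit: minutes
    appt_type: str
    provider: str
    room: str


def to_storable(row: SlotRow) -> dict:
    """Convert a slot row to plain JSON-friendly values for the schedule store."""
    out = {}
    for key, value in asdict(row).items():
        # dates and times become ISO strings; everything else passes through
        out[key] = value.isoformat() if isinstance(value, (dt.date, dt.time)) else value
    return out
```

Because the type has no name or DOB field, persisting one requires a deliberate schema change rather than an accidental extra column.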
Part B — Cross-cutting account & consent decisions
| # | Decision | Recommended default | Blocking? |
|---|---|---|---|
| 6 | Personal vs clinic accounts (PLAUD, Apple ID, Gmail, report recipient all currently personal) | Clinic-owned accounts for every PHI-touching service before capture. You stay a recipient; accounts-of-record become clinic-owned. | YES |
| 7 | Patient recording consent / notice (none exists today) | Required before Room 7 is always-on. Draft a one-paragraph notice + intake-form line + a Michigan one-party-consent check (Dr. Saylor's clinical call). I can draft all three for her sign-off. | YES — for capturing real consults |
| 8 | New-patient appointment types | Human step: you/Christina enumerate them once into config. | Setup step |
Part C — The name-fix human-confirm gate (the gem design)
| Approach | How it works | Pros | Cons | Effort |
|---|---|---|---|---|
| Original: auto-file HIGH-tier | One candidate within 15 min → auto-file + email, no human in the loop | Fully hands-off | Wrong match silently contaminates a chart + discloses PHI; 5-day shadow can't statistically license it; Codex's cardinal risk | low (but unsafe) |
| RECOMMENDED: suggestion engine + human gate | Matcher outputs a candidate + confidence tier, never a filed record. HIGH/MEDIUM/AMBIGUOUS all route to one human-confirm step in v1. A workforce member sees "calendar says X / transcript thought Y / window Z" and confirms or corrects. The summary is a draft for Dr. Saylor's review, not an auto-committed chart artifact. | Structurally incapable of writing the wrong patient silently; keeps all the value (right name on right summary); reversible | One human tap per recording (3-person workforce, low volume) | low-medium |
| Most austere (Codex's #2): local-vault-only, no match | Just store transcripts locally, no identity attach | Zero PHI egress | Throws away the actual value of the gem | low |
The recommended gate has six concrete rules:
1. No auto-file on timestamp alone, ever, in v1: HIGH/MEDIUM/AMBIGUOUS all hit the same human-confirm step.
2. Calendar identity, never transcript: the name comes from the ChiroHD slot, name-fuzz is a booster only, and the filename uses PLAUD's exact timestamp, never a transcribed name.
3. Human-confirm gate: each recording surfaces "calendar says X / transcript thought Y / window Z" to a workforce member.
4. Longer shadow, measured properly: materially longer than 5 days, graded on misattribution count under real late-running/overlap/walk-in variance, not a thumbs-up.
5. Reversibility from day one: ship the wrong-file correction playbook (retract, amend, notify, preserve audit) with the first delivery.
6. Stay inside the matrix: the gem only goes live for the legs the BAA matrix has cleared.
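A minimal sketch of the suggestion-engine shape these rules imply, not the blueprint's actual code. All type and function names are hypothetical. The 15-minute HIGH window comes from the table above; the 30-minute MEDIUM window and the exact tier boundaries are assumptions made for illustration.

```typescript
// Hypothetical types: none of these names come from the blueprint.
type Tier = "HIGH" | "MEDIUM" | "AMBIGUOUS";

interface CalendarSlot {
  patientName: string; // identity comes from the calendar slot, never the transcript
  startMs: number;     // slot start, epoch milliseconds
}

interface Suggestion {
  candidate: CalendarSlot | null;
  tier: Tier;
  requiresHumanConfirm: true; // literal type: there is no auto-file branch in v1
  fileStem: string;           // built from the recorder timestamp only
}

const HIGH_WINDOW_MS = 15 * 60 * 1000;   // 15-minute window from the table above
const MEDIUM_WINDOW_MS = 30 * 60 * 1000; // assumed wider window, for illustration

function suggestPatient(recordingStartMs: number, slots: CalendarSlot[]): Suggestion {
  const within = (windowMs: number): CalendarSlot[] =>
    slots.filter((s) => Math.abs(s.startMs - recordingStartMs) <= windowMs);

  const near = within(HIGH_WINDOW_MS);
  const wide = within(MEDIUM_WINDOW_MS);

  let candidate: CalendarSlot | null = null;
  let tier: Tier = "AMBIGUOUS";
  if (near.length === 1) {
    candidate = near[0] ?? null;
    tier = "HIGH";
  } else if (near.length === 0 && wide.length === 1) {
    candidate = wide[0] ?? null;
    tier = "MEDIUM";
  }

  // Every tier returns a suggestion for a human to confirm; nothing is filed here.
  return {
    candidate,
    tier,
    requiresHumanConfirm: true,
    fileStem: `rec-${recordingStartMs}`,
  };
}
```

Typing `requiresHumanConfirm` as the literal `true` is one way to make "no auto-file in v1" a compile-time property rather than a convention: a later change that wants to skip the gate has to change the type first.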
Recommendation
Take all the reconciled defaults. Concretely:
1. Gate 0 first (BLOCKING): produce and sign off the one-page PHI data-flow inventory + BAA matrix. No new-PHI capture until it's signed.
2. BAA matrix calls: Sign PLAUD BAA + clinic-owned PLAUD account (blocking). Remove Drive, Supabase-names, Telegram-selection, OpenAI TTS, and iCloud from the PHI path for v1. Use Vertex under GCP BAA for summaries with Ollama fallback. Clinic-owned accounts-of-record for everything PHI-touching (blocking). Draft the patient-recording consent notice for Dr. Saylor (blocking for real consults).
3. Build the name-fix gem first, as a suggestion engine with a human gate, NOT an auto-filer. It is still the right first build: highest-value, most-constrained slice, endpoints mostly proven. Change exactly one thing from the blueprint: a human confirms identity before anything is filed or delivered.
Why this pick: It ships the real win (a correctly-named consult summary instead of a transcript hallucinating "Christina") while being structurally incapable of silently writing the wrong patient to a record, because a human stands between the match and the chart. It's honest about compliance — every vendor in the path has a decision, not a "residual risk" hand-wave. And it de-risks the PHI already sitting untracked in agent dirs. The blocking items (PLAUD BAA + account, clinic accounts-of-record, consent notice) are cheap and close real current exposure; the non-blocking legs (Drive, Supabase names, Telegram selection, cloud TTS) are simply removed from v1 so the local-vault loop can proceed immediately without waiting on them.
Defaults 1 (PLAUD), 6 (clinic accounts), 7 (consent) are blocking for capturing real PHI. Defaults 2-5 (Drive, Supabase, Telegram, LLM/TTS) are blocking only for the specific cloud/external leg they govern — the local-vault loop can proceed without them.
What deciding unblocks + concrete next steps
Deciding this unblocks the entire conservative v1 build: the PHI vault, the forward-only Room-7 intake, the name-fix suggestion engine, and human-confirmed filing. It also legitimizes cleanup of PHI already at rest.
Immediately unblocked (no further approval needed once you say "go"):
- Step 1: PHI vault + guardrails (~half day, local, no approval). Create the chmod-700 vault tree (`apollo/local/phi/{inbox,work,filed,quarantine,audit,state}`); confirm `.gitignore`/`git check-ignore`, FileVault, Time Machine exclusion; add the pre-commit hook that hard-fails any staged PHI path or `X-VE-PHI: true` sentinel; write the hash-chained audit logger + crash-resumable `items.json` state store; migrate the existing 6/12 PHI test artifacts into the vault with audit entries (real PHI is already untracked in agent dirs, so de-risk first). Cache PLAUD token 0600.
- Step 2: Prove the two PLAUD unknowns on NON-PHI audio only. On a throwaway Room-7 recording of you counting, test (a) does `retry_sum_note` fire first-generation transcription on a never-summarized file, and (b) can raw audio be pulled via presigned S3 for local faster-whisper (Path B, the production target). 30-second human step: record yourself counting in Room 7.
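The pre-commit guard in Step 1 could be sketched roughly as below. This is an illustration under assumptions, not the hook itself: `StagedFile` and `findPhiViolations` are hypothetical names, the vault is assumed to sit at a repo-relative `apollo/local/phi/` path, and the sentinel is assumed to appear as a literal string in file content. A real hook would build the staged list from git and exit non-zero when any violation is returned.

```typescript
// Hypothetical shape for one staged file; a real hook would build this list from git.
interface StagedFile {
  path: string;
  content: string;
}

const PHI_VAULT_DIR = "apollo/local/phi/"; // vault root named in Step 1
const PHI_SENTINEL = "X-VE-PHI: true";     // sentinel string named in Step 1

// Returns one message per offending file. An empty array means the commit may proceed.
function findPhiViolations(staged: StagedFile[]): string[] {
  const violations: string[] = [];
  for (const file of staged) {
    const normalized = file.path.replace(/\\/g, "/");
    const inVault =
      normalized.startsWith(PHI_VAULT_DIR) || normalized.includes("/" + PHI_VAULT_DIR);
    if (inVault) {
      violations.push(`${file.path}: staged path is inside the PHI vault`);
    } else if (file.content.includes(PHI_SENTINEL)) {
      violations.push(`${file.path}: contains the "${PHI_SENTINEL}" sentinel`);
    }
  }
  return violations;
}
```

Checking both path and content means a PHI file copied out of the vault is still caught, provided it carries the sentinel.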
Unblocked after the blocking BAA/account items are signed:
- Step 3: Room-7 intake into the local vault (forward-only watermark = max start_time of the 413 files, unit-tested so backfill is structurally impossible; poller pinned to Room-7 device serial; local transcription; path-only envelope).
- Step 4: Patient identity SUGGEST-never-auto-file (the human-confirm gate above).
- Step 5: Human-confirmed filing + reversible delivery (summary as a draft for Dr. Saylor; staff file into ChiroHD manually; ship the wrong-file correction playbook with go-live).
- Step 6: Longer shadow, then graduated trust (measured by misattribution count under real variance; auto-file stays OFF in v1 regardless).
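Step 3's forward-only watermark can be illustrated with a small filter. This is a sketch under assumptions: the `Recording` shape and both function names are hypothetical, and timestamps are assumed to be epoch milliseconds. It fails closed on an empty backlog, since a missing watermark would otherwise admit every historical file.

```typescript
interface Recording {
  id: string;
  deviceSerial: string;
  startTimeMs: number; // recording start_time, epoch milliseconds (assumed unit)
}

// Watermark = the latest start_time among files that already exist (the backlog).
function computeWatermark(backlog: Recording[]): number {
  if (backlog.length === 0) {
    throw new Error("refusing to compute a watermark from an empty backlog");
  }
  return Math.max(...backlog.map((r) => r.startTimeMs));
}

// Admit only recordings from the pinned device that start strictly after the watermark.
function selectForIntake(
  candidates: Recording[],
  watermarkMs: number,
  room7Serial: string,
): Recording[] {
  return candidates.filter(
    (r) => r.deviceSerial === room7Serial && r.startTimeMs > watermarkMs,
  );
}
```

The strict `>` comparison is what makes backfill impossible: a file at or before the watermark can never pass, which is also the property a unit test would pin down.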
Concrete to-dos I can execute on "go":
1. Write the one-page PHI data-flow inventory + BAA matrix doc for your sign-off.
2. Draft the three consent artifacts (recording notice, intake-form line, Michigan one-party-consent check) for Dr. Saylor.
3. Build Step 1 (vault + guardrails) and migrate the 6/12 PHI artifacts: pure-local, no external dependency.
4. Queue the human steps for you/Christina: enumerate new-patient appt types into config; record the 30-second throwaway counting clip in Room 7.
Explicitly deferred (post-v1, each gated on the core loop being compliant + proven): Athena Telegram bot, persona, voice/TTS, office group, 8:30 digest, standalone-agent promotion; iCloud/Voice-Memos Stream B; admin_marketing routing; the self-learning classifier (replace with a reviewed, versioned test set); ChiroHD chart write-back (stays human until a separate two-gate decision).
Decision 3: Notification Routing - Who Pings Zach and How
Background (deep)
What exists today. The riles-fleet runs 10 agents: Sage (orchestrator/chief of staff), Apollo (Dr. Saylor's clinic orchestrator), Hippocrates, Forge, Mercury, Scribe, Felix, Librarian, Analyst, and Argus. Every agent has its own Telegram bot token and its own CHAT_ID in agents/<name>/.env. The agents reach Zach via cortextos bus send-telegram <chat_id> "<msg>", and the fast-checker daemon delivers Zach's inbound replies back to each agent's session.
The critical, non-obvious fact (verified just now). All agents except Apollo and Argus point at the same chat ID: Zach's personal Telegram account (`6228...860`). They differ only by bot token, so Zach sees 8 separate chat threads in one Telegram app, all of them his. The breakdown:
- Sage, Hippocrates, Forge, Mercury, Scribe, Felix, Librarian, Analyst → all `6228...860` (Zach).
- Apollo → `8441...234` (a different chat: Dr. Saylor's, consistent with the memory note that Apollo presents to Dr. Saylor and is clinic-scoped).
- Argus → `CHAT_ID` is effectively unset/blank.
So "per-agent channels" today does not mean per-audience. It means 8 bots all DMing Zach plus 1 bot (Apollo) DMing Saylor. There is no fan-in: nothing aggregates, dedupes, or prioritizes before it hits Zach's phone.
What caused last night's 709-notification flood. Each agent fires Telegram pings on lifecycle events: a "back online" message on restart, recovery messages, status confirmations, plus normal work chatter. The daemon auto-restarts sessions on a ~71h cycle, on config reloads, and on crash recovery. When a restart storm or a reload cascade hit, 8 agents each fired routine back-online/recovery pings into Zach's one account, multiplied across restart loops → ~709 notifications. This is confirmed in the daemon code: agent-process.ts had unconditional back-online sends (issue #392, line ~1150 still has the direct Agent <name> is back online path) and agent-manager.ts fires Agent <name> recovered and is back online on recovery (line ~425).
What was already fixed (the patch this decision sits on top of). Recent commits hardened the restart/notification path:
- `a8298eae`: quiet-reload, which silences routine back-online Telegram pings.
- `2417f8a5`: harden restart-loop (threshold invariant, pendingRestarts race, process-liveness).
- `80010fae`: workflow resilience / inflight registry + continue-restart resume injection.
- In `agent-process.ts`, the restart prompt now instructs agents to "Resume silently. Do NOT send a routine back-online Telegram" on continue/config-reload restarts (lines ~1035, ~1047). Back-online pings now only fire on user-requested handoff restarts. Duplicate Telegram messages/media/reactions are also suppressed in `agent-manager.ts`.
So the flood's root cause (uncontrolled lifecycle pings × restart loops) is patched. The flood was a symptom of restart instability, not of routing per se. The remaining question is the steady-state routing model now that the bleeding has stopped.
The web dashboard already exists. cortextOS ships a Next.js dashboard with an Activity feed, heartbeats, tasks, and goals. Per the agent CLAUDE.md, every significant action is supposed to be logged as an event and surfaced there. Much of what agents currently also push to Telegram (status, "I started X", "I finished Y") is already captured in the dashboard. Telegram is therefore partly redundant with the dashboard for non-urgent updates — memory note "Review drafts go to email; Telegram = short decisions/alerts only" already encodes Zach's stated intent that Telegram should be a low-volume decision/alert channel, not a firehose.
Constraints from memory / standing policy.
- Apollo is Saylor-facing and clinic-scoped; its channel must stay separate and direct (Saylor has full VE authority and self-clears).
- Every agent reply to Zach is supposed to be voice + text (Sage=cedar, others=onyx). High volume = a lot of voice notes too, which makes a flood especially painful.
- "Telegram ACK first" and "typing indicator on receive": Sage is expected to be responsive in real time, so whatever channel Zach uses to send must reach a live agent fast.
- Approvals must never queue silently: whatever model is chosen, the approval-routing path must remain reliably visible.
Options
Option A — Route ALL agents through Sage (single channel)
How it works. Only Sage keeps a live bot pointed at Zach's chat ID. All other agents (Hippocrates, Forge, Mercury, Scribe, Felix, Librarian, Analyst, Argus) stop sending Telegram directly; they emit bus messages / dashboard events instead, and Sage aggregates, prioritizes, and forwards to Zach what matters (batched into briefings, plus immediate relay of true alerts/approvals). Apollo stays separate (Saylor-facing). Implementation: blank or redirect the worker agents' CHAT_ID, or add a routing guard so send-telegram from non-Sage agents fans into Sage's inbox instead of Telegram. Sage's morning/evening reviews and approval-routing responsibilities already make it the natural aggregator.
| Aspect | Assessment |
|---|---|
| Pros | One clean thread. Sage dedupes, prioritizes, and batches → Zach gets signal not noise. Matches the chief-of-staff design and the "Telegram = short decisions/alerts" policy. Structurally prevents a future flood (8 bots can't independently spam). One voice (cedar) instead of 8. |
| Cons | Single point of failure: if Sage is down/wedged/context-exhausted, Zach goes dark on the whole fleet until Sage restarts (agent-unresponsive failure modes are a known issue). Adds a relay hop — latency + a place for Sage to drop/garble a message. Loses the ability to DM a specific agent directly (you'd talk to all agents through Sage). Sage becomes a bottleneck for inbound replies too. |
| Effort | Medium. Need a routing guard or CHAT_ID redirect for 8 agents + Sage logic to relay alerts/approvals immediately (not just at briefing time) + a fallback if Sage is down. |
| Cost | $0 infra. Slight ongoing Opus cost (Sage does more relay reasoning). |
Option B — Mute individual agent bots; keep Sage + web dashboard
How it works. Worker agents stop pinging Telegram entirely (mute = stop all non-critical sends). Routine status lives only on the dashboard Activity feed. Sage remains Zach's conversational channel and surfaces anything urgent. Differs from A in that muted agents don't even route through Sage for routine items — those just become dashboard events Zach pulls when he wants them.
| Aspect | Assessment |
|---|---|
| Pros | Quietest possible phone. Forces the dashboard to be the system of record (which it's supposed to be). No relay-garbling of routine items. Zach controls when he consumes status (pull, not push). |
| Cons | Lower push-visibility: if Zach doesn't open the dashboard, he misses agent activity. Risk that a genuinely-important worker alert gets buried as a dashboard event unless Sage explicitly escalates it. Same Sage single-point-of-failure for the live channel. Loses per-agent directness entirely. |
| Effort | Low–Medium. Mute = config/guard to drop non-critical sends + ensure dashboard event logging is complete + define the "critical → escalate to Sage" carve-out. |
| Cost | $0. |
Option C — Keep per-agent channels as-is (now that the flood is fixed)
How it works. Do nothing structural. Rely on the quiet-reload + stop-side suppression + restart-loop hardening already shipped. Each agent keeps DMing Zach in its own thread.
| Aspect | Assessment |
|---|---|
| Pros | Zero new work. Maximum directness — Zach can DM any agent in its own thread and get a focused reply. Full granularity: you always know which agent said what. No single point of failure (no agent's outage silences another). |
| Cons | The fix is behavioral/prompt-level ("resume silently"), not a hard architectural cap — 8 bots can still independently decide to message Zach, so flood risk is reduced but not eliminated. No aggregation: even at healthy volume, 8 threads + voice notes is more noise than a chief-of-staff model. Doesn't advance the "Telegram = decisions/alerts only" intent. Future restart-storm regressions could re-flood. |
| Effort | None. |
| Cost | $0. |
Option D — Hybrid: critical/clinic agents direct, the rest through Sage (RECOMMENDED)
How it works. Tiers by audience and urgency:
- Direct, keep their own channel: Apollo (Saylor-facing, must stay separate) and any agent that owns a true real-time alerting duty Zach wants unfiltered (e.g. a clinic/ops monitor like Hippocrates or Argus, if/when configured for alerts).
- Through Sage: all the content/research/infra workers (Scribe, Forge, Mercury, Librarian, Analyst, Felix). They emit dashboard events + bus messages; Sage aggregates and relays only decisions/alerts/approvals.
- Dashboard: carries all routine status for everyone, regardless of tier.
- A per-agent `notify_tier` setting (`direct` | `via_sage` | `dashboard_only`) makes the policy explicit and tunable instead of implicit in `CHAT_ID` values.
| Aspect | Assessment |
|---|---|
| Pros | Preserves directness exactly where it matters (Saylor channel, true alerts) while killing routine worker noise. Apollo's separation is honored by design, not by accident. Resilient: if Sage is down, the critical/clinic tier still reaches its audience directly. Matches existing org shape (Sage = chief of staff; Apollo = clinic) and the "Telegram = decisions/alerts" policy. Tunable: promote/demote an agent's tier as needs change. |
| Cons | Most moving parts: need a tier field + enforcement in send-telegram + Sage relay logic + clear rules for what counts as "critical." Some judgment calls on which agents are direct. Slightly more to document so agents know their tier. |
| Effort | Medium. Add notify_tier per agent, a routing guard in the bus send path, Sage relay/escalation logic. Reuses the aggregation logic Option A needs anyway. |
| Cost | $0 infra; modest Opus for Sage relay. |
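To make the tier idea concrete, here is a hedged sketch of what a routing guard keyed on `notify_tier` might look like. It is not the existing `send-telegram` code: `AgentNotifyConfig`, `Route`, and `routeNotification` are hypothetical names, and the guard is modeled as a pure function that returns a routing decision rather than performing any send.

```typescript
type NotifyTier = "direct" | "via_sage" | "dashboard_only";

interface AgentNotifyConfig {
  name: string;
  notifyTier: NotifyTier; // assumed to mirror a notify_tier field in config.json
  chatId: string;
}

type Route =
  | { kind: "telegram"; chatId: string; text: string }
  | { kind: "sage_inbox"; from: string; text: string }
  | { kind: "dashboard_event"; from: string; text: string };

// Decides where a message goes. Worker tiers can never produce a Telegram route.
function routeNotification(agent: AgentNotifyConfig, text: string): Route {
  switch (agent.notifyTier) {
    case "direct":
      return { kind: "telegram", chatId: agent.chatId, text };
    case "via_sage":
      return { kind: "sage_inbox", from: agent.name, text };
    case "dashboard_only":
      return { kind: "dashboard_event", from: agent.name, text };
  }
}
```

Because only the `direct` branch returns a Telegram route, the flood cap lives in the code path rather than in a prompt instruction, which is the architectural property Option D relies on.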
Comparison table
| Dimension | A: All via Sage | B: Mute + dashboard | C: As-is | D: Hybrid |
|---|---|---|---|---|
| Phone noise | Lowest | Lowest | Medium-high | Low |
| Push visibility | High (curated) | Low (pull) | High (raw) | High (curated + critical raw) |
| Directness to a specific agent | Lost | Lost | Full | Partial (kept for critical/clinic) |
| Single point of failure | Yes (Sage) | Yes (Sage) | No | Reduced (critical tier survives Sage outage) |
| Apollo/Saylor separation | Manual carve-out | Manual carve-out | Already separate | By design |
| Flood-proof (architectural) | Yes | Yes | No (behavioral only) | Yes |
| Granularity / "who said it" | Low | Low (dashboard has it) | High | Medium |
| Effort | Medium | Low-Med | None | Medium |
| Fit with chief-of-staff model | Strong | Strong | Weak | Strongest |
Recommendation
Pick Option D (Hybrid), with Apollo direct, the six content/research/infra workers routed through Sage, and the dashboard carrying all routine status. Park the direct-alert slot for a clinic/ops monitor (Hippocrates or Argus) only if and when one is actually configured to emit time-sensitive alerts; until then those route through Sage too.
Why D over the others:
- It is the only option that simultaneously honors the three hard constraints: Apollo must stay Saylor-facing and separate, approvals/alerts must never be silently buried, and Zach's stated intent is "Telegram = short decisions/alerts only."
- Option C (do nothing) leaves the flood prevention purely behavioral: a "resume silently" instruction in a restart prompt. That's fragile; one regression in the restart loop or one chatty agent re-floods Zach. D adds an architectural cap (a send guard keyed on tier) so it cannot physically happen for worker agents.
- Option A's single-point-of-failure is a real risk given the documented agent-unresponsive failure modes (oversized local files, self-poll wedge, op-token hangs). Putting 100% of fleet visibility behind Sage means a Sage wedge blinds Zach to everything. D keeps the critical/clinic tier alive independently.
- Option B is the right destination for routine status (dashboard as system of record) and D incorporates it: D is essentially B plus a curated Sage relay plus the critical carve-out. D gives you B's quiet phone without B's "did I miss something important" anxiety.
- D matches the org's existing shape (Sage = chief of staff who already runs briefings and routes approvals; Apollo = clinic orchestrator). It's the least surprising model and the easiest to reason about per-agent.
One caveat to log: whatever is chosen, also add a hard cap on lifecycle pings (e.g. one back-online per agent per N minutes, fleet-wide) so a restart storm can never exceed a small bounded count even for direct-tier agents. The quiet-reload patch handles routine reloads; a rate-limit handles the pathological case.
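The hard cap described in this caveat can be sketched as a per-agent minimum-interval limiter. This is an illustration only: `LifecyclePingLimiter` is a hypothetical name, the interval is a constructor parameter standing in for "N minutes", and time is passed in explicitly so the behavior is easy to test.

```typescript
class LifecyclePingLimiter {
  private readonly minIntervalMs: number;
  private readonly lastSentMs = new Map<string, number>();

  constructor(minIntervalMs: number) {
    this.minIntervalMs = minIntervalMs;
  }

  // Returns true if this agent may send a lifecycle ping now, and records the send.
  allow(agent: string, nowMs: number): boolean {
    const last = this.lastSentMs.get(agent);
    if (last !== undefined && nowMs - last < this.minIntervalMs) {
      return false; // still inside the quiet window: drop the ping
    }
    this.lastSentMs.set(agent, nowMs);
    return true;
  }
}
```

With a 10-minute interval, an agent that restarts 100 times in two minutes yields one ping instead of 100. The cap is checked before tier routing would apply, so it bounds direct-tier agents too.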
What deciding unblocks + concrete next steps
Deciding this unblocks: a clean, predictable inbox for Zach; a documented per-agent notification policy that new agents inherit on spawn; and confidence that the 709-flood class of event is structurally impossible rather than merely patched.
If Zach picks D (recommended), concrete next steps:
1. Add a `notify_tier` field to each agent's `config.json` (`direct` | `via_sage` | `dashboard_only`). Set Apollo=`direct`; Scribe/Forge/Mercury/Librarian/Analyst/Felix=`via_sage`; Sage=`direct` (it is the relay); Argus/Hippocrates=`via_sage` for now (promote to `direct` only when an alerting duty exists).
2. Add a routing guard in the bus `send-telegram` path (`src/bus/`): if caller is `via_sage`, redirect the payload into Sage's inbox instead of Telegram; if `dashboard_only`, drop the Telegram send and ensure the event is logged. `direct` passes through unchanged.
3. Add Sage relay/escalation logic: Sage immediately forwards anything tagged alert/approval/blocker from a `via_sage` agent; everything else gets batched into the morning/evening briefings. Reuse existing approval-routing so nothing queues silently.
4. Add a fleet-wide lifecycle-ping rate limit (back-online/recovery): max 1 per agent per ~10 min, hard cap regardless of tier. Belt-and-suspenders on top of quiet-reload.
5. Confirm Argus's `CHAT_ID` (currently blank), and confirm that Apollo's points at Saylor, not Zach, before flipping the guard on.
6. Document the policy in the agent CLAUDE.md / AGENTS.md notification section and in MEMORY.md so spawned agents default to `via_sage` unless explicitly promoted.
7. Verify by triggering a controlled restart of two workers and confirming zero direct Telegram pings reach Zach (only a dashboard event + optional Sage summary).
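The relay rule in the steps above (forward alert/approval/blocker immediately, batch everything else) might be sketched like this. The names `WorkerMessage`, `MessageTag`, and `SageRelay` are hypothetical, and the tag vocabulary is an assumption: the steps name alert, approval, and blocker, while `status` stands in here for all routine traffic.

```typescript
type MessageTag = "alert" | "approval" | "blocker" | "status";

interface WorkerMessage {
  from: string;
  tag: MessageTag;
  text: string;
}

const IMMEDIATE_TAGS: ReadonlySet<MessageTag> = new Set<MessageTag>([
  "alert",
  "approval",
  "blocker",
]);

class SageRelay {
  private pending: WorkerMessage[] = [];

  // Returns the message when it must be forwarded now, or null when it was batched.
  receive(msg: WorkerMessage): WorkerMessage | null {
    if (IMMEDIATE_TAGS.has(msg.tag)) {
      return msg;
    }
    this.pending.push(msg);
    return null;
  }

  // Drains everything batched since the last briefing.
  flushBriefing(): WorkerMessage[] {
    const batch = this.pending;
    this.pending = [];
    return batch;
  }
}
```

Approvals never enter the batch at all in this shape, which is one way to satisfy the "approvals must never queue silently" constraint.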
If Zach instead wants the lowest-effort path right now, fall back to C and explicitly accept the behavioral-only protection, but still ship step 4 (the rate limit) since it's cheap insurance against another flood.
Relevant files:
- `/Users/nightsage/cortextos/orgs/riles-fleet/agents/<name>/.env`: per-agent `BOT_TOKEN`/`CHAT_ID` (8 of 9 non-Apollo agents share Zach's chat `6228...860`; Apollo = `8441...234`; Argus blank)
- `/Users/nightsage/cortextos/orgs/riles-fleet/agents/<name>/config.json`: where a `notify_tier` field would live
- `/Users/nightsage/cortextos/src/daemon/agent-process.ts`: back-online send + quiet-reload restart prompts (lines ~1035, ~1047, ~1150)
- `/Users/nightsage/cortextos/src/daemon/agent-manager.ts`: recovery ping (line ~425) + duplicate-message suppression (lines ~565, ~612, ~667)
- `/Users/nightsage/cortextos/src/bus/`: where the `send-telegram` routing guard would be added
Decision 5: Fork Strategy - Track Upstream vs Maintain Divergent Fork
Background (deep)
cortextOS runs as a private fork: drzachconner/cortextos (origin) forked from grandamenium/cortextos (upstream), James Goldbach's repo. Both remotes are wired in the local clone.
The hard divergence numbers (verified this session):
| Metric | Value |
|---|---|
| Merge base | upstream commit cdd8fc61 (= upstream PR #610, "voice-agent-factory skill") |
| Our position vs merge base | 360 commits ahead, 0 behind |
| Upstream merged commits we lack | 0 (merge base IS upstream HEAD) |
| Open upstream PRs | 30 (none merged yet — they sit on top of our shared base) |
This is the important nuance the morning scan compressed: we are not behind upstream's mainline at all. Upstream main and our merge base are the same commit. The 30 PRs are unmerged work-in-progress on upstream — mostly from James himself (#614–#625), Daniel Hoffman (thehoff, #626–#633 metrics/bus fixes), and community contributors. So "tracking upstream" today means tracking open PRs, not a moved mainline. That changes the calculus: there is no painful catch-up merge waiting; there is a stream of individual fixes we can evaluate one at a time.
Where our fork has actually diverged — daemon internals, heavily:
| File | Local churn since merge base |
|---|---|
| `src/daemon/agent-manager.ts` | +620 lines, 18 local commits |
| `src/daemon/agent-process.ts` | +498 lines |
| `src/daemon/codex-cli-runner.ts` | +777 lines (brand-new file, ours only) |
| `src/daemon/session-liveness.ts` | +344 lines, 6 local commits |
| `src/daemon/fast-checker.ts` | +243 lines |
| `src/daemon/cron-scheduler.ts` | +183 lines |
| `src/hooks/hook-crash-alert.ts` | +86 lines |
| Total daemon+hooks area | ~3,067 insertions, 226 deletions |
The divergence is concentrated exactly where the valuable upstream reliability PRs also live. That is the core tension.
The re-work that triggered this decision: Forge's commit 2417f8a5 (last night, 6/13) hardened the restart loop after a 6/13 restart-loop incident. Its part (1), "threshold derived at check time from the live heartbeat cron interval + 60min buffer, never below a cron-floor", is functionally the same idea as upstream #621 ("per-agent heartbeat-staleness threshold from loop_interval"). We re-invented a fix James had already written and published as an open PR. Forge's commit also added two things #621 does NOT have: an `inFlightRestarts` dedupe set (pendingRestarts race guard) and a `checkAgentProcessAlive` pre-restart check. So it is not pure waste: it is a superset born of not knowing #621 existed.
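For readers who want the threshold idea in concrete form, here is a sketch of one plausible reading of it, not the code in 2417f8a5 or in #621. The function and class names are hypothetical, the floor is a parameter because its value is not stated here, and the process-liveness check is left out because its exact semantics are not described in this briefing.

```typescript
const BUFFER_MS = 60 * 60 * 1000; // the 60-minute buffer described above

// Threshold is computed at check time from the live cron interval, never below the floor.
function stalenessThresholdMs(cronIntervalMs: number, cronFloorMs: number): number {
  return Math.max(cronIntervalMs + BUFFER_MS, cronFloorMs);
}

// A dedupe set in the spirit of inFlightRestarts: one restart per agent at a time.
class RestartDedupe {
  private readonly inFlight = new Set<string>();

  // Returns true if the caller now owns the restart for this agent.
  tryBegin(agent: string): boolean {
    if (this.inFlight.has(agent)) {
      return false;
    }
    this.inFlight.add(agent);
    return true;
  }

  // Must be called when the restart completes or fails, or the agent stays locked out.
  finish(agent: string): void {
    this.inFlight.delete(agent);
  }
}
```

Deriving the threshold at check time, rather than caching it at boot, is what keeps it correct when an agent's cron interval changes on a config reload.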
Conflict-risk reality (the make-or-break detail for cherry-picking): I checked, per PR, whether the files it touches overlap our diverged files.
- #621 writes a brand-new file `src/bus/heartbeat-staleness.ts` (0 local commits there) plus a small import in `metrics.ts` (1 local commit). Low conflict, but we already have an equivalent via Forge's commit living in `session-liveness.ts`, so #621 would be redundant and create two parallel threshold implementations.
- #623 (restart-into-wedge), #624 (midnight crash-budget reset + auto-unhalt), #625 (MCP-degraded boot), #620 (verify-spawn), #619 (restart-marker hygiene) all touch `agent-process.ts` and/or `agent-manager.ts`, our two most-diverged files. High conflict. These would not cherry-pick cleanly; each needs a manual port.
- #618 (25s telegram long-poll), #571 (undici keepalive retry) touch `src/telegram/poller.ts`/`api.ts`, with only 1 local commit each. Low-medium conflict.
- #605 (atomic windowless lock mutex, which fixes the pid-less inbox deadlock) touches `src/utils/lock.ts` (0 local commits) plus bus files. Low conflict and addresses a real deadlock class.
Monetization bearing: there's an active "paid private repo" idea. Divergence is a direct maintenance tax on productizing: every upstream security fix (and there are many — #592/#594/#596/#597/#603/#606/#607/#608 are all PTY-injection / path-traversal hardening) is one we must either port or re-discover. If we sell this, security fixes are non-negotiable; we cannot afford to silently miss upstream's security stream while diverging on features.
Options
Option A — Track upstream closely (regular merge, minimize divergence)
How it works: Treat upstream main as the source of truth. Regularly git merge upstream/main (or rebase our custom layer onto it). Push our generic improvements back to upstream as PRs so our diff shrinks. Keep only truly Zach-specific config (orgs/riles-fleet, agent prompts, brand assets) as local-only.
Pros:
- Near-zero re-work: upstream reliability/security fixes arrive for free.
- Smallest long-term maintenance burden.
- Our generic fixes (Forge's race guard, codex-cli-runner) could land upstream, earning goodwill + free review.
- Best foundation if upstream stays healthy.
Cons:
- We've already diverged 360 commits in the daemon core; collapsing that back is a large one-time effort and means giving up custom daemon behavior or upstreaming it (slow, gated by James's review).
- Less control: upstream design choices (e.g., their threshold formula vs Forge's cron-floor) may not match our fleet's needs.
- Merge conflicts on every sync in agent-manager/agent-process until divergence shrinks.
- Couples our production fleet's stability to upstream's release cadence and PR-merge latency (30 PRs sit open; upstream merges slowly).
Effort: High upfront (weeks of upstreaming + reconciliation), low ongoing.
Cost: ~0 incremental $ (engineering time only, mostly Forge/Codex).
Option B — Maintain divergent fork (full control, ignore upstream)
How it works: Treat the fork as our own product. Pull from upstream rarely or never. Re-solve problems as we hit them.
Pros:
- Maximum control and velocity: no merge friction, ship whatever we want.
- Custom daemon behavior (codex-cli-runner, Forge's defense-in-depth restart logic) stays exactly as designed.
- Clean story for productization if we own the whole stack.
Cons:
- We just proved this fails: Forge re-invented #621. That re-work recurs every time upstream fixes something we also hit.
- Security blind spot: upstream has shipped ~8 PTY-injection/path-traversal hardening PRs. Diverging blindly means re-discovering each vulnerability the hard way, which is unacceptable if we monetize.
- Growing maintenance burden; the fork ages out of compatibility with community skills/templates upstream evolves.
- No free review on our daemon changes.
Effort: Low day-to-day, but high hidden cost (re-work + security exposure).
Cost: ~0 $ but high opportunity + risk cost.
Option C — Selective cherry-pick (deliberate pull of valuable upstream fixes, keep divergence)
How it works: Keep our divergent daemon. Run a standing weekly upstream-watch (Forge or a cron-driven scan): list new merged upstream commits + high-value open PRs, triage into {port now / skip / already-have}. Port low-conflict fixes by cherry-pick; port high-conflict ones by hand-reading the diff and re-implementing against our code. Always pull security fixes. Skip features that conflict with our design.
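The triage rule in this option can be written down as a small decision function. This is a sketch of the rule as stated, not an existing tool: `UpstreamPr`, `TriageAction`, and `triage` are hypothetical, and the inputs (security flag, conflict risk, already-covered) are assumed to be filled in by whoever runs the weekly watch.

```typescript
type ConflictRisk = "low" | "high";
type TriageAction = "skip-already-have" | "port-security-fix" | "cherry-pick" | "hand-port";

interface UpstreamPr {
  number: number;
  isSecurityFix: boolean;
  alreadyCovered: boolean; // our fork already has an equivalent fix
  conflictRisk: ConflictRisk;
}

function triage(pr: UpstreamPr): TriageAction {
  if (pr.alreadyCovered) {
    return "skip-already-have"; // avoids two parallel implementations
  }
  if (pr.isSecurityFix) {
    return "port-security-fix"; // always pulled, whatever the conflict risk
  }
  // Low-conflict fixes cherry-pick cleanly; high-conflict ones are re-implemented by hand.
  return pr.conflictRisk === "low" ? "cherry-pick" : "hand-port";
}
```

Feature PRs that conflict with the fork's design would be dropped before reaching this function; that judgment call is deliberately left to the reviewer rather than encoded here.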
Pros:
- Middle path: keeps full control of our daemon while plugging the exact gap that caused the Forge re-work.
- Lets us prioritize: security fixes always, reliability fixes when they beat what we have, features rarely.
- No big-bang reconciliation; incremental and reversible.
- Compatible with productization: we can document "tracks upstream security, custom reliability layer."
Cons:
- Requires discipline: a standing process, not a one-time act. If the weekly watch lapses, we silently drift back to Option B.
- Hand-porting high-conflict PRs (#623/#624/#625) is real work each time.
- Two parallel implementations can coexist (e.g., our threshold logic vs #621's), so we must consciously pick one to avoid drift.
Effort: Low-medium ongoing (a triaged hour/week for Forge), no upfront reconciliation.
Cost: ~0 $ (Forge/Codex time).
Comparison
| Dimension | A: Track closely | B: Divergent fork | C: Selective cherry-pick |
|---|---|---|---|
| Re-work avoided | Maximum | Minimum | High (gap closed) |
| Control over daemon | Low | Maximum | High |
| Merge-conflict pain | High (until converged) | None | Medium (per-PR) |
| Security-fix coverage | Automatic | Manual/missed | Automatic-if-disciplined |
| Upfront effort | High | None | None |
| Ongoing effort | Low | Low (but hidden re-work) | Low-medium |
| Productization fit | Good (if upstream healthy) | Risky (security gap) | Best (controlled + secure) |
| Matches current reality | Poor (already 360 ahead) | Status quo (just failed) | Yes |
Recommendation
Option C — Selective cherry-pick, with a standing weekly upstream-watch and a hard rule: always pull security fixes.
Why C over A: we are already 360 commits deep into a heavily customized daemon (agent-manager +620, a 777-line codex-cli-runner that has no upstream equivalent). Collapsing that to track upstream closely is a multi-week reconciliation for benefits we can get incrementally. Our custom daemon behavior is genuinely differentiated and, for several pieces (Forge's race guard + process-liveness check), better than upstream.
Why C over B: B is the status quo and it just cost us the Forge re-work, and worse, it silently exposes us to upstream's security stream — fatal if we sell this.
C closes the exact gap (no awareness of upstream fixes) without throwing away divergence we've invested in.
Concrete cherry-pick triage — what to pull now:
| PR | What it fixes | Conflict risk | Action |
|---|---|---|---|
| #605 atomic windowless lock mutex (pid-less inbox deadlock) | Reliability — real deadlock class | Low (lock.ts untouched) | Port now — clean, high value |
| #571 undici keepalive retry on transport error | Telegram flakiness | Low | Port now — 1-file, cheap |
| #618 25s long-poll + split backoff | Telegram poll efficiency | Low-med | Port now — pairs with #571 |
| #621 per-agent heartbeat threshold from loop_interval | Watchdog false-positive | Low file-conflict but redundant | Skip — Forge's 2417f8a5 already covers this and adds race guard + process-liveness. Instead, consider upstreaming Forge's superset back to James (goodwill + free review). |
| #625 MCP-degraded boot (broken .mcp.json) | Boot resilience | High (agent-process.ts) | Hand-port — relevant to our codex/MCP setup |
| #623 restart-into-wedge detection + exit-reason.json | Reliability — directly adjacent to our 6/13 incident | High (agent-process.ts + hook-crash-alert.ts) | Hand-port — highest reliability value; reconcile with Forge's commit |
| #624 halted-agent midnight crash-budget reset + auto-unhalt | Reliability | High (agent-process.ts) | Hand-port — prevents stuck-halted agents |
| #620 verify-spawn + bounded retry + fleet alert | Spawn-failure truth | High (touches agent-manager + new file) | Evaluate — overlaps our spawn logic; port the alerter, reconcile the rest |
| #619 restart-marker hygiene | Lifecycle | High | Evaluate after #623/#624 (same files, do together) |
| Security PRs (#592/#594/#596/#597/#603/#606/#607/#608) | PTY-injection + path-traversal hardening | Mixed | Audit all — port any not already covered. Non-negotiable for productization. |
Order of execution: do the three low-conflict ports first (#605, #571, #618) to bank quick wins, then a single coordinated agent-process.ts porting session for #623+#624+#625 (and #619/#620) since they all touch the same file — porting them together avoids re-resolving the same conflicts repeatedly. Run the security audit in parallel.
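The ordering rule above (low-conflict ports first, then one batch per shared file) can be sketched as a small planner. This is an illustrative sketch, not code from the repo: the `PortCandidate` and `Batch` types and the `planPortOrder` function are hypothetical names, and only the PR numbers, risk levels, and file names are taken from the triage table. Files the table does not name are left empty rather than guessed.

```typescript
// Sketch only: the types and planPortOrder are hypothetical, not code from the repo.
type Risk = "low" | "low-med" | "high";

interface PortCandidate {
  pr: number;
  risk: Risk;
  files: string[]; // files the port conflicts on, where the triage table names them
}

interface Batch {
  label: string;
  prs: number[];
}

function planPortOrder(candidates: PortCandidate[]): Batch[] {
  const quick: number[] = [];
  const groups: { files: string[]; prs: number[] }[] = [];

  for (const c of candidates) {
    if (c.risk !== "high") {
      quick.push(c.pr); // low-conflict ports go first, in table order
      continue;
    }
    // Greedy single pass: join the first group that shares a file with this PR.
    let joined = false;
    for (const g of groups) {
      if (c.files.some((f) => g.files.indexOf(f) !== -1)) {
        g.prs.push(c.pr);
        for (const f of c.files) {
          if (g.files.indexOf(f) === -1) g.files.push(f);
        }
        joined = true;
        break;
      }
    }
    if (!joined) groups.push({ files: c.files.slice(), prs: [c.pr] });
  }

  const batches: Batch[] = [];
  if (quick.length > 0) batches.push({ label: "low-conflict", prs: quick });
  for (const g of groups) {
    batches.push({ label: g.files.join(" + "), prs: g.prs });
  }
  return batches;
}

const candidates: PortCandidate[] = [
  { pr: 605, risk: "low", files: ["lock.ts"] },
  { pr: 571, risk: "low", files: [] },     // single file, not named in the table
  { pr: 618, risk: "low-med", files: [] }, // file not named in the table
  { pr: 625, risk: "high", files: ["agent-process.ts"] },
  { pr: 623, risk: "high", files: ["agent-process.ts", "hook-crash-alert.ts"] },
  { pr: 624, risk: "high", files: ["agent-process.ts"] },
];

const plan = planPortOrder(candidates);
for (const b of plan) {
  console.log(b.label + ": " + b.prs.map((n) => "#" + n).join(", "));
}
```

With the six rows above, the planner yields one quick-win batch followed by a single batch for the ports that share `agent-process.ts`, so that file's conflicts are resolved in one session. The greedy grouping does not merge groups transitively, which is acceptable for a list this short.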
Also: upstream Forge's restart-loop superset (2417f8a5) as a PR to James. It's strictly more complete than #621; getting it merged turns our re-work into a contribution and shrinks future divergence in that file.
What deciding unblocks + concrete next steps
Deciding C unblocks: (1) a clear standing process so we never re-invent another #621; (2) a security posture compatible with the paid-private-repo monetization idea; (3) Forge's queue for the next reliability sprint.
Next steps:
1. Adopt the standing weekly upstream-watch: add a cron (Forge, weekly) that runs `git fetch upstream` and lists new merged commits and new open PRs, triaged into port/skip/have. Output to Sage for routing. Owner: Forge.
2. Sprint 1 (low-conflict, this week): Forge cherry-picks #605, #571, #618 onto a branch, runs `npm run build && npm test` (current baseline: 157 test files, 2335 tests pass per 2417f8a5), opens internal PR. No upstream interaction needed.
3. Sprint 2 (coordinated agent-process port): single Forge/Codex session hand-ports #623 + #624 + #625 (+#619/#620 if clean) into our diverged `agent-process.ts`/`agent-manager.ts`, reconciling with Forge's 2417f8a5 logic. Test, internal PR.
4. Security audit (parallel, high priority): Forge diffs upstream's 8 security PRs against our code; port anything not already covered. Flag to Sage/Zach if any gap is currently live in the fleet.
5. Upstream contribution: package 2417f8a5 (restart-loop hardening superset) as a PR to `grandamenium/cortextos`, crediting the shared incident. Shrinks divergence + builds standing with James.
6. Document the policy in the repo (`CONTRIBUTING.md` or a `FORK-POLICY.md`): "We maintain a divergent daemon; we always pull upstream security fixes; reliability/feature PRs are triaged weekly; generic improvements are upstreamed." This is the productization-ready story.
Decision gate for Zach: confirm C, and confirm the hard rule that upstream security fixes are always pulled (the one place we do not get to diverge). Everything else Forge can execute autonomously under existing overnight-work authorization.
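The weekly upstream-watch from step 1, combined with the hard rule on security fixes, could look roughly like the sketch below. Everything here is an assumption for illustration: the type and function names, the extra `untriaged` bucket for commits nobody has classified yet, the `upstream/main` branch name, and the expectation that upstream commit subjects end in a `(#NNN)` PR reference. Only the PR numbers come from the triage table, and the sample hashes and subjects are invented.

```typescript
// Sketch only: names, the "untriaged" bucket, and the "(#NNN)" subject convention are assumptions.
type Verdict = "port" | "skip" | "have" | "untriaged";

interface UpstreamCommit {
  hash: string;
  subject: string;
}

interface ForkState {
  havePrs: number[];     // upstream PRs already ported or covered by our own commits
  skipPrs: number[];     // upstream PRs deliberately skipped
  securityPrs: number[]; // upstream PRs flagged as security fixes
}

// Parses the output of: git log --format='%H%x09%s' HEAD..upstream/main
// (the branch name is an assumption; %x09 emits a tab between hash and subject).
function parseGitLog(output: string): UpstreamCommit[] {
  const commits: UpstreamCommit[] = [];
  for (const line of output.split("\n")) {
    const tab = line.indexOf("\t");
    if (tab <= 0) continue; // blank or malformed line
    commits.push({ hash: line.slice(0, tab), subject: line.slice(tab + 1).trim() });
  }
  return commits;
}

// Extracts a trailing "(#123)" PR reference, as squash-merge subjects often carry.
function prNumber(subject: string): number | null {
  const m = /\(#(\d+)\)\s*$/.exec(subject);
  return m ? parseInt(m[1], 10) : null;
}

function triage(commit: UpstreamCommit, state: ForkState): Verdict {
  const pr = prNumber(commit.subject);
  if (pr === null) return "untriaged";
  if (state.havePrs.indexOf(pr) !== -1) return "have";
  // Hard rule: a security fix is always pulled, even if it was listed as skipped.
  if (state.securityPrs.indexOf(pr) !== -1) return "port";
  if (state.skipPrs.indexOf(pr) !== -1) return "skip";
  return "untriaged"; // new work, left for the weekly triage
}

const forkState: ForkState = {
  havePrs: [605], // hypothetical: assumes Sprint 1 has already landed #605
  skipPrs: [621],
  securityPrs: [592, 594, 596, 597, 603, 606, 607, 608],
};

// Hashes and subjects below are invented for illustration.
const sampleLog = [
  "aaa111\tfix: atomic lock mutex (#605)",
  "bbb222\tfix: per-agent heartbeat threshold (#621)",
  "ccc333\tsecurity: harden path handling (#592)",
  "ddd444\tchore: unrelated change with no PR reference",
].join("\n");

const report = parseGitLog(sampleLog).map((c) => ({
  hash: c.hash,
  verdict: triage(c, forkState),
}));
console.log(report);
```

The security check runs before the skip check, so the "always pulled" rule cannot be overridden by a stale skip entry. Identifying PRs by number rather than by hash is a deliberate choice in this sketch, since cherry-picked commits get new hashes on our side.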
How these decisions interact
These four decisions are not independent — they share dependencies, a common gate philosophy, and a single scarce resource (your attention). Reading them together changes how you should sequence them.
1. Decision 5 (fork strategy) is the foundation under Decision 1 (monetization) and Decision 3 (notifications). The monetization pilot's eventual full-fleet destination assumes a "paid private repo" you can ship and patch. That is only safe if you have the security posture from Decision 5's hard rule (always pull upstream security fixes). If you ever productize, "we maintain a divergent daemon and always pull upstream security fixes" is literally a sales/diligence line. And Decision 3's whole reason for existing — the 709-flood — was a restart-loop symptom; Decision 5 is what keeps the restart-loop fixes (Forge's 2417f8a5, upstream #623/#624) coherent rather than re-invented. Decision 5 = C is the cheapest of the four and removes a recurring tax under the other three; confirm it first.
2. Decisions 1 and 2 share the exact same compliance fault line, and the same gate discipline. Both hinge on the business-associate/PHI boundary. Decision 1's recommended pilot is defined by staying no-PHI; Decision 2 is what the world looks like the moment you do cross into PHI (Athena/Apollo touching real consults). They are the two halves of one principle: **a no-PHI surface needs only a contract; a PHI surface needs a full BAA matrix + human-in-the-loop + reversibility before go-live.** If you green-light Decision 1's narrow pilot but let it creep toward ops, you have silently walked into Decision 2's regime without doing Decision 2's work. Hold the line: Decision 1's no-PHI clause and Decision 2's Gate-0 BAA matrix are the same fence viewed from two sides. The monetization full-fleet "Gate 2: compliance architecture" is, concretely, the Decision 2 BAA-matrix + human-confirm pattern applied to a customer clinic instead of VE.
3. Decision 2 (Athena/Apollo) and Decision 3 (notifications) intersect at the Apollo channel. Decision 3 keeps Apollo direct to Dr. Saylor by design. Decision 2 puts PHI-adjacent clinical material into Apollo's pipeline. So the Decision 3 rule "formally prohibit PHI bodies on Telegram" is not just a Zach-noise rule — it is a Decision 2 compliance control on the Apollo/Saylor channel too. When you implement the Decision 3 routing guard, the same guard should enforce Decision 2's "no PHI on Telegram, initials + opaque id only" constraint. One mechanism, two payoffs.
4. They compete for the same five-week window and the same builders. Buffalo (Decision 1) is the hard deadline at July 18-19. Decision 2's PHI vault + name-fix gem is a meaty Apollo/Forge build. Decision 5's cherry-pick sprints are Forge work. Decision 3 is a small Forge/bus change. Realistic sequencing: (a) confirm Decision 5=C and Decision 3=D now — both are cheap, both reduce ongoing pain, Decision 3 also closes a real Saylor-channel compliance hole; (b) put the bulk of the five-week runway into Decision 1's pilot assets (Loom, contract, ROI metric, outreach) because Buffalo doesn't move; (c) run Decision 2's local, non-blocking Step 1 (PHI vault + 6/12 cleanup) in parallel since it has no external dependency and de-risks PHI already at rest, but gate Decision 2's capture steps behind the BAA signatures rather than racing them against Buffalo.
5. The unifying meta-pattern across all four: buy evidence/safety cheaply before committing the expensive, irreversible thing. Decision 1 buys willingness-to-pay (WTP) evidence with a Loom before building an MSP. Decision 2 puts a human between a match and a chart before trusting auto-file. Decision 3 caps lifecycle pings architecturally before trusting behavioral prompts. Decision 5 watches upstream weekly before either a big-bang merge or a blind divergence. Each recommendation is the option that keeps the upside alive while making the downside cheap and reversible, which is the right posture given that the binding constraint across every one of these is your attention at a seven-figure clinic, not money.
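The shared guard from point 3 ("no PHI on Telegram, initials + opaque id only") can be made structural rather than behavioral. The sketch below is one possible shape, not the existing bus code: the `TelegramPayload` fields, the `checkTelegramPayload` function, and both format patterns are hypothetical, and the real id format is not specified in this briefing.

```typescript
// Sketch only: the payload shape, field names, and id format are hypothetical.
interface PatientRef {
  initials: string; // e.g. "JD"
  opaqueId: string; // an identifier that carries no patient information
}

interface TelegramPayload {
  text: string;            // fixed template text, assumed to be free of patient data
  patientRef?: PatientRef; // the only patient reference allowed on Telegram
  phiBody?: string;        // clinical content; must never be sent to Telegram
}

interface GuardResult {
  allowed: boolean;
  reason: string;
}

const INITIALS = /^[A-Z]{2,3}$/;      // assumed: two or three capital letters
const OPAQUE_ID = /^[a-f0-9]{8,32}$/; // assumed: lowercase hex token

function checkTelegramPayload(p: TelegramPayload): GuardResult {
  if (p.phiBody !== undefined && p.phiBody.length > 0) {
    return { allowed: false, reason: "PHI body is prohibited on Telegram" };
  }
  if (p.patientRef !== undefined) {
    if (!INITIALS.test(p.patientRef.initials)) {
      return { allowed: false, reason: "patient reference must be initials only" };
    }
    if (!OPAQUE_ID.test(p.patientRef.opaqueId)) {
      return { allowed: false, reason: "patient id must be an opaque token" };
    }
  }
  return { allowed: true, reason: "ok" };
}

// Usage: a decision ping that references a consult without carrying its content.
const ping: TelegramPayload = {
  text: "Consult summary ready for review",
  patientRef: { initials: "JD", opaqueId: "c0ffee12" },
};
console.log(checkTelegramPayload(ping));
```

A guard like this only enforces the shape of the message. It cannot detect patient details typed into the free-text `text` field, so it assumes that field is filled from fixed templates.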