Fieldforce Phase 3 — Task Creation AI + BYOK Foundation

Overview

Date: 2026-05-19 (revised 2026-05-20 after grill session)
Status: Approved for implementation planning
Author: Wayne Cheah
Domain refs: CONTEXT.md, ADR-0021 (table naming), ADR-0016 (two-tier feature flags)
Predecessors: fieldforce-management-design (2026-05-15), phase-2-grill-session-decisions (2026-05-19)
Surface: SolidStart Panel (/tasks/new, /settings/ai, /admin/ai-config)

Pillar A — AI Access Foundation

Scope: platform-wide

Mode model: trial → platform → BYOK → disabled
AccessGate — every AI call passes through before any prompt is built
New tables: ai_plans, ai_models, ai_access_events, ai_config_defaults
Global AI kill-switch via admin_feature_flags.key = 'ai'
AES-GCM BYOK key encryption with per-row nonces, boot-required env secret

Pillar B — Task Creation AI Cluster

Scope: fieldforce feature

NL task parsing + auto-categorization + AI checklist — three deferred Phase 2 features
One combined-schema LLM call per Generate action (title, description, task_type, priority, due_date_hint, checklist_items)
POST /fieldforce/ai/parse-task — returns pre-filled fields; never creates a task
AI mode tab on Panel task creation form

Scope and Non-Goals

Explicitly out of scope this round:

Astro mobile AI mode (immediate follow-up change — same backend, new UI).
Other AI features: voice-to-text logging, smart assignment, AI completion summary, predictive delay.
Real payment processor / subscription webhooks (manual platform-admin field updates only).
Native provider structured-output / tool-call enforcement — deferred to a future module-wide change; this round uses prompt + post-hoc JSON-schema validation + retry.
Response caching, streaming responses, automated CI evals.
Per-feature trial caps (mode caps are org-wide across all AI features today).

Architecture

Component map — Panel calls Go HTTP layer; application layer adds AccessGate and AITaskParser on top of the existing provider factory; new tables are managed by CreateAIAccessTables() in migrator.go.

Key Design Points

No new AI provider plumbing: Existing Go ai module already has multi-provider support, provider factory, org-scoped usage tracking, and Anthropic / OpenAI / Google / Groq / Ollama adapters. Phase 3 adds a domain use case and a platform gate on top.
Fieldforce endpoint in fieldforce module: Domain-specific prompts and validation co-locate with the rest of fieldforce. The generic ai module stays free of domain leakage.
One endpoint, one LLM call: Prior design had two sequential LLM calls (parse + checklist). A single combined-schema call is faster, cheaper, and aligns "calls" debit with user-visible Generate actions.
Native structured-output deferred: Existing adapters lack tools / response_format plumbing across all providers. Phase 3 uses JSON-mode prompts + post-hoc schema validation + retry-once. OpenAI / Groq adapters gain a ResponseJSONOnly bool flag wiring to response_format: { type: "json_object" }.
Gate-first hot path: Every AI use case calls AccessGate.Authorize(orgID, feature) before constructing any prompt, then RecordUsage after a provider response. On provider failure, RecordError updates the same access-event row without debiting counters.

BYOK provider whitelist (this round): Only anthropic, openai, google are exposed via BYOK. Groq and Ollama remain platform-only. Expanding later is a one-line change to ai_config_defaults.byok_allowed_providers.

Schema source of truth: backend/go/internal/shared/infrastructure/database/postgresql/migrations/migrator.go. The .sql files under backend/go/migrations/ have never been executed and will be deleted. Phase 3 adds CreateAIAccessTables() to migrator.go, called after CreateFieldforceTables().

Existing-table reconciliation: org_ai_configs (phase-1 scaffold, never wired, zero rows) is dropped first. ai_configs and ai_usage are extended additively. Per ADR-0021 the org_* prefix is reserved for cross-cutting registries; the correct prefix is the module name.

ADR-0016 Gate Order

1. Echo middleware: auth → role → org-scope.
2. FieldforceFeatureFlagMiddleware: global fieldforce kill-switch + per-org fieldforce flag (per ADR-0016).
3. Handler → AccessGate.Authorize
   a. Global AI kill-switch (admin_feature_flags.key = 'ai')
   b. Per-org mode resolution (lazy-provision trial row if absent)
   c. Subscription validity (platform mode only)
   d. Plan + per-org cap check (trial and platform)
   e. BYOK credential decrypt (BYOK mode only)
4. Parser (LLM call): runs only after Authorize returns an AuthzResult.
5. AccessGate.RecordUsage / RecordError: on success, updates counters + access-event row. On provider failure, RecordError updates the event row with error diagnostics; no counter is debited.

Data Model

All changes land via CreateAIAccessTables() in migrator.go — idempotent via IF NOT EXISTS; safe to re-run on every server boot.

Cleanup — Drop Dead Scaffold

DROP TABLE IF EXISTS org_ai_configs;   -- phase-1 scaffolding, never wired, zero rows

`ai_configs` Extensions

New columns added to the live GORM-managed table (additive; one column type change):

mode: TEXT NOT NULL DEFAULT 'trial' CHECK ('trial','platform','byok','disabled')
model: TEXT — domain-specific model for AI access features (separate from chat/embedding/vision models)
api_key_encrypted: type changed to BYTEA (column is empty in production; cast is safe)
api_key_nonce: BYTEA — 12-byte AES-GCM nonce, generated fresh per Encrypt, never reused
Trial counters: trial_tokens_used, trial_tokens_limit, trial_calls_used, trial_calls_limit, trial_granted_at, trial_exhausted_at
Subscription state: subscription_status CHECK ('none','active','past_due','canceled','expired'), subscription_valid_until
Plan reference: plan_id UUID FK to ai_plans (added after table is created)
Platform counters: platform_tokens_used, platform_calls_used, platform_tokens_limit (NULL → fall back to plan or defaults), platform_calls_limit, platform_period_start
CHECK constraint: mode_requires_provider_model — platform/byok modes require provider + model to be non-null

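Backfill: Existing rows (pre-Phase-3 production orgs that had no mode concept) are updated to mode = 'platform' so live AI does not unexpectedly fall into trial behavior. Because CreateAIAccessTables() re-runs on every boot, this statement must execute only in the pass that first adds the mode column; re-running it later would also promote lazily provisioned trial rows older than one day.
UPDATE ai_configs SET mode = 'platform' WHERE mode = 'trial' AND created_at < NOW() - INTERVAL '1 day';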

`ai_usage` Extension

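ALTER TABLE ai_usage ADD COLUMN IF NOT EXISTS feature TEXT NOT NULL DEFAULT 'unknown';
-- Unique constraint extended to include the feature column. ADD CONSTRAINT has no
-- IF NOT EXISTS form, so drop first to keep the migration re-runnable on every boot.
ALTER TABLE ai_usage DROP CONSTRAINT IF EXISTS ai_usage_unique;
ALTER TABLE ai_usage ADD CONSTRAINT ai_usage_unique
  UNIQUE (organization_id, usage_date, provider, model, feature);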

New: `ai_config_defaults` (singleton)

Purpose: Last-resort fallback when org and plan caps are both NULL. Deliberately conservative.
Singleton enforcement: CONSTRAINT singleton CHECK (id = '00000000-0000-0000-0000-000000000001')
Seeded defaults: provider = 'anthropic', model = 'claude-sonnet-4-6', platform_tokens_limit = 200,000, platform_calls_limit = 200, byok_allowed_providers = '{anthropic,openai,google}'

New: `ai_plans`

code	display_name	tokens_limit	calls_limit
trial	Free Trial	50,000	20
starter	Starter	200,000	200
pro	Pro	2,000,000	2,000
enterprise	Enterprise	20,000,000	20,000

Why are prices set to 0? Real billing is out of scope this round. Limits are deliberately low to force a sales conversation during onboarding (per grill decision 4b-i). The price_cents_per_month column exists for future Stripe sync.

New: `ai_models` (catalog)

provider	model_id	byok_visible	platform_eligible	recommended	input $/1k	output $/1k
anthropic	claude-sonnet-4-6	✓	✓	★	0.003	0.015
anthropic	claude-haiku-4-5	✓	✓		0.0008	0.004
openai	gpt-4o	✓		★	0.0025	0.010
openai	gpt-4o-mini	✓			0.00015	0.0006
google	gemini-2.0-pro	✓		★	0.00125	0.005
google	gemini-2.0-flash	✓			0.000075	0.0003

Why only Anthropic as platform_eligible? Gremlin has Anthropic platform keys at launch. OpenAI / Google platform billing comes in a later phase. Cost columns feed the usage estimate in GET /ai-config/usage; admin can update via the Models tab.

New: `ai_access_events`

Purpose: Per-decision audit trail with diagnostic columns: provider, model, tokens_consumed, latency_ms, http_status, error_code, error_detail (sanitized), provider_request_id.
decision CHECK values: allowed · denied_disabled · denied_global_killswitch · denied_trial_exhausted · denied_subscription_inactive · denied_platform_cap_exceeded · denied_no_byok_key · denied_byok_decrypt_failed · denied_model_deprecated · byok_test_succeeded · byok_test_failed
Retention: Rows older than 90 days purged by a scheduled job. Cron itself is out of scope this change.
error_detail sanitization: Regex strips sk-[A-Za-z0-9_-]+, sk-ant-[A-Za-z0-9_-]+, AIza[A-Za-z0-9_-]+ → <redacted>; truncates to 500 chars.
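
The sanitization rule above can be sketched in a few lines of Go. This is a minimal illustration, not the module's actual code: the function name sanitizeErrorDetail and the choice to truncate by rune rather than by byte are assumptions.

```go
package main

import (
	"fmt"
	"regexp"
)

// keyPattern covers the three key shapes named in the spec. The sk-ant-
// alternative is listed for readability; the plain sk- alternative would
// match those keys as well.
var keyPattern = regexp.MustCompile(`sk-ant-[A-Za-z0-9_-]+|sk-[A-Za-z0-9_-]+|AIza[A-Za-z0-9_-]+`)

const maxDetailChars = 500

// sanitizeErrorDetail (hypothetical name) redacts key-shaped substrings and
// then truncates to 500 characters. Truncating by rune is an assumption; it
// avoids splitting a multi-byte character.
func sanitizeErrorDetail(detail string) string {
	redacted := keyPattern.ReplaceAllString(detail, "<redacted>")
	runes := []rune(redacted)
	if len(runes) > maxDetailChars {
		return string(runes[:maxDetailChars])
	}
	return redacted
}

func main() {
	fmt.Println(sanitizeErrorDetail("401 from provider: rejected key sk-ant-abc123_DEF"))
	fmt.Println(sanitizeErrorDetail("keys sk-proj-1 and AIzaSyD-9 both rejected"))
}
```

Redaction runs before truncation so that the 500-character cut never leaves part of a key behind.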

Provisioning

No org-creation hook. AccessGate lazy-provisions a trial row on first AI call via INSERT ... ON CONFLICT DO NOTHING. Two concurrent first-calls cannot double-insert. Rationale: avoids coupling core org creation to AI module availability.

Concurrency Model

Bounded over-spend accepted. Authorize (read counters) and RecordUsage (UPDATE counters) are not in one transaction — the provider call sits between them. Two concurrent requests near a cap can each be authorized. Worst-case over-spend ≈ max_tokens_per_call × in_flight_requests, bounded to a few thousand tokens for trial defaults.

RecordUsage uses an atomic CASE-based UPDATE for platform-mode counters so the lazy monthly reset and increment happen in one statement, race-free. Authorize is tolerant of stale-period counters — when platform_period_start < current_month_start, it treats used counters as 0 for the cap check.
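
The lazy monthly reset can be modeled as two small pure functions, one for each side of the provider call. This is a sketch under stated assumptions: the helper names, the UTC calendar-month boundary, and the SQL text (including the organization_id column on ai_configs) are illustrative and not taken from the codebase.

```go
package main

import (
	"fmt"
	"time"
)

// utcDate is a small helper for building example dates.
func utcDate(y int, m time.Month, d int) time.Time {
	return time.Date(y, m, d, 0, 0, 0, 0, time.UTC)
}

// monthStart returns the first instant of t's calendar month (UTC assumed).
func monthStart(t time.Time) time.Time {
	u := t.UTC()
	return utcDate(u.Year(), u.Month(), 1)
}

// effectiveUsed models the Authorize side: counters that belong to a stale
// period count as zero for the cap check.
func effectiveUsed(used int64, periodStart, now time.Time) int64 {
	if periodStart.Before(monthStart(now)) {
		return 0
	}
	return used
}

// applyUsage models the RecordUsage side: reset-and-add when the period is
// stale, plain add otherwise, decided in one step.
func applyUsage(used int64, periodStart, now time.Time, delta int64) (int64, time.Time) {
	if periodStart.Before(monthStart(now)) {
		return delta, monthStart(now)
	}
	return used + delta, periodStart
}

// recordUsageSQL is a hypothetical single-statement form of applyUsage.
// Every CASE reads the pre-update row, so reset and increment cannot interleave.
const recordUsageSQL = `
UPDATE ai_configs SET
  platform_tokens_used = CASE WHEN platform_period_start < date_trunc('month', now())
                              THEN $2 ELSE platform_tokens_used + $2 END,
  platform_calls_used  = CASE WHEN platform_period_start < date_trunc('month', now())
                              THEN 1 ELSE platform_calls_used + 1 END,
  platform_period_start = CASE WHEN platform_period_start < date_trunc('month', now())
                              THEN date_trunc('month', now()) ELSE platform_period_start END
WHERE organization_id = $1`

func main() {
	now := utcDate(2026, 6, 3)
	stale := utcDate(2026, 5, 1)
	fmt.Println(effectiveUsed(190000, stale, now))
	used, start := applyUsage(190000, stale, now, 1200)
	fmt.Println(used, start.Format("2006-01-02"))
}
```

Two concurrent first calls of a new month serialize on the row lock: the first resets and adds, the second sees a current period and only adds, so the counter ends at the exact sum.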

API Surface

Fieldforce AI — parse-task

POST /api/organizations/:org_id/fieldforce/ai/parse-task
  Auth:  Admin | Owner | Manager | Supervisor
  Body:  { text: string (max 4000 chars), locale?: 'en'|'zh'|'ms' }
  200:   { title, description, task_type, priority, due_date_hint?, checklist_items[] }
  402:   trial_exhausted | subscription_inactive | platform_cap_exceeded
  403:   ai_disabled | ai_globally_disabled
  409:   invalid_mode_transition | subscription_required
  422:   no_byok_key | provider_not_allowed | invalid_api_key | ai_input_invalid
  429:   rate_limited
  502:   ai_unavailable | ai_output_invalid | invalid_byok_key | byok_key_rejected | model_deprecated | provider_overloaded

parse-task never creates a task. Returns parsed fields for the form to pre-fill. User reviews, edits, and saves through the existing POST /tasks endpoint.

Org AI Config (org-scoped, Admin/Owner)

GET /ai-config: Returns mode, provider, model, has_api_key: boolean (never the key itself), plan, subscription state, trial/platform counters, and byok_model_catalog (per-provider lists of byok_visible models).
PUT /ai-config: Mode transitions enforced server-side. BYOK: validates provider ∈ whitelist, performs test call with 5s hard timeout, AES-GCM encrypts key with fresh 12-byte nonce. 'trial' and 'platform' transitions via PUT → 409 (only platform-admin can move orgs into these modes).
DELETE /ai-config/api-key: Clears encrypted key + nonce. If mode='byok', flips mode to 'disabled'. Audited as AI_BYOK_KEY_REMOVED + AI_MODE_CHANGED.
GET /ai-config/usage: Usage breakdown by feature/provider/model with cost estimates derived from ai_models cost columns at query time. Supports granularity=day|month and group_by=feature|provider|model.
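
The query-time cost estimate for GET /ai-config/usage reduces to simple arithmetic over the two ai_models cost columns. The sketch below assumes usage rows carry separate input and output token counts; the type and function names are illustrative.

```go
package main

import "fmt"

// modelRate holds the two cost columns from ai_models, in dollars per 1,000 tokens.
type modelRate struct {
	InputPer1K  float64
	OutputPer1K float64
}

// estimateCostUSD (hypothetical name) prices a usage bucket at query time.
func estimateCostUSD(r modelRate, inputTokens, outputTokens int64) float64 {
	return float64(inputTokens)/1000*r.InputPer1K + float64(outputTokens)/1000*r.OutputPer1K
}

func main() {
	// Rates for claude-sonnet-4-6 as listed in the ai_models catalog.
	sonnet := modelRate{InputPer1K: 0.003, OutputPer1K: 0.015}
	// 10,000 input tokens and 2,000 output tokens:
	// 10 x 0.003 + 2 x 0.015 = 0.03 + 0.03 = 0.06 dollars.
	fmt.Printf("%.4f\n", estimateCostUSD(sonnet, 10000, 2000))
}
```

Because rates are read at query time, an admin edit in the Models tab changes the estimate for past usage as well.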

api_key is never returned. GET /ai-config returns only has_api_key: boolean. Echo's request-body logger has a redactor for any field named api_key.

Platform-Admin Endpoints

GET/PUT /admin/ai-config/defaults: trial_enabled toggle, trial default provider/model, byok_allowed_providers array, last-resort platform caps.
GET/POST/PATCH /admin/ai-config/plans: CRUD on ai_plans. PATCH supports activate/deactivate.
GET/POST/PATCH/DELETE /admin/ai-config/models: CRUD on ai_models. Hard delete; orgs referencing the model see 502 model_deprecated on next call.
GET /admin/ai-config/orgs: LEFT JOIN ai_configs so orgs without a config row appear as "Not yet active". Filterable by mode/plan; searchable by org name.
PATCH /admin/ai-config/orgs/:id: trial→platform and disabled→platform MUST include plan_id + subscription_valid_until + provider + model — missing any → 409 subscription_required. Single AI_MODE_CHANGED audit row with before/after JSON.
POST /admin/ai-config/orgs/:id/reset-trial: Only allowed when current mode = 'disabled'. Zeroes trial counters, flips mode to 'trial'. Audited as AI_TRIAL_RESET.
GET /admin/ai-config/events: Paginated ai_access_events including all diagnostic columns.

Global AI kill-switch uses the existing admin_feature_flags endpoints with key = 'ai'. No new API endpoint is needed. UI surfaces it as a dedicated tab on /admin/ai-config.

Feature Key Registry

feature key	fires from	debits counters
`fieldforce:parse_task`	POST /fieldforce/ai/parse-task	Yes (trial + platform)
`byok:test_call`	BYOK validation ping on PUT /ai-config	No — accounting visibility only; never debits trial counters
`unknown`	Legacy / pre-Phase-3 rows	N/A

BYOK test call bypasses AccessGate.Authorize. The org's mode has not yet been switched to BYOK when the test fires. Uses the user-supplied key in-memory (never written until validated). Still subject to the global AI kill-switch. A failed test → 422; no row mutated in ai_configs.

Audit Actions

New actions emitted by write endpoints via the existing AuditHelper decorator:

AI_CONFIG_UPDATED · AI_BYOK_KEY_SET · AI_BYOK_KEY_REMOVED · AI_TRIAL_RESET · AI_MODE_CHANGED · AI_PLAN_UPDATED · AI_MODEL_UPDATED · AI_TRIAL_DEFAULT_CHANGED · AI_SUBSCRIPTION_STATUS_CHANGED · AI_GLOBAL_KILL_SWITCH_TOGGLED

Access Gate Logic

AccessGate.Authorize decision tree — 6-step flow from kill-switch check through mode-specific validation to credential handoff. All denied paths insert an ai_access_events row before returning.
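
The decision order can be sketched as a pure function over a flattened view of the org's state. This is an illustration only: the orgState struct is hypothetical, treating "used >= limit" as exhausted is an assumption, and so is the position of the model-deprecation check after the mode-specific checks. The returned strings are the decision values from ai_access_events.

```go
package main

import "fmt"

// orgState is a hypothetical, flattened view of what Authorize reads.
type orgState struct {
	GlobalKillSwitch    bool
	HasConfigRow        bool
	TrialEnabled        bool // ai_config_defaults trial toggle
	Mode                string
	TrialTokensUsed     int64
	TrialTokensLimit    int64
	TrialCallsUsed      int64
	TrialCallsLimit     int64
	SubscriptionActive  bool
	PlatformTokensUsed  int64
	PlatformTokensLimit int64
	PlatformCallsUsed   int64
	PlatformCallsLimit  int64
	ModelDeprecated     bool
	HasBYOKKey          bool
	BYOKDecryptOK       bool
}

// decide walks the gate in order and returns the first matching decision.
func decide(s orgState) string {
	if s.GlobalKillSwitch {
		return "denied_global_killswitch"
	}
	mode := s.Mode
	if !s.HasConfigRow {
		if !s.TrialEnabled {
			return "denied_disabled"
		}
		mode = "trial" // a trial row would be lazy-provisioned here
	}
	switch mode {
	case "trial":
		if s.TrialTokensUsed >= s.TrialTokensLimit || s.TrialCallsUsed >= s.TrialCallsLimit {
			return "denied_trial_exhausted"
		}
	case "platform":
		if !s.SubscriptionActive {
			return "denied_subscription_inactive"
		}
		if s.PlatformTokensUsed >= s.PlatformTokensLimit || s.PlatformCallsUsed >= s.PlatformCallsLimit {
			return "denied_platform_cap_exceeded"
		}
	case "byok":
		if !s.HasBYOKKey {
			return "denied_no_byok_key"
		}
		if !s.BYOKDecryptOK {
			return "denied_byok_decrypt_failed"
		}
	default: // "disabled" or anything unrecognized
		return "denied_disabled"
	}
	if s.ModelDeprecated {
		return "denied_model_deprecated"
	}
	return "allowed"
}

func main() {
	fresh := orgState{TrialEnabled: true, TrialTokensLimit: 50000, TrialCallsLimit: 20}
	fmt.Println(decide(fresh))
	spent := fresh
	spent.TrialCallsUsed = 20
	fmt.Println(decide(spent))
}
```

Keeping the decision pure makes the one-case-per-branch unit tests straightforward; the real gate would wrap it with the database reads and the access-event insert.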

Use Case Interface

// internal/modules/ai/application/usecase/access_gate.go

func (g *AccessGate) Authorize(ctx context.Context, orgID, userID, feature, requestID string) (*AuthzResult, *AuthzError)
func (g *AccessGate) RecordUsage(ctx context.Context, authz *AuthzResult, feature string,
                                  inputTokens, outputTokens int, latencyMs int, providerRequestID string) error
func (g *AccessGate) RecordError(ctx context.Context, authz *AuthzResult, feature string,
                                  errorCode, errorDetail string, httpStatus int, latencyMs int, providerRequestID string) error

AuthzResult.Credentials has unexported fields, no JSON tags, and a String() method returning "<redacted>" — accidental fmt.Sprintf("%+v", creds) cannot leak the key.
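
A minimal sketch of a credentials type with these properties follows. The field names, the APIKey accessor, and the simplified AuthzResult shape are assumptions made for illustration; only the redaction behavior is taken from the text above.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Credentials keeps its fields unexported and untagged so that neither fmt
// nor encoding/json can reach them.
type Credentials struct {
	provider string
	apiKey   string
}

// String covers %v, %+v and %s; GoString covers %#v.
func (Credentials) String() string   { return "<redacted>" }
func (Credentials) GoString() string { return "<redacted>" }

// APIKey (hypothetical accessor) is the single deliberate way to read the key.
func (c Credentials) APIKey() string { return c.apiKey }

// AuthzResult is reduced to two fields for this sketch.
type AuthzResult struct {
	Mode        string
	Credentials Credentials
}

func render(v any) string   { return fmt.Sprintf("%+v", v) }
func renderGo(v any) string { return fmt.Sprintf("%#v", v) }

func toJSON(v any) string {
	b, err := json.Marshal(v)
	if err != nil {
		return "error: " + err.Error()
	}
	return string(b)
}

func main() {
	c := Credentials{provider: "anthropic", apiKey: "example-secret"}
	fmt.Println(render(c))
	fmt.Println(render(AuthzResult{Mode: "byok", Credentials: c}))
	fmt.Println(toJSON(c))
}
```

One caveat worth a test in the real code: fmt only calls String() on fields it can access, so a Credentials value stored in an unexported field of another struct would be printed field by field.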

RecordUsage

Atomic trial decrement; sets trial_exhausted_at via CASE when either limit is reached.
Atomic platform counter with lazy monthly reset CASE-update — reset and increment in one statement, race-free.
Per-feature usage rollup to ai_usage via ON CONFLICT DO UPDATE.
Idempotent diagnostic update on the access-event row keyed by request_id.

Provider errors never debit counters. RecordError updates the existing access-event row with error_code, sanitized error_detail, latency_ms, and provider_request_id. The user received no result — no counter is decremented.

Key Isolation Rules

Decrypted BYOK keys live only in process memory for the duration of one HTTP request — never written to disk, never logged, never placed in a struct that gets serialized.
Credentials Go type: unexported fields, no JSON tags, String() and GoString() return "<redacted>".
Echo's request-body logger has a redactor for fields named api_key.
AES-GCM nonces are 12 bytes, generated fresh from crypto/rand per Encrypt, stored per-row alongside ciphertext; never reused.
AI_KEY_ENCRYPTION_SECRET (32-byte base64-encoded) is required at boot. Missing or wrong length → l.Fatal, process exits. Hard fail, not silent degradation.
error_detail strings pass through the regex sanitizer before storage on ai_access_events.
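
The encryption rules above map onto Go's standard library roughly as follows. This is a sketch, not the AESGCMCrypto implementation: the function names are illustrative, and the all-zero demo key exists only to make the example runnable.

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"encoding/base64"
	"errors"
	"fmt"
)

// loadKey decodes the base64 secret and enforces the 32-byte length.
// The caller is expected to treat any error as fatal at boot.
func loadKey(b64 string) ([]byte, error) {
	key, err := base64.StdEncoding.DecodeString(b64)
	if err != nil {
		return nil, fmt.Errorf("decode secret: %w", err)
	}
	if len(key) != 32 {
		return nil, errors.New("secret must decode to exactly 32 bytes")
	}
	return key, nil
}

func newGCM(key []byte) (cipher.AEAD, error) {
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	return cipher.NewGCM(block)
}

// encrypt returns ciphertext plus a fresh random nonce to store on the same row.
func encrypt(key, plaintext []byte) ([]byte, []byte, error) {
	gcm, err := newGCM(key)
	if err != nil {
		return nil, nil, err
	}
	nonce := make([]byte, gcm.NonceSize()) // 12 bytes for standard GCM
	if _, err := rand.Read(nonce); err != nil {
		return nil, nil, err
	}
	return gcm.Seal(nil, nonce, plaintext, nil), nonce, nil
}

// decrypt fails on a wrong key, a wrong nonce, or tampered ciphertext.
func decrypt(key, ciphertext, nonce []byte) ([]byte, error) {
	gcm, err := newGCM(key)
	if err != nil {
		return nil, err
	}
	if len(nonce) != gcm.NonceSize() {
		return nil, errors.New("nonce has the wrong length")
	}
	return gcm.Open(nil, nonce, ciphertext, nil)
}

func main() {
	// Demo secret only: 32 zero bytes, base64-encoded. Never use this for real keys.
	key, err := loadKey(base64.StdEncoding.EncodeToString(make([]byte, 32)))
	if err != nil {
		fmt.Println("boot check failed:", err)
		return
	}
	ct, nonce, err := encrypt(key, []byte("example-byok-key"))
	if err != nil {
		fmt.Println("encrypt failed:", err)
		return
	}
	pt, err := decrypt(key, ct, nonce)
	fmt.Println(string(pt), len(nonce), err)
}
```

The explicit nonce-length check matters because the AEAD's Open method panics on a wrong-length nonce instead of returning an error, which a corrupted row could otherwise trigger.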

Failure-Mode Reference

Scenario	HTTP	Code	UI behavior
Global AI kill-switch active	403	`ai_globally_disabled`	AI tab hidden; "AI features temporarily disabled platform-wide" banner
No config row + trial disabled	403	`ai_disabled`	AI tab hidden
mode = 'disabled'	403	`ai_disabled`	AI tab hidden
Trial tokens or calls exhausted	402	`trial_exhausted`	Upgrade dialog opens; text preserved
Platform subscription not active/expired	402	`subscription_inactive`	"Subscription lapsed — contact account manager"
Platform monthly cap exceeded	402	`platform_cap_exceeded`	"Monthly limit reached. Contact support or wait until {next_month}."
BYOK, no key stored	422	`no_byok_key`	"Configure BYOK in Settings" CTA
BYOK decrypt fail (data corruption)	502	`invalid_byok_key`	"Re-enter your key" + ops alert
BYOK test call rejected by provider	422	`invalid_api_key`	Inline modal error, key field cleared
Provider not in BYOK whitelist	422	`provider_not_allowed`	Inline modal error
Forbidden mode transition	409	`invalid_mode_transition`	Inline explanation with current_mode + attempted_mode
Promote without plan/subscription	409	`subscription_required`	Modal validation failure
Org model deprecated	502	`model_deprecated`	Banner: "Selected model removed — pick a new one in Settings"
Provider 4xx/5xx/timeout	502	`ai_unavailable`	"AI temporarily unavailable"
BYOK provider rejects key mid-call	502	`byok_key_rejected`	"Your API key was rejected — re-enter it"
Output invalid after retry	502	`ai_output_invalid`	"AI couldn't parse — try rephrasing"
Provider rate limited	429	`rate_limited`	"Too many requests — try again in {retry_after}"

402, 409, 422, 502 distinctions per CONTEXT.md "HTTP status discipline". A 5xx provider failure does not debit any counter.

Frontend (SolidStart Panel)

Task Creation Form — AI Mode Tab

Route: /app/fieldforce/tasks/new (extended). Tab bar adds ✨ AI mode alongside the unchanged Manual tab.

[Figure: Task creation form, AI mode tab, shown next to the Manual tab. Free-text input with char counter, trial pill, Generate button, and post-generate success banner with Redo affordance. Sample input: "Visit ABC Holdings at Jalan Ampang to check faulty air-con unit, urgent, need it done before Friday 5pm".]

On Generate (success): Form switches to Manual tab with all fields pre-filled. Banner: "✨ AI filled this in. Review and edit before saving." + Redo with AI button (re-opens AI textarea; replacing edited fields prompts confirm).
Trial counter: Fetched via useAIConfig() on form mount; refreshed after each Generate response.
AI tab hidden when: ai_disabled or ai_globally_disabled — Manual is the only path.
402 codes: Upgrade dialog opens; user's text is preserved.
502 ai_unavailable / ai_output_invalid: Inline error below textarea; auto-switch to Manual after 3 seconds.
502 model_deprecated: Banner above tab: "Your AI model was removed. Go to Settings → AI to pick a new one."

Org AI Settings Page

Route: /app/settings/ai (new). Access: Admin / Owner.

Trial mode (default)

Label + "Model: Claude Sonnet 4.6 (platform default)"
Progress bars for tokens + calls used
Per-feature usage breakdown table
Upgrade card → "Contact sales" mailto link

Platform mode (paid)

Plan name + subscription valid-until
Change model → picker filtered to platform_eligible
Month-to-date tokens + calls progress bars
Subscription banner if past_due

BYOK mode (bring your own key)

"BYOK — Anthropic — claude-sonnet-4-6"
Per-feature usage chart (no caps)
Remove key · Change buttons
BYOK modal: provider radio → model dropdown (filtered by provider, deprecated_at hidden) → API key input → Save and test

Disabled mode (no AI)

"AI is disabled for your org. Contact your admin to re-enable."
AI mode tab hidden everywhere

Advanced section (collapsed): Disable AI for the org. Confirm dialog explains effect. Calls PUT /ai-config { mode: 'disabled' }. No path from inside the task form to switch mode — must go through Settings.

Platform-Admin AI Config Page

Route: /admin/ai-config (new). Access: platform_admin. Five tabs:

Tab 1 — Defaults: trial enabled toggle, trial default provider/model (filtered to platform_eligible), BYOK allowed providers checkboxes, last-resort platform token/call limits. → PUT /admin/ai-config/defaults
Tab 2 — Plans: Table editor: add/edit/activate-deactivate ai_plans rows. Fields: code, display_name, tokens_limit, calls_limit, price_cents_per_month, is_active.
Tab 3 — Models: Table editor: add/edit/soft-deprecate ai_models rows. UI prevents platform_eligible = true unless cost columns are populated.
Tab 4 — Orgs: All orgs with mode/plan/subscription/usage/last_active_at. Row opens side panel: Promote to Platform (modal), Reset trial (only from disabled; requires typing org name), Disable AI, Adjust limits, Change plan, Update subscription state, last 20 access events for the org.
Tab 5 — Kill-switch: Toggle backed by admin_feature_flags.key = 'ai'. Confirmation requires typing the word "DISABLE." Audited as AI_GLOBAL_KILL_SWITCH_TOGGLED.

Promote-to-Platform Modal

Used from the admin Orgs side panel. All four fields are required: Plan (from active ai_plans), Subscription valid until (date picker), Provider (distinct providers from platform_eligible models), Model (filtered by selected provider). Refuses with 409 subscription_required if any field is missing. Single AI_MODE_CHANGED audit row with before/after JSON on success.

Upgrade Dialog (shared)

Mounted once at root layout; visibility driven by a global Solid signal raised on any 402 response. Three CTAs:

Upgrade plan → mailto: contact-sales URL (env-configurable).
Use your own key → deep-link to /app/settings/ai?openByok=1 (auto-opens BYOK modal).
Maybe later → close dialog.

Hooks and i18n

Org hooks (features/ai/): useAIConfig() cached at root layout, invalidated on writes; useUpdateAIConfig(); useParseTask() — raises upgrade dialog signal on 402; useAIUsage()
Platform-admin hooks (features/admin-ai/): useAIDefaults, useListAIPlans, useUpsertAIPlan, useListAIModels, useUpsertAIModel, useDeleteAIModel, useListAIOrgs, usePatchAIOrg, useResetTrial, usePromoteToPlatform, useListAIEvents, useAIKillSwitch, useToggleAIKillSwitch
i18n: All UI strings through Paraglide (ai.* namespace), en/zh/ms translations in this change. AI-generated output follows caller's UI locale via prompt instruction.
Acceptance bars: en ≥98%, zh ≥95%, ms ≥85% locale-of-output correctness

Deliberately out of UI scope

No "switch to BYOK" path from inside the task form — must go through Settings.
No per-feature toggle — mode is org-wide.
No real-time trial counter via WebSocket — refreshes after each parse-task response.
No usage-chart drill-down beyond the 30-day card.
No inline subscription/payment flow — "Contact sales" until Stripe lands.

Prompts

Combined Output Schema

{
  "type": "object",
  "required": ["title", "description", "task_type", "priority", "checklist_items"],
  "properties": {
    "title":         { "type": "string", "maxLength": 120 },
    "description":   { "type": "string", "maxLength": 2000 },
    "task_type":     { "enum": ["service", "sales", "logistics", "other"] },
    "priority":      { "enum": ["low", "medium", "high", "urgent"] },
    "due_date_hint": { "type": "string", "format": "date-time" },
    "checklist_items": {
      "type": "array", "minItems": 0, "maxItems": 10,
      "items": { "type": "string", "maxLength": 120 }
    }
  },
  "additionalProperties": false
}
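
For readers who want to see what the schema enforces, here is a hand-rolled approximation using only the standard library. It stands in for the JSON-schema library validation described later and is not the validateOutput implementation; the struct and helper names are assumptions. It is slightly more lenient than a strict validator in two places: encoding/json matches keys case-insensitively, and an explicit null due_date_hint is treated as absent.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"io"
	"strings"
	"time"
	"unicode/utf8"
)

// parsedTask mirrors the combined schema. Pointer fields let the validator
// tell a missing property apart from an empty one.
type parsedTask struct {
	Title          *string   `json:"title"`
	Description    *string   `json:"description"`
	TaskType       *string   `json:"task_type"`
	Priority       *string   `json:"priority"`
	DueDateHint    *string   `json:"due_date_hint"`
	ChecklistItems *[]string `json:"checklist_items"`
}

var (
	taskTypes  = map[string]bool{"service": true, "sales": true, "logistics": true, "other": true}
	priorities = map[string]bool{"low": true, "medium": true, "high": true, "urgent": true}
)

// tooLong applies maxLength in characters, not bytes, as JSON Schema does.
func tooLong(s string, max int) bool { return utf8.RuneCountInString(s) > max }

func validateOutput(raw string) (*parsedTask, error) {
	dec := json.NewDecoder(strings.NewReader(raw))
	dec.DisallowUnknownFields() // additionalProperties: false
	var t parsedTask
	if err := dec.Decode(&t); err != nil {
		return nil, fmt.Errorf("decode: %w", err)
	}
	// A second Decode must hit end of input; anything else is trailing content.
	if err := dec.Decode(&struct{}{}); err != io.EOF {
		return nil, errors.New("unexpected content after the JSON object")
	}
	if t.Title == nil || t.Description == nil || t.TaskType == nil ||
		t.Priority == nil || t.ChecklistItems == nil {
		return nil, errors.New("missing required field")
	}
	if tooLong(*t.Title, 120) || tooLong(*t.Description, 2000) {
		return nil, errors.New("title or description exceeds maxLength")
	}
	if !taskTypes[*t.TaskType] || !priorities[*t.Priority] {
		return nil, errors.New("task_type or priority outside enum")
	}
	if t.DueDateHint != nil {
		if _, err := time.Parse(time.RFC3339, *t.DueDateHint); err != nil {
			return nil, errors.New("due_date_hint is not an RFC 3339 date-time")
		}
	}
	if len(*t.ChecklistItems) > 10 {
		return nil, errors.New("more than 10 checklist items")
	}
	for _, item := range *t.ChecklistItems {
		if tooLong(item, 120) {
			return nil, errors.New("checklist item exceeds maxLength")
		}
	}
	return &t, nil
}

func main() {
	out := `{"title":"Check faulty air-con","description":"Visit ABC Holdings at Jalan Ampang.",` +
		`"task_type":"service","priority":"urgent","checklist_items":["Inspect unit","Report findings"]}`
	t, err := validateOutput(out)
	if err != nil {
		fmt.Println("invalid:", err)
		return
	}
	fmt.Println(*t.Title, *t.Priority)
}
```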

Locale Anchors

Locale	LOCALE_ANCHOR	LOCALE_REITERATION
en	(empty)	"Reminder: respond in English."
zh	"Respond in Simplified Chinese (简体中文)."	"提醒：必须用简体中文回答。"
ms	"Respond in Bahasa Melayu (Malaysian Malay, not Indonesian). Example title: 'Hantar invoice ke pelanggan ABC sebelum Jumaat.'"	"Peringatan: jawapan MESTI dalam Bahasa Melayu."
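
One plausible way to apply these anchors is to place the anchor before the base system prompt and the reiteration after it. That ordering, the fallback to English for an unknown or missing locale, and the names below are assumptions for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// localeText pairs the two strings from the table above.
type localeText struct {
	Anchor      string
	Reiteration string
}

var localeAnchors = map[string]localeText{
	"en": {Anchor: "", Reiteration: "Reminder: respond in English."},
	"zh": {Anchor: "Respond in Simplified Chinese (简体中文).", Reiteration: "提醒：必须用简体中文回答。"},
	"ms": {
		Anchor:      "Respond in Bahasa Melayu (Malaysian Malay, not Indonesian). Example title: 'Hantar invoice ke pelanggan ABC sebelum Jumaat.'",
		Reiteration: "Peringatan: jawapan MESTI dalam Bahasa Melayu.",
	},
}

// buildSystemPrompt (hypothetical name) wraps the base prompt with the
// locale anchor in front and the reiteration at the end.
func buildSystemPrompt(base, locale string) string {
	lt, ok := localeAnchors[locale]
	if !ok {
		lt = localeAnchors["en"] // assumed default when locale is absent or unknown
	}
	parts := []string{}
	if lt.Anchor != "" {
		parts = append(parts, lt.Anchor)
	}
	parts = append(parts, base, lt.Reiteration)
	return strings.Join(parts, "\n\n")
}

func main() {
	fmt.Println(buildSystemPrompt("Respond as JSON matching the provided schema.", "ms"))
}
```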

Enforcement Layer (prompt + post-hoc + retry-once)

Prompt with JSON instruction
System prompt includes "Respond as JSON matching the provided schema. No commentary. No markdown." OpenAI/Groq adapters also receive response_format: { type: "json_object" } via the new ResponseJSONOnly bool field on ChatRequest.
JSON decode and schema validation
Decode the model's Content field; validate against the JSON schema using github.com/santhosh-tekuri/jsonschema/v5 (or similar).
Retry once on failure
On JSON parse failure or schema-validation failure, retry with a corrective prompt that includes the validator error and the prior output.
502 on second failure
Return ai_output_invalid. UI shows "AI couldn't parse — try rephrasing or fill in manually." and auto-switches to Manual tab after 3 seconds.
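
The retry-once flow can be sketched independently of any provider by passing the LLM call and the validator in as functions. The names, the wording of the corrective prompt, and the choice not to retry on a provider error are assumptions; the scripted helper exists only to drive the example.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// ErrOutputInvalid maps to the 502 ai_output_invalid response.
var ErrOutputInvalid = errors.New("ai_output_invalid")

// callFn abstracts one provider call: prompt in, raw content out.
type callFn func(prompt string) (string, error)

// generateWithRetry returns the validated output and the number of provider calls made.
func generateWithRetry(call callFn, validate func(string) error, prompt string) (string, int, error) {
	out, err := call(prompt)
	if err != nil {
		return "", 1, err // provider failure: surfaced as-is, no retry in this sketch
	}
	verr := validate(out)
	if verr == nil {
		return out, 1, nil
	}
	// The corrective prompt carries the validator error and the prior output.
	corrective := fmt.Sprintf(
		"%s\n\nYour previous output was invalid.\nValidator error: %v\nPrevious output: %s\nRespond again as JSON matching the schema.",
		prompt, verr, out)
	out, err = call(corrective)
	if err != nil {
		return "", 2, err
	}
	if verr := validate(out); verr != nil {
		return "", 2, fmt.Errorf("%w: %v", ErrOutputInvalid, verr)
	}
	return out, 2, nil
}

// isJSON is a stand-in validator; the real one also checks the schema.
func isJSON(s string) error {
	if !json.Valid([]byte(s)) {
		return errors.New("not valid JSON")
	}
	return nil
}

// scripted returns a fake provider that replays the given outputs in order.
func scripted(outputs ...string) callFn {
	i := 0
	return func(string) (string, error) {
		if i >= len(outputs) {
			return "", errors.New("no more scripted outputs")
		}
		out := outputs[i]
		i++
		return out, nil
	}
}

func main() {
	out, calls, err := generateWithRetry(scripted("Sure! Here you go", `{"title":"x"}`), isJSON, "parse this task")
	fmt.Println(out, calls, err)
}
```

Whether a retried Generate debits one call or two is not settled by this sketch; the text above ties the "calls" debit to user-visible Generate actions.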

Why not native structured-output now? The existing AI module has no tools / response_format plumbing across all five adapters. Adding it module-wide is a separate change. The ResponseJSONOnly flag for OpenAI/Groq is a one-hour addition that roughly halves JSON-parse failure rate on those two providers; it's not full schema enforcement.

Testing and Evals

Backend Unit Tests (no DB)

AccessGate.Authorize — one case per branch: no row + trial enabled, no row + trial disabled, mode=disabled, global kill-switch denied, trial at tokens limit, trial at calls limit, trial under limit, platform subscription expired, platform cap exceeded, platform model deprecated, BYOK no key, BYOK decrypt fails, BYOK happy path.
AccessGate.RecordUsage — trial decrement atomic, platform lazy-reset CASE, platform increment in current period, BYOK no counter decrement, ai_usage ON CONFLICT rollup, idempotent on request_id.
AESGCMCrypto — round-trip, wrong nonce/key/tamper fail, 1024 unique nonces asserted.
FieldforceAITaskParser.validateOutput — accepts well-formed; rejects bad enums, oversized checklist, oversized item, missing required field, malformed JSON.
FieldforceAITaskParser.run (provider mocked) — happy path; first call malformed → retry succeeds; both fail → typed error.
error_detail sanitizer — strips sk-, sk-ant-, AIza patterns; multiple matches per string; truncates at 500 chars. (7 test cases)

Backend Integration Tests (real Postgres + mocked provider HTTP)

parse-task happy path → 200, one ai_usage row with feature='fieldforce:parse_task', one ai_access_events row with decision='allowed' and populated diagnostic columns.
Lazy provisioning: org with no ai_configs row makes first AI call → row appears with mode='trial', plan_id pointing at trial plan.
Trial-exhausted → 402, no ai_usage row, event row with decision='denied_trial_exhausted'.
Platform period rollover: first call of new month resets counters atomically; two concurrent calls both succeed; counter equals exact sum.
Global kill-switch on → all gate calls return 403 ai_globally_disabled.
BYOK happy path → mocked provider receives Authorization header with decrypted org key.
BYOK with bad ciphertext → 502 invalid_byok_key + ops-alert log line.
PUT /ai-config (BYOK) + provider accepts test → encrypted row stored → GET returns has_api_key: true, never the key.
PUT /ai-config (BYOK) + provider rejects test → 422, no row mutated.
PUT /ai-config (platform) from org Admin → 409 invalid_mode_transition.
PATCH /admin/ai-config/orgs/:id without plan_id → 409 subscription_required.
reset-trial when org is disabled → counters zeroed, mode='trial', audit row written.
All forbidden mode transitions (byok→trial, platform→trial) → 409 with current_mode + attempted_mode in body.
Provider error path → RecordError called, event row has error_code='provider_unavailable', no usage row, no counter debit.
error_detail containing sk-ant-xxx is stored with the key replaced by <redacted>.
locale='ms' → mocked provider receives Malay anchor + reiteration in prompt.
Migration test: CreateAIAccessTables() against snapshot with existing rows → ai_configs gains new columns with sane defaults; ai_usage backfilled with feature='unknown'; org_ai_configs gone; seed rows idempotent on re-run.

Frontend Tests (Vitest + @solidjs/testing-library)

TaskCreationForm: tab switching; AI tab disabled on ai_disabled/ai_globally_disabled; trial counter %; Generate fills fields + switches to Manual; 402 opens upgrade dialog with text preserved; 502 model_deprecated shows banner; 502 ai_unavailable shows inline error + auto-switch after 3s.
BYOKModal: provider radio drives model dropdown from byok_model_catalog; deprecated models hidden; Save disabled until all fields valid; 422 keeps modal open with key cleared.
CurrentModeCard: renders correctly for trial/platform/byok/disabled; platform shows month-to-date progress bars; BYOK shows "Remove key" button.
UpgradeDialog: three CTAs route correctly.
PromoteToPlatformModal: all four fields required; provider drives model dropdown.
PlatformAdminOrgsTable: reset-trial requires typing org name to confirm.
KillSwitchTab: confirmation requires typing "DISABLE."

Browser Smokes (Playwright + e2e-auth-bypass)

Admin configures BYOK with sandbox key → creates task via AI mode → task is created.
Platform admin promotes disabled org to Platform → org user uses AI mode → task is created.
Platform admin resets exhausted trial → org user can use AI again (no reload).
Platform admin flips kill-switch → org user sees ai_globally_disabled and AI tab is hidden.

Prompt Evals

~30 cases in backend/go/internal/modules/fieldforce/evals/parse_task/cases.jsonl. Coverage: en/zh/ms × {service,sales,logistics,other} × {priority levels} × {with/without due hint} × {ambiguous vs clear}. Run via make eval-parse-task. Not wired to CI this round. Cost: ~$0.20/run on claude-sonnet-4-6.

Locale	Schema validity	task_type	priority	Locale-of-output
en	≥98%	≥85%	≥75%	≥98%
zh	≥95%	≥80%	≥70%	≥95%
ms	≥90%	≥75%	≥65%	≥85%

If ms locale-of-output falls below 85% after prompt iteration, a follow-up change adds post-generation locale detection + corrective retry.

Security Review Checkpoints

✓ No code path returns api_key in any response body — only has_api_key: boolean.
✓ No code path logs the decrypted key. Credentials.String() and GoString() return <redacted>.
✓ AES-GCM nonce per-row, freshly generated from crypto/rand, never reused. TestAESGCM_FreshNonces_AreUnique asserts 1024 unique nonces.
✓ Env secret length asserted at boot — missing or wrong length → fatal exit.
✓ BYOK test call has hard 5-second timeout (context.WithTimeout).
✓ BYOK test call rejected when global AI kill-switch is active.
✓ error_detail sanitizer strips key-shaped substrings (7 unit tests, covering Anthropic/OpenAI/Google patterns).
✓ Model-deprecation returns 502, not a silent model substitution.
⏳ Cross-org leak test (org A's key never visible to org B) — DEFERRED; covered by integration tests before merge.

Observability

ai.gate.authorize: org_id, feature, mode, decision, latency_ms
ai.gate.record_usage: tokens, feature, mode
ai.gate.record_error: error_code, feature, provider_request_id
ai.byok.decrypt_failed: WARN + ops-alert pipeline
ai.crypto.boot_check_failed: FATAL at startup — missing or wrong-length env secret
ai.killswitch.toggled: emitted when admin flips the global flag

No new dashboards this round. A Grafana board fed by ai_access_events (denial rate, p95 latency, error_code distribution) is a Phase 3.5 nice-to-have.

Implementation Order

1. Migration + crypto + entities: CreateAIAccessTables(), AESGCMCrypto (boot-time secret check), lazy-provision logic, entity structs.
2. Plans + Models + Defaults seed: ai_plans, ai_models, ai_config_defaults seeded with the values listed under Data Model.
3. AccessGate use case: Authorize + RecordUsage + RecordError with full unit tests and handler integration pattern.
4. Org-scoped AI config endpoints: GET/PUT /ai-config, DELETE /ai-config/api-key, GET /ai-config/usage. Includes BYOK test-call flow and mode-transition state machine.
5. Platform-admin AI config endpoints: Defaults, plans, models, orgs, events (5 endpoint groups), plus the kill-switch tab backed by the existing admin_feature_flags endpoints.
6. Fieldforce AI task parser: Single combined-schema call + JSON-mode flag + retry-once + validateOutput.
7. Eval harness + initial cases: cmd/eval-parse-task/main.go + cases.jsonl (~30 cases). make eval-parse-task working locally.
8. SolidStart hooks + i18n: All TanStack Query hooks; Paraglide ai.* namespace with en/zh/ms translations.
9. Panel Settings → AI page: All four mode cards, BYOK modal, upgrade card, advanced disable section.
10. Panel task creation AI mode + upgrade dialog: AI mode tab, Generate flow, pre-fill, Redo, all error states, shared UpgradeDialog mounted at root.
11. Panel admin AI config pages: All 5 tabs: Defaults, Plans, Models, Orgs (with side panel), Kill-switch.
12. Browser smokes + security review: Playwright flows, security checkpoint sign-offs, cross-org leak integration test.

Open Questions (Post-Round-1)

Real payment processor — Stripe Checkout vs alternatives; webhook handlers that update subscription_status and subscription_valid_until.
Native structured-output across all five providers (Anthropic tools, OpenAI json_schema response_format, Google functionDeclarations, Groq tools, Ollama format).
Per-feature trial caps — today caps are org-wide; finer-grained capping needs schema + AccessGate changes.
Automated CI evals — budget tolerance; on-label-only vs on-main-only triggers.
Multi-key BYOK — fallback OpenAI when Anthropic is down.
Streaming responses for parse-task — UX gain vs implementation complexity across adapters.
Anniversary-cycle billing periods when Stripe lands (today: calendar month reset).
ai_access_events retention cron — scheduled job to purge rows older than 90 days.
Grafana board for denial rate, p95 latency, error_code distribution fed by ai_access_events.
Per-model cost rate card editor in admin UI — column already on ai_models; small follow-up to expose editing.
