Fieldforce Phase 3 — Task Creation AI + BYOK Foundation

Phase 3 delivers the first user-facing AI feature in Fieldforce — natural-language task creation on the SolidStart Panel — plus the platform-wide AccessGate that all future AI calls must pass through, covering trial/platform/BYOK/disabled modes, plan caps, AES-GCM encrypted BYOK key management, and a global AI kill-switch. Three deferred Phase 2 AI features (NL task parsing, auto-categorization, AI checklist generation) are delivered as one user action via a single combined-schema LLM call.

  • Predecessor Fieldforce Phase 2 — Management Design (2026-05-15) and Phase 2 Grill Session Decisions (2026-05-19)
  • ADRs 0016 · 0021
  • Status Approved

Phase 3 ships the first user-facing AI feature in Fieldforce — natural-language task creation on the SolidStart Panel — and the platform-wide AccessGate that every future AI call must pass through, enforcing a trial → platform → BYOK → disabled mode model with plan caps, AES-GCM BYOK key management, and a global kill-switch. Three deferred Phase 2 AI features are delivered as one user action via a single combined-schema LLM call.

Overview

  • Date: 2026-05-19 (revised 2026-05-20 after grill session)
  • Status: Approved for implementation planning
  • Author: Wayne Cheah
  • Domain refs: CONTEXT.md, ADR-0021 (table naming), ADR-0016 (two-tier feature flags)
  • Predecessors: fieldforce-management-design (2026-05-15), phase-2-grill-session-decisions (2026-05-19)
  • Surface: SolidStart Panel (/tasks/new, /settings/ai, /admin/ai-config)

Pillar A — AI Access Foundation

platform-wide
  • Mode model: trial → platform → BYOK → disabled
  • AccessGate — every AI call passes through before any prompt is built
  • New tables: ai_plans, ai_models, ai_access_events, ai_config_defaults
  • Global AI kill-switch via admin_feature_flags.key = 'ai'
  • AES-GCM BYOK key encryption with per-row nonces, boot-required env secret

Pillar B — Task Creation AI Cluster

fieldforce feature
  • NL task parsing + auto-categorization + AI checklist — three deferred Phase 2 features
  • One combined-schema LLM call per Generate action (title, description, task_type, priority, due_date_hint, checklist_items)
  • POST /fieldforce/ai/parse-task — returns pre-filled fields; never creates a task
  • AI mode tab on Panel task creation form

Scope and Non-Goals

Explicitly out of scope this round:

  • Astro mobile AI mode (immediate follow-up change — same backend, new UI).
  • Other AI features: voice-to-text logging, smart assignment, AI completion summary, predictive delay.
  • Real payment processor / subscription webhooks (manual platform-admin field updates only).
  • Native provider structured-output / tool-call enforcement — deferred to a future module-wide change; this round uses prompt + post-hoc JSON-schema validation + retry.
  • Response caching, streaming responses, automated CI evals.
  • Per-feature trial caps (mode caps are org-wide across all AI features today).

Architecture

Component map — Panel calls Go HTTP layer; application layer adds AccessGate and AITaskParser on top of the existing provider factory; new tables are managed by CreateAIAccessTables() in migrator.go.

Key Design Points

No new AI provider plumbing
Existing Go ai module already has multi-provider support, provider factory, org-scoped usage tracking, and Anthropic / OpenAI / Google / Groq / Ollama adapters. Phase 3 adds a domain use case and a platform gate on top.
Fieldforce endpoint in fieldforce module
Domain-specific prompts and validation co-locate with the rest of fieldforce. The generic ai module stays free of domain leakage.
One endpoint, one LLM call
Prior design had two sequential LLM calls (parse + checklist). A single combined-schema call is faster, cheaper, and aligns "calls" debit with user-visible Generate actions.
Native structured-output deferred
Existing adapters lack tools / response_format plumbing across all providers. Phase 3 uses JSON-mode prompts + post-hoc schema validation + retry-once. OpenAI / Groq adapters gain a ResponseJSONOnly bool flag wiring to response_format: { type: "json_object" }.
Gate-first hot path
Every AI use case calls AccessGate.Authorize(orgID, feature) before constructing any prompt, then RecordUsage after a provider response. On provider failure, RecordError updates the same access-event row without debiting counters.

ADR-0016 Gate Order

  1. Echo middleware

    auth → role → org-scope

  2. FieldforceFeatureFlagMiddleware

    Global fieldforce kill-switch + per-org fieldforce flag (per ADR-0016).

  3. Handler → AccessGate.Authorize

    a. Global AI kill-switch (admin_feature_flags.key = 'ai')
    b. Per-org mode resolution (lazy-provision trial row if absent)
    c. Subscription validity (platform mode only)
    d. Plan + per-org cap check (trial and platform)
    e. BYOK credential decrypt (BYOK mode only)

  4. Parser — LLM call

    Runs only after Authorize returns an AuthzResult.

  5. AccessGate.RecordUsage / RecordError

    On success: updates counters + access-event row. On provider failure: RecordError updates the event row with error diagnostics; no counter is debited.

Data Model

Cleanup — Drop Dead Scaffold

DROP TABLE IF EXISTS org_ai_configs;   -- phase-1 scaffolding, never wired, zero rows

ai_configs Extensions

New columns added to the live GORM-managed table (additive; one column type change):

mode
TEXT NOT NULL DEFAULT 'trial' CHECK ('trial','platform','byok','disabled')
model
TEXT — domain-specific model for AI access features (separate from chat/embedding/vision models)
api_key_encrypted
Type changed to BYTEA (column is empty in production; cast is safe)
api_key_nonce
BYTEA — 12-byte AES-GCM nonce, generated fresh per Encrypt, never reused
Trial counters
trial_tokens_used, trial_tokens_limit, trial_calls_used, trial_calls_limit, trial_granted_at, trial_exhausted_at
Subscription state
subscription_status CHECK ('none','active','past_due','canceled','expired'), subscription_valid_until
Plan reference
plan_id UUID FK to ai_plans (added after table is created)
Platform counters
platform_tokens_used, platform_calls_used, platform_tokens_limit (NULL → fall back to plan or defaults), platform_calls_limit, platform_period_start
CHECK constraint
mode_requires_provider_model — platform/byok modes require provider + model to be non-null

ai_usage Extension

ALTER TABLE ai_usage ADD COLUMN IF NOT EXISTS feature TEXT NOT NULL DEFAULT 'unknown';
-- Unique constraint extended to include feature column:
ALTER TABLE ai_usage ADD CONSTRAINT ai_usage_unique
  UNIQUE (organization_id, usage_date, provider, model, feature);

New: ai_config_defaults (singleton)

Purpose
Last-resort fallback when org and plan caps are both NULL. Deliberately conservative.
Singleton enforcement
CONSTRAINT singleton CHECK (id = '00000000-0000-0000-0000-000000000001')
Seeded defaults
provider = 'anthropic', model = 'claude-sonnet-4-6', platform_tokens_limit = 200,000, platform_calls_limit = 200, byok_allowed_providers = '{anthropic,openai,google}'
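
The fallback chain for platform caps (per-org override, then plan, then the singleton defaults) could be resolved with a helper like the one below. This is a sketch only: ResolveLimit is a hypothetical name, and representing a NULL column as a nil pointer is an assumption about how the entities are mapped.

```go
package main

// ResolveLimit picks the effective cap: the per-org override if set,
// otherwise the plan's limit, otherwise the last-resort default.
// A nil pointer stands for a NULL column (assumed mapping).
func ResolveLimit(orgLimit, planLimit *int64, defaultLimit int64) int64 {
	if orgLimit != nil {
		return *orgLimit
	}
	if planLimit != nil {
		return *planLimit
	}
	return defaultLimit
}
```

With the seeded defaults, an org that has neither an override nor a plan limit would resolve to 200,000 tokens and 200 calls.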

New: ai_plans

code | display_name | tokens_limit | calls_limit | price_cents/mo
trial | Free Trial | 50,000 | 20 | 0
starter | Starter | 200,000 | 200 | 0
pro | Pro | 2,000,000 | 2,000 | 0
enterprise | Enterprise | 20,000,000 | 20,000 | 0

New: ai_models (catalog)

provider | model_id | byok_visible | platform_eligible | recommended | input $/1k | output $/1k
anthropic | claude-sonnet-4-6 | | | | 0.003 | 0.015
anthropic | claude-haiku-4-5 | | | | 0.0008 | 0.004
openai | gpt-4o | | | | 0.0025 | 0.010
openai | gpt-4o-mini | | | | 0.00015 | 0.0006
google | gemini-2.0-pro | | | | 0.00125 | 0.005
google | gemini-2.0-flash | | | | 0.000075 | 0.0003

New: ai_access_events

Purpose
Per-decision audit trail with diagnostic columns: provider, model, tokens_consumed, latency_ms, http_status, error_code, error_detail (sanitized), provider_request_id.
decision CHECK values
allowed · denied_disabled · denied_global_killswitch · denied_trial_exhausted · denied_subscription_inactive · denied_platform_cap_exceeded · denied_no_byok_key · denied_byok_decrypt_failed · denied_model_deprecated · byok_test_succeeded · byok_test_failed
Retention
Rows older than 90 days purged by a scheduled job. Cron itself is out of scope this change.
error_detail sanitization
Regex rewrites matches of sk-[A-Za-z0-9_-]+, sk-ant-[A-Za-z0-9_-]+, and AIza[A-Za-z0-9_-]+ so the secret portion becomes <redacted> (for example, sk-ant-<redacted>); the result is truncated to 500 chars.
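
A sanitizer along these lines could look like the sketch below. It is illustrative only: SanitizeErrorDetail, keyPattern, and maxErrorDetailLen are hypothetical names, folding the three patterns into one alternation is a simplification, and counting the 500-char limit in runes rather than bytes is an assumption.

```go
package main

import "regexp"

// keyPattern folds the three key shapes into one alternation. The longer
// "sk-ant-" prefix is listed first so it wins over plain "sk-".
var keyPattern = regexp.MustCompile(`(sk-ant-|sk-|AIza)[A-Za-z0-9_-]+`)

// maxErrorDetailLen is the storage cap, counted in runes (assumption).
const maxErrorDetailLen = 500

// SanitizeErrorDetail keeps the recognizable prefix, replaces the secret
// portion with <redacted>, and truncates the result.
func SanitizeErrorDetail(s string) string {
	s = keyPattern.ReplaceAllString(s, "${1}<redacted>")
	r := []rune(s)
	if len(r) > maxErrorDetailLen {
		r = r[:maxErrorDetailLen]
	}
	return string(r)
}
```

Redacting before truncating matters: truncating first could cut a key in half and leave a fragment that no longer matches the pattern.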

Provisioning

Concurrency Model

RecordUsage uses an atomic CASE-based UPDATE for platform-mode counters so the lazy monthly reset and increment happen in one statement, race-free. Authorize is tolerant of stale-period counters — when platform_period_start < current_month_start, it treats used counters as 0 for the cap check.

API Surface

Fieldforce AI — parse-task

POST /api/organizations/:org_id/fieldforce/ai/parse-task
  Auth:  Admin | Owner | Manager | Supervisor
  Body:  { text: string (max 4000 chars), locale?: 'en'|'zh'|'ms' }
  200:   { title, description, task_type, priority, due_date_hint?, checklist_items[] }
  402:   trial_exhausted | subscription_inactive | platform_cap_exceeded
  403:   ai_disabled | ai_globally_disabled
  409:   invalid_mode_transition | subscription_required
  422:   no_byok_key | provider_not_allowed | invalid_api_key | ai_input_invalid
  429:   rate_limited
  502:   ai_unavailable | ai_output_invalid | byok_key_rejected | model_deprecated | provider_overloaded

Org AI Config (org-scoped, Admin/Owner)

GET /ai-config
Returns mode, provider, model, has_api_key: boolean (never the key itself), plan, subscription state, trial/platform counters, and byok_model_catalog (per-provider lists of byok_visible models).
PUT /ai-config
Mode transitions enforced server-side. BYOK: validates provider ∈ whitelist, performs test call with 5s hard timeout, AES-GCM encrypts key with fresh 12-byte nonce. 'trial' and 'platform' transitions via PUT → 409 (only platform-admin can move orgs into these modes).
DELETE /ai-config/api-key
Clears encrypted key + nonce. If mode='byok', flips mode to 'disabled'. Audited as AI_BYOK_KEY_REMOVED + AI_MODE_CHANGED.
GET /ai-config/usage
Usage breakdown by feature/provider/model with cost estimates derived from ai_models cost columns at query time. Supports granularity=day|month and group_by=feature|provider|model.
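
The server-side rule for org-scoped mode changes could be expressed as a small check like the one below. It is a sketch under assumptions: TransitionError and CheckOrgModeTransition are hypothetical names, and it assumes an org Admin/Owner may move to 'byok' or 'disabled' from any current mode, while 'trial' and 'platform' remain platform-admin only.

```go
package main

import "fmt"

// TransitionError carries the fields the 409 body is described as exposing.
type TransitionError struct {
	Code          string
	CurrentMode   string
	AttemptedMode string
}

func (e *TransitionError) Error() string {
	return fmt.Sprintf("%s: %s -> %s", e.Code, e.CurrentMode, e.AttemptedMode)
}

// CheckOrgModeTransition returns nil when the target mode may be requested
// through PUT /ai-config, and a 409-style error otherwise.
func CheckOrgModeTransition(current, attempted string) *TransitionError {
	switch attempted {
	case "byok", "disabled":
		return nil // org-level targets (assumed allowed from any mode)
	default:
		// 'trial', 'platform', and unknown values are rejected here.
		return &TransitionError{
			Code:          "invalid_mode_transition",
			CurrentMode:   current,
			AttemptedMode: attempted,
		}
	}
}
```

A BYOK target would still need to pass the provider whitelist and the test call afterwards; this check only covers which targets an org-scoped request may name.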

Platform-Admin Endpoints

GET/PUT /admin/ai-config/defaults
trial_enabled toggle, trial default provider/model, byok_allowed_providers array, last-resort platform caps.
GET/POST/PATCH /admin/ai-config/plans
CRUD on ai_plans. PATCH supports activate/deactivate.
GET/POST/PATCH/DELETE /admin/ai-config/models
CRUD on ai_models. Hard delete; orgs referencing the model see 502 model_deprecated on next call.
GET /admin/ai-config/orgs
LEFT JOIN ai_configs so orgs without a config row appear as "Not yet active". Filterable by mode/plan; searchable by org name.
PATCH /admin/ai-config/orgs/:id
trial→platform and disabled→platform MUST include plan_id + subscription_valid_until + provider + model — missing any → 409 subscription_required. Single AI_MODE_CHANGED audit row with before/after JSON.
POST /admin/ai-config/orgs/:id/reset-trial
Only allowed when current mode = 'disabled'. Zeroes trial counters, flips mode to 'trial'. Audited as AI_TRIAL_RESET.
GET /admin/ai-config/events
Paginated ai_access_events including all diagnostic columns.

Feature Key Registry

feature key | fires from | debits counters
fieldforce:parse_task | POST /fieldforce/ai/parse-task | Yes (trial + platform)
byok:test_call | BYOK validation ping on PUT /ai-config | No (accounting visibility only; never debits trial counters)
unknown | Legacy / pre-Phase-3 rows | N/A

Audit Actions

New actions emitted by write endpoints via the existing AuditHelper decorator:

AI_CONFIG_UPDATED · AI_BYOK_KEY_SET · AI_BYOK_KEY_REMOVED · AI_TRIAL_RESET · AI_MODE_CHANGED · AI_PLAN_UPDATED · AI_MODEL_UPDATED · AI_TRIAL_DEFAULT_CHANGED · AI_SUBSCRIPTION_STATUS_CHANGED · AI_GLOBAL_KILL_SWITCH_TOGGLED

Access Gate Logic

AccessGate.Authorize decision tree — 6-step flow from kill-switch check through mode-specific validation to credential handoff. All denied paths insert an ai_access_events row before returning.

Use Case Interface

// internal/modules/ai/application/usecase/access_gate.go

func (g *AccessGate) Authorize(ctx context.Context, orgID, userID, feature, requestID string) (*AuthzResult, *AuthzError)
func (g *AccessGate) RecordUsage(ctx context.Context, authz *AuthzResult, feature string,
                                  inputTokens, outputTokens int, latencyMs int, providerRequestID string) error
func (g *AccessGate) RecordError(ctx context.Context, authz *AuthzResult, feature string,
                                  errorCode, errorDetail string, httpStatus int, latencyMs int, providerRequestID string) error

AuthzResult.Credentials has unexported fields, no JSON tags, and a String() method returning "<redacted>" — accidental fmt.Sprintf("%+v", creds) cannot leak the key.

RecordUsage

  • Atomic trial decrement; sets trial_exhausted_at via CASE when either limit is reached.
  • Atomic platform counter with lazy monthly reset CASE-update — reset and increment in one statement, race-free.
  • Per-feature usage rollup to ai_usage via ON CONFLICT DO UPDATE.
  • Idempotent diagnostic update on the access-event row keyed by request_id.

Key Isolation Rules

  1. Decrypted BYOK keys live only in process memory for the duration of one HTTP request — never written to disk, never logged, never placed in a struct that gets serialized.
  2. Credentials Go type: unexported fields, no JSON tags, String() and GoString() return "<redacted>".
  3. Echo's request-body logger has a redactor for fields named api_key.
  4. AES-GCM nonces are 12 bytes, generated fresh from crypto/rand per Encrypt, stored per-row alongside ciphertext; never reused.
  5. AI_KEY_ENCRYPTION_SECRET (32-byte base64-encoded) is required at boot. Missing or wrong length → l.Fatal, process exits. Hard fail, not silent degradation.
  6. error_detail strings pass through the regex sanitizer before storage on ai_access_events.

Failure-Mode Reference

Scenario | HTTP | Code | UI behavior
Global AI kill-switch active | 403 | ai_globally_disabled | AI tab hidden; "AI features temporarily disabled platform-wide" banner
No config row + trial disabled | 403 | ai_disabled | AI tab hidden
mode = 'disabled' | 403 | ai_disabled | AI tab hidden
Trial tokens or calls exhausted | 402 | trial_exhausted | Upgrade dialog opens; text preserved
Platform subscription not active/expired | 402 | subscription_inactive | "Subscription lapsed — contact account manager"
Platform monthly cap exceeded | 402 | platform_cap_exceeded | "Monthly limit reached. Contact support or wait until {next_month}."
BYOK, no key stored | 422 | no_byok_key | "Configure BYOK in Settings" CTA
BYOK decrypt fail (data corruption) | 502 | invalid_byok_key | "Re-enter your key" + ops alert
BYOK test call rejected by provider | 422 | invalid_api_key | Inline modal error, key field cleared
Provider not in BYOK whitelist | 422 | provider_not_allowed | Inline modal error
Forbidden mode transition | 409 | invalid_mode_transition | Inline explanation with current_mode + attempted_mode
Promote without plan/subscription | 409 | subscription_required | Modal validation failure
Org model deprecated | 502 | model_deprecated | Banner: "Selected model removed — pick a new one in Settings"
Provider 4xx/5xx/timeout | 502 | ai_unavailable | "AI temporarily unavailable"
BYOK provider rejects key mid-call | 502 | byok_key_rejected | "Your API key was rejected — re-enter it"
Output invalid after retry | 502 | ai_output_invalid | "AI couldn't parse — try rephrasing"
Provider rate limited | 429 | rate_limited | "Too many requests — try again in {retry_after}"

402, 409, 422, 502 distinctions per CONTEXT.md "HTTP status discipline". A 5xx provider failure does not debit any counter.

Frontend (SolidStart Panel)

Task Creation Form — AI Mode Tab

Route: /app/fieldforce/tasks/new (extended). Tab bar adds ✨ AI mode alongside the unchanged Manual tab.

Task creation form (AI mode tab). Free-text input with char counter, trial pill, Generate button, and post-generate success banner with Redo affordance.
On Generate (success)
Form switches to Manual tab with all fields pre-filled. Banner: "✨ AI filled this in. Review and edit before saving." + Redo with AI button (re-opens AI textarea; replacing edited fields prompts confirm).
Trial counter
Fetched via useAIConfig() on form mount; refreshed after each Generate response.
AI tab hidden when
ai_disabled or ai_globally_disabled — Manual is the only path.
402 codes
Upgrade dialog opens; user's text is preserved.
502 ai_unavailable / ai_output_invalid
Inline error below textarea; auto-switch to Manual after 3 seconds.
502 model_deprecated
Banner above tab: "Your AI model was removed. Go to Settings → AI to pick a new one."

Org AI Settings Page

Route: /app/settings/ai (new). Access: Admin / Owner.

Trial mode

default
  • Label + "Model: Claude Sonnet 4.6 (platform default)"
  • Progress bars for tokens + calls used
  • Per-feature usage breakdown table
  • Upgrade card → "Contact sales" mailto link

Platform mode

paid
  • Plan name + subscription valid-until
  • Change model → picker filtered to platform_eligible
  • Month-to-date tokens + calls progress bars
  • Subscription banner if past_due

BYOK mode

bring your own key
  • "BYOK — Anthropic — claude-sonnet-4-6"
  • Per-feature usage chart (no caps)
  • Remove key · Change buttons
  • BYOK modal: provider radio → model dropdown (filtered by provider, deprecated_at hidden) → API key input → Save and test

Disabled mode

no AI
  • "AI is disabled for your org. Contact your admin to re-enable."
  • AI mode tab hidden everywhere

Advanced section (collapsed): Disable AI for the org. Confirm dialog explains effect. Calls PUT /ai-config { mode: 'disabled' }. No path from inside the task form to switch mode — must go through Settings.

Platform-Admin AI Config Page

Route: /admin/ai-config (new). Access: platform_admin. Five tabs:

Tab 1 — Defaults
trial enabled toggle, trial default provider/model (filtered to platform_eligible), BYOK allowed providers checkboxes, last-resort platform token/call limits. → PUT /admin/ai-config/defaults
Tab 2 — Plans
Table editor: add/edit/activate-deactivate ai_plans rows. Fields: code, display_name, tokens_limit, calls_limit, price_cents_per_month, is_active.
Tab 3 — Models
Table editor: add/edit/soft-deprecate ai_models rows. UI prevents platform_eligible = true unless cost columns are populated.
Tab 4 — Orgs
All orgs with mode/plan/subscription/usage/last_active_at. Row opens side panel: Promote to Platform (modal), Reset trial (only from disabled; requires typing org name), Disable AI, Adjust limits, Change plan, Update subscription state, last 20 access events for the org.
Tab 5 — Kill-switch
Toggle backed by admin_feature_flags.key = 'ai'. Confirmation requires typing the word "DISABLE." Audited as AI_GLOBAL_KILL_SWITCH_TOGGLED.

Promote-to-Platform Modal

Used from the admin Orgs side panel. All four fields are required: Plan (from active ai_plans), Subscription valid until (date picker), Provider (distinct providers from platform_eligible models), Model (filtered by selected provider). Refuses with 409 subscription_required if any field is missing. Single AI_MODE_CHANGED audit row with before/after JSON on success.

Upgrade Dialog (shared)

Mounted once at root layout; visibility driven by a global Solid signal raised on any 402 response. Three CTAs:

  • Upgrade plan → mailto: contact-sales URL (env-configurable).
  • Use your own key → deep-link to /app/settings/ai?openByok=1 (auto-opens BYOK modal).
  • Maybe later → close dialog.

Hooks and i18n

Org hooks (features/ai/)
useAIConfig() cached at root layout, invalidated on writes; useUpdateAIConfig(); useParseTask() — raises upgrade dialog signal on 402; useAIUsage()
Platform-admin hooks (features/admin-ai/)
useAIDefaults, useListAIPlans, useUpsertAIPlan, useListAIModels, useUpsertAIModel, useDeleteAIModel, useListAIOrgs, usePatchAIOrg, useResetTrial, usePromoteToPlatform, useListAIEvents, useAIKillSwitch, useToggleAIKillSwitch
i18n
All UI strings through Paraglide (ai.* namespace), en/zh/ms translations in this change. AI-generated output follows caller's UI locale via prompt instruction.
Acceptance bars
en ≥98%, zh ≥95%, ms ≥85% locale-of-output correctness
Deliberately out of UI scope
  • No "switch to BYOK" path from inside the task form — must go through Settings.
  • No per-feature toggle — mode is org-wide.
  • No real-time trial counter via WebSocket — refreshes after each parse-task response.
  • No usage-chart drill-down beyond the 30-day card.
  • No inline subscription/payment flow — "Contact sales" until Stripe lands.

Prompts

Combined Output Schema

{
  "type": "object",
  "required": ["title", "description", "task_type", "priority", "checklist_items"],
  "properties": {
    "title":         { "type": "string", "maxLength": 120 },
    "description":   { "type": "string", "maxLength": 2000 },
    "task_type":     { "enum": ["service", "sales", "logistics", "other"] },
    "priority":      { "enum": ["low", "medium", "high", "urgent"] },
    "due_date_hint": { "type": "string", "format": "date-time" },
    "checklist_items": {
      "type": "array", "minItems": 0, "maxItems": 10,
      "items": { "type": "string", "maxLength": 120 }
    }
  },
  "additionalProperties": false
}
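
For illustration, the same constraints can be checked by hand in Go as below. This is a stand-in, not the planned validator: the design names a JSON-schema library for this step, whereas the sketch uses only encoding/json so it stays self-contained. ParsedTask and ValidateOutput are hypothetical names, and the date-time format is approximated with RFC 3339 parsing.

```go
package main

import (
	"bytes"
	"encoding/json"
	"errors"
	"fmt"
	"time"
	"unicode/utf8"
)

// ParsedTask uses pointers for required scalars so absence is detectable.
type ParsedTask struct {
	Title          *string  `json:"title"`
	Description    *string  `json:"description"`
	TaskType       *string  `json:"task_type"`
	Priority       *string  `json:"priority"`
	DueDateHint    *string  `json:"due_date_hint,omitempty"`
	ChecklistItems []string `json:"checklist_items"`
}

var (
	taskTypes  = map[string]bool{"service": true, "sales": true, "logistics": true, "other": true}
	priorities = map[string]bool{"low": true, "medium": true, "high": true, "urgent": true}
)

// ValidateOutput decodes raw model output and applies the schema's rules.
func ValidateOutput(raw []byte) (*ParsedTask, error) {
	dec := json.NewDecoder(bytes.NewReader(raw))
	dec.DisallowUnknownFields() // additionalProperties: false
	var t ParsedTask
	if err := dec.Decode(&t); err != nil {
		return nil, fmt.Errorf("malformed JSON: %w", err)
	}
	switch {
	case t.Title == nil || t.Description == nil || t.TaskType == nil ||
		t.Priority == nil || t.ChecklistItems == nil:
		return nil, errors.New("missing required field")
	case utf8.RuneCountInString(*t.Title) > 120:
		return nil, errors.New("title exceeds 120 characters")
	case utf8.RuneCountInString(*t.Description) > 2000:
		return nil, errors.New("description exceeds 2000 characters")
	case !taskTypes[*t.TaskType]:
		return nil, fmt.Errorf("invalid task_type %q", *t.TaskType)
	case !priorities[*t.Priority]:
		return nil, fmt.Errorf("invalid priority %q", *t.Priority)
	case len(t.ChecklistItems) > 10:
		return nil, errors.New("more than 10 checklist items")
	}
	for i, item := range t.ChecklistItems {
		if utf8.RuneCountInString(item) > 120 {
			return nil, fmt.Errorf("checklist item %d exceeds 120 characters", i)
		}
	}
	if t.DueDateHint != nil {
		if _, err := time.Parse(time.RFC3339, *t.DueDateHint); err != nil {
			return nil, fmt.Errorf("due_date_hint is not a date-time: %w", err)
		}
	}
	return &t, nil
}
```

The error strings are deliberately specific, since a corrective retry prompt can quote them back to the model.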

Locale Anchors

Locale | LOCALE_ANCHOR | LOCALE_REITERATION
en | (empty) | "Reminder: respond in English."
zh | "Respond in Simplified Chinese (简体中文)." | "提醒:必须用简体中文回答。"
ms | "Respond in Bahasa Melayu (Malaysian Malay, not Indonesian). Example title: 'Hantar invoice ke pelanggan ABC sebelum Jumaat.'" | "Peringatan: jawapan MESTI dalam Bahasa Melayu."

Enforcement Layer (prompt + post-hoc + retry-once)

  1. Prompt with JSON instruction

    System prompt includes "Respond as JSON matching the provided schema. No commentary. No markdown." OpenAI/Groq adapters also receive response_format: { type: "json_object" } via the new ResponseJSONOnly bool field on ChatRequest.

  2. JSON decode and schema validation

    Decode the model's Content field; validate against the JSON schema using github.com/santhosh-tekuri/jsonschema/v5 (or similar).

  3. Retry once on failure

    On JSON parse failure or schema-validation failure, retry with a corrective prompt that includes the validator error and the prior output.

  4. 502 on second failure

    Return ai_output_invalid. UI shows "AI couldn't parse — try rephrasing or fill in manually." and auto-switches to Manual tab after 3 seconds.

Testing and Evals

Backend Unit Tests (no DB)

  • AccessGate.Authorize — one case per branch: no row + trial enabled, no row + trial disabled, mode=disabled, global kill-switch denied, trial at tokens limit, trial at calls limit, trial under limit, platform subscription expired, platform cap exceeded, platform model deprecated, BYOK no key, BYOK decrypt fails, BYOK happy path.
  • AccessGate.RecordUsage — trial decrement atomic, platform lazy-reset CASE, platform increment in current period, BYOK no counter decrement, ai_usage ON CONFLICT rollup, idempotent on request_id.
  • AESGCMCrypto — round-trip, wrong nonce/key/tamper fail, 1024 unique nonces asserted.
  • FieldforceAITaskParser.validateOutput — accepts well-formed; rejects bad enums, oversized checklist, oversized item, missing required field, malformed JSON.
  • FieldforceAITaskParser.run (provider mocked) — happy path; first call malformed → retry succeeds; both fail → typed error.
  • error_detail sanitizer — strips sk-, sk-ant-, AIza patterns; multiple matches per string; truncates at 500 chars. (7 test cases)

Backend Integration Tests (real Postgres + mocked provider HTTP)

  • parse-task happy path → 200, one ai_usage row with feature='fieldforce:parse_task', one ai_access_events row with decision='allowed' and populated diagnostic columns.
  • Lazy provisioning: org with no ai_configs row makes first AI call → row appears with mode='trial', plan_id pointing at trial plan.
  • Trial-exhausted → 402, no ai_usage row, event row with decision='denied_trial_exhausted'.
  • Platform period rollover: first call of new month resets counters atomically; two concurrent calls both succeed; counter equals exact sum.
  • Global kill-switch on → all gate calls return 403 ai_globally_disabled.
  • BYOK happy path → mocked provider receives Authorization header with decrypted org key.
  • BYOK with bad ciphertext → 502 invalid_byok_key + ops-alert log line.
  • PUT /ai-config (BYOK) + provider accepts test → encrypted row stored → GET returns has_api_key: true, never the key.
  • PUT /ai-config (BYOK) + provider rejects test → 422, no row mutated.
  • PUT /ai-config (platform) from org Admin → 409 invalid_mode_transition.
  • PATCH /admin/ai-config/orgs/:id without plan_id → 409 subscription_required.
  • reset-trial when org is disabled → counters zeroed, mode='trial', audit row written.
  • All forbidden mode transitions (byok→trial, platform→trial) → 409 with current_mode + attempted_mode in body.
  • Provider error path → RecordError called, event row has error_code='provider_unavailable', no usage row, no counter debit.
  • error_detail containing sk-ant-xxx stored as sk-ant-<redacted>.
  • locale='ms' → mocked provider receives Malay anchor + reiteration in prompt.
  • Migration test: CreateAIAccessTables() against snapshot with existing rows → ai_configs gains new columns with sane defaults; ai_usage backfilled with feature='unknown'; org_ai_configs gone; seed rows idempotent on re-run.

Frontend Tests (Vitest + @solidjs/testing-library)

  • TaskCreationForm: tab switching; AI tab hidden on ai_disabled/ai_globally_disabled; trial counter %; Generate fills fields + switches to Manual; 402 opens upgrade dialog with text preserved; 502 model_deprecated shows banner; 502 ai_unavailable shows inline error + auto-switch after 3s.
  • BYOKModal: provider radio drives model dropdown from byok_model_catalog; deprecated models hidden; Save disabled until all fields valid; 422 keeps modal open with key cleared.
  • CurrentModeCard: renders correctly for trial/platform/byok/disabled; platform shows month-to-date progress bars; BYOK shows "Remove key" button.
  • UpgradeDialog: three CTAs route correctly.
  • PromoteToPlatformModal: all four fields required; provider drives model dropdown.
  • PlatformAdminOrgsTable: reset-trial requires typing org name to confirm.
  • KillSwitchTab: confirmation requires typing "DISABLE."

Browser Smokes (Playwright + e2e-auth-bypass)

  • Admin configures BYOK with sandbox key → creates task via AI mode → task is created.
  • Platform admin promotes disabled org to Platform → org user uses AI mode → task is created.
  • Platform admin resets exhausted trial → org user can use AI again (no reload).
  • Platform admin flips kill-switch → org user sees ai_globally_disabled and AI tab is hidden.

Prompt Evals

~30 cases in backend/go/internal/modules/fieldforce/evals/parse_task/cases.jsonl. Coverage: en/zh/ms × {service,sales,logistics,other} × {priority levels} × {with/without due hint} × {ambiguous vs clear}. Run via make eval-parse-task. Not wired to CI this round. Cost: ~$0.20/run on claude-sonnet-4-6.

Locale | Schema validity | task_type | priority | Locale-of-output
en | ≥98% | ≥85% | ≥75% | ≥98%
zh | ≥95% | ≥80% | ≥70% | ≥95%
ms | ≥90% | ≥75% | ≥65% | ≥85%

If ms locale-of-output falls below 85% after prompt iteration, a follow-up change adds post-generation locale detection + corrective retry.

Security Review Checkpoints

  • ✓ No code path returns api_key in any response body — only has_api_key: boolean.
  • ✓ No code path logs the decrypted key. Credentials.String() and GoString() return <redacted>.
  • ✓ AES-GCM nonce per-row, freshly generated from crypto/rand, never reused. TestAESGCM_FreshNonces_AreUnique asserts 1024 unique nonces.
  • ✓ Env secret length asserted at boot — missing or wrong length → fatal exit.
  • ✓ BYOK test call has hard 5-second timeout (context.WithTimeout).
  • ✓ BYOK test call rejected when global AI kill-switch is active.
  • ✓ error_detail sanitizer strips key-shaped substrings (7 unit tests, covering Anthropic/OpenAI/Google patterns).
  • ✓ Model-deprecation returns 502, not a silent model substitution.
  • ⏳ Cross-org leak test (org A's key never visible to org B): not yet signed off; to be covered by an integration test before merge.

Observability

ai.gate.authorize
org_id, feature, mode, decision, latency_ms
ai.gate.record_usage
tokens, feature, mode
ai.gate.record_error
error_code, feature, provider_request_id
ai.byok.decrypt_failed
WARN + ops-alert pipeline
ai.crypto.boot_check_failed
FATAL at startup — missing or wrong-length env secret
ai.killswitch.toggled
emitted when admin flips the global flag

No new dashboards this round. A Grafana board fed by ai_access_events (denial rate, p95 latency, error_code distribution) is a Phase 3.5 nice-to-have.

Implementation Order

  1. Migration + crypto + entities

    CreateAIAccessTables(), AESGCMCrypto (boot-time secret check), lazy-provision logic, entity structs.

  2. Plans + Models + Defaults seed

    ai_plans, ai_models, ai_config_defaults seeded with the values listed in their Data Model sections.

  3. AccessGate use case

    Authorize + RecordUsage + RecordError with full unit tests and handler integration pattern.

  4. Org-scoped AI config endpoints

    GET/PUT /ai-config, DELETE /api-key, GET /usage. Includes BYOK test-call flow and mode-transition state machine.

  5. Platform-admin AI config endpoints

    Defaults, plans, models, orgs, events, kill-switch tab (5 endpoint groups).

  6. Fieldforce AI task parser

    Single combined-schema call + JSON-mode flag + retry-once + validateOutput.

  7. Eval harness + initial cases

    cmd/eval-parse-task/main.go + cases.jsonl (~30 cases). make eval-parse-task working locally.

  8. SolidStart hooks + i18n

    All TanStack Query hooks; Paraglide ai.* namespace with en/zh/ms translations.

  9. Panel Settings → AI page

    All four mode cards, BYOK modal, upgrade card, advanced disable section.

  10. Panel task creation AI mode + upgrade dialog

    AI mode tab, Generate flow, pre-fill, Redo, all error states, shared UpgradeDialog mounted at root.

  11. Panel admin AI config pages

    All 5 tabs: Defaults, Plans, Models, Orgs (with side panel), Kill-switch.

  12. Browser smokes + security review

    Playwright flows, security checkpoint sign-offs, cross-org leak integration test.

Open Questions (Post-Round-1)

  • Real payment processor — Stripe Checkout vs alternatives; webhook handlers that update subscription_status and subscription_valid_until.
  • Native structured-output across all five providers (Anthropic tools, OpenAI json_schema response_format, Google functionDeclarations, Groq tools, Ollama format).
  • Per-feature trial caps — today caps are org-wide; finer-grained capping needs schema + AccessGate changes.
  • Automated CI evals — budget tolerance; on-label-only vs on-main-only triggers.
  • Multi-key BYOK — fallback OpenAI when Anthropic is down.
  • Streaming responses for parse-task — UX gain vs implementation complexity across adapters.
  • Anniversary-cycle billing periods when Stripe lands (today: calendar month reset).
  • ai_access_events retention cron — scheduled job to purge rows older than 90 days.
  • Grafana board for denial rate, p95 latency, error_code distribution fed by ai_access_events.
  • Per-model cost rate card editor in admin UI — column already on ai_models; small follow-up to expose editing.