Fieldforce Phase 3 — Task Creation AI + BYOK Foundation
Phase 3 delivers the first user-facing AI feature in Fieldforce — natural-language task creation on the SolidStart Panel — plus the platform-wide AccessGate that all future AI calls must pass through, covering trial/platform/BYOK/disabled modes, plan caps, AES-GCM encrypted BYOK key management, and a global AI kill-switch. Three deferred Phase 2 AI features (NL task parsing, auto-categorization, AI checklist generation) are delivered as one user action via a single combined-schema LLM call.
Overview
Pillar A: AI Access Foundation (platform-wide)
- Mode model: trial → platform → BYOK → disabled
- AccessGate: every AI call passes through before any prompt is built
- New tables: ai_plans, ai_models, ai_access_events, ai_config_defaults
- Global AI kill-switch via admin_feature_flags.key = 'ai'
- AES-GCM BYOK key encryption with per-row nonces, boot-required env secret
Pillar B: Task Creation AI Cluster (fieldforce feature)
- NL task parsing + auto-categorization + AI checklist: three deferred Phase 2 features
- One combined-schema LLM call per Generate action (title, description, task_type, priority, due_date_hint, checklist_items)
- POST /fieldforce/ai/parse-task: returns pre-filled fields; never creates a task
- AI mode tab on Panel task creation form
Scope and Non-Goals
Explicitly out of scope this round:
- Astro mobile AI mode (immediate follow-up change — same backend, new UI).
- Other AI features: voice-to-text logging, smart assignment, AI completion summary, predictive delay.
- Real payment processor / subscription webhooks (manual platform-admin field updates only).
- Native provider structured-output / tool-call enforcement — deferred to a future module-wide change; this round uses prompt + post-hoc JSON-schema validation + retry.
- Response caching, streaming responses, automated CI evals.
- Per-feature trial caps (mode caps are org-wide across all AI features today).
Architecture
Key Design Points
- No new AI provider plumbing: The existing Go ai module already has multi-provider support, a provider factory, org-scoped usage tracking, and Anthropic / OpenAI / Google / Groq / Ollama adapters. Phase 3 adds a domain use case and a platform gate on top.
- Fieldforce endpoint in fieldforce module: Domain-specific prompts and validation co-locate with the rest of fieldforce. The generic ai module stays free of domain leakage.
- One endpoint, one LLM call: The prior design had two sequential LLM calls (parse + checklist). A single combined-schema call is faster, cheaper, and aligns the "calls" debit with user-visible Generate actions.
- Native structured-output deferred: Existing adapters lack tools/response_format plumbing across all providers. Phase 3 uses JSON-mode prompts + post-hoc schema validation + retry-once. OpenAI / Groq adapters gain a ResponseJSONOnly bool flag wired to response_format: { type: "json_object" }.
- Gate-first hot path: Every AI use case calls AccessGate.Authorize(orgID, feature) before constructing any prompt, then RecordUsage after a provider response. On provider failure, RecordError updates the same access-event row without debiting counters.
ADR-0016 Gate Order
1. Echo middleware: auth → role → org-scope
2. FieldforceFeatureFlagMiddleware: Global fieldforce kill-switch + per-org fieldforce flag (per ADR-0016).
3. Handler → AccessGate.Authorize
   a. Global AI kill-switch (admin_feature_flags.key = 'ai')
   b. Per-org mode resolution (lazy-provision trial row if absent)
   c. Subscription validity (platform mode only)
   d. Plan + per-org cap check (trial and platform)
   e. BYOK credential decrypt (BYOK mode only)
4. Parser (LLM call): Runs only after Authorize returns an AuthzResult.
5. AccessGate.RecordUsage / RecordError: On success, updates counters + access-event row. On provider failure, RecordError updates the event row with error diagnostics; no counter is debited.
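The ordering inside Authorize (steps a to e) can be sketched as a pure decision function. This is a minimal illustration, not the actual AccessGate code: the OrgState struct and the decide function are hypothetical names, the returned strings are the decision values stored on ai_access_events, and lazy trial provisioning and the model-deprecation check are left out because their exact position in the order is not stated here. Treating an unknown mode as disabled is also an assumption.

```go
package main

import "fmt"

// OrgState is a hypothetical, flattened view of what Authorize inspects.
type OrgState struct {
	GlobalKillSwitch   bool   // global AI flag is switched off
	Mode               string // "trial", "platform", "byok", "disabled"
	TrialExhausted     bool
	SubscriptionActive bool
	PlatformCapReached bool
	HasBYOKKey         bool
	BYOKDecryptOK      bool
}

// decide walks the checks in the documented order and returns the first
// matching ai_access_events.decision value.
func decide(s OrgState) string {
	// a. The global kill-switch wins over every per-org setting.
	if s.GlobalKillSwitch {
		return "denied_global_killswitch"
	}
	// b. Per-org mode resolution, then the mode-specific checks (c to e).
	switch s.Mode {
	case "trial":
		if s.TrialExhausted {
			return "denied_trial_exhausted"
		}
	case "platform":
		if !s.SubscriptionActive {
			return "denied_subscription_inactive"
		}
		if s.PlatformCapReached {
			return "denied_platform_cap_exceeded"
		}
	case "byok":
		if !s.HasBYOKKey {
			return "denied_no_byok_key"
		}
		if !s.BYOKDecryptOK {
			return "denied_byok_decrypt_failed"
		}
	default:
		// "disabled" and (by assumption) any unknown mode.
		return "denied_disabled"
	}
	return "allowed"
}

func main() {
	fmt.Println(decide(OrgState{Mode: "trial"}))
	fmt.Println(decide(OrgState{Mode: "platform", SubscriptionActive: true, PlatformCapReached: true}))
}
```

Keeping the decision pure makes the "one unit test per branch" style straightforward: each branch is a struct literal in and a string out, with no database involved.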
Data Model
Cleanup — Drop Dead Scaffold
DROP TABLE IF EXISTS org_ai_configs; -- phase-1 scaffolding, never wired, zero rows
ai_configs Extensions
New columns added to the live GORM-managed table (additive; one column type change):
- mode: TEXT NOT NULL DEFAULT 'trial' CHECK (mode IN ('trial','platform','byok','disabled'))
- model: TEXT; domain-specific model for AI access features (separate from chat/embedding/vision models)
- api_key_encrypted: type changed to BYTEA (column is empty in production; cast is safe)
- api_key_nonce: BYTEA; 12-byte AES-GCM nonce, generated fresh per Encrypt, never reused
- Trial counters: trial_tokens_used, trial_tokens_limit, trial_calls_used, trial_calls_limit, trial_granted_at, trial_exhausted_at
- Subscription state: subscription_status CHECK (subscription_status IN ('none','active','past_due','canceled','expired')), subscription_valid_until
- Plan reference: plan_id UUID FK to ai_plans (added after that table is created)
- Platform counters: platform_tokens_used, platform_calls_used, platform_tokens_limit (NULL → fall back to plan or defaults), platform_calls_limit, platform_period_start
- CHECK constraint: mode_requires_provider_model; platform/byok modes require provider + model to be non-null
ai_usage Extension
ALTER TABLE ai_usage ADD COLUMN IF NOT EXISTS feature TEXT NOT NULL DEFAULT 'unknown';
-- Unique constraint extended to include feature column:
ALTER TABLE ai_usage ADD CONSTRAINT ai_usage_unique
UNIQUE (organization_id, usage_date, provider, model, feature);
New: ai_config_defaults (singleton)
- Purpose: Last-resort fallback when org and plan caps are both NULL. Deliberately conservative.
- Singleton enforcement: CONSTRAINT singleton CHECK (id = '00000000-0000-0000-0000-000000000001')
- Seeded defaults: provider = 'anthropic', model = 'claude-sonnet-4-6', platform_tokens_limit = 200,000, platform_calls_limit = 200, byok_allowed_providers = '{anthropic,openai,google}'
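The fallback chain for a platform cap (org override, then plan, then the singleton defaults) is short enough to show directly. A minimal sketch, assuming nullable columns are modelled as *int64; resolveLimit is a hypothetical helper name, not something defined in this change.

```go
package main

import "fmt"

// resolveLimit returns the first non-nil cap in the order
// org override → plan → ai_config_defaults.
func resolveLimit(orgLimit, planLimit *int64, defaultLimit int64) int64 {
	if orgLimit != nil {
		return *orgLimit
	}
	if planLimit != nil {
		return *planLimit
	}
	return defaultLimit
}

func main() {
	proTokens := int64(2_000_000)
	// No org override: the plan cap applies.
	fmt.Println(resolveLimit(nil, &proTokens, 200_000))
	// Neither org nor plan cap set: the last-resort default applies.
	fmt.Println(resolveLimit(nil, nil, 200_000))
}
```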
New: ai_plans
| code | display_name | tokens_limit | calls_limit | price_cents/mo |
|---|---|---|---|---|
| trial | Free Trial | 50,000 | 20 | 0 |
| starter | Starter | 200,000 | 200 | 0 |
| pro | Pro | 2,000,000 | 2,000 | 0 |
| enterprise | Enterprise | 20,000,000 | 20,000 | 0 |
New: ai_models (catalog)
| provider | model_id | byok_visible | platform_eligible | recommended | input $/1k | output $/1k |
|---|---|---|---|---|---|---|
| anthropic | claude-sonnet-4-6 | ✓ | ✓ | ★ | 0.003 | 0.015 |
| anthropic | claude-haiku-4-5 | ✓ | ✓ | | 0.0008 | 0.004 |
| openai | gpt-4o | ✓ | | ★ | 0.0025 | 0.010 |
| openai | gpt-4o-mini | ✓ | | | 0.00015 | 0.0006 |
| google | gemini-2.0-pro | ✓ | | ★ | 0.00125 | 0.005 |
| google | gemini-2.0-flash | ✓ | | | 0.000075 | 0.0003 |
New: ai_access_events
- Purpose: Per-decision audit trail. Diagnostic columns: provider, model, tokens_consumed, latency_ms, http_status, error_code, error_detail (sanitized), provider_request_id.
- decision CHECK values: allowed · denied_disabled · denied_global_killswitch · denied_trial_exhausted · denied_subscription_inactive · denied_platform_cap_exceeded · denied_no_byok_key · denied_byok_decrypt_failed · denied_model_deprecated · byok_test_succeeded · byok_test_failed
- Retention: Rows older than 90 days are purged by a scheduled job. The cron job itself is out of scope for this change.
- error_detail sanitization: Regex strips sk-[A-Za-z0-9_-]+, sk-ant-[A-Za-z0-9_-]+, AIza[A-Za-z0-9_-]+ → <redacted>; truncates to 500 chars.
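A sanitizer along these lines could look as follows. This is a sketch using Go's standard regexp package; sanitizeErrorDetail and maxDetailLen are hypothetical names. With the three patterns as listed, the whole matched token is replaced by <redacted>, and the 500-character limit is applied to runes rather than bytes, which is an assumption about how "chars" is meant.

```go
package main

import (
	"fmt"
	"regexp"
)

// keyShaped combines the three listed key patterns. The sk-ant- alternative
// is already covered by sk-; it is spelled out to mirror the list.
var keyShaped = regexp.MustCompile(`sk-ant-[A-Za-z0-9_-]+|sk-[A-Za-z0-9_-]+|AIza[A-Za-z0-9_-]+`)

const maxDetailLen = 500

// sanitizeErrorDetail redacts key-shaped substrings, then truncates.
func sanitizeErrorDetail(detail string) string {
	redacted := keyShaped.ReplaceAllString(detail, "<redacted>")
	runes := []rune(redacted)
	if len(runes) > maxDetailLen {
		runes = runes[:maxDetailLen]
	}
	return string(runes)
}

func main() {
	fmt.Println(sanitizeErrorDetail("401 from provider: invalid x-api-key sk-ant-abc123_XY"))
}
```

Redacting before truncating matters: truncating first could cut a key in half and leave a fragment the regex no longer matches.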
Provisioning
Concurrency Model
RecordUsage uses an atomic CASE-based UPDATE for platform-mode counters so the lazy monthly reset and increment happen in one statement, race-free. Authorize is tolerant of stale-period counters — when platform_period_start < current_month_start, it treats used counters as 0 for the cap check.
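The read-side and write-side rules can be illustrated in memory. The sketch below is only an analogue of the SQL: in the real system the reset-and-increment is a single CASE-based UPDATE executed by Postgres, which is what makes it race-free. PlatformCounters, effectiveUsage, applyUsage, and utcDate are hypothetical names, and computing month boundaries in UTC is an assumption.

```go
package main

import (
	"fmt"
	"time"
)

// PlatformCounters mirrors a subset of the platform_* columns on ai_configs.
type PlatformCounters struct {
	TokensUsed  int64
	CallsUsed   int64
	PeriodStart time.Time
}

// utcDate is a small helper for building dates in examples.
func utcDate(y, m, d int) time.Time {
	return time.Date(y, time.Month(m), d, 0, 0, 0, 0, time.UTC)
}

// monthStart returns midnight UTC on the first day of t's month.
func monthStart(t time.Time) time.Time {
	t = t.UTC()
	return time.Date(t.Year(), t.Month(), 1, 0, 0, 0, 0, time.UTC)
}

// effectiveUsage is the Authorize-side rule: counters that belong to an
// earlier month count as zero for the cap check.
func effectiveUsage(c PlatformCounters, now time.Time) (tokens, calls int64) {
	if c.PeriodStart.Before(monthStart(now)) {
		return 0, 0
	}
	return c.TokensUsed, c.CallsUsed
}

// applyUsage is the RecordUsage-side rule: reset-if-stale and increment are
// decided together, as the single UPDATE statement would do.
func applyUsage(c PlatformCounters, now time.Time, tokens int64) PlatformCounters {
	start := monthStart(now)
	if c.PeriodStart.Before(start) {
		return PlatformCounters{TokensUsed: tokens, CallsUsed: 1, PeriodStart: start}
	}
	c.TokensUsed += tokens
	c.CallsUsed++
	return c
}

func main() {
	stale := PlatformCounters{TokensUsed: 900, CallsUsed: 9, PeriodStart: utcDate(2025, 1, 1)}
	now := utcDate(2025, 2, 3)
	fmt.Println(effectiveUsage(stale, now))
	fmt.Println(applyUsage(stale, now, 100).TokensUsed)
}
```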
API Surface
Fieldforce AI — parse-task
POST /api/organizations/:org_id/fieldforce/ai/parse-task
Auth: Admin | Owner | Manager | Supervisor
Body: { text: string (max 4000 chars), locale?: 'en'|'zh'|'ms' }
200: { title, description, task_type, priority, due_date_hint?, checklist_items[] }
402: trial_exhausted | subscription_inactive | platform_cap_exceeded
403: ai_disabled | ai_globally_disabled
409: invalid_mode_transition | subscription_required
422: no_byok_key | provider_not_allowed | invalid_api_key | ai_input_invalid
429: rate_limited
502: ai_unavailable | ai_output_invalid | byok_key_rejected | invalid_byok_key | model_deprecated | provider_overloaded
Org AI Config (org-scoped, Admin/Owner)
- GET /ai-config: Returns mode, provider, model, has_api_key: boolean (never the key itself), plan, subscription state, trial/platform counters, and byok_model_catalog (per-provider lists of byok_visible models).
- PUT /ai-config: Mode transitions enforced server-side. BYOK: validates provider ∈ whitelist, performs test call with 5s hard timeout, AES-GCM encrypts key with fresh 12-byte nonce. 'trial' and 'platform' transitions via PUT → 409 (only platform-admin can move orgs into these modes).
- DELETE /ai-config/api-key: Clears encrypted key + nonce. If mode='byok', flips mode to 'disabled'. Audited as AI_BYOK_KEY_REMOVED + AI_MODE_CHANGED.
- GET /ai-config/usage: Usage breakdown by feature/provider/model with cost estimates derived from ai_models cost columns at query time. Supports granularity=day|month and group_by=feature|provider|model.
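The cost estimate returned by the usage endpoint is plain per-1k-token arithmetic over the ai_models rate columns. A small sketch under that assumption; ModelRate and estimateCostUSD are hypothetical names, and the rates are the seeded claude-sonnet-4-6 values from the catalog.

```go
package main

import "fmt"

// ModelRate holds the per-1k-token prices from an ai_models row, in USD.
type ModelRate struct {
	InputPer1K  float64
	OutputPer1K float64
}

// estimateCostUSD prices input and output tokens separately and sums them.
func estimateCostUSD(r ModelRate, inputTokens, outputTokens int64) float64 {
	return float64(inputTokens)/1000*r.InputPer1K +
		float64(outputTokens)/1000*r.OutputPer1K
}

func main() {
	sonnet := ModelRate{InputPer1K: 0.003, OutputPer1K: 0.015}
	// 1,000 input tokens and 500 output tokens:
	// 1.0 * 0.003 + 0.5 * 0.015 = 0.0105 USD.
	fmt.Printf("%.4f\n", estimateCostUSD(sonnet, 1000, 500))
}
```

Because rates are read at query time, editing a rate in ai_models changes historical estimates as well; the estimate is a view, not a stored charge.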
Platform-Admin Endpoints
- GET/PUT /admin/ai-config/defaults: trial_enabled toggle, trial default provider/model, byok_allowed_providers array, last-resort platform caps.
- GET/POST/PATCH /admin/ai-config/plans: CRUD on ai_plans. PATCH supports activate/deactivate.
- GET/POST/PATCH/DELETE /admin/ai-config/models: CRUD on ai_models. Hard delete; orgs referencing the model see 502 model_deprecated on next call.
- GET /admin/ai-config/orgs: LEFT JOIN ai_configs so orgs without a config row appear as "Not yet active". Filterable by mode/plan; searchable by org name.
- PATCH /admin/ai-config/orgs/:id: trial→platform and disabled→platform MUST include plan_id + subscription_valid_until + provider + model; missing any → 409 subscription_required. Single AI_MODE_CHANGED audit row with before/after JSON.
- POST /admin/ai-config/orgs/:id/reset-trial: Only allowed when current mode = 'disabled'. Zeroes trial counters, flips mode to 'trial'. Audited as AI_TRIAL_RESET.
- GET /admin/ai-config/events: Paginated ai_access_events including all diagnostic columns.
Feature Key Registry
| feature key | fires from | debits counters |
|---|---|---|
| fieldforce:parse_task | POST /fieldforce/ai/parse-task | Yes (trial + platform) |
| byok:test_call | BYOK validation ping on PUT /ai-config | No; accounting visibility only, never debits trial counters |
| unknown | Legacy / pre-Phase-3 rows | N/A |
Audit Actions
New actions emitted by write endpoints via the existing AuditHelper decorator:
AI_CONFIG_UPDATED · AI_BYOK_KEY_SET · AI_BYOK_KEY_REMOVED · AI_TRIAL_RESET · AI_MODE_CHANGED · AI_PLAN_UPDATED · AI_MODEL_UPDATED · AI_TRIAL_DEFAULT_CHANGED · AI_SUBSCRIPTION_STATUS_CHANGED · AI_GLOBAL_KILL_SWITCH_TOGGLED
Access Gate Logic
Use Case Interface
// internal/modules/ai/application/usecase/access_gate.go
func (g *AccessGate) Authorize(ctx context.Context, orgID, userID, feature, requestID string) (*AuthzResult, *AuthzError)
func (g *AccessGate) RecordUsage(ctx context.Context, authz *AuthzResult, feature string,
inputTokens, outputTokens int, latencyMs int, providerRequestID string) error
func (g *AccessGate) RecordError(ctx context.Context, authz *AuthzResult, feature string,
errorCode, errorDetail string, httpStatus int, latencyMs int, providerRequestID string) error
AuthzResult.Credentials has unexported fields, no JSON tags, and a String() method returning "<redacted>" — accidental fmt.Sprintf("%+v", creds) cannot leak the key.
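The redaction behaviour relies on two standard fmt interfaces: fmt.Stringer (used by %v, %+v, %s) and fmt.GoStringer (used by %#v). A minimal sketch of such a type follows; the field names, the NewCredentials constructor, the APIKey accessor, and the render/asJSON helpers are all hypothetical and exist only to make the example runnable.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Credentials holds a decrypted provider key. The fields are unexported and
// carry no JSON tags, so encoding/json never sees them.
type Credentials struct {
	provider string
	apiKey   string
}

// NewCredentials is a hypothetical constructor.
func NewCredentials(provider, apiKey string) Credentials {
	return Credentials{provider: provider, apiKey: apiKey}
}

// String is picked up by %v, %+v and %s.
func (c Credentials) String() string { return "<redacted>" }

// GoString is picked up by %#v.
func (c Credentials) GoString() string { return "<redacted>" }

// APIKey is the one deliberate way to read the key (hypothetical name).
func (c Credentials) APIKey() string { return c.apiKey }

// render formats the value with the verbs most likely to appear in logs.
func render(c Credentials) string {
	return fmt.Sprintf("%v|%+v|%#v", c, c, c)
}

// asJSON shows what a stray json.Marshal would produce.
func asJSON(c Credentials) string {
	b, _ := json.Marshal(c)
	return string(b)
}

func main() {
	creds := NewCredentials("anthropic", "sk-ant-example")
	fmt.Println(render(creds))
	fmt.Println(asJSON(creds))
}
```

One caveat worth knowing: fmt only calls these methods on values it can reach through exported fields. If a Credentials value were stored in an unexported field of another struct, %+v on the outer struct would print the raw fields by reflection, so the holder field should stay exported or the outer type should define its own String method.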
RecordUsage
- Atomic trial decrement; sets trial_exhausted_at via CASE when either limit is reached.
- Atomic platform counter with lazy monthly reset CASE-update: reset and increment in one statement, race-free.
- Per-feature usage rollup to ai_usage via ON CONFLICT DO UPDATE.
- Idempotent diagnostic update on the access-event row keyed by request_id.
Key Isolation Rules
- Decrypted BYOK keys live only in process memory for the duration of one HTTP request: never written to disk, never logged, never placed in a struct that gets serialized.
- Credentials Go type: unexported fields, no JSON tags, String() and GoString() return "<redacted>".
- Echo's request-body logger has a redactor for fields named api_key.
- AES-GCM nonces are 12 bytes, generated fresh from crypto/rand per Encrypt, stored per-row alongside ciphertext; never reused.
- AI_KEY_ENCRYPTION_SECRET (32-byte base64-encoded) is required at boot. Missing or wrong length → l.Fatal, process exits. Hard fail, not silent degradation.
- error_detail strings pass through the regex sanitizer before storage on ai_access_events.
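The nonce and secret rules map directly onto Go's standard crypto/aes and crypto/cipher packages. The following is a sketch rather than the project's AESGCMCrypto: loadKey, encrypt, decrypt, and demoSecret are hypothetical names, demoSecret produces an all-zero key for demonstration only, and where the real service exits via l.Fatal this sketch simply returns an error.

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"encoding/base64"
	"errors"
	"fmt"
)

// loadKey decodes the base64 secret and enforces the 32-byte (AES-256) length.
func loadKey(b64 string) ([]byte, error) {
	key, err := base64.StdEncoding.DecodeString(b64)
	if err != nil {
		return nil, fmt.Errorf("decode secret: %w", err)
	}
	if len(key) != 32 {
		return nil, errors.New("secret must decode to exactly 32 bytes")
	}
	return key, nil
}

func newGCM(key []byte) (cipher.AEAD, error) {
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	return cipher.NewGCM(block)
}

// encrypt returns the ciphertext plus the fresh nonce to store beside it.
func encrypt(key, plaintext []byte) (ciphertext, nonce []byte, err error) {
	gcm, err := newGCM(key)
	if err != nil {
		return nil, nil, err
	}
	nonce = make([]byte, gcm.NonceSize()) // 12 bytes for standard GCM
	if _, err := rand.Read(nonce); err != nil {
		return nil, nil, err
	}
	return gcm.Seal(nil, nonce, plaintext, nil), nonce, nil
}

// decrypt fails if the key, nonce, or ciphertext does not match.
func decrypt(key, ciphertext, nonce []byte) ([]byte, error) {
	gcm, err := newGCM(key)
	if err != nil {
		return nil, err
	}
	if len(nonce) != gcm.NonceSize() {
		return nil, errors.New("unexpected nonce length")
	}
	return gcm.Open(nil, nonce, ciphertext, nil)
}

// demoSecret builds an all-zero 32-byte secret. Demonstration only.
func demoSecret() string {
	return base64.StdEncoding.EncodeToString(make([]byte, 32))
}

func main() {
	key, err := loadKey(demoSecret())
	if err != nil {
		panic(err)
	}
	ct, nonce, err := encrypt(key, []byte("sk-example"))
	if err != nil {
		panic(err)
	}
	pt, err := decrypt(key, ct, nonce)
	fmt.Println(len(nonce), string(pt), err)
}
```

The explicit nonce-length check in decrypt is there because the standard library's Open panics on a wrong-length nonce instead of returning an error, which would be an unfortunate way to discover a corrupted row.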
Failure-Mode Reference
| Scenario | HTTP | Code | UI behavior |
|---|---|---|---|
| Global AI kill-switch active | 403 | ai_globally_disabled | AI tab hidden; "AI features temporarily disabled platform-wide" banner |
| No config row + trial disabled | 403 | ai_disabled | AI tab hidden |
| mode = 'disabled' | 403 | ai_disabled | AI tab hidden |
| Trial tokens or calls exhausted | 402 | trial_exhausted | Upgrade dialog opens; text preserved |
| Platform subscription not active/expired | 402 | subscription_inactive | "Subscription lapsed — contact account manager" |
| Platform monthly cap exceeded | 402 | platform_cap_exceeded | "Monthly limit reached. Contact support or wait until {next_month}." |
| BYOK, no key stored | 422 | no_byok_key | "Configure BYOK in Settings" CTA |
| BYOK decrypt fail (data corruption) | 502 | invalid_byok_key | "Re-enter your key" + ops alert |
| BYOK test call rejected by provider | 422 | invalid_api_key | Inline modal error, key field cleared |
| Provider not in BYOK whitelist | 422 | provider_not_allowed | Inline modal error |
| Forbidden mode transition | 409 | invalid_mode_transition | Inline explanation with current_mode + attempted_mode |
| Promote without plan/subscription | 409 | subscription_required | Modal validation failure |
| Org model deprecated | 502 | model_deprecated | Banner: "Selected model removed — pick a new one in Settings" |
| Provider 4xx/5xx/timeout | 502 | ai_unavailable | "AI temporarily unavailable" |
| BYOK provider rejects key mid-call | 502 | byok_key_rejected | "Your API key was rejected — re-enter it" |
| Output invalid after retry | 502 | ai_output_invalid | "AI couldn't parse — try rephrasing" |
| Provider rate limited | 429 | rate_limited | "Too many requests — try again in {retry_after}" |
402, 409, 422, 502 distinctions per CONTEXT.md "HTTP status discipline". A 5xx provider failure does not debit any counter.
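One way to keep this table and the handlers in step is a single lookup from error code to HTTP status. The sketch below copies the codes and statuses from the table; statusByCode and statusFor are hypothetical names, and falling back to 500 for an unknown code is an assumption, not documented behaviour.

```go
package main

import (
	"fmt"
	"net/http"
)

// statusByCode mirrors the failure-mode table.
var statusByCode = map[string]int{
	"ai_globally_disabled":    http.StatusForbidden,           // 403
	"ai_disabled":             http.StatusForbidden,           // 403
	"trial_exhausted":         http.StatusPaymentRequired,     // 402
	"subscription_inactive":   http.StatusPaymentRequired,     // 402
	"platform_cap_exceeded":   http.StatusPaymentRequired,     // 402
	"no_byok_key":             http.StatusUnprocessableEntity, // 422
	"invalid_api_key":         http.StatusUnprocessableEntity, // 422
	"provider_not_allowed":    http.StatusUnprocessableEntity, // 422
	"invalid_mode_transition": http.StatusConflict,            // 409
	"subscription_required":   http.StatusConflict,            // 409
	"invalid_byok_key":        http.StatusBadGateway,          // 502
	"model_deprecated":        http.StatusBadGateway,          // 502
	"ai_unavailable":          http.StatusBadGateway,          // 502
	"byok_key_rejected":       http.StatusBadGateway,          // 502
	"ai_output_invalid":       http.StatusBadGateway,          // 502
	"rate_limited":            http.StatusTooManyRequests,     // 429
}

// statusFor returns the HTTP status for an error code.
// Unknown codes fall back to 500 (an assumption for this sketch).
func statusFor(code string) int {
	if status, ok := statusByCode[code]; ok {
		return status
	}
	return http.StatusInternalServerError
}

func main() {
	fmt.Println(statusFor("trial_exhausted"), statusFor("model_deprecated"))
}
```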
Frontend (SolidStart Panel)
Task Creation Form — AI Mode Tab
Route: /app/fieldforce/tasks/new (extended). Tab bar adds ✨ AI mode alongside the unchanged Manual tab.
- On Generate (success): Form switches to Manual tab with all fields pre-filled. Banner: "✨ AI filled this in. Review and edit before saving." + Redo with AI button (re-opens AI textarea; replacing edited fields prompts confirm).
- Trial counter: Fetched via useAIConfig() on form mount; refreshed after each Generate response.
- AI tab hidden when ai_disabled or ai_globally_disabled: Manual is the only path.
- 402 codes: Upgrade dialog opens; user's text is preserved.
- 502 ai_unavailable / ai_output_invalid: Inline error below textarea; auto-switch to Manual after 3 seconds.
- 502 model_deprecated: Banner above tab: "Your AI model was removed. Go to Settings → AI to pick a new one."
Org AI Settings Page
Route: /app/settings/ai (new). Access: Admin / Owner.
Trial mode (default)
- Label + "Model: Claude Sonnet 4.6 (platform default)"
- Progress bars for tokens + calls used
- Per-feature usage breakdown table
- Upgrade card → "Contact sales" mailto link
Platform mode (paid)
- Plan name + subscription valid-until
- Change model → picker filtered to platform_eligible
- Month-to-date tokens + calls progress bars
- Subscription banner if past_due
BYOK mode (bring your own key)
- "BYOK — Anthropic — claude-sonnet-4-6"
- Per-feature usage chart (no caps)
- Remove key · Change buttons
- BYOK modal: provider radio → model dropdown (filtered by provider, deprecated_at hidden) → API key input → Save and test
Disabled mode (no AI)
- "AI is disabled for your org. Contact your admin to re-enable."
- AI mode tab hidden everywhere
Advanced section (collapsed): Disable AI for the org. Confirm dialog explains effect. Calls PUT /ai-config { mode: 'disabled' }. No path from inside the task form to switch mode — must go through Settings.
Platform-Admin AI Config Page
Route: /admin/ai-config (new). Access: platform_admin. Five tabs:
- Tab 1 (Defaults): trial enabled toggle, trial default provider/model (filtered to platform_eligible), BYOK allowed providers checkboxes, last-resort platform token/call limits. → PUT /admin/ai-config/defaults
- Tab 2 (Plans): Table editor: add/edit/activate-deactivate ai_plans rows. Fields: code, display_name, tokens_limit, calls_limit, price_cents_per_month, is_active.
- Tab 3 (Models): Table editor: add/edit/soft-deprecate ai_models rows. UI prevents platform_eligible = true unless cost columns are populated.
- Tab 4 (Orgs): All orgs with mode/plan/subscription/usage/last_active_at. Row opens side panel: Promote to Platform (modal), Reset trial (only from disabled; requires typing org name), Disable AI, Adjust limits, Change plan, Update subscription state, last 20 access events for the org.
- Tab 5 (Kill-switch): Toggle backed by admin_feature_flags.key = 'ai'. Confirmation requires typing the word "DISABLE." Audited as AI_GLOBAL_KILL_SWITCH_TOGGLED.
Promote-to-Platform Modal
Used from the admin Orgs side panel. All four fields are required: Plan (from active ai_plans), Subscription valid until (date picker), Provider (distinct providers from platform_eligible models), Model (filtered by selected provider). Refuses with 409 subscription_required if any field is missing. Single AI_MODE_CHANGED audit row with before/after JSON on success.
Upgrade Dialog (shared)
Mounted once at root layout; visibility driven by a global Solid signal raised on any 402 response. Three CTAs:
- Upgrade plan → mailto: contact-sales URL (env-configurable).
- Use your own key → deep-link to /app/settings/ai?openByok=1 (auto-opens BYOK modal).
- Maybe later → close dialog.
Hooks and i18n
- Org hooks (features/ai/): useAIConfig() cached at root layout, invalidated on writes; useUpdateAIConfig(); useParseTask(), which raises the upgrade dialog signal on 402; useAIUsage()
- Platform-admin hooks (features/admin-ai/): useAIDefaults, useListAIPlans, useUpsertAIPlan, useListAIModels, useUpsertAIModel, useDeleteAIModel, useListAIOrgs, usePatchAIOrg, useResetTrial, usePromoteToPlatform, useListAIEvents, useAIKillSwitch, useToggleAIKillSwitch
- i18n: All UI strings through Paraglide (ai.* namespace), en/zh/ms translations in this change. AI-generated output follows caller's UI locale via prompt instruction.
- Acceptance bars: en ≥98%, zh ≥95%, ms ≥85% locale-of-output correctness
Deliberately out of UI scope
- No "switch to BYOK" path from inside the task form — must go through Settings.
- No per-feature toggle — mode is org-wide.
- No real-time trial counter via WebSocket — refreshes after each parse-task response.
- No usage-chart drill-down beyond the 30-day card.
- No inline subscription/payment flow — "Contact sales" until Stripe lands.
Prompts
Combined Output Schema
{
  "type": "object",
  "required": ["title", "description", "task_type", "priority", "checklist_items"],
  "properties": {
    "title": { "type": "string", "maxLength": 120 },
    "description": { "type": "string", "maxLength": 2000 },
    "task_type": { "enum": ["service", "sales", "logistics", "other"] },
    "priority": { "enum": ["low", "medium", "high", "urgent"] },
    "due_date_hint": { "type": "string", "format": "date-time" },
    "checklist_items": {
      "type": "array", "minItems": 0, "maxItems": 10,
      "items": { "type": "string", "maxLength": 120 }
    }
  },
  "additionalProperties": false
}
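To make the schema's constraints concrete, here is a hand-written validator using only the Go standard library. It is a sketch that approximates the schema and is not a replacement for a real JSON-schema validator; ParsedTask, oneOf, and this validateOutput body are illustrative. Two choices are assumptions: lengths are counted in Unicode code points (as JSON Schema's maxLength does), and due_date_hint is strictly checked as RFC 3339.

```go
package main

import (
	"bytes"
	"encoding/json"
	"errors"
	"fmt"
	"time"
	"unicode/utf8"
)

// ParsedTask uses pointers so a missing required field is detectable.
type ParsedTask struct {
	Title          *string  `json:"title"`
	Description    *string  `json:"description"`
	TaskType       *string  `json:"task_type"`
	Priority       *string  `json:"priority"`
	DueDateHint    *string  `json:"due_date_hint,omitempty"`
	ChecklistItems []string `json:"checklist_items"`
}

func oneOf(v string, allowed ...string) bool {
	for _, a := range allowed {
		if v == a {
			return true
		}
	}
	return false
}

func validateOutput(raw []byte) (*ParsedTask, error) {
	dec := json.NewDecoder(bytes.NewReader(raw))
	dec.DisallowUnknownFields() // additionalProperties: false
	var t ParsedTask
	if err := dec.Decode(&t); err != nil {
		return nil, fmt.Errorf("malformed or mistyped JSON: %w", err)
	}
	if dec.More() {
		return nil, errors.New("trailing content after JSON object")
	}
	if t.Title == nil || t.Description == nil || t.TaskType == nil ||
		t.Priority == nil || t.ChecklistItems == nil {
		return nil, errors.New("missing required field")
	}
	if utf8.RuneCountInString(*t.Title) > 120 {
		return nil, errors.New("title exceeds 120 characters")
	}
	if utf8.RuneCountInString(*t.Description) > 2000 {
		return nil, errors.New("description exceeds 2000 characters")
	}
	if !oneOf(*t.TaskType, "service", "sales", "logistics", "other") {
		return nil, fmt.Errorf("invalid task_type %q", *t.TaskType)
	}
	if !oneOf(*t.Priority, "low", "medium", "high", "urgent") {
		return nil, fmt.Errorf("invalid priority %q", *t.Priority)
	}
	if t.DueDateHint != nil {
		if _, err := time.Parse(time.RFC3339, *t.DueDateHint); err != nil {
			return nil, fmt.Errorf("due_date_hint is not RFC 3339: %w", err)
		}
	}
	if len(t.ChecklistItems) > 10 {
		return nil, errors.New("more than 10 checklist items")
	}
	for i, item := range t.ChecklistItems {
		if utf8.RuneCountInString(item) > 120 {
			return nil, fmt.Errorf("checklist_items[%d] exceeds 120 characters", i)
		}
	}
	return &t, nil
}

func main() {
	_, err := validateOutput([]byte(`{"title":"Fix AC","description":"Unit 4 is leaking","task_type":"repair","priority":"high","checklist_items":[]}`))
	fmt.Println(err)
}
```

The error strings double as input for a corrective retry prompt, which is why each one names the offending field.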
Locale Anchors
| Locale | LOCALE_ANCHOR | LOCALE_REITERATION |
|---|---|---|
| en | (empty) | "Reminder: respond in English." |
| zh | "Respond in Simplified Chinese (简体中文)." | "提醒:必须用简体中文回答。" |
| ms | "Respond in Bahasa Melayu (Malaysian Malay, not Indonesian). Example title: 'Hantar invoice ke pelanggan ABC sebelum Jumaat.'" | "Peringatan: jawapan MESTI dalam Bahasa Melayu." |
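A sketch of how the two columns might be spliced into a system prompt follows. The table does not say where each string goes, so the placement here (anchor before the base instructions, reiteration after them) is an assumption, as is falling back to English for an unrecognised locale. localePrompt, localePrompts, and buildSystemPrompt are hypothetical names.

```go
package main

import (
	"fmt"
	"strings"
)

// localePrompt pairs the two strings from the table for one locale.
type localePrompt struct {
	Anchor      string
	Reiteration string
}

var localePrompts = map[string]localePrompt{
	"en": {"", "Reminder: respond in English."},
	"zh": {"Respond in Simplified Chinese (简体中文).", "提醒:必须用简体中文回答。"},
	"ms": {
		"Respond in Bahasa Melayu (Malaysian Malay, not Indonesian). Example title: 'Hantar invoice ke pelanggan ABC sebelum Jumaat.'",
		"Peringatan: jawapan MESTI dalam Bahasa Melayu.",
	},
}

// buildSystemPrompt places the anchor first and the reiteration last,
// skipping the anchor when it is empty (as for English).
func buildSystemPrompt(base, locale string) string {
	lp, ok := localePrompts[locale]
	if !ok {
		lp = localePrompts["en"] // assumed fallback
	}
	parts := make([]string, 0, 3)
	if lp.Anchor != "" {
		parts = append(parts, lp.Anchor)
	}
	parts = append(parts, base, lp.Reiteration)
	return strings.Join(parts, "\n\n")
}

func main() {
	fmt.Println(buildSystemPrompt("Respond as JSON matching the provided schema.", "ms"))
}
```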
Enforcement Layer (prompt + post-hoc + retry-once)
1. Prompt with JSON instruction: System prompt includes "Respond as JSON matching the provided schema. No commentary. No markdown." OpenAI/Groq adapters also receive response_format: { type: "json_object" } via the new ResponseJSONOnly bool field on ChatRequest.
2. JSON decode and schema validation: Decode the model's Content field; validate against the JSON schema using github.com/santhosh-tekuri/jsonschema/v5 (or similar).
3. Retry once on failure: On JSON parse failure or schema-validation failure, retry with a corrective prompt that includes the validator error and the prior output.
4. 502 on second failure: Return ai_output_invalid. UI shows "AI couldn't parse — try rephrasing or fill in manually." and auto-switches to Manual tab after 3 seconds.
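The retry-once control flow can be sketched independently of any provider. In the example below the provider call and the validator are passed in as functions so the flow is testable without network access; parseWithRetry, callFn, validateFn, ErrOutputInvalid, and the wording of the corrective prompt are all hypothetical, and isJSONObject is a stand-in for the real schema validation.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// ErrOutputInvalid corresponds to the ai_output_invalid error code.
var ErrOutputInvalid = errors.New("ai_output_invalid")

type callFn func(prompt string) (string, error)
type validateFn func(output string) error

// parseWithRetry calls the model, validates, and retries exactly once with a
// corrective prompt that carries the validator error and the prior output.
func parseWithRetry(prompt string, call callFn, validate validateFn) (string, error) {
	out, err := call(prompt)
	if err != nil {
		return "", err // provider failure: not an output-validity problem
	}
	verr := validate(out)
	if verr == nil {
		return out, nil
	}
	corrective := fmt.Sprintf(
		"%s\n\nYour previous output was rejected: %v\nPrevious output:\n%s\nRespond again with valid JSON only.",
		prompt, verr, out)
	out, err = call(corrective)
	if err != nil {
		return "", err
	}
	if verr = validate(out); verr != nil {
		return "", fmt.Errorf("%w: %v", ErrOutputInvalid, verr)
	}
	return out, nil
}

// isJSONObject is a stand-in validator: it only checks that the output
// decodes as a JSON object.
func isJSONObject(output string) error {
	var v map[string]interface{}
	return json.Unmarshal([]byte(output), &v)
}

func main() {
	calls := 0
	flaky := func(prompt string) (string, error) {
		calls++
		if calls == 1 {
			return "Sure! Here you go:", nil
		}
		return `{"title":"Deliver invoice"}`, nil
	}
	out, err := parseWithRetry("parse this task", flaky, isJSONObject)
	fmt.Println(out, err, calls)
}
```

Provider errors are returned as-is so the caller can route them to RecordError and the 502 ai_unavailable path, keeping them distinct from a second validation failure.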
Testing and Evals
Backend Unit Tests (no DB)
- AccessGate.Authorize — one case per branch: no row + trial enabled, no row + trial disabled, mode=disabled, global kill-switch denied, trial at tokens limit, trial at calls limit, trial under limit, platform subscription expired, platform cap exceeded, platform model deprecated, BYOK no key, BYOK decrypt fails, BYOK happy path.
- AccessGate.RecordUsage — trial decrement atomic, platform lazy-reset CASE, platform increment in current period, BYOK no counter decrement, ai_usage ON CONFLICT rollup, idempotent on request_id.
- AESGCMCrypto — round-trip, wrong nonce/key/tamper fail, 1024 unique nonces asserted.
- FieldforceAITaskParser.validateOutput — accepts well-formed; rejects bad enums, oversized checklist, oversized item, missing required field, malformed JSON.
- FieldforceAITaskParser.run (provider mocked) — happy path; first call malformed → retry succeeds; both fail → typed error.
- error_detail sanitizer — strips sk-, sk-ant-, AIza patterns; multiple matches per string; truncates at 500 chars. (7 test cases)
Backend Integration Tests (real Postgres + mocked provider HTTP)
- parse-task happy path → 200, one ai_usage row with feature='fieldforce:parse_task', one ai_access_events row with decision='allowed' and populated diagnostic columns.
- Lazy provisioning: org with no ai_configs row makes first AI call → row appears with mode='trial', plan_id pointing at trial plan.
- Trial-exhausted → 402, no ai_usage row, event row with decision='denied_trial_exhausted'.
- Platform period rollover: first call of new month resets counters atomically; two concurrent calls both succeed; counter equals exact sum.
- Global kill-switch on → all gate calls return 403 ai_globally_disabled.
- BYOK happy path → mocked provider receives Authorization header with decrypted org key.
- BYOK with bad ciphertext → 502 invalid_byok_key + ops-alert log line.
- PUT /ai-config (BYOK) + provider accepts test → encrypted row stored → GET returns has_api_key: true, never the key.
- PUT /ai-config (BYOK) + provider rejects test → 422, no row mutated.
- PUT /ai-config (platform) from org Admin → 409 invalid_mode_transition.
- PATCH /admin/ai-config/orgs/:id without plan_id → 409 subscription_required.
- reset-trial when org is disabled → counters zeroed, mode='trial', audit row written.
- All forbidden mode transitions (byok→trial, platform→trial) → 409 with current_mode + attempted_mode in body.
- Provider error path → RecordError called, event row has error_code='provider_unavailable', no usage row, no counter debit.
- error_detail containing sk-ant-xxx stored as sk-ant-<redacted>.
- locale='ms' → mocked provider receives Malay anchor + reiteration in prompt.
- Migration test: CreateAIAccessTables() against snapshot with existing rows → ai_configs gains new columns with sane defaults; ai_usage backfilled with feature='unknown'; org_ai_configs gone; seed rows idempotent on re-run.
Frontend Tests (Vitest + @solidjs/testing-library)
- TaskCreationForm: tab switching; AI tab disabled on ai_disabled/ai_globally_disabled; trial counter %; Generate fills fields + switches to Manual; 402 opens upgrade dialog with text preserved; 502 model_deprecated shows banner; 502 ai_unavailable shows inline error + auto-switch after 3s.
- BYOKModal: provider radio drives model dropdown from byok_model_catalog; deprecated models hidden; Save disabled until all fields valid; 422 keeps modal open with key cleared.
- CurrentModeCard: renders correctly for trial/platform/byok/disabled; platform shows month-to-date progress bars; BYOK shows "Remove key" button.
- UpgradeDialog: three CTAs route correctly.
- PromoteToPlatformModal: all four fields required; provider drives model dropdown.
- PlatformAdminOrgsTable: reset-trial requires typing org name to confirm.
- KillSwitchTab: confirmation requires typing "DISABLE."
Browser Smokes (Playwright + e2e-auth-bypass)
- Admin configures BYOK with sandbox key → creates task via AI mode → task is created.
- Platform admin promotes disabled org to Platform → org user uses AI mode → task is created.
- Platform admin resets exhausted trial → org user can use AI again (no reload).
- Platform admin flips kill-switch → org user sees ai_globally_disabled and AI tab is hidden.
Prompt Evals
~30 cases in backend/go/internal/modules/fieldforce/evals/parse_task/cases.jsonl. Coverage: en/zh/ms × {service,sales,logistics,other} × {priority levels} × {with/without due hint} × {ambiguous vs clear}. Run via make eval-parse-task. Not wired to CI this round. Cost: ~$0.20/run on claude-sonnet-4-6.
| Locale | Schema validity | task_type | priority | Locale-of-output |
|---|---|---|---|---|
| en | ≥98% | ≥85% | ≥75% | ≥98% |
| zh | ≥95% | ≥80% | ≥70% | ≥95% |
| ms | ≥90% | ≥75% | ≥65% | ≥85% |
If ms locale-of-output falls below 85% after prompt iteration, a follow-up change adds post-generation locale detection + corrective retry.
Security Review Checkpoints
- ✓ No code path returns api_key in any response body; only has_api_key: boolean.
- ✓ No code path logs the decrypted key. Credentials.String() and GoString() return <redacted>.
- ✓ AES-GCM nonce per-row, freshly generated from crypto/rand, never reused. TestAESGCM_FreshNonces_AreUnique asserts 1024 unique nonces.
- ✓ Env secret length asserted at boot; missing or wrong length → fatal exit.
- ✓ BYOK test call has hard 5-second timeout (context.WithTimeout).
- ✓ BYOK test call rejected when global AI kill-switch is active.
- ✓ error_detail sanitizer strips key-shaped substrings (7 unit tests, covering Anthropic/OpenAI/Google patterns).
- ✓ Model-deprecation returns 502, not a silent model substitution.
- ⏳ Cross-org leak test (org A's key never visible to org B): DEFERRED; covered by integration tests before merge.
Observability
- ai.gate.authorize: org_id, feature, mode, decision, latency_ms
- ai.gate.record_usage: tokens, feature, mode
- ai.gate.record_error: error_code, feature, provider_request_id
- ai.byok.decrypt_failed: WARN + ops-alert pipeline
- ai.crypto.boot_check_failed: FATAL at startup; missing or wrong-length env secret
- ai.killswitch.toggled: emitted when admin flips the global flag
No new dashboards this round. A Grafana board fed by ai_access_events (denial rate, p95 latency, error_code distribution) is a Phase 3.5 nice-to-have.
Implementation Order
1. Migration + crypto + entities: CreateAIAccessTables(), AESGCMCrypto (boot-time secret check), lazy-provision logic, entity structs.
2. Plans + Models + Defaults seed: ai_plans, ai_models, ai_config_defaults seeded with the values in §3.4–3.6.
3. AccessGate use case: Authorize + RecordUsage + RecordError with full unit tests and handler integration pattern.
4. Org-scoped AI config endpoints: GET/PUT /ai-config, DELETE /api-key, GET /usage. Includes BYOK test-call flow and mode-transition state machine.
5. Platform-admin AI config endpoints: Defaults, plans, models, orgs, events, kill-switch tab (5 endpoint groups).
6. Fieldforce AI task parser: Single combined-schema call + JSON-mode flag + retry-once + validateOutput.
7. Eval harness + initial cases: cmd/eval-parse-task/main.go + cases.jsonl (~30 cases). make eval-parse-task working locally.
8. SolidStart hooks + i18n: All TanStack Query hooks; Paraglide ai.* namespace with en/zh/ms translations.
9. Panel Settings → AI page: All four mode cards, BYOK modal, upgrade card, advanced disable section.
10. Panel task creation AI mode + upgrade dialog: AI mode tab, Generate flow, pre-fill, Redo, all error states, shared UpgradeDialog mounted at root.
11. Panel admin AI config pages: All 5 tabs: Defaults, Plans, Models, Orgs (with side panel), Kill-switch.
12. Browser smokes + security review: Playwright flows, security checkpoint sign-offs, cross-org leak integration test.
Open Questions (Post-Round-1)
- Real payment processor: Stripe Checkout vs alternatives; webhook handlers that update subscription_status and subscription_valid_until.
- Native structured-output across all five providers (Anthropic tools, OpenAI json_schema response_format, Google functionDeclarations, Groq tools, Ollama format).
- Per-feature trial caps — today caps are org-wide; finer-grained capping needs schema + AccessGate changes.
- Automated CI evals — budget tolerance; on-label-only vs on-main-only triggers.
- Multi-key BYOK — fallback OpenAI when Anthropic is down.
- Streaming responses for parse-task — UX gain vs implementation complexity across adapters.
- Anniversary-cycle billing periods when Stripe lands (today: calendar month reset).
- ai_access_events retention cron — scheduled job to purge rows older than 90 days.
- Grafana board for denial rate, p95 latency, error_code distribution fed by ai_access_events.
- Per-model cost rate card editor in admin UI — column already on ai_models; small follow-up to expose editing.