Architecture overview
flowchart LR
subgraph Customer["Customer Entra tenant"]
EntraCA["Conditional Access policies"]
EntraGroups["Groups (BG, exclusion, service)"]
SigninLogs["Sign-in logs"]
RiskAPI["Identity Protection (P2)"]
end
subgraph MSP["MSP team"]
Admin["MSP admin (browser)"]
end
subgraph Policytab["Policytab"]
Next["Next.js app on Vercel"]
Edge["Supabase Edge Functions (Deno)"]
DB["Supabase Postgres + RLS + Vault"]
Cron["pg_cron"]
end
Admin -- email/password sign-in --> Next
Next -- service-role queries --> DB
Next -- invoke --> Edge
Edge -- app-only token --> EntraCA
Edge -- app-only token --> EntraGroups
Edge -- app-only token --> SigninLogs
Edge -- app-only token --> RiskAPI
EntraCA -- "snapshot hash drift (resync / nightly)" --> Edge
Cron -- "pg_net POST" --> Edge
DB -- "alert insert trigger (pg_net)" --> Edge
Layers
Next.js (Vercel)
- App Router, server components by default
- Auth via Supabase email/password + Custom Access Token Hook (add_msp_id_claim)
- Talks to Postgres directly (user-scoped supabase client for RLS-enforced reads, service-role for writes that need to set msp_id explicitly)
- Invokes Edge Functions for every Graph call (never calls Graph from Next directly)
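As an illustration of the claim that add_msp_id_claim adds, the sketch below shows how server-side code might read msp_id from an access token. The helper names (`decodeBase64Url`, `readMspId`) are hypothetical, and the claim is assumed to sit at the top level of the JWT payload. The helper only decodes the payload. It does not verify the signature, which is assumed to be handled by Supabase Auth before this point.

```typescript
// Hypothetical helper: read the msp_id custom claim from an access token.
// Assumption: the claim is a top-level string named "msp_id".
interface AccessTokenClaims {
  sub?: string;
  msp_id?: string;
}

// JWT segments use the URL-safe base64 alphabet without padding.
function decodeBase64Url(segment: string): string {
  const base64 = segment.replace(/-/g, "+").replace(/_/g, "/");
  const padded = base64 + "=".repeat((4 - (base64.length % 4)) % 4);
  return atob(padded);
}

// Returns the msp_id claim, or null if the token is malformed or lacks it.
// This does NOT verify the signature; treat the result as untrusted unless
// the token has already been validated.
function readMspId(accessToken: string): string | null {
  const parts = accessToken.split(".");
  if (parts.length !== 3) return null;
  try {
    const claims = JSON.parse(decodeBase64Url(parts[1])) as AccessTokenClaims;
    return typeof claims.msp_id === "string" ? claims.msp_id : null;
  } catch {
    return null;
  }
}
```

Returning null instead of throwing lets a caller treat "no claim" and "bad token" the same way: deny access.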
Edge Functions (Deno)
- Run on Supabase's Deno runtime, deployed via supabase functions deploy
- Every function validates the caller's msp_id claim against the target tenant's msp_id before any Graph call (defense-in-depth on top of RLS)
- Use Vault-stored per-tenant credentials (with env-var fallback for local dev) to acquire app-only tokens via client_credentials
- Standard hardening on every Graph call: Prefer: include-unknown-enum-members, 429/5xx retry with Retry-After honoured
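The conventions above can be sketched as follows. This is a minimal illustration, not the project's code: every type and function name (`Caller`, `Tenant`, `GraphCredential`, `assertCallerOwnsTenant`, `buildTokenRequest`, `retryDelayMs`, `graphFetch`) is hypothetical, and the exponential backoff used when no Retry-After header is present is an assumption. The token endpoint and the `.default` scope are the standard Microsoft identity platform values for the client credentials flow.

```typescript
interface Caller { mspId: string | null }
interface Tenant { id: string; mspId: string; entraTenantId: string }
interface GraphCredential { clientId: string; clientSecret: string }

// Defense-in-depth: refuse to touch Graph unless the caller's msp_id claim
// matches the MSP that owns the target tenant.
function assertCallerOwnsTenant(caller: Caller, tenant: Tenant): void {
  if (!caller.mspId || caller.mspId !== tenant.mspId) {
    throw new Error("forbidden: tenant does not belong to the caller's MSP");
  }
}

// Build the client_credentials request for an app-only Graph token.
function buildTokenRequest(tenant: Tenant, cred: GraphCredential) {
  const url =
    "https://login.microsoftonline.com/" +
    encodeURIComponent(tenant.entraTenantId) +
    "/oauth2/v2.0/token";
  const body = new URLSearchParams({
    grant_type: "client_credentials",
    client_id: cred.clientId,
    client_secret: cred.clientSecret,
    scope: "https://graph.microsoft.com/.default",
  });
  return { url, body };
}

// Decide how long to wait before retrying, or null if the status is not
// retryable. Retry-After may be delta-seconds or an HTTP date.
function retryDelayMs(
  status: number,
  retryAfter: string | null,
  attempt: number,
  now: number = Date.now(),
): number | null {
  const retryable = status === 429 || (status >= 500 && status <= 599);
  if (!retryable) return null;
  if (retryAfter !== null && retryAfter.trim() !== "") {
    const seconds = Number(retryAfter);
    if (Number.isFinite(seconds) && seconds >= 0) return seconds * 1000;
    const date = Date.parse(retryAfter);
    if (!Number.isNaN(date)) return Math.max(0, date - now);
  }
  // Assumed fallback: exponential backoff (1s, 2s, 4s, ...) capped at 30s.
  return Math.min(30000, 1000 * Math.pow(2, attempt));
}

// GET a Graph URL with the Prefer header and retry handling applied.
async function graphFetch(url: string, token: string, maxAttempts = 4): Promise<Response> {
  for (let attempt = 0; ; attempt++) {
    const res = await fetch(url, {
      headers: {
        Authorization: `Bearer ${token}`,
        Prefer: "include-unknown-enum-members",
      },
    });
    const delay = retryDelayMs(res.status, res.headers.get("Retry-After"), attempt);
    if (delay === null || attempt + 1 >= maxAttempts) return res;
    await res.body?.cancel(); // release the discarded response before retrying
    await new Promise((resolve) => setTimeout(resolve, delay));
  }
}
```

Keeping `retryDelayMs` pure makes the retry policy testable without any network access; `graphFetch` is a thin loop around it.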
Postgres (Supabase)
- Shared schema (public) for MSP, msp_user, tenant, audit_log, alert, ratelimit_bucket, graph_credential, notification_channel, msp_change_request
- Per-tenant schema (tenant_<uuid>) provisioned on first consent. Holds: policy_snapshot, policy_intent, group_state, exclusion_request, mfa_state, change_request, signin_summary_cache, risk_user_state, risk_detection_state
- RLS scopes everything by the msp_id JWT claim set by the Custom Access Token Hook
- Vault stores credential blobs (Graph app-only secrets, notification channel secrets) - only secret refs in app tables
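Because per-tenant schema names are built from a tenant id, code that addresses them has to construct SQL identifiers dynamically. The sketch below shows one cautious way to do that. The helper names (`tenantSchemaName`, `quoteIdent`, `qualifiedTable`) are hypothetical, and it assumes the schema name keeps the UUID's hyphens, which is why the identifier is double-quoted. If the real schemas substitute underscores, only `tenantSchemaName` would change.

```typescript
const UUID_RE =
  /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;

// Derive the per-tenant schema name, rejecting anything that is not a UUID
// so that user input can never smuggle SQL into an identifier.
function tenantSchemaName(tenantId: string): string {
  if (!UUID_RE.test(tenantId)) {
    throw new Error(`not a UUID: ${tenantId}`);
  }
  return `tenant_${tenantId.toLowerCase()}`;
}

// Quote a Postgres identifier: wrap in double quotes, double any embedded ones.
function quoteIdent(name: string): string {
  return `"${name.replace(/"/g, '""')}"`;
}

// Fully qualified, safely quoted table reference inside a tenant schema.
function qualifiedTable(tenantId: string, table: string): string {
  return `${quoteIdent(tenantSchemaName(tenantId))}.${quoteIdent(table)}`;
}
```

For example, `qualifiedTable(id, "policy_snapshot")` yields a reference that can be interpolated into a query string, while values still go through bound parameters.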
pg_cron
Canonical schedule inventory: pg_cron jobs. Baseline SQL reference: supabase/baseline/12_crons.sql.
Do not duplicate the full job table here.
kick_graph_subscription_renewals was unscheduled - CA change webhooks are unsupported by Microsoft Graph.
kick_bg_signin_check is defined for backwards compatibility but is not scheduled - kick_breakglass_checks superseded it (skips paused tenants).
All cron jobs that talk to Edge Functions do so via pg_net.http_post - fire-and-forget; the function does the actual work asynchronously. Vault-backed kicks read edge_functions_base_url and edge_functions_service_key from Supabase Vault.
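On the receiving side, a fire-and-forget kick could be handled roughly as below. This is a sketch under stated assumptions: `handleKick` and its parameters are hypothetical, the kick is assumed to carry the service key as a Bearer token, and `runInBackground` stands in for whatever mechanism the runtime offers for letting work outlive the response.

```typescript
type KickResult = { status: 202 | 401 };

// Acknowledge a cron kick immediately and let the real work continue
// asynchronously, so the pg_net request does not wait for it.
function handleKick(
  authorization: string | null,
  serviceKey: string,
  work: () => Promise<void>,
  runInBackground: (task: Promise<void>) => void,
): KickResult {
  // Assumption: the caller sends "Authorization: Bearer <service key>".
  if (authorization !== `Bearer ${serviceKey}`) {
    return { status: 401 };
  }
  // Errors are logged rather than surfaced: the caller has already moved on.
  runInBackground(
    work().catch((err) => {
      console.error("kick failed", err);
    }),
  );
  return { status: 202 };
}
```

Returning 202 before the work finishes matches the fire-and-forget contract: the cron job learns only that the kick was accepted, not whether the job succeeded.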
Documentation layers
Machine index: manifest.yaml. Compiled surface and gates: SURFACE.md, GATES.md. Schema map: DATA_MODEL.md. Comparison baseline: comparison-baseline-model.md.
Data isolation
Three layers of isolation between MSPs:
- JWT claim - add_msp_id_claim hook bakes msp_id into every token
- RLS policies - every shared-schema table has tenant_id in (select id from tenant where msp_id = current_msp_id()) or equivalent
- Schema separation - per-tenant data lives in its own schema; cross-tenant queries in app code go through service-role and explicit checks
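To make the RLS predicate concrete, here is an in-memory model of what it admits. It is not the SQL policy itself, only a sketch of its semantics; the `TenantRow`, `ScopedRow`, and `visibleRows` names are hypothetical, and treating a missing claim as "no rows visible" is an assumption about how current_msp_id() behaves.

```typescript
interface TenantRow { id: string; msp_id: string }
interface ScopedRow { tenant_id: string }

// Models: tenant_id in (select id from tenant where msp_id = current_msp_id())
function visibleRows<T extends ScopedRow>(
  rows: T[],
  tenants: TenantRow[],
  currentMspId: string | null,
): T[] {
  // Assumption: without an msp_id claim, nothing is visible.
  if (currentMspId === null) return [];
  const allowed = new Set(
    tenants.filter((t) => t.msp_id === currentMspId).map((t) => t.id),
  );
  return rows.filter((r) => allowed.has(r.tenant_id));
}
```

The same function is a convenient oracle for tests that assert a user-scoped query never returns another MSP's rows.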
What's NOT in the architecture
- No raw sign-in log storage - Impact analysis aggregates in memory and stores only summaries
- No webhook-driven snapshots - Snapshots are taken on admin-triggered resync and by the nightly snapshot cron. Portal-edit drift is detected by comparing snapshot hashes (not Graph change notifications).
- No customer-facing UI - Policytab is an MSP / internal-IT console. The customers themselves never sign in to it.
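The hash-based drift detection mentioned above can be sketched as follows. The function names (`canonicalize`, `fnv1a`, `snapshotHash`, `hasDrifted`) are hypothetical. The document does not say which hash is used; FNV-1a appears here only to keep the example dependency-free, and a cryptographic digest such as SHA-256 would be the more likely choice in practice. Array order is treated as significant, which is also an assumption.

```typescript
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

// Serialise with object keys sorted, so that key order in the Graph
// response does not affect the hash.
function canonicalize(value: Json): string {
  if (value === null || typeof value !== "object") return JSON.stringify(value);
  if (Array.isArray(value)) {
    return "[" + value.map(canonicalize).join(",") + "]";
  }
  const keys = Object.keys(value).sort();
  return (
    "{" +
    keys.map((k) => JSON.stringify(k) + ":" + canonicalize(value[k])).join(",") +
    "}"
  );
}

// 32-bit FNV-1a over UTF-16 code units. Stand-in for a real digest.
function fnv1a(text: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash.toString(16).padStart(8, "0");
}

function snapshotHash(policy: Json): string {
  return fnv1a(canonicalize(policy));
}

// Drift = the live policy no longer hashes to the stored snapshot hash.
function hasDrifted(storedHash: string, livePolicy: Json): boolean {
  return snapshotHash(livePolicy) !== storedHash;
}
```

Canonicalising before hashing is the important part: without it, two fetches of an unchanged policy could hash differently simply because the API returned its keys in a different order.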