Architecture overview

flowchart LR
  subgraph Customer["Customer Entra tenant"]
    EntraCA["Conditional Access policies"]
    EntraGroups["Groups (BG, exclusion, service)"]
    SigninLogs["Sign-in logs"]
    RiskAPI["Identity Protection (P2)"]
  end

  subgraph MSP["MSP team"]
    Admin["MSP admin (browser)"]
  end

  subgraph Policytab["Policytab"]
    Next["Next.js app on Vercel"]
    Edge["Supabase Edge Functions (Deno)"]
    DB["Supabase Postgres + RLS + Vault"]
    Cron["pg_cron"]
  end

  Admin -- email/password sign-in --> Next
  Next -- "user-scoped reads / service-role writes" --> DB
  Next -- invoke --> Edge
  Edge -- app-only token --> EntraCA
  Edge -- app-only token --> EntraGroups
  Edge -- app-only token --> SigninLogs
  Edge -- app-only token --> RiskAPI
  EntraCA -- "snapshot hash drift (resync / nightly)" --> Edge
  Cron -- "pg_net POST" --> Edge
  DB -- "alert insert trigger (pg_net)" --> Edge

Layers

Next.js (Vercel)

App Router, server components by default
Auth via Supabase email/password + Custom Access Token Hook (add_msp_id_claim)
Talks to Postgres directly: a user-scoped Supabase client for RLS-enforced reads, and the service-role client for writes that need to set msp_id explicitly
Invokes Edge Functions for every Graph call (never calls Graph from Next directly)

Edge Functions (Deno)

Run on Supabase's Deno runtime, deployed via supabase functions deploy
Every function validates the caller's msp_id claim against the target tenant's msp_id before any Graph call (defense-in-depth on top of RLS)
Use Vault-stored per-tenant credentials (with env-var fallback for local dev) to acquire app-only tokens via client_credentials
Standard hardening on every Graph call: Prefer: include-unknown-enum-members, 429/5xx retry with Retry-After honoured
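
The hardening in the last item can be sketched roughly as follows. This is an illustration under stated assumptions, not Policytab's actual code: the names `graphFetch`, `retryDelayMs` and `Fetcher`, the four-attempt limit, and the exponential fallback used when no usable Retry-After header is present are all hypothetical.

```typescript
// Minimal response shape so the sketch does not depend on a specific runtime's fetch types.
interface HttpResponse {
  status: number;
  headers: { get(name: string): string | null };
}

// Hypothetical injected fetch-like function (in an Edge Function this could wrap the global fetch).
type Fetcher = (
  url: string,
  init: { headers: Record<string, string> },
) => Promise<HttpResponse>;

// Returns the wait in milliseconds before the next attempt,
// or null when the response should not be retried.
function retryDelayMs(
  status: number,
  retryAfter: string | null,
  attempt: number,
): number | null {
  const retryable = status === 429 || (status >= 500 && status <= 599);
  if (!retryable) return null;

  // Honour Retry-After when it is a non-negative number of seconds.
  if (retryAfter !== null && retryAfter.trim() !== "") {
    const seconds = Number(retryAfter);
    if (Number.isFinite(seconds) && seconds >= 0) return seconds * 1000;
  }

  // Assumed fallback: exponential backoff from 1 s, capped at 30 s.
  return Math.min(1000 * 2 ** attempt, 30000);
}

async function graphFetch(
  fetcher: Fetcher,
  url: string,
  token: string,
  maxAttempts = 4, // assumption, not taken from the source
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => {
      setTimeout(resolve, ms);
    }),
): Promise<HttpResponse> {
  for (let attempt = 0; ; attempt++) {
    const res = await fetcher(url, {
      headers: {
        Authorization: `Bearer ${token}`,
        // Ask Graph to return enum members added after the client was written.
        Prefer: "include-unknown-enum-members",
      },
    });
    const delay = retryDelayMs(res.status, res.headers.get("Retry-After"), attempt);
    // Stop on success, on a non-retryable status, or when attempts are exhausted.
    if (delay === null || attempt + 1 >= maxAttempts) return res;
    await sleep(delay);
  }
}
```

Keeping the delay calculation in a pure function makes the retry policy easy to unit-test without any network access. A Retry-After value in HTTP-date form is not parsed here and falls through to the backoff.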

Postgres (Supabase)

Shared schema (public) for MSP, msp_user, tenant, audit_log, alert, ratelimit_bucket, graph_credential, notification_channel, msp_change_request
Per-tenant schema (tenant_<uuid>) provisioned on first consent. Holds: policy_snapshot, policy_intent, group_state, exclusion_request, mfa_state, change_request, signin_summary_cache, risk_user_state, risk_detection_state
RLS scopes everything by the msp_id JWT claim set by the Custom Access Token Hook
Vault stores credential blobs (Graph app-only secrets, notification channel secrets); app tables hold only secret references

pg_cron

Canonical schedule inventory: pg_cron jobs. Baseline SQL reference: supabase/baseline/12_crons.sql.

Do not duplicate the full job table here.

kick_graph_subscription_renewals has been unscheduled because Microsoft Graph does not support change webhooks for Conditional Access policies.

kick_bg_signin_check is still defined for backwards compatibility but is not scheduled; kick_breakglass_checks, which skips paused tenants, superseded it.

All cron jobs that talk to Edge Functions do so via pg_net.http_post. The call is fire-and-forget: the cron job does not wait for a result, and the Edge Function does the actual work asynchronously. Vault-backed kicks read edge_functions_base_url and edge_functions_service_key from Supabase Vault.
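
On the receiving side, a fire-and-forget kick could be handled along these lines. This is a sketch only: `handleKick`, the request and response shapes, the `Bearer` header check against the service key, and the 202 response are assumptions for illustration, not details stated in this document.

```typescript
// Hypothetical shapes: only the parts of the HTTP exchange this sketch needs.
interface KickRequest {
  authorization: string | null; // value of the Authorization header
  job: string; // e.g. "kick_breakglass_checks"
}

interface KickResponse {
  status: number;
  body: string;
}

// Hands a task to the runtime so it keeps running after the response is sent.
type BackgroundScheduler = (task: () => Promise<void>) => void;

function handleKick(
  req: KickRequest,
  expectedServiceKey: string,
  runInBackground: BackgroundScheduler,
  work: (job: string) => Promise<void>,
): KickResponse {
  // Assumption: the cron job sends the service key as a bearer token.
  if (req.authorization !== `Bearer ${expectedServiceKey}`) {
    return { status: 401, body: "unauthorized" };
  }

  // Queue the real work and acknowledge immediately; the caller never waits for it.
  runInBackground(() => work(req.job));
  return { status: 202, body: `accepted: ${req.job}` };
}
```

In a deployed function the scheduler might wrap a background-task facility such as `EdgeRuntime.waitUntil`; that wiring is likewise an assumption.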

Documentation layers

Machine index: manifest.yaml. Compiled surface and gates: SURFACE.md, GATES.md. Schema map: DATA_MODEL.md. Comparison baseline: comparison-baseline-model.md.

Data isolation

Three layers of isolation between MSPs:

JWT claim - add_msp_id_claim hook bakes msp_id into every token
RLS policies - every shared-schema table has a policy of the form tenant_id in (select id from tenant where msp_id = current_msp_id()), or an equivalent
Schema separation - per-tenant data lives in its own schema; cross-tenant queries in app code go through service-role and explicit checks
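
Because service-role queries bypass RLS, the explicit check has to compare the caller's msp_id claim with the target tenant's msp_id in application code. A minimal sketch, assuming hypothetical `JwtClaims` and `TenantRow` shapes and helper names that are not taken from the codebase:

```typescript
// Hypothetical shapes: the decoded JWT payload and a row from the shared tenant table.
interface JwtClaims {
  sub: string;
  msp_id?: string; // set by the add_msp_id_claim hook
}

interface TenantRow {
  id: string;
  msp_id: string;
}

// True only when the token carries a non-empty msp_id that matches the tenant's owner.
function callerOwnsTenant(claims: JwtClaims, tenant: TenantRow): boolean {
  return (
    typeof claims.msp_id === "string" &&
    claims.msp_id.length > 0 &&
    claims.msp_id === tenant.msp_id
  );
}

// Guard to call before any service-role query or Graph call on behalf of a tenant.
function assertCallerOwnsTenant(claims: JwtClaims, tenant: TenantRow): void {
  if (!callerOwnsTenant(claims, tenant)) {
    throw new Error(`msp_id mismatch for tenant ${tenant.id}`);
  }
}
```

A missing or empty claim is treated as a mismatch, so a token issued without the hook fails closed.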

What's NOT in the architecture

No raw sign-in log storage - Impact analysis aggregates in memory and stores only summaries
No webhook-driven snapshots - Snapshots come from admin-triggered resync plus a nightly snapshot cron. Portal-edit drift is detected by comparing snapshot hashes (not Graph change notifications).
No customer-facing UI - Policytab is an MSP / internal-IT console. Customers never sign in to it.
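
The snapshot-hash comparison mentioned in the list above could work roughly like this. The sketch assumes a SHA-256 hash over a key-sorted JSON serialization; the actual hashing scheme is not described in this document, and `canonicalize`, `snapshotHash` and `hasDrifted` are hypothetical names.

```typescript
import { createHash } from "node:crypto";

// Serialize with object keys sorted so that key order alone never changes the hash.
// Array order is preserved, i.e. treated as significant (an assumption).
function canonicalize(value: unknown): string {
  if (Array.isArray(value)) {
    return `[${value.map(canonicalize).join(",")}]`;
  }
  if (value !== null && typeof value === "object") {
    const obj = value as Record<string, unknown>;
    const entries = Object.keys(obj)
      .sort()
      .map((key) => `${JSON.stringify(key)}:${canonicalize(obj[key])}`);
    return `{${entries.join(",")}}`;
  }
  return JSON.stringify(value);
}

// Hex-encoded SHA-256 of the canonical form of a policy document.
function snapshotHash(policy: unknown): string {
  return createHash("sha256").update(canonicalize(policy)).digest("hex");
}

// Drift means the live policy no longer hashes to the value stored with the last snapshot.
function hasDrifted(storedHash: string, livePolicy: unknown): boolean {
  return snapshotHash(livePolicy) !== storedHash;
}
```

In practice, volatile fields such as timestamps would probably need to be stripped before hashing so that they do not register as drift; which fields to exclude is left open here.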