escherize/enhance-usage-analytics-for-embedding.md

Enhance Usage Analytics for Embedding

Branch: enhance-usage-analytics-for-embedding Date: 2026-03-24 Stakeholders: Roman (spec owner), Vamsi (PM), John Swanson (tech design), Ben Grabow (implementation), Alessio Laiso (hackathon demo/mockup), Eliot (optional) Tech doc: Linear doc

What and Why

Metabase's embedding customers lack visibility into how their embedded content is being used. The view_log and query_execution tables already capture some embedding context (embedding_client, embedding_version since v51), but don't track auth method, tenant, GDPR-sensitive request metadata, or security enforcement flags.

This project extends both tables with richer context columns, creates/updates analytics views to expose them, and adds a GDPR toggle setting. The goal is to power an "Embedded Analytics" dashboard (demoed at hackathon by Alessio) showing embedding views, usage trends, tenant activity, and content popularity.

Project Context (from kickoff meeting 2026-03-25)

Collaboration model: This is a pairing project between Bryan and Ben Grabow. The embedding team is frontend-only, so backend help is needed (similar to past SDK admin settings work). Vamsi framed this as a full SDLC exercise - not just writing code, but the entire lifecycle: how to start, track, announce, provide updates in the project channel, ask people to test, etc.

Ben's personal goal: Wants to use this project to dive deep and understand things in detail rather than surface-level. Vamsi acknowledged this and noted the project is straightforward enough to be a good container for that.

Scope boundary: Our deliverable is the server-side work - migrations, populating the new columns, updating analytics views. Once our PR merges to master, a separate PR (done by the embedding/frontend team) will enhance the UI dashboards to consume this new data. We don't need to build the dashboards.

Timeline: Vamsi's rough expectation is 3-5 days. Reality is reality - communicate if it takes longer.

Testing: Should test with actual embedding scenarios (SDK, iframe, public links) and verify context is being captured into the database. Loom video updates encouraged in the project Slack channel.

Usage analytics system: The analytics views system was recently overhauled (by Bronza/someone). There's a mesh command that creates a local database and pulls YAML from resources, making it easy to change views and serialize back. Vamsi will share the relevant doc in the project Slack channel.

Data flow: Need to get HTTP header data (embedding client, user agent, IP, etc.) from request context and propagate it to where view_log entries are created. Bryan noted there's already infrastructure for this (metabase.analytics.sdk middleware, (metabase.request.current/current-request)).

Key decisions from meeting with Roman (2026-03-26)

Surface derivation priority: Endpoint-based surface wins over X-Metabase-Client header. If no endpoint match, check header. If no header, default to "internal". (Roman confirmed explicitly.)
Embedding client expansion: Currently only 3 values exist (embedding-sdk-react, embedding-simple, embedding-iframe). Embedding team will add embedding-iframe-full-app, embedding-iframe-static, embedding-iframe-public, and embedding-mcp as new client-side header values. Backend should handle unknown values as pass-through.
Auth method: Roman described the need to track how sessions were generated (JWT, SAML, etc.). Confirmed that auth_identity.provider (which already exists) provides this mapping. May be cut from first PR if complex; NULL auth_method is acceptable as a starting point.
is_download: Already derivable from query_execution.context (ends-with "download"). No new column needed.
Active tenants/users: Roman prefers views with actual rows (not just counts) so users can build their own count queries.
Tenant audit event: Roman demoed that tenant creation has empty details in audit log. Needs fix (Ticket 8).
Scope: Our deliverable is server-side only (migrations, data population, views). Embedding team does the dashboards/UX separately.

Architecture Overview

What exists today

view_log: Records content views (dashboards, questions, collections). Has embedding_client and embedding_version but not exposed in the v_view_log analytics view. Has has_access and context columns also unexposed.
query_execution: Records query runs. Has embedding_client, embedding_version, is_sandboxed, and context (which already distinguishes download types). Also not fully exposed in v_query_log.
SDK middleware (metabase.analytics.sdk): Binds *client* and *version* from X-Metabase-Client / X-Metabase-Client-Version headers. Currently only covers SDK/iframe clients - doesn't cover public, guest-embed, metabot, or agent-api surfaces.
Session middleware (metabase.server.middleware.session): Resolves current user from session or API key. Joins to auth_identity table (which has a provider column with values like "password", "jwt", "saml", "google", "ldap", "oidc"). Currently does NOT select the provider during session resolution.
Tenant model: EE-only. core_user.tenant_id exists and is loaded via @api/*current-user*. Tenant table has: id, name, slug, is_active, attributes, created_at, updated_at, tenant_collection_id.

What we're adding

Area	Change	Tables
Auth tracking	`auth_method` varchar(32)	view_log, query_execution
Tenant tracking	`tenant_id` integer	view_log, query_execution
Security flags	`is_impersonated`, `is_db_routed` booleans	query_execution only
Query parameters	`parameters` text (JSON)	query_execution only
GDPR-gated	`embedding_hostname`, `embedding_path`, `sanitized_user_agent`, `ip_address`	view_log, query_execution
Surface derivation	Computed from `embedding_client` at read time via CASE	analytics views only
Admin setting	`collect-analytics-pii` boolean (default false, no feature gate)	settings table
New view	`v_tenants`	analytics views
View updates	`v_view_log` v4, `v_query_log` v5, `v_users` v5	analytics views

Design decisions (from tech doc)

Extend existing tables rather than new tables - simpler, no joins needed
Both tables get context columns because downloads/API calls only hit query_execution (no view_log entry)
Surface is derived at read time from embedding_client via CASE expression - if mapping evolves, historical data updates automatically
Auth method determined at request time and bound as dynamic var - avoids per-event DB queries for stateless auth
GDPR columns always exist but only populated when admin setting enabled
Analytics view SQL lives in resources/migrations/instance_analytics_views/ - one subdirectory per view, versioned (v1/, v2/, ...), with per-database SQL files (postgres, mysql, h2). MySQL views need SQL SECURITY INVOKER and use concat() instead of ||.

Implementation Tickets

Ticket 1: Database Migrations - Add Columns

Description: Add new nullable columns to view_log and query_execution tables, plus indexes on tenant_id.

Files to create/modify:

resources/migrations/060/<timestamp>_embedding_analytics.yaml - new migration file

Columns for view_log: auth_method (varchar 32), tenant_id (int), embedding_hostname (varchar 512), embedding_path (varchar 2048), sanitized_user_agent (varchar 512), ip_address (varchar 45)

Columns for query_execution: auth_method (varchar 32), tenant_id (int), is_impersonated (boolean), is_db_routed (boolean), parameters (text), embedding_hostname (varchar 512), embedding_path (varchar 2048), sanitized_user_agent (varchar 512), ip_address (varchar 45)

Indexes: idx_view_log_tenant_id, idx_query_execution_tenant_id

Notes:

Follow existing pattern of one changeset per column per table (see v51.2024-07-22 for embedding_client/version precedent)
Adding nullable columns is metadata-only on Postgres/MySQL - no table rewrite
Use ${boolean.type} for booleans (maps to bit(1) on MySQL)
Latest v60 timestamp is ~2026-03-17, so use 2026-03-24+

Ticket 2: Admin Setting - `collect-analytics-pii`

Description: Add a boolean admin setting that gates collection of GDPR-sensitive fields.

Files to modify:

src/metabase/embedding/settings.clj - add defsetting (or a more general namespace, since this is not embedding-specific)

Implementation:

(defsetting collect-analytics-pii
  (deferred-tru "Controls whether analytics collects personally identifiable information (hostname, path, user agent, IP).")
  :type       :boolean
  :default    false
  :visibility :admin
  :export?    true)

Notes: Per Roman, this is NOT embedding-specific - PII collection applies to all surfaces. No :feature gate (available to all instances). Default false means GDPR columns won't be populated until an admin explicitly enables it. Reading a feature-gated setting on non-EE instances returns the default safely (no exception), but since there's no gate here, this is moot.

Ticket 3: Auth Method Detection and Dynamic Var

Description: Determine auth_method during request processing and make it available as a dynamic var for view_log/query_execution population.

Files to modify:

src/metabase/request/current.clj - add *auth-method* dynamic var (NOT api/common.clj which has TODOs to migrate vars away; request.current is the intended home for new request-scoped vars)
src/metabase/server/middleware/session.clj - add auth_identity.provider to session query; set auth-method in merge-current-user-info
src/metabase/request/session.clj - bind *auth-method* in middleware layer OUTSIDE do-with-current-user (so as-admin doesn't touch it - Clojure binding only shadows explicitly named vars)

Auth method values and where each is detected:

All session-based methods flow through auth_identity.provider via the LEFT JOIN in session-with-id-query. The session is created with an auth_identity_id FK during the login flow for each provider.

Value	How user authenticates	Where session/identity is created	Where detected at request time
`password`	`POST /api/session` with email+password	`auth-identity.provider/login! :provider/password` -> `auth-identity.session/create-session-with-auth-tracking!`	LEFT JOIN `auth_identity.provider` in `session-with-id-query`
`jwt`	`GET /auth/sso?jwt=<token>`	`metabase-enterprise.sso.integrations.jwt/sso-get` -> `login! :provider/jwt` -> `create-session-with-auth-tracking!`	LEFT JOIN `auth_identity.provider`
`saml`	`GET /auth/sso` (redirect) -> `POST /auth/sso` (IdP callback)	`metabase-enterprise.sso.integrations.saml/sso-post` -> `login! :provider/saml` -> `create-session-with-auth-tracking!`	LEFT JOIN `auth_identity.provider`
`google`	`POST /auth/sso` with Google OAuth token	`metabase.sso.providers.google` -> `login! :provider/google` -> `create-session-with-auth-tracking!`	LEFT JOIN `auth_identity.provider`
`ldap`	`POST /api/session` (LDAP enabled, checked before password)	`session.api/login` -> `login! :provider/ldap` -> `create-session-with-auth-tracking!`	LEFT JOIN `auth_identity.provider`
`oidc`	Enterprise OIDC flow	`login! :provider/oidc` -> `create-session-with-auth-tracking!`	LEFT JOIN `auth_identity.provider`
`custom-oidc`	Enterprise custom OIDC	`login! :provider/custom-oidc` -> `create-session-with-auth-tracking!`	LEFT JOIN `auth_identity.provider`
`slack-connect`	Slack Connect auth	`login! :provider/slack-connect` -> `create-session-with-auth-tracking!`	LEFT JOIN `auth_identity.provider`
`support-access-grant`	Support access grant flow	`login! :provider/support-access-grant` -> `create-session-with-auth-tracking!`	LEFT JOIN `auth_identity.provider`
`api-key`	`X-Api-Key` header on any request	No session created. Key created via `advanced-config.file.api-keys/init-from-config-file!` or API. Dedicated `:type :api-key` user.	Hardcoded in `current-user-info-for-api-key` -> `(assoc :auth-method "api-key")`
`guest`	`/api/embed/:token` (signed JWT in URL)	No session. Stateless - JWT verified per request by `metabase.embedding-rest.api.embed`. No user identity established.	TODO: Bind `auth-method` in embed route middleware (Ticket 4 scope)
`public`	`/api/public/:uuid` (public UUID in URL)	No session. Stateless - UUID verified per request by `metabase.public-sharing-rest.api`. No user identity.	TODO: Bind `auth-method` in public route middleware (Ticket 4 scope)
NULL	Legacy sessions (pre-auth_identity migration), pulse/background jobs	Pre-existing `core_session` rows without `auth_identity_id`	LEFT JOIN returns NULL (correct)

Request-time flow for session-based methods:

User authenticates via one of the provider login flows above
auth-identity.session/create-session-with-auth-tracking! creates core_session row with auth_identity_id FK
Session key returned as cookie (metabase.SESSION) or X-Metabase-Session header
On subsequent requests: wrap-session-key extracts session key -> wrap-current-user-info runs session-with-id-query which LEFT JOINs auth_identity to get provider -> merged onto request as :auth-method
bind-current-user wraps handler in (sdk/with-auth-method! (:auth-method request) ...) to bind the dynamic var

Request-time flow for API keys:

wrap-current-user-info -> current-user-info-for-api-key matches key prefix, verifies bcrypt hash
Returns (assoc :auth-method "api-key") - no auth_identity table involved
Same bind-current-user binding as above

Request-time flow for embed/public (TODO - not yet implemented):

These routes don't go through session resolution at all
Need server-side (sdk/with-auth-method! "guest" ...) or "public" binding in route middleware
Will be done alongside Ticket 4's +embedding-client route wrapper

Implementation approach:

Add LEFT JOIN to auth_identity in the session-with-id-query (compiled, memoized query at line ~99-140 of session.clj). This adds provider to every session lookup with zero extra queries.
For API keys: return {:auth-method "api-key" ...} from current-user-info-for-api-key
For guest/public: these don't go through session resolution. Bind *auth-method* in the respective API route handlers or wrapping middleware (Ticket 4).

Critical file references:

Session query: src/metabase/server/middleware/session.clj lines 99-140
merge-current-user-info: same file lines 235-243
do-with-current-user: src/metabase/request/session.clj lines 38-61
Current user vars: src/metabase/api/common.clj lines 104-131

Ticket 4: Expand `embedding_client` Vocabulary for Surface Derivation

Description: Set embedding_client server-side for surfaces that don't send the X-Metabase-Client header (public, guest-embed, metabot, agent-api).

Files to modify:

src/metabase/public_sharing_rest/api.clj (or middleware) - bind analytics.sdk/*client* to "public"
src/metabase/embedding_rest/api/embed.clj (or middleware) - bind *client* to "guest-embed"
src/metabase/metabot/api.clj (or middleware) - bind *client* to "metabot"
src/metabase/agent_api/api.clj (or middleware) - bind *client* to "agent-api"

Implementation: Each uses an unconditional with-client! binding. On conflicts (e.g., SDK iframe hitting a public route), the endpoint's server-side value wins - the endpoint determines which surface the request is actually being served through. Create a thin middleware wrapper in api_routes/routes.clj:

(defn- +embedding-client [client handler]
  (fn [request respond raise]
    (sdk/with-client! [client]
      (handler request respond raise))))

Applied to routes:

"/public"  (+public-exceptions (+embedding-client "public" metabase.public-sharing-rest.api/routes))
"/embed"   (+message-only-exceptions (+embedding-client "guest-embed" metabase.embedding-rest.api/embedding-routes))
"/metabot" (+embedding-client "metabot" metabase.metabot.api/routes)
"/agent"   (+embedding-client "agent-api" metabase.agent-api.api/routes)

Surface mapping (computed at read time in analytics views):

embedding_client value	surface	source
NULL	internal	no header, no server binding
embedding-sdk-react	sdk	SDK header
embedding-sdk-react-preview	sdk-preview	SDK + X-Metabase-Embedded-Preview header
embedding-simple	modular	Simple/modular embedding header
embedding-simple-preview	modular-preview	Simple + preview header
embedding-iframe	iframe	iframe detection header
embedding-iframe-preview	iframe-preview	iframe + preview header
public	public	server-side binding (Ticket 4)
guest-embed	guest	server-side binding (Ticket 4)
metabot	metabot	server-side binding (Ticket 4)
agent-api	agent-api	server-side binding (Ticket 4)
(anything else)	pass through raw value	future-proofing

Ticket 5: Populate New Columns on view_log

Description: When creating view_log entries, include auth_method, tenant_id, and GDPR-gated fields.

Files to modify:

src/metabase/view_log/events/view_log.clj - modify generate-view to include new fields
src/metabase/view_log/models/view_log.clj - update before-insert if needed

Implementation:

auth_method: from request.current/*auth-method* dynamic var (Ticket 3)
tenant_id: from (:tenant_id @api/*current-user*) - already available
GDPR fields: check (collect-analytics-pii) setting, if true extract from (metabase.request.current/current-request):
- embedding_hostname: Origin header, extract hostname
- embedding_path: Referer header, extract path (strip query params, truncate to 2048)
- sanitized_user_agent: User-Agent header (store raw)
- ip_address: request.util/ip-address

Ticket 6: Populate New Columns on query_execution

Description: When creating query_execution entries, include auth_method, tenant_id, security flags, parameters, and GDPR-gated fields.

Files to modify:

src/metabase/query_processor/middleware/process_userland_query.clj - modify query-execution-info (line 125-159) and save-successful-execution-metadata! (line 72-78)
src/metabase/queries/models/query_execution.clj - update before-insert if needed

Implementation (two categories of fields based on when data is available):

Category A - Captured in query :info map via qp/userland-query (Ring thread, before async boundary):

auth_method: from *auth-method* dynamic var (bound in request.current)
tenant_id: from (:tenant_id @api/*current-user*)
parameters: JSON-encode (:parameters query) (top-level filter values)
GDPR fields: from (request.current/current-request) headers, gated by (collect-analytics-pii) setting

Best injection point: qp/userland-query in src/metabase/query_processor.clj (lines 101-108) - the single centralized function ALL userland query entry points call. Add a helper that reads dynamic vars and merges into :info:

(defn userland-query [query info]
  (-> query
      (assoc-in [:middleware :userland-query?] true)
      (update :info merge info (embedding-analytics-info))))

Category B - Set on result metadata (inside QP pipeline, extracted in add-and-save-execution-metadata-xform!):

is_sandboxed: already implemented via this pattern
is_impersonated: :impersonation/role is set by EE impersonation middleware which runs INSIDE the QP pipeline, AFTER query-execution-info captures data. Must follow is_sandboxed pattern - have impersonation middleware set value on result metadata.
is_db_routed: :destination-database/id is set by attach-destination-db-middleware during preprocessing, also INSIDE the QP pipeline. Same timing issue as is_impersonated. Must follow is_sandboxed pattern - have routing middleware set value on result metadata via swap-destination-db execution middleware in database_routing/middleware.clj.

Why Category B can't use query-time capture: query-execution-info runs in the around middleware BEFORE the inner qp call that triggers preprocessing. Both :impersonation/role and :destination-database/id are set during preprocessing, so they're not yet on the query when query-execution-info reads it.

Failed queries: For failed queries, save-failed-query-execution! saves the execution-info map built before the query ran. Category A fields will be present. Category B fields (is_sandboxed, is_impersonated, is_db_routed) will naturally be absent since the query never produced results. This matches existing is_sandboxed behavior.

Ticket 7: Analytics Views - Update v_view_log, v_query_log, Create v_tenants, Update v_users

Description: Update analytics views to expose new columns and derived surface field.

Files to create:

resources/migrations/instance_analytics_views/view_log/v4/postgres-view_log.sql (+ mysql, h2)
resources/migrations/instance_analytics_views/query_log/v5/postgres-query_log.sql (+ mysql, h2)
resources/migrations/instance_analytics_views/tenants/v1/postgres-tenants.sql (+ mysql, h2)
resources/migrations/instance_analytics_views/users/v5/postgres-users.sql (+ mysql, h2)

Migration file: Add changesets to resources/migrations/060/<timestamp>_analytics_views.yaml

v_view_log v4 additions: has_access, context, embedding_client, embedding_version, surface (CASE), auth_method, tenant_id, embedding_hostname, embedding_path, sanitized_user_agent, ip_address

v_query_log v5 additions: auth_method, tenant_id, is_sandboxed, is_impersonated, is_db_routed, parameters, embedding_client, embedding_version, surface (CASE), embedding_hostname, embedding_path, sanitized_user_agent, ip_address

v_tenants v1: New view exposing tenant_id, name, slug, is_active, attributes, created_at, updated_at, tenant_collection_id (qualified). No active_user_count - breaks pattern of analytics views exposing raw data. Aggregate can be done in dashboard layer via JOIN to v_users (which gets tenant_id in v5).

v_users v5: Add tenant_id and tenant_qualified_id columns

Notes:

MySQL views need SQL SECURITY INVOKER
MySQL uses concat() instead of || for string concatenation
Must register v_tenants in audit-db-view-names set in enterprise/backend/src/metabase_enterprise/audit_app/permissions.clj
Use ${boolean.type} (not bare boolean) for boolean columns in YAML - MySQL needs bit(1)
Use ${text.type} for the parameters column (maps to TEXT/LONGTEXT/CLOB per database)

Rollback targets (view versions diverge between databases due to MySQL-only v3 fixes):

New Version	PostgreSQL rollback	MySQL rollback	H2 rollback
v_view_log v4	v2	v3	v2
v_query_log v5	v4	v4	v4
v_users v5	v3	v4	v3
v_tenants v1	DROP VIEW IF EXISTS v_tenants	DROP VIEW IF EXISTS v_tenants	DROP VIEW IF EXISTS v_tenants

Ticket 8: Enrich Tenant Create Audit Event

Description: Include tenant name and attributes in the audit_log details when a tenant is created.

Files to modify:

enterprise/backend/src/metabase_enterprise/tenants/model.clj - update model-details for :model/Tenant to include :attributes

Current code (line 17-19):

(defmethod audit-app/model-details :model/Tenant
  [entity _event-type]
  (select-keys entity [:name :slug :is_active]))

Change: Add :attributes to the select-keys vector. The tenant object passed to the event already contains attributes (via present-tenant), so this is a one-line change.

Implementation Order

#	Linear	Description	Status	Dependencies
1	EMB-1500	Database migrations	In Review	none
2	EMB-1501	Admin setting: analytics-pii-retension-enabled	In Review	none
3	EMB-1502	Auth method detection and dynamic var	Paused	none
4	EMB-1503	Expand embedding_client vocabulary	In Review	none
5	EMB-1504	Populate new columns on view_log	Todo	EMB-1500, 1501, 1502, 1503, 1512
6	EMB-1505	Populate new columns on query_execution	Todo	EMB-1500, 1501, 1502, 1503, 1512
7	EMB-1506	Analytics views: v_view_log v4, v_query_log v5, v_tenants v1	Todo	EMB-1500
8	EMB-1507	Enrich tenant create audit event	In Review	none
-	EMB-1508	Test: PII setting controls gathering	Todo	EMB-1501
-	EMB-1512	Collect and populate GDPR-gated request metadata	In Review	EMB-1501

EMB-1500, 1501, 1502, 1503, 1507, 1512 can all be worked in parallel. EMB-1504 and EMB-1505 can be parallelized once their dependencies are met. EMB-1506 only needs the migrations.

Verification Plan

Migrations: Start Metabase against H2/Postgres, verify columns exist via REPL query
Auth method: Log in via different methods (password, API key), check *auth-method* is correctly bound
embedding_client expansion: Hit public/embed/metabot/agent endpoints, verify embedding_client is populated
view_log population: View a dashboard via public link, verify new columns are populated
query_execution population: Run a query via embedded SDK, verify auth_method, tenant_id, security flags
GDPR toggle: Enable setting, verify hostname/path/user-agent/IP are captured. Disable, verify they're NULL
Analytics views: Query v_view_log, v_query_log, v_tenants, v_users and verify derived surface column works correctly
Audit event: Create a tenant, check audit_log details includes attributes

Test Methodology

This section maps each ticket to the existing test file it should extend, the pattern to follow, concrete test cases, and the helpers/vars needed.

Ticket 1: Migrations

File: test/metabase/app_db/schema_migrations_test.clj

Pattern to follow: workspace-input-normalization-migration-test (line 2872). Uses impl/test-migrations with a migration range vector, inserts rows before (migrate!), then verifies new columns exist and are NULL on pre-existing rows.

Test cases:

view-log-new-columns-migration-test - Insert a view_log row before migration, run (migrate!), verify auth_method, tenant_id, hostname, path, user_agent, ip_address are all NULL on the pre-existing row.
query-execution-new-columns-migration-test - Same pattern for query_execution. Insert a row with required fields (hash, started_at, running_time, result_rows, native, executor_id, database_id, context), migrate, verify new columns are NULL.
tenant-id-index-migration-test - After migration, verify the index works by querying with a WHERE tenant_id = ? clause on both tables (implicit: if migration succeeded, index was created).

Helpers needed: impl/test-migrations, t2/insert-returning-pks!, mdb.query/query

Note: Migration IDs are placeholders until the Liquibase YAML is written. Replace "v60.YYYY-MM-DDTHH:MM:00" with actual IDs.

Ticket 2: `collect-analytics-pii` Admin Setting

File: test/metabase/analytics/usage_analytics_test.clj (new file - this setting has no natural home in existing test files)

Pattern to follow: Any defsetting test, e.g., mt/with-temporary-setting-values. The setting doesn't exist yet - it needs to be created first.

Test cases:

collect-analytics-pii-default-test - Verify default is false
collect-analytics-pii-toggle-test - Use mt/with-temporary-setting-values [collect-analytics-pii true] / false, verify round-trip
collect-analytics-pii-no-feature-gate-test - Wrap in mt/with-premium-features #{}, verify the setting still works (no EE gating)
collect-analytics-pii-admin-only-test - mt/user-http-request :rasta :put 403 "setting/collect-analytics-pii" verifies non-admin can't change it

Helpers needed: mt/with-temporary-setting-values, mt/with-premium-features, mt/user-http-request

Note: collect-analytics-pii does NOT exist yet. Define it (likely in metabase.analytics.core or a new metabase.analytics.settings ns) before writing tests.

Ticket 3: Auth Method Detection

File: test/metabase/analytics/usage_analytics_test.clj (same new file as Ticket 2)

Pattern to follow: Integration tests that make HTTP requests via mt/user-http-request or client/client, then inspect side effects.

Test cases:

auth-method-from-session-test - Log in via password, make a request, verify *auth-method* was bound to the auth_identity.provider value (e.g., "password")
auth-method-api-key-test - Create an API key with mt/with-temp [:model/ApiKey ...], make a request using it, verify *auth-method* = "api-key"
auth-method-nil-when-unauthenticated-test - Hit a public route, verify *auth-method* is nil

Helpers needed: mt/with-temp, mt/user-http-request, client/client

Note: *auth-method* does NOT exist yet. It needs to be defined as a dynamic var (likely in metabase.request.current or session middleware) and bound during request processing. Tests will need to observe the bound value indirectly - e.g., by checking the auth_method column written to view_log or query_execution (which means Tickets 3+5 or 3+6 test together in practice).

Ticket 4: Expanded `embedding_client`

File: test/metabase/view_log/events/view_log_test.clj (extend existing public/embed tests)

Pattern to follow: get-public-card-logs-view-test (line 250). This test already:

Wraps in mt/with-premium-features #{:audit-app}
Enables public sharing with mt/with-temporary-setting-values [enable-public-sharing true]
Creates a public card with public-test/with-temp-public-card
Hits the public endpoint via client/client
Asserts on latest-view

Extend these assertions to check the embedding_client value.

Test cases:

Extend get-public-card-logs-view-test - After public card query, verify (:embedding_client (latest-view ...)) = "public"
Extend get-embedded-card-embedding-logs-view-test (line 280) - Verify embedding_client = "guest-embed"
server-side-binding-wins-over-client-header-test - Send X-Metabase-Client: embedding-sdk-react header to a public route, verify embedding_client is still "public" (server-side wins)

Current state: embedding_client already exists on view_log and query_execution. The include-sdk-info function in analytics/sdk.clj (line 34) sets it from *client* dynamic var. Currently *client* is only bound by embedding-mw (for SDK requests). For public/embed/metabot/agent routes, binding needs to happen in the route middleware.

Helpers needed: public-test/with-temp-public-card, embed-test/with-embedding-enabled-and-new-secret-key!, latest-view, client/client

Ticket 5: Populate `view_log` New Columns

File: test/metabase/view_log/events/view_log_test.clj

Pattern to follow: card-read-ee-test (line 27). The pattern is:

(mt/with-premium-features #{:audit-app}
  (mt/with-temp [:model/User user {} :model/Card card {:creator_id (u/id user)}]
    (events/publish-event! :event/card-read {:object-id (u/id card) :user-id (u/the-id user) :context :question})
    (is (partial= {...} (latest-view (u/id user) (u/id card))))))

Test cases:

view-log-auth-method-populated-test - Bind *auth-method* (once it exists), publish a card-read event, verify :auth_method on the view_log row
view-log-tenant-id-populated-test - Create a user with :tenant_id, publish event as that user, verify :tenant_id on the view_log row
view-log-gdpr-fields-null-when-pii-disabled-test - With collect-analytics-pii = false, publish event via HTTP request (to get request context), verify GDPR fields are NULL
view-log-gdpr-fields-populated-when-pii-enabled-test - With collect-analytics-pii = true, publish event via HTTP request, verify hostname, path, user_agent, ip_address are populated
view-log-null-for-unauthenticated-requests-test - Hit public route, verify auth_method and tenant_id are NULL

Important: View logging requires #{:audit-app} premium feature. All tests must wrap in mt/with-premium-features #{:audit-app}. Without it, view events are silently dropped (see card-read-oss-no-view-logging-test at line 41 for proof).

Helpers needed: mt/with-premium-features, events/publish-event!, latest-view, mt/with-temporary-setting-values, public-test/with-temp-public-card, client/client

Note: tenant_id exists on core_user but NOT on view_log yet - the migration (Ticket 1) adds it. The view_log event handler will need to read it from the current user and write it to the new column.

Ticket 6: Populate `query_execution` New Columns

File: test/metabase/query_processor/middleware/process_userland_query_test.clj

Pattern to follow: success-test (line 69). The pattern uses with-query-execution! macro (line 51) which:

Intercepts save-execution-metadata!* via with-redefs
Captures the query-execution map in a promise
Returns a function (qe) that dereferences the result

The test then asserts on specific fields of the captured map, including :is_sandboxed false (line 97).

Category A fields (captured on the Ring thread, passed via :info map):

auth_method - from *auth-method* dynamic var
tenant_id - from current user
GDPR fields (hostname, path, user_agent, ip_address) - from current request, gated by collect-analytics-pii

Category B fields (set on result metadata inside the QP pipeline, same thread as is_sandboxed):

is_impersonated - set by impersonation middleware on [:data :is_impersonated]
is_db_routed - set by database routing middleware on [:data :is_db_routed]

Test cases for Category A:

Extend success-test assertions to include :auth_method nil, :tenant_id nil (baseline - no auth context in unit test)
query-execution-auth-method-populated-test - Bind *auth-method*, run query via mt/user-http-request, check QE row
query-execution-tenant-id-populated-test - Create user with tenant, run query, check QE row
query-execution-gdpr-fields-gated-by-setting-test - Toggle collect-analytics-pii, run query, verify fields are NULL vs populated

Test cases for Category B: 5. query-execution-is-impersonated-test - Follow is_sandboxed pattern: check (get-in result [:data :is_impersonated]) is picked up and saved. Requires EE test setup with impersonation. 6. query-execution-is-db-routed-test - Same pattern for is_db_routed. Requires EE test setup with database routing.

Important - async save: Query execution metadata is saved asynchronously. The with-query-execution! macro handles this by intercepting save-execution-metadata!* (defined at process_userland_query.clj:43) with with-redefs and using a promise. Integration tests (Category A via HTTP) will need a different approach - either:

Use with-redefs on save-execution-metadata!* to capture the map, or
Query t2/select-one :model/QueryExecution after the request completes (with a small wait or mt/wait-for)

Helpers needed: with-query-execution! macro, mt/user-http-request, mt/with-temporary-setting-values, t2/select-one

Ticket 7: Analytics Views

File: test/metabase/app_db/schema_migrations_test.clj (for the migration that creates the views) plus a new test/metabase/analytics/analytics_views_test.clj for query-level tests.

Key reference - current view SQL: resources/migrations/instance_analytics_views/. The current views are simple:

view_log/v3/ - Only exposes id, timestamp, user_id, entity_type, entity_id, entity_qualified_id. Many existing columns (has_access, context, embedding_client) are NOT yet in the view.
query_log/v4/ - Exposes core fields plus transform_id, lens_id, lens_params, joined to query table. Does not yet expose embedding_client, is_sandboxed, etc.
users/v4/ - Does not yet have tenant_id.
tenants/ - Does not exist yet.

This means Ticket 7 is adding a LOT of new columns to these views, not just tweaking them. The surface CASE expression is entirely new logic.

Test cases:

Migration test: After running migration, verify views v_view_log, v_query_log, v_tenants, v_users exist and return results
v_view_log surface derivation: Insert view_log rows with different embedding_client values, query v_view_log, verify surface CASE expression maps correctly (e.g., "embedding-sdk-react" -> "sdk", "public" -> "public-link", etc.)
v_tenants view: Create tenants with attributes, verify the view exposes them
v_users view: Verify tenant_id column appears in the view

Ticket 8: Audit Event for Tenant CRUD

File: test/metabase/events/audit_log_test.clj or the tenant API test file

Pattern to follow: Existing audit log event tests that check (mt/latest-audit-log-entry ...).

Test cases:

tenant-create-audit-event-test - Create a tenant, verify audit_log entry has :details including :attributes
tenant-update-audit-event-test - Update tenant attributes, verify the audit event captures the change

Summary: Category A vs Category B

This distinction matters for query_execution (Ticket 6) because:

Category A fields are values available on the Ring request thread: auth_method, tenant_id, GDPR fields. They're passed into the QP via the :info map on the query, and written to the query_execution row by save-execution-metadata!.
Category B fields are set inside the QP middleware pipeline on result metadata: is_sandboxed (existing), is_impersonated (new), is_db_routed (new). They follow the pattern at process_userland_query.clj:116 where (get-in acc [:data :is_sandboxed]) is read from the accumulated result.

The thread boundary means Category A fields must be threaded through :info -> query-execution map, while Category B fields are picked up from the QP result. Tests for each category use different setup patterns:

Category A: bind dynamic vars / set request context, then run query
Category B: set up EE middleware (sandboxing/impersonation/routing), run query, verify the middleware set the flag on result metadata AND it was persisted

HTTP Contract

Attribute	Header/Cookie	Value type	Currently represented in HTTP requests?	Clojure representation	Currently stored in DB?	table.column(s)
embedding hostname	`Origin` (header)	string	yes (header exists, not captured)	`(get-in (metabase.request.current/current-request) [:headers "origin"])`	no	new: view_log.embedding_hostname, query_execution.embedding_hostname
embedding path	`Referer` (header)	string	yes (header exists, not captured)	`(get-in (metabase.request.current/current-request) [:headers "referer"])`	no	new: view_log.embedding_path, query_execution.embedding_path
user id	`metabase.SESSION` (cookie) / `X-Metabase-Session` (header)	int	yes	`metabase.api.common/current-user-id`	yes	view_log.user_id, query_execution.executor_id
tenant_id	`metabase.SESSION` (cookie) / `X-Metabase-Session` (header)	int	yes	`(:tenant_id @metabase.api.common/current-user)`	yes (on core_user), not on analytics tables	core_user.tenant_id; new: view_log.tenant_id, query_execution.tenant_id
surface	`X-Metabase-Client` (header)	string	partial (SDK/iframe only, not public/embed/metabot/agent)	`metabase.analytics.sdk/client`	partial (as embedding_client)	view_log.embedding_client, query_execution.embedding_client; surface derived at read time via CASE in analytics views
authentication method	`metabase.SESSION` (cookie) / `X-Metabase-Session` (header) / `X-Api-Key` (header) / `?jwt=<token>` (URL param for guest embed) / request URI (for public/embed routes)	string	yes (headers/cookies exist, not captured as auth_method)	new `auth-method` var in `request.current`; from `auth_identity.provider` for sessions, "api-key" for API keys, "guest"/"public" for embed/public routes	no	new: view_log.auth_method, query_execution.auth_method
date	n/a (server timestamp)	timestamp	n/a	`(t/zoned-date-time)`	yes	view_log.timestamp, query_execution.started_at
sanitized user agent	`User-Agent` (header)	string	yes (header exists, not captured)	`(get-in (metabase.request.current/current-request) [:headers "user-agent"])`	no	new: view_log.sanitized_user_agent, query_execution.sanitized_user_agent
ip address	`X-Forwarded-For` (header) / `remote-addr`	string	yes	`(metabase.request.util/ip-address request)`	yes (login_history.ip_address only)	login_history.ip_address; new: view_log.ip_address, query_execution.ip_address
query filters	n/a (query info)	text (JSON)	n/a	`(:parameters query)` (top-level on the query map)	no	new: query_execution.parameters
is download?	n/a (query context)	boolean (derived)	n/a	query context keyword (`:csv-download`, `:embedded-csv-download`, etc.); derive boolean via `(str/ends-with? context "download")`	yes	query_execution.context (already distinguishes all download types). No new column - derive in analytics view CASE.
row and column security?	n/a (query middleware)	boolean	n/a	set in sandboxing middleware result metadata	yes	query_execution.is_sandboxed
db routing?	n/a (query middleware)	boolean	n/a	set on result metadata by `swap-destination-db` middleware (Category B - same pattern as is_sandboxed)	no	new: query_execution.is_db_routed
connection impersonation?	n/a (query middleware)	boolean	n/a	set on result metadata by impersonation middleware (Category B - same pattern as is_sandboxed)	no	new: query_execution.is_impersonated
tenant attributes	n/a	JSON text	n/a	`(:attributes (t2/select-one :model/Tenant :id tenant-id))`	yes	tenant.attributes; exposed via new v_tenants view
active tenants	n/a	n/a	n/a	`(t2/select :model/Tenant :is_active true)`	yes	tenant.is_active; exposed via new v_tenants view
active tenant users	n/a	n/a	n/a	join core_user on tenant_id	yes	core_user.tenant_id + core_user.is_active; queryable via v_users (new tenant_id column) joined to v_tenants
published dashboards/questions	n/a	n/a	n/a	already queryable	yes	v_content.made_public_by_user (non-null = published). No changes needed.
public embeds	n/a	n/a	n/a	already queryable	yes	v_content.is_embedding_enabled. No changes needed.

HTTP routes that generate analytics data

HTTP Verb	HTTP Route	What it populates	Context value
GET	`/api/card/:id`	view_log	`:question`
GET	`/api/dashboard/:id`	view_log	`:dashboard`
GET	`/api/collection/:id`	view_log	`:collection`
POST	`/api/dataset`	query_execution	`:ad-hoc`
POST	`/api/card/:id/query`	query_execution	`:question`
POST	`/api/dashboard/:id/dashcard/:id/card/:id/query`	query_execution	`:dashboard`
GET	`/api/public/card/:uuid/query`	query_execution	`:public-question`
GET	`/api/public/dashboard/:uuid/...`	query_execution	`:public-dashboard`
GET	`/api/embed/card/:token/query`	query_execution	`:embedded-question`
GET	`/api/embed/dashboard/:token/...`	query_execution	`:embedded-dashboard`
POST	`/api/card/:id/query/:export-format`	query_execution	`:csv-download` / `:xlsx-download` / `:json-download`
POST	`/api/embed/card/:token/query/:export-format`	query_execution	`:embedded-csv-download` / etc.
POST	`/api/public/card/:uuid/query/:export-format`	query_execution	`:public-csv-download` / etc.

Open Questions (from tech doc)

User-Agent sanitization: Current decision is to store raw and display safely. No stripping needed unless injection risk is identified.
embedding_path truncation: Strip query params first, then truncate to 2048. Implementation detail for Ticket 5/6.
Unknown X-Metabase-Client values: Pass through as-is in the surface CASE expression (the ELSE clause).

Settled Decisions

Auth identity reliability: core_session.auth_identity_id reliably points to the auth identity used to create that session (set at creation time in create-session-with-auth-tracking!). Safe to LEFT JOIN for auth_method.
Tenant context for unauthenticated requests: NULL tenant_id for public/embed requests is fine.
No backfill: Don't backfill auth_method or tenant_id for historical rows.
OIDC provider granularity: No need to distinguish between multiple OIDC providers. Raw "oidc" string is sufficient.
Index strategy: Out of scope for this project. Only the planned tenant_id indexes.

Settled Decisions (from Slack 2026-03-30)

Preview handling (EMB-1503)

Decision: Keep the -preview suffix on embedding_client values.

Context: The embedding-mw appends -preview to the client value when X-Metabase-Embedded-Preview: true is set (e.g., embedding-sdk-react becomes embedding-sdk-react-preview). The question was whether preview requests should be grouped with their non-preview counterpart or kept distinct.

Resolution (Poom + Nicolo):

Preview requests must be distinguished from production usage for auditing - admins can query sensitive data via the embed preview wizard, and this should appear in audit trails (Nicolo)
Usage analytics views (Ticket 7) should de-emphasize preview but still surface it - the main use case is admins viewing embedded user behavior, preview is the edge case (Poom)
Preview requests should NOT be ignored/skipped entirely because of the auditing concern

Implementation: Two TODOs in sdk.clj resolved with comments documenting this decision. No behavioral changes - the -preview suffix was already being appended.

Tracer Bullet Strategy

Approach

Rather than building all inputs, then all DB writes, then all views in sequence, we wire each value end-to-end (request -> DB table -> analytics view) to prove the pipeline works, then repeat for each additional value. This catches integration issues early.

Value Matrix

Values dictated by 1 input:

Value	Input Source	DB Column(s)	View	Status
embedding_client	Route prefix OR X-Metabase-Client header	view_log + query_execution	v_query_log (needs surface CASE)	Wired - middleware binds, tests verify DB values
embedding_version	X-Metabase-Client-Version header	view_log + query_execution	v_query_log	Wired
preview suffix	X-Metabase-Embedded-Preview header	appended to embedding_client	same as above	Wired - tested
tenant_id	current-user tenant	view_log + query_execution	needs view col	Migration done, not yet populated
embedding_hostname	Origin header	view_log + query_execution	needs view col	Extraction fn done (pii-request-info), not wired to writes
embedding_path	Referer header (path only)	view_log + query_execution	needs view col	Extraction fn done, not wired to writes
sanitized_user_agent	User-Agent header	view_log + query_execution	needs view col	Extraction fn done, not wired to writes
ip_address	Request remote addr	view_log + query_execution	needs view col	Extraction fn done, not wired to writes
is_impersonated	Connection impersonation flag (Category B - QP result metadata)	query_execution only	needs view col	Migration done, not yet populated
is_db_routed	DB routing flag (Category B - QP result metadata)	query_execution only	needs view col	Migration done, not yet populated
parameters	Query filter values (JSON)	query_execution only	needs view col	Migration done, not yet populated

Values dictated by multiple inputs:

Value	Input Sources	DB Column	Status
auth_method	Session type (auth_identity.provider) + route (public, guest-embed) + API key header	view_log + query_execution	Dynamic var + include-sdk-info done, not yet bound in auth middleware

Auth Method Wiring (from permissions-expert analysis)

Where auth method is already known: At session creation time via auth_identity.provider. The core_session table has auth_identity_id FK to auth_identity, which has a provider column.

Recommended binding approach: Derive at request time by LEFT JOINing auth_identity in session-with-id-query (session.clj:99-140). This adds provider to every session lookup with zero extra queries (indexed FK lookup on a single row).

Modify session-with-id-query to add [:auth_identity.provider :auth-method] to SELECT
For API keys: hardcode "api-key" in current-user-info-for-api-key
For public/embed routes: bind based on route prefix in embedding-mw alongside *client*
Bind *auth-method* in bind-current-user from the request map

PII + Tenant Wiring (from platform-expert analysis)

view_log write path: Request -> event published -> view_log/events/view_log.clj:generate-view -> record-views! -> t2/insert! -> before-insert hook calls analytics/include-sdk-info

query_execution write path: process-userland-query-middleware -> save-execution-metadata! -> include-sdk-info called at line 56 (BEFORE async dispatch) -> async agent thread does t2/insert-returning-pk!

tenant_id: Available via (:tenant_id @api/*current-user*). No *current-tenant* var. Must capture before async boundary.

PII fields: Best bound in embedding-mw alongside *client* and *version*. No need for separate middleware - same request, same tables, same dynamic-var-to-DB-write pattern. PII columns are named embedding_* so they're embedding-scoped.

is_impersonated / is_db_routed: Category B - set inside QP pipeline on result metadata. Must follow is_sandboxed pattern. perms/impersonated-user? exists for impersonation detection. is_db_routed has no existing flag - needs to be surfaced from routing middleware.

Analytics Views Upgrade (from enterprise-expert analysis)

View upgrade process: Create new version directory under resources/migrations/instance_analytics_views/<view_name>/v<N>/ with postgres/mysql/h2 SQL files. Add migration changeset to 060/ YAML. Rollback section points to previous version's SQL files.

Surface CASE expression: Goes directly in the SQL view definition (no separate derived view). Consistent with how views already do column renaming, type casting, and coalescing. No precedent for layered views.

PII in views: Include columns in views, gate collection at write time (Option A). SQL views can't conditionally include columns based on runtime settings. The pii-request-info function already handles the extraction; the GDPR setting gates whether it's called.

Two separate analytics systems:

internal_stats/query_executions.clj feeds Snowplow/phone-home telemetry (aggregate counts sent to Metabase HQ)
Analytics views (v_query_log, v_view_log) feed instance-level usage dashboards for admins

These should use consistent classification logic but serve different purposes. The query_executions.clj CASE expressions may also need updating for newer client values (public, guest-embed, metabot, agent-api).

Auth Method Testing Feasibility

Auth Method	Test Infrastructure	Feasibility
password	`mt/user-http-request` - standard test helper	Easy
API key	Tests in `api_keys/api_test.clj`, header-based	Easy
LDAP	`with-ldap-server!` macro - in-memory LDAP server	Feasible
SAML	`with-saml-default-setup!` macro, test cert at `test_resources/sso/auth0-public-idp.cert`	Feasible - mature infra
JWT	`with-jwt-default-setup!` macro (random secret per run)	Feasible - mature infra
Google	Google OAuth test setup exists	Feasible
OIDC	Enterprise OIDC test setup	Feasible
public	Existing view_log tests verify `embedding_client "public"` in DB	Easy
guest-embed	Existing embed test helpers	Easy

Testing approach: Use existing SSO test infra to perform login, make subsequent request with session, assert auth_method written to view_log/query_execution (or captured via test helper).

Recommended First Tracer Bullets

auth_method for password + API key - bind *auth-method* in session middleware, verify it lands in view_log/query_execution. Easiest to test, proves the full pipeline.
PII fields - wire pii-request-info into view_log/query_execution write path (EMB-1504/1505), GDPR-gated
tenant_id - bind from current user context, verify in both tables

Open Questions (for team discussion)

Should parameters be GDPR-gated? Filter parameter values can contain user-identifiable information (e.g., filtering by email, customer name). Should the parameters column on query_execution also be gated by the collect-analytics-pii setting? Or are parameter values considered non-PII?
~~Embedding client precedence.~~ RESOLVED: The endpoint's server-side binding wins (unconditional with-client!). The endpoint determines which surface the request is actually served through, regardless of what client header was sent.
Parameters column redundancy. The full query (including parameters) is already stored as json_query on query_execution. Storing parameters separately adds queryability but also doubles that data. Worth it?
NULL auth_method semantics. When auth_method is NULL (old sessions before auth_identity migration, background jobs, pulse rendering), should analytics treat this as "unknown" or "internal"? Should we explicitly bind "internal" for background jobs?
~~as-admin escalation.~~ RESOLVED: Safe. as-admin calls do-with-current-user which only rebinds the 7 vars it explicitly names (*current-user-id*, *is-superuser?*, etc.). A separately-bound *auth-method* (in middleware, outside do-with-current-user) is NOT touched by as-admin - Clojure binding only shadows explicitly named vars. The original auth method is correctly retained.

Review Feedback (from expert subagents, 2026-03-24)

Changes to adopt

Setting name: Use embedding-analytics-collect-pii (not embedded-) to match existing embedding-* prefix convention in embedding/settings.clj.

OUTCOME For this: we actually should just call it collect-analytics-pii since it is not tied to embedding. see:

[3:16 PM]Bryan  Question about setting name PII: Our product/tech docs have it called embedded-analytics-collect-pii.

1. embedding-analytics-collect-pii (not embedded-) better matches the existing embedding-* prefix convention in embedding/settings.clj

but the bigger thing is: is PII collection actually embedding specific? Should we capture that info for non-embedding requests too? if so we will change the name even more
[7:35 AM]roman  I think this isn’t embedding-specific

Feature gate: Add :feature :embedding from the start. OVERRIDDEN: No feature gate. Setting is not embedding-specific (per Roman), available to all instances.
Auth method values - use raw provider strings: Don't map "password" -> "cookie". Use the raw auth_identity.provider value directly. This gives us "password", "google", "ldap", "oidc", "custom-oidc", "slack-connect", "support-access-grant" for free without a mapping table. The provider describes how the user proved their identity - the transport mechanism (cookie vs header) is orthogonal.
embedding_client binding precedence: Use (or *client* "public") / (or *client* "guest-embed") instead of unconditional rebinding. OVERRIDDEN: Use unconditional with-client! binding. The endpoint's server-side value wins on conflicts - the endpoint determines which surface the request is served through.
Guest/Public binding location: Create thin wrapper functions in routes.clj (e.g., with-analytics-bindings) rather than modifying security-oriented wrappers (+public-exceptions, +message-only-exceptions). Keeps analytics concerns separate from security concerns.
Drop active_user_count from v_tenants: It breaks the pattern of analytics views exposing raw data. The join can be done in the dashboard layer using the tenant_id being added to v_users. No other analytics view computes aggregates inline.
Use ${text.type} not text: The YAML should use ${text.type} for the parameters column (maps to TEXT/LONGTEXT/CLOB per database). Same pattern as lens_params.
View rollback correctness: v_view_log rollback is tricky - current version is v2 for postgres/h2 but v3 for mysql. Rollbacks must reference the correct per-database previous version.

Confirmed safe (no changes needed)

Session query JOIN performance: Adding LEFT JOIN to auth_identity is negligible - PK lookup on one row, both sides indexed.
Impersonation detection: (boolean (:impersonation/role query)) is reliable - key only set when impersonation genuinely applied.
Database routing detection: (boolean destination-database-id) is reliable. CORRECTED: is_db_routed has the same timing issue as is_impersonated - must use result-metadata pattern. See Critical Issue #3 above.
Tenant table on OSS: Table exists on all instances (not EE-gated). v_tenants view will work everywhere (zero rows on OSS).
(:tenant_id @api/*current-user*) without defenterprise: Works fine - column is in default-user-columns, naturally NULL on OSS.
Qualified ID format: 'tenant_' || tenant_id is consistent with all other views. tenant_collection_id is NOT NULL so no nullable concat concern.
v_tenants in audit-db-view-names: Safe to add even before view is synced - permission checks simply ignore unknown tables.
Scheduled tasks (pulses): *auth-method* will be nil/NULL for pulse queries. Combined with context = 'pulse', this is unambiguous and correct.
Auth method in analytics views: Low security risk - only visible to admins who already have access to login history and user details.

Critical issues found (2026-03-25 review)

CRITICAL: Two thread boundaries lose dynamic vars for query_execution. The streaming response body runs on a different thread (streaming thread pool) from the Ring request thread, and save-execution-metadata! then submits work to yet another thread (pooled executor). Dynamic vars like *current-user*, *auth-method*, and (current-request) are NOT available on either thread.

Fix: All new column values must be captured as data in the query :info map BEFORE entering the streaming response - following the existing executed-by pattern. Do NOT read dynamic vars in query-execution-info. Instead:
- Best injection point: qp/userland-query in src/metabase/query_processor.clj (lines 101-108) - the single centralized function ALL userland query entry points call
- Add a helper embedding-analytics-info that reads dynamic vars on the Ring thread and returns a map of :auth-method, :tenant_id, and GDPR fields
- Merge this into :info inside userland-query
- Read them from (:info query) in query-execution-info
- Also add these keys to the ::lib.schema.info/info Malli schema in src/metabase/lib/schema/info.cljc
CRITICAL: is_impersonated will always be false if set in query-execution-info. The :impersonation/role key is set by EE impersonation middleware which runs INSIDE the QP pipeline, AFTER process-userland-query-middleware has already captured query-execution-info.

Fix: Follow the is_sandboxed pattern - have the impersonation middleware set the value on result metadata, then extract it in add-and-save-execution-metadata-xform! alongside is_sandboxed.
CRITICAL: is_db_routed has the same timing issue as is_impersonated. :destination-database/id is set by attach-destination-db-middleware during preprocessing, which runs INSIDE the QP pipeline, AFTER query-execution-info captures data. The original report incorrectly stated this was set before the pipeline.

Fix: Same as is_impersonated - set :is_db_routed on result metadata in the swap-destination-db execution middleware (database_routing/middleware.clj), then extract in add-and-save-execution-metadata-xform!.

Risks to be aware of

GDPR data retention: Once PII is collected and the setting is later disabled, historical PII remains visible through the views. The setting only controls future writes.

OUTCOME: Inline columns, purge deferred. All PII columns are nullable, so future purge = batched UPDATE SET column = NULL. Costly on large tables (tens of millions of rows) but acceptable since PII collection is off by default this phase. Existing PII precedent: login_history stores ip_address and device_description inline with no cleanup. Separate FK'd PII tables (purge = TRUNCATE, sub-second) were considered but deferred. Slack consensus: setting stays off this phase; Roman says don't paint into a corner; Ben confirms nullable columns = no technical limitation to future deletion.

Slack thread for reference:

[11:11 AM]Bryan  Something we should be aware of for GDPR: Once PII is collected and the setting is later disabled, historical PII will remain visible through the views. The setting only controls future writes. Is that something we should handle  as part of this project?
[11:15 AM]sameer  good point. I Think flipping the bit should be considered a pretty serious action.
For actual compliance, we'd need to re-write history and scrub PII[11:46 AM]roman  How difficult is it to delete this data when toggling off? Straightforward, I guess, if so let's do it for the first iteration 
[11:48 AM]sameer  depends on how many views there are
[4:39 PM]Ben Grabow  My understanding is that for this phase of the project we're building under the assumption that the setting will be turned off (by default off and remains off). At some later point we'll start another project/phase of project where we start turning the setting on, and we'll need to tackle the GDPR issues then. Not saying we shouldn't think about them but pragmatically speaking I don't think we need to address these concerns in the branch we're working on this week. (edited) 
[7:04 AM]roman  @Ben Grabow here it feels more important not to rush into building something, but to make sure we’re not doing anything that would make it hard to remove this data later. So if we have a plan and a clear understanding of what to do, that’s fine too.
So ideally we’d want to answer the hardest questions first, rather than getting stuck on them at the end[8:35 AM]Ben Grabow  I agree, we shouldn't paint ourselves into a corner and we should make sure that historical PII can be removed. I just wanted to clarify the scope of this phase of the project.

When we get to the phase when we are building a way to remove historical PII, it seems that we know which columns will contain PII, we will make those columns nullable, so in principle there is no technical limitation to deleting the historical PII.

There may be some UX and performance questions to answer. If we put the PII columns in their own table FK'd to the main tables, then deleting the PII would be as simple as dropping the table. Do we have precedent for storing PII elsewhere in the system or is this the first time?

Legacy sessions: auth_identity_id can be NULL on core_session for pre-migration sessions. LEFT JOIN handles this (returns NULL provider). Analytics should treat NULL auth_method from session requests as "unknown/legacy".
Index creation on large tables: CREATE INDEX takes a SHARE lock on Postgres which blocks writes. For very large view_log/query_execution tables in production, consider CREATE INDEX CONCURRENTLY (requires raw SQL changeset for Postgres, standard createIndex for MySQL/H2).

escherize/enhance-usage-analytics-for-embedding.md

Enhance Usage Analytics for Embedding

What and Why

Project Context (from kickoff meeting 2026-03-25)

Key decisions from meeting with Roman (2026-03-26)

Architecture Overview

What exists today

What we're adding

Design decisions (from tech doc)

Implementation Tickets

Ticket 1: Database Migrations - Add Columns

Ticket 2: Admin Setting - collect-analytics-pii

Ticket 3: Auth Method Detection and Dynamic Var

Ticket 4: Expand embedding_client Vocabulary for Surface Derivation

Ticket 5: Populate New Columns on view_log

Ticket 6: Populate New Columns on query_execution

Ticket 7: Analytics Views - Update v_view_log, v_query_log, Create v_tenants, Update v_users

Ticket 8: Enrich Tenant Create Audit Event

Implementation Order

Verification Plan

Test Methodology

Ticket 1: Migrations

Ticket 2: collect-analytics-pii Admin Setting

Ticket 3: Auth Method Detection

Ticket 4: Expanded embedding_client

Ticket 5: Populate view_log New Columns

Ticket 6: Populate query_execution New Columns

Ticket 7: Analytics Views

Ticket 8: Audit Event for Tenant CRUD

Summary: Category A vs Category B

HTTP Contract

HTTP routes that generate analytics data

Open Questions (from tech doc)

Settled Decisions

Settled Decisions (from Slack 2026-03-30)

Preview handling (EMB-1503)

Tracer Bullet Strategy

Approach

Value Matrix

Auth Method Wiring (from permissions-expert analysis)

PII + Tenant Wiring (from platform-expert analysis)

Analytics Views Upgrade (from enterprise-expert analysis)

Auth Method Testing Feasibility

Recommended First Tracer Bullets

Open Questions (for team discussion)

Review Feedback (from expert subagents, 2026-03-24)

Changes to adopt

Confirmed safe (no changes needed)

Critical issues found (2026-03-25 review)

Risks to be aware of

Ticket 2: Admin Setting - `collect-analytics-pii`

Ticket 4: Expand `embedding_client` Vocabulary for Surface Derivation

Ticket 2: `collect-analytics-pii` Admin Setting

Ticket 4: Expanded `embedding_client`

Ticket 5: Populate `view_log` New Columns

Ticket 6: Populate `query_execution` New Columns