@jordotech
Created March 25, 2026 15:23
ENG-893: Per-Org S3 Access Isolation — Strategy & Test Results

ENG-893: Per-Org S3 Access Isolation via Email Domain Verification

Problem

Capitol.ai is a multi-tenant platform where organizations share the same AWS infrastructure. A Capitol.ai admin can add themselves to any organization (e.g., EY) and gain full access to that org's S3 files — uploads, workflow files, and generated outputs. Client orgs need assurance that only users with verified email domains can access their data.

Strategy

We implement email-domain-scoped IAM role assumption — a belt-and-suspenders approach combining STS AssumeRole with explicit IAM Deny policies.

Architecture

User Request (e.g., download file from org X)
    │
    ▼
┌─────────────────────────────────────────────┐
│  Application Layer (platform-api or         │
│  agentic-backend)                           │
│                                             │
│  1. Look up org record in DynamoDB          │
│     → iam_role_arn                          │
│     → iam_role_email_domains                │
│                                             │
│  2. Compare user's email domain to          │
│     iam_role_email_domains list             │
│                                             │
│  3a. MATCH (e.g., user@ey.com, org allows   │
│      ["ey.com"])                            │
│      → STS AssumeRole to per-org IAM role   │
│      → Use scoped credentials for S3 op     │
│                                             │
│  3b. NO MATCH (e.g., admin@capitol.ai)      │
│      → Use default IRSA credentials         │
│      → EXPLICIT DENY blocks the request     │
│      → Return HTTP 403 to user              │
└─────────────────────────────────────────────┘
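
Steps 2 and 3 of the diagram can be sketched roughly as follows. This is a minimal illustration, not the code in org_s3_access.py: the function names and the return labels are hypothetical, and the exact, case-insensitive comparison (so that a subdomain such as sub.ey.com does not match ey.com) is an assumption the source does not confirm.

```python
from typing import Iterable


def email_domain(email: str) -> str:
    """Return the lowercased domain part of an email address."""
    # rpartition splits on the last "@", so only the final segment is the domain.
    _, sep, domain = email.strip().rpartition("@")
    if not sep or not domain:
        raise ValueError(f"not an email address: {email!r}")
    return domain.lower()


def domain_allowed(user_email: str, allowed_domains: Iterable[str]) -> bool:
    """Step 2: compare the user's domain to the org's allowed list."""
    # Assumption: exact match only, ignoring case and surrounding whitespace.
    allowed = {d.strip().lower() for d in allowed_domains}
    return email_domain(user_email) in allowed


def choose_credentials(user_email: str, allowed_domains: Iterable[str]) -> str:
    """Steps 3a/3b: decide which credentials the S3 call should use."""
    if domain_allowed(user_email, allowed_domains):
        return "assume_org_role"  # 3a: scoped per-org credentials
    return "default_irsa"  # 3b: default credentials, blocked by the explicit deny
```

With the examples from the diagram, user@ey.com against ["ey.com"] selects the per-org role, while admin@capitol.ai falls through to the default IRSA credentials.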

Two Layers of Protection

Layer 1 — Application-level gating (STS AssumeRole)

When a user's email domain matches the org's iam_role_email_domains, the app assumes a per-org IAM role (s3-org-<org-id-prefix>-<workspace>) scoped to that org's S3 prefixes. Non-matching users never get the scoped role.
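
A rough sketch of the role assumption itself, assuming a boto3-style STS client is passed in. The helper name, the session-name format, and the 900-second duration are illustrative choices, not taken from the source; only the assume_role call and the shape of its response follow the STS API.

```python
import re
from typing import Any, Dict


def assume_org_role(sts_client: Any, role_arn: str, user_email: str) -> Dict[str, str]:
    """Assume the per-org role and return keyword arguments for an S3 client."""
    # RoleSessionName accepts only [\w+=,.@-] and at most 64 characters,
    # so anything else in the email is replaced before truncating.
    raw_name = f"org-s3-{user_email}"
    session_name = re.sub(r"[^\w+=,.@-]", "-", raw_name, flags=re.ASCII)[:64]

    response = sts_client.assume_role(
        RoleArn=role_arn,
        RoleSessionName=session_name,
        DurationSeconds=900,  # the shortest lifetime STS allows (illustrative)
    )
    creds = response["Credentials"]
    return {
        "aws_access_key_id": creds["AccessKeyId"],
        "aws_secret_access_key": creds["SecretAccessKey"],
        "aws_session_token": creds["SessionToken"],
    }
```

The returned dictionary could then be unpacked into a client constructor, for example boto3.client("s3", **creds), so that the S3 operation runs with the scoped credentials rather than the pod's default ones.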

Layer 2 — IAM-level enforcement (Explicit Deny)

The default IRSA (IAM Roles for Service Accounts) role, used by all pods, carries an explicit Deny on the protected org prefixes. The Deny uses a StringNotLike condition on aws:PrincipalArn so that only the per-org roles (s3-org-*) are exempt. Because an explicit Deny overrides any Allow, IAM itself blocks the request even if a bug in application code skips the email check.

{
  "Sid": "DenyProtectedOrgPrefixes",
  "Effect": "Deny",
  "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject", ...],
  "Resource": [
    "arn:aws:s3:::uploads-bucket/<org-id>/*",
    "arn:aws:s3:::files-bucket/files/<org-id>/*",
    "arn:aws:s3:::outputs-bucket/files/<org-id>/*"
  ],
  "Condition": {
    "StringNotLike": {
      "aws:PrincipalArn": "arn:aws:iam::<account>:role/s3-org-*"
    }
  }
}
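
To make the condition concrete, here is a small local model of how the StringNotLike test in the statement above behaves. It is only an approximation for illustration: the helper names are hypothetical, the account ID is a placeholder, and real evaluation happens inside IAM. The sketch relies on aws:PrincipalArn holding the role ARN (not the session ARN) for assumed-role sessions.

```python
import re


def iam_like(pattern: str, value: str) -> bool:
    """Match an IAM-style wildcard pattern: '*' is any run, '?' is one character."""
    regex = "".join(
        ".*" if ch == "*" else "." if ch == "?" else re.escape(ch) for ch in pattern
    )
    # IAM string conditions are case-sensitive, so no IGNORECASE flag here.
    return re.fullmatch(regex, value, flags=re.DOTALL) is not None


def deny_applies(principal_arn: str, exempt_pattern: str) -> bool:
    """StringNotLike is true, and the Deny takes effect, when the ARN does NOT match."""
    return not iam_like(exempt_pattern, principal_arn)


if __name__ == "__main__":
    pattern = "arn:aws:iam::123456789012:role/s3-org-*"
    for arn in (
        "arn:aws:iam::123456789012:role/agentic-backend-development-role",
        "arn:aws:iam::123456789012:role/s3-org-4e80fe7b-development",
    ):
        print(arn, "denied" if deny_applies(arn, pattern) else "exempt")
```

Under this model the service's own role is denied on the protected prefixes, and a role whose name starts with s3-org- is exempt.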

Data Model

Two new optional fields on the existing organizations DynamoDB table (no migration needed — DynamoDB is schemaless):

| Field | Type | Description |
|---|---|---|
| iam_role_arn | String | Per-org IAM role ARN for STS AssumeRole |
| iam_role_email_domains | List<String> | Email domains allowed to assume the role (e.g., ["ey.com"]) |

If either field is absent or empty, the org uses standard (non-isolated) S3 access.
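
One way the "absent or empty" rule could be expressed in application code is sketched below. The OrgS3Access class and its method names are hypothetical, and the sketch assumes the DynamoDB item has already been deserialized into plain Python types.

```python
from dataclasses import dataclass, field
from typing import Any, List, Mapping, Optional


@dataclass(frozen=True)
class OrgS3Access:
    """The two optional isolation fields from an organization item."""

    iam_role_arn: Optional[str] = None
    iam_role_email_domains: List[str] = field(default_factory=list)

    @classmethod
    def from_item(cls, item: Mapping[str, Any]) -> "OrgS3Access":
        # Both fields are optional, so missing keys and empty values
        # are normalized to "not set".
        return cls(
            iam_role_arn=item.get("iam_role_arn") or None,
            iam_role_email_domains=list(item.get("iam_role_email_domains") or []),
        )

    @property
    def isolated(self) -> bool:
        # Isolation applies only when BOTH fields are present and non-empty.
        return bool(self.iam_role_arn) and bool(self.iam_role_email_domains)
```

An org item with neither field, or with a role ARN but an empty domain list, would report isolated as False and keep standard S3 access.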

S3 Buckets Covered

For each protected org, three buckets are isolated:

| Bucket | Service | Prefix Pattern | Content |
|---|---|---|---|
| capitol-ai-ingestion-pipeline-* | platform-api | <org-id>/* | User uploads (collections, data sources) |
| capai-agentic-files-* | agentic-backend | files/<org-id>/* | Workflow input files |
| capai-agentic-outputs-* | agentic-backend | files/<org-id>/* | Generated outputs (documents, charts, etc.) |
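
As an illustration of the prefix patterns, the object-level resource ARNs for one org could be derived as below. The function name is hypothetical, and the full bucket names are passed in as parameters because the source only shows them with a wildcard suffix.

```python
from typing import List


def protected_resource_arns(
    org_id: str, uploads_bucket: str, files_bucket: str, outputs_bucket: str
) -> List[str]:
    """Build the three object ARN patterns that cover one org's data."""
    return [
        # Uploads bucket: objects live directly under <org-id>/
        f"arn:aws:s3:::{uploads_bucket}/{org_id}/*",
        # Agentic files and outputs buckets: objects live under files/<org-id>/
        f"arn:aws:s3:::{files_bucket}/files/{org_id}/*",
        f"arn:aws:s3:::{outputs_bucket}/files/{org_id}/*",
    ]
```

Note that the uploads bucket has no files/ segment, so the same org ID maps to two different key layouts.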

Per-Org IAM Role Structure

Each protected org gets a dedicated IAM role:

  • Name: s3-org-<org-id-prefix>-<workspace> (e.g., s3-org-4e80fe7b-development)
  • Trust policy: Only the platform-api and agentic-backend IRSA roles can assume it
  • S3 policy: Scoped to only that org's prefixes across all three buckets
  • ListBucket: Conditional on s3:prefix matching the org's paths
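
The naming convention and the conditional ListBucket permission listed above might look like this in code. Both helpers are hypothetical. The sketch assumes the "org-id-prefix" is the first hyphen-separated group of the org UUID, which is inferred from the s3-org-4e80fe7b-development example rather than stated in the source, and the exact shape of the real policy may differ.

```python
from typing import Any, Dict, List


def org_role_name(org_id: str, workspace: str) -> str:
    """Derive the per-org role name from the org ID and workspace."""
    # Assumption: the prefix is the first group of the UUID (e.g. "4e80fe7b").
    org_id_prefix = org_id.split("-", 1)[0]
    return f"s3-org-{org_id_prefix}-{workspace}"


def list_bucket_statement(bucket: str, prefixes: List[str]) -> Dict[str, Any]:
    """Allow s3:ListBucket only when the requested prefix is inside the org's paths."""
    return {
        "Effect": "Allow",
        "Action": "s3:ListBucket",
        # ListBucket is a bucket-level action, so the resource is the bucket ARN.
        "Resource": f"arn:aws:s3:::{bucket}",
        "Condition": {"StringLike": {"s3:prefix": [f"{p}/*" for p in prefixes]}},
    }
```

The condition is needed because ListBucket cannot be scoped through the resource ARN alone; without it, the role could enumerate keys belonging to other orgs in the same bucket.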

Terraform Configuration

Access isolation is conditional per workspace — only workspaces with entries in protected_org_prefixes get the deny/assume policies:

variable "protected_org_prefixes" {
  type = map(list(string))
  default = {
    default           = []
    staging           = []
    development       = ["4e80fe7b-c3b8-41d8-8cf6-c947a69af0c1"]
    ey-eu-west-1      = []  # Will be populated when EY orgs are protected
    # ...
  }
}

When a workspace's list is empty (the case for most workspaces), the policies are created with count = 0, so no additional IAM resources are created there.

Pre-signed URL Security

A pre-signed S3 URL carries the IAM permissions of the role that generated it. Signing happens locally, so the default IRSA role can still generate a pre-signed URL for a protected prefix, but S3 evaluates the deny policy when the URL is requested, not when it is generated, and the request returns AccessDenied.


Test Scenarios

Tested in development workspace with:

  • Protected org: 4e80fe7b-c3b8-41d8-8cf6-c947a69af0c1 (ACME Publishing)
  • Allowed domain: gmail.com
  • Blocked user: jordan@capitol.ai (Capitol admin, not in allowed domains)
  • Allowed user: jordotech@gmail.com (email domain matches)

Scenario 1A: Authorized user uploads to collections (platform-api)

| Step | Action | Expected | Result |
|---|---|---|---|
| 1 | jordotech@gmail.com navigates to ACME Publishing org | Org loads normally | PASS |
| 2 | User uploads a file to a collection | App resolves email domain → matches gmail.com → assumes s3-org-4e80fe7b-development role | PASS |
| 3 | File appears in collection | S3 PutObject succeeds with scoped credentials | PASS |

Scenario 1B: Unauthorized user denied access to collections (platform-api)

| Step | Action | Expected | Result |
|---|---|---|---|
| 1 | jordan@capitol.ai is added to ACME Publishing org | Org membership granted (this is expected) | PASS |
| 2 | User attempts to view/download a file | App resolves email domain → capitol.ai not in allowed domains → uses default IRSA | PASS |
| 3 | S3 request hits explicit deny | IAM returns AccessDenied, app returns HTTP 403 with user-friendly message | PASS |
| 4 | Error tracked in Sentry | capture_exception() fires, visible in Sentry dashboard | PASS |
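
The translation in step 3, from an S3 AccessDenied error to an HTTP 403, could be sketched as below. This is not the handler in uploads.py: the function names and messages are hypothetical, and the Sentry call is omitted. The sketch only assumes the botocore convention that a ClientError exposes its code at response["Error"]["Code"].

```python
from typing import Optional, Tuple

# HeadObject responses have no body, so botocore reports the bare status "403".
ACCESS_DENIED_CODES = {"AccessDenied", "403"}


def s3_error_code(exc: BaseException) -> Optional[str]:
    """Read the error code from a botocore-style exception, if it has one."""
    response = getattr(exc, "response", None)
    if isinstance(response, dict):
        return response.get("Error", {}).get("Code")
    return None


def to_http_error(exc: BaseException) -> Tuple[int, str]:
    """Map a storage exception to an (HTTP status, user-facing message) pair."""
    if s3_error_code(exc) in ACCESS_DENIED_CODES:
        return 403, "You do not have access to this organization's files."
    return 500, "Unexpected storage error."
```

Keeping the mapping in one place means the route can return the same 403 detail regardless of which S3 operation hit the deny.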

Scenario 2A: Authorized user uploads in agentic-backend

| Step | Action | Expected | Result |
|---|---|---|---|
| 1 | jordotech@gmail.com runs a workflow in ACME Publishing org | Workflow executes normally | PASS |
| 2 | Workflow uploads files to agentic-backend S3 bucket | App assumes per-org role → S3 PutObject succeeds | PASS |

Scenario 2B: Unauthorized user denied agentic-backend file access

| Step | Action | Expected | Result |
|---|---|---|---|
| 1 | jordotech@gmail.com shares a workflow with jordan@capitol.ai | Workflow shared successfully | PASS |
| 2 | jordan@capitol.ai opens the shared workflow and clicks to view a file | App generates pre-signed URL using default IRSA | PASS |
| 3 | Browser follows pre-signed URL to S3 | S3 returns AccessDenied (explicit deny on IRSA for protected prefix) | PASS |
| 4 | Error message: assumed-role/agentic-backend-development-role is not authorized to perform: s3:GetObject ... with an explicit deny in an identity-based policy | Deny policy correctly blocks access | PASS |

Files Changed

Terraform (infrastructure)

| File | Changes |
|---|---|
| eks/services/platform-api/iam.tf | Added conditional assume-org-roles and deny-protected-orgs policies |
| eks/services/platform-api/output.tf | Exported organizations table name/ARN for cross-service use |
| eks/services/platform-api/variables.tf | Added protected_org_prefixes and uploads_bucket variables |
| eks/services/agentic-backend/iam.tf | Added DynamoDB read for orgs table, conditional assume/deny policies covering 3 buckets |
| eks/services/agentic-backend/remotes.tf | Added platform-api remote state reference |
| eks/services/agentic-backend/locals.tf | Added organizations table name/ARN from remote state |
| eks/services/agentic-backend/config.tf | Added ORGANIZATIONS_TABLE_NAME env var |
| eks/services/agentic-backend/variables.tf | Added protected_org_prefixes and uploads_bucket variables |

Application Code

| File | Changes |
|---|---|
| platform-api/src/utils/org_s3_access.py | New: email domain resolution using OrganizationsClient |
| platform-api/src/routes/uploads.py | Added 403 handling for S3 AccessDenied with Sentry tracking |
| platform-api/src/schemas/organizations.py | Added iam_role_arn and iam_role_email_domains fields |
| agentic-backend/src/services/aws/org_s3_access.py | New: email domain resolution using direct DynamoDB |
| agentic-backend/src/core/settings.py | Added ORGANIZATIONS_TABLE_NAME to DynamoDB settings |

Frontend

| File | Changes |
|---|---|
| platform-frontend/src/pages/data-sources/new-data-source-modal.tsx | Shows backend 403 detail in upload error toast |
| platform-frontend/src/pages/data-sources/collection-page/hooks/use-collection-data.ts | Access denied detection with navigation back to data sources |

Manual AWS Configuration (per protected org)

| Resource | Details |
|---|---|
| IAM Role | s3-org-<org-id-prefix>-<workspace> with trust policy for both service IRSA roles |
| Inline Policy | S3 access scoped to org's prefixes across all three buckets |
| DynamoDB Record | iam_role_arn and iam_role_email_domains fields on the organization item |

Rollout Plan

  1. Development — Complete and verified (this document)
  2. EY workspaces — Add EY org IDs to protected_org_prefixes, create per-org IAM roles, update DynamoDB records
  3. Production — Enable for any org requiring data isolation

No application code changes needed per-org — only Terraform variables and AWS/DynamoDB configuration.
