ralphbean · April 2, 2026 14:38
diff --git a/0010-per-org-github-apps.md b/0010-per-org-github-apps.md
@@ -0,0 +1,192 @@
+---
+title: "10. Per-org GitHub Apps for agent identity"
+status: Proposed
+relates_to:
+  - agent-architecture
+  - agent-infrastructure
+  - security-threat-model
+topics:
+  - security
+  - credentials
+  - identity
+  - compute-agnosticism
+---
+
+# 10. Per-org GitHub Apps for agent identity
+
+Date: 2026-04-02
+
+## Status
+
+Proposed
+
+## Context
+
+The admin CLI (branch `agent-admin-cli-clean-room-v4`) creates per-org GitHub
+Apps using the manifest flow — one app per agent role (fullsend, triage, coder,
+review) — and stores their private keys as repo secrets in the org's `.fullsend`
+repo. This is an implicit design decision that has not been explicitly justified
+or recorded.
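+
+As a sketch of what the manifest flow submits for each role, the Go snippet
+below renders one manifest. The JSON field names follow GitHub's app manifest
+format. The homepage URL, the permissions passed in, and the function and type
+names are illustrative assumptions, not the CLI's actual code. The CLI would
+POST this JSON as the `manifest` form field to
+`https://github.com/organizations/{org}/settings/apps/new`. It would then
+exchange the returned `code` via `POST /app-manifests/{code}/conversions`,
+whose response includes the new app's private key.
+
+```go
+package main
+
+import (
+	"encoding/json"
+	"fmt"
+)
+
+// appManifest holds a subset of the fields GitHub accepts in an app manifest.
+type appManifest struct {
+	Name               string            `json:"name"`
+	URL                string            `json:"url"`
+	RedirectURL        string            `json:"redirect_url"`
+	Public             bool              `json:"public"`
+	DefaultPermissions map[string]string `json:"default_permissions"`
+	DefaultEvents      []string          `json:"default_events,omitempty"`
+}
+
+// buildManifest renders the manifest for one agent role in one org, named by
+// the fullsend-{org}-{role} convention. The homepage URL is a placeholder.
+func buildManifest(org, role, redirectURL string, perms map[string]string, events []string) ([]byte, error) {
+	m := appManifest{
+		Name:               fmt.Sprintf("fullsend-%s-%s", org, role),
+		URL:                fmt.Sprintf("https://github.com/%s/.fullsend", org),
+		RedirectURL:        redirectURL,
+		Public:             false, // a private app can only be installed on the account that owns it
+		DefaultPermissions: perms,
+		DefaultEvents:      events,
+	}
+	return json.Marshal(m)
+}
+```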
+
+The standard GitHub App model is designed for multi-tenancy *without* per-org
+apps: a single app owner holds the private key, many orgs install the app, and
+the owner's backend mints scoped installation tokens per org. This is the model
+used by Dependabot, Codecov, Renovate, and GitHub Marketplace apps generally.
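+
+In that model, the owner's backend makes two REST calls per org:
+`GET /orgs/{org}/installation` and
+`POST /app/installations/{installation_id}/access_tokens`. The following is a
+minimal Go sketch, assuming an app-level JWT has already been signed with the
+private key. The helper names are hypothetical; the two endpoints are GitHub's.
+The same calls work unchanged from an org's own runner when the org holds the
+key, which is what per-org apps rely on.
+
+```go
+package main
+
+import (
+	"encoding/json"
+	"fmt"
+	"io"
+	"net/http"
+	"time"
+)
+
+const apiBase = "https://api.github.com"
+
+// newAppRequest prepares a REST call authenticated as the app itself, using
+// the short-lived JWT signed with the app's private key.
+func newAppRequest(method, path, appJWT string) (*http.Request, error) {
+	req, err := http.NewRequest(method, apiBase+path, nil)
+	if err != nil {
+		return nil, err
+	}
+	req.Header.Set("Accept", "application/vnd.github+json")
+	req.Header.Set("Authorization", "Bearer "+appJWT)
+	return req, nil
+}
+
+// call sends req and returns the body, failing on any status other than want.
+func call(client *http.Client, req *http.Request, want int) ([]byte, error) {
+	resp, err := client.Do(req)
+	if err != nil {
+		return nil, err
+	}
+	defer resp.Body.Close()
+	body, err := io.ReadAll(resp.Body)
+	if err != nil {
+		return nil, err
+	}
+	if resp.StatusCode != want {
+		return nil, fmt.Errorf("%s %s: %s", req.Method, req.URL.Path, resp.Status)
+	}
+	return body, nil
+}
+
+// parseAccessToken decodes the access_tokens response (token, expires_at).
+func parseAccessToken(body []byte) (string, time.Time, error) {
+	var tok struct {
+		Token     string    `json:"token"`
+		ExpiresAt time.Time `json:"expires_at"`
+	}
+	err := json.Unmarshal(body, &tok)
+	return tok.Token, tok.ExpiresAt, err
+}
+
+// installationToken looks up the app's installation in org, then exchanges
+// the app JWT for an installation token that only works in that org.
+func installationToken(client *http.Client, org, appJWT string) (string, time.Time, error) {
+	req, err := newAppRequest(http.MethodGet, "/orgs/"+org+"/installation", appJWT)
+	if err != nil {
+		return "", time.Time{}, err
+	}
+	body, err := call(client, req, http.StatusOK)
+	if err != nil {
+		return "", time.Time{}, err
+	}
+	var inst struct {
+		ID int64 `json:"id"`
+	}
+	if err := json.Unmarshal(body, &inst); err != nil {
+		return "", time.Time{}, err
+	}
+	path := fmt.Sprintf("/app/installations/%d/access_tokens", inst.ID)
+	if req, err = newAppRequest(http.MethodPost, path, appJWT); err != nil {
+		return "", time.Time{}, err
+	}
+	if body, err = call(client, req, http.StatusCreated); err != nil {
+		return "", time.Time{}, err
+	}
+	return parseAccessToken(body)
+}
+```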
+
+The question is whether fullsend should follow that standard model (global
+shared apps) or require each adopting org to create its own apps.
+
+The answer depends on where agent workloads execute. Fullsend's design requires
+adopting orgs to run agent workloads on their own infrastructure. Today that
+means GitHub Actions; in the future it may mean Kubernetes clusters or other
+compute platforms the org controls. The credential mechanism must work across
+all of these.
+
+This ADR relates to [ADR 0009](0009-github-app-ephemeral-credentials.md), which
+addresses how ephemeral tokens are generated from app credentials. This ADR
+addresses the prior question: who owns the apps and where do the private keys
+live.
+
+## Options
+
+### Option 1: Per-org GitHub Apps (current implementation)
+
+Each adopting org creates its own set of GitHub Apps (one per agent role) during
+installation. The org holds the private keys and stores them in the `.fullsend`
+repo's secrets. Workflow runs in the org use these keys to mint short-lived
+installation tokens.
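+
+Because the key is an ordinary secret, reading it can be as simple as the
+sketch below. It accepts either an environment variable, which is how a GitHub
+Actions step receives a secret, or a file path, which is how a mounted
+Kubernetes Secret typically appears. Both variable names are hypothetical.
+
+```go
+package main
+
+import (
+	"errors"
+	"os"
+)
+
+// loadPrivateKeyPEM returns the app's PEM-encoded private key from the
+// hypothetical FULLSEND_APP_PRIVATE_KEY variable, or else from the file named
+// by FULLSEND_APP_PRIVATE_KEY_FILE (e.g. a mounted Kubernetes Secret).
+func loadPrivateKeyPEM() ([]byte, error) {
+	if v := os.Getenv("FULLSEND_APP_PRIVATE_KEY"); v != "" {
+		return []byte(v), nil
+	}
+	if path := os.Getenv("FULLSEND_APP_PRIVATE_KEY_FILE"); path != "" {
+		return os.ReadFile(path)
+	}
+	return nil, errors.New("no app private key configured")
+}
+```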
+
+**Pros:**
+- Compute-agnostic. The private key is a portable blob that can be stored in
+  any secret store (GitHub Actions secrets, Kubernetes Secrets, Vault). Token
+  minting is ~20 lines of code that runs identically on any platform.
+- Zero external dependencies. No central service, no availability concerns, no
+  single point of failure.
+- True sovereignty. The org owns the apps, the keys, and the permissions. No
+  trust relationship with a third party required.
+- Blast radius isolation. A compromise of one org's keys affects only that org.
+- Aligns with "the repo is the coordinator" — no external coordination layer.
+
+**Cons:**
+- Onboarding friction. The manifest flow requires browser interaction per app,
+  multiplied by the number of agent roles (currently 4).
+- Permissions drift. When a new permission or event subscription is needed for
+  an agent role, every org must update their apps individually. There is no
+  centralized upgrade path.
+- Operational burden. Org admins manage multiple GitHub Apps (creation,
+  installation, key rotation, deletion).
+- App name squatting. GitHub App slugs are globally unique. If another account
+  claims a slug such as `fullsend-acme-coder` first, the legitimate `acme` org
+  cannot use it.
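+- Lost key recovery. GitHub returns an app's private key only once, when it is
+  generated (for manifest-created apps, in the manifest conversion response),
+  and does not store it. A lost key cannot be recovered: an app owner must
+  generate a new key in the app settings, or the app must be deleted and
+  recreated.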
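+
+A per-role override is the natural escape hatch for slug collisions. The
+following is a hypothetical sketch of resolving the slug, falling back to the
+`fullsend-{org}-{role}` convention. The override map shape and the validation
+pattern are assumptions, not the CLI's real config format.
+
+```go
+package main
+
+import (
+	"fmt"
+	"regexp"
+	"strings"
+)
+
+// slugPattern approximates the slugs GitHub derives from app names:
+// lowercase letters and digits separated by single hyphens (an assumption).
+var slugPattern = regexp.MustCompile(`^[a-z0-9]+(-[a-z0-9]+)*$`)
+
+// appSlug returns the configured override for role if there is one,
+// otherwise the fullsend-{org}-{role} convention.
+func appSlug(org, role string, overrides map[string]string) (string, error) {
+	slug, ok := overrides[role]
+	if !ok {
+		slug = fmt.Sprintf("fullsend-%s-%s", strings.ToLower(org), role)
+	}
+	if !slugPattern.MatchString(slug) {
+		return "", fmt.Errorf("invalid app slug %q", slug)
+	}
+	return slug, nil
+}
+```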
+
+### Option 2: Global shared apps with reusable workflow token vending
+
+Fullsend owns one set of global GitHub Apps. The private keys are stored as
+secrets in the `fullsend-ai` org. A reusable workflow in the `fullsend-ai` org
+mints installation tokens for calling orgs via `workflow_call`.
+
+**Pros:**
+- Simple onboarding. Orgs install the app with one click; no manifest flow.
+- Centralized permission updates. New permissions are added once to the global
+  apps.
+- No private key management for adopting orgs.
+
+**Cons:**
+- Coupled to GitHub Actions. `workflow_call` is a GitHub Actions primitive;
+  Kubernetes pods and other compute platforms cannot call reusable workflows.
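+- Tokens are awkward to hand back to the caller. An unmasked token in a
+  reusable workflow's outputs can leak into logs, while a masked one is
+  redacted and never reaches the caller.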
+- Single point of compromise. If the fullsend org's secrets leak, all adopting
+  orgs are affected.
+- Centralized dependency. The fullsend org's workflows must be available for
+  any adopting org to operate.
+
+### Option 3: Global shared apps with OIDC token vending service
+
+Fullsend owns global apps and runs a stateless token vending service (e.g.,
+Cloudflare Worker, AWS Lambda). Callers present an OIDC token proving their
+identity; the service verifies it and returns a scoped installation token.
+
+**Pros:**
+- Simple onboarding. One-click app installation.
+- Stateless service. No database, trivially simple (~50 lines of code).
+- Self-hostable. Orgs can run their own instance.
+
+**Cons:**
+- Requires a running service — even a minimal one is infrastructure to operate,
+  monitor, and secure.
+- Identity federation complexity. GitHub Actions OIDC tokens only prove GitHub
+  Actions identity. Kubernetes uses different OIDC tokens with different trust
+  roots. Each new compute platform requires a new identity attestation
+  integration, creating an ever-growing compatibility matrix.
+- Network reachability. The service must be reachable from wherever compute
+  runs, which may not be possible from air-gapped or firewalled clusters.
+- Single point of compromise if a shared instance is used.
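+
+To make the identity-federation point concrete, here is a sketch of the
+authorization check such a service might run after verifying a token's
+signature against the issuer's published keys (that step is omitted). The
+issuer URL and the `repository_owner` claim are what GitHub Actions OIDC
+tokens carry. The audience value and the org-to-installation map are
+hypothetical. A Kubernetes service-account token fails the issuer check, so
+each new platform adds another trust root to this function.
+
+```go
+package main
+
+import (
+	"errors"
+	"fmt"
+	"time"
+)
+
+// actionsIssuer is the issuer GitHub Actions puts in its OIDC tokens. Tokens
+// from other platforms carry different issuers and are rejected below.
+const actionsIssuer = "https://token.actions.githubusercontent.com"
+
+// oidcClaims holds the subset of claims the vending service inspects.
+type oidcClaims struct {
+	Issuer          string `json:"iss"`
+	Audience        string `json:"aud"`
+	RepositoryOwner string `json:"repository_owner"`
+	ExpiresAt       int64  `json:"exp"`
+}
+
+// authorize assumes the signature was already verified; it checks the claims
+// and returns the installation ID to mint a token for.
+func authorize(c oidcClaims, wantAudience string, installedOrgs map[string]int64, now time.Time) (int64, error) {
+	if c.Issuer != actionsIssuer {
+		return 0, fmt.Errorf("untrusted issuer %q", c.Issuer)
+	}
+	if c.Audience != wantAudience {
+		return 0, fmt.Errorf("unexpected audience %q", c.Audience)
+	}
+	if now.Unix() >= c.ExpiresAt {
+		return 0, errors.New("token expired")
+	}
+	id, ok := installedOrgs[c.RepositoryOwner]
+	if !ok {
+		return 0, fmt.Errorf("app not installed in %q", c.RepositoryOwner)
+	}
+	return id, nil
+}
+```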
+
+### Option 4: Hybrid — global apps by default, per-org override
+
+Offer global shared apps as the default path, with an option for orgs that want
+full sovereignty to create their own apps.
+
+**Pros:**
+- Low friction for most adopters.
+- Sovereignty for those who need it.
+
+**Cons:**
+- Two code paths to maintain and test.
+- The global path still has the compute-coupling and centralization problems of
+  Options 2 or 3.
+- Complexity of supporting both models may exceed the benefit.
+
+## Decision
+
+Each adopting organization creates its own set of GitHub Apps — one per agent
+role — during fullsend installation (Option 1).
+
+The decisive factor is **compute-agnosticism**. Fullsend requires adopting orgs
+to run agent workloads on their own infrastructure. Today that is GitHub
+Actions; in the future it will include Kubernetes clusters and potentially other
+platforms. The credential mechanism must work identically regardless of where
+the compute runs.
+
+Per-org apps with org-held private keys satisfy this requirement cleanly:
+- The private key is a portable secret that can be stored in any platform's
+  secret management system.
+- Token minting (sign JWT, exchange for installation token) is a simple,
+  platform-independent operation.
+- No network reachability to a central service is required.
+- No platform-specific identity attestation (OIDC provider integration) is
+  needed.
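+
+The JWT half of token minting needs only the Go standard library, as this
+sketch shows. It assumes an RSA key in PEM form and an app ID supplied by the
+caller. Per GitHub's documentation, the `iss` claim is the app (or client) ID,
+and `exp` may be at most 10 minutes in the future. The resulting JWT is then
+exchanged for an installation token with
+`POST /app/installations/{installation_id}/access_tokens`. Function names are
+illustrative.
+
+```go
+package main
+
+import (
+	"crypto"
+	"crypto/rand"
+	"crypto/rsa"
+	"crypto/sha256"
+	"crypto/x509"
+	"encoding/base64"
+	"encoding/json"
+	"encoding/pem"
+	"errors"
+	"time"
+)
+
+// parsePrivateKey decodes the PEM key GitHub issues for an app
+// (PKCS#1 "RSA PRIVATE KEY"), falling back to PKCS#8.
+func parsePrivateKey(pemBytes []byte) (*rsa.PrivateKey, error) {
+	block, _ := pem.Decode(pemBytes)
+	if block == nil {
+		return nil, errors.New("no PEM block found")
+	}
+	if k, err := x509.ParsePKCS1PrivateKey(block.Bytes); err == nil {
+		return k, nil
+	}
+	k, err := x509.ParsePKCS8PrivateKey(block.Bytes)
+	if err != nil {
+		return nil, err
+	}
+	rsaKey, ok := k.(*rsa.PrivateKey)
+	if !ok {
+		return nil, errors.New("key is not RSA")
+	}
+	return rsaKey, nil
+}
+
+// appJWT builds the RS256 JWT that authenticates as the app. GitHub rejects
+// an exp more than 10 minutes ahead; backdating iat by 60s tolerates clock drift.
+func appJWT(key *rsa.PrivateKey, appID string, now time.Time) (string, error) {
+	enc := base64.RawURLEncoding
+	header, err := json.Marshal(map[string]string{"alg": "RS256", "typ": "JWT"})
+	if err != nil {
+		return "", err
+	}
+	claims, err := json.Marshal(map[string]any{
+		"iat": now.Add(-60 * time.Second).Unix(),
+		"exp": now.Add(9 * time.Minute).Unix(),
+		"iss": appID,
+	})
+	if err != nil {
+		return "", err
+	}
+	signingInput := enc.EncodeToString(header) + "." + enc.EncodeToString(claims)
+	digest := sha256.Sum256([]byte(signingInput))
+	sig, err := rsa.SignPKCS1v15(rand.Reader, key, crypto.SHA256, digest[:])
+	if err != nil {
+		return "", err
+	}
+	return signingInput + "." + enc.EncodeToString(sig), nil
+}
+```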
+
+All alternatives that avoid per-org apps require either a central service
+(Option 3) or a GitHub Actions-specific mechanism (Option 2); Option 4's
+default path inherits whichever of the two it builds on. An Actions-specific
+mechanism cannot serve other compute platforms at all, and a central service
+needs a new identity integration for each additional platform, creating an
+ever-growing compatibility matrix. Per-org apps avoid this entirely.
+
+Framing this decision as "fullsend does not run a managed service" understates
+the actual constraint. The real architectural requirement is that the
+credential mechanism must be **compute-agnostic by design**, a consequence of
+fullsend's bring-your-own-infrastructure model.
+
+## Consequences
+
+- **Adopting orgs own their credential lifecycle.** They create apps, store
+  keys, rotate credentials, and delete apps on their own terms. No dependency
+  on fullsend infrastructure.
+- **Onboarding requires manifest flow interaction.** The admin CLI must guide
+  org admins through creating and installing multiple GitHub Apps. This is
+  more friction than a one-click marketplace install.
+- **No centralized permission upgrade path.** When agent roles need new
+  permissions or event subscriptions, each org must update their apps. The
+  admin CLI should provide tooling to detect and remediate permission drift
+  (e.g., an `analyze` subcommand that compares installed app permissions
+  against expected permissions).
+- **App slug collisions are possible.** The naming convention
+  (`fullsend-{org}-{role}`) uses a global namespace. Mitigation: the CLI
+  already supports slug overrides and known-slug mappings in config.
+- **Future compute platforms need no changes to the app model.** Moving from
+  GitHub Actions to Kubernetes (or any other platform) requires storing the
+  existing private key in the new platform's secret store and running the
+  same token-minting code there. No new service integrations, no identity
+  federation.
+- **Blast radius is isolated per org.** A key compromise in one org cannot
+  affect any other org.
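+
+The drift check suggested above could start as a comparison like the one in
+this hypothetical sketch. The expected levels would come from the CLI's role
+definitions. The installed levels would come from the `permissions` object
+that `GET /orgs/{org}/installation` returns. Event subscriptions would need a
+similar set comparison.
+
+```go
+package main
+
+import (
+	"fmt"
+	"sort"
+)
+
+// accessRank orders permission levels so that "write" satisfies a
+// requirement of "read"; a missing permission ranks lowest.
+var accessRank = map[string]int{"": 0, "read": 1, "write": 2, "admin": 3}
+
+// permissionDrift lists every permission whose installed level is below the
+// level expected for the role, e.g. "pull_requests": "write".
+func permissionDrift(expected, installed map[string]string) []string {
+	var drift []string
+	for perm, want := range expected {
+		have := installed[perm]
+		if accessRank[have] < accessRank[want] {
+			drift = append(drift, fmt.Sprintf("%s: have %q, want %q", perm, have, want))
+		}
+	}
+	sort.Strings(drift) // map iteration order is random; sort for stable output
+	return drift
+}
+```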
diff --git a/conversation-per-org-github-apps.md b/conversation-per-org-github-apps.md
@@ -0,0 +1,250 @@
+# Conversation: Challenging the per-org GitHub Apps assumption
+
+Date: 2026-04-02
+
+Participants: rbean, Claude Opus 4.6
+
+## Context
+
+This conversation took place on the `agent-admin-cli-clean-room-v4` branch of
+the fullsend repo, which implements a CLI for org administrators to install
+fullsend. The CLI creates per-org GitHub Apps using the manifest flow — one app
+per agent role (fullsend, triage, coder, review) — and stores their private
+keys as repo secrets in the org's `.fullsend` repo.
+
+---
+
+## rbean
+
+I need you to do some research. The code on this branch rests on an implicit
+design decision (assumption!) about how GitHub Apps work. I want you to take a
+critical eye to that decision and challenge it. The assumption is that when an
+org administrator "installs" fullsend, they need to create their own github
+apps for use. Here's my chain of logic. We don't want to run a managed service.
+We want the org admins to run agent workloads on their infrastructure. Since our
+only infra layer right now is github, that means github actions -- their github
+actions somewhere in their org. Those workflow runs need to act "as" the apps to
+comment on issues as them, submit PRs as them, submit code review as them. In
+order for that to work, the org administrator needs to have the private keys of
+the apps available as repo secrets in the .fullsend/ repo of their org (or
+somewhere in the org, anyways). And, (here's the implied challenge to the
+assumption..) if we were to have a single set of global github apps (one for
+each agent) that were shared globally by all orgs that use fullsend, then all of
+those orgs would need *copies* of the private keys associated with the apps, so
+that their workflow runs could mint short-term tokens for those apps, so that
+their workflow runs could act "as" those apps on repos in their org. Since we
+don't want to share those sensitive secrets broadly with anyone who wants to
+install fullsend - and such secret sharing is inadmissible - we're forced to
+have the installation process, carried out by an org administrator, create a set
+of apps scoped to their org that they have full administrative control over;
+they own and manage the private keys for those apps.
+
+## Claude
+
+### Challenging the "per-org apps" assumption
+
+Your reasoning chain has a **specific weak link** in step 3. Here it is,
+isolated:
+
+> "if we were to have a single set of global github apps... then all of those
+> orgs would need *copies* of the private keys"
+
+**This is not how GitHub Apps are designed to work.** The GitHub App
+installation model was built specifically for multi-tenancy *without* sharing
+private keys. Here's the standard flow:
+
+1. A developer (fullsend) creates **one** GitHub App and holds the private key
+2. Many orgs **install** it (granting permissions on their repos)
+3. The app owner's backend uses the private key to mint a JWT, then exchanges
+   it for an **installation access token** scoped to a single org
+4. That token can only access repos in the org where the app is installed, with
+   only the permissions granted during installation
+
+This is how every GitHub Marketplace app works — Dependabot, Codecov, Renovate,
+etc. None of them ask you to create your own copy of their app.
+
+### Where your reasoning actually holds up
+
+Your logic becomes valid when you add the constraint **"we don't want to run a
+managed service."** That's the real load-bearing assumption — not the GitHub App
+architecture, but the infrastructure constraint.
+
+If workflows run in the org's GitHub Actions, and there's no central service to
+call, then the workflow itself needs to mint the installation token. To do that,
+it needs the private key. And sharing private keys is inadmissible. So per-org
+apps.
+
+The chain is really: **no managed service -> compute runs in org's Actions ->
+org needs private keys -> can't share private keys -> per-org apps.**
+
+### Three concrete alternatives that challenge the assumption
+
+**1. Reusable workflows as a token vending proxy**
+
+This is the most interesting one because it requires **zero infrastructure
+outside GitHub**.
+
+A reusable workflow in the `fullsend-ai` org could:
+- Hold the global app private keys as org secrets
+- Be called by workflows in the target org via `workflow_call`
+- Mint a short-lived installation token for the caller's org
+- The calling workflow uses the token for the rest of its run
+
+The calling org's workflow looks like:
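+```yaml
+jobs:
+  get-token:
+    # Hypothetical reusable workflow; assumes it declares a `token` output
+    uses: fullsend-ai/.github/.github/workflows/mint-token.yml@main
+    with:
+      role: coder
+  do-work:
+    needs: get-token
+    runs-on: ubuntu-latest
+    steps:
+      # Act as the coder app using the token minted by the reusable workflow
+      - run: gh pr list --repo "$GITHUB_REPOSITORY"
+        env:
+          GH_TOKEN: ${{ needs.get-token.outputs.token }}
+```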
+
+**Problem:** Reusable workflow outputs are strings visible in logs. You'd need
+to work around this — perhaps by having the reusable workflow *do* the
+authenticated work directly (not just mint a token), or by using an
+intermediate mechanism. Also, the fullsend org becomes a single point of
+compromise: if its secrets leak, all orgs are affected.
+
+**2. OIDC + stateless token vending**
+
+GitHub Actions can mint OIDC tokens that cryptographically prove the identity
+of the workflow (org, repo, branch, workflow name). A stateless function
+(Cloudflare Worker, AWS Lambda) could:
+- Verify the OIDC token
+- Check that the calling org has installed the fullsend app
+- Mint and return a scoped installation token
+
+This is the pattern GitHub recommends for cloud auth (AWS, GCP, Azure all use
+it). The service is:
+- Stateless (no database)
+- Trivially simple (~50 lines of code)
+- Self-hostable (orgs paranoid about centralization can run their own)
+
+You could argue this "isn't really a managed service" in the way that running a
+database-backed SaaS is. But it's still *something* running somewhere, which
+violates the constraint as stated.
+
+**3. Hybrid: global apps with opt-in per-org override**
+
+Offer global shared apps as the default (easy onboarding,
+`actions/create-github-app-token` with a centralized token vending mechanism),
+but allow orgs to bring their own apps if they want full sovereignty. The
+install CLI already has most of the machinery for this.
+
+### What per-org apps actually cost you
+
+The current approach isn't free. The costs are real and compound:
+
+1. **Permissions drift.** When you add a new event subscription or permission
+   to an agent role (say, `coder` now needs `statuses: write`), every org has
+   to update their app. With global apps, you update once. The code in
+   `types.go:36-94` defines permissions per role — but those are only applied
+   at creation time. There's no upgrade path.
+
+2. **Onboarding friction.** The manifest flow (`appsetup.go:268-385`) starts a
+   local HTTP server, opens a browser, POSTs a form, waits for a callback, then
+   requires a *second* browser interaction to install the newly created app.
+   Multiply by 4 roles. Compare to: "click Install on the fullsend app page,
+   select your org, done."
+
+3. **Operational burden.** The org admin now manages 4 GitHub Apps (create,
+   install, rotate keys if compromised, delete on uninstall). The
+   `handleExistingApp` method at `appsetup.go:216-251` already handles the "PEM
+   is lost" failure mode — which wouldn't exist with global apps because the org
+   never has the PEM.
+
+4. **App name squatting.** Since apps have global slug namespaces,
+   `fullsend-acme-coder` is claimed by whoever creates it first. If org `acme`
+   doesn't use fullsend but someone registers that slug, the real `acme` is
+   stuck.
+
+### What per-org apps buy you
+
+To be fair, the benefits are also real:
+
+1. **Zero external dependencies.** No service to run, no availability concerns,
+   no SPOF. Pure GitHub-native.
+2. **True sovereignty.** The org owns the apps, the keys, the permissions. No
+   trust in a third party.
+3. **Blast radius isolation.** A compromise of one org's keys affects only that
+   org.
+4. **Aligns with "the repo is the coordinator."** No external coordination
+   layer.
+
+### Assessment
+
+The assumption is **defensible but not forced**. Your reasoning chain presents
+it as the only option given the constraints, but the "no managed service"
+constraint is doing all the work. The GitHub App model itself doesn't require
+per-org apps — it was specifically designed for the opposite.
+
+The question worth asking is: **Is "no managed service" a binary constraint, or
+a spectrum?** A stateless OIDC token vendor is orders of magnitude simpler than
+"a managed service" in the traditional sense. And the reusable workflow approach
+requires zero infrastructure at all — it just shifts where the secrets live
+(fullsend org instead of each customer org).
+
+If the answer is truly "zero infrastructure, not even a Lambda, not even a
+reusable workflow holding shared secrets," then per-org apps is the correct
+conclusion. But that constraint should be stated explicitly in the design docs,
+because it's the actual decision — the per-org apps are a consequence, not a
+decision in themselves.
+
+---
+
+## rbean
+
+Does your assessment change if you consider that we almost certainly have a
+future requirement to be able to run our workloads not on github actions, but
+instead on other infrastructure - like, say, a kubernetes cluster where the orgs
+that have installed fullsend bring their own cluster?
+
+## Claude
+
+Yes, significantly. It reinforces the per-org apps decision and undermines all
+three alternatives I raised.
+
+**Every alternative I proposed is tightly coupled to GitHub Actions as the
+compute platform:**
+
+1. **Reusable workflows** — `workflow_call` is a GitHub Actions primitive. A
+   Kubernetes pod can't call one. Dead on arrival.
+
+2. **OIDC token vending** — GitHub Actions OIDC tokens prove "I am a workflow
+   running in org X, repo Y." A Kubernetes pod doesn't have a GitHub Actions
+   OIDC token. You could use Kubernetes service account OIDC tokens instead, but
+   now your vending service needs to understand multiple identity attestation
+   mechanisms per platform, and you need to establish trust roots for each one.
+   It's no longer "a trivial stateless function" — it's a real identity
+   federation service.
+
+3. **Any centralized token minting** — needs to be network-reachable from
+   wherever the compute runs. GitHub Actions has network access by default. A
+   customer's air-gapped or firewalled Kubernetes cluster? Not necessarily.
+
+**Per-org apps with org-held private keys are compute-agnostic.** The private
+key is just a blob. The org stores it wherever their compute runs:
+
+- GitHub Actions -> repo/org secret
+- Kubernetes -> K8s Secret or Vault
+- Something else -> whatever secret store that platform uses
+
+The code to mint an installation token (`sign JWT with private key -> POST
+/app/installations/{id}/access_tokens`) is ~20 lines in any language and works
+identically regardless of where it executes. No platform-specific identity
+federation, no network reachability requirements to a central service, no
+coupling to GitHub's compute model.
+
+**The constraint isn't really "no managed service." It's "compute-agnostic by
+design."** And that's a much stronger justification for per-org apps than the
+one currently implicit in the code. The bring-your-own-compute requirement makes
+the alternatives not just inconvenient but architecturally unsound — each new
+compute platform would require a new identity attestation integration with the
+central service, turning it into an ever-growing compatibility matrix.
+
+Your assumption is correct, and for a better reason than the one you stated. I'd
+recommend making this explicit in the design docs — the per-org app model is a
+consequence of compute-agnosticism, not just secret-sharing aversion.