@jesserobbins
Created May 1, 2026 14:35
msgvault#284 ↔ #304 merge assessment (by Claude Opus 4.7)

PR #284 ↔ #304 merge assessment

Evaluation of the changes required to wesm/msgvault#284 ("feat: import-pst — import Microsoft Outlook PST archives") once wesm/msgvault#304 (identities-collections-dedup) is merged into upstream main.

Conflict surface (mechanical)

Zero textual conflicts. PR284 only adds new files plus go.mod/go.sum entries:

| File | PR284 | #304 | Conflict? |
|---|---|---|---|
| cmd/msgvault/cmd/import_pst.go | new | | No |
| internal/importer/pst_import.go | new | | No |
| internal/importer/pst_import_test.go, pst_integration_test.go | new | | No |
| internal/pst/* (reader, mime, tests, testdata) | new | | No |
| go.mod, go.sum | adds mooijtech/go-pst/v6 | untouched | No |
| PLAN.md | new | | No |

#304 does not touch go.mod/go.sum, the internal/pst/ tree, or the import_pst command. The two branches modify completely disjoint sets of files.

Semantic interactions to verify

PR284 follows the same shape as import-mbox and import-emlx. #304 instruments both of those importers with three behaviors that PR284 will need to mirror once it lands on top of #304. None of these will surface as merge conflicts; they're behavior parity gaps.

1. runStartupMigrations after InitSchema (REQUIRED FIX)

Every command that opens a store on #304 calls:

if err := st.InitSchema(); err != nil { ... }
if err := runStartupMigrations(st); err != nil {
    return fmt.Errorf("startup migrations: %w", err)
}

PR284's cmd/msgvault/cmd/import_pst.go (around line 580 in the diff) calls st.InitSchema() but stops there. Without runStartupMigrations, a user who runs msgvault import-pst … as the first command after upgrading from a pre-#304 build is left with an outdated account_identities schema, and the legacy [identity] config-block migration never runs.

Fix: add the standard three-line block immediately after st.InitSchema().

2. Auto-confirm default identity at source creation (REQUIRED FIX)

#304's spec ("auto-default-identity") says every ingest path that knows a per-user identifier at source creation time must confirm it as an account-identifier identity, gated by a per-command --no-default-identity flag.

import-mbox and import-emlx already do this on #304: the importer-package Summary carries the new SourceID int64 field, and the CLI command calls confirmDefaultIdentity(st, sourceID, identifier, identifier, "account-identifier") after a successful import. The CLI also registers the standard --no-default-identity flag.

PR284's import-pst is structurally identical — import-pst <identifier> <pst-file> makes identifier an unambiguous per-user address. Fix:

  1. In internal/importer/pst_import.go, add SourceID int64 to PstImportSummary and assign summary.SourceID = src.ID after the GetOrCreateSource call (around line 966 in the diff).
  2. In cmd/msgvault/cmd/import_pst.go:
    • Add a package-level noDefaultIdentityImportPst bool.
    • Register importPstCmd.Flags().BoolVar(&noDefaultIdentityImportPst, "no-default-identity", false, noDefaultIdentityHelp) in init().
    • After summary, err := importer.ImportPst(...) succeeds, mirror the import-emlx block:
      if ctx.Err() == nil && !summary.HardErrors && !noDefaultIdentityImportPst {
          if summary.SourceID != 0 {
              confirmDefaultIdentity(st, summary.SourceID, identifier, identifier, "account-identifier")
          } else {
              logger.Warn("auto-default-identity: missing source id", "identifier", identifier)
          }
      }

The import-imessage exemption (no auto-confirm because participants aren't self-identifying) does not apply to PST — the user explicitly types their own address as the first positional arg.
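The hand-off between the importer and the CLI can be sketched in isolation. This is a minimal illustration, not the code from either PR: the `source` type, the `importPst` stand-in, and the `shouldConfirmIdentity` helper are hypothetical names, and the summary struct shows only the two fields the gate reads. The helper also folds the `SourceID != 0` check into the gate, whereas the block above logs a warning in that case.

```go
package main

import "fmt"

// PstImportSummary shows only the fields relevant to the identity hand-off.
// The real struct in PR284 carries more; this field set is an assumption.
type PstImportSummary struct {
	SourceID   int64 // new field: ID of the source row the import wrote into
	HardErrors bool
}

// source is a hypothetical stand-in for the store's source row.
type source struct{ ID int64 }

// importPst stands in for importer.ImportPst. The only point illustrated is
// that the summary records the source ID right after the source lookup.
func importPst(src source) PstImportSummary {
	var summary PstImportSummary
	summary.SourceID = src.ID
	return summary
}

// shouldConfirmIdentity is a hypothetical helper capturing the CLI-side gate:
// confirm only when the run was not cancelled, had no hard errors, the user
// did not pass --no-default-identity, and a source ID is available.
func shouldConfirmIdentity(cancelled bool, s PstImportSummary, noDefaultIdentity bool) bool {
	return !cancelled && !s.HardErrors && !noDefaultIdentity && s.SourceID != 0
}

func main() {
	s := importPst(source{ID: 42})
	fmt.Println(s.SourceID, shouldConfirmIdentity(false, s, false))
}
```

Carrying the ID on the summary keeps the CLI from having to look the source up a second time after the import returns.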

3. Source type registration in remoteSourceTypes (no action needed)

#304 maintains remoteSourceTypes to gate which sources can have remote-deletion manifests. The current set is gmail only after the v1 gating commit. PST archives are local files with no remote authority, so PST sources should not — and currently will not — appear in remoteSourceTypes. No action needed on #284.
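As a rough illustration of why no change is needed, a set-membership gate of this kind excludes any type that was never added to it. The sketch below assumes remoteSourceTypes behaves like a set keyed by source-type string; the actual declaration in #304, the `supportsRemoteDeletion` helper name, and the `"pst"` type string are assumptions made for the example.

```go
package main

import "fmt"

// remoteSourceTypes lists source types that have a remote authority.
// Per the text above the set is gmail only; the map representation is assumed.
var remoteSourceTypes = map[string]bool{
	"gmail": true,
}

// supportsRemoteDeletion is a hypothetical helper: a source type may carry a
// remote-deletion manifest only if it is present in the set.
func supportsRemoteDeletion(sourceType string) bool {
	return remoteSourceTypes[sourceType]
}

func main() {
	// "pst" is an assumed type string for PST sources.
	for _, t := range []string{"gmail", "pst"} {
		fmt.Printf("%s: %v\n", t, supportsRemoteDeletion(t))
	}
}
```

Because a missing map key reads as false, a new local source type is excluded by default and must be opted in explicitly.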

4. Dedup interaction (informational)

Once PR284 lands, PST-imported messages will participate in #304's dedup pipeline. The dedup engine keys on the RFC 5322 Message-ID, with a content-hash fallback. PR284's MIME reconstruction populates Message-ID from TransportMessageHeaders when present and synthesizes RFC 5322 headers for Exchange-native sends. Dedup against Gmail-imported copies of the same message will therefore work by Message-ID for the ~80% of messages with TransportMessageHeaders and fall back to content hash for the synthesized rest. This is the right behavior and needs no action, but PR284's release notes should flag that PST imports become deduplicatable across accounts.
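The key-selection rule described above can be sketched as follows. This is an illustration of "Message-ID first, content hash otherwise", not #304's dedup engine: the `dedupKey` name, the `mid:` and `sha256:` prefixes, the choice of SHA-256, and hashing the raw message bytes are all assumptions.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strings"
)

// dedupKey is a hypothetical helper: it prefers the Message-ID header and
// falls back to a hash of the message bytes when no Message-ID is available.
func dedupKey(messageID string, raw []byte) string {
	id := strings.TrimSpace(messageID)
	if id != "" {
		return "mid:" + id
	}
	sum := sha256.Sum256(raw)
	return "sha256:" + hex.EncodeToString(sum[:])
}

func main() {
	// A message whose Message-ID was recovered from transport headers.
	fmt.Println(dedupKey("<abc123@example.com>", []byte("body")))
	// A message with no Message-ID falls back to the content hash.
	fmt.Println(dedupKey("", []byte("body")))
}
```

Under this scheme two copies of a message match across accounts only if both carry the same Message-ID or both hash to the same bytes, which is why the fallback path depends on how stable the reconstructed content is.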

5. live-message filter (no action needed)

PR284 does not touch read paths. Once it lands, PST-imported messages are stored in the same messages table as everything else and inherit LiveMessagesWhere semantics automatically. No action needed.

Resolution recipe (when #284 rebases onto post-#304 main)

  1. git rebase origin/main — should apply cleanly with zero conflicts.
  2. Add three blocks:
    • runStartupMigrations(st) after InitSchema() in cmd/msgvault/cmd/import_pst.go.
    • SourceID int64 on PstImportSummary + summary.SourceID = src.ID in internal/importer/pst_import.go.
    • --no-default-identity flag + post-import confirmDefaultIdentity block in cmd/msgvault/cmd/import_pst.go.
  3. Run make test and verify import-pst <identifier> file.pst writes a row to account_identities (matching the pinned-test pattern from #304's mbox/emlx coverage).
  4. (Optional) Update PR284's body to note that PST-imported sources auto-confirm the supplied identifier as the account's default identity, with --no-default-identity to opt out.

Verdict

Lowest-risk merge in the queue. The change is purely additive at the file level, but PR284 predates #304's auto-default-identity contract, so it will land with a behavior gap unless the three-block fix above is applied. Total fix surface: ~15 lines across two files. No store-layer, query-layer, or schema work needed.
