Evaluation of the changes required to wesm/msgvault#284 ("feat: import-pst — import Microsoft Outlook PST archives") once wesm/msgvault#304 (identities-collections-dedup) is merged into upstream main.
Zero textual conflicts. PR284 only adds new files plus go.mod/go.sum entries:
| File | PR284 | #304 | Conflict? |
|---|---|---|---|
| cmd/msgvault/cmd/import_pst.go | new | — | No |
| internal/importer/pst_import.go | new | — | No |
| internal/importer/pst_import_test.go, pst_integration_test.go | new | — | No |
| internal/pst/* (reader, mime, tests, testdata) | new | — | No |
| go.mod, go.sum | adds mooijtech/go-pst/v6 | untouched | No |
| PLAN.md | new | — | No |
#304 does not touch go.mod/go.sum, the internal/pst/ tree, or the import_pst command. The branches are in completely disjoint files.
PR284 follows the same shape as import-mbox and import-emlx. #304 instruments both of those importers with three behaviors that PR284 will need to mirror once it lands on top of #304. None of these will surface as merge conflicts; they're behavior parity gaps.
Every command that opens a store on #304 calls:

    if err := st.InitSchema(); err != nil { ... }
    if err := runStartupMigrations(st); err != nil {
        return fmt.Errorf("startup migrations: %w", err)
    }

PR284's cmd/msgvault/cmd/import_pst.go (around line 580 in the diff) calls st.InitSchema() but stops there. Without runStartupMigrations, a user who runs msgvault import-pst … as the first command after upgrading from a pre-#304 build will get an outdated account_identities schema and skip the legacy [identity] config-block migration.
Fix: add the standard three-line block immediately after st.InitSchema().
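The ordering requirement above can be sketched as a small helper. This is a minimal illustration, not code from either PR: schemaStore, prepareStore and fakeStore are hypothetical names, the migrate callback stands in for runStartupMigrations (whose real signature is assumed), and the "init schema" wrap message is an assumption because the original snippet elides that branch.

```go
package main

import (
	"errors"
	"fmt"
)

// schemaStore is a hypothetical stand-in for the msgvault store; only the
// InitSchema method named in the text is modeled.
type schemaStore interface {
	InitSchema() error
}

// prepareStore runs InitSchema first and the migration step second, wrapping
// a migration failure with the same "startup migrations" prefix as the
// snippet above. migrate stands in for runStartupMigrations (assumed shape).
func prepareStore(st schemaStore, migrate func(schemaStore) error) error {
	if err := st.InitSchema(); err != nil {
		// Assumed wrap message; the original elides this branch.
		return fmt.Errorf("init schema: %w", err)
	}
	if err := migrate(st); err != nil {
		return fmt.Errorf("startup migrations: %w", err)
	}
	return nil
}

// fakeStore records call order so the sequencing can be inspected.
type fakeStore struct{ calls []string }

func (f *fakeStore) InitSchema() error {
	f.calls = append(f.calls, "InitSchema")
	return nil
}

func main() {
	st := &fakeStore{}
	err := prepareStore(st, func(schemaStore) error {
		st.calls = append(st.calls, "runStartupMigrations")
		return errors.New("boom")
	})
	fmt.Println(st.calls, err)
}
```

Passing the migration step as a function keeps the sketch independent of the real store type; in import_pst.go the fix is simply the inline three-line block.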
#304's spec ("auto-default-identity") says every ingest path that knows a per-user identifier at source creation time must confirm it as an account-identifier identity, gated by a per-command --no-default-identity flag.
import-mbox and import-emlx already do this on #304: the importer-package Summary carries the new SourceID int64 field, and the CLI command calls confirmDefaultIdentity(st, sourceID, identifier, identifier, "account-identifier") after a successful import. The CLI also registers the standard --no-default-identity flag.
PR284's import-pst is structurally identical — import-pst <identifier> <pst-file> makes identifier an unambiguous per-user address. Fix:
- In internal/importer/pst_import.go, add SourceID int64 to PstImportSummary and assign summary.SourceID = src.ID after the GetOrCreateSource call (around line 966 in the diff).
- In cmd/msgvault/cmd/import_pst.go:
  - Add a package-level noDefaultIdentityImportPst bool.
  - Register importPstCmd.Flags().BoolVar(&noDefaultIdentityImportPst, "no-default-identity", false, noDefaultIdentityHelp) in init().
  - After summary, err := importer.ImportPst(...) succeeds, mirror the import-emlx block:

        if ctx.Err() == nil && !summary.HardErrors && !noDefaultIdentityImportPst {
            if summary.SourceID != 0 {
                confirmDefaultIdentity(st, summary.SourceID, identifier, identifier, "account-identifier")
            } else {
                logger.Warn("auto-default-identity: missing source id", "identifier", identifier)
            }
        }
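To make the gate easier to reason about (and to unit-test), the condition in the import-emlx block can be restated as a pure function. The sketch below is illustrative only: importSummary models just the two fields the gate reads, and decideDefaultIdentity and the action constants are hypothetical names that do not exist in either PR.

```go
package main

import "fmt"

// importSummary models only the fields the gate reads; the real
// PstImportSummary carries more fields that are not shown in the text.
type importSummary struct {
	SourceID   int64
	HardErrors bool
}

// action is a hypothetical enum describing what the CLI should do next.
type action int

const (
	actionSkip action = iota
	actionConfirm
	actionWarnMissingSourceID
)

// decideDefaultIdentity restates the gate: skip when the context was
// cancelled, the import hit hard errors, or --no-default-identity was set;
// otherwise confirm if a source ID is available and warn if it is not.
func decideDefaultIdentity(cancelled bool, s importSummary, noDefaultIdentity bool) action {
	if cancelled || s.HardErrors || noDefaultIdentity {
		return actionSkip
	}
	if s.SourceID == 0 {
		return actionWarnMissingSourceID
	}
	return actionConfirm
}

func main() {
	ok := importSummary{SourceID: 7}
	fmt.Println(decideDefaultIdentity(false, ok, false) == actionConfirm)
	fmt.Println(decideDefaultIdentity(false, ok, true) == actionSkip)
	fmt.Println(decideDefaultIdentity(false, importSummary{}, false) == actionWarnMissingSourceID)
}
```

The zero-SourceID branch is why the importer change comes first: without summary.SourceID = src.ID, every PST import would take the warning path and never confirm an identity.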
The import-imessage exemption (no auto-confirm because participants aren't self-identifying) does not apply to PST — the user explicitly types their own address as the first positional arg.
#304 maintains remoteSourceTypes to gate which sources can have remote-deletion manifests. The current set is gmail only after the v1 gating commit. PST archives are local files with no remote authority, so PST sources should not — and currently will not — appear in remoteSourceTypes. No action needed on #284.
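As a rough picture of why no change is needed, the gate can be thought of as a set-membership check. This sketch assumes remoteSourceTypes is a set keyed by source-type string; the actual representation in #304 is not shown here, and both isRemoteSource and the "pst" type string are hypothetical.

```go
package main

import "fmt"

// remoteSourceTypes is modeled as a string set containing only "gmail",
// matching the description above. The real type in #304 may differ.
var remoteSourceTypes = map[string]struct{}{
	"gmail": {},
}

// isRemoteSource is a hypothetical helper: it reports whether a source type
// is allowed to carry remote-deletion manifests.
func isRemoteSource(sourceType string) bool {
	_, ok := remoteSourceTypes[sourceType]
	return ok
}

func main() {
	// "pst" is an assumed source-type string for PST imports.
	for _, t := range []string{"gmail", "pst"} {
		fmt.Printf("%s remote=%v\n", t, isRemoteSource(t))
	}
}
```

Because the set is an allow-list, a new local source type is excluded by default and only an explicit addition would change that.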
Once PR284 lands, PST-imported messages will participate in #304's dedup pipeline. The dedup engine keys on RFC822 Message-ID with content-hash fallback. PR284's MIME reconstruction populates Message-ID from TransportMessageHeaders when present, and synthesizes RFC 5322 headers for Exchange-native sends — so dedup against Gmail-imported copies of the same message will work for the ~80% of messages with TransportMessageHeaders, and fall back to content-hash for the synthesized rest. This is the right behavior; no action needed, but worth flagging in PR284's release note that PST imports become deduplicatable across accounts.
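The Message-ID-first, content-hash-fallback keying described above can be sketched as follows. This is an assumption-laden illustration rather than #304's dedup engine: dedupKey is a hypothetical name, the "mid:" and "sha256:" prefixes are invented for readability, and SHA-256 over the raw bytes is an assumed hash choice.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strings"
)

// dedupKey returns a key based on the Message-ID when one is present and
// falls back to a hash of the raw message bytes otherwise. The hash
// algorithm and key prefixes are assumptions made for this sketch.
func dedupKey(messageID string, raw []byte) string {
	// Strip surrounding whitespace and angle brackets from the header value.
	id := strings.Trim(strings.TrimSpace(messageID), "<>")
	if id != "" {
		return "mid:" + id
	}
	sum := sha256.Sum256(raw)
	return "sha256:" + hex.EncodeToString(sum[:])
}

func main() {
	// A message whose Message-ID was recovered from TransportMessageHeaders.
	fmt.Println(dedupKey("<abc@example.com>", []byte("body")))
	// A message with no Message-ID falls back to the content hash.
	fmt.Println(dedupKey("", []byte("body")))
}
```

Under this model, a PST copy and a Gmail copy of the same message match whenever both carry the same Message-ID, whereas the fallback only matches when the hashed content is identical on both sides.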
PR284 does not touch read paths. Once it lands, PST-imported messages are stored in the same messages table as everything else and inherit LiveMessagesWhere semantics automatically. No action needed.
- git rebase origin/main: should apply cleanly with zero conflicts.
- Add three blocks:
  - runStartupMigrations(st) after InitSchema() in cmd/msgvault/cmd/import_pst.go.
  - SourceID int64 on PstImportSummary + summary.SourceID = src.ID in internal/importer/pst_import.go.
  - --no-default-identity flag + post-import confirmDefaultIdentity block in cmd/msgvault/cmd/import_pst.go.
- Run make test and verify import-pst <identifier> file.pst writes a row to account_identities (matching the pinned-test pattern from #304's mbox/emlx coverage).
- (Optional) Update PR284's body to note that PST-imported sources auto-confirm the supplied identifier as the account's default identity, with --no-default-identity to opt out.
Lowest-risk merge in the queue. Pure additions on the file system, but PR284 pre-dates #304's auto-default-identity contract and so will land with a behavior gap unless the three-block fix above is applied. Total fix surface: ~15 lines across two files. No store-layer, query-layer, or schema work needed.