@ljw1004
Created April 19, 2026 17:40
For AI, extracts from my personal ~/LEARNINGS.md file

Learnings

This file is for durable engineering wisdom that should survive refactors. It represents general senior-engineer maturity and expertise.

Engineering discipline

  • Every warning is harmful. Warnings indicate mismatch between intent and reality; either fix immediately or stop and design a clean fix.
  • Lint/format discipline applies everywhere, including standalone code. Even demo or standalone subprojects should wire and pass ESLint + Prettier (or arc lint equivalents), ideally via local scripts and scoped overrides rather than by skipping checks.
  • Independent validation/review tasks should run in parallel. When tasks do not depend on one another (e.g., separate review prompts, independent checks), launch them concurrently and reserve sequential execution for true dependency chains only.
  • Tests must not rely on fixed non-zero sleeps for correctness. If a test cares about ordering, gate the relevant async boundary explicitly or wait on an observable condition; setTimeout(..., 0) or tight 1ms polling can be acceptable for yielding/polling, but larger guessed delays make tests nondeterministic and hide race conditions.
  • Claude feedback is a quality input, not just a blocker gate. During planning/review loops, ask for and evaluate all meaningful feedback (not only blockers). Adopt any suggestion that improves correctness, elegance, or simplicity.
  • Prevent review oscillation by recording decisions and rationale in the plan itself. When accepting or rejecting significant feedback, encode the decision and justification directly in the plan document so subsequent review rounds see stable intent instead of re-opening the same tradeoffs.
  • Write plan docs in final form, not as iteration history. Present the chosen design directly, then list rejected alternatives separately if they add value. Do not narrate “we first thought X, then switched to Y” inside an implementation handoff document.
  • During a review loop, any round that causes plan changes must be followed by another full review round. Do not stop after a review that produced worthwhile improvements; continue iterating until a complete round yields no further changes worth making.
  • Match the review bar to the user's stated goal. If the user asks for viability of an idea or sketch plan, evaluate feasibility, core risks, and whether the mechanism can work in principle; do not criticize it for not being decision-complete unless the user asked for an implementation-ready plan.
  • KISS for internal tooling: prefer fewer knobs. For repo-local maintenance scripts with one intended workflow, avoid optional CLI flags that add decision surface (e.g. tool binary path overrides). Prefer standard environment assumptions (PATH) and fail loudly when prerequisites are missing.
  • When widening a local compatibility type, preserve that type's existing local convention unless the task explicitly includes re-normalizing it. Do not opportunistically "fix" casing, value families, or naming to match a neighboring subsystem's enum shape if the current task only requires adding one more accepted value.
  • Keep boundary-specific parsing at that boundary. If a value needs cleanup only because one source API has a peculiar shape (for example VS Code settings returning undefined | '' | EnumValue), handle that fixup right where that API is read. Do not generalize it into shared domain helpers whose names imply broader semantics than the boundary actually has.
  • Do not extract a shared normalize* helper when the only weirdness is one boundary's input shape. If two callsites are each reading the same awkward API boundary, small duplicated fixup at those reads is better than a shared helper with a broad name like normalizeReasoningEffort(). That helper falsely suggests domain-wide normalization semantics, hides the real source of the oddity, and invites future misuse outside the original boundary.
  • Match the local testing style before adding a new test surface. Do not introduce a brand-new class of component/unit test just to lock one small UI detail unless the surrounding area already uses that style or the behavior is important enough to justify changing the local testing pattern.
  • Do not add speculative test hooks. A data-testid or similar testing seam should exist because a concrete current test or harness needs it, not just because it might be useful someday.
  • A file-length limit is a readability goal, not a license to compress code. Never remove meaningful whitespace, paragraph breaks, or explanatory comments just to get under a line-count target. If the file is still too large after preserving readable structure, solve that honestly with better factoring, a justified exemption, or a new plan.
  • Don't guess; inspect first. Confirm reality from code/runtime state before recommending changes.
  • String-rewrite helpers need exact example-driven docs. For functions whose main job is transforming one string format into another, document the contract with concrete interesting examples that show the literal returned string whenever practical, not just abstract prose or summarized behavior; exact examples are the fastest way for future readers to verify edge cases and precedence.
  • When an error string names a bad value, verify which boundary actually emitted that value before changing identifier normalization. A visible invalid identifier may have been produced by a different layer than the one you are inspecting, so "fixing" the nearest mapping can be a cleanly implemented but wrong diagnosis.
  • A failing test does not, by itself, prove production code is wrong. Before changing implementation, check whether the asserted behavior is still an explicit contract in current code, docs, plan notes, or neighboring callsites. If the signal is mixed or missing, stop and ask the user instead of choosing a side by intuition.
  • Use observability as a constraint on implementation, not as a reason to widen contracts. If a parameter exists only so a downstream helper can emit a nicer log or telemetry field, the tail is wagging the dog. Preserve the essential black-box data flow and let logging adapt to that contract, not the other way around.
  • Prefer self-describing discriminants over boolean boundary parameters. A bare true/false contract like isResumedConversation forces readers to remember hidden meaning, while a small discriminated value like threadKind: 'new' | 'resume' explains itself at the callsite and in telemetry/tests.
  • Avoid type indirection when the concrete contract is clearer. Reaching through another type with forms like SomeType['kind'] often saves nothing while making the local boundary harder to read. If the real local contract is a small concrete union like 'new' | 'resume', write that union directly unless the indirection is carrying meaningful shared semantics rather than mere coupling.
  • Avoid filler nouns like Context and Utils in API and type names. If a boundary can be named after the concrete work it does, prefer the ugly-but-accurate name over a vague wrapper like *Context; and for small local return shapes like {thread: Thread; config: ConfigReadResponse}, prefer writing the structural type inline rather than inventing a named wrapper concept that adds no meaning.
  • Do not add "impossible" throws just to satisfy a weak type boundary. If callers always need a value after a successful prerequisite, strengthen the producer contract to return that value directly instead of forcing a second nullable lookup plus a defensive throw that nobody can truly justify. Prove the invariant at the ownership boundary, then expose that stronger contract outward.
  • Prefer one collection over parallel related collections. If two arrays represent the same discovered objects at different readiness levels, model them as one record shape with explicit nullable fields instead of relying on subset/synchronization invariants between collections.
  • Do not keep multiple mutable state fields whose relationships are enforced only by invariant comments when one source-of-truth shape would do. Parallel sets/maps like "all seen", "seen from source A", and "seen from source B" become hard to audit because readers must mentally replay every update site to prove they stay synchronized; prefer one collection with explicit provenance data, and derive projections from it when needed.
  • Do not add a sibling cached field unless there is a real second use for the retained value. If one boundary can fetch a value and immediately return it to its caller, adding a new mutable field just to “have it around” creates extra lifecycle questions (when is it set?, is it cleared?, can it drift from the primary field?) without earning its keep. Add the field only when later reads truly need retained ownership, and then make that retained state the clear source of truth.
  • If a boundary already has one retained cache and one immediate caller, prefer returning extra one-shot data to that caller over widening the retained cache. Resume-only metadata, startup-only probes, and similar acquisition byproducts usually belong in the acquisition result for immediate projection, not in long-lived owner state.
  • Keep single-owner derivation logic in the owning module. If parsing or policy derivation exists only to serve one established module's behavior, prefer private helpers inside that module over extracting a new sibling module and public type. Extraction should be earned by a real second owner, not by aesthetic tidiness.
  • Avoid generic *Utils modules. Keep single-owner helpers in the owning test/module, and name shared helpers after the concrete contract they provide rather than a catch-all “utils” bucket.
  • If removing an incidental wait breaks behavior elsewhere, the wait was masking a real synchronization bug. Do not restore the sleep as a convenience fix; move the synchronization to the true ownership boundary that needs the work to be settled.
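Two of the points above — self-describing discriminants over boolean parameters, and one collection over parallel related collections — can be sketched in TypeScript. All names here (ThreadKind, DiscoveredItem, startThread, readyItems) are hypothetical illustrations, not identifiers from the codebase these learnings came from.

```typescript
// Instead of a bare boolean like isResumedConversation, a small
// discriminated value explains itself at the callsite and in telemetry/tests.
type ThreadKind = 'new' | 'resume';

function startThread(threadKind: ThreadKind): string {
  // Callsites read startThread('resume'), not startThread(true).
  return threadKind === 'resume'
    ? 'resuming existing thread'
    : 'starting new thread';
}

// Instead of two parallel arrays at different readiness levels
// (e.g. `discovered: Id[]` plus `ready: Id[]` kept in sync by comments),
// one record shape with an explicit nullable field:
interface DiscoveredItem {
  id: string;
  // null until initialization finishes; no sibling array to synchronize.
  readyAt: Date | null;
}

function readyItems(items: DiscoveredItem[]): DiscoveredItem[] {
  // The "ready" view is derived on demand, never separately maintained.
  return items.filter((item) => item.readyAt !== null);
}
```

With this shape, the subset invariant ("every ready item was discovered") holds by construction, so there is no update site a reader must audit.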

Architecture and refactorability

  • Bias toward purity from the first implementation. Pure functional-style input→output logic is far cheaper to move, test, and recombine later than mixed logic/effects.
  • Separate decision logic from effects early. Keep computation in pure helpers/reducers; keep I/O, storage, DOM, and network writes in thin orchestration layers.
  • Design modules around effect boundaries. A module should either decide what should happen (pure) or perform side effects (impure), not both.
  • Treat modularity as a maintenance multiplier, not a cleanup pass. Building with clear boundaries up front avoids expensive, risky refactors after feature completion.
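A minimal sketch of the decide/effect split described above, in TypeScript. The retry scenario and names (decideRetry, fetchWithRetry) are invented for illustration; the point is only the boundary shape: the decision helper is pure and trivially testable, while the clock and network live in a thin shell.

```typescript
interface RetryDecision {
  shouldRetry: boolean;
  delayMs: number;
}

// Pure: no clock, no network, no mutation. Cheap to move, test, recombine.
function decideRetry(attempt: number, maxAttempts: number): RetryDecision {
  if (attempt >= maxAttempts) {
    return { shouldRetry: false, delayMs: 0 };
  }
  // Exponential backoff: 200ms, 400ms, 800ms, ...
  return { shouldRetry: true, delayMs: 100 * 2 ** attempt };
}

// Impure shell: performs only the effects the pure helper decided on.
async function fetchWithRetry(
  doFetch: () => Promise<string>,
  maxAttempts = 3,
): Promise<string> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await doFetch();
    } catch (err) {
      const decision = decideRetry(attempt + 1, maxAttempts);
      if (!decision.shouldRetry) throw err;
      await new Promise((resolve) => setTimeout(resolve, decision.delayMs));
    }
  }
}
```

Tests then exercise decideRetry directly with plain inputs and outputs, without fakes for timers or transport.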

Logging and telemetry

  • When rich failure context must flow through many layers, prefer one summary string plus one opaque details object. This keeps ownership and projection simple: the error module formats the summary once, while callers forward the untouched details bag to telemetry/logging without mirroring its fields into parallel APIs.
  • When a boundary exposes {humanSummary, telemetry}, keep that pair as the primary contract instead of tunneling it through Error.message or wrapper types named like *Telemetry. One human-facing string and one opaque machine-facing bag is enough; duplicating them as message, summary, humanSummary, processTelemetry, or processTelemetry.telemetry creates ownership confusion and brittle propagation paths.
  • Do not hard-code telemetry taxonomy in product code unless product behavior truly branches on it. Product code should prefer emitting rich raw observations and stable context; slicing those observations into narrower categories is usually better owned by the telemetry-reading pipeline, where analysis can evolve without shipping new product logic every time the original classification proves incomplete.
  • Telemetry-only drift monitoring must stay best-effort and minimal. If a fetch exists only to learn drift over time, do not let it widen product state, picker contracts, startup blocking, or caching complexity. Emit one bounded best-effort signal and keep product behavior independent of that probe.
  • Treat logging as a debugging contract, not telemetry. Prefer boundary/failure events over high-volume chatter.
  • Aggregate telemetry should answer aggregate questions. If an event mainly helps reconstruct one transcript or debug one code path, keep it in logger.info instead of central telemetry unless there is a clear product-analysis need.
  • Do not substitute local logs for telemetry requirements. If a milestone asks for product-wide observability, implement centralized telemetry events; local logfile diagnostics are optional and never a replacement.
  • If a lower layer detects a diagnostic that must reach centralized telemetry, surface it as a structured event to the owning boundary instead of keeping it logger-only. This preserves one clear telemetry owner and avoids duplicating parsing logic across log consumers.
  • Keep logging logic lightweight and stateless. Avoid caches, dedup state, and suppression machinery that complicate core code.
  • For narrow debugging work, keep logs repro-driven and minimal. Do not create a shared module just to format a couple of debug-log fields, and do not carry unrelated fields just because they are easy to log; prefer ordinary logger statements at the owning callsite with only the data needed for the current investigation.
  • Keep logging helpers effect-free when possible. A helper that is named like logging should not also hide unrelated mutations or streaming side effects.
  • Log where state is owned. Emit diagnostics in the layer that owns the relevant lifecycle/state, instead of duplicating logs across call stacks.
  • Log transitions more than positions. State changes, request boundaries, and failure edges are usually more actionable than repeated steady-state snapshots.
  • Preserve diagnostic context in thrown errors. If errors are logged at a higher layer, carry actionable context in the error object/message so single-site logging does not lose detail.
  • Avoid local logger.error immediately followed by throw. Prefer a single logging site up-stack; preserve context in the thrown error itself.
  • Do not add logging-only catch/rethrow blocks. If a lower layer catches an error, it must be because it is translating state or enriching the thrown error contract, not merely to emit a log line before rethrow.
  • Do not add state solely for logging. Avoid timing fields, duplicate booleans, or extra counters whose only purpose is logs.
  • Do not compute elapsed time only for logs when the logger already timestamps lines. Prefer simpler boundary logs and let readers infer timing from adjacent log timestamps.
  • Avoid duplicate representations of the same fact. Logging helpers should not introduce parallel fields that require invariants to stay in sync.
  • Add truncation/special formatting only with evidence. Keep log payload handling simple unless real data shows logs are too large/noisy.
  • Do not manually prefix log lines with logger identity. Logger configuration/category should provide source attribution.
  • Do not write tests whose only purpose is to lock down debugging output. Tests are a debugging tool, and logs are a debugging tool; asserting debug-only wording usually just debugs the debugging. Prefer static checks and real-flow log inspection unless the logging path also changes functional control flow or user-visible product behavior.
  • Do not widen a module's production export surface just to unit-test its internals. If a helper is not part of the real owning contract, keep it private and test the behavior through the module's true public surface or through the higher-level owner that consumes it. Extra exports for tests are a design smell: they let tests dictate production shape instead of validating real behavior.
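The "one summary string plus one opaque details object" pattern from the list above can be sketched as follows. ProcessError, reportFailure, and the event name are hypothetical; the shape to notice is that the error module formats the summary exactly once, and callers forward the details bag untouched rather than mirroring its fields.

```typescript
class ProcessError extends Error {
  // Opaque bag: callers forward it to telemetry as-is, never re-shape it.
  readonly telemetry: Record<string, unknown>;

  constructor(humanSummary: string, telemetry: Record<string, unknown>) {
    super(humanSummary); // the summary is formatted here, once
    this.telemetry = telemetry;
  }
}

// The owning boundary logs in one place and forwards the bag unchanged;
// intermediate layers neither catch-and-log nor mirror individual fields.
function reportFailure(
  err: ProcessError,
  emit: (name: string, data: Record<string, unknown>) => void,
): void {
  emit('process_failed', { summary: err.message, ...err.telemetry });
}
```

Because intermediate layers only rethrow, there is exactly one logging site and no parallel APIs to keep in sync when the details bag gains a field.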