AI Preflight Checks (gist by @zdrummond, created January 13, 2026)
# Preflight Checker - Complete Check Reference
**Document Objective**: Comprehensive categorized list of all checks performed by the preflight checker for H100 ML deployment quality assurance.
**Version**: 1.11.7
**Last Updated**: 2025-11-30
---
## Check Count Summary
| Category | Pattern-Based | AST-Based | Total |
|----------|---------------|-----------|-------|
| 1. GPU Efficiency | 5 | 6 | **11** |
| 2. Training Issues | 8 | 3 | **11** |
| 3. Loss Function | 7 | 3 | **10** |
| 4. Evaluation Mode | 3 | 1 | **4** |
| 5. In-Place Operations | - | 4 | **4** |
| 6. Performance/CPU | - | 6 | **6** |
| 7. File I/O | - | 3 | **3** |
| 8. Code Quality | 17 | 9 | **26** |
| 9. Import Validation | - | 7 | **7** |
| 10. Test Quality | 7 | 2 | **9** |
| 11. Fixture Analysis | - | 14 | **14** |
| 12. Constants Enforcement | - | 1 | **1** |
| 13. External Tools (Ruff) | - | 1 | **1** |
| **TOTAL** | **47** | **60** | **107** |
---
## Table of Contents
1. [GPU Efficiency Checks](#1-gpu-efficiency-checks)
2. [Training Issues](#2-training-issues)
3. [Loss Function Checks](#3-loss-function-checks)
4. [Evaluation Mode Checks](#4-evaluation-mode-checks)
5. [PyTorch In-Place Operation Detection](#5-pytorch-in-place-operation-detection)
6. [Performance & CPU Bottleneck Checks](#6-performance--cpu-bottleneck-checks)
7. [File I/O Bottleneck Detection](#7-file-io-bottleneck-detection)
8. [Code Quality Checks](#8-code-quality-checks)
9. [Import Validation](#9-import-validation)
10. [Test Quality Checks](#10-test-quality-checks)
11. [Fixture Analysis](#11-fixture-analysis)
12. [Constants Enforcement](#12-constants-enforcement)
13. [External Tool Integration](#13-external-tool-integration)
---
## 1. GPU Efficiency Checks
**Category**: `gpu_efficiency`, `gpu_underutilization`, `gpu_synchronization`, `cpu_transfer`
Ensures code is optimized for the H100 GPU with 95 GB of memory.
### Pattern-Based Checks
| Check | Message | Category |
|-------|---------|----------|
| `device = "cpu"` | Forcing CPU device - use cuda | `gpu_efficiency` |
| `for...to(device)` | Moving to device in loop - batch transfer | `gpu_efficiency` |
| `.cuda()...for` | GPU transfer in loop - vectorize first | `gpu_efficiency` |
| `batch_size = N` (N < 100) | Small batch size - use >=256 for H100 | `gpu_efficiency` |
| `BATCH_SIZE = N` (N < 100) | Small batch size constant - use >=256 for H100 | `gpu_efficiency` |
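The pattern-based checks above are plain regex scans over the source text. A minimal sketch of how two of them might work, assuming a simplified rule set (the exact pattern strings and the `H100_MIN_BATCH` constant are illustrative, not the checker's real internals):

```python
import re

# Assumed recommended minimum for H100; the doc's tables use 256.
H100_MIN_BATCH = 256

def check_gpu_patterns(source: str) -> list[str]:
    """Return human-readable findings for two GPU-efficiency anti-patterns."""
    findings = []
    # Forcing CPU device: device = "cpu" or device = 'cpu'
    if re.search(r'device\s*=\s*["\']cpu["\']', source):
        findings.append("Forcing CPU device - use cuda")
    # Small batch size assignments, lowercase or constant form
    for m in re.finditer(r'(?:batch_size|BATCH_SIZE)\s*=\s*(\d+)', source):
        if int(m.group(1)) < 100:
            findings.append(
                f"Small batch size {m.group(1)} - use >={H100_MIN_BATCH} for H100"
            )
    return findings
```

For example, `check_gpu_patterns('device = "cpu"')` reports the CPU-device finding, while `check_gpu_patterns("batch_size = 512")` returns an empty list.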
### AST-Based Checks (GPUUtilizationAnalyzer)
| Check | Description | Category |
|-------|-------------|----------|
| Small batch size assignment | Batch sizes < 128 flagged for H100 | `gpu_underutilization` |
| `.item()` in loop | Forces GPU sync, causes stalls | `gpu_synchronization` |
| `.cpu()` transfer | Unnecessary CPU transfer detected | `cpu_transfer` |
| `.to("cpu")` | Transfer to CPU detected | `cpu_transfer` |
| No batch size config | Training file without batch size configuration | `gpu_underutilization` |
| DataLoader missing `pin_memory` | Should use pin_memory=True for faster GPU transfer | `gpu_efficiency` |
---
## 2. Training Issues
**Category**: `training_issues`
Detects common mistakes in training loops and gradient handling.
### Pattern-Based Checks
| Check | Message | Category |
|-------|---------|----------|
| Multiple `.backward()` | Multiple backward passes - usually wrong | `training_issues` |
| `optimizer.step()` without `zero_grad` | step() without zero_grad nearby | `training_issues` |
| `loss.item()...backward()` | Getting item before backward - breaks graph | `training_issues` |
| `model.train()...model.eval()` | Mixing train/eval in same scope | `training_issues` |
| `torch.no_grad...backward` | Backward in no_grad context | `training_issues` |
| `dropout...eval()` | Dropout active during eval | `training_issues` |
| `DataLoader...num_workers=0` | Single-threaded data loading - use workers | `training_issues` |
| `for...in dataset` (no DataLoader) | Iterating dataset without DataLoader | `training_issues` |
### AST-Based Checks (MLFunctionAnalyzer)
| Check | Description | Category |
|-------|-------------|----------|
| Training loop without `zero_grad()` | Missing gradient clearing | `training_issues` |
| Training loop without `step()` | Missing optimizer weight update | `training_issues` |
| `optimizer.step()` without nearby `zero_grad()` | step() called without clearing gradients | `training_issues` |
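As a sketch of how an AST-based analyzer in the spirit of MLFunctionAnalyzer could flag a training loop that calls `.backward()` but never `.zero_grad()` (the heuristic below is a deliberate simplification, not the analyzer's actual logic):

```python
import ast

def missing_zero_grad(source: str) -> bool:
    """Heuristic: flag a for-loop that calls .backward() but never .zero_grad()."""
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if isinstance(node, ast.For):
            # Collect attribute-call names inside the loop body
            calls = {
                n.func.attr
                for n in ast.walk(node)
                if isinstance(n, ast.Call) and isinstance(n.func, ast.Attribute)
            }
            if "backward" in calls and "zero_grad" not in calls:
                return True
    return False
```

A loop containing `loss.backward()` and `opt.step()` but no `opt.zero_grad()` would be flagged; inserting the `zero_grad()` call clears the finding.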
---
## 3. Loss Function Checks
**Category**: `loss_issues`, `semantic_error`, `loss_routing`
Ensures loss functions compute real values, not placeholders.
### Pattern-Based Checks
| Check | Message | Category |
|-------|---------|----------|
| `loss.*return N.N` | Hardcoded loss value - compute from inputs | `loss_issues` |
| `return N.N #.*loss` | Hardcoded loss value - compute from inputs | `loss_issues` |
| `loss = N.N` | Hardcoded loss assignment | `loss_issues` |
| `MSELoss.*+.*CrossEntropy` | Mixing incompatible losses | `loss_issues` |
| `torch.log()` without epsilon | Log without stability epsilon | `loss_issues` |
| `torch.sqrt()` without epsilon | Sqrt without stability epsilon | `loss_issues` |
| `1 / torch` | Division without stability check | `loss_issues` |
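The stability checks flag expressions like `torch.log(x)` where `x` can reach zero. The conventional fix they point toward is an additive epsilon, shown here in plain Python terms (the `EPSILON` value is an illustrative choice, not a value the checker mandates):

```python
import math

EPSILON = 1e-8  # small constant keeping log/sqrt away from their singularities

def stable_log(x: float) -> float:
    """log(x + eps) stays finite as x -> 0, unlike bare log(x)."""
    return math.log(x + EPSILON)
```

`stable_log(0.0)` returns a large negative number instead of raising a domain error, which is the behavior these checks are steering code toward.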
### AST-Based Checks (MLFunctionAnalyzer, LossRoutingAnalyzer)
| Check | Description | Category |
|-------|-------------|----------|
| Loss function returns constant | Loss function returns a literal value | `semantic_error` |
| Loss function ignores inputs | Critical parameters like `input`, `target`, `pred` not used | `semantic_error` |
| Wrong loss routing | Loss type doesn't match specification for view_kind | `loss_routing` |
---
## 4. Evaluation Mode Checks
**Category**: `eval_issues`
Ensures models are in proper evaluation mode during inference.
### Pattern-Based Checks
| Check | Message | Category |
|-------|---------|----------|
| `@torch.no_grad` not on eval function | no_grad decorator not on eval function | `eval_issues` |
| `accuracy =...item() / len` | Computing metrics with .item() in loop | `eval_issues` |
| `torch.mean...dim=None` | Mean without dimension - specify dim | `eval_issues` |
### AST-Based Checks (EvalModeChecker)
| Check | Description | Category |
|-------|-------------|----------|
| Model inference without `.eval()` | Model called without eval() in scope | `eval_issues` |
**Smart Detection Features:**
- Recognizes training contexts (won't flag during training)
- Tracks `.eval()` calls within function scope
- Excludes non-ML generators (password generators, data factories, etc.)
- Recognizes methods that handle eval internally (e.g., `SentenceTransformer.encode()`)
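A toy version of the scope-tracking idea behind EvalModeChecker might look like this, assuming a single known model variable name and none of the smart-detection exclusions listed above:

```python
import ast

def flags_missing_eval(source: str, model_name: str = "model") -> bool:
    """Heuristic: model is called directly in a function that never calls .eval()."""
    tree = ast.parse(source)
    for fn in ast.walk(tree):
        if not isinstance(fn, ast.FunctionDef):
            continue
        called_directly = any(
            isinstance(n, ast.Call)
            and isinstance(n.func, ast.Name)
            and n.func.id == model_name
            for n in ast.walk(fn)
        )
        sets_eval = any(
            isinstance(n, ast.Call)
            and isinstance(n.func, ast.Attribute)
            and n.func.attr == "eval"
            for n in ast.walk(fn)
        )
        if called_directly and not sets_eval:
            return True
    return False
```

A `def infer(model, x): return model(x)` body is flagged; adding `model.eval()` inside the function satisfies the check.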
---
## 5. PyTorch In-Place Operation Detection
**Category**: `pytorch_inplace`
Detects in-place tensor operations that may break autograd during backward passes.
### Detection Rules (InPlaceOperationAnalyzer)
| Rule | Check | Description |
|------|-------|-------------|
| **Rule 1** | Augmented assignments | `+=`, `-=`, `*=`, `/=`, `//=`, `**=` on tensors |
| **Rule 2** | Underscore methods | `.add_()`, `.mul_()`, `.zero_()`, `.fill_()`, `.clamp_()`, etc. |
| **Rule 3** | Buffer modification in `forward()` | `self.buffer[idx] = value` or `self.cache = value` |
| **Rule 4** | `.to()` without `.clone()` | In loss functions, `.to()` may return a view |
**Smart Detection Features:**
- Ignores operations inside `torch.no_grad()` and `torch.inference_mode()` contexts
- Recognizes `.to().clone()` chains as safe pattern
- Distinguishes integer/float counters from tensors
- Tracks variable initializations for type inference
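Rule 1 plus the type-inference feature can be sketched together: track which names were initialized from tensor factories, then flag augmented assignments on those names only (the `TENSOR_HINTS` set is an assumed, abbreviated stand-in for the analyzer's real type inference):

```python
import ast

# Assumed subset of tensor-producing factory calls
TENSOR_HINTS = {"torch.zeros", "torch.ones", "torch.randn", "torch.tensor"}

def find_augmented_tensor_ops(source: str) -> list[int]:
    """Line numbers of +=, *=, etc. applied to names initialized from torch factories."""
    tree = ast.parse(source)
    tensor_vars = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Assign) and isinstance(node.value, ast.Call):
            if ast.unparse(node.value.func) in TENSOR_HINTS:
                for tgt in node.targets:
                    if isinstance(tgt, ast.Name):
                        tensor_vars.add(tgt.id)
    return [
        node.lineno
        for node in ast.walk(tree)
        if isinstance(node, ast.AugAssign)
        and isinstance(node.target, ast.Name)
        and node.target.id in tensor_vars
    ]
```

This is how an integer counter's `count += 1` escapes the check while `x += 1` on a tensor does not.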
---
## 6. Performance & CPU Bottleneck Checks
**Category**: `performance`, `cpu_bottleneck`
Detects CPU-bound operations that bottleneck GPU training.
### AST-Based Checks (DataPipelineAnalyzer, PairResolverAnalyzer)
| Check | Description | Category |
|-------|-------------|----------|
| Sequential data processing | Class processes data in loops without parallelization | `cpu_bottleneck` |
| DataLoader with `num_workers=0` | Single-threaded data loading | `cpu_bottleneck` |
| DataLoader without `num_workers` | Missing num_workers specification | `cpu_bottleneck` |
| PairResolver sequential processing | PairResolver uses sequential loops without multiprocessing | `performance` |
### AST-Based Checks (TensorOperationAnalyzer)
| Check | Description | Category |
|-------|-------------|----------|
| `.item()` in loop | GPU synchronization in loop | `performance` |
| Building tensor in loop | Creating tensors in loop - use vectorized ops | `performance` |
---
## 7. File I/O Bottleneck Detection
**Category**: `io_bottleneck`, `hot_path_io`, `loop_io`
Detects file I/O operations in performance-critical paths.
### Detected Operations (FileIOAnalyzer)
- **Built-in**: `open()`, `read()`, `write()`, `close()`
- **Path**: `read_text()`, `write_text()`, `read_bytes()`, `write_bytes()`
- **JSON**: `json.dump()`, `json.load()` (not `dumps`/`loads`)
- **Pickle**: `pickle.dump()`, `pickle.load()`
- **PyTorch**: `torch.save()`, `torch.load()`, `save_checkpoint()`
- **NumPy**: `np.save()`, `np.load()`, `np.savetxt()`, `np.loadtxt()`
- **Pandas**: `to_csv()`, `read_csv()`, `to_hdf()`, `read_hdf()`
### Context-Aware Detection
| Context | Severity | Category |
|---------|----------|----------|
| I/O in loop within hot path | Critical | `io_bottleneck` |
| I/O directly in hot path | Severe | `hot_path_io` |
| I/O in any loop | Warning | `loop_io` |
**Hot Path Functions**: `train`, `training_step`, `train_epoch`, `fit`, `forward`, `__call__`, `process_batch`, `collate_fn`, `__getitem__`, `__iter__`, `__next__`
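The hot-path detection combines the function-name list above with an I/O call set. A reduced sketch, assuming a much smaller `IO_CALLS` set than the operations actually detected:

```python
import ast

HOT_PATH_FUNCS = {"train", "training_step", "train_epoch", "fit", "forward",
                  "__call__", "process_batch", "collate_fn", "__getitem__",
                  "__iter__", "__next__"}
# Assumed abbreviated stand-in for the full detected-operations list
IO_CALLS = {"open", "load", "save", "read", "write"}

def hot_path_io(source: str) -> list[str]:
    """Names of hot-path functions whose bodies perform I/O calls."""
    tree = ast.parse(source)
    hits = []
    for fn in ast.walk(tree):
        if isinstance(fn, ast.FunctionDef) and fn.name in HOT_PATH_FUNCS:
            for node in ast.walk(fn):
                if isinstance(node, ast.Call):
                    name = (node.func.id if isinstance(node.func, ast.Name)
                            else getattr(node.func, "attr", ""))
                    if name in IO_CALLS:
                        hits.append(fn.name)
                        break
    return hits
```

A `torch.save()` inside `forward()` is reported; the same call in a non-hot-path helper is not.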
---
## 8. Code Quality Checks
**Category**: `placeholders`, `debug_code`, `aliasing`, `backward_compatibility`, `duplicate_class`, `naming_collision`, `dead_code`, `unreachable_code`, `magic_number`
### Pattern-Based Checks (Placeholders & Debug)
| Check | Message | Category |
|-------|---------|----------|
| `raise NotImplementedError` | Placeholder implementation | `placeholders` |
| `pass # TODO` | TODO with pass statement | `placeholders` |
| `# FIXME` | FIXME comment - address before deployment | `placeholders` |
| `# HACK` | HACK comment - fix properly | `placeholders` |
| `# XXX` | XXX marker - needs attention | `placeholders` |
| `torch.randn...#...placeholder` | Random tensor marked as placeholder | `placeholders` |
| `return torch.randn` | Returning random tensor - likely placeholder | `placeholders` |
| `# TODO` | TODO comment - complete before deployment | `placeholders` |
| `# Placeholder` | Placeholder comment found | `placeholders` |
| `# Simplified` | Simplified/temporary implementation | `placeholders` |
| `breakpoint()` | Breakpoint left in code | `debug_code` |
| `import pdb` | Debugger import left in code | `debug_code` |
| `print(.*debug` | Debug print statement | `debug_code` |
### Pattern-Based Checks (Aliasing & Compatibility)
| Check | Message | Category |
|-------|---------|----------|
| `= # alias` | Variable aliasing detected | `aliasing` |
| `ALIAS =` | ALIAS constant detected | `aliasing` |
| `# for backward compat` | Backward compatibility - upgrade instead | `backward_compatibility` |
| `# maintain backward compat` | Backward compatibility - upgrade instead | `backward_compatibility` |
### AST-Based Checks (Cross-File Analysis)
| Check | Description | Category |
|-------|-------------|----------|
| Duplicate class definitions | Same class defined in multiple files | `duplicate_class` |
| Naming collisions | Class name aliased to different class | `naming_collision` |
| Dead code detection | Methods never called (production or test-only) | `dead_code` |
| Unreachable code | Code after return or always-false conditions | `unreachable_code` |
| Duplicate conditions | Same condition in if-elif chain | `unreachable_code` |
### Magic Number Detection (MagicNumberAnalyzer)
| Context | Description | Category |
|---------|-------------|----------|
| ML layer definitions | `Linear`, `Conv2d`, `Dropout`, etc. | `magic_number` |
| Optimizer parameters | `lr`, `weight_decay`, `momentum` | `magic_number` |
| Comparison thresholds | Probability/score comparisons | `magic_number` |
| Scaling factors | Multipliers > 10 | `magic_number` |
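The optimizer-parameter context from the table can be sketched as an AST walk over keyword arguments, assuming the three parameter names listed above are the whole rule:

```python
import ast

OPTIMIZER_PARAMS = {"lr", "weight_decay", "momentum"}

def magic_optimizer_literals(source: str) -> list[str]:
    """Optimizer keyword arguments passed as bare numeric literals."""
    tree = ast.parse(source)
    return [
        kw.arg
        for node in ast.walk(tree)
        if isinstance(node, ast.Call)
        for kw in node.keywords
        if kw.arg in OPTIMIZER_PARAMS
        and isinstance(kw.value, ast.Constant)
        and isinstance(kw.value.value, (int, float))
    ]
```

`SGD(p, lr=0.01, momentum=0.9)` yields two findings, while `SGD(p, lr=LEARNING_RATE)` passes because the value is a named constant, not a literal.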
---
## 9. Import Validation
**Category**: `import_location`, `import_error`
### Import Location Checks (ImportLocationAnalyzer)
| Check | Description | Category |
|-------|-------------|----------|
| Import inside function | Imports should be at module level | `import_location` |
| Import inside loop | Imports inside for/while loops | `import_location` |
| Import inside if block | Conditional imports (except platform checks) | `import_location` |
| Import inside with block | Imports in context managers | `import_location` |
**Exceptions:**
- Platform/version checks (`sys.platform`, `sys.version_info`)
- Optional import patterns (`try/except ImportError`)
- Lazy import patterns (`if var is None: import`)
- Test files (function-level imports allowed)
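The first check plus the `try/except ImportError` exception can be sketched like this (only the one exception is modeled; platform checks, lazy-import patterns, and the test-file allowance are omitted):

```python
import ast

def function_level_imports(source: str) -> list[str]:
    """Modules imported inside functions, excluding try/except ImportError fallbacks."""
    tree = ast.parse(source)
    optional = set()
    # First pass: collect imports guarded by an ImportError handler
    for node in ast.walk(tree):
        if isinstance(node, ast.Try) and any(
            isinstance(h.type, ast.Name) and h.type.id == "ImportError"
            for h in node.handlers
        ):
            for sub in ast.walk(node):
                if isinstance(sub, ast.Import):
                    optional.update(a.name for a in sub.names)
    # Second pass: report unguarded imports inside function bodies
    offenders = []
    for fn in ast.walk(tree):
        if isinstance(fn, ast.FunctionDef):
            for node in ast.walk(fn):
                if isinstance(node, ast.Import):
                    offenders += [a.name for a in node.names if a.name not in optional]
    return offenders
```

A bare `import json` inside a function is reported, but the same import wrapped in `try/except ImportError` is treated as an optional-dependency pattern.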
### Import Resolution Checks (ImportValidator)
| Check | Description | Category |
|-------|-------------|----------|
| Cannot import name | Name not found in module | `import_error` |
| Test file importing prod constants | Test-specific constants from prod module | `import_error` |
| Production file importing test module | Prod code importing from test modules | `import_error` |
---
## 10. Test Quality Checks
**Category**: `test_issues`, `test_quality`
### Pattern-Based Checks
| Check | Message | Category |
|-------|---------|----------|
| `from unittest.mock` | Mock forbidden - use real models on H100 | `test_issues` |
| `@patch(` | Patch decorator forbidden - use real implementations | `test_issues` |
| `MagicMock(` | MagicMock forbidden - use real objects | `test_issues` |
| `torch.randn(` | Random tensors in test - use real data | `test_issues` |
| `np.random.randn` | Unseeded random data in test | `test_issues` |
| `assert True` | Meaningless assertion | `test_issues` |
| `except...pass` | Swallowing exceptions in test | `test_issues` |
### AST-Based Checks (TestQualityAnalyzer)
| Check | Description | Category |
|-------|-------------|----------|
| Test without assertions | `test_*` function has no assert/pytest.raises/unittest.assert | `test_quality` |
| Assert True | `assert True` is meaningless | `test_quality` |
---
## 11. Fixture Analysis
**Category**: `unused_fixture`, `duplicate_setup`, `fixture_antipattern`, `forbidden_direct_creation`, `missing_fixture_imports`, `undefined_fixture`
### Cross-File Fixture Checks (FixtureAnalyzer)
| Check | Description | Category |
|-------|-------------|----------|
| Unused fixtures | Fixtures defined but never used | `unused_fixture` |
| Duplicate setup code | Similar setup code in multiple tests | `duplicate_setup` |
| Manual tempfile usage | Use pytest's `tmp_path` fixture instead | `fixture_antipattern` |
| Manual patch usage | Consider fixtures for repeated mocking | `fixture_antipattern` |
| Manual database connections | Use fixture with proper cleanup | `fixture_antipattern` |
| Manual client initialization | Use fixture for consistent setup | `fixture_antipattern` |
| try/finally cleanup | Use fixture with yield for cleanup | `fixture_antipattern` |
### Forbidden Direct Creation
| Class | Suggestion | Category |
|-------|------------|----------|
| `MeasurementSystem()` | Use untrained_model or fresh_model fixture | `forbidden_direct_creation` |
| `UnlabeledTrainer()` | Use trainer_factory or fresh_trainer fixture | `forbidden_direct_creation` |
| `ModelTrainer()` | Use trainer fixtures instead | `forbidden_direct_creation` |
| `NeuralNetwork()` | Use model fixtures instead | `forbidden_direct_creation` |
| `DeepLearningModel()` | Use model fixtures instead | `forbidden_direct_creation` |
### Fixture Import Checks
| Check | Description | Category |
|-------|-------------|----------|
| Missing fixture imports | Fixtures defined but not imported in conftest.py | `missing_fixture_imports` |
| Undefined fixtures | Fixtures used but not defined anywhere | `undefined_fixture` |
---
## 12. Constants Enforcement
**Category**: `constants_enforcement`
Ensures magic numbers are defined in designated constants files.
### Rules (ConstantsEnforcer)
| Rule | Description |
|------|-------------|
| Allowed files | `constants.py`, `test_constants.py` only |
| Universally allowed | -1, 0, 1, 2, True, False, None |
| Allowed contexts | `range()`, `enumerate()`, `len()`, array indexing, slicing |
| Requires constant | All other numeric literals |
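The allowed-files and universally-allowed-values rules can be sketched directly; the allowed-contexts rule (`range()`, `len()`, indexing, slicing) is omitted here to keep the sketch small:

```python
import ast

ALLOWED_VALUES = {-1, 0, 1, 2, True, False, None}

def magic_literals(source: str, filename: str = "model.py") -> list[object]:
    """Numeric literals outside the constants files that need a named constant."""
    if filename in ("constants.py", "test_constants.py"):
        return []  # constants files may contain any literal
    tree = ast.parse(source)
    return [
        node.value
        for node in ast.walk(tree)
        if isinstance(node, ast.Constant)
        and isinstance(node.value, (int, float))
        and not isinstance(node.value, bool)
        and node.value not in ALLOWED_VALUES
    ]
```

`x = 0.5` in a model file is reported, while `x = 1` and any literal in `constants.py` pass.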
### Error Messages
| Context | Example Hint |
|---------|--------------|
| Floats 0-1 | `DROPOUT_RATE`, `LEARNING_RATE`, `PROBABILITY_THRESHOLD` |
| Large floats | `MAX_ITERATIONS`, `BUFFER_SIZE`, `TIMEOUT_MS` |
| Small floats | `EPSILON`, `THRESHOLD`, `SCALING_FACTOR` |
| Large integers | `BATCH_SIZE`, `NUM_EPOCHS`, `HIDDEN_DIM` |
| Negative integers | `ERROR_CODE`, `INVALID_INDEX`, `PENALTY` |
| Other integers | `MIN_THRESHOLD`, `MAX_RETRIES`, `NUM_LAYERS` |
---
## 13. External Tool Integration
### Ruff Code Quality (RuffChecker)
**Category**: `code_quality`
| Check | Description |
|-------|-------------|
| Ruff linting | Runs `ruff check .` on project |
| Linting errors | Reports count and suggests `ruff check . --fix` |
**Note**: Ruff is optional. If not installed, preflight passes this check.
---
## Skip Mechanism
All checks support skip comments for legitimate exceptions:
```python
# preflight-skip: reason for skipping (min 8 chars)
code_to_skip()
```
### File-Level Skips
```python
# preflight: skip-magic-numbers (grandfathered code)
```
---
## Configuration
Checks can be enabled/disabled via `preflight.json`:
```json
{
  "project_type": "neural_network",
  "source_dirs": ["src"],
  "min_batch_size": 256,
  "ml_checks": {
    "check_tensor_operations": true,
    "check_data_loaders": true,
    "detect_unreachable": true,
    "detect_magic_numbers": true,
    "detect_duplicate_classes": true,
    "detect_naming_collisions": true,
    "detect_dead_code": true,
    "detect_unused_fixtures": true,
    "detect_duplicate_setup": true,
    "detect_fixture_factories": true,
    "detect_fixture_antipatterns": true,
    "detect_forbidden_direct_creation": true,
    "detect_missing_fixture_imports": true,
    "suggest_fixtures": false
  }
}
```
---
## Category Summary
| Category | Count | Severity |
|----------|-------|----------|
| `placeholders` | 10 patterns | Critical |
| `debug_code` | 3 patterns | Critical |
| `loss_issues` | 7+ patterns | High |
| `training_issues` | 10+ patterns | High |
| `gpu_efficiency` | 6+ patterns | Medium |
| `eval_issues` | 4+ patterns | Medium |
| `pytorch_inplace` | 4 rules | High |
| `performance` | 4+ checks | Medium |
| `io_bottleneck` | Context-aware | Medium |
| `test_issues` | 7 patterns | Medium |
| `code_quality` | Multiple | Low-Medium |
| `import_error` | 3 checks | Medium |
| `fixture_*` | 6+ checks | Low-Medium |
| `constants_enforcement` | 1 analyzer | Low |
---
*This document is auto-generated from source code analysis. For implementation details, see the analyzer source files in `src/preflight/analyzers/`.*