WindBorne Systems - MLOps Engineering Submission

Candidate: Pranav Tanaji Ghadge

1. Complex ML Model Inference

The most complex inference system I’ve managed was a real-time demand forecasting pipeline designed to serve enterprise-scale transactional data with a sub-200ms latency SLA. The core model was a gradient-boosted ensemble trained on roughly 1.5TB of data (60M+ records).

The Stack: I used AWS SageMaker for model hosting, S3 as the primary feature store, and Airflow to orchestrate micro-batch feature updates. This allowed us to maintain prediction freshness without the overhead of heavy real-time feature engineering.
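A minimal sketch of that micro-batch orchestration, assuming the Airflow TaskFlow API; the S3 paths, column names, and lag-feature logic here are placeholders rather than the production job:

```python
# Hypothetical micro-batch feature refresh DAG (Airflow TaskFlow API).
# Paths, table columns, and feature logic are illustrative only.
from datetime import datetime, timedelta

import pandas as pd
from airflow.decorators import dag, task


@dag(
    dag_id="feature_micro_batch_refresh",
    schedule="*/15 * * * *",          # refresh serving features every 15 minutes
    start_date=datetime(2024, 1, 1),
    catchup=False,
    default_args={"retries": 2, "retry_delay": timedelta(minutes=2)},
)
def feature_micro_batch_refresh():
    @task
    def extract_recent_transactions() -> str:
        # Pull only the most recent window of raw transactions (placeholder path).
        df = pd.read_parquet("s3://example-bucket/raw/transactions/latest.parquet")
        staged = "s3://example-bucket/staging/transactions.parquet"
        df.to_parquet(staged)
        return staged

    @task
    def materialize_features(staged_path: str) -> None:
        # Recompute lag/rolling features and overwrite the serving snapshot in S3,
        # so the online endpoint never has to build features at request time.
        df = pd.read_parquet(staged_path).sort_values(["sku_id", "date"])
        df["demand_lag_1d"] = df.groupby("sku_id")["demand"].shift(1)
        df["demand_roll_7d"] = (
            df.groupby("sku_id")["demand"].transform(lambda s: s.rolling(7).mean())
        )
        df.to_parquet("s3://example-bucket/features/serving_snapshot.parquet")

    materialize_features(extract_recent_transactions())


feature_micro_batch_refresh()
```

Running the feature build on a short cadence like this is what keeps predictions fresh while the request path only performs a cache or S3 lookup.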

Key Optimizations:

  • Feature Pruning: By identifying and dropping redundant lag features, I reduced the inference payload and cut total inference time by ~22%.
  • Pre-materialization: I decoupled feature freshness from request latency by pre-materializing feature vectors in a low-latency cache, ensuring the model didn't have to wait on upstream data fetches.
  • Environment Tuning: I diagnosed and fixed a container misconfiguration where the model was being reloaded into memory on every single request. Resolving this removed a 60-80ms bottleneck per call (the corrected loading pattern is sketched after this list). It was a clear reminder that in production, the bottleneck is often the infrastructure surrounding the model, not the model's math.
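
The "load once, serve many" pattern looks roughly like the following. It is shown here as a FastAPI-style handler purely for illustration; the actual fix was in the SageMaker serving container configuration, and the model path and route names are hypothetical:

```python
# Minimal sketch of loading the model once per worker instead of per request.
# The anti-pattern was deserializing the ensemble inside the request handler,
# which added 60-80 ms to every call.
from functools import lru_cache

import joblib
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()


class PredictRequest(BaseModel):
    features: list[float]


@lru_cache(maxsize=1)
def get_model():
    # Deserialize the gradient-boosted ensemble exactly once per worker process.
    return joblib.load("/opt/ml/model/model.joblib")


@app.on_event("startup")
def warm_model() -> None:
    # Pay the load cost at container startup, not on the first live request.
    get_model()


@app.post("/invocations")
def predict(req: PredictRequest) -> dict:
    model = get_model()  # cached; no per-request reload
    score = float(model.predict([req.features])[0])
    return {"prediction": score}
```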

2. Production Pain & Resilience

My most significant production "pain" occurred when a pricing model began serving stale predictions to live customers for over four hours. There were no hard errors or 5xx status codes; the system just served plausible-looking but incorrect data. The root cause was an upstream partner feed providing malformed records that bypassed our schema validation but silently corrupted our feature distributions.

The Solution: To prevent this from recurring, I implemented a two-layer defense:

  1. Statistical Guardrails: I added ingestion-layer checks that compared incoming feature distributions against a 7-day rolling baseline. If the variance exceeded a specific threshold, it triggered an immediate alert.
  2. Shadow Validation: I deployed a lightweight reference model in parallel. This acted as a "sanity check," flagging an alert whenever the primary model's outputs drifted outside a statistically expected range (both checks are sketched below this list).
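
A simplified sketch of both layers follows. The feature names, thresholds, and alerting hook are placeholders; the production version compared incoming batches against the 7-day rolling baseline described above:

```python
# Illustrative two-layer guardrail: distribution checks at ingestion plus a
# shadow-model divergence check. Thresholds and the alert hook are hypothetical.
import numpy as np
import pandas as pd


def check_feature_drift(
    incoming: pd.DataFrame,
    baseline: pd.DataFrame,
    z_threshold: float = 4.0,
) -> list[str]:
    """Layer 1: compare incoming feature means against the rolling baseline."""
    drifted = []
    for col in baseline.columns:
        mu, sigma = baseline[col].mean(), baseline[col].std()
        if sigma == 0:
            continue
        z = abs(incoming[col].mean() - mu) / sigma
        if z > z_threshold:
            drifted.append(col)
    return drifted


def check_shadow_divergence(
    primary_preds: np.ndarray,
    reference_preds: np.ndarray,
    rel_tolerance: float = 0.25,
) -> bool:
    """Layer 2: flag when the primary model drifts away from the reference model."""
    rel_gap = np.abs(primary_preds - reference_preds) / (np.abs(reference_preds) + 1e-9)
    return float(np.median(rel_gap)) > rel_tolerance


def guardrail(incoming, baseline, primary_preds, reference_preds, alert) -> None:
    # Run both layers on each ingestion micro-batch; `alert` is any paging hook.
    drifted = check_feature_drift(incoming, baseline)
    if drifted:
        alert(f"Feature drift detected in: {drifted}")
    if check_shadow_divergence(primary_preds, reference_preds):
        alert("Primary model diverging from shadow reference model")
```

The key design choice is that both checks run on data and outputs, not on HTTP status codes, which is exactly the class of failure the original incident exposed.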

Result: These changes reduced our Mean Time to Detection (MTTD) from 4 hours to under 30 minutes. More importantly, they shifted our team culture from reactive "firefighting" to a proactive stance in which we caught data-level failures before they reached customers.
