Candidate: Pranav Tanaji Ghadge
The most complex inference system I’ve managed was a real-time demand forecasting pipeline serving predictions over enterprise-scale transactional data under a sub-200ms latency SLA. The core model was a gradient-boosted ensemble trained on roughly 1.5TB of data (60M+ records).
The Stack: I used AWS SageMaker for model hosting, S3 as the primary feature store, and Airflow to orchestrate micro-batch feature updates. This allowed us to maintain prediction freshness without the overhead of heavy real-time feature engineering.
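To make the orchestration concrete, here is a minimal Airflow 2.x sketch of a micro-batch feature refresh. The task names, S3 paths, and 15-minute cadence are illustrative assumptions, not the production configuration.

```python
# Hypothetical micro-batch feature refresh DAG in Airflow 2.x style.
# Task names, S3 paths, and the 15-minute cadence are illustrative, not the real config.
from datetime import datetime, timedelta

from airflow.decorators import dag, task


@dag(
    schedule_interval=timedelta(minutes=15),   # micro-batch cadence instead of per-request work
    start_date=datetime(2024, 1, 1),
    catchup=False,
    default_args={"retries": 2, "retry_delay": timedelta(minutes=2)},
)
def feature_refresh():
    @task
    def extract_new_transactions() -> str:
        # Pull only records that landed since the last run and stage them in S3.
        return "s3://example-bucket/staging/new_transactions.parquet"  # placeholder path

    @task
    def build_feature_vectors(staging_path: str) -> str:
        # Recompute lag/rolling features for just the affected entities.
        return "s3://example-bucket/features/latest.parquet"  # placeholder path

    @task
    def publish_to_cache(feature_path: str) -> None:
        # Push the refreshed vectors to the low-latency cache the endpoint reads from.
        pass

    publish_to_cache(build_feature_vectors(extract_new_transactions()))


feature_refresh()
```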
Key Optimizations:
- Feature Pruning: By identifying and dropping redundant lag features, I reduced the inference payload and cut total inference time by ~22% (see the pruning sketch after this list).
- Pre-materialization: I decoupled feature freshness from request latency by pre-materializing feature vectors in a low-latency cache, so the model never had to wait on upstream data fetches at request time (see the cache-read sketch below).
- Environment Tuning: I diagnosed and fixed a container misconfiguration where the model was being reloaded into memory on every single request (see the load-once sketch below). Resolving this removed a 60-80ms bottleneck per call. It was a clear reminder that in production, the bottleneck is often the infrastructure surrounding the model, not the model's math.
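A minimal sketch of the pruning idea from the first bullet, using pairwise correlation between lag columns as one possible redundancy criterion; the 0.97 threshold and the column-naming convention are assumptions for illustration.

```python
# Sketch of correlation-based pruning of redundant lag features.
# The 0.97 threshold and the "_lag_" naming convention are assumptions for illustration.
import pandas as pd


def prune_redundant_lags(df: pd.DataFrame, threshold: float = 0.97) -> list[str]:
    """Return lag columns that are near-duplicates of an earlier, kept lag column."""
    lag_cols = [c for c in df.columns if "_lag_" in c]
    corr = df[lag_cols].corr().abs()
    to_drop: list[str] = []
    for i, col in enumerate(lag_cols):
        kept_earlier = [c for c in lag_cols[:i] if c not in to_drop]
        if any(corr.loc[col, kept] > threshold for kept in kept_earlier):
            to_drop.append(col)
    return to_drop


# Usage: features = features.drop(columns=prune_redundant_lags(features))
```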
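For the pre-materialization bullet, the request path reduces to a single cache lookup. The sketch below assumes Redis as the cache, an invented hostname, and a simple key scheme; the real store and serialization may differ.

```python
# Sketch of the request-time read path against a pre-materialized feature cache.
# Redis, the endpoint hostname, and the key scheme are assumptions for illustration.
import json
from typing import Optional

import redis

cache = redis.Redis(host="feature-cache.example.internal", port=6379)


def get_feature_vector(entity_id: str) -> Optional[dict]:
    """Fetch a pre-computed feature vector without touching upstream data sources."""
    raw = cache.get(f"features:{entity_id}")
    if raw is None:
        # Cache miss: return None (or a default vector) rather than blocking the
        # request on a slow upstream fetch and blowing the latency SLA.
        return None
    return json.loads(raw)
```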
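And for the environment-tuning bullet, the fix amounts to loading the model once per serving worker rather than once per request. The sketch uses the SageMaker framework-container convention (model_fn / predict_fn) as an illustration; the exact hook names depend on the serving stack.

```python
# Sketch of the fix: deserialize the model once per serving worker, not per request.
# Shown in the SageMaker framework-container style (model_fn / predict_fn); the exact
# hook names depend on the serving stack, so treat this as illustrative.
import os

import joblib


def model_fn(model_dir: str):
    # Called once when the worker starts; the returned object is reused for every request.
    return joblib.load(os.path.join(model_dir, "model.joblib"))


def predict_fn(input_data, model):
    # The misconfigured version effectively re-loaded the artifact here on every call,
    # which is where the 60-80ms per-request penalty came from.
    return model.predict(input_data)
```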
My most significant production "pain" occurred when a pricing model began serving stale predictions to live customers for over four hours. There were no hard errors or 5xx status codes; the system just served plausible-looking but incorrect data. The root cause was an upstream partner feed providing malformed records that bypassed our schema validation but silently corrupted our feature distributions.
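To make that failure mode concrete: a record can be perfectly valid at the type level and still poison a feature distribution. A toy illustration with invented field names:

```python
# Toy illustration with invented field names: both records pass a type-level schema
# check, but the second one, arriving at scale, silently drags the price distribution
# toward zero and corrupts the features derived from it.
def passes_schema(rec: dict) -> bool:
    return (isinstance(rec["sku"], str)
            and isinstance(rec["unit_price"], float)
            and isinstance(rec["qty"], int))


good_record = {"sku": "A123", "unit_price": 24.99, "qty": 3}
bad_record = {"sku": "A124", "unit_price": 0.0, "qty": 3}  # partner feed's silent default

assert passes_schema(good_record) and passes_schema(bad_record)
```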
The Solution: To prevent this from recurring, I implemented a two-layer defense:
- Statistical Guardrails: I added ingestion-layer checks that compared incoming feature distributions against a 7-day rolling baseline. If the deviation from that baseline exceeded a set threshold, it triggered an immediate alert (see the guardrail sketch after this list).
- Shadow Validation: I deployed a lightweight reference model in parallel as a "sanity check," flagging any case where the primary model’s outputs drifted outside the statistically plausible range implied by the reference (see the shadow-check sketch below).
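A minimal sketch of the first layer, the ingestion guardrail. The two-sample KS test and the p-value cutoff are illustrative stand-ins for the actual thresholding logic.

```python
# Sketch of the ingestion-layer guardrail: compare an incoming feature batch against
# a 7-day rolling baseline. The two-sample KS test and the 0.01 p-value cutoff are
# illustrative stand-ins for the actual thresholding logic.
import numpy as np
from scipy import stats


def distribution_alert(incoming: np.ndarray, baseline_7d: np.ndarray,
                       p_cutoff: float = 0.01) -> bool:
    """Return True when the incoming batch no longer looks drawn from the baseline."""
    return stats.ks_2samp(incoming, baseline_7d).pvalue < p_cutoff


# Usage (hypothetical column and alerting hook):
# if distribution_alert(batch["unit_price"].to_numpy(), baseline["unit_price"].to_numpy()):
#     alert_on_call("unit_price distribution drifted from its 7-day baseline")
```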
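And a sketch of the second layer, the shadow comparison against the reference model. The relative-gap threshold and the 5% traffic alert level are assumptions for illustration.

```python
# Sketch of the shadow check: a lightweight reference model scores the same inputs,
# and predictions that disagree too much with it are flagged. The 25% relative-gap
# threshold and the 5% alert level are assumptions for illustration.
import numpy as np


def shadow_flags(primary_pred: np.ndarray, reference_pred: np.ndarray,
                 max_rel_gap: float = 0.25) -> np.ndarray:
    """Boolean mask of predictions that sit implausibly far from the reference model."""
    denom = np.maximum(np.abs(reference_pred), 1e-6)  # guard against divide-by-zero
    rel_gap = np.abs(primary_pred - reference_pred) / denom
    return rel_gap > max_rel_gap


# Usage (hypothetical models):
# flags = shadow_flags(primary.predict(X), reference.predict(X))
# if flags.mean() > 0.05:  # more than 5% of traffic flagged
#     raise_alert("primary model output drifting from reference")
```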
Result: These changes reduced our Mean Time to Detection (MTTD) from 4 hours to under 30 minutes. More importantly, they shifted our team culture from reactive "firefighting" to a proactive stance where we caught data-level failures before they impacted the customer experience.