Created
January 21, 2026 08:28
Quarto Santander presentation
| --- | |
| title: "Hello Santander 👋" | |
| format: | |
|   revealjs: | |
|     mermaid: | |
|       theme: default | |
| --- | |
| # Who am I? | |
| * Iván Moreno ([ivan@ivan.build](mailto:ivan@ivan.build)) | |
| * (Ex) Lead ML Engineer at Nieve Consulting | |
| * PhD candidate in AI Safety at Universidad Internacional de Valencia | |
| * Full-time motorcycles & coffee nerd 🛵☕ | |
| # What do I do? | |
| * Recommender systems | |
| * Customer segmentation | |
| * Demand forecasting | |
| * Zero-knowledge model verification | |
| * Autonomous systems & agentic workloads | |
| * Unstructured data processing (text and image) | |
| * Conversational AI | |
| * MLOps & LLMOps strategy and implementation | |
| # Academic Research 👨‍🔬 | |
| # No Answer Needed: Predicting LLM Answer Accuracy | |
| --- | |
|  | |
| --- | |
| ::: {.incremental} | |
| * **Idea:** do LLMs know whether they’re going to fail before generating a single token of the answer? | |
| * **Setup:** extract residual stream activations at the last token of the input. | |
| * **Methodology:** train a linear probe (classifier) to distinguish questions the model will answer correctly from those it will answer incorrectly. | |
| * **Results:** found a generalizable latent "correctness" direction encoded in the model's internals. | |
| * [arxiv.org/abs/2509.10625](https://arxiv.org/abs/2509.10625) | |
| ::: | |
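
The probing setup above can be sketched in a few lines. This is a minimal illustration, not the paper's code: the "activations" are synthetic Gaussian vectors labelled by a hidden direction (an assumption standing in for real residual stream activations), and the names `train_probe`, `probe_accuracy` and `make_synthetic_activations` are hypothetical.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_probe(X, y, lr=0.5, epochs=300):
    """Fit a linear probe (logistic regression) with batch gradient descent."""
    dim, n = len(X[0]), len(X)
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, label in zip(X, y):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - label
            for j in range(dim):
                grad_w[j] += err * x[j]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def probe_accuracy(w, b, X, y):
    """Fraction of examples where the probe's sign matches the label."""
    hits = 0
    for x, label in zip(X, y):
        pred = 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0
        hits += pred == label
    return hits / len(X)


def make_synthetic_activations(n, dim, direction, rng):
    """Assumption: Gaussian vectors labelled by a hidden 'correctness' direction."""
    X, y = [], []
    for _ in range(n):
        x = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        score = sum(d * xi for d, xi in zip(direction, x)) + rng.gauss(0.0, 0.3)
        X.append(x)
        y.append(1 if score > 0 else 0)  # 1 = "model will answer correctly"
    return X, y


rng = random.Random(0)
true_direction = [1.0, -1.0, 0.5, 0.0, 0.0, 0.0, 0.0, 0.0]  # hypothetical
X_train, y_train = make_synthetic_activations(200, 8, true_direction, rng)
X_test, y_test = make_synthetic_activations(100, 8, true_direction, rng)

w, b = train_probe(X_train, y_train)
print(f"held-out probe accuracy: {probe_accuracy(w, b, X_test, y_test):.2f}")
```

With a real model, `X_train` would hold the last-input-token activations of one layer and `y_train` whether the model's eventual answer was correct; the learned `w` then plays the role of the candidate "correctness" direction.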
| # Predictive Selection of Optimal Language for Reasoning Tasks in LLMs | |
| --- | |
|  | |
| --- | |
| ::: {.incremental} | |
| * **Motivation:** recent empirical evidence shows non-English languages can match or exceed English performance on specific reasoning tasks. | |
| * **Problem:** large gap between standard voting across languages and the theoretical "upper bound", in which a question counts as solved if the correct answer exists in at least one language. | |
| * **Idea:** predict the optimal language subset a priori by analyzing the activation geometry of a single forward pass. | |
| ::: | |
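
To make the gap concrete, here is a toy sketch comparing voting accuracy with the oracle upper bound. The questions, languages and answers are invented for illustration, the helper names are hypothetical, and plain majority voting is an assumption about what "standard voting" means.

```python
from collections import Counter


def majority_vote(answers):
    """Return the most frequent answer among the per-language answers."""
    return Counter(answers).most_common(1)[0][0]


def vote_and_oracle_accuracy(items):
    """Accuracy of majority voting vs. 'correct in at least one language'."""
    vote_hits = 0
    oracle_hits = 0
    for item in items:
        answers = list(item["answers"].values())
        if majority_vote(answers) == item["gold"]:
            vote_hits += 1
        if item["gold"] in answers:  # oracle: any language got it right
            oracle_hits += 1
    n = len(items)
    return vote_hits / n, oracle_hits / n


# Hypothetical toy data: one answer per language for each question.
toy_items = [
    {"gold": "12", "answers": {"en": "12", "es": "12", "zh": "15"}},
    {"gold": "7", "answers": {"en": "9", "es": "9", "zh": "7"}},
    {"gold": "3", "answers": {"en": "4", "es": "4", "zh": "4"}},
    {"gold": "20", "answers": {"en": "20", "es": "18", "zh": "20"}},
]

vote_acc, oracle_acc = vote_and_oracle_accuracy(toy_items)
print(f"majority vote: {vote_acc:.2f}, oracle upper bound: {oracle_acc:.2f}")
```

On the second question only one language is correct, so voting fails while the oracle succeeds; picking the right language subset ahead of time is what would close that gap.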
| --- | |
| ::: {.incremental} | |
| * **Hypothesis:** "representational diversity" (measured via activation similarity kernels) correlates with accessing different, complementary subsets of parametric knowledge, or alternative reasoning paths. | |
| * **Methodology:** analyze the geometry of activations using non-linear kernels (e.g., RBF) and algorithmically select the most diverse language subset for a given input (e.g., with a determinantal point process, DPP). | |
| ::: | |
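
A rough sketch of kernel-based diverse selection follows. It builds an RBF similarity kernel over per-language activation vectors and greedily grows the subset whose kernel submatrix has the largest determinant, a common heuristic in the spirit of determinantal point processes. Reading the selection step this way is an assumption, and the 2-D activation vectors and all function names are hypothetical.

```python
import math


def rbf_kernel(X, gamma=1.0):
    """Pairwise RBF similarities: K[i][j] = exp(-gamma * ||x_i - x_j||^2)."""
    n = len(X)
    K = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            sq_dist = sum((a - b) ** 2 for a, b in zip(X[i], X[j]))
            K[i][j] = math.exp(-gamma * sq_dist)
    return K


def det(M):
    """Determinant via Gaussian elimination with partial pivoting."""
    A = [row[:] for row in M]
    n = len(A)
    result = 1.0
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(A[r][c]))
        if abs(A[pivot][c]) < 1e-12:
            return 0.0
        if pivot != c:
            A[c], A[pivot] = A[pivot], A[c]
            result = -result
        result *= A[c][c]
        for r in range(c + 1, n):
            factor = A[r][c] / A[c][c]
            for k in range(c, n):
                A[r][k] -= factor * A[c][k]
    return result


def greedy_diverse_subset(K, k):
    """Greedily add the item that maximizes det of the selected submatrix."""
    selected = []
    remaining = list(range(len(K)))
    while len(selected) < k and remaining:
        def score(i):
            idx = selected + [i]
            return det([[K[a][b] for b in idx] for a in idx])
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical 2-D "activations" for one input, one vector per language.
languages = ["en", "es", "de", "zh", "sw"]
activations = [[0.0, 0.0], [0.1, 0.0], [0.05, 0.1], [2.0, 0.0], [0.0, 1.5]]

K = rbf_kernel(activations, gamma=1.0)
chosen = [languages[i] for i in greedy_diverse_subset(K, k=3)]
print(chosen)
```

Near-duplicate vectors (here `en`, `es`, `de`) make the submatrix almost singular, so the greedy step prefers languages whose activations lie far from those already selected.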
| # Industry 👨‍🔧 | |
| # Text-to-object pipeline for CRM SaaS | |
| ## Challenge | |
| * Custom query language in two variants, with no formal specification and no common representation | |
| * Single-tenant hybrid (on-premise, multi-cloud) deployments | |
| * Unique, large & evolving data schemas per tenant | |
| * Multiple languages, geographies and industries with unique terminology & knowledge requirements | |
| ## Stack | |
| * **State machine:** Apache Burr | |
| * **LLMOps (prompt versioning, tracing, experiment tracking, feedback collection):** Langfuse | |
| * **AI Gateway:** LiteLLM | |
| * **Structured generation:** Pydantic + Instructor | |
| * **Evaluation Framework:** DeepEval | |
| * **PII Redaction:** Presidio | |
| * **Schema processing:** Celery + Apache Hamilton | |
| --- | |
| ::: {.r-stretch} | |
|  | |
| ::: | |
| --- | |
|  |