ivanvmoreno · January 21, 2026 08:28
diff --git a/gistfile1.txt b/gistfile1.txt
 ---
 title: "Hello Santander 👋"
 format: 
  revealjs:
    mermaid:
      theme: default
 ---

 # Who am I?

 * Iván Moreno ([ivan@ivan.build]())
 * (Ex) Lead ML Engineer at Nieve Consulting
 * PhD candidate in AI Safety at Universidad Internacional de Valencia
 * Full-time motorcycles & coffee nerd 🛵☕


 # What do I do?

 * Recommender systems
 * Customer segmentation
 * Demand forecasting
 * Zero-knowledge model verification
 * Autonomous systems & agentic workloads
 * Unstructured data processing (text and image)
 * Conversational AI
 * MLOps & LLMOps strategy and implementation

 # Academic Research 👨‍🔬

 # No Answer Needed: Predicting LLM Answer Accuracy

 ---

 ![](no_answer_needed.png)

 ---

 ::: {.incremental}

 * **Idea:** do LLMs know whether they’re going to fail before generating a single token of the answer?
 * **Setup:** extract residual stream activations at the last token of the input.
 * **Methodology:** linear probe (classifier) trained to predict between questions the model will answer correctly vs incorrectly.
 * **Results:** a generalizable latent "correctness" direction encoded in the model internals was found.
 * [arxiv.org/abs/2509.10625](https://arxiv.org/abs/2509.10625)

 :::

 # Predictive Selection of Optimal Language for Reasoning Tasks in LLMs

 ---

 ![](lsk.png)

 ---

 ::: {.incremental}

 * **Motivation:** recent empirical evidence shows non-English languages can match or exceed English performance on specific reasoning tasks.
 * **Problem:** drastic gap between standard voting and the theoretical "upper bound" where the correct answer exists in at least one language.
 * **Idea:** predict the optimal language subset a priori by analyzing the activation geometry of a single forward pass.

 :::

 ---

 ::: {.incremental}

 * **Hypothesis:** "representational diversity" (measured via activation similarity kernels) correlates with accessing different, complementary subsets of parametric knowledge, or alternative reasoning paths.
 * **Methodology:** analyze the geometry of activations using non-linear kernels (e.g., RBF) and algorithmically (e.g. DDP) select the most diverse language subset for a given input.

 :::

 # Industry 👨‍🔧

 # Text-to-object pipeline for CRM SaaS

 ## Challenge

 * Custom query language, 2 variations, no formal specification and no common representation
 * Single-tenant hybrid (on-premise, multi-cloud) deployments
 * Unique, large & evolving data schemas per tenant
 * Multiple languages, geographies and industries with unique terminology & knowledge requirements

 ## Stack

 * **State machine:** Apache Burr
 * **LLMOps (prompt versioning, tracing, experiment tracking, feedback collection):** Langfuse
 * **AI Gateway:** LiteLLM
 * **Structured generation:** Pydantic + Instructor
 * **Evaluation Framework:** DeepEval
 * **PII Redaction:** Presidio
 * **Schema processing:** Celery + Apache Hamilton

 ---

 ::: {.r-stretch}

 ![](c4.png)

 :::

 ---

 ![](seq.png)
	---
	title: "Hello Santander 👋"
	format:
	revealjs:
	mermaid:
	theme: default
	---

	# Who am I?

	* Iván Moreno ([ivan@ivan.build]())
	* (Ex) Lead ML Engineer at Nieve Consulting
	* PhD candidate in AI Safety at Universidad Internacional de Valencia
	* Full-time motorcycles & coffee nerd 🛵☕


	# What do I do?

	* Recommender systems
	* Customer segmentation
	* Demand forecasting
	* Zero-knowledge model verification
	* Autonomous systems & agentic workloads
	* Unstructured data processing (text and image)
	* Conversational AI
	* MLOps & LLMOps strategy and implementation

	# Academic Research 👨‍🔬

	# No Answer Needed: Predicting LLM Answer Accuracy

	---

	![](no_answer_needed.png)

	---

	::: {.incremental}

	* Idea: do LLMs know whether they’re going to fail before generating a single token of the answer?
	* Setup: extract residual stream activations at the last token of the input.
	* Methodology: linear probe (classifier) trained to predict between questions the model will answer correctly vs incorrectly.
	* Results: a generalizable latent "correctness" direction encoded in the model internals was found.
	* [arxiv.org/abs/2509.10625](https://arxiv.org/abs/2509.10625)

	:::

	# Predictive Selection of Optimal Language for Reasoning Tasks in LLMs

	---

	![](lsk.png)

	---

	::: {.incremental}

	* Motivation: recent empirical evidence shows non-English languages can match or exceed English performance on specific reasoning tasks.
	* Problem: drastic gap between standard voting and the theoretical "upper bound" where the correct answer exists in at least one language.
	* Idea: predict the optimal language subset a priori by analyzing the activation geometry of a single forward pass.

	:::

	---

	::: {.incremental}

	* Hypothesis: "representational diversity" (measured via activation similarity kernels) correlates with accessing different, complementary subsets of parametric knowledge, or alternative reasoning paths.
	* Methodology: analyze the geometry of activations using non-linear kernels (e.g., RBF) and algorithmically (e.g. DDP) select the most diverse language subset for a given input.

	:::

	# Industry 👨‍🔧

	# Text-to-object pipeline for CRM SaaS

	## Challenge

	* Custom query language, 2 variations, no formal specification and no common representation
	* Single-tenant hybrid (on-premise, multi-cloud) deployments
	* Unique, large & evolving data schemas per tenant
	* Multiple languages, geographies and industries with unique terminology & knowledge requirements

	## Stack

	* State machine: Apache Burr
	* LLMOps (prompt versioning, tracing, experiment tracking, feedback collection): Langfuse
	* AI Gateway: LiteLLM
	* Structured generation: Pydantic + Instructor
	* Evaluation Framework: DeepEval
	* PII Redaction: Presidio
	* Schema processing: Celery + Apache Hamilton

	---

	::: {.r-stretch}

	![](c4.png)

	:::

	---

	![](seq.png)
No results found