djalmaaraujo created this gist on Feb 25, 2026 (last active February 25, 2026 14:30).
# Plan: Generalize Meta Matching for Authors, Sections, and Tags

## Context

The tag matching system (prefix matching, normalization, suggestions, `tag_info` metadata) currently works only for tags. When a user asks "How are posts by John Smith doing?" and the LLM sends `any_author: "john smith"`, no resolution happens: the raw string goes straight to the API. If the casing or format doesn't match exactly, results may be empty.

The Mage API already supports `find_keys` for all meta types (`ctx.mage["author"]`, `ctx.mage["section"]`, `ctx.mage["tag"]`) with the same interface. We need to generalize the matching pipeline so authors and sections get the same treatment as tags.

**Key differences**: Authors/sections have no smart tag prefixes (`parsely_smart:*`) and no site-specific colon prefixes (`tag:`, `ssts:`). Their matching is simpler: search, normalize, exact match, suggestions.

**Compare.py gets this for free** since it calls `ANALYTICS_TOOL.method()` → `query_analytics`, which is where resolution happens.

## Files to Modify

1. `apps/agent/tools/lib/tag_matcher.py` — Add generic `search_meta`, `normalize_for_matching`, `find_matching_meta`
2. `apps/agent/tools/analytics.py` — Add `_resolve_meta_filter`, update `query_analytics` to resolve authors/sections
3. `apps/agent/templates/agent/tools/query_analytics.md` — Update LLM instructions for meta matching
4. `tests/agent/tools/test_tag_matcher.py` — Tests for new generic functions
5. `tests/agent/tools/test_resolve_tag_filter.py` — Tests for meta resolution in analytics

## Implementation

### Step 1: Add `search_meta` generic search function (`tag_matcher.py`)

```python
def search_meta(ctx, aspect: str, query: str, limit: int = 20) -> list[str]:
    """Search for meta values using the Mage API.

    Works for any aspect: "tag", "author", or "section".
    Returns list of meta values (including prefixes for tags).
    """
    try:
        metas = ctx.mage[aspect].find_keys(query.lower(), limit=limit)
        results = []
        if "keys" in metas:
            for item in metas["keys"]:
                # Each key is a dict like {aspect: value}; skip malformed or empty entries
                if isinstance(item, dict) and aspect in item:
                    value = item[aspect]
                    if value:
                        results.append(str(value))
        log.debug(f"Meta search ({aspect}): query='{query}', hits={metas.get('hits', 0)}, results={results[:10]}")
        return results
    except Exception as e:
        log.error(f"Error searching {aspect}: {e}", exc_info=True)
        return []
```

Make the existing `search_tags` a thin wrapper:

```python
def search_tags(ctx, query: str, limit: int = 20) -> list[str]:
    return search_meta(ctx, "tag", query, limit)
```

### Step 2: Add `normalize_for_matching` with aspect-aware prefix handling (`tag_matcher.py`)

Authors/sections don't have colon prefixes. The existing `normalize_tag_for_matching` calls `extract_tag_name`, which strips everything up to the first colon, so an author named "Dr. Smith: Expert" would wrongly become "expert". We need an aspect parameter:

```python
def normalize_for_matching(value: str, aspect: str = "tag") -> str:
    """Normalize a meta value for consistent matching.

    For tags: strips smart tag and colon prefixes before normalizing.
    For authors/sections: only normalizes case, hyphens, and whitespace.
    """
    if aspect == "tag":
        value = extract_tag_name(value)
    normalized = value.lower()
    normalized = normalized.replace('-', ' ')
    # Collapse runs of whitespace and trim the ends
    normalized = ' '.join(normalized.split())
    return normalized
```

Keep `normalize_tag_for_matching` as a backward-compatible wrapper:

```python
def normalize_tag_for_matching(tag: str) -> str:
    return normalize_for_matching(tag, aspect="tag")
```

### Step 3: Add `find_matching_meta` generic matching function (`tag_matcher.py`)

For tags, it delegates to the existing `find_matching_tags`. For authors/sections, it runs a simplified pipeline (no smart tags, no prefix discovery):

```python
def find_matching_meta(ctx, query: str, aspect: str = "tag", smart_tag_display: str = "site") -> dict:
    """Find matching meta values for any aspect (tag, author, section).

    Returns dict with: "tags" (matched values), "match_type", "prefix_tags",
    and optionally "suggestions".
    """
    if aspect == "tag":
        return find_matching_tags(ctx, query, smart_tag_display)

    # Strip before the emptiness check so whitespace-only queries count as empty
    query = (query or "").strip()
    if not query:
        return {"tags": [], "match_type": "none", "prefix_tags": []}

    # Search Mage for candidates
    search_results = search_meta(ctx, aspect, query, limit=20)

    # Fallback: try hyphens instead of spaces
    if not search_results and ' ' in query:
        search_results = search_meta(ctx, aspect, query.replace(' ', '-'), limit=20)

    if not search_results:
        return {"tags": [], "match_type": "none", "prefix_tags": []}

    # Exact match using normalization
    query_normalized = normalize_for_matching(query, aspect=aspect)
    matches = [
        candidate for candidate in search_results
        if normalize_for_matching(candidate, aspect=aspect) == query_normalized
    ]
    if matches:
        return {"tags": matches, "match_type": "exact", "prefix_tags": []}

    # No exact match - return suggestions
    return {"tags": [], "match_type": "none", "prefix_tags": [], "suggestions": search_results[:10]}
```

Note: This uses the `"tags"` key for all aspects to keep the same response shape as `find_matching_tags`. This is a naming quirk, but it avoids changing the consumer code.
### Step 4: Generalize `_resolve_tag_filter` → `_resolve_meta_filter` (`analytics.py`)

Add constants and a generic resolver:

```python
ASPECT_FILTER_KEY = {"tag": "any_tag", "author": "any_author", "section": "any_section"}
ENDPOINT_TO_ASPECT = {"tags": "tag", "authors": "author", "sections": "section"}


def _resolve_meta_filter(ctx, aspect: str, value: str, filters: dict[str, Any]) -> dict | None:
    """Resolve a meta value string into matched values for any aspect."""
    filter_key = ASPECT_FILTER_KEY[aspect]
    try:
        if aspect == "tag":
            result = find_matching_tags(ctx, value, "all")
        else:
            result = find_matching_meta(ctx, value, aspect)

        # Replace the raw string filter with the resolved values
        if result["tags"]:
            filters[filter_key] = result["tags"]

        meta_info = {
            "query": value,
            "aspect": aspect,
            "match_type": result["match_type"],
            "matched_values": result["tags"],
            "matched_count": len(result["tags"]),
        }

        # Tag-only metadata (prefix discovery does not apply to authors/sections)
        if aspect == "tag":
            meta_info["matched_tags"] = result["tags"]
            meta_info["prefix_available"] = len(result["prefix_tags"])
            meta_info["prefix_sample"] = [extract_tag_name(t) for t in result["prefix_tags"][:5]]

        suggestions = result.get("suggestions", [])
        if suggestions:
            meta_info["suggestions"] = suggestions[:10]
            meta_info["total_suggestions"] = len(suggestions)

        return meta_info
    except (AttributeError, KeyError, TypeError) as e:
        log.error(f"Meta matching error ({aspect}): {e}", exc_info=True)
        return None
```

Keep `_resolve_tag_filter` as a wrapper for backward compatibility:

```python
def _resolve_tag_filter(ctx, tag_string: str, filters: dict[str, Any]) -> dict | None:
    return _resolve_meta_filter(ctx, "tag", tag_string, filters)
```

### Step 5: Update `query_analytics` resolution logic (`analytics.py`)

Replace the current tag-only resolution block (~lines 321-336) with a generalized loop:

```python
# Handle meta matching for tags, authors, sections
meta_info = None

# 1) Meta parameter resolution (endpoint-specific detail views)
if endpoint in ENDPOINT_TO_ASPECT and meta:
    aspect = ENDPOINT_TO_ASPECT[endpoint]
    filter_key = ASPECT_FILTER_KEY[aspect]
    meta_info = _resolve_meta_filter(request.ctx, aspect, meta, filters)
    if filters.get(filter_key):
        meta = None
        params["meta"] = meta
        log.info(f"Meta matching from meta ({aspect}): cleared meta, filters.{filter_key}={filters.get(filter_key)}")

# 2) Filter parameter resolution (any_tag, any_author, any_section)
for aspect, filter_key in ASPECT_FILTER_KEY.items():
    if meta_info and meta_info.get("aspect") == aspect:
        continue  # Already resolved via meta above
    filter_value = filters.get(filter_key)
    # Only raw strings are resolved; lists are treated as already resolved
    if filter_value and isinstance(filter_value, str):
        info = _resolve_meta_filter(request.ctx, aspect, filter_value, filters)
        log.info(f"Meta matching from filters ({aspect}): filters.{filter_key}={filters.get(filter_key)}")
        if info and meta_info is None:
            meta_info = info
```

In response construction, replace `tag_info` references with `meta_info`:

```python
if meta_info:
    results["meta_info"] = meta_info
    # Backward compat
    if meta_info.get("aspect") == "tag":
        results["tag_info"] = meta_info
```

### Step 6: Update imports in `analytics.py`

```python
from agent.tools.lib.tag_matcher import extract_tag_name, find_matching_tags, find_matching_meta, search_meta
```

### Step 7: Update LLM prompt template (`query_analytics.md`)

Add instructions about `meta_info` for authors/sections alongside the existing tag instructions. The LLM should mention suggestions when no exact match is found for authors/sections too.

## Test Instructions

### New tests in `test_tag_matcher.py`:

**`TestSearchMeta`**: Parametrized tests for `search_meta` with different aspects (author, section, tag), mocking `ctx.mage[aspect].find_keys`.

**`TestNormalizeForMatching`**:
- Author with colon preserved: `normalize_for_matching("Dr. Smith: Expert", "author")` → `"dr. smith: expert"`
- Author hyphen: `normalize_for_matching("John-Smith", "author")` → `"john smith"`
- Tag strips prefix: `normalize_for_matching("tag:Olympics", "tag")` → `"olympics"`

**`TestFindMatchingMeta`** (parametrized):
- Author exact match (case insensitive)
- Author hyphen/space normalization
- Author no match returns suggestions
- Section exact match
- Tag delegates to `find_matching_tags`
- Empty query returns none

### New tests in `test_resolve_tag_filter.py`:
- `test_resolve_author_filter` — verifies `any_author` is populated
- `test_resolve_section_filter` — verifies `any_section` is populated
- `test_authors_endpoint_meta_triggers_resolution` — meta on authors endpoint
- `test_sections_endpoint_meta_triggers_resolution` — meta on sections endpoint
- `test_any_author_list_skips_resolution` — already-resolved list not re-resolved

### Run tests:

```bash
docker compose exec backend pytest tests/agent/tools/test_tag_matcher.py -v
docker compose exec backend pytest tests/agent/tools/test_resolve_tag_filter.py -v
docker compose exec backend pytest tests/agent/ -v
```

### Manual testing:
1. Ask "How are posts by [known author] doing?" and verify author resolution in logs
2. Ask "Show me top posts in [section name]" and verify section resolution
3. Ask "How is Olympics performing?" and verify tag resolution still works
4. Ask "Compare posts by [author1] vs [author2]" and verify compare gets resolution for free
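The filter-resolution loop from Step 5 can be exercised in isolation, which is roughly what `test_resolve_author_filter` and `test_any_author_list_skips_resolution` need to check. Below is a minimal sketch of that. `stub_resolver` and its `LOOKUP` table are hypothetical stand-ins for `_resolve_meta_filter` and a real Mage lookup, and the `meta`-parameter branch and logging are left out.

```python
from typing import Any, Callable, Optional

ASPECT_FILTER_KEY = {"tag": "any_tag", "author": "any_author", "section": "any_section"}

# Hypothetical canonical values a real Mage lookup might return
LOOKUP = {
    ("author", "john smith"): ["John-Smith"],
    ("section", "sports"): ["Sports"],
}

Resolver = Callable[[str, str, dict], Optional[dict]]


def stub_resolver(aspect: str, value: str, filters: dict[str, Any]) -> Optional[dict]:
    """Stand-in for _resolve_meta_filter: rewrites the filter in place on a match."""
    matched = LOOKUP.get((aspect, value.lower()), [])
    if matched:
        filters[ASPECT_FILTER_KEY[aspect]] = matched
    return {"query": value, "aspect": aspect,
            "match_type": "exact" if matched else "none", "matched_values": matched}


def resolve_filters(filters: dict[str, Any], resolver: Resolver,
                    meta_info: Optional[dict] = None) -> Optional[dict]:
    """Mirror of the Step 5 filter loop: only raw string filters are resolved."""
    for aspect, filter_key in ASPECT_FILTER_KEY.items():
        if meta_info and meta_info.get("aspect") == aspect:
            continue  # already resolved via the meta parameter
        filter_value = filters.get(filter_key)
        if filter_value and isinstance(filter_value, str):
            info = resolver(aspect, filter_value, filters)
            if info and meta_info is None:
                meta_info = info  # the first resolved aspect becomes meta_info
    return meta_info


filters = {"any_author": "John Smith", "any_section": "sports"}
info = resolve_filters(filters, stub_resolver)
print(filters)         # {'any_author': ['John-Smith'], 'any_section': ['Sports']}
print(info["aspect"])  # author

# An already-resolved list is never passed to the resolver
calls = []
resolve_filters({"any_author": ["John-Smith"]}, lambda a, v, f: calls.append(a))
print(calls)           # []
```

Keeping the resolver injectable is just a convenience for the sketch. In the plan itself, the same effect would come from patching `find_matching_meta` in the tests.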