@djalmaaraujo
Last active February 25, 2026 14:30
djalmaaraujo created this gist Feb 25, 2026.

# Plan: Generalize Meta Matching for Authors, Sections, and Tags

## Context

The tag matching system (prefix matching, normalization, suggestions, `tag_info` metadata) currently only works for tags. When a user asks "How are posts by John Smith doing?" and the LLM sends `any_author: "john smith"`, no resolution happens: the raw string goes straight to the API. If the casing or format doesn't match exactly, results may be empty.

The Mage API already supports `find_keys` for all meta types (`ctx.mage["author"]`, `ctx.mage["section"]`, `ctx.mage["tag"]`) with the same interface. We need to generalize the matching pipeline so authors and sections get the same treatment as tags.

**Key differences**: Authors and sections have no smart tag prefixes (`parsely_smart:*`) and no site-specific colon prefixes (`tag:`, `ssts:`). Their matching is simpler: search, normalize, exact match, suggestions.

**Compare.py gets this for free** since it calls `ANALYTICS_TOOL.method()` → `query_analytics`, which is where resolution happens.

## Files to Modify

1. `apps/agent/tools/lib/tag_matcher.py`: add generic `search_meta`, `normalize_for_matching`, `find_matching_meta`
2. `apps/agent/tools/analytics.py`: add `_resolve_meta_filter`; update `query_analytics` to resolve authors/sections
3. `apps/agent/templates/agent/tools/query_analytics.md`: update LLM instructions for meta matching
4. `tests/agent/tools/test_tag_matcher.py`: tests for the new generic functions
5. `tests/agent/tools/test_resolve_tag_filter.py`: tests for meta resolution in analytics

## Implementation

### Step 1: Add `search_meta` generic search function (`tag_matcher.py`)

```python
def search_meta(ctx, aspect: str, query: str, limit: int = 20) -> list[str]:
    """Search for meta values using the Mage API.

    Works for any aspect: "tag", "author", or "section".
    Returns a list of meta values (including prefixes for tags).
    """
    # `log` is assumed to be the module-level logger already used in tag_matcher.py
    try:
        # Response shape this parser expects: {"keys": [{<aspect>: value, ...}, ...], "hits": N}
        metas = ctx.mage[aspect].find_keys(query.lower(), limit=limit)
        results = []
        if "keys" in metas:
            for item in metas["keys"]:
                if isinstance(item, dict) and aspect in item:
                    value = item[aspect]
                    if value:  # skip empty/None values
                        results.append(str(value))
        log.debug(f"Meta search ({aspect}): query='{query}', hits={metas.get('hits', 0)}, results={results[:10]}")
        return results
    except Exception as e:
        # Any Mage failure degrades to "no candidates" rather than failing the whole query
        log.error(f"Error searching {aspect}: {e}", exc_info=True)
        return []
```
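To see what `search_meta` extracts, the sketch below runs the Step 1 logic (condensed) against an in-memory stand-in for `ctx.mage`. `FakeAspect` and `FakeCtx` are hypothetical test doubles, and the `{"keys": [...], "hits": N}` shape is an assumption inferred from the parsing code above, not a documented Mage contract.

```python
import logging

log = logging.getLogger(__name__)


class FakeAspect:
    """Hypothetical stand-in for ctx.mage[aspect]; mimics the response shape parsed in Step 1."""

    def __init__(self, aspect: str, values: list[str]):
        self.aspect = aspect
        self.values = values

    def find_keys(self, query: str, limit: int = 20) -> dict:
        hits = [v for v in self.values if query in v.lower()]
        return {"keys": [{self.aspect: v} for v in hits[:limit]], "hits": len(hits)}


class FakeCtx:
    """Hypothetical ctx whose .mage maps an aspect name to a FakeAspect."""

    def __init__(self, data: dict[str, list[str]]):
        self.mage = {aspect: FakeAspect(aspect, values) for aspect, values in data.items()}


def search_meta(ctx, aspect: str, query: str, limit: int = 20) -> list[str]:
    # Same logic as Step 1, condensed
    try:
        metas = ctx.mage[aspect].find_keys(query.lower(), limit=limit)
        return [
            str(item[aspect])
            for item in metas.get("keys", [])
            if isinstance(item, dict) and item.get(aspect)
        ]
    except Exception as e:
        log.error(f"Error searching {aspect}: {e}", exc_info=True)
        return []


ctx = FakeCtx({"author": ["John Smith", "Jane Doe"], "section": ["Sports", "World News"]})
print(search_meta(ctx, "author", "John"))    # ['John Smith']
print(search_meta(ctx, "section", "news"))   # ['World News']
print(search_meta(ctx, "tag", "olympics"))   # [] (the fake has no "tag" aspect, so the KeyError is caught)
```

The third call shows the degrade-to-empty path: the lookup error is logged and swallowed, so the caller simply sees no candidates.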

Make the existing `search_tags` a thin wrapper:
```python
def search_tags(ctx, query: str, limit: int = 20) -> list[str]:
    return search_meta(ctx, "tag", query, limit)
```

### Step 2: Add `normalize_for_matching` with aspect-aware prefix handling (`tag_matcher.py`)

Authors and sections don't have colon prefixes, but the existing `normalize_tag_for_matching` calls `extract_tag_name`, which strips everything before the first colon. An author named "Dr. Smith: Expert" would wrongly become "expert". We need an aspect parameter:

```python
def normalize_for_matching(value: str, aspect: str = "tag") -> str:
    """Normalize a meta value for consistent matching.

    For tags: strips smart tag and colon prefixes before normalizing.
    For authors/sections: only normalizes case, hyphens, and whitespace.
    """
    if aspect == "tag":
        value = extract_tag_name(value)
    normalized = value.lower()
    normalized = normalized.replace('-', ' ')  # "John-Smith" -> "john smith"
    normalized = ' '.join(normalized.split())  # collapse and trim whitespace
    return normalized
```
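A quick check of the colon case described above. `extract_tag_name` is not shown in this plan, so the version below is a hypothetical stand-in that drops everything up to and including the first colon, as the text describes; the real helper may do more (for example, smart tag prefixes).

```python
def extract_tag_name(tag: str) -> str:
    # Hypothetical stand-in: drop everything up to and including the first colon
    return tag.split(":", 1)[-1]


def normalize_for_matching(value: str, aspect: str = "tag") -> str:
    # Same logic as Step 2
    if aspect == "tag":
        value = extract_tag_name(value)
    normalized = value.lower().replace("-", " ")
    return " ".join(normalized.split())


# Tag path (what normalize_tag_for_matching does): the author's name before the colon is lost
print(normalize_for_matching("Dr. Smith: Expert", "tag"))     # expert
# Author path: the colon and the full name survive
print(normalize_for_matching("Dr. Smith: Expert", "author"))  # dr. smith: expert
print(normalize_for_matching("John-Smith", "author"))         # john smith
print(normalize_for_matching("tag:Olympics", "tag"))          # olympics
```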

Keep `normalize_tag_for_matching` as a backward-compatible wrapper:
```python
def normalize_tag_for_matching(tag: str) -> str:
    return normalize_for_matching(tag, aspect="tag")
```

### Step 3: Add `find_matching_meta` generic matching function (`tag_matcher.py`)

For tags, this delegates to the existing `find_matching_tags`. For authors and sections, it runs a simplified pipeline (no smart tags, no prefix discovery):

```python
def find_matching_meta(ctx, query: str, aspect: str = "tag", smart_tag_display: str = "site") -> dict:
    """Find matching meta values for any aspect (tag, author, section).

    Returns a dict with "tags" (matched values), "match_type", "prefix_tags",
    and optionally "suggestions".
    """
    if aspect == "tag":
        return find_matching_tags(ctx, query, smart_tag_display)

    # Strip before the emptiness check so whitespace-only queries also short-circuit
    query = (query or "").strip()
    if not query:
        return {"tags": [], "match_type": "none", "prefix_tags": []}

    # Search Mage for candidates
    search_results = search_meta(ctx, aspect, query, limit=20)

    # Fallback: try hyphens instead of spaces
    if not search_results and ' ' in query:
        search_results = search_meta(ctx, aspect, query.replace(' ', '-'), limit=20)

    if not search_results:
        return {"tags": [], "match_type": "none", "prefix_tags": []}

    # Exact match using normalization
    query_normalized = normalize_for_matching(query, aspect=aspect)
    matches = [
        candidate for candidate in search_results
        if normalize_for_matching(candidate, aspect=aspect) == query_normalized
    ]

    if matches:
        return {"tags": matches, "match_type": "exact", "prefix_tags": []}

    # No exact match: return suggestions
    return {"tags": [], "match_type": "none", "prefix_tags": [], "suggestions": search_results[:10]}
```
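The author/section branch can be exercised without Mage by injecting the search step. In this sketch `match_non_tag`, `search_fn`, and `fake_search` are hypothetical names: `search_fn` stands in for `search_meta(ctx, aspect, ...)`, while the hyphen fallback, normalization, and suggestion logic mirror the Step 3 code.

```python
from typing import Callable


def normalize(value: str) -> str:
    # Author/section normalization from Step 2 (no prefix stripping)
    return " ".join(value.lower().replace("-", " ").split())


def match_non_tag(query: str, search_fn: Callable[[str], list[str]]) -> dict:
    """Mirror of the author/section branch of find_matching_meta."""
    query = (query or "").strip()
    if not query:
        return {"tags": [], "match_type": "none", "prefix_tags": []}
    results = search_fn(query)
    if not results and " " in query:
        results = search_fn(query.replace(" ", "-"))  # hyphen fallback
    if not results:
        return {"tags": [], "match_type": "none", "prefix_tags": []}
    target = normalize(query)
    matches = [c for c in results if normalize(c) == target]
    if matches:
        return {"tags": matches, "match_type": "exact", "prefix_tags": []}
    return {"tags": [], "match_type": "none", "prefix_tags": [], "suggestions": results[:10]}


AUTHORS = ["John-Smith", "Johnny Smithers", "Jane Doe"]


def fake_search(q: str) -> list[str]:
    # Hypothetical substring search standing in for Mage
    return [a for a in AUTHORS if q.lower() in a.lower()]


print(match_non_tag("john smith", fake_search)["tags"])    # ['John-Smith'] (found via the hyphen fallback)
print(match_non_tag("Smith", fake_search)["suggestions"])  # ['John-Smith', 'Johnny Smithers']
print(match_non_tag("   ", fake_search)["match_type"])     # none
```

The first call only succeeds because of the fallback: "john smith" has no substring hit, "john-smith" does, and normalization then treats the hyphenated candidate as an exact match.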

Note: the `"tags"` key is used for all aspects so the response has the same shape as `find_matching_tags`. The name is a quirk, but it avoids changing consumer code.

### Step 4: Generalize `_resolve_tag_filter` → `_resolve_meta_filter` (`analytics.py`)

Add constants and a generic resolver:

```python
# Assumes `Any` (from typing) and the module-level `log` are already available in analytics.py
ASPECT_FILTER_KEY = {"tag": "any_tag", "author": "any_author", "section": "any_section"}
ENDPOINT_TO_ASPECT = {"tags": "tag", "authors": "author", "sections": "section"}


def _resolve_meta_filter(ctx, aspect: str, value: str, filters: dict[str, Any]) -> dict | None:
    """Resolve a meta value string into matched values for any aspect.

    Mutates `filters` in place: on a match, filters[<filter_key>] is replaced with the
    list of matched values. Returns the meta_info dict, or None on error.
    """
    filter_key = ASPECT_FILTER_KEY[aspect]
    try:
        if aspect == "tag":
            result = find_matching_tags(ctx, value, "all")
        else:
            result = find_matching_meta(ctx, value, aspect)

        if result["tags"]:
            filters[filter_key] = result["tags"]

        meta_info = {
            "query": value,
            "aspect": aspect,
            "match_type": result["match_type"],
            "matched_values": result["tags"],
            "matched_count": len(result["tags"]),
        }
        if aspect == "tag":
            # Tag-only fields: prefix data exists only for tags
            meta_info["matched_tags"] = result["tags"]
            meta_info["prefix_available"] = len(result["prefix_tags"])
            meta_info["prefix_sample"] = [extract_tag_name(t) for t in result["prefix_tags"][:5]]
        # Suggestions apply to every aspect
        suggestions = result.get("suggestions", [])
        if suggestions:
            meta_info["suggestions"] = suggestions[:10]
            meta_info["total_suggestions"] = len(suggestions)
        return meta_info
    except (AttributeError, KeyError, TypeError) as e:
        log.error(f"Meta matching error ({aspect}): {e}", exc_info=True)
        return None
```
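Because `_resolve_meta_filter` both mutates `filters` and returns `meta_info`, it helps to see the two effects side by side. This sketch injects a hypothetical `matcher` callable in place of `find_matching_tags`/`find_matching_meta` so it runs standalone; the tag-only prefix fields are omitted.

```python
from __future__ import annotations

from typing import Any, Callable

ASPECT_FILTER_KEY = {"tag": "any_tag", "author": "any_author", "section": "any_section"}


def resolve_meta_filter(aspect: str, value: str, filters: dict[str, Any],
                        matcher: Callable[[str, str], dict]) -> dict | None:
    """Simplified mirror of _resolve_meta_filter (tag prefix fields omitted)."""
    filter_key = ASPECT_FILTER_KEY[aspect]
    try:
        result = matcher(value, aspect)
        if result["tags"]:
            filters[filter_key] = result["tags"]  # raw string replaced by the resolved list
        info = {
            "query": value,
            "aspect": aspect,
            "match_type": result["match_type"],
            "matched_values": result["tags"],
            "matched_count": len(result["tags"]),
        }
        suggestions = result.get("suggestions", [])
        if suggestions:
            info["suggestions"] = suggestions[:10]
            info["total_suggestions"] = len(suggestions)
        return info
    except (AttributeError, KeyError, TypeError):
        return None


def fake_matcher(value: str, aspect: str) -> dict:
    # Hypothetical: only "john smith" resolves; anything else yields a suggestion
    if value.lower() == "john smith":
        return {"tags": ["John Smith"], "match_type": "exact", "prefix_tags": []}
    return {"tags": [], "match_type": "none", "prefix_tags": [], "suggestions": ["John Smith"]}


filters = {"any_author": "john smith"}
info = resolve_meta_filter("author", "john smith", filters, fake_matcher)
print(filters)                # {'any_author': ['John Smith']}

filters = {"any_author": "jon smyth"}
info = resolve_meta_filter("author", "jon smyth", filters, fake_matcher)
print(filters)                # {'any_author': 'jon smyth'} (left unresolved)
print(info["suggestions"])    # ['John Smith']
```

An unmatched value stays a raw string in `filters`, so the API still receives the original query; the alternatives reach the LLM only through `suggestions`.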

Keep `_resolve_tag_filter` as a wrapper for backward compatibility:
```python
def _resolve_tag_filter(ctx, tag_string: str, filters: dict[str, Any]) -> dict | None:
    return _resolve_meta_filter(ctx, "tag", tag_string, filters)
```

### Step 5: Update `query_analytics` resolution logic (`analytics.py`)

Replace the current tag-only resolution block (~lines 321-336) with a generalized loop:

```python
# Inside query_analytics: endpoint, meta, filters, params, and request come from the enclosing function.
# Handle meta matching for tags, authors, sections
meta_info = None

# 1) Meta parameter resolution (endpoint-specific detail views)
if endpoint in ENDPOINT_TO_ASPECT and meta:
    aspect = ENDPOINT_TO_ASPECT[endpoint]
    filter_key = ASPECT_FILTER_KEY[aspect]
    meta_info = _resolve_meta_filter(request.ctx, aspect, meta, filters)
    if filters.get(filter_key):
        # The resolved values now live in the filter, so drop the raw meta string
        meta = None
        params["meta"] = meta
        log.info(f"Meta matching from meta ({aspect}): cleared meta, filters.{filter_key}={filters.get(filter_key)}")

# 2) Filter parameter resolution (any_tag, any_author, any_section)
for aspect, filter_key in ASPECT_FILTER_KEY.items():
    if meta_info and meta_info.get("aspect") == aspect:
        continue  # Already resolved via meta above
    filter_value = filters.get(filter_key)
    # Only raw strings are resolved; lists are treated as already-resolved values
    if filter_value and isinstance(filter_value, str):
        info = _resolve_meta_filter(request.ctx, aspect, filter_value, filters)
        log.info(f"Meta matching from filters ({aspect}): filters.{filter_key}={filters.get(filter_key)}")
        # meta_info reports only the first aspect that was resolved
        if info and meta_info is None:
            meta_info = info
```
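The two-phase flow above can be simulated end to end. Here `resolve` is a hypothetical stand-in for `_resolve_meta_filter` that "matches" any string by title-casing it, and `run_resolution` is an illustrative wrapper around the Step 5 logic, not a function in `analytics.py`. The last case shows why the `isinstance(..., str)` guard matters.

```python
from __future__ import annotations

from typing import Any

ASPECT_FILTER_KEY = {"tag": "any_tag", "author": "any_author", "section": "any_section"}
ENDPOINT_TO_ASPECT = {"tags": "tag", "authors": "author", "sections": "section"}

calls: list[tuple[str, str]] = []  # records every resolution attempt


def resolve(aspect: str, value: str, filters: dict[str, Any]) -> dict:
    # Hypothetical resolver: title-cases the value and always "matches"
    calls.append((aspect, value))
    filters[ASPECT_FILTER_KEY[aspect]] = [value.title()]
    return {"aspect": aspect, "query": value}


def run_resolution(endpoint: str, meta: str | None, filters: dict[str, Any]):
    meta_info = None
    if endpoint in ENDPOINT_TO_ASPECT and meta:
        aspect = ENDPOINT_TO_ASPECT[endpoint]
        meta_info = resolve(aspect, meta, filters)
        if filters.get(ASPECT_FILTER_KEY[aspect]):
            meta = None
    for aspect, filter_key in ASPECT_FILTER_KEY.items():
        if meta_info and meta_info.get("aspect") == aspect:
            continue
        value = filters.get(filter_key)
        if value and isinstance(value, str):
            info = resolve(aspect, value, filters)
            if info and meta_info is None:
                meta_info = info
    return meta, filters, meta_info


# Meta on the authors endpoint moves into any_author and is cleared
print(run_resolution("authors", "jane doe", {}))
# Two string filters are both resolved; meta_info reports the first one (tag)
print(run_resolution("posts", None, {"any_tag": "olympics", "any_section": "sports"}))
# An already-resolved list is left alone and the resolver is never called
calls.clear()
print(run_resolution("posts", None, {"any_author": ["Jane Doe"]}), calls)
```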

In response construction, replace `tag_info` references with `meta_info`:
```python
if meta_info:
    results["meta_info"] = meta_info
    # Backward compat: keep tag_info for tag resolutions
    if meta_info.get("aspect") == "tag":
        results["tag_info"] = meta_info
```
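The backward-compatibility rule is small enough to check directly. `build_results` is a hypothetical helper wrapping the snippet above: `meta_info` is always attached, and `tag_info` only when the resolved aspect is a tag, so existing `tag_info` consumers keep working.

```python
from __future__ import annotations


def build_results(results: dict, meta_info: dict | None) -> dict:
    # Same logic as the response-construction snippet above
    if meta_info:
        results["meta_info"] = meta_info
        if meta_info.get("aspect") == "tag":
            results["tag_info"] = meta_info  # backward compat for tag consumers
    return results


tag_resp = build_results({}, {"aspect": "tag", "query": "olympics"})
author_resp = build_results({}, {"aspect": "author", "query": "john smith"})
print(sorted(tag_resp))     # ['meta_info', 'tag_info']
print(sorted(author_resp))  # ['meta_info']
```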

### Step 6: Update imports in `analytics.py`

```python
from agent.tools.lib.tag_matcher import extract_tag_name, find_matching_tags, find_matching_meta, search_meta
```

### Step 7: Update LLM prompt template (`query_analytics.md`)

Add instructions about `meta_info` for authors and sections alongside the existing tag instructions. When no exact match is found for an author or section, the LLM should mention the suggestions, just as it does for tags.

## Test Instructions

### New tests in `test_tag_matcher.py`:

**`TestSearchMeta`**: Parametrized tests for `search_meta` across aspects (author, section, tag), with `ctx.mage[aspect].find_keys` mocked.

**`TestNormalizeForMatching`**:
- Author with colon preserved: `normalize_for_matching("Dr. Smith: Expert", "author")` → `"dr. smith: expert"`
- Author hyphen: `normalize_for_matching("John-Smith", "author")` → `"john smith"`
- Tag strips prefix: `normalize_for_matching("tag:Olympics", "tag")` → `"olympics"`

**`TestFindMatchingMeta`** (parametrized):
- Author exact match (case insensitive)
- Author hyphen/space normalization
- Author no match returns suggestions
- Section exact match
- Tag delegates to `find_matching_tags`
- Empty query returns none

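One way to mock `ctx.mage[aspect].find_keys` without a real Mage client is `unittest.mock.MagicMock`, which supports item access through `__getitem__`. This is a sketch, not the final test file: `search_meta` is inlined (condensed from Step 1, logging omitted) so the snippet runs standalone, `make_ctx` is a hypothetical helper, and a plain loop stands in for `pytest.mark.parametrize`.

```python
from unittest.mock import MagicMock


def search_meta(ctx, aspect: str, query: str, limit: int = 20) -> list[str]:
    # Condensed from Step 1 (logging omitted)
    try:
        metas = ctx.mage[aspect].find_keys(query.lower(), limit=limit)
        return [str(i[aspect]) for i in metas.get("keys", []) if isinstance(i, dict) and i.get(aspect)]
    except Exception:
        return []


def make_ctx(aspect: str, values: list) -> MagicMock:
    # Hypothetical helper: every ctx.mage[...] lookup returns the same configured mock
    ctx = MagicMock()
    ctx.mage.__getitem__.return_value.find_keys.return_value = {
        "keys": [{aspect: v} for v in values],
        "hits": len(values),
    }
    return ctx


def test_search_meta_all_aspects():
    for aspect, values in [("author", ["John Smith"]), ("section", ["Sports"]), ("tag", ["tag:Olympics"])]:
        ctx = make_ctx(aspect, values)
        assert search_meta(ctx, aspect, "Query") == values
        ctx.mage.__getitem__.assert_called_once_with(aspect)
        # The query is lower-cased before it reaches Mage
        ctx.mage.__getitem__.return_value.find_keys.assert_called_once_with("query", limit=20)


def test_search_meta_swallows_errors():
    ctx = MagicMock()
    ctx.mage.__getitem__.return_value.find_keys.side_effect = RuntimeError("mage down")
    assert search_meta(ctx, "author", "john") == []


test_search_meta_all_aspects()
test_search_meta_swallows_errors()
print("ok")
```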
### New tests in `test_resolve_tag_filter.py`:

- `test_resolve_author_filter`: verifies `any_author` is populated
- `test_resolve_section_filter`: verifies `any_section` is populated
- `test_authors_endpoint_meta_triggers_resolution`: `meta` on the authors endpoint triggers resolution
- `test_sections_endpoint_meta_triggers_resolution`: `meta` on the sections endpoint triggers resolution
- `test_any_author_list_skips_resolution`: an already-resolved list is not re-resolved

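For `test_any_author_list_skips_resolution`, the key assertion is that the resolver is never called when `any_author` is already a list. The sketch below checks this with a `MagicMock` spy; `resolve_filters` is a hypothetical extraction of the Step 5 filter loop (meta handling omitted). A real test would more likely patch `_resolve_meta_filter` in `analytics.py` and call `query_analytics`.

```python
from __future__ import annotations

from unittest.mock import MagicMock

ASPECT_FILTER_KEY = {"tag": "any_tag", "author": "any_author", "section": "any_section"}


def resolve_filters(filters: dict, resolver) -> dict | None:
    # Hypothetical extraction of the Step 5 filter loop (meta handling omitted)
    meta_info = None
    for aspect, filter_key in ASPECT_FILTER_KEY.items():
        value = filters.get(filter_key)
        if value and isinstance(value, str):
            info = resolver(aspect, value, filters)
            if info and meta_info is None:
                meta_info = info
    return meta_info


def test_any_author_list_skips_resolution():
    resolver = MagicMock()
    filters = {"any_author": ["John Smith"]}
    assert resolve_filters(filters, resolver) is None
    resolver.assert_not_called()
    assert filters == {"any_author": ["John Smith"]}


def test_any_author_string_is_resolved():
    resolver = MagicMock(return_value={"aspect": "author"})
    filters = {"any_author": "john smith"}
    assert resolve_filters(filters, resolver) == {"aspect": "author"}
    resolver.assert_called_once_with("author", "john smith", filters)


test_any_author_list_skips_resolution()
test_any_author_string_is_resolved()
print("ok")
```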
### Run tests:
```bash
docker compose exec backend pytest tests/agent/tools/test_tag_matcher.py -v
docker compose exec backend pytest tests/agent/tools/test_resolve_tag_filter.py -v
docker compose exec backend pytest tests/agent/ -v
```

### Manual testing:
1. Ask "How are posts by [known author] doing?" and verify author resolution in the logs.
2. Ask "Show me top posts in [section name]" and verify section resolution.
3. Ask "How is Olympics performing?" and verify tag resolution still works.
4. Ask "Compare posts by [author1] vs [author2]" and verify compare gets resolution for free.