@robert-mcdermott
Created April 8, 2026 07:42
Dimensions.ai Agentic Skill (SKILL.md)
---
name: dimensions-ai
description: Query the Dimensions.ai scholarly research database using its DSL API via Python. TRIGGER when: user asks about publications, grants, clinical trials, patents, researchers, research organizations, funding data, citation metrics, research output, scholarly data, academic papers, principal investigators, GRID IDs, ORCID, or Dimensions. Use this skill to build Python scripts that authenticate, query, paginate, and return structured data from the Dimensions Analytics API.
---

Dimensions.ai API Skill

Build Python scripts that query the Dimensions Analytics API to retrieve scholarly research data including publications, grants, clinical trials, patents, and researcher profiles.

Overview

The Dimensions Analytics API uses a custom query language called DSL (Dimensions Search Language). All interaction happens via HTTP POST requests carrying DSL query strings. Results come back as JSON. Python scripts should use the dimcli library for convenience (handles auth, pagination, and response parsing), but can also use raw requests if needed.

Authentication

The API key is stored in a dimcli configuration file (~/.dimcli/dsl.ini) and dimcli.login() with no arguments reads from it automatically.

import dimcli

dimcli.login()
dsl = dimcli.Dsl()

If you need raw requests access instead:

import requests

API_KEY = "your-key"  # Or read from environment/config
ENDPOINT = "https://app.dimensions.ai"

resp = requests.post(f"{ENDPOINT}/api/auth.json", json={"key": API_KEY})
resp.raise_for_status()
token = resp.json()["token"]

headers = {"Authorization": f"JWT {token}"}

# Make a query
resp = requests.post(
    f"{ENDPOINT}/api/dsl/v2",
    data='search publications for "malaria" return publications'.encode(),
    headers=headers,
)
result = resp.json()

DSL Query Structure

Every query has this form:

search <source> [in <index>] [for <terms>] [where <filters>] return <result> [limit N] [skip M] [sort by <field> [asc|desc]]
  • source: publications, grants, clinical_trials, patents, researchers, organizations
  • for: full-text search terms in double quotes
  • where: field-level filters
  • return: what to return (source records, facets, or specific fields)
  • limit: max records per request (max 1000, default 20)
  • skip: offset for pagination (max 50000 total)
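For scripts that compose queries from user input, the clause order above can be captured in a small helper. This `build_query` function is an illustrative sketch, not part of dimcli; all parameter names are hypothetical:

```python
# Illustrative helper (not part of dimcli): assemble a DSL query
# string from its optional clauses, in the order the DSL expects.
def build_query(source, terms=None, index=None, filters=None,
                result=None, limit=None, skip=None, sort=None):
    parts = [f"search {source}"]
    if index:
        parts.append(f"in {index}")
    if terms:
        parts.append(f'for "{terms}"')
    if filters:
        parts.append(f"where {filters}")
    parts.append(f"return {result or source}")
    if limit is not None:
        parts.append(f"limit {limit}")
    if skip is not None:
        parts.append(f"skip {skip}")
    if sort:
        parts.append(f"sort by {sort}")
    return " ".join(parts)

q = build_query("publications", terms="malaria",
                filters="year >= 2020", limit=100)
# 'search publications for "malaria" where year >= 2020 return publications limit 100'
```

A builder like this keeps clause ordering consistent when limit/skip values change during pagination.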

Full-Text Search

search publications for "machine learning" return publications
search publications in title_abstract_only for "CRISPR" return publications
search publications in authors for "Jennifer Doudna" return publications
search grants in investigators for "Jane Smith" return grants

Boolean operators (must be UPPERCASE): AND, OR, NOT

search publications for "malaria AND africa AND (treatment OR prevention)" return publications

Triple-quote syntax for complex queries with nested quotes:

search publications for """
  "deep learning" AND ("natural language processing" OR "computer vision")
""" return publications

Field Filtering (where)

search publications where year in [2020:2024] return publications
search publications where researchers.orcid_id = "0000-0002-1838-9363" return publications
search publications where research_orgs.name ~ "Harvard" return publications
search grants where funder_org_name = "National Institutes of Health" return grants
search clinical_trials where conditions = "breast cancer" return clinical_trials

Filter operators: =, !=, >, <, >=, <=, ~ (partial match), @ (Lucene field search), in (range/list), is empty, is not empty

Combine with and, or, not:

search publications where year >= 2020 and research_org_names ~ "Stanford" and type = "article" return publications

Returning Fields

Return all fields for maximum flexibility:

return publications[all]
return grants[all]
return clinical_trials[all]

Or use fieldsets:

return publications[basics + extras]
return grants[basics + extras + categories]

Or specify individual fields:

return publications[id + doi + title + authors + year + times_cited + research_orgs + abstract]

Sorting

return publications sort by times_cited desc
return grants sort by start_date desc
return publications sort by year asc

Pagination

Critical: The API returns max 1000 records per request and allows pagination up to 50,000 total records. Always use pagination when results may exceed 1000.

Using dimcli (recommended)

query_iterative handles pagination automatically:

import dimcli

dimcli.login()
dsl = dimcli.Dsl()

# Automatically paginates through ALL results (up to 50,000)
data = dsl.query_iterative(
    'search publications where researchers.orcid_id = "0000-0002-1838-9363" return publications[all]'
)

print(f"Total: {data.count_total}")
print(f"Retrieved: {len(data.publications)}")

Manual pagination with dimcli

import dimcli
import time

dimcli.login()
dsl = dimcli.Dsl()

LIMIT = 1000
skip = 0
total = 0  # initialized so the final print works even if the first batch is empty
all_results = []

while True:
    query = f'search publications where research_org_names ~ "MIT" and year = 2023 return publications[all] limit {LIMIT} skip {skip}'
    data = dsl.query(query)

    if not hasattr(data, 'publications') or len(data.publications) == 0:
        break

    all_results.extend(data.publications)
    total = data.count_total or 0

    if skip + LIMIT >= total or skip + LIMIT >= 50000:
        break

    skip += LIMIT
    time.sleep(2)  # respect rate limits (30 req/min)

print(f"Retrieved {len(all_results)} of {total} publications")

Manual pagination with raw requests

import requests
import time
import json

API_KEY = "your-key"
ENDPOINT = "https://app.dimensions.ai"

# Authenticate
resp = requests.post(f"{ENDPOINT}/api/auth.json", json={"key": API_KEY})
resp.raise_for_status()
headers = {"Authorization": f"JWT {resp.json()['token']}"}

LIMIT = 1000
skip = 0
total = 0  # initialized so the final print works even if the first batch is empty
all_results = []

while True:
    query = f'search grants where research_org_names ~ "Stanford" return grants[all] limit {LIMIT} skip {skip}'
    resp = requests.post(f"{ENDPOINT}/api/dsl/v2", data=query.encode(), headers=headers)
    resp.raise_for_status()
    result = resp.json()

    records = result.get("grants", [])
    if not records:
        break

    all_results.extend(records)
    total = result.get("_stats", {}).get("total_count", 0)

    if skip + LIMIT >= total or skip + LIMIT >= 50000:
        break

    skip += LIMIT
    time.sleep(2)

print(f"Retrieved {len(all_results)} of {total} grants")

Batching Large ID Lists

When filtering by lists of IDs, the API allows max 400 items per filter clause. Chunk larger lists:

import dimcli
from dimcli.utils import chunks_of
import json
import time

dimcli.login()
dsl = dimcli.Dsl()

researcher_ids = [...]  # large list of researcher IDs

all_results = []
for chunk in chunks_of(researcher_ids, 200):
    query = f'search publications where researchers in {json.dumps(chunk)} return publications[all]'
    data = dsl.query_iterative(query)
    if hasattr(data, 'publications'):
        all_results.extend(data.publications)
    time.sleep(1)

# Deduplicate
seen = set()
unique = []
for r in all_results:
    if r["id"] not in seen:
        seen.add(r["id"])
        unique.append(r)
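If you are working with raw requests and do not have dimcli installed, the same chunking behavior is easy to reproduce with the standard library alone. This is a minimal sketch, not the dimcli implementation itself:

```python
# Stdlib-only equivalent of dimcli.utils.chunks_of: yield successive
# slices of at most `size` elements from a list.
def chunks_of(items, size):
    for i in range(0, len(items), size):
        yield items[i:i + size]
```

Each chunk can then be interpolated into a `where ... in [...]` clause with json.dumps, exactly as in the dimcli example above.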

Technical Limits

Constraint                      Limit
----------------------------    ----------------------
Requests per minute per IP      30
Items in a filter clause        400
Boolean filter conditions       100
Full-text boolean clauses       100
Records per single query        1,000
Total records via pagination    50,000
Facet results                   1,000 (no pagination)

Always add time.sleep(2) between paginated requests to stay within rate limits.

Common Query Patterns

Publications by a Researcher

By ORCID (most reliable):

search publications where researchers.orcid_id = "0000-0002-1838-9363" return publications[all]

By name (searches author name index):

search publications in authors for "Jennifer A Doudna" return publications[all]

By Dimensions researcher ID:

search publications where researchers.id = "ur.011301404166.06" return publications[all]

Publications by an Organization

By organization name (partial match):

search publications where research_org_names ~ "University of Oxford" and year in [2020:2024] return publications[all]

By GRID ID (exact):

search publications where research_orgs.id = "grid.4991.5" and year = 2023 return publications[all]

Grants by a Researcher

search grants in investigators for "Jane Smith" return grants[all]
search grants where researchers.orcid_id = "0000-0002-1838-9363" return grants[all]

Grants by an Organization

search grants where research_org_names ~ "Johns Hopkins" return grants[all]
search grants where research_orgs.id = "grid.21107.35" return grants[all]

Grants by a Funder

search grants where funder_org_name = "National Institutes of Health" return grants[all]
search grants where funder_orgs.acronym = "NSF" return grants[all]

Clinical Trials by a Researcher

search clinical_trials in investigators for "John Smith" return clinical_trials[all]
search clinical_trials where researchers.orcid_id = "0000-0002-1838-9363" return clinical_trials[all]

Clinical Trials by an Organization

search clinical_trials where research_orgs.name ~ "Mayo Clinic" return clinical_trials[all]

Clinical Trials by Condition/Topic

search clinical_trials for "ovarian neoplasms" return clinical_trials[all]
search clinical_trials where conditions = "breast cancer" return clinical_trials[all]
search clinical_trials where mesh_terms = "Ovarian Neoplasms" return clinical_trials[all]

Patents by Inventor or Organization

search patents in inventors for "John Smith" return patents[all]
search patents where assignees.name ~ "Google" return patents[all]

Topic-Based Search Across Sources

search publications for "ovarian neoplasms" where year in [2020:2024] return publications[all]
search grants for "ovarian neoplasms" return grants[all]
search clinical_trials for "ovarian neoplasms" return clinical_trials[all]

Finding a Researcher Profile

search researchers for "Jennifer Doudna" return researchers[all]
search researchers where orcid_id = "0000-0002-1838-9363" return researchers[all]
search researchers where last_name = "Doudna" and first_name = "Jennifer" return researchers[all]

Finding an Organization

search organizations for "Harvard" return organizations[all]
search organizations where name ~ "Harvard University" return organizations[all]
search organizations where id = "grid.38142.3c" return organizations[all]

Complete Python Script Template

This is the standard pattern for a script that queries Dimensions and returns all results with full pagination:

#!/usr/bin/env python3
"""Query Dimensions.ai API and return results as JSON."""

import dimcli
import json
import time
import sys


def query_dimensions(dsl_query: str, source: str) -> list[dict]:
    """Execute a DSL query with automatic pagination, return all records.

    Args:
        dsl_query: The DSL query string (without limit/skip - added automatically).
        source: The source type being queried (e.g., 'publications', 'grants').

    Returns:
        List of result dictionaries.
    """
    dimcli.login()
    dsl = dimcli.Dsl()

    data = dsl.query_iterative(dsl_query)
    results = getattr(data, source, [])
    print(f"Retrieved {len(results)} of {data.count_total} {source}", file=sys.stderr)
    return results


def main():
    # Example: Get all publications for a researcher by ORCID
    query = 'search publications where researchers.orcid_id = "0000-0002-1838-9363" return publications[all]'
    results = query_dimensions(query, "publications")

    # Output as JSON
    print(json.dumps(results, indent=2, default=str))


if __name__ == "__main__":
    main()

Data Source Field Reference

Publications (publications)

Search indexes: full_data (default), title_only, title_abstract_only, authors, concepts, raw_affiliations, funding, full_data_exact, acknowledgements

Fieldsets: basics, extras, categories, book, all

Key fields: id, doi, pmid, pmcid, title, abstract, authors, year, date, type, journal, volume, issue, pages, publisher, times_cited, recent_citations, relative_citation_ratio, field_citation_ratio, altmetric, open_access, mesh_terms, concepts, concepts_scores, research_orgs, research_org_names, research_org_countries, research_org_country_names, researchers, funders, funder_countries, supporting_grant_ids, reference_ids, referenced_pubs, clinical_trial_ids, source_title, issn, isbn, dimensions_url, linkout, document_type, date_inserted, date_online, date_print, acknowledgements, funding_section, book_doi, book_title, book_series_title, proceedings_title, subtitles, editors, arxiv_id, altmetric_id, resulting_publication_doi, journal_title_raw, journal_lists, score

Category fields: category_for, category_for_2020, category_bra, category_hra, category_hrcs_hc, category_hrcs_rac, category_icrp_cso, category_icrp_ct, category_rcdc, category_sdg, category_uoa

Publication types: article, chapter, proceeding, monograph, preprint, book

Grants (grants)

Search indexes: full_data (default), title_only, title_abstract_only, raw_affiliations, investigators, concepts

Fieldsets: basics, extras, categories, all

Key fields: id, title, original_title, abstract, start_date, start_year, end_date, active_year, active_status, investigators, research_orgs, research_org_names, research_org_countries, research_org_types, funder_orgs, funder_org_name, funder_org_acronym, funder_org_countries, funder_org_cities, funder_org_states, funding_usd, funding_eur, funding_gbp, funding_cny, funding_aud, funding_chf, funding_nzd, funding_cad, funding_jpy, funding_currency, funding_schemes, project_numbers, foa_number, researchers, keywords, concepts, concepts_scores, language, language_title, linkout, dimensions_url, date_inserted, score

Category fields: category_for, category_for_2020, category_bra, category_hra, category_hrcs_hc, category_hrcs_rac, category_icrp_cso, category_icrp_ct, category_rcdc, category_sdg, category_uoa

Clinical Trials (clinical_trials)

Search indexes: full_data (default), title_only, title_abstract_only, raw_affiliations, investigators

Fieldsets: basics, extras, studies, categories, all

Key fields: id, title, brief_title, acronym, abstract, start_date, end_date, active_years, overall_status, phase, registry, gender, conditions, interventions, investigators, mesh_terms, study_type, study_designs, study_arms, study_eligibility_criteria, study_minimum_age, study_maximum_age, study_outcome_measures, study_participants, research_orgs, researchers, funders, funder_countries, associated_grant_ids, publication_ids, publications, altmetric, linkout, dimensions_url, date_inserted, score

Category fields: category_for, category_for_2020, category_bra, category_hra, category_hrcs_hc, category_hrcs_rac, category_icrp_cso, category_icrp_ct, category_rcdc

Patents (patents)

Search indexes: full_data (default), title_only, title_abstract_only, title_abstract_claims, inventors, assignees

Fieldsets: basics, extras, categories, all

Key fields: id, title, abstract, year, date, filing_date, filing_status, granted_date, granted_year, publication_date, publication_year, priority_date, priority_year, expiration_date, legal_status, jurisdiction, kind, application_number, family_id, family_count, claims_amount, inventor_names, inventors, assignee_names, assignees, assignee_countries, assignee_cities, assignee_state_codes, current_assignee_names, current_assignees, original_assignee_names, original_assignees, cpc, ipcr, times_cited, reference_ids, researchers, funders, funder_countries, associated_grant_ids, publication_ids, publications, additional_filters, federal_support, orange_book, linkout, dimensions_url, date_inserted, score

Category fields: category_for, category_for_2020, category_bra, category_hra, category_hrcs_hc, category_hrcs_rac, category_icrp_cso, category_icrp_ct, category_rcdc

Researchers (researchers)

Fieldsets: basics, extras, all

All fields: id, first_name, last_name, orcid_id, nih_ppid, current_research_org, research_orgs, first_publication_year, last_publication_year, first_grant_year, last_grant_year, total_publications, total_grants, obsolete, redirect, dimensions_url, score

Organizations (organizations)

Search indexes: full_data (default)

Fieldsets: basics, nuts, all

All fields: id, name, acronym, types, status, established, city_name, state_name, country_code, country_name, latitude, longitude, linkout, wikipedia_url, dimensions_url, hierarchy_details, ultimate_parent_id, organization_child_ids, organization_parent_ids, organization_related_ids, redirect, ror_ids, isni_ids, wikidata_ids, cnrs_ids, hesa_ids, ucas_ids, ukprn_ids, orgref_ids, external_ids_fundref, nuts_level1_code, nuts_level1_name, nuts_level2_code, nuts_level2_name, nuts_level3_code, nuts_level3_name, score

Organization types: Company, Education, Healthcare, Nonprofit, Facility, Other, Government, Archive

DSL Functions

classify

Classify text into research categories:

classify(title="Effect of Climate Change on Crop Yields", abstract="...", system="FOR_2020")

Systems: FOR, FOR_2020, RCDC, HRCS_HC, HRCS_RAC, HRA, BRA, ICRP_CSO, ICRP_CT, UOA, SDG, SDG_2021

extract_concepts

Extract key concepts from text:

extract_concepts("Genome editing using CRISPR-Cas9 enables precise modifications...")

extract_affiliations

Match affiliation strings to GRID organizations:

extract_affiliations(affiliation="Department of Chemistry, University of Oxford, UK")

Batch mode (up to 200):

extract_affiliations(json=[{"affiliation": "MIT, Cambridge MA"}, {"affiliation": "Stanford University"}])

extract_grants

Find grant Dimensions ID from grant number:

extract_grants(grant_number="R01HL117329", fundref="100000050")
extract_grants(grant_number="HL117648", funder_name="NIH")

identify experts

Find experts on a topic:

identify experts from concepts ["CRISPR", "gene editing", "Cas9"]
  using publications
  where year >= 2020
  return experts limit 20

Schema Discovery

Use describe to inspect available fields at runtime:

data = dsl.query("describe source publications")
data = dsl.query("describe source grants")
data = dsl.query("describe source clinical_trials")
data = dsl.query("describe entity researchers")

Tips and Gotchas

  1. Return [all] fields for maximum downstream flexibility unless performance is a concern.
  2. Always paginate: Use dsl.query_iterative() for any query that might return more than 1000 results.
  3. Rate limiting: Max 30 requests/minute. Add time.sleep(2) between paginated requests.
  4. 50,000 record ceiling: You cannot paginate beyond 50,000 results. Add filters to narrow results if needed.
  5. Entity fields vs literal fields: Prefer literal fields for filtering when available (e.g., research_org_names instead of research_orgs.name) as they are faster and more reliable.
  6. Boolean operators must be UPPERCASE in full-text search: AND, OR, NOT.
  7. Researcher name search: Use the dedicated index (in authors for, in investigators for). At least two name components are required.
  8. Partial match (~): Matches terms in any order within the field. Good for organization names.
  9. Deduplicate batched results: When chunking ID lists across multiple queries, always deduplicate by id.
  10. Hyper-authorship: Records with very many authors may be truncated. Retrieve them individually if needed.
  11. Token expiry: JWT tokens are valid for ~2 hours. dimcli handles re-authentication automatically.
  12. Empty results: Check for the source key in results before accessing (e.g., hasattr(data, 'publications')).
  13. [all] fieldset warning: May return deprecated fields. This is fine for data collection; ignore deprecation warnings.
  14. Date format: Dates are YYYY-MM-DD. Year fields are integers.
  15. Special characters in search terms need backslash escaping: ^, ", :, ~, \, [, ], {, }, (, ), !, |, &, +.
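For tip 15, a small escaping helper is handy when interpolating user-supplied terms into a full-text query. `escape_term` is an illustrative name, not a dimcli function:

```python
# Illustrative helper (not provided by dimcli): backslash-escape the
# DSL special characters listed in tip 15 inside a search term.
DSL_SPECIAL = set('^":~\\[]{}()!|&+')

def escape_term(term):
    return "".join("\\" + ch if ch in DSL_SPECIAL else ch for ch in term)

escape_term("C++ (programming)")  # returns r'C\+\+ \(programming\)'
```

Apply it only to the term itself, not to the surrounding DSL syntax, or the quotes and brackets of the query will be escaped too.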