| name | dimensions-ai |
|---|---|
| description | Query the Dimensions.ai scholarly research database using its DSL API via Python. TRIGGER when: user asks about publications, grants, clinical trials, patents, researchers, research organizations, funding data, citation metrics, research output, scholarly data, academic papers, principal investigators, GRID IDs, ORCID, or Dimensions. Use this skill to build Python scripts that authenticate, query, paginate, and return structured data from the Dimensions Analytics API. |
Build Python scripts that query the Dimensions Analytics API to retrieve scholarly research data including publications, grants, clinical trials, patents, and researcher profiles.
The Dimensions Analytics API uses a custom query language called DSL (Dimensions Search Language). All interaction happens via HTTP POST requests carrying DSL query strings. Results come back as JSON. Python scripts should use the dimcli library for convenience (handles auth, pagination, and response parsing), but can also use raw requests if needed.
The API key is stored in a dimcli configuration file (~/.dimcli/dsl.ini) and dimcli.login() with no arguments reads from it automatically.
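For reference, a minimal `~/.dimcli/dsl.ini` looks like this (`[instance.live]` is dimcli's default instance name; the key value is a placeholder):

```ini
[instance.live]
url=https://app.dimensions.ai
key=your-secret-api-key
```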
```python
import dimcli

dimcli.login()
dsl = dimcli.Dsl()
```

If you need raw `requests` access instead:

```python
import requests

API_KEY = "your-key"  # Or read from environment/config
ENDPOINT = "https://app.dimensions.ai"

# Authenticate to obtain a JWT token
resp = requests.post(f"{ENDPOINT}/api/auth.json", json={"key": API_KEY})
resp.raise_for_status()
token = resp.json()["token"]
headers = {"Authorization": f"JWT {token}"}

# Make a query
resp = requests.post(
    f"{ENDPOINT}/api/dsl/v2",
    data='search publications for "malaria" return publications'.encode(),
    headers=headers,
)
result = resp.json()
```

Every query has this form:
```
search <source> [in <index>] [for <terms>] [where <filters>] return <result> [limit N] [skip M] [sort by <field> [asc|desc]]
```

- `source`: `publications`, `grants`, `clinical_trials`, `patents`, `researchers`, `organizations`
- `for`: full-text search terms in double quotes
- `where`: field-level filters
- `return`: what to return (source records, facets, or specific fields)
- `limit`: max records per request (max 1,000; default 20)
- `skip`: offset for pagination (max 50,000 records total)
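To make the clause ordering concrete, here is a small query-builder sketch. The `build_query` helper is ours (not part of dimcli or the API); it simply assembles the clauses above in the order the grammar requires:

```python
def build_query(source, index=None, terms=None, where=None,
                limit=None, skip=None, sort=None):
    """Assemble a DSL query string, emitting clauses in grammar order."""
    parts = ["search", source]
    if index:
        parts += ["in", index]
    if terms:
        parts += ["for", f'"{terms}"']
    if where:
        parts += ["where", where]
    parts += ["return", source]
    if limit is not None:
        parts += ["limit", str(limit)]
    if skip is not None:
        parts += ["skip", str(skip)]
    if sort:
        parts += ["sort by", sort]
    return " ".join(parts)
```

For example, `build_query("grants", index="investigators", terms="Jane Smith", limit=100)` yields `search grants in investigators for "Jane Smith" return grants limit 100`.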
```
search publications for "machine learning" return publications
search publications in title_abstract_only for "CRISPR" return publications
search publications in authors for "Jennifer Doudna" return publications
search grants in investigators for "Jane Smith" return grants
```

Boolean operators (must be UPPERCASE): `AND`, `OR`, `NOT`

```
search publications for "malaria AND africa AND (treatment OR prevention)" return publications
```

Triple-quote syntax for complex queries with nested quotes:

```
search publications for """
"deep learning" AND ("natural language processing" OR "computer vision")
""" return publications
```

```
search publications where year in [2020:2024] return publications
search publications where researchers.orcid_id = "0000-0002-1838-9363" return publications
search publications where research_orgs.name ~ "Harvard" return publications
search grants where funder_org_name = "National Institutes of Health" return grants
search clinical_trials where conditions = "breast cancer" return clinical_trials
```

Filter operators: `=`, `!=`, `>`, `<`, `>=`, `<=`, `~` (partial match), `@` (Lucene field search), `in` (range/list), `is empty`, `is not empty`

Combine with `and`, `or`, `not`:

```
search publications where year >= 2020 and research_org_names ~ "Stanford" and type = "article" return publications
```

Return all fields for maximum flexibility:
```
return publications[all]
return grants[all]
return clinical_trials[all]
```

Or use fieldsets:

```
return publications[basics + extras]
return grants[basics + extras + categories]
```

Or specify individual fields:

```
return publications[id + doi + title + authors + year + times_cited + research_orgs + abstract]
```

Sort results with `sort by`:

```
return publications sort by times_cited desc
return grants sort by start_date desc
return publications sort by year asc
```

Critical: the API returns at most 1,000 records per request and allows pagination up to 50,000 records total. Always paginate when results may exceed 1,000.
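The limit/skip arithmetic can be precomputed up front; a tiny sketch (the `skip_offsets` helper name is ours) that yields every page offset under the 50,000-record ceiling:

```python
def skip_offsets(total: int, limit: int = 1000, cap: int = 50000) -> list[int]:
    """Return every skip offset needed to page through `total` records,
    respecting the per-request limit and the overall pagination cap."""
    return list(range(0, min(total, cap), limit))
```

So `skip_offsets(2500)` gives `[0, 1000, 2000]`, and for any total above the cap the last offset is 49,000.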
`query_iterative` handles pagination automatically:

```python
import dimcli

dimcli.login()
dsl = dimcli.Dsl()

# Automatically paginates through ALL results (up to 50,000)
data = dsl.query_iterative(
    'search publications where researchers.orcid_id = "0000-0002-1838-9363" return publications[all]'
)
print(f"Total: {data.count_total}")
print(f"Retrieved: {len(data.publications)}")
```

Manual pagination with `limit` and `skip`:

```python
import dimcli
import time

dimcli.login()
dsl = dimcli.Dsl()

LIMIT = 1000
skip = 0
total = 0
all_results = []
while True:
    query = f'search publications where research_org_names ~ "MIT" and year = 2023 return publications[all] limit {LIMIT} skip {skip}'
    data = dsl.query(query)
    if not hasattr(data, "publications") or len(data.publications) == 0:
        break
    all_results.extend(data.publications)
    total = data.stats.get("total_count", 0)
    if skip + LIMIT >= total or skip + LIMIT >= 50000:
        break
    skip += LIMIT
    time.sleep(2)  # respect rate limits (30 req/min)

print(f"Retrieved {len(all_results)} of {total} publications")
```

The same pattern with raw `requests`:

```python
import requests
import time

API_KEY = "your-key"
ENDPOINT = "https://app.dimensions.ai"

# Authenticate
resp = requests.post(f"{ENDPOINT}/api/auth.json", json={"key": API_KEY})
resp.raise_for_status()
headers = {"Authorization": f"JWT {resp.json()['token']}"}

LIMIT = 1000
skip = 0
total = 0
all_results = []
while True:
    query = f'search grants where research_org_names ~ "Stanford" return grants[all] limit {LIMIT} skip {skip}'
    resp = requests.post(f"{ENDPOINT}/api/dsl/v2", data=query.encode(), headers=headers)
    resp.raise_for_status()
    result = resp.json()
    records = result.get("grants", [])
    if not records:
        break
    all_results.extend(records)
    total = result.get("_stats", {}).get("total_count", 0)
    if skip + LIMIT >= total or skip + LIMIT >= 50000:
        break
    skip += LIMIT
    time.sleep(2)

print(f"Retrieved {len(all_results)} of {total} grants")
```

When filtering by lists of IDs, the API allows max 400 items per filter clause. Chunk larger lists:
```python
import json
import time

import dimcli
from dimcli.utils import chunks_of

dimcli.login()
dsl = dimcli.Dsl()

researcher_ids = [...]  # large list of researcher IDs
all_results = []
for chunk in chunks_of(researcher_ids, 200):
    query = f'search publications where researchers in {json.dumps(chunk)} return publications[all]'
    data = dsl.query_iterative(query)
    if hasattr(data, "publications"):
        all_results.extend(data.publications)
    time.sleep(1)

# Deduplicate records that appear in more than one chunk
seen = set()
unique = []
for r in all_results:
    if r["id"] not in seen:
        seen.add(r["id"])
        unique.append(r)
```

| Constraint | Limit |
|---|---|
| Requests per minute per IP | 30 |
| Items in a filter clause | 400 |
| Boolean filter conditions | 100 |
| Full-text boolean clauses | 100 |
| Records per single query | 1,000 |
| Total records via pagination | 50,000 |
| Facet results | 1,000 (no pagination) |
Always add `time.sleep(2)` between paginated requests to stay within rate limits.
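Instead of scattering `time.sleep(2)` calls, the pacing can be centralized; a minimal sketch (the `RateLimiter` class is ours, not part of dimcli):

```python
import time


class RateLimiter:
    """Enforce a minimum delay between successive API calls.

    The default 2-second interval keeps you under 30 requests/minute.
    """

    def __init__(self, min_interval: float = 2.0):
        self.min_interval = min_interval
        self._last = 0.0

    def wait(self):
        # Sleep only for the remainder of the interval, if any
        elapsed = time.monotonic() - self._last
        if elapsed < self.min_interval:
            time.sleep(self.min_interval - elapsed)
        self._last = time.monotonic()
```

Usage: create one `RateLimiter()` and call `rl.wait()` immediately before each `dsl.query(...)` in a pagination loop.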
By ORCID (most reliable):

```
search publications where researchers.orcid_id = "0000-0002-1838-9363" return publications[all]
```

By name (searches author name index):

```
search publications in authors for "Jennifer A Doudna" return publications[all]
```

By Dimensions researcher ID:

```
search publications where researchers.id = "ur.011301404166.06" return publications[all]
```

By organization name (partial match):

```
search publications where research_org_names ~ "University of Oxford" and year in [2020:2024] return publications[all]
```

By GRID ID (exact):

```
search publications where research_orgs.id = "grid.4991.5" and year = 2023 return publications[all]
```

Grants by investigator or researcher:

```
search grants in investigators for "Jane Smith" return grants[all]
search grants where researchers.orcid_id = "0000-0002-1838-9363" return grants[all]
```

Grants by organization:

```
search grants where research_org_names ~ "Johns Hopkins" return grants[all]
search grants where research_orgs.id = "grid.21107.35" return grants[all]
```

Grants by funder:

```
search grants where funder_org_name = "National Institutes of Health" return grants[all]
search grants where funder_orgs.acronym = "NSF" return grants[all]
```

Clinical trials:

```
search clinical_trials in investigators for "John Smith" return clinical_trials[all]
search clinical_trials where researchers.orcid_id = "0000-0002-1838-9363" return clinical_trials[all]
search clinical_trials where research_orgs.name ~ "Mayo Clinic" return clinical_trials[all]
search clinical_trials for "ovarian neoplasms" return clinical_trials[all]
search clinical_trials where conditions = "breast cancer" return clinical_trials[all]
search clinical_trials where mesh_terms = "Ovarian Neoplasms" return clinical_trials[all]
```

Patents:

```
search patents in inventors for "John Smith" return patents[all]
search patents where assignees.name ~ "Google" return patents[all]
```

Topic searches across sources:

```
search publications for "ovarian neoplasms" where year in [2020:2024] return publications[all]
search grants for "ovarian neoplasms" return grants[all]
search clinical_trials for "ovarian neoplasms" return clinical_trials[all]
```

Researchers:

```
search researchers for "Jennifer Doudna" return researchers[all]
search researchers where orcid_id = "0000-0002-1838-9363" return researchers[all]
search researchers where last_name = "Doudna" and first_name = "Jennifer" return researchers[all]
```

Organizations:

```
search organizations for "Harvard" return organizations[all]
search organizations where name ~ "Harvard University" return organizations[all]
search organizations where id = "grid.38142.3c" return organizations[all]
```

This is the standard pattern for a script that queries Dimensions and returns all results with full pagination:
```python
#!/usr/bin/env python3
"""Query Dimensions.ai API and return results as JSON."""
import dimcli
import json
import sys


def query_dimensions(dsl_query: str, source: str) -> list[dict]:
    """Execute a DSL query with automatic pagination, return all records.

    Args:
        dsl_query: The DSL query string (without limit/skip - added automatically).
        source: The source type being queried (e.g., 'publications', 'grants').

    Returns:
        List of result dictionaries.
    """
    dimcli.login()
    dsl = dimcli.Dsl()
    data = dsl.query_iterative(dsl_query)
    results = getattr(data, source, [])
    print(f"Retrieved {len(results)} of {data.count_total} {source}", file=sys.stderr)
    return results


def main():
    # Example: Get all publications for a researcher by ORCID
    query = 'search publications where researchers.orcid_id = "0000-0002-1838-9363" return publications[all]'
    results = query_dimensions(query, "publications")
    # Output as JSON
    print(json.dumps(results, indent=2, default=str))


if __name__ == "__main__":
    main()
```

Publications search indexes: full_data (default), title_only, title_abstract_only, authors, concepts, raw_affiliations, funding, full_data_exact, acknowledgements
Fieldsets: basics, extras, categories, book, all
Key fields: id, doi, pmid, pmcid, title, abstract, authors, year, date, type, journal, volume, issue, pages, publisher, times_cited, recent_citations, relative_citation_ratio, field_citation_ratio, altmetric, open_access, mesh_terms, concepts, concepts_scores, research_orgs, research_org_names, research_org_countries, research_org_country_names, researchers, funders, funder_countries, supporting_grant_ids, reference_ids, referenced_pubs, clinical_trial_ids, source_title, issn, isbn, dimensions_url, linkout, document_type, date_inserted, date_online, date_print, acknowledgements, funding_section, book_doi, book_title, book_series_title, proceedings_title, subtitles, editors, arxiv_id, altmetric_id, resulting_publication_doi, journal_title_raw, journal_lists, score
Category fields: category_for, category_for_2020, category_bra, category_hra, category_hrcs_hc, category_hrcs_rac, category_icrp_cso, category_icrp_ct, category_rcdc, category_sdg, category_uoa
Publication types: article, chapter, proceeding, monograph, preprint, book
Grants search indexes: full_data (default), title_only, title_abstract_only, raw_affiliations, investigators, concepts
Fieldsets: basics, extras, categories, all
Key fields: id, title, original_title, abstract, start_date, start_year, end_date, active_year, active_status, investigators, research_orgs, research_org_names, research_org_countries, research_org_types, funder_orgs, funder_org_name, funder_org_acronym, funder_org_countries, funder_org_cities, funder_org_states, funding_usd, funding_eur, funding_gbp, funding_cny, funding_aud, funding_chf, funding_nzd, funding_cad, funding_jpy, funding_currency, funding_schemes, project_numbers, foa_number, researchers, keywords, concepts, concepts_scores, language, language_title, linkout, dimensions_url, date_inserted, score
Category fields: category_for, category_for_2020, category_bra, category_hra, category_hrcs_hc, category_hrcs_rac, category_icrp_cso, category_icrp_ct, category_rcdc, category_sdg, category_uoa
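Grant funding fields can be aggregated client-side once records are retrieved; a small sketch (the `total_funding_usd` helper name is ours) over already-fetched grant dicts that tolerates missing or null `funding_usd` values:

```python
def total_funding_usd(grants: list[dict]) -> float:
    """Sum funding_usd across grant records, treating missing/null as zero."""
    return sum(g.get("funding_usd") or 0 for g in grants)
```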
Clinical trials search indexes: full_data (default), title_only, title_abstract_only, raw_affiliations, investigators
Fieldsets: basics, extras, studies, categories, all
Key fields: id, title, brief_title, acronym, abstract, start_date, end_date, active_years, overall_status, phase, registry, gender, conditions, interventions, investigators, mesh_terms, study_type, study_designs, study_arms, study_eligibility_criteria, study_minimum_age, study_maximum_age, study_outcome_measures, study_participants, research_orgs, researchers, funders, funder_countries, associated_grant_ids, publication_ids, publications, altmetric, linkout, dimensions_url, date_inserted, score
Category fields: category_for, category_for_2020, category_bra, category_hra, category_hrcs_hc, category_hrcs_rac, category_icrp_cso, category_icrp_ct, category_rcdc
Patents search indexes: full_data (default), title_only, title_abstract_only, title_abstract_claims, inventors, assignees
Fieldsets: basics, extras, categories, all
Key fields: id, title, abstract, year, date, filing_date, filing_status, granted_date, granted_year, publication_date, publication_year, priority_date, priority_year, expiration_date, legal_status, jurisdiction, kind, application_number, family_id, family_count, claims_amount, inventor_names, inventors, assignee_names, assignees, assignee_countries, assignee_cities, assignee_state_codes, current_assignee_names, current_assignees, original_assignee_names, original_assignees, cpc, ipcr, times_cited, reference_ids, researchers, funders, funder_countries, associated_grant_ids, publication_ids, publications, additional_filters, federal_support, orange_book, linkout, dimensions_url, date_inserted, score
Category fields: category_for, category_for_2020, category_bra, category_hra, category_hrcs_hc, category_hrcs_rac, category_icrp_cso, category_icrp_ct, category_rcdc
Researchers fieldsets: basics, extras, all
All fields: id, first_name, last_name, orcid_id, nih_ppid, current_research_org, research_orgs, first_publication_year, last_publication_year, first_grant_year, last_grant_year, total_publications, total_grants, obsolete, redirect, dimensions_url, score
Organizations search indexes: full_data (default)
Fieldsets: basics, nuts, all
All fields: id, name, acronym, types, status, established, city_name, state_name, country_code, country_name, latitude, longitude, linkout, wikipedia_url, dimensions_url, hierarchy_details, ultimate_parent_id, organization_child_ids, organization_parent_ids, organization_related_ids, redirect, ror_ids, isni_ids, wikidata_ids, cnrs_ids, hesa_ids, ucas_ids, ukprn_ids, orgref_ids, external_ids_fundref, nuts_level1_code, nuts_level1_name, nuts_level2_code, nuts_level2_name, nuts_level3_code, nuts_level3_name, score
Organization types: Company, Education, Healthcare, Nonprofit, Facility, Other, Government, Archive
Classify text into research categories:

```
classify(title="Effect of Climate Change on Crop Yields", abstract="...", system="FOR_2020")
```

Systems: FOR, FOR_2020, RCDC, HRCS_HC, HRCS_RAC, HRA, BRA, ICRP_CSO, ICRP_CT, UOA, SDG, SDG_2021

Extract key concepts from text:

```
extract_concepts("Genome editing using CRISPR-Cas9 enables precise modifications...")
```

Match affiliation strings to GRID organizations:

```
extract_affiliations(affiliation="Department of Chemistry, University of Oxford, UK")
```

Batch mode (up to 200):

```
extract_affiliations(json=[{"affiliation": "MIT, Cambridge MA"}, {"affiliation": "Stanford University"}])
```

Find grant Dimensions ID from grant number:

```
extract_grants(grant_number="R01HL117329", fundref="100000050")
extract_grants(grant_number="HL117648", funder_name="NIH")
```

Find experts on a topic:

```
identify experts from concepts ["CRISPR", "gene editing", "Cas9"]
using publications
where year >= 2020
return experts limit 20
```

Use `describe` to inspect available fields at runtime:
```python
data = dsl.query("describe source publications")
data = dsl.query("describe source grants")
data = dsl.query("describe source clinical_trials")
data = dsl.query("describe entity researchers")
```

- Return `[all]` fields for maximum downstream flexibility unless performance is a concern.
- Always paginate: Use `dsl.query_iterative()` for any query that might return more than 1,000 results.
- Rate limiting: Max 30 requests/minute. Add `time.sleep(2)` between paginated requests.
- 50,000 record ceiling: You cannot paginate beyond 50,000 results. Add filters to narrow results if needed.
- Entity fields vs literal fields: Prefer literal fields for filtering when available (e.g., `research_org_names` instead of `research_orgs.name`) as they are faster and more reliable.
- Boolean operators must be UPPERCASE in full-text search: `AND`, `OR`, `NOT`.
- Researcher name search: Use the dedicated index (`in authors for`, `in investigators for`). At least two name components are required.
- Partial match (`~`): Matches terms in any order within the field. Good for organization names.
- Deduplicate batched results: When chunking ID lists across multiple queries, always deduplicate by `id`.
- Hyper-authorship: Records with very many authors may be truncated. Retrieve them individually if needed.
- Token expiry: JWT tokens are valid for ~2 hours. `dimcli` handles re-authentication automatically.
- Empty results: Check for the source key in results before accessing (e.g., `hasattr(data, 'publications')`).
- `[all]` fieldset warning: May return deprecated fields. This is fine for data collection; ignore deprecation warnings.
- Date format: Dates are `YYYY-MM-DD`. Year fields are integers.
- Special characters in search terms need backslash escaping: `^`, `"`, `:`, `~`, `\`, `[`, `]`, `{`, `}`, `(`, `)`, `!`, `|`, `&`, `+`.
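The escaping rule in the last point can be automated; a minimal sketch (the `escape_dsl` helper is ours, covering exactly the characters listed above):

```python
# DSL special characters that need a leading backslash in search terms
DSL_SPECIAL = set('^":~\\[]{}()!|&+')


def escape_dsl(term: str) -> str:
    """Backslash-escape DSL special characters in a full-text search term."""
    return "".join("\\" + ch if ch in DSL_SPECIAL else ch for ch in term)
```

For example, `escape_dsl("C++")` yields `C\+\+`, which is safe to interpolate into a `for "..."` clause.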