@salmanmaq
Last active November 16, 2018 17:23
Get a list of arXiv identifiers of Core and Non-Core records in INSPIRE
"""
Get the arXiv ids of all the Core and Non-Core records in INSPIRE.
"""
from invenio_search import current_search_client as es
from elasticsearch.helpers import scan
import numpy as np

core = []
non_core = []

# Match only records that have an arXiv eprint, and fetch only the two fields needed.
query = {
    "query": {"exists": {"field": "arxiv_eprints"}},
    "_source": ["core", "arxiv_eprints"],
}

for hit in scan(es, query=query, index='records-hep', doc_type='hep'):
    source = hit['_source']
    # Use the first arXiv eprint as the record's identifier.
    arxiv_id = source['arxiv_eprints'][0]['value']
    if source.get('core') == True:
        core.append(arxiv_id)
    else:
        # Records with core set to False, or with no core field, count as Non-Core.
        non_core.append(arxiv_id)

# Write one identifier per line.
core = np.array(core)
non_core = np.array(non_core)
core.tofile('inspire_core_list.txt', sep='\n')
non_core.tofile('inspire_noncore_list.txt', sep='\n')
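
The Core/Non-Core split above can also be written as a small function that runs without an INSPIRE or Elasticsearch instance, which makes it easier to check on sample data. This is a minimal sketch, not part of the original script: the names split_by_core and write_ids are hypothetical, and it assumes hits shaped like the ones scan yields above (a dict with a '_source' key). It swaps NumPy's tofile for plain file I/O and skips hits that have no usable arXiv eprint instead of raising.

```python
def split_by_core(hits):
    """Partition hits into (core, non_core) lists of arXiv ids.

    `hits` is any iterable of dicts with a '_source' key, e.g. the
    generator returned by elasticsearch.helpers.scan (assumed shape).
    """
    core, non_core = [], []
    for hit in hits:
        source = hit.get('_source', {})
        eprints = source.get('arxiv_eprints') or []
        # Skip records without a usable first eprint rather than failing.
        if not eprints or 'value' not in eprints[0]:
            continue
        # A missing or False 'core' field counts as Non-Core.
        target = core if source.get('core') is True else non_core
        target.append(eprints[0]['value'])
    return core, non_core


def write_ids(path, ids):
    """Write one identifier per line (hypothetical helper)."""
    with open(path, 'w') as handle:
        handle.write('\n'.join(ids))
```

With the Elasticsearch client from the script, the function could be fed the scan generator directly, for example split_by_core(scan(es, query=query, index='records-hep', doc_type='hep')), and each returned list passed to write_ids.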