@rrunix
Last active April 6, 2018 10:01
Obtain the number of citations for a set of URLs that point to academic papers. The program expects the links in a file called "links.txt", one link per line. (The requests package must be installed.)
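import re

import requests

# Matches the "Cited by N" text on a Google Scholar results page.
# [0-9]+ (not *) so an empty match can never reach int().
cites_pattern = re.compile(r'Cited\s*by\s*([0-9]+)')

with open("links.txt", "r") as fin:
    for line in fin:
        line = line.strip()
        # Search Scholar for the link and fetch the results page as text.
        page = requests.get("https://scholar.google.es/scholar?", params={'hl': 'en', 'q': line}).text
        match = cites_pattern.search(page)
        if match:
            # The pattern has exactly one capture group: the citation count.
            print(int(match.group(1)))
        else:
            # No "Cited by" text on the page: report zero citations.
            print(0)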
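
The parsing step can be tried without contacting Google Scholar. The following is a minimal sketch, assuming a hypothetical helper named count_cites and two hand-written HTML snippets; neither is part of the original gist, and real Scholar pages may be laid out differently.

```python
import re

# Same idea as the gist's pattern: capture the number after "Cited by".
CITES_PATTERN = re.compile(r'Cited\s*by\s*([0-9]+)')


def count_cites(page):
    """Return the first 'Cited by N' count found in page, or 0 if none.

    Hypothetical helper for illustration; not part of the original gist.
    """
    match = CITES_PATTERN.search(page)
    return int(match.group(1)) if match else 0


# Hand-written sample snippets, not real Scholar output.
sample_hit = '<a href="/scholar?cites=123">Cited by 57</a>'
sample_miss = '<div>No results</div>'

print(count_cites(sample_hit))
print(count_cites(sample_miss))
```

Keeping the regex in a function that takes the page text makes it possible to check the matching logic separately from the network request.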