Last active
April 6, 2018 10:01
Obtain the number of citations for a set of URLs that point to academic papers. The program expects the links in a file called "links.txt", one link per line. (The requests package must be installed.)
import re

import requests

# Matches the "Cited by N" text on a Google Scholar results page.
# [0-9]+ (not [0-9]*) so an empty match can never reach int().
cites_pattern = re.compile(r'Cited\s*by\s*([0-9]+)')

with open("links.txt", "r") as fin:
    for line in fin:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Query Scholar with the paper URL and fetch the results page as text.
        page = requests.get("https://scholar.google.es/scholar", params={'hl': 'en', 'q': line}).text
        match = cites_pattern.search(page)
        if match:
            # The pattern has exactly one group: the citation count.
            print(int(match.group(1)))
        else:
            # No "Cited by" text found: report zero citations.
            print(0)
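The parsing step can be tried without any network access. The following is a minimal sketch, assuming the same "Cited by N" pattern as the script; the helper name `extract_citation_count` and the sample HTML fragments are hypothetical and are not real Scholar responses.

```python
import re

# Same idea as the script's pattern: the digits that follow "Cited by".
CITES_PATTERN = re.compile(r"Cited\s*by\s*([0-9]+)")


def extract_citation_count(html):
    """Return the first "Cited by N" count found in html, or 0 if absent."""
    match = CITES_PATTERN.search(html)
    return int(match.group(1)) if match else 0


# Hypothetical page fragments used only for illustration.
print(extract_citation_count('<a href="#">Cited by 42</a>'))  # 42
print(extract_citation_count("<p>No results</p>"))            # 0
```

Keeping the extraction in its own function makes it possible to check the regular expression against saved pages before sending any requests.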