
@lwis
Created March 16, 2015 12:09
Download and generate digests for all links in content
import hashlib, re, urllib

# Text containing the links to download and digest.
resources = """<text with links in goes here>"""

def hashfile(afile, hasher, blocksize=65536):
    # Stream the file through the hasher in fixed-size chunks so large
    # downloads are not read into memory all at once.
    buf = afile.read(blocksize)
    while len(buf) > 0:
        hasher.update(buf)
        buf = afile.read(blocksize)
    return hasher.hexdigest()

# Extract every http(s) URL from the text.
urls = re.findall(r'http[s]?://(?:[a-zA-Z]|[0-9]|[$-_@.&+]|[!*\(\),]|(?:%[0-9a-fA-F][0-9a-fA-F]))+', resources)

# Download each URL to a temporary file and print its SHA-256 digest.
for url in urls:
    fileurl, headers = urllib.urlretrieve(url)
    print url
    print hashfile(open(fileurl, 'rb'), hashlib.sha256())
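The gist above targets Python 2 (`urllib.urlretrieve`, `print` statements). A Python 3 sketch of the same chunked-digest approach might look like the following; the sample text, the simplified URL regex, and the in-memory demo are illustrative assumptions, not part of the original:

```python
# Python 3 sketch of the gist's approach (hypothetical sample text; in
# Python 3, urlretrieve lives in urllib.request).
import hashlib
import io
import re
import urllib.request

resources = """Docs at https://example.com/a.txt and https://example.com/b.bin"""

def hashfile(afile, hasher, blocksize=65536):
    """Stream a binary file-like object through `hasher` in fixed-size chunks."""
    buf = afile.read(blocksize)
    while buf:
        hasher.update(buf)
        buf = afile.read(blocksize)
    return hasher.hexdigest()

# A simpler URL pattern than the original's; stops at whitespace and quotes.
urls = re.findall(r'https?://[^\s"<>]+', resources)
print(urls)

# hashfile accepts any binary file-like object, e.g. an in-memory buffer:
print(hashfile(io.BytesIO(b"hello"), hashlib.sha256()))

# To fetch and hash real URLs, as the gist does:
# for url in urls:
#     filename, headers = urllib.request.urlretrieve(url)
#     with open(filename, "rb") as f:
#         print(url, hashfile(f, hashlib.sha256()))
```

The chunked read matters because `urlretrieve` can fetch arbitrarily large files; hashing 64 KiB at a time keeps memory use constant regardless of file size.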