@idclark
Created July 22, 2012 01:04
Useful function to return text content free of HTML tags
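from urllib.request import urlopen  # Python 3 replacement for urllib2

from bs4 import BeautifulSoup

# choose a url
url = ' '
# Name the parser explicitly so results do not depend on which parsers are installed
Soup = BeautifulSoup(urlopen(url), 'html.parser')


def get_tag_contents(tag_name):
    """Return the text of every <tag_name> element, with nested markup stripped."""
    content_list = []
    for tag in Soup.find_all(tag_name):
        # get_text() drops nested tags; tag.contents would keep them as Tag objects
        content_list.append(tag.get_text())
    return content_list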
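
The function above depends on a module-level Soup built from a live URL, so it cannot be tried without a network connection. As a minimal sketch of the same idea, the variant below takes the parsed document as an argument and runs against an inline HTML string. The names get_tag_text and SAMPLE_HTML are hypothetical and not part of the original gist, and using get_text() to strip nested markup is an assumption about what "free of HTML tags" is meant to cover.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, standing in for a fetched page
SAMPLE_HTML = "<html><body><p>Hello <b>world</b></p><p>Second</p></body></html>"


def get_tag_text(soup, tag_name):
    """Return the tag-free text of every <tag_name> element in soup."""
    # get_text() concatenates the text nodes and discards nested tags such as <b>
    return [tag.get_text() for tag in soup.find_all(tag_name)]


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
texts = get_tag_text(soup, "p")
print(texts)  # ['Hello world', 'Second']
```

Passing the soup in explicitly means one function can serve several pages and is easy to check against small hand-written fragments like this one.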