@amita-shukla
Last active March 26, 2020 19:22
import nltk
from nltk import word_tokenize
from nltk.stem import PorterStemmer

# Tokenizer data used by word_tokenize; newer NLTK releases may ask for 'punkt_tab' instead.
nltk.download('punkt')
stemmer = PorterStemmer()

def stem_token(token):
    stemmed_token = stemmer.stem(token)
    return stemmed_token
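
def isNumber(token):
    # Assumed helper: the original snippet calls isNumber without defining it.
    try:
        float(token)
        return True
    except ValueError:
        return False

def stem_text(text):
    tokens = word_tokenize(text)
    # filter out numbers, periods and commas
    tokens = [token for token in tokens if not isNumber(token) and token not in ('.', ',')]
    stemmed_tokens = [stem_token(token) for token in tokens]
    stemmed_sentence = ' '.join(stemmed_tokens)  # construct the text again
    return stemmed_sentence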
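
The filter step inside stem_text can be checked on its own, without NLTK, by running it over a list that is already tokenized. This is a minimal sketch under two assumptions: isNumber is taken to mean "the token parses as a float" (the snippet does not define it), and keep_token is a hypothetical name introduced here only for illustration.

```python
def isNumber(token):
    # Assumption: a token counts as a number if float() accepts it.
    try:
        float(token)
        return True
    except ValueError:
        return False


def keep_token(token):
    # Hypothetical helper: same condition as the list comprehension in stem_text.
    return not isNumber(token) and token not in ('.', ',')


# A hand-tokenized sentence, standing in for the output of word_tokenize.
tokens = ['The', '3', 'cats', ',', 'ran', '2.5', 'miles', '.']
filtered = [t for t in tokens if keep_token(t)]
print(filtered)  # ['The', 'cats', 'ran', 'miles']
```

One consequence of the float-based check is that float() also accepts strings such as 'inf' and 'nan', so those tokens would be dropped as numbers too. If that matters for the corpus, a stricter test (for example a regular expression for digits) may be a better fit.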