@rakasaka
Created August 24, 2011 21:45
Unsupervised topic modeling in Ruby using LDA
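require 'lda-ruby' # gem install lda-ruby

# Build a small corpus of three one-sentence documents.
corpus = Lda::Corpus.new
corpus.add_document(Lda::TextDocument.new(corpus, "a lion is a wild feline animal", []))
corpus.add_document(Lda::TextDocument.new(corpus, "a dog is a friendly animal", []))
corpus.add_document(Lda::TextDocument.new(corpus, "a cat is a feline animal", []))

# Fit a two-topic model.
lda = Lda::Lda.new(corpus)
lda.verbose = false
lda.num_topics = 2
lda.em('random') # run EM starting from a random initialization

# Top three words for each topic.
topics = lda.top_words(3)

# Example result (the initialization is random, so output can vary between runs):
# => {0=>["animal", "friendly", "dog"], 1=>["animal", "feline", "cat"]}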
@rakasaka
Author

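LDA is short for Latent Dirichlet Allocation, a topic model introduced by David Blei, Andrew Ng, and Michael Jordan in 2003 (Blei is at Princeton at the time of writing). It is an unsupervised method for surfacing topics and themes in a collection of text documents: it needs no labels, only the words in each document and the number of topics to look for.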
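
To make the "unsupervised" part concrete, here is a rough pure-Ruby sketch of LDA on the same three sentences. It uses collapsed Gibbs sampling, which is a different inference method from the EM routine the gist calls through lda-ruby, so treat it as an illustration of the model rather than of the gem's internals. The names `toy_lda` and `tokenize`, the `STOPWORDS` list, and the `alpha`, `beta`, `iterations`, and `seed` values are all assumptions made up for this example.

```ruby
# Toy LDA via collapsed Gibbs sampling (illustrative only; not lda-ruby's algorithm).
STOPWORDS = %w[a is].freeze # hypothetical stop list for this tiny corpus

def tokenize(text)
  text.downcase.scan(/[a-z]+/).reject { |w| STOPWORDS.include?(w) }
end

def toy_lda(texts, num_topics:, iterations: 200, alpha: 0.1, beta: 0.01, seed: 42)
  rng   = Random.new(seed)
  docs  = texts.map { |t| tokenize(t) }
  vocab_size = docs.flatten.uniq.size

  doc_topic  = Array.new(docs.size) { Array.new(num_topics, 0) } # tokens in doc d assigned to topic k
  topic_word = Array.new(num_topics) { Hash.new(0) }             # times word w is assigned to topic k
  topic_sum  = Array.new(num_topics, 0)                          # total tokens assigned to topic k

  # Start by assigning every token to a random topic.
  assign = docs.each_with_index.map do |words, d|
    words.map do |w|
      k = rng.rand(num_topics)
      doc_topic[d][k] += 1
      topic_word[k][w] += 1
      topic_sum[k] += 1
      k
    end
  end

  iterations.times do
    docs.each_with_index do |words, d|
      words.each_with_index do |w, i|
        # Take the token out of the counts before resampling its topic.
        k = assign[d][i]
        doc_topic[d][k] -= 1
        topic_word[k][w] -= 1
        topic_sum[k] -= 1

        # Weight of topic t: (doc-topic count + alpha) * (topic-word count + beta) / (topic total + V * beta)
        weights = (0...num_topics).map do |t|
          (doc_topic[d][t] + alpha) * (topic_word[t][w] + beta) / (topic_sum[t] + vocab_size * beta)
        end

        # Draw a new topic in proportion to the weights.
        r = rng.rand * weights.sum
        k = weights.index { |wt| (r -= wt) <= 0 } || num_topics - 1

        assign[d][i] = k
        doc_topic[d][k] += 1
        topic_word[k][w] += 1
        topic_sum[k] += 1
      end
    end
  end

  # Up to three most frequent words per topic, shaped like the gist's result hash.
  (0...num_topics).map do |k|
    words = topic_word[k].select { |_, c| c > 0 }.sort_by { |w, c| [-c, w] }.first(3).map(&:first)
    [k, words]
  end.to_h
end

texts = [
  "a lion is a wild feline animal",
  "a dog is a friendly animal",
  "a cat is a feline animal"
]
topics = toy_lda(texts, num_topics: 2)
topics.each { |k, words| puts "topic #{k}: #{words.join(', ')}" }
```

With only ten tokens the groupings are noisy and depend on the seed, so do not expect them to match the gist's output. The point is that the sampler only ever sees word counts, never labels.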
