@varnit
Created November 9, 2011 18:29

Revisions

  1. varnit revised this gist Nov 9, 2011. 1 changed file with 1 addition and 1 deletion.
    File: gistfile1.sh
    @@ -1,4 +1,4 @@
    -$ wget http://www.daviddlewis.com/resources/ testcollections/reuters21578/reuters21578.tar.gz
    +$ wget http://www.daviddlewis.com/resources/testcollections/reuters21578/reuters21578.tar.gz

    $ mvn -e -q exec:java -Dexec.mainClass="org.apache.lucene.benchmark.utils.ExtractReuters" -Dexec.args="reuters/ reuters-extracted/"

  2. varnit revised this gist Nov 9, 2011. 1 changed file with 1 addition and 1 deletion.
    File: gistfile1.sh
    @@ -1,4 +1,4 @@
    -wget http://www.daviddlewis.com/resources/ testcollections/reuters21578/reuters21578.tar.gz
    +$ wget http://www.daviddlewis.com/resources/ testcollections/reuters21578/reuters21578.tar.gz

    $ mvn -e -q exec:java -Dexec.mainClass="org.apache.lucene.benchmark.utils.ExtractReuters" -Dexec.args="reuters/ reuters-extracted/"

  3. varnit revised this gist Nov 9, 2011. 1 changed file with 1 addition and 1 deletion.
    File: gistfile1.sh
    @@ -6,7 +6,7 @@ $ hadoop dfs -put reuters-extracted/* reuters/

    $ bin/mahout seqdirectory -c UTF-8 -i reuters/ -o reuters-seqfiles

    -$ bin/mahout seq2sparse -i reuters-seqfiles/ -o reuters-vectors -ow
    +$ bin/mahout seq2sparse -i reuters-seqfiles/ -o reuters-vectors

    $ bin/mahout lda -i reuters-vectors/tf-vectors -o reuters-lda-sparse -k 10 -v 70000 -x 20

  4. varnit created this gist Nov 9, 2011.
    File: gistfile1.sh (13 additions, 0 deletions)
    @@ -0,0 +1,13 @@
    wget http://www.daviddlewis.com/resources/ testcollections/reuters21578/reuters21578.tar.gz

    $ mvn -e -q exec:java -Dexec.mainClass="org.apache.lucene.benchmark.utils.ExtractReuters" -Dexec.args="reuters/ reuters-extracted/"

    $ hadoop dfs -put reuters-extracted/* reuters/

    $ bin/mahout seqdirectory -c UTF-8 -i reuters/ -o reuters-seqfiles

    $ bin/mahout seq2sparse -i reuters-seqfiles/ -o reuters-vectors -ow

    $ bin/mahout lda -i reuters-vectors/tf-vectors -o reuters-lda-sparse -k 10 -v 70000 -x 20

    $ bin/mahout org.apache.mahout.clustering.lda.LDAPrintTopics -i reuters-lda-sparse/state-20/ -d reuters-vectors/dictionary.file-* -dt sequencefile -w 5
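Applied in order, the revisions leave gistfile1.sh with the corrected download URL, a `$ ` prompt on the first line, and `seq2sparse` without `-ow`. Below is a minimal sketch of that final sequence wrapped in one bash function. It is an illustration under stated assumptions, not the author's script: the `tar` step (unpacking the archive into `reuters/` so ExtractReuters can find the `.sgm` files), the `DRY_RUN` switch, and the `TOPICS`, `VOCAB`, `ITERS` and `WORDS` overrides are hypothetical additions, and the `-d` dictionary path points at `reuters-vectors/`, the output directory of `seq2sparse`. Like the gist, it assumes every command runs from a working directory where `mvn exec:java` can resolve ExtractReuters and `bin/mahout` exists.

```shell
#!/usr/bin/env bash
# Sketch of gistfile1.sh after all revisions. Hypothetical additions:
# the tar step, DRY_RUN, and the TOPICS/VOCAB/ITERS/WORDS overrides.

REUTERS_URL="http://www.daviddlewis.com/resources/testcollections/reuters21578/reuters21578.tar.gz"

# Execute a command, or only print it when DRY_RUN=1 (hypothetical switch).
run() {
  if [[ "${DRY_RUN:-0}" == "1" ]]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# The ( ... ) body runs in a subshell, so set -euo pipefail aborts the
# pipeline at the first failing step without changing the caller's shell.
reuters_lda_pipeline() (
  set -euo pipefail
  topics="${TOPICS:-10}"   # -k: number of topics (gist uses 10)
  vocab="${VOCAB:-70000}"  # -v: upper bound on vocabulary size (gist uses 70000)
  iters="${ITERS:-20}"     # -x: maximum iterations (gist uses 20)
  words="${WORDS:-5}"      # -w: words printed per topic (gist uses 5)

  run wget "$REUTERS_URL"
  # Assumption: ExtractReuters reads the raw .sgm files from reuters/.
  run mkdir -p reuters
  run tar -xzf reuters21578.tar.gz -C reuters
  run mvn -e -q exec:java \
    -Dexec.mainClass="org.apache.lucene.benchmark.utils.ExtractReuters" \
    -Dexec.args="reuters/ reuters-extracted/"
  run hadoop dfs -put reuters-extracted/* reuters/
  run bin/mahout seqdirectory -c UTF-8 -i reuters/ -o reuters-seqfiles
  run bin/mahout seq2sparse -i reuters-seqfiles/ -o reuters-vectors
  run bin/mahout lda -i reuters-vectors/tf-vectors -o reuters-lda-sparse \
    -k "$topics" -v "$vocab" -x "$iters"
  # The gist reads state-20 after -x 20; here the directory follows ITERS.
  run bin/mahout org.apache.mahout.clustering.lda.LDAPrintTopics \
    -i "reuters-lda-sparse/state-${iters}/" \
    -d reuters-vectors/dictionary.file-* -dt sequencefile -w "$words"
)
```

Source the file and run `DRY_RUN=1 reuters_lda_pipeline` to print the nine commands without touching the network or the cluster; drop `DRY_RUN` to execute them. Deriving the `state-N` directory from `ITERS` keeps the LDAPrintTopics input in step with `-x`, assuming the LDA job actually runs for that many iterations.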