Created
March 9, 2013 07:28
masayang created this gist on Mar 9, 2013.
#!/usr/bin/env python
# -*- coding: utf-8 -*-
from mrjob.job import MRJob


class MRWordCounter(MRJob):
    """Count how often each word occurs in the input."""

    def mapper(self, key, line):
        # key is unused (None for raw text input); emit (word, 1) per word.
        for word in line.split():
            yield word, 1

    def reducer(self, word, occurrences):
        # occurrences iterates over every 1 emitted for this word.
        yield word, sum(occurrences)


if __name__ == '__main__':
    MRWordCounter.run()

# Run locally
python wc.py < creativecommons.txt

# Run on EMR using a file on S3
export AWS_ACCESS_KEY_ID=<your aws access key>
export AWS_SECRET_ACCESS_KEY=<your secret access key>
python wc.py -r emr s3://masayang-bootcamp/bootcamp4/EMRconsole/creativecommons.txt -o s3://masayang-bootcamp/bootcamp4/EMRconsole/<your account>
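To see roughly what happens between the mapper and the reducer of MRWordCounter, here is a minimal sketch in plain Python with no mrjob dependency. It assumes the framework simply groups every value emitted by the mapper under its key before calling the reducer once per key. The names `run_word_count` and `sample` are hypothetical and are not part of the gist or of mrjob; a real run also involves serialization, sorting, and distribution across machines that this sketch ignores.

```python
from collections import defaultdict


def mapper(line):
    """Emit (word, 1) for each whitespace-separated word, as MRWordCounter.mapper does."""
    for word in line.split():
        yield word, 1


def reducer(word, occurrences):
    """Sum the counts collected for one word, as MRWordCounter.reducer does."""
    yield word, sum(occurrences)


def run_word_count(lines):
    """Hypothetical local driver (not mrjob API): map, group by key, then reduce."""
    grouped = defaultdict(list)
    # Map phase plus a simplified "shuffle": collect all values under their key.
    for line in lines:
        for word, count in mapper(line):
            grouped[word].append(count)
    # Reduce phase: call the reducer once per distinct word.
    results = {}
    for word in sorted(grouped):
        for key, total in reducer(word, grouped[word]):
            results[key] = total
    return results


if __name__ == '__main__':
    # Illustrative input only; the gist itself reads creativecommons.txt.
    sample = ["to be or not to be", "be creative"]
    for word, total in run_word_count(sample).items():
        print(word, total)
```

The grouping step is the part mrjob and Hadoop take care of, which is why wc.py only has to define the mapper and the reducer.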