@masayang
Created March 9, 2013 07:28

Revisions

  1. masayang created this gist Mar 9, 2013.
    15 changes: 15 additions & 0 deletions mapper.py
    @@ -0,0 +1,15 @@
    #! /usr/bin/env python
    # -*- coding: utf-8 -*-

    from mrjob.job import MRJob

    class MRWordCounter(MRJob):
        def mapper(self, key, line):
            for word in line.split():
                yield word, 1

        def reducer(self, word, occurrences):
            yield word, sum(occurrences)

    if __name__ == '__main__':
        MRWordCounter.run()
    2 changes: 2 additions & 0 deletions mrjob_local.sh
    @@ -0,0 +1,2 @@
    # Run locally
    python wc.py < creativecommons.txt
    7 changes: 7 additions & 0 deletions mrjob_s3.sh
    @@ -0,0 +1,7 @@
    # Run on EMR using a file stored on S3


    export AWS_ACCESS_KEY_ID=<your aws access key>
    export AWS_SECRET_ACCESS_KEY=<your secret access key>

    python wc.py -r emr s3://masayang-bootcamp/bootcamp4/EMRconsole/creativecommons.txt -o s3://masayang-bootcamp/bootcamp4/EMRconsole/<your account>
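
The word-count logic in mapper.py can be sanity-checked without mrjob or AWS. The sketch below is not part of the gist: it copies the mapper and reducer bodies into plain functions and adds a hypothetical `run_local` helper that stands in for the framework's map, group-by-key, and reduce phases. It is a rough illustration under those assumptions, not how mrjob schedules work internally.

```python
from collections import defaultdict


def mapper(line):
    """Emit (word, 1) for each whitespace-separated word, as MRWordCounter.mapper does."""
    for word in line.split():
        yield word, 1


def reducer(word, occurrences):
    """Sum the counts gathered for one word, as MRWordCounter.reducer does."""
    yield word, sum(occurrences)


def run_local(lines):
    """Hypothetical stand-in for the framework: map, group by key, then reduce."""
    grouped = defaultdict(list)
    for line in lines:
        for word, count in mapper(line):
            grouped[word].append(count)

    results = {}
    for word in sorted(grouped):
        for key, total in reducer(word, grouped[word]):
            results[key] = total
    return results


if __name__ == '__main__':
    # Made-up sample input, used only for illustration.
    sample = ["share and share alike", "share"]
    for word, total in run_local(sample).items():
        print(word, total)
```

Because `run_local` accepts any iterable of strings, it can also be fed an open file object to compare its totals against the output of the local mrjob run.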