Created
October 16, 2013 16:13
ewencp created this gist
Oct 16, 2013
from mrjob.job import MRJob
from mrjob.protocol import JSONProtocol


def is_customer_record(record):
    # Assumption (not in the original gist): only order records carry
    # an 'orderId' field. Replace this test with one that fits your data.
    return 'orderId' not in record


class JoinExample(MRJob):
    # Assumption: each input line is "<JSON key>\t<JSON value>", so the
    # mapper receives records as dicts rather than raw text lines.
    INPUT_PROTOCOL = JSONProtocol

    def mapper(self, record_id, record):
        # Use both large files as input. If you have orders and
        # customers, you'll have as input either
        #   order_id, order_data
        # or
        #   customer_id, customer_data
        # In this case, I assume both have a customerId field to join
        # on and that you'll be able to differentiate them in the
        # reducer
        yield record['customerId'], record

    def reducer(self, customerID, records):
        # All records sharing a customerId arrive here together.
        customer = None
        orders = []
        for record in records:
            if is_customer_record(record):
                # do something with the customer info
                customer = record
            else:
                # do something with the order info
                orders.append(record)
        # Placeholder output: the customer paired with its orders.
        new_data = {'customer': customer, 'orders': orders}
        yield customerID, new_data


if __name__ == '__main__':
    JoinExample.run()
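The job above is a reduce-side join: the mapper re-keys every record by its customer ID, the framework groups records by that key, and the reducer sees a customer and its orders together. The sketch below imitates that flow in plain Python, without mrjob or Hadoop, so the grouping step is visible. The sample records, the `orderId` field, the `is_customer_record` test, and the `run_join` helper are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def is_customer_record(record):
    # Hypothetical test: here only order records have an 'orderId' field.
    return 'orderId' not in record


def mapper(record):
    # Emit every record under its join key, as the job's mapper does.
    yield record['customerId'], record


def reducer(customer_id, records):
    # Split the grouped records into the customer and its orders.
    customer = None
    orders = []
    for record in records:
        if is_customer_record(record):
            customer = record
        else:
            orders.append(record)
    yield customer_id, {'customer': customer, 'orders': orders}


def run_join(records):
    # Stand-in for the shuffle phase: group mapper output by key.
    grouped = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            grouped[key].append(value)
    # Run the reducer once per key and collect its output.
    results = {}
    for key in sorted(grouped):
        for out_key, out_value in reducer(key, grouped[key]):
            results[out_key] = out_value
    return results


sample = [
    {'customerId': 1, 'name': 'Ada'},
    {'customerId': 2, 'name': 'Grace'},
    {'customerId': 1, 'orderId': 10, 'total': 25},
    {'customerId': 1, 'orderId': 11, 'total': 40},
]
joined = run_join(sample)
```

After this runs, `joined[1]` holds Ada's customer record alongside both of her orders, while `joined[2]` holds Grace with an empty order list. In a real job the grouping is done by the framework across machines, and a reducer for a very large key may prefer to stream records rather than buffer them in a list.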