CloudFormation example for a Glue Spark job with metrics and a scheduler
AWSTemplateFormatVersion: '2010-09-09'
Description: CloudFormation example for a Glue Spark job with metrics and a scheduler
Parameters:
  ArtifactBucket:
    Description: A global deployable artefact bucket
    Type: String
    Default: artefacts
  ServiceName:
    Description: Name of the service that owns the stack
    Type: String
    Default: awesome-service
  Environment:
    Description: Name of the environment the service is being deployed to
    Type: String
    Default: dev
Resources:
  MyGlueSparkJob:
    Type: AWS::Glue::Job
    Properties:
      Role: !Ref MyGlueSparkJobRole
      # TODO: Size your cluster accordingly. Each unit here gives you 4 vCPUs and 16 GB of RAM.
      AllocatedCapacity: 5
      Command:
        Name: glueetl
        ScriptLocation: !Sub s3://${ArtifactBucket}/${ServiceName}/glue_spark_etl_job.py
      DefaultArguments:
        "--environment": !Ref Environment
        "--enable-metrics": 'true'
        "--output_bucket_uri": !Sub "s3://${ServiceName}/etl_job_results_output"
        # NOTE: this key has no underscore before "uri"; keep it in sync with the job script.
        "--input_bucketuri": !Sub "s3://${ServiceName}/etl_job_results_input"
      ExecutionProperty:
        MaxConcurrentRuns: 1
      MaxRetries: 1
      Name: !Sub ${Environment}-${ServiceName}-generator
  MyGlueSparkJobJobTrigger:
    Type: AWS::Glue::Trigger
    Properties:
      Type: SCHEDULED
      Description: DESCRIPTION_SCHEDULED
      # Runs every day at 00:00 UTC.
      Schedule: cron(0 0 * * ? *)
      Actions:
        # Ref returns the job name and makes the trigger depend on the job resource.
        - JobName: !Ref MyGlueSparkJob
      Name: !Sub ${Environment}-${ServiceName}-generator-trigger
  MyGlueSparkJobRole:
    Type: AWS::IAM::Role
    Properties:
      AssumeRolePolicyDocument:
        Version: "2012-10-17"
        Statement:
          - Effect: Allow
            Principal:
              Service:
                - "glue.amazonaws.com"
            Action:
              - "sts:AssumeRole"
      Path: "/"
      Policies:
        - PolicyName: root
          PolicyDocument:
            Version: "2012-10-17"
            Statement:
              - Effect: Allow
                Action:
                  - logs:CreateLogGroup
                  - logs:CreateLogStream
                  - logs:PutLogEvents
                Resource: arn:aws:logs:*:*:*
              - Effect: Allow
                Action:
                  - s3:*
                # IAM needs S3 ARNs here, not s3:// URIs or Fn::ImportValue of a URI.
                # Assumes the bucket is named after ServiceName, as in DefaultArguments above.
                Resource:
                  - !Sub "arn:aws:s3:::${ServiceName}"
                  - !Sub "arn:aws:s3:::${ServiceName}/*"
              - Effect: Allow
                Action:
                  - s3:*
                Resource:
                  - arn:aws:s3:::aws-glue-*/*
                  - arn:aws:s3:::*/*aws-glue-*/*
                  - arn:aws:s3:::aws-glue-*
              - Effect: Allow
                Action:
                  - glue:CreateDatabase
                  - glue:DeleteDatabase
                  - glue:GetDatabase
                  - glue:GetDatabases
                  - glue:UpdateDatabase
                  - glue:CreateTable
                  - glue:DeleteTable
                  - glue:BatchDeleteTable
                  - glue:UpdateTable
                  - glue:GetTable
                  - glue:GetTables
                  - glue:BatchCreatePartition
                  - glue:CreatePartition
                  - glue:DeletePartition
                  - glue:BatchDeletePartition
                  - glue:UpdatePartition
                  - glue:GetPartition
                  - glue:GetPartitions
                  - glue:BatchGetPartition
                Resource: "*"
              - Effect: Allow
                Action:
                  - 'athena:*'
                Resource: '*'
              - Effect: Allow
                Action:
                  - cloudwatch:*
                Resource: '*'
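The template hands its DefaultArguments to glue_spark_etl_job.py as `--key value` pairs, but the script itself is not part of this gist. The following is only a rough sketch of how such a script might read those values. Inside a real Glue job this is usually done with `getResolvedOptions` from `awsglue.utils`; the standard-library `argparse` stands in for it here so the example runs locally. The function name `resolve_job_args` and the sample values are hypothetical.

```python
import argparse
from typing import Dict, List

# Argument names copied from DefaultArguments in the template, without the
# leading "--". "input_bucketuri" is spelled exactly as the template spells it.
EXPECTED_ARGS = ["environment", "output_bucket_uri", "input_bucketuri"]


def resolve_job_args(argv: List[str]) -> Dict[str, str]:
    """Parse the expected "--key value" pairs and ignore everything else.

    Hypothetical local stand-in for awsglue.utils.getResolvedOptions.
    """
    parser = argparse.ArgumentParser(allow_abbrev=False)
    for name in EXPECTED_ARGS:
        parser.add_argument(f"--{name}", required=True)
    # Glue passes extra arguments of its own (and "--enable-metrics" is
    # consumed by Glue, not by the script), so unknown ones are tolerated.
    known, _unknown = parser.parse_known_args(argv)
    return vars(known)


# Sample argv built from the template's parameter defaults (assumed values).
example_argv = [
    "--environment", "dev",
    "--enable-metrics", "true",
    "--output_bucket_uri", "s3://awesome-service/etl_job_results_output",
    "--input_bucketuri", "s3://awesome-service/etl_job_results_input",
]
job_args = resolve_job_args(example_argv)
print(job_args["environment"], job_args["output_bucket_uri"])
```

Because every expected argument is marked as required, a mismatch between the template keys and the script (for example the missing underscore in `--input_bucketuri`) fails at startup rather than midway through the ETL run.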