hamero / spark-jupyter-notebook-on-google-cloud-dataproc.md
Last active February 9, 2019 11:45
Spark and Jupyter on Google Cloud Dataproc

Set up a Dataproc cluster

Reference: https://towardsdatascience.com/starting-to-develop-in-pyspark-with-jupyter-installed-in-a-big-data-cluster-8a84e4db27e4

gcloud beta dataproc clusters create spark-jupyter \
  --project XXX --zone us-east1-d --image-version 1.3-deb9 \
  --master-machine-type n1-standard-4 --master-boot-disk-size 500 \
  --num-workers 2 --worker-machine-type n1-standard-4 --worker-boot-disk-size 500 \
  --optional-components=ANACONDA,JUPYTER,ZEPPELIN

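Once the cluster is running, open its cluster detail page in the Cloud Console and follow "Create an SSH tunnel to connect to a web interface" to reach the Jupyter and Zeppelin web interfaces.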
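
The tunnel step can be sketched roughly as follows. This is a minimal sketch, not the exact text the Console shows: it assumes the cluster name and zone from the create command above and the same XXX project placeholder. Dataproc names the master node after the cluster with an -m suffix. The local SOCKS port (1080) and the Jupyter port (8123) are assumptions to check against your own cluster. The script only prints the commands, so nothing runs until you copy them.

```shell
#!/bin/sh
# Values taken from the create command above; XXX is the same project placeholder.
CLUSTER="spark-jupyter"
ZONE="us-east1-d"
PROJECT="XXX"

# Dataproc names the master node <cluster-name>-m.
MASTER="${CLUSTER}-m"

# Assumed local port for the SOCKS proxy; any free port should work.
SOCKS_PORT=1080

# Assumed port of the Jupyter optional component; verify on your cluster.
JUPYTER_PORT=8123

# -D opens a dynamic (SOCKS) forward, -N keeps the session open without a remote shell.
TUNNEL_CMD="gcloud compute ssh ${MASTER} --project=${PROJECT} --zone=${ZONE} -- -D ${SOCKS_PORT} -N"
JUPYTER_URL="http://${MASTER}:${JUPYTER_PORT}"

echo "Run in a separate terminal: ${TUNNEL_CMD}"
echo "Then browse through the SOCKS proxy to: ${JUPYTER_URL}"
```

The browser has to send its traffic through the proxy on localhost:1080 for the master hostname to resolve; a separate browser profile configured with that SOCKS proxy keeps normal browsing unaffected.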

name: "VGG_CNN_M_1024"
layer {
  name: 'input-data'
  type: 'Python'
  top: 'data'
  top: 'im_info'
  top: 'gt_boxes'
  python_param {
    module: 'roi_data_layer.layer'
    layer: 'RoIDataLayer'