# Setting up Spark on Ubuntu in a clustered environment

## Getting Spark and setting up a user

```
sudo adduser spark
su - spark
cd
wget http://apache.mirror.anlx.net/spark/spark-2.2.0/spark-2.2.0-bin-hadoop2.7.tgz
tar -xvf spark-2.2.0-bin-hadoop2.7.tgz
rm *.tgz
mv spark-2.2.0-bin-hadoop2.7/ spark/
```

## Setting up Java

Spark 2.2 runs on Java 8, so install that rather than a newer release:

```
sudo add-apt-repository ppa:webupd8team/java
sudo apt-get update
sudo apt-get install oracle-java8-installer
```

## Getting the master/slave to run on startup

Create a new file at `/etc/init.d/spark`:

```
#!/bin/sh

# Number of worker instances to launch on this node.
# (Cores per worker are controlled separately via SPARK_WORKER_CORES.)
export SPARK_WORKER_INSTANCES=8

case "$1" in
  start)
    start-stop-daemon --start --chuid spark --exec /home/spark/spark/sbin/start-master.sh
    start-stop-daemon --start --chuid spark --exec /home/spark/spark/sbin/start-slave.sh -- spark://localhost:7077
    ;;
  stop)
    start-stop-daemon --start --chuid spark --exec /home/spark/spark/sbin/stop-slave.sh
    start-stop-daemon --start --chuid spark --exec /home/spark/spark/sbin/stop-master.sh
    ;;
  *)
    echo "Usage: /etc/init.d/spark {start|stop}"
    exit 1
    ;;
esac

exit 0
```

Note that the `stop-*.sh` scripts are short-lived commands, not daemons, so they are also invoked with `--start` here — `start-stop-daemon` just runs them as the `spark` user.

Make it executable: `sudo chmod +x /etc/init.d/spark`

Register it to run at boot (this creates the rc symlinks): `sudo update-rc.d spark defaults`
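Once the service is started (`sudo /etc/init.d/spark start`), you can sanity-check the cluster. The sketch below assumes the default ports (8080 for the master web UI, 7077 for the master itself) and the install layout used above; the exact `spark-examples` jar name depends on your Spark build:

```
# The master web UI should report "Spark Master at spark://..."
curl -s http://localhost:8080 | grep -o 'spark://[^<" ]*'

# Run the bundled SparkPi example against the standalone master
/home/spark/spark/bin/spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master spark://localhost:7077 \
  /home/spark/spark/examples/jars/spark-examples_*.jar 10
```

If the job completes and the web UI shows the application as finished with 8 workers registered, the startup script is working.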