# Building & Installing Apache Zeppelin 0.8 (from Git)

### Install R

On macOS, follow the steps at http://www.reed.edu/data-at-reed/software/R/r_studio.html:

- Install R
- Install RStudio
- Download the `startup_packages.R` script from that page:

```
# List of useful packages
pkg <- c("tidyr", "dplyr", "ggplot2", "knitr", "rmarkdown")

# Check if packages are not installed and assign the
# names of the uninstalled packages to the variable new.pkg
new.pkg <- pkg[!(pkg %in% installed.packages())]

# If there are any packages in the list that aren't installed,
# install them
if (length(new.pkg)) {
  install.packages(new.pkg, repos = "http://cran.rstudio.com")
}
```

- Open this file in RStudio and click "Source" to execute it.
- RStudio will prompt you to confirm installing the missing packages.

### Build Zeppelin

- `git clone git@github.com:apache/zeppelin.git`
- `cd zeppelin`
- `mvn clean package -Pscala-2.11 -Pspark-2.2 -Phadoop-2.7 -Pr -DskipTests -Pbuild-distr`

The installation package will be in `./zeppelin-distribution/target/zeppelin-0.8.0-SNAPSHOT.tar.gz`

### Install Zeppelin

- Extract the above `tar.gz` into a directory, say, `$Z`

### Download the latest Spark 2.2+

- http://spark.apache.org/downloads.html
- Get the package "Pre-built for Apache Hadoop 2.7 and later"
- Extract the above `tgz` into a directory, say, `$S`

### Configure Zeppelin

- `(cd $Z/; ln -s $S spark)`
- `(cd $Z/; cp conf/zeppelin-env.sh.template conf/zeppelin-env.sh)`
- `(cd $Z/; cp conf/zeppelin-site.xml.template conf/zeppelin-site.xml)`
- In `$Z/conf/zeppelin-env.sh`, uncomment and set `JAVA_HOME` and `SPARK_HOME` as follows:
  - For `JAVA_HOME`, set it to the location of JDK 1.8; e.g., `export JAVA_HOME=/Library/Java/JavaVirtualMachines/jdk1.8.0_151.jdk/Contents/Home`
  - `export SPARK_HOME=$ZEPPELIN_HOME/spark` (this assumes you've created the symlink as described above)

### Start/Stop Zeppelin

- `$Z/bin/zeppelin-daemon.sh start`
- `$Z/bin/zeppelin-daemon.sh stop`
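
### Scripting the configuration (optional)

The "Configure Zeppelin" steps can be collected into a small shell function. This is only a sketch: `configure_zeppelin` is a hypothetical helper name, not something shipped with Zeppelin, and it assumes absolute paths for the Zeppelin and Spark directories. Instead of uncommenting the template lines, it appends the two `export` lines to `zeppelin-env.sh`, which is a simplification of the manual edit.

```shell
#!/bin/sh
# configure_zeppelin: hypothetical helper, not part of Zeppelin.
# Usage: configure_zeppelin <zeppelin-dir> <spark-dir> <java-home>
# Both directories are assumed to be absolute paths.
configure_zeppelin() {
  z=$1
  s=$2
  java_home=$3

  # Link the Spark distribution into the Zeppelin directory as "spark".
  ln -sfn "$s" "$z/spark"

  # Create the config files from their templates, keeping any
  # files that already exist.
  for f in zeppelin-env.sh zeppelin-site.xml; do
    [ -e "$z/conf/$f" ] || cp "$z/conf/$f.template" "$z/conf/$f"
  done

  # Append the exports rather than uncommenting the template lines.
  # SPARK_HOME is written literally, as in the manual step above.
  {
    printf 'export JAVA_HOME=%s\n' "$java_home"
    printf '%s\n' 'export SPARK_HOME=$ZEPPELIN_HOME/spark'
  } >> "$z/conf/zeppelin-env.sh"
}
```

For example, `configure_zeppelin "$Z" "$S" /Library/Java/JavaVirtualMachines/jdk1.8.0_151.jdk/Contents/Home`. Running it more than once appends the `export` lines again, so check `zeppelin-env.sh` afterwards if you re-run it.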