# Building & Installing Apache Zeppelin 0.8 (from Git)

### Install R

On macOS, follow the steps at http://www.reed.edu/data-at-reed/software/R/r_studio.html:
- Install R
- Install RStudio
- Download the `startup_packages.R` script from that page:

```
# List of useful packages
pkg <- c("tidyr", "dplyr", "ggplot2", "knitr", "rmarkdown")

# Check if packages are not installed and assign the
# names of the uninstalled packages to the variable new.pkg
new.pkg <- pkg[!(pkg %in% rownames(installed.packages()))]

# If there are any packages in the list that aren't installed,
# install them
if (length(new.pkg)) {
  install.packages(new.pkg, repos = "http://cran.rstudio.com")
}
```

- Open this file in RStudio and click "Source" to run it.
- When prompted, confirm the installation of the listed packages.

### Build Zeppelin

- `git clone git@github.com:apache/zeppelin.git`
- `cd zeppelin`
- `mvn clean package -Pscala-2.11 -Pspark-2.2 -Phadoop-2.7 -Pr -DskipTests -Pbuild-distr`

The installation package will be in `./zeppelin-distribution/target/zeppelin-0.8.0-SNAPSHOT.tar.gz`
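
Before starting the build, it can help to confirm that the required tools are on your `PATH`. The following is a minimal sketch, not part of the Zeppelin build itself: the helper name `missing_tools` is made up for illustration, and the tool list (`git`, `mvn`, `java`, `R`) is an assumption based on the commands and profiles used above.

```shell
# Sketch: list which of the given command-line tools are missing from PATH.
# Prints one missing tool name per line; prints nothing if all are present.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Assumed tool list for the build above: git for the clone,
# mvn and java for the Maven build, R for the -Pr profile.
missing=$(missing_tools git mvn java R)
if [ -n "$missing" ]; then
  echo "Missing build tools:" $missing
else
  echo "All build tools found."
fi
```

The check only reports what is missing; it does not verify versions (for example, that `java` is JDK 1.8).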

### Install Zeppelin

- Extract the above `tar.gz`. The rest of this guide refers to the extracted Zeppelin directory (the one containing `bin/` and `conf/`) as `$Z`.

### Download the latest Spark 2.2+

- http://spark.apache.org/downloads.html
- Get the package "Pre-built for Apache Hadoop 2.7 and later"
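- Extract the downloaded `tgz`. The extracted Spark directory is referred to below as `$S`; use an absolute path for it, since it is the target of a symlink in the next section.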
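
Both extraction steps (Zeppelin and Spark) can be sketched with one small helper. This assumes each archive contains a single top-level directory, which `--strip-components=1` removes so that the contents land directly in the directory you choose. The function name `extract_to` is hypothetical.

```shell
# Sketch: unpack a .tar.gz/.tgz into a directory of your choosing.
# --strip-components=1 drops the archive's single top-level directory,
# so bin/, conf/, etc. end up directly inside <dest>.
# Usage: extract_to <archive> <dest>
extract_to() {
  archive="$1"
  dest="$2"
  mkdir -p "$dest" && tar -xzf "$archive" -C "$dest" --strip-components=1
}
```

For example, `extract_to ./zeppelin-distribution/target/zeppelin-0.8.0-SNAPSHOT.tar.gz "$HOME/zeppelin"` would make `$HOME/zeppelin` (an example location, not a requirement) usable as `$Z`; the same call with the Spark `tgz` and another directory gives `$S`.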

### Configure Zeppelin

- `(cd $Z/; ln -s $S spark)`
- `(cd $Z/; cp conf/zeppelin-env.sh.template conf/zeppelin-env.sh)`
- `(cd $Z/; cp conf/zeppelin-site.xml.template conf/zeppelin-site.xml)`

- In `$Z/conf/zeppelin-env.sh`, uncomment & set `JAVA_HOME` and `SPARK_HOME` as follows:
  - For `JAVA_HOME`, set it to the location of JDK 1.8; e.g., `export JAVA_HOME=/Library/Java/JavaVirtualMachines/jdk1.8.0_151.jdk/Contents/Home`
  - `export SPARK_HOME=$ZEPPELIN_HOME/spark` (this assumes you've set the symlink as described above)
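
The configuration steps above can be collected into one function, sketched below. The function name `configure_zeppelin` and its arguments are made up for illustration. Note one deliberate difference from the manual steps: instead of uncommenting the template lines, the sketch appends the two `export` lines to the end of `zeppelin-env.sh`, which has the same effect when the file is sourced.

```shell
# Sketch: apply the configuration steps to a Zeppelin install.
# Usage: configure_zeppelin <zeppelin_dir> <spark_dir> <java_home>
configure_zeppelin() {
  local z="$1" s="$2" java_home="$3"

  # Link Spark into the Zeppelin directory. -f replaces an existing link,
  # -n avoids descending into it if it already points at a directory.
  ln -sfn "$s" "$z/spark"

  # Create the config files from their templates, keeping any existing copy.
  for f in zeppelin-env.sh zeppelin-site.xml; do
    [ -e "$z/conf/$f" ] || cp "$z/conf/$f.template" "$z/conf/$f"
  done

  # Append the two settings (assumption: appending rather than uncommenting).
  # SPARK_HOME is written literally so it is resolved when Zeppelin starts.
  {
    printf 'export JAVA_HOME="%s"\n' "$java_home"
    printf '%s\n' 'export SPARK_HOME=$ZEPPELIN_HOME/spark'
  } >> "$z/conf/zeppelin-env.sh"
}
```

A call would look like `configure_zeppelin "$Z" "$S" /Library/Java/JavaVirtualMachines/jdk1.8.0_151.jdk/Contents/Home`. Running it more than once appends the `export` lines again, which is harmless but leaves duplicates in the file.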

### Start/Stop Zeppelin

- `$Z/bin/zeppelin-daemon.sh start`
- `$Z/bin/zeppelin-daemon.sh stop`