# referencing:
# https://www.digitalocean.com/community/tutorials/how-to-install-apache-kafka-on-ubuntu-14-04
# https://chongyaorobin.wordpress.com/2015/07/08/step-by-step-of-install-apache-kafka-on-ubuntu-standalone-mode/

Kafka installation

1. Add 'kafka' user::

    $ sudo useradd kafka -m

2. Install Java::

    $ sudo apt-get update
    $ sudo apt-get install default-jre

3. Install ZooKeeper::

    $ sudo apt-get install zookeeperd

.. note:: After the installation completes, ZooKeeper will be started as a daemon automatically. By default, it will listen on port 2181.

4. Confirm ZooKeeper is running on the expected port::

    $ telnet localhost 2181
    Trying ::1...
    Connected to localhost.
    Escape character is '^]'.
    ruok        <-- Type at empty prompt!
    imokConnection closed by foreign host.

.. note:: After typing 'ruok' once connected to 'localhost', ZooKeeper will respond with 'imok' and close the session.

5. Download Kafka from http://kafka.apache.org/downloads.html::

    # with cntlm proxy installed and running, if necessary
    $ export http_proxy=http://127.0.0.1:8009
    $ export https_proxy=http://127.0.0.1:8009

    # grab latest stable
    $ wget http://ftp.jaist.ac.jp/pub/apache/kafka/0.10.0.0/kafka_2.11-0.10.0.0.tgz

6. Untar and move binaries to /usr/local/kafka::

    $ tar xvf kafka_2.11-0.10.0.0.tgz
    $ sudo mv kafka_2.11-0.10.0.0 /usr/local/kafka

7. Configure the Kafka server::

    # turn on topic deletion
    $ vi /usr/local/kafka/config/server.properties

    #>> At end of file add:
    delete.topic.enable = true
    # save and quit

8. Test the server::

    $ /usr/local/kafka/bin/kafka-server-start.sh /usr/local/kafka/config/server.properties
    ...
    [2016-08-06 01:22:00,000] INFO [Kafka Server 0], started (kafka.server.KafkaServer)

.. note:: This only starts the server temporarily for initial testing; the service should be registered later...
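If the host runs systemd (Ubuntu 15.04 and later; the referenced 14.04 tutorial predates it), registering the broker as a service might look like the following minimal sketch. The unit file path, ``User``, and restart policy are illustrative assumptions, not taken from the referenced tutorials; the ``ExecStart``/``ExecStop`` scripts are the ones shipped in /usr/local/kafka/bin::

    # /etc/systemd/system/kafka.service  (illustrative path)
    [Unit]
    Description=Apache Kafka broker
    Requires=network.target
    After=network.target

    [Service]
    User=kafka
    ExecStart=/usr/local/kafka/bin/kafka-server-start.sh /usr/local/kafka/config/server.properties
    ExecStop=/usr/local/kafka/bin/kafka-server-stop.sh
    Restart=on-failure

    [Install]
    WantedBy=multi-user.target

Then enable and start it with ``sudo systemctl enable kafka`` and ``sudo systemctl start kafka``.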
9. With the Kafka server running, open another session and create a topic::

    $ /usr/local/kafka/bin/kafka-topics.sh --create --topic topic-test --zookeeper localhost:2181 --partitions 1 --replication-factor 1
    Created topic "topic-test".

10. List available topics::

    $ /usr/local/kafka/bin/kafka-topics.sh --list --zookeeper localhost:2181
    topic-test

.. note:: You should see the created 'topic-test' topic listed.

11. Send a message to the topic as a producer via 'kafka-console-producer.sh'::

    $ echo "hello world" | /usr/local/kafka/bin/kafka-console-producer.sh --broker-list localhost:9092 --topic topic-test

12. *Consume* the sent message::

    $ /usr/local/kafka/bin/kafka-console-consumer.sh --zookeeper localhost:2181 --topic topic-test --from-beginning

.. note:: The '--from-beginning' flag starts the consumer with the earliest message present in the log, rather than the latest message. (See */usr/local/kafka/bin/kafka-console-consumer.sh* help for more option details.)

----

# for install of scala (sbt): http://www.scala-sbt.org/0.13/docs/Installing-sbt-on-Linux.html

Spark installation

1. Install Scala Build Tool (sbt) [make sure https_proxy is set if needed]:

   1.1 Get Scala Build Tool ubuntu repository info::

       wget https://dl.bintray.com/sbt/debian/sbt-0.13.11.deb

   1.2 Install sbt repository info::

       sudo dpkg -i sbt-0.13.11.deb

   1.3 Update repository info and install 'sbt'::

       sudo apt-get update
       sudo apt-get install sbt

2. Download Spark binary (grab latest stable from http://spark.apache.org/downloads.html)::

    wget http://d3kbcqa49mib13.cloudfront.net/spark-2.0.0-bin-hadoop2.7.tgz

3. Untar and move::

    tar xvf spark-2.0.0-bin-hadoop2.7.tgz
    sudo mv spark-2.0.0-bin-hadoop2.7 /usr/local/spark

4. Add Spark configuration to your profile (or appropriate ENV configuration)::

    vi ~/.profile

    (Add the following to .profile)

    # set PATH so it includes user's private bin directories
    PATH="/usr/local/spark/bin:$HOME/bin:$HOME/.local/bin:$PATH"

    export PYSPARK_PYTHON=python3
5. Apply to current ENV::

    source ~/.profile

6. Test configuration::

    pyspark

   --> Should open the pyspark console.
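Once the pyspark console opens, a quick sanity check can confirm the context is usable. A minimal sketch, assuming the default shell where ``sc`` is the SparkContext that pyspark creates automatically::

    >>> sc.parallelize([1, 2, 3, 4]).map(lambda x: x * x).sum()
    30

This parallelizes a small list, squares each element (1 + 4 + 9 + 16), and sums the result on the local Spark runtime.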