@epishkin
Last active December 27, 2015 16:49
Hadoop Howto

How to set up the Cloudera Sandbox

Download hadoop & oozie

  1. Download the hadoop and oozie tarballs from http://www.cloudera.com/content/dev-center/en/home/developer-admin-resources/cdh-components.html
  2. Extract them into ~/opt/ so that you end up with ~/opt/hadoop-2.0.0-cdh4.4.0 and ~/opt/oozie-3.3.2-cdh4.4.0

Update env variables

Add the following to ~/.bash_login or ~/.bash_profile:

export HDP_HOME=$HOME/opt/hadoop-2.0.0-cdh4.4.0
export HADOOP_HOME=$HDP_HOME/share/hadoop/mapreduce1
export HADOOP_CONF_DIR=${HDP_HOME}/etc/hadoop
 
export OOZIE_HOME=$HOME/opt/oozie-3.3.2-cdh4.4.0 
export OOZIE_URL=http://localhost:11000/oozie/
export OOZIE_TIMEZONE=America/New_York
 
export PATH="$HDP_HOME/bin-mapreduce1:$PATH"
export PATH="$OOZIE_HOME/bin:$PATH"
 
function hadoop-node02() {
  hadoop --config ${HDP_HOME}/etc/hadoop-node02 "$@"
}
 
function oozie-node02() {
  oozie "$@" -oozie http://production.cluster.host:11000/oozie/
}
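
After reloading the profile, it can be worth confirming that the variables above point at real directories. The following is only a minimal sketch and not part of the original setup: the function name check_hadoop_env is hypothetical, and it checks nothing beyond the existence of each directory.

```shell
# Hypothetical helper: report which of the directories configured above are missing.
# Returns 0 when all of them exist, 1 otherwise.
check_hadoop_env() {
  status=0
  for name in HDP_HOME HADOOP_HOME HADOOP_CONF_DIR OOZIE_HOME; do
    # Look up the value of the variable whose name is stored in $name.
    eval "dir=\${$name:-}"
    if [ -z "$dir" ] || [ ! -d "$dir" ]; then
      echo "missing: $name='$dir'" >&2
      status=1
    fi
  done
  return "$status"
}
```

Run check_hadoop_env in a new terminal; it prints one line per missing directory and stays silent when everything is in place.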

Create 2 separate config sets for Cloudera VM and the production cluster

cp -R ${HDP_HOME}/etc/hadoop ${HDP_HOME}/etc/hadoop-node02
cp ${HDP_HOME}/etc/hadoop-node02/mapred-site.xml.template ${HDP_HOME}/etc/hadoop-node02/mapred-site.xml
 
cp ${HDP_HOME}/etc/hadoop/mapred-site.xml.template ${HDP_HOME}/etc/hadoop/mapred-site.xml

Configure namenode & job tracker for production cluster

Modify ${HDP_HOME}/etc/hadoop-node02/core-site.xml

<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://production.cluster.host:8020</value>
  </property>
</configuration>

Modify ${HDP_HOME}/etc/hadoop-node02/mapred-site.xml

<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>production.cluster.host:8021</value>
  </property>
</configuration> 

Configure namenode & job tracker for Cloudera VM

Modify ${HDP_HOME}/etc/hadoop/core-site.xml

<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:8020</value>
  </property>
</configuration>

Modify ${HDP_HOME}/etc/hadoop/mapred-site.xml

<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:8021</value>
  </property>
</configuration>
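
The four files above differ only in the host name, so they could also be generated rather than edited by hand. This is a sketch under assumptions, not the author's method: write_site_configs is a hypothetical helper, it keeps ports 8020 and 8021 from the examples above, and it overwrites core-site.xml and mapred-site.xml in the target directory.

```shell
# Hypothetical helper: write core-site.xml and mapred-site.xml for one host.
# usage: write_site_configs <conf_dir> <host>
write_site_configs() {
  conf_dir=$1
  host=$2

  # Namenode address (same property and port as in the examples above).
  cat > "$conf_dir/core-site.xml" <<EOF
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://$host:8020</value>
  </property>
</configuration>
EOF

  # Job tracker address.
  cat > "$conf_dir/mapred-site.xml" <<EOF
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>$host:8021</value>
  </property>
</configuration>
EOF
}
```

For example, write_site_configs "${HDP_HOME}/etc/hadoop" localhost would cover the Cloudera VM, and write_site_configs "${HDP_HOME}/etc/hadoop-node02" production.cluster.host the production cluster.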

How to use

#List files in hdfs of your sandbox hadoop
hadoop fs -ls /

#List files from namenode02
hadoop-node02 fs -ls /tapad-data

#List 5 scheduled jobs in oozie at namenode02
oozie-node02 jobs -len 5 -jobtype coord
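
If more clusters are added later, the per-cluster function could be generalized. The sketch below assumes every config set lives in ${HDP_HOME}/etc/hadoop-<name>, following the hadoop-node02 naming used above; hadoop_on is a hypothetical name, not something shipped with hadoop.

```shell
# Hypothetical generalization of hadoop-node02: choose the config set by cluster name.
# usage: hadoop_on <cluster-name> <hadoop args...>
hadoop_on() {
  cluster=$1
  shift
  hadoop --config "${HDP_HOME}/etc/hadoop-${cluster}" "$@"
}
```

With this in place, hadoop_on node02 fs -ls /tapad-data would run the same command as hadoop-node02 fs -ls /tapad-data.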

Cloudera QuickStart VM

  1. Download the VM from http://www.cloudera.com/content/support/en/downloads.html
  2. If anything asks for a login/password, try cloudera/cloudera

VMware sandbox setup

Port mapping

#put these settings into nat.conf on ***your laptop***
#make sure '192.168.170.128' is the IP of your box in vmware (run ifconfig)

sudo vi /Library/Preferences/VMware\ Fusion/vmnet8/nat.conf
 
[incomingtcp]
50010 = 192.168.170.128:50010
50020 = 192.168.170.128:50020
50030 = 192.168.170.128:50030
50060 = 192.168.170.128:50060
50070 = 192.168.170.128:50070
50075 = 192.168.170.128:50075
8020  = 192.168.170.128:8020
8021  = 192.168.170.128:8021
8888  = 192.168.170.128:8888
11000 = 192.168.170.128:11000
 
#now restart VMware's NAT
sudo "/Applications/VMware Fusion.app/Contents/Library/vmnet-cli" --stop
sudo "/Applications/VMware Fusion.app/Contents/Library/vmnet-cli" --start
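
Because the VM's IP address can differ between machines, the [incomingtcp] entries could be generated instead of typed. This is a minimal sketch: nat_entries is a hypothetical helper that only prints the lines for the ports listed above, and it does not edit nat.conf itself.

```shell
# Hypothetical helper: print the [incomingtcp] entries for a given guest IP.
# usage: nat_entries <vm_ip>
nat_entries() {
  vm_ip=$1
  echo "[incomingtcp]"
  # Same ports as in the nat.conf example above.
  for port in 50010 50020 50030 50060 50070 50075 8020 8021 8888 11000; do
    echo "$port = $vm_ip:$port"
  done
}
```

Run nat_entries with the IP reported by ifconfig inside the VM, paste the output into nat.conf, then restart the NAT service as shown above.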

VirtualBox sandbox setup

You can follow the instructions here: https://github.com/cloudera/cdk-examples/

Setup account

Create an account for yourself on the sandbox

The user name on your laptop and on the sandbox should match, because without Kerberos Hadoop identifies you by your local user name. Add the account to sudoers if you want.

sudo su -
useradd USERNAME
passwd USERNAME

Create user in HUE

Open HUE at http://localhost:8888/useradmin/ (log in as cloudera/cloudera) and create an hdfs account for yourself: User Admin -> Add User. Make sure 'Create user dir' is selected.

Create oozie workflows dir

#on the sandbox VM execute
sudo su - hdfs
hadoop fs -mkdir -p /oozie/deployments
hadoop fs -chmod -R 777 /oozie/deployments
exit
 
#on your laptop as yourself execute
hadoop fs -mkdir /oozie/deployments/lib-scalding
@triplel

triplel commented Nov 7, 2013

awesome stuff, marked!
