Showing posts from January, 2019

Hadoop and Spark Setup

#get centos container
docker pull centos

#start centos with port mapping
docker run -d -p 8088:8088 -p 9870:9870 --name "centos" -i centos

#access centos
docker exec -it centos /bin/bash

#download the instructions
wget
cd /tmp
tar xzvf Hadoop_Spark_Fundamentals_Code_Notes-V3.0.tgz

#follow the instructions in /tmp/Hadoop_Spark_Fundamentals_Code_Notes-V3.0/Lesson-2/Lesson-2.2/NOTES.txt to set up hadoop, pig, hive

a few catches
* adjust JAVA_HOME and HADOOP_HOME to match your installation
* modify HADOOP_PATH in the script under /tmp/Hadoop_Spark_Fundamentals_Code_Notes-V3.0/Lesson-2/Lesson-2.2/scripts
* check that the logs directory under /opt/hadoop-3.2.0/ has been created
* from hadoop 3.0.0 on, the namenode web UI port is 9870 instead of 50070
* before running yarn, set YARN_CONF_LIB
export YARN_CONF_LIB=/opt/hadoop-3.2.0/etc/hado