Friday, July 16, 2021

Hadoop 3.3.1 Cluster + Apache ZooKeeper 3.6.3 Installation on CentOS 8 Stream (minimal install)

Architecture

IP Address       Hostname       Role
-------------    ------------   -----------------------------------------
192.168.137.3    hadoopmaster   NameNode, ResourceManager
192.168.137.4    hadoopslave1   SecondaryNameNode, DataNode, NodeManager
192.168.137.5    hadoopslave2   DataNode, NodeManager

 

CentOS setup
install necessary packages for OS

We start from the CentOS minimal ISO; once the system is installed, we need a few more basic packages:

sudo yum install -y net-tools

sudo yum install -y openssh-server

sudo yum install -y wget

The first package provides ifconfig, the second allows SSH logins from remote peers, and wget will be used later to download the JDK, Hadoop, and ZooKeeper archives.


setup hostname for all nodes (run the matching command on its corresponding node)

hostnamectl set-hostname hadoopmaster

hostnamectl set-hostname hadoopslave1

hostnamectl set-hostname hadoopslave2

re-login to check that the change took effect
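For example, you can confirm the new hostname after re-logging in (both commands are standard on CentOS 8):

hostnamectl status

hostname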


setup jdk for all nodes

install jdk from oracle official website

Go to
https://www.oracle.com/java/technologies/javase/javase-jdk8-downloads.html

Open a download session and copy the download link for the Linux x64 RPM package.

Then run the following commands:

cd ~

wget --header "Cookie: oraclelicense=accept-securebackup-cookie" https://download.oracle.com/otn-pub/java/jdk/8u291-b10/d7fc238d0cbf4b0dac67be84580cfb4b/jdk-8u291-linux-x64.rpm

yum localinstall -y jdk-8u291-linux-x64.rpm



add java.sh under /etc/profile.d/ with the following content



export JAVA_HOME=/usr/java/latest

export JRE_HOME=/usr/java/latest/jre

export CLASSPATH=$JAVA_HOME/lib:.

export PATH=$PATH:$JAVA_HOME/bin
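As a sketch, one way to create this file in a single step (run with root privileges; the heredoc delimiter is quoted so the variables stay literal in the file):

sudo tee /etc/profile.d/java.sh > /dev/null << 'EOF'
# JDK environment for all login shells
export JAVA_HOME=/usr/java/latest
export JRE_HOME=/usr/java/latest/jre
export CLASSPATH=$JAVA_HOME/lib:.
export PATH=$PATH:$JAVA_HOME/bin
EOF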

re-login, and you will find the environment variables set and Java properly installed:



java -version

ls $JAVA_HOME

echo $PATH



if the reported Java version is wrong, you can run



update-alternatives --config java

then choose the correct version.


setup user and user group on all nodes



sudo groupadd hadoop

sudo useradd -d /home/hadoop -g hadoop hadoop

sudo passwd hadoop


modify the hosts file on all nodes so the hostnames resolve to each other



echo '192.168.137.3 hadoopmaster' >> /etc/hosts

echo '192.168.137.4 hadoopslave1' >> /etc/hosts

echo '192.168.137.5 hadoopslave2' >> /etc/hosts



Check that name resolution works by pinging the hostnames.
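For example, from any of the nodes:

ping -c 2 hadoopmaster

ping -c 2 hadoopslave1

ping -c 2 hadoopslave2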


setup passwordless ssh login from the master to all nodes



On the master server



su - hadoop

ssh-keygen -t rsa

ssh-copy-id hadoopmaster

ssh-copy-id hadoopslave1

ssh-copy-id hadoopslave2
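As a quick check, each of the following should print the remote hostname without asking for a password:

ssh hadoopmaster hostname

ssh hadoopslave1 hostname

ssh hadoopslave2 hostname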


stop & disable the firewall on all nodes



systemctl stop firewalld.service

systemctl disable firewalld.service


Hadoop Setup



All of the following steps happen on a single node, hadoopmaster. In addition, we log in as the hadoop user to perform all operations; the results are copied to the slave nodes afterwards.


Download and untar into the hadoop user's home directory.



su - hadoop

wget http://mirrors.sonic.net/apache/hadoop/common/stable/hadoop-3.3.1.tar.gz

tar -xvf hadoop-3.3.1.tar.gz

chmod 775 hadoop-3.3.1


Add environment variables for hadoop



append the following content to ~/.bashrc after the “export PATH” line



export HADOOP_HOME=/home/hadoop/hadoop-3.3.1

export HADOOP_INSTALL=$HADOOP_HOME

export HADOOP_MAPRED_HOME=$HADOOP_HOME

export HADOOP_COMMON_HOME=$HADOOP_HOME

export HADOOP_HDFS_HOME=$HADOOP_HOME

export YARN_HOME=$HADOOP_HOME

export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native

export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin



then make these variables take effect:

source ~/.bashrc

Modify configuration files for hadoop


Add the slave node hostnames into the $HADOOP_HOME/etc/hadoop/slaves file (note: Hadoop 3.x actually reads the workers file, configured later; slaves is the legacy 2.x name)

echo hadoopslave1 > $HADOOP_HOME/etc/hadoop/slaves

echo hadoopslave2 >> $HADOOP_HOME/etc/hadoop/slaves
Add the secondary NameNode hostname into the $HADOOP_HOME/etc/hadoop/masters file (also legacy; the authoritative setting is dfs.namenode.secondary.http-address in hdfs-site.xml below)

echo hadoopslave1 > $HADOOP_HOME/etc/hadoop/masters
Modify $HADOOP_HOME/etc/hadoop/core-site.xml as follows



<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://hadoopmaster:9000/</value>
    <description>namenode settings</description>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/hadoop/hadoop-3.3.1/tmp/hadoop-${user.name}</value>
    <description>temp folder</description>
  </property>
  <property>
    <name>hadoop.proxyuser.hadoop.hosts</name>
    <value>*</value>
  </property>
  <property>
    <name>hadoop.proxyuser.hadoop.groups</name>
    <value>*</value>
  </property>
</configuration>
Modify $HADOOP_HOME/etc/hadoop/hdfs-site.xml as follows

<configuration>
  <property>
    <name>dfs.namenode.http-address</name>
    <value>hadoopmaster:50070</value>
    <description>fetch NameNode images and edits</description>
  </property>
  <property>
    <name>dfs.namenode.secondary.http-address</name>
    <value>hadoopslave1:50090</value>
    <description>fetch SecondaryNameNode fsimage</description>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
    <description>replica count</description>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:///home/hadoop/hadoop-3.3.1/hdfs/name</value>
    <description>namenode directory</description>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:///home/hadoop/hadoop-3.3.1/hdfs/data</value>
    <description>DataNode directory</description>
  </property>
  <property>
    <name>dfs.namenode.checkpoint.dir</name>
    <value>file:///home/hadoop/hadoop-3.3.1/hdfs/namesecondary</value>
    <description>checkpoint directory</description>
  </property>
  <property>
    <name>dfs.webhdfs.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>dfs.stream-buffer-size</name>
    <value>131072</value>
    <description>stream buffer size</description>
  </property>
  <property>
    <name>dfs.namenode.checkpoint.period</name>
    <value>3600</value>
    <description>checkpoint period in seconds</description>
  </property>
</configuration>


Modify $HADOOP_HOME/etc/hadoop/mapred-site.xml as follows



<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <property>
    <name>mapreduce.jobtracker.address</name>
    <value>hadoopmaster:9001</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.address</name>
    <value>hadoopmaster:10020</value>
    <description>MapReduce JobHistory Server host:port, default port is 10020.</description>
  </property>
  <property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>hadoopmaster:19888</value>
    <description>MapReduce JobHistory Server Web UI host:port, default port is 19888.</description>
  </property>
</configuration>


Modify $HADOOP_HOME/etc/hadoop/yarn-site.xml as follows

<configuration>
  <!-- Site specific YARN configuration properties -->
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>hadoopmaster</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
  <property>
    <name>yarn.resourcemanager.address</name>
    <value>hadoopmaster:8032</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>hadoopmaster:8030</value>
  </property>
  <property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>hadoopmaster:8031</value>
  </property>
  <property>
    <name>yarn.resourcemanager.admin.address</name>
    <value>hadoopmaster:8033</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.address</name>
    <value>hadoopmaster:8088</value>
  </property>
</configuration>


Create necessary folders



mkdir -p $HADOOP_HOME/tmp

mkdir -p $HADOOP_HOME/hdfs/name

mkdir -p $HADOOP_HOME/hdfs/data


Copy hadoop folders and environment settings to slaves



scp ~/.bashrc hadoopslave1:~/

scp ~/.bashrc hadoopslave2:~/



scp -r ~/hadoop-3.3.1 hadoopslave1:~/

scp -r ~/hadoop-3.3.1 hadoopslave2:~/


On all nodes, modify $HADOOP_HOME/etc/hadoop/workers as follows

vim $HADOOP_HOME/etc/hadoop/workers

Replace its contents with the slave hostnames, one per line:

hadoopslave1

hadoopslave2

then save and quit with ESC followed by :x (a non-interactive alternative is sketched below).
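Equivalently, in the same echo style used for the slaves file earlier, you can simply overwrite the workers file:

echo hadoopslave1 > $HADOOP_HOME/etc/hadoop/workers

echo hadoopslave2 >> $HADOOP_HOME/etc/hadoop/workers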


Launch hadoop cluster service
Format the namenode (needed only before the first launch)

hdfs namenode -format
Launch the HDFS distributed file system

start-dfs.sh
Launch the YARN distributed computing system

start-yarn.sh
Shutdown Hadoop Cluster

stop-yarn.sh

stop-dfs.sh


Verify the hadoop cluster is up and healthy


Verify by jps processes

Run jps on each node and review the results; a rough expectation is sketched below.
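Given the roles in the architecture table, the expected jps output is roughly the following (process IDs omitted; they will differ):

hadoopmaster: NameNode, ResourceManager, Jps

hadoopslave1: SecondaryNameNode, DataNode, NodeManager, Jps

hadoopslave2: DataNode, NodeManager, Jps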


Verify on Web interface



Open http://192.168.137.3:50070 to view the HDFS storage status.


Open http://192.168.137.3:8088 to view YARN computing resources and application status.
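The same information is available from the command line on hadoopmaster; both commands ship with Hadoop:

hdfs dfsadmin -report

yarn node -list

The first should report the two live DataNodes, the second the two running NodeManagers.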





Zookeeper 3.6.3 Cluster Setup for Hadoop



P.S. Please log in as the hadoop user to build this cluster.


Download and untar



wget http://apache.mirrors.pair.com/zookeeper/stable/apache-zookeeper-3.6.3-bin.tar.gz

tar -zxvf apache-zookeeper-3.6.3-bin.tar.gz


Configuration files


Append the following content to ~/.bashrc

export ZOOKEEPER_HOME=/home/hadoop/apache-zookeeper-3.6.3-bin

export PATH=$PATH:$ZOOKEEPER_HOME/bin

then make these variables take effect:

source ~/.bashrc


Create zoo.cfg file

cp $ZOOKEEPER_HOME/conf/zoo_sample.cfg $ZOOKEEPER_HOME/conf/zoo.cfg


Append the following to zoo.cfg

dataDir=/home/hadoop/apache-zookeeper-3.6.3-bin/data

dataLogDir=/home/hadoop/apache-zookeeper-3.6.3-bin/logs

server.1=hadoopmaster:2888:3888

server.2=hadoopslave1:2888:3888

server.3=hadoopslave2:2888:3888
Create necessary folders



mkdir -p $ZOOKEEPER_HOME/logs

mkdir -p $ZOOKEEPER_HOME/data


Copy the zookeeper folder and configuration to all nodes



cd ~

scp -r .bashrc hadoopslave1:~/

scp -r .bashrc hadoopslave2:~/



scp -r apache-zookeeper-3.6.3-bin hadoopslave1:~/

scp -r apache-zookeeper-3.6.3-bin hadoopslave2:~/


Specify the server id according to the zoo.cfg configuration, on all nodes



ssh hadoopmaster "echo 1 > $ZOOKEEPER_HOME/data/myid"

ssh hadoopslave1 "echo 2 > $ZOOKEEPER_HOME/data/myid"

ssh hadoopslave2 "echo 3 > $ZOOKEEPER_HOME/data/myid"

(The quotes are required so the redirection runs on the remote host rather than locally.)

P.S. 1 for hadoopmaster, 2 for hadoopslave1, 3 for hadoopslave2
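As a sanity check, each myid should match its server.N line in zoo.cfg ($ZOOKEEPER_HOME expands locally, but the path is identical on every node):

ssh hadoopmaster cat $ZOOKEEPER_HOME/data/myid

ssh hadoopslave1 cat $ZOOKEEPER_HOME/data/myid

ssh hadoopslave2 cat $ZOOKEEPER_HOME/data/myid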


Launch and Shutdown the Zookeeper Cluster Service (run on every node)



zkServer.sh start

zkServer.sh stop
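Once ZooKeeper has been started on every node, zkServer.sh status shows whether each node joined the quorum; one node should report Mode: leader and the other two Mode: follower.

zkServer.sh status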


Verify the Zookeeper cluster is up and healthy


Use jps to check all nodes; each one should show a QuorumPeerMain process.

Use zkCli.sh to verify that all nodes are synchronized (after you have started ZooKeeper on all nodes).
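A minimal sketch of such a check, assuming the default clientPort 2181 from zoo_sample.cfg and a throwaway znode name /healthcheck: create a znode through one node, then read it back through another.

zkCli.sh -server hadoopmaster:2181 create /healthcheck ok

zkCli.sh -server hadoopslave1:2181 get /healthcheck

The second command should print ok, confirming the data was replicated across the ensemble.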

Done.
