vineri, 16 iulie 2021

Hadoop Cluster 3.3.1 + Apache ZooKeeper 3.6.3 Installation on CentOS 8 Stream (minimal install)


Architecture

IP Address      Hostname        Role
192.168.137.3   hadoopmaster    NameNode, ResourceManager
192.168.137.4   hadoopslave1    SecondaryNameNode, DataNode, NodeManager
192.168.137.5   hadoopslave2    DataNode, NodeManager

CentOS setup
install necessary packages for OS

We start from the CentOS minimal ISO. Once the system is installed, we need a few more basic packages:

sudo yum install -y net-tools

sudo yum install -y openssh-server

sudo yum install -y wget

The first package provides ifconfig, the second one lets remote peers log in over SSH, and the third gives us wget for downloading archives.
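
A quick, optional sanity check that the tools are in place and that sshd is running (assuming the default service name):

ifconfig
systemctl status sshd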


setup hostname for all nodes

hostnamectl set-hostname hadoopmaster

hostnamectl set-hostname hadoopslave1

hostnamectl set-hostname hadoopslave2

Run each command on its corresponding node, then re-login to check that the change took effect.
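
To double-check, each node should now report its new name:

hostnamectl status
hostname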


setup jdk for all nodes

Install the JDK from the Oracle official website

Go to
https://www.oracle.com/java/technologies/javase/javase-jdk8-downloads.html

accept the license agreement in the browser to get a download session, copy the download link, and then run:

cd ~

wget --header "Cookie: oraclelicense=accept-securebackup-cookie" https://download.oracle.com/otn-pub/java/jdk/8u291-b10/d7fc238d0cbf4b0dac67be84580cfb4b/jdk-8u291-linux-x64.rpm

yum localinstall -y jdk-8u291-linux-x64.rpm



Add java.sh under /etc/profile.d/ with the following content:



export JAVA_HOME=/usr/java/latest

export JRE_HOME=/usr/java/latest/jre

export CLASSPATH=$JAVA_HOME/lib:.

export PATH=$PATH:$JAVA_HOME/bin


Re-login, and you should find the environment variables set and Java properly installed:



java -version

ls $JAVA_HOME

echo $PATH



If the reported Java version is wrong, run

update-alternatives --config java

and choose the correct version from the list.


setup user and user group on all nodes



sudo groupadd hadoop

sudo useradd -d /home/hadoop -g hadoop hadoop

sudo passwd hadoop


modify the hosts file for hostname resolution on all nodes



echo '192.168.137.3 hadoopmaster' >> /etc/hosts

echo '192.168.137.4 hadoopslave1' >> /etc/hosts

echo '192.168.137.5 hadoopslave2' >> /etc/hosts



Check that the resolution works by pinging the hostnames.
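
For example, from any node:

ping -c 2 hadoopmaster
ping -c 2 hadoopslave1
ping -c 2 hadoopslave2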


setup ssh no password login on all nodes



On master server



su - hadoop

ssh-keygen -t rsa

ssh-copy-id hadoopmaster

ssh-copy-id hadoopslave1

ssh-copy-id hadoopslave2
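
Verify that passwordless login works, for example:

ssh hadoopslave1 hostname
ssh hadoopslave2 hostname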


stop & disable firewall



systemctl stop firewalld.service

systemctl disable firewalld.service


Hadoop Setup



All of the following steps happen on a single node, let’s say hadoopmaster. In addition, we log in as the hadoop user to perform all operations.


Download and untar on the file system.



su - hadoop

wget http://mirrors.sonic.net/apache/hadoop/common/stable/hadoop-3.3.1.tar.gz

tar -xvf hadoop-3.3.1.tar.gz

chmod 775 hadoop-3.3.1


Add environment variables for hadoop



Append the following content to ~/.bashrc, after the “export PATH” line:



export HADOOP_HOME=/home/hadoop/hadoop-3.3.1

export HADOOP_INSTALL=$HADOOP_HOME

export HADOOP_MAPRED_HOME=$HADOOP_HOME

export HADOOP_COMMON_HOME=$HADOOP_HOME

export HADOOP_HDFS_HOME=$HADOOP_HOME

export YARN_HOME=$HADOOP_HOME

export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native

export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin



Then make these variables take effect:

source ~/.bashrc
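
A quick check that the variables are in place and hadoop is on the PATH:

echo $HADOOP_HOME
hadoop version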

Modify configuration files for hadoop


Add the slave node hostnames into the $HADOOP_HOME/etc/hadoop/slaves file

echo hadoopslave1 > $HADOOP_HOME/etc/hadoop/slaves

echo hadoopslave2 >> $HADOOP_HOME/etc/hadoop/slaves

Add the secondary node hostname into the $HADOOP_HOME/etc/hadoop/masters file

echo hadoopslave1 > $HADOOP_HOME/etc/hadoop/masters

(Note: Hadoop 3.x actually reads the workers file, which we fill in below, and the SecondaryNameNode location comes from hdfs-site.xml; the legacy slaves and masters files are kept here only for reference.)
Modify $HADOOP_HOME/etc/hadoop/core-site.xml as follows



<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://hadoopmaster:9000/</value>
    <description>namenode settings</description>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/hadoop/hadoop-3.3.1/tmp/hadoop-${user.name}</value>
    <description>temp folder</description>
  </property>
  <property>
    <name>hadoop.proxyuser.hadoop.hosts</name>
    <value>*</value>
  </property>
  <property>
    <name>hadoop.proxyuser.hadoop.groups</name>
    <value>*</value>
  </property>
</configuration>
Modify $HADOOP_HOME/etc/hadoop/hdfs-site.xml as follows

<configuration>
  <property>
    <name>dfs.namenode.http-address</name>
    <value>hadoopmaster:50070</value>
    <description>fetch NameNode images and edits</description>
  </property>
  <property>
    <name>dfs.namenode.secondary.http-address</name>
    <value>hadoopslave1:50090</value>
    <description>fetch SecondaryNameNode fsimage</description>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
    <description>replica count</description>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:///home/hadoop/hadoop-3.3.1/hdfs/name</value>
    <description>NameNode</description>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:///home/hadoop/hadoop-3.3.1/hdfs/data</value>
    <description>DataNode</description>
  </property>
  <property>
    <name>dfs.namenode.checkpoint.dir</name>
    <value>file:///home/hadoop/hadoop-3.3.1/hdfs/namesecondary</value>
    <description>checkpoint</description>
  </property>
  <property>
    <name>dfs.webhdfs.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>dfs.stream-buffer-size</name>
    <value>131072</value>
    <description>buffer</description>
  </property>
  <property>
    <name>dfs.namenode.checkpoint.period</name>
    <value>3600</value>
    <description>duration</description>
  </property>
</configuration>


Modify $HADOOP_HOME/etc/hadoop/mapred-site.xml as follows



<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <property>
    <name>mapreduce.jobtracker.address</name>
    <value>hadoopmaster:9001</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.address</name>
    <value>hadoopmaster:10020</value>
    <description>MapReduce JobHistory Server host:port, default port is 10020.</description>
  </property>
  <property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>hadoopmaster:19888</value>
    <description>MapReduce JobHistory Server Web UI host:port, default port is 19888.</description>
  </property>
</configuration>


Modify $HADOOP_HOME/etc/hadoop/yarn-site.xml as follows

<configuration>

  <!-- Site specific YARN configuration properties -->
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>hadoopmaster</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
  <property>
    <name>yarn.resourcemanager.address</name>
    <value>hadoopmaster:8032</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>hadoopmaster:8030</value>
  </property>
  <property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>hadoopmaster:8031</value>
  </property>
  <property>
    <name>yarn.resourcemanager.admin.address</name>
    <value>hadoopmaster:8033</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.address</name>
    <value>hadoopmaster:8088</value>
  </property>
</configuration>


Create necessary folders



mkdir -p $HADOOP_HOME/tmp

mkdir -p $HADOOP_HOME/hdfs/name

mkdir -p $HADOOP_HOME/hdfs/data


Copy hadoop folders and environment settings to slaves



scp ~/.bashrc hadoopslave1:~/

scp ~/.bashrc hadoopslave2:~/



scp -r ~/hadoop-3.3.1 hadoopslave1:~/

scp -r ~/hadoop-3.3.1 hadoopslave2:~/


Modify $HADOOP_HOME/etc/hadoop/workers on all nodes as follows

vim $HADOOP_HOME/etc/hadoop/workers

Replace its contents with the worker hostnames, one per line:

hadoopslave1
hadoopslave2

then press ESC and type :x to save and exit.


Launch the hadoop cluster services
Format the NameNode (only needed before the first launch)

hdfs namenode -format
Launch the HDFS distributed file system

start-dfs.sh
Launch the YARN distributed computing system

start-yarn.sh
Shut down the Hadoop cluster

stop-yarn.sh

stop-dfs.sh


Verify the hadoop cluster is up and healthy


Verify with jps

Run jps on each node and check that the expected daemons are running.
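
Roughly, given the roles from the architecture table, jps should list something like the following on each node (plus the Jps process itself):

hadoopmaster: NameNode, ResourceManager
hadoopslave1: SecondaryNameNode, DataNode, NodeManager
hadoopslave2: DataNode, NodeManager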


Verify on the web interface



Open 192.168.137.3:50070 to view the HDFS storage status.

Open 192.168.137.3:8088 to view YARN resources and application status.
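
The same status information is also available from the command line:

hdfs dfsadmin -report
yarn node -list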





Zookeeper 3.6.3 Cluster Setup for Hadoop



P.S. Log in as the hadoop user to build this cluster.


Download and untar



wget http://apache.mirrors.pair.com/zookeeper/stable/apache-zookeeper-3.6.3-bin.tar.gz

tar -zxvf apache-zookeeper-3.6.3-bin.tar.gz


Configuration files


Append the following content to ~/.bashrc

export ZOOKEEPER_HOME=/home/hadoop/apache-zookeeper-3.6.3-bin

export PATH=$PATH:$ZOOKEEPER_HOME/bin


Then make these variables take effect:

source ~/.bashrc


Create zoo.cfg file

cp $ZOOKEEPER_HOME/conf/zoo_sample.cfg $ZOOKEEPER_HOME/conf/zoo.cfg


Append the following to zoo.cfg

dataDir=/home/hadoop/apache-zookeeper-3.6.3-bin/data

dataLogDir=/home/hadoop/apache-zookeeper-3.6.3-bin/logs

server.1=hadoopmaster:2888:3888

server.2=hadoopslave1:2888:3888

server.3=hadoopslave2:2888:3888
Create necessary folders



mkdir -p $ZOOKEEPER_HOME/logs

mkdir -p $ZOOKEEPER_HOME/data


Copy the zookeeper folder and configuration to all nodes



cd ~

scp -r .bashrc hadoopslave1:~/

scp -r .bashrc hadoopslave2:~/



scp -r apache-zookeeper-3.6.3-bin hadoopslave1:~/

scp -r apache-zookeeper-3.6.3-bin hadoopslave2:~/


Specify the server id according to the zoo.cfg configuration, on all nodes



ssh hadoopmaster "echo 1 > $ZOOKEEPER_HOME/data/myid"

ssh hadoopslave1 "echo 2 > $ZOOKEEPER_HOME/data/myid"

ssh hadoopslave2 "echo 3 > $ZOOKEEPER_HOME/data/myid"

P.S. 1 for hadoopmaster, 2 for hadoopslave1, 3 for hadoopslave2. The quotes ensure the redirection runs on the remote host instead of on the local machine; $ZOOKEEPER_HOME expands locally, which is fine since the path is the same on every node.
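
You can verify with:

ssh hadoopmaster cat $ZOOKEEPER_HOME/data/myid
ssh hadoopslave1 cat $ZOOKEEPER_HOME/data/myid
ssh hadoopslave2 cat $ZOOKEEPER_HOME/data/myid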


Launch and shut down the Zookeeper cluster service (run on each node)



zkServer.sh start

zkServer.sh stop
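
After starting ZooKeeper on all three nodes, check each node's role with:

zkServer.sh status

One node should report Mode: leader and the other two Mode: follower.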


Verify the Zookeeper cluster is up and healthy


Use jps to check all nodes; each should show a QuorumPeerMain process.

Use zkCli.sh to verify that all nodes are synchronized (after you have started ZooKeeper on all nodes).
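
For example, connect to one of the servers and list the root znode as a minimal check:

zkCli.sh -server hadoopslave1:2181
ls /
quit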

Done.

CentOS 8 Stream - HAProxy balancer - Percona 5.7 cluster

CentOS Stream release 8 - minimal install

192.168.137.9 db1
192.168.137.10 db2
192.168.137.11 db3
192.168.137.12 clustercontroldb

dnf groupinfo "Development Tools"
dnf group install "Development Tools"
dnf update
dnf install mlocate
dnf install vim
dnf install net-tools
dnf install wget
vim /etc/sysconfig/selinux (set SELINUX=disabled or permissive)
firewall-cmd --zone=public --add-service=mysql --permanent
firewall-cmd --zone=public --add-port=3306/tcp       --permanent
firewall-cmd --zone=public --add-port=4567/tcp       --permanent
firewall-cmd --zone=public --add-port=4568/tcp       --permanent
firewall-cmd --zone=public --add-port=4444/tcp       --permanent
firewall-cmd --zone=public --add-port=4567/udp       --permanent
firewall-cmd --zone=public --add-port=9200/tcp --permanent
firewall-cmd --reload
yum install epel-release
yum install socat
yum remove mariadb-libs
yum install https://repo.percona.com/yum/percona-release-latest.noarch.rpm
yum install rsync nc
percona-release enable-only pxc-57
percona-release enable tools release
yum module disable mysql
yum install Percona-XtraDB-Cluster-full-57
systemctl start mysql
grep 'temporary password' /var/log/mysqld.log
mysql_secure_installation
mysql -u root -p
mysql> create user sstuser@'%' identified by 'parolagrea';
mysql> grant all on *.* to sstuser@'%';
mysql> flush privileges;
mysql> quit

vim /etc/percona-xtradb-cluster.conf.d/wsrep.cnf
db1
wsrep_cluster_address = gcomm://
wsrep_provider = /usr/lib64/galera3/libgalera_smm.so

wsrep_slave_threads = 8
wsrep_cluster_name = pxc-cluster
wsrep_node_name = pxc-cluster-node-db1
wsrep_node_address = db1
wsrep_sst_method = xtrabackup-v2
wsrep_sst_auth = sstuser:parolagrea

db2
wsrep_cluster_address = gcomm://db1,db2,db3
wsrep_provider = /usr/lib64/galera3/libgalera_smm.so

wsrep_slave_threads = 8
wsrep_cluster_name = pxc-cluster
wsrep_node_name = pxc-cluster-node-db2
wsrep_node_address = db2
wsrep_sst_method = xtrabackup-v2
wsrep_sst_auth = sstuser:parolagrea

db3
wsrep_cluster_address = gcomm://db1,db2,db3
wsrep_provider = /usr/lib64/galera3/libgalera_smm.so

wsrep_slave_threads = 8
wsrep_cluster_name = pxc-cluster
wsrep_node_name = pxc-cluster-node-db3
wsrep_node_address = db3
wsrep_sst_method = xtrabackup-v2
wsrep_sst_auth = sstuser:parolagrea

on db2,db3
vim /etc/percona-xtradb-cluster.conf.d/mysqld.cnf
change server-id=1 to server-id=2 (on db2) and server-id=3 (on db3)

db1
systemctl start mysql@bootstrap

db2 - after db1 has finished starting
systemctl start mysql

db3 - after db2 has finished starting
systemctl start mysql

db1
mysql -u root -p

mysql> SHOW STATUS LIKE 'wsrep_local_state_comment';
mysql> show global status like 'wsrep_cluster_size';
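
On a healthy three-node cluster the expected values are Synced and 3; for example, as a one-liner:

mysql -u root -p -e "SHOW STATUS LIKE 'wsrep_local_state_comment'; SHOW GLOBAL STATUS LIKE 'wsrep_cluster_size';"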

In a cluster of 3, at least 2 nodes need to be up.
If systemctl stop mysql@bootstrap was issued on db1, or db1 somehow went down, db2 and db3 will stay up.
When db1 comes back, it needs wsrep_cluster_address = gcomm://db1,db2,db3 set, and then use
systemctl start mysql

If you decide to make db1 the bootstrap process owner again after the previous case, you need to shut down all the other nodes with systemctl stop mysql, and then, last of all, systemctl stop mysql on db1.
After all mysql instances are stopped, you can use systemctl start mysql@bootstrap on db1 again and continue with the usual node startup order.
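
A compact sketch of that re-bootstrap sequence, assuming the commands are run as root on each node in turn:

# stop the joiners first, then the bootstrap owner last
systemctl stop mysql              # on db3
systemctl stop mysql              # on db2
systemctl stop mysql              # on db1, last

# bring db1 back as the bootstrap node, then rejoin the others
systemctl start mysql@bootstrap   # on db1
systemctl start mysql             # on db2
systemctl start mysql             # on db3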

HAProxy settings

The clustercheck script might already exist on the cluster nodes after the full Percona cluster install; you can overwrite it as below.

On all cluster nodes:

wget https://raw.githubusercontent.com/olafz/percona-clustercheck/master/clustercheck
chmod +x clustercheck
mv /usr/bin/clustercheck /usr/bin/clustercheck.old
mv clustercheck /usr/bin/
vim /usr/bin/clustercheck

#MYSQL_USERNAME="${MYSQL_USERNAME:=-clustercheckuser}"
#MYSQL_PASSWORD="${MYSQL_PASSWORD-clustercheckparola!}"
MYSQL_USERNAME="clustercheckuser"
MYSQL_PASSWORD="clustercheckparola!"

yum install xinetd

Check whether /etc/xinetd.d/mysqlchk exists; if not, create it:

vim /etc/xinetd.d/mysqlchk

# default: on
# description: mysqlchk
service mysqlchk
{
# this is a config for xinetd, place it in /etc/xinetd.d/
  disable = no
  flags = REUSE
  socket_type = stream
  type = UNLISTED
  port = 9200
  wait = no
  user = nobody
  server = /usr/bin/clustercheck
  log_on_failure += USERID
  only_from = 0.0.0.0/0
  # recommended to put the IPs that need
  # to connect exclusively (security purposes)
  per_source = UNLIMITED
}

chmod

service xinetd restart

on one node (the grant replicates to the other cluster nodes)
mysql -u root -p

mysql> GRANT PROCESS ON *.* TO 'clustercheckuser'@'localhost' IDENTIFIED BY 'clustercheckparola!';

exit;

vim /etc/services

Find the line that contains port 9200, comment it out, and add a new line:
mysqlchk 9200/tcp # mysqlchk
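
To confirm the health check answers, on each db node (assuming xinetd is listening on port 9200):

curl http://localhost:9200
# a healthy node should return HTTP 200 with a "node is synced" message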

On HAproxy server

systemctl stop firewalld
setenforce 0
dnf install gcc gcc-c++ make pcre-devel bzip2-devel
dnf install haproxy
mv /etc/haproxy/haproxy.cfg /etc/haproxy/haproxy.cfg.bk

vim /etc/haproxy/haproxy.cfg

global
log 127.0.0.1 local0
log 127.0.0.1 local1 notice
maxconn 4096
chroot /var/lib/haproxy
user haproxy
group haproxy
daemon

defaults
log global
mode http
option tcplog
option dontlognull
retries 3
option redispatch
maxconn 2000
timeout connect 5000
timeout client 50000
timeout server 50000

frontend pxc-front
bind *:3307
mode tcp
default_backend pxc-back

frontend stats-front
bind *:80
mode http
default_backend stats-back

frontend pxc-onenode-front
bind *:3306
mode tcp
default_backend pxc-onenode-back

backend pxc-back
mode tcp
balance leastconn
option httpchk
server db1 192.168.137.9:3306 check port 9200 inter 12000 rise 3 fall 3
server db2 192.168.137.10:3306 check port 9200 inter 12000 rise 3 fall 3
server db3 192.168.137.11:3306 check port 9200 inter 12000 rise 3 fall 3

backend stats-back
mode http
balance roundrobin
stats uri /haproxy/stats
stats auth pxcstats:secret

backend pxc-onenode-back
mode tcp
balance leastconn
option httpchk
server db1 192.168.137.9:3306 check port 9200 inter 12000 rise 3 fall 3
server db2 192.168.137.10:3306 check port 9200 inter 12000 rise 3 fall 3 backup
server db3 192.168.137.11:3306 check port 9200 inter 12000 rise 3 fall 3 backup
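
Validate the configuration before starting HAProxy:

haproxy -c -f /etc/haproxy/haproxy.cfg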



systemctl start firewalld
firewall-cmd --permanent --add-port=3307/tcp
firewall-cmd --permanent --add-port=3306/tcp
firewall-cmd --permanent --add-port=80/tcp

firewall-cmd --reload
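
Then enable and start the service (assuming the standard systemd unit shipped with the haproxy package):

systemctl enable --now haproxy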


yum install https://repo.percona.com/yum/percona-release-latest.noarch.rpm
yum install rsync nc
percona-release enable-only pxc-57
percona-release enable tools release
yum module disable mysql
yum install Percona-XtraDB-Cluster-client-57

vim /etc/services

Find the line that contains port 9200, comment it out, and add a new line:
mysqlchk 9200/tcp # mysqlchk


dnf install sysbench

http://clustercontroldb/haproxy/stats
user:pxcstats
pass:secret

on db1

mysql> create database sbtest;
Query OK, 1 row affected (0.01 sec)

mysql> grant all on sbtest.* to 'sbtest'@'%' identified by 'sbpass';
Query OK, 0 rows affected (0.00 sec)

mysql> flush privileges;
Query OK, 0 rows affected (0.00 sec)

on the HAProxy node, run the test

sysbench --db-driver=mysql --mysql-host=127.0.0.1 --mysql-port=3307 --mysql-user=sbtest --mysql-password=sbpass --mysql-db=sbtest --table_size=10000 --tables=2 --threads=1 --events=0 --time=60 --rand-type=uniform /usr/share/sysbench/oltp_read_only.lua prepare

sysbench --db-driver=mysql --mysql-host=127.0.0.1 --mysql-port=3307 --mysql-user=sbtest --mysql-password=sbpass --mysql-db=sbtest --range_size=100 --table_size=10000 --tables=2 --threads=1 --events=0 --time=60 --rand-type=uniform /usr/share/sysbench/oltp_read_only.lua run
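
To see the balancing itself, ask which backend answered through the HAProxy port (using the Percona client installed above); repeating the command a few times may show different node hostnames:

mysql -h 127.0.0.1 -P 3307 -u sbtest -psbpass -e "SELECT @@hostname;"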