Building an HA Cluster Based on Vanilla Hadoop 2.7.9
Published: 2022-02-07

1. Prerequisites

Software:

CentOS 7.2 64-bit, JDK 1.8 64-bit, Hadoop 2.7.9, ZooKeeper 3.4.9, Hive 2.2

Hardware (VMware virtual machines):

1 master node: 1 CPU, 2 GB RAM, 50 GB disk

2 slave nodes: one of them with 2 GB RAM, otherwise the same configuration

 

The node IPs are as follows:

Hostname     IP address       Note (hostnames are changed as below for convenience)
hadoop001    192.168.0.211    hadoop001
hadoop002    192.168.0.212    hadoop002
hadoop003    192.168.0.213    hadoop003

Planned cluster layout:

Hostname     Software        Processes
hadoop001    JDK, Hadoop     NameNode, ZKFC, ResourceManager
hadoop002    JDK, Hadoop     ZooKeeper (QuorumPeerMain), DataNode, JournalNode, NodeManager
hadoop003    JDK, Hadoop     ZooKeeper (QuorumPeerMain), DataNode, JournalNode, NodeManager

2. Server preparation

Before installing, set up the VMware virtual machines and the Linux server cluster, and create a user named hadoop. Unless stated otherwise, the steps below are performed as the hadoop user.

2.1 Disable the server firewall

CentOS 7 uses firewalld as its default firewall, so it needs to be stopped and disabled. The relevant commands are:

List opened ports: firewall-cmd --list-ports

Open a port: firewall-cmd --zone=public --add-port=80/tcp --permanent

Option meanings:

--zone            # the zone to operate on
--add-port=80/tcp # add a port, in the form port/protocol
--permanent       # persist across restarts; without it the rule is lost after a reload

Reload firewalld: firewall-cmd --reload

Stop firewalld: systemctl stop firewalld.service

Disable firewalld at boot: systemctl disable firewalld.service

Check the firewall state (shows "not running" when stopped, "running" when active): firewall-cmd --state

2.2 Change the hostnames

Log in to 192.168.0.211 remotely (for example via Xshell), then run:

vim /etc/hostname and change the hostname to hadoop001 (this change only takes effect after a reboot), then save and exit.

Next run: hostname hadoop001 (takes effect immediately for the current session).

After changing the hostname on the 211 server, repeat the same steps on the other servers.
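On CentOS 7 the same result can also be achieved in one step with hostnamectl; a minimal sketch (run as root on each node, using the hostnames planned above):

# run on 192.168.0.211
hostnamectl set-hostname hadoop001   # writes /etc/hostname and applies immediately
# run on 192.168.0.212
hostnamectl set-hostname hadoop002
# run on 192.168.0.213
hostnamectl set-hostname hadoop003
# verify on each node
hostnamectl status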

2.3 Update /etc/hosts

Log in to hadoop001 and run:

vim /etc/hosts and add the following entries:

192.168.0.211 hadoop001
192.168.0.212 hadoop002
192.168.0.213 hadoop003

Save and exit. Then copy the hosts file to each of the other nodes, overwriting their existing hosts files (writing to /etc/ requires root privileges on the target machine):

scp /etc/hosts hadoop@192.168.0.212:/etc/
scp /etc/hosts hadoop@192.168.0.213:/etc/

Notes: 1. If the ssh/scp commands are not available, the SSH client may not be installed; install it (the original suggests yum install sshpass.x86_64; the standard client package is openssh-clients).

2. Make sure all nodes can ping each other; if they cannot, check whether the firewall has been disabled.

2.4 Passwordless SSH login

Log in to hadoop001 as the hadoop user and run:

ssh-keygen -t rsa

Accept the defaults; /home/hadoop/.ssh/ now contains the key pair id_rsa and id_rsa.pub.

Create the authorized keys file from the public key:

cp ~/.ssh/id_rsa.pub ~/.ssh/authorized_keys

Verify with: ssh localhost, and check that this node can log in to itself without a password.

Copy the key files to the other nodes:

scp authorized_keys hadoop@hadoop002:~/.ssh/
scp id_rsa hadoop@hadoop002:~/.ssh/
scp id_rsa.pub hadoop@hadoop002:~/.ssh/

Do the same for the remaining nodes, then test that every node can SSH to every other node without a password.
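An equivalent and often simpler approach (a sketch, assuming the hadoop user exists on every node and openssh-clients is installed) is ssh-copy-id, which appends the public key to the remote authorized_keys file and sets the correct permissions:

# on hadoop001, as the hadoop user
ssh-keygen -t rsa             # accept the defaults if the key does not exist yet
ssh-copy-id hadoop@hadoop001  # also authorize the local node
ssh-copy-id hadoop@hadoop002
ssh-copy-id hadoop@hadoop003
# repeat on hadoop002 and hadoop003 if a full mesh of passwordless logins is required
ssh hadoop002 hostname        # should print "hadoop002" without prompting for a password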

2.5 NTP time synchronization

(1) Install the NTP service

As root on each Linux node, install the NTP service:

yum install ntp -y

(2) Edit the NTP configuration file (server)

Pick one machine in the cluster to act as the NTP server; the other machines are NTP clients. On the server, edit the configuration file:

vim /etc/ntp.conf

# /etc/ntp.conf, configuration for ntpd; see ntp.conf(5) for help

 

driftfile /var/lib/ntp/ntp.drift

 

# Enable this if you want statistics to be logged.

#statsdir /var/log/ntpstats/

 

statistics loopstats peerstats clockstats

filegen loopstats file loopstats type day enable

filegen peerstats file peerstats type day enable

filegen clockstats file clockstats type day enable

 

# Specify one or more NTP servers.

 

# Use servers from the NTP Pool Project. Approved by Ubuntu Technical Board

# on 2011-02-08 (LP: #104525). See http://www.pool.ntp.org/join.html for

# more information.

# The distribution's default time sources need to be commented out

#pool 0.ubuntu.pool.ntp.org iburst

#pool 1.ubuntu.pool.ntp.org iburst

#pool 2.ubuntu.pool.ntp.org iburst

#pool 3.ubuntu.pool.ntp.org iburst

 

# Use Ubuntu's ntp server as a fallback.

#pool ntp.ubuntu.com

 

# Access control configuration; see /usr/share/doc/ntp-doc/html/accopt.html for

# details.  The web page <http://support.ntp.org/bin/view/Support/AccessRestrictions>

# might also be helpful.

#

# Note that "restrict" applies to both servers and clients, so a configuration

# that might be intended to block requests from certain clients could also end

# up blocking replies from your own upstream servers.

 

# By default, exchange time with everybody, but don't allow configuration.

restrict -4 default kod notrap nomodify nopeer noquery limited

restrict -6 default kod notrap nomodify nopeer noquery limited

 

# Local users may interrogate the ntp server more closely.

restrict 127.0.0.1

restrict ::1

# Because this is an isolated LAN, use the local clock as the time source (note this is 127.127.1.0, not 127.0.0.1)

server 127.127.1.0

fudge 127.127.1.0 stratum 8

 

# Allow the whole 192.168.0.0/24 subnet, i.e. every machine in this subnet may use this host (.214 in the original setup) as its time server

 

restrict 192.168.0.0 mask 255.255.255.0 nomodify notrap

 

# Needed for adding pool entries

restrict source notrap nomodify noquery

 

# Clients from this (example!) subnet have unlimited access, but only if

# cryptographically authenticated.

#restrict 192.168.123.0 mask 255.255.255.0 notrust

 

 

# If you want to provide time to your local subnet, change the next line.

# (Again, the address is an example only.)

#broadcast 192.168.123.255

 

# If you want to listen to time broadcasts on your local subnet, de-comment the

# next lines.  Please do this only if you trust everybody on the network!

#disable auth

#broadcastclient

 

#Changes required to use PPS synchronisation as explained in documentation:

#http://www.ntp.org/ntpfaq/NTP-s-config-adv.htm#AEN3918

 

#server 127.127.8.1 mode 135 prefer    # Meinberg GPS167 with PPS

#fudge 127.127.8.1 time1 0.0042        # relative to PPS for my hardware

 

#server 127.127.22.1                   # ATOM(PPS)

#fudge 127.127.22.1 flag3 1            # enable PPS API

Save and exit, then restart the NTP service.

Run: systemctl restart ntpd (on CentOS 7 the service is named ntpd; service ntpd restart also works)

(3) Edit the NTP configuration file (client)

With the server configured, the client configuration is simpler. On each client, run vim /etc/ntp.conf and use the same base file as on the server (including commenting out the default pool lines), with two differences: omit the local-clock lines (server 127.127.1.0 / fudge 127.127.1.0 stratum 8) and the subnet restrict line, and instead point the client at the NTP server:

# Add the NTP server (the .214 host in the original setup) as the time source
server 192.168.0.214

# Needed for adding pool entries
restrict source notrap nomodify noquery

Save and exit, then restart the NTP service:

Run: systemctl restart ntpd

(4) Verify the NTP setup

On the NTP server run: ntpq -p

On each NTP client run: ntpq -p

If both show their time sources, the NTP configuration is complete.
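If a client's clock is far off, ntpd can take a long time to converge; a common extra step (a sketch, using the server IP from the original setup; ntpdate ships in its own package and can be installed with yum if missing) is to force a one-off sync before starting ntpd:

systemctl stop ntpd          # ntpd must not be running while ntpdate is used
ntpdate -u 192.168.0.214     # one-off synchronization against the NTP server
systemctl start ntpd         # then let ntpd keep the clock in sync
systemctl enable ntpd        # start ntpd at boot
ntpq -p                      # the server should appear with a small offset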

2.6 Upload the installation files

Use WinSCP to connect to hadoop001 and transfer files between the local machine and the server. Copy the local installation packages (Hadoop 2.7.x, JDK 1.8, ZooKeeper 3.4.9, Hive 2.2) to the /opt/soft directory on hadoop001.

Note: the upload can be done with WinSCP or any other SFTP client.

3. ZooKeeper cluster setup

3.1 Extract the ZooKeeper package

Copy the ZooKeeper package to /opt/soft on hadoop001, then extract it:

tar -zxvf zookeeper-3.4.9.tar.gz

This yields the ZooKeeper installation directory.

3.2 Configure ZooKeeper

Change into the conf directory: cd /opt/soft/zookeeper-3.4.9/conf/

Copy the sample configuration so it can be edited: cp zoo_sample.cfg zoo.cfg

Edit the file: vim zoo.cfg

Set the following parameters:

dataDir=/opt/data/zookeeper
dataLogDir=/opt/data/zookeeper/logs

And append at the end of the file:

server.1=hadoop001:2888:3888
server.2=hadoop002:2888:3888
server.3=hadoop003:2888:3888

Save and exit.

Then create the data directories:

mkdir -p /opt/data/zookeeper
mkdir -p /opt/data/zookeeper/logs

Create the empty myid file for ZooKeeper:

touch /opt/data/zookeeper/myid

Finally, write this node's ID into the file:

echo 1 > /opt/data/zookeeper/myid
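For reference, a minimal complete zoo.cfg under this layout might look like the following (the tickTime/initLimit/syncLimit/clientPort values are the zoo_sample.cfg defaults, kept here as assumptions):

# zoo.cfg - minimal sketch for this three-node ensemble
tickTime=2000
initLimit=10
syncLimit=5
clientPort=2181
dataDir=/opt/data/zookeeper
dataLogDir=/opt/data/zookeeper/logs
server.1=hadoop001:2888:3888
server.2=hadoop002:2888:3888
server.3=hadoop003:2888:3888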

3.3 Copy the configured ZooKeeper to the other nodes

scp -r zookeeper-3.4.9 hadoop002:/opt/soft/
scp -r zookeeper-3.4.9 hadoop003:/opt/soft/

Then on each of those machines create the data directories:

mkdir -p /opt/data/zookeeper
mkdir -p /opt/data/zookeeper/logs

Create the empty myid file:

touch /opt/data/zookeeper/myid

And write the node ID into it:

On hadoop002: echo 2 > /opt/data/zookeeper/myid
On hadoop003: echo 3 > /opt/data/zookeeper/myid

3.4 Set the environment variables

On each ZooKeeper server, as the hadoop user, run:

cd /home/hadoop and edit the profile: vim .bash_profile

Add the following lines:

export ZOOKEEPER_HOME=/opt/soft/zookeeper-3.4.9/
export PATH=$PATH:$ZOOKEEPER_HOME/bin

Save and exit, then run source .bash_profile to apply the changes.

3.5 Start and test ZooKeeper

On each machine run:

zkServer.sh start

Then run zkServer.sh status on each node. With one leader and two followers, the ensemble has started correctly.

3.6 Change the ZooKeeper log output path

By default, all ZooKeeper log output goes into zookeeper.out, so neither the output path nor the file size can be controlled, because that file is never rotated. To change the logging behaviour:

1. Edit the zkEnv.sh file under $ZOOKEEPER_HOME/bin: set ZOO_LOG_DIR to the directory the logs should go to, and set ZOO_LOG4J_PROP to use the INFO,ROLLINGFILE appender.

2. Edit $ZOOKEEPER_HOME/conf/log4j.properties: set zookeeper.root.logger to the same value as ZOO_LOG4J_PROP in the previous file. This configuration rotates logs by file size; to rotate by day instead, change the appender to DailyRollingFileAppender.
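A sketch of the two edits (reusing /opt/data/zookeeper/logs, the dataLogDir created earlier, as an assumed log directory):

# $ZOOKEEPER_HOME/bin/zkEnv.sh
ZOO_LOG_DIR=/opt/data/zookeeper/logs
ZOO_LOG4J_PROP="INFO,ROLLINGFILE"

# $ZOOKEEPER_HOME/conf/log4j.properties
zookeeper.root.logger=INFO,ROLLINGFILE
# ROLLINGFILE rotates by file size by default; for daily rotation change the appender class:
# log4j.appender.ROLLINGFILE=org.apache.log4j.DailyRollingFileAppender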

4. Hadoop cluster setup

4.1 Extract the Hadoop package

Log in to hadoop001 and move the Hadoop package into /opt/soft.

Extract it: tar -zxvf hadoop-2.7.6.tar.gz

Then create the data directories:

mkdir -p /opt/data/hadoop/tmp
mkdir -p /opt/data/hadoop/dfs/data
mkdir -p /opt/data/hadoop/dfs/name

4.2 Hadoop configuration files

4.2.1 Configure JAVA_HOME

Change into the configuration directory: cd /opt/soft/hadoop-2.7.6/etc/hadoop

Edit hadoop-env.sh and set:

export JAVA_HOME=/opt/soft/jdk1.8.0_171
export HADOOP_LOG_DIR=/opt/data/hadoop/logs

(this log directory is also referenced by the yarn-site.xml configuration below)

Edit yarn-env.sh and set the same values:

export JAVA_HOME=/opt/soft/jdk1.8.0_171
export HADOOP_LOG_DIR=/opt/data/hadoop/logs

4.2.2 Configure slaves

Edit the slaves file and list the slave (DataNode/NodeManager) hostnames, one per line:

vim slaves and add:

hadoop001
hadoop002
hadoop003

Save and exit.

4.2.3 Configure core-site.xml

<?xml version="1.0" encoding="UTF-8"?>

<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!--Licensed under the Apache License, Version 2.0 (the "License");

  you may not use this file except in compliance with the License.

  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software

  distributed under the License is distributed on an "AS IS" BASIS,

  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.

  See the License for the specific language governing permissions and

  limitations under the License. See accompanying LICENSE file.

-->

<!-- Put site-specific property overrides in this file. -->

<configuration>

  <property>

    <name>fs.defaultFS</name>

    <value>hdfs://beh</value> ### the HDFS nameservice (logical name)

  </property>

  <property>

    <name>io.file.buffer.size</name>

    <value>131072</value>

  </property>

  <property>

    <name>hadoop.tmp.dir</name>

    <value>/opt/data/hadoop/tmp</value> ### the temporary directory created earlier

    <description>Abase for other temporary directories.</description>

  </property>

  <property>

    <name>ha.zookeeper.quorum</name> ### the ZooKeeper ensemble

    <value>hadoop001:2181,hadoop002:2181,hadoop003:2181</value>

  </property>

  <property>

    <name>hadoop.proxyuser.root.hosts</name>

    <value>*</value>

  </property>

  <property>

    <name>hadoop.proxyuser.root.groups</name>

    <value>*</value>

  </property>

</configuration>

Note: do not put the annotations (the ### comments) into the actual file; adjust the highlighted parameters to match your environment.

4.2.4 Configure hdfs-site.xml

<?xml version="1.0" encoding="UTF-8"?>

<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!--

  Licensed under the Apache License, Version 2.0 (the "License");

  you may not use this file except in compliance with the License.

  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software

  distributed under the License is distributed on an "AS IS" BASIS,

  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.

  See the License for the specific language governing permissions and

  limitations under the License. See accompanying LICENSE file.

-->

<!-- Put site-specific property overrides in this file. -->

<configuration>

<property>

<name>dfs.nameservices</name>

<value>beh</value> ### the nameservice; must match fs.defaultFS in core-site.xml

</property>

<property>

<name>dfs.ha.namenodes.beh</name>

<value>hadoop001,hadoop002</value> ### the NameNode IDs (here the master hostnames)

</property>

<property>

<name>dfs.namenode.rpc-address.beh.hadoop001</name>

<value>hadoop001:9000</value>

</property>

<property>

<name>dfs.namenode.http-address.beh.hadoop001</name>

<value>hadoop001:50070</value>

</property>

<property>

<name>dfs.namenode.rpc-address.beh.hadoop002</name>

<value>hadoop002:9000</value>

</property>

<property>

<name>dfs.namenode.http-address.beh.hadoop002</name>

<value>hadoop002:50070</value>

</property>

<property>

<name>dfs.namenode.shared.edits.dir</name> ### the JournalNode quorum; keep the node list consistent with the cluster

<value>qjournal://hadoop001:8485;hadoop002:8485;hadoop003:8485/beh</value>

</property>

<property>

<name>dfs.journalnode.edits.dir</name>

<value>/opt/data/hadoop/journal</value>

</property>

<property>

<name>dfs.ha.automatic-failover.enabled</name>

<value>true</value>

</property>

<property>

<name>dfs.client.failover.proxy.provider.beh</name>

<value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>

</property>

<property>

<name>dfs.ha.fencing.methods</name>

<value>sshfence</value>

</property>

<property>

<name>dfs.ha.fencing.ssh.private-key-files</name>

<value>/home/hadoop/.ssh/id_rsa</value> ### the private key used for passwordless login (usually the default key path)

</property>

<property>

<name>dfs.namenode.name.dir</name>

<value>file:/opt/data/hadoop/dfs/name</value>

</property>

<property>

<name>dfs.datanode.data.dir</name>

<value>file:/opt/data/hadoop/dfs/data</value>

</property>

<property>

<name>dfs.replication</name>

<value>3</value>

</property>

<property>

<name>dfs.webhdfs.enabled</name>

<value>true</value>

</property>

<property>

<name>dfs.journalnode.http-address</name>

<value>0.0.0.0:8480</value>

</property>

<property>

<name>dfs.journalnode.rpc-address</name>

<value>0.0.0.0:8485</value>

</property>

<property>

<name>ha.zookeeper.quorum</name>

<value>hadoop001:2181,hadoop002:2181,hadoop003:2181</value>

</property>

</configuration>

Note: do not put the annotations (the ### comments) into the actual file; adjust the highlighted parameters to match your environment.

4.2.5 Configure mapred-site.xml (create it from mapred-site.xml.template if it does not exist)

<?xml version="1.0"?>

<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!--Licensed under the Apache License, Version 2.0 (the "License");

  you may not use this file except in compliance with the License.

  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software

  distributed under the License is distributed on an "AS IS" BASIS,

  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.See the License for the specific language governing permissions and

  limitations under the License. See accompanying LICENSE file.-->

<!-- Put site-specific property overrides in this file. -->

<configuration>

  <property>

    <name>mapreduce.framework.name</name>

    <value>yarn</value>

  </property>

  <property>

    <name>mapreduce.jobhistory.address</name>

    <value>0.0.0.0:10020</value>

  </property>

  <property>

    <name>mapreduce.jobhistory.webapp.address</name>

    <value>0.0.0.0:19888</value>

  </property>

</configuration>

 

4.2.6 Configure yarn-site.xml

<?xml version="1.0"?>

<!--

  Licensed under the Apache License, Version 2.0 (the "License");

  you may not use this file except in compliance with the License.

  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software

  distributed under the License is distributed on an "AS IS" BASIS,

  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.

  See the License for the specific language governing permissions and

  limitations under the License. See accompanying LICENSE file.

-->

<configuration>

  <property>

    <name>yarn.resourcemanager.connect.retry-interval.ms</name>

    <value>2000</value>

  </property>

  <property>

    <name>yarn.resourcemanager.ha.enabled</name>

    <value>true</value>

  </property>

  <property>

    <name>yarn.resourcemanager.ha.rm-ids</name>

    <value>rm1,rm2</value> ## the two ResourceManager IDs

  </property>

  <property>

    <name>ha.zookeeper.quorum</name>

    <value>hadoop001:2181,hadoop002:2181,hadoop003:2181</value>

  </property>

  <property>

    <name>yarn.resourcemanager.ha.automatic-failover.enabled</name>

    <value>true</value>

  </property>

  <property>

    <name>yarn.resourcemanager.hostname.rm1</name>

    <value>hadoop001</value>

  </property>

  <property>

    <name>yarn.resourcemanager.hostname.rm2</name>

    <value>hadoop002</value>

  </property>

  <property>

    <name>yarn.resourcemanager.ha.id</name>

    <value>rm1</value> ## the ResourceManager ID of this machine; change it to rm2 on the standby (hadoop002)

    <description>If we want to launch more than one RM in single node, we need this configuration</description>

  </property>

  <property>

    <name>yarn.resourcemanager.recovery.enabled</name>

    <value>true</value>

  </property>

  <property>

    <name>yarn.resourcemanager.zk-state-store.address</name>

    <value>hadoop001:2181,hadoop002:2181,hadoop003:2181</value>

  </property>

  <property>

    <name>yarn.resourcemanager.store.class</name>

<value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>

  </property>

  <property>

    <name>yarn.resourcemanager.zk-address</name>

    <value>hadoop001:2181,hadoop002:2181,hadoop003:2181</value>

  </property>

  <property>

    <name>yarn.resourcemanager.cluster-id</name>

    <value>beh-yarn</value> ## keep consistent with the nameservice chosen earlier

  </property>

  <property>

    <name>yarn.app.mapreduce.am.scheduler.connection.wait.interval-ms</name>

    <value>5000</value>

  </property>

  <property>

    <name>yarn.resourcemanager.address.rm1</name>

    <value>hadoop001:8132</value>

  </property>

  <property>

    <name>yarn.resourcemanager.scheduler.address.rm1</name>

    <value>hadoop001:8130</value>

  </property>

  <property>

    <name>yarn.resourcemanager.webapp.address.rm1</name>

    <value>hadoop001:23188</value>

  </property>

  <property>

    <name>yarn.resourcemanager.resource-tracker.address.rm1</name>

    <value>hadoop001:8131</value>

  </property>

  <property>

    <name>yarn.resourcemanager.admin.address.rm1</name>

    <value>hadoop001:8033</value>

  </property>

  <property>

    <name>yarn.resourcemanager.ha.admin.address.rm1</name>

    <value>hadoop001:23142</value>

  </property>

  <property>

    <name>yarn.resourcemanager.address.rm2</name>

    <value>hadoop002:8132</value>

  </property>

  <property>

    <name>yarn.resourcemanager.scheduler.address.rm2</name>

    <value>hadoop002:8130</value>

  </property>

  <property>

    <name>yarn.resourcemanager.webapp.address.rm2</name>

    <value>hadoop002:23188</value>

  </property>

  <property>

    <name>yarn.resourcemanager.resource-tracker.address.rm2</name>

    <value>hadoop002:8131</value>

  </property>

  <property>

    <name>yarn.resourcemanager.admin.address.rm2</name>

    <value>hadoop002:8033</value>

  </property>

  <property>

    <name>yarn.resourcemanager.ha.admin.address.rm2</name>

    <value>hadoop002:23142</value>

  </property>

  <property>

    <name>yarn.nodemanager.aux-services</name>

    <value>mapreduce_shuffle</value>

  </property>

  <property>

    <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>

    <value>org.apache.hadoop.mapred.ShuffleHandler</value>

  </property>

  <property>

    <name>yarn.nodemanager.local-dirs</name>

    <value>/opt/data/hadoop/yarn</value>

  </property>

  <property>

    <name>yarn.nodemanager.log-dirs</name>

    <value>/opt/data/hadoop/logs</value>

  </property>

  <property>

    <name>mapreduce.shuffle.port</name>

    <value>23080</value>

  </property>

  <property>

    <name>yarn.client.failover-proxy-provider</name> <value>org.apache.hadoop.yarn.client.ConfiguredRMFailoverProxyProvider</value>

  </property>

  <property>

    <name>yarn.resourcemanager.ha.automatic-failover.zk-base-path</name>

    <value>/yarn-leader-election</value>

    <description>Optional setting. The default value is /yarn-leader-election</description>

  </property>

</configuration>

Note: do not put the annotations (the ## comments) into the actual file; adjust the highlighted parameters as needed, and create the referenced directories yourself.

4.3 Distribute Hadoop to the other machines

Distribute the installation directory with:

scp -r hadoop-2.7.6 hadoop002:/opt/soft/
scp -r hadoop-2.7.6 hadoop003:/opt/soft/

Remember to change yarn.resourcemanager.ha.id to rm2 on hadoop002, as noted above (see the sketch below).
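A sketch of that per-node adjustment (assuming yarn-site.xml is laid out as above; the exact string "<value>rm1</value>" only occurs for the ha.id property, but verify after editing):

# on hadoop002, after the hadoop-2.7.6 directory has been copied over
cd /opt/soft/hadoop-2.7.6/etc/hadoop
sed -i 's|<value>rm1</value>|<value>rm2</value>|' yarn-site.xml
grep -A 2 'yarn.resourcemanager.ha.id' yarn-site.xml   # should now show <value>rm2</value>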

4.4 Hadoop environment variables

On every server, as the hadoop user, run:

cd /home/hadoop and edit the profile: vim .bash_profile

Add the following lines:

export HADOOP_HOME=/opt/soft/hadoop-2.7.6/
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin

Save and exit, then run source .bash_profile to apply the changes.

4.5 Start and test the cluster

4.5.1 Start the ZooKeeper ensemble

On hadoop001, hadoop002 and hadoop003 run:

zkServer.sh start

Then check the status: zkServer.sh status

(one leader and two followers means ZooKeeper started correctly)

4.5.2 Format the HDFS HA state in ZooKeeper

On hadoop001 (this only needs to be run on one node): hdfs zkfc -formatZK

4.5.3 Start the JournalNode cluster

On every JournalNode host run:

hadoop-daemon.sh start journalnode

4.5.4 Format and start the first NameNode

On hadoop001:

## format this node's NameNode metadata
hdfs namenode -format

## initialize the shared edits on the JournalNodes (required for HA)
hdfs namenode -initializeSharedEdits

## start the NameNode service on this node
hadoop-daemon.sh start namenode

4.5.5 Format and start the second NameNode

On hadoop002:

## hadoop001 has already been formatted; copy its metadata over to hadoop002
hdfs namenode -bootstrapStandby

## start the NameNode service on this node
hadoop-daemon.sh start namenode

4.5.6 Start all DataNodes

# on every DataNode run: hadoop-daemon.sh start datanode

4.5.7 Start the ZKFailoverControllers

On every NameNode host run:

hadoop-daemon.sh start zkfc

4.5.8 Check the NameNode status in the web UI

Open http://hadoop001:50070 and http://hadoop002:50070

One NameNode should be active and the other standby.

If you are browsing from a PC that cannot resolve the hostnames, use http://IP_ADDRESS:50070 instead.

4.5.9 Start YARN

On hadoop001 run:

start-yarn.sh

4.5.10 Start the ResourceManager on hadoop002

yarn-daemon.sh start resourcemanager

4.5.11 Check the ResourceManager status in the web UI

Open http://hadoop001:23188 and http://hadoop002:23188

One ResourceManager is active and the other standby. The active node serves the page normally; the standby automatically redirects to the active node's web address.

http://resourcemanager_ipaddress:23188

If you are browsing from a PC that cannot resolve the hostnames, use http://IP_ADDRESS:23188 instead.

4.5.12 Test the cluster

Verify that the cluster is usable and that HA failover works (for example, kill the active NameNode/ResourceManager and check that the standby takes over).
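A minimal failover test (a sketch; the NameNode IDs hadoop001/hadoop002 and the ResourceManager IDs rm1/rm2 come from the configuration above):

# check which NameNode is active
hdfs haadmin -getServiceState hadoop001
hdfs haadmin -getServiceState hadoop002
# kill the active NameNode process on that host (find its pid with jps), then re-check:
# the other NameNode should become active

# same idea for YARN
yarn rmadmin -getServiceState rm1
yarn rmadmin -getServiceState rm2

# run a small job to confirm the cluster works end to end
hdfs dfs -mkdir -p /tmp/wc-in
hdfs dfs -put /etc/hosts /tmp/wc-in/
hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.6.jar wordcount /tmp/wc-in /tmp/wc-out
hdfs dfs -cat /tmp/wc-out/part-r-00000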

5. Hive setup

5.1 Extract the Hive package and set the environment variables

Log in to hadoop001 and move the Hive package into /opt/soft.

Extract it: tar -zxvf apache-hive-2.2.0-bin.tar.gz

(The archive extracts to apache-hive-2.2.0-bin; rename or symlink it to /opt/soft/hive-2.2.0 so that the paths below match.)

On each server, as the hadoop user, run:

cd /home/hadoop and edit the profile: vim .bash_profile

Add the following lines:

export HIVE_HOME=/opt/soft/hive-2.2.0
export HIVE_CONF_DIR=$HIVE_HOME/conf
export CLASSPATH=$CLASSPATH:$HIVE_HOME/lib
export PATH=$PATH:$HIVE_HOME/bin

5.2 Install MySQL

Configure MySQL (note: switch to the root user).

Remove the MySQL libraries bundled with CentOS:

rpm -qa | grep mysql
rpm -e mysql-libs-5.1.66-2.el6_3.i686 --nodeps

yum -y install mysql-server

Initialize MySQL:

(1) Set the MySQL root password (run as root):

cd /usr/bin
./mysql_secure_installation

(2) Enter the current MySQL root password; initially root has no password, so just press Enter:

Enter current password for root (enter for none):

(3) Set the password for the MySQL root user (it must match javax.jdo.option.ConnectionPassword in the Hive configuration below):

Set root password? [Y/n] Y
New password:
Re-enter new password:
Password updated successfully!
Reloading privilege tables..
 ... Success!

(4) Remove anonymous users:

Remove anonymous users? [Y/n] Y
 ... Success!

(5) Whether to disallow remote root login; choose N:

Disallow root login remotely? [Y/n] N
 ... Success!

(6) Remove the test database:

Remove test database and access to it? [Y/n] Y
 Dropping test database...
 ... Success!
 Removing privileges on test database...
 ... Success!

(7) Reload the privilege tables:

Reload privilege tables now? [Y/n] Y
 ... Success!

(8) Done:

All done! If you've completed all of the above steps, your MySQL
installation should now be secure.

Thanks for using MySQL!

(9) Log in to MySQL and grant remote access to root:

mysql -uroot -p
GRANT ALL PRIVILEGES ON *.* TO 'root'@'%' IDENTIFIED BY '123' WITH GRANT OPTION;   (replace '123' with the root password actually used by Hive)
FLUSH PRIVILEGES;
exit;

At this point the MySQL configuration is complete.

 

 

 

 

5.3 Configure Hive

5.3.1 Edit the hive-env.sh file

Copy hive-env.sh.template to hive-env.sh and edit hive-env.sh:

JAVA_HOME=/opt/soft/jdk1.8.0_171
HADOOP_HOME=/opt/soft/hadoop-2.7.6
HIVE_HOME=/opt/soft/hive-2.2.0
export HIVE_CONF_DIR=$HIVE_HOME/conf
export HIVE_AUX_JARS_PATH=$SPARK_HOME/lib/spark-assembly-1.6.0-hadoop2.6.0.jar
export CLASSPATH=$CLASSPATH:$JAVA_HOME/lib:$HADOOP_HOME/lib:$HIVE_HOME/lib
export HADOOP_OPTS="-Dorg.xerial.snappy.tempdir=/tmp -Dorg.xerial.snappy.lib.name=libsnappyjava.jnilib $HADOOP_OPTS"

5.3.2 Edit the hive-site.xml file

Create hive-site.xml by copying hive-default.xml.template, then edit hive-site.xml: delete everything inside the <configuration> element so that only an empty <configuration></configuration> remains, and add the properties below.

Reference for the HiveServer2 settings used here:

hive.server2.thrift.port                      TCP listening port, default 10000
hive.server2.thrift.bind.host                 host to bind the TCP service to, default localhost
hive.server2.thrift.min.worker.threads        minimum number of worker threads, default 5
hive.server2.thrift.max.worker.threads        maximum number of worker threads, default 500
hive.server2.transport.mode                   default binary (TCP); can be set to http
hive.server2.thrift.http.port                 HTTP listening port, default 10001
hive.server2.thrift.http.path                 HTTP endpoint name, default cliservice
hive.server2.thrift.http.min.worker.threads   minimum worker threads in the HTTP service pool, default 5
hive.server2.thrift.http.max.worker.threads   maximum worker threads in the HTTP service pool, default 500

The hive-site.xml file:

<?xml version="1.0" encoding="UTF-8" standalone="no"?>

<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<configuration>

  <property>

    <name>javax.jdo.option.ConnectionURL</name>

    <value>jdbc:mysql://hadoop003:3306/hive?createDatabaseIfNotExist=true</value>

    <description>JDBC connect string for a JDBC metastore</description>

  </property>

  <property>

    <name>javax.jdo.option.ConnectionDriverName</name>

    <value>com.mysql.jdbc.Driver</value>

    <description>Driver class name for a JDBC metastore</description>

  </property>

  <property>

    <name>javax.jdo.option.ConnectionUserName</name>

    <value>root</value>

    <description>username to use against metastore database</description>

  </property>

  <property>

    <name>javax.jdo.option.ConnectionPassword</name>

    <value>root</value>

    <description>password to use against metastore database</description>

  </property>

  <property>

    <name>datanucleus.autoCreateSchema</name>

    <value>true</value>

  </property>

  <property>

    <name>datanucleus.autoCreateTables</name>

    <value>true</value>

  </property>

  <property>

    <name>datanucleus.autoCreateColumns</name>

    <value>true</value>

  </property>

  <!-- Location of the Hive warehouse on HDFS -->

  <property>

    <name>hive.metastore.warehouse.dir</name>

    <value>/hive</value>

    <description>location of default database for the warehouse</description>

  </property>

  <!-- Temporary location for downloaded resources -->

  <property>

    <name>hive.downloaded.resources.dir</name>

    <value>/opt/data/hive/tmp/resources</value>

    <description>Temporary local directory for added resources in the remote file system.</description>

  </property>

  <!-- Before Hive 0.9, hive.exec.dynamic.partition had to be set to true explicitly; since 0.9 it defaults to true -->

  <property>

    <name>hive.exec.dynamic.partition</name>

    <value>true</value>

  </property>

  <property>

    <name>hive.exec.dynamic.partition.mode</name>

    <value>nonstrict</value>

  </property>

  <!-- Change the log locations -->

  <property>

    <name>hive.exec.local.scratchdir</name>

    <value>/opt/data/hive/tmp/HiveJobsLog</value>

    <description>Local scratch space for Hive jobs</description>

  </property>

  <property>

    <name>hive.downloaded.resources.dir</name>

    <value>/opt/data/hive/tmp/ResourcesLog</value>

    <description>Temporary local directory for added resources in the remote file system.</description>

  </property>

  <property>

    <name>hive.querylog.location</name>

    <value>/opt/data/hive/tmp/HiveRunLog</value>

    <description>Location of Hive run time structured log file</description>

  </property>

  <property>

    <name>hive.server2.logging.operation.log.location</name>

    <value>/opt/data/hive/tmp/OpertitionLog</value>

    <description>Top level directory where operation tmp are stored if logging functionality is enabled</description>

  </property>

  <!-- Hive Web Interface (HWI) settings -->

  <property>

    <name>hive.hwi.war.file</name>

    <value>/opt/soft/hive-2.2.0/lib/hive-hwi-2.2.0.jar</value>

    <description>This sets the path to the HWI war file, relative to ${HIVE_HOME}.</description>

  </property>

  <property>

    <name>hive.hwi.listen.host</name>

    <value>hadoop003</value>

    <description>This is the host address the Hive Web Interface will listen on</description>

  </property>

  <property>

    <name>hive.hwi.listen.port</name>

    <value>9999</value>

    <description>This is the port the Hive Web Interface will listen on</description>

  </property>

  <property>

    <name>hive.server2.thrift.bind.host</name>

    <value>hadoop003</value>

  </property>

  <property>

    <name>hive.server2.thrift.port</name>

    <value>10000</value>

  </property>

  <property>

    <name>hive.server2.thrift.http.port</name>

    <value>10001</value>

  </property>

  <property>

    <name>hive.server2.thrift.http.path</name>

    <value>cliservice</value>

  </property>

  <!-- HiveServer2 web UI -->

  <property>

    <name>hive.server2.webui.host</name>

    <value>hadoop003</value>

  </property>

  <property>

    <name>hive.server2.webui.port</name>

    <value>10002</value>

  </property>

  <property>

    <name>hive.scratch.dir.permission</name>

    <value>755</value>

  </property>

  <property>

    <name>hive.server2.enable.doAs</name>

    <value>false</value>

  </property>

  <property>

    <name>hive.auto.convert.join</name>

    <value>false</value>

  </property>

  <property>

    <name>spark.dynamicAllocation.enabled</name>

    <value>true</value>

    <description>Enable dynamic resource allocation</description>

  </property>

  <!-- When using Hive on Spark, omitting the following setting can cause out-of-memory errors -->

  <property>

    <name>spark.driver.extraJavaOptions</name>

    <value>-XX:PermSize=128M -XX:MaxPermSize=512M</value>

  </property>

</configuration>

5.4 Configure the hive-config.sh file

Edit the $HIVE_HOME/conf/hive-config.sh file:

## add the following three lines
export JAVA_HOME=/opt/soft/jdk1.8.0_171
export HIVE_HOME=/opt/soft/hive-2.2.0
export HADOOP_HOME=/opt/soft/hadoop-2.7.6

## and change the following line
HIVE_CONF_DIR=$HIVE_HOME/conf

5.5 Copy the JDBC driver

Put the MySQL JDBC driver jar into $HIVE_HOME/lib:

cp /home/hadoop/mysql-connector-java-5.1.6-bin.jar /opt/soft/hive-2.2.0/lib/

5.6 Copy the jline jar

Copy jline-2.12.jar from $HIVE_HOME/lib into $HADOOP_HOME/share/hadoop/yarn/lib, and delete the older jline jar already present in $HADOOP_HOME/share/hadoop/yarn/lib.
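A sketch of those two steps (the old jar's exact version may differ, so list it first and verify the name before deleting):

ls $HADOOP_HOME/share/hadoop/yarn/lib/jline-*.jar       # see which jline version Hadoop ships
cp $HIVE_HOME/lib/jline-2.12.jar $HADOOP_HOME/share/hadoop/yarn/lib/
rm $HADOOP_HOME/share/hadoop/yarn/lib/jline-0.9.94.jar  # the old version bundled with Hadoop 2.7 (verify the name first)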

5.7 Copy tools.jar

Copy tools.jar from $JAVA_HOME/lib into $HIVE_HOME/lib:

cp $JAVA_HOME/lib/tools.jar ${HIVE_HOME}/lib

5.8 Initialize the Hive metastore schema

Use either MySQL or Derby as the metastore database.

Note: first check whether MySQL contains leftover Hive metadata; if so, delete it before initializing.

schematool -dbType mysql -initSchema   ## MySQL as the metastore database

Here "mysql" means MySQL is used to store the Hive metadata; to use Derby instead, run:

schematool -dbType derby -initSchema   ## Derby as the metastore database

The hive-schema-*.mysql.sql script creates the initial tables in the configured metastore database.

5.9 Start the metastore service

The metastore service must be started before running Hive, otherwise Hive will fail with an error:

./hive --service metastore

Then open another terminal window and start the Hive process there.
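To keep the services running after the terminal closes, a common pattern (a sketch; the log file locations are arbitrary choices) is to start them with nohup:

nohup hive --service metastore   > ~/metastore.log   2>&1 &
nohup hive --service hiveserver2 > ~/hiveserver2.log 2>&1 &   # optional: needed for JDBC/beeline access
jps   # RunJar processes should appear for the Hive services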

 

 

 

5.10 Test

Start the Hive CLI with hive, then run a few statements:

show databases;
show tables;
create table book (id bigint, name string) row format delimited fields terminated by '\t';
select * from book;
select count(*) from book;
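If HiveServer2 was started, the same statements can also be run remotely through beeline (a sketch; the host and port come from the hive-site.xml above, and the -n user is an assumption):

beeline -u jdbc:hive2://hadoop003:10000 -n hadoop
0: jdbc:hive2://hadoop003:10000> show databases;
0: jdbc:hive2://hadoop003:10000> select count(*) from book;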

Reposted from: https://blog.csdn.net/wyfsxs/article/details/80784733
