
1. Configure the network environment (host)
1. Check the IP address with the ip addr show or ifconfig command.
2. Change the hostname: vi /etc/sysconfig/network
In that file set:
NETWORKING=yes
HOSTNAME=hdpvm1   # change this to whatever name you want, e.g. master
3. Run hostname to check whether the change took effect; if it has not, reboot the machine.
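For the nodes to reach one another by name, each machine's /etc/hosts usually also needs the cluster IPs mapped to the hostnames. A minimal sketch, assuming the hostnames used later in this article and example IPs (replace them with the addresses shown by ip addr show):

# append to /etc/hosts on every node (example IPs)
192.168.1.101 Hadoop-Master
192.168.1.102 Hadoop-Slave1
192.168.1.103 Hadoop-Slave2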
2. Install the JDK
1. Extract the JDK archive:
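A minimal sketch of this step, assuming the tarball is named jdk-8u151-linux-x64.tar.gz (matching the jdk1.8.0_151 path used below) and the target directory is /home/ubuntu/java:

mkdir -p /home/ubuntu/java
tar -zxvf jdk-8u151-linux-x64.tar.gz -C /home/ubuntu/java   # produces /home/ubuntu/java/jdk1.8.0_151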
2. Configure the Java environment variables:
vim ~/.bashrc
# Add at the end of the file:
export JAVA_HOME=/home/ubuntu/java/jdk1.8.0_151
export PATH=$JAVA_HOME/bin:$PATH

source ~/.bashrc   # reload the configuration
java -version      # verify by checking the Java version

A problem you may run into:
When installing JDK 1.8 on Linux, at the very last step, checking whether the JDK configuration had taken effect, I suddenly hit a problem:
Error: bash: /usr/java/jdk1.8/bin/java: Permission denied
Fix:
Connect to the Linux host (e.g. with Xshell) and run:
chmod +x /usr/java/jdk1.8/bin/java
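If other executables under the same bin directory report the same error, the same fix can be applied to the whole directory:

chmod +x /usr/java/jdk1.8/bin/*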
3. Disable the firewall
# check the firewall status
service iptables status
# stop the firewall
service iptables stop
# check whether the firewall starts on boot
chkconfig iptables --list
# disable the firewall on boot
chkconfig iptables off
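The service and chkconfig commands above are for CentOS 6-style init. On a systemd-based system (CentOS 7 and later) the equivalent firewalld commands would be:

# check the firewall status
systemctl status firewalld
# stop the firewall
systemctl stop firewalld
# disable the firewall on boot
systemctl disable firewalld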
4. Passwordless SSH configuration
1. First, configure the Master machine.
1-1. Go into the .ssh directory: [root@Hadoop-Master ~]# cd ~/.ssh. If the directory does not exist, run ssh localhost once first; do not create it by hand, otherwise you may still be asked for a password after everything is configured.
1-2. Generate the key pair with ssh-keygen: ssh-keygen -t rsa, then just keep pressing Enter; this produces the two files id_rsa and id_rsa.pub.
1-3. Create the authorized_keys file: [root@Hadoop-Master .ssh]# cat id_rsa.pub >> authorized_keys
1-4. Generate public/private key pairs on the other two machines, Slave1 and Slave2, in the same way.
1-5. Copy Slave1's id_rsa.pub to the Master machine: [root@Hadoop-Slave1 .ssh]# scp id_rsa.pub root@Hadoop-Master:~/.ssh/id_rsa.pub_s1
1-6. Copy Slave2's id_rsa.pub to the Master machine: [root@Hadoop-Slave2 .ssh]# scp id_rsa.pub root@Hadoop-Master:~/.ssh/id_rsa.pub_s2
1-7. Now switch to the Master machine and merge them into authorized_keys:
[root@Hadoop-Master .ssh]# cat id_rsa.pub_s1>> authorized_keys
[root@Hadoop-Master .ssh]# cat id_rsa.pub_s2>> authorized_keys
1-8. Copy authorized_keys to the Slave1 and Slave2 machines:
[root@Hadoop-Master .ssh]# scp authorized_keys root@Hadoop-Slave1:~/.ssh/
[root@Hadoop-Master .ssh]# scp authorized_keys root@Hadoop-Slave2:~/.ssh/
1-9. On every machine, set the permissions of the .ssh/ directory to 700 and of the authorized_keys file to 600 (or 644):
chmod 700 ~/.ssh
chmod 600 ~/.ssh/authorized_keys
1-10. Verify SSH:
[root@Hadoop-Master .ssh]# ssh Hadoop-Slave1
Welcome to aliyun Elastic Compute Service!
[root@Hadoop-Slave1 ~]# exit
logout
Connection to Hadoop-Slave1 closed.
[root@Hadoop-Master .ssh]# ssh Hadoop-Slave2
Welcome to aliyun Elastic Compute Service!
[root@Hadoop-Slave2 ~]# exit
logout
Connection to Hadoop-Slave2 closed.
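As a side note, if ssh-copy-id is available, the master's public key can be pushed to each slave in one step instead of merging and copying authorized_keys by hand (this covers the master-to-slave logins verified above; repeat in the other direction if slave-to-master logins are also needed):

[root@Hadoop-Master ~]# ssh-copy-id root@Hadoop-Slave1   # appends the master's public key to the slave's authorized_keys
[root@Hadoop-Master ~]# ssh-copy-id root@Hadoop-Slave2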
5. Deploy Hadoop
5.1 Install Hadoop
5.1.1 Download
(1) Download it from the Hadoop website; I downloaded hadoop-3.0.0.tar.gz.
(2) As with the JDK, create a hadoop folder in the home directory (/home/ubuntu/): mkdir hadoop, then extract the archive: tar -zxvf hadoop-3.0.0.tar.gz
5.1.2 Configure environment variables
Run the following command:
vim ~/.bashrc
# Add at the end of the file:
export HADOOP_HOME=/home/ubuntu/hadoop/hadoop-3.0.0
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin

source ~/.bashrc   # reload the configuration
5.1.3 Create the data directories
mkdir /home/ubuntu/hadoop/tmp
mkdir /home/ubuntu/hadoop/dfs
mkdir /home/ubuntu/hadoop/dfs/data
mkdir /home/ubuntu/hadoop/dfs/name
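The same directories can also be created in one command with mkdir -p, which creates any missing parent directories:

mkdir -p /home/ubuntu/hadoop/tmp /home/ubuntu/hadoop/dfs/data /home/ubuntu/hadoop/dfs/name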
5.2 Configure Hadoop
Go to the hadoop-3.0.0 configuration directory: cd /home/ubuntu/hadoop/hadoop-3.0.0/etc/hadoop, and edit hadoop-env.sh, core-site.xml, hdfs-site.xml, mapred-site.xml, yarn-site.xml and the workers file in turn.
5.2.1 Configure hadoop-env.sh
vim hadoop-env.sh
# Find JAVA_HOME in hadoop-env.sh and set it to your installation path:
export JAVA_HOME=/home/ubuntu/java/jdk1.8.0_151
5.2.2 Configure core-site.xml (just adapt it to your own nodes)
vim core-site.xml
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://mpi-1:9000</value>
    <description>HDFS URI: filesystem://namenode-host:port</description>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/ubuntu/hadoop/tmp</value>
    <description>Local Hadoop temporary directory on the namenode</description>
  </property>
  <property>
    <name>io.file.buffer.size</name>
    <value>131072</value>
    <description>Size of read/write buffer used in SequenceFiles</description>
  </property>
</configuration>
5.2.3 Configure hdfs-site.xml
vim hdfs-site.xml
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
    <description>Replication factor: how many copies of each block are kept in the cluster; a higher value gives better redundancy but uses more storage</description>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:///home/ubuntu/hadoop/dfs/name</value>
    <description>Where the namenode stores the HDFS namespace metadata</description>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:///home/ubuntu/hadoop/dfs/data</value>
    <description>Physical location of the data blocks on the datanodes</description>
  </property>
  <property>
    <name>dfs.namenode.secondary.http-address</name>
    <value>mpi-1:50090</value>
  </property>
  <property>
    <name>dfs.webhdfs.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>dfs.permissions</name>
    <value>false</value>
    <description>With dfs.permissions set to false, files can be created on DFS without permission checks. Convenient, but you must guard against accidental deletion; set it to true, or simply remove this property (true is the default)</description>
  </property>
</configuration>
5.2.4 Configure mapred-site.xml
vim mapred-site.xml
Note: earlier versions required cp mapred-site.xml.template mapred-site.xml; hadoop-3.0.0 ships mapred-site.xml directly.
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
    <description>The runtime framework for executing MapReduce jobs. Can be one of local, classic or yarn.</description>
    <final>true</final>
  </property>
  <property>
    <name>mapreduce.jobtracker.http.address</name>
    <value>mpi-1:50030</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.address</name>
    <value>mpi-1:10020</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>mpi-1:19888</value>
  </property>
  <property>
    <name>mapred.job.tracker</name>
    <value>http://mpi-1:9001</value>
  </property>
</configuration>
5.2.5 Configure yarn-site.xml
vim yarn-site.xml
<configuration>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>mpi-1</value>
    <description>The hostname of the RM.</description>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.resourcemanager.address</name>
    <value>mpi-1:8032</value>
    <description>${yarn.resourcemanager.hostname}:8032</description>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>mpi-1:8030</value>
  </property>
  <property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>mpi-1:8031</value>
  </property>
  <property>
    <name>yarn.resourcemanager.admin.address</name>
    <value>mpi-1:8033</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.address</name>
    <value>mpi-1:8088</value>
  </property>
</configuration>
5.2.6 Configure the workers file (earlier versions called it slaves; check which one you have)
vim workers
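The listing of the workers file is not reproduced here; it simply contains the hostnames of the DataNode machines, one per line. A sketch, assuming the slave hostnames used in the SSH section (use whatever hostnames your slaves actually have):

Hadoop-Slave1
Hadoop-Slave2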
Source: https://blog.csdn.net/secyb/article/details/80170804
5.3 Run the Hadoop cluster
5.3.1 Format the namenode
hdfs namenode -format
# The first time you use HDFS you must format it (only needs to be done once)
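After formatting, the daemons are started with the scripts under $HADOOP_HOME/sbin (already on the PATH from section 5.1.2); a typical sequence on the master is:

start-dfs.sh    # starts the NameNode, DataNodes and the SecondaryNameNode
start-yarn.sh   # starts the ResourceManager and NodeManagers
jps             # lists the running Java daemons on the current node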
6. Troubleshooting: if the daemons do not come up, you will get errors like this:
Starting namenodes on [mpi-1]
ERROR: Attempting to operate on hdfs namenode as root
ERROR: but there is no HDFS_NAMENODE_USER defined. Aborting operation.
Starting datanodes
ERROR: Attempting to operate on hdfs datanode as root
ERROR: but there is no HDFS_DATANODE_USER defined. Aborting operation.
Starting secondary namenodes [mpi-1]
ERROR: Attempting to operate on hdfs secondarynamenode as root
ERROR: but there is no HDFS_SECONDARYNAMENODE_USER defined. Aborting operation.
2019-03-14 11:40:27,925 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
[root@mpi-1 hadoop]# start-yarn.sh
Starting resourcemanager
ERROR: Attempting to operate on yarn resourcemanager as root
ERROR: but there is no YARN_RESOURCEMANAGER_USER defined. Aborting operation.
Starting nodemanagers
ERROR: Attempting to operate on yarn nodemanager as root
ERROR: but there is no YARN_NODEMANAGER_USER defined. Aborting operation.

Solution:
Notes first:
1. Both master and slaves need the four files start-dfs.sh, stop-dfs.sh, start-yarn.sh and stop-yarn.sh modified.
2. If your Hadoop is started by a different (non-root) user, remember to replace root below with that user.

After formatting HDFS, starting dfs produced the following error:
[root@master sbin]# ./start-dfs.sh
Starting namenodes on [master]
ERROR: Attempting to operate on hdfs namenode as root
ERROR: but there is no HDFS_NAMENODE_USER defined. Aborting operation.
Starting datanodes
ERROR: Attempting to operate on hdfs datanode as root
ERROR: but there is no HDFS_DATANODE_USER defined. Aborting operation.
Starting secondary namenodes [slave1]
ERROR: Attempting to operate on hdfs secondarynamenode as root
ERROR: but there is no HDFS_SECONDARYNAMENODE_USER defined. Aborting operation.

Searching online, I found another blog with the same FAQ, so I followed it and record the fix here as well.
Reference: https://blog.csdn.net/u013725455/article/details/70147331

In the /hadoop/sbin directory:
Add the following parameters at the top of both start-dfs.sh and stop-dfs.sh:
#!/usr/bin/env bash
HDFS_DATANODE_USER=root
HADOOP_SECURE_DN_USER=hdfs
HDFS_NAMENODE_USER=root
HDFS_SECONDARYNAMENODE_USER=root

Likewise, add the following at the top of start-yarn.sh and stop-yarn.sh:
#!/usr/bin/env bash
YARN_RESOURCEMANAGER_USER=root
HADOOP_SECURE_DN_USER=yarn
YARN_NODEMANAGER_USER=root

(These lines go right below the shebang, above the existing Apache license header in the scripts.) After making the changes, rerun ./start-dfs.sh and it now starts successfully:
[root@master sbin]# ./start-dfs.sh
WARNING: HADOOP_SECURE_DN_USER has been replaced by HDFS_DATANODE_SECURE_USER. Using value of HADOOP_SECURE_DN_USER.
Starting namenodes on [master]
Last login: Sun Jun  3 03:01:37 CST 2018 from slave1 on pts/2
master: Warning: Permanently added 'master,192.168.43.161' (ECDSA) to the list of known hosts.
Starting datanodes
Last login: Sun Jun  3 04:09:05 CST 2018 on pts/1
Starting secondary namenodes [slave1]
Last login: Sun Jun  3 04:09:08 CST 2018 on pts/1
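As an alternative to editing the four sbin scripts, the same user variables can be defined once in etc/hadoop/hadoop-env.sh (or exported in the shell profile); a sketch assuming everything runs as root:

export HDFS_NAMENODE_USER=root
export HDFS_DATANODE_USER=root
export HDFS_SECONDARYNAMENODE_USER=root
export YARN_RESOURCEMANAGER_USER=root
export YARN_NODEMANAGER_USER=root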
7. Final check in the browser
Use port 9870 (note: it is 9870 now, not 50070).
Enter the master's IP:9870 in the browser and you should see the HDFS web UI.

To test YARN, enter the master's IP:8088 in the browser and you should see the YARN web UI.

Note: bind to 0.0.0.0 instead of mpi-1 (or the local loopback IP) if you want port 8088 on this machine to be reachable from outside. For example, change the following in yarn-site.xml:

<property>
  <name>yarn.resourcemanager.webapp.address</name>
  <value>mpi-1:8088</value>
</property>

to:

<property>
  <name>yarn.resourcemanager.webapp.address</name>
  <value>0.0.0.0:8088</value>
</property>

You can also refer directly to the default configuration files on the Hadoop website, e.g. for hdfs-site.xml, which document every parameter in detail. In addition, the hdfs dfs commands can be used, e.g. hdfs dfs -ls / to inspect the storage directories.
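For a quick smoke test of HDFS from the command line, the hdfs dfs commands mentioned above can be used, for example (the /user/test directory is just an example name):

hdfs dfs -mkdir -p /user/test        # create a directory in HDFS
hdfs dfs -put ~/.bashrc /user/test   # upload a local file
hdfs dfs -ls /user/test              # list the uploaded file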