1 Preparation
Machine planning
This walkthrough simulates a production deployment. If you are only testing and don't have this many machines, feel free to use fewer, plan your own layout, and run multiple components on one host. I deployed on Docker-based machines directly as the root user; you are advised to deploy as a non-root user and adapt the root-level Linux commands in this article accordingly.
name | ip | app |
---|---|---|
hadoop01 | 192.168.7.51 | active NameNode |
hadoop02 | 192.168.7.52 | standby NameNode |
hadoop03 | 192.168.7.53 | DataNode + JournalNode + ZooKeeper + NodeManager |
hadoop04 | 192.168.7.54 | DataNode + JournalNode + ZooKeeper + NodeManager |
hadoop05 | 192.168.7.55 | DataNode + JournalNode + ZooKeeper + NodeManager |
hadoop06 | 192.168.7.56 | ResourceManager |
hadoop07 | 192.168.7.57 | ResourceManager |
Configure the host mapping on all machines
vi /etc/hosts
192.168.7.51 hadoop01
192.168.7.52 hadoop02
192.168.7.53 hadoop03
192.168.7.54 hadoop04
192.168.7.55 hadoop05
192.168.7.56 hadoop06
192.168.7.57 hadoop07
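Optionally, a quick way to confirm the mapping works on each machine is a one-line ping loop:
for h in hadoop01 hadoop02 hadoop03 hadoop04 hadoop05 hadoop06 hadoop07; do ping -c 1 $h; done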
Passwordless SSH from hadoop01 to all machines (strictly speaking this can be skipped, but the cluster start scripts and the sshfence fencing configured later use SSH, so it is recommended)
Run the following on all machines. If you have run it before and the ~/.ssh directory already exists, do not run it again, or you may overwrite your existing keys.
ssh-keygen -t rsa
Press Enter through all the prompts; the key pair is generated under ~/.ssh.
On hadoop01, run:
cd ~/.ssh
cat id_rsa.pub >> authorized_keys
chmod 600 authorized_keys
Then append hadoop01's id_rsa.pub public key to authorized_keys on every other machine, which gives hadoop01 passwordless SSH access to all of them. ssh-copy-id does this for you; run it once per host, for example:
for h in hadoop02 hadoop03 hadoop04 hadoop05 hadoop06 hadoop07; do ssh-copy-id $h; done
Install the JDK (skip if already installed)
1 Download the JDK (e.g. jdk-8u171-linux-x64.tar.gz) to a directory of your choice (e.g. /usr/java; make sure its permissions are 755)
2 Extract it
tar -xvf jdk-8u171-linux-x64.tar.gz
3 Create a symlink
rm -rf /usr/bin/java
ln -s /usr/java/jdk1.8.0_171/bin/java /usr/bin/java
4 Configure the environment variables
echo 'export JAVA_HOME=/usr/java/jdk1.8.0_171'>> /etc/profile
echo 'export JRE_HOME=$JAVA_HOME/jre' >> /etc/profile
echo 'export PATH=$PATH:$JAVA_HOME/bin:$JRE_HOME/bin'>> /etc/profile
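For the new variables to take effect in the current shell, reload the profile (or log out and back in):
source /etc/profile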
5 Verify by checking the Java version
java -version
Extract the Hadoop package to /app
The version I used is quite old; download whichever version you prefer.
Recommended download: CDH from Cloudera. CDH4 is based on Apache Hadoop 0.23.0 and CDH5 is built on Apache Hadoop 2.2.0.
Download from:
CDH4: http://archive.cloudera.com/cdh4
CDH5: http://archive.cloudera.com/cdh5
Version reference: https://www.cloudera.com/developers/inside-cdh.html
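Once downloaded, extracting to /app is just a tar command; the archive name below is an assumption based on the version used in the rest of this guide, so adjust it to whatever you downloaded:
mkdir -p /app
tar -zxvf hadoop-2.5.0-cdh5.3.2.tar.gz -C /app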
Install ZooKeeper
Install and start ZooKeeper on hadoop03, hadoop04, and hadoop05.
If you are not sure how to install it, search Google or Baidu.
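For reference, a minimal three-node ensemble configuration might look like the sketch below; the install and data paths are assumptions, so adjust them to your own layout. The same conf/zoo.cfg goes on hadoop03, hadoop04, and hadoop05, and each node gets its own myid file:
# conf/zoo.cfg (identical on hadoop03-05)
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/app/data/zookeeper
clientPort=2181
server.1=hadoop03:2888:3888
server.2=hadoop04:2888:3888
server.3=hadoop05:2888:3888
# on each node, write its id into dataDir (1 on hadoop03, 2 on hadoop04, 3 on hadoop05)
echo 1 > /app/data/zookeeper/myid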
2 Modify the configuration (the most critical and most tedious step)
All the configuration files live under etc/ in the extracted package. The meaning of each property is explained in its description tag; not all of them are covered here, so look up the rest as needed.
cd /app/hadoop-2.5.0-cdh5.3.2/etc/hadoop
Edit hadoop-env.sh
vi hadoop-env.sh
Change JAVA_HOME to your own JDK directory:
export JAVA_HOME=/usr/java/jdk1.8.0_171
Edit core-site.xml
The configuration below enables automatic failover.
<!-- Set the HDFS nameservice to my-hadoop -->
<property>
<name>fs.defaultFS</name>
<value>hdfs://my-hadoop/</value>
<description>The name of the default file system. A URI whose
scheme and authority determine the FileSystem implementation. The
uri's scheme determines the config property (fs.SCHEME.impl) naming
the FileSystem implementation class. The uri's authority is used to
determine the host, port, etc. for a filesystem.</description>
</property>
<!-- Hadoop temporary file directory -->
<property>
<name>hadoop.tmp.dir</name>
<value>/app/data/hadoop/tmp</value>
</property>
<!-- ZooKeeper quorum addresses -->
<property>
<name>ha.zookeeper.quorum</name>
<value>hadoop03:2181,hadoop04:2181,hadoop05:2181</value>
</property>
Create mapred-site.xml
cp mapred-site.xml.template mapred-site.xml
vi mapred-site.xml
Reference configuration:
<!-- MR YARN Application properties -->
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
<description>The runtime framework for executing MapReduce jobs.
Can be one of local, classic or yarn.
</description>
</property>
<!-- jobhistory properties -->
<property>
<name>mapreduce.jobhistory.address</name>
<value>hadoop02:10020</value>
<description>MapReduce JobHistory Server IPC host:port</description>
</property>
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>hadoop02:19888</value>
<description>MapReduce JobHistory Server Web UI host:port</description>
</property>
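Note that the JobHistory Server configured above is not started by the YARN start scripts used later; once the cluster is running, it can be started separately on hadoop02 from the Hadoop deployment directory:
sbin/mr-jobhistory-daemon.sh start historyserver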
Edit hdfs-site.xml
vi hdfs-site.xml
Reference configuration:
<!-- Set the HDFS nameservice to my-hadoop; it must match the value in core-site.xml -->
<property>
<name>dfs.nameservices</name>
<value>my-hadoop</value>
<description>
Comma-separated list of nameservices.
</description>
</property>
<!-- my-hadoop has two NameNodes, nn1 and nn2 -->
<property>
<name>dfs.ha.namenodes.my-hadoop</name>
<value>nn1,nn2</value>
<description>
The prefix for a given nameservice, contains a comma-separated
list of namenodes for a given nameservice (eg EXAMPLENAMESERVICE).
</description>
</property>
<!-- RPC addresses and ports for nn1 and nn2; RPC is used for communication with the DataNodes -->
<property>
<name>dfs.namenode.rpc-address.my-hadoop.nn1</name>
<value>hadoop01:8020</value>
<description>
RPC address for namenode1 of my-hadoop
</description>
</property>
<property>
<name>dfs.namenode.rpc-address.my-hadoop.nn2</name>
<value>hadoop02:8020</value>
<description>
RPC address for namenode2 of my-hadoop
</description>
</property>
<!-- HTTP addresses and ports for nn1 and nn2, used by web clients -->
<property>
<name>dfs.namenode.http-address.my-hadoop.nn1</name>
<value>hadoop01:50070</value>
<description>
The address and the base port where the dfs namenode1 web ui will listen on.
</description>
</property>
<property>
<name>dfs.namenode.http-address.my-hadoop.nn2</name>
<value>hadoop02:50070</value>
<description>
The address and the base port where the dfs namenode2 web ui will listen on.
</description>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:///app/data/hadoop/nameNode</value>
<description>Determines where on the local filesystem the DFS name node
should store the name table(fsimage). If this is a comma-delimited list
of directories then the name table is replicated in all of the
directories, for redundancy. </description>
</property>
<!-- Where the NameNode's shared edit log (metadata) is stored on the JournalNodes -->
<property>
<name>dfs.namenode.shared.edits.dir</name>
<value>qjournal://hadoop03:8485;hadoop04:8485;hadoop05:8485/my-hadoop</value>
<description>A directory on shared storage between the multiple namenodes
in an HA cluster. This directory will be written by the active and read
by the standby in order to keep the namespaces synchronized. This directory
does not need to be listed in dfs.namenode.edits.dir above. It should be
left empty in a non-HA cluster.
</description>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:///app/data/hadoop/dataNode</value>
<description>Determines where on the local filesystem an DFS data node
should store its blocks. If this is a comma-delimited
list of directories, then data will be stored in all named
directories, typically on different devices.
Directories that do not exist are ignored.
</description>
</property>
<!-- Where the JournalNode stores its data on local disk -->
<property>
<name>dfs.journalnode.edits.dir</name>
<value>/app/data/hadoop/journalNode/</value>
</property>
<!-- Enable automatic NameNode failover -->
<property>
<name>dfs.ha.automatic-failover.enabled.my-hadoop</name>
<value>true</value>
<description>
Whether automatic failover is enabled. See the HDFS High
Availability documentation for details on automatic HA
configuration.
</description>
</property>
<!-- Failover proxy provider implementation used by clients to locate the active NameNode -->
<property>
<name>dfs.client.failover.proxy.provider.my-hadoop</name>
<value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
<!-- Fencing methods; list multiple methods on separate lines, one method per line -->
<property>
<name>dfs.ha.fencing.methods</name>
<value>
sshfence
shell(/bin/true)
</value>
</property>
<!-- If sshfence is used for failover, the location of the private key used for SSH -->
<property>
<name>dfs.ha.fencing.ssh.private-key-files</name>
<value>/root/.ssh/id_rsa</value>
</property>
<!-- sshfence connection timeout -->
<property>
<name>dfs.ha.fencing.ssh.connect-timeout</name>
<value>30000</value>
</property>
Edit yarn-site.xml
vi yarn-site.xml
Reference configuration. Note that the ResourceManager scheduler is set to the FairScheduler here; this is covered below in the step "Create fairscheduler.xml".
Note: the YARN NodeManager memory and CPU resource settings must match your actual hardware; look up the details if needed.
<!-- Enable RM high availability -->
<property>
<name>yarn.resourcemanager.ha.enabled</name>
<value>true</value>
</property>
<!-- RM cluster id -->
<property>
<name>yarn.resourcemanager.cluster-id</name>
<value>yrc</value>
</property>
<!-- Logical ids of the RMs -->
<property>
<name>yarn.resourcemanager.ha.rm-ids</name>
<value>rm1,rm2</value>
</property>
<!-- Hostname of each RM -->
<property>
<name>yarn.resourcemanager.hostname.rm1</name>
<value>hadoop06</value>
</property>
<property>
<name>yarn.resourcemanager.hostname.rm2</name>
<value>hadoop07</value>
</property>
<!-- ZooKeeper ensemble addresses -->
<property>
<name>yarn.resourcemanager.zk-address</name>
<value>hadoop03:2181,hadoop04:2181,hadoop05:2181</value>
</property>
<property>
<description>The address of the applications manager interface in the RM.</description>
<name>yarn.resourcemanager.address</name>
<value>${yarn.resourcemanager.hostname}:8032</value>
</property>
<property>
<description>The address of the scheduler interface.</description>
<name>yarn.resourcemanager.scheduler.address</name>
<value>${yarn.resourcemanager.hostname}:8030</value>
</property>
<property>
<description>The http address of the RM web application.</description>
<name>yarn.resourcemanager.webapp.address</name>
<value>${yarn.resourcemanager.hostname}:8088</value>
</property>
<property>
<description>The https address of the RM web application.</description>
<name>yarn.resourcemanager.webapp.https.address</name>
<value>${yarn.resourcemanager.hostname}:8090</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>${yarn.resourcemanager.hostname}:8031</value>
</property>
<property>
<description>The address of the RM admin interface.</description>
<name>yarn.resourcemanager.admin.address</name>
<value>${yarn.resourcemanager.hostname}:8033</value>
</property>
<property>
<description>The class to use as the resource scheduler.</description>
<name>yarn.resourcemanager.scheduler.class</name>
<value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>
</property>
<property>
<description>fair-scheduler conf location</description>
<name>yarn.scheduler.fair.allocation.file</name>
<value>${yarn.home.dir}/etc/hadoop/fairscheduler.xml</value>
</property>
<property>
<description>List of directories to store localized files in. An
application's localized file directory will be found in:
${yarn.nodemanager.local-dirs}/usercache/${user}/appcache/application_${appid}.
Individual containers' work directories, called container_${contid}, will
be subdirectories of this.
</description>
<name>yarn.nodemanager.local-dirs</name>
<value>/app/data/hadoop/yarn</value>
</property>
<property>
<description>Whether to enable log aggregation</description>
<name>yarn.log-aggregation-enable</name>
<value>true</value>
</property>
<property>
<description>Where to aggregate logs to.</description>
<name>yarn.nodemanager.remote-app-log-dir</name>
<value>/app/log/hadoop/yarn</value>
</property>
<property>
<description>Amount of physical memory, in MB, that can be allocated for containers.</description>
<name>yarn.nodemanager.resource.memory-mb</name>
<value>30720</value>
</property>
<property>
<description>Number of CPU cores that can be allocated for containers.</description>
<name>yarn.nodemanager.resource.cpu-vcores</name>
<value>12</value>
</property>
<property>
<description>the valid service name should only contain a-zA-Z0-9_ and can not start with numbers</description>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
Edit slaves
vi slaves
Remove the existing entries and add our own:
hadoop03
hadoop04
hadoop05
Create fairscheduler.xml
The default scheduler configuration is capacity-scheduler.xml; here we switch to the FairScheduler. YARN ships with two multi-user, multi-queue schedulers, the Capacity Scheduler and the Fair Scheduler; choose whichever suits your needs. I switched only to demonstrate the extra steps involved, and the queue values below are placeholders, so configure them according to your actual situation.
vi fairscheduler.xml
Reference:
<?xml version="1.0"?>
<allocations>
<queue name="infrastructure">
<minResources>102400 mb, 50 vcores </minResources>
<maxResources>153600 mb, 100 vcores </maxResources>
<maxRunningApps>200</maxRunningApps>
<minSharePreemptionTimeout>300</minSharePreemptionTimeout>
<weight>1.0</weight>
<aclSubmitApps>root,yarn,search,hdfs</aclSubmitApps>
</queue>
<queue name="tool">
<minResources>102400 mb, 30 vcores</minResources>
<maxResources>153600 mb, 50 vcores</maxResources>
</queue>
<queue name="sentiment">
<minResources>102400 mb, 30 vcores</minResources>
<maxResources>153600 mb, 50 vcores</maxResources>
</queue>
</allocations>
3 Start Hadoop
Sync the configuration files
Copy the configuration files from hadoop01 to the other machines with scp. Not every command is written out here; a loop sketch is shown after the example below.
scp /app/hadoop-2.5.0-cdh5.3.2/etc/hadoop/* hadoop02:/app/hadoop-2.5.0-cdh5.3.2/etc/hadoop
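A minimal sketch for pushing the directory to every other node from hadoop01 (assuming the same install path exists on all machines):
for h in hadoop02 hadoop03 hadoop04 hadoop05 hadoop06 hadoop07; do scp /app/hadoop-2.5.0-cdh5.3.2/etc/hadoop/* $h:/app/hadoop-2.5.0-cdh5.3.2/etc/hadoop/; done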
Start the services
Start ZooKeeper. In the ZooKeeper install directory on hadoop03-05, run:
bin/zkServer.sh start
Use bin/zkServer.sh status to check that it started correctly.
Start the Hadoop cluster:
Note: all of the following commands are run from the Hadoop deployment directory.
cd /app/hadoop-2.5.0-cdh5.3.2
Step 1:
On each JournalNode host (hadoop03-05), start the journalnode service:
sbin/hadoop-daemon.sh start journalnode
Step 2:
Format the HA state in ZooKeeper (run the format only once, on nn1, which is hadoop01 in my case), then start the ZKFC on both nn1 and nn2 (hadoop01 and hadoop02):
bin/hdfs zkfc -formatZK
sbin/hadoop-daemon.sh start zkfc
Step 3:
On [nn1] (hadoop01 in my case), format the NameNode and start it:
bin/hdfs namenode -format
sbin/hadoop-daemon.sh start namenode
Step 4:
On [nn2] (hadoop02 in my case), sync the metadata from nn1 and start it:
bin/hdfs namenode -bootstrapStandby
sbin/hadoop-daemon.sh start namenode
经过以上四步操作,nn1和nn2会由zkfc 自动分配一个active,一个standby
访问:http://192.168.7.51:50070,http://192.168.7.52:50070查看hdfs信息
在hdfs-site.xml的配置 nn1,nn2的地址
192.168.7.51是hadoo01的地址,192.168.7.52是hadoo02的地址
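You can also check which NameNode is active from the command line with the HA admin tool:
bin/hdfs haadmin -getServiceState nn1
bin/hdfs haadmin -getServiceState nn2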
Step 5:
On [nn1], start all the DataNodes:
sbin/hadoop-daemons.sh start datanode
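Once they are up, a quick way to confirm that all three DataNodes have registered is the dfsadmin report:
bin/hdfs dfsadmin -report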
Step 6:
On hadoop06, start YARN:
sbin/start-yarn.sh
Visit http://192.168.7.56:8088 (the RM web address configured in yarn-site.xml) to check that it started.
Step 7:
On hadoop07, start the standby ResourceManager:
sbin/yarn-daemon.sh start resourcemanager
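To confirm the two ResourceManagers came up with the expected roles, you can query their state and, optionally, submit a small test job; the examples jar path below is an assumption and may differ slightly in your distribution:
bin/yarn rmadmin -getServiceState rm1
bin/yarn rmadmin -getServiceState rm2
bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar pi 2 10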
Stopping the Hadoop cluster (for when you want to shut it down; do not run this now)
On [nn1], run:
sbin/stop-dfs.sh
sbin/stop-yarn.sh
High availability verification
For this verification:
Primary nodes: NameNode hadoop01 (192.168.7.51), ResourceManager hadoop06 (192.168.7.56)
Standby nodes: NameNode hadoop02 (192.168.7.52), ResourceManager hadoop07 (192.168.7.57)
Step 1. Primary nodes -> standby nodes
Kill the NameNode on the primary node and check whether the standby NameNode switches to active;
kill the ResourceManager on the primary node and check whether the standby ResourceManager switches to active (a sketch of the kill step follows these checks).
Visit http://192.168.7.51:50070 and http://192.168.7.52:50070 to view the HDFS status.
Visit http://192.168.7.56:8088 and http://192.168.7.57:8088 to view YARN.
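A minimal sketch of the kill step (run on hadoop01 for the NameNode and on hadoop06 for the ResourceManager): find the process id with jps, then kill it.
jps                 # lists the Java processes and their pids; look for NameNode / ResourceManager
kill -9 <pid>       # <pid> is the process id printed by jps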
Step 2. Restart the killed NameNode and ResourceManager on the original primary nodes (run from the Hadoop deployment directory).
On hadoop01:
cd /app/hadoop-2.5.0-cdh5.3.2
sbin/hadoop-daemon.sh start namenode
On hadoop06:
cd /app/hadoop-2.5.0-cdh5.3.2
sbin/yarn-daemon.sh start resourcemanager
Visit http://192.168.7.51:50070 and http://192.168.7.52:50070 to view the HDFS status.
Visit http://192.168.7.56:8088 and http://192.168.7.57:8088 to view YARN.
Step 3. Standby nodes -> primary nodes
Now kill the NameNode and ResourceManager on the standby nodes and check the state of the original primary nodes.
Visit http://192.168.7.51:50070 and http://192.168.7.52:50070 to view the HDFS status.
Visit http://192.168.7.56:8088 and http://192.168.7.57:8088 to view YARN.
If they switch back to active, the Hadoop HA cluster has been set up successfully.
Step 4. Restart the killed NameNode and ResourceManager on the original standby nodes (run from the Hadoop deployment directory).
On hadoop02:
cd /app/hadoop-2.5.0-cdh5.3.2
sbin/hadoop-daemon.sh start namenode
On hadoop07:
cd /app/hadoop-2.5.0-cdh5.3.2
sbin/yarn-daemon.sh start resourcemanager
Visit http://192.168.7.51:50070 and http://192.168.7.52:50070 to view the HDFS status.
Visit http://192.168.7.56:8088 and http://192.168.7.57:8088 to view YARN.
Closing
That's the end of this article. It was written in a hurry, so please forgive any mistakes and point them out. There are many steps involved; if this is your first Hadoop installation, don't blindly copy and paste the commands. This article is meant as a reference, so adjust the parameters and commands to your own situation and be patient when you hit problems. Good luck, and thank you.