Running a simple example program fails with an error
bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.2.jar grep input output 'dfs[a-z.]+'
Symptom: the job hangs, with progress stuck at 20%. The NodeManager log shows:
2023-08-24 15:02:43,732 ERROR org.apache.hadoop.yarn.server.nodemanager.NodeManager: RECEIVED SIGNAL 15: SIGTERM
2023-08-24 15:02:43,749 INFO org.mortbay.log: Stopped [email protected]:8042
2023-08-24 15:02:43,755 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch: Exception when trying to cleanup container container_1692860510744_0002_01_000003: ExitCodeException exitCode=143:
at org.apache.hadoop.util.Shell.runCommand(Shell.java:545)
at org.apache.hadoop.util.Shell.run(Shell.java:456)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:722)
at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.killContainer(DefaultContainerExecutor.java:450)
at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.signalContainer(DefaultContainerExecutor.java:406)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.cleanupContainer(ContainerLaunch.java:419)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainersLauncher.handle(ContainersLauncher.java:139)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainersLauncher.handle(ContainersLauncher.java:55)
at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:183)
at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:109)
at java.lang.Thread.run(Thread.java:750)
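The `exitCode=143` in the trace is consistent with the SIGTERM on the first log line: by the usual shell convention, a process killed by signal N reports exit status 128 + N, and 143 - 128 = 15. A minimal Python sketch of that arithmetic follows; the helper name `signal_from_exit_code` is made up for illustration and is not part of Hadoop.

```python
import signal


def signal_from_exit_code(exit_code):
    """Map a shell-style exit code (128 + N) to a signal name, or None."""
    if exit_code <= 128:
        # Codes up to 128 are ordinary exit statuses, not signal deaths.
        return None
    try:
        return signal.Signals(exit_code - 128).name
    except ValueError:
        # No signal with that number on this platform.
        return None


print(signal_from_exit_code(143))
```

So the 143 only says that the container process was terminated by SIGTERM; it does not say who sent the signal or why.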
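I first tried the following workaround. With the memory value set to 25000 (25 followed by three zeros), the job got to 73% before dying; with 250000 (25 followed by four zeros), it got to 100% before dying.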
# RM (yarn-site.xml) memory resource configuration. Two parameters: the minimum and maximum memory that a single container can request.
<property>
<name>yarn.scheduler.minimum-allocation-mb</name>
<value>1024</value>
</property>
<property>
<name>yarn.scheduler.maximum-allocation-mb</name>
<value>250000</value>
</property>
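# NM (yarn-site.xml). The first parameter is the maximum memory available on a single node; neither of the two RM values should exceed it.
The second is the virtual-to-physical memory ratio, i.e. how much virtual memory a task may use relative to its physical memory allocation. The default is 2.1.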
<property>
<name>yarn.nodemanager.resource.memory-mb</name>
<value>250000</value>
</property>
<property>
<name>yarn.nodemanager.vmem-pmem-ratio</name>
<value>2.1</value>
</property>
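The relationship between these four values can be checked mechanically. The following is a rough sketch, not an official Hadoop tool: it parses the property fragments above (wrapped in a `<configuration>` root so they parse standalone), verifies that the scheduler limits do not exceed the node memory, and computes the virtual memory allowance implied by the ratio. The function names `load_properties`, `check_memory_settings` and `vmem_limit_mb` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Values copied from the yarn-site.xml fragments above.
YARN_SITE = """
<configuration>
  <property>
    <name>yarn.scheduler.minimum-allocation-mb</name>
    <value>1024</value>
  </property>
  <property>
    <name>yarn.scheduler.maximum-allocation-mb</name>
    <value>250000</value>
  </property>
  <property>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>250000</value>
  </property>
  <property>
    <name>yarn.nodemanager.vmem-pmem-ratio</name>
    <value>2.1</value>
  </property>
</configuration>
"""


def load_properties(xml_text):
    """Return a {name: value} dict of all <property> entries."""
    root = ET.fromstring(xml_text.strip())
    return {p.findtext("name"): p.findtext("value") for p in root.iter("property")}


def check_memory_settings(props):
    """Return a list of human-readable problems; empty if consistent."""
    min_mb = int(props["yarn.scheduler.minimum-allocation-mb"])
    max_mb = int(props["yarn.scheduler.maximum-allocation-mb"])
    node_mb = int(props["yarn.nodemanager.resource.memory-mb"])
    problems = []
    if min_mb > max_mb:
        problems.append("minimum-allocation-mb exceeds maximum-allocation-mb")
    if max_mb > node_mb:
        problems.append("maximum-allocation-mb exceeds nodemanager.resource.memory-mb")
    return problems


def vmem_limit_mb(container_mb, ratio):
    """Virtual memory allowance for a container of the given physical size."""
    return container_mb * ratio


props = load_properties(YARN_SITE)
print(check_memory_settings(props))
print(vmem_limit_mb(1024, float(props["yarn.nodemanager.vmem-pmem-ratio"])))
```

With the values above, the check reports no inconsistencies, and a 1024 MB container gets roughly 2150 MB of virtual memory. A consistent configuration does not mean the node physically has 250000 MB available; that has to be verified separately.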
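I then found a description of the problem: [YARN-4459] container-executor should only kill process groups - ASF JIRA
Since I was using Hadoop 2.7.2, I decided to try upgrading to 2.7.3.
After switching to 2.7.3 and redeploying, the problem was indeed gone.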