yarn - Hadoop doesn't use one node for a job
I've got a 4-node YARN cluster set up and running. I recently had to format the namenode due to a smaller problem.
Later I ran Hadoop's Pi example to verify that every node was still taking part in the calculation, which they did. However, when I start a job of my own, one of the nodes is not being used at all.
I figured this might be because that node doesn't have any data to work on, so I tried to balance the cluster using the balancer. That doesn't work, and the balancer tells me the cluster is already balanced.
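For reference, a typical balancer run looks roughly like the sketch below; the threshold value is just an example, not necessarily what was used here:

  # Re-balance HDFS: move blocks until every DataNode's usage is
  # within 10% of the cluster-wide average (threshold is a percentage).
  hdfs balancer -threshold 10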
What am I missing?
While processing, the ApplicationMaster negotiates containers with the NodeManagers, and each NodeManager in turn tries to obtain the nearest DataNode for the resource. Since your replication factor is 3, HDFS will try to place one whole copy on a single DataNode and distribute the rest across the other DataNodes.
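To see where the blocks of your input actually ended up, you can inspect their placement with fsck. A minimal sketch, assuming your input lives under /user/hadoop/input (adjust the path to your setup):

  # List every file under the path, its blocks, and the DataNodes
  # holding each block's replicas.
  hdfs fsck /user/hadoop/input -files -blocks -locations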
1) Change the replication factor to 1 (since you are only trying to benchmark, reducing replication should not be a big issue).
2) Make sure the client (the machine from which you issue the -copyFromLocal command) does not have a DataNode running on it. If it does, HDFS will tend to place most of the data on that node, since that reduces latency.
3) Control file distribution using the dfs.blocksize property.
4) Check the status of your DataNodes using hdfs dfsadmin -report.
A sketch of the corresponding commands follows below.
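The following sketch covers points 1), 3) and 4); the path, file name and block size are placeholder examples, so adjust them to your cluster:

  # 1) Lower the replication factor for data already in HDFS
  #    (-w waits until re-replication has finished):
  hdfs dfs -setrep -w 1 /user/hadoop/input

  # 1) + 3) Or set replication and an explicit block size while
  #    uploading new input (64 MB here is just an example value):
  hdfs dfs -D dfs.replication=1 -D dfs.blocksize=67108864 \
      -copyFromLocal bigfile.txt /user/hadoop/input/

  # 4) Report capacity, usage and block counts per DataNode:
  hdfs dfsadmin -report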