Wednesday 13 January 2016

How to add a new datanode to an existing Hadoop cluster without restarting.

Follow the instructions below to add a new datanode to an existing Hadoop cluster without restarting it.

1. Create a file named "includes" under the [HADOOP-HOME]/conf directory.

2. Add the IP of the new datanode to this file, as shown below.
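
The includes file is a plain-text whitelist of hosts allowed to connect, one entry per line. A minimal sketch with placeholder IP addresses (substitute your own node's address):

 10.0.0.21
 10.0.0.22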

3. Add the property below to hdfs-site.xml:

<property>
    <name>dfs.hosts</name>
    <value>[HADOOP-HOME]/conf/includes</value>
    <final>true</final>
</property>

4. Add the property below to mapred-site.xml:
<property>
    <name>mapred.hosts</name>
    <value>[HADOOP-HOME]/conf/includes</value>
</property>

5. On the NameNode, execute:
 bin/hadoop dfsadmin -refreshNodes
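
To verify that the NameNode has picked up the new host, the datanode report can be checked:

 bin/hadoop dfsadmin -report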

6. On the JobTracker node, execute:
 bin/hadoop mradmin -refreshNodes
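
Once the new node's TaskTracker is up (step 7), it can likewise be confirmed from the JobTracker side; in Hadoop 1.x the active trackers can be listed with:

 bin/hadoop job -list-active-trackers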

7. Log in to the new slave node and execute:

$ cd path/to/hadoop
$ bin/hadoop-daemon.sh start datanode
$ bin/hadoop-daemon.sh start tasktracker
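
A quick way to confirm that both daemons came up is to check the Java processes on the new node:

$ jps
 (the output should list DataNode and TaskTracker)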

8. Add the IP of the new datanode to the conf/slaves file.
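
For example, the entry can be appended directly (placeholder IP shown; the slaves file is only read by the start/stop scripts, so this does not require a restart):

$ echo "10.0.0.21" >> conf/slaves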

Finally, execute the command below during off-peak hours to rebalance the data blocks across the cluster:

$ bin/start-balancer.sh
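
The balancer also accepts an optional threshold (the allowed disk-usage imbalance between datanodes, as a percentage of capacity), for example:

$ bin/start-balancer.sh -threshold 5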
