Wednesday 13 January 2016

How to add a new datanode to an existing Hadoop cluster without restarting.

Follow the instructions below to add a new datanode to an existing Hadoop cluster without restarting it.

1. Create a file named "includes" under the [HADOOP-HOME]/conf directory.

2. Add the IP of the new datanode to this file, as shown below.
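
The includes file is a plain-text whitelist of hosts allowed to connect, one entry per line. A minimal sketch with placeholder IP addresses (substitute your own node's address):

 10.0.0.21
 10.0.0.22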

3. Add the property below to hdfs-site.xml:

<property>
    <name>dfs.hosts</name>
    <value>[HADOOP-HOME]/conf/includes</value>
    <final>true</final>
</property>

4. Add the property below to mapred-site.xml:
<property>
    <name>mapred.hosts</name>
    <value>[HADOOP-HOME]/conf/includes</value>
</property>

5. On the NameNode, execute:
 bin/hadoop dfsadmin -refreshNodes
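
To verify that the NameNode has picked up the new host, the datanode report can be checked:

 bin/hadoop dfsadmin -report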

6. On the JobTracker node, execute:
 bin/hadoop mradmin -refreshNodes
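
Once the new node's TaskTracker is up (step 7), it can likewise be confirmed from the JobTracker side; in Hadoop 1.x the active trackers can be listed with:

 bin/hadoop job -list-active-trackers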

7. Log in to the new slave node and execute:

$ cd path/to/hadoop
$ bin/hadoop-daemon.sh start datanode
$ bin/hadoop-daemon.sh start tasktracker
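
A quick way to confirm that both daemons came up is to check the Java processes on the new node:

$ jps
 (the output should list DataNode and TaskTracker)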

8. Add the IP of the new datanode to the conf/slaves file.
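
For example, the entry can be appended directly (placeholder IP shown; the slaves file is only read by the start/stop scripts, so this does not require a restart):

$ echo "10.0.0.21" >> conf/slaves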

Finally, execute the command below during off-peak hours to rebalance the data blocks across the cluster:

$ bin/start-balancer.sh
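
The balancer also accepts an optional threshold (the allowed disk-usage imbalance between datanodes, as a percentage of capacity), for example:

$ bin/start-balancer.sh -threshold 5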
