Set Up a Hadoop Cluster Using Ansible Playbooks

anjali pandey
Mar 21, 2021

I am setting up my complete Hadoop cluster on top of AWS EC2 instances. In my case, I am going to set up one name node, one data node, and one client.

These EC2 instances are my target nodes, so in my inventory file I mention the target systems' IPs and the path to their private SSH key.
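A minimal sketch of such an inventory file — the group names, IPs, user, and key path here are placeholders, not the actual values from my setup:

[namenode]
13.233.xx.xx   ansible_user=ec2-user   ansible_ssh_private_key_file=/root/mykey.pem

[datanode]
65.0.xx.xx     ansible_user=ec2-user   ansible_ssh_private_key_file=/root/mykey.pem

[client]
3.110.xx.xx    ansible_user=ec2-user   ansible_ssh_private_key_file=/root/mykey.pem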

To set up the complete Hadoop cluster I have to run three playbooks.

→ namenode_setup.yml

→ datanode_setup.yml

→ client_setup.yml

Name Node setup:

Before running the tasks on the name node, we take the following inputs:

  • Namenode Public IP
  • Namenode Private IP
  • Namenode Port
  • Namenode Folder
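These inputs can be taken with Ansible's vars_prompt. A minimal sketch — the variable names are my assumption; the real playbook may use different ones:

- hosts: namenode
  vars_prompt:
    - name: namenode_public_ip
      prompt: "Enter the name node public IP"
      private: no
    - name: namenode_private_ip
      prompt: "Enter the name node private IP"
      private: no
    - name: namenode_port
      prompt: "Enter the name node port"
      private: no
    - name: namenode_folder
      prompt: "Enter the name node folder"
      private: no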

Tasks we have to perform on the name node target:

Step 1: “downloading JDK”

Step 2: “checking whether JDK is installed or not”

Step 3: “installing JDK”

Step 4: “downloading Hadoop”

Step 5: “checking whether Hadoop is installed or not”

Step 6: “installing Hadoop”

Step 7: “making a directory for the name node”
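Continuing the play sketched above, steps 1–7 might look like this as Ansible tasks. The download URLs and the JDK/Hadoop versions are placeholders (the exact ones are in the playbook screenshots), and Hadoop 1.x RPMs commonly need rpm --force because of a package conflict, which is why I sketch the command module rather than yum:

  tasks:
    - name: downloading JDK
      get_url:
        url: "https://example.com/jdk-8u171-linux-x64.rpm"     # placeholder URL
        dest: /root/jdk-8u171-linux-x64.rpm

    - name: checking whether JDK is installed or not
      command: rpm -q jdk1.8                                   # package name assumed
      register: jdk_check
      ignore_errors: yes

    - name: installing JDK
      command: rpm -ivh /root/jdk-8u171-linux-x64.rpm
      when: jdk_check.rc != 0

    - name: downloading Hadoop
      get_url:
        url: "https://example.com/hadoop-1.2.1-1.x86_64.rpm"   # placeholder URL
        dest: /root/hadoop-1.2.1-1.x86_64.rpm

    - name: checking whether Hadoop is installed or not
      command: rpm -q hadoop
      register: hadoop_check
      ignore_errors: yes

    - name: installing Hadoop
      command: rpm -ivh /root/hadoop-1.2.1-1.x86_64.rpm --force
      when: hadoop_check.rc != 0

    - name: making directory for name node
      file:
        path: "{{ namenode_folder }}"
        state: directory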

Step 8: “copying the N_hdfs-site.xml file from the controller node to /etc/hadoop/hdfs-site.xml on the name node”

  • This is the N_hdfs-site.xml file kept at the controller node.
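For a Hadoop 1.x name node, this file typically contains only the dfs.name.dir property. A sketch — the value would be filled in from the namenode_folder input, e.g. via the template module:

<configuration>
    <property>
        <name>dfs.name.dir</name>
        <value>{{ namenode_folder }}</value>
    </property>
</configuration>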

Step 9: “copying the N_core-site.xml file from the controller node to /etc/hadoop/core-site.xml on the name node”

  • This is the N_core-site.xml file at the controller node.
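Again a typical Hadoop 1.x sketch of this file: 0.0.0.0 makes the name node listen on all interfaces, and the port comes from the namenode_port input:

<configuration>
    <property>
        <name>fs.default.name</name>
        <value>hdfs://0.0.0.0:{{ namenode_port }}</value>
    </property>
</configuration>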

Step 10: “formatting the name node”

Step 11: “starting the name node service”

Step 12: “checking the status of the service”
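A sketch of steps 10–12, continuing the same tasks list. The echo Y answers the confirmation prompt that hadoop namenode -format shows when the directory already exists:

    - name: formatting the name node
      shell: echo Y | hadoop namenode -format

    - name: starting name node service
      command: hadoop-daemon.sh start namenode

    - name: see the status of service
      command: jps
      register: namenode_status

    - name: show the status
      debug:
        var: namenode_status.stdout_lines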

These are the screenshots of the playbook namenode_setup.yml…

After running the playbook namenode_setup.yml:

Name node set up successfully.

Data Node setup:

Before running the tasks on the data node, we take the following inputs:

  • Datanode Public IP
  • Namenode Public IP
  • Namenode Port
  • Datanode Folder

Tasks we have to perform on the data node target:

Step 1: “downloading JDK”

Step 2: “checking whether JDK is installed or not”

Step 3: “installing JDK”

Step 4: “downloading Hadoop”

Step 5: “checking whether Hadoop is installed or not”

Step 6: “installing Hadoop”

Step 7: “making a directory for the data node”

Step 8: “copying the D_hdfs-site.xml file from the controller node to /etc/hadoop/hdfs-site.xml on the data node”

  • This is the D_hdfs-site.xml file.
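As with the name node, a typical Hadoop 1.x sketch — here the data node's storage directory, with the value coming from the datanode_folder input:

<configuration>
    <property>
        <name>dfs.data.dir</name>
        <value>{{ datanode_folder }}</value>
    </property>
</configuration>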

Step 9: “copying the D_core-site.xml file from the controller node to /etc/hadoop/core-site.xml on the data node”

  • This is the D_core-site.xml file.
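Here fs.default.name points at the name node rather than 0.0.0.0, using the namenode public IP and port taken as inputs:

<configuration>
    <property>
        <name>fs.default.name</name>
        <value>hdfs://{{ namenode_public_ip }}:{{ namenode_port }}</value>
    </property>
</configuration>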

Step 10: “starting the data node service”

Step 11: “checking the status of the service”
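A sketch of these two tasks, following the same pattern as on the name node (minus the formatting step):

    - name: starting data node service
      command: hadoop-daemon.sh start datanode

    - name: see the status of service
      command: jps
      register: datanode_status

    - name: show the status
      debug:
        var: datanode_status.stdout_lines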

These are the screenshots of datanode_setup.yml…

After running the playbook datanode_setup.yml:

Data node set up successfully.

Client Setup:

Before running the tasks on the client, we take the following inputs:

  • Client Public IP
  • Namenode Public IP
  • Namenode Port

Tasks we have to perform on the client target:

Step 1: “downloading JDK”

Step 2: “checking whether JDK is installed or not”

Step 3: “installing JDK”

Step 4: “downloading Hadoop”

Step 5: “checking whether Hadoop is installed or not”

Step 6: “installing Hadoop”

Step 7: “copying the C_core-site.xml file from the controller node to /etc/hadoop/core-site.xml on the client”

Note: on the client side we only configure core-site.xml, because the client never shares storage; it only uploads and retrieves data.

  • This is the C_core-site.xml file.
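Its content is the same shape as the data node's core-site.xml, pointing the client at the name node:

<configuration>
    <property>
        <name>fs.default.name</name>
        <value>hdfs://{{ namenode_public_ip }}:{{ namenode_port }}</value>
    </property>
</configuration>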

Step 8: “checking the status of the connection”
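One way to sketch this check: hadoop dfsadmin -report lists the live data nodes and the cluster capacity as seen from the client. That this exact command is used is my assumption; the screenshots show the real task:

    - name: checking status of connection
      command: hadoop dfsadmin -report
      register: report

    - name: show the report
      debug:
        var: report.stdout_lines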

These are the screenshots of client_setup.yml…

After running the playbook client_setup.yml:

Client node set up successfully.

My complete Hadoop cluster was set up successfully using ansible-playbook…

This is called automation. Doing these tasks manually is very time-consuming, and there is a real chance of mistakes, so why not do them with Ansible playbooks? The playbooks do the configuration for us; we only have to run them. And Ansible is not limited to configuration: we can also use it for provisioning, testing, deployment, and more.
