Install Apache Hadoop 2.9.2 on Ubuntu 18.04 | Step by Step Guide | Big Data


Prerequisites

  • Ubuntu 18.04 running as a virtual machine on Oracle VirtualBox
  • A reliable internet connection
  • Recommended: a laptop/desktop with 8 GB RAM, 50 to 100 GB of free hard disk space, and a reasonably recent processor

Introduction to Apache Hadoop

The Apache Hadoop project develops open-source software for reliable, scalable, distributed computing.

The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models.

Download Apache Hadoop

Log in to the Ubuntu VM and open a Terminal window (a command-line window similar to the Windows Command Prompt)

Install Java

Get Repo file from this URL:

Download Repo file using wget command in the Terminal

Copy the repo file into /etc/apt/sources.list.d/ using the command below

sudo cp cloudera-manager.list /etc/apt/sources.list.d/

Download and Import the repository signing GPG key

sudo wget

sudo apt-key add archive.key

Update your system package index by running the command below

sudo apt-get update

Install Java using the below command

sudo apt-get install oracle-j2sdk1.8

Java will be installed at this location: /usr/lib/jvm/java-8-oracle-cloudera

ls /usr/lib/jvm/java-8-oracle-cloudera

Set JAVA_HOME in the ~/.bashrc file. Open the file using the command below (sudo is not needed, because the file belongs to your own user)

nano ~/.bashrc

Paste the below lines at the end of the file

export JAVA_HOME=/usr/lib/jvm/java-8-oracle-cloudera
export PATH=$PATH:$JAVA_HOME/bin

Run the below command to reload the .bashrc file

source ~/.bashrc

Run the below command to check the version of the Java installed

java -version

Install OpenSSH server & client using the below command

sudo apt-get install openssh-server openssh-client

Type "Y" to continue when prompted

Create the SSH key for passwordless login (press Enter when it asks for a file name in which to save the key)

ssh-keygen -t rsa -P ""

Append the generated SSH public key to the authorized keys file

cat $HOME/.ssh/ >> $HOME/.ssh/authorized_keys
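The append step can be sketched as a small shell function. This is only an illustration: it assumes the key pair was saved under the default name, so that the public key is $HOME/.ssh/id_rsa.pub, and the helper name append_pubkey is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: append a public key to an authorized_keys file,
# but only if that exact line is not already present.
append_pubkey() {
    local pubkey_file="$1" auth_file="$2"
    [ -r "$pubkey_file" ] || { echo "missing public key: $pubkey_file" >&2; return 1; }
    touch "$auth_file"
    # -x matches whole lines, -F treats the key as a fixed string, -q stays quiet
    if ! grep -qxF -f "$pubkey_file" "$auth_file"; then
        cat "$pubkey_file" >> "$auth_file"
    fi
    # Keep the file private; sshd in strict mode rejects group- or world-writable key files
    chmod 600 "$auth_file"
}

# Assumed default key location; adjust if a different file name was chosen.
# append_pubkey "$HOME/.ssh/id_rsa.pub" "$HOME/.ssh/authorized_keys"
```

Running the helper twice leaves a single copy of the key in the file, which a plain `>>` append would not guarantee.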

Connect to localhost over SSH by running the command below (type "yes" if a prompt asks you to confirm the connection)

ssh localhost

Exit from the SSH connection using the below command

exit


Open the official Apache Hadoop website and click the "Download" button

Click the binary download link for version 2.9.2

Click the link under "We suggest the following mirror site for your download:"

Right-click the downloaded file and choose "Open Containing Folder"; it will point to /tmp/..

Manually copy the downloaded Hadoop binary file into /home/dmadmin/datamaking/softwares

Install Apache Hadoop

Navigate to the location /home/dmadmin/datamaking/softwares using the command below

cd /home/dmadmin/datamaking/softwares/

Change the binary file permission using the command below

sudo chmod a+x hadoop-2.9.2.tar.gz

Extract the binary file

sudo tar -xzvf hadoop-2.9.2.tar.gz

The Hadoop installation path will be: /home/dmadmin/datamaking/softwares/hadoop-2.9.2

Run the below command

cd hadoop-2.9.2/

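Add the Hadoop installation path (HADOOP_INSTALL) and the JAVA_HOME path to the bash file (.bashrc)

nano ~/.bashrc

Add the below Hadoop path information to the .bashrc file

export JAVA_HOME=/usr/lib/jvm/java-8-oracle-cloudera
export HADOOP_INSTALL=/home/dmadmin/datamaking/softwares/hadoop-2.9.2
# Put the hadoop/hdfs commands (bin) and the start/stop scripts (sbin) on the PATH
export PATH=$PATH:$HADOOP_INSTALL/bin:$HADOOP_INSTALL/sbin
# Define the native library directory before it is referenced in HADOOP_OPTS
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_INSTALL/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_COMMON_LIB_NATIVE_DIR"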


Run the below command to reload the bash file (.bashrc)

source ~/.bashrc

Find the host name by running the command below

hostname -f

Open the /etc/hosts file and update the host name entry as follows

sudo nano /etc/hosts

From this, localhost datamaking

To this, localhost datamaking
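After editing /etc/hosts, it can help to confirm that the host name really is mapped to an address. A minimal sketch of such a check follows; the helper name hosts_file_has_name is hypothetical, and it only inspects the file rather than testing name resolution.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if the hosts file maps some address to the given name.
hosts_file_has_name() {
    local name="$1" hosts_file="${2:-/etc/hosts}"
    awk -v n="$name" '
        /^[[:space:]]*#/ { next }            # skip comment lines
        {
            for (i = 2; i <= NF; i++) {      # field 1 is the address, the rest are names
                if ($i ~ /^#/) break         # stop at a trailing comment
                if ($i == n) found = 1
            }
        }
        END { exit (found ? 0 : 1) }
    ' "$hosts_file"
}

# Example: check the machine's own fully qualified host name.
# hosts_file_has_name "$(hostname -f)" && echo "host name is mapped"
```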

Create or Modify Hadoop configuration files

Now edit the configuration files in the /home/dmadmin/datamaking/softwares/hadoop-2.9.2/etc/hadoop directory.

Create the masters file and edit it as follows,

cd /home/dmadmin/datamaking/softwares/hadoop-2.9.2/etc/hadoop

sudo nano masters

Add the hostname in the masters file as shown below


Edit the slaves file as follows,

sudo nano slaves

Add the hostname in the slaves file as shown below


Edit core-site.xml as follows,

sudo nano core-site.xml

Add the property in the core-site.xml as shown below

<description>Parent directory for other temporary directories.</description>
<description>The name of the default file system. </description>
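The two descriptions above appear to belong to Hadoop's hadoop.tmp.dir and fs.defaultFS properties. A minimal sketch of a core-site.xml built around them follows, written as a shell function so the values stay visible. The helper name write_core_site is hypothetical, and the host name and port 9000 in the example URI are assumptions rather than values taken from this guide.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write a minimal core-site.xml to the given path.
#   $1 = output file, $2 = temporary directory, $3 = default file system URI
write_core_site() {
    local out_file="$1" tmp_dir="$2" fs_uri="$3"
    cat > "$out_file" <<EOF
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>${tmp_dir}</value>
    <description>Parent directory for other temporary directories.</description>
  </property>
  <property>
    <name>fs.defaultFS</name>
    <value>${fs_uri}</value>
    <description>The name of the default file system.</description>
  </property>
</configuration>
EOF
}

# Example values: the tmp directory used in this guide; host name and port are assumed.
# write_core_site core-site.xml \
#     /home/dmadmin/datamaking/softwares/hadoop_data/tmp hdfs://datamaking:9000
```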

Run the below command to create the directory

sudo mkdir -p /home/dmadmin/datamaking/softwares/hadoop_data/tmp

Edit hdfs-site.xml as follows,

sudo nano hdfs-site.xml

Add the property in the hdfs-site.xml as shown below
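For a single-node setup, hdfs-site.xml usually carries the replication factor and the NameNode and DataNode storage directories. The sketch below assumes a replication factor of 1 and the standard property names dfs.replication, dfs.namenode.name.dir and dfs.datanode.data.dir; the helper name write_hdfs_site is hypothetical, and the values may differ from the ones originally intended here.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write a minimal single-node hdfs-site.xml.
#   $1 = output file, $2 = NameNode directory, $3 = DataNode directory
write_hdfs_site() {
    local out_file="$1" name_dir="$2" data_dir="$3"
    cat > "$out_file" <<EOF
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>${name_dir}</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>${data_dir}</value>
  </property>
</configuration>
EOF
}

# Example using the hadoop_data directories from this guide:
# write_hdfs_site hdfs-site.xml \
#     /home/dmadmin/datamaking/softwares/hadoop_data/namenode \
#     /home/dmadmin/datamaking/softwares/hadoop_data/datanode
```

A replication factor of 1 fits a machine with a single DataNode, since there is no second node to hold extra block copies.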




Run the below command to create directory for namenode

sudo mkdir -p /home/dmadmin/datamaking/softwares/hadoop_data/namenode

Run the below command to create directory for datanode

sudo mkdir -p /home/dmadmin/datamaking/softwares/hadoop_data/datanode

Run the below command to provide permission for the directories/sub-directories of hadoop_data

sudo chown -R dmadmin:dmadmin /home/dmadmin/datamaking/softwares/hadoop_data

Copy the mapred-site.xml file from the template file in the configuration folder and then edit mapred-site.xml as follows,

sudo cp mapred-site.xml.template mapred-site.xml

sudo nano mapred-site.xml

Add the property in the mapred-site.xml as shown below
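On a YARN-based Hadoop 2.x setup, the key setting in mapred-site.xml is normally mapreduce.framework.name set to yarn. The sketch below assumes that is the property meant here; the helper name write_mapred_site is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write a minimal mapred-site.xml that runs MapReduce on YARN.
#   $1 = output file
write_mapred_site() {
    local out_file="$1"
    cat > "$out_file" <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
EOF
}

# Example: write the file into the current configuration directory.
# write_mapred_site mapred-site.xml
```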


Edit yarn-site.xml as follows,

sudo nano yarn-site.xml

Add the property in the yarn-site.xml as shown below
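A typical single-node yarn-site.xml enables the MapReduce shuffle service on the NodeManager and names the ResourceManager host. The sketch below assumes the standard properties yarn.nodemanager.aux-services and yarn.resourcemanager.hostname; the helper name write_yarn_site is hypothetical, and the exact properties intended for this step may differ.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write a minimal yarn-site.xml.
#   $1 = output file, $2 = ResourceManager host name
write_yarn_site() {
    local out_file="$1" rm_host="$2"
    cat > "$out_file" <<EOF
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>${rm_host}</value>
  </property>
</configuration>
EOF
}

# Example using the host name that appears in this guide's Web UI URLs:
# write_yarn_site yarn-site.xml datamaking
```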


Run the below command to know the Hadoop version

hadoop version

Format the namenode using the command below (this is a one-time activity in the Apache Hadoop cluster setup)

hadoop namenode -format

Navigate to the Hadoop configuration directory using the command below

cd /home/dmadmin/datamaking/softwares/hadoop-2.9.2/etc/hadoop

Open the Hadoop environment file to set the JAVA_HOME path

sudo nano
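The same change can be sketched as a shell function instead of a manual edit. It assumes the environment file is hadoop-env.sh in the configuration directory, which is where Hadoop 2.x reads JAVA_HOME from; the helper name set_java_home is hypothetical, and the in-place `sed -i` form is the GNU sed one shipped with Ubuntu.

```shell
#!/usr/bin/env bash
# Hypothetical helper: point the JAVA_HOME export in an environment file at a fixed JDK path.
#   $1 = environment file, $2 = JDK directory
set_java_home() {
    local env_file="$1" java_home="$2"
    if grep -q '^export JAVA_HOME=' "$env_file"; then
        # Rewrite the existing export line; '|' is the delimiter because the path contains '/'
        sed -i "s|^export JAVA_HOME=.*|export JAVA_HOME=${java_home}|" "$env_file"
    else
        # No export line yet, so add one at the end of the file
        echo "export JAVA_HOME=${java_home}" >> "$env_file"
    fi
}

# Example with the JDK path installed earlier in this guide (file name assumed):
# set_java_home hadoop-env.sh /usr/lib/jvm/java-8-oracle-cloudera
```

Hard-coding the path in this file matters because the Hadoop daemons are launched over SSH and may not inherit JAVA_HOME from .bashrc.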

Run the below command to provide permission for the directories/sub-directories of softwares

sudo chown -R dmadmin:dmadmin /home/dmadmin/datamaking/softwares


Run the below command to check that the required Hadoop components/processes have started
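One way to run this check is sketched below. It assumes the daemons were started with Hadoop's start-dfs.sh and start-yarn.sh scripts and are listed with the JDK's jps tool; the helper name missing_daemons and the list of five expected daemons are assumptions for a single-node setup.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read jps-style output on stdin and print every
# expected Hadoop daemon that does not appear in it.
missing_daemons() {
    local listing daemon
    listing="$(cat)"
    for daemon in NameNode DataNode SecondaryNameNode ResourceManager NodeManager; do
        # jps prints "<pid> <name>"; compare the whole second field so that
        # "NameNode" is not satisfied by "SecondaryNameNode"
        if ! printf '%s\n' "$listing" |
            awk -v d="$daemon" '$2 == d { ok = 1 } END { exit (ok ? 0 : 1) }'; then
            echo "$daemon"
        fi
    done
}

# Typical use once the daemons have been started:
# start-dfs.sh
# start-yarn.sh
# jps | missing_daemons
```

No output from missing_daemons means all five processes were found in the jps listing.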


Check the NameNode Web UI using the URL below

NameNode Web UI: http://datamaking:50070

Check the YARN Web UI using the URL below

YARN Web UI: http://datamaking:8088


We have learned how to install Apache Hadoop 2.9.2 on Ubuntu 18.04 and how to verify that all the Hadoop processes and Web UIs are running properly.

Please share your feedback and suggestions on this blog post.

Happy Learning !!!
