Prerequisites
- Ubuntu 18.04 Operating System on Oracle VirtualBox
- Good internet connection on your system
- A laptop/desktop with at least 8 GB RAM, 50 to 100 GB of free hard disk space, and a reasonably fast processor
Introduction to Apache Hadoop
The Apache Hadoop project develops open-source software for reliable, scalable, distributed computing. The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models.
Download Apache Hadoop
Log in to the Ubuntu VM and open a Terminal window (a black console screen, similar to the Windows command prompt).
Install Java
Get the Repo file from this URL: https://archive.cloudera.com/cm6/6.3.0/ubuntu1804/apt/cloudera-manager.list
Download the Repo file using the wget command in the Terminal
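For example, using the URL above:
wget https://archive.cloudera.com/cm6/6.3.0/ubuntu1804/apt/cloudera-manager.list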
Copy the Repo file into the /etc/apt/sources.list.d/ location using the below command
sudo cp cloudera-manager.list /etc/apt/sources.list.d/
Download and Import the repository signing GPG key
sudo wget https://archive.cloudera.com/cm6/6.3.0/ubuntu1804/apt/archive.key
sudo apt-key add archive.key
Update your system package index by running the below command
sudo apt-get update
Install Java using the below command
sudo apt-get install oracle-j2sdk1.8
Java will be installed at this location: /usr/lib/jvm/java-8-oracle-cloudera
ls /usr/lib/jvm/java-8-oracle-cloudera
Set JAVA_HOME in the ~/.bashrc file using the below command
sudo nano ~/.bashrc
Paste the below lines at the end of the file
export JAVA_HOME=/usr/lib/jvm/java-8-oracle-cloudera
export PATH=$PATH:$JAVA_HOME/bin
Run the below command to refresh the .bashrc file
source ~/.bashrc
Run the below command to check the version of Java installed
java -version
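If Java was installed correctly, the output should report a 1.8.x version (the exact build number will vary from system to system).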
Install OpenSSH server & client using the below command
sudo apt-get install openssh-server openssh-client
Type "Y" continue
Create the SSH key for passwordless login (press Enter when it asks you for a filename to save the key)
ssh-keygen -t rsa -P ""
Copy the generated SSH key to the authorized_keys file
cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys
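Optionally, tighten the permissions on the authorized_keys file, since sshd may reject keys in a file that is too widely writable:
chmod 0600 $HOME/.ssh/authorized_keys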
Connect to localhost using OpenSSH by running the below command
ssh localhost (type "yes" if a host-authenticity prompt comes up)
Exit from the ssh connection using the below command
exit
Open Apache Hadoop official website and click on the "Download" button
URL: https://hadoop.apache.org
Click on the Binary download link for version 2.9.2 as shown below
URL: https://hadoop.apache.org/releases.html
https://www.apache.org/dyn/closer.cgi/hadoop/common/hadoop-2.9.2/hadoop-2.9.2.tar.gz
Click on the link below the text "We suggest the following mirror site for your download:"
http://apachemirror.wuchna.com/hadoop/common/hadoop-2.9.2/hadoop-2.9.2.tar.gz
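If you prefer the Terminal, the same file can be fetched with wget (substitute the mirror URL suggested for you):
wget http://apachemirror.wuchna.com/hadoop/common/hadoop-2.9.2/hadoop-2.9.2.tar.gz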
Right-click the downloaded file and click on "Open Containing Folder"; it will point to /tmp/..
Copy the downloaded Hadoop binary file into /home/dmadmin/datamaking/softwares manually
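A Terminal equivalent, assuming the browser saved the file as /tmp/hadoop-2.9.2.tar.gz (adjust the source path to match your actual download location):
mkdir -p /home/dmadmin/datamaking/softwares
cp /tmp/hadoop-2.9.2.tar.gz /home/dmadmin/datamaking/softwares/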
Install Apache Hadoop
Navigate to the location /home/dmadmin/datamaking/softwares using the below command
cd datamaking/softwares/
Change the binary file permission using below command
sudo chmod a+x hadoop-2.9.2.tar.gz
Extract the binary file
sudo tar -xzvf hadoop-2.9.2.tar.gz
Hadoop Installation Path(location) will be: /home/dmadmin/datamaking/softwares/hadoop-2.9.2
Run the below command
cd hadoop-2.9.2/
Add the HADOOP_HOME and JAVA_HOME paths in the bash file (.bashrc)
sudo nano ~/.bashrc
Add the below Hadoop path information into .bashrc file
# HADOOP VARIABLES SETTINGS START HERE
export JAVA_HOME=/usr/lib/jvm/java-8-oracle-cloudera
export HADOOP_INSTALL=/home/dmadmin/datamaking/softwares/hadoop-2.9.2
export PATH=$PATH:$HADOOP_INSTALL/bin
export PATH=$PATH:$HADOOP_INSTALL/sbin
export HADOOP_MAPRED_HOME=$HADOOP_INSTALL
export HADOOP_COMMON_HOME=$HADOOP_INSTALL
export HADOOP_HDFS_HOME=$HADOOP_INSTALL
export YARN_HOME=$HADOOP_INSTALL
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_INSTALL/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_COMMON_LIB_NATIVE_DIR"
# HADOOP VARIABLES SETTINGS END HERE
Run the below command to refresh the bash file (.bashrc)
source ~/.bashrc
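A quick sanity check that the new variables are active:
echo $JAVA_HOME
echo $HADOOP_INSTALL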
Find the host name by running the below command
hostname -f
Edit the /etc/hosts file so that this hostname maps to 127.0.0.1
sudo nano /etc/hosts
From this,
127.0.0.1 localhost
127.0.1.1 datamaking
To this,
127.0.0.1 localhost
127.0.0.1 datamaking
Create or Modify Hadoop configuration files
Now edit the configuration files in the /home/dmadmin/datamaking/softwares/hadoop-2.9.2/etc/hadoop directory.
Create the masters file and edit it as follows,
cd /home/dmadmin/datamaking/softwares/hadoop-2.9.2/etc/hadoop
sudo nano masters
Add the hostname in the masters file as shown below
datamaking
Edit the slaves file as follows,
sudo nano slaves
Add the hostname in the slaves file as shown below
datamaking
Edit core-site.xml as follows,
sudo nano core-site.xml
Add the below properties inside the <configuration> element of core-site.xml
<property> <name>hadoop.tmp.dir</name> <value>/home/dmadmin/datamaking/softwares/hadoop_data/tmp</value> <description>Parent directory for other temporary directories.</description> </property> <property> <name>fs.defaultFS</name> <value>hdfs://datamaking:9000</value> <description>The name of the default file system. </description> </property>
Run the below command to create the directory
sudo mkdir -p /home/dmadmin/datamaking/softwares/hadoop_data/tmp
Edit hdfs-site.xml as follows,
sudo nano hdfs-site.xml
Add the below properties inside the <configuration> element of hdfs-site.xml
<property> <name>dfs.namenode.name.dir</name> <value>/home/dmadmin/datamaking/softwares/hadoop_data/namenode</value> </property> <property> <name>dfs.datanode.data.dir</name> <value>/home/dmadmin/datamaking/softwares/hadoop_data/datanode</value> </property> <property> <name>dfs.replication</name> <value>1</value> </property>
Run the below command to create the directory for the namenode
sudo mkdir -p /home/dmadmin/datamaking/softwares/hadoop_data/namenode
Run the below command to create the directory for the datanode
sudo mkdir -p /home/dmadmin/datamaking/softwares/hadoop_data/datanode
Run the below command to set ownership of the hadoop_data directory and its sub-directories
sudo chown -R dmadmin:dmadmin /home/dmadmin/datamaking/softwares/hadoop_data
Copy the mapred-site.xml file from the template in the configuration folder and then edit mapred-site.xml as follows,
sudo cp mapred-site.xml.template mapred-site.xml
sudo nano mapred-site.xml
Add the below property inside the <configuration> element of mapred-site.xml
<property> <name>mapreduce.framework.name</name> <value>yarn</value> </property>
Edit yarn-site.xml as follows,
sudo nano yarn-site.xml
Add the below properties inside the <configuration> element of yarn-site.xml
<property> <name>yarn.nodemanager.aux-services</name> <value>mapreduce_shuffle</value> </property> <property> <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name> <value>org.apache.hadoop.mapred.ShuffleHandler</value> </property> <property> <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name> <value>org.apache.hadoop.mapred.ShuffleHandler</value> </property> <property> <name>yarn.resourcemanager.webapp.address</name> <value>datamaking:8088</value> </property>
Run the below command to check the Hadoop version
hadoop version
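The first line of the output should read "Hadoop 2.9.2".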
Format the namenode using the below command (this is a one-time activity in the Apache Hadoop cluster setup)
hdfs namenode -format
Navigate to the Hadoop configuration folder/directory using below command
cd /home/dmadmin/datamaking/softwares/hadoop-2.9.2/etc/hadoop
Open the Hadoop environment file (hadoop-env.sh) to set the JAVA_HOME path
sudo nano hadoop-env.sh
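Inside hadoop-env.sh, replace the existing JAVA_HOME line with the explicit path used earlier in .bashrc:
export JAVA_HOME=/usr/lib/jvm/java-8-oracle-cloudera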
Run the below command to set ownership of the softwares directory and its sub-directories
sudo chown -R dmadmin:dmadmin /home/dmadmin/datamaking/softwares
Start all the Hadoop daemons using the below command
start-all.sh
or start HDFS and YARN separately,
start-dfs.sh
start-yarn.sh
Run the below command to check that the required Hadoop components/processes have started
jps
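On a healthy single-node setup, the jps output should list processes like the following (process IDs will differ):
NameNode
DataNode
SecondaryNameNode
ResourceManager
NodeManager
Jps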
Check the NameNode Web UI using the below URL
NameNode Web UI: http://datamaking:50070
Check the YARN Web UI using the below URL
YARN Web UI: http://datamaking:8088
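To shut the cluster down later, use the matching stop scripts:
stop-all.sh
or
stop-dfs.sh
stop-yarn.sh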
Summary
We have learned how to install Apache Hadoop 2.9.2 on Ubuntu 18.04 and verify that all the Hadoop processes and Web UIs are running properly. Please share your feedback and suggestions on this blog post.
Happy Learning !!!