Introduction to Apache Hadoop

Apache Hadoop is a software framework for the distributed processing of large data sets across clusters of computers. It combines a simple programming model called MapReduce with a distributed storage component called HDFS (Hadoop Distributed File System). It is designed to scale from a single machine to thousands of machines, each offering local computation and storage. Applications built on Apache Hadoop process their data in a distributed computing style.

Apache Hadoop has many ecosystem components. The two main components are:

Hadoop Distributed File System (HDFS) - Scalable Distributed Storage Component

MapReduce - Distributed Computing Framework
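To make the MapReduce idea concrete, here is a minimal single-machine sketch of a word count written in plain Python. It is not Hadoop code and does not use any Hadoop API; the function names (map_phase, shuffle_phase, reduce_phase, word_count) are made up for this illustration. On a real cluster the framework would run the map and reduce steps in parallel on many machines and handle the grouping step itself.

```python
from collections import defaultdict


def map_phase(line):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Shuffle step: group all emitted values by key (hypothetical helper;
    a real framework performs this between the map and reduce steps)."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce step: collapse all values for one key into a single total."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle and reduce in sequence on a list of text lines."""
    pairs = []
    for line in lines:
        pairs.extend(map_phase(line))
    grouped = shuffle_phase(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    sample = [
        "Hadoop stores data in HDFS",
        "Hadoop processes data with MapReduce",
    ]
    print(word_count(sample))
```

The point of the sketch is the shape of the computation: each map call looks at one line only, and each reduce call looks at one key only, which is what lets a framework spread the work across machines.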

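Apache Hadoop works on a master-slave architecture. In HDFS, for example, a master node called the NameNode keeps the file system metadata, while slave nodes called DataNodes store the actual data blocks.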
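The master-slave split in HDFS can be pictured with a toy example. In HDFS the master (the NameNode) records which slaves (the DataNodes) hold each block of a file. The sketch below assumes the Hadoop 2.x defaults of 128 MB blocks and a replication factor of 3, and it uses a simple round-robin placement that is only an assumption for illustration; the real HDFS placement policy is rack-aware and more involved. The function name plan_blocks and the DataNode names are hypothetical.

```python
import math

BLOCK_SIZE_MB = 128  # default HDFS block size in Hadoop 2.x
REPLICATION = 3      # default HDFS replication factor


def plan_blocks(file_size_mb, datanodes,
                block_size_mb=BLOCK_SIZE_MB, replication=REPLICATION):
    """Return {block_index: [datanode, ...]}, a toy version of the
    block-to-DataNode map that a master node would keep.

    Placement here is plain round-robin, an assumption made for
    illustration and not the actual HDFS placement policy.
    """
    if replication > len(datanodes):
        raise ValueError("replication factor exceeds number of DataNodes")
    num_blocks = math.ceil(file_size_mb / block_size_mb)
    placement = {}
    for block in range(num_blocks):
        placement[block] = [
            datanodes[(block + offset) % len(datanodes)]
            for offset in range(replication)
        ]
    return placement


if __name__ == "__main__":
    nodes = ["dn1", "dn2", "dn3", "dn4"]  # hypothetical DataNode names
    for block, holders in plan_blocks(300, nodes).items():
        print(f"block {block}: {holders}")
```

A 300 MB file needs three blocks at 128 MB each, and every block is stored on three different DataNodes, so the loss of one machine does not lose any data.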







Apache Hadoop Installation - Approach 1


  1. Install Oracle VirtualBox 6.0 on Windows 10
  2. Install Ubuntu 18.04 on Oracle VirtualBox
  3. Install Apache Hadoop 2.9.2 on Ubuntu 18.04

Apache Hadoop Installation - Approach 2

  1. Create a Gmail account and enable the free trial in GCP (Google Cloud Platform)
  2. Create a VM instance with Ubuntu 18.04 in Compute Engine
  3. Install Apache Hadoop 2.9.2 on Ubuntu 18.04


Happy Learning!!!
