Installation of Single-Node Hadoop 2.2.0 on Ubuntu


As a prerequisite, prepare the environment:

    1. Before installing any application or software, update the package lists from all repositories:

root@roma-ubuntu:~$ sudo apt-get update

    2. Hadoop requires Java v1.7+. Install OpenJDK 7 and verify the version:

root@roma-ubuntu:~$ sudo apt-get install openjdk-7-jdk
root@roma-ubuntu:~$ java -version
java version "1.7.0_25"
OpenJDK Runtime Environment (IcedTea 2.3.12) (7u25-2.3.12-4ubuntu3)
OpenJDK 64-Bit Server VM (build 23.7-b01, mixed mode)
root@roma-ubuntu:~$ cd /usr/lib/jvm
root@roma-ubuntu:/usr/lib/jvm$ ln -s java-7-openjdk-amd64 jdk
root@roma-ubuntu:/usr/lib/jvm$ sudo apt-get install openssh-server
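To confirm the jdk symlink points at the OpenJDK install (an optional check, not part of the original steps), you can resolve it:

$ readlink -f /usr/lib/jvm/jdk

This should print /usr/lib/jvm/java-7-openjdk-amd64, the directory we will later use as JAVA_HOME.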

Add Dedicated Hadoop Group and User:

    1. Add the hadoop group:

root@roma-ubuntu:/usr/lib/jvm# sudo addgroup hadoop
Adding group `hadoop' (GID 1001) ...
Done.

    2. Create a user and add it to the group (you will be prompted for a password and user details):

root@roma-ubuntu:/usr/lib/jvm# sudo adduser --ingroup hadoop hduser
Adding user `hduser' ...
Adding new user `hduser' (1001) with group `hadoop' ...
Creating home directory `/home/hduser' ...
Copying files from `/etc/skel' ...
Enter new UNIX password:
Retype new UNIX password:
passwd: password updated successfully
Changing the user information for hduser
Enter the new value, or press ENTER for the default
Full Name []: Roma
Room Number []: 1
Work Phone []:
Home Phone []:
Other []:
Is the information correct? [Y/n] y
root@roma-ubuntu:/usr/lib/jvm#
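The rest of this guide runs commands as hduser. One way to switch to that account (my suggestion; logging in directly also works):

$ su - hduser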

Set Up SSH Key-Based Authentication:
Hadoop uses SSH to manage its daemons: in a multi-node cluster the master node logs in to the slave nodes (and the secondary namenode) to start and stop them, and a single-node setup does the same over localhost.
We therefore need to configure passwordless SSH access to localhost for the hduser user we created:

    1. Generate an RSA key pair with an empty passphrase (so Hadoop can log in without prompting):

hduser@roma-ubuntu:~$ ssh-keygen -t rsa -P ""
Generating public/private rsa key pair.
Enter file in which to save the key (/home/hduser/.ssh/id_rsa):
Created directory '/home/hduser/.ssh'.
Your identification has been saved in /home/hduser/.ssh/id_rsa.
Your public key has been saved in /home/hduser/.ssh/id_rsa.pub.
The key fingerprint is:
f5:a9:29:c2:a1:e6:61:1a:73:f2:bb:b2:bd:fa:30:e2 hduser@roma-ubuntu
The key's randomart image is:
+--[ RSA 2048]----+
|                 |
|                 |
| .               |
| . . .           |
|  . S o          |
|   o . o         |
|. * * o . o      |
|...# . . .       |
| E+=O+           |
+-----------------+

    2. Enable SSH access to your local machine with the newly created key:

hduser@roma-ubuntu:~$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys

    3. Finally, test the SSH setup by connecting to your local machine as the hduser user:

hduser@roma-ubuntu:~$ ssh localhost
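The first connection will ask you to confirm the host's fingerprint; answer yes. If key-based login is working, you will not be prompted for a password. A scripted way to check this (my addition, not from the original):

$ ssh localhost 'echo SSH OK'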

Download Hadoop 2.2.0
$ cd ~
$ wget http://www.trieuvan.com/apache/hadoop/common/hadoop-2.2.0/hadoop-2.2.0.tar.gz
$ sudo tar vxzf hadoop-2.2.0.tar.gz -C /usr/local
The archive extracts to /usr/local/hadoop-2.2.0. Rename it to a directory of your choice; I moved it to /usr/local/hadoop:
$ cd /usr/local
$ sudo mv hadoop-2.2.0 hadoop
Make sure to change the owner of all the files to the hduser user and the hadoop group by using this command:
$ sudo chown -R hduser:hadoop hadoop
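To verify the ownership change took effect (an optional check):

$ ls -ld /usr/local/hadoop

The owner and group columns should now read hduser and hadoop.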

Set Up Hadoop Environment Variables
The following files are required to configure the single-node Hadoop cluster. I will be updating them with the pico editor; you can use vi as well:

  1. $HOME/.bashrc
  2. yarn-site.xml
  3. core-site.xml
  4. mapred-site.xml
  5. hdfs-site.xml
  6. hadoop-env.sh
    • Update $HOME/.bashrc by pasting the following code at the end of the file:

$ cd ~
$ pico .bashrc

#Hadoop variables
export JAVA_HOME=/usr/lib/jvm/jdk/
export HADOOP_INSTALL=/usr/local/hadoop
export PATH=$PATH:$HADOOP_INSTALL/bin
export PATH=$PATH:$HADOOP_INSTALL/sbin
export HADOOP_MAPRED_HOME=$HADOOP_INSTALL
export HADOOP_COMMON_HOME=$HADOOP_INSTALL
export HADOOP_HDFS_HOME=$HADOOP_INSTALL
export YARN_HOME=$HADOOP_INSTALL
###end of paste
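To apply the new variables to your current shell without logging out, reload the file (a standard shell step, not shown in the original):

$ source ~/.bashrc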
    • Paste the following within the <configuration> tag of yarn-site.xml:

$ cd /usr/local/hadoop/etc/hadoop
$ pico yarn-site.xml

<configuration>
<!-- Site specific YARN configuration properties -->
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
</configuration>
    • Paste the following within the <configuration> tag of core-site.xml:

$ pico core-site.xml

<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
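Note that fs.default.name is deprecated in Hadoop 2.x in favor of fs.defaultFS; the old name still works here but will produce a deprecation warning in the logs.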
    • Create two directories to be used by the namenode and the datanode:

$ cd ~
$ mkdir -p mydata/hdfs/namenode
$ mkdir -p mydata/hdfs/datanode
$ cd /usr/local/hadoop/etc/hadoop
$ pico hdfs-site.xml

<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/home/hduser/mydata/hdfs/namenode</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:/home/hduser/mydata/hdfs/datanode</value>
  </property>
</configuration>
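Two things to note here: dfs.replication is set to 1 because a single-node cluster has only one datanode (the default of 3 would leave blocks under-replicated), and the two dir properties must point at the namenode and datanode directories created above; if you chose different locations, adjust both places.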
    • Rename the template file and paste the following within the <configuration> tag of mapred-site.xml:

$ mv mapred-site.xml.template mapred-site.xml
$ pico mapred-site.xml

<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
    • Paste the following at the end of hadoop-env.sh:

$ pico hadoop-env.sh

#modify JAVA_HOME
export JAVA_HOME=/usr/lib/jvm/jdk/

Log back in to Ubuntu as hduser and check the Hadoop version:
hduser@roma-ubuntu:~$ hadoop version
Hadoop 2.2.0
Subversion https://svn.apache.org/repos/asf/hadoop/common -r 1529768
Compiled by hortonmu on 2013-10-07T06:28Z
Compiled with protoc 2.5.0
From source with checksum 79e53ce7994d1628b240f09af91e1af4
This command was run using /usr/local/hadoop/share/hadoop/common/hadoop-common-2.2.0.jar
hduser@roma-ubuntu:~$
At this point, Hadoop is installed.

Formatting and Starting/Stopping the HDFS file system via the NameNode:
The first step in starting up your Hadoop installation is formatting the Hadoop file system, which is implemented on top of the local file system of your cluster. You only need to do this the first time you set up a Hadoop cluster.
NOTE: Do not format a running Hadoop file system as you will lose all the data currently in the cluster (in HDFS).

$ hdfs namenode --format
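If formatting succeeds, the command's output should end with a message indicating that the namenode storage directory (here /home/hduser/mydata/hdfs/namenode) has been successfully formatted.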

Start Hadoop services:
hduser@roma-ubuntu:~$ start-dfs.sh
Starting namenodes on [localhost]
hduser@localhost's password:
localhost: starting namenode, logging to /usr/local/hadoop/logs/hadoop-hduser-namenode-roma-ubuntu.out
hduser@localhost's password:
localhost: starting datanode, logging to /usr/local/hadoop/logs/hadoop-hduser-datanode-roma-ubuntu.out
Starting secondary namenodes [0.0.0.0]
hduser@0.0.0.0's password:
0.0.0.0: starting secondarynamenode, logging to /usr/local/hadoop/logs/hadoop-hduser-secondarynamenode-roma-ubuntu.out

hduser@roma-ubuntu:~$ start-yarn.sh
starting yarn daemons
starting resourcemanager, logging to /usr/local/hadoop/logs/yarn-hduser-resourcemanager-roma-ubuntu.out
hduser@localhost's password:
localhost: starting nodemanager, logging to /usr/local/hadoop/logs/yarn-hduser-nodemanager-roma-ubuntu.out
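Note: if you are prompted for a password at every daemon, as in the transcript above, key-based SSH login is not active; check that the key was generated with an empty passphrase and that id_rsa.pub was appended to ~/.ssh/authorized_keys.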

hduser@roma-ubuntu:~$ jps
3473 DataNode
3317 NameNode
3800 ResourceManager
4105 NodeManager
4205 Jps
3653 SecondaryNameNode
hduser@roma-ubuntu:~$
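You can also confirm the daemons are up from a browser (default ports for Hadoop 2.2.0): the NameNode web UI at http://localhost:50070 and the ResourceManager UI at http://localhost:8088.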

To check whether Hadoop is working, run one of the bundled examples:
hduser@roma-ubuntu:~$ cd /usr/local/hadoop
hduser@roma-ubuntu:/usr/local/hadoop$ hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar pi 2 5

You will see lots of processing going on:
Number of Maps = 2
Samples per Map = 5
13/10/21 18:41:03 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Wrote input for Map #0
Wrote input for Map #1
Starting Job
13/10/21 18:41:04 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
13/10/21 18:41:04 INFO input.FileInputFormat: Total input paths to process : 2
13/10/21 18:41:04 INFO mapreduce.JobSubmitter: number of splits:2
13/10/21 18:41:04 INFO Configuration.deprecation: user.name is deprecated. Instead, use mapreduce.job.user.name
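When the job finishes, it should print an estimated value of Pi; with only 2 maps and 5 samples each, expect the estimate to be quite rough.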


Stop Hadoop by running the following commands:
hduser@roma-ubuntu:~$ stop-dfs.sh
Stopping namenodes on [localhost]
hduser@localhost's password:
hduser@localhost's password: localhost: Permission denied, please try again.
localhost: Permission denied, please try again.
hduser@localhost's password:
localhost: stopping namenode
hduser@localhost's password:
localhost: stopping datanode
Stopping secondary namenodes [0.0.0.0]
hduser@0.0.0.0's password:
0.0.0.0: stopping secondarynamenode

hduser@roma-ubuntu:~$ stop-yarn.sh
stopping yarn daemons
stopping resourcemanager
hduser@localhost's password:
localhost: stopping nodemanager
localhost: nodemanager did not stop gracefully after 5 seconds: killing with kill -9
no proxyserver to stop
hduser@roma-ubuntu:~$
