Download the JDK from the Oracle site.
Install it.
It will typically install under
/usr/java/
Now use alternatives to point java at the Oracle JDK:
alternatives --install /usr/bin/java java /usr/java/jdk1.6.0_45/bin/java 1600
Update the user's ~/.bash_profile so that JAVA_HOME points to the JDK directory (not to the java binary inside it):
export JAVA_HOME=/usr/java/jdk1.6.0_45
Do this on all the nodes where Hadoop runs.
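A quick way to catch a wrong JAVA_HOME on a node is to check that it names a directory containing an executable bin/java. The following is only a sketch; check_java_home is a made-up helper name, not something shipped with the JDK or CDH.

```shell
# Hypothetical helper: succeed if the given directory looks like a JDK home,
# i.e. it contains an executable file bin/java.
# The argument defaults to the current $JAVA_HOME.
check_java_home() {
    jh_dir="${1:-$JAVA_HOME}"
    if [ -z "$jh_dir" ]; then
        echo "JAVA_HOME is not set" >&2
        return 1
    fi
    if [ ! -f "$jh_dir/bin/java" ] || [ ! -x "$jh_dir/bin/java" ]; then
        echo "no executable java under $jh_dir/bin" >&2
        return 1
    fi
    echo "JAVA_HOME looks fine: $jh_dir"
}
```

Running check_java_home with no argument after logging in again tests the value exported from ~/.bash_profile.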
Add the Cloudera repo
sudo yum --nogpgcheck localinstall cloudera-cdh-4-0.x86_64.rpm
Install the JobTracker
sudo yum install hadoop-0.20-mapreduce-jobtracker
This installs files into the following directories:
logs - /var/log/hadoop-0.20-mapreduce/
libs - /usr/lib/hadoop/ , /usr/lib/hadoop-0.20-mapreduce/
service daemon - /etc/rc.d/init.d/hadoop-0.20-mapreduce-jobtracker
configs - /etc/hadoop , /etc/default/hadoop
Install the NameNode
sudo yum install hadoop-hdfs-namenode
service daemon - /etc/rc.d/init.d/hadoop-hdfs-namenode
script - /usr/lib/hadoop-hdfs/sbin/refresh-namenodes.sh
Install the Secondary NameNode
Ideally this should be on a machine different from the master.
On the slave nodes install:
sudo yum install hadoop-0.20-mapreduce-tasktracker hadoop-hdfs-datanode
daemon service - /etc/rc.d/init.d/hadoop-0.20-mapreduce-tasktracker
library - /usr/lib/hadoop-hdfs/ /usr/lib/hadoop-0.20-mapreduce/ /usr/lib/hadoop-0.20-mapreduce/contrib/
others:
/usr/lib/hadoop-0.20-mapreduce/bin/hadoop-config.sh
/usr/lib/hadoop-0.20-mapreduce/bin/hadoop-daemons.sh
/usr/lib/hadoop-0.20-mapreduce/bin/hadoop
/usr/lib/hadoop-0.20-mapreduce/bin/hadoop-daemon.sh
After this follow the instructions in
http://www.cloudera.com/content/cloudera-content/cloudera-docs/CDH4/4.2.0/CDH4-Installation-Guide/cdh4ig_topic_4_4.html#../CDH4-Installation-Guide/cdh4ig_topic_11_2.html?scroll=topic_11_2_2_unique_1
line by line.
First configure HDFS on the system, then configure the JobTracker/TaskTracker daemons.
One point to note: the NameNode and DataNode directories must be owned by hdfs:hdfs, and the MapReduce local directory must be owned by mapred:hadoop.
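The ownership rules can be turned into commands as sketched below. The script only prints the commands so they can be reviewed first. The directory paths /data/1/dfs/nn and /data/1/mapred/local are assumed examples and must match whatever is set in your own hdfs-site.xml and mapred-site.xml; print_dir_setup is a made-up helper name.

```shell
# Hypothetical helper: print (do not run) the commands that create each
# directory and hand it to the given owner.
print_dir_setup() {
    ds_owner="$1"
    shift
    for ds_dir in "$@"; do
        echo "mkdir -p $ds_dir"
        echo "chown -R $ds_owner $ds_dir"
    done
}

# HDFS directories belong to hdfs:hdfs (example paths, adjust to your config).
print_dir_setup hdfs:hdfs /data/1/dfs/nn /data/1/dfs/dn
# The MapReduce local directory belongs to mapred:hadoop (example path).
print_dir_setup mapred:hadoop /data/1/mapred/local
```

If the printed commands look right, they can be run by hand with sudo on each node.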
When configuring the directories for the NameNode and DataNodes, there are a few points to take care of:
- NameNode directories store the metadata and edit logs.
- DataNode directories store the block pools.
You can also list multiple directories for the NameNode metadata.
The second one can ideally be on a high-performance NFS mount, so that the metadata is still available there if the local disk fails.
You can configure multiple volumes for the DataNode, for example:
dfs.datanode.data.dir
/data/1/dfs/dn,/data/2/dfs/dn,/data/3/dfs/dn
So if /data/1/dfs/dn fails, it tries the next volume.
You can also set how many failed volumes are tolerated, using this parameter in hdfs-site.xml:
dfs.datanode.failed.volumes.tolerated
This is the number of volumes that may fail before the DataNode stops offering service. The value has to be smaller than the number of configured volumes, so in the above example the largest usable value is 2: the DataNode then gives up only when all three volumes listed above have failed.
If multiple volumes are not configured, this value should ideally stay at 0 (the default) and need not be set.
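The two settings can be written out as hdfs-site.xml property elements with a small script. This is a sketch only: hdfs_property, count_volumes and check_tolerated are made-up helpers, and the check encodes my understanding that the DataNode rejects a tolerated value that is not smaller than the number of volumes.

```shell
# Hypothetical helper: print one <property> element for hdfs-site.xml.
hdfs_property() {
    printf '<property>\n  <name>%s</name>\n  <value>%s</value>\n</property>\n' "$1" "$2"
}

# Count the comma-separated entries in a dfs.datanode.data.dir value.
count_volumes() {
    echo "$1" | tr ',' '\n' | grep -c .
}

# Succeed only if the tolerated value ($2) is at least 0 and smaller than
# the number of volumes in the directory list ($1).
check_tolerated() {
    [ "$2" -ge 0 ] && [ "$2" -lt "$(count_volumes "$1")" ]
}

dn_dirs=/data/1/dfs/dn,/data/2/dfs/dn,/data/3/dfs/dn
hdfs_property dfs.datanode.data.dir "$dn_dirs"
# Only emit the tolerance setting when it is consistent with the volume list.
check_tolerated "$dn_dirs" 2 && hdfs_property dfs.datanode.failed.volumes.tolerated 2
```

The printed elements go inside the configuration element of hdfs-site.xml on each DataNode.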
Copy the configuration to the other nodes with scp:
sudo scp -r /etc/hadoop/conf.test_cluster/* testuser@test.com:/etc/hadoop/conf.test_cluster/
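With more than one node, the same copy can be repeated per host. The sketch below only prints the scp commands so the host list can be checked before anything is copied. The host names are the example nodes used elsewhere in these notes, and print_conf_sync is a made-up helper name.

```shell
# Hypothetical helper: print one scp command per host for the given
# configuration directory and remote user.
print_conf_sync() {
    cs_dir="$1"
    cs_user="$2"
    shift 2
    for cs_host in "$@"; do
        # The * is inside double quotes, so it is printed literally here
        # and only expanded when the printed command is actually run.
        echo "scp -r $cs_dir/* $cs_user@$cs_host:$cs_dir/"
    done
}

print_conf_sync /etc/hadoop/conf.test_cluster testuser hdnode1.test.com hdnode2.test.com
```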
Starting all services in Cloudera
The first time you use Hadoop, format HDFS. The command is
sudo -u hdfs hadoop namenode -format
It is important to run the format as the hdfs user. After that, start the HDFS services:
for x in `cd /etc/init.d ; ls hadoop-hdfs-*` ; do sudo service $x start ; done
Some FAQs and common issues
If you get the error below while trying to start a DN on a slave machine:
> java.io.IOException: Incompatible clusterIDs in /home/hadoop/dfs/data: namenode clusterID = ****; datanode clusterID = ****
>
> It is because after you set up your cluster, you, for whatever reason, decided to reformat your NN. Your DNs on slaves still bear reference to the old NN. To resolve this simply delete and recreate data folder on that machine in local Linux FS, namely /home/hadoop/dfs/data. Restarting that DN's daemon on that machine will recreate data/ folder's content and resolve the problem.
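Before deleting anything, the mismatch can be confirmed by comparing the clusterID lines of the two VERSION files. This sketch assumes each storage directory keeps its VERSION file under current/VERSION (for the DataNode above that would be /home/hadoop/dfs/data/current/VERSION) and that the NameNode's VERSION file has been copied over from the NN host. cluster_id and same_cluster are made-up helper names.

```shell
# Hypothetical helper: print the clusterID recorded in a VERSION file.
cluster_id() {
    sed -n 's/^clusterID=//p' "$1"
}

# Succeed if both VERSION files carry the same, non-empty clusterID.
# $1 = copy of the NameNode VERSION file, $2 = DataNode VERSION file.
same_cluster() {
    sc_nn=$(cluster_id "$1")
    sc_dn=$(cluster_id "$2")
    [ -n "$sc_nn" ] && [ "$sc_nn" = "$sc_dn" ]
}

# Example call (paths are assumptions, adjust to your layout):
# same_cluster /tmp/nn-VERSION /home/hadoop/dfs/data/current/VERSION || echo "clusterID mismatch"
```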
In CDH4 every Hadoop daemon is configured as a service.
The HDFS daemon names start with hadoop-hdfs,
while the MapReduce daemon names start with hadoop-0.20-mapreduce.
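Because of this naming scheme, the services installed on a node can be picked out by prefix. The sketch below lists the matching init scripts in a directory; list_services is a made-up helper name, and the commented loop shows how it could be used to start the MapReduce daemons.

```shell
# Hypothetical helper: print the names of the files in directory $1
# whose names start with prefix $2. Prints nothing if none match.
list_services() {
    for ls_path in "$1"/"$2"*; do
        # An unmatched glob stays literal, so skip paths that do not exist.
        [ -e "$ls_path" ] && basename "$ls_path"
    done
    return 0
}

# Example use (commented out because it starts real services):
# for svc in $(list_services /etc/init.d hadoop-0.20-mapreduce-); do
#     sudo service "$svc" start
# done
```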
Often the firewall blocks access between the nodes. We may have to add rules that open the relevant ports to overcome this.
| | Daemon | Default Port | Configuration Parameter |
|---|---|---|---|
| HDFS | Namenode | 50070 | dfs.http.address |
| | Datanodes | 50075 | dfs.datanode.http.address |
| | Secondarynamenode | 50090 | dfs.secondary.http.address |
| | Backup/Checkpoint node | 50105 | dfs.backup.http.address |
| MR | Jobtracker | 50030 | mapred.job.tracker.http.address |
| | Tasktrackers | 50060 | mapred.task.tracker.http.address |
| Daemon | Default Port | Configuration Parameter | Protocol | Used for |
|---|---|---|---|---|
| Namenode | 8020 | fs.default.name [1] | IPC: ClientProtocol | Filesystem metadata operations. |
| Datanode | 50010 | dfs.datanode.address | Custom Hadoop Xceiver: DataNode and DFSClient | DFS data transfer |
| Datanode | 50020 | dfs.datanode.ipc.address | IPC: InterDatanodeProtocol, ClientDatanodeProtocol, ClientProtocol | Block metadata operations and recovery |
| Backupnode | 50100 | dfs.backup.address | Same as namenode | HDFS Metadata Operations |
| Jobtracker | Ill-defined. [2] | mapred.job.tracker | IPC: JobSubmissionProtocol, InterTrackerProtocol | Job submission, task tracker heartbeats. |
| Tasktracker | 127.0.0.1:0 [3] | mapred.task.tracker.report.address | IPC: TaskUmbilicalProtocol | Communicating with child jobs |
[1] This is the port part of hdfs://host:8020/.
[2] Default is not well-defined. Common values are 8021, 9001, or 8012. See MAPREDUCE-566.
[3] Binds to an unused local port.
The above information is taken from the Cloudera site:
http://blog.cloudera.com/blog/2009/08/hadoop-default-ports-quick-reference/
We may have to write iptables rules to open these ports.
Ideally you add these rules on the line just above
REJECT all -- anywhere anywhere reject-with icmp-host-prohibited
that is
ACCEPT all -- hdnode1.test.com anywhere
ACCEPT all -- hdnode2.test.com anywhere
REJECT all -- anywhere anywhere reject-with icmp-host-prohibited
In my case the OS is CentOS, and the default INPUT chain ends with
REJECT all -- anywhere anywhere reject-with icmp-host-prohibited
so I added the ACCEPT rules just before it. This is OS specific. The point is just that the NameNode should accept traffic on the ports defined for the Hadoop services.
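One way to get the ACCEPT rules in front of the REJECT rule is to insert them at the REJECT rule's position, which `iptables -L INPUT --line-numbers` shows. The sketch below only prints the commands. print_accept_rules is a made-up helper name, and the position 5 in the example is an assumption that has to be replaced with the number seen on your own system.

```shell
# Hypothetical helper: print one iptables command per host, inserting an
# ACCEPT rule into the INPUT chain starting at rule position $1.
# The position is bumped after each host so the rules keep the given order
# and all end up just above the rule that used to sit at that position.
print_accept_rules() {
    ar_pos="$1"
    shift
    for ar_host in "$@"; do
        echo "iptables -I INPUT $ar_pos -s $ar_host -j ACCEPT"
        ar_pos=$((ar_pos + 1))
    done
}

# Example: the REJECT rule is assumed to be rule 5 in the INPUT chain.
print_accept_rules 5 hdnode1.test.com hdnode2.test.com
```

The printed commands need to be run as root, and the rules saved afterwards in whatever way the OS uses to persist iptables rules.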
Now on checking the HDFS status at https://:50070 it shows that
the service is available and the declared Hadoop DataNodes are LIVE.
In some cases you will see errors like
-10-01 12:43:02,084 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Block pool BP-1574577678-***************-1377251137260 (storage id DS-1585458778-***************-50010-1379678037696) service to ***.***.***.com/***************:8020 beginning handshake with NN
2013-10-01 12:43:02,097 FATAL org.apache.hadoop.hdfs.server.datanode.DataNode: Initialization failed for block pool Block pool BP-1574577678-***************-1377251137260 (storage id DS-1585458778-***************-50010-1379678037696) service to ***.***.***.com/***************:8020
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.protocol.DisallowedDatanodeException): Datanode denied communication with namenode: DatanodeRegistration(0.0.0.0, storageID=DS-1585458778-***************-50010-1379678037696, infoPort=50075, ipcPort=50020, storageInfo=lv=-40;cid=CID-ac14694a-bacb-4180-a747-464778d2d382;nsid=680798099;c=0)
at org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager.registerDatanode(DatanodeManager.java:648)
This happened for us when we were using DNS for hostname and IP resolution rather than adding the individual nodes to the /etc/hosts file.
We found that on the NN we needed to add the hostname and IP details of all slave nodes to /etc/hosts.
On each slave we need entries for the NN and for that slave alone.
Also manually configure the dfs.hosts file and add its full path in hdfs-site.xml.
After all that, try restarting the cluster. It should work now.
The error basically occurs when the NN or a slave fails to identify or resolve a name. For us, names resolved from the terminal but failed when running HDFS.
Even though we were not able to pinpoint the troublemaker, the above steps rectified it.
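To check that a node's hosts file really carries the entries it needs, the file can be scanned for each expected name. This is a sketch: missing_hosts is a made-up helper name, and it only looks at the file it is given, so it says nothing about what DNS would answer.

```shell
# Hypothetical helper: print every wanted hostname that has no entry in
# the hosts-format file $1. Remaining arguments are the names to look for.
missing_hosts() {
    mh_file="$1"
    shift
    for mh_name in "$@"; do
        awk -v n="$mh_name" '
            { sub(/#.*/, "") }    # strip comments before matching
            { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
            END { exit found ? 0 : 1 }
        ' "$mh_file" || echo "$mh_name"
    done
}

# Example on the NN (all slave names should be present):
# missing_hosts /etc/hosts hdnode1.test.com hdnode2.test.com
```

An empty output means every name was found, and any printed name still needs an entry.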