The number of map tasks is determined by the number of input splits.
By default the split size equals the block size, so that data is partitioned correctly at block boundaries.
A job is divided into one map task per split, so the number of map tasks cannot be set directly.
It can only be changed by changing the number of splits.
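To make the split-to-map relationship concrete, here is a minimal Java sketch. It assumes non-empty, splittable (e.g. uncompressed) input files and is modeled on the split-size rule of the new-API FileInputFormat, splitSize = max(minSize, min(maxSize, blockSize)), where the last split of a file may be up to 10% larger than splitSize. The class and method names are hypothetical; this is an illustration, not Hadoop's code.

```java
import java.util.List;

// Simplified, hypothetical model of how input files become splits.
// One map task runs per split. Assumes non-empty, splittable files.
class SplitCountSketch {
    // The last split of a file may be up to 10% larger than splitSize.
    static final double SPLIT_SLOP = 1.1;

    // splitSize = max(minSize, min(maxSize, blockSize))
    static long computeSplitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    // Number of splits (and therefore map tasks) for one file.
    static int splitsForFile(long fileLength, long splitSize) {
        int splits = 0;
        long bytesRemaining = fileLength;
        while ((double) bytesRemaining / splitSize > SPLIT_SLOP) {
            splits++;
            bytesRemaining -= splitSize;
        }
        if (bytesRemaining != 0) {
            splits++; // leftover tail becomes the final split
        }
        return splits;
    }

    // Splits never span files, so the job's map count is the per-file sum.
    static int mapTasks(List<Long> fileLengths, long blockSize, long minSize, long maxSize) {
        long splitSize = computeSplitSize(blockSize, minSize, maxSize);
        int total = 0;
        for (long len : fileLengths) {
            total += splitsForFile(len, splitSize);
        }
        return total;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long blockSize = 64 * mb;
        List<Long> files = List.of(200 * mb, 10 * mb);
        // Default: split size equals block size.
        System.out.println(mapTasks(files, blockSize, 1, Long.MAX_VALUE));        // prints 5
        // Raising the minimum split size to 128 MB yields fewer, larger splits.
        System.out.println(mapTasks(files, blockSize, 128 * mb, Long.MAX_VALUE)); // prints 3
    }
}
```

With 64 MB blocks, a 200 MB file and a 10 MB file give 5 map tasks; raising the minimum split size (in Hadoop 1 this is the mapred.min.split.size property) to 128 MB drops the count to 3. Changing the split size is the lever for changing the number of maps.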
By default, at most 2 map tasks run concurrently on a node.
This limit can be changed on each node by setting the parameter mapred.tasktracker.map.tasks.maximum.
For example, on a 4-core machine you can raise it so that Hadoop runs more than 2 map tasks on that node.
This also applies if you are not running any reduce tasks.
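As an illustration, the fragment below would go in the mapred-site.xml of the 4-core node. The value 4 (one map slot per core) is an assumed choice, not a recommendation from these notes. Because this is a per-node TaskTracker setting, the TaskTracker on that node generally needs a restart to pick it up.

```xml
<!-- mapred-site.xml on the 4-core node; the value 4 is illustrative -->
<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>4</value>
</property>
```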
The number of reduce tasks can be set to zero by calling job.setNumReduceTasks(0);
Likewise, if one node in your cluster is a VM with a single core, you can limit the number of map tasks run on it to 1 or 2,
again by setting mapred.tasktracker.map.tasks.maximum,
in the mapred-site.xml of that node.
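A rough way to see the effect of these per-node limits: the cluster can run at most the sum of every node's mapred.tasktracker.map.tasks.maximum at once, so a job with more map tasks than that runs in several "waves". The Java sketch below is a back-of-the-envelope model that assumes equal-length map tasks and ignores scheduling details such as data locality. The node names, slot counts, and class name are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Back-of-the-envelope model, not Hadoop code: each node runs at most
// `slots` map tasks at once, so the cluster runs sum(slots) concurrently.
class MapSlotSketch {
    // Total concurrent map tasks the cluster can run.
    static int clusterMapSlots(Map<String, Integer> slotsPerNode) {
        int total = 0;
        for (int slots : slotsPerNode.values()) {
            total += slots;
        }
        return total;
    }

    // Number of rounds needed if every map task takes the same time.
    static int waves(int mapTasks, int totalSlots) {
        return (mapTasks + totalSlots - 1) / totalSlots; // ceiling division
    }

    public static void main(String[] args) {
        Map<String, Integer> nodes = new LinkedHashMap<>();
        nodes.put("quad-core-node", 4);  // raised from the default of 2
        nodes.put("default-node", 2);    // left at the default
        nodes.put("single-core-vm", 1);  // lowered for the small VM
        int slots = clusterMapSlots(nodes);
        // prints "7 concurrent maps, 3 waves for 20 maps"
        System.out.println(slots + " concurrent maps, " + waves(20, slots) + " waves for 20 maps");
    }
}
```

Raising the slot count on the larger node and lowering it on the weak VM changes both how many maps run at once and how evenly the load is spread.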
The default task timeout is 600 s (600000 ms): a task that neither reads input, writes output, nor updates its status for that long is killed. This can be changed by adding
<property>
<name>mapred.task.timeout</name>
<value>3600000</value> <!-- 1 hr; the value is in milliseconds -->
</property>
to mapred-site.xml.
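mapred.task.timeout is the number of milliseconds a task may go without reading input, writing output, or updating its status before the framework kills it. The Java sketch below models that rule and the unit conversion behind 600 s and 3600000 ms. It is an illustrative assumption, not Hadoop's implementation, and the class and method names are hypothetical.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical model of the task-timeout rule: a task silent for longer
// than the timeout is treated as hung and killed.
class TaskTimeoutSketch {
    static final long DEFAULT_TIMEOUT_MS = TimeUnit.SECONDS.toMillis(600); // 600000

    // lastProgressMs: when the task last read input, wrote output, or reported status.
    static boolean shouldKill(long lastProgressMs, long nowMs, long timeoutMs) {
        return nowMs - lastProgressMs > timeoutMs;
    }

    public static void main(String[] args) {
        long oneHourMs = TimeUnit.HOURS.toMillis(1);        // 3600000
        long silentFor = TimeUnit.MINUTES.toMillis(15);     // 900000
        System.out.println(shouldKill(0, silentFor, DEFAULT_TIMEOUT_MS)); // prints true
        System.out.println(shouldKill(0, silentFor, oneHourMs));          // prints false
    }
}
```

Raising the timeout is one option. A task that legitimately spends a long time between records can instead report progress (for example `Reporter.progress()` in the old API, or `progress()` on the task context in the new API), which resets the silence counter in this model.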