One of the nightmares of writing MapReduce (MR) code is how hard it is to debug and trace the program. Because the code runs as a job on a cluster, many newcomers find this very annoying. One solution is to write the jobs and run them in local standalone mode, so that you can debug and test them like normal code from within the IDE, and then deploy them to the cluster. All of this should happen from within the same development environment.
We were able to do this using Spring Hadoop and the Eclipse IDE. In short, I develop the jobs within Eclipse, debug and test them against a single standalone job tracker running inside the IDE, and then finally deploy them to the real cluster.
Here is the Spring and Hadoop configuration and the Java test class:
<beans xmlns="http://www.springframework.org/schema/beans"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xmlns:batch="http://www.springframework.org/schema/batch"
    xmlns:hdp="http://www.springframework.org/schema/hadoop"
    xmlns:context="http://www.springframework.org/schema/context"
    xmlns:util="http://www.springframework.org/schema/util"
    xsi:schemaLocation="http://www.springframework.org/schema/beans http://www.springframework.org/schema/beans/spring-beans-3.0.xsd
    http://www.springframework.org/schema/batch http://www.springframework.org/schema/batch/spring-batch-2.1.xsd
    http://www.springframework.org/schema/hadoop http://www.springframework.org/schema/hadoop/spring-hadoop.xsd
    http://www.springframework.org/schema/context http://www.springframework.org/schema/context/spring-context.xsd
    http://www.springframework.org/schema/util http://www.springframework.org/schema/util/spring-util-3.0.xsd">
  <hdp:configuration>
    # The two ZooKeeper settings are only required if you are connecting to HBase
    hbase.zookeeper.quorum=xxxxx01
    hbase.zookeeper.property.clientPort=2181
    hbase.mapred.outputtable=xxxxxx
  </hdp:configuration>
  <hdp:job id="xxxxJob"
    output-path="xxxxx"
    jar-by-class="com.xxx.xxx.xxx.xxx.xxxxx"
    jar="classpath:xxxxx-0.0.1-job.jar"
  />
</beans>
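One thing to watch in key=value property lines like the ones above: assuming the configuration body is read with java.util.Properties rules (which is how Spring Hadoop describes the body of hdp:configuration), anything after the value on the same line, such as a trailing // note, becomes part of the value. Comments are only recognized on their own line starting with # or !. The following is a minimal sketch of that behaviour using only the JDK; the class name InlineCommentCheck and the sample values are made up for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

final class InlineCommentCheck {

    /** Parses a key=value body using the standard java.util.Properties rules. */
    static Properties parse(String body) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(body));
        } catch (IOException e) {
            // Reading from an in-memory string should not fail; rethrow unchecked.
            throw new UncheckedIOException(e);
        }
        return props;
    }

    public static void main(String[] args) {
        // Trailing note on the same line: it ends up inside the value.
        String risky = "hbase.zookeeper.quorum=xxxxx01 // only for hbase\n";
        // Comment on its own line starting with '#': it is ignored.
        String safe = "# only for hbase\nhbase.zookeeper.quorum=xxxxx01\n";

        System.out.println("[" + parse(risky).getProperty("hbase.zookeeper.quorum") + "]");
        System.out.println("[" + parse(safe).getProperty("hbase.zookeeper.quorum") + "]");
    }
}
```

The first line printed contains the whole trailing note inside the brackets, the second only the host name. With the first form the ZooKeeper client would be handed a host name that does not exist.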
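The sample code for testing this job is:
import javax.inject.Inject;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;
import org.apache.log4j.Logger;
import org.junit.Test;
import org.junit.runner.RunWith;
import org.springframework.data.hadoop.mapreduce.JobRunner;
import org.springframework.test.context.ContextConfiguration;
import org.springframework.test.context.junit4.SpringJUnit4ClassRunner;

@RunWith(SpringJUnit4ClassRunner.class)
@ContextConfiguration(locations={"/ApplicationContext.xml"})
public class XXXXTest {
    @Inject
    JobRunner jobRunner;
    @Inject
    Job xxxxJob;

    @Test
    public void test() throws Exception {
        Logger log = Logger.getLogger(XXXXTest.class);
        log.info("Started the test!!");
        Configuration conf = xxxxJob.getConfiguration();
        // Any configuration that you need to perform upon the job should be done here !!
        // Let exceptions propagate so that a failed job fails the test
        // instead of only printing a stack trace.
        jobRunner.call();
    }
}
Common pitfalls!!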
Caused by: java.lang.RuntimeException: hbase-default.xml file seems to be for and old version of HBase (0.94.6-cdh4.4.0), this version is Unknown
at org.apache.hadoop.hbase.HBaseConfiguration.checkDefaultsVersion(HBaseConfiguration.java:68)
at org.apache.hadoop.hbase.HBaseConfiguration.addHbaseResources(HBaseConfiguration.java:100)
at org.apache.hadoop.hbase.HBaseConfiguration.create(HBaseConfiguration.java:111)
at org.apache.hadoop.hbase.HBaseConfiguration.create(HBaseConfiguration.java:120)
at org.apache.hadoop.hbase.mapreduce.TableOutputFormat.setConf(TableOutputFormat.java:181)
This happens because of class-loading problems with HBase. Ideally HBase should load hbase-default.xml from its own bundled jar, but sometimes this weird thing happens because the file is loaded from elsewhere, possibly from the HBase jars of the running project. It can be avoided by adding an hbase-default.xml that tells HBase to skip the version check.
Before making this change, also make sure there are no duplicate HBase jars in the library path that are causing the problem.
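One way to look for such duplicates is to ask the class loader for every location from which hbase-default.xml is visible. The following is a minimal sketch using only the JDK; the class name ClasspathDuplicates and the method locationsOf are made up for illustration, and it has to be run with the same classpath as the failing test (for example from a throwaway test method) to be meaningful.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;
import java.util.Collections;
import java.util.List;

final class ClasspathDuplicates {

    /** Returns every classpath location from which the named resource is visible. */
    static List<URL> locationsOf(String resourceName) {
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        if (loader == null) {
            loader = ClassLoader.getSystemClassLoader();
        }
        try {
            return Collections.list(loader.getResources(resourceName));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        List<URL> hits = locationsOf("hbase-default.xml");
        System.out.println(hits.size() + " copies of hbase-default.xml found");
        for (URL url : hits) {
            // Each URL names the jar or directory that contributes a copy.
            System.out.println("  " + url);
        }
    }
}
```

If more than one location is printed, the jars named in the URLs are the candidates to remove or align to a single HBase version.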
The contents of hbase-default.xml are:
<configuration>
  <property>
    <name>hbase.defaults.for.version.skip</name>
    <value>true</value>
    <description>Set to true to skip the 'hbase.defaults.for.version' check. Setting this to true can be useful in contexts other than the other side of a maven generation; i.e. running in an ide. You'll want to set this boolean to true to avoid seeing the RuntimException complaint: "hbase-default.xml file seems to be for and old version of HBase (0.92.1), this version is X.X.X-SNAPSHOT"</description>
  </property>
</configuration>
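The file follows the usual Hadoop configuration layout: a list of property entries, each with a name and a value. To confirm that the copy you added really carries the skip flag, a small check along the following lines may help. It is only a sketch using the JDK's XML parser; the class name HadoopXmlProperty and the method valueOf are made up for illustration and are not part of HBase or Hadoop.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class HadoopXmlProperty {

    /** Returns the value of the named property, or null if it is absent. */
    static String valueOf(String xml, String propertyName) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList properties = doc.getElementsByTagName("property");
            for (int i = 0; i < properties.getLength(); i++) {
                Element property = (Element) properties.item(i);
                if (propertyName.equals(text(property, "name"))) {
                    return text(property, "value");
                }
            }
            return null;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse configuration XML", e);
        }
    }

    /** Trimmed text of the first child element with the given tag, or null. */
    private static String text(Element parent, String tag) {
        NodeList nodes = parent.getElementsByTagName(tag);
        return nodes.getLength() == 0 ? null : nodes.item(0).getTextContent().trim();
    }

    public static void main(String[] args) {
        String xml = "<configuration><property>"
                + "<name>hbase.defaults.for.version.skip</name>"
                + "<value>true</value>"
                + "</property></configuration>";
        System.out.println(valueOf(xml, "hbase.defaults.for.version.skip"));
    }
}
```

In practice the XML string would be read from the hbase-default.xml that sits first on the test classpath rather than hard-coded as it is here.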