Monday, November 25, 2013

Debugging and Testing MR Code within the IDE

One of the nightmares of writing MapReduce (MR) code is how difficult it is to debug and trace the program. Because the code runs as a job on a cluster, many newcomers find this very frustrating. One solution is to write the jobs and run them in local standalone mode, so that they can be debugged and tested like normal code from within the IDE and then deployed to a cluster. All of this should happen from within the same environment.

We were able to do this using Spring Hadoop and the Eclipse IDE. In short, I develop the jobs in Eclipse, debug and test them against a single standalone job tracker running inside the IDE, and finally deploy them to the real cluster.
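For orientation, local standalone mode comes down to a handful of Hadoop properties. The sketch below only collects them in a plain map. The class and method names are hypothetical, and which pair of keys applies is an assumption that depends on the Hadoop version: `mapred.job.tracker` and `fs.default.name` for MR1, `mapreduce.framework.name` and `fs.defaultFS` for MR2.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper (not part of Hadoop or Spring): lists the properties
// that make Hadoop run a job in-process against the local file system.
final class LocalModeSettings {

    private LocalModeSettings() {}

    static Map<String, String> localModeProperties() {
        Map<String, String> props = new LinkedHashMap<String, String>();
        // MR1 keys: "local" selects the in-process job runner instead of a remote JobTracker.
        props.put("mapred.job.tracker", "local");
        props.put("fs.default.name", "file:///");
        // MR2 equivalents of the two keys above.
        props.put("mapreduce.framework.name", "local");
        props.put("fs.defaultFS", "file:///");
        return props;
    }

    public static void main(String[] args) {
        // Print the settings as key=value lines.
        for (Map.Entry<String, String> e : localModeProperties().entrySet()) {
            System.out.println(e.getKey() + "=" + e.getValue());
        }
    }
}
```

Each entry can be copied onto a job's `Configuration` with `conf.set(key, value)`, or written as a `key=value` line in the Hadoop configuration used for the IDE run.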

Here is the Spring and Hadoop configuration, followed by the Java test class:


<beans xmlns="http://www.springframework.org/schema/beans"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xmlns:batch="http://www.springframework.org/schema/batch"
    xmlns:hdp="http://www.springframework.org/schema/hadoop"
    xmlns:context="http://www.springframework.org/schema/context"
    xmlns:util="http://www.springframework.org/schema/util"
    xsi:schemaLocation="http://www.springframework.org/schema/beans http://www.springframework.org/schema/beans/spring-beans-3.0.xsd
    http://www.springframework.org/schema/batch http://www.springframework.org/schema/batch/spring-batch-2.1.xsd
    http://www.springframework.org/schema/hadoop http://www.springframework.org/schema/hadoop/spring-hadoop.xsd
    http://www.springframework.org/schema/context http://www.springframework.org/schema/context/spring-context.xsd
    http://www.springframework.org/schema/util http://www.springframework.org/schema/util/spring-util-3.0.xsd">

    <!-- The two ZooKeeper settings are only required if you are connecting to HBase -->
    <hdp:configuration>
        hbase.zookeeper.quorum=xxxxx01
        hbase.zookeeper.property.clientPort=2181
        hbase.mapred.outputtable=xxxxxx
    </hdp:configuration>

    <hdp:job id="xxxxJob"
    output-path="xxxxx"
    jar-by-class="com.xxx.xxx.xxx.xxx.xxxxx"
    jar="classpath:xxxxx-0.0.1-job.jar"
    />

    <hdp:job-runner id="jobRunner" job-ref="xxxxJob"/>

</beans>
   
The sample code for testing this job is:

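import javax.inject.Inject;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;
import org.apache.log4j.Logger; // assuming log4j 1.x, which matches Logger.getLogger(Class)
import org.junit.Test;
import org.junit.runner.RunWith;
import org.springframework.data.hadoop.mapreduce.JobRunner;
import org.springframework.test.context.ContextConfiguration;
import org.springframework.test.context.junit4.SpringJUnit4ClassRunner;

@RunWith(SpringJUnit4ClassRunner.class)
@ContextConfiguration(locations={"/ApplicationContext.xml"})
public class XXXXTest {

    private static final Logger log = Logger.getLogger(XXXXTest.class);

    // Both beans come from the Spring context loaded above.
    @Inject
    JobRunner jobRunner;

    @Inject
    Job xxxxJob;

    @Test
    public void test() throws Exception {
        log.info("Started the test!!");

        // Any configuration that you need to perform upon the job should be done here,
        // for example conf.set(...).
        Configuration conf = xxxxJob.getConfiguration();

        // Run the job. Any exception propagates and fails the test
        // instead of being swallowed by a catch block.
        jobRunner.call();
    }
}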


Common pitfalls!!

Caused by: java.lang.RuntimeException: hbase-default.xml file seems to be for and old version of HBase (0.94.6-cdh4.4.0), this version is Unknown
    at org.apache.hadoop.hbase.HBaseConfiguration.checkDefaultsVersion(HBaseConfiguration.java:68)
    at org.apache.hadoop.hbase.HBaseConfiguration.addHbaseResources(HBaseConfiguration.java:100)
    at org.apache.hadoop.hbase.HBaseConfiguration.create(HBaseConfiguration.java:111)
    at org.apache.hadoop.hbase.HBaseConfiguration.create(HBaseConfiguration.java:120)
    at org.apache.hadoop.hbase.mapreduce.TableOutputFormat.setConf(TableOutputFormat.java:181)

This happens because of class-loading problems with HBase. Ideally, hbase-default.xml should be loaded from the bundled HBase jar, but sometimes this weird thing happens because it is loaded from elsewhere, perhaps from the HBase jars of the running project. It can be avoided by adding an hbase-default.xml that tells HBase to skip the version check.

Before making this change, also make sure there are no duplicate HBase jars in the library path that could be causing the problem.
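One way to check for duplicates is to ask the class loader for every location that provides hbase-default.xml. The following is a minimal sketch using only the JDK. The class name is hypothetical, and it assumes it is run with the same classpath as the failing test (for example from the same Eclipse launch configuration).

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;
import java.util.Collections;
import java.util.List;

// Hypothetical diagnostic: lists every classpath location that provides a resource.
final class ClasspathResourceFinder {

    private ClasspathResourceFinder() {}

    static List<URL> findAll(String resourceName) {
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        if (loader == null) {
            loader = ClassLoader.getSystemClassLoader();
        }
        try {
            // getResources returns one URL per jar or directory containing the resource.
            return Collections.list(loader.getResources(resourceName));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        for (URL url : findAll("hbase-default.xml")) {
            System.out.println(url);
        }
    }
}
```

If more than one URL is printed, more than one jar or directory on the classpath supplies the file, which is a likely source of the version mismatch.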

The contents of hbase-default.xml are:



<configuration>
    <property>
        <name>hbase.defaults.for.version.skip</name>
        <value>true</value>
        <description>Set to true to skip the 'hbase.defaults.for.version' check. Setting this to true can be useful in contexts other than the other side of a maven generation; i.e. running in an ide. You'll want to set this boolean to true to avoid seeing the RuntimException complaint: "hbase-default.xml file seems to be for and old version of HBase (0.92.1), this version is X.X.X-SNAPSHOT"</description>
    </property>
</configuration>
