Monday, December 30, 2013

Triplestore-Bigdata - Cluster Notes

The WORM (Write Once, Read Many) will generate much larger files. Its
primary use is to buffer writes on a cluster before they are migrated
into read-optimized key-range shards (index segment files).

http://sourceforge.net/projects/bigdata/forums/forum/676946/topic/5903581 - storing pages in bigdata and issues

http://sourceforge.net/projects/bigdata/forums/forum/676946/topic/5908980 - multitenancy API

Data can be uploaded to the cluster in multiple ways:

Using the MappedRDFDataLoader - uses multiple threads to efficiently upload data across multiple data servers simultaneously. It uses the Sail interface and makes it easy to upload data; I have already tried it out by uploading data from mappers for data from FB. Note that the MappedRDFDataLoader is only possible with their cluster (Jini federation) installation.
Using the DataLoader - a much more rudimentary form of data upload, without the Sail interface. I hope this can be extended by us to upload data from the mappers.

bigdata® - Graph DB

Bigdata graph DB can be installed in one of the following modes:
Local Journal Model
Local DataService Federation
Local Embedded DataService Federation
Jini Federation


Get the bigdata source from http://bigdata.svn.sourceforge.net/viewvc/bigdata/tags/BIGDATA_RELEASE_1_2_2/
I downloaded the tag 1_2_2.

Once downloaded, set up the NAS.

I preferred NFS as my nodes were running Linux alone; Samba is an option if Windows nodes are also present.
Install nfs-kernel-server on your master machine,

and on the client machines install nfs-common (applicable to Ubuntu).
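
On Ubuntu the installs would look something like this (standard package names from the Ubuntu repositories):

sudo apt-get install nfs-kernel-server   # on the master (NFS server)
sudo apt-get install nfs-common          # on each client node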

Find the folder which you plan to share as a network folder and add its path to the /etc/exports file:
/home/****/NFSHome 192.168.***.0/255.255.255.0(rw)

Now run exportfs -r to export your folder to the network.
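
To confirm that the export is visible, you can check from the server itself (showmount ships with the NFS utilities):

exportfs -v
showmount -e localhost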

On the client machines, mount the file system. Test it first using the mount command:

mount -t nfs xxx:/home/xx/NFSHome /home/xx/NFSHome

Then add the mount to your fstab (see the example below) and run mount -a.
This should ensure that your NAS folder is set up successfully.
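
The matching /etc/fstab entry would look roughly like this (the host and paths below mirror the masked ones above and must be replaced with your own):

xxx:/home/xx/NFSHome  /home/xx/NFSHome  nfs  defaults  0  0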

Now you can run the install script on the machine where the source was unzipped.
Before that, modify the values in build.properties.
Follow the tutorial - http://sourceforge.net/apps/mediawiki/bigdata/index.php?title=ClusterGuide
I followed it and it was quite useful.

Once you run the install you will find the bigdata files installed in the NAS folder.
Move to your NAS folder and change or append more configuration, such as where your ZooKeeper server should run, in /config/bigdataStandalone.config.
By default the configuration it picks up is bigdataStandalone.config, which runs all the services on the master server.

Now run bigdataenv.sh to set the environment parameters.
After that run bigdata start and watch the logs.
In my case the initial tries failed because the ZooKeeper server did not start up, as the IP I set was not resolving correctly. Ensure the /etc/hosts entries are correct.
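
For reference, correct /etc/hosts entries are plain IP-to-hostname mappings like the following (the addresses and the second node name are placeholders):

127.0.0.1      localhost
192.168.1.10   graphmaster
192.168.1.11   graphnode1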

Once it got started, listServices.sh should display something like:

Waiting 5000ms for service discovery.
Zookeeper is running.
Discovered 1 jini service registrars.
   192.xx.xx.xx
Discovered 7 services
Discovered 0 stale bigdata services.
Discovered 6 live bigdata services.
Discovered 1 other services.
Bigdata services by serviceIface:
  There are 1 instances of com.bigdata.jini.start.IServicesManagerService on 1 hosts
  There are 1 instances of com.bigdata.journal.ITransactionService on 1 hosts
  There are 2 instances of com.bigdata.service.IDataService on 1 hosts
  There are 1 instances of com.bigdata.service.ILoadBalancerService on 1 hosts
  There are 1 instances of com.bigdata.service.IMetadataService on 1 hosts
Bigdata services by hostname:
  There are 6 live bigdata services on graphmaster
    There are 1 com.bigdata.jini.start.IServicesManagerService services
    There are 1 com.bigdata.journal.ITransactionService services
    There are 2 com.bigdata.service.IDataService services
    There are 1 com.bigdata.service.ILoadBalancerService services
    There are 1 com.bigdata.service.IMetadataService services

That's it! Bigdata started working on a single node. Soon I shall update on multi-node bigdata cluster configurations!

------------------------------------------------------------------------------------------------------------------------------

Starting bigdata
cd to the respective NFS folder where bigdata resides.
Select the node where you want to run ZooKeeper.
Run ./bigdataenv
Run ./bigdata start

To check whether the services have started up, run ./listServices.sh

Running the NanoSparqlServer -
nanoSparqlServer.sh port namespace
http://192.168.192.105:9292 - it should give you the web screen!
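
For example, something like the following (the namespace "kb" is just an illustrative name; the port matches the URL above):

./nanoSparqlServer.sh 9292 kb
curl http://192.168.192.105:9292/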

--------------------------------------------------------------------------------------

Running bigdata alongside Hadoop is an interesting challenge.
For this you could run the data service on each of the nodes that run Hadoop.
You could also make use of Hadoop's ZooKeeper instead of the ZooKeeper quorum being started up by bigdata.

For this, comment out the org.apache.zookeeper.server.quorum.QuorumPeerMain class in bigdataCluster.config along with the ZooKeeper configurations.
This should free a good amount of the memory required to run ZooKeeper and offload that work to the ZooKeeper used by Hadoop.






Monday, December 23, 2013

Cloudera in Fedora 20

If you are planning on using Cloudera in Fedora 20 you may end up with hiccups like:

Transaction check error:
  file /usr/bin/hadoop from install of hadoop-common-2.2.0-3.fc20.noarch conflicts with file from package hadoop-2.0.0+1518-1.cdh4.5.0.p0.24.el6.x86_64

This is because the new Fedora 20 ships its own Hadoop packages, which conflict with Cloudera's when you try to install them.

To remedy this, you can disable the other repos while installing Cloudera:

sudo yum --disablerepo="*" --enablerepo="cloudera*" install pig

You may also need to run

sudo yum install redhat-lsb

before running the above.
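
If you are unsure which repositories are pulling in the conflicting Hadoop packages, a quick check like this can help (output will vary per system):

yum repolist enabled
yum provides '*/bin/hadoop'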




Wednesday, December 4, 2013

Load testing thrift services - Custom JMeter Sampler

Thrift Sampler.

In order to load test Thrift services, we need to write a Java Request based sampler. For this we need to extend AbstractJavaSamplerClient.

I referred to this URL - http://ilkinbalkanay.blogspot.in/2010/03/load-test-whatever-you-want-with-apache.html - as a starting point.

Here is a sample code snippet

import java.nio.ByteBuffer;
import java.nio.charset.Charset;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.UUID;

import org.apache.hadoop.hbase.thrift.generated.Hbase;
import org.apache.hadoop.hbase.thrift.generated.IOError;
import org.apache.hadoop.hbase.thrift.generated.IllegalArgument;
import org.apache.hadoop.hbase.thrift.generated.Mutation;
import org.apache.jmeter.config.Arguments;
import org.apache.jmeter.protocol.java.sampler.AbstractJavaSamplerClient;
import org.apache.jmeter.protocol.java.sampler.JavaSamplerContext;
import org.apache.jmeter.samplers.SampleResult;
import org.apache.jorphan.logging.LoggingManager;
import org.apache.log.Logger;
import org.apache.thrift.TException;
import org.apache.thrift.protocol.TBinaryProtocol;
import org.apache.thrift.protocol.TProtocol;
import org.apache.thrift.transport.TSocket;
import org.apache.thrift.transport.TTransport;
import org.apache.thrift.transport.TTransportException;

public class ThriftSampler extends AbstractJavaSamplerClient {

    private static final Logger log = LoggingManager.getLoggerForClass();

    // Column family prefix and qualifier names used when building the mutations below.
    // These values are placeholders (not from the original post); adjust them to match your table schema.
    private static final String FAMILYNAME = "cf:";
    private static final String COLUMNNAME1 = "col1";
    private static final String COLUMNNAME2 = "col2";
    private static final String COLUMNNAME3 = "col3";
    private static final String COLUMNNAME4 = "col4";
    private static final String COLUMNNAME5 = "col5";
    private static final String COLUMNNAME6 = "col6";
    private static final String COLUMNNAME7 = "col7";

    private TTransport transport = null;
    private TProtocol protocol = null;
    private Hbase.Client client = null;
    private String tableName = null;
   
   
   
    @Override
    public Arguments getDefaultParameters() {
        Arguments defaultParameters = new Arguments();
        defaultParameters.addArgument("server", "");
        defaultParameters.addArgument("port", "");
        defaultParameters.addArgument("thrift-protocol", "");
        defaultParameters.addArgument("tablename", "");
       
        defaultParameters.addArgument("table-col1","");
        defaultParameters.addArgument("table-col2","");
        defaultParameters.addArgument("table-col3","");
        defaultParameters.addArgument("table-col4","");
        defaultParameters.addArgument("table-col5","");
        defaultParameters.addArgument("table-col6","");
        defaultParameters.addArgument("table-col7","");
        return defaultParameters;
    }
   
    @Override
    public void setupTest(JavaSamplerContext context) {       
        String host = context.getParameter("server");
        String port = context.getParameter("port");
   
        tableName = context.getParameter("tablename");       
        transport = new TSocket(host,Integer.parseInt(port));
        protocol = new TBinaryProtocol(transport, true, true);
        client = new Hbase.Client(protocol);
        try {
            transport.open();
        } catch (TTransportException e) {           
            e.printStackTrace();
        }       
   
    }
   
   
    @Override
    public SampleResult runTest(JavaSamplerContext context) {
         log.debug("Starting the test");
         SampleResult result = new SampleResult();
         boolean success = true;
         result.sampleStart();
       
         String col1 = context.getParameter("table-col1");
         String col2 = context.getParameter("table-col2");
         String col3 = context.getParameter("table-col3");
         String col4 = context.getParameter("table-col4");
         String col5 = context.getParameter("table-col5");
         String col6 = context.getParameter("table-col6");
         String col7 = context.getParameter("table-col7");   
       
           
         // Reuse one charset instance for encoding column names and values.
         Charset utf8 = Charset.forName("UTF-8");
         List<Mutation> mutations = new ArrayList<Mutation>();
         Map<ByteBuffer, ByteBuffer> attributes = null;

         // One mutation per column. The original snippet reused COLUMNNAME1 for every
         // column; distinct qualifiers (COLUMNNAME1..COLUMNNAME7) are assumed here.
         mutations.add(new Mutation(false, utf8.encode(FAMILYNAME + COLUMNNAME1), utf8.encode(col1), true));
         mutations.add(new Mutation(false, utf8.encode(FAMILYNAME + COLUMNNAME2), utf8.encode(col2), true));
         mutations.add(new Mutation(false, utf8.encode(FAMILYNAME + COLUMNNAME3), utf8.encode(col3), true));
         mutations.add(new Mutation(false, utf8.encode(FAMILYNAME + COLUMNNAME4), utf8.encode(col4), true));
         mutations.add(new Mutation(false, utf8.encode(FAMILYNAME + COLUMNNAME5), utf8.encode(col5), true));
         mutations.add(new Mutation(false, utf8.encode(FAMILYNAME + COLUMNNAME6), utf8.encode(col6), true));
         mutations.add(new Mutation(false, utf8.encode(FAMILYNAME + COLUMNNAME7), utf8.encode(col7), true));

         // Placeholder row key (the original snippet referenced an undefined rowKey);
         // in a real test it would typically come from the CSV data set.
         ByteBuffer rowKey = utf8.encode(UUID.randomUUID().toString());

         try {
             // Table name and row key are encoded as ByteBuffers for the Thrift-generated client.
             client.mutateRow(utf8.encode(tableName), rowKey, mutations, attributes);
        } catch (IOError e) {
            success = false;
            log.error("HBase reported an error during mutateRow", e);
        } catch (IllegalArgument e) {
            success = false;
            log.error("Illegal argument passed to mutateRow", e);
        } catch (TException e) {
            success = false;
            log.error("Thrift transport/protocol error", e);
        }
         result.sampleEnd();
         result.setSuccessful(success);
         return result;
    }
   
    @Override
    public void teardownTest(JavaSamplerContext context) {
        super.teardownTest(context);
        transport.close();
    }
}
The pom.xml for the sampler would include dependencies like:

<dependencies>
    <dependency>
        <groupId>org.apache.jmeter</groupId>
        <artifactId>ApacheJMeter_core</artifactId>
        <version>2.10</version>
    </dependency>
    <dependency>
        <groupId>org.apache.jmeter</groupId>
        <artifactId>ApacheJMeter_java</artifactId>
        <version>2.10</version>
    </dependency>
    <dependency>
        <groupId>org.apache.thrift</groupId>
        <artifactId>libthrift</artifactId>
        <version>0.9.0</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hbase</groupId>
        <artifactId>hbase</artifactId>
        <version>0.94.6-cdh4.4.0</version>
    </dependency>
</dependencies>

Once compiled, you need to configure the dependent libraries so that JMeter loads them. This is done by setting the
plugin_dependency_paths=
property in jmeter.properties in the bin folder inside JMeter.
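
For example, it might look something like this (the jar locations are only illustrative; point them at wherever you keep the dependencies):

plugin_dependency_paths=/opt/jmeter-deps/libthrift-0.9.0.jar;/opt/jmeter-deps/hbase-0.94.6-cdh4.4.0.jar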

The newly compiled and packaged jar containing the Thrift sampler is put in the lib/ext folder inside the JMeter folder.

In many cases you will need to enable logging to see how your sampler works. For this, configure logging inside jmeter.properties by enabling debug for your class, for example:
log_level.<your sampler package>=DEBUG
Also add this property:
jmeter.loggerpanel.display=true

so that logging from your code is displayed in the log console within JMeter.

Once these are deployed you can start a load test by creating a test plan something like this:
In my case I configured a CSV Data Set Config for reading the test data from CSV and then used the custom sampler to fire the rows at the Thrift server.



Then in my custom ThriftSampler I configured the variables which I mapped in the CSV Data Set configuration, as sketched below.
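
As a rough sketch of that mapping (the file name and variable names are illustrative): the CSV Data Set Config lists the variable names for the CSV columns, and the Java Request sampler parameters reference them with ${...}.

CSV Data Set Config:
  Filename:        testdata.csv
  Variable Names:  col1,col2,col3,col4,col5,col6,col7

Java Request sampler parameters:
  table-col1 = ${col1}
  table-col2 = ${col2}
  table-col3 = ${col3}
  table-col4 = ${col4}
  table-col5 = ${col5}
  table-col6 = ${col6}
  table-col7 = ${col7}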



That's it! Now you can start firing your data into HBase through Thrift and load test the performance.