Wednesday, October 23, 2013

Thrift service - learning something new

Our requirement was to put data into HBase from an external system (.NET).
Our first thought was to write our own web service, which would put data into HBase through the native client.

Apart from this, we had a number of challenges:

  • Our cluster and dev environments were separate.
  • Our input data volume is quite high, to the tune of 40 GB/hr.
  • Each data log was 6-10 MB in size.
So we could not take the risk of writing our own web service to handle this data volume. We went for Thrift, hoping that it would be able to handle such heavy traffic.
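To put those numbers in perspective, here is a rough back-of-the-envelope calculation. It is only a sketch: the class and method names are made up for illustration, and it assumes 1 GB = 1024 MB and a perfectly even load over the hour, neither of which the figures above guarantee.

```java
// Back-of-the-envelope sizing for the ingest load described above.
// Assumption: 1 GB = 1024 MB and traffic is spread evenly over the hour.
final class IngestEstimate {
    static final double MB_PER_GB = 1024.0;
    static final double SECONDS_PER_HOUR = 3600.0;

    /** Sustained rate in MB/s for a given hourly volume in GB. */
    static double megabytesPerSecond(double gigabytesPerHour) {
        return gigabytesPerHour * MB_PER_GB / SECONDS_PER_HOUR;
    }

    /** Logs arriving per second, given the hourly volume and the size of one log in MB. */
    static double logsPerSecond(double gigabytesPerHour, double logSizeMb) {
        return megabytesPerSecond(gigabytesPerHour) / logSizeMb;
    }

    public static void main(String[] args) {
        double volume = 40.0; // GB per hour, from the requirement above
        System.out.printf("sustained rate: %.1f MB/s%n", megabytesPerSecond(volume));
        System.out.printf("logs/s at 10 MB each: %.2f%n", logsPerSecond(volume, 10.0));
        System.out.printf("logs/s at 6 MB each: %.2f%n", logsPerSecond(volume, 6.0));
    }
}
```

Under those assumptions the load works out to roughly 11 MB/s sustained, or between one and two logs per second. Real traffic is likely to be burstier than that.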

So we started off with Thrift, but the first issue struck right away. The only way our dev environment could access the cluster environment was through port 80, basically the HTTP protocol, and that too through an httpd proxy.
We thought: why shouldn't the Thrift clients be able to connect through the proxy? We could just proxy the port for the Thrift server component.

However, it failed. We started investigating and found that the Thrift server that comes packaged with CDH4 doesn't support sending over HTTP. It was basically the binary protocol over TCP. So we needed something like HAProxy to proxy the TCP traffic.
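A small sketch of why an HTTP proxy chokes on this traffic: the first bytes of a binary-protocol call are not an HTTP request line. The code below hand-encodes only the message header as I understand the strict binary format (a 32-bit version-and-type word, the length-prefixed method name, then a 32-bit sequence id). It is a stdlib-only illustration, not the Thrift library, and the class and helper names are made up.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Illustration only: hand-encodes the header of a strict binary-protocol call.
final class BinaryHeaderSketch {
    static final int VERSION_1 = 0x80010000; // version marker in the high 16 bits
    static final int TYPE_CALL = 1;          // message type for a request

    /** Encodes version|type, the length-prefixed method name, and the sequence id. */
    static byte[] messageBegin(String method, int seqId) {
        byte[] name = method.getBytes(StandardCharsets.UTF_8);
        ByteBuffer buf = ByteBuffer.allocate(4 + 4 + name.length + 4); // big-endian by default
        buf.putInt(VERSION_1 | TYPE_CALL);
        buf.putInt(name.length);
        buf.put(name);
        buf.putInt(seqId);
        return buf.array();
    }

    /** Hypothetical check: do the bytes start like an HTTP request line? */
    static boolean looksLikeHttp(byte[] data) {
        String head = new String(data, 0, Math.min(data.length, 5), StandardCharsets.ISO_8859_1);
        return head.startsWith("GET ") || head.startsWith("POST ");
    }

    public static void main(String[] args) {
        byte[] header = messageBegin("ping", 1); // ping() is one of the tutorial Calculator calls
        StringBuilder hex = new StringBuilder();
        for (byte b : header) {
            hex.append(String.format("%02x ", b));
        }
        System.out.println(hex.toString().trim());
        System.out.println("looks like HTTP: " + looksLikeHttp(header));
    }
}
```

The stream opens with the byte 0x80 rather than an ASCII method such as POST, so an HTTP-aware proxy has nothing it can parse. A proxy that forwards at the TCP level never looks at the bytes.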

Just out of curiosity, I tried out the Calculator stub/skeleton sample that comes with the Thrift download. I downloaded the source and had to compile it for CentOS. This was a challenge, as there was so little documentation.

We had to work out the configure and build commands by trial and error,
and finally this worked for us to compile Thrift from source for Java:

./configure --without-python --without-cpp --with-java=yes ANT=/home/ibsuser/Applns/apache-ant-1.9.2/bin/ant JAVA_HOME=/usr/java/jdk1.6.0_31

We also had to set JAVA_HOME manually in the main Makefile and in the Makefile for the Java component, and then ran:

sudo make install

Generating the Java client code:
 thrift -o ./test2 --gen java tutorial.thrift
We ran the calculator service and tried connecting to it through the client, something like this:

TTransport transport = new TSocket("*********", 9090);
transport.open(); // connect before making any calls
TProtocol protocol = new TBinaryProtocol(transport);
Calculator.Client client = new Calculator.Client(protocol);
perform(client);
transport.close();
Our calculator service was started something like this:

TServerTransport serverTransport = new TServerSocket(9090);
TServer server = new TSimpleServer(new Args(serverTransport).processor(processor));
server.serve(); // blocks and serves requests on port 9090

We ran a TCP monitor (proxy) between them and tried connecting the client and server. It failed.
It was here that I came across a blog post which told me to start the service as a servlet, so that clients can connect over HTTP or through an HTTP proxy. So we followed it and started the service as a TServlet.

Within the constructor of a class that extends TServlet:
   super(new Calculator.Processor(new CalculatorHandler()), new TBinaryProtocol.Factory());

And now we connected to it through an HTTP Thrift client:

THttpClient thriftTrans = new THttpClient("***********");
TBinaryProtocol thriftProt = new TBinaryProtocol(thriftTrans);
thriftTrans.open();
Calculator.Client client = new Calculator.Client(thriftProt);
perform(client);
Now I could add the HTTP proxy in between them and it worked. So that solved my problem. But I was hesitant to go this way, as I was concerned that HTTP is not well suited to such large chunks of data. So we reverted to the plan of using HAProxy with the plain vanilla Thrift server and client configuration.
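For what it is worth, my understanding of why the servlet variant passes through the proxy is that the HTTP client sends each serialized call as the body of an ordinary POST, with the content type application/x-thrift. The sketch below builds such a request by hand using only the JDK. It is an illustration, not the library's code: the host thrift.example.com, the path /calculator and the class name are made up.

```java
import java.nio.charset.StandardCharsets;

// Illustration only: frames an already-serialized payload as an HTTP POST.
final class HttpFramingSketch {
    /** Prepends a minimal HTTP/1.1 POST head to the payload bytes. */
    static byte[] wrapInPost(String host, String path, byte[] payload) {
        String head = "POST " + path + " HTTP/1.1\r\n"
                + "Host: " + host + "\r\n"
                + "Content-Type: application/x-thrift\r\n"
                + "Content-Length: " + payload.length + "\r\n"
                + "\r\n";
        byte[] headBytes = head.getBytes(StandardCharsets.US_ASCII);
        byte[] request = new byte[headBytes.length + payload.length];
        System.arraycopy(headBytes, 0, request, 0, headBytes.length);
        System.arraycopy(payload, 0, request, headBytes.length, payload.length);
        return request;
    }

    public static void main(String[] args) {
        // Stand-in payload: the first four bytes of a strict binary-protocol call.
        byte[] payload = {(byte) 0x80, 0x01, 0x00, 0x01};
        // Host and path are hypothetical placeholders.
        byte[] request = wrapInPost("thrift.example.com", "/calculator", payload);
        int headLength = request.length - payload.length;
        System.out.print(new String(request, 0, headLength, StandardCharsets.US_ASCII));
        System.out.println("(" + payload.length + " payload bytes follow)");
    }
}
```

The proxy only ever sees a well-formed POST. The price is that each call carries HTTP headers and is buffered into a request body, which is part of why I was wary of it for 6-10 MB logs.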

And I am still reading this to further refine our approach:

- http://en.wikipedia.org/wiki/Apache_Thrift