Our requirement was to put data into HBase from an external system (.NET).
Our first thought was to write our own web service that would put data into HBase through the native client.
Apart from this, we had a number of challenges:
- Our cluster and dev environments were separate.
- Our input data volume is quite high, on the order of 40 GB/hr.
- Each data log is around 6-10 MB in size.
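A quick back-of-the-envelope calculation puts these figures in perspective. The sketch below assumes 1 GB = 1024 MB (the numbers above don't say which convention applies), and `IngestRate` with its method names is made up purely for illustration.

```java
public class IngestRate {
    // Assumption: 1 GB = 1024 MB; the post does not state the convention.
    static final double MB_PER_GB = 1024.0;

    /** Sustained throughput in MB/s for a given hourly volume in GB. */
    static double mbPerSecond(double gbPerHour) {
        return gbPerHour * MB_PER_GB / 3600.0;
    }

    /** Number of log payloads per hour for a given payload size in MB. */
    static double logsPerHour(double gbPerHour, double logSizeMb) {
        return gbPerHour * MB_PER_GB / logSizeMb;
    }

    public static void main(String[] args) {
        double gbPerHour = 40.0; // volume quoted above
        System.out.printf("sustained rate: %.1f MB/s%n", mbPerSecond(gbPerHour));
        System.out.printf("logs/hour at 10 MB each: %.0f%n", logsPerHour(gbPerHour, 10.0));
        System.out.printf("logs/hour at 6 MB each: %.0f%n", logsPerHour(gbPerHour, 6.0));
    }
}
```

Under that assumption the sustained rate works out to roughly 11 MB/s, or somewhere between about 4,100 and 6,800 log payloads per hour, which is why the choice of transport mattered to us.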
So we started off with Thrift, but an issue struck right away. The only way our dev environment could access the cluster environment was through port 80, that is, over HTTP, and even that went through an httpd proxy.
We thought the Thrift clients should be able to connect through the proxy: we could just proxy the port for the Thrift server component.
However, it failed. We started investigating and found that the Thrift packaged with CDH4 doesn't support sending over HTTP. It was basically a binary protocol over TCP, so we needed something like HAProxy to proxy the TCP traffic.
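One plausible way to see why an HTTP proxy cannot pass this traffic is to look at the first bytes on the wire. The sketch below is not taken from our setup: it hand-encodes just the message header that strict TBinaryProtocol writes (a version word, the method name, a sequence id) and runs a deliberately rough check for an HTTP request line. The class name, the `looksLikeHttpRequest` helper and the `ping` method name (borrowed from the Thrift tutorial service) are assumptions for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class WireFormatCheck {
    static final int VERSION_1 = 0x80010000; // strict TBinaryProtocol version marker
    static final int CALL = 1;               // Thrift message type for a request

    /** Encodes only the message header the way strict TBinaryProtocol lays it out. */
    static byte[] binaryMessageHeader(String method, int seqId) {
        byte[] name = method.getBytes(StandardCharsets.UTF_8);
        return ByteBuffer.allocate(4 + 4 + name.length + 4) // big-endian by default
                .putInt(VERSION_1 | CALL) // version word combined with message type
                .putInt(name.length)      // length-prefixed method name
                .put(name)
                .putInt(seqId)
                .array();
    }

    /** Rough stand-in for what an HTTP proxy expects: "METHOD target HTTP/x.y". */
    static boolean looksLikeHttpRequest(byte[] firstBytes) {
        String text = new String(firstBytes, StandardCharsets.ISO_8859_1);
        String firstLine = text.split("\r\n", 2)[0];
        return firstLine.matches("[A-Z]+ \\S+ HTTP/\\d\\.\\d");
    }

    public static void main(String[] args) {
        byte[] thrift = binaryMessageHeader("ping", 0);
        byte[] http = "POST /calculator HTTP/1.1\r\nHost: example\r\n\r\n"
                .getBytes(StandardCharsets.ISO_8859_1);
        System.out.println("thrift header parses as HTTP: " + looksLikeHttpRequest(thrift));
        System.out.println("POST request parses as HTTP: " + looksLikeHttpRequest(http));
    }
}
```

The binary header starts with the byte 0x80 rather than a request line, so a proxy that only speaks HTTP has nothing it can parse. A TCP-level proxy simply forwards the bytes untouched.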
Just out of curiosity, I tried the sample Calculator stub/skeleton that comes with the Thrift download. I downloaded the source and had to compile it for CentOS. This was a challenge because there was so little documentation.
We had to work out the configure and build commands by trial and error.
Finally, this is what worked for us to compile Thrift from source for Java:
./configure --without-python --without-cpp --with-java=yes ANT=/home/ibsuser/Applns/apache-ant-1.9.2/bin/ant JAVA_HOME=/usr/java/jdk1.6.0_31
Manually set JAVA_HOME in the main makefile and in the makefile for the Java component.
sudo make install
Generating the Java client code:
thrift -o ./test2 --gen java tutorial.thrift
We ran the calculator service and tried connecting to it through the client, something like this:
TTransport transport;
transport = new TSocket("*********", 9090);
transport.open(); // open the socket before making any calls
TProtocol protocol = new TBinaryProtocol(transport);
Calculator.Client client = new Calculator.Client(protocol);
perform(client);
Our calculator service was started something like this:
TServerTransport serverTransport = new TServerSocket(9090);
TServer server = new TSimpleServer(new Args(serverTransport).processor(processor));
server.serve(); // blocks and handles incoming connections
We ran a TCPMonitor (proxy) between them and tried connecting the client and server. It failed.
It was here that I came across a blog post that suggested starting the service as a servlet so that clients can connect over HTTP or through an HTTP proxy. So we followed it and started the service as a TServlet.
Within a class that extends TServlet:
super(new Calculator.Processor(new CalculatorHandler()), new TBinaryProtocol.Factory());
And now we connected to it through an HTTP Thrift client:
THttpClient thriftTrans = new THttpClient("***********");
TBinaryProtocol thriftProt = new TBinaryProtocol(thriftTrans);
thriftTrans.open();
Calculator.Client client = new Calculator.Client(thriftProt);
perform(client);
Now I can add the HTTP proxy in between them and it works, so that solved my problem. But I was hesitant to go with this, as I felt HTTP was not a good fit for large chunks of data. So we reverted to the plan of using HAProxy with a plain vanilla Thrift server and client configuration.
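To make concrete what the servlet and HTTP client pair changes on the wire, here is a rough, JDK-only sketch that does not use Thrift at all. It wraps an opaque binary payload in an HTTP POST with the `application/x-thrift` content type and reads a binary body back. The echo handler only stands in for a real servlet, and the class name, the `/calculator` path and the loopback server are assumptions for illustration.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.Proxy;
import java.net.URL;

public class HttpEnvelopeSketch {
    /** POSTs a binary payload and returns the binary response body. */
    static byte[] post(URL url, byte[] payload) throws IOException {
        // Proxy.NO_PROXY keeps this demo local; passing a Proxy of type HTTP
        // here instead would route the same request through an HTTP proxy.
        HttpURLConnection conn = (HttpURLConnection) url.openConnection(Proxy.NO_PROXY);
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        conn.setRequestProperty("Content-Type", "application/x-thrift");
        try (OutputStream out = conn.getOutputStream()) {
            out.write(payload);
        }
        try (InputStream in = conn.getInputStream()) {
            return in.readAllBytes();
        }
    }

    /** Starts a stand-in echo server on loopback, sends one payload, returns the reply. */
    static byte[] roundTrip(byte[] payload) {
        try {
            HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
            server.createContext("/calculator", exchange -> {
                // A real servlet would decode the message and invoke a handler;
                // this stand-in just echoes the request body.
                byte[] request = exchange.getRequestBody().readAllBytes();
                exchange.getResponseHeaders().set("Content-Type", "application/x-thrift");
                exchange.sendResponseHeaders(200, request.length);
                try (OutputStream out = exchange.getResponseBody()) {
                    out.write(request);
                }
            });
            server.start();
            try {
                int port = server.getAddress().getPort(); // ephemeral port chosen by the OS
                return post(new URL("http://127.0.0.1:" + port + "/calculator"), payload);
            } finally {
                server.stop(0);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] payload = {(byte) 0x80, 0x01, 0x00, 0x01};
        System.out.println("bytes echoed back: " + roundTrip(payload).length);
    }
}
```

Because every call is an ordinary request/response pair, an HTTP proxy can sit in the middle. The trade-off is that each call carries HTTP headers and a full request cycle on top of the binary message.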
And I am still reading this to further refine our approach:
- http://en.wikipedia.org/wiki/Apache_Thrift