Monday, December 30, 2013

bigdata® - Graph DB

bigdata graph db can be installed in one of several modes:
Local Journal Model
Local DataService Federation
Local Embedded DataService Federation
Jini Federation


Get the bigdata src from the url - http://bigdata.svn.sourceforge.net/viewvc/bigdata/tags/BIGDATA_RELEASE_1_2_2/
I downloaded the tag 1_2_2.

Once downloaded, set up the NAS.

I preferred NFS as my nodes were all running Linux; Samba is an option if Windows nodes are also there.
Install nfs-kernel-server on your master machine,

and on the client machines install nfs-common (applicable to Ubuntu).

Find the folder which you plan to share as a network folder,
and add its path to the /etc/exports file:
/home/****/NFSHome 192.168.***.0/255.255.255.0(rw)

Now run exportfs -r
to export your folder to the network.

On the client machines, mount the file system.

Test it first using the mount command:

mount -t nfs xxx:/home/xx/NFSHome /home/xx/NFSHome
then add the entry to your fstab
and run mount -a.
This should ensure that your NAS folder is set up successfully.
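The fstab entry for the NFS mount could look like this (reusing the placeholder host and path from the mount command above):

```
# /etc/fstab on each client node
xxx:/home/xx/NFSHome  /home/xx/NFSHome  nfs  defaults  0  0
```

With this entry in place, mount -a (and every reboot) will bring the share up automatically.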

Now you can run the install script on the machine where the src was unzipped.
Before that, modify the values in build.properties.
Follow the tutorial - http://sourceforge.net/apps/mediawiki/bigdata/index.php?title=ClusterGuide
I followed it and found it quite useful.

Once you run the install you will find the bigdata files installed in the NAS folder.
Move to your NAS folder and change or append more configuration (for example, where your zookeeper server should run) in /config/bigdataStandalone.config.
By default the configuration it takes is bigdataStandalone.config, which runs all the services on the master server.
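As an illustration, the zookeeper section of the config file specifies which host runs the quorum peer. The entry names and structure vary by release, so treat this fragment as a sketch (the hostname and ports here are made-up examples) and compare it against the bigdataStandalone.config shipped with your tag:

```
// Sketch only: hostname and ports below are assumptions, not shipped defaults.
org.apache.zookeeper.server.quorum.QuorumPeerMain {
    clientPort = 2181;
    servers = "1=graphmaster:2888:3888";
}
```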

Now run bigdataenv.sh to set the environment parameters.
After that, run bigdata start and watch the logs.
In my case the initial tries failed because the zookeeper server didn't start up, as the IP I set was not resolving correctly. Ensure the /etc/hosts entries are correct.
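A hosts file along these lines avoids that problem (the hostnames and addresses here are made-up examples). In particular, make sure each node's own hostname maps to its real LAN IP and not to 127.0.1.1, a common Ubuntu default that breaks service discovery:

```
# /etc/hosts on every node
127.0.0.1      localhost
192.168.1.10   graphmaster
192.168.1.11   graphnode1
```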

Once it started, listServices.sh should display something like:

Waiting 5000ms for service discovery.
Zookeeper is running.
Discovered 1 jini service registrars.
   192.xx.xx.xx
Discovered 7 services
Discovered 0 stale bigdata services.
Discovered 6 live bigdata services.
Discovered 1 other services.
Bigdata services by serviceIface:
  There are 1 instances of com.bigdata.jini.start.IServicesManagerService on 1 hosts
  There are 1 instances of com.bigdata.journal.ITransactionService on 1 hosts
  There are 2 instances of com.bigdata.service.IDataService on 1 hosts
  There are 1 instances of com.bigdata.service.ILoadBalancerService on 1 hosts
  There are 1 instances of com.bigdata.service.IMetadataService on 1 hosts
Bigdata services by hostname:
  There are 6 live bigdata services on graphmaster
    There are 1 com.bigdata.jini.start.IServicesManagerService services
    There are 1 com.bigdata.journal.ITransactionService services
    There are 2 com.bigdata.service.IDataService services
    There are 1 com.bigdata.service.ILoadBalancerService services
    There are 1 com.bigdata.service.IMetadataService services

That's it! bigdata is now working on a single node. Soon I shall post an update on multi-node cluster configurations!

------------------------------------------------------------------------------------------------------------------------------

Starting bigdata
cd to the respective NFS folder where bigdata resides.
Select the node where you want to run zookeeper.
Run ./bigdataenv
Run ./bigdata start

To check whether the services have started, run ./listServices.sh

Running the NanoSparqlServer -
nanoSparqlServer.sh port namespace
http://192.168.192.105:9292 - it should give you the web screen!
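Beyond the web screen, you can hit the SPARQL endpoint directly. The sketch below only builds the endpoint URL and leaves the actual curl call commented out, since it needs the running server; the /sparql path is an assumption that may differ per release (some deployments serve /bigdata/sparql instead):

```shell
#!/bin/sh
# Build the NanoSparqlServer endpoint URL (host/port from this setup).
HOST=192.168.192.105
PORT=9292
ENDPOINT="http://${HOST}:${PORT}/sparql"   # may be /bigdata/sparql in some releases
echo "$ENDPOINT"

# With the server up, a query would look like:
# curl --get "$ENDPOINT" \
#   --data-urlencode 'query=SELECT * WHERE { ?s ?p ?o } LIMIT 10' \
#   -H 'Accept: application/sparql-results+json'
```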

--------------------------------------------------------------------------------------

Running bigdata alongside Hadoop is an interesting challenge.
For this you could run the DataService on each of the nodes that run Hadoop.
You could also make use of Hadoop's zookeeper instead of the zookeeper quorum started up by bigdata.

For this, comment out the
org.apache.zookeeper.server.quorum.QuorumPeerMain class in bigdataCluster.config, along with the zookeeper configuration.
This should free a good amount of memory otherwise required for running zookeeper, offloading that work to the zookeeper used by Hadoop.
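In practice the change amounts to commenting the managed zookeeper service out of the config. The surrounding structure varies by release, so this is only a sketch of what to look for:

```
// In bigdataCluster.config, comment out the zookeeper service bigdata manages:
// org.apache.zookeeper.server.quorum.QuorumPeerMain
// ...together with its associated zookeeper configuration block,
// then point the zookeeper client settings at the quorum Hadoop already runs.
```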





