Thursday, March 28, 2013

Solving time synchronisation in a VM - VirtualBox

From the VirtualBox Devices menu, select "Install Guest Additions".
Your guest machine will now see a CD-ROM device loaded with the media containing the installation files for the Guest Additions.
Select it and install the Guest Additions in the guest machine.
Once installed, restart the guest machine.

Now, on the host machine, go to the VirtualBox installation folder.
There you will find a program named VBoxManage.
Using this you can set the time-sync interval and threshold by issuing the following commands:

VBoxManage guestproperty set <guest-os-name> "/VirtualBox/GuestAdd/VBoxService/--timesync-interval" 1000
VBoxManage guestproperty set <guest-os-name> "/VirtualBox/GuestAdd/VBoxService/--timesync-set-threshold" 1000

Both values are in milliseconds, so that should keep your VM's time in sync to an accuracy of about 1 s!
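To confirm the properties actually landed, VBoxManage can read them back. A quick sketch - "myvm" is a placeholder for your guest's name:

```shell
# list every guest property currently set on the VM
VBoxManage guestproperty enumerate "myvm"

# or read back a single property directly
VBoxManage guestproperty get "myvm" \
    "/VirtualBox/GuestAdd/VBoxService/--timesync-interval"
```

If the `get` call reports "No value set!", the `set` command above did not take effect, so check the VM name and property path.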

Friday, March 8, 2013

Bigdata - reading compressed/zip/gz input



My problem was that I had a Freebase dump of 8 GB, in gz format.

But when exploded it becomes close to 60 GB - too large to be read into HDFS sequentially.

Ideally I needed Hadoop to read it in compressed format itself. A little Google search took me here -

http://blog.cloudera.com/blog/2009/11/hadoop-at-twitter-part-1-splittable-lzo-compression/



I found Kevin Weil's hadoop-lzo project - https://github.com/kevinweil/hadoop-lzo - very useful, and I followed his instructions completely.

I got the code from GitHub and ran:

ant compile-java

ant compile-native



And then I copied the native files from hadoop-lzo-0.4.15/lib/native/Linux-amd64-64

to the Hadoop native lib folder - hadoop-1.1.0/lib/native/Linux-amd64-64

I also copied hadoop-lzo-0.4.15.jar to hadoop-1.1.0/lib (* I doubt whether this is needed; I did it anyway)



Now you need to add the codec to the Hadoop configuration. For this, add the following to core-site.xml:


<!-- added for LZO decompression support; need to move to spring-flow -->

<property>
    <name>io.compression.codecs</name>
    <value>org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.DefaultCodec,org.apache.hadoop.io.compress.BZip2Codec,com.hadoop.compression.lzo.LzoCodec,com.hadoop.compression.lzo.LzopCodec</value>
</property>
<property>
    <name>io.compression.codec.lzo.class</name>
    <value>com.hadoop.compression.lzo.LzoCodec</value>
</property>



Now I decompressed my gz and compressed it back to lzo format.



      gzip -d file.gz

If your file reports errors (e.g. trailing garbage), you can stream-decompress instead:

      gunzip < file.gz > file.txt
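As a sanity check before exploding a big dump, gzip itself can verify an archive. A minimal round-trip sketch - the file names and contents here are just placeholders:

```shell
# create a small sample and compress it (-c writes to stdout, keeping file.txt)
printf 'hello freebase\n' > file.txt
gzip -c file.txt > file.gz

# -t tests archive integrity; exit status 0 means the archive is intact
gzip -t file.gz

# stream-decompress the way shown above and compare with the original
gunzip < file.gz > out.txt
cmp file.txt out.txt && echo "round trip OK"   # prints: round trip OK
```

For an 8 GB dump, `gzip -t` is the cheap way to find out whether the archive is corrupt before spending an hour decompressing it.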
Then I installed lzo support:
      sudo apt-get install lzop liblzo2-dev

and then I compressed it into lzo format:

     lzop file.txt

Now I copied file.lzo into HDFS using:
hadoop fs -put file.lzo /lzofiles

Now I ran the command
./hadoop jar hadoop-lzo-0.4.15.jar  com.hadoop.compression.lzo.LzoIndexer /lzofiles

It gave output like



13/03/07 09:19:04 INFO lzo.GPLNativeCodeLoader: Loaded native gpl library

13/03/07 09:19:04 INFO lzo.LzoCodec: Successfully loaded & initialized native-lzo library [hadoop-lzo rev 6bb1b7f8b9044d8df9b4d2b6641db7658aab3cf8]

13/03/07 09:19:04 INFO lzo.LzoIndexer: LZO Indexing directory /lzofiles...

Initially I got an error like




13/03/07 09:11:46 ERROR lzo.GPLNativeCodeLoader: Could not load native gpl library

java.lang.UnsatisfiedLinkError: no gplcompression in java.library.path

    at java.lang.ClassLoader.loadLibrary(ClassLoader.java:1860)

    at java.lang.Runtime.loadLibrary0(Runtime.java:845)

To fix this I ensured that JAVA_LIBRARY_PATH was set correctly to the Hadoop native lib folder.

I echoed the variable from within the hadoop command file and confirmed it was set. You can set it by putting the following in hadoop-env.sh (adjust the path to your install):

export JAVA_LIBRARY_PATH=/path/to/hadoop-1.1.0/lib/native/Linux-amd64-64

Also ensure that you build those libgplcompression.so files on your own machine, because these are native libraries and depend on your machine's architecture and OS.

That's it.

This is messy for now; I need to fine-tune it to suit the Spring Batch way of integration. I shall do so if time permits. Till then,

Happy coding.



Installing on Cloudera is very simple:
install the relevant repo and then install the libraries



cd /etc/yum.repos.d/ && wget http://archive.cloudera.com/gplextras/redhat/6/x86_64/gplextras/cloudera-gplextras4.repo
yum install hadoop-lzo-cdh4 hadoop-lzo-cdh4-mr1






Grub2 - saving my Fedora 17 - adding a new entry

I bought a new SSD and installed Fedora 18 on it, enabled TRIM, and moved my home folder and swap out to my spinning hard disk. Everything worked as planned until I found my Fedora 17 boot menu entry missing.

Now I am looking at a quadruple boot with Windows 7, Fedora 17, Fedora 18, and Windows XP. Hmmm, the issue is with devices I bought at various times and their compatibility with different OS versions. Thanks to the reckless manufacturers, who don't bother upgrading the drivers of outdated devices.

Coming back, my plan is to rescue my Fedora 17, which disappeared from the boot menu. A little search on GRUB brought me here - http://www.dedoimedo.com/computers/grub-2.html#mozTocId514088 .

I found my old boot partition safe on the old disk. I just need GRUB to point there and initiate the boot process. In the old days this was simple: add the entries to menu.lst. I remember doing that during college days -
set root=(hd0,5)
linux /boot/vmlinuz
initrd /boot/initrd.img
  •  set - sets the root variable to the (hd0,5) partition
  •  linux - loads the Linux kernel
  •  initrd - loads an initial ramdisk for the Linux kernel image and sets the appropriate parameters in the Linux setup area in memory. This may only be used after the linux command (see linux) has been run.


But with GRUB 2 things have changed. Now you need to add the same commands as a script and put it in the /etc/grub.d folder.

To track down the hard-disk location of the boot partition I lost, I had to drop to the GRUB CLI and try its auto-complete feature, typing root (hd0,<press Tab>).

In my case the linux kernel is
-  vmlinuz-3.7.9-104.fc17.i686
and the ram image is 
- initramfs-3.7.9-104.fc17.i686.img

So I created a script file that does this and added it to /etc/grub.d/, then chmod'ed it to give it executable permission.

My existing scripts in /etc/grub.d/ are:
sudo ls /etc/grub.d/
00_header  10_linux  20_linux_xen  20_ppc_terminfo  30_os-prober  40_custom  41_custom    README

So I decided to add a new file named:
15_fedora17 - so the OS will appear in the menu after my default Fedora 18 entry.
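A sketch of what that script can look like, in the style of the stock 40_custom script (it simply prints its own tail). The kernel and initrd names are the ones from above, and (hd0,5) is my partition, so adjust both to your layout. For illustration this writes the script to the current directory; on a real system it goes to /etc/grub.d/15_fedora17, owned by root:

```shell
# write the custom entry script (to ./15_fedora17 here; /etc/grub.d/ for real)
cat > 15_fedora17 <<'GRUB'
#!/bin/sh
exec tail -n +3 $0
menuentry "Fedora 17 (old disk)" {
    set root=(hd0,5)
    linux /boot/vmlinuz-3.7.9-104.fc17.i686
    initrd /boot/initramfs-3.7.9-104.fc17.i686.img
}
GRUB
chmod +x 15_fedora17

# running the script prints the menuentry block that grub2-mkconfig picks up
./15_fedora17
```

The `exec tail -n +3 $0` trick makes the script emit everything from its third line onward, so whatever menuentry text follows ends up verbatim in the generated grub.cfg.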

Now you can update grub.cfg by running grub2-mkconfig -o /boot/grub2/grub.cfg

To my surprise it automatically found my Fedora 17 in spite of my manual addition. So finally my GRUB config had two entries for Fedora 17: one made by me and another found automatically (by the os-prober script).

When I booted I faced another problem: my swap space for Fedora 17 had been removed.. grr.
So I had to change the boot options from the GRUB menu and fix the swap volume that was mentioned in the boot param rd.lvm.lv=