Monday, August 4, 2014

Quick Postgres Setup on Fedora

Reference links:

https://fedoraproject.org/wiki/PostgreSQL
https://wiki.postgresql.org/wiki/First_steps
https://wiki.postgresql.org/wiki/YUM_Installation
https://community.jboss.org/wiki/InstallPostgreSQLOnFedora?_sscc=t


sudo yum install postgresql-server postgresql-contrib
sudo postgresql-setup initdb - initialise the database cluster (fresh install)
postgresql-setup upgrade - only needed when upgrading an existing cluster instead
sudo service postgresql start
journalctl -xn - check the recent service logs if anything fails
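
If you want PostgreSQL to come up automatically after a reboot, Fedora's systemd unit can be enabled as well:

sudo systemctl enable postgresql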

sudo passwd postgres - set password for user postgres

su -l postgres
"now log in to the postgres console"
psql
"update the password for the postgres database user"
\password postgres
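
While you are still at the psql prompt, you can also create a separate role and database for your application instead of using postgres everywhere; a minimal sketch (the role, password and database names are only examples):

CREATE USER myapp WITH PASSWORD 'myapp_password';
CREATE DATABASE myappdb OWNER myapp;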

sudo yum install pgadmin3
sudo vi /var/lib/pgsql/data/pg_hba.conf
"update the entry to support md5 authentication"
host    all             all             127.0.0.1/32            md5
sudo service postgresql reload

Now open the pgAdmin client tool
and log in to the database with
user name - postgres
and the password you have set.

Now you should be able to connect from the local machine,
since the service listens on 127.0.0.1:5432.
If you want to connect from a remote machine, it also has to listen on your local network address (192.*) or on your external IP address.

In this case PostgreSQL should listen on those IPs. You can enable that by
editing
sudo vi /var/lib/pgsql/data/postgresql.conf

and updating -
listen_addresses = '*'

sudo service postgresql restart - listen_addresses changes need a full restart, not just a reload
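
Remote clients also need a matching pg_hba.conf entry in addition to the listen_addresses change; a sketch assuming a 192.168.1.0/24 LAN (adjust the subnet to yours, and reload after editing):

host    all             all             192.168.1.0/24          md5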

sudo firewall-cmd --permanent --add-port=5432/tcp
sudo firewall-cmd --add-port=5432/tcp
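
To verify, try connecting from another machine with the psql client (replace the address with your server's LAN or public IP):

psql -h 192.168.1.10 -p 5432 -U postgres -d postgres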




Wednesday, July 16, 2014

Fedora - file association and application launcher.

In Fedora, to install an application and get it reflected in the launcher, you need to add a .desktop entry at
~/.local/share/applications/ - if it needs to be shown in the launcher of the logged-in user alone
/usr/share/applications/ - if it needs to be shown for all users

ls ~/.local/share/applications/
AdobeReader.desktop
chrome--Default.desktop
defaults.list
mimeapps.list
SQuirreL SQL Client.desktop

This is how it looks.
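
Each of those entries is an ordinary .desktop file. For illustration, a minimal one for a hypothetical tool could look like this (Name, Exec and Icon are only examples):

[Desktop Entry]
Type=Application
Name=My Tool
Comment=Launcher entry for a hypothetical tool
Exec=/opt/mytool/bin/mytool %f
Icon=/opt/mytool/mytool.png
Terminal=false
Categories=Development;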

And to associate a file type with a particular application,
add the entry to
mimeapps.list - located in the same folders as above.

The mimeapps.list looks like

[Added Associations]
application/pdf=evince.desktop;
text/vnd.graphviz=gedit.desktop;
image/jpeg=shotwell-viewer.desktop;
text/plain=gedit.desktop;libreoffice-calc.desktop;
application/x-x509-ca-cert=gedit.desktop;
application/xml=gedit.desktop;
text/x-java=gedit.desktop;
application/octet-stream=firefox.desktop;
application/x-trash=gedit.desktop;
application/x-shellscript=gedit.desktop;
image/png=shotwell-viewer.desktop;
application/x-wais-source=gedit.desktop;
application/x-executable=gedit.desktop;
application/x-ica=wfica.desktop


On the left-hand side is the MIME type of the file, and
on the right-hand side is the .desktop entry in the applications folder.
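
Instead of editing mimeapps.list by hand, the associations can also be queried and set with xdg-mime from xdg-utils, for example:

xdg-mime query default application/pdf          # shows the current handler, e.g. evince.desktop
xdg-mime default gedit.desktop application/xml  # make gedit the default for XML files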

That's it.



Monday, June 2, 2014

Installing hadoop snappy libraries in Amazon AMI

The Amazon AMI instances are 64-bit, and hence the default native libraries that come with the Hadoop distribution fail to load.

You get an exception like -
WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

For me this happened with hadoop-2.2.0,
and I found the following post helpful - http://stackoverflow.com/questions/19943766/hadoop-unable-to-load-native-hadoop-library-for-your-platform-error-on-centos

For this I had to recompile the Hadoop source on the target machine (the Linux AMI).

If you are planning to use Snappy, you need to compile the compression library alongside as well.

  • One can follow these blogs and reference sites; I basically followed them, plus a little ingenuity.

  • Installing the snappy compression library - https://code.google.com/p/snappy/
  • Compiling hadoop with snappy - https://code.google.com/p/hadoop-snappy/

  • General reference on compiling hadoop - http://vichargrave.com/create-a-hadoop-build-and-development-environment-for-hadoop/

  • issues with automake - https://issues.apache.org/jira/browse/HADOOP-10110
  • https://issues.apache.org/jira/browse/HADOOP-10117
  • https://issues.apache.org/jira/browse/HADOOP-8580

  • Main blog - http://www.ercoppa.org/Linux-Compile-Hadoop-220-fix-Unable-to-load-native-hadoop-library.htm

Once compiled, copy the native libs to the respective folders and update the parameter -

export HADOOP_OPTS="$HADOOP_OPTS  -server -Djava.net.preferIPv4Stack=true -Djava.library.path=$HADOOP_HOME/lib/native/"
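
To confirm that the rebuilt native and snappy libraries are actually picked up, recent Hadoop 2.x versions ship a small checker; run it after setting the variables above, and each library (hadoop, zlib, snappy, ...) should be reported as true:

hadoop checknative -a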


For Hadoop 1.2.1, follow the url https://code.google.com/p/hadoop-snappy/ completely.



Friday, May 23, 2014

Setting up Anonymous ftp server in Amazon EC2

anonymous_enable=YES
local_enable=NO
write_enable=NO
local_umask=022
anon_upload_enable=NO
anon_mkdir_write_enable=NO
anon_other_write_enable=NO
anon_world_readable_only=YES
connect_from_port_20=NO
hide_ids=YES
pasv_enable=YES
pasv_min_port=1024
pasv_max_port=1048
pasv_address=<<public dns name for the machine>>
pasv_addr_resolve=YES
listen_port=21
xferlog_enable=YES
ls_recurse_enable=NO
ascii_download_enable=NO
async_abor_enable=YES
one_process_model=YES
idle_session_timeout=120
data_connection_timeout=300
accept_timeout=60
connect_timeout=60
anon_max_rate=2048000
dirmessage_enable=YES
listen=YES

 

Put the settings above into /etc/vsftpd/vsftpd.conf, then start the service:

sudo /etc/init.d/vsftpd start

 

Open ports 20-22 (FTP data, FTP control and SSH) and the passive range 1024-1048 for inbound communication in the instance's security group.
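
If you prefer the AWS CLI over the console for the security group change, something along these lines should work (the group id is a placeholder):

aws ec2 authorize-security-group-ingress --group-id sg-0123456789abcdef0 --protocol tcp --port 20-22 --cidr 0.0.0.0/0
aws ec2 authorize-security-group-ingress --group-id sg-0123456789abcdef0 --protocol tcp --port 1024-1048 --cidr 0.0.0.0/0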

 

Disable the local firewall on the instance (or open the same ports in it as well).
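
Once everything is open, a quick way to check anonymous access is from any client machine (the host name is just a placeholder); log in with user anonymous and a blank or email password, then try ls:

ftp ec2-xx-xx-xx-xx.compute-1.amazonaws.com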

 

Wednesday, April 16, 2014

Wikipedia DBPedia and Extracting information

In the semantic web, a major source of information is Wikipedia. It stands as the single largest source of semantic information. The other competitor is Freebase, whose data is mostly
retrieved from Wikipedia. Wikipedia has a structured counterpart known as DBpedia, which lets you download the dataset in triple format. This acts as a starting point for building your knowledge graphs. But often the most difficult part is retrieving the data that is relevant to you.
For instance, say you want to retrieve the tourist destinations in India.
The long, default way to do this is to get all the resource instances from DBpedia, check whether each instance is of type hotel, museum, etc., and then retrieve them. Then find the latitude and longitude if given, and keep only those in India. This tedious process needs MapReduce programming and a lot of iterations to finally retrieve the data.
There is a shorter and more efficient way to do this, because in Wikipedia every piece of information is categorised, thanks to the active content editors.

In short, if you could get the contents from within the category Tourism in India, you would actually be building the knowledge graph about tourism in India. And this can be done!
You need to download the categories data from Wikipedia (as published by DBpedia) and do the refinement on it. In the DBpedia dumps a category is described with two predicates: dcterms:subject links an article to the categories it belongs to, and skos:broader links a category to its parent category.
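
For illustration, the membership and hierarchy triples look roughly like this (the specific resources are only examples):

<http://dbpedia.org/resource/Taj_Mahal> <http://purl.org/dc/terms/subject> <http://dbpedia.org/resource/Category:Tourism_in_India> .
<http://dbpedia.org/resource/Category:Tourist_attractions_in_India> <http://www.w3.org/2004/02/skos/core#broader> <http://dbpedia.org/resource/Category:Tourism_in_India> .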

Step 1) That means the data can be refined to get the subcategories as well as the subjects (topics) contained within the category Tourism in India.
Step 2) Once the subjects are obtained, we check whether each one is a category or not. If it is a category, its URL will be of the form resource/Category:... and we repeat step 1.
Step 3) If it is not a category, we take those subjects and their information from the instances dump of Wikipedia.

This way we would be able to extract the tourist destinations as well as the places of interest in India!
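
If you want to experiment before processing the full dumps, the same traversal can be tried against the public DBpedia SPARQL endpoint; a minimal sketch (the category, output format and result size may need adjusting):

curl -G "http://dbpedia.org/sparql" \
     --data-urlencode "format=text/csv" \
     --data-urlencode "query=
PREFIX dct: <http://purl.org/dc/terms/>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
SELECT DISTINCT ?member WHERE {
  { ?member dct:subject <http://dbpedia.org/resource/Category:Tourism_in_India> }
  UNION
  { ?member skos:broader <http://dbpedia.org/resource/Category:Tourism_in_India> }
}"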