Sunday, February 17, 2013

Linked Data cloud - intelligent web

My project's targets are getting more interesting. Rather than just ending up with a PoC, I need more information to be served to the user.
I was looking through the Linked Data landscape, and the datasets I came across are:
DBpedia - from our one and only Wikipedia - you don't realise its value until you come across a situation like the one I'm in.
Freebase - Google is everywhere. Here too they have their presence, having acquired Freebase.
LinkedGeoData - beautifully captures geographical data. AFAIK it gets its data from OpenStreetMap. Another reason to use OpenStreetMap :)
Now to get a view of the intelligent Web 3.0, just have a look here -
LOD-cloud
As Tim Berners-Lee sees it: this is the infant stage of one of humanity's biggest assets, the intelligent web that is going to replace the document web of Web 2.0. And you can see the biggest data endpoints are DBpedia and Freebase. The sad thing is that hardly any other organisation is contributing to it. As usual, all the research and support are primarily driven by universities in Europe. A big cheer for them!!

So my quest was to pull data on a particular topic from this intelligent web. There is one more attraction to this Web 3.0: it's fully available for download to your desktop :). But under the condition that when you make use of it, the info you create should also be free... aka Creative Commons.

Now DBpedia provides a SPARQL endpoint to run queries against. You can access it at - http://dbpedia.org/snorql/
A simple query to bring up all natural places in Wikipedia (snorql predefines the rdf:, dbpedia2: and : prefixes):
PREFIX dbpedia3: <http://dbpedia.org/ontology/>
SELECT DISTINCT * WHERE {
  ?b rdf:type dbpedia3:NaturalPlace .
  OPTIONAL { ?b dbpedia2:type :NaturalPlace . }
}
Voila, you ask, how is it possible to query the web? Hehe, that's the truth of Web 3.0: it's not just about viewing dumb documents, it's all about firing SPARQL queries.
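And firing such a query programmatically is just an HTTP GET against the endpoint. Here is a rough stdlib-only Java sketch - the URL-building helper is my own illustration, and the actual fetch is left out since it needs network access (you would request the built URL with an Accept header such as application/sparql-results+xml):

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Rough sketch: a SPARQL SELECT over HTTP is a GET with the query
// URL-encoded into the "query" parameter. This only builds the URL;
// fetching it is left to HttpURLConnection or similar.
public class SparqlUrl {

    static String buildQueryUrl(String endpoint, String query) {
        try {
            return endpoint + "?query=" + URLEncoder.encode(query, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    public static void main(String[] args) {
        String q = "SELECT DISTINCT ?b WHERE { ?b a <http://dbpedia.org/ontology/NaturalPlace> } LIMIT 10";
        System.out.println(buildQueryUrl("http://dbpedia.org/sparql", q));
    }
}
```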
Now the bad news for RDBMS lovers: after all, the relational schema was meant for enterprise databases, not human knowledge :). In this case you need description logic. If you have studied machine learning and AI, this field of mathematics of axioms and instances should be familiar to you.
For Freebase it's different. They don't have SPARQL endpoints. As usual Google is evil :) even if they say "don't be evil". In Freebase's case the data is available in TSV format. Hard times ahead: I need to turn it into a semantic format, aka RDF or N-Triples.
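A first cut of that TSV-to-triples conversion can be done with nothing but the standard library. A hedged sketch - the three-column subject/predicate/object layout and the http://example.org/ namespace are my own simplification for illustration, not the real Freebase dump format:

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative TSV -> N-Triples converter. Assumes each row is
// subject<TAB>predicate<TAB>object and that every object is a literal;
// real Freebase dumps need proper column mapping on top of this.
public class TsvToNTriples {

    // N-Triples literals must escape backslashes and quotes
    static String escapeLiteral(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    static List<String> convert(List<String> tsvLines) {
        List<String> triples = new ArrayList<>();
        for (String line : tsvLines) {
            String[] cols = line.split("\t");
            if (cols.length < 3) continue; // skip malformed rows
            triples.add("<http://example.org/" + cols[0] + "> "
                      + "<http://example.org/" + cols[1] + "> "
                      + "\"" + escapeLiteral(cols[2]) + "\" .");
        }
        return triples;
    }

    public static void main(String[] args) {
        convert(List.of("kerala\tlocated_in\tIndia"))
            .forEach(System.out::println);
    }
}
```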
It doesn't end there. My agony is growing: these datasets have very little info about India (some disadvantages when you develop from a developing nation). So, you folks in human rights groups or NGOs planning a data collection: next time try using Web 3.0 based techniques. It's worth the effort, I guarantee. At least it will be useful for guys like me lurking for data in the LOD cloud. If you want to merge with the LOD cloud, they have some very simple norms -
  • There must be resolvable http:// (or https://) URIs.
  • They must resolve, with or without content negotiation, to RDF data in one of the popular RDF formats (RDFa, RDF/XML, Turtle, N-Triples).
  • The dataset must contain at least 1000 triples. (Hence, your FOAF file most likely does not qualify.)
  • The dataset must be connected via RDF links to a dataset that is already in the diagram. This means, either your dataset must use URIs from the other dataset, or vice versa. We arbitrarily require at least 50 links.
  • Access of the entire dataset must be possible via RDF crawling, via an RDF dump, or via a SPARQL endpoint.
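To make the second norm a bit concrete: whether a URI "resolves to RDF in one of the popular formats" boils down to inspecting the Content-Type of the HTTP response. A small illustrative helper - the media types are the standard ones for these formats, the helper itself is my own, and the actual HEAD/GET request is omitted:

```java
import java.util.Set;

// Checks whether an HTTP Content-Type header names one of the popular
// RDF serialisations. Wiring this to a real request (with an Accept
// header for content negotiation) is left out.
public class RdfContentType {

    private static final Set<String> RDF_TYPES = Set.of(
        "application/rdf+xml",   // RDF/XML
        "text/turtle",           // Turtle
        "application/n-triples", // N-Triples
        "text/html"              // an HTML page possibly carrying RDFa
    );

    static boolean isRdfContentType(String contentType) {
        if (contentType == null) return false;
        // Strip parameters like "; charset=utf-8" before comparing
        String mediaType = contentType.split(";")[0].trim().toLowerCase();
        return RDF_TYPES.contains(mediaType);
    }

    public static void main(String[] args) {
        System.out.println(isRdfContentType("text/turtle; charset=utf-8"));
    }
}
```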
Meanwhile I want to see how I can get some data I want into semantic form. I am looking at information extraction from websites. Hmm, it's a bad idea, but that's the only way forward.
So happy coding!!


Monday, February 4, 2013

OWL, RDF, RDFS - some muses

Was trying hard on a PoC. Came across a series of interesting facts. Typical symptoms of playing around with bleeding-edge open source technologies.

This time it's Jena. I had been using the Jena-based Fuseki triple store server for basic learning of OWL, RDFS and SPARQL. It had been a good journey until I surfaced into complex OWL reasoning.

There was a case something like this:

--------------------------

?a spr:ancestor ?b .

and

spr:ancestor rdf:type owl:TransitiveProperty

------------------------------

I needed to create another property, spr:parentOf, which is of the same nature as spr:ancestor but should be non-transitive.

Some research brought me to a solution (this is a very common problem in description logic).

If:

?a spr:someRule ?b

then if I declare something like

spr:sometestRule rdfs:subPropertyOf spr:someRule

it follows that every spr:sometestRule statement also holds as a spr:someRule statement, so we can make spr:someRule transitive while keeping spr:sometestRule itself non-transitive
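Put together in Turtle, the whole pattern looks roughly like this (the spr: namespace URI and the alice/bob/carol facts are made up for illustration):

```turtle
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix spr:  <http://example.org/spr#> .

# ancestor is transitive; parentOf entails ancestor but is itself non-transitive
spr:ancestor a owl:TransitiveProperty .
spr:parentOf rdfs:subPropertyOf spr:ancestor .

# asserted facts
spr:alice spr:parentOf spr:bob .
spr:bob   spr:parentOf spr:carol .

# an OWL reasoner should now infer:
#   spr:alice spr:ancestor spr:bob , spr:carol .
# but NOT spr:alice spr:parentOf spr:carol .
```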

---------------------------------------------------

 

But when I tried the same in Fuseki, that is, by adding

spr:parentOf rdfs:subPropertyOf spr:ancestor

and then querying ?a spr:parentOf ?b,

it didn't retrieve anything.

-------------------------------------------------------------------

A little reading brought me to the fact that the reasoning in Jena is rule-based, which is good for RDFS+ style constructs. For OWL it does not use

description-logic-based reasoning but the same rule-based reasoning, which depends entirely on instances. As is mentioned on their site:

"The Jena OWL reasoners could be described as instance-based reasoners. That is, they work by using rules to propagate the if- and only-if- implications of the OWL constructs on instance data. Reasoning about classes is done indirectly - for each declared class a prototypical instance is created and elaborated. If the prototype for a class A can be deduced as being a member of class B then we conclude that A is a subClassOf B. This approach is in contrast to more sophisticated Description Logic reasoners which work with class expressions and can be less efficient when handling instance data but more efficient with complex class expressions and able to provide complete reasoning."


--------------------------------------------------------------------------------------------------------------------------------

A solution to this is to configure the Pellet reasoner with Fuseki. The current version of Fuseki is 0.2.5, which uses Jena 2.7.4 and ARQ 2.9.4.
The stable Pellet version in Maven is 2.3.0, which is based on Jena 2.6.4. I tried to recompile Pellet against 2.7.4, but there were enormous incompatibility issues, as the latest Jena has moved to Apache Jena while the Jena version used by Pellet was still the HP Jena. Issues arise with ARQ. So I had to drop the idea. Anyone out there, please put in your inputs!!


When I started using Pellet I also found something interesting:

Till date I only knew about the Jena Reasoner API and the Sesame API.

But with Pellet I found a new one, the OWL API. Pellet 2.3.0 is supported via the OWL API, and there is a pellet-jena module for Jena-based reasoners :| .

To get an idea of the difference between these APIs, refer to - http://answers.semanticweb.com/questions/2568/jena-api-or-owl-api-or-protege-owl-api


---------------------------------------------------------------------------------------------------------------------------------------


One can use pellet-jena 2.3.0 and TDB 0.8.10 to successfully integrate Jena with the Pellet reasoner.

// a Pellet-backed ontology model on top of a TDB-persisted dataset
Reasoner reasoner = PelletReasonerFactory.theInstance().create();
String directory = "data/DB1";
Dataset dataset = TDBFactory.createDataset(directory);
Model emptyModel = dataset.getDefaultModel();

OntModel model = ModelFactory.createOntologyModel(PelletReasonerFactory.THE_SPEC, emptyModel);


When I tried to add data from multiple files using a code snippet like

String source = "D:\\Project\\Store_DB\\tmp\\trail_1.rdf";
FileManager.get().readModel(tdb, source);

it didn't work. So I had to load each file into a separate model and add those models into the main model. That way it worked. It seems to be some bug with that version of Jena; I saw it mentioned in one of the blogs. This kept me occupied for a whole day, ufff. Anyways, good findings. Enjoi!