In the last post, I outlined an approach to convert a dependency graph (the result of dependency parsing) to RDF. The particular RDF format I used is Turtle, which is widely supported. Today, I would like to show how to load this RDF data into a semantic graph database and run queries against it.
There are several Semantic Web Frameworks and Graph databases that we could use, but my choice today is AllegroGraph free Server Edition from Franz Inc. It supports SPARQL as well as multiple Reasoners. Besides, it has excellent integration with Allegro Common Lisp and I am an avid user of that Lisp (in addition to LispWorks Lisp). It also supports clients written in Java, Python, Clojure, and many more via its REST API.
I downloaded the Virtual Machine package (Ubuntu 64-bit) and since my development machine is Windows 64-bit, installed the VM in my VMware Workstation 12 Player running on my Windows 10.
Please note that your version of the AllegroGraph server might differ from the one I am using. But that should not matter for our discussion.
There are convenient icons to Start and Stop the server.
Let us double click on Start AG and wait for a few seconds before the server is ready. Since the server is running inside a VM box, I need the IP of the server to connect to it from my Windows machine.
To find the IP address, let us open the Terminal and enter
~/franz/ipaddr.sh
This shows the IP address. The port to use is 10035.
To make sure the server is running, let us open a browser and point it at the server, using the IP address and port found above (http://<ip-address>:10035).
You can see that this shows the Web View of the server. This means the server is working fine. Although we can work with the database using this interface, we will instead access the services programmatically using Allegro Common Lisp.
Using Lisp Client
After launching AllegroCL, we have to first load the client library and set up the basic package environment:
(load "G:/acl-projects/Allegrograph/agraph.fasl")
(eval-when (:compile-toplevel :load-toplevel :execute)
  (require :agraph)
  (db.agraph:enable-readtable :allegrograph))
(in-package :db.agraph.user)
(use-package :cl)
We then define some convenient functions:
(defun create-or-open (name server-ip function)
  (funcall function name
           :triple-store-class 'remote-triple-store
           :server server-ip
           :port 10035
           :user "test" :password "xyzzy"))
(defun create-store (name server-ip)
  (create-or-open name server-ip #'create-triple-store))
(defun open-store (name server-ip)
  (create-or-open name server-ip #'open-triple-store))
(defun close-store ()
  (close-triple-store))
;;; Open the Dependency Graph DB
;;; Returns the number of triples in the DB
(defun open-dep-graph-store (dbname serverip)
  (open-store dbname serverip)
  (enable-!-reader)
  (enable-print-decoded t)
  (triple-count))
;;; Close the currently open DB after committing the data.
(defun close-dep-graph-store ()
  (commit-triple-store)
  (close-store))
;;; Delete all the tuples. Not Committed!
(defun delete-all-tuples ()
  (delete-triples :s nil))
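Since every session follows the same open, work, close pattern, the helpers above could be wrapped in a small macro. This is only a sketch: the name with-dep-graph-store is my own invention and not part of the AllegroGraph client, and it assumes it is evaluated in the same package as open-dep-graph-store and close-store defined above.

```lisp
;;; Hypothetical convenience macro (not part of the AllegroGraph API).
;;; Opens the store, runs BODY, and always closes the store afterwards,
;;; even if BODY signals an error.
(defmacro with-dep-graph-store ((dbname serverip) &body body)
  `(progn
     (open-dep-graph-store ,dbname ,serverip)
     (unwind-protect
          (progn ,@body)
       ;; CLOSE-STORE does not commit, so nothing is saved
       ;; implicitly when BODY fails part-way through.
       (close-store))))
```

A call would then look like (with-dep-graph-store ("rdf-test" "192.168.240.133") (triple-count)). Because the cleanup step uses close-store rather than close-dep-graph-store, any changes that should be kept need an explicit (commit-triple-store) inside the body.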
Since the server is now running, let us create a new triple store (the listener transcript is shown; the lines after each expression are its result):
triple-store-user(15): (create-store "rdf-test" "192.168.240.133")
#<remote-triple-store
  rdf-test http://192.168.240.133:37792/sessions/8e0ed752-549e-b354-2e98-000c29d66220, open @ #x204a7a1a2>
Let us verify that there is no data in the triple store yet.
triple-store-user(16): (triple-count)
0
Let us also confirm the DB name:
triple-store-user(17): (db-name *db*)
"rdf-test"
OK. Now let us load our Turtle file, created in the last post.
triple-store-user(18): (load-turtle "G:/Python Projects/Dependency parser/Sample.ttl")
300
{G}
Let us check the count of triples.
triple-store-user(19): (triple-count)
300
Although this step is not needed, for safety, let us commit the transaction, close the triple store and re-open it.
triple-store-user(20): (commit-triple-store)
t
triple-store-user(21): (close-store)
nil
Let us now open the triple store instead of creating a new one.
triple-store-user(22): (open-dep-graph-store "rdf-test" "192.168.240.133")
300
triple-store-user(23): (db-name *db*)
"rdf-test"
triple-store-user(24): (triple-count)
300
If you recall, we used a namespace called m in our triples. We have to register that namespace for use with the current DB.
triple-store-user(25): (register-namespace "m" "http://mmsindia/depgraph/example/")
Querying Using Prolog
One of the benefits of using the Lisp client is that AllegroCL lets you query the triple store using its implementation of Prolog (the syntax is Lispy compared to standard Prolog). A good tutorial is here:
https://franz.com/agraph/support/documentation/current/prolog-tutorial.html
Let us start with a simple query: Find the lemmas of all the words appearing in the text:
triple-store-user(26): (remove-duplicates
                        (apply #'append (select (?x)
                                          (q- ? !m:lemma ?x)))
                        :test #'string=)
("jack" "happily" "walk" "down" "street" "icecream" "in" "hand" "suddenly" "a" …)
The above shows a partial list; there are many more words in the result.
You can see that the query pattern is pretty straightforward. We are looking for that ?x which appears as the third element of the triple:
<word-id> m:lemma <the-lemma>
Thus it matches jack in the triple
m:word-250 m:lemma "jack"
and so on.
Next, let us find the root words of all the sentences:
triple-store-user(27): (remove-duplicates
                        (apply #'append (select (?z)
                                          (q- ?x !m:ROOT ?)
                                          (q- ?x !m:label ?z)))
                        :test #'string=)
("walking" "started" "saw" "kicked" "ran")
Here we have a query that involves two triples.
The variable ?x in the first pattern matches the first element of every triple that has m:ROOT as the second element.
The second pattern matches the triple that has m:label as the second element, but whose first element is the same as the just matched ?x. The variable ?z then binds to the third element of that triple.
As you would have noticed, the pattern structure is quite similar to what we use in SPARQL.
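To make the matching described above concrete, here is a toy sketch in plain Common Lisp that imitates the two-pattern query on a handful of in-memory triples. It does not use AllegroGraph at all; the word ids, the objects of the ROOT triples and the helper names (subjects-with, objects-of, root-labels) are made up for illustration, and the real store certainly does this far more efficiently.

```lisp
;;; Toy data: (subject predicate object) lists standing in for stored triples.
;;; The ids and the ROOT objects are invented for this illustration.
(defparameter *toy-triples*
  '(("word-250" "lemma" "jack")
    ("word-250" "label" "Jack")
    ("word-252" "lemma" "walk")
    ("word-252" "label" "walking")
    ("word-252" "ROOT"  "word-252")
    ("word-260" "label" "started")
    ("word-260" "ROOT"  "word-260")))

(defun subjects-with (predicate triples)
  "Subjects of every triple whose second element is PREDICATE.
Plays the role of the first pattern, which binds ?x."
  (loop for (s p nil) in triples
        when (string= p predicate)
          collect s))

(defun objects-of (subject predicate triples)
  "Objects of every triple matching both SUBJECT and PREDICATE.
Plays the role of the second pattern, which binds ?z for a given ?x."
  (loop for (s p o) in triples
        when (and (string= s subject) (string= p predicate))
          collect o))

(defun root-labels (triples)
  "Join the two patterns on the shared subject, as ?x does in the query."
  (remove-duplicates
   (loop for x in (subjects-with "ROOT" triples)
         append (objects-of x "label" triples))
   :test #'string=))

(root-labels *toy-triples*)  ; => ("walking" "started")
```

The shared variable is what turns two independent patterns into a join: only subjects that satisfied the first pattern are looked up in the second.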
Here are two more queries:
How many times does the word Jack appear in the text?
triple-store-user(28): (length (select (?x)
                                 (q- ?x !m:label ?root)
                                 (lispp (string= (upi->value ?root) "Jack"))))
1
It occurs in the first sentence and not later.
How many sentences are there in the whole text?
triple-store-user(29): (length (remove-duplicates
                                (apply #'append (select (?x)
                                                  (q- ?x !m:word ?)))
                                :test #'string=))
5
OK, a total of 5 sentences.
Many complex queries can be conveniently expressed in the Prolog format. The Lisp client also supports SPARQL queries. Here is an example.
Which sentences contain the word Jack?
triple-store-user(30): (run-sparql
"prefix m: <http://mmsindia/depgraph/example/>
select ?x {
  ?x m:word ?w .
  ?w m:label \"Jack\" .
}" :results-format :lists)
(({sent-1}))
OK, only sent-1 has it.
Notice how the patterns of triples are quite similar in SPARQL and Prolog queries.
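To illustrate the similarity, the earlier Prolog query for the root words could be written in SPARQL roughly as follows. This is a sketch I have not run against the store; the variable name *root-labels-query* and the placeholder variable ?any are my own choices, while the prefix and predicates are the ones used above.

```lisp
;;; SPARQL counterpart of the two-pattern Prolog query for root words.
;;; ?any stands in for the anonymous ? of the Prolog version, and
;;; "distinct" takes over the job of REMOVE-DUPLICATES.
(defparameter *root-labels-query*
  "prefix m: <http://mmsindia/depgraph/example/>
select distinct ?z {
  ?x m:ROOT ?any .
  ?x m:label ?z .
}")

;; With the store open, it would be run the same way as the query above:
;; (run-sparql *root-labels-query* :results-format :lists)
```

Each q- clause maps to one triple pattern, and the shared ?x plays the same joining role in both notations.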
Let us close the DB.
triple-store-user(31): (close-store)
nil
I suppose you get the big picture. By converting the dependency graph of parsed sentences to RDF, we are able to load the triples into a Semantic Web DB (a graph database) and apply interesting queries on the data. With additional information stored, it may even be possible to make inferences over the data.
Although I have shown the use case of AllegroGraph, the RDF data can be loaded into any Semantic Web DB (and almost all of them support Turtle format) and similarly queried using SPARQL.
Hope you found this discussion useful.
Have a nice weekend!