Dependency Graph to RDF – Part 2

Written on September 30, 2018 in LISP, Natural Language Processing, Programming

In the last post, I outlined an approach to convert a dependency graph (the result of dependency parsing) to RDF. The particular RDF format I used is Turtle, which is widely supported. Today, I would like to show how to load this RDF data into a Semantic Graph database and query it.

There are several Semantic Web frameworks and graph databases that we could use, but my choice today is the free Server Edition of AllegroGraph from Franz Inc. It supports SPARQL as well as multiple reasoners. Besides, it integrates very well with Allegro Common Lisp, and I am an avid user of that Lisp (in addition to LispWorks). It also supports clients written in Java, Python, Clojure, and many more via its REST API.

I downloaded the Virtual Machine package (Ubuntu 64-bit). Since my development machine runs 64-bit Windows 10, I installed the VM in VMware Workstation 12 Player.

AllegroGraph VM

Please note that your version of the AllegroGraph server might differ from the one I am using. But that should not matter for our discussion.

There are convenient icons to Start and Stop the server.

Let us double-click on Start AG and wait a few seconds for the server to be ready. Since the server is running inside a VM, I need the IP address of the server to connect to it from my Windows machine.

To find the IP address, let us open a terminal and enter:

~/franz/ipaddr.sh 

This shows the IP address. The port to use is 10035.

Finding the IP Address

To make sure the server is running fine, let us open a browser and point it to the server's IP address on port 10035:

Browser View

You can see that this shows the Web View of the server. This means the server is working fine. Although we can work with the database using this interface, we will instead access the services programmatically using Allegro Common Lisp.

Using Lisp Client

After launching AllegroCL, we have to first load the client library and set up the basic package environment:

(load "G:/acl-projects/Allegrograph/agraph.fasl")

(eval-when (:compile-toplevel :load-toplevel :execute)
  (require :agraph)
  (db.agraph:enable-readtable :allegrograph))

(in-package :db.agraph.user)
(use-package :cl)

We then define some convenient functions:

(defun create-or-open (name server-ip function)
  (funcall function name
           :triple-store-class 'remote-triple-store
           :server server-ip
           :port 10035
           :user "test" :password "xyzzy"))

(defun create-store (name server-ip)
  (create-or-open name server-ip #'create-triple-store))

(defun open-store (name server-ip)
  (create-or-open name server-ip #'open-triple-store))

(defun close-store ()
  (close-triple-store))

;;; Open the Dependency Graph DB
;;; Returns the number of triples in the DB
(defun open-dep-graph-store (dbname serverip)
  (open-store dbname serverip)
  (enable-!-reader)
  (enable-print-decoded t)
  (triple-count))

;;; Close the currently open DB after committing the data.
(defun close-dep-graph-store ()
  (commit-triple-store)
  (close-store))

;;; Delete all the tuples. Not Committed!
(defun delete-all-tuples ()
  (delete-triples :s nil))
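One thing the helpers above do not cover is closing the store when a query fails halfway. Here is a minimal sketch of one way to handle that with unwind-protect. The names call-with-store and demo-call-with-store are hypothetical and not part of the AllegroGraph client. The demo uses stand-in lambdas that only record events, so it runs without a server.

```lisp
;; Hypothetical helper, not part of the AllegroGraph client.
;; Calls OPEN-FN, then BODY-FN, and always calls CLOSE-FN afterwards,
;; even when BODY-FN signals an error.
(defun call-with-store (open-fn close-fn body-fn)
  (funcall open-fn)
  (unwind-protect
       (funcall body-fn)
    (funcall close-fn)))

;; Records the order of events in the demonstration below.
(defvar *events* '())

;; Stand-in demonstration: the "query" step fails on purpose, and the
;; "close" step still runs while the error unwinds.
(defun demo-call-with-store ()
  (setf *events* '())
  (ignore-errors
    (call-with-store (lambda () (push :open *events*))
                     (lambda () (push :close *events*))
                     (lambda ()
                       (push :query *events*)
                       (error "query failed"))))
  (reverse *events*))
```

With the helpers defined earlier, the three arguments could be a lambda that calls open-dep-graph-store, the function close-store, and a lambda that runs the queries. Passing close-store rather than close-dep-graph-store seems the safer choice on the error path, since the latter commits before closing.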

Since the server is now running, let us create a new triple store (the listener transcript is shown; the result follows each input):

triple-store-user(15): (create-store "rdf-test" "192.168.240.133")

#<remote-triple-store
  rdf-test http://192.168.240.133:37792/sessions/8e0ed752-549e-b354-2e98-000c29d66220, open @ #x204a7a1a2>

Let us verify that there is no data in the triple store yet.

triple-store-user(16): (triple-count)

0

Let us also confirm the DB name:

triple-store-user(17): (db-name *db*)

"rdf-test"

OK. Now let us load our Turtle file, created in the last post.

triple-store-user(18): (load-turtle "G:/Python Projects/Dependency parser/Sample.ttl")

300

{G}

Let us check the count of triples.

triple-store-user(19): (triple-count)

300

Although this step is not needed, for safety, let us commit the transaction, close the triple store and re-open it.

triple-store-user(20): (commit-triple-store)

t

triple-store-user(21): (close-store)

nil

Let us now open the triple store instead of creating a new one.

triple-store-user(22): (open-dep-graph-store "rdf-test" "192.168.240.133")

300

triple-store-user(23): (db-name *db*)

"rdf-test"

triple-store-user(24): (triple-count)

300

If you recall, we used a namespace called m in our triples. We have to register that namespace with the current DB.

triple-store-user(25): (register-namespace "m" "http://mmsindia/depgraph/example/")

http://mmsindia/depgraph/example/
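Registering the prefix is what lets a short name such as m:lemma stand for a full IRI in the queries that follow. The plain-Lisp sketch below only illustrates that expansion. The table *namespaces* and the function expand-qname are hypothetical names made up for this example, and this is not how AllegroGraph implements it internally.

```lisp
;; Hypothetical prefix table; the single entry mirrors the namespace
;; registered in the transcript above.
(defparameter *namespaces*
  '(("m" . "http://mmsindia/depgraph/example/")))

;; Expands "prefix:local" into a full IRI when the prefix is known.
;; Returns QNAME unchanged when there is no colon or the prefix is unknown.
(defun expand-qname (qname &optional (namespaces *namespaces*))
  (let* ((colon (position #\: qname))
         (prefix (and colon (subseq qname 0 colon)))
         (base (and prefix
                    (cdr (assoc prefix namespaces :test #'string=)))))
    (if base
        (concatenate 'string base (subseq qname (1+ colon)))
        qname)))
```

For example, (expand-qname "m:lemma") gives the base IRI followed by lemma, while a name with an unregistered prefix comes back as is.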

Querying Using Prolog

One of the benefits of using the Lisp client is that AllegroCL lets you query the triple store using its implementation of Prolog (the syntax is Lispy compared to standard Prolog). A good tutorial is here:

https://franz.com/agraph/support/documentation/current/prolog-tutorial.html

Let us start with a simple query: Find the lemmas of all the words appearing in the text:

triple-store-user(26): (remove-duplicates (apply #'append (select (?x)
         (q- ? !m:lemma ?x))) :test #'string=)

("jack" "happily" "walk" "down" "street" "icecream" "in" "hand" "suddenly" "a" ...)

The above shows a partial list; there are many more words in the result.

You can see that the query pattern is pretty straightforward. We are looking for that ?x which appears as the third element of the triple:

<word-id> m:lemma <the-lemma>

Thus it matches jack in the triple

m:word-250 m:lemma "jack"

and so on.
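The wrapper around select in the query deserves a word. Judging from how it is used above, select hands back one list per solution, so the rows are first flattened and then de-duplicated. Here is a small sketch with hand-written rows standing in for the query result. The data in *rows* is illustrative and not actual server output, and flatten-rows and unique-lemmas are hypothetical helper names.

```lisp
;; Hand-written stand-in for the result of the select form:
;; one inner list per solution, holding the value bound to ?x.
(defparameter *rows*
  '(("jack") ("happily") ("walk") ("jack") ("street") ("walk")))

;; Flatten the one-element rows into a single list of strings.
(defun flatten-rows (rows)
  (apply #'append rows))

;; Drop repeated lemmas; string= compares the strings by content.
;; By default remove-duplicates keeps the last occurrence of each item.
(defun unique-lemmas (rows)
  (remove-duplicates (flatten-rows rows) :test #'string=))
```

Passing :from-end t to remove-duplicates would keep the first occurrence of each lemma instead, which preserves the order in which the words first appear.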

Next, let us find the root words of all the sentences:

triple-store-user(27): (remove-duplicates (apply #'append (select (?z)
        (q- ?x !m:ROOT ?)
        (q- ?x !m:label ?z))) :test #'string=)

("walking" "started" "saw" "kicked" "ran")

Here we have a query that involves two triples. 

The variable ?x in the first pattern matches the first element of every triple that has m:ROOT as the second element.

The second pattern matches the triple that has m:label as the second element, but whose first element is the same as the just matched ?x. The variable ?z then binds to the third element of that triple.

As you may have noticed, the pattern structure is quite similar to what we use in SPARQL.
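To make the role of the shared variable ?x concrete, here is a plain-Lisp sketch of the same two-pattern match over a handful of in-memory triples. The triples are made up for illustration and are not taken from Sample.ttl, and the function names are hypothetical. It is meant to show the idea of the join, not how the Prolog engine evaluates it.

```lisp
;; Illustrative (subject predicate object) triples, made up for this sketch.
(defparameter *triples*
  '(("word-3" "ROOT"  "word-3")
    ("word-3" "label" "walking")
    ("word-1" "nsubj" "word-3")
    ("word-1" "label" "Jack")
    ("word-9" "ROOT"  "word-9")
    ("word-9" "label" "started")))

;; First pattern: every subject that occurs with PREDICATE.
(defun subjects-with-predicate (predicate triples)
  (loop for (s p) in triples
        when (string= p predicate)
          collect s))

;; Second pattern: objects of the triples with this SUBJECT and PREDICATE.
(defun objects-of (subject predicate triples)
  (loop for (s p o) in triples
        when (and (string= s subject) (string= p predicate))
          collect o))

;; The join: each subject bound by the first pattern is reused
;; as the subject of the second pattern.
(defun root-labels (triples)
  (loop for x in (subjects-with-predicate "ROOT" triples)
        append (objects-of x "label" triples)))
```

Here word-1 has a label but no ROOT triple, so it never gets bound by the first pattern and its label is left out of the result.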

Here are two more queries:

How many times does the word Jack appear in the text?

triple-store-user(28): (length (select (?x)
        (q- ?x !m:label ?root)
        (lispp (string= (upi->value ?root) "Jack"))))

1

It occurs in the first sentence and not later.

How many sentences are there in the whole text?

triple-store-user(29): (length (remove-duplicates (apply #'append (select (?x)
        (q- ?x !m:word ?))) :test #'string=))

5

OK, a total of 5 sentences.

Many complex queries can be conveniently expressed in the Prolog format. The Lisp client also supports SPARQL queries. Here is an example.

Which sentences contain the word Jack?

triple-store-user(30): (run-sparql
 "prefix m: <http://mmsindia/depgraph/example/>
 select ?x {
    ?x m:word ?w .
    ?w m:label \"Jack\" .
 }" :results-format :lists)

(({sent-1}))

OK, only sent-1 has it.

Notice how the patterns of triples are quite similar in SPARQL and Prolog queries.
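Writing the escaped quotes by hand gets tedious once the word being searched for varies. The sketch below builds the same query text from a parameter. Both function names are hypothetical helpers, not part of the AllegroGraph client, and the escaping only handles backslashes and double quotes, which I assume is enough for single words.

```lisp
;; Hypothetical helper: backslash-escapes backslashes and double quotes
;; so that TEXT can sit inside a double-quoted SPARQL string literal.
(defun escape-sparql-string (text)
  (with-output-to-string (out)
    (loop for ch across text
          do (when (member ch '(#\\ #\"))
               (write-char #\\ out))
             (write-char ch out))))

;; Hypothetical helper: returns the SPARQL text asking which sentences
;; contain a word whose label is WORD.
(defun sentences-containing-query (word)
  (format nil "prefix m: <http://mmsindia/depgraph/example/>
select ?x {
  ?x m:word ?w .
  ?w m:label \"~a\" .
}"
          (escape-sparql-string word)))
```

The resulting string can then be passed to run-sparql with :results-format :lists, exactly as in the transcript above.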

Let us close the DB.

triple-store-user(31): (close-store)

nil

I suppose you get the big picture. By converting the dependency graph of parsed sentences to RDF, we are able to load the triples into a Semantic Web DB (a graph database) and apply interesting queries on the data. With additional information stored, it may even be possible to make inferences over the data.

Although I have shown the use case of AllegroGraph, the RDF data can be loaded into any Semantic Web DB (and almost all of them support Turtle format) and similarly queried using SPARQL.

Hope you found this discussion useful.

Have a nice weekend!
