In today’s post, let us see how we can enhance the grammar representation discussed so far to enforce number agreement and to build a parse tree.
Fortunately, this turns out to be quite straightforward. Just as we do in Prolog, we need to include additional parameters, as needed, to each grammar rule.
In the earlier two posts, we trivially included the terminal words in the grammar. This is not practical. In a real application, we would have a lexicon that records the part of speech and other features of each word and supplies them to the parser on demand. So if we wish to know whether sleeps is a singular verb, we consult the lexicon and it says yes. Building a comprehensive lexicon is very useful in NLP research (and, of course, quite a challenge), and it is one of the projects I have been working on for a few years now.
Anyway, for our simple grammar, we will write functions that act as the interface to the lexicon. In fact, our functions will work with just a few words!
Here are the POS utility functions that get called during parsing.
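As a rough idea of what such utilities might look like, here is a minimal sketch in plain Common Lisp. It is not the code from the downloadable source: the names *toy-lexicon*, word-entry, word-category-p and word-number are hypothetical, and the lexicon covers only the handful of words used in this post.

```lisp
;; Hypothetical toy lexicon: each entry is (word category number).
;; All names in this sketch are illustrative assumptions.
(defparameter *toy-lexicon*
  '((he     pn singular)
    (they   pn plural)
    (book   n  singular)
    (books  n  plural)
    (sleeps iv singular)
    (sleep  iv plural)
    (writes tv singular)
    (write  tv plural)))

(defun word-entry (word category)
  "Return the lexicon entry for WORD in CATEGORY, or NIL if there is none."
  (find-if (lambda (entry)
             (and (eq (first entry) word)
                  (eq (second entry) category)))
           *toy-lexicon*))

(defun word-category-p (word category)
  "True if WORD is listed under CATEGORY."
  (not (null (word-entry word category))))

(defun word-number (word category)
  "Return SINGULAR or PLURAL for WORD in CATEGORY, or NIL if unknown."
  (third (word-entry word category)))
```

With these definitions, (word-category-p 'sleeps 'iv) is true and (word-number 'sleeps 'iv) gives SINGULAR, which is exactly the kind of question the parser asks of the lexicon.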
The updated grammar rules are shown here.
If you look at the non-terminals np, vp, etc., you will notice that each takes two arguments: one for building the parse tree and the other for enforcing number agreement.
The other point to note is that terminal categories such as pn and n call our POS utility functions to check the word category and to get the corresponding number.
The beauty of unification is apparent here. For example, while parsing np, its number is determined by the constituent n or pn and this eventually gets passed on to the vp, where the choice of a verb (iv or tv) is constrained by the number transmitted by np.
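To make the flow of the two arguments concrete, here is a hand-written analogue in plain Common Lisp. It is only a sketch under stated assumptions, not the Common Prolog grammar itself: there is no unification or backtracking here, the number is passed along explicitly as a function argument, and all names (*mini-lexicon*, parse-word, parse-np, parse-vp, parse-s) are hypothetical.

```lisp
;; Hand-written analogue of the grammar, NOT the Common Prolog rules.
;; Each parse function returns the subtree it built plus the unconsumed words.
(defparameter *mini-lexicon*
  '((he pn singular) (they pn plural)
    (books n plural)
    (sleeps iv singular) (sleep iv plural)
    (writes tv singular) (write tv plural)))

(defun parse-word (category words)
  "If the first of WORDS belongs to CATEGORY, return its tree, number and the rest."
  (let ((entry (find-if (lambda (e)
                          (and (eq (first e) (first words))
                               (eq (second e) category)))
                        *mini-lexicon*)))
    (when (and words entry)
      (values (list category (first words)) (third entry) (rest words)))))

(defun parse-np (words)
  "np -> pn | n. The number of the np is the number of its head word."
  (dolist (category '(pn n))
    (multiple-value-bind (tree number remaining) (parse-word category words)
      (when tree
        (return (values (list 'np tree) number remaining))))))

(defun parse-vp (words subject-number)
  "vp -> iv | tv np. The verb must agree with SUBJECT-NUMBER."
  (multiple-value-bind (verb number remaining) (parse-word 'iv words)
    (if (and verb (eq number subject-number))
        (values (list 'vp verb) remaining)
        (multiple-value-bind (verb number remaining) (parse-word 'tv words)
          (when (and verb (eq number subject-number))
            ;; A transitive verb requires an object np; its number is free.
            (multiple-value-bind (object object-number remaining)
                (parse-np remaining)
              (declare (ignore object-number))
              (when object
                (values (list 'vp verb object) remaining))))))))

(defun parse-s (words)
  "s -> np vp. Return the parse tree, or NIL if WORDS is not a sentence."
  (multiple-value-bind (subject number remaining) (parse-np words)
    (when subject
      (multiple-value-bind (predicate remaining) (parse-vp remaining number)
        (when (and predicate (null remaining))
          (list 's subject predicate))))))
```

Here parse-np hands back the number of its head word, and parse-s passes that number on to parse-vp, which refuses any verb that disagrees. In the Prolog version the same effect falls out of sharing one logic variable between np and vp, with no explicit plumbing.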
Let us try a few sample sentences.
CP-USER 2 > (parse-grammar 's '(he sleeps))
(S (NP (PN HE)) (VP (IV SLEEPS)))
CP-USER 3 > (parse-grammar 's '(they sleep))
(S (NP (PN THEY)) (VP (IV SLEEP)))
CP-USER 4 > (parse-grammar 's '(they write books))
(S (NP (PN THEY)) (VP (TV WRITE) (NP (N BOOKS))))
Looks like number agreement is correctly enforced. Let us confirm by giving some invalid sentences.
CP-USER 5 > (parse-grammar 's '(he sleep))
NIL
CP-USER 6 > (parse-grammar 's '(they sleeps))
NIL
Let us see whether the distinction between transitive and intransitive verbs is enforced.
CP-USER 7 > (parse-grammar 's '(he writes))
NIL
Since writes is a transitive verb, our grammar expects an np to follow it. In the above case none follows, so the sentence is rejected. Quite correct!
So that is it! We now know how to implement grammars that enforce constraints and also build the parse tree along the way! How can we improve the above DCG grammar, expressed in LispWorks Common Prolog?
Obviously, we can enhance the grammar to cover more constructions, but the more striking possibility is to simplify the syntax of the grammar itself! If you look carefully at each grammar rule, you will see a lot of redundancy. Much of this can be eliminated if we choose a simpler syntax and translate it into the present one. Since this is Lisp, anything is indeed possible! I have such a working version and, if time permits, I will share that implementation in a future post.
You can download the Lisp source code for the above grammar here.
That is it for now. Bye.


