Definite Clause Grammars (DCG) are an elegant formalism for specifying context free grammars, and part of their popularity is due to their support in the Prolog language. Most books on Natural Language processing usually include a brief coverage of DCGs, even though Natural languages are not context-free. Because of the ability to attach arbitrary actions on the RHS of the grammar, DCGs can do more than most context-free parsers.
As readers of my blogs know, I am a great fan of Lisp. The good news is that LispWorks Enterprise Edition comes with an add-on package called KnowledgeWorks that facilitates the building of complex knowledge-based systems. One component of this package is a Prolog engine implemented in Lisp. This engine even supports DCG! That means I can build language recognizers using DCG in Lisp. What more could one ask?
In today’s blog, I will show a toy DCG grammar for recognizing simple English sentences. If you are interested in learning more about DCG in Prolog, the following books might help:
- Clive Mathews, An Introduction to Natural Language Processing Through Prolog, Addison Wesley Longman, 1998.
- Michael A. Covington, Natural Language Processing for Prolog programmers, Prentice Hall, 1994.
- Pierre M.Nugues, Language processing with Perl and Prolog: Theories, Implementation, and Application (2nd Edition), Springer Verlag, 2014.
Let us look at one grammar rule:
(defgrammar det
((det (det ?Word)) ?Word ((is-det? ?Word) t) ))
This says, in essence, that if the input word satisfies the function is-det?, then it is a determiner. In addition, it builds a parse structure (det <input-word>) and is associated with the non-terminal det.
The function is-det? takes an argument and returns t (True) if it is a determiner. In this example, I am trivially checking for membership in a simple list, but in an actual implementation, this has to interface to a huge lexicon of many thousand words, containing detailed description (including semantic annotations) of each word.
Here is a slightly more interesting rule:
(defgrammar np
((np (np ?N)) (noun ?N))
((np (np ?D ?N)) (det ?D) (noun ?N))
((np (np ?D ?A ?N)) (det ?D) (adj ?A) (noun ?N)) )
This says that the non-terminal np (stands for Noun Phrase) is any one of the following:
NP -> Noun
NP -> Det Noun
NP -> Det Adj Noun
When the NP is satisfied by the input, it results in an appropriate parse tree.
The main entry point for the grammar is s, which defines the complete sentence. In our case, the rule is quite simple:
S -> NP VP
That is, a sentence is defined as a noun phrase followed by a verb phrase.
The full grammar is shown in the following image:
Let us try to parse some sentences:
CP-USER 2 > (parse-grammar ‘s ‘(the big dog chased the small cat))
(S (NP (DET THE) (ADJ BIG) (NOUN DOG)) (VP (VERB CHASED) (NP (DET THE) (ADJ SMALL) (NOUN CAT))))
The sentence the big dog chased the small cat is correct as per our grammar and the corresponding parse tree is returned.
CP-USER 3> (parse-grammar ‘s ‘(mohan saw mary))
(S (NP (NOUN MOHAN)) (VP (VERB SAW) (NP (NOUN MARY))))
This is also a valid sentence. You can see the difference between these two sentences in terms of Noun Phrase. The first sentence uses Det Adj Noun combination, whereas the second sentences just uses Noun.
Quite obviously, this is a toy grammar, and has been created just to show the DCG formalism in Lisp. English grammar is quite complex with many constraints coming into play, for example number agreement, semantic correctness, and so on. I will try to cover some of these topics in future posts.
For now, you can download the Lisp version of my grammar (remember this requires LispWorks Enterprise Edition).
Recent Comments