Text Generation Using iLangGen Framework

Written on August 6, 2017 in LISP, Natural Language Processing, Programming

The two primary areas in Natural Language Processing are Natural Language Understanding and Natural Language Generation. The former is concerned with processing and making sense of natural language text, whereas the latter is concerned with synthesizing text, possibly from some deep representation. Both are fascinating and, at the same time, challenging areas of research. The good news is that both have now moved from research into the mainstream.

I have been fortunate enough to be associated with both these areas for many years (one of my first projects was to implement an ATN parser in Lisp – in the year 1987). Alongside my other project commitments, over the years, I have been gradually building three core components of NLP:

– A lexicon

– A parser/chunking engine

– A text generation framework

Even though Machine Learning is widely used in the area of text processing, I believe that an intelligent lexicon has its own uses in parsing and generation. I hope my lexicon will be ready for commercial use in the near future. I have been using it as an integral component of my chunking engine as well as the text generation system.

I hope to write more on all three projects in future posts, but today I would like to talk about iLangGen, a text generation framework that I have implemented in Common Lisp.

At its core, iLangGen uses a BNF-like grammar formalism to model the surface structure of English text (at present it is limited to English, but it could be extended to other languages). It is quite feature-rich: for example, new grammars can be built from existing grammars using composition and inheritance. I will be giving examples of these in future posts.

Although the primary use of iLangGen is likely to be generating text in natural language, another interesting use case is generating test cases for an application such as a compiler. We can build a grammar (a non-trivial exercise) to generate sample programs that can be used as test inputs for a compiler.
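To give a feel for the compiler-testing idea, here is a minimal sketch in plain Common Lisp. It does not use iLangGen; the rule table *expr-rules* and the functions derive, derive-sequence and sample-programs are hypothetical names invented for this illustration. It enumerates every small arithmetic expression a toy grammar can produce within a depth bound, so that each one could be fed to a parser or compiler as a test input.

```lisp
;; Hypothetical toy grammar for arithmetic expressions (not iLangGen syntax).
;; Each entry is (lhs . alternatives); strings are terminals.
(defparameter *expr-rules*
  '((expr   . ((number) ("(" expr op expr ")")))
    (op     . (("+") ("*")))
    (number . (("1") ("2")))))

(defun derive (element depth)
  "Return every token list derivable from ELEMENT with at most DEPTH
nested rule applications. Terminals cost nothing."
  (cond ((stringp element) (list (list element)))
        ((zerop depth) '())
        (t (loop for alternative in (cdr (assoc element *expr-rules*))
                 append (derive-sequence alternative (1- depth))))))

(defun derive-sequence (elements depth)
  "Return every token list derivable from the sequence ELEMENTS."
  (if (null elements)
      (list '())
      (loop for head in (derive (first elements) depth)
            append (loop for tail in (derive-sequence (rest elements) depth)
                         collect (append head tail)))))

(defun sample-programs (depth)
  "Return the derivable expressions as strings, tokens separated by spaces."
  (mapcar (lambda (tokens) (format nil "~{~a~^ ~}" tokens))
          (derive 'expr depth)))
```

Calling (sample-programs 3) yields the two bare numbers plus the eight parenthesised expressions over them. The depth bound is what keeps the recursive expr rule from producing an endless list.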

OK, end of introduction. Let us look at a sample grammar:

[Figure: iLangGen Grammar]

Every grammar has a name. This grammar is called SimpleGrammar. After the name, there is a placeholder for an optional parent grammar. In this case, there is none. After that, we have a number of rules, where each rule is made up of an LHS and an RHS. Terminal elements are enclosed in double quotes; the others are non-terminals.
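To make this structure concrete, here is a minimal sketch of how a grammar with a name, an optional parent and a set of rules could be held as plain Lisp data. This is not iLangGen's actual syntax: the struct, the rule contents, and the helpers terminal-p and find-rule are all assumptions made for illustration, and the parent lookup shown is just one simple way inheritance might work.

```lisp
;; Hypothetical representation, NOT iLangGen's real grammar syntax.
(defstruct grammar
  name     ; symbol naming the grammar
  parent   ; optional parent grammar, or NIL
  rules)   ; alist of (lhs . alternatives); each alternative is an RHS list

(defun terminal-p (element)
  "Terminals are strings; every other element is a non-terminal symbol."
  (stringp element))

(defun find-rule (lhs grammar)
  "Return the alternatives for LHS, falling back to parent grammars."
  (when grammar
    (let ((entry (assoc lhs (grammar-rules grammar))))
      (if entry
          (cdr entry)
          (find-rule lhs (grammar-parent grammar))))))

;; Illustrative rules only; the rules of the original SimpleGrammar may differ.
(defparameter *simple-grammar*
  (make-grammar
   :name 'simple-grammar
   :parent nil
   :rules '((sentence    . ((noun-phrase verb-phrase)))
            (noun-phrase . (("the" noun)))
            (noun        . (("dog") ("cat")))
            (verb-phrase . ((verb noun-phrase)))
            (verb        . (("sees") ("chases"))))))
```

With this layout, a child grammar created with :parent *simple-grammar* and its own noun rule would override noun while still finding verb in the parent.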

Once a grammar has been defined, we can generate text using it. Text generation, in this case, is the result of traversing the implicit AND-OR graph: the elements of a rule's RHS are AND-ed in sequence, while alternative rules for the same LHS are OR-ed. We can plug in a custom function to participate in the traversal. For greater flexibility, iLangGen supports traversals that build an AST as well as those that do not.

Here I have defined two custom functions, one that makes use of the AST and another that doesn’t. If the function returns true, the traversal continues; otherwise it stops. The print-ast function, for instance, returns nil once 10 traversals are over.

[Figure: Custom Traversal Functions]

Let us first generate text from this grammar with the simple traverser.

[Figure: Simple Graph Traversal]

As you can see, the sentences are generated and printed on the standard output stream. Because the function print-text returns t (True in Lisp), the traversal is completed in full.
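The continue-or-stop protocol is easy to sketch in plain Common Lisp. The code below is not iLangGen's implementation: the rule table *toy-rules* and the functions expand and traverse are hypothetical, and the one-argument signature given to print-text here is an assumption. It walks the AND-OR structure depth first, hands each complete sentence to a visitor function, and stops as soon as the visitor returns nil.

```lisp
;; Hypothetical toy rules: (lhs . alternatives); strings are terminals.
(defparameter *toy-rules*
  '((sentence . ((subject verb)))
    (subject  . (("dogs") ("cats")))
    (verb     . (("run") ("sleep")))))

(defun expand (elements words emit)
  "Expand ELEMENTS left to right, calling EMIT on each complete word list.
WORDS holds the terminals produced so far, in reverse order."
  (cond ((null elements)
         (funcall emit (reverse words)))
        ((stringp (first elements))   ; AND step: consume a terminal
         (expand (rest elements) (cons (first elements) words) emit))
        (t                            ; OR step: try each alternative in turn
         (dolist (alternative (cdr (assoc (first elements) *toy-rules*)))
           (expand (append alternative (rest elements)) words emit)))))

(defun traverse (start visitor)
  "Call VISITOR on each sentence derivable from START, stopping as soon
as VISITOR returns NIL. Returns the number of sentences visited."
  (let ((count 0))
    (block traversal
      (expand (list start) '()
              (lambda (words)
                (incf count)
                (unless (funcall visitor words)
                  (return-from traversal)))))
    count))

(defun print-text (words)
  "Print one sentence and return T so that the traversal continues."
  (format t "~{~a~^ ~}.~%" words)
  t)
```

Evaluating (traverse 'sentence #'print-text) prints all four sentences of this toy grammar, because print-text always returns t. A visitor that returns nil ends the traversal after its first sentence.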

Let us look at the other function, which uses the AST during traversal.

[Figure: Traversal with AST]

In this case, after each traversal, the underlying AST representation is available to our function. The function get-ast-of-node returns the AST corresponding to any node of the grammar. Note also that the traversal stops after 10 sentences have been generated. This is very useful when the grammar can generate an infinite number of sentences: obviously, we don’t want to generate all of them!
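Here is a minimal sketch of the same idea in plain Common Lisp, again without iLangGen itself. All names (*order-rules*, walk, walk-sequence, ast-leaves, subtree, traverse-with-ast, make-limited-visitor) are hypothetical, and subtree is only a stand-in for the kind of lookup get-ast-of-node performs; the real function's signature is not shown here. The toy grammar is recursive, so it describes infinitely many sentences, and the visitor cuts the traversal off after a fixed number.

```lisp
;; Hypothetical recursive grammar: an order is an item, or an item "and" an order.
(defparameter *order-rules*
  '((order . ((item) (item "and" order)))
    (item  . (("tea") ("coffee")))))

(defun walk (element emit)
  "Call EMIT with every AST derivable from ELEMENT, depth first.
A terminal's AST is the string itself; a non-terminal's is (name . children)."
  (if (stringp element)
      (funcall emit element)
      (dolist (alternative (cdr (assoc element *order-rules*)))
        (walk-sequence alternative
                       (lambda (children)
                         (funcall emit (cons element children)))))))

(defun walk-sequence (elements emit)
  "Call EMIT with every list of child ASTs derivable from ELEMENTS."
  (if (null elements)
      (funcall emit '())
      (walk (first elements)
            (lambda (head)
              (walk-sequence (rest elements)
                             (lambda (tail)
                               (funcall emit (cons head tail))))))))

(defun ast-leaves (ast)
  "Flatten AST into its list of terminal strings."
  (if (stringp ast)
      (list ast)
      (mapcan #'ast-leaves (rest ast))))

(defun subtree (name ast)
  "Return the first subtree of AST labelled NAME, or NIL."
  (cond ((stringp ast) nil)
        ((eq (first ast) name) ast)
        (t (some (lambda (child) (subtree name child)) (rest ast)))))

(defun traverse-with-ast (start visitor)
  "Pass each AST derivable from START to VISITOR until it returns NIL."
  (block traversal
    (walk start (lambda (ast)
                  (unless (funcall visitor ast)
                    (return-from traversal))))))

(defun make-limited-visitor (limit)
  "Return a visitor that prints each sentence and returns NIL after LIMIT calls."
  (let ((count 0))
    (lambda (ast)
      (incf count)
      (format t "~{~a~^ ~}~%" (ast-leaves ast))
      (< count limit))))
```

Evaluating (traverse-with-ast 'order (make-limited-visitor 10)) prints ten sentences and returns, even though the grammar never runs out. This only works because the non-recursive alternative is listed first in each rule, so the depth-first walk keeps emitting sentences as it descends.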

There are many ways to attach hooks/handlers to control the traversal, as well as to fine-tune the generated data. We shall explore them in the coming weeks.

That is it for now. Have a great day!
