A few days ago, I came across a nice library called RiTa, which is described as a software toolkit for computational literature. Its two major features are text analysis and text generation.
The text analysis module parses the given text to extract sentences, tokens, parts of speech (POS), stresses, and phonemes. There is also interesting functionality to conjugate verbs and to identify word stems. It stops short, however, of deriving a full parse tree.
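To give a flavour of the analysis side, here is a tiny sketch (it assumes RiTa v2 and its RiTa.sentences / tokenize / pos / analyze / stem calls; the sample line is simply the opening line of Mending Wall, and the exact tags and phonemes depend on the RiTa version and its lexicon):

```javascript
const { RiTa } = require('rita');

const line = "Something there is that doesn't love a wall.";

console.log(RiTa.sentences(line));    // split into sentences
console.log(RiTa.tokenize(line));     // word tokens
console.log(RiTa.pos(line));          // part-of-speech tags
console.log(RiTa.analyze('walling')); // phones, stresses, syllables, POS
console.log(RiTa.stem('walling'));    // word stem
```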
The text generation component is the more interesting one (at least for me). It supports two approaches to generating text. The first takes a BNF-like grammar as input (with optional probabilities attached to the alternatives) and generates text from that grammar; this is somewhat similar to my own iLangGen project, although there are significant differences between the two. The second approach is more appealing because it needs nothing but sample text (internally, it uses Markov chains). Since it requires no complicated training, I found it quite interesting and decided to give it a spin.
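As a quick illustration of the grammar-driven approach, here is a toy sketch (it assumes RiTa v2's RiTa.grammar() / expand() API; the rules themselves are my own made-up example, not anything shipped with the library):

```javascript
const { RiTa } = require('rita');

// Alternatives are separated by '|', and other rules are referenced with '$'.
const rules = {
  start: '$subject $verb the wall.',
  subject: 'The farmer | My neighbour | Frost',
  verb: 'mends | walks along | questions'
};

const grammar = RiTa.grammar(rules);
console.log(grammar.expand()); // e.g. "My neighbour questions the wall."
```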
A nice overview of text generation using Markov chains is given in the Tutorials section. For a more recent introduction to Markov chains, see this article.
Since the generator needs sample text as input, I prepared two different examples. The first is Robert Frost’s poem Mending Wall, which I saved in a text file.
The second example is based on my favourite Tamil classic, Tirukkural. I chose the widely acclaimed English translation by G. U. Pope, extracted 50 couplets from the first five chapters of the book (there are 133 chapters in all), and saved them to a text file.
I used Visual Studio Code on my iMac for writing and testing the program. Node.js must be installed on the system, after which the RiTa package can be installed via npm (npm install rita). You can run the code from within the IDE, or in the Terminal using the node command. The program itself is quite simple and requires no detailed explanation.
Here is the JavaScript program, in essence:
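(The listing below assumes RiTa v2's Markov API: RiTa.markov(), addText() and generate(). The file name frost.txt, the n-gram order, and the number of generated lines are illustrative choices on my part.)

```javascript
const fs = require('fs');
const { RiTa } = require('rita');

// Read the sample text (Mending Wall, saved as a plain-text file)
const text = fs.readFileSync('frost.txt', 'utf8');

// Build an n-gram Markov model from the sample text
const markov = RiTa.markov(3);
markov.addText(text);

// Generate 10 new lines and print them
const lines = markov.generate(10);
lines.forEach(line => console.log(line));
```

The Tirukkural outputs further down come from pointing the same script at the other text file.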
Here is the first output (10 lines), based on Robert Frost’s Mending Wall.
When you run the program again, you get a different output:
The following two outputs were generated from the first 50 couplets of Tirukkural.
And here is one more:
All the outputs look quite realistic, don’t they? Kudos to RiTa and its creator Daniel Howe!
By the way, as I mentioned earlier, we can run the program in the Terminal too. Here is the output (Tirukkural example) from a terminal session:
I am sure the quality and variety of the output would improve if a larger sample were supplied as input. Anyway, I just wanted to get a feel for this library, and I definitely find it interesting.
You can download my program from here.
Enjoy, and have a nice weekend!