{"id":1045,"date":"2018-09-02T11:10:59","date_gmt":"2018-09-02T05:40:59","guid":{"rendered":"http:\/\/www.rangakrish.com\/?p=1045"},"modified":"2018-09-02T13:17:47","modified_gmt":"2018-09-02T07:47:47","slug":"experimenting-with-lisp-based-monadic-parser-combinator","status":"publish","type":"post","link":"https:\/\/www.rangakrish.com\/index.php\/2018\/09\/02\/experimenting-with-lisp-based-monadic-parser-combinator\/","title":{"rendered":"Experimenting with A Lisp-based Monadic Parser Combinator"},"content":{"rendered":"<p>Recently I came across a nice Lisp-based Monadic Parser Combinator <a href=\"https:\/\/github.com\/massung\/parse\" target=\"_blank\" rel=\"noopener\">library<\/a>\u00a0written by Massung. Unlike the traditional parser generators such as ANTLR, this library allows us to <em><strong>embed<\/strong><\/em> the parser in Lisp. Similar libraries are available for other languages too (see, for example <a href=\"https:\/\/github.com\/jon-hanson\/parsecj\" target=\"_blank\" rel=\"noopener\"><em><strong>ParsecJ<\/strong><\/em><\/a> for Java). The original idea of Monadic parser is from Haskell&#8217;s <a href=\"http:\/\/hackage.haskell.org\/package\/parsec\" target=\"_blank\" rel=\"noopener\">Parsec<\/a> library.<\/p>\n<p>For a very readable discussion on <em><strong>Functor<\/strong><\/em>, <em><strong>Applicative<\/strong><\/em>, and <em><strong>Monadic<\/strong><\/em> approaches, see this <a href=\"https:\/\/www.schoolofhaskell.com\/school\/advanced-haskell\/functors-applicative-functors-and-monads\" target=\"_blank\" rel=\"noopener\">article<\/a>. There is a lot of material on the internet about this topic, so I am not going to get into the basics here.<\/p>\n<p>The important point to remember about the library (and those that fall in this category) is that it follows a <em><strong>top-down<\/strong><\/em>, <em><strong>recursive-descent<\/strong><\/em> strategy, and hence cannot handle <em><strong>left-recursive<\/strong><\/em> grammars (however, you can rewrite left-recursive grammars by eliminating left recursion).<span class=\"Apple-converted-space\">\u00a0<\/span><\/p>\n<p>The main usage scenario for this type of parsers would be mini embedded <em><strong>Domain Specific Languages<\/strong><\/em> (DSLs). Because of the inherent ability to backtrack, unless used carefully, these parsers can have performance issues. Memoization can help somewhat.<\/p>\n<p>Massung gives nontrivial parsing\u00a0<a href=\"https:\/\/github.com\/massung\/parse\" target=\"_blank\" rel=\"noopener\">examples<\/a>\u00a0(JSON, XML, etc.) using this library. I played with the library for a while, and as part of trying to understand it, applied it to build a simple English grammar. In today&#8217;s post I would like to discuss my example, and along the way, give you a glimpse of the nice library.<span class=\"Apple-converted-space\">\u00a0<\/span><\/p>\n<p>The main parse function is <em><strong>parse<\/strong><\/em>. It takes two arguments, a parser and a lexer. The task of the lexer is to keep supplying tokens to the parser as long as the input stream is non-empty.<\/p>\n<p>Take a look at the <em><strong>read-words<\/strong><\/em> function below.<span class=\"Apple-converted-space\">\u00a0<\/span><\/p>\n<figure id=\"attachment_1046\" aria-describedby=\"caption-attachment-1046\" style=\"width: 426px\" class=\"wp-caption aligncenter\"><a href=\"https:\/\/i0.wp.com\/www.rangakrish.com\/wp-content\/uploads\/2018\/09\/read-words.png\"><img data-recalc-dims=\"1\" decoding=\"async\" data-attachment-id=\"1046\" data-permalink=\"https:\/\/www.rangakrish.com\/index.php\/2018\/09\/02\/experimenting-with-lisp-based-monadic-parser-combinator\/read-words\/\" data-orig-file=\"https:\/\/www.rangakrish.com\/wp-content\/uploads\/2018\/09\/read-words.png\" data-orig-size=\"426,108\" data-comments-opened=\"1\" data-image-meta=\"{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}\" data-image-title=\"read-words Function\" data-image-description=\"&lt;p&gt;read-words Function&lt;\/p&gt;\n\" data-image-caption=\"&lt;p&gt;read-words Function&lt;\/p&gt;\n\" data-large-file=\"https:\/\/www.rangakrish.com\/wp-content\/uploads\/2018\/09\/read-words.png\" class=\"size-full wp-image-1046\" src=\"https:\/\/i0.wp.com\/www.rangakrish.com\/wp-content\/uploads\/2018\/09\/read-words.png?resize=426%2C108\" alt=\"read-words Function\" width=\"426\" height=\"108\" srcset=\"https:\/\/i0.wp.com\/www.rangakrish.com\/wp-content\/uploads\/2018\/09\/read-words.png?w=426&amp;ssl=1 426w, https:\/\/i0.wp.com\/www.rangakrish.com\/wp-content\/uploads\/2018\/09\/read-words.png?resize=300%2C76&amp;ssl=1 300w\" sizes=\"(max-width: 426px) 100vw, 426px\" \/><\/a><figcaption id=\"caption-attachment-1046\" class=\"wp-caption-text\"><strong>Function: read-words<\/strong><\/figcaption><\/figure>\n<p>It takes a list argument and returns a function that takes no arguments. This returned function is the lexer that the parser will use. As per the library&#8217;s documentation, the lexer must return 2 values, namely, the <em><strong>class<\/strong><\/em> of the in-coming token and the token itself (the latter is optional). However, in my case, because an English word can have multiple <em><strong>parts-of-speech<\/strong><\/em> (for example, <em><strong>sleep<\/strong><\/em> can be a <em><strong>Noun<\/strong><\/em> or <em><strong>Verb<\/strong><\/em> depending on the context), my lexer returns a <em><strong>list<\/strong><\/em> of potential POS categories, not just one.<\/p>\n<p>Another point worth mentioning is that the lexer could read the input from any source and not just from a list, as is the case in my example. As good design dictates, the parser is shielded from this logic.<\/p>\n<p>Let us turn our attention to the parser. In a parser combinator, we can build complex parser objects out of simpler ones. Perhaps the simplest parser in the library is <em><strong>.any<\/strong><\/em>. It matches any token and returns its value.<\/p>\n<blockquote><p><span style=\"color: #0000ff;\">PARSE-TEST 1 &gt; (parse (.any) (read-words &#8216;(the dog is sleeping)))<\/span><\/p>\n<p><span style=\"color: #0000ff;\">THE<\/span><\/p>\n<p><span style=\"color: #0000ff;\">T<\/span><\/p><\/blockquote>\n<p>The first value printed is the value of the token and the second value indicates whether the parse was successful or not.<\/p>\n<p>Another parser is <em><strong>.many1<\/strong><\/em>. It takes a parser as its single argument and applies that parser one or more times. Let us combine <em><strong>.any<\/strong><\/em> and <em><strong>.many1<\/strong><\/em> thus:<\/p>\n<blockquote><p><span style=\"color: #0000ff;\">PARSE-TEST 2 &gt; (parse (.many1 (.any)) (read-words &#8216;(the dog is sleeping)))<\/span><\/p>\n<p><span style=\"color: #0000ff;\">(THE DOG IS SLEEPING)<\/span><\/p>\n<p><span style=\"color: #0000ff;\">T<\/span><\/p><\/blockquote>\n<p>As expected, <em><strong>.many1<\/strong><\/em> causes the parser to consume <em><strong>.any<\/strong><\/em> token <em><strong>as long as<\/strong><\/em> there is input. So the complete input is read and returned by the parser.<\/p>\n<p>Let us look at a parser combinator I wrote. <em><strong>.is-cat?<\/strong><\/em> takes a POS type as argument and checks whether it matches the POS type of the in-coming token.<\/p>\n<blockquote><p><span style=\"color: #0000ff;\">PARSE-TEST 3 &gt; (parse (.is-cat? :pron) (read-words &#8216;(my cat)))<\/span><\/p>\n<p><span style=\"color: #0000ff;\">MY<\/span><\/p>\n<p><span style=\"color: #0000ff;\">T<\/span><\/p><\/blockquote>\n<p>The above command checks if the first word is of type <em><strong>:pron<\/strong><\/em> (pronoun), and since <em><strong>my<\/strong><\/em> is a pronoun, the parser succeeds. We could also give a list of possible POS types, in which case, the input token must match one of the given POS types.<\/p>\n<blockquote><p><span style=\"color: #0000ff;\">PARSE-TEST 4 &gt; (parse (.is-cat? &#8216;(:verb :noun)) (read-words &#8216;(sleeping is good)))<\/span><\/p>\n<p><span style=\"color: #0000ff;\">SLEEPING<\/span><\/p>\n<p><span style=\"color: #0000ff;\">T<\/span><\/p><\/blockquote>\n<p>You may be wondering how the lexer knows about the POS category of the different words it encounters. The answer is simple: I have a <em><strong>toy<\/strong><\/em> lexicon for this purpose (shown later).<\/p>\n<p>By the way, the <em><strong>.is-cat?<\/strong><\/em> parser I have written is slightly different from the library-provided <em><strong>.is<\/strong><\/em>, in that the latter only expects a single token type from the lexer, whereas <em><strong>.is-cat?<\/strong><\/em> permits multiple types (in our case, POS types).<\/p>\n<p>I wrote another parse combinator, a logical extension of the library-provided <em><strong>.do<\/strong><\/em> parser. Whereas <em><strong>.do<\/strong><\/em> applies multiple parsers in sequence and returns the result of the <em><strong>last<\/strong><\/em> parser, my .<em><strong>seq<\/strong><\/em> parser applies multiple parsers in sequence, collects their individual result and finally returns that as a list. As you will see later, this makes some parsing tasks easy.<\/p>\n<blockquote><p><span style=\"color: #0000ff;\">PARSE-TEST 5 &gt; (parse (.seq (.is-cat? &#8216;(:det)) (.is-cat? &#8216;(:adj)) (.is-cat? &#8216;(:noun)))<span class=\"Apple-converted-space\">\u00a0<\/span><\/span><\/p>\n<p><span style=\"color: #0000ff;\"><span class=\"Apple-converted-space\">\u00a0\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 <\/span>(read-words &#8216;(the cute cat)))<\/span><\/p>\n<p><span style=\"color: #0000ff;\">(THE CUTE CAT)<\/span><\/p>\n<p><span style=\"color: #0000ff;\">T<\/span><\/p><\/blockquote>\n<p>In the above example, we are trying to match three tokens, a <em><strong>determiner<\/strong><\/em>, an <em><strong>adjective<\/strong><\/em> and a <em><strong>noun<\/strong><\/em>, in that order.<\/p>\n<p>The library has a <em><strong>macro<\/strong><\/em> called <em><strong>define-parser<\/strong><\/em> that allows us to define a parser combinator by stitching together other parsers. This comes in quite handy.<\/p>\n<p>With this background, let us turn our attention to the English parser I have put together for this post. To stress, the grammar is extremely simplified in order to illustrate the use of the library. Besides, there is no semantic check or action associated with the grammar.<\/p>\n<p>Let us start with the Lexicon, which is used by the Lexer:<\/p>\n<figure id=\"attachment_1048\" aria-describedby=\"caption-attachment-1048\" style=\"width: 599px\" class=\"wp-caption aligncenter\"><a href=\"https:\/\/i0.wp.com\/www.rangakrish.com\/wp-content\/uploads\/2018\/09\/lexicon.png\"><img data-recalc-dims=\"1\" fetchpriority=\"high\" decoding=\"async\" data-attachment-id=\"1048\" data-permalink=\"https:\/\/www.rangakrish.com\/index.php\/2018\/09\/02\/experimenting-with-lisp-based-monadic-parser-combinator\/lexicon\/\" data-orig-file=\"https:\/\/www.rangakrish.com\/wp-content\/uploads\/2018\/09\/lexicon.png\" data-orig-size=\"599,305\" data-comments-opened=\"1\" data-image-meta=\"{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}\" data-image-title=\"The Lexicon\" data-image-description=\"&lt;p&gt;The Lexicon&lt;\/p&gt;\n\" data-image-caption=\"&lt;p&gt;The Lexicon&lt;\/p&gt;\n\" data-large-file=\"https:\/\/www.rangakrish.com\/wp-content\/uploads\/2018\/09\/lexicon.png\" class=\"size-full wp-image-1048\" src=\"https:\/\/i0.wp.com\/www.rangakrish.com\/wp-content\/uploads\/2018\/09\/lexicon.png?resize=599%2C305\" alt=\"The Lexicon\" width=\"599\" height=\"305\" srcset=\"https:\/\/i0.wp.com\/www.rangakrish.com\/wp-content\/uploads\/2018\/09\/lexicon.png?w=599&amp;ssl=1 599w, https:\/\/i0.wp.com\/www.rangakrish.com\/wp-content\/uploads\/2018\/09\/lexicon.png?resize=300%2C153&amp;ssl=1 300w\" sizes=\"(max-width: 599px) 100vw, 599px\" \/><\/a><figcaption id=\"caption-attachment-1048\" class=\"wp-caption-text\"><strong>The Lexicon<\/strong><\/figcaption><\/figure>\n<p>For each word, the lexicon associates a list of possible parts of speech corresponding to that word. The <em><strong>get-categories<\/strong><\/em> function, called by the lexer, looks up a word in the lexicon, and if found, returns the corresponding POS list.<\/p>\n<figure id=\"attachment_1049\" aria-describedby=\"caption-attachment-1049\" style=\"width: 377px\" class=\"wp-caption aligncenter\"><a href=\"https:\/\/i0.wp.com\/www.rangakrish.com\/wp-content\/uploads\/2018\/09\/get-categories.png\"><img data-recalc-dims=\"1\" decoding=\"async\" data-attachment-id=\"1049\" data-permalink=\"https:\/\/www.rangakrish.com\/index.php\/2018\/09\/02\/experimenting-with-lisp-based-monadic-parser-combinator\/get-categories\/\" data-orig-file=\"https:\/\/www.rangakrish.com\/wp-content\/uploads\/2018\/09\/get-categories.png\" data-orig-size=\"377,58\" data-comments-opened=\"1\" data-image-meta=\"{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}\" data-image-title=\"Function: get-categories\" data-image-description=\"&lt;p&gt;Function: get-categories&lt;\/p&gt;\n\" data-image-caption=\"&lt;p&gt;Function: get-categories&lt;\/p&gt;\n\" data-large-file=\"https:\/\/www.rangakrish.com\/wp-content\/uploads\/2018\/09\/get-categories.png\" class=\"size-full wp-image-1049\" src=\"https:\/\/i0.wp.com\/www.rangakrish.com\/wp-content\/uploads\/2018\/09\/get-categories.png?resize=377%2C58\" alt=\"Function: get-categories\" width=\"377\" height=\"58\" srcset=\"https:\/\/i0.wp.com\/www.rangakrish.com\/wp-content\/uploads\/2018\/09\/get-categories.png?w=377&amp;ssl=1 377w, https:\/\/i0.wp.com\/www.rangakrish.com\/wp-content\/uploads\/2018\/09\/get-categories.png?resize=300%2C46&amp;ssl=1 300w\" sizes=\"(max-width: 377px) 100vw, 377px\" \/><\/a><figcaption id=\"caption-attachment-1049\" class=\"wp-caption-text\"><strong>Function: get-categories<\/strong><\/figcaption><\/figure>\n<p>To follow a uniform convention, I use <em><strong>define-parser<\/strong><\/em> to define a parser combinator for each of the POS categories, and use these to build the next level of Non-terminals.<\/p>\n<figure id=\"attachment_1050\" aria-describedby=\"caption-attachment-1050\" style=\"width: 425px\" class=\"wp-caption aligncenter\"><a href=\"https:\/\/i0.wp.com\/www.rangakrish.com\/wp-content\/uploads\/2018\/09\/Grammar1.png\"><img data-recalc-dims=\"1\" loading=\"lazy\" decoding=\"async\" data-attachment-id=\"1050\" data-permalink=\"https:\/\/www.rangakrish.com\/index.php\/2018\/09\/02\/experimenting-with-lisp-based-monadic-parser-combinator\/grammar1\/\" data-orig-file=\"https:\/\/www.rangakrish.com\/wp-content\/uploads\/2018\/09\/Grammar1.png\" data-orig-size=\"425,468\" data-comments-opened=\"1\" data-image-meta=\"{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}\" data-image-title=\"Basic POS Categories\" data-image-description=\"&lt;p&gt;Basic POS Categories&lt;\/p&gt;\n\" data-image-caption=\"&lt;p&gt;Basic POS Categories&lt;\/p&gt;\n\" data-large-file=\"https:\/\/www.rangakrish.com\/wp-content\/uploads\/2018\/09\/Grammar1.png\" class=\"size-full wp-image-1050\" src=\"https:\/\/i0.wp.com\/www.rangakrish.com\/wp-content\/uploads\/2018\/09\/Grammar1.png?resize=425%2C468\" alt=\"Basic POS Categories\" width=\"425\" height=\"468\" srcset=\"https:\/\/i0.wp.com\/www.rangakrish.com\/wp-content\/uploads\/2018\/09\/Grammar1.png?w=425&amp;ssl=1 425w, https:\/\/i0.wp.com\/www.rangakrish.com\/wp-content\/uploads\/2018\/09\/Grammar1.png?resize=272%2C300&amp;ssl=1 272w\" sizes=\"(max-width: 425px) 100vw, 425px\" \/><\/a><figcaption id=\"caption-attachment-1050\" class=\"wp-caption-text\"><strong>Basic POS Categories<\/strong><\/figcaption><\/figure>\n<p>I have defined <em><strong>noungroup<\/strong><\/em> to encompass <em><strong>Noun<\/strong><\/em>, <em><strong>Pronoun<\/strong><\/em> and <em><strong>Proper Noun<\/strong><\/em>.<\/p>\n<figure id=\"attachment_1051\" aria-describedby=\"caption-attachment-1051\" style=\"width: 251px\" class=\"wp-caption aligncenter\"><a href=\"https:\/\/i0.wp.com\/www.rangakrish.com\/wp-content\/uploads\/2018\/09\/noungroup.png\"><img data-recalc-dims=\"1\" loading=\"lazy\" decoding=\"async\" data-attachment-id=\"1051\" data-permalink=\"https:\/\/www.rangakrish.com\/index.php\/2018\/09\/02\/experimenting-with-lisp-based-monadic-parser-combinator\/noungroup\/\" data-orig-file=\"https:\/\/www.rangakrish.com\/wp-content\/uploads\/2018\/09\/noungroup.png\" data-orig-size=\"251,45\" data-comments-opened=\"1\" data-image-meta=\"{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}\" data-image-title=\"Category: noungroup\" data-image-description=\"&lt;p&gt;Category: noungroup&lt;\/p&gt;\n\" data-image-caption=\"&lt;p&gt;Category: noungroup&lt;\/p&gt;\n\" data-large-file=\"https:\/\/www.rangakrish.com\/wp-content\/uploads\/2018\/09\/noungroup.png\" class=\"size-full wp-image-1051\" src=\"https:\/\/i0.wp.com\/www.rangakrish.com\/wp-content\/uploads\/2018\/09\/noungroup.png?resize=251%2C45\" alt=\"Category: noungroup\" width=\"251\" height=\"45\" \/><\/a><figcaption id=\"caption-attachment-1051\" class=\"wp-caption-text\"><strong>Category: noungroup<\/strong><\/figcaption><\/figure>\n<p>The <em><strong>.either+<\/strong><\/em> parser combinator is a simple extension of Massung&#8217;s built-in <em><strong>.either<\/strong><\/em>, in that it allows more than 2 parsers to be specified. Here is the code for <em><strong>.either+<\/strong><\/em>:<\/p>\n<figure id=\"attachment_1052\" aria-describedby=\"caption-attachment-1052\" style=\"width: 425px\" class=\"wp-caption aligncenter\"><a href=\"https:\/\/i0.wp.com\/www.rangakrish.com\/wp-content\/uploads\/2018\/09\/eitherplus.png\"><img data-recalc-dims=\"1\" loading=\"lazy\" decoding=\"async\" data-attachment-id=\"1052\" data-permalink=\"https:\/\/www.rangakrish.com\/index.php\/2018\/09\/02\/experimenting-with-lisp-based-monadic-parser-combinator\/eitherplus\/\" data-orig-file=\"https:\/\/www.rangakrish.com\/wp-content\/uploads\/2018\/09\/eitherplus.png\" data-orig-size=\"425,121\" data-comments-opened=\"1\" data-image-meta=\"{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}\" data-image-title=\"Combinator: .either+\" data-image-description=\"&lt;p&gt;Combinator: .either+&lt;\/p&gt;\n\" data-image-caption=\"&lt;p&gt;Combinator: .either+&lt;\/p&gt;\n\" data-large-file=\"https:\/\/www.rangakrish.com\/wp-content\/uploads\/2018\/09\/eitherplus.png\" class=\"size-full wp-image-1052\" src=\"https:\/\/i0.wp.com\/www.rangakrish.com\/wp-content\/uploads\/2018\/09\/eitherplus.png?resize=425%2C121\" alt=\"Combinator: .either+\" width=\"425\" height=\"121\" srcset=\"https:\/\/i0.wp.com\/www.rangakrish.com\/wp-content\/uploads\/2018\/09\/eitherplus.png?w=425&amp;ssl=1 425w, https:\/\/i0.wp.com\/www.rangakrish.com\/wp-content\/uploads\/2018\/09\/eitherplus.png?resize=300%2C85&amp;ssl=1 300w\" sizes=\"(max-width: 425px) 100vw, 425px\" \/><\/a><figcaption id=\"caption-attachment-1052\" class=\"wp-caption-text\"><strong>Combinator: .either+<\/strong><\/figcaption><\/figure>\n<p>Here are the other elements of our toy grammar of English:<\/p>\n<figure id=\"attachment_1053\" aria-describedby=\"caption-attachment-1053\" style=\"width: 487px\" class=\"wp-caption aligncenter\"><a href=\"https:\/\/i0.wp.com\/www.rangakrish.com\/wp-content\/uploads\/2018\/09\/Grammar2.png\"><img data-recalc-dims=\"1\" loading=\"lazy\" decoding=\"async\" data-attachment-id=\"1053\" data-permalink=\"https:\/\/www.rangakrish.com\/index.php\/2018\/09\/02\/experimenting-with-lisp-based-monadic-parser-combinator\/grammar2\/\" data-orig-file=\"https:\/\/www.rangakrish.com\/wp-content\/uploads\/2018\/09\/Grammar2.png\" data-orig-size=\"487,309\" data-comments-opened=\"1\" data-image-meta=\"{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}\" data-image-title=\"Other Productions\" data-image-description=\"&lt;p&gt;Other Productions&lt;\/p&gt;\n\" data-image-caption=\"&lt;p&gt;Other Productions&lt;\/p&gt;\n\" data-large-file=\"https:\/\/www.rangakrish.com\/wp-content\/uploads\/2018\/09\/Grammar2.png\" class=\"size-full wp-image-1053\" src=\"https:\/\/i0.wp.com\/www.rangakrish.com\/wp-content\/uploads\/2018\/09\/Grammar2.png?resize=487%2C309\" alt=\"Other Productions\" width=\"487\" height=\"309\" srcset=\"https:\/\/i0.wp.com\/www.rangakrish.com\/wp-content\/uploads\/2018\/09\/Grammar2.png?w=487&amp;ssl=1 487w, https:\/\/i0.wp.com\/www.rangakrish.com\/wp-content\/uploads\/2018\/09\/Grammar2.png?resize=300%2C190&amp;ssl=1 300w, https:\/\/i0.wp.com\/www.rangakrish.com\/wp-content\/uploads\/2018\/09\/Grammar2.png?resize=140%2C90&amp;ssl=1 140w\" sizes=\"(max-width: 487px) 100vw, 487px\" \/><\/a><figcaption id=\"caption-attachment-1053\" class=\"wp-caption-text\"><strong>Other Rules<\/strong><\/figcaption><\/figure>\n<p><em><strong>.many<\/strong><\/em> invokes a parse combinator zero or more times, while <em><strong>.many1<\/strong><\/em> invokes it one or more times.<\/p>\n<p>Now that our grammar is defined, let us try to parse some sentences.<\/p>\n<blockquote><p><span style=\"color: #0000ff;\">PARSE-TEST 6 &gt; (parse &#8216;sentence (read-words &#8216;(my son is studying)))<\/span><\/p>\n<p><span style=\"color: #0000ff;\">(MY SON IS STUDYING)<\/span><\/p>\n<p><span style=\"color: #0000ff;\">T<\/span><\/p>\n<p><span style=\"color: #0000ff;\">PARSE-TEST 7 &gt; (parse &#8216;sentence (read-words &#8216;(my son is studying and I am talking to my friend)))<\/span><\/p>\n<p><span style=\"color: #0000ff;\">(MY SON IS STUDYING AND I AM TALKING TO MY FRIEND)<\/span><\/p>\n<p><span style=\"color: #0000ff;\">T<\/span><\/p>\n<p><span style=\"color: #0000ff;\">PARSE-TEST 8 &gt; (parse &#8216;sentence (read-words &#8216;(the big dog is playing with the small cute cat)))<\/span><\/p>\n<p><span style=\"color: #0000ff;\">(THE BIG DOG IS PLAYING WITH THE SMALL CUTE CAT)<\/span><\/p>\n<p><span style=\"color: #0000ff;\">T<\/span><\/p><\/blockquote>\n<p>What happens if we give an <em><strong>ungrammatical<\/strong><\/em> sentence? The <em><strong>parse<\/strong><\/em> function, by default, signals parse failure.<\/p>\n<blockquote><p><span style=\"color: #0000ff;\">PARSE-TEST 9 &gt; (parse &#8216;sentence (read-words &#8216;(all men are mortal)))<\/span><\/p>\n<p><span style=\"color: #0000ff;\">Error: Parse failure<\/span><\/p>\n<p><span style=\"color: #0000ff;\"><span class=\"Apple-converted-space\">\u00a0 <\/span>1 (abort) Return to top loop level 0.<\/span><\/p>\n<p><span style=\"color: #0000ff;\">Type :b for backtrace or :c &lt;option number&gt; to proceed.<\/span><\/p>\n<p><span style=\"color: #0000ff;\">Type :bug-form &#8220;&lt;subject&gt;&#8221; for a bug report template or &#8220;:?&#8221; for other options.<\/span><\/p><\/blockquote>\n<p>We can use the <em><strong>errorp<\/strong><\/em> and <em><strong>error-value<\/strong><\/em> keyword arguments to control this behavior.<\/p>\n<blockquote><p><span style=\"color: #0000ff;\">PARSE-TEST 10 &gt; (parse &#8216;sentence (read-words &#8216;(all men are mortal)) :errorp nil :error-value nil)<\/span><\/p>\n<p><span style=\"color: #0000ff;\">NIL<\/span><\/p>\n<p><span style=\"color: #0000ff;\">NIL<\/span><\/p><\/blockquote>\n<p>The way we have defined our parser functions, no parse tree is getting built during the parsing process. In a real parsing scenario, however, the parser needs to construct an intermediate representation of the parsed sentences in order to facilitate subsequent analysis. Massung&#8217;s library has several functions for keeping track of parsed state at every stage, but in order to keep the discussion simple, I have not used those in this simple example. If time permits, I will discuss this part in a future post.<\/p>\n<p>You can download the source for my example <a href=\"http:\/\/www.rangakrish.com\/downloads\/Parser Experimentation.lisp\" target=\"_blank\" rel=\"noopener\">here<\/a>\u00a0.<\/p>\n<p>Have a great weekend!<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Recently I came across a nice Lisp-based Monadic Parser Combinator library\u00a0written by Massung. Unlike the traditional parser generators such as ANTLR, this library allows us to embed the parser in Lisp. Similar libraries are available for other languages too (see, for example ParsecJ for Java). The original idea of Monadic parser is from Haskell&#8217;s Parsec [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"advanced_seo_description":"","jetpack_seo_html_title":"","jetpack_seo_noindex":false,"jetpack_post_was_ever_published":false,"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_memberships_contains_paid_content":false,"footnotes":"","jetpack_publicize_message":"","jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":true,"jetpack_social_options":{"image_generator_settings":{"template":"highway","default_image_id":0,"font":"","enabled":false},"version":2}},"categories":[18,17],"tags":[152],"class_list":["post-1045","post","type-post","status-publish","format-standard","hentry","category-lisp","category-programming","tag-monadic-parser-combinator"],"jetpack_publicize_connections":[],"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"jetpack_shortlink":"https:\/\/wp.me\/p9OLnF-gR","jetpack-related-posts":[{"id":1843,"url":"https:\/\/www.rangakrish.com\/index.php\/2019\/12\/22\/distributed-computing-with-linda\/","url_meta":{"origin":1045,"position":0},"title":"Distributed Computing with Linda","author":"admin","date":"December 22, 2019","format":false,"excerpt":"Linda, originally proposed by Nicholas Carriero and David Gelernter, is a model of process communication using a small set of well-defined primitives, operating on a tuple space.\u00a0 Interestingly, Sicstus Prolog\u00a0comes with a library that implements Linda (both Server and Client).\u00a0 I played with it a little bit and really enjoyed\u2026","rel":"","context":"In &quot;Programming&quot;","block_context":{"text":"Programming","link":"https:\/\/www.rangakrish.com\/index.php\/category\/programming\/"},"img":{"alt_text":"The Simple Client","src":"https:\/\/i0.wp.com\/www.rangakrish.com\/wp-content\/uploads\/2019\/12\/Simple-client-source.jpg?resize=350%2C200&ssl=1","width":350,"height":200},"classes":[]},{"id":1368,"url":"https:\/\/www.rangakrish.com\/index.php\/2019\/01\/08\/parsing-text-with-apache-opennlp\/","url_meta":{"origin":1045,"position":1},"title":"Parsing Text with Apache OpenNLP","author":"admin","date":"January 8, 2019","format":false,"excerpt":"In my earlier posts I have written about parsing text using spaCy\u00a0and MeaningCloud's parsing API. For today's article, I decided to take a look at OpenNLP, an open-source ML-based Java toolkit for parsing natural language text. OpenNLP is a fairly mature library and has been around since 2004 (source: Wikipedia).\u2026","rel":"","context":"In &quot;Natural Language Processing&quot;","block_context":{"text":"Natural Language Processing","link":"https:\/\/www.rangakrish.com\/index.php\/category\/natural-language-processing\/"},"img":{"alt_text":"Parse Tree","src":"https:\/\/i0.wp.com\/www.rangakrish.com\/wp-content\/uploads\/2019\/01\/Tree3.png?resize=350%2C200&ssl=1","width":350,"height":200},"classes":[]},{"id":1792,"url":"https:\/\/www.rangakrish.com\/index.php\/2019\/11\/23\/using-augmented-transition-networks-atn-for-information-extraction\/","url_meta":{"origin":1045,"position":2},"title":"Using Augmented Transition Networks (ATN) for Information Extraction","author":"admin","date":"November 23, 2019","format":false,"excerpt":"After Wood\u2019s paper [1], Augmented Transition Networks\u00a0(ATN) became popular in the 1970s, for parsing text. An ATN is a generalized transition network with two major enhancements: Support for recursive transitions, including jumping to other ATNs Performing arbitrary actions when edges are traversed Remembering state through the use of registers See\u2026","rel":"","context":"In &quot;LISP&quot;","block_context":{"text":"LISP","link":"https:\/\/www.rangakrish.com\/index.php\/category\/lisp\/"},"img":{"alt_text":"ATN for Modality","src":"https:\/\/i0.wp.com\/www.rangakrish.com\/wp-content\/uploads\/2019\/11\/modality.jpg?resize=350%2C200&ssl=1","width":350,"height":200,"srcset":"https:\/\/i0.wp.com\/www.rangakrish.com\/wp-content\/uploads\/2019\/11\/modality.jpg?resize=350%2C200&ssl=1 1x, https:\/\/i0.wp.com\/www.rangakrish.com\/wp-content\/uploads\/2019\/11\/modality.jpg?resize=525%2C300&ssl=1 1.5x"},"classes":[]},{"id":541,"url":"https:\/\/www.rangakrish.com\/index.php\/2017\/06\/04\/definite-clause-grammars-in-lisp-part-2\/","url_meta":{"origin":1045,"position":3},"title":"Definite Clause Grammars in Lisp &#8211; Part 2","author":"admin","date":"June 4, 2017","format":false,"excerpt":"In the last post, I showed how we can implement DCGs in LispWorks using the KnowledgeWorks package. The grammar discussed in that post did not take into account subject\/predicate number agreement. This is one of the basic constraints in English grammar. Today I will show how easy it is to\u2026","rel":"","context":"In &quot;LISP&quot;","block_context":{"text":"LISP","link":"https:\/\/www.rangakrish.com\/index.php\/category\/lisp\/"},"img":{"alt_text":"Prolog Grammar","src":"https:\/\/i0.wp.com\/www.rangakrish.com\/wp-content\/uploads\/2017\/06\/Prolog-Grammar.png?resize=350%2C200","width":350,"height":200},"classes":[]},{"id":2832,"url":"https:\/\/www.rangakrish.com\/index.php\/2022\/06\/12\/definite-clause-grammars-in-lisp-part-4\/","url_meta":{"origin":1045,"position":4},"title":"Definite Clause Grammars in Lisp &#8211; Part 4","author":"admin","date":"June 12, 2022","format":false,"excerpt":"In a series of articles\u00a0written earlier, I had shown how it is possible to model Definite Clause Grammars (DCG) in LispWorks Lisp (Enterprise Edition). We use defgrammar\u00a0in Common Prolog (available as part of KnowledgeWorks package) to define our grammar rules. Here is a toy English grammar represented using defgrammar: This\u2026","rel":"","context":"In &quot;LISP&quot;","block_context":{"text":"LISP","link":"https:\/\/www.rangakrish.com\/index.php\/category\/lisp\/"},"img":{"alt_text":"DCG Using Defgrammar","src":"https:\/\/i0.wp.com\/www.rangakrish.com\/wp-content\/uploads\/2022\/06\/defgrammar-version-300x177.jpg?resize=350%2C200&ssl=1","width":350,"height":200,"srcset":"https:\/\/i0.wp.com\/www.rangakrish.com\/wp-content\/uploads\/2022\/06\/defgrammar-version-300x177.jpg?resize=350%2C200&ssl=1 1x, https:\/\/i0.wp.com\/www.rangakrish.com\/wp-content\/uploads\/2022\/06\/defgrammar-version-300x177.jpg?resize=525%2C300&ssl=1 1.5x"},"classes":[]},{"id":575,"url":"https:\/\/www.rangakrish.com\/index.php\/2017\/08\/06\/text-generation-using-ilanggen-framework\/","url_meta":{"origin":1045,"position":5},"title":"Text Generation Using iLangGen Framework","author":"admin","date":"August 6, 2017","format":false,"excerpt":"The two primary areas in Natural Language processing are Natural Language Understanding and Natural Language Generation. The former is concerned with processing and making sense of natural language text, whereas the latter is concerned with synthesizing text, possibly from some deep representation. Both are fascinating and at the same time,\u2026","rel":"","context":"In &quot;LISP&quot;","block_context":{"text":"LISP","link":"https:\/\/www.rangakrish.com\/index.php\/category\/lisp\/"},"img":{"alt_text":"iLangGen Grammar","src":"https:\/\/i0.wp.com\/www.rangakrish.com\/wp-content\/uploads\/2017\/08\/Blog1.png?resize=350%2C200","width":350,"height":200,"srcset":"https:\/\/i0.wp.com\/www.rangakrish.com\/wp-content\/uploads\/2017\/08\/Blog1.png?resize=350%2C200 1x, https:\/\/i0.wp.com\/www.rangakrish.com\/wp-content\/uploads\/2017\/08\/Blog1.png?resize=525%2C300 1.5x"},"classes":[]}],"_links":{"self":[{"href":"https:\/\/www.rangakrish.com\/index.php\/wp-json\/wp\/v2\/posts\/1045","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.rangakrish.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.rangakrish.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.rangakrish.com\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.rangakrish.com\/index.php\/wp-json\/wp\/v2\/comments?post=1045"}],"version-history":[{"count":0,"href":"https:\/\/www.rangakrish.com\/index.php\/wp-json\/wp\/v2\/posts\/1045\/revisions"}],"wp:attachment":[{"href":"https:\/\/www.rangakrish.com\/index.php\/wp-json\/wp\/v2\/media?parent=1045"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.rangakrish.com\/index.php\/wp-json\/wp\/v2\/categories?post=1045"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.rangakrish.com\/index.php\/wp-json\/wp\/v2\/tags?post=1045"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}